Abstract
Text classification is the task of assigning predefined categories to text documents, and it is a common machine learning problem. Statistical text classification, which uses machine learning methods to learn classification rules, is known to be particularly successful at this task. In this research project we recast the text classification problem using a sound methodology based on a statistical data compression technique, the Minimum Message Length (MML) principle. To model the data sequences we use Probabilistic Finite State Automata (PFSAs). We propose two approaches to text classification using MML-PFSAs. We tested both approaches on the Enron spam dataset and recorded the results of our empirical evaluation in terms of the well-known classification measures, i.e. recall, precision, accuracy and error. The results indicate good classification accuracy, comparable with that of state-of-the-art classifiers.
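The four evaluation measures named in the abstract can be illustrated with a minimal sketch. It assumes the standard binary definitions with spam as the positive class. The names `ConfusionCounts` and `classification_measures` are hypothetical, and the counts in the usage note are made up for illustration; they are not results from the paper.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    """Hypothetical container for binary spam/ham outcomes (spam = positive)."""
    tp: int  # spam correctly labelled spam
    fp: int  # ham wrongly labelled spam
    tn: int  # ham correctly labelled ham
    fn: int  # spam wrongly labelled ham


def classification_measures(c: ConfusionCounts) -> dict:
    """Return recall, precision, accuracy and error for one set of counts."""
    total = c.tp + c.fp + c.tn + c.fn
    if total == 0:
        raise ValueError("no classified messages")
    # Recall: fraction of actual spam that was caught.
    recall = c.tp / (c.tp + c.fn) if (c.tp + c.fn) else 0.0
    # Precision: fraction of messages flagged as spam that really were spam.
    precision = c.tp / (c.tp + c.fp) if (c.tp + c.fp) else 0.0
    # Accuracy: fraction of all messages labelled correctly; error is its complement.
    accuracy = (c.tp + c.tn) / total
    return {
        "recall": recall,
        "precision": precision,
        "accuracy": accuracy,
        "error": 1.0 - accuracy,
    }
```

For example, with the illustrative counts `ConfusionCounts(tp=40, fp=10, tn=45, fn=5)`, precision is 40/50 and accuracy is 85/100, so error is the remaining 15/100.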
Original language | English |
---|---|
Title of host publication | Proceedings on 5th International Conference on Eco-Friendly Computing and Communication Systems, ICECCS 2016 |
Editors | Vinod Prasad , Ashutosh Kumar Singh , Jimson Mathew |
Place of Publication | Piscataway USA |
Publisher | IEEE, Institute of Electrical and Electronics Engineers |
Pages | 1-6 |
Number of pages | 6 |
ISBN (Electronic) | 9781509043590 |
ISBN (Print) | 9781509043560 |
Publication status | Published - 5 Apr 2017 |
Event | 5th International Conference on Eco-Friendly Computing and Communication Systems, ICECCS 2016 - Bhopal, Madhya Pradesh, India. Duration: 8 Nov 2016 → 9 Nov 2016 |
Conference
Conference | 5th International Conference on Eco-Friendly Computing and Communication Systems, ICECCS 2016 |
---|---|
Country/Territory | India |
City | Bhopal, Madhya Pradesh |
Period | 8/11/16 → 9/11/16 |
Keywords
- Minimum Message Length (MML)
- Probabilistic Finite State Automaton (PFSA)
- Spam Filtering