A novel ensemble representation learning method for document classification

P. Sharmila, S. Venkatesh, C. Deisy, S. Parthasarathy, S. Parasuraman

Research output: Chapter in Book/Report/Conference proceeding › Conference Paper › Other

Abstract

Representation learning plays a central role in all natural language processing tasks. The Bag of Words method lacks semantics and word order, so the word embedding model Word2Vec is used to capture word semantics. For morphologically rich languages, however, the vector representation can be noisy due to polysemy. To address these problems, Bag of Concepts was introduced to capture associations between the words in documents and to form concept clusters. With large amounts of data, however, Bag of Concepts representations may ignore syntax. Hence, a novel ensemble representation learning method for document classification is proposed that combines the Word2Vec and Bag of Concepts models to tackle the above-mentioned problems. Extensive results on the Reuters datasets show that the proposed model for document classification outperforms the baseline model in terms of F1 score.
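
The abstract does not say how the two representations are combined, so the following is only a minimal sketch of one plausible reading: word vectors are grouped into concept clusters, each document gets a normalized Bag of Concepts histogram, and that histogram is concatenated with the document's mean word vector. The k-means clustering, the concatenation step, the function names (kmeans, ensemble_features), and the random vectors standing in for trained Word2Vec embeddings are all assumptions made for illustration, not the authors' published method.

```python
import numpy as np


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns a cluster id for every row of points."""
    rng = np.random.default_rng(seed)
    # Start from k distinct data points (fancy indexing returns a copy).
    centers = points[rng.choice(len(points), size=k, replace=False)]
    labels = np.zeros(len(points), dtype=int)
    for _ in range(iters):
        # Distance from every point to every center, shape (n_points, k).
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for c in range(k):
            members = points[labels == c]
            if len(members):  # keep the old center if a cluster empties
                centers[c] = members.mean(axis=0)
    return labels


def ensemble_features(docs, vocab, word_vecs, n_concepts):
    """Concatenate a mean word vector with a Bag of Concepts histogram per document.

    Assumption: concatenation is the combination rule; the abstract does not specify it.
    """
    word_to_idx = {w: i for i, w in enumerate(vocab)}
    concept_of = kmeans(word_vecs, n_concepts)  # concept id for each vocabulary word
    rows = []
    for doc in docs:
        # Out-of-vocabulary tokens are skipped.
        ids = np.array([word_to_idx[w] for w in doc if w in word_to_idx], dtype=int)
        if len(ids):
            mean_vec = word_vecs[ids].mean(axis=0)
        else:
            mean_vec = np.zeros(word_vecs.shape[1])
        # Count how many of the document's words fall into each concept cluster.
        boc = np.bincount(concept_of[ids], minlength=n_concepts).astype(float)
        boc /= max(boc.sum(), 1.0)  # normalize to relative frequencies
        rows.append(np.concatenate([mean_vec, boc]))
    return np.vstack(rows)


# Toy data: random vectors stand in for trained Word2Vec embeddings (assumption).
vocab = ["oil", "crude", "barrel", "bank", "loan", "rate"]
word_vecs = np.random.default_rng(42).normal(size=(len(vocab), 4))
docs = [["oil", "crude", "barrel", "rate"], ["bank", "loan", "rate", "unknown"]]

features = ensemble_features(docs, vocab, word_vecs, n_concepts=2)
print(features.shape)  # (2, 6): 4 embedding dimensions + 2 concept bins
```

The resulting matrix could be passed to any standard classifier. In practice the embeddings would come from a model trained on the target corpus, and the number of concepts would be tuned on held-out data.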

Original language: English
Title of host publication: 2018 IEEE 4th International Symposium in Robotics and Manufacturing Automation, ROMA 2018
Publisher: IEEE, Institute of Electrical and Electronics Engineers
ISBN (Electronic): 9781728103747
Publication status: Published - 2018
Event: IEEE International Symposium in Robotics and Manufacturing Automation 2018 - Perambalur, Tamil Nadu, India
Duration: 10 Dec 2018 to 12 Dec 2018
Conference number: 4th
https://ieeexplore.ieee.org/xpl/conhome/8976031/proceeding (Proceedings)

Conference

Conference: IEEE International Symposium in Robotics and Manufacturing Automation 2018
Abbreviated title: ROMA 2018
Country/Territory: India
City: Perambalur, Tamil Nadu
Period: 10/12/18 to 12/12/18

Keywords

  • bag-of-concepts
  • document classification
  • representation learning
  • word2vec
