Lightweight Conceptual Dictionary Learning for Text Classification Using Information Compression

Li Wan, Tansu Alpcan, Margreta Kuijper, Emanuele Viterbo

Research output: Contribution to journal › Article › peer-review

Abstract

We propose a novel supervised dictionary learning framework for text classification that integrates the Lempel-Ziv-Welch (LZW) algorithm for data compression and dictionary construction. This two-phase approach first builds a dictionary with LZW and then refines it by selecting dictionary atoms for discriminative power based on mutual information and class distribution. The learned dictionary representation can then be used to train standard classifiers such as SVMs and neural networks. We introduce the information plane area rank (IPAR) to evaluate the information-theoretic performance of our algorithm. Tested on six benchmark text datasets, our model performs nearly as well as top models in limited-vocabulary settings, lagging by only about 2% while using just 10% of the parameters. Its performance drops in diverse-vocabulary contexts, however, because the LZW algorithm handles low-repetition data poorly. This contrast highlights both the efficiency and the limitations of the approach across different dataset types.
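To make the dictionary-construction phase concrete, the sketch below shows a standard LZW-style pass that grows a phrase dictionary from raw text, followed by a simple mutual-information ranking of the resulting atoms against class labels. This is a minimal illustration of the general technique only, not the paper's implementation; function names such as `build_lzw_dictionary` and `rank_atoms_by_mi`, and the presence-based mutual-information score, are assumptions made for the example.

```python
# Illustrative sketch (not the authors' implementation): build an LZW-style
# phrase dictionary from labelled texts, then rank atoms by a simple
# mutual-information score with the class labels.
from collections import Counter, defaultdict
from math import log2

def build_lzw_dictionary(text):
    """Single LZW pass over one string; returns the set of new phrases (atoms)."""
    table = {chr(i) for i in range(256)}   # seed with single characters
    phrases = set()
    w = ""
    for c in text:
        wc = w + c
        if wc in table or wc in phrases:
            w = wc                         # extend the current phrase
        else:
            phrases.add(wc)                # new atom discovered
            w = c
    return phrases

def rank_atoms_by_mi(texts, labels):
    """Score each atom by mutual information between atom presence and class."""
    atoms = set()
    for t in texts:
        atoms |= build_lzw_dictionary(t)
    n = len(texts)
    label_counts = Counter(labels)
    scores = {}
    for a in atoms:
        joint = defaultdict(int)           # counts of (atom present?, class)
        present = 0
        for t, y in zip(texts, labels):
            p = a in t
            present += p
            joint[(p, y)] += 1
        mi = 0.0
        for (p, y), c in joint.items():
            pxy = c / n
            px = (present if p else n - present) / n
            py = label_counts[y] / n
            mi += pxy * log2(pxy / (px * py))
        scores[a] = mi
    return sorted(scores.items(), key=lambda kv: -kv[1])

if __name__ == "__main__":
    texts = ["the cat sat on the mat", "the dog ate my homework",
             "cats and mats", "dogs and homework"]
    labels = [0, 1, 0, 1]
    for atom, score in rank_atoms_by_mi(texts, labels)[:5]:
        print(repr(atom), round(score, 3))
```

In the paper's two-phase setting, a ranking of this kind would feed the refinement step, where only the most discriminative atoms are retained before classifier training.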

Original language: English
Number of pages: 6
Journal: IEEE Transactions on Knowledge and Data Engineering
Publication status: Accepted/In press - 1 Jul 2024

Keywords

  • Accuracy
  • Atoms
  • Classification algorithms
  • Dictionaries
  • Dictionary learning
  • information bottleneck
  • information theory
  • Neural networks
  • supervised learning
  • Text categorization
  • Vectors
