Sparse subspace representation for spectral document clustering

Budhaditya Saha, Dinh Phung, Duc Son Pham, Svetha Venkatesh

Research output: Chapter in Book/Report/Conference proceeding, Conference Paper, Research, peer-review

1 Citation (Scopus)

Abstract

We present a novel method for document clustering using sparse representation of documents in conjunction with spectral clustering. An ℓ1-norm optimization formulation is posed to learn the sparse representation of each document, allowing us to characterize the affinity between documents by considering the overall information instead of traditional pairwise similarities. This document affinity is encoded through a graph on which spectral clustering is performed. The decomposition into multiple subspaces allows documents to be part of a sub-group that shares a smaller set of similar vocabulary, thus allowing for cleaner clusters. Extensive experimental evaluations on two real-world datasets from the Reuters-21578 and 20Newsgroup corpora show that our proposed method consistently outperforms state-of-the-art algorithms. Significantly, the performance improvement over other methods is prominent for these datasets.
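The pipeline the abstract describes (sparse codes, then an affinity graph, then spectral clustering) can be illustrated with a small NumPy sketch. This is not the authors' implementation, and the abstract does not give the exact formulation. The sketch assumes a Lasso-style objective, min over c of 0.5*||x_i - X c||^2 + lam*||c||_1 with c_i = 0, solved by iterative soft-thresholding (ISTA). It also assumes the affinity |C| + |C|^T and a normalized-Laplacian embedding followed by a simple k-means. All function names, parameters, and the toy term-document matrix are hypothetical.

```python
import numpy as np

def sparse_codes(X, lam=0.05, n_iter=300):
    """Express each document (column of X) as a sparse combination of the others.

    Assumed objective: 0.5*||X - X C||_F^2 + lam*||C||_1 with diag(C) = 0,
    minimized by ISTA (gradient step followed by soft-thresholding).
    """
    n = X.shape[1]
    G = X.T @ X                                    # Gram matrix of the documents
    step = 1.0 / max(np.linalg.norm(G, 2), 1e-12)  # 1 / Lipschitz constant of the gradient
    C = np.zeros((n, n))
    for _ in range(n_iter):
        C = C - step * (G @ C - G)                 # gradient step on the quadratic term
        C = np.sign(C) * np.maximum(np.abs(C) - step * lam, 0.0)  # soft-threshold
        np.fill_diagonal(C, 0.0)                   # a document may not represent itself
    return C

def kmeans(U, k, n_iter=50):
    """Plain Lloyd iterations with a deterministic farthest-point initialization."""
    centers = [U[0]]
    for _ in range(1, k):
        d2 = np.min([((U - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(U[np.argmax(d2)])
    centers = np.array(centers)
    for _ in range(n_iter):
        dists = ((U[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = np.argmin(dists, axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = U[labels == j].mean(axis=0)
    return labels

def spectral_clusters(C, k):
    """Cluster the graph whose edge weights come from the sparse codes."""
    W = np.abs(C) + np.abs(C).T                    # symmetric affinity (assumed form)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(W.sum(axis=1), 1e-12))
    L = np.eye(len(W)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L)                    # eigenvalues in ascending order
    U = vecs[:, :k]                                # k smallest eigenvectors
    U = U / np.maximum(np.linalg.norm(U, axis=1, keepdims=True), 1e-12)
    return kmeans(U, k)

# Toy term-document matrix (rows: terms, columns: documents), made up for
# illustration: documents 0-2 and 3-5 use disjoint vocabularies.
X = np.array([
    [2, 1, 1, 0, 0, 0],
    [1, 2, 1, 0, 0, 0],
    [1, 1, 2, 0, 0, 0],
    [0, 0, 0, 2, 1, 1],
    [0, 0, 0, 1, 2, 1],
    [0, 0, 0, 1, 1, 2],
], dtype=float)
X = X / np.linalg.norm(X, axis=0, keepdims=True)   # unit-length documents

C = sparse_codes(X)
labels = spectral_clusters(C, k=2)
print(labels)
```

In this toy case each document is reconstructed only from documents that share its vocabulary, so the affinity graph splits into two components and the two groups receive different labels. Because each code is fitted against all other documents at once, the affinity reflects the collection as a whole instead of a single pairwise similarity score.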

Original language: English
Title of host publication: Proceedings - 12th IEEE International Conference on Data Mining, ICDM 2012
Pages: 1092-1097
Number of pages: 6
DOIs: https://doi.org/10.1109/ICDM.2012.46
Publication status: Published - 1 Dec 2012
Externally published: Yes
Event: 12th IEEE International Conference on Data Mining, ICDM 2012 - Brussels, Belgium
Duration: 10 Dec 2012 to 13 Dec 2012

Publication series

Name: Proceedings - IEEE International Conference on Data Mining, ICDM
ISSN (Print): 1550-4786

Conference

Conference: 12th IEEE International Conference on Data Mining, ICDM 2012
Country: Belgium
City: Brussels
Period: 10/12/12 to 13/12/12

Keywords

  • Document clustering
  • Sparse representation

Cite this

Saha, B., Phung, D., Pham, D. S., & Venkatesh, S. (2012). Sparse subspace representation for spectral document clustering. In Proceedings - 12th IEEE International Conference on Data Mining, ICDM 2012 (pp. 1092-1097). (Proceedings - IEEE International Conference on Data Mining, ICDM). https://doi.org/10.1109/ICDM.2012.46