Abstract
We present a novel method for document clustering that combines a sparse representation of documents with spectral clustering. An ℓ1-norm optimization problem is formulated to learn the sparse representation of each document, allowing us to characterize the affinity between documents from the overall information in the collection rather than from traditional pairwise similarities. This document affinity is encoded in a graph on which spectral clustering is performed. The decomposition into multiple subspaces allows each document to belong to a sub-group that shares a smaller set of similar vocabulary, which yields cleaner clusters. Extensive experimental evaluations on two real-world datasets from the Reuters-21578 and 20Newsgroup corpora show that our proposed method consistently outperforms state-of-the-art algorithms. Notably, the performance improvement over other methods is pronounced on these datasets.
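The pipeline described in the abstract (ℓ1-regularized sparse coding of each document, an affinity graph built from the codes, then spectral clustering) can be illustrated with a minimal sketch. This is not the authors' formulation: the abstract does not state the exact objective, so the sketch assumes that each document is regressed on all other documents with scikit-learn's `Lasso`, and that the affinity is the symmetrized absolute value of the coefficients. The function names `sparse_affinity` and `cluster_documents` and the parameter `alpha` are hypothetical.

```python
import numpy as np
from sklearn.cluster import SpectralClustering
from sklearn.linear_model import Lasso


def sparse_affinity(X, alpha=0.01):
    """Build a document affinity matrix from l1-regularized self-representation.

    X is an (n_docs, n_terms) array, one row per document.
    Assumption: each document is approximated as a sparse combination of
    all *other* documents; this is a stand-in for the paper's objective.
    """
    n_docs = X.shape[0]
    C = np.zeros((n_docs, n_docs))
    for i in range(n_docs):
        others = np.delete(np.arange(n_docs), i)
        # Columns of the design matrix are the other documents.
        model = Lasso(alpha=alpha, fit_intercept=False, max_iter=5000)
        model.fit(X[others].T, X[i])
        C[i, others] = model.coef_
    # Assumed symmetrization: |C| + |C|^T gives a non-negative, symmetric graph.
    A = np.abs(C)
    return A + A.T


def cluster_documents(X, n_clusters, alpha=0.01, seed=0):
    """Run spectral clustering on the sparse-representation affinity graph."""
    W = sparse_affinity(X, alpha=alpha)
    model = SpectralClustering(
        n_clusters=n_clusters, affinity="precomputed", random_state=seed
    )
    return model.fit_predict(W)
```

Because every document is coded against the whole collection at once, a nonzero coefficient reflects how useful another document is given all the alternatives, which is one way to read the abstract's contrast with pairwise similarities. In practice `X` would typically be a normalized term-weight matrix, and `alpha` would need tuning per corpus.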
Original language | English |
---|---|
Title of host publication | Proceedings - 12th IEEE International Conference on Data Mining, ICDM 2012 |
Pages | 1092-1097 |
Number of pages | 6 |
Publication status | Published - 1 Dec 2012 |
Externally published | Yes |
Event | IEEE International Conference on Data Mining 2012, Brussels, Belgium; Duration: 10 Dec 2012 → 13 Dec 2012; Conference number: 12th; http://ieeexplore.ieee.org/xpl/mostRecentIssue.jsp?punumber=6412852 (Conference Proceedings) |
Publication series
Name | Proceedings - IEEE International Conference on Data Mining, ICDM |
---|---|
ISSN (Print) | 1550-4786 |
Conference
Conference | IEEE International Conference on Data Mining 2012 |
---|---|
Abbreviated title | ICDM 2012 |
Country/Territory | Belgium |
City | Brussels |
Period | 10/12/12 → 13/12/12 |
Keywords
- Document clustering
- Sparse representation