Abstract
In recent years, several authors have proposed discrete analogues to principal component analysis intended to handle discrete or positive-only data, for instance for analyzing sets of documents. Methods include non-negative matrix factorization, probabilistic latent semantic analysis, and latent Dirichlet allocation. This paper begins with a review of the basic theory of the variational extension to the expectation-maximization algorithm, and then presents discrete component-finding algorithms in that light. Experiments are conducted on both bigram word data and document bag-of-words data to expose some of the subtleties of this new class of algorithms.
Original language | English |
---|---|
Title of host publication | Machine Learning |
Subtitle of host publication | ECML 2002 - 13th European Conference on Machine Learning, Proceedings |
Editors | Tapio Elomaa, Heikki Mannila, Hannu Toivonen |
Publisher | Springer-Verlag London Ltd. |
Pages | 23-34 |
Number of pages | 12 |
ISBN (Print) | 9783540440369 |
Publication status | Published - 1 Jan 2002 |
Event | European Conference on Machine Learning 2002 - Helsinki, Finland; Duration: 19 Aug 2002 → 23 Aug 2002; Conference number: 13th; Proceedings: https://link.springer.com/book/10.1007/3-540-36755-1 |
Publication series
Name | Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) |
---|---|
Volume | 2430 |
ISSN (Print) | 0302-9743 |
ISSN (Electronic) | 1611-3349 |
Conference
Conference | European Conference on Machine Learning 2002 |
---|---|
Abbreviated title | ECML 2002 |
Country/Territory | Finland |
City | Helsinki |
Period | 19/08/02 → 23/08/02 |