Abstract
Most existing topic models make the bag-of-words assumption that words are generated independently, and so ignore potentially useful information about word order. Previous attempts to use collocations (short sequences of adjacent words) in topic models have either relied on a pipeline approach, restricted attention to bigrams, or resulted in models whose inference does not scale to large corpora. This paper studies how to simultaneously learn both collocations and their topic assignments. We present an efficient reformulation of the Adaptor Grammar-based topical collocation model (AG-colloc) (Johnson, 2010), and develop a point-wise sampling algorithm for posterior inference in this new formulation. We further improve the efficiency of the sampling algorithm by exploiting sparsity and parallelising inference. Experimental results derived in text classification, information retrieval and human evaluation tasks across a range of datasets show that this reformulation scales to hundreds of thousands of documents while maintaining the good performance of the AG-colloc model.
Original language | English |
---|---|
Title of host publication | Proceedings of the Conference Volume 1: Long Papers |
Subtitle of host publication | The 53rd Annual Meeting of the Association for Computational Linguistics and 7th International Joint Conference on Natural Language Processing of the Asian Federation of Natural Language Processing |
Editors | Chengqing Zong, Michael Strube |
Place of Publication | Red Hook, New York, USA |
Publisher | Association for Computational Linguistics (ACL) |
Pages | 1460-1469 |
Number of pages | 10 |
Volume | 1 |
ISBN (Print) | 9781941643723 |
Publication status | Published - 2015 |
Externally published | Yes |
Event | Annual Meeting of the Association of Computational Linguistics and International Joint Conference on Natural Language Processing 2015 - Beijing, China; Duration: 26 Jul 2015 → 31 Jul 2015; Conference number: 53rd; https://www.aclweb.org/anthology/events/acl-2015/ (Proceedings) |
Conference
Conference | Annual Meeting of the Association of Computational Linguistics and International Joint Conference on Natural Language Processing 2015 |
---|---|
Abbreviated title | ACL-IJCNLP 2015 |
Country/Territory | China |
City | Beijing |
Period | 26/07/15 → 31/07/15 |
Other | ACL was held jointly with the International Joint Conference on Natural Language Processing, Proceedings of System Demonstrations |