A computationally efficient algorithm for learning topical collocation models

Zhendong Zhao, Lan Du, Benjamin Borschinger, John K Pate, Massimiliano Ciaramita, Mark Steedman, Mark Johnson

Research output: Chapter in Book/Report/Conference proceeding › Conference Paper › Research › peer-review

Abstract

Most existing topic models make the bag-of-words assumption that words are generated independently, and so ignore potentially useful information about word order. Previous attempts to use collocations (short sequences of adjacent words) in topic models have either relied on a pipeline approach, restricted attention to bigrams, or resulted in models whose inference does not scale to large corpora. This paper studies how to simultaneously learn both collocations and their topic assignments. We present an efficient reformulation of the Adaptor Grammar-based topical collocation model (AG-colloc) (Johnson, 2010), and develop a point-wise sampling algorithm for posterior inference in this new formulation. We further improve the efficiency of the sampling algorithm by exploiting sparsity and parallelising inference. Experimental results derived in text classification, information retrieval and human evaluation tasks across a range of datasets show that this reformulation scales to hundreds of thousands of documents while maintaining the good performance of the AG-colloc model.
Original language: English
Title of host publication: Proceedings of the Conference Volume 1: Long Papers
Subtitle of host publication: The 53rd Annual Meeting of the Association for Computational Linguistics and 7th International Joint Conference on Natural Language Processing of the Asian Federation of Natural Language Processing
Editors: Chengqing Zong, Michael Strube
Place of Publication: Red Hook New York USA
Publisher: Association for Computational Linguistics (ACL)
Pages: 1460-1469
Number of pages: 10
Volume: 1
ISBN (Print): 9781941643723
Publication status: Published - 2015
Externally published: Yes
Event: Annual Meeting of the Association of Computational Linguistics 2015 - Beijing, China
Duration: 26 Jul 2015 → 31 Jul 2015
Conference number: 53rd
https://www.aclweb.org/anthology/events/acl-2015/ (Proceedings)

Conference

Conference: Annual Meeting of the Association of Computational Linguistics 2015
Abbreviated title: ACL-IJCNLP 2015
Country: China
City: Beijing
Period: 26/07/15 → 31/07/15
Other: ACL was held jointly with the International Joint Conference on Natural Language Processing, Proceedings of System Demonstrations