Topic models with topic ordering regularities for topic segmentation

Lan Du, John K Pate, Mark Johnson

Research output: Chapter in Book/Report/Conference proceeding (Conference Paper; Research; peer-review)

5 Citations (Scopus)

Abstract

Documents from the same domain usually discuss similar topics in a similar order. In this paper we present new ordering-based topic models that use generalised Mallows models to capture this regularity and thereby constrain topic assignments. Specifically, these models assume that documents from the same domain share a canonical topic ordering, and that each document-specific topic ordering may deviate from that canonical ordering. Instead of full orderings over the set of all possible topics covered by a domain, we use top-t orderings generated by a multistage ranking process. We show how to reformulate the new models so that a point-wise sampling algorithm from the Bayesian word segmentation literature can be used for posterior inference. Experimental results on several document collections with different properties show that our model performs much better than other topic ordering-based models, and competitively with other state-of-the-art topic segmentation models.
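To make the idea of a top-t ordering drawn from a generalised Mallows model more concrete, here is a minimal sketch. It assumes the standard inversion-vector (multistage) parameterisation: at stage j, the next topic is chosen from the topics not yet placed, and the number v_j of remaining canonical-order topics it skips has P(v_j = r) ∝ exp(-θ_j r), with one dispersion parameter θ_j per stage. The function names `sample_top_t_ordering` and `log_prob_top_t` are hypothetical illustrations, not code from the paper, and the sketch does not reproduce the paper's full topic model or its point-wise sampler.

```python
import math
import random


def sample_top_t_ordering(canonical, thetas, t, rng=random):
    """Draw a top-t topic ordering from a generalised Mallows model (sketch).

    canonical: topic ids in the assumed canonical (domain-level) order.
    thetas:    hypothetical per-stage dispersion parameters theta_1..theta_t (> 0);
               larger values keep the sample closer to the canonical order.
    t:         number of stages, i.e. the length of the returned ordering.
    """
    if not 1 <= t <= len(canonical):
        raise ValueError("t must be between 1 and the number of topics")
    if len(thetas) < t:
        raise ValueError("need one dispersion parameter per stage")
    remaining = list(canonical)
    ordering = []
    for j in range(t):
        n_left = len(remaining)
        # v_j = how many remaining topics (in canonical order) are skipped;
        # P(v_j = r) is proportional to exp(-theta_j * r), r = 0..n_left-1.
        weights = [math.exp(-thetas[j] * r) for r in range(n_left)]
        v = rng.choices(range(n_left), weights=weights, k=1)[0]
        ordering.append(remaining.pop(v))
    return ordering


def log_prob_top_t(ordering, canonical, thetas):
    """Log-probability of a top-t ordering under the same sketch model."""
    remaining = list(canonical)
    logp = 0.0
    for j, topic in enumerate(ordering):
        v = remaining.index(topic)  # inversion count at stage j
        n_left = len(remaining)
        # Per-stage normaliser Z_j = sum_{r=0}^{n_left-1} exp(-theta_j * r).
        log_z = math.log(sum(math.exp(-thetas[j] * r) for r in range(n_left)))
        logp += -thetas[j] * v - log_z
        remaining.pop(v)
    return logp
```

For example, `sample_top_t_ordering(["a", "b", "c", "d"], [0.7, 1.3], 2)` returns a two-topic ordering that usually starts near the front of the canonical order. Only the first t stages are ever instantiated, which is what keeps a top-t ordering cheaper to handle than a full permutation over all topics in the domain.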
Original language: English
Title of host publication: Proceedings - 14th IEEE International Conference on Data Mining, ICDM 2014
Subtitle of host publication: Shenzhen, China / 14-17 December 2014
Editors: Ravi Kumar, Hannu Toivonen, Jian Pei, Joshua Zhexue Huang, Xindong Wu
Place of Publication: Piscataway NJ USA
Publisher: IEEE, Institute of Electrical and Electronics Engineers
Pages: 803-808
Number of pages: 6
ISBN (Electronic): 9781479943029
ISBN (Print): 9781479943036
Publication status: Published - 2014
Externally published: Yes
Event: IEEE International Conference on Data Mining 2014 - Shenzhen, China
Duration: 14 Dec 2014 to 17 Dec 2014
Conference number: 14th
http://ieeexplore.ieee.org/xpl/mostRecentIssue.jsp?punumber=7022262 (Conference Proceedings)

Conference

Conference: IEEE International Conference on Data Mining 2014
Abbreviated title: ICDM 2014
Country/Territory: China
City: Shenzhen
Period: 14/12/14 to 17/12/14

Keywords

  • topic model
  • topic segmentation
  • top-t ordering
