OTLDA: a geometry-aware optimal transport approach for topic modeling

Research output: Chapter in Book/Report/Conference proceeding › Conference Paper › Research › peer-review

Abstract

We present an optimal transport framework for learning topics from textual data. While the celebrated Latent Dirichlet Allocation (LDA) topic model and its variants have been applied in many disciplines, they focus mainly on word occurrences and neglect semantic regularities in language. Recent works have tried to exploit the semantic relationships between words to bridge this gap; however, these models, which are usually extensions of LDA or the Dirichlet Multinomial Mixture (DMM), are tailored to deal effectively with either regular or short documents, but not both. The optimal transport distance provides an appealing tool for incorporating the geometry of word semantics into topic modeling. Moreover, recent developments in the efficient computation of optimal transport distances also promote its application to topic modeling. In this paper, we build on optimal transport theory to naturally exploit the geometric structures of semantically related words in embedding spaces, which leads to more interpretable learned topics. Comprehensive experiments illustrate that the proposed framework outperforms competitive approaches in terms of topic coherence on assorted text corpora that include both long and short documents. The learned topic representations also lead to better accuracy on downstream classification tasks, which serves as an extrinsic evaluation.

Original language: English
Title of host publication: Advances in Neural Information Processing Systems 33 (NeurIPS 2020)
Editors: H. Larochelle, M. Ranzato, R. Hadsell, M.F. Balcan, H. Lin
Place of Publication: San Diego CA USA
Publisher: Neural Information Processing Systems (NIPS)
Number of pages: 10
Publication status: Published - 2020
Event: Advances of Neural Information Processing Systems 2020 - Online, Virtual, Online, United States of America
Duration: 6 Dec 2020 to 12 Dec 2020
Conference number: 34th
https://proceedings.neurips.cc/paper/2020 (Proceedings)
https://nips.cc/Conferences/2020 (Website)

Publication series

Name: Advances in Neural Information Processing Systems
Publisher: Morgan Kaufmann Publishers
Volume: 2020-December
ISSN (Print): 1049-5258

Conference

Conference: Advances of Neural Information Processing Systems 2020
Abbreviated title: NeurIPS 2020
Country/Territory: United States of America
City: Virtual, Online
Period: 6/12/20 to 12/12/20