Neural Topic Model via Optimal Transport

He Zhao, Dinh Phung, Viet Huynh, Trung Le, Wray Buntine

Research output: Chapter in Book/Report/Conference proceeding > Conference Paper > Research > peer-review

39 Citations (Scopus)

Abstract

Recently, Neural Topic Models (NTMs) inspired by variational autoencoders have attracted increasing research interest due to their promising results on text analysis. However, it is usually hard for existing NTMs to achieve good document representation and coherent/diverse topics at the same time. Moreover, their performance often degrades severely on short documents. The requirement of reparameterisation could also compromise their training quality and model flexibility. To address these shortcomings, we present a new neural topic model via the theory of optimal transport (OT). Specifically, we propose to learn the topic distribution of a document by directly minimising its OT distance to the document's word distributions. Importantly, the cost matrix of the OT distance models the weights between topics and words, and is constructed from the distances between topics and words in an embedding space. Our proposed model can be trained efficiently with a differentiable loss. Extensive experiments show that our framework significantly outperforms the state-of-the-art NTMs on discovering more coherent and diverse topics and deriving better document representations for both regular and short texts.
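
The core idea in the abstract, an OT distance between a document's topic distribution and its word distribution under an embedding-based cost matrix, can be illustrated with a minimal NumPy sketch. Everything here is an assumption for illustration rather than the authors' implementation: the cost is taken to be cosine distance between topic and word embeddings, the OT distance is approximated with entropy-regularised Sinkhorn iterations (one common differentiable surrogate), and the names `cosine_cost`, `sinkhorn_distance`, `reg`, and `n_iters` are hypothetical. The embeddings and the document are random toy data.

```python
import numpy as np


def cosine_cost(topic_emb, word_emb):
    """Cost matrix M[k, v] = 1 - cosine similarity of topic k and word v.

    Assumption: cosine distance in a shared embedding space; the abstract
    only says the cost is built from topic-word distances.
    """
    t = topic_emb / np.linalg.norm(topic_emb, axis=1, keepdims=True)
    w = word_emb / np.linalg.norm(word_emb, axis=1, keepdims=True)
    return 1.0 - t @ w.T


def sinkhorn_distance(a, b, cost, reg=0.5, n_iters=200):
    """Entropy-regularised OT cost between histograms a (K,) and b (V,).

    Returns the transport cost <plan, cost> and the transport plan.
    `reg` and `n_iters` are illustrative defaults, not tuned values.
    """
    kernel = np.exp(-cost / reg)
    u = np.ones_like(a)
    for _ in range(n_iters):
        v = b / (kernel.T @ u)   # rescale to match the word marginal
        u = a / (kernel @ v)     # rescale to match the topic marginal
    plan = u[:, None] * kernel * v[None, :]
    return float(np.sum(plan * cost)), plan


# Toy example: 3 topics, 6 vocabulary words, 4-dimensional embeddings.
rng = np.random.default_rng(0)
topic_emb = rng.normal(size=(3, 4))
word_emb = rng.normal(size=(6, 4))
cost = cosine_cost(topic_emb, word_emb)

# Document word distribution: normalised bag-of-words counts.
counts = np.array([2.0, 0.0, 1.0, 3.0, 0.0, 1.0])
doc_words = counts / counts.sum()

# Topic distribution: softmax of (here random) encoder logits.
logits = rng.normal(size=3)
topic_dist = np.exp(logits - logits.max())
topic_dist /= topic_dist.sum()

dist, plan = sinkhorn_distance(topic_dist, doc_words, cost)
print("approximate OT distance:", dist)
```

In a trainable model the logits would come from an encoder and the topic embeddings would be parameters, with the returned distance serving as the loss to minimise; an automatic-differentiation framework would replace NumPy for that purpose.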

Original language: English
Title of host publication: The Ninth International Conference on Learning Representations
Editors: Alice Oh, Naila Murray, Ivan Titov
Place of Publication: USA
Publisher: OpenReview
Number of pages: 15
Publication status: Published - 2021
Event: International Conference on Learning Representations 2022 - Online, United States of America
Duration: 25 Apr 2022 to 29 Apr 2022
Conference number: 10th
https://openreview.net/group?id=ICLR.cc/2022/Conference (Peer Reviews)
https://iclr.cc/Conferences/2022 (Website)

Conference

Conference: International Conference on Learning Representations 2022
Abbreviated title: ICLR 2022
Country/Territory: United States of America
Period: 25/04/22 to 29/04/22