A left-to-right algorithm for likelihood estimation in Gamma-Poisson factor analysis

Joan Capdevila, Jesús Cerquides, Jordi Torres, François Petitjean, Wray Buntine

Research output: Chapter in Book/Report/Conference proceeding › Conference Paper › Research › peer-review

1 Citation (Scopus)


Computing the probability of unseen documents is a natural evaluation task in topic modeling. Previous work has addressed this problem for the well-known Latent Dirichlet Allocation (LDA) model. However, the same problem for a more general class of topic models, referred to here as Gamma-Poisson Factor Analysis (GaP-FA), remains unexplored, which hampers a fair comparison between models. Recent findings on the exact marginal likelihood of GaP-FA enable the derivation of a closed-form expression. In this paper, we show that the cost of computing it exactly grows exponentially with the number of topics and non-zero words in a document, so it is tractable only for relatively small models and short documents. Experiments on various corpora also indicate that existing methods in the literature are unlikely to estimate this probability accurately. With that in mind, we propose L2R, a left-to-right sequential sampler that decomposes the document probability into a product of conditionals and estimates them separately. We then confirm that our estimator converges and is unbiased for both small and large collections. Code related to this paper is available at: https://github.com/jcapde/L2R, https://doi.org/10.7910/DVN/GDTAAC.
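
The decomposition into a product of conditionals mentioned in the abstract can be illustrated on a toy case. The sketch below is not the paper's L2R sampler: it assumes a single topic (K = 1) with theta ~ Gamma(shape a, rate b) and counts x_n ~ Poisson(theta * phi_n), where every conditional p(x_n | x_1, ..., x_{n-1}) happens to have a closed form, so no sampling is needed. The function names and the parameters a, b and phi are illustrative assumptions, not identifiers from the authors' code.

```python
import math


def log_conditional(x_n, phi_n, shape, rate):
    """log p(x_n | earlier counts) when theta ~ Gamma(shape, rate) a posteriori.

    Integrating the Poisson(theta * phi_n) likelihood against the Gamma
    density gives a negative-binomial form. Requires phi_n > 0.
    """
    return (x_n * math.log(phi_n) - math.lgamma(x_n + 1)
            + shape * math.log(rate) - math.lgamma(shape)
            + math.lgamma(shape + x_n)
            - (shape + x_n) * math.log(rate + phi_n))


def log_prob_left_to_right(x, phi, a, b):
    """Chain rule: log p(x) = sum_n log p(x_n | x_1..x_{n-1})."""
    shape, rate, total = a, b, 0.0
    for x_n, phi_n in zip(x, phi):
        total += log_conditional(x_n, phi_n, shape, rate)
        # Conjugate update: the posterior over theta stays Gamma.
        shape += x_n
        rate += phi_n
    return total


def log_prob_joint(x, phi, a, b):
    """Same quantity computed in one shot from the joint marginal."""
    s, big_phi = sum(x), sum(phi)
    log_coeff = sum(x_n * math.log(phi_n) - math.lgamma(x_n + 1)
                    for x_n, phi_n in zip(x, phi))
    return (log_coeff + a * math.log(b) - math.lgamma(a)
            + math.lgamma(a + s) - (a + s) * math.log(b + big_phi))
```

For any count vector, the two functions agree up to floating-point error, which is the chain rule at work. With more than one topic the conditionals are no longer available in this simple form, which is the setting where the abstract proposes estimating them by sampling.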

Original language: English
Title of host publication: Machine Learning and Knowledge Discovery in Databases
Subtitle of host publication: European Conference, ECML PKDD 2018 Dublin, Ireland, September 10–14, 2018 Proceedings, Part II
Editors: Michele Berlingerio, Francesco Bonchi, Thomas Gärtner, Neil Hurley, Georgiana Ifrim
Place of Publication: Cham Switzerland
Number of pages: 17
ISBN (Electronic): 9783030109288
ISBN (Print): 9783030109271
Publication status: Published - 2019
Event: European Conference on Machine Learning European Conference on Principles and Practice of Knowledge Discovery in Databases 2018 - Dublin, Ireland
Duration: 10 Sep 2018 to 14 Sep 2018
https://link.springer.com/book/10.1007/978-3-030-10925-7 (Proceedings)

Publication series

Name: Lecture Notes in Computer Science
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349


Conference: European Conference on Machine Learning European Conference on Principles and Practice of Knowledge Discovery in Databases 2018
Abbreviated title: ECML PKDD 2018


  • Estimation methods
  • Factor analysis
  • Gamma-Poisson
  • Importance sampling
  • Left-to-right
  • Topic models
