Experiments with learning graphical models on text

Joan Capdevila, Ethan Zhao, Francois Petitjean, Wray Lindsay Buntine

Research output: Contribution to journal › Article › Research › peer-review

2 Citations (Scopus)

Abstract

A rich variety of models is now in use for unsupervised modelling of text documents; in particular, many graphical models exist, with and without latent variables. To date, the comparative performance of these models is not well understood, partly because they differ subtly and because they have been proposed and evaluated in different contexts. This paper reports on our experiments with a representative set of state-of-the-art models: chordal graphs, matrix factorisation, and hierarchical latent tree models. For the chordal graphs, we use different scoring functions. For the matrix factorisation models, we use different hierarchical priors and asymmetric priors on components. We use Boolean matrix factorisation rather than topic models so that the evaluations are comparable. The experiments use several evaluations: the probability of each document, omni-directional prediction, which predicts different variables in turn, and anomaly detection. We find that matrix factorisation performed well at anomaly detection but poorly on the prediction task. Chordal graph learning generally performed best and, probably due to its lower bias, often outperformed hierarchical latent trees.
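As an illustrative sketch only (not the paper's implementation), the toy Python example below shows the kind of object a Boolean factorisation of a binary document-term matrix produces, which is what makes the models evaluable on the same footing. The data matrix, component count, and 0.5 threshold are assumptions, and thresholded NMF is a crude stand-in for a proper Boolean matrix factorisation algorithm.

```python
# Toy sketch: factorise a binary document-term matrix and reconstruct it
# with a Boolean (OR-of-ANDs) product. Illustrative only; data, component
# count and threshold are assumed, and thresholded NMF is merely a stand-in
# for a genuine Boolean matrix factorisation method.
import numpy as np
from sklearn.decomposition import NMF

# Assumed binary document-term matrix: 6 documents, 8 terms.
X = np.array([
    [1, 1, 1, 0, 0, 0, 0, 0],
    [1, 1, 0, 1, 0, 0, 0, 0],
    [0, 1, 1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1, 1, 0],
    [0, 0, 0, 0, 1, 1, 0, 1],
    [0, 0, 0, 0, 0, 1, 1, 1],
], dtype=float)

k = 2  # number of latent components (assumed)
model = NMF(n_components=k, init="nndsvda", random_state=0, max_iter=500)
W = model.fit_transform(X)   # document-by-component loadings
H = model.components_        # component-by-term loadings

# Threshold to Boolean factors and reconstruct with the Boolean product
# (logical OR of ANDs), the operation Boolean matrix factorisation uses.
Wb = W > 0.5
Hb = H > 0.5
X_hat = (Wb.astype(int) @ Hb.astype(int)) > 0

# Reconstruction error counted as the number of flipped bits.
print("bits flipped in reconstruction:", int(np.sum(X.astype(bool) ^ X_hat)))
```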

Original language: English
Pages (from-to): 363-387
Number of pages: 25
Journal: Behaviormetrika
Volume: 45
Issue number: 2
DOIs
Publication status: Published - 1 Oct 2018

Keywords

  • Document analysis
  • Evaluation
  • Graphical models
  • Latent variables
  • Matrix factorisation
  • Unsupervised learning
