Experiments with learning graphical models on text

Research output: Contribution to journal › Article › Research › peer-review

Abstract

A rich variety of models is now in use for the unsupervised modelling of text documents and, in particular, a wide range of graphical models exists, with and without latent variables. To date, the comparative performance of these models is not well understood, partly because they differ in subtle ways and have been proposed and evaluated in different contexts. This paper reports on our experiments with a representative set of state-of-the-art models: chordal graphs, matrix factorisation, and hierarchical latent tree models. For the chordal graphs, we use different scoring functions. For the matrix factorisation models, we use different hierarchical priors and asymmetric priors on components. We use Boolean matrix factorisation rather than topic models so that the evaluations are comparable. The experiments cover several evaluation tasks: the probability of each document, omni-directional prediction, which predicts different variables, and anomaly detection. We find that matrix factorisation performed well at anomaly detection but poorly on the prediction task. Chordal graph learning generally performed best and, probably because of its lower bias, often outperformed hierarchical latent trees.
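
The abstract's central design choice, using Boolean matrix factorisation on binarised document-term data so that all models can be scored on the same inputs, can be illustrated with a small sketch. The following Python snippet is purely illustrative and is not the authors' implementation: the toy matrix X, the factors U and V, and the bit-flipping local search are all assumptions made here for exposition. It factorises a binary document-term matrix under the Boolean product and minimises the Hamming reconstruction error.

    # Illustrative sketch only: Boolean matrix factorisation of a toy binary
    # document-term matrix X into binary factors U (documents x k) and
    # V (k x terms). The Boolean product sets entry (d, w) to 1 when some
    # component c has U[d, c] = 1 and V[c, w] = 1.
    import numpy as np

    rng = np.random.default_rng(0)
    X = (rng.random((6, 8)) < 0.4).astype(int)   # toy 6-document, 8-term matrix
    k = 2                                        # number of Boolean components

    U = (rng.random((6, k)) < 0.5).astype(int)   # random binary initialisation
    V = (rng.random((k, 8)) < 0.5).astype(int)

    def boolean_product(U, V):
        # OR over components of (U AND V): the standard Boolean matrix product.
        return (U @ V > 0).astype(int)

    def error(U, V, X):
        # Hamming distance between the reconstruction and the data.
        return int(np.sum(boolean_product(U, V) != X))

    # Crude alternating local search: flip any single bit that lowers the error.
    best = error(U, V, X)
    improved = True
    while improved:
        improved = False
        for M in (U, V):
            for i in range(M.shape[0]):
                for j in range(M.shape[1]):
                    M[i, j] ^= 1                  # try flipping this bit
                    e = error(U, V, X)
                    if e < best:
                        best, improved = e, True  # keep the flip
                    else:
                        M[i, j] ^= 1              # revert it

    print("Hamming reconstruction error:", best)

Each row of V then plays a role loosely analogous to a topic: the set of terms that a single Boolean component can switch on.
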
Original language: English
Pages (from-to): 363-387
Number of pages: 25
Journal: Behaviormetrika
Volume: 45
Issue number: 2
DOIs: 10.1007/s41237-018-0050-3
Publication status: Published - Oct 2018

Cite this

@article{fa91eda48e114cc6953ea9dfd4545123,
title = "Experiments with learning graphical models on text",
author = "Capdevila, Joan and Zhao, Ethan and Petitjean, Francois and Buntine, {Wray Lindsay}",
year = "2018",
month = "10",
doi = "10.1007/s41237-018-0050-3",
language = "English",
volume = "45",
pages = "363--387",
journal = "Behaviormetrika",
issn = "0385-7417",
publisher = "Springer",
number = "2",
}

Experiments with learning graphical models on text. / Capdevila, Joan; Zhao, Ethan; Petitjean, Francois; Buntine, Wray Lindsay.

In: Behaviormetrika, Vol. 45, No. 2, 10.2018, p. 363-387.

TY - JOUR
T1 - Experiments with learning graphical models on text
AU - Capdevila, Joan
AU - Zhao, Ethan
AU - Petitjean, Francois
AU - Buntine, Wray Lindsay
PY - 2018/10
Y1 - 2018/10
U2 - 10.1007/s41237-018-0050-3
DO - 10.1007/s41237-018-0050-3
M3 - Article
VL - 45
SP - 363
EP - 387
JO - Behaviormetrika
JF - Behaviormetrika
SN - 0385-7417
IS - 2
ER -