Experiments with Learning Graphical Models on Text

Joan Capdevila, He Zhao, Francois Petitjean, Wray Lindsay Buntine

    Research output: Contribution to journal / Article / Research / peer-review

    Abstract

    A rich variety of models is now in use for unsupervised modelling of text documents; in particular, many graphical models exist, both with and without latent variables. To date, the comparative performance of these models is poorly understood, partly because they differ in subtle ways and partly because they have been proposed and evaluated in different contexts. This paper reports on our experiments with a representative set of state-of-the-art models: chordal graphs, matrix factorisation, and hierarchical latent tree models. For the chordal graphs, we use different scoring functions. For the matrix factorisation models, we use different hierarchical priors, including asymmetric priors on components. We use Boolean matrix factorisation rather than topic models so that the evaluations are comparable. The experiments cover several evaluations: the probability of each document, omni-directional prediction (predicting different variables), and anomaly detection. We find that matrix factorisation performed well at anomaly detection but poorly on the prediction task. Chordal graph learning generally performed best and, probably owing to its lower bias, often outperformed hierarchical latent trees.
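    The abstract refers to Boolean matrix factorisation. As a hedged illustration of the generic idea only (not the authors' model, which uses hierarchical priors on the factors), the sketch below shows how a binary document-by-word matrix X can be approximated by the Boolean product of a document-by-component matrix W and a component-by-word matrix H: a word is present in a document if any component active in that document contains it. The function names `boolean_product` and `reconstruction_errors` and the toy matrices are hypothetical.

```python
def boolean_product(W, H):
    """Return X_hat with X_hat[i][j] = OR over k of (W[i][k] AND H[k][j])."""
    n_components = len(H)
    if any(len(row) != n_components for row in W):
        raise ValueError("W must have one column per row of H")
    n_words = len(H[0]) if H else 0
    return [
        [int(any(W[i][k] and H[k][j] for k in range(n_components)))
         for j in range(n_words)]
        for i in range(len(W))
    ]


def reconstruction_errors(X, W, H):
    """Count cells where the Boolean reconstruction disagrees with the observed X."""
    X_hat = boolean_product(W, H)
    return sum(x != xh
               for row, row_hat in zip(X, X_hat)
               for x, xh in zip(row, row_hat))


# Hypothetical toy data: 3 documents, 2 components, 4 words.
H = [[1, 1, 0, 0],   # component 0 contains words 0 and 1
     [0, 1, 1, 1]]   # component 1 contains words 1, 2 and 3
W = [[1, 0],         # document 0 uses component 0 only
     [0, 1],         # document 1 uses component 1 only
     [1, 1]]         # document 2 uses both components
X = [[1, 1, 0, 0],
     [0, 1, 1, 0],
     [1, 1, 1, 1]]

print(boolean_product(W, H))        # → [[1, 1, 0, 0], [0, 1, 1, 1], [1, 1, 1, 1]]
print(reconstruction_errors(X, W, H))  # → 1
```

    In the third document, word 1 is covered by both components, yet the reconstructed cell is 1 rather than the 2 an ordinary matrix product would give; this OR behaviour keeps the reconstruction binary, like the observed data.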
    Language: English
    Number of pages: 25
    Journal: Behaviormetrika
    DOI: 10.1007/s41237-018-0050-3
    Publication status: Accepted/In press - 2018

    Cite this

    @article{fa91eda48e114cc6953ea9dfd4545123,
    title = "Experiments with Learning Graphical Models on Text",
    author = "Joan Capdevila and He Zhao and Francois Petitjean and Buntine, {Wray Lindsay}",
    year = "2018",
    doi = "10.1007/s41237-018-0050-3",
    language = "English",
    journal = "Behaviormetrika",
    issn = "0385-7417",
    publisher = "Springer",

    }

    Experiments with Learning Graphical Models on Text. / Capdevila, Joan; Zhao, He; Petitjean, Francois; Buntine, Wray Lindsay.

    In: Behaviormetrika, 2018.


    TY - JOUR

    T1 - Experiments with Learning Graphical Models on Text

    AU - Capdevila, Joan

    AU - Zhao, He

    AU - Petitjean, Francois

    AU - Buntine, Wray Lindsay

    PY - 2018

    Y1 - 2018

    U2 - 10.1007/s41237-018-0050-3

    DO - 10.1007/s41237-018-0050-3

    M3 - Article

    JO - Behaviormetrika

    T2 - Behaviormetrika

    JF - Behaviormetrika

    SN - 0385-7417

    ER -