MML clustering of continuous-valued data using Gaussian and t distributions

Yudi Agusta, David L Dowe

    Research output: Chapter in Book/Report/Conference proceeding › Conference Paper › Research › peer-review

    7 Citations (Scopus)

    Abstract

    Clustering, also known as mixture modelling or intrinsic classification, is the problem of identifying and modelling components (or clusters, or classes) in a body of data. We consider here the application of the Minimum Message Length (MML) principle to clustering with Gaussian and t distributions. Earlier work on MML clustering dealt with the multinomial and Gaussian distributions (Wallace and Boulton, 1968) and, in addition, the von Mises circular and Poisson distributions (Wallace and Dowe, 1994, 2000). Our current work builds on this by generalising the Gaussian distribution to the t distribution. Point estimation of the t distribution is performed using the MML approximation proposed by Wallace and Freeman (1987). We also compare the MML estimates of the t distribution with those of the Maximum Likelihood (ML) method in terms of their Kullback-Leibler (KL) distances. Within each component, our application also performs model selection on whether a particular group of data is best modelled as a Gaussian or a t distribution. The proposed modelling method is then applied to several artificially generated datasets. The modelling results are compared with those obtained using MML clustering of Gaussian distributions. Our modelling method compares quite well with an alternative clustering program (EMMIX), which uses various modelling criteria such as the Akaike Information Criterion (AIC) and Schwarz’s Bayesian Information Criterion (BIC).
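
The abstract compares estimators by their Kullback-Leibler distances. As a rough illustration only, the sketch below approximates the KL distance between a "true" location-scale t distribution and an estimated one by numerical integration. The function names `t_logpdf` and `kl_distance`, the `(mu, sigma, nu)` parameter layout, and the integration range are assumptions made for this example; they are not taken from the paper and do not reproduce its MML estimator.

```python
import math

def t_logpdf(x, mu, sigma, nu):
    """Log density of a location-scale Student t (hypothetical helper).

    mu: location, sigma: scale (> 0), nu: degrees of freedom (> 0).
    """
    z = (x - mu) / sigma
    return (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
            - 0.5 * math.log(nu * math.pi) - math.log(sigma)
            - (nu + 1) / 2 * math.log1p(z * z / nu))

def kl_distance(true_params, est_params, lo=-100.0, hi=100.0, steps=100_000):
    """Approximate KL(true || est) = integral of p * log(p / q) dx.

    Uses the trapezoid rule on [lo, hi]; the range and step count are
    illustrative assumptions, adequate when most of the mass lies inside.
    """
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        x = lo + i * h
        log_p = t_logpdf(x, *true_params)
        log_q = t_logpdf(x, *est_params)
        weight = 0.5 if i in (0, steps) else 1.0  # trapezoid end weights
        total += weight * math.exp(log_p) * (log_p - log_q)
    return total * h

# Example: distance from a true t(mu=0, sigma=1, nu=3) to two candidate estimates.
close_estimate = kl_distance((0.0, 1.0, 3.0), (0.1, 1.0, 3.0))
far_estimate = kl_distance((0.0, 1.0, 3.0), (1.0, 1.0, 3.0))
```

Under these assumptions a smaller value indicates an estimate closer to the true distribution, which is the sense in which two estimation methods could be ranked against each other on simulated data.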
    Original language: English
    Title of host publication: AI 2002: Advances in Artificial Intelligence
    Subtitle of host publication: 15th Australian Joint Conference on Artificial Intelligence, Canberra, Australia, December 2-6, 2002, Proceedings
    Editors: Bob McKay, John Slaney
    Place of Publication: Berlin, Germany
    Publisher: Springer
    Pages: 143-154
    Number of pages: 12
    ISBN (Print): 3540001972
    Publication status: Published - 2002
    Event: Australasian Joint Conference on Artificial Intelligence 2002 - Canberra, Australia
    Duration: 2 Dec 2002 → 6 Dec 2002
    Conference number: 15th
    https://link.springer.com/book/10.1007/3-540-36187-1 (Proceedings)

    Publication series

    Name: Lecture Notes in Computer Science
    Publisher: Springer
    Volume: 2557
    ISSN (Print): 0302-9743

    Conference

    Conference: Australasian Joint Conference on Artificial Intelligence 2002
    Abbreviated title: AI 2002
    Country/Territory: Australia
    City: Canberra
    Period: 2/12/02 → 6/12/02