Single factor analysis in MML mixture modelling

Russell T. Edwards, David L. Dowe

    Research output: Chapter in Book/Report/Conference proceeding, Conference Paper, Research, peer-review

    23 Citations (Scopus)


    Mixture modelling concerns the unsupervised discovery of clusters within data. Most current clustering algorithms assume that variables within classes are uncorrelated. We present a method for producing and evaluating models that account for inter-attribute correlation within classes using a single Gaussian linear factor. The method is based on Minimum Message Length (MML), an invariant, information-theoretic Bayesian hypothesis evaluation criterion. Our work extends and unifies that of Wallace and Boulton (1968) and Wallace and Freeman (1992), concerned respectively with MML mixture modelling and MML single factor analysis. Results on simulated data are comparable to those of Wallace and Freeman (1992) and outperform Maximum Likelihood. We include an application of mixture modelling with single factors to spectral data from the Infrared Astronomical Satellite. Because factors model the within-class correlation, our model has fewer unnecessary classes than the model produced by AutoClass (Goebel et al. 1989).
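    To make "a single Gaussian linear factor" concrete, here is a minimal sketch, assuming the usual single-factor parameterisation: within a class, each K-dimensional observation is x = mu + v*a + sigma*eps, with a scalar factor score v ~ N(0, 1), a factor loading vector a, per-attribute noise scales sigma, and eps ~ N(0, I). The class covariance is then diag(sigma^2) + a a^T. The function names `single_factor_logpdf` and `mixture_loglik` are hypothetical, and the code only evaluates the model's log-likelihood; it does not compute the MML message length or estimate any parameters, so it is not the authors' method.

```python
import math
from typing import Sequence, Tuple

# Hypothetical helper: one class = (mu, a, sigma), each a length-K sequence.
ClassParams = Tuple[Sequence[float], Sequence[float], Sequence[float]]


def single_factor_logpdf(x: Sequence[float], mu: Sequence[float],
                         a: Sequence[float], sigma: Sequence[float]) -> float:
    """Log-density of x under N(mu, diag(sigma**2) + a a^T).

    Uses the Sherman-Morrison formula for the inverse and the matrix
    determinant lemma for the determinant, so no K x K matrix is formed.
    """
    if not (len(x) == len(mu) == len(a) == len(sigma)):
        raise ValueError("x, mu, a and sigma must have the same length")
    r = [xi - mi for xi, mi in zip(x, mu)]           # residual x - mu
    var = [s * s for s in sigma]                     # diagonal noise variances
    beta = sum(ak * ak / v for ak, v in zip(a, var))           # a^T D^-1 a
    proj = sum(rk * ak / v for rk, ak, v in zip(r, a, var))    # a^T D^-1 r
    # r^T (D + a a^T)^-1 r = r^T D^-1 r - (a^T D^-1 r)^2 / (1 + beta)
    quad = sum(rk * rk / v for rk, v in zip(r, var)) - proj * proj / (1.0 + beta)
    # log det(D + a a^T) = log det(D) + log(1 + beta)
    log_det = sum(math.log(v) for v in var) + math.log1p(beta)
    k = len(x)
    return -0.5 * (k * math.log(2.0 * math.pi) + log_det + quad)


def mixture_loglik(data: Sequence[Sequence[float]],
                   weights: Sequence[float],
                   classes: Sequence[ClassParams]) -> float:
    """Total log-likelihood of data under a mixture of single-factor classes.

    Weights must be positive and sum to one; log-sum-exp keeps it stable.
    """
    if len(weights) != len(classes):
        raise ValueError("need one weight per class")
    total = 0.0
    for x in data:
        terms = [math.log(w) + single_factor_logpdf(x, *c)
                 for w, c in zip(weights, classes)]
        m = max(terms)
        total += m + math.log(sum(math.exp(t - m) for t in terms))
    return total
```

Setting a to all zeros recovers the uncorrelated (diagonal) Gaussian class that the abstract attributes to most clustering algorithms. Each class needs 3K parameters instead of the K + K(K+1)/2 of a full covariance, and evaluating a point costs O(K).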

    Original language: English
    Title of host publication: Research and Development in Knowledge Discovery and Data Mining - 2nd Pacific-Asia Conference, PAKDD 1998, Proceedings
    Editors: Xindong Wu, Ramamohanarao Kotagiri, Kevin B. Korb
    Number of pages: 14
    ISBN (Print): 3540643834, 9783540643838
    Publication status: Published, 1998
    Event: Pacific-Asia Conference on Knowledge Discovery and Data Mining 1998, Melbourne, Australia
    Duration: 15 Apr 1998 to 17 Apr 1998
    Conference number: 2nd (Proceedings)

    Publication series

    Name: Lecture Notes in Computer Science
    ISSN (Print): 0302-9743
    ISSN (Electronic): 1611-3349


    Conference: Pacific-Asia Conference on Knowledge Discovery and Data Mining 1998
    Abbreviated title: PAKDD 1998


    • Induction in KDD
    • Minimum message length
    • MML
    • Noise handling
    • Statistical and machine learning
