Abstract
Mixture modelling concerns the unsupervised discovery of clusters within data. Most current clustering algorithms assume that variables within classes are uncorrelated. We present a method for producing and evaluating models which account for inter-attribute correlation within classes using a single Gaussian linear factor. The method used is Minimum Message Length (MML), an invariant, information-theoretic Bayesian hypothesis evaluation criterion. Our work extends and unifies that of Wallace and Boulton (1968) and Wallace and Freeman (1992), concerned respectively with MML mixture modelling and MML single factor analysis. Results on simulated data are comparable to those of Wallace and Freeman (1992), outperforming Maximum Likelihood. We include an application of mixture modelling with single factors to spectral data from the Infrared Astronomical Satellite. Our model contains fewer unnecessary classes than the one produced by AutoClass (Goebel et al. 1989), because the factors model the correlation.
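As an illustration of the kind of model the abstract describes, the following is a minimal sketch that draws data from a mixture in which each class has its own mean, a single factor loading vector, and attribute-specific noise. It is not the authors' MML estimation procedure. The function names, parameter names, and example values are assumptions made for illustration only, and the generative form (mean plus factor score times loadings plus independent noise) is the standard single-factor model rather than anything quoted from the paper.

```python
import numpy as np


def single_factor_covariance(loadings, sigmas):
    """Within-class covariance implied by one factor: a a^T + diag(sigma^2)."""
    loadings = np.asarray(loadings, dtype=float)
    sigmas = np.asarray(sigmas, dtype=float)
    return np.outer(loadings, loadings) + np.diag(sigmas ** 2)


def sample_mixture(n, weights, means, loadings, sigmas, seed=0):
    """Draw n points from a mixture of single-factor Gaussian classes.

    All arguments are hypothetical illustration parameters:
    weights  -- class proportions, summing to 1
    means    -- one mean vector per class
    loadings -- one factor loading vector per class
    sigmas   -- one vector of per-attribute noise standard deviations per class
    """
    rng = np.random.default_rng(seed)
    means = np.asarray(means, dtype=float)
    loadings = np.asarray(loadings, dtype=float)
    sigmas = np.asarray(sigmas, dtype=float)

    # Pick a class for each data point according to the mixing proportions.
    labels = rng.choice(len(weights), size=n, p=weights)
    # One standard-normal factor score per data point.
    scores = rng.standard_normal(n)
    # Independent standard-normal noise per attribute.
    noise = rng.standard_normal((n, means.shape[1]))

    data = (means[labels]
            + scores[:, None] * loadings[labels]
            + noise * sigmas[labels])
    return data, labels
```

Within one class, the attributes are correlated only through the shared factor score, so a single class can capture an elongated cluster that an uncorrelated model might need several classes to cover.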
Original language | English |
---|---|
Title of host publication | Research and Development in Knowledge Discovery and Data Mining - 2nd Pacific-Asia Conference, PAKDD 1998, Proceedings |
Editors | Xindong Wu, Ramamohanarao Kotagiri, Kevin B. Korb |
Publisher | Springer |
Pages | 96-109 |
Number of pages | 14 |
ISBN (Print) | 3540643834, 9783540643838 |
Publication status | Published - 1998 |
Event | Pacific-Asia Conference on Knowledge Discovery and Data Mining 1998 - Melbourne, Australia; Duration: 15 Apr 1998 → 17 Apr 1998; Conference number: 2nd; https://link.springer.com/book/10.1007/3-540-64383-4 (Proceedings) |
Publication series
Name | Lecture Notes in Computer Science |
---|---|
Publisher | Springer |
Volume | 1394 |
ISSN (Print) | 0302-9743 |
ISSN (Electronic) | 1611-3349 |
Conference
Conference | Pacific-Asia Conference on Knowledge Discovery and Data Mining 1998 |
---|---|
Abbreviated title | PAKDD 1998 |
Country/Territory | Australia |
City | Melbourne |
Period | 15/04/98 → 17/04/98 |
Keywords
- Induction in KDD
- Minimum message length
- MML
- Noise handling
- Statistical and machine learning