Abstract
We consider the problem of Consensus Clustering. Given a finite set of input clusterings over some data items, a consensus clustering is a partitioning of the items which matches as closely as possible the given input clusterings. The best exact approach to tackling this problem is by modelling it as a Boolean Integer Program (BIP). Unfortunately, the size of the BIP grows cubically in the number of data items, hence this method is applicable to only small sets of items. In this paper we show how to tackle the problem progressively, leading to much improved solution times and far less memory usage than previously. For the case where approximate clusterings are acceptable, we show a number of heuristic techniques for extracting good clusterings from the solutions of the linear relaxation of the BIP, and on several very large data sets we demonstrate much higher quality approximations than previously possible. When optimal solutions are desired, the problem is much harder, and we present some novel and existing techniques that can assist in finding candidate answers and proving the optimality thereof. For the first time we present optimal Consensus Clusterings for several complete, albeit small, data sets.
Original language | English |
---|---|
Title of host publication | Computer Science 2010 - Proceedings of the 33rd Australasian Computer Science Conference, ACSC 2010 |
Pages | 61-69 |
Number of pages | 9 |
Publication status | Published - 1 Dec 2010 |
Externally published | Yes |
Event | Australasian Computer Science Conference 2010 - Brisbane, Australia Duration: 18 Jan 2010 → 22 Jan 2010 Conference number: 33rd |
Publication series
Name | Conferences in Research and Practice in Information Technology Series |
---|---|
Volume | 102 |
ISSN (Print) | 1445-1336 |
Conference
Conference | Australasian Computer Science Conference 2010 |
---|---|
Abbreviated title | ACSC 2010 |
Country/Territory | Australia |
City | Brisbane |
Period | 18/01/10 → 22/01/10 |
Keywords
- Clustering
- Combinatorial optimization
- Integer programming
- Linear programming