Improved Consensus Clustering via linear programming

Nicholas Downing, Peter J. Stuckey, Anthony Wirth

Research output: Chapter in Book/Report/Conference proceedingConference PaperResearchpeer-review

2 Citations (Scopus)

Abstract

We consider the problem of Consensus Clustering. Given a finite set of input clusterings over some data items, a consensus clustering is a partitioning of the items which matches as closely as possible the given input clusterings. The best exact approach to tackling this problem is by modelling it as a Boolean Integer Program (BIP). Unfortunately, the size of the BIP grows cubically in the number of data items, hence this method is applicable to only small sets of items. In this paper we show how to tackle the problem progressively, leading to much improved solution times and far less memory usage than previously. For the case where approximate clusterings are acceptable, we show a number of heuristic techniques for extracting good clusterings from the solutions of the linear relaxation of the BIP, and on several very large data sets we demonstrate much higher quality approximations than previously possible. When optimal solutions are desired, the problem is much harder, and we present some novel and existing techniques that can assist in finding candidate answers and proving the optimality thereof. For the first time we present optimal Consensus Clusterings for several complete, albeit small, data sets.

Original languageEnglish
Title of host publicationComputer Science 2010 - Proceedings of the 33rd Australasian Computer Science Conference, ACSC 2010
Pages61-69
Number of pages9
Publication statusPublished - 1 Dec 2010
Externally publishedYes
EventAustralasian Computer Science Conference 2010 - Brisbane, Australia
Duration: 18 Jan 201022 Jan 2010
Conference number: 33rd

Publication series

NameConferences in Research and Practice in Information Technology Series
Volume102
ISSN (Print)1445-1336

Conference

ConferenceAustralasian Computer Science Conference 2010
Abbreviated titleACSC 2010
CountryAustralia
CityBrisbane
Period18/01/1022/01/10

Keywords

  • Clustering
  • Combinatorial optimization
  • Integer programming
  • Linear programming

Cite this