On the efficiency of k-means clustering: Evaluation, optimization, and algorithm selection

Sheng Wang, Yuan Sun, Zhifeng Bao

Research output: Contribution to journalArticleResearchpeer-review

Abstract

This paper presents a thorough evaluation of the existing methods that accelerate Lloyd’s algorithm for fast k-means clustering. To do so, we analyze the pruning mechanisms of existing methods, and summarize their common pipeline into a unified evaluation framework UniK. UniK embraces a class of well-known methods and enables a ne-grained performance breakdown. Within UniK, we thoroughly evaluate the pros and cons of existing methods using multiple performance metrics on a number of datasets. Furthermore, we derive an optimized algorithm over UniK, which effectively hybridizes multiple existing methods for more aggressive pruning. To take this further, we investigate whether the most effcient method for a given clustering task can be automatically selected by machine learning, to bene t practitioners and researchers.

Original languageEnglish
Pages (from-to)163-175
Number of pages13
JournalProceedings of the VLDB Endowment
Volume14
Issue number2
DOIs
Publication statusPublished - 2020
Externally publishedYes

Cite this