TY - JOUR
T1 - Enhanced visual analysis for cluster tendency assessment and data partitioning
AU - Wang, Liang
AU - Geng, Xin
AU - Bezdek, James
AU - Leckie, Christopher
AU - Ramamohanarao, Kotagiri
PY - 2010
Y1 - 2010
N2 - Visual methods have been widely studied and used in data cluster analysis. Given a pairwise dissimilarity matrix \schmi D of a set of n objects, visual methods such as the VAT algorithm generally represent \schmi D as an n\times n image \rm I (\tilde \schmi D ) where the objects are reordered to reveal hidden cluster structure as dark blocks along the diagonal of the image. A major limitation of such methods is their inability to highlight cluster structure when \schmi D contains highly complex clusters. This paper addresses this limitation by proposing a Spectral VAT algorithm, where \schmi D is mapped to \schmi D ^ \prime in a graph embedding space and then reordered to \tilde \schmi D ^ \prime using the VAT algorithm. A strategy for automatic determination of the number of clusters in \rm I ( \tilde \schmi D ^ \prime ) is then proposed, as well as a visual method for cluster formation from \rm I ( \tilde \schmi D ^ \prime ) based on the difference between diagonal blocks and off-diagonal blocks. A sampling-based extended scheme is also proposed to enable visual cluster analysis for large data sets. Extensive experimental results on several synthetic and real-world data sets validate our algorithms.
AB - Visual methods have been widely studied and used in data cluster analysis. Given a pairwise dissimilarity matrix \schmi D of a set of n objects, visual methods such as the VAT algorithm generally represent \schmi D as an n\times n image \rm I (\tilde \schmi D ) where the objects are reordered to reveal hidden cluster structure as dark blocks along the diagonal of the image. A major limitation of such methods is their inability to highlight cluster structure when \schmi D contains highly complex clusters. This paper addresses this limitation by proposing a Spectral VAT algorithm, where \schmi D is mapped to \schmi D ^ \prime in a graph embedding space and then reordered to \tilde \schmi D ^ \prime using the VAT algorithm. A strategy for automatic determination of the number of clusters in \rm I ( \tilde \schmi D ^ \prime ) is then proposed, as well as a visual method for cluster formation from \rm I ( \tilde \schmi D ^ \prime ) based on the difference between diagonal blocks and off-diagonal blocks. A sampling-based extended scheme is also proposed to enable visual cluster analysis for large data sets. Extensive experimental results on several synthetic and real-world data sets validate our algorithms.
UR - http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=05288527
U2 - 10.1109/TKDE.2009.192
DO - 10.1109/TKDE.2009.192
M3 - Article
VL - 22
SP - 1401
EP - 1414
JO - IEEE Transactions on Knowledge and Data Engineering
JF - IEEE Transactions on Knowledge and Data Engineering
SN - 1041-4347
IS - 10
ER -