A random finite set model for data clustering

Dinh Phung, Ba-Ngu Vo

Research output: Chapter in Book/Report/Conference proceeding › Conference Paper › Research

4 Citations (Scopus)


The goal of data clustering is to partition data points into groups so as to optimize a given objective function. While most existing clustering algorithms treat each data point as a vector, in many applications each datum is not a vector but a point pattern, or set of points. Moreover, many existing clustering methods require the user to specify the number of clusters, which is often not known in advance. This paper proposes a new class of models for data clustering that handles set-valued data as well as an unknown number of clusters, using a Dirichlet Process mixture of Poisson random finite sets. We also develop an efficient Markov Chain Monte Carlo posterior inference technique that learns the number of clusters and the mixture parameters automatically from the data. Numerical studies are presented to demonstrate the salient features of this new model, in particular its capacity to discover extremely unbalanced clusters in data.
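The abstract does not give the likelihood or the sampler, so the following is only a minimal sketch under stated assumptions. A Poisson random finite set with rate λ and spatial density f is commonly written with set density π(X) = e^{-λ} ∏_{x∈X} λ f(x) (up to the reference-measure convention), so log π(X) = -λ + |X| log λ + Σ log f(x). The code assumes an isotropic Gaussian f, a hypothetical base measure `draw_base_params`, and uses Neal's (2000) Algorithm 8 as a swapped-in Dirichlet Process label update; it is not the authors' inference technique, and a full MCMC scheme would also resample each cluster's rate and mean between sweeps.

```python
import math
import random
from collections import Counter


def draw_base_params(rng, dim=2):
    """Hypothetical base measure G0 over (rate, mean, std); priors are illustrative assumptions."""
    rate = rng.uniform(1.0, 10.0)
    mean = tuple(rng.gauss(0.0, 5.0) for _ in range(dim))
    return (rate, mean, 1.0)


def poisson_rfs_loglik(points, params):
    """log pi(X) = -rate + |X| log(rate) + sum_x log N(x; mean, std^2 I) (assumed Gaussian f)."""
    rate, mean, std = params
    d = len(mean)
    total = -rate + len(points) * math.log(rate)
    for x in points:
        sq = sum((xi - mi) ** 2 for xi, mi in zip(x, mean))
        total += -0.5 * sq / std**2 - d * math.log(std) - 0.5 * d * math.log(2 * math.pi)
    return total


def gibbs_assignment_sweep(data, labels, params, alpha, rng, m=3, dim=2):
    """One sweep of DP-mixture label updates in the style of Neal's Algorithm 8.

    data: list of point sets; labels: cluster id per set; params: cluster id -> (rate, mean, std).
    Existing cluster parameters are held fixed; m auxiliary components represent new clusters.
    """
    labels = list(labels)
    params = dict(params)
    next_id = max(params) + 1 if params else 0
    for i, X in enumerate(data):
        old = labels[i]
        labels[i] = None
        counts = Counter(k for k in labels if k is not None)
        # If set i was a singleton, its cluster's parameters become one auxiliary component.
        aux = [params.pop(old)] if old not in counts else []
        aux += [draw_base_params(rng, dim) for _ in range(m - len(aux))]
        cands, logw = [], []
        for k, c in counts.items():
            cands.append(("old", k))
            logw.append(math.log(c) + poisson_rfs_loglik(X, params[k]))
        for p in aux:
            cands.append(("new", p))
            logw.append(math.log(alpha / m) + poisson_rfs_loglik(X, p))
        # Normalize in log space before exponentiating to avoid underflow.
        top = max(logw)
        weights = [math.exp(w - top) for w in logw]
        kind, val = rng.choices(cands, weights=weights)[0]
        if kind == "old":
            labels[i] = val
        else:
            params[next_id] = val
            labels[i] = next_id
            next_id += 1
    return labels, params
```

For example, starting with every set in one cluster, `gibbs_assignment_sweep(data, [0, 0, 0], {0: (2.0, (0.0, 0.0), 1.0)}, alpha=1.0, rng=random.Random(0))` returns new labels and a parameter dictionary containing exactly the clusters still in use. Scoring each whole set by its Poisson RFS likelihood, rather than each point separately, is what lets set-valued data be clustered directly.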
Original language: English
Title of host publication: FUSION 2014 - 17th International Conference on Information Fusion (FUSION)
Subtitle of host publication: Salamanca, 7-10th July 2014
Editors: Javier Bajo, Stefano Coraluppi
Place of Publication: Piscataway NJ USA
Publisher: IEEE, Institute of Electrical and Electronics Engineers
Number of pages: 8
ISBN (Electronic): 9788490123553
Publication status: Published - 2014
Externally published: Yes
Event: International Conference on Information Fusion 2014 - Salamanca, Spain
Duration: 7 Jul 2014 to 10 Jul 2014
Conference number: 17th
https://ieeexplore.ieee.org/xpl/conhome/6900113/proceeding (Proceedings)

Publication series

Name: FUSION 2014 - 17th International Conference on Information Fusion


Conference: International Conference on Information Fusion 2014
Abbreviated title: FUSION 2014