Scalable nonparametric Bayesian multilevel clustering

Viet Huynh, Dinh Phung, Svetha Venkatesh, XuanLong Nguyen, Matt Hoffman, Hung Hai Bui

Research output: Chapter in Book/Report/Conference proceeding › Conference Paper › Research › peer-review

5 Citations (Scopus)


Multilevel clustering problems, in which content and contextual information are clustered jointly, are ubiquitous in modern datasets. Existing work on this problem is limited to small datasets because it relies on the Gibbs sampler. We address the problem of scaling up multilevel clustering in a Bayesian nonparametric setting, extending the MC2 model proposed in (Nguyen et al., 2014). We ground our approach in structured mean-field and stochastic variational inference (SVI) and develop a tree-structured SVI algorithm that exploits the interplay between content and context modeling. Our new algorithm avoids the need to repeatedly sweep through the whole corpus, as the Gibbs sampler must. More crucially, our method is immediately amenable to parallelization, facilitating a scalable distributed implementation on the Apache Spark platform. We conduct extensive experiments in a variety of domains, including text, images, and real-world user application activities. Direct comparison with the Gibbs sampler demonstrates that our method is an order of magnitude faster without loss of model quality. Our Spark-based implementation gains another order-of-magnitude speedup and can scale to large real-world datasets containing millions.
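The key property of SVI mentioned above is that each update of the global variational parameters uses only a random minibatch rather than a full pass over the corpus. The sketch below illustrates that generic SVI update on a deliberately tiny conjugate model (Beta-Bernoulli) with no local variables; it is not the paper's tree-structured algorithm or the MC2 model. The function name `svi_beta_bernoulli` and the parameters `tau` and `kappa` (a Robbins-Monro step-size schedule rho_t = (t + tau)^(-kappa)) are hypothetical choices made for illustration.

```python
import numpy as np


def svi_beta_bernoulli(x, a0=1.0, b0=1.0, batch_size=100, n_iters=2000,
                       tau=1.0, kappa=0.7, seed=0):
    """Generic SVI on a toy Beta-Bernoulli model (hypothetical illustration).

    The global variational factor q(theta) = Beta(alpha, beta) is refined
    from random minibatches, so each step touches only `batch_size` points.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    n = x.shape[0]
    alpha, beta = a0, b0  # start from the prior
    for t in range(n_iters):
        # Sample a minibatch (with replacement) instead of sweeping the corpus.
        batch = x[rng.integers(0, n, size=batch_size)]
        scale = n / batch_size  # rescale minibatch statistics to dataset size
        # Intermediate estimate: the optimal q if the whole dataset
        # looked like this minibatch.
        alpha_hat = a0 + scale * batch.sum()
        beta_hat = b0 + scale * (batch_size - batch.sum())
        # Robbins-Monro step size; kappa in (0.5, 1] ensures convergence.
        rho = (t + tau) ** (-kappa)
        # Convex combination of old and intermediate parameters.
        alpha = (1.0 - rho) * alpha + rho * alpha_hat
        beta = (1.0 - rho) * beta + rho * beta_hat
    return alpha, beta
```

For this toy model the exact posterior is Beta(a0 + sum(x), b0 + n - sum(x)), so the SVI estimate can be checked directly against it. Because each minibatch statistic is independent of the others, such statistics can in principle be computed on separate workers and summed, which is the kind of property that makes SVI-style updates amenable to distributed platforms.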

Original language: English
Title of host publication: 32nd Conference on Uncertainty in Artificial Intelligence 2016, UAI 2016
Subtitle of host publication: Jersey City, New Jersey, USA, 25-29 June 2016
Editors: Alexander Ihler, Dominik Janzing
Place of Publication: Red Hook, NY, USA
Publisher: Association For Uncertainty in Artificial Intelligence (AUAI)
Number of pages: 10
ISBN (Electronic): 9781510827806
Publication status: Published - 2016
Externally published: Yes
Event: Conference in Uncertainty in Artificial Intelligence 2016, Jersey City, United States of America
Duration: 25 Jun 2016 to 29 Jun 2016
Conference number: 32nd (Proceedings)


Conference: Conference in Uncertainty in Artificial Intelligence 2016
Abbreviated title: UAI 2016
Country/Territory: United States of America
City: Jersey City
