Variational autoencoders for sparse and overdispersed discrete data

He Zhao, Piyush Rai, Lan Du, Wray Buntine, Dinh Phung, Mingyuan Zhou

Research output: Conference Paper in Book/Report/Conference proceeding (peer-reviewed)


Many applications, such as text modelling, high-throughput sequencing, and recommender systems, require analysing sparse, high-dimensional, and overdispersed discrete (count or binary) data. Recent deep probabilistic models based on variational autoencoders (VAEs) have shown promising results on discrete data, but their modelling performance can suffer from an insufficient capability to capture overdispersion and from model misspecification. To address these issues, we develop a VAE-based framework that uses the negative binomial distribution as the data distribution, and we analyse its properties vis-à-vis other models. We conduct extensive experiments on three problems from discrete data analysis: text analysis/topic modelling, collaborative filtering, and multi-label learning. Our models outperform state-of-the-art approaches on these problems while also capturing the phenomenon of overdispersion more effectively.
Original language: English
Title of host publication: Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics
Editors: Silvia Chiappa, Roberto Calandra
Publisher: Proceedings of Machine Learning Research (PMLR)
Number of pages: 10
Publication status: Published - 2020
Event: International Conference on Artificial Intelligence and Statistics 2020 - Virtual, Italy
Duration: 3 Jun 2020 - 5 Jun 2020
Conference number: 23rd


Conference: International Conference on Artificial Intelligence and Statistics 2020
Abbreviated title: AISTATS 2020
