Modelling multilevel data in multimedia: a hierarchical factor analysis approach

Sunil Gupta, Dinh Phung, Svetha Venkatesh

Research output: Contribution to journal › Article › Research › peer-review


Multimedia content understanding research requires a rigorous approach to deal with the complexity of the data. At the crux of this problem is how to handle multilevel data whose structure exists at multiple scales and across data sources. A common example is modeling tags jointly with images to improve retrieval, classification and tag recommendation. Associated contextual observations, such as metadata, are rich and can be exploited for content analysis. A major challenge is the need for a principled approach to systematically incorporate associated media alongside the primary data source of interest. Taking a factor modeling approach, we propose a framework that can discover low-dimensional structures for a primary data source together with other associated information. We cast this task as a subspace learning problem under the framework of Bayesian nonparametrics; thus the subspace dimensionality and the number of clusters are learnt automatically from data instead of being set a priori. Using Beta processes as the building block, we construct random measures in a hierarchical structure to generate multiple data sources and, at the same time, capture their shared statistical structure. The model parameters are inferred efficiently using a novel combination of Gibbs and slice sampling. We demonstrate the applicability of the proposed model in three applications: image retrieval, automatic tag recommendation and image classification. Experiments using two real-world datasets show that our approach outperforms various related state-of-the-art methods.
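To make the hierarchical Beta-process idea more concrete, here is a minimal generative sketch, assuming a truncated (finite-K) approximation of the Beta process and the standard hierarchical construction in which each data source's factor weights are drawn around a shared set of base weights. It is not the authors' model: the function names, the parameters `a`, `b`, `c`, `truncation` and `noise_std`, and the Gaussian loadings and scores are all hypothetical choices for illustration, and neither the Dirichlet-process clustering nor the Gibbs/slice sampler used for inference is reproduced. What it shows is the key property described above: a source uses only a data-dependent subset of the K candidate factors, and the sources share statistical structure through the common base weights.

```python
import random

EPS = 1e-12  # keeps Beta parameters strictly positive if a draw underflows to 0 or 1


def clamp(p):
    """Keep a probability inside (0, 1) so it can parameterise another Beta draw."""
    return min(max(p, EPS), 1.0 - EPS)


def sample_hbp_factor_model(items_per_source, dims, a=2.0, b=1.0, c=10.0,
                            truncation=50, noise_std=0.1, seed=0):
    """Draw synthetic data from a truncated hierarchical beta-Bernoulli factor model.

    Hypothetical illustration only. items_per_source[j] and dims[j] give the
    number of observations and the feature dimension of data source j
    (e.g. image features and tag vectors).
    Returns (base_weights, sources); each source is a dict with the binary
    factor-usage matrix Z, factor scores W, dictionary Phi and observations X.
    """
    rng = random.Random(seed)
    K = truncation
    # Shared top-level weights: finite approximation of a beta process,
    # pi_k ~ Beta(a/K, b(K-1)/K); most weights are tiny, a few are large.
    base = [clamp(rng.betavariate(a / K, b * (K - 1) / K)) for _ in range(K)]

    sources = []
    for n_items, dim in zip(items_per_source, dims):
        # Source-level weights centred on the shared ones: E[pi_jk] = base[k].
        pi = [clamp(rng.betavariate(c * p, c * (1.0 - p))) for p in base]
        # Source-specific dictionary: one Gaussian loading vector per factor.
        phi = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(K)]
        Z, W, X = [], [], []
        for _ in range(n_items):
            z = [1 if rng.random() < p else 0 for p in pi]  # Bernoulli factor usage
            w = [rng.gauss(0.0, 1.0) * zk for zk in z]      # score is zero if factor unused
            # Observation = sum of the active factors' loadings, plus Gaussian noise.
            x = [sum(w[k] * phi[k][d] for k in range(K)) + rng.gauss(0.0, noise_std)
                 for d in range(dim)]
            Z.append(z)
            W.append(w)
            X.append(x)
        sources.append({"Z": Z, "W": W, "Phi": phi, "X": X})
    return base, sources


def active_factors(Z):
    """Indices of factors used by at least one item of a source."""
    return {k for row in Z for k, zk in enumerate(row) if zk}


if __name__ == "__main__":
    base, (images, tags) = sample_hbp_factor_model([200, 200], [16, 8])
    img_k, tag_k = active_factors(images["Z"]), active_factors(tags["Z"])
    print("active factors (images, tags, shared):",
          len(img_k), len(tag_k), len(img_k & tag_k))
```

Although `truncation` caps the number of candidate factors, the number actually used is governed by the random weights rather than fixed in advance, which is the finite-K analogue of learning the subspace dimensionality from data. Because both sources draw their weights around the same `base`, factors that are popular in one source tend to be popular in the other.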

Original language: English
Pages (from-to): 4933-4955
Number of pages: 23
Journal: Multimedia Tools and Applications
Issue number: 9
Publication status: Published - May 2016
Externally published: Yes


  • Bayesian nonparametrics
  • Beta process
  • Dirichlet process
  • Multilevel data
  • Multimedia
  • Semantic gap
