Joint attributes and event analysis for multimedia event detection

Zhigang Ma, Xiaojun Chang, Zhongwen Xu, Nicu Sebe, Alexander G. Hauptmann

Research output: Contribution to journal › Article › Research › peer-review

43 Citations (Scopus)


Semantic attributes have been increasingly used over the past few years for multimedia event detection (MED), with promising results. The motivation is that multimedia events generally consist of lower-level components such as objects, scenes, and actions. By characterizing multimedia event videos with semantic attributes, one can exploit more informative cues for improved detection results. Much existing work obtains semantic attributes from images, which may be suboptimal for video analysis because these image-inferred attributes do not carry the dynamic information that is essential for videos. To address this issue, we propose to learn semantic attributes from external videos using their semantic labels. We name them video attributes in this paper. In contrast with multimedia event videos, these external videos depict lower-level content such as objects, scenes, and actions. To harness video attributes, we propose an algorithm built on a correlation vector that correlates them to a target event. Consequently, we can incorporate video attributes latently, as extra information, into the event detector learnt from multimedia event videos in a joint framework. To validate our method, we perform experiments on the real-world large-scale TRECVID MED 2013 and 2014 data sets and compare our method with several state-of-the-art algorithms. The experiments show that our method is advantageous for MED.

Original language: English
Pages (from-to): 2921-2930
Number of pages: 10
Journal: IEEE Transactions on Neural Networks and Learning Systems
Issue number: 7
Publication status: Published - Jul 2017
Externally published: Yes


  • Correlation uncovering
  • multimedia event detection (MED)
  • video attributes
