Bi-level semantic representation analysis for multimedia event detection

Xiaojun Chang, Zhigang Ma, Yi Yang, Zhiqiang Zeng, Alexander G. Hauptmann

Research output: Contribution to journal › Article › Research › peer-review

212 Citations (Scopus)


Multimedia event detection has been one of the major endeavors in video event analysis, and a variety of approaches have recently been proposed to tackle it. Among them, semantic representation has been credited for its promising performance and its capacity for human-understandable reasoning. To generate a semantic representation, we usually draw on several external image/video archives and apply the concept detectors trained on them to the event videos. Because these archives differ intrinsically, the resulting representations presumably differ in their predictive capability for a given event. Nevertheless, little work has assessed the efficacy of semantic representations at the source level. In addition, some concepts are plausibly noisy for detecting a specific event. Motivated by these two shortcomings, we propose a bi-level semantic representation analysis method. At the source level, our method learns weights for the semantic representations obtained from different multimedia archives. At the concept level, it restrains the negative influence of noisy or irrelevant concepts. We also focus particularly on efficient multimedia event detection with few positive examples, which is highly valued in real-world scenarios. We perform extensive experiments on the challenging TRECVID MED 2013 and 2014 datasets, with encouraging results that validate the efficacy of the proposed approach.
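The two levels described in the abstract can be illustrated with a small sketch. This is not the formulation used in the paper, which the abstract does not specify; it substitutes a generic sparse-group-lasso-style penalty solved by proximal gradient descent, where each group of features stands for the concept scores from one archive. All function names, parameter values, and the synthetic data are assumptions made for illustration only.

```python
import numpy as np

def prox_sparse_group(w, groups, l1, l2):
    """Proximal step for an l1 + group-l2 penalty (a stand-in, not the paper's regularizer)."""
    # Concept level: soft-threshold each weight so weak concepts drop to zero.
    w = np.sign(w) * np.maximum(np.abs(w) - l1, 0.0)
    out = np.zeros_like(w)
    # Source level: shrink each archive's block of weights as a whole.
    for idx in groups:
        norm = np.linalg.norm(w[idx])
        if norm > l2:
            out[idx] = (1.0 - l2 / norm) * w[idx]
    return out

def fit_bilevel(X, y, groups, lam_concept=0.05, lam_source=0.05, iters=500):
    """Least-squares fit with the combined penalty, via proximal gradient descent."""
    n, d = X.shape
    # Step size 1/L, where L is the Lipschitz constant of the least-squares gradient.
    step = n / (np.linalg.norm(X, 2) ** 2)
    w = np.zeros(d)
    for _ in range(iters):
        grad = X.T @ (X @ w - y) / n
        w = prox_sparse_group(w - step * grad, groups,
                              step * lam_concept, step * lam_source)
    return w

def source_weights(w, groups):
    """Summarize each source by the norm of its concept weights."""
    return np.array([np.linalg.norm(w[idx]) for idx in groups])

# Hypothetical data: two archives with five concept detectors each.
# Only the first two concepts of the first archive carry signal.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + 0.01 * rng.normal(size=200)
groups = [np.arange(0, 5), np.arange(5, 10)]

w = fit_bilevel(X, y, groups)
sw = source_weights(w, groups)
print("concept weights:", np.round(w, 3))
print("source weights:", np.round(sw, 3))
```

In this toy setting the uninformative archive should receive a source weight near zero, while the irrelevant concepts inside the informative archive are suppressed individually, which mirrors the source-level and concept-level roles described above.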

Original language: English
Article number: 7442559
Pages (from-to): 1180-1197
Number of pages: 18
Journal: IEEE Transactions on Cybernetics
Issue number: 5
Publication status: Published - May 2017
Externally published: Yes


  • Bi-level
  • concept-level
  • multimedia event detection (MED)
  • semantic representation
  • source-level
