A unified framework with a benchmark dataset for surveillance event detection

Zhicheng Zhao, Xuanchong Li, Xingzhong Du, Qi Chen, Yanyun Zhao, Fei Su, Xiaojun Chang, Alexander G. Hauptmann

Research output: Contribution to journal › Article › Research › peer-review

4 Citations (Scopus)


As an important branch of multimedia content analysis, Surveillance Event Detection (SED) remains a challenging task due to its high abstraction and complexity, including occlusions, cluttered backgrounds and viewpoint changes. To address the problem, we propose a unified SED framework which divides events into two categories: short-term events and long-duration events. The former can be represented as snapshots of static key-poses and embody inner-dependencies, while the latter contain complex interactions between pedestrians and show obvious inter-dependencies and temporal context. For short-term events, a novel cascade Convolutional Neural Network (CNN), HsNet, is first constructed to detect pedestrians, and the corresponding events are then classified. For long-duration events, Dense Trajectory (DT) and Improved Dense Trajectory (IDT) are first applied separately to explore the temporal features of the events; subsequently, Fisher Vector (FV) coding is adopted to encode the raw features, and linear SVM classifiers are learned for prediction. Finally, a heuristic fusion scheme is used to obtain the results. In addition, a new large-scale pedestrian dataset, named SED-PD, is built for evaluation. Comprehensive experiments on TRECVID SEDtest datasets demonstrate the effectiveness of the proposed framework.
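The Fisher Vector coding step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a diagonal-covariance Gaussian mixture model whose parameters are already known, uses the commonly described power and L2 normalisation, and the function name `fisher_vector` and all toy values are hypothetical. In a pipeline like the one described, the resulting vector would be passed to a linear SVM.

```python
import math

def fisher_vector(descriptors, weights, means, sigmas):
    """Encode local descriptors as a Fisher Vector (illustrative sketch).

    Assumes a diagonal-covariance GMM given by `weights` (K), `means` (K x D)
    and `sigmas` (K x D standard deviations). Returns a list of length 2*K*D.
    """
    T, K, D = len(descriptors), len(weights), len(means[0])
    g_mu = [[0.0] * D for _ in range(K)]     # gradient w.r.t. means
    g_sigma = [[0.0] * D for _ in range(K)]  # gradient w.r.t. std deviations

    for x in descriptors:
        # Standardised residuals and log of w_k * N(x | mu_k, sigma_k^2).
        z = [[(x[d] - means[k][d]) / sigmas[k][d] for d in range(D)]
             for k in range(K)]
        log_p = []
        for k in range(K):
            lp = math.log(weights[k])
            for d in range(D):
                lp -= 0.5 * z[k][d] ** 2 + math.log(sigmas[k][d]) \
                      + 0.5 * math.log(2.0 * math.pi)
            log_p.append(lp)

        # Soft assignments (posteriors), computed stably via max subtraction.
        m = max(log_p)
        unnorm = [math.exp(lp - m) for lp in log_p]
        total = sum(unnorm)

        for k in range(K):
            gamma = unnorm[k] / total
            for d in range(D):
                g_mu[k][d] += gamma * z[k][d]
                g_sigma[k][d] += gamma * (z[k][d] ** 2 - 1.0)

    fv = []
    for k in range(K):
        fv.extend(v / (T * math.sqrt(weights[k])) for v in g_mu[k])
    for k in range(K):
        fv.extend(v / (T * math.sqrt(2.0 * weights[k])) for v in g_sigma[k])

    # Power normalisation followed by L2 normalisation.
    fv = [math.copysign(math.sqrt(abs(v)), v) for v in fv]
    norm = math.sqrt(sum(v * v for v in fv))
    return [v / norm for v in fv] if norm > 0 else fv

# Toy example: three 2-D descriptors and a 2-component GMM (made-up values).
toy_descriptors = [[0.1, 0.2], [1.9, 2.1], [0.0, -0.3]]
toy_weights = [0.5, 0.5]
toy_means = [[0.0, 0.0], [2.0, 2.0]]
toy_sigmas = [[1.0, 1.0], [1.0, 1.0]]
fv = fisher_vector(toy_descriptors, toy_weights, toy_means, toy_sigmas)
```

In practice the descriptors would be DT or IDT features and the mixture model would be fitted on training data; the toy inputs here only show the shape of the encoding, which has two gradient blocks of size K x D each.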

Original language: English
Pages (from-to): 62-74
Number of pages: 13
Publication status: Published - Feb 2018
Externally published: Yes


  • Cascade CNN
  • Pedestrian dataset
  • Pedestrian detection
  • Surveillance event detection
