TY - JOUR
T1 - A unified framework with a benchmark dataset for surveillance event detection
AU - Zhao, Zhicheng
AU - Li, Xuanchong
AU - Du, Xingzhong
AU - Chen, Qi
AU - Zhao, Yanyun
AU - Su, Fei
AU - Chang, Xiaojun
AU - Hauptmann, Alexander G.
PY - 2018/2
Y1 - 2018/2
N2 - As an important branch of multimedia content analysis, Surveillance Event Detection (SED) remains a challenging task owing to its high abstraction and to complexities such as occlusions, cluttered backgrounds, and viewpoint changes. To address the problem, we propose a unified SED framework that divides events into two categories: short-term events and long-duration events. The former can be represented as snapshots of static key poses and embody inner dependencies, while the latter contain complex interactions between pedestrians and show obvious inter-dependencies and temporal context. For short-term events, a novel cascade Convolutional Neural Network (CNN), HsNet, is first constructed to detect pedestrians, and the corresponding events are then classified. For long-duration events, Dense Trajectory (DT) and Improved Dense Trajectory (IDT) are first applied to capture the temporal features of the events, Fisher Vector (FV) coding is then adopted to encode the raw features, and linear SVM classifiers are learned for prediction. Finally, a heuristic fusion scheme is used to obtain the results. In addition, a new large-scale pedestrian dataset, named SED-PD, is built for evaluation. Comprehensive experiments on TRECVID SEDtest datasets demonstrate the effectiveness of the proposed framework.
AB - As an important branch of multimedia content analysis, Surveillance Event Detection (SED) remains a challenging task owing to its high abstraction and to complexities such as occlusions, cluttered backgrounds, and viewpoint changes. To address the problem, we propose a unified SED framework that divides events into two categories: short-term events and long-duration events. The former can be represented as snapshots of static key poses and embody inner dependencies, while the latter contain complex interactions between pedestrians and show obvious inter-dependencies and temporal context. For short-term events, a novel cascade Convolutional Neural Network (CNN), HsNet, is first constructed to detect pedestrians, and the corresponding events are then classified. For long-duration events, Dense Trajectory (DT) and Improved Dense Trajectory (IDT) are first applied to capture the temporal features of the events, Fisher Vector (FV) coding is then adopted to encode the raw features, and linear SVM classifiers are learned for prediction. Finally, a heuristic fusion scheme is used to obtain the results. In addition, a new large-scale pedestrian dataset, named SED-PD, is built for evaluation. Comprehensive experiments on TRECVID SEDtest datasets demonstrate the effectiveness of the proposed framework.
KW - Cascade CNN
KW - Pedestrian dataset
KW - Pedestrian detection
KW - Surveillance event detection
UR - http://www.scopus.com/inward/record.url?scp=85029675837&partnerID=8YFLogxK
U2 - 10.1016/j.neucom.2017.04.079
DO - 10.1016/j.neucom.2017.04.079
M3 - Article
AN - SCOPUS:85029675837
VL - 278
SP - 62
EP - 74
JO - Neurocomputing
JF - Neurocomputing
SN - 0925-2312
ER -