Complex event detection has been progressively researched in recent years for the broad interest of video indexing and retrieval. To fulfill the purpose of event detection, one needs to train a classifier using both positive and negative examples. Current classifier training treats the negative videos as equally negative. However, we notice that many negative videos resemble the positive videos in different degrees. Intuitively, we may capture more informative cues from the negative videos if we assign them fine-grained labels, thus benefiting the classifier learning. Aiming for this, we use a statistical method on both the positive and negative examples to get the decisive attributes of a specific event. Based on these decisive attributes, we assign the fine-grained labels to negative examples to treat them differently for more effective exploitation. The resulting fine-grained labels may be not optimal to capture the discriminative cues from the negative videos. Hence, we propose to jointly optimize the fine-grained labels with the classifier learning, which brings mutual reciprocality. Meanwhile, the labels of positive examples are supposed to remain unchanged. We thus additionally introduce a constraint for this purpose. On the other hand, the state-of-the-art deep convolutional neural network features are leveraged in our approach for event detection to further boost the performance. Extensive experiments on the challenging TRECVID MED 2014 dataset have validated the efficacy of our proposed approach.
- Attribute representation
- attribute selection
- complex event detection
- selective fine-grained labeling