In this paper, we focus on temporal activity detection, which aims to directly predict the temporal bounds of actions. Most existing temporal activity detection algorithms classify each action proposal separately and neglect vital semantic correlations between actions within one video. This deteriorates classification performance in long-tail scenarios, where only a handful of examples are available for uncommon actions. To address this problem, we propose to incorporate knowledge to reason over large-scale action classes and to maintain semantic coherency within one video. Specifically, we employ an implicit knowledge reasoning module and an explicit knowledge reasoning module that impose knowledge constraints to facilitate temporal activity localization. We evaluate the proposed method on two large-scale action detection datasets, ActivityNet and THUMOS'14, and the experimental results demonstrate its superiority over existing approaches. Code and models will be released once this paper is accepted.
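The idea of explicit knowledge reasoning over correlated action classes can be sketched as follows. This is a minimal illustrative sketch, not the paper's actual method: the function name `refine_scores`, the class-correlation matrix, and the simple blending rule with weight `alpha` are all assumptions introduced here to show how scores of rare classes could borrow evidence from semantically related common classes.

```python
def refine_scores(scores, correlation, alpha=0.5):
    """Blend a proposal's per-class scores with scores propagated
    through a class-correlation (knowledge) matrix.

    scores       -- list of per-class scores for one action proposal
    correlation  -- square matrix; correlation[i][j] measures how
                    semantically related classes i and j are
    alpha        -- weight on the original scores (illustrative)
    """
    n = len(scores)
    propagated = []
    for i in range(n):
        row = correlation[i]
        total = sum(row)
        # weighted average of the scores of classes related to class i
        prop = sum(row[j] * scores[j] for j in range(n)) / total if total else 0.0
        propagated.append(prop)
    # mix the original and the knowledge-propagated scores
    return [alpha * s + (1 - alpha) * p for s, p in zip(scores, propagated)]


# Toy example with three hypothetical classes: "dribbling", "dunking",
# "cooking". The first two are related, so a weak "dunking" score is
# boosted by the strong "dribbling" score, while "cooking" is unaffected.
correlation = [
    [1.0, 0.8, 0.0],
    [0.8, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
refined = refine_scores([0.9, 0.1, 0.0], correlation)
```

Under this toy correlation matrix, the score of the rare but related class rises while the unrelated class stays at zero, which is the kind of semantic coherency the abstract describes.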
Number of pages: 7
Journal: Journal of Visual Communication and Image Representation
Publication status: Published - Oct 2019
- Knowledge constraints
- Reasoning module
- Temporal activity detection