Abstract
Rising urban crime rates worldwide underscore the need for advanced video surveillance systems capable of autonomously detecting violent actions. Current deep learning models struggle with subtle motions and lack real-time capabilities. In response, we advocate for a paradigm shift in surveillance-oriented violent action detection, emphasizing the pivotal role of human-object interaction (HOI) detection as opposed to conventional action recognition methodologies. Our contributions include Violence-HOI (V-HOI), a dataset capturing human-object interactions in static surveillance images. Additionally, we introduce Violence-Net (V-Net), a novel convolutional-transformer network architecture, which outperforms existing HOI approaches by 5.25 percentage points in mean average precision. Moreover, when trained on V-HOI, V-Net achieves near real-time processing at 10.43 frames per second, demonstrating its practicality in dynamic surveillance scenarios. The code and dataset are available at https://github.com/MarcusLimJunYi/vhoi.
Original language | English |
---|---|
Title of host publication | 2024 IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS) |
Editors | Shan Jia |
Place of Publication | Piscataway NJ USA |
Publisher | IEEE, Institute of Electrical and Electronics Engineers |
Number of pages | 7 |
Edition | 2024 |
ISBN (Electronic) | 9798350374285 |
Publication status | Published - 2024 |
Event | IEEE International Conference on Video and Signal Based Surveillance (AVSS) 2024 - Niagara Falls, Canada; Duration: 15 Jul 2024 → 16 Jul 2024; Conference number: 20th; https://ieeexplore.ieee.org/xpl/conhome/10672516/proceeding (Proceedings) |
Conference
Conference | IEEE International Conference on Video and Signal Based Surveillance (AVSS) 2024 |
---|---|
Abbreviated title | AVSS 2024 |
Country/Territory | Canada |
City | Niagara Falls |
Period | 15/07/24 → 16/07/24 |