Abstract
The goal of fine-grained action recognition is to discriminate between action categories that differ only subtly. To tackle this, we draw inspiration from the human visual system, which contains specialized regions in the brain dedicated to handling specific tasks. We design a novel Dynamic Spatio-Temporal Specialization (DSTS) module, which consists of specialized neurons that are activated only for a subset of highly similar samples. During training, the loss forces the specialized neurons to learn discriminative fine-grained differences that distinguish between these similar samples, improving fine-grained recognition. Moreover, a spatio-temporal specialization method further optimizes the architectures of the specialized neurons to capture either more spatial or more temporal fine-grained information, to better handle the large range of spatio-temporal variations in the videos. Lastly, we design an Upstream-Downstream Learning algorithm to optimize our model’s dynamic decisions, allowing our DSTS module to generalize better. We obtain state-of-the-art performance on two widely-used fine-grained action recognition datasets. We will release our code.
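The idea that specialized neurons fire only for a subset of similar samples can be illustrated with a small routing sketch. This is not the paper's implementation: the cosine-similarity routing, the per-neuron key vectors, the `top_k` parameter, and all function names are assumptions made for illustration only.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def select_specialized(feature, neuron_keys, top_k=1):
    """Return the indices of the top_k neurons whose (hypothetical) key
    vectors are most similar to the sample feature."""
    scores = [cosine(feature, key) for key in neuron_keys]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:top_k])


def specialized_layer(feature, neuron_keys, neuron_weights, top_k=1):
    """Evaluate only the selected neurons; inactive neurons output 0.0."""
    active = set(select_specialized(feature, neuron_keys, top_k))
    return [
        sum(w * x for w, x in zip(weights, feature)) if i in active else 0.0
        for i, weights in enumerate(neuron_weights)
    ]


# Hypothetical keys and weights for three specialized neurons.
KEYS = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
WEIGHTS = [[1.0, -1.0], [0.5, 0.5], [2.0, 0.0]]
```

In this toy setup, two similar samples such as `[0.9, 0.1]` and `[0.8, 0.2]` are routed to the same neuron, so only that neuron's weights would receive gradient from them during training and would have to learn what separates the two.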
Original language | English |
---|---|
Title of host publication | ECCV 2022 |
Editors | Shai Avidan, Gabriel Brostow, Giovanni Maria Farinella, Tal Hassner |
Place of Publication | Zurich |
Publisher | European Computer Vision Association |
Number of pages | 17 |
Publication status | Published - 2022 |
Event | European Conference on Computer Vision 2022, Tel Aviv, Israel; Duration: 23 Oct 2022 → 27 Oct 2022; Conference number: 17th; Proceedings: https://link.springer.com/book/10.1007/978-3-031-19830-4; Website: https://eccv2022.ecva.net |
Conference
Conference | European Conference on Computer Vision 2022 |
---|---|
Abbreviated title | ECCV 2022 |
Country/Territory | Israel |
City | Tel Aviv |
Period | 23/10/22 → 27/10/22 |
Keywords
- Action recognition
- Fine-grained
- Dynamic neural networks