Dynamic Spatio-Temporal Specialization learning for fine-grained action recognition

Tianjiao Li, Lin Geng Foo, Qiuhong Ke, Hossein Rahmani, Anran Wang, Jinghua Wang, Jun Liu

Research output: Chapter in Book/Report/Conference proceeding › Conference Paper › Research › peer-review


The goal of fine-grained action recognition is to successfully discriminate between action categories with subtle differences. To tackle this, we derive inspiration from the human visual system, which contains specialized regions in the brain that are dedicated to handling specific tasks. We design a novel Dynamic Spatio-Temporal Specialization (DSTS) module, which consists of specialized neurons that are only activated for a subset of samples that are highly similar. During training, the loss forces the specialized neurons to learn discriminative fine-grained differences to distinguish between these similar samples, improving fine-grained recognition. Moreover, a spatio-temporal specialization method further optimizes the architectures of the specialized neurons to capture either more spatial or more temporal fine-grained information, to better tackle the large range of spatio-temporal variations in the videos. Lastly, we design an Upstream-Downstream Learning algorithm to optimize our model’s dynamic decisions, allowing our DSTS module to generalize better. We obtain state-of-the-art performance on two widely-used fine-grained action recognition datasets. We will release our code.
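The idea of neurons that are activated only for a subset of highly similar samples can be illustrated with a small sketch. This is not the DSTS module from the paper: the cosine-similarity gate, the per-neuron `queries`, the `threshold` value, and the function names are all hypothetical choices made for illustration only.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def active_neurons(feature, queries, threshold=0.8):
    """Return indices of the (hypothetical) specialized neurons that fire.

    Each neuron owns an assumed query vector; it is activated only when the
    sample's feature is close enough to that query, so similar samples are
    routed to the same neuron and dissimilar ones are not.
    """
    return [i for i, q in enumerate(queries) if cosine(feature, q) >= threshold]

# Two hypothetical specialized neurons with orthogonal queries.
queries = [[1.0, 0.0], [0.0, 1.0]]

sample_a = [0.9, 0.1]    # similar to sample_b
sample_b = [0.95, 0.05]
sample_c = [0.1, 0.9]    # dissimilar to both

routes = {name: active_neurons(feat, queries)
          for name, feat in [("a", sample_a), ("b", sample_b), ("c", sample_c)]}
```

Under these assumptions, `sample_a` and `sample_b` activate the same neuron while `sample_c` activates a different one, so each neuron only ever sees a group of mutually similar samples.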
Original language: English
Title of host publication: ECCV 2022
Editors: Shai Avidan, Gabriel Brostow, Giovanni Maria Farinella, Tal Hassner
Place of Publication: Zurich
Publisher: European Computer Vision Association
Number of pages: 17
Publication status: Published - 2022
Event: European Conference on Computer Vision 2022 - Tel Aviv, Israel
Duration: 23 Oct 2022 → 27 Oct 2022
Conference number: 17th
https://link.springer.com/book/10.1007/978-3-031-19830-4 (Proceedings)
https://eccv2022.ecva.net (Website)


Conference: European Conference on Computer Vision 2022
Abbreviated title: ECCV 2022
City: Tel Aviv


  • Action recognition
  • fine-grained
  • dynamic neural networks
