Part-aware Unified Representation of Language and Skeleton for zero-shot action recognition

Anqi Zhu, Qiuhong Ke, Mingming Gong, James Bailey

Research output: Chapter in Book/Report/Conference proceeding > Conference Paper > Research > peer-review

Abstract

While remarkable progress has been made on supervised skeleton-based action recognition, the challenge of zero-shot recognition remains relatively unexplored. In this paper, we argue that relying solely on aligning label-level semantics and global skeleton features is insufficient to effectively transfer locally consistent visual knowledge from seen to unseen classes. To address this limitation, we introduce Part-aware Unified Representation between Language and Skeleton (PURLS) to explore visual-semantic alignment at both local and global scales. PURLS introduces a new prompting module and a novel partitioning module to generate aligned textual and visual representations across different levels. The former leverages a pre-trained GPT-3 to infer refined descriptions of the global and local (body part-based and temporal-interval-based) movements from the original action labels. The latter employs an adaptive sampling strategy to group visual features from all body joint movements that are semantically relevant to a given description. Our approach is evaluated on various skeleton/language backbones and three large-scale datasets, i.e., NTU-RGB+D 60, NTU-RGB+D 120, and a newly curated dataset Kinetics-skeleton 200. The results showcase the universality and superior performance of PURLS, surpassing prior skeleton-based solutions and standard baselines from other domains. The source codes can be accessed at https://github.com/azzh1/PURLS.
Original language: English
Title of host publication: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition
Editors: Eric Mortensen
Place of Publication: Piscataway NJ USA
Publisher: IEEE, Institute of Electrical and Electronics Engineers
Pages: 18761-18770
Number of pages: 10
Publication status: Published - 2024
Event: IEEE Conference on Computer Vision and Pattern Recognition 2024 - Seattle, United States of America
Duration: 17 Jun 2024 to 21 Jun 2024
https://openaccess.thecvf.com/CVPR2024 (Proceedings)
https://cvpr.thecvf.com/Conferences/2024 (Website)
https://ieeexplore.ieee.org/xpl/conhome/10654794/proceeding (Proceedings)

Conference

Conference: IEEE Conference on Computer Vision and Pattern Recognition 2024
Abbreviated title: CVPR 2024
Country/Territory: United States of America
City: Seattle
Period: 17/06/24 to 21/06/24
