JRDB-Act: a large-scale dataset for spatio-temporal action, social group and activity detection

Mahsa Ehsanpour, Fatemeh Saleh, Silvio Savarese, Ian Reid, Hamid Rezatofighi

Research output: Chapter in Book/Report/Conference proceeding, Conference Paper, Research, peer-review

30 Citations (Scopus)

Abstract

The availability of large-scale video action understanding datasets has facilitated advances in the interpretation of visual scenes containing people. However, learning to recognise human actions and their social interactions from a stream of sensory data captured by a mobile robot platform remains a significant challenge. Such unconstrained real-world environments contain numerous people, and their action labels can follow a highly unbalanced, long-tailed distribution; progress is further hampered by the lack of a large-scale dataset that reflects these conditions. In this paper, we introduce JRDB-Act, an extension of the existing JRDB dataset, which was captured by a social mobile manipulator and reflects the real distribution of human daily-life actions in a university campus environment. JRDB-Act is densely annotated with atomic actions and comprises over 2.8M action labels, making it a large-scale spatio-temporal action detection dataset. Each human bounding box is labeled with one pose-based action label and, optionally, multiple interaction-based action labels. Moreover, JRDB-Act provides social group annotations, which support the task of grouping individuals based on their interactions in the scene to infer their social activities (the common activities within each social group). Each annotated label in JRDB-Act is tagged with the annotators' confidence level, which supports the development of reliable evaluation strategies. To demonstrate how these annotations can be used effectively, we develop an end-to-end trainable pipeline that learns and infers these tasks, i.e. individual action detection and social group detection. The data and the evaluation code will be made publicly available at https://jrdb.erc.monash.edu/
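To make the annotation structure described in the abstract concrete, the following is a minimal sketch of how one frame's labels might be represented in memory: one pose-based action per person box, zero or more interaction-based actions, a social-group membership, and an annotator-confidence value on every label. All class names, field names, the integer confidence scale and the action names are hypothetical illustrations, not the actual JRDB-Act file format or toolkit API; consult the dataset's own documentation for the real schema.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ActionLabel:
    name: str        # illustrative action name, e.g. "walking"
    confidence: int  # hypothetical ordinal annotator-confidence level (higher = more certain)


@dataclass
class BoxAnnotation:
    person_id: int            # hypothetical per-person box identifier
    pose_action: ActionLabel  # exactly one pose-based action per box
    interaction_actions: List[ActionLabel] = field(default_factory=list)  # zero or more
    group_id: Optional[int] = None  # hypothetical social-group id; None if unassigned


def confident_labels(box: BoxAnnotation, min_conf: int) -> List[str]:
    """Return the action names on one box whose confidence is at least min_conf."""
    labels = [box.pose_action, *box.interaction_actions]
    return [lab.name for lab in labels if lab.confidence >= min_conf]


def social_groups(boxes: List[BoxAnnotation]) -> Dict[int, List[int]]:
    """Collect person ids per social group; boxes without a group id are skipped."""
    groups: Dict[int, List[int]] = defaultdict(list)
    for box in boxes:
        if box.group_id is not None:
            groups[box.group_id].append(box.person_id)
    return dict(groups)


# Usage on a made-up frame (values are illustrative, not real JRDB-Act data).
frame = [
    BoxAnnotation(1, ActionLabel("walking", 3), [ActionLabel("talking to someone", 2)], group_id=0),
    BoxAnnotation(2, ActionLabel("walking", 3), [ActionLabel("listening to someone", 1)], group_id=0),
    BoxAnnotation(3, ActionLabel("sitting", 2), [], group_id=1),
]
print(confident_labels(frame[0], min_conf=2))  # ['walking', 'talking to someone']
print(social_groups(frame))                    # {0: [1, 2], 1: [3]}
```

Filtering by a confidence threshold, as in `confident_labels`, is one simple way per-label confidence could feed into an evaluation protocol; the paper's actual evaluation strategy may weight or use confidences differently.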

Original language: English
Title of host publication: Proceedings - 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2022
Editors: Kristin Dana, Gang Hua, Stefan Roth, Dimitris Samaras, Richa Singh
Place of Publication: Piscataway NJ USA
Publisher: IEEE, Institute of Electrical and Electronics Engineers
Pages: 20951-20960
Number of pages: 10
ISBN (Electronic): 9781665469463
ISBN (Print): 9781665469470
Publication status: Published - 2022
Event: IEEE Conference on Computer Vision and Pattern Recognition 2022 - New Orleans, United States of America
Duration: 19 Jun 2022 to 24 Jun 2022
https://ieeexplore.ieee.org/xpl/conhome/9878378/proceeding (Proceedings)
https://cvpr2022.thecvf.com/ (Website)

Publication series

Name: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition
Publisher: IEEE, Institute of Electrical and Electronics Engineers
Volume: 2022-June
ISSN (Print): 1063-6919
ISSN (Electronic): 2575-7075

Conference

Conference: IEEE Conference on Computer Vision and Pattern Recognition 2022
Abbreviated title: CVPR 2022
Country/Territory: United States of America
City: New Orleans
Period: 19/06/22 to 24/06/22

Keywords

  • Action and event recognition
  • Datasets and evaluation
  • Video analysis and understanding
