Joint learning of social groups, individuals action and sub-group activities in videos

Mahsa Ehsanpour, Alireza Abedin, Fatemeh Saleh, Javen Shi, Ian Reid, Hamid Rezatofighi

Research output: Chapter in Book/Report/Conference proceeding, Conference Paper, Research, peer-review

34 Citations (Scopus)


State-of-the-art solutions for human activity understanding from a video stream formulate the task as a spatio-temporal problem that requires joint localization of all individuals in the scene and classification of their actions or group activity over time. Who is interacting with whom is often not predicted; for example, not everyone in a queue is interacting with everyone else. There are scenarios where people are best split into sub-groups, which we call social groups, and each social group may be engaged in a different social activity. In this paper, we solve the problem of simultaneously grouping people by their social interactions and predicting both their individual actions and the social activity of each social group, which we call the social task. Our main contributions are: i) we propose an end-to-end trainable framework for the social task; ii) our proposed method also sets state-of-the-art results on two widely adopted benchmarks for the traditional group activity recognition task (which assumes the individuals in the scene form a single group and predicts a single group activity label for the scene); iii) we introduce new annotations on an existing group activity dataset, re-purposing it for the social task. The data and code for our method are publicly available.

Original language: English
Title of host publication: Computer Vision – ECCV 2020
Subtitle of host publication: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part IX
Editors: Andrea Vedaldi, Horst Bischof, Thomas Brox, Jan-Michael Frahm
Place of Publication: Cham, Switzerland
Number of pages: 19
ISBN (Electronic): 9783030585457
ISBN (Print): 9783030585440
Publication status: Published - 2020
Externally published: Yes
Event: European Conference on Computer Vision 2020 - Glasgow, United Kingdom
Duration: 23 Aug 2020 to 28 Aug 2020
Conference number: 16th

Publication series

Name: Lecture Notes in Computer Science
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349


Conference: European Conference on Computer Vision 2020
Abbreviated title: ECCV 2020
Country/Territory: United Kingdom

  • Collective behaviour recognition
  • Social grouping
  • Video understanding
