Audio-Visual Automatic Group Affect Analysis

Garima Sharma, Abhinav Dhall, Jianfei Cai

Research output: Contribution to journal › Article › Research › peer-review

7 Citations (Scopus)

Abstract

Affective computing has progressed considerably thanks to methods that can identify a person's posed and spontaneous perceived affect with high accuracy. This paper focuses on group-level affect analysis in videos and is one of the first multimodal studies of group-level affect. Video-based group-level affect analysis poses many challenges, as most prior work focuses either on a single person's affect recognition or on image-based group affect analysis. To address this, we first present an audio-visual perceived group affect dataset, 'Video-level Group AFfect (VGAF)'. VGAF is a large-scale dataset consisting of 4,183 group videos. The videos are collected from YouTube using search keywords varied across genders, group settings, group sizes, illumination conditions and poses. This variety supports the study of group affect perception in real-world environments. The data is manually annotated with three group affect classes: positive, neutral and negative. Further, a fusion-based audio-visual method is proposed to establish a benchmark performance on the proposed dataset. The experimental results show the effectiveness of facial, holistic and speech features for group-level affect analysis. The baseline code, dataset, and pre-trained models are available at [LINK].
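To make the fusion-based baseline concrete, the sketch below shows one common way to combine per-video visual and audio embeddings for three-class group affect prediction. This is a minimal PyTorch illustration, not the paper's actual architecture: the feature dimensions, the concatenation-based fusion and the classifier head are all assumptions.

```python
# Minimal late-fusion sketch (assumed design, not the paper's exact model).
import torch
import torch.nn as nn

class LateFusionClassifier(nn.Module):
    """Fuses per-video visual and audio embeddings and predicts one of
    three group affect classes: positive, neutral, negative."""
    def __init__(self, visual_dim=512, audio_dim=128, hidden_dim=256, num_classes=3):
        super().__init__()
        self.head = nn.Sequential(
            nn.Linear(visual_dim + audio_dim, hidden_dim),
            nn.ReLU(),
            nn.Dropout(0.5),
            nn.Linear(hidden_dim, num_classes),
        )

    def forward(self, visual_feat, audio_feat):
        # Concatenate the two modality embeddings, then classify.
        fused = torch.cat([visual_feat, audio_feat], dim=-1)
        return self.head(fused)

# Usage with random stand-in features for a batch of 8 videos.
model = LateFusionClassifier()
visual = torch.randn(8, 512)   # e.g. pooled facial/holistic features
audio = torch.randn(8, 128)    # e.g. pooled speech features
logits = model(visual, audio)  # shape: (8, 3)
```

Concatenation followed by a small classifier is a standard late-fusion baseline; the dimensions and dropout rate here are illustrative defaults only.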

Original language: English
Pages (from-to): 1056-1069
Number of pages: 14
Journal: IEEE Transactions on Affective Computing
Volume: 14
Issue number: 2
DOIs
Publication status: Published - Apr 2023

Keywords

  • affect recognition
  • affective computing
  • group-level affect recognition