Group-level focus of visual attention for improved active speaker detection

Christopher Birmingham, Maja Mataric, Kalin Stefanov

Research output: Chapter in Book/Report/Conference proceeding › Conference Paper › Research › peer-review

Abstract

This work addresses the problem of active speaker detection in physically situated multiparty interactions. This challenge requires a robust solution that can perform effectively across a wide range of speakers and physical contexts. Current state-of-the-art active speaker detection approaches rely on machine learning methods that do not generalize well to new physical settings. We find that these methods do not transfer well even between similar datasets. We propose the use of group-level focus of visual attention in combination with a general audio-video synchronizer method for improved active speaker detection across speakers and physical contexts. Our dataset-independent experiments demonstrate that the proposed approach outperforms state-of-the-art methods trained specifically for the task of active speaker detection.

Original language: English
Title of host publication: ICMI '21 Companion: Companion Publication of the 2021 International Conference on Multimodal Interaction
Editors: Sharon Oviatt, Albert Ali Salah, Guoying Zhao
Place of Publication: New York NY USA
Publisher: Association for Computing Machinery (ACM)
Pages: 37-42
Number of pages: 6
ISBN (Electronic): 9781450384711
Publication status: Published - 2021
Event: International Conference on Multimodal Interfaces 2021 - Online, Montreal, Canada
Duration: 18 Oct 2021 - 22 Oct 2021
Conference number: 23rd
Proceedings: https://dl.acm.org/doi/proceedings/10.1145/3462244

Conference

Conference: International Conference on Multimodal Interfaces 2021
Abbreviated title: ICMI 2021
Country/Territory: Canada
City: Montreal
Period: 18/10/21 - 22/10/21

Keywords

  • active speaker detection
  • focus of visual attention
  • neural networks
