Spatial bias in vision-based voice activity detection

Kalin Stefanov, Mohammad Adiban, Giampiero Salvi

Research output: Chapter in Book/Report/Conference proceeding; Conference Paper; Research; peer-review

2 Citations (Scopus)


We develop and evaluate models for automatic vision-based voice activity detection (VAD) in multiparty human-human interactions that are aimed at complementing acoustic VAD methods. We provide evidence that this type of vision-based VAD model is susceptible to spatial bias in the dataset used for its development: the physical setting of the interaction, usually constant throughout data acquisition, determines the distribution of head poses of the participants. Our results show that when the head pose distributions differ significantly between the train and test sets, the performance of the vision-based VAD models drops significantly. This suggests that previously reported results on datasets with a fixed physical configuration may overestimate the generalization capabilities of this type of model. We also propose a number of possible remedies to the spatial bias, including data augmentation, input masking and dynamic features, and provide an in-depth analysis of the visual cues used by the developed vision-based VAD models.

Original language: English
Title of host publication: Proceedings of ICPR 2020, 25th International Conference on Pattern Recognition
Editors: Kim Boyer, Brian C. Lovell, Marcello Pelillo, Nicu Sebe, Rene Vidal, Jingyi Yu
Place of Publication: Piscataway NJ USA
Publisher: IEEE, Institute of Electrical and Electronics Engineers
Number of pages: 8
ISBN (Electronic): 9781728188089
ISBN (Print): 9781728188096
Publication status: Published - 2021
Externally published: Yes
Event: International Conference on Pattern Recognition 2020 - Virtual, Milano, Italy
Duration: 10 Jan 2021 to 15 Jan 2021
Conference number: 25th

Publication series

Name: Proceedings - International Conference on Pattern Recognition
Publisher: IEEE, Institute of Electrical and Electronics Engineers
ISSN (Print): 1051-4651


Conference: International Conference on Pattern Recognition 2020
Abbreviated title: ICPR 2020


  • Dataset bias
  • Neural networks
  • Spatial bias
  • Vision
  • Voice activity detection
