Abstract
This work addresses the problem of active speaker detection in physically situated multiparty interactions, a challenge that requires robust performance across a wide range of speakers and physical contexts. Current state-of-the-art active speaker detection approaches rely on machine learning methods that do not generalize well to new physical settings; we find that these methods do not transfer well even between similar datasets. We propose combining group-level focus of visual attention with a general audio-video synchronization method for improved active speaker detection across speakers and physical contexts. Our dataset-independent experiments demonstrate that the proposed approach outperforms state-of-the-art methods trained specifically for the task of active speaker detection.
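As a rough illustration of the fusion idea described in the abstract, the sketch below scores each candidate by combining an audio-video synchronization confidence (e.g., from a SyncNet-style model, stubbed out here as precomputed values) with a group-level attention score, taken as the fraction of other participants whose estimated gaze lands on the candidate. The function names, the linear fusion rule, and the weight `alpha` are illustrative assumptions, not the paper's actual formulation.

```python
def attention_score(gaze_targets, candidate_id):
    """Fraction of other participants whose estimated gaze target is the candidate.

    gaze_targets: dict mapping each observer's id to the id of the person
    they appear to be looking at (from any gaze / head-pose estimator).
    """
    observers = [o for o in gaze_targets if o != candidate_id]
    if not observers:
        return 0.0
    hits = sum(1 for o in observers if gaze_targets[o] == candidate_id)
    return hits / len(observers)


def active_speaker(candidates, gaze_targets, sync_scores, alpha=0.5):
    """Pick the candidate with the highest fused score.

    sync_scores: per-candidate audio-video synchronization confidence in
    [0, 1], e.g. from a SyncNet-style model (placeholder values here).
    alpha: fusion weight between the two cues (illustrative choice).
    """
    fused = {
        c: alpha * sync_scores[c] + (1 - alpha) * attention_score(gaze_targets, c)
        for c in candidates
    }
    return max(fused, key=fused.get)


# Toy example: three participants; B is looked at by both A and C and has
# the strongest lip-audio synchronization, so B is selected.
gaze = {"A": "B", "B": "C", "C": "B"}
sync = {"A": 0.2, "B": 0.9, "C": 0.4}
print(active_speaker(["A", "B", "C"], gaze, sync))  # -> "B"
```

The intent of fusing the two cues is that the attention score is speaker- and scene-independent, so it can compensate when a synchronizer trained in one physical setting is applied in another.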
Original language | English |
---|---|
Title of host publication | ICMI '21 Companion: Companion Publication of the 2021 International Conference on Multimodal Interaction |
Editors | Sharon Oviatt, Albert Ali Salah, Guoying Zhao |
Place of Publication | New York, NY, USA |
Publisher | Association for Computing Machinery (ACM) |
Pages | 37-42 |
Number of pages | 6 |
ISBN (Electronic) | 9781450384711 |
DOIs | |
Publication status | Published - 2021 |
Event | International Conference on Multimodal Interfaces 2021 (Online), Montreal, Canada, 18 Oct 2021 → 22 Oct 2021. Conference number: 23rd. Proceedings: https://dl.acm.org/doi/proceedings/10.1145/3462244 |
Conference
Conference | International Conference on Multimodal Interfaces 2021 |
---|---|
Abbreviated title | ICMI 2021 |
Country/Territory | Canada |
City | Montreal |
Period | 18/10/21 → 22/10/21 |
Internet address | https://dl.acm.org/doi/proceedings/10.1145/3462244 |
Keywords
- active speaker detection
- focus of visual attention
- neural networks