Look who's talking: visual identification of the active speaker in multi-party human-robot interaction

Kalin Stefanov, Akihiro Sugimoto, Jonas Beskow

Research output: Chapter in Book/Report/Conference proceeding › Conference Paper › Research › peer-review

4 Citations (Scopus)

Abstract

This paper presents an analysis of a previously recorded multimodal interaction dataset. The primary purpose of that dataset is to explore patterns in the focus of visual attention of humans under three different conditions: two humans involved in task-based interaction with a robot; the same two humans involved in task-based interaction where the robot is replaced by a third human; and free three-party human interaction. The paper presents a data-driven methodology for automatic visual identification of the active speaker based on facial action units (AUs). It also presents an evaluation of the proposed methodology on 12 different interactions with an approximate total length of 4 hours. The methodology will be implemented on a robot and used to generate natural focus-of-visual-attention behavior during multi-party human-robot interactions.
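The abstract describes identifying the active speaker visually from facial action units. The paper's actual method is a learned, data-driven model; the following is only a minimal illustrative sketch under assumed simplifications (mouth-related AUs 25 and 26 as features, a fixed smoothing window, and a pick-the-maximum rule instead of a trained classifier). All names and thresholds here are assumptions, not the authors' implementation.

```python
# Hypothetical sketch of AU-based active speaker identification.
# Assumption: per-frame AU intensities are available per participant
# (e.g. from a facial analysis toolkit); mouth-related AUs serve as a
# proxy for speech activity, smoothed over a short window of frames.
from collections import deque

MOUTH_AUS = ("AU25", "AU26")  # lips part, jaw drop (assumed feature choice)
WINDOW = 15                   # smoothing window in frames (assumed)


class SpeakerIdentifier:
    def __init__(self, participants, window=WINDOW):
        # One bounded history of mouth activity per participant.
        self.histories = {p: deque(maxlen=window) for p in participants}

    def update(self, frame_aus):
        """frame_aus: {participant: {au_name: intensity}} for one frame.
        Returns the participant with the highest smoothed mouth activity."""
        for p, aus in frame_aus.items():
            activity = sum(aus.get(a, 0.0) for a in MOUTH_AUS)
            self.histories[p].append(activity)

        def smoothed(p):
            h = self.histories[p]
            return sum(h) / len(h) if h else 0.0

        return max(self.histories, key=smoothed)
```

In use, the identifier would be fed one AU dictionary per video frame and queried for the current speaker; a real system would replace the maximum rule with a classifier trained on labeled interaction data, as the paper does.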

Original language: English
Title of host publication: 2nd Workshop on Advancements in Social Signal Processing for Multimodal Interaction 2016 (ASSP4MI2016)
Editors: Louis-Philippe Morency, Carlos Busso, Catherine Pelachaud
Place of publication: New York NY USA
Publisher: Association for Computing Machinery (ACM)
Pages: 22-27
Number of pages: 6
ISBN (Electronic): 9781450345576
DOIs
Publication status: Published - 2016
Externally published: Yes
Event: Workshop on Advancements in Social Signal Processing for Multimodal Interaction 2016 - Tokyo, Japan
Duration: 16 Nov 2016 → 16 Nov 2016
Conference number: 2nd
https://dl.acm.org/doi/proceedings/10.1145/3005467 (Proceedings)
https://web.archive.org/web/20160804170758/https://wwwhome.ewi.utwente.nl/~truongkp/icmi2016-assp4mi (Website)

Conference

Conference: Workshop on Advancements in Social Signal Processing for Multimodal Interaction 2016
Abbreviated title: ASSP4MI 2016
Country: Japan
City: Tokyo
Period: 16/11/16 → 16/11/16

Keywords

  • Active speaker identification
  • Human-robot interaction
  • Multi-modal interaction
