Mutual disambiguation of recognition errors in a multimodal architecture

Research output: Chapter in Book/Report/Conference proceeding > Conference Paper > Research > peer-review

169 Citations (Scopus)


As a new generation of multimodal/media systems begins to define itself, researchers are attempting to learn how to combine different modes into strategically integrated whole systems. In theory, well-designed multimodal systems should be able to integrate complementary modalities in a manner that supports mutual disambiguation (MD) of errors and leads to more robust performance. In this study, over 2,000 multimodal utterances by both native and accented speakers of English were processed by a multimodal system, and then logged and analyzed. The results confirmed that multimodal systems can indeed support significant levels of MD, and also higher levels of MD for the more challenging accented users. As a result, although stand-alone speech recognition performed far more poorly for accented speakers, their multimodal recognition rates did not differ from those of native speakers. Implications are discussed for the development of future multimodal architectures that can perform in a more robust and stable manner than individual recognition technologies. Also discussed is the design of interfaces that support diversity in tangible ways, and that function well under challenging real-world usage conditions.

Original language: English
Title of host publication: CHI '99 - Proceedings of the SIGCHI conference on Human Factors in Computing Systems
Place of Publication: New York NY USA
Publisher: Association for Computing Machinery (ACM)
Number of pages: 8
ISBN (Print): 0201485591, 9780201485592
Publication status: Published - 1999
Externally published: Yes
Event: International Conference on Human Factors in Computing Systems 1999 - Pittsburgh, United States of America
Duration: 15 May 1999 to 20 May 1999
Conference number: 17th


Conference: International Conference on Human Factors in Computing Systems 1999
Abbreviated title: CHI 1999
Country: United States of America


Keywords:
  • Diverse users
  • Multimodal architecture
  • Mutual disambiguation
  • Recognition errors
  • Robust performance
  • Speech and pen input
