Abstract
As a new generation of multimodal/media systems begins to define itself, researchers are attempting to learn how to combine different modes into strategically integrated whole systems. In theory, well-designed multimodal systems should be able to integrate complementary modalities in a manner that supports mutual disambiguation (MD) of errors and leads to more robust performance. In this study, over 2,000 multimodal utterances by both native and accented speakers of English were processed by a multimodal system, then logged and analyzed. The results confirmed that multimodal systems can indeed support significant levels of MD, with higher levels of MD for the more challenging accented users. As a result, although stand-alone speech recognition performed far more poorly for accented speakers, their multimodal recognition rates did not differ from those of native speakers. Implications are discussed for the development of future multimodal architectures that can perform in a more robust and stable manner than individual recognition technologies. Also discussed is the design of interfaces that support diversity in tangible ways and that function well under challenging real-world usage conditions.
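The mechanism at the heart of the study, two error-prone recognizers correcting one another when their hypotheses are fused, can be illustrated with a minimal sketch. The sketch below is not the architecture evaluated in the paper; the n-best lists, confidence scores, and the semantic-compatibility table are all invented for illustration.

```python
# Minimal sketch of mutual disambiguation over recognizer n-best lists.
# All hypotheses, confidences, and compatibility entries are hypothetical;
# the system described in the paper is more sophisticated than this.
from itertools import product

# Hypothetical n-best output from each recognizer: (hypothesis, confidence).
speech_nbest = [("zoom out", 0.40), ("zoom route", 0.38), ("zoom", 0.22)]
pen_nbest = [("point", 0.40), ("line", 0.35), ("area", 0.25)]

# Hypothetical semantic compatibility: which speech commands can combine
# with which pen gestures into a well-formed joint interpretation.
compatible = {
    ("zoom out", "area"): 1.0,   # "zoom out" over a circled region
    ("zoom route", "line"): 1.0, # "zoom route" along a drawn path
    ("zoom", "point"): 1.0,      # "zoom" at a tapped location
}

def rank_joint(speech, pen):
    """Score every speech/pen pair; incompatible pairs get zero weight."""
    scored = [
        (s, g, s_conf * g_conf * compatible.get((s, g), 0.0))
        for (s, s_conf), (g, g_conf) in product(speech, pen)
    ]
    return sorted(scored, key=lambda t: t[2], reverse=True)

best_speech, best_pen, _ = rank_joint(speech_nbest, pen_nbest)[0]
print(best_speech, "+", best_pen)  # -> "zoom route + line"
```

Note that neither winning hypothesis was top-ranked in its own modality: each recognizer's first choice is vetoed by the other modality as semantically incompatible, and a lower-ranked but jointly coherent pair wins. That recovery of errors through fusion is the essence of mutual disambiguation.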
| Original language | English |
|---|---|
| Title of host publication | CHI '99 - Proceedings of the SIGCHI Conference on Human Factors in Computing Systems |
| Place of Publication | New York NY USA |
| Publisher | Association for Computing Machinery (ACM) |
| Pages | 576-583 |
| Number of pages | 8 |
| ISBN (Print) | 0201485591, 9780201485592 |
| DOIs | |
| Publication status | Published - 1999 |
| Externally published | Yes |
| Event | International Conference on Human Factors in Computing Systems 1999 (17th), Pittsburgh, United States of America, 15 May 1999 → 20 May 1999 |
Conference
| Conference | International Conference on Human Factors in Computing Systems 1999 |
|---|---|
| Abbreviated title | CHI 1999 |
| Country/Territory | United States of America |
| City | Pittsburgh |
| Period | 15/05/99 → 20/05/99 |
Keywords
- Diverse users
- Multimodal architecture
- Mutual disambiguation
- Recognition errors
- Robust performance
- Speech and pen input