Fragile error handling in recognition-based systems is a major problem that degrades their performance, frustrates users, and limits commercial potential. The aim of the present research was to analyze the types and magnitude of linguistic adaptation that occur during spoken and multimodal human-computer error resolution. A semiautomatic simulation method with a novel error-generation capability was used to collect samples of users' spoken and pen-based input immediately before and after recognition errors, and at different spiral depths in terms of the number of repetitions needed to resolve an error. When correcting persistent recognition errors, results revealed that users adapt their speech and language in three qualitatively different ways. First, they increase linguistic contrast through alternation of input modes and lexical content over repeated correction attempts. Second, when correcting with verbatim speech, they increase hyperarticulation by lengthening speech segments and pauses, and increasing the use of final falling contours. Third, when they hyperarticulate, users simultaneously suppress linguistic variability in their speech signal's amplitude and fundamental frequency. These findings are discussed from the perspective of enhancement of linguistic intelligibility. Implications are also discussed for corroboration and generalization of the Computer-elicited Hyperarticulate Adaptation Model (CHAM), and for improved error handling capabilities in next-generation spoken language and multimodal systems.
- Error resolution
- Linguistic contrast
- Multimodal interaction
- Spiral errors
- Spoken and multimodal interaction