Predicting spoken disfluencies during human-computer interaction

Sharon Oviatt

Research output: Contribution to journal › Article › Research › peer-review

112 Citations (Scopus)

Abstract

This research characterizes the spontaneous spoken disfluencies typical of human-computer interaction, and presents a predictive model accounting for their occurrence. Data were collected during three empirical studies in which people spoke or wrote to a highly interactive simulated system as they completed service transactions. The studies involved within-subject factorial designs in which the input modality and presentation format were varied. Spoken disfluency rates during human-computer interaction were documented to be substantially lower than rates typically observed during comparable human-human speech. Two separate factors, both associated with increased planning demands, were statistically related to higher disfluency rates: (1) length of utterance; and (2) lack of structure in the presentation format. Regression techniques demonstrated that a linear model based simply on utterance length accounted for over 77% of the variability in spoken disfluencies. Therefore, design methods capable of guiding users' speech into briefer sentences have the potential to eliminate the majority of spoken disfluencies. In this research, for example, a structured presentation format successfully eliminated 60-70% of all disfluent speech. The long-term goal of this research is to provide empirical guidance for the design of robust spoken language technology.
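The abstract's central quantitative claim is that a linear model based on utterance length "accounted for over 77% of the variability" in spoken disfluencies. In regression terms, that means the coefficient of determination R² exceeds 0.77. The sketch below assumes a simple one-predictor ordinary least-squares fit, because the abstract does not specify the exact regression procedure. It fits such a line and reports R². The function names (`fit_simple_linear`, `r_squared`) and the data values are hypothetical and exist only for illustration. They are not the study's data, and the printed numbers say nothing about its results.

```python
# Minimal sketch, assuming a one-predictor ordinary least-squares model:
#   disfluency_rate ~ intercept + slope * utterance_length
# All names and data below are hypothetical, not taken from the study.

def fit_simple_linear(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("predictor has zero variance")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def r_squared(x, y, intercept, slope):
    """Proportion of the variance in y accounted for by the fitted line."""
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


# Hypothetical data: utterance length (words) vs. disfluencies per 100 words.
lengths = [2, 4, 6, 8, 10, 12, 14, 16]
rates = [0.0, 0.5, 0.8, 1.6, 1.9, 2.8, 3.1, 3.9]

a, b = fit_simple_linear(lengths, rates)
print(f"rate = {a:.3f} + {b:.3f} * length, R^2 = {r_squared(lengths, rates, a, b):.3f}")
```

An R² of 0.77 would mean that 77% of the spread in disfluency rates is explained by utterance length alone. This is why the abstract argues that steering users toward shorter utterances should remove most disfluencies.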

Original language: English
Pages: 19-35
Number of pages: 17
Journal: Computer Speech & Language
Volume: 9
Issue number: 1
Publication status: Published, Jan 1995
Externally published: Yes
