Combining user modeling and machine learning to predict users' multimodal integration patterns

Xiao Huang, Sharon Oviatt, Rebecca Lunsford

Research output: Chapter in Book/Report/Conference proceeding › Conference Paper › Research › peer-review

13 Citations (Scopus)

Abstract

Temporal as well as semantic constraints on fusion are at the heart of multimodal system processing. The goal of the present work is to develop user-adaptive temporal thresholds with improved performance characteristics over state-of-the-art fixed ones, which can be accomplished by leveraging both empirical user modeling and machine learning techniques to handle the large individual differences in users' multimodal integration patterns. Using simple Naive Bayes learning methods and a leave-one-out training strategy, our model correctly predicted 88% of users' mixed speech and pen signal input as either unimodal or multimodal, and 91% of their multimodal input as either sequentially or simultaneously integrated. In addition to predicting a user's multimodal pattern in advance of receiving input, predictive accuracies were also evaluated after the first signal's end-point detection, the earliest time when a speech/pen multimodal system makes a decision regarding fusion. This system-centered metric yielded accuracies of 90% and 92%, respectively, for classification of unimodal/multimodal and sequential/simultaneous input patterns. In addition, empirical modeling revealed a 0.92 correlation between users' multimodal integration pattern and their likelihood of interacting multimodally, which may have accounted for the superior learning obtained with training over heterogeneous user data rather than data partitioned by user subtype. Finally, in large part due to guidance from user modeling, the techniques reported here required as few as 15 samples to predict a "surprise" user's input patterns.
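The evaluation setup named in the abstract, a Naive Bayes classifier scored with a leave-one-out strategy, can be illustrated with a minimal sketch. This is not the authors' implementation. The two features (`lag` between signals and first-signal `duration`, in seconds), the synthetic data generator, the Gaussian likelihood, and the choice to hold out one sample at a time are all assumptions made for illustration; the paper may use different features and may hold out whole users instead.

```python
import math
import random
from collections import defaultdict


def fit_gaussian_nb(X, y):
    """Estimate a class prior and per-feature mean/variance for each class."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    model = {}
    for label, rows in by_class.items():
        cols = list(zip(*rows))
        means = [sum(c) / len(c) for c in cols]
        # A small variance floor avoids division by zero on constant features.
        variances = [
            sum((v - m) ** 2 for v in c) / len(c) + 1e-6
            for c, m in zip(cols, means)
        ]
        model[label] = (len(rows) / len(X), means, variances)
    return model


def predict(model, row):
    """Return the class with the highest log posterior under the NB assumption."""
    best_label, best_score = None, -math.inf
    for label, (prior, means, variances) in model.items():
        score = math.log(prior)
        for v, m, var in zip(row, means, variances):
            score += -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def leave_one_out_accuracy(X, y):
    """Train on all samples but one, test on the held-out sample, and average."""
    correct = 0
    for i in range(len(X)):
        model = fit_gaussian_nb(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        correct += predict(model, X[i]) == y[i]
    return correct / len(X)


def make_synthetic_data(n_per_class=30, seed=0):
    """Hypothetical data: [lag, duration] per sample; negative lag means overlap."""
    rng = random.Random(seed)
    centers = {"sequential": (1.0, 2.0), "simultaneous": (-0.5, 1.5)}
    X, y = [], []
    for label, (lag_mean, dur_mean) in centers.items():
        for _ in range(n_per_class):
            X.append([rng.gauss(lag_mean, 0.3), rng.gauss(dur_mean, 0.4)])
            y.append(label)
    return X, y


X, y = make_synthetic_data()
accuracy = leave_one_out_accuracy(X, y)
print(f"leave-one-out accuracy: {accuracy:.2f}")
```

Because the synthetic classes are well separated by construction, the accuracy printed here says nothing about the figures reported in the abstract; the sketch only shows how the two pieces fit together.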

Original language: English
Title of host publication: Machine Learning for Multimodal Interaction - Third International Workshop, MLMI 2006, Revised Selected Papers
Publisher: Springer
Pages: 50-62
Number of pages: 13
ISBN (Print): 3540692673, 9783540692676
Publication status: Published - 2006
Externally published: Yes
Event: 3rd International Workshop on Machine Learning for Multimodal Interaction, MLMI 2006 - Bethesda, United States of America
Duration: 1 May 2006 to 4 May 2006

Publication series

Name: Lecture Notes in Computer Science
Publisher: Springer
Volume: 4299
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 3rd International Workshop on Machine Learning for Multimodal Interaction, MLMI 2006
Country/Territory: United States of America
City: Bethesda
Period: 1/05/06 to 4/05/06
