This paper presents a statistical approach to developing multimodal recognition systems and, in particular, to integrating the posterior probabilities of the parallel input signals in a multimodal system. We first derive performance bounds on multimodal recognition probabilities and identify the primary factors that influence multimodal recognition performance. We then develop the Members-Teams-Committee (MTC) recognition approach, a technique designed to optimize recognition accuracy during multimodal integration. We evaluate these methods using Quickset, a speech/gesture multimodal system, and report results based on an empirical corpus collected with Quickset. From an architectural perspective, the integration technique presented here offers enhanced robustness. It is also premised on more realistic assumptions than previous multimodal systems that use semantic fusion. From a methodological standpoint, the evaluation techniques that we describe provide a valuable tool for evaluating multimodal systems.
|Number of pages||10|
|Publication status||Published - 1 Dec 1999|
|Event||Proceedings of the 1999 9th IEEE Workshop on Neural Networks for Signal Processing (NNSP'99) - Madison, WI, USA|
|Duration||23 Aug 1999 → 25 Aug 1999|