We integrate a supervised machine learning mechanism for detecting erroneous words in the output of a speech recognizer with a two-tier error-correction approach that features (1) a noisy-channel model that replaces erroneous words with generic words, and (2) a phonetic-similarity mechanism that refines the generic words based on a short list of candidate interpretations. Our results, obtained on a corpus of 341 referring expressions, show that the first tier improves interpretation performance, and the second tier yields further improvements.
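As a rough illustration of the two-tier idea described above, the following Python sketch replaces words flagged as erroneous with a generic placeholder and then refines each placeholder by choosing the most similar-sounding entry from a short candidate list. Everything in it is an assumption made for illustration: the names `GENERIC`, `phonetic_key`, `similarity`, `tier_one` and `tier_two`, the crude consonant-skeleton key, and the use of `difflib` as the similarity measure are hypothetical and are not taken from the paper. In particular, the first tier here is a plain substitution rather than a noisy-channel model, and the set of flagged positions is assumed to come from a separate error detector.

```python
from difflib import SequenceMatcher

# Hypothetical generic word that stands in for a word flagged as erroneous.
GENERIC = "THING"


def phonetic_key(word: str) -> str:
    """Crude phonetic key (an assumption, not the paper's method):
    lowercase the word, keep its first letter, and drop later vowels
    and the weak consonants h, w, y."""
    word = "".join(ch for ch in word.lower() if ch.isalpha())
    if not word:
        return ""
    return word[0] + "".join(ch for ch in word[1:] if ch not in "aeiouyhw")


def similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] between the phonetic keys of two words."""
    return SequenceMatcher(None, phonetic_key(a), phonetic_key(b)).ratio()


def tier_one(words, flagged):
    """Tier 1 stand-in: replace each flagged position with the generic word."""
    return [GENERIC if i in flagged else w for i, w in enumerate(words)]


def tier_two(heard_words, generic_words, candidates):
    """Tier 2 stand-in: replace each generic word with the candidate that
    sounds most like the word the recognizer originally produced."""
    refined = []
    for heard, current in zip(heard_words, generic_words):
        if current == GENERIC and candidates:
            refined.append(max(candidates, key=lambda c: similarity(heard, c)))
        else:
            refined.append(current)
    return refined
```

For example, given the recognizer output `["the", "blue", "mud", "on", "the", "table"]` with position 2 flagged and the candidates `["mug", "plate", "table"]`, `tier_one` yields `["the", "blue", "THING", "on", "the", "table"]` and `tier_two` then restores `"mug"` in that position.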
|Title of host publication||Interspeech 2015: Speech Beyond Speech|
|Editors||Sebastian Moller, Hermann Ney, Bernd Mobius, Elmar Noth, Stefan Steidl|
|Place of Publication||Baixas, France|
|Publisher||International Speech Communication Association (ISCA)|
|Pages||2032–2036|
|Number of pages||5|
|Publication status||Published - 2015|
|Event||Annual Conference of the International Speech Communication Association (was Eurospeech) 2015 - Dresden, Germany|
|Duration||6 Sep 2015 → 10 Sep 2015|
|Conference number||16th|
|Conference||Annual Conference of the International Speech Communication Association (was Eurospeech) 2015|
|Abbreviated title||Interspeech 2015|
|Period||6/09/15 → 10/09/15|
Zukerman, I., Partovi, A., & Kim, S. N. (2015). Context-dependent error correction of spoken referring expressions. In S. Moller, H. Ney, B. Mobius, E. Noth, & S. Steidl (Eds.), Interspeech 2015: Speech Beyond Speech (pp. 2032–2036). International Speech Communication Association (ISCA).