Employing distance-based semantics to interpret spoken referring expressions

Ingrid Zukerman, Su Nam Kim, Thomas Kleinbauer, Masud Moshtaghi

    Research output: Contribution to journal > Article > Research > peer-review

    Abstract

    In this paper, we present Scusi?, an anytime numerical mechanism for the interpretation of spoken referring expressions. Our contributions are: (1) an anytime interpretation process that considers multiple alternatives at different interpretation stages (speech, syntax, semantics and pragmatics), which enables Scusi? to defer decisions to the end of the interpretation process; (2) a mechanism that combines scores associated with the output of the different interpretation stages, taking into account the uncertainty arising from a variety of sources, such as ambiguity or inaccuracy in a description, speech recognition errors and out-of-vocabulary terms; and (3) distance-based functions with probabilistic semantics that represent lexical similarity between object names and similarity between stated requirements and physical properties of objects (viz. colour, size and positional relations). We considered two approaches for combining these descriptive attributes, viz. multiplicative and additive, and determined whether prioritizing certain interpretation stages and descriptive attributes affects interpretation performance. We conducted two experiments to evaluate different aspects of Scusi?'s performance: Interpretive, where we compared Scusi?'s understanding of descriptions that are mainly ambiguous or inaccurate with people's understanding of these descriptions, and Generative, where we assessed Scusi?'s understanding of naturally occurring spoken descriptions. Our results show that Scusi?'s understanding of the descriptions in the Interpretive trial is comparable to that of people; and that its performance is encouraging when given arbitrary spoken descriptions in diverse scenarios, and excellent for the corresponding written descriptions. In both experiments, Scusi? significantly outperformed a baseline system that maintains only top same-score interpretations.
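
The difference between the multiplicative and additive combination of descriptive attributes mentioned in the abstract can be illustrated with a minimal sketch. This is not the scoring used in Scusi?: the function names, the weighting scheme, the candidate objects and all score values below are assumptions made up for illustration. Each attribute score is taken to be a value in [0, 1], and an optional weight stands in for prioritizing an attribute.

```python
import math

# Hypothetical per-attribute match scores in [0, 1] for two candidate objects.
# The names and values are invented for illustration only.
candidates = {
    "mug_A": {"lexical": 0.9, "colour": 0.9, "size": 0.1},
    "mug_B": {"lexical": 0.6, "colour": 0.6, "size": 0.6},
}


def combine_multiplicative(scores, weights=None):
    """Weighted product: raise each score to its weight, then multiply."""
    weights = weights or {name: 1.0 for name in scores}
    return math.prod(score ** weights[name] for name, score in scores.items())


def combine_additive(scores, weights=None):
    """Weighted average: sum the weighted scores and normalise by total weight."""
    weights = weights or {name: 1.0 for name in scores}
    total_weight = sum(weights[name] for name in scores)
    weighted_sum = sum(weights[name] * score for name, score in scores.items())
    return weighted_sum / total_weight


def best(candidates, combine, weights=None):
    """Return the name of the candidate with the highest combined score."""
    return max(candidates, key=lambda name: combine(candidates[name], weights))


if __name__ == "__main__":
    for combine in (combine_multiplicative, combine_additive):
        print(combine.__name__, "prefers", best(candidates, combine))
```

With these made-up numbers, the additive scheme tolerates one poorly matching attribute (the size of `mug_A`), whereas the multiplicative scheme penalises it heavily and prefers the uniformly moderate match.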
    Original language: English
    Pages (from-to): 154 - 185
    Number of pages: 32
    Journal: Computer Speech and Language
    Volume: 34
    Issue number: 1
    DOIs: 10.1016/j.csl.2015.01.002
    Publication status: Published - 2015

    Cite this

    Zukerman, Ingrid ; Kim, Su Nam ; Kleinbauer, Thomas ; Moshtaghi, Masud. / Employing distance-based semantics to interpret spoken referring expressions. In: Computer Speech and Language. 2015 ; Vol. 34, No. 1. pp. 154 - 185.
    @article{5aa2a249422b42939cf1f4279ed87803,
    title = "Employing distance-based semantics to interpret spoken referring expressions",
    abstract = "In this paper, we present Scusi?, an anytime numerical mechanism for the interpretation of spoken referring expressions. Our contributions are: (1) an anytime interpretation process that considers multiple alternatives at different interpretation stages (speech, syntax, semantics and pragmatics), which enables Scusi? to defer decisions to the end of the interpretation process; (2) a mechanism that combines scores associated with the output of the different interpretation stages, taking into account the uncertainty arising from a variety of sources, such as ambiguity or inaccuracy in a description, speech recognition errors and out-of-vocabulary terms; and (3) distance-based functions with probabilistic semantics that represent lexical similarity between object names and similarity between stated requirements and physical properties of objects (viz. colour, size and positional relations). We considered two approaches for combining these descriptive attributes, viz. multiplicative and additive, and determined whether prioritizing certain interpretation stages and descriptive attributes affects interpretation performance. We conducted two experiments to evaluate different aspects of Scusi?'s performance: Interpretive, where we compared Scusi?'s understanding of descriptions that are mainly ambiguous or inaccurate with people's understanding of these descriptions, and Generative, where we assessed Scusi?'s understanding of naturally occurring spoken descriptions. Our results show that Scusi?'s understanding of the descriptions in the Interpretive trial is comparable to that of people; and that its performance is encouraging when given arbitrary spoken descriptions in diverse scenarios, and excellent for the corresponding written descriptions. In both experiments, Scusi? significantly outperformed a baseline system that maintains only top same-score interpretations.",
    author = "Ingrid Zukerman and Kim, {Su Nam} and Thomas Kleinbauer and Masud Moshtaghi",
    year = "2015",
    doi = "10.1016/j.csl.2015.01.002",
    language = "English",
    volume = "34",
    pages = "154 -- 185",
    journal = "Computer Speech and Language",
    issn = "0885-2308",
    publisher = "Elsevier",
    number = "1",

    }

    TY - JOUR

    T1 - Employing distance-based semantics to interpret spoken referring expressions

    AU - Zukerman, Ingrid

    AU - Kim, Su Nam

    AU - Kleinbauer, Thomas

    AU - Moshtaghi, Masud

    PY - 2015

    Y1 - 2015

    N2 - In this paper, we present Scusi?, an anytime numerical mechanism for the interpretation of spoken referring expressions. Our contributions are: (1) an anytime interpretation process that considers multiple alternatives at different interpretation stages (speech, syntax, semantics and pragmatics), which enables Scusi? to defer decisions to the end of the interpretation process; (2) a mechanism that combines scores associated with the output of the different interpretation stages, taking into account the uncertainty arising from a variety of sources, such as ambiguity or inaccuracy in a description, speech recognition errors and out-of-vocabulary terms; and (3) distance-based functions with probabilistic semantics that represent lexical similarity between object names and similarity between stated requirements and physical properties of objects (viz. colour, size and positional relations). We considered two approaches for combining these descriptive attributes, viz. multiplicative and additive, and determined whether prioritizing certain interpretation stages and descriptive attributes affects interpretation performance. We conducted two experiments to evaluate different aspects of Scusi?'s performance: Interpretive, where we compared Scusi?'s understanding of descriptions that are mainly ambiguous or inaccurate with people's understanding of these descriptions, and Generative, where we assessed Scusi?'s understanding of naturally occurring spoken descriptions. Our results show that Scusi?'s understanding of the descriptions in the Interpretive trial is comparable to that of people; and that its performance is encouraging when given arbitrary spoken descriptions in diverse scenarios, and excellent for the corresponding written descriptions. In both experiments, Scusi? significantly outperformed a baseline system that maintains only top same-score interpretations.

    UR - http://goo.gl/Cejc3S

    U2 - 10.1016/j.csl.2015.01.002

    DO - 10.1016/j.csl.2015.01.002

    M3 - Article

    VL - 34

    SP - 154

    EP - 185

    JO - Computer Speech and Language

    JF - Computer Speech and Language

    SN - 0885-2308

    IS - 1

    ER -