Understanding unnatural questions improves reasoning over text

Research output: Chapter in Book/Report/Conference proceeding › Conference Paper › Research › peer-review

Abstract

Complex question answering (CQA) over raw text is a challenging task. A prominent approach to this task is based on the programmer-interpreter framework, where the programmer maps the question into a sequence of reasoning actions and the interpreter then executes these actions on the raw text. Learning an effective CQA model requires large amounts of human-annotated data, consisting of the ground-truth sequence of reasoning actions, which is time-consuming and expensive to collect at scale. In this paper, we address the challenge of learning a high-quality programmer (parser) by projecting natural human-generated questions into unnatural machine-generated questions which are more convenient to parse. We first generate synthetic (question, action sequence) pairs with a data generator, and train a semantic parser that associates synthetic questions with their corresponding action sequences. To capture the diversity of natural questions, we learn a projection model that maps each natural question to its most similar unnatural question, for which the parser works well. Without any natural training data, our projection model provides high-quality action sequences for the CQA task. Experimental results show that the QA model trained exclusively with synthetic data outperforms its state-of-the-art counterpart trained on human-labeled data.
Original language: English
Title of host publication: COLING 2020
Subtitle of host publication: The 28th International Conference on Computational Linguistics, Proceedings of the Conference
Editors: Donia Scott, Nuria Bel, Chengqing Zong
Place of Publication: Stroudsburg PA USA
Publisher: Association for Computational Linguistics (ACL)
Pages: 4949–4955
Number of pages: 7
ISBN (Electronic): 9781952148279
Publication status: Published - 2020
Event: International Conference on Computational Linguistics 2020 - Virtual, Barcelona, Spain
Duration: 8 Dec 2020 – 13 Dec 2020
Conference number: 28th
https://coling2020.org (Website)
https://www.aclweb.org/anthology/volumes/2020.coling-main/ (Proceedings)

Conference

Conference: International Conference on Computational Linguistics 2020
Abbreviated title: COLING 2020
Country: Spain
City: Barcelona
Period: 8/12/20 – 13/12/20