VQA-E: explaining, elaborating, and enhancing your answers for visual questions

Qing Li, Qingyi Tao, Shafiq Joty, Jianfei Cai, Jiebo Luo

Research output: Chapter in Book/Report/Conference proceeding, Conference Paper, Research, peer-review

1 Citation (Scopus)

Abstract

Most existing works in visual question answering (VQA) are dedicated to improving the accuracy of predicted answers, while disregarding the explanations. We argue that the explanation for an answer is of the same or even more importance compared with the answer itself, since it makes the question answering process more understandable and traceable. To this end, we propose a new task of VQA-E (VQA with Explanation), where the models are required to generate an explanation with the predicted answer. We first construct a new dataset, and then frame the VQA-E problem in a multi-task learning architecture. Our VQA-E dataset is automatically derived from the VQA v2 dataset by intelligently exploiting the available captions. We also conduct a user study to validate the quality of the synthesized explanations. We quantitatively show that the additional supervision from explanations can not only produce insightful textual sentences to justify the answers, but also improve the performance of answer prediction. Our model outperforms the state-of-the-art methods by a clear margin on the VQA v2 dataset.

Original language: English
Title of host publication: Computer Vision – ECCV 2018
Subtitle of host publication: 15th European Conference, Munich, Germany, September 8–14, 2018, Proceedings, Part VII
Editors: Vittorio Ferrari, Martial Hebert, Cristian Sminchisescu, Yair Weiss
Place of Publication: Cham, Switzerland
Publisher: Springer
Pages: 570-586
Number of pages: 17
ISBN (Electronic): 9783030012342
ISBN (Print): 9783030012335
Publication status: Published - 2018
Externally published: Yes
Event: European Conference on Computer Vision 2018 - Munich, Germany
Duration: 8 Sep 2018 to 14 Sep 2018
Conference number: 15th
https://eccv2018.org/
https://link.springer.com/book/10.1007/978-3-030-01246-5 (Proceedings)

Publication series

Name: Lecture Notes in Computer Science
Publisher: Springer
Volume: 11211
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: European Conference on Computer Vision 2018
Abbreviated title: ECCV 2018
Country: Germany
City: Munich
Period: 8/09/18 to 14/09/18

Keywords

  • Model with Explanation
  • Visual question answering
