
Multiple interaction learning with question-type prior knowledge for constraining answer search space in Visual Question Answering

  • Tuong Do
  • Binh X. Nguyen
  • Huy Tran
  • Erman Tjiputra
  • Quang D. Tran
  • Thanh-Toan Do

Research output: Chapter in Book/Report/Conference proceeding; Conference Paper; Research

Abstract

Different approaches have been proposed for Visual Question Answering (VQA). However, few works consider how different joint modality methods behave with respect to question-type prior knowledge extracted from data, which constrains the answer search space and provides a reliable cue for reasoning about answers to questions asked about input images. In this paper, we propose a novel VQA model that uses question-type prior information to improve VQA by leveraging multiple interactions between different joint modality methods, based on their behaviors in answering questions of different types. Experiments on two benchmark datasets, VQA 2.0 and TDIUC, indicate that the proposed method achieves the best performance among the most competitive approaches.

Original language: English
Title of host publication: Computer Vision – ECCV 2020 Workshops
Subtitle of host publication: Glasgow, UK, August 23–28, 2020 Proceedings, Part II
Editors: Adrien Bartoli, Andrea Fusiello
Place of Publication: Cham, Switzerland
Publisher: Springer
Pages: 496-510
Number of pages: 15
ISBN (Electronic): 9783030660963
ISBN (Print): 9783030660956
Publication status: Published - 2020
Externally published: Yes
Event: Visual Inductive Priors for Data-Efficient Deep Learning 2020 - Glasgow, United Kingdom
Duration: 23 Aug 2020 to 28 Aug 2020
https://link.springer.com/book/10.1007/978-3-030-66096-3 (Proceedings)

Publication series

Name: Lecture Notes in Computer Science
Publisher: Springer
Volume: 12536
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: Visual Inductive Priors for Data-Efficient Deep Learning 2020
Country/Territory: United Kingdom
City: Glasgow
Period: 23/08/20 to 28/08/20

Keywords

  • Multiple interaction learning
  • Visual Question Answering
