Finding it at another side: a viewpoint-adapted matching encoder for Change Captioning

Xiangxi Shi, Xu Yang, Jiuxiang Gu, Shafiq Joty, Jianfei Cai

Research output: Chapter in Book/Report/Conference proceedingConference PaperResearchpeer-review


Change Captioning is a task that aims to describe the difference between images with natural language. Most existing methods treat this problem as a difference judgment without the existence of distractors, such as viewpoint changes. However, in practice, viewpoint changes happen often and can overwhelm the semantic difference to be described. In this paper, we propose a novel visual encoder to explicitly distinguish viewpoint changes from semantic changes in the change captioning task. Moreover, we further simulate the attention preference of humans and propose a novel reinforcement learning process to fine-tune the attention directly with language evaluation rewards. Extensive experimental results show that our method outperforms the state-of-the-art approaches by a large margin in both Spot-the-Diff and CLEVR-Change datasets.

Original languageEnglish
Title of host publicationComputer Vision – ECCV 2020
Subtitle of host publication16th European Conference Glasgow, UK, August 23–28, 2020 Proceedings, Part XIV
EditorsAndrea Vedaldi, Horst Bischof, Thomas Brox, Jan-Michael Frahm
Place of PublicationCham Switzerland
Number of pages17
ISBN (Electronic)9783030585686
ISBN (Print)9783030585679
Publication statusPublished - 2020
EventEuropean Conference on Computer Vision 2020 - Glasgow, United Kingdom
Duration: 23 Aug 202028 Aug 2020
Conference number: 16th (Proceedings) (Website)

Publication series

NameLecture Notes in Computer Science
ISSN (Print)0302-9743
ISSN (Electronic)1611-3349


ConferenceEuropean Conference on Computer Vision 2020
Abbreviated titleECCV 2020
CountryUnited Kingdom
Internet address


  • Attention
  • Change captioning
  • Image captioning
  • Reinforcement learning

Cite this