Stack-captioning: coarse-to-fine learning for image captioning

Jiuxiang Gu, Jianfei Cai, Gang Wang, Tsuhan Chen

Research output: Chapter in Book/Report/Conference proceeding, Conference Paper, Research, peer-review

70 Citations (Scopus)

Abstract

Existing image captioning approaches typically train a one-stage sentence decoder, which struggles to generate rich, fine-grained descriptions. On the other hand, multi-stage image captioning models are hard to train because of the vanishing gradient problem. In this paper, we propose a coarse-to-fine multi-stage prediction framework for image captioning, composed of multiple decoders, each of which operates on the output of the previous stage to produce increasingly refined image descriptions. Our learning approach addresses the difficulty of vanishing gradients during training by providing a learning objective that enforces intermediate supervision. In particular, we optimize our model with a reinforcement learning approach that uses the output of each intermediate decoder's test-time inference algorithm, as well as the output of its preceding decoder, to normalize the rewards; this simultaneously solves the well-known exposure bias problem and the loss-evaluation mismatch problem. We extensively evaluate the proposed approach on MSCOCO and show that it achieves state-of-the-art performance.
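The abstract describes the training objective only in words. As a rough illustration, the Python sketch below shows how per-stage rewards could be normalized by baselines and summed into a single policy-gradient loss that supervises every decoder stage, not just the last one. Everything here is hypothetical: the `StageOutput` record, the function names, and especially the rule that averages the two baselines (the current stage's test-time greedy reward and the preceding stage's greedy reward) are illustrative assumptions, not the paper's exact formulation. Rewards would typically be a caption metric such as CIDEr, and the log-probabilities would come from the decoders.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class StageOutput:
    """Hypothetical per-stage summary for one image (field names are illustrative)."""
    sample_logprob: float  # log p(sampled caption) under this stage's decoder
    sample_reward: float   # reward (e.g. a CIDEr score) of the sampled caption
    greedy_reward: float   # reward of this stage's test-time (greedy) caption


def stage_advantages(stages: List[StageOutput]) -> List[float]:
    """Normalize each stage's sampled reward with baselines.

    ASSUMPTION: stage 1 uses only its own greedy reward as a baseline
    (self-critical style); later stages average their own greedy reward
    with the preceding stage's greedy reward.
    """
    advantages: List[float] = []
    prev_greedy: Optional[float] = None
    for stage in stages:
        if prev_greedy is None:
            # Coarse first decoder: no predecessor, so only its own baseline.
            baseline = stage.greedy_reward
        else:
            # Assumed combination of the two baselines named in the abstract.
            baseline = 0.5 * (stage.greedy_reward + prev_greedy)
        advantages.append(stage.sample_reward - baseline)
        prev_greedy = stage.greedy_reward
    return advantages


def stacked_policy_loss(stages: List[StageOutput]) -> float:
    """REINFORCE-style loss summed over all stages (intermediate supervision).

    In an autodiff framework the advantages would be treated as constants,
    so gradients flow only through each stage's log-probability.
    """
    advantages = stage_advantages(stages)
    return -sum(a * s.sample_logprob for a, s in zip(advantages, stages))


if __name__ == "__main__":
    # Made-up numbers for a three-stage (coarse, fine, finer) model.
    demo = [
        StageOutput(sample_logprob=-12.0, sample_reward=0.90, greedy_reward=0.80),
        StageOutput(sample_logprob=-11.0, sample_reward=1.00, greedy_reward=0.95),
        StageOutput(sample_logprob=-10.5, sample_reward=1.05, greedy_reward=1.02),
    ]
    print(stage_advantages(demo))
    print(stacked_policy_loss(demo))
```

A positive advantage means the sampled caption beat its baselines, so minimizing the loss raises that caption's probability; a negative one pushes it down. Because each stage contributes its own term, earlier decoders receive a direct training signal rather than relying on gradients propagated back from the final stage.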

Original language: English
Title of host publication: The Thirty-Second AAAI Conference on Artificial Intelligence
Editors: Sheila McIlraith, Kilian Weinberger
Place of Publication: Palo Alto, CA, USA
Publisher: Association for the Advancement of Artificial Intelligence (AAAI)
Pages: 6837-6844
Number of pages: 8
ISBN (Electronic): 9781577358008
Publication status: Published - 2018
Externally published: Yes
Event: AAAI Conference on Artificial Intelligence 2018 - New Orleans, United States of America
Duration: 2 Feb 2018 to 7 Feb 2018
Conference number: 32nd
https://aaai.org/Conferences/AAAI-18/

Conference

Conference: AAAI Conference on Artificial Intelligence 2018
Abbreviated title: AAAI 2018
Country/Territory: United States of America
City: New Orleans
Period: 2/02/18 to 7/02/18