Unpaired image captioning by language pivoting

Jiuxiang Gu, Shafiq Joty, Jianfei Cai, Gang Wang

Research output: Chapter in Book/Report/Conference proceeding › Conference Paper › Research › peer-review

3 Citations (Scopus)

Abstract

Image captioning is a multimodal task involving computer vision and natural language processing, where the goal is to learn a mapping from an image to its natural language description. In general, the mapping function is learned from a training set of image-caption pairs. However, for some languages, a large-scale image-caption paired corpus might not be available. We present an approach to this unpaired image captioning problem by language pivoting. Our method can effectively capture the characteristics of an image captioner from the pivot language (Chinese) and align it to the target language (English) using another pivot-target (Chinese-English) sentence parallel corpus. We evaluate our method on two image-to-English benchmark datasets: MSCOCO and Flickr30K. Quantitative comparisons against several baseline approaches demonstrate the effectiveness of our method.
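
The pivoting idea in the abstract can be illustrated with a deliberately simplified sketch: an image is first captioned in the pivot language, and that caption is then mapped into the target language. Everything below is a hypothetical stand-in (the lookup tables, the function names, and the word-by-word translation); it is not the authors' model, which the abstract describes as also aligning the pivot captioner to the target language using a parallel corpus.

```python
from typing import Dict, List

# Hypothetical toy data. A real system would use trained neural models.
# Stand-in for a captioner learned from image / pivot-language (Chinese) pairs.
PIVOT_CAPTIONS: Dict[str, List[str]] = {
    "img_001": ["一只", "狗", "在", "草地", "上"],
}

# Stand-in for a translator learned from a Chinese-English parallel corpus.
# An empty string means the pivot token has no separate target word here.
PIVOT_TO_TARGET: Dict[str, str] = {
    "一只": "a",
    "狗": "dog",
    "在": "on",
    "草地": "grass",
    "上": "",
}


def caption_in_pivot(image_id: str) -> List[str]:
    """Stage 1: image -> pivot-language tokens (toy lookup)."""
    return PIVOT_CAPTIONS[image_id]


def translate_pivot_to_target(tokens: List[str]) -> List[str]:
    """Stage 2: pivot tokens -> target tokens, word by word (toy lookup)."""
    translated = [PIVOT_TO_TARGET.get(token, "<unk>") for token in tokens]
    return [word for word in translated if word]  # drop empty mappings


def caption_in_target(image_id: str) -> str:
    """Compose both stages: no image / target-language pairs are needed."""
    return " ".join(translate_pivot_to_target(caption_in_pivot(image_id)))


if __name__ == "__main__":
    print(caption_in_target("img_001"))
```

The point of the composition is that neither stage requires image-English pairs: the first uses image-Chinese data and the second uses Chinese-English sentence pairs.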

Original language: English
Title of host publication: Computer Vision – ECCV 2018
Subtitle of host publication: 15th European Conference, Munich, Germany, September 8–14, 2018, Proceedings, Part I
Editors: Vittorio Ferrari, Martial Hebert, Cristian Sminchisescu, Yair Weiss
Place of Publication: Cham, Switzerland
Publisher: Springer
Pages: 519-535
Number of pages: 17
ISBN (Electronic): 9783030012465
ISBN (Print): 9783030012458
DOIs
Publication status: Published - 2018
Externally published: Yes
Event: European Conference on Computer Vision 2018 - Munich, Germany
Duration: 8 Sep 2018 – 14 Sep 2018
Conference number: 15th
https://eccv2018.org/
https://link.springer.com/book/10.1007/978-3-030-01246-5 (Proceedings)

Publication series

Name: Lecture Notes in Computer Science
Publisher: Springer
Volume: 11205
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: European Conference on Computer Vision 2018
Abbreviated title: ECCV 2018
Country: Germany
City: Munich
Period: 8/09/18 – 14/09/18
Internet address

Keywords

  • Image captioning
  • Unpaired learning
