Unpaired image captioning via scene graph alignments

Jiuxiang Gu, Shafiq Joty, Jianfei Cai, Handong Zhao, Xu Yang, Gang Wang

Research output: Chapter in Book/Report/Conference proceeding › Conference Paper › Research › peer-review

3 Citations (Scopus)

Abstract

Most current image captioning models rely heavily on paired image-caption datasets. However, collecting large-scale paired image-caption data is labor-intensive and time-consuming. In this paper, we present a scene graph-based approach for unpaired image captioning. Our framework comprises an image scene graph generator, a sentence scene graph generator, a scene graph encoder, and a sentence decoder. Specifically, we first train the scene graph encoder and the sentence decoder on the text modality. To align the scene graphs between images and sentences, we propose an unsupervised feature alignment method that maps the scene graph features from the image modality to the sentence modality. Experimental results show that our proposed model can generate promising results without using any image-caption training pairs, outperforming existing methods by a wide margin.
Original language: English
Title of host publication: IEEE International Conference on Computer Vision 2019
Editors: In So Kweon, Nikos Paragios, Ming-Hsuan Yang, Svetlana Lazebnik
Place of Publication: Piscataway NJ USA
Publisher: IEEE, Institute of Electrical and Electronics Engineers
Pages: 10323-10332
Number of pages: 10
ISBN (Electronic): 9781728148038
DOIs: https://doi.org/10.1109/ICCV.2019.01042
Publication status: Published - 2019
Event: IEEE International Conference on Computer Vision 2019 - Seoul, Korea, Republic of (South)
Duration: 27 Oct 2019 - 2 Nov 2019
Conference number: 17th
http://iccv2019.thecvf.com/

Conference

Conference: IEEE International Conference on Computer Vision 2019
Abbreviated title: ICCV 2019
Country: Korea, Republic of (South)
City: Seoul
Period: 27/10/19 - 2/11/19
Internet address: http://iccv2019.thecvf.com/

Cite this

Gu, J., Joty, S., Cai, J., Zhao, H., Yang, X., & Wang, G. (2019). Unpaired image captioning via scene graph alignments. In I. S. Kweon, N. Paragios, M-H. Yang, & S. Lazebnik (Eds.), IEEE International Conference on Computer Vision 2019 (pp. 10323-10332). IEEE, Institute of Electrical and Electronics Engineers. https://doi.org/10.1109/ICCV.2019.01042