C4Synth: Cross-caption cycle-consistent text-to-image synthesis

K. J. Joseph, Arghya Pal, Sailaja Rajanala, Vineeth N. Balasubramanian

Research output: Chapter in Book/Report/Conference proceeding, Conference Paper, Research, peer-review

17 Citations (Scopus)


Generating an image from its description is a challenging task worth solving because of its numerous practical applications, ranging from image editing to virtual reality. All existing methods use a single caption to generate a plausible image. A single caption by itself can be limited and may not capture the variety of concepts and behavior present in the image. We propose two deep generative models that generate an image by making use of multiple captions describing it. This is achieved by ensuring ‘Cross-Caption Cycle Consistency’ between the multiple captions and the generated image(s). We report quantitative and qualitative results on the standard Caltech-UCSD Birds (CUB) and Oxford-102 Flowers datasets to validate the efficacy of the proposed approach.

Original language: English
Title of host publication: Proceedings - 2019 IEEE Winter Conference on Applications of Computer Vision, WACV 2019
Editors: Michael Brown, Yanxi Liu, Peyman Milanfar
Place of Publication: Piscataway NJ USA
Publisher: IEEE, Institute of Electrical and Electronics Engineers
Number of pages: 9
ISBN (Electronic): 9781728119755
ISBN (Print): 9781728119762
Publication status: Published - 2019
Externally published: Yes
Event: IEEE Winter Conference on Applications of Computer Vision 2019 - Waikoloa Village, United States of America
Duration: 7 Jan 2019 to 11 Jan 2019
Conference number: 19th
https://wacv19.wacv.net/ (Website)
https://ieeexplore.ieee.org/xpl/conhome/8642793/proceeding (Proceedings)


Conference: IEEE Winter Conference on Applications of Computer Vision 2019
Abbreviated title: WACV 2019
Country/Territory: United States of America
City: Waikoloa Village