Po-Yao Huang, Guoliang Kang, Wenhe Liu, Xiaojun Chang, Alexander G. Hauptmann
Research output: Chapter in Book/Report/Conference proceeding › Conference Paper › Research › peer-review
Visual-semantic embeddings are central to many multimedia applications such as cross-modal retrieval between visual data and natural language descriptions. Conventionally, learning a joint embedding space relies on large parallel multimodal corpora. Since massive human annotation is expensive to obtain, there is strong motivation to develop versatile algorithms that learn from large corpora with fewer annotations. In this paper, we propose a novel framework to leverage automatically extracted regional semantics from un-annotated images as additional weak supervision to learn visual-semantic embeddings. The proposed model employs adversarial attentive alignments to close the inherent heterogeneous gaps between annotated and un-annotated portions of visual and textual domains. To demonstrate its superiority, we conduct extensive experiments on sparsely annotated multimodal corpora. The experimental results show that the proposed model outperforms state-of-the-art visual-semantic embedding models by a significant margin for cross-modal retrieval tasks on the sparse Flickr30k and MS-COCO datasets. It is also worth noting that, despite using only 20% of the annotations, the proposed model can achieve competitive performance (Recall at 10 > 80.0% for 1K and > 70.0% for 5K text-to-image retrieval) compared to the benchmarks trained with the complete annotations.
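The abstract reports results as Recall at 10 for text-to-image retrieval. As a rough illustration of what that metric measures, the sketch below computes Recall at K from a query-by-item similarity matrix. It is not the authors' evaluation code: the function name `recall_at_k`, the toy scores, and the assumption that query `i` has exactly one correct item at index `i` are all hypothetical simplifications (the actual benchmarks pair each image with several captions).

```python
from typing import Sequence


def recall_at_k(similarity: Sequence[Sequence[float]], k: int) -> float:
    """Fraction of queries whose ground-truth item ranks in the top k.

    Assumption (illustrative only): similarity[i][j] scores query i
    against item j, and the correct item for query i is item i.
    """
    hits = 0
    for i, row in enumerate(similarity):
        true_score = row[i]
        # Zero-based rank: how many items score strictly higher than the truth.
        rank = sum(1 for score in row if score > true_score)
        if rank < k:
            hits += 1
    return hits / len(similarity)


# Toy scores: 3 text queries against 3 images (made-up numbers).
sim = [
    [0.9, 0.1, 0.2],  # correct image ranked 1st
    [0.8, 0.3, 0.5],  # correct image ranked 3rd
    [0.1, 0.7, 0.6],  # correct image ranked 2nd
]

for k in (1, 2, 3):
    print(f"Recall at {k}: {recall_at_k(sim, k):.3f}")
```

Under this simplified setup, a Recall at 10 above 80% would mean that for more than four out of five text queries the matching image appears among the ten highest-scoring candidates.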
Original language | English |
---|---|
Title of host publication | Proceedings of the 27th ACM International Conference on Multimedia |
Editors | Guillaume Gravier, Hayley Hung, Chong-Wah Ngo, Wei Tsang Ooi |
Place of Publication | New York NY USA |
Publisher | Association for Computing Machinery (ACM) |
Pages | 1758-1767 |
Number of pages | 10 |
ISBN (Electronic) | 9781450368896, 9781450367936 |
Publication status | Published - 2019 |
Event | ACM International Conference on Multimedia 2019, Nice, France; Duration: 21 Oct 2019 → 25 Oct 2019; Conference number: 27th; https://dl.acm.org/doi/proceedings/10.1145/3343031 |

Conference | ACM International Conference on Multimedia 2019 |
---|---|
Abbreviated title | MM 2019 |
Country/Territory | France |
City | Nice |
Period | 21/10/19 → 25/10/19 |