Abstract
At the intersection between computer vision and natural language processing, there has been recent progress on two natural language generation tasks: Dense Image Captioning and Referring Expression Generation for objects in complex scenes. The former aims to provide a caption for a specified object in a complex scene for the benefit of an interlocutor who may not be able to see it. The latter aims to produce a referring expression that will serve to identify a given object in a scene that the interlocutor can see. The two tasks are designed for different assumptions about the common ground between the interlocutors, and serve very different purposes, although they both associate a linguistic description with an object in a complex scene. Despite these fundamental differences, the distinction between these two tasks is sometimes overlooked. Here, we undertake a side-by-side comparison between image captioning and reference game human datasets and show that they differ systematically with respect to informativity. We hope that an understanding of the systematic differences among these human datasets will ultimately allow them to be leveraged more effectively in the associated engineering tasks.
Original language | English |
---|---|
Title of host publication | PaM 2020, Proceedings of the Probability and Meaning Conference |
Editors | Christine Howes, Stergios Chatzikyriakidis, Adam Ek, Vidya Somashekarappa |
Place of Publication | Stroudsburg PA USA |
Publisher | Association for Computational Linguistics (ACL) |
Pages | 104-108 |
Number of pages | 5 |
Publication status | Published - 2020 |
Externally published | Yes |
Event | Conference on Probability and Meaning 2020 - Sweden, Sweden; Duration: 14 Oct 2020 → 15 Oct 2020; Proceedings: https://aclanthology.org/2020.pam-1.0/; Website: https://sites.google.com/view/pam2020/home |
Conference
Conference | Conference on Probability and Meaning 2020 |
---|---|
Abbreviated title | PaM 2020 |
Country/Territory | Sweden |
City | Sweden |
Period | 14/10/20 → 15/10/20 |