Exploring uncertainty measures for image-caption embedding-and-retrieval task

Kenta Hama, Takashi Matsubara, Kuniaki Uehara, Jianfei Cai

Research output: Contribution to journal › Article › Research › peer-review


With the significant development of black-box machine learning algorithms, particularly deep neural networks, the practical demand for reliability assessment is rapidly increasing. On the basis of the concept that "Bayesian deep learning knows what it does not know," the uncertainty of deep neural network outputs has been investigated as a reliability measure for classification and regression tasks. By considering an embedding task as a regression task, several existing studies have quantified the uncertainty of embedded features and improved the retrieval performance of cutting-edge models by model averaging. However, in image-caption embedding-and-retrieval tasks, well-known samples are not always easy to retrieve. This study shows that the existing method has poor performance in reliability assessment and investigates another aspect of image-caption embedding-and-retrieval tasks. We propose posterior uncertainty by considering the retrieval task as a classification task, which can accurately assess the reliability of retrieval results. The consistent performance of the two uncertainty measures is observed with different datasets (MS-COCO and Flickr30k), different deep-learning architectures (dropout and batch normalization), and different similarity functions. To the best of our knowledge, this is the first study to perform a reliability assessment on image-caption embedding-and-retrieval tasks.
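The contrast between the two views of uncertainty can be illustrated with a small sketch. The abstract gives no formulas, so everything below is an assumption made for illustration rather than the authors' implementation: the toy linear encoder with input dropout (`mc_embeddings`), the use of total variance as the feature-level measure (`feature_uncertainty`), the use of entropy of a sample-averaged softmax over candidates as the classification-style measure (`posterior_uncertainty`), and the `temperature` value are all hypothetical choices.

```python
import numpy as np


def mc_embeddings(x, weight, n_samples, drop_rate, rng):
    """Draw stochastic embeddings of x from a toy linear encoder with input dropout.

    Hypothetical stand-in for a dropout-based deep encoder.
    """
    keep = rng.random((n_samples, x.shape[0])) >= drop_rate
    dropped = keep * x / (1.0 - drop_rate)          # (n_samples, d_in)
    z = dropped @ weight                            # (n_samples, d_out)
    norms = np.linalg.norm(z, axis=1, keepdims=True)
    return z / (norms + 1e-12)                      # unit-length embeddings


def feature_uncertainty(samples):
    """Regression-style view: total variance of the embedding across samples."""
    return float(samples.var(axis=0).sum())


def posterior_uncertainty(samples, candidates, temperature=0.1):
    """Classification-style view: entropy of the averaged softmax over candidates.

    Returns the entropy and the index of the top-ranked candidate.
    """
    logits = samples @ candidates.T / temperature   # (n_samples, n_candidates)
    logits -= logits.max(axis=1, keepdims=True)     # numerical stability
    probs = np.exp(logits)
    probs /= probs.sum(axis=1, keepdims=True)
    mean_probs = probs.mean(axis=0)                 # average over stochastic passes
    entropy = -np.sum(mean_probs * np.log(mean_probs + 1e-12))
    return float(entropy), int(mean_probs.argmax())


# Usage with random toy data (assumed dimensions, not taken from the paper).
rng = np.random.default_rng(0)
x = rng.normal(size=16)                             # one query, e.g. an image feature
weight = rng.normal(size=(16, 8))                   # toy encoder weights
candidates = rng.normal(size=(5, 8))                # e.g. five caption embeddings
candidates /= np.linalg.norm(candidates, axis=1, keepdims=True)

samples = mc_embeddings(x, weight, n_samples=20, drop_rate=0.2, rng=rng)
print("feature uncertainty:", feature_uncertainty(samples))
print("posterior uncertainty, top candidate:", posterior_uncertainty(samples, candidates))
```

In this sketch the feature-level measure depends only on the query's own embedding, whereas the classification-style measure also depends on how the candidates are arranged around it, which is one way to read the abstract's remark that well-known samples are not always easy to retrieve.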

Original language: English
Article number: 46
Number of pages: 19
Journal: ACM Transactions on Multimedia Computing, Communications and Applications
Issue number: 2
Publication status: Published - Jun 2021


  • Bayesian deep learning
  • image-caption retrieval
  • semantic embedding
  • uncertainty quantification
