TY - JOUR
T1 - Exploring uncertainty measures for image-caption embedding-and-retrieval task
AU - Hama, Kenta
AU - Matsubara, Takashi
AU - Uehara, Kuniaki
AU - Cai, Jianfei
N1 - Funding Information:
This study was partially supported by the MIC/SCOPE #172107101 and JSPS KAKENHI (19H04172). Authors’ addresses: K. Hama and T. Matsubara, Osaka University, Osaka, Japan; K. Uehara, Osaka Gakuin University, Osaka, Japan; J. Cai, Monash University, Clayton, Australia. © 2021 Copyright held by the owner/author(s). Publication rights licensed to ACM. 1551-6857/2021/04-ART46 $15.00 https://doi.org/10.1145/3425663
Publisher Copyright:
© 2021 ACM.
PY - 2021/6
Y1 - 2021/6
N2 - With the significant development of black-box machine learning algorithms, particularly deep neural networks, the practical demand for reliability assessment is rapidly increasing. On the basis of the concept that "Bayesian deep learning knows what it does not know," the uncertainty of deep neural network outputs has been investigated as a reliability measure for classification and regression tasks. By considering an embedding task as a regression task, several existing studies have quantified the uncertainty of embedded features and improved the retrieval performance of cutting-edge models by model averaging. However, in image-caption embedding-and-retrieval tasks, well-known samples are not always easy to retrieve. This study shows that the existing method has poor performance in reliability assessment and investigates another aspect of image-caption embedding-and-retrieval tasks. We propose posterior uncertainty by considering the retrieval task as a classification task, which can accurately assess the reliability of retrieval results. The consistent performance of the two uncertainty measures is observed with different datasets (MS-COCO and Flickr30k), different deep-learning architectures (dropout and batch normalization), and different similarity functions. To the best of our knowledge, this is the first study to perform a reliability assessment on image-caption embedding-and-retrieval tasks.
AB - With the significant development of black-box machine learning algorithms, particularly deep neural networks, the practical demand for reliability assessment is rapidly increasing. On the basis of the concept that "Bayesian deep learning knows what it does not know," the uncertainty of deep neural network outputs has been investigated as a reliability measure for classification and regression tasks. By considering an embedding task as a regression task, several existing studies have quantified the uncertainty of embedded features and improved the retrieval performance of cutting-edge models by model averaging. However, in image-caption embedding-and-retrieval tasks, well-known samples are not always easy to retrieve. This study shows that the existing method has poor performance in reliability assessment and investigates another aspect of image-caption embedding-and-retrieval tasks. We propose posterior uncertainty by considering the retrieval task as a classification task, which can accurately assess the reliability of retrieval results. The consistent performance of the two uncertainty measures is observed with different datasets (MS-COCO and Flickr30k), different deep-learning architectures (dropout and batch normalization), and different similarity functions. To the best of our knowledge, this is the first study to perform a reliability assessment on image-caption embedding-and-retrieval tasks.
KW - Bayesian deep learning
KW - image-caption retrieval
KW - semantic embedding
KW - uncertainty quantification
UR - http://www.scopus.com/inward/record.url?scp=85107938786&partnerID=8YFLogxK
U2 - 10.1145/3425663
DO - 10.1145/3425663
M3 - Article
AN - SCOPUS:85107938786
SN - 1551-6857
VL - 17
JO - ACM Transactions on Multimedia Computing, Communications and Applications
JF - ACM Transactions on Multimedia Computing, Communications and Applications
IS - 2
M1 - 46
ER -