TY - JOUR
T1 - Attention in Attention Networks for person retrieval
AU - Fang, Pengfei
AU - Zhou, Jieming
AU - Roy, Soumava Kumar
AU - Ji, Pan
AU - Petersson, Lars
AU - Harandi, Mehrtash T.
N1 - Publisher Copyright:
IEEE
PY - 2022/9/1
Y1 - 2022/9/1
AB - This paper generalizes the Attention in Attention (AiA) mechanism, proposed in [1], by employing explicit mappings into reproducing kernel Hilbert spaces to generate the attention values of the input feature map. The AiA mechanism builds inter-dependencies between local and global features through the interaction of inner and outer attention modules. Beyond a vanilla AiA module, termed linear attention with AiA, two non-linear counterparts, namely second-order polynomial attention and Gaussian attention, are proposed to exploit the non-linear properties of the input features explicitly, via the second-order polynomial kernel and a Gaussian kernel approximation, respectively. The deep convolutional neural network equipped with the proposed AiA blocks is referred to as the Attention in Attention Network (AiA-Net). AiA-Net learns to extract a discriminative pedestrian representation that combines complementary person appearance and corresponding part features. Extensive ablation studies verify the effectiveness of the AiA mechanism and of exploiting the non-linear features hidden in the feature map for attention design. Furthermore, the approach outperforms the current state of the art by a considerable margin across a number of benchmarks, and it also achieves state-of-the-art performance on the video person retrieval task with the assistance of the proposed AiA blocks.
KW - Attention in Attention Mechanism
KW - Benchmark testing
KW - Convolutional Neural Network
KW - Estimation
KW - Feature extraction
KW - Gaussian Kernel
KW - Kernel
KW - Pedestrian Representation
KW - Person Retrieval
KW - Second-order Polynomial Kernel
KW - Task analysis
KW - Training
KW - Visualization
UR - http://www.scopus.com/inward/record.url?scp=85104651366&partnerID=8YFLogxK
U2 - 10.1109/TPAMI.2021.3073512
DO - 10.1109/TPAMI.2021.3073512
M3 - Article
C2 - 33856981
AN - SCOPUS:85104651366
SN - 0162-8828
VL - 44
SP - 4626
EP - 4641
JO - IEEE Transactions on Pattern Analysis and Machine Intelligence
JF - IEEE Transactions on Pattern Analysis and Machine Intelligence
IS - 9
ER -