Attention in Attention Networks for person retrieval

Pengfei Fang, Jieming Zhou, Soumava Kumar Roy, Pan Ji, Lars Petersson, Mehrtash T. Harandi

Research output: Contribution to journal, Article, Research, peer-review

35 Citations (Scopus)

Abstract

This paper generalizes the Attention in Attention (AiA) mechanism proposed in [1] by employing explicit mappings in reproducing kernel Hilbert spaces to generate attention values for the input feature map. The AiA mechanism models inter-dependencies among local and global features through the interaction of inner and outer attention modules. Besides a vanilla AiA module, termed linear attention with AiA, two non-linear counterparts, namely second-order polynomial attention and Gaussian attention, are proposed to exploit the non-linear properties of the input features explicitly, via the second-order polynomial kernel and Gaussian kernel approximation. A deep convolutional neural network equipped with the proposed AiA blocks is referred to as the Attention in Attention Network (AiA-Net). The AiA-Net learns to extract a discriminative pedestrian representation that combines complementary person appearance and corresponding part features. Extensive ablation studies verify the effectiveness of the AiA mechanism and of using the non-linear features hidden in the feature map for attention design. Furthermore, our approach outperforms the current state of the art by a considerable margin across a number of benchmarks. In addition, state-of-the-art performance is also achieved on the video person retrieval task with the assistance of the proposed AiA blocks.
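As a rough illustration of what "explicit mapping in reproducing kernel Hilbert spaces" can mean for the two kernels named above, the sketch below builds an explicit feature map for a homogeneous second-order polynomial kernel and a random Fourier feature map that approximates a Gaussian kernel. This is not the AiA-Net implementation: the function names, the dimensions, the bandwidth `sigma`, and the choice of random Fourier features are all assumptions made for this example, and how the paper turns such features into attention values is not covered here.

```python
import numpy as np

def poly2_features(x):
    # Explicit map for the homogeneous kernel k(x, y) = (x . y)^2:
    # the flattened outer product x x^T (hypothetical helper).
    return np.outer(x, x).ravel()

def gaussian_rff(x, W, b):
    # Random Fourier features approximating
    # k(x, y) = exp(-||x - y||^2 / (2 * sigma^2)) (hypothetical helper).
    num_features = W.shape[1]
    return np.sqrt(2.0 / num_features) * np.cos(x @ W + b)

rng = np.random.default_rng(0)
d, D, sigma = 8, 20000, 1.5          # assumed sizes and bandwidth
x = rng.normal(size=d)
y = x + 0.5 * rng.normal(size=d)     # a nearby second feature vector

# Second-order polynomial kernel: the explicit map is exact.
poly_exact = (x @ y) ** 2
poly_mapped = poly2_features(x) @ poly2_features(y)

# Gaussian kernel: the explicit map is a Monte Carlo approximation.
W = rng.normal(scale=1.0 / sigma, size=(d, D))   # frequencies ~ N(0, I / sigma^2)
b = rng.uniform(0.0, 2.0 * np.pi, size=D)        # random phases
gauss_exact = np.exp(-np.sum((x - y) ** 2) / (2.0 * sigma ** 2))
gauss_approx = gaussian_rff(x, W, b) @ gaussian_rff(y, W, b)

print("poly2   :", poly_exact, poly_mapped)
print("gaussian:", gauss_exact, gauss_approx)
```

In both cases an inner product between mapped vectors stands in for a kernel evaluation, so non-linear similarity can be computed with ordinary linear operations on the mapped features. The polynomial map reproduces its kernel exactly, while the Gaussian map becomes more accurate as the number of random features `D` grows.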

Original language: English
Pages (from-to): 4626-4641
Number of pages: 16
Journal: IEEE Transactions on Pattern Analysis and Machine Intelligence
Volume: 44
Issue number: 9
DOIs
Publication status: Published - 1 Sept 2022

Keywords

  • Attention in Attention Mechanism
  • Benchmark testing
  • Convolutional Neural Network
  • Estimation
  • Feature extraction
  • Gaussian Kernel
  • Kernel
  • Pedestrian Representation
  • Person Retrieval
  • Second-order Polynomial Kernel
  • Task analysis
  • Training
  • Visualization
