Learning local feature representation from matching, clustering and spatial transform

Jianhan Mei, Xudong Jiang, Jianfei Cai

Research output: Contribution to journal › Article › Research › peer-review

2 Citations (Scopus)


This paper focuses on learning local image region representations via deep neural networks. Existing works mainly learn from matched corresponding image patches, which makes the learned feature overly sensitive to individual patch matching results and unable to handle aggregation-based tasks such as image-level retrieval. We therefore propose to use both the matched corresponding image patches and the clustering result as labels for network training. To resolve the inconsistency between the matched correspondences and the clustering results, we propose a semi-supervised iterative training scheme together with a dual margins loss. Moreover, a jointly learned spatial transform prediction network is used to improve the spatial transform invariance of the learned local features. Using SIFT as the label initializer, experimental results show comparable or even better performance than the hand-crafted feature, which sheds light on learning local feature representations in an unsupervised or weakly supervised manner.
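The abstract does not define the dual margins loss, so the following is only an illustrative sketch of one common two-margin pairwise formulation (a double-margin contrastive loss), not the authors' actual loss. The function names, the margin values, and the use of Euclidean distance are all assumptions made for this example. Pairs labeled as similar (for instance, matched patches or members of the same cluster) are penalized only when their descriptor distance exceeds a small margin, and dissimilar pairs only when their distance falls below a larger margin.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dual_margin_loss(pairs, margin_pos=0.5, margin_neg=1.5):
    """Hypothetical double-margin contrastive loss (not the paper's definition).

    pairs: iterable of (desc_a, desc_b, is_similar) triples.
    margin_pos: similar pairs are penalized only beyond this distance (assumed value).
    margin_neg: dissimilar pairs are penalized only within this distance (assumed value).
    Returns the mean squared hinge penalty over all pairs.
    """
    pairs = list(pairs)
    total = 0.0
    for desc_a, desc_b, is_similar in pairs:
        d = euclidean(desc_a, desc_b)
        if is_similar:
            # Pull similar descriptors together, but tolerate small distances.
            total += max(0.0, d - margin_pos) ** 2
        else:
            # Push dissimilar descriptors apart until they clear the outer margin.
            total += max(0.0, margin_neg - d) ** 2
    return total / len(pairs)


example_pairs = [
    ([0.0, 0.0], [0.0, 1.0], True),   # similar pair, distance 1.0
    ([0.0, 0.0], [0.0, 1.0], False),  # dissimilar pair, distance 1.0
]
example_loss = dual_margin_loss(example_pairs)
```

The gap between the two margins leaves a band of distances in which neither kind of pair is penalized, which is one plausible way to tolerate labels that disagree between matching and clustering; whether the paper uses its margins this way is not stated in the abstract.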

Original language: English
Article number: 102601
Number of pages: 10
Journal: Journal of Visual Communication and Image Representation
Publication status: Published - Aug 2019
Externally published: Yes


  • Convolutional Neural Network (CNN)
  • Local feature learning
  • Local image representation
  • Semi-supervised learning
  • Spatial transform
