Identifying regions of interest in an image has long been of great importance in a wide range of tasks, including place recognition. In this letter, we propose a novel attention mechanism with flexible context, which can be incorporated into existing feedforward network architecture to learn image representations for long-term place recognition. In particular, in order to focus on regions that contribute positively to place recognition, we introduce a multiscale context-flexible network to estimate the importance of each spatial region in the feature map. Our model is trained end-to-end for place recognition and can detect regions of interest of arbitrary shape. Extensive experiments have been conducted to verify the effectiveness of our approach and the results demonstrate that our model can achieve consistently better performance than the state of the art on standard benchmark datasets. Finally, we visualize the learned attention maps to generate insights into what attention the network has learned.
- deep learning in robotics and automation
- visual-based navigation