Enriched deep recurrent visual attention model for multiple object recognition

Artsiom Ablavatski, Shijian Lu, Jianfei Cai

Research output: Chapter in Book/Report/Conference proceeding › Conference Paper › Research › peer-review

27 Citations (Scopus)


We design an Enriched Deep Recurrent Visual Attention Model (EDRAM), an improved attention-based architecture for multiple object recognition. The proposed model is a fully differentiable unit that can be optimized end-to-end with Stochastic Gradient Descent (SGD). A Spatial Transformer (ST) is employed as the visual attention mechanism, which learns the geometric transformation of objects within images. By combining the Spatial Transformer with a powerful recurrent architecture, the proposed EDRAM can localize and recognize objects simultaneously. EDRAM has been evaluated on two publicly available datasets: MNIST Cluttered (with 70k cluttered digits) and SVHN (with up to 250k real-world images of house numbers). Experiments show that it obtains superior performance compared with state-of-the-art models.
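The attention mechanism the abstract refers to can be illustrated with a minimal NumPy sketch of the Spatial Transformer's core operation: an affine transformation generates a sampling grid over the input, and bilinear interpolation extracts a glimpse at those coordinates. This is an illustrative sketch of the general ST technique, not the authors' implementation; function names are hypothetical.

```python
import numpy as np

def affine_grid(theta, H, W):
    """Map normalized output coordinates in [-1, 1] through a 2x3 affine theta."""
    ys, xs = np.meshgrid(np.linspace(-1, 1, H), np.linspace(-1, 1, W), indexing="ij")
    coords = np.stack([xs.ravel(), ys.ravel(), np.ones(H * W)])  # (3, H*W) homogeneous
    src = theta @ coords                                         # (2, H*W) source coords
    return src[0].reshape(H, W), src[1].reshape(H, W)

def bilinear_sample(img, sx, sy):
    """Sample img at normalized source coordinates (sx, sy) with bilinear weights."""
    H, W = img.shape
    # Convert normalized [-1, 1] coordinates back to pixel indices
    x = (sx + 1) * (W - 1) / 2
    y = (sy + 1) * (H - 1) / 2
    x0 = np.clip(np.floor(x).astype(int), 0, W - 2)
    y0 = np.clip(np.floor(y).astype(int), 0, H - 2)
    wx, wy = x - x0, y - y0
    return ((1 - wx) * (1 - wy) * img[y0, x0]
            + wx * (1 - wy) * img[y0, x0 + 1]
            + (1 - wx) * wy * img[y0 + 1, x0]
            + wx * wy * img[y0 + 1, x0 + 1])

# Identity transform: the extracted glimpse reproduces the input patch exactly,
# which is a common sanity check for a Spatial Transformer module.
theta = np.array([[1.0, 0.0, 0.0],
                  [0.0, 1.0, 0.0]])
img = np.arange(16, dtype=float).reshape(4, 4)
sx, sy = affine_grid(theta, 4, 4)
glimpse = bilinear_sample(img, sx, sy)
```

Because the grid and sampler are built from differentiable operations, gradients flow from the recognition loss back into the transformation parameters, which is what allows the whole model to be trained end-to-end with SGD as the abstract states.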

Original language: English
Title of host publication: Proceedings - 2017 IEEE Winter Conference on Applications of Computer Vision, WACV 2017
Editors: Michael S. Brown, Rogério Feris, Conrad Sanderson, Matthew Turk
Place of publication: Piscataway NJ USA
Publisher: IEEE, Institute of Electrical and Electronics Engineers
Number of pages: 8
ISBN (Electronic): 9781509048229
ISBN (Print): 9781509048236
Publication status: Published - 2017
Externally published: Yes
Event: IEEE Winter Conference on Applications of Computer Vision 2017 - Santa Rosa, United States of America
Duration: 24 Mar 2017 - 31 Mar 2017
https://ieeexplore.ieee.org/xpl/conhome/7925475/proceeding (Proceedings)


Conference: IEEE Winter Conference on Applications of Computer Vision 2017
Abbreviated title: WACV 2017
Country/Territory: United States of America
City: Santa Rosa
