Modality and component aware feature fusion for RGB-D scene classification

Anran Wang, Jianfei Cai, Jiwen Lu, Tat-Jen Cham

Research output: Chapter in Book/Report/Conference proceeding › Conference Paper › Research › peer-review

36 Citations (Scopus)

Abstract

While convolutional neural networks (CNN) have been excellent for object recognition, the greater spatial variability in scene images typically meant that the standard full-image CNN features are suboptimal for scene classification. In this paper, we investigate a framework allowing greater spatial flexibility, in which the Fisher vector (FV) encoded distribution of local CNN features, obtained from a multitude of region proposals per image, is considered instead. The CNN features are computed from an augmented pixel-wise representation comprising multiple modalities of RGB, HHA and surface normals, as extracted from RGB-D data. More significantly, we make two postulates: (1) component sparsity - that only a small variety of region proposals and their corresponding FV GMM components contribute to scene discriminability, and (2) modal non-sparsity - within these discriminative components, all modalities have important contribution. In our framework, these are implemented through regularization terms applying group lasso to GMM components and exclusive group lasso across modalities. By learning and combining regressors for both proposal-based FV features and global CNN features, we were able to achieve state-of-the-art scene classification performance on the SUNRGBD Dataset and NYU Depth Dataset V2.
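
The two regularization terms named in the abstract can be illustrated with a minimal sketch. It assumes the common textbook forms of the penalties: group lasso as the sum of L2 norms over groups, and exclusive group lasso as the sum of squared L1 norms over groups. The function names, the toy weight layout, and the way weights are grouped by GMM component and by modality are hypothetical choices for illustration; the abstract does not give the paper's exact formulation.

```python
import numpy as np


def group_lasso_penalty(w, groups):
    """Sum of L2 norms over index groups; drives whole groups to zero."""
    return float(sum(np.linalg.norm(w[idx]) for idx in groups))


def exclusive_group_lasso_penalty(w, groups):
    """Sum of squared L1 norms over index groups.

    Sparsity is encouraged inside each group, while the squared sum
    discourages concentrating all weight in one group.
    """
    return float(sum(np.abs(w[idx]).sum() ** 2 for idx in groups))


# Hypothetical toy layout: 2 GMM components x 3 modalities, one weight each.
# Flat index = component * 3 + modality.
w = np.array([3.0, 4.0, 0.0, 0.0, 0.0, 0.0])

# Assumed grouping: one group per component, one group per modality.
component_groups = [[0, 1, 2], [3, 4, 5]]
modality_groups = [[0, 3], [1, 4], [2, 5]]

component_term = group_lasso_penalty(w, component_groups)
modality_term = exclusive_group_lasso_penalty(w, modality_groups)
print(component_term, modality_term)
```

In this toy vector only the first component carries weight, so the group lasso term counts a single active group, while the exclusive term is computed separately for each modality group.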

Original language: English
Title of host publication: Proceedings - 29th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016
Editors: Lourdes Agapito, Tamara Berg, Jana Kosecka, Lihi Zelnik-Manor
Place of Publication: Piscataway NJ USA
Publisher: IEEE, Institute of Electrical and Electronics Engineers
Pages: 5995-6004
Number of pages: 10
ISBN (Electronic): 9781467388504, 9781467388511
ISBN (Print): 9781467388528
Publication status: Published - 2016
Externally published: Yes
Event: IEEE Conference on Computer Vision and Pattern Recognition 2016 - Las Vegas, United States of America
Duration: 27 Jun 2016 to 30 Jun 2016
http://cvpr2016.thecvf.com/
https://ieeexplore.ieee.org/xpl/conhome/7776647/proceeding (Proceedings)

Conference

Conference: IEEE Conference on Computer Vision and Pattern Recognition 2016
Abbreviated title: CVPR 2016
Country: United States of America
City: Las Vegas
Period: 27/06/16 to 30/06/16