MMSS: multi-modal sharable and specific feature learning for RGB-D object recognition

Anran Wang, Jianfei Cai, Jiwen Lu, Tat-Jen Cham

Research output: Chapter in Book/Report/Conference proceeding, Conference Paper, Research, peer-review

51 Citations (Scopus)

Abstract

Most of the feature-learning methods for RGB-D object recognition either learn features from color and depth modalities separately, or simply treat RGB-D as undifferentiated four-channel data, which cannot adequately exploit the relationship between different modalities. Motivated by the intuition that different modalities should contain not only some modal-specific patterns but also some shared common patterns, we propose a multi-modal feature learning framework for RGB-D object recognition. We first construct deep CNN layers for color and depth separately, and then connect them with our carefully designed multi-modal layers, which fuse color and depth information by enforcing a common part to be shared by features of different modalities. In this way, we obtain features reflecting shared properties as well as modal-specific properties in different modalities. The information of the multi-modal learning frameworks is back-propagated to the early CNN layers. Experimental results show that our proposed multi-modal feature learning method outperforms state-of-the-art approaches on two widely used RGB-D object benchmark datasets.
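The idea of a common part shared by the features of both modalities, alongside modal-specific parts, can be illustrated with a minimal NumPy sketch. This is not the formulation used in the paper: the function names (`split_features`, `shared_part_penalty`, `fuse`), the choice of the first `n_shared` dimensions as the shared part, the squared-difference penalty, and the averaging step are all assumptions made here for illustration only.

```python
import numpy as np


def split_features(feat, n_shared):
    """Split a (batch, dim) feature matrix into a shared part and a
    modal-specific part. Treating the first n_shared columns as the
    shared part is an illustrative assumption."""
    return feat[:, :n_shared], feat[:, n_shared:]


def shared_part_penalty(color_feat, depth_feat, n_shared):
    """Mean squared difference between the shared parts of the color and
    depth features. Minimizing it pushes the two shared parts together
    (a hypothetical stand-in for 'enforcing a common part')."""
    color_shared, _ = split_features(color_feat, n_shared)
    depth_shared, _ = split_features(depth_feat, n_shared)
    return float(np.mean((color_shared - depth_shared) ** 2))


def fuse(color_feat, depth_feat, n_shared):
    """Build a fused descriptor: the averaged shared part followed by the
    color-specific and depth-specific parts."""
    color_shared, color_specific = split_features(color_feat, n_shared)
    depth_shared, depth_specific = split_features(depth_feat, n_shared)
    shared = 0.5 * (color_shared + depth_shared)
    return np.concatenate([shared, color_specific, depth_specific], axis=1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    color = rng.normal(size=(4, 6))  # stand-in for color CNN features
    depth = rng.normal(size=(4, 6))  # stand-in for depth CNN features
    print("penalty:", shared_part_penalty(color, depth, n_shared=2))
    print("fused shape:", fuse(color, depth, n_shared=2).shape)
```

In a trainable network, a penalty of this kind would be added to the recognition loss so that its gradient flows back into both CNN streams, which is the role the abstract assigns to back-propagation from the multi-modal layers.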

Original language: English
Title of host publication: Proceedings - 2015 IEEE International Conference on Computer Vision, ICCV 2015
Editors: Katsushi Ikeuchi, Christoph Schnörr, Josef Sivic, René Vidal
Place of Publication: Piscataway NJ USA
Publisher: IEEE, Institute of Electrical and Electronics Engineers
Pages: 1125-1133
Number of pages: 9
ISBN (Electronic): 9781467383912, 9781467383905
DOIs: https://doi.org/10.1109/ICCV.2015.134
Publication status: Published - 2015
Externally published: Yes
Event: IEEE International Conference on Computer Vision 2015 - Santiago, Chile
Duration: 11 Dec 2015 to 18 Dec 2015
Conference number: 15th

Conference

Conference: IEEE International Conference on Computer Vision 2015
Abbreviated title: ICCV 2015
Country: Chile
City: Santiago
Period: 11/12/15 to 18/12/15

Cite this

Wang, A., Cai, J., Lu, J., & Cham, T-J. (2015). MMSS: multi-modal sharable and specific feature learning for RGB-D object recognition. In K. Ikeuchi, C. Schnörr, J. Sivic, & R. Vidal (Eds.), Proceedings - 2015 IEEE International Conference on Computer Vision, ICCV 2015 (pp. 1125-1133). [7410491] IEEE, Institute of Electrical and Electronics Engineers. https://doi.org/10.1109/ICCV.2015.134