TY - JOUR
T1 - Multi-level view associative convolution network for view-based 3D model retrieval
AU - Gao, Zan
AU - Zhang, Yan
AU - Zhang, Hua
AU - Guan, Weili
AU - Feng, Dong
AU - Chen, Shengyong
N1 - Funding Information:
This work was supported in part by the National Natural Science Foundation of China under Grant 61872270, Grant 62020106004, Grant 92048301, and Grant 61572357; in part by the Young Creative Team in Universities of Shandong Province under Grant 2020KJN012; in part by the Jinan 20 Projects in Universities under Grant 2020GXRC040 and Grant 2018GXRC014; in part by the Tianjin New Generation Artificial Intelligence Major Program under Grant 19ZXZNGX00110; in part by the New Artificial Intelligence (AI) Project Toward the Integration of Education and Industry in Qilu University of Technology (QLUT) under Grant 2020KJC-JC01; and in part by the Shandong Provincial Key Research and Development Program under Grant 2019TSLH0202.
Publisher Copyright:
© 1991-2012 IEEE.
PY - 2022/4
Y1 - 2022/4
N2 - With the continuous improvement of image processing capabilities, the three-dimensional (3D) model, which can contain rich information, is becoming the fourth type of multimedia data (in addition to sound, image, and video). Moreover, since 3D models have a wide range of applications, quickly and effectively retrieving the correct target model from massive data has become a key issue. To date, many 3D model retrieval approaches have been proposed, and among them, view-based methods can achieve satisfactory performance. In the 3D model retrieval task, mining the latent relationships among all the images of a 3D model, adaptively fusing different images, and extracting discriminative features are the main challenges, but most existing solutions address these tasks separately rather than exploring them in an end-to-end network architecture. To solve these issues, in this work, we propose a novel and effective multi-level view associative convolution network (MLVACN) for view-based 3D model retrieval, in which the relationship exploration of multiple view images, the fusion of different images, and discriminative feature learning are realized in a unified end-to-end framework. Specifically, we design a group association layer and a block association layer to study the latent relationships among different views at the view level and the block level, respectively. Moreover, a weight fusion layer is further designed to adaptively fuse the different views of a 3D model. These three layers are embedded into the MLVACN. Finally, a pairwise discrimination loss function is proposed to learn discriminative features of the 3D model. Extensive experimental results on three 3D model retrieval datasets, ModelNet40, ModelNet10, and ShapeNetCore55, demonstrate that MLVACN outperforms state-of-the-art methods in terms of mAP. On the ModelNet40 dataset, the mAP of MLVACN is improved by 13.25%, 7.75%, 3.95%, and 0.61% compared to those of the MVCNN, GVCNN, PVNet, and MLVCNN methods, respectively.
AB - With the continuous improvement of image processing capabilities, the three-dimensional (3D) model, which can contain rich information, is becoming the fourth type of multimedia data (in addition to sound, image, and video). Moreover, since 3D models have a wide range of applications, quickly and effectively retrieving the correct target model from massive data has become a key issue. To date, many 3D model retrieval approaches have been proposed, and among them, view-based methods can achieve satisfactory performance. In the 3D model retrieval task, mining the latent relationships among all the images of a 3D model, adaptively fusing different images, and extracting discriminative features are the main challenges, but most existing solutions address these tasks separately rather than exploring them in an end-to-end network architecture. To solve these issues, in this work, we propose a novel and effective multi-level view associative convolution network (MLVACN) for view-based 3D model retrieval, in which the relationship exploration of multiple view images, the fusion of different images, and discriminative feature learning are realized in a unified end-to-end framework. Specifically, we design a group association layer and a block association layer to study the latent relationships among different views at the view level and the block level, respectively. Moreover, a weight fusion layer is further designed to adaptively fuse the different views of a 3D model. These three layers are embedded into the MLVACN. Finally, a pairwise discrimination loss function is proposed to learn discriminative features of the 3D model. Extensive experimental results on three 3D model retrieval datasets, ModelNet40, ModelNet10, and ShapeNetCore55, demonstrate that MLVACN outperforms state-of-the-art methods in terms of mAP. On the ModelNet40 dataset, the mAP of MLVACN is improved by 13.25%, 7.75%, 3.95%, and 0.61% compared to those of the MVCNN, GVCNN, PVNet, and MLVCNN methods, respectively.
KW - adaptive weight fusion
KW - block association layer
KW - group association layer
KW - multi-level
KW - pairwise discrimination loss
KW - View-based 3D model retrieval
UR - http://www.scopus.com/inward/record.url?scp=85112471828&partnerID=8YFLogxK
U2 - 10.1109/TCSVT.2021.3091581
DO - 10.1109/TCSVT.2021.3091581
M3 - Article
AN - SCOPUS:85112471828
SN - 1051-8215
VL - 32
SP - 2264
EP - 2278
JO - IEEE Transactions on Circuits and Systems for Video Technology
JF - IEEE Transactions on Circuits and Systems for Video Technology
IS - 4
ER -