TY - JOUR
T1 - DualGNN: Dual Graph Neural Network for multimedia recommendation
AU - Wang, Qifan
AU - Wei, Yinwei
AU - Yin, Jianhua
AU - Wu, Jianlong
AU - Song, Xuemeng
AU - Nie, Liqiang
N1 - Funding Information:
This work was supported in part by the National Natural Science Foundation of China under Grants 62172261 and 61802231 and in part by the Shandong Provincial Natural Science Foundation under Grant ZR2019QF001.
Publisher Copyright:
© 1999-2012 IEEE.
PY - 2023/12/24
Y1 - 2023/12/24
N2 - One of the key factors in micro-video recommender systems is modeling users' multi-modal preferences on micro-videos. Despite the remarkable performance of prior arts, they are still limited by fusing the user preferences derived from different modalities in a unified manner, ignoring that users tend to place different emphasis on different modalities. Furthermore, missing modalities are ubiquitous and unavoidable in micro-video recommendation: some modality information of micro-videos is absent in many cases, which negatively affects multi-modal fusion operations. To overcome these disadvantages, we propose a novel framework for micro-video recommendation, dubbed Dual Graph Neural Network (DualGNN), built upon the user-microvideo bipartite graph and the user co-occurrence graph, which leverages the correlations between users to collaboratively mine the particular fusion pattern for each user. Specifically, we first introduce a single-modal representation learning module, which performs graph operations on the user-microvideo graph in each modality to capture single-modal user preferences. We then devise a multi-modal representation learning module to explicitly model each user's attention over different modalities and inductively learn the multi-modal user preference. Finally, we propose a prediction module to rank candidate micro-videos for users. Extensive experiments on two public datasets demonstrate the significant superiority of our DualGNN over state-of-the-art methods.
AB - One of the key factors in micro-video recommender systems is modeling users' multi-modal preferences on micro-videos. Despite the remarkable performance of prior arts, they are still limited by fusing the user preferences derived from different modalities in a unified manner, ignoring that users tend to place different emphasis on different modalities. Furthermore, missing modalities are ubiquitous and unavoidable in micro-video recommendation: some modality information of micro-videos is absent in many cases, which negatively affects multi-modal fusion operations. To overcome these disadvantages, we propose a novel framework for micro-video recommendation, dubbed Dual Graph Neural Network (DualGNN), built upon the user-microvideo bipartite graph and the user co-occurrence graph, which leverages the correlations between users to collaboratively mine the particular fusion pattern for each user. Specifically, we first introduce a single-modal representation learning module, which performs graph operations on the user-microvideo graph in each modality to capture single-modal user preferences. We then devise a multi-modal representation learning module to explicitly model each user's attention over different modalities and inductively learn the multi-modal user preference. Finally, we propose a prediction module to rank candidate micro-videos for users. Extensive experiments on two public datasets demonstrate the significant superiority of our DualGNN over state-of-the-art methods.
KW - graph neural network
KW - micro-video recommender systems
KW - multi-modal fusion
KW - representation learning
UR - http://www.scopus.com/inward/record.url?scp=85122068240&partnerID=8YFLogxK
U2 - 10.1109/TMM.2021.3138298
DO - 10.1109/TMM.2021.3138298
M3 - Article
AN - SCOPUS:85122068240
SN - 1520-9210
VL - 25
SP - 1074
EP - 1084
JO - IEEE Transactions on Multimedia
JF - IEEE Transactions on Multimedia
ER -