DualGNN: Dual Graph Neural Network for multimedia recommendation

Qifan Wang, Yinwei Wei, Jianhua Yin, Jianlong Wu, Xuemeng Song, Liqiang Nie

Research output: Contribution to journal > Article > Research > peer-review

64 Citations (Scopus)

Abstract

A key factor in micro-video recommender systems is modeling users' multi-modal preferences for micro-videos. Despite their remarkable performance, prior methods are limited because they fuse the user preferences derived from different modalities in a unified manner, ignoring that users tend to place different emphasis on different modalities. Furthermore, missing modalities are ubiquitous and unavoidable in micro-video recommendation: in many cases some modality information of a micro-video is lacking, which negatively affects multi-modal fusion operations. To overcome these disadvantages, we propose a novel framework for micro-video recommendation, dubbed Dual Graph Neural Network (DualGNN), built upon the user-microvideo bipartite graph and the user co-occurrence graph, which leverages the correlation between users to collaboratively mine the particular fusion pattern for each user. Specifically, we first introduce a single-modal representation learning module, which performs graph operations on the user-microvideo graph in each modality to capture single-modal user preferences. We then devise a multi-modal representation learning module to explicitly model each user's attention over different modalities and inductively learn the multi-modal user preference. Finally, we propose a prediction module to rank the potential micro-videos for users. Extensive experiments on two public datasets demonstrate the significant superiority of DualGNN over state-of-the-art methods.
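
The three stages named in the abstract (single-modal representation learning, per-user attention over modalities, and ranking) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function `dual_graph_sketch`, its parameters, the mean-aggregation propagation step, the fixed 0.5/0.5 mixing with co-occurring users, the averaged item representation, and the assumption that all modalities share one feature dimension are hypothetical simplifications chosen for illustration. The attention logits would be learned in a real model, and missing modalities are not handled here.

```python
import numpy as np


def normalize_rows(a):
    """Row-normalize a non-negative float matrix; all-zero rows stay zero."""
    s = a.sum(axis=1, keepdims=True)
    return np.divide(a, s, out=np.zeros_like(a, dtype=float), where=s > 0)


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)


def dual_graph_sketch(interactions, item_feats, co_occurrence, attn_logits, top_k=2):
    """Illustrative pipeline (hypothetical, not the paper's model).

    interactions : (n_users, n_items) binary user-microvideo matrix
    item_feats   : dict mapping modality name -> (n_items, d) feature matrix
    co_occurrence: (n_users, n_users) non-negative user co-occurrence weights
    attn_logits  : (n_users, n_modalities) per-user modality scores
                   (assumed given here; learned in a real model)
    """
    modalities = sorted(item_feats)
    r = normalize_rows(interactions.astype(float))

    # 1) Single-modal step: in each modality, a user's preference is the
    #    mean feature vector of the micro-videos the user interacted with.
    user_single = np.stack([r @ item_feats[m] for m in modalities], axis=1)

    # 2) Multi-modal step: each user has their own attention weights over
    #    modalities, so fusion is not uniform across users.
    weights = softmax(attn_logits, axis=1)
    user_fused = (weights[:, :, None] * user_single).sum(axis=1)

    #    Mix each user's fused vector with those of co-occurring users.
    c = normalize_rows(co_occurrence.astype(float))
    user_final = 0.5 * user_fused + 0.5 * (c @ user_fused)

    # 3) Prediction step: score by inner product, mask seen items, rank.
    item_repr = np.mean([item_feats[m] for m in modalities], axis=0)
    scores = user_final @ item_repr.T
    scores = np.where(interactions > 0, -np.inf, scores)
    ranking = np.argsort(-scores, axis=1)[:, :top_k]
    return weights, scores, ranking
```

Calling `dual_graph_sketch` with a small binary interaction matrix, a dictionary of per-modality feature matrices, and a symmetric co-occurrence matrix returns the per-user modality weights, the masked score matrix, and the top-k unseen micro-videos for each user.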

Original language: English
Pages (from-to): 1074-1084
Number of pages: 11
Journal: IEEE Transactions on Multimedia
Volume: 25
Publication status: Published - 24 Dec 2023
Externally published: Yes

Keywords

  • graph neural network
  • micro-video recommender systems
  • multi-modal fusion
  • representation learning
