Abstract
In multimedia recommender systems, the rich multimodal dynamics of user-item interactions are worth exploiting, and Graph Convolutional Networks (GCNs) have made this easier. Yet GCN-based models typically perform multimodal fusion in one of two ways: graph mergence fusion, which captures insufficient inter-modal dynamics, or node alignment fusion, which introduces noise that can harm multimodal modelling. Unlike existing works, we propose EgoGCN, a structure that seeks to enhance multimodal learning of user-item interactions. At its core is a simple yet effective fusion operation dubbed EdGe-wise mOdulation (EGO) fusion. EGO fusion adaptively distils edge-wise multimodal information and learns to modulate each unimodal node under the supervision of other modalities. It breaks up isolated unimodal propagation and allows the most informative inter-modal messages to spread, whilst preserving intra-modal processing. We present a hard modulation and a soft modulation to fully investigate the underlying multimodal dynamics. Experiments on two real-world datasets show that EgoGCN comfortably beats prior methods.
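The abstract does not give the formulas behind EGO fusion, so the following is only a rough sketch of what edge-wise hard and soft modulation could look like, not the authors' implementation. Everything in it is an assumption made for illustration: the dot-product scoring, the mean aggregation, the additive combination of intra-modal and inter-modal messages, and the names `edge_message` and `ego_update`.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def edge_message(target_vec, neighbor_feats, own_modality, mode="soft"):
    """Inter-modal message carried by one edge (hypothetical formulation).

    neighbor_feats maps a modality name to that neighbor's feature vector.
    The other modalities are scored against the target node by dot product
    (an assumption); "hard" keeps only the best-scoring modality, "soft"
    blends all of them with softmax weights.
    """
    others = [k for k in neighbor_feats if k != own_modality]
    scores = [dot(target_vec, neighbor_feats[k]) for k in others]
    if mode == "hard":
        best = others[scores.index(max(scores))]
        return list(neighbor_feats[best])
    weights = softmax(scores)
    return [
        sum(w * neighbor_feats[k][i] for w, k in zip(weights, others))
        for i in range(len(target_vec))
    ]


def ego_update(target_vec, neighbors, own_modality, mode="soft"):
    """Intra-modal mean over neighbors plus the mean of per-edge inter-modal messages."""
    dim, n = len(target_vec), len(neighbors)
    intra = [sum(nb[own_modality][i] for nb in neighbors) / n for i in range(dim)]
    msgs = [edge_message(target_vec, nb, own_modality, mode) for nb in neighbors]
    inter = [sum(m[i] for m in msgs) / n for i in range(dim)]
    return [a + b for a, b in zip(intra, inter)]


# Toy example: one neighbor with three made-up modality embeddings.
neighbors = [{"visual": [1.0, 0.0], "textual": [0.0, 1.0], "acoustic": [1.0, 1.0]}]
hard = ego_update([1.0, 0.0], neighbors, "visual", mode="hard")
soft = ego_update([1.0, 0.0], neighbors, "visual", mode="soft")
```

In this toy setup the intra-modal path is left untouched, while each edge decides separately which other modality's message to pass on. That is one plausible reading of "preserving intra-modal processing" while letting the most informative inter-modal messages spread.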
Original language | English |
---|---|
Title of host publication | Proceedings of the 30th ACM International Conference on Multimedia |
Editors | Marco Bertini, Klaus Schoeffmann |
Place of Publication | New York, NY, USA |
Publisher | Association for Computing Machinery (ACM) |
Pages | 385-394 |
Number of pages | 10 |
ISBN (Electronic) | 9781450392037 |
Publication status | Published - 2022 |
Externally published | Yes |
Event | ACM International Conference on Multimedia 2022, Lisbon, Portugal; Duration: 10 Oct 2022 → 14 Oct 2022; Conference number: 30th; Proceedings: https://dl.acm.org/doi/proceedings/10.1145/3503161; Website: https://2022.acmmm.org/ |
Conference
Conference | ACM International Conference on Multimedia 2022 |
---|---|
Abbreviated title | MM 2022 |
Country/Territory | Portugal |
City | Lisbon |
Period | 10/10/22 → 14/10/22 |
Keywords
- graph fusion
- multimedia recommendation
- multimodal dynamics