TY - JOUR
T1 - MGAT
T2 - Multimodal Graph Attention Network for recommendation
AU - Tao, Zhulin
AU - Wei, Yinwei
AU - Wang, Xiang
AU - He, Xiangnan
AU - Huang, Xianglin
AU - Chua, Tat-Seng
N1 - Funding Information:
This research is part of NExT++ research and is also supported by the National Research Foundation Singapore under its AI Singapore Programme, Linksure Network Holding Pte Ltd, and the Asia Big Data Association (Award No.: AISG-100E-2018-002). NExT++ is supported by the National Research Foundation, Prime Minister's Office, Singapore under its IRC@SG Funding Initiative. This research is also supported by the National Key Research and Development Program of China (No. 2019YFB1406201) and the Future School Program (No. CSDP17FS3231).
Publisher Copyright:
© 2020 Elsevier Ltd
PY - 2020/9
Y1 - 2020/9
N2 - Graph neural networks (GNNs) have shown great potential for personalized recommendation. The core idea is to reorganize interaction data as a user-item bipartite graph and exploit high-order connectivity among user and item nodes to enrich their representations. While achieving great success, most existing works consider interaction graphs based only on ID information, forgoing item contents from multiple modalities (e.g., visual, acoustic, and textual features of micro-video items). Distinguishing personal interests on different modalities at a granular level was not explored until the recently proposed MMGCN (Wei et al., 2019). However, it simply employs GNNs on parallel interaction graphs and treats information propagated from all neighbors equally, failing to capture user preference adaptively. Hence, the obtained representations might preserve redundant, even noisy information, leading to non-robustness and suboptimal performance. In this work, we investigate how to adopt GNNs on multimodal interaction graphs to adaptively capture user preference on different modalities and offer in-depth analysis of why an item is suitable for a user. To this end, we propose a new Multimodal Graph Attention Network, MGAT for short, which disentangles personal interests at the granularity of modality. In particular, built upon multimodal interaction graphs, MGAT conducts information propagation within individual graphs, while leveraging a gated attention mechanism to identify the varying importance of different modalities to user preference. As such, it is able to capture more complex interaction patterns hidden in user behaviors and provide more accurate recommendations. Empirical results on two micro-video recommendation datasets, Tiktok and MovieLens, show that MGAT exhibits substantial improvements over state-of-the-art baselines such as NGCF (Wang, He, et al., 2019) and MMGCN (Wei et al., 2019). Further analysis of a case study illustrates how MGAT generates attentive information flow over multimodal interaction graphs.
AB - Graph neural networks (GNNs) have shown great potential for personalized recommendation. The core idea is to reorganize interaction data as a user-item bipartite graph and exploit high-order connectivity among user and item nodes to enrich their representations. While achieving great success, most existing works consider interaction graphs based only on ID information, forgoing item contents from multiple modalities (e.g., visual, acoustic, and textual features of micro-video items). Distinguishing personal interests on different modalities at a granular level was not explored until the recently proposed MMGCN (Wei et al., 2019). However, it simply employs GNNs on parallel interaction graphs and treats information propagated from all neighbors equally, failing to capture user preference adaptively. Hence, the obtained representations might preserve redundant, even noisy information, leading to non-robustness and suboptimal performance. In this work, we investigate how to adopt GNNs on multimodal interaction graphs to adaptively capture user preference on different modalities and offer in-depth analysis of why an item is suitable for a user. To this end, we propose a new Multimodal Graph Attention Network, MGAT for short, which disentangles personal interests at the granularity of modality. In particular, built upon multimodal interaction graphs, MGAT conducts information propagation within individual graphs, while leveraging a gated attention mechanism to identify the varying importance of different modalities to user preference. As such, it is able to capture more complex interaction patterns hidden in user behaviors and provide more accurate recommendations. Empirical results on two micro-video recommendation datasets, Tiktok and MovieLens, show that MGAT exhibits substantial improvements over state-of-the-art baselines such as NGCF (Wang, He, et al., 2019) and MMGCN (Wei et al., 2019). Further analysis of a case study illustrates how MGAT generates attentive information flow over multimodal interaction graphs.
KW - Attention mechanism
KW - Gate mechanism
KW - Graph
KW - Micro-videos
KW - Personalized recommendation
UR - https://www.scopus.com/pages/publications/85084477591
U2 - 10.1016/j.ipm.2020.102277
DO - 10.1016/j.ipm.2020.102277
M3 - Article
AN - SCOPUS:85084477591
SN - 0306-4573
VL - 57
JO - Information Processing and Management
JF - Information Processing and Management
IS - 5
M1 - 102277
ER -