Abstract
This work proposes a novel and simple sequential learning strategy for training models on videos and texts for multimodal sentiment analysis. To estimate sentiment polarities on unseen out-of-distribution data, we introduce a multimodal model that is trained with this strategy in either a single source domain or multiple source domains. The strategy first learns domain-invariant features from the text and then learns sparse domain-agnostic features from the videos, assisted by the features selected from the text. Our feature selection procedure favors features that are independent of each other and strongly correlated with their polarity labels. Our experimental results show that, on average, our model achieves significantly better performance than state-of-the-art approaches in both single-source and multi-source settings. To facilitate research on this topic, the source code of this work will be made publicly available upon acceptance.
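The abstract states only the selection criterion: keep features that are mutually independent and strongly correlated with the polarity labels. The procedure itself is not described here. As a rough illustration of that criterion, the sketch below assumes a greedy mRMR-style (minimum redundancy, maximum relevance) rule, with absolute Pearson correlation standing in for both relevance to the labels and redundancy between features. This is a swapped-in, well-known technique, not the authors' method. The names `pearson` and `select_features` and the toy data are hypothetical.

```python
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length sequences; 0.0 if either is constant."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0.0 or vy == 0.0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def select_features(columns: List[List[float]], labels: List[float], k: int) -> List[int]:
    """Hypothetical greedy relevance-minus-redundancy selector (mRMR-style).

    relevance(j)  = |corr(feature_j, labels)|
    redundancy(j) = mean |corr(feature_j, feature_s)| over already selected s
    Each step picks the remaining feature with the highest relevance - redundancy;
    ties go to the lowest feature index.
    """
    relevance = [abs(pearson(col, labels)) for col in columns]
    selected: List[int] = []
    remaining = set(range(len(columns)))
    while remaining and len(selected) < k:
        def score(j: int) -> float:
            if not selected:
                return relevance[j]
            redundancy = sum(abs(pearson(columns[j], columns[s])) for s in selected) / len(selected)
            return relevance[j] - redundancy

        best = max(sorted(remaining), key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data (hypothetical): labels = feature 0 + feature 2.
columns = [
    [1, -1, 1, -1],  # feature 0
    [2, -2, 2, -2],  # feature 1: rescaled copy of feature 0, fully redundant with it
    [1, 1, -1, -1],  # feature 2: uncorrelated with feature 0
]
labels = [2, 0, 0, -2]
print(select_features(columns, labels, k=2))  # -> [0, 2]
```

All three features are equally correlated with the labels, but feature 1 duplicates feature 0, so the redundancy penalty makes the selector prefer feature 2. Zero correlation is weaker than statistical independence, so a closer match to the stated criterion would replace `pearson` with a dependence measure such as mutual information.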
Original language | English |
---|---|
Title of host publication | Proceedings of the 32nd ACM International Conference on Multimedia |
Editors | Yadan Luo, Toan Do, Yan Yan |
Place of Publication | New York, NY, USA |
Publisher | Association for Computing Machinery (ACM) |
Pages | 9729-9738 |
Number of pages | 10 |
ISBN (Electronic) | 9798400706868 |
Publication status | Published - 2024 |
Event | ACM International Conference on Multimedia 2024, Melbourne, Australia; Duration: 28 Oct 2024 → 1 Nov 2024; Conference number: 32nd; Proceedings: https://dl.acm.org/doi/book/10.1145/3664647; Website: https://2024.acmmm.org/ |
Conference
Conference | ACM International Conference on Multimedia 2024 |
---|---|
Abbreviated title | MM 2024 |
Country/Territory | Australia |
City | Melbourne |
Period | 28/10/24 → 1/11/24 |
Keywords
- causal inference
- feature selection
- MSA (multimodal sentiment analysis)
- OOD (out-of-distribution)