Computer Science
Modality
Artificial Intelligence
Natural Language Processing
Sentiment Analysis
Speech Recognition
Authors
Taine Zhao, Ming Kong, Tian Liang, Qiang Zhu, Kun Kuang, Fei Wu
Identifier
DOI:10.1145/3591106.3592296
Abstract
Multi-modal Sentiment Analysis (MSA) is a hotspot of multi-modal fusion. To make full use of the correlation and complementarity between modalities when fusing multi-modal data, we propose a two-stage framework of Contrastive Language-Audio Pre-training (CLAP) for the MSA task: 1) performing contrastive pre-training on large-scale unlabeled external data to yield better single-modal representations; 2) adopting a Transformer-based multi-modal fusion module to further optimize the single-modal features and predict sentiment through task-driven training. Our work demonstrates the importance and necessity of core elements such as pre-training, contrastive learning, and representation learning for the MSA task, and significantly outperforms existing methods on two well-recognized MSA benchmarks.
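The two stages described in the abstract map naturally onto a contrastive alignment objective followed by a fusion head. Below is a minimal PyTorch sketch of that structure, assuming a CLIP-style symmetric InfoNCE loss for stage 1 and one-token-per-modality Transformer fusion for stage 2; all class names, dimensions, and hyperparameters are illustrative assumptions, not the authors' actual implementation.

```python
# Illustrative sketch of a two-stage CLAP-style framework for MSA.
# All module names, dimensions, and hyperparameters are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ContrastivePretrainer(nn.Module):
    """Stage 1: align language and audio embeddings with a symmetric InfoNCE loss."""
    def __init__(self, text_dim=768, audio_dim=512, embed_dim=256):
        super().__init__()
        self.text_proj = nn.Linear(text_dim, embed_dim)    # hypothetical projection heads
        self.audio_proj = nn.Linear(audio_dim, embed_dim)
        # Learnable temperature stored in log space; exp(2.659) ~= 1/0.07.
        self.logit_scale = nn.Parameter(torch.tensor(2.659))

    def forward(self, text_feats, audio_feats):
        # L2-normalize both modalities so the logits are scaled cosine similarities.
        t = F.normalize(self.text_proj(text_feats), dim=-1)
        a = F.normalize(self.audio_proj(audio_feats), dim=-1)
        logits = self.logit_scale.exp() * t @ a.T            # (batch, batch) similarity matrix
        targets = torch.arange(len(t), device=t.device)      # matched pairs lie on the diagonal
        # Symmetric cross-entropy: text->audio and audio->text directions.
        return (F.cross_entropy(logits, targets) + F.cross_entropy(logits.T, targets)) / 2

class FusionPredictor(nn.Module):
    """Stage 2: Transformer-based fusion of the pre-trained single-modal features."""
    def __init__(self, embed_dim=256, n_heads=4, n_layers=2):
        super().__init__()
        layer = nn.TransformerEncoderLayer(embed_dim, n_heads, batch_first=True)
        self.fusion = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(embed_dim, 1)                  # scalar sentiment score

    def forward(self, text_emb, audio_emb):
        # Treat each modality embedding as one token; self-attention fuses them.
        tokens = torch.stack([text_emb, audio_emb], dim=1)   # (batch, 2, embed_dim)
        fused = self.fusion(tokens).mean(dim=1)              # pool over modality tokens
        return self.head(fused)
```

Averaging the two cross-entropy directions is the standard choice for CLIP-style contrastive pre-training; under this reading, stage 2 consumes the projections learned in stage 1 and fine-tunes them with the task-driven sentiment objective.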