Authors
Wenxiu Geng,Xiangxian Li,Yu Bian
Identifier
DOI:10.1145/3591106.3592260
Abstract
Multimodal sentiment analysis is a complex research problem. First, current multimodal approaches fail to adequately consider the intricate multi-level correspondence between modalities and the unique contextual information within each modality; second, cross-modal fusion methods weaken modality-specific internal features, a limitation of traditional single-branch models. To this end, we propose a dual-branch enhanced multi-task learning network (DBEM), a new architecture for multimodal sentiment analysis that considers both the multiple dependencies of sequences and the heterogeneity of multimodal data. The global-local branch models intra-modal dependencies over time subsequences of different lengths and aggregates global and local features to enrich feature diversity. The cross-refine branch accounts for the differing information densities of the modalities and adopts coarse-to-fine fusion learning to model inter-modal dependencies: coarse-grained fusion reinforces the low-level features of the audio and visual modalities, while fine-grained fusion improves the integration of complementary information across modality levels. Finally, multi-task learning on the enhanced fusion features from the dual-branch network improves the generalization and performance of the model. Experimental results on the CH-SIMS and CMU-MOSEI datasets validate the effectiveness of DBEM against both a single-branch variant (SBEM) and SOTA methods.
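The dual-branch idea described above can be sketched in a few lines. This is a minimal illustrative toy, not the authors' implementation: the feature dimensions, the mean-pooling aggregation, the window size, and the mixing weight `alpha` are all assumptions made for the sketch; the actual DBEM model uses learned networks for both branches.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy features: one (seq_len, dim) array per modality.
T, D = 8, 4
text, audio, visual = (rng.standard_normal((T, D)) for _ in range(3))

def global_local(x, window=2):
    """Global-local branch (toy): summarize the full sequence (global)
    and short fixed-length subsequences (local), then concatenate."""
    g = x.mean(axis=0)                                        # global summary, (D,)
    loc = x.reshape(-1, window, x.shape[1]).mean(axis=1)      # local-window summaries
    return np.concatenate([g, loc.mean(axis=0)])              # (2*D,)

def cross_refine(text, audio, visual, alpha=0.5):
    """Cross-refine branch (toy): coarse-grained fusion reinforces the
    lower-density audio/visual summaries with text, then fine-grained
    fusion concatenates all refined summaries."""
    t = text.mean(axis=0)
    a = alpha * audio.mean(axis=0) + (1 - alpha) * t          # coarse reinforcement
    v = alpha * visual.mean(axis=0) + (1 - alpha) * t
    return np.concatenate([t, a, v])                          # (3*D,)

# Enhanced fusion features from both branches feed the multi-task heads.
fused = np.concatenate([
    global_local(text), global_local(audio), global_local(visual),
    cross_refine(text, audio, visual),
])
print(fused.shape)  # (36,) = 3 modalities * 2*D + 3*D
```

In the sketch the fused vector would then be shared by several task-specific output heads, mirroring the multi-task learning stage of the paper.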