Journal: Communications in Computer and Information Science
Date: 2023-11-27
Pages: 270-281
Identifier
DOI:10.1007/978-981-99-8181-6_21
Abstract
The core of multimodal sentiment analysis is finding effective encoding and fusion methods that yield accurate predictions. However, previous works overlook the problems caused by the sampling heterogeneity of modalities, and their visual-audio fusion does not filter out noise and redundancy in a progressive manner. Moreover, current deep learning approaches to multimodal fusion rely on a single fusion channel (a horizontal-position or vertical-space channel), whereas models of the human brain highlight the importance of multichannel fusion. In this paper, inspired by the perceptual mechanisms of the human brain studied in neuroscience, we propose a novel framework, the Progressive Multichannel Fusion Network (PMFNet), to overcome these problems. PMFNet meets the different processing needs of each modality and provides interaction and integration between modalities at different encoded representation densities, enabling them to be encoded progressively and fused over multiple channels. Extensive experiments on public datasets demonstrate that our method achieves results superior or comparable to state-of-the-art models.
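The abstract gives no implementation details, so the following is only a minimal, hypothetical sketch of the two-channel fusion idea it alludes to: a "horizontal" channel that relates positions across modalities via cross-modal attention, and a "vertical" channel that mixes per-position features, blended by a learned gate. The module name, shapes, and gating scheme are assumptions for illustration, not the authors' PMFNet architecture.

```python
# Hypothetical sketch of two-channel multimodal fusion (not the authors' code).
# Assumes text/audio/visual sequences are already encoded to a shared dim d
# and temporally aligned to a common sequence length.
import torch
import torch.nn as nn

class TwoChannelFusion(nn.Module):
    def __init__(self, d: int = 128, heads: int = 4):
        super().__init__()
        # "Horizontal" channel: cross-modal attention across time positions.
        self.cross_attn = nn.MultiheadAttention(d, heads, batch_first=True)
        # "Vertical" channel: per-position feature mixing across modalities.
        self.mix = nn.Sequential(nn.Linear(3 * d, d), nn.ReLU(), nn.Linear(d, d))
        # Gate that blends the two channels.
        self.gate = nn.Linear(2 * d, d)

    def forward(self, t, a, v):  # each: (batch, seq, d)
        # Horizontal: text queries attend to the audio-visual context.
        ctx = torch.cat([a, v], dim=1)
        h, _ = self.cross_attn(t, ctx, ctx)
        # Vertical: fuse aligned features at each time step.
        u = self.mix(torch.cat([t, a, v], dim=-1))
        # Gated blend of the two channels, then pool for prediction.
        g = torch.sigmoid(self.gate(torch.cat([h, u], dim=-1)))
        fused = g * h + (1 - g) * u
        return fused.mean(dim=1)  # (batch, d) utterance-level representation
```

A sentiment head (e.g., a linear layer over the pooled representation) would sit on top of this module; the progressive, density-aware encoding described in the abstract would replace the simple aligned inputs assumed here.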