Authors
Xin Zou, Chang Tang, Wei Zhang, Kun Sun, Liangxiao Jiang
Identifier
DOI: 10.1109/icme55011.2023.00165
Abstract
Multimodal learning aims to integrate complementary information from different modalities to support more reliable decisions. However, existing multimodal classification methods simply integrate the learned local features, ignoring both the underlying structure of each modality and the higher-order correlations across modalities. In this paper, we propose a novel Hierarchical Attention Learning Network (HALNet) for multimodal classification. Specifically, HALNet has three merits: 1) a hierarchical feature fusion module learns multi-level features and aggregates them into a global feature representation using an attention mechanism and a progressive fusion strategy; 2) a cross-modal higher-order fusion module captures prospective cross-modal correlations in the label space; 3) a dual prediction pattern generates credible decisions. Extensive experiments on three real-world multimodal datasets demonstrate that HALNet achieves competitive performance compared to the state of the art.
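The abstract describes the architecture only at a high level. The PyTorch sketch below shows one plausible reading of the two ideas it names: attention-weighted aggregation of multi-level features within each modality, and a dual prediction that combines per-modality decisions with a fused cross-modal decision. All class names (AttentionFusion, HALNetSketch), dimensions, layer choices, and the averaging rule are illustrative assumptions, not the authors' HALNet implementation.

```python
# A minimal sketch of attention-based multi-level fusion with dual prediction,
# assuming fully-connected encoders; NOT the authors' actual HALNet code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class AttentionFusion(nn.Module):
    """Aggregate a set of feature vectors with learned softmax attention weights."""

    def __init__(self, dim: int):
        super().__init__()
        self.score = nn.Linear(dim, 1)  # one scalar relevance score per vector

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (batch, n_sources, dim) -> weighted sum over n_sources
        weights = F.softmax(self.score(feats), dim=1)  # (batch, n_sources, 1)
        return (weights * feats).sum(dim=1)            # (batch, dim)


class HALNetSketch(nn.Module):
    """Per-modality multi-level encoding + attention fusion, then a fused
    cross-modal head; the final decision averages both prediction routes."""

    def __init__(self, in_dims, hidden=128, n_classes=10, n_levels=3):
        super().__init__()
        # One small encoder stack per modality; each layer yields one "level".
        self.encoders = nn.ModuleList()
        for d in in_dims:
            layers = [nn.Sequential(nn.Linear(d, hidden), nn.ReLU())]
            layers += [nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU())
                       for _ in range(n_levels - 1)]
            self.encoders.append(nn.ModuleList(layers))
        self.level_fusion = nn.ModuleList(AttentionFusion(hidden) for _ in in_dims)
        self.modal_heads = nn.ModuleList(nn.Linear(hidden, n_classes) for _ in in_dims)
        self.cross_fusion = AttentionFusion(hidden)
        self.fused_head = nn.Linear(hidden, n_classes)

    def forward(self, inputs):
        modal_logits, modal_feats = [], []
        for x, enc, fuse, head in zip(inputs, self.encoders,
                                      self.level_fusion, self.modal_heads):
            levels, h = [], x
            for layer in enc:          # collect every intermediate level
                h = layer(h)
                levels.append(h)
            g = fuse(torch.stack(levels, dim=1))  # global per-modality feature
            modal_feats.append(g)
            modal_logits.append(head(g))
        fused = self.cross_fusion(torch.stack(modal_feats, dim=1))
        fused_logits = self.fused_head(fused)
        # Dual prediction: average the fused decision with per-modality decisions.
        final = (fused_logits + sum(modal_logits) / len(modal_logits)) / 2
        return final, modal_logits


# Usage: two modalities with 64- and 32-dimensional inputs, 5 classes.
model = HALNetSketch(in_dims=[64, 32], n_classes=5)
logits, per_modal = model([torch.randn(8, 64), torch.randn(8, 32)])
print(logits.shape)  # torch.Size([8, 5])
```

The per-level softmax attention stands in for the abstract's "attention mechanism and progressive fusion"; the paper's cross-modal higher-order fusion in the label space is not reconstructable from the abstract alone, so the sketch simply averages the two prediction routes.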