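Computer science
Tensor (intrinsic definition)
Artificial intelligence
Pooling
Parameterized complexity
Machine learning
Representation (politics)
Flattening
Set (abstract data type)
Algorithm
Mathematics
Politics
Composite material
Political science
Materials science
Programming language
Law
Pure mathematics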
Authors
Panagiotis Koromilas, Mihalis A. Nicolaou, Theodoros Giannakopoulos, Yannis Panagakis
Identifier
DOI: 10.1109/icassp49357.2023.10097030
Abstract
Despite the considerable research output on Multimodal Learning for Affect-related tasks, most current methods are very complex in terms of the number of trainable parameters and are therefore not effective solutions for real-life applications. In this work we address this gap in the literature by introducing the Multimodal Attention Tensor Regression (MMATR) network, a lightweight model based on: (i) a static input representation for each modality (a 2D matrix of dimensions time × features), which allows a CNN to be used instead of heavily parameterized sequential models; (ii) tensor contraction and tensor regression layers that replace the usual pooling, flattening, and linear layers, reducing the number of parameters while preserving the high-order structure of the multimodal data; and (iii) a bimodal attention layer that learns multimodal co-occurrences. Through experiments against a variety of state-of-the-art techniques, we show that MMATR achieves results competitive with the state of the art on Multimodal Sentiment Analysis, despite having four orders of magnitude fewer parameters.
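To make point (ii) concrete, here is a minimal NumPy sketch of a generic tensor contraction layer (TCL) followed by a Tucker-structured tensor regression layer (TRL), compared against the flatten + linear head they replace. This is not the MMATR implementation: the function names, the 32 × 20 × 16 feature-map shape, and all ranks are hypothetical choices for illustration, and the resulting parameter counts say nothing about the paper's reported four-orders-of-magnitude reduction.

```python
import numpy as np


def tensor_contraction(x, factors):
    """Generic tensor contraction layer (TCL), illustrative only.

    Multiplies every non-batch mode of x by a factor matrix, shrinking an
    (I1, I2, I3) activation to (R1, R2, R3) instead of pooling/flattening it.
    x: (batch, I1, I2, I3); factors[k]: (R_k, I_k).
    """
    v1, v2, v3 = factors
    return np.einsum("bijk,pi,qj,rk->bpqr", x, v1, v2, v3)


def tucker_regression(x, core, factors, bias):
    """Tensor regression layer (TRL) with a Tucker-structured weight tensor.

    W = core x1 U1 x2 U2 x3 U3 x4 U_out and y = <x, W> + bias, computed
    without ever materialising the full W.
    x: (batch, I1, I2, I3); core: (S1, S2, S3, S_out);
    U1: (I1, S1), U2: (I2, S2), U3: (I3, S3), U_out: (n_out, S_out).
    """
    u1, u2, u3, u_out = factors
    # Project each input mode onto its low-rank subspace first...
    z = np.einsum("bijk,ia,jc,kd->bacd", x, u1, u2, u3)
    # ...then contract with the core and expand to the output dimension.
    return np.einsum("bacd,acde,oe->bo", z, core, u_out) + bias


def n_params(*arrays):
    """Total number of trainable entries in the given arrays."""
    return sum(a.size for a in arrays)


# Hypothetical sizes: a CNN feature map of 32 channels over a 20 x 16
# (time x features) grid, regressed to a single sentiment score.
rng = np.random.default_rng(0)
batch, in_shape, n_out = 2, (32, 20, 16), 1
tcl_ranks = (8, 5, 4)       # assumed TCL output shape
trl_ranks = (4, 3, 2, 1)    # assumed Tucker ranks of the TRL weight

x = rng.standard_normal((batch, *in_shape))
tcl_factors = [rng.standard_normal((r, i)) for r, i in zip(tcl_ranks, in_shape)]
core = rng.standard_normal(trl_ranks)
trl_factors = [rng.standard_normal((i, s)) for i, s in zip(tcl_ranks, trl_ranks[:3])]
trl_factors.append(rng.standard_normal((n_out, trl_ranks[3])))
bias = np.zeros(n_out)

h = tensor_contraction(x, tcl_factors)             # shape (2, 8, 5, 4)
y = tucker_regression(h, core, trl_factors, bias)  # shape (2, 1)

tensor_params = n_params(*tcl_factors, core, *trl_factors, bias)
dense_params = int(np.prod(in_shape)) * n_out + n_out  # flatten + linear head
print(h.shape, y.shape, tensor_params, dense_params)   # (2, 8, 5, 4) (2, 1) 501 10241
```

The design point is that the TRL's weight is stored only as a small core plus one factor matrix per mode, and the `einsum` calls contract the input against those factors directly, so the dense (8, 5, 4, n_out) weight tensor is never built and the mode structure of the activation is never flattened away.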