Computer science
Modal verb
Artificial intelligence
Mood
Machine learning
Set (abstract data type)
Speech recognition
Psychology
Psychiatry
Chemistry
Polymer chemistry
Programming language
Authors
Sujeesha A.S., Mala J.B., Rajeev Rajan
Identifier
DOI:10.1016/j.engappai.2023.107355
Abstract
Automatic music recommendation systems based on human emotions are becoming popular. Since the audio and the lyrics of a song each provide a rich set of information, a fusion model that incorporates both modalities can improve classification accuracy, and such a model is explored in this paper. The main objective is to address music mood classification using various attention mechanisms, namely self-attention (SA), channel attention (CA), and a hierarchical attention network (HAN), within a multi-modal music mood classification system. Experimental results show that multi-modal architectures with attention achieve higher accuracy than both multi-modal architectures without attention and uni-modal architectures. Motivated by the performance of these attention mechanisms, a new network architecture, a HAN-CA-SA based multi-modal classification system, is proposed, which achieves an accuracy of 82.35%. ROC and Kappa scores are also computed to assess the efficacy of the proposed model, and the model is further evaluated using K-fold cross-validation. Its performance is compared with that of XLNet and CNN-BERT systems, and McNemar's statistical hypothesis test is conducted to reaffirm the significance of the proposed approach.
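To make the architecture described in the abstract concrete, below is a minimal PyTorch sketch of a multi-modal (audio + lyrics) mood classifier that combines self-attention with channel attention, in the spirit of the HAN-CA-SA system. This is an illustrative assumption, not the authors' exact design: all layer sizes, feature dimensions (`audio_dim`, `lyric_dim`), the number of mood classes, and the early-fusion strategy are hypothetical choices made for the sketch.

```python
# Hedged sketch: multi-modal mood classifier with self-attention (SA) and
# channel attention (CA). Dimensions and fusion strategy are assumptions,
# not the exact HAN-CA-SA architecture from the paper.
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Squeeze-and-excitation style channel attention over feature channels."""
    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):                      # x: (batch, time, channels)
        weights = self.fc(x.mean(dim=1))       # squeeze over the time axis
        return x * weights.unsqueeze(1)        # re-weight each channel

class MoodClassifier(nn.Module):
    def __init__(self, audio_dim=128, lyric_dim=300, hidden=128, n_moods=4):
        super().__init__()
        self.audio_proj = nn.Linear(audio_dim, hidden)
        self.lyric_proj = nn.Linear(lyric_dim, hidden)
        # Self-attention over the concatenated audio/lyric token sequence.
        self.self_attn = nn.MultiheadAttention(hidden, num_heads=4,
                                               batch_first=True)
        self.chan_attn = ChannelAttention(hidden)
        self.head = nn.Linear(hidden, n_moods)

    def forward(self, audio_feats, lyric_feats):
        # audio_feats: (batch, T_audio, audio_dim)
        # lyric_feats: (batch, T_lyric, lyric_dim)
        tokens = torch.cat([self.audio_proj(audio_feats),
                            self.lyric_proj(lyric_feats)], dim=1)
        attended, _ = self.self_attn(tokens, tokens, tokens)
        attended = self.chan_attn(attended)
        return self.head(attended.mean(dim=1))  # pool over time, classify

# Usage example: batch of 2 songs, 100 audio frames, 50 lyric tokens.
model = MoodClassifier()
logits = model(torch.randn(2, 100, 128), torch.randn(2, 50, 300))
print(logits.shape)  # torch.Size([2, 4]) -- one score per mood class
```

Concatenating the projected audio and lyric tokens before self-attention lets the attention layer weigh cross-modal interactions directly; the paper's HAN component, which attends hierarchically over words and sentences in the lyrics, is omitted here for brevity.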