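Lyrics
Computer science
Speech recognition
Audio analyzer
Modal verb
Emotion recognition
Context (archaeology)
Valence (chemistry)
Artificial intelligence
Audio signal processing
Audio signal
Speech coding
Physics
Literature
Art
Paleontology
Biology
Chemistry
Polymer chemistry
Quantum mechanics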
Authors
Jiahao Zhao, Ganghui Ru, Yi Yu, Yulun Wu, Dichucheng Li, Wei Li
Identifier
DOI:10.1109/icme52920.2022.9859812
Abstract
Computational music emotion recognition aims to recognize the emotional content of music tracks. Studies in this area have focused mainly on the audio content of the tracks. Although lyrics and music context contribute greatly to the perceived emotion, these sources of emotional information are usually ignored. Motivated by this observation, we propose a multimodal music emotion recognition method that jointly predicts valence and arousal values by combining the audio, lyrics, track name, and artist of a given track. Audio, lyrics, and context features are extracted separately and then fused by a cross-modal attention mechanism, forming a hierarchical structure. Our proposed model outperforms two baselines by a large margin and achieves state-of-the-art performance on two public datasets.
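The fusion step described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not specify the architecture, so the fusion order (audio attends to lyrics first, then the result attends to context), the residual additions, the mean pooling, the tanh output range, and all names (`cross_modal_attention`, `fuse_and_predict`, `w_out`, `b_out`) are assumptions chosen for illustration. The features are random placeholders rather than extracted audio, lyrics, or context embeddings, and no learned projections are included.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_modal_attention(query_seq, context_seq):
    """Scaled dot-product attention where one modality queries another.

    query_seq:   (Tq, d) features of the attending modality
    context_seq: (Tc, d) features of the attended modality
    Returns the attended features (Tq, d) and the weights (Tq, Tc).
    """
    d = query_seq.shape[-1]
    scores = query_seq @ context_seq.T / np.sqrt(d)
    weights = softmax(scores, axis=-1)
    return weights @ context_seq, weights


def fuse_and_predict(audio, lyrics, context, w_out, b_out):
    """Hypothetical two-stage (hierarchical) fusion followed by a linear head."""
    # Stage 1 (assumed order): audio frames attend to lyrics tokens.
    audio_lyrics, _ = cross_modal_attention(audio, lyrics)
    content = audio + audio_lyrics  # residual connection (assumption)

    # Stage 2 (assumed order): fused content attends to context features,
    # e.g. embeddings of the track name and artist.
    content_ctx, _ = cross_modal_attention(content, context)
    fused = content + content_ctx

    # Pool over time and map to two outputs: (valence, arousal).
    pooled = fused.mean(axis=0)
    return np.tanh(pooled @ w_out + b_out)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d = 8
    audio = rng.normal(size=(5, d))    # 5 audio frames (placeholder)
    lyrics = rng.normal(size=(7, d))   # 7 lyrics tokens (placeholder)
    context = rng.normal(size=(2, d))  # track name and artist (placeholder)
    w_out = rng.normal(size=(d, 2))    # untrained output weights
    b_out = np.zeros(2)
    print(fuse_and_predict(audio, lyrics, context, w_out, b_out))
```

With untrained weights the printed pair carries no meaning; the sketch only shows how features of different lengths from each modality can be combined into a single valence and arousal prediction.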