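Computer science
Matching (statistics)
Modal verb
Similarity (geometry)
Feature (linguistics)
Pairwise comparison
Artificial intelligence
Mode
Perception
Speech recognition
Image (mathematics)
Philosophy
Sociology
Statistics
Biology
Neuroscience
Linguistics
Chemistry
Polymer chemistry
Social science
Mathematics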
Authors
Xin Zhao,Xiaobing Li,Yun Tie,Lin Qi
Source
Journal: Research Square
Date: 2023-06-13
Identifier
DOI: 10.21203/rs.3.rs-3037240/v1
Abstract
People resonate more with music when they are also exposed to visual information, and music in turn enhances their perception of video content. Cross-modal recommendation techniques can be used to suggest appropriate background music for a given video. However, data from different modalities do not correspond in a simple way. Therefore, to explore the association between the video and music modalities, we propose MFF-VBMR, a video background music recommendation model based on multi-level fusion features. The model uses cross-modal information about the static, dynamic, and emotional content of video and music to match and recommend suitable background music for a given video. We also propose FNC, a feature-normalized convolutional similarity network that takes into account the pairwise similarity of visual and acoustic regions without losing region details. Experimental results show that the proposed model outperforms existing models and achieves satisfactory results for video background music recommendation.