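Computer science
Audio-visual
Artificial intelligence
Contextual image classification
Audio mining
Image fusion
Computer vision
Audio signal processing
Image (mathematics)
Pattern recognition (psychology)
Multimedia
Speech recognition
Audio signal
Speech coding
Speech processing
Voice activity detection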
Authors
Hong Liang Dai,Xinfeng Zhang,Haiyang Yu
Identifier
DOI:10.1145/3617695.3617728
Abstract
Short multimedia videos have become one of the most representative products of the new media era. Because short videos are highly time-sensitive, they must be managed and classified efficiently. Current approaches to short video classification mainly extract image features and classify on those features alone. Audio information, although crucial for classification, is often discarded or processed separately. To address this, we propose an attention-based audio-visual fusion method for short video classification. In this method, an attention module estimates how strongly the image and audio information each influence the classification result, and the image and audio information are then fused to classify the short video. Experimental results on different short multimedia video datasets demonstrate that the proposed attention-based audio-visual fusion method is effective and can significantly improve the classification accuracy of short videos.
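The abstract does not give the architecture of the attention module, so the following is only a minimal sketch of one common way to realize attention-weighted fusion of two modalities, not the authors' implementation. It assumes the image and audio features have already been extracted as vectors of equal length, and it uses a hypothetical scoring vector `query` (which would be learned in a real model) to score each modality. The names `attention_fuse`, `query`, `image_feat`, and `audio_feat` are illustrative and do not come from the paper.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def attention_fuse(image_feat, audio_feat, query):
    """Fuse image and audio feature vectors with attention weights.

    Assumption: `query` is a hypothetical scoring vector standing in
    for learned attention parameters. Each modality gets one scalar
    score, the scores are normalized with softmax, and the fused
    feature is the weighted sum of the two modality vectors.
    """
    if not (len(image_feat) == len(audio_feat) == len(query)):
        raise ValueError("all vectors must have the same length")
    scores = [dot(query, image_feat), dot(query, audio_feat)]
    weights = softmax(scores)  # [image weight, audio weight], sums to 1
    fused = [
        weights[0] * i + weights[1] * a
        for i, a in zip(image_feat, audio_feat)
    ]
    return fused, weights


# Toy inputs: both modalities receive the same score, so the
# attention weights are equal and the fused vector is their mean.
fused, weights = attention_fuse([1.0, 0.0], [0.0, 1.0], [1.0, 1.0])
```

In a full pipeline the fused vector would be passed to a classifier, and the two weights indicate how much each modality contributed for a given video. A trained model would learn the scoring parameters jointly with the classifier rather than fixing them by hand as in this toy example.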