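Computer science
Artificial intelligence
Audiovisual
Kadir–Brady saliency detector
Computer vision
Saliency map
Visualization
Modal verb
Modality (human-computer interaction)
Pattern recognition (psychology)
Speech recognition
Image (mathematics)
Multimedia
Chemistry
Polymer chemistry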
Authors
Xiongkuo Min, Guangtao Zhai, Jiantao Zhou, Xiao-Ping Zhang, Xiaokang Yang, Xinping Guan
Source
Journal: IEEE Transactions on Image Processing
[Institute of Electrical and Electronics Engineers]
Date: 2020-01-01
Volume: 29, pp. 3805-3819
Citations: 179
Identifier
DOI: 10.1109/tip.2020.2966082
Abstract
Most current visual attention prediction studies ignore audio information. However, sound can influence visual attention, and this influence has been widely investigated and confirmed by many psychological studies. In this paper, we propose a novel multi-modal saliency (MMS) model for videos containing scenes with high audio-visual correspondence. In such scenes, humans tend to be attracted by the sound sources, and the sound sources can also be localized via cross-modal analysis. Specifically, we first detect the spatial and temporal saliency maps from the visual modality by using a novel free energy principle. We then detect the audio saliency map from both the audio and visual modalities by localizing the moving, sounding objects using cross-modal kernel canonical correlation analysis, which is the first approach of its kind in the literature. Finally, we propose a new two-stage adaptive audio-visual saliency fusion method that integrates the spatial, temporal, and audio saliency maps into our audio-visual saliency map. The proposed MMS model captures the influence of audio, which is not considered in the latest deep-learning-based saliency models. To take advantage of both deep saliency modeling and audio-visual saliency modeling, we propose to combine deep saliency models and the MMS model via late fusion, and we find that this yields an average performance gain of 5%. Experimental results on audio-visual attention databases show that the introduced models, which incorporate audio cues, significantly outperform state-of-the-art image and video saliency models that use only the visual modality.
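The abstract does not specify how the deep saliency maps and the MMS map are combined beyond naming a fusion step. As a rough illustration only, the sketch below assumes the simplest form of late fusion: each map is min-max normalized and the two are blended with a fixed weight. The function names, the `weight` parameter, and the toy 2x2 maps are all hypothetical and are not taken from the paper, whose own fusion method is adaptive and two-stage.

```python
def normalize(saliency_map):
    """Min-max scale a 2D map (a list of rows) to the range [0, 1]."""
    values = [v for row in saliency_map for v in row]
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant map carries no spatial preference; return all zeros.
        return [[0.0 for _ in row] for row in saliency_map]
    return [[(v - lo) / (hi - lo) for v in row] for row in saliency_map]


def late_fusion(deep_map, mms_map, weight=0.5):
    """Blend two saliency maps of equal shape after normalizing each.

    `weight` is a hypothetical mixing parameter (share given to `deep_map`),
    not a value reported in the paper.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be in [0, 1]")
    a, b = normalize(deep_map), normalize(mms_map)
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("maps must have the same shape")
    return [
        [weight * x + (1.0 - weight) * y for x, y in zip(ra, rb)]
        for ra, rb in zip(a, b)
    ]


# Toy example: a visual-only map and an audio-aware map that disagree.
deep = [[0.0, 2.0], [4.0, 8.0]]
mms = [[10.0, 0.0], [5.0, 0.0]]
fused = late_fusion(deep, mms)
```

With equal weights, the fused map keeps some saliency at the top-left location that only the audio-aware map highlights, which is the intuition behind combining the two model families.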