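Computer science
Artificial intelligence
Convolutional neural network
Inference
Pattern recognition (psychology)
Normalization (sociology)
Convolution (computer science)
Kernel (algebra)
Computation
Benchmark (surveying)
Dilation (metric space)
Feature extraction
Machine learning
Artificial neural network
Algorithm
Combinatorics
Sociology
Mathematics
Anthropology
Geography
Geodesy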
Authors
Sheng Yang,Guosheng Lin,Qiuping Jiang,Weisi Lin
Identifier
DOI:10.1109/tmm.2019.2947352
Abstract
Recently, with the advent of deep convolutional neural networks (DCNN), the improvements in visual saliency prediction research are impressive. One possible direction to approach the next improvement is to fully characterize the multi-scale saliency-influential factors with a computationally-friendly module in DCNN architectures. In this work, we propose an end-to-end dilated inception network (DINet) for visual saliency prediction. It captures multi-scale contextual features effectively with very limited extra parameters. Instead of utilizing parallel standard convolutions with different kernel sizes as the existing inception module, our proposed dilated inception module (DIM) uses parallel dilated convolutions with different dilation rates which can significantly reduce the computation load while enriching the diversity of receptive fields in feature maps. Moreover, the performance of our saliency model is further improved by using a set of linear normalization-based probability distribution distance metrics as loss functions. As such, we can formulate saliency prediction as a global probability distribution prediction task for better saliency inference instead of a pixel-wise regression problem. Experimental results on several challenging saliency benchmark datasets demonstrate that our DINet with proposed loss functions can achieve state-of-the-art performance with shorter inference time.
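The abstract makes two technical claims that a short calculation can illustrate: parallel dilated convolutions reach large receptive fields with fewer weights than parallel large kernels, and normalizing a saliency map linearly lets it be compared as a probability distribution. The following is a minimal sketch in plain Python, not the authors' implementation. The channel width, the dilation rates (1, 2, 3), and the choice of KL divergence are illustrative assumptions; the abstract does not state the rates or the specific distance metrics used in DINet.

```python
import math


def effective_kernel_size(k, dilation):
    """Spatial extent covered by a k x k kernel with the given dilation rate."""
    return k + (k - 1) * (dilation - 1)


def conv_weight_count(k, c_in, c_out):
    """Number of weights in a k x k convolution layer (bias ignored)."""
    return k * k * c_in * c_out


def linear_normalize(values, eps=1e-12):
    """Divide non-negative values by their sum so they add up to 1."""
    total = sum(values)
    return [v / (total + eps) for v in values]


def kl_divergence(target, prediction, eps=1e-12):
    """KL(target || prediction) after linear normalization of both maps.

    KL divergence is only one example of a distribution distance;
    using it here is an assumption, not a statement about the paper.
    """
    p = linear_normalize(target)
    q = linear_normalize(prediction)
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


if __name__ == "__main__":
    channels = 256  # hypothetical channel width, same for input and output

    # Standard inception-style branches: kernels of size 3, 5 and 7.
    standard = sum(conv_weight_count(k, channels, channels) for k in (3, 5, 7))

    # Dilated branches: 3 x 3 kernels with hypothetical rates 1, 2 and 3,
    # which cover the same 3, 5 and 7 pixel extents.
    rates = (1, 2, 3)
    dilated = sum(conv_weight_count(3, channels, channels) for _ in rates)

    print("extents:", [effective_kernel_size(3, d) for d in rates])
    print("standard weights:", standard)
    print("dilated weights:", dilated)

    # Flattened toy saliency maps; a global rescaling of the prediction
    # does not change the loss because both maps are normalized first.
    print("KL:", kl_divergence([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))
```

With these assumed settings the three dilated branches cover the same spatial extents as the 3, 5 and 7 kernels while using 27 weights per channel pair instead of 83. The loss function treats the whole map as one distribution, so it depends on the relative shape of the prediction rather than on per-pixel magnitudes.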