Convolutional neural network
Computer science
Artificial intelligence
Pattern recognition
Visual word
Deep learning
Matching (statistics)
Image retrieval
Mathematics
Statistics
Authors
Yunchao Wei,Yao Zhao,Canyi Lu,Shikui Wei,Luoqi Liu,Zhongkui Zhu,Shuicheng Yan
Source
Journal: IEEE Transactions on Cybernetics
[Institute of Electrical and Electronics Engineers]
Date: 2016-01-01
Volume/Issue: pp. 1-12
Citations: 284
Identifier
DOI:10.1109/tcyb.2016.2519449
Abstract
Recently, convolutional neural network (CNN) visual features have demonstrated their power as a universal representation for various recognition tasks. In this paper, cross-modal retrieval with CNN visual features is implemented with several classic methods. Specifically, off-the-shelf CNN visual features are extracted from a CNN model pretrained on ImageNet, which contains more than one million images from 1000 object categories, and used as a generic image representation for cross-modal retrieval. To further enhance the representational ability of the CNN visual features, the pretrained model is fine-tuned on each target data set using the open-source Caffe CNN library. In addition, we propose a deep semantic matching method to address cross-modal retrieval for samples annotated with one or multiple labels. Extensive experiments on five popular publicly available data sets demonstrate the superiority of CNN visual features for cross-modal retrieval.
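The semantic matching idea described in the abstract maps features from each modality into a shared ground-truth label space and ranks candidates by similarity there. A minimal sketch of that idea, assuming a closed-form ridge-regression mapping, cosine-similarity ranking, and synthetic image/text features (the paper's actual method learns the mapping with a deep network and the names here are illustrative, not the authors' implementation):

```python
import numpy as np

def learn_semantic_map(X, Y, lam=1.0):
    """Ridge regression mapping features X (n, d) to label space Y (n, c).
    Closed form: W = (X^T X + lam * I)^{-1} X^T Y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ Y)

def cosine_rank(Q, G):
    """For each query row in Q, return gallery indices ranked by cosine similarity."""
    q = Q / np.linalg.norm(Q, axis=1, keepdims=True)
    g = G / np.linalg.norm(G, axis=1, keepdims=True)
    return np.argsort(-(q @ g.T), axis=1)  # best match first

# Toy cross-modal data: image and text features generated from shared labels.
rng = np.random.default_rng(0)
n, c = 60, 3
labels = rng.integers(0, c, n)
Y = np.eye(c)[labels]  # one-hot label matrix
X_img = Y @ rng.normal(size=(c, 8)) + 0.05 * rng.normal(size=(n, 8))
X_txt = Y @ rng.normal(size=(c, 5)) + 0.05 * rng.normal(size=(n, 5))

# Map each modality into the shared label space, then do image-to-text retrieval.
W_img = learn_semantic_map(X_img, Y)
W_txt = learn_semantic_map(X_txt, Y)
ranks = cosine_rank(X_img @ W_img, X_txt @ W_txt)
top1_acc = (labels[ranks[:, 0]] == labels).mean()
```

Projecting both modalities into the label space sidesteps the dimensionality mismatch between image and text features: similarity is computed between semantic (label-space) vectors, which is what makes multi-label annotations usable as the common representation.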