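Computer science
Artificial intelligence
Segmentation
Sliding window protocol
Pattern recognition (psychology)
Scalability
Margin (machine learning)
Feature (linguistics)
Object detection
Pascal (unit)
Convolutional neural network
Machine learning
Window (computing)
Philosophy
Programming language
Operating system
Database
Linguistics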
Authors
Ross Girshick, Jeff Donahue, Trevor Darrell, Jitendra Malik
Source
Journal: Cornell University - arXiv
Date: 2013-11-11
Citations: 375
Abstract
Object detection performance, as measured on the canonical PASCAL VOC dataset, has plateaued in the last few years. The best-performing methods are complex ensemble systems that typically combine multiple low-level image features with high-level context. In this paper, we propose a simple and scalable detection algorithm that improves mean average precision (mAP) by more than 30% relative to the previous best result on VOC 2012, achieving a mAP of 53.3%. Our approach combines two key insights: (1) one can apply high-capacity convolutional neural networks (CNNs) to bottom-up region proposals in order to localize and segment objects, and (2) when labeled training data is scarce, supervised pre-training for an auxiliary task, followed by domain-specific fine-tuning, yields a significant performance boost. Since we combine region proposals with CNNs, we call our method R-CNN: Regions with CNN features. We also compare R-CNN to OverFeat, a recently proposed sliding-window detector based on a similar CNN architecture. We find that R-CNN outperforms OverFeat by a large margin on the 200-class ILSVRC2013 detection dataset. Source code for the complete system is available at this http URL.
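The abstract names the two ingredients (bottom-up region proposals and CNN features) only at a high level. As a rough illustration of how they fit together at test time, here is a minimal NumPy sketch, not the authors' released code. It assumes the proposals are already given as boxes, replaces the pre-trained and fine-tuned CNN with a hypothetical `toy_features` function, and scores each warped region with per-class linear classifiers (`W`, `b`) before greedy per-class non-maximum suppression. The 227×227 warp follows the input size used by the full paper's CNN; the 0.3 IoU threshold, the toy image, and all function names are illustrative assumptions.

```python
import numpy as np

# Hypothetical sketch of R-CNN-style test-time detection (not the authors' code).
# Boxes are (x1, y1, x2, y2) with inclusive pixel coordinates.


def warp_region(image, box, size=227):
    """Crop a proposal and warp it to size x size, ignoring aspect ratio.

    Uses nearest-neighbour sampling; context padding is omitted for brevity.
    """
    x1, y1, x2, y2 = box
    crop = image[y1:y2 + 1, x1:x2 + 1]
    h, w = crop.shape[:2]
    rows = np.arange(size) * h // size  # source row for each output row
    cols = np.arange(size) * w // size  # source column for each output column
    return crop[rows][:, cols]


def toy_features(patch):
    """Hypothetical stand-in for a fine-tuned CNN: per-channel mean intensity."""
    return patch.reshape(-1, patch.shape[-1]).mean(axis=0)


def iou(a, b):
    """Intersection-over-union of two inclusive-coordinate boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1 + 1) * max(0, iy2 - iy1 + 1)

    def area(r):
        return (r[2] - r[0] + 1) * (r[3] - r[1] + 1)

    return inter / (area(a) + area(b) - inter)


def nms(boxes, scores, thresh=0.3):
    """Greedy NMS: visit boxes by descending score, drop any that overlaps a kept box."""
    keep = []
    for i in np.argsort(-scores):
        if all(iou(boxes[i], boxes[j]) <= thresh for j in keep):
            keep.append(int(i))
    return keep


def detect(image, proposals, W, b, feature_fn=toy_features, score_thresh=0.0):
    """Score every proposal with per-class linear classifiers, then run NMS per class."""
    feats = np.stack([feature_fn(warp_region(image, p)) for p in proposals])
    scores = feats @ W + b  # shape: (num_proposals, num_classes)
    detections = []
    for c in range(W.shape[1]):
        idx = np.flatnonzero(scores[:, c] > score_thresh)
        if idx.size == 0:
            continue
        kept = nms([proposals[i] for i in idx], scores[idx, c])
        detections += [(c, proposals[idx[k]], float(scores[idx[k], c])) for k in kept]
    return detections


# Toy usage: a 100x100 RGB image with a pure-red square at rows/cols 20..49.
img = np.zeros((100, 100, 3))
img[20:50, 20:50, 0] = 1.0
proposals = [(20, 20, 49, 49), (22, 22, 51, 51), (70, 70, 89, 89)]
W = np.array([[1.0], [0.0], [0.0]])  # one hypothetical "red object" class
b = np.array([-0.5])
dets = detect(img, proposals, W, b)
print(dets)  # → [(0, (20, 20, 49, 49), 0.5)]
```

In the toy run, the shifted box also scores above zero but overlaps the exact box with an IoU of about 0.77, so NMS suppresses it, and the background box is rejected by the score threshold. Because every proposal is warped and featurized independently, the cost of this design grows linearly with the number of proposals.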