Computer science
Inpainting
Encoder
Artificial intelligence
Context
Feature
Initialization
Pattern recognition
Convolutional neural network
Feature learning
Context model
Pixel
Semantics
Computer vision
Deep learning
Image
Paleontology
Philosophy
Operating system
Programming language
Biology
Linguistics
Object
Authors
Deepak Pathak,Philipp Krähenbühl,Jeff Donahue,Trevor Darrell,Alexei A. Efros
Identifier
DOI:10.1109/cvpr.2016.278
Abstract
We present an unsupervised visual feature learning algorithm driven by context-based pixel prediction. By analogy with auto-encoders, we propose Context Encoders - a convolutional neural network trained to generate the contents of an arbitrary image region conditioned on its surroundings. To succeed at this task, context encoders need both to understand the content of the entire image and to produce a plausible hypothesis for the missing part(s). When training context encoders, we experimented with a standard pixel-wise reconstruction loss as well as a reconstruction loss combined with an adversarial loss. The latter produces much sharper results because it can better handle multiple modes in the output. We found that a context encoder learns a representation that captures not just appearance but also the semantics of visual structures. We quantitatively demonstrate the effectiveness of our learned features for CNN pre-training on classification, detection, and segmentation tasks. Furthermore, context encoders can be used for semantic inpainting tasks, either stand-alone or as initialization for non-parametric methods.
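The joint objective the abstract describes (a pixel-wise reconstruction loss over the missing region plus an adversarial term) can be sketched as follows. This is an illustrative NumPy sketch, not the paper's implementation: the function names, the toy scalar discriminator score, and the weighting parameters `lam_rec` and `lam_adv` are assumptions introduced here for clarity.

```python
import numpy as np

def reconstruction_loss(pred, target, mask):
    # Normalized L2 loss computed only over the masked (missing) region.
    # `mask` is 1 where pixels were dropped and must be predicted, 0 elsewhere.
    diff = (pred - target) * mask
    return float(np.sum(diff ** 2) / max(float(np.sum(mask)), 1.0))

def adversarial_loss(d_score_fake):
    # Non-saturating generator-side GAN loss: -log D(fake),
    # where d_score_fake in (0, 1) is the discriminator's belief
    # that the inpainted image is real.
    return float(-np.log(d_score_fake + 1e-12))

def joint_loss(pred, target, mask, d_score_fake,
               lam_rec=0.999, lam_adv=0.001):
    # Weighted sum of the two terms; the reconstruction term anchors
    # the overall structure while the adversarial term sharpens detail.
    return (lam_rec * reconstruction_loss(pred, target, mask)
            + lam_adv * adversarial_loss(d_score_fake))
```

When the prediction matches the target inside the mask, only the adversarial term remains, which is what pushes the generator toward one sharp mode instead of an average over all plausible completions.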