Computer science
Psychology
Visual attention
Cognitive psychology
Perception
Neuroscience
Authors
Volodymyr Mnih, Nicolas Heess, Alex Graves, Koray Kavukcuoglu
Source
Journal: Cornell University - arXiv
Date: 2014-01-01
Citations: 1071
Identifier
DOI: 10.48550/arxiv.1406.6247
Abstract
Applying convolutional neural networks to large images is computationally expensive because the amount of computation scales linearly with the number of image pixels. We present a novel recurrent neural network model that is capable of extracting information from an image or video by adaptively selecting a sequence of regions or locations and only processing the selected regions at high resolution. Like convolutional neural networks, the proposed model has a degree of translation invariance built-in, but the amount of computation it performs can be controlled independently of the input image size. While the model is non-differentiable, it can be trained using reinforcement learning methods to learn task-specific policies. We evaluate our model on several image classification tasks, where it significantly outperforms a convolutional neural network baseline on cluttered images, and on a dynamic visual control problem, where it learns to track a simple object without an explicit training signal for doing so.