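Computer science
Pooling
Convolutional neural network
Artificial intelligence
Encoder
Pattern recognition (psychology)
Recurrent neural network
Deep learning
Convolutional code
Layer (electronics)
Sequence (biology)
Decoding methods
Artificial neural network
Algorithm
Operating system
Biology
Genetics
Organic chemistry
Chemistry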
Authors
Hongtao Xie,Shancheng Fang,Zheng-Jun Zha,Yating Yang,Yan Li,Yongdong Zhang
Abstract
In this article, we present Convolutional Attention Networks (CAN) for unconstrained scene text recognition. Recent dominant approaches for scene text recognition are mainly based on Convolutional Neural Networks (CNN) and Recurrent Neural Networks (RNN), where the CNN encodes images and the RNN generates character sequences. Unlike these methods, CAN is built entirely on CNNs and incorporates an attention mechanism. The distinctive characteristics of our method are as follows: (i) CAN follows an encoder-decoder architecture, in which the encoder is a deep two-dimensional CNN and the decoder is a one-dimensional CNN; (ii) the attention mechanism is applied in every convolutional layer of the decoder, and we propose a novel spatial attention method using average pooling; and (iii) position embeddings are added to both the spatial encoder and the sequence decoder to give our networks a sense of location. We conduct experiments on standard datasets for scene text recognition, including Street View Text, IIIT5K, and ICDAR datasets. The experimental results validate the effectiveness of the different components and show that our convolution-based method achieves state-of-the-art or competitive performance compared with prior works, even without the use of an RNN.
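The abstract does not give the formulation of the spatial attention, so the following is only a minimal sketch of one way a decoder query could attend over a two-dimensional encoder feature map with average pooling applied to the attention scores. The function names `avg_pool_same` and `spatial_attention`, the dot-product scoring, the 3 x 3 window, and the placement of the pooling step before the softmax are all assumptions made for illustration, not the authors' method.

```python
import math


def avg_pool_same(scores, k=3):
    """Stride-1 average pooling over a k x k window.

    Only in-bounds cells are averaged, so the output keeps the input shape
    and a constant map stays constant. (Assumed pooling variant.)
    """
    h, w = len(scores), len(scores[0])
    r = k // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            window = [scores[a][b]
                      for a in range(max(0, i - r), min(h, i + r + 1))
                      for b in range(max(0, j - r), min(w, j + r + 1))]
            out[i][j] = sum(window) / len(window)
    return out


def spatial_attention(query, feature_map, k=3):
    """Attend from one decoder state over an H x W x C feature map.

    query:       list of C floats (hypothetical decoder state at one step)
    feature_map: H x W nested list of C-dimensional feature vectors
    Returns (weights, context): H x W attention weights and a C-dim context.
    """
    h, w = len(feature_map), len(feature_map[0])
    c = len(query)
    # Dot-product score between the query and every spatial location.
    scores = [[sum(q * f for q, f in zip(query, feature_map[i][j]))
               for j in range(w)] for i in range(h)]
    # Smooth the score map with average pooling (assumed placement).
    pooled = avg_pool_same(scores, k)
    # Numerically stable softmax over all H * W locations.
    m = max(max(row) for row in pooled)
    exps = [[math.exp(s - m) for s in row] for row in pooled]
    total = sum(sum(row) for row in exps)
    weights = [[e / total for e in row] for row in exps]
    # Context vector: attention-weighted sum of the feature vectors.
    context = [sum(weights[i][j] * feature_map[i][j][ch]
                   for i in range(h) for j in range(w))
               for ch in range(c)]
    return weights, context
```

In an encoder-decoder model of the kind described, a step like this would run once per decoder layer and output position, with position embeddings added to the encoder features and decoder inputs beforehand; those parts are omitted here.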