Computer science
Pooling
Artificial intelligence
Pyramid (geometry)
Distortion (music)
Scale (ratio)
Perspective (graphical)
Text recognition
Encoder
Channel (broadcasting)
Feature (linguistics)
Pattern recognition (psychology)
Perspective distortion
Image (mathematics)
Computer vision
Speech recognition
Philosophy
Physics
Optics
Amplifier
Bandwidth (computing)
Operating system
Quantum mechanics
Linguistics
Computer network
Authors
Hao Liao, Xiurong Du, Yun Wu, Da‐Han Wang
Identifier
DOI:10.1145/3581807.3581808
Abstract
Scene text recognition has proven to be highly effective in solving various computer vision tasks. Recently, numerous recognition algorithms based on the encoder-decoder framework have been proposed to handle scene texts with perspective distortion and curved shapes. Nevertheless, most of these methods consider only single-scale features and do not take multi-scale features into account. Meanwhile, existing text recognition methods are mainly designed for English text, while ignoring the pivotal role of Chinese text. In this paper, we propose an end-to-end method that integrates multi-scale features for Chinese scene text recognition (CSTR). Specifically, we adopt and customize Dense Atrous Spatial Pyramid Pooling (DenseASPP) in our backbone network to capture multi-scale features of the input image while extending the receptive fields. Moreover, we add Squeeze-and-Excitation (SE) networks to capture attentional features with global information and further improve CSTR performance. Experimental results on Chinese scene text datasets demonstrate that the proposed method efficiently mitigates the loss of contextual information caused by varying text scales and outperforms state-of-the-art approaches.
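The abstract names two generic building blocks, DenseASPP for multi-scale context and SE for channel attention. The PyTorch sketch below illustrates common formulations of these two modules only; it is not the authors' implementation, and the class names, growth rate, dilation rates, and reduction ratio are illustrative assumptions.

```python
import torch
import torch.nn as nn


class SEBlock(nn.Module):
    """Squeeze-and-Excitation channel attention: reweight feature channels
    using globally pooled statistics (generic formulation, not the paper's exact variant)."""

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)  # squeeze: global spatial average per channel
        self.fc = nn.Sequential(             # excitation: bottleneck MLP + sigmoid gate
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        w = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * w  # channel-wise reweighting of the input feature map


class DenseASPPSketch(nn.Module):
    """Densely connected atrous (dilated) convolutions: each branch sees the
    concatenation of the input and all earlier branch outputs, so the module
    aggregates multi-scale context with progressively larger receptive fields.
    Dilation rates and growth rate here are assumptions for illustration."""

    def __init__(self, in_channels: int, growth: int = 64,
                 dilations=(3, 6, 12, 18, 24)):
        super().__init__()
        self.branches = nn.ModuleList()
        channels = in_channels
        for d in dilations:
            self.branches.append(nn.Sequential(
                nn.Conv2d(channels, growth, kernel_size=3,
                          padding=d, dilation=d, bias=False),
                nn.BatchNorm2d(growth),
                nn.ReLU(inplace=True),
            ))
            channels += growth  # dense connectivity: next branch consumes all prior features

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        features = [x]
        for branch in self.branches:
            features.append(branch(torch.cat(features, dim=1)))
        return torch.cat(features, dim=1)


if __name__ == "__main__":
    # Toy usage: a backbone feature map of shape (batch, channels, height, width).
    feat = torch.randn(2, 256, 8, 64)
    feat = DenseASPPSketch(256)(feat)   # multi-scale context aggregation
    feat = SEBlock(feat.shape[1])(feat)  # global channel attention
    print(feat.shape)
```

In a recognition pipeline of this kind, such modules would typically sit between the convolutional backbone and the encoder-decoder head; where exactly the paper places them is not specified in the abstract.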