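Computer science
Text recognition
Artificial intelligence
Natural language processing
Transformer
Speech recognition
End-to-end principle
Machine translation
Sentence
Transcription (linguistics)
Language model
Image (mathematics)
Linguistics
Quantum mechanics
Physics
Philosophy
Voltage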
Authors
Tonghua Su,Shuchen Liu,Shengjie Zhou
Identifier
DOI:10.1007/978-3-030-86331-9_7
Abstract
Text image recognition and translation have a wide range of applications. A straightforward solution is a two-stage approach: first recognize the text, then translate it into the target language. In that approach the handwritten text recognition model and the machine translation model are trained separately, so any transcription error may degrade the translation quality. This paper proposes an end-to-end learning architecture that directly translates English handwritten text in images into Chinese. The handwriting recognition task and the translation task are combined in a unified deep learning model. First, we perform visual encoding; next, we bridge the semantic gap with a feature transformer; finally, a textual decoder generates the target sentence. To train the model effectively, we use transfer learning to improve its generalization under low-resource conditions. Experiments compare our method with the traditional two-stage one. The results indicate that the performance of the end-to-end model improves greatly as the amount of training data increases, and that the end-to-end model becomes more advantageous when a larger amount of training data is available.
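The three stages named in the abstract (visual encoding, a feature transformer, a textual decoder) can be illustrated by the data flow below. This is only a toy sketch, not the authors' model: the abstract does not specify any layer, so every function name, the strip-averaging encoder, the single tanh projection, the position-wise greedy decoder, and the `zh_*` placeholder tokens are assumptions made for illustration. What it does show is that the image goes straight to target tokens without an intermediate English transcription.

```python
import math
import random
from typing import List

Vector = List[float]
Matrix = List[Vector]


def visual_encoder(image: Matrix, strip_width: int) -> Matrix:
    """Toy visual encoding (assumption): cut the image into vertical strips
    and average each row of a strip, giving one feature vector per strip."""
    width = len(image[0])
    features = []
    for start in range(0, width, strip_width):
        stop = min(start + strip_width, width)
        features.append([sum(row[start:stop]) / (stop - start) for row in image])
    return features  # shape: (num_strips, image_height)


def feature_transformer(features: Matrix, weights: Matrix) -> Matrix:
    """Toy bridge from visual to textual space (assumption): one linear
    projection followed by tanh. weights has shape (image_height, model_dim)."""
    model_dim = len(weights[0])
    return [
        [math.tanh(sum(f * weights[i][j] for i, f in enumerate(vec)))
         for j in range(model_dim)]
        for vec in features
    ]


def textual_decoder(memory: Matrix, embeddings: Matrix,
                    vocab: List[str], max_len: int) -> List[str]:
    """Toy greedy decoder (assumption): at step t, score every target token
    against memory position t, emit the best one, and stop at <eos>."""
    output = []
    for t in range(min(max_len, len(memory))):
        scores = [sum(m * e for m, e in zip(memory[t], emb)) for emb in embeddings]
        token = vocab[scores.index(max(scores))]
        if token == "<eos>":
            break
        output.append(token)
    return output


def translate_image(image: Matrix, weights: Matrix, embeddings: Matrix,
                    vocab: List[str], strip_width: int = 4,
                    max_len: int = 20) -> List[str]:
    """End-to-end forward pass: image in, target tokens out."""
    features = visual_encoder(image, strip_width)
    memory = feature_transformer(features, weights)
    return textual_decoder(memory, embeddings, vocab, max_len)


if __name__ == "__main__":
    rng = random.Random(0)
    height, width, model_dim = 8, 32, 6
    vocab = ["<eos>", "zh_0", "zh_1", "zh_2", "zh_3"]  # placeholder target tokens
    image = [[rng.random() for _ in range(width)] for _ in range(height)]
    weights = [[rng.uniform(-1, 1) for _ in range(model_dim)] for _ in range(height)]
    embeddings = [[rng.uniform(-1, 1) for _ in vocab] + [0.0] * (model_dim - len(vocab))
                  for _ in vocab]
    print(translate_image(image, weights, embeddings, vocab))
```

In a trained system the three stages would share one loss and be optimized jointly, which is what distinguishes the end-to-end setup from the two-stage baseline; the random weights here only exercise the shapes.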