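Transformer
Computer science
Computational intelligence
Task (project management)
Artificial intelligence
Mechanism (biology)
Speech recognition
Pattern recognition (psychology)
Natural language processing
Engineering
Voltage
Electrical engineering
Physics
Quantum mechanics
Systems engineering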
Authors
Qian‐Feng Zhang, Feng Liu, Wanru Song
Identifier
DOI:10.1007/s40747-024-01713-8
Abstract
Intelligent technologies are now widely used in education. For example, Optical Character Recognition (OCR) supports smart education scenarios such as online homework correction and teaching data analysis. A fundamental yet challenging task in this setting is to accurately convert images of handwritten English text into editable text. The difficulty arises because handwriting varies with individual writing habits and often contains smeared or overlapping strokes, which makes it hard to align the image with the ground-truth text. In addition, the scarcity of handwritten text data further lowers the recognition rate. To address these issues, this paper makes two contributions. First, it extends the existing dataset and introduces hyphenated data annotation, providing data support for improving the robustness and discriminative ability of the model. Second, it proposes a novel framework for handwritten English text recognition, named Improved Multi-task Transformer based on Localization Mechanism Network (IMTLM-Net). IMTLM-Net consists of an encoding module and a decoding module. The encoding module introduces a dual-stream mechanism that processes text and images simultaneously: a Vision Transformer (ViT) encodes the images, and a Permutation Language Model (PLM) is designed for word arrangement. The decoding module employs two Multiple Head Attention (MHA) units, focusing on text sequences and image sequences. Moreover, a localization mechanism (LM) is applied to enhance the extraction of font structure features from image data, which in turn improves the model's ability to capture complex details. Extensive experiments demonstrate that the proposed method achieves state-of-the-art results in handwritten text recognition.
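The abstract does not describe how the PLM is implemented. As a rough illustration of the general idea behind permutation language modeling, the sketch below builds an attention mask from a prediction order: each position may attend only to positions predicted earlier in that order. This is a generic, minimal sketch and not the authors' code; the function name `permutation_attention_mask` and the boolean-list mask format are assumptions made for this example.

```python
def permutation_attention_mask(perm):
    """Build a mask where mask[i][j] is True if position i may attend to j.

    `perm` lists the sequence positions in the order they are predicted.
    A position may attend only to positions predicted strictly earlier.
    Hypothetical helper for illustration; not taken from the paper.
    """
    n = len(perm)
    if sorted(perm) != list(range(n)):
        raise ValueError("perm must be a permutation of 0..n-1")
    # rank[p] is the step at which position p is predicted.
    rank = [0] * n
    for step, pos in enumerate(perm):
        rank[pos] = step
    return [[rank[j] < rank[i] for j in range(n)] for i in range(n)]


# Left-to-right order reproduces the usual causal (strictly lower-triangular) mask.
causal = permutation_attention_mask([0, 1, 2])

# A shuffled order: position 2 is predicted first, then 0, then 1.
shuffled = permutation_attention_mask([2, 0, 1])
```

Sampling several such orders during training is what lets a single decoder learn from more than one factorization of the same character sequence; whether IMTLM-Net samples orders this way is not stated in the abstract.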