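Computer science
Spelling
Error detection and correction
Artificial intelligence
Natural language processing
Speech recognition
Word (group theory)
Word error rate
Language model
Normalization (sociology)
Artificial neural network
Linguistics
Algorithm
Anthropology
Philosophy
Sociology
Authors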
Anuruth Lertpiya,Tawunrat Chalothorn,Ekapol Chuangsuwanich
Source
Journal: IEEE Access
[Institute of Electrical and Electronics Engineers]
Date: 2020-01-01
Volume/Issue: 8: 133403-133419
Citations: 14
Identifier
DOI:10.1109/access.2020.3010828
Abstract
Text correction systems (e.g., spell checkers) have been used to improve the quality of computerized text by detecting and correcting errors. However, the task of performing spelling correction and word normalization (text correction) for Thai social media text has remained largely unexplored. In this paper, we investigated how current text correction systems perform on correcting errors and word variances in Thai social texts and propose a method designed for this task. We have found that currently available Thai text correction systems are insufficiently robust for correcting spelling errors and word variances, while the text correctors designed for English grammatical error correction suffer from overcorrections (text rewrites). Thus, we proposed a neural-based text corrector with a two-stage structure to alleviate issues of overcorrections while exploiting the benefits of a neural Seq2Seq corrector. Our method consists of a neural-based error detector and a Seq2Seq neural error corrector with contextual attention. This novel architecture allows the Seq2Seq network to produce corrections based on both the erroneous text and its context without the need for an end-to-end structure. Our method outperformed all the other evaluated text correction systems. When compared to the second-best result (copy-augmented transformer), our method further reduced the word error rate (WER) from 2.51% to 2.07%, improved the generalized language evaluation understanding (GLEU) score from 0.9409 to 0.9502 on the Thai text correction task, and improved the GLEU score from 0.7409 to 0.7539 on the English spelling correction task.
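The abstract reports results as word error rate (WER). As a minimal sketch of how that metric is conventionally computed (word-level edit distance divided by the reference length), the following may help. It is not taken from the paper: the function name `word_error_rate` is hypothetical, and the inputs are assumed to be already segmented into word tokens, which for Thai requires a separate word segmentation step that is not shown here.

```python
from typing import Sequence


def word_error_rate(reference: Sequence[str], hypothesis: Sequence[str]) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Both arguments are assumed to be lists of word tokens (hypothetical
    interface; tokenization happens elsewhere).
    """
    if not reference:
        raise ValueError("reference must contain at least one token")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis tokens.
    prev = list(range(len(hypothesis) + 1))
    for i, ref_tok in enumerate(reference, start=1):
        curr = [i]  # turning i reference tokens into an empty hypothesis
        for j, hyp_tok in enumerate(hypothesis, start=1):
            substitution_cost = 0 if ref_tok == hyp_tok else 1
            curr.append(
                min(
                    prev[j] + 1,                      # reference token missing from hypothesis
                    curr[j - 1] + 1,                  # extra token in hypothesis
                    prev[j - 1] + substitution_cost,  # match or substitution
                )
            )
        prev = curr
    return prev[-1] / len(reference)


if __name__ == "__main__":
    ref = ["the", "cat", "sat", "down"]
    hyp = ["the", "cat", "sit", "down"]
    print(f"WER = {word_error_rate(ref, hyp):.2%}")
```

Under this definition, a drop from 2.51% to 2.07% means roughly 4 to 5 fewer word-level edits per 1,000 reference words.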