An approach for detecting and cleaning of struck-out handwritten text

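Computer science; Artificial intelligence; Preprocessor; Graph; Pattern recognition (psychology); Optical character recognition; Natural language processing; Connected component; Classifier (UML); Text detection; Image (mathematics); Theoretical computer science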
Authors
B.B. Chaudhuri, Chandranath Adak
Source
Journal: Pattern Recognition [Elsevier BV]
Volume/Issue: 61: 282-294 Cited by: 35
Identifier
DOI:10.1016/j.patcog.2016.07.032
Abstract

This paper addresses the identification and processing of struck-out text in unconstrained offline handwritten document images. If passed to an OCR engine, such text produces nonsensical character strings. We present a method that combines (a) pattern classification and (b) graph-based analysis to identify such text. In (a), a feature-based two-class (normal vs. struck-out text) SVM classifier detects moderate-sized struck-out components. In (b), the skeleton of the text component is treated as a graph, and the strike-out stroke is identified using a constrained shortest-path algorithm. To identify zigzag or wavy strike-outs, all paths are found and certain properties of zigzag and wavy lines are used. Some other types of strike-out stroke are detected by modifying this method. Large multi-word and multi-line strike-outs are segmented into smaller components and treated in the same way. The detected struck-out text can then be blocked from entering the OCR engine. In another application, involving historical documents, page images are to be generated along with their annotated ground truth. In this case, the strike-out strokes can be deleted from the words, which are then fed to the OCR engine; an inpainting-based cleaning approach is employed for this purpose. We worked on 500 pages of documents and obtained an overall F-measure of 91.56% for English script and 91.06% for Bengali script in struck-out text detection. For strike-out stroke identification, the F-measures were 89.65% (English) and 89.31% (Bengali); for stroke deletion, they were 91.16% (English) and 89.29% (Bengali).
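
The graph-based step (b) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not state what the path constraint is, so the example assumes a hypothetical one, a `slope_penalty` that makes vertical movement expensive so that a near-straight left-to-right stroke is preferred over the strokes of the letters. The function name `strike_path`, the `'#'` grid encoding, and the penalty value are all assumptions made for illustration.

```python
import heapq

def strike_path(skeleton, slope_penalty=2.0):
    """Return the cheapest left-to-right pixel path through a binary skeleton.

    skeleton: list of equal-length strings; '#' marks a skeleton pixel.
    slope_penalty: hypothetical extra cost per unit of vertical movement,
    standing in for the path constraint that the abstract leaves unspecified.
    """
    # Skeleton pixels are the graph nodes; 8-connected neighbours share an edge.
    pixels = {(r, c) for r, row in enumerate(skeleton)
              for c, ch in enumerate(row) if ch == "#"}
    if not pixels:
        return []
    left = min(c for _, c in pixels)
    right = max(c for _, c in pixels)

    # Multi-source Dijkstra: every pixel in the leftmost column is a start node.
    dist = {p: 0.0 for p in pixels if p[1] == left}
    prev = {}
    heap = [(0.0, p) for p in dist]
    heapq.heapify(heap)
    while heap:
        d, (r, c) = heapq.heappop(heap)
        if d > dist.get((r, c), float("inf")):
            continue  # stale heap entry
        if c == right:
            # The first settled pixel in the rightmost column ends the cheapest path.
            path = [(r, c)]
            while path[-1] in prev:
                path.append(prev[path[-1]])
            return path[::-1]
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                q = (r + dr, c + dc)
                if q == (r, c) or q not in pixels:
                    continue
                # Assumed cost model: 1 per step, plus a penalty for leaving the row.
                nd = d + 1.0 + slope_penalty * abs(dr)
                if nd < dist.get(q, float("inf")):
                    dist[q] = nd
                    prev[q] = (r, c)
                    heapq.heappush(heap, (nd, q))
    return []  # the leftmost and rightmost columns are not connected


# Toy skeleton: two diamond-shaped "letters" crossed by a horizontal stroke.
word = [
    "..#...#..",
    ".#.#.#.#.",
    "#########",
    ".#.#.#.#.",
    "..#...#..",
]
path = strike_path(word)  # list of (row, column) pixels along the candidate stroke
```

In this toy grid the candidate path runs along the middle row, because any detour through a letter pays the vertical penalty. Zigzag or wavy strike-outs would defeat such a penalty, which is consistent with the abstract handling them separately by enumerating all paths.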
