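Optical character recognition
Preprocessor
Computer science
Artificial intelligence
Computer vision
Pattern recognition (psychology)
Low resolution
Character (mathematics)
Character recognition
Image (mathematics)
Mathematics
High resolution
Remote sensing
Geometry
Geology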
Authors
Matteo Brisinello, Ratko Grbić, Matija Pul, Tihomir Andelic
Identifier
DOI:10.23919/elmar.2017.8124460
Abstract
Efficient Optical Character Recognition (OCR) on images captured from Set-Top Boxes (STBs) plays an important role in STB testing. However, running OCR software on such images usually yields poor results, because the images can have low resolution, low image quality, or colorful backgrounds. To improve OCR performance, four different image preprocessing methods are proposed. In this paper, OCR is performed with Tesseract 3.5 and the relatively new Tesseract 4.0 on images captured from different STBs. On the original images, Tesseract 3.5 achieves 35.7% accuracy, while Tesseract 4.0 attains 70.2%. The proposed preprocessing methods improve OCR performance by 33.3% for Tesseract 3.5 and by 22.6% for Tesseract 4.0 on the available images.
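The abstract does not name the four preprocessing methods, so the following is only a generic sketch of the kind of step commonly applied before OCR on low-resolution, colorful screenshots: enlarging the image and binarizing it with a global Otsu threshold. It is not the authors' pipeline. The function names (`upscale_nearest`, `otsu_threshold`, `binarize`) are hypothetical, and the image is assumed to be a grayscale list of rows with integer values in 0..255. In practice an image library would do this work and the result would then be handed to Tesseract.

```python
def upscale_nearest(img, factor):
    """Enlarge a grayscale image by an integer factor (nearest neighbor)."""
    out = []
    for row in img:
        # Repeat each pixel horizontally, then repeat the widened row vertically.
        wide = [px for px in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))
    return out


def otsu_threshold(img):
    """Return the gray level that maximizes between-class variance (Otsu)."""
    hist = [0] * 256
    for row in img:
        for px in row:
            hist[px] += 1
    total = sum(hist)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w_bg, sum_bg = 0, 0
    for t in range(256):
        # Pixels with value <= t form the "background" class.
        w_bg += hist[t]
        sum_bg += t * hist[t]
        if w_bg == 0:
            continue
        w_fg = total - w_bg
        if w_fg == 0:
            break
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        var_between = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarize(img, threshold):
    """Map pixels above the threshold to white (255), the rest to black (0)."""
    return [[255 if px > threshold else 0 for px in row] for row in img]


if __name__ == "__main__":
    tiny = [[10, 200], [20, 220]]          # toy 2x2 "screenshot"
    big = upscale_nearest(tiny, 2)         # 4x4 after enlargement
    t = otsu_threshold(big)
    for row in binarize(big, t):
        print(row)
```

Enlarging first and thresholding second is a design choice of this sketch, not a claim about the paper; whether either step helps depends on the images and on the Tesseract version in use.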