A shadow removal method for tesseract text recognition

2017 10th International Congress on Image and Signal Processing, BioMedical Engineering and Informatics (CISP-BMEI), 2017

Abstract
For shadowed text images, the character recognition performance of Tesseract drops significantly. In this paper, we propose a new method for preprocessing shadowed text images before they are passed to the Tesseract optical character recognition (OCR) engine. First, a local adaptive threshold algorithm converts the grayscale image into a binary image that captures the contours of the text. Next, to remove the salt-and-pepper noise in the shadow areas, we propose a double-filtering algorithm in which a projection method removes the noise between text regions and a median filter removes the noise within characters. Finally, the processed binary image is fed into the Tesseract OCR engine. Experimental results show that the proposed method achieves better character recognition performance on shadowed text images.
Keywords
Tesseract, shadow, local adaptive threshold, projection denoising, median filter
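
The three preprocessing stages named in the abstract (local adaptive threshold, projection denoising, median filter) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the function names, the mean-based threshold rule, the parameters radius, offset and min_count, and the simple "clear sparse rows and columns" reading of projection denoising are all assumptions made for illustration. Images are plain lists of lists, with grayscale values in 0..255 and binary values where 1 means ink.

```python
from statistics import median_low


def _window(img, y, x, radius):
    """Pixels in the square neighbourhood of (y, x), clipped at the borders."""
    h, w = len(img), len(img[0])
    return [img[j][i]
            for j in range(max(0, y - radius), min(h, y + radius + 1))
            for i in range(max(0, x - radius), min(w, x + radius + 1))]


def adaptive_threshold(gray, radius=2, offset=10):
    """Mark a pixel as ink (1) when it is darker than its local mean by more
    than `offset`. Comparing against the local mean rather than one global
    value is what lets the rule tolerate a gradual shadow. (Assumed rule.)"""
    out = []
    for y in range(len(gray)):
        row = []
        for x in range(len(gray[0])):
            win = _window(gray, y, x, radius)
            row.append(1 if gray[y][x] < sum(win) / len(win) - offset else 0)
        out.append(row)
    return out


def projection_denoise(binary, min_count=2):
    """Clear rows and columns whose ink count (projection) is below
    `min_count`. Sparse rows and columns are treated as gaps between text,
    so any ink found there is assumed to be noise. (Assumed simplification.)"""
    h, w = len(binary), len(binary[0])
    row_ok = [sum(row) >= min_count for row in binary]
    col_ok = [sum(binary[y][x] for y in range(h)) >= min_count for x in range(w)]
    return [[binary[y][x] if row_ok[y] and col_ok[x] else 0 for x in range(w)]
            for y in range(h)]


def median_filter(binary, radius=1):
    """Replace each pixel with the median of its neighbourhood, which removes
    isolated specks and fills isolated holes inside strokes."""
    return [[median_low(_window(binary, y, x, radius))
             for x in range(len(binary[0]))]
            for y in range(len(binary))]


def preprocess(gray):
    """Run the stages in the order the abstract lists them."""
    return median_filter(projection_denoise(adaptive_threshold(gray)))
```

On real scans the window radius and the projection threshold would need tuning to the stroke width and line spacing, and an array library would be far faster than nested lists. The cleaned binary image could then be converted to an image object and passed to an OCR wrapper such as pytesseract's image_to_string; that step is omitted here so that the sketch depends only on the standard library.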