Chinese document image retrieval based on recognition candidates.

Xuhui Jia, Yong Xia, Rui Zhou, Hongwei Liang

FSKD (2012)

Abstract

Because the recognition rate for degraded Chinese documents is low, retrieval based directly on the OCR result performs poorly. In this paper, an indexing method that combines n-grams with recognition candidates is proposed to improve retrieval performance. To ease testing, this paper also presents a method to automatically generate the ground truth of an imaged document, synthesized degraded document images, and the ground truth of recognition candidates. Several large-scale synthesized document image collections are built and used, and the experimental results show that retrieval performance improves for collections with both high and low OCR error rates. © 2012 IEEE.
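
The abstract does not give the details of the indexing method, so the following is only a minimal sketch of the general idea of indexing n-grams built from recognition candidates, not the authors' implementation. It assumes each character position in a document carries a short list of OCR candidates, and that every n-gram formed by choosing one candidate per position is added to an inverted index. The function names (`candidate_ngrams`, `build_index`, `search`), the bigram setting, and the sample data are all hypothetical.

```python
from collections import defaultdict
from itertools import product


def candidate_ngrams(candidates, n=2):
    """Yield every n-gram formed by picking one candidate per position.

    `candidates` is a list of lists: candidates[i] holds the OCR
    candidates for character position i (an assumed data layout).
    """
    for i in range(len(candidates) - n + 1):
        for combo in product(*candidates[i:i + n]):
            yield "".join(combo)


def build_index(docs, n=2):
    """Build an inverted index mapping each n-gram to a set of document ids."""
    index = defaultdict(set)
    for doc_id, candidates in docs.items():
        for gram in candidate_ngrams(candidates, n):
            index[gram].add(doc_id)
    return index


def search(index, query, n=2):
    """Return ids of documents whose index contains every n-gram of the query."""
    grams = [query[i:i + n] for i in range(len(query) - n + 1)]
    if not grams:
        return set()
    result = set(index.get(grams[0], set()))
    for gram in grams[1:]:
        result &= index.get(gram, set())
    return result


# Hypothetical data: the true text of d1 is "文档图像", but the top-ranked
# OCR candidate at position 1 is the wrong character "挡".
docs = {
    "d1": [["文"], ["挡", "档"], ["图"], ["像", "象"]],
    "d2": [["图"], ["像"], ["检"], ["索"]],
}

# Index built from all candidates versus from the top-1 OCR result only.
index = build_index(docs)
top1_docs = {doc_id: [c[:1] for c in cands] for doc_id, cands in docs.items()}
top1_index = build_index(top1_docs)

print(search(index, "文档"))       # finds d1 through the second candidate
print(search(top1_index, "文档"))  # misses d1 because of the OCR error
```

In this toy setting, indexing the candidates recovers a document that the top-1 OCR text would miss, at the cost of a larger index and possible false matches. How the paper weights or prunes candidates is not stated in the abstract.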

Keywords

Chinese document image retrieval, indexing method with n-gram and recognition candidates, synthesized degraded document image, degradation, optical character recognition, ground truth, image retrieval, natural languages, n gram, error rate, estimation, indexing, image recognition
