Developing an OCR Model for Extracting Information from Invoices with Korean Language

Xiem HoangVan, Phu TranQuang, Minh DinhBao, Tien VuHuu

2023 International Conference on Advanced Technologies for Communications (ATC) (2023)

Abstract
Invoices are commercial documents that contain various pieces of information, including the purchased items, the time, and the total amount. Automatically extracting this key information therefore plays an important role. Although several methods exist for detecting and extracting information from documents written in Latin characters, methods for Korean characters are limited. In this paper, an Optical Character Recognition (OCR) model that combines a deep learning model with image preprocessing techniques is proposed to automatically extract the relevant information from Korean-language invoices. The proposed OCR model is assessed on a large-scale dataset of collected invoices, and the results show that an F1-score of 87% can be achieved with negligible processing time.
Keywords
OCR, text detection, text recognition