Table information extraction and analysis: A robust geometric approach based on GatedGCN.

Ying Liu, Xiaoyun Liang, Shaoqiong Chen, Liang Diao, Xin Tang, Rui Fang, Weifu Chen

ICPR (2022)

Abstract
With the rapid development of artificial intelligence, optical character recognition (OCR) is applied to analyze and understand the contents of various images, which greatly benefits online office work and makes business processes more intelligent. As downstream tasks of OCR, information extraction and table analysis are indispensable for acquiring target information. However, once text has been obtained from an invoice image or a table image by detection and recognition methods, extracting the necessary information from a large amount of text, or analyzing table information through reconstruction, remains difficult. In this paper, we propose a novel model based on gated graph convolutional networks (GatedGCNs) to extract key information from documents and reconstruct table information from listing images. Unlike manual methods, the GatedGCN-based model considers three kinds of features for each semantic entity: the position of the entity, the box containing the entity, and the text inside the box. The proposed model also considers the relationships between semantic entities, which is a key factor in improving classification accuracy. Since the update of gated edges in GatedGCN can be treated as a new way to implement an attention mechanism, the model can integrate more critical information and discard unnecessary information. Therefore, by combining the node features and edge features we have extracted, the model achieves strong overall results in precision, recall, F1 score, and accuracy on key field extraction (treated as a node classification problem) and table reconstruction (treated as a link prediction problem), evaluated on the Medical Invoice, Train Tickets, SciTSR, and ICDAR2013 datasets.
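The abstract's remark that gated edge updates act like an attention mechanism can be illustrated with a minimal sketch of one gated graph-convolution step. This follows the commonly used residual gated formulation, which is an assumption here; the abstract does not give the paper's exact layer, and normalization layers and learned biases are omitted. The function name `gated_gcn_layer`, the weight keys `A`, `B`, `C`, `U`, `V`, and the toy graph are all hypothetical choices made for illustration.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def gated_gcn_layer(h, e, edges, p, eps=1e-6):
    """One simplified gated graph-convolution step (assumed generic form).

    h:     list of node feature vectors, all of length d
    e:     list of edge feature vectors, aligned with `edges`
    edges: list of (src, dst) pairs; messages flow from src to dst
    p:     dict of d x d weight matrices under keys "A", "B", "C", "U", "V"
    Returns (new_h, new_e, gates).
    """
    d = len(h[0])
    num = [[0.0] * d for _ in h]  # gated sum of incoming messages per node
    den = [[0.0] * d for _ in h]  # sum of gates per node, used to normalize
    new_e, gates = [], []
    for (src, dst), e_k in zip(edges, e):
        a = matvec(p["A"], h[dst])
        b = matvec(p["B"], h[src])
        c = matvec(p["C"], e_k)
        e_hat = [x + y + z for x, y, z in zip(a, b, c)]
        # The gate plays the role of a per-feature attention weight on this edge.
        gate = [sigmoid(v) for v in e_hat]
        msg = matvec(p["V"], h[src])
        for t in range(d):
            num[dst][t] += gate[t] * msg[t]
            den[dst][t] += gate[t]
        gates.append(gate)
        # Residual edge update: e_k + ReLU(e_hat).
        new_e.append([x + max(0.0, v) for x, v in zip(e_k, e_hat)])
    new_h = []
    for i, h_i in enumerate(h):
        self_term = matvec(p["U"], h_i)
        agg = [num[i][t] / (den[i][t] + eps) for t in range(d)]
        # Residual node update: h_i + ReLU(U h_i + normalized gated messages).
        new_h.append([x + max(0.0, s + g) for x, s, g in zip(h_i, self_term, agg)])
    return new_h, new_e, gates


# Toy graph: three text boxes as nodes, three directed edges between them.
identity = [[1.0, 0.0], [0.0, 1.0]]
zero = [[0.0, 0.0], [0.0, 0.0]]
params = {"A": zero, "B": zero, "C": zero, "U": zero, "V": identity}
h = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
edges = [(0, 1), (1, 0), (2, 1)]
e = [[0.0, 0.0] for _ in edges]
# With all-zero A, B, C every gate equals sigmoid(0) = 0.5.
new_h, new_e, gates = gated_gcn_layer(h, e, edges, params)
```

In this sketch, the updated node features would feed a per-node classifier for key field extraction, and the updated edge features (or pairs of node features) would feed a per-edge classifier for table reconstruction; both heads are left out because the abstract does not specify them.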
Keywords
table information extraction, information extraction, robust geometric approach