OpenTE: Open-Structure Table Extraction from Text

Haoyu Dong, Mengkang Hu, Qinyu Xu, Haochen Wang, Yue Hu

IEEE International Conference on Acoustics, Speech, and Signal Processing (2024)

Abstract
This paper presents the Open-Structure Table Extraction (OpenTE) task, which aims to extract a table with intrinsic semantic, calculational, and hierarchical structure from unstructured text. We devise a novel Identification-Extraction-Grounding (IEG) framework for language models (LMs) comprising three chained steps: (1) identifying semantic and calculational relationships among columns, (2) extracting structured data from unstructured text, and (3) aligning the extracted data with the source text and the table structure using a separate discrete grounding model. Experimental results suggest that OpenTE presents a significant challenge for state-of-the-art LMs and demonstrate that the IEG framework achieves superior performance on both datasets, with F1 improvements of over 9% in the few-shot setting for GPT-3.5, GPT-4, and other large language models (LLMs), and of over 4.9% in the fine-tuning setting for the open-source BART. We will release the dataset to facilitate future research.
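
The three chained steps can be pictured with a toy sketch. Everything in it is an assumption made for illustration: hand-written rules and a regular expression stand in for the LMs and the grounding model, and the names `Relation`, `identify`, `extract`, and `ground` are hypothetical rather than taken from the paper. It only shows how identification, extraction, and grounding might feed into one another.

```python
import re
from dataclasses import dataclass


@dataclass
class Relation:
    """Hypothetical calculational relation: target column = sum of source columns."""
    target: str
    sources: tuple


def identify(columns):
    # Step 1 stand-in: a hand-written rule instead of an LM.
    # Assumes a "Total" column, if present, is the sum of all other columns.
    if "Total" in columns:
        others = tuple(c for c in columns if c != "Total")
        return [Relation("Total", others)]
    return []


def extract(text, columns):
    # Step 2 stand-in: a regex instead of an LM.
    # Finds "<number> <column name>" and keeps the number plus its character span.
    lowered = text.lower()
    cells = {}
    for col in columns:
        match = re.search(r"(\d+)\s+" + re.escape(col.lower()), lowered)
        if match:
            cells[col] = (int(match.group(1)), match.span(1))
    return cells


def ground(text, cells, relations):
    # Step 3 stand-in: check each value against its source span,
    # then check each calculational relation against the extracted values.
    report = {}
    for col, (value, (start, end)) in cells.items():
        report[col] = text[start:end] == str(value)
    for rel in relations:
        if rel.target in cells and all(s in cells for s in rel.sources):
            expected = sum(cells[s][0] for s in rel.sources)
            report[rel.target] = report[rel.target] and cells[rel.target][0] == expected
    return report


# Made-up input, not from the paper's datasets.
columns = ["Apples", "Pears", "Total"]
text = "The store sold 12 apples and 5 pears, 17 total."
relations = identify(columns)
cells = extract(text, columns)
print(ground(text, cells, relations))
```

In this sketch a cell counts as grounded when its value matches the text at the recorded span and, for a derived column, when it also agrees with the identified relation. A sentence that reported "18 total" would therefore leave the `Total` cell flagged as ungrounded.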
Keywords
Hierarchical Structure, Semantic Similarity, Language Model, Source Text, Unstructured Text, Experimental Analysis, Information Extraction, Hallucinations, Tree Structure, Large Margin, Exact Match, Hierarchical Relationships, Identification Of Modules, Basketball Players, Table Cells, Data Quality Issues, Incorrect Location, Key-value Pairs