KnowCoder: Coding Structured Knowledge into LLMs for Universal Information Extraction
arXiv (2024)
Abstract
In this paper, we propose KnowCoder, a Large Language Model (LLM) that conducts
Universal Information Extraction (UIE) via code generation. KnowCoder aims to
develop a unified schema representation that LLMs can easily understand
and an effective learning framework that encourages LLMs to follow schemas and
extract structured knowledge accurately. To achieve these goals, KnowCoder introduces
a code-style schema representation method to uniformly transform different
schemas into Python classes, with which complex schema information, such as
constraints among tasks in UIE, can be captured in an LLM-friendly manner. We
further construct a code-style schema library covering over 30,000
types of knowledge, which is the largest one for UIE, to the best of our
knowledge. To ease the learning process of LLMs, KnowCoder contains a two-phase
learning framework that enhances its schema understanding ability via code
pretraining and its schema following ability via instruction tuning. After code
pretraining on around 1.5B automatically constructed data, KnowCoder already
attains remarkable generalization ability and achieves a relative improvement of
49.8% in F1 over LLaMA2 under the few-shot setting. After
instruction tuning, KnowCoder further exhibits strong generalization ability on
unseen schemas and achieves improvements of up to 12.5% and 21.9%
over state-of-the-art (SOTA) baselines under the zero-shot and low-resource
settings, respectively. Additionally, based on our unified schema
representations, various human-annotated datasets can simultaneously be
utilized to refine KnowCoder, which achieves significant improvements of up to
7.5% under the supervised setting.
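
To make the idea of a code-style schema concrete, the following is a minimal sketch of how entity and relation types might be written as Python classes, with a constraint between tasks (which entity types a relation accepts) expressed in the class itself. All class names, attribute names, and the validation logic here are illustrative assumptions and are not taken from the KnowCoder schema library.

```python
from dataclasses import dataclass


@dataclass
class Entity:
    """Base class for entity types (hypothetical, for illustration only)."""
    name: str


class Person(Entity):
    """A human being."""


class Location(Entity):
    """A geographic place."""


@dataclass
class Relation:
    """Base class for relation types linking a head entity to a tail entity."""
    head: Entity
    tail: Entity

    # Unannotated class attributes are not dataclass fields; subclasses
    # override them to constrain which entity types they accept.
    head_type = Entity
    tail_type = Entity

    def __post_init__(self):
        # Enforce the cross-task constraint: a relation's arguments must be
        # instances of the entity types the schema declares.
        if not isinstance(self.head, self.head_type):
            raise TypeError(f"head must be a {self.head_type.__name__}")
        if not isinstance(self.tail, self.tail_type):
            raise TypeError(f"tail must be a {self.tail_type.__name__}")


class PlaceOfBirth(Relation):
    """The location where a person was born."""
    head_type = Person
    tail_type = Location


# An extraction result is then expressed as object instantiation code,
# which is the kind of output a code-generating model could be asked for.
results = [PlaceOfBirth(Person("Marie Curie"), Location("Warsaw"))]
```

In this sketch, class inheritance carries the type hierarchy, docstrings carry type descriptions, and the `head_type` and `tail_type` attributes carry the constraint that ties relation extraction to entity types. Swapping the arguments of `PlaceOfBirth` raises a `TypeError`, so schema violations surface as ordinary Python errors.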