A Large-Scale Chinese Patent Dataset for Information Extraction
Systems science & control engineering(2024)
摘要
Information extraction is an important foundation for automated patent analysis. Deep learning methods show promising results for information extraction, the performance of such methods heavily depends on the available corpus. To promote research on Chinese information extraction and evaluate the performance of related systems, we present a novel dataset, named CPIE, and make it publicly available. The dataset consisting of five thousands records of Chinese patent documents. The data were annotated by a tagging team using an on-line annotation tools. The dataset was evaluated using a state-of-the-art information extraction method that involves named entity recognition and relationship classification. The results shed light on new challenges and promote information extraction research.
更多查看译文
关键词
Information extraction,patent analysis,named entity recognition,dataset
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要