BioKGrapher: Initial Evaluation of Automated Knowledge Graph Construction from Biomedical Literature

Henning Schaefer, Ahmad Idrissi-Yaghir, Kamyar Arzideh, Hendrik Damm, Tabea M. G. Pakull, Cynthia S. Schmidt, Mikel Bahn, Georg Lodde, Elisabeth Livingstone, Dirk Schadendorf, Felix Nensa, Peter A. Horn, Christoph M. Friedrich

COMPUTATIONAL AND STRUCTURAL BIOTECHNOLOGY JOURNAL (2024)

Univ Hosp Essen | Univ Appl Sci & Arts Dortmund FHDO

Abstract

Background: The growth of biomedical literature presents challenges in extracting and structuring knowledge. Knowledge Graphs (KGs) offer a solution by representing relationships between biomedical entities. However, manual construction of KGs is labor-intensive and time-consuming, highlighting the need for automated methods. This work introduces BioKGrapher, a tool for automatic KG construction using large-scale publication data, with a focus on biomedical concepts related to specific medical conditions. BioKGrapher allows researchers to construct KGs from PubMed IDs.

Methods: The BioKGrapher pipeline begins with Named Entity Recognition and Linking (NER+NEL) to extract and normalize biomedical concepts from PubMed, mapping them to the Unified Medical Language System (UMLS). Extracted concepts are weighted and re-ranked using Kullback-Leibler divergence and local frequency balancing. These concepts are then integrated into hierarchical KGs, with relationships formed using terminologies like SNOMED CT and NCIt. Downstream applications include multi-label document classification using Adapter-infused Transformer models.

Results: BioKGrapher effectively aligns generated concepts with clinical practice guidelines from the German Guideline Program in Oncology (GGPO), achieving F1-Scores of up to 0.6. In multi-label classification, Adapter-infused models using a BioKGrapher cancer-specific KG improved micro F1-Scores by up to 0.89 percentage points over a non-specific KG and 2.16 points over base models across three BERT variants. The drug-disease extraction case study identified indications for Nivolumab and Rituximab.

Conclusion: BioKGrapher is a tool for automatic KG construction, aligning with the GGPO and enhancing downstream task performance. It offers a scalable solution for managing biomedical knowledge, with potential applications in literature recommendation, decision support, and drug repurposing.
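
The concept weighting step described in the Methods (Kullback-Leibler divergence between a condition-specific corpus and a background corpus) can be illustrated with a minimal sketch. This is not the BioKGrapher implementation: the function names, the add-alpha smoothing, the use of the per-concept term of the divergence as a ranking score, and the toy counts are all assumptions made for illustration, and the "local frequency balancing" step is not reproduced here. Plain strings stand in for UMLS concept identifiers.

```python
import math

def kl_concept_scores(target_counts, background_counts, alpha=1.0):
    """Per-concept contribution to KL(target || background).

    Assumption: add-alpha smoothing over the union vocabulary so that
    concepts missing from one corpus do not produce log(0).
    """
    vocab = set(target_counts) | set(background_counts)
    t_total = sum(target_counts.values()) + alpha * len(vocab)
    b_total = sum(background_counts.values()) + alpha * len(vocab)
    scores = {}
    for concept in vocab:
        p = (target_counts.get(concept, 0) + alpha) / t_total
        q = (background_counts.get(concept, 0) + alpha) / b_total
        # Positive when the concept is over-represented in the target corpus.
        scores[concept] = p * math.log(p / q)
    return scores

def rank_concepts(scores):
    """Concepts sorted from most to least characteristic of the target corpus."""
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical concept counts (illustrative only, not data from the paper).
target = {"melanoma": 50, "nivolumab": 30, "patient": 20}
background = {"melanoma": 5, "nivolumab": 2, "patient": 500, "diabetes": 100}

scores = kl_concept_scores(target, background)
ranked = rank_concepts(scores)
```

In this toy setup, condition-specific concepts such as "melanoma" receive high positive scores, while generic concepts such as "patient", which are frequent everywhere, receive negative scores and drop in the ranking. The scores sum to the full divergence between the two smoothed distributions.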

Key words

Knowledge graph, Named entity recognition, Entity linking, Clinical guidelines, Software

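[Key points] This paper introduces BioKGrapher, a tool for automated knowledge graph construction from large-scale biomedical literature data. It improves the organization of biomedical concepts and the extraction of relationships between them. Its novelty lies in combining NER+NEL, concept weighting, and validation against clinical guidelines.

[Methods] BioKGrapher uses NER+NEL to extract and normalize biomedical concepts from PubMed and maps them to UMLS. The concepts are then weighted and re-ranked using Kullback-Leibler divergence and local frequency balancing, and finally integrated into hierarchical knowledge graphs.

[Experiments] The experiments use PubMed data. Concepts from the knowledge graphs built by BioKGrapher were aligned with clinical guidelines from the German Guideline Program in Oncology (GGPO), reaching F1-Scores of up to 0.6. In multi-label document classification, Adapter-infused models using a BioKGrapher cancer-specific KG improved F1-Scores across three BERT variants over both a non-specific KG and base models, by up to 0.89 percentage points over the non-specific KG. The drug-disease extraction case study identified indications for Nivolumab and Rituximab.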
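
For readers unfamiliar with the metric reported for the multi-label classification task, the following minimal sketch shows how a micro-averaged F1-Score is computed and how a difference between two scores is expressed in percentage points. The function name and the label sets are hypothetical and do not come from the paper.

```python
def micro_f1(gold, pred):
    """Micro-averaged F1 over documents, each with a set of labels.

    True positives, false positives, and false negatives are pooled
    across all documents and labels before computing precision and recall.
    """
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        tp += len(g & p)   # labels predicted and correct
        fp += len(p - g)   # labels predicted but not in the gold set
        fn += len(g - p)   # gold labels that were missed
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Hypothetical gold and predicted label sets for three documents.
gold = [{"a", "b"}, {"c"}, {"a"}]
pred = [{"a"}, {"c", "b"}, {"a"}]

score = micro_f1(gold, pred)

def percentage_points(new_score, old_score):
    """Difference between two scores on a 0..1 scale, in percentage points."""
    return (new_score - old_score) * 100
```

Here three labels are predicted correctly, one is spurious, and one is missed, so precision and recall are both 0.75 and the micro F1-Score is 0.75. A gain of 0.89 percentage points corresponds to a difference of 0.0089 on this 0 to 1 scale.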