Large-scale Dataset and Effective Model for Variant-Disease Associations Extraction

14TH ACM CONFERENCE ON BIOINFORMATICS, COMPUTATIONAL BIOLOGY, AND HEALTH INFORMATICS, BCB 2023 (2023)

Abstract
Extracting variant-disease associations (VDAs) from the biomedical literature is a critical task in biomedical and genomics research, as it provides valuable insights into the genetic basis of diseases and facilitates the development of precision medicine. The biomedical literature is a vast and growing source of information containing a wealth of knowledge on genetic variants and their associations with diseases. However, the manual extraction of VDAs from the literature is a time-consuming and labor-intensive process, making it challenging to keep up with the rapidly expanding literature. Therefore, there is a pressing need to develop computational methods for effectively extracting and curating VDAs from the biomedical literature, and to build a comprehensive dataset for this significant task. In this paper, we present a large-scale, semi-automatically annotated dataset for VDA extraction from the biomedical literature (called VDAL) based on the DisGeNet platform, which contains one of the largest publicly available collections of genes and variants associated with human diseases. To the best of our knowledge, VDAL is one of the largest datasets for VDA extraction, containing 9,362 related PubMed documents from the biomedical domain. In addition, we propose a novel, simple yet effective model, called VDANet, which incorporates the corresponding gene embeddings of the variants to better explore the associations between genetic variants and human diseases. Extensive experiments on the constructed dataset show that VDANet significantly outperforms the state-of-the-art baseline methods, thus establishing a new benchmark for VDA extraction. For reproducibility, our code and data are available at https://github.com/JasonCLEI/VDANet.
Keywords
Variant-disease associations, Biomedical literature, PubMed website, DisGeNet platform, Corresponding gene embeddings