pathCLIP: Detection of Genes and Gene Relations from Biological Pathway Figures through Image-Text Contrastive Learning.

bioRxiv : the preprint server for biology(2023)

引用 0|浏览6
暂无评分
摘要
In biomedical literature, biological pathways are commonly described through a combination of images and text. These pathways contain valuable information, including genes and their relationships, which provide insight into biological mechanisms and precision medicine. Curating pathway information across the literature enables the integration of this information to build a comprehensive knowledge base. While some studies have extracted pathway information from images and text independently, they often overlook the correspondence between the two modalities. In this paper, we present a pathway figure curation system named pathCLIP for identifying genes and gene relations from pathway figures. Our key innovation is the use of an image-text contrastive learning model to learn coordinated embeddings of image snippets and text descriptions of genes and gene relations, thereby improving curation. Our validation results, using pathway figures from PubMed, showed that our multimodal model outperforms models using only a single modality. Additionally, our system effectively curates genes and gene relations from multiple literature sources. A case study on extracting pathway information from non-small cell lung cancer literature further demonstrates the usefulness of our curated pathway information in enhancing related pathways in the KEGG database.
更多
查看译文
关键词
Contrastive learning,Entity detection,Literature mining,Pathway figures,Relation extraction
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要