Genomics-FM: Universal Foundation Model for Versatile and Data-Efficient Functional Genomic Analysis

Peng Ye, Weiqing Bai, Yuchen Ren, Wenran Li, Lifeng Qiao, Chaoqi Liang, Linxiao Wang, Yuchen Cai, Jianle Sun, Zejun Yang, Peng Zheng, Nanqing Dong, Tao Chen, Zhihui Wang, Xihui Liu, Xinzhu Ma, Hongliang Yan, Zhen Wang, Sijia Wang, Wanli Ouyang

bioRxiv (2024)

Abstract
Artificial intelligence (AI) plays a crucial role in functional genomic analysis, offering great potential for comprehending biological phenomena such as heredity, development, diseases, and evolution. However, the development of AI models needs substantial labeled data, and these models are typically task-specific with limited generalizability to various applications. Here, we develop Genomics-FM, a genomic-vocabulary-driven foundation model that enables versatile and label-efficient functional genomic analysis. Specifically, Genomics-FM is first pretrained with an ensemble genomic vocabulary on vast unlabeled data to learn comprehensive and generalizable representations, and then finetuned with a specific genomic vocabulary on limited labeled data to selectively activate and adapt the pretraining knowledge for specific tasks. We show that Genomics-FM significantly reduces the dependence on labeled data and outperforms existing models across a comprehensive suite of tasks, including genome annotation, epigenomic and expression profile prediction, and variant effect assessment. Remarkably, Genomics-FM even shows impressive zero-shot predictive capabilities across diverse species and tissues and exhibits noticeable adaptability to RNA-related tasks. With feasibility in data-scarce and even cross-domain biological scenarios, Genomics-FM will promote the broad application of AI and empower researchers to tackle previously insurmountable challenges, paving the way for groundbreaking research and discoveries.

Competing Interest Statement: The authors have declared no competing interest.