RITA: a Study on Scaling Up Generative Protein Sequence Models

Daniel Hesslow, Niccoló Zanichelli, Pascal Notin, Iacopo Poli, Debora Marks

arXiv (2022)

Abstract

In this work we introduce RITA: a suite of autoregressive generative models for protein sequences, with up to 1.2 billion parameters, trained on over 280 million protein sequences belonging to the UniRef-100 database. Such generative models hold the promise of greatly accelerating protein design. We conduct the first systematic study of how capabilities evolve with model size for autoregressive transformers in the protein domain: we evaluate RITA models in next amino acid prediction, zero-shot fitness, and enzyme function prediction, showing benefits from increased scale. We release the RITA models openly, to the benefit of the research community.

Keywords

models, protein, scaling, sequence
