The Continuous Language of Protein Structure

Lukas Billera, Anton Oresten, Aron Stålmarck, Kenta Sato, Mateusz Kaduk, Ben Murrell

bioRxiv (2024)

Abstract

Just as language is composed of sublexical tokens that combine to form words, sentences, and paragraphs, protein backbones are composed of sub-structural elements that combine to form helices, sheets, folds, domains, and chains. Autoregressive language models operate on discrete tokens, whereas protein structure is inherently continuous, and generative approaches to protein design have borrowed more from image generation than language modeling. But autoregressive models do not inherently require their inputs and outputs to be discrete. Here we describe a generative autoregressive language model over the continuous space of protein backbones, where the distribution over the placement of each successive amino acid is conditioned on all preceding residues, and can be sampled from one residue after another. We show that this approach can learn to sample diverse and realistic protein chains, opening a new potential avenue for in silico protein design.

Competing Interest Statement: The authors have declared no competing interest.
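
To make the residue-by-residue sampling described in the abstract concrete, here is a minimal toy sketch in Python. It is not the authors' model: the abstract does not specify the architecture or the form of the conditional distribution, so `sample_next` is a hypothetical stand-in that simply steps a fixed distance in a random direction. The names `sample_next`, `sample_chain`, and `CA_STEP` are illustrative assumptions. The only point is the loop structure, in which each new continuous position is drawn given the full prefix of earlier positions and then appended to it.

```python
import math
import random

# Approximate distance between consecutive C-alpha atoms, in angstroms.
# Used here only to give the toy chain a plausible step size.
CA_STEP = 3.8


def sample_next(prefix, rng):
    """Hypothetical stand-in for a learned conditional p(x_i | x_1, ..., x_{i-1}).

    It receives the full prefix, as an autoregressive model would, but this
    toy version only uses the last point: it steps CA_STEP in a random direction.
    """
    # Draw a uniformly random direction by normalizing a Gaussian vector.
    direction = [rng.gauss(0.0, 1.0) for _ in range(3)]
    norm = math.sqrt(sum(c * c for c in direction))
    last = prefix[-1]
    return tuple(p + CA_STEP * c / norm for p, c in zip(last, direction))


def sample_chain(length, seed=0):
    """Sample a chain of `length` continuous 3D positions, one after another."""
    if length < 1:
        raise ValueError("length must be at least 1")
    rng = random.Random(seed)
    chain = [(0.0, 0.0, 0.0)]  # arbitrary starting position
    while len(chain) < length:
        # Each new position is sampled conditioned on everything placed so far.
        chain.append(sample_next(chain, rng))
    return chain


chain = sample_chain(10)
```

In a learned model, `sample_next` would be replaced by a network that maps the whole prefix to the parameters of a continuous distribution, so no discretization into tokens is needed at any step.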
