A joint embedding of protein sequence and structure enables robust variant effect predictions

Lasse M. Blaabjerg, Nicolas Jonsson,Wouter Boomsma,Amelie Stein,Kresten Lindorff-Larsen

biorxiv(2023)

引用 0|浏览3
暂无评分
摘要
The ability to predict how amino acid changes may affect protein function has a wide range of applications including in disease variant classification and protein engineering. Many existing methods focus on learning from patterns found in either protein sequences or protein structures. Here, we present a method for integrating information from protein sequences and structures in a single model that we term SSEmb (Sequence Structure Embedding). SSEmb combines a graph representation for the protein structure with a transformer model for processing multiple sequence alignments, and we show that by integrating both types of information we obtain a variant effect prediction model that is more robust to cases where sequence information is scarce. Furthermore, we find that SSEmb learns embeddings of the sequence and structural properties that are useful for other downstream tasks. We exemplify this by training a downstream model to predict protein-protein binding sites at high accuracy using only the SSEmb embeddings as input. We envisage that SSEmb may be useful both for zero-shot predictions of variant effects and as a representation for predicting protein properties that depend on protein sequence and structure. ### Competing Interest Statement KL-L holds stock options in and is a consultant for Peptone Ltd.
更多
查看译文
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要