Latent Taxonomic Signatures: Alignment Free Approach Reveals Semantic Properties of Species Proteomes

Social Science Research Network (2021)

Abstract
Alignment-based methods allow only one-to-one comparisons, promote a gene-centered viewpoint, and lack the broad insight needed for complex biological systems. In reality, each gene or protein is part of a conglomerate in which more than one sequence contributes to the functional network and evolutionary trajectory of the cell. Conserving these network interactions is arguably more important to evolutionary success than conserving the sequence integrity of an individual protein. Using an alignment-free language model, we encoded sets of randomly selected proteins from each species into distributed vector representations of the respective species. These representations captured transitive relations between otherwise unrelated proteins, resulting from conserved interactions within a proteome. This allowed us to discover Latent Taxonomic Signatures (LTSs), species-specific differences in the frequency of occurrence of short amino acid chains, reflecting constraints imposed on protein evolution by their proteome context. Even orphan proteins exhibited LTSs, allowing us to establish taxonomic relatedness in the total absence of alignment-based homology. The alignment-free approach presented here suggests that the difference between species is more than just numbers and sequences: actual semantic properties could be as important as protein family kinship when proteins evolve as parts of a system.
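
To make the idea of "frequency of short amino acid chains" concrete, the following is a minimal Python sketch that pools overlapping k-mer counts over a set of proteins and compares two such profiles. It is not the language model used in the paper: plain k-mer counting with cosine similarity is swapped in as a simpler, alignment-free stand-in. The function names, the choice of k = 3, and the toy sequences are all assumptions made for illustration.

```python
from collections import Counter
from math import sqrt


def kmer_counts(sequence, k=3):
    """Count overlapping k-mers (short amino acid chains) in one protein."""
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def species_profile(proteins, k=3):
    """Pool k-mer counts over a set of proteins and normalize to frequencies.

    Hypothetical helper: a crude frequency profile, not the distributed
    vector representation described in the abstract.
    """
    total = Counter()
    for seq in proteins:
        total.update(kmer_counts(seq, k))
    n = sum(total.values())
    return {kmer: count / n for kmer, count in total.items()} if n else {}


def cosine_similarity(p, q):
    """Cosine similarity between two sparse k-mer frequency profiles."""
    dot = sum(value * q.get(kmer, 0.0) for kmer, value in p.items())
    norm_p = sqrt(sum(value * value for value in p.values()))
    norm_q = sqrt(sum(value * value for value in q.values()))
    if norm_p == 0.0 or norm_q == 0.0:
        return 0.0
    return dot / (norm_p * norm_q)


if __name__ == "__main__":
    # Toy sequences, invented for illustration only.
    species_a = ["MKVLAAGIVG", "MKVLSAGIAG"]
    species_b = ["MSTNPKPQRK", "MSTNPRPQKK"]
    profile_a = species_profile(species_a)
    profile_b = species_profile(species_b)
    print(cosine_similarity(profile_a, profile_b))
```

A profile of this kind needs no alignment between proteins, which is the property the abstract relies on; whether it would recover the reported signatures is not something this sketch establishes.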