Knowledge-guided data mining on the standardized architecture of NRPS: Subtypes, novel motifs, and sequence entanglements

Ruolin He, Jinyu Zhang, Yuanzhe Shao, Shaohua Gu, Chen Song, Long Qian, Wen-Bing Yin, Zhiyuan Li

PLOS Computational Biology (2023)

Abstract
Non-ribosomal peptide synthetase (NRPS) is a diverse family of biosynthetic enzymes for the assembly of bioactive peptides. Despite advances in microbial sequencing, the lack of a consistent standard for annotating NRPS domains and modules has made data-driven discoveries challenging. To address this, we introduced a standardized architecture for NRPS, by using known conserved motifs to partition typical domains. This motif-and-intermotif standardization allowed for systematic evaluations of sequence properties from a large number of NRPS pathways, resulting in the most comprehensive cross-kingdom C domain subtype classifications to date, as well as the discovery and experimental validation of novel conserved motifs with functional significance. Furthermore, our coevolution analysis revealed important barriers associated with re-engineering NRPSs and uncovered the entanglement between phylogeny and substrate specificity in NRPS sequences. Our findings provide a comprehensive and statistically insightful analysis of NRPS sequences, opening avenues for future data-driven discoveries.

Author summary

NRPS, a gigantic enzyme that produces diverse microbial secondary metabolites, provides a rich source for important medical products such as antibiotics and antitumor agents. Despite the extensive knowledge gained about its structure and a large amount of sequencing data available, the frequent failure of re-engineering NRPS in synthetic biology highlights that much still needs to be discovered. In this work, we applied existing knowledge to data mining of NRPS sequences, using well-known conserved motifs to partition NRPS sequences into motif-intermotif architectures. This standardization allows for integrating large amounts of sequences from different sources, providing a comprehensive overview of NRPSs across different kingdoms. Our findings included new C domain subtypes, novel conserved motifs with implications in structural flexibility, and insights into why NRPSs are so difficult to re-engineer. To facilitate researchers in related fields, we constructed an online platform, "NRPS Motif Finder", for parsing the motif-and-intermotif architecture and C domain subtype classification. We believe that this knowledge-guided approach not only advances our understanding of NRPSs but also provides a useful methodology for data mining in large-scale biological sequences.
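The idea of partitioning a sequence into a motif-and-intermotif architecture can be illustrated with a minimal sketch. This is not the authors' implementation or the "NRPS Motif Finder" code: the function name `partition`, the motif labels `M1` and `M2`, their regular expressions, and the toy sequence are all hypothetical placeholders chosen only to show the alternating motif/intermotif layout.

```python
import re


def partition(sequence, motifs):
    """Split a sequence into alternating intermotif and motif segments.

    motifs is an ordered list of (name, regex) pairs. Each motif is
    searched for after the end of the previous match; motifs that are
    not found are skipped. Returns a list of (label, subsequence).
    """
    segments = []
    pos = 0
    prev = "start"
    for name, pattern in motifs:
        match = re.compile(pattern).search(sequence, pos)
        if match is None:
            continue
        # Region between the previous motif (or sequence start) and this motif.
        segments.append((f"{prev}-{name}", sequence[pos:match.start()]))
        # The motif itself.
        segments.append((name, match.group()))
        pos = match.end()
        prev = name
    # Whatever remains after the last motif found.
    segments.append((f"{prev}-end", sequence[pos:]))
    return segments


# Hypothetical motif patterns and toy sequence, for illustration only.
TOY_MOTIFS = [("M1", r"HH...DG"), ("M2", r"GG.S")]
toy_sequence = "MKTAHHILVDGWSLQPEGGASIVK"

segments = partition(toy_sequence, TOY_MOTIFS)
for label, subsequence in segments:
    print(f"{label:8s} {subsequence}")
```

Because every residue ends up in exactly one labeled segment, sequences from different sources can be compared segment by segment, which is the kind of standardization the abstract describes. A real pipeline would likely use profile-based motif detection rather than plain regular expressions.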