Data-Driven Strain Design Using Aggregated Adaptive Laboratory Evolution Mutational Data

bioRxiv (2021)

Abstract
Microbes are being engineered for an increasingly large and diverse set of applications. However, designing microbial genomes remains challenging due to the general complexity of biological systems. Adaptive Laboratory Evolution (ALE) leverages nature’s problem-solving processes to generate optimized genotypes currently inaccessible to rational methods. The large amount of public ALE data now represents a new opportunity for data-driven strain design. This study presents a first-of-its-kind meta-analysis workflow to derive data-driven strain designs from aggregate ALE mutational data using rich mutation annotations together with statistical and structural biology methods. The mutational dataset consolidated and utilized in this study contained 63 Escherichia coli K-12 MG1655 based ALE experiments, described by 93 unique environmental conditions, 357 independent evolutions, and 13,957 observed mutations. High-level trends across the entire dataset were established and revealed that ALE-derived strain designs will largely be gene-centric, as opposed to non-coding, and that a relatively small number of variants (approximately 4) can significantly alter cellular states and provide benefits that range from an increase in fitness to being essential for survival. Three novel, experimentally validated designs relevant to metabolic engineering applications are presented as use cases for the workflow. Specifically, these designs increased growth rates with glycerol as a carbon source through a point mutation in glpK and a truncation of cyaA, or increased tolerance to toxic levels of isobutyric acid through a truncation of pykF. These results demonstrate how strain designs can be extracted from aggregated ALE data to enhance strain design efforts.

Competing Interest Statement: The authors have declared no competing interest.
* ALE: adaptive laboratory evolution
* SNP: single nucleotide polymorphism
* DEL: deletion
* MOB: mobile insertion elements
* INS: insertion
* SUB: substitution
* AMP: amplification
* TFBS: transcription factor binding site
* RBS: ribosomal binding site
* SV: structural variant
* SIFT: Sorting Intolerant from Tolerant
* PykF: pyruvate kinase I
* GlpK: glycerol kinase
* CyaA: adenylate cyclase
* Crr: PTS system glucose-specific EIIA component
* PTS: phosphotransferase system
* EIIA: Enzyme II A
* CCR: carbon catabolite repression
* cAMP-CRP: activated CRP complex
* CRP: cAMP receptor protein
* cAMP: cyclic AMP
* ΔΔG: the predicted difference between the free energy of unfolding the protein structure before and after the variant
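The abstract does not spell out the statistical steps of the workflow, so the following is only a minimal sketch of one plausible ingredient: counting how many independent evolutions within an experiment mutated the same gene, a simple signal of convergence. The gene names come from the abstract, but the records, the evolution IDs, the function name `recurrent_genes`, and the `min_evolutions` threshold are all hypothetical and are not taken from the study.

```python
from collections import defaultdict

# Toy records invented for illustration only (NOT data from the study):
# (experiment condition, independent evolution ID, mutated gene).
toy_mutations = [
    ("glycerol", "A1", "glpK"),
    ("glycerol", "A2", "glpK"),
    ("glycerol", "A2", "cyaA"),
    ("glycerol", "A3", "glpK"),
    ("isobutyric_acid", "B1", "pykF"),
    ("isobutyric_acid", "B2", "pykF"),
    ("isobutyric_acid", "B2", "pykF"),  # second hit in the same lineage
]


def recurrent_genes(mutations, min_evolutions=2):
    """Return {(experiment, gene): n} for genes mutated in at least
    `min_evolutions` independent evolutions of the same experiment.

    Repeated hits within one lineage are counted once, so the score
    reflects independent recurrence rather than raw mutation counts.
    """
    lineages = defaultdict(set)
    for experiment, evolution, gene in mutations:
        lineages[(experiment, gene)].add(evolution)
    return {
        key: len(evolutions)
        for key, evolutions in lineages.items()
        if len(evolutions) >= min_evolutions
    }


convergent = recurrent_genes(toy_mutations)
for (experiment, gene), n in sorted(convergent.items()):
    print(f"{experiment}\t{gene}\t{n} independent evolutions")
```

Counting distinct lineages rather than raw mutations is a deliberate choice in this sketch: a gene hit twice in one evolution is weaker evidence of selection than a gene hit once in each of two independent evolutions.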
Keywords
strain design, evolution, data-driven