Data-Driven Strain Design Using Aggregated Adaptive Laboratory Evolution Mutational Data
bioRxiv (2021)
Abstract
Microbes are being engineered for an increasingly large and diverse set of applications. However, designing microbial genomes remains challenging because of the general complexity of biological systems. Adaptive Laboratory Evolution (ALE) leverages nature’s problem-solving processes to generate optimized genotypes that are currently inaccessible to rational methods. The large amount of public ALE data now represents a new opportunity for data-driven strain design. This study presents a first-of-its-kind meta-analysis workflow that derives data-driven strain designs from aggregated ALE mutational data using rich mutation annotations together with statistical and structural biology methods. The mutational dataset consolidated and used in this study contained 63 Escherichia coli K-12 MG1655-based ALE experiments, described by 93 unique environmental conditions, 357 independent evolutions, and 13,957 observed mutations. High-level trends established across the entire dataset revealed that ALE-derived strain designs will largely be gene-centric rather than non-coding, and that a relatively small number of variants (approximately 4) can significantly alter cellular states and provide benefits that range from increased fitness to being strictly necessary for survival. Three novel, experimentally validated designs relevant to metabolic engineering applications are presented as use cases for the workflow. Specifically, these designs increased growth rates with glycerol as the carbon source (a point mutation in glpK and a truncation of cyaA) or increased tolerance to toxic levels of isobutyric acid (a truncation of pykF). These results demonstrate how strain designs can be extracted from aggregated ALE data to enhance strain design efforts.
### Competing Interest Statement
The authors have declared no competing interest.
### Abbreviations
ALE
: adaptive laboratory evolution
SNP
: single nucleotide polymorphism
DEL
: deletion
MOB
: mobile insertion elements
INS
: insertion
SUB
: substitution
AMP
: amplification
TFBS
: transcription factor binding site
RBS
: ribosomal binding site
SV
: structural variant
SIFT
: Sorting Intolerant from Tolerant
PykF
: pyruvate kinase I
GlpK
: glycerol kinase
CyaA
: adenylate cyclase
Crr
: PTS system glucose-specific EIIA component
PTS
: phosphotransferase system
EIIA
: enzyme IIA
CCR
: carbon catabolite repression
cAMP-CRP
: activated CRP complex
CRP
: cAMP receptor protein
cAMP
: cyclic AMP
ΔΔG
: predicted difference in the free energy of unfolding of the protein structure before and after the variant is introduced
### Keywords
strain design, evolution, data-driven