GEMBS: high through-put processing pipeline for DNA methylation data from Whole Genome Bisulfite Sequencing (WGBS)

bioRxiv (2017)

Abstract
DNA methylation is essential for normal embryogenesis and development in mammals. Currently, whole genome sequencing of bisulfite-converted DNA (WGBS) represents the gold standard for studying DNA methylation at the genomic level. Unlike other techniques, it provides an unbiased view of the entire genome at single-base-pair resolution. In practice, however, due to its (until recently) comparatively high cost, its application to the analysis of large data sets (i.e. > 50 samples) has lagged behind other, more cost-efficient platforms, such as the Illumina microarrays (Infinium 27K, 450k and EPIC). Consequently, despite the variety of software tools that exist for the analysis of WGBS, processing large data sets remains cumbersome. We present GEMBS, a bioinformatics pipeline specifically designed for the analysis of large WGBS data sets. GEMBS is based on two core modules: GEM3, a high-performance read aligner, and BScall, a variant caller designed specifically for bisulfite sequencing data. Both components are embedded in a highly parallel workflow that enables efficient and reliable execution in an HPC environment. In this study, we benchmark GEMBS performance against other common analysis tools and show how GEMBS can be used for accurate variant calling from WGBS data.