A Protocol to Extract a Specific Genomic Region from a Public Whole-Genome Database and Modify Analytical Bin Length for Population Genetic Studies

Methods and Protocols (2024)

Abstract
With the advent of next-generation sequencing and the continuous reduction in sequencing costs, an increasing amount of genomic data has become available, including whole-genome, whole-exome, and targeted sequencing data. These approaches are popular not only in large-scale sequencing projects, such as the 1000 Genomes Project and UK Biobank, but also among individual researchers. Evolutionary genetic analyses, such as the dN/dS ratio and Tajima’s D, are increasingly in demand for whole-genome-level population data. These analyses are often carried out using a uniform, user-defined bin size across the genome. However, they require a genomic region to be subdivided into functional units, such as protein-coding regions, introns, and untranslated regions, and computing these genetic measures for large-scale data remains challenging. In a recent investigation, we devised a method to address this issue. The method requires four inputs: a multi-sample VCF file containing population data, a reference genome, the target regions in a BED file, and a list of the samples to be included in the analysis. Because the target regions are extracted into a new VCF file, population genetic analyses can then be restricted to those regions. Using this approach, we conducted Tajima’s D analysis on intact genes and pseudogenes, as well as on non-coding regions.
Keywords
population genetics, whole-genome sequencing, whole-exome sequencing, target capture, evolutionary studies
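
The workflow outlined in the abstract (restrict a multi-sample VCF to BED-defined regions and a chosen sample list, then compute Tajima’s D on the extracted sites) can be illustrated with a minimal pure-Python sketch. This is not the authors' pipeline: the function names `parse_bed`, `extract_alt_counts`, and `tajimas_d` are hypothetical, the reference genome input is not used, and the sketch assumes an uncompressed, tab-delimited VCF with `GT` as the first FORMAT field, biallelic sites only, and a fixed number of sampled chromosomes `n`.

```python
import math

def parse_bed(lines):
    """Return {chrom: [(start, end), ...]} from BED lines (0-based, half-open)."""
    regions = {}
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        regions.setdefault(chrom, []).append((int(start), int(end)))
    return regions

def extract_alt_counts(vcf_lines, regions, samples):
    """Yield (alt_allele_count, called_allele_count) for each biallelic
    record that falls inside a BED region, using only the listed samples."""
    cols = []
    for line in vcf_lines:
        if line.startswith("##"):
            continue
        fields = line.rstrip("\n").split("\t")
        if line.startswith("#CHROM"):
            # Sample columns start at index 9 in a VCF header line.
            cols = [i for i, name in enumerate(fields) if i >= 9 and name in samples]
            continue
        chrom, pos, alt = fields[0], int(fields[1]), fields[4]
        # VCF POS is 1-based; BED intervals are 0-based and half-open.
        inside = any(start < pos <= end for start, end in regions.get(chrom, []))
        if "," in alt or not inside:
            continue
        alleles = []
        for i in cols:
            gt = fields[i].split(":")[0].replace("|", "/").split("/")
            alleles.extend(a for a in gt if a != ".")
        yield alleles.count("1"), len(alleles)

def tajimas_d(alt_counts, n):
    """Tajima's D from per-site alternate-allele counts among n chromosomes.
    Returns None when D is undefined (n < 4 or no segregating sites)."""
    seg = [k for k in alt_counts if 0 < k < n]
    s = len(seg)
    if n < 4 or s == 0:
        return None
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    # Mean pairwise differences, summed over segregating sites.
    pi = sum(2 * k * (n - k) / (n * (n - 1)) for k in seg)
    return (pi - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))
```

In use, one would pass the open BED and VCF file handles and a set of sample names to the first two functions, keep only the sites at which every selected chromosome is called, and hand the resulting alternate-allele counts to `tajimas_d`. Because the regions come from the BED file rather than from fixed-width windows, the statistic is computed per functional unit instead of per uniform bin.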