Inverted genomic regions between reference genome builds in humans impact imputation accuracy and decrease the power of association testing

bioRxiv (2023)

Abstract
Over the last two decades, the human reference genome has undergone multiple updates as we complete a linear representation of our genome. Two versions of the human reference are currently used in the biomedical literature, GRCh37/hg19 and GRCh38. Conversions between these versions are critical for quality control, imputation, and association analysis. In the present study, we show that single-nucleotide variants (SNVs) in regions inverted between different builds of the reference genome are often mishandled bioinformatically. Depending on the array type, SNVs are found in approximately 2-5 Mb of the genome that are inverted between reference builds. Coordinate conversions of these variants are mishandled by both the TOPMed imputation server and routine in-house quality control pipelines, leading to underrecognized downstream analytical consequences. Specifically, we observe that undetected allelic conversion errors for palindromic (i.e., A/T or C/G) variants in these inverted regions would destabilize the local haplotype structure, leading to loss of imputation accuracy and power in association analyses. Though only a small proportion of the genome is affected, these regions include important disease susceptibility variants. For example, the p value of a known locus associated with prostate cancer on chromosome 10 (chr10) would drop from 2.86 × 10⁻⁷ to 0.0011 in a case-control analysis of 20,286 Africans and African Americans (10,643 cases and 9,643 controls). We devise a straightforward heuristic based on the popular tool liftOver that can easily detect and correct these variants in the inverted regions between genome builds to locally improve imputation accuracy.
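The abstract does not spell out the heuristic, so the following is only a minimal illustrative sketch of why palindromic variants are the hard case, not the authors' method. It assumes that each variant already carries the strand ("+" or "-") of its alignment to the target build, as a liftOver-style conversion could report. The function names and example alleles are hypothetical, and only single-base alleles are handled.

```python
# Base complements for single-nucleotide alleles.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def is_palindromic(allele1: str, allele2: str) -> bool:
    """True for A/T or C/G variants, whose two alleles are complements."""
    return COMPLEMENT[allele1] == allele2


def convert_alleles(ref: str, alt: str, strand: str) -> tuple[str, str]:
    """Complement both alleles when the region maps to the minus strand
    of the target build (i.e. the region is inverted between builds)."""
    if strand == "-":
        return COMPLEMENT[ref], COMPLEMENT[alt]
    return ref, alt


def needs_attention(ref: str, alt: str, strand: str) -> bool:
    """Flag palindromic variants in inverted regions. After complementing,
    the allele pair contains the same two letters, so a skipped flip cannot
    be caught by comparing allele sets afterwards."""
    return strand == "-" and is_palindromic(ref, alt)


if __name__ == "__main__":
    # Hypothetical variants: (ref, alt, strand in target build)
    for ref, alt, strand in [("A", "G", "-"), ("A", "T", "-"), ("C", "G", "+")]:
        print(ref, alt, strand,
              convert_alleles(ref, alt, strand),
              needs_attention(ref, alt, strand))
```

For a non-palindromic variant such as A/G, a missed flip is visible because the converted pair (T/C) differs from the original. For A/T the pair is still A/T with the roles swapped, which is the kind of silent allelic error the abstract describes.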
Keywords
genome build,bioinformatics,reference genome,genetic associations,imputation