Predicting the genetic ancestry of 2.6 million New York City patients using clinical data

bioRxiv (2019)

Abstract
Ancestry is an essential covariate in clinical genomics research. When genetic data are available, dimensionality reduction techniques, such as principal components analysis, are used to determine ancestry in complex populations. Unfortunately, these data are not always available in clinical and research settings. For example, electronic health records (EHRs), which are a rich source of temporal human disease data that could be used to enhance genetic studies, do not directly capture ancestry. Here, we present a novel algorithm for predicting genetic ancestry using only variables that are routinely captured in EHRs, such as self-reported race and ethnicity, and condition billing codes. Using patients who have both genetic and clinical information at Columbia University/New York-Presbyterian Irving Medical Center, we developed a pipeline that uses only clinical data to predict the genetic ancestry of all patients, more than 80% of whom identify as other or unknown. Our ancestry estimates can be used in observational studies of disease inheritance, to guide genetic cohort studies, or to explore health disparities in clinical care and outcomes.
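
The abstract does not say which model the pipeline uses. Purely as an illustration of the general idea (train on patients who have both genetic and clinical data, then predict for patients with clinical data only), here is a minimal sketch that assumes a Bernoulli naive Bayes classifier over categorical EHR features. The feature strings, the group labels, and the function names `train` and `predict` are all hypothetical and are not taken from the paper.

```python
import math
from collections import Counter, defaultdict

def train(records):
    """Fit counts from (features, label) pairs.

    features: set of categorical strings (hypothetical race/billing-code tokens)
    label: genetically derived ancestry group (hypothetical names)
    """
    label_counts = Counter()
    feature_counts = defaultdict(Counter)
    vocab = set()
    for features, label in records:
        label_counts[label] += 1
        for f in features:
            feature_counts[label][f] += 1
            vocab.add(f)
    return label_counts, feature_counts, sorted(vocab)

def predict(model, features):
    """Return the label with the highest naive Bayes log-posterior."""
    label_counts, feature_counts, vocab = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, n in sorted(label_counts.items()):
        score = math.log(n / total)  # log prior
        for f in vocab:
            # Laplace-smoothed probability that feature f is present
            p = (feature_counts[label][f] + 1) / (n + 2)
            score += math.log(p if f in features else 1 - p)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Toy, made-up training set standing in for patients with both data types.
labeled = [
    ({"race:other", "code:A"}, "group_A"),
    ({"race:black", "code:A"}, "group_A"),
    ({"race:white", "code:B"}, "group_B"),
    ({"race:unknown", "code:B"}, "group_B"),
]
model = train(labeled)

# A patient with uninformative self-reported race but an informative code.
print(predict(model, {"race:unknown", "code:A"}))
```

In this toy setting the billing-code token outweighs the uninformative race field, which is the kind of signal the abstract suggests matters when most patients report race as other or unknown.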