Applying Machine Learning on UK Biobank biomarker data empowers case-control discovery yield

medrxiv(2023)

引用 0|浏览20
暂无评分
摘要
Missing or inaccurate diagnoses in biobank datasets can reduce the power of human genetic association studies. We present a machine-learning framework (MILTON) that utilizes the wealth of phenotypic information available in a biobank dataset to identify undiagnosed individuals within the cohort who have biomarker profiles similar to those of positively diagnosed cases. We applied MILTON to perform an augmented phenome-wide association study (PheWAS) based on 405,703 whole exome sequencing samples from UK Biobank, resulting in improved signals for known (p<1×10−8) gene-disease relationships alongside 206 novel gene-disease relationships that only achieved genome-wide significance upon using MILTON. To further validate these putatively novel discoveries, we adopt two orthogonal machine learning methods that prioritise gene-disease relationships using comprehensive publicly available datasets alongside a biological insights knowledge graph. For additional clinical translation utility, MILTON outputs a disease-specific biomarker set per disease as well as comorbidity clusters across ICD10 disease codes based on shared biomarker profiles of positively labelled cases. All the extracted associations and biomarker importance results for the 3,308 studied binary traits will be made available via an interactive web-portal. ### Competing Interest Statement The authors are employees and/or stockholders of AstraZeneca UK Ltd. ### Funding Statement This study did not receive any funding ### Author Declarations I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained. Yes The details of the IRB/oversight body that provided approval or exemption for the research described are given below: The details of the IRB/oversight body that provided approval or exemption for the research described are given below: UK Biobank data use (Project Application Numbers 68601 and 26041) was approved by the UK Biobank according to their established access procedures. UK Biobank has approval from the North West Multi-centre Research Ethics Committee (MREC) as a Research Tissue Bank (RTB), and as such researchers using UK Biobank data do not require separate ethical clearance and can operate under the RTB approval. I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals. Yes I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance). Yes I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable. Yes
更多
查看译文
关键词
biomarker,machine learning,discovery,data,case-control
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要