A Comparison of Imputation Methods for Missing Risk Factor Data from Large Real-world Electronic Medical Records for Comparative Effectiveness Studies (Preprint)

semanticscholar(2018)

Abstract
Background: Evaluation of appropriate methodologies for imputing missing risk factor or outcome data from electronic medical records (EMRs) is crucial but lacking for comparative effectiveness studies. Robust imputation of missing data relies on understanding the predictors of missingness in the risk factor data, especially in patients with chronic diseases. These two aspects have not been explored simultaneously to support methodological developments in clinical epidemiological studies with real-world data. Methods: Using disease-biomarker data (glycated haemoglobin, HbA1c) from a large EMR database of patients with diabetes, exploratory analyses were conducted to ascertain the possible predictors of missingness. Three approaches based on the multiple imputation (MI) technique, namely two-fold MI, MI by chained equations, and MI with Markov chain Monte Carlo, were evaluated in terms of their robustness in imputing missing data. The value of using imputed data for drawing robust inferences on the comparative effectiveness of two anti-diabetes therapies was compared with complete-case analyses. Results: Older patients and patients with higher disease severity were less likely to have missing HbA1c data longitudinally over 12 months, while gender and pre-existing comorbidities were not associated with the likelihood of missingness. No significant differences were observed in the distributions of follow-up data imputed with the three methods. Conclusion: While complete-case analyses were prone to bias by indication, the three MI techniques, applied to a large proportion of missing primary outcome data under unknown patterns of missingness, appeared to be valid and able to provide consistent and reliable clinical inferences.
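
All three MI approaches named in the abstract produce several completed datasets, each analysed separately before the results are combined. The abstract does not say how the estimates were combined; the sketch below assumes the standard combining rules (Rubin's rules), and the function name `pool_rubin` and the numbers are hypothetical, chosen only for illustration.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool per-imputation estimates and their squared standard errors.

    Assumes the standard combining rules for multiple imputation:
    total variance = within + (1 + 1/m) * between.
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations with matching variances")
    pooled_estimate = mean(estimates)   # average of the m point estimates
    within = mean(variances)            # average within-imputation variance
    between = variance(estimates)       # between-imputation variance (m - 1 denominator)
    total_variance = within + (1 + 1 / m) * between
    return pooled_estimate, total_variance


# Hypothetical treatment-effect estimates (e.g. difference in HbA1c change)
# from m = 3 imputed datasets, with their squared standard errors.
estimates = [-0.40, -0.50, -0.60]
variances = [0.04, 0.04, 0.04]
pooled, total_var = pool_rubin(estimates, variances)
```

The between-imputation term is what a complete-case or single-imputation analysis leaves out: it carries the extra uncertainty due to the missing values into the pooled standard error.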