Distance-based linkage of personal microbiome records for identification and its privacy implications

Computers & Security (2024)

Abstract
Owing to its high potential for analysis in clinical settings, research on the human microbiome has been flourishing for several years. As an increasing amount of microbiome data is gathered and stored, analysing the temporal and individual stability of microbiome readings, and the resulting privacy risks, has gained importance. In 2015, Franzosa et al. demonstrated the feasibility of matching and linking individuals in microbiome-based datasets from the Human Microbiome Project, which could lead to re-identification of individuals and thus has privacy implications for microbiome study designs. Their technique is based on the construction of body site-specific metagenomic codes that maintain a certain stability over time. In this paper, we establish a distance-based technique for personal microbiome identification, which is combined with a solution for avoiding spurious, false positive matches. In a direct comparison with the approach from Franzosa et al., which assumes that information is available as microbial records, rather than at the more detailed (but less likely to be shared) nucleic acid level, our method improves upon the identification results on most of the considered datasets. Our main finding is a 30% increase in the average percentage of true positive identifications on the widely studied microbiome of the gastrointestinal tract. While we particularly recommend our method for application to the gut microbiome, we also observed substantial identification success on other body sites. Our results demonstrate the potential for privacy threats in microbiome data gathering, storage, sharing, and analysis, and thus underline the need for solutions that protect the microbiome as personal and sensitive medical data. We also show that the method is robust to various hyper-parameter settings.
Based on our observations, we further identify challenges in personal microbiome identification research, specifically the scarcity of benchmark data and associated data analysis tasks. Drawing on our experience, we propose solutions for a more systematic and comparable evaluation that also considers the costs entailed by applying privacy-preserving methods.
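As a rough illustration of what distance-based linkage with rejection of spurious matches can look like, the following minimal sketch links a query record to the closest reference record and declines to answer when the match is ambiguous. It is not the method from the paper: the abstract names neither the distance measure nor the rejection rule, so the Bray-Curtis dissimilarity, the nearest-to-runner-up ratio test, the `max_ratio` value, the function names, and the toy abundance values are all assumptions made for this example.

```python
from typing import Dict, Optional

Profile = Dict[str, float]  # taxon name -> relative abundance (assumed record format)


def bray_curtis(a: Profile, b: Profile) -> float:
    """Bray-Curtis dissimilarity between two profiles (0 = identical, 1 = disjoint)."""
    taxa = set(a) | set(b)
    total = sum(a.get(t, 0.0) + b.get(t, 0.0) for t in taxa)
    if total == 0:
        return 0.0
    diff = sum(abs(a.get(t, 0.0) - b.get(t, 0.0)) for t in taxa)
    return diff / total


def link_record(query: Profile, references: Dict[str, Profile],
                max_ratio: float = 0.8) -> Optional[str]:
    """Return the ID of the closest reference, or None if the match is ambiguous.

    Hypothetical rejection rule (an assumption, not taken from the paper):
    accept only if the best distance is clearly smaller than the runner-up.
    """
    ranked = sorted((bray_curtis(query, ref), rid) for rid, ref in references.items())
    if len(ranked) < 2:
        return None  # the ratio test needs a runner-up to compare against
    (best_dist, best_id), (second_dist, _) = ranked[0], ranked[1]
    if best_dist < max_ratio * second_dist:
        return best_id
    return None


# Toy data, invented for illustration only.
references = {
    "subject_A": {"Bacteroides": 0.6, "Prevotella": 0.1, "Faecalibacterium": 0.3},
    "subject_B": {"Bacteroides": 0.1, "Prevotella": 0.7, "Faecalibacterium": 0.2},
}
query_clear = {"Bacteroides": 0.55, "Prevotella": 0.15, "Faecalibacterium": 0.30}
query_ambiguous = {"Bacteroides": 0.35, "Prevotella": 0.40, "Faecalibacterium": 0.25}

print(link_record(query_clear, references))
print(link_record(query_ambiguous, references))
```

In this toy setting the first query lies far closer to `subject_A` than to `subject_B` and is linked, while the second is equidistant from both references and is left unlinked. Trading some true positives for fewer false positives in this way is one simple option; the abstract does not say how the paper's own safeguard works.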
Keywords
Human microbiome, Data privacy, Re-identification, Record linkage, Mitigation