On the Impact of Data Heterogeneity in Federated Learning Environments with Application to Healthcare Networks
CoRR(2024)
摘要
Federated Learning (FL) allows multiple privacy-sensitive applications to
leverage their dataset for a global model construction without any disclosure
of the information. One of those domains is healthcare, where groups of silos
collaborate in order to generate a global predictor with improved accuracy and
generalization. However, the inherent challenge lies in the high heterogeneity
of medical data, necessitating sophisticated techniques for assessment and
compensation. This paper presents a comprehensive exploration of the
mathematical formalization and taxonomy of heterogeneity within FL
environments, focusing on the intricacies of medical data. In particular, we
address the evaluation and comparison of the most popular FL algorithms with
respect to their ability to cope with quantity-based, feature and label
distribution-based heterogeneity. The goal is to provide a quantitative
evaluation of the impact of data heterogeneity in FL systems for healthcare
networks as well as a guideline on FL algorithm selection. Our research extends
beyond existing studies by benchmarking seven of the most common FL algorithms
against the unique challenges posed by medical data use cases. The paper
targets the prediction of the risk of stroke recurrence through a set of
tabular clinical reports collected by different federated hospital silos: data
heterogeneity frequently encountered in this scenario and its impact on FL
performance are discussed.
更多查看译文
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要