A practical analysis of balancing policies for rearranging data replicas in HDFS clusters

Anais do XXIII Simpósio em Sistemas Computacionais de Alto Desempenho (WSCAD 2022)(2022)

Abstract
Data replication is the main fault tolerance mechanism implemented by HDFS. The placement of replicated data across the nodes directly influences replica balancing and data locality, which are essential to ensure high reliability and data availability. The HDFS Balancer is the official solution for replica balancing through data redistribution. In this work, we conducted a practical experiment to evaluate different policies for replica rearrangement, namely datanode, blockpool, and custom. The evaluation results highlight the behavior and effectiveness of each policy. In addition, we investigated the cost of the HDFS Balancer operation and the performance and availability improvements promoted by a balanced replica distribution.