Boosting Cross-rack Multi-stripe Repair in Heterogeneous Erasure-coded Clusters.

ICPP (2022)

Abstract
Large-scale distributed storage systems have adopted erasure coding to guarantee high data reliability, yet inevitably at the expense of high repair costs. In practice, storage nodes are usually divided into different racks, and the data blocks in storage nodes are often organized into multiple stripes, each encoded independently by the erasure code. Because cross-rack bandwidth is scarce and heterogeneous, cross-rack network transmission dominates the overall repair cost. We argue that when erasure coding is deployed in a rack architecture, existing repair techniques are limited in several respects: they neglect the heterogeneity of cross-rack bandwidth, give little consideration to multi-stripe failures, apply no special treatment to repair link scheduling, and target only specific erasure code constructions. In this paper, we present CMRepair, an efficient Cross-rack Multi-stripe Repair technique that aims to reduce the repair time of multi-stripe failures in heterogeneous erasure-coded clusters. CMRepair carefully chooses the nodes for reading and repairing blocks and greedily searches for a near-optimal multi-stripe repair solution that reduces the cross-rack repair time while introducing only negligible computational overhead. Furthermore, it selectively schedules the execution order of cross-rack links, with the primary objective of saturating unused upload/download bandwidth and avoiding network congestion. CMRepair can also be extended to handle full-node repair and multi-failure repair, and to adapt to different erasure codes. Experiments show that CMRepair reduces the cross-rack repair time by 6.42%-62.50% and improves the repair throughput by 24.94%-53.91%.
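To make the idea of a greedy, bandwidth-aware choice of helpers concrete, the following is a minimal illustrative sketch. It is not the CMRepair algorithm, which the abstract does not specify. It assumes a simplified model in which each stripe must read a fixed number of blocks from a set of candidate racks, each rack has a known cross-rack upload bandwidth, and the repair time is bounded by the most heavily loaded upload link. The function name `greedy_helper_racks` and all parameters are hypothetical.

```python
def greedy_helper_racks(stripes, upload_bw, block_size=1.0):
    """Greedily pick helper racks for each stripe (illustrative only).

    stripes:    list of (candidate_racks, need) pairs; `need` blocks must be
                read from distinct racks in `candidate_racks` (assumed model).
    upload_bw:  dict mapping rack name -> cross-rack upload bandwidth.
    block_size: size of one block, in units consistent with upload_bw.

    Returns (choices, est_time): the racks picked per stripe and the
    estimated finish time of the most heavily loaded upload link.
    """
    load = {rack: 0 for rack in upload_bw}  # blocks each rack must upload
    choices = []
    for candidates, need in stripes:
        picked = []
        for _ in range(need):
            remaining = [r for r in candidates if r not in picked]
            # Pick the rack whose link would finish earliest after taking
            # one more block; slower links are penalised automatically.
            best = min(
                remaining,
                key=lambda r: (load[r] + 1) * block_size / upload_bw[r],
            )
            load[best] += 1
            picked.append(best)
        choices.append(picked)
    est_time = max(load[r] * block_size / upload_bw[r] for r in upload_bw)
    return choices, est_time


if __name__ == "__main__":
    bandwidth = {"A": 4.0, "B": 2.0, "C": 1.0}  # hypothetical values
    failed_stripes = [(["A", "B", "C"], 2), (["A", "C"], 1)]
    print(greedy_helper_racks(failed_stripes, bandwidth))
```

The sketch considers upload bandwidth only and ignores download limits, link ordering, and congestion, all of which the abstract names as part of CMRepair's scheduling.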
Keywords
Erasure Code, Rack Architecture, Multiple Stripes, Heterogeneous Network, Repair Time