Exploring Declustered Software RAID for Enhanced Reliability and Recovery Performance in HPC Storage Systems

2019 38th Symposium on Reliable Distributed Systems (SRDS), 2019

Abstract
Redundant array of independent disks (RAID) has been widely used to address the reliability and performance issues of storage systems. As the scale of modern storage systems continues to grow, disk failure becomes the norm. With ever-increasing disk capacity, RAID recovery based on disk rebuild becomes increasingly costly, causing significant performance degradation and even unavailability of storage systems. A declustered data layout enables parallel RAID reconstruction by shuffling data and parity blocks among all drives (including spares) in a RAID group. However, the reliability and performance of declustered RAID in real-world storage environments have not been thoroughly studied. Given the popularity of the ZFS file system and software RAID in production data centers, in this paper we extensively evaluate declustered RAID with regard to RAID recovery time and I/O performance on a high-performance storage platform at Los Alamos National Laboratory. Our empirical study reveals the advantages and disadvantages of declustered RAID technology. We qualitatively characterize the recovery performance of declustered RAID and compare it with that of ZFS RAIDZ under various I/O workloads and access patterns. The experimental results show that the speedup of declustered RAID over traditional RAID is sublinear in the parallelism of recovery I/O. Furthermore, we formally model and analyze the reliability of declustered RAID in terms of the mean time to data loss (MTTDL) and find that the improved recovery performance leads to higher storage reliability than traditional RAID.
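The link between recovery time and MTTDL can be illustrated with a minimal sketch. It uses the textbook approximation for a single-parity group, MTTDL ≈ MTTF² / (N(N−1)·MTTR), which is not the model developed in the paper. The function name and every numeric value below are hypothetical and chosen only for illustration. The sketch also holds the group size fixed and varies only the rebuild time, whereas a declustered layout changes how many drives share stripes as well.

```python
def mttdl_single_parity(n_disks, mttf_hours, mttr_hours):
    """Textbook approximation (not the paper's model): a single-parity group
    loses data if a second disk fails while the first is being rebuilt."""
    return mttf_hours ** 2 / (n_disks * (n_disks - 1) * mttr_hours)


# Hypothetical values, for illustration only.
N_DISKS = 10
MTTF_HOURS = 1_000_000.0
BASELINE_MTTR_HOURS = 24.0

baseline = mttdl_single_parity(N_DISKS, MTTF_HOURS, BASELINE_MTTR_HOURS)

# Shorten the rebuild window and compare against the baseline.
for speedup in (1, 2, 4, 8):
    faster = mttdl_single_parity(
        N_DISKS, MTTF_HOURS, BASELINE_MTTR_HOURS / speedup
    )
    print(f"rebuild speedup {speedup}x -> MTTDL gain {faster / baseline:.1f}x")
```

Under this simplified formula MTTDL is inversely proportional to the rebuild time, so any reduction in recovery time translates directly into a proportional reliability gain; a sublinear recovery speedup would therefore yield a correspondingly smaller gain.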
Keywords
Storage Reliability, Reliability Modeling, Software RAID, Declustered RAID, Performance Evaluation