EXPERT: EXPloiting DRAM ERror Types to Improve the Effective Forecasting Coverage in the Field

Xiangjun Peng, Zheng Huang, Alex Cantrell, Bi Hua Shu, Ke Ke Xie, Yi Li, Yu Li, Li Jiang, Qiang Xu, Ming-Chang Yang

DSN-S (2023)

Abstract
DRAM failures, which are mostly caused by DRAM uncorrectable errors (UCEs), are one of the most critical factors affecting service reliability in computing systems. Prior work demonstrates the potential of machine learning techniques for forecasting DRAM UCEs. However, these approaches do not exploit the fact that DRAM UCEs can be classified into different types. To this end, we obtain the first field dataset from a large datacenter of Alibaba Cloud with labels for the different UCE types. We then propose EXPERT, a design that exploits this information to improve the effective forecasting coverage of DRAM UCEs. Finally, we evaluate our approach against two state-of-the-art forecaster designs in the field; the results show that EXPERT improves effective coverage by up to 18.43% in terms of F1-score.
Keywords
Alibaba Cloud, computing systems, DRAM failures, DRAM UCE, DRAM uncorrectable errors, effective coverage, effective forecasting coverage, EXPERT, exploiting DRAM error types, F1-score, field dataset, forecaster designs