谷歌浏览器插件
订阅小程序
在清言上使用

DRAM Errors and Cosmic Rays: Space Invaders or Science Fiction?

CoRR(2024)

引用 0|浏览1
暂无评分
摘要
It is widely accepted that cosmic rays are a plausible cause of DRAM errors in high-performance computing (HPC) systems, and various studies suggest that they could explain some aspects of the observed DRAM error behavior. However, this phenomenon is insufficiently studied in production environments. We analyze the correlations between cosmic rays and DRAM errors on two HPC clusters: a production supercomputer with server-class DDR3-1600 and a prototype with LPDDR3-1600 and no hardware error correction. Our error logs cover 2000 billion MB-hours for the MareNostrum 3 supercomputer and 135 million MB-hours for the Mont-Blanc prototype. Our analysis combines quantitative analysis, formal statistical methods and machine learning. We detect no indications that cosmic rays have any influence on the DRAM errors. To understand whether the findings are specific to systems under study, located at 100 meters above the sea level, the analysis should be repeated on other HPC clusters, especially the ones located on higher altitudes. Also, analysis can (and should) be applied to revisit and extend numerous previous studies which use cosmic rays as a hypothetical explanation for some aspects of the observed DRAM error behaviors.
更多
查看译文
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要