Architectural Support for Mitigating DRAM Soft Errors in Large-Scale Supercomputers

msra(2007)

Abstract
Modern DRAM devices are built from high-density, low-voltage integrated circuits that are increasingly susceptible to external influences such as electrical noise, process variation, and natural radiation (particle-induced upsets). Errors resulting from these effects are called “soft errors” because, although they corrupt the state of a storage element, they generally do not cause permanent damage to its underlying circuitry. The rate at which these events occur is the soft error rate (SER), which has been increasing steadily as transistor geometries shrink. Large-scale supercomputers in particular, where a single system uses a high volume of memory parts, must be designed to address and tolerate otherwise crippling SERs. The Cray BlackWidow large-scale multiprocessor is one such system, in which particular attention has been paid to mitigating the effects of soft errors in main memory. In this paper, we describe several mechanisms used in the Cray BlackWidow memory system to combat soft errors and illustrate how, together, these mechanisms address a broad range of memory errors.