Unity ECC: Unified Memory Protection Against Bit and Chip Errors

SC '23: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis(2023)

引用 0|浏览5
暂无评分
摘要
DRAM vendors utilize On-Die Error Correction Codes (OD-ECC) to correct random bit errors internally. Meanwhile, system companies utilize Rank-Level ECC (RL-ECC) to protect data against chip errors. Separate protection increases the redundancy ratio to 32.8% in DDR5 and incurs significant performance penalties. This paper proposes a novel RL-ECC, Unity ECC , that can correct both singlechip and double-bit error patterns. Unity ECC corrects doublebit errors using unused syndromes of single-chip correction. Our evaluation shows that Unity ECC without OD-ECC can provide the same reliability level as Chipkill RL-ECC with OD-ECC. Moreover, it can significantly improve system performance and reduce DRAM energy and area by eliminating OD-ECC.
更多
查看译文
关键词
DRAM,reliability,ECC,on-die ECC
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要