Utilizing Parity Checking to Optimize Soft Error Detection Through Low-Level Reexecution

IEEE Transactions on Reliability (2023)

Abstract
Higher component density, lower voltage levels, and higher transistor counts increase programmable systems' susceptibility to transient faults. At the same time, the adoption of embedded systems in many safety-critical and mission-critical applications makes their reliability of utmost importance. Software-implemented error detection techniques can protect these systems as an alternative to less flexible and costlier hardware solutions such as redundant hardware or fully duplicated systems. One of these techniques is DETECTOR, a low-level re-execution-based technique that matches the error reduction capabilities of other state-of-the-art techniques while using only three reserved CPU registers. Other techniques, in contrast, require a large number of CPU registers to be reserved, which makes them unusable for some programs. This article presents an optimization of DETECTOR that combines parity checking with DETECTOR's re-execution mechanism. The resulting technique, called P-DETECTOR, is validated extensively on multiple data processing and I/O-driven case studies and compared to the state of the art. The results show that, compared to an unprotected system, P-DETECTOR reduces the percentage of faults resulting in a corrupted output by 93.76% for control flow errors and by 87.89% for data flow errors, outperforming DETECTOR and matching other state-of-the-art techniques such as RACFED, SWIFT, and FDSC.
Keywords
Fault tolerance, re-execution, reliability, software-implemented error detection, transient errors
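
The abstract does not spell out how P-DETECTOR pairs parity checking with re-execution, so the following is only a generic sketch of that general idea and not the authors' implementation. It assumes a 32-bit result word, a parity bit computed when the result is written, and a check when it is read back; the names `parity`, `checked_run`, and `flip_once` are hypothetical, and `flip_once` merely simulates a transient bit flip.

```python
def parity(word):
    """Return the even-parity bit (0 or 1) of a 32-bit word."""
    return bin(word & 0xFFFFFFFF).count("1") & 1


def checked_run(block, inputs, fault=None, max_retries=3):
    """Run block(inputs), verify the result's parity, re-execute on mismatch.

    `fault` is a hypothetical hook that may corrupt the stored result to
    simulate a transient fault. Returns (result, attempts_used).
    """
    for attempt in range(1, max_retries + 1):
        result = block(inputs) & 0xFFFFFFFF
        stored_parity = parity(result)        # parity recorded at write time
        if fault is not None:
            result = fault(result, attempt)   # simulated fault after the write
        if parity(result) == stored_parity:   # parity verified at read time
            return result, attempt
        # Mismatch: discard the result and re-execute from the original inputs.
    raise RuntimeError("parity mismatch persisted across all retries")


def flip_once(value, attempt):
    """Flip bit 3 on the first attempt only, mimicking a transient fault."""
    return value ^ 0b1000 if attempt == 1 else value


if __name__ == "__main__":
    print(checked_run(sum, [1, 2, 3]))                   # fault-free run
    print(checked_run(sum, [1, 2, 3], fault=flip_once))  # one re-execution
```

A single parity bit only catches an odd number of flipped bits in the protected word, which is one reason a sketch like this should not be read as a measure of the detection rates reported in the abstract.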