POSTER: Hardening Selective Protection across Multiple Program Inputs for HPC Applications

Principles and Practice of Parallel Programming (2022)

Abstract
With the ever-shrinking size of transistors and the increasing scale of applications, silent data corruptions (SDCs) have become a common yet serious issue in HPC applications. Selective instruction duplication (SID) is a popular fault-tolerance technique that can obtain high SDC coverage with low performance overhead, because it prioritizes the most vulnerable parts of a program for protection. However, existing studies of SID evaluate only a single program input, assuming that the error resilience of the program remains similar across inputs. As a result, SID can suffer a drastic loss of SDC coverage when the protected program runs with different inputs. We therefore propose Sentinel, an automated compiler-based framework that mitigates this loss. Evaluation results show that Sentinel can effectively mitigate the loss of SDC coverage (up to 97.00%) across multiple inputs, which significantly hardens existing SID techniques.
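To make the coverage-loss problem concrete, the following minimal sketch selects instructions for duplication from a profile of one reference input and then measures the coverage that selection gives on a different input. Everything in it is an assumption made for illustration: the instruction names, the per-instruction costs and SDC shares, and the greedy benefit-per-cost selection are hypothetical stand-ins, not Sentinel's algorithm or the selection method of any specific SID tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    """One static instruction with hypothetical profile data (all in percent)."""
    name: str
    cost: int     # overhead added if this instruction is duplicated
    sdc_ref: int  # share of SDCs it accounts for on the reference input
    sdc_new: int  # share of SDCs it accounts for on a different input


def select_for_duplication(instrs, budget):
    """Greedy stand-in for SID: pick by SDC share per unit cost within budget."""
    chosen, spent = [], 0
    ranked = sorted(instrs, key=lambda i: i.sdc_ref / i.cost, reverse=True)
    for ins in ranked:
        if spent + ins.cost <= budget:
            chosen.append(ins)
            spent += ins.cost
    return chosen


def coverage(chosen, field):
    """SDC coverage (percent) of the protected set under one input's profile."""
    return sum(getattr(ins, field) for ins in chosen)


# Hypothetical profile: vulnerability shifts between the two inputs.
PROFILE = [
    Instr("load_a", cost=10, sdc_ref=40, sdc_new=10),
    Instr("mul_b", cost=10, sdc_ref=30, sdc_new=15),
    Instr("add_c", cost=10, sdc_ref=20, sdc_new=35),
    Instr("cmp_d", cost=10, sdc_ref=10, sdc_new=40),
]

protected = select_for_duplication(PROFILE, budget=20)
ref_cov = coverage(protected, "sdc_ref")
new_cov = coverage(protected, "sdc_new")
print(f"protected: {[ins.name for ins in protected]}")
print(f"coverage on reference input: {ref_cov}%")  # 70%
print(f"coverage on different input: {new_cov}%")  # 25%
```

With these made-up numbers, a 20% overhead budget protects load_a and mul_b. That selection covers 70% of SDCs on the reference input but only 25% on the second input, because the vulnerable instructions have shifted. This is the kind of gap the abstract describes.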
Keywords
Error Resilience, Fault Injection, Compiler, High Performance Computing