System-wide trade-off modeling of performance, power, and resilience on petascale systems

The Journal of Supercomputing (2018)

Abstract
While performance remains a major objective in high-performance computing (HPC), future systems will have to deliver the desired performance under both reliability and energy constraints. Although a number of resilience methods and power management techniques have been proposed to address reliability and energy concerns, the trade-offs among performance, power, and resilience are not well understood, especially in HPC systems of unprecedented scale and complexity. In this work, we present a co-modeling mechanism named TOPPER (system-wide Trade-Off modeling for Performance, PowEr, and Resilience). TOPPER is built with colored Petri nets, which allow us to capture the dynamic, complicated interactions and dependencies among different factors, such as workload characteristics, hardware reliability, and runtime system operation, on a petascale machine. Using system traces collected from a production supercomputer, we conducted a series of experiments to analyze various resilience methods, power capping techniques, and job characteristics in terms of system-wide performance and energy consumption. Our results provide insights into the performance–power–resilience trade-offs on HPC systems.
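The abstract does not describe TOPPER's actual net structure, so the following is only a minimal, hypothetical sketch of what "colored" means in a colored Petri net: tokens carry data (here a job's ID and node count, or a pool of free nodes and remaining power budget), and transitions fire only when a guard over the bound tokens holds. The place names, the `Job`/`Resources` token types, `WATTS_PER_NODE`, and the first-fit firing order are all illustrative assumptions. The net is untimed and has no failure or checkpoint transitions, which a resilience model would add in the same style.

```python
from dataclasses import dataclass, field
from itertools import product
from typing import Any, Callable, Dict, List, NamedTuple, Optional, Tuple

WATTS_PER_NODE = 300  # hypothetical per-node power draw, not from the paper


class Job(NamedTuple):  # colored token: a job and the nodes it requests
    job_id: int
    nodes: int


class Resources(NamedTuple):  # colored token: free nodes and power headroom
    nodes: int
    watts: int


@dataclass
class Transition:
    name: str
    inputs: Tuple[str, ...]   # consume one token from each input place
    outputs: Tuple[str, ...]  # produce one token into each output place
    guard: Callable[..., bool]
    action: Callable[..., Tuple[Any, ...]]


@dataclass
class ColoredPetriNet:
    marking: Dict[str, List[Any]]  # place name -> multiset of tokens
    transitions: List[Transition]
    log: List[Tuple[str, Any]] = field(default_factory=list)

    def find_binding(self, t: Transition) -> Optional[Tuple[Any, ...]]:
        # Try every combination of tokens from the input places (first fit).
        pools = [self.marking.get(p, []) for p in t.inputs]
        for binding in product(*pools):
            if t.guard(*binding):
                return binding
        return None

    def step(self) -> bool:
        # Fire the first enabled transition, in list order; False if none.
        for t in self.transitions:
            binding = self.find_binding(t)
            if binding is None:
                continue
            for place, token in zip(t.inputs, binding):
                self.marking[place].remove(token)
            for place, token in zip(t.outputs, t.action(*binding)):
                self.marking.setdefault(place, []).append(token)
            self.log.append((t.name, binding[0]))
            return True
        return False

    def run(self, max_steps: int = 1000) -> None:
        for _ in range(max_steps):
            if not self.step():
                break


def build_net(total_nodes: int, power_cap_watts: int, jobs: List[Job]) -> ColoredPetriNet:
    # "start" is guarded by both node availability and the power cap.
    start = Transition(
        "start", ("queue", "resources"), ("running", "resources"),
        guard=lambda job, r: job.nodes <= r.nodes and job.nodes * WATTS_PER_NODE <= r.watts,
        action=lambda job, r: (job, Resources(r.nodes - job.nodes, r.watts - job.nodes * WATTS_PER_NODE)),
    )
    # "finish" releases the job's nodes and power back to the pool.
    finish = Transition(
        "finish", ("running", "resources"), ("done", "resources"),
        guard=lambda job, r: True,
        action=lambda job, r: (job, Resources(r.nodes + job.nodes, r.watts + job.nodes * WATTS_PER_NODE)),
    )
    marking = {"queue": list(jobs), "resources": [Resources(total_nodes, power_cap_watts)],
               "running": [], "done": []}
    return ColoredPetriNet(marking, [start, finish])


if __name__ == "__main__":
    net = build_net(8, 1800, [Job(1, 4), Job(2, 4), Job(3, 2)])
    net.run()
    print([(name, job.job_id) for name, job in net.log])
```

With 8 nodes and a 1800 W cap, job 2 cannot start alongside job 1 even though enough nodes are free, so it waits until job 1 finishes; raising the cap to 2400 W lets it start immediately. This is a toy version of the kind of performance–power interaction the abstract says a colored Petri net can capture.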
Keywords
Performance–power–resilience modeling, Trade-off analysis, Petaflop systems, Colored Petri nets