Reliability and Energy Data Analysis and Modeling for Extreme Scale Systems

semanticscholar(2014)

Abstract
Reliability and energy are two of the major concerns in the development of today’s supercomputers. To build a powerful machine that also satisfies reliability requirements and energy constraints, HPC scientists continue to seek a better understanding of system and component behaviors. Toward this end, modern systems are deployed with various monitoring and logging tools that track reliability and energy data during system operation. Because these data record how the system fails and how it consumes energy, they are valuable resources for understanding system behaviors. However, as system scale and complexity continue to grow, the process of collecting system data and extracting meaningful knowledge from the overwhelming volume of reliability and energy data faces a number of key challenges. To address these challenges, my work consists of three parts: data preprocessing, data analysis, and advanced modeling.