Cray System Monitoring Successes , Requirements , and Priorities

semanticscholar(2018)

引用 2|浏览5
暂无评分
摘要
Effective HPC system operations and utilization require unprecedented insight into system state, applications’ demands for resources, contention for shared resources, and system demands on center power and cooling. Monitoring can provide such insights when the necessary fundamental capabilities for data availability and usability are provided. In this paper, multiple Cray sites seek to motivate monitoring as a core capability in HPC design, through the presentation of success stories illustrating enhanced understanding and improved performance and/or operations as a result of monitoring and analysis. We present the utility, limitations, and gaps of the data necessary to enable the required insights. The capabilities developed to enable the case successes drive our identification and prioritization of monitoring system requirements. Ultimately, we seek to engage all HPC stakeholders to drive community and vendor progress on these priorities.
更多
查看译文
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要