Towards an Architecture for Management of Very Large Computing Systems
mag(2010)
Abstract
Managing very large computing systems with up to 100,000 nodes has become a very complex issue. Existing tools reach their
limits, especially for High Performance Computing (HPC) resources, because these differ slightly from other compute resources.
First, we introduce the specific HPC obstacles and the challenges we expect future resources to pose for
system management. We then propose the framework designed within the scope of the TIMaCS project (http://www.timacs.de). Assuming that such a solution has been implemented, we show how it can change administration
far beyond the current situation. This discussion is divided into a more technical part, describing how administration can be simplified
and where new capabilities in resource provisioning can be added, and a business part, in which we outline the need for
business-policy-based management and scheduling and show a possible approach to investigating these relationships. Finally, we
show what might be possible far beyond the scope of the project.
Keywords
Schedule Policy, High Performance Computing, Business Policy, Error Handling, Security Constraint