Evaluating the Potential of Coscheduling on High-Performance Computing Systems

Job Scheduling Strategies for Parallel Processing, JSSPP 2023 (2023)

Abstract
Modern high-performance computing (HPC) system designs have converged on heavyweight nodes with growing numbers of processors. If schedulers on these systems allocate nodes in an exclusive and dedicated manner, many HPC applications and scientific workflows will be unable to fully utilize and benefit from such hardware, because at such extreme scale it is difficult for a single application to use all of the node-level resources. In this paper, we investigate the potential of moving away from dedicated node allocation and instead using intelligent coscheduling, in which multiple jobs share node-level resources, to improve node utilization and therefore job turnaround time. We design and implement a coscheduling simulator and, using traces from a high-end HPC cluster with 100K jobs and 1158 nodes, demonstrate that coscheduling can improve average turnaround times by up to 18% compared to EASY backfilling. Our results indicate that coscheduling has the potential to be a more efficient way to schedule jobs on high-end machines, in terms of both turnaround time and system and component utilization.
Keywords
coscheduling, high-performance computing