Efficient Evaluation of Scheduling Metrics Using Emulation: A Case Study in the Effect of Artefacts

47th International Conference on Parallel Processing (ICPP '18), 2018

Abstract
Scheduling algorithms have a significant impact on the optimal utilization of HPC facilities. Waiting time, response time, slowdown, and weighted slowdown are classical metrics used to compare the performance of different scheduling algorithms. This paper investigates the effects of four artefacts, namely non-determinism, shuffling, time shrinking, and sampling, on these metrics. We present a scheduling framework based on emulation, that is, using a real scheduler (Slurm) with a sleep program able to take into account periods of suspension. The framework is able to emulate a 50K-core cluster using 10 virtualized nodes, with the scheduler running on an isolated node. We find that the non-determinism in repeatedly running a workload has a small but discernible effect on these metrics, and that shuffling job order in a workload increases this effect by a factor of 5-10. Experiments with shuffled workloads indicate that the average difference between the performance of the Backfill and Suspend-Resume strategies is within this variation. We also propose methodologies for time shrinking and sampling to decrease the duration of emulations, while aiming to keep these metrics invariant (or linearly variant) with respect to the original workload. We find that time shrinking by a factor of up to 90% can have an effect on the metrics similar to that of non-determinism. For sampling, our methodology preserved the distribution of job sizes to a high extent, but produced a variation in the metrics somewhat greater than that of shuffling. Finally, we use our framework to study Slurm's scheduling performance in depth, and discover a deficiency in the Suspend-Resume implementation.
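The abstract names the four classical metrics but does not define them; a minimal sketch, assuming the standard definitions (waiting time = start − submit, response time = finish − submit, slowdown = response time / run time, weighted slowdown weighting each job's slowdown by its core count), might compute per-workload averages as follows. The `Job` record and field names are illustrative, not taken from the paper's framework.

```python
from dataclasses import dataclass

@dataclass
class Job:
    submit: float  # submission time
    start: float   # time the scheduler started the job
    finish: float  # completion time
    cores: int     # job size in cores (the assumed weight for weighted slowdown)

def workload_metrics(jobs):
    """Return average (waiting time, response time, slowdown, weighted slowdown)."""
    n = len(jobs)
    wait = sum(j.start - j.submit for j in jobs) / n
    resp = sum(j.finish - j.submit for j in jobs) / n
    # Slowdown: how much longer a job took end-to-end than its pure run time.
    slow = sum((j.finish - j.submit) / (j.finish - j.start) for j in jobs) / n
    # Weighted slowdown: core-weighted mean of per-job slowdowns.
    total_cores = sum(j.cores for j in jobs)
    wslow = sum(j.cores * (j.finish - j.submit) / (j.finish - j.start)
                for j in jobs) / total_cores
    return wait, resp, slow, wslow

# Example: one job waits 10s then runs 10s; another runs 30s with no wait.
jobs = [Job(submit=0, start=10, finish=20, cores=4),
        Job(submit=0, start=0, finish=30, cores=2)]
print(workload_metrics(jobs))  # → (5.0, 25.0, 1.5, 1.666...)
```

Comparing these averages across repeated or shuffled runs of the same workload is how the variation the paper attributes to non-determinism and shuffling would be quantified.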
Keywords
parallel job scheduling,classical scheduling metrics,emulation,Slurm