Harmony: A Scheduling Framework Optimized for Multiple Distributed Machine Learning Jobs

2021 IEEE 41st International Conference on Distributed Computing Systems (ICDCS), 2021

Abstract
We introduce Harmony, a new scheduling framework that executes multiple Parameter-Server ML training jobs together to improve cluster resource utilization. Harmony coordinates a fine-grained execution of co-located jobs with complementary resource usages to avoid contention and to efficiently share resources between the jobs. To resolve the memory pressure due to the increased number of simultaneo...
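The abstract's central idea is to co-locate jobs whose resource usage is complementary, so that they share a machine without contending. A toy pairing heuristic can illustrate this. It is a minimal sketch under stated assumptions, not Harmony's actual scheduling algorithm. Every element of it is a hypothetical choice made for illustration:

- the `JobProfile` type;
- the two-resource model, in which each job's compute and network usage are expressed as fractions of an iteration;
- the capacity threshold of 1.0;
- the greedy two-pointer pairing.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class JobProfile:
    """Hypothetical per-job profile: fraction of each iteration spent on each resource."""
    name: str
    compute: float  # fraction of iteration time using CPU/GPU compute (0..1)
    network: float  # fraction of iteration time using the network, e.g. PS push/pull (0..1)


def is_complementary(a: JobProfile, b: JobProfile, capacity: float = 1.0) -> bool:
    """Treat two jobs as co-locatable if their combined demand on every resource fits in capacity."""
    return a.compute + b.compute <= capacity and a.network + b.network <= capacity


def pair_jobs(jobs: List[JobProfile]) -> Tuple[List[Tuple[JobProfile, JobProfile]], List[JobProfile]]:
    """Greedy two-pointer pairing (illustrative heuristic, not from the paper).

    Jobs are sorted by compute fraction; the most compute-heavy remaining job is
    tried against the least compute-heavy one. Returns (pairs, jobs run alone).
    """
    ordered = sorted(jobs, key=lambda j: j.compute)
    lo, hi = 0, len(ordered) - 1
    pairs: List[Tuple[JobProfile, JobProfile]] = []
    solo: List[JobProfile] = []
    while lo < hi:
        light, heavy = ordered[lo], ordered[hi]
        if is_complementary(light, heavy):
            pairs.append((heavy, light))
            lo += 1
            hi -= 1
        else:
            # Greedy choice: run the heavy job alone rather than search further.
            solo.append(heavy)
            hi -= 1
    if lo == hi:
        solo.append(ordered[lo])  # odd job out runs alone
    return pairs, solo


if __name__ == "__main__":
    jobs = [
        JobProfile("A", 0.7, 0.2),  # compute-heavy
        JobProfile("B", 0.3, 0.7),  # communication-heavy
        JobProfile("C", 0.9, 0.5),  # heavy on both
        JobProfile("D", 0.2, 0.6),  # communication-heavy
    ]
    pairs, solo = pair_jobs(jobs)
    print([(h.name, l.name) for h, l in pairs], [j.name for j in solo])
```

On these made-up profiles, job A (compute-heavy) is paired with job D (communication-heavy). Job C is too compute-heavy to pair with anyone, and job B is left over, so both run alone. A real scheduler would also have to account for the memory pressure that the abstract mentions, along with finer-grained timing of resource use within each iteration. This toy model does neither.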
Keywords
Training,Measurement,Adaptation models,Schedules,Processor scheduling,Conferences,Machine learning