Compass: A Decentralized Scheduler for Latency-Sensitive ML Workflows
CoRR (2024)
Abstract
We consider ML query processing in distributed systems where GPU-enabled
workers coordinate to execute complex queries: a computing style often seen in
applications that interact with users in support of image processing and
natural language processing. In such systems, coscheduling of GPU memory
management and task placement represents a promising opportunity. We propose
Compass, a novel framework that unifies these functions to reduce job latency
while using resources efficiently, placing tasks where data dependencies will
be satisfied, collocating tasks from the same job (when this will not overload
the host or its GPU), and efficiently managing GPU memory. Comparison with
other state-of-the-art schedulers shows a significant reduction in completion
times while requiring the same amount or even fewer resources. In one case,
just half the servers were needed for processing the same workload.