A Model of Job Parallelism for Latency Reduction in Large-Scale Systems

arXiv (2022)

Abstract
Processing computation-intensive jobs at multiple processing cores in parallel is essential in many real-world applications. In this paper, we consider an idealised model of job parallelism in which a job can be served simultaneously by $d$ distinct servers. The job is complete when the total amount of work done on it by the $d$ servers equals its size. We study the effect of parallelism on the average delay of jobs. Specifically, we analyze a system of $n$ parallel processor-sharing servers in which jobs arrive according to a Poisson process of rate $n \lambda$ ($\lambda < 1$) and each job brings an exponentially distributed amount of work with unit mean. Upon arrival, a job selects $d$ servers uniformly at random and joins all the chosen servers simultaneously. We show by a mean-field analysis that, for fixed $d \geq 2$ and large $n$, the average occupancy of servers is $O(\log(1/(1-\lambda)))$ as $\lambda \to 1$, in comparison to the $O(1/(1-\lambda))$ average occupancy for $d = 1$. Thus, parallelism yields an exponential reduction in the response time of jobs. We make significant progress towards rigorously justifying the mean-field analysis.
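The model described in the abstract is simple enough to simulate directly, which makes the contrast between $d = 1$ and $d \geq 2$ easy to check empirically. Below is a minimal event-driven simulation sketch in Python; it is not the authors' code, and the function name simulate and its parameters (n, lam, d, horizon, seed) are illustrative assumptions. The key observation the sketch relies on is that between events every per-job service rate is constant (each processor-sharing server splits unit capacity equally among its attached jobs), so the next completion time can be computed exactly.

import random

def simulate(n=200, lam=0.9, d=2, horizon=2_000.0, seed=1):
    """Time-averaged per-server occupancy of the d-parallel model.

    Illustrative sketch of the model in the abstract: Poisson(n*lam)
    arrivals, Exp(1) job sizes, each job joins d distinct servers
    chosen uniformly at random and completes once the total work done
    on it by those d servers equals its size.
    """
    rng = random.Random(seed)
    occ = [0] * n                  # occ[s] = jobs currently attached to server s
    jobs = []                      # each job: [remaining_work, servers]
    t = 0.0
    next_arrival = rng.expovariate(n * lam)
    occ_area = 0.0                 # time-integral of total occupancy

    while t < horizon:
        # A processor-sharing server splits unit capacity equally, so a
        # job's total service rate is sum(1/occ[s]) over its d servers.
        rates = [sum(1.0 / occ[s] for s in servers) for _, servers in jobs]
        if jobs:
            dt_c, j_c = min((max(jobs[i][0], 0.0) / rates[i], i)
                            for i in range(len(jobs)))
            t_complete = t + dt_c
        else:
            t_complete, j_c = float("inf"), -1
        t_next = min(next_arrival, t_complete, horizon)

        # Advance the clock; every job accrues work at its constant rate.
        dt = t_next - t
        occ_area += sum(occ) * dt
        for i in range(len(jobs)):
            jobs[i][0] -= rates[i] * dt
        t = t_next

        if t >= horizon:
            break
        if t == next_arrival:
            # New job: Exp(1) size, joins d distinct servers chosen u.a.r.
            servers = rng.sample(range(n), d)
            for s in servers:
                occ[s] += 1
            jobs.append([rng.expovariate(1.0), servers])
            next_arrival = t + rng.expovariate(n * lam)
        else:
            # Job j_c finishes: detach it from all of its servers.
            for s in jobs[j_c][1]:
                occ[s] -= 1
            jobs.pop(j_c)

    return occ_area / (n * horizon)

if __name__ == "__main__":
    for d in (1, 2):
        print(f"d={d}: mean occupancy ~ {simulate(d=d):.2f}")

Running this with lam close to 1 (say 0.9 or 0.95) should show the mean occupancy for d = 2 growing far more slowly than the roughly $\lambda/(1-\lambda)$ occupancy of the d = 1 case, consistent with the $O(\log(1/(1-\lambda)))$ versus $O(1/(1-\lambda))$ scaling claimed above.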
Keywords
latency reduction, job parallelism, large-scale systems