Asymptotic Scheduling For Many Task Computing In Big Data Platforms

Andrei Sfrent, Florin Pop

Information Sciences: an International Journal (2015)

Cited by 86
Abstract
Due to advances in technology, the datasets processed in modern computer clusters now extend beyond the petabyte scale: the four detectors of the Large Hadron Collider at CERN produced several petabytes of data in 2011. Large-scale computing solutions are increasingly used for genome sequencing tasks in the Human Genome Project. In the context of Big Data platforms, efficient scheduling algorithms play an essential role. This paper deals with the problem of scheduling a set of jobs across a set of machines and specifically analyzes the behavior of the system at very high loads, which are characteristic of Big Data processing. We show that under certain conditions we can easily discover the best scheduling algorithm, prove its optimality, and compute its asymptotic throughput. We present a simulation infrastructure designed specifically for building and analyzing different types of scenarios. This infrastructure allows us to extract scheduling metrics for three different algorithms (the asymptotically optimal one, first-come, first-served (FCFS), and a traditional genetic-algorithm (GA)-based algorithm) in order to compare their performance. We focus on the transition period from low incoming job rates to very high load and back. Interestingly, all three algorithms perform poorly during the transition periods. Since the Asymptotically Optimal algorithm assumes an infinite number of jobs, it can be used after the transition, when the job buffers are saturated. Because the other scheduling algorithms perform better under reduced load, we combine them into a single hybrid algorithm and empirically determine the best switch point, thereby offering an asymptotic scheduling mechanism for many-task computing in Big Data processing platforms. (C) 2015 Elsevier Inc. All rights reserved.
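The hybrid idea in the abstract (use a conventional policy under reduced load, switch to a saturation-oriented policy once the job buffers fill up) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the Asymptotically Optimal algorithm, so the saturated-mode rule below ("each idle machine takes the buffered job it processes fastest") is a hypothetical stand-in, and the names `Job`, `Machine`, `rates`, `HybridScheduler`, and `switch_point` are all illustrative assumptions. FCFS is used as the low-load policy.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Job:
    job_id: int
    job_type: str


@dataclass
class Machine:
    name: str
    # Hypothetical: jobs of each type this machine completes per time unit.
    rates: dict[str, float]


class HybridScheduler:
    """Hypothetical hybrid scheduler: FCFS while the buffer is short,
    a throughput-oriented stand-in rule once it reaches switch_point."""

    def __init__(self, switch_point: int):
        # The paper determines the best switch point empirically;
        # here it is simply a parameter.
        self.switch_point = switch_point
        self.buffer: deque[Job] = deque()

    def submit(self, job: Job) -> None:
        self.buffer.append(job)

    def mode(self) -> str:
        return "saturated" if len(self.buffer) >= self.switch_point else "fcfs"

    def dispatch(self, idle_machines: list[Machine]) -> list[tuple[str, int]]:
        """Give at most one buffered job to each idle machine.
        The mode is re-evaluated before every assignment, so the
        scheduler can fall back to FCFS as the buffer drains."""
        assignments = []
        for machine in idle_machines:
            if not self.buffer:
                break
            if self.mode() == "fcfs":
                job = self.buffer.popleft()
            else:
                # Stand-in rule (assumption, not the paper's algorithm):
                # pick the job this machine processes fastest; max() keeps
                # the first maximum, so ties go to the oldest job.
                job = max(self.buffer,
                          key=lambda j: machine.rates.get(j.job_type, 0.0))
                self.buffer.remove(job)
            assignments.append((machine.name, job.job_id))
        return assignments


if __name__ == "__main__":
    m1 = Machine("m1", {"cpu": 4.0, "io": 1.0})
    m2 = Machine("m2", {"cpu": 1.0, "io": 3.0})
    sched = HybridScheduler(switch_point=3)

    # Light load: 2 buffered jobs < switch_point, so plain FCFS.
    sched.submit(Job(1, "cpu"))
    sched.submit(Job(2, "io"))
    print(sched.mode(), sched.dispatch([m2, m1]))  # fcfs [('m2', 1), ('m1', 2)]

    # Heavy load: 4 buffered jobs >= switch_point, so the stand-in rule applies.
    for job_id, job_type in [(3, "cpu"), (4, "io"), (5, "cpu"), (6, "io")]:
        sched.submit(Job(job_id, job_type))
    print(sched.mode(), sched.dispatch([m2, m1]))  # saturated [('m2', 4), ('m1', 3)]
```

Keying the switch on buffer occupancy mirrors the abstract's observation that the asymptotic policy only makes sense once the job buffers are saturated; a real study would tune `switch_point` from simulation metrics rather than fix it by hand.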
Keywords
Asymptotic scheduling, Many-task computing, Cloud computing, Big Data platforms, Simulation