Meteor

Future Generation Computer Systems (2019)

Abstract
Due to its speed and ease of use, Spark has become a popular tool among data scientists for analyzing data of various sizes. Counter-intuitively, data processing workloads at companies such as Google, Facebook, and Yahoo are dominated by short-running applications, because most applications consist mainly of simple SQL-like queries (Dean, 2004; Zaharia et al., 2008). Unfortunately, the current version of Spark is not optimized for this kind of workload. In this paper, we propose a novel framework, called Meteor, which can dramatically improve the performance of short-running applications. We extend Spark with three additional operating modes: one-thread, one-container, and distributed. The one-thread mode executes all tasks on a single thread; the one-container mode runs the tasks in one container using multi-threading; the distributed mode allocates the tasks across the whole cluster. We also design a new framework for submitting applications, which uses a fine-grained Spark performance model to decide which of the three modes is the most efficient to invoke when a new application is submitted. Our extensive experiments on Amazon EC2 show that the one-thread mode is the optimal choice when the input size is small; otherwise, the distributed mode is better. Overall, Meteor is up to 2 times faster than the original Spark for short applications.

• Design a new scheduler which takes data locality and resource usage into account when allocating containers.
• Design a one-thread mode for Spark, where all tasks of a short application are computed on just one thread.
• Design a one-container mode which deploys one container but with multiple virtual cores.
• Design a submitter framework with a reasonable number of AM containers to quickly bootstrap short applications.
• Design a profiler for Spark which uses bytecode instrumentation.
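
The mode-selection step described in the abstract can be illustrated with a small sketch. This is not Meteor's actual performance model, which the abstract does not specify: the cost formulas, the `ClusterProfile` fields, and every numeric default below are hypothetical placeholders chosen only to show the idea of estimating a runtime per mode and picking the smallest.

```python
import math
from dataclasses import dataclass

MODES = ("one-thread", "one-container", "distributed")


@dataclass(frozen=True)
class ClusterProfile:
    # All defaults are illustrative assumptions, not values from the paper.
    compute_seconds_per_mb: float = 0.02      # per-MB processing cost
    split_mb: float = 64.0                    # input handled by one task
    cores_per_container: int = 4              # threads in one container
    cluster_containers: int = 8               # containers in the cluster
    container_startup_seconds: float = 2.0    # cost to launch one container
    distributed_startup_seconds: float = 4.0  # cost to launch across cluster


def estimate_seconds(mode: str, input_mb: float, p: ClusterProfile) -> float:
    """Toy runtime estimate for one mode (hypothetical cost model)."""
    tasks = max(1, math.ceil(input_mb / p.split_mb))
    total_work = input_mb * p.compute_seconds_per_mb
    per_task = total_work / tasks
    if mode == "one-thread":
        # Assumed to run tasks sequentially with no container launch cost.
        return total_work
    if mode == "one-container":
        waves = math.ceil(tasks / p.cores_per_container)
        return p.container_startup_seconds + waves * per_task
    if mode == "distributed":
        slots = p.cores_per_container * p.cluster_containers
        waves = math.ceil(tasks / slots)
        return p.distributed_startup_seconds + waves * per_task
    raise ValueError(f"unknown mode: {mode}")


def choose_mode(input_mb: float, p: ClusterProfile = ClusterProfile()) -> str:
    """Return the mode with the smallest estimated runtime."""
    return min(MODES, key=lambda m: estimate_seconds(m, input_mb, p))
```

With these made-up parameters, `choose_mode(10)` selects the one-thread mode and `choose_mode(10000)` selects the distributed mode, which mirrors the qualitative trend reported in the abstract. The crossover point depends entirely on the assumed constants.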
Keywords
Spark, Short Application, Scheduling, Time-critical, Resource-sensitive