A Spark Optimizer for Adaptive, Fine-Grained Parameter Tuning

arxiv(2024)

引用 0|浏览0
暂无评分
摘要
As Spark becomes a common big data analytics platform, its growing complexity makes automatic tuning of numerous parameters critical for performance. Our work on Spark parameter tuning is particularly motivated by two recent trends: Spark's Adaptive Query Execution (AQE) based on runtime statistics, and the increasingly popular Spark cloud deployments that make cost-performance reasoning crucial for the end user. This paper presents our design of a Spark optimizer that controls all tunable parameters (collectively called a "configuration") of each query in the new AQE architecture to explore its performance benefits and, at the same time, casts the tuning problem in the theoretically sound multi-objective optimization setting to better adapt to user cost-performance preferences. To this end, we propose a novel hybrid compile-time/runtime approach to multi-granularity tuning of diverse, correlated Spark parameters, as well as a suite of modeling and optimization techniques to solve the tuning problem in the MOO setting while meeting the stringent time constraint of 1-2 seconds for cloud use. Our evaluation results using the TPC-H and TPC-DS benchmarks demonstrate the superior performance of our approach: (i) When prioritizing latency, it achieves an average of 61 respectively, under the solving time of 0.62-0.83 sec, outperforming the most competitive MOO method that reduces only 18-25 of 2.4-15 sec. (ii) When shifting preferences between latency and cost, our approach dominates the solutions from alternative methods by a wide margin, exhibiting superior adaptability to varying preferences.
更多
查看译文
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要