A Spark Optimizer for Adaptive, Fine-Grained Parameter Tuning
arXiv (2024)
Abstract
As Spark becomes a common big data analytics platform, its growing complexity
makes automatic tuning of numerous parameters critical for performance. Our
work on Spark parameter tuning is particularly motivated by two recent trends:
Spark's Adaptive Query Execution (AQE) based on runtime statistics, and the
increasingly popular Spark cloud deployments that make cost-performance
reasoning crucial for the end user. This paper presents our design of a Spark
optimizer that controls all tunable parameters (collectively called a
"configuration") of each query in the new AQE architecture to explore its
performance benefits and, at the same time, casts the tuning problem in the
theoretically sound multi-objective optimization setting to better adapt to
user cost-performance preferences.
To this end, we propose a novel hybrid compile-time/runtime approach to
multi-granularity tuning of diverse, correlated Spark parameters, as well as a
suite of modeling and optimization techniques to solve the tuning problem in
the MOO setting while meeting the stringent time constraint of 1-2 seconds for
cloud use. Our evaluation results using the TPC-H and TPC-DS benchmarks
demonstrate the superior performance of our approach: (i) When prioritizing
latency, it achieves an average of 61% [...]
respectively, under a solving time of 0.62-0.83 sec, outperforming the most
competitive MOO method, which reduces only 18-25% [...]
of 2.4-15 sec. (ii) When shifting preferences between latency and cost, our
approach dominates the solutions from alternative methods by a wide margin,
exhibiting superior adaptability to varying preferences.
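
To make the multi-objective framing concrete, the following is a minimal sketch of how a latency/cost preference could select one configuration from a set of non-dominated candidates. It is not the paper's algorithm: the candidate names and numbers are made up for illustration, and the weighted-sum selection over normalized objectives is a simple stand-in for whatever recommendation strategy the authors use.

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    name: str       # hypothetical label for a Spark configuration
    latency: float  # predicted latency in seconds (made-up value)
    cost: float     # predicted cloud cost in dollars (made-up value)


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is no worse than b on both objectives and strictly better on one."""
    return (a.latency <= b.latency and a.cost <= b.cost
            and (a.latency < b.latency or a.cost < b.cost))


def pareto_front(cands: List[Candidate]) -> List[Candidate]:
    """Keep only candidates that no other candidate dominates."""
    return [c for c in cands if not any(dominates(o, c) for o in cands)]


def pick(front: List[Candidate], w_latency: float, w_cost: float) -> Candidate:
    """Choose the candidate minimizing a weighted sum of min-max normalized objectives."""
    lat_lo, lat_hi = min(c.latency for c in front), max(c.latency for c in front)
    cost_lo, cost_hi = min(c.cost for c in front), max(c.cost for c in front)

    def norm(x: float, lo: float, hi: float) -> float:
        return 0.0 if hi == lo else (x - lo) / (hi - lo)

    return min(
        front,
        key=lambda c: w_latency * norm(c.latency, lat_lo, lat_hi)
        + w_cost * norm(c.cost, cost_lo, cost_hi),
    )


candidates = [
    Candidate("A", latency=10.0, cost=5.0),  # fast but expensive
    Candidate("B", latency=20.0, cost=2.0),  # slow but cheap
    Candidate("C", latency=15.0, cost=3.0),  # balanced
    Candidate("D", latency=25.0, cost=4.0),  # dominated by B and C
]

front = pareto_front(candidates)
latency_first = pick(front, w_latency=0.9, w_cost=0.1)
cost_first = pick(front, w_latency=0.1, w_cost=0.9)
balanced = pick(front, w_latency=0.5, w_cost=0.5)
```

With these illustrative numbers, shifting the weights moves the choice along the Pareto front: a latency-heavy preference selects the fastest candidate, a cost-heavy preference the cheapest, and an even split the balanced one. A method whose solutions "dominate" another's, in the sense used above, offers candidates that are at least as good on both objectives.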