JUGGLER: Autonomous Cost Optimization and Performance Prediction of Big Data Applications

PROCEEDINGS OF THE 2022 INTERNATIONAL CONFERENCE ON MANAGEMENT OF DATA (SIGMOD '22) (2022)

Abstract
Distributed in-memory processing frameworks accelerate iterative workloads by caching suitable datasets in memory rather than recomputing them in each iteration. Selecting appropriate datasets to cache and allocating a suitable cluster configuration for caching them both play a crucial role in achieving optimal performance. In practice, both are tedious, time-consuming tasks that are often neglected by end users, who are typically not aware of workload semantics, sizes of intermediate data, and cluster specifications. To address these problems, we present JUGGLER, an end-to-end framework that autonomously selects appropriate datasets for caching and recommends a correspondingly suitable cluster configuration to end users, with the aim of achieving optimal execution time and cost. We evaluate JUGGLER on various iterative, real-world machine learning applications. Compared with our baseline, JUGGLER reduces execution time to 25.1% and cost to 58.1%, on average, as a result of selecting suitable datasets for caching. It recommends the optimal cluster configuration in 50% of cases and a near-optimal configuration in the remaining cases. Moreover, JUGGLER achieves an average performance prediction accuracy of 90%.
Keywords
database caching, cluster configuration, performance prediction