A Learned Cost Model for Big Data Query Processing

Yan Li, Liwei Wang, Sheng Wang, Yuan Sun, Bolong Zheng, Zhiyong Peng

Information Sciences (2024)

Abstract

The efficiency of query processing in the Spark SQL big data processing engine is significantly affected by execution plans and allocated resources. However, existing cost models for Spark SQL rely on hand-crafted rules. While learning-based cost models have been proposed for relational databases, they do not consider available resources. To address this issue, we propose a resource-aware deep learning model capable of automatically predicting query plan execution times based on historical data. To train our model, we embed query execution plans within a query plan tree and extract features from allocated resources. An adaptive attention mechanism is integrated into the deep learning model to enhance prediction accuracy. Additionally, we extract sufficient features to represent data information and learn the effect of the data on query execution. This approach reduces the need for model retraining when the data changes. The experimental results demonstrate that our deep cost model outperforms traditional rule-based methods and learning-based optimizers designed for relational databases in predicting query plan execution times.

Keywords

Cost model, Big Data, Deep learning, Query optimization
