Salvaging failing and straggling queries

2022 IEEE 38th International Conference on Data Engineering (ICDE 2022)

Abstract
Interactive response times are a crucial requirement for users analyzing large amounts of data, typically stored in a relational-style data warehouse where data is partitioned across thousands of nodes for high efficiency and throughput. However, consistently providing quick responses remains a major challenge for two reasons: (1) with data distributed across thousands of nodes, it is highly likely that some nodes are unavailable or very slow during query execution, and (2) a large number of users results in high resource contention, which exacerbates the problem of slow and failing nodes. In such situations, systems typically let the query straggle or fail, resulting in higher latencies and wasted resources. In this paper, we propose a novel solution to alleviate the failure/straggling problem: use the intermediate results from the partial query execution over available data, and exploit the statistical properties of efficiently partitioned data, particularly co-hash partitioned data, to provide approximate answers along with confidence bounds. The proposed approach handles aggregate queries that involve joins, GROUP BY clauses, HAVING clauses, and a subclass of nested subqueries, covering a large portion of analytical queries. We validate our approach through extensive experiments on the TPC-H dataset and observe that even with data availability as low as 1%, the proposed solution provides answers with less than 5% error.
Keywords
approximate query processing,data partitioning,stragglers,node failures
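
The abstract's core idea, answering an aggregate query from only the partitions that responded and attaching a confidence bound, can be illustrated with a small sketch. This is not the paper's estimator. It assumes, purely for illustration, that hash partitioning makes the available partitions behave like a simple random sample of roughly equal-sized partitions, and it uses a textbook scaled-mean estimate of SUM with a finite population correction and a normal-approximation interval. The function name `estimate_sum` and its parameters are hypothetical.

```python
import math
from statistics import mean, variance


def estimate_sum(partial_sums, total_partitions, z=1.96):
    """Estimate a global SUM from per-partition sums of the available partitions.

    Assumption (not from the paper): available partitions are a simple random
    sample of `total_partitions` roughly equal-sized partitions.
    Returns (estimate, lower_bound, upper_bound) for a normal-approximation
    interval; z=1.96 corresponds to about 95% confidence.
    """
    n = len(partial_sums)
    if n < 2:
        raise ValueError("need at least two available partitions")
    if n > total_partitions:
        raise ValueError("more available partitions than total partitions")

    # Scale the mean per-partition sum up to all partitions.
    estimate = total_partitions * mean(partial_sums)

    # Finite population correction: the error vanishes when every partition is seen.
    fpc = 1.0 - n / total_partitions
    std_err = total_partitions * math.sqrt(fpc * variance(partial_sums) / n)

    return estimate, estimate - z * std_err, estimate + z * std_err


if __name__ == "__main__":
    # Three of ten partitions responded with these partial SUMs (made-up values).
    est, lo, hi = estimate_sum([10.0, 12.0, 14.0], total_partitions=10)
    print(f"estimate={est:.2f}, interval=[{lo:.2f}, {hi:.2f}]")
```

The interval narrows as more partitions become available and collapses to the exact answer when all of them respond. Joins, GROUP BY, HAVING, and nested subqueries, which the paper does cover, need more careful treatment than this single-aggregate sketch provides.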