Amoeba: A Shape changing Storage System for Big Data.

PVLDB(2016)

引用 19|浏览103
暂无评分
摘要
Data partitioning significantly improves the query performance in distributed database systems. A large number of techniques have been proposed to efficiently partition a dataset for a given query workload. However, many modern analytic applications involve ad-hoc or exploratory analysis where users do not have a representative query workload upfront. Furthermore, workloads change over time as businesses evolve or as analysts gain better understanding of their data. Static workload-based data partitioning techniques are therefore not suitable for such settings. In this paper, we describe the demonstration of Amoeba, a distributed storage system which uses adaptive multi-attribute data partitioning to efficiently support ad-hoc as well as recurring queries. Amoeba applies a robust partitioning algorithm such that ad-hoc queries on all attributes have similar performance gains. Thereafter, Amoeba adaptively repartitions the data based on the observed query sequence, i.e., the system improves over time. All along Amoeba offers both adaptivity (i.e., adjustments according to workload changes) as well as robustness (i.e., avoiding performance spikes due to workload changes). We propose to demonstrate Amoeba on scenarios from an internet-of-things startup that tracks user driving patterns. We invite the audience to interactively fire fast ad-hoc queries, observe multi-dimensional adaptivity, and play with a robust/reactive knob in Amoeba. The web front end displays the layout changes, runtime costs, and compares it to Spark with both default and workload-aware partitioning.
更多
查看译文
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要