[Demo] Low-latency Spark Queries on Updatable Data
Proceedings of the 2019 International Conference on Management of Data (2019)
Abstract
As data science is increasingly deployed in operational applications, data science frameworks must be able to perform computations interactively, in sub-second time. Indexing and caching are two key techniques that make interactive query processing on large datasets possible. In this demo, we show the design, implementation, and performance of a new indexing abstraction in Apache Spark, called the Indexed DataFrame. This is a cached DataFrame that incorporates an index to support fast lookup and join operations, and that supports updates with multi-version concurrency. We demonstrate the Indexed DataFrame on a social network dataset, using microbenchmarks and real-world graph processing queries on data that grows continuously.
Keywords
indexing, low-latency, query, spark
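
The abstract describes a cached table that carries an index for fast lookups and joins and that accepts appends under multi-version concurrency. The toy below illustrates that idea in plain Python. It is a minimal sketch under stated assumptions, not the authors' implementation and not Spark's API: the class `IndexedTable`, its methods, and the "version equals number of visible rows" scheme are all hypothetical names and simplifications chosen for illustration.

```python
from collections import defaultdict


class IndexedTable:
    """Toy append-only table with a hash index and snapshot versions.

    Hypothetical illustration only; this is not the Indexed DataFrame API.
    """

    def __init__(self, key):
        self.key = key                  # column the index is built on
        self.rows = []                  # append-only row log (the "cache")
        self.index = defaultdict(list)  # key value -> positions in self.rows

    def append(self, new_rows):
        """Append rows and return the new version number.

        Assumption: a version is simply the count of rows visible at that
        point, so older versions stay readable after later appends.
        """
        for row in new_rows:
            self.index[row[self.key]].append(len(self.rows))
            self.rows.append(row)
        return len(self.rows)

    def lookup(self, value, version=None):
        """Return rows whose key equals `value`, as seen at `version`."""
        limit = len(self.rows) if version is None else version
        return [self.rows[p] for p in self.index.get(value, []) if p < limit]

    def join(self, probe_rows, probe_key, version=None):
        """Index-based join: probe the index once per incoming row."""
        out = []
        for probe in probe_rows:
            for match in self.lookup(probe[probe_key], version):
                out.append({**match, **probe})
        return out


# Usage: a small edge list that grows over time.
edges = IndexedTable(key="src")
v1 = edges.append([{"src": 1, "dst": 2}, {"src": 1, "dst": 3}])
v2 = edges.append([{"src": 1, "dst": 4}, {"src": 2, "dst": 3}])

old_view = edges.lookup(1, version=v1)  # sees only the first two edges
new_view = edges.lookup(1)              # also sees the edge added later
```

Keeping the row log append-only is what makes the versioning cheap in this sketch: a reader holding `v1` never observes rows added afterwards, while new readers see the latest data without any copying.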