High-Performance Stateful Stream Processing On Solid-State Drives

9th Asia-Pacific Systems Workshop 2018 (APSYS '18)

Abstract
Stream processing has been widely used in big data analytics because it provides real-time information on continuously incoming data streams with low latency. As data volumes increase and processing logic becomes more complex, the internal state of stream processing applications grows as well. To handle large state efficiently, modern stream processing systems can store internal state on solid-state drives (SSDs) using persistent key-value (KV) stores optimized for SSDs. For example, Apache Flink and Apache Samza store internal state in RocksDB. However, delegating state management to persistent KV stores degrades performance: because the KV stores are unaware of stream query semantics, they cannot adapt their state management strategies to those semantics. In this paper, we investigate the performance limitations of current state management approaches on SSDs and show that query-aware optimizations can significantly improve the performance of stateful query processing on SSDs. Based on these observations, we propose a new stream processing system design with static and runtime query-aware optimizations. We also discuss further research directions for integrating emerging storage technologies with stateful stream processing.
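The abstract does not detail the proposed optimizations. As a hypothetical illustration of why query semantics matter, the Python sketch below runs a tumbling-window count against two toy state stores. One sees only opaque `(window, key)` entries, so expiring a window costs a read and a delete per entry. The other knows state is grouped by window and drops an expired window in one step. The class names, the window-partitioned layout, and the "count one unit per point operation" cost model are assumptions made for illustration. They are not the paper's design and not RocksDB's API.

```python
class GenericKVStore:
    """Hypothetical semantics-unaware store: state lives under opaque (window, key) keys."""

    def __init__(self):
        self.data = {}
        self.ops = 0  # simulated point operations (reads, writes, deletes)

    def increment(self, window, key):
        state_key = (window, key)
        self.data[state_key] = self.data.get(state_key, 0) + 1
        self.ops += 2  # one read plus one write

    def flush(self, window):
        # The store has no notion of windows, so every entry of the expired
        # window is read and deleted individually.
        out = {}
        for state_key in [k for k in self.data if k[0] == window]:
            out[state_key[1]] = self.data.pop(state_key)
            self.ops += 2  # one read plus one delete per entry
        return out


class WindowAwareStore:
    """Hypothetical query-aware store: state is partitioned by window id."""

    def __init__(self):
        self.windows = {}
        self.ops = 0

    def increment(self, window, key):
        counts = self.windows.setdefault(window, {})
        counts[key] = counts.get(key, 0) + 1
        self.ops += 2  # same read plus write as the generic store

    def flush(self, window):
        # Knowing the window has expired, the store drops its whole partition at once.
        self.ops += 1
        return self.windows.pop(window, {})


def tumbling_count(events, size, store):
    """Count events per key in tumbling windows of `size` time units.

    Assumes `events` is a list of (timestamp, key) pairs sorted by timestamp.
    """
    results = []
    current = None
    for ts, key in events:
        window = ts - ts % size  # start of the window containing ts
        if current is not None and window != current:
            results.append((current, store.flush(current)))
        current = window
        store.increment(window, key)
    if current is not None:
        results.append((current, store.flush(current)))
    return results


if __name__ == "__main__":
    events = [(0, "a"), (1, "b"), (2, "a"), (5, "a"), (6, "c")]
    generic, aware = GenericKVStore(), WindowAwareStore()
    assert tumbling_count(events, 5, generic) == tumbling_count(events, 5, aware)
    print(generic.ops, aware.ops)  # 18 12
```

Both stores produce the same window results. The query-aware one issues fewer simulated operations because it exploits a fact the generic store cannot see: all state of a tumbling window expires together. In a real SSD-backed store the savings would depend on the storage engine, so treat this only as a toy cost model.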