Cluster-Reduce: Compressing Sketches for Distributed Data Streams

Knowledge Discovery and Data Mining(2021)

引用 8|浏览47
暂无评分
摘要
ABSTRACTSketches, a type of probabilistic algorithms, have been widely accepted as the approximate summary of data streams. Compressing sketches is the best choice in distributed data streams to reduce communication overhead. The ideal compression algorithm should meet the following three requirements: high efficiency of compression procedure, support of direct query without decompression, and high accuracy of compressed sketches. However, no prior work can meet these requirements at the same time. Especially, the accuracy is poor after compression using existing methods. In this paper, we propose Cluster-Reduce, a framework for compressing sketches, which can meet all three requirements. Our key technique nearness clustering rearranges the adjacent counters with similar values in the sketch to significantly improve the accuracy. We use Cluster-Reduce to compress four kinds of sketches in two use-cases: distributed data streams and distributed machine learning. Extensive experimental results show that Cluster-Reduce can achieve up to 60 times smaller error than prior works. The source codes of Cluster-Reduce are available at Github anonymously[1].
更多
查看译文
关键词
Sketch Compression, Distributed Data Streams, Distributed ML
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要