Cuttle: Enabling Cross-Column Compression in Distributed Column Stores

Web and Big Data, APWeb-WAIM 2017, Part II (2017)

Abstract
We observe that, in real-world distributed data warehouse systems, data columns from different sources often exhibit redundancy. Although these systems can employ both general and column-oriented compression schemes to reduce storage pressure, such cross-column redundancy (CCR) is neither recognized nor exploited effectively. We therefore propose Cuttle, a column storage system that enables cross-column compression to reduce CCR. Specifically, we identify three kinds of CCR and develop a referential transformation encoding (RTE) scheme to compress multiple columns of data that exhibit CCR. Furthermore, we address the CCR selection problem and propose a greedy algorithm to generate cross-column compression schemes. Our experiments on real-world datasets show that Cuttle can reduce data size by a further half after both column-oriented and general compression schemes have been applied, and that query processing performance with Cuttle improves by 20% without any change to the application programs.
Keywords
Big data, Compression, Storage optimization, Data management
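
The abstract names two mechanisms without giving details: an encoding that stores one column by reference to another, and a greedy choice among candidate column pairs. The Python sketch below is only an illustration of that general idea, not the paper's RTE scheme or its selection algorithm. Everything in it is an assumption made for the example: the function names (`rte_encode`, `rte_decode`, `greedy_select`), the "transform plus exception list" representation, the per-pair savings estimates, and the rule that an encoded column may not itself serve as a reference.

```python
from typing import Callable, Dict, Hashable, List, Sequence, Tuple


def rte_encode(target: Sequence, reference: Sequence,
               transform: Callable) -> Dict[int, object]:
    """Hypothetical referential encoding: keep only the rows of `target`
    that cannot be rebuilt as transform(reference[i])."""
    if len(target) != len(reference):
        raise ValueError("columns must have the same number of rows")
    return {i: t for i, (t, r) in enumerate(zip(target, reference))
            if transform(r) != t}


def rte_decode(reference: Sequence, transform: Callable,
               exceptions: Dict[int, object]) -> List:
    """Rebuild the target column from the reference column and exceptions."""
    return [exceptions[i] if i in exceptions else transform(r)
            for i, r in enumerate(reference)]


def greedy_select(
    candidates: Sequence[Tuple[Hashable, Hashable, float]]
) -> List[Tuple[Hashable, Hashable]]:
    """Hypothetical greedy selection over (target, reference, saving) triples.

    Pairs are taken in order of decreasing estimated saving. An encoded
    column may not be used as a reference (assumed rule), so decoding a
    column never needs more than one lookup.
    """
    chosen: List[Tuple[Hashable, Hashable]] = []
    encoded, used_as_ref = set(), set()
    for target, ref, saving in sorted(candidates, key=lambda c: c[2],
                                      reverse=True):
        if saving <= 0:
            continue  # no benefit from encoding this pair
        if target in encoded or target in used_as_ref or ref in encoded:
            continue  # would encode twice or create a dependency chain
        chosen.append((target, ref))
        encoded.add(target)
        used_as_ref.add(ref)
    return chosen


# Made-up data: account_id is user_id + 1000 except in the last row.
user_id = [1, 2, 3, 4]
account_id = [1001, 1002, 1003, 9999]
add_offset = lambda x: x + 1000

exceptions = rte_encode(account_id, user_id, add_offset)
restored = rte_decode(user_id, add_offset, exceptions)
print(exceptions, restored == account_id)
```

In this toy case only one exception row has to be stored for `account_id`; the rest is recovered from `user_id`. How Cuttle actually represents transformations, estimates savings, and resolves dependencies between columns is described in the paper itself, not in the abstract.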