A Deep Dive into Common Open Formats for Analytical DBMSs.

Chunwei Liu, Anna Pavlenko, Matteo Interlandi, Brandon Haynes

Proc. VLDB Endow. (2023)

Abstract
This paper evaluates the suitability of Apache Arrow, Parquet, and ORC as formats for subsumption in an analytical DBMS. We systematically identify and explore the high-level features that are important to support efficient querying in modern OLAP DBMSs and evaluate the ability of each format to support these features. We find that each format has trade-offs that make it more or less suitable for use as a format in a DBMS and identify opportunities to more holistically co-design a unified in-memory and on-disk data representation. Our hope is that this study can be used as a guide for system developers designing and using these formats, as well as provide the community with directions to pursue for improving these common open formats.
Keywords
analytical DBMSs, common open formats