Progressive Partitioning for Parallelized Query Execution in Google's Napa

VLDB 2023 (2023)

Abstract
Napa holds Google's critical data warehouses in log-structured merge trees for real-time data ingestion and sub-second response for billions of queries per day. These queries are often multi-key look-ups in highly skewed tables and indexes. In our production experience, only progressive query-specific partitioning can achieve Napa's strict query latency SLOs. Here we advocate good-enough partitioning that keeps the per-query partitioning time low without risking uneven work distribution. Our design combines pragmatic system choices with algorithmic innovations. For instance, B-trees are augmented with statistics of key distributions, thus serving the dual purpose of aiding lookups and partitioning. Furthermore, progressive partitioning is designed to be "good enough," balancing partitioning time against execution performance. The resulting system is robust and successfully serves billions of queries day in and day out with very high quality of service, forming a core piece of infrastructure at Google.
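The abstract describes the mechanism only at a high level: internal B-tree nodes carry key-distribution statistics, and the partitioner refines splits progressively, stopping once shares are within an acceptable tolerance of even. Below is a minimal illustrative sketch of that idea, not Napa's implementation; the `Node` structure, the per-subtree `count` augmentation, and the `slack` tolerance are assumptions chosen for the example.

```python
# A minimal sketch, assuming a B-tree whose internal nodes are augmented with
# per-subtree key counts. This is NOT Napa's code; Node, partition, and slack
# are hypothetical names used only for this illustration.
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    keys: List[int]                         # separator keys (internal) or data keys (leaf)
    children: List["Node"] = field(default_factory=list)
    count: int = 0                          # keys in this subtree -- the statistics augmentation

    @property
    def is_leaf(self) -> bool:
        return not self.children


def partition(root: Node, num_workers: int, slack: float = 0.2) -> List[int]:
    """Return up to num_workers - 1 split keys giving roughly equal shares.

    "Good enough" partitioning: a subtree is swallowed whole, without further
    descent, whenever it fits into the current share within the slack factor,
    so partitioning time stays low at the cost of mildly uneven shares.
    """
    ideal = root.count / num_workers        # target number of keys per worker
    splits: List[int] = []
    acc = 0                                 # keys assigned to the share being built

    def visit(node: Node) -> None:
        nonlocal acc
        # Good-enough shortcut: accept the whole subtree if it stays within tolerance.
        if acc + node.count <= (1 + slack) * ideal:
            acc += node.count
            return
        if node.is_leaf:
            for key in node.keys:
                acc += 1
                if acc >= ideal:            # share is full: cut at this key
                    splits.append(key)
                    acc = 0
            return
        # Internal node: children interleaved with separator keys.
        for i, child in enumerate(node.children):
            visit(child)
            if i < len(node.keys):
                acc += 1                    # the separator key itself
                if acc >= ideal:
                    splits.append(node.keys[i])
                    acc = 0

    visit(root)
    return splits[: num_workers - 1]


if __name__ == "__main__":
    # Toy tree: counts include the separator key (4 + 1 + 4 = 9).
    left = Node(keys=[1, 2, 3, 4], count=4)
    right = Node(keys=[6, 7, 8, 9], count=4)
    root = Node(keys=[5], children=[left, right], count=9)
    print(partition(root, num_workers=2))   # -> [5]
```

The `slack` parameter captures the trade-off the abstract calls "good enough": a larger tolerance lets the partitioner stop descending the tree sooner, trading a bounded amount of skew for lower per-query partitioning time.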
Keywords
parallelized query execution, partitioning