Compact Filter Structures for Fast Data Partitioning

Qing Zheng, Charles D. Cranor, Ankush Jain, Gregory R. Ganger, Garth A. Gibson, George Amvrosiadis, Bradley W. Settlemyer, Gary A. Grider

Semantic Scholar (2019)

Abstract

We are approaching a point in time when it will be infeasible to catalog and query data after it has been generated. This trend has fueled research on in-situ data processing (i.e., operating on data as it is streamed to storage). One important example of this approach is in-situ data indexing. Prior work has shown the feasibility of indexing at scale as a two-step process: first, data is partitioned by key across CPU cores; then, each core produces indexes on its subset as the data is persisted. Online partitioning requires that the data be shuffled over the network so that it can be indexed and stored by the responsible core. This shuffling is becoming more costly as new processors emphasize parallelism over the individual core performance that is crucial for processing network events. In addition to indexing, scalable online data partitioning is also useful in other contexts such as efficient compression and load balancing. We present FilterKV, a data management scheme for faster online data partitioning of key-value (KV) pair data. FilterKV reduces the amount of data shuffled over the network by: (a) moving KV pairs quickly off the network to storage, and (b) using an extremely compact representation for each KV pair in the communication occurring over the network. We demonstrate FilterKV on the LANL Trinity cluster, and show that it can reduce total write time (including partitioning overhead) by up to 1.9-3.0x across 4096 processor cores.
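The contrast between shuffling full KV pairs and shuffling a compact representation can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the hash-based `partition_of`, the 16-bit `fingerprint`, the `(fingerprint, writer_rank)` message, and the function names are hypothetical, and none of them is FilterKV's actual wire format or filter structure, which the abstract does not specify.

```python
import hashlib

def partition_of(key: bytes, num_partitions: int) -> int:
    """Map a key to the partition (core) responsible for indexing it."""
    digest = hashlib.blake2b(key, digest_size=8).digest()
    return int.from_bytes(digest, "big") % num_partitions

def fingerprint(key: bytes, bits: int = 16) -> int:
    """Hypothetical compact stand-in for a key: a short hash fingerprint."""
    digest = hashlib.blake2b(key, digest_size=8, person=b"fp").digest()
    return int.from_bytes(digest, "big") & ((1 << bits) - 1)

def shuffle_full_bytes(records, num_partitions):
    """Baseline: every full KV pair crosses the network to its partition.
    Returns the number of payload bytes sent to each partition."""
    sent = [0] * num_partitions
    for key, value in records:
        sent[partition_of(key, num_partitions)] += len(key) + len(value)
    return sent

def shuffle_compact(records, writer_rank, num_partitions, local_log):
    """Assumed scheme: the full pair is appended to local storage, and only
    a small (fingerprint, writer_rank) message is sent to the partition."""
    outbox = [[] for _ in range(num_partitions)]
    for key, value in records:
        local_log.append((key, value))  # data leaves the network path early
        outbox[partition_of(key, num_partitions)].append(
            (fingerprint(key), writer_rank)
        )
    return outbox

if __name__ == "__main__":
    records = [(f"key{i}".encode(), b"v" * 64) for i in range(1000)]
    log = []
    outbox = shuffle_compact(records, writer_rank=3, num_partitions=8, local_log=log)
    full = sum(shuffle_full_bytes(records, 8))
    # Assume 2 bytes per fingerprint and 2 bytes per writer rank.
    compact = sum(len(msgs) for msgs in outbox) * 4
    print(f"full shuffle: {full} bytes, compact shuffle: {compact} bytes")
```

In this sketch a reader would later resolve a key by asking the responsible partition which writer may hold it, then reading that writer's local log. Short fingerprints can collide, so such a lookup may have to check more than one candidate writer.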
