Bullion: A Column Store for Machine Learning
arXiv (2024)
Abstract
The past two decades have witnessed columnar storage revolutionizing data
warehousing and analytics. However, the rapid growth of machine learning poses
new challenges to this domain. This paper presents Bullion, a columnar storage
system tailored for machine learning workloads. Bullion addresses the
complexities of data compliance, optimizes the encoding of long-sequence sparse
features, efficiently manages wide-table projections, and introduces feature
quantization in storage. By aligning with the evolving requirements of ML
applications, Bullion extends columnar storage to various scenarios, from
advertising and recommendation systems to the expanding realm of Generative AI.
Preliminary experimental results and theoretical analysis demonstrate
Bullion's superior performance in handling the unique demands of machine
learning workloads compared to existing columnar storage solutions. Bullion
significantly reduces I/O costs for deletion compliance, achieves substantial
storage savings with its optimized encoding scheme for sparse features, and
drastically improves metadata parsing speed for wide-table projections. These
advancements position Bullion as a critical component in the future of machine
learning infrastructure, enabling organizations to efficiently manage and
process the massive volumes of data required for training and inference in
modern AI applications.