AWARE: Workload-aware, Redundancy-exploiting Linear Algebra

Proceedings of the ACM on Management of Data (2023)

Abstract
Compression is an effective technique for fitting data in available memory, reducing I/O, and increasing instruction parallelism. While data systems primarily rely on lossless compression, modern machine learning (ML) systems exploit the approximate nature of ML and mostly use lossy compression via low-precision floating- or fixed-point representations. The resulting unknown impact on learning progress and model accuracy, however, creates trust concerns that require trial and error, and is problematic for declarative ML pipelines. Given the trend towards increasingly complex, composite ML pipelines---with outer loops for hyper-parameter tuning, feature selection, and data cleaning/augmentation---it is hard for a user to infer the impact of lossy compression. Sparsity exploitation is a common lossless scheme used to improve performance without this uncertainty. Evolving this concept to general redundancy-exploiting compression is a natural next step. Existing work on lossless compression and compressed linear algebra (CLA) enables such exploitation to a degree, but faces challenges for general applicability. In this paper, we address these limitations with a workload-aware compression framework, comprising a broad spectrum of new compression schemes and kernels. Instead of a data-centric approach that optimizes compression ratios, our workload-aware compression summarizes the workload of an ML pipeline, and optimizes the compression and execution plan to minimize execution time. On various micro benchmarks and end-to-end ML pipelines, we observe improvements for individual operations up to 10,000x and ML algorithms up to 6.6x compared to uncompressed operations.
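The abstract's core idea---summarizing the pipeline's workload and picking compression schemes that minimize estimated execution time rather than maximizing compression ratio---can be illustrated with a minimal sketch. The sketch below is not the AWARE or SystemDS API; the function names (plan_compression, estimate_cost), the candidate schemes, and the cost constants are hypothetical and only assume a simple size-times-frequency cost model.

```python
# Hypothetical sketch of workload-aware compression planning (not the paper's API):
# per column, pick the encoding with the lowest estimated cost under a workload summary.
import numpy as np

def workload_summary(ops):
    """Count how often each operation type appears in the pipeline."""
    summary = {}
    for op in ops:
        summary[op] = summary.get(op, 0) + 1
    return summary

def estimate_cost(col, scheme, summary):
    """Rough cost model: encoded size times how often operations touch it.
    Constants are illustrative, not taken from the paper."""
    n = len(col)
    distinct = len(np.unique(col))
    if scheme == "uncompressed":
        size = n
    elif scheme == "ddc":            # dense dictionary coding
        size = distinct + n * 0.25   # dictionary plus narrow codes
    elif scheme == "rle":            # run-length encoding
        runs = 1 + int(np.count_nonzero(np.diff(col)))
        size = 2 * runs              # (value, run-length) pairs
    else:
        raise ValueError(scheme)
    # Scanning operations pay per encoded size; decompression pays extra.
    cost = summary.get("matmul", 0) * size
    cost += summary.get("decompress", 0) * (size + n)
    return cost

def plan_compression(X, ops):
    """Pick, per column, the scheme with the lowest estimated cost."""
    summary = workload_summary(ops)
    return [min(("uncompressed", "ddc", "rle"),
                key=lambda s: estimate_cost(X[:, j], s, summary))
            for j in range(X.shape[1])]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = np.column_stack([rng.integers(0, 3, 10_000),   # few distinct values
                         rng.normal(size=10_000)])     # mostly unique values
    print(plan_compression(X, ["matmul"] * 50 + ["decompress"]))
```

Run on this toy input, the low-cardinality column is assigned dictionary coding while the dense numeric column stays uncompressed, mirroring the abstract's point that the best choice depends on both the data and how often operations will touch it.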
Keywords
linear, workload-aware, redundancy-exploiting