Taming Model Serving Complexity, Performance and Cost: A Compilation to Tensor Computations Approach

Semantic Scholar (2020)

Abstract
Machine Learning (ML) adoption in the enterprise requires simpler and more efficient software infrastructure—the bespoke solutions typical in large web companies are simply untenable. Model scoring, the process of obtaining predictions from a trained model over new data, is a primary contributor to infrastructure complexity and cost, as models are trained once but used many times. In this paper we propose HUMMINGBIRD, a novel approach to model scoring, which compiles featurization operators and traditional ML models (e.g., decision trees) into a small set of tensor operations. This approach inherently reduces infrastructure complexity and directly leverages existing investments in Neural Network compilers and runtimes to generate efficient computations for both CPU and hardware accelerators. Our performance results are surprising: despite replacing sparse computations (e.g., tree traversals) with dense ones (tensor operations), HUMMINGBIRD is competitive with and even outperforms (by up to 3×) hand-crafted kernels on micro-benchmarks, while enabling seamless end-to-end acceleration (with a speedup of up to 1200×) of ML pipelines.

PVLDB Reference Format: Supun Nakandala, Karla Saur, Gyeong-In Yu, Konstantinos Karanasos, Carlo Curino, Markus Weimer, and Matteo Interlandi. Taming Model Serving Complexity, Performance and Cost: A Compilation to Tensor Computations Approach. PVLDB, 00(0): xxx-yyy, 2020. DOI: https://doi.org/TBD
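For intuition, the sketch below (illustrative only, not the authors' implementation) shows one way a single decision tree could be evaluated with nothing but dense tensor operations, in the spirit of the compilation approach the abstract describes. The tiny tree, the matrices A–E, and the predict helper are hypothetical, and NumPy stands in for a tensor runtime.

```python
import numpy as np

# Hypothetical tree: node 0 tests x[0] < 0.5; its left child is leaf L0,
# its right child (node 1) tests x[1] < 1.5 with leaves L1 (left) and L2 (right).

# A[f, i] = 1 if internal node i evaluates feature f.
A = np.array([[1, 0],
              [0, 1]], dtype=np.float32)

# B[i] = threshold of internal node i.
B = np.array([0.5, 1.5], dtype=np.float32)

# C[i, l] = 1 if leaf l lies in the left subtree of node i,
#          -1 if it lies in the right subtree, 0 otherwise.
C = np.array([[ 1, -1, -1],   # node 0: L0 left; L1, L2 right
              [ 0,  1, -1]],  # node 1: L1 left; L2 right
             dtype=np.float32)

# D[l] = number of internal nodes whose left subtree contains leaf l.
D = (C == 1).sum(axis=0).astype(np.float32)

# E[l, c] = 1 if leaf l predicts class c (two classes here).
E = np.array([[1, 0],
              [0, 1],
              [1, 0]], dtype=np.float32)

def predict(X):
    """Evaluate the tree on a batch X (shape: batch x n_features)
    using only dense tensor ops: two matmuls and two comparisons."""
    T = (X @ A < B).astype(np.float32)   # 1 where the input goes left at a node
    T = (T @ C == D).astype(np.float32)  # one-hot indicator of the reached leaf
    return T @ E                         # map leaves to class scores

X = np.array([[0.2, 9.0],   # left at node 0            -> leaf L0 -> class 0
              [0.9, 1.0],   # right at node 0, then left -> leaf L1 -> class 1
              [0.9, 2.0]],  # right, then right          -> leaf L2 -> class 0
             dtype=np.float32)
print(predict(X).argmax(axis=1))  # expected: [0 1 0]
```

Because the whole evaluation reduces to matrix multiplications and element-wise comparisons, it maps directly onto Neural Network runtimes and hardware accelerators, which is the property the paper exploits.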