Pocket: ML Serving from the Edge

Misun Park, Ketan Bhardwaj, Ada Gavrilovska

EuroSys '23: Proceedings of the Eighteenth European Conference on Computer Systems (2023)

Abstract

One of the major challenges in serving ML applications is the resource pressure introduced by the underlying ML frameworks. The problem is more severe at resource-constrained, multi-tenant edge server locations, where a fixed resource envelope must scale to a larger number of clients. Naive approaches that simply minimize the resource budget allocated to each application degrade performance enough to void the benefits expected from operating at the edge. This paper presents Pocket, a new approach for serving ML applications in settings like the edge, based on a shared ML runtime backend offered as a service and lightweight ML application containers called pocket containers. Key to realizing Pocket are the use of lightweight IPC, support for cross-client isolation, and a novel resource amplification method that inlines resource reallocation with IPC. The last of these ensures just-in-time assignment of the limited edge resources to where they are most needed, thereby reducing contention effects and boosting overall performance and efficiency. Experimental evaluations demonstrate that Pocket can scale to 1.3-20x more clients with the same amount of resources while reducing response time by 20-80% compared to monolithic designs.
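
The abstract does not spell out how resource reallocation is inlined with IPC, so the following is only a toy model of one plausible reading, not the authors' implementation. It assumes that a client container lends part of its CPU budget to the shared backend for the duration of a single call and gets it back when the call returns. The names `Budget`, `amplified`, and `ipc_infer`, the millicore units, and the 75% lending share are all hypothetical; a real system would adjust container limits rather than Python integers.

```python
from contextlib import contextmanager
from dataclasses import dataclass


@dataclass
class Budget:
    """CPU budget in millicores (hypothetical stand-in for a container limit)."""
    name: str
    millicores: int


@contextmanager
def amplified(client: Budget, backend: Budget, percent: int = 75):
    """Lend `percent` of the client's budget to the backend for one call."""
    loan = client.millicores * percent // 100
    client.millicores -= loan
    backend.millicores += loan
    try:
        yield loan
    finally:
        # Return the loan as soon as the call finishes, even on error.
        backend.millicores -= loan
        client.millicores += loan


def ipc_infer(client: Budget, backend: Budget, request: str, log: list) -> str:
    """Toy 'IPC' call: the budget moves together with the request."""
    with amplified(client, backend):
        # Record the budgets in effect while the backend serves the request.
        log.append((client.millicores, backend.millicores))
        return f"result for {request}"


if __name__ == "__main__":
    client = Budget("client", 1000)
    backend = Budget("backend", 500)
    log = []
    print(ipc_infer(client, backend, "frame-1", log))
    print(log, client.millicores, backend.millicores)
```

In this sketch the budget follows the work: the client is idle while it waits for the reply, so its share is more useful at the backend, and restoring it in the `finally` clause keeps the total fixed across calls.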

Keywords

ML serving, containers, resource management, isolation, runtime-as-a-service, IPC, edge computing, visual analytics
