Delen: Enabling Flexible and Adaptive Model-serving for Multi-tenant Edge AI

Proceedings of the 8th ACM/IEEE Conference on Internet of Things Design and Implementation (IoTDI 2023)

Abstract
Model-serving systems expose machine learning (ML) models to applications programmatically via a high-level API. Cloud platforms use these systems to mask the complexities of optimally managing resources and servicing inference requests across multiple applications. Model serving at the edge is now also becoming increasingly important to support inference workloads with tight latency requirements. However, edge model serving differs substantially from cloud model serving in its latency, energy, and accuracy constraints: these systems must support multiple applications with widely different latency and accuracy requirements on embedded edge accelerators with limited computational and energy resources. To address this problem, this paper presents Delen, a flexible and adaptive model-serving system for multi-tenant edge AI. Delen exposes a high-level API that enables individual edge applications to specify a bound at runtime on the latency, accuracy, or energy of their inference requests. We efficiently implement Delen using conditional execution in multi-exit deep neural networks (DNNs), which enables granular control over inference requests, and evaluate it on a resource-constrained Jetson Nano edge accelerator. We evaluate Delen's flexibility by implementing state-of-the-art adaptation policies using Delen's API, and evaluate its adaptability under different workload dynamics and goals when running single and multiple applications.
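The core idea, serving a multi-exit DNN under a per-request bound, can be sketched as follows. This is a hypothetical illustration, not Delen's actual API: the exit names, latency costs, and accuracy figures are invented placeholders, and the selection logic (pick the deepest exit that satisfies the caller's latency bound) is one simple policy such a system could implement.

```python
from dataclasses import dataclass

@dataclass
class Exit:
    """One exit point of a multi-exit DNN (values are illustrative)."""
    name: str
    latency_ms: float   # cumulative inference cost up to this exit
    accuracy: float     # expected accuracy if inference stops here

# Deeper exits cost more latency/energy but yield higher accuracy.
EXITS = [
    Exit("exit1", latency_ms=5.0,  accuracy=0.72),
    Exit("exit2", latency_ms=12.0, accuracy=0.81),
    Exit("exit3", latency_ms=25.0, accuracy=0.88),
]

def select_exit(latency_bound_ms: float) -> Exit:
    """Pick the most accurate exit whose cumulative latency fits the
    caller's bound; fall back to the cheapest exit if none fits."""
    feasible = [e for e in EXITS if e.latency_ms <= latency_bound_ms]
    return max(feasible, key=lambda e: e.accuracy) if feasible else EXITS[0]

# A tight bound forces an early exit; a loose bound allows full inference.
print(select_exit(10.0).name)   # exit1
print(select_exit(30.0).name)   # exit3
```

Analogous per-request policies could bound energy instead of latency, or require a minimum accuracy and minimize cost subject to it; the point is that conditional execution over exits gives the runtime a knob to trade accuracy against resource use per request.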