Layercake: Efficient Inference Serving with Cloud and Mobile Resources

2023 IEEE/ACM 23rd International Symposium on Cluster, Cloud and Internet Computing (CCGrid), 2023

Abstract
Many mobile applications are now integrating deep learning models into their core functionality. These features have diverse latency requirements while demanding high-accuracy results. Currently, mobile applications statically decide to use either in-cloud inference, relying on a fast and consistent network, or on-device execution, relying on sufficient local resources. However, neither mobile networks nor local computation resources deliver consistent performance in practice. Consequently, when inference execution decisions are not made dynamically, mobile inference often experiences variable performance or fails to meet performance goals. In this paper, we introduce Layercake, a deep-learning inference framework that dynamically selects the best model and location for executing each inference. Layercake accomplishes this by tracking model state and availability, both locally and remotely, as well as the network bandwidth, allowing for accurate estimates of model response time. By doing so, Layercake achieves latency targets in up to 96.4% of cases, an improvement of 16.7% over similar systems, while decreasing the cost of cloud-based resources by over 68.33% compared to in-cloud inference.
Keywords
Deep-Learning Inference, Mobile Devices, Cloud Computing
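
The abstract describes selecting a model and execution location based on estimated response time, using tracked model state and network bandwidth. Below is a minimal sketch of that kind of selection logic, under assumptions of our own (the Candidate fields, the simple load/transfer/compute cost model, and the accuracy-maximizing tie-break are illustrative, not Layercake's actual implementation).

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Candidate:
    """A candidate (model, location) pair for serving one inference."""
    name: str          # e.g. "resnet50@cloud" or "mobilenet@device" (hypothetical)
    location: str      # "cloud" or "device"
    accuracy: float    # offline-profiled model accuracy
    compute_ms: float  # profiled compute latency at this location (ms)
    loaded: bool       # whether the model is currently resident in memory
    load_ms: float     # cost to load/warm the model if not resident (ms)
    input_kb: float    # size of the serialized input to upload (cloud only)

def estimate_response_ms(c: Candidate, bandwidth_kbps: float) -> float:
    """Estimate end-to-end response time from tracked state.

    Assumed cost model: compute time, plus model-load time if the model is
    cold, plus input-transfer time when the request must cross the network.
    """
    total = c.compute_ms
    if not c.loaded:
        total += c.load_ms
    if c.location == "cloud":
        total += (c.input_kb * 8.0) / bandwidth_kbps * 1000.0  # transfer ms
    return total

def pick_candidate(cands: list[Candidate], latency_target_ms: float,
                   bandwidth_kbps: float) -> Optional[Candidate]:
    """Pick the most accurate candidate expected to meet the latency target;
    fall back to the fastest candidate when none is expected to meet it."""
    if not cands:
        return None
    feasible = [c for c in cands
                if estimate_response_ms(c, bandwidth_kbps) <= latency_target_ms]
    if feasible:
        return max(feasible, key=lambda c: c.accuracy)
    return min(cands, key=lambda c: estimate_response_ms(c, bandwidth_kbps))
```

In this sketch, re-running pick_candidate per request with the latest bandwidth and model-residency measurements is what makes the decision dynamic rather than the static cloud-or-device choice the abstract argues against.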