Band: coordinated multi-DNN inference on heterogeneous mobile processors

Mobile Systems, Applications, and Services (2022)

Abstract
The rapid development of deep learning algorithms, together with innovative hardware advances, encourages multi-DNN workloads such as augmented reality applications. However, existing mobile inference frameworks like TensorFlow Lite and MNN fail to efficiently utilize the heterogeneous processors available on mobile platforms, because they focus on running a single DNN on a specific processor. As mobile processors are too resource-limited to deliver reasonable performance for such workloads on their own, it is challenging to serve multi-DNN workloads with existing frameworks. This paper introduces Band, a new mobile inference system that coordinates multi-DNN workloads on heterogeneous processors. Band examines a DNN beforehand and partitions it into a set of subgraphs, taking operator dependencies into account. At runtime, Band dynamically selects a schedule of subgraphs from multiple possible schedules, following the scheduling goal of a pluggable scheduling policy. Fallback operators, which are not supported by certain mobile processors, are also considered when generating subgraphs. Evaluation results on mobile platforms show that our system outperforms TensorFlow Lite, a state-of-the-art mobile inference framework, by up to 5.04× for single-app workloads involving multiple DNNs. For a multi-app scenario consisting of latency-critical DNN requests, Band reaches an up to 3.76× higher SLO satisfaction rate.
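To make the two mechanisms the abstract names concrete — partitioning a DNN into processor-specific subgraphs around fallback operators, and picking among candidate schedules via a pluggable policy — here is a minimal, hypothetical Python sketch. It is not Band's actual implementation: the operator model, the `SUPPORTED` table, the greedy partitioner, and the `shortest_first` policy are all illustrative assumptions.

```python
from dataclasses import dataclass

# Hypothetical model: a DNN is a topologically ordered list of operators;
# each processor supports only some operator kinds, and unsupported
# ("fallback") operators must run on the CPU.

@dataclass(frozen=True)
class Op:
    name: str
    kind: str            # e.g. "conv", "matmul", "custom"
    deps: tuple = ()     # names of ops this op depends on

# Assumed per-processor operator support; the CPU runs everything.
SUPPORTED = {
    "GPU": {"conv", "matmul"},
    "DSP": {"conv"},
    "CPU": {"conv", "matmul", "custom"},
}

def partition(ops, processor):
    """Greedily split a topologically ordered op list into subgraphs:
    consecutive ops supported by `processor` form one subgraph, while
    runs of fallback ops form CPU subgraphs. Operator dependencies are
    preserved because the input order is already topological."""
    subgraphs, current, current_target = [], [], None
    for op in ops:
        target = processor if op.kind in SUPPORTED[processor] else "CPU"
        if target != current_target and current:
            subgraphs.append((current_target, current))
            current = []
        current_target = target
        current.append(op.name)
    if current:
        subgraphs.append((current_target, current))
    return subgraphs

# A pluggable policy is just a function over candidate schedules.
def shortest_first(candidates, cost):
    """Toy policy: pick the candidate with the lowest estimated total cost."""
    return min(candidates,
               key=lambda sched: sum(cost[p] * len(g) for p, g in sched))

model = [
    Op("a", "conv"),
    Op("b", "custom", deps=("a",)),   # fallback op: only the CPU runs it
    Op("c", "matmul", deps=("b",)),
]

candidates = [partition(model, "GPU"), partition(model, "DSP")]
cost = {"GPU": 1.0, "DSP": 0.8, "CPU": 2.0}   # assumed per-op cost estimates
best = shortest_first(candidates, cost)
print(best)   # the GPU-based schedule wins under these assumed costs
```

Swapping `shortest_first` for a different function is what "pluggable" means here: a latency-critical policy could instead weigh per-request SLO deadlines when ranking the same candidate schedules.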