Model Parallelism Optimization for Distributed DNN Inference on Edge Devices

Meng Wang, Liang Qian, Na Meng, Yusong Cheng, Weiwei Fang

2023 IEEE 14th International Symposium on Parallel Architectures, Algorithms and Programming (PAAP), 2023

Abstract
Deep neural networks (DNNs) have recently gained widespread application across diverse domains. However, the computational and memory requirements of DNN models pose challenges for deploying them on resource-constrained edge devices. With the proliferation of the Internet of Things (IoT), heterogeneous edge devices with diverse computational capabilities and network conditions are increasingly employed for DNN inference. This paper proposes a distributed DNN model deployment scheme for edge device clusters. Model partitioning is performed using two algorithms: Edge Layer Partitioning (EdgeLP), which partitions a single neural network layer, and Edge Model Partitioning (EdgeMP), which partitions the complete model. Both algorithms account for the computational capabilities and network conditions of the edge devices. To address the transmission overhead between collaborating edge devices, layer fusion and data quantization are applied to reduce the amount of transmitted data. Experimental results show that our method substantially improves the performance of distributed DNN inference in heterogeneous scenarios. Specifically, on a cluster of three edge devices, the proposed scheme achieves a 1.38–1.72× speedup in DNN inference time without accuracy loss compared to a state-of-the-art scheme.
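To make the two transmission-reduction ideas in the abstract concrete, the following is a minimal sketch of (a) assigning contiguous layer ranges to devices in proportion to their compute capability, in the spirit of a capability-aware split like EdgeMP, and (b) symmetric int8 quantization of an intermediate activation before it is sent to the next device. All names here (`partition_layers`, `quantize_activation`, the FLOPS figures) are illustrative assumptions, not the paper's actual algorithms or API.

```python
import numpy as np

def partition_layers(n_layers, device_flops):
    """Assign contiguous layer ranges to devices in proportion to their
    compute capability (an illustrative stand-in for a capability-aware
    model split; the paper's EdgeMP also weighs network conditions)."""
    total = sum(device_flops)
    bounds, start = [], 0
    for i, flops in enumerate(device_flops):
        # The last device takes the remainder so every layer is covered.
        if i == len(device_flops) - 1:
            end = n_layers
        else:
            end = start + round(n_layers * flops / total)
        bounds.append((start, end))
        start = end
    return bounds

def quantize_activation(x):
    """Symmetric int8 quantization of an intermediate float32 tensor,
    cutting transmitted bytes roughly 4x versus float32."""
    scale = max(float(np.abs(x).max()), 1e-8) / 127.0
    q = np.clip(np.round(x / scale), -128, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover an approximate float32 tensor on the receiving device."""
    return q.astype(np.float32) * scale
```

For example, a 16-layer model split across three devices with relative capabilities 1:1:2 would yield the ranges `(0, 4)`, `(4, 8)`, `(8, 16)`, and each transmitted activation shrinks from 4 bytes to 1 byte per element at the cost of bounded rounding error.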
Keywords
distributed DNN inference,model partitioning,edge computing