Mandheling: Mixed-Precision On-Device DNN Training with DSP Offloading

Daliang Xu, Mengwei Xu, Qipeng Wang, Shangguang Wang, Yun Ma, Kang Huang, Gang Huang, Xin Jin, Xuanzhe Liu

Mobile Computing and Networking (2022)

Abstract

This paper proposes Mandheling, the first system that enables highly resource-efficient on-device training by orchestrating mixed-precision training with on-chip Digital Signal Processor (DSP) offloading. Mandheling fully exploits the DSP's strength in integer arithmetic through four novel techniques: (1) a CPU-DSP co-scheduling scheme that situationally mitigates the overhead of DSP-unfriendly operators; (2) a self-adaptive rescaling algorithm that reduces the overhead of dynamic rescaling in backward propagation; (3) a batch-splitting algorithm that improves DSP cache efficiency; and (4) a DSP compute-subgraph reuse mechanism that eliminates the preparation overhead on the DSP. We have fully implemented Mandheling and demonstrated its effectiveness through extensive experiments. Compared with the state-of-the-art DNN engines TFLite and MNN, Mandheling reduces per-batch training time by 5.5x and energy consumption by 8.9x on average. In end-to-end training tasks, Mandheling reduces convergence time by up to 10.7x and energy consumption by up to 13.1x, with only 1.9%-2.7% accuracy loss compared to the FP32 precision setting.
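
To make "dynamic rescaling" concrete: in integer training, each tensor produced during backward propagation must be mapped to a fixed int8 range, and the naive approach recomputes the scale (max |x| / 127) for every tensor on every step, which costs an extra pass over the data. The sketch below contrasts that with a simple reuse policy that keeps the previous scale and recomputes it only when values saturate. This is a hypothetical illustration under stated assumptions (symmetric per-tensor int8 quantization, a zero-saturation threshold); the names `compute_scale`, `quantize`, and `AdaptiveRescaler` are invented here, and this is not Mandheling's published self-adaptive rescaling algorithm.

```python
from typing import List, Optional, Tuple

QMAX = 127  # assumption: symmetric int8 range [-127, 127]


def compute_scale(values: List[float]) -> float:
    """Naive dynamic rescaling: an extra pass over the tensor to find max |x|."""
    max_abs = max((abs(v) for v in values), default=0.0)
    return max_abs / QMAX if max_abs > 0 else 1.0


def quantize(values: List[float], scale: float) -> Tuple[List[int], int]:
    """Quantize with the given scale, counting saturated (clipped) values in the same pass."""
    q, clipped = [], 0
    for v in values:
        r = round(v / scale)
        if r > QMAX or r < -QMAX:
            clipped += 1
            r = max(-QMAX, min(QMAX, r))
        q.append(r)
    return q, clipped


class AdaptiveRescaler:
    """Hypothetical policy: reuse the cached scale unless too many values saturate."""

    def __init__(self, max_clip_ratio: float = 0.0):
        self.scale: Optional[float] = None
        self.max_clip_ratio = max_clip_ratio
        self.recomputations = 0

    def __call__(self, values: List[float]) -> List[int]:
        if self.scale is not None:
            q, clipped = quantize(values, self.scale)
            if clipped <= self.max_clip_ratio * len(values):
                return q  # cached scale still fits: no separate max-reduction pass
        # First call, or the cached scale overflowed: fall back to full rescaling.
        self.scale = compute_scale(values)
        self.recomputations += 1
        q, _ = quantize(values, self.scale)
        return q


if __name__ == "__main__":
    r = AdaptiveRescaler()
    r([0.5, -1.0, 0.25])     # first call computes scale = 1/127
    r([0.4, -0.9, 0.1])      # fits the cached scale, so no recomputation
    r([2.0, -0.5, 0.3])      # 2.0 would overflow, so the scale is recomputed
    print(r.recomputations)  # prints 2
```

The saturation count is gathered during the quantization pass itself, so the extra reduction pass is paid only when the cached scale stops fitting. A practical policy would also need to detect ranges that shrink over training (the cached scale then wastes precision), which this sketch deliberately ignores.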

Keywords

DNN, training, mixed-precision, on-device