DQ-STP: An Efficient Sparse On-Device Training Processor Based on Low-Rank Decomposition and Quantization for DNN

IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS I: REGULAR PAPERS (2024)

Abstract
Due to bottlenecks such as scenario-varying applications, significant data communication overhead, and privacy concerns between off-line training and on-line inference, intelligent edge devices capable of adaptively fine-tuning deep neural network (DNN) models for specific tasks are urgently needed. However, the computational cost of ordinary on-device training (ODT) is intolerable, which motivates us to explore an efficient ODT processor, named DQ-STP. In this paper, we leverage a series of optimization techniques through software-hardware co-design. On the software side, the proposed design incorporates SVD-based low-rank decomposition, 2^n quantization, and the ACBN algorithm, which unifies the sparse computing mode of convolutional layers and enhances weight sparsity. On the hardware side, the proposed design effectively exploits data sparsity through four techniques: 1) a flag compressed sparse row (FCSR) format is proposed to compress input feature maps and gradient maps; 2) a unified processing element (PE) array comprising shifters and adders is proposed to accelerate the forward and error-propagation steps; 3) the PE arrays for error propagation and weight gradient generation are separated to enhance throughput; and 4) a sparse alignment strategy is proposed to further improve PE utilization. Through this software-hardware co-optimization, the proposed DQ-STP achieves an area efficiency of 41.2 GOPS/mm² and a peak energy efficiency of 90.63 TOPS/W. Compared with state-of-the-art reference designs, the proposed DQ-STP demonstrates a 2.19× improvement in normalized area efficiency and a 1.85× improvement in energy efficiency.
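The abstract only names the software-side techniques, so the following is a minimal sketch of how SVD-based low-rank decomposition and 2^n (power-of-two) quantization could be combined, under assumptions: the function names (`svd_low_rank`, `pow2_quantize`), the rank and bit-width parameters, and the log-domain rounding scheme are illustrative and not the paper's exact method.

```python
import numpy as np

def svd_low_rank(weight, rank):
    # Factor a 2-D weight matrix W (rows x cols) into A @ B with
    # A: (rows, rank) and B: (rank, cols). A conv kernel of shape
    # (C_out, C_in, K, K) would first be reshaped to (C_out, C_in*K*K).
    u, s, vt = np.linalg.svd(weight, full_matrices=False)
    a = u[:, :rank] * s[:rank]   # fold singular values into the left factor
    b = vt[:rank, :]
    return a, b

def pow2_quantize(x, n_bits=4):
    # Map each nonzero entry to a nearby signed power of two (rounding in
    # the log domain), so a multiply reduces to a shift; tiny values
    # underflow to exact zero, which increases weight sparsity.
    sign = np.sign(x)
    mag = np.abs(x)
    max_exp = np.floor(np.log2(mag.max()))         # assumes x has a nonzero entry
    min_exp = max_exp - (2 ** n_bits - 2)          # reserve one code for zero
    safe = np.where(mag > 0, mag, 2.0 ** min_exp)  # avoid log2(0)
    exp = np.clip(np.round(np.log2(safe)), min_exp, max_exp)
    q = sign * 2.0 ** exp
    q[mag < 2.0 ** (min_exp - 1)] = 0.0            # underflow -> zero code
    return q

# Usage: decompose a flattened conv weight, then quantize both factors.
w = np.random.randn(64, 576).astype(np.float32)   # e.g. 64 filters of size 64*3*3
a, b = svd_low_rank(w, rank=16)
a_q, b_q = pow2_quantize(a), pow2_quantize(b)
print("zero fraction:", np.mean(a_q == 0))
```

With all weights restricted to signed powers of two, a PE multiplication becomes a barrel shift of the activation (for an integer activation `act` and exponent `e`, `act * 2**e` is `act << e` in hardware), which is consistent with the abstract's unified PE array of shifters and adders. The abstract likewise does not detail the FCSR encoding; a plausible bitmap-plus-values sketch, where one flag bit per element marks nonzero positions, is shown below. The names `fcsr_encode` and `fcsr_decode` are hypothetical.

```python
def fcsr_encode(fmap_row):
    # One flag bit per element marks the nonzero positions;
    # only the nonzero values themselves are stored.
    flags = fmap_row != 0
    return flags, fmap_row[flags]

def fcsr_decode(flags, values, dtype=np.float32):
    out = np.zeros(flags.shape, dtype=dtype)
    out[flags] = values   # scatter nonzeros back by flag position
    return out
```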
Keywords
Deep neural network, weight low-rank decomposition, quantization, sparsity exploitation, on-device training processor