Layer-Sensitive Neural Processing Architecture for Error-Tolerant Applications

Zeju Li, Qinfan Wang, Zihan Zou, Qiao Shen, Na Xie, Hao Cai, Hao Zhang, Bo Liu

IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS (2024)

Abstract
Neural network (NN) operation places high demands on storage resources and parallel computing, which poses major challenges to deploying NNs on Internet-of-Things (IoT) devices. Consequently, this work proposes a low-power NN architecture, comprising an energy-efficient NN processor and a Cortex-M3 host processor, to achieve state-of-the-art (SOTA) end-to-end inference at the edge. The innovations of this article are as follows: 1) to minimize the weight bit width while keeping the accuracy loss within a small range, cross-layer error tolerance has been analyzed, and mixed-precision quantization has been adopted for cross-layer mapping; 2) a dynamic reconfigurable tensor processing unit (DR-TPU) with approximate computing has been proposed, which brings a 1.45× reduction in computing energy within a 0.46% accuracy loss on ResNet-50; and 3) a customized input feature map (IFM) reuse and over-writeback strategy has been adopted, eliminating recurrent fetching from the on-chip and off-chip memories. The number of on-chip storage accesses can be reduced by 25%-60%, and the on-chip memory capacity can be halved. The processor has been implemented in 28-nm CMOS technology. Combining the above techniques, the proposed architecture achieves a 53.1% reduction in power and 17.2-TOPS/W energy efficiency.
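The layerwise mixed-precision idea in innovation 1) can be illustrated with a minimal sketch: error-tolerant layers receive fewer weight bits while sensitive layers keep more. This is an assumption-laden toy using plain symmetric uniform quantization; the layer names, bit widths, and quantizer here are illustrative, not the paper's actual mapping or hardware scheme.

```python
# Toy sketch of layerwise mixed-precision weight quantization.
# Assumes symmetric uniform quantization; layer/bit assignments are
# hypothetical, not the cross-layer mapping used in the paper.
import numpy as np

def quantize_weights(w, bits):
    """Symmetrically quantize weights to `bits` bits, return dequantized values."""
    qmax = 2 ** (bits - 1) - 1           # e.g. 127 for 8-bit
    scale = np.max(np.abs(w)) / qmax
    if scale == 0:
        scale = 1.0                      # all-zero layer: avoid division by zero
    q = np.clip(np.round(w / scale), -qmax, qmax)
    return q * scale

# Error-tolerant layers get fewer bits; sensitive layers keep more.
rng = np.random.default_rng(0)
layer_bits = {"conv1": 8, "conv2": 6, "fc": 4}   # hypothetical per-layer widths
for name, bits in layer_bits.items():
    w = rng.standard_normal(1000).astype(np.float32)
    err = np.mean(np.abs(w - quantize_weights(w, bits)))
    print(f"{name}: {bits}-bit, mean abs quantization error {err:.4f}")
```

Lowering the bit width shrinks weight storage roughly in proportion but raises the quantization error, which is why the paper analyzes each layer's error tolerance before assigning its precision.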
Keywords
Approximate computing, energy-efficient design, Internet of Things (IoT), layerwise quantization, neural network (NN) processor