Layer-Sensitive Neural Processing Architecture for Error-Tolerant Applications
IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS (2024)
Abstract
Neural network (NN) operation places heavy demands on storage resources and parallel computing, which makes deploying NNs on Internet-of-Things (IoT) devices highly challenging. This work therefore proposes a low-power NN architecture, comprising an energy-efficient NN processor and a Cortex-M3 host processor, that achieves state-of-the-art (SOTA) end-to-end inference at the edge. The contributions of this article are as follows: 1) to minimize the weight bit width while keeping the accuracy loss small, cross-layer error tolerance is analyzed, and mixed-precision quantization is adopted for cross-layer mapping; 2) a dynamically reconfigurable tensor processing unit (DR-TPU) with approximate computing is proposed, which reduces computing energy by 1.45× with at most 0.46% accuracy loss on ResNet-50; and 3) a customized input feature map (IFM) reuse and over-writeback strategy is adopted, which eliminates repeated fetching from on-chip and off-chip memories, reduces the number of on-chip memory accesses by 25%–60%, and halves the required on-chip memory capacity. The processor is implemented in 28-nm CMOS technology. Combining these techniques, the proposed architecture achieves a 53.1% power reduction and an energy efficiency of 17.2 TOPS/W.
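The abstract does not give the authors' quantization procedure. As a rough, hypothetical illustration of how per-layer error tolerance can drive layerwise mixed-precision quantization, the Python sketch below assumes uniform symmetric per-tensor quantization, uses relative weight-quantization error as a stand-in for accuracy loss, and picks for each layer the smallest candidate bit width that stays within that layer's tolerance. The function names, candidate bit widths, and tolerance values are illustrative assumptions, not the paper's method.

```python
import random


def quantize_symmetric(weights, bits):
    """Uniform symmetric per-tensor quantization: map to signed `bits`-bit integers, then dequantize."""
    qmax = 2 ** (bits - 1) - 1
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    return [max(-qmax, min(qmax, round(w / scale))) * scale for w in weights]


def relative_error(weights, approx):
    """Quantization-noise energy relative to weight energy (assumed proxy for accuracy loss)."""
    noise = sum((w - a) ** 2 for w, a in zip(weights, approx))
    signal = sum(w * w for w in weights)
    return noise / signal if signal > 0 else 0.0


def choose_bit_widths(layers, tolerances, candidates=(2, 3, 4, 6, 8)):
    """Hypothetical layerwise selection: smallest bit width whose error is within the layer's tolerance."""
    chosen = {}
    for name, weights in layers.items():
        chosen[name] = max(candidates)  # fall back to the widest option if nothing meets the tolerance
        for bits in sorted(candidates):
            if relative_error(weights, quantize_symmetric(weights, bits)) <= tolerances[name]:
                chosen[name] = bits
                break
    return chosen


# Usage with synthetic data: an error-sensitive layer gets a tight tolerance,
# an error-tolerant layer a loose one (both hypothetical).
rng = random.Random(0)
demo_layers = {
    "conv1": [rng.gauss(0.0, 1.0) for _ in range(1000)],
    "conv2": [rng.gauss(0.0, 1.0) for _ in range(1000)],
}
print(choose_bit_widths(demo_layers, {"conv1": 5e-4, "conv2": 5e-2}))
```

A layer with a looser tolerance ends up with fewer weight bits, which is the general trade-off that mixed-precision quantization exploits; a real flow would measure task accuracy rather than weight error.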
Keywords
Approximate computing, energy-efficient design, Internet of Things (IoT), layerwise quantization, neural network (NN) processor