This is SPATEM! A Spatial-Temporal Optimization Framework for Efficient Inference on ReRAM-based CNN Accelerator

2022 27th Asia and South Pacific Design Automation Conference (ASP-DAC)

Abstract
Resistive memory-based computing-in-memory (CIM) has been considered a promising solution to accelerate convolutional neural network (CNN) inference: weights are stored in crossbar memory arrays, and matrix-vector multiplications (MVMs) are performed in situ in an analog manner. Several techniques assume that a whole crossbar can operate concurrently and discuss how to efficiently map the weights onto crossbar arrays. In practice, however, the accumulated effect of per-cell current deviation and Analog-to-Digital Converter overhead may greatly degrade inference accuracy, which motivates the concept of the Operation Unit (OU), whereby each per-cycle operation in a crossbar involves only a limited number of wordlines and bitlines so as to preserve satisfactory inference accuracy. With OU-based operations, the weight mapping and the scheduling strategy for parallelizing CNN convolution operations should take communication overhead and resource utilization into consideration to optimize inference acceleration. In this work, we propose SPATEM, the first optimization framework that efficiently executes MVMs with OU-based operations on ReRAM-based CIM accelerators. It decouples the design space into tractable steps, models the expected inference latency, and derives an optimized spatial-temporal-aware scheduling strategy. Compared with state-of-the-art approaches, experimental results show that the scheduling strategy derived by SPATEM achieves on average a 29.24% inference latency reduction with 31.28% less communication overhead by exploiting more originally unused crossbar cells.
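The OU concept described in the abstract can be illustrated with a small sketch: instead of one full-array analog MVM, the weight matrix is processed tile by tile, where each cycle activates at most a fixed number of wordlines and bitlines, and partial sums are accumulated digitally. The OU shape used below (9 wordlines by 8 bitlines) and the crossbar dimensions are hypothetical illustration values, not figures from the paper.

```python
import numpy as np

def ou_based_mvm(W, x, ou_rows=9, ou_cols=8):
    """Sketch of an OU-based matrix-vector multiply on one crossbar.

    Rather than activating the whole crossbar at once, each cycle
    activates at most `ou_rows` wordlines and `ou_cols` bitlines
    (one Operation Unit); partial sums along the bitlines are
    accumulated across cycles. The OU shape here is an assumed
    example value, not one taken from the paper.
    """
    n_rows, n_cols = W.shape
    y = np.zeros(n_cols)
    cycles = 0
    for r in range(0, n_rows, ou_rows):
        for c in range(0, n_cols, ou_cols):
            # One OU operation: only this tile's wordlines (entries
            # of x) and bitlines (columns of W) are active this cycle.
            tile = W[r:r + ou_rows, c:c + ou_cols]
            y[c:c + ou_cols] += x[r:r + ou_rows] @ tile
            cycles += 1
    return y, cycles

rng = np.random.default_rng(0)
W = rng.standard_normal((64, 32))   # weights mapped onto one crossbar
x = rng.standard_normal(64)         # input activations on the wordlines
y_ou, n_cycles = ou_based_mvm(W, x)
assert np.allclose(y_ou, x @ W)     # same result as a full-array MVM
print(f"OU-based MVM took {n_cycles} cycles instead of 1")
```

The cycle count grows with the number of OU tiles, which is exactly the latency cost that SPATEM's spatial-temporal scheduling tries to hide by distributing and parallelizing OU operations across crossbars.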
Keywords
communication overhead,inference acceleration,SPATEM,MVMs,OU-based operations,ReRAM-based CIM accelerators,spatial-temporal-aware scheduling strategy,derived scheduling strategy,inference latency reduction,unused crossbar cells,spatial-temporal optimization framework,ReRAM-based CNN accelerator,resistive memory-based computing-in-memory,convolutional neural network inference,crossbar memory arrays,matrix-vector multiplications,Operation Unit,satisfactory inference accuracy,CNN convolution operations,analog-to-digital-converter overhead,in-situ matrix-vector multiplications,per-cell current deviation,bitlines,limited wordlines,weight mapping,optimization framework,design space