pPIM: A Programmable Processor-in-Memory Architecture With Precision-Scaling for Deep Learning

Purab Ranjan Sutradhar, Mark Connolly, Sathwika Bavikadi, Sai Manoj, Mark A. Indovina, Amlan Ganguly

semanticscholar (2020)

Abstract
Memory access latencies and low data transfer bandwidth limit the processing speed of many data-intensive applications, such as Convolutional Neural Networks (CNNs), in conventional Von Neumann architectures. Processing in Memory (PIM) is envisioned as a potential hardware solution for such applications, as the data access bottlenecks can be avoided in PIM by performing computations within the memory die. However, PIM realizations with logic-based complex processing units within the memory present complicated fabrication challenges. In this letter, we propose to leverage the existing memory infrastructure to implement a programmable PIM (pPIM), a novel Look-Up-Table (LUT)-based PIM where all the processing units are implemented solely with LUTs, as opposed to prior LUT-based PIM implementations that combine LUTs with logic circuitry for computations. This enables pPIM to perform ultra-low-power, low-latency operations with minimal fabrication complications. Moreover, the complete LUT-based design offers simple 'memory write'-based programmability in pPIM. Enabling precision scaling further improves the performance and the power consumption for CNN applications. The programmability feature potentially makes online training implementations easier. Our preliminary simulations demonstrate that our proposed pPIM can achieve 2000x, 657.5x, and 1.46x improvements in inference throughput per unit power consumption compared to a state-of-the-art conventional processor architecture, Graphics Processing Units (GPUs), and a prior hybrid LUT-logic-based PIM, respectively. Furthermore, precision scaling improves the energy efficiency of the pPIM by approximately 1.35x over its full-precision operation.
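To make the two key ideas in the abstract concrete, the following is a minimal software sketch, not the authors' microarchitecture: a processing unit that computes purely by table lookup, is "programmed" by plain writes into its table, and supports precision scaling by composing a wider multiply from narrow LUT operations. All names here (LUTCore, program, lookup, mul8_scaled) are hypothetical illustrations; the actual pPIM organization, word widths, and partial-product scheme are not specified in this abstract.

```python
# Software model of a LUT-only processing unit in the spirit of pPIM.
# Computation is a single table read; reprogramming is a sequence of
# ordinary 'memory writes' into the table.

class LUTCore:
    """A processing unit whose only operation is reading a pre-written table."""

    def __init__(self, input_bits: int = 8):
        self.size = 1 << input_bits      # e.g. 8 address bits -> 256 entries
        self.table = [0] * self.size     # the LUT contents

    def program(self, func):
        """'Memory write' programmability: store func's truth table into the LUT."""
        for addr in range(self.size):
            self.table[addr] = func(addr)

    def lookup(self, operand_a: int, operand_b: int) -> int:
        """Compute by one read; the two 4-bit operands form the LUT address."""
        return self.table[(operand_a << 4) | operand_b]


# Program one core as a 4x4-bit multiplier: address = {a[3:0], b[3:0]}.
mul4 = LUTCore(input_bits=8)
mul4.program(lambda addr: (addr >> 4) * (addr & 0xF))

def mul8_scaled(a: int, b: int) -> int:
    """Precision scaling: an 8x8-bit product from four 4x4 LUT lookups.

    a*b = (aH*bH)<<8 + (aH*bL + aL*bH)<<4 + aL*bL, with aH/aL the high/low
    nibbles. A reduced-precision CNN mode could skip low-order partial
    products (fewer lookups) to trade accuracy for energy.
    """
    aH, aL = a >> 4, a & 0xF
    bH, bL = b >> 4, b & 0xF
    return ((mul4.lookup(aH, bH) << 8) +
            ((mul4.lookup(aH, bL) + mul4.lookup(aL, bH)) << 4) +
            mul4.lookup(aL, bL))

assert mul8_scaled(173, 94) == 173 * 94   # sanity check
```

Because the same table can be rewritten to realize a different function (an adder, a comparator, an activation step), retargeting the unit is just a burst of writes rather than a logic redesign, which is the property the abstract suggests could ease online training implementations.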