Sparsity-aware and Re-Configurable NPU Architecture for Samsung Flagship Mobile SoC

2021 ACM/IEEE 48th Annual International Symposium on Computer Architecture (ISCA 2021)

Cited by 46
Abstract
Of late, deep neural networks have become ubiquitous in mobile applications. As mobile devices generally require immediate response while maintaining user privacy, the demand for on-device machine learning technology is increasing. Nevertheless, mobile devices have restricted hardware resources, whereas deep neural networks involve considerable computation and communication. Therefore, neural-network-specialized hardware accelerators, generally called neural processing units (NPUs), have started to gain attention for the mobile application processor (AP). However, NPUs for a commercial mobile AP face two requirements that are difficult to satisfy simultaneously: support for a wide range of applications and efficient performance. In this paper, we propose a flexible yet efficient NPU architecture for a Samsung flagship mobile system-on-chip (SoC). To implement an efficient NPU, we design an energy-efficient inner-product engine that exploits input feature map sparsity. We propose a re-configurable MAC array to enhance the flexibility of the proposed NPU, dynamic internal memory port assignment to maximize on-chip memory bandwidth utilization, and an efficient architecture to support mixed-precision arithmetic. We implement the proposed NPU using the Samsung 5 nm library. Our silicon measurement experiments demonstrate that the proposed NPU achieves 290.7 FPS and 13.6 TOPS/W when executing an 8-bit quantized Inception-v3 model [1] with a single NPU core. In addition, we analyze the proposed zero-skipping architecture in detail. Finally, we present the findings and lessons learned from implementing the commercial mobile NPU, along with interesting avenues for future work.
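The abstract's central efficiency idea is an inner-product engine that skips MAC operations for zero-valued input activations. The sketch below is a hypothetical software analogy of that zero-skipping behavior, not the paper's actual datapath: function names, the activation density, and the MAC-counting bookkeeping are illustrative assumptions. It only shows why the same inner-product result can be produced with a number of MACs proportional to the non-zero activations.

```python
# Hypothetical sketch of input-feature-map zero-skipping (software analogy only).
# In the hardware described by the abstract, multipliers are gated or bypassed
# for zero activations; here we simply count how many MACs each scheme issues.
import numpy as np


def dense_inner_product(activations: np.ndarray, weights: np.ndarray):
    """Baseline: one MAC per element, regardless of activation sparsity."""
    acc, macs = 0, 0
    for a, w in zip(activations, weights):
        acc += int(a) * int(w)
        macs += 1
    return acc, macs


def zero_skipping_inner_product(activations: np.ndarray, weights: np.ndarray):
    """Sparsity-aware: only non-zero activations generate MACs.
    Conceptually corresponds to iterating over (index, value) pairs of
    non-zero inputs instead of the full feature-map vector."""
    acc, macs = 0, 0
    for idx in np.flatnonzero(activations):  # indices of non-zero activations
        acc += int(activations[idx]) * int(weights[idx])
        macs += 1
    return acc, macs


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Simulated 8-bit quantized activations with ~60% zeros (e.g. post-ReLU);
    # the density is an assumption for illustration, not a measured figure.
    acts = rng.integers(0, 128, size=256, dtype=np.int8)
    acts[rng.random(256) < 0.6] = 0
    wts = rng.integers(-128, 128, size=256).astype(np.int8)

    ref, dense_macs = dense_inner_product(acts, wts)
    out, sparse_macs = zero_skipping_inner_product(acts, wts)
    assert ref == out  # zero-skipping changes the work done, not the result
    print(f"dense MACs: {dense_macs}, zero-skipping MACs: {sparse_macs}")
```

Under these assumptions, the MAC count of the zero-skipping variant tracks the activation density, which is the intuition behind the energy savings the abstract attributes to exploiting input feature map sparsity.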
Keywords
neural processing unit,neural network,accelerator,sparsity,mixed-precision,re-configurable