A 1-TFLOPS/W, 28-nm Deep Neural Network Accelerator Featuring Online Compression and Decompression and BF16 Digital In-Memory-Computing Hardware

2024 IEEE Custom Integrated Circuits Conference (CICC), 2024

Abstract
With the recent advances in deep neural networks (DNNs), researchers have proposed various hardware accelerators. However, many neglect the energy consumption of off-chip memory accesses for weight and activation data, which can dominate the total energy consumption. To reduce off-chip data traffic, some works adopt aggressively quantized arithmetic, such as 1-4b fixed-point (FX1-FX4) or block-floating-point (BFP) formats. However, these formats offer only limited computation precision, hurting DNN inference accuracy.
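To make the precision trade-off concrete, the following minimal Python sketch (illustrative only, not from the paper; the function names and the toy weight tensor are assumptions) compares the root-mean-square error introduced by 4-bit fixed-point (FX4) quantization against BF16 truncation of float32 weights.

import numpy as np

def quantize_fx(x, bits=4):
    # Symmetric fixed-point quantization to `bits` bits (illustrative).
    qmax = 2 ** (bits - 1) - 1
    scale = np.max(np.abs(x)) / qmax
    return np.round(x / scale) * scale

def to_bf16(x):
    # Truncate the float32 mantissa to BF16 by keeping the top 16 bits.
    as_int = x.astype(np.float32).view(np.uint32)
    return (as_int & 0xFFFF0000).view(np.float32)

rng = np.random.default_rng(0)
w = rng.normal(0.0, 0.05, size=10_000).astype(np.float32)  # toy weight tensor

rmse = lambda a, b: float(np.sqrt(np.mean((a - b) ** 2)))
print("FX4  RMSE:", rmse(w, quantize_fx(w, bits=4)))
print("BF16 RMSE:", rmse(w, to_bf16(w)))

On such small-magnitude weights, the BF16 truncation error is typically orders of magnitude below the FX4 quantization error, which is the precision gap the abstract alludes to.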
Keywords
Deep Neural Network, Online Compression, Root Mean Square Error, Energy Consumption, Energy Efficiency, Processing Unit, Weight Data, Lookup Table, Compressor, Binary Search, Hardware Accelerators, Compression Algorithm, Network Training Process, Bit-width, Computational Layers, Round Of Search, Alignment Blocks, Off-chip Memory, NOR Gate, Binary Search Algorithm