A 1-TFLOPS/W, 28-nm Deep Neural Network Accelerator Featuring Online Compression and Decompression and BF16 Digital In-Memory-Computing Hardware

2024 IEEE Custom Integrated Circuits Conference (CICC), 2024

Abstract
With the recent advances in deep neural networks (DNNs), researchers have proposed various hardware accelerators. However, many neglect the energy consumption of off-chip memory accesses for weight and activation data, which can dominate the total energy consumption. To reduce off-chip data traffic, some works adopt aggressively quantized arithmetic, such as 1-4b fixed-point (FX1-FX4) or block-floating-point (BFP) formats. However, these formats offer only limited computation precision, hurting DNN inference accuracy.
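To make the precision trade-off concrete, the following minimal Python sketch (illustrative only, not from the paper; the function names and the toy weight tensor are assumptions) compares the root-mean-square error introduced by 4-bit fixed-point (FX4) quantization against BF16 truncation of float32 weights.

import numpy as np

def quantize_fx(x, bits=4):
    # Symmetric fixed-point quantization to `bits` bits (illustrative).
    qmax = 2 ** (bits - 1) - 1
    scale = np.max(np.abs(x)) / qmax
    return np.round(x / scale) * scale

def to_bf16(x):
    # Truncate the float32 mantissa to BF16 by keeping the top 16 bits.
    as_int = x.astype(np.float32).view(np.uint32)
    return (as_int & 0xFFFF0000).view(np.float32)

rng = np.random.default_rng(0)
w = rng.normal(0.0, 0.05, size=10_000).astype(np.float32)  # toy weight tensor

rmse = lambda a, b: float(np.sqrt(np.mean((a - b) ** 2)))
print("FX4  RMSE:", rmse(w, quantize_fx(w, bits=4)))
print("BF16 RMSE:", rmse(w, to_bf16(w)))

On such small-magnitude weights, the BF16 truncation error is typically orders of magnitude below the FX4 quantization error, which is the precision gap the abstract alludes to.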
Keywords
Deep Neural Network, Online Compression, Root Mean Square Error, Energy Consumption, Energy Efficiency, Processing Unit, Weight Data, Lookup Table, Compressor, Binary Search, Hardware Accelerators, Compression Algorithm, Network Training Process, Bit-width, Computational Layers, Round Of Search, Alignment Blocks, Off-chip Memory, NOR Gate, Binary Search Algorithm