Per-channel Quantization Level Allocation for Quantizing Convolutional Neural Networks

2020 IEEE International Conference on Consumer Electronics - Asia (ICCE-Asia), 2020

Abstract
Deep convolutional neural networks such as ResNet-18/50 for ImageNet classification can be quantized to 3-bit or higher precision while matching the accuracy of the full-precision (FP32) baseline, yet 2-bit quantization incurs significant accuracy loss. In this work, we report that, because the ranges of per-channel activation distributions vary across channels, 2-bit (4-level) quantization loses most of the information in activation channels with a small range. To minimize this information loss, we propose a novel quantization method called Per-channel Quantization Level Allocation (PCQLA), which quantizes activations to 2-bit precision using a per-channel clipping value set according to the range of each channel's activation distribution. We also combine the PCQLA method with outlier-aware quantization. On the ImageNet classification dataset, our method achieves accuracy comparable to the full-precision baseline with 2-bit activation quantization on ResNet-18/50.
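The sketch below illustrates the general idea the abstract describes: 2-bit (4-level) uniform quantization of post-ReLU activations where each channel gets its own clipping value derived from that channel's observed range, so small-range channels are not flattened by a single global threshold. It is a minimal illustration under assumed details (per-channel maximum as the clipping value, uniform levels), not the paper's exact PCQLA algorithm; the function and variable names are hypothetical.

```python
# Minimal sketch of per-channel 2-bit activation quantization with a
# per-channel clipping value. This is an assumption-based illustration,
# not the paper's PCQLA implementation.
import numpy as np

def per_channel_quantize_2bit(activations: np.ndarray) -> np.ndarray:
    """Quantize post-ReLU activations of shape (N, C, H, W) to 4 levels per channel.

    Each channel's clipping value is its own observed maximum, so a channel
    with a small activation range still spans all 4 quantization levels.
    """
    n_levels = 2 ** 2  # 2-bit -> 4 quantization levels
    quantized = np.empty_like(activations)
    for c in range(activations.shape[1]):
        channel = activations[:, c]
        clip_val = channel.max()          # per-channel clipping value (assumed: channel max)
        if clip_val <= 0:                 # dead channel: all activations stay zero
            quantized[:, c] = 0.0
            continue
        step = clip_val / (n_levels - 1)  # uniform step size within [0, clip_val]
        clipped = np.clip(channel, 0.0, clip_val)
        quantized[:, c] = np.round(clipped / step) * step  # snap to nearest level
    return quantized

# Usage example: a channel with a small range keeps its own 4 levels
# instead of collapsing under a large global clipping value.
x = np.abs(np.random.randn(2, 3, 4, 4)).astype(np.float32)
x[:, 1] *= 0.05                       # simulate a small-range activation channel
x_q = per_channel_quantize_2bit(x)
print(np.unique(np.round(x_q[:, 1], 5)).size)  # at most 4 distinct levels
```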
Keywords
Convolutional Neural Network,Neural Network Compression,Quantization