On Quantization of Image Classification Neural Networks for Compression Without Retraining.

ICIP (2022)

Cited by 1
Abstract
We study the quantization of neural networks for compression and representation without retraining. The goal is to facilitate neural network representation and deployment in standard formats, so that general networks may have their weights quantized and entropy-coded within the deployment format. We relate weight entropy to model accuracy and evaluate the distribution of weights against known distributions. Many scalar quantization strategies were tested. We found that the weights are typically well approximated by a Laplacian distribution, for which optimal quantizers are approximated by entropy-coded uniform quantizers with dead zones. Results indicate that the size of popular image classification networks can be reduced 8-fold with accuracy losses near 1%.
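The dead-zone uniform quantizer mentioned in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names, the midpoint reconstruction rule, and the `deadzone_ratio` value are assumptions for demonstration.

```python
import numpy as np

def deadzone_quantize(w, step, deadzone_ratio=1.5):
    """Dead-zone uniform scalar quantizer.

    The zero bin is widened relative to the other bins
    (deadzone_ratio > 1), which suits Laplacian-distributed
    weights whose mass is concentrated near zero and makes the
    entropy-coded indices cheaper to store.
    """
    half_dz = deadzone_ratio * step / 2.0
    # Values with |w| < half_dz fall into the dead zone and map to index 0.
    q = np.sign(w) * np.maximum(0.0, np.floor((np.abs(w) - half_dz) / step) + 1)
    return q.astype(np.int64)

def deadzone_dequantize(q, step, deadzone_ratio=1.5):
    """Reconstruct each non-zero index at the midpoint of its bin."""
    half_dz = deadzone_ratio * step / 2.0
    return np.sign(q) * (half_dz + (np.abs(q) - 0.5) * step) * (q != 0)
```

With `step = 0.1` and the default ratio, the dead zone spans roughly (-0.075, 0.075), so small weights collapse to zero while larger ones incur at most about half a step of reconstruction error.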
Keywords
Neural network compression, weight quantization, ONNX file compression