Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference

2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (2018)

Cited by 3112 · Viewed 360
Abstract
The rising popularity of intelligent mobile devices and the daunting computational cost of deep learning-based models call for efficient and accurate on-device inference schemes. We propose a quantization scheme that allows inference to be carried out using integer-only arithmetic, which can be implemented more efficiently than floating-point inference on commonly available integer-only hardware. We also co-design a training procedure to preserve end-to-end model accuracy post-quantization. As a result, the proposed quantization scheme improves the tradeoff between accuracy and on-device latency. The improvements are significant even on MobileNets, a model family known for run-time efficiency, and are demonstrated in ImageNet classification and COCO detection on popular CPUs.
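The quantization scheme the abstract refers to maps real values to 8-bit integers via an affine transform r ≈ S·(q − Z), where S is a real-valued scale and Z an integer zero-point, so that zero is exactly representable. Below is a minimal NumPy sketch of that mapping; the helper names (`choose_qparams`, `quantize`, `dequantize`) are illustrative, not from the paper, and the clamping/rounding details are one reasonable reading of the scheme rather than a reference implementation.

```python
import numpy as np

def choose_qparams(r_min, r_max, num_bits=8):
    # Pick scale S and zero-point Z so that r ~= S * (q - Z) for
    # unsigned num_bits integers q in [0, 2^num_bits - 1].
    qmin, qmax = 0, 2 ** num_bits - 1
    # The representable range must contain 0 so that zero maps exactly.
    r_min, r_max = min(r_min, 0.0), max(r_max, 0.0)
    scale = (r_max - r_min) / (qmax - qmin)
    zero_point = int(round(qmin - r_min / scale))
    zero_point = max(qmin, min(qmax, zero_point))  # clamp into range
    return scale, zero_point

def quantize(r, scale, zero_point, num_bits=8):
    # Real -> integer: round, shift by the zero-point, clamp.
    q = np.round(np.asarray(r, dtype=np.float32) / scale) + zero_point
    return np.clip(q, 0, 2 ** num_bits - 1).astype(np.uint8)

def dequantize(q, scale, zero_point):
    # Integer -> real: the affine map r = S * (q - Z).
    return scale * (q.astype(np.float32) - zero_point)
```

A round trip through `quantize` and `dequantize` recovers each value to within half a quantization step (scale / 2), and 0.0 is reproduced exactly, which is what makes zero-padding and ReLU cheap in integer-only arithmetic.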
Keywords
floating-point inference, training procedure, CPU, integer-arithmetic-only inference, ImageNet classification, COCO detection, on-device inference schemes, deep learning-based models, intelligent mobile devices, neural networks, run-time efficiency, model family, quantization scheme, end-to-end model accuracy post-quantization