Trainable Power-of-2 Scale Factors for Hardware-friendly Network Quantization

Pu Li, Jian Cao, Fang Li, Hongyi Yao, Yiwen Wang

2022 2nd International Conference on Computer, Control and Robotics (ICCCR), 2022

Abstract
Quantization can efficiently reduce the bitwidth of parameters in neural networks and accelerate both inference and data transmission, which is essential for deploying networks with large parameter scales on resource-limited edge devices. However, most existing quantization methods use non-linear functions (e.g., tanh or non-linear polynomials), which are not only difficult to implement on hardware but also occupy plenty of computing resources. In addition, previous quantization methods seldom consider the hardware implementation of operators and only simulate quantization accuracy at the algorithm level. To address these problems, we propose the novel \emph{Trainable Power-of-2 Scale Factors Quantization} (TPSQ), which combines power-of-2 quantization scale factors with a trainable clamp interval to benefit from the advantages of both. Experiments on current mainstream vision tasks show that TPSQ surpasses most previous quantization methods, demonstrating its effectiveness. Finally, we implement an FPGA accelerator for object detection to demonstrate the hardware friendliness of TPSQ. The experimental results show that the peak performance of our system is 300 GOP/s at a 300 MHz working frequency, which is 72 times higher than the implementation on an ARM-A9 processor.
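The core idea described in the abstract pairs a trainable clamp interval with a scale factor restricted to a power of two, so that scaling can be realized as a bit shift in hardware. The following is a minimal PyTorch sketch of such a quantizer; the class name, the symmetric clamp parameterization, and the straight-through estimators are illustrative assumptions, since the abstract does not give the exact TPSQ formulation.

```python
import torch
import torch.nn as nn


class PowerOfTwoQuantizer(nn.Module):
    """Illustrative sketch of a fake-quantizer with a trainable clamp interval
    and a scale factor snapped to the nearest power of two (hypothetical
    formulation; the paper's exact TPSQ definition may differ)."""

    def __init__(self, num_bits: int = 8, init_alpha: float = 4.0):
        super().__init__()
        self.num_bits = num_bits
        # Trainable clamp bound: values are clipped to [-alpha, alpha].
        self.alpha = nn.Parameter(torch.tensor(float(init_alpha)))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        qmax = 2 ** (self.num_bits - 1) - 1
        # Real-valued scale implied by the clamp interval.
        scale = self.alpha / qmax
        # Forward pass uses the nearest power-of-two scale (a bit shift in
        # hardware); the backward pass sees the gradient of `scale` via a
        # straight-through estimator.
        pow2_scale = scale + (2.0 ** torch.round(torch.log2(scale)) - scale).detach()
        # Clamp to the trainable interval, quantize, then dequantize.
        x_clamped = torch.maximum(torch.minimum(x, self.alpha), -self.alpha)
        q = x_clamped / pow2_scale
        q = q + (torch.round(q) - q).detach()   # rounding with STE
        q = torch.clamp(q, -qmax - 1, qmax)     # keep within the integer range
        return q * pow2_scale


if __name__ == "__main__":
    quantizer = PowerOfTwoQuantizer(num_bits=4)
    x = torch.randn(8, requires_grad=True)
    y = quantizer(x)
    y.sum().backward()  # the clamp bound alpha receives a gradient
    print(y, quantizer.alpha.grad)
```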
Keywords
model compression, quantization, FPGA