DeepGEMM: Accelerated Ultra Low-Precision Inference on CPU Architectures using Lookup Tables

CoRR (2023)

Abstract
Considerable progress has recently been made in ultra low-bit quantization, promising significant improvements in latency, memory footprint, and energy consumption on edge devices. Quantization methods such as Learned Step Size Quantization can achieve model accuracy comparable to full-precision floating-point baselines even with sub-byte quantization. However, deploying these ultra low-bit quantized models on mainstream CPU devices is extremely challenging, because commodity SIMD (Single Instruction, Multiple Data) hardware typically supports a minimum of 8-bit precision. To overcome this limitation, we propose DeepGEMM, a lookup-table-based approach for executing ultra low-precision convolutional neural networks on SIMD hardware. The proposed method precomputes all possible products of weights and activations, stores them in a lookup table, and efficiently accesses them at inference time to avoid costly multiply-accumulate operations. Our 2-bit implementation outperforms the corresponding 8-bit integer kernels in the QNNPACK framework by up to 1.74x on x86 platforms.
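The lookup-table idea can be illustrated with a minimal scalar sketch. This is not the DeepGEMM kernel, which targets SIMD hardware; it only shows the principle of replacing each multiplication with a table read. The value encodings (signed two's-complement weights, unsigned activations), the index layout `(w << 2) | a`, and all function names are assumptions made for illustration.

```python
# Illustrative sketch only: the encodings, index layout, and names below are
# assumptions for this example, not taken from the DeepGEMM implementation.

BITS = 2
LEVELS = 1 << BITS  # 4 representable codes per 2-bit operand


def decode_weight(code):
    """Assumed signed encoding: codes 0..3 map to 0, 1, -2, -1."""
    return code - LEVELS if code >= LEVELS // 2 else code


def decode_activation(code):
    """Assumed unsigned encoding: codes 0..3 map to 0..3."""
    return code


def build_lut():
    """Precompute all 16 weight*activation products, indexed by (w << 2) | a."""
    lut = [0] * (LEVELS * LEVELS)
    for w in range(LEVELS):
        for a in range(LEVELS):
            lut[(w << BITS) | a] = decode_weight(w) * decode_activation(a)
    return lut


def lut_dot(w_codes, a_codes, lut):
    """Dot product that reads products from the table instead of multiplying."""
    if len(w_codes) != len(a_codes):
        raise ValueError("operand vectors must have the same length")
    acc = 0
    for w, a in zip(w_codes, a_codes):
        acc += lut[(w << BITS) | a]
    return acc


def reference_dot(w_codes, a_codes):
    """Conventional multiply-accumulate, used only to check the table path."""
    return sum(
        decode_weight(w) * decode_activation(a)
        for w, a in zip(w_codes, a_codes)
    )
```

With 2-bit operands the table has only 16 entries, which is what makes it plausible to hold the table in fast storage and index it with packed weight and activation codes.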
Keywords
2-bit implementation, 8-bit integer kernels, 8-bit precision, commodity SIMD, CPU architectures, DeepGEMM, edge devices, energy consumption, full-precision floating-point baselines, inference time, Learned Step Size Quantization, lookup table based approach, low-precision convolutional neural networks, mainstream CPU devices, memory footprint, model accuracy, Quantization methods, SIMD hardware, sub-byte quantization, ultra low-bit quantization, ultra low-bit quantized models, ultra low-precision inference