LIBXSMM: accelerating small matrix multiplications by runtime code generation.

Alexander Heinecke,Greg Henry,Maxwell Hutchinson,Hans Pabst

SC（2016）

引用 196|浏览176

暂无评分

摘要

Many modern highly scalable scientific simulations packages rely on small matrix multiplications as their main computational engine. Math libraries or compilers are unlikely to provide the best possible kernel performance. To address this issue, we present a library which provides high performance small matrix multiplications targeting all recent x86 vector instruction set extensions up to Intel AVX-512. Our evaluation proves that speed-ups of more than 10 x are possible depending on the CPU and application. These speed-ups are achieved by a combination of several novel technologies. We use a code generator which has a built-in architectural model to create code which runs well without requiring an auto-tuning phase. Since such code is very specialized we leverage just-in-time compilation to only build the required kernel variant at runtime. To keep ease-of-use, overhead, and kernel management under control we accompany our library with a BLAS-compliant frontend which features a multi-level code-cache hierarchy.

查看译文

关键词

Small GEMM,JIT compilation,code generation,Block CSR,FEM,SEM

AI 理解论文

溯源树

样例

生成溯源树，研究论文发展脉络

Chat Paper

正在生成论文摘要