Towards a Unified Implementation of GEMM in BLIS

ICS (2023)

Abstract
Matrix libraries often focus on achieving high performance for problems considered to be either "small" or "large", as these two scenarios tend to respond best to different optimization strategies. We propose a unified technique for implementing matrix operations like general matrix multiplication (GEMM) that can achieve high performance for both small and large problem sizes. The key is to fuse packing (an operation that copies data to a contiguous layout in memory and that is critical for large-matrix performance) with the first computational "pass" over that data. This boosts performance across the problem size spectrum. As a result, tuning general-purpose libraries becomes simpler, because the technique obviates the need to carefully express and parameterize logic that chooses between a "small matrix" strategy and a "large matrix" strategy. A prototype implementation of the technique built with the BLAS-like Library Instantiation Software (BLIS) framework is described, and performance on a range of architectures is reported.
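The fusion idea in the abstract can be illustrated with a minimal sketch. This is not the BLIS prototype: the function name `gemm_fused_pack`, the block sizes `mr` and `nr`, and the loop order are all assumptions chosen for readability. For each column panel of B, the first row block of A reads B from its original layout and, as a side effect, copies it into a contiguous buffer; later row blocks read only the packed copy.

```python
def gemm_fused_pack(A, B, mr=2, nr=2):
    """Return C = A * B for matrices stored as lists of rows.

    Illustrative only: mr and nr are hypothetical block sizes, and the
    'packed' buffer stands in for a contiguous packed panel of B.
    """
    m = len(A)
    k = len(B)
    n = len(B[0]) if k else 0
    C = [[0.0] * n for _ in range(m)]

    for j0 in range(0, n, nr):            # loop over column panels of B
        nb = min(nr, n - j0)
        packed = [0.0] * (k * nb)         # contiguous k x nb copy of the panel

        for i0 in range(0, m, mr):        # loop over row blocks of A
            mb = min(mr, m - i0)
            first_pass = (i0 == 0)

            for p in range(k):
                if first_pass:
                    # Fused step: while this pass touches row p of the
                    # panel anyway, store it into the packed buffer.
                    for j in range(nb):
                        packed[p * nb + j] = B[p][j0 + j]

                # All passes (including the first) compute from the
                # contiguous row that was just packed or packed earlier.
                row = packed[p * nb:(p + 1) * nb]
                for i in range(mb):
                    a = A[i0 + i][p]
                    for j in range(nb):
                        C[i0 + i][j0 + j] += a * row[j]
    return C


A = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
B = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
C = gemm_fused_pack(A, B)
```

In this sketch, a problem with a single row block (m no larger than `mr`) never makes a separate sweep over B just to pack it, while a problem with many row blocks still benefits from the contiguous copy on every pass after the first. The same code path serves both cases, which is the property the abstract attributes to the unified technique.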
Keywords
BLAS,BLIS,matrix multiplication,performance,CPU