Automatic Generation of Micro-kernels for Performance Portability of Matrix Multiplication on RISC-V Vector Processors.

SC-W '23: Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis (2023)

Abstract
In this paper, we propose and evaluate several optimized implementations of the general matrix multiplication (gemm) on two RISC-V cores that implement the RISC-V vector extension (RVV): the C906 and the C910 from T-HEAD. Specifically, we address the performance portability problem across these processor cores by means of an automatic assembly code generator, written in Python, that emits RVV code for high performance computing (HPC) under a variety of combinations of architecture-specific and general optimizations. Our experimental results, obtained with a number of automatically generated micro-kernels for gemm on both RISC-V architectures, reveal that the impact of each optimization differs depending on the target architecture, and highlight the importance of automatically generating HPC RVV code to achieve performance portability while reducing the developers’ effort. In addition, these optimizations yield important performance gains with respect to a state-of-the-art tuned BLAS library (OpenBLAS), reaching speed-ups of 3× and 1.3× for the C910 and C906, respectively.
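To illustrate the general idea of generating a gemm micro-kernel from Python, the following is a minimal sketch and not the authors' generator. The function name `emit_rank1_update`, the register assignment (`a0` for the A panel, `a1` for the B panel, `t1` holding the vector length in bytes) and the layout of accumulators are all hypothetical choices made for this example. The mnemonics follow the ratified RVV 1.0 spelling as an assumption; the cores evaluated in the paper may require the syntax of a different draft of the extension.

```python
def emit_rank1_update(mr_vregs: int, nr: int, sew: int = 32) -> str:
    """Emit assembly text for one rank-1 update of a register-blocked tile.

    Hypothetical conventions (assumptions of this sketch):
      a0 -> current column of the packed A panel
      a1 -> current row of the packed B panel
      t1 -> vector length in bytes
      v0..v{mr_vregs-1}        hold the A column
      v{mr_vregs}.. onwards    hold the C accumulators
    """
    if sew not in (32, 64):
        raise ValueError("sew must be 32 or 64")
    if mr_vregs < 1 or nr < 1:
        raise ValueError("mr_vregs and nr must be positive")
    # A vectors plus accumulators must fit in the 32 vector registers.
    if mr_vregs * (nr + 1) > 32:
        raise ValueError("tile needs more than 32 vector registers")
    # One scalar temporary per B element; ft0..ft11 are available.
    if nr > 12:
        raise ValueError("nr exceeds the ft0..ft11 temporaries")

    vload = f"vle{sew}.v"
    fload = "flw" if sew == 32 else "fld"
    elem_bytes = sew // 8
    lines = []

    # Load mr_vregs vectors of the A column, advancing the pointer each time.
    for i in range(mr_vregs):
        lines.append(f"{vload} v{i}, (a0)")
        lines.append("add a0, a0, t1")

    # Load nr scalars of the B row, then advance the B pointer once.
    for j in range(nr):
        lines.append(f"{fload} ft{j}, {j * elem_bytes}(a1)")
    lines.append(f"addi a1, a1, {nr * elem_bytes}")

    # Accumulate: C[:, j] += B[j] * A[:], one vector-scalar FMA per register.
    for j in range(nr):
        for i in range(mr_vregs):
            acc = mr_vregs + j * mr_vregs + i
            lines.append(f"vfmacc.vf v{acc}, ft{j}, v{i}")

    return "\n".join(lines)


if __name__ == "__main__":
    print(emit_rank1_update(mr_vregs=2, nr=2))
```

Because the tile shape is a parameter, the same script can produce differently shaped micro-kernels for different cores, which is the kind of variation a generator-based approach makes cheap to explore. Loop control, `vsetvli` configuration, and the final store of the accumulators are omitted here.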