Performance-Portable Sparse Matrix-Matrix Multiplication For Many-Core Architectures

2017 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), 2017

Abstract
We consider the problem of writing a performance-portable sparse matrix-sparse matrix multiplication (SpGEMM) kernel for many-core architectures. We approach the SpGEMM kernel from the perspectives of algorithm design, implementation, and practical usage. First, we design a hierarchical, memory-efficient SpGEMM algorithm. We then design and implement thread-scalable data structures that enable us to develop a portable SpGEMM implementation. We show that the method achieves performance portability on massively threaded architectures, namely Intel's Knights Landing processors (KNLs) and NVIDIA's Graphics Processing Units (GPUs), by comparing its performance to specialized implementations. Second, we study an important aspect of SpGEMM's usage in practice by reusing the structure of the input matrices, and show speedups of up to 3x compared to the best specialized implementation on KNLs. We demonstrate that the portable method outperforms 4 native methods on 2 different GPU architectures (up to 17x speedup), and that it is highly thread scalable on KNLs, where it obtains a 101x speedup on 256 threads.
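The structure-reuse optimization described above is typically enabled by splitting SpGEMM into a symbolic phase (which computes only the sparsity pattern of C = A * B) and a numeric phase (which fills in the values given that pattern); when the input structures do not change between calls, the symbolic result can be reused. The following is a minimal serial sketch of this two-phase organization in CSR format, assuming hypothetical function names; it is an illustration of the general idea, not the paper's hierarchical, thread-scalable implementation.

```python
def spgemm_symbolic(a_ptr, a_idx, b_ptr, b_idx, n_rows):
    """Symbolic phase: compute the CSR row pointers and column
    indices of C = A * B without touching any numeric values."""
    c_ptr = [0]
    c_idx = []
    for i in range(n_rows):
        cols = set()
        # Union of the column patterns of every row of B selected
        # by the nonzeros in row i of A.
        for jj in range(a_ptr[i], a_ptr[i + 1]):
            j = a_idx[jj]
            for kk in range(b_ptr[j], b_ptr[j + 1]):
                cols.add(b_idx[kk])
        c_idx.extend(sorted(cols))
        c_ptr.append(len(c_idx))
    return c_ptr, c_idx


def spgemm_numeric(a_ptr, a_idx, a_val,
                   b_ptr, b_idx, b_val,
                   c_ptr, c_idx, n_rows):
    """Numeric phase: fill the values of C given its precomputed
    structure. This phase can be called repeatedly for matrices
    with the same sparsity pattern but different values."""
    c_val = [0.0] * len(c_idx)
    for i in range(n_rows):
        # Accumulator restricted to the known columns of row i of C.
        acc = {c: 0.0 for c in c_idx[c_ptr[i]:c_ptr[i + 1]]}
        for jj in range(a_ptr[i], a_ptr[i + 1]):
            j = a_idx[jj]
            av = a_val[jj]
            for kk in range(b_ptr[j], b_ptr[j + 1]):
                acc[b_idx[kk]] += av * b_val[kk]
        for p in range(c_ptr[i], c_ptr[i + 1]):
            c_val[p] = acc[c_idx[p]]
    return c_val
```

For example, amortizing one `spgemm_symbolic` call over many `spgemm_numeric` calls is what produces the reported speedups when the same sparsity structure recurs, as in multigrid setup or iterative solvers.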
Keywords
KNL,GPUs,Sparse Matrix Multiplication,SpGEMM,Performance portability