Algorithm and Hardware Co-Optimized Solution for Large SpMV Problems

Fazle Sadi, Larry Pileggi, Franz Franchetti

IEEE Conference on High Performance Extreme Computing (2017)


Abstract

Sparse Matrix-Vector multiplication (SpMV) is a fundamental kernel for many scientific and engineering applications. However, SpMV performance and efficiency are poor on commercial off-the-shelf (COTS) architectures, especially when the data size exceeds the on-chip memory or last-level cache (LLC). In this work we present a hardware accelerator co-optimized with the algorithm for large SpMV problems. We begin by examining the fundamental differences in data transfer characteristics among various SpMV algorithms. We propose an algorithm that requires the least amount of data transfer while ensuring that all main memory accesses are streaming. However, the proposed algorithm requires an efficient multi-way merge, which is difficult to achieve with COTS architectures. Hence, we propose a hardware accelerator model that includes an Application Specific Integrated Circuit (ASIC) for the multi-way merge operation. The proposed accelerator incorporates state-of-the-art 3D stacked High Bandwidth Memory (HBM) in order to demonstrate the proposed algorithm's capability when coupled with the latest technologies. Simulation results using standard benchmarks show improvements of over 100× against COTS architectures with commercial libraries for both energy efficiency and performance.
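The abstract does not spell out the algorithm, so the following is only a minimal illustrative sketch of one way a multi-way merge can arise in SpMV. It assumes a column-wise formulation in which each column of the matrix is stored as a row-sorted list of nonzeros; the function name `spmv_by_merge` and the input layout are hypothetical and are not taken from the paper. Each column is scaled by its vector entry, and the resulting sorted streams are merged by row index, summing entries that share a row.

```python
import heapq
from itertools import groupby
from operator import itemgetter


def spmv_by_merge(columns, x):
    """Compute y = A @ x using a multi-way merge (illustrative sketch).

    columns[j] holds the nonzeros of column j as (row, value) pairs,
    sorted by row. This layout is an assumption made for the example.
    Returns (row, value) pairs, sorted by row, for every row that has
    at least one stored nonzero.
    """
    # Step 1: scale each column by the matching entry of x.
    # Every stream stays sorted by row index.
    streams = [
        [(row, val * x[j]) for row, val in col]
        for j, col in enumerate(columns)
    ]
    # Step 2: merge all streams by row index in one sequential pass,
    # then accumulate the partial products that share a row.
    merged = heapq.merge(*streams, key=itemgetter(0))
    return [
        (row, sum(v for _, v in group))
        for row, group in groupby(merged, key=itemgetter(0))
    ]


# A = [[1, 0, 2],
#      [0, 3, 0],
#      [4, 0, 5]]
example_columns = [
    [(0, 1), (2, 4)],  # column 0
    [(1, 3)],          # column 1
    [(0, 2), (2, 5)],  # column 2
]
example_result = spmv_by_merge(example_columns, [1, 2, 3])
```

In this formulation every input stream is read sequentially, which is in the spirit of the streaming access pattern the abstract emphasizes. The cost moves into the merge step, whose fan-in grows with the number of streams; a software heap such as the one behind `heapq.merge` handles this one element at a time, which hints at why the authors turn to dedicated hardware for the merge.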


Keywords

hardware co-optimized solution, data size, co-optimized hardware accelerator, data transfer characteristics, SpMV algorithms, COTS architectures, hardware accelerator model, sparse matrix-vector multiplication, application specific integrated circuit, multi-way merge operation, 3D stacked High Bandwidth Memory, main memory streaming, commercial off-the-shelf architectures
