
Algorithm and Hardware Co-Optimized Solution for Large SpMV Problems

IEEE Conference on High Performance Extreme Computing (2017)

Abstract
Sparse matrix-vector multiplication (SpMV) is a fundamental kernel for many scientific and engineering applications. However, SpMV performance and efficiency are poor on commercial off-the-shelf (COTS) architectures, especially when the data size exceeds on-chip memory or the last-level cache (LLC). In this work we present an algorithm and hardware co-optimized accelerator for large SpMV problems. We start by exploring the basic differences in data transfer characteristics among various SpMV algorithms. We then propose an algorithm that requires the least amount of data transfer while ensuring that all main memory accesses are streaming accesses. However, the proposed algorithm requires an efficient multi-way merge, which is difficult to achieve with COTS architectures. Hence, we propose a hardware accelerator model that includes an Application Specific Integrated Circuit (ASIC) for the multi-way merge operation. The proposed accelerator incorporates state-of-the-art 3D-stacked High Bandwidth Memory (HBM) in order to demonstrate the proposed algorithm's capability when coupled with the latest technologies. Simulation results using standard benchmarks show improvements of over 100× against COTS architectures with commercial libraries for both energy efficiency and performance.
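The abstract does not spell out the algorithm, so the following is only a minimal software sketch of how an SpMV can be organized around a multi-way merge. It assumes (this is not stated in the abstract) that the matrix is split into column blocks, that each block produces a row-sorted stream of partial products, and that the streams are then merged by row index and accumulated. The function name `spmv_via_merge` and the parameter `block_width` are hypothetical, and Python's `heapq.merge` stands in for the merge step that the paper assigns to an ASIC.

```python
import heapq
from itertools import groupby
from operator import itemgetter


def spmv_via_merge(entries, x, n_rows, block_width):
    """Compute y = A @ x for a sparse matrix given as (row, col, value) triples.

    Illustrative only: the column-block partitioning is an assumption,
    not the scheme described in the paper.
    """
    # Step 1: group partial products by column block, so each block
    # reads only a contiguous slice of x.
    blocks = {}
    for row, col, val in entries:
        blocks.setdefault(col // block_width, []).append((row, val * x[col]))

    # Each block becomes one stream of (row, partial_product) sorted by row.
    streams = [sorted(block, key=itemgetter(0)) for block in blocks.values()]

    # Step 2: multi-way merge of the sorted streams by row index, then
    # accumulate all partial products that share a row.
    y = [0.0] * n_rows
    merged = heapq.merge(*streams, key=itemgetter(0))
    for row, group in groupby(merged, key=itemgetter(0)):
        y[row] = sum(partial for _, partial in group)
    return y


# 3x3 example: A = [[1, 0, 2], [0, 3, 0], [4, 0, 5]], x = [1, 2, 3]
entries = [(0, 0, 1.0), (0, 2, 2.0), (1, 1, 3.0), (2, 0, 4.0), (2, 2, 5.0)]
result = spmv_via_merge(entries, [1.0, 2.0, 3.0], n_rows=3, block_width=2)
print(result)  # [7.0, 6.0, 19.0]
```

In this sketch every stream is read and written sequentially, which is the property the abstract refers to as main memory streaming; the cost moves into the merge step, which is why an efficient multi-way merge matters.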
Keywords
hardware co-optimized solution, data size, co-optimized hardware accelerator, data transfer characteristics, SpMV algorithms, COTS architectures, hardware accelerator model, sparse matrix-vector multiplication, application specific integrated circuit, multi-way merge operation, 3D stacked High Bandwidth Memory, main memory streaming, commercial off-the-shelf architectures