谷歌浏览器插件
订阅小程序
在清言上使用

OpenBLAS: A high performance BLAS library on Loongson 3A CPU

Ruan Jian Xue Bao/Journal of Software(2011)

引用 0|浏览8
暂无评分
摘要
BLAS is a fundamental math library in scientific computing. Thus, each CPU vendor releases optimized BLAS library for its own CPU. Loongson CPU series are developed by the Institute of Computing Technology, Chinese Academy of Sciences. In 2010, it released Loongson 3 CPU series. This paper introduces the open source BLAS library OpenBLAS, which is forked on GotoBLAS 2-1.13 BSD version. BLAS Level 3 functions of OpenBLAS is optimized on Loongson 3A quad cores CPU. In sequential optimizations, blocking, hand coding assembly kernel, Loongson 3A special instructions and reordering instructions are utilized. The performance of BLAS Level 3 subroutines exceeded GotoBLAS and ATLAS by about 75% and 17%. Meanwhile, it exceeded GotoBLAS and ATLAS by about 103% and 36% in double precision functions. In parallel multi-threads optimization, this study used interleaved data buffer layout to avoid shared L2 Cache conflictions among multi-threads. OpenBLAS achieved 3.47 speedups on quad cores. In 4 threads, the performance of OpenBLAS BLAS Level3 functions exceeded GotoBLAS and ATLAS by about 69% and 34%, 89% and 55% in double precision functions. ©2011 Journal of Software.
更多
查看译文
关键词
BLAS,Loongson 3A,MIPS64,Multicore,Optimization
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要