Parallel matrix multiplication

Nikola Tomikj, Marjan Gusev

2018 41st International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO), 2018

Abstract
Utilizing all available CPU cores for numerical computations is a topic of considerable interest in HPC. This paper analyzes and compares four parallel algorithms for matrix multiplication without block partitioning, implemented using OpenMP. The algorithms are compared on achieved speed, memory bandwidth, and how efficiently they use the cache.
Keywords
algorithm, cache, memory bandwidth, speed, OpenMP, thread, loop order, matrix multiplication, transpose
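
The abstract does not list the four algorithms, so the following is only an illustrative sketch of two non-blocked variants suggested by the keywords "loop order" and "transpose". The function names `matmul_ikj` and `matmul_transposed`, the row-major `double` layout, and the choice to parallelize the outer loop are all assumptions made here, not details taken from the paper.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical variant 1: i-k-j loop order.
 * The innermost loop walks rows of B and C contiguously (row-major),
 * which tends to be friendlier to the cache than the textbook i-j-k order. */
void matmul_ikj(size_t n, const double *a, const double *b, double *c)
{
    /* Each thread owns whole rows of C, so no two threads write the same element. */
    #pragma omp parallel for
    for (long i = 0; i < (long)n; i++) {
        size_t row = (size_t)i * n;
        for (size_t j = 0; j < n; j++)
            c[row + j] = 0.0;
        for (size_t k = 0; k < n; k++) {
            double aik = a[row + k];
            for (size_t j = 0; j < n; j++)
                c[row + j] += aik * b[k * n + j];
        }
    }
}

/* Hypothetical variant 2: transpose B first, then use i-j-k order.
 * After the transpose, both operands of the dot product are read contiguously.
 * Returns 0 on success, -1 if the temporary buffer cannot be allocated. */
int matmul_transposed(size_t n, const double *a, const double *b, double *c)
{
    if (n == 0)
        return 0;

    double *bt = malloc(n * n * sizeof *bt);
    if (bt == NULL)
        return -1;

    /* bt[j][i] = b[i][j] */
    #pragma omp parallel for
    for (long i = 0; i < (long)n; i++)
        for (size_t j = 0; j < n; j++)
            bt[j * n + (size_t)i] = b[(size_t)i * n + j];

    #pragma omp parallel for
    for (long i = 0; i < (long)n; i++) {
        size_t row = (size_t)i * n;
        for (size_t j = 0; j < n; j++) {
            double sum = 0.0;
            for (size_t k = 0; k < n; k++)
                sum += a[row + k] * bt[j * n + k];
            c[row + j] = sum;
        }
    }

    free(bt);
    return 0;
}
```

With GCC or Clang the pragmas take effect when compiling with `-fopenmp`; without that flag they are ignored and the same code runs serially, which gives a convenient baseline when measuring speedup.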