Parallel matrix multiplication
2018 41st International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO), 2018
Abstract
Using all available CPU cores for numerical computation is a topic of considerable interest in HPC. This paper analyzes and compares four parallel algorithms for matrix multiplication without block partitioning, implemented with OpenMP. The algorithms are compared on achieved speed, memory bandwidth, and efficiency of cache use.
Keywords
algorithm,cache,memory bandwidth,speed,OpenMP,thread,loop order,matrix multiplication,transpose
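
The abstract does not list the four algorithms it compares. As a rough illustration of what the keywords "loop order" and "transpose" usually refer to, here is a minimal C sketch of three non-blocked variants parallelized over rows with OpenMP. The function names, the row-major layout, and the choice of variants are assumptions made for illustration and are not taken from the paper.

```c
#include <stddef.h>
#include <stdlib.h>

/* All matrices are n x n, row-major (an assumption of this sketch). */

/* i-j-k order: the inner loop walks B down a column (stride n),
 * which tends to use the cache poorly for large n. */
void matmul_ijk(int n, const double *a, const double *b, double *c)
{
    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < n; j++) {
            double sum = 0.0;
            for (int k = 0; k < n; k++)
                sum += a[(size_t)i * n + k] * b[(size_t)k * n + j];
            c[(size_t)i * n + j] = sum;
        }
    }
}

/* i-k-j order: the inner loop walks a row of B and a row of C
 * contiguously (stride 1). */
void matmul_ikj(int n, const double *a, const double *b, double *c)
{
    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        double *ci = c + (size_t)i * n;
        for (int j = 0; j < n; j++)
            ci[j] = 0.0;
        for (int k = 0; k < n; k++) {
            double aik = a[(size_t)i * n + k];
            const double *bk = b + (size_t)k * n;
            for (int j = 0; j < n; j++)
                ci[j] += aik * bk[j];
        }
    }
}

/* Transpose variant: copy B into Bt first so the i-j-k inner loop
 * reads both operands with stride 1. Returns 0 on success, -1 if
 * the temporary buffer cannot be allocated. */
int matmul_transposed(int n, const double *a, const double *b, double *c)
{
    double *bt = malloc((size_t)n * n * sizeof *bt);
    if (bt == NULL)
        return -1;

    #pragma omp parallel for
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++)
            bt[(size_t)j * n + i] = b[(size_t)i * n + j];

    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < n; j++) {
            double sum = 0.0;
            for (int k = 0; k < n; k++)
                sum += a[(size_t)i * n + k] * bt[(size_t)j * n + k];
            c[(size_t)i * n + j] = sum;
        }
    }

    free(bt);
    return 0;
}
```

Each thread writes a disjoint set of rows of C, so no synchronization is needed inside the loops. With GCC or Clang, compile with `-fopenmp` to enable the pragmas; without that flag they are ignored and the same code runs serially.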