Extending PluTo for Multiple Devices by Integrating OpenACC

2018 26th Euromicro International Conference on Parallel, Distributed and Network-based Processing (PDP), 2018

Abstract
For many years now, processor vendors have increased the performance of their devices by adding more cores and wider vectorization units to their CPUs instead of raising the processors' clock frequency. Moreover, GPUs have become popular for solving problems that need even more parallel compute power. Exploiting the full potential of modern compute devices requires specialized code, which is often written in a hardware-specific manner: code written for CPUs is usually not usable on GPUs, and vice versa. The OpenACC programming API tries to close this gap by allowing a single code base to run on, and be optimized for, many devices. Nevertheless, OpenACC is rarely used by "standard programmers", and while code transformers such as PluTo allow (semi-)automatic parallelization for multi-core CPUs, they generally do not support OpenACC yet. We present first promising results of our PluTo extension, which generates parallelized code using OpenACC. With our transformer, we create programs that exploit the parallelism of different platforms without any manual modifications, achieving speedups of up to 100x compared to the original unoptimized programs and speedups of 2.05x compared to OpenMP code generated in the same way.
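To make the "one code base for many devices" idea concrete, here is a minimal hand-written sketch. It is not PluTo's actual output. The function name `jacobi_1d_step` and the 3-point stencil are hypothetical choices. The same loop carries an OpenACC directive for accelerators and an OpenMP directive for multi-core CPUs. The standard predefined macros `_OPENACC` and `_OPENMP` select which directive the compiler sees, and without either model enabled the loop simply runs serially.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example: one Jacobi-style 1D smoothing step.
 * The same source can be built for a GPU (OpenACC), a multi-core CPU
 * (OpenMP), or serially; the compiler defines _OPENACC or _OPENMP when
 * the respective model is enabled (e.g. -fopenacc / -fopenmp in GCC). */
void jacobi_1d_step(const double *in, double *out, size_t n)
{
    assert(n >= 2); /* need at least the two boundary points */

    /* Boundary points are kept fixed and set on the host. */
    out[0] = in[0];
    out[n - 1] = in[n - 1];

#if defined(_OPENACC)
    /* Offload the interior loop: copy `in` to the device, and copy `out`
     * in both directions so the host-set boundary values survive. */
    #pragma acc parallel loop copyin(in[0:n]) copy(out[0:n])
#elif defined(_OPENMP)
    /* Distribute the interior loop across CPU threads. */
    #pragma omp parallel for
#endif
    for (size_t i = 1; i < n - 1; ++i) {
        /* Each interior point becomes the mean of itself and its neighbours;
         * iterations are independent, so the loop is safely parallel. */
        out[i] = (in[i - 1] + in[i] + in[i + 1]) / 3.0;
    }
}
```

Selecting the directive with the preprocessor keeps a single source file, which mirrors the portability argument above. Whether a transformer emits such guards or one directive per build target is a design choice that this sketch does not claim to reproduce.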
Keywords
parallelization, compiler, multi-core, GPU