Iterative statistical kernels on contemporary GPUs

INTERNATIONAL JOURNAL OF COMPUTATIONAL SCIENCE AND ENGINEERING (2013)

Abstract
We present a study of OpenCL implementations of three important kernels that occur frequently in iterative statistical applications: multi-dimensional scaling (MDS), PageRank and K-means clustering. We evaluated their performance on NVIDIA Tesla and Fermi GPGPU cards using dedicated hardware and, in the case of Fermi, also on the Amazon EC2 cloud-computing environment. We explored the optimisation of these kernels by four main techniques: (1) caching invariant data in GPU memory across iterations; (2) selectively placing data in different memory levels; (3) rearranging data in memory; (4) dividing the work between the GPU and the CPU. We also implemented a novel algorithm for MDS and a novel data layout scheme for PageRank. Our optimisations resulted in performance improvements of up to 5× to 6× over naïve OpenCL implementations and up to 100× over a single-core CPU. We believe that these categories of optimisations are also applicable to other similar kernels.
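As a rough illustration of technique (1), the sketch below runs a plain PageRank power iteration on the CPU and separates the data that stays fixed across iterations (the link structure and out-degrees) from the data that changes (the rank vector). It is not the paper's OpenCL implementation or its data layout scheme. The function name `pagerank`, the adjacency-list input format, the damping factor of 0.85 and the uniform handling of dangling nodes are all assumptions made for this example.

```python
def pagerank(out_links, damping=0.85, iterations=50):
    """Power-iteration PageRank on a graph given as adjacency lists.

    out_links[i] is the list of nodes that node i links to.
    (Hypothetical helper for illustration; not taken from the paper.)
    """
    n = len(out_links)

    # Invariant data: computed once and reused unchanged in every iteration.
    # In a GPU setting, this is the kind of data that could be uploaded once
    # and kept in device memory instead of being re-sent each iteration.
    out_degree = [len(targets) for targets in out_links]

    # Iteration-varying data: the only state that changes between iterations.
    ranks = [1.0 / n] * n

    for _ in range(iterations):
        # Rank held by dangling nodes (no out-links) is spread uniformly.
        dangling = sum(ranks[i] for i in range(n) if out_degree[i] == 0)
        new_ranks = [(1.0 - damping) / n + damping * dangling / n] * n

        # Each node passes an equal share of its rank along every out-link.
        for src, targets in enumerate(out_links):
            if targets:
                share = damping * ranks[src] / out_degree[src]
                for dst in targets:
                    new_ranks[dst] += share

        ranks = new_ranks

    return ranks
```

Calling `pagerank([[1, 2], [2], [0]])` returns one rank per node, and the ranks sum to 1. Only `ranks` is rewritten inside the loop, which is why keeping the link structure resident on the device can avoid repeated host-to-device transfers.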
Keywords
performance improvement,single-core CPU,different memory level,invariant data,GPU memory,Fermi GPGPU card,Iterative statistical kernel,OpenCL implementation,rearranging data,contemporary GPUs,novel algorithm,novel data layout scheme