Iterative statistical kernels on contemporary GPUs

Thilina Gunarathne, Bimalee Salpitikorala, Arun Chauhan, Geoffrey Fox

International Journal of Computational Science and Engineering (2013)

Abstract

We present a study of OpenCL implementations of three important kernels that occur frequently in iterative statistical applications: multi-dimensional scaling (MDS), PageRank and K-means clustering. We evaluated their performance on NVIDIA Tesla and Fermi GPGPU cards using dedicated hardware and, in the case of Fermi, also on the Amazon EC2 cloud-computing environment. We explored the optimisation of these kernels using four main techniques: (1) caching invariant data in GPU memory across iterations; (2) selectively placing data in different memory levels; (3) rearranging data in memory; and (4) dividing the work between the GPU and the CPU. We also implemented a novel algorithm for MDS and a novel data layout scheme for PageRank. Our optimisations resulted in performance improvements of up to 5× to 6× compared to naïve OpenCL implementations, and of up to 100× over a single-core CPU. We believe that these categories of optimisations are also applicable to other similar kernels.
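To make the first technique concrete, the following is a minimal CPU-only sketch in plain Python, not the authors' OpenCL code. It shows why iterative kernels such as PageRank benefit from caching invariant data: the link structure is built once, and only the rank vector changes from one iteration to the next. The function name `pagerank`, its parameters, the damping value of 0.85 and the dictionary-based graph format are all assumptions made for illustration, and every link target is assumed to appear as a key in the input.

```python
def pagerank(out_links, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict {node: [targets]} (illustrative only)."""
    nodes = sorted(out_links)
    n = len(nodes)
    index = {node: i for i, node in enumerate(nodes)}

    # Invariant data: built once and reused unchanged in every iteration.
    # In a GPU setting, this is the data one would copy to device memory once.
    in_links = [[] for _ in range(n)]
    out_degree = [0] * n
    for src, targets in out_links.items():
        out_degree[index[src]] = len(targets)
        for dst in targets:
            in_links[index[dst]].append(index[src])

    # Variant data: the rank vector is the only state updated per iteration.
    ranks = [1.0 / n] * n
    for _ in range(iterations):
        # Rank held by dangling nodes (no out-links) is spread uniformly.
        dangling = sum(ranks[i] for i in range(n) if out_degree[i] == 0)
        base = (1.0 - damping) / n + damping * dangling / n
        ranks = [
            base + damping * sum(ranks[j] / out_degree[j] for j in in_links[i])
            for i in range(n)
        ]
    return dict(zip(nodes, ranks))


if __name__ == "__main__":
    graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
    for node, rank in pagerank(graph).items():
        print(node, round(rank, 4))
```

The returned ranks sum to 1. In this sketch only `ranks` would need to move between host and device on each iteration, while `in_links` and `out_degree` could stay resident; the memory layout of those structures is a separate question that the sketch does not address.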

Keywords

performance improvement, single-core CPU, different memory level, invariant data, GPU memory, Fermi GPGPU card, iterative statistical kernel, OpenCL implementation, rearranging data, contemporary GPUs, novel algorithm, novel data layout scheme
