Intra-Clustering: Accelerating On-chip Communication for Data Parallel Architectures

SBAC-PAD Workshops (2015)

Abstract
Modern computation workloads contain abundant Data Level Parallelism (DLP), which calls for specialized data parallel architectures such as Graphics Processing Units (GPUs). With parallel programming models such as CUDA and OpenCL, GPUs can easily be programmed for non-graphics applications, making them a cost-effective data parallel architecture. The large amount of available parallelism, however, places heavy stress on the memory system, since the limited pin count confines the number of memory controllers on the chip. This creates a potential bottleneck for the performance scalability of GPUs. To accelerate communication with the memory system, we propose the Intra-Clustering on-chip network for data parallel architectures, which is built upon a traditional two-dimensional electrical mesh network, with memory controllers connected through a nanophotonic ring and compute cores grouped into clusters. Our evaluations with CUDA benchmarks show that the Intra-Clustering architecture reduces communication delay by an average of 17% (up to 32%) and improves IPC by an average of 5% (up to 11.5%).
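The abstract describes the topology only at a high level. The following is a minimal back-of-the-envelope sketch (not from the paper) of why routing memory traffic through a per-cluster gateway onto a shared photonic ring can shorten the electrical path compared with crossing the whole mesh. The mesh size, cluster size, gateway placement, and latency constants are all illustrative assumptions, not values reported by the authors.

# Hypothetical hop-count model of the Intra-Clustering idea.
# All parameters below are illustrative assumptions.

MESH_DIM = 8             # assumed 8x8 electrical mesh of compute cores
CLUSTER_DIM = 4          # assumed 4x4 cores per cluster (4 clusters total)
ELECTRICAL_HOP_NS = 1.0  # assumed per-hop latency on the electrical mesh
PHOTONIC_RING_NS = 2.0   # assumed end-to-end latency on the nanophotonic ring

def baseline_mesh_hops(core, mc):
    """XY-routing hop count on a plain 2D mesh from a core to a
    memory controller placed on the chip edge."""
    (cx, cy), (mx, my) = core, mc
    return abs(cx - mx) + abs(cy - my)

def intra_clustering_hops(core):
    """Hops from a core to its cluster's gateway router (assumed at the
    cluster corner); the gateway forwards the request onto the photonic
    ring shared by the memory controllers, so the electrical portion of
    the path stays inside the cluster."""
    cx, cy = core
    gx = (cx // CLUSTER_DIM) * CLUSTER_DIM
    gy = (cy // CLUSTER_DIM) * CLUSTER_DIM
    return abs(cx - gx) + abs(cy - gy)

def compare(core, mc):
    base = baseline_mesh_hops(core, mc) * ELECTRICAL_HOP_NS
    clustered = intra_clustering_hops(core) * ELECTRICAL_HOP_NS + PHOTONIC_RING_NS
    return base, clustered

if __name__ == "__main__":
    core, mc = (6, 5), (0, 0)  # a core far from its memory controller
    base, clustered = compare(core, mc)
    print(f"baseline mesh: {base:.1f} ns, intra-clustering: {clustered:.1f} ns")

Under these assumed numbers, a core at (6, 5) pays 11 electrical hops to reach an edge memory controller on the plain mesh, but only 3 intra-cluster hops plus one ring traversal with Intra-Clustering; the benefit grows with distance from the controller, which is consistent with the averaged delay improvements the abstract reports.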
Keywords
on-chip communication, data parallel architectures, data level parallelism, DLP, graphics processing units, GPU, parallel programming models, OpenCL, non-graphics applications, memory controllers, memory system, intra-clustering on-chip network, two-dimensional electrical mesh network, nanophotonic ring, CUDA benchmarks, intra-clustering architecture, communication delay