Characterizing the Microarchitectural Implications of a Convolutional Neural Network (CNN) Execution on GPUs.

ICPE 2018

Abstract
GPUs have become a very popular platform for accelerating the processing involved in deep learning applications. One popular class of deep learning models, Convolutional Neural Networks (CNNs), has been widely deployed on GPUs. In many application settings, a GPU has sufficient computing power and memory space to accommodate the dense matrix operations performed during CNN training. However, few characterization studies have considered how CNNs impact microarchitectural structures in a GPU. In this paper, we characterize a selected CNN workload running on two NVIDIA GPUs from distinct microarchitecture families, highlighting the impact that microarchitecture plays on this important class of workloads. First, we analyze the performance implications of a CNN model using microarchitectural details on a layer-by-layer basis, and characterize the memory access behavior in the context of a typical GPU memory hierarchy, considering the hardware resource utilization associated with each primitive in the CNN model. We identify major bottlenecks by considering the potential limits of using a single GPU. Additionally, we evaluate a number of optimization approaches, such as L1 cache bypassing and kernel fusion. L1 cache bypassing can achieve up to a 6.2% speedup for a single layer, but manipulating the L1 cache provides very limited benefits in terms of application speedup, while kernel fusion provides an overall application speedup of 4.0% on average.
Keywords
Convolutional Neural Networks, GPU, Characterization, Performance analysis