Towards Pervasive and User Satisfactory CNN across GPU Microarchitectures
2017 IEEE International Symposium on High Performance Computer Architecture (HPCA)
Abstract
Accelerating Convolutional Neural Networks (CNNs) on GPUs usually involves two stages: training and inference. Traditionally, this two-stage process is deployed on high-end GPU-equipped servers. Driven by the increase in compute power of desktop and mobile GPUs, there is growing interest in performing inference on a variety of platforms. In contrast to the requirements of high throughput and accuracy during the training stage, end-users face diverse requirements for inference tasks. To address this emerging trend and these new requirements, we propose Pervasive CNN (P-CNN), a user satisfaction-aware CNN inference framework. P-CNN is composed of two phases: cross-platform offline compilation and run-time management. Based on the user's requirements, offline compilation generates the optimal kernel using architecture-independent techniques, such as adaptive batch size selection and coordinated fine-tuning. The run-time management phase consists of accuracy tuning, execution, and calibration. First, accuracy tuning dynamically identifies the fastest kernels with acceptable accuracy. Next, the run-time kernel scheduler partitions the optimal computing resources for each layer and schedules the GPU thread blocks. If the resulting accuracy is not acceptable to the end-user, the calibration stage selects a slower but more precise kernel to improve accuracy. Finally, we design a user satisfaction metric for CNNs to evaluate our pervasive design. Our evaluation results show that P-CNN can provide the best user satisfaction for different inference tasks.
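The tune-then-calibrate loop described in the abstract can be illustrated with a minimal sketch. All kernel names, latencies, and accuracy figures below are invented for illustration and are not from the paper; the sketch only shows the selection logic: accuracy tuning picks the fastest kernel meeting a target, and calibration falls back to the cheapest kernel that is strictly more precise.

```python
# Hypothetical sketch of P-CNN-style kernel selection (not the authors' code).
from dataclasses import dataclass

@dataclass
class Kernel:
    name: str
    latency_ms: float   # measured per-layer execution time (illustrative)
    accuracy: float     # validation accuracy with this kernel (illustrative)

def tune(kernels, min_accuracy):
    """Accuracy tuning: fastest kernel whose accuracy meets the user's target."""
    acceptable = [k for k in kernels if k.accuracy >= min_accuracy]
    return min(acceptable, key=lambda k: k.latency_ms) if acceptable else None

def calibrate(kernels, current):
    """Calibration: cheapest kernel strictly more precise than the current one."""
    better = [k for k in kernels if k.accuracy > current.accuracy]
    return min(better, key=lambda k: k.latency_ms) if better else current

kernels = [
    Kernel("fp16_winograd", 1.2, 0.89),
    Kernel("fp16_direct",   1.8, 0.91),
    Kernel("fp32_direct",   3.1, 0.93),
]

chosen = tune(kernels, min_accuracy=0.90)   # -> fp16_direct (fastest acceptable)
# If the end-user rejects the output quality, calibration trades speed for precision:
chosen = calibrate(kernels, chosen)         # -> fp32_direct (slower, more precise)
```

The same two-step structure would apply per layer in the real framework, with the scheduler then partitioning GPU resources among the selected kernels.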
Keywords
user satisfactory CNN, pervasive CNN, GPU microarchitectures, convolutional neural networks, two-stage process, high-end GPU-equipped servers, mobile GPU, P-CNN, user satisfaction-aware CNN inference framework, cross-platform offline compilation, run-time management, user requirements, offline compilation, optimal kernel, architecture-independent techniques, adaptive batch size selection, coordinated fine-tuning, accuracy tuning, run-time kernel scheduler, optimal computing resources, GPU thread blocks, user satisfaction metric, pervasive design