PAQSIM: Fast Performance Model for Graphics Workload on Mobile GPUs

Xiang Gong,Chunling Hu,Chu-Cheow Lim

LCTES '20: 21st ACM SIGPLAN/SIGBED Conference on Languages, Compilers, and Tools for Embedded Systems London United Kingdom June, 2020（2020）

引用 1|浏览52

暂无评分

摘要

As the popularity of GPU in embedded systems keeps increasing, there is a growing demand for performance models for rapid estimation and tuning. One major challenge of developing a GPU performance model is the balance between accuracy and speed. The analytical model and the architectural model, two prevailing performance models, both have their weaknesses. The analytical model is fast to execute and simple to implement but usually suffers from low simulation accuracy. On the other hand, the cycle-level architectural model can offer high accuracy, but often at the expense of the execution time. In this work, we present a hybrid performance model for core-level performance studies. Our model takes advantage of the speed of the analytical model and the accuracy of the cycle-level architectural model. We model the resource contention as in traditional architectural models but reduce the pipeline stages when no contention is expected. The graphics workloads have shown uniform characteristics, which allows us to replace some detailed simulation with analytical models for latency estimation in key events such as memory accesses, texture fetches, and synchronizations. Such design greatly reduces the simulation time while maintains decent simulation accuracy. We evaluate our performance model against commercial mobile GPUs. The experiments using graphics workloads from popular games show great simulation speed and high accuracy in predicting the GPU performance. For simulations using the aggressive mode, the simulator can achieve an average 4.1x slowdown, with an average error rate at 6% and the peak error rate at 27.9%.

查看译文

关键词

GPU, Graphics, SoC, Simulation, Performance Model

AI 理解论文

溯源树

样例

生成溯源树，研究论文发展脉络

Chat Paper

正在生成论文摘要