BTO: Block and Thread Optimization of GPU Kernels on Geophysical Exploration

Brenda S. Schussler, Pedro H. C. Rigon, Arthur Francisco Lorenzon, Philippe O. A. Navaux

International Euromicro Conference on Parallel, Distributed and Network-Based Processing (2024)

Abstract
The pursuit of performance and energy efficiency for geophysical exploration applications on high-performance computing (HPC) servers has been driving the optimization of hardware resource usage in graphics processing units (GPUs). On such architectures, the execution configuration of each kernel (e.g., the number of blocks and threads per block) plays an essential role in the performance and energy consumption of these applications. However, as we show in this paper, the number of possible combinations of blocks and threads per block is so large that relying solely on the software developer to define the execution configuration of every GPU kernel does not lead to ideal use of GPU hardware resources, resulting in performance loss and increased energy consumption. To tackle this challenge, we propose BTO, a block and thread optimization strategy driven by a genetic algorithm. It cooperatively optimizes the number of blocks and threads per block for every GPU kernel at runtime with minimal convergence overhead, regardless of the number of GPUs available in the system. When employing BTO to optimize the Fletcher modeling, a representative geophysical exploration application, on different AMD and NVIDIA GPUs, we show that BTO improves the energy-delay product (EDP, the tradeoff between performance and energy) by up to 83.8% and 81.9% over the standard execution of Fletcher and the default execution of GPU applications on the target architectures, respectively. Moreover, by comparing it to an exhaustive search, we show that BTO converges to optimal execution configurations in 96.4% of all evaluated scenarios.
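The abstract does not describe BTO's genome encoding, operators, or parameters, so the following Python snippet is only a minimal sketch of the general idea: a genetic search over (blocks, threads per block) that minimizes EDP, where EDP = energy x delay = (power x time) x time. Everything concrete here is an assumption for illustration, not a detail from the paper: the synthetic cost model `measure_edp`, the search bounds, tournament selection, uniform crossover, the mutation rate, elitism, and seeding the initial population with the developer's default configuration.

```python
import random
from functools import lru_cache

# Hypothetical search space (assumption, not from the paper):
# threads per block are multiples of 32 up to 1024; blocks range 1..4096.
THREAD_CHOICES = list(range(32, 1025, 32))
BLOCK_CHOICES = list(range(1, 4097))
WORK_ITEMS = 1 << 24  # assumed number of grid points the kernel processes


@lru_cache(maxsize=None)
def measure_edp(blocks: int, threads: int) -> float:
    """Synthetic stand-in for launching a kernel and reading time and energy.

    EDP = energy * delay = (power * time) * time. The cost model is invented
    for illustration only; results are cached so each configuration is
    "measured" once.
    """
    total = blocks * threads
    waves = -(-WORK_ITEMS // total)  # ceiling division: passes over the data
    # Small blocks pay a scheduling penalty; every block adds launch overhead.
    time = waves * (1.0 + 256.0 / threads) + 0.01 * blocks
    power = 50.0 + 0.002 * min(total, 1 << 16)  # more active threads, more power
    return power * time * time


def random_config(rng):
    return (rng.choice(BLOCK_CHOICES), rng.choice(THREAD_CHOICES))


def tournament(pop, rng, k=3):
    # Pick k random individuals and keep the one with the lowest EDP.
    return min(rng.sample(pop, k), key=lambda c: measure_edp(*c))


def crossover(a, b, rng):
    # Uniform crossover: each gene (blocks, threads) comes from either parent.
    return (a[0] if rng.random() < 0.5 else b[0],
            a[1] if rng.random() < 0.5 else b[1])


def mutate(c, rng, rate=0.2):
    # With probability `rate`, replace a gene with a random legal value.
    blocks, threads = c
    if rng.random() < rate:
        blocks = rng.choice(BLOCK_CHOICES)
    if rng.random() < rate:
        threads = rng.choice(THREAD_CHOICES)
    return (blocks, threads)


def ga_search(default, pop_size=20, generations=30, seed=0):
    """Minimize EDP over (blocks, threads); the default seeds the population."""
    rng = random.Random(seed)
    pop = [default] + [random_config(rng) for _ in range(pop_size - 1)]
    best = min(pop, key=lambda c: measure_edp(*c))
    for _ in range(generations):
        nxt = [best]  # elitism: the best configuration so far always survives
        while len(nxt) < pop_size:
            child = crossover(tournament(pop, rng), tournament(pop, rng), rng)
            nxt.append(mutate(child, rng))
        pop = nxt
        best = min(pop, key=lambda c: measure_edp(*c))
    return best, measure_edp(*best)


if __name__ == "__main__":
    default = (4096, 256)  # hypothetical developer-chosen launch configuration
    best, edp = ga_search(default)
    print(f"default EDP={measure_edp(*default):.1f}  GA best={best} EDP={edp:.1f}")
```

Seeding the population with the developer's default and always carrying the best individual forward guarantees that, under this model, the returned configuration is never worse than the default. On real hardware, `measure_edp` would have to be replaced by timed kernel launches combined with energy readings from the GPU, which is where a real tuner's convergence overhead would come from.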