Automatic Multi-Parameter Performance Modeling of HPC Applications on a New Sunway Supercomputer

Yilian Zhang, Yao Liu, Penglong Jiao, Yiping Zhou, Tongquan Wei

IEEE Transactions on Parallel and Distributed Systems (2023)

Abstract
As the successor to Sunway TaihuLight, the new Sunway supercomputer has ultra-high computing capacity, but the unique heterogeneous architecture presents performance optimization challenges for High Performance Computing (HPC) applications. Performance modeling is an effective way to discover the performance bottlenecks and then improve the performance of HPC applications. Existing performance modeling techniques do not work well on large-scale HPC applications due to high overhead and low accuracy, and are not suitable for the heterogeneous architecture due to a lack of support for multi-resource parameters. To address the above challenges, we propose an automatic multi-parameter performance modeling method for HPC applications on the new Sunway supercomputer. First, a lightweight performance profiling method is proposed to achieve low overhead performance profiling. Then, performance models with multiple resource parameters based on the Fourier neural operator are built, achieving high prediction accuracy and generalization ability. Finally, the Fourier neural operator is extended on the new Sunway supercomputer to realize the performance modeling automatically. Experimental results show that the average prediction error is less than 10% and the average overhead is less than 4%, and the results are superior to the baselines.
Keywords
new Sunway supercomputer, HPC applications, performance, multi-parameter
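
The abstract names the Fourier neural operator as the basis of the performance models but does not describe its internals. As a rough illustration only, the core building block of a Fourier neural operator is a spectral convolution: transform the input to the frequency domain, scale a fixed number of low-frequency modes by weights, truncate the rest, and transform back. The sketch below is not the authors' model. The function name `spectral_conv_1d`, the 1-D setting, and the fixed (untrained) weights are all assumptions made for illustration.

```python
import numpy as np

def spectral_conv_1d(x, weights):
    """Hypothetical helper: scale the lowest Fourier modes of a real 1-D signal."""
    modes = weights.shape[0]                     # number of modes kept
    x_hat = np.fft.rfft(x)                       # to the frequency domain
    out_hat = np.zeros_like(x_hat)               # modes above the cutoff stay zero
    out_hat[:modes] = x_hat[:modes] * weights    # per-mode weights (learned in a real FNO)
    return np.fft.irfft(out_hat, n=x.shape[0])   # back to the original grid

# A signal with mean 3.0 plus one full sine period on 16 points.
x = np.sin(2 * np.pi * np.arange(16) / 16) + 3.0

# Keeping all 9 modes with unit weights reproduces the input.
identity = spectral_conv_1d(x, np.ones(9, dtype=complex))

# Keeping only the zero-frequency mode leaves the mean of the signal.
dc_only = spectral_conv_1d(x, np.ones(1, dtype=complex))
```

In a trained operator the weights would be complex parameters fitted to data, and several such layers would be stacked with pointwise nonlinearities; none of that is shown here.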