Detecting Performance Variance for Parallel Applications Without Source Code

J Zhai, L Zheng, F Zhang, X Tang, H Wang, T Yu, Y Jin, Sl Song, W Chen

IEEE Transactions on Parallel and Distributed Systems (2022)

Abstract

For parallel applications, performance variance is a critical issue that can degrade performance and make applications’ behavior difficult to explain. Therefore, users and application developers should be able to detect and diagnose performance variance. Previous detection methods either introduce too much overhead and slow down applications, or rely on nontrivial source code analysis, which is impractical for production-run parallel systems. In this article, we propose Vapro, a framework for detecting and diagnosing performance variance in production-run parallel systems. Our method is based on the observation that most parallel programs contain code snippets that are executed repeatedly with a fixed workload and can be used to detect performance variance. We present the State Transition Graph (STG) to track program execution and then perform lightweight workload analysis on the STG to locate performance variance. Vapro is able to identify these snippets at runtime even without program source code. To diagnose the detected variance, Vapro uses a progressive diagnosis method based on a hybrid model that combines variance breakdown and statistical analysis. According to the evaluation results, Vapro's performance overhead is only 1.38% on average. Vapro can identify performance variance in real applications caused by hardware issues, such as memory and I/O problems. The standard deviation of the execution time decreases by up to 73.5% when the identified variance is fixed. Vapro achieves 30.0% larger detection coverage than the state-of-the-art variance detection approach based on source code analysis.
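
The core observation in the abstract, that snippets repeated with a fixed workload should take roughly the same time, can be illustrated with a minimal sketch. This is not Vapro's implementation: the record layout, the function names `relative_slowdowns` and `flag_variance`, the use of a single integer as the workload measure, and the 1.2 threshold are all assumptions made for illustration. Each record stands for one execution of a snippet, identified by a hypothetical STG edge (a pair of call-site labels).

```python
from collections import defaultdict

def relative_slowdowns(records):
    """Compare each snippet execution with the fastest one in its group.

    records: iterable of (edge, workload, seconds) tuples, where `edge` is a
    hypothetical STG edge label and `workload` is an assumed integer measure.
    Returns a list of (edge, workload, seconds, slowdown) tuples.
    """
    records = list(records)
    groups = defaultdict(list)
    # Executions on the same edge with the same workload are comparable.
    for edge, workload, seconds in records:
        groups[(edge, workload)].append(seconds)
    result = []
    for edge, workload, seconds in records:
        fastest = min(groups[(edge, workload)])
        result.append((edge, workload, seconds, seconds / fastest))
    return result

def flag_variance(records, threshold=1.2):
    """Return executions slower than `threshold` times their group's fastest run.

    The threshold value is an arbitrary assumption, not taken from the paper.
    """
    return [r for r in relative_slowdowns(records) if r[3] > threshold]

# Hypothetical sample data: MPI call names are used only as edge labels.
sample = [
    (("MPI_Send", "MPI_Recv"), 1000, 0.10),
    (("MPI_Send", "MPI_Recv"), 1000, 0.11),
    (("MPI_Send", "MPI_Recv"), 1000, 0.25),
    (("MPI_Recv", "MPI_Barrier"), 500, 0.05),
    (("MPI_Recv", "MPI_Barrier"), 500, 0.05),
]
flagged = flag_variance(sample)
```

In this sample, only the third execution is flagged, because it ran about 2.5 times slower than the fastest execution with the same edge and workload. Grouping by workload first is what lets a slowdown be attributed to the environment rather than to the program doing more work.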

Keywords

Performance variance, anomaly detection, system noise, parallel computing
