PCatch: Automatically Detecting Performance Cascading Bugs in Cloud Systems

Jiaxin Li, Yuxi Chen, Haopeng Liu, Shan Lu, Yiming Zhang, Haryadi S. Gunawi, Xiaohui Gu, Xicheng Lu, Dongsheng Li

EuroSys '18: Thirteenth EuroSys Conference 2018, Porto, Portugal, April 2018

Abstract

Distributed systems have become the backbone of modern clouds. Users often expect high scalability and performance isolation from distributed systems. Unfortunately, a type of poor software design, which we refer to as performance cascading bugs (PCbugs), can often cause the slowdown of non-scalable code in one job to propagate, causing global performance degradation and even threatening system availability. This paper presents a tool, PCatch, that can automatically predict PCbugs by analyzing system execution under small-scale workloads. PCatch contains three key components in predicting PCbugs. It uses program analysis to identify code regions whose execution time can potentially increase dramatically with the workload size; it adapts the traditional happens-before model to reason about software resource contention and performance dependency relationship; it uses dynamic tracking to identify whether the slowdown propagation is contained in one job or not. Our evaluation using representative distributed systems, Cassandra, Hadoop MapReduce, HBase, and HDFS, shows that PCatch can accurately predict PCbugs based on small-scale workload execution.
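The idea of reasoning about slowdown propagation over a happens-before-style dependency graph can be illustrated with a small sketch. This is not PCatch's implementation: the operation names, the `edges` trace, the `job_of` mapping, and the `cascades` helper are all hypothetical, and the sketch assumes that a delay can propagate along any directed dependency edge (program order or resource contention). It only shows the final check, whether a slow region in one job can reach operations that belong to a different job.

```python
from collections import defaultdict, deque


def reachable(edges, start):
    """Return every node reachable from start along directed edges."""
    graph = defaultdict(list)
    for src, dst in edges:
        graph[src].append(dst)
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def cascades(edges, job_of, slow_region):
    """Return operations in other jobs that may be delayed by slow_region."""
    origin_job = job_of[slow_region]
    return sorted(
        op for op in reachable(edges, slow_region) if job_of[op] != origin_job
    )


# Hypothetical trace: a loop in job A holds a lock that job B's heartbeat needs.
edges = [
    ("A.loop", "A.unlock"),     # program order within job A
    ("A.unlock", "B.lock"),     # contention: B waits for A to release the lock
    ("B.lock", "B.heartbeat"),  # program order within job B
]
job_of = {"A.loop": "A", "A.unlock": "A", "B.lock": "B", "B.heartbeat": "B"}

victims = cascades(edges, job_of, "A.loop")
print(victims)
```

An empty result would mean the slowdown stays inside the job that caused it; a non-empty result flags a candidate cascade across jobs, under the assumptions stated above.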

Keywords

Performance Bugs, Cascading Problems, Distributed Systems, Bug Detection, Cloud Computing
