Root Cause Analysis for Cloud-Native Applications

Bartosz Zurkowski,Krzysztof Zielinski

IEEE TRANSACTIONS ON CLOUD COMPUTING(2024)

引用 0|浏览0
暂无评分
摘要
Root cause analysis (RCA) is a critical component in maintaining the reliability and performance of modern cloud applications. However, due to the inherent complexity of cloud environments, traditional RCA techniques become insufficient in supporting system administrators in daily incident response routines. This article presents an RCA solution specifically designed for cloud applications, capable of pinpointing failure root causes and recreating complete fault trajectories from the root cause to the effect. The novelty of our approach lies in approximating causal symptom dependencies by synergizing several symptom correlation methods that assess symptoms in terms of structural, semantic, and temporal aspects. The solution integrates statistical methods with system structure and behavior mining, offering a more comprehensive analysis than existing techniques. Based on these concepts, in this work, we provide definitions and construction algorithms for RCA model structures used in the inference, propose a symptom correlation framework encompassing essential elements of symptom data analysis, and provide a detailed description of the elaborated root cause identification process. Functional evaluation on a live microservice-based system demonstrates the effectiveness of our approach in identifying root causes of complex failures across multiple cloud layers.
更多
查看译文
关键词
Cloud computing,Correlation,Semantics,Behavioral sciences,Trajectory,Measurement,Inference algorithms,Root cause analysis,event correlation,knowledge mining,cloud-native applications
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要