FIRED: a fine-grained robust performance diagnosis framework for cloud applications
arxiv(2022)
摘要
To run a cloud application with the required service quality, operators have
to continuously monitor the cloud application's run-time status, detect
potential performance anomalies, and diagnose the root causes of anomalies.
However, existing models of performance anomaly detection often suffer from low
re-usability and robustness due to the diversity of system-level metrics being
monitored and the lack of high-quality labeled monitoring data for anomalies.
Moreover, the current coarse-grained analysis models make it difficult to
locate system-level root causes of the application performance anomalies for
effective adaptation decisions. We provide a FIne-grained Robust pErformance
Diagnosis (FIRED) framework to tackle those challenges. The framework offers an
ensemble of several well-selected base models for anomaly detection using a
deep neural network, which adopts weakly-supervised learning considering fewer
labels exist in reality. The framework also employs a real-time fine-grained
analysis model to locate dependent system metrics of the anomaly. Our
experiments show that the framework can achieve the best detection accuracy and
algorithm robustness, and it can predict anomalies in four minutes with F1
score higher than 0.8. In addition, the framework can accurately localize the
first root causes, and with an average accuracy higher than 0.7 of locating
first four root causes.
更多查看译文
关键词
robust performance diagnosis framework,cloud
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要