Murphy: Performance Diagnosis of Distributed Cloud Applications

PROCEEDINGS OF THE 2023 ACM SIGCOMM 2023 CONFERENCE, SIGCOMM 2023(2023)

引用 1|浏览13
暂无评分
摘要
Modern cloud-based applications have complex inter-dependencies on both distributed application components as well as network infrastructure, making it difficult to reason about their performance. As a result, a rich body of work seeks to automate performance diagnosis of enterprise networks and such cloud applications. However, existing methods either ignore inter-dependencies which results in poor accuracy, or require causal acyclic dependencies which cannot model common enterprise environments. We describe the design and implementation of Murphy, an automated performance diagnosis system, that can work with commonly available telemetry in practical enterprise environments, while achieving high accuracy. Murphy utilizes loosely-defined associations between entities obtained from commonly available monitoring data. Its learning algorithm is based on a Markov Random Field (MRF) that can take advantage of such loose associations to reason about how entities affect each other in the context of a specific incident. We evaluate Murphy in an emulated microservice environment and in real incidents from a large enterprise. Compared to past work, Murphy is able to reduce diagnosis error by approximate to 1.35x in restrictive environments supported by past work, and by >= 4.7x in more general environments.
更多
查看译文
关键词
performance diagnosis,cyclic dependencies,enterprise networks,microservices
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要