Zero Overhead Monitoring for Cloud-native Infrastructure using RDMA

USENIX Annual Technical Conference (USENIX ATC) (2022)

Abstract
Cloud services have recently undergone a major shift from monolithic designs to microservices running on cloud-native infrastructure, where monitoring systems are widely deployed to ensure the service level agreement (SLA). Nevertheless, practical experience at Alibaba Cloud shows that the traditional monitoring system no longer fulfills the demands of cloud-native monitoring. Specifically, the monitor occupies resources (e.g., CPU) of the monitored infrastructure, disturbing the services running on it. For example, enabling the monitor causes jitter and performance declines in online services under high load during Alibaba's "Double Eleven" shopping festival. On the other hand, the quality of service (QoS) of monitoring itself, which is vital to tracking and ensuring the SLA, is not guaranteed on a heavily loaded system. In this paper, we design and implement a novel monitoring system, named ZERO, for cloud-native monitoring. First, ZERO achieves zero overhead when collecting raw metrics from the monitored hosts by using one-sided remote direct memory access (RDMA) operations, thus avoiding any interference with cloud services. Second, ZERO adopts a receiver-driven model to collect monitoring metrics with high QoS, in which credit-based flow control and a hybrid I/O model are proposed to mitigate network congestion/interference and CPU bottlenecks. ZERO has been deployed and evaluated in Alibaba Cloud. Deployment results show that ZERO achieves no CPU occupation at the monitored host and supports 1~10k hosts with a 0.1~1 s sampling interval using a single thread for network I/O.
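The abstract names receiver-driven collection with credit-based flow control but does not describe the mechanism in detail. The sketch below illustrates only the general idea, and it is not ZERO's actual design: a collector holds a fixed number of credits, spends one per outstanding read, and regains it when the read completes, so the number of in-flight reads never exceeds the credit budget. The class `CreditedCollector`, the function `fake_read`, the host names, and the metric fields are all hypothetical, and `fake_read` merely stands in for a one-sided RDMA read.

```python
from collections import deque


class CreditedCollector:
    """Toy receiver-driven collector: the receiver decides when each host is read."""

    def __init__(self, hosts, credits):
        if credits < 1:
            raise ValueError("credits must be at least 1")
        self.pending = deque(hosts)   # hosts not yet sampled in this round
        self.credits = credits        # reads the receiver may still issue
        self.in_flight = deque()      # reads issued but not yet completed
        self.max_in_flight = 0        # high-water mark, kept for inspection
        self.samples = {}

    def issue_reads(self):
        # Spend one credit per read; stop when credits or hosts run out.
        while self.credits > 0 and self.pending:
            host = self.pending.popleft()
            self.credits -= 1
            self.in_flight.append(host)
        self.max_in_flight = max(self.max_in_flight, len(self.in_flight))

    def complete_one(self, read_metric):
        # A completion returns its credit, which allows another read to be issued.
        host = self.in_flight.popleft()
        self.samples[host] = read_metric(host)
        self.credits += 1

    def run_round(self, read_metric):
        # Sample every pending host once, bounded by the credit budget.
        self.issue_reads()
        while self.in_flight:
            self.complete_one(read_metric)
            self.issue_reads()
        return self.samples


def fake_read(host):
    # Hypothetical stand-in for a one-sided RDMA read of a host's metric buffer.
    return {"host": host, "cpu_util": 0.5}


collector = CreditedCollector(hosts=[f"host-{i}" for i in range(10)], credits=3)
samples = collector.run_round(fake_read)
```

In this toy model the monitored hosts never run any code: all pacing decisions live in the collector, which mirrors the receiver-driven framing in the abstract. A real implementation would issue the reads asynchronously over an RDMA queue pair and handle completions out of order.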