Optimal probing with statistical guarantees for network monitoring at scale

Computer Communications(2022)

引用 0|浏览20
暂无评分
摘要
Monitoring large-scale cloud networks is a complex task because their scale is prohibitively large, monitoring budgets are limited, network topologies are not regular and the estimates produced are a function of traffic patterns. In this work, we take a statistical approach to estimating a network metric, such as the latency of a set of paths, with guarantees on the estimation error. We aim to do so in an intelligent and scalable manner, without observing all existing traffic, and minimizing the estimation error at a fixed probing budget per unit of time. Our algorithms produce a distribution of probes/samples across network paths which can be used in conjunction with existing probers (or samplers). These algorithms are based on A- and E-optimal experimental designs in statistics, which guarantee a bounded estimation error for any monitoring budget. Unfortunately, these designs are too computationally intensive to be used in production at scale. We propose a scalable and near-optimal approximate implementations based on the Frank–Wolfe algorithm. We validate our approaches with two metrics (latency and loss) in simulations on real network topologies, and also using a production probing system in a real cloud network. We show major gains in reducing the probing budget compared to both production and academic baselines, while maintaining low errors in estimates, even with very low probing budgets.
更多
查看译文
关键词
Telemetry at scale,Cloud communication,Optimal monitoring
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要