CoFI: consistency-guided fault injection for cloud systems

ASE(2020)

Abstract
Network partitions are inevitable in large-scale cloud systems. Despite developers' efforts to handle network partitions throughout the design, implementation, and testing of cloud systems, bugs caused by network partitions, i.e., partition bugs, still exist and cause severe failures in production clusters. It is challenging to expose these partition bugs because they often require network partitions to start and stop at specific times. In this paper, we propose Consistency-Guided Fault Injection (CoFI), a novel technique that systematically injects network partitions to effectively expose partition bugs. We observe that network partitions can leave cloud systems in inconsistent states, where partition bugs are more likely to occur. Based on this observation, CoFI first infers invariants (i.e., consistent states) among different nodes in a cloud system. Once it detects violations of the inferred invariants (i.e., inconsistent states) while running the cloud system, CoFI injects network partitions to prevent the cloud system from recovering to a consistent state, and thoroughly tests whether the cloud system still proceeds correctly in inconsistent states. We have applied CoFI to three widely deployed cloud systems, i.e., Cassandra, HDFS, and YARN. CoFI has detected 12 previously unknown bugs, and four of them have been confirmed by developers.
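The workflow described in the abstract (check an invariant, and on a violation inject a partition so the system cannot recover) can be illustrated with a toy model. The following is only a minimal sketch under stated assumptions, not CoFI's implementation: `ToyCluster`, `replicas_agree`, and `run_guided_injection` are hypothetical names, the cluster is a two-replica in-memory store, and the invariant is hard-coded here, whereas CoFI infers its invariants from the running system.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    store: dict = field(default_factory=dict)


class ToyCluster:
    """Hypothetical two-replica store with asynchronous replication."""

    def __init__(self):
        self.nodes = {"a": Node("a"), "b": Node("b")}
        self.pending = []         # replication messages in flight: (dst, key, value)
        self.partitioned = False  # when True, in-flight messages are dropped

    def write(self, node, key, value):
        # Apply the write locally and queue replication to the other replica.
        self.nodes[node].store[key] = value
        for other in self.nodes:
            if other != node:
                self.pending.append((other, key, value))

    def deliver(self):
        # Deliver queued messages; an active partition loses them instead.
        messages, self.pending = self.pending, []
        if self.partitioned:
            return
        for dst, key, value in messages:
            self.nodes[dst].store[key] = value

    def read(self, node, key):
        return self.nodes[node].store.get(key)


def replicas_agree(cluster, key):
    # Assumed invariant: every replica holds the same value for the key.
    values = {n.store.get(key) for n in cluster.nodes.values()}
    return len(values) == 1


def run_guided_injection(cluster, key, value):
    """Write on node 'a'; if the invariant is violated, partition before sync."""
    cluster.write("a", key, value)
    injected = False
    if not replicas_agree(cluster, key):
        cluster.partitioned = True  # hold the system in the inconsistent state
        injected = True
    cluster.deliver()
    # Probe the system while it is inconsistent: does node 'b' serve stale data?
    stale_read = cluster.read("b", key) != value
    cluster.partitioned = False
    return injected, stale_read
```

Calling `run_guided_injection(ToyCluster(), "k", 1)` makes the partition coincide with the window in which the replicas disagree, which is the timing the abstract identifies as hard to hit by chance. A real tool would follow this with checks on whether the system still behaves correctly, rather than the single stale-read probe used here.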
Keywords
Cloud system, network partition, fault injection, testing