Transparent Checkpoint-Restart of Distributed Applications on Commodity Clusters

Burlington, MA (2005)

Abstract
We have created ZapC, a novel system for transparent coordinated checkpoint-restart of distributed network applications on commodity clusters. ZapC provides a thin virtualization layer on top of the operating system that decouples a distributed application from dependencies on the cluster nodes on which it is executing. This decoupling enables ZapC to checkpoint an entire distributed application across all nodes in a coordinated manner such that it can be restarted from the checkpoint on a different set of cluster nodes at a later time. ZapC checkpoint-restart operations execute in parallel across different cluster nodes, providing faster checkpoint-restart performance. ZapC uniquely supports network state in a transport-protocol-independent manner, including correctly saving and restoring socket and protocol state for both TCP and UDP connections. We have implemented a ZapC Linux prototype and demonstrate that it provides low virtualization overhead and fast checkpoint-restart times for distributed network applications without any application, library, kernel, or network protocol modifications.
Keywords
Linux, checkpointing, distributed processing, ZapC Linux prototype, ZapC checkpoint-restart operations, cluster nodes, commodity clusters, distributed application, distributed applications, distributed network applications, operating system, transparent checkpoint-restart, transparent coordinated checkpoint-restart, transport protocol