GNSGA: A Decentralized Data Replication Algorithm for Big Science Data.

IFIP Networking (2023)

Abstract
Domain science applications in fields such as genomics and high-energy particle physics use geographically distributed data federations to publish and access datasets. Data is typically replicated among federation nodes to improve efficiency and fault tolerance. While replication strategies are well documented for distributed databases (e.g., Apache Cassandra), replication among distributed data storage nodes is often ad hoc. Replication over wide area networks can also require global coordination (or global shared state), which is not ideal when multiple organizations are involved. In this paper, we introduce GNSGA (Greedy Non-dominated Sorting Genetic Algorithm II), an optimization algorithm that combines a greedy strategy with the Non-dominated Sorting Genetic Algorithm II (NSGA-II) to solve multi-objective optimization problems. The greedy component governs the selection of nodes, while NSGA-II is a fast non-dominated multi-objective optimization algorithm with an elite retention strategy. Replication decisions in GNSGA are based on the local properties and resource availability of the data storage nodes. By combining the greedy strategy with NSGA-II, GNSGA optimizes multiple conflicting objectives to satisfy replica placement constraints such as cost, time, and storage capacity. We compared GNSGA with popular replica placement strategies: closest node replication, shortest transfer time, and a Particle Swarm Optimization (PSO)-based replication algorithm. For evaluation, we performed simulations and an actual deployment on the NSF's FABRIC testbed. The results demonstrate that GNSGA consistently selects nodes that reduce replication time by 5.8%-15.4% while satisfying replication constraints (i.e., cost, time, and storage). We also show that GNSGA is beneficial for replicating large files over wide area networks.
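The abstract gives no pseudocode, so the following is only a minimal illustrative sketch of the two ideas it names: non-dominated (Pareto) filtering as used in NSGA-II, followed by a greedy pick. It is not the authors' GNSGA. The `Node` fields, the two objectives (cost and transfer time), the storage feasibility check, and the "lowest transfer time" tie-break are all assumptions made for illustration, and the genetic operators, full front ranking, and crowding distance of NSGA-II are omitted.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Node:
    """Hypothetical view of a storage node using only locally known properties."""
    name: str
    cost: float      # replication cost (arbitrary units, assumed)
    time_s: float    # estimated transfer time in seconds (assumed)
    free_gb: float   # free storage capacity in GB (assumed)


def dominates(a: Node, b: Node) -> bool:
    """True if `a` is no worse than `b` on both objectives and strictly better on one."""
    no_worse = a.cost <= b.cost and a.time_s <= b.time_s
    strictly_better = a.cost < b.cost or a.time_s < b.time_s
    return no_worse and strictly_better


def pareto_front(nodes: List[Node]) -> List[Node]:
    """Return the nodes not dominated by any other node (the first front only)."""
    return [n for n in nodes if not any(dominates(m, n) for m in nodes)]


def pick_replica_node(nodes: List[Node], file_gb: float) -> Optional[Node]:
    """Filter by storage capacity, keep the Pareto front, then greedily take
    the node with the lowest transfer time (cost breaks ties)."""
    feasible = [n for n in nodes if n.free_gb >= file_gb]
    front = pareto_front(feasible)
    if not front:
        return None
    return min(front, key=lambda n: (n.time_s, n.cost))


# Made-up candidates for a 100 GB file.
candidates = [
    Node("A", cost=4.0, time_s=30.0, free_gb=500.0),
    Node("B", cost=2.0, time_s=45.0, free_gb=200.0),
    Node("C", cost=5.0, time_s=50.0, free_gb=800.0),  # dominated by A and B
    Node("D", cost=1.0, time_s=20.0, free_gb=50.0),   # too small for 100 GB
]
chosen = pick_replica_node(candidates, file_gb=100.0)
```

With these made-up values, nodes A and B form the Pareto front among the feasible nodes, and the greedy step selects A because it has the lower transfer time. A different greedy criterion (for example, lowest cost) would select B instead, which is why the tie-break rule is flagged as an assumption.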
Keywords
replication, multi-objective optimization, distributed federation