Challenges of Provenance in Scientific Workflow Management Systems

2022 IEEE/ACM Workshop on Workflows in Support of Large-Scale Science (WORKS) (2022)

Abstract
Scientific workflows are a well-established pillar of large-scale computational science and have emerged as a leading means of formalizing and structuring massive amounts of complex, heterogeneous data and of accelerating scientific progress. A workflow can analyze terabyte-scale datasets, contain numerous individual tasks, and coordinate heterogeneous tasks with the help of scientific workflow management systems (SWfMSs). SWfMSs support the automation of repetitive tasks and capture complex analyses as workflows. However, executing workflows is costly and resource-intensive. At different phases of the workflow life cycle, most SWfMSs store provenance information, which enables result reproducibility, sharing, and knowledge reuse in the scientific community. But this provenance information can be many times larger than the workflow and its input data, and managing provenance data grows more complex as applications scale. Data-intensive computing across application fields therefore demands handling exponentially increasing data volumes and making effective use of storage and computing resources. This paper documents the challenges of provenance management and reuse in e-science, focusing primarily on scientific workflow approaches by exploring different SWfMSs and provenance management systems. We also investigate ways to overcome these challenges.
Keywords
Scientific workflow, scientific workflow management system, provenance, reusability, open science