Efficient Provenance Management via Clustering and Hybrid Storage in Big Data Environments

IEEE Transactions on Big Data(2020)

引用 13|浏览17
暂无评分
摘要
Provenance is a type of metadata that records the creation and transformation of data objects. It has been applied to a wide variety of areas such as security, search, and experimental documentation. However, provenance usually has a vast amount of data with its rapid growth rate which hinders the effective extraction and application of provenance. This paper proposes an efficient provenance management system via clustering and hybrid storage. Specifically, we propose a Provenance-Based Label Propagation Algorithm which is able to regularize and cluster a large number of irregular provenance. Then, we use separate physical storage mediums, such as SSD and HDD, to store hot and cold data separately, and implement a hot/cold scheduling scheme which can update and schedule data between them automatically. Besides, we implement a feedback mechanism which can locate and compress the rarely used cold data according to the query request. The experimental test shows that the system can significantly improve provenance query performance with a small run-time overhead.
更多
查看译文
关键词
Big data,provenance management,clustering,hybrid storage,compress
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要