Simplified Specification of Data Requirements for Demand-Actuated Big Data Refinement

Journal of Data Intelligence (2022)

Abstract
Data have become one of the most valuable resources in modern society. Due to increasing digitalization and the growing prevalence of the Internet of Things, it is possible to capture data on any aspect of today's life. Similar to physical resources, data have to be refined before they can become a profitable asset. However, such data preparation entails completely novel challenges: for instance, data are not consumed when being processed, so the volume of available data that needs to be managed increases steadily. Furthermore, data preparation has to be tailored to the intended use case in order to achieve an optimal outcome. This, however, requires the knowledge of domain experts. Since such experts are typically not IT experts, they need tools that enable them to specify the data requirements of their use cases in a user-friendly manner. The goal of this data preparation is to provide any emerging use case with demand-actuated data.

With this in mind, we designed a tailorable data preparation zone for Data Lakes called BARENTS. It provides a simplified method for domain experts to specify how data must be pre-processed for their use cases, and these data preparation steps are then applied automatically. The data requirements are specified by means of an ontology-based method that is comprehensible to non-IT experts. Data preparation and provisioning are realized in a resource-efficient manner by implementing BARENTS as a dedicated zone for Data Lakes. This way, BARENTS can be embedded seamlessly into established Big Data infrastructures.

This article is an extended and revised version of the conference paper "Demand-Driven Data Provisioning in Data Lakes: BARENTS - A Tailorable Data Preparation Zone" by Stach et al. [Stach2021]. In comparison to our original conference paper, we take a more detailed look at related work in the paper at hand. The emphasis of this extended and revised version, however, is on strategies to improve the performance of BARENTS and enhance its functionality. To this end, we discuss implementation details of our prototype in depth and introduce a novel recommender system in BARENTS that assists users in specifying data preparation steps.
Keywords
data requirements, specification, refinement, demand-actuated