Secondary Use Prevention in Large-Scale Data Lakes

SAI (3)(2021)

Abstract
Large-scale infrastructures acquire data from integrated data lakes and warehouses managed by diverse data owners and controllers, and then offer that data to a large variety of users or data processors. The data may contain personal information about individuals; if it is not used according to the purposes for which it was collected, the resulting secondary use can have legal ramifications. The significance of data often increases through transformations and aggregations, as new linkages and correlations are revealed, which makes it more valuable to users. However, with continuous transformation and newly emerging data requirements from different users, controllers often find it difficult to monitor resource usage closely, and collection purposes are overlooked. Hence, to limit secondary use in large-scale distributed environments, the collection purposes of resources need to be preserved through the different transformations that the data may undergo. We therefore propose to record the collection purposes as part of the resource metadata, or provenance. In this way, the purposes can be preserved and maintained through data changes and used as a deciding factor in limiting the exposure of personal information to different users or data processors. This paper offers insight into how collection purposes can be described as a provenance property and how that property is used in an access control mechanism to limit secondary use.
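The idea of carrying collection purposes in provenance and consulting them at access time can be illustrated with a minimal sketch. It is not the paper's mechanism: the names `Resource`, `derive`, and `is_access_allowed` are hypothetical, and the rule that a derived resource inherits the intersection of its sources' purposes is an assumption made here for illustration only.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple


@dataclass(frozen=True)
class Resource:
    """A data resource whose provenance records its collection purposes."""
    name: str
    purposes: FrozenSet[str]             # purposes the data was collected for
    derived_from: Tuple[str, ...] = ()   # names of the source resources


def derive(name: str, *sources: Resource) -> Resource:
    """Create a transformed or aggregated resource from one or more sources.

    Assumption: the result may only be used for purposes that every
    source allows, so the purposes are intersected.
    """
    if not sources:
        raise ValueError("a derived resource needs at least one source")
    allowed = frozenset.intersection(*(s.purposes for s in sources))
    return Resource(name, allowed, tuple(s.name for s in sources))


def is_access_allowed(resource: Resource, access_purpose: str) -> bool:
    """Grant access only if the stated purpose matches a collection purpose."""
    return access_purpose in resource.purposes


# Hypothetical example data
patients = Resource("patients", frozenset({"treatment", "research"}))
billing = Resource("billing", frozenset({"treatment", "billing"}))
joined = derive("patients_billing", patients, billing)
```

In this sketch, `is_access_allowed(joined, "treatment")` is true because both sources were collected for treatment, while a request with the purpose `"marketing"` or `"research"` is refused for the joined resource. A real system would likely need richer purpose descriptions and policies than a flat set of labels.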
Keywords
Purpose, Secondary use, Privacy, Data lakes, Provenance, Access control