Citation of Dynamic Data


Digitally driven research is a rather young discipline that evolves fast. As a result, tools and data are rarely developed with a focus on long-term awareness. What matters most to researchers are fast results and prompt publication. Yet the validity of research experiments and business processes can be judged, evaluated and verified only if results can be reproduced precisely. This requires precise identification of the data used in any such process. However, researchers rarely use an entire dataset as provided; they select subsets, be it a specific time range or a set of measurements. Hence there is a strong need for data citation mechanisms that identify arbitrary subsets of large datasets precisely and in a machine-actionable way. An additional challenge in the area of research data is the requirement to cite evolving data reliably. Researchers need to be able to reference data that is subject to change. Hence, mechanisms are required that allow researchers to cite data exactly as they used it during a particular experiment. When the data is updated, modified or deleted, these changes must be reflected and should be recoverable through the citation system as well. Time-stamped and versioned data is therefore an important factor. The more easily and transparently this citation process can be implemented, the higher its acceptance will be. The solution needs to be machine-actionable and needs to scale from small to very large datasets, from static to highly dynamic data, and across changes in the data representation. We will provide proofs of concept, mockups and prototype implementations that can be tested and used by the community. We want to go beyond theoretical work and deliver real-world prototypes and case studies for our models. In an optimal setting, a researcher who selects a subset of data for an experiment will be issued a PID (persistent identifier) that allows others to retrieve precisely the same dataset again.
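One way to picture the mechanism described above is the following minimal Python sketch. It is an illustration under stated assumptions, not the authors' implementation: the class `VersionedStore`, its methods, the logical clock and the `pid:` identifier format are all hypothetical names chosen for this example. The idea shown is that every record version carries a validity interval, a citation stores the subset query together with the time it was executed, and resolving the PID re-executes that query against the state of the data at that time.

```python
import hashlib
import json
from dataclasses import dataclass, field


def _digest(result):
    """Fingerprint of a result set, used to verify a re-executed query."""
    return hashlib.sha256(json.dumps(result, sort_keys=True).encode()).hexdigest()


@dataclass
class VersionedStore:
    """Hypothetical append-only store; each row version is valid in [valid_from, valid_to)."""
    clock: int = 0                                   # logical timestamp (assumption)
    rows: list = field(default_factory=list)         # all versions ever written
    citations: dict = field(default_factory=dict)    # PID -> stored query metadata

    def _tick(self):
        self.clock += 1
        return self.clock

    def _close(self, key, now):
        # Mark the currently valid version of `key` as superseded at `now`.
        for row in self.rows:
            if row["key"] == key and row["valid_to"] is None:
                row["valid_to"] = now

    def upsert(self, key, value):
        now = self._tick()
        self._close(key, now)
        self.rows.append({"key": key, "value": value, "valid_from": now, "valid_to": None})

    def delete(self, key):
        # Nothing is physically removed; the version is only closed.
        self._close(key, self._tick())

    def select(self, lo, hi, as_of):
        """Subset with lo <= key <= hi, as the data stood at time `as_of`."""
        return {
            r["key"]: r["value"]
            for r in self.rows
            if lo <= r["key"] <= hi
            and r["valid_from"] <= as_of
            and (r["valid_to"] is None or as_of < r["valid_to"])
        }

    def cite(self, lo, hi):
        """Store the query and its execution time; return a hypothetical PID."""
        as_of = self.clock
        digest = _digest(self.select(lo, hi, as_of))
        pid = f"pid:{digest[:12]}-{as_of}"
        self.citations[pid] = {"lo": lo, "hi": hi, "as_of": as_of, "digest": digest}
        return pid

    def resolve(self, pid):
        """Re-execute the cited query against the historical state and verify it."""
        c = self.citations[pid]
        result = self.select(c["lo"], c["hi"], c["as_of"])
        if _digest(result) != c["digest"]:
            raise ValueError("re-executed subset does not match the cited one")
        return result


store = VersionedStore()
store.upsert("2020-01-01", 3.1)
store.upsert("2020-01-02", 3.4)
store.upsert("2020-02-01", 2.9)

pid = store.cite("2020-01-01", "2020-01-31")   # cite the January subset

store.upsert("2020-01-02", 9.9)                # later correction
store.delete("2020-01-01")                     # later deletion

cited = store.resolve(pid)                                       # data as cited
current = store.select("2020-01-01", "2020-01-31", store.clock)  # data today
```

In this sketch, `cited` still contains both original January measurements, while `current` contains only the corrected value. Storing the query and a timestamp rather than a copy of the subset is one design choice that could scale to large datasets, since the citation record stays small regardless of how much data the subset covers.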