The WGDC recommendations enable researchers and data centers to identify and cite data used in experiments and studies. Instead of providing static data exports or textual descriptions of data subsets, we support a dynamic, query-centric view of data sets. The proposed solution enables precise identification of the exact subset and version of data used, supporting the reproducibility of processes as well as the sharing and reuse of data.
The goal of the WG was to create identification mechanisms that (a) allow us to identify and cite arbitrary views of data, from a single record to an entire data set, in a precise, machine-actionable manner; (b) allow us to cite and retrieve that data as it existed at a certain point in time, whether the database is static or highly dynamic; and (c) are stable across different technologies and technological changes.
The WG recommends solving this challenge by (1) ensuring that data is stored in a versioned and timestamped manner and (2) identifying data sets by storing timestamped queries and assigning persistent identifiers (PIDs) to them, so that each query can be re-executed against the timestamped data store.
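The two mechanisms can be sketched in a few lines of Python. This is a minimal, in-memory illustration under stated assumptions, not the WG's reference implementation: `VersionedStore`, `QueryStore`, the `pid:N` identifier format, the integer logical clock standing in for real timestamps, and the equality-filter query format are all hypothetical. Updates and deletions close the current version of a record rather than overwriting it. Citing a query stores its criteria, the current timestamp, and a hash of the result. Resolving the PID re-executes the stored query as of that timestamp.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class Version:
    key: str
    value: Dict[str, Any]
    valid_from: int
    valid_to: Optional[int] = None  # None means this version is still current


class VersionedStore:
    """Append-only store: updates and deletes close a version instead of overwriting it."""

    def __init__(self) -> None:
        self.versions: List[Version] = []
        self.clock = 0  # hypothetical logical clock standing in for real timestamps

    def _tick(self) -> int:
        self.clock += 1
        return self.clock

    def _close_current(self, key: str, ts: int) -> None:
        # Mark the currently valid version of `key` (if any) as ending at `ts`.
        for v in self.versions:
            if v.key == key and v.valid_to is None:
                v.valid_to = ts

    def upsert(self, key: str, value: Dict[str, Any]) -> None:
        ts = self._tick()
        self._close_current(key, ts)
        self.versions.append(Version(key, dict(value), ts))

    def delete(self, key: str) -> None:
        self._close_current(key, self._tick())

    def query(self, criteria: Dict[str, Any], as_of: int) -> List[Dict[str, Any]]:
        """Return records valid at `as_of` that match all equality criteria, sorted by key."""
        rows = [
            {"key": v.key, **v.value}
            for v in self.versions
            if v.valid_from <= as_of
            and (v.valid_to is None or v.valid_to > as_of)
            and all(v.value.get(f) == x for f, x in criteria.items())
        ]
        return sorted(rows, key=lambda r: r["key"])  # stable order so the hash is reproducible


def result_hash(rows: List[Dict[str, Any]]) -> str:
    return hashlib.sha256(json.dumps(rows, sort_keys=True).encode()).hexdigest()


class QueryStore:
    """Assigns hypothetical PIDs to timestamped queries and re-executes them on demand."""

    def __init__(self, store: VersionedStore) -> None:
        self.store = store
        self.entries: Dict[str, Dict[str, Any]] = {}

    def cite(self, criteria: Dict[str, Any]) -> str:
        ts = self.store.clock
        rows = self.store.query(criteria, as_of=ts)
        pid = f"pid:{len(self.entries) + 1}"  # hypothetical PID scheme
        self.entries[pid] = {"criteria": dict(criteria), "timestamp": ts, "hash": result_hash(rows)}
        return pid

    def resolve(self, pid: str) -> List[Dict[str, Any]]:
        entry = self.entries[pid]
        rows = self.store.query(entry["criteria"], as_of=entry["timestamp"])
        if result_hash(rows) != entry["hash"]:
            raise ValueError(f"re-executed query for {pid} does not match the cited result")
        return rows


if __name__ == "__main__":
    store = VersionedStore()
    store.upsert("s1", {"site": "A", "temp": 12.5})
    store.upsert("s2", {"site": "B", "temp": 9.0})
    citations = QueryStore(store)
    pid = citations.cite({"site": "A"})
    store.upsert("s1", {"site": "A", "temp": 13.1})  # later correction
    store.delete("s2")
    print(citations.resolve(pid))  # the data as it was when cited
    print(store.query({"site": "A"}, as_of=store.clock))  # the current data
```

Because the store never discards old versions, the cited subset remains retrievable after later edits. The hash check is an added assumption, used here to detect when re-execution fails to reproduce the originally cited result; a production system would also need real timestamps and a normalized, technology-independent representation of each stored query.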