Principles and best practices in data versioning for all data sets big and small. Version 1.1.

Date
2020
Author
Klump, Jens
Wyborn, Lesley
Wu, Mingfang
Downs, Robert
Asmi, Ari
Ryder, Gerry
Martin, Julia
Corporate Author
Research Data Alliance Data Versioning Working Group
Status
Published
Pages
19pp.
Abstract
The demand for better reproducibility of research results is growing. More and more data is becoming available online. In some cases, the datasets have become so large that downloading the data is no longer feasible. Data can also be offered through web services and accessed on demand. This means that parts of the data are accessed at a remote source when needed. In this scenario, it will become increasingly important for a researcher to be able to cite the exact extract of the data set that was used to underpin their research publication. However, while the means to identify datasets using persistent identifiers have been in place for more than a decade, systematic data versioning practices are currently not available.
Versioning procedures and best practices are well established for scientific software. The related Wikipedia article gives an overview of software versioning practices. The codebase of large software projects does bear some semblance to large dynamic datasets. Are th.....
Publisher
Research Data Alliance (RDA)
Document Language
en
Maturity Level
Pilot or Demonstrated
DOI Original
10.15497/RDA00042
Citation
Klump, J., Wyborn, L., Downs, R., Asmi, A., Wu, M., Ryder, G., & Martin, J. (2020) Principles and best practices in data versioning for all data sets big and small. Version 1.1. Research Data Alliance, 19pp. DOI: 10.15497/RDA00042.
Collections
- RDA Resources