Assessing data change in scientific datasets.

Date
2021
Author
Müller, Juliane
Faybishenko, Boris
Agarwal, Deborah
Bailey, Stephen
Jiang, Chongya
Ryu, Youngryel
Tull, Craig
Ramakrishnan, Lavanya
Abstract
Scientific datasets are growing rapidly and becoming critical to next‐generation scientific discoveries. The validity of scientific results relies on the quality of the data used, and data are often subject to change, for example, due to observation additions, quality assessments, or processing‐software updates. The effects of data change are not well understood and are difficult to predict. Because datasets are often repeatedly updated, recomputing derived data products quickly becomes time consuming and resource intensive, and may in some cases not even be necessary, thus delaying scientific advance. Despite its importance, there is a lack of systematic approaches for comparing data versions to quantify the changes, and ad‐hoc or manual processes are commonly used. In this article, we propose a novel hierarchical approach for analyzing data changes, including real‐time (online) and offline analyses. We employ a variety of fast‐to‐compute numerical analyses and graphical data change representations…
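To make the idea concrete, the kind of "fast‐to‐compute numerical analysis" the abstract alludes to can be sketched as simple summary‐statistic deltas between two dataset versions. The following is an illustrative sketch only, not the authors' implementation; the function name `change_summary` and the metrics chosen are assumptions for demonstration.

```python
# Illustrative sketch (not the paper's method): quantify change between
# two versions of a numeric dataset using cheap summary statistics.
import numpy as np

def change_summary(old, new):
    """Return simple per-array change metrics between two data versions."""
    old = np.asarray(old, dtype=float)
    new = np.asarray(new, dtype=float)
    return {
        # Shift in central tendency and spread between versions.
        "mean_shift": float(new.mean() - old.mean()),
        "std_shift": float(new.std() - old.std()),
        # Elementwise comparison only makes sense when shapes match.
        "max_abs_diff": (float(np.max(np.abs(new - old)))
                         if old.shape == new.shape else None),
        # Net number of observations added or removed.
        "size_change": int(new.size - old.size),
    }

v1 = np.array([1.0, 2.0, 3.0, 4.0])
v2 = np.array([1.0, 2.1, 3.0, 4.2])  # e.g., after a processing-software update
print(change_summary(v1, v2))
```

Cheap metrics like these can be screened online on every update, with more expensive offline analyses triggered only when a delta exceeds a threshold, which matches the hierarchical online/offline split the abstract describes.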
Resource URL
https://onlinelibrary.wiley.com/doi/10.1002/cpe.6245
Journal
Concurrency and Computation: Practice and Experience
Issue
Article e6245
Page Range
22pp.
Document Language
en
Best Practice Type
Manual (incl. handbook, guide, cookbook etc)
DOI Original
https://doi.org/10.1002/cpe.6245
Citation
Müller, J., Faybishenko, B., Agarwal, D., et al. (2021) Assessing data change in scientific datasets. Concurrency and Computation: Practice and Experience, e6245, 22pp. DOI: https://doi.org/10.1002/cpe.6245
The following license files are associated with this item:
Except where otherwise noted, this item's license is described as Attribution-NonCommercial-NoDerivatives 4.0 International