dc.contributor.author | Müller, Juliane
dc.contributor.author | Faybishenko, Boris
dc.contributor.author | Agarwal, Deborah
dc.contributor.author | Bailey, Stephen
dc.contributor.author | Jiang, Chongya
dc.contributor.author | Ryu, Youngryel
dc.contributor.author | Tull, Craig
dc.contributor.author | Ramakrishnan, Lavanya
dc.date.accessioned | 2021-04-06T21:03:53Z
dc.date.available | 2021-04-06T21:03:53Z
dc.date.issued | 2021
dc.identifier.citation | Müller, J., Faybishenko, B., Agarwal, D., et al. (2021) Assessing data change in scientific datasets. Concurrency and Computation: Practice and Experience, e6245, 22pp. DOI: https://doi.org/10.1002/cpe.6245 | en_US
dc.identifier.uri | http://hdl.handle.net/11329/1536
dc.identifier.uri | http://dx.doi.org/10.25607/OBP-1032
dc.description.abstract | Scientific datasets are growing rapidly and becoming critical to next‐generation scientific discoveries. The validity of scientific results relies on the quality of the data used, and data are often subject to change, for example, due to observation additions, quality assessments, or processing software updates. The effects of data change are not well understood and are difficult to predict. Datasets are often updated repeatedly, and recomputing derived data products quickly becomes time consuming and resource intensive and may in some cases not even be necessary, thus delaying scientific advance. Despite the importance of this problem, there is a lack of systematic approaches for comparing data versions to quantify the changes, and ad‐hoc or manual processes are commonly used. In this article, we propose a novel hierarchical approach for analyzing data changes, including real‐time (online) and offline analyses. We employ a variety of fast‐to‐compute numerical analyses, graphical data change representations, and more resource‐intensive recomputations of a subset of the data product. We illustrate the application of our approach using three scientifically diverse use cases, namely, satellite, cosmological, and x‐ray data. The results show that a variety of data change metrics should be employed to enable a comprehensive representation and qualitative evaluation of data changes. | en_US
dc.language.iso | en | en_US
dc.rights | Attribution-NonCommercial-NoDerivatives 4.0 International | *
dc.rights.uri | http://creativecommons.org/licenses/by-nc-nd/4.0/ | *
dc.subject.other | Data versions | en_US
dc.subject.other | Scientific data change analysis | en_US
dc.title | Assessing data change in scientific datasets. | en_US
dc.type | Journal Contribution | en_US
dc.description.refereed | Refereed | en_US
dc.format.pagerange | 22pp. | en_US
dc.identifier.doi | https://doi.org/10.1002/cpe.6245
dc.subject.dmProcesses | Data Management Practices::Data archival/stewardship/curation | en_US
dc.subject.dmProcesses | Data Management Practices::Data quality control | en_US
dc.bibliographicCitation.title | Concurrency and Computation: Practice and Experience | en_US
dc.bibliographicCitation.issue | Article e6245 | en_US
dc.description.bptype | Manual (incl. handbook, guide, cookbook etc) | en_US
obps.contact.contactname | Juliane Mueller
obps.contact.contactemail | julianemueller@lbl.gov
obps.resourceurl.publisher | https://onlinelibrary.wiley.com/doi/10.1002/cpe.6245 | en_US


Except where otherwise noted, this item's license is described as Attribution-NonCommercial-NoDerivatives 4.0 International