Authors: O'Brien, Margaret; Smith, Colin A.; Sokol, Eric R.; Gries, Corinna; Lany, Nina; Record, Sydne; Castorani, Max C. N.
Source: Ecological Informatics, Volume: 64, Article Number: 101374, DOI: 10.1016/j.ecoinf.2021.101374, September 2021
Type of Publication: Journal Article
Abstract: The idea of harmonizing data is not new. Decades of amassing data in databases according to community standards both locally and globally have been more successful for some research domains than others. It is particularly difficult to harmonize data across studies where sampling protocols vary greatly and complex environmental conditions need to be understood to apply analytical methods correctly. However, a body of long-term ecological community observations is increasingly becoming publicly available and has been used in important studies. Here, we discuss an approach to preparing harmonized community survey data by an environmental data repository, in collaboration with a national observatory. The workflow framework and repository infrastructure are used to create a decentralized, asynchronous model to reformat data without altering original data through cleaning or aggregation, while retaining metadata about sampling methods and provenance, and enabling programmatic data access. This approach does not create another data 'silo' but will allow the repository to contribute subsets of available data to a variety of different analysis-ready data preparation efforts. With certain limitations (e.g., changes to the sampling protocol over time), data updates and downstream processing may be completely automated. In addition to supporting reuse of community observation data by synthesis science, a goal for this harmonization and workflow effort is to contribute these datasets to the Global Biodiversity Information Facility (GBIF) to increase the data's discovery and use.