ICOS improved data lifecycle


ICOS provides long-term, high-quality observations that follow (and cooperatively set) the global standards for the best possible quality data on the atmospheric composition of greenhouse gases (GHG), greenhouse gas exchange fluxes measured by eddy covariance, and CO2 partial pressure at water surfaces. The ICOS observational data feeds into a wide range of scientific fields, including for example plant physiology, agriculture, biology, ecology, energy & fuels, forestry, hydrology, (micro)meteorology, environmental science, oceanography, geochemistry, physical geography, remote sensing, earth, climate and soil science, and combinations of these in multi-disciplinary projects.

ICOS is committed to providing all data and methods in an open and transparent way as free data. This requires a dedicated system that secures the long-term archiving and availability of the data, together with the descriptive metadata that is needed to find, identify, understand and properly use the data, also in the far future, following the FAIR data principles. An added requirement is that the full data lifecycle should be completely reproducible, to enable full trust in the observations and the derived data products.

In this report we define and describe the implementation of a comprehensive, unified metadata flow from the Thematic Centres (TCs) to the Carbon Portal. The design criterion for this system was to integrate the operational (legacy) database systems at the TCs with the data portal as much as possible. This preserves the investments in the robust and proven QA/QC and database systems at the TCs and combines them with the benefits of a linked open data system with a connected data licence check, usage tracking, and dynamic, machine-operable data and metadata based on a versioned RDF triple store.

We also developed a connected DOI minting system and implemented the generation of data collections and a linked system for versioning of the data, all connected to the ontology-driven single point of ingestion, which is optimised for machine-to-machine communication. The system has been brought into full operational use incrementally over the last years. It is now in place and used by all ICOS domains for all data streams, from raw data through near-real-time to final quality-controlled data, and by the external users that provide elaborated products.

The licence check and data usage tracking have been implemented in a completely unobtrusive way and are flexible enough to begin interoperating with major data portals like those of FLUXNET, NEON, SOCAT and WMO WDCGG. The use of DOIs increases the exposure of the ICOS data to global and European data portals like the future EOSC portal, the current OpenAIRE portal and Google Dataset Search. The ICOS data is already finding its way to many users, and with the growing length of the ICOS time series in all domains and the interoperation with the global portals, the use of ICOS data can now grow further.

DOI https://doi.org/10.18160/D2JV-KB6B
Metadata Access https://oai.datacite.org/oai?verb=GetRecord&metadataPrefix=datacite&identifier=doi:10.18160/d2jv-kb6b
Creator Vermeulen, Alex; Hazan, Lynn; Pfeil, Benjamin; Lankreijer, Harry; Hellström, Margareta; Mirzov, Oleg (ORCID: 0000-0002-4742-958X); D'Onofrio, Claudio; Rivier, Leo; Jones, Steve; Papale, Dario; Juurola, Eija
Publisher ICOS ERIC
Contributor Vermeulen, Alex
Publication Year 2020
Rights CC0; https://creativecommons.org/publicdomain/zero/1.0
OpenAccess true
Resource Type application/pdf; Text
Format PDF
Version 1.0
Discipline Other