RMQS1 16S bioinformatic config files and control sample data


RMQS: The French Soil Quality Monitoring Network (RMQS) is a national program for the assessment and long-term monitoring of the quality of French soils. The network is based on the monitoring of 2240 sites representative of French soils and their land use. These sites are spread over the whole French territory (metropolitan and overseas) along a systematic square grid of 16 km x 16 km cells. The network covers a broad spectrum of climatic, soil and land-use conditions (croplands, permanent grasslands, woodlands, orchards and vineyards, natural or scarcely anthropogenic land and urban parkland). The first sampling campaign in metropolitan France took place from 2000 to 2009.

Dataset: This dataset contains the config files used to run the bioinformatic pipeline and the control sample data that had not been published before. Reference environmental DNA samples, named “G4” in internal laboratory processes, were added for each molecular analysis. They were used for technical validation, but were not necessarily published alongside the datasets. The taxonomy and OTU abundance files for these control samples were built in the same way as the taxonomy and abundance files of the main dataset. As these internal control samples were clustered against the RMQS dataset in an open-reference fashion, they contain new OTUs (noted as “OUT”) corresponding to sequences that did not match any of the 188,030 RMQS reference sequences. The sample bank association file links each sample to its sequencing library. The G4 metadata file links each G4 to its library, molecular tag and sequence repository information.

File structure:

Taxonomy files rmqs1_control_taxonomy_<rank>:

Taxonomy is split across five files, with one line per site and one column per taxon. Each line sums to 10,000 (the rarefaction threshold). Three supplementary columns are present:

Unknown: not matching any reference.
Unclassified: missing taxa between genus and phylum.
Environmental: matched to sample from environmental study, generally with only a phylum name.
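The layout described above (one line per site, one column per taxon, each line summing to the rarefaction threshold) can be checked with a few lines of Python. The following is a minimal sketch only: the table is a made-up stand-in rather than data from this dataset, and the identifier column name `site`, the taxon names and the sample names are assumptions chosen for illustration.

```python
import csv
import io

RAREFACTION_DEPTH = 10_000
SUPPLEMENTARY = ("Unknown", "Unclassified", "Environmental")

# Made-up stand-in for one taxonomy file. The "site" column name, the taxa
# and the sample names are assumptions, not values taken from the dataset.
TOY_TSV = (
    "site\tProteobacteria\tActinobacteria\tUnknown\tUnclassified\tEnvironmental\n"
    "G4_a\t6000\t3000\t500\t300\t200\n"
    "G4_b\t5000\t4500\t250\t150\t100\n"
)


def supplementary_share(handle, id_column="site"):
    """Return {sample: fraction of reads in the three supplementary columns}.

    Raises ValueError if a line does not sum to the rarefaction depth.
    """
    shares = {}
    for row in csv.DictReader(handle, delimiter="\t"):
        sample = row.pop(id_column)
        counts = {name: int(value) for name, value in row.items()}
        total = sum(counts.values())
        if total != RAREFACTION_DEPTH:
            raise ValueError(
                f"{sample}: line sums to {total}, expected {RAREFACTION_DEPTH}"
            )
        shares[sample] = sum(counts.get(c, 0) for c in SUPPLEMENTARY) / total
    return shares


shares = supplementary_share(io.StringIO(TOY_TSV))
print(shares)
```

To try this on a real file, replace `io.StringIO(TOY_TSV)` with an open file handle and adjust `id_column` to the actual header of the first column.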

rmqs1_16S_otu_abundance.tsv:

OTU abundance per site (one column per OTU: “DB” + number for OTUs from the RMQS reference set, “OUT” for OTUs not matching any “DB” ones). Each line sums to 10,000 (the rarefaction threshold).
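Because the column prefix distinguishes reference OTUs (“DB”) from new ones (“OUT”), the number of reads that fall into new OTUs can be tallied per sample. This is a rough sketch under stated assumptions: the table is invented for illustration, the identifier column name `site` is hypothetical, and the exact form of the “OUT” column names (here “OUT” followed by a number) is assumed rather than documented.

```python
import csv
import io

# Invented stand-in for the OTU abundance table. Column names other than the
# "DB"/"OUT" prefixes, and all counts, are assumptions for illustration.
TOY_OTU_TSV = (
    "site\tDB1\tDB2\tOUT1\n"
    "G4_a\t7000\t2500\t500\n"
    "G4_b\t9000\t1000\t0\n"
)


def reads_by_otu_origin(handle, id_column="site"):
    """Return {sample: {"DB": reads in reference OTUs, "OUT": reads in new OTUs}}."""
    result = {}
    for row in csv.DictReader(handle, delimiter="\t"):
        sample = row.pop(id_column)
        totals = {"DB": 0, "OUT": 0}
        for column, value in row.items():
            for prefix in totals:
                if column.startswith(prefix):
                    totals[prefix] += int(value)
        result[sample] = totals
    return result


origin = reads_by_otu_origin(io.StringIO(TOY_OTU_TSV))
for sample, totals in origin.items():
    print(sample, totals)
```

Columns whose names start with neither prefix are ignored by this sketch, so a header that departs from the assumed naming would show up as totals that do not add up to 10,000.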

rmqs1_16S_bank_association.tsv:

Two-column file giving the bank name for each sample.

rmqs1_16S_bank_metadata.tsv:

library_name: library name used in labs
study_accession, sample_accession, experiment_accession, run_accession: SRA/EBI identifiers
library_name_genoscope: library name used in the Genoscope sequencing center
MID: multiplex identifier sequence
run_alias: Genoscope internal alias
ftp_link: FTP link to download the library
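The association and metadata files can be combined to find the sequencing run and download link for each sample. The sketch below is only illustrative: the association file's column names (`sample`, `bank`) are hypothetical, it assumes that the bank name matches `library_name` in the metadata file, and every value in the two toy tables is a placeholder rather than a real accession or link.

```python
import csv
import io

# Placeholder tables. The "sample" and "bank" column names are hypothetical,
# and every value is invented; only the metadata field names come from the
# file description.
ASSOCIATION_TSV = (
    "sample\tbank\n"
    "G4_a\tbank_01\n"
    "G4_b\tbank_02\n"
    "G4_c\tbank_99\n"
)
METADATA_TSV = (
    "library_name\trun_accession\tMID\tftp_link\n"
    "bank_01\tRUN_PLACEHOLDER_1\tACGT\tftp://example.invalid/bank_01.fastq.gz\n"
    "bank_02\tRUN_PLACEHOLDER_2\tTGCA\tftp://example.invalid/bank_02.fastq.gz\n"
)


def runs_by_sample(assoc_handle, meta_handle):
    """Map each sample to (run_accession, ftp_link), or None if its bank is not listed."""
    metadata = {
        row["library_name"]: (row["run_accession"], row["ftp_link"])
        for row in csv.DictReader(meta_handle, delimiter="\t")
    }
    return {
        row["sample"]: metadata.get(row["bank"])
        for row in csv.DictReader(assoc_handle, delimiter="\t")
    }


runs = runs_by_sample(io.StringIO(ASSOCIATION_TSV), io.StringIO(METADATA_TSV))
for sample, info in runs.items():
    print(sample, info)
```

Samples whose bank has no metadata entry map to `None` instead of raising an error, which is a deliberate choice here since some libraries may not be detailed in the files.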

Details:

Data from three libraries (58, 59 and 69) were re-sequenced and are not detailed in the files.

Identifier
DOI https://doi.org/10.57745/XBFOJP
Related Identifier https://doi.org/10.1371/journal.pone.0186766
Metadata Access https://entrepot.recherche.data.gouv.fr/oai?verb=GetRecord&metadataPrefix=oai_datacite&identifier=doi:10.57745/XBFOJP
Provenance
Creator Terrat, Sébastien; Dequiedt, Samuel
Publisher Recherche Data Gouv
Contributor Cottin, Aurélien
Publication Year 2023
Funding Reference French National Research Agency (ANR) ANR-10-INBS-09-08 ; French National Research Agency (ANR) ANR-11-INBS-0001 ; French Agency for Ecological Transition (ADEME) ; France Génomique
Rights etalab 2.0; info:eu-repo/semantics/openAccess; https://spdx.org/licenses/etalab-2.0.html
OpenAccess true
Contact Cottin, Aurélien (INRAE)
Representation
Resource Type Dataset
Format application/gzip; text/tab-separated-values
Size 362535; 33093; 117004; 522347; 80032; 16460; 32344; 13212
Version 1.0
Discipline Agriculture, Forestry, Horticulture; Geosciences; Agricultural Sciences; Agriculture, Forestry, Horticulture, Aquaculture; Agriculture, Forestry, Horticulture, Aquaculture and Veterinary Medicine; Biology; Biospheric Sciences; Earth and Environmental Science; Ecology; Environmental Research; Life Sciences; Natural Sciences