RMQS:
The French Soil Quality Monitoring Network (RMQS) is a national program for the assessment and long-term monitoring of the quality of French soils. This network is based on the monitoring of 2,240 sites representative of French soils and their land use. These sites are spread over the whole French territory (metropolitan and overseas) along a systematic square grid of 16 km x 16 km cells. The network covers a broad spectrum of climatic, soil and land-use conditions (croplands, permanent grasslands, woodlands, orchards and vineyards, natural or scarcely anthropogenic land and urban parkland). The first sampling campaign in metropolitan France took place from 2000 to 2009.
Dataset:
This dataset contains the 16S (Archaea and Bacteria) operational taxonomic unit (OTU) abundance table for 1,842 RMQS sites, together with the core sequence of each OTU.
The soil 16S rDNA gene was sequenced by pyrosequencing (GS FLX Titanium, Roche 454) at Genoscope. Bioinformatics analysis was performed with the BIOCOM-PIPE (previously named GNS-PIPE) metabarcoding pipeline. OTUs were clustered at 95% similarity using the post-clustering strategy of Terrat et al. (2019), applied across the whole dataset, producing 188,030 robust OTUs. See the associated articles for details. Keep in mind that the first results were obtained without post-clustering; the richness of each sample is therefore higher after post-clustering. Raw sequencing data are available at EBI under project PRJEB21351.
File structure:
rmqs1_16S_otu_abundance.tsv: OTU abundance per site. Each OTU is named “DB” followed by a number, and OTUs are ordered by their global abundance.
Each line sums to 10,000, the threshold chosen for rarefaction.
otu_reference_sequence.fasta.gz: OTU core sequence.
rmqs1_16S_otu_taxonomy.tsv: Tab-delimited file giving the complete taxonomy of each OTU, with the following columns:
SEQUENCE: Name of the considered OTU
KINGDOM: Taxonomy at the kingdom level
PHYLUM: Taxonomy at the phylum level
CLASS: Taxonomy at the class level
ORDER: Taxonomy at the order level
FAMILY: Taxonomy at the family level
GENUS: Taxonomy at the genus level
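The row-sum property of the abundance table can be checked after download. The following is a minimal sketch, not part of the dataset. It assumes (this is not confirmed by the description) that the file has one header line, the site ID in the first column and integer OTU counts in the remaining columns. The function name check_row_sums is hypothetical, and the demo uses made-up data with a depth of 10 instead of 10,000.

```python
import csv
import io

RAREFACTION_DEPTH = 10_000  # threshold stated in the dataset description


def check_row_sums(handle, expected=RAREFACTION_DEPTH):
    """Return the IDs of rows whose OTU counts do not sum to `expected`.

    ASSUMED layout: one header line, site ID in the first column,
    integer OTU counts in all other columns.
    """
    reader = csv.reader(handle, delimiter="\t")
    next(reader, None)  # skip the header line, if any
    bad = []
    for row in reader:
        if not row:
            continue  # ignore blank lines
        site_id, counts = row[0], row[1:]
        if sum(int(c) for c in counts) != expected:
            bad.append(site_id)
    return bad


# Made-up in-memory table: site 1 sums to 10, site 2 sums to 9.
demo = io.StringIO("site\tDB1\tDB2\n1\t7\t3\n2\t5\t4\n")
print(check_row_sums(demo, expected=10))
```

On the real file, the same function could be called with open("rmqs1_16S_otu_abundance.tsv", newline="") as the handle; an empty list would mean every line sums to 10,000.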
Details:
Samples could not be collected at some sites; these sites do not appear in the dataset.
Some sites did not pass the laboratory or bioinformatics steps with the 10,000 sequences required before post-clustering; they do not appear in the dataset either.
This dataset can be linked with 10.15454/QSXKGA to obtain the physico-chemical properties, land use and coordinates of each sample, or to filter sites using its site_officiel column.
Sites with an ID longer than four digits are supplementary sites that are not located at the center of their cell (e.g. 10797 and 20797, which both come from cell 797).
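The site ID convention can be handled with two small helpers. This is a sketch under one assumption inferred from the single example given (10797 and 20797 belonging to cell 797): that the cell number is the last four digits of the ID. The function names are hypothetical.

```python
def is_supplementary(site_id: int) -> bool:
    """True when the site ID has more than four digits."""
    return site_id >= 10_000


def cell_of(site_id: int) -> int:
    """Cell number, ASSUMED to be the last four digits of the site ID."""
    return site_id % 10_000


for site in (797, 10797, 20797):
    print(site, is_supplementary(site), cell_of(site))
```

Under that assumption, all three IDs in the loop map to cell 797, and only the last two are flagged as supplementary.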
The taxonomy was assigned using the SILVA R132 database, based only on the core sequence of each OTU. This may differ from the BIOCOM-PIPE approach, which affiliates all sequences rather than only the OTU core sequences.
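Because the taxonomy is given per OTU, OTU counts can be collapsed to a higher rank by looking up each OTU name. The sketch below sums counts per phylum using the SEQUENCE and PHYLUM columns of the taxonomy file. The function names, the "Unknown" label for unmatched OTUs and the demo rows are assumptions made for illustration, not part of the dataset.

```python
import csv
import io
from collections import Counter


def load_phylum_map(handle):
    """Map each OTU name (SEQUENCE column) to its PHYLUM value."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {row["SEQUENCE"]: row["PHYLUM"] for row in reader}


def phylum_totals(otu_counts, phylum_of):
    """Sum OTU counts per phylum.

    OTUs absent from the map are grouped under 'Unknown' (an
    illustrative choice, not a convention of the dataset).
    """
    totals = Counter()
    for otu, count in otu_counts.items():
        totals[phylum_of.get(otu, "Unknown")] += count
    return dict(totals)


# Made-up taxonomy rows and counts, for illustration only.
taxonomy = io.StringIO(
    "SEQUENCE\tKINGDOM\tPHYLUM\n"
    "DB1\tBacteria\tProteobacteria\n"
    "DB2\tBacteria\tAcidobacteria\n"
    "DB3\tBacteria\tProteobacteria\n"
)
phylum_of = load_phylum_map(taxonomy)
print(phylum_totals({"DB1": 6, "DB2": 3, "DB3": 1}, phylum_of))
```

Since the taxonomy here was assigned from core sequences only, totals obtained this way may not match phylum-level results produced by BIOCOM-PIPE on all sequences.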