RMQS:
The French Soil Quality Monitoring Network (RMQS) is a national program for the assessment and long-term monitoring of the quality of French soils. This network is based on the monitoring of 2240 sites representative of French soils and their land use. These sites are spread over the whole French territory (metropolitan and overseas) along a systematic square grid of 16 km x 16 km cells. The network covers a broad spectrum of climatic, soil and land-use conditions (croplands, permanent grasslands, woodlands, orchards and vineyards, natural or scarcely anthropogenic land and urban parkland). The first sampling campaign in metropolitan France took place from 2000 to 2009.
Dataset:
This dataset contains the configuration files used to run the bioinformatic pipeline, together with the control sample data that had not been published before.
Reference environmental DNA samples, named “G4” in internal laboratory processes, were included in each molecular analysis. They were used for technical validation but were not necessarily published alongside the datasets. The taxonomy and OTU abundance files for these control samples were built in the same way as the taxonomy and abundance files of the main dataset. Because these internal control samples were clustered against the RMQS dataset in an open-reference fashion, they contain new OTUs (labelled “OUT”) corresponding to sequences that did not match any of the 188,030 RMQS reference sequences. The sample bank association file links each sample to its sequencing library. The G4 metadata file links each G4 sample to its library, molecular tag and sequence repository information.
File structure:
Taxonomy files rmqs1_control_taxonomy_<rank>:
Taxonomy is split across five files, one per taxonomic rank, with one line per site and one column per taxon.
Each line sums to 10,000 (the rarefaction threshold).
Three supplementary columns are present:
Unknown: not matching any reference.
Unclassified: missing taxa between genus and phylum.
Environmental: matched to a reference sequence from an environmental study, generally annotated only with a phylum name.
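Each line sums to 10,000 because abundances are rarefied to that depth. As a minimal sketch of what rarefaction means (random subsampling of reads without replacement), assuming one sample is held as a dict mapping column name to read count, the following illustrates the generic technique; it is not BIOCOM-PIPE's actual code, and `rarefy` and `RAREFACTION_DEPTH` are hypothetical names.

```python
import random
from collections import Counter
from typing import Dict, Optional

RAREFACTION_DEPTH = 10_000  # threshold stated for this dataset


def rarefy(counts: Dict[str, int], depth: int = RAREFACTION_DEPTH,
           seed: Optional[int] = None) -> Dict[str, int]:
    """Subsample exactly `depth` reads, without replacement, from one sample's counts."""
    total = sum(counts.values())
    if total < depth:
        raise ValueError(f"sample has {total} reads, fewer than the depth {depth}")
    # One pool entry per read, labelled with the column (taxon or OTU) it belongs to.
    pool = [label for label, n in counts.items() for _ in range(n)]
    rng = random.Random(seed)
    drawn = Counter(rng.sample(pool, depth))
    # Keep every original column, including those that fall to zero.
    return {label: drawn.get(label, 0) for label in counts}


if __name__ == "__main__":
    raw = {"Proteobacteria": 12_000, "Acidobacteria": 7_000, "Unknown": 1_000}
    rarefied = rarefy(raw, seed=1)
    print(sum(rarefied.values()))  # 10000
```

Samples with fewer reads than the threshold cannot be rarefied to it; the sketch raises an error in that case rather than silently keeping them.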
rmqs1_16S_otu_abundance.tsv:
OTU abundance per site (one column per OTU: “DB” + number for OTUs from the RMQS reference set, “OUT” for OTUs not matching any “DB” OTU).
Each line sums to 10,000 (the rarefaction threshold).
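A minimal sketch for checking a table of this kind and measuring how much of each sample falls into new OTUs. It assumes the file is tab-separated, with a header row whose first field names the sample column and whose remaining fields are OTU identifiers; if the header lacks that first field (as with some row-name exports), the parsing must be adjusted. `summarize_abundance` is a hypothetical helper.

```python
import csv
import io
from typing import Dict, TextIO

RAREFACTION_DEPTH = 10_000


def summarize_abundance(handle: TextIO) -> Dict[str, Dict[str, float]]:
    """Check each sample's rarefied total and split its reads into "DB" and "OUT" OTUs."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    otu_ids = header[1:]  # assumption: the first column holds the sample identifier
    summary: Dict[str, Dict[str, float]] = {}
    for row in reader:
        if not row:
            continue  # tolerate blank lines
        sample, counts = row[0], [int(x) for x in row[1:]]
        total = sum(counts)
        if total != RAREFACTION_DEPTH:
            raise ValueError(f"{sample}: row sums to {total}, expected {RAREFACTION_DEPTH}")
        db = sum(n for otu, n in zip(otu_ids, counts) if otu.startswith("DB"))
        new = sum(n for otu, n in zip(otu_ids, counts) if otu.startswith("OUT"))
        summary[sample] = {"db_fraction": db / total, "out_fraction": new / total}
    return summary


if __name__ == "__main__":
    demo = io.StringIO("sample\tDB1\tDB2\tOUT1\nG4_demo\t6000\t3000\t1000\n")
    print(summarize_abundance(demo))
```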
rmqs1_16S_bank_association.tsv:
Two-column file giving the bank name for each sample.
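The association can be loaded into a dictionary as sketched below, assuming a tab-separated file with the sample in the first column and the bank in the second. Whether the file has a header row is not stated, hence the hypothetical `has_header` flag; both function names are illustrative. The conflict check reflects the de-duplication described under Details, after which each sample should belong to a single bank.

```python
import csv
from collections import defaultdict
from typing import Dict, Iterable, List


def load_bank_association(lines: Iterable[str], has_header: bool = True) -> Dict[str, str]:
    """Read the two-column sample/bank file into a sample -> bank dictionary."""
    reader = csv.reader(lines, delimiter="\t")
    if has_header:
        next(reader, None)
    sample_to_bank: Dict[str, str] = {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        sample, bank = row[0], row[1]
        previous = sample_to_bank.get(sample)
        if previous is not None and previous != bank:
            # After de-duplication each sample should point to a single bank.
            raise ValueError(f"{sample} is associated with both {previous} and {bank}")
        sample_to_bank[sample] = bank
    return sample_to_bank


def samples_per_bank(sample_to_bank: Dict[str, str]) -> Dict[str, List[str]]:
    """Invert the association: bank -> list of samples sequenced in it."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for sample, bank in sample_to_bank.items():
        grouped[bank].append(sample)
    return dict(grouped)
```

To read the real file, pass an open file handle (opened with `newline=""`, as the `csv` module recommends) instead of a list of strings.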
rmqs1_16S_bank_metadata.tsv:
library_name: library name used in the laboratory
study_accession, sample_accession, experiment_accession, run_accession: SRA accession identifiers (EBI)
library_name_genoscope: library name used in the Genoscope sequence center
MID: multiplex identifier sequence
run_alias: Genoscope internal alias
ftp_link: FTP link to download the library
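To find where the raw reads of a given sample can be downloaded, the association and metadata files can be joined. This sketch assumes the metadata file is tab-separated with a header row carrying the column names listed above, and that the bank names in rmqs1_16S_bank_association.tsv match `library_name`; both are assumptions, and the function names are hypothetical. Several rows may share a library (possibly one per G4 sample), so links are de-duplicated.

```python
import csv
from typing import Dict, Iterable, List


def ftp_links_by_library(metadata_lines: Iterable[str]) -> Dict[str, List[str]]:
    """Collect the distinct ftp_link values recorded for each library_name."""
    links: Dict[str, List[str]] = {}
    for record in csv.DictReader(metadata_lines, delimiter="\t"):
        library_links = links.setdefault(record["library_name"], [])
        if record["ftp_link"] not in library_links:
            library_links.append(record["ftp_link"])
    return links


def ftp_links_for_samples(sample_to_bank: Dict[str, str],
                          metadata_lines: Iterable[str]) -> Dict[str, List[str]]:
    """Map each sample to its library's links (assumes bank name == library_name)."""
    by_library = ftp_links_by_library(metadata_lines)
    # Samples whose bank has no metadata row get an empty list.
    return {sample: by_library.get(bank, []) for sample, bank in sample_to_bank.items()}
```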
Input_G4.txt:
Tab-delimited file containing the parameters and the bioinformatic steps run by the BIOCOM-PIPE pipeline to extract, process and analyze the controls from the raw libraries detailed in rmqs1_16S_bank_metadata.tsv.
project_G4.tab:
Comma-separated file containing the information needed to generate the Input.txt file with the BIOCOM-PIPE pipeline, for controls only:
PROJECT: Project name chosen by the user
LIBRARY_NAME: Library name chosen by the user
LIBRARY_NAME_RECEIVED: Library name chosen by the sequencing partner and used by BIOCOM-PIPE
SAMPLE_NAME: Sample name chosen by the user
MID_F: MID name or MID sequence associated with the forward primer
MID_R: MID name or MID sequence associated with the reverse primer
TARGET: Target gene (16S, 18S, or 23S)
PRIMER_F: Forward primer name used for amplification
PRIMER_R: Reverse primer name used for amplification
SEQUENCE_PRIMER_F: Forward primer sequence used for amplification
SEQUENCE_PRIMER_R: Reverse primer sequence used for amplification
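Before running BIOCOM-PIPE, a project table can be sanity-checked. The sketch assumes a header row carrying exactly the column names above, and it checks only what the field list states or implies: TARGET is one of 16S, 18S or 23S, and primer sequences use standard IUPAC nucleotide codes. The function name and the checks themselves are illustrative, not part of BIOCOM-PIPE.

```python
import csv
from typing import Dict, Iterable, List

EXPECTED_COLUMNS = [
    "PROJECT", "LIBRARY_NAME", "LIBRARY_NAME_RECEIVED", "SAMPLE_NAME",
    "MID_F", "MID_R", "TARGET", "PRIMER_F", "PRIMER_R",
    "SEQUENCE_PRIMER_F", "SEQUENCE_PRIMER_R",
]
ALLOWED_TARGETS = {"16S", "18S", "23S"}
IUPAC_NUCLEOTIDES = set("ACGTURYSWKMBDHVN")  # standard IUPAC nucleotide codes


def read_project_table(lines: Iterable[str]) -> List[Dict[str, str]]:
    """Parse a comma-separated project table and sanity-check each row."""
    reader = csv.DictReader(lines)  # comma is the csv module's default delimiter
    missing = [c for c in EXPECTED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    rows = []
    for index, row in enumerate(reader, start=1):
        if row["TARGET"] not in ALLOWED_TARGETS:
            raise ValueError(f"row {index}: unexpected TARGET {row['TARGET']!r}")
        for column in ("SEQUENCE_PRIMER_F", "SEQUENCE_PRIMER_R"):
            invalid = set(row[column].upper()) - IUPAC_NUCLEOTIDES
            if invalid:
                raise ValueError(f"row {index}: {column} has non-IUPAC characters {sorted(invalid)}")
        rows.append(row)
    return rows
```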
Input_GLOBAL.txt:
Tab-delimited file containing the parameters and the bioinformatic steps run by the BIOCOM-PIPE pipeline to extract, process and analyze the controls and samples from the raw libraries detailed in rmqs1_16S_bank_metadata.tsv.
project_GLOBAL.tab:
Comma-separated file containing the information needed to generate the Input.txt file for controls and samples with the BIOCOM-PIPE pipeline:
PROJECT: Project name chosen by the user
LIBRARY_NAME: Library name chosen by the user
LIBRARY_NAME_RECEIVED: Library name chosen by the sequencing partner and used by BIOCOM-PIPE
SAMPLE_NAME: Sample name chosen by the user
MID_F: MID name or MID sequence associated with the forward primer
MID_R: MID name or MID sequence associated with the reverse primer
TARGET: Target gene (16S, 18S, or 23S)
PRIMER_F: Forward primer name used for amplification
PRIMER_R: Reverse primer name used for amplification
SEQUENCE_PRIMER_F: Forward primer sequence used for amplification
SEQUENCE_PRIMER_R: Reverse primer sequence used for amplification
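Because project_GLOBAL.tab mixes controls and samples, a useful consistency check is that no two rows of the same received library share the same MID pair for the same target, which would make demultiplexing ambiguous. The sketch below assumes a header row with the columns listed above and that (LIBRARY_NAME_RECEIVED, MID_F, MID_R, TARGET) should identify a single sample; both are assumptions, and the helper name is hypothetical.

```python
import csv
from typing import Dict, Iterable, Tuple


def samples_per_received_library(lines: Iterable[str]) -> Dict[str, int]:
    """Count samples per LIBRARY_NAME_RECEIVED, rejecting ambiguous MID/target combinations."""
    owner: Dict[Tuple[str, str, str, str], str] = {}
    counts: Dict[str, int] = {}
    for row in csv.DictReader(lines):
        library = row["LIBRARY_NAME_RECEIVED"]
        key = (library, row["MID_F"], row["MID_R"], row["TARGET"])
        if key in owner:
            raise ValueError(
                f"{library}: {owner[key]!r} and {row['SAMPLE_NAME']!r} share MIDs "
                f"{row['MID_F']}/{row['MID_R']} for {row['TARGET']}"
            )
        owner[key] = row["SAMPLE_NAME"]
        counts[library] = counts.get(library, 0) + 1
    return counts
```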
Details:
Data from three libraries (58, 59 and 69) were re-sequenced and are not detailed in these files. Some samples may be present in several libraries; in that case, we kept only the library with the highest number of sequences.
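The de-duplication rule can be sketched as follows, assuming per-library sequence counts are available as (sample, library, number of sequences) tuples. That input format, the library labels and the tie-breaking rule (first occurrence wins) are hypothetical; the source does not say how ties were handled.

```python
from typing import Dict, Iterable, Tuple


def keep_best_library(occurrences: Iterable[Tuple[str, str, int]]) -> Dict[str, Tuple[str, int]]:
    """For each sample, keep the (library, sequence count) pair with the most sequences.

    Ties keep the first occurrence seen.
    """
    best: Dict[str, Tuple[str, int]] = {}
    for sample, library, n_sequences in occurrences:
        if sample not in best or n_sequences > best[sample][1]:
            best[sample] = (library, n_sequences)
    return best


if __name__ == "__main__":
    seen = [("S1", "L12", 5200), ("S1", "L58", 8100), ("S2", "L12", 3000)]
    print(keep_best_library(seen))
    # {'S1': ('L58', 8100), 'S2': ('L12', 3000)}
```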