FDA-ARGOS is a database with public quality-controlled reference genomes for diagnostic use and regulatory science

Dataset

FDA-ARGOS project update
FDA-ARGOS database updates may help researchers rapidly validate diagnostic tests and use qualified genetic sequences to support future product development.
As of September 2021, Embleema and George Washington University have been conducting bioinformatic research and system development focused on expanding the FDA-ARGOS database. This project expands the datasets publicly available in FDA-ARGOS, improves quality control by developing quality matrix tools and scoring approaches that will allow the mining of public sequence databases, and identifies high-quality sequences for upload to the FDA-ARGOS database as regulatory-grade sequences. Building on expansions during the COVID-19 pandemic, this project aims to further improve the utility of the FDA-ARGOS database as a key tool for medical countermeasure development and validation. For additional details on project information and assembly QC, see: <ul> <li> <a href="https://www.fda.gov/emergency-preparedness-and-response/mcm-regulatory-science/expanding-next-generation-sequencing-tools-support-pandemic-preparedness-and-response">FDA-ARGOS Project Information</a> </li> <li> <a href="https://data.argosdb.org/">ARGOS Database</a> </li> </ul>
FDA-ARGOS Initial Phase
In May 2014, the FDA and collaborators established a publicly available database for Reference Grade microbial Sequences called FDA-ARGOS. With funding support from FDA's Office of Counterterrorism and Emerging Threats (OCET) and DoD, the FDA-ARGOS team initially collected and sequenced 2000 microbes, including biothreat microorganisms, common clinical pathogens, and closely related species. At the beginning of this project, the FDA-ARGOS microbial genomes were generated in 3 phases.
Generally: <ul> <li>In Phase 1, a previously identified microbe was collected and its nucleic acid extracted.</li> <li>In Phase 2, the microbial nucleic acids were sequenced and de novo assembled using Illumina and PacBio sequencing platforms at the Institute for Genome Sciences at the University of Maryland (UMD-IGS).</li> <li>In Phase 3, the assembled genomes were vetted by an ID-NGS subject matter expert working group consisting of FDA personnel and collaborators, and the data were then deposited in the NCBI databases.</li> </ul> The FDA-ARGOS genomes meet the quality metrics for reference-grade genomes for regulatory use. FDA-ARGOS reference genomes have been de novo assembled with high depth of base coverage and placed within a pre-established phylogenetic tree. Each microbial isolate in the database is covered at a minimum of 20X over 95 percent of the assembled core genome. Furthermore, sample-specific metadata, raw reads, assemblies, annotation, and details of the bioinformatics pipeline are available.
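The coverage criterion stated above (at least 20X depth over 95 percent of the assembled core genome) can be illustrated with a small check. This is a minimal sketch, not the FDA-ARGOS pipeline: the record does not describe how coverage is computed, so the function names `coverage_breadth` and `meets_threshold`, and the assumption that per-position depths for the core genome are already available as a list of integers, are hypothetical.

```python
from typing import Sequence


def coverage_breadth(depths: Sequence[int], min_depth: int = 20) -> float:
    """Return the fraction of positions whose read depth is >= min_depth.

    `depths` is assumed to hold one integer per position of the
    assembled core genome (hypothetical input format).
    """
    if not depths:
        raise ValueError("depths must contain at least one position")
    covered = sum(1 for d in depths if d >= min_depth)
    return covered / len(depths)


def meets_threshold(
    depths: Sequence[int],
    min_depth: int = 20,      # "20X" from the description
    min_breadth: float = 0.95,  # "95 percent" from the description
) -> bool:
    """Check the 20X-over-95-percent criterion for one isolate."""
    return coverage_breadth(depths, min_depth) >= min_breadth


if __name__ == "__main__":
    # Toy example: 96 positions at 25X and 4 positions at 10X.
    toy_depths = [25] * 96 + [10] * 4
    print(coverage_breadth(toy_depths))
    print(meets_threshold(toy_depths))
```

In practice the per-position depths would come from aligning the raw reads back to the assembly; the toy list here only stands in for that input.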

Identifier
Source	https://data.blue-cloud.org/search-details?step=~0120B0065DB0652951C11A52F4C43F72A01C2E10EF4
Metadata Access	https://data.blue-cloud.org/api/collections/0B0065DB0652951C11A52F4C43F72A01C2E10EF4

Provenance
Instrument	Illumina HiSeq 2500; PacBio RS; Illumina MiSeq; Illumina HiSeq 4000; Illumina NovaSeq 6000; Sequel; Sequel II; Illumina HiSeq 2000; ILLUMINA; PACBIO_SMRT
Publisher	Blue-Cloud Data Discovery & Access service; ELIXIR-ENA
Publication Year	2024
OpenAccess	true
Contact	blue-cloud-support(at)maris.nl

Representation
Discipline	Marine Science
Temporal Coverage Begin	1909-01-01T00:00:00Z
Temporal Coverage End	2020-01-19T00:00:00Z