The Glycoside Hydrolase 19 Engineering Database (GH19ED) contains information on protein sequences and structures of glycoside hydrolases from family 19. This dataset lists cross-references to the National Center for Biotechnology Information (NCBI), cross-references to the Protein Data Bank (PDB) and the taxonomic lineage for each sequence entry in the GH19ED.
The tab-separated file comprises nine columns:
(1) the sequence identifier from the GH19ED, integer (Sequence_id),
(2) the protein sequence accessions from the NCBI, semicolon-separated (NCBI_accessions),
(3) the PDB accessions, semicolon-separated (PDB_accessions),
(4) the name of the source or source organism (Source_name),
(5) the NCBI taxonomy identifier for the source (NCBI_taxonomy_id),
(6) the taxonomic lineage from the lowest to the highest rank, as inferred from NCBI taxonomy (Lineage),
(7) the "protein" identifier from the GH19ED, integer (Protein_id),
(8) the "homologous family" (or group) identifier from the GH19ED, integer (Homologous_family_id),
(9) the "superfamily" (or subfamily) identifier from the GH19ED, integer (Superfamily_id).
For sequence entries assigned to more than one source organism name, only the first taxonomic lineage found in the GH19ED is listed.
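The column layout described above can be read with a few lines of Python. The following is a minimal sketch, not part of the GH19ED itself. It assumes that the file starts with a header row using the column names given in parentheses, and that fields are not quoted. The function names and the sample row are hypothetical and serve only to illustrate the parsing. The Lineage column is kept as a raw string because its internal separator is not specified here.

```python
import csv
import io

# Column names as listed in the description (assumed to form the header row).
COLUMNS = [
    "Sequence_id", "NCBI_accessions", "PDB_accessions", "Source_name",
    "NCBI_taxonomy_id", "Lineage", "Protein_id", "Homologous_family_id",
    "Superfamily_id",
]

# Columns that the description states are integers.
INTEGER_COLUMNS = ("Sequence_id", "Protein_id", "Homologous_family_id",
                   "Superfamily_id")


def split_accessions(field):
    """Split a semicolon-separated field, dropping empty items."""
    return [item.strip() for item in field.split(";") if item.strip()]


def read_gh19ed(handle):
    """Yield one dict per sequence entry from a tab-separated text handle."""
    # QUOTE_NONE: treat quote characters in organism names as plain text.
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        entry = dict(row)
        for name in INTEGER_COLUMNS:
            entry[name] = int(row[name])
        entry["NCBI_accessions"] = split_accessions(row["NCBI_accessions"])
        entry["PDB_accessions"] = split_accessions(row["PDB_accessions"])
        yield entry


# Invented placeholder row, for illustration only (not real GH19ED data).
sample = (
    "\t".join(COLUMNS) + "\n"
    + "\t".join(["1", "ACC_A;ACC_B", "", "Example organism", "0",
                 "placeholder lineage", "10", "20", "30"]) + "\n"
)
entries = list(read_gh19ed(io.StringIO(sample)))
print(entries[0]["NCBI_accessions"])
```

To read an actual download, the same function can be passed a file opened with open(path, newline="", encoding="utf-8"), where path is the location of the tab-separated file.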