MD simulations and ML dataset of HLA-EpiCheck epitope predictor tool


This dataset contains all the data used to implement the B-cell epitope predictor tool called HLA-EpiCheck (see https://doi.org/10.1101/2023.12.18.572133).

CONTENTS:
  • pre-patches: Directory containing the computed pre-patches. A pre-patch is the set of residues lying within a given distance of a central residue; the patches are then derived from the pre-patches by keeping only the solvent-accessible residues. Files are organized by locus and by antigen. Each file contains the pre-patches associated with a given central residue, computed for every frame considered in the trajectory.

    • patches_resid__size.txt: Each line in the file corresponds to the pre-patch of a given frame. Line format : : Residue numbering is the same as in the PDB files.

  • trajectories: Directory containing the MD data. Files are organized by locus and by antigen.

    • .dcd: 10 ns MD trajectory comprising 1000 frames. Water molecules were removed.
    • .psf: Topology file of the .dcd trajectory.
    • .pdb: Starting structure of the MD simulation.
  • training_set_size_15.csv: training set used to train HLA-EpiCheck.

  • test_set_size_15.csv: test set used to evaluate HLA-EpiCheck.
  • table_patch_ID_antigen_residue.csv: Table giving the antigen and the central residue associated with each patch.
  • model_ERF_radius_15.pkl: ML model of HLA-EpiCheck in pickle format, serialized with pickle format version 4.0.
  • descriptors_eplets_non-confirmed.csv: Descriptors of the non-confirmed residue patches.
  • preds_non_confirmed_DQ.csv: HLA-EpiCheck predictions on the non-confirmed residue patches of eplets from locus DQ.
  • PDB_modeled_structures.txt: List of the antigens modeled from a PDB structure, with the corresponding PDB entry.
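The pre-patch and patch definitions above can be illustrated with a minimal pure-Python sketch. It assumes that each residue is represented by a single coordinate (for instance its C-alpha atom), a radius of 15 (inferred only from the `15` in the file names, with the unit assumed to be Å), and a hypothetical relative solvent accessibility cutoff of 0.25. None of these choices is documented in this dataset description, and the function names are hypothetical.

```python
import math


def pre_patch(coords, center, radius=15.0):
    """Return the sorted residue numbers lying within `radius` of `center`.

    coords maps residue number -> (x, y, z), one representative point per
    residue (hypothetical choice, e.g. the C-alpha atom).
    """
    origin = coords[center]
    return sorted(
        resid for resid, xyz in coords.items()
        if math.dist(xyz, origin) <= radius
    )


def patch(pre, rsa, threshold=0.25):
    """Keep only the solvent-accessible residues of a pre-patch.

    rsa maps residue number -> relative solvent accessibility in [0, 1];
    the 0.25 cutoff is a hypothetical value, not taken from HLA-EpiCheck.
    """
    return [resid for resid in pre if rsa.get(resid, 0.0) >= threshold]


def pre_patches_per_frame(frames, center, radius=15.0):
    """Compute one pre-patch per frame; frames is a list of coords dicts."""
    return [pre_patch(coords, center, radius) for coords in frames]
```

Computing one pre-patch per frame mirrors the one-line-per-frame layout described for the `patches_resid__size.txt` files. In practice the coordinates would be read from the `.psf`/`.dcd` trajectories and the accessibility values computed with a tool such as VMD's `measure sasa`, but how the authors performed these steps is not stated here.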

Software
VMD, 1.9.4a57

namd3, 3.0alpha9_gcc-8.3.0

Python, 3.9.13

Scikit-learn, 1.0.2

pickle, 4.0

Identifier
DOI https://doi.org/10.57745/GXZHH8
Related Identifier https://doi.org/10.1101/2023.12.18.572133
Metadata Access https://entrepot.recherche.data.gouv.fr/oai?verb=GetRecord&metadataPrefix=oai_datacite&identifier=doi:10.57745/GXZHH8
Provenance
Creator AMAYA RAMIREZ, Diego; DEVRIESE, Magali; USUREAU, Cedric; SMAIL-TABBONE, Malika; TAUPIN, Jean-Luc (ORCID: 0000-0002-5766-046X); DEVIGNES, Marie-Dominique
Publisher Recherche Data Gouv
Contributor AMAYA RAMIREZ, Diego; RINGOT, Patrice; Institut National de Recherche en Informatique et Automatique; Institut National de la Santé et de la Recherche Médicale; Université de Lorraine; Centre National de la Recherche Scientifique; Entrepôt-Catalogue Recherche Data Gouv
Publication Year 2023
Funding Reference ANR ANR-22-CE15-0036-03
Rights info:eu-repo/semantics/openAccess
OpenAccess true
Contact AMAYA RAMIREZ, Diego (CAPSID team ; LORIA ; CNRS, INRIA, Université de Lorraine)
Representation
Resource Type Dataset
Format application/octet-stream; text/plain; text/tab-separated-values; application/zip
Size (bytes) 15067438; 1166; 30568; 268980801; 4864; 890142; 332915; 1333826; 13820744113
Version 1.0
Discipline Computer Science; Life Sciences; Medicine