This dataset was used to train a quantitative structure-activity relationship (QSAR) model that predicts skin sensitization according to the bone marrow-derived dendritic cell (BMDC) assay.
File: skin_sens_bmdc.sdf
The dataset is provided as one file, skin_sens_bmdc.sdf, in MDL SDF format (see, for instance, https://discover.3ds.com/sites/default/files/2020-08/biovia_ctfileformats_2020.pdf). Descriptions of the fields are given below:
SMILES (string)
SMILES (Simplified Molecular Input Line Entry System) is a representation of a molecule in string format.
CAS number (string)
The CAS (Chemical Abstracts Service) Registry Number is a unique and unambiguous identifier of a molecule or a substance.
Compound_name (string)
A common chemical name of a compound or a substance.
LLNA_potency (nominal)
Skin sensitization potency category according to the local lymph node assay (LLNA). Levels are: NS (non-sensitizer), Weak, Moderate, Strong, Extreme.
LLNA_class (integer)
Binary skin sensitization classification based on LLNA assay: 0 - non-sensitizer; 1 - sensitizer.
BMDC_class (integer)
Binary skin sensitization classification based on BMDC assay: 0 - non-sensitizer; 1 - sensitizer.
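The data fields listed above can be read without any cheminformatics toolkit, because an SDF record stores each field as a "> <NAME>" header line followed by its value and a blank line, and ends with a "$$$$" line. The following is a minimal sketch in Python using only the standard library. The function name iter_sdf_fields and the SAMPLE record are hypothetical: the sample values are made up for illustration and are not taken from the dataset. The sketch ignores the structure block entirely, so a toolkit such as RDKit would still be needed to work with the molecules themselves.

```python
import re
from typing import Dict, Iterator, Optional

# Data header lines look like ">  <FIELD_NAME>" (possibly with extra tokens).
_DATA_HEADER = re.compile(r"^>.*?<([^<>]+)>")


def iter_sdf_fields(text: str) -> Iterator[Dict[str, str]]:
    """Yield one {field name: value} dict per record; structure blocks are skipped."""
    fields: Dict[str, str] = {}
    current: Optional[str] = None   # name of the data field being read
    in_data = False                 # True once the structure block ("M  END") is over
    for line in text.splitlines():
        if line.strip() == "$$$$":  # record separator
            yield fields
            fields, current, in_data = {}, None, False
        elif not in_data:
            in_data = line.startswith("M  END")
        elif current is None:
            match = _DATA_HEADER.match(line)
            if match:
                current = match.group(1)
                fields[current] = ""
        elif line.strip() == "":    # a blank line closes the current data item
            current = None
        else:                       # value line; multi-line values are joined
            fields[current] = f"{fields[current]}\n{line}".lstrip("\n")


# Made-up record for illustration only (NOT from skin_sens_bmdc.sdf).
SAMPLE = """\
example compound
  hypothetical record

  0  0  0  0  0  0  0  0  0  0999 V2000
M  END
>  <SMILES>
CCO

>  <Compound_name>
example compound

>  <LLNA_potency>
NS

>  <LLNA_class>
0

>  <BMDC_class>
0

$$$$
"""

records = list(iter_sdf_fields(SAMPLE))
for record in records:
    print(record["Compound_name"], record["LLNA_potency"], record["BMDC_class"])
```

To apply the same function to the real file, read it as text first, for example with open("skin_sens_bmdc.sdf") and pass the contents to iter_sdf_fields. All values are returned as strings, so the integer fields need an explicit int() conversion.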
File: All_Sensitization_Labels.sdf
The file All_Sensitization_Labels.sdf contains all compounds with their sensitization labels, as interpreted from the different endpoints. For logical variables, 0 means "non-sensitizing" and 1 means "sensitizing".
Compound_name (string)
A common chemical name of a compound or a substance
LLNA_Call_ICE (string)
LLNA call from the ICE database
LLNA_pEC3_ICE (float)
LLNA pEC3 from the ICE database
Class_LLNA (logical, 0 or 1)
Sensitization class for LLNA
Class_LuSens_ICE (logical, 0 or 1)
Sensitization class for LuSens
Class_U-SENS_ICE (logical, 0 or 1)
Sensitization class for U-SENS
Class_hCLAT_ICE (logical, 0 or 1)
Sensitization class for hCLAT
Class_mMUSST_ICE (logical, 0 or 1)
Sensitization class for mMUSST
Class_DPRA_ICE (logical, 0 or 1)
Sensitization class for DPRA
Class_KeratinoSens_ICE (logical, 0 or 1)
Sensitization class for KeratinoSens
Class_BMDC (logical, 0 or 1)
Sensitization class for BMDC
Prediction_PredSkin (logical, 0 or 1)
Sensitization class predicted by the PredSkin model
Class_Human_PredSkin (logical, 0 or 1)
Sensitization class for human from PredSkin dataset
Class_LLNA_PredSkin (logical, 0 or 1)
Sensitization class for LLNA from PredSkin dataset
Confidence_PredSkin (nominal, "Low", "Medium", "High")
Confidence of the sensitization class predicted by the PredSkin model. All entries in this file have the confidence "High".
CAS (string)
Chemical Abstracts Service Registry Number of the molecule or substance
CAS_Tropsha (string)
Chemical Abstracts Service Registry Number of the molecule or substance according to the PredSkin training set
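The logical fields above use 0 for "non-sensitizing" and 1 for "sensitizing". A small sketch of how such fields might be interpreted once read as strings is given below. It assumes, without confirmation from the file itself, that a compound without a result for a given assay has an empty or absent field; the helper names parse_label and label_counts and the example records are hypothetical and made up for illustration.

```python
from typing import Dict, List, Optional


def parse_label(raw: Optional[str]) -> Optional[bool]:
    """Map 1 to True (sensitizing), 0 to False (non-sensitizing), anything else to None."""
    if raw is None:
        return None
    try:
        number = float(raw.strip())
    except ValueError:              # empty or non-numeric text counts as missing
        return None
    if number == 1:
        return True
    if number == 0:
        return False
    return None


def label_counts(records: List[Dict[str, str]], field: str) -> Dict[str, int]:
    """Count sensitizing, non-sensitizing and missing labels for one field."""
    counts = {"sensitizing": 0, "non-sensitizing": 0, "missing": 0}
    for record in records:
        label = parse_label(record.get(field))
        if label is None:
            counts["missing"] += 1
        elif label:
            counts["sensitizing"] += 1
        else:
            counts["non-sensitizing"] += 1
    return counts


# Made-up records for illustration only (NOT taken from the dataset).
example_records = [
    {"Class_LLNA": "1", "Class_DPRA_ICE": "0"},
    {"Class_LLNA": "0"},                          # field absent
    {"Class_LLNA": "1", "Class_DPRA_ICE": ""},    # field present but empty
]

print(label_counts(example_records, "Class_LLNA"))
print(label_counts(example_records, "Class_DPRA_ICE"))
```

Keeping "missing" separate from "non-sensitizing" matters here, since a compound that was never tested in an assay should not be counted as a negative result.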
File: data_preparation_and_analysis.knwf
The file data_preparation_and_analysis.knwf is a workflow for the KNIME software version 4.6.5 and later (https://www.knime.com/). The workflow was developed and used for pairwise comparison of different skin sensitization assay labels. It is shipped with the raw data.
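For readers without KNIME, the idea of a pairwise comparison of assay labels can be sketched in a few lines of Python. This is not a port of the workflow and may differ from what it actually computes: it simply builds a 2x2 table and the fraction of agreeing labels for each pair of fields, using only compounds that have a label in both. The function name compare_labels and the example records are hypothetical, and the values are made up for illustration.

```python
from itertools import combinations
from typing import Dict, List, Optional


def compare_labels(records: List[Dict[str, Optional[int]]],
                   field_a: str, field_b: str) -> dict:
    """Cross-tabulate two 0/1 label fields over compounds labelled in both."""
    table = {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 0}
    for record in records:
        a, b = record.get(field_a), record.get(field_b)
        if a in (0, 1) and b in (0, 1):   # skip compounds missing either label
            table[(a, b)] += 1
    n = sum(table.values())
    agreement = (table[(0, 0)] + table[(1, 1)]) / n if n else None
    return {"n": n, "table": table, "agreement": agreement}


# Made-up labels for illustration only (NOT taken from the dataset).
example_records = [
    {"Class_LLNA": 1, "Class_BMDC": 1},
    {"Class_LLNA": 1, "Class_BMDC": 0},
    {"Class_LLNA": 0, "Class_BMDC": 0},
    {"Class_LLNA": 0, "Class_BMDC": None},
    {"Class_LLNA": 1, "Class_BMDC": 1, "Class_DPRA_ICE": 1},
]

fields = ["Class_LLNA", "Class_BMDC", "Class_DPRA_ICE"]
results = {
    pair: compare_labels(example_records, *pair)
    for pair in combinations(fields, 2)
}
for pair, result in results.items():
    print(pair, result["n"], result["agreement"])
```

Raw agreement is the simplest possible summary; with imbalanced classes, measures such as balanced accuracy or Cohen's kappa computed from the same 2x2 table are usually more informative.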
Updates:
22/02/2024 - LLNA label of 4-acetoxybenzoic acid changed from 1 (old) to 0 (new).