IMPORTANT:
The valid files are 11112024_AcqParams_v2.xlsx, 11112024_BDMetaData_v2.xlsx, DICTIONARY_RADIOLUNG_v2.docx, and all the associated CT files.
This prospective cohort study, initiated in December 2019, evaluates pulmonary nodules (PN) in patients who underwent surgery for them, either in routine practice or as part of Lung Cancer Screening (LCS). To determine the malignancy and histopathological type of each PN, a biopsy was performed.
Informed consent was obtained from all subjects involved in the study. The study was conducted according to the guidelines of the Declaration of Helsinki-Fortaleza/Brazil, 2013, and approved by the Institutional Review Board of Hospital Universitari Germans Trias i Pujol (protocol code PI-19-169 and date 6 September 2019).
The dataset contains 4 types of files:
- NII.GZ files: Anonymized 3D volumes from CT scans.
- ACSV files: Text files generated by 3D Slicer, each associated with an individual NII.GZ file, which contain the coordinates and size of the volume that encloses the nodule found in the CT.
- XLSX files: One spreadsheet with the acquisition parameters of each CT (11112024_AcqParams), and one spreadsheet with the metadata for each patient, CT scan, and nodule (11112024_BDMetaData). The description and possible values of each column are given in a dictionary in DOCX format.
- DOCX files: Text file with the description and possible values of the metadata for the patients, CT scans, and nodules included in the dataset.
XLSX and DOCX files are found in the main folder of the dataset.
The anonymized CT scans, and their corresponding ACSV files, are stored in separate folders, using the patient identifier as the folder name. These folders are stored inside the folder “CT”.
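The folder layout described above can be traversed with a short script. The following is only a minimal sketch: the function name `index_ct_folder` is hypothetical, and it assumes that the volumes end in `.nii.gz` and the annotations in `.acsv` (lowercase), which should be checked against the actual files.

```python
from pathlib import Path


def index_ct_folder(ct_root):
    """Map each patient folder name to its NIfTI volumes and ACSV files.

    Assumes one sub-folder per patient inside ct_root (the "CT" folder),
    with lowercase .nii.gz and .acsv extensions.
    """
    index = {}
    for patient_dir in sorted(Path(ct_root).iterdir()):
        if not patient_dir.is_dir():
            continue  # skip any stray files next to the patient folders
        index[patient_dir.name] = {
            "volumes": sorted(patient_dir.glob("*.nii.gz")),
            "annotations": sorted(patient_dir.glob("*.acsv")),
        }
    return index
```

Calling `index_ct_folder("CT")` from the main folder of the dataset would return a dictionary keyed by patient identifier.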
The CT scans were acquired using standardized parameters, including 120 kV, 100-350 mA (dose modulation range), soft tissue reconstructions, and high-frequency algorithms. For more detailed information, please refer to the article titled “An Intelligent Radiomic Approach for Lung Cancer Screening” (full reference provided above).
The database contains anonymized CT scans in NIfTI format, along with the precise location of the nodules within the scans. These locations were marked by a respiratory medicine physician with seven years of experience using 3D Slicer software (version 4.11.20210226). The physician used this tool to define a Volume of Interest (VOI) that encloses each nodule, which produced a corresponding .ACSV file for each PN.
For instance, the file R_1.acsv, generated with 3D Slicer, contains a VOI represented by the following lines:
line 24. # pointColumns = type|x|y|z|sel|vis
line 25. point|82.3079|-85.7626|102.94|1|1
line 26. point|11.9305|16.9547|10.8962|1|1
Line 24 names the columns of the subsequent lines. The VOI is represented by a central point (x, y, z), given in Line 25, and a half-size along each axis, given in Line 26, which is the shift applied in both directions (+/-). The VOI is therefore delimited by the following two points, obtained by subtracting the shift from the central point and adding it to the central point:
Point1 = (70.3774, −102.7173, 92.0438)
Point2 = (94.2384, −68.8079, 113.8362)
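The calculation of the two points can be reproduced with a few lines of Python. This is a minimal sketch, not the tool used to build the dataset: the function names `parse_acsv_voi` and `voi_corners` are hypothetical, and the parser assumes that the first `point|` row is the center and the second is the per-axis shift, as in the R_1.acsv excerpt.

```python
def parse_acsv_voi(text):
    """Return (center, half_size) from the first two 'point|...' rows."""
    points = []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("point|"):
            fields = line.split("|")
            # Columns are type|x|y|z|sel|vis, so x, y, z are fields 1 to 3.
            points.append(tuple(float(v) for v in fields[1:4]))
    if len(points) < 2:
        raise ValueError("expected a center row and a half-size row")
    return points[0], points[1]


def voi_corners(center, half_size):
    """Return the two opposite corners (center - shift, center + shift)."""
    # abs() treats the shift as a magnitude, whatever its stored sign.
    low = tuple(c - abs(h) for c, h in zip(center, half_size))
    high = tuple(c + abs(h) for c, h in zip(center, half_size))
    return low, high


sample = """# pointColumns = type|x|y|z|sel|vis
point|82.3079|-85.7626|102.94|1|1
point|11.9305|16.9547|10.8962|1|1"""

center, half_size = parse_acsv_voi(sample)
point1, point2 = voi_corners(center, half_size)
print([round(v, 4) for v in point1])  # compare with Point1
print([round(v, 4) for v in point2])  # compare with Point2
```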
These points define the bounding box that encloses the PN in the image. They are given in the world (scanner) coordinate system of the CT scan and must be mapped to voxel coordinates before the VOI of the nodule can be extracted. The affine matrix stored in the NIfTI file maps voxel indices to world coordinates, so its inverse is used to transform from the scanner coordinate system to the voxel coordinate system. Once the mapping is performed, the bounding box can be used to extract the pulmonary nodule from the CT scan.
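The mapping from world to voxel coordinates can be sketched with NumPy as follows. The function name `world_box_to_voxel_slices` and the affine used in the demonstration are hypothetical and for illustration only; the real affine must be read from each NIfTI file. The sketch also assumes that the ACSV coordinates and the affine use the same world convention, which should be verified on a few cases before relying on it.

```python
import itertools

import numpy as np


def world_box_to_voxel_slices(low, high, affine, shape):
    """Convert a world-space box into per-axis voxel slices, clipped to shape."""
    # Use all 8 corners so that axis flips or rotations in the affine are handled.
    corners = np.array(list(itertools.product(*zip(low, high))), dtype=float)
    homogeneous = np.c_[corners, np.ones(len(corners))]
    # The affine maps voxel -> world, so its inverse maps world -> voxel.
    voxels = (np.linalg.inv(affine) @ homogeneous.T).T[:, :3]
    start = np.clip(np.floor(voxels.min(axis=0)).astype(int), 0, shape)
    stop = np.clip(np.ceil(voxels.max(axis=0)).astype(int) + 1, 0, shape)
    return tuple(slice(int(a), int(b)) for a, b in zip(start, stop))


# Hypothetical affine and volume shape, NOT taken from the dataset.
demo_affine = np.array([
    [-0.7, 0.0, 0.0, 180.0],
    [0.0, -0.7, 0.0, 150.0],
    [0.0, 0.0, 1.0, -50.0],
    [0.0, 0.0, 0.0, 1.0],
])
demo_shape = (512, 512, 300)

point1 = (70.3774, -102.7173, 92.0438)
point2 = (94.2384, -68.8079, 113.8362)
slices = world_box_to_voxel_slices(point1, point2, demo_affine, demo_shape)
print(slices)
```

With a library such as nibabel, the affine and the voxel array would typically come from `img = nibabel.load(path)`, `img.affine`, and `img.get_fdata()`, and the nodule sub-volume would then be `data[slices]`.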