R-TF-028-003 Data Collection Instructions - Prospective Data
Context
To supplement the large-scale retrospective dataset and ensure our algorithms are trained on contemporary, high-quality clinical data, a prospective data collection study was designed. Prospective collection allows for greater control over the acquisition process and ensures the inclusion of both clinical and dermoscopic images for the same lesions, reflecting modern dermatological practice.
Furthermore, the study was designed to capture not just a single diagnostic label but a ranked list of differential diagnoses. This provides richer data that reflects the clinical decision-making process and is suitable for advanced model training techniques, such as using soft labels.
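As an illustration of how a ranked differential could be turned into soft labels, the following is a minimal sketch. The function name `soft_labels`, the placeholder codes, and the reciprocal-rank weighting are assumptions made for this example; the protocol does not prescribe a weighting scheme.

```python
from typing import Dict, List


def soft_labels(ranked_codes: List[str]) -> Dict[str, float]:
    """Map a ranked list of 1-3 diagnosis codes to a probability distribution.

    Assumption: weights are proportional to 1/rank. This scheme is
    illustrative only and is not specified by the study protocol.
    """
    if not 1 <= len(ranked_codes) <= 3:
        raise ValueError("expected between one and three ranked diagnoses")
    if len(set(ranked_codes)) != len(ranked_codes):
        raise ValueError("diagnosis codes must be unique")
    # Rank 1 is the primary (most certain) diagnosis.
    raw = [1.0 / rank for rank in range(1, len(ranked_codes) + 1)]
    total = sum(raw)
    return {code: weight / total for code, weight in zip(ranked_codes, raw)}


# Placeholder codes, not real ICD-11 codes.
labels = soft_labels(["CODE_A", "CODE_B", "CODE_C"])
```

A single recorded diagnosis yields a one-hot label, so cases with no recorded uncertainty behave like conventional hard labels under this scheme.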
This document describes the protocol for the prospective, observational data collection study conducted at the dermatology service of the Hospital Universitario de Torrejón.
Objectives
The primary objectives of this prospective data collection are:
- To collect a high-quality dataset of clinical and dermoscopic images from a real-world, routine clinical setting.
- To minimize selection bias by employing a consecutive enrollment ("all-comers") strategy for patient recruitment.
- To establish a robust ground truth for each case by capturing the diagnostic assessment from qualified dermatologists.
- To capture the diagnostic thought process and clinical uncertainty by collecting a ranked list of up to three differential diagnoses for each case.
- To create a sequestered test set from a portion of this prospectively collected data for the final, unbiased validation of the AI/ML models.
Population
Recruitment
Participants were recruited from the patient population attending the dermatology service at the Hospital Universitario de Torrejón. Recruitment was conducted on a consecutive enrollment ("all-comers") basis: every patient who met the eligibility criteria was invited to participate, so that the dataset is representative of the general patient population of the service.
Ethics
- The study protocol was submitted to and approved by the Institutional Review Board / Comité de Ética de la Investigación (CEIm) of the Hospital Universitario de Torrejón prior to the enrollment of the first participant.
- All participants provided written Informed Consent before any study-related images or data were collected. The consent process ensured that participants were fully aware of the study's purpose, the data being collected, and how their de-identified data would be used.
- All data were de-identified at the point of collection and handled in strict compliance with the EU General Data Protection Regulation (GDPR).
Inclusion Criteria
- Patients attending the dermatology clinic for assessment of a skin condition.
- Patients aged 18 years or older.
- Patients who were able and willing to provide written informed consent.
Exclusion Criteria
- Patients unable or unwilling to provide informed consent.
- Patients presenting with conditions or in situations that, in the investigator's opinion, would prevent high-quality image capture.
Design
The study is a prospective, single-center, observational data collection study. A cohort of approximately 500 participants was enrolled. A pre-defined subset of this cohort, selected randomly at the patient level, is reserved exclusively as a test set for final model validation.
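A patient-level random split can be sketched as follows. The function name `split_patients`, the test fraction, and the seed are hypothetical values chosen for the example; the actual fraction and randomization procedure are not stated in this document.

```python
import random
from typing import Iterable, Set, Tuple


def split_patients(
    patient_ids: Iterable[str], test_fraction: float, seed: int
) -> Tuple[Set[str], Set[str]]:
    """Randomly assign whole patients to a development set or a test set."""
    # De-duplicate and sort so the result does not depend on input order.
    unique_ids = sorted(set(patient_ids))
    rng = random.Random(seed)  # fixed seed makes the split reproducible
    rng.shuffle(unique_ids)
    n_test = round(len(unique_ids) * test_fraction)
    test_ids = set(unique_ids[:n_test])
    train_ids = set(unique_ids[n_test:])
    return train_ids, test_ids


# Hypothetical identifiers for a cohort of 500 participants.
ids = [f"P{i:03d}" for i in range(500)]
train_ids, test_ids = split_patients(ids, test_fraction=0.2, seed=42)
```

Splitting on patient identifiers rather than on individual images keeps every image of a given patient on the same side of the split, which avoids leakage between the development and test sets.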
The workflow for each participant was as follows:
- Patient attends their standard dermatology appointment.
- The attending dermatologist assesses eligibility for the study.
- The study is explained to the patient, who is given the opportunity to ask questions.
- If the patient agrees, written informed consent is obtained.
- Study-specific image acquisition is performed during the standard consultation.
- The patient's standard care pathway is not altered in any way.
Acquisition Protocol
Operator
All images were acquired by qualified dermatologists on the staff of the Hospital Universitario de Torrejón during routine clinical consultations.
Acquisition Procedure
For each enrolled participant, the dermatologist captured between 3 and 5 high-resolution images of the relevant skin lesion(s). The image set was specified to include:
- Clinical Images: Standard macroscopic photographs of the lesion.
- Dermoscopic Images: Close-up, magnified images of the lesion taken with a dermatoscope, where clinically indicated for that specific type of lesion.
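The image-set specification above could be checked at ingestion with a small helper such as the one below. This is a sketch under assumptions: the kind labels `"clinical"` and `"dermoscopic"`, the function name `check_image_set`, and the rule that at least one clinical image must be present are illustrative and not taken from the study tooling.

```python
from typing import List

# Hypothetical labels for the two image types described in the protocol.
ALLOWED_KINDS = {"clinical", "dermoscopic"}


def check_image_set(kinds: List[str]) -> List[str]:
    """Return a list of problems found in one participant's image set."""
    problems = []
    if not 3 <= len(kinds) <= 5:
        problems.append(f"expected 3 to 5 images, got {len(kinds)}")
    unknown = sorted(set(kinds) - ALLOWED_KINDS)
    if unknown:
        problems.append(f"unknown image kinds: {unknown}")
    # Assumption: dermoscopic images are optional (taken only where
    # clinically indicated), but a clinical image is always expected.
    if "clinical" not in kinds:
        problems.append("no clinical image present")
    return problems


issues = check_image_set(["clinical", "clinical", "dermoscopic"])
```

An empty list means the set satisfies the checks; a non-empty list can be logged for follow-up with the collecting site.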
Collection Protocol
The data collection and handling process was executed as follows:
- Upon obtaining informed consent, a unique, anonymized patient identifier was generated for the participant.
- The dermatologist acquired images as specified in the Acquisition Protocol (Section 5).
- Following the examination, the attending dermatologist recorded their diagnostic assessment in the electronic Case Report Form (eCRF). This assessment included:
  - A primary diagnosis (the most certain diagnosis).
  - Up to two additional differential diagnoses, listed in descending order of certainty.
- All diagnoses were recorded using ICD-11 codes.
- Other required metadata (e.g., anonymized demographics, lesion location) were also recorded in the eCRF.
- At regular intervals, the set of de-identified images and the corresponding metadata file were securely transferred to AI Labs Group S.L.'s secure research environment for ingestion.
Collected Data Include:
- De-identified image files (JPG/PNG).
- A metadata file (CSV) containing the anonymized patient ID, a ranked list of one to three differential diagnoses (ICD-11), demographics, and lesion information for each case.
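A basic consistency check of the metadata file might look like the sketch below. The column names (`patient_id`, `dx_1`, `dx_2`, `dx_3`, `lesion_location`), the placeholder codes, and the function name `validate_row` are assumptions for illustration; the actual CSV schema is not defined in this document.

```python
import csv
import io
from typing import Dict, List

# Hypothetical column names for the ranked diagnoses (rank 1 is primary).
DIAGNOSIS_COLUMNS = ["dx_1", "dx_2", "dx_3"]


def validate_row(row: Dict[str, str]) -> List[str]:
    """Return a list of problems found in one metadata row."""
    problems = []
    if not (row.get("patient_id") or "").strip():
        problems.append("missing patient_id")
    codes = [(row.get(col) or "").strip() for col in DIAGNOSIS_COLUMNS]
    if not codes[0]:
        problems.append("missing primary diagnosis")
    # Ranks must be filled without gaps, e.g. dx_3 requires dx_2.
    seen_empty = False
    for col, code in zip(DIAGNOSIS_COLUMNS, codes):
        if not code:
            seen_empty = True
        elif seen_empty:
            problems.append(f"{col} is set but a higher-ranked diagnosis is empty")
    filled = [code for code in codes if code]
    if len(filled) != len(set(filled)):
        problems.append("duplicate diagnosis codes")
    return problems


# In-memory sample with placeholder values; a real file would be opened instead.
sample = io.StringIO(
    "patient_id,dx_1,dx_2,dx_3,lesion_location\n"
    "P001,CODE_A,CODE_B,,forearm\n"
    "P002,,CODE_C,,back\n"
)
report = {row["patient_id"]: validate_row(row) for row in csv.DictReader(sample)}
```

Here `report` maps each patient identifier to its list of problems, so rows with an empty list can proceed to ingestion and the rest can be returned for correction.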
Other Specifications
- To ensure the dataset reflects a real-world clinical population, no filtering of patients was applied beyond the defined inclusion/exclusion criteria.
- The choice of which lesions to photograph was left to the clinical judgment of the attending dermatologist based on the patient's presentation.
Signature meaning
The signatures for the approval process of this document can be found in the verified commits at the repository for the QMS. As a reference, the team members who are expected to participate in this document and their roles in the approval process, as defined in Annex I Responsibility Matrix of the GP-001, are:
- Author: Team members involved
- Reviewer: JD-003, JD-004
- Approver: JD-001