R-TF-028-003 Data Collection Instructions - Custom Gathered Data
Purpose and Scope
This document defines the systematic protocol for the collection of dermatological images and associated clinical metadata from custom data gathering activities conducted by or for AI Labs Group S.L. This protocol forms part of the data acquisition strategy for the development, validation, and continuous improvement of the AI algorithms integrated into Legit.Health Plus.
Custom gathered data serves as a critical complement to retrospective archive data, providing controlled, high-quality datasets from known clinical contexts with standardized acquisition protocols. These datasets are sourced from clinical validation studies of the medical device and from prospective data acquisition studies specifically designed for algorithm training and testing purposes.
Context and Rationale
The development of clinically safe, effective, and generalizable AI algorithms requires not only large-scale retrospective data but also carefully curated prospective data collected under controlled conditions [1-3]. Custom gathered data offers several key advantages:
- Acquisition Control: Standardized imaging protocols ensure consistent image quality, resolution, and metadata completeness.
- Ground Truth Quality: Direct access to clinical workflows enables the collection of robust diagnostic labels, including differential diagnoses and diagnostic confidence levels.
- Real-World Clinical Context: Data collected during actual clinical practice or under realistic clinical scenarios ensures ecological validity and reflects the intended use environment [4, 5].
- Regulatory Compliance: Custom data collection enables full control over informed consent, data protection, and ethical oversight, ensuring compliance with MDR 2017/745 and GDPR requirements.
- Performance Validation: Data from clinical validation studies provides an independent assessment of device performance in real-world conditions.
This approach ensures the creation of high-quality, well-characterized datasets that enhance model robustness, support regulatory requirements, and enable continuous device improvement.
Objectives
The primary objectives of this custom data collection protocol are:
- Controlled Data Acquisition: To collect dermatological images and metadata under standardized protocols that ensure high quality, completeness, and consistency.
- Clinical Validation Support: To gather data during clinical validation studies, enabling assessment of device performance in real-world clinical settings.
- Algorithm Enhancement: To acquire targeted datasets for specific diagnostic categories, patient demographics, or imaging conditions to address performance gaps or expand device capabilities.
- Ground Truth Establishment: To establish robust diagnostic ground truth through expert clinical assessment, differential diagnoses, and where applicable, histopathological confirmation [6].
- Regulatory Compliance: To execute all data collection activities in full compliance with ethical requirements (informed consent, IRB/CEIm approval), data protection regulations (GDPR), and quality management system procedures.
Data Sources and Study Types
Custom gathered data is collected through two primary mechanisms, with distinct intended uses for algorithm development:
Clinical Validation Studies
Data collected during clinical validation studies has a strictly limited use:
- Performance Validation: The sole purpose is to assess the clinical performance of the device in real-world settings by comparing device outputs to reference standard diagnoses.
- Independent Test/Validation Sets Only: Data from clinical validation studies is used exclusively for independent testing and validation of algorithm performance.
- Regulatory Compliance: This separation ensures unbiased performance assessment and compliance with regulatory expectations for independent validation datasets.
Clinical validation studies are conducted in accordance with the device's Clinical Evaluation Plan and comply with all applicable regulatory and ethical requirements.
Dedicated Prospective Data Acquisition Studies
These are prospective observational studies designed specifically for data collection and conducted without intervention of the medical device. These studies:
- Are designed to capture specific types of data needed for algorithm development (e.g., rare diagnoses, specific skin types, particular imaging conditions).
- Allow flexible dataset use: the data collected may be used for algorithm training, validation, and testing, as needed for algorithm development and improvement.
- Follow standardized acquisition protocols to ensure data quality and consistency.
- Require IRB/CEIm approval and informed consent from all participants.
- Do not involve any intervention, treatment, or modification of the patient's standard care pathway.
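The separation between the two study types can be enforced programmatically before any dataset is assembled. The following is a minimal sketch, not the procedure used by AI Labs Group S.L.: the names `Source`, `Use`, `ALLOWED_USES`, and `violations` are hypothetical, and the sketch assumes that "validation" for clinical validation study data means independent performance evaluation rather than model tuning.

```python
from enum import Enum


class Source(Enum):
    CLINICAL_VALIDATION = "clinical_validation"
    PROSPECTIVE_ACQUISITION = "prospective_acquisition"


class Use(Enum):
    TRAINING = "training"
    INDEPENDENT_EVALUATION = "independent_evaluation"


# Assumed encoding of the rules described above (illustrative names only):
# clinical validation data is never used for training.
ALLOWED_USES = {
    Source.CLINICAL_VALIDATION: frozenset({Use.INDEPENDENT_EVALUATION}),
    Source.PROSPECTIVE_ACQUISITION: frozenset(
        {Use.TRAINING, Use.INDEPENDENT_EVALUATION}
    ),
}


def is_allowed(source: Source, use: Use) -> bool:
    """Return True if data from `source` may be put to `use`."""
    return use in ALLOWED_USES[source]


def violations(cases: list[tuple[str, Source]], use: Use) -> list[str]:
    """Return the IDs of cases whose source does not permit the intended use."""
    return [case_id for case_id, source in cases if not is_allowed(source, use)]
```

Running `violations(...)` with `Use.TRAINING` over a candidate training set would list any clinical validation cases that slipped in, so the set can be rejected before training starts.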
Data Population Characteristics
Recruitment Strategy
Participants are recruited from dermatology clinics and healthcare institutions where AI Labs Group S.L. conducts clinical validation or prospective data acquisition studies. Recruitment strategies vary by study type:
- Clinical Validation Studies: Consecutive enrollment ("all-comers") strategy to ensure representative patient populations and minimize selection bias [7].
- Prospective Data Acquisition Studies: May employ stratified or purposive sampling to ensure adequate representation of specific diagnostic categories, demographics, or clinical presentations.
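For stratified or purposive sampling, enrollment can be tracked against per-stratum targets. The sketch below is illustrative only: this protocol does not define strata or target counts, so the function name `remaining_quota`, the stratum labels, and the numbers in the usage note are all assumptions.

```python
from collections import Counter


def remaining_quota(
    enrolled_strata: list[str], targets: dict[str, int]
) -> dict[str, int]:
    """Return how many more participants each stratum still needs.

    `enrolled_strata` holds one stratum label per enrolled participant;
    `targets` maps each stratum label to its (hypothetical) target count.
    Strata that have met or exceeded their target report 0.
    """
    counts = Counter(enrolled_strata)
    return {
        stratum: max(target - counts[stratum], 0)
        for stratum, target in targets.items()
    }
```

For example, with hypothetical targets grouped by Fitzpatrick phototype, `remaining_quota(["I-II", "V-VI"], {"I-II": 2, "III-IV": 2, "V-VI": 3})` reports one more case needed for I-II, two for III-IV, and two for V-VI.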
Participating Institutions
Data collection is conducted at qualified healthcare institutions, including:
- Academic medical centers with dermatology departments.
- Community dermatology clinics with experienced dermatologists.
- Specialized skin disease clinics.
All participating institutions are selected based on:
- Availability of qualified dermatologists.
- Appropriate clinical infrastructure for image acquisition.
- Institutional capacity to comply with ethical and data protection requirements.
- Established collaboration agreements or data collection contracts with AI Labs Group S.L.
Ethical and Legal Considerations
All custom data collection activities are conducted in full compliance with ethical and legal requirements:
Ethical Approval
- All data collection protocols require prior approval from an Institutional Review Board (IRB) or Comité de Ética de la Investigación con medicamentos (CEIm) at the participating institution [8].
- The ethics application includes the study protocol, informed consent forms, data protection measures, and any amendments.
- Ethics approval documentation is maintained in the technical file for each data collection study.
Informed Consent
- All participants provide written informed consent prior to any data collection [9].
- The consent process ensures participants are fully informed about:
  - The purpose of the study (clinical validation or data collection).
  - The types of data being collected (images, clinical metadata).
  - How their de-identified data will be used (algorithm development, training, validation).
  - Their rights, including the right to withdraw consent.
- Consent forms are available in the participant's native language.
- Signed consent forms are securely stored and maintained according to retention requirements.
Data Protection and Privacy (GDPR Compliance)
- All data collection, processing, and storage activities comply with the EU General Data Protection Regulation (GDPR) [10].
- Data is de-identified at the point of collection:
  - Unique anonymized participant identifiers are assigned.
  - No personally identifiable information (PII) such as names, dates of birth, medical record numbers, or addresses is collected.
  - Image metadata (EXIF data) containing potential identifiers is stripped.
- Data transfer between institutions and AI Labs Group S.L. uses secure, encrypted channels.
- Access to identifiable data (during the active study phase) is restricted to authorized personnel under confidentiality agreements.
- Re-identification keys (linking anonymized IDs to participants) are maintained securely and separately from the research dataset, accessible only to the principal investigator at the collecting institution, and are not shared with AI Labs Group S.L.
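The metadata-stripping step can be illustrated for JPEG files using only the Python standard library. This is a minimal sketch under stated assumptions, not the tooling used in practice: the function name `strip_jpeg_metadata` and the choice of segments to drop (APP1 for EXIF/XMP, APP13 for IPTC, and COM comments) are assumptions, and PNG or BMP inputs would need separate handling.

```python
import struct

# Segment markers dropped in this sketch (an assumption, not a mandated list):
# 0xE1 = APP1 (EXIF/XMP), 0xED = APP13 (IPTC), 0xFE = COM (comment).
DROPPED_MARKERS = {0xE1, 0xED, 0xFE}


def strip_jpeg_metadata(data: bytes) -> bytes:
    """Return a copy of a JPEG byte stream without the dropped segments."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG stream (missing SOI marker)")
    out = bytearray(b"\xff\xd8")
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError(f"expected a marker at offset {pos}")
        marker = data[pos + 1]
        if marker == 0xFF:  # fill byte preceding a marker
            pos += 1
            continue
        if marker == 0xDA:
            # Start of scan: compressed image data follows, copy the rest as-is.
            out += data[pos:]
            return bytes(out)
        # Segment length is big-endian and includes its own two bytes.
        (length,) = struct.unpack(">H", data[pos + 2 : pos + 4])
        end = pos + 2 + length
        if length < 2 or end > len(data):
            raise ValueError(f"corrupt segment length at offset {pos}")
        if marker not in DROPPED_MARKERS:
            out += data[pos:end]
        pos = end
    raise ValueError("no start-of-scan marker found (truncated file?)")
```

Note that removing EXIF also removes the orientation tag, so images may need to be rotated to their display orientation before stripping.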
Data Processing Agreements
Where applicable, formal Data Processing Agreements (DPAs) are established between AI Labs Group S.L. and participating institutions, clearly defining roles, responsibilities, and data handling procedures.
Inclusion Criteria
Participants and data will be included if they meet all of the following criteria:
- Eligibility: Patients attending a dermatology clinic for assessment of a skin condition.
- Age: Patients aged 18 years or older (or as specified in the study-specific protocol).
- Consent: Patients who are able and willing to provide written informed consent.
- Clinical Scope: Conditions involving the epidermis, dermis, and associated cutaneous structures, consistent with the intended use of Legit.Health Plus.
- Image Quality: Images captured must meet minimum quality standards (resolution, focus, lighting) as defined in the acquisition protocol.
- Diagnostic Information: A confirmed diagnosis or diagnostic assessment by a qualified dermatologist must be available for each case.
Exclusion Criteria
Participants and data will be excluded if they meet any of the following criteria:
- Inability to Consent: Patients unable to provide informed consent due to cognitive impairment, language barriers without interpretation, or other reasons.
- Refusal to Participate: Patients who decline to participate or withdraw consent.
- Poor Image Quality: Cases with extremely low visual quality due to patient factors (e.g., inability to remain still or to cooperate), anatomical factors, or technical issues.
- Out-of-Scope Conditions: Conditions outside the intended use of Legit.Health Plus (e.g., purely mucosal lesions, unless within device scope).
- Incomplete Data: Cases with missing critical metadata (diagnosis, imaging modality, lesion location) that cannot be recovered.
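The inclusion and exclusion criteria above can be expressed as a simple screening check. The sketch below is illustrative: the `Candidate` fields and the `exclusion_reasons` function are hypothetical names, each criterion is reduced to a yes/no value assessed by the clinician or coordinator, and the minimum age defaults to 18 but can be overridden where a study-specific protocol says otherwise.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    age_years: int
    attending_dermatology: bool
    written_consent: bool
    condition_in_scope: bool
    images_meet_quality: bool
    dermatologist_diagnosis_available: bool
    critical_metadata_complete: bool


def exclusion_reasons(c: Candidate, min_age: int = 18) -> list[str]:
    """Return the reasons a candidate fails screening (empty if eligible)."""
    checks = [
        (c.attending_dermatology, "not attending a dermatology clinic"),
        (c.age_years >= min_age, f"younger than {min_age} years"),
        (c.written_consent, "no written informed consent"),
        (c.condition_in_scope, "condition outside intended use"),
        (c.images_meet_quality, "images below minimum quality"),
        (c.dermatologist_diagnosis_available, "no dermatologist diagnosis"),
        (c.critical_metadata_complete, "critical metadata missing"),
    ]
    return [reason for passed, reason in checks if not passed]
```

A case is included only when the returned list is empty; otherwise the reasons can be logged alongside the screening record.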
Study Design and Protocols
Study Design Types
Custom data collection studies are classified as:
- Prospective, Observational Studies: No intervention or modification of the patient's standard clinical care.
- Single-center or Multi-center: Depending on the specific study objectives and required sample size.
- Consecutive or Stratified Enrollment: Depending on whether the goal is to capture a representative general population or specific targeted subgroups.
Study Workflow
The general workflow for data collection during clinical encounters is:
- Patient Presentation: Patient attends their standard dermatology appointment or a study-specific visit.
- Eligibility Assessment: The attending dermatologist or study coordinator assesses eligibility based on the inclusion/exclusion criteria.
- Informed Consent: The study is explained to eligible patients, questions are addressed, and written informed consent is obtained.
- Image Acquisition: Study-specific images are captured according to the standardized acquisition protocol.
- Clinical Assessment: The dermatologist performs their standard clinical assessment and records the diagnosis and relevant clinical metadata.
- Standard Care Continuation: The patient's standard care pathway continues without alteration.
- Data Recording: All required data is entered into the electronic Case Report Form (eCRF) or data collection system.
Image Acquisition Protocol
Qualified Operators
All images are acquired by or under the supervision of:
- Qualified Dermatologists: Board-certified or equivalent dermatologists with clinical experience in dermatological diagnosis.
- Trained Clinical Staff: Where appropriate, trained nurses or medical assistants may perform image acquisition under dermatologist supervision, following standardized protocols.
All operators receive training on the acquisition protocol and quality standards before participating in data collection.
Standardized Acquisition Procedure
To ensure consistency and quality across all data collection sites, a standardized image acquisition procedure is defined:
Number of Images
For each enrolled case, the operator captures between 1 and 5 high-resolution images of the relevant skin lesion(s) or affected area. The specific number depends on:
- The size and extent of the lesion(s).
- The number of distinct lesions requiring documentation.
- The clinical complexity of the presentation.
Image Types and Modalities
The image set must include:
- Clinical Images (Macroscopic Photographs):
  - Standard photographs of the lesion and surrounding skin.
  - Captured at a distance that provides anatomical context when needed.
  - Lighting should be adequate and even, avoiding harsh shadows or specular reflections.
- Dermoscopic Images:
  - Close-up, magnified images captured using a dermatoscope (contact or non-contact).
  - Required for pigmented lesions and other cases where dermoscopy is clinically indicated.
  - Should clearly show the morphological structures relevant for diagnosis.
Technical Specifications
Images must meet the following minimum technical requirements:
- Format: JPEG, BMP, or PNG.
- Focus: Images must be in sharp focus across the region of interest.
- Lighting: Adequate, even illumination without significant shadows, glare, or color casts.
- Framing: The lesion must occupy a significant portion of the frame while including sufficient surrounding normal skin for context.
- Artifacts: Minimize obstructions (hair, rulers, surgical markings) where possible, or ensure they do not obscure diagnostically relevant features.
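Of these requirements, the file format is the one that can be checked mechanically without image analysis. As a minimal sketch, the accepted formats can be recognized from their leading signature bytes rather than from the file extension; the names `SIGNATURES` and `detect_format` are hypothetical, and focus, lighting, framing, and artifacts still require human or dedicated image-quality review.

```python
from typing import Optional

# Leading signature bytes of the three accepted formats.
SIGNATURES = {
    "JPEG": b"\xff\xd8\xff",
    "PNG": b"\x89PNG\r\n\x1a\n",
    "BMP": b"BM",
}


def detect_format(header: bytes) -> Optional[str]:
    """Return "JPEG", "PNG", or "BMP" based on the first bytes of a file.

    Returns None when the header matches none of the accepted formats.
    """
    for name, signature in SIGNATURES.items():
        if header.startswith(signature):
            return name
    return None
```

In use, the first eight bytes of each uploaded file would be read in binary mode and passed to `detect_format`; a `None` result marks the file for rejection or conversion.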
Equipment
- Cameras: Digital cameras, smartphones with high-resolution cameras, or dedicated medical imaging devices.
- Dermatoscopes: Polarized or non-polarized dermatoscopes, contact or non-contact, from established manufacturers (e.g., DermLite, Heine, FotoFinder).
- Lighting: Natural lighting or standardized artificial lighting (daylight-balanced LED or flash).
Note: To ensure real-world generalizability and device compatibility across diverse clinical settings, no specific manufacturer or model is required [11, 12].
Data Collection and Handling Protocol
Data Collection Workflow
The systematic data collection process ensures completeness, quality, and traceability:
- Participant Enrollment:
  - Upon obtaining informed consent, a unique, anonymized participant identifier is generated using a standardized format (e.g., SITE_YYMMDD_NNN).
  - The identifier is recorded in both the institution's study records and the data collection system.
- Image Capture:
  - The operator captures images following the Standardized Acquisition Procedure (Section 6.2).
  - Images are reviewed immediately for quality (focus, lighting, framing).
  - Poor-quality images are recaptured if possible.
- Clinical Assessment and Diagnosis:
  - The attending dermatologist performs their clinical assessment.
  - The diagnostic assessment is recorded in the electronic Case Report Form (eCRF) or data collection tool.
  - Diagnostic Information Collected:
    - Primary Diagnosis: The most certain diagnosis (ICD-11 code).
    - Differential Diagnoses (optional but encouraged), listed in descending order of certainty (ICD-11 codes).
    - Diagnostic Confidence: Where applicable, the clinician's confidence level (e.g., high, medium, low).
    - Histopathological Confirmation: If available, biopsy results and histopathological diagnosis.
- Metadata Recording:
  - Required metadata is entered into the eCRF for each case:
    - Anonymized participant ID.
    - Demographics: age (in years or age ranges), sex, Fitzpatrick skin phototype (I-VI) [13].
    - Lesion characteristics: anatomical location, lesion type, size, duration.
    - Imaging modality: clinical, dermoscopic.
    - Acquisition date (de-identified: month and year only).
    - Device/camera model (optional but encouraged).
- Quality Control:
  - A study coordinator or designated personnel reviews each case for completeness:
    - All required images present and meet quality standards.
    - All required metadata fields populated.
    - Diagnosis recorded in correct format (ICD-11).
    - Incomplete cases are flagged for resolution before data transfer.
- Secure Data Transfer:
  - At regular intervals (e.g., weekly or monthly), de-identified images and metadata are securely transferred to AI Labs Group S.L.'s secure research environment.
  - Transfer methods include encrypted file transfer (SFTP, HTTPS), secure cloud storage with access controls, or encrypted physical media.
  - Data integrity is verified using checksums (e.g., SHA-256), when applicable.
- Data Receipt and Verification:
  - Upon receipt, AI Labs Group S.L. verifies data integrity and completeness.
  - Any issues (corrupted files, missing metadata) are reported to the data collection site for resolution.
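Three of the steps above (identifier generation, the completeness check, and checksum verification) lend themselves to a short sketch. Everything named here is an assumption made for illustration: the field names in `REQUIRED_FIELDS`, the reading of `YYMMDD` as the enrollment date, the three-digit sequence limit, and the representation of files as in-memory bytes. ICD-11 code syntax is not validated; the sketch only checks that a primary diagnosis is present.

```python
import hashlib
import re
from datetime import date

# Shape of the example identifier SITE_YYMMDD_NNN (site code format assumed).
ID_PATTERN = re.compile(r"^[A-Z0-9]+_\d{6}_\d{3}$")

# Hypothetical eCRF field names for the required metadata listed above.
REQUIRED_FIELDS = (
    "participant_id", "age", "sex", "fitzpatrick_phototype",
    "anatomical_location", "lesion_type", "lesion_size", "lesion_duration",
    "imaging_modality", "acquisition_month", "primary_diagnosis_icd11",
)


def make_participant_id(site: str, enrolled_on: date, sequence: int) -> str:
    """Build an identifier in the SITE_YYMMDD_NNN shape."""
    if not 1 <= sequence <= 999:
        raise ValueError("sequence must fit in three digits")
    return f"{site.upper()}_{enrolled_on:%y%m%d}_{sequence:03d}"


def qc_issues(case: dict) -> list[str]:
    """Return completeness problems for one case (empty list if none)."""
    issues = [
        f"missing field: {name}"
        for name in REQUIRED_FIELDS
        if case.get(name) in (None, "")
    ]
    n_images = len(case.get("images", []))
    if not 1 <= n_images <= 5:
        issues.append(f"expected 1 to 5 images, found {n_images}")
    pid = case.get("participant_id")
    if pid and not ID_PATTERN.match(pid):
        issues.append("participant_id does not match SITE_YYMMDD_NNN")
    return issues


def sha256_of(payload: bytes) -> str:
    """Return the hex SHA-256 digest of a file's content."""
    return hashlib.sha256(payload).hexdigest()


def failed_files(files: dict[str, bytes], manifest: dict[str, str]) -> list[str]:
    """Return manifest entries that are missing or whose checksum differs."""
    return [
        name
        for name, expected in manifest.items()
        if name not in files or sha256_of(files[name]) != expected
    ]
```

In this arrangement the collecting site would run `qc_issues` before transfer and ship a manifest of `sha256_of` digests with the data, and the receiving side would call `failed_files` and report any returned names back to the site.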