
R-TF-028-003 Data Collection Instructions - Retrospective Data

Table of contents
  • 1. Context
  • 2. Objectives
  • Population
    • Recruitment
    • Ethics
    • Inclusion Criteria
    • Exclusion Criteria
  • Design
  • Acquisition Protocol
  • Collection Protocol
  • Other Specifications

1. Context

The development of high-performing, safe, and effective AI/ML algorithms for dermatological assessment, as intended for Legit.Health Plus, depends critically on the quality, diversity, and scale of the training and testing data. To build models that generalize across real-world clinical scenarios, it is essential to source data from a wide variety of contexts.

This document outlines the instructions for the retrospective collection of dermatological images and associated metadata from reputable, publicly available medical datasets and online atlases. This approach is intended to produce a comprehensive foundational dataset that is broad in scope.

2. Objectives

The primary objectives of this data collection protocol are:

  • To gather a large-scale, heterogeneous dataset of dermatological images for the training, validation, and testing of the AI/ML algorithms in Legit.Health Plus.
  • To ensure the dataset is representative of the intended patient population, covering a wide spectrum of ICD-11 categories, patient demographics (age, sex), and all six Fitzpatrick skin phototypes.
  • To acquire the necessary diagnostic labels and clinical metadata to establish a reliable ground truth for each image, enabling supervised learning and robust performance evaluation.
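
The representativeness objective can be monitored with a simple tally. The following is a minimal sketch, not the procedure used by the manufacturer: it assumes each record is a dictionary with a hypothetical `phototype` key encoded as an integer from 1 to 6, and the `min_share` threshold is an illustrative parameter with no value prescribed by this document.

```python
from collections import Counter

# Fitzpatrick phototypes I-VI, encoded here as integers 1..6 (an assumed encoding).
PHOTOTYPES = range(1, 7)


def phototype_coverage(records):
    """Count records per phototype; records without a valid value are tallied under None."""
    counts = Counter({p: 0 for p in PHOTOTYPES})
    counts[None] = 0
    for record in records:
        value = record.get("phototype")
        counts[value if value in PHOTOTYPES else None] += 1
    return counts


def underrepresented(counts, min_share):
    """Return phototypes whose share of labelled records falls below min_share."""
    labelled = sum(counts[p] for p in PHOTOTYPES)
    if labelled == 0:
        return list(PHOTOTYPES)
    return [p for p in PHOTOTYPES if counts[p] / labelled < min_share]
```

The same tally could be repeated for sex, age band, or ICD-11 category. Records with no phototype are counted separately so that missing metadata does not distort the shares.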

Population

Recruitment

Data will be sourced retrospectively from Public Medical Datasets and Online Atlases. The process involves the systematic identification and acquisition of reputable, publicly accessible dermatological image repositories.

Ethics

  • All data collection and usage will adhere strictly to the licenses, terms of use, and any sharing agreements under which the public datasets were published.
  • All data processing activities by AI Labs Group S.L. will be conducted in full compliance with applicable data protection regulations, including the General Data Protection Regulation (GDPR).
  • A verification step will be included in the collection protocol to ensure the data is de-identified. Any data containing residual personal identifiers will be excluded.
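
One part of the de-identification verification could be an automated scan of the metadata. The sketch below is an assumption-laden illustration only: the field names in `SUSPECT_FIELDS` and the e-mail pattern are hypothetical and far from exhaustive, and the check covers neither image pixels (faces, tattoos) nor embedded file headers, which would need separate review.

```python
import re

# Hypothetical field names that suggest direct identifiers; not an exhaustive list.
SUSPECT_FIELDS = {
    "name", "patient_name", "date_of_birth", "address",
    "phone", "email", "national_id",
}

# Deliberately loose pattern: it favours false positives over missed identifiers.
EMAIL_PATTERN = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")


def residual_identifier_flags(metadata):
    """Return human-readable flags for metadata entries that may identify a patient."""
    flags = []
    for key, value in metadata.items():
        if key.strip().lower() in SUSPECT_FIELDS and value not in (None, ""):
            flags.append(f"suspect field: {key}")
        if isinstance(value, str) and EMAIL_PATTERN.search(value):
            flags.append(f"email-like value in: {key}")
    return flags
```

A record that returns any flag would be excluded or routed to manual review, in line with the rule that data with residual identifiers is excluded.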

Inclusion Criteria

  • Images of the epidermis, dermis, and associated appendages.
  • Cases with a confirmed diagnosis (ICD category) provided by a qualified medical expert or through histopathological analysis.
  • Images of sufficient diagnostic quality to be of clinical utility.
  • Both clinical and dermoscopic images.

Exclusion Criteria

  • Images below a defined quality threshold (e.g., out of focus, poor lighting, significant artifacts).
  • Cases with ambiguous, missing, or unverified diagnostic labels.
  • Images for which the usage rights are unclear or do not permit use for this purpose.
  • Images containing identifiable patient information that cannot be securely and completely removed.
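
The inclusion and exclusion criteria can be expressed as a screening function that records why a candidate image is rejected. This is a minimal sketch under stated assumptions: the `Candidate` fields are hypothetical, and the boolean inputs (`quality_ok`, `license_ok`, `deidentified`) stand in for the outcome of reviews that this document does not specify in detail.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    image_id: str
    modality: str               # "clinical" or "dermoscopic"
    diagnosis: Optional[str]    # source label, None if missing
    diagnosis_verified: bool    # confirmed by an expert or by histopathology
    quality_ok: bool            # outcome of a separate image-quality review
    license_ok: bool            # usage rights are clear and permit this purpose
    deidentified: bool          # no residual personal identifiers


def exclusion_reasons(c: Candidate) -> List[str]:
    """Return every reason the candidate fails screening; an empty list means include."""
    reasons = []
    if c.modality not in {"clinical", "dermoscopic"}:
        reasons.append("unsupported modality")
    if not c.diagnosis or not c.diagnosis_verified:
        reasons.append("missing or unverified diagnosis")
    if not c.quality_ok:
        reasons.append("below quality threshold")
    if not c.license_ok:
        reasons.append("usage rights unclear or not permitted")
    if not c.deidentified:
        reasons.append("identifiable patient information")
    return reasons
```

Returning all reasons rather than stopping at the first keeps an audit trail of why each image was excluded.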

Design

This is a retrospective data collection protocol. All data is pre-existing. No new data will be generated from patient interactions under this plan. The collection process is ongoing to continuously improve and expand the dataset.

Acquisition Protocol

Because the data is collected retrospectively from multiple public sources, there is no single, standardized acquisition protocol. The images were captured with a variety of devices (e.g., different digital cameras and dermatoscopes) and in diverse clinical settings. This inherent variability is considered a positive attribute, as it contributes to a more robust and generalizable AI/ML model.

Collection Protocol

The collection of data will be performed systematically as follows:

  1. Source Identification: Identify relevant and reputable public datasets and dermatological atlases.
  2. Data Retrieval: Securely download the data (images and associated metadata/labels) into a temporary staging area within AI Labs Group S.L.'s secure research environment.
  3. Curation and Standardization:
    • Verify that all data meets the inclusion/exclusion criteria.
    • Standardize the diagnostic labels by mapping all provided taxonomies to the official ICD-11 classification system, a process overseen by qualified medical specialists.
    • Organize data into a consistent file structure.
  4. De-identification Verification: Perform a final check to ensure all data is free from personally identifiable information. Any files with residual identifiers will be rejected.
  5. Ingestion: Ingest the curated, verified, and standardized data into the main AI/ML development database, where it will be versioned and prepared for partitioning into training, validation, and test sets as described in the R-TF-028-002 AI/ML Development Plan.
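
The label standardization in step 3 can be sketched as a lookup with a review queue for anything that does not map. This is an illustrative sketch only: the `label` key, the `normalize` helper, and the entries in `LABEL_TO_ICD11` are hypothetical, and the codes are placeholders rather than real ICD-11 codes, since the actual mapping is established under the oversight of medical specialists.

```python
def normalize(label):
    """Lower-case the label and collapse whitespace so spelling variants compare equal."""
    return " ".join(label.strip().lower().split())


# Hypothetical mapping table; codes are placeholders, not real ICD-11 codes.
LABEL_TO_ICD11 = {
    "psoriasis vulgaris": "ICD11-PLACEHOLDER-001",
    "plaque psoriasis": "ICD11-PLACEHOLDER-001",
    "malignant melanoma": "ICD11-PLACEHOLDER-002",
}


def standardize_labels(records, mapping):
    """Split records into those with a mapped code and those needing specialist review."""
    mapped, needs_review = [], []
    for record in records:
        code = mapping.get(normalize(record["label"]))
        if code is None:
            needs_review.append(record)
        else:
            mapped.append({**record, "icd11": code})
    return mapped, needs_review
```

Unmapped labels are never guessed: they are set aside for review, which is consistent with excluding cases whose diagnostic labels are ambiguous or unverified.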

Collected Data Will Include:

  • Image files (e.g., JPG, PNG, DICOM).
  • Metadata files (e.g., CSV, JSON) containing the ground truth diagnosis, and where available, patient demographics (age, sex, phototype) and other relevant clinical information.
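
A metadata file of this kind could be read into typed records in which the diagnosis is mandatory and the demographics are optional. The sketch below assumes a CSV with hypothetical column names (`image_file`, `diagnosis`, `age`, `sex`, `phototype`); real public datasets use their own schemas and would need a per-source adapter.

```python
import csv
import io
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class MetadataRow:
    image_file: str
    diagnosis: str                   # ground truth diagnosis (required)
    age: Optional[int] = None        # optional demographics, where available
    sex: Optional[str] = None
    phototype: Optional[int] = None


def _optional_int(text):
    """Parse an integer, treating a missing or blank cell as None."""
    text = (text or "").strip()
    return int(text) if text else None


def parse_metadata(csv_text: str) -> List[MetadataRow]:
    """Parse CSV text into MetadataRow objects; raise ValueError if a diagnosis is missing."""
    rows = []
    for raw in csv.DictReader(io.StringIO(csv_text)):
        diagnosis = (raw.get("diagnosis") or "").strip()
        if not diagnosis:
            raise ValueError(f"missing diagnosis for {raw.get('image_file')!r}")
        rows.append(MetadataRow(
            image_file=raw["image_file"].strip(),
            diagnosis=diagnosis,
            age=_optional_int(raw.get("age")),
            sex=(raw.get("sex") or "").strip() or None,
            phototype=_optional_int(raw.get("phototype")),
        ))
    return rows
```

Failing loudly on a missing diagnosis mirrors the exclusion criterion for missing labels, while blank demographic cells are accepted because that information is collected only where available.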

Other Specifications

  • To preserve real-world diversity, no restrictions are placed on the make or model of camera or dermatoscope used in the original acquisition.
  • No specific conditions are applied regarding the operator who performed the original examination, provided the resulting data meets the quality and inclusion criteria.

Signature meaning

The signatures for the approval process of this document can be found in the verified commits at the repository for the QMS. As a reference, the team members who are expected to participate in this document and their roles in the approval process, as defined in Annex I Responsibility Matrix of the GP-001, are:

  • Author: Team members involved
  • Reviewer: JD-003, JD-004
  • Approver: JD-001
All the information contained in this QMS is confidential. The recipient agrees not to transmit or reproduce the information, neither by himself nor by third parties, through whichever means, without obtaining the prior written permission of Legit.Health (AI LABS GROUP S.L.)