
R-TF-028-004 AI/ML Development Report

Table of contents
  • Introduction
    • Context
    • Algorithms Description
    • AI/ML Standalone Evaluation Objectives
  • Data Management
    • Collection
    • Annotation, Truthing, and Consensus
    • Preparation and Partitioning
  • Algorithm Training (ICD Category Distribution)
    • Pre-processing
    • Design, Training, and Tuning
    • Post-processing
  • Algorithm Performance Evaluation/Testing
    • ICD Category Distribution Performance
    • Binary Indicator Performance
    • Bias Analysis
  • Conclusion
  • AI/ML Risks Assessment Report
    • AI/ML Risk Assessment
    • AI/ML Risk Treatment
    • Residual AI/ML Risk Assessment
    • AI/ML Risk and Traceability with Safety Risk
    • Conclusion
  • Related Documents

Introduction​

Context​

This report documents the development, verification, and validation of the AI/ML algorithm package for the Legit.Health Plus medical device. The development process was conducted in accordance with the procedures outlined in GP-028 AI Development and followed the methodologies specified in the R-TF-028-002 AI/ML Development Plan.

The algorithms are designed as offline (static) models. They were trained on a fixed dataset prior to release and do not adapt or learn from new data after deployment. This ensures predictable and consistent performance in the clinical environment.

Algorithms Description​

The algorithm package consists of two core components that work in sequence to fulfill User Requirement REQ_004:

  1. ICD Category Distribution Algorithm: A deep learning model, based on a Vision Transformer (ViT) architecture, that analyzes a given dermatological image (clinical or dermoscopic). Its output is a normalized probability distribution across [NUMBER OF CATEGORIES] relevant ICD-11 categories. For the user, this is presented as the top five most likely diagnoses.
  2. Binary Indicator Algorithms: These are not separate trained models but a set of three indicators derived directly from the output of the ICD Category Distribution algorithm. A predefined, expert-curated mapping matrix (defined in R-TF-028-004) assigns each of the [NUMBER OF CATEGORIES] ICD-11 categories to one or more indicators. The final value of each indicator is the sum of the probabilities of all ICD-11 categories assigned to it (see the sketch after this list). The three indicators are:
  • Malignancy
  • Critical Complexity
  • Dermatological Condition
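
The derivation can be illustrated with a short sketch. The category count, mapping values, and probability vector below are hypothetical placeholders (the real matrix is defined in R-TF-028-004); only the summation logic reflects the method described above.

```python
import numpy as np

# Hypothetical example: 5 stand-in ICD-11 categories and an illustrative mapping
# matrix. The real category count and assignments are defined in R-TF-028-004.
INDICATORS = ["malignancy", "critical_complexity", "dermatological_condition"]
mapping = np.array([
    [1, 0, 0, 1, 0],   # categories contributing to the malignancy indicator
    [1, 1, 0, 1, 0],   # categories contributing to the critical complexity indicator
    [1, 1, 1, 1, 0],   # categories contributing to the dermatological condition indicator
])

def derive_indicators(icd_probs: np.ndarray) -> dict:
    """Each indicator is the sum of the probabilities of its mapped ICD-11 categories."""
    scores = mapping @ icd_probs      # matrix-vector product performs the per-indicator sum
    return dict(zip(INDICATORS, scores.tolist()))

# A normalized probability distribution as produced by the ICD Category Distribution algorithm.
probs = np.array([0.50, 0.20, 0.15, 0.10, 0.05])
print(derive_indicators(probs))
# {'malignancy': 0.6, 'critical_complexity': 0.8, 'dermatological_condition': 0.95}
```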

AI/ML Standalone Evaluation Objectives​

The standalone validation aimed to confirm that the final algorithms meet the predefined performance criteria outlined in R-TF-028-001.

The primary objectives and endpoints were:

| Algorithm | Objective | Endpoints | Success Criteria |
| --- | --- | --- | --- |
| ICD Category Distribution | Provide an accurate differential diagnosis suggestion. | Top-1, Top-3, and Top-5 Accuracy on a held-out test set. | Top-1 Accuracy ≥ 55%; Top-3 Accuracy ≥ 70%; Top-5 Accuracy ≥ 80% |
| Binary Indicators | Provide reliable signals for case prioritization and assessment. | Area Under the ROC Curve (AUC) on a held-out test set. | AUC ≥ 0.80 for each of the three indicators. |

Data Management​

Collection​

The dataset was compiled from two distinct retrospective sources as detailed in the respective data collection instructions:

  • Public Datasets (R-TF-028-003): Images sourced from reputable online dermatological atlases (e.g., DermNet NZ, ISIC, PAD-UFES-20).
  • Prospective Clinical Study (R-TF-028-004): Images collected under a formal protocol at the Hospital Universitario de Torrejón.

This combined approach resulted in a total dataset of [NUMBER OF IMAGES] RGB images, covering nearly 1,000 different initial categories and a diverse representation of age, sex, and skin phototypes.

Annotation, Truthing, and Consensus​

  • ICD-11 Labels: The primary diagnostic labels were sourced directly from the datasets, having been provided by medical experts. A thorough curation process was undertaken to standardize all taxonomies to the ICD-11 classification system.
  • Binary Indicator Mapping: The ground truth for the binary indicators was established by creating a mapping matrix, as detailed in R-TF-028-004. This process involved a board-certified dermatologist assigning each of the final [NUMBER OF CATEGORIES] ICD-11 categories to the three indicators, followed by an independent review and consensus process.

Preparation and Partitioning​

The final dataset was partitioned into three distinct sets: training, validation, and testing. To prevent data leakage and ensure an unbiased final evaluation, the split was performed at the subject level where subject IDs were available. For sources without subject IDs, a class-wise split was performed.

Crucially, some data sources, including the entire prospective clinical study dataset from H.U. Torrejón, were sequestered and reserved exclusively for the final test set.

| Set | Purpose | Image Count |
| --- | --- | --- |
| Training | Model fitting and parameter updates. | [Insert Number] |
| Validation | Hyperparameter tuning and model selection. | [Insert Number] |
| Test | Final, unbiased performance evaluation. | [Insert Number] |
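
A subject-level split of the kind described above can be sketched as follows. The column names, values, and split ratios are illustrative assumptions, and the sketch shows a single train/test split, whereas the actual partitioning produced three sets.

```python
import pandas as pd
from sklearn.model_selection import GroupShuffleSplit, train_test_split

# Illustrative dataset index; column names and values are placeholders.
df = pd.DataFrame({
    "image":       [f"img_{i}.jpg" for i in range(12)],
    "subject_id":  ["s1", "s1", "s2", "s2", "s3", "s3", None, None, None, None, None, None],
    "icd11_label": ["melanoma", "nevus"] * 6,
})
with_ids = df[df["subject_id"].notna()]
without_ids = df[df["subject_id"].isna()]

# Subject-level split: every image from a given subject lands in the same partition,
# which prevents leakage between training and test data.
splitter = GroupShuffleSplit(n_splits=1, test_size=0.34, random_state=42)
train_idx, test_idx = next(splitter.split(with_ids, groups=with_ids["subject_id"]))

# Class-wise (stratified) split for sources that provide no subject identifiers.
train_no_id, test_no_id = train_test_split(
    without_ids, test_size=0.5, stratify=without_ids["icd11_label"], random_state=42)
```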

Algorithm Training (ICD Category Distribution)​

Pre-processing​

Input images were resized to the model's required input dimensions. During training, a rich data augmentation pipeline was applied, including random cropping (guided by annotated bounding boxes where available), rotations, and various pixel transformations (color jittering, histogram equalization, etc.) to increase the diversity of the training data and improve model generalization. No augmentations were applied to the test data.
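
A torchvision-based sketch of such a pipeline is shown below. The specific transforms, their parameters, and the 224-pixel input size are illustrative assumptions rather than the released configuration, and the bounding-box-guided cropping is omitted for brevity.

```python
import torchvision.transforms as T

IMG_SIZE = 224  # assumed ViT input size; the real value comes from the model configuration

# Training pipeline: geometric and pixel-level augmentations (illustrative parameters).
train_transforms = T.Compose([
    T.RandomResizedCrop(IMG_SIZE, scale=(0.7, 1.0)),
    T.RandomRotation(degrees=30),
    T.ColorJitter(brightness=0.2, contrast=0.2, saturation=0.2, hue=0.05),
    T.RandomEqualize(p=0.3),          # histogram equalization
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

# Test pipeline: deterministic resize only, no augmentation.
test_transforms = T.Compose([
    T.Resize((IMG_SIZE, IMG_SIZE)),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])
```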

Design, Training, and Tuning​

  • Architecture: The selected model is a Vision Transformer (ViT), a state-of-the-art architecture for image recognition.
  • Training: The model was trained using transfer learning, initializing with weights pre-trained on a large-scale natural image dataset. The training process utilized the Adam optimizer, a cross-entropy loss function, and a one-cycle learning rate policy to enable super-convergence. Progress was monitored on the validation set to prevent overfitting, using early stopping if performance plateaued.
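
The training setup can be sketched as follows. The backbone variant (ViT-B/16 from torchvision), the placeholder random data, and all hyperparameter values are assumptions for illustration; only the overall recipe (transfer learning, Adam, cross-entropy loss, one-cycle schedule, early stopping) mirrors the description above.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset
from torchvision.models import vit_b_16, ViT_B_16_Weights

# Illustrative values only: the real category count, data, and hyperparameters differ.
N_CATEGORIES, EPOCHS, PATIENCE = 10, 3, 5

# Placeholder data: random tensors stand in for the curated dermatological images.
def make_loader(n: int) -> DataLoader:
    images = torch.randn(n, 3, 224, 224)
    labels = torch.randint(0, N_CATEGORIES, (n,))
    return DataLoader(TensorDataset(images, labels), batch_size=4)

train_loader, val_loader = make_loader(16), make_loader(8)

# Transfer learning: initialize from ImageNet-pretrained weights, replace the classifier head.
model = vit_b_16(weights=ViT_B_16_Weights.IMAGENET1K_V1)
model.heads.head = nn.Linear(model.heads.head.in_features, N_CATEGORIES)

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
scheduler = torch.optim.lr_scheduler.OneCycleLR(
    optimizer, max_lr=1e-3, epochs=EPOCHS, steps_per_epoch=len(train_loader))

def validation_loss() -> float:
    model.eval()
    with torch.no_grad():
        return sum(criterion(model(x), y).item() for x, y in val_loader) / len(val_loader)

best, stale = float("inf"), 0
for epoch in range(EPOCHS):
    model.train()
    for images, labels in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
        scheduler.step()                  # the one-cycle policy advances once per batch
    current = validation_loss()
    if current < best:
        best, stale = current, 0
        torch.save(model.state_dict(), "best_model.pt")   # keep the best checkpoint
    else:
        stale += 1
        if stale >= PATIENCE:
            break                         # early stopping once validation loss plateaus
```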

Post-processing​

Two key post-processing steps were implemented to enhance performance and reliability:

  1. Model Calibration: Temperature scaling was applied to the model's raw outputs. This technique adjusts the softmax function to produce better-calibrated probability distributions, ensuring that the model's confidence scores are more reliable.
  2. Test-Time Augmentation (TTA): During inference, multiple augmented versions of the input image are created and passed through the model. The resulting probability distributions are then averaged to produce a single, more robust final prediction.
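
The two steps above can be sketched as follows. The temperature value and the particular flip-based augmentations are illustrative assumptions (in practice the temperature is fitted on the validation set and frozen before release), and `model` stands for any trained classifier that returns logits.

```python
import torch
import torch.nn.functional as F

def calibrated_probs(logits: torch.Tensor, temperature: float = 1.5) -> torch.Tensor:
    """Temperature scaling: divide the logits by T before the softmax to obtain
    better-calibrated probabilities (T = 1.5 is an illustrative value)."""
    return F.softmax(logits / temperature, dim=-1)

def predict_with_tta(model: torch.nn.Module, image: torch.Tensor) -> torch.Tensor:
    """Test-time augmentation: average the calibrated distributions obtained from
    several deterministic variants of the input image (CHW tensor)."""
    variants = [
        image,
        torch.flip(image, dims=[-1]),   # horizontal flip
        torch.flip(image, dims=[-2]),   # vertical flip
    ]
    model.eval()
    with torch.no_grad():
        probs = [calibrated_probs(model(v.unsqueeze(0))) for v in variants]
    return torch.stack(probs).mean(dim=0).squeeze(0)
```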

Algorithm Performance Evaluation/Testing​

The final, selected algorithm package was evaluated on the sequestered, held-out test set, which was not used at any point during training or model selection.

ICD Category Distribution Performance​

The model's ability to correctly identify the ground truth diagnosis was assessed using Top-k accuracy. The results below demonstrate that the algorithm successfully met and exceeded all predefined success criteria.

| Metric | Result | Success Criterion | Outcome |
| --- | --- | --- | --- |
| Top-1 Accuracy | 74% | ≥ 55% | PASS |
| Top-3 Accuracy | 86% | ≥ 70% | PASS |
| Top-5 Accuracy | 90% | ≥ 80% | PASS |
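
For reference, Top-k accuracy can be computed as sketched below; the random predictions and labels are placeholders and do not reproduce the reported figures.

```python
import numpy as np

def top_k_accuracy(probs: np.ndarray, labels: np.ndarray, k: int) -> float:
    """Fraction of samples whose true label is among the k highest-probability classes.

    probs:  (n_samples, n_categories) predicted probability matrix
    labels: (n_samples,) integer ground-truth ICD-11 category indices
    """
    top_k = np.argsort(probs, axis=1)[:, -k:]          # indices of the k largest probabilities
    return float(np.mean([labels[i] in top_k[i] for i in range(len(labels))]))

# Illustrative usage with random predictions (not the reported results).
rng = np.random.default_rng(0)
probs = rng.dirichlet(np.ones(10), size=100)
labels = rng.integers(0, 10, size=100)
for k in (1, 3, 5):
    print(f"Top-{k} accuracy: {top_k_accuracy(probs, labels, k):.2f}")
```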

Binary Indicator Performance​

The performance of the derived binary indicators was evaluated using the Area Under the ROC Curve (AUC). The ground truth for this evaluation was determined by applying the expert-defined mapping matrix to the ground truth ICD-11 labels of the test set. The results show that all three indicators achieved outstanding performance, well above the acceptance threshold.

| Indicator | Result (AUC) | Success Criterion | Outcome |
| --- | --- | --- | --- |
| Malignancy | 0.96 | ≥ 0.80 | PASS |
| Critical Complexity | 0.94 | ≥ 0.80 | PASS |
| Dermatological Condition | 0.99 | ≥ 0.80 | PASS |
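
The indicator evaluation can be sketched as follows, reusing the same illustrative mapping matrix as in the earlier derivation sketch. The random probabilities and labels are placeholders, not the test-set data, so the printed AUC values do not correspond to the reported results.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

INDICATORS = ["malignancy", "critical_complexity", "dermatological_condition"]
mapping = np.array([           # illustrative binary mapping matrix (5 stand-in categories)
    [1, 0, 0, 1, 0],
    [1, 1, 0, 1, 0],
    [1, 1, 1, 1, 0],
])

# Placeholder predictions and ground truth for 200 hypothetical test images.
rng = np.random.default_rng(0)
n_samples, n_categories = 200, mapping.shape[1]
probs = rng.dirichlet(np.ones(n_categories), size=n_samples)    # per-image ICD-11 distribution
labels = rng.integers(0, n_categories, size=n_samples)          # ground-truth category index

for i, name in enumerate(INDICATORS):
    scores = probs @ mapping[i]     # predicted indicator score = summed category probabilities
    truth = mapping[i][labels]      # indicator ground truth derived from the ICD-11 labels
    print(f"{name}: AUC = {roc_auc_score(truth, scores):.3f}")
```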

Bias Analysis​

An analysis was conducted on the external Diverse Dermatology Images (DDI) dataset to assess performance across different Fitzpatrick skin types. Initial results were consistent with published findings for other devices. However, after manually cropping the images to focus on the region of interest, performance improved across all groups, with the overall AUC for malignancy detection rising from 0.6510 to 0.7627. This supports the model's robustness across skin phototypes and highlights the critical impact of image framing and quality on performance.
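
A subgroup analysis of this kind can be sketched as follows; the DataFrame columns and the random values are illustrative placeholders, not the DDI evaluation data.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import roc_auc_score

# Placeholder results table; columns and values are illustrative only.
rng = np.random.default_rng(0)
results = pd.DataFrame({
    "fitzpatrick_group": rng.choice(["I-II", "III-IV", "V-VI"], size=300),
    "malignant":         rng.integers(0, 2, size=300),    # ground-truth malignancy label
    "malignancy_score":  rng.random(300),                 # model's malignancy indicator output
})

# AUC per skin-type group plus the overall value.
for group, subset in results.groupby("fitzpatrick_group"):
    print(f"{group}: AUC = {roc_auc_score(subset['malignant'], subset['malignancy_score']):.3f}")
print(f"overall: AUC = {roc_auc_score(results['malignant'], results['malignancy_score']):.3f}")
```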

Conclusion​

The development and validation activities described in this report provide objective evidence that the AI/ML algorithms for Legit.Health Plus meet their predefined specifications and performance requirements.

The ICD Category Distribution algorithm demonstrated high accuracy, significantly exceeding all Top-k endpoints. The derived Binary Indicators proved to be exceptionally effective, achieving outstanding AUC scores.

The development process adhered to the company's QMS and followed Good Machine Learning Practices. The final algorithm package is considered verified, validated, and suitable for release and integration into the Legit.Health Plus medical device.

AI/ML Risks Assessment Report​

AI/ML Risk Assessment​

A comprehensive risk assessment was conducted throughout the development lifecycle in accordance with the R-TF-028-002 AI/ML Development Plan. All identified AI/ML-specific risks related to data, model training, and performance were documented and analyzed in the R-TF-028-011 AI/ML Risk Matrix.

AI/ML Risk Treatment​

Control measures were implemented to mitigate all identified risks. Key controls included:

  • Rigorous data curation and multi-source collection to mitigate bias.
  • Systematic model training and validation procedures to prevent overfitting.
  • Use of a sequestered test set to ensure unbiased performance evaluation.
  • Implementation of model calibration to improve the reliability of outputs.

Residual AI/ML Risk Assessment​

After the implementation of control measures, a residual risk analysis was performed. All identified AI/ML risks were successfully reduced to an acceptable level.

AI/ML Risk and Traceability with Safety Risk​

Safety risks related to the AI/ML algorithms (e.g., incorrect diagnosis suggestion, misinterpretation of data) were identified and traced back to their root causes in the AI/ML development process. These safety risks have been escalated for management in the overall device Safety Risk Matrix, in line with ISO 14971.

Conclusion​

The AI/ML development process has successfully managed and mitigated inherent risks to an acceptable level. The benefits of using the Legit.Health Plus algorithms as a clinical decision support tool are judged to outweigh the residual risks.

Related Documents​

  • Project Design and Plan
    • R-TF-028-001 AI/ML Description
    • R-TF-028-002 AI/ML Development Plan
    • R-TF-028-011 AI/ML Risk Matrix
  • Data Collection and Annotation
    • R-TF-028-003 Data Collection Instructions - Public Datasets and Atlases
    • R-TF-028-004 Data Collection Instructions - Prospective Clinical Study (H.U. Torrejón)
    • R-TF-028-004 Data Annotation Instructions - Binary Indicator Mapping

Signature meaning

The signatures for the approval process of this document can be found in the verified commits at the repository for the QMS. As a reference, the team members expected to participate in this document and their roles in the approval process, as defined in Annex I (Responsibility Matrix) of GP-001, are:

  • Author: Team members involved
  • Reviewer: JD-003, JD-004
  • Approver: JD-001
All the information contained in this QMS is confidential. The recipient agrees not to transmit or reproduce the information, neither by himself nor by third parties, through whichever means, without obtaining the prior written permission of Legit.Health (AI LABS GROUP S.L.)