R-TF-028-001 AI/ML Development Plan
Abbreviations
Term | Definition |
---|---|
AI/ML | Artificial Intelligence / Machine Learning |
AUC | Area Under the Receiver Operating Characteristic Curve |
GDPR | General Data Protection Regulation |
GMLP | Good Machine Learning Practice |
ICD | International Classification of Diseases |
ONNX | Open Neural Network Exchange |
QMS | Quality Management System |
RPN | Risk Priority Number |
ViT | Vision Transformer |
XAI | Explainable Artificial Intelligence |
Introduction
Context
Legit.Health Plus provides advanced Clinical Decision Support (CDS) through AI/ML algorithms designed to assist qualified healthcare professionals in the assessment of dermatological conditions. The algorithms analyze clinical and dermoscopic images of skin lesions to generate objective, data-driven insights. It is critical to note that the device is intended to augment, not replace, the clinical judgment of a healthcare professional.
The core AI/ML functionality is delivered through two algorithm types:
- An ICD Category Distribution Algorithm: A multiclass classification model that processes a lesion image and outputs a ranked probability distribution across relevant ICD-11 categories, presenting the top five differential diagnoses.
- Binary Indicator Algorithms: Derived from the primary model's output, these algorithms provide three discrete indicators for case prioritization: Malignancy, Dermatological Condition, and Critical Complexity.
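To make the relationship between the two algorithm types concrete, the sketch below shows how a top-five differential and a binary indicator could be derived from a single class-probability vector. The category labels, indicator grouping, and threshold are illustrative placeholders; the device's actual label space, mappings, and cut-offs are specified in R-TF-028-001.

```python
import numpy as np

# Placeholder ICD-11 category labels and indicator grouping; the device's
# actual label space and mapping are specified in R-TF-028-001.
CATEGORIES = ["melanoma", "basal cell carcinoma", "melanocytic naevus",
              "psoriasis", "atopic dermatitis", "urticaria"]
MALIGNANT_CATEGORIES = {"melanoma", "basal cell carcinoma"}


def top_differential(probs: np.ndarray, k: int = 5) -> list:
    """Rank the class-probability vector and return the top-k categories."""
    order = np.argsort(probs)[::-1][:k]
    return [(CATEGORIES[i], float(probs[i])) for i in order]


def malignancy_indicator(probs: np.ndarray, threshold: float = 0.5) -> bool:
    """Derive a binary Malignancy indicator by aggregating the probability
    mass assigned to malignant categories (illustrative rule only)."""
    mass = sum(float(probs[i]) for i, c in enumerate(CATEGORIES)
               if c in MALIGNANT_CATEGORIES)
    return mass >= threshold


# Example with a dummy probability vector (sums to 1).
probs = np.array([0.40, 0.15, 0.20, 0.10, 0.10, 0.05])
print(top_differential(probs))       # ranked top-5 differential
print(malignancy_indicator(probs))   # True (0.40 + 0.15 >= 0.5)
```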
Objectives
The primary objectives of this development plan are to:
- Develop a robust ICD Category Distribution algorithm to assist clinicians in formulating a differential diagnosis, thereby enhancing diagnostic accuracy and efficiency, while meeting the performance endpoints specified in R-TF-028-001.
- Develop three highly performant Binary Indicator algorithms to provide clear, actionable signals for clinical workflow prioritization, meeting the AUC thresholds defined in R-TF-028-001.
- Ensure the entire development lifecycle adheres to the company's QMS, GMLP principles, and applicable regulations (MDR 2017/745, ISO 13485) to deliver safe and effective algorithms.
Team
Role | Description And Responsibilities | Person(s) |
---|---|---|
Technical Manager | Overall management of team planning and resources. Ensuring alignment with QMS procedures. Application of this procedure. | Alfonso Medela |
Design & Development Manager | Manages the design and development lifecycle, including verification and validation activities in accordance with GP-012. | Taig Mac Carthy |
AI Team | Develops, validates, and maintains the AI/ML algorithms. Responsible for data management, training, evaluation, and release processes. | |
Project Management
Meetings
- Sprint Meetings: The project follows an Agile framework with 2-week sprints. Bi-weekly meetings are held for sprint review, retrospective analysis, and planning.
- Daily Stand-ups: The AI team conducts daily stand-up meetings to synchronize progress, address impediments, and align on daily priorities.
- Technical Reviews: Bi-weekly or monthly meetings are held to present key R&D findings, review model architectures, and discuss experimental results with cross-functional stakeholders.
Management Tools
Tool | Description |
---|---|
Jira | To manage the product backlog, plan sprints, and track all tasks, bugs, and user stories with full traceability. |
GitHub | Central repository for technical documentation, design specifications, meeting minutes, and sprint reports. |
Project Planning
The Technical Manager is responsible for the overall project planning and monitoring, ensuring that development milestones align with the product roadmap and regulatory timelines.
Environment
Development Tools
Tool | Description |
---|---|
Bitbucket / Git | For rigorous version control of all source code, models, and critical configuration files. Enforces peer review via pull requests. |
Docker | To create containerized, reproducible environments, ensuring consistency between development, testing, and deployment. |
MLflow / Weights & Biases | For systematic tracking of experiments, including parameters, metrics, code versions, and model artifacts, ensuring full reproducibility. |
Development Software
Software | Description |
---|---|
Python >=3.9 | Primary programming language. |
TensorFlow >=2.10 / PyTorch >=1.12 | State-of-the-art deep learning frameworks. |
CUDA / cuDNN | NVIDIA libraries for GPU acceleration. |
NumPy, Pandas, Scikit-learn, OpenCV | Core libraries for data manipulation, image processing, and performance evaluation. |
Flake8 / Black / MyPy / Pytest | A suite of tools to enforce code quality, style, type safety, and correctness through automated testing. |
Development Environment
AI/ML development is conducted on a secure, high-performance computing infrastructure.
Environment | Description |
---|---|
Research Server (Ubuntu 22.04 LTS) | Primary environment for model training, evaluation, and experiment management. |
Database | PostgreSQL instance for structured storage of annotations and metadata. |
Data Storage | Secure, access-controlled cloud storage (e.g., AWS S3, Google Cloud Storage) for medical images. |
Research Server Minimum Requirements:
- OS: Ubuntu 22.04 LTS or higher
- GPU: NVIDIA A100 or H100 (or equivalent) with >= 40 GB VRAM
- CPU: `>= 32` cores @ `>= 2.5` GHz
- RAM: `>= 128` GB
- Storage: `>= 5` TB of high-speed NVMe SSD storage
AI/ML Development Plan
Development Cycle
The AI/ML development adheres to the three-phase cycle mandated by procedure GP-028 AI Development, ensuring a structured progression from design to release.
Development Specifications
All development is strictly governed by the specifications in R-TF-028-001 AI/ML Description. This document serves as the primary input for design and defines the acceptance criteria for V&V.
Development Steps
- Data Management: Sourcing, curating, annotating, and partitioning data according to GMLP.
- Training & Evaluation: Building, training, tuning, and rigorously evaluating models.
- Release (V&V): Finalizing, documenting, and packaging the model for software integration.
Data Management Plan
Good Practices
Data Collection & Curation
- Representativeness: In line with GMLP principles, data is collected to be highly representative of the intended patient population. Active measures are taken to ensure diversity across age, sex, and all six Fitzpatrick skin phototypes to promote equitable performance.
- Protocols: Data acquisition follows the detailed clinical and technical requirements in R-TF-028-003, ensuring consistency in image quality.
- Compliance: All data processing is fully compliant with GDPR. Data is de-identified at the source, and robust data protection impact assessments are conducted.
Data Quality & Integrity
- Annotation: Data is labeled by qualified dermatologists following R-TF-028-004. Critical labels are subject to a multi-annotator review process to ensure high quality and consistency.
- Traceability: Data is managed using version-controlled snapshots. Each snapshot is an immutable, timestamped collection of data and labels, ensuring a complete audit trail from data to the final model.
Ground Truth Determination
- Methodology: The ground truth for diagnoses is established by a panel of at least three board-certified dermatologists. Discrepancies are resolved by a senior reviewer or through histopathological correlation where available and clinically appropriate. This robust process minimizes label noise and ensures a high-fidelity reference standard.
Sequestration of Test Data
- Partitioning: The dataset is partitioned at the patient level into training, validation, and test sets. This strict separation is critical to prevent data leakage and ensure that the final performance evaluation is unbiased.
- Shielding: The test set is a sequestered, held-out dataset used only once for the final, unbiased evaluation of the selected model. It is never used for training, tuning, or model selection.
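As an illustration of patient-level partitioning, the following sketch uses scikit-learn's GroupShuffleSplit to keep all images from a given patient in a single partition. The column names, split ratios, and seed are assumptions for the example, not values taken from this plan.

```python
import pandas as pd
from sklearn.model_selection import GroupShuffleSplit


def patient_level_split(df: pd.DataFrame, seed: int = 42):
    """Split a data snapshot into train/validation/test sets so that no
    patient appears in more than one partition (illustrative 70/15/15)."""
    # Carve out the sequestered test set first (~15% of the data, by patient).
    outer = GroupShuffleSplit(n_splits=1, test_size=0.15, random_state=seed)
    dev_idx, test_idx = next(outer.split(df, groups=df["patient_id"]))
    dev, test = df.iloc[dev_idx], df.iloc[test_idx]

    # Split the remainder into training and validation sets (~15% of the
    # full snapshot goes to validation, i.e. 0.15 / 0.85 of the remainder).
    inner = GroupShuffleSplit(n_splits=1, test_size=0.15 / 0.85, random_state=seed)
    train_idx, val_idx = next(inner.split(dev, groups=dev["patient_id"]))
    return dev.iloc[train_idx], dev.iloc[val_idx], test


# Leakage check: no patient identifier may be shared across partitions.
# train, val, test = patient_level_split(snapshot_df)
# assert not set(train["patient_id"]) & set(test["patient_id"])
```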
Working Plan
- Data is collected, de-identified, and securely stored.
- Data is annotated according to the defined multi-stage review process.
- A versioned data snapshot is created and frozen.
- The snapshot is split by patient ID into training, validation, and test sets. The test set is immediately sequestered.
- The snapshot version and split definitions are logged for full reproducibility.
Training & Evaluation Plan
Good Practices
Reproducibility and Traceability
- Versioning: Every component is versioned: Git for code, DVC for data, and MLflow for experiments. Each trained model is linked to the exact code, data, and hyperparameters used to create it.
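As an illustration of this traceability chain, a training run could log the Git commit, data snapshot identifier, hyperparameters, metrics, and model artifact in a single MLflow run. The tag and configuration key names below are illustrative, not an established project convention.

```python
import subprocess

import mlflow


def log_training_run(config: dict, metrics: dict, model_path: str) -> None:
    """Record a training run so the resulting model can be traced back to
    the exact code revision, data snapshot, and hyperparameters used."""
    commit = subprocess.check_output(["git", "rev-parse", "HEAD"], text=True).strip()
    with mlflow.start_run():
        mlflow.set_tag("git_commit", commit)
        mlflow.set_tag("data_snapshot", config["data_snapshot"])  # e.g. a DVC tag
        mlflow.log_params(config["hyperparameters"])
        mlflow.log_metrics(metrics)
        mlflow.log_artifact(model_path)                           # trained weights
```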
Model Design & Selection
- Architecture: Model selection is informed by a systematic review of state-of-the-art architectures (e.g., ViT, ConvNeXt, EfficientNetV2).
- Hyperparameter Optimization: A structured approach (e.g., Bayesian optimization or grid search) is used to find the optimal set of hyperparameters.
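As a sketch of such a structured search, the example below uses Optuna (one possible tool; this plan does not mandate a specific library) to tune a few hyperparameters against validation AUC. The search space and the `train_and_evaluate` helper are hypothetical placeholders.

```python
import optuna


def objective(trial: optuna.Trial) -> float:
    """Sample a candidate hyperparameter set, train a model with it, and
    return the validation AUC. `train_and_evaluate` is a placeholder for
    the project's actual training entry point."""
    params = {
        "learning_rate": trial.suggest_float("learning_rate", 1e-5, 1e-2, log=True),
        "weight_decay": trial.suggest_float("weight_decay", 1e-6, 1e-3, log=True),
        "dropout": trial.suggest_float("dropout", 0.0, 0.5),
    }
    return train_and_evaluate(**params)  # placeholder; returns validation AUC


study = optuna.create_study(direction="maximize")  # maximize validation AUC
study.optimize(objective, n_trials=50)
print("Best hyperparameters:", study.best_params)
```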
Model Training & Tuning
- Augmentation: A rich set of data augmentation techniques is used to improve generalization, including geometric transformations (rotation, scaling, flipping) and photometric distortions (brightness, contrast, color jitter) that reflect real-world variability.
- Overfitting Mitigation: In addition to augmentation, techniques like dropout, weight decay, and early stopping are employed to ensure models generalize well to unseen data.
- Model Calibration: Post-training calibration techniques (e.g., temperature scaling) are applied to ensure that the model's output probabilities are reliable and well-calibrated, meaning a predicted 80% confidence accurately reflects an 80% likelihood of correctness.
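A minimal sketch of temperature scaling follows, assuming held-out validation logits and labels are available as PyTorch tensors; it mirrors the standard post-hoc calibration recipe rather than describing the device's exact implementation.

```python
import torch
import torch.nn.functional as F


def fit_temperature(val_logits: torch.Tensor, val_labels: torch.Tensor) -> float:
    """Learn a single temperature T that minimizes the negative log-likelihood
    of held-out validation predictions (standard temperature scaling)."""
    log_t = torch.zeros(1, requires_grad=True)  # optimize log(T) so that T > 0
    optimizer = torch.optim.LBFGS([log_t], lr=0.1, max_iter=100)

    def closure():
        optimizer.zero_grad()
        loss = F.cross_entropy(val_logits / log_t.exp(), val_labels)
        loss.backward()
        return loss

    optimizer.step(closure)
    return float(log_t.exp())


# At inference time, calibrated probabilities are softmax(logits / T):
# T = fit_temperature(val_logits, val_labels)
# calibrated_probs = F.softmax(test_logits / T, dim=1)
```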
Model Evaluation & Validation
- Robustness Analysis: Performance is evaluated not just on aggregate metrics but also across key patient subgroups (e.g., by skin phototype, age, sex) to proactively identify and mitigate potential biases.
- Explainability (XAI): During development, XAI techniques (e.g., Grad-CAM, SHAP) are used to visualize and understand the model's decision-making process. This helps verify that the model is learning clinically relevant features and not relying on confounding artifacts.
- Statistical Rigor: All key performance metrics are reported with 95% confidence intervals to accurately represent statistical uncertainty.
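As an example of such statistical reporting, a 95% confidence interval for AUC can be estimated with a percentile bootstrap. The sketch below resamples cases with replacement and is illustrative of the approach, not the exact statistical protocol used for the final performance report.

```python
import numpy as np
from sklearn.metrics import roc_auc_score


def bootstrap_auc_ci(y_true: np.ndarray, y_score: np.ndarray,
                     n_boot: int = 2000, alpha: float = 0.05, seed: int = 0):
    """Point estimate of AUC with a percentile-bootstrap confidence interval."""
    rng = np.random.default_rng(seed)
    n = len(y_true)
    aucs = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)       # resample cases with replacement
        if len(np.unique(y_true[idx])) < 2:
            continue                           # skip resamples missing a class
        aucs.append(roc_auc_score(y_true[idx], y_score[idx]))
    lower, upper = np.percentile(aucs, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return roc_auc_score(y_true, y_score), (lower, upper)


# Example: auc, (lo, hi) = bootstrap_auc_ci(test_labels, test_scores)
```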
Working Plan
- A model configuration file specifies all parameters for a training run.
- The model is trained, with all metrics and artifacts logged in real-time to MLflow.
- A uniquely identified model package is generated, containing the model, its configuration, and training history.
- A final, comprehensive evaluation is performed on the held-out test set, with results and explainability analyses compiled into the final performance report.
Release Plan
Good Practices
- Equivalence Testing: Models are converted to a high-performance format (e.g., ONNX). Rigorous tests are run to verify near-identical numerical output between the original and converted models (see the sketch after this list).
- Comprehensive Reporting: The AI/ML Development Report (R-TF-028-005) provides a complete account of the development and V&V process, serving as objective evidence that the model is safe and effective.
- Clear Instructions: The AI/ML Release (R-TF-028-006) document provides the software team with precise integration specifications.
- Semantic Versioning: The algorithm release package is assigned a unique semantic version (e.g., `v1.0.0`), with full traceability to the versions of its constituent models.
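A minimal sketch of the equivalence check referenced in the first item above, assuming a PyTorch source model exported to ONNX; the input shape, sample count, and tolerance are illustrative choices, not the acceptance criteria from R-TF-028-001.

```python
import numpy as np
import onnxruntime as ort
import torch


def check_onnx_equivalence(model: torch.nn.Module, onnx_path: str,
                           n_samples: int = 100, atol: float = 1e-4) -> bool:
    """Compare the original PyTorch model and its ONNX export on random
    inputs and flag any output difference above the chosen tolerance."""
    model.eval()
    session = ort.InferenceSession(onnx_path)
    input_name = session.get_inputs()[0].name
    for _ in range(n_samples):
        x = torch.randn(1, 3, 224, 224)        # illustrative input shape
        with torch.no_grad():
            expected = model(x).numpy()
        actual = session.run(None, {input_name: x.numpy()})[0]
        if not np.allclose(expected, actual, atol=atol):
            return False
    return True
```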
Working Plan
- Verification is performed to confirm the model was developed according to this plan.
- Validation is performed to confirm the model meets the acceptance criteria in R-TF-028-001.
- The V&V results are documented in the AI/ML Development Report (R-TF-028-005).
- The final algorithm package and AI/ML Release (R-TF-028-006) are delivered to the software team.
Deliverables
Documentation
- All R-TF-028-xxx documents generated, including Description, Development Plan, Reports, and completed V&V checklists.
Algorithm Package
- 1 ICD Category Distribution algorithm (as `.onnx` file).
- 1 Binary Indicators configuration (as `.json` mapping file).
AI/ML Risk Management Plan
This plan focuses on risks inherent to the AI/ML development lifecycle, as recorded in R-TF-028-011 AI/ML Risk Matrix. This process is a key input into the overall device risk management activities governed by ISO 14971.
AI/ML Risk Management Process
- Risk Assessment: Systematically identifying, analyzing, and evaluating risks related to data, model training, and performance.
- Risk Control: Implementing and verifying mitigation measures for all unacceptable risks.
- Monitoring & Review: Continuously reviewing risks throughout the lifecycle.
AI/ML Risk Ranking System
Severity
Severity is based on the potential impact on model performance and its clinical utility.
Ranking | Definition | Severity |
---|---|---|
5 | Degrades model performance to a point of being fundamentally flawed or unsafe (e.g., systematically misclassifies critical conditions). | Catastrophic |
4 | Significantly degrades model performance, making it frequently unreliable or erroneous for its intended task. | Critical |
3 | Moderately degrades model performance, making it often erroneous under specific, plausible conditions. | Moderate |
2 | Slightly degrades model performance, making it sometimes erroneous or showing minor performance loss. | Minor |
1 | Negligibly degrades model performance with no discernible impact on clinical utility. | Negligible |
Likelihood
Likelihood of the risk occurring during development.
Ranking | Definition | Likelihood |
---|---|---|
5 | Almost certain to occur if not controlled. | Very high |
4 | Likely to occur. | High |
3 | May occur. | Moderate |
2 | Unlikely to occur. | Low |
1 | Extremely unlikely to occur. | Very low |
AI/ML Risk Priority Number and Acceptability
Severity →<br>Likelihood ↓ | Negligible (1) | Minor (2) | Moderate (3) | Critical (4) | Catastrophic (5) |
---|---|---|---|---|---|
Very high (5) | Tolerable (5) | Tolerable (10) | Unacceptable (15) | Unacceptable (20) | Unacceptable (25) |
High (4) | Acceptable (4) | Tolerable (8) | Tolerable (12) | Unacceptable (16) | Unacceptable (20) |
Moderate (3) | Acceptable (3) | Tolerable (6) | Tolerable (9) | Tolerable (12) | Unacceptable (15) |
Low (2) | Acceptable (2) | Acceptable (4) | Tolerable (6) | Tolerable (8) | Tolerable (10) |
Very low (1) | Acceptable (1) | Acceptable (2) | Acceptable (3) | Acceptable (4) | Tolerable (5) |
- Acceptable: RPN ≤ 4
- Tolerable: 5 ≤ RPN ≤ 12 (requires risk-benefit analysis)
- Unacceptable: RPN ≥ 15 (requires mitigation)
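Because every cell in the matrix above is the product of the severity and likelihood rankings, the acceptability class can be derived programmatically. The helper below is a small illustrative sketch of that rule, not part of the QMS tooling.

```python
def rpn_acceptability(severity: int, likelihood: int) -> tuple:
    """Compute the RPN (severity x likelihood) and its acceptability class
    according to the thresholds defined above."""
    if not (1 <= severity <= 5 and 1 <= likelihood <= 5):
        raise ValueError("severity and likelihood rankings must be between 1 and 5")
    rpn = severity * likelihood
    if rpn <= 4:
        return rpn, "Acceptable"
    if rpn <= 12:
        return rpn, "Tolerable (requires risk-benefit analysis)"
    return rpn, "Unacceptable (requires mitigation)"


# Example: a moderate-severity risk (3) that is likely to occur (4).
print(rpn_acceptability(severity=3, likelihood=4))  # (12, 'Tolerable (requires risk-benefit analysis)')
```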
Safety Risks Related to AI/ML
The AI team is responsible for identifying how AI/ML development risks can contribute to hazardous situations. These "safety risks related to AI/ML" are escalated to the product team for inclusion in the overall Safety Risk Matrix and are mitigated through a combination of technical controls and user-facing measures, in line with ISO 14971.
Signature meaning
The signatures for the approval process of this document can be found in the verified commits at the repository for the QMS. As a reference, the team members who are expected to participate in this document and their roles in the approval process, as defined in Annex I Responsibility Matrix of the GP-001, are:
- Author: Team members involved
- Reviewer: JD-003, JD-004
- Approver: JD-001