R-TF-028-001 AI Development Plan
Table of contents
Abbreviations
| Term | Definition | 
|---|---|
| AI | Artificial Intelligence | 
| AUC | Area Under the Receiver Operating Characteristic Curve | 
| CDS | Clinical Decision Support | 
| GDPR | General Data Protection Regulation | 
| GMLP | Good Machine Learning Practice | 
| ICD | International Classification of Diseases | 
| MDR | Medical Device Regulation | 
| QMS | Quality Management System | 
| RPN | Risk Priority Number | 
| V&V | Verification and Validation | 
| ViT | Vision Transformer | 
| XAI | Explainable Artificial Intelligence | 
Introduction
Context
Legit.Health Plus provides advanced Clinical Decision Support (CDS) through AI algorithms designed to assist qualified healthcare professionals in the assessment of dermatological conditions. The algorithms analyze clinical and dermoscopic images of skin lesions to generate objective, data-driven insights. It is critical to note that the device is intended to augment, not replace, the clinical judgment of a healthcare professional.
The core AI functionality is delivered through two algorithm types:
- An ICD Category Distribution Algorithm: A multiclass classification model that processes a lesion image and outputs a ranked probability distribution across relevant ICD-11 categories, presenting the top five differential diagnoses.
 - Binary Indicator Algorithms: Derived from the primary model's output, these algorithms provide three discrete indicators for case prioritization: Malignancy, Dermatological Condition, and Critical Complexity.
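As a sketch of how indicators of this kind can be derived from a category distribution, the snippet below collapses a per-category probability map into indicator scores. The category names, the malignant-category set, and the healthy-skin class are hypothetical illustrations, not the device's actual ICD-11 configuration, and only two of the three indicators are shown.

```python
# Illustrative sketch only: deriving binary indicators from an ICD category
# distribution. Category names and the mapping are hypothetical examples,
# not the device's actual configuration.

MALIGNANT_CATEGORIES = {"melanoma", "basal_cell_carcinoma", "squamous_cell_carcinoma"}

def binary_indicators(icd_distribution):
    """Collapse a per-category probability distribution into indicator scores."""
    malignancy = sum(p for c, p in icd_distribution.items() if c in MALIGNANT_CATEGORIES)
    # "Dermatological Condition": probability mass outside the healthy-skin class.
    dermatological = 1.0 - icd_distribution.get("healthy_skin", 0.0)
    # A "Critical Complexity" indicator would be derived analogously from a
    # (hypothetical) set of critical categories.
    return {"malignancy": malignancy, "dermatological_condition": dermatological}

dist = {"melanoma": 0.15, "nevus": 0.60, "healthy_skin": 0.25}
print(binary_indicators(dist))  # malignancy 0.15, dermatological_condition 0.75
```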
 
Objectives
The primary objectives of this development plan are to:
- Develop a robust ICD Category Distribution algorithm to assist clinicians in formulating a differential diagnosis, thereby enhancing diagnostic accuracy and efficiency, while meeting the performance endpoints specified in R-TF-028-001.
 - Develop three highly performant Binary Indicator algorithms to provide clear, actionable signals for clinical workflow prioritization, meeting the AUC thresholds defined in R-TF-028-001.
 - Ensure the entire development lifecycle adheres to the company's QMS, GMLP principles, and applicable regulations (MDR 2017/745, ISO 13485) to deliver safe and effective algorithms.
 
Team
| Role | Description And Responsibilities | Person(s) | 
|---|---|---|
| Technical Manager | Overall management of team planning and resources. Ensuring alignment with QMS procedures. Application of this procedure. | Alfonso Medela | 
| Design & Development Manager | Manages the design and development lifecycle, including verification and validation activities in accordance with GP-012. | Taig Mac Carthy | 
| AI Team | Develops, validates, and maintains the AI algorithms. Responsible for data management, training, evaluation, and release processes. | | 
Project Management
Meetings
- Sprint Meetings: The project follows an Agile framework with 2-week sprints. Bi-weekly meetings are held for sprint review, retrospective analysis, and planning.
 - Daily Stand-ups: The AI team conducts daily stand-up meetings to synchronize progress, address impediments, and align on daily priorities.
 - Technical Reviews: Bi-weekly or monthly meetings are held to present key R&D findings, review model architectures, and discuss experimental results with cross-functional stakeholders.
 
Management Tools
| Tool | Description | 
|---|---|
| Jira | To manage the product backlog, plan sprints, and track all tasks, bugs, and user stories with full traceability. | 
| GitHub | Central repository for technical documentation, design specifications, meeting minutes, and sprint reports. | 
Project Planning
The Technical Manager is responsible for the overall project planning and monitoring, ensuring that development milestones align with the product roadmap and regulatory timelines.
Environment
Development Tools
| Tool | Description | 
|---|---|
| Bitbucket / Git | For rigorous version control of all source code, models, and critical configuration files. Enforces peer review via pull requests. | 
| Docker | To create containerized, reproducible environments, ensuring consistency between development, testing, and deployment. | 
| MLflow / Weights & Biases | For systematic tracking of experiments, including parameters, metrics, code versions, and model artifacts, ensuring full reproducibility. | 
Development Software
| Software | Description | 
|---|---|
| Python >=3.9 | Primary programming language. | 
| TensorFlow >=2.10 / PyTorch >=1.12 | State-of-the-art deep learning frameworks. | 
| CUDA / cuDNN | NVIDIA libraries for GPU acceleration. | 
| NumPy, Pandas, Scikit-learn, OpenCV | Core libraries for data manipulation, image processing, and performance evaluation. | 
| Flake8 / Black / MyPy / Pytest | A suite of tools to enforce code quality, style, type safety, and correctness through automated testing. | 
Development Environment
AI development is conducted on a secure, high-performance computing infrastructure.
| Environment | Description | 
|---|---|
| Research Server (Ubuntu 22.04 LTS) | Primary environment for model training, evaluation, and experiment management. | 
| Database | PostgreSQL instance for structured storage of annotations and metadata. | 
| Data Storage | Secure, access-controlled cloud storage (e.g., AWS S3, Google Cloud Storage) for medical images. | 
Research Server Minimum Requirements:
- OS: Ubuntu 22.04 LTS or higher
 - GPU: NVIDIA A100 or H100 (or equivalent) with >= 40 GB VRAM
 - CPU: >= 32 cores @ >= 2.5 GHz
 - RAM: >= 128 GB
 - Storage: >= 5 TB of high-speed NVMe SSD storage
 
AI Development Plan
Development Cycle
The AI development adheres to the three-phase cycle mandated by procedure GP-028 AI Development, ensuring a structured progression from design to release.
Development Specifications
All development is strictly governed by the specifications in R-TF-028-001 AI Description. This document serves as the primary input for design and defines the acceptance criteria for V&V.
Development Steps
- Data Management: Sourcing, curating, annotating, and partitioning data according to GMLP.
 - Training & Evaluation: Building, training, tuning, and rigorously evaluating models.
 - Release (V&V): Finalizing, documenting, and packaging the model for software integration.
 
Data Management Plan
Good Practices
Data Collection & Curation
- Representativeness: In line with GMLP principles, data is collected to be highly representative of the intended patient population. Active measures are taken to ensure diversity across age, sex, and all six Fitzpatrick skin phototypes to promote equitable performance.
- Protocols: Data acquisition follows the detailed clinical and technical requirements in R-TF-028-003, ensuring consistency in image quality.
 - Compliance: All data processing is fully compliant with GDPR. Data is de-identified at the source, and robust data protection impact assessments are conducted.
 
Data Quality & Integrity
- Annotation: Data is labeled by qualified dermatologists following R-TF-028-004. Critical labels are subject to a multi-annotator review process to ensure high quality and consistency.
 - Traceability: Data is managed using version-controlled snapshots. Each snapshot is an immutable, timestamped collection of data and labels, ensuring a complete audit trail from data to the final model.
 
Ground Truth Determination
- Methodology: The ground truth for diagnoses is established by a panel of at least three board-certified dermatologists. Discrepancies are resolved by a senior reviewer or through histopathological correlation where available and clinically appropriate. This robust process minimizes label noise and ensures a high-fidelity reference standard.
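The adjudication step described above can be sketched as a simple majority-vote rule. The function and labels below are hypothetical illustrations of the escalation logic, not the actual adjudication tooling; in practice the senior-review and histopathology paths involve clinical workflow outside any script.

```python
# Hypothetical sketch of panel adjudication: take the majority label from
# at least three annotators and flag disagreements for senior review.
from collections import Counter
from typing import List, Optional, Tuple

def adjudicate(labels: List[str]) -> Tuple[Optional[str], bool]:
    """Return (consensus_label, needs_senior_review)."""
    if len(labels) < 3:
        raise ValueError("at least three annotations are required")
    (top, count), = Counter(labels).most_common(1)
    if count > len(labels) / 2:
        return top, count < len(labels)  # majority; review unless unanimous
    return None, True                    # no majority: escalate to senior reviewer

print(adjudicate(["nevus", "nevus", "melanoma"]))  # ('nevus', True)
print(adjudicate(["nevus", "nevus", "nevus"]))     # ('nevus', False)
```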
 
Sequestration of Test Data
- Partitioning: The dataset is partitioned at the patient level into training, validation, and test sets. This strict separation is critical to prevent data leakage and ensure that the final performance evaluation is unbiased.
 - Shielding: The test set is a sequestered, held-out dataset used only once for the final, unbiased evaluation of the selected model. It is never used for training, tuning, or model selection.
 
Working Plan
- Data is collected, de-identified, and securely stored.
 - Data is annotated according to the defined multi-stage review process.
 - A versioned data snapshot is created and frozen.
 - The snapshot is split by patient ID into training, validation, and test sets. The test set is immediately sequestered.
 - The snapshot version and split definitions are logged for full reproducibility.
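Step 4 of this plan can be sketched as a deterministic, hash-based assignment: hashing the patient ID guarantees that every image from one patient lands in a single partition, and the fixed salt makes the split reproducible across runs. The ratios, salt, and function name are illustrative assumptions, not the documented splitting tool.

```python
# Sketch of a deterministic patient-level split. All constants here
# (salt, 70/15/15 ratios) are illustrative assumptions.
import hashlib

def assign_split(patient_id: str, salt: str = "snapshot-2024") -> str:
    digest = hashlib.sha256(f"{salt}:{patient_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 100  # stable bucket in [0, 100)
    if bucket < 70:
        return "train"
    if bucket < 85:
        return "val"
    return "test"  # sequestered; never used for training or tuning

print(assign_split("patient-0001"))
```

Because the assignment depends only on the patient ID, re-running the split on a later snapshot cannot move a patient between partitions, which is what prevents leakage.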
 
Training & Evaluation Plan
Good Practices
Reproducibility and Traceability
- Versioning: Every component is versioned: Git for code, DVC for data, and MLflow for experiments. Each trained model is linked to the exact code, data, and hyperparameters used to create it.
 
Model Design & Selection
- Architecture: Model selection is informed by a systematic review of state-of-the-art architectures (e.g., ViT, ConvNeXt, EfficientNetV2).
 - Hyperparameter Optimization: A structured approach (e.g., Bayesian optimization or grid search) is used to find the optimal set of hyperparameters.
 
Model Training & Tuning
- Augmentation: A rich set of data augmentation techniques is used to improve generalization, including geometric transformations (rotation, scaling, flipping) and photometric distortions (brightness, contrast, color jitter) that reflect real-world variability.
 - Overfitting Mitigation: In addition to augmentation, techniques like dropout, weight decay, and early stopping are employed to ensure models generalize well to unseen data.
 - Model Calibration: Post-training calibration techniques (e.g., temperature scaling) are applied to ensure that the model's output probabilities are reliable and well-calibrated, meaning a predicted 80% confidence accurately reflects an 80% likelihood of correctness.
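Temperature scaling, mentioned above, amounts to dividing the logits by a scalar T before the softmax. The sketch below shows the effect with illustrative logits; in practice T is fitted on the validation set by minimizing negative log-likelihood, which is not shown here.

```python
# Minimal temperature-scaling sketch. The logits and temperature are
# illustrative; T is normally fitted on validation data.
import math

def softmax(logits, temperature=1.0):
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [4.0, 1.0, 0.5]
overconfident = softmax(logits)                 # T = 1: sharp distribution
calibrated = softmax(logits, temperature=2.0)   # T > 1 softens the probabilities
print(max(overconfident), max(calibrated))
```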
 
Model Evaluation & Validation
- Robustness Analysis: Performance is evaluated not just on aggregate metrics but also across key patient subgroups (e.g., by skin phototype, age, sex) to proactively identify and mitigate potential biases.
 - Explainability (XAI): During development, XAI techniques (e.g., Grad-CAM, SHAP) are used to visualize and understand the model's decision-making process. This helps verify that the model is learning clinically relevant features and not relying on confounding artifacts.
 - Statistical Rigor: All key performance metrics are reported with 95% confidence intervals to accurately represent statistical uncertainty.
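A common way to obtain the 95% confidence intervals mentioned above is the percentile bootstrap, sketched here for accuracy on synthetic data; the same resampling approach applies to AUC or any other metric. The data and parameters are illustrative.

```python
# Percentile-bootstrap sketch for a 95% CI on accuracy. The per-case
# correctness vector and parameters are synthetic illustrations.
import random

def bootstrap_ci(correct, n_boot=2000, seed=0):
    rng = random.Random(seed)  # fixed seed for reproducibility
    n = len(correct)
    accs = sorted(sum(rng.choices(correct, k=n)) / n for _ in range(n_boot))
    return accs[int(0.025 * n_boot)], accs[int(0.975 * n_boot)]

correct = [1] * 85 + [0] * 15  # 85% observed accuracy on 100 cases
low, high = bootstrap_ci(correct)
print(f"accuracy 0.85, 95% CI [{low:.3f}, {high:.3f}]")
```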
 
Working Plan
- A model configuration file specifies all parameters for a training run.
 - The model is trained, with all metrics and artifacts logged in real-time to MLflow.
 - A uniquely identified model package is generated, containing the model, its configuration, and training history.
 - A final, comprehensive evaluation is performed on the held-out test set, with results and explainability analyses compiled into the final performance report.
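The configuration file referred to in step 1 might look like the following fragment; the schema, field names, and values are illustrative assumptions, not the actual configuration format.

```yaml
# Illustrative training configuration (hypothetical schema)
model:
  architecture: convnext_tiny
  num_classes: 65
data:
  snapshot_version: "2024.03-r1"
  split_seed: 42
training:
  epochs: 100
  batch_size: 32
  optimizer: adamw
  learning_rate: 1.0e-4
  early_stopping_patience: 10
tracking:
  mlflow_experiment: icd-distribution
```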
 
Release Plan
Good Practices
- Model Serialization: Models are saved in PyTorch native format (.pt or .pth), preserving full model architecture, weights, and computational graph for reliable deployment and reproducibility.
 - Comprehensive Reporting: The AI Development Report (R-TF-028-005) provides a complete account of the development and V&V process, serving as objective evidence that the model is safe and effective.
 - Clear Instructions: The AI Release (R-TF-028-006) document provides the software team with precise integration specifications, including PyTorch runtime requirements and inference procedures.
 - Semantic Versioning: The algorithm release package is assigned a unique semantic version (e.g., v1.0.0), with full traceability to the versions of its constituent models.
Working Plan
- Verification is performed to confirm the model was developed according to this plan.
 - Validation is performed to confirm the model meets the acceptance criteria in R-TF-028-001.
 - The V&V results are documented in the AI Development Report (R-TF-028-005).
 - The final algorithm package and AI Release (R-TF-028-006) are delivered to the software team.
Deliverables
Documentation
- All R-TF-028-xxx documents generated, including Description, Development Plan, Reports, and completed V&V checklists.
Algorithm Package
The algorithm package contains all AI models described in R-TF-028-001 AI Description, organized by clinical function. All models are trained and deployed using the PyTorch framework and saved in native PyTorch format (.pt, .pth, or .ckpt files) to ensure full preservation of model architecture, training state, and computational graph integrity.
Detailed integration specifications, including input/output formats, preprocessing requirements, and post-processing instructions, are provided in R-TF-028-006 AI Release.
Clinical Models
Clinical models directly fulfill the device's intended purpose by providing quantitative data on clinical signs and interpretative distributions of ICD categories.
ICD Category Distribution and Binary Indicators
| Model | File Name | 
|---|---|
| ICD Category Distribution | icd_distribution_v{X.Y.Z}.* | 
| Binary Indicators Mapping | binary_indicators_mapping_v{X.Y.Z}.json | 
Intensity Quantification Models
| Clinical Sign | File Name | 
|---|---|
| Erythema Intensity | erythema_intensity_v{X.Y.Z}.* | 
| Desquamation Intensity | desquamation_intensity_v{X.Y.Z}.* | 
| Induration Intensity | induration_intensity_v{X.Y.Z}.* | 
| Pustule Intensity | pustule_intensity_v{X.Y.Z}.* | 
| Crusting Intensity | crusting_intensity_v{X.Y.Z}.* | 
| Xerosis Intensity | xerosis_intensity_v{X.Y.Z}.* | 
| Swelling Intensity | swelling_intensity_v{X.Y.Z}.* | 
| Oozing Intensity | oozing_intensity_v{X.Y.Z}.* | 
| Excoriation Intensity | excoriation_intensity_v{X.Y.Z}.* | 
| Lichenification Intensity | lichenification_intensity_v{X.Y.Z}.* | 
Wound Assessment
| Model | File Name | 
|---|---|
| Wound Characteristics Multi-Task | wound_characteristics_v{X.Y.Z}.* | 
Surface Quantification Models
| Surface Type | File Name | 
|---|---|
| Body Surface Segmentation | body_surface_segmentation_v{X.Y.Z}.* | 
| Wound Surface Quantification | wound_surface_v{X.Y.Z}.* | 
| Hair Loss Surface Quantification | hair_loss_surface_v{X.Y.Z}.* | 
| Hypopigmentation/Depigmentation | hypopigmentation_surface_v{X.Y.Z}.* | 
Lesion Counting Models
| Lesion Type | File Name | 
|---|---|
| Inflammatory Nodular Lesions | inflammatory_nodular_v{X.Y.Z}.* | 
| Acneiform Lesion Types | acneiform_lesion_types_v{X.Y.Z}.* | 
| Inflammatory Lesions | inflammatory_lesion_v{X.Y.Z}.* | 
| Hive Lesions | hive_lesion_v{X.Y.Z}.* | 
| Nail Lesion Surface | nail_lesion_surface_v{X.Y.Z}.* | 
Pattern Identification Models
| Pattern Type | File Name | 
|---|---|
| Acneiform Inflammatory Pattern | acneiform_pattern_v{X.Y.Z}.* | 
| Follicular and Inflammatory Pattern | follicular_inflammatory_pattern_v{X.Y.Z}.* | 
| Inflammatory Pattern Identification | inflammatory_pattern_v{X.Y.Z}.* | 
| Inflammatory Pattern Indicator | inflammatory_pattern_indicator_v{X.Y.Z}.* | 
Non-Clinical Models
Non-clinical models enable proper functioning of clinical models through quality assurance, preprocessing, and technical validation. These models do not directly provide clinical outputs but support the reliability and safety of the device.
Quality Assessment Models
| Model | File Name | 
|---|---|
| Dermatology Image Quality Assessment (DIQA) | diqa_v{X.Y.Z}.* | 
| Domain Validation | domain_validation_v{X.Y.Z}.* | 
Support Models
| Model | File Name | 
|---|---|
| Fitzpatrick Skin Type Identification | fitzpatrick_v{X.Y.Z}.* | 
| Skin Surface Segmentation | skin_surface_segmentation_v{X.Y.Z}.* | 
| Surface Area Quantification | surface_area_v{X.Y.Z}.* | 
| Body Site Identification | body_site_v{X.Y.Z}.* | 
Model Versioning
All models follow semantic versioning: model_name_v{MAJOR.MINOR.PATCH}.*, where * represents the file extension (.pt, .pth, or .ckpt).
- MAJOR: Incremented for incompatible API changes or significant architecture modifications
 - MINOR: Incremented for backward-compatible functionality additions or performance improvements
 - PATCH: Incremented for backward-compatible bug fixes or minor updates
 
Each model package includes accompanying metadata documenting training date, dataset version, performance metrics, and dependencies.
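The file-name convention above can be parsed and compared mechanically; version tuples then sort correctly under ordinary tuple comparison. The helper below is an illustrative sketch, not part of the release tooling.

```python
# Sketch: parse model file names of the form model_name_v{MAJOR.MINOR.PATCH}.ext
# (ext in .pt/.pth/.ckpt) into a name and a comparable version tuple.
import re

VERSION_RE = re.compile(
    r"^(?P<name>.+)_v(?P<major>\d+)\.(?P<minor>\d+)\.(?P<patch>\d+)\.(?P<ext>pt|pth|ckpt)$"
)

def parse_model_filename(filename):
    m = VERSION_RE.match(filename)
    if m is None:
        raise ValueError(f"not a versioned model file: {filename}")
    version = (int(m["major"]), int(m["minor"]), int(m["patch"]))
    return m["name"], version

name, version = parse_model_filename("icd_distribution_v1.2.0.pt")
print(name, version)  # icd_distribution (1, 2, 0)
```

Tuple comparison handles multi-digit components correctly, e.g. (1, 2, 0) < (1, 10, 0), which naive string comparison of "v1.2.0" and "v1.10.0" would get wrong.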
AI Risk Management Plan
This plan focuses on risks inherent to the AI development lifecycle, as recorded in R-TF-028-011 AI Risk Matrix. This process is a key input into the overall device risk management activities governed by ISO 14971.
AI Risk Management Process
- Risk Assessment: Systematically identifying, analyzing, and evaluating risks related to data, model training, and performance.
 - Risk Control: Implementing and verifying mitigation measures for all unacceptable risks.
 - Monitoring & Review: Continuously reviewing risks throughout the lifecycle.
 
AI Risk Ranking System
Severity
Severity is based on the potential impact on model performance and its clinical utility.
| Ranking | Definition | Severity | 
|---|---|---|
| 5 | Degrades model performance to a point of being fundamentally flawed or unsafe (e.g., systematically misclassifies critical conditions). | Catastrophic | 
| 4 | Significantly degrades model performance, making it frequently unreliable or erroneous for its intended task. | Critical | 
| 3 | Moderately degrades model performance, making it often erroneous under specific, plausible conditions. | Moderate | 
| 2 | Slightly degrades model performance, making it sometimes erroneous or showing minor performance loss. | Minor | 
| 1 | Negligibly degrades model performance with no discernible impact on clinical utility. | Negligible | 
Likelihood
Likelihood of the risk occurring during development.
| Ranking | Definition | Likelihood | 
|---|---|---|
| 5 | Almost certain to occur if not controlled. | Very high | 
| 4 | Likely to occur. | High | 
| 3 | May occur. | Moderate | 
| 2 | Unlikely to occur. | Low | 
| 1 | Extremely unlikely to occur. | Very low | 
AI Risk Priority Number and Acceptability
| Severity →<br>Likelihood ↓ | Negligible (1) | Minor (2) | Moderate (3) | Critical (4) | Catastrophic (5) | 
|---|---|---|---|---|---|
| Very high (5) | Tolerable (5) | Tolerable (10) | Unacceptable (15) | Unacceptable (20) | Unacceptable (25) | 
| High (4) | Acceptable (4) | Tolerable (8) | Tolerable (12) | Unacceptable (16) | Unacceptable (20) | 
| Moderate (3) | Acceptable (3) | Tolerable (6) | Tolerable (9) | Tolerable (12) | Unacceptable (15) | 
| Low (2) | Acceptable (2) | Acceptable (4) | Tolerable (6) | Tolerable (8) | Tolerable (10) | 
| Very low (1) | Acceptable (1) | Acceptable (2) | Acceptable (3) | Acceptable (4) | Tolerable (5) | 
- Acceptable: RPN ≤ 4
 - Tolerable: 5 ≤ RPN ≤ 12 (requires risk-benefit analysis)
 - Unacceptable: RPN ≥ 15 (requires mitigation)
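The acceptability rule follows directly from the matrix, since the RPN is the product of the two rankings. The sketch below encodes those thresholds; it is an illustration of the ranking logic, not the risk-matrix tooling itself.

```python
# RPN sketch implementing the acceptability thresholds above
# (severity and likelihood each ranked 1-5).

def rpn_acceptability(severity, likelihood):
    rpn = severity * likelihood
    if rpn >= 15:
        return rpn, "Unacceptable"  # requires mitigation
    if rpn >= 5:
        return rpn, "Tolerable"     # requires risk-benefit analysis
    return rpn, "Acceptable"

print(rpn_acceptability(3, 4))  # (12, 'Tolerable')
print(rpn_acceptability(5, 5))  # (25, 'Unacceptable')
```

Note that severity 3 × likelihood 4 and severity 4 × likelihood 3 both yield RPN 12, matching the "Tolerable (12)" cells in the matrix.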
Safety Risks Related to AI
The AI team is responsible for identifying how AI development risks can contribute to hazardous situations. These "safety risks related to AI" are escalated to the product team for inclusion in the overall Safety Risk Matrix and are mitigated through a combination of technical controls and user-facing measures, in line with ISO 14971.
<!-- TODO: verify who signs this; it was copy-pasted! -->
<Signature contentTitle={contentTitle} />