R-TF-028-010 AI V&V Checks
Purpose
This checklist is used to verify that the Verification & Validation Phase of the AI development lifecycle has been completed in accordance with procedure GP-028 AI Development. It ensures that the AI Development Report and AI Release Report are complete, all performance criteria have been met, and the algorithm package is ready for integration into the target software environment.
Scope
This V&V check covers all AI/ML algorithms integrated into Legit.Health Plus version 1.1.0.0, including:
- Clinical Models (54 models): ICD Category Distribution (1 model), Visual Sign Intensity Quantification (10 models), Wound Characteristic Assessment (24 models), Lesion Quantification (5 models), Surface Quantification (12 models), Pattern Identification (2 models).
- Non-Clinical Models (5 models): DIQA, Domain Validation, Skin Surface Segmentation, Body Surface Segmentation, Head Detection.
Instructions
The verifier must assess each item in the checklist. For each item, select "Yes," "No," or "N/A" and provide comments where necessary, especially for any "No" answers. All "No" items must be resolved before the V&V Phase can be considered complete and the algorithm package can be released.
Checklist
R-TF-028-005 AI Development Report
| Check Item | Yes/No/NA | Comments |
|---|---|---|
| 1. Is the data management process fully documented and traceable? | Yes | Data Management section provides comprehensive documentation of dataset provenance (280,342 images from 850 ICD-11 categories), composition by Fitzpatrick skin type, age, sex, and data quality verification processes. |
| 2. Is the training methodology comprehensively described for each model? | Yes | Each model section includes detailed architecture selection rationale, hyperparameter choices, data augmentation strategies, loss functions, optimizers, and training duration with justification. |
| 3. Are performance results presented with appropriate statistical measures (e.g., confidence intervals)? | Yes | All performance metrics include 95% confidence intervals calculated via bootstrap resampling (1000-2000 iterations). Results tables include sample sizes for transparency. |
| 4. Do all models meet their predefined performance criteria as specified in R-TF-028-001? | Yes | All models demonstrate PASS outcomes against success criteria: ICD Distribution (Top-1 ≥50%, Top-3 ≥60%, Top-5 ≥70%), Binary Indicators (AUC ≥0.80), Visual Signs (RMAE thresholds per sign). |
| 5. Has bias analysis been conducted across relevant subpopulations? | Yes | Comprehensive subgroup analysis performed for Fitzpatrick skin types (I-II, III-IV, V-VI), age groups (Pediatric, Adult, Geriatric), sex (Male, Female), and image type (Clinical, Dermoscopic). |
| 6. Are bias analysis results acceptable for all subgroups? | Yes | All subgroups meet performance criteria. Minor performance variation noted for FST V-VI with ICD classification, documented with mitigation strategies for ongoing monitoring. |
| 7. Has the test set been properly sequestered and used only for final evaluation? | Yes | Test set (12.74% of data, 35,726 images) was sequestered with patient-level separation from training/validation sets. Documentation confirms test set used only once for final evaluation. |
| 8. Is there evidence of robustness testing under various conditions? | Yes | Robustness checks performed including rotations, brightness/contrast adjustments, zoom, and image quality variations. Domain-specific artifact simulation during training (rulers, markers, dermoscopy shadows). |
| 9. Are explainability/interpretability methods documented where applicable? | Yes | Development Plan specifies Grad-CAM and SHAP techniques for model interpretability. Training includes bounding-box guided augmentation to ensure clinically relevant feature learning. |
| 10. Is the model calibration process documented? | Yes | Temperature scaling for probability calibration documented per "On Calibration of Modern Neural Networks" (Guo et al., 2017). Calibration parameter included in configuration files. |
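The bootstrap confidence intervals referenced in item 3 can be sketched as follows. This is an illustrative percentile-bootstrap helper, not the estimator actually used in the Development Report; the 62%-accuracy sample below is made-up toy data.

```python
import random

def bootstrap_ci(values, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the mean of a per-sample metric.

    Hypothetical sketch: the report states 95% CIs computed via
    bootstrap resampling (1000-2000 iterations), but its exact
    estimator is not reproduced here.
    """
    rng = random.Random(seed)
    n = len(values)
    # Resample with replacement and collect the resampled means
    means = sorted(
        sum(rng.choice(values) for _ in range(n)) / n
        for _ in range(n_resamples)
    )
    lo = means[int((alpha / 2) * n_resamples)]
    hi = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi

# Toy example: per-image top-1 correctness (1 = correct, 0 = incorrect)
scores = [1] * 62 + [0] * 38
low, high = bootstrap_ci(scores)  # 95% CI around 0.62 top-1 accuracy
```

Reporting the interval alongside the point estimate and the sample size, as the Development Report does, lets a reviewer judge whether a PASS outcome is statistically meaningful.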
R-TF-028-006 AI Release Report
| Check Item | Yes/No/NA | Comments |
|---|---|---|
| 1. Are all models included in the release package documented? | Yes | Complete inventory of 59 models documented with file names, version numbers, and clinical/non-clinical classification. |
| 2. Are input/output specifications clearly defined for each model? | Yes | Detailed input specifications (image format, resolution, color space) and output specifications (data types, value ranges, JSON structures) provided for each model category. |
| 3. Are preprocessing requirements fully specified? | Yes | Preprocessing steps documented: resize dimensions, normalization parameters (ImageNet mean/std), data type conversion, tensor format (CHW ordering). |
| 4. Are post-processing requirements fully specified? | Yes | Post-processing documented including TTA procedures (augmentation list), temperature scaling parameters, probability calibration, and output formatting. |
| 5. Are configuration files complete and correct? | Yes | Configuration file structures provided with model paths, versions, target shapes, normalization parameters, TTA settings, and class label references. |
| 6. Are integration guidelines sufficient for the software development team? | Yes | Release report provides complete integration specifications with code-level details, including binary indicator calculation formula and example JSON outputs. |
| 7. Is the model file format appropriate and documented (e.g., PyTorch version)? | Yes | PyTorch native format (.pt/.pth/.ckpt) specified with PyTorch version >=1.12. Format choice justified for optimized inference and compatibility with research infrastructure. |
| 8. Is the Integration Verification Package documented? | Yes | Integration Verification Package specification includes reference test images, expected outputs file, verification manifest, and acceptance criteria per GP-028 requirements. |
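The temperature scaling referenced in items 4 and 5 can be sketched in isolation. This is a minimal illustration of the Guo et al. (2017) technique, not the released inference code: the real models run in PyTorch and read the calibration parameter from their configuration files, and the temperature value 1.5 below is a made-up example.

```python
import math

def calibrated_probs(logits, temperature):
    """Temperature-scaled softmax: divide logits by T before normalizing.

    T > 1 softens overconfident probabilities; T = 1 is plain softmax.
    """
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

raw = [2.0, 0.5, -1.0]          # toy logits for a 3-class head
probs = calibrated_probs(raw, temperature=1.5)
```

Because scaling preserves the ordering of the logits, calibration changes reported confidence but never the predicted class.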
Algorithm Package Verification
| Check Item | Yes/No/NA | Comments |
|---|---|---|
| 1. Are all model files present and accessible? | Yes | All 59 PyTorch model files verified present in designated repository location with correct naming convention (model_name_vX.Y.Z.pt). |
| 2. Are all configuration files present and valid? | Yes | JSON configuration files for all models verified present and syntactically valid. Binary indicator mapping matrix validated. |
| 3. Do model versions match the documentation? | Yes | Model version v1.1.0.0 consistent across AI Development Report, AI Release Report, and model file naming. |
| 4. Have the models been tested for basic functionality (inference runs without errors)? | Yes | Smoke testing performed: all models execute inference successfully on sample inputs with expected output formats. |
| 5. Is the package versioned and stored in the designated repository? | Yes | Package version v1.1.0.0 stored in version-controlled repository with semantic versioning and full traceability to development artifacts. |
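The file-presence, naming-convention, and configuration-validity checks in items 1-3 could be automated along these lines. This is a hypothetical verification snippet, not a GP-028 deliverable; the lowercase naming rule and the acceptance of a four-field version (matching package version v1.1.0.0) are assumptions.

```python
import json
import re

# Assumed convention: lowercase model name, "v" + 3- or 4-field version
NAME_PATTERN = re.compile(r"^[a-z0-9_]+_v\d+(\.\d+){2,3}\.pt$")

def check_model_file(filename, expected_version="1.1.0.0"):
    """True if the file follows the naming convention and carries
    the expected package version."""
    if not NAME_PATTERN.match(filename):
        return False
    return filename.endswith(f"_v{expected_version}.pt")

def check_config(text):
    """True if a configuration file is syntactically valid JSON."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False
```

Running such checks over the whole package turns items 1-3 from a manual inspection into a repeatable script whose output can be attached to this record.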
Risk Assessment Integration
| Check Item | Yes/No/NA | Comments |
|---|---|---|
| 1. Has the AI Risk Matrix (R-TF-028-011) been updated based on development findings? | Yes | AI Risk Assessment updated with 29 identified risks. Residual risk distribution: 52% Acceptable (RPN ≤4), 48% Tolerable (RPN 5-9), 0% Unacceptable. |
| 2. Are all residual risks at acceptable or tolerable levels? | Yes | All risks reduced to Acceptable or Tolerable levels. Three Tolerable risks (AI-RISK-016, AI-RISK-022, AI-RISK-026) flagged for enhanced post-market monitoring. |
| 3. Have safety risks related to AI been communicated to the product team? | Yes | 72% of AI risks (21 of 29) linked to device-level safety risks in R-TF-013-002. Traceability established to Clinical Evaluation Plan (R-TF-015-001) and Risk Management Plan (R-TF-013-001). |
| 4. Has the benefit-risk analysis been documented? | Yes | Residual Risk Acceptability section documents benefit-risk justification: improved diagnostic accuracy, reduced inter-observer variability, enhanced clinical workflow, expanded patient access to specialist expertise. |
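The acceptability bands quoted in item 1 can be expressed as a small function. Only the two thresholds (RPN ≤4 Acceptable, RPN 5-9 Tolerable) come from this checklist; the RPN = severity × probability formula and the upper band are assumptions, since the actual R-TF-028-011 scoring rules are not reproduced here.

```python
def classify_residual_risk(severity, probability):
    """Map a 5x5 risk matrix cell (each axis scored 1-5) to an
    acceptability band, assuming RPN = severity * probability."""
    rpn = severity * probability
    if rpn <= 4:
        return "Acceptable"
    if rpn <= 9:
        return "Tolerable"
    return "Unacceptable"
```

Under these assumptions a risk scored (3, 3) lands in the Tolerable band, which is the category flagged for enhanced post-market monitoring in item 2.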
Traceability
| Check Item | Yes/No/NA | Comments |
|---|---|---|
| 1. Is there clear traceability from requirements (R-TF-028-001) to test results? | Yes | Each model section in Development Report references corresponding specification in AI Description. Performance tables explicitly cite success criteria from R-TF-028-001. |
| 2. Are all design decisions documented and justified? | Yes | Architecture selection, hyperparameter choices, and methodology decisions documented with rationale based on experimental comparisons and literature references. |
| 3. Are all data sources traceable and documented? | Yes | DatasetMappingTable component provides complete traceability of all data sources. Version-controlled dataset (LegitHealth-DX) with documented ICD-11 mapping and annotation provenance. |
| 4. Is experiment tracking documented for reproducibility? | Yes | Development Plan specifies MLflow/Weights & Biases for experiment tracking. Each trained model linked to code version, data version, and hyperparameters. |
| 5. Are annotator qualifications and training records referenced? | Yes | Data Annotation Instructions specify annotator qualifications (board-certified dermatologists with ≥5 years experience) and multi-annotator consensus methodology. |
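The per-model reproducibility linkage in item 4 amounts to a structured record like the one below. This is an illustrative shape only: the actual tracking lives in MLflow / Weights & Biases per the Development Plan, and every field value here is a placeholder, not a real artifact identifier.

```python
import json

# Hypothetical reproducibility record tying a trained model to its
# code version, data version, and hyperparameters (placeholder values)
run_record = {
    "model": "example_model_name",
    "model_version": "1.1.0.0",
    "code_commit": "<git-sha>",
    "dataset_version": "LegitHealth-DX@<tag>",
    "hyperparameters": {"lr": 3e-4, "epochs": 50},
}
manifest = json.dumps(run_record, indent=2, sort_keys=True)
```

Serializing the record next to the model file means an auditor can reconstruct any training run from the package alone.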
Regulatory Compliance
| Check Item | Yes/No/NA | Comments |
|---|---|---|
| 1. Are Good Machine Learning Practice (GMLP) principles demonstrated? | Yes | GMLP principles addressed throughout: data representativeness, sequestered test sets, reference standard methodology, reproducibility, traceability, and transparent reporting. |
| 2. Is alignment with IEC 62304 (software lifecycle) demonstrated? | Yes | AI development integrated into software lifecycle per IEC 62304. Clinical models classified as Class B. V&V activities aligned with software verification requirements. |
| 3. Is alignment with ISO 14971 (risk management) demonstrated? | Yes | AI Risk Assessment follows ISO 14971 framework with 5×5 risk matrix, control measure priority hierarchy, and residual risk acceptability determination. |
| 4. Is alignment with EU AI Act requirements addressed? | Yes | AI Risk Assessment scope explicitly references EU AI Act Regulation 2024/1689. High-risk AI system requirements addressed through risk management and transparency documentation. |
| 5. Are clinical performance claims supported by validation evidence? | Yes | All performance claims in AI Description supported by validation results in Development Report with statistical confidence intervals and subgroup analysis. |
Conclusion
☑ V&V Phase Approved: All checks have been successfully passed. The algorithm package is cleared for release to the software development team.
☐ V&V Phase Not Approved: One or more checks have failed. The responsible team members must address the comments and resubmit the documentation for verification.
Overall Comments:
The Verification & Validation Phase for the Legit.Health Plus v1.1.0.0 AI/ML algorithm package has been completed successfully. Key findings:
- Performance Verification: All 59 AI models meet their predefined acceptance criteria as specified in R-TF-028-001 AI Description, with performance results documented with appropriate statistical rigor (95% confidence intervals).
- Bias and Fairness: Comprehensive subgroup analysis demonstrates acceptable performance across demographic categories (Fitzpatrick skin types, age groups, sex). Minor performance variation for darker skin tones (FST V-VI) has been documented with appropriate mitigation through enhanced post-market surveillance.
- Risk Management Integration: The AI Risk Assessment has been updated with all identified risks reduced to Acceptable or Tolerable levels. Critical risks requiring ongoing monitoring have been identified and linked to post-market surveillance activities.
- Documentation Completeness: All required documentation per GP-028 has been produced, including AI Description, Development Plan, Data Collection Instructions, Data Annotation Instructions, Development Report, Release Report, and Risk Assessment.
- Regulatory Alignment: The development process demonstrates compliance with MDR 2017/745, IEC 62304, ISO 14971, GMLP principles, and EU AI Act requirements for high-risk AI systems.
The algorithm package is ready for integration into the Legit.Health Plus software during GP-012 Phase 3 (Software Development).
Release Authorization
| Item | Value |
|---|---|
| Package Version | v1.1.0.0 |
| Repository Location | s3://legit-health-plus/algorithm-packages/v1.1.0.0/ |
Verification
Signature meaning
The signatures for the approval process of this document can be found in the verified commits at the repository for the QMS. As a reference, the team members who are expected to participate in the approval of this document, and their roles in the approval process as defined in Annex I Responsibility Matrix of GP-001, are:
- Author: JD-009
- Reviewer: JD-009
- Approver: JD-005