R-TF-028-009 AI Design Checks
Purpose
This checklist is used to verify that the Design Phase of the AI development lifecycle has been completed in accordance with procedure GP-028 AI Development. It ensures that the AI Description, Development Plan, Data Collection Instructions, Data Annotation Instructions, and initial Risk Matrix are complete, coherent, and provide a sufficient basis for proceeding to the Development Phase.
Scope
This design check covers all AI/ML algorithms integrated into Legit.Health Plus version 1.1.0.0, including:
- Clinical Models (54 models): ICD Category Distribution (1 model), Visual Sign Intensity Quantification (10 models), Wound Characteristic Assessment (24 models), Lesion Quantification (5 models), Surface Area Quantification (12 models), Pattern Identification (2 models).
- Non-Clinical Models (5 models): DIQA, Domain Validation, Skin Surface Segmentation, Body Surface Segmentation, Head Detection.
Instructions
The verifier must assess each item in the checklist. For each item, select "Yes," "No," or "N/A" and provide comments where necessary, especially for any "No" answers. All "No" items must be resolved before the Design Phase can be considered complete and approved.
Checklist
R-TF-028-001 AI Description
| Check Item | Yes/No/NA | Comments |
|---|---|---|
| 1. Is the purpose of the algorithm package clearly defined? | Yes | The AI Description clearly defines the purpose: to provide clinical decision support through ICD-11 probability distributions, binary indicators for triage, and quantification of visual clinical signs. |
| 2. Is the ICD Category Distribution algorithm's function (ViT model, probability output) described? | Yes | Section describes the deep learning model (ViT architecture), the normalized probability vector output across ICD-11 categories, and the presentation of top-5 differential diagnoses with confidence scores. |
| 3. Is the Binary Indicators' derivation logic (summation via matrix multiplication) clearly described? | Yes | The mathematical formula for deriving binary indicators from ICD probabilities via the mapping matrix is explicitly defined: Binary Indicator_j = Σ_i (p_i × M_ij). All six indicators are described with clinical definitions. A sketch of this derivation follows the table. |
| 4. Are the performance endpoints and success criteria for Top-k Accuracy (Top-1 ≥50%, Top-3 ≥60%, Top-5 ≥70%) explicitly stated? | Yes | Performance thresholds are explicitly stated in the "ICD Category Distribution Endpoints and Requirements" section with clinical justification and literature references. A metric sketch covering items 4 and 6 follows the table. |
| 5. Are the performance endpoints and success criteria for Binary Indicator AUC (≥0.80) explicitly stated? | Yes | The AUC ≥0.80 threshold is explicitly stated with a severity scale interpretation table and requirement for 95% confidence intervals. |
| 6. Are performance endpoints for Visual Sign Quantification models (RMAE thresholds) explicitly stated? | Yes | RMAE thresholds are specified for each visual sign: Erythema ≤14%, Desquamation ≤17%, Induration ≤36%, etc., with clinical justification based on inter-observer variability literature. |
| 7. Are the overall data specifications (scale, diversity, expert annotation, Fitzpatrick skin types I-VI) defined? | Yes | Data requirements specify diversity across age, sex, Fitzpatrick skin types (I-VI), anatomical sites, imaging conditions, and disease presentations. Expert annotation by board-certified dermatologists is required. |
| 8. Is the classification of models (Clinical vs. Non-Clinical) clearly defined with appropriate rationale? | Yes | Clear distinction is made between clinical models (fulfilling intended purpose) and non-clinical models (supporting functions), with definitions aligned to MDR 2017/745 intended use requirements. |
| 9. Are cybersecurity, transparency, and integration aspects sufficiently described? | Yes | Requirements for model interpretability (attention maps), output transparency, and integration specifications are described. Cybersecurity considerations are addressed in the broader technical documentation. |
| 10. Are objectives supported by clinical evidence and literature references? | Yes | Comprehensive literature citations support all performance thresholds and clinical objectives, including systematic reviews and clinical studies demonstrating AI-assisted diagnostic improvement. |
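The derivation in item 3 amounts to a matrix-vector product between the ICD-11 probability vector and the mapping matrix. Below is a minimal NumPy sketch of that computation; the array sizes and the randomly generated mapping matrix are illustrative stand-ins, not values from R-TF-028-001 or R-TF-028-004.

```python
import numpy as np

# Illustrative sketch of item 3: b_j = sum_i(p_i * M_ij).
# Sizes and the mapping matrix are hypothetical; the real matrix is
# produced under R-TF-028-004 (Binary Indicator Mapping).
n_categories = 4   # toy size; the device covers the full ICD-11 category set
n_indicators = 6   # malignancy, premalignancy, association to malignancy, ...

rng = np.random.default_rng(0)
p = rng.dirichlet(np.ones(n_categories))                 # normalized probability vector
M = rng.integers(0, 2, (n_categories, n_indicators)).astype(float)  # TRUE/FALSE mapping

b = p @ M  # binary_indicator[j] = sum_i p[i] * M[i, j]
assert np.allclose(b, np.einsum("i,ij->j", p, M))
```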
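The metrics behind the thresholds in items 4 and 6 can likewise be sketched. The function names below are hypothetical, and the RMAE denominator used here (the span of the reference scale) is one common normalization; R-TF-028-001 defines the exact formulation.

```python
import numpy as np

def top_k_accuracy(probs: np.ndarray, labels: np.ndarray, k: int) -> float:
    """Fraction of cases whose true ICD category is among the k most probable."""
    top_k = np.argsort(probs, axis=1)[:, -k:]   # indices of the k largest probabilities
    return float(np.mean([labels[i] in top_k[i] for i in range(len(labels))]))

def rmae(pred: np.ndarray, ref: np.ndarray, scale_span: float) -> float:
    """Relative MAE: mean absolute error normalized by the reference scale span."""
    return float(np.mean(np.abs(pred - ref)) / scale_span)

# Acceptance per item 4 would then read: top_k_accuracy(probs, labels, 1) >= 0.50
```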
R-TF-028-002 AI Development Plan
| Check Item | Yes/No/NA | Comments |
|---|---|---|
| 1. Is the project team and their responsibilities clearly defined? | Yes | Roles are defined: Technical Manager (overall management, QMS alignment), Design & Development Manager (lifecycle management per GP-012), and AI Team (development, validation, maintenance). |
| 2. Is the project management approach (meetings, tools, planning) defined? | Yes | Agile framework with 2-week sprints, daily stand-ups, bi-weekly technical reviews. Tools include Jira (task management), GitHub/Bitbucket (version control), MLflow/Weights & Biases (experiment tracking). |
| 3. Is the development environment (hardware, software, tools) specified? | Yes | Development software (Python ≥3.9, PyTorch ≥1.12/TensorFlow ≥2.10, CUDA/cuDNN) and hardware requirements (NVIDIA A100/H100, ≥128GB RAM, ≥5TB NVMe) are specified. Code quality tools (Flake8, Black, MyPy, Pytest) are listed. |
| 4. Does the Data Management Plan cover data collection, curation, partitioning, and test set sequestration? | Yes | Comprehensive plan includes: representativeness (Fitzpatrick I-VI), GDPR compliance, multi-annotator review, patient-level partitioning, and test set sequestration (held-out, used only once for final evaluation). A partitioning sketch follows the table. |
| 5. Does the Training & Evaluation Plan cover model architecture selection, training methodology, calibration, and post-processing? | Yes | Plan covers: architecture selection (ViT, ConvNeXt, EfficientNetV2), hyperparameter optimization, data augmentation, overfitting mitigation (dropout, weight decay, early stopping), and temperature scaling for calibration. A calibration sketch follows the table. |
| 6. Does the plan include provisions for explainability (XAI) and subgroup analysis? | Yes | Grad-CAM and SHAP techniques are specified for explainability. Robustness analysis across patient subgroups (skin phototype, age, sex) is required. |
| 7. Does the Release Plan specify the deliverables for the software integration team? | Yes | Deliverables include: algorithm package (PyTorch .pt/.pth/.ckpt files), R-TF-028-006 AI Release Report with integration specifications, and ongoing technical support. Semantic versioning scheme is defined. |
| 8. Does the plan include a comprehensive AI Risk Management Plan with severity/likelihood ranking system? | Yes | AI Risk Management Process is defined with RPN = Severity × Likelihood formula. Severity scale (1-5) and process for risk assessment, control, monitoring, and review are established. |
| 9. Is the development cycle aligned with GP-028 and GP-012 requirements? | Yes | The plan explicitly references the three-phase cycle (Design, Development, V&V) mandated by GP-028, with integration into GP-012 Phase 2 (Software Design) before Phase 3 begins. |
| 10. Is reference standard determination methodology (multi-dermatologist panel, histopathological correlation) defined? | Yes | Reference standard established by panel of ≥3 board-certified dermatologists with discrepancy resolution by senior reviewer or histopathological correlation where available. |
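A sketch of the patient-level partitioning required by item 4, using scikit-learn's GroupShuffleSplit; the function name, test fraction, and seed are illustrative, not taken from R-TF-028-002.

```python
from sklearn.model_selection import GroupShuffleSplit

def patient_level_split(image_ids, patient_ids, test_fraction=0.15, seed=42):
    """All images of a given patient land in exactly one split, preventing
    identity leakage; the test indices stay sequestered until final evaluation."""
    splitter = GroupShuffleSplit(n_splits=1, test_size=test_fraction, random_state=seed)
    train_idx, test_idx = next(splitter.split(image_ids, groups=patient_ids))
    return train_idx, test_idx
```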
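And a minimal PyTorch sketch of the temperature scaling named in item 5, fitting a single scalar on held-out logits; this is the standard post-hoc formulation, offered as an assumption about the plan's intent rather than its exact procedure.

```python
import torch

def fit_temperature(logits: torch.Tensor, labels: torch.Tensor) -> float:
    """Learn one scalar T minimizing the NLL of softmax(logits / T) on held-out data."""
    log_t = torch.zeros(1, requires_grad=True)   # optimize log(T) so T stays positive
    optimizer = torch.optim.LBFGS([log_t], lr=0.1, max_iter=50)
    nll = torch.nn.CrossEntropyLoss()

    def closure():
        optimizer.zero_grad()
        loss = nll(logits / log_t.exp(), labels)
        loss.backward()
        return loss

    optimizer.step(closure)
    return float(log_t.exp())
```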
R-TF-028-003 Data Collection Instructions
| Check Item | Yes/No/NA | Comments |
|---|---|---|
| 1. Are there clear and distinct instructions for retrospective data collection from archive/public sources? | Yes | R-TF-028-003 (Archive Data) provides comprehensive protocol for retrospective collection: source identification, evaluation criteria, dataset documentation requirements, and target of >100,000 curated images. |
| 2. Are there clear and distinct instructions for prospective/custom data collection from clinical studies? | Yes | R-TF-028-003 (Custom Gathered Data) provides protocol for prospective collection through clinical validation studies and dedicated acquisition studies, with standardized image acquisition procedures. |
| 3. For retrospective data, are licensing compliance and Creative Commons requirements explicitly addressed? | Yes | Licensing compliance section specifies: only datasets permitting commercial use, license type/version/URL documentation, attribution fulfillment, and license compatibility verification. |
| 4. For prospective data, are ethical requirements (IRB/CEIm approval, informed consent) explicitly addressed? | Yes | Ethical approval (IRB/CEIm), written informed consent process, GDPR compliance, data de-identification at source, and Data Processing Agreements are all explicitly required. |
| 5. Are the target population, inclusion/exclusion criteria defined for both data types? | Yes | Inclusion criteria cover anatomical scope, diagnostic labeling, image quality, modality, and de-identification. Exclusion criteria address poor quality, inadequate labeling, licensing issues, PII, and duplicates. |
| 6. Are the technical protocols for data retrieval, de-identification verification, and secure transfer clearly specified? | Yes | Secure data retrieval (HTTPS/SFTP), checksums for integrity, de-identification verification (automated EXIF stripping, manual PII review), and access-controlled staging areas are specified. A checksum and metadata-stripping sketch follows the table. |
| 7. Is the rationale for accepting acquisition variability in retrospective data provided? | Yes | Variability in imaging equipment/settings is explicitly stated as a deliberate strength to promote model generalization across clinical environments, supported by literature references [4-6]. |
| 8. Are non-dermatological images (for domain validation) included with appropriate specifications? | Yes | Inclusion criteria explicitly address non-dermatological images (out-of-context objects, confounding textures) for exclusive use in training domain validation model, with required labeling as "non-dermatological" or "out-of-domain". |
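A sketch of the checksum and automated metadata-stripping steps in item 6, assuming Pillow and the standard library; rebuilding the image from raw pixels drops EXIF, but the manual PII review required by R-TF-028-003 (e.g. for burned-in text) remains a separate step.

```python
import hashlib
from PIL import Image

def sha256_checksum(path: str) -> str:
    """Integrity checksum recorded at retrieval and re-verified after transfer."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()

def strip_metadata(src: str, dst: str) -> None:
    """Rebuild the image from raw pixels so EXIF and other metadata is not carried over."""
    with Image.open(src) as img:
        clean = Image.new(img.mode, img.size)
        clean.putdata(list(img.getdata()))
        clean.save(dst)
```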
R-TF-028-004 Data Annotation Instructions
| Check Item | Yes/No/NA | Comments |
|---|---|---|
| 1. Is the protocol for creating the Binary Indicator Mapping Matrix clear and unambiguous? | Yes | R-TF-028-004 (Binary Indicator Mapping) provides detailed protocol: ICD-11 category extraction, annotation worksheet preparation, primary clinical annotation with decision criteria for all 6 indicators, and secondary reviewer validation. |
| 2. Are the clinical decision criteria for each binary indicator (malignancy, premalignancy, association to malignancy, etc.) explicitly defined? | Yes | Explicit definitions and "Assign TRUE for" / "Assign FALSE for" criteria are provided for each indicator with clinical references (WHO Classification of Skin Tumours, NCCN Guidelines, European consensus guidelines). |
| 3. Are there clear instructions for annotating Visual Signs covering intensity (ordinal), count (bounding boxes), and extent (polygons)? | Yes | R-TF-028-004 (Visual Signs) provides task-specific instructions: ordinal intensity scoring (0-4 scales), bounding box annotation for counting, polygon annotation for extent, and categorical classification for patterns/staging. |
| 4. Are the qualifications for the medical expert annotators (Primary Annotator and Secondary Reviewer) clearly specified? | Yes | Primary Annotator: Board-certified dermatologist with ≥5 years post-certification experience, expertise across dermatological domains. Secondary Reviewer: same qualifications, must be independent. |
| 5. Is the consensus mechanism (multi-annotator, senior review) for resolving discrepancies well-defined? | Yes | Multi-annotator process specified with consensus reference standard via pooling (mean/median for intensity, voting for categories, algorithmic fusion for boxes/polygons). Senior specialist review for high disagreement cases. A pooling sketch follows the table. |
| 6. Are the annotation tools and workflow clearly described? | Yes | Web-based annotation platform workflow described: image examination, task selection, annotation per instructions, completion of all relevant signs. Tool-specific features (bounding box, polygon tools) are referenced. |
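A sketch of the pooling rules in item 5, kept compatible with Python ≥3.9; the helper names are illustrative, and a vote without a majority returns None to flag the case for senior specialist review.

```python
from collections import Counter
from statistics import median
from typing import List, Optional

def pool_intensity(scores: List[float]) -> float:
    """Ordinal intensity scores (e.g. on a 0-4 scale) pooled by median, robust to outliers."""
    return median(scores)

def pool_category(labels: List[str]) -> Optional[str]:
    """Majority vote across annotators; None means no majority and triggers senior review."""
    top, count = Counter(labels).most_common(1)[0]
    return top if count > len(labels) / 2 else None
```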
R-TF-028-011 AI Risk Matrix
| Check Item | Yes/No/NA | Comments |
|---|---|---|
| 1. Have initial AI risks related to data management (data bias, unrepresentative data, inaccurate labels) been identified? | Yes | Risk assessment identifies: AI-RISK-001 (Dataset Representativity Problem), AI-RISK-002 (Data Annotation Error), with specific analysis, consequences, and mitigation measures documented. |
| 2. Have initial AI risks related to model training and evaluation (overfitting, inadequate hyperparameters, inadequate evaluation) been identified? | Yes | Risks identified: AI-RISK-003 (Models Inadequately Evaluated), AI-RISK-004 (Models Inadequately Hyperparametrized), with mitigation measures including evaluation reports, sequestered test data, and hyperparameter studies. |
| 3. Have initial AI risks related to model robustness (imaging variability, artifacts, edge cases) been identified? | Yes | Risk AI-RISK-016 (Model robustness to imaging variability) identified with RPN 8 (Tolerable), requiring enhanced post-market surveillance through PMCF. |
| 4. Have initial AI risks related to clinical output errors (ICD misclassification, binary indicator false negatives for malignancy) been identified? | Yes | Risks AI-RISK-022 (ICD category misclassification) and AI-RISK-026 (Binary indicator false negatives for malignancy) identified, both with RPN 8, flagged for ongoing monitoring. |
| 5. Has an initial assessment of severity and likelihood been performed using the 5×5 Risk Matrix? | Yes | All risks assessed using defined severity scale (1-5: Negligible to Catastrophic) and likelihood scale (1-5: Very low to Very high). RPN = Severity × Likelihood calculated for each; a sketch of the calculation and acceptability bands follows the table. |
| 6. Are risk control measures defined following ISO 14971 priority hierarchy (inherent safety, protective measures, information for safety)? | Yes | Control measures follow ISO 14971 hierarchy: inherent safety (architecture, data diversity, thresholds), protective measures (DIQA, domain validation, confidence thresholds), information for safety (IFU, user training, API error responses). |
| 7. Is the residual risk after mitigation documented and classified (Acceptable/Tolerable/Unacceptable)? | Yes | Residual risks documented: 52% Acceptable (RPN ≤4), 48% Tolerable (RPN 5-9), 0% Unacceptable (RPN ≥10). Residual risk acceptability justified against clinical benefits. |
| 8. Is traceability to AI Specifications (R-TF-028-001), Safety Risks (R-TF-013-002), and Clinical Validation (R-TF-015-001) established? | Yes | Section explicitly documents traceability: 72% of AI risks linked to device-level safety risks in R-TF-013-002. References to Clinical Evaluation Plan (R-TF-015-001) and Post-Market Surveillance (GP-007) established. |
| 9. Are critical risks requiring ongoing monitoring identified with post-market surveillance requirements? | Yes | Three Tolerable risks (AI-RISK-016, AI-RISK-022, AI-RISK-026) identified for enhanced monitoring through: PMCF per GP-007, user feedback analysis, PSUR, and annual AI model performance review. |
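The RPN arithmetic and acceptability bands quoted in items 5 and 7 reduce to a few lines; the band boundaries below are taken from the comments in this table.

```python
def rpn(severity: int, likelihood: int) -> int:
    """RPN = Severity x Likelihood, each on the 1-5 scale of the 5x5 matrix."""
    assert 1 <= severity <= 5 and 1 <= likelihood <= 5
    return severity * likelihood

def classify(rpn_value: int) -> str:
    """Bands per item 7: <=4 Acceptable, 5-9 Tolerable, >=10 Unacceptable."""
    if rpn_value <= 4:
        return "Acceptable"
    if rpn_value <= 9:
        return "Tolerable"
    return "Unacceptable"
```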
Cross-Document Consistency
| Check Item | Yes/No/NA | Comments |
|---|---|---|
| 1. Are performance endpoints in R-TF-028-001 consistent with acceptance criteria referenced in R-TF-028-002? | Yes | Development Plan references "specifications in R-TF-028-001 AI Description" as primary input for design and acceptance criteria for V&V. Metrics are consistently defined across both documents. |
| 2. Are data requirements in R-TF-028-001 consistent with data collection protocols in R-TF-028-003? | Yes | Data diversity requirements (Fitzpatrick I-VI, anatomical sites, imaging conditions) in AI Description align with collection protocols in both retrospective and prospective data collection instructions. |
| 3. Are annotation requirements in R-TF-028-001 consistent with annotation protocols in R-TF-028-004? | Yes | Visual sign quantification specifications (intensity, count, extent) in AI Description match task types defined in annotation instructions. Binary indicator definitions are consistent. |
| 4. Are identified risks in R-TF-028-011 traceable to specifications in R-TF-028-001 and mitigation measures in R-TF-028-002? | Yes | Risk Matrix references "ai_ml_specifications" from AI Description and mitigation measures align with Data Management Plan and Training & Evaluation Plan provisions. |
| 5. Is the document numbering scheme consistent with GP-028 procedure requirements? | Yes | All documents follow the R-TF-028-0XX naming convention as defined in the GP-028 Related QMS Documents section (a mechanical check sketch follows the table). |
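The naming convention in item 5 can be spot-checked mechanically; the pattern below is inferred from the identifiers appearing in this checklist, not quoted from GP-028.

```python
import re

DOC_ID = re.compile(r"R-TF-028-\d{3}")  # inferred from the document IDs in this checklist

for doc in ["R-TF-028-001", "R-TF-028-004", "R-TF-028-011"]:
    assert DOC_ID.fullmatch(doc), f"{doc} violates the GP-028 naming convention"
```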
Regulatory Compliance
| Check Item | Yes/No/NA | Comments |
|---|---|---|
| 1. Are Good Machine Learning Practices (GMLP) principles addressed in the design documentation? | Yes | Development Plan explicitly references GMLP principles throughout: data representativeness, sequestered test sets, reference standard methodology, reproducibility, and traceability. |
| 2. Is alignment with MDR 2017/745 requirements (GSPR 17 for software with diagnostic function) demonstrated? | Yes | AI Risk Assessment explicitly references MDR 2017/745 GSPR 17, MDCG guidance documents, and establishes traceability to Clinical Evaluation per MDR requirements. |
| 3. Is alignment with IEC 62304 (software lifecycle) requirements demonstrated? | Yes | Development Plan references IEC 62304, clinical models classified as Class B per IEC 62304. Software lifecycle processes for AI development are defined. |
| 4. Is alignment with ISO 14971 (risk management) requirements demonstrated? | Yes | AI Risk Assessment references ISO 14971:2019, uses 5×5 risk matrix per R-TF-013-001, follows control measure priority hierarchy, and establishes residual risk acceptability. |
| 5. Is alignment with EU AI Act requirements (high-risk AI systems) addressed? | Yes | AI Risk Assessment scope explicitly references EU AI Act Regulation 2024/1689 for high-risk AI systems. Risk management approach addresses AI Act requirements. |
Conclusion
☑ Design Phase Approved: All checks have been successfully passed. The project is cleared to proceed to the Development Phase.
☐ Design Phase Not Approved: One or more checks have failed. The responsible team members must address the comments and resubmit the design documentation for verification.
Overall Comments:
The Design Phase documentation for Legit.Health Plus version 1.1.0.0 AI/ML algorithms is comprehensive and compliant with GP-028 requirements. Key strengths include:
- Comprehensive Algorithm Specifications: The AI Description provides detailed specifications for all clinical and non-clinical models with evidence-based performance thresholds and clear success criteria.
- Robust Data Management: Data collection instructions address both retrospective (archive) and prospective (custom) data sources with appropriate ethical, legal, and quality controls, including GDPR compliance, informed consent requirements, and de-identification verification.
- Rigorous Annotation Protocols: Data annotation instructions define clear decision criteria, qualified annotator requirements, and multi-reviewer consensus mechanisms aligned with clinical standards.
- Integrated Risk Management: The AI Risk Matrix identifies relevant AI-specific risks, applies the 5×5 severity/likelihood framework, and establishes traceability to device-level risk management per ISO 14971.
- Regulatory Alignment: Documentation demonstrates alignment with MDR 2017/745, IEC 62304, ISO 14971, GMLP principles, and EU AI Act requirements for high-risk AI systems.
The design documentation provides a sufficient basis for proceeding to the Development Phase with confidence that the AI algorithms can be developed, validated, and deployed safely and effectively.
Verification
Signature meaning
The signatures for the approval process of this document can be found in the verified commits at the repository for the QMS. As a reference, the team members expected to participate in the approval of this document, and their roles as defined in Annex I Responsibility Matrix of GP-001, are:
- Author: JD-009
- Reviewer: JD-009
- Approver: JD-005