
R-TF-028-009 AI Design Checks

Table of contents
  • Purpose
  • Scope
  • Instructions
  • Checklist
    • R-TF-028-001 AI Description
    • R-TF-028-002 AI Development Plan
    • Data Collection Instructions (R-TF-028-003)
    • Data Annotation Instructions (R-TF-028-004)
    • R-TF-028-011 AI Risk Matrix
    • Cross-Document Consistency
    • Regulatory Compliance
  • Conclusion
  • Verification

Purpose

This checklist is used to verify that the Design Phase of the AI development lifecycle has been completed in accordance with procedure GP-028 AI Development. It ensures that the AI Description, Development Plan, Data Collection Instructions, Data Annotation Instructions, and initial Risk Matrix are complete, coherent, and provide a sufficient basis for proceeding to the Development Phase.

Scope

This design check covers all AI/ML algorithms integrated into Legit.Health Plus version 1.1.0.0, including:

  • Clinical Models (54 models): ICD Category Distribution (1 model), Visual Sign Intensity Quantification (10 models), Wound Characteristic Assessment (24 models), Lesion Quantification (5 models), Surface Area Quantification (12 models), Pattern Identification (2 models).
  • Non-Clinical Models (5 models): DIQA, Domain Validation, Skin Surface Segmentation, Body Surface Segmentation, Head Detection.

Instructions

The verifier must assess each item in the checklist. For each item, select "Yes," "No," or "N/A" and provide comments where necessary, especially for any "No" answers. All "No" items must be resolved before the Design Phase can be considered complete and approved.

Checklist

R-TF-028-001 AI Description

| Check Item | Yes/No/NA | Comments |
| --- | --- | --- |
| 1. Is the purpose of the algorithm package clearly defined? | Yes | The AI Description clearly defines the purpose: to provide clinical decision support through ICD-11 probability distributions, binary indicators for triage, and quantification of visual clinical signs. |
| 2. Is the ICD Category Distribution algorithm's function (ViT model, probability output) described? | Yes | Section describes the deep learning model (ViT architecture), the normalized probability vector output across ICD-11 categories, and the presentation of top-5 differential diagnoses with confidence scores. |
| 3. Is the Binary Indicators' derivation logic (summation via matrix multiplication) clearly described? | Yes | The mathematical formula for deriving binary indicators from ICD probabilities via the mapping matrix is explicitly defined: Binary Indicator_j = Σ(p_i × M_ij). All six indicators are described with clinical definitions. |
| 4. Are the performance endpoints and success criteria for Top-k Accuracy (Top-1 ≥50%, Top-3 ≥60%, Top-5 ≥70%) explicitly stated? | Yes | Performance thresholds are explicitly stated in the "ICD Category Distribution Endpoints and Requirements" section with clinical justification and literature references. |
| 5. Are the performance endpoints and success criteria for Binary Indicator AUC (≥0.80) explicitly stated? | Yes | The AUC ≥0.80 threshold is explicitly stated with a severity scale interpretation table and requirement for 95% confidence intervals. |
| 6. Are performance endpoints for Visual Sign Quantification models (RMAE thresholds) explicitly stated? | Yes | RMAE thresholds are specified for each visual sign: Erythema ≤14%, Desquamation ≤17%, Induration ≤36%, etc., with clinical justification based on inter-observer variability literature. |
| 7. Are the overall data specifications (scale, diversity, expert annotation, Fitzpatrick skin types I-VI) defined? | Yes | Data requirements specify diversity across age, sex, Fitzpatrick skin types (I-VI), anatomical sites, imaging conditions, and disease presentations. Expert annotation by board-certified dermatologists is required. |
| 8. Is the classification of models (Clinical vs. Non-Clinical) clearly defined with appropriate rationale? | Yes | Clear distinction is made between clinical models (fulfilling intended purpose) and non-clinical models (supporting functions), with definitions aligned to MDR 2017/745 intended use requirements. |
| 9. Are cybersecurity, transparency, and integration aspects sufficiently described? | Yes | Requirements for model interpretability (attention maps), output transparency, and integration specifications are described. Cybersecurity considerations are addressed in the broader technical documentation. |
| 10. Are objectives supported by clinical evidence and literature references? | Yes | Comprehensive literature citations support all performance thresholds and clinical objectives, including systematic reviews and clinical studies demonstrating AI-assisted diagnostic improvement. |
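The derivation verified in item 3 is a matrix-vector product. A minimal NumPy sketch follows; the probability vector, mapping matrix, and dimensions are illustrative placeholders, not the actual ICD-11 mapping used by the device.

```python
import numpy as np

# Sketch of the binary-indicator derivation checked in item 3:
# Binary Indicator_j = Σ(p_i × M_ij), i.e. a matrix-vector product.
# All values below are placeholders, not the real ICD-11 mapping.

p = np.array([0.6, 0.3, 0.1])   # normalized ICD-category probabilities
M = np.array([                  # mapping matrix: rows = ICD categories,
    [1.0, 0.0],                 # columns = binary indicators
    [0.0, 1.0],
    [1.0, 1.0],
])

indicators = p @ M              # shape: (n_indicators,)
print(indicators)               # [0.7 0.4]
```

Because each column of M selects the categories that contribute to one indicator, each output stays in [0, 1] whenever p is a probability distribution and M is binary.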

R-TF-028-002 AI Development Plan

| Check Item | Yes/No/NA | Comments |
| --- | --- | --- |
| 1. Is the project team and their responsibilities clearly defined? | Yes | Roles are defined: Technical Manager (overall management, QMS alignment), Design & Development Manager (lifecycle management per GP-012), and AI Team (development, validation, maintenance). |
| 2. Is the project management approach (meetings, tools, planning) defined? | Yes | Agile framework with 2-week sprints, daily stand-ups, bi-weekly technical reviews. Tools include Jira (task management), GitHub/Bitbucket (version control), MLflow/Weights & Biases (experiment tracking). |
| 3. Is the development environment (hardware, software, tools) specified? | Yes | Development software (Python ≥3.9, PyTorch ≥1.12/TensorFlow ≥2.10, CUDA/cuDNN) and hardware requirements (NVIDIA A100/H100, ≥128GB RAM, ≥5TB NVMe) are specified. Code quality tools (Flake8, Black, MyPy, Pytest) are listed. |
| 4. Does the Data Management Plan cover data collection, curation, partitioning, and test set sequestration? | Yes | Comprehensive plan includes: representativeness (Fitzpatrick I-VI), GDPR compliance, multi-annotator review, patient-level partitioning, and test set sequestration (held-out, used only once for final evaluation). |
| 5. Does the Training & Evaluation Plan cover model architecture selection, training methodology, calibration, and post-processing? | Yes | Plan covers: architecture selection (ViT, ConvNeXt, EfficientNetV2), hyperparameter optimization, data augmentation, overfitting mitigation (dropout, weight decay, early stopping), and temperature scaling for calibration. |
| 6. Does the plan include provisions for explainability (XAI) and subgroup analysis? | Yes | Grad-CAM and SHAP techniques are specified for explainability. Robustness analysis across patient subgroups (skin phototype, age, sex) is required. |
| 7. Does the Release Plan specify the deliverables for the software integration team? | Yes | Deliverables include: algorithm package (PyTorch .pt/.pth/.ckpt files), R-TF-028-006 AI Release Report with integration specifications, and ongoing technical support. Semantic versioning scheme is defined. |
| 8. Does the plan include a comprehensive AI Risk Management Plan with severity/likelihood ranking system? | Yes | AI Risk Management Process is defined with RPN = Severity × Likelihood formula. Severity scale (1-5) and process for risk assessment, control, monitoring, and review are established. |
| 9. Is the development cycle aligned with GP-028 and GP-012 requirements? | Yes | The plan explicitly references the three-phase cycle (Design, Development, V&V) mandated by GP-028, with integration into GP-012 Phase 2 (Software Design) before Phase 3 begins. |
| 10. Is reference standard determination methodology (multi-dermatologist panel, histopathological correlation) defined? | Yes | Reference standard established by panel of ≥3 board-certified dermatologists with discrepancy resolution by senior reviewer or histopathological correlation where available. |
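Item 5 cites temperature scaling as the calibration method. A minimal sketch of the technique is below; the logits and the temperature value are illustrative, not fitted values from the actual models.

```python
import numpy as np

# Sketch of temperature scaling (item 5): divide the logits by a scalar
# T > 1 fitted on a held-out validation set, then re-apply softmax.
# The logits and T below are placeholders, not fitted values.

def softmax(z):
    e = np.exp(z - z.max())   # subtract max for numerical stability
    return e / e.sum()

logits = np.array([2.0, 1.0, 0.1])
T = 1.5                       # fitted temperature (placeholder)

uncalibrated = softmax(logits)
calibrated = softmax(logits / T)

# With T > 1 the distribution is softened: overconfident top-1
# probabilities shrink while the output remains a valid distribution.
assert calibrated.max() < uncalibrated.max()
assert abs(calibrated.sum() - 1.0) < 1e-9
```

A single scalar T preserves the ranking of classes, so Top-k accuracy is unchanged while predicted confidences better match observed accuracy.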

Data Collection Instructions (R-TF-028-003)

| Check Item | Yes/No/NA | Comments |
| --- | --- | --- |
| 1. Are there clear and distinct instructions for retrospective data collection from archive/public sources? | Yes | R-TF-028-003 (Archive Data) provides comprehensive protocol for retrospective collection: source identification, evaluation criteria, dataset documentation requirements, and target of >100,000 curated images. |
| 2. Are there clear and distinct instructions for prospective/custom data collection from clinical studies? | Yes | R-TF-028-003 (Custom Gathered Data) provides protocol for prospective collection through clinical validation studies and dedicated acquisition studies, with standardized image acquisition procedures. |
| 3. For retrospective data, are licensing compliance and Creative Commons requirements explicitly addressed? | Yes | Licensing compliance section specifies: only datasets permitting commercial use, license type/version/URL documentation, attribution fulfillment, and license compatibility verification. |
| 4. For prospective data, are ethical requirements (IRB/CEIm approval, informed consent) explicitly addressed? | Yes | Ethical approval (IRB/CEIm), written informed consent process, GDPR compliance, data de-identification at source, and Data Processing Agreements are all explicitly required. |
| 5. Are the target population, inclusion/exclusion criteria defined for both data types? | Yes | Inclusion criteria cover anatomical scope, diagnostic labeling, image quality, modality, and de-identification. Exclusion criteria address poor quality, inadequate labeling, licensing issues, PII, and duplicates. |
| 6. Are the technical protocols for data retrieval, de-identification verification, and secure transfer clearly specified? | Yes | Secure data retrieval (HTTPS/SFTP), checksums for integrity, de-identification verification (automated EXIF stripping, manual PII review), and access-controlled staging areas are specified. |
| 7. Is the rationale for accepting acquisition variability in retrospective data provided? | Yes | Variability in imaging equipment/settings is explicitly stated as a deliberate strength to promote model generalization across clinical environments, supported by literature references [4-6]. |
| 8. Are non-dermatological images (for domain validation) included with appropriate specifications? | Yes | Inclusion criteria explicitly address non-dermatological images (out-of-context objects, confounding textures) for exclusive use in training domain validation model, with required labeling as "non-dermatological" or "out-of-domain". |
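The integrity check in item 6 amounts to comparing a cryptographic digest of each retrieved file against the digest published by the source. A minimal sketch using Python's standard `hashlib`; the payload and expected digest are illustrative.

```python
import hashlib

# Sketch of the checksum integrity check from item 6: compare the
# SHA-256 digest of retrieved bytes against the source-published digest.
# The payload here is a placeholder, not real dataset content.

def sha256_digest(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

payload = b"example image bytes"
expected = sha256_digest(payload)   # in practice, published by the source

assert sha256_digest(payload) == expected           # transfer intact
assert sha256_digest(payload + b"x") != expected    # corruption detected
```

For large image archives the digest would be computed in chunks over a file stream rather than on an in-memory byte string, but the comparison logic is the same.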

Data Annotation Instructions (R-TF-028-004)

| Check Item | Yes/No/NA | Comments |
| --- | --- | --- |
| 1. Is the protocol for creating the Binary Indicator Mapping Matrix clear and unambiguous? | Yes | R-TF-028-004 (Binary Indicator Mapping) provides detailed protocol: ICD-11 category extraction, annotation worksheet preparation, primary clinical annotation with decision criteria for all 6 indicators, and secondary reviewer validation. |
| 2. Are the clinical decision criteria for each binary indicator (malignancy, premalignancy, association to malignancy, etc.) explicitly defined? | Yes | Explicit definitions and "Assign TRUE for" / "Assign FALSE for" criteria are provided for each indicator with clinical references (WHO Classification of Skin Tumours, NCCN Guidelines, European consensus guidelines). |
| 3. Are there clear instructions for annotating Visual Signs covering intensity (ordinal), count (bounding boxes), and extent (polygons)? | Yes | R-TF-028-004 (Visual Signs) provides task-specific instructions: ordinal intensity scoring (0-4 scales), bounding box annotation for counting, polygon annotation for extent, and categorical classification for patterns/staging. |
| 4. Are the qualifications for the medical expert annotators (Primary Annotator and Secondary Reviewer) clearly specified? | Yes | Primary Annotator: Board-certified dermatologist with ≥5 years post-certification experience, expertise across dermatological domains. Secondary Reviewer: same qualifications, must be independent. |
| 5. Is the consensus mechanism (multi-annotator, senior review) for resolving discrepancies well-defined? | Yes | Multi-annotator process specified with consensus reference standard via pooling (mean/median for intensity, voting for categories, algorithmic fusion for boxes/polygons). Senior specialist review for high disagreement cases. |
| 6. Are the annotation tools and workflow clearly described? | Yes | Web-based annotation platform workflow described: image examination, task selection, annotation per instructions, completion of all relevant signs. Tool-specific features (bounding box, polygon tools) are referenced. |
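The pooling rules in item 5 can be sketched in a few lines; the annotator scores and label strings below are illustrative placeholders, not real annotation data.

```python
from collections import Counter
from statistics import median

# Sketch of the consensus pooling described in item 5: median pooling
# for ordinal intensity scores, majority vote for categorical labels.
# Inputs are placeholders, not real annotations.

def consensus_intensity(scores):
    """Ordinal intensity (0-4 scale): pool annotator scores by median."""
    return median(scores)

def consensus_category(labels):
    """Categorical labels: pool annotator labels by majority vote."""
    return Counter(labels).most_common(1)[0][0]

print(consensus_intensity([2, 3, 3]))                            # 3
print(consensus_category(["psoriasis", "eczema", "psoriasis"]))  # psoriasis
```

Median pooling is robust to a single outlying annotator; cases with high disagreement would instead be escalated to the senior specialist review the checklist describes.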

R-TF-028-011 AI Risk Matrix

| Check Item | Yes/No/NA | Comments |
| --- | --- | --- |
| 1. Have initial AI risks related to data management (data bias, unrepresentative data, inaccurate labels) been identified? | Yes | Risk assessment identifies: AI-RISK-001 (Dataset Representativity Problem), AI-RISK-002 (Data Annotation Error), with specific analysis, consequences, and mitigation measures documented. |
| 2. Have initial AI risks related to model training and evaluation (overfitting, inadequate hyperparameters, inadequate evaluation) been identified? | Yes | Risks identified: AI-RISK-003 (Models Inadequately Evaluated), AI-RISK-004 (Models Inadequately Hyperparametrized), with mitigation measures including evaluation reports, sequestered test data, and hyperparameter studies. |
| 3. Have initial AI risks related to model robustness (imaging variability, artifacts, edge cases) been identified? | Yes | Risk AI-RISK-016 (Model robustness to imaging variability) identified with RPN 8 (Tolerable), requiring enhanced post-market surveillance through PMCF. |
| 4. Have initial AI risks related to clinical output errors (ICD misclassification, binary indicator false negatives for malignancy) been identified? | Yes | Risks AI-RISK-022 (ICD category misclassification) and AI-RISK-026 (Binary indicator false negatives for malignancy) identified, both with RPN 8, flagged for ongoing monitoring. |
| 5. Has an initial assessment of severity and likelihood been performed using the 5×5 Risk Matrix? | Yes | All risks assessed using defined severity scale (1-5: Negligible to Catastrophic) and likelihood scale (1-5: Very low to Very high). RPN = Severity × Likelihood calculated for each. |
| 6. Are risk control measures defined following ISO 14971 priority hierarchy (inherent safety, protective measures, information for safety)? | Yes | Control measures follow ISO 14971 hierarchy: inherent safety (architecture, data diversity, thresholds), protective measures (DIQA, domain validation, confidence thresholds), information for safety (IFU, user training, API error responses). |
| 7. Is the residual risk after mitigation documented and classified (Acceptable/Tolerable/Unacceptable)? | Yes | Residual risks documented: 52% Acceptable (RPN ≤4), 48% Tolerable (RPN 5-9), 0% Unacceptable (RPN ≥10). Residual risk acceptability justified against clinical benefits. |
| 8. Is traceability to AI Specifications (R-TF-028-001), Safety Risks (R-TF-013-002), and Clinical Validation (R-TF-015-001) established? | Yes | Section explicitly documents traceability: 72% of AI risks linked to device-level safety risks in R-TF-013-002. References to Clinical Evaluation Plan (R-TF-015-001) and Post-Market Surveillance (GP-007) established. |
| 9. Are critical risks requiring ongoing monitoring identified with post-market surveillance requirements? | Yes | Three Tolerable risks (AI-RISK-016, AI-RISK-022, AI-RISK-026) identified for enhanced monitoring through: PMCF per GP-007, user feedback analysis, PSUR, and annual AI model performance review. |
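The RPN scheme verified in items 5 and 7 can be expressed directly from the stated formula and thresholds: RPN = Severity × Likelihood on 1-5 scales, classified as Acceptable (RPN ≤4), Tolerable (RPN 5-9), or Unacceptable (RPN ≥10). The sketch below encodes only what the checklist states; the example severity/likelihood pair is illustrative.

```python
# Sketch of the RPN scheme from items 5 and 7:
# RPN = Severity × Likelihood (both 1-5), classified as
# Acceptable (≤4), Tolerable (5-9), or Unacceptable (≥10).

def risk_priority_number(severity: int, likelihood: int) -> int:
    assert 1 <= severity <= 5 and 1 <= likelihood <= 5
    return severity * likelihood

def classify(rpn: int) -> str:
    if rpn <= 4:
        return "Acceptable"
    if rpn <= 9:
        return "Tolerable"
    return "Unacceptable"

# Example: a risk scored Severity 4 × Likelihood 2 yields RPN 8,
# which matches the Tolerable band reported for AI-RISK-016.
print(classify(risk_priority_number(4, 2)))  # Tolerable
```

Note that on a 5×5 grid the Tolerable band contains only the products 5, 6, 8, and 9; RPN 7 cannot occur.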

Cross-Document Consistency

| Check Item | Yes/No/NA | Comments |
| --- | --- | --- |
| 1. Are performance endpoints in R-TF-028-001 consistent with acceptance criteria referenced in R-TF-028-002? | Yes | Development Plan references "specifications in R-TF-028-001 AI Description" as primary input for design and acceptance criteria for V&V. Metrics are consistently defined across both documents. |
| 2. Are data requirements in R-TF-028-001 consistent with data collection protocols in R-TF-028-003? | Yes | Data diversity requirements (Fitzpatrick I-VI, anatomical sites, imaging conditions) in AI Description align with collection protocols in both retrospective and prospective data collection instructions. |
| 3. Are annotation requirements in R-TF-028-001 consistent with annotation protocols in R-TF-028-004? | Yes | Visual sign quantification specifications (intensity, count, extent) in AI Description match task types defined in annotation instructions. Binary indicator definitions are consistent. |
| 4. Are identified risks in R-TF-028-011 traceable to specifications in R-TF-028-001 and mitigation measures in R-TF-028-002? | Yes | Risk Matrix references "ai_ml_specifications" from AI Description and mitigation measures align with Data Management Plan and Training & Evaluation Plan provisions. |
| 5. Is the document numbering scheme consistent with GP-028 procedure requirements? | Yes | All documents follow R-TF-028-0XX naming convention as defined in GP-028 Related QMS Documents section. |

Regulatory Compliance

| Check Item | Yes/No/NA | Comments |
| --- | --- | --- |
| 1. Are Good Machine Learning Practices (GMLP) principles addressed in the design documentation? | Yes | Development Plan explicitly references GMLP principles throughout: data representativeness, sequestered test sets, reference standard methodology, reproducibility, and traceability. |
| 2. Is alignment with MDR 2017/745 requirements (GSPR 17 for software with diagnostic function) demonstrated? | Yes | AI Risk Assessment explicitly references MDR 2017/745 GSPR 17, MDCG guidance documents, and establishes traceability to Clinical Evaluation per MDR requirements. |
| 3. Is alignment with IEC 62304 (software lifecycle) requirements demonstrated? | Yes | Development Plan references IEC 62304, clinical models classified as Class B per IEC 62304. Software lifecycle processes for AI development are defined. |
| 4. Is alignment with ISO 14971 (risk management) requirements demonstrated? | Yes | AI Risk Assessment references ISO 14971:2019, uses 5×5 risk matrix per R-TF-013-001, follows control measure priority hierarchy, and establishes residual risk acceptability. |
| 5. Is alignment with EU AI Act requirements (high-risk AI systems) addressed? | Yes | AI Risk Assessment scope explicitly references EU AI Act Regulation 2024/1689 for high-risk AI systems. Risk management approach addresses AI Act requirements. |

Conclusion

☑ Design Phase Approved: All checks have been successfully passed. The project is cleared to proceed to the Development Phase.

☐ Design Phase Not Approved: One or more checks have failed. The responsible team members must address the comments and resubmit the design documentation for verification.

Overall Comments:

The Design Phase documentation for Legit.Health Plus version 1.1.0.0 AI/ML algorithms is comprehensive and compliant with GP-028 requirements. Key strengths include:

  1. Comprehensive Algorithm Specifications: The AI Description provides detailed specifications for all clinical and non-clinical models with evidence-based performance thresholds and clear success criteria.

  2. Robust Data Management: Data collection instructions address both retrospective (archive) and prospective (custom) data sources with appropriate ethical, legal, and quality controls including GDPR compliance, informed consent requirements, and de-identification verification.

  3. Rigorous Annotation Protocols: Data annotation instructions define clear decision criteria, qualified annotator requirements, and multi-reviewer consensus mechanisms aligned with clinical standards.

  4. Integrated Risk Management: The AI Risk Matrix identifies relevant AI-specific risks, applies the 5×5 severity/likelihood framework, and establishes traceability to device-level risk management per ISO 14971.

  5. Regulatory Alignment: Documentation demonstrates alignment with MDR 2017/745, IEC 62304, ISO 14971, GMLP principles, and EU AI Act requirements for high-risk AI systems.

The design documentation provides a sufficient basis for proceeding to the Development Phase with confidence that the AI algorithms can be developed, validated, and deployed safely and effectively.

Verification

Signature meaning

The signatures for the approval process of this document can be found in the verified commits of the QMS repository. As a reference, the team members expected to participate in the approval of this document, and their roles in the approval process as defined in Annex I Responsibility Matrix of GP-001, are:

  • Author: JD-009
  • Reviewer: JD-009
  • Approver: JD-005
All the information contained in this QMS is confidential. The recipient agrees not to transmit or reproduce the information, neither by himself nor by third parties, through whichever means, without obtaining the prior written permission of Legit.Health (AI Labs Group S.L.)