R-TF-028-011 AI Risk Assessment

Table of contents
  • Purpose
  • Scope
    • Clinical Models (Class B per IEC 62304)
    • Non-Clinical Models (Supporting Functions)
  • Methodology
    • Risk Identification
    • Risk Estimation
      • Severity Scale
      • Likelihood Scale
      • Risk Priority Number (RPN)
    • Risk Control Measures
    • Traceability
  • Risk Assessment Table
  • Summary of Risk Assessment
    • Risk Distribution
    • Critical Risks Requiring Ongoing Monitoring
  • Residual Risk Acceptability
  • Integration with Device Risk Management
  • References

Purpose

This document provides a comprehensive risk assessment specifically for the Artificial Intelligence (AI) components of Legit.Health Plus, in accordance with:

  • ISO 14971:2019 – Application of Risk Management to Medical Devices
  • MDR 2017/745 – General Safety and Performance Requirements (GSPRs), particularly GSPR 17 (Software with diagnostic or measuring function)
  • MDCG 2020-1 – Guidance on Clinical Evaluation of Medical Device Software (MDSW)
  • EU AI Act – Regulation laying down harmonised rules on Artificial Intelligence (high-risk AI systems)
  • IEC 62304:2006+A1:2015 – Medical Device Software Lifecycle Processes (Class B software)
  • GP-028 – Internal procedure for AI Development and Risk Management

This AI Risk Assessment is integrated with and traceable to the overall device Risk Management Plan (R-TF-013-001) and Risk Assessment (R-TF-013-002) per ISO 14971 requirements.

Scope

This assessment covers all AI algorithms within Legit.Health Plus version 1.1.0.0:

Clinical Models (Class B per IEC 62304)

| Model | Function | Performance Threshold |
| --- | --- | --- |
| ICD Category Distribution | Probability distribution across ICD-11 dermatological categories | Top-1 ≥50%, Top-3 ≥60%, Top-5 ≥70% |
| Binary Indicators | Malignant, pre-malignant, associated with malignancy, pigmented lesion, urgent referral (≤48h), high-priority referral (≤2 weeks) | AUC ≥0.80 each |
| Visual Sign Quantification | Erythema, desquamation, induration, pustule, crusting, xerosis, swelling, oozing, excoriation, lichenification | RMAE thresholds per sign |
| Wound Characteristic Assessment | Tissue type classification, wound staging | Balanced accuracy ≥0.70 |
| Surface Segmentation | Lesion/wound area quantification | IoU ≥0.70 |
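
As a concrete reading of the ICD Category Distribution thresholds above, the sketch below computes Top-k accuracy and checks it against the stated targets. It is illustrative only: the function and array names are assumptions, not the validation code referenced in R-TF-028-002 or R-TF-028-005.

```python
# Illustrative Top-k acceptance check (not the device's actual evaluation code).
# `probs` holds one probability distribution over ICD-11 categories per image;
# `y_true` holds the integer index of each image's reference-standard category.
import numpy as np

def top_k_accuracy(y_true: np.ndarray, probs: np.ndarray, k: int) -> float:
    """Fraction of samples whose true category is among the k most probable predictions."""
    top_k = np.argsort(probs, axis=1)[:, -k:]  # indices of the k largest probabilities
    return float((top_k == y_true[:, None]).any(axis=1).mean())

# Acceptance targets from the table above: Top-1 >= 50%, Top-3 >= 60%, Top-5 >= 70%.
THRESHOLDS = {1: 0.50, 3: 0.60, 5: 0.70}

def meets_icd_thresholds(y_true: np.ndarray, probs: np.ndarray) -> bool:
    return all(top_k_accuracy(y_true, probs, k) >= t for k, t in THRESHOLDS.items())
```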

Non-Clinical Models (Supporting Functions)

| Model | Function | Performance Threshold |
| --- | --- | --- |
| DIQA | Image quality assessment and filtering | Accuracy ≥90%, Pearson r ≥0.80 |
| Domain Validation | Clinical/dermoscopic/non-skin classification | Overall accuracy ≥95% |
| Skin Surface Segmentation | Skin region detection and isolation | IoU ≥0.85 |
| Body Surface Segmentation | Skin segmentation for analysis | IoU ≥0.80 |
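
The segmentation thresholds in both tables are stated as IoU. A minimal sketch of that metric over binary masks follows; representing masks as boolean numpy arrays is an assumption about the data format, not a documented interface.

```python
# Minimal IoU (Intersection over Union) between two binary masks.
import numpy as np

def iou(pred: np.ndarray, target: np.ndarray) -> float:
    intersection = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    # Convention chosen here: two empty masks count as perfect agreement.
    return float(intersection / union) if union > 0 else 1.0

# e.g. the Skin Surface Segmentation gate would require iou(pred, gt) >= 0.85.
```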

Methodology

Risk Identification

AI-specific risks were identified through:

  1. Regulatory guidance analysis: MDCG 2019-11, FDA Guidance on AI/ML-based SaMD, Health Canada Pre-Market Guidance for MLMD
  2. Literature review: Published AI failure modes in dermatology and medical imaging
  3. FMEA approach: Systematic analysis of each AI development stage (data collection → annotation → training → validation → deployment → monitoring)
  4. Clinical workflow analysis: Potential misuse scenarios and use errors involving AI outputs
  5. Expert consultation: Input from AI engineers, dermatologists, regulatory affairs, and clinical safety specialists

Risk Estimation

Risks are estimated using the 5×5 Risk Matrix defined in R-TF-013-001 Risk Management Plan:

Severity Scale

| Level | Description | Definition |
| --- | --- | --- |
| 1 | Negligible | No impact on patient health or clinical decision |
| 2 | Minor | Inconvenience or temporary minor impact; fully recoverable |
| 3 | Moderate | Significant impact requiring additional intervention; recoverable |
| 4 | Critical | Serious harm including delayed diagnosis of serious condition |
| 5 | Catastrophic | Death or irreversible serious harm |

Likelihood Scale

| Level | Description | Probability |
| --- | --- | --- |
| 1 | Very low | <1% occurrence rate |
| 2 | Low | 1-5% occurrence rate |
| 3 | Moderate | 5-15% occurrence rate |
| 4 | High | 15-50% occurrence rate |
| 5 | Very high | >50% occurrence rate |

Risk Priority Number (RPN)

RPN = Severity × Likelihood

| RPN Range | Risk Class | Required Action |
| --- | --- | --- |
| 1-4 | Acceptable | Risk acceptable; document and monitor |
| 5-9 | Tolerable | Risk reduction measures recommended; benefit-risk evaluation required |
| 10-25 | Unacceptable | Risk reduction mandatory before release |
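
To make the scheme concrete, here is a minimal sketch of the RPN calculation and the class boundaries from the table above. The class names and ranges come from this document; the `AIRisk` dataclass and its field names are illustrative, not taken from GP-028 or R-TF-013-001.

```python
# Minimal sketch of the 5x5 risk estimation scheme: RPN = Severity x Likelihood.
from dataclasses import dataclass

@dataclass
class AIRisk:
    risk_id: str     # e.g. "AI-RISK-002"
    severity: int    # 1 (Negligible) .. 5 (Catastrophic)
    likelihood: int  # 1 (Very low) .. 5 (Very high)

    @property
    def rpn(self) -> int:
        return self.severity * self.likelihood

    @property
    def risk_class(self) -> str:
        if self.rpn <= 4:
            return "Acceptable"    # document and monitor
        if self.rpn <= 9:
            return "Tolerable"     # risk reduction recommended; benefit-risk evaluation required
        return "Unacceptable"      # risk reduction mandatory before release

# AI-RISK-002 from the risk assessment table below: initially Critical (4) x High (4)
# -> RPN 16, Unacceptable; after controls, Critical (4) x Low (2) -> RPN 8, Tolerable.
```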

Risk Control Measures

For each identified risk, control measures follow the priority hierarchy per ISO 14971:

  1. Inherent safety by design: Algorithm architecture, training data diversity, performance thresholds
  2. Protective measures: Quality gates (DIQA, Domain Validation), confidence thresholds, multi-model redundancy (the gate ordering is sketched after this list)
  3. Information for safety: IFU warnings, user training, transparency documentation, API error responses
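
The ordering of the protective measures in item 2 matters: the Domain Validation and DIQA gates reject inappropriate inputs before any clinical model runs. The sketch below illustrates that gate ordering only; the function signature and error labels are assumptions, while the three-class domain output and the DIQA acceptance threshold (score ≥6 on the 0-10 scale) are taken from the risk table entries AI-RISK-024 and AI-RISK-025.

```python
# Illustrative gate ordering; the real gates sit behind the device's REST API.
from typing import Any, Callable

def analyze(
    image: Any,
    domain_validation: Callable[[Any], str],  # returns 'clinical', 'dermoscopic', or 'non-skin'
    diqa_score: Callable[[Any], float],       # returns a 0-10 image quality score
    clinical_models: Callable[[Any], dict],   # ICD distribution, indicators, visual signs
) -> dict:
    """Only images passing both quality gates reach the clinical models."""
    if domain_validation(image) == "non-skin":     # gate 1: Domain Validation
        return {"error": "DOMAIN_REJECTED"}
    score = diqa_score(image)
    if score < 6:                                  # gate 2: DIQA clinical threshold
        return {"error": "QUALITY_REJECTED", "diqa_score": score}
    return {"diqa_score": score, **clinical_models(image)}
```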

Traceability

Each AI risk is traced to:

  • AI Specifications: R-TF-028-001 AI Description
  • Safety Risks: R-TF-013-002 Risk Assessment (where applicable)
  • Clinical Validation: R-TF-015-001 Clinical Evaluation Plan
  • Post-Market Surveillance: GP-007 Post-Market Surveillance

Risk Assessment Table

| ID | Issue type | Issue key | Summary | Root cause / Sequence of events | Consequences | AI Specifications Originating the Risk | Initial severity | Initial likelihood | Initial RPN | Initial risk class | Risk control measures | Residual severity | Residual likelihood | Residual RPN | Residual risk class | Transfer to Safety Risks | Relevant for Safety (Justification) | Safety Risk IDs |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | AI Risk | AI-RISK-001 | Dataset Not Representative of Intended Use Population | Collected dermatological images do not adequately represent the diversity of the target population (Fitzpatrick skin types I-VI, anatomical locations, ICD-11 conditions, demographics) specified in the intended use, leading to models that fail to generalize across all patient populations. | AI models underperform on underrepresented patient subgroups (particularly darker Fitzpatrick skin types V-VI, rare ICD-11 conditions, specific anatomical sites), leading to diagnostic errors, incorrect visual sign severity assessments, and health inequities in clinical use. | R-TF-028-001 Sections 'ICD Category Distribution' (requires Top-k accuracy across diverse populations), 'Visual Sign Quantification' (validation across Fitzpatrick I-VI), 'Data Specifications' (requires representative data across demographics). All clinical models specify validation requirements across diverse populations per R-TF-028-003. | Critical (4) | Moderate (3) | 12 | Tolerable | Multi-source data collection strategy: prospective hospital data from dermatology departments + retrospective atlas data + independent evaluation hold-out sets (R-TF-028-003). Documented demographics of collected datasets including Fitzpatrick skin type distribution (I-VI), age, gender, anatomical sites, and condition prevalence across ICD-11 categories (R-TF-028-005 Development Report). Stratified sampling to ensure balanced representation across critical demographic and clinical variables. Bias analysis and fairness evaluation across Fitzpatrick skin types with minimum performance thresholds enforced per subgroup. Independent evaluation on sequestered hold-out test sets with documented population characteristics (R-TF-028-002 Development Plan). Performance reporting stratified by Fitzpatrick skin type, anatomical site, and ICD-11 category prevalence | Critical (4) | Very low (1) | 4 | Acceptable | YES | Non-representative training data leads to model bias and incorrect diagnostic outputs for underrepresented populations (particularly Fitzpatrick skin types V-VI), potentially causing misdiagnosis per ISO 14971 harm assessment, MDR GSPR 22 (non-discrimination), and EU AI Act fairness requirements. | R-SKK, R-7US, R-GY6 |
| 2 | AI Risk | AI-RISK-002 | Data Annotation Errors by Expert Dermatologists | Expert dermatologists provide incorrect or inconsistent reference standard labels for ICD-11 categories, binary indicators (malignancy, urgent referral), visual severity signs (erythema, desquamation, induration intensity 0-9), wound characteristics (22 binary classifiers), or other annotations due to inter-observer variability, lack of clear annotation guidelines, or annotation fatigue. | Models are trained and validated on erroneous or inconsistent reference standard, resulting in unreliable ICD probability distributions, incorrect visual sign severity quantification, and misleading binary indicators that could impact clinical triage and patient care. | R-TF-028-001: All clinical models require expert dermatologist annotations—ICD Category Distribution, Visual Sign Quantification (10 ordinal categories per sign), Wound Assessment (staging 0-4, intensity 0-19, 22 binary characteristics). R-TF-028-004 Data Annotation Instructions specify protocols for ICD-11 mapping, visual signs, and binary indicators. | Critical (4) | High (4) | 16 | Unacceptable | All annotations performed exclusively by board-certified dermatologists with demonstrated expertise in relevant subspecialties. Comprehensive annotator training and calibration sessions using reference image sets with known reference standard. Detailed, reproducible annotation instructions documented in R-TF-028-004 with visual examples, severity anchor images, and edge case guidance for all clinical models. Multi-expert annotation with consensus or adjudication protocols: minimum 3 dermatologists for ICD reference standard, senior reviewer for discrepancies. Inter-rater agreement assessment (Cohen's κ for categorical, ICC for ordinal) documented in R-TF-028-005 with minimum thresholds (κ ≥ 0.60 for ICD, ICC ≥ 0.70 for severity). Histopathological correlation for reference standard determination where clinically appropriate (malignancy, ambiguous diagnoses). Automated outlier detection via cross-validation to identify potentially erroneous annotations for re-review. Regular annotation quality audits and re-calibration sessions during data collection phases | Critical (4) | Low (2) | 8 | Tolerable | YES | Annotation errors in reference standard labels directly impact model training quality, leading to unreliable ICD probability distributions, incorrect severity scores, and misleading binary indicators affecting patient triage per ISO 14971 requirements. | R-SKK, R-GY6 |
| 3 | AI Risk | AI-RISK-003 | Inadequate Model Evaluation Metrics or Test Data | AI models are evaluated using inappropriate metrics, insufficient test data, or non-independent datasets, resulting in performance estimates that do not reflect real-world clinical performance across ICD-11 categories, 10 visual sign intensity models, and wound assessment algorithms. | Deployed models perform worse than validated metrics suggest in clinical use, leading to incorrect Top-k ICD suggestions, severity misclassification (erythema, desquamation, induration RMAE exceeds thresholds), inappropriate binary indicator outputs, and loss of clinician trust. | R-TF-028-001: Performance endpoints per algorithm type—ICD Category Distribution (Top-1 ≥50%, Top-3 ≥60%, Top-5 ≥70%), Binary Indicators (AUC ≥0.80), Visual Signs (RMAE ≤14-36% depending on sign), Wound Assessment (RMAE ≤10% staging, ≤24% intensity, BA ≥50-55% characteristics). All with 95% CI requirements. | Critical (4) | Moderate (3) | 12 | Tolerable | Detailed evaluation reports for each algorithm documenting all specified metrics per R-TF-028-001 thresholds (R-TF-028-005 Development Report). Each algorithm performance objective covered by dedicated evaluation against independent test sets with appropriate sample size calculations. Strict sequestration of test data from training and validation sets—held-out, used only once for final unbiased evaluation (R-TF-028-002 Development Plan). Evaluation code unit tested and version controlled to ensure metric calculation correctness. Performance reported with 95% confidence intervals for all metrics as required by R-TF-028-001. Stratified evaluation across critical subgroups: Fitzpatrick skin types I-VI, anatomical sites, severity levels, ICD-11 category prevalence. Comparison to expert dermatologist performance baselines (inter-observer variability) documented in literature and validation studies. Statistical validation confirming model meets or exceeds all specified thresholds before deployment authorization | Critical (4) | Very low (1) | 4 | Acceptable | YES | Inadequate evaluation metrics or test data can result in deployed models performing worse than expected in clinical use, leading to incorrect diagnoses and potential patient harm per ISO 14971 and MDR clinical evaluation requirements (MDCG 2020-1). | R-SKK, R-VL1 |
| 4 | AI Risk | AI-RISK-004 | Suboptimal Model Architecture or Hyperparameters | Selected deep learning architectures (Vision Transformers for ICD classification, EfficientNet for DIQA, encoder-decoder for segmentation, CNNs for intensity quantification) or hyperparameters (learning rate, regularization, loss functions, temperature scaling) are inappropriate for dermatological image analysis tasks, leading to poor convergence, overfitting, or underfitting. | Models fail to meet specified performance thresholds—ICD Top-k accuracy below 50%/60%/70%, visual sign RMAE above thresholds (14-36%), binary indicator AUC below 0.80—resulting in clinically inadequate outputs that cannot support healthcare professionals in patient assessment. | R-TF-028-001: Specifies architectures (Vision Transformer for ICD, EfficientNet for DIQA, encoder-decoder for segmentation) and performance endpoints. R-TF-028-002 Development Plan: training methodology, hyperparameter optimization, model calibration (temperature scaling). | Critical (4) | Moderate (3) | 12 | Tolerable | Systematic hyperparameter optimization studies (Bayesian optimization, grid search) documented in R-TF-028-005 Development Report. Evaluation of multiple state-of-the-art architectures for each task: Vision Transformers, ConvNeXt, EfficientNet variants, hybrid approaches. Transfer learning from large-scale pretrained models (ImageNet, dermatological datasets) to improve convergence and performance. Validation on independent datasets to assess generalization before architecture selection. Regularization techniques (dropout, weight decay, data augmentation) to prevent overfitting per R-TF-028-002. Early stopping and learning rate scheduling based on validation performance with TensorBoard monitoring. Model calibration using temperature scaling to ensure output probabilities are reliable (calibration curves documented). Ablation studies to validate architectural choices and component contributions. Peer review of model architecture and training protocols by AI/ML specialists | Critical (4) | Very low (1) | 4 | Acceptable | YES | Suboptimal architecture or hyperparameters lead to models failing to meet performance thresholds, resulting in clinically inadequate ICD distributions and severity scores that cannot support healthcare professionals per ISO 14971 and EU AI Act requirements for AI system performance. | R-SKK, R-VL1 |
| 5 | AI Risk | AI-RISK-005 | Cybersecurity: Model Extraction or Adversarial Input Attacks | Malicious actors attempt to extract proprietary model weights/architecture through API probing, or craft adversarial inputs (modified images) designed to cause incorrect ICD classifications, false binary indicator outputs, or erroneous severity predictions, compromising model security and clinical reliability. | Model intellectual property is stolen enabling unauthorized use, or adversarial attacks cause systematic misdiagnoses (false negatives for malignancy), incorrect severity scores, or safety-critical failures in clinical deployment. | R-TF-028-001 Section 'Cybersecurity and Transparency': models deployed within Legit.Health Plus via REST API, static models (no continuous learning), input validation via DIQA and Domain Validation non-clinical models. R-TF-Device Description: API-based deployment with authentication. | Critical (4) | Moderate (3) | 12 | Tolerable | REST API with mandatory API key authentication limits unauthorized access (documented in R-TF-Device Description). Static models (no continuous learning or online updates) prevent training-time poisoning attacks. Input validation via multi-stage quality gates: Domain Validation model rejects non-skin images, DIQA model filters poor-quality inputs. Model weights encrypted in deployment package; inference performed server-side without exposing model architecture. Rate limiting and monitoring of API requests to detect probing or extraction attempts. Adversarial robustness testing during validation phase including perturbed image inputs. No direct model access provided to users—all inference via controlled API endpoints. Security review and penetration testing of deployment architecture per IEC 81001-5-1. Incident response plan for suspected adversarial attacks or security breaches | Critical (4) | Very low (1) | 4 | Acceptable | YES | AI-specific cybersecurity threats (model extraction, adversarial attacks) can cause systematic misdiagnoses (false negative malignancy indicators) and safety-critical failures per ISO 14971 and IEC 81001-5-1 cybersecurity requirements for AI-enabled medical devices. | R-SKK, R-VL1 |
| 6 | AI Risk | AI-RISK-006 | Bias and Fairness: Disparate Performance Across Fitzpatrick Skin Types | AI models exhibit significantly degraded performance on darker skin types (Fitzpatrick V-VI) compared to lighter skin types (I-III) due to dataset imbalance, lighting artifacts in darker skin photography, or algorithm design biases, perpetuating health inequities in dermatological AI. | Patients with Fitzpatrick V-VI skin receive inaccurate ICD probability distributions (Top-k accuracy drops), incorrect severity assessments (erythema RMAE increases significantly on dark skin), and unreliable binary indicators, leading to delayed or inappropriate treatment and exacerbating healthcare disparities. | R-TF-028-001: All clinical models (ICD, Visual Signs, Binary Indicators) require validation across Fitzpatrick skin types I-VI with documented performance per subgroup. | Critical (4) | High (4) | 16 | Unacceptable | Prospective data collection from diverse clinical sites ensures representation across all Fitzpatrick types I-VI with documented distribution targets. Stratified performance evaluation with metrics (Top-k, RMAE, AUC) reported separately for each Fitzpatrick type in R-TF-028-005 Development Report. Bias analysis and fairness audits using performance parity metrics (equalized odds, calibration across groups) conducted during development. Data augmentation techniques designed to preserve skin tone characteristics while increasing dataset diversity. Balanced training strategies (oversampling, loss weighting) to prevent model bias toward overrepresented Fitzpatrick types I-III. Minimum performance thresholds enforced for each Fitzpatrick type subgroup (no subgroup >20% below overall performance). Post-market surveillance includes stratified performance monitoring by Fitzpatrick skin type with alert thresholds. Clinical validation studies include diverse patient populations across all Fitzpatrick types (documented in R-TF-015-001 Clinical Evaluation Plan) | Critical (4) | Low (2) | 8 | Tolerable | YES | Disparate AI performance across Fitzpatrick skin types leads to health inequities and inaccurate diagnoses for patients with darker skin (Fitzpatrick V-VI), directly impacting patient safety per ISO 14971, MDR GSPR 22 (non-discrimination), and EU AI Act Article 10 (bias prevention). | R-SKK, R-7US, R-GY6 |
| 7 | AI Risk | AI-RISK-007 | Model Training Failures: Overfitting or Underfitting | Models overfit to training data (memorizing rather than generalizing) or underfit (failing to learn relevant patterns), resulting in poor performance on new patient images during clinical deployment. | Overfitting leads to excellent training performance but poor real-world generalization. Underfitting results in consistently poor performance. Both compromise clinical utility and patient safety. | R-TF-028-001: All models specify performance thresholds on independent test sets. Development Plan (R-TF-028-002) defines training procedures and validation protocols. | Critical (4) | High (4) | 16 | Unacceptable | Training monitored using TensorBoard with validation metrics tracked throughout. Early stopping based on validation set performance prevents overfitting. Comprehensive regularization techniques: dropout, weight decay, batch normalization, data augmentation. Stratified train/validation/test splits ensure representative evaluation at all stages. K-fold cross-validation during hyperparameter tuning for robust parameter selection. Learning curve analysis to diagnose overfitting/underfitting and guide corrective actions. Independent test set performance as final acceptance criterion (never used during development). Ablation studies validate that model complexity matches task difficulty. Training logs and model checkpoints version controlled for traceability and reproducibility | Critical (4) | Very low (1) | 4 | Acceptable | YES | Overfitting or underfitting results in poor real-world model performance and generalization failures, compromising clinical utility and patient safety per ISO 14971 performance requirements. | R-SKK, R-VL1, R-75L |
| 8 | AI Risk | AI-RISK-008 | Model Deployment Failures: Development vs. Deployed Performance Mismatch | Models converted for deployment (e.g., to TensorFlow Lite, ONNX, or mobile-optimized formats) exhibit different numerical outputs or degraded performance compared to development versions due to conversion errors, precision loss, or implementation bugs. | Deployed models provide inaccurate predictions despite successful validation during development, leading to unreliable clinical outputs and potential patient harm. | R-TF-028-001 Section 'Other Specifications': deployment conversion validated by prediction equivalence testing. R-TF-028-006 AI/ML Release Report documents deployment validation. | Critical (4) | Moderate (3) | 12 | Tolerable | Models deployed using validated frameworks compatible with development environment (e.g., TensorFlow → TensorFlow Lite). Numerical equivalence testing: deployed models compared against development models on identical test inputs. Integration tests verify end-to-end pipeline produces expected outputs on reference images. Quantization and optimization validated to ensure accuracy degradation within acceptable bounds (typically <1%). Visual inspection and statistical comparison of outputs from both development and deployed models. Version control and traceability from development models to deployed artifacts (R-TF-028-006 Release Report). Automated regression testing suite runs with each model deployment. Clinical validation performed on final deployed models in target deployment environment | Critical (4) | Very low (1) | 4 | Acceptable | YES | Deployment conversion errors (precision loss, implementation bugs) cause deployed models to exhibit different outputs than validated versions, leading to unreliable clinical results per ISO 14971 and IEC 62304 deployment validation requirements. | R-SKK, R-VL1 |
| 9 | AI Risk | AI-RISK-009 | Data Preprocessing Errors Destroying Clinically Relevant Information | Image preprocessing operations (resizing, normalization, augmentation) inadvertently remove or alter clinically important features (erythema intensity, lesion boundaries, texture patterns) critical for accurate model predictions. | Models fail to learn or detect relevant dermatological features, resulting in poor diagnostic accuracy, incorrect severity assessment, and unreliable clinical outputs. | R-TF-028-001: All image-based models require preservation of clinical features (erythema, desquamation, lesion morphology). Development methodology includes preprocessing pipeline definition. | Critical (4) | Moderate (3) | 12 | Tolerable | Multiple preprocessing strategies tested during development with ablation studies. Visual inspection of preprocessed images by dermatologists to confirm feature preservation. Preprocessing pipelines designed to preserve color accuracy (critical for erythema, pigmentation assessment). Augmentation strategies validated to maintain clinical realism (e.g., brightness/contrast adjustments within physiological ranges). Augmentation parameters constrained to prevent unrealistic transformations (e.g., no extreme rotations that violate anatomical constraints). Preprocessing code unit tested with reference images and known expected outputs. Documentation of preprocessing rationale and clinical impact assessment (R-TF-028-005 Development Report). Expert dermatologist review of augmentation examples to ensure clinical validity | Critical (4) | Very low (1) | 4 | Acceptable | YES | Preprocessing errors that remove clinically relevant features (erythema intensity, lesion boundaries) lead to poor diagnostic accuracy and unreliable severity assessments per ISO 14971 harm analysis. | R-SKK, R-GY6 |
| 10 | AI Risk | AI-RISK-010 | Incorrect Model Integration: Pre/Post-Processing Implementation Errors | Models integrated into Legit.Health Plus software with incorrect pre-processing (wrong image normalization, incorrect color space conversion) or post-processing (incorrect weighted expected value calculation for severity scores, incorrect binary indicator mapping matrix application) due to implementation bugs or documentation errors. | Models produce incorrect outputs despite being correctly trained and validated—wrong ICD probability distributions, incorrect visual sign severity values, incorrect binary indicator values—leading to clinical decisions based on mathematically incorrect computations. | R-TF-028-001: Specifies post-processing formulas—weighted expected value ŷ = Σ(i × pᵢ) for visual signs, binary indicator mapping matrix M_ij. R-TF-028-006 AI Release Report documents integration specifications. | Critical (4) | Moderate (3) | 12 | Tolerable | Detailed integration specifications documented in R-TF-028-006 AI Release Report including exact mathematical formulas, input/output formats, and reference implementations. Unit tests for all pre-processing functions (normalization, resizing, color space) with known input-output pairs. Unit tests for all post-processing functions (weighted expected value, mapping matrix) verifying mathematical correctness. Integration tests comparing deployed software implementation against validated Python reference implementation on identical inputs. End-to-end validation using reference images with known expected outputs from development environment. Code review of integration code by both AI/ML team and software engineering team per GP-012 Design and Development. Regression testing suite executed with each software build ensuring no drift from expected outputs. Clinical validation performed on final integrated system (not just standalone models) per R-TF-015-001 Clinical Evaluation Plan. Traceability from model specifications (R-TF-028-001) through implementation to validation results (R-TF-012-043 Traceability Matrix) | Critical (4) | Very low (1) | 4 | Acceptable | YES | Incorrect pre/post-processing implementation causes models to produce erroneous outputs despite correct training—incorrect ICD probabilities, wrong visual sign values—leading to clinical decisions based on mathematically incorrect computations per ISO 14971 and IEC 62304. | R-SKK, R-VL1 |
| 11 | AI Risk | AI-RISK-011 | Insufficient Dataset Size for Model Complexity | Collected dataset is too small to support training of deep learning models with millions of parameters, particularly for rare conditions or underrepresented categories, resulting in poor generalization. | Models perform poorly on rare conditions, minority skin types, or underrepresented anatomical sites, leading to systematic diagnostic failures for specific patient populations. | R-TF-028-001 Section 'Data Specifications': requires large-scale data collection ([NUMBER OF IMAGES] dermatological images) with diversity across conditions, skin types, and anatomical sites. Multiple data sources specified. | Critical (4) | Moderate (3) | 12 | Tolerable | Multi-source data collection strategy: prospective hospital data + retrospective atlas data + evaluation hold-out sets. Targeted data collection for rare conditions and underrepresented categories through specialized sources. Data augmentation to increase effective training set size while preserving clinical realism. Transfer learning from large-scale pretrained models (ImageNet, dermatological datasets) to reduce data requirements. Sample size calculations and power analysis to determine adequate dataset sizes per category. Learning curve analysis to validate that additional data would not significantly improve performance. Documentation of dataset size and composition in R-TF-028-005 Development Report. Performance evaluation stratified by category prevalence to identify underperforming rare classes. Minimum sample size thresholds per ICD category, severity level, and demographic subgroup | Critical (4) | Very low (1) | 4 | Acceptable | YES | Insufficient dataset size leads to poor model generalization on rare conditions and underrepresented categories, causing systematic diagnostic failures for specific patient populations per ISO 14971 and MDR requirements. | R-SKK, R-7US, R-GY6 |
| 12 | AI Risk | AI-RISK-012 | Data Collection Protocol Failures | Images collected during prospective data collection fail to meet quality standards, lack required metadata, or do not follow standardized imaging protocols, resulting in unusable or low-quality training data. | Insufficient high-quality data for model development, leading to delayed development timelines or models trained on poor-quality data with suboptimal performance. | R-TF-028-001 Section 'Data Specifications': prospective and retrospective data collection with quality requirements. R-TF-028-003 Data Collection Instructions define protocols. | Moderate (3) | Moderate (3) | 9 | Tolerable | Comprehensive data collection protocols documented in R-TF-028-003 with clear imaging standards, metadata requirements, and quality criteria. Training and certification of data collection personnel (photographers, clinicians). Real-time quality checks during data collection with immediate feedback for non-compliant images. Standardized imaging equipment and settings specified in protocols. Metadata validation at point of collection to ensure completeness. Regular audits of collected data quality by AI/ML team with feedback to collection sites. DIQA (Image Quality Assessment) model applied to prospectively collected images to identify quality issues early. Iterative protocol refinement based on initial data collection experiences. Multiple data collection sites to ensure robustness to site-specific variations | Moderate (3) | Low (2) | 6 | Acceptable | YES | Data collection protocol failures result in insufficient high-quality data for model development, potentially leading to models trained on poor-quality data with suboptimal performance per ISO 14971. | R-GY6, R-VL1 |
| 13 | AI Risk | AI-RISK-013 | Cybersecurity: Data Breach During Development | Patient images and associated clinical data collected for AI development are accessed by unauthorized parties due to inadequate data security controls during collection, storage, or processing. | Patient privacy violations, regulatory non-compliance (GDPR, HIPAA), loss of patient trust, and potential legal/financial consequences for the organization. | R-TF-028-001 Section 'Cybersecurity and Transparency': data de-identified/pseudonymized, research server restricted access, secure segregation required. | Critical (4) | Moderate (3) | 12 | Tolerable | All patient data de-identified/pseudonymized before transfer to AI development environment. Research servers with restricted access controls (authentication, authorization, role-based access). Data encryption at rest and in transit (SSL/TLS for transfers, encrypted storage). Network segregation isolating research data environment from public networks. Access logging and monitoring with regular security audits. Data processing agreements with all data sources and collaborators. Regular security training for all personnel with data access. Incident response plan for potential data breaches. Compliance with GDPR, HIPAA, and applicable data protection regulations. Data retention and deletion policies to minimize exposure window | Critical (4) | Very low (1) | 4 | Acceptable | NO | Data breach during development is primarily a privacy/regulatory compliance risk rather than a direct patient safety risk. While serious, it does not directly impact model performance or diagnostic accuracy per ISO 14971 harm categories. | - |
| 14 | AI Risk | AI-RISK-014 | Poor Data Quality: Non-Diagnostic Images in Training Set | Training dataset contains significant proportion of poor-quality images (blurry, poorly lit, obstructed, wrong anatomical site) that were not filtered out during quality control, degrading model learning. | Models learn from low-quality examples, reducing accuracy and potentially learning to accept poor-quality inputs that should be rejected, compromising clinical reliability. | R-TF-028-001: DIQA (Image Quality Assessment) non-clinical model filters poor-quality images. Data quality requirements in Data Specifications section. | Critical (4) | Moderate (3) | 12 | Tolerable | DIQA (Dermatology Image Quality Assessment) model automatically filters images below quality threshold (score ≥6 for clinical use). All images used for training/evaluation reviewed by expert dermatologists during annotation (quality confirmed during labeling). Multi-stage quality checks by AI/ML team: automated quality metrics, visual inspection, outlier detection. Quality criteria defined in data collection protocols (R-TF-028-003). Statistical analysis of dataset quality distributions documented in R-TF-028-005 Development Report. Quality-based stratified sampling ensures training set contains only diagnostic-quality images. Separate evaluation of model performance on varying quality levels to assess robustness. Ongoing quality monitoring during data collection with feedback loops to improve collection procedures | Critical (4) | Very low (1) | 4 | Acceptable | YES | Poor-quality non-diagnostic images in training set degrade model learning and can cause models to accept poor-quality inputs that should be rejected, compromising clinical reliability per ISO 14971. | R-SKK, R-GY6 |
| 15 | AI Risk | AI-RISK-015 | Inadequate Development Environment and Infrastructure | AI development environment lacks sufficient computational resources (GPUs), appropriate software libraries, or version control, leading to inefficient development, irreproducible results, or technical failures. | Development delays, inability to train complex models effectively, poor model performance due to resource constraints, or inability to reproduce and validate results. | R-TF-028-001 Section 'Other Specifications': fixed hardware/software stack required with version tracking. Development methodology requires reproducible environment. | Moderate (3) | Low (2) | 6 | Acceptable | Dedicated GPU-enabled workstations and cloud compute infrastructure for AI/ML development. State-of-the-art deep learning frameworks (TensorFlow, PyTorch) with version pinning. Containerized development environment (Docker) ensuring reproducibility and consistency across team. Version control for all code, models, and configurations (Git). Dependency management with requirements.txt or conda environment files shared across team. Documented software stack versions in development reports (R-TF-028-005). Regular infrastructure updates and maintenance. Backup and disaster recovery procedures for development data and model checkpoints. Continuous integration/continuous deployment (CI/CD) pipelines for automated testing | Moderate (3) | Very low (1) | 3 | Acceptable | NO | Inadequate development infrastructure primarily affects development efficiency and timelines rather than direct patient safety. Poor reproducibility is addressed through process controls per ISO 13485. | - |
| 16 | AI Risk | AI-RISK-016 | Model Robustness Failures: Sensitivity to Image Acquisition Variability | Models are brittle to natural variations in imaging conditions (lighting, camera angle, distance, device type, background) commonly encountered in clinical practice, leading to inconsistent predictions. | Model performance degrades significantly when images are captured under non-ideal conditions, limiting clinical utility and potentially causing diagnostic errors in real-world use. | R-TF-028-001 Section 'Integration and Environment': models must handle variability in acquisition. All models specify validation across diverse imaging conditions and devices. | Critical (4) | Moderate (3) | 12 | Tolerable | Training data includes diverse imaging conditions (multiple devices, lighting, angles) from prospective clinical collection. Data augmentation simulating realistic imaging variations (brightness, contrast, rotation, noise). Validation on images from multiple acquisition sources and device types. Color normalization and preprocessing techniques to improve robustness to lighting variations. DIQA model provides quality gate rejecting images outside acceptable acquisition parameter ranges. Performance evaluation stratified by imaging device, lighting condition, and other technical factors. Clinical validation studies using images from target deployment environments and device types. User guidance and training on optimal image acquisition practices to minimize extreme variability. Robustness testing with intentionally varied imaging conditions during validation | Critical (4) | Low (2) | 8 | Tolerable | YES | Model brittleness to imaging variability (lighting, camera angle, device type) leads to inconsistent predictions and diagnostic errors in real-world clinical conditions per ISO 14971 and MDR GSPR requirements. | R-SKK, R-VL1 |
| 17 | AI Risk | AI-RISK-017 | Lack of Transparency: Users Unaware of AI/ML Usage and Limitations | Healthcare professionals integrating Legit.Health Plus via API are not adequately informed that AI/ML algorithms generate the ICD probability distributions, severity scores, and binary indicators; do not understand model limitations (performance thresholds, validation populations, edge cases); or are unaware that outputs require clinical interpretation and cannot replace diagnostic judgment. | Over-reliance on AI outputs without critical clinical evaluation, misuse of device outside validated intended use (e.g., using for conditions outside ICD-11 categories), automation bias leading to failure to recognize AI errors, or inappropriate clinical decisions based on misunderstood output semantics. | R-TF-028-001 Section 'Cybersecurity and Transparency': Documentation must clearly state algorithm purpose, inputs/outputs, performance metrics, limitations, and that AI/ML models generate clinical outputs. R-TF-Device Description: Intended use specifies decision support role. EU AI Act Article 13 transparency requirements. | Moderate (3) | Moderate (3) | 9 | Tolerable | IFU (Instructions for Use) clearly states that AI/ML deep learning algorithms generate: ICD-11 probability distributions, binary indicators (malignancy, urgency), visual sign severity scores, wound assessments. Intended use statement in technical documentation and IFU specifies device provides 'quantitative data on clinical signs and interpretative distribution of ICD categories to support (not replace) healthcare professional assessment'. Model performance metrics documented in IFU: Top-k accuracy for ICD, RMAE for severity signs, AUC for binary indicators—with validation population descriptions and 95% CI. Model limitations documented: conditions outside ICD-11 categories not validated, performance varies by Fitzpatrick skin type, image quality affects accuracy. API documentation includes clear labeling of AI-generated outputs in response schema with metadata indicating AI provenance. Training materials for healthcare system integrators emphasize clinical interpretation responsibility and AI decision support role. Contraindications documented: not for use as sole diagnostic method, biopsy required for suspected malignancy regardless of AI output. Warnings about known edge cases: rare conditions, pediatric populations, unusual presentations. Post-market surveillance via GP-007 includes user feedback on understanding and appropriate use of AI features. Compliance with EU AI Act Article 13 transparency requirements for high-risk AI systems | Moderate (3) | Low (2) | 6 | Acceptable | YES | Lack of AI/ML transparency leads to over-reliance on AI outputs without critical evaluation, automation bias, and failure to recognize situations requiring clinical judgment per ISO 14971 use error analysis, IEC 62366-1 usability requirements, and EU AI Act Article 13 transparency obligations. | R-SKK |
| 18 | AI Risk | AI-RISK-018 | Model Retraining Failures: Performance Degradation After Update | When models are retrained with new data or updated algorithms, the retrained models perform worse than original validated models due to insufficient data, improper retraining procedures, or inadequate validation. | Device update introduces models with degraded performance, leading to increased diagnostic errors and compromised patient safety compared to previous version. | R-TF-028-001: Each model has specified performance thresholds. Retraining must maintain or improve performance. Update procedures referenced in risk management section. | Critical (4) | Moderate (3) | 12 | Tolerable | Retrained models follow identical development and validation procedures as original models (same protocols, metrics, thresholds). Retrained models evaluated on same independent test sets as original models for direct comparison. Acceptance criteria: retrained models must meet all original performance thresholds (non-inferiority) or demonstrate statistically significant improvement. Regression testing ensures retrained models do not introduce new failure modes. Clinical validation repeated for models with substantial architectural or data changes. Version control and traceability from retraining data through validation to deployment. Risk-benefit analysis for model updates considering potential performance changes. Predefined Change Control Plan (PCCP) specifies when model retraining is required and validation procedures. Regulatory notification/approval processes followed for significant model changes per MDR/RDC requirements | Critical (4) | Very low (1) | 4 | Acceptable | YES | Model retraining failures introduce performance degradation compared to validated versions, leading to increased diagnostic errors and compromised patient safety per ISO 14971 and MDR change control requirements. | R-SKK, R-VL1, R-75L |
| 19 | AI Risk | AI-RISK-019 | Inappropriate Update Triggers: Unnecessary Model Changes | Models are updated or retrained in response to non-critical triggers (minor performance variations, small data additions) causing unnecessary regulatory burden and introducing update-related risks without meaningful benefit. | Resources wasted on unnecessary updates, increased risk of introducing errors during update process, regulatory compliance burden, and potential device downtime during updates. | R-TF-028-001 Section 'Specifications and Risks': risks linked to AI/ML Risk Matrix. Update criteria need clear definition to prevent unnecessary changes. | Minor (2) | Moderate (3) | 6 | Acceptable | Predefined Change Control Plan (PCCP) clearly enumerates specific triggers requiring model updates (e.g., safety issues, significant performance drift, regulatory requirements, intended use expansion). Update decision criteria based on quantitative thresholds (e.g., >10% performance degradation, statistically significant subgroup disparities). Risk-benefit analysis required before initiating model update process. Regular scheduled reviews of model performance and update need (e.g., annual). Post-market surveillance data analyzed systematically to identify genuine update needs. Distinction between critical updates (safety-related, requiring immediate action) and non-critical improvements (can be batched). Documentation of update decision rationale in technical files. Stakeholder review (clinical, regulatory, technical) before committing to update process | Minor (2) | Very low (1) | 2 | Acceptable | NO | Unnecessary model updates are primarily a regulatory/operational burden risk. While update process may introduce errors, risk controls in update procedures address safety concerns per ISO 14971. | - |
| 20 | AI Risk | AI-RISK-020 | Model Obsolescence: Dataset No Longer Representative or Technology Outdated | Over time, patient population characteristics shift, new dermatological conditions emerge, imaging technology evolves, or AI algorithms advance, making current models obsolete and underperforming compared to state-of-art. | Gradual degradation of model performance, reduced diagnostic accuracy, and suboptimal patient care as clinical practice and patient demographics evolve beyond model training data. | R-TF-028-001: Models trained on current dermatological conditions and imaging modalities. Post-market surveillance mentioned in risk mitigation section. Technology watch needed for AI advancement. | Moderate (3) | Moderate (3) | 9 | Tolerable | Post-market surveillance system monitors model performance over time with real-world usage data (per SOP-24 or equivalent). Regular literature review and technology watch for AI/ML advancements in dermatology. Performance trending analysis identifies gradual degradation before clinical impact. Periodic re-validation on contemporary patient populations to assess ongoing performance. Dermatological conditions monitored for epidemiological changes or emerging conditions. User feedback and complaint systems capture performance concerns from clinical users. Scheduled review cycles (e.g., every 2-3 years) assess need for model updates based on technological advancement. PCCP includes obsolescence assessment criteria and triggers for major model updates. Modular architecture facilitates targeted model updates without complete system redesign. Training data includes diverse conditions and imaging modalities to maximize longevity | Moderate (3) | Low (2) | 6 | Acceptable | YES | Model obsolescence leads to gradual performance degradation as patient populations and technology evolve, reducing diagnostic accuracy and suboptimal patient care per ISO 14971 post-market surveillance requirements. | R-SKK, R-VL1, R-75L |
| 21 | AI Risk | AI-RISK-021 | Usability Issues: Model Outputs Not Interpretable by Clinical Users | AI model outputs (ICD probabilities, severity scores, binary indicators, wound assessments) are presented in a format that is confusing, difficult to interpret, or lacks clinical context, preventing effective use by healthcare professionals. | Clinicians unable to effectively utilize AI outputs for patient care, leading to device abandonment, misinterpretation of results, or incorrect clinical decisions based on misunderstood outputs. | R-TF-028-001: Each model outputs structured clinical information (probabilities, scores, classifications). Usability not explicitly detailed but critical for intended use fulfillment. | Moderate (3) | Moderate (3) | 9 | Tolerable | User interface design following clinical workflow and medical device usability principles (IEC 62366-1). Formative usability studies during development to iteratively refine output presentation. Summative usability validation (human factors testing) with representative users (dermatologists, primary care physicians, specialists). Clinical outputs accompanied by clear explanations and clinical context (e.g., ICD codes with disease names, severity scores with severity categories). Visual aids (charts, graphs, color coding) to enhance interpretation of quantitative outputs. Confidence indicators or uncertainty visualization to support clinical judgment. User training materials and documentation explain interpretation of all AI outputs. Clinical advisory board review of user interface and output presentation. Post-market feedback collection on usability and output interpretability. Iterative design improvements based on real-world user experience | Moderate (3) | Low (2) | 6 | Acceptable | YES | Non-interpretable AI outputs prevent effective use by healthcare professionals, potentially leading to device abandonment, misinterpretation, or incorrect clinical decisions per ISO 14971 and IEC 62366-1 usability requirements. | R-SKK |
| 22 | AI Risk | AI-RISK-022 | Clinical Model Failure: ICD Category Misclassification Leading to Incorrect Diagnosis Suggestion | ICD Category Distribution model assigns high probability to incorrect disease category among ICD-11 classes, potentially misleading clinician toward wrong diagnosis, particularly for visually similar conditions (e.g., melanoma vs. seborrheic keratosis, psoriasis vs. eczema) or rare diseases with limited training data. | Delayed correct diagnosis, inappropriate treatment initiation, or failure to recognize serious conditions requiring urgent intervention—most critically, melanoma misclassified as benign nevus leading to delayed cancer diagnosis and potential metastasis. | R-TF-028-001 Section 'ICD Category Distribution': Top-1 accuracy ≥50%, Top-3 accuracy ≥60%, Top-5 accuracy ≥70% (validated with 95% CI). Binary indicators (malignant, pre-malignant, urgent referral, high-priority referral with AUC ≥0.80) provide independent safety layer. Intended use: interpretative distribution to support (not replace) healthcare professional judgment. | Critical (4) | Moderate (3) | 12 | Tolerable | Top-5 ICD suggestions presented (not just top-1) to support differential diagnosis—ensures correct diagnosis typically in shortlist even if not ranked first. Six binary indicators (malignant, pre-malignant, associated with malignancy, pigmented lesion, urgent referral ≤48h, high-priority referral ≤2 weeks) provide independent safety checks beyond ICD classification. Performance thresholds validated per R-TF-028-001: Top-1 ≥50%, Top-3 ≥60%, Top-5 ≥70% ensure multi-option differential support. Urgent referral binary indicator (AUC ≥0.80) flags high-risk lesions requiring rapid evaluation regardless of specific ICD classification. Intended use statement (R-TF-Device Description) clearly states outputs are interpretative distributions to support (not replace) healthcare professional clinical judgment. User warnings in IFU about limitations of AI diagnosis and need for clinical correlation, biopsy for suspected malignancy. Clinical validation (R-TF-015-001) demonstrates AI-assisted diagnosis improves accuracy compared to physicians alone (literature: Han et al. 2020, Liu et al. 2020). Confidence scores (probability values) accompany predictions to indicate certainty level, enabling clinician judgment. Post-market surveillance monitors misclassification patterns via user feedback and serious adverse event reporting per GP-007. User training emphasizes differential diagnosis approach and clinical decision-making responsibility | Critical (4) | Low (2) | 8 | Tolerable | YES | ICD misclassification can mislead clinicians toward incorrect diagnoses, particularly for serious conditions like melanoma potentially causing delayed cancer diagnosis, inappropriate treatment, and patient harm per ISO 14971 harm assessment and MDR GSPR clinical benefit-risk requirements. | R-SKK |
| 23 | AI Risk | AI-RISK-023 | Clinical Model Failure: Visual Sign Severity Misquantification Leading to Incorrect Clinical Assessment | Visual sign quantification models (erythema RMAE ≤14%, desquamation RMAE ≤17%, induration RMAE ≤36%, pustule RMAE ≤30%, crusting/xerosis/swelling/oozing/excoriation/lichenification RMAE ≤20%) significantly over- or under-estimate severity beyond acceptable thresholds, providing inaccurate quantitative data on clinical signs to healthcare professionals. | Inaccurate visual sign severity data (e.g., erythema intensity underestimated or overestimated) may mislead healthcare professionals in their clinical assessment, potentially affecting treatment decisions made by the clinician based on the quantitative data provided. | R-TF-028-001: Each visual sign model specifies RMAE thresholds—erythema ≤14%, desquamation ≤17%, induration ≤36%, pustule ≤30%, crusting/xerosis/swelling/oozing/excoriation/lichenification ≤20%. All require performance superior to inter-observer variability with 95% CI. | Moderate (3) | Moderate (3) | 9 | Tolerable | Each visual sign model validated to specific RMAE thresholds per R-TF-028-001, demonstrated superior to typical inter-observer variability among experts. Multiple independent visual sign assessments provide redundancy—single model error does not affect other visual sign outputs. Performance validation against multi-expert consensus (minimum 3 dermatologists) ensures robust reference standard per R-TF-028-004. Visual sign severity data presented as quantitative data to support (not replace) clinical assessment—clinician retains full decision authority per intended use. Model calibration using temperature scaling ensures output probability distributions reflect true confidence per R-TF-028-002. Clinical validation studies assess correlation between automated visual sign quantification and dermatologist assessments per R-TF-015-001. Performance monitoring in post-market surveillance identifies systematic bias patterns via user feedback per GP-007. User guidance in IFU recommends clinical correlation and professional judgment for all treatment decisions | Moderate (3) | Low (2) | 6 | Acceptable | YES | Visual sign severity misquantification provides inaccurate quantitative data that may mislead healthcare professionals in clinical assessment, potentially affecting treatment decisions per ISO 14971 harm analysis. | R-SKK |
| 24 | AI Risk | AI-RISK-024 | Non-Clinical Model Failure: Domain Validation Error Routing Non-Skin Images to Clinical Analysis | Domain Validation non-clinical model (Lightweight Vision Transformer) incorrectly classifies non-skin images (text documents, random objects, non-skin body parts, internal organ images) as skin 'clinical' or 'dermoscopic' domain images, allowing them to proceed through the quality gates to clinical diagnostic models (ICD, severity, wound assessment). | Clinical models (ICD Category Distribution, Visual Sign Quantification, Wound Assessment) process inappropriate inputs not within their operational domain, producing meaningless outputs (random ICD probabilities, nonsensical severity scores) that could mislead clinicians if not recognized as invalid. | R-TF-028-001 Section 'Domain Validation': Non-clinical model using Lightweight Vision Transformer for three-class classification (clinical skin, dermoscopic skin, non-skin). Performance thresholds: overall accuracy ≥95%, non-skin precision ≥0.95, non-skin recall ≥0.90. Model serves as critical gateway in processing pipeline. | Moderate (3) | Low (2) | 6 | Acceptable | Domain Validation model achieves high performance thresholds per R-TF-028-001: overall accuracy ≥95%, non-skin precision ≥0.95 (high specificity for rejecting non-skin), non-skin recall ≥0.90. Conservative decision threshold favoring rejection of ambiguous inputs—prioritize specificity for skin acceptance over sensitivity. Multi-stage quality gates: Domain Validation → DIQA quality assessment → clinical analysis; multiple opportunities to reject inappropriate inputs. API response clearly indicates domain validation failure with appropriate error code when image rejected. Clinical models include additional confidence thresholds and sanity checks that may flag unusual input characteristics. User guidance in IFU specifies appropriate image types (clinical dermatological photographs, dermoscopic images) with examples. Post-market surveillance monitors domain validation failure patterns and user-reported inappropriate acceptances via GP-007. Logging and monitoring of domain validation decisions enables detection of systematic edge cases for model improvement | Moderate (3) | Very low (1) | 3 | Acceptable | YES | Domain validation errors allowing non-skin images to clinical analysis produce meaningless outputs that could mislead clinicians if not recognized as invalid per ISO 14971 and intended use requirements. | R-SKK, R-VL1 |
| 25 | AI Risk | AI-RISK-025 | Non-Clinical Model Failure: DIQA Incorrectly Accepts Poor Quality Images | Dermatology Image Quality Assessment (DIQA) non-clinical model (EfficientNet-based) assigns acceptable quality scores (≥6 on 0-10 scale) to poor-quality images (blurry, poorly lit, motion artifact, obstructed lesions, overexposed) allowing them to proceed to clinical analysis models that require diagnostic-quality inputs. | Clinical models (ICD Category Distribution, Visual Sign Quantification, Wound Assessment) analyze low-quality images with insufficient diagnostic information, resulting in unreliable ICD probability distributions, inaccurate severity scores (erythema intensity on overexposed image), and compromised clinical decision support. | R-TF-028-001 Section 'Dermatology Image Quality Assessment': EfficientNet-based non-clinical model with multi-dimensional quality assessment (focus, lighting, framing, artifacts, resolution). Performance thresholds: binary accept/reject accuracy ≥90%, sensitivity ≥85%, specificity ≥85%, Pearson correlation ≥0.80. Clinical threshold: score ≥6 for analysis. | Moderate (3) | Moderate (3) | 9 | Tolerable | DIQA model validated to performance thresholds per R-TF-028-001: binary accuracy ≥90%, sensitivity ≥85% (reject detection), specificity ≥85% (accept detection), Pearson correlation ≥0.80 with expert quality assessment. Multi-dimensional quality assessment evaluates: focus/sharpness, lighting adequacy, proper framing, absence of artifacts, sufficient resolution. Conservative acceptance threshold (≥6 on 0-10 scale) ensures only clearly acceptable diagnostic-quality images proceed to clinical analysis. Real-time quality score feedback via API response enables client applications to prompt immediate image retake for poor-quality submissions. User guidance in IFU on optimal imaging practices: lighting, distance, focus, avoiding motion blur. Clinical models may exhibit graceful performance degradation on borderline-quality images while maintaining safety margins. Post-market surveillance monitors correlation between DIQA scores and clinical model performance via GP-007 feedback mechanisms. User feedback mechanism allows clinicians to flag cases where quality assessment appeared inappropriate for model improvement. Periodic DIQA model re-validation on contemporary device camera characteristics and emerging image quality distributions | Moderate (3) | Low (2) | 6 | Acceptable | YES | DIQA accepting poor-quality images allows clinical models to analyze inputs with insufficient diagnostic information, resulting in unreliable ICD distributions and severity scores that compromise clinical decision support per ISO 14971. | R-SKK, R-VL1 |
**Risk 26 (AI-RISK-026): Clinical Model Failure: Binary Indicator False Negative for Malignancy/Urgent Referral**

- **Type:** AI Risk
- **Hazard:** Binary indicator models (particularly 'malignant', 'pre-malignant', 'urgent referral ≤48h', and 'high-priority referral ≤2 weeks') fail to flag high-risk lesions requiring immediate specialist evaluation, providing false reassurance when malignancy or urgent pathology is present.
- **Effect:** Delayed diagnosis of malignancy (melanoma, squamous cell carcinoma, basal cell carcinoma) or failure to expedite urgent referrals for rapidly progressing lesions, leading to disease progression, potential metastasis, and significantly worse patient outcomes.
- **Source:** R-TF-028-001, Section 'Binary Indicators': six indicators are defined (malignant, pre-malignant, associated with malignancy, pigmented lesion, urgent referral ≤48h, high-priority referral ≤2 weeks). Each requires AUC ≥0.80 with 95% CI. A mapping matrix M_ij aggregates ICD probabilities to indicators.
- **Initial risk:** Severity: Critical (4); Likelihood: Moderate (3); RPN: 12; Classification: Tolerable
- **Risk control measures:**
  - Binary indicators validated to AUC ≥0.80 per R-TF-028-001, ensuring good discriminative performance across all six indicators.
  - Sensitivity optimization for critical safety indicators (malignant, urgent referral) during training to minimize false negatives, even at the cost of some false positives.
  - Multiple redundant binary indicators provide overlapping coverage: malignant, pre-malignant, associated with malignancy, pigmented lesion, urgent referral, and high-priority referral.
  - Dermatologist-validated mapping matrix M_ij, documented in R-TF-028-001, ensures appropriate ICD-11 categories contribute to each safety indicator.
  - Intended use clearly positions the device as decision support, not an autonomous diagnostic system; the clinician retains diagnostic responsibility per the IFU.
  - ICD category distribution provides an independent information stream: high-risk diagnoses (melanoma, SCC, BCC) in the top-5 may prompt clinical suspicion even when a binary indicator is below threshold.
  - Clinical validation (R-TF-015-001) includes assessment of missed high-risk cases and impact on diagnostic workflow safety.
  - User warnings in the IFU emphasize that negative binary indicators do not rule out serious disease; clinical judgment remains paramount and biopsy is indicated for any suspicious lesion.
  - Post-market surveillance specifically monitors malignancy detection performance and urgent referral appropriateness, with serious adverse event reporting per GP-007.
  - Periodic re-validation on emerging malignancy presentations and evolving referral guidelines (NICE, EADV).
- **Residual risk:** Severity: Critical (4); Likelihood: Low (2); RPN: 8; Classification: Tolerable
- **Transferred to device safety risk:** YES. False negatives on malignancy/urgent referral indicators provide false reassurance, potentially leading to delayed diagnosis of melanoma and other skin cancers, disease progression, metastasis, and significantly worse patient outcomes per the ISO 14971 critical harm assessment and MDR GSPR clinical benefit-risk requirements.
- **Traceability:** R-SKK
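The aggregation step named above lends itself to a worked example. The sketch below computes each indicator score as q_j = Σ_i M_ij · p_i over a toy three-category distribution; the aggregation form follows R-TF-028-001, while the category set, matrix entries, and decision threshold are illustrative assumptions.

```python
import numpy as np

# Worked example of the mapping-matrix aggregation from R-TF-028-001:
# each binary indicator j is derived from the ICD-11 category
# distribution p as q_j = sum_i M[i, j] * p[i]. The tiny category set,
# matrix entries, and decision threshold are illustrative assumptions.

icd_categories = ["melanoma", "basal_cell_carcinoma", "benign_nevus"]
indicators = ["malignant", "urgent_referral"]

# M[i, j] = 1 when ICD category i contributes to indicator j
# (dermatologist-validated in the real system).
M = np.array([
    [1, 1],  # melanoma contributes to both indicators
    [1, 0],  # basal cell carcinoma contributes to 'malignant' only
    [0, 0],  # benign nevus contributes to neither
])

p = np.array([0.15, 0.10, 0.75])  # example ICD probability distribution
q = p @ M                          # indicator scores: [0.25, 0.15]

THRESHOLD = 0.20  # assumed operating point tuned on validation data
for name, value in zip(indicators, q):
    flag = "FLAG" if value >= THRESHOLD else "no flag"
    print(f"{name}: {value:.2f} -> {flag}")
```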
**Risk 27 (AI-RISK-027): Environmental Drift: Model Performance Degradation in Telemedicine vs. Clinical Settings**

- **Type:** AI Risk
- **Hazard:** Models validated primarily on professional clinical photography exhibit degraded performance when used with patient self-captured images in telemedicine scenarios, due to differences in imaging quality, framing, lighting, and technique.
- **Effect:** Reduced diagnostic accuracy and unreliable severity assessments in telemedicine applications, limiting device utility and potentially causing misdiagnosis in remote care settings.
- **Source:** R-TF-028-001, Section 'Integration and Environment': models must handle variability in acquisition. The intended use includes both professional and patient-captured images. Validation requirements include diverse imaging contexts.
- **Initial risk:** Severity: Moderate (3); Likelihood: Moderate (3); RPN: 9; Classification: Tolerable
- **Risk control measures:**
  - Training data includes both professional clinical photography and patient self-captured images to ensure robustness.
  - DIQA model provides a quality gate for both professional and patient-captured images with consistent thresholds.
  - Real-time quality feedback during image capture helps patients achieve acceptable quality in telemedicine settings.
  - Validation stratified by acquisition context (professional vs. patient-captured, clinical vs. telemedicine), as sketched below.
  - User guidance and training specific to telemedicine image capture (patient education materials).
  - Post-market surveillance monitors performance separately for telemedicine vs. in-clinic usage.
  - Graceful degradation: models provide confidence indicators reflecting image quality and acquisition context.
  - Clinical validation includes telemedicine use cases with representative patient-captured images.
- **Residual risk:** Severity: Moderate (3); Likelihood: Low (2); RPN: 6; Classification: Acceptable
- **Transferred to device safety risk:** YES. Environmental drift causes performance degradation in telemedicine vs. clinical settings, reducing diagnostic accuracy and reliability in remote care per ISO 14971 and MDR GSPR requirements for intended use validation.
- **Traceability:** R-SKK, R-VL1
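The stratified validation control referenced above can be illustrated with a short monitoring sketch: the same metric is computed per acquisition context so telemedicine drift remains visible instead of being averaged away. The record schema and the `stratified_auc` helper are hypothetical, and scikit-learn is an assumed dependency.

```python
from collections import defaultdict
from sklearn.metrics import roc_auc_score  # assumed dependency

# Sketch of context-stratified performance monitoring. The record schema
# ('context', 'label', 'score') is an illustrative assumption.

def stratified_auc(records: list[dict]) -> dict[str, float]:
    """Compute AUC separately per acquisition context.

    Each record needs: 'context' (e.g. 'professional' or
    'patient_captured'), 'label' (0/1 ground truth), and 'score'
    (model output). Each stratum must contain both classes.
    """
    by_context: dict[str, tuple[list, list]] = defaultdict(lambda: ([], []))
    for r in records:
        labels, scores = by_context[r["context"]]
        labels.append(r["label"])
        scores.append(r["score"])
    return {ctx: roc_auc_score(labels, scores)
            for ctx, (labels, scores) in by_context.items()}
```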
**Risk 28 (AI-RISK-028): Multi-Model Pipeline Failure: Cascading Errors Across Dependent Models**

- **Type:** AI Risk
- **Hazard:** Errors in upstream non-clinical models (domain validation, DIQA, skin segmentation) propagate to downstream clinical models, compounding errors and producing highly unreliable clinical outputs.
- **Effect:** Systematic failures in the AI pipeline produce completely erroneous diagnostic suggestions or severity assessments when multiple models fail sequentially.
- **Source:** R-TF-028-001: multiple models operate in a pipeline (domain validation → DIQA → skin segmentation → clinical models). Integration requirements specify compatibility, but error propagation must be managed.
- **Initial risk:** Severity: Critical (4); Likelihood: Low (2); RPN: 8; Classification: Acceptable
- **Risk control measures:**
  - Each model in the pipeline is validated independently against high performance thresholds to minimize individual failure probability.
  - Quality gates at multiple stages (domain validation, DIQA) prevent propagation of clearly inappropriate inputs.
  - Confidence scoring at each pipeline stage allows downstream models to account for upstream uncertainty.
  - End-to-end integration testing validates full pipeline performance, not just individual model performance.
  - Graceful degradation: pipeline failures result in output suppression or low-confidence flagging rather than erroneous high-confidence outputs (see the sketch after this entry).
  - Monitoring and logging of pipeline stage outputs enables detection of systematic failure patterns.
  - Clinical validation assesses real-world pipeline performance under diverse conditions.
  - User interface indicates when outputs have low confidence due to quality or processing issues.
  - Post-market surveillance monitors pipeline failure modes and cascading error patterns.
- **Residual risk:** Severity: Critical (4); Likelihood: Very low (1); RPN: 4; Classification: Acceptable
- **Transferred to device safety risk:** YES. Cascading errors in the multi-model pipeline compound to produce highly unreliable clinical outputs when multiple models fail sequentially, potentially causing systematic failures per ISO 14971.
- **Traceability:** R-SKK, R-VL1
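The fail-fast behavior of the quality gates can be sketched as follows; the stage interface and the `run_pipeline` helper are hypothetical, and only the stage ordering mirrors R-TF-028-001.

```python
from typing import Any, Callable

# Minimal sketch of fail-fast gating in a staged pipeline: an upstream
# rejection suppresses all downstream clinical output instead of letting
# a bad input cascade into a confident-looking wrong answer.

Stage = tuple[str, Callable[[Any], tuple[bool, Any]]]

def run_pipeline(image: Any, stages: list[Stage]) -> dict:
    """Run gated stages in order and stop at the first rejection.

    Each stage returns (ok, payload). Every stage decision is logged so
    systematic failure patterns can be detected in monitoring.
    """
    log = []
    payload = image
    for name, stage in stages:
        ok, payload = stage(payload)
        log.append({"stage": name, "ok": ok})
        if not ok:
            # Explicit failure record instead of an unreliable output.
            return {"status": "rejected", "failed_stage": name, "log": log}
    return {"status": "ok", "result": payload, "log": log}

# Example wiring (stage functions are placeholders):
# result = run_pipeline(image, [
#     ("domain_validation", domain_stage),
#     ("diqa", diqa_stage),
#     ("skin_segmentation", segmentation_stage),
#     ("clinical_models", clinical_stage),
# ])
```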
**Risk 29 (AI-RISK-029): Regulatory Non-Compliance: AI Models Not Meeting MDR/RDC/EU AI Act Requirements**

- **Type:** AI Risk
- **Hazard:** AI models do not meet regulatory requirements for clinical validation, performance documentation, risk management, transparency, or bias prevention under EU MDR 2017/745 (Class IIb), Brazilian RDC 751/2022 (Class II), or EU AI Act (high-risk AI system) requirements, jeopardizing regulatory approval and market access.
- **Effect:** Regulatory rejection by BSI or ANVISA, inability to market the device in the EU/Brazil, delays in patient access to the technology, significant financial losses, and potential legal consequences for non-compliant device commercialization.
- **Source:** R-TF-028-001: device classified as Class IIb (MDR 2017/745 Rule 11) and Class II (RDC 751/2022). AI models are integral to the intended use per R-TF-Device Description. EU AI Act: high-risk AI system (medical device AI). Full technical documentation required per MDR Annex II and MDCG 2020-1 (Clinical Evaluation of SaMD).
- **Initial risk:** Severity: Critical (4); Likelihood: Low (2); RPN: 8; Classification: Acceptable
- **Risk control measures:**
  - Comprehensive AI/ML documentation suite per MDR Annex II and MDCG 2020-1: R-TF-028-001 (AI Description), R-TF-028-002 (Development Plan), R-TF-028-003 (Data Collection Instructions), R-TF-028-004 (Data Annotation Instructions), R-TF-028-005 (Development Report), R-TF-028-006 (Release Report), R-TF-028-009 (Design Checks), R-TF-028-011 (AI Risk Assessment).
  - Clinical validation studies designed to meet MDR/RDC requirements for a Class IIb device with AI, per the R-TF-015-001 Clinical Evaluation Plan and MDCG 2020-1 guidance.
  - AI/ML Risk Assessment (this document) integrated with overall device risk management per ISO 14971:2019, with traceability to the R-TF-013-001 Risk Management Plan.
  - Transparency and explainability features per EU AI Act Article 13: intended use clarity, performance disclosure, and limitation documentation in the IFU.
  - Bias and fairness assessment documented for Fitzpatrick skin types I-VI populations per EU AI Act Article 10 (data governance) and MDR GSPR 22 (non-discrimination).
  - Post-market surveillance plan includes AI-specific performance monitoring per GP-007 and adverse event reporting per MDR Article 87.
  - Quality management system (ISO 13485:2016 certification in progress with BSI) encompasses AI development per GP-028.
  - Cybersecurity risk management addressing AI-specific threats per IEC 81001-5-1 and MDCG 2019-16.
  - Technical documentation structured to address all MDR Annex II requirements, RDC 751/2022 Chapter III, and EU AI Act Annex IV.
  - Regulatory strategy includes proactive engagement with BSI (NB 2797) and ANVISA for AI-specific guidance.
- **Residual risk:** Severity: Critical (4); Likelihood: Very low (1); RPN: 4; Classification: Acceptable
- **Transferred to device safety risk:** NO. Regulatory non-compliance is primarily a market access and legal risk rather than a direct patient safety risk. However, regulatory requirements (MDR GSPR, EU AI Act) are designed to ensure patient safety, so compliance demonstrates safety assurance per ISO 14971 and the MDR.
- **Traceability:** None

Summary of Risk Assessment​

Risk Distribution​

After implementation of all mitigation measures:

| Risk Class | Count | Percentage |
| --- | --- | --- |
| Acceptable (RPN ≤4) | 15 | 52% |
| Tolerable (RPN 5-9) | 14 | 48% |
| Unacceptable (RPN ≥10) | 0 | 0% |
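For clarity, the sketch below shows how these classes follow from the document's scales: RPN is the product of the severity and likelihood scores, classified against the boundaries in the table above. It is a toy helper for illustration, not part of the device software.

```python
# Toy helper illustrating the risk classification used in the table above:
# RPN = severity x likelihood, with class boundaries <=4 Acceptable,
# 5-9 Tolerable, >=10 Unacceptable.

def classify_rpn(severity: int, likelihood: int) -> tuple[int, str]:
    rpn = severity * likelihood
    if rpn <= 4:
        return rpn, "Acceptable"
    if rpn <= 9:
        return rpn, "Tolerable"
    return rpn, "Unacceptable"

# AI-RISK-026 residual: Critical (4) x Low (2) -> RPN 8, Tolerable
print(classify_rpn(4, 2))  # (8, 'Tolerable')
```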

Critical Risks Requiring Ongoing Monitoring​

The following risks remain Tolerable (not Acceptable) after mitigation and require enhanced post-market surveillance:

  1. AI-RISK-016 (RPN 8): Model robustness to imaging variability
  2. AI-RISK-022 (RPN 8): ICD category misclassification
  3. AI-RISK-026 (RPN 8): Binary indicator false negatives for malignancy

These risks are monitored through:

  • Post-market clinical follow-up (PMCF) per GP-007
  • User feedback analysis and complaint handling
  • Periodic safety update reports (PSUR)
  • Annual AI model performance review

Residual Risk Acceptability​

All identified AI risks have been reduced to Acceptable or Tolerable levels through the implemented control measures. The overall residual risk is acceptable when weighed against the clinical benefits demonstrated in the Clinical Evaluation Report (R-TF-015-002):

  • Improved diagnostic accuracy: AI-assisted diagnosis achieves higher Top-5 accuracy than physician assessment alone
  • Reduced inter-observer variability: Objective severity scoring reduces measurement variability
  • Enhanced clinical workflow: Decision support improves efficiency without replacing clinical judgment
  • Patient access: Enables remote dermatological assessment expanding access to specialist expertise

Integration with Device Risk Management​

This AI Risk Assessment is integrated with the overall device risk management per ISO 14971:

  • Risks transferred to safety: 21 of 29 AI risks (72%) are linked to device-level safety risks in R-TF-013-002
  • Residual risk evaluation: Combined with non-AI device risks for overall benefit-risk determination
  • Change management: AI model updates follow Predefined Change Control Plan (PCCP) with risk re-evaluation

References​

| Document | Title |
| --- | --- |
| R-TF-013-001 | Risk Management Plan |
| R-TF-013-002 | Risk Assessment |
| R-TF-028-001 | AI Description |
| R-TF-028-002 | AI Development Plan |
| R-TF-028-005 | AI Development Report |
| R-TF-015-001 | Clinical Evaluation Plan |
| R-TF-015-002 | Clinical Evaluation Report |
| GP-007 | Post-Market Surveillance |
| GP-028 | AI Development Procedure |

Signature meaning

The signatures for the approval process of this document can be found in the verified commits at the repository for the QMS. As a reference, the team members who are expected to participate in this document and their roles in the approval process, as defined in Annex I Responsibility Matrix of the GP-001, are:

  • Author: JD-009
  • Reviewer: JD-009
  • Approver: JD-005