
Response

The following addresses each of the four observations raised in relation to lines 1 (AI-RISK-001), 16 (AI-RISK-016), and 21 (AI-RISK-021) from R-TF-028-011 (AI Risk Assessment).

1. Severity justification: why severity 4 (not 5)

The maximum severity assigned to AI-RISK-001 and AI-RISK-016 is 4 ("Critical — Delayed serious entity identification") on the AI risk scale. We have updated R-TF-028-011 to document the explicit rationale for this rating. The justification rests on three regulatory and clinical grounds:

  • ISO 14971:2019 Annex C guidance on clinical decision support: The device is a clinical decision support system (CDSS). The healthcare professional always makes the final diagnostic decision, and the device output is one input among many (patient history, physical examination, dermoscopy, clinical experience). For a device error to result in patient death (severity 5), all of the following independent barriers would need to fail simultaneously: the device misclassifies the condition; the six binary safety indicators (malignant, pre-malignant, associated with malignancy, pigmented lesion, urgent referral, high-priority referral) fail to flag the presentation; the clinician does not exercise independent judgment; and the applicable standard of care (biopsy for suspected malignancy) is not followed.

  • MDCG 2020-1 Valid Clinical Association: The device's outputs are scientifically associated with actual dermatological conditions, established through systematic literature review (R-TF-015-011 State of the Art). This means device errors are errors in probabilistic ranking, not random failures. The clinical barriers to harm are corroborated by the device's own performance data: Top-3 sensitivity of 0.9032 for melanoma and AUC of 0.97 for overall malignancy detection (R-TF-015-003, Summary of Clinical Benefits Achievement). A device with this discrimination performance does not routinely produce rankings where the binary safety indicators simultaneously fail.

  • MEDDEV 2.7.1 Rev 4 Annex A7.3 performance data: The device presents the physician with a prioritised short-list (typically Top-5 categories). Even if the correct diagnosis is not ranked first, it is typically within the short-list. The probability distribution architecture ensures that high-risk presentations are flagged through both the ICD ranking and the independent binary indicators.

This severity assessment accounts for foreseeable misuse, including over-reliance on the device by less experienced clinicians. Even under such conditions, severity 4 holds: the binary safety indicators and the probability distribution output architecture provide automated safety barriers that function independently of the clinician's judgment quality. Furthermore, we have documented in R-TF-028-011 the risk impact of a hypothetical change to severity 5: AI-RISK-001 residual RPN would change from 4 to 5 (still Acceptable), and AI-RISK-016 residual RPN would change from 8 to 10 (Tolerable, no change in acceptability category). This sensitivity analysis confirms that the severity 4 rating does not mask an unacceptable risk — the conclusion holds regardless of the severity assignment.
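The sensitivity analysis above can be reproduced with a short sketch. This assumes the conventional RPN formula (severity × likelihood) and uses hypothetical acceptability thresholds chosen only so that the categories match the values stated above; the actual thresholds are those defined in R-TF-013-003.

```python
# Sensitivity check of the severity-4 vs severity-5 rating (illustrative sketch).
# Assumptions NOT taken from R-TF-013-003: RPN = severity x likelihood, and the
# two acceptability thresholds below are hypothetical placeholders.

ACCEPTABLE_MAX = 6   # hypothetical upper bound of the "Acceptable" band
TOLERABLE_MAX = 12   # hypothetical upper bound of the "Tolerable" band

def rpn(severity: int, likelihood: int) -> int:
    return severity * likelihood

def acceptability(rpn_value: int) -> str:
    if rpn_value <= ACCEPTABLE_MAX:
        return "Acceptable"
    if rpn_value <= TOLERABLE_MAX:
        return "Tolerable"
    return "Unacceptable"

# Residual likelihoods documented in R-TF-028-011.
residual_likelihood = {"AI-RISK-001": 1, "AI-RISK-016": 2}

for risk_id, likelihood in residual_likelihood.items():
    for severity in (4, 5):
        value = rpn(severity, likelihood)
        print(risk_id, severity, value, acceptability(value))
```

Under these assumptions, AI-RISK-001 yields RPN 4 or 5 (both Acceptable) and AI-RISK-016 yields RPN 8 or 10 (both Tolerable), confirming that the acceptability category is insensitive to the severity 4 vs 5 choice.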

2. Design mitigations and verification of effectiveness

We have updated R-TF-028-011 to explicitly categorise the mitigation measures for each risk into three types: design mitigations (runtime features in the deployed device), process mitigations (development and validation procedures), and information mitigations (IFU content and user training).

For AI-RISK-001 (dataset not representative of intended use population):

  • Design mitigations: The six binary safety indicators operate independently of the ICD classification and are derived from a dermatologist-defined mapping matrix (R-TF-028-004). These indicators flag high-risk presentations (malignant, pre-malignant, urgent referral) regardless of whether the underlying probability distribution is affected by dataset representativity limitations. The probability distribution output format is classified as an information mitigation (see below) because it communicates uncertainty to the clinician rather than actively blocking or filtering the harm pathway.
  • Process mitigations: Multi-source data collection strategy, stratified sampling across Fitzpatrick phototypes, bias analysis across demographic subgroups (R-TF-028-005 AI Development Report).
  • Information mitigations: The normalised probability distribution output format (presenting a ranked list of possibilities rather than a single assertion) communicates inherent uncertainty and requires the clinician to consider multiple differential possibilities before acting. The IFU (Important Safety Information, § Population and performance variability) states that performance may vary across skin phototypes and demographic subgroups, and instructs the clinician to exercise particular judgment for underrepresented populations.
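To illustrate the independence of the binary safety indicators from the ICD ranking, the following sketch shows how indicators could be derived from a probability distribution via a class-to-indicator mapping matrix. The category names, mapping entries, and 0.5 threshold are hypothetical placeholders; the actual matrix is the dermatologist-defined one in R-TF-028-004.

```python
# Illustrative sketch: deriving binary safety indicators from an ICD probability
# distribution through a class-to-indicator mapping matrix. All mapping entries
# and the 0.5 threshold are hypothetical, not taken from R-TF-028-004.

INDICATORS = ("malignant", "pre-malignant", "associated with malignancy",
              "pigmented lesion", "urgent referral", "high-priority referral")

# Hypothetical mapping: ICD category -> indicators it contributes mass to.
MAPPING = {
    "melanoma":          {"malignant", "pigmented lesion", "urgent referral"},
    "actinic keratosis": {"pre-malignant"},
    "benign nevus":      {"pigmented lesion"},
    "psoriasis":         set(),
}

def derive_indicators(distribution: dict, threshold: float = 0.5) -> dict:
    """Sum the probability mass mapped to each indicator and flag the indicator
    when the aggregate mass reaches the threshold."""
    mass = {ind: 0.0 for ind in INDICATORS}
    for category, prob in distribution.items():
        for ind in MAPPING.get(category, ()):
            mass[ind] += prob
    return {ind: mass[ind] >= threshold for ind in INDICATORS}

# A distribution dominated by a malignant class raises the safety flags even if
# the clinician only glances at the top-ranked ICD category.
flags = derive_indicators({"melanoma": 0.6, "benign nevus": 0.3, "psoriasis": 0.1})
```

Because the flags aggregate mass across all mapped classes, a high-risk presentation can be flagged even when no single malignant category ranks first, which is the barrier property relied on in the severity justification above.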

For AI-RISK-016 (model robustness failures from image acquisition variability):

  • Design mitigation: The Dermatology Image Quality Assessment (DIQA) model provides a runtime quality gate that evaluates each input image and returns a quality score with dimension sub-scores (focus, lighting, framing, resolution). This mitigation is implemented as SRS requirement SRS-Y5W (derived from PRS-7XK) and verified by test cases C50, C62, C68, C73, C77, C106, C329, C370, C371, C454, and C455 in R-TF-012-034 Software Test Description. All test cases passed.
  • Process mitigations: Data augmentation during training, external validation on independent datasets.
  • Information mitigations: The IFU (How to take pictures) provides detailed image acquisition guidance covering lighting, distance, angle, and focus. The IFU (Precautions, risk #9 and #30) instructs users to review image quality information returned by the device.
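The runtime quality-gate behaviour described for AI-RISK-016 can be sketched as follows. The sub-score names mirror the dimensions listed above (focus, lighting, framing, resolution), but the aggregation rule (minimum over dimensions) and the 0.6 pass threshold are assumptions for illustration, not the verified SRS-Y5W implementation.

```python
# Illustrative sketch of a runtime image-quality gate in the spirit of DIQA.
# The min() aggregation and 0.6 threshold are hypothetical assumptions.

from dataclasses import dataclass

@dataclass
class QualityResult:
    score: float               # overall quality in [0, 1]
    sub_scores: dict           # per-dimension scores
    passed: bool               # whether the image may proceed to the classifier

def quality_gate(sub_scores: dict, threshold: float = 0.6) -> QualityResult:
    """Aggregate dimension sub-scores (here: the minimum, so a single bad
    dimension fails the image) and gate the image before inference."""
    overall = min(sub_scores.values())
    return QualityResult(score=overall, sub_scores=sub_scores,
                         passed=overall >= threshold)

# A blurry image fails the gate even when lighting and framing are acceptable.
blurry = quality_gate({"focus": 0.2, "lighting": 0.9,
                       "framing": 0.8, "resolution": 0.95})
```

Gating on the weakest dimension is a deliberately conservative choice for a sketch like this: it guarantees that an image cannot reach the classifier with any single acquisition dimension below the threshold.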

For AI-RISK-021 (usability — outputs not interpretable):

  • Design mitigations: Explainability media (bounding boxes, segmentation masks) allow the clinician to verify the basis of quantitative measurements (SRS-0AB, SRS-K7M; verified by test cases C256 and C265 in R-TF-012-034). The probability distribution output format inherently communicates that results are probabilistic, not definitive.
  • Information mitigations: The IFU (Important Safety Information) contains a prominent non-diagnostic disclaimer: "The device output is not a clinical diagnosis." The IFU (Endpoint specification) explains entropy as a confidence measure and describes how binary indicators are derived.
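The entropy-as-confidence concept referenced in the IFU Endpoint specification can be made concrete with a short sketch. Normalising Shannon entropy by log(n) is an assumption made here so the value lands in [0, 1] regardless of the number of categories; the device's actual entropy presentation is as documented in the IFU.

```python
# Illustrative sketch: Shannon entropy of the probability distribution output as
# a confidence measure. Lower entropy = more concentrated distribution = higher
# confidence. Normalisation by log(n) is an assumption for this sketch.

import math

def normalised_entropy(probs: list) -> float:
    n = len(probs)
    h = -sum(p * math.log(p) for p in probs if p > 0.0)
    return h / math.log(n)

confident = normalised_entropy([0.9, 0.05, 0.03, 0.02])   # mass on one class
uncertain = normalised_entropy([0.25, 0.25, 0.25, 0.25])  # mass spread evenly
```

A confident ranking yields a value well below 1, while a uniform distribution yields exactly 1, which is the visual low-vs-high confidence contrast the IFU illustrates.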

The traceability from risk to mitigation to SRS requirement to verification test case follows the approach established in the corrective action for Technical Review N3 (risk mitigation traceability), which was applied systematically to all 62 risks in R-TF-013-002.

3. Occurrence rate estimation basis

We have updated R-TF-028-011 to document the basis for each likelihood estimate using a three-step chain: data source, appraisal method, and derived value.

For AI-RISK-001 (residual likelihood: Very low, 1):

  • Data source: Post-market surveillance data from the equivalent legacy device (R-TF-007-003 PSUR): over 4,500 reports generated across 21 contracts, with 7 non-serious incidents reported during the surveillance period. None of the 7 incidents involved diagnostic error or harm attributable to dataset representativity. Clinical validation studies (BI_2024, PH_2024, SAN_2024, IDEI_2023) included patients across Fitzpatrick phototypes I through IV with no systematic performance degradation observed across subgroups.
  • Appraisal method: PMS data appraised using IMDRF MDCE WG/N56 Appendix F quality criteria as endorsed by MDCG 2020-6 Appendix I.
  • Derived value: The occurrence estimate integrates both data sources: the PMS incident rate of 0.16% over 4+ years with zero diagnostic-error-related incidents provides the real-world baseline, while the clinical validation studies provide independent confirmation that no systematic performance degradation occurs across subgroups. The combined evidence supports a residual likelihood of "Very low" (1) on the AI risk scale.
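The 0.16% incident rate cited above follows directly from the PMS figures. The report count is stated as "over 4,500", so 4,500 is used here as the conservative denominator.

```python
# Arithmetic behind the 0.16% PMS incident rate: 7 non-serious incidents over
# roughly 4,500 reports (4,500 used as the conservative denominator, since the
# source states "over 4,500").

incidents = 7
reports = 4_500

rate = incidents / reports           # ~0.001556
rate_pct = round(rate * 100, 2)      # 0.16 (%)
```

Using a larger, more accurate denominator would only lower the rate, so 0.16% is an upper bound on the real-world incident rate over the surveillance period.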

For AI-RISK-016 (residual likelihood: Low, 2):

  • Data source: PMS data (as above, zero incidents attributable to image quality failure). DIQA quality gate verification (SRS-Y5W, 11 test cases — all passed). Clinical validation studies used images captured under real-world conditions with varying acquisition quality.
  • Appraisal method: Same as AI-RISK-001.
  • Derived value: The DIQA quality gate provides a runtime barrier, but borderline images may pass and yield degraded performance. The residual likelihood of "Low" (2) reflects the residual possibility of borderline-quality images affecting output, mitigated but not eliminated by the quality gate.

For AI-RISK-021 (residual likelihood: Low, 2):

  • Data source: Summative usability evaluation (R-TF-025-007, October 2025, n=36). HCP Scenario 3 Q4 achieved 72.2% success on understanding that the device output is not a diagnosis; one use error and three close calls were observed in Scenario 3, while the simulated use scenarios achieved 100% task success.
  • Appraisal method: Assessed per IEC 62366-1:2015 §5.9 residual risk assessment methodology, documented in R-TF-025-007 §14.7.
  • Derived value: The 72.2% Q4 success rate and 1 observed use error support a residual likelihood of "Low" (2), reflecting the possibility that some users may not fully appreciate the non-diagnostic nature of the output. This is mitigated by the prominent IFU disclaimer, the probabilistic output format, and the requirement for clinical judgment in all use cases.

4. Residual risk communication in the IFU

We have updated R-TF-028-011 to map each residual risk to the specific IFU section where it is communicated to users.

For AI-RISK-001 (dataset representativity, residual severity 4, RPN 4):

  • IFU, Important Safety Information, § Population and performance variability: States that performance may vary across Fitzpatrick skin phototypes, age groups, and geographic populations. Instructs clinicians to exercise particular judgment for underrepresented populations (phototypes V-VI, paediatric, geriatric).
  • IFU, Important Safety Information, § Understanding the device output: States the device produces probabilistic outputs that represent a range of possibilities, not a single conclusion.

For AI-RISK-016 (image acquisition variability, residual severity 4, RPN 8):

  • IFU, How to take pictures: Provides detailed guidance on lighting (even illumination, avoid harsh shadows, use flash if needed), distance (10-30 cm), angle (perpendicular to skin surface), and focus (tap-to-focus). Includes a common issues table (motion blur, glare, shadows, poor background) with solutions.
  • IFU, Precautions, risk #9: States that image artefacts and resolution affect device performance and instructs users to review image quality information returned by the device.
  • IFU, Precautions, risk #30: Addresses inadequate lighting specifically.

For AI-RISK-021 (usability — outputs not interpretable, residual severity 3, RPN 6):

  • IFU, Important Safety Information, § The device does not provide a clinical diagnosis: Prominent warning that the device output is clinical decision support information, not a diagnosis.
  • IFU, Important Safety Information, § Understanding the device output: Explains that the device produces a probability distribution, cannot confirm the presence of a condition, and outputs a range of possibilities.
  • IFU, Endpoint specification, § Binary Indicators: Explains how the six binary safety indicators are derived from the probability distribution.
  • IFU, Endpoint specification, § Entropy: Explains entropy as a confidence measure, with visual examples of low vs. high confidence distributions.
  • IFU, Troubleshooting (Clinical): Provides interpretation guidance for severity measurements, clinical sign scores, and the Top-5 accuracy approach.

Cross-reference to R-TF-013-002

The three risks in R-TF-028-011 reviewed by BSI transfer to corresponding risks in the main safety risk register (R-TF-013-002), where they are assessed on the main safety severity scale (Critical = 5 = Death; Serious = 4 = Permanent impairment). The relationship is documented in R-TF-028-011. The two risk documents use different severity scales appropriate to their respective risk domains: the AI risk scale is calibrated to AI-specific failure modes (delayed identification, degraded performance), while the main safety scale is calibrated to direct patient harm outcomes. This is consistent with the risk management framework defined in R-TF-013-003 and ISO 14971:2019, which permits domain-specific risk assessment scales within an integrated risk management system.

Red-lined versions of R-TF-028-011 and R-TF-013-002 are provided as supplementary evidence.

All the information contained in this QMS is confidential. The recipient agrees not to transmit or reproduce the information, neither by himself nor by third parties, through whichever means, without obtaining the prior written permission of Legit.Health (AI Labs Group S.L.)