
Research and planning

Internal working document

This document is for internal use only. It contains analysis, gap identification, and response strategy for Item 7 of the BSI Clinical Review Round 1. It will not be included in the final response to BSI.

1. What BSI is asking

BSI references three specific rows from R-TF-028-011 (AI Risk Assessment) and raises four concerns:

The three risks:

| Line | ID | Risk | Initial severity | Residual severity | Residual RPN |
| --- | --- | --- | --- | --- | --- |
| 1 | AI-RISK-001 | Dataset Not Representative of Intended Use Population | Critical (4) | Critical (4) | 4 (Acceptable) |
| 16 | AI-RISK-016 | Model Robustness Failures: Sensitivity to Image Acquisition Variability | Critical (4) | Critical (4) | 8 (Tolerable) |
| 21 | AI-RISK-021 | Usability Issues: Model Outputs Not Interpretable by Clinical Users | Moderate (3) | Moderate (3) | 6 (Acceptable) |

BSI's four concerns:

  1. Severity justification: The maximum severity assigned is 4 (pre- and post-mitigation). Why could these risks not result in harms of severity 5 to the patient?
  2. Design mitigations: For lines 1 and 16, are there design mitigations (e.g., image quality rejection)? If so, how have these been verified as effective?
  3. Occurrence rate estimation: Are occurrence rates based on available data (PMS or literature)? This is unclear.
  4. Residual risk communication: Do residual risks remain for these risk lines? If so, where have these been communicated to users in the IFU?

This is an observation/request, not a deficiency finding. The regulatory basis is GSPRs 1–5, 8, and EN ISO 14971.

2. Two severity scales in play

BSI reviewed R-TF-028-011 (AI Risk Assessment), which uses an AI-specific severity scale. The main safety risk register (R-TF-013-002) uses a different scale defined in R-TF-013-003:

| Score | Main safety scale (R-TF-013-003) | AI risk scale (R-TF-028-011) |
| --- | --- | --- |
| 5 | Critical — Death | Catastrophic — Death or irreversible harm |
| 4 | Serious — Permanent impairment or irreversible injury | Critical — Delayed serious entity identification |
| 3 | Major — Injury requiring medical/surgical intervention | Moderate — Significant impact, recoverable |
| 2 | Minor — Temporary injury | Minor — Temporary |
| 1 | Negligible — Inconvenience | Negligible |

Both scales place "death" at severity 5. BSI is asking: why is severity capped at 4, not 5?

3. Severity justification analysis

The core argument

The device is a clinical decision support tool (CDSS), not a diagnostic device. It provides an interpretative distribution of probable ICD-11 categories and quantitative severity data to support (not replace) healthcare professional judgment. Several layers prevent a device error from reaching the patient as a harm of severity 5 (death):

  1. The clinician always makes the final diagnostic decision. The device output is one input among many (patient history, physical examination, dermoscopy, clinical experience). The IFU explicitly states the device is not intended for diagnosis.
  2. Top-5 presentation: The device presents the top 5 most probable ICD-11 categories, not a single diagnosis. Even if the correct diagnosis is not ranked #1, it is typically within the shortlist.
  3. Independent binary safety indicators: Six binary indicators (malignant, pre-malignant, associated with malignancy, pigmented lesion, urgent referral ≤48h, high-priority referral ≤2 weeks) operate independently of the ICD classification and flag high-risk lesions regardless of the specific ICD suggestion.
  4. Standard of care: Clinical guidelines require biopsy for suspected malignancy regardless of any CDSS output. The device does not alter the standard of care.

Why severity 4 (not 5) is defensible

A severity 5 rating requires a plausible direct causal chain from the device error to patient death. For a CDSS:

  • The device error (e.g., misclassification) is an initiating event, not a harm.
  • Between the device error and patient harm, there are multiple independent barriers: clinician judgment, standard of care protocols, binary safety indicators, follow-up consultations.
  • For death to occur, all of these barriers would need to fail simultaneously — the device misclassifies, the binary indicators fail to flag, the clinician does not exercise independent judgment, and standard of care is not followed.

This multi-barrier chain reduces the probability of reaching severity 5 but does not eliminate the theoretical possibility. BSI's question is whether the theoretical possibility should be reflected in the severity score.
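The multi-barrier argument can be made quantitative under an independence assumption. The sketch below is purely illustrative: the per-barrier failure probabilities are invented placeholders, not estimates from clinical data, and real barriers are unlikely to be fully independent.

```python
# Illustrative independence sketch of the multi-barrier argument. All failure
# probabilities are invented placeholders, not estimates from clinical data.
p_misclassification = 0.05   # device ranks correct diagnosis outside Top-5
p_indicators_miss = 0.05     # binary safety indicators fail to flag the lesion
p_clinician_defers = 0.10    # clinician does not exercise independent judgment
p_soc_not_followed = 0.05    # standard of care (e.g. biopsy) not followed

# For a severity-5 harm, every barrier must fail simultaneously; assuming
# independence, the pathway probability is the product of the failure rates.
p_severity_5_pathway = (
    p_misclassification * p_indicators_miss * p_clinician_defers * p_soc_not_followed
)
print(f"{p_severity_5_pathway:.2e}")  # -> 1.25e-05
```

Even with these deliberately pessimistic placeholder values, the combined pathway probability is orders of magnitude below any single barrier's failure rate, which is the substance of the severity-capping argument.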

The gap

The current AI risk assessment does not document this justification. The severity values are assigned but the rationale for choosing 4 over 5 is not explicitly stated. This is a documentation gap, not a clinical reasoning gap. The argument above is valid but needs to be written into the risk assessment.

Risk of changing severity to 5

If severity is changed to 5, then:

  • AI-RISK-001: residual RPN = 5 × 1 = 5 (still Acceptable)
  • AI-RISK-016: residual RPN = 5 × 2 = 10 (remains Tolerable, but moves closer to the upper boundary of the Tolerable band)
  • AI-RISK-021: would remain at 3 since BSI's concern about severity 5 is specific to lines 1 and 16

Changing to severity 5 would not fundamentally alter the risk acceptability outcomes but would require recalculating RPNs across the AI risk register. The safer approach is to document the justification for severity 4 rather than change to 5, since the multi-barrier argument is clinically sound and consistent with how other CDSS manufacturers handle this.
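The RPN arithmetic behind this what-if can be sketched as follows. The acceptability bands used here (≤6 Acceptable, ≤12 Tolerable) are assumptions inferred from the RPNs quoted above, not the thresholds actually defined in R-TF-028-011.

```python
# Hedged sketch of the severity-5 what-if. The acceptability bands are
# assumptions inferred from the values quoted in this document, not the
# actual thresholds defined in R-TF-028-011.
def rpn(severity: int, likelihood: int) -> int:
    """Risk Priority Number as severity x likelihood."""
    return severity * likelihood

def acceptability(value: int) -> str:
    """Assumed bands: <=6 Acceptable, <=12 Tolerable, else Unacceptable."""
    if value <= 6:
        return "Acceptable"
    if value <= 12:
        return "Tolerable"
    return "Unacceptable"

# Residual likelihoods for the two lines BSI singled out (1 and 16).
residual_likelihood = {"AI-RISK-001": 1, "AI-RISK-016": 2}

for risk_id, likelihood in residual_likelihood.items():
    for severity in (4, 5):
        value = rpn(severity, likelihood)
        print(f"{risk_id}: severity {severity} -> RPN {value} ({acceptability(value)})")
```

Under these assumed bands, moving either line to severity 5 leaves its acceptability classification unchanged, which supports documenting the rationale rather than rescoring.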

4. Design mitigations (BSI concern #2)

BSI specifically asks about design mitigations for AI-RISK-001 and AI-RISK-016, giving the example of image quality rejection.

AI-RISK-001 (Dataset representativity)

Current mitigations are primarily process controls (training/validation procedures), not runtime design mitigations:

  • Multi-source data collection strategy
  • Stratified sampling
  • Bias analysis and fairness evaluation across Fitzpatrick skin types
  • Independent evaluation on sequestered hold-out test sets

These are valid mitigations, but BSI may be looking for runtime design features that protect against this risk in deployed use. The relevant runtime feature is that the device outputs a probability distribution (not a binary yes/no), so uncertainty is inherently communicated. However, the risk assessment does not explicitly frame this output format as a design mitigation.

AI-RISK-016 (Image acquisition variability)

This risk has a clear design mitigation: DIQA (Dermatology Image Quality Assessment) model provides a quality gate that rejects images outside acceptable acquisition parameter ranges. This is mentioned in the mitigation measures.

BSI asks: has this been verified as effective?

The DIQA model is part of the device's processing pipeline. Its verification evidence should be in the software V&V records (R-TF-012-034 or related SRS test records). This research and planning work needs to confirm:

  1. Where DIQA verification evidence lives
  2. Whether the test results demonstrate effective rejection of poor-quality images
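If the existing verification evidence turns out to be thin, an effectiveness check for a DIQA-style gate could look like the sketch below. The threshold, the [0, 1] score range, and the function names are hypothetical illustrations, not the device's actual parameters.

```python
# Hypothetical sketch of a quality-gate effectiveness check. The threshold,
# score range, and test data are illustrative assumptions only.
DIQA_THRESHOLD = 0.6  # assumed minimum acceptable quality score in [0, 1]

def quality_gate(diqa_score: float) -> bool:
    """Accept an image only if its quality score meets the gate threshold."""
    return diqa_score >= DIQA_THRESHOLD

def verify_gate_effectiveness(scored_images: list[tuple[float, bool]]) -> bool:
    """Check the gate's pass/reject decisions against expected labels.

    Each item is (diqa_score, expected_pass). Verification passes only if
    every decision matches, i.e. all known-poor images are rejected.
    """
    return all(quality_gate(score) == expected for score, expected in scored_images)

# Illustrative verification set: poor-quality images must be rejected.
test_set = [(0.9, True), (0.7, True), (0.5, False), (0.2, False)]
print(verify_gate_effectiveness(test_set))  # prints True
```

A real verification record would additionally trace each labelled image back to an acquisition-parameter deviation (lighting, blur, distance) so the rejection behaviour maps onto the risk cause.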

Gap

The AI risk assessment lists mitigations but does not clearly distinguish between:

  • Design mitigations (runtime features in the deployed device)
  • Process mitigations (development/validation procedures)
  • Information mitigations (IFU warnings, user training)

BSI wants to see the design mitigations specifically, together with evidence that they have been verified. This categorisation needs to be added.

5. Occurrence rate estimation (BSI concern #3)

Current likelihood values:

| Risk | Initial likelihood | Residual likelihood | Basis documented? |
| --- | --- | --- | --- |
| AI-RISK-001 | Moderate (3) | Very low (1) | No |
| AI-RISK-016 | Moderate (3) | Low (2) | No |
| AI-RISK-021 | Moderate (3) | Low (2) | No |

The likelihood values are assigned but the basis (clinical data, PMS experience, literature, expert judgment) is not documented anywhere in the risk assessment. This is a genuine documentation gap.

Available data sources for occurrence estimation

  1. PMS data: The legacy device has 4+ years of market experience. The PSUR (R-TF-007-003) documents 7 non-serious incidents over the surveillance period. None involved diagnostic error harm. This supports low occurrence.
  2. Clinical validation studies: Pre-market studies demonstrate performance within acceptance criteria (Top-5 >70%, AUC >0.8). Failure rates within validation can inform occurrence estimates.
  3. Literature: Published studies on AI dermatology tools report error rates and clinical impact. These can support the likelihood estimates.
  4. Summative usability evaluation: R-TF-025-007 provides data on use errors (AI-RISK-021) — 72.2% success on Q4 (understanding device is not diagnostic), 1 use error, 3 close calls.

Gap

The occurrence estimates are reasonable but undocumented. The fix is to add a rationale column or section to the AI risk assessment documenting the basis for each likelihood value.

6. Residual risk communication in IFU (BSI concern #4)

BSI asks whether residual risks are communicated to users in the IFU. For the three risks:

AI-RISK-001 (Dataset representativity) — Residual severity 4, RPN 4

Residual risk: The device may perform less well on underrepresented subgroups despite bias analysis and stratified validation.

IFU coverage needed: Limitations section should state that performance may vary across Fitzpatrick skin types and that validation was primarily conducted on specific populations. Warnings about using clinical judgment for all patient populations.

AI-RISK-016 (Image variability) — Residual severity 4, RPN 8

Residual risk: Despite DIQA quality gate, some borderline images may pass and yield degraded performance.

IFU coverage needed: Image acquisition guidance (lighting, distance, angle, background requirements), statement that image quality affects device performance, information about the quality assessment feature.

AI-RISK-021 (Usability) — Residual severity 3, RPN 6

Residual risk: Despite usability validation, some users may misinterpret outputs.

IFU coverage needed: Clear explanations of output format, statement that device is not diagnostic, guidance on interpreting probability distributions and severity scores.

Gap

The IFU likely already covers most of these points (limitations, image quality guidance, non-diagnostic disclaimer), but the AI risk assessment does not explicitly map each residual risk to the specific IFU section where it is communicated. This mapping needs to be added — similar to the traceability fix done for Item 5's PMS Plan.

7. Cross-NC connections

Technical Review N3 — Risk mitigation implementation

Critical cross-reference

N3 addresses the same risk management system but from a different angle. N3 found that risk R-DAG's control measures could not be verified against SRS/test records — the mitigation requirement codes in R-TF-013-002 don't map to actual implementation evidence.

This directly affects Item 7's concern #2 (design mitigations verified as effective). If the SRS traceability is broken for main safety risks, it may also be broken for AI risks. The DIQA quality gate mitigation for AI-RISK-016 must have clear traceability to an SRS requirement AND a verification test result.

The fixes for N3 (risk mitigation traceability) and Item 7 (design mitigation verification) are interdependent. Both require demonstrating that design mitigations exist in the code/SRS and have been verified through testing.
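This interdependent traceability check lends itself to automation along the following lines. The requirement and test IDs below are illustrative placeholders, not actual entries from R-TF-013-002 or the SRS.

```python
# Hypothetical sketch of a mitigation -> SRS requirement -> test traceability
# check. All IDs are illustrative placeholders, not actual register entries.
mitigation_to_srs = {
    "AI-RISK-016/DIQA-gate": "SRS-REQ-PLACEHOLDER",  # assumed requirement ID
}
srs_to_tests = {
    "SRS-REQ-PLACEHOLDER": ["TEST-PLACEHOLDER-01"],  # assumed verification test
}

def unverified_mitigations(mitigations: dict, requirements_with_tests: dict) -> list:
    """Return mitigation codes whose SRS requirement has no verification test."""
    broken = []
    for code, requirement in mitigations.items():
        if not requirements_with_tests.get(requirement):
            broken.append(code)
    return broken

print(unverified_mitigations(mitigation_to_srs, srs_to_tests))  # -> []
```

Running the same check over both the main safety register and the AI register would surface exactly the broken links N3 identified, so the two responses can share one evidence base.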

Clinical Review Item 4 — Usability

AI-RISK-021 (usability) connects directly to the summative usability evaluation addressed in Item 4. The HCP Scenario 3 Q4 result (72.2% success on "is the device diagnostic?") is relevant occurrence data for this risk. Item 4's response and Technical Review N2's deeper analysis provide the evidence base.

Clinical Review Item 3a — Clinical data analysis

The CER's treatment of clinical evidence (Item 3a) underpins the severity justification. If the CER demonstrates that the device improves diagnostic accuracy compared to unassisted assessment (clinical benefit), this supports the argument that severity 5 is unlikely because the device is an additional safety layer, not a replacement for clinical judgment.

Clinical Review Item 5 — PMS Plan

PMS data (7 non-serious incidents, no diagnostic error harms) provides evidence for occurrence rate estimation. The PMS Plan's trend analysis methodology also supports the argument that occurrence is monitored continuously.

8. Response strategy

The response should address each of BSI's four concerns directly:

  1. Severity justification: Provide the explicit rationale for severity 4 — the multi-barrier argument (CDSS role, clinician judgment, Top-5 presentation, binary safety indicators, standard of care). Note that this justification has been documented in the updated risk assessment.
  2. Design mitigations: Distinguish between design mitigations (DIQA quality gate, probability distribution output, binary safety indicators) and process mitigations (validation procedures). Point to verification evidence for the design mitigations.
  3. Occurrence rate basis: Provide the data supporting likelihood estimates — PMS data (7 incidents/4+ years, none involving diagnostic error harm), validation study performance data, usability evaluation results for AI-RISK-021.
  4. Residual risk in IFU: Map each residual risk to the specific IFU section where it is communicated (limitations, image acquisition guidance, non-diagnostic disclaimer, output interpretation guidance).

Fixes required

Fix 1: Add severity justification to AI risk assessment

For AI-RISK-001 and AI-RISK-016, add an explicit rationale field explaining why severity is 4 (not 5). The justification should reference:

  • Device role as CDSS (not diagnostic)
  • Clinician always makes final decision
  • Multiple independent safety barriers (Top-5 list, binary indicators, standard of care)
  • P₂ framework from R-TF-013-003 (device cannot directly cause physical harm; harm pathway is indirect through clinical decision-making)
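If the rationale is added to the JSON data structure (see open item 4), the shape could be as simple as the sketch below. The field names are assumptions pending a check of the RiskManagement data model, and the rationale text is a condensed form of the argument above.

```python
import json

# Hypothetical rationale field on an AI risk record. Field names are
# assumptions; the actual data model must be confirmed before implementation.
record = {
    "id": "AI-RISK-001",
    "severity": 4,
    "severity_rationale": (
        "Severity capped at 4: the device is a CDSS, not a diagnostic device; "
        "the clinician makes the final decision; and independent barriers "
        "(Top-5 list, binary safety indicators, standard of care) sit between "
        "a device error and patient harm."
    ),
}
print(json.dumps(record, indent=2))
```

Keeping the rationale in the structured record (rather than a separate document) makes it reviewable alongside the score it justifies, which is what BSI's concern #1 is really asking for.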

Fix 2: Categorise mitigations as design / process / information

For each of the three risks, categorise the existing mitigation measures into:

  • Design mitigations: Runtime features in the deployed device (DIQA quality gate, probability distribution output, binary safety indicators)
  • Process mitigations: Development/validation procedures (stratified sampling, bias analysis)
  • Information mitigations: IFU content, user training materials

For each design mitigation, add a reference to the SRS requirement and verification test result.

Fix 3: Add occurrence rate rationale

For each of the three risks, document the basis for the likelihood estimate:

  • PMS data from legacy device (4+ years, incident rates)
  • Clinical validation performance data (failure rates from studies)
  • For AI-RISK-021: summative usability evaluation results (R-TF-025-007)
  • Literature references where applicable

Fix 4: Map residual risks to IFU sections

For each of the three risks, add a field identifying the specific IFU section(s) where the residual risk is communicated to users. Verify that the IFU actually contains this information; if any residual risk is not addressed, update the IFU.
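The mapping and its completeness check could be sketched as follows. The IFU section names are hypothetical placeholders, not the actual IFU structure, which must be confirmed against the EU IFU before the mapping is recorded.

```python
# Hypothetical residual-risk -> IFU-section mapping with a completeness check.
# Section names are placeholders, not the actual IFU table of contents.
residual_risk_to_ifu = {
    "AI-RISK-001": ["Limitations"],
    "AI-RISK-016": ["Image acquisition guidance"],
    "AI-RISK-021": ["Output interpretation", "Intended use disclaimer"],
}

def unmapped_risks(mapping: dict, risk_ids: list) -> list:
    """Return risk IDs with no IFU section documented for their residual risk."""
    return [risk_id for risk_id in risk_ids if not mapping.get(risk_id)]

all_risks = ["AI-RISK-001", "AI-RISK-016", "AI-RISK-021"]
print(unmapped_risks(residual_risk_to_ifu, all_risks))  # -> []
```

An empty result means every residual risk has at least one documented IFU location; any ID returned flags a gap that requires either a mapping entry or an IFU update.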

9. Risk assessment

| Risk | Impact | Mitigation |
| --- | --- | --- |
| BSI may insist that severity should be 5 (death cannot be excluded) | High — changing severity to 5 affects RPNs across the register | The multi-barrier argument is clinically sound and consistent with ISO 14971 Annex C guidance on CDSS. If BSI insists, severity 5 does not change risk acceptability (RPN 5 is still Acceptable for AI-RISK-001) |
| BSI may find DIQA verification evidence insufficient | Medium — DIQA is the key design mitigation for AI-RISK-016 | Coordinate with N3 response to ensure SRS→test traceability is demonstrated |
| BSI may question whether PMS data from legacy device applies to the MDR device | Low — the devices share the same core algorithms | Explain continuity between legacy and Plus device; same AI models, same clinical workflow |
| BSI may note the severity scale mismatch between AI risk and safety risk registers | Low — different scales for different risk domains is acceptable under ISO 14971 | Explain that AI risks transfer to safety risks (R-SKK, R-7US, etc.) where they are assessed on the main severity scale |

10. Open items

| # | Item | Owner | Status |
| --- | --- | --- | --- |
| 1 | Locate DIQA verification test results — which SRS requirement implements the quality gate, and which test verifies it? | Taig | Required |
| 2 | Check IFU for residual risk coverage — does the IFU address dataset representativity limitations, image quality requirements, and output interpretation guidance? | Taig | Required — can check by reading the EU IFU MDR app |
| 3 | Coordinate with N3 response on risk mitigation traceability approach | Taig | Required |
| 4 | Determine whether to add severity justification to the JSON data structure or as a separate document section | Taig | Required — check RiskManagement CLAUDE.md for data model |