
Research and planning

Internal working document

This document is for internal use only. It contains analysis, gap identification, and response strategy for Item 7 of the BSI Clinical Review Round 1. It will not be included in the final response to BSI.

1. What BSI is asking

BSI references three specific rows from R-TF-028-011 (AI Risk Assessment) and raises four concerns:

The three risks:

| Line | ID | Risk | Initial severity | Residual severity | Residual RPN |
| --- | --- | --- | --- | --- | --- |
| 1 | AI-RISK-001 | Dataset Not Representative of Intended Use Population | Critical (4) | Critical (4) | 4 (Acceptable) |
| 16 | AI-RISK-016 | Model Robustness Failures: Sensitivity to Image Acquisition Variability | Critical (4) | Critical (4) | 8 (Tolerable) |
| 21 | AI-RISK-021 | Usability Issues: Model Outputs Not Interpretable by Clinical Users | Moderate (3) | Moderate (3) | 6 (Acceptable) |

BSI's four concerns:

  1. Severity justification: The maximum severity assigned is 4 (pre- and post-mitigation). Why could these risks not result in harms of severity 5 to the patient?
  2. Design mitigations: For lines 1 and 16, are there design mitigations (e.g., image quality rejection)? If so, how have these been verified as effective?
  3. Occurrence rate estimation: Are occurrence rates based on available data (PMS or literature)? This is unclear.
  4. Residual risk communication: Do residual risks remain for these risk lines? If so, where have these been communicated to users in the IFU?

This is an observation/request, not a deficiency finding. The regulatory basis is GSPRs 1–5, 8, and EN ISO 14971.

2. Two severity scales in play

BSI reviewed R-TF-028-011 (AI Risk Assessment), which uses an AI-specific severity scale. The main safety risk register (R-TF-013-002) uses a different scale defined in R-TF-013-003:

| Score | Main safety scale (R-TF-013-003) | AI risk scale (R-TF-028-011) |
| --- | --- | --- |
| 5 | Critical — Death | Catastrophic — Death or irreversible harm |
| 4 | Serious — Permanent impairment or irreversible injury | Critical — Delayed serious entity identification |
| 3 | Major — Injury requiring medical/surgical intervention | Moderate — Significant impact, recoverable |
| 2 | Minor — Temporary injury | Minor — Temporary |
| 1 | Negligible — Inconvenience | Negligible |

Both scales place "death" at severity 5. BSI is asking: why is severity capped at 4, not 5?

3. Severity justification analysis

The core argument

The device is a clinical decision support tool (CDSS), not a diagnostic device. It provides an interpretative distribution of probable ICD-11 categories and quantitative severity data to support (not replace) healthcare professional judgment. Several layers prevent a device error from reaching the patient as a harm of severity 5 (death):

  1. The clinician always makes the final diagnostic decision. The device output is one input among many (patient history, physical examination, dermoscopy, clinical experience). The IFU explicitly states the device is not intended for diagnosis.
  2. Top-5 presentation: The device presents the top 5 most probable ICD-11 categories, not a single diagnosis. Even if the correct diagnosis is not ranked #1, it is typically within the shortlist.
  3. Independent binary safety indicators: Six binary indicators (malignant, pre-malignant, associated with malignancy, pigmented lesion, urgent referral ≤48h, high-priority referral ≤2 weeks) operate independently of the ICD classification and flag high-risk lesions regardless of the specific ICD suggestion.
  4. Standard of care: Clinical guidelines require biopsy for suspected malignancy regardless of any CDSS output. The device does not alter the standard of care.

Why severity 4 (not 5) is defensible

A severity 5 rating requires a plausible direct causal chain from the device error to patient death. For a CDSS:

  • The device error (e.g., misclassification) is an initiating event, not a harm.
  • Between the device error and patient harm, there are multiple independent barriers: clinician judgment, standard of care protocols, binary safety indicators, follow-up consultations.
  • For death to occur, all of these barriers would need to fail simultaneously — the device misclassifies, the binary indicators fail to flag, the clinician does not exercise independent judgment, and standard of care is not followed.

This multi-barrier chain reduces the probability of reaching severity 5 but does not eliminate the theoretical possibility. BSI's question is whether the theoretical possibility should be reflected in the severity score.
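The multi-barrier argument can be made quantitative under an independence assumption. The sketch below is purely illustrative: the per-barrier failure probabilities are invented placeholders, not estimates from clinical data, and real barriers are unlikely to be fully independent.

```python
# Illustrative independence sketch of the multi-barrier argument. All failure
# probabilities are invented placeholders, not estimates from clinical data.
p_misclassification = 0.05   # device ranks correct diagnosis outside Top-5
p_indicators_miss = 0.05     # binary safety indicators fail to flag the lesion
p_clinician_defers = 0.10    # clinician does not exercise independent judgment
p_soc_not_followed = 0.05    # standard of care (e.g. biopsy) not followed

# For a severity-5 harm, every barrier must fail simultaneously; assuming
# independence, the pathway probability is the product of the failure rates.
p_severity_5_pathway = (
    p_misclassification * p_indicators_miss * p_clinician_defers * p_soc_not_followed
)
print(f"{p_severity_5_pathway:.2e}")  # -> 1.25e-05
```

Even with these deliberately pessimistic placeholder values, the combined pathway probability is orders of magnitude below any single barrier's failure rate, which is the substance of the severity-capping argument.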

The gap

The current AI risk assessment does not document this justification. The severity values are assigned but the rationale for choosing 4 over 5 is not explicitly stated. This is a documentation gap, not a clinical reasoning gap. The argument above is valid but needs to be written into the risk assessment.

Risk of changing severity to 5

If severity is changed to 5, then:

  • AI-RISK-001: residual RPN = 5 × 1 = 5 (still Acceptable)
  • AI-RISK-016: residual RPN = 5 × 2 = 10 (remains Tolerable, but moves closer to the upper boundary of the Tolerable band)
  • AI-RISK-021: would remain at 3 since BSI's concern about severity 5 is specific to lines 1 and 16

Changing to severity 5 would not fundamentally alter the risk acceptability outcomes but would require recalculating RPNs across the AI risk register. The safer approach is to document the justification for severity 4 rather than change to 5, since the multi-barrier argument is clinically sound and consistent with how other CDSS manufacturers handle this.
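The RPN arithmetic behind this what-if can be sketched as follows. The acceptability bands used here (≤6 Acceptable, ≤12 Tolerable) are assumptions inferred from the RPNs quoted above, not the thresholds actually defined in R-TF-028-011.

```python
# Hedged sketch of the severity-5 what-if. The acceptability bands are
# assumptions inferred from the values quoted in this document, not the
# actual thresholds defined in R-TF-028-011.
def rpn(severity: int, likelihood: int) -> int:
    """Risk Priority Number as severity x likelihood."""
    return severity * likelihood

def acceptability(value: int) -> str:
    """Assumed bands: <=6 Acceptable, <=12 Tolerable, else Unacceptable."""
    if value <= 6:
        return "Acceptable"
    if value <= 12:
        return "Tolerable"
    return "Unacceptable"

# Residual likelihoods for the two lines BSI singled out (1 and 16).
residual_likelihood = {"AI-RISK-001": 1, "AI-RISK-016": 2}

for risk_id, likelihood in residual_likelihood.items():
    for severity in (4, 5):
        value = rpn(severity, likelihood)
        print(f"{risk_id}: severity {severity} -> RPN {value} ({acceptability(value)})")
```

Under these assumed bands, moving either line to severity 5 leaves its acceptability classification unchanged, which supports documenting the rationale rather than rescoring.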

4. Design mitigations (BSI concern #2)

BSI specifically asks about design mitigations for AI-RISK-001 and AI-RISK-016, giving the example of image quality rejection.

AI-RISK-001 (Dataset representativity)

Current mitigations are primarily process controls (training/validation procedures), not runtime design mitigations:

  • Multi-source data collection strategy
  • Stratified sampling
  • Bias analysis and fairness evaluation across Fitzpatrick skin types
  • Independent evaluation on sequestered hold-out test sets

These are valid mitigations, but BSI may be looking for runtime design features that protect against this risk in deployed use. The relevant runtime feature is that the device outputs a probability distribution (not a binary yes/no), so uncertainty is inherently communicated. However, the risk assessment does not explicitly frame this output format as a design mitigation.

AI-RISK-016 (Image acquisition variability)

This risk has a clear design mitigation: DIQA (Dermatology Image Quality Assessment) model provides a quality gate that rejects images outside acceptable acquisition parameter ranges. This is mentioned in the mitigation measures.

BSI asks: has this been verified as effective?

The DIQA model is part of the device's processing pipeline. Its verification evidence should be in the software V&V records (R-TF-012-034 or related SRS test records). This research and planning work needs to confirm:

  1. Where DIQA verification evidence lives
  2. Whether the test results demonstrate effective rejection of poor-quality images
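If the existing verification evidence turns out to be thin, an effectiveness check for a DIQA-style gate could look like the sketch below. The threshold, the [0, 1] score range, and the function names are hypothetical illustrations, not the device's actual parameters.

```python
# Hypothetical sketch of a quality-gate effectiveness check. The threshold,
# score range, and test data are illustrative assumptions only.
DIQA_THRESHOLD = 0.6  # assumed minimum acceptable quality score in [0, 1]

def quality_gate(diqa_score: float) -> bool:
    """Accept an image only if its quality score meets the gate threshold."""
    return diqa_score >= DIQA_THRESHOLD

def verify_gate_effectiveness(scored_images: list[tuple[float, bool]]) -> bool:
    """Check the gate's pass/reject decisions against expected labels.

    Each item is (diqa_score, expected_pass). Verification passes only if
    every decision matches, i.e. all known-poor images are rejected.
    """
    return all(quality_gate(score) == expected for score, expected in scored_images)

# Illustrative verification set: poor-quality images must be rejected.
test_set = [(0.9, True), (0.7, True), (0.5, False), (0.2, False)]
print(verify_gate_effectiveness(test_set))  # prints True
```

A real verification record would additionally trace each labelled image back to an acquisition-parameter deviation (lighting, blur, distance) so the rejection behaviour maps onto the risk cause.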

Gap

The AI risk assessment lists mitigations but does not clearly distinguish between:

  • Design mitigations (runtime features in the deployed device)
  • Process mitigations (development/validation procedures)
  • Information mitigations (IFU warnings, user training)

BSI wants to see the design mitigations specifically, together with evidence that they have been verified. This categorisation needs to be added.

5. Occurrence rate estimation (BSI concern #3)

Current likelihood values:

| Risk | Initial likelihood | Residual likelihood | Basis documented? |
| --- | --- | --- | --- |
| AI-RISK-001 | Moderate (3) | Very low (1) | No |
| AI-RISK-016 | Moderate (3) | Low (2) | No |
| AI-RISK-021 | Moderate (3) | Low (2) | No |

The likelihood values are assigned but the basis (clinical data, PMS experience, literature, expert judgment) is not documented anywhere in the risk assessment. This is a genuine documentation gap.

Available data sources for occurrence estimation

  1. PMS data: The legacy device has 4+ years of market experience. The PSUR (R-TF-007-003) documents 7 non-serious incidents over the surveillance period. None involved diagnostic error harm. This supports low occurrence.
  2. Clinical validation studies: Pre-market studies demonstrate performance within acceptance criteria (Top-5 >70%, AUC >0.8). Failure rates within validation can inform occurrence estimates.
  3. Literature: Published studies on AI dermatology tools report error rates and clinical impact. These can support the likelihood estimates.
  4. Summative usability evaluation: R-TF-025-007 provides data on use errors (AI-RISK-021) — 72.2% success on Q4 (understanding device is not diagnostic), 1 use error, 3 close calls.

Gap

The occurrence estimates are reasonable but undocumented. The fix is to add a rationale column or section to the AI risk assessment documenting the basis for each likelihood value.

6. Residual risk communication in IFU (BSI concern #4)

BSI asks whether residual risks are communicated to users in the IFU. For the three risks:

AI-RISK-001 (Dataset representativity) — Residual severity 4, RPN 4

Residual risk: The device may perform less well on underrepresented subgroups despite bias analysis and stratified validation.

IFU coverage needed: Limitations section should state that performance may vary across Fitzpatrick skin types and that validation was primarily conducted on specific populations. Warnings about using clinical judgment for all patient populations.

AI-RISK-016 (Image variability) — Residual severity 4, RPN 8

Residual risk: Despite DIQA quality gate, some borderline images may pass and yield degraded performance.

IFU coverage needed: Image acquisition guidance (lighting, distance, angle, background requirements), statement that image quality affects device performance, information about the quality assessment feature.

AI-RISK-021 (Usability) — Residual severity 3, RPN 6

Residual risk: Despite usability validation, some users may misinterpret outputs.

IFU coverage needed: Clear explanations of output format, statement that device is not diagnostic, guidance on interpreting probability distributions and severity scores.

Gap

The IFU likely already covers most of these points (limitations, image quality guidance, non-diagnostic disclaimer), but the AI risk assessment does not explicitly map each residual risk to the specific IFU section where it is communicated. This mapping needs to be added — similar to the traceability fix done for Item 5's PMS Plan.

7. Cross-NC connections

Technical Review N3 — Risk mitigation implementation

Critical cross-reference

N3 addresses the same risk management system but from a different angle. N3 found that risk R-DAG's control measures could not be verified against SRS/test records — the mitigation requirement codes in R-TF-013-002 don't map to actual implementation evidence.

This directly affects Item 7's concern #2 (design mitigations verified as effective). If the SRS traceability is broken for main safety risks, it may also be broken for AI risks. The DIQA quality gate mitigation for AI-RISK-016 must have clear traceability to an SRS requirement AND a verification test result.

The fixes for N3 (risk mitigation traceability) and Item 7 (design mitigation verification) are interdependent. Both require demonstrating that design mitigations exist in the code/SRS and have been verified through testing.
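This interdependent traceability check lends itself to automation along the following lines. The requirement and test IDs below are illustrative placeholders, not actual entries from R-TF-013-002 or the SRS.

```python
# Hypothetical sketch of a mitigation -> SRS requirement -> test traceability
# check. All IDs are illustrative placeholders, not actual register entries.
mitigation_to_srs = {
    "AI-RISK-016/DIQA-gate": "SRS-REQ-PLACEHOLDER",  # assumed requirement ID
}
srs_to_tests = {
    "SRS-REQ-PLACEHOLDER": ["TEST-PLACEHOLDER-01"],  # assumed verification test
}

def unverified_mitigations(mitigations: dict, requirements_with_tests: dict) -> list:
    """Return mitigation codes whose SRS requirement has no verification test."""
    broken = []
    for code, requirement in mitigations.items():
        if not requirements_with_tests.get(requirement):
            broken.append(code)
    return broken

print(unverified_mitigations(mitigation_to_srs, srs_to_tests))  # -> []
```

Running the same check over both the main safety register and the AI register would surface exactly the broken links N3 identified, so the two responses can share one evidence base.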

Clinical Review Item 4 — Usability

AI-RISK-021 (usability) connects directly to the summative usability evaluation addressed in Item 4. The HCP Scenario 3 Q4 result (72.2% success on "is the device diagnostic?") is relevant occurrence data for this risk. Item 4's response and Technical Review N2's deeper analysis provide the evidence base.

Clinical Review Item 3a — Clinical data analysis

The CER's treatment of clinical evidence (Item 3a) underpins the severity justification. If the CER demonstrates that the device improves diagnostic accuracy compared to unassisted assessment (clinical benefit), this supports the argument that severity 5 is unlikely because the device is an additional safety layer, not a replacement for clinical judgment.

Clinical Review Item 5 — PMS Plan

PMS data (7 non-serious incidents, no diagnostic error harms) provides evidence for occurrence rate estimation. The PMS Plan's trend analysis methodology also supports the argument that occurrence is monitored continuously.

8. Response strategy

The response should address each of BSI's four concerns directly:

  1. Severity justification: Provide the explicit rationale for severity 4 — the multi-barrier argument (CDSS role, clinician judgment, Top-5 presentation, binary safety indicators, standard of care). Note that this justification has been documented in the updated risk assessment.
  2. Design mitigations: Distinguish between design mitigations (DIQA quality gate, probability distribution output, binary safety indicators) and process mitigations (validation procedures). Point to verification evidence for the design mitigations.
  3. Occurrence rate basis: Provide the data supporting likelihood estimates — PMS data (7 incidents/4+ years, none involving diagnostic error harm), validation study performance data, usability evaluation results for AI-RISK-021.
  4. Residual risk in IFU: Map each residual risk to the specific IFU section where it is communicated (limitations, image acquisition guidance, non-diagnostic disclaimer, output interpretation guidance).

Fixes required

Fix 1: Add severity justification to AI risk assessment

For AI-RISK-001 and AI-RISK-016, add an explicit rationale field explaining why severity is 4 (not 5). The justification should reference:

  • Device role as CDSS (not diagnostic)
  • Clinician always makes final decision
  • Multiple independent safety barriers (Top-5 list, binary indicators, standard of care)
  • P₂ framework from R-TF-013-003 (device cannot directly cause physical harm; harm pathway is indirect through clinical decision-making)
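If the rationale is added to the JSON data structure (see open item 4), the shape could be as simple as the sketch below. The field names are assumptions pending a check of the RiskManagement data model, and the rationale text is a condensed form of the argument above.

```python
import json

# Hypothetical rationale field on an AI risk record. Field names are
# assumptions; the actual data model must be confirmed before implementation.
record = {
    "id": "AI-RISK-001",
    "severity": 4,
    "severity_rationale": (
        "Severity capped at 4: the device is a CDSS, not a diagnostic device; "
        "the clinician makes the final decision; and independent barriers "
        "(Top-5 list, binary safety indicators, standard of care) sit between "
        "a device error and patient harm."
    ),
}
print(json.dumps(record, indent=2))
```

Keeping the rationale in the structured record (rather than a separate document) makes it reviewable alongside the score it justifies, which is what BSI's concern #1 is really asking for.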

Fix 2: Categorise mitigations as design / process / information

For each of the three risks, categorise the existing mitigation measures into:

  • Design mitigations: Runtime features in the deployed device (DIQA quality gate, probability distribution output, binary safety indicators)
  • Process mitigations: Development/validation procedures (stratified sampling, bias analysis)
  • Information mitigations: IFU content, user training materials

For each design mitigation, add a reference to the SRS requirement and verification test result.

Fix 3: Add occurrence rate rationale

For each of the three risks, document the basis for the likelihood estimate:

  • PMS data from legacy device (4+ years, incident rates)
  • Clinical validation performance data (failure rates from studies)
  • For AI-RISK-021: summative usability evaluation results (R-TF-025-007)
  • Literature references where applicable

Fix 4: Map residual risks to IFU sections

For each of the three risks, add a field identifying the specific IFU section(s) where the residual risk is communicated to users. Verify that the IFU actually contains this information; if any residual risk is not addressed, update the IFU.
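The mapping and its completeness check could be sketched as follows. The IFU section names are hypothetical placeholders, not the actual IFU structure, which must be confirmed against the EU IFU before the mapping is recorded.

```python
# Hypothetical residual-risk -> IFU-section mapping with a completeness check.
# Section names are placeholders, not the actual IFU table of contents.
residual_risk_to_ifu = {
    "AI-RISK-001": ["Limitations"],
    "AI-RISK-016": ["Image acquisition guidance"],
    "AI-RISK-021": ["Output interpretation", "Intended use disclaimer"],
}

def unmapped_risks(mapping: dict, risk_ids: list) -> list:
    """Return risk IDs with no IFU section documented for their residual risk."""
    return [risk_id for risk_id in risk_ids if not mapping.get(risk_id)]

all_risks = ["AI-RISK-001", "AI-RISK-016", "AI-RISK-021"]
print(unmapped_risks(residual_risk_to_ifu, all_risks))  # -> []
```

An empty result means every residual risk has at least one documented IFU location; any ID returned flags a gap that requires either a mapping entry or an IFU update.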

9. Risk assessment

| Risk | Impact | Mitigation |
| --- | --- | --- |
| BSI may insist that severity should be 5 (death cannot be excluded) | High — changing severity to 5 affects RPNs across the register | The multi-barrier argument is clinically sound and consistent with ISO 14971 Annex C guidance on CDSS. If BSI insists, severity 5 does not change risk acceptability (RPN 5 is still Acceptable for AI-RISK-001) |
| BSI may find DIQA verification evidence insufficient | Medium — DIQA is the key design mitigation for AI-RISK-016 | Coordinate with N3 response to ensure SRS→test traceability is demonstrated |
| BSI may question whether PMS data from legacy device applies to the MDR device | Low — the devices share the same core algorithms | Explain continuity between legacy and Plus device; same AI models, same clinical workflow |
| BSI may note the severity scale mismatch between AI risk and safety risk registers | Low — different scales for different risk domains is acceptable under ISO 14971 | Explain that AI risks transfer to safety risks (R-SKK, R-7US, etc.) where they are assessed on the main severity scale |

10. Open items

| # | Item | Owner | Status |
| --- | --- | --- | --- |
| 1 | Locate DIQA verification test results — which SRS requirement implements the quality gate, and which test verifies it? | Taig | Required |
| 2 | Check IFU for residual risk coverage — does the IFU address dataset representativity limitations, image quality requirements, and output interpretation guidance? | Taig | Required — can check by reading the EU IFU MDR app |
| 3 | Coordinate with N3 response on risk mitigation traceability approach | Taig | Required |
| 4 | Determine whether to add severity justification to the JSON data structure or as a separate document section | Taig | Required — check RiskManagement CLAUDE.md for data model |