
Research and planning

Internal working document

This page is for internal planning only. It will not be included in the final response to BSI.

What BSI is asking​

BSI reviewed risk R-DAG ("The medical device outputs a wrong result") in the Risk Management Record (R-TF-013-002) and found four implemented mitigations listed:

  1. Information about device outputs are detailed in the IFU.
  2. The medical device returns metadata about the output that helps supervising it, such as explainability media and other metrics.
  3. The device returns an interpretative distribution representation of possible ICD categories, not just one single condition.
  4. AI models undergo retraining using expanded dataset of images.

BSI then cross-referenced these mitigations with the "Mitigation or Control Requirement(s)" and "Verification of implementation of risk control measures" columns, checking against the software requirements (R-TF-012-034) and test descriptions. They could not find corresponding requirements or test evidence that clearly address explainability, interpretive distributions, retraining, or IFU information about device outputs.

BSI also flags: "It is unclear if other risks are similarly impacted" — implying they suspect a systemic traceability gap.

Underlying regulatory concern: EN ISO 14971:2019 requires a complete, verifiable traceability chain for risk controls. The specific sub-clauses BSI is testing:

| ISO 14971 sub-clause | Requirement | How it applies here |
| --- | --- | --- |
| 7.2 | Risk control measures shall be implemented and their implementation verified | The core issue — traceability from mitigation → requirement → test must be demonstrable |
| 7.4 | Benefit-risk analysis for residual risks | Corrected traceability must not change the benefit-risk conclusion |
| 7.6 | Completeness of risk control | The "other risks" audit addresses whether risk control is complete across the register |

BSI's cited GSPRs map as follows:

| GSPR | Requirement | Relevance to N3 |
| --- | --- | --- |
| GSPR 1 | Devices shall achieve intended performance and be suitable for their intended purpose | The mitigations (explainability, distributions, IFU) ensure the device output supports HCP decision-making as intended |
| GSPR 4 | Manufacturers shall establish and maintain a risk management system per Annex I §3 | The traceability chain (risk → control → requirement → verification) is a core element of this system |
| GSPR 17.2 | Diagnostic devices shall provide sufficient accuracy, precision, and stability | The ICD probability distribution and explainability media are the mechanisms by which accuracy/precision are communicated to the HCP |

BSI also cites Annex II documentation requirements:

| Annex II section | What it requires | How it applies |
| --- | --- | --- |
| 5(b) | Description and justification of residual risks | R-TF-013-002 must demonstrate that residual risks are acceptable after controls are verified |
| 6.1(a)/(b) | Evidence of GSPR compliance (tests, clinical data, etc.) | The verification test cases are the evidence — they must clearly map to the mitigations |
| 6.2(f) | Risk analysis including risk control measures | The complete traceability chain in R-TF-013-002 fulfils this requirement |

What BSI is NOT saying: They are not saying the mitigations are unimplemented. They are saying they could not find the traceability evidence linking mitigations → requirements → tests. This is a documentation/traceability gap, not necessarily a missing implementation gap.

Root cause diagnosis​

The central issue is that R-DAG's mitigationRequirements field contains the same SRS codes as its causeRequirements field — these are infrastructure/API codes, not the codes that implement the actual mitigations:

| Field | SRS codes | What they cover |
| --- | --- | --- |
| causeRequirements | SRS-7PJ, SRS-AQM, SRS-BYJ, SRS-DW0, SRS-D3N, SRS-LBS | API port listening, HTTP status codes, JSON format, authentication, clinical params endpoint, URL versioning |
| mitigationRequirements (SRS part) | SRS-7PJ, SRS-AQM, SRS-BYJ, SRS-DW0, SRS-D3N, SRS-LBS | Identical to cause codes |
| mitigationRequirements (LR part) | LR-4XK, LR-9WR, LR-4RZ, LR-8YN | IFU read instruction, output interpretation guidance, warnings/precautions, HCP supervision |

The test cases in verificationOfImplementation (C106, C454, C455, C50, C62, C68, C73, C77) all map to those infrastructure SRS codes — they verify HTTP status codes, JSON format, authentication, and API versioning. None of them verify explainability, probability distributions, or AI outputs. This is why BSI found them irrelevant.

The actual SRS codes and test cases that implement and verify the mitigations do exist but were never linked to R-DAG. The analysis below is mitigation by mitigation.
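As a concrete sketch of the pattern described above, the broken entry can be modelled as follows. The field names (causeRequirements, mitigationRequirements, verificationOfImplementation) come from R-TF-013-002, but the dict layout is an assumption for illustration, not the actual JSON schema:

```python
# Illustrative model of the R-DAG entry as currently recorded.
r_dag = {
    "riskId": "R-DAG",
    "causeRequirements": [
        "SRS-7PJ", "SRS-AQM", "SRS-BYJ", "SRS-DW0", "SRS-D3N", "SRS-LBS",
    ],
    "mitigationRequirements": [
        # SRS part: a verbatim copy of the cause codes -- this is the gap
        "SRS-7PJ", "SRS-AQM", "SRS-BYJ", "SRS-DW0", "SRS-D3N", "SRS-LBS",
        # LR part: correct labeling references for mitigation 1
        "LR-4XK", "LR-9WR", "LR-4RZ", "LR-8YN",
    ],
    # All infrastructure-level tests (HTTP status, JSON format, auth,
    # versioning); none exercise explainability, distributions, or retraining
    "verificationOfImplementation": [
        "C106", "C454", "C455", "C50", "C62", "C68", "C73", "C77",
    ],
}

# The duplication BSI flagged: the SRS part of the mitigation codes adds
# nothing beyond the cause codes.
srs_part = {c for c in r_dag["mitigationRequirements"] if c.startswith("SRS-")}
assert srs_part == set(r_dag["causeRequirements"])
```

This makes the gap mechanical to detect, which is what the systematic audit later in this page exploits.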

Mitigation-by-mitigation analysis​

Mitigation 1: "Information about device outputs are detailed in the IFU"​

Status: Implemented. Traceability incomplete.

What exists in the IFU:

The IFU contains comprehensive documentation of all device output fields:

| IFU section | Path | What it covers |
| --- | --- | --- |
| User Interface (device outputs) | `apps/eu-ifu-mdr/versioned_docs/version-1.1.0.0/installation-manual/user-interface.mdx` | Full JSON output structure: probability distributions (conclusions array), entropy scores (0-100 with thresholds), explainability media (explainabilityMedia field), clinical indicators, severity scores, image quality |
| Clinical troubleshooting | `apps/eu-ifu-mdr/versioned_docs/version-1.1.0.0/troubleshooting/clinical.mdx` | How to interpret interpretive distributions, entropy as uncertainty measure, top-5 accuracy approach, explainability media for understanding AI reasoning |
| JSON output example | `apps/eu-ifu-mdr/src/components/AnonymousDiagnosticReport/_anonymous_diagnostic_report_json.mdx` | Complete JSON output specimen with all explainability fields populated |

LR requirements correctly listed in R-DAG:

  • LR-9WR (Device outputs interpretation guidance): Explains probability distribution format, entropy scores, heat maps, clinical indicator meanings
  • LR-4RZ (Warnings and precautions): Warns that outputs support (not replace) clinical judgment; requires review of explainability media
  • LR-8YN (Device supervision requirement): Mandates HCP supervision; final diagnostic decisions remain with HCP
  • LR-4XK (Read the IFU before use): Directs users to the complete IFU

Gap: BSI notes that "none of the tests appear to verify information about device outputs in the IFU." This overlaps with M2 Q2, which also flags that labeling requirements verification evidence could not be found. The LR codes in R-DAG are the correct mitigation references, but the verification chain for labeling requirements is incomplete. Our M2 response will establish the LR verification chain; N3 can cross-reference it.

Corrective action: No change needed to R-DAG's mitigation requirements for this item (LR codes are correct). The labeling verification gap is addressed systemically in M2 Q2.


Mitigation 2: "The medical device returns metadata about the output that helps supervising it, such as explainability media and other metrics"​

Status: Implemented and verified. Traceability broken — wrong SRS codes and test cases referenced in R-DAG.

SRS requirements that implement this mitigation (exist but NOT listed in R-DAG):

| SRS code | Title | What it requires |
| --- | --- | --- |
| SRS-0AB | Generate per-image ICD analysis with explainability heat map | For each image, generate: ICD category probabilities + explainability object with Base64-encoded heat map (heatMap), its contentType, and title |
| SRS-K7M | Orchestrate diagnosis support workflow | Generate pixel-level attention indicators (heat maps or saliency masks) that highlight image regions most influential to each predicted category |

Note: SRS-Q9M (Clinical Signs Analysis Endpoint) was considered but excluded. SRS-Q9M covers the POST /clinical-signs-analysis severity assessment endpoint, which is a different analysis pathway from the ICD diagnosis workflow. R-DAG's risk is specifically about the ICD interpretive distribution, so only SRS codes directly implementing the ICD pathway should be referenced to keep traceability tight and defensible.

Test cases that verify this mitigation (exist but NOT listed in R-DAG):

| Test ID | Case ID | Title | What it verifies | SRS |
| --- | --- | --- | --- | --- |
| T123 | C256 | Verify response includes per-image ICD probabilities and heat maps for top five categories | explanation.attentionMap objects, colour model data, Base64-encoded image data | SRS-0AB |
| T132 | C265 | Verify diagnosis workflow returns ranked ICD-11 codes, binary indicators, and explainability maps | Entropy of result, pixel-level attention indicators (heat maps/saliency masks) for top-5 conclusions | SRS-K7M |

What is currently in R-DAG instead: SRS-7PJ (API port listening), SRS-AQM (HTTP status codes), etc., verified by C50 (accepts HTTP requests), C62 (returns 200), etc. — entirely unrelated to explainability.
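To make the contrast concrete, a mitigation-level assertion in the spirit of C256 checks the content of the explainability object, not merely the transport. The payload shape below is a hypothetical minimal example (field names explanation.attentionMap, contentType, and data follow the table above; the exact response schema is not reproduced here):

```python
import base64

def check_attention_map(conclusion: dict) -> bool:
    """Sketch of a mitigation-level check: the conclusion must carry a
    decodable heat map, not merely arrive with a 200 status code."""
    att = conclusion["explanation"]["attentionMap"]
    assert att["contentType"].startswith("image/")
    base64.b64decode(att["data"], validate=True)  # raises on invalid Base64
    return True

# Hypothetical minimal payload, for illustration only
sample_conclusion = {
    "explanation": {
        "attentionMap": {
            "contentType": "image/png",
            "data": base64.b64encode(b"\x89PNG\r\n").decode("ascii"),
        }
    }
}
assert check_attention_map(sample_conclusion)
```

An infrastructure test like C62 passes regardless of whether this object is present; that is precisely why BSI found the current references irrelevant.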

Corrective action: Add SRS-0AB, SRS-K7M to mitigationRequirements. Add C256 (T123), C265 (T132) to verificationOfImplementation.


Mitigation 3: "The device returns an interpretative distribution representation of possible ICD categories, not just one single condition"​

Status: Implemented and verified. Traceability broken — wrong SRS codes and test cases referenced in R-DAG.

SRS requirements that implement this mitigation (exist but NOT listed in R-DAG):

| SRS code | Title | What it requires |
| --- | --- | --- |
| SRS-Q3Q | Generate an aggregated ICD probability distribution from a set of images | Return a normalized probability distribution across all ICD categories (not a single diagnosis). Each element contains: calculated probability, official ICD code, display name, system identifier, and version |
| SRS-K7M | Orchestrate diagnosis support workflow | Compute normalized probability vector across all supported ICD-11 categories (sum = 100%). Generate top-5 ranked output with ICD-11 codes and confidence scores |

Test cases that verify this mitigation (exist but NOT listed in R-DAG):

| Test ID | Case ID | Title | What it verifies | SRS |
| --- | --- | --- | --- | --- |
| T122 | C255 | Verify API returns aggregated ICD probability distribution with structured code details | hypotheses array with numeric probability fields, valid ICD-11 code structures, distribution across all categories | SRS-Q3Q |
| T132 | C265 | Verify diagnosis workflow returns ranked ICD-11 codes, binary indicators, and explainability maps | Top-5 ranked ICD-11 categories, probability sum = 100% across full distribution, entropy, five binary indicators | SRS-K7M |

Additionally, the AI Models Integration Tests (T307-T379, C466-C539) verify that each individual AI model produces correct probability_distribution outputs and icd_distribution data with entropy scores and top-5 predictions — providing model-level evidence that the interpretive distribution is generated correctly at every layer of the system.
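The distribution properties attributed above to SRS-Q3Q and SRS-K7M can be stated as executable assertions. A hedged sketch (the function name and sample payload are hypothetical; the sum-to-100% and multi-category properties come from the requirements as summarised above):

```python
import math

def check_icd_distribution(conclusions: list) -> bool:
    """Assert interpretive-distribution properties: a full normalized
    distribution across categories, not one single condition."""
    probs = [c["probability"] for c in conclusions]
    assert len(probs) > 1                                # more than one category
    assert math.isclose(sum(probs), 100.0, abs_tol=0.1)  # normalized to 100%
    # Shannon entropy of the distribution -- the uncertainty measure the IFU
    # directs the HCP to consult (the 0-100 score scaling is not shown here)
    entropy = -sum((p / 100) * math.log2(p / 100) for p in probs if p > 0)
    assert entropy >= 0
    return True

# Hypothetical three-category distribution, for illustration only
sample = [
    {"icdCode": "EA80", "probability": 62.0},
    {"icdCode": "EA85", "probability": 25.0},
    {"icdCode": "EB90", "probability": 13.0},
]
assert check_icd_distribution(sample)
```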

Corrective action: Add SRS-Q3Q, SRS-K7M to mitigationRequirements. Add C255 (T122), C265 (T132) to verificationOfImplementation. Consider referencing the AI Models Integration Tests (T307-T379) as additional model-level verification evidence.


Mitigation 4: "AI models undergo retraining using expanded dataset of images"​

Status: This is a prospective lifecycle/process control, not a software feature. It has no software-level traceability because it should not have any.

This mitigation is fundamentally different from mitigations 1-3. It is not something the device software does at runtime — it is something the organisation does as part of its AI lifecycle management. It is:

  • Defined in GP-028 AI Development, § AI Updates → Retraining: "Retraining is performed when an algorithm's core logic or data foundation is modified. This includes training on new or updated data, implementing a new model architecture, or changing key parameters/hyperparameters."
  • Documented via R-TF-028-007 AI Retraining Report (mandatory output of any retraining)
  • Governed by GP-024 PCCP (Predetermined Change Control Plan), which classifies retraining as a minor or major AI model version change
  • Verified through R-TF-028-010 AI V&V Checks (mandatory verification before any retrained model is released)
  • Monitored via GP-028 post-market surveillance provisions, which feed back into retraining decisions

Relevant documents:

| Document | Path |
| --- | --- |
| GP-028 AI Development | `apps/qms/docs/procedures/GP-028/index.mdx` |
| GP-024 PCCP | `apps/qms/docs/procedures/GP-024/index.mdx` |
| T-028-007 AI Retraining Report template | `apps/qms/docs/procedures/GP-028/Templates/T-028-007.mdx` |
| R-TF-028-010 AI V&V Checks (v1.1.0.0) | `apps/qms/docs/legit-health-plus-version-1-1-0-0/product-verification-and-validation/artificial-intelligence/r-tf-028-010-aiml-vv-checks.mdx` |

Important distinction — prospective vs. completed: No retraining has been performed for v1.1.0.0 (no completed R-TF-028-007 record exists). Retraining is a prospective control: it will be triggered when PCCP criteria are met (e.g., post-market data indicating performance drift, new training data available). The mitigation statement in R-DAG should therefore be reworded to reflect this accurately:

  • Current wording (misleading): "AI models undergo retraining using expanded dataset of images."
  • Proposed wording: "AI models are subject to retraining under expanded datasets as governed by GP-028 (§ AI Updates → Retraining) and GP-024 (PCCP), with verification through R-TF-028-010 (AI V&V Checks) before any retrained model is released."

This wording honestly describes the control without implying retraining has already occurred for this version.

Gap: The risk management record currently references only software test cases in verificationOfImplementation. There is no mechanism to reference process-level controls. The retraining mitigation has no explicit traceability at all in R-TF-013-002.

Corrective action:

  1. Reword the mitigation statement in implementedMitigations to use the proposed wording above.
  2. Add a reference to GP-028 (§ AI Updates → Retraining), GP-024 (PCCP), and R-TF-028-010 (AI V&V Checks) in verificationOfImplementation. This requires extending the verification text to include process-level references alongside test case references.
  3. In the response to BSI, explicitly explain that retraining is a lifecycle control verified through QMS process adherence, not through runtime software tests, and that it is a prospective control governed by PCCP.

"It is unclear if other risks are similarly impacted" — Systematic audit results​

BSI explicitly asks whether other risks have the same traceability gap. A systematic audit of all 62 risks in R-TF-013-002 was performed, checking three criteria:

  1. Whether mitigationRequirements SRS codes are just copies of causeRequirements (rather than codes implementing the actual mitigations)
  2. Whether verificationOfImplementation test cases verify the mitigation requirements (not just the cause requirements)
  3. Whether process-level controls (e.g. retraining) have any traceability at all
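The three criteria can be mechanised against the record. A sketch of the audit pass, assuming each risk entry is a dict using the field names from R-TF-013-002; the INFRA_TESTS set is the infrastructure group from the root cause diagnosis, and the function itself is hypothetical tooling, not part of the QMS:

```python
INFRA_TESTS = {"C106", "C454", "C455", "C50", "C62", "C68", "C73", "C77"}

def audit_risk(risk: dict) -> list:
    """Return the traceability findings for one risk entry."""
    findings = []
    cause = set(risk.get("causeRequirements", []))
    srs_mit = {c for c in risk.get("mitigationRequirements", [])
               if c.startswith("SRS-")}
    # Criterion 1: mitigation SRS codes are a plain copy of the cause codes
    if srs_mit and srs_mit == cause:
        findings.append("mitigation SRS codes duplicate cause codes")
    # Criterion 2: verification covers only infrastructure test cases
    tests = set(risk.get("verificationOfImplementation", []))
    if tests and tests <= INFRA_TESTS:
        findings.append("infrastructure-only verification")
    # Criterion 3: retraining mitigation with no process-level reference
    text = " ".join(risk.get("implementedMitigations", []))
    refs = " ".join(risk.get("verificationOfImplementation", []))
    if "retraining" in text.lower() and "GP-028" not in refs:
        findings.append("retraining mitigation has no process-level trace")
    return findings

# An R-DAG-like entry trips all three criteria
sample = {
    "causeRequirements": ["SRS-7PJ", "SRS-AQM"],
    "mitigationRequirements": ["SRS-7PJ", "SRS-AQM", "LR-4XK"],
    "verificationOfImplementation": ["C50", "C62"],
    "implementedMitigations": [
        "AI models undergo retraining using expanded dataset of images"
    ],
}
assert len(audit_risk(sample)) == 3
```

Note that a keyword check like this would miss the R-SKK typo ("retarining"), which is one reason the audit was also reviewed manually.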

Audit findings summary​

29 out of 62 risks have some form of the traceability gap BSI identified in R-DAG. They fall into three categories:

Category A: Identical cause/mitigation codes with infrastructure-only verification (21 risks) — CRITICAL​

These risks have mitigationRequirements SRS codes identical to causeRequirements — no additional mitigation codes were added. Their verification test cases only cover infrastructure (API port, HTTP status codes, JSON format, authentication, versioning). This is the exact pattern BSI flagged in R-DAG.

Infrastructure/API group (cause = SRS-7PJ, SRS-AQM, SRS-BYJ, SRS-DW0, SRS-D3N, SRS-LBS):

| Risk ID | Risk name | Mitigation type | Gap |
| --- | --- | --- | --- |
| R-T8Q | Data transmission failure from HCP system | Error handling + availability | No SRS codes for error handling or availability mitigations |
| R-3N5 | Data input failure | Error handling + availability | Same as R-T8Q |
| R-YF4 | Data accessibility failure | Error handling + availability | Same as R-T8Q |
| R-LRP | Data transmission failure | Error messages + FHIR | No LR codes for FHIR IFU documentation |
| R-MWD | Interruption of service | Elastic scaling, backups, REST | No SRS/LR codes for scaling or backup mitigations |
| R-OM1 | Data overwrite | REST protocol immutability | Architectural argument, no distinct mitigation code |
| R-B63 | Inconsistent or unreliable output | Algorithm V&V with representative datasets | Process-level (GP-012), no requirement code |
| R-VL1 | Device failure or performance degradation | Elastic scaling + error messages | No SRS for auto-scaling; no LR for error messaging |
| R-72D | SOUP anomaly/incompatibility | Careful SOUP analysis | Process-level mitigation, no requirement trace |
| R-MQ1 | SOUP not maintained nor patched | SOUP monitoring and patching | Process-level mitigation, no requirement trace |

Regulatory/GSPR group:

| Risk ID | Risk name | Mitigation type | Gap |
| --- | --- | --- | --- |
| R-QLF | Non-compliance with GSPR | Develop per harmonised standards | Process-level, no SRS/LR trace |
| R-ES8 | Absence of risk management process | ISO 14971 implementation | Process-level, no SRS/LR trace |
| R-C6Q | Absence of PMS & PMCF process | PMS/PMCF plans | Process-level, no SRS/LR trace |
| R-27M | Inadequate maintenance | Maintenance plan | Process-level, no SRS/LR trace |
| R-HH0 | Electronic data tampered | OAuth/JWT, encryption, SSL/TLS | Security SRS codes exist (SRS-1KW, SRS-WER, SRS-SDZ, SRS-WGF) but are NOT referenced |
| R-9SS | SOUP cybersecurity vulnerabilities | SOUP analysis + design review | Process-level, no requirement code |
| R-33B | Electronic IFU tampered | GPG signed commits, RBAC, branch approvals | Toolchain controls, no product-level SRS/LR codes |

AI/ML group:

| Risk ID | Risk name | Mitigation type | Gap |
| --- | --- | --- | --- |
| R-GY6 | Inaccurate training data | Careful image selection, hired HCPs | Process-level, no requirement trace |
| R-7US | Biased or incomplete training data | Same as R-GY6 | Same gap |
| R-75L | Stagnation of model performance | Plan for retraining, data augmentation | Process-level, no requirement trace |
| R-PWK | Degradation of model performance | Manual retraining, data augmentation | Process-level, no requirement trace |

Category B: Retraining mitigation with no traceability (5 risks) — HIGH​

These risks include "AI models undergo retraining" as an implemented mitigation but have no corresponding requirement code or process-level verification reference:

| Risk ID | Risk name | Mitigation wording | Additional issue |
| --- | --- | --- | --- |
| R-DAG | Wrong result (ICD distribution) | "AI models undergo retraining using expanded dataset of images" | The original BSI finding |
| R-75H | Incorrect clinical information | "AI models undergo retraining using expanded dataset of images" | Same infrastructure-only verification as R-DAG |
| R-SKK | Incorrect results shown to patient | "AI models undergo retarining [sic] using expanded dataset of images" | Typo: "retarining" → "retraining" |
| R-75L | Stagnation of model performance | "We plan for re-training during the design and development process" | Also in Category A |
| R-PWK | Degradation of model performance | "we plan for exclusively manual retraining" | Also in Category A |

Category C: Risks with better traceability (not impacted)​

R-BDR (Misinterpretation of data returned by the device) was initially suspected but appears better traced than R-DAG. It adds LR codes (LR-4XK, LR-9WR, LR-8HV, LR-5TG) beyond the cause codes, and its verification test set (C368, C369, C373, C374, etc.) includes FHIR-specific tests, not just the generic infrastructure set. However, R-BDR should still be reviewed to confirm its LR verification chain is complete.

The remaining 33 risks either have no mitigations (risks accepted without control), have correctly differentiated mitigation codes, or have mitigations whose traceability is appropriate.

How to report this to BSI​

The response should:

  1. Acknowledge that the audit found additional risks with the same traceability pattern
  2. Categorise the findings: (a) risks where mitigation codes need correction, (b) risks where process-level controls need traceability references
  3. State that all affected risks have been corrected in the updated R-TF-013-002 (red-lined version provided)
  4. Note the R-SKK typo correction as part of the update
  5. Confirm that risks not in these categories were verified as correctly traced

Relationship with other NCs​

| NC | Overlap with N3 | How to handle in N3 response |
| --- | --- | --- |
| M2 Q2 | Labeling requirements (LR-XXX) verification gap. The LR codes in R-DAG are correct, but the verification evidence for labeling requirements is also questioned in M2. Our M2 response establishes the LR verification chain. | N3 should state: "The LR codes (LR-4XK, LR-9WR, LR-4RZ, LR-8YN) are the correct mitigation references for this item. These labeling requirements are verified against the IFU content as documented in R-TF-012-037; the complete verification evidence for labeling requirements is provided in our response to M2 Q2." This makes N3 self-contained while avoiding duplication. |
| M1 Q4 | BSI found that response.json for test T377 was missing icd_distribution and top_5_predictions keys. This relates directly to mitigations 2 and 3 of R-DAG (probability distribution, ICD categories). | N3 should note that the AI Models Integration Tests (T307-T379) provide model-level verification evidence for ICD distributions, and reference M1 Q4 for the detailed explanation of the test evidence format. |

Response strategy​

Approach: Acknowledge the traceability gap, demonstrate the implementations exist, provide corrected documentation, and report the results of a systematic audit of all risks.

The response to BSI should:

  1. Acknowledge that BSI correctly identified a traceability gap in R-TF-013-002 for R-DAG, per ISO 14971:2019 clause 7.2 (verification of implementation of risk control measures)
  2. Provide a mitigation-by-mitigation mapping for R-DAG showing: mitigation statement → SRS/LR requirement(s) → test case(s) → result, demonstrating compliance with ISO 14971:2019 clause 7.2 and Annex II 6.1(b)
  3. Explain that "retraining" is a prospective lifecycle control governed by GP-028 and GP-024 (PCCP), which will be verified through R-TF-028-010 (AI V&V Checks) before any retrained model is released — not through runtime software tests. Cite ISO 14971:2019 clause 7.2 note on risk control measures that may include "inherent safety by design, protective measures, or information for safety"
  4. Confirm that the retraining mitigation statement has been reworded to accurately reflect its prospective nature
  5. State that R-TF-013-002 has been updated with correct traceability for R-DAG (red-lined version provided), satisfying Annex II 6.2(f)
  6. Report audit results: A systematic audit of all 62 risks identified 29 risks with analogous traceability gaps (21 with identical cause/mitigation codes, 5 with untraced retraining mitigations, plus overlap). All have been corrected in the updated R-TF-013-002. This addresses ISO 14971:2019 clause 7.6 (completeness of risk control)
  7. Confirm that the benefit-risk analysis conclusions in R-TF-013-002 are unchanged by the traceability corrections, per ISO 14971:2019 clause 7.4
  8. Cross-reference M2 Q2 for the labeling requirements verification chain, while keeping N3 self-contained
  9. Reference GSPR 1 (intended performance), GSPR 4 (risk management system), and GSPR 17.2 (diagnostic accuracy) to tie corrective actions back to the cited requirements

Decision: infrastructure SRS codes in R-DAG​

Decision: Keep existing infrastructure codes AND add the correct mitigation codes (Option C).

Rationale: The infrastructure codes (SRS-7PJ, SRS-AQM, SRS-BYJ, SRS-DW0, SRS-D3N, SRS-LBS) provide the foundational transport layer through which the clinical outputs are delivered. While they do not directly implement the mitigations BSI flagged, removing them could be seen as overcorrection and BSI has not asked for their removal. The correct approach is to add the missing mitigation-specific codes (SRS-Q3Q, SRS-0AB, SRS-K7M) alongside the existing infrastructure codes, making the traceability chain complete.
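Under Option C, the corrected traceability is the union of what exists and what was missing. A sketch of the resulting code sets (list layout and the mixing of process references into the verification list are illustrative; the actual record format may differ):

```python
# Existing infrastructure/transport codes -- kept per Option C
infrastructure = ["SRS-7PJ", "SRS-AQM", "SRS-BYJ", "SRS-DW0", "SRS-D3N", "SRS-LBS"]
# Mitigation-specific codes to add (mitigations 2 and 3)
mitigation_specific = ["SRS-Q3Q", "SRS-0AB", "SRS-K7M"]
# Labeling references, already correct (mitigation 1)
labeling = ["LR-4XK", "LR-9WR", "LR-4RZ", "LR-8YN"]

corrected_mitigation_requirements = infrastructure + mitigation_specific + labeling

corrected_verification = [
    # existing infrastructure tests, kept
    "C106", "C454", "C455", "C50", "C62", "C68", "C73", "C77",
    # mitigation-level tests added (corrective actions 1-2)
    "C255", "C256", "C265",
    # process-level references for the retraining control (corrective action 3)
    "GP-028 (AI Updates -> Retraining)", "GP-024 (PCCP)", "R-TF-028-010",
]

# Nothing removed, everything added
assert set(infrastructure) <= set(corrected_mitigation_requirements)
assert set(mitigation_specific) <= set(corrected_mitigation_requirements)
```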

Corrective actions summary​

| # | Action | What to change | File | Status |
| --- | --- | --- | --- | --- |
| 1 | Add correct SRS codes to R-DAG mitigationRequirements | Add SRS-Q3Q, SRS-0AB, SRS-K7M | R-TF-013-002.json | To do |
| 2 | Add correct test cases to R-DAG verificationOfImplementation | Add C255 (T122), C256 (T123), C265 (T132); reference AI Models Integration Tests (T307-T379) as additional model-level evidence | R-TF-013-002.json | To do |
| 3 | Add process-level traceability for retraining | Reference GP-028 (§ AI Updates → Retraining), GP-024 (PCCP), R-TF-028-010 in verificationOfImplementation | R-TF-013-002.json | To do |
| 4 | Reword retraining mitigation statement | Change from present-tense "undergo" to prospective "are subject to" wording | R-TF-013-002.json | To do |
| 5 | Fix R-SKK typo | "retarining" → "retraining" | R-TF-013-002.json | To do |
| 6 | Correct all 29 audited risks | For each: add correct mitigation codes, add process-level references where applicable, verify test case mapping | R-TF-013-002.json | To do |
| 7 | Add security SRS codes to R-HH0 | Add SRS-1KW, SRS-WER, SRS-SDZ, SRS-WGF (exist but not referenced) | R-TF-013-002.json | To do |
| 8 | Keep infrastructure SRS codes in R-DAG | Do NOT remove SRS-7PJ, SRS-AQM, etc. — add alongside, not replace | R-TF-013-002.json | Decision made |
| 9 | Generate red-lined R-TF-013-002 PDF | For BSI submission | Export from QMS | To do |
All the information contained in this QMS is confidential. The recipient agrees not to transmit or reproduce the information, neither by himself nor by third parties, through whichever means, without obtaining the prior written permission of Legit.Health (AI Labs Group S.L.)