
Research and planning

Internal working document

This page is for internal planning only. It will not be included in the final response to BSI.

What BSI is asking

BSI's clinical reviewer finds the clinical benefit, performance, and safety outcomes, and their SotA-based acceptance criteria, unclear and insufficiently traced. The SotA document contains article summaries but not a complete analysis showing how those summaries were used to derive acceptance criteria. BSI raises three distinct but interrelated concerns:

1. Clinical benefit

  • Benefits/acceptance criteria hard to follow in CEP §17.4 (the clinical benefits table at lines 281-289 of R-TF-015-001). Seven benefits, each with multiple means of measure and magnitude thresholds, are presented in a dense table without narrative explanation.
  • "Top-1 accuracy" not defined. The CEP and performance claims use "top-1 accuracy" throughout without explaining what it means. This is an AI/ML metric (the proportion of cases where the correct diagnosis appears as the model's highest-ranked prediction) that clinical reviewers may not know.
  • No SotA traceability for acceptance criteria. Each benefit has specific numerical thresholds (e.g., "15% improvement in diagnostic accuracy", "AUC >= 90%") but the CER/CEP does not show which SotA articles these values were derived from or why they were chosen.
  • 0ZC remote care appears to contradict CEP §14 (use environment). Benefit 0ZC claims "remote diagnosis" and "remote referral" capability, but the use environment text (en.json line 507, rendered in the CEP) states: "The device is intended to be used in the setting of healthcare organisations and their IT departments, which commonly are situated inside hospitals or other clinical facilities." BSI sees a contradiction. However, as analysed below, there is no actual contradiction — the use environment describes the device's IT deployment context (where the API runs), not the clinician's physical location. Teledermatology is a workflow modality that operates within the stated use environment. The CER needs to clarify this distinction for BSI.

2. Clinical performance

  • Too many claims, hard to follow. ~148 performance claims (in performanceClaims.ts) across 8 studies, 7 benefits, multiple metrics and user groups. No summary or navigation aid.
  • "Multiple conditions" is vague. Many claims use indications: "Multiple conditions" without specifying which conditions are included.
  • Data pooling unexplained. The globalValueOfDevice is computed as a weighted average across studies (formula: Σ(achievedValue × sampleSize) / Σ(sampleSize), written out after this list) but this methodology is not described in the CER or CEP. BSI asks "how/why data was pooled."
  • Some acceptance criteria seem low. For example, 0ZC sensitivity of 30% for remote referrals; 9VW absolute accuracy of 54% for rare diseases; 5RB unweighted kappa of 0.6 for alopecia severity.
  • No SotA traceability. Same as for benefits: the performance claims have acceptanceCriteriaStateOfTheArtValue fields populated (numeric SotA baselines exist in the data) but the CER/CEP does not trace these values to specific SotA articles or explain the derivation.
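For reference, the pooling formula in display form (notation ours: $v_i$ is the achievedValue and $n_i$ the sampleSize of study $i$ within a pooled group):

$$
\text{globalValueOfDevice} = \frac{\sum_i v_i \, n_i}{\sum_i n_i}
$$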

3. Clinical safety

  • Safety rates not traced to SotA. The CEP safety endpoints (lines 470-476 of R-TF-015-001) use generic language: "Nb cases of device outputs incorrect clinical information < residual probability in RMF." These are compared to the Risk Management File probabilities, not to SotA/similar device rates from the literature.
  • No justification of appropriateness/relevance of the safety approach.

What regulations are at stake

  • MDR Annex XIV, 1(a), sub-bullet 4: CEP must include "a clear specification of [...] the relevant and specified clinical outcome parameters used to determine, based on the state of the art in medicine, the acceptability of the benefit-risk ratio for the [...] intended clinical benefit(s)" — this requires traceability from acceptance criteria to SotA.
  • MDR Annex XIV, 1(a), sub-bullet 6: CEP must include "an indication of the clinical performance parameters and clinical safety parameters to be determined during the clinical evaluation, with justification" — BSI expects these to be traced to SotA and similar devices.
  • MDR Article 2(53): "'clinical benefit' means the positive impact of a device on the health of an individual, expressed in terms of a meaningful, measurable, patient-relevant clinical outcome(s), including outcome(s) related to diagnosis" — BSI wants to see each benefit expressed in these terms with clear measurability.
  • MDR Annex II: Technical documentation must include "a discussion of the clinical benefits to patients with reference to relevant regulatory requirements."

Root cause analysis

The root cause spans four interconnected gaps:

  1. SotA document is descriptive, not analytical. R-TF-015-011 contains a systematic literature search (226 articles screened, 64 retained), appraisal scores (CRIT1-7 framework), and article summaries organized by clinical application (malignancy detection, diagnostic accuracy, referral accuracy, teledermatology, severity assessment). However, it does NOT contain an explicit derivation of acceptance criteria from these articles. The SotA tells you what the literature says about PCP diagnostic accuracy (e.g., Burton 1998: 56.4%; Gerbert 1996: 56.3%) but does NOT show a calculation or justification like: "Given PCP baseline accuracy of 56.4% (Burton 1998, score 8.5/10), we set the acceptance criterion at 10% improvement to reach 62%, because this represents a clinically meaningful improvement based on [reason]."

  2. Performance claims data model has SotA values but no references. Each performance claim in performanceClaims.ts has an acceptanceCriteriaStateOfTheArtValue field (e.g., claim MRT: acceptanceCriteriaStateOfTheArtValue: 0.0636), but this value is not linked to a specific SotA article or page number. The data model stores the baseline number but not its provenance. (A sketch of this gap, and a possible provenance field, follows this list.)

  3. Safety is not benchmarked against literature. The CEP safety endpoints compare against the device's own Risk Management File probabilities, not against SotA rates for similar devices or standard clinical practice. The SotA document includes a vigilance database search (MAUDE, EUDAMED) that found zero incidents for similar devices — but this is presented as a search result, not integrated into the safety endpoint framework.

  4. Use environment text is ambiguous, not contradictory. The use environment text describes the device's IT deployment context ("healthcare organisations... situated inside hospitals or other clinical facilities"), which is correct — the API runs within healthcare org infrastructure. BSI read this as restricting the clinician's physical location, but it doesn't. The text needs clarification in the CER response, not fundamental revision. The device is an API: the "use environment" is the server/IT infrastructure, and both in-person and teleconsultation workflows operate within it.
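The provenance gap in item 2 is easiest to see in code. A minimal sketch, assuming a simplified claim shape: only acceptanceCriteriaStateOfTheArtValue and the MRT value come from the real data model; sotaReference and everything else is hypothetical.

```ts
// Sketch only, not the real performanceClaims.ts model, which has many more
// fields. `sotaReference` is a hypothetical addition illustrating how fix #1
// could record provenance per claim.
interface SotaReference {
  articleId: string; // e.g. an article identifier within R-TF-015-011
  baselineValue: number; // the value taken from that article
  rationale: string; // why this baseline justifies the chosen criterion
}

interface PerformanceClaimSketch {
  id: string; // e.g. 'MRT'
  acceptanceCriteriaStateOfTheArtValue: number; // exists today, but unsourced
  sotaReference?: SotaReference; // proposed field closing the provenance gap
}

// Current state of claim MRT: the SotA baseline exists, its source does not.
const claimMRT: PerformanceClaimSketch = {
  id: 'MRT',
  acceptanceCriteriaStateOfTheArtValue: 0.0636,
  // sotaReference is exactly what BSI could not find
};
```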

Relevant QMS documents

| Document | Path | Relevance |
| --- | --- | --- |
| CEP, Clinical Benefits table | R-TF-015-001-Clinical-Evaluation-Plan.mdx, lines 281-289 | The "§17.4" BSI references — 7 benefits with means of measure and magnitude thresholds |
| CEP, Safety endpoints | R-TF-015-001-Clinical-Evaluation-Plan.mdx, lines 466-478 | Safety objectives mapped to risk IDs, with generic acceptance criteria |
| CEP, Pivotal investigations | R-TF-015-001-Clinical-Evaluation-Plan.mdx, lines 660-675 | 8 study protocols with acceptance criteria per study |
| SotA document | R-TF-015-011-State-of-the-Art.mdx | 64 appraised articles, organized by clinical application. Contains baselines but no derivation of acceptance criteria |
| CER | R-TF-015-003-Clinical-Evaluation-Report.mdx | Clinical evaluation results, safety conclusions. BSI reviewed this and found traceability gaps |
| Performance claims data | packages/ui/src/components/PerformanceClaimsAndClinicalBenefits/performanceClaims.ts | ~148 claims with acceptanceCriteriaStateOfTheArtValue field — baselines exist but are not sourced |
| Clinical benefits data | packages/ui/src/components/PerformanceClaimsAndClinicalBenefits/clinicalBenefits.ts | 7 benefits with declarative filter criteria |
| Performance claims types | packages/ui/src/components/PerformanceClaimsAndClinicalBenefits/types.ts | globalValueOfDevice computation (data pooling formula) |
| Use environment text | packages/reusable/translations/en.json, line 507 | "situated inside hospitals or other clinical facilities" |
| Risk Management Record | R-TF-013-002-Risk-Management-Record.mdx | Residual probabilities referenced by safety endpoints |
| Use environment (fyi) | fyi/icd-distribution-vs-diagnosis.md | Essential reading: device function vs diagnosis framing |
| Use environment (fyi) | fyi/clinical-evidence-icd-distribution-rationale.md | Essential reading: why claims aren't condition-specific |

Gap analysis

What we already have

  1. SotA baselines exist in the data model. Each performance claim has an acceptanceCriteriaStateOfTheArtValue populated from SotA literature. The data exists but the provenance chain is broken — the CER/CEP does not show which article each baseline comes from.

  2. The SotA document has the right articles. R-TF-015-011 contains article summaries organized by clinical application (malignancy detection, diagnostic accuracy, referral accuracy, severity assessment, teledermatology). The summaries include quantitative baselines. The gap is that these summaries are not connected to the acceptance criteria in the CEP.

  3. Study acceptance criteria are already defined. CEP lines 660-675 list each pivotal study's acceptance criteria. For example, BI_2024: "An improvement of at least 10% in diagnostic accuracy for GPP when used by PCPs, and at least 5% by dermatologists."

  4. Data pooling methodology is coded but not documented. The globalValueOfDevice formula exists in types.ts and is documented in CLAUDE.md — but nowhere in the CER or CEP.

  5. 0ZC studies support remote care. SAN_2024 and PH_2024 both have secondary objectives evaluating remote care, and DAO_O_2022 includes teledermatology referral assessment. The clinical evidence exists; the use environment text simply does not spell out this workflow (see the analysis below: a clarification, not a correction).

  6. Safety rates from studies. The CER states "no serious incidents reported" across all studies, and the vigilance search found zero incidents for similar devices. This data exists but is not structured as a traceability table.

What BSI couldn't find

  1. Derivation of acceptance criteria from SotA. A table or narrative showing: SotA article X reports baseline Y; we set acceptance criterion Z because [justification]; this maps to benefit [ID].

  2. Definition of "Top-1 accuracy." Nowhere in the CEP or CER is this AI/ML metric explained for a clinical audience.

  3. Explanation of data pooling. The globalValueOfDevice weighted average methodology is not documented in any regulatory document.

  4. Traceability from safety rates to SotA/similar devices. Safety endpoints reference the RMF but not the literature.

  5. Clear use environment for remote care. See analysis below — this is not actually a contradiction.

What genuinely needs updating

  1. Add SotA derivation traceability (for acceptance criteria): A section in the CER (or CEP, or both) that traces each acceptance criterion back to specific SotA articles, showing the baseline value, the article(s), and the rationale for the chosen threshold. (An illustrative row format follows this list.)

  2. Define "Top-1 accuracy" in the CER/CEP glossary section.

  3. Document the data pooling methodology in the CER: explain the weighted-average formula and why cross-study aggregation is appropriate.

  4. Add safety benchmarking against SotA: Create a table that compares observed safety outcomes from our studies against (a) similar device incident rates from vigilance databases, and (b) standard clinical practice safety rates from SotA literature.

  5. Clarify (not change) the use environment text in the CER response to BSI — see analysis below.

  6. Add a navigable summary for the ~148 performance claims (e.g., summary table per benefit showing aggregate results, with detailed claims as supporting detail).

  7. Clarify "multiple conditions" — explain what this indication label means in the context of each study.
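To make item 1 concrete, one possible row format for the derivation table, populated with figures already cited in this document (the rationale wording is illustrative, not final):

| Acceptance criterion | SotA baseline | Source article(s) | Rationale (illustrative) |
| --- | --- | --- | --- |
| BI_2024: ≥10% improvement in diagnostic accuracy for PCPs | PCP diagnostic accuracy ≈ 56.4% | Burton 1998 (appraisal 8.5/10); Gerbert 1996 (56.3%) | Raises unaided PCP accuracy to ≈ 62%, a clinically meaningful gain |
| 5RB: unweighted kappa ≥ 0.6 for alopecia severity | Interobserver ICC = 0.47 (HS severity) | R-TF-015-011, severity assessment section (article TBD) | Exceeds interobserver agreement typically reported in the literature |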

Use environment vs remote care — NOT a contradiction (clarification needed, not a fix)

BSI reads the use environment text ("healthcare organisations... situated inside hospitals or other clinical facilities") and benefit 0ZC ("remote diagnosis", "remote referral") and sees a contradiction. On closer analysis, there is no contradiction. The apparent conflict arises from conflating two different concepts:

1. Use environment = where the device is deployed (IT infrastructure). The use environment text describes the device's deployment context: it runs as an API integrated into a healthcare organisation's IT system. The two sentences say:

  • "The device is intended to be used in the setting of healthcare organisations and their IT departments, which commonly are situated inside hospitals or other clinical facilities."
  • "The device is intended to be integrated into the healthcare organisation's system by IT professionals."

This describes where the software runs — on the healthcare organisation's servers/infrastructure. It does NOT restrict where the clinician sits when they access the device through their organisation's system.

2. Remote care = a clinical workflow modality, not a change in use environment. A dermatologist reviewing images from home through their hospital's system is still using the device "in the setting of healthcare organisations." The device is running on the organisation's infrastructure, accessed through the organisation's authenticated systems, within the organisation's clinical workflow. Teledermatology is standard clinical practice — the MDR does not require that clinicians be physically inside a hospital to use a cloud/API-based SaMD.

MDR regulatory basis: MDR Annex I GSPR 14.1 requires specifying "conditions of use" and "use environment." For SaMD, this means the IT environment, network requirements, and integration context (see also MDCG 2019-11 on SaMD qualification). The MDR does not define "use environment" as the physical location of the end user — that would be unworkable for any cloud-based or API-based medical device, as clinicians routinely access hospital systems remotely.

Clinical evidence supports this reading:

  • Study SAN_2024: "conducted remotely by sending the images to the participating professionals"
  • Study PH_2024: "conducted remotely via image analysis by participating primary care professionals"
  • Study BI_2024: "conducted remotely by sending the images to the participating dermatologists"
  • Study COVIDX_EVCDAO_2022: "continuous and remote monitoring of patient condition severity"
  • Planned studies triaje_VH_2025 and clinical_VH_2025: explicitly target "automated triage in teledermatology"

In every case, the device was deployed in a healthcare organisation's infrastructure. The "remote" aspect refers to the clinical workflow (image-based teleconsultation), not a different deployment environment.

Response strategy for BSI: Do NOT frame this as "we need to change the use environment." Instead:

  1. Explain that the use environment text describes the device's deployment context (healthcare org IT infrastructure), not the clinician's physical location.
  2. Note that teledermatology is a workflow modality that operates entirely within the stated use environment — the device runs on the healthcare organisation's servers regardless of whether the consultation is in-person or remote.
  3. Point to the clinical studies that validate the device in both in-person and remote workflows, all deployed within healthcare organisations' systems.
  4. If BSI wants the text to be more explicit, we can add a clarifying sentence (e.g., "The device supports both in-person and teleconsultation clinical workflows within this deployment environment") — but this is a clarification, not a correction. The current text does not exclude remote use.

No decision pending. This does not require a regulatory decision about the intended purpose. The device already supports teledermatology. The use environment text already permits it. BSI's observation stems from a misreading of "healthcare facilities" as restricting clinician location rather than describing the IT deployment context.

Response strategy

Regulatory mapping

| BSI concern | GSPR / Annex clause | How our corrective action addresses it |
| --- | --- | --- |
| Acceptance criteria not traced to SotA | Annex XIV 1(a) sub-bullet 4 | Add derivation table linking each acceptance criterion to SotA article, baseline value, and justification |
| Performance/safety parameters not justified | Annex XIV 1(a) sub-bullet 6 | Add justification narrative for each parameter and benchmark against SotA/similar devices |
| Clinical benefits unclear/not measurable | Article 2(53) | Add explanatory text defining each metric (including "Top-1 accuracy"), with clear patient-relevant outcomes |
| Data pooling methodology | Annex II | Document the weighted-average formula and justification in the CER |
| Safety not traced to SotA | Annex XIV 1(a) sub-bullet 6; GSPR 1, 8 | Add safety benchmarking table comparing observed rates against literature and vigilance data |
| Use environment vs remote care | Annex XIV 1(a) sub-bullet 2 | Clarify in response that the use environment describes IT deployment context, not clinician physical location; add clarifying sentence to CER if needed |

Fix plan

| # | Action | Document affected | Complexity |
| --- | --- | --- | --- |
| 1 | Add "Acceptance Criteria Derivation from State of the Art" section to the CER | R-TF-015-003 CER | High — requires systematic mapping of ~30 acceptance criteria to ~64 SotA articles |
| 2 | Add glossary entry for "Top-1 accuracy" and other AI/ML metrics | R-TF-015-003 CER, R-TF-015-001 CEP | Low |
| 3 | Document data pooling methodology (weighted-average formula, grouping criteria, justification) | R-TF-015-003 CER | Medium |
| 4 | Add safety benchmarking table comparing observed safety outcomes to SotA/similar device rates | R-TF-015-003 CER | Medium |
| 5 | Clarify use environment text to explicitly state it covers both in-person and teleconsultation workflows | R-TF-015-003 CER (clarifying sentence); response to BSI (explanation) | Low — no regulatory decision needed, just clarification |
| 6 | Add benefit-level summary of performance claims (aggregate results per benefit) to improve navigability | R-TF-015-003 CER | Medium |
| 7 | Clarify "Multiple conditions" indication label — define what it means in each study context | R-TF-015-003 CER, performance claims documentation | Low |
| 8 | Justify acceptance criteria that appear low (0ZC sensitivity 30%, 9VW accuracy 54%, 5RB kappa 0.6) | R-TF-015-003 CER | Medium — needs clinical rationale |

Response approach

For each of BSI's three areas, the response should:

  1. Acknowledge that the CER/CEP lacked explicit traceability between acceptance criteria and SotA articles.
  2. Explain that the acceptance criteria were derived from the SotA literature (which was already complete) but the derivation chain was not documented in the CER/CEP — the SotA document provided article summaries, and the CEP set thresholds, but the link between them was implicit rather than explicit.
  3. Describe the fix: point to the new section(s) in the CER that now trace each acceptance criterion to its SotA source.
  4. Reference the updated CER sections with specific paragraph/table numbers.

Response tone rules (from M1.Q1 and Item 2a):

  • Do NOT argue that BSI should have found the information — acknowledge the gap and describe the fix.
  • Do NOT over-explain the data pooling mathematics — present it clearly and briefly.
  • Do NOT claim acceptance criteria are "conservative" or "stringent" — BSI flagged some as seemingly low, so address those specifically with clinical justification.
  • Do NOT frame safety as inherently risk-free because the device is SaMD — BSI expects specific benchmarks even for software.

Handling "acceptance criteria seem low"​

Some acceptance criteria need specific justification because they appear low on first read:

  • 0ZC sensitivity 30% for remote referrals: This must be contextualized against the SotA. If primary care practitioners without the device have a referral sensitivity of X% in teledermatology settings, and 30% represents an improvement or a clinically acceptable threshold for the remote use case, state this explicitly. If the 30% threshold is genuinely low, consider whether it should be revised.

  • 9VW absolute accuracy 54% for rare diseases: Rare diseases are by definition harder to diagnose. If the SotA shows PCP baseline accuracy for rare dermatological conditions is around 40-45%, then 54% with device assistance represents a meaningful improvement. The CER must show this baseline.

  • 5RB unweighted kappa 0.6 for alopecia severity: Kappa thresholds follow Landis & Koch (1977): 0.6 = "moderate" agreement. If interobserver agreement in dermatological severity assessment is typically 0.4-0.5 (the SotA data for HS shows ICC = 0.47), then kappa 0.6 represents improvement. The CER must cite this baseline.
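For reference, the Landis & Koch (1977) interpretation bands:

| Kappa | Interpretation |
| --- | --- |
| < 0.00 | Poor |
| 0.00–0.20 | Slight |
| 0.21–0.40 | Fair |
| 0.41–0.60 | Moderate |
| 0.61–0.80 | Substantial |
| 0.81–1.00 | Almost perfect |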

Cross-NC connections

Connection to Item 2a (Device Description & Intended Purpose)

Item 2a and Item 2b are two parts of the same deficiency finding. Item 2a addresses WHAT the device does (outputs, ICD categories, indications); Item 2b addresses HOW WELL the device performs (clinical benefits, acceptance criteria, SotA comparison). The fixes must be coordinated:

  • Any new "Acceptance Criteria Derivation from SotA" section in the CER should reference the device description and intended purpose language from Item 2a.
  • The "Multiple conditions" clarification in Item 2b aligns with the ICD-11 category enumeration in Item 2a.
  • The use environment reconciliation (0ZC) affects both items — Item 2a defines the intended purpose scope, Item 2b validates performance claims within that scope.

Connection to Technical Review M1.Q1 (IFU Performance Claims)

Alignment required across reviews

Item 2b and M1.Q1 (technical review) address performance claims from different angles. M1.Q1 concerns how claims are presented in the IFU (user-facing); Item 2b concerns how they are justified in the CER (regulatory-facing). Both responses go to BSI and must be consistent:

  1. Same claims data model. The ~148 performance claims used in the IFU (via ClinicalBenefitsList components) and in the CER come from the same performanceClaims.ts. Any change to claim structure, acceptance criteria, or SotA values affects both.

  2. Device function vs clinical benefit. M1.Q1 established the framing (uniform distributional output, context-dependent clinical benefits) that Item 2b must maintain when explaining why benefits vary across conditions while the device function is uniform.

  3. SotA baselines. M1.Q1 added acceptanceCriteriaStateOfTheArtValue to the IFU display. Item 2b must trace these same values to specific SotA articles. The values must match.

  4. "Top-1 accuracy" definition. M1.Q1 added a "How to Read the Performance Claims" section to the IFU. The same definition should appear in the CER.

Connection to Items 3a and 3b (Clinical Data)

Item 3 asks about clinical data analysis and data sufficiency. Item 2b's fixes (SotA traceability, data pooling justification, acceptance criteria rationale) directly support Item 3's requirements. The SotA derivation table created for Item 2b will be referenced in Item 3's response to demonstrate that clinical data analysis is systematically based on pre-defined benchmarks.

Connection to Item 7 (Risk)

Item 7 asks about severity justification, occurrence estimates, and residual risk. Item 2b's safety benchmarking (fix #4) directly relates — the safety endpoint improvements should align with whatever risk justification approach is used in Item 7's response.

Key research findings

Finding 1: SotA baselines exist in the data but are not traced

The performanceClaims.ts data model already contains acceptanceCriteriaStateOfTheArtValue for each claim where applicable. For example:

  • Claim MRT (top-1 accuracy, multiple conditions, all HCPs): SotA value = 0.0636 (6.36% relative improvement baseline)
  • Claim LL5 (ICC, hidradenitis suppurativa): SotA value = 0.47 (literature interobserver agreement), CI [0.32, 0.65]

These values came from the SotA literature review (R-TF-015-011) but the specific article provenance is not recorded in the data model or the CER.
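Using the sketch types from the root cause analysis above, traced provenance for claim LL5 could look like this. The values are from this document; the article reference is deliberately left open because the mapping does not yet exist.

```ts
// Hypothetical example reusing PerformanceClaimSketch / SotaReference from
// the root cause sketch; not a real performanceClaims.ts entry.
const claimLL5: PerformanceClaimSketch = {
  id: 'LL5',
  acceptanceCriteriaStateOfTheArtValue: 0.47, // ICC, CI [0.32, 0.65]
  sotaReference: {
    articleId: 'R-TF-015-011, severity assessment section (article TBD)',
    baselineValue: 0.47,
    rationale: 'Literature interobserver ICC for HS severity assessment',
  },
};
```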

Finding 2: SotA document is organized for traceability — just needs the final link

R-TF-015-011 organizes its article summaries by clinical application:

  • "Clinical data collected on malignancy detection" (line 444+)
  • "Clinical data collected on the diagnostic accuracy of HCPs" (line 456+)
  • "Clinical data collected on the referral accuracy of PCPs" (line 495+)
  • "Clinical data collected on severity assessment" (section exists)
  • "Clinical data collected on teledermatology" (section exists)

Each section includes quantitative baselines from the literature. The gap is that these baselines are not explicitly linked to acceptance criteria in the CEP/CER.

Finding 3: Data pooling is well-defined programmatically

The globalValueOfDevice computation (CLAUDE.md documentation):

  • Formula: Σ(achievedValue × sampleSize) / Σ(sampleSize)
  • Grouping: Claims are grouped when ALL of indications, userGroup, acceptanceCriteriaDomain, acceptanceCriteriaMetric, acceptanceCriteriaValueMagnitude, and performanceSubject match
  • Sample sizes sourced from clinicalStudiesData.ts

This is a methodologically sound weighted-average approach. It just needs to be documented in the CER with justification for why cross-study aggregation is appropriate (same device version, same measurement methodology, compatible study designs).
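As a concrete sketch of the pooling just described (simplified claim shape; the real computation lives in types.ts and uses the full data model):

```ts
// Pooling sketch: claims group only when ALL six fields below match, and the
// pooled value is the sample-size-weighted average of achievedValue.
interface PoolableClaim {
  indications: string;
  userGroup: string;
  acceptanceCriteriaDomain: string;
  acceptanceCriteriaMetric: string;
  acceptanceCriteriaValueMagnitude: string;
  performanceSubject: string;
  achievedValue: number;
  sampleSize: number; // sourced from clinicalStudiesData.ts
}

const groupKey = (c: PoolableClaim): string =>
  [
    c.indications,
    c.userGroup,
    c.acceptanceCriteriaDomain,
    c.acceptanceCriteriaMetric,
    c.acceptanceCriteriaValueMagnitude,
    c.performanceSubject,
  ].join('|');

// globalValueOfDevice per group = Σ(achievedValue × sampleSize) / Σ(sampleSize)
function globalValueOfDevice(claims: PoolableClaim[]): Map<string, number> {
  const sums = new Map<string, { weighted: number; n: number }>();
  for (const c of claims) {
    const key = groupKey(c);
    const acc = sums.get(key) ?? { weighted: 0, n: 0 };
    acc.weighted += c.achievedValue * c.sampleSize;
    acc.n += c.sampleSize;
    sums.set(key, acc);
  }
  const result = new Map<string, number>();
  for (const [key, { weighted, n }] of sums) {
    result.set(key, weighted / n);
  }
  return result;
}
```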

Finding 4: Safety approach relies on RMF, not SotA

CEP safety endpoints (lines 470-476) use the pattern: "Nb cases of [harm] < residual probability in RMF for the corresponding risk(s) (a possibility between 0.1% and 0.01%)." The acceptance criteria are entirely internal (reference the device's own RMF) rather than benchmarked against:

  • Adverse event rates from similar devices (vigilance search found 0 incidents for SkinVision, Molescope, Huvy, DERM, Dermalyser, FotoFinder)
  • Misdiagnosis rates in standard clinical practice from SotA literature
  • Adverse outcome rates for AI dermatology devices from published studies
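A possible skeleton for the fix #4 benchmarking table, combining the three external benchmarks above with the existing RMF criterion. The standard practice column is a placeholder until the SotA extraction is done; the layout is illustrative.

| Safety endpoint | Observed (our studies) | Similar devices (vigilance) | Standard practice (SotA) | RMF residual probability |
| --- | --- | --- | --- | --- |
| Device outputs incorrect clinical information | 0 serious incidents reported | 0 incidents (MAUDE, EUDAMED search) | to be extracted from R-TF-015-011 | 0.1%–0.01% |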

Finding 5: Use environment text is narrower than clinical evidence

The use environment (en.json line 507) says "healthcare organisations... situated inside hospitals or other clinical facilities." However:

  • Study COVIDX_EVCDAO_2022: secondary objective included "confirming that the utilization of the device elicits a high level of patient satisfaction, particularly in its remote application"
  • Study DAO_O_2022: acceptance criteria include "sensitivity and specificity equal to or superior to the PCP to identify necessary referrals in teledermatology"
  • Studies SAN_2024 and PH_2024: secondary objectives include "validate what percentage of cases could be handled remotely"

The clinical evidence clearly supports remote care use, but the use environment text was not updated to reflect this. This is likely an oversight from when the use environment text was originally written for the MDD-era legacy device.

Finding 6: "Top-1 accuracy" is a standard AI metric but undefined for clinicians

"Top-1 accuracy" measures whether the correct diagnosis appears as the model's highest-probability prediction. In the context of this device (which outputs a probability distribution), "Top-1 accuracy with device" means: the proportion of cases where the HCP, when reviewing the device's distributional output, selects the correct diagnosis as their primary choice. The comparison "Top-1 accuracy without device" is the proportion of cases where the HCP's unaided primary choice is correct. The acceptance criteria are typically expressed as the relative improvement between these two rates.

Potential weaknesses (BSI auditor perspective)

Internal working document

These concerns were identified through a critical review from the BSI auditor's perspective.

High risk: SotA analysis gap is real and substantive

BSI says: "The SotA document seems to contain a summary of each article but complete analysis is not found." This is correct. R-TF-015-011 has 64 appraised articles, each scored and summarized, but the document does NOT contain a synthesis section that says: "Based on articles X, Y, Z, the SotA diagnostic accuracy for PCPs is A%; therefore, we set the acceptance criterion at B% improvement because [justification]." The fix requires creating this synthesis — this is significant work and the highest priority.

High risk: Some acceptance criteria may be challenged as too low

BSI flagged that "some acceptance criteria seem low." Specific examples:

  • 0ZC sensitivity of 30% for remote referrals: Even in teledermatology, 30% may seem unacceptable. The justification must explain this is an improvement over baseline, or the threshold should be reconsidered.
  • 9VW accuracy of 54% with device for rare diseases: While better than ~40% baseline, BSI may consider 54% clinically insufficient for an IIb device.
  • 5RB kappa 0.6: "Moderate" agreement by Landis & Koch standards — BSI may expect "substantial" (0.61-0.80).

These need specific justification referencing the SotA baselines and clinical context, or the thresholds need to be reconsidered.

Medium risk: Data pooling may be challenged

Cross-study aggregation using weighted averages is methodologically defensible but BSI may question:

  • Are the studies sufficiently homogeneous to pool?
  • Were any studies excluded from pooling and why?
  • Is a weighted average appropriate when studies have different designs (prospective vs retrospective)?

The CER must address these questions proactively in the data pooling methodology section.

Low risk: 0ZC use environment perception

BSI perceived a contradiction between the use environment and 0ZC, but as analysed above, there is no actual contradiction — the use environment describes IT deployment, not clinician physical location. The risk is that BSI insists on a different reading. Our response should preemptively clarify the distinction and point to the clinical studies conducted in remote workflows within healthcare organisation systems. If BSI still disagrees, adding a clarifying sentence to the use environment text is a low-effort fix.

Low risk: Number of claims may invite scrutiny

~148 performance claims is a large number. BSI may question whether all are necessary or whether they represent a fragmented evidence base. A summary table per benefit (showing the aggregate picture) would help frame the detail as supporting a clear benefit case.

Open items requiring decisions

| # | Question | Who decides | Impact |
| --- | --- | --- | --- |
| 1 | Acceptance criteria values: are the low-seeming thresholds (0ZC 30%, 9VW 54%, 5RB 0.6) correct and defensible? | Jordi | If not defensible, may need to revise thresholds or withdraw specific claims |
| 2 | SotA article-to-criterion mapping: does this mapping already exist internally (e.g., in Jordi's records), or does it need to be created from scratch? | Jordi | Affects timeline — creating the mapping from scratch requires reviewing all 64 SotA articles against all acceptance criteria |