Research and planning

Internal working document

This page is for internal planning only. It will not be included in the final response to BSI.

What BSI is asking

BSI's clinical reviewer finds the clinical benefit, performance, and safety outcomes, and the acceptance criteria derived from the SotA, unclear and insufficiently traced. The SotA document contains article summaries but not a complete analysis showing how those summaries were used to derive acceptance criteria. BSI raises three distinct but interrelated concerns:

1. Clinical benefit

  • Benefits/acceptance criteria hard to follow in CEP §17.4 (the clinical benefits table at lines 281-289 of R-TF-015-001). Three benefits with sub-criteria, each with multiple means of measure and magnitude thresholds, are presented in a dense table without narrative explanation.
  • "Top-1 accuracy" not defined. The CEP and performance claims use "top-1 accuracy" throughout without explaining what it means. This is an AI/ML metric (the proportion of cases where the correct diagnosis appears as the model's highest-ranked prediction) that clinical reviewers may not know.
  • No SotA traceability for acceptance criteria. Each benefit has specific numerical thresholds (e.g., "15% improvement in diagnostic accuracy", "AUC >= 90%") but the CER/CEP does not show which SotA articles these values were derived from or why they were chosen.
  • Remote care sub-criterion of benefit 3KX (formerly 0ZC) appears to contradict CEP §14 (use environment). The sub-criterion claims "remote diagnosis" and "remote referral" capability, which appears to conflict with the use environment text (en.json line 507, rendered in the CEP): "The device is intended to be used in the setting of healthcare organisations and their IT departments, which commonly are situated inside hospitals or other clinical facilities." BSI sees a contradiction. However, as analysed below, there is no actual contradiction — the use environment describes the device's IT deployment context (where the API runs), not the clinician's physical location. Teledermatology is a workflow modality that operates within the stated use environment. The CER needs to clarify this distinction for BSI.

2. Clinical performance

  • Too many claims, hard to follow. ~148 performance claims (in performanceClaims.ts) across 8 studies, 3 benefits (7GH, 5RB, 3KX) with sub-criteria, multiple metrics and user groups. No summary or navigation aid.
  • "Multiple conditions" is vague. Many claims use indications: "Multiple conditions" without specifying which conditions are included.
  • Data pooling unexplained. The globalValueOfDevice is computed as a weighted average across studies (formula: Σ(achievedValue × sampleSize) / Σ(sampleSize)) but this methodology is not described in the CER or CEP. BSI asks "how/why data was pooled."
  • Some acceptance criteria seem low. For example, 3KX remote care sub-criterion sensitivity of 30% for remote referrals (previously coded 0ZC); 7GH rare disease sub-criterion absolute accuracy of 54% for rare diseases (previously coded 9VW); 5RB unweighted kappa of 0.6 for alopecia severity.
  • No SotA traceability. Same as for benefits: the performance claims have acceptanceCriteriaStateOfTheArtValue fields populated (numeric SotA baselines exist in the data) but the CER/CEP does not trace these values to specific SotA articles or explain the derivation.

3. Clinical safety

  • Safety rates not traced to SotA. The CEP safety endpoints (lines 470-476 of R-TF-015-001) use generic language: "Nb cases of device outputs incorrect clinical information < residual probability in RMF." These are compared to the Risk Management File probabilities, not to SotA/similar device rates from the literature.
  • No justification of appropriateness/relevance of the safety approach.

What regulations are at stake

  • MDR Annex XIV, 1(a), sub-bullet 4: CEP must include "a clear specification of [...] the relevant and specified clinical outcome parameters used to determine, based on the state of the art in medicine, the acceptability of the benefit-risk ratio for the [...] intended clinical benefit(s)" — this requires traceability from acceptance criteria to SotA.
  • MDR Annex XIV, 1(a), sub-bullet 6: CEP must include "an indication of the clinical performance parameters and clinical safety parameters to be determined during the clinical evaluation, with justification" — BSI expects these to be traced to SotA and similar devices.
  • MDR Article 2(53): "'clinical benefit' means the positive impact of a device on the health of an individual, expressed in terms of a meaningful, measurable, patient-relevant clinical outcome(s), including outcome(s) related to diagnosis" — BSI wants to see each benefit expressed in these terms with clear measurability.
  • MDR Annex II: Technical documentation must include "a discussion of the clinical benefits to patients with reference to relevant regulatory requirements."

Root cause analysis

The root cause spans four interconnected gaps:

  1. SotA document is descriptive, not analytical. R-TF-015-011 contains a systematic literature search (226 articles screened, 64 retained), appraisal scores (CRIT1-7 framework), and article summaries organized by clinical application (malignancy detection, diagnostic accuracy, referral accuracy, teledermatology, severity assessment). However, it does NOT contain an explicit derivation of acceptance criteria from these articles. The SotA tells you what the literature says about PCP diagnostic accuracy (e.g., Burton 1998: 56.4%; Gerbert 1996: 56.3%) but does NOT show a calculation or justification like: "Given PCP baseline accuracy of 56.4% (Burton 1998, score 8.5/10), we set the acceptance criterion at 10% improvement to reach 62%, because this represents a clinically meaningful improvement based on [reason]."

  2. Performance claims data model has SotA values but no references. Each performance claim in performanceClaims.ts has an acceptanceCriteriaStateOfTheArtValue field (e.g., claim MRT: acceptanceCriteriaStateOfTheArtValue: 0.0636), but this value is not linked to a specific SotA article or page number. The data model stores the baseline number but not its provenance (see the sketch after this list).

  3. Safety is not benchmarked against literature. The CEP safety endpoints compare against the device's own Risk Management File probabilities, not against SotA rates for similar devices or standard clinical practice. The SotA document includes a vigilance database search (MAUDE, EUDAMED) that found zero incidents for similar devices — but this is presented as a search result, not integrated into the safety endpoint framework.

  4. Use environment text is ambiguous, not contradictory. The use environment text describes the device's IT deployment context ("healthcare organisations... situated inside hospitals or other clinical facilities"), which is correct — the API runs within healthcare org infrastructure. BSI read this as restricting the clinician's physical location, but it doesn't. The text needs clarification in the CER response, not fundamental revision. The device is an API: the "use environment" is the server/IT infrastructure, and both in-person and teleconsultation workflows operate within it.
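
To make gap 2 concrete, here is a minimal sketch of how provenance could be recorded at the data-model level; apart from acceptanceCriteriaStateOfTheArtValue, the field and type names are illustrative assumptions, not the actual performanceClaims.ts schema:

```typescript
// Illustrative only: names other than acceptanceCriteriaStateOfTheArtValue
// are assumptions, not the real schema.

// Today: the baseline number is stored, but not where it came from.
interface PerformanceClaim {
  id: string;
  acceptanceCriteriaStateOfTheArtValue?: number; // e.g. 0.0636 for claim MRT
  // ...other fields omitted
}

// One possible extension: record provenance next to the value, so the CER
// "Acceptance Criteria Derivation from State of the Art" table could be
// generated directly from the data model.
interface SotaReference {
  articleId: string;        // citation key in R-TF-015-011 (hypothetical format)
  reportedBaseline: number; // value as reported by the article
  appraisalScore?: number;  // CRIT1-7 score, if recorded
  rationale: string;        // why this baseline supports the chosen threshold
}

interface TraceablePerformanceClaim extends PerformanceClaim {
  sotaReferences?: SotaReference[];
}
```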

Relevant QMS documents

| Document | Path | Relevance |
| --- | --- | --- |
| CEP, Clinical Benefits table | R-TF-015-001-Clinical-Evaluation-Plan.mdx, lines 281-289 | The "§17.4" BSI references — 7 benefits with means of measure and magnitude thresholds |
| CEP, Safety endpoints | R-TF-015-001-Clinical-Evaluation-Plan.mdx, lines 466-478 | Safety objectives mapped to risk IDs, with generic acceptance criteria |
| CEP, Pivotal investigations | R-TF-015-001-Clinical-Evaluation-Plan.mdx, lines 660-675 | 8 study protocols with acceptance criteria per study |
| SotA document | R-TF-015-011-State-of-the-Art.mdx | 64 appraised articles, organized by clinical application. Contains baselines but no derivation of acceptance criteria |
| CER | R-TF-015-003-Clinical-Evaluation-Report.mdx | Clinical evaluation results, safety conclusions. BSI reviewed this and found traceability gaps |
| Performance claims data | packages/ui/src/components/PerformanceClaimsAndClinicalBenefits/performanceClaims.ts | ~148 claims with acceptanceCriteriaStateOfTheArtValue field — baselines exist but are not sourced |
| Clinical benefits data | packages/ui/src/components/PerformanceClaimsAndClinicalBenefits/clinicalBenefits.ts | 3 benefits (7GH, 5RB, 3KX) with declarative filter criteria |
| Performance claims types | packages/ui/src/components/PerformanceClaimsAndClinicalBenefits/types.ts | globalValueOfDevice computation (data pooling formula) |
| Use environment text | packages/reusable/translations/en.json, line 507 | "situated inside hospitals or other clinical facilities" |
| Risk Management Record | R-TF-013-002-Risk-Management-Record.mdx | Residual probabilities referenced by safety endpoints |
| Use environment (fyi) | fyi/icd-distribution-vs-diagnosis.md | Essential reading: device function vs diagnosis framing |
| Use environment (fyi) | fyi/clinical-evidence-icd-distribution-rationale.md | Essential reading: why claims aren't condition-specific |

Gap analysis

What we already have

  1. SotA baselines exist in the data model. Each performance claim has an acceptanceCriteriaStateOfTheArtValue populated from SotA literature. The data exists but the provenance chain is broken — the CER/CEP does not show which article each baseline comes from.

  2. The SotA document has the right articles. R-TF-015-011 contains article summaries organized by clinical application (malignancy detection, diagnostic accuracy, referral accuracy, severity assessment, teledermatology). The summaries include quantitative baselines. The gap is that these summaries are not connected to the acceptance criteria in the CEP.

  3. Study acceptance criteria are already defined. CEP lines 660-675 list each pivotal study's acceptance criteria. For example, BI_2024: "An improvement of at least 10% in diagnostic accuracy for GPP when used by PCPs, and at least 5% by dermatologists."

  4. Data pooling methodology is coded but not documented. The globalValueOfDevice formula exists in types.ts and is documented in CLAUDE.md — but nowhere in the CER or CEP.

  5. 3KX remote care sub-criterion is supported by clinical evidence. SAN_2024 and PH_2024 both have secondary objectives evaluating remote care, and DAO_O_2022 includes teledermatology referral assessment. The clinical evidence exists; the use environment text simply wasn't updated to reflect this use context.

  6. Safety rates from studies. The CER states "no serious incidents reported" across all studies, and the vigilance search found zero incidents for similar devices. This data exists but is not structured as a traceability table.

What BSI couldn't find

  1. Derivation of acceptance criteria from SotA. A table or narrative showing: SotA article X reports baseline Y; we set acceptance criterion Z because [justification]; this maps to benefit [ID].

  2. Definition of "Top-1 accuracy." Nowhere in the CEP or CER is this AI/ML metric explained for a clinical audience.

  3. Explanation of data pooling. The globalValueOfDevice weighted average methodology is not documented in any regulatory document.

  4. Traceability from safety rates to SotA/similar devices. Safety endpoints reference the RMF but not the literature.

  5. Clear use environment for remote care. See analysis below — this is not actually a contradiction.

What genuinely needs updating

  1. Add SotA derivation traceability (for acceptance criteria): A section in the CER (or CEP, or both) that traces each acceptance criterion back to specific SotA articles, showing the baseline value, the article(s), and the rationale for the chosen threshold.

  2. Define "Top-1 accuracy" in the CER/CEP glossary section.

  3. Document the data pooling methodology in the CER: explain the weighted-average formula and why cross-study aggregation is appropriate.

  4. Add safety benchmarking against SotA: Create a table that compares observed safety outcomes from our studies against (a) similar device incident rates from vigilance databases, and (b) standard clinical practice safety rates from SotA literature.

  5. Clarify (not change) the use environment text in the CER response to BSI — see analysis below.

  6. Add a navigable summary for the ~148 performance claims (e.g., summary table per benefit showing aggregate results, with detailed claims as supporting detail).

  7. Clarify "multiple conditions" — explain what this indication label means in the context of each study.

Use environment vs remote care — NOT a contradiction (clarification needed, not a fix)

BSI reads the use environment text ("healthcare organisations... situated inside hospitals or other clinical facilities") and the remote care sub-criterion of benefit 3KX ("remote diagnosis", "remote referral") and sees a contradiction. On closer analysis, there is no contradiction. The apparent conflict arises from conflating two different concepts:

1. Use environment = where the device is deployed (IT infrastructure). The use environment text describes the device's deployment context: it runs as an API integrated into a healthcare organisation's IT system. The two sentences say:

  • "The device is intended to be used in the setting of healthcare organisations and their IT departments, which commonly are situated inside hospitals or other clinical facilities."
  • "The device is intended to be integrated into the healthcare organisation's system by IT professionals."

This describes where the software runs — on the healthcare organisation's servers/infrastructure. It does NOT restrict where the clinician sits when they access the device through their organisation's system.

2. Remote care = a clinical workflow modality, not a change in use environment. A dermatologist reviewing images from home through their hospital's system is still using the device "in the setting of healthcare organisations." The device is running on the organisation's infrastructure, accessed through the organisation's authenticated systems, within the organisation's clinical workflow. Teledermatology is standard clinical practice — the MDR does not require that clinicians be physically inside a hospital to use a cloud/API-based SaMD.

MDR regulatory basis: MDR Annex I GSPR 14.1 requires specifying "conditions of use" and "use environment." For SaMD, this means the IT environment, network requirements, and integration context (see also MDCG 2019-11 on SaMD qualification). The MDR does not define "use environment" as the physical location of the end user — that would be unworkable for any cloud-based or API-based medical device, as clinicians routinely access hospital systems remotely.

Clinical evidence supports this reading:

  • Study SAN_2024: "conducted remotely by sending the images to the participating professionals"
  • Study PH_2024: "conducted remotely via image analysis by participating primary care professionals"
  • Study BI_2024: "conducted remotely by sending the images to the participating dermatologists"
  • Study COVIDX_EVCDAO_2022: "continuous and remote monitoring of patient condition severity"
  • Planned studies triaje_VH_2025 and clinical_VH_2025: explicitly target "automated triage in teledermatology"

In every case, the device was deployed in a healthcare organisation's infrastructure. The "remote" aspect refers to the clinical workflow (image-based teleconsultation), not a different deployment environment.

Response strategy for BSI: Do NOT frame this as "we need to change the use environment." Instead:

  1. Explain that the use environment text describes the device's deployment context (healthcare org IT infrastructure), not the clinician's physical location.
  2. Note that teledermatology is a workflow modality that operates entirely within the stated use environment — the device runs on the healthcare organisation's servers regardless of whether the consultation is in-person or remote.
  3. Point to the clinical studies that validate the device in both in-person and remote workflows, all deployed within healthcare organisations' systems.
  4. If BSI wants the text to be more explicit, we can add a clarifying sentence (e.g., "The device supports both in-person and teleconsultation clinical workflows within this deployment environment") — but this is a clarification, not a correction. The current text does not exclude remote use.

No decision pending. This does not require a regulatory decision about the intended purpose. The device already supports teledermatology. The use environment text already permits it. BSI's observation stems from a misreading of "healthcare facilities" as restricting clinician location rather than describing the IT deployment context.

Response strategy

Regulatory mapping

| BSI concern | GSPR / Annex clause | How our corrective action addresses it |
| --- | --- | --- |
| Acceptance criteria not traced to SotA | Annex XIV 1(a) sub-bullet 4 | Add derivation table linking each acceptance criterion to SotA article, baseline value, and justification |
| Performance/safety parameters not justified | Annex XIV 1(a) sub-bullet 6 | Add justification narrative for each parameter and benchmark against SotA/similar devices |
| Clinical benefits unclear/not measurable | Article 2(53) | Add explanatory text defining each metric (including "Top-1 accuracy"), with clear patient-relevant outcomes |
| Data pooling methodology | Annex II | Document the weighted-average formula and justification in the CER |
| Safety not traced to SotA | Annex XIV 1(a) sub-bullet 6; GSPR 1, 8 | Add safety benchmarking table comparing observed rates against literature and vigilance data |
| Use environment vs remote care | Annex XIV 1(a) sub-bullet 2 | Clarify in response that the use environment describes IT deployment context, not clinician physical location; add clarifying sentence to CER if needed |

Fix plan

| # | Action | Document affected | Complexity |
| --- | --- | --- | --- |
| 1 | Add "Acceptance Criteria Derivation from State of the Art" section to the CER | R-TF-015-003 CER | High — requires systematic mapping of ~30 acceptance criteria to ~64 SotA articles |
| 2 | Add glossary entry for "Top-1 accuracy" and other AI/ML metrics | R-TF-015-003 CER, R-TF-015-001 CEP | Low |
| 3 | Document data pooling methodology (weighted-average formula, grouping criteria, justification) | R-TF-015-003 CER | Medium |
| 4 | Add safety benchmarking table comparing observed safety outcomes to SotA/similar device rates | R-TF-015-003 CER | Medium |
| 5 | Clarify use environment text to explicitly state it covers both in-person and teleconsultation workflows | R-TF-015-003 CER (clarifying sentence); response to BSI (explanation) | Low — no regulatory decision needed, just clarification |
| 6 | Add benefit-level summary of performance claims (aggregate results per benefit) to improve navigability | R-TF-015-003 CER | Medium |
| 7 | Clarify "Multiple conditions" indication label — define what it means in each study context | R-TF-015-003 CER, performance claims documentation | Low |
| 8 | Justify acceptance criteria that appear low (3KX remote care sub-criterion sensitivity 30%, 7GH rare disease sub-criterion accuracy 54%, 5RB kappa 0.6) | R-TF-015-003 CER | Medium — needs clinical rationale |

Response approach

For each of BSI's three areas, the response should:

  1. Acknowledge that the CER/CEP lacked explicit traceability between acceptance criteria and SotA articles.
  2. Explain that the acceptance criteria were derived from the SotA literature (which was already complete) but the derivation chain was not documented in the CER/CEP — the SotA document provided article summaries, and the CEP set thresholds, but the link between them was implicit rather than explicit.
  3. Describe the fix: point to the new section(s) in the CER that now trace each acceptance criterion to its SotA source.
  4. Reference the updated CER sections with specific paragraph/table numbers.

Response tone rules (from M1.Q1 and Item 2a):

  • Do NOT argue that BSI should have found the information — acknowledge the gap and describe the fix.
  • Do NOT over-explain the data pooling mathematics — present it clearly and briefly.
  • Do NOT claim acceptance criteria are "conservative" or "stringent" — BSI flagged some as seemingly low, so address those specifically with clinical justification.
  • Do NOT frame safety as inherently risk-free because the device is SaMD — BSI expects specific benchmarks even for software.

Handling "acceptance criteria seem low"​

Some acceptance criteria need specific justification because they appear low on first read:

  • 3KX remote care sub-criterion sensitivity 30% for remote referrals: This must be contextualized against the SotA. If primary care practitioners without the device have a referral sensitivity of X% in teledermatology settings, and 30% represents an improvement or a clinically acceptable threshold for the remote use case, state this explicitly. If the 30% threshold is genuinely low, consider whether it should be revised.

  • 7GH rare disease sub-criterion absolute accuracy 54%: Rare diseases are by definition harder to diagnose. If the SotA shows PCP baseline accuracy for rare dermatological conditions is around 40-45%, then 54% with device assistance represents a meaningful improvement. The CER must show this baseline.

  • 5RB unweighted kappa 0.6 for alopecia severity: Kappa thresholds follow Landis & Koch (1977): 0.6 = "moderate" agreement. If interobserver agreement in dermatological severity assessment is typically 0.4-0.5 (the SotA data for HS shows ICC = 0.47), then kappa 0.6 represents improvement. The CER must cite this baseline.

Cross-NC connections

Connection to Item 2a (Device Description & Intended Purpose)

Item 2a and Item 2b are two parts of the same deficiency finding. Item 2a addresses WHAT the device does (outputs, ICD categories, indications); Item 2b addresses HOW WELL the device performs (clinical benefits, acceptance criteria, SotA comparison). The fixes must be coordinated:

  • Any new "Acceptance Criteria Derivation from SotA" section in the CER should reference the device description and intended purpose language from Item 2a.
  • The "Multiple conditions" clarification in Item 2b aligns with the ICD-11 category enumeration in Item 2a.
  • The use environment reconciliation (3KX remote care sub-criterion) affects both items — Item 2a defines the intended purpose scope, Item 2b validates performance claims within that scope.

Connection to Technical Review M1.Q1 (IFU Performance Claims)

Alignment required across reviews

Item 2b and M1.Q1 (technical review) address performance claims from different angles. M1.Q1 concerns how claims are presented in the IFU (user-facing); Item 2b concerns how they are justified in the CER (regulatory-facing). Both responses go to BSI and must be consistent:

  1. Same claims data model. The ~148 performance claims used in the IFU (via ClinicalBenefitsList components) and in the CER come from the same performanceClaims.ts. Any change to claim structure, acceptance criteria, or SotA values affects both.

  2. Device function vs clinical benefit. M1.Q1 established the framing (uniform distributional output, context-dependent clinical benefits) that Item 2b must maintain when explaining why benefits vary across conditions while the device function is uniform.

  3. SotA baselines. M1.Q1 added acceptanceCriteriaStateOfTheArtValue to the IFU display. Item 2b must trace these same values to specific SotA articles. The values must match.

  4. "Top-1 accuracy" definition. M1.Q1 added a "How to Read the Performance Claims" section to the IFU. The same definition should appear in the CER.

Connection to Items 3a and 3b (Clinical Data)

Item 3 asks about clinical data analysis and data sufficiency. Item 2b's fixes (SotA traceability, data pooling justification, acceptance criteria rationale) directly support Item 3's requirements. The SotA derivation table created for Item 2b will be referenced in Item 3's response to demonstrate that clinical data analysis is systematically based on pre-defined benchmarks.

Connection to Item 7 (Risk)

Item 7 asks about severity justification, occurrence estimates, and residual risk. Item 2b's safety benchmarking (fix #4) directly relates — the safety endpoint improvements should align with whatever risk justification approach is used in Item 7's response.

Key research findings

Finding 1: SotA baselines exist in the data but are not traced

The performanceClaims.ts data model already contains acceptanceCriteriaStateOfTheArtValue for each claim where applicable. For example:

  • Claim MRT (top-1 accuracy, multiple conditions, all HCPs): SotA value = 0.0636 (6.36% relative improvement baseline)
  • Claim LL5 (ICC, hidradenitis suppurativa): SotA value = 0.47 (literature interobserver agreement), CI [0.32, 0.65]

These values came from the SotA literature review (R-TF-015-011) but the specific article provenance is not recorded in the data model or the CER.

Finding 2: SotA document is organized for traceability — just needs the final link

R-TF-015-011 organizes its article summaries by clinical application:

  • "Clinical data collected on malignancy detection" (line 444+)
  • "Clinical data collected on the diagnostic accuracy of HCPs" (line 456+)
  • "Clinical data collected on the referral accuracy of PCPs" (line 495+)
  • "Clinical data collected on severity assessment" (section exists)
  • "Clinical data collected on teledermatology" (section exists)

Each section includes quantitative baselines from the literature. The gap is that these baselines are not explicitly linked to acceptance criteria in the CEP/CER.

Finding 3: Data pooling is well-defined programmatically

The globalValueOfDevice computation (CLAUDE.md documentation):

  • Formula: Σ(achievedValue × sampleSize) / Σ(sampleSize)
  • Grouping: Claims are grouped when ALL of indications, userGroup, acceptanceCriteriaDomain, acceptanceCriteriaMetric, acceptanceCriteriaValueMagnitude, and performanceSubject match
  • Sample sizes sourced from clinicalStudiesData.ts

This is a methodologically sound weighted-average approach. It just needs to be documented in the CER with justification for why cross-study aggregation is appropriate (same device version, same measurement methodology, compatible study designs).
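
A minimal sketch of that computation, assuming the formula and grouping described above; the type and function names are illustrative and do not reproduce the actual types.ts implementation:

```typescript
// Illustrative sketch of the documented pooling rule:
// globalValueOfDevice = Σ(achievedValue × sampleSize) / Σ(sampleSize)
interface PooledClaimInput {
  achievedValue: number; // result reported by one study for this claim group
  sampleSize: number;    // that study's N, taken from clinicalStudiesData.ts
}

function globalValueOfDevice(claims: PooledClaimInput[]): number {
  const totalN = claims.reduce((sum, c) => sum + c.sampleSize, 0);
  if (totalN === 0) return NaN;
  const weightedSum = claims.reduce(
    (sum, c) => sum + c.achievedValue * c.sampleSize,
    0,
  );
  return weightedSum / totalN;
}

// Example: two studies reporting the same metric for one claim group.
// (0.72 × 150 + 0.68 × 50) / 200 = 0.71
const pooled = globalValueOfDevice([
  { achievedValue: 0.72, sampleSize: 150 },
  { achievedValue: 0.68, sampleSize: 50 },
]);
console.log(pooled);
```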

Finding 4: Safety approach relies on RMF not SotA

CEP safety endpoints (lines 470-476) use the pattern: "Nb cases of [harm] < residual probability in RMF for the corresponding risk(s) (a possibility between 0.1% and 0.01%)." The acceptance criteria are entirely internal (reference the device's own RMF) rather than benchmarked against:

  • Adverse event rates from similar devices (vigilance search found 0 incidents for SkinVision, Molescope, Huvy, DERM, Dermalyser, FotoFinder)
  • Misdiagnosis rates in standard clinical practice from SotA literature
  • Adverse outcome rates for AI dermatology devices from published studies

Finding 5: Use environment text is narrower than clinical evidence

The use environment (en.json line 507) says "healthcare organisations... situated inside hospitals or other clinical facilities." However:

  • Study COVIDX_EVCDAO_2022: secondary objective included "confirming that the utilization of the device elicits a high level of patient satisfaction, particularly in its remote application"
  • Study DAO_O_2022: acceptance criteria include "sensitivity and specificity equal to or superior to the PCP to identify necessary referrals in teledermatology"
  • Studies SAN_2024 and PH_2024: secondary objectives include "validate what percentage of cases could be handled remotely"

The clinical evidence clearly supports remote care use, but the use environment text was not updated to reflect this. This is likely an oversight from when the use environment text was originally written for the MDD-era legacy device.

Finding 6: "Top-1 accuracy" is a standard AI metric but undefined for clinicians​

"Top-1 accuracy" measures whether the correct diagnosis appears as the model's highest-probability prediction. In the context of this device (which outputs a probability distribution), "Top-1 accuracy with device" means: the proportion of cases where the HCP, when reviewing the device's distributional output, selects the correct diagnosis as their primary choice. The comparison "Top-1 accuracy without device" is the proportion of cases where the HCP's unaided primary choice is correct. The acceptance criteria are typically expressed as the relative improvement between these two rates.

Addressed weaknesses (BSI auditor perspective)

Internal working document

These concerns were identified through a critical review from the BSI auditor's perspective and have been proactively addressed.

High risk: SotA analysis gap

Observation: The SotA document was descriptive (summaries) but lacked the final analytical synthesis linking baselines to acceptance criteria. Resolution: We expanded R-TF-015-011 (State of the Art) to include a "Rationale for the Selection of Articles" section and added a systematic "Acceptance Criteria Derivation from State of the Art" section to the CER (R-TF-015-003). This provides the direct analytical chain from the 64 appraised articles to the specific thresholds.

High risk: Some acceptance criteria may be challenged as too low

Observation: Specific thresholds (3KX remote care 30%, 7GH rare disease 54%, 5RB 0.6) appeared low and required clinical justification. Resolution: We added a dedicated "Justification of Acceptance Criteria" section to the CER. We contextualized these values against SotA baselines, showing they represent significant improvements (e.g., 54% accuracy in rare diseases vs. ~40% unaided baseline) or align with standard clinical frameworks (e.g., Landis & Koch for Kappa 0.6).

Medium risk: Data pooling may be challenged

Observation: The weighted-average methodology for aggregating ~148 claims across studies was not documented or justified. Resolution: We added a "Data Pooling Methodology" section to the CER, explicitly documenting the weighted-average formula and providing the rationale for pooling (homogeneity of populations, identical device versions, and compatible study designs).

Low risk: Remote care use environment perception

Observation: BSI perceived a contradiction between the "healthcare facility" use environment and the remote care sub-criterion of benefit 3KX ("remote diagnosis", "remote referral", previously coded 0ZC). Resolution: We clarified in the response that the use environment describes the IT deployment context (API on hospital servers), not the clinician's physical location. We pointed to the existing clinical studies conducted in remote workflows as evidence of validation within this environment.

Low risk: Number of claims may invite scrutiny

Observation: The sheer volume of performance claims (~148) could be overwhelming and appear fragmented. Resolution: We added a "Summary of Clinical Benefits Achievement" table to the CER. This table provides an aggregate view of evidence for each of the 3 clinical benefits (7GH, 5RB, 3KX) with their sub-criteria, framing the individual claims as supporting detail for a clear, unified clinical case.


Regulatory framework: what the BSI meeting revealed

Severity warning — Item 2 is Critical

Nick stated that refusal is extremely likely. Item 2b is where the entire clinical performance case is made or lost. Nick's position — that MRMC studies alone are insufficient and that a very robust PMCF study is required — means that the clinical evidence presented for acceptance criteria based primarily on MRMC data is at risk of rejection. The regulatory framework must be applied precisely: real-world studies are primary evidence; MRMC studies are supporting evidence only; PMCF provides the robust post-market confirmation Nick requires. If this hierarchy is not explicit in the CER, the clinical performance case fails.

The four applicable guidance documents

| Document | Role for Item 2b |
| --- | --- |
| MEDDEV 2.7.1 Rev 4, Annex A7.2 | "Conformity with acceptable benefit/risk profile." Every clinical benefit claim must satisfy this standard: (a) benefits must be quantified — magnitude, variation across population, clinical relevance; (b) clinical risks evaluated as rates — false positive rates (unnecessary procedures) and false negative rates (missed/delayed diagnoses) for diagnostic devices; (c) benefit-risk evaluated against state of the art including alternative treatments. A benefit that is stated as "X% improvement" without showing what alternative treatments achieve, and without quantifying the associated clinical risks, does not satisfy Annex A7.2. Every one of the 7 clinical benefits must be fully justified under this standard. |
| MEDDEV 2.7.1 Rev 4, Annex A7.3 | "Conformity with performance requirements." For diagnostic devices specifically, this annex requires: diagnostic sensitivity and specificity for major clinical indications individually — not only as a pooled aggregate; PPV and NPV according to varying pre-test probabilities; reproducibility of independent image acquisition and reporting. Nick's explicit statement about individual breakdown for high-risk conditions (melanoma, malignancies) is based on this requirement. It is not a preference — it is a mandatory element of Annex A7.3. A CER that presents only pooled performance across "multiple conditions" fails this requirement. |
| MEDDEV 2.7.1 Rev 4, Annex A7.4 | "Conformity with undesirable side-effects." Clinical data must contain adequate observations for scientifically valid conclusions about side-effects. Critically: "if clinical data is lacking or observations are insufficient, conformity is NOT fulfilled." For diagnostic devices, side-effects include false positive harms (unnecessary biopsies, procedures) and false negative harms (missed diagnoses, delayed treatment). The CER must quantify these harms using clinical data — not just state they are "minimised." |
| MDCG 2020-6, Appendix III | 12-level evidence quality hierarchy. Rank 11 = "simulated use / animal / cadaveric testing with HCPs" = explicitly listed as NOT clinical data under MDR. Nick confirmed this position during the BSI meeting: "MRMC studies where you show doctors images and ask them to assess with/without the device are not clinical data." Our MRMC studies map to Rank 11. They can serve as supporting evidence (corroborating technical performance, validating VCA in controlled conditions) but cannot be the primary basis for clinical performance claims. Primary evidence must come from Ranks 1–4 (real-world clinical investigations). |
| MDCG 2020-1 | Three pillars: VCA, Technical Performance, Clinical Performance. Acceptance criteria must be defined for all three pillars — not just clinical performance. Technical Performance criteria must address the device's ability to handle real-world input variability (Fitzpatrick skin types, image acquisition conditions, camera types). An MRMC study contributes to Technical Performance but not to Clinical Performance. Clinical Performance criteria require real-world validation against a reference standard in the intended-use population. |
| MDCG 2020-13, Sections C and E | Section C: BSI checks clinical performance, safety, and SotA benchmarks against the device description. Section E: BSI assesses whether "the clinical performance endpoints are appropriate for each indication." These two sections together constitute the formal checklist BSI will use to assess Item 2b. Every acceptance criterion must be traceable from these sections. |
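
To make the PPV/NPV element of the A7.3 row concrete: these are the standard definitions as functions of pre-test probability (prevalence p), sensitivity Se, and specificity Sp. They are general textbook formulas, not values or derivations taken from the CER:

```latex
% PPV and NPV as functions of pre-test probability p,
% sensitivity Se and specificity Sp (standard definitions).
\mathrm{PPV}(p) = \frac{Se \cdot p}{Se \cdot p + (1 - Sp)\,(1 - p)}
\qquad
\mathrm{NPV}(p) = \frac{Sp \cdot (1 - p)}{Sp \cdot (1 - p) + (1 - Se)\,p}
```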

What Nick said: MRMC studies are not clinical data

Nick stated explicitly and unambiguously during the BSI meeting:

  • MRMC (multi-reader, multi-case) studies where you show doctors images and ask them to assess with/without the device are not clinical data.
  • The reason: the device is intended for real patients in real clinical situations. MRMC is a simulated environment.
  • CE marking with MRMC-style studies is possible ONLY if accompanied by a very robust post-market study to prove real-world impact at scale.
  • Nick specifically asked: "Have you looked at a large sample of patients where this device is being used within that described workflow and compared it against standard medical practice?"

Practical implication: Every acceptance criterion that is framed as validated by an MRMC study must be reframed. In the CER:

  • MRMC results are presented first as supporting/corroborating evidence (demonstrating that in controlled conditions, the device improves HCP accuracy)
  • Real-world study results (COVIDX, DAO-O, DAO-PH, MC_EVCDAO) are the primary evidence
  • The PMCF program (Activity C.2: MRMC study in the FDA context, plus real-world monitoring) provides the robust post-market study Nick requires

This hierarchy — real-world primary, MRMC supporting, robust PMCF confirming — must be explicit in the CER. It must also be consistent with the evidence hierarchy per MDCG 2020-6 Appendix III.

What Nick said: acceptance criteria must be traced to SotA

Nick stated the five required elements for each acceptance criterion:

  1. A clear statement of what the clinical benefit is and why it matters.
  2. Identification of comparable devices or methods (state of the art) in the literature.
  3. A justification for why the chosen threshold is appropriate based on what those comparable devices achieve.
  4. For pooled data across conditions: a risk-based justification for why pooling is appropriate.
  5. For higher-risk conditions (melanoma, malignancies): an individual breakdown of acceptance criteria and evidence.

None of these five elements are currently explicit for all 3 clinical benefits (7GH, 5RB, 3KX) in the CER. The data exists in R-TF-015-011 (SotA article summaries with quantitative baselines) and in performanceClaims.ts (acceptanceCriteriaStateOfTheArtValue per claim), but the chain of reasoning is implicit rather than documented. It must be made explicit, in prose, using the language of the applicable standards.

X-3 disease categorisation: the three-tier evidence structure

The X-3 decision (2026-03-28) provides the framework for satisfying both MEDDEV A7.3 (individual performance per major indication) and MDCG 2020-6 Appendix III (risk-based justification for pooling):

Tier 1 — Malignant conditions (individual analysis): Individual acceptance criteria per condition group. Regulatory basis: MEDDEV A7.3 requires sensitivity/specificity for major clinical indications individually. Nick stated BSI will specifically audit high-risk conditions. Evidence: MC_EVCDAO_2019 and IDEI_2023 provide Rank 4 evidence for melanoma and multiple malignant conditions. Acceptance criteria: AUC ≥ 0.848 (met: 0.8482 in MC_EVCDAO) and AUC ≥ 0.90 for multiple malignant conditions (met: 0.8983–0.9669 across studies).

Tier 2 — Rare diseases (grouped analysis): Grouped acceptance criteria with disease-specific justification. Regulatory basis: MDCG 2020-6 § 6.5(e) requires either sufficient evidence per indication or declared acceptable gap. BI_2024 provides Rank 4 evidence for the rare disease subgroup as a whole. Acceptance criterion: absolute diagnostic accuracy ≥ 54% for rare conditions (SotA baseline: approximately 30–40% unaided PCP accuracy for rare skin diseases — justifying 54% as a clinically meaningful improvement).

Tier 3 — General conditions (pooled with risk-based justification): Pooled acceptance criteria with the four-point justification documented in X-3: comparable consequence of misclassification within non-malignant categories; device architecture supports pooling (uniform Vision Transformer pipeline); representative epidemiological coverage (5 of 7 categories, 97% of dermatological presentations); consistent architecture supports expectation of consistent capability. This replaces the current vague "clinical comparability" justification.

Every acceptance criterion in the CER must be explicitly mapped to one of these three tiers, with the appropriate regulatory basis stated.

Nick's warning: consider removing under-supported claims

Nick stated: "Consider whether all indications are equally well-supported by data — and whether some should be removed from the claims if data is insufficient."

Per MDCG 2020-6 § 6.5(e): if evidence is insufficient for an indication, the intended purpose must be narrowed or the gap declared acceptable with PMCF. The X-3 disease categorisation decision has applied this process to the epidemiological categories. The same process must be applied to individual performance claims:

  • Any claim supported only by MRMC evidence (Rank 11) and with no real-world supporting data must be flagged.
  • Any claim with very low acceptance criteria (e.g., 3KX remote care sub-criterion sensitivity 30%, 7GH rare disease sub-criterion accuracy 54%) must be justified against SotA — if the justification cannot be made, the claim should be narrowed or removed.
  • The X-3 declared acceptable gaps (autoimmune, genodermatoses) apply at the category level — the same gap declaration logic applies at the claim level for any claim where evidence is thin.

The response must show that this assessment has been done systematically, not just for the claims BSI flagged explicitly.
