Research and planning
This document is for internal use only. It contains analysis, gap identification, and response strategy for Item 4 of the BSI Clinical Review Round 1. It will not be included in the final response to BSI.
1. What BSI is asking
BSI's clinical reviewer read the CEP (R-TF-015-001, §20, lines 644–650) which states:
"The summative usability validation will be conducted as an integral part of the upcoming clinical investigation."
The CEP was written in future tense and provides no further details. BSI wants to know:
- Which clinical investigation (CI) the summative evaluation refers to
- Whether it has been completed
- A summary of the methodology: users, sample size, critical tasks
- Results
This is an observation/request, not a deficiency finding. The regulatory basis is GSPR 5 (eliminating/reducing use-related risks) and EN ISO 14971.
2. The answer is straightforward
The summative evaluation has been completed. It was conducted as a standalone human factors (HF) validation study (not embedded in a specific CI), documented in:
- R-TF-025-004: Summative Evaluation Protocol
- R-TF-025-005: Summative Evaluation Observation Form
- R-TF-025-006: Summative Evaluation Questionnaires (HCP + ITP)
- R-TF-025-007: Summative Evaluation Report
Key facts
| Parameter | Value |
|---|---|
| Study dates | October 14–25, 2025 |
| Location | HCP: in-person, Valencia, Spain. ITP: remote via video conference |
| Total participants | 36 (18 HCP + 18 ITP) |
| HCP professions | 10 nurses (55.6%), 5 dermatologists (27.8%), 3 GPs (16.7%) |
| ITP professions | Software engineers, DevOps, backend developers, API integration specialists, systems integrators |
| Standards | IEC 62366-1:2015 §5.9, FDA HF guidance (Feb 2016), ISO 14971 |
| Equipment | HCPs used their own personal smartphones (ecological validity per FDA guidance) |
Critical tasks tested
HCP (3 scenarios):
- Scenario 1 — Simulated use: no lesion (photograph submission, report interpretation)
- Scenario 2 — Simulated use: lesion (photograph submission, report interpretation with clinical findings)
- Scenario 3 — Knowledge assessment (4 questions: report contents, malignancy probability, detected conditions, diagnostic vs support tool distinction)
ITP (1 scenario + knowledge assessment):
- Scenario 1 — Simulated use: 7 tasks (access IFU, authenticate, send/receive API requests, verify response fields, check API version)
- Knowledge assessment — 6 questions on endpoint URLs, response handling, error handling
Results summary
| Metric | HCP | ITP |
|---|---|---|
| Simulated use success | 100% (18/18) for scenarios 1 and 2 | 100% (18/18) for all 7 tasks |
| Knowledge assessment | Variable — Q1: 94.4%, Q2: 100%, Q3: 100%, Q4: 72.2% | 100% (18/18) for all 6 questions |
| Use errors | 1 (HCP Scenario 3 Q4) | 0 |
| Close calls | 3 (HCP Scenario 3 Q4) | 0 |
| Use difficulties | 2 (HCP Scenario 3 Q1, Q4) | 0 |
| SUS score | 82.5 (Excellent) | 85.2 (Excellent) |
| SUS target | >70 (Good) | >70 (Good) |
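The SUS figures in the table are standard System Usability Scale scores (0–100, derived from ten 5-point Likert items). The per-participant responses live in R-TF-025-006; for reference only, a minimal sketch of the standard SUS computation (function names are illustrative, not taken from any project document):

```python
def sus_score(responses):
    """Compute a System Usability Scale score (0-100) from ten Likert
    responses (1-5). Odd-numbered items are positively worded and
    contribute (score - 1); even-numbered items are negatively worded
    and contribute (5 - score); the sum is scaled by 2.5."""
    if len(responses) != 10 or any(not 1 <= r <= 5 for r in responses):
        raise ValueError("SUS needs ten responses in the range 1-5")
    odd = sum(r - 1 for r in responses[0::2])   # items 1, 3, 5, 7, 9
    even = sum(5 - r for r in responses[1::2])  # items 2, 4, 6, 8, 10
    return (odd + even) * 2.5

def mean_sus(all_responses):
    """A study-level SUS score is the mean of per-participant scores."""
    return sum(sus_score(r) for r in all_responses) / len(all_responses)
```

Neutral responses (all 3s) yield exactly 50.0, which is why thresholds like >70 sit well above the scale midpoint.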
Conclusion from R-TF-025-007
Both HCP and ITP testing demonstrate safe and effective use for all intended user groups. SUS scores exceed the target threshold.
3. Gap analysis
| # | Aspect | Status | Gap |
|---|---|---|---|
| 1 | Study completed? | Done — October 2025 | None |
| 2 | Methodology documented? | Done — R-TF-025-004 protocol | None |
| 3 | Results documented? | Done — R-TF-025-007 report | None |
| 4 | CEP updated to reflect completion? | Gap — CEP still uses future tense | CEP lines 644–650 should be updated to past tense with a reference to the report |
The only gap for Item 4 specifically is that the CEP still describes the summative evaluation in future tense. The evaluation has been completed but the CEP was not updated to reflect this.
4. Cross-NC connections
Technical Review N2 — Usability (same study, deeper issues)
The Technical Review N2 non-conformity (N2 index) addresses the same summative evaluation but raises much deeper concerns:
- N2.a: Use errors, close calls, and difficulties in HCP Scenario 3 were not subjected to root cause analysis (RCA) or residual risk assessment — despite the protocol (R-TF-025-004 §14.7) committing to this analysis
- N2.b: 27.8% of HCPs did not understand the device is not diagnostic (Q4 results: 72.2% success)
- N2.c: No conclusions about IFU usability
- N2.d: No conclusions about effectiveness of safety information
The fixes for N2 (adding RCA, residual risk assessment, IFU conclusions, and safety information effectiveness analysis to R-TF-025-007) will also strengthen the response to Item 4. The response to Item 4 should reference N2's more detailed analysis rather than duplicating it.
Clinical Review Item 3a — Clinical data analysis
Item 3a's CER fix requirements include integrating the summative evaluation results into the CER's clinical analysis. The CER currently references usability at line 766 but delegates to the summative report without summarising the results.
5. Response strategy
This is one of the simpler items to respond to. The response should:
- Confirm the summative evaluation has been completed — reference R-TF-025-007
- Provide the methodology summary BSI requested (users, sample size, critical tasks) — directly from the data above
- Provide the results summary — success rates, SUS scores, use problems identified
- Note that the CEP has been updated to reflect the completed status
- Reference Technical Review N2 for the detailed RCA and residual risk analysis of the use problems identified (the clinical reviewer's concern is simply "has it been done?" — the deeper analysis is N2's territory)
Fix required
Fix 1: Update CEP future tense (minor)
In R-TF-015-001, lines 644–650, change:
"The summative usability validation will be conducted..."
To past tense with reference to the completed report:
"The summative usability validation was conducted in October 2025 in accordance with R-TF-025-004. Results are documented in R-TF-025-007."
6. Risk assessment
| Risk | Impact | Mitigation |
|---|---|---|
| BSI reads the response and follows up with deeper usability questions | Low — the clinical reviewer's question is surface-level; deeper issues are handled by the technical reviewer in N2 | Keep the response concise; reference N2 for detailed analysis |
| HCP Q4 result (72.2% on "is it diagnostic?") could concern BSI from a clinical perspective | Medium — a 27.8% misunderstanding rate on intended use is clinically significant | This is addressed head-on in N2.b; the response here should acknowledge the finding but defer analysis to N2 |
7. Open items
None — all information needed to respond to Item 4 is already available in the QMS. This item can proceed directly to response drafting after the CEP future-tense fix is applied.
Regulatory framework: what the BSI meeting revealed
Nick stated during the BSI meeting (2026-03-25) that refusal is extremely likely unless all gaps are closed. Item 4 is not the most critical item, but the 27.8% misunderstanding rate on intended use (HCP Scenario 3 Q4) is the kind of finding BSI cross-checks against the IFU and the risk management record. The response must be complete and consistent across all documents.
The four applicable guidance documents
| Document | Role for Item 4 |
|---|---|
| MDCG 2020-6, Appendix III, Rank 11 | Explicitly classifies "simulated use / animal / cadaveric testing with HCPs" as not clinical data under MDR. The summative usability evaluation (R-TF-025-007) used simulated use with HCPs — it maps to Rank 11. This classification does NOT undermine the usability evaluation's validity for its intended purpose (demonstrating safe and effective use per GSPR 5 and IEC 62366-1). However, it means the evaluation cannot serve as primary clinical evidence for performance or safety claims. Its regulatory role is usability validation, which is a distinct and specific requirement. This distinction must be explicitly stated in the response and in the CER — confusing usability evidence with clinical evidence is an error that creates new findings. |
| MEDDEV 2.7.1 Rev 4, Annex A10 | The CER release checklist. One item: verify "Consistency" — that manufacturer information materials (IFU, labelling) match CER contents. The usability conclusions in the CER must be consistent with what the IFU says about how the device is used, who uses it, and what use errors are known. If the IFU states the device is not diagnostic, and 27.8% of HCPs in the usability study misunderstood this, the IFU's disclaimer must be demonstrated as adequate — not just present. |
| MDCG 2020-13, Section G | BSI checks whether the IFU contains quantified risk information and adequate safety information. Residual usability risks must be communicated in the IFU in a way BSI can verify as adequate. For AI-RISK-021 (usability issues / model outputs not interpretable), the IFU must contain quantified guidance — not just a qualitative warning. |
| MDCG 2020-1 | Clinical evaluation of MDSW. The Clinical Performance pillar requires validation in the intended-use context. The summative usability evaluation contributes to the Clinical Performance evidence base by demonstrating that users can produce and interpret the device output in real conditions. However, as a simulated-use study, it supports rather than anchors the Clinical Performance claim — real-world deployment data (the PMCF program) is the primary source. |
The CER must contain usability results in prose, not just a reference
BSI reviewed the CER as a standalone document. If the CER's treatment of usability is a brief cross-reference to R-TF-025-007, BSI cannot assess whether the usability validation is adequate from the CER alone — violating the standalone requirement Nick and Erin both described as the #1 structural problem.
Per MEDDEV 2.7.1 Rev 4 Annex A10 (CER release checklist), the CER must be verifiable as complete and self-consistent by a third party. A reviewer who does not have access to R-TF-025-007 must still be able to understand: what the summative evaluation tested, who participated, what the results were (including use errors and close calls), and what safety conclusions were reached.
The fix for Item 4 must therefore include two actions, not one:
- Update the CEP to past tense (already planned — minor fix).
- Add a prose summary to the CER's safety analysis section that covers the summative evaluation methodology, participants, critical tasks, results, and conclusions — making the CER standalone with respect to usability.
The MRMC framing does NOT apply to usability
Nick's statement that MRMC studies are not clinical data specifically applies to simulated clinical performance studies (e.g., showing doctors images and asking them to diagnose with/without the device). The summative usability evaluation is different in nature: it is a usability validation study, not a performance study. It tests whether users can use the device safely and effectively, not whether the device diagnoses correctly.
This distinction must be explicit in the response. The summative evaluation is not being presented as clinical evidence of device performance — it is presented as usability validation evidence per GSPR 5 and IEC 62366-1:2015 § 5.9. These are separate regulatory obligations with different evidence standards.
Consistency requirement: usability → IFU → risk management
Per MEDDEV Annex A10 and MDCG 2020-13 Section G, BSI will verify consistency across three documents:
- Usability evaluation results (R-TF-025-007): 1 use error, 3 close calls, 2 use difficulties. Q4 result: 27.8% of HCPs could not correctly identify the device as non-diagnostic.
- IFU warnings and limitations: Must reflect these findings quantitatively. If 27.8% of HCPs misunderstand the intended use, the IFU's non-diagnostic disclaimer must be prominent and reinforced — and BSI will check that it is.
- Risk management record (R-TF-028-011, AI-RISK-021): Residual usability risk must be communicated to users, and the IFU section where it is communicated must be cited in the risk assessment.
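The three-way check above is mechanical once the key figures are pulled out of each controlled document. A trivial sketch, assuming the figures have been extracted by hand into per-document dictionaries (the document IDs are real; the field names and extraction itself are hypothetical):

```python
# Hypothetical hand-extracted figures; field names are illustrative only.
figures = {
    "R-TF-025-007": {"use_errors": 1, "close_calls": 3,
                     "use_difficulties": 2, "q4_failure_rate": 0.278},
    "IFU":          {"q4_failure_rate": 0.278},
    "R-TF-028-011": {"use_errors": 1, "close_calls": 3,
                     "use_difficulties": 2, "q4_failure_rate": 0.278},
}

def consistency_findings(figs):
    """Return (field, {doc: value}) pairs where documents disagree.

    A field only present in some documents is compared across the
    documents that state it; an empty result means no contradictions.
    """
    fields = {f for doc in figs.values() for f in doc}
    mismatches = []
    for f in sorted(fields):
        values = {doc: v[f] for doc, v in figs.items() if f in v}
        if len(set(values.values())) > 1:
            mismatches.append((f, values))
    return mismatches
```

This is the same comparison BSI will perform manually; running it before release catches exactly the kind of figure drift (e.g. a stale failure rate in one document) that generates new findings.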
Technical Review N2 must be resolved before Item 4's response is finalised: N2 adds the RCA and residual risk assessment for the use errors identified in the summative evaluation. Without N2's fix, the risk management record (relevant to MDCG 2020-13 Section G) remains incomplete, and BSI will flag the inconsistency between the usability findings and the risk assessment.