Research and planning
This document is for internal use only. It contains analysis, gap identification, and response strategy for Item 3b of the BSI Clinical Review Round 1. It will not be included in the final response to BSI.
1. What BSI is asking
Item 3b says: "Please provide justification that sufficient data in quantity and quality has been analyzed in order to support the clinical benefit, safety, and performance of the device as compared to SotA in its intended use, including for all of the relevant patient populations and indications."
Where Item 3a asks us to identify and analyse all clinical data, Item 3b asks us to justify that the data is sufficient. This is a distinct regulatory obligation under:
- Annex XIV (2): The clinical evaluation shall be "thorough and objective" and its "depth and extent shall be proportionate and appropriate to the nature, classification, intended purpose and risks of the device."
- Article 61(1): The manufacturer shall "specify and justify the level of clinical evidence necessary."
- MDCG 2020-6 Appendix III: Hierarchy of clinical evidence types and considerations for sufficiency.
The word "sufficient" has three dimensions BSI will evaluate:
- Quantity: Enough subjects, enough studies, enough statistical power
- Quality: Study designs, methodological rigour, appraisal scores
- Coverage: All clinical benefits, all indications, all relevant patient populations, all intended users, safety endpoints
2. Current state of the sufficiency argument in the CER
What the CER claims (R-TF-015-003)
Line 840: "The adequacy of the number of observations, gathered from over 800 patients across eight pivotal studies, is justified for both performance and safety. Regarding performance, the sample size was formally calculated to ensure sufficient statistical power to validate the primary performance endpoints, based on detecting an effect size exceeding the 80% performance goal..."
Line 873: "The current body of evidence is sufficient to demonstrate the conformity of Legit.Health Plus with the General Safety and Performance Requirements (GSPRs) of the MDR 2017/745."
Why BSI finds this insufficient
The CER makes a top-level sufficiency claim but does not provide the structured, granular justification BSI expects. Specifically:
- No mapping from each clinical benefit → supporting studies → evidence adequacy
- No mapping from each indication → coverage across studies → gaps identified
- No patient population breakdown showing demographic representativeness
- No explicit comparison of evidence strength vs SotA for each claim
- The MDCG 2020-6 evidence hierarchy table in the CEP (lines 692–710) marks "No" for Rank 5 (equivalence data), Rank 7 (complaints/vigilance), and Rank 8 (PMS data) — despite claiming equivalence and having PMS data available. This directly contradicts the CER's own narrative.
3. Inventory of clinical evidence
3.1. Study portfolio
| Study | Design | N | Population | Indications | Key domains | User group |
|---|---|---|---|---|---|---|
| AIHS4 2025 | Retrospective, longitudinal | 2 patients (16 assessments) | HS patients | Hidradenitis suppurativa | Severity assessment | Dermatologists |
| BI 2024 | Prospective, cross-sectional | 100 images, 15 practitioners | Mixed conditions | GPP, HS, multiple | Diagnostic accuracy, rare diseases | PCPs + dermatologists |
| COVIDX 2022 | Prospective, cross-sectional | 160 patients, 6 dermatologists | Chronic dermatological conditions | Multiple chronic conditions | Clinical utility, remote monitoring, severity assessment | Dermatologists |
| DAO_O 2022 | Prospective, longitudinal | 117 patients (127 enrolled, 10 excluded) | Primary care referrals | Multiple conditions | Referral adequacy, malignancy detection | PCPs |
| DAO_PH 2022 | Prospective, longitudinal | 131 patients | Primary care referrals | Multiple conditions | Diagnostic accuracy, referral adequacy | PCPs + dermatologists |
| IDEI 2023 | Prospective + retrospective | 202 patients | Pigmented lesions + alopecia | Melanoma suspicion, androgenetic alopecia | Diagnostic accuracy, malignancy detection, severity assessment | Dermatologists |
| MC_EVCDAO 2019 | Prospective, cross-sectional | 105 patients | Melanoma-suspected lesions | Melanoma | Malignancy detection | Dermatologists |
| PH 2024 | Prospective, cross-sectional | 30 images, 9 PCPs | Multiple conditions | Multiple conditions | Diagnostic accuracy, remote consultation | PCPs |
| SAN 2024 | Prospective, cross-sectional | 29 images, 16 practitioners | Multiple conditions | Multiple conditions | Diagnostic accuracy, remote consultation | PCPs + dermatologists |
Total: 9 studies (8 with frozen MDR version + 1 with legacy device), 800+ patients, 60+ practitioners.
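As a cross-check on these totals, the sketch below tallies the N column of the table in 3.1. It assumes that the image sets (BI, PH, SAN) are counted separately from enrolled patients and that practitioner counts exist only where the table states them. Both are reading assumptions, not figures taken from the CER, and the field names are hypothetical.

```python
# N column of the study portfolio table, split by unit of analysis.
# Assumption: image-based studies contribute images, not enrolled patients.
# None means the table does not state a practitioner count for that study.
STUDIES = {
    "AIHS4 2025":     {"patients": 2,   "images": 0,   "practitioners": None},
    "BI 2024":        {"patients": 0,   "images": 100, "practitioners": 15},
    "COVIDX 2022":    {"patients": 160, "images": 0,   "practitioners": 6},
    "DAO_O 2022":     {"patients": 117, "images": 0,   "practitioners": None},
    "DAO_PH 2022":    {"patients": 131, "images": 0,   "practitioners": None},
    "IDEI 2023":      {"patients": 202, "images": 0,   "practitioners": None},
    "MC_EVCDAO 2019": {"patients": 105, "images": 0,   "practitioners": None},
    "PH 2024":        {"patients": 0,   "images": 30,  "practitioners": 9},
    "SAN 2024":       {"patients": 0,   "images": 29,  "practitioners": 16},
}


def tally(studies):
    """Sum patients, images and explicitly listed practitioners across studies."""
    return {
        "patients": sum(s["patients"] for s in studies.values()),
        "images": sum(s["images"] for s in studies.values()),
        "practitioners_listed": sum(s["practitioners"] or 0 for s in studies.values()),
    }


totals = tally(STUDIES)
print(totals)
```

Under these assumptions the table yields 717 patients, 159 images and 46 listed practitioners. The headline 800+ and 60+ figures therefore presumably draw on counts not shown in the table (for example DAO practitioner numbers) and are worth reconciling against the CIRs before they are quoted to BSI.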
3.2. Evidence hierarchy assessment (MDCG 2020-6)
The CEP's evidence hierarchy table (lines 692–710) needs correction. Current vs. what we actually have:
| Rank | Evidence type | CEP says | Actual status |
|---|---|---|---|
| 1 | High quality CIs covering all variants | Yes | Yes — 8 pivotal studies |
| 5 | Equivalence data | No | Should be Yes — equivalence claimed with legacy device, full access to design data |
| 6 | SotA evaluation | Yes | Yes — 64 articles in R-TF-015-011 |
| 7 | Complaints and vigilance data | No | Should be Yes — 7 non-serious incidents documented in PSUR/PMS Report |
| 8 | Proactive PMS data (surveys) | No | Should be Yes — COVIDX included CUS/DUQ/SUS questionnaires; PMCF surveys conducted |
The CEP explicitly marks equivalence data, vigilance data, and PMS survey data as "Not used" in the evidence hierarchy, while the CER simultaneously claims equivalence with the legacy device and references its market experience. This inconsistency must be corrected in both documents.
3.3. Appraisal quality scores (CER lines 722–734)
| Study | Relevance (/6) | Quality (/4) | Weight (/10) | Level of evidence (/10) |
|---|---|---|---|---|
| MC_EVCDAO 2019 | 0.5 | 3.5 | 6.5 | 5 |
| AIHS4 2025 | 0.5 | 3.5 | 8.5 | 5 |
| BI 2024 | 0.5 | 3.5 | 8.5 | 6 |
| COVIDX 2022 | 0.5 | 2.5 | 6.5 | 5 |
| DAO_O 2022 | 0.5 | 3.5 | 9.5 | 5 |
| DAO_PH 2022 | 0.5 | 3.5 | 9.5 | 5 |
| IDEI 2023 | 0.5 | 3.5 | 8.5 | 5 |
| PH 2024 | 0.5 | 3.5 | 8.5 | 5 |
| SAN 2024 | 0.5 | 3.5 | 8.5 | 5 |
| Mean | — | — | 8.3 | 5.1 |
Mean weight 8.3/10 is strong. Level of evidence 5/10 reflects primarily observational designs (no RCTs), which is standard for SaMD diagnostic aids.
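The portfolio means can be reproduced from the table with a few lines. This is a minimal sketch that uses only the Weight and Level of evidence columns as transcribed above; the dictionary layout is an illustrative choice.

```python
from statistics import mean

# study: (weight /10, level of evidence /10), copied from the appraisal table
APPRAISAL = {
    "MC_EVCDAO 2019": (6.5, 5),
    "AIHS4 2025":     (8.5, 5),
    "BI 2024":        (8.5, 6),
    "COVIDX 2022":    (6.5, 5),
    "DAO_O 2022":     (9.5, 5),
    "DAO_PH 2022":    (9.5, 5),
    "IDEI 2023":      (8.5, 5),
    "PH 2024":        (8.5, 5),
    "SAN 2024":       (8.5, 5),
}

weights = [weight for weight, _ in APPRAISAL.values()]
levels = [level for _, level in APPRAISAL.values()]

mean_weight = round(mean(weights), 1)   # portfolio mean weight
mean_level = round(mean(levels), 1)     # portfolio mean level of evidence
min_weight = min(weights)               # lowest-weighted study

print(mean_weight, mean_level, min_weight)
```

The result matches the Mean row (8.3 and 5.1) and gives a lowest single-study weight of 6.5. If Weight is meant to be Relevance plus Quality, the Relevance values as transcribed here would not reproduce it, so that column is worth rechecking against CER lines 722–734.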
4. Coverage analysis
4.1. Clinical benefit coverage
Mapping the 7 claimed clinical benefits to supporting studies:
| Benefit | Code | Supporting studies | Coverage assessment |
|---|---|---|---|
| Diagnostic accuracy for multiple conditions | 7GH | BI 2024, DAO_PH 2022, IDEI 2023, SAN 2024, PH 2024 | Strong — 5 studies, multiple user groups, 500+ subjects |
| Reduce waiting times | 3KX | DAO_O 2022, DAO_PH 2022, COVIDX 2022 | Moderate — 3 studies; operational impact (actual waiting time reduction) not directly measured, inferred from referral adequacy |
| Referral precision | 8PL | DAO_O 2022, DAO_PH 2022 | Moderate — 2 studies with 248 patients in primary care settings |
| Malignancy detection (skin cancer) | 1QF | MC_EVCDAO 2019, IDEI 2023, DAO_O 2022, DAO_PH 2022 | Strong: 4 studies, 555 patients in total, includes melanoma-specific cohort |
| Rare disease diagnosis | 9VW | BI 2024, SAN 2024, PH 2024 | Moderate — acceptance criteria defined as "improvement in rare conditions"; coverage depends on how "rare" is defined across study populations |
| Objective severity assessment | 5RB | AIHS4 2025, COVIDX 2022, IDEI 2023 | Weak-to-moderate — AIHS4 has only 2 patients (16 assessments); COVIDX uses CUS rather than direct severity measurement; IDEI covers androgenetic alopecia severity. Gap identified in CER (Gap 2) for atopic dermatitis, acne, and FFA |
| Remote care | 0ZC | COVIDX 2022, PH 2024, SAN 2024 | Moderate — COVIDX was conducted remotely; PH/SAN assessed remote consultation feasibility |
Key weakness: Benefit 5RB (severity assessment) has the thinnest evidence base. AIHS4 with 2 patients is extremely small, and the CER itself acknowledges this as Gap 2 for PMCF.
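The benefit-to-study mapping can be made explicit as data, which also gives a per-benefit head count for the CER table. The sketch below is illustrative only: it counts each study's patients or images as "subjects", excludes practitioners, and ignores that some studies contribute only a sub-cohort to a given benefit.

```python
# Subjects per study (patients or images), from the N column in 3.1.
SUBJECTS = {
    "AIHS4 2025": 2, "BI 2024": 100, "COVIDX 2022": 160,
    "DAO_O 2022": 117, "DAO_PH 2022": 131, "IDEI 2023": 202,
    "MC_EVCDAO 2019": 105, "PH 2024": 30, "SAN 2024": 29,
}

# Benefit code -> supporting studies, from the table in 4.1.
BENEFITS = {
    "7GH": ["BI 2024", "DAO_PH 2022", "IDEI 2023", "SAN 2024", "PH 2024"],
    "3KX": ["DAO_O 2022", "DAO_PH 2022", "COVIDX 2022"],
    "8PL": ["DAO_O 2022", "DAO_PH 2022"],
    "1QF": ["MC_EVCDAO 2019", "IDEI 2023", "DAO_O 2022", "DAO_PH 2022"],
    "9VW": ["BI 2024", "SAN 2024", "PH 2024"],
    "5RB": ["AIHS4 2025", "COVIDX 2022", "IDEI 2023"],
    "0ZC": ["COVIDX 2022", "PH 2024", "SAN 2024"],
}


def coverage(benefits, subjects):
    """Return {benefit code: (number of studies, total subjects)}."""
    return {
        code: (len(studies), sum(subjects[s] for s in studies))
        for code, studies in benefits.items()
    }


totals = coverage(BENEFITS, SUBJECTS)
# Print benefits from smallest to largest head count.
for code, (n_studies, n_subjects) in sorted(totals.items(), key=lambda kv: kv[1][1]):
    print(f"{code}: {n_studies} studies, {n_subjects} subjects")
```

Head count alone does not reproduce the coverage assessment. 5RB totals more subjects than 9VW, yet it is rated weakest because its studies measure severity indirectly or in very few patients. The per-benefit sufficiency conclusion therefore needs the qualitative column as well as the totals.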
4.2. Indication coverage
The device covers ICD-11 Chapter 14 skin conditions. Key condition groups and their study coverage:
| Condition category | Studies providing evidence | N (approx.) | Assessment |
|---|---|---|---|
| Melanoma / malignant lesions | MC_EVCDAO, IDEI, DAO_O, DAO_PH | 400+ | Good |
| Pigmented lesions (benign) | MC_EVCDAO, IDEI | 200+ | Good |
| Psoriasis | COVIDX | Part of 160 | Limited — single study |
| Acne | COVIDX | Part of 160 | Limited — single study; Gap 2 |
| Atopic dermatitis | COVIDX | Part of 160 | Limited — single study; Gap 2 |
| Hidradenitis suppurativa | AIHS4, BI | 2 + images | Weak — AIHS4 has 2 patients |
| GPP (Generalised Pustular Psoriasis) | BI | Image-based | Limited — single study, image assessment only |
| Androgenetic alopecia | IDEI | 96 | Moderate — single study but adequate N |
| Urticaria | COVIDX (PMS data) | — | Minimal — mentioned in usage patterns only |
| Other rare conditions | BI, SAN, PH | Image sets | Variable — depends on condition |
Key weakness: The device claims coverage of all ICD-11 Chapter 14 conditions but most individual conditions (beyond melanoma and pigmented lesions) are covered by only 1–2 studies. The CER must either justify why limited per-condition coverage is acceptable (uniform algorithm architecture argument) or narrow the claims.
4.3. Patient population coverage
| Demographic factor | Available data | Gap |
|---|---|---|
| Age | Studies specify "adult patients (≥18)" but no age distribution breakdown provided | Need to compile available age ranges from study data |
| Sex | Not reported per study | GDPR data minimisation limits collection; must be justified |
| Fitzpatrick skin type | Some studies have data (confirmed by user) — need to identify which and compile | Critical for AI dermatology — must present whatever data exists |
| Geographic diversity | Studies conducted in Spain (Basque Country, Madrid, other regions) | Limited geographic diversity; must justify representativeness |
| Comorbidities | Not systematically reported | Standard for SaMD observational studies; justify |
4.4. User group coverage
| User group | Studies | N practitioners | Assessment |
|---|---|---|---|
| Primary care physicians (PCPs) | DAO_O, DAO_PH, BI, PH, SAN | 30+ | Good |
| Dermatologists | MC_EVCDAO, IDEI, COVIDX, BI, SAN | 30+ | Good |
| IT professionals (deployment) | None | 0 | Not applicable — IT professionals deploy the device, they don't generate clinical data |
4.5. Safety coverage
| Safety aspect | Evidence | Assessment |
|---|---|---|
| Adverse events in CIs | 0 across all 9 studies | Strong — consistent "no adverse events" across 800+ patients |
| Device deficiencies in CIs | 0 reported | Strong |
| Legacy market experience | 7 non-serious incidents, 0 serious, 0 FSCAs (4+ years, 4,500+ reports) | Strong — but NOT included in CER (see Item 3a) |
| Vigilance database search | EUDAMED/MAUDE searches referenced | Need to confirm this is documented |
| Similar device safety | SotA identified no direct patient harm from similar devices | Adequate |
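A zero-event finding can be turned into a quantitative statement for the safety argument. The sketch below computes the exact one-sided upper confidence bound on the adverse event rate when no events are observed, assuming independent subjects with a common event probability (a binomial model). The choice of n = 800 follows the "800+ patients" figure used in this document and is an input, not a verified count.

```python
def zero_event_upper_bound(n, confidence=0.95):
    """Exact one-sided upper bound on the event rate after 0 events in n subjects.

    Solves (1 - p)^n = 1 - confidence for p (binomial model).
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0 < confidence < 1:
        raise ValueError("confidence must be between 0 and 1")
    alpha = 1 - confidence
    return 1 - alpha ** (1 / n)


n = 800  # assumption: headline patient count used elsewhere in this document
upper = zero_event_upper_bound(n)
rule_of_three = 3 / n  # common quick approximation to the same bound

print(f"upper 95% bound: {upper:.4%}, rule of three: {rule_of_three:.4%}")
```

With 800 subjects the bound is roughly 0.37%, so zero observed events is compatible with a true rate of up to about 1 in 270. That supports the "no common harms" argument but cannot by itself exclude rare events, which is where the legacy market experience adds weight.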
5. Gap analysis specific to sufficiency
| # | Sufficiency dimension | What we have | What's missing for BSI | Priority |
|---|---|---|---|---|
| 1 | Benefit-to-study mapping | 7 benefits, 9 studies — mapping is implicit in filter criteria code | Explicit narrative in CER mapping each benefit to its supporting studies, with per-benefit sufficiency conclusion | High |
| 2 | Indication coverage justification | Studies cover melanoma, pigmented lesions, multiple chronic conditions, HS, GPP, alopecia | Explanation of how 9 studies covering ~15 conditions justify claims across all ICD-11 Ch.14 (~346 conditions). The uniform algorithm architecture argument needs to be made explicit | High |
| 3 | Population demographics | "Over 800 patients" — no demographic breakdown | Compile Fitzpatrick data from studies that have it; present available age/sex data; justify gaps via GDPR and study design | High |
| 4 | Per-study sample size justification | Formal calculations exist in CIPs (80% power, alpha 0.05 for IDEI; melanoma ratio for MC_EVCDAO; target sample for others) | CER must summarise the sample size rationale for each study, not just claim "over 800 patients" | Medium |
| 5 | Evidence hierarchy correction | CEP table marks equivalence, vigilance, and PMS as "Not used" | Correct the table to reflect actual data used; align with CER narrative | High |
| 6 | Quality methodology | Studies appraised with mean weight 8.3/10 | CER needs a brief discussion of why observational Level 5 evidence is appropriate for SaMD (no surgical intervention, no randomisation needed for diagnostic accuracy studies) | Medium |
| 7 | SotA comparison narrative | acceptanceCriteriaStateOfTheArtValue exists per claim | CER lacks an explicit "device vs SotA" comparison section with aggregate conclusions. Individual claim-level comparisons exist but no synthesis | High |
| 8 | Severity assessment evidence weakness | AIHS4 has 2 patients; acknowledged as Gap 2 | Must explicitly acknowledge this limitation and justify that PMCF activities will address it; argue that current evidence is sufficient for initial CE mark with planned post-market data collection | Medium |
6. Cross-NC connections
Item 3a — Clinical data analysis
Item 3a research covers the factual gaps (missing PMS data, CI regulatory details, acceptance criteria reconciliation, etc.). Item 3b builds on those findings to construct the sufficiency argument. The fixes are coordinated:
- Item 3a Fix 1 (integrate PMS data) → feeds into Item 3b's safety sufficiency argument
- Item 3a Fix 3 (acceptance criteria reconciliation) → feeds into Item 3b's performance sufficiency argument
- Item 3a Fix 4 (data pooling methodology) → feeds into Item 3b's quantity justification
Item 2b — Clinical benefits, performance, safety vs SotA
Item 2b research addresses the SotA traceability chain. Item 3b's gap #7 (SotA comparison narrative) depends on the same fix: establishing provenance from SotA articles → baselines → acceptance criteria → achieved values.
Technical Review M1.Q1 — IFU performance claims
M1.Q1 research shares the concern about whether all IFU claims are backed by sufficient evidence, and it also covers the reconciliation of the 239 vs 346 ICD-11 category counts.
7. Response strategy
Structure of the sufficiency justification
The response should present a structured sufficiency argument organised along the three dimensions BSI expects:
A. Quantity of data
- Total evidence base: 9 studies, 800+ patients, 60+ practitioners, 4+ years of market experience with legacy device
- Per-study sample size: Summarise each study's sample size calculation, target, and actual enrollment
- Per-benefit evidence: Map each of the 7 clinical benefits to supporting studies and total subjects contributing
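- Statistical power: State the design power per study from its CIP. Gap 4 in section 5 records a formal calculation (80% power, alpha 0.05) only for IDEI, with a melanoma-ratio rationale for MC_EVCDAO and target samples for the others, so confirm against the CIPs before claiming ≥80% power for all studies; AIHS4 relies on a repeated measures design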
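For the per-study sample size summaries, it may help to show what a calculation against a performance goal looks like. The sketch below uses the standard normal-approximation formula for a one-sample proportion test, with the 80% performance goal from the CER as the null value. The expected true performance `p1 = 0.90` and the one-sided test are hypothetical choices for illustration; the actual parameters must be taken from each CIP.

```python
from math import ceil, sqrt
from statistics import NormalDist


def sample_size_vs_goal(p0, p1, alpha=0.05, power=0.80, two_sided=False):
    """Subjects needed to show a proportion exceeds a performance goal p0.

    p0: performance goal (null value); p1: assumed true performance.
    Normal approximation for a one-sample test of a proportion.
    """
    if not 0 < p0 < p1 < 1:
        raise ValueError("need 0 < p0 < p1 < 1")
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / 2) if two_sided else z(1 - alpha)
    z_beta = z(power)
    numerator = z_alpha * sqrt(p0 * (1 - p0)) + z_beta * sqrt(p1 * (1 - p1))
    return ceil((numerator / (p1 - p0)) ** 2)


# Hypothetical inputs: 80% goal, 90% assumed true performance.
n_one_sided = sample_size_vs_goal(0.80, 0.90)
n_two_sided = sample_size_vs_goal(0.80, 0.90, two_sided=True)
print(n_one_sided, n_two_sided)
```

With these illustrative inputs the formula gives 83 subjects one-sided and 108 two-sided. The required sample grows quickly as the assumed performance approaches the goal, which is why each study summary should state its own assumed effect size rather than a generic power figure.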
B. Quality of data
- Study designs: Prospective or mixed prospective/retrospective for all studies except AIHS4, which is retrospective and longitudinal; observational designs are appropriate for SaMD diagnostic accuracy (no surgical intervention; reference standard available)
- Appraisal scores: Mean weight 8.3/10 across the portfolio; no study below 6.5/10
- Level of evidence: Level 5 (observational) is appropriate for SaMD — cite MDCG 2020-1 (clinical evaluation of MDSW) which acknowledges that RCTs may not be appropriate or feasible for SaMD
- Data quality controls: DIQA algorithm validates image quality in real-time; this mirrors real-world use because the device itself rejects poor quality images
C. Coverage
- Clinical benefits: Table mapping 7 benefits → studies → subjects → sufficiency conclusion
- Indications: Justify coverage through the uniform algorithm architecture argument — the device processes all skin images through the same pipeline; condition-specific performance is validated for the highest-risk conditions (melanoma, malignant lesions) and representative chronic conditions; full ICD-11 coverage is monitored through PMCF
- Patient populations: Present available demographic data (Fitzpatrick from studies that have it; age ranges; geographic distribution); justify gaps via GDPR data minimisation and argue that skin condition diagnosis is less demographically sensitive than pharmacological interventions
- User groups: PCPs and dermatologists both well-represented across multiple studies
- Safety: Zero adverse events across 800+ patients in CIs + zero serious incidents across 4,500+ reports in market use; discuss why this is sufficient given the device's risk profile (SaMD, human-in-the-loop, no direct patient contact)
- Comparison to SotA: Device performance meets or exceeds SotA baselines derived from 64 articles; present the comparison at the clinical benefit level, not just the individual claim level
Fixes required in the CER
Fix 1: Add a "Sufficiency of clinical evidence" section
New section in the CER containing:
- The benefit-to-study mapping table
- The indication coverage analysis with justification
- Available demographic data and gap justification
- Per-study sample size summary
- Aggregate safety conclusion incorporating both CI data and legacy PMS data
Fix 2: Correct the MDCG 2020-6 evidence hierarchy table
In the CEP (R-TF-015-001, lines 692–710):
- Change Rank 5 (equivalence) from "No" to "Yes" — reference the equivalence assessment
- Change Rank 7 (complaints/vigilance) from "No" to "Yes" — reference PSUR/PMS Report
- Change Rank 8 (proactive PMS/surveys) from "No" to "Yes", referencing the COVIDX CUS/DUQ/SUS questionnaires and the PMCF surveys
Fix 3: Add device vs SotA synthesis
The CER currently presents individual performance claims with SotA values but no synthesis. Add a section that:
- Groups performance by clinical benefit
- Compares aggregate device performance vs SotA baselines
- Draws per-benefit conclusions on whether the device meets, exceeds, or falls below SotA
- Acknowledges limitations and how PMCF addresses them
Fix 4: Acknowledge and justify evidence limitations
Proactively address known weaknesses:
- AIHS4 small sample (2 patients) — justified by repeated measures design; Gap 2 in PMCF
- Limited per-condition coverage beyond melanoma — justified by uniform architecture; PMCF monitoring
- Limited geographic diversity (Spain only) — justified by skin condition universality; planned international PMCF
- Observational designs only — justified by MDCG 2020-1 guidance on MDSW evidence requirements
8. Risk assessment
| Risk | Impact | Mitigation |
|---|---|---|
| BSI concludes evidence is insufficient for all claimed indications | Could require narrowing claims to only validated conditions, which would impact IFU and intended purpose | Present the uniform architecture argument clearly; show that high-risk conditions (melanoma) have strongest coverage; acknowledge the remaining coverage gaps and show how PMCF monitoring addresses them |
| AIHS4's 2-patient study undermines severity assessment benefit | BSI may require additional pre-market data for severity claims | Frame as "initial validation with confirmatory PMCF" per MDCG 2020-7; emphasise that COVIDX provides additional severity data for chronic conditions |
| Demographic coverage gaps (no age/sex breakdown) undermine population claim | BSI may question whether results generalise across demographics | Compile Fitzpatrick data from studies that have it; present the distribution of study sites across Spanish regions and justify representativeness; cite GDPR data minimisation as legitimate constraint |
| Evidence hierarchy inconsistency triggers a secondary finding | Could generate a new NC about CEP quality | Fix the table proactively in both CEP and CER before responding |
9. Open items
Most open items for Item 3b are the same as Item 3a (see question-for-jordi.mdx). One additional item:
- Which studies have Fitzpatrick data? — User confirmed some studies have Fitzpatrick skin type data. Need to identify which ones and compile the data for the population coverage analysis. This may require reading each study's CIR in detail.