Response
The sufficiency of the clinical evidence is justified through a structured, risk-proportionate analysis addressing quantity, quality, coverage of indications, coverage of patient populations, and identified gaps. The justification framework follows MDCG 2020-6 (evidence hierarchy and gap management), MEDDEV 2.7.1 Rev 4 Sections 9–10 (appraisal and analysis methodology), and MDCG 2020-1 (MDSW clinical evidence requirements). We have updated R-TF-015-003 (Clinical Evaluation Report) with a dedicated "Justification of sufficiency of clinical evidence" section that presents this analysis in full.
Per MDCG 2020-1, sufficiency for medical device software is demonstrated across three pillars: Valid Clinical Association (VCA), established through the systematic literature review in R-TF-015-011 linking the visual morphological features processed by the AI model to the ICD-11 dermatological categories; Technical Performance, demonstrated through algorithm validation studies and the AI development report (R-TF-028-005); and Clinical Performance, demonstrated through the clinical investigations described below, conducted in real-world clinical settings with real patients. The pre-market evidence base, comprising Rank 2–4 clinical studies, Rank 7–8 real-world deployment data, Rank 7 legacy market data, and published Technical Performance evidence (MDCG 2020-1 Pillar 2), is sufficient to support initial CE marking. PMCF activities provide essential prospective confirmation per MDCG 2020-6 § 6.4, with targeted activities for the two declared acceptable gaps (Section 3 below) as explicitly permitted under MDCG 2020-6 § 6.5(e).
1. Evidence quantity and quality: characterised by MDCG 2020-6 Appendix III ranking
The clinical evidence portfolio comprises nine clinical investigations involving over 800 patients and 60+ healthcare professionals across multiple clinical settings. We have updated the CER to characterise each study by its evidence rank per MDCG 2020-6 Appendix III, rather than applying a uniform quality label:
- Rank 2–4 (analytical observational studies with real patients in clinical settings): MC_EVCDAO_2019 (105 patients, melanoma-suspected lesions, prospective, single-centre) and IDEI_2023 (202 patients, pigmented lesions and alopecia, prospective and retrospective). These constitute the primary clinical evidence.
- Rank 7–8 (real-world deployment and proactive PMS data): COVIDX_EVCDAO_2022 (160 patients, prospective, 6 dermatologists), DAO_Derivación_O_2022 (117 patients, prospective, primary care referrals), DAO_Derivación_PH_2022 (131 patients, prospective, primary care referrals). These studies were conducted in real clinical workflows and constitute clinical data per MDR Article 2(48).
- Rank 11 (simulated use with healthcare professionals): BI_2024 (100 images, 15 HCPs), PH_2024 (30 images, 9 PCPs), SAN_2024 (29 images, 16 practitioners). These multi-reader, multi-case (MRMC) studies are classified as simulated use per MDCG 2020-6 Appendix III and are positioned as corroborating technical performance evidence, not primary clinical data. They demonstrate the magnitude of HCP diagnostic improvement when using the device but do not constitute clinical data under MDR Article 2(48).
- Rank 7 (complaints and vigilance data): The equivalent legacy device has been in clinical use since 2020, generating over 4,500 clinical reports across 21 contracts. During the surveillance period there were no complaints regarding clinical safety or performance, and no serious incidents, CAPAs, or FSCAs. These data constitute clinical data per MDR Article 2(48) and were appraised using the IMDRF MDCE WG/N56 Appendix F quality criteria as endorsed by MDCG 2020-6 Appendix I. The use of this legacy market data as clinical evidence is grounded in MDCG 2020-6 § 6.2.2, which addresses the use of post-market data from legacy devices in the clinical evaluation of the transitioning device.
The sufficiency argument rests on the portfolio as a whole: primary evidence from Rank 2–4 real-world clinical studies, corroborated by Rank 7–8 deployment data and Rank 11 simulated-use studies. This characterisation is now explicit in R-TF-015-003.
2. Coverage of indications: 3-tier evidence structure with 7-category epidemiological framework
We have updated the CER to justify coverage of all relevant indications using a risk-proportionate, 3-tier evidence structure and a 7-category epidemiological framework based on the Global Burden of Skin Disease (Karimkhani et al. 2017).
The 7 epidemiological categories and their evidence coverage are: infectious diseases (57% of global burden, 4 studies), other conditions (19%, 7 studies), inflammatory diseases (15%, 7 studies), malignant and pre-malignant neoplasms (5%, 7 studies), autoimmune diseases (3%, 2 studies), genodermatoses (1%, 0 studies), and vascular conditions (1%, 4 studies). The evidence coverage matrix, showing which studies cover which categories, is now documented in R-TF-015-003.
The 3-tier evidence structure assesses the clinical evidence at a depth proportionate to the clinical risk of misclassification:
- Tier 1: Malignant conditions (individual analysis). Individual acceptance criteria per condition or condition group. MC_EVCDAO_2019 provides dedicated melanoma evidence (AUC 0.8482, Top-3 sensitivity 0.9032 for melanoma; AUC 0.8983 for overall malignancy detection). Six additional studies contribute malignancy data across melanoma, BCC, SCC, and actinic keratosis. This satisfies MEDDEV 2.7.1 Rev 4 Annex A7.3 (per-indication sensitivity/specificity for major clinical indications) and MDCG 2020-6 (no unsupported pooling for high-risk indications).
- Tier 2: Rare diseases (grouped analysis). Grouped analysis with a dedicated acceptance criterion. BI_2024 and PH_2024 provide evidence for rare disease diagnosis: +26.77% improvement in Top-1 accuracy for conditions including GPP, acne conglobata, palmoplantar pustulosis, subcorneal pustular dermatosis, and pemphigus vulgaris. This satisfies MDCG 2020-6 § 6.5(e) by demonstrating specific evidence for a higher-risk subgroup.
- Tier 3: General conditions (pooled with risk-based justification). Pooled analysis across non-malignant, non-rare categories. The risk-based justification for pooling, now documented in the CER, rests on four factors: (a) comparable clinical consequence of misclassification within these categories (delayed or modified treatment rather than mortality, with the physician's clinical assessment providing the safety net); (b) the device outputs a single probability distribution across all ICD-11 categories simultaneously, so pooled assessment reflects how the device actually works; (c) representative sampling across epidemiological categories, demonstrated by the coverage matrix covering 5 of 7 categories (97% of presentations); and (d) a single Vision Transformer architecture that processes all inputs through the same feature extraction pipeline.
3. Declared acceptable gaps: MDCG 2020-6 § 6.5(e)
Per MDCG 2020-6 § 6.5(e), when evidence is insufficient for an indication, the gap must be declared acceptable with justification and addressed via PMCF. Two epidemiological categories are declared as acceptable evidence gaps in the CER (R-TF-015-003, "Need for more clinical evidence"):
- Gap 4: Autoimmune diseases (3% of dermatological presentations). Only bullous pemphigoid (5 cases in DAO_Derivación_O_2022) provides autoimmune-specific evidence not already counted in Tier 2. The gap is declared acceptable because autoimmune conditions typically require serological confirmation beyond visual assessment, the device is a decision-support tool, and misranking does not carry an acute mortality risk comparable to malignancy. Addressed by PMCF Activity D.1 (prospective surveillance, 50-case target, Top-3 accuracy >= 60%).
- Gap 5: Genodermatoses (1% of presentations). Zero direct representation. The gap is declared acceptable because these conditions are diagnosed through genetic testing and clinical history, their extreme rarity makes prospective recruitment impractical, and the device's role is supportive. Addressed by PMCF Activity D.2 (passive surveillance, safety-trigger-based).
Coverage of the remaining 5 categories (97% of presentations) is demonstrated by the coverage matrix. The coverage is strong for other conditions, inflammatory diseases, and malignant neoplasms (7 studies each); moderate for infectious diseases (4 studies); and adequate for vascular conditions (4 studies, predominantly benign presentations).
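As an illustrative cross-check only (not part of the CER), the 97% coverage figure can be reproduced from the category shares and study counts quoted in Section 2. The Python sketch below assumes a simple dictionary layout; the names `CATEGORIES`, `DECLARED_GAPS`, and `burden_share` are hypothetical and chosen for this example.

```python
# Burden share (percent) and number of contributing studies per category,
# copied from the figures stated in this response. The data layout and all
# names are illustrative assumptions, not taken from R-TF-015-003.
CATEGORIES = {
    "infectious diseases": (57, 4),
    "other conditions": (19, 7),
    "inflammatory diseases": (15, 7),
    "malignant and pre-malignant neoplasms": (5, 7),
    "autoimmune diseases": (3, 2),
    "genodermatoses": (1, 0),
    "vascular conditions": (1, 4),
}

# The two categories declared as acceptable evidence gaps.
DECLARED_GAPS = {"autoimmune diseases", "genodermatoses"}


def burden_share(categories, names):
    """Sum the burden share (percent) of the named categories."""
    return sum(categories[name][0] for name in names)


covered_names = [name for name in CATEGORIES if name not in DECLARED_GAPS]
covered_share = burden_share(CATEGORIES, covered_names)
gap_share = burden_share(CATEGORIES, DECLARED_GAPS)

# The quoted shares are rounded, so covered and gap shares need not
# add up to exactly 100.
print(f"Covered: {len(covered_names)} of {len(CATEGORIES)} categories, "
      f"{covered_share}% of presentations")
print(f"Declared gaps: {gap_share}% of presentations")
```

Keeping the declared gaps as an explicit set, rather than inferring them from study counts, mirrors the text: autoimmune diseases have contributing studies but are still declared a gap.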
4. Coverage of patient populations
The CER has been updated with a comprehensive demographic analysis. Study populations span paediatric (6.3%), adult (69.5%), and geriatric (19.6%) patients, with a balanced gender distribution (45.4% male, 54.6% female) and Fitzpatrick skin phototypes I through V. Phototype V is present but in limited numbers; phototype VI has minimal representation. Both phototypes are identified as monitoring priorities in the PMCF plan. This underrepresentation is communicated to users in the IFU (Important Safety Information, "Population and performance variability"), which instructs clinicians to exercise particular judgment for underrepresented populations.
5. Coverage of clinical benefits
The three consolidated clinical benefits are all supported by the evidence portfolio, with all acceptance criteria achieved. Each acceptance criterion was derived from published SotA performance benchmarks identified and appraised in R-TF-015-011 (State of the Art); the derivation methodology and specific SotA articles underpinning each threshold are documented in R-TF-015-003, "Acceptance Criteria Derivation from State of the Art":
- Benefit 7GH: Diagnostic Accuracy (all presentations). Sub-criterion (a) general conditions: +18.5% aggregate Top-1 accuracy improvement (criterion >= 15%), 70 supporting claims. Sub-criterion (b) rare diseases: 54.8% aggregate Top-1 accuracy (criterion >= 54%), 24 claims. Sub-criterion (c) malignant lesions: AUC 0.97 (criterion >= 0.90), 20 claims. Supported by 7 studies spanning Ranks 2–4, 7–8, and 11.
- Benefit 5RB: Objective Severity Assessment. The evidence for severity assessment is structured across two distinct evidence types per MDCG 2020-1. Technical Performance evidence (Pillar 2) is provided by 4 published peer-reviewed validation studies that demonstrate algorithm-level concordance with expert dermatologist consensus on internationally validated severity scales. Clinical Performance evidence (Pillar 3) is provided by the AIHS4_2025 study (2 patients, 16 longitudinal assessments, ICC 0.727) and will be substantially expanded through prospective PMCF activities. Per-condition acceptance criteria and results, drawn from AIHS4_2025 and the published Technical Performance literature, are as follows. Sub-criterion (a) HS severity (IHS4): ICC >= 0.70; observed ICC 0.727 from AIHS4_2025 (real-world clinical setting, limited sample). Sub-criterion (b) psoriasis severity (PASI): visual sign classification accuracy >= human annotator consensus; observed 60.6% vs 52.5% for erythema, from Mac Carthy et al., JEADV Clinical Practice, 2025 (2,857 images, 4 expert dermatologists). Sub-criterion (c) urticaria severity (UAS): Krippendorff alpha >= 0.60; observed alpha 0.826 for hive counting, from Mac Carthy et al., JID Innovations, 2024 (313 images, 5 dermatologists). Sub-criterion (d) AD severity (SCORAD): RMAE <= 15%; observed RMAE 13.0%, from Medela et al., JID Innovations, 2022. An additional published validation (Hernández Montilla et al., Skin Research and Technology, 2023; 221 images, 6 dermatologists) demonstrates HS severity assessment comparable to that of the most expert physician. The published Technical Performance evidence establishes algorithm-level validity across 4 conditions using retrospective atlas-based datasets. Prospective validation with device-captured clinical images is a primary objective of PMCF activities B.1–B.5 (target: 100+ patients per condition), which will provide essential Clinical Performance evidence confirming real-world severity scoring performance. Clinical investigations contributing severity data: AIHS4_2025, COVIDX_EVCDAO_2022, IDEI_2023.
- Benefit 3KX: Care Pathway Optimisation. Sub-criterion (a) waiting times: 56% reduction (criterion >= 50%), 14 claims. Sub-criterion (b) referral adequacy: 38% reduction in unnecessary referrals (criterion >= 30%), 8 claims. Sub-criterion (c) remote care: +30% improvement in referral sensitivity and 100% expert consensus (criterion >= 75%), 5 claims. Supported by 5 studies.
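For illustration only, the pass/fail logic applied to the acceptance criteria listed above can be sketched in Python. The observed values and thresholds are those quoted in this response; the `Criterion` class, its field names, and the `higher_is_better` flag are assumptions made for this example and do not describe the CER's actual tooling. For psoriasis, the human annotator consensus figure (52.5%) is used as the threshold.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Criterion:
    """One acceptance criterion (hypothetical structure for illustration)."""
    label: str
    observed: float
    threshold: float
    higher_is_better: bool = True  # False for error metrics such as RMAE

    def met(self) -> bool:
        if self.higher_is_better:
            return self.observed >= self.threshold
        return self.observed <= self.threshold


# Values copied from the benefit descriptions in this response.
CRITERIA = [
    Criterion("7GH(a) Top-1 improvement, general (%)", 18.5, 15.0),
    Criterion("7GH(b) Top-1 accuracy, rare diseases (%)", 54.8, 54.0),
    Criterion("7GH(c) AUC, malignant lesions", 0.97, 0.90),
    Criterion("5RB(a) IHS4 ICC", 0.727, 0.70),
    Criterion("5RB(b) PASI erythema accuracy vs annotators (%)", 60.6, 52.5),
    Criterion("5RB(c) UAS Krippendorff alpha", 0.826, 0.60),
    Criterion("5RB(d) SCORAD RMAE (%)", 13.0, 15.0, higher_is_better=False),
    Criterion("3KX(a) Waiting-time reduction (%)", 56.0, 50.0),
    Criterion("3KX(b) Unnecessary-referral reduction (%)", 38.0, 30.0),
    Criterion("3KX(c) Expert consensus, remote care (%)", 100.0, 75.0),
]

results = {c.label: c.met() for c in CRITERIA}
all_met = all(results.values())

for label, passed in results.items():
    print(f"{'PASS' if passed else 'FAIL'}  {label}")
print("All acceptance criteria met:", all_met)
```

Encoding the direction of each comparison explicitly avoids misreading an error metric such as RMAE, where a lower observed value is the favourable outcome.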
6. Safety sufficiency
Safety is confirmed by two independent data sources. First, zero serious adverse events or device-related complications were reported across all nine clinical investigations (800+ patients). At zero observed events, the upper 95% confidence bound for the true adverse event rate is approximately 0.375% (rule of three, 3/n with n = 800, per MEDDEV 2.7.1 Rev 4 Annex A7.4). Second, the equivalent legacy device's market experience (2020–2025: over 4,500 reports, zero serious incidents, zero FSCAs) provides independent real-world safety confirmation appraised under MDCG 2020-6 § 6.2.2.
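The rule-of-three bound can be checked against the exact one-sided binomial bound for zero observed events, which solves (1 - p)^n = 0.05 for p. The Python sketch below is a minimal illustration that assumes n = 800 as the conservative sample size; the function names are hypothetical.

```python
def rule_of_three_upper(n: int) -> float:
    """Approximate upper 95% bound on the event rate when 0 events occur in n subjects."""
    return 3.0 / n


def exact_upper_zero_events(n: int, confidence: float = 0.95) -> float:
    """Exact one-sided upper bound for 0 events in n subjects.

    Solves (1 - p) ** n = 1 - confidence for p.
    """
    return 1.0 - (1.0 - confidence) ** (1.0 / n)


n_patients = 800  # assumption: lower end of the "800+" figure quoted above
approx_bound = rule_of_three_upper(n_patients)
exact_bound = exact_upper_zero_events(n_patients)

print(f"Rule of three: {approx_bound:.3%}")
print(f"Exact bound:   {exact_bound:.3%}")
```

The exact bound is marginally lower than 3/n, so the rule of three is slightly conservative at this sample size, and any patient count above 800 tightens both bounds further.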
7. Lifetime sufficiency through PMCF
The sufficiency argument extends beyond pre-market data. The PMCF plan (R-TF-007-002) describes a continuous evaluation lifecycle with 11 targeted clinical activities (A.1 through D.2) and 4 general surveillance methods. These are explicitly mapped to all three clinical benefits, all identified risks with clinical impact, and both declared evidence gaps. The annual PMCF evaluation cycle feeds directly into CER updates per GP-015, ensuring continuous reassessment of the benefit-risk profile throughout the device's market life as required by MDR Article 61(11) and Annex XIV Part B.
For further details, refer to the updated R-TF-015-003 (sections: "Tiered evidence assessment strategy", "Evidence coverage by disease category", "Need for more clinical evidence", "Justification of sufficiency of clinical evidence") and R-TF-015-001 (section 11). Red-lined documentation is provided.