Response: Item 2B
Introduction
In response to the observations regarding the clarity, traceability, and justification of clinical benefit, performance, and safety outcomes, we have performed a comprehensive update of our technical documentation. This update specifically addresses the derivation of acceptance criteria from the State of the Art (SotA), justifies specific thresholds, clarifies metric definitions, explains data pooling and indication labels, and provides the safety benchmarking against the SotA.
1. Traceability of Acceptance Criteria to State of the Art (SotA)
We acknowledge the observation that the analytical link between the reported SotA baselines and the acceptance criteria was not sufficiently explicit, and that the rationale for selecting specific articles and similar devices was not fully detailed. To address this, we have updated R-TF-015-003 (Clinical Evaluation Report) and significantly expanded R-TF-015-011 (State of the Art).
- Rationale for Selection (Why they were chosen): In the SotA document (R-TF-015-011), we have added a new subsection, "Rationale for the Selection of Articles and Similar Devices". This explicitly details that articles were not chosen arbitrarily, but prioritized based on their clinical relevance (e.g., evaluating human-in-the-loop performance, which matches our intended use) and methodological quality (prioritizing meta-analyses and MRMC pivotal trials). It also clarifies that similar devices (such as DermaSensor, SkinVision, and ModelDerm) were included because their FDA/CE-marked status establishes the current technological and competitive benchmark for acceptable benefit-risk profiles.
- Systematic Derivation & Deep Analysis: We moved beyond summarizing articles by adding a detailed "Data Pooling and Statistical Analysis" methodology and a "Clinical Domains and Traceability to SotA" section in both the CER and SotA documents. These sections provide a direct chain of evidence, explicitly linking each Clinical Claim to the specific subset of chosen SotA Articles, explaining the statistical synthesis performed (meta-analysis or weighted average), the Derived SotA Baseline, and the final Acceptance Criterion. This approach ensures that the benchmarks are statistically grounded and directly traceable to the highest-quality clinical evidence, rather than being simple summaries.
- Pooling Methodology: We have documented the pooling methodology in the CER (R-TF-015-003, section "Data Pooling Methodology"). Aggregate performance metrics (globalValueOfDevice) are calculated using a weighted average formula: Sigma(achievedValue x sampleSize) / Sigma(sampleSize). The pooled studies were explicitly evaluated for clinical comparability and homogeneity. They present populations representative of real-world clinical practice in both primary care and dermatology consultations, ensuring results are applicable to the intended population and supporting the generalizability of the findings.
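The sample-size-weighted pooling formula above can be sketched as follows. This is a minimal illustration only: the `achievedValue`/`sampleSize` field names come from the formula in the CER, and the study entries are hypothetical placeholders, not data from the clinical investigations.

```python
def pool_metric(studies):
    """Pool a performance metric across studies, weighting each study
    by its sample size:
    globalValueOfDevice = Sigma(achievedValue x sampleSize) / Sigma(sampleSize)
    """
    total_weighted = sum(s["achievedValue"] * s["sampleSize"] for s in studies)
    total_samples = sum(s["sampleSize"] for s in studies)
    if total_samples == 0:
        raise ValueError("cannot pool studies with zero total sample size")
    return total_weighted / total_samples

# Illustrative example: three hypothetical studies reporting Top-1 accuracy.
studies = [
    {"achievedValue": 0.60, "sampleSize": 100},
    {"achievedValue": 0.70, "sampleSize": 300},
    {"achievedValue": 0.50, "sampleSize": 100},
]
print(pool_metric(studies))  # 0.64
```

Larger studies pull the pooled value toward their result, which is the intended behaviour when the pooled studies have been assessed as clinically comparable and homogeneous.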
2. Justification of Acceptance Criteria
The following justifies the acceptance criteria values that were noted as appearing low. These explanations have been incorporated into the CER (R-TF-015-003):
- Alopecia Severity Assessment (Cohen's Kappa >= 0.6): There is a recognized lack of specific literature addressing inter-observer agreement for pathological severity assessment of Female Androgenetic Alopecia. In the absence of disease-specific benchmarks, we adopted the Landis & Koch framework, which is the gold standard for interpreting the Cohen's Kappa metric. In this framework, 0.41-0.60 represents "Moderate" agreement. Given the inherent subjectivity of visual severity scales in alopecia, a threshold of kappa >= 0.6 (Substantial agreement) is a rigorous and clinically acceptable benchmark for a medical device intended to standardize assessments.
- Diagnostic Accuracy in Rare Diseases (Accuracy >= 54%): Skin rare diseases present a significant challenge due to low incidence and high misdiagnosis rates. An acceptance criterion of 54% represents a significant documented clinical benefit over unaided HCPs. On average, for both dermatologists and PCPs, the use of the device resulted in a 26.77% increase in Top-1 diagnostic accuracy, a 25.56% increase in sensitivity, and a 23.50% increase in specificity for rare diseases (based on pivotal studies BI 2024 and PH 2024). This represents a meaningful improvement in diagnostic precision for complex cases where biopsy remains the current alternative.
- Teledermatology Referral Outcomes (Sensitivity Improvement >= 30%): The 30% figure does not represent the total sensitivity, but rather the improvement in the detection of cases requiring referral when using the device, compared to an unaided baseline (which in our studies was 0% for remote detection of specific referral criteria). This represents a clinically meaningful documented enhancement of primary care physician performance during teledermatology consultations.
- Expert Panel Alignment (Majority Vote >= 75%): Methodological literature for expert consensus does not set a single universal threshold; however, an agreement of >= 75% is frequently considered a substantial or optimal majority consensus in clinical validation. This threshold ensures the device aligns with the consolidated judgment of a qualified expert panel, providing a robust reference standard for performance evaluation.
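As an illustration of how the unweighted Cohen's Kappa cited in the alopecia criterion is computed and mapped to the Landis & Koch interpretation bands, the following is a minimal self-contained sketch; the rating sequences are hypothetical examples, not study data.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa between two raters over the same cases."""
    assert len(rater_a) == len(rater_b)
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    # Expected chance agreement, from each rater's marginal frequencies.
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    return (observed - expected) / (1 - expected)

def landis_koch(kappa):
    """Map kappa to the Landis & Koch (1977) interpretation bands."""
    bands = [(0.00, "poor"), (0.20, "slight"), (0.40, "fair"),
             (0.60, "moderate"), (0.80, "substantial")]
    for upper, label in bands:
        if kappa <= upper:
            return label
    return "almost perfect"

# Hypothetical severity ratings (e.g., 1-3 scale) from two raters.
k = cohens_kappa([1, 2, 2, 3, 3, 3, 1, 2], [1, 2, 2, 3, 2, 3, 1, 1])
print(round(k, 3), landis_koch(k))  # 0.628 substantial
```

Note that under Landis & Koch, values strictly above 0.60 fall in the "Substantial" band, which is what the kappa >= 0.6 acceptance threshold targets.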
3. Metric Definitions and Terminology
To ensure clarity for clinical reviewers and users, and to explicitly address the rationale behind the chosen metrics, we have expanded the metric definitions and indication labels:
- Metric Rationale and Relevance: We have added explicit definitions for Top-1, Top-3/Top-5 accuracy, AUC, Sensitivity, Specificity, PPV, NPV, ICC, Unweighted Kappa, and Experts' Consensus to the Glossary of the CER (R-TF-015-003), explicitly detailing why they were chosen and how they are relevant to the intended purpose:
- Top-1 Accuracy: Represents the "exact match" performance. Chosen to benchmark the algorithm's absolute precision against the single primary diagnosis made by clinicians.
- Top-3 / Top-5 Accuracy: Reflect the real-world clinical workflow of formulating a differential diagnosis. High Top-3/5 accuracy ensures the correct diagnosis is presented among the suggestions, prompting the HCP to consider it.
- AUC, Sensitivity, and Specificity: AUC demonstrates core discriminative power independent of thresholds; Sensitivity and Specificity demonstrate safety and utility at the clinical operating point.
- PPV and NPV: Chosen to evaluate the reliability of positive and negative findings respectively, quantifying the probability that the device's output correctly reflects the patient's true state.
- Intraclass Correlation Coefficient (ICC) and Unweighted Kappa: Chosen to evaluate the consistency and inter-rater agreement between the device's quantitative/categorical severity assessments and expert clinical judgment.
- Experts' Consensus (Majority Vote): Chosen to establish a robust reference standard for complex cases where individual expert opinions may vary.
- Efficiency Metrics: Explicit definitions for Reduction in Cumulative Waiting Time and Reduction in Unnecessary Referrals were added to quantify the systemic impact of the device on healthcare workflows.
- Multiple Conditions Clarification: We have added a clarification in the CER (R-TF-015-003, section "Clarification on Multiple conditions"). The indication label "Multiple conditions" does not refer to an unspecified group of diseases. It reflects a broad, representative inclusion aligned with the diverse ICD-11 categories evaluated in the respective clinical studies, mirroring the device's intended diagnostic scope.
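The metric definitions above follow their standard formulations; as a minimal illustrative sketch (with hypothetical diagnosis lists and labels, not study data), Top-k accuracy and the confusion-matrix metrics can be computed as:

```python
def top_k_accuracy(ranked_predictions, truths, k):
    """Fraction of cases where the true diagnosis appears among the
    top-k ranked suggestions (Top-1 = exact match; Top-3/Top-5 reflect
    a differential-diagnosis workflow)."""
    hits = sum(truth in preds[:k]
               for preds, truth in zip(ranked_predictions, truths))
    return hits / len(truths)

def binary_metrics(y_true, y_pred):
    """Sensitivity, specificity, PPV, NPV from binary labels (1 = positive).
    Illustration only: assumes each denominator is non-zero."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    return {
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
        "ppv": tp / (tp + fp),          # reliability of positive findings
        "npv": tn / (tn + fn),          # reliability of negative findings
    }

# Hypothetical example: two cases with ranked differential diagnoses.
ranked = [["melanoma", "nevus", "bcc"], ["psoriasis", "eczema", "acne"]]
truths = ["nevus", "acne"]
print(top_k_accuracy(ranked, truths, 1))  # 0.0
print(top_k_accuracy(ranked, truths, 3))  # 1.0
```

The contrast between Top-1 and Top-3 in the example mirrors the rationale stated above: a correct diagnosis that is not the first suggestion can still be surfaced within the differential for the HCP to consider.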
4. Clinical Safety Outcomes and Benchmarking
To clarify how safety rates are established based on SotA and similar devices, and to provide the requested traceability, we have expanded both the CER (R-TF-015-003) and the SotA document (R-TF-015-011):
- Quantitative SotA Analysis: We have added a new section to the SotA document, "Hazards and Safety Rates of AI-Guided Medical Devices", which performs a deep, quantitative analysis of safety outcomes (adverse events, false negatives, and technical failures) reported in vigilance databases and literature for similar devices. This provides the direct source and justification for the benchmark safety rates used.
- Safety Benchmarking: The CER now includes a section, "Safety Benchmarking against State of the Art", which presents a direct comparison between our observed safety outcomes (0 incidents in investigations with >800 patients) and the benchmarks derived in the SotA analysis.
- Justification of Relevance: We have added specific "Relevance to Acceptance Criteria" descriptions for each safety hazard in the SotA document. This explains how market error rates (e.g., the 5-10% false negative rates of similar AI tools) were analyzed to justify our device's stringent safety objectives, accounting for the human-in-the-loop clinical workflow. This ensures that our safety acceptance criteria are both clinically relevant and appropriate for a Class IIb device.
5. Use Environment vs. Remote Care
Regarding the observation that clinical benefit 0ZC (remote diagnosis/referral) seems to contradict the use environment stated in §14 of the CEP, we clarify that there is no contradiction. The apparent conflict arises from conflating the IT deployment environment with the clinical workflow modality:
- Use Environment (IT Deployment Context): The text stating that the device is "intended to be used in the setting of healthcare organisations" describes where the software runs—specifically, as an API integrated into a healthcare organisation's IT infrastructure.
- Remote Care (Clinical Workflow Modality): A clinician reviewing images remotely (e.g., via teleconsultation) while accessing the device through their organisation's systems is using the device "in the setting of healthcare organisations." Teledermatology is a standard clinical workflow that operates entirely within the stated IT use environment. No changes to the intended purpose or use environment text are required.
6. Navigability and Evidence Synthesis
To address the concern regarding the large number of individual performance claims (~148) and provide a coherent view of the evidence base, we have added a "Summary of Clinical Benefits Achievement" table in the CER (R-TF-015-003). This aggregate view demonstrates that the device has successfully achieved all defined goals across the seven clinical domains. The observed aggregate magnitudes are summarized as follows:
- Improved Diagnostic Accuracy (7GH): Achieved a +18.5% aggregate increase in Top-1 accuracy across all HCP tiers (Acceptance Criterion: >= 15%). Supported by 70 aggregated claims (e.g., MRT, 9D7, ZKC...) derived from studies: BI_2024, IDEI_2023, MC_EVCDAO_2019, PH_2024, SAN_2024.
- Reduced Waiting Times (3KX): Achieved a 56% reduction in cumulative waiting time (Acceptance Criterion: >= 50%). Supported by 14 aggregated claims (ZGP, RND, 3BD, NVT, VCT, KPQ, 1M1, UGS, IP4, WOI, V2J, WL4, LYP, 8MV) derived from studies: COVIDX_EVCDAO_2022, DAO_Derivación_PH_2022, DAO_Derivación_O_2022, PH_2024, SAN_2024.
- Optimized Referral Prioritization (8PL): Achieved a 38% reduction in unnecessary referrals (Acceptance Criterion: >= 30%). Supported by 8 aggregated claims (DCH, DZC, CST, 6H0, H4U, 04D, D62, 8H5) derived from studies: DAO_Derivación_O_2022.
- Accuracy in Malignancy Detection (1QF): Achieved an aggregate AUC of 0.97 (Acceptance Criterion: >= 0.90). Supported by 20 aggregated claims (EAC, DX7, LU4, R9P, 0L2, 7ZI, FIQ, GS5, 6EP, PZD, V2U, 6U1, JFM, ZM8, 4JY, 9OD, BRI, VFY, 9G4, Z96) derived from studies: DAO_Derivación_PH_2022, DAO_Derivación_O_2022, IDEI_2023, MC_EVCDAO_2019.
- Accuracy in Rare Diseases (9VW): Achieved 54.8% aggregate Top-1 accuracy (Acceptance Criterion: >= 54%). Supported by 24 aggregated claims (DII, KOQ, NK7, DR7, 0I1, WAM, JBB, ERK, 8PG, DIK, 99Y, 8QZ, 4KO, S03, TG6, OR5, Q2D, MM8, I7Y, Z90, 6YW, REV, 5W2, CH0) derived from studies: BI_2024, PH_2024.
- Severity Assessment Support (5RB): Achieved an ICC of 0.727 for severity assessment (Acceptance Criterion: >= 0.72). Supported by 9 aggregated claims (LL5, SDP, 3OA, EZ1, JWQ, A1Q, 284, 3OB, 7TS) derived from studies: AIHS4_2025, COVIDX_EVCDAO_2022, IDEI_2023.
- Remote Care Capacity (0ZC): Achieved a +30% improvement in referral sensitivity compared to unaided remote baselines (Acceptance Criterion: >= 30%). Supported by 5 aggregated claims (P30, LHF, 4BO, WOI, WL4) derived from studies: COVIDX_EVCDAO_2022, DAO_Derivación_O_2022, PH_2024, SAN_2024.
By presenting these aggregate outcomes, we frame the ~148 detailed performance claims as robust supporting evidence for a clear and unified clinical benefit case.
Summary of Changes
- R-TF-015-011 (State of the Art): Added the Methodology for Establishing Acceptance Criteria.
- R-TF-015-003 (Clinical Evaluation Report):
- Added the "Acceptance Criteria Derivation from State of the Art" section with detailed literature derivation mappings.
- Added the "Summary of Clinical Benefits Achievement" table to provide a coherent aggregate view of the evidence.
- Added the "Data Pooling Methodology" and "Clarification on Multiple conditions" sections.
- Added the "Safety Benchmarking against State of the Art" section comparing safety outcomes to similar devices from vigilance databases.
- Updated the Glossary with definitions for Top-1, Top-3, and Top-5 accuracy.
- IFU: Updated the Glossary with definitions for Top-1, Top-3, and Top-5 accuracy.