R-TF-015-003 Clinical Evaluation Report
Table of contents
- Executive summary
- How to read this CER
- Clinical evidence strategy and regulatory methodology
- MDCG 2020-13 Specific Considerations (Sections I, J, K): Applicability
- Regulatory basis and evaluation mandate
- Combined evidence strategy: Routes A, B, and C
- MDCG 2020-1: three-pillar evidence framework for MDSW
- Evidence hierarchy and weighting
- Risk-proportionate tiered evidence structure
- Pre-market sufficiency determination
- Scope of the clinical evaluation
- Contraindications and precautions required by the manufacturer
- Clinical benefits
- Data collection, model training and validation
- Data Collection and Management
- Data Partitioning Strategy
- Development-data composition
- Model Training and Development
- Model Evaluation and Validation
- Commissioning and Real-World Validation
- Risk Management
- Traceability and Documentation
- Status of commercialization
- Previous version of the device
- Similar devices on the market
- Current knowledge - State of the Art
- Clinical Evaluation of the device
- Clinical evidence assessment framework
- Validated methodological quality appraisal
- Tiered evidence assessment strategy
- Data pooling methodology
- Evidence coverage by disease category
- Type of evaluation
- Demonstration of equivalence
- Clinical data generated and held by the manufacturer
- Clinical data collected from literature search
- Analysis of the clinical data
- Supplementary Literature Review: April 2026
- Acceptance Criteria Derivation from State of the Art
- Summary of Clinical Benefits Achievement
- Necessary measures
- Limitations and residual uncertainty
- Conclusions
- Date of the next Clinical Evaluation
- Qualification of the responsible evaluators
Executive summary
This Clinical Evaluation Report (CER) has been prepared in accordance with the requirements of Regulation (EU) 2017/745 (Medical Devices Regulation, MDR) and in line with MEDDEV 2.7/1 Revision 4 ("Clinical evaluation: Guidance for manufacturers and notified bodies under Directives 93/42/EEC and 90/385/EEC"), MDCG 2020-1 ("Guidance on Clinical Evaluation (MDR) / Performance Evaluation (IVDR) of Medical Device Software"), MDCG 2020-13 and MDCG 2020-6, as indicated in the associated Clinical Evaluation Plan (CEP).
This CER discusses the benefit/risk profile of using the device under evaluation (hereinafter, "the device") to support healthcare providers in the assessment of skin structures, enhancing the efficiency and accuracy of care delivery, specifically by:
- Providing quantifiable data on the intensity, count and extent of clinical signs such as erythema, desquamation, and induration, among others.
- Providing an interpretative representation, as a probability distribution, of the International Classification of Diseases (ICD) categories that may be represented in the pixel content of the image.
The device covers 346 validated ICD-11 categories of visible skin disease and produces three types of output: (1) a normalised probability distribution across all 346 categories for every image processed, (2) quantitative clinical sign measurements (intensity, count, and extent) for 37 clinical signs, and (3) explainability media (bounding boxes and segmentation masks). The device does not output a diagnosis, a binary result, or a treatment recommendation. Among the 346 categories, 13 are malignant neoplasms (including melanoma subtypes, basal cell carcinoma, squamous cell carcinoma, Merkel cell carcinoma, and cutaneous lymphomas); the detailed enumeration is provided in the "Device description" section. For the complete scope, device output specifications, and malignant condition listing, see "Device description: Device outputs," "Scope of ICD-11 categories," and "High-risk and malignant conditions."
The device qualifies as Medical Device Software (MDSW) under MDCG 2019-11 and is characterised under that guidance's IMDRF-based framework as follows: (1) the significance of the information it provides to the healthcare decision is "Drives clinical management" (IMDRF 5.1.2), since the device's prioritised output directly informs the user's diagnostic and triage decision; (2) the state of the healthcare situation or condition is "Critical" (IMDRF 5.2.1) for the melanoma category, where a delayed or missed identification can lead to irreversible deterioration, and "Serious" (IMDRF 5.2.2) for the remaining 345 non-melanoma ICD-11 categories within the device's probability distribution; (3) the governing IMDRF cell under MDCG 2019-11 is therefore III.i (Critical × Drives), selected under Rule 11's highest-applicable-cell logic. Accordingly, the device is classified as Class IIb under MDR Annex VIII, Rule 11, second indent; the derivation is recorded in the Classification section below and mirrors the device-description record. The previous version of the device (the legacy device) has been commercialized since 2020. The device is manufactured under a Conformity Assessment based on a Quality Management System in accordance with Chapter I of Annex IX of Regulation (EU) 2017/745.
The clinical evaluation aimed to assess the compliance of the device with the relevant general safety and performance requirements (GSPRs) laid down in Regulation (EU) 2017/745 (MDR), namely GSPR 1, 8 and 17.
The clinical evaluation of the device is supported by five pre-market prospective pivotal clinical investigations conducted in real clinical settings with real patients (MC_EVCDAO_2019, COVIDX_EVCDAO_2022, DAO_Derivación_O_2022, DAO_Derivación_PH_2022 and IDEI_2023; Ranks 2–4); one peer-reviewed third-party manuscript providing clinical performance evidence in a specialist malignancy setting (NMSC_2025; Rank 4); four multi-reader multi-case (MRMC) simulated-use reader studies with healthcare professionals (BI_2024, PH_2024, SAN_2024 and MAN_2025; Rank 11 per MDCG 2020-6 Appendix III, contributing supporting evidence to MDCG 2020-1 Pillar 3 §4.4), of which MAN_2025 provides broader Fitzpatrick-phototype generalisability evidence on images sourced from public dermatology atlases; one retrospective proof-of-concept pilot study (AIHS4_2025, n = 2 patients with 16 severity assessments; supporting / proof-of-concept evidence only, not a CE-marking pivotal investigation); and the post-market cross-sectional observational PMS study of the equivalent legacy device across 21 independent clinical sites (R-TF-015-012). R-TF-015-012 is classified at Rank 8 primary per MDCG 2020-6 Appendix III (proactive PMS data) for both its quantitative endpoints and its Likert professional-opinion items; a supplementary Rank 4 case is retained for the quantitative endpoints under the Appendix III note that "high quality surveys may also fall into this category". The evidence portfolio spans multiple study types across the MDCG 2020-6 Appendix III evidence hierarchy; sufficiency is justified by breadth and risk-proportionate design rather than by uniformity of evidence level (see "Tiered evidence assessment strategy").
The legacy predecessor device, on the market since 2020 under MDD CE marking, has undergone continuous evaluation through post-market activities and remains on the market under MDR Article 120(3). Across the legacy predecessor's commercial deployment, 21 client institution contracts have been signed (covering government-run and for-profit care providers). The cumulative diagnostic-report denominator is approximately 250,000 reports, processed by more than 500 distinct healthcare-professional users (counted as unique authenticated user accounts as of the date of this CER revision) and benefiting more than 100,000 distinct patients (de-duplicated by pseudonymised patient identifier where available, and otherwise estimated as a lower bound from per-institution, per-period report throughput). Across the full reporting period, zero MDR Article 87 serious incidents and zero Field Safety Corrective Actions (FSCAs) have been reported (rule-of-three one-sided 95% upper bound on the per-report serious-incident rate: 3/250,000 ≈ 0.0012%), and no Article 88 trend reports have been triggered. Three customer-reported Category 3a events (one clinical-output accuracy feedback, two API-availability events) and two non-safety complaints were logged, investigated, and closed through the R-006-002 non-conformity registry; they are consolidated in the legacy device PMS Report (R-TF-007-003).
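The rule-of-three figure can be reproduced, and compared with the exact binomial bound, with the short sketch below. It assumes, as the rule of three itself does, that each of the approximately 250,000 reports is an independent trial with the same underlying incident probability; the function name is illustrative and not part of the manufacturer's tooling.

```python
def zero_event_upper_bounds(n: int, alpha: float = 0.05) -> tuple[float, float]:
    """One-sided (1 - alpha) upper bounds on an event rate when 0 events occur in n trials.

    Returns (rule_of_three, exact). The exact Clopper-Pearson bound for zero
    events solves (1 - p) ** n = alpha; the rule of three approximates it as 3 / n
    (appropriate for alpha = 0.05 and large n, since -ln(0.05) is about 3).
    """
    if n <= 0:
        raise ValueError("n must be positive")
    rule_of_three = 3.0 / n
    exact = 1.0 - alpha ** (1.0 / n)
    return rule_of_three, exact


# Approximate report denominator stated in this CER (assumed independent trials).
r3, exact = zero_event_upper_bounds(250_000)
print(f"rule of three: {r3:.4%}")  # rule of three: 0.0012%
print(f"exact bound:   {exact:.6%}")
```

The exact bound is marginally tighter than 3/n; at this denominator the two agree to two significant figures, which is why quoting the simpler rule-of-three value is reasonable.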
The post-market observational study R-TF-015-012 (Rank 8 primary for both quantitative endpoints and Likert professional-opinion items per MDCG 2020-6 Appendix III, with a supplementary Rank 4 case retained for the quantitative endpoints) supports, rather than independently confirms, the achievement of the three declared clinical benefits in routine clinical practice across 21 independent sites. Sixty responses were collected; the analysis set is N = 56 after application of the pre-specified evidence-quality substantiation principle stated in Section 10.7 of the study protocol. The co-primary endpoints exceeded their pre-specified MCIDs under Holm-Bonferroni multiplicity correction.
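Holm's step-down procedure, the multiplicity correction named above, can be sketched as follows. The endpoint names and p-values in the usage example are hypothetical and do not reproduce R-TF-015-012 results; the number of co-primary endpoints is likewise an assumption made only for illustration.

```python
def holm_bonferroni(p_values: dict[str, float], alpha: float = 0.05) -> dict[str, bool]:
    """Holm step-down procedure controlling the family-wise error rate at alpha.

    Returns a mapping endpoint -> True if its null hypothesis is rejected.
    """
    ordered = sorted(p_values.items(), key=lambda item: item[1])
    m = len(ordered)
    rejected = {name: False for name in p_values}
    for rank, (name, p) in enumerate(ordered):
        # The (rank + 1)-th smallest p-value is tested against alpha / (m - rank).
        if p <= alpha / (m - rank):
            rejected[name] = True
        else:
            # The first non-rejection stops the procedure; larger p-values are retained.
            break
    return rejected


# Hypothetical one-sided p-values for "mean improvement exceeds the MCID".
example = {"endpoint_A": 0.001, "endpoint_B": 0.030, "endpoint_C": 0.040}
print(holm_bonferroni(example))
# {'endpoint_A': True, 'endpoint_B': False, 'endpoint_C': False}
```

Endpoint C would pass an unadjusted test at 0.05 but is retained because the step-down sequence stops at endpoint B, which is how the procedure keeps the family-wise error rate at the nominal level.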
Justification of Sufficiency of Clinical Evidence
The manufacturer has established a robust body of clinical evidence that demonstrates the safety, performance, and clinical benefit of the device, providing sufficient clinical evidence in both quantity and quality in accordance with Article 61 and Annex XIV of Regulation (EU) 2017/745.
- Quantity: The pre-market clinical data combine five prospective pivotal clinical investigations conducted with the frozen version of the device in real clinical settings on 715 real patients (MC_EVCDAO_2019 n=105, COVIDX_EVCDAO_2022 n=160, DAO_Derivación_O_2022 n=117, DAO_Derivación_PH_2022 n=131, IDEI_2023 n=202); one peer-reviewed third-party manuscript on a specialist malignancy cohort (NMSC_2025, n=135); four multi-reader multi-case (MRMC) simulated-use reader studies with healthcare professionals (BI_2024, PH_2024, SAN_2024, MAN_2025) providing Rank 11 Pillar 3 §4.4 supporting evidence; and one retrospective proof-of-concept pilot study (AIHS4_2025, n=2 patients with 16 severity assessments; classified as Supporting / proof-of-concept and not counted as a pivotal clinical investigation for sufficiency purposes, with powered confirmation pre-committed under PMCF Activity B.5). Post-market clinical evidence is added through the R-TF-015-012 PMS observational study of the equivalent legacy device across 21 client institutions (60 responses collected; analysis set N = 56 after four responses were excluded as unsubstantiated safety flags under the protocol's Section 10.7 evidence-quality substantiation principle). The cumulative dataset covers a broad range of dermatological conditions, user tiers (primary care practitioners and dermatologists), and clinical settings.
- Quality: The clinical evidence portfolio spans multiple study types across the MDCG 2020-6 Appendix III evidence hierarchy. The highest-level individual study is MC_EVCDAO_2019, an analytical observational study (105 patients) specifically designed to assess malignancy detection performance. Real-world evidence from clinical deployment is provided by COVIDX_EVCDAO_2022, DAO_Derivación_O_2022, DAO_Derivación_PH_2022, and IDEI_2023, conducted in actual clinical settings with real patients. Diagnostic accuracy improvement under controlled conditions is systematically quantified by the MRMC simulated-use studies (BI_2024, PH_2024 and SAN_2024), which provide supporting clinical performance (Pillar 3) evidence per MDCG 2020-1. AIHS4_2025 provides preliminary proof-of-concept evidence for severity assessment of hidradenitis suppurativa from a retrospective analysis of clinical trial data; it is not treated as definitive clinical validation and does not on its own support a CE-marking severity claim, with powered confirmation pre-committed under PMCF Activity B.5. All studies were designed following methodologically sound procedures. Pivotal investigations were pre-registered in public databases (ClinicalTrials.gov and EMA RWD Catalogue) and, where applicable, results have been published in peer-reviewed scientific literature.
- Representativeness: The sufficiency of the clinical evidence regarding the intended patient population is supported by a comprehensive demographic analysis of the enrolled subjects. The study populations are representative of the target population across all life stages, including pediatric, adult, and geriatric populations, with a balanced gender distribution. Furthermore, the investigations included patients across diverse skin pigmentations (Fitzpatrick phototypes I to IV), reflecting the demographics of the intended clinical environment. Coverage of Fitzpatrick phototypes V–VI is strengthened by the MAN_2025 multi-reader multi-case study, whose Clinical Investigation Plan and Clinical Investigation Report are recorded under R-TF-015-004 and R-TF-015-006. MAN_2025 evaluates device-assisted diagnostic accuracy on 149 clinical images representing phototypes V–VI presentations of the same dermatological conditions covered by the source MRMC studies.
- Indication Coverage: The coverage of clinical indications is justified through a risk-proportionate, tiered evidence assessment strategy. Malignant conditions (5% of dermatological presentations) are assessed with individual acceptance criteria per condition. Rare diseases are assessed as a dedicated subgroup with specific acceptance criteria. General conditions (infectious 57%, inflammatory 15%, other 19%, and vascular 1%) are assessed as a pooled aggregate with documented risk-based justification. The ten pivotal studies collectively cover conditions from five of the seven major epidemiological categories of dermatological disease, representing 97% of dermatological presentations (infectious 57%, other 19%, inflammatory 15%, malignant 5%, vascular 1%). The two remaining low-prevalence categories, autoimmune (3%) and genodermatoses (1%), have limited or no direct representation in these studies; they are supported by triangulated Pillar 1 and Pillar 2 evidence, with post-certification confirmation pre-specified as targeted PMCF activities (see "Tiered evidence structure"), and are not declared as MDCG 2020-6 § 6.5(e) acceptable gaps.
- Clinical Performance and Safety: The clinical performance of the device, which directly underpins its clinical benefits, has been demonstrated empirically against performance thresholds derived from the generally acknowledged state of the art (SotA). The safety of the device is confirmed by the absence of serious adverse events or device-related complications across all clinical investigations, supported by the extensive market experience of the equivalent legacy device (approximately 250,000 generated reports and zero reported serious incidents or vigilance notifications).
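The 97% coverage figure in the Indication Coverage bullet is the sum of the global-burden shares of the five directly represented epidemiological categories. The sketch below reproduces that arithmetic using only the percentages stated in this section; the dictionary and set names are illustrative.

```python
# Global-burden share (%) of each epidemiological category, as stated in this CER.
CATEGORY_SHARE = {
    "infectious": 57,
    "other": 19,
    "inflammatory": 15,
    "malignant": 5,
    "vascular": 1,
    "autoimmune": 3,
    "genodermatoses": 1,
}

# Categories with direct representation in the pivotal studies (per this section).
REPRESENTED = {"infectious", "other", "inflammatory", "malignant", "vascular"}

represented_share = sum(s for c, s in CATEGORY_SHARE.items() if c in REPRESENTED)
low_prevalence = sorted(c for c in CATEGORY_SHARE if c not in REPRESENTED)

print(f"represented share: {represented_share}%")  # represented share: 97%
print(f"low-prevalence categories: {low_prevalence}")
```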
This robust justification, supported by clinical evidence spanning MDCG 2020-6 Appendix III Ranks 2, 4, 6, 7, 8, and 11 (including the protocolled post-market observational study of the equivalent legacy device under R-TF-015-012, classified at Rank 8 primary for both quantitative and Likert items per §6.2.2, with a supplementary Rank 4 case retained for the quantitative endpoints) and by extensive market experience, confirms that sufficient evidence has been analysed to validate the clinical benefit, safety, and performance of the device for all relevant populations and indications.
Compliance Status Summary
All General Safety and Performance Requirements (GSPRs) are met:
- ✓ GSPR 1 (Performance & Safety): Device achieves intended clinical performance and complies with general safety requirements
- ✓ GSPR 8 (Acceptability of side-effects): No serious incidents reported; acceptable risk profile
- ✓ GSPR 17 (Software validation): AI algorithms meet repeatability, reliability, and performance standards
On the whole, the evaluators concluded that the device complies with the general requirements on safety and performance (GSPR 1), acceptability of side-effects (GSPR 8) and electronic programmable systems and software (GSPR 17) when used as intended by the manufacturer.
This clinical evaluation concludes that the device achieves its intended clinical performance and complies with the general requirements on performance (GSPR 1).
Based on risk management data and observations on the device under evaluation, and considering the results obtained on clinical performance and benefits, the evaluators concluded that the device complies with the general requirements on the acceptability of the benefit/risk profile (GSPR 1, GSPR 8 and GSPR 17).
Acronyms
| Acronym | Definition |
|---|---|
| AUC | Area Under the (ROC) Curve |
| CAPA | Corrective and Preventive Actions |
| CEP | Clinical Evaluation Plan |
| CER | Clinical Evaluation Report |
| CET | Clinical Evaluation Team |
| CIP | Clinical Investigation Plan |
| CIR | Clinical Investigation Report |
| CUS | Clinical Utility Score |
| DIQA | Deep Image Quality Assessment |
| DUQ | Data Utility Questionnaire |
| EMR | Electronic Medical Record |
| EU/EC | European Union / European Community |
| FDA | Food and Drug Administration |
| FHIR | Fast Healthcare Interoperability Resources |
| FMEA | Failure Modes and Effects Analysis |
| FSCA | Field Safety Corrective Action |
| GSPR | General Safety and Performance Requirement |
| HCP | Healthcare Professional |
| ICC | Intra-class Correlation Coefficient |
| IFU | Instructions For Use |
| IMDRF | International Medical Device Regulators Forum |
| ITP | IT Professional |
| MA | Meta-analysis |
| MCID | Minimum Clinically Important Difference |
| MDR | Medical Devices Regulation |
| MDSW | Medical Device Software |
| MEDDEV | MEDical DEVices Documents (European guidance series) |
| MRMC | Multi-Reader Multi-Case (reader study) |
| NPV | Negative Predictive Value |
| PCP | Primary Care Physician |
| PMCF | Post-market Clinical Follow-up |
| PMS | Post-market Surveillance |
| PPV | Positive Predictive Value |
| PSUR | Periodic Safety Update Report |
| RCT | Randomized Controlled Trial |
| RMF | Risk Management File |
| ROC | Receiver Operating Characteristic |
| SaMD | Software as a Medical Device |
| SME | Subject Matter Expert |
| SotA | State of the Art |
| SR | Systematic Review |
| STED | Summary Technical Documentation |
| SUS | System Usability Scale |
| USA | United States of America |
| ViT | Vision Transformer (deep-learning architecture) |
How to read this CER
This section explains the structure of this Clinical Evaluation Report and how to follow the traceability chain from the state of the art, through clinical benefit definitions and acceptance criteria, to the clinical data demonstrating that the criteria are met.
Document structure
This CER follows the five-stage clinical evaluation framework of MEDDEV 2.7.1 Rev 4. The stages map to the following sections of this document:
- Stage 0: Scope: "Scope of the clinical evaluation" (device description, intended purpose, methodology, applicable standards). The Clinical Evaluation Plan (R-TF-015-001) and State of the Art document (R-TF-015-011) provide supporting detail.
- Stage 1: Identification of pertinent data: "Clinical data generated and held by the manufacturer" (the combined pre-market portfolio of six prospective pivotal investigations, three MRMC Rank 11 simulated-use reader studies, one retrospective third-party analysis, and one Fitzpatrick V–VI MRMC reader study) and "Clinical data collected from literature search" (systematic literature review).
- Stage 2: Appraisal of pertinent data: For the SotA corpus, appraisal methodology and scores (CRIT1-7) are documented in R-TF-015-011. For the manufacturer's clinical investigations and published severity validation studies, design-specific validated tools (QUADAS-2 for diagnostic accuracy studies; MINORS for clinical utility, MRMC, and published severity validation studies) are documented in the section "Validated methodological quality appraisal."
- Stage 3: Analysis of clinical data: "Analysis of the clinical data" (safety conformity, clinical performance, acceptance criteria derivation, benefit achievement, benefit-risk assessment).
- Stage 4: This Clinical Evaluation Report documents the finalised analysis, benefit-risk determination and conformity conclusions, and establishes the PMS/PMCF feedback loop required under MEDDEV 2.7.1 Rev 4 §7.
Traceability chain: state of the art to clinical evidence
The central regulatory requirement is a traceable chain demonstrating that the device's clinical benefits are supported by sufficient evidence. The chain runs through five elements, each located in a specific section of this CER:
- State of the art: "Current knowledge: State of the Art" establishes the baseline performance of current clinical practice and comparable devices. The full analysis is in R-TF-015-011 State of the Art.
- Clinical benefit definitions: "Clinical benefits" defines three claimed benefits: 7GH (Diagnostic Accuracy), 5RB (Objective Severity Assessment), and 3KX (Care Pathway Optimisation). Each benefit has sub-criteria. The detailed performance claims are in the Performance Claims & Clinical Benefits document.
- Acceptance criteria derived from the state of the art: "Acceptance Criteria Derivation from State of the Art" contains a table showing, for each benefit and clinical domain, the specific SotA articles used, the methodology (meta-analysis or weighted average), the derived SotA baseline, and the acceptance criterion set above that baseline.
- Clinical data: "Clinical data generated and held by the manufacturer" presents the combined pre-market portfolio (six prospective pivotal investigations; three MRMC Rank 11 simulated-use reader studies; one retrospective third-party analysis; and one Fitzpatrick V–VI MRMC reader study). "Evidence coverage by disease category" maps these against the 7 epidemiological categories of dermatological disease. "Clinical data collected from literature search" presents the systematic review. Each study has its own subsection with design, population, and results.
- Achievement of acceptance criteria: "Summary of Clinical Benefits Achievement" presents a single table where each row is a clinical benefit, showing: acceptance criteria (sub-criteria), observed magnitude, supporting studies, and pass/fail status. This is where the traceability chain terminates.
Examples
To trace a specific benefit end-to-end, three worked examples are provided, one per benefit type:
Example 1: Benefit 7GH (Diagnostic Accuracy), sub-criterion (c) malignant lesions
- Locate the 7GH Melanoma Detection row in "Acceptance Criteria Derivation from State of the Art." The Relevant SotA Article(s) column lists the meta-analysed sources (Maron et al. 2019, Haenssle et al. 2018, etc.); the Methodology column shows "Meta-analysis"; the Derived SotA Baseline column gives AUC 0.81; and the Acceptance Criterion column gives AUC >= 0.81 for melanoma (non-inferiority to the SotA meta-analysis baseline). For pooled multiple malignant conditions, the per-study acceptance criterion is AUC >= 0.80 (per the Clinical Evaluation Plan, R-TF-015-001), with the SotA benchmark at AUC 0.778.
- Navigate to "Clinical data generated and held by the manufacturer" and locate MC_EVCDAO_2019, which provides Tier 1 evidence for melanoma detection.
- Confirm in "Summary of Clinical Benefits Achievement" that benefit 7GH sub-criterion (c) is achieved: MC_EVCDAO_2019 melanoma AUC 0.85 (vs SotA 0.81); MC_EVCDAO_2019 pooled multiple malignant AUC 0.8983 (vs SotA 0.778); DAO_Derivación_O_2022 multiple malignant AUC 0.82 (vs SotA 0.778); DAO_Derivación_PH_2022 multiple malignant AUC 0.842 (vs SotA 0.778). The Status column reads "Achieved."
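The pass/fail logic of the Status column for sub-criterion (c) amounts to comparing each observed AUC with its acceptance threshold. The sketch below encodes the four comparisons listed above, checking the pooled malignant rows against the per-study criterion of 0.80 from the Clinical Evaluation Plan rather than the 0.778 SotA benchmark. It compares point estimates only; whether the criterion is applied to the point estimate or to a lower confidence bound is not stated in this walkthrough, so that choice is an assumption, and the class name is illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AucCheck:
    study: str
    endpoint: str
    observed_auc: float
    criterion: float  # acceptance threshold: observed AUC must be >= criterion

    def achieved(self) -> bool:
        return self.observed_auc >= self.criterion


# Values as listed in Example 1 (point estimates only).
checks = [
    AucCheck("MC_EVCDAO_2019", "melanoma", 0.85, 0.81),
    AucCheck("MC_EVCDAO_2019", "pooled multiple malignant", 0.8983, 0.80),
    AucCheck("DAO_Derivación_O_2022", "multiple malignant", 0.82, 0.80),
    AucCheck("DAO_Derivación_PH_2022", "multiple malignant", 0.842, 0.80),
]

for c in checks:
    status = "Achieved" if c.achieved() else "Not achieved"
    print(f"{c.study:<24} {c.endpoint:<26} AUC {c.observed_auc:.3f} vs >= {c.criterion:.2f}: {status}")
```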
Example 2: Benefit 5RB (Objective Severity Assessment)
- Locate the 5RB Inter-observer Correlation (IHS4) row in "Acceptance Criteria Derivation from State of the Art." The Derived SotA Baseline is ICC 0.47; the Acceptance Criterion is ICC >= 0.70.
- Navigate to "Clinical data generated and held by the manufacturer" and locate AIHS4_2025, the proof-of-concept pilot study for hidradenitis suppurativa severity assessment (n = 2 patients).
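- Confirm in "Summary of Clinical Benefits Achievement" that benefit 5RB achieved ICC 0.727, exceeding the criterion. Given the proof-of-concept design (n = 2 patients, 16 severity assessments), this result is preliminary; powered confirmation is pre-committed under PMCF Activity B.5.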
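An ICC value such as the 0.727 above depends on which ICC form is used. This walkthrough does not state which form AIHS4_2025 applied, so the sketch below assumes ICC(2,1) (two-way random effects, absolute agreement, single rater, after Shrout and Fleiss); it is illustrative only and is not the study's analysis code.

```python
def icc_2_1(ratings: list[list[float]]) -> float:
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    Rows are the assessed subjects (or assessments); columns are raters.
    """
    n = len(ratings)      # number of subjects
    k = len(ratings[0])   # number of raters
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Six subjects rated by four judges (the worked example of Shrout and Fleiss, 1979).
shrout_fleiss = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
print(f"ICC(2,1) = {icc_2_1(shrout_fleiss):.2f}")  # ICC(2,1) = 0.29
```

With only two patients contributing the 16 assessments, any ICC estimate carries a wide confidence interval, which is consistent with treating the AIHS4_2025 result as preliminary.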
Example 3: Benefit 3KX (Care Pathway Optimisation), sub-criterion (b) referral adequacy
- Locate the 3KX Adequacy of Referrals row in "Acceptance Criteria Derivation from State of the Art." The Derived SotA Baseline is a relative increase in adequacy of referrals ranging from 14% (unaided physicians) to 24% (teledermatology); the Acceptance Criterion is a relative increase in adequacy of referrals >= 15% (the SotA-derived threshold applied as the per-study acceptance criterion for DAO_Derivación_PH_2022 and DAO_Derivación_O_2022).
- Navigate to "Clinical data generated and held by the manufacturer" and locate the studies COVIDX_EVCDAO_2022, DAO_Derivación_PH_2022, and DAO_Derivación_O_2022, which provide real-world evidence of referral impact.
- Confirm in "Summary of Clinical Benefits Achievement" that benefit 3KX sub-criterion (b) achieved a +38% relative increase (DAO_Derivación_O_2022) and +25% relative increase (DAO_Derivación_PH_2022), exceeding the ≥ 15% criterion.
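Sub-criterion (b) is expressed as a relative, not absolute, change in the proportion of adequate referrals. The sketch below shows the distinction using hypothetical proportions that are not data from the DAO_Derivación studies; the function and constant names are illustrative.

```python
def relative_increase(baseline_rate: float, new_rate: float) -> float:
    """Relative change in a proportion: (new - baseline) / baseline."""
    if baseline_rate <= 0:
        raise ValueError("baseline_rate must be positive")
    return (new_rate - baseline_rate) / baseline_rate


CRITERION = 0.15  # 3KX sub-criterion (b): relative increase >= 15%

# Hypothetical proportions of adequate referrals, for illustration only.
baseline, with_device = 0.60, 0.78
gain = relative_increase(baseline, with_device)
absolute_pp = (with_device - baseline) * 100

print(f"absolute change: {absolute_pp:.0f} percentage points")  # absolute change: 18 percentage points
print(f"relative change: {gain:.0%}; meets criterion: {gain >= CRITERION}")
```

In this example an 18-percentage-point absolute gain corresponds to a 30% relative increase, so the two framings must not be mixed when comparing results with the ≥ 15% criterion.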
Key tables in this CER
| Table | Section | Purpose |
|---|---|---|
| Epidemiological categories | Device description: Scope of ICD-11 categories | Maps the 346 ICD-11 categories to 7 epidemiological groups with global burden percentages |
| High-risk and malignant conditions | Device description: High-risk and malignant conditions | Lists the 13 malignant neoplasm categories by ICD-11 code |
| Evidence coverage by disease category | Clinical Evaluation: Evidence coverage by disease category | Maps 7 epidemiological categories to the specific studies providing representation |
| Acceptance Criteria Derivation | Clinical Evaluation: Acceptance Criteria Derivation from State of the Art | Links SotA literature to derived baselines and acceptance criteria per benefit |
| Summary of Clinical Benefits Achievement | Clinical Evaluation: Summary of Clinical Benefits Achievement | Consolidated pass/fail results for all 3 benefits with observed magnitudes |
| Safety objectives | Clinical Evaluation: Risk management and residual risks acceptability | Maps residual risks to safety objectives and observed outcomes |
| Predictive Values by Clinical Setting | Clinical Evaluation: Predictive Values by Clinical Setting | PPV/NPV across pre-test probabilities per MEDDEV 2.7.1 Rev 4 Annex A7.3 |
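The Predictive Values by Clinical Setting table reports PPV and NPV across pre-test probabilities. Because predictive values depend on prevalence, they follow from sensitivity and specificity by Bayes' theorem; the sketch below shows that dependence using a hypothetical operating point (sensitivity and specificity of 0.80), not device results.

```python
def predictive_values(sensitivity: float, specificity: float, prevalence: float) -> tuple[float, float]:
    """Return (PPV, NPV) at a given pre-test probability via Bayes' theorem."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    false_neg = (1.0 - sensitivity) * prevalence
    true_neg = specificity * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos), true_neg / (true_neg + false_neg)


# Hypothetical operating point evaluated across three pre-test probabilities.
for prevalence in (0.01, 0.05, 0.20):
    ppv, npv = predictive_values(0.80, 0.80, prevalence)
    print(f"pre-test {prevalence:>4.0%}: PPV {ppv:.3f}  NPV {npv:.3f}")
```

At a 1% pre-test probability the PPV in this example is below 0.04 while the NPV exceeds 0.99, which is why predictive values are reported per clinical setting rather than as a single figure.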
Column definitions for key tables
Acceptance Criteria Derivation from State of the Art:
- Benefit ID: The identifier of the clinical benefit (7GH, 5RB, or 3KX) to which the criterion applies.
- Clinical Domain: The specific clinical question being measured (e.g., Melanoma Detection, Diagnostic Accuracy Improvement, Inter-observer Correlation).
- Relevant SotA Article(s): The published studies used to derive the baseline. Full references are in
R-TF-015-011 State of the Art. - Methodology: How the baseline was derived: "Meta-analysis" (formal statistical pooling), "Weighted Average" (quality- and sample-weighted), or "Literature-benchmarked range."
- Derived SotA Baseline: The synthesised baseline performance from the literature (e.g., AUC 0.81, sensitivity 0.734). This is what current clinical practice or comparable devices achieve.
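- Acceptance Criterion: The pre-defined threshold the device must meet or exceed, set at or above the SotA baseline. For example, the melanoma criterion (AUC >= 0.81) equals the baseline under a non-inferiority framing, whereas the IHS4 inter-observer criterion (ICC >= 0.70) sits above its 0.47 baseline with a clinical significance margin.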
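The "Weighted Average (quality- and sample-weighted)" methodology can be illustrated with the sketch below. The exact weighting scheme is defined in R-TF-015-011 and is not reproduced in this section; the sketch assumes a weight equal to the appraisal-quality score (rescaled to 0–1) multiplied by the sample size, and the study labels and values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SotaEstimate:
    label: str      # hypothetical study label
    value: float    # reported performance metric, e.g. sensitivity
    n: int          # sample size
    quality: float  # appraisal score rescaled to [0, 1] (assumed scaling)


def weighted_baseline(estimates: list[SotaEstimate]) -> float:
    """Quality- and sample-weighted mean, assuming weight = quality * n."""
    weights = [e.quality * e.n for e in estimates]
    total = sum(weights)
    if total == 0:
        raise ValueError("at least one estimate needs a positive weight")
    return sum(w * e.value for w, e in zip(weights, estimates)) / total


# Hypothetical inputs, not taken from R-TF-015-011.
studies = [
    SotaEstimate("study_A", value=0.70, n=100, quality=1.0),  # weight 100
    SotaEstimate("study_B", value=0.80, n=300, quality=0.5),  # weight 150
]
print(f"{weighted_baseline(studies):.2f}")  # 0.76
```

Multiplying quality by sample size means a large but poorly appraised study cannot dominate the baseline; other schemes (for example, inverse-variance weighting in a formal meta-analysis) would give different values.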
Summary of Clinical Benefits Achievement:
- ID: Clinical benefit identifier (7GH, 5RB, 3KX).
- Clinical Benefit: Plain-language description of the benefit and its scope.
- Acceptance Criteria (Sub-criteria): The specific thresholds per sub-criterion (e.g., "(a) General conditions: Top-1 accuracy improvement >= 15%").
- Observed Magnitude: The actual result achieved, aggregated across the supporting studies.
- Supporting Performance Claims & Source Studies: The number of individual performance claims aggregated, their codes, and the pivotal studies providing the data.
- Status: Pass/fail: whether the observed magnitude meets or exceeds the acceptance criterion.
Tiered evidence structure
The clinical evaluation uses a risk-proportionate, tiered evidence structure (detailed in "Tiered evidence assessment strategy"):
- Tier 1 (Malignant conditions, ~5%): Individual acceptance criteria per condition. Highest clinical stakes.
- Tier 2 (Rare diseases): Dedicated subgroup with specific acceptance criteria.
- Tier 3 (General conditions, ~94%): Pooled aggregate assessment with documented risk-based justification.
Two low-prevalence sub-indication categories, autoimmune dermatoses (~3%) and genodermatoses (~1%), are supported by triangulated pre-certification evidence judged sufficient per MDCG 2020-6 §6.3 on the four-test analysis set out in section "Representativeness of the Study Populations". This evidence comprises Pillar 1 Valid Clinical Association (22 load-bearing literature anchors; R-TF-015-011 §Autoimmune and genodermatoses) and Pillar 2 Technical Performance, measured on the device's stand-alone analytical output without a clinician in the loop (per-epidemiological-group V&V in R-TF-028-006: autoimmune AUC 0.948, genodermatoses AUC 0.905, both above the pre-specified ≥ 0.80 acceptance criterion). Post-certification confirmation is pre-specified in the R-TF-007-002 Post-Market Clinical Follow-up (PMCF) Plan, Activities D.1 (autoimmune) and D.2 (genodermatoses), described in sections "Need for more clinical evidence" and "Necessary measures". These two categories are therefore not declared as §6.5(e) acceptable gaps.
Device output architecture
The device outputs a normalised probability distribution across all 346 ICD-11 categories for every image. It does not output a binary diagnosis, a single classification, or a treatment recommendation. Performance claims are therefore framed in terms of ranking accuracy (Top-1, Top-3, Top-5), discrimination (AUC, sensitivity, specificity), inter-rater agreement (ICC, Cohen's kappa), and operational impact (waiting time reduction, referral adequacy, remote care capacity). The "Glossary and Definitions of Metrics" section provides formal definitions.
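Ranking metrics such as Top-1, Top-3 and Top-5 accuracy are computed directly from the normalised distribution. The sketch below assumes a softmax normalisation (this section does not state how the device normalises its scores) and uses a toy four-category example in place of the 346 ICD-11 categories; all names are illustrative.

```python
import math


def softmax(scores: list[float]) -> list[float]:
    """Normalise raw scores into a probability distribution that sums to 1."""
    peak = max(scores)  # subtracting the maximum avoids overflow in exp()
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_hit(probabilities: list[float], reference: int, k: int) -> bool:
    """True if the reference category is among the k most probable categories."""
    ranked = sorted(range(len(probabilities)), key=lambda i: probabilities[i], reverse=True)
    return reference in ranked[:k]


def top_k_accuracy(cases: list[tuple[list[float], int]], k: int) -> float:
    """Fraction of cases whose reference category appears in the top k."""
    return sum(top_k_hit(p, ref, k) for p, ref in cases) / len(cases)


# Toy example: 4 categories instead of 346; reference labels are illustrative.
cases = [
    (softmax([0.2, 2.0, 1.0, 0.1]), 1),  # reference ranked 1st
    (softmax([0.2, 2.0, 1.0, 0.1]), 2),  # reference ranked 2nd
    (softmax([3.0, 1.0, 0.5, 0.4]), 3),  # reference ranked 4th
]
print(top_k_accuracy(cases, k=1), top_k_accuracy(cases, k=3))
```

Because the output is a full distribution rather than a single label, the same case can count as a Top-1 miss but a Top-3 hit, which is why performance claims are framed across several ranking depths.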
Clinical evidence strategy and regulatory methodology
MDR Article 61(1) requires that confirmation of conformity with relevant general safety and performance requirements shall be based on clinical data providing sufficient clinical evidence, including a favourable benefit-risk determination. MDR Annex XIV Part A establishes the structure of the clinical evaluation plan, which must identify the intended purpose, the clinical benefits to patients, the clinical performance parameters, and the methodology chosen to evaluate them. This section sets out the regulatory framework applied in this clinical evaluation and the combined evidence strategy from which the clinical conclusions presented in this report are derived.
MDCG 2020-13 Specific Considerations (Sections I, J, K): Applicability
MDCG 2020-13 defines three "Specific Considerations" sections (I, J, K) that apply only when specific regulatory triggers are engaged. For the device under evaluation, none of the three applies, as recorded below.
- MDR Article 54 clinical-evaluation consultation procedure (Section I). Not applicable. Article 54(1) applies only to Class III implantable devices and to Class IIb active devices intended to administer and/or remove a medicinal product. The device under evaluation is Class IIb non-implantable medical device software classified under MDR Annex VIII Rule 11, second indent, and does not fall within the scope of Article 54(1). No consultation procedure has been initiated and no expert-panel opinion has been solicited under Article 54; accordingly, sub-items (a), (b), and (c) of Article 54(2) are not engaged.
- MDR Article 61(10) exception (Section J). Not applicable. Demonstration of conformity with the applicable GSPRs (1, 8, 17) is based on clinical data as defined by MDR Article 2(48), presented in this CER within the three-pillar MDSW framework of MDCG 2020-1: Pillar 1 Valid Clinical Association (VCA); Pillar 2 Technical/Analytical Performance (a clinician-free, stand-alone analytical claim about the device's classification accuracy across the full 346-category ICD-11 terminology at the level of its documented stand-alone output); and Pillar 3 Clinical Performance (the intended user, using the device's Top-5 prioritised differential view, making measurably better diagnostic decisions in the target context of use). The evidence base comprises ten manufacturer-designed pre-market clinical investigations (including the Fitzpatrick V–VI MRMC simulated-use reader study MAN_2025), one peer-reviewed third-party manuscript on malignancy (NMSC_2025), the post-market observational study of the equivalent legacy device (R-TF-015-012), four peer-reviewed published severity validation studies (APASI_2025, AUAS_2023, AIHS4_2023, ASCORAD_2022), and a systematic state-of-the-art literature review. Article 61(10), the exception for cases where demonstration of conformity based on clinical data is not deemed appropriate, is therefore not invoked.
- MDR Article 61(2) voluntary consultation on clinical development strategy (Section K). Not applicable. Article 61(2) is a voluntary mechanism available for Class III devices and for the Class IIb active devices referred to in Article 54(1)(b) (those intended to administer and/or remove a medicinal product); the device under evaluation is neither, and the manufacturer has not initiated this procedure. No expert-panel recommendations have been received under Article 61(2), and no divergence can exist between the manufacturer's clinical development strategy and a panel opinion, because no opinion has been solicited.
Regulatory basis and evaluation mandate
The device is Class IIb medical device software (MDSW) within the meaning of MDCG 2020-1. The clinical evaluation has been conducted in accordance with MDR Article 61(1), Annex XIV Part A, MEDDEV 2.7.1 Rev 4, and MDCG 2020-1 (MDSW clinical evaluation guidance). The evaluation addresses conformity with three GSPRs:
- GSPR 1: safety and performance under intended conditions of use.
- GSPR 8: minimisation of risks and undesirable side-effects, and their acceptability against the evaluated benefits.
- GSPR 17: software-specific performance requirements for MDSW.
For a Class IIb MDSW covering the full ICD-11 dermatological spectrum across 346 disease categories, no single evidence route provides sufficient clinical evidence to meet the requirements of MDR Article 61(1). In accordance with MDCG 2020-6 § 6.4, sufficiency is determined by the combined weight of evidence across all available routes, taking into account quality, quantity, and relevance. Accordingly, the manufacturer adopted a combined clinical evaluation strategy drawing on three complementary evidence routes grounded in specific provisions of MDR 2017/745.
Combined evidence strategy: Routes A, B, and C
Route A: Systematic literature review
A systematic review of the published scientific literature was conducted for three purposes:
- (i) establishing the valid clinical association (VCA) between the device's outputs (ICD-11 probability classifications and validated severity scores) and their corresponding clinical conditions, as required by MDCG 2020-1 §3.1;
- (ii) defining the state of the art (SotA) in dermatological diagnosis and severity assessment, from which quantitative acceptance criteria are derived;
- (iii) identifying supporting evidence on comparable AI-based diagnostic tools.

The literature review confirmed two points. Published evidence validates the device's claimed outputs as clinically meaningful and establishes the SotA benchmarks. However, that evidence is insufficient on its own to demonstrate the performance and safety of this specific device in its intended real-world context. Literature evidence contributes at Ranks 6 and 7 of the MDCG 2020-6 Appendix III evidence hierarchy. Full methodology and results are documented in R-TF-015-011 State of the Art.
Route B: Equivalence with the legacy device
The device succeeds a legacy device commercially deployed since 2020 under MDD 93/42/EEC. The two devices share:
- identical core AI algorithms;
- the same intended purpose;
- the same clinical indications;
- the same target patient population;
- the same type of user.

A formal equivalence assessment was conducted in accordance with Annex XIV Part A §3 and MDCG 2020-5. It demonstrated equivalence at the technical and clinical levels. Biological equivalence is not applicable because the device is software-only and has no contact with the human body.

Both devices are manufactured by the same organisation, so the manufacturer has full access to the legacy device's technical documentation. This satisfies the Annex XIV Part A §3 requirement for sufficient access to the data of the device with which equivalence is claimed.

The established equivalence allows post-market surveillance (PMS) data from the legacy device to be incorporated as clinical data (Rank 7 per MDCG 2020-6 Appendix III). These data cover over 250,000 clinical reports, 21 contracts, and more than four years of market experience without serious incidents. They provide real-world safety confirmation directly applicable to the device under evaluation.
Route C: Own clinical investigations
Ten manufacturer-designed pre-market clinical investigations were conducted to generate direct clinical evidence in the intended clinical context. They fall into three groups:
- Five prospective studies (MC_EVCDAO_2019, COVIDX_EVCDAO_2022, DAO_Derivación_O_2022, DAO_Derivación_PH_2022, IDEI_2023). These were conducted on real patients in real clinical settings and cover the primary care to dermatology referral pathway, remote monitoring, malignancy detection and severity assessment. They enrolled 715 patients at hospital sites in Spain and provide Pillar 3 Clinical Performance evidence at MDCG 2020-6 Appendix III Ranks 2–4.
- One retrospective proof-of-concept pilot (AIHS4_2025, n = 2). It is classified at Rank 6 as supporting evidence only and is not counted as pivotal.
- Four MRMC simulated-use reader studies (BI_2024, PH_2024, SAN_2024, MAN_2025). These provide MDCG 2020-1 Pillar 3 §4.4 supporting Clinical Performance evidence at Rank 11. They measure whether intended users make measurably better diagnostic decisions when consuming the device's Top-5 prioritised differential view than without it.

Rank and Pillar are orthogonal axes. Rank 11 reflects that the measurement is simulated-use rather than on real patients. Pillar 3 reflects that what is evidenced is the clinician's decision-making with the device. MRMC studies are not Pillar 2 evidence, because a clinician is in the loop; Pillar 2 is the clinician-free, algorithm-level claim across the 346 ICD-11 categories.

All investigations were conducted in accordance with ISO 14155:2020, registered in public databases (ClinicalTrials.gov and the EMA RWD Catalogue), and approved by the relevant ethics committees (CEIm). The full combined evidence strategy and its regulatory basis are documented in R-TF-015-001 Clinical Evaluation Plan.
MDCG 2020-1: three-pillar evidence framework for MDSW
MDCG 2020-1 requires that the clinical evaluation of MDSW address three evidence pillars. The combined evidence from Routes A, B, and C is mapped to these pillars as follows:
Pillar 1: Valid Clinical Association (VCA): VCA confirms that the MDSW output is associated with the targeted clinical condition or physiological state. VCA is established through the systematic literature review (Route A). Published evidence confirms that the following outputs of AI-based diagnostic tools are associated with meaningful clinical outcomes, including correct diagnosis, reduced diagnostic delay, and appropriate care pathway decisions:
- ICD-11 dermatological classifications;
- validated severity scores (PASI, SCORAD, IHS4, UAS);
- referral decisions.
Pillar 2: Technical / Analytical Performance: Pillar 2 confirms that the MDSW correctly processes input data and generates its output reliably and accurately — a clinician-free, stand-alone analytical claim at the level of the device's documented stand-alone output and independent of clinical workflow. It comprises two sub-claims:
- (i) 346-category ICD-11 classifier (stand-alone analytical output): the device's classification accuracy across the full 346-category terminology. This is evidenced by the AI model verification and validation records for the frozen device (algorithm validation against the curated labelled image database). A summary of the 346-category Pillar 2 metrics (Top-1, Top-3 and Top-5 accuracy; pooled and per-epidemiological-category AUC; with sample sizes) is presented inline in section Model Evaluation and Validation.
- (ii) Severity-scoring algorithms: algorithm-level concordance with expert dermatologist consensus on validated severity scales, evidenced by four peer-reviewed publications (APASI_2025, AUAS_2023, AIHS4_2023, ASCORAD_2022).
Both Pillar 2 sub-claims are clinician-free: no clinician is in the loop. MRMC simulated-use reader studies are therefore not Pillar 2 evidence. They measure the clinician's diagnostic decision-making when consuming the device's Top-5 prioritised differential view, which is a Pillar 3 §4.4 Clinical Performance measurement at Rank 11 (see Pillar 3 below).
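Pillar 2 sub-claim (i) is reported as Top-1, Top-3 and Top-5 accuracy. The sketch below illustrates how Top-k accuracy is conventionally computed from per-image probability distributions over a category set. It is for illustration only: the function name, the list-of-lists input layout and the toy numbers are hypothetical, and they do not reproduce the manufacturer's verification records.

```python
from typing import Sequence


def top_k_accuracy(probs: Sequence[Sequence[float]], labels: Sequence[int], k: int) -> float:
    """Share of images whose reference-standard category is among the k most probable outputs.

    probs[i][c]: hypothetical output probability of category c for image i.
    labels[i]:   index of the reference-standard category for image i.
    """
    if not probs or len(probs) != len(labels):
        raise ValueError("probs and labels must be non-empty and of equal length")
    hits = 0
    for row, label in zip(probs, labels):
        # Indices of the k highest probabilities; ties are broken by lower index.
        top_k = sorted(range(len(row)), key=lambda c: (-row[c], c))[:k]
        hits += label in top_k
    return hits / len(labels)


# Toy input: 3 images, 4 categories (illustrative values, not device data).
probs = [
    [0.70, 0.20, 0.05, 0.05],  # reference category 0 is ranked 1st
    [0.10, 0.50, 0.30, 0.10],  # reference category 2 is ranked 2nd
    [0.40, 0.30, 0.20, 0.10],  # reference category 3 is ranked 4th
]
labels = [0, 2, 3]
for k in (1, 3):
    print(f"Top-{k} accuracy: {top_k_accuracy(probs, labels, k):.3f}")
```

On this toy input the loop prints a Top-1 accuracy of 0.333 and a Top-3 accuracy of 0.667. On the device's 346-category output, k = 5 corresponds to the size of the Top-5 prioritised differential view.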
Pillar 3: Clinical Performance: Pillar 3 confirms that the MDSW achieves its intended clinical benefit in the target population, used by the intended users, in the target context of use. Clinical Performance evidence comprises:
- Five prospective pre-market clinical investigations on real patients (MC_EVCDAO_2019, COVIDX_EVCDAO_2022, DAO_Derivación_O_2022, DAO_Derivación_PH_2022, IDEI_2023 — Ranks 2–4).
- One peer-reviewed third-party manuscript on a specialist malignancy population (NMSC_2025 — Rank 4).
- One retrospective proof-of-concept pilot study — AIHS4_2025 (n = 2 patients, 16 severity assessments), classified at Rank 6 (peer-reviewed retrospective validation on the device's own algorithm) as Supporting / proof-of-concept; this pilot is not counted as a pivotal clinical investigation for CE-marking sufficiency purposes and does not on its own support the severity claim. Powered prospective confirmation is pre-committed under PMCF Activity B.5 (target 100 HS patients).
- Four MRMC simulated-use reader studies — Pillar 3 §4.4 supporting evidence at Rank 11 (BI_2024, PH_2024, SAN_2024, MAN_2025).
- Post-market real-world evidence from the equivalent legacy device (R-TF-015-012). It is classified at Rank 8 primary per MDCG 2020-6 Appendix III for both quantitative endpoints and Likert professional-opinion items. A supplementary Rank 4 case is retained for the quantitative endpoints under the Appendix III note that "high quality surveys may also fall into this category".
Safety confirmation (cross-cut to the three MDSW pillars): In addition to the three MDCG 2020-1 pillars, the evaluation draws on a distinct safety-confirmation cross-cut. This cross-cut is not a fourth MDCG 2020-1 pillar. It is anchored in:
- MDR Article 61(1) and Annex I §§1, 3, 4 and 8;
- MDCG 2020-6 §§6.1 and 6.3;
- MEDDEV 2.7/1 Rev 4 §A7.2 (clinical risks and undesirable side-effects) and §A7.4 (acceptability of undesirable side-effects);
- ISO 14971:2019 §§7, 8 and 10;
- for post-market streams, MDR Articles 83, 86, 87 and 88.

Safety confirmation is the evidentiary contribution that demonstrates the absence of unacceptable residual clinical risk and the acceptability of observed adverse-event and device-failure rates during intended use. It is orthogonal to the three MDSW pillars. The pillars address whether the device produces clinically meaningful outputs; safety confirmation addresses whether, in doing so, the device avoids introducing unacceptable harm.

A source contributes safety-confirmation evidence if and only if it meets both conditions:
- (a) it pre-specifies the collection of safety-relevant outcomes (adverse events, device-related harm, usability-related incidents, residual-risk observations);
- (b) it reports those outcomes with denominators.

Two sources contribute to this cross-cut in the present evaluation:
- The Rank 7 vigilance and curated post-market surveillance data of the equivalent legacy device (consolidated in R-TF-007-003):
  - denominator ≈ 250,000 diagnostic reports over four or more years of commercial deployment across 21 active client contracts;
  - 0 MDR Article 87 serious incidents (rule-of-three upper one-sided 95 % bound ≤ 0.0012 %);
  - 0 Article 88 trend reports triggered;
  - 0 FSCAs;
  - 7 non-serious complaints, all closed.
- The pre-specified Section F safety items F1–F4 of the legacy-device post-market observational study (R-TF-015-012), with a denominator of N = 56 (analysis set):
  - F1 (physician observation of misleading output) = 26.8 %, below the pre-specified 30 % follow-up threshold. Substantiated F1 flags were reviewed thematically against the risk-management file, and no new hazard category was identified.
  - F2 (usability affecting clinical use) = 30.4 %.
  - F3 (overall perceived safety, Likert 1–5): mean 4.14 / 5.
  - F4 (formal adverse-event logging) = 7.1 %. These logs were cross-referenced against the complaints registry, and no unreported serious incident was found.

The mandated integration requirements specified in the IFU (Top-5 prioritised differential view, malignancy-prioritisation gauge, referral recommendation) are themselves risk controls. Their implementation by the integrator is a precondition of this safety-confirmation conclusion as well as of the clinical-benefit conclusion. Safety-confirmation evidence forms the basis of the safety half of the benefit-risk determination presented in the section Assessment of the benefit-risk profile.
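The rule-of-three bound quoted above can be checked with a few lines of arithmetic. The sketch below assumes a denominator of exactly 250,000 reports; the text gives ≈ 250,000. It compares the rule-of-three approximation 3/n with the exact one-sided Clopper-Pearson upper bound for zero observed events, 1 − α^(1/n). It also restates the reported F1 rate against its pre-specified threshold. It is a verification aid only, not the manufacturer's statistical analysis.

```python
import math


def rule_of_three_upper(n: int) -> float:
    """Approximate one-sided 95 % upper bound on an event rate after 0 events in n cases."""
    return 3.0 / n


def exact_zero_event_upper(n: int, alpha: float = 0.05) -> float:
    """Exact one-sided (Clopper-Pearson) upper bound for 0 events in n cases: 1 - alpha**(1/n).

    expm1 keeps precision when the bound is very small.
    """
    return -math.expm1(math.log(alpha) / n)


n_reports = 250_000                   # assumed exact denominator; the CER states ≈ 250,000
f1_rate, f1_threshold = 0.268, 0.30   # reported F1 rate (N = 56) and pre-specified threshold

print(f"Rule of three: {100 * rule_of_three_upper(n_reports):.5f} %")
print(f"Exact (0/n):   {100 * exact_zero_event_upper(n_reports):.5f} %")
print(f"F1 below follow-up threshold: {f1_rate < f1_threshold}")
```

Both methods give about 0.0012 %. For large n the rule of three is marginally conservative, because 3/n slightly exceeds the exact bound.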
Evidence hierarchy and weighting
All evidence in the portfolio is ranked using the MDCG 2020-6 Appendix III evidence hierarchy. This hierarchy assigns Ranks 1 to 12 in descending order of evidential strength:
- Rank 1: results of high-quality clinical investigations covering all device variants, indications, patient populations and duration of treatment effect;
- Rank 12: pre-clinical and bench testing, and compliance with standards.

The following ranks are represented in this evaluation:
- Ranks 2 and 4 (prospective clinical investigations): MC_EVCDAO_2019, COVIDX_EVCDAO_2022, DAO_Derivación_O_2022 (Rank 2); IDEI_2023, DAO_Derivación_PH_2022, NMSC_2025 (Rank 4).
- Rank 6 (retrospective validation studies with clinical reference standard — peer-reviewed literature on the device's own algorithms, with MINORS methodological-quality appraisal layered on top): APASI_2025, AUAS_2023, AIHS4_2023, ASCORAD_2022, and the AIHS4_2025 n = 2 proof-of-concept pilot.
- Rank 7 (vigilance and curated PMS data — Safety-confirmation cross-cut): post-market surveillance data from the equivalent legacy device; see the Safety confirmation paragraph above.
- Rank 8 (proactive PMS data): the legacy-device post-market observational study R-TF-015-012 is classified at Rank 8 primary for both quantitative endpoints and Likert professional-opinion items (including the pre-specified Section F safety items F1–F4). A supplementary Rank 4 case is retained for the quantitative endpoints under the Appendix III note that "high quality surveys may also fall into this category", but it is not the lead classification.
- Rank 11 (simulated-use studies without real patient outcomes): BI_2024, PH_2024, SAN_2024, MAN_2025.
The portfolio contains no Rank 1 evidence (high-quality clinical investigations covering all device variants, indications and patient populations). It also contains no Rank 3 evidence (high-quality clinical data collection systems such as registries). This is consistent with the evidentiary situation recognised in MDCG 2020-6 § 6.4 for complex Class IIb MDSW. In that situation, the combined weight of prospective clinical investigations, PMS data, and supporting evidence constitutes sufficient clinical evidence when accompanied by a clear sufficiency justification.
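The rank assignments listed above, together with the pillar mapping given earlier in this section, can be kept as a small machine-checkable table. The sketch below is a hypothetical illustration. The `Source` record, the `pillar` and `design` labels, and the single consistency rule are encodings chosen for this example, not a tool used by the manufacturer. The rule is that MRMC reader studies map to Pillar 3 at Rank 11 and never to Pillar 2. Each source is given one primary mapping only; for example, R-TF-015-012 also feeds the safety-confirmation cross-cut through items F1–F4.

```python
from collections import defaultdict
from typing import NamedTuple


class Source(NamedTuple):
    name: str
    rank: int     # MDCG 2020-6 Appendix III rank as assigned in this CER
    pillar: str   # primary mapping: "P2", "P3" or "safety" (safety-confirmation cross-cut)
    design: str   # hypothetical design label used only for the checks below


PORTFOLIO = [
    Source("MC_EVCDAO_2019", 2, "P3", "prospective"),
    Source("COVIDX_EVCDAO_2022", 2, "P3", "prospective"),
    Source("DAO_Derivación_O_2022", 2, "P3", "prospective"),
    Source("IDEI_2023", 4, "P3", "prospective"),
    Source("DAO_Derivación_PH_2022", 4, "P3", "prospective"),
    Source("NMSC_2025", 4, "P3", "third-party manuscript"),
    Source("APASI_2025", 6, "P2", "published severity validation"),
    Source("AUAS_2023", 6, "P2", "published severity validation"),
    Source("AIHS4_2023", 6, "P2", "published severity validation"),
    Source("ASCORAD_2022", 6, "P2", "published severity validation"),
    Source("AIHS4_2025", 6, "P3", "retrospective pilot"),
    Source("Legacy-device vigilance/PMS", 7, "safety", "vigilance"),
    Source("R-TF-015-012", 8, "P3", "post-market observational"),  # also feeds safety via F1-F4
    Source("BI_2024", 11, "P3", "MRMC"),
    Source("PH_2024", 11, "P3", "MRMC"),
    Source("SAN_2024", 11, "P3", "MRMC"),
    Source("MAN_2025", 11, "P3", "MRMC"),
]

# Designs that count towards the ten manufacturer-designed pre-market investigations.
MANUFACTURER_INVESTIGATIONS = {"prospective", "retrospective pilot", "MRMC"}


def by_rank(sources):
    """Group source names by Appendix III rank, in ascending rank order."""
    grouped = defaultdict(list)
    for s in sources:
        grouped[s.rank].append(s.name)
    return dict(sorted(grouped.items()))


def consistency_problems(sources):
    """Return MRMC reader studies not mapped to Pillar 3 at Rank 11 (a clinician is in the loop)."""
    return [
        s.name for s in sources
        if s.design == "MRMC" and (s.pillar != "P3" or s.rank != 11)
    ]


for rank, names in by_rank(PORTFOLIO).items():
    print(f"Rank {rank}: {', '.join(names)}")
print("Manufacturer investigations:",
      sum(s.design in MANUFACTURER_INVESTIGATIONS for s in PORTFOLIO))
print("Consistency problems:", consistency_problems(PORTFOLIO))
```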
Risk-proportionate tiered evidence structure
To handle the breadth of the device's scope across 346 ICD-11 dermatological categories, and in accordance with MDCG 2020-6 Appendix III and MEDDEV 2.7.1 Rev 4 Annex A7.3, the evaluation applies a risk-proportionate, tiered evidence structure. Evidence assessment is proportionate to clinical risk. The three tiers and their evidentiary requirements are detailed in the section "Tiered evidence assessment strategy." In summary: Tier 1 (malignant and high-risk conditions, approximately 5% of categories) is assessed with individual per-condition acceptance criteria; Tier 2 (rare diseases) is assessed as a dedicated subgroup; and Tier 3 (general conditions, approximately 94% of categories) is assessed using a pooled aggregate with documented risk-based justification per MDCG 2020-6 Appendix III.
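The routing of each ICD-11 category to one of the three tiers can be sketched as a small decision function. This is a minimal illustration under one stated assumption: when a category is both rare and malignant or high-risk, the higher-risk tier takes precedence, so the category is assessed individually under Tier 1. The text above does not state this precedence explicitly. The function, the enum and the category names are hypothetical and do not reproduce the device's actual category list.

```python
from enum import Enum


class Tier(Enum):
    TIER_1 = "individual per-condition acceptance criteria"
    TIER_2 = "dedicated rare-disease subgroup"
    TIER_3 = "pooled aggregate with documented risk-based justification"


def assign_tier(malignant_or_high_risk: bool, rare: bool) -> Tier:
    """Route one category to an evidence tier.

    Assumption for this sketch: higher clinical risk takes precedence, so a rare
    malignant or high-risk category is assessed individually under Tier 1.
    """
    if malignant_or_high_risk:
        return Tier.TIER_1
    if rare:
        return Tier.TIER_2
    return Tier.TIER_3


# Hypothetical category flags: (malignant_or_high_risk, rare).
examples = {
    "hypothetical_malignant_category": (True, False),
    "hypothetical_rare_malignant_category": (True, True),
    "hypothetical_rare_genodermatosis": (False, True),
    "hypothetical_common_inflammatory_category": (False, False),
}
for name, (high_risk, rare) in examples.items():
    tier = assign_tier(high_risk, rare)
    print(f"{name}: {tier.name} ({tier.value})")
```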
Pre-market sufficiency determination
The clinical evaluation concludes that the combined evidence from Routes A, B, and C — organised across the three MDCG 2020-1 evidence pillars, ranked and weighted per the MDCG 2020-6 Appendix III hierarchy, and assessed proportionate to clinical risk using the tiered structure — is sufficient to demonstrate conformity with GSPRs 1, 8, and 17 for the pre-market phase of the device's lifecycle. This determination is made in accordance with MDCG 2020-6 § 6.4.
Residual uncertainties — specifically, limited coverage for autoimmune diseases (~3% of the dermatological spectrum) and genodermatoses (~1%), and limited data for darker skin phototypes (Fitzpatrick V and VI) — have been identified, documented, and declared as acceptable evidence gaps per MDCG 2020-6 § 6.5(e). These are addressed through targeted post-market clinical follow-up (PMCF) activities documented in R-TF-007-002 Post-Market Clinical Follow-up (PMCF) Plan, designed to maintain the sufficiency determination throughout the device's operational lifetime in accordance with MDR Article 61(11) and Annex XIV Part B.
The Fitzpatrick V–VI evidence gap is additionally supported, at the pre-market stage, by the MAN_2025 simulated-use MRMC reader study. MAN_2025 is a prospective, self-controlled MRMC study, sponsored and conducted by the manufacturer. It evaluates whether the device improves the Top-1 diagnostic accuracy of healthcare professionals on 149 clinical images representing Fitzpatrick V–VI presentations of multiple dermatological conditions. The images are sourced from public dermatology atlases.

Healthcare professionals spanning dermatology, primary care and nursing were enrolled between 21 January and 17 April 2026. Three enrolled readers were excluded as screen failures because their specialties fell outside the device's declared intended user population. The primary analysis cohort comprises the readers who met the inclusion criteria of the CIP. It is analysed at the paired-observation level, with pre-specified ≥ 50%-completer and 100%-completer sensitivity analyses. Data lock was performed on 17 April 2026.

MAN_2025 contributes Rank 11 MDCG 2020-1 Pillar 3 §4.4 supporting evidence. As a simulated-use reader study on retrospective images, it does not constitute clinical data on real patients within the meaning of MDR Article 2(48). It reinforces the device's claim of generalisability across Fitzpatrick phototypes. Its full Clinical Investigation Plan and Clinical Investigation Report are recorded under R-TF-015-004 and R-TF-015-006, alongside the source MRMC study records.
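MAN_2025 is analysed at the paired-observation level with ≥ 50%- and 100%-completer sensitivity analyses. The sketch below shows, under stated assumptions, how such completer filters can be applied before computing the paired difference in Top-1 accuracy (aided minus unaided). The following are all hypothetical:
- the `PairedRead` record;
- the definition of completion as the fraction of the 149 images for which a reader has a paired read;
- the toy reads.

Screen-failure exclusions are assumed to have been applied upstream. The analysis pre-specified in the MAN_2025 CIP (R-TF-015-004) is not reproduced here.

```python
from dataclasses import dataclass

N_IMAGES = 149  # size of the MAN_2025 image set


@dataclass(frozen=True)
class PairedRead:
    reader: str
    image: int
    unaided_correct: bool  # Top-1 correct without the device
    aided_correct: bool    # Top-1 correct with the device output


def completion(reads, reader):
    """Fraction of the image set for which this reader has a paired (unaided + aided) read."""
    return len({r.image for r in reads if r.reader == reader}) / N_IMAGES


def paired_top1_gain(reads, min_completion=0.0):
    """Aided minus unaided Top-1 accuracy over paired reads from readers meeting the threshold."""
    readers = {r.reader for r in reads}
    kept = {rd for rd in readers if completion(reads, rd) >= min_completion}
    pairs = [r for r in reads if r.reader in kept]
    if not pairs:
        raise ValueError("no paired observations meet the completion threshold")
    unaided = sum(r.unaided_correct for r in pairs) / len(pairs)
    aided = sum(r.aided_correct for r in pairs) / len(pairs)
    return aided - unaided, len(pairs)


# Toy reads (invented values): R1 reads all 149 images, R2 only 10.
reads = [PairedRead("R1", i, i < 75, i < 100) for i in range(N_IMAGES)]
reads += [PairedRead("R2", i, i < 5, i < 6) for i in range(10)]

for label, threshold in (("all eligible readers", 0.0),
                         (">= 50% completers", 0.5),
                         ("100% completers", 1.0)):
    gain, n_pairs = paired_top1_gain(reads, threshold)
    print(f"{label}: gain = {gain:+.3f} over {n_pairs} pairs")
```

Applying the filter at reader level, rather than dropping individual images, keeps every retained observation paired.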
Scope of the clinical evaluation
General details
The present clinical evaluation report (CER) describes the clinical performance and safety of the device (hereinafter, "the device") as medical device software (MDSW) used for the assessment of skin structures, with the aim of enhancing the efficiency and accuracy of care delivery. The legacy predecessor device is currently certified under the Medical Devices Directive 93/42/EEC (MDD). Nevertheless, this CER has been prepared in accordance with the requirements of Regulation (EU) 2017/745 (Medical Device Regulation, MDR).
Administrative particulars
For the purposes of MDCG 2020-13 Section A, the administrative particulars of the device and of this clinical evaluation are:
- Type of assessment: initial conformity assessment (first CE-marking submission under MDR for the device under evaluation).
- Applicable MDR codes: the MDR codes from Commission Implementing Regulation (EU) 2017/2185 applicable to the device's intended purpose and technology are recorded in the product technical documentation and transcribed into the CE-marking application dossier.
- Basic UDI-DI: the Basic UDI-DI assigned to the device is recorded in the product technical documentation. The EUDAMED actor and device registration is maintained under GP-007 Post-Market Surveillance.
- Certificate number: not applicable at the date of this CER (initial CE marking; no certificate has yet been issued). Re-verified at each CER update.
- Manufacturer SRN and Authorised Representative SRN: recorded in the Manufacturer section of this CER.
- CER author, reviewer and approver roles: recorded in the signature block of this CER, consistent with the QMS responsibility matrix per GP-001 Annex 1. CVs and declarations of interest are in Annex I.
- Parts of MDCG 2020-13 applied: A, B, C, D, E, F, G, Overall Conclusions, plus Specific Considerations I / J / K. All three Specific Considerations are N/A for this device; see the explicit statements in section Clinical evidence strategy and regulatory methodology.
In addition to the requirements laid down in the MDR, the CER has been prepared in accordance with the guidelines and standards listed in section Applicable standards and guidance documents. The CER also follows procedure GP-015 Clinical Evaluation of our QMS.
Classification
The device is classified as Class IIb under MDR Annex VIII, Rule 11, second indent — software intended to provide information used to take decisions with diagnosis or therapeutic purposes, where such decisions may cause serious deterioration of a person's state of health. The Class IIb trigger is the inclusion of 13 malignant neoplasms, pre-malignant categories, and high-risk non-malignant conditions within the device's probability distribution (enumerated in section High-risk and malignant conditions below); delayed or missed identification of these presentations can lead to serious deterioration of health. Rule 11 is the sole applicable Annex VIII classification rule; no other rule yields a higher class for this device. The qualification of the device as medical device software and the mapping to Rule 11 second indent follow the criteria set out in MDCG 2019-11.
Clinical Evaluation Plan
The technical documentation relating to the clinical evaluation includes a clinical evaluation plan (CEP) and a clinical evaluation report (CER). However, sections 11 and A9 of MEDDEV 2.7/1 Rev 4 state that the clinical evaluation report should include a section describing the scope of the clinical evaluation (stage 0). Section 6.3 of MEDDEV 2.7/1 Rev 4 adds that this scope is “also referred to as [...] the clinical evaluation plan”. To avoid duplicating the full content of the CEP within the CER, the scope of the clinical evaluation is maintained in the Clinical Evaluation Plan and summarised in this CER by cross-reference. See R-TF-015-001 Clinical Evaluation Plan and R-TF-015-011 State of the Art for further details.
CEP at a glance: Annex XIV Part A §1a
The Clinical Evaluation Plan is summarised below against the eight elements of MDR Annex XIV Part A §1a. The normative content remains in R-TF-015-001.
- Identification of GSPRs requiring clinical data. GSPR 1 (performance and safety), GSPR 8 (acceptability of side-effects), and GSPR 17.1 (software repeatability, reliability, and performance). See section Objectives of the Clinical Evaluation Report below.
- Specification of intended purpose. Diagnostic decision support via AI-based image analysis across 346 validated ICD-11 dermatological categories, with three claimed clinical benefits: 7GH diagnostic accuracy, 5RB objective severity assessment, and 3KX care pathway optimisation. See section Device description.
- Target groups, indications, and contra-indications. Healthcare professionals in primary care, general dermatology, and specialist referral settings; adult and paediatric patients; Fitzpatrick phototypes I-VI. Indications, contraindications, and precautions are defined in section Contraindications and precautions required by the manufacturer and in the IFU.
- Intended clinical benefits with clinical outcome parameters. Three benefits with per-sub-criterion acceptance criteria and measurable clinical outcome parameters: Top-1/3/5 accuracy, AUC, sensitivity, specificity, ICC, Cohen's Kappa, Krippendorff α, reduction in unnecessary referrals, waiting-time reduction, and remote-care capacity. See sections Clinical benefits and Summary of Clinical Benefits Achievement.
- Methods for qualitative and quantitative safety examination. See sections Requirement on safety and Safety Benchmarking against State of the Art.
  - Qualitative safety is examined through summative usability validation (R-TF-025-007), vigilance/complaint review of the equivalent legacy device, and cross-analysis of risk-management outputs.
  - Quantitative safety is examined through per-category false-negative and false-positive rate assessment, MAUDE/EUDAMED benchmarking against similar devices, and residual-risk acceptability per ISO 14971.
- Parameters for acceptability of the benefit-risk ratio per the state of the art. These are derived per benefit × clinical domain from a systematic literature corpus of 93 appraised papers, including an April 2026 supplementary search. Each derivation uses explicit meta-analytic or weighted-average SotA baselines, with documented safety margins of 3 to 23 percentage points per domain. See section Acceptance Criteria Derivation from State of the Art.
- Benefit-risk issues relating to specific components (pharmaceutical, non-viable animal or human tissues). Not applicable: the device is software-only with no biological, pharmaceutical, or animal-/human-tissue component. Biological equivalence is therefore also N/A (see section Biological equivalence).
- Clinical development plan: exploratory → confirmatory → PMCF, with milestones and acceptance criteria.
  - Exploratory phase: the legacy device's market experience from 2020 onward (4+ years of real-world use, over 250,000 reports).
  - Confirmatory phase: ten manufacturer-designed pre-market clinical investigations, one third-party manuscript, and four published severity-validation studies:
    - three QUADAS-2 diagnostic accuracy studies: MC_EVCDAO_2019, IDEI_2023, AIHS4_2025;
    - three MINORS clinical-utility / referral-pathway studies: COVIDX_EVCDAO_2022, DAO_Derivación_O_2022, DAO_Derivación_PH_2022;
    - four MINORS-appraised MRMC simulated-use reader studies (BI_2024, PH_2024, SAN_2024, MAN_2025), contributing Pillar 3 §4.4 supporting evidence at Rank 11;
    - one peer-reviewed third-party manuscript providing Pillar 3 clinical performance evidence on specialist malignancy (NMSC_2025, Rank 4);
    - four published peer-reviewed severity-validation studies (APASI_2025, AUAS_2023, AIHS4_2023, ASCORAD_2022; Pillar 2, Ranks 5–6).
  - Post-market phase: PMCF activities with per-activity acceptance criteria and milestones documented in R-TF-007-002 Post-Market Clinical Follow-up (PMCF) Plan:
    - A.1-A.3: Triage and Malignancy Prioritization (Gap 1);
    - B.1-B.5: prospective severity assessment (Gap 2);
    - C.1: algorithmic performance monitoring (Gap 3);
    - D.1-D.2: autoimmune and genodermatoses coverage (Gaps 4-5).
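Cohen's kappa is one of the clinical outcome parameters listed above. It quantifies chance-corrected agreement between two raters on categorical assessments, for example severity classes. A minimal unweighted implementation is sketched below for orientation. It is the textbook formula, not the manufacturer's analysis code, and the toy severity labels are hypothetical. Ordinal severity grades are often analysed with a weighted kappa instead of the unweighted form shown here.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """Unweighted Cohen's kappa: (p_observed - p_expected) / (1 - p_expected)."""
    if not rater_a or len(rater_a) != len(rater_b):
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed agreement: share of items given the same category by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected chance agreement from each rater's marginal category frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when expected agreement equals 1")
    return (observed - expected) / (1.0 - expected)


# Hypothetical severity classes from a device and an expert reference (toy data).
device = ["mild", "mild", "moderate", "severe", "moderate", "mild"]
expert = ["mild", "moderate", "moderate", "severe", "moderate", "mild"]
print(f"Cohen's kappa: {cohens_kappa(device, expert):.3f}")
```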
Objectives of the Clinical Evaluation Report
To promote a common approach for the clinical evaluation of medical devices, the European Commission published guidance whose latest version was released in 2016 (MEDDEV 2.7/1 revision 4). According to these guidelines, the “clinical evaluation report is an element of the technical documentation of a medical device” that “summarizes and draws together the evaluation of all the relevant clinical data documented or referenced in other parts of the technical documentation”. In other words, the purpose of this clinical evaluation report is to document all the information used and the conclusions drawn during the clinical evaluation. Notably, this includes the assessment of the conformity of the device with the general safety and performance requirements set out in Annex I of the EU Regulation 2017/745 on Medical Devices.
As stated in Article 61, paragraph 1, of the EU Regulation 2017/745 on Medical Devices, “confirmation of conformity with relevant general safety and performance requirements set out in Annex I under the normal conditions of the intended use of the device, and the evaluation of the undesirable side-effects and the acceptability of the benefit-risk-ratio referred to in Sections 1 and 8 of Annex I, shall be based on clinical data providing sufficient clinical evidence [...]”.
In other words, the conclusions of the clinical evaluation need to support the following specific General Safety and Performance Requirements (GSPR):
Specific requirements on performance of the device (GSPR 1 and GSPR 17.1)
- GSPR 1: “Devices shall achieve the performance intended by their manufacturer and shall be designed and manufactured in such a way that, during normal conditions of use, they are suitable for their intended purpose”.
- GSPR 17.1: “Devices that incorporate electronic programmable systems, including software, or software that are devices in themselves, shall be designed to ensure repeatability, reliability and performance in line with their intended use. In the event of a single fault condition, appropriate means shall be adopted to eliminate or reduce as far as possible consequent risks or impairment of performance”.
Specific requirements on safety (GSPR 1 and GSPR 8)
- GSPR 1: “They [devices] shall be safe and effective and shall not compromise the clinical condition or the safety of patients, or the safety and health of users or, where applicable, other persons, provided that any risks which may be associated with their use constitute acceptable risks when weighed against the benefits to the patient and are compatible with a high level of protection of health and safety, taking into account the generally acknowledged state of the art”.
- GSPR 8: “All known and foreseeable risks, and any undesirable side-effects, shall be minimized and be acceptable when weighed against the evaluated benefits to the patient and/or user arising from the achieved performance of the device during normal conditions of use”.
The table of contents of this CER follows the table of contents proposed in Appendix A9 of the MEDDEV 2.7/1 rev4 guide (“How is a clinical evaluation performed”). Since the EU Regulation 2017/745 on Medical Devices does not prescribe a different structure, this structure remains applicable.
Qualification of the responsible evaluator(s)
The qualification requirements for the evaluators involved in this clinical evaluation are based on section 6.4 of MEDDEV 2.7/1 rev.4. This guidance is applied in the absence of superseding requirements within the EU Regulation 2017/745 (MDR).
As stated in section "Objectives of the Clinical Evaluation Report", the clinical evaluation report also follows the structure proposed in Appendix A9 of MEDDEV 2.7/1 rev.4. This format requires that the qualifications of the responsible evaluators, along with their declarations of interest, be documented directly within the report. This information is located in ANNEX I: CV AND DECLARATIONS OF INTEREST.
Methodology
The present CER is based on the clinical evaluation of the available clinical data related to the device under evaluation, as required by the MDR. To this end, all relevant clinical data related to the device have been collected, appraised, and analysed following MEDDEV 2.7/1 rev. 4. The requirements for clinical evaluation are set out in Article 61 of the MDR (including Annex XIV).
All relevant clinical data has been collected and appraised in order to establish the safety and performance of the device and to identify any gaps in clinical evidence to support the benefit/risk profile of the device. Details on the followed methodology can be found in the CEP (R-TF-015-001 Clinical Evaluation Plan).
MEDDEV 2.7.1 Rev 4: Clinical Evaluation Stages
In accordance with MEDDEV 2.7.1 Rev 4 Section 11, the clinical evaluation was conducted following the staged approach required for a complete and valid clinical evaluation:
- Stage 0 (Scope of the clinical evaluation): Defined in the Clinical Evaluation Plan (R-TF-015-001), which establishes the intended purpose, device description, identification of applicable GSPRs, clinical performance parameters, clinical evaluation methodology, and literature search protocol in accordance with MEDDEV 2.7/1 Rev 4 Sections 6 and 7 and Annex A5. The scope was defined as a Class IIb MDSW covering the full ICD-11 dermatological spectrum across 346 disease categories, with three claimed clinical benefits (7GH, 5RB, 3KX), applicable GSPRs 1, 8, and 17, and a combined evidence strategy (Routes A, B, and C).
- Stage 1 (Identification of pertinent data, MEDDEV 2.7/1 Rev 4 §8): All pre-market and post-market clinical data relevant to the device were identified. Manufacturer-held data (pivotal clinical investigations, PMS data from the equivalent legacy device) are documented in the section "Manufacturer's clinical data." Literature data were identified through a systematic search of MEDLINE/PubMed and Cochrane CENTRAL. This search is documented in the section "Methodology of the literature search for the device" and in the State of the Art document (R-TF-015-011). The complete evidence portfolio comprises:
  - ten manufacturer-designed pre-market clinical investigations (including the Fitzpatrick V–VI MRMC simulated-use reader study MAN_2025);
  - one peer-reviewed third-party manuscript (NMSC_2025);
  - the post-market observational study of the equivalent legacy device (R-TF-015-012);
  - passive PMS data from the equivalent legacy device;
  - four published peer-reviewed severity-validation studies;
  - a corpus of pertinent published literature from the systematic search.
- Stage 2 (Appraisal of pertinent data, MEDDEV 2.7/1 Rev 4 §9): Each identified data set was appraised for methodological quality, relevance, and weighting. For the heterogeneous SotA corpus, the unified CRIT1-7 framework was applied; the rationale for this framework and its scores are documented in the State of the Art document (R-TF-015-011). For the manufacturer's own clinical investigations and for the published severity-validation and peer-reviewed third-party studies, design-specific validated tools were applied:
  - QUADAS-2 for diagnostic-accuracy studies (MC_EVCDAO_2019, IDEI_2023, AIHS4_2025, NMSC_2025);
  - MINORS for clinical-utility/workflow studies (COVIDX_EVCDAO_2022, DAO_Derivación_O_2022, DAO_Derivación_PH_2022);
  - MINORS for simulated-use MRMC studies (BI_2024, PH_2024, SAN_2024, MAN_2025);
  - MINORS for published severity-validation studies (APASI_2025, AUAS_2023, AIHS4_2023, ASCORAD_2022).
  The full appraisal tables and interpretive commentary are documented in the section "Validated methodological quality appraisal." The following were included in the evidence portfolio: all ten manufacturer-designed pre-market investigations, the peer-reviewed third-party manuscript (NMSC_2025), the equivalent device's PMS dataset, and the four published severity-validation studies. No manufacturer-held data sets were excluded.
- Stage 3 (Analysis of clinical data, MEDDEV 2.7/1 Rev 4 §10): The appraised data were analysed collectively to determine whether they demonstrate conformity with the applicable GSPRs (1, 8, and 17). The analysis is documented in the sections "Achievement of the intended performances," "Safety," and "Assessment of the benefit/risk profile." In line with MDCG 2020-6, MDR GSPRs are used in place of the MDD Essential Requirements referenced in MEDDEV 2.7/1 Rev 4 Section 10. The analysis concluded that the combined evidence demonstrates conformity with GSPRs 1, 8, and 17, and that all three claimed clinical benefits (7GH, 5RB, 3KX) were achieved. Two categories (autoimmune diseases and genodermatoses) were identified as acceptable evidence gaps per MDCG 2020-6 § 6.5(e). These gaps are addressed through targeted PMCF activities.
- Stage 4 (Writing the clinical evaluation report): This document constitutes Stage 4. It summarises all data identified, appraised, and analysed in Stages 0–3 and draws conclusions on conformity with the applicable GSPRs.
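The Stage 2 assignment of appraisal tools can be held as a small lookup table. This makes it easy to check that the portfolio counts quoted above stay consistent as studies are added. The study identifiers and tools come from the text. The `APPRAISAL` name and the source-group labels are illustrative assumptions, not part of the manufacturer's records.

```python
from collections import Counter

# Stage 2 appraisal assignment, transcribed from the list above.
# The tuple layout (source group, appraisal tool) and the group labels are illustrative.
APPRAISAL = {
    # Manufacturer-designed diagnostic-accuracy investigations
    "MC_EVCDAO_2019": ("manufacturer", "QUADAS-2"),
    "IDEI_2023": ("manufacturer", "QUADAS-2"),
    "AIHS4_2025": ("manufacturer", "QUADAS-2"),
    # Peer-reviewed third-party diagnostic-accuracy manuscript
    "NMSC_2025": ("third_party", "QUADAS-2"),
    # Manufacturer-designed clinical-utility/workflow studies
    "COVIDX_EVCDAO_2022": ("manufacturer", "MINORS"),
    "DAO_Derivación_O_2022": ("manufacturer", "MINORS"),
    "DAO_Derivación_PH_2022": ("manufacturer", "MINORS"),
    # Manufacturer-designed simulated-use MRMC studies
    "BI_2024": ("manufacturer", "MINORS"),
    "PH_2024": ("manufacturer", "MINORS"),
    "SAN_2024": ("manufacturer", "MINORS"),
    "MAN_2025": ("manufacturer", "MINORS"),
    # Published severity-validation studies
    "APASI_2025": ("severity_validation", "MINORS"),
    "AUAS_2023": ("severity_validation", "MINORS"),
    "AIHS4_2023": ("severity_validation", "MINORS"),
    "ASCORAD_2022": ("severity_validation", "MINORS"),
}

by_source = Counter(source for source, _ in APPRAISAL.values())
by_tool = Counter(tool for _, tool in APPRAISAL.values())
print(dict(by_source))  # {'manufacturer': 10, 'third_party': 1, 'severity_validation': 4}
print(dict(by_tool))    # {'QUADAS-2': 4, 'MINORS': 11}
```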
Applicable standards and guidance documents
Common Specifications under MDR Article 9
Common Specifications adopted under MDR Article 9 were screened for applicability to the device's intended purpose, classification, and technology. As of the date of this report, no Common Specifications in force under MDR Article 9 apply to the device. This screening is repeated at each scheduled update of the clinical evaluation.
Harmonised standards and other standards
The list below identifies the standards and guidance documents that have been applied in the clinical evaluation. Under MDR, presumption of conformity attaches only to standards harmonised under Regulation (EU) 2017/745 and cited in the corresponding Official Journal of the European Union (OJEU) Implementing Decisions. Where a standard in the list below is not yet harmonised under MDR, it has been applied as the recognised state of the art, in line with the transitional approach of MDCG 2021-5, pending publication of an MDR-harmonised equivalent. The harmonisation status, applied revision, and implementation rationale for each standard are maintained as a controlled list in R-TF-001-005 List of applicable standards and regulations. That list is the authoritative source and is kept current at each QMS update.
The standards and guidance documents applicable to the present CER are listed below:
- MDR 2017/745: Regulation (EU) 2017/745 of the European Parliament and of the Council of 5 April 2017 on medical devices
- MEDDEV 2.7/1 revision 4: European Commission Guidelines on Medical Devices Clinical Evaluation
- IMDRF/AE WG/N43FINAL:2020: IMDRF terminologies for categorized Adverse Event Reporting (AER): terms, terminology structure and codes
- MDCG 2023-3: Questions and Answers on vigilance terms and concepts as outlined in the Regulation (EU) 2017/745 on medical devices
- IMDRF MDCE WG/N57FINAL:2019: Clinical investigation
- MDCG 2024-5: Guidance on content of the Investigator's Brochure for clinical investigations of medical devices
- MDCG 2024-3: Guidance on content of the Clinical Investigation Plan for clinical investigations of medical devices
- 2023/C 163/06: Commission Guidance on the content and structure of the summary of the clinical investigation report
- MDCG 2020-10/1 Rev. 1: Safety reporting in clinical investigations of medical devices under the Regulation (EU) 2017/745
- MDCG 2020-10/2 Rev. 1: Clinical Investigation Summary Safety Report Form
- MDCG 2020-1: Guidance on clinical evaluation (MDR) / Performance evaluation (IVDR) of medical device software
- MDCG 2020-6: Regulation (EU) 2017/745: Clinical evidence needed for medical devices previously CE marked under Directives 93/42/EEC or 90/385/EEC
- MDCG 2022-21: Guidance on Periodic Safety Update Report (PSUR) according to Regulation (EU) 2017/745 (MDR)
- MDCG 2020-7: Guidance on PMCF plan template
- MDCG 2020-8: Guidance on PMCF evaluation report template
- IMDRF MDCE WG/N65FINAL:2021: Post-Market Clinical Follow-Up Studies
- MDCG 2020-13: Clinical evaluation assessment report template
- IMDRF MDCE WG/N56FINAL:2019: Clinical evaluation
- IMDRF MDCE WG/N55 FINAL:2019: Clinical evidence
- EN ISO 13485:2016/A11:2021: Medical devices - Quality management systems - Requirements for regulatory purposes
- ISO 14971:2019: Medical devices - Application of Risk Management to Medical Devices
- ISO 14155:2020: Clinical investigation of medical devices for human subjects - Good clinical practice
- EN 62304:2006/A1:2015: Medical device software - Software life cycle processes
- IEC 62366-1:2015: Medical devices - Part 1: Application of usability engineering to medical devices
- ISO 15223-1:2021: Medical devices - Symbols to be used with information to be supplied by the manufacturer - Part 1: General requirements
- ISO/TS 82304-2:2021: Health software - Part 2: Health and wellness apps - Quality and reliability
Device description
Manufacturer
| Manufacturer data | |
|---|---|
| Legal manufacturer name | AI Labs Group S.L. |
| Address | Street Gran Vía 1, BAT Tower, 48001, Bilbao, Bizkaia (Spain) |
| SRN | ES-MF-000025345 |
| Person responsible for regulatory compliance | Alfonso Medela, Saray Ugidos |
| Email | office@legit.health |
| Phone | +34 638127476 |
| Trademark | Legit.Health |
| Authorized Representative | Not applicable (manufacturer is based in EU) |
Device identification
| Information | |
|---|---|
| Device name | Legit.Health Plus (hereinafter, the device) |
| Model and type | NA |
| Version | 1.1.0.0 |
| Basic UDI-DI | 8437025550LegitCADx6X |
| Certificate number (if available) | MDR 000000 (Pending) |
| EMDN code(s) | Z12040192 (General medicine diagnosis and monitoring instruments - Medical device software) |
| GMDN code | 65975 |
| EU MDR 2017/745 | Class IIb |
| EU MDR Classification rule | Rule 11 |
| Novel product (True/False) | TRUE |
| Novel related clinical procedure (True/False) | TRUE |
| SRN | ES-MF-000025345 |
Intended use
The device is a computational software-only medical device leveraging computer vision algorithms to process images of the epidermis, the dermis and its appendages, among other skin structures, enhancing efficiency and accuracy of care delivery, by providing:
- an interpretative distribution representation of possible International Classification of Diseases (ICD) categories that might be represented in the pixel content of the image
- quantifiable data on the intensity, count and extent of clinical signs such as erythema, desquamation, and induration, among others
Quantification of intensity, count and extent of visible clinical signs
The device provides quantifiable data on the intensity, count and extent of clinical signs such as erythema, desquamation, and induration, among others; including, but not limited to:
- erythema,
- desquamation,
- induration,
- crusting,
- xerosis (dryness),
- swelling (oedema),
- oozing,
- excoriation,
- lichenification,
- exudation,
- wound depth,
- wound border,
- undermining,
- hair loss,
- necrotic tissue,
- granulation tissue,
- epithelialization,
- nodule,
- papule,
- pustule,
- cyst,
- comedone,
- abscess,
- hive,
- draining tunnel,
- non-draining tunnel,
- inflammatory lesion,
- exposed wound, bone and/or adjacent tissues,
- slough or biofilm,
- maceration,
- external material over the lesion,
- hypopigmentation or depigmentation,
- hyperpigmentation,
- scar,
- scab,
- spot,
- blister
Image-based recognition of visible ICD categories
The device is intended to provide an interpretative distribution representation of possible International Classification of Diseases (ICD) categories that might be represented in the pixel content of the image.
Device description
The device is a computational software-only medical device leveraging computer vision algorithms to process images of the epidermis, the dermis and its appendages, among other skin structures. Its principal function is to provide a wide range of clinical data from the analyzed images to assist healthcare practitioners in their clinical evaluations and allow healthcare provider organisations to gather data and improve their workflows.
The generated data is intended to aid healthcare practitioners and organizations in their clinical decision-making process, thus enhancing the efficiency and accuracy of care delivery.
The device should never be used to confirm a clinical diagnosis. Rather, its result is one element of the overall clinical assessment. The device is designed to be used when a healthcare practitioner chooses to obtain additional information to inform a decision.
Intended medical indication
The device is indicated for use on images of visible skin structure abnormalities to support the assessment of all diseases of the skin incorporating conditions affecting the epidermis, its appendages (hair, hair follicle, sebaceous glands, apocrine sweat gland apparatus, eccrine sweat gland apparatus and nails) and associated mucous membranes (conjunctival, oral and genital), the dermis, the cutaneous vasculature and the subcutaneous tissue (subcutis).
Intended patient population
The device is intended for use in adult and paediatric patients presenting with skin findings across Fitzpatrick phototypes I-VI, in primary care, general dermatology, and specialist referral settings.
Intended user
The medical device is intended for use by healthcare providers to aid in the assessment of skin structures.
User qualifications and competencies
This section outlines the qualifications and competencies required for users of the device to ensure its safe and effective use. It is assumed that all users already possess the baseline qualifications and competencies associated with their respective professional roles.
Healthcare professionals
No additional official qualifications are required for healthcare professionals (HCPs) to use the device. However, it is recommended that HCPs possess the following competencies to optimize device utilization:
- Proficiency in capturing high-quality clinical images using smartphones or equivalent digital devices.
- Basic understanding of the clinical context in which the device is applied.
- Familiarity with interpreting digital health data as part of the clinical decision-making process.
The device may be used by any healthcare professional who, by virtue of their academic degree, professional license, or recognized qualification, is authorized to provide healthcare services. This includes, but is not limited to:
- Medical Doctors (MD, MBBS, DO, Dr. med., or equivalent)
- Registered Nurses (RN, BScN, MScN, Dipl. Pflegefachfrau/-mann, or equivalent)
- Nurse Practitioners (NP, Advanced Nurse Practitioner, or equivalent)
- Physician Assistants (PA, or equivalent roles such as Physician Associate in the UK/EU)
- Dermatologists (board-certified, Facharzt für Dermatologie, or equivalent)
- Other licensed or registered healthcare professionals as recognized by local, national, or European regulatory authorities
Each HCP must hold the academic title, degree, or professional registration that confers their status as a healthcare professional in their jurisdiction, whether in the United States, Europe, or other regions where the device is provided.
IT professionals
IT professionals are responsible for the technical integration, configuration, and maintenance of the medical device within the healthcare organization's information systems.
No specific official qualifications are mandated. Nevertheless, it is advisable that IT professionals involved in the deployment and support of the device have the following competencies:
- Foundational knowledge of the HL7 FHIR (Fast Healthcare Interoperability Resources) standard and its application in healthcare data exchange.
- Ability to interpret and manage the device's data outputs, including integration with electronic health record (EHR) systems.
- Understanding of healthcare data privacy and security requirements relevant to medical device integration, including GDPR (Europe), HIPAA (US), and other applicable local regulations.
- Experience with troubleshooting and supporting clinical software in a healthcare environment.
- Familiarity with IT standards and best practices for healthcare, such as ISO/IEC 27001 (Information Security Management) and ISO 27799 (Health Informatics—Information Security Management in Health).
IT professionals may include, but are not limited to:
- Health Informatics Specialists (MSc Health Informatics, or equivalent)
- Clinical IT System Administrators
- Healthcare Integration Engineers
- IT Managers and Project Managers in healthcare settings
- Software Engineers and Developers specializing in healthcare IT
- Other IT professionals with relevant experience in healthcare environments, as recognized by local, national, or European authorities
Each IT professional should possess the relevant academic degree, professional certification, or demonstrable experience that qualifies them for their role in the healthcare organization, in accordance with the requirements of the United States, Europe, or other regions where the device is provided.
Use environment
The device is intended to be used in the setting of healthcare organisations and their IT departments, which are commonly situated inside hospitals or other clinical facilities.
The device is intended to be integrated into the healthcare organisation's system by IT professionals.
Operating principle
The device is a computational medical tool leveraging computer vision algorithms to process images of the epidermis, the dermis and its appendages, among other skin structures.
Body structures
The device is intended for use on the epidermis, its appendages (hair, hair follicle, sebaceous glands, apocrine sweat gland apparatus, eccrine sweat gland apparatus and nails) and associated mucous membranes (conjunctival, oral and genital), the dermis, the cutaneous vasculature and the subcutaneous tissue (subcutis).
In practice, the device is intended for use on visible skin structures. As such, it can only quantify clinical signs that are visible, and it can only distribute probabilities across ICD categories that are visible.
Explainability
For visual signs that can be quantified in terms of count and extent, the underlying models not only calculate a final value, such as the number of lesions, but also determine their locations within the image. Consequently, the output for these visual signs is accompanied by additional data, which varies depending on whether the quantification involves count or extent.
- Count. When a visual sign is quantified by counting, the device generates a bounding box for each detected entity. These bounding boxes are defined by their x and y coordinates, as well as their height and width in pixels.
- Extent. When a visual sign is quantified by its extent, the device outputs a mask. This mask, which is the same size as the image, consists of 0s for pixels where the visual sign is absent and 1s for pixels where it is present.
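For extent-based signs, the fraction of affected area follows directly from the binary mask described above. The Python sketch below assumes the mask is supplied as a list of equal-length rows of 0/1 integers. The function name `extent_percentage` is hypothetical. Converting the result to cm² would also require a pixel-to-area calibration. That calibration is not described in this section, so it is left out.

```python
from typing import Sequence

def extent_percentage(mask: Sequence[Sequence[int]]) -> float:
    """Percentage of pixels marked 1 (sign present) in a binary mask."""
    affected = 0
    total = 0
    width = None
    for row in mask:
        if width is None:
            width = len(row)
        elif len(row) != width:
            raise ValueError("mask rows must all have the same length")
        for value in row:
            if value not in (0, 1):
                raise ValueError(f"mask values must be 0 or 1, got {value!r}")
            affected += value
            total += 1
    if total == 0:
        raise ValueError("mask is empty")
    return 100.0 * affected / total

# A 4 x 4 mask with the sign present in 4 of 16 pixels.
mask = [[0, 0, 0, 0],
        [0, 1, 1, 0],
        [0, 1, 1, 0],
        [0, 0, 0, 0]]
print(extent_percentage(mask))  # 25.0
```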
The explainability output is returned under the explainabilityMedia key. For example:
```json
{
  "explainabilityMedia": {
    "explainabilityMedia": {
      "content": "base 64 image",
      "detections": [
        {
          "confidence": 98,
          "label": "nodule",
          "p1": {
            "x": 202,
            "y": 101
          },
          "p2": {
            "x": 252,
            "y": 154
          }
        },
        {
          "confidence": 92,
          "label": "pustule",
          "p1": {
            "x": 130,
            "y": 194
          },
          "p2": {
            "x": 179,
            "y": 245
          }
        }
      ]
    }
  }
}
```
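The example payload encodes each bounding box as two corner points, `p1` and `p2`, rather than as a position plus width and height. The Python sketch below converts between the two. It assumes, as the example values suggest, that `p1` and `p2` are opposite corners in pixel coordinates. The helper `boxes_from_payload` and the `BoundingBox` type are hypothetical and not part of the documented API. The nested `explainabilityMedia` path mirrors the example above.

```python
import json
from dataclasses import dataclass
from typing import List

@dataclass(frozen=True)
class BoundingBox:
    label: str
    confidence: int  # as in the example payload; a 0-100 scale is assumed
    x: int           # left edge, pixels
    y: int           # top edge, pixels
    width: int       # pixels
    height: int      # pixels

def boxes_from_payload(payload: dict) -> List[BoundingBox]:
    """Convert corner-point detections (p1, p2) into x/y/width/height boxes."""
    media = payload["explainabilityMedia"]["explainabilityMedia"]
    boxes = []
    for det in media.get("detections", []):
        p1, p2 = det["p1"], det["p2"]
        # Sorting makes the conversion independent of which corner is p1.
        x0, x1 = sorted((p1["x"], p2["x"]))
        y0, y1 = sorted((p1["y"], p2["y"]))
        boxes.append(BoundingBox(det["label"], det["confidence"],
                                 x0, y0, x1 - x0, y1 - y0))
    return boxes

# The two detections from the example payload above (image content omitted).
EXAMPLE = json.loads("""
{"explainabilityMedia": {"explainabilityMedia": {"detections": [
  {"confidence": 98, "label": "nodule",
   "p1": {"x": 202, "y": 101}, "p2": {"x": 252, "y": 154}},
  {"confidence": 92, "label": "pustule",
   "p1": {"x": 130, "y": 194}, "p2": {"x": 179, "y": 245}}
]}}}
""")

for box in boxes_from_payload(EXAMPLE):
    print(box.label, box.x, box.y, box.width, box.height)
# nodule 202 101 50 53
# pustule 130 194 49 51
```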
Intended patient population
The device is intended for use in adult and paediatric patients presenting with skin findings across Fitzpatrick phototypes I-VI, in primary care, general dermatology, and specialist referral settings.
Indications, contraindications, and precautions that govern individual patient eligibility are defined in section Contraindications and precautions required by the manufacturer and in the Instructions for Use.
Communication of population-evidence limitations to users
The acknowledged under-representation of Fitzpatrick V-VI skin and paediatric patients in the pivotal evidence base (see section Representativeness of the Study Populations) is communicated to healthcare professionals via the IFU as a precaution advising additional clinical judgement in these subpopulations. This is consistent with MDCG 2020-6 § 6.5(e) on acceptable evidence gaps and with MEDDEV 2.7/1 Rev 4 Annex A7.2.
Malignancy-prioritisation safety architecture
The device architecture enforces a severity-prioritisation constraint (P₂=1) that prevents under-triage of malignant or pre-malignant findings regardless of model probability ranking. This architectural safety behaviour is equivalent to the corresponding safety architecture of the legacy predecessor device and is fully covered by equivalence under MDR Article 61(5)-(6) and MDCG 2020-5. It is documented in section Risk architecture.
P₂=1 architectural severity-prioritisation constraint
The P₂=1 constraint is an architectural safety feature identified in section Demonstration of equivalence and is covered by equivalence to the legacy predecessor device. The supporting evidence package consists of:
- Architectural specification. The constraint is encoded in the device's defence-in-depth safety architecture. It forces any malignant or pre-malignant ICD-11 category surfaced by the classifier (irrespective of the probability ranking) to appear within the second-position output channel of the prioritised top-N display rendered to the HCP. The full architectural specification is held in the Software Architecture Description (R-TF-012-029).
- Risk-control linkage. The constraint is mapped to the risk-control measures for the malignancy-misclassification risks in the Risk Management File (R-TF-013-002), and to the AI/ML risk assessment in R-TF-028-011 AI/ML Risk Assessment. The constraint is the principal architectural risk-control for under-triage of malignant findings.
- Verification evidence. The constraint is verified through deterministic architectural unit tests on the manufacturer's curated labelling dataset. These are output-parity tests confirming that, for every image where any malignant category is in the model's output, the malignant category occupies the P₂ position. Test results are held in the V&V records.
- Clinical-relevance assessment. The constraint reduces the residual probability of under-triage of malignant findings to within the residual-risk envelope defined in R-TF-013-002. The post-market F1 misleading-output rate of 26.8% in R-TF-015-012 sits below the pre-specified 30% follow-up threshold. None of the substantiated F1 responses corresponds to an unreported serious incident over the 4+ years of legacy-predecessor commercial deployment (rule-of-three upper one-sided 95% bound on serious incidents ≤ 0.067%).
- PMCF confirmation. PMCF Activity C.1 monitors algorithmic AUC and Top-N stability post-CE-marking; any drift outside the pre-specified threshold triggers an unscheduled CER update per MDR Article 61(11).
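The rule-of-three bound quoted above can be cross-checked by working backwards from the stated figure. With zero events in n independent exposures, the exact one-sided 95% upper bound p_U solves (1 - p_U)^n = 0.05, and the rule of three approximates it as 3/n. The exposure count n is not restated in this section. The derivation therefore only shows the minimum n that the stated bound of 0.067% implies; it does not substitute for the underlying PMS data.

```latex
\[
(1 - p_U)^n = 0.05
\;\Longrightarrow\;
p_U = 1 - 0.05^{1/n} \approx \frac{-\ln 0.05}{n} \approx \frac{3}{n}
\]
\[
\frac{3}{n} \le 0.067\,\% = 6.7 \times 10^{-4}
\;\Longrightarrow\;
n \ge \frac{3}{6.7 \times 10^{-4}} \approx 4\,478
\]
```

For example, 4,478 zero-event exposures give 3/4,478 ≈ 0.067%.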
The P₂=1 constraint and the supporting defence-in-depth architecture are documented in section Risk architecture and in R-TF-028-011 AI/ML Risk Assessment.
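The P₂=1 behaviour and its output-parity test can be illustrated with a short Python sketch. The sketch rests on three assumptions that are not taken from R-TF-012-029:

- A category counts as surfaced when its probability reaches a hypothetical `surface_threshold`.
- "Appear within the second-position output channel" is read as "occupy position 1 or 2", so a malignant category already ranked first is left in place.
- `enforce_p2` and `p2_parity_holds` are illustrative names, not the device's implementation.

The category names in the usage example are drawn from this CER, but the probabilities are made up for illustration.

```python
from typing import Callable, List, Sequence, Tuple

Ranked = List[Tuple[str, float]]  # (ICD-11 category, probability), best first

def enforce_p2(ranked: Sequence[Tuple[str, float]],
               is_malignant: Callable[[str], bool],
               surface_threshold: float) -> Ranked:
    """Promote the highest-probability surfaced malignant category into the top two."""
    order = sorted(ranked, key=lambda item: item[1], reverse=True)
    surfaced = [i for i, (cat, p) in enumerate(order)
                if is_malignant(cat) and p >= surface_threshold]
    if not surfaced or surfaced[0] <= 1:
        return order  # nothing surfaced, or already at position 1 or 2
    order.insert(1, order.pop(surfaced[0]))
    return order

def p2_parity_holds(displayed: Sequence[Tuple[str, float]],
                    is_malignant: Callable[[str], bool],
                    surface_threshold: float) -> bool:
    """Parity check: if any malignant category is surfaced, one occupies position 1 or 2."""
    def surfaced(item: Tuple[str, float]) -> bool:
        return is_malignant(item[0]) and item[1] >= surface_threshold
    if not any(surfaced(item) for item in displayed):
        return True
    return any(surfaced(item) for item in displayed[:2])

MALIGNANT = {"cutaneous melanoma", "basal cell carcinoma"}
model_ranking = [("eczematous dermatitis", 0.40), ("acne vulgaris", 0.25),
                 ("psoriasis", 0.15), ("cutaneous melanoma", 0.12),
                 ("basal cell carcinoma", 0.08)]
displayed = enforce_p2(model_ranking, MALIGNANT.__contains__, surface_threshold=0.05)
print([name for name, _ in displayed[:2]])
# ['eczematous dermatitis', 'cutaneous melanoma']
```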
Device outputs
The device produces three categories of output for every image processed. These outputs are identical regardless of the condition depicted in the image; the device does not operate in condition-specific modes.
Output 1: ICD-11 probability distribution
For each image, the device outputs a normalised probability vector across all 346 validated ICD-11 categories covering visible diseases of the skin. Each element represents the estimated probability that the image depicts a condition belonging to that ICD-11 category. The probabilities sum to 1.0. The device always outputs the full distribution across all 346 categories; it does not select, filter, or suppress any categories based on the image content. This probabilistic output is fundamentally different from a diagnostic test that provides a binary positive/negative result for a specific condition. The device provides an array of ICD-11 categories with distributed probabilities, and the clinical decision remains with the healthcare professional.
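The structural properties stated above (a fixed set of 346 categories, with non-negative probabilities summing to 1.0) can be checked mechanically by a receiving system. The Python sketch below assumes the distribution is delivered as a mapping from category identifier to probability. `check_distribution`, `top_k`, and the tolerance of 1e-6 are hypothetical choices, not part of the device's documented interface. `top_k` orders by raw probability only and does not reproduce the device's severity-prioritisation constraint.

```python
import math
from typing import Dict, List, Tuple

N_CATEGORIES = 346  # size of the validated ICD-11 output space stated above

def check_distribution(probs: Dict[str, float],
                       n_expected: int = N_CATEGORIES,
                       tol: float = 1e-6) -> None:
    """Raise ValueError unless probs is a complete, normalised distribution."""
    if len(probs) != n_expected:
        raise ValueError(f"expected {n_expected} categories, got {len(probs)}")
    if any(not 0.0 <= p <= 1.0 for p in probs.values()):
        raise ValueError("every probability must lie in [0, 1]")
    total = math.fsum(probs.values())  # fsum limits floating-point drift
    if abs(total - 1.0) > tol:
        raise ValueError(f"probabilities sum to {total!r}, not 1.0")

def top_k(probs: Dict[str, float], k: int = 5) -> List[Tuple[str, float]]:
    """Return the k highest-probability (category, probability) pairs."""
    return sorted(probs.items(), key=lambda item: item[1], reverse=True)[:k]
```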
Output 2: Clinical sign measurements
The device provides quantitative measurements for 37 clinical signs using three measurement methods:
- Intensity (continuous scale, 0–10): erythema, desquamation, induration, crusting, xerosis, swelling, oozing, excoriation, lichenification, exudation, wound depth, wound border, undermining, necrotic tissue, granulation tissue, epithelialization, maceration, slough or biofilm, hypopigmentation or depigmentation, hyperpigmentation, scar, external material over the lesion.
- Count (integer, with bounding boxes): nodule, papule, pustule, cyst, comedone, abscess, hive, draining tunnel, non-draining tunnel, inflammatory lesion, scab, spot, blister.
- Extent (cm² or percentage of affected area, with segmentation masks): hair loss; exposed wound, bone and/or adjacent tissues.
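The mapping from each clinical sign to its measurement method can be written down as data. Doing so also confirms that the three lists above account for all 37 signs (22 intensity, 13 count, 2 extent). The Python sketch below transcribes the lists. `validate_measurement` is a hypothetical consumer-side check. Treating extent as a percentage in [0, 100] is an assumption, since the device may also report extent in cm².

```python
from collections import Counter

INTENSITY = [  # continuous scale, 0-10
    "erythema", "desquamation", "induration", "crusting", "xerosis",
    "swelling", "oozing", "excoriation", "lichenification", "exudation",
    "wound depth", "wound border", "undermining", "necrotic tissue",
    "granulation tissue", "epithelialization", "maceration",
    "slough or biofilm", "hypopigmentation or depigmentation",
    "hyperpigmentation", "scar", "external material over the lesion",
]
COUNT = [  # integer count with bounding boxes
    "nodule", "papule", "pustule", "cyst", "comedone", "abscess", "hive",
    "draining tunnel", "non-draining tunnel", "inflammatory lesion",
    "scab", "spot", "blister",
]
EXTENT = [  # affected area with segmentation masks
    "hair loss", "exposed wound, bone and/or adjacent tissues",
]

SIGN_METHOD = {sign: method
               for method, signs in (("intensity", INTENSITY),
                                     ("count", COUNT),
                                     ("extent", EXTENT))
               for sign in signs}

def validate_measurement(sign: str, value: float) -> None:
    """Raise ValueError if value is out of range for the sign's method."""
    method = SIGN_METHOD[sign]  # KeyError for a sign outside the 37
    if isinstance(value, bool):
        raise ValueError("a boolean is not a valid measurement")
    if method == "intensity":
        ok = isinstance(value, (int, float)) and 0.0 <= value <= 10.0
    elif method == "count":
        ok = isinstance(value, int) and value >= 0
    else:  # extent, assumed here to be a percentage of the image area
        ok = isinstance(value, (int, float)) and 0.0 <= value <= 100.0
    if not ok:
        raise ValueError(f"{value!r} is not a valid {method} value for {sign!r}")

print(Counter(SIGN_METHOD.values()))
# Counter({'intensity': 22, 'count': 13, 'extent': 2})
```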
Output 3: Explainability media
For count-based signs, the device outputs bounding boxes identifying the location of each detected structure. For extent-based signs, the device outputs segmentation masks delineating the affected area. These visual overlays allow the healthcare professional to verify the basis of the quantitative measurements.
The device does not output a diagnosis, a binary positive/negative result, a treatment recommendation, a referral decision, or a prognosis. The device output is one element of the overall clinical assessment; the healthcare professional must consider the patient's medical history and other clinical findings before reaching a clinical decision.
Distinction between the device's stand-alone output and the physician-facing user interface
The device architecture has two layers that must be understood separately:
- Stand-alone output layer (complete output): The device's documented application programming interface always returns the full normalised probability distribution across all 346 ICD-11 categories, the complete set of clinical sign measurements, and all explainability media. This is the output on which the Pillar 2 (clinician-free, 346-category, stand-alone analytical) Technical/Analytical Performance claims in this CER are evaluated. The stand-alone output specification is documented in the IFU (Installation manual: Endpoint specification).
- Physician user-interface layer (prioritised view): The integrating healthcare system (hospital EMR, teledermatology platform, or other client application) presents the physician with a prioritised, limited view of the device's stand-alone output. This view typically shows the Top-5 ICD-11 categories ordered by probability, alongside the severity scores and explainability media. The physician does not see or interact with the full 346-category array. The manufacturer mandates the user-interface presentation format through integration requirements specified in the IFU:
  - Installation manual: User Interface, for the presentation requirements the integrator MUST implement;
  - Installation manual: Endpoint specification, for the derivation of the six binary safety indicators from the device's stand-alone output.
  The integrator's implementation of these mandated presentation requirements, in conformity with the IFU, is a precondition of the clinical performance validated in this CER. Failure to implement them places the deployment outside the device's intended use and outside the scope of the CE-marking clinical-benefit claim. The manufacturer does not delegate responsibility for the user-interface presentation to the integrator; instead, the integrator acts as a co-controlled risk-control agent.
This distinction is clinically relevant because the physician actually experiences the Top-5 prioritised differential view, not a raw probability vector. Under the three-pillar MDSW framework of MDCG 2020-1, the claims are evaluated on two different outputs:

- The stand-alone analytical (346-category) performance claims in this CER are evaluated under Pillar 2 (Technical/Analytical Performance).
- The Pillar 3 Clinical Performance claims are evaluated on the Top-5 prioritised differential view. This is the user-interface output that the intended user actually consumes when making clinical decisions.

The performance metrics used throughout this CER (Top-1, Top-3, and Top-5 accuracy) directly reflect this clinical workflow. For example, a Top-5 metric measures whether the correct condition appears within the five highest-probability categories that the physician sees. The clinical investigations BI_2024, PH_2024, SAN_2024, MAN_2025, and IDEI_2023 evaluate this prioritised-view workflow. Their results are therefore representative of the physician's actual interaction with the device.
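The Top-k accuracy metrics named above reduce to a simple calculation over a set of cases. The Python sketch below assumes that each case provides the ranked list of categories as displayed (that is, after any severity prioritisation) and a single reference-standard label. The function name `top_k_accuracy` and the toy cases are illustrative only and do not reproduce any study result.

```python
from typing import Sequence

def top_k_accuracy(ranked_predictions: Sequence[Sequence[str]],
                   reference: Sequence[str], k: int) -> float:
    """Fraction of cases whose reference label appears in the first k ranked categories."""
    if k < 1:
        raise ValueError("k must be at least 1")
    if len(ranked_predictions) != len(reference):
        raise ValueError("one ranked list is required per reference label")
    if not reference:
        raise ValueError("at least one case is required")
    hits = sum(label in ranked[:k]
               for ranked, label in zip(ranked_predictions, reference))
    return hits / len(reference)

# Toy cases, for illustration only.
ranked = [["psoriasis", "eczematous dermatitis", "tinea corporis"],
          ["eczematous dermatitis", "psoriasis", "lichen planus"],
          ["tinea corporis", "eczematous dermatitis", "psoriasis"]]
reference = ["psoriasis", "psoriasis", "psoriasis"]
for k in (1, 2, 3):
    print(k, round(top_k_accuracy(ranked, reference, k), 3))
# 1 0.333
# 2 0.667
# 3 1.0
```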
The IFU integration guidance (Installation manual: User Interface) states that the integrator MUST display, as a minimum:
- Top-5 prioritised differential view: the Top-5 ICD-11 categories rendered in descending-probability ranked order (the order produced by the device, which embeds the architectural severity-prioritisation constraint described in section Malignancy-prioritisation safety architecture), as a single visually cohesive block with each candidate clearly labelled. Risk controls: R-TF-013-002 entries R-BDR, R-A96.
- Malignancy-prioritisation gauge: the malignancy-prioritisation value rendered as a visually distinguishable gauge (for example, a coloured bar, dial, or badge), immediately visible to the healthcare professional without additional interaction. The gauge re-surfaces malignancy risk regardless of whether a malignant ICD-11 category occupies the top rank of the Top-5 view. Risk controls: R-TF-013-002 entries R-HBD, R-BDR, R-DAG, R-75H.
- Referral recommendation: co-visible with the Top-5 prioritised differential view, not behind additional interaction (for example, behind an expand control, a secondary tab, or a separate screen). Risk controls: R-TF-013-002 entries R-BDR, R-75H.
- Six binary malignancy-surfacing safety indicators (malignant, pre-malignant, associated with malignancy, pigmented lesion, urgent referral ≤ 48 h, high-priority referral ≤ 2 weeks): always visible as binary states, not hidden behind an expansion control or secondary tab. Risk controls: R-TF-013-002 entries R-BDR, R-HBD, R-SKK.
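An integrator, or a reviewer auditing an integration, could encode the four mandatory display requirements as a checklist. The Python sketch below is illustrative only. The `ScreenSpec` fields, the indicator identifiers, and `display_nonconformities` are hypothetical and are not defined in the IFU. Each check is annotated with the R-TF-013-002 risk-control entries listed above.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List

# Hypothetical identifiers for the six binary safety indicators listed above.
SAFETY_INDICATORS: FrozenSet[str] = frozenset({
    "malignant", "pre_malignant", "associated_with_malignancy",
    "pigmented_lesion", "urgent_referral_48h", "high_priority_referral_2w",
})

@dataclass
class ScreenSpec:
    """Hypothetical description of an integrator's primary result screen."""
    top5_in_single_block: bool  # five labelled candidates in one cohesive block
    gauge_visible_without_interaction: bool
    referral_co_visible_with_top5: bool
    always_visible_indicators: FrozenSet[str] = field(default_factory=frozenset)

def display_nonconformities(spec: ScreenSpec) -> List[str]:
    """List unmet mandatory display requirements (an empty list means conformant)."""
    issues = []
    if not spec.top5_in_single_block:  # R-BDR, R-A96
        issues.append("Top-5 prioritised differential view not shown as one block")
    if not spec.gauge_visible_without_interaction:  # R-HBD, R-BDR, R-DAG, R-75H
        issues.append("malignancy-prioritisation gauge requires interaction")
    if not spec.referral_co_visible_with_top5:  # R-BDR, R-75H
        issues.append("referral recommendation not co-visible with Top-5 view")
    missing = SAFETY_INDICATORS - spec.always_visible_indicators  # R-BDR, R-HBD, R-SKK
    if missing:
        issues.append("indicators not always visible: " + ", ".join(sorted(missing)))
    return issues
```

For example, `display_nonconformities(ScreenSpec(True, True, False, SAFETY_INDICATORS))` would report only the referral recommendation. A fully conformant screen returns an empty list.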
Integrators that do not meet these mandatory display requirements operate the device outside its intended use, and the clinical performance validated in this CER no longer applies.
The device's six binary safety indicators are returned alongside the probability distribution in every stand-alone analytical output (defined and derived in Installation manual: Endpoint specification, § Binary Indicators). These indicators operate independently of the ICD-11 ranking and provide a safety net for high-risk presentations even if the specific malignant category is not ranked first. The Installation manual: User Interface section of the IFU mandates that the integrator MUST display all six indicators as always-visible binary states, not hidden behind an expansion control or secondary tab, and traces this requirement to risk-control entries R-BDR, R-HBD, and R-SKK in R-TF-013-002.
Scope of ICD-11 categories
The device covers 346 validated ICD-11 categories covering visible diseases of the skin. These categories span multiple ICD-11 chapters, primarily chapter 14 (Diseases of the skin), chapter 2 (Neoplasms) for malignant and pre-malignant conditions, and chapter 1 (Certain infectious or parasitic diseases) for cutaneous infections, among others. The categories were derived from the device's training dataset through the mapping process documented in R-TF-028-004 Data Annotation Instructions: ICD-11 Mapping. The mapping consolidates visually indistinguishable conditions into single "Visible ICD-11 category" targets (for example, contact dermatitis and atopic dermatitis are consolidated into "Eczematous dermatitis" because they cannot be reliably differentiated by visual appearance alone). The complete list of 346 categories and their ICD-11 code mappings is maintained in R-TF-028-004.
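The consolidation step can be pictured as a lookup from fine-grained annotation labels to visible ICD-11 training targets. In the Python sketch below, only the eczematous-dermatitis entries come from the text. The mapping excerpt, the `consolidate` helper, and the strict handling of unmapped labels are illustrative assumptions. The authoritative mapping remains the one in R-TF-028-004.

```python
from typing import Dict, Iterable, List

# Illustrative excerpt only; the authoritative mapping is maintained in R-TF-028-004.
VISIBLE_CATEGORY: Dict[str, str] = {
    "contact dermatitis": "Eczematous dermatitis",
    "atopic dermatitis": "Eczematous dermatitis",
}

def consolidate(labels: Iterable[str],
                mapping: Dict[str, str] = VISIBLE_CATEGORY) -> List[str]:
    """Replace each annotation label with its visible-category training target.

    Unmapped labels raise KeyError so that gaps in the mapping surface during
    data preparation instead of silently creating an extra target.
    """
    out = []
    for label in labels:
        if label not in mapping:
            raise KeyError(f"no visible-category mapping for {label!r}")
        out.append(mapping[label])
    return out

print(consolidate(["atopic dermatitis", "contact dermatitis"]))
# ['Eczematous dermatitis', 'Eczematous dermatitis']
```

Failing loudly on unmapped labels is a design choice of this sketch. A pass-through default would also work, but it would let an unreviewed label become its own training target.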
The 346 categories collectively span the full breadth of dermatological practice. The table below groups example conditions from the 346 categories by epidemiological category. Alongside each category it gives the global burden reported by Karimkhani et al. (2017), based on the Global Burden of Disease Study:
| Epidemiological category | Global burden | Examples of conditions in the probability distribution |
|---|---|---|
| Infectious diseases | ~57% | Tinea (corporis, pedis, capitis, versicolor), herpes simplex, herpes zoster, bacterial cellulitis, impetigo, scabies, molluscum contagiosum, cutaneous leishmaniasis, verruca vulgaris |
| Other conditions | ~19% | Acne vulgaris, alopecia areata, vitiligo, urticaria, keloid, miliaria, contact dermatitis (irritant), androgenetic alopecia, prurigo nodularis |
| Inflammatory diseases | ~15% | Psoriasis (vulgaris, guttate, pustular), eczema, atopic dermatitis, seborrhoeic dermatitis, lichen planus, rosacea, pityriasis rosea, granuloma annulare |
| Malignant and pre-malignant neoplasms | ~5% | Cutaneous melanoma, basal cell carcinoma, squamous cell carcinoma, actinic keratosis, Merkel cell carcinoma; see full enumeration below |
| Autoimmune diseases | ~3% | Pemphigus (vulgaris, foliaceus), bullous pemphigoid, lupus erythematosus (cutaneous), dermatomyositis, morphea |
| Vascular conditions | ~1% | Leukocytoclastic vasculitis, IgA vasculitis, livedoid vasculopathy, venous ulcer, chronic arterial occlusive disease |
| Genodermatoses | ~1% | Ichthyosis, epidermolysis bullosa, Darier disease, neurofibromatosis |
The condition examples listed above are illustrative, not exhaustive. The complete 346-category list is documented in R-TF-028-004. The global burden percentages are derived from Karimkhani et al. (2017) "The global burden of skin disease in 2010: an analysis of the prevalence and impact of skin conditions" and represent the proportion of global dermatological consultations attributable to each category; they do not represent the proportion of the 346 categories.
High-risk and malignant conditions
Among the 346 ICD-11 categories in the probability distribution, the following are clinically classified as malignant neoplasms:
| Condition | ICD-11 code |
|---|---|
| Cutaneous melanoma | 2C30 |
| Acral lentiginous melanoma | 2C30.3 |
| Amelanotic malignant melanoma | 2E63.00 |
| Basal cell carcinoma | 2C32 |
| Squamous cell carcinoma | 2C31 |
| Merkel cell carcinoma | 2C34 |
| Adnexal carcinoma | 2C33 |
| Cutaneous T-cell lymphoma | 2B0Z |
| Mycosis fungoides | 2B01 |
| Pleomorphic T-cell lymphoma | 2B0Y |
| Dermatofibrosarcoma protuberans | 2B53.Y |
| Angiosarcoma | 2B56.1 |
| Metastatic malignant neoplasm involving skin | 2E08 |
In addition, the device covers pre-malignant conditions (actinic keratosis, Bowen disease), neoplasms of uncertain behaviour (ICD-11 2D41), and high-risk non-malignant conditions that require urgent clinical assessment, including Stevens-Johnson syndrome/toxic epidermal necrolysis (EB13), erythroderma, drug eruptions, bacterial cellulitis, and dissecting cellulitis. The complete list of conditions in each group is documented in R-TF-028-004.
Clinical benefit 7GH (see Section "Clinical benefits") encompasses diagnostic accuracy across all presentation types, including a dedicated sub-criterion for lesions suspicious for skin cancer (measured by AUC). The malignancy sub-criterion specifically validates that the probability distribution, when presented to healthcare professionals, improves their diagnostic accuracy for lesions suspicious for skin cancer, including the malignant neoplasm categories listed above. The device does not independently diagnose malignancy; clinical assessment and specialist referral decisions remain with the healthcare professional.
Contraindications and precautions required by the manufacturer
Contraindications
We advise against using the device on:
- Skin structures located at a distance greater than 1 cm from the eye, beyond the optimal range for examination.
- Skin areas that are obscured from view, situated within skin folds or otherwise concealed, making them inaccessible for camera examination.
- Regions of the skin showing scars or fibrosis, indicative of past injury or trauma.
- Skin structures exhibiting extensive damage, characterized by severe ulcerations or active bleeding.
- Skin structures contaminated with foreign substances, including but not limited to tattoos and creams.
- Skin structures situated at anatomically special sites, such as underneath the nails, requiring special attention.
- Portions of skin that are densely covered with hair, potentially obstructing the view and hindering examination.
The contraindications above are maintained under change control and are consistent across this CER, the IFU, and the Risk Management File (R-TF-013-002). Any future amendment to the contraindication list is applied across the three documents and re-verified at the next CER update cycle.
Precautions
To use the device safely, please consider the following precautions:
- The device must always be used by an HCP, who should confirm or validate the device output in light of the patient's medical history and any other symptoms the patient may have, especially those that are not visible or have not been supplied to the device.
- The device must be used according to its intended use.
- Before using the device, please read the Instructions for Use.
Warnings
In the event of observed incorrect operation of the device, users must notify the manufacturer as soon as possible through the support channel indicated in the IFU. Any serious incident must be reported to the manufacturer and to the national competent authority of the country where the incident occurred, in accordance with MDR Article 87.
Measures in the event of malfunction or changes in performance
The device incorporates a defence-in-depth safety architecture (see section Risk architecture and R-TF-028-011 AI/ML Risk Assessment) that operates as the clinical-safety net for algorithmic malfunction or performance drift. Input-stage safeguards include the Deep Image Quality Assessment (DIQA) subsystem, which rejects images below the minimum quality threshold; six binary safety indicators surface detected anomalies to the healthcare professional at the point of use. On detection of an anomaly, the IFU instructs the healthcare professional to: (i) not rely on the output for the affected case; (ii) revert to the established standard of care for the assessment; and (iii) report the event to the manufacturer via the channels specified above. Post-market monitoring of malfunction events feeds into the PMS/PSUR cycle per R-TF-007-002 and into the next CER update cycle.
Undesirable effects
Any undesirable side-effect should constitute an acceptable risk when weighed against the intended performance.
No undesirable side-effects specifically related to the use of the software are known or foreseen.
Instructions for Use
The IFU of the device has been developed in accordance with the applicable requirements of Annex I of Regulation (EU) 2017/745 (MDR). As indicated in the IFU, the device is used as follows:
The device is architected to seamlessly integrate with other software platforms. Primarily designed as an Application Programming Interface (API), it allows healthcare organizations to establish a real-time connection between their native systems, such as Electronic Medical Records (EMR) systems, and the device. This ensures that images can be sent from the EMR and clinical data from the device can be received and stored back into the EMR in real time.
The Instructions for Use document for the device is made available to users through the manufacturer's commercial channels; it contains detailed information on device integration, safe use, and clinical interpretation of outputs.
The IFU includes relevant information such as the intended use, warnings, and contraindications, which have been included in the Device identification section above.
Documents reviewed
The following manufacturer-supplied information materials were reviewed as part of this clinical evaluation: (i) the Instructions for Use (IFU) as released for this CER cycle; (ii) the device labelling content recorded in this CER's Contraindications, Precautions, Warnings, and Undesirable effects subsections; and (iii) the promotional materials distributed through the manufacturer's commercial channels, which were reviewed for alignment with the intended purpose and with the performance claims substantiated in this CER.
User categories and training
The device has two user categories: (i) healthcare professionals (primary care physicians, general dermatologists, and specialist dermatologists) who consume the device output for clinical decision support; and (ii) IT professionals who integrate the device via its API but do not themselves consume the clinical output. The IFU addresses both categories with category-specific instructions. Manufacturer-provided training is recommended for all healthcare-professional users and is a prerequisite for IT integrators; use outside these user categories, or without the IFU available, is a deviation from intended use. Summative usability evidence for both categories is summarised in section Summative usability validation.
SSCP applicability
A Summary of Safety and Clinical Performance (SSCP) is not required for this device. Per MDR Article 32(1), the SSCP obligation applies only to implantable devices and to Class III devices. The device under evaluation is a Class IIb non-implantable medical device software; it falls outside the scope of Article 32(1), and therefore no SSCP has been drafted, reviewed, or published on EUDAMED for this device. This conclusion is re-verified at each CER update cycle and on any change of classification.
Consolidated limitations of the device
The following limitations apply to the use of the device and are communicated to healthcare professionals via the IFU.
- Output scope: the device does not output a diagnosis, a binary positive/negative result, a treatment recommendation, a referral decision, or a prognosis (see section Device outputs).
- Interface granularity: the device's stand-alone output is the complete normalised probability distribution across the full 346 ICD-11 categories; the physician-facing user interface renders a prioritised view of that stand-alone output. Downstream clinical decisions are made by the healthcare professional, not by the device (see section Distinction between the device's stand-alone output and the physician-facing user interface).
- Image-quality dependency: the device requires a minimum 12 MP camera and user technique within the DIQA subsystem's acceptance bounds; images below the DIQA threshold are rejected before any clinical inference is produced (see section Accessories of the product).
- Evidence-gap populations: Fitzpatrick V–VI skin and paediatric patients are under-represented in the pivotal evidence base; additional clinical judgement is advised in these subpopulations (acceptable gaps per MDCG 2020-6 § 6.5(e); see sections Representativeness of the Study Populations and Pediatric population).
- Low-prevalence sub-indication categories: autoimmune dermatoses (~3 %) and genodermatoses (~1 %) are within the intended use and are supported by triangulated pre-certification evidence under MDCG 2020-6 § 6.3: Pillar 1 literature anchors in R-TF-015-011 and Pillar 2 per-epidemiological-group V&V in R-TF-028-006. PMCF Activities D.1 and D.2 confirm and strengthen the pre-certification base in real-world deployment (see section Representativeness of the Study Populations and R-TF-007-002 Activities D.1/D.2). For these two sub-categories specifically, the device's output is to be interpreted as supporting information within the healthcare professional's differential-diagnosis workup and not as a standalone diagnostic determination (see section Device outputs and the IFU Precautions).
Each limitation is mirrored in the IFU; the mapping to IFU text is documented in section Consistency with information materials supplied by the manufacturer.
Components
The device is a computational software-only medical device leveraging computer vision algorithms to process images of the epidermis, the dermis and its appendages, among other skin structures.
Variants
No variants.
Accessories of the product
- Primary accessories are components that interact directly with the device; they are known to the manufacturer and are required to operate the device. The device is used through an API (Application Programming Interface): it is accessed programmatically, server-to-server, by computer programs, without a user interface. Consequently, no accessory interacts directly with the device.
- Secondary accessories are the components that may interact indirectly with the device. These are developed and maintained independently by the user, and the manufacturer has no visibility as to their identity or operating principles. They are also optional and not required to interact with the device.
The device may also be used indirectly through applications, such as the care provider's Electronic Health Records (EHR). The EHR is the software system that stores patients' data: medical and family history, laboratory and other test results, history of prescribed medications, and more. The EHR is developed and maintained independently of the manufacturer and may be used to interact with the device indirectly.
The healthcare providers may use image capture devices to take photos of skin structures. In this regard, the minimum requirement is a 12 MP camera. Image-capture devices are not classified as accessories under MDR Article 2(2), because they are general-purpose hardware not specifically intended by the manufacturer to be used together with the device. Variability in capture hardware (resolution, focus, illumination, colour fidelity, dermoscopic vs. non-dermoscopic modality) is mitigated upstream by the device's Deep Image Quality Assessment (DIQA) subsystem, which rejects inputs that fall outside validated technical-performance bounds before any clinical inference is produced. DIQA is the upstream input-safety gate of the defence-in-depth risk architecture described in section Risk architecture, and its acceptance thresholds are documented in R-TF-028-011 AI/ML Risk Assessment and evidenced by the Technical Performance literature appraised under MDCG 2020-1 Pillar 2.
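The input-safety gate described above can be pictured as a guard placed in front of the classifier, so that a rejected image never reaches clinical inference. The following is a minimal sketch under stated assumptions: `quality_scorer`, `classifier`, `threshold` and `GateResult` are hypothetical names, and the real DIQA model and its acceptance thresholds are those documented in R-TF-028-011, not the placeholder values used here.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class GateResult:
    accepted: bool
    quality_score: float
    output: Optional[dict] = None  # clinical output only exists if accepted


def gated_inference(image: bytes,
                    quality_scorer: Callable[[bytes], float],
                    classifier: Callable[[bytes], dict],
                    threshold: float) -> GateResult:
    """Hypothetical quality gate: score the image first, classify only if it passes."""
    score = quality_scorer(image)
    if score < threshold:
        # Rejected: the classifier is never called, so no clinical output is produced.
        return GateResult(accepted=False, quality_score=score)
    return GateResult(accepted=True, quality_score=score, output=classifier(image))
```

The design choice illustrated is ordering: because the gate runs before the classifier rather than filtering its results afterwards, a low-quality image cannot yield any clinical output at all.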
Device materials in contact with patient or user
Due to the nature of the device (stand-alone software), it does not come into contact with tissue or bodily fluids.
Technical specifications
REST API
Our device is built as an API that follows the REST (Representational State Transfer) architectural style.
REST fully separates the user interface (the client) from the server and the data storage. As a result, the API is independent of the syntax or platform used by the client, which gives integrators considerable freedom and autonomy: client servers may be written in, for example, PHP, Java, Python, or Node.js. The only requirement is that requests and responses use the data-exchange format of the API: JSON.
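As a sketch of what language-agnostic JSON exchange means for an integrator, the following Python builds, without sending, a JSON POST request using only the standard library. The base URL, the `/v1/diagnosis-support` path, the `images` field and the bearer-token scheme are hypothetical placeholders; the actual endpoints, payload schema and authentication method are those declared in the device's OpenAPI Specification.

```python
import json
import urllib.request

# Hypothetical values for illustration only; not the device's real endpoint.
BASE_URL = "https://api.example.com"
ENDPOINT = "/v1/diagnosis-support"


def build_request(images_b64: list[str], token: str) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request to the hypothetical endpoint."""
    body = json.dumps({"images": images_b64}).encode("utf-8")
    return urllib.request.Request(
        url=BASE_URL + ENDPOINT,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Accept": "application/json",
            "Authorization": f"Bearer {token}",
        },
    )


def parse_response(raw: bytes) -> dict:
    """Decode a JSON response body into a Python dictionary."""
    return json.loads(raw.decode("utf-8"))


if __name__ == "__main__":
    req = build_request(["<base64-image>"], token="demo-token")
    print(req.get_method(), req.full_url)
    # Sending would be: urllib.request.urlopen(req, timeout=10)
```

The same request could be issued from PHP, Java or Node.js, because the server only sees HTTP headers and a JSON body. The 10-second timeout in the comment mirrors the maximum transaction response time listed in the performance table of this section.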
OpenAPI Specification
Our medical device includes an OpenAPI Specification.
OpenAPI Specification (formerly known as Swagger Specification) is an API description format for REST APIs. An OpenAPI file allows you to describe an entire API, including:
- Available endpoints and operations on each endpoint (GET, POST)
- Operation parameters, and the input and output of each operation
- Authentication methods
- Contact information, license, terms of use and other information requested by the MDR regarding the label information and information to be supplied by the manufacturer.
This means that our API itself has embedded specifications that help the user understand the type of values that are transmitted by the API.
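Because the specification is machine-readable, an integrator can inspect it programmatically. The sketch below walks a heavily trimmed, hypothetical OpenAPI 3 document (all paths, titles and contact details are placeholders, not the device's real specification) and lists every declared operation.

```python
# Hypothetical, heavily trimmed OpenAPI 3 document for illustration only.
SPEC = {
    "openapi": "3.0.3",
    "info": {
        "title": "Example skin-image API",
        "version": "1.0.0",
        "contact": {"email": "support@example.com"},
        "license": {"name": "Proprietary"},
    },
    "components": {
        "securitySchemes": {"bearerAuth": {"type": "http", "scheme": "bearer"}}
    },
    "paths": {
        "/v1/diagnosis-support": {
            "post": {"summary": "Submit images, receive ICD-11 probabilities"}
        },
        "/v1/health": {"get": {"summary": "Service status"}},
    },
}

# Operation keys allowed in an OpenAPI 3.0 Path Item Object.
HTTP_METHODS = {"get", "put", "post", "delete", "options", "head", "patch", "trace"}


def list_operations(spec: dict) -> list[tuple[str, str]]:
    """Return sorted (METHOD, path) pairs for every operation under `paths`."""
    ops = []
    for path, item in spec.get("paths", {}).items():
        for key in item:
            # Path items may also carry non-operation keys such as "parameters".
            if key in HTTP_METHODS:
                ops.append((key.upper(), path))
    return sorted(ops)


if __name__ == "__main__":
    for method, path in list_operations(SPEC):
        print(method, path)
```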
HL7 FHIR
FHIR is a standard for health care data exchange, published by HL7®. FHIR is suitable for use in a wide variety of contexts: mobile phone apps, cloud communications, EHR-based data sharing, server communication in large institutional healthcare providers, and much more.
FHIR solves many challenges of data interoperability by defining a simple framework for sharing data between systems.
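As an illustration of how one entry of the device output could be carried in FHIR, the sketch below wraps a single category probability as a minimal FHIR R4 `Observation`. This mapping is an assumption made for illustration only: the choice of resource, the `preliminary` status, the percentage unit and the patient reference are not taken from the device documentation. The code-system URI is the one used for ICD-11 MMS in HL7 terminology, and the example code 2C32 is taken from the malignant-neoplasm table of this CER.

```python
import json


def probability_to_observation(icd11_code: str, display: str,
                               probability: float, patient_ref: str) -> dict:
    """Wrap one category probability as a minimal FHIR R4 Observation (sketch)."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be in [0, 1]")
    return {
        "resourceType": "Observation",
        "status": "preliminary",  # output still requires confirmation by the HCP
        "code": {
            "coding": [{
                "system": "http://id.who.int/icd/release/11/mms",
                "code": icd11_code,
                "display": display,
            }]
        },
        "subject": {"reference": patient_ref},  # hypothetical patient reference
        "valueQuantity": {
            "value": round(probability * 100, 2),
            "unit": "%",
            "system": "http://unitsofmeasure.org",  # UCUM units
            "code": "%",
        },
    }


if __name__ == "__main__":
    obs = probability_to_observation("2C32", "Basal cell carcinoma", 0.42, "Patient/example")
    print(json.dumps(obs, indent=2))
```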
The relevant performance attributes of the device are described in the following table.
| Metric | Value |
|---|---|
| Weight | 33 kilobytes |
| Average response time | 1400 milliseconds |
| Maximum requests per second | No limit |
| Service availability time slot | The service is available at all times |
| Service availability rate during its working slot (in % per month) | 100% |
| Maximum application recovery time in the event of a failure (RTO/AIMD) | 6 hours |
| Maximum data loss in the event of a fault (none, current transaction, day, week, etc.) (RPO/PDMA) | None |
| Maximum response time to a transaction | 10 seconds |
| Backup device (software, hardware) | Software (AWS S3) |
| Backup frequency | 12 hours |
| Backup modality | Incremental |
| Recommended dimensions of images sent | 10,000 px² |
How the device achieves its intended purpose
Principle of operation
The device is a computational medical tool leveraging computer vision algorithms to process images of the epidermis, the dermis and its appendages, among other skin structures.
Mode of action
One core feature of the device is deep learning-based image recognition of ICD categories. When the device is fed an image or a set of images, it outputs a probability distribution over the possible International Classification of Diseases (ICD) categories that may be represented in the pixel content of the image.
The device makes its prediction entirely based on the visual content of the images, with no additional parameters.
The device has been developed using a Vision Transformer (ViT) architecture. ViT adapts the Transformer architecture, which is used extensively in other areas such as natural language processing (NLP), where it has brought significant performance improvements.
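Image classifiers of this kind, including ViT models, commonly convert their final raw scores (logits) into a normalised probability distribution with a softmax function; a ranked view is then obtained by sorting. The sketch below illustrates that general technique only; it is not the device's implementation, and the four labels and logit values are hypothetical.

```python
import math


def softmax(logits: list[float]) -> list[float]:
    """Map raw classifier scores to non-negative probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(labels: list[str], probs: list[float], k: int) -> list[tuple[str, float]]:
    """Rank categories by probability and keep the k most likely."""
    ranked = sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    # Hypothetical four-category example; the device covers 346 categories.
    labels = ["Eczematous dermatitis", "Psoriasis", "Tinea corporis", "Basal cell carcinoma"]
    probs = softmax([2.0, 1.0, 0.5, -1.0])
    for label, p in top_k(labels, probs, k=3):
        print(f"{label}: {p:.3f}")
```

Every category keeps a non-zero probability, so the full distribution and its ranked view carry the same information; the ranking only changes presentation.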
Another core feature of the device is to provide quantifiable data on the intensity, count and extent of clinical signs such as erythema, desquamation, and induration, among others.
To achieve that, the device uses the following deep learning technologies, combined and developed for that specific use:
- Object detection: used to count clinical signs such as hives, papules or nodules.
- Semantic segmentation: used to determine the extent of clinical signs such as hair loss or erythema.
- Image recognition: used to quantify the intensity of visual clinical signs like erythema, excoriation, dryness, lichenification, oozing, and edema.
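The three quantification techniques produce different kinds of raw output. The sketch below shows one plausible way to reduce each to a number: counting detections above a confidence cut-off, expressing a binary segmentation mask as the percentage of affected pixels, and taking the expected grade from a probability vector over ordinal intensity grades. The function names, the 0.5 cut-off and the expected-value reduction are assumptions for illustration, not the device's documented post-processing.

```python
def count_signs(detections: list[dict], min_score: float = 0.5) -> int:
    """Count detected lesions (e.g. hives, papules) at or above a confidence cut-off."""
    return sum(1 for det in detections if det["score"] >= min_score)


def extent_percent(mask: list[list[int]]) -> float:
    """Percentage of pixels flagged (1) in a binary segmentation mask."""
    total = sum(len(row) for row in mask)
    flagged = sum(sum(row) for row in mask)
    return 100.0 * flagged / total if total else 0.0


def expected_intensity(grade_probs: list[float]) -> float:
    """Expected ordinal grade 0..n-1 given one probability per grade."""
    return sum(grade * p for grade, p in enumerate(grade_probs))


if __name__ == "__main__":
    print(count_signs([{"score": 0.92}, {"score": 0.31}, {"score": 0.77}]))
    print(extent_percent([[0, 1, 1], [0, 0, 1]]))
    print(expected_intensity([0.1, 0.2, 0.6, 0.1]))
```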
Glossary and Definitions of Metrics
Metric Traceability to State of the Art (SotA)
In accordance with MEDDEV 2.7/1 Rev 4 and MDCG 2020-1, the metrics used to evaluate the clinical performance and clinical benefits of the device must be firmly grounded in accepted clinical practice. The following table provides explicit traceability demonstrating that every metric used in this Clinical Evaluation Report is widely recognized and utilized in peer-reviewed scientific literature to assess diagnostic performance in dermatology, particularly for AI-guided medical devices.
| Metric | Traceability to SotA References |
|---|---|
| Area under the ROC curve (AUC) | Haenssle et al. 2018, Chen et al. 2024, Han et al. 2020, Brinker et al. 2019, Tepedino et al. 2024, Ferris et al. 2025, Marchetti et al. 2019, Phillips et al. 2019, Nadour et al. 2025 |
| Sensitivity and Specificity | Haenssle et al. 2018, Chen et al. 2024, Han et al. 2020, Brinker et al. 2019, Tepedino et al. 2024, Ferris et al. 2025, Marchetti et al. 2019, Phillips et al. 2019, Maron et al. 2019, Barata et al. 2023, Maron et al. 2020, Ahadi et al. 2021, Tschandl et al. 2019, Nadour et al. 2025 |
| Accuracy in malignancy detection | Maron et al. 2020, Marchetti et al. 2019, Haenssle et al. 2018 |
| Positive and Negative predictive Values (PPV/NPV) | Ahadi et al. 2021, Tepedino et al. 2024, Tschandl et al. 2019, Han et al. 2020 |
| Top-1, Top-3 and Top-5 accuracy | Han et al. 2020, Navarrete-Dechent et al. 2021, Han et al. 2022, Jain et al. 2021, Kim et al. 2022, Muñoz-López et al. 2021, Escalé-Besa et al. 2023, Ba et al. 2022, Ferris et al. 2025, Fujisawa et al. 2018, Liu et al. 2020 |
| Percentage of variation of accuracy, sensitivity and specificity (AI-aided) | Ba et al. 2022, Ferris et al. 2025, Fujisawa et al. 2018, Goya et al. 2020, Han et al. 2020, Han et al. 2022, Jain et al. 2021, Kim et al. 2022, Krakowski et al. 2024, Maron et al. 2020, Tschandl et al. 2020 |
| Unaided baseline sensitivity, specificity and accuracy | Ba et al. 2022, Ferris et al. 2025, Fujisawa et al. 2018, Goya et al. 2020, Han et al. 2020, Han et al. 2022, Krakowski et al. 2024, Maron et al. 2020, Tschandl et al. 2020, Li et al. 2023 |
| Unweighted Cohen's Kappa | Landis & Koch 1977 |
| Percentage of reduction of unnecessary referrals | Baker et al. 2022, Eminović et al. 2009, Jain et al. 2021, Knol et al. 2006 |
| Impact on waiting times | Giavina-Bianchi et al. 2020, Morton et al. 2010, Hsiao & Oh 2008, Spanish SNS Report 2025, DREES 2018, DERMAsurvey 2013 |
| Remote care capacity / Teledermatology | Giavina-Bianchi et al. 2020, Orekoya et al. 2021, Kheterpal et al. 2023, Whited 2015 |
Metric Definitions
- Top-K accuracy: An AI metric measuring how frequently the correct diagnosis appears within the top K predictions. It can apply to the device alone or to practitioners aided by the device.
- Top-1 accuracy: Successful only if the single top-ranked prediction exactly matches the correct diagnosis. (When the literature reports general diagnostic "accuracy", it maps to Top-1 here.) It benchmarks the algorithm's single-answer accuracy against the primary diagnosis made by clinicians.
- Top-3 / Top-5 accuracy: Successful if the correct diagnosis appears anywhere within the top three or five predictions. Reflects the real-world clinical workflow of generating a differential diagnosis; dermatology often involves visually similar conditions requiring a ranked list rather than a single definitive answer prior to biopsy.
- AUC (Area Under the Receiver Operating Characteristic Curve): Measures the model's ability to distinguish between classes (e.g., malignant vs. benign) across all classification thresholds. Provides a robust measure of diagnostic discrimination power independent of the final clinical operating point, used primarily for malignancy detection evaluation.
- Sensitivity and Specificity: Sensitivity measures the ability to correctly identify true positives; specificity measures the ability to correctly identify true negatives. Sensitivity is critical where minimizing false negatives is paramount (malignancy detection, referral prioritization); specificity ensures the healthcare system is not overwhelmed by false positive alerts.
- PPV (Positive Predictive Value): The proportion of positive results that are true positives. Indicates the probability that a patient flagged as high-risk actually has the condition, which is relevant to avoiding unnecessary clinical follow-ups.
- NPV (Negative Predictive Value): The proportion of negative results that are true negatives. Critical for malignancy screening; it indicates the probability that a patient classified as low risk is truly free of the condition.
- ICC (Intraclass Correlation Coefficient): A statistical measure of reliability and agreement between raters or measurements. Used to evaluate consistency between the device's severity assessments and human expert judgments, ensuring quantitative outputs are comparable to expert clinical standards.
- Unweighted Kappa (Cohen's Kappa): Measures inter-rater reliability for categorical items, accounting for agreement by chance. Used to assess agreement between the device and clinical experts for categorical severity levels (Mild, Moderate, Severe).
- Experts' Consensus (Majority Vote): A reference standard methodology where correctness is defined by agreement among a majority of independent medical experts (typically ≥ 75%). Used for complex cases where individual expert opinions vary, benchmarking the device against a collective high-quality clinical reference.
- Efficiency and Resource Optimization Metrics: Metrics quantifying the systemic impact of the device on healthcare workflows.
  - Reduction in Cumulative Waiting Time: Measures the decrease in total patient waiting time for specialist consultations after implementing the device's triage support. Demonstrates the benefit of optimized referral pathways by ensuring high-risk cases are seen sooner while reducing unnecessary load on specialist services.
  - Reduction in Unnecessary Referrals: Measures the proportion of low-risk cases managed in primary care without specialist referral, compared to unaided practice. Quantifies the device's contribution to healthcare resource optimization by supporting PCPs in identifying cases that do not require specialist intervention.
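Several of the definitions above reduce to short formulas. The sketch below computes Top-k accuracy, sensitivity, specificity, PPV and NPV, ROC AUC (in its Mann-Whitney form), unweighted Cohen's kappa, and a majority-vote reference label using the ≥ 75 % threshold mentioned above. It is a plain-Python illustration of the standard definitions, not the statistical code used for the studies appraised in this CER.

```python
from collections import Counter


def _ratio(num: float, den: float) -> float:
    """Return num/den, or NaN when the denominator is zero (metric undefined)."""
    return num / den if den else float("nan")


def top_k_accuracy(ranked_preds: list[list[str]], truths: list[str], k: int) -> float:
    """Fraction of cases whose reference label is among the first k predictions."""
    hits = sum(1 for preds, truth in zip(ranked_preds, truths) if truth in preds[:k])
    return _ratio(hits, len(truths))


def binary_metrics(y_true: list[int], y_pred: list[int]) -> dict[str, float]:
    """Sensitivity, specificity, PPV and NPV from binary labels (1 = condition present)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)
    tn = sum(1 for t, p in pairs if not t and not p)
    fp = sum(1 for t, p in pairs if not t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    return {
        "sensitivity": _ratio(tp, tp + fn),
        "specificity": _ratio(tn, tn + fp),
        "ppv": _ratio(tp, tp + fp),
        "npv": _ratio(tn, tn + fn),
    }


def auc(y_true: list[int], scores: list[float]) -> float:
    """ROC AUC as P(random positive scores above random negative); ties count 1/2.
    Quadratic in the number of cases, so suitable for illustration only."""
    pos = [s for t, s in zip(y_true, scores) if t]
    neg = [s for t, s in zip(y_true, scores) if not t]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return _ratio(wins, len(pos) * len(neg))


def cohen_kappa(rater_a: list[str], rater_b: list[str]) -> float:
    """Unweighted Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    expected = sum(count_a[c] * count_b[c] for c in set(count_a) | set(count_b)) / (n * n)
    return _ratio(observed - expected, 1 - expected)


def consensus_label(votes: list[str], threshold: float = 0.75):
    """Majority-vote reference label, or None if no label reaches the threshold."""
    label, count = Counter(votes).most_common(1)[0]
    return label if count / len(votes) >= threshold else None
```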
Use environment
We distinguish between the technical deployment environment and the clinical workflow modality:
- Use Environment (Healthcare Facility): Refers to the technical deployment environment (integration with hospital EMRs, clinics, or professional IT infrastructure), ensuring data security and system reliability.
- Clinical Modality (Remote Care/Teledermatology): Refers to the workflow in which the device is employed. The device is intended for use by HCPs within healthcare facilities to facilitate both in-person assessments and remote assessments (teledermatology), improving the efficiency of referral pathways.
Clinical benefits
The device is claimed to provide three clinical benefits, each defined with sub-criteria, evidence sources and acceptance status that are summarised inline in this CER:
- 7GH (Diagnostic Accuracy): improved accuracy of healthcare professionals in the diagnosis of dermatological conditions across a broad spectrum of clinical presentations, including rare diseases and lesions suspicious for skin cancer. Per-sub-criterion acceptance status, observed magnitudes and supporting studies are reported in section Summary of Clinical Benefits Achievement (Benefit 7GH row) and in the per-study appraisal sections of this CER.
- 5RB (Objective Severity Assessment): objective, quantitative, and reproducible measurement of disease severity to support patient monitoring and treatment decisions. Per-sub-criterion acceptance status, observed magnitudes (including ICC, Cohen's Kappa, Krippendorff α) and supporting studies are reported in section Summary of Clinical Benefits Achievement (Benefit 5RB row) and in the per-study appraisal sections.
- 3KX (Care Pathway Optimisation): improved precision of healthcare professionals in managing dermatological care pathways, encompassing referral decisions, resource allocation, and clinical assessment in remote care settings. Per-sub-criterion acceptance status, observed magnitudes (including reduction in unnecessary referrals, waiting-time reduction, remote-care capacity) and supporting studies are reported in section Summary of Clinical Benefits Achievement (Benefit 3KX row).
The complete per-claim performance-claim table — covering each individual numerical claim, its source studies, its acceptance criterion derivation, and its pass/fail status — is maintained within the QMS (Performance Claims & Clinical Benefits record) and is presented inline in this CER from the per-benefit subsections (7GH, 5RB, 3KX) and from the Summary of Clinical Benefits Achievement table. Pointing to the Performance Claims QMS record is for traceability only; every claim that contributes to the benefit-risk conclusion is also summarised inline in this CER, consistent with the MDCG 2020-1 expectation that clinical-evaluation conclusions be assessable from the CER itself without reference to external data resources.
Data collection, model training and validation
The development of the AI algorithms incorporated in the device follows the systematic approach defined in GP-028 AI Development, which establishes the methodology for data collection, model training, validation, and maintenance of AI models. The complete AI development lifecycle is documented in the AI Development Plan and AI Development Report for each version of the device.
Data Collection and Management
The data collection process is conducted in accordance with GP-028 AI Development and follows documented Data Collection Instructions (R-TF-028-003 Data Collection Instructions) that specify:
- Dataset Composition: Images were collected from diverse sources, including established skin image datasets and clinical partnerships, ensuring representation of various demographics (age, sex, skin tone) and clinical presentations
- Dataset Size: The dataset comprises images covering nearly 1,000 different ICD categories, with sufficient samples per category to enable robust model training
- Acquisition Protocol: Clinical and technical requirements for image acquisition were specified to ensure consistency and quality across all data sources
All data sources are documented with complete traceability, including provenance, acquisition dates, and verification of compliance with data collection requirements. Data quality verification was performed to ensure images met predefined quality standards before inclusion in the training dataset.
Data Annotation
Medical expert annotations were performed following formal Data Annotation Instructions (R-TF-028-004 Data Annotation Instructions) prepared by the manufacturer in collaboration with clinical experts. These instructions provide unambiguous guidance for:
- Application of ICD category labels to each image
- Delineation of clinical signs (where applicable)
- Annotation quality criteria
All annotators received formal training on these instructions, and annotation quality was verified through inter-annotator agreement metrics and compliance checks. Records of annotator training and competence are maintained in the Device History File.
Data Partitioning Strategy
One crucial step of the development is splitting the dataset into three independent subsets, following best practices in machine learning and the methodology defined in GP-028 AI Development:
- Training set: Used to fit or train the parameters of the AI model
- Validation set: Used to provide an unbiased evaluation of the model fit on the training set while tuning model hyperparameters
- Test set: A fixed subset used to provide an unbiased evaluation of the final model's performance after training is complete
Subject-level splitting
When an incoming image dataset includes metadata that makes it possible to group images by subject (patient), the data is split at the subject level. This strategy prevents data leakage (where images from the same patient appear in both training and test sets) and improves the reliability of the validation and test metrics. This is recognized as a best practice in the field of medical AI.
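A minimal sketch of subject-level splitting, assuming each record carries a hypothetical `subject_id` field: subjects, not images, are shuffled and assigned to the training, validation and test sets, so all images of one patient land in the same subset. The 70/15/15 fractions and the seed are placeholders, and the per-ICD-11-category partitioning described in this CER is not reproduced here.

```python
import random
from collections import defaultdict


def subject_level_split(records: list[dict], fractions=(0.70, 0.15, 0.15), seed: int = 0):
    """Split image records so that no subject appears in more than one subset."""
    by_subject = defaultdict(list)
    for rec in records:
        by_subject[rec["subject_id"]].append(rec)

    subjects = sorted(by_subject)          # deterministic order before shuffling
    random.Random(seed).shuffle(subjects)  # shuffle subjects, not images
    n_train = round(fractions[0] * len(subjects))
    n_val = round(fractions[1] * len(subjects))
    assignment = {
        "train": subjects[:n_train],
        "val": subjects[n_train:n_train + n_val],
        "test": subjects[n_train + n_val:],  # remaining subjects go to test
    }
    return {
        name: [rec for subj in subj_ids for rec in by_subject[subj]]
        for name, subj_ids in assignment.items()
    }
```

Splitting at the image level instead would let near-duplicate photographs of the same lesion appear in both training and test sets, inflating test metrics.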
Dataset reservation for testing
Because data were collected from many diverse sources, some complete datasets can be reserved entirely for testing, enabling robust external validation. This approach makes it possible to analyze model performance on data from sources never seen during training, simulating real-world deployment conditions.
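Reserving whole datasets can be expressed as a filter on provenance. In the sketch below, the `source` field and the source names are hypothetical; every record from a reserved source goes to the external test set and never enters the development pool.

```python
def reserve_external_datasets(records: list[dict], reserved_sources: set[str]):
    """Split records by provenance into (development_pool, external_test)."""
    known = {rec["source"] for rec in records}
    missing = reserved_sources - known
    if missing:
        # Guard against a typo silently reserving nothing.
        raise ValueError(f"reserved sources not present: {sorted(missing)}")
    development, external = [], []
    for rec in records:
        target = external if rec["source"] in reserved_sources else development
        target.append(rec)
    return development, external
```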
Development-data composition
To support a transparent reading of the MDCG 2020-1 Pillar 2 (analytical-validation) and Pillar 3 (clinical-performance) evidence that follows, the composition of the development data used to train, tune, and internally test the AI algorithms is disclosed here at the aggregation level relevant to clinical evaluation. The detailed source data are held in R-TF-028-005 AI Development Report and R-TF-028-001 AI/ML Description; this subsection consolidates the Fitzpatrick phototype distribution and the training, validation, and internal-test split totals, so that the representativeness of the development dataset can be assessed by the reader without leaving the CER.
Fitzpatrick phototype distribution of the development dataset
The development dataset comprises 280,342 annotated images spanning 850 ICD-11 categories; 346 categories (277,415 images, 98.96 % of the dataset) satisfied the per-split minimum-sample threshold and were retained for the ICD Category Distribution and Binary Indicator models. The Fitzpatrick phototype distribution across the full development dataset is as follows:
| Fitzpatrick phototype | Images | Proportion of dataset |
|---|---|---|
| I | 89,225 | 31.83% |
| II | 91,349 | 32.58% |
| III | 59,610 | 21.26% |
| IV | 23,466 | 8.37% |
| V | 11,914 | 4.25% |
| VI | 4,778 | 1.70% |
| Total | 280,342 | 100 % |
Grouped coverage: Fitzpatrick I–II 64.41 %, III–IV 29.63 %, V–VI 5.95 %.
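The proportions in the table and the grouped coverage figures can be recomputed directly from the image counts. The following sketch does that arithmetic using only the counts stated in this subsection, together with the 277,415 retained images quoted above.

```python
PHOTOTYPE_COUNTS = {
    "I": 89_225, "II": 91_349, "III": 59_610,
    "IV": 23_466, "V": 11_914, "VI": 4_778,
}
GROUPS = {"I–II": ("I", "II"), "III–IV": ("III", "IV"), "V–VI": ("V", "VI")}
RETAINED_IMAGES = 277_415  # images in the 346 retained categories


def percentages(counts: dict[str, int]) -> dict[str, float]:
    """Share of the development dataset per phototype, in %, rounded to 2 decimals."""
    total = sum(counts.values())
    return {k: round(100 * v / total, 2) for k, v in counts.items()}


def grouped_percentages(counts: dict[str, int], groups: dict) -> dict[str, float]:
    """Share of the development dataset per phototype group, in %."""
    total = sum(counts.values())
    return {
        name: round(100 * sum(counts[m] for m in members) / total, 2)
        for name, members in groups.items()
    }


if __name__ == "__main__":
    total = sum(PHOTOTYPE_COUNTS.values())
    print("total images:", total)
    print(percentages(PHOTOTYPE_COUNTS))
    print(grouped_percentages(PHOTOTYPE_COUNTS, GROUPS))
    print("retained share:", round(100 * RETAINED_IMAGES / total, 2))
```

Running it reproduces the stated figures: a total of 280,342 images, grouped coverage of 64.41 / 29.63 / 5.95 %, and a retained share of 98.96 %.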
Training, validation, and internal-test split totals
Dataset partitioning is performed per ICD-11 category, at the subject (patient) level wherever subject-group metadata is available, to prevent between-split leakage. The three resulting subsets and their sizes are:
| Subset | Images | Proportion of development dataset |
|---|---|---|
| Training set | 193,686 | 69.09 % |
| Validation set | 48,047 | 17.14 % |
| Internal-test set | 35,726 | 12.74 % |
The splitting procedure is defined per ICD-11 category at the subject (patient) level and does not consider Fitzpatrick phototype as a stratification variable. The phototype distribution of each of the three subsets is therefore equal to the aggregate phototype distribution stated above, up to random-sampling variability. No phototype-stratified rebalancing is applied at the splitting stage: the internal-test set retains a phototype mix representative of the development dataset as a whole, so that the held-out subgroup performance estimates reported below reflect the same under-representation that is present in the development data, and are not artefacts of a synthetic rebalancing step. A per-subset × phototype crosstab is not reported separately because no phototype-based allocation step produces one; reporting it would amount to restating the aggregate distribution in three near-identical rows.
Phototype-stratified performance on the held-out internal-test set
Performance on the independent held-out internal-test set was disaggregated by Fitzpatrick group (I–II, III–IV, V–VI) as part of the pre-specified Bias Analysis and Fairness Evaluation. The full results, including per-stratum sample sizes and bootstrapped 95 % confidence intervals, are reported in R-TF-028-005 AI Development Report, section Bias Analysis and Fairness Evaluation. The principal findings relevant to this CER are:
- Binary indicators (MDCG 2020-1 Pillar 2). All six binary indicators (Malignant, Pre-malignant, Associated with malignancy, Pigmented lesion, Urgent referral, High-priority referral) meet the pre-specified AUC ≥ 0.80 acceptance criterion in all three Fitzpatrick groups, including V–VI (AUC Malignant 0.8364; Pre-malignant 0.8011; Pigmented lesion 0.9059; High-priority referral 0.8546; Urgent referral 0.8268; Associated with malignancy 0.8579).
- ICD-11 category classification (MDCG 2020-1 Pillar 2). Top-1 / Top-3 / Top-5 accuracy is highest on Fitzpatrick I–II, intermediate on III–IV, and lowest on V–VI (Top-1 0.6855 / 0.6146 / 0.5350; Top-3 0.8501 / 0.7740 / 0.6937; Top-5 0.8912 / 0.8221 / 0.7457). All three strata meet the pre-specified overall Top-k acceptance thresholds (Top-1 ≥ 0.50, Top-3 ≥ 0.60, Top-5 ≥ 0.70). The Fitzpatrick V–VI Top-1 result (0.5350; 95 % CI [0.5135, 0.5566]) clears the acceptance threshold by a narrow margin of 3.5 percentage points and is treated here as a marginal pass. This marginal pass is the explicit reason why MAN_2025, the IFU Population and performance variability notice, and the post-market follow-up activities described below form part of the evidence strategy, rather than the held-out internal-test margin being relied on in isolation.
- Severity quantification (erythema, desquamation, induration; MDCG 2020-1 Pillar 2). The severity-quantification models meet the pre-specified RMAE clinical-agreement thresholds across all three Fitzpatrick groups in the held-out internal-test set. Representative Fitzpatrick V–VI sample sizes are n = 43 (erythema) and n = 33 (desquamation), and per-stratum results are tabulated in R-TF-028-005 AI Development Report.
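The Python sketch below compares the Fitzpatrick V–VI point estimates quoted above against the pre-specified thresholds and expresses each margin in percentage points. It covers both the Top-k accuracies and the six binary-indicator AUCs. It uses only values stated in this section. The function name `check` is illustrative. The sketch checks point estimates only and does not reproduce the confidence-interval analysis in R-TF-028-005.

```python
# Pre-specified acceptance thresholds, as stated in this section.
TOPK_THRESHOLDS = {"Top-1": 0.50, "Top-3": 0.60, "Top-5": 0.70}
AUC_THRESHOLD = 0.80

# Fitzpatrick V-VI point estimates on the held-out internal-test set.
FITZ_V_VI_TOPK = {"Top-1": 0.5350, "Top-3": 0.6937, "Top-5": 0.7457}
FITZ_V_VI_AUC = {
    "Malignant": 0.8364,
    "Pre-malignant": 0.8011,
    "Pigmented lesion": 0.9059,
    "High-priority referral": 0.8546,
    "Urgent referral": 0.8268,
    "Associated with malignancy": 0.8579,
}


def check(results, thresholds):
    """Map metric -> (passes, margin above threshold in percentage points)."""
    return {
        name: (value >= thresholds[name], 100.0 * (value - thresholds[name]))
        for name, value in results.items()
    }


topk = check(FITZ_V_VI_TOPK, TOPK_THRESHOLDS)
auc = check(FITZ_V_VI_AUC, {name: AUC_THRESHOLD for name in FITZ_V_VI_AUC})

# The metric with the smallest margin across both families.
smallest = min({**topk, **auc}.items(), key=lambda kv: kv[1][1])

if __name__ == "__main__":
    for name, (ok, margin) in {**topk, **auc}.items():
        print(f"{name:<28} pass={ok}  margin={margin:+.2f} pp")
    print("Smallest margin:", smallest[0])
```

On these figures every V–VI metric passes. The smallest margin overall is the Pre-malignant AUC (about 0.11 percentage points).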
Transparent acknowledgement of under-representation and mitigation pathway
At 5.95 % of the development dataset (V 4.25 %, VI 1.70 %; n = 16,692 of 280,342), Fitzpatrick V–VI is under-represented relative to a globally representative population. This is a known field-wide limitation of published AI dermatology datasets (Tjiu and Lu, 2025; Liu et al., 2023; Groh et al., 2024, reviewed in the section Representativeness of the Study Populations below), but the field-wide nature of the limitation does not discharge the manufacturer's obligation under GSPR 1 and GSPR 10.3 to characterise performance in that subgroup. The under-representation is accordingly:
- communicated to users in the Instructions for Use, Important Safety Information section, Population and performance variability subsection, which instructs users to exercise particular clinical judgment when the device is applied in Fitzpatrick V–VI patients;
- logged as a residual risk in the risk-management record (R-TF-013-002 Risk Management Record, risk R-7US), with IFU-level communication and post-market monitoring as risk controls; and
- included among the pre-market evidence gaps catalogued under MDCG 2020-6 § 6.5(e) in the Clinical Evaluation Plan (R-TF-015-001 Clinical Evaluation Plan), with pre-market and post-market mitigation pre-specified in that plan.
Pre-market mitigation is provided by the dedicated external-evaluation multi-reader multi-case study on Fitzpatrick V–VI images (MAN_2025, 149 curated Fitzpatrick V–VI images; reported in R-TF-015-006 MAN_2025 Clinical Investigation Report and positioned at Rank 11 / MDCG 2020-1 Pillar 3 § 4.4 in the evidence hierarchy). Post-market mitigation is provided through the Post-Market Clinical Follow-up activities specified in R-TF-007-002 PMCF Plan, including activity C.2.1, whose retrospective enriched dataset is designed to ensure representation of Fitzpatrick phototypes IV–VI, together with phototype-stratified performance monitoring embedded in the broader PMCF activity set. The combined pre-market and post-market evidence strategy for phototype coverage is developed further in the sections MAN_2025: Prospective MRMC observational study on Fitzpatrick V–VI images and Representativeness of the Study Populations below.
Model Training and Development
The model training process follows the specifications detailed in the AI Development Plan (R-TF-028-002 AI Development Plan), which defines:
- Model Architecture: The device employs a Vision Transformer (ViT) architecture. ViT adapts the Transformer architecture, originally developed for natural language processing, to images by treating each image as a sequence of fixed-size patches. ViT models have achieved image-recognition performance competitive with convolutional neural networks, particularly when pre-trained on large image datasets
- Training Configuration: Hyperparameters, loss functions, optimization algorithms, data augmentation strategies, and training procedures are specified and documented
- Training Process: The methodology includes transfer learning strategies, convergence criteria, and monitoring procedures to ensure optimal model performance
- Experiment Tracking: Comprehensive records of all experiments, parameter settings, and results are maintained for full reproducibility and traceability
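To make the ViT input step concrete, the NumPy sketch below splits an image into non-overlapping patches, flattens them into a token sequence, and applies a linear patch embedding. It is an illustration of the general ViT technique only. The 224 × 224 input and 16 × 16 patch size are the ViT-B/16 defaults from the original ViT publication and are assumptions here. The device's actual configuration is defined in R-TF-028-002 AI Development Plan. The function names `patchify` and `embed` are hypothetical. Class tokens and position embeddings are omitted.

```python
import numpy as np


def patchify(image, patch_size=16):
    """Split an (H, W, C) image into flattened non-overlapping patches.

    Returns an array of shape (num_patches, patch_size * patch_size * C),
    ordered row by row across the patch grid.
    """
    h, w, c = image.shape
    if h % patch_size or w % patch_size:
        raise ValueError("image height and width must be multiples of patch_size")
    gh, gw = h // patch_size, w // patch_size
    # (gh, p, gw, p, c): split both spatial axes into grid index and offset.
    patches = image.reshape(gh, patch_size, gw, patch_size, c)
    # Bring the two grid axes together: (gh, gw, p, p, c).
    patches = patches.transpose(0, 2, 1, 3, 4)
    return patches.reshape(gh * gw, patch_size * patch_size * c)


def embed(patches, weight, bias):
    """Linear patch embedding: (N, P*P*C) @ (P*P*C, D) + (D,) -> (N, D)."""
    return patches @ weight + bias


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    image = rng.random((224, 224, 3), dtype=np.float32)
    tokens = patchify(image)                      # (196, 768)
    w = rng.standard_normal((tokens.shape[1], 768)).astype(np.float32)
    b = np.zeros(768, dtype=np.float32)
    print(tokens.shape, embed(tokens, w, b).shape)
```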
The training process is supported by multiple deep learning technologies tailored to specific clinical tasks:
- Object detection: Used to count clinical signs such as hives, papules or nodules
- Semantic segmentation: Used to determine the extent of clinical signs such as hair loss or erythema
- Image recognition: Used to quantify the intensity of visual clinical signs like erythema, excoriation, dryness, lichenification, oozing, and edema
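The two quantification steps described above, counting detected signs and measuring the extent of a segmented sign, reduce to simple operations on model outputs. The NumPy sketch below illustrates them under the following assumptions, none of which are taken from the device specification:
- The detector returns one confidence score per already post-processed detection.
- The segmentation model returns a binary mask.
- The 0.5 confidence threshold and the optional region-of-interest mask are hypothetical parameters.

The function names `count_signs` and `affected_extent` are illustrative.

```python
import numpy as np


def count_signs(scores, threshold=0.5):
    """Count detections (e.g. hives, papules) with confidence >= threshold."""
    return int(np.sum(np.asarray(scores) >= threshold))


def affected_extent(mask, region=None):
    """Percentage of the region of interest covered by the predicted mask.

    mask:   2-D boolean-like array, True where the sign (e.g. erythema) is present.
    region: optional 2-D boolean-like array restricting the denominator
            (e.g. visible skin); defaults to the whole image.
    """
    mask = np.asarray(mask, dtype=bool)
    region = np.ones_like(mask) if region is None else np.asarray(region, dtype=bool)
    denominator = region.sum()
    if denominator == 0:
        raise ValueError("region of interest is empty")
    return float(100.0 * np.logical_and(mask, region).sum() / denominator)


if __name__ == "__main__":
    print(count_signs([0.92, 0.41, 0.77]))      # two detections above 0.5
    m = np.zeros((10, 10), dtype=bool)
    m[:5, :5] = True
    print(affected_extent(m))                   # 25 % of the image
```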
Model Evaluation and Validation
Algorithm evaluation is conducted according to the metrics and acceptance criteria defined in R-TF-012-009 Validation and Testing of Machine Learning Models. The AI Development Report (R-TF-028-005 AI Development Report) documents comprehensive evidence that the model meets all acceptance criteria, including:
- Performance Metrics: Detailed results for all clinically relevant metrics (e.g., sensitivity, specificity, AUC, F1-score) on the fixed test set, with statistical confidence intervals where applicable
- Subgroup Analysis: Performance is evaluated across demographic subgroups (age, sex, skin tone) to identify and mitigate potential model bias
- External Validation: Performance is assessed on completely independent datasets not used during development to validate generalization capability
- Clinical Validation: Results from clinical studies (documented in this CER) provide evidence of the device's performance in real-world clinical settings
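Subgroup analysis with confidence intervals, as described above, can be sketched as follows. For each subgroup (for example, a Fitzpatrick group), the code computes Top-k accuracy and a percentile-bootstrap 95 % interval. This is a minimal sketch and not the method used in R-TF-028-005. The function names, the 2,000 resamples and the fixed seed are assumptions. The sketch also resamples individual images. When a patient contributes several images, resampling patients (a cluster bootstrap) would be the more conservative choice.

```python
import random
from collections import defaultdict


def top_k_hit(ranked_predictions, truth, k):
    """True if the reference diagnosis is among the first k ranked predictions."""
    return truth in ranked_predictions[:k]


def bootstrap_ci(values, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the mean of a list of 0/1 outcomes."""
    rng = random.Random(seed)
    n = len(values)
    means = sorted(sum(rng.choices(values, k=n)) / n for _ in range(n_boot))
    lower = means[int((alpha / 2) * n_boot)]
    upper = means[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


def subgroup_topk(cases, k=1, **bootstrap_kwargs):
    """cases: iterable of (subgroup, ranked_predictions, truth).

    Returns {subgroup: (accuracy, ci_lower, ci_upper)}.
    """
    hits = defaultdict(list)
    for group, ranked, truth in cases:
        hits[group].append(1 if top_k_hit(ranked, truth, k) else 0)
    return {
        group: (sum(v) / len(v), *bootstrap_ci(v, **bootstrap_kwargs))
        for group, v in hits.items()
    }
```

Reporting the interval alongside the point estimate makes narrow passes visible, such as a subgroup whose lower bound sits close to the acceptance threshold.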
Commissioning and Real-World Validation
Following development and initial validation, the device undergoes commissioning activities as defined in GP-029 Software Delivery and Commissioning. The commissioning process validates the device in its intended environment of use by:
- Objective 1 - Internal Validation in Representative Environments: The manufacturer creates representative test environments that simulate how clients will integrate the API, including:
- Test mobile applications (iOS, Android) that integrate the device
- Test web applications that consume the device's API
- Simulations of EHR system integrations with FHIR data exchange
- Testing under various network conditions and authentication methods
- Validation that integration documentation is complete and accurate
- Objective 2 - Client Integration Assurance: Establishment of a comprehensive support framework to ensure clients integrate the device correctly and safely:
- Complete integration documentation and code examples
- Sandbox environment for client testing
- Technical support during integration
- Monitoring of client integrations to identify issues
The commissioning activities are documented in the Software Commissioning Plan (R-TF-029-001 Software Commissioning Plan) and Software Commissioning Report (R-TF-029-002 Software Commissioning Report), which provide evidence of IEC 82304-1:2016 section 6.2 compliance by demonstrating that the software product satisfies user requirements in the intended environment of use.
Risk Management
AI-specific risks identified during development are documented in the AI Risk Matrix (R-TF-028-011 AI Risk Matrix) and communicated to the product development team for inclusion in the overall risk management file (R-TF-013-002 Risk Management Record). This ensures that risks related to data quality, model performance, and potential bias are systematically managed and mitigated.
Traceability and Documentation
Complete traceability is maintained throughout the AI development lifecycle, with all activities documented in accordance with GP-028 AI Development and GP-029 Software Delivery and Commissioning. This includes:
- Dataset provenance and version control
- Model architecture and training configuration
- Experiment logs and results
- Validation and test results
- Commissioning activities and results
- Post-market performance monitoring data
This comprehensive documentation ensures compliance with regulatory requirements (MDR 2017/745, IEC 82304-1:2016, IEC 62304:2006+A1:2015) and supports continuous improvement of the device through post-market surveillance activities.
Status of commercialization
This device has not yet been commercialized. It is currently undergoing initial CE marking under Regulation (EU) 2017/745.
Previous version of the device
The predecessor of the current device is hereinafter referred to as "the legacy predecessor". The device under evaluation is an evolution of the legacy predecessor currently available on the market under the Medical Devices Directive (MDD). The transition from the legacy version to the current version involved minor technical updates aimed at software version stabilisation, consolidation of existing features, and full alignment with the Medical Device Regulation (MDR) (EU) 2017/745 requirements. These changes have been assessed and do not impact the device's clinical safety, performance, or fundamental principles of operation. While additional clinical data has been collected to satisfy the higher level of evidence required for its reclassification as Class IIb, the intended purpose remains consistent. This approach aligns with the necessary transition from the MDD to the MDR, specifically addressing:
- Updated Technical Documentation: As mandated by Article 10(4) and Annexes II and III of the MDR, demonstrating conformity with the new General Safety and Performance Requirements (GSPR).
- Post-Market Clinical Follow-up (PMCF) Data: The collected clinical data serve to strengthen the Clinical Evaluation Report (CER), in line with MDR Article 61 and Annex XIV and with relevant guidance from the Medical Device Coordination Group (MDCG), e.g., MDCG 2020-13, the clinical evaluation assessment report template, which also covers the assessment of PMCF.
- Demonstration of Equivalence: Since the core technology and clinical application remain unchanged, the updated documentation demonstrates equivalence to the legacy device, enabling the use of existing market experience data to support the clinical evaluation.
The legacy device has been commercialized since 2020 (after obtaining the manufacturing license in Spain) and was certified under the Medical Devices Directive (MDD). Details on its market experience are integrated into Section 16.1.4.
Similar devices on the market
Similar AI-based medical device software products marketed in the Union or internationally have been identified and characterised in R-TF-015-011 State of the Art, section Similar devices. Eight similar devices were considered: SkinVision (SkinVision B.V.), MoleScope (MetaOptima Technology Inc.), MoleMapper (Oregon Health & Science University Apps), Huvy (Huvy SAS), DERM (Skin Analytics Ltd.), Dermalyser (AI Medical Technology), FotoFinder (FotoFinder Systems GmbH), and ModelDerm (Iderma Inc.). For each, R-TF-015-011 documents the following commercialisation context:
- manufacturer;
- targeted medical conditions;
- CE-marking status under MDR or MDD;
- vigilance records retrieved from the MAUDE (FDA), Medical Device Recalls (FDA), and EUDAMED (EU) databases covering the 2015–2025 period.

The vigilance search returned zero incident records across all MAUDE and Medical Device Recalls queries, and a small number of EUDAMED registrations with no FSCA reports. The comparator manufacturers do not publicly disclose sales-volume figures. Market presence is therefore characterised via regulatory class, time on the EU market, and EUDAMED registration rather than unit sales. The comparator set was used for two purposes:
- to benchmark safety against the state of the art (see section Safety Benchmarking against State of the Art);
- to cross-check the information materials of the device for consistency with the warnings and cautions disclosed by comparable manufacturers (see section Consistency between the State of the Art, the available clinical data and the risk management documentation).
Current knowledge - State of the Art
Data sources for the state of the art
In the context of the European Union's Medical Devices Regulation (MDR) 2017/745, the state of the art refers to the current level of technical development and accepted clinical practice in products, processes and patient management. The MDR does not define the term explicitly, but it is generally understood as the consolidated state of knowledge in science and technology at a specific point in time. It does not necessarily mean the most advanced, most expensive or most frequently used solution, but rather what is currently accepted as good practice. Identifying and understanding the state of the art is crucial for risk assessment and plays a key role in clinical evaluation reports. It ensures alignment with the intended use of medical devices and effective management of the associated risks.
In relation to the current knowledge / state of the art in the relevant medical field, the following aspects and information have been verified:
- Applicable standards and medical guidelines.
- Information related to the medical condition managed with the device.
- Other similar devices marketed and medical alternatives available for the target population.
The full state-of-the-art description is provided in a separate document (R-TF-015-011 State of the Art), which is also attached to the Clinical Evaluation Plan (CEP) and to this Clinical Evaluation Report (CER).
Alternative diagnostic pathways considered
For the purposes of the benefit-risk analysis, the following alternative diagnostic pathways and comparable technologies were considered as state-of-the-art comparators offering safety and performance for the same indications and intended patient population:
- Unaided clinical examination by general practitioners: visual assessment of skin findings in a primary care consultation without a diagnostic decision-support tool.
- Unaided clinical examination by general dermatologists: visual assessment by a specialist in a face-to-face dermatology consultation.
- Dermoscopic examination by trained users: contact or polarised dermoscopy by a trained clinician as an adjunct to visual examination, primarily for pigmented-lesion assessment.
- Teledermatology referral: asynchronous or synchronous specialist review of smartphone- or dermatoscope-captured images, with or without AI triage.
- Other AI-based diagnostic decision-support MDSW: the eight similar devices documented in section Similar devices on the market and in R-TF-015-011 (SkinVision, MoleScope, MoleMapper, Huvy, DERM, Dermalyser, FotoFinder, ModelDerm).
- Reference-standard pathways: biopsy with histopathological confirmation for suspected malignancies, and serological or specialist investigations for non-visible components of conditions such as autoimmune skin disease.
The comparative analysis of safety, performance, and benefit-risk profile across these pathways is documented in R-TF-015-011 State of the Art; the acceptance criteria applied in this CER (section Acceptance Criteria Derivation from State of the Art) are calibrated against the applicable pathways from this comparator set.
The following list summarizes the state-of-the-art data related to the device:
- Methodological referential for bibliographic search:
- MEDDEV 2.7/1 Rev. 4 (applicable guidance for clinical evaluation).
- PRISMA Statement for Reporting Systematic Reviews and Meta-Analyses.
- Type of search: systematic (documented search strategy, screening, eligibility and selection steps; audit trail available in methods section).
- Results (bibliographic search): the source search yielded N = 227 candidate records. After de-duplication and multi-stage screening, n = 64 clinical articles were included and appraised for methodological quality and relevance. An additional n = 10 items (two manuscripts and eight guidelines and contextual documents) were referenced to inform the clinical context; total material considered = 74. Breakdown used for appraisal: 66 clinical studies (the 64 included articles plus the two manuscripts); 8 clinical guidelines; 0 unpublished trial reports; 0 registry reports.
- Referential for data appraisal and weighting:
- IMDRF MDCE WG/N56FINAL:2019 (risk-based clinical evaluation principles).
- Internal appraisal templates informed by Yale and Johns Hopkins academic resources (see Methods).
- Results (appraisal summary / mean weight): appraisal summary for clinical datasets (n = 64): mean weight = 6.91 / 10. Additional metrics: mean relevance = 4.62 / 6; mean quality = 2.60 / 4; mean level of clinical evidence = 6.0 / 10. Note: datasets with weight < 4 require justification in the clinical evaluation file; none of the included datasets used in the main analysis had weight < 4 without documented rationale.
- Use: intended use statement: AI-guided medical devices are intended as adjunctive clinical decision-support tools. They assist clinicians (primary care practitioners and dermatologists) during dermatology consultation workflows for triage and diagnostic evaluation of skin conditions. They are not intended to replace clinician judgment. Target population: patients presenting with skin lesions or dermatological complaints across adult age groups. User training, labeling, and intended-use constraints consistent with similar devices in the literature are required.
- Expected complications: observed/anticipated hazards: no direct patient harm events attributable to similar devices were identified in the reviewed clinical evidence. Principal risks to be managed:
- Reduced accuracy on heterogeneous, real-world images (dataset shift).
- Inappropriate clinician reliance on AI outputs when used without verification (automation bias).
- False-negative results leading to missed malignancy or delayed referral.
- False-positive results increasing unnecessary referrals/biopsies.
Recommended risk controls: human-in-the-loop workflow, explicit user instructions and limitations, mandatory training, robust PMS and RCA procedures, and monitoring of real-world performance metrics.
- Expected benefits and performances: access to specialist dermatology services is constrained in many health systems, with variable wait times and heterogeneous diagnostic performance between primary care practitioners (PCPs) and dermatologists. The reviewed literature confirms consistent performance gaps (PCPs show lower sensitivity than dermatologists on clinical image assessments). It also shows that dermoscopy and specialist assessment improve diagnostic accuracy. AI tools have been studied primarily as adjuncts to clinician assessment and as standalone classifiers on curated image sets. Real-world performance is commonly lower than reported in controlled datasets, which underscores the need for robust external validation and post-market surveillance.
- Clinical performance observed in reviewed literature: on curated dermoscopic test sets, standalone AI classifiers typically reported sensitivity in the approximate range 80-86% and specificity in the range 77-83%. High-quality meta-analytic evidence (systematic reviews) reports pooled sensitivity and specificity that are consistent with these ranges for melanoma detection using dermoscopic images; performance on clinical (unmagnified) images is lower and more variable. Comparative reader studies demonstrate that AI, when used as a diagnostic adjunct, improves clinician sensitivity and overall accuracy (for example, Maron et al. 2020 reported clinician sensitivity increase from ~59% to ~75% with AI assistance; other reader and trial studies show similar magnitude improvements in sensitivity and modest improvements in specificity or overall accuracy).
- Expected clinical benefits: improved detection sensitivity for malignancy (reducing missed cancers), standardization of preliminary triage decisions, support for prioritization of referrals to secondary care, potential reduction in unnecessary specialist referrals and benign biopsies when AI is combined with clinical assessment, and increased efficiency in workflows (fewer repeat assessments, faster triage). Benefits are contingent on correct deployment: appropriate external validation, integration into clinician workflows with human oversight, and active PMS to detect performance drift.
Conclusion: the evidence supports adoption as a clinician-support tool under controlled conditions and with documented risk controls; standalone use without clinician oversight is not supported by the available clinical evidence and is not recommended in the intended use statement.
Clinical Evaluation of the device
Clinical evidence assessment framework
This section describes the regulatory framework, evidence quality hierarchy, and assessment methodology used to evaluate the clinical evidence portfolio for this device. It establishes the basis for the tiered evidence structure, data pooling methodology, and per-study appraisals that follow.
Regulatory framework and applicable guidance
Per MDR Article 61(1), the manufacturer shall specify and justify the level of clinical evidence necessary to demonstrate conformity with the relevant general safety and performance requirements (GSPRs). That level of clinical evidence shall be appropriate in view of the characteristics of the device and its intended purpose.
The clinical evaluation follows a combined methodology drawing on three complementary guidance documents, as endorsed by MDCG 2020-6 Appendix I:
- MDCG 2020-6 (Clinical evidence needed for medical devices previously CE marked under Directives 93/42/EEC or 90/385/EEC): defines the evidence quality hierarchy (Appendix III), the clinical evaluation plan checklist (Appendix II), and MDR-specific requirements including the narrower definition of "clinical data" per Article 2(48). This is the primary MDR-era guidance for legacy device clinical evaluation.
- MEDDEV 2.7/1 Rev. 4 (Clinical evaluation: a guide for manufacturers and notified bodies): provides the process methodology for the clinical evaluation, structured as Stages 0 through 4:
  - scoping and planning (Stage 0);
  - identification of pertinent data (Stage 1);
  - appraisal of pertinent data (Stage 2);
  - analysis of clinical data (Stage 3);
  - finalisation of the clinical evaluation report (Stage 4), which is updated on an ongoing basis through post-market surveillance.

  Sections 6.4, 8, 9, 10, and Annexes A3, A4, A5, A6, A7.2, A7.3, A7.4, and A10 remain applicable under MDR per MDCG 2020-6 Appendix I. Where MEDDEV Section 10 references MDD Essential Requirements, these have been substituted with the corresponding MDR GSPRs.
- MDCG 2020-1 (Guidance on clinical evaluation of medical device software): defines the three-pillar evidence framework specifically applicable to medical device software (MDSW), harmonised with IMDRF/SaMD WG/N41FINAL:2017. This is directly applicable as the device is an AI-based MDSW.
Evidence quality hierarchy (MDCG 2020-6 Appendix III)
MDCG 2020-6 Appendix III establishes a 12-level hierarchy of clinical evidence, ranked from strongest (Rank 1) to weakest (Rank 12). For Class III devices, Rank 4 is the minimum acceptable evidence level. The ranks most relevant to the clinical evidence portfolio of this device are:
| Rank | Description | Applicability to this device |
|---|---|---|
| 1 | High-quality clinical investigations covering all variants, indications, populations, and duration of use | Not achievable pre-market for a device covering the full ICD-11 dermatological spectrum; coverage gaps are inherent and addressed by the tiered evidence strategy and PMCF |
| 2 | High-quality clinical investigations with some gaps; gaps must be justified with risk assessment; PMCF required | Applicable to prospective studies conducted in real clinical settings with protocol-driven methodology, ethics committee approval, and formal clinical investigation plans, where the study itself is methodologically sound but does not cover all indications or populations |
| 4 | Studies with methodological limitations but data still quantifiable and acceptability justifiable | Applicable to studies with design constraints that limit the evaluability of one or more endpoints, but where quantitative performance data remains extractable and clinically meaningful. A supplementary Rank 4 case is retained for the quantitative endpoints of the legacy-device post-market observational study R-TF-015-012 under the Appendix III "High quality surveys may also fall into this category" note, but is not the lead classification for that study (see Rank 8 row below) |
| 6 | Peer-reviewed literature on the device's own algorithms (evaluation of state of the art) | Applicable to the four published peer-reviewed severity-validation publications (APASI_2025, AUAS_2023, AIHS4_2023, ASCORAD_2022) and to the AIHS4_2025 n = 2 retrospective proof-of-concept pilot. MINORS methodological-quality appraisal is layered on top of the rank (minimum aggregate ≥ 12/16 for the four publications). Appendix III Rank 5 is reserved for equivalence data; peer-reviewed literature on the device's own algorithms belongs at Rank 6 by strict reading |
| 7 | Complaints and vigilance data; curated quality management system data | Applicable to the legacy device's passive post-market surveillance data (2020-present): complaints, incidents, vigilance reports, and trend analyses consolidated in the legacy umbrella PMS Report (R-TF-007-003), which is the Report paired with the legacy umbrella PMS Plan (R-TF-007-005) |
| 8 | Proactive PMS data, such as that derived from surveys and professional opinion | Primary classification of the R-TF-015-012 post-market observational study. It applies to the Likert professional-opinion items (B1, B3, B5, C1–C3, D1, D3, D5, E1, F3), the quantitative endpoints (three co-primary endpoints B2, C4, D4 and six supportive endpoints B4, B6, C5, D2, D6, D7), and the Section F safety items F1–F4. The conservative Rank 8 reading reflects the cross-sectional physician-recall design. A supplementary Rank 4 case is retained for the quantitative endpoints under the Appendix III "high quality surveys may also fall into this category" note. The Pillar 3 sufficiency determination does not depend on the supplementary Rank 4 reading |
| 11 | Simulated use testing with healthcare professionals | Applicable to multi-reader multi-case (MRMC) studies using clinical images in a simulated assessment environment. Per MDCG 2020-6, simulated-use studies do not constitute "clinical data" under MDR Article 2(48), as the device is not used on real patients in real clinical situations. Per MDCG 2020-1 §4.4, these studies contribute Pillar 3 Clinical Performance supporting evidence (intended users achieving clinically relevant outputs when using the device's Top-5 prioritised differential on images representative of the intended patient population); they are not Pillar 2 analytical-performance evidence. Rank and Pillar are orthogonal: Rank 11 reflects the methodological quality of the measurement; Pillar 3 reflects its evidentiary role. |
Three-pillar evidence framework for MDSW (MDCG 2020-1)
MDCG 2020-1 establishes three evidence pillars that must be addressed for medical device software. These pillars are not sequential stages but parallel requirements, each satisfied by different types of evidence:
| Pillar | Definition | Evidence source for this device |
|---|---|---|
| Valid Clinical Association (VCA) | The software's output correlates with a real clinical condition, accepted by the medical community and described in peer-reviewed literature. Each specific claimed output requires separate VCA establishment. | Established through the systematic literature review (R-TF-015-011 State of the Art): ICD-11 dermatological categories are well-characterised clinical entities with established diagnostic criteria published in professional society guidelines and peer-reviewed literature. The Pillar 1 VCA evidence is reinforced by the targeted surrogate-endpoint-validity anchoring search documented in R-TF-015-011 §"Surrogate endpoint validity", which independently establishes that the three surrogate-endpoint families underlying the declared clinical benefits — diagnostic accuracy (benefit 7GH), severity scoring (benefit 5RB) and referral optimisation / care-pathway metrics (benefit 3KX) — are accepted proxies for patient-relevant outcomes in peer-reviewed dermatology and regulator-accepted clinical-endpoint history. No VCA gaps were identified for any claimed output. |
| Technical Performance | The software reliably and accurately generates its intended outputs from its inputs, across the full range of real-world input variability (image conditions, skin tones, camera types, body locations). | Demonstrated by the four published peer-reviewed severity validation studies (APASI_2025, AUAS_2023, AIHS4_2023, ASCORAD_2022), which assess algorithm-level concordance with independent expert dermatologist consensus on validated severity scales, and by the AI model verification and validation (V&V) documentation together with algorithm validation against the manufacturer's curated labelled image database. These evidence sources establish accuracy, generalisability and data-quality characteristics of the algorithm independently of user interaction. |
| Clinical Performance | The software produces clinically relevant outputs when used in the real intended-use context by the intended users on the intended patient population. | Demonstrated primarily by the prospective clinical studies (MC_EVCDAO_2019, IDEI_2023, COVIDX_EVCDAO_2022, DAO_Derivación_O_2022, DAO_Derivación_PH_2022): real patients, real clinical settings, real clinical decisions. Further supported, at a lower evidence rank, by the MRMC simulated-use reader studies (BI_2024, PH_2024, SAN_2024, and the Fitzpatrick V–VI MAN_2025 reader study). These studies demonstrate that intended users (HCPs) achieve clinically relevant outputs when using the device on images representative of the intended patient population across the full Fitzpatrick scale (MDCG 2020-1 §4.4). Extended by the protocolled post-market observational study of the equivalent legacy device (R-TF-015-012), which provides quantitative outcomes and professional-opinion data on routine-practice clinical performance across 21 clinical sites. This study is classified primarily at Rank 8, with a supplementary Rank 4 reading for its quantitative endpoints. Confirmed by the legacy device PMS safety data (4+ years of market experience). |
MRMC simulated-use studies are positioned within Pillar 3 Clinical Performance at a lower evidence rank (Rank 11 per MDCG 2020-6 Appendix III) than the Rank 2–4 prospective studies, reflecting that the images were not captured from live patient consultations. The MRMC studies are not "clinical data" under the strict MDR Article 2(48) definition; however, what they measure — intended users achieving clinically relevant outputs when using the device's Top-5 prioritised differential — is Clinical Performance per MDCG 2020-1 §4.4. They corroborate and reinforce the clinical-performance findings of the prospective studies and extend the evidence base to rare-disease subgroups, broader HCP cohorts, and Fitzpatrick-phototype generalisability (MAN_2025); they are not relied upon to independently establish clinical performance claims in isolation from the prospective studies and the post-market observational evidence.
Causal pathway and clinical meaningfulness of the selected endpoints
The device is classified Class IIb under MDR Rule 11, and its clinical benefit is demonstrated through performance-based endpoints (diagnostic accuracy, sensitivity, specificity, AUC) and workflow-related endpoints (referral appropriateness, waiting-time reduction, severity-score-driven treatment decisions) rather than through direct patient-outcome endpoints (e.g., melanoma-specific mortality). Under MDCG 2020-1 §4.4, Pillar 1 Valid Clinical Association must establish that the device's output is clinically associated with the targeted condition, and the expected benefit must be clinically meaningful. This subsection articulates the causal pathway from device output to patient-relevant outcome for each of the three consolidated clinical benefits, and positions the selected surrogate endpoints as clinically meaningful in the peer-reviewed dermatology literature and in regulator-accepted clinical-endpoint history. The literature anchoring this narrative is documented in R-TF-015-011 State of the Art §"Surrogate endpoint validity".
Benefit 7GH: Diagnostic accuracy → stage-at-detection and time-to-treatment → patient-relevant outcome. The causal pathway is: AI-assisted diagnostic accuracy (higher sensitivity and specificity, higher AUC, higher top-k concordance with histopathology) → appropriate biopsy / referral / reassurance decision by the supervising clinician → shift in stage distribution at detection → reduced diagnostic delay and reduced time-to-definitive-surgery → improved patient-relevant outcome. The quantitative magnitude of the stage-to-outcome link is provided by the AJCC 8th-edition evidence base (Gershenwald et al. 2017; across > 46,000 patients, 5-year melanoma-specific survival ranges from 99 % at stage IA to 32 % at stage IIID); the time-to-surgery → overall-survival linkage is provided by the National Cancer Database (NCDB) cohort analysis of 153,218 melanoma patients (Conic et al. 2018; relative to surgery at ≤ 30 days, the adjusted mortality hazard increases by 5 % at 30–59 days and by 41 % at > 119 days). Because the device operates as a clinician-supervised decision-support tool and does not independently trigger clinical actions, the benefit is realised through the supervising clinician's biopsy / excision / referral decision, which the device's output informs. Diagnostic accuracy is therefore a clinically meaningful surrogate for 7GH: its improvement maps, via the AJCC and NCDB anchors, onto reduced melanoma-specific morbidity and mortality.
Benefit 5RB: Severity scoring → treatment decision → disease control and HRQoL. The causal pathway is: objective, reproducible severity score (PASI, EASI, SCORAD, SALT, IGA) → treat-to-target treatment-escalation / de-escalation decision → improved disease control (clinical remission) → improved patient-reported outcomes and health-related quality of life (HRQoL). PASI, EASI and SCORAD are regulator-accepted primary efficacy endpoints, referenced in the EU guideline on the clinical investigation of medicinal products for psoriasis (EMA CHMP/EWP/2454/02, 2004) and in the HOME international consensus for atopic eczema (Schmitt et al. 2014), and SALT is the regulator-accepted primary endpoint for alopecia areata (Olsen et al. 2004 NAAF consensus; King et al. 2022 BRAVE-AA pivotal RCTs). The magnitude of the severity-score → HRQoL linkage is quantified at trial-arm level by the systematic-review r² = 0.80 between PASI % improvement and DLQI change across 13 biologic RCTs (Mattei et al. 2014), and at individual-trial level by the pivotal-trial demonstration that EASI-75 / IGA 0/1 responders show concordant POEM, peak-pruritus NRS and DLQI improvement (Simpson et al. 2016 dupilumab SOLO 1 / 2). The operational treat-to-target rule that converts severity-score change into a treatment decision is codified by the European consensus (Mrowietz et al. 2011): post-induction ΔPASI ≥ 75 % → continue; ΔPASI 50–< 75 % with DLQI ≤ 5 → continue; otherwise (ΔPASI < 50 %, or ΔPASI 50–< 75 % with DLQI > 5) → modify. Objective severity scoring is therefore a clinically meaningful surrogate for 5RB: regulators already accept its response thresholds as the basis for drug approvals, and the literature quantifies the link from severity-score response to patient-reported improvement.
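The treat-to-target rule quoted above can be expressed as a small decision function. The sketch below is a minimal, hypothetical illustration under stated assumptions: ΔPASI is taken as the percentage reduction from the pre-induction baseline, the "otherwise modify" branch is read as covering ΔPASI < 50 % and ΔPASI 50–< 75 % with DLQI > 5, and the names `pasi_improvement`, `treat_to_target` and `Decision` are illustrative only. They are not part of the device or of the cited consensus, and the sketch is not a clinical tool.

```python
from enum import Enum


class Decision(Enum):
    CONTINUE = "continue"
    MODIFY = "modify"


def pasi_improvement(baseline_pasi: float, current_pasi: float) -> float:
    """Percentage PASI improvement from the pre-induction baseline (ΔPASI, %)."""
    if baseline_pasi <= 0:
        raise ValueError("baseline PASI must be positive")
    return 100.0 * (baseline_pasi - current_pasi) / baseline_pasi


def treat_to_target(baseline_pasi: float, current_pasi: float, dlqi: int) -> Decision:
    """Post-induction rule as summarised in this section (hypothetical encoding)."""
    delta = pasi_improvement(baseline_pasi, current_pasi)
    if delta >= 75.0:
        return Decision.CONTINUE
    if delta >= 50.0 and dlqi <= 5:
        return Decision.CONTINUE
    # Assumed reading of "otherwise": ΔPASI < 50 %, or ΔPASI 50–<75 % with DLQI > 5.
    return Decision.MODIFY


print(treat_to_target(20, 8, dlqi=4))  # Decision.CONTINUE
print(treat_to_target(20, 8, dlqi=9))  # Decision.MODIFY
```

For example, a fall from PASI 20 to PASI 8 (ΔPASI 60 %) returns `CONTINUE` when DLQI is 4 and `MODIFY` when DLQI is 9, which is the branch where the HRQoL criterion, not the severity score alone, drives the decision.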
Benefit 3KX: Referral optimisation and care-pathway metrics → access, time-to-treatment, and preserved clinical outcome at lower cost. The causal pathway is: improved referral appropriateness, reduced waiting time, and adequate remote assessment → faster specialist access for patients who genuinely need it, reduced waiting-list burden and fewer unnecessary in-person visits → preserved or improved clinical outcomes at lower cost and with improved equity of access. RCT-level evidence (Whited et al. 2013 VA store-and-forward RCT; Armstrong et al. 2018 psoriasis online-vs-in-person equivalency RCT) demonstrates that teledermatology-enabled care-pathway redesign preserves disease-control outcomes relative to in-person care. Large-scale real-world EU evidence (Moreno-Ramirez et al. 2007; 2,009 teleconsultations across 12 primary-care centres in Seville) demonstrates that 51 % of face-to-face visits were avoided (filtering rate) and that the waiting interval fell from 89 days to 12 days, with preserved melanoma detection sensitivity. Systematic-review evidence (Snoswell et al. 2016) establishes favourable cost-outcome trade-offs across 14 economic evaluations. The waiting-time → time-to-definitive-surgery → patient-outcome linkage for suspected malignancy is closed by the same NCDB evidence cited under 7GH (Conic et al. 2018). Referral optimisation and care-pathway metrics are therefore clinically meaningful surrogates for 3KX: their improvement maps, via the RCT equivalence evidence and the NCDB time-to-surgery linkage, onto preserved or improved per-patient outcome and reduced system-level harm from diagnostic delay.
Cross-cutting position. The three surrogate families are not independent: they constitute sequential and parallel links in a single causal chain from device output to patient benefit. Diagnostic accuracy is the proximal surrogate; severity scoring provides the longitudinal treatment-decision and HRQoL outcome channel; referral optimisation provides the access and system-level outcome channel. Each channel is closed to a patient-relevant outcome by an independently established anchor drawn from the peer-reviewed dermatology literature and regulator-accepted clinical-endpoint history, as documented in R-TF-015-011 State of the Art §"Surrogate endpoint validity". The residual uncertainty (principally the absence of an AI-dermatology RCT with direct mortality or stage shift as a primary endpoint, and the limited evidence of generalisability across Fitzpatrick V–VI populations) is declared in this CER (§"Need for more clinical evidence" and §"Limitations and residual uncertainty") and is addressed through pre-specified post-market clinical follow-up activities under the PMCF Plan (R-TF-007-002), consistent with MDCG 2020-6 §6.5(e).
Per-study evidence appraisal
Each clinical investigation in the evidence portfolio is appraised below for its MDCG 2020-6 Appendix III rank, MDCG 2020-1 pillar contribution, methodological quality, key results, and limitations. Studies are presented in order of evidence strength, from primary clinical evidence to supporting and corroborating evidence.
The appraisal methodology follows MEDDEV 2.7/1 Rev 4 Section 9 (Stage 2): each dataset is assessed for methodological quality (study design, sample size, endpoints, controls, ethics compliance, report quality), relevance (pivotal vs. supporting), and weighting. The appraisal criteria applied are consistent with IMDRF MDCE WG/N56FINAL:2019, as specified in the Clinical Evaluation Plan (R-TF-015-001).
Primary clinical evidence (Rank 2: high-quality clinical investigations with coverage gaps)
These studies are protocol-driven, ethics-committee-approved, publicly registered prospective clinical investigations conducted with real patients in real clinical settings. They constitute Rank 2 evidence: high-quality clinical investigations with some gaps in coverage (not all indications, not all populations, not all skin phototypes equally represented). These gaps are justified by the tiered evidence structure and risk assessment (see below) and addressed by PMCF activities (see R-TF-007-002).
MC_EVCDAO_2019: Prospective analytical observational study
- Title:
- Clinical validation study of a CAD system with artificial intelligence algorithms for early noninvasive in vivo cutaneous melanoma detection
- Investigational site(s):
- Hospital Universitario Cruces and Hospital Universitario Basurto
- Principal investigator(s):
- Dr. Jesús Gardeazabal García and Dr. Rosa Mª Izu Belloso
- Sample size:
- 105
- Study period:
- February 2020 – November 2023
- Ethics committee:
- Comité de Ética de la Investigación con Medicamentos de Euskadi
- Device under investigation:
- Legit.Health Legacy Device
- Publication status:
- Expected to be submitted in May 2025 to JEADV Clinical Practice (Journal of the European Academy of Dermatology and Venereology Clinical Practice)
- Sample composition: 36 melanoma and 69 other lesions (BCC, actinic keratosis, nevi, and benign lesions).
- Registration and ethics reference: ClinicalTrials.gov NCT06221397; EMA RWD Catalogue EUPAS108254; CEIm Euskadi reference PI2019216 (approved 2022-01-13).
- Primary endpoint and results: Melanoma detection AUC 0.8482 (95% CI: 0.7629-0.9222). Top-1 sensitivity 73.79%; Top-1 specificity 80.54%. Top-3 sensitivity 90.32%.
- Secondary results: Malignancy prediction AUC 0.8983 (95% CI: 0.8430-0.9438). Performance comparable to expert dermatologists.
- Acceptance criteria status: Met. As the only study in the portfolio specifically designed to assess melanoma detection, this result (AUC 0.8482, expressed as 0.85 rounded; 95% CI: 0.7629–0.9222) constitutes the global device AUC for melanoma. The result meets the SotA-derived acceptance criterion (AUC ≥ 0.81) and the study's own pre-specified design criterion (AUC ≥ 0.80).
- Limitations: Originally planned for 200 patients; the study concluded at 105 because melanoma prevalence in the enrolled population (34.3%) exceeded the target ratio (20%), providing sufficient statistical power for the primary endpoint without additional recruitment. Limited Fitzpatrick skin type diversity (87.1% Type I, 9.8% Type II). Primarily dermatoscopic images (DermLite Foto X with smartphones). Low-quality images (DIQA score < 5) were excluded, which mirrors the device's real-world behaviour: the device itself rejects images below this quality threshold in clinical use.
- MDCG 2020-1 pillar: Clinical Performance: real patients in a real clinical dermatology setting.
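The Top-k figures above depend on how a ranked differential is scored against histopathology. A minimal sketch follows, assuming that a case counts as a Top-k hit when the confirmed label appears among the first k entries of the device's ranked output, and that Top-1 specificity counts non-target cases whose first-ranked entry is not the target. The function names and the toy labels are hypothetical and do not reproduce the study's analysis code.

```python
from typing import Sequence


def top_k_sensitivity(
    ranked: Sequence[Sequence[str]], truth: Sequence[str], target: str, k: int
) -> float:
    """Share of confirmed `target` cases whose label appears in the first k ranked outputs."""
    positives = [preds for preds, label in zip(ranked, truth) if label == target]
    if not positives:
        raise ValueError("no confirmed cases of the target condition")
    hits = sum(1 for preds in positives if target in preds[:k])
    return hits / len(positives)


def top_1_specificity(ranked: Sequence[Sequence[str]], truth: Sequence[str], target: str) -> float:
    """Share of non-target cases whose first-ranked output is not the target."""
    negatives = [preds for preds, label in zip(ranked, truth) if label != target]
    if not negatives:
        raise ValueError("no non-target cases")
    true_negatives = sum(1 for preds in negatives if not preds or preds[0] != target)
    return true_negatives / len(negatives)


# Hypothetical ranked differentials and histopathology labels (not study data).
ranked = [
    ["melanoma", "nevus"],
    ["nevus", "melanoma"],
    ["bcc", "nevus"],
    ["nevus", "melanoma"],
    ["melanoma", "nevus"],
]
truth = ["melanoma", "melanoma", "melanoma", "nevus", "nevus"]
print(round(top_k_sensitivity(ranked, truth, "melanoma", k=1), 3))  # 0.333
print(round(top_k_sensitivity(ranked, truth, "melanoma", k=2), 3))  # 0.667
print(top_1_specificity(ranked, truth, "melanoma"))                 # 0.5
```

Under this definition, widening k can only keep or raise sensitivity, which is consistent with the reported Top-3 sensitivity exceeding the Top-1 value.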
MC_EVCDAO_2019 is the cornerstone of the Tier 1 malignancy-detection evidence and the study most directly addressing coverage of malignant conditions. As the only study in the portfolio designed specifically to validate melanoma detection, in a controlled specialist setting, it provides the primary AUC estimate (0.8482, reported as 0.85; 95% CI: 0.7629–0.9222), which therefore constitutes the global device AUC for melanoma detection. This point estimate meets both the SotA-derived acceptance criterion of AUC ≥ 0.81 established in the acceptance criteria derivation table and the study's own pre-specified design criterion of AUC ≥ 0.80. Early closure at 105 of the planned 200 patients was scientifically justified: the enrolled cohort exceeded the target melanoma prevalence (34.3% vs. the planned 20%), preserving statistical power for the primary endpoint without additional patient burden. The predominance of dermatoscopic images (DermLite Foto X with current-generation smartphones) reflects the specialist workflow for which Tier 1 evidence is intended. The QUADAS-2 assessment (detailed in "Validated methodological quality appraisal" below) identifies HIGH risk of bias in the Patient Selection and Flow and Timing domains, attributable to the specialist-enriched setting and to the enrolment shortfall relative to the planned sample size rather than to flaws in protocol execution.
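The AUC acceptance check can be illustrated with the Mann–Whitney formulation, in which the empirical ROC AUC equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The sketch below is a hypothetical illustration on toy scores, not the study's statistical pipeline, which is not described in this section. It compares the point estimate only against the 0.81 and 0.80 thresholds quoted above; reproducing the reported 95% CI would need a separate method (for example bootstrapping), which is not shown.

```python
from typing import Sequence


def auc_mann_whitney(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Empirical ROC AUC: share of (positive, negative) pairs ranked correctly, ties = 0.5."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def meets_auc_criterion(auc: float, threshold: float) -> bool:
    """Point-estimate comparison against a pre-specified acceptance threshold."""
    return auc >= threshold


# Toy data (hypothetical): 1 = histopathology-confirmed melanoma, 0 = other lesion.
toy_auc = auc_mann_whitney([0.9, 0.8, 0.4, 0.3, 0.8], [1, 1, 1, 0, 0])
print(toy_auc)  # 0.75

# Reported MC_EVCDAO_2019 point estimate against the SotA and design thresholds.
print(meets_auc_criterion(0.8482, 0.81), meets_auc_criterion(0.8482, 0.80))  # True True
```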
COVIDX_EVCDAO_2022: Prospective observational longitudinal study with remote monitoring

- Title:
- Clinical Validation of a Computer-Aided Diagnosis (CAD) System Utilizing Artificial Intelligence Algorithms for Continuous and Remote Monitoring of Patient Condition Severity in an Objective and Stable Manner
- Investigational site(s):
- Torrejón University Hospital
- Principal investigator(s):
- Dr. Marta Andreu
- Sample size:
- 160
- Study period:
- April 2022 – October 2023
- Ethics committee:
- Comité de Ética de la Investigación con Medicamentos de los Hospitales Universitarios Torrevieja y Elche-Vinalopó
- Sponsor:
- AI Labs Group S.L.
- Device under investigation:
- Legit.Health Plus
- Publication status:
- Not yet submitted for publication
- Study structure: 160 patients monitored by 6 dermatologists over a 6-month follow-up period.
- Registration and ethics reference: ClinicalTrials.gov NCT06237036; EMA RWD Catalogue EUPAS108260; CEIm Torrevieja-Elche reference 12/04/22 LEGIT_COVIDX (approved 2022-04-13).
- Primary endpoint and results: Pre-specified target Clinical Utility Score (CUS) ≥ 8 on a 0–10 scale; achieved CUS 7.66 (range across respondents reported in the CIR). Remote monitoring workflow validated over the 6-month follow-up: longitudinal monitoring rate, reduction in face-to-face consultations and physician-perceived monitoring reliability all met their pre-specified secondary thresholds.
- Acceptance criteria status: Primary endpoint not met (observed CUS 7.66 vs. pre-specified ≥ 8). Secondary remote-monitoring endpoints met. The shortfall against the primary threshold is attributable to a single low-scoring outlier within a small dermatologist cohort (n = 6); a sensitivity analysis excluding the outlier yields CUS > 8. The shortfall does not invalidate the secondary remote-monitoring evidence.
- Limitations: Small dermatologist cohort (n = 6) makes the primary endpoint sensitive to outliers. Image-quality variability from patient-taken smartphone images (representative of real-world remote use conditions). Primary purpose was validation of the remote-monitoring workflow rather than standalone diagnostic accuracy.
- MDCG 2020-1 pillar: Clinical Performance — real patients in a real longitudinal clinical follow-up setting, validating the device's use in the remote-care pathway. The contribution to Pillar 3 is the validated remote-monitoring secondary evidence, not the primary CUS endpoint.
COVIDX_EVCDAO_2022 addresses a distinct clinical question from the diagnostic-accuracy studies: whether the device delivers sustained clinical utility across a six-month longitudinal remote-monitoring pathway. Conducted at Torrejón University Hospital in Madrid with 160 enrolled patients and six dermatologists, the study did not meet the pre-specified primary Clinical Utility Score (CUS) acceptance criterion of ≥ 8 (observed CUS 7.66). The shortfall is explained by a single low-scoring outlier within the small dermatologist cohort (n = 6); a pre-specified sensitivity analysis excluding the outlier yields CUS > 8. The secondary remote-monitoring endpoints (longitudinal monitoring rate, face-to-face consultation reduction, perceived monitoring reliability) were met and constitute this study's contribution to Benefit 3KX(c) "remote care", consistent with the per-benefit acceptance criteria documented in the Benefit 3KX section below. This is the only study in the portfolio that validates sustained utility in a telehealth model over time, which makes it the principal prospective evidence for the remote-care sub-criterion. The MINORS score of 13/16 reflects limitations intrinsic to clinical-utility measurement: the self-reported outcome scale introduces Hawthorne and social-desirability effects (item 5), pre-enrolment screening selected against patients with rapidly changing disease (item 2), and the CIR does not explicitly quantify post-enrolment completion rates (item 7). The unmet primary CUS endpoint is documented transparently in this CER and in the Acceptance Criteria status table, together with the small-cohort outlier explanation. The device's overall benefit-risk conclusion does not depend on the primary CUS endpoint of this single study, because Benefit 3KX(c) "remote care" is supported by the secondary remote-monitoring endpoints met in COVIDX_EVCDAO_2022 and by the post-market R-TF-015-012 quantitative endpoints D6 (remote assessment adequacy) and D7 (remote volume increase).
DAO_Derivación_O_2022: Prospective observational longitudinal analytical study (real-world referral pathway)
- Title:
- Pilot study for the clinical validation of an artificial intelligence algorithm to optimize the appropriateness of dermatology referrals.
- Investigational site(s):
- Health Centre Sodupe-Güeñes, Health Centre Balmaseda, Health Centre Buruaga, and Health Centre Zurbaran
- Principal investigator(s):
- Dr. Jesús Gardeazabal García and Dr. Rosa Mª Izu Belloso
- Sample size:
- 117
- Study period:
- November 2022 – April 2025
- Ethics committee:
- Comité de Ética de la Investigación con Medicamentos de Euskadi
- Sponsor:
- AI Labs Group S.L.
- Device under investigation:
- Legit.Health Plus
- Publication status:
- Not yet submitted for publication
- Sample composition: 127 patients enrolled (117 analysed after exclusions); 198 images (184 after quality refinement). Setting: primary care to dermatology referral pathway across the Cruces and Basurto University Hospital networks (Bilbao, Spain).
- Registration and ethics reference: ClinicalTrials.gov NCT06228014; EMA RWD Catalogue EUPAS108167; CEIm Euskadi reference PS2022074 (approved 2022-11-23).
- Primary endpoint and results: Malignancy detection AUC 0.82. Referral optimisation demonstrated: increased appropriate referrals vs primary care baseline. Sensitivity 74.2%, specificity 67.3% for referral detection. Relative increase in adequacy of referrals of +38% (acceptance threshold ≥ +15%).
- Acceptance criteria status: Met: device superiority in referral optimisation demonstrated despite reduced sample size.
- Limitations: Under-recruited (49.5% of planned sample) due to clinical workload constraints at participating sites. Image quality variability.
- MDCG 2020-1 pillar: Clinical Performance: real patients in the real-world primary care to dermatology referral workflow. This study is the closest to the device's intended clinical deployment context, directly addressing the question of whether the device improves clinical decision-making when integrated into the standard referral pathway.
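The referral-adequacy result is expressed as a relative increase compared against a pre-specified threshold. A minimal sketch follows, assuming the relative increase is computed as (adequacy with device − adequacy at baseline) / adequacy at baseline on proportions of appropriate referrals. The CIR may define the metric differently, the function names are hypothetical, and the input proportions are invented solely to show the arithmetic behind a +38% figure checked against the ≥ +15% threshold.

```python
def relative_increase(baseline: float, with_device: float) -> float:
    """Relative change of a proportion versus its baseline, e.g. 0.38 for +38 %."""
    if baseline <= 0:
        raise ValueError("baseline proportion must be positive")
    return (with_device - baseline) / baseline


def meets_threshold(increase: float, threshold: float = 0.15) -> bool:
    """Compare against the >= +15 % acceptance threshold stated for this study."""
    return increase >= threshold


# Hypothetical proportions of appropriate referrals (not study data).
inc = relative_increase(baseline=0.50, with_device=0.69)
print(round(inc, 2), meets_threshold(inc))  # 0.38 True
```

The distinction matters when reading the result: in this invented example, moving from 50% to 69% adequate referrals is a +19 percentage-point change but a +38% relative increase.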
DAO_Derivación_O_2022 is the most contextually relevant study in the portfolio: a prospective, comparative, real-world assessment of device-assisted versus unassisted referral decision-making, conducted within the established primary care to dermatology referral pathway of the Cruces and Basurto University Hospital networks. It provides the primary evidence for Benefit 3KX, demonstrating that device-assisted primary care referrals achieve a malignancy detection AUC of 0.82 and a +38% relative increase in referral adequacy over the unassisted primary care baseline. Despite under-recruitment relative to the planned sample, attributed to clinical workload constraints at the participating sites, the achieved cohort of 117 patients delivered statistically meaningful comparative evidence. The MINORS score of 22/24, the highest in the prospective clinical study group, reflects a well-designed comparative study with objective, system-derived endpoints; the two-point deduction (items 7 and 8) reflects the enrolment shortfall relative to the power calculation.
Supporting clinical evidence (Rank 4: methodological limitations, data quantifiable)
These studies are prospective, ethics-approved, and publicly registered, but have methodological limitations that reduce the evaluability of one or more endpoints. The quantitative performance data remains extractable and clinically meaningful.
IDEI_2023: Prospective observational with retrospective case series

- Title:
- Optimisation of clinical flow in patients with dermatological conditions using Artificial Intelligence
- Investigational site(s):
- Instituto de Dermatología Integral (IDEI)
- Principal investigator(s):
- Dr. Miguel Sánchez Viera
- Sample size:
- 204
- Study period:
- January 2024 – August 2024
- Ethics committee:
- Comité de Ética de la Investigación con Medicamentos de HM Hospitales (Reference: 24.12.2266-GHM)
- Sponsor:
- AI Labs Group S.L.
- Device under investigation:
- Legit.Health Plus
- Publication status:
- Not yet submitted for publication
- Sample composition: 108 pigmented lesions and 96 androgenetic alopecia cases (76 retrospective + 32 prospective lesion evaluations; 62 retrospective + 34 prospective alopecia evaluations).
- Registration and ethics reference: ClinicalTrials.gov NCT05656709; EMA RWD Catalogue EUPAS1000000045; CEIm HM Hospitals reference 24.12.2266-GHM (approved 2024-01-25). Preprint DOI: 10.1101/2025.03.11.25323753.
- Primary endpoint and results: Retrospective malignancy AUC 0.7738 (95% CI: 0.6345-0.8908); sensitivity 58.44%; specificity 77.92%. Prospective malignancy AUC 0.9430 (95% CI: 0.8132-1.0000); sensitivity 97.06%; specificity 97.06%. Alopecia Ludwig-scale agreement: Cohen's kappa 0.53 overall, improving from 0.33 in the retrospective arm to 0.77 in the prospective arm.
- Acceptance criteria status: Met: high diagnostic accuracy achieved, particularly in the prospective arm.
- Methodological limitation: Mixed design; the retrospective component yields lower performance than the prospective component, consistent with differences in image quality between retrospective clinical photographs and prospective protocol-controlled image acquisition. The discrepancy between retrospective (AUC 0.7738) and prospective (AUC 0.9430) results highlights the impact of image acquisition conditions on device performance, which is informative for understanding real-world deployment variability.
- MDCG 2020-1 pillar: Clinical Performance: real patients, though the retrospective arm is less representative of intended use than the prospective arm.
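Agreement on the Ludwig scale is reported as Cohen's kappa, which corrects raw agreement for the agreement expected by chance from each rater's marginal distribution: κ = (p_o − p_e) / (1 − p_e). The sketch below computes unweighted kappa for two raters (for example, device grade versus reference grade). This section does not state whether the study used unweighted or weighted kappa for the ordinal Ludwig grades, so the unweighted form is an assumption, and the example grades are hypothetical.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """Unweighted Cohen's kappa for two raters grading the same cases."""
    if not rater_a or len(rater_a) != len(rater_b):
        raise ValueError("ratings must be non-empty and paired case by case")
    n = len(rater_a)
    # Observed agreement p_o: share of cases given the same grade by both raters.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement p_e from each rater's marginal grade frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    categories = set(counts_a) | set(counts_b)
    p_e = sum(counts_a[c] * counts_b[c] for c in categories) / (n * n)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_o - p_e) / (1.0 - p_e)


# Hypothetical Ludwig grades (device vs reference), not study data.
device = ["I", "I", "II", "II", "III", "III"]
reference = ["I", "II", "II", "II", "III", "I"]
print(round(cohens_kappa(device, reference), 2))  # 0.5
```

In this toy example raw agreement is 4/6 (about 0.67), but a third of that would be expected by chance alone, so kappa falls to 0.5; this is why kappa values such as 0.33 and 0.77 are lower than the corresponding raw agreement rates would be.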
IDEI_2023 extends the evidence portfolio to two separate clinical applications: malignant pigmented lesion detection at a private dermatology clinic and automated alopecia grading using the Ludwig scale. With 204 patients across both indications, it is the largest study in the Rank 4 supporting tier. The study's mixed prospective/retrospective design is its defining feature: the retrospective arm (cases from existing clinic records, where protocol-controlled image acquisition was not applied) yields a lower malignancy detection AUC (0.7738) than the prospective arm (0.9430). This discrepancy is informative rather than concerning: it quantifies the performance gap attributable to image acquisition conditions, supporting the clinical rationale for the built-in DIQA algorithm that the device uses in deployment to reject images below the minimum quality threshold. The QUADAS-2 assessment (detailed in "Validated methodological quality appraisal" below) identifies the 33% post-enrolment exclusion in the prospective pigmented-lesion arm (patients for whom biopsy confirmation was unavailable) as the primary source of HIGH risk of bias in the Patient Selection and Flow and Timing domains. Despite this limitation, the prospective-arm AUC of 0.9430 remains the primary evidence contribution from this study, and the alopecia severity agreement (overall Cohen's kappa 0.53 across both datasets; retrospective 0.33, prospective 0.77) provides supporting evidence for the Ludwig-scale sub-criterion.
DAO_Derivación_PH_2022: Prospective analytical observational study
- Title:
- Project to enhance Dermatology E-Consultations in Primary Care Centres using Artificial Intelligence Tools
- Investigational site(s):
- Pozuelo and Majadahonda Health Centers and Puerta de Hierro Majadahonda University Hospital
- Principal investigator(s):
- Dr. Gastón Roustan Gullón
- Sample size:
- 131
- Study period:
- June 2022 – January 2024
- Ethics committee:
- Comité de Ética de la Investigación con Medicamentos del Hospital Universitario Puerta de Hierro Majadahonda
- Sponsor:
- Instituto de Investigación Sanitaria Puerta de Hierro
- Device under investigation:
- Legit.Health Plus
- Publication status:
- Not yet submitted for publication
- Sample composition: 131 patients (exceeded the 100-patient target); 180 diagnostic reports; 15 participating healthcare professionals. Setting: primary care to dermatology referral pathway at Hospital Universitario Puerta de Hierro (Madrid, Spain).
- Registration and ethics reference: ClinicalTrials.gov NCT07429123; EMA RWD Catalogue EUPAS108166; CEIm Puerta de Hierro reference 47/395984.9/22 (approved 2022-06-24).
- Primary endpoint and results: Malignancy AUC 0.842 (acceptance threshold ≥ 0.80). HCP satisfaction score 7.6/10; recommendation level 7.7/10. Relative increase in adequacy of referrals of +25% (acceptance threshold ≥ +15%).
- Acceptance criteria status: Partially met: the primary objective (diagnostic improvement attributable to device use, measured as baseline vs with-device) could not be evaluated due to the protocol deviation described below. Secondary objectives (malignancy detection, referral reduction, HCP satisfaction) were met.
- Methodological limitation: Major protocol deviation. Many healthcare professionals used the device from the outset without first recording an unaided baseline diagnosis. This prevented assessment of the primary objective (improvement attributable to device use) and left substantial missing data for PCPs' unaided diagnoses. However, the study still provides quantifiable malignancy detection metrics (AUC 0.84, melanoma specificity 91%) and real-world performance data from the referral pathway.
- MDCG 2020-1 pillar: Clinical Performance: real patients in a real referral pathway, but the protocol deviation compromises the primary comparative endpoint. The malignancy detection metrics remain valid as standalone performance measures.
DAO_Derivación_PH_2022 is the second real-world referral pathway study, conducted at Hospital Universitario Puerta de Hierro with 131 patients and 15 healthcare professionals — exceeding the planned 100-patient target. A documented protocol deviation compromised the primary comparative objective: many participating healthcare professionals used the device from the outset without first recording their unaided baseline assessment, preventing the planned measurement of attributable diagnostic improvement. Despite this, the study delivers valid secondary evidence: a malignancy detection AUC of 0.84, melanoma-specific specificity of 91%, and HCP satisfaction and recommendation scores (7.6/10 and 7.7/10 respectively), all obtained from a real-world referral pathway setting. The MINORS score of 21/24 correctly reflects the protocol deviation's impact on the unbiased endpoint assessment (item 5, one point deducted) and the adequacy of the without-device control condition (item 9, one point deducted); the remaining ten items score at or near maximum, confirming that the secondary performance metrics are methodologically sound and unaffected by the deviation.
R-TF-015-012: Post-market cross-sectional observational study of the equivalent legacy device
This study sits at the study-specific tier of the legacy post-market documentation hierarchy. Its Protocol (R-TF-015-012) is nested inside the legacy umbrella PMS Plan (R-TF-007-005); its Report (Appendix D to R-TF-015-012) is nested inside the legacy umbrella PMS Report (R-TF-007-003). The summary below presents the study as a source of clinical evidence for this CER under MDCG 2020-6 §6.2.2.
- Setting: Post-market surveillance of the equivalent legacy device across all 21 client institutions. Multi-site, physician-reported outcomes.
- Sample: 60 responses collected across all 21 client institutions; analysis set N = 56 (34 dermatologists, 13 primary care physicians, 9 hospital managers) after four responses were excluded under the protocol's Section 10.7 evidence-quality substantiation principle (unsubstantiated safety flags — F1 = Yes without any description in F1a). The Section 10.7 principle extends the Section 8.4 evidence-quality substantiation principle (already specified for quantitative endpoints) symmetrically to Section F binary-with-free-text safety items. The N = 56 analysis set still exceeds the protocol's stretch target of 45.
- Design: Cross-sectional observational study with retrospective recall. Data collection instrument: structured bilingual (EN/ES) physician questionnaire with 41 items across demographics, three benefit sections (B/C/D), overall assessment (E), and safety (F). Collection window: 23 March 2026 to 13 April 2026 (three weeks). Administered via a closed electronic survey platform. Formal study Protocol with pre-specified objectives, endpoints, MCID thresholds, SotA comparators, Holm-Bonferroni multiplicity correction for three co-primary endpoints, a pre-specified sensitivity analysis, and a pre-specified evidence-quality substantiation principle for quantitative endpoints (Section 8.4, extended symmetrically to Section F safety items by Section 10.7 of the same Protocol).
- Regulatory basis: MDR Article 83 (proactive PMS); MDR Article 85 (applicable via MDR Article 120(3) to the legacy MDD Class I device); MDCG 2020-6 §6.2.2 and §6.5(e).
- Primary endpoints and results: Three co-primary endpoints, each tested one-sided against a pre-specified MCID with Holm-Bonferroni correction at family-wise α = 0.05. All three are significant after Holm-Bonferroni adjustment: B2 (diagnostic assessment change rate) mean 18.77% (MCID 5%); C4 (treatment decisions informed by severity scores) mean 36.23/yr (MCID 10/yr); D4 (referral adequacy improvement) mean 15.56% (MCID 5%).
- Supportive endpoints: Six pre-specified supportive quantitative endpoints. All exceed their MCIDs: B4 rare disease identification 7.30/yr (MCID 3/yr); B6 malignancy detection 14.68/yr (MCID 5/yr); C5 longitudinal monitoring 30.53% (MCID 5%); D2 waiting time reduction 14.53% (MCID 5%); D6 remote assessment adequacy 47.76% (MCID 5%); D7 remote volume increase 24.64% (MCID 5%).
- Sensitivity analysis: Aggregate records-consulted proportion 36.0% (target ≥ 30%). Record-consulted and estimate-based subgroups are broadly consistent with no systematic divergence; supports data robustness.
- Safety (questionnaire Section F): F1 (misleading output) 26.8% (15 / 56 in the analysis set) — below the pre-specified 30% follow-up threshold, so the protocol-specified F1 follow-up is not triggered; the thematic review of the 15 substantiated F1 = Yes responses shows incidents consistent with the device's known edge-case limitations already documented in the risk management file, and F4 cross-referenced against the R-006-002 registry confirms no unreported serious incident. Prior to the application of the protocol Section 10.7 evidence-quality substantiation principle (see "Sample" above), the F1 proportion was 19 / 60 (31.7%), marginally above the threshold; the drop reflects the removal of four unsubstantiated F1 = Yes responses and is documented in the companion study report. F2 (usability issues) 30.4%. F3 (overall perceived safety) mean 4.14. F4 (formal incident reporting) 7.1%.
- Acceptance criteria status: Met. All three declared clinical benefits are confirmed in routine clinical practice: B2/C4/D4 co-primaries exceed MCID with Holm-adjusted significance; six supportive endpoints exceed their MCIDs.
- Methodological limitations (acknowledged): Physician-reported perceived outcomes (not independently measured patient outcomes); potential recall bias, mitigated by the sensitivity analysis; non-randomised cross-sectional design with retrospective recall, mitigated by published SotA comparators; selection bias intrinsic to PMS (only institutions with active device use — mitigated by convergent findings across 21 independent sites).
- MDCG 2020-1 pillar: Clinical Performance. The study observes the device's real intended-use context (the respondent physicians' routine clinical practice), its intended users (dermatologists, primary care physicians, hospital managers), and its intended population (dermatology patients being managed with the device in routine care).
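The F1 threshold logic in the Safety bullet comes down to a proportion check before and after evidence-quality substantiation. The sketch below is minimal. It assumes that follow-up is triggered when the proportion reaches or exceeds 30%; the protocol's exact boundary rule is not reproduced here. The counts are the ones quoted in the Safety bullet, and the function names are hypothetical.

```python
# Pre-specified F1 (misleading output) follow-up threshold quoted in the text.
F1_FOLLOW_UP_THRESHOLD = 0.30


def f1_proportion(yes_count: int, analysis_set: int) -> float:
    """Proportion of F1 = Yes responses within the analysis set."""
    if analysis_set <= 0 or not 0 <= yes_count <= analysis_set:
        raise ValueError("require analysis_set > 0 and 0 <= yes_count <= analysis_set")
    return yes_count / analysis_set


def f1_follow_up_triggered(yes_count: int, analysis_set: int) -> bool:
    # Assumption: the follow-up triggers at or above the threshold.
    return f1_proportion(yes_count, analysis_set) >= F1_FOLLOW_UP_THRESHOLD


# Before substantiation: 19 / 60. After removing 4 unsubstantiated F1 = Yes responses: 15 / 56.
for label, yes, total in [("before", 19, 60), ("after", 15, 56)]:
    share = f1_proportion(yes, total)
    print(f"{label}: {share:.1%}, follow-up triggered: {f1_follow_up_triggered(yes, total)}")
```

Removing four responses from both the numerator and the denominator moves the proportion from just above the threshold to below it. This is why both figures are documented.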
R-TF-015-012 is the post-market dimension of the evidence portfolio. Under MDCG 2020-6 Appendix III, the study's primary classification is Rank 8 (proactive post-market surveillance data), applied to both the Likert professional-opinion items (B1, B3, B5, C1–C3, D1, D3, D5, E1, F3) and the quantitative endpoints (three co-primary endpoints B2, C4, D4 and six supportive endpoints B4, B6, C5, D2, D6, D7). This is the conservative reading of a cross-sectional physician-recall design and aligns with the notified-body default position that physician-recall survey data belongs at Rank 8. A supplementary Rank 4 case is retained for the quantitative endpoints under the Appendix III "high quality surveys may also fall into this category" note, justified by the study's methodological rigour — pre-specified under a formal PMS Study Protocol, analysed against pre-specified MCIDs derived from published SotA, subject to Holm-Bonferroni multiplicity correction for the three co-primary endpoints, supported by a pre-specified data-source sensitivity analysis and by safety surveillance (Section F), and reported alongside transparently acknowledged methodological limitations — but this is a supplementary reading, not the lead classification. The Pillar 3 sufficiency determination of this CER does not depend on the supplementary Rank 4 reading: Route C Ranks 2–4 prospective real-patient evidence and Rank 11 MRMC supporting evidence close the three-pillar chain without R-TF-015-012 at Rank 4. Detailed methodology and full statistical output are documented in the companion Study Report annexed to R-TF-015-012 (Appendix D); the study's role within the broader legacy-device PMS programme is consolidated in the legacy umbrella PMS Report (R-TF-007-003), which is itself the Report paired with the legacy umbrella PMS Plan (R-TF-007-005).
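The Holm-Bonferroni correction referred to above is a step-down procedure that controls the family-wise error rate across the three co-primary endpoints. The sketch below implements the standard Holm adjustment. The p-values in the usage lines are hypothetical placeholders, not the study's results, which are reported in the companion Study Report.

```python
from typing import Dict


def holm_adjust(p_values: Dict[str, float]) -> Dict[str, float]:
    """Holm step-down adjusted p-values for a family of hypotheses."""
    m = len(p_values)
    ordered = sorted(p_values.items(), key=lambda item: item[1])
    adjusted: Dict[str, float] = {}
    running_max = 0.0
    for rank, (name, p) in enumerate(ordered):
        # The (rank + 1)-th smallest p-value is multiplied by (m - rank), capped at 1,
        # and kept non-decreasing so that the step-down order is respected.
        running_max = max(running_max, min(1.0, (m - rank) * p))
        adjusted[name] = running_max
    return adjusted


def holm_reject(p_values: Dict[str, float], alpha: float = 0.05) -> Dict[str, bool]:
    """True where an endpoint remains significant after Holm adjustment."""
    return {name: p_adj <= alpha for name, p_adj in holm_adjust(p_values).items()}


# Hypothetical raw p-values for the co-primary endpoints B2, C4 and D4.
raw = {"B2": 0.004, "C4": 0.010, "D4": 0.020}
print(holm_adjust(raw))
print(holm_reject(raw))
```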
Clinical performance evidence from simulated-use reader studies (Rank 11: MRMC with healthcare professionals)
Per MDCG 2020-6 Appendix III, multi-reader multi-case (MRMC) studies using clinical images in a simulated assessment environment constitute "simulated use testing with healthcare professionals" (Rank 11). Although the images were not captured from live patient consultations, the studies do demonstrate — per MDCG 2020-1 §4.4 — that the intended users (dermatologists and primary care physicians) achieve clinically relevant outputs (diagnostic decisions) through predictable and reliable use of the device's Top-5 prioritised differential across the intended patient population and condition spectrum. On that basis, the MRMC studies contribute Clinical Performance evidence to Pillar 3 of the MDCG 2020-1 three-pillar framework. They are positioned below the prospective real-patient studies in the MDCG 2020-6 evidence hierarchy (Rank 11 vs. Ranks 2–4), reflecting the absence of live patient data, but they remain Pillar 3 evidence because what they measure is the clinical impact of device use by intended users.
The MRMC studies are presented as supporting Pillar 3 evidence that reinforces and extends the findings of the prospective studies. Their value lies in three contributions. They (a) demonstrate consistency of device benefit across more healthcare professionals and conditions than any single prospective study. They (b) enable a controlled comparison between primary care physicians and dermatologists. They (c) provide condition-specific performance data for rare diseases, whose very low prevalence in the general dermatology population makes adequately sized prospective recruitment impractical.
BI_2024: Prospective MRMC observational study

- Title:
- Multi-Reader Multi-Case Study for Assessing the Impact of Legit.Health Plus on the Clinical Assessment of Generalised Pustular Psoriasis and Other Skin Conditions by Healthcare Professionals.
- Investigational site(s):
- This study was conducted remotely by sending the images to the participating dermatologists.
- Principal investigator(s):
- Dr. Antonio Martorell Calatayud
- Sample size:
- 100
- Study period:
- June 2024 – September 2024
- Ethics committee:
- This study did not require Ethics Committee approval because it is observational and non-interventional. All data used consists of fully anonymized images sourced from public dermatology atlases and databases, containing no information permitting patient identification. As such, the research meets the criteria for exemption from ethics committee review under applicable regulatory frameworks.
- Sponsor:
- Boehringer Ingelheim
- Device under investigation:
- Legit.Health Plus
- Publication status:
- Not yet submitted for publication
- Unit of analysis: 15 healthcare professionals (dermatologists and primary care physicians) × 100 images = 1,500 evaluations. Self-controlled (without-device vs with-device arms per reader).
- Registration and ethics basis: ClinicalTrials.gov NCT07428915; EMA RWD Catalogue EUPAS1000000910. Ethics exemption documented in R-TF-015-011. Published in JMIR Dermatology.
- Primary endpoint and results: Overall diagnostic accuracy improvement +15.12 percentage points (47.94% → 63.06%). Sensitivity +18.43%. Specificity +19.38%. Primary care physicians: +17% improvement. Dermatologists: +8.39% improvement.
- Condition-specific results (rare diseases, Tier 2): Generalised Pustular Psoriasis accuracy +26.77 percentage points (25.56% → 57.88%). Hidradenitis suppurativa accuracy +24.14%. These results provide the evidence basis for the Tier 2 rare disease acceptance criteria.
- Acceptance criteria status: Met overall: significant improvement demonstrated across the combined evaluation set. Individual pathology-level significance varies due to subgroup sample sizes.
- Limitations: Simulated use environment (images, not real patients). Subgroup analyses underpowered for individual rare conditions (< 20 observations per condition in some subgroups).
- MDCG 2020-1 pillar: Clinical Performance (simulated-use MRMC with intended users on representative images of the intended patient population, per MDCG 2020-1 §4.4). Also provides Tier 2 rare disease evidence (sub-criterion (b) of benefit 7GH).
BI_2024 is the largest of the three 2024 MRMC studies. It is the principal Rank 11 Pillar 3 §4.4 supporting evidence source for Tier 2 rare-disease performance claims. In the study, 15 healthcare professionals evaluated 100 standardised images, with a pre-specified focus on Generalised Pustular Psoriasis and Hidradenitis Suppurativa. It demonstrated large and consistent improvements in diagnostic accuracy for rare conditions (GPP +26.77 percentage points, HS +24.14 percentage points). These results build on the Pillar 1 literature and the Pillar 2 algorithm performance evidence for the Tier 2 rare-disease subgroup. They are not "clinical data" under MDR Article 2(48), and are therefore not the primary Pillar 3 evidence for a rare-disease diagnostic claim. Prospective real-patient recruitment at sufficient volume is impractical for these very-low-prevalence conditions. For this reason, the PMCF programme pre-specifies post-market real-patient data collection to confirm (not fill) the pre-market evidence base per MDCG 2020-6 §6.3. The overall accuracy improvement of +15.12 percentage points across both primary care physicians and dermatologists also contributes Tier 3 breadth evidence. The MINORS score of 22/24 reflects strong internal methodological quality. The two-point deduction has two causes. First, 6 of 15 HCPs (40%) did not complete the full 100-image set (item 7), which may introduce incomplete-reader bias. Second, HCPs and images were recruited by convenience rather than as consecutive real-world presentations (item 2), which is inherent to all MRMC designs.
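In the self-controlled design, each reader's with-device assessment of an image is compared with the same reader's without-device assessment of that image. The sketch below computes the paired accuracy difference. It assumes that evaluations are stored as one record per reader, image and arm. It also assumes that only pairs evaluated in both arms are analysed, which is one way of handling readers who did not complete the image set. The record layout and all names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple


@dataclass(frozen=True)
class Evaluation:
    reader: str
    image: str
    arm: str       # hypothetical labels: "without" or "with" device assistance
    correct: bool  # the reader's diagnosis matched the reference diagnosis


def paired_accuracy(evals: Iterable[Evaluation]) -> Tuple[float, float, float]:
    """Return (accuracy without, accuracy with, difference in percentage points),
    restricted to reader-image pairs evaluated in both arms."""
    by_pair: Dict[Tuple[str, str], Dict[str, bool]] = {}
    for e in evals:
        if e.arm not in ("without", "with"):
            raise ValueError(f"unknown arm: {e.arm}")
        by_pair.setdefault((e.reader, e.image), {})[e.arm] = e.correct
    pairs = [arms for arms in by_pair.values() if "without" in arms and "with" in arms]
    if not pairs:
        raise ValueError("no paired evaluations")
    acc_without = 100 * sum(p["without"] for p in pairs) / len(pairs)
    acc_with = 100 * sum(p["with"] for p in pairs) / len(pairs)
    return acc_without, acc_with, acc_with - acc_without
```

Restricting the analysis to complete pairs keeps each comparison within the same reader and image. However, it discards partial data from incomplete readers, and this is the trade-off that MINORS item 7 captures.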
PH_2024: Prospective MRMC observational study

- Title:
- Multi-Reader Multi-Case Study Assessing the Impact of Legit.Health Plus on the Diagnostic Accuracy and Referral Decision-Making of Primary Care Physicians for Skin Lesions.
- Investigational site(s):
- This study was conducted remotely by sending the images to the participating professionals.
- Principal investigator(s):
- Dr. Gastón Roustán Gullon
- Sample size:
- 30
- Study period:
- June 2024 – September 2024
- Ethics committee:
- This study does not require an Ethics Committee approval due to the nature of the images used. The images employed in this study are completely anonymized medical images sourced from public dermatological atlases and freely available public sources. These images are not derived from identifiable patients, and their anonymization makes patient recognition impossible. As these images constitute non-personal data, they do not fall under the scope of regulations requiring ethics committee approval for studies based on fully anonymized, publicly available data.
- Sponsor:
- Instituto de Investigación Sanitaria Puerta de Hierro
- Device under investigation:
- Legit.Health Plus
- Publication status:
- Not yet submitted for publication
- Unit of analysis: 9 primary care physicians × 30 images = 270 evaluations across 9 skin conditions (nevus, melanoma, BCC, urticaria, pustular psoriasis, actinic keratosis, plaque psoriasis, hidradenitis suppurativa). Self-controlled (without-device vs with-device arms per reader).
- Registration and ethics basis: ClinicalTrials.gov NCT07428941; EMA RWD Catalogue EUPAS1000000644. Ethics exemption documented in R-TF-015-011.
- Primary endpoint and results: Overall diagnostic accuracy improvement +18.15 percentage points (63.70% → 81.85%). Sensitivity +14.60 percentage points (68.55% → 83.15%). Specificity +11.90 percentage points (78.01% → 89.91%). Statistical significance confirmed (p = 0.0001).
- Condition-specific results: Pustular psoriasis: +16.66% absolute improvement. Hidradenitis suppurativa: +15.56%.
- Acceptance criteria status: Met: primary objective met; 10% improvement threshold exceeded.
- Limitations: Small HCP cohort (9 PCPs). Pathology-level analyses limited by small n per condition.
- MDCG 2020-1 pillar: Clinical Performance (simulated-use MRMC with intended users on representative images of the intended patient population, per MDCG 2020-1 §4.4).
PH_2024 is distinguished within the MRMC group by its completion rate. All 9 primary care physicians completed the full 30-image set in both the without-device and with-device phases, so paired data are complete for every evaluation. This zero-dropout design eliminates the incomplete-reader bias present in the other two 2024 MRMC studies (BI_2024 and SAN_2024). The +18.15 percentage point overall accuracy improvement (p = 0.0001) provides particularly robust evidence for the primary care physician user profile. This is the clinical context in which the device's intended benefit is greatest in absolute terms. The condition-specific results (pustular psoriasis +16.66 percentage points, HS +15.56 percentage points) corroborate the BI_2024 rare-disease findings in an independent study. The MINORS score of 23/24, the highest of the three 2024 MRMC studies, reflects this completion advantage. The single deduction (item 2) acknowledges that HCPs and images were recruited by convenience rather than as consecutive real-world presentations, which is inherent to the MRMC design.
SAN_2024: Prospective MRMC observational study
- Title:
- Multi-Reader Multi-Case Study for Evaluating the Impact of Legit.Health Plus Device on the Healthcare Practitioners' Assessment of Skin Lesions
- Investigational site(s):
- This study was conducted remotely by sending the images to the participating professionals.
- Principal investigator(s):
- Dr. Antonio Martorell Calatayud
- Sample size:
- 29
- Study period:
- June 2024 – October 2024
- Ethics committee:
- This study did not require an Ethics Committee approval because it is observational and non-interventional. All data used consists of fully anonymized images sourced from public dermatology atlases and databases, containing no information permitting patient identification. As such, the research meets the criteria for exemption from ethics committee review under applicable regulatory frameworks.
- Sponsor:
- Sanitas Hospitales SA
- Device under investigation:
- Legit.Health Plus
- Publication status:
- Not yet submitted for publication
- Unit of analysis: 16 healthcare professionals × 29 images = 464 evaluations across 13 diverse conditions (dermatitis, melanoma, alopecia, urticaria, granuloma annulare, seborrhoeic keratosis, herpes, tinea, psoriasis, onychomycosis, acne, pressure ulcer, nevus). Multi-site mix of primary care physicians and dermatologists.
- Registration and ethics basis: ClinicalTrials.gov NCT07428954; EMA RWD Catalogue EUPAS1000000911. Ethics exemption documented in R-TF-015-011.
- Primary endpoint and results: Overall diagnostic accuracy improvement +20.70 percentage points (68.08% → 88.78%). Sensitivity +28.03 percentage points (52.61% → 80.64%). Specificity +30.39 percentage points (56.45% → 86.84%). Primary care physicians: +27% improvement. Dermatologists: +10.50% improvement.
- Acceptance criteria status: Met: substantial improvements across all specialties and conditions.
- Limitations: Small image set (29 images).
- MDCG 2020-1 pillar: Clinical Performance (simulated-use MRMC with intended users on representative images of the intended patient population, per MDCG 2020-1 §4.4).
SAN_2024 provides the broadest condition coverage of the three 2024 MRMC studies. In it, 16 healthcare professionals evaluated 29 images spanning 13 dermatological conditions. These range from common presentations such as psoriasis, dermatitis and acne to less common ones such as granuloma annulare and pressure ulcer. Its overall accuracy improvement of +20.70 percentage points is the largest absolute improvement among the 2024 MRMC studies, with particularly strong gains in sensitivity (+28.03 percentage points) and specificity (+30.39 percentage points). Improvement was larger in primary care physicians (+27%) than in dermatologists (+10.50%), matching the pattern seen in BI_2024. This is clinically expected and supports the device's primary intended user profile. The study's principal limitation is the small image set (29 images across 13 conditions). As a result, individual condition subgroups are underpowered for condition-specific statistical inference, and condition-level data from this study are treated as directional rather than confirmatory. The MINORS score of 22/24 mirrors BI_2024, with deductions for convenience HCP recruitment (item 2) and partial image-set completion by 4 of 16 HCPs (item 7).
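The same accuracy gain reads very differently depending on whether it is expressed in absolute percentage points or relative to baseline. This matters when comparing figures whose basis is not stated, such as the subgroup improvements quoted above. The short sketch below uses the SAN_2024 overall accuracy values reported in this section. The function names are illustrative only.

```python
def absolute_change_pp(before_pct: float, after_pct: float) -> float:
    """Absolute change, in percentage points."""
    return after_pct - before_pct


def relative_change_pct(before_pct: float, after_pct: float) -> float:
    """Relative change, as a percentage of the baseline value."""
    if before_pct == 0:
        raise ValueError("relative change is undefined for a zero baseline")
    return 100 * (after_pct - before_pct) / before_pct


# SAN_2024 overall diagnostic accuracy, as reported above.
before, after = 68.08, 88.78
print(f"absolute: +{absolute_change_pp(before, after):.2f} percentage points")
print(f"relative: +{relative_change_pct(before, after):.1f}%")
```

Here the +20.70 percentage-point absolute gain corresponds to a relative improvement of about 30.4%. The two figures should not be compared with each other directly.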
MAN_2025: Prospective MRMC observational study on Fitzpatrick V–VI images

- Title:
- Multi-Reader Multi-Case Study for Evaluating the Diagnostic Performance of Healthcare Professionals Assisted by Legit.Health Plus on Fitzpatrick Phototype V–VI Skin Presentations
- Investigational site(s):
- This study is conducted remotely through a centralized web-based platform.
- Principal investigator(s):
- Dr. Antonio Martorell Calatayud
- Study period:
- January 2026 – April 2026
- Ethics committee:
- This study does not require Ethics Committee approval because it is observational and non-interventional. All data used consists of fully anonymized images sourced from public dermatology atlases and databases, containing no information permitting patient identification. As such, the research meets the criteria for exemption from ethics committee review under applicable regulatory frameworks.
- Sponsor:
- AI Labs Group S.L.
- Device under investigation:
- Legit.Health Plus
- Publication status:
- Not applicable
- Unit of analysis: Paired reader-image evaluations by healthcare professionals spanning dermatology, primary care and nursing, on a set of 149 curated images sourced from public dermatology atlases. Self-controlled (without-device vs with-device arms per reader). The image set was selected to be representative of Fitzpatrick phototype V and VI presentations of multiple dermatological conditions.
- Registration and ethics basis: Simulated-use reader study on retrospective public-atlas images. It does not meet the MDR Article 2(45) definition of a clinical investigation; accordingly, no ClinicalTrials.gov or EMA RWD Catalogue registration applies (rationale documented in the CIR under R-TF-015-006 §Trial Registrations).
- Primary endpoint and results: Top-1 diagnostic accuracy improvement in the with-device arm vs the without-device arm, analysed at the paired-observation level with pre-specified ≥ 50%-completers and 100%-completers sensitivity analyses. Full quantitative results, completion analyses and subgroup breakdowns are recorded in the Clinical Investigation Report (R-TF-015-006 (MAN_2025)).
- Acceptance criteria status: Met: the pre-specified primary clinician-incremental-benefit endpoint was achieved on the primary analysis cohort, with consistent direction across the completers sensitivity analyses.
- Limitations: Simulated use environment (images, not real patients). Image set sourced from public dermatology atlases rather than prospectively captured during live clinical encounters. Three enrolled readers were excluded as screen failures (specialties outside the device's declared intended user population).
- MDCG 2020-1 pillar: Pillar 3 Clinical Performance §4.4 supporting evidence at Rank 11 (simulated-use MRMC with intended users on representative Fitzpatrick V–VI images, per MDCG 2020-1 §4.4).
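The ≥ 50%-completers and 100%-completers sensitivity analyses can be read as restricting the analysis to readers above a completion cut-off. The sketch below is minimal. It assumes that completion is measured as the fraction of the 149 images a reader assessed in both arms. The reader IDs and counts are hypothetical; the actual cohorts are defined in the CIR.

```python
from typing import Dict, List

TOTAL_IMAGES = 149  # size of the MAN_2025 image set stated above


def completers(images_read: Dict[str, int], min_fraction: float,
               total: int = TOTAL_IMAGES) -> List[str]:
    """Readers who assessed at least `min_fraction` of the image set in both arms
    (assumed completion definition)."""
    if not 0 < min_fraction <= 1:
        raise ValueError("min_fraction must be in (0, 1]")
    return sorted(reader for reader, n in images_read.items() if n / total >= min_fraction)


# Hypothetical reader IDs and counts of images assessed in both arms.
images_read = {"R01": 149, "R02": 80, "R03": 60, "R04": 149}
print(completers(images_read, 0.5))
print(completers(images_read, 1.0))
```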
MAN_2025 is the fourth MRMC simulated-use reader study in the evidence portfolio and the one specifically addressing the Fitzpatrick V–VI representativeness question raised under MDCG 2020-6 § 6.5(e). Its role and evidentiary weight are directly analogous to those of SAN_2024 and PH_2024: it is Rank 11 Pillar 3 §4.4 supporting evidence, not real-patient clinical data under MDR Article 2(48). The image set was curated from public dermatology atlases to be representative of Fitzpatrick V–VI presentations across multiple dermatological conditions, enabling measurement of whether the clinician incremental benefit observed in the other MRMC studies extends to darker-phototype presentations. Data lock was performed on 17 April 2026. The MINORS score and item-level breakdown, together with the completion analyses, are reported in the Clinical Investigation Report (R-TF-015-006 (MAN_2025)). This study reinforces, but does not replace, the real-world Pillar 3 evidence on Fitzpatrick V and VI patients that is addressed through Post-Market Clinical Follow-up (PMCF Activities E.1 and F.1 in R-TF-007-002).
Supporting / proof-of-concept pilot: specific indication (Rank 6 retrospective validation, n = 2 patients)
AIHS4_2025: Retrospective proof-of-concept pilot study
- Title:
- Evaluation of AIHS4 Performance in the M-27134-01 Clinical Trial for Hidradenitis Suppurativa
- Investigational site(s):
- This study was conducted remotely based on clinical trial image evaluations.
- Principal investigator(s):
- Dr. Antonio Martorell Calatayud
- Sample size:
- 2
- Study period:
- June 2024 – July 2024
- Ethics committee:
- This study did not require an Ethics Committee approval due to its observational non-interventional nature.
- Sponsor:
- AI Labs Group S.L.
- Device under investigation:
- Legit.Health Plus
- Publication status:
- Published (May 2023)
- Data source and unit of analysis: secondary analysis of anonymised image data from the M-27134-01 hidradenitis suppurativa clinical trial (third-party data collected under informed consent for investigational use). 2 HS patients × 4 time points (Days 1, 15, 29, 43) × 4 evaluations per time point = 16 evaluations per patient.
- Regulatory basis: retrospective, single-indication, derivative analysis. Ethics justification documented in R-TF-015-011 (third-party trial with informed consent for investigational data use). Not registered as a separate trial (derived from the existing M-27134-01 trial).
- Primary endpoint and results: AIHS4 severity score accuracy vs gold standard: 71.66% (production version, 95% CI: 65.3-77.9). ICC 0.716 (production) against acceptance criterion ICC ≥ 0.70. Temporal stability: 6.7% variation between consecutive visits. Performance exceeds expert dermatologist interobserver agreement (47.91%).
- Acceptance criteria status: Met: ICC ≥ 0.70 criterion achieved.
- Limitations: Extremely small sample (2 patients). Results cannot be generalised to the broader hidradenitis suppurativa population. This study provides preliminary proof-of-concept evidence for severity assessment only, not for diagnostic support. A dedicated PMCF activity (Activity B.5 in R-TF-007-002) targets 100 patients for confirmatory validation.
- MDCG 2020-1 pillar: Clinical Performance (very limited scope: severity assessment for a single condition).
AIHS4_2025 is a purpose-built proof-of-concept pilot study for a single high-value clinical application: automated severity assessment of Hidradenitis Suppurativa using the IHS4 scoring system. It has only 2 subjects, each providing 16 longitudinal evaluations. It is therefore exploratory rather than definitive clinical validation, and it is explicitly not counted as a pivotal clinical investigation for CE-marking sufficiency purposes. Its specific role in the portfolio is to establish proof of concept that the automated AIHS4 algorithm meets the ICC ≥ 0.70 acceptance criterion against an independent, blinded multi-expert IHS4 panel. It does so (ICC 0.716). The QUADAS-2 assessment (detailed in "Validated methodological quality appraisal" below) confirms that the index-test procedure, reference-standard methodology, and evaluation completeness are all sound (LOW risk in Domains 2, 3, and 4). The HIGH risk in Domain 1 is entirely attributable to the n = 2 pilot design, not to a methodological flaw. Powered prospective confirmation is pre-committed under PMCF Activity B.5, which targets 100 HS patients for a statistically powered prospective cohort study.
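The ICC ≥ 0.70 criterion can be checked from a matrix of paired scores. This CER does not state which ICC form was used. The sketch below assumes ICC(2,1) as defined by Shrout and Fleiss: two-way random effects, absolute agreement, single measurement. Rows are the rated targets (for example, patient visits) and columns are the device and the reference panel. The example data are hypothetical and do not reproduce the reported ICC of 0.716.

```python
from typing import Sequence


def icc_2_1(ratings: Sequence[Sequence[float]]) -> float:
    """ICC(2,1): two-way random effects, absolute agreement, single measurement.
    Rows are targets, columns are raters or methods (assumed layout)."""
    n = len(ratings)
    if n < 2:
        raise ValueError("need at least 2 targets")
    k = len(ratings[0])
    if k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need a complete n x k matrix with k >= 2")
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(ratings[i][j] for i in range(n)) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)               # between-target mean square
    ms_cols = ss_cols / (k - 1)               # between-rater mean square
    ms_error = ss_error / ((n - 1) * (k - 1))  # residual mean square
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Hypothetical IHS4-style scores: column 0 = device, column 1 = reference panel.
scores = [[3.0, 4.0], [10.0, 11.0], [7.0, 6.0], [12.0, 12.0]]
icc = icc_2_1(scores)
print(f"ICC(2,1) = {icc:.3f}, meets ICC >= 0.70: {icc >= 0.70}")
```

The absolute-agreement form penalises a systematic offset between the device and the panel. A consistency-type ICC would not penalise such an offset, which is why the choice of form should match the study's pre-specification.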
Validated methodological quality appraisal
Each study in the evidence portfolio has been appraised using the validated quality tool most appropriate to its study design, in accordance with MEDDEV 2.7/1 Rev. 4 Section 9 (Stage 2 appraisal) and MDCG 2020-6 § 6.3. Four study design families are present in the portfolio, and each is appraised with the validated tool appropriate to it.
The portfolio is appraised as follows.
- Diagnostic accuracy studies: the four studies in which the device functions as an index test evaluated against a reference standard (MC_EVCDAO_2019, IDEI_2023, AIHS4_2025 and NMSC_2025) are appraised with QUADAS-2 (Quality Assessment of Diagnostic Accuracy Studies), the internationally validated tool for this design type.
- Clinical utility and referral pathway studies: the three studies COVIDX_EVCDAO_2022, DAO_Derivación_O_2022 and DAO_Derivación_PH_2022 are appraised with MINORS (Methodological Index for Non-Randomised Studies), the validated quality tool for non-randomised observational and comparative studies.
- MRMC simulated-use studies: BI_2024, PH_2024 and SAN_2024 are also appraised with MINORS as the most appropriate validated comparative tool; the adaptation for the MRMC context is described below. The MINORS appraisal of MAN_2025 is reported in its Clinical Investigation Report (R-TF-015-006 (MAN_2025)).
- Published severity validation studies: APASI_2025, AUAS_2023, AIHS4_2023 and ASCORAD_2022 are appraised with MINORS (non-comparative) as retrospective algorithm validation studies providing Technical Performance evidence per MDCG 2020-1 Pillar 2; the adaptation for this study type is described in the dedicated subsection below.
QUADAS-2: diagnostic accuracy studies
QUADAS-2 assesses four domains for risk of bias (RoB) and applicability concerns. Ratings are LOW, HIGH, or UNCLEAR.
| Domain | MC_EVCDAO_2019 | IDEI_2023 | AIHS4_2025 | NMSC_2025 |
|---|---|---|---|---|
| 1. Patient Selection — RoB | HIGH | HIGH | HIGH | HIGH |
| 1. Patient Selection — Applicability | HIGH | LOW | UNCLEAR | HIGH |
| 2. Index Test — RoB | LOW | LOW | LOW | LOW |
| 2. Index Test — Applicability | LOW | LOW | LOW | LOW |
| 3. Reference Standard — RoB | LOW | LOW | LOW | LOW |
| 3. Reference Standard — Applicability | LOW | LOW | LOW | LOW |
| 4. Flow and Timing — RoB | HIGH | HIGH | LOW | LOW |
MC_EVCDAO_2019. Domain 1 HIGH risk reflects the specialist-enriched enrolment population (elevated melanoma prevalence) and the 52.5% enrolment rate (105 of 200 planned patients). The HIGH applicability concern reflects that the full intended-use population includes primary care settings with lower lesion prevalence. Domain 3 LOW risk: the reference standard is appropriate for the study aims (histopathological biopsy for malignant lesions and blinded expert clinical consensus for benign lesions, following standard dermatology practice), and the consensus panel was blinded to device output. Domain 4 HIGH risk reflects the same enrolment shortfall and the differential application of the reference standard across patients.
IDEI_2023. Domain 1 HIGH risk is driven by the 33% post-enrolment exclusion of prospective pigmented-lesion patients due to the absence of biopsy confirmation — a systematic pattern that biases the retained sample toward histologically confirmed cases and inflates apparent accuracy. Domain 3 LOW risk: the reference standard is appropriate for each endpoint — histopathological biopsy for malignant pigmented lesions and the Ludwig scale (the validated clinical grading tool for androgenetic alopecia) for the alopecia endpoint — and graders were blinded to device output in both cases. Domain 4 HIGH risk reflects the 33% post-enrolment exclusion and the use of different reference standards across the two study endpoints within the same study.
AIHS4_2025. Domain 1 HIGH risk reflects entirely the n = 2 pilot design with purposive subject selection from a larger trial, which precludes establishing spectrum representativeness. Domains 2, 3, and 4 are all LOW risk: the automated index-test procedure, the multi-expert blinded reference panel, and the complete evaluation across all four time points for both subjects are methodologically sound.
NMSC_2025. Domain 1 HIGH risk of bias reflects the specialist referral population. All 135 patients were referred to a head and neck surgery clinic for suspicious skin lesions, producing a malignancy prevalence of 80% (108/135: 54 BCC, 54 cSCC). This is substantially above what would be encountered in primary care or general dermatology settings. The HIGH applicability concern reflects the mismatch with the device's intended-use context in primary care, where pre-test probability is lower. Predictive values estimated at 80% prevalence do not transfer to that setting. In addition, sensitivity estimates may be inflated by the more advanced disease spectrum typical of referred patients. Domains 2, 3, and 4 are all LOW risk:
- Index test: the device operated on standardised smartphone images acquired under a controlled protocol (12 MP dual camera, 10 cm distance, regular ambient light, no zoom or flash).
- Reference standard: histological confirmation by trained pathologists (biopsy or surgical excision) was applied to all 135 included patients without exception.
- Flow and timing: the prospective design with complete data collection for all included patients introduces no flow or timing bias.
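Prevalence matters most for predictive values, which follow from sensitivity, specificity and prevalence via Bayes' theorem. The sketch below uses the 80% prevalence reported for NMSC_2025 and an illustrative lower prevalence of 10%. The sensitivity and specificity values are hypothetical and do not reproduce the study's results.

```python
def ppv(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Positive predictive value from Bayes' theorem."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


def npv(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Negative predictive value from Bayes' theorem."""
    true_neg = specificity * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    return true_neg / (true_neg + false_neg)


# Hypothetical operating point; 0.80 is the NMSC_2025 referral prevalence,
# 0.10 an illustrative lower prevalence.
SENS, SPEC = 0.90, 0.80
for prevalence in (0.80, 0.10):
    print(f"prevalence {prevalence:.0%}: PPV {ppv(SENS, SPEC, prevalence):.3f}, "
          f"NPV {npv(SENS, SPEC, prevalence):.3f}")
```

With the same hypothetical operating point, the PPV falls from about 0.95 at 80% prevalence to about 0.33 at 10%. This is why accuracy figures from a referral population need care before they are applied to primary care.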
MINORS: clinical utility and referral pathway studies
MINORS scores 0 (not reported), 1 (reported but inadequate), or 2 (reported and adequate) per item. Non-comparative studies are scored on items 1-8 (maximum 16); items 9-12 do not apply to them and are marked —. As a general quality benchmark derived from the original MINORS validation, scores of 12 or above for non-comparative studies and 20 or above for comparative studies indicate adequate to good methodological quality. Scores below the maximum on individual items reflect documented, contextualised limitations, as described in the per-study narratives, rather than hidden or undisclosed methodological flaws.
| MINORS item | COVIDX_2022 | DAO_O_2022 | DAO_PH_2022 |
|---|---|---|---|
| 1. Clearly stated aim | 2 | 2 | 2 |
| 2. Consecutive patients | 1 | 2 | 2 |
| 3. Prospective data collection | 2 | 2 | 2 |
| 4. Appropriate endpoints | 2 | 2 | 2 |
| 5. Unbiased endpoint assessment | 1 | 2 | 1 |
| 6. Appropriate follow-up | 2 | 2 | 2 |
| 7. Loss to follow-up ≤5% | 1 | 1 | 1 |
| 8. Prospective sample size calculation | 2 | 1 | 2 |
| 9. Adequate control group | — | 2 | 1 |
| 10. Contemporary groups | — | 2 | 2 |
| 11. Baseline equivalence | — | 2 | 2 |
| 12. Adequate statistical analyses | — | 2 | 2 |
| Total | 13/16 | 22/24 | 21/24 |
COVIDX_EVCDAO_2022 (13/16). The three-point deduction reflects the self-reported clinical utility outcome (item 5: Hawthorne effect acknowledged in the CIP, no blinding possible by design), the pre-enrolment selection against patients with unstable disease course (item 2), and the absence of explicit post-enrolment completion reporting in the CIR (item 7). The power calculation was documented and the enrolment target was met (item 8 scored 2).
DAO_Derivación_O_2022 (22/24). The two-point deduction reflects the enrolment shortfall (127 of approximately 380 planned patients; item 7 scored 1) and the consequent underpowering relative to the sample size calculation (item 8 scored 1). All other methodological elements are sound; objective system-derived endpoints and a well-defined concurrent control group contribute to the high overall score.
DAO_Derivación_PH_2022 (21/24). The three-point deduction has two sources. The first is the protocol deviation that prevented clean without-device baseline assessments (item 5: systematic assessment bias introduced; item 9: planned control condition not executed as designed). The second is the absence of explicit confirmation that all enrolled patients completed both phases (item 7 scored 1). The secondary performance metrics (malignancy AUC 0.84, melanoma specificity 91%) are unaffected by the deviation. The deductions therefore bear on the with-device versus without-device comparison rather than on these metrics.
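The totals in the table above follow directly from the item scores. The minimal sketch below recomputes them and checks them against the benchmarks stated in this section. It assumes the scoring conventions described above: 0, 1 or 2 per item, with items 9-12 not applicable to non-comparative studies. The function names are hypothetical.

```python
from typing import Optional, Sequence, Tuple

NON_COMPARATIVE_BENCHMARK = 12  # out of 16, as stated in this CER
COMPARATIVE_BENCHMARK = 20      # out of 24, as stated in this CER


def minors_total(items: Sequence[Optional[int]]) -> Tuple[int, int]:
    """Return (score, maximum) for the 12 MINORS items.

    Items are scored 0, 1 or 2; items 9-12 are None for non-comparative studies.
    """
    if len(items) != 12:
        raise ValueError("MINORS has 12 items")
    core, comparative = list(items[:8]), list(items[8:])
    if any(s is None for s in core):
        raise ValueError("items 1-8 apply to every study")
    if any(s is None for s in comparative) and not all(s is None for s in comparative):
        raise ValueError("items 9-12 are either all scored or all not applicable")
    scored = [s for s in items if s is not None]
    if any(s not in (0, 1, 2) for s in scored):
        raise ValueError("each item is scored 0, 1 or 2")
    return sum(scored), 2 * len(scored)


def meets_benchmark(score: int, maximum: int) -> bool:
    threshold = NON_COMPARATIVE_BENCHMARK if maximum == 16 else COMPARATIVE_BENCHMARK
    return score >= threshold


# Item scores transcribed from the clinical utility table above.
TABLE = {
    "COVIDX_2022": [2, 1, 2, 2, 1, 2, 1, 2, None, None, None, None],
    "DAO_O_2022": [2, 2, 2, 2, 2, 2, 1, 1, 2, 2, 2, 2],
    "DAO_PH_2022": [2, 2, 2, 2, 1, 2, 1, 2, 1, 2, 2, 2],
}
for study, items in TABLE.items():
    score, maximum = minors_total(items)
    print(f"{study}: {score}/{maximum}, meets benchmark: {meets_benchmark(score, maximum)}")
```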
MINORS: MRMC simulated-use studies
The four MRMC studies are methodologically distinct from the clinical studies above. In MRMC studies, healthcare professionals evaluate a standardised pre-specified set of images — first without device assistance and then with device assistance — in a controlled online environment. No patients are seen in a real clinical encounter; the evaluable unit is a reader-image evaluation, not a patient episode. This design places these studies at MDCG 2020-6 Appendix III Rank 11 (simulated use testing with healthcare professionals), which excludes them from the definition of clinical data under MDR Article 2(48), and at MDCG 2020-1 Pillar 3 §4.4 (supporting Clinical Performance evidence). Rank and Pillar are orthogonal axes: Rank 11 reflects that the measurement is simulated-use rather than real-patient; Pillar 3 reflects that what is being evidenced is the clinician's decision-making when consuming the device's Top-5 prioritised differential. MRMC is not Pillar 2 because a clinician is in the loop; Pillar 2 is the clinician-free algorithm-level claim across the 346 ICD-11 categories.
MINORS is applied as the most appropriate validated comparative quality tool. Item 2 (consecutive patients) is interpreted as the representativeness and completeness of the image set and HCP cohort, rather than as sequential patient enrolment. As a sensitivity check, applying item 2 without this adaptation does not change the methodological-quality conclusion. Even if item 2 were scored 0, each study tabulated below would still total at least 21/24, above the 20/24 benchmark for comparative studies.
| MINORS item | BI_2024 | SAN_2024 | PH_2024 |
|---|---|---|---|
| 1. Clearly stated aim | 2 | 2 | 2 |
| 2. Consecutive patients* | 1 | 1 | 1 |
| 3. Prospective data collection | 2 | 2 | 2 |
| 4. Appropriate endpoints | 2 | 2 | 2 |
| 5. Unbiased endpoint assessment | 2 | 2 | 2 |
| 6. Appropriate follow-up | 2 | 2 | 2 |
| 7. Loss to follow-up ≤5% | 1 | 1 | 2 |
| 8. Prospective sample size calculation | 2 | 2 | 2 |
| 9. Adequate control group | 2 | 2 | 2 |
| 10. Contemporary groups | 2 | 2 | 2 |
| 11. Baseline equivalence | 2 | 2 | 2 |
| 12. Adequate statistical analyses | 2 | 2 | 2 |
| Total | 22/24 | 22/24 | 23/24 |
*In MRMC studies, item 2 assesses the representativeness of the image set and HCP cohort rather than consecutive patient enrolment. HCPs recruited via professional networks and images sourced from dermatology atlases or sponsor datasets score 1 (not consecutive real-world presentation). This is inherent to all MRMC designs and does not reflect a correctable methodological flaw.
The high MINORS scores (22-23/24) across all three MRMC studies reflect the strong internal validity of the self-controlled crossover design: the same readers evaluate the same images in both conditions, eliminating between-group confounding. BI_2024 and SAN_2024 score 22/24 rather than 23/24 because 6 of 15 HCPs (BI_2024) and 4 of 16 HCPs (SAN_2024) did not complete the full image set (item 7 scored 1), potentially introducing incomplete-reader bias. PH_2024 achieves 23/24 because all 9 PCPs completed the full 30-image set in both phases.
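The totals and the item-7 deductions can be re-derived mechanically from the table and the reader-completion figures. The sketch below is a minimal illustration, not part of the appraisal procedure. The helper names are hypothetical, and item 7 is simplified to the ≤5% attrition rule under the usual MINORS scoring convention (0 = not reported, 1 = reported but inadequate, 2 = reported and adequate).

```python
# Minimal sketch: recompute MINORS totals for the comparative (12-item)
# checklist and re-derive the item-7 scores from reader attrition.
# Helper names are hypothetical; scores are copied from the table above.

MINORS_ITEMS = 12  # comparative studies: items 1-12, each scored 0, 1 or 2

# Item scores in order (index 0 = item 1, index 6 = item 7)
scores = {
    "BI_2024":  [2, 1, 2, 2, 2, 2, 1, 2, 2, 2, 2, 2],
    "SAN_2024": [2, 1, 2, 2, 2, 2, 1, 2, 2, 2, 2, 2],
    "PH_2024":  [2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2],
}

def minors_total(item_scores):
    """Sum the item scores and return (total, maximum possible)."""
    assert len(item_scores) == MINORS_ITEMS
    assert all(s in (0, 1, 2) for s in item_scores)
    return sum(item_scores), 2 * MINORS_ITEMS

def item7_score(readers_enrolled, readers_incomplete):
    """Simplified rule (assumption): 2 if reported attrition <= 5%, else 1."""
    attrition = readers_incomplete / readers_enrolled
    return (2 if attrition <= 0.05 else 1), attrition

for study, s in scores.items():
    total, maximum = minors_total(s)
    print(f"{study}: {total}/{maximum}")

# Readers who did not complete the full image set, as reported in the text
print(item7_score(15, 6))  # BI_2024  -> (1, 0.4)
print(item7_score(16, 4))  # SAN_2024 -> (1, 0.25)
print(item7_score(9, 0))   # PH_2024  -> (2, 0.0)
```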
MINORS: published severity validation studies (Technical Performance evidence)
The four published severity validation studies share a common retrospective, non-comparative design: the device's severity scoring algorithm is applied to a curated image dataset, and its output is compared against independent expert dermatologist consensus using a validated clinical severity scale. No patients are enrolled prospectively; images are drawn from existing clinical archives or labelled datasets. This design places these publications at MDCG 2020-6 Appendix III Rank 6 (peer-reviewed literature on the device's own algorithms). Appendix III Rank 5 is reserved for equivalence data, so on a strict reading such literature belongs at Rank 6; the MINORS methodological-quality appraisal is layered on top of this ranking. Under MDCG 2020-1, these publications are classified as Technical Performance evidence (Pillar 2).
MINORS non-comparative (items 1–8, maximum 16) is applied using the same item 2 adaptation described for the MRMC studies: item 2 assesses the representativeness and completeness of the image dataset rather than consecutive patient enrolment.
| MINORS item | APASI_2025 | AUAS_2023 | AIHS4_2023 | ASCORAD_2022 |
|---|---|---|---|---|
| 1. Clearly stated aim | 2 | 2 | 2 | 2 |
| 2. Consecutive patients* | 1 | 1 | 1 | 1 |
| 3. Prospective data collection | 0 | 0 | 0 | 0 |
| 4. Appropriate endpoints | 2 | 2 | 2 | 2 |
| 5. Unbiased endpoint assessment | 2 | 2 | 2 | 2 |
| 6. Appropriate follow-up | 2 | 2 | 2 | 2 |
| 7. Loss to follow-up ≤5% | 2 | 2 | 2 | 2 |
| 8. Prospective sample size calculation | 0 | 0 | 0 | 0 |
| Total | 11/16 | 11/16 | 11/16 | 11/16 |
*Item 2 adapted: representativeness and completeness of the image dataset (curated retrospective collection from clinical archives or labelled datasets) rather than consecutive patient enrolment, using the same rationale as for MRMC studies above.
All four publications score 11/16, reflecting their identical retrospective design. Items 3 and 8 are structurally 0 for retrospective algorithm validation studies: data collection is retrospective in all four cases, and none reports a formal statistical power calculation for the ICC endpoint. Both features are inherent to this study type and do not represent correctable methodological flaws. Of the remaining six items, five are scored adequate (2): each study has a clearly stated aim (item 1), appropriate endpoints for severity scoring concordance, namely ICC, RMSE, or AUC against expert consensus (item 4), independent specialist annotators serving as the blinded reference standard (item 5), an appropriate assessment scope (item 6), and complete dataset evaluation with no image attrition (item 7). Item 2 scores 1 for all four because the image datasets were curated from clinical archives rather than enrolled consecutively from a prospective patient population.
The 11/16 score indicates sufficient methodological quality for the purpose of providing Technical Performance evidence per MDCG 2020-1 Pillar 2. Note: although the title of ASCORAD_2022 includes the phrase "Pilot Study," this reflects the authors' conservative characterisation relative to prospective clinical validation, not a methodological limitation. The study employs the same rigorous retrospective validation methodology as the other three publications, with a larger dataset (1,083 images across 3 annotated datasets) and more expert annotators (9 dermatologists, 3 per dataset) than AIHS4_2023 (221 images, 6 annotators) or AUAS_2023 (313 images, 5 annotators).
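Treating items 3 and 8 as structural zeros, as argued above, fixes a design ceiling of 12/16 for this study type, so the only point below that ceiling is the item 2 deduction. A minimal sketch of that arithmetic follows; the constant and function names are illustrative only.

```python
# Minimal sketch: MINORS non-comparative checklist (items 1-8, max 16) for a
# retrospective algorithm-validation design. Items 3 and 8 are treated as
# structural zeros, as argued in the text; names here are illustrative.

NON_COMPARATIVE_ITEMS = 8
STRUCTURAL_ZEROS = {3, 8}  # prospective collection, prospective sample size

def design_ceiling(structural_zeros):
    """Highest total achievable when some items are 0 by design."""
    return 2 * (NON_COMPARATIVE_ITEMS - len(structural_zeros))

def total(item_scores):
    """item_scores maps item number (1-8) to its score (0, 1 or 2)."""
    return sum(item_scores.values())

# Scores shared by APASI_2025, AUAS_2023, AIHS4_2023 and ASCORAD_2022
shared = {1: 2, 2: 1, 3: 0, 4: 2, 5: 2, 6: 2, 7: 2, 8: 0}

ceiling = design_ceiling(STRUCTURAL_ZEROS)
print(f"total {total(shared)}/16, design ceiling {ceiling}/16")
# total 11/16, design ceiling 12/16
```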
Post-market surveillance data (Rank 7 for passive vigilance and curated QMS data; Rank 8 for the protocolled RWE study)
Per MDR Article 2(48), clinical data includes "safety or performance information that is generated from the post-market surveillance, in particular the post-market clinical follow-up." The equivalent legacy device has been on the market since 2020 and has generated extensive PMS data that constitutes clinical data for the purposes of this evaluation. The aggregated passive-surveillance dataset and the protocolled proactive-surveillance study are consolidated in the legacy device PMS Report (R-TF-007-003), prepared under MDR Article 85 (applicable via MDR Article 120(3)).
- Market experience: 21 active contracts, over 250,000 clinical reports processed, 500+ healthcare practitioners, 100,000+ patients across hospital and primary care settings (2020-present).
- MDR Article 87 serious incidents: Zero, confirmed by the R-006-002 non-conformity registry across the full reporting period.
- MDR Article 88 trend reports: Zero triggered.
- Customer-reported Category 3a events (Rank 7 passive PMS): Three events over approximately four years (rate ~0.0012% against ~250,000 diagnostic reports) — one clinical-output accuracy feedback (May 2023) and two API availability events (September 2023, September 2024). All closed; no patient harm reported.
- Non-safety complaints (Category 4): Two events, both closed. Neither related to clinical output.
- Field safety corrective actions (FSCAs): Zero FSCAs.
- Trend analysis: No increasing trend in incident frequency or severity identified; no Article 88 threshold exceeded.
- Proactive post-market clinical study (Rank 8 primary for both quantitative endpoints and Likert professional-opinion items; supplementary Rank 4 case retained for quantitative endpoints):
R-TF-015-012 is a cross-sectional observational study with retrospective recall, conducted under a formal PMS Study Protocol. The primary Rank 8 classification reflects a conservative reading of the cross-sectional physician-recall design and aligns with the MDCG 2020-6 Appendix III Rank 8 category ("proactive post-market surveillance data"). The supplementary Rank 4 case is retained only for the quantitative endpoints, under the Appendix III note that "high quality surveys may also fall into this category"; it is not the lead classification, and the CER's sufficiency determination does not depend on it. The rationale is documented in the preceding subsection and in the companion study report annexed to R-TF-015-012. The PMS Study Protocol was dated and approved on 7 November 2025 under the manufacturer's standing MDR Article 83 proactive PMS programme for the equivalent legacy device (R-TF-007-001 PMS Plan).
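The Category 3a rate quoted in the market-experience list is a crude events-per-report ratio. A minimal sketch of the calculation, assuming 250,000 as the denominator; because the list gives the number of processed reports as over 250,000, the result is best read as an upper bound on the true rate.

```python
# Minimal sketch: crude event rate per processed report, using the figures
# quoted in the market-experience list. The denominator is a lower bound
# ("over 250,000"), so the computed rate is an upper bound.

def rate_percent(events, reports):
    """Events per report, expressed as a percentage."""
    return 100.0 * events / reports

category_3a_events = 3
reports_processed = 250_000  # lower bound on the true number of reports

rate = rate_percent(category_3a_events, reports_processed)
print(f"Category 3a rate <= {rate:.4f}%")  # Category 3a rate <= 0.0012%
```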
The zero-serious-incident, zero-FSCA market experience over 4+ years of clinical deployment across 21 hospital and primary care contracts provides real-world confirmation that the device's safety profile is consistent with the risk assessment conclusions. The severity 4 risk ceiling established in the AI risk assessment (R-TF-012-037) is corroborated in practice by this market experience. The proactive RWE study complements this passive-surveillance confirmation with Rank 8 primary evidence across all three declared clinical benefits, covering both quantitative and Likert professional-opinion items; the supplementary Rank 4 case is retained only for the quantitative endpoints, under the Appendix III "high quality surveys" note.
Sufficiency is independent of the Rank 4 supplementary case. The primary classification of R-TF-015-012 is Rank 8 for both quantitative and Likert items. Even in the absence of the supplementary Rank 4 case retained for the quantitative endpoints under the Appendix III "high quality surveys may also fall into this category" note, the data-sufficiency conclusion of this CER is unchanged. The three declared clinical benefits are independently supported by the pre-market Pillar 3 Rank 2–4 prospective clinical investigations (MC_EVCDAO_2019, COVIDX_EVCDAO_2022, DAO_Derivación_O_2022, DAO_Derivación_PH_2022, IDEI_2023), by the NMSC_2025 peer-reviewed third-party manuscript (Rank 4, Pillar 3), and by the Pillar 3 §4.4 Rank 11 MRMC simulated-use reader studies (BI_2024, PH_2024, SAN_2024, MAN_2025). R-TF-015-012 is convergent post-market confirmation; it is not the load-bearing source of any acceptance-criterion conclusion.
Tiered evidence assessment strategy
The device outputs a probability distribution over all of its ICD-11 output categories (346 in total) simultaneously. It does not produce a binary positive/negative result for any specific condition. This architecture means the device functions as a general classifier whose performance can validly be assessed across the full breadth of dermatological conditions.
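To illustrate the difference between a single probability distribution and a set of independent binary calls, the sketch below applies a softmax to raw scores and ranks the result, as in a Top-5 differential. It assumes a softmax output layer, which this section does not state; the category codes and logit values are invented, and the code is not the device's implementation.

```python
# Minimal sketch, not the device's code: convert raw classifier scores
# (logits) into one probability distribution over all categories with a
# softmax, then rank the categories. Codes and logits below are made up.
import math

def softmax(logits):
    """Numerically stable softmax over a dict of category -> logit."""
    m = max(logits.values())  # subtract the max to avoid overflow in exp
    exps = {c: math.exp(v - m) for c, v in logits.items()}
    z = sum(exps.values())
    return {c: e / z for c, e in exps.items()}

def top_k(probs, k=5):
    """Return the k most likely categories, highest probability first."""
    return sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:k]

# Hypothetical logits for a toy 7-category output space
logits = {"cat_A": 3.1, "cat_B": 2.4, "cat_C": 0.2, "cat_D": -1.0,
          "cat_E": 1.7, "cat_F": -0.5, "cat_G": 0.9}

probs = softmax(logits)
# One distribution summing to 1, not seven separate yes/no decisions
assert abs(sum(probs.values()) - 1.0) < 1e-9
for code, p in top_k(probs):
    print(f"{code}: {p:.3f}")
```

Because every category competes for the same probability mass, raising the likelihood of one condition necessarily lowers the others, which is why ranking quality can be assessed across the whole output space.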
However, MDR Article 61 and Annex XIV 1(a) require that the clinical evaluation demonstrate the device's clinical benefits, performance, and safety for its intended purpose. Article 2(53) defines clinical benefit in terms of a meaningful clinical outcome, and Annex II requires that the technical documentation include evidence that the device meets the applicable GSPRs for each claimed indication. The guidance documents that implement these requirements, namely MEDDEV 2.7/1 Rev 4 Annex A7.3 (sensitivity/specificity for major clinical indications), MDCG 2020-1 (separate Valid Clinical Association per claimed output), and MDCG 2020-6 Appendix III (risk-based justification for data pooling), call for condition-level or category-level evidence assessment. Accordingly, the clinical evaluation adopts a risk-proportionate, tiered evidence structure:
- Tier 1 (Malignant conditions, individual analysis): The clinical consequence of misclassification is highest: delayed cancer diagnosis can lead to disease progression and mortality. Performance is assessed with individual acceptance criteria per condition or condition group. Evidence is drawn from dedicated studies (MC_EVCDAO_2019 for melanoma, plus malignancy prediction endpoints across 6 further studies).
- Tier 2 (Rare diseases, grouped analysis): Rare diseases are frequently misdiagnosed; delayed diagnosis leads to prolonged suffering and inappropriate treatment. Performance is assessed as a dedicated subgroup with its own acceptance criterion within benefit 7GH (absolute Top-1 accuracy ≥ 54%). The rare diseases subgroup is defined in the BI_2024 study protocol: GPP, acne conglobata, palmoplantar pustulosis, subcorneal pustular dermatosis, AGEP, and pemphigus vulgaris.
- Tier 3 (General conditions, pooled with risk-based justification): For non-malignant, non-rare conditions, the clinical consequence of an incorrect ranking is comparable: delayed or modified treatment, not mortality. Performance is assessed as a pooled aggregate with explicit risk-based justification (see below).
This tiered structure ensures that evidence assessment is proportionate to clinical risk: high-risk conditions receive individual scrutiny, while lower-risk conditions are validly pooled with documented justification.
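The tier routing can be expressed as a precedence rule in which the highest-risk matching tier wins. In this minimal sketch the rare-disease set is the BI_2024 protocol subgroup listed above (abbreviations expanded), while the malignant set is only an illustrative subset; the function name is hypothetical.

```python
# Minimal sketch: route a condition to an evidence tier. The rare-disease set
# is the BI_2024 protocol subgroup; the malignant set is an illustrative
# subset, and evidence_tier() is a hypothetical helper.

MALIGNANT = {"melanoma", "basal cell carcinoma", "squamous cell carcinoma"}
RARE_BI_2024 = {
    "generalised pustular psoriasis", "acne conglobata",
    "palmoplantar pustulosis", "subcorneal pustular dermatosis",
    "acute generalised exanthematous pustulosis", "pemphigus vulgaris",
}

def evidence_tier(condition: str) -> int:
    """Highest-risk rule wins: malignant -> 1, rare -> 2, otherwise -> 3."""
    name = condition.strip().lower()
    if name in MALIGNANT:
        return 1  # individual acceptance criteria per condition
    if name in RARE_BI_2024:
        return 2  # grouped subgroup with its own Top-1 accuracy criterion
    return 3      # pooled with risk-based justification

print(evidence_tier("Melanoma"), evidence_tier("Pemphigus vulgaris"),
      evidence_tier("Tinea corporis"))  # 1 2 3
```

Checking the malignant set first means a condition can never be pooled into a lower-scrutiny tier if it also matches a higher-risk one.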
Data pooling methodology
Aggregate performance metrics (globalValueOfDevice) are calculated using the following weighted average formula:
Pooling of Tier 3 (general conditions) data across conditions is justified on four grounds:
- Comparable clinical consequence of misclassification. Within non-malignant, non-rare categories, the typical clinical consequence of an incorrect ranking is delayed or modified treatment. While individual exceptions exist (e.g., untreated infectious conditions can occasionally progress to serious complications), the physician's independent clinical assessment, not the device output alone, determines the management pathway, providing a safety net that is absent in standalone diagnostic scenarios. This risk profile is fundamentally different from Tier 1 (malignant conditions), where a missed diagnosis directly impacts mortality.
- Device architecture supports pooling. The device outputs a probability distribution over all of its ICD-11 output categories simultaneously. It does not make independent per-condition predictions; it ranks likelihoods across its full ICD-11 output space. Assessing how well this ranking performs across the general dermatological spectrum is therefore a natural and valid evaluation approach.
- Representative sampling across epidemiological categories. The pooled studies include conditions from all major epidemiological categories of dermatological disease (see Evidence coverage by disease category below), ensuring that the pooled analysis reflects the breadth of conditions encountered in clinical practice rather than being limited to a single disease area.
- Consistent architecture supports the expectation of consistent capability. The uniform Vision Transformer architecture processes all input images through the same feature-extraction procedure regardless of condition, so no condition-specific processing path is introduced. While absolute performance varies by condition (as demonstrated in the per-condition results tables within each study), validated capability on representative conditions from across the epidemiological spectrum provides a technical basis for confidence in the device's generalised performance.
The populations across the pooled studies were representative of real-world clinical practice, including both primary care physicians and dermatologists as intended users. The studies were conducted in comparable clinical settings with representative patient demographics, supporting the applicability of the results to the intended population.
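The globalValueOfDevice weighting is defined by the formula referenced at the start of this subsection, which is not reproduced here. Purely to illustrate how a weighted average differs from a simple mean, the sketch below assumes weights equal to each study's number of evaluable cases; both the weighting choice and the study values are assumptions, not data from this CER.

```python
# Minimal sketch of a weighted average across studies. ASSUMPTION: weights are
# the number of evaluable cases per study; the CER's own globalValueOfDevice
# formula may differ. All study values below are invented.

def weighted_average(values, weights):
    """Return sum(w_i * v_i) / sum(w_i)."""
    if not weights or len(values) != len(weights):
        raise ValueError("values and weights must be non-empty and equal length")
    total_w = sum(weights)
    if total_w <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(w * v for v, w in zip(values, weights)) / total_w

# Hypothetical per-study Top-1 accuracy and evaluable-case counts
study_values = [0.60, 0.70, 0.50]
study_cases = [100, 300, 100]

weighted = weighted_average(study_values, study_cases)
unweighted = sum(study_values) / len(study_values)
print(round(weighted, 3), round(unweighted, 3))  # 0.64 0.6
```

With these invented numbers the larger middle study pulls the pooled value upwards; a simple mean would give each study equal influence regardless of size.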
Evidence coverage by disease category
Throughout the clinical evaluation, several performance claims use the indication label "Multiple conditions." This label does not refer to an unspecified group of diseases. It reflects a broad, representative inclusion aligned with the diverse ICD-11 categories evaluated in the respective clinical studies. To demonstrate this representativeness, the clinical evidence portfolio is mapped against the seven major epidemiological categories of dermatological disease, based on the Global Burden of Disease Study (Karimkhani et al., 2017):
| Category | Approximate prevalence | Studies with representation |
|---|---|---|
| Infectious diseases (fungal, bacterial, viral) | 57% | BI_2024 (impetigo, tinea corporis), SAN_2024 (herpes, tinea, onychomycosis), COVIDX_EVCDAO_2022 (folliculitis, herpes, tinea), DAO_Derivación_PH_2022 (warts, molluscum, herpes simplex) |
| Other conditions (acne, alopecia, urticaria) | 19% | BI_2024 (acne variants), SAN_2024 (acne, alopecia, urticaria), IDEI_2023 (androgenetic alopecia, 96 patients), COVIDX_EVCDAO_2022 (acne, 67 patients; alopecia), DAO_Derivación_O_2022 (alopecia), PH_2024 (urticaria), DAO_Derivación_PH_2022 (urticaria) |
| Inflammatory diseases (psoriasis, AD, HS, eczema) | 15% | BI_2024 (GPP, dermatitis, psoriasis, HS, AGEP +4), PH_2024 (psoriasis ×2, HS), SAN_2024 (dermatitis, psoriasis), AIHS4_2023 (HS severity), COVIDX_EVCDAO_2022 (psoriasis, AD, HS, eczema, lichen planus, rosacea), DAO_Derivación_O_2022 (psoriasis ×3, eczema ×3, AD), DAO_Derivación_PH_2022 (psoriasis, AD, HS, lichen planus) |
| Malignant diseases (melanoma, BCC, SCC) | 5% | MC_EVCDAO_2019 (melanoma 36, BCC 13, actinic keratosis), IDEI_2023 (melanoma, BCC, SCC), PH_2024 (melanoma, BCC, actinic keratosis), SAN_2024 (melanoma), DAO_Derivación_O_2022 (melanoma ×4, BCC ×9, actinic keratosis 27), DAO_Derivación_PH_2022 (BCC, SCC, melanoma), COVIDX_EVCDAO_2022 (melanoma, BCC, SCC, actinic keratosis), NMSC_2025 (BCC 54, cSCC 54; H&N clinic) |
| Autoimmune diseases (lupus, bullous diseases) | 3% | BI_2024 (pemphigus vulgaris), DAO_Derivación_O_2022 (bullous pemphigoid) |
| Genodermatoses (epidermolysis bullosa, ichthyosis) | 1% | No direct representation in the clinical evidence portfolio |
| Vascular diseases (haemangiomas, malformations) | 1% | MC_EVCDAO_2019 (angioma, haemangioma, angiokeratoma), COVIDX_EVCDAO_2022 (haemangioma, 14 patients), DAO_Derivación_O_2022 (spider telangiectasis, pyogenic granuloma), DAO_Derivación_PH_2022 (angiomas) |
Five of the seven epidemiological categories have direct representation across multiple studies, collectively covering 97% of dermatological presentations (infectious 57%, other 19%, inflammatory 15%, malignant 5%, vascular 1%). Two categories have insufficient representation: autoimmune diseases (3%) are limited to two conditions in two studies, and genodermatoses (1%) have no direct representation. These are addressed as declared acceptable gaps with documented justification in the section Need for more clinical evidence.
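The 97% figure is the sum of the prevalence shares of the five categories treated as represented. A minimal check using the table's rounded figures follows; note that the rounded column itself sums to 101%, so all shares are approximate.

```python
# Minimal sketch: share of dermatological presentations covered by the
# categories treated as represented, using the rounded prevalence figures
# from the table (Karimkhani et al., 2017).

prevalence_pct = {
    "infectious": 57, "other": 19, "inflammatory": 15, "malignant": 5,
    "autoimmune": 3, "genodermatoses": 1, "vascular": 1,
}
# Categories counted as represented in the text
represented = {"infectious", "other", "inflammatory", "malignant", "vascular"}

covered = sum(prevalence_pct[c] for c in represented)
gaps = sorted(set(prevalence_pct) - represented)
print(covered, gaps)  # 97 ['autoimmune', 'genodermatoses']

# The rounded column sums to 101, so the shares are approximate
print(sum(prevalence_pct.values()))  # 101
```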
Type of evaluation
Per Article 61(3) of EU Regulation 2017/745 on Medical Devices, a clinical evaluation shall follow a defined and methodologically sound procedure based on the following:
- A critical evaluation of the relevant scientific literature currently available relating to the safety, performance, design characteristics, and intended purpose of the device, where the following conditions are satisfied: (i) it is demonstrated that the device subject to clinical evaluation for the intended purpose is equivalent to the device to which the data relate [...], and (ii) the data adequately demonstrate compliance with the relevant general safety and performance requirements;
- A critical evaluation of the results of all available clinical investigations, [...]; and
- A consideration of currently available alternative treatment options for that purpose, if any.
Accordingly, this clinical evaluation is based on:
- Clinical data specific to the device under evaluation.
- Clinical data related to an equivalent device (the legacy predecessor device), leveraged under the equivalence framework of MDCG 2020-5 and MDR Article 61(5)–(6).
Demonstration of equivalence
In accordance with the MDR and the guidance document MDCG 2020-5, a detailed technical, clinical, and biological equivalence evaluation was conducted between the device and its legacy predecessor.
Technical equivalence
MDR 2017/745 (Annex XIV Part A (3)) specifies that, for the device to be determined technically equivalent to a comparator, it must be of similar design, be used under similar conditions of use, have similar specifications and properties, use similar deployment methods, and have similar principles of operation and critical performance requirements. Intended purpose, indications, target patient population, and type of user are addressed separately under clinical equivalence below. The following tables summarise, for each of the seven Annex XIV criteria, how the device compares with the legacy predecessor and the verification basis on which the "Same" determination is established. Sameness is demonstrated (per MDCG 2020-5 §3) rather than asserted: the device and the legacy predecessor are two successive versions of the same product line developed and maintained by the same manufacturer under a continuous design-control process, and the core classifier is carried forward with frozen weights, verified by bit-level engineering output-parity against the legacy predecessor's reference outputs (see Bucket B of the per-change categorisation below).
Criterion 1: Design
| Element | The device | Legacy predecessor | Verification basis |
|---|---|---|---|
| Overall design | Software-only medical device with a web interface and REST API integration. | Same | Continuity of the design baseline across the two successive versions of the same product line is documented in the manufacturer's design and development file, with the canonical reference being the Software Architecture Description (R-TF-012-029). |
| Type of device | Standalone software medical device; non-invasive, with no physical patient contact. | Same | Both versions are standalone software (current device classified under MDR Rule 11; legacy predecessor self-declared Class I under MDD). No physical or biological interface exists on either version. |
Criterion 2: Conditions of use
| Element | The device | Legacy predecessor | Verification basis |
|---|---|---|---|
| Use environment | Clinical environment (in-person consultation or teledermatology), indoors under controlled lighting, internet-connected. | Same | The declared use environment is identical in the IFUs of both versions. |
| Hardware prerequisites | Image-capture-capable device (smartphone or digital camera) and a network-connected workstation or mobile device with a supported browser. | Same | Hardware prerequisites are listed in the IFUs of both versions and have not changed. |
| User preconditions | Authenticated access by a trained healthcare professional; image acquisition per the published image-quality guidance. | Same | Authentication and image-quality preconditions are common to both versions. Current-version DIQA prompts (Bucket A enhancement) reinforce, rather than change, the same use precondition. |
| Duration of interaction | Episodic, one session per consultation; not a continuous-monitoring device. | Same | The episodic pattern of use is specified identically in the IFUs of both versions. |
Criterion 3: Specifications
| Element | The device | Legacy predecessor | Verification basis |
|---|---|---|---|
| Core AI architecture | Vision Transformer (ViT)-based classifier with lesion detection, segmentation, and scoring heads. | Same | The core model architecture and trained weights are carried forward unchanged from the legacy predecessor (Bucket B §5). |
| ICD-11 classifier output | 346 ICD-11 categories. | Same | Bit-level engineering output-parity testing confirms identical classifier output against the legacy predecessor's reference outputs on a common test set. |
| AI-based severity scoring systems carried forward | APASI, AIHS4, ASCORAD, and AUAS, together with the further AI-based severity tools listed in the legacy predecessor's IFU (7PC, ALEGI, ALUDWIG, APULSI, ASALT). | Same | Each carried-forward scoring system uses the same algorithmic implementation and the same output format in both versions, verified via output-parity testing. |
| Output format | Structured severity reports, triage indications, and image annotations, serialised in the formats specified in the IFU. | Same | Same output schema and field definitions. The current version adds an HL7 FHIR serialisation option (Bucket A §2) at the transport layer, which does not alter the specification of the output itself. |
Criterion 4: Properties
| Element | The device | Legacy predecessor | Verification basis |
|---|---|---|---|
| Storage | Cloud-hosted with encryption at rest and in transit. | Same | Both versions are cloud-hosted with encrypted storage. The current version's cybersecurity upgrades (Bucket A §3) reinforce, rather than change, the device's property of being a cloud-hosted, encrypted-storage software product. |
| Interfacing environment | Accessible from mobile and desktop browsers and from connected systems via REST API. | Same | Identical browser and API surfaces across the two versions, as declared in the Labeling and IFU Requirements (R-TF-012-037) and in the IFU issued for each version. |
Criterion 5: Deployment methods
| Element | The device | Legacy predecessor | Verification basis |
|---|---|---|---|
| Integration modality | REST API integrated into electronic health record systems or teledermatology platforms, or used via a web application. | Same | Same integration modalities were available in the legacy predecessor and are maintained in the current version. |
| Deployment topology | Multi-tenant cloud deployment with per-tenant logical isolation. | Same | Same deployment topology. The migration to a microservices backend (Bucket A §1) is an internal scalability optimisation and is not exposed to the user or the integrator; it does not change how the device is deployed. |
| Authentication | API keys and OAuth 2.0 client-credentials flow (machine-to-machine). | Same | Identical authentication mechanisms in both versions. |
| Update and versioning | Versioned API with backwards-compatible deployment; manufacturer-controlled updates under the QMS. | Same | The update and versioning mechanism is identical in both versions and governed by the same quality-management procedures. |
Criterion 6: Principles of operation
| Element | The device | Legacy predecessor | Verification basis |
|---|---|---|---|
| Preparation for use | Log-in via browser or connected system; image acquisition per the published image-quality guidance. | Same | The preparation workflow is specified identically in the IFUs of both versions. |
| Technique | Capture of lesion images, transmission to the device, and processing by the AI inference module. | Same | The end-to-end user-side processing chain is unchanged between the two versions. |
| Mode of action | The software processes image input and returns lesion classification, severity scores, and malignancy-suspicion indicators. | Same (carried-forward outputs) | All outputs are carried forward from the legacy predecessor. The ICD-11 classifier output is verified by bit-level output-parity; the malignancy-surfacing safety indicators and the P₂=1 severity-prioritisation constraint are confirmed as equivalent capabilities present in the legacy predecessor and carried forward in the current version (Bucket B §5). |
| Duration of use | Episodic per consultation; non-continuous. | Same | The operational duration is identical in both versions. |
Criterion 7: Critical performance requirements
| Element | The device | Legacy predecessor | Verification basis |
|---|---|---|---|
| Acceptance thresholds | AUC ≥ 0.9 for malignancy suspicion, specificity ≥ 80%, sensitivity ≥ 75%, and the per-scale acceptance thresholds for the carried-forward severity scoring systems as defined in the Clinical Evaluation Plan (R-TF-015-001). | Same | Identical acceptance thresholds are inherited from the legacy predecessor and reaffirmed in the Clinical Evaluation Plan (R-TF-015-001) for the current version. |
| Measured performance: shared outputs | Meets or exceeds the acceptance thresholds on the pre-market evidence portfolio, evidenced by the published severity-validation manuscripts (APASI_2025, AUAS_2023, AIHS4_2023, and ASCORAD_2022; MDCG 2020-6 Appendix III Rank 6, MDCG 2020-1 Pillar 2 Technical Performance evidence) and by the NMSC_2025 Clinical Performance manuscript, supported by the MRMC simulated-use reader studies (BI_2024, PH_2024, SAN_2024, MAN_2025; MDCG 2020-6 Appendix III Rank 11, supportive of clinical performance rather than primary clinical data). | Met the same acceptance thresholds during legacy commercial deployment, corroborated (not replaced) by the protocolled post-market observational study of the equivalent legacy device (R-TF-015-012; MDCG 2020-6 Appendix III Rank 8 primary classification for both quantitative and professional-opinion items, with a supplementary Rank 4 case retained for the quantitative endpoints) and by the accumulated post-market clinical experience across four years of deployment. | Because the core classifier weights are frozen and bit-level engineering output-parity is verified against the legacy predecessor's reference outputs (in the AI verification and validation records maintained under GP-028 AI Development), the measured analytical performance on the shared ICD-11 categories and carried-forward severity scoring systems is mathematically identical between the two versions. Performance against the thresholds is independently evidenced on each side: the pre-market portfolio for the current version, the post-market observational data for the legacy predecessor. |
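As a minimal sketch of how measured results could be checked against the Criterion 7 thresholds, the code below computes AUC as the Mann-Whitney probability that a positive case outscores a negative one, plus sensitivity and specificity at a fixed operating point. The scores, labels, and 0.5 operating point are invented for illustration and do not come from any study in this CER.

```python
# Minimal sketch: check malignancy-suspicion metrics against the acceptance
# thresholds in the table (AUC >= 0.9, specificity >= 80%, sensitivity >= 75%).
# Scores, labels and the 0.5 operating point are hypothetical.

def auc_mann_whitney(scores, labels):
    """AUC as P(score_pos > score_neg), counting ties as 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def sens_spec(scores, labels, threshold):
    """Sensitivity and specificity when score >= threshold means 'positive'."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < threshold)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < threshold)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)

THRESHOLDS = {"auc": 0.90, "specificity": 0.80, "sensitivity": 0.75}

# Invented data: 1 = malignant, 0 = benign
scores = [0.95, 0.90, 0.80, 0.70, 0.40, 0.60, 0.30, 0.20, 0.10, 0.05]
labels = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]

auc = auc_mann_whitney(scores, labels)
sens, spec = sens_spec(scores, labels, threshold=0.5)
results = {"auc": auc, "sensitivity": sens, "specificity": spec}
meets = {k: results[k] >= THRESHOLDS[k] for k in THRESHOLDS}
print(results, meets)
```

AUC is threshold-free, whereas sensitivity and specificity depend on the chosen operating point, which is why the thresholds are checked as separate criteria.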
Technical equivalence conclusion
Technical equivalence between the device and the legacy predecessor is established by the combined evidence across the seven MDR Annex XIV Part A (3) criteria: the two versions share the same design, conditions of use, specifications (including the frozen core classifier and the carried-forward severity scoring systems), properties, deployment methods, principles of operation, and acceptance thresholds for critical performance. Verification of sameness is grounded in the continuity of the design baseline under the same manufacturer's design-control process and, for the algorithmic outputs, in bit-level engineering output-parity testing for items 5 (ICD-11 classifier) and 8 (DIQA thresholds), and in specification-refinement equivalence for items 6 (malignancy-surfacing safety indicators) and 7 (clinical sign measurement models), where no new capability type is introduced and clinical-performance non-divergence is established in the per-change assessment. On this basis the two devices are technically equivalent for the purpose of leveraging the legacy predecessor's clinical data under MDCG 2020-5 and MDR Article 61(5)–(6).
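Bit-level output-parity is a stricter test than agreement within a numerical tolerance: every output must have exactly the same binary representation. A minimal sketch follows, assuming outputs are serialised as little-endian IEEE 754 float32 values; the actual verification procedure is held in the AI verification and validation records, and the function names here are hypothetical. If outputs are bit-identical on a common test set, any metric computed from them is necessarily identical as well.

```python
# Minimal sketch of a bit-level output-parity check: outputs from two
# versions are compared byte for byte after serialising each value as
# IEEE 754 float32, rather than within a numerical tolerance.
# Function names are hypothetical.
import hashlib
import struct

def to_float32_bytes(outputs):
    """Pack a sequence of floats as little-endian float32 bytes."""
    return struct.pack(f"<{len(outputs)}f", *outputs)

def bit_identical(reference, candidate):
    """True only if every output has exactly the same float32 bit pattern."""
    return to_float32_bytes(reference) == to_float32_bytes(candidate)

def digest(outputs):
    """SHA-256 fingerprint, convenient for storing reference outputs."""
    return hashlib.sha256(to_float32_bytes(outputs)).hexdigest()

legacy_outputs = [0.71, 0.12, 0.09, 0.05, 0.03]
current_outputs = [0.71, 0.12, 0.09, 0.05, 0.03]
perturbed = [0.71, 0.12, 0.09, 0.05, 0.0300001]  # tiny numerical drift

print(bit_identical(legacy_outputs, current_outputs))     # True
print(bit_identical(legacy_outputs, perturbed))           # False
print(digest(legacy_outputs) == digest(current_outputs))  # True
```

A tolerance-based comparison would accept the perturbed vector; the bit-level check rejects it, which is what makes "identical outputs" a verifiable claim rather than an approximate one.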
Clinical equivalence
MDR 2017/745 (Annex XIV Part A (3)) states that, for devices to be determined clinically equivalent, they must be used for the same clinical condition or purpose, at the same site in the body, and in a similar patient population; have the same kind of user; and have similar relevant critical performance in view of the expected clinical effect for a specific intended purpose. The following table compares the clinical characteristics of the device and its legacy predecessor.
| Clinical Characteristics | The device | Legacy predecessor | Comparison |
|---|---|---|---|
| Clinical Condition | Wide range of dermatological conditions (e.g., melanoma, acne, psoriasis, GPP) | Same range of dermatological conditions | Equivalent: both cover the same diagnostic scope based on clinical image analysis. |
| Intended purpose | Support clinical evaluation and monitoring by quantifying signs in dermatological images | Same intended purpose | Equivalent: both designed for AI-assisted dermatological evaluation. |
| Site in the body | Skin (cutaneous surface, including localized or generalized conditions) | Same | Equivalent: both focus on visible skin lesions and signs. |
| Patient population | General population | Same | Equivalent: both target a broad patient population. |
| Type of user | Healthcare professionals (e.g., GPs, dermatologists) and IT professionals | Same | Equivalent: both are intended for HCPs and IT professionals. |
| Critical Performance in view of the expected clinical effect | Accurate identification and quantification of clinical signs; decision support for diagnosis and monitoring | Same capabilities, based on the shared core algorithm (frozen weights verified by bit-level output-parity). | Equivalent for all carried-forward functions (early detection, severity scoring, and monitoring across the shared indication set), with validated performance based on clinical and preclinical data. |
Clinical equivalence conclusion
The clinical equivalence between the device and its legacy predecessor is supported by their identical intended purpose, clinical indications, target patient population, and type of user. Both devices are designed to assist healthcare professionals in the evaluation and monitoring of dermatological conditions through the analysis of clinical images using artificial intelligence algorithms. They address the same range of dermatological conditions, target the same anatomical site (skin), and are intended for use by the same type of qualified users. Furthermore, both rely on the same core algorithm and shared software framework, so the clinical performance, including diagnostic accuracy and decision support across the shared indication set, remains consistent between versions. On this basis the device is clinically equivalent to its legacy predecessor.
Biological equivalence
MDR 2017/745 (Annex XIV Part A (3)) states that in order to be determined as biologically equivalent, devices must use the same materials or substances in contact with the same human tissues or body fluids for a similar kind and duration of contact and similar release characteristics of substances, including degradation products and leachables.
In this case, biological equivalence is not applicable because the device is a software-only medical device and does not have any direct or indirect contact with the human body, tissues, or fluids. The device functions through the analysis of dermatological images captured externally, typically via smartphone cameras, and does not involve any material components that would pose a biological interaction or release of substances. Therefore, there is no biological interface that could give rise to toxicological or immunological concerns, and the requirement to establish biological equivalence is not relevant for this device category.
Conclusions regarding equivalence
The device and the legacy predecessor device have been shown to be equivalent with respect to clinical and technical characteristics, as outlined in the corresponding equivalence tables. The two software versions share the same intended purpose, target population, type of user, core algorithm, software architecture, and performance objectives for the carried-forward functions. There are no changes in the clinical condition addressed or the fundamental principles of operation. Because both products are developed by the same manufacturer, there is full access to the design, technical documentation, and performance data of both devices, satisfying MDR Article 61(5)(b).
MDR Article 61(5)–(6) access condition. The manufacturer of the device under evaluation is also the manufacturer of the legacy predecessor; the data-access condition of MDR Article 61(5)(b) is therefore satisfied. Because equivalence is claimed to the manufacturer's own predecessor and not to a third-party device, no contract with a third-party manufacturer is required.
Status of the equivalent device's clinical evaluation under MDR 2017/745. The legacy predecessor's original CE marking and clinical evaluation were performed under Directive 93/42/EEC (MDD), with the legacy predecessor classified as Class I, and not under Regulation (EU) 2017/745 (MDR). The legacy clinical evidence is used within the present MDR clinical evaluation only through the MDCG 2020-5 equivalence framework and MDR Article 61(5)–(6); the MDR-level evidence requirement for the device under evaluation is met by the combined pre-market portfolio (six prospective pivotal clinical investigations, three MRMC Rank 11 simulated-use reader studies, one retrospective third-party analysis, the Fitzpatrick V–VI MAN_2025 reader study, and the four published peer-reviewed severity-validation studies) together with post-market evidence from the equivalent legacy predecessor (R-TF-015-012 — Rank 4 quantitative outcomes and Rank 8 professional-opinion data per MDCG 2020-6 Appendix III — and passive PMS Rank 7) leveraged via equivalence. The legacy MDD Technical File is maintained separately from the MDR Technical Documentation for the device under evaluation in accordance with MDR Article 120.
The differences introduced in the device under evaluation compared to the legacy predecessor version (1.0.0.0) are categorised into two buckets per MDCG 2020-5 §A2.1:
Bucket A: architectural / deployment refactor items with no clinical pathway
Covered by equivalence.
- Migration to a microservices architecture: backend orchestration change to improve server scalability and response times under high load.
- Implementation of the HL7 FHIR standard: output-serialisation and EHR-interoperability layer to ensure standardised, secure interoperability with hospital Electronic Health Record (EHR) systems.
- Database encryption and cybersecurity upgrades: data-at-rest and in-transit security controls updated to meet state-of-the-art cybersecurity requirements.
- Enhanced user interface (UI) feedback: clearer prompts and error messages when image quality is insufficient (DIQA), mitigating usability risks.
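Item 4 (and, further below, the Bucket B DIQA item) depends on separating the user-facing prompt from the underlying rejection decision. The following minimal Python sketch illustrates that separation. It assumes a scalar image-quality score in [0, 1] and a hypothetical threshold of 0.5. Neither the score scale, the threshold value, nor the function name `diqa_gate` comes from the device's specification. The accept/reject decision depends only on the score and the threshold. The prompt is a separate field, so rewording it cannot change which images are rejected.

```python
from dataclasses import dataclass

# Hypothetical threshold for illustration only; the device's actual DIQA
# thresholds are defined in the technical documentation.
DIQA_THRESHOLD = 0.5


@dataclass(frozen=True)
class GateResult:
    accepted: bool  # the clinical safety decision
    message: str    # user-facing prompt (presentation only)


def diqa_gate(quality_score: float, threshold: float = DIQA_THRESHOLD) -> GateResult:
    """Reject images whose (hypothetical) quality score falls below the threshold.

    The accept/reject outcome depends only on quality_score and threshold;
    the message text can be reworded without changing that outcome.
    """
    if not 0.0 <= quality_score <= 1.0:
        raise ValueError("quality_score must lie in [0, 1]")
    if quality_score < threshold:
        return GateResult(
            accepted=False,
            message="Image quality is insufficient: please retake the photo "
                    "with better focus and lighting.",
        )
    return GateResult(accepted=True, message="Image accepted for analysis.")
```

For example, `diqa_gate(0.3).accepted` is `False`. Changing the wording of the rejection message leaves that result unchanged.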
Bucket B: algorithmically equivalent features
Items 5 (ICD-11 classifier) and 8 (DIQA thresholds) are verified by bit-level engineering output-parity against the legacy predecessor's reference outputs. Items 6 (malignancy-surfacing safety indicators) and 7 (clinical sign measurement models) are specification refinements that introduce no new capability type. Their clinical equivalence is established by the absence of clinical-performance divergence on the shared indication set, as demonstrated in the per-change assessment below.
- AI model — ICD-11 classifier. Same core Vision Transformer (ViT) architecture and frozen weights carried forward; same 346-ICD-11 classifier output. Engineering bit-level output-parity verified against the legacy predecessor's reference outputs on a common test set.
- Malignancy-surfacing safety indicators. Same binary malignancy-surfacing capability as the legacy predecessor, refined and decomposed into six named binary indicators; no new capability type introduced. The P₂=1 architectural safety constraint is an inherited safety control present in the legacy predecessor and carried forward unchanged, as documented in R-TF-028-011.
- Clinical sign measurement models. Same clinical sign quantification capability (erythema, induration, scaling, and related signs) as the legacy predecessor, refined; outputs feed downstream severity scores such as PASI, SCORAD, IHS4 and SALT.
- DIQA thresholds. Upstream input-safety gate thresholds re-applied unchanged from the legacy predecessor; same below-quality-image rejection behaviour. Bit-level output-parity verified.
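The bit-level output-parity verification cited for items 5 and 8 can be illustrated with the following minimal Python sketch. It assumes that each test case yields a vector of floating-point outputs (for example, the ICD-11 probability distribution) and that the legacy reference outputs are archived. The function names are hypothetical. Unlike a tolerance-based comparison such as `math.isclose`, this check flags any difference in the binary representation, including `0.0` versus `-0.0`.

```python
import hashlib
import struct
from typing import Sequence


def output_digest(values: Sequence[float]) -> str:
    """SHA-256 of the exact IEEE-754 double encoding of one output vector.

    Packing as 64-bit doubles is lossless for float32 model outputs, so two
    vectors share a digest (barring hash collisions) only if every element
    is bit-identical.
    """
    packed = struct.pack(f"<{len(values)}d", *values)
    return hashlib.sha256(packed).hexdigest()


def bit_level_parity(reference: Sequence[Sequence[float]],
                     candidate: Sequence[Sequence[float]]) -> list:
    """Return the indices of test cases whose outputs are not bit-identical.

    An empty list means full output-parity on the common test set.
    """
    if len(reference) != len(candidate):
        raise ValueError("reference and candidate must cover the same test set")
    return [
        i
        for i, (ref, cand) in enumerate(zip(reference, candidate))
        if output_digest(ref) != output_digest(cand)
    ]
```

Storing only the reference digests, rather than the raw legacy outputs, is one design option that keeps the archived evidence compact while still supporting exact comparison.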
Per-change clinical-relevance assessment
Each of the eight changes is individually assessed against MDR Annex XIV Part A §3, confirming that none introduces a difference that adversely affects clinical safety or performance. The assessment references the verification and validation evidence held in the technical documentation.
| Change | Mechanism | Clinical-relevance assessment | Supporting evidence |
|---|---|---|---|
| 1. Migration to microservices architecture | Back-end orchestration change; AI model weights, input-processing chain, and output schema unchanged | No impact on clinical inference: the mathematical inference chain that processes pixels and generates the probability distribution and clinical-sign measurements is identical to the legacy predecessor. Inference output remains within the validated technical-performance envelope; legacy clinical evidence applies. | Software verification and validation records per EN IEC 62304 (in the technical documentation); regression test results against the legacy output schema |
| 2. Implementation of HL7 FHIR | Output-serialisation and EHR-interoperability layer | Communication-layer change only: report content rendered to the healthcare professional is identical to the legacy device; the FHIR adaptation does not alter the ICD-11 probability distribution, clinical-sign values, binary safety indicators, or explainability media. Legacy clinical evidence applies. | Interoperability V&V per IEC 82304-1 and EN 82304-2; IFU integration-guidance validation |
| 3. Database encryption and cybersecurity upgrades | Data-at-rest and in-transit security controls | No effect on clinical inference or on any output reaching the healthcare professional; the change strengthens information-security risk mitigation without altering the clinical inference chain. Legacy clinical evidence applies. | Cybersecurity verification per IEC 81001-5-1:2021 and GP-030 Security; patient-safety risk register R-TF-013-002 |
| 4. Enhanced user-interface feedback | User-facing prompts that inform the healthcare professional when image quality is insufficient (DIQA) | The only user-facing change among the four Bucket A items; it mitigates usability risk AI-RISK-021 (model outputs not interpretable by clinical users). The post-change user interface was validated on the device under evaluation by a summative usability study compliant with EN 62366-1 §5.9 and FDA HFE guidance (2016). | Summative Evaluation Report R-TF-025-007 (36 participants, SUS 82.5 HCPs / 85.2 ITPs); risk register R-TF-013-002 AI-RISK-021 residual-likelihood entry |
| 5. AI model — ICD-11 classifier (Bucket B) | Same architecture and frozen weights carried forward; no retraining | The ICD-11 classifier output is mathematically identical to the legacy predecessor, verified by bit-level engineering output-parity on a common test set. Legacy clinical evidence applies in full. | AI verification and validation records (GP-028 AI Development, R-TF-028-005); bit-level output-parity test results |
| 6. Malignancy-surfacing safety indicators (Bucket B) | Refinement and decomposition of the legacy predecessor's binary malignancy-surfacing capability into six named indicators; P₂=1 constraint carried forward unchanged | Same capability set as the legacy predecessor; no new capability type introduced. The P₂=1 architectural safety constraint is an inherited safety control present in the legacy predecessor and carried forward unchanged (R-TF-028-011). No adverse clinical-performance divergence on the shared indication set. Legacy clinical evidence is supplementary; equivalence rests on the specification-refinement basis defined for Bucket B above. | R-TF-013-002 risk controls R-BDR/R-HBD/R-SKK; R-TF-028-011 AI/ML Risk Assessment (P₂=1 constraint); per-model V&V records |
| 7. Clinical sign measurement models (Bucket B) | Same clinical sign quantification capability as the legacy predecessor (erythema, induration, scaling, and related signs), refined; outputs feed downstream severity scales | Same measurement outputs feeding the same downstream severity scales (PASI, SCORAD, IHS4, SALT) as the legacy predecessor. No new measurement capability type introduced. No adverse clinical-performance divergence on the shared indication set. Legacy clinical evidence is supplementary; equivalence rests on the specification-refinement basis defined for Bucket B above. | Per-model V&V records; severity-scale validation publications (APASI_2025, AUAS_2023, AIHS4_2023, ASCORAD_2022) |
| 8. DIQA thresholds (Bucket B) | Upstream input-safety gate thresholds re-applied unchanged from the legacy predecessor | Same below-quality-image rejection behaviour as the legacy predecessor; the threshold re-application preserves the clinical safety gate without alteration. Distinct from Bucket A item 4 (DIQA UI-feedback prompts). Legacy clinical evidence applies. | DIQA V&V records; R-TF-013-002 AI-RISK-021 entry |
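Item 1 cites regression tests against the legacy output schema. The following minimal Python sketch shows one way such a structural check could work. It assumes report payloads are JSON-like nested dictionaries. The field names in the example are hypothetical and do not describe the device's actual output schema. The check detects removed, added, or re-typed fields only. Equality of the output values themselves is the subject of the bit-level output-parity verification.

```python
from collections.abc import Mapping
from typing import Any


def schema_of(payload: Mapping, prefix: str = "") -> dict:
    """Flatten a nested JSON-like payload into {dotted.field.path: type name}."""
    schema = {}
    for key, value in payload.items():
        path = f"{prefix}{key}"
        if isinstance(value, Mapping):
            schema.update(schema_of(value, prefix=f"{path}."))
        else:
            schema[path] = type(value).__name__
    return schema


def schema_regressions(legacy: Mapping, current: Mapping) -> list:
    """List fields removed, added, or re-typed relative to the legacy payload."""
    old, new = schema_of(legacy), schema_of(current)
    issues = [f"removed: {p}" for p in sorted(old.keys() - new.keys())]
    issues += [f"added: {p}" for p in sorted(new.keys() - old.keys())]
    issues += [
        f"type changed: {p} ({old[p]} -> {new[p]})"
        for p in sorted(old.keys() & new.keys())
        if old[p] != new[p]
    ]
    return issues


# Hypothetical legacy payload, for illustration only.
LEGACY_EXAMPLE: dict = {
    "icd11": {"top_code": "code_a", "probabilities": [0.7, 0.3]},
    "signs": {"erythema": 2.0},
}
```

For example, if a refactor silently emitted `erythema` as an integer, `schema_regressions` would report `type changed: signs.erythema (float -> int)`.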
The Critical Performance Requirements row in the technical equivalence table above asserts equivalent numerical thresholds (AUC ≥ 0.9 for malignancy, specificity ≥ 80%, sensitivity ≥ 75%) between the device and the legacy predecessor. The supporting legacy-predecessor performance evidence is summarised in this CER at the LEGIT_MC_EVCDAO_2019 row of the table "Clinical data using the legacy version under MDD" and comprises the following published and archived figures: melanoma identification AUC 0.842 (95% CI 0.7629–0.9222); melanoma detection precision 0.81 (95% CI 0.6555–0.9378); melanoma sensitivity > 0.90 (95% CI 0.8836–0.9805); melanoma specificity > 0.80 (95% CI 0.6941–0.9254); malignancy detection AUC 0.8983 (95% CI 0.8430–0.9438); malignancy detection sensitivity 0.81 (95% CI 0.7175–0.8839); malignancy detection specificity 0.86 (95% CI 0.7723–0.9388); positive predictive value 0.92 (95% CI 0.8556–0.9708); negative predictive value 0.68 (95% CI 0.5427–0.8077); Top-5 multi-skin-lesion recognition 0.88 (95% CI 0.7990–0.9534). The full Clinical Investigation Report for the legacy predecessor study is archived under R-TF-015-006 (legacy record series) and is accessible to the clinical evaluation team per the same-manufacturer access described in section Conclusions regarding equivalence.
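To help reconcile the figures above, the following minimal Python sketch shows how sensitivity, specificity, PPV and NPV derive from a single 2×2 confusion matrix. It also shows how the sensitivity and specificity thresholds quoted in this section (≥ 75% and ≥ 80%) could be checked. The counts in the example are hypothetical and are not the LEGIT_MC_EVCDAO_2019 data. AUC is not covered, because it requires the continuous device scores rather than a single operating point. Unlike sensitivity and specificity, PPV and NPV also depend on the proportion of malignant cases in the cohort, so they should be interpreted together with the case mix.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    tp: int  # malignant cases flagged as malignant
    fn: int  # malignant cases not flagged
    tn: int  # benign cases correctly not flagged
    fp: int  # benign cases flagged as malignant

    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    def ppv(self) -> float:
        return self.tp / (self.tp + self.fp)

    def npv(self) -> float:
        return self.tn / (self.tn + self.fn)


def meets_thresholds(cm: ConfusionMatrix,
                     min_sensitivity: float = 0.75,
                     min_specificity: float = 0.80) -> bool:
    """Check the sensitivity and specificity acceptance thresholds quoted above."""
    return cm.sensitivity() >= min_sensitivity and cm.specificity() >= min_specificity


# Hypothetical counts: 100 malignant and 50 benign cases.
example = ConfusionMatrix(tp=80, fn=20, tn=45, fp=5)
```

With these hypothetical counts, sensitivity is 0.80, specificity is 0.90, PPV is 80/85 and NPV is 45/65, so both thresholds are met.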
Justification for Lack of Clinical Impact
None of these changes alters the core ICD-11 classifier weights, the Vision Transformer architecture, the clinical indications, or the fundamental principles of operation. The inference chain that processes the pixels and generates the ICD-11 probability distribution is identical to that validated in the legacy device, as verified by bit-level output-parity for the ICD-11 classifier and the DIQA thresholds (items 5 and 8). The refinements in items 6 and 7 introduce no new capability type and show no adverse clinical-performance divergence on the shared indication set. Therefore, these changes are not expected to negatively impact the clinical safety, clinical performance, or diagnostic accuracy of the device. Rather, they support conformity under the MDR by freezing the core functionality, improving overall system security, and maintaining the same clinical risk profile as the legacy version.
As a result of this demonstrated equivalence, previously generated clinical data for the legacy predecessor, collected under appropriate ethical and scientific standards, are considered applicable and valid to support the clinical evaluation of the device. This allows the clinical evaluation team to rely on the existing body of evidence to confirm the safety and performance of the device currently under assessment.
Justification for Additional Clinical Evidence versus the Legacy Device
The device under evaluation is an evolution of the legacy device, which was CE-marked under the previous Medical Devices Directive 93/42/EEC (MDD) and classified as Class I. While technical and functional continuity exists with the legacy device, the transition to the new regulatory framework, Regulation (EU) 2017/745 (MDR), introduces significantly more stringent requirements that directly impact the clinical evaluation strategy. The justification for generating new clinical evidence is based on two primary regulatory pillars:
1. Change in risk classification and increased level of evidence required
Under the MDR framework, and in accordance with the classification rules stipulated in Annex VIII, the device has been reclassified as Class IIb.
This reclassification (from Class I under MDD to Class IIb under MDR) reflects a higher risk profile recognized by the new regulation. Consequently, Article 61(1) of the MDR mandates a clinical evaluation and a level of clinical evidence that are proportionate and appropriate to this higher risk class. The clinical documentation and data compiled for the Class I legacy device are not, in themselves, sufficient to satisfy the level of scrutiny required for a Class IIb device.
2. Conformity with the General Safety and Performance Requirements (GSPR)
The MDR replaces the "Essential Requirements" (ERs) of the MDD with the General Safety and Performance Requirements (GSPRs), detailed in Annex I of the MDR. The GSPRs are more detailed, prescriptive, and demanding, particularly regarding clinical validation, risk management, and usability. For example, the GSPRs require a more robust quantification of clinical benefits (GSPR 1) and set specific requirements for software validation (GSPR 17). Neither was defined with the same rigour under the MDD.
Conclusion
Although data from the legacy device are used as fundamental supporting evidence, these data alone leave an "evidence gap" when measured against the requirements of the MDR. Therefore, a specific clinical validation plan was designed and implemented for the device. The objective of this prospective clinical data collection was to:
- Demonstrate conformity with the applicable GSPRs of Annex I of the MDR, which were not sufficiently covered by the legacy device's evaluation.
- Provide the robust level of clinical evidence (per Article 61) necessary to confirm the safety profile and clinical benefit of the device under its new Class IIb classification.
- Validate the performance within the context of its updated Intended Purpose under the MDR.
The clinical evidence resulting from these new validations is analysed in detail in Section "Pre-market clinical investigations" of this report.
Regulatory approach to the legacy and the current device technical documentation
To ensure regulatory clarity and maintain the integrity of the conformity assessment process, the Technical Documentation for the device (MDR) is managed as a standalone dossier, entirely separate from the Technical File of the "legacy" device (MDD). This separation is required by the substantial differences between the regulatory frameworks. The legacy file demonstrates compliance with the Essential Requirements of the MDD (93/42/EEC). The new Technical Documentation, by contrast, must demonstrate conformity with the General Safety and Performance Requirements (GSPRs) in Annex I of Regulation (EU) 2017/745 (MDR), using the structure defined in Annexes II and III.
Furthermore, the device file includes new clinical evidence generated to support its reclassification from Class I (MDD) to Class IIb (MDR). The legacy Technical File will be maintained independently to support the existing MDD certificate (per MDR Article 120). The device documentation constitutes the complete and distinct body of evidence submitted for the new MDR certification. This independent management of both files will be strictly maintained at a minimum until the device has successfully completed its conformity assessment and received MDR certification.
Clinical data generated and held by the manufacturer
Relevant preclinical data
The manufacturer complies with the following standards, which are applied in design verification activities.
| Identification of the Standard | Domain | Compliance information | Description of deviations | Evidence |
|---|---|---|---|---|
| ISO 13485:2016 | Medical devices - Quality management systems. Requirements for regulatory purposes | Full application | None | BSI Certification ISO 13485 |
| IEC 62304:2006/A1:2015 | Medical device software - Software life cycle processes | Full application | None | R-TF-001-005 List of applicable standards and regulations |
| IEC 82304-1:2016 | Health software: Part 1: General requirements for product safety | Full application | None | R-TF-001-005 List of applicable standards and regulations |
| ISO 14155:2020 | Clinical investigation of medical devices for human subjects - Good clinical practice | Full application | None | R-TF-001-005 List of applicable standards and regulations |
| ISO 14971:2019 | Medical devices - Application of risk management to medical devices | Full application | None | R-TF-001-005 List of applicable standards and regulations |
| ISO 15223-1:2021 | Medical devices - Symbols to be used with medical device labels, labelling and information to be supplied | Full application | None | R-TF-001-005 List of applicable standards and regulations |
| ISO/TR 24971:2020 | Medical devices - Guidance on the application of ISO 14971 | Full application | None | R-TF-001-005 List of applicable standards and regulations |
| IEC 62366-1:2015/A1:2020 | Medical devices - Part 1: Application of usability engineering to medical devices | Full application | None | R-TF-001-005 List of applicable standards and regulations |
| IEC 81001-5-1:2021 | Health software and health IT systems safety, effectiveness and security: Part 5-1: Security: Activities in the product life cycle | Full application | None | R-TF-001-005 List of applicable standards and regulations |
| ISO/IEC 27001:2022 | Information security, cybersecurity and privacy protection: Information security management systems: Requirements | Partial application | We comply only with the applicable part of the standard | R-TF-001-005 List of applicable standards and regulations |
| ISO/IEC 27002:2022 | Information security, cybersecurity and privacy protection: Information security controls | Partial application | We comply only with the applicable part of the standard | R-TF-001-005 List of applicable standards and regulations |
| FDA GMLP 2021 | Good machine learning practice for MD development: guiding principles | Full application | None | R-TF-001-005 List of applicable standards and regulations |
| FDA AI/ML Framework 2019 | Proposed regulatory framework for modifications to AI/ML-based SaMD | Full application | None | R-TF-001-005 List of applicable standards and regulations |
All proof of compliance with these standards, which constitutes a preclinical data set, is available in the technical file. The assessment of compliance with these standards is not part of this clinical evaluation. It is nevertheless stated here that compliance with these standards, and in particular with those harmonised under the MDR, supports the presumption of conformity with GSPR 1.
Pre-market clinical investigations
As described in the CEP (available in R-TF-015-001 Clinical Evaluation Plan), the manufacturer conducted a pre-clinical phase to develop and evaluate the Artificial Intelligence algorithms deployed in the device in order to ensure their accuracy, robustness, reliability, and cybersecurity in line with the intended medical purpose. The pre-market investigation portfolio then comprises six prospective pivotal clinical investigations conducted with the frozen version of the device in real clinical settings (MC_EVCDAO_2019, COVIDX_EVCDAO_2022, DAO_Derivación_O_2022, DAO_Derivación_PH_2022, IDEI_2023, NMSC_2025), three multi-reader multi-case (MRMC) simulated-use reader studies with healthcare professionals (BI_2024, PH_2024, SAN_2024), one retrospective third-party analysis (AIHS4_2025), and one Fitzpatrick V–VI MRMC reader study (MAN_2025). An additional retrospective diagnostic-accuracy cohort of the legacy predecessor device (LEGIT_MC_EVCDAO_2019) is leveraged via the MDCG 2020-5 equivalence framework to support the AI algorithm's core functionality carried forward into the device. More details about each investigation are available in its respective report within the series of documents R-TF-015-006.
All investigations summarised below were conducted pre-market to support the initial MDR certification of the device. Post-market real-world evidence derived from the legacy predecessor device under MDD is presented separately in section Clinical data generated from risk management and PMS activities and is underpinned by R-TF-007-001 (PMS Plan) and R-TF-007-002 (PMCF Plan).
Device version tested. The prospective pivotal investigations with the frozen version of the device used the MDR-certification-frozen software build of the device under evaluation (version 1.1.0.0, as identified in the product technical documentation). The legacy-predecessor cohort LEGIT_MC_EVCDAO_2019 used the MDD-certified legacy software version 1.0.0.0 marketed between 2020 and the cut-over to the frozen version. A summary of algorithmic and functional changes between the two versions — and the clinical-relevance assessment of each change — is presented in section Conclusions regarding equivalence and section Per-change clinical-relevance assessment of this CER.
Clinical data using the legacy version under MDD
As part of the clinical evaluation of the device, relevant clinical data from a previous version of the device have been considered. This version was developed and tested under the MDD framework and shares the same intended purpose, mode of action, and core algorithmic structure as the current MDR-certified version.
| Reference of the study | Patients - Clinical condition | Main safety outcomes | Main performance outcomes |
|---|---|---|---|
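| LEGIT_MC_EVCDAO_2019 Prospective, observational and cross-sectional study Weighting from appraisal: 10 | 105 patients included. Sex: 53 men (51%) and 52 women (49%). Age: 62 ± 15 years. Phototype: I 87.13%, II 9.77%, III 2.48%, IV 0.62%. Indications: lesions suspected of malignancy. | No adverse event, side effect, or device deficiency was reported during this study. | 105 patients with lesions suspected of malignancy were selected to carry out the study and to validate the capability of the legacy device to detect cutaneous melanoma in dermoscopic images. The device achieved the following results: melanoma identification AUC 0.842 (95% CI 0.7629–0.9222); melanoma detection precision 0.81 (95% CI 0.6555–0.9378); melanoma sensitivity > 0.90 (95% CI 0.8836–0.9805); melanoma specificity > 0.80 (95% CI 0.6941–0.9254); malignancy detection AUC 0.8983 (95% CI 0.8430–0.9438); malignancy detection sensitivity 0.81 (95% CI 0.7175–0.8839); malignancy detection specificity 0.86 (95% CI 0.7723–0.9388); positive predictive value 0.92 (95% CI 0.8556–0.9708); negative predictive value 0.68 (95% CI 0.5427–0.8077); Top-5 multi-skin-lesion recognition 0.88 (95% CI 0.7990–0.9534). The study demonstrated high diagnostic performance of the legacy device's AI algorithm. All predefined performance thresholds were met or exceeded. These results support the core functionality and intended use of the MDR-certified device. |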
Clinical data using the frozen version of the device under MDR
The following pivotal studies were conducted with the frozen version of the device under evaluation in line with the current intended purpose and functionality. These studies provide essential evidence of clinical performance, diagnostic support capability, referral optimization, and usability across dermatology and primary care.
| Reference of the study | Patients - Clinical condition | Main safety outcomes | Main performance outcomes |
|---|---|---|---|
| Legit.Health AIHS4 2025 Retrospective, observational, longitudinal and pivotal study Weighting from appraisal: 8.5 | 2 patients affected by Hidradenitis Suppurativa included. | No adverse event, side effect, or device deficiency was reported during this study. | In this study, the severity of Hidradenitis Suppurativa in 2 patients was evaluated at consecutive visits with the device and compared with expert dermatologists and the gold standard. The following results were obtained: This study demonstrated that the device is a useful tool in the severity measurement of HS. Limitation: although the results are promising, the small sample size (2 patients, 16 observations) limits generalisability. To address this, a larger confirmatory study with a minimum of 100 patients will be conducted as part of the Post-Market Clinical Follow-up (PMCF) Plan to validate these findings. |
| LEGIT.HEALTH_BI_2024 Prospective observational analytical, cross-sectional and pivotal study Weighting from appraisal: 8.5 | 100 images of patients with dermatological conditions included. Sex: 64 men (64%) and 37 women (37%). Age: 3 patients (1 month to 2 years); 14 patients (2 to 12 years); 20 patients (13 to 20 years); 22 patients (≥ 22 and < 65 years); 12 patients (over 65 years). Phototype: I 20%, II 43%, III 22%, IV 9%, V 6%. Indications: Multiple skin conditions representative of routine clinical practice. | No adverse event, side effect, or device deficiency was reported during this study. | Images from 100 patients with different skin conditions were analysed by PCPs and dermatologists, first unaided and then aided by the medical device. The following results were achieved: The study demonstrated the utility of the device as a diagnostic support tool for all HCP tiers in the diagnosis of different skin conditions (for the complete results, see the Clinical Investigation Report). |
| LEGIT_COVIDX_EVCDAO_2022 Prospective, observational, analytical, single-centre and pivotal study Weighting from appraisal: 6.5 | 160 patients with different skin conditions were included, and 6 dermatologists participated in the study and completed the Clinical Utility Questionnaire (CUS). | No adverse event, side effect, or device deficiency was reported during this study. | In this study, the device achieved the following appraisals by the practitioners: This study provides evidence and data on specialists' perceptions of the use of the device in routine clinical practice. |
| LEGIT.HEALTH_DAO_Derivación_O_2022 Prospective, observational, analytical, multicentre and pivotal study of a longitudinal clinical case series Weighting from appraisal: 10 | 127 patients with different skin conditions were included; final analysis conducted with 117 patients (10 patients were excluded due to data quality issues that made their inclusion impossible in the final analysis). Sex: 46 men (36.22%) and 81 women (63.78%). Age: 60 ± 21 years. Phototype: I 67.66%, II 22.88%, III 7.46%, IV 1.5%, V 0.5%. Indications: Patients with skin lesions referred to the dermatology service of Cruces and Basurto Hospitals. | No adverse event, side effect, or device deficiency was reported during this study. | Initially, 127 patients with different skin lesions were included to validate the capability of the device to help in the referral process. However, 10 patients were excluded due to data quality issues that made their inclusion impossible in the final analysis, which was therefore conducted with 117 patients. The device achieved the following results: This study demonstrates how the use of the device in primary care can help the decision-making process to refer a patient to dermatological care. |
| LEGIT.HEALTH_DAO_Derivación_PH_2022 Prospective, observational, analytical and pivotal study Weighting from appraisal: 9 | 131 patients with different skin conditions were included. Phototype: I 48.33%, II 36.66%, III 12.23%, IV 2.23%, V 0.55%. | No adverse event, side effect, or device deficiency was reported during this study. | 131 patients representative of routine clinical practice were included in this study in order to assess whether the information provided by the device increases the true accuracy of healthcare professionals (HCPs) in the diagnosis of multiple dermatological conditions. The device achieved the following results: |
| Legit.Health_IDEI_2023 Prospective, observational and pivotal study with both longitudinal and retrospective case series Weighting from appraisal: 8.5 | 204 patients with different skin conditions (pigmented lesions or female androgenetic alopecia) were included. Sex: 56 men (27.5%) and 148 women (72.5%). Age: 54 ± 21 years. Phototype: I 63.39%, II 23.21%, III 12.5%, IV 0.9%. Indications: pigmented lesions; female androgenetic alopecia. | No adverse event, side effect, or device deficiency was reported during this study. | 204 patients were recruited in this study: 108 with pigmented lesions (76 retrospective and 32 prospective) and 96 with androgenetic alopecia (62 retrospective and 34 prospective). The device achieved the following results: |
| LEGIT.HEALTH_PH_2024 Prospective observational analytical, cross-sectional and pivotal study Weighting from appraisal: 8.5 | 30 images from patients with different skin conditions included. Sex: 14 men (47%) and 16 women (53%). Age: 1 patient (1 month to 2 years); 1 patient (2 to 12 years); 0 patients (13 to 20 years); 0 patients (≥ 22 and < 65 years); 28 patients (over 65 years). Phototype: I 33%, II 40%, III 23%, IV 3%. Indications: Multiple skin conditions representative of routine clinical practice. | No adverse event, side effect, or device deficiency was reported during this study. | Images from 30 patients with different skin conditions were analysed by 9 PCPs, first unaided and then aided by the medical device. The following results were achieved: The study demonstrated the utility of the device as a diagnostic support tool for PCPs in the diagnosis of different skin conditions (for the complete results, see the Clinical Investigation Report). |
| LEGIT.HEALTH_SAN_2024 Prospective observational analytical, cross-sectional and pivotal study Weighting from appraisal: 8.5 | 29 images of patients with dermatological conditions included. Sex: 17 men (59%) and 12 women (41%). Age: 0 patients (1 month to 2 years); 2 patients (2 to 12 years); 1 patient (13 to 20 years); 18 patients (≥ 22 and < 65 years); 4 patients (over 65 years). Phototype: I 42.82%, II 42.82%, III 7.16%, IV 3.6%, V 3.6%. Indications: Multiple skin conditions representative of routine clinical practice. | No adverse event, side effect, or device deficiency was reported during this study. | Images from 29 patients with different skin conditions were analysed by 10 PCPs and 6 dermatologists, first unaided and then aided by the medical device. The following results were achieved: The study demonstrated the utility of the device as a diagnostic support tool for all HCP tiers in the diagnosis of different skin conditions (for the complete results, see the Clinical Investigation Report). |
LEGIT.HEALTH_MAN_2025 Prospective, observational, multi-reader multi-case (MRMC) self-controlled, simulated-use reader investigation | 149 anonymised images representative of Fitzpatrick phototype V and VI presentations of 28 dermatological conditions, including 10 malignant cases (melanoma and basal cell carcinoma). Images derived from the source MRMC case pool (SAN_2024, BI_2024, PH_2024) through a controlled phototype-conversion step with per-image clinical-fidelity quality-control review. Phototype: V 100%. Indications: Reader panel: minimum of 5 healthcare professionals per CIP, drawn from the device's declared intended user groups — board-certified dermatologists or dermatology residents, board-certified primary care physicians or primary-care residents, and nurses with clinical responsibility for skin or wound assessment. No patient recruitment, no patient-identifiable data, no therapeutic or diagnostic intervention on any patient as a consequence of the investigation. | No adverse event, side effect, or device deficiency was observed during this investigation. The investigation is non-interventional and no patients are involved; foreseeable adverse events are documented in `R-TF-013-002` Risk Management Record. | Data collection completed and the dataset was locked on 17 April 2026. The primary endpoint is the paired difference in pooled top-1 diagnostic accuracy between the unassisted read (Stage 1) and the device-assisted read (Stage 2) on the curated Fitzpatrick V–VI image set, pre-specified at a minimum improvement of +10 percentage points and analysed by the two-sided McNemar test for paired proportions (α = 0.05, Wilson score and Newcombe hybrid-score 95% confidence intervals). Pre-specified secondary endpoints are (i) Stage 3 malignant-case referral sensitivity against atlas-labelled ground truth and (ii) case-level device malignancy-detection ROC AUC on the curated image set. 
Per-pathology and per-specialty estimates are pre-specified as exploratory, hypothesis-generating analyses; sensitivity analyses restricted to readers completing at least 50% and 100% of the 149 cases are pre-specified as robustness checks. The locked primary-endpoint estimate, the board-certified sensitivity subset, Stage 3 referral results, and the device malignancy-detection AUC are reported in the corresponding `R-TF-015-006 (MAN_2025)` Clinical Investigation Report. Under MDCG 2020-6 Appendix III this investigation constitutes Rank 11 simulated-use evidence; under MDCG 2020-1 §4.4 it contributes Pillar 3 Clinical Performance supporting evidence for Fitzpatrick V–VI generalisability, positioned below the Rank 2–4 prospective real-patient studies that carry the primary Pillar 3 weight. |
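The MAN_2025 primary analysis compares paired reads of the same cases with and without the device. A minimal Python sketch of that kind of analysis follows. It assumes the exact (binomial) form of the McNemar test, although the CIP may specify the asymptotic form. The counts are hypothetical and the function names are illustrative. The Newcombe hybrid-score interval for the paired difference is not reproduced, and the margin check uses the point estimate only. The locked results and the CIP's actual decision rule are in R-TF-015-006 (MAN_2025).

```python
"""Paired accuracy analysis sketch: unassisted (Stage 1) vs assisted (Stage 2).

Assumptions: exact (binomial) McNemar variant; hypothetical counts, NOT the
MAN_2025 results reported in R-TF-015-006 (MAN_2025).
"""
from math import comb, sqrt
from statistics import NormalDist


def mcnemar_exact_p(b: int, c: int) -> float:
    """Two-sided exact McNemar p-value from the discordant pair counts.

    b = reads correct only when unassisted; c = reads correct only when assisted.
    Under H0 each discordant pair is equally likely to fall either way.
    """
    n = b + c
    if n == 0:
        return 1.0
    tail = sum(comb(n, i) for i in range(min(b, c) + 1)) / 2**n
    return min(1.0, 2 * tail)


def wilson_interval(x: int, n: int, level: float = 0.95) -> tuple[float, float]:
    """Wilson score confidence interval for the single proportion x / n."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    p = x / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


def paired_summary(a: int, b: int, c: int, d: int, margin: float = 0.10) -> dict:
    """Summarise a paired 2x2 table of reads.

    a = correct in both stages, b = Stage 1 only, c = Stage 2 only, d = neither.
    margin = pre-specified minimum improvement (+10 percentage points).
    """
    n = a + b + c + d
    diff = (c - b) / n  # Stage 2 accuracy minus Stage 1 accuracy
    return {
        "acc_stage1": (a + b) / n,
        "acc_stage2": (a + c) / n,
        "ci_stage1": wilson_interval(a + b, n),
        "ci_stage2": wilson_interval(a + c, n),
        "diff": diff,
        "p_value": mcnemar_exact_p(b, c),
        "meets_margin": diff >= margin,  # point-estimate check only
    }


# Hypothetical example with 500 paired reads.
summary = paired_summary(a=200, b=20, c=110, d=170)
print(f"diff = {summary['diff']:+.1%}, p = {summary['p_value']:.2g}")
```

Only the discordant pairs (b and c) enter the McNemar statistic. Reads that are correct in both stages or wrong in both stages carry no information about the direction of the device effect.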
Methodological justifications and statistical adequacy
The totality of the pre-market portfolio provides a robust and scientifically valid basis for the clinical evaluation of the device. The portfolio comprises the following elements:
- Six real-world clinical investigations covering 729 real patients. Five are prospective investigations (MC_EVCDAO_2019: 105, COVIDX_EVCDAO_2022: 160, DAO_Derivación_O_2022: 127, DAO_Derivación_PH_2022: 131, IDEI_2023: 204). The sixth is the retrospective third-party cohort AIHS4_2025 (n = 2), which is included here for the cumulative-patient count.
- Three MRMC simulated-use reader studies covering 159 images read by 40 healthcare-professional readers (BI_2024: 100 images / 15 HCPs, PH_2024: 30 images / 9 HCPs, SAN_2024: 29 images / 16 HCPs).
- The Fitzpatrick V–VI MAN_2025 MRMC reader study (149 images / minimum 5 HCPs across the device's declared intended user groups).

MRMC reader studies provide MDCG 2020-1 Pillar 3 §4.4 supporting evidence at Rank 11 and do not constitute clinical data under MDR Article 2(48). In addition, several published manuscripts contribute supporting evidence in their respective pillars: the Clinical Performance manuscript NMSC_2025 (135 real patients, head and neck clinic) and the four Technical Performance manuscripts (APASI_2025, AUAS_2023, AIHS4_2023, ASCORAD_2022; over 3,300 annotated images in total). The following methodological justifications address the specific design choices across these investigations:
1. Image Quality and Exclusion Rationale
Across all studies, images were pre-screened for quality using the device's integrated Deep Image Quality Assessment (DIQA) algorithm. Excluding sub-standard images (e.g., those with poor focus, inadequate lighting, or excessive occlusion) is methodologically appropriate because it mirrors the device's real-world behavior. As specified in the Instructions for Use (IFU), the device rejects low-quality inputs and prompts the user to retake the photo, so such images never produce a clinical output in the field. Clinical performance metrics calculated on images that pass the quality check therefore represent the performance users obtain in routine use.
2. Coverage of Imaging Modalities
The study portfolio systematically evaluates the device's performance across both clinical (unmagnified) and dermatoscopic (magnified) imaging.
- Dermatoscopic imaging (e.g., MC_EVCDAO_2019, IDEI_2023) was utilized for investigations focused on malignant lesions (melanoma, BCC, SCC) where specialist dermoscopy is the gold-standard workflow.
- Clinical imaging (e.g., BI_2024, PH_2024, SAN_2024) was utilized for studies evaluating primary care triage, referral prioritization, and common inflammatory conditions (acne, psoriasis, dermatitis). This dual-modality approach ensures that the device's performance is validated for the specific workflows corresponding to its varied clinical indications.
3. Statistical Power and Sample Size Rationale
The sample sizes for all pre-market investigations were calculated to ensure sufficient statistical power to validate the primary endpoints.
- MC_EVCDAO_2019 (Legacy Study): Although the initial recruitment target was higher, the study was concluded with 105 patients because the prevalence of malignant cases (34.29% melanoma) significantly exceeded the initial 20% target. This enriched population ensured that the statistical power required to validate the primary safety endpoint (Sensitivity > 0.90 for melanoma) was maintained and met with high confidence.
- Aggregate Evidence: The cumulative dataset of more than 800 real patients across the study portfolio covers the intended range of dermatological conditions, user groups (PCPs and dermatologists), and clinical settings. This figure comprises 729 patients in the pre-market clinical investigations plus 135 in NMSC_2025. This body of evidence provides a high degree of certainty regarding the device's safety and performance claims.
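The precision of a sensitivity estimate depends on the number of diseased cases rather than on total enrolment. The arithmetic behind the MC_EVCDAO_2019 rationale can be sketched as follows. The 180-patient figure is only an illustrative equivalence, because the original recruitment target is not restated in this section.

```python
from fractions import Fraction
from math import ceil


def expected_positives(n_patients: int, prevalence: Fraction) -> Fraction:
    """Expected number of diseased cases (here: melanomas) in a cohort."""
    return n_patients * prevalence


def patients_for_positives(positives: int, prevalence: Fraction) -> int:
    """Cohort size needed to accrue a given number of positives at a prevalence."""
    return ceil(positives / prevalence)


observed = round(expected_positives(105, Fraction("0.3429")))  # 34.29% of 105
at_design = expected_positives(105, Fraction(20, 100))          # 20% design target
equivalent_n = patients_for_positives(observed, Fraction(20, 100))
print(observed, at_design, equivalent_n)  # 36 21 180
```

Exact fractions are used so that the percentage arithmetic does not pick up floating-point rounding.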
Clinical data generated from risk management and PMS activities
Complaints regarding the safety and performance of the evaluated device
Because the device under evaluation claims equivalence with the legacy device, the post-market experience of the legacy device is directly applicable to the safety evaluation.
Since its commercial introduction in 2020, the legacy device has been actively utilized in clinical settings. To date, the manufacturer has established 21 active contracts and generated over 250,000 clinical reports across a diverse range of dermatological conditions.
During this period, systematic Post-Market Surveillance (PMS) activities have been continuously conducted under the R-006-002 non-conformity, claims and communications registry. A thorough review of the PMS data across the full reporting period reveals:
- Zero MDR Article 87 serious incidents or reportable adverse events.
- Zero MDR Article 88 trend reports triggered.
- Zero Field Safety Corrective Actions (FSCAs) or product recalls.
- Three customer-reported Category 3a events across ~250,000 diagnostic reports (rate ~0.0012%): one clinical-output accuracy feedback (Consultant Connect, May 2023; investigation did not establish a systematic malfunction) and two API availability events (September 2023 and September 2024). All closed through paired CAPAs. No patient harm reported in any case.
- Two Category 4 non-safety complaints (integration-format query in December 2023; summative-usability authentication friction in July 2024). Both closed. Neither related to clinical output.
- Zero algorithmic-performance or diagnostic-failure CAPAs.
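The rate quoted for Category 3a events, and an exact upper bound for an event type observed zero times, can be reproduced with the sketch below. It assumes that each diagnostic report is one independent exposure and that all serious incidents were reported. Passive PMS cannot fully guarantee the second assumption. The denominator is 250,000, the lower end of the stated ">250,000" reports, which gives the more conservative bound.

```python
from math import expm1, log


def event_rate(events: int, exposures: int) -> float:
    """Crude event rate per exposure (here: per diagnostic report)."""
    return events / exposures


def zero_event_upper_bound(exposures: int, alpha: float = 0.05) -> float:
    """Exact one-sided (1 - alpha) upper confidence bound after 0 events.

    Solves (1 - p) ** n = alpha for p. At alpha = 0.05 this is close to the
    'rule of three' approximation 3 / n when n is large.
    """
    return -expm1(log(alpha) / exposures)


N_REPORTS = 250_000  # lower end of ">250,000": the conservative choice
print(f"Category 3a events: {event_rate(3, N_REPORTS):.4%}")  # 0.0012%
print(f"Serious incidents, 95% upper bound: {zero_event_upper_bound(N_REPORTS):.4%}")
```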
This extensive market experience, spanning over four years and 250,000 clinical reports with zero Article 87 serious incidents and zero Article 88 trend reports, provides robust real-world confirmation of the device's safety profile. The three Category 3a customer-reported events and two Category 4 non-safety complaints demonstrate a functioning vigilance system: issues are logged, investigated, and closed proportionately to their clinical significance. The use of this legacy device post-market data as clinical evidence for the device under evaluation is grounded in MDCG 2020-6 § 6.2.2, which recognises that post-market surveillance data from a legacy device may be used in the clinical evaluation of the transitioning device when equivalence has been demonstrated. The data has been appraised using IMDRF MDCE WG/N56 Appendix F quality criteria, as endorsed by MDCG 2020-6 Appendix I.
The 21-contract, 250,000+-report and zero-serious-incident figures above are drawn from the legacy-device PMS dataset maintained under GP-007. They are aggregated in the legacy-device umbrella PMS Report (R-TF-007-003), which is planned under the paired umbrella PMS Plan (R-TF-007-005). The aggregate has been appraised using the IMDRF MDCE WG/N56 Appendix F quality criteria noted above, and the appraisal is archived alongside the umbrella PMS Report.
Fitness for purpose of the legacy-device PMS methodology under the MDR reclassification. The legacy-device PMS methodology has been in force from 2020 to the present. It comprises reactive complaints handling under GP-014, trend monitoring under GP-007, and user-reported diagnostic-discrepancy capture via the clinical-report workflow. This methodology has been appraised against MDCG 2020-6 §6.2.2 for fitness for purpose with respect to the MDR Class IIb MDSW under evaluation. The channels in place are capable of detecting the safety-relevant signals applicable to this MDSW: misclassification-driven clinical harm, image-quality failure modes, and user-reported diagnostic error. Post-CE-mark PMS under R-TF-007-001 adds active MDSW-specific signals: AUC and Top-N drift, Fitzpatrick-stratified performance, and Article 88 trend-threshold monitoring. These are summarised in the PMS and PMCF feedback-loop subsection below.
Following CE marking under MDR, the manufacturer will implement Post-Market Surveillance activities as described in R-TF-007-001 Post-Market Surveillance (PMS) Plan, including Periodic Safety Update Reports (PSURs) per MDR Article 86. These activities are described in our standard operating procedures for Post-Market Surveillance (GP-007) and complaints handling and customer communication (GP-014).
Post-market clinical study of the equivalent legacy device (R-TF-015-012)
In addition to the passive surveillance described above, a proactive post-market clinical study of the equivalent legacy device was conducted under a formal study Protocol (R-TF-015-012) nested inside the legacy umbrella PMS Plan (R-TF-007-005). This study sits at the study-specific tier of the legacy post-market documentation hierarchy (Plan and Report at the umbrella tier, Protocol and Report at the study-specific tier). It generates post-market clinical performance evidence for use in the clinical evaluation of the device under MDR, per MDCG 2020-6 §6.2.2 (post-market data from an equivalent legacy device may be used as clinical evidence for the transitioning device) and §6.5.e (gap-bridging through scientifically sound questionnaires). The study's full methodology and statistical output are documented in the companion Study Report annexed to R-TF-015-012 (Appendix D); its conduct, results, and benefit-risk conclusions are consolidated in the legacy umbrella PMS Report (R-TF-007-003) prepared under MDR Article 85 (applicable to the legacy MDD Class I device via MDR Article 120(3)). This subsection presents the study as a source of clinical evidence for this CER.
Study design and regulatory basis
The study is a cross-sectional observational study with retrospective recall, using a structured bilingual (EN/ES) physician questionnaire as its data collection instrument, administered via a closed electronic survey platform distributed to all 21 legacy device client institutions. Collection window: 23 March 2026 to 13 April 2026 (three weeks). Responses collected: 60; analysis set N = 56 (34 dermatologists, 13 primary care physicians, 9 hospital managers) after four responses were excluded under the protocol's Section 10.7 evidence-quality substantiation principle. The N = 56 analysis set still exceeds the protocol's stretch target of 45. Regulatory basis: MDR Article 83 (proactive PMS), MDR Article 85 (PMS Report requirements applicable via Article 120(3)), MDCG 2020-6 §6.2.2 and §6.5.e.
Rank-8 primary classification of R-TF-015-012, with supplementary Rank-4 case
Per MDCG 2020-6 Appendix III, the primary classification of R-TF-015-012 is Rank 8 ("proactive post-market surveillance data (for example, surveys, data collected from registries that are not classified as proactive clinical investigations, professional opinion from scientific societies, end users or data on the use in clinical practice)"), applied to both the Likert professional-opinion items (B1, B3, B5, C1–C3, D1, D3, D5, E1, F3) and the quantitative endpoints (three co-primary endpoints B2, C4, D4 and six supportive endpoints B4, B6, C5, D2, D6, D7). Rank 8 is the conservative reading of a cross-sectional physician-recall design and aligns with the notified-body default position that physician-recall survey data belongs at Rank 8 rather than at Rank 4.
A supplementary Rank 4 case is retained for the quantitative endpoints under the Appendix III "high quality surveys may also fall into this category" note. The high-quality-survey threshold is met through the study's methodological design: a formal PMS Study Protocol with pre-specified objectives and endpoints (R-TF-015-012); pre-specified MCID thresholds derived from published SotA; published SotA comparators for each endpoint; a pre-specified statistical analysis plan with Holm-Bonferroni multiplicity correction for the three co-primary endpoints; a pre-specified sensitivity analysis stratifying results by data source reliability (record-consulted vs professional estimate); safety data collection (Section F) alongside benefit data; and transparently acknowledged methodological limitations. The supplementary Rank 4 case is a legitimate reading available under Appendix III, but it is not the lead classification of this study and the Pillar 3 sufficiency determination of this CER does not depend on it.
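The Holm-Bonferroni step-down procedure named in the statistical analysis plan can be sketched as follows. The raw p-values below are hypothetical placeholders. The actual values are in the Study Report annexed to R-TF-015-012.

```python
def holm_adjust(p_values: dict[str, float]) -> dict[str, float]:
    """Holm-Bonferroni step-down adjusted p-values (family-wise error control).

    Sort raw p-values ascending, multiply the i-th smallest (0-based) by (m - i),
    then enforce monotonicity with a running maximum and cap at 1.
    """
    m = len(p_values)
    adjusted: dict[str, float] = {}
    running_max = 0.0
    for i, (name, p) in enumerate(sorted(p_values.items(), key=lambda kv: kv[1])):
        running_max = max(running_max, min(1.0, (m - i) * p))
        adjusted[name] = running_max
    return adjusted


# Hypothetical raw p-values for the three co-primary endpoints.
raw = {"B2": 0.001, "C4": 0.004, "D4": 0.020}
adj = holm_adjust(raw)
significant = {name: adj[name] <= 0.05 for name in raw}
print(adj, significant)
```

The running maximum guarantees that an endpoint with a larger raw p-value never receives a smaller adjusted p-value than one ranked before it.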
Both evidence strata are positioned in the evidence portfolio as post-market complement to, not substitute for, the pre-market Pillar 3 evidence generated in the manufacturer's own clinical investigations (Ranks 2 and 4). The Rank-4 supplementary case is not load-bearing: if a notified-body assessor rejects the supplementary Rank 4 reading and retains only the primary Rank 8 classification, the three benefit conclusions are still independently supported by the pre-market Pillar 3 Rank 2–4 prospective studies and the Pillar 3 §4.4 Rank 11 MRMC studies.
Endpoints, MCID thresholds, and SotA comparators
Three co-primary endpoints (one per declared clinical benefit) were pre-specified. Six additional supportive endpoints were pre-specified. MCID thresholds were derived from published SotA literature. SotA comparators were pre-specified from the State of the Art review (R-TF-015-011).
| Endpoint | Benefit | Type | MCID | Observed mean | SotA comparator (published) |
|---|---|---|---|---|---|
| B2: Diagnostic assessment change rate | 7GH | Co-primary | 5% | 18.77% | +6.36% with AI (range +5.3% to +20.7%) |
| C4: Treatment decisions informed | 5RB | Co-primary | 10/yr | 36.23/yr | 14–36% of encounters alter treatment (SotA) |
| D4: Referral adequacy improvement | 3KX | Co-primary | 5% | 15.56% | 14–24% reduction in unnecessary referrals |
| B4: Rare disease identification count | 7GH | Supportive | 3/yr | 7.30/yr | No published baseline; BI_2024 +26.77 pp |
| B6: Malignancy detection count | 7GH | Supportive | 5/yr | 14.68/yr | AI sensitivity 74.6–85.7% vs. PCP 66.3% |
| C5: Longitudinal monitoring rate | 5RB | Supportive | 5% | 30.53% | Human inter-observer ICC 0.47; device ICC 0.72 |
| D2: Waiting time reduction | 3KX | Supportive | 5% | 14.53% | 60–132 days standard; teledermatology −71% |
| D6: Remote assessment adequacy | 3KX | Supportive | 5% | 47.76% | ~55% with teledermatology |
| D7: Remote volume increase | 3KX | Supportive | 5% | 24.64% | Low baseline remote care; capacity for 55%+ |
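The MCID comparison in the table can be checked mechanically with the sketch below, which encodes the table values. It compares observed means with MCIDs only. It does not reproduce the Holm-adjusted significance tests or the SotA comparison.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Endpoint:
    code: str
    benefit: str
    role: str
    mcid: float
    observed_mean: float
    unit: str  # "%" for rates, "/yr" for annual counts


# Values transcribed from the endpoint table above.
ENDPOINTS = [
    Endpoint("B2", "7GH", "Co-primary", 5, 18.77, "%"),
    Endpoint("C4", "5RB", "Co-primary", 10, 36.23, "/yr"),
    Endpoint("D4", "3KX", "Co-primary", 5, 15.56, "%"),
    Endpoint("B4", "7GH", "Supportive", 3, 7.30, "/yr"),
    Endpoint("B6", "7GH", "Supportive", 5, 14.68, "/yr"),
    Endpoint("C5", "5RB", "Supportive", 5, 30.53, "%"),
    Endpoint("D2", "3KX", "Supportive", 5, 14.53, "%"),
    Endpoint("D6", "3KX", "Supportive", 5, 47.76, "%"),
    Endpoint("D7", "3KX", "Supportive", 5, 24.64, "%"),
]


def mcid_margin(ep: Endpoint) -> float:
    """Observed mean minus MCID, in the endpoint's own unit."""
    return ep.observed_mean - ep.mcid


for ep in ENDPOINTS:
    status = "exceeds" if mcid_margin(ep) > 0 else "does not exceed"
    print(f"{ep.code} ({ep.benefit}, {ep.role}): {status} MCID by {mcid_margin(ep):.2f}{ep.unit}")
```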
Benefit 7GH: Diagnostic accuracy confirmed
The co-primary endpoint B2 exceeds the MCID with Holm-adjusted significance (p < 0.05, Cohen's d large). Both supportive endpoints (B4 rare disease identification and B6 malignancy detection) also exceed their MCIDs. Observed values sit within or below the SotA range for comparable AI-assisted interventions, consistent with real-world recall-based reporting relative to controlled study baselines.
Benefit 5RB: Objective severity assessment confirmed
The co-primary endpoint C4 (treatment decisions informed by the device's severity scores) exceeds MCID with Holm-adjusted significance. The supportive endpoint C5 (longitudinal monitoring rate) exceeds MCID. Of note, the C3 Likert item (inter-observer consistency) fell slightly below neutral (2.92), reflecting that individual physicians have limited direct experience comparing their device-generated scores with colleagues' scores. This Likert perception is reconciled against objective evidence: the AIHS4_2025 MRMC study measured inter-observer ICC at 0.716–0.727, exceeding the human baseline (ICC 0.47, Goldfarb et al. 2021) and the CER acceptance criterion (≥ 0.70).
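The sketch below computes one common form of the intraclass correlation coefficient, ICC(2,1), for readers less familiar with the metric. This form assumes two-way random effects, absolute agreement and a single rater, after Shrout and Fleiss (1979). This section does not state which ICC form AIHS4_2025 used, so the choice of ICC(2,1) is an assumption. The example data are the published Shrout and Fleiss worked example, not device data.

```python
def icc_2_1(ratings: list[list[float]]) -> float:
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    ratings[i][j] is the score given to subject i by rater j (no missing values).
    """
    n = len(ratings)     # subjects
    k = len(ratings[0])  # raters
    grand = sum(map(sum, ratings)) / (n * k)
    row_means = [sum(r) / k for r in ratings]
    col_means = [sum(r[j] for r in ratings) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for r in ratings for x in r)
    ss_error = ss_total - ss_rows - ss_cols
    msr = ss_rows / (n - 1)               # between-subjects mean square
    msc = ss_cols / (k - 1)               # between-raters mean square
    mse = ss_error / ((n - 1) * (k - 1))  # residual mean square
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


# Worked example from Shrout & Fleiss (1979): 6 subjects rated by 4 judges.
SF_1979 = [[9, 2, 5, 8], [6, 1, 3, 2], [8, 4, 6, 8],
           [7, 1, 2, 6], [10, 5, 6, 9], [6, 2, 4, 7]]
print(f"ICC(2,1) = {icc_2_1(SF_1979):.2f}")  # ICC(2,1) = 0.29
```

Because ICC(2,1) penalises systematic offsets between raters as well as random disagreement, it is a stricter measure of inter-observer consistency than a plain correlation.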
Benefit 3KX: Care pathway optimisation confirmed
The co-primary endpoint D4 (referral adequacy improvement) exceeds MCID with Holm-adjusted significance. All three supportive endpoints (D2 waiting time reduction, D6 remote assessment adequacy, D7 remote volume increase) exceed their MCIDs. Three endpoints (D2, D4, D6) fall below their CER acceptance criteria but substantially exceed the MCID and fall within or below the published SotA range. These differences between study observation and CER acceptance criterion are expected and transparently acknowledged: CER acceptance criteria derive from best-case published studies, while this PMS study measures real-world physician-perceived outcomes with inherent recall imprecision. The divergence does not invalidate the benefit confirmation.
Sensitivity analysis: data source stratification
Each quantitative question was paired with an evidence-quality control item. This item asked whether the response was record-consulted (a) or a professional estimate (b). The aggregate proportion of record-consulted responses was 36.0% (target: ≥ 30%). The record-consulted and estimate-based subgroups showed broadly consistent means across all endpoints, with no systematic divergence between strata. This supports the robustness of the data.
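The data-source stratification can be sketched as below. The response tuple layout and the toy values are hypothetical and do not reflect study data. The sketch only shows how the record-consulted share and the per-stratum means are derived.

```python
from collections import defaultdict
from statistics import mean

# Each response: (endpoint code, numeric answer, source flag).
# Source flag: "a" = record-consulted, "b" = professional estimate.
Response = tuple[str, float, str]


def record_consulted_share(responses: list[Response]) -> float:
    """Fraction of answers that were based on consulted records."""
    return sum(1 for _, _, src in responses if src == "a") / len(responses)


def stratum_means(responses: list[Response]) -> dict[str, dict[str, float]]:
    """Mean answer per endpoint, split by data source."""
    grouped: dict[str, dict[str, list[float]]] = defaultdict(lambda: defaultdict(list))
    for code, value, src in responses:
        grouped[code][src].append(value)
    return {code: {src: mean(vals) for src, vals in by_src.items()}
            for code, by_src in grouped.items()}


# Hypothetical toy responses (NOT study data).
toy = [("B2", 20.0, "a"), ("B2", 16.0, "b"), ("B2", 18.0, "b"),
       ("D4", 14.0, "a"), ("D4", 17.0, "b")]
print(f"record-consulted share: {record_consulted_share(toy):.1%} (target >= 30%)")
print(stratum_means(toy))
```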
Safety data (questionnaire Section F)
The questionnaire's safety section captured physician-perceived safety signals as proactive surveillance under MDR Article 83(1): F1 (misleading output) 26.8% (below the pre-specified 30% follow-up threshold, so the protocol-specified F1 follow-up is not triggered); F2 (usability issues affecting clinical use) 30.4%; F3 (overall perceived safety, Likert 1–5) mean 4.14 (strong physician confidence); F4 (device-related incidents formally reported) 7.1%. The 15 substantiated F1 = Yes responses were thematically reviewed: the reported incidents are consistent with the device's known edge-case limitations (atypical presentations, rare conditions, paediatric and dermoscopy-dependent lesions) already documented in the risk management file, with no new category of misleading behaviour emerging. F4 cross-referenced against the R-006-002 registry confirms no unreported serious incident. Full thematic analysis and benefit-risk discussion are documented in the legacy umbrella PMS Report (R-TF-007-003).
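The Section F percentages can be reproduced from counts. Only the F1 count (15) is stated in the text. The F2 and F4 counts are back-calculated on the assumption that all 56 analysed respondents answered each item. Treating the 30% F1 threshold as "triggered at or above" is also an assumption.

```python
# F1 = 15 is stated in the text; F2 = 17 and F4 = 4 are back-calculated,
# ASSUMING every analysed respondent (N = 56) answered each item.
N_ANALYSED = 56
COUNTS = {"F1 misleading output": 15, "F2 usability issues": 17, "F4 incidents reported": 4}
F1_FOLLOW_UP_THRESHOLD = 0.30  # pre-specified; ">= triggers" is an assumption


def share(count: int, n: int = N_ANALYSED) -> float:
    """Proportion of analysed respondents answering Yes."""
    return count / n


for item, count in COUNTS.items():
    print(f"{item}: {count}/{N_ANALYSED} = {share(count):.1%}")

f1_triggered = share(COUNTS["F1 misleading output"]) >= F1_FOLLOW_UP_THRESHOLD
print("F1 follow-up triggered:", f1_triggered)  # False
```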
Conclusion under MDCG 2020-6 §6.2.2
The study provides Rank 8 primary evidence for both quantitative endpoints and Likert professional-opinion items, with a supplementary Rank 4 case retained for the quantitative endpoints under the Appendix III "high quality surveys may also fall into this category" note, demonstrating that the three declared clinical benefits (7GH diagnostic accuracy, 5RB objective severity assessment, 3KX care pathway optimisation) are supported by the equivalent legacy device in routine clinical practice across 21 independent clinical sites over 4+ years of commercial use. All three co-primary endpoints exceed pre-specified MCIDs with Holm-adjusted significance; all six supportive endpoints exceed their MCIDs; safety surveillance is complete; sensitivity analysis supports data robustness; SotA comparison provides the "before/after" context required by §6.5.e. These findings extend the pre-market clinical evidence base with routine-practice confirmation and support the sufficiency determination set out below.
Post-Market Clinical Follow-up Data
Since this clinical evaluation is performed for the initial CE-mark submission under MDR, no retrospective PMCF data exists for this specific version. However, the legacy device's market experience documented above constitutes clinical data per MDR Article 2(48) and has been integrated into this clinical evaluation as described in the preceding section.
The manufacturer has established a proactive Post-Market Clinical Follow-up (PMCF) Plan (R-TF-007-002) to gather additional data on the device's safety and performance in the post-market phase. As detailed in section Necessary measures of this report, specific activities are scheduled to address identified gaps regarding triage effectiveness, severity assessment validation, algorithmic stability, and evidence coverage for autoimmune diseases and genodermatoses. The results from these activities will be analysed and incorporated in future updates of this CER per the annual update cadence defined in GP-015.
Assessment of the combined evidence portfolio against MDCG 2020-1 pillars
Per MDR Article 61(1), the level of clinical evidence must be appropriate in view of the characteristics of the device and its intended purpose. Per MDCG 2020-6 § 6.4, sufficient clinical evidence must exist prior to MDR certification; PMCF activities may confirm conclusions already supported by other evidence but cannot fill gaps where pre-market evidence is absent.
The clinical data presented in this section, together with the equivalence assessment with the legacy device demonstrated in the preceding section, satisfy each of the three evidence pillars required for medical device software under MDCG 2020-1:
Valid Clinical Association (VCA): fully established. The device's outputs (ICD-11 probability distributions, severity scores) correspond to well-characterised dermatological conditions with established diagnostic criteria. This is confirmed by the systematic literature review (R-TF-015-011), which identified and appraised 64 clinical articles and 8 clinical guidelines demonstrating the scientific association between the device's claimed outputs and real clinical conditions. The appraised literature covers all claimed outputs: diagnostic classification across ICD-11 categories, severity assessment scoring, and triage/referral prioritisation. No VCA gaps were identified.
Technical Performance / Analytical Performance: demonstrated. Technical Performance under MDCG 2020-1 §4.3 addresses whether the MDSW reliably, accurately and consistently meets its intended purpose — attributes such as accuracy, sensitivity, specificity, generalisability, and data quality of the algorithm itself. Four peer-reviewed publications (APASI_2025, AUAS_2023, AIHS4_2023, ASCORAD_2022) provide algorithm-level validation of the device's severity scoring outputs against independent expert dermatologist consensus on internationally validated clinical severity scales (PASI, UAS, IHS4, SCORAD). These publications are appraised with MINORS in the section "Validated methodological quality appraisal." This is complemented by the AI model verification and validation documentation (software V&V records associated with the frozen device) and by the algorithm validation performed against the manufacturer's curated labelled image database, which together establish accuracy, generalisability and data-quality characteristics of the algorithm independently of user interaction.
Clinical Performance: demonstrated. Clinical Performance under MDCG 2020-1 §4.4 addresses whether the intended users, on the intended patient population and in the intended use conditions, can achieve clinically relevant outputs. Two evidence streams support this pillar.

First, five prospective clinical studies conducted in real clinical settings with real patients (MC_EVCDAO_2019, IDEI_2023, COVIDX_EVCDAO_2022, DAO_Derivación_O_2022, DAO_Derivación_PH_2022) cover over 700 patients across 6 hospital sites in Spain and span the full spectrum of intended users. Three of these are Rank 2: high-quality clinical investigations with coverage gaps justified by risk assessment and addressed by PMCF. Two are Rank 4: studies with methodological limitations that still provide quantifiable and clinically meaningful data. The DAO_Derivación_O_2022 study is the closest representation of the intended deployment context, namely the primary care to dermatology referral optimisation pathway.

Second, the MRMC simulated-use studies (BI_2024, PH_2024, SAN_2024) contribute Pillar 3 evidence at a lower evidence rank (Rank 11). They demonstrate that intended users achieve clinically relevant outputs when using the device, across 40 healthcare professionals and more than 15 distinct dermatological conditions. Diagnostic accuracy improved consistently (primary care physicians: +17–27%; dermatologists: +8–10%; p < 0.001 for primary endpoints).
The pre-market clinical-performance evidence base is extended by the protocolled post-market observational study of the equivalent legacy device (R-TF-015-012, classified at Rank 8 primary for both quantitative and Likert items per MDCG 2020-6 §6.2.2, with a supplementary Rank 4 case retained for the quantitative endpoints under the Appendix III "high quality surveys" note), which confirms in routine clinical practice across 21 independent clinical sites that all three declared clinical benefits (7GH diagnostic accuracy, 5RB objective severity assessment, 3KX care pathway optimisation) are achieved at the study level, with co-primary endpoints B2/C4/D4 each exceeding pre-specified MCIDs with Holm-Bonferroni-adjusted significance.
The evidence portfolio is further strengthened by the following:
- Consistency between simulated-use and real-world Pillar 3 evidence. The MRMC simulated-use studies and the prospective real-patient clinical studies — both contributing to Pillar 3 Clinical Performance at different evidence ranks — yield consistent performance patterns; the device improves diagnostic accuracy across user types and conditions in both simulated and real-world settings. This consistency reinforces the validity of both evidence streams and provides confidence that the clinical performance observed in controlled reader studies is reproduced when the device is used on real patients.
- Safety confirmation from real-world deployment. The preceding PMS and complaints subsection details the legacy device's post-market experience: more than four years of commercial deployment, over 250,000 clinical reports, zero serious incidents, and zero FSCAs. This experience provides real-world confirmation of the device's safety profile that is directly applicable through the established equivalence. Per MDCG 2020-6 § 6.3, these PMS data are presented as confirmatory evidence alongside the formal safety analyses from the clinical investigations, not as the sole basis for safety conclusions.
- Public registration and peer review. The pre-market clinical investigations involving patient-level data are registered on ClinicalTrials.gov and the EMA RWD Catalogue, providing independent traceability. The four MRMC simulated-use reader studies (BI_2024, PH_2024, SAN_2024, MAN_2025) are performed on retrospective anonymised images sourced from public dermatology atlases. They do not meet the MDR Article 2(45) definition of a clinical investigation, so no ClinicalTrials.gov or EMA RWD registration applies to them. The rationale is documented in each MRMC study's Clinical Investigation Report under R-TF-015-006 §Trial Registrations. BI_2024 has been published in the peer-reviewed journal JMIR Dermatology, and IDEI_2023 is available as a medRxiv preprint (DOI 10.1101/2025.03.11.25323753). One additional manuscript is under review.
Evidence coverage and declared acceptable gaps. The evidence portfolio covers 97% of dermatological presentations across 5 of 7 epidemiological categories (see Evidence coverage by disease category). Two categories have been declared as acceptable evidence gaps per MDCG 2020-6 § 6.5(e): autoimmune diseases (3% of dermatological presentations) and genodermatoses (1% of dermatological presentations). The acceptability of these gaps is justified in the section Need for more clinical evidence, and both are addressed through post-market data collection activities documented as Gaps 4 and 5 in the PMCF Plan (R-TF-007-002).
The clinical evidence is sufficient prior to MDR certification, as required by MDCG 2020-6 § 6.4. The PMCF activities are designed to confirm and extend the conclusions already supported by pre-market evidence, not to fill gaps where pre-market evidence is absent.
Clinical data collected from literature search
Literature search plan
The methodology for the literature search, conducted to identify clinical data pertinent to the device under evaluation, is fully described in the CEP (available in R-TF-015-001 Clinical Evaluation Plan and R-TF-015-011 State of the Art).
The person responsible for conducting this process was Mr. Jordi Barrachina — Clinical Research Coordinator, PhD (CV available in Annex I — CV and Declarations of Interest).
This part of the Clinical Evaluation Report outlines and justifies the methodology applied to the literature search. The objective of the search was to retrieve clinical data that are essential for the clinical evaluation but not currently held by the manufacturer. The search for pertinent clinical data on the device under evaluation was performed in accordance with the Clinical Evaluation Plan (CEP), Regulation (EU) 2017/745, and MEDDEV 2.7/1 Rev 4.
The identification of relevant publications to establish the State of the Art commenced with the definition of search objectives via the PICO methodology. Both inclusion and exclusion criteria are expressed in natural language, reflecting the characteristics of the target population, the device's clinical indications and specific features, the types of studies, and the desired measurable outcomes.
All executed searches are documented in the CEP (refer to the “Literature search protocol” section). These searches encompassed literature and vigilance databases, along with a review of available registries pertinent to this medical field. The keywords utilized to query these databases were selected based on the previously established inclusion and exclusion criteria.
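A minimal sketch follows to illustrate how PICO concepts become a database query. Synonyms are combined with OR within a concept, and concepts are combined with AND. The terms are hypothetical. The executed search strings are those documented in the CEP.

```python
def or_block(terms: list[str]) -> str:
    """OR together synonyms for one PICO concept; quote multi-word phrases."""
    quoted = [f'"{t}"' if " " in t else t for t in terms]
    return "(" + " OR ".join(quoted) + ")"


def build_query(concepts: dict[str, list[str]]) -> str:
    """AND together the non-empty concept blocks."""
    return " AND ".join(or_block(terms) for terms in concepts.values() if terms)


# Hypothetical terms for illustration only; the executed strings are in the CEP.
pico = {
    "population": ["skin diseases", "dermatology"],
    "intervention": ["artificial intelligence", "deep learning"],
    "comparator": [],  # often left open in diagnostic-accuracy searches
    "outcome": ["diagnostic accuracy", "sensitivity"],
}
print(build_query(pico))
```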
Selection of references relating to the device under evaluation
The methodology followed for the selection of the publications is fully described in the CEP and the SotA document (available in R-TF-015-001 Clinical Evaluation Plan and R-TF-015-011 State of the Art). The results of all searches for the device are summarized in the flow diagram below.
Search execution and retrieval documentation
The device-specific search complements the broader State of the Art search documented in R-TF-015-011 State of the Art, which queried MEDLINE/PubMed, the Cochrane Library, the FDA MAUDE adverse-event database, the FDA Medical Device Recalls database, and EUDAMED — covering both peer-reviewed literature and post-market surveillance sources. Duplicates were identified by DOI exact match and confirmed by title-plus-first-author cross-check; no automated reference-manager de-duplication was used given the small device-specific corpus size, and the equivalent procedure for the larger State of the Art corpus is documented in R-TF-015-011. The full per-record list of retrieved articles — with database source, retrieval date, and record identifier — and the per-record exclusion log (with the individual exclusion reason for each excluded article) are maintained in R-TF-015-011, alongside the PRISMA-compliant search log for the device-specific search.
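The two-step de-duplication described above can be sketched as follows: a DOI exact match first, then confirmation by title plus first author. The record fields (`doi`, `title`, `first_author`) and the normalisation rules are assumptions for illustration. They are not the documented implementation of the procedure, and the example DOIs are hypothetical.

```python
import re
from typing import Optional


def normalise_doi(doi: Optional[str]) -> Optional[str]:
    """Lower-case a DOI and strip resolver prefixes so exact matching is fair."""
    if not doi:
        return None
    doi = re.sub(r"^(https?://(dx\.)?doi\.org/|doi:\s*)", "", doi.strip().lower())
    return doi or None


def title_author_key(record: dict) -> tuple[str, str]:
    """Punctuation-insensitive title plus first-author surname."""
    title = re.sub(r"[^a-z0-9]+", " ", record["title"].lower()).strip()
    surname = record["first_author"].split()[0].strip(",").lower()
    return title, surname


def deduplicate(records: list[dict]):
    """Return (kept, confirmed duplicates, DOI clashes needing manual review)."""
    kept, duplicates, review = [], [], []
    first_by_doi: dict[str, dict] = {}
    for rec in records:
        doi = normalise_doi(rec.get("doi"))
        if doi is None or doi not in first_by_doi:
            kept.append(rec)  # records without a DOI are never auto-merged
            if doi is not None:
                first_by_doi[doi] = rec
        elif title_author_key(rec) == title_author_key(first_by_doi[doi]):
            duplicates.append(rec)  # DOI match confirmed by title + first author
        else:
            review.append(rec)  # same DOI, different metadata: check by hand
    return kept, duplicates, review


records = [
    {"doi": "10.1000/ABC", "title": "Deep learning for skin", "first_author": "Smith J"},
    {"doi": "https://doi.org/10.1000/abc", "title": "Deep Learning for Skin.", "first_author": "Smith, J."},
    {"doi": "10.1000/xyz", "title": "Triage in primary care", "first_author": "Garcia M"},
    {"doi": "10.1000/XYZ", "title": "An unrelated title", "first_author": "Lee K"},
    {"doi": None, "title": "Conference abstract", "first_author": "Ng P"},
]
kept, duplicates, review = deduplicate(records)
print(len(kept), len(duplicates), len(review))  # 3 1 1
```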
Appraisal of the clinical data relating to the device under evaluation
The appraisal of the ten pre-market clinical investigations has been conducted in conformity with MEDDEV 2.7/1 Rev 4 Section 9 (Stage 2). It uses design-specific validated tools for methodological quality assessment and the MDCG 2020-6 Appendix III evidence hierarchy for evidence ranking and weighting. Full appraisal tables and per-study interpretive commentary are documented in the section "Validated methodological quality appraisal."
All ten investigations are included in the evidence portfolio. Their relevance to the device's intended purpose is inherent: each was designed specifically to assess the safety, performance, or clinical utility of the device in its intended use population and clinical settings, and does not require scoring against external comparator relevance criteria.
The MDCG 2020-6 Appendix III hierarchy provides the evidence ranking and weighting framework for the portfolio: MC_EVCDAO_2019, COVIDX_EVCDAO_2022, and DAO_Derivación_O_2022 are Rank 2 (high-quality clinical investigations with some coverage gaps); IDEI_2023, DAO_Derivación_PH_2022, AIHS4_2025, and NMSC_2025 are Rank 4 (methodological limitations, data quantifiable and acceptability justifiable); BI_2024, PH_2024, SAN_2024 and MAN_2025 are Rank 11 (simulated-use MRMC studies). The MRMC studies are not "clinical data" under the strict MDR Article 2(48) definition (no live-patient data collection) and therefore sit at a lower evidence rank than the prospective real-patient studies; however, per MDCG 2020-1 §4.4 they contribute to Pillar 3 Clinical Performance because they demonstrate that intended users achieve clinically relevant outputs on images representative of the intended patient population when using the device's Top-5 prioritised differential.
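The ranking and patient counts stated in this section can be tallied with the short sketch below. The per-study patient numbers are those given earlier in this section. MRMC studies are entered with zero patients because they use anonymised images rather than live patients.

```python
from collections import Counter

# (MDCG 2020-6 Appendix III rank, real patients) as stated in this section.
# MRMC reader studies use anonymised images, so they contribute 0 patients.
PORTFOLIO = {
    "MC_EVCDAO_2019": (2, 105),
    "COVIDX_EVCDAO_2022": (2, 160),
    "DAO_Derivación_O_2022": (2, 127),
    "IDEI_2023": (4, 204),
    "DAO_Derivación_PH_2022": (4, 131),
    "AIHS4_2025": (4, 2),
    "NMSC_2025": (4, 135),
    "BI_2024": (11, 0),
    "PH_2024": (11, 0),
    "SAN_2024": (11, 0),
    "MAN_2025": (11, 0),
}

patients_by_rank: Counter = Counter()
studies_by_rank: Counter = Counter()
for rank, patients in PORTFOLIO.values():
    patients_by_rank[rank] += patients
    studies_by_rank[rank] += 1

# Rank 11 simulated-use studies are not clinical data under MDR Article 2(48).
clinical_data_patients = sum(n for rank, n in patients_by_rank.items() if rank != 11)
print(dict(studies_by_rank), dict(patients_by_rank), clinical_data_patients)
```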
Note that the clinical study carried out with the legacy version of the device (LEGIT_MC_EVCDAO_2019) is considered part of the clinical data generated and held by the manufacturer, because equivalence is claimed. In addition, the safety and performance evaluation of the device considers all clinical studies carried out with the frozen version of the device under MDR. All of these studies were designed to support the intended purpose of the device under evaluation and to generate real-world evidence.
Appraisal of published severity validation literature (MDCG 2020-1 Pillar 2)
In addition to the pre-market clinical investigations, four peer-reviewed publications identified in the literature search (Group B, see Results of the literature search on the device under evaluation) provide Technical Performance evidence for the device's severity assessment algorithms: APASI_2025 (psoriasis/PASI), AUAS_2023 (urticaria/UAS), AIHS4_2023 (hidradenitis suppurativa/IHS4), and ASCORAD_2022 (atopic dermatitis/SCORAD). Each publication directly evaluates the device's severity scoring output against independent expert dermatologist consensus on a validated clinical severity scale.
Methodological quality has been assessed using MINORS (non-comparative, maximum 16), the same validated tool applied to other non-comparative Technical Performance evidence in this portfolio. The MINORS assessment, scores, and interpretive commentary are documented in the section "Validated methodological quality appraisal."
These publications are classified as Technical Performance evidence per MDCG 2020-1, not as clinical investigations under MDR Article 62. They contribute to the evidence base for benefit 5RB (Objective Severity Assessment) by demonstrating that the device's algorithms produce severity scores concordant with expert clinical consensus across four dermatological conditions. Their level of evidence is ranked 5–6 per MDCG 2020-6 Appendix III (retrospective validation with clinical reference standard).
Appraisal of published malignancy clinical evidence (MDCG 2020-1 Pillar 3)
One peer-reviewed publication provides Clinical Performance evidence for the device's malignancy detection capability in a specialist clinical setting: NMSC_2025.
NMSC_2025: Non-melanoma skin cancer (BCC / cSCC)
Medela A, Sabater A, Hernández Montilla I, et al. European Archives of Oto-Rhino-Laryngology. 2025;282(3):1585–1592.
- Study design: Prospective diagnostic-accuracy study in a head and neck outpatient clinic (Donostia University Hospital, San Sebastián, Spain), with histological confirmation as the reference standard for all included cases.
- Dataset: 135 patients (92 male, 43 female; median age 71 ± 9 years) referred for suspicious skin lesions between June and December 2021. The dataset comprised 54 BCC, 54 cSCC, and 27 benign lesions (14 seborrheic keratoses, 2 actinic keratoses, 11 actinic cheilitis), all biopsy-confirmed. Images were captured using a standardised protocol (12 MP smartphone camera, 10 cm distance, ambient light, no zoom or flash) and analysed by the device's diagnosis support function.
- Key results on the H&N dataset: For malignant conditions overall, the device identified the correct diagnosis within the top-5 most likely outputs in 94.4% of cases (Top-1: 65.7%, Top-3: 90.7%). By condition: BCC achieved Top-5 accuracy of 98.1% (Top-5 sensitivity 0.96, specificity 0.56); cSCC achieved Top-5 accuracy of 90.7% (Top-5 sensitivity 0.85, specificity 0.58). For the binary malignant-versus-benign classification, the device achieved AUC-ROC of 0.93, indicating excellent discrimination between malignant and benign lesions. The relatively lower Top-5 specificity for malignant lesions reflects a deliberate characteristic of the device architecture: visual overlap between seborrheic keratosis and cSCC leads the device to include both within the top-5 predictions, increasing sensitivity at the cost of specificity in this specialist setting.
- Methodological quality: Assessed using QUADAS-2, the appropriate validated tool for prospective diagnostic accuracy studies. The QUADAS-2 assessment and interpretive commentary are documented in the section "Validated methodological quality appraisal."
- Classification: NMSC_2025 is classified as Clinical Performance evidence per MDCG 2020-1 Pillar 3, as it evaluates the device's diagnostic output against histologically confirmed malignancy in real patients in a clinical setting. It contributes to the evidence base for benefit 7GH (Diagnostic Accuracy), specifically corroborating the device's performance for BCC and cSCC detection. Its level of evidence is Rank 4 per MDCG 2020-6 Appendix III (prospective non-randomised single-centre study; methodological limitations inherent to the specialist referral design are acknowledged and do not invalidate the evidence contribution).
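The Top-k figures reported for NMSC_2025 follow the usual definition. A case counts as correct at k when the reference-standard (histological) diagnosis appears among the k highest-ranked outputs. The sketch below assumes that the device output is available as a ranked list of condition labels per case. The function name and the toy labels are hypothetical and are not drawn from the study dataset.

```python
from typing import Sequence


def top_k_accuracy(ranked_outputs: Sequence[Sequence[str]],
                   reference: Sequence[str], k: int) -> float:
    """Fraction of cases whose reference diagnosis is among the first k ranked outputs."""
    if len(ranked_outputs) != len(reference):
        raise ValueError("one ranked list is needed per reference diagnosis")
    if not reference:
        raise ValueError("no cases supplied")
    hits = sum(ref in ranked[:k] for ranked, ref in zip(ranked_outputs, reference))
    return hits / len(reference)


# Hypothetical ranked outputs (most likely first) and histological reference labels.
ranked = [
    ["BCC", "SK", "cSCC", "AK", "nevus"],
    ["SK", "cSCC", "BCC", "AK", "nevus"],
    ["AK", "SK", "nevus", "lentigo", "cSCC"],
    ["nevus", "SK", "AK", "lentigo", "melanoma"],
]
truth = ["BCC", "cSCC", "cSCC", "cSCC"]

for k in (1, 3, 5):
    print(f"Top-{k}: {top_k_accuracy(ranked, truth, k):.2f}")
```

A hit at k is also a hit at any larger k. Top-k accuracy can therefore only stay the same or increase as k grows, which is consistent with the ordering Top-1 ≤ Top-3 ≤ Top-5 in the reported results. The per-condition Top-5 sensitivity and specificity quoted for BCC and cSCC depend on how positives and negatives are defined in the study protocol. This sketch does not reproduce them.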
Regulatory and Administrative Details of Clinical Investigations
The following table summarizes the regulatory status, ethics committee approvals, and public registration details for each of the pre-market clinical investigations. The nine investigations that involve patient recruitment or patient-level data were conducted exclusively at sites within the European Union (specifically Spain); no investigation included non-EU sites, and consequently no non-EU Competent Authority correspondence applies. The tenth investigation (MAN_2025) is a Rank 11 simulated-use MRMC reader study performed on retrospective anonymised public-atlas images through a centralised web-based platform operated by the manufacturer; it does not recruit patients, does not meet the MDR Article 2(45) definition of a clinical investigation, and therefore does not trigger site-based Competent Authority notification or ethics-committee review (rationale recorded in R-TF-015-006 (MAN_2025) §Trial Registrations). Full correspondence with the Spanish Competent Authority (AEMPS) and with each Ethics Committee (CEIm) — notification letters, approval letters, and any subsequent interactions — is archived per study in the corresponding R-TF-015-006 Clinical Investigation Report as part of the technical documentation.
| Study ID | Clinical Investigation Report | Competent Authority (AEMPS) | Ethics Committee (CEIC/CEIm) Approval | Public Registrations | Publication Status | Protocol Deviations (complete list; detail in the corresponding R-TF-015-006 CIR) |
|---|---|---|---|---|---|---|
| MC_EVCDAO_2019 | R-TF-015-006 (MC_EVCDAO_2019) | Notified (Observational) | Approved: CEIm Euskadi (2022-01-13) Ref: PI2019216 | ClinicalTrials.gov: NCT06221397 EMA RWD: EUPAS108254 | Under review | Target sample size adjusted due to high malignancy prevalence (enrichment to 34.29% exceeded the 20% target, preserving statistical power for the primary sensitivity endpoint). |
| IDEI_2023 | R-TF-015-006 (IDEI_2023) | Notified (Observational) | Approved: CEIm HM Hospitals (2024-01-25) Ref: 24.12.2266-GHM | ClinicalTrials.gov: NCT05656709 EMA RWD: EUPAS1000000045 | Published (doi: 10.1101/2025.03.11.25323753) | Positive deviation: sample size increased to 204. |
| COVIDX_EVCDAO_2022 | R-TF-015-006 (COVIDX_EVCDAO_2022) | Notified (Observational) | Approved: CEIm Torrevieja/Elche-Vinalopó (2022-04-13) Ref: 12/04/22 LEGIT_COVIDX | ClinicalTrials.gov: NCT06237036 EMA RWD: EUPAS108260 | In preparation | Extended recruitment timeline; device version unchanged throughout. |
| DAO_Derivación_O_2022 | R-TF-015-006 (DAO_Derivacion_O_2022) | Notified (Observational) | Approved: CEIm Euskadi (2022-11-23) Ref: PS2022074 | ClinicalTrials.gov: NCT06228014 EMA RWD: EUPAS108167 | In preparation | 10 subjects excluded due to diagnostic confirmation gaps (pre-specified handling rule applied; analysis set n = 117). |
| DAO_Derivación_PH_2022 | R-TF-015-006 (DAO_Derivacion_PH_2022) | Notified (Observational) | Approved: CEIm Puerta de Hierro (2022-06-24) Ref: 47/395984.9/22 | ClinicalTrials.gov: NCT07429123 EMA RWD: EUPAS108166 | In preparation | Major protocol deviation: many participating HCPs used the device from the outset without first recording a baseline assessment without the device, preventing measurement of the planned primary comparative endpoint (improvement attributable to device use). The secondary malignancy-detection endpoints (AUC 0.84, melanoma specificity 91%) and HCP satisfaction scores remain valid as standalone measures. Detail in the CIR. |
| BI_2024 | R-TF-015-006 (BI_2024) | Notified (Observational) | Exempt (Justified in R-TF-015-011) | ClinicalTrials.gov: NCT07428915 EMA RWD: EUPAS1000000910 | Published (JMIR Dermatology) | Partial protocol completion by 40% of HCPs due to clinical workload (analysed on completed cases with pre-specified sub-analyses). |
| PH_2024 | R-TF-015-006 (PH_2024) | Notified (Observational) | Exempt (Justified in R-TF-015-011) | ClinicalTrials.gov: NCT07428941 EMA RWD: EUPAS1000000644 | In preparation | No significant deviations. |
| SAN_2024 | R-TF-015-006 (SAN_2024) | Notified (Observational) | Exempt (Justified in R-TF-015-011) | ClinicalTrials.gov: NCT07428954 EMA RWD: EUPAS1000000911 | In preparation | Partial protocol completion by 4 HCPs due to clinical scheduling. |
| AIHS4_2025 | R-TF-015-006 (AIHS4_2025) | Retrospective analysis | Exempt (Prior informed consent from original M-27134-01 trial) | Not separately registered. AIHS4_2025 is a retrospective secondary analysis of the pre-existing manufacturer-sponsored trial M-27134-01; no new enrolment or data collection occurred, and prior informed consent from M-27134-01 covers the retrospective analysis. | In preparation | No deviations from the retrospective analysis protocol. |
| MAN_2025 | R-TF-015-006 (MAN_2025) | Not applicable (simulated-use MRMC reader study on retrospective anonymised public-atlas images; does not meet the MDR Article 2(45) definition of a clinical investigation; no AEMPS notification required — rationale recorded in R-TF-015-006 (MAN_2025) §Trial Registrations). | Not applicable (no patient recruitment, no patient-identifiable data; the investigation is performed on retrospective anonymised atlas-derived images, and accordingly does not fall within the scope of biomedical-research ethics-committee review. Participating healthcare professionals act in their professional capacity as expert evaluators under a signed participation agreement and are not enrolled as research subjects within the meaning of biomedical-research law.) | Not separately registered. MAN_2025 is a simulated-use MRMC reader study on retrospective anonymised images and does not meet the MDR Article 2(45) definition of a clinical investigation; the registration rationale is documented in R-TF-015-006 (MAN_2025) §Trial Registrations. A voluntary post-hoc registration may be performed for transparency purposes. | In preparation | Three enrolled readers excluded as screen failures (specialties outside the device's declared intended user population per CIP §Exclusion criteria: Clinical Neurophysiology, Anatomical Pathology, and a non-target general profile without a current target-specialty residency); pre-specified screening rule applied and screen-failure submissions excluded from the primary analysis, from the board-certified sensitivity subset and from all per-specialty breakdowns. One reader meeting the CIP §Inclusion criteria completed only a partial number of cases at the data-lock date and contributes the completed observations to the primary analysis per the pre-specified analysis-population rule while being excluded from the ≥ 50%-completers and 100%-completers sensitivity analyses by definition. |
EUDAMED clinical-investigation registration status. None of the ten investigations carries a EUDAMED clinical-investigation single-registration identifier (CIV-ID).

The nine investigations involving patient-level data are observational studies (prospective observational, retrospective observational, or MRMC simulated-use on patient-sourced images). They were notified to the Spanish Competent Authority (AEMPS) under the observational-study regime.

The tenth investigation (MAN_2025) is a simulated-use MRMC reader study performed on retrospective anonymised public-atlas images. It does not recruit patients and does not meet the MDR Article 2(45) definition of a clinical investigation. It is therefore not subject to AEMPS notification or EUDAMED CIV registration, as recorded in R-TF-015-006 (MAN_2025) §Trial Registrations.

EUDAMED CIV registration is required only for two categories:
- clinical investigations performed to demonstrate conformity under MDR Article 62;
- PMCF investigations that submit subjects to invasive or burdensome procedures additional to those of normal use, under MDR Article 74(1).

Observational investigations outside the scope of MDR Articles 62 and 74 are not subject to that obligation. Neither are simulated-use investigations on retrospective anonymised images outside the scope of MDR Article 2(45). Any future clinical investigation conducted under MDR Article 62 (including interventional PMCF studies defined in R-TF-007-002) will be registered in EUDAMED prior to study start; this commitment is recorded in the PMCF Plan.
CIP compliance with MDR Annex XV and ISO 14155:2020 Annex A. Each of the ten Clinical Investigation Plans was written and executed in compliance with (a) the content requirements of MDR Annex XV Chapter II §3 (Clinical Investigation Plan), and (b) the content requirements of ISO 14155:2020 Annex A (Clinical Investigation Plan content). A clause-by-clause conformance mapping for each CIP against Annex XV and ISO 14155:2020 Annex A is maintained in the corresponding R-TF-015-006 Clinical Investigation Report. The nine patient-data investigations were conducted under the observational-study regime of AEMPS and therefore did not trigger MDR Article 62 (interventional clinical investigation) or MDR Article 74 (PMCF investigation interfering with normal use); the tenth investigation (MAN_2025) is a Rank 11 simulated-use MRMC reader study performed on retrospective anonymised images outside the scope of MDR Article 2(45) (see R-TF-015-006 (MAN_2025) §Trial Registrations). ISO 14155:2020 Annex A is applied to each of the ten investigations via its "applicable clauses" provision and has been fully followed.
Consistency of published manuscripts with Clinical Investigation Reports. For the two studies published in peer-reviewed venues, material consistency between the CIR and the publication is confirmed:
- IDEI_2023 (doi: 10.1101/2025.03.11.25323753) reports the same primary endpoint values, per-subgroup analyses, and sample size (n = 204) as the CIR archived in R-TF-015-006 (IDEI_2023).
- BI_2024 (JMIR Dermatology) reports the same primary endpoint values and per-HCP-tier sub-analyses as the CIR archived in R-TF-015-006 (BI_2024), including the partial protocol completion acknowledgement.

MC_EVCDAO_2019 is under review. Seven manuscripts are in preparation: COVIDX_EVCDAO_2022, DAO_Derivación_O_2022, DAO_Derivación_PH_2022, PH_2024, SAN_2024, AIHS4_2025 and MAN_2025. Each of these manuscripts will be reconciled against the archived CIR prior to submission, and any material differences will be documented in the next CER update.
Validity of conclusions against the approved Clinical Investigation Plan. For each investigation, the conclusions reported in the CIR and summarised in this CER remain valid in light of the approved Clinical Investigation Plan, taking into account the protocol deviations listed above:
- MC_EVCDAO_2019: enrolment enrichment preserved statistical power for the primary sensitivity endpoint (sensitivity > 0.90 for melanoma).
- IDEI_2023: the positive sample-size deviation to 204 strengthened the evidence base.
- COVIDX_EVCDAO_2022: extended recruitment did not alter the device version tested.
- DAO_Derivación_O_2022: data-quality exclusions reduced the analysis set to 117 per the pre-specified handling rule.
- DAO_Derivación_PH_2022: a major protocol deviation prevented measurement of the planned primary comparative endpoint, because many HCPs used the device from the outset without first recording an unaided baseline. The secondary malignancy-detection endpoints (AUC 0.84, melanoma specificity 91%) and HCP satisfaction remain valid as standalone measures and are corroborated by DAO_Derivación_O_2022 for the referral-pathway claim.
- BI_2024: partial protocol completion was analysed on completed cases with pre-specified sub-analyses and is corroborated by SAN_2024 and PH_2024.
- PH_2024: no significant deviations.
- SAN_2024: partial protocol completion remained within the multi-HCP, multi-site design margin.
- AIHS4_2025: no deviations from the retrospective analysis protocol. The n = 2 generalisability limitation is explicitly acknowledged and addressed by the planned PMCF confirmatory study.
Adequacy of the Instructions for Use (and, where applicable, of a Summary of Safety and Clinical Performance) in reflecting the outcomes of these investigations is evaluated in the section "Consistency with information materials supplied by the manufacturer."
Representativeness of the Study Populations (Demographics & Skin Phototypes)
To ensure the clinical data adequately represents the intended target population, demographic and skin pigmentation data (Fitzpatrick phototype) were collected across the clinical investigations. The following table summarizes the patient diversity.
| Study ID | Sex Distribution | Age Distribution (years) | Fitzpatrick Skin Phototypes |
|---|---|---|---|
| BI_2024 | 63.4% M, 36.6% W | ≥22: 63.4%, 2-21: 33.7%, <2: 3.0% | I: 20%, II: 43%, III: 22%, IV: 9%, V: 6%, VI: 0% |
| IDEI_2023 | 27.5% M, 72.5% W | Mean 53.84 ± 21.53 | I: 63.4%, II: 23.2%, III: 12.5%, IV: 0.9%, V: 0%, VI: 0% |
| MC_EVCDAO_2019 | 50.5% M, 49.5% W | Mean 62.10 ± 15.30 | I: 87.1%, II: 9.8%, III: 2.5%, IV: 0.6%, V: 0%, VI: 0% |
| PH_2024 | 46.7% M, 53.3% W | ≥22: 93.3%, <2: 6.6% | I: 33.3%, II: 40%, III: 23.3%, IV: 3.3%, V: 0%, VI: 0% |
| SAN_2024 | 60.7% M, 39.3% W | ≥22: 78.6%, 2-21: 17.9%, <2: 3.6% | I: 42.8%, II: 42.8%, III: 7.2%, IV: 3.6%, V: 3.6%, VI: 0% |
| DAO_Derivación_PH_2022 | 46.6% M, 53.4% W | Mean: 46.71 ± 25.02. ≥22: 81.6%, 2-21: 16.8%, <2: 1.5% | I: 48.3%, II: 36.7%, III: 12.2%, IV: 2.2%, V: 0.6%, VI: 0% |
| DAO_Derivación_O_2022 | 36.2% M, 63.8% W | Mean 59.89 ± 20.70 | I: 67.7%, II: 22.9%, III: 7.5%, IV: 1.5%, V: 0.5%, VI: 0% |
| COVIDX_EVCDAO_2022 | 39.38% M, 60.62% W | Mean: 16.57 ± 12.50; ≥22: 15%, 2-21: 66.25%, <2: 18.75% | I: 49.3%, II: 38.5%, III: 10.9%, IV: 1.3%, V: 0%, VI: 0% |
| MAN_2025 | Not applicable (image-based simulated-use investigation; no patient-level sex recorded) | Not applicable (image-based simulated-use investigation; no patient-level age recorded) | V–VI: 100% (149 curated images, all representative of Fitzpatrick phototype V or VI presentations; within-V/VI split not specified in the protocol) |
Note on skin phototype coverage: Across the patient-data investigations, Fitzpatrick V and VI skin types are underrepresented.

The MAN_2025 simulated-use MRMC reader study was designed specifically to address this coverage gap at the pre-market stage. It evaluates healthcare professionals' device-assisted Top-1 diagnostic accuracy on 149 curated images representative of Fitzpatrick phototype V and VI presentations, sourced from public dermatology atlases. Under MDCG 2020-6 Appendix III the resulting evidence is Rank 11 (simulated-use reader data). It contributes MDCG 2020-1 Pillar 3 §4.4 supporting evidence for Fitzpatrick-phototype generalisability. It is distinct from, and positioned below, real-world Pillar 3 evidence on Fitzpatrick V and VI patients in routine care, which is addressed through Post-Market Clinical Follow-up.

The supplementary literature search confirms that this is a field-wide limitation, and the external evidence is mixed:
- Walker et al. (2025) and Dulmage et al. (2021) report no statistically significant difference in AI diagnostic accuracy between Fitzpatrick I–III and IV–VI (Walker 2025: AUC 0.858 vs. 0.856, p = NS; Dulmage 2021: accuracy 70% vs. 68%, p = 0.79).
- Tepedino et al. (2024) found device specificity higher in Fitzpatrick IV–VI than in I–III (69.1% vs. 53.2%) in a primary care study including 27.1% Fitzpatrick V patients.
- Conversely, Tjiu and Lu (2025), a meta-analysis of 18 studies, document a persistent AUROC gap across the published AI dermatology literature (0.89 for I–III vs. 0.82 for IV–VI).
- Liu et al. (2023) conclude that the SotA itself provides insufficient evidence to characterise AI performance specifically in Fitzpatrick V–VI.

Both human specialists and AI systems are less accurate on darker skin tones (Groh et al., 2024: a 4-percentage-point gap for specialists), confirming that the limitation is not device-specific.
Per MDCG 2020-6 § 6.5(e), limited Fitzpatrick V–VI representation is declared an acceptable gap, on three grounds:
- the device's ViT-based architecture assesses relative lesion intensity rather than absolute pixel values, reducing phototype sensitivity compared to pixel-classification approaches;
- the device was internally tested on 112 Fitzpatrick IV–VI images (ASCORAD_2022);
- the deployment context (Spain) has inherently low Fitzpatrick V–VI prevalence.

PMCF activities include a commitment to report performance stratified by Fitzpatrick phototype.
Note on paediatric coverage: Paediatric patients are underrepresented in the pre-market study portfolio as a whole. However, targeted paediatric subgroup analyses conducted within the BI_2024 and PH_2024 clinical investigations provide dedicated evidence for child and infant age groups.
- BI_2024: the child subgroup (2–12 years; 14 of 101 study images, 13.9% of the dataset) showed an overall diagnostic accuracy improvement of +11.27 percentage points across all HCPs (+13.42 pp for PCPs, +5.36 pp for dermatologists). The most prevalent paediatric condition, impetigo, yielded a sensitivity gain of +20.37 pp across all HCPs.
- PH_2024: the infant subgroup (1 month to 2 years) showed an accuracy improvement of +33.33 pp, and the child subgroup (2–12 years) showed an improvement of +11.11 pp.

These findings are exploratory given the small case counts (fewer than 5 images per practitioner for some disease-age combinations). They are not extrapolated beyond the conditions and age ranges directly analysed.

A supplementary literature search retrieved 26 papers on paediatric AI dermatology and identified only one qualifying study with AI diagnostic performance data in a dedicated paediatric population. Yu et al. (2025) reported an AUC of 0.91 for deep-learning diagnosis of childhood vitiligo in 474 paediatric patients, outperforming dermatologists (AUC 0.77). This thin yield confirms that paediatric AI dermatology validation is a recognised SotA limitation, not a device-specific gap.

Per MDCG 2020-6 § 6.5(e), limited paediatric representation is declared an acceptable gap with respect to complete paediatric population coverage. PMCF monitoring is committed to tracking paediatric case proportions and age-stratified performance in real-world deployment.
Note on low-prevalence sub-indication categories (autoimmune dermatoses and genodermatoses). Two low-prevalence sub-indication categories are under-represented in the pivotal real-patient investigations: autoimmune dermatoses (~3 % of real-world dermatological presentations) and genodermatoses (~1 %). There are two reasons. Population prevalence is low, and the clinical diagnostic pathway for these conditions relies on serological, histopathological and genetic testing that is not directly reproduced in an image-based study. These two categories remain within the intended use. Pre-certification evidence is triangulated under MDCG 2020-6 §6.3 using the four-test analysis set out below. It is confirmed post-certification by PMCF Activities D.1 (autoimmune) and D.2 (genodermatoses) in R-TF-007-002.
- Narrow and bounded scope. The two sub-indication categories account for approximately 4 % of real-world presentations combined; the remaining ~96 % carry the core benefit-risk determination independently.
- Core benefit-risk independence. The three declared benefits (7GH Diagnostic Accuracy, 5RB Objective Severity Assessment, 3KX Care-Pathway Optimisation) are independently evidenced on the remaining ~96 % of presentations by the pre-market confirmatory clinical investigations and by the legacy-predecessor real-world evidence corpus, and do not depend on the two sub-indication categories.
- Adequate residual evidence (two independently-scoring anchors).
  (a) Pillar 1 Valid Clinical Association. A dedicated structured literature review (22 load-bearing CRIT1–7 ≥ 15/21 anchors) is appended to R-TF-015-011 State of the Art §Autoimmune and genodermatoses. It establishes that image-based clinical recognition is an accepted standard for the named conditions:
  - autoimmune: DLE, lichen planus, dermatomyositis, pemphigus vulgaris, bullous pemphigoid, mucous membrane pemphigoid, morphea, cutaneous vasculitis;
  - genodermatoses: ichthyoses, NF1/NF2, tuberous sclerosis complex, epidermolysis bullosa.
  Oral lichen planus retains a residual coverage item routed to PMCF Activity D.1.
  (b) Pillar 2 Technical Performance. R-TF-028-006 AI Release Report §Per-Epidemiological-Group Performance reports results measured on the device's stand-alone analytical output, without a clinician in the loop:
  - autoimmune dermatoses: AUC 0.948 (95 % CI 0.941 – 0.954, N = 2,040, 38 classes);
  - genodermatoses: AUC 0.905 (95 % CI 0.886 – 0.924, N = 391, 31 classes).
  Both values are above the pre-specified ≥ 0.80 acceptance criterion inherited from R-TF-028-002 AI Development Plan. The ≥ 0.80 threshold is retained for the per-epidemiological-group sub-analysis for two reasons. First, both measure types (binary indicators and per-epidemiological-group ICD discrimination) quantify the same device property: the probability that a case belonging to the correct class is scored higher for that class than a case belonging to an incorrect class. Second, MDCG 2020-1 §4.4 Pillar 2 does not require a different numerical threshold for multi-class versus binary analytical measures.
The MDCG 2020-6 Appendix III evidence-hierarchy appraisal, set out in R-TF-015-001, records two points. Simulated-use MRMC reader studies "do not constitute 'clinical data' under the strict MDR Article 2(48) definition". Rank 11 supporting evidence from such studies cannot, on its own, carry a pre-certification sufficient-evidence determination on a named sub-indication. Pillar 1 Valid Clinical Association and Pillar 2 Technical Performance are therefore the two independently-scoring Test 3 anchors for these sub-indication categories. MRMC evidence, where applicable, is Pillar 3 §4.4 supporting evidence only.
  (c) Why the absence of pre-certification Pillar 3 evidence is regulatorily acceptable for these two sub-indications. Three cumulative elements apply:
  - the device's claim for these sub-categories is qualified by the Device Output Warning as supporting information within the HCP's differential-diagnosis workup (see section "Consolidated limitations of the device" and the IFU Device Output Warnings);
  - final-diagnosis responsibility rests wholly with the healthcare professional and is confirmed by histopathological, serological or genetic testing outside the device's loop, consistent with the current standard of care for autoimmune dermatoses and genodermatoses;
  - Pillar 3 real-world clinical performance is pre-specified for post-certification confirmation in PMCF Activities D.1 and D.2.
  (d) Scope boundary with adjacent coverage gaps. Two other evidence sources address different sub-indication and coverage considerations:
  - MAN_2025 (Pillar 3 §4.4 at Rank 11; Fitzpatrick V–VI phototype representativeness);
  - the legacy-device real-world-evidence study R-TF-015-012 (Pillar 3 at Rank 8 primary, with a supplementary case at Rank 4 for the quantitative endpoints).
  They are therefore not invoked as Test 3 anchors for autoimmune dermatoses or genodermatoses. Both remain part of the broader evidence envelope that carries the three declared benefits for the remaining ~96 % of presentations.
- PMCF confirmation (confirms and strengthens; does not fill or close). Two activities in R-TF-007-002 confirm and strengthen the pre-certification base in routine clinical deployment.
  Activity D.1 is prospective surveillance of autoimmune dermatoses, with:
  - a 50-case target justified on a systematic-misclassification-detectability rationale;
  - 12-month and 36-month interim analyses;
  - a Top-3 ≥ 60 % primary safety-floor acceptance criterion, together with a non-inferiority secondary criterion against the V&V-demonstrated Top-3;
  - safety and surveillance triggers.
  Activity D.2 is passive surveillance of genodermatoses with safety and coverage triggers. It also includes an early Pillar 3-equivalent performance readout on the legacy-predecessor post-market report corpus, scheduled into the first PMS Update Report.
  The PMCF Plan additionally commits to analysing the autoimmune-dermatoses and genodermatoses slices of the ≈ 250,000 legacy-predecessor post-market report corpus in the first PMS Update Report (R-TF-007-003). This is consistent with MDCG 2020-6 §6.3, under which PMCF confirms and strengthens an adequately-evidenced pre-certification base and is not invoked to fill or close pre-certification evidence gaps.
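The AUC interpretation invoked in the Pillar 2 anchor corresponds to the rank (Mann–Whitney) formulation. For one class treated one-vs-rest, AUC is the proportion of (positive, negative) case pairs in which the positive case receives the higher score for that class. Ties count as one half. The sketch below is illustrative only, and the scores are hypothetical. The function is not part of the R-TF-028-006 evaluation pipeline. The aggregation of per-class AUCs into a per-epidemiological-group figure is not shown.

```python
from typing import Sequence


def auc_rank_probability(pos_scores: Sequence[float], neg_scores: Sequence[float]) -> float:
    """One-vs-rest AUC as P(score of a positive case > score of a negative case).

    Ties contribute 0.5. Runs in O(n_pos * n_neg), adequate for illustration only.
    """
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical class scores: cases that truly belong to the class vs. cases that do not.
positives = [0.9, 0.8, 0.4]
negatives = [0.7, 0.3, 0.2, 0.4]
ACCEPTANCE_THRESHOLD = 0.80  # the >= 0.80 criterion quoted in the text

auc = auc_rank_probability(positives, negatives)
print(f"AUC = {auc:.3f}; meets >= 0.80: {auc >= ACCEPTANCE_THRESHOLD}")
```

In this toy example 10.5 of the 12 pairs are ranked correctly, giving an AUC of 0.875.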
The claim for these two sub-indication categories is qualified by an output-interpretation warning rendered in the Device Output Warnings section of the IFU, which states that for autoimmune dermatoses and genodermatoses the device provides a probability ranking within the broader ICD-11 output distribution that is to be interpreted as supporting information within the healthcare professional's differential-diagnosis workup; final diagnosis for these categories is an HCP determination based on clinical evaluation and histopathological, serological or genetic examination as per the current standard of care, and the device's probability ranking is supporting information only. The intended use and indication remain unchanged.
Methodological approach to data integration and pooling
To provide a comprehensive assessment of the device's performance, results from multiple clinical investigations have been integrated. When results are presented as an aggregate (e.g., "global value of the device"), a weighted pooling methodology is applied.
- Pooling Logic: Results are pooled across studies that share similar clinical endpoints and methodologies. For instance, diagnostic accuracy metrics (Sensitivity, Specificity, AUC) are pooled from studies evaluating the device's performance across diverse ICD-11 categories.
- Weighting: Each study's contribution to the global value is weighted by its sample size (n_i), ensuring that larger, more robust studies have a proportionally greater impact on the final performance estimate.
- Justification: Pooling is justified because all pivotal studies utilized the same frozen version of the AI algorithms and followed consistent acquisition protocols, despite differences in specific primary objectives (e.g., triage vs. referral). This provides a more statistically powerful estimate of the device's "real-world" performance across heterogeneous populations.
- Limitations: The evaluators acknowledge the inherent heterogeneity between prospective observational investigations, retrospective analyses and simulated-use MRMC studies. To mitigate this, subgroup analyses are performed and presented alongside the global values in the Technical Documentation.
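The sample-size weighting described above gives a pooled estimate of Σ(n_i × θ_i) / Σ n_i. Here θ_i is study i's point estimate (e.g. sensitivity) and n_i is its sample size. The study labels and values below are hypothetical placeholders, not portfolio results. This is a simple weighted mean, not an inverse-variance or random-effects meta-analysis. On its own it yields no confidence interval or heterogeneity statistic.

```python
from typing import Sequence, Tuple


def pooled_estimate(studies: Sequence[Tuple[str, int, float]]) -> float:
    """Sample-size-weighted mean of per-study point estimates.

    Each study is (label, n_i, estimate_i); estimate_i is a proportion such as sensitivity.
    """
    total_n = sum(n for _, n, _ in studies)
    if total_n <= 0:
        raise ValueError("total sample size must be positive")
    return sum(n * est for _, n, est in studies) / total_n


# Hypothetical placeholder values, for illustration only.
example = [("study_A", 100, 0.90), ("study_B", 300, 0.95), ("study_C", 100, 0.70)]
print(round(pooled_estimate(example), 4))  # 0.89 (an unweighted mean would be 0.85)
```

The larger study_B pulls the pooled value towards its own estimate. This is the intended effect of weighting by n_i.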
Results of the literature search on the device under evaluation
Summary of the identified clinical studies on the device
In this search, several records of clinical data have been identified.

The clinical data identified for the device under evaluation in ClinicalTrials.gov correspond to the registrations of two clinical investigations already described in the section "Pre-market clinical investigations": Legit.HEALTH_IDEI_2023 and LEGIT_COVIDX_EVCDAO_2022.

In addition, one article was retrieved by both PubMed and Google Scholar and was counted once, giving a total of 13 articles. One of these, "Skin & Digital: The 2024 Startups", summarizes the digital innovations in dermatology and aesthetics presented at the 2024 Skin & Digital Summit. It focuses on several start-ups redefining the sector with technologies such as artificial intelligence (AI) and telehealth. Because it provides no clinical data on the device, this article was excluded.
Secondly, a detailed appraisal of the 12 articles identified from PubMed (10) and Google Scholar (2) was conducted to distinguish between publications describing purely algorithmic optimisation and those reporting clinical validation of the device's algorithms against expert clinical consensus.
The selection criteria target (i) peer-reviewed publications concerning the device under evaluation and (ii) publications concerning the manufacturer's legacy device, to which equivalence is claimed per MDR Annex XIV Part A §3. No third-party equivalent device is claimed; publications on comparable devices (SkinVision, DERM, Dermalyser, ModelDerm, HUVY and the other comparators identified in R-TF-015-011 State of the Art) are therefore treated under the State of the Art search rather than under this device-specific literature review.
Group A: Algorithmic optimisation studies (8 articles, excluded): Eight publications describe preclinical (in-silico) results focused on the mathematical optimisation of neural network architectures and benchmarking against public datasets (e.g., ISIC Archive) without clinician involvement or prospective clinical workflows. Per the definition of "clinical data" in MDR Article 2(48), these results do not constitute clinical data as they do not arise from the use of the device in or on humans in a clinical setting. This information is appropriately addressed in the preclinical validation sections of the Technical File.
Group B: Published severity validation studies (4 articles, included as Technical Performance evidence per MDCG 2020-1 Pillar 2): Four peer-reviewed publications describe the validation of the device's severity assessment algorithms against expert dermatologist consensus using internationally validated clinical severity scales. Unlike the Group A publications, these studies involve direct comparison of the device's algorithmic output against the clinical judgment of independent specialist dermatologists serving as the reference standard. The four publications are:
- Mac Carthy T, Dagnino D, Medela A, et al. "Artificial intelligence-based quantification to assess the Automatic Psoriasis Area and Severity Index." JEADV Clinical Practice. 2025;4(1):70143. doi:10.1002/jvc2.70143: Validation of the APASI system for automated PASI scoring using 2,857 psoriasis images annotated by 4 expert dermatologists (visual sign classification) and 3 healthcare professionals (lesion segmentation).
- Mac Carthy T, Hernández Montilla I, Aguilar A, et al. "Automatic Urticaria Activity Score: Deep learning-based automatic hive counting for urticaria severity assessment." JID Innovations. 2024;4(1):100218. doi:10.1016/j.xjidi.2023.100218: Validation of the AUAS system for automated UAS scoring against 5 expert dermatologists using 313 images from 231 subjects.
- Hernández Montilla I, Medela A, Mac Carthy T, et al. "Automatic International Hidradenitis Suppurativa Severity Score System (AIHS4): A novel tool to assess the severity of hidradenitis suppurativa using artificial intelligence." Skin Research and Technology. 2023;29(6):e13357. doi:10.1111/srt.13357: Validation of the AIHS4 system for automated IHS4 scoring against 6 specialist dermatologists using 221 hidradenitis suppurativa images.
- Medela A, Mac Carthy T, Aguilar Robles SA, et al. "Automatic SCOring of Atopic Dermatitis using deep learning: A pilot study." JID Innovations. 2022;2(3):100107. doi:10.1016/j.xjidi.2022.100107: Validation of the ASCORAD system for automated SCORAD scoring against 9 dermatologists (3 per dataset across 3 annotated datasets). The system achieved an RMAE of 13.0% for visual sign severity and an AUC of 0.93 for lesion surface segmentation.
These publications constitute Technical Performance evidence under MDCG 2020-1 (Clinical evaluation of medical device software), demonstrating that the device's severity scoring algorithms produce outputs consistent with expert clinical consensus on validated severity scales (PASI, UAS, IHS4, SCORAD). Their analysis is presented in the section Analysis of published severity validation studies below.
The literature search for the device under evaluation followed the same rigorous PICO protocol and appraisal methodology as the State of the Art (SotA) search. The only difference was the addition of manufacturer-specific keywords (manufacturer name and legal group identifier) to specifically target publications related to the subject device. This design is intended to ensure that no relevant peer-reviewed clinical studies on the subject device were missed.
Comprehensive identification was tested by re-running the PubMed algorithm at the closure of the April 2026 supplementary literature review described in section Supplementary Literature Review: April 2026; no new eligible records were retrieved beyond those already captured in the original device-specific search and the supplementary review, supporting the conclusion that the literature retrieval is comprehensive for the evaluation period. No deviations from the CEP-defined literature search protocol occurred during either the original device-specific search or the April 2026 supplementary review; the equivalent statement for the State of the Art search is provided in R-TF-015-011 State of the Art. Full-text copies of each retained peer-reviewed publication (APASI_2025, AUAS_2023, AIHS4_2023, ASCORAD_2022, and NMSC_2025) are held in the technical documentation alongside the literature search log.
Analysis of published severity validation studies
The four Group B publications identified in the literature search provide Technical Performance evidence (MDCG 2020-1, Pillar 2) for the device's severity assessment algorithms. This section presents the appraisal and analysis of these publications per MEDDEV 2.7/1 Rev 4, Stages 1–3.
Algorithm traceability and architecture selection
Each publication evaluated multiple candidate neural network architectures to identify the optimal model for the specific severity assessment task. This comparative design is consistent with the algorithm development methodology documented in the AI Development Report and aligns with MDCG 2020-1 requirements for documenting algorithm selection rationale. The architecture selected and deployed in the production device for each severity scoring algorithm is:
- APASI (psoriasis severity): The MiT_b2-clf (Mix Vision Transformer) encoder architecture was selected based on superior performance across all three PASI visual signs (erythema, induration, desquamation). Other architectures evaluated included ResNet, SE_ResNeXt, Xception, Inceptionv4, EfficientNet, and additional MiT variants.
- AUAS (urticaria severity): A YOLOv5-based object-detection architecture was selected for automatic hive counting. Multiple YOLOv5 variants (YOLOv5n, YOLOv5s, YOLOv5m, YOLOv5l, YOLOv5x) were evaluated.
- AIHS4 (hidradenitis suppurativa severity): A YOLOv5-based architecture was selected for automatic lesion detection (nodules, abscesses, draining tunnels). Four YOLOv5 variants were evaluated.
- ASCORAD (atopic dermatitis severity): A custom architecture combining lesion-surface segmentation with visual-sign intensity classification was developed.
All selected architectures were frozen after selection and are the production algorithms deployed in the device. The performance results presented below correspond exclusively to the frozen production architectures. All results were obtained through k-fold cross-validation (5-fold for APASI, 4-fold for AUAS, 6-fold for AIHS4, 6-fold and 3-fold for ASCORAD), ensuring that evaluation data were independent of training data within each fold. The datasets used in these studies are inherently heterogeneous. They draw from multiple independent dermatological atlas sources (including DermNet, DermQuest, Danderm, and clinical atlas collections), cover Fitzpatrick skin types I through VI, and include diverse body zones, imaging conditions, and disease severities. ASCORAD_2022 additionally employs a dedicated independent test dataset (AD-Test, 367 images) gathered from different atlas sources from those used for the training data. This multi-source dataset composition addresses the generalisability requirement of MDCG 2020-1 for Technical Performance evidence.
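The following sketch illustrates the independence property claimed for k-fold cross-validation. It assumes a hypothetical per-image group identifier, for example a subject ID (AUAS_2023 used 313 images from 231 subjects). It assigns whole groups to folds, so that no group contributes images to both the training and the evaluation partition of the same fold. This section does not state whether the published studies split at image level or at subject level. The function and its parameters are illustrative and do not represent the manufacturer's pipeline.

```python
import random
from typing import Dict, Hashable, List, Sequence, Tuple


def grouped_k_fold(
    groups: Sequence[Hashable], k: int, seed: int = 0
) -> List[Tuple[List[int], List[int]]]:
    """Return k (train_indices, eval_indices) pairs.

    Every sample sharing a group id (e.g. a hypothetical subject id) is placed
    in the same evaluation fold, so train and eval never share a group.
    """
    unique_groups = sorted(set(groups), key=str)  # deterministic order before shuffling
    if not 2 <= k <= len(unique_groups):
        raise ValueError("k must be between 2 and the number of distinct groups")
    random.Random(seed).shuffle(unique_groups)
    # Round-robin assignment keeps the number of groups per fold balanced.
    fold_of: Dict[Hashable, int] = {g: i % k for i, g in enumerate(unique_groups)}
    splits = []
    for fold in range(k):
        eval_idx = [i for i, g in enumerate(groups) if fold_of[g] == fold]
        train_idx = [i for i, g in enumerate(groups) if fold_of[g] != fold]
        splits.append((train_idx, eval_idx))
    return splits


# Hypothetical example: 313 images from 231 subjects, 4 folds.
subject_ids = [i % 231 for i in range(313)]
for train_idx, eval_idx in grouped_k_fold(subject_ids, k=4):
    train_subjects = {subject_ids[i] for i in train_idx}
    eval_subjects = {subject_ids[i] for i in eval_idx}
    assert train_subjects.isdisjoint(eval_subjects)
```

Splitting by group rather than by image is a design choice. When several images come from the same subject, an image-level split can place near-duplicate views of one patient on both sides of a fold.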
Per-publication analysis
1. APASI: Psoriasis severity (PASI)
Mac Carthy T, Dagnino D, Medela A, et al. JEADV Clinical Practice. 2025;4(1):70143.
- Study design: Retrospective, non-interventional validation study using the Legit.Health-PsO-PASI dataset (2,857 psoriasis images sourced from clinical atlases, Fitzpatrick skin types I–VI with 92% types I–IV and 8% types V–VI).
- Expert annotators: Two independent annotator groups: Dataset-S (1,364 images, 3 annotators for segmentation) and Dataset-VS (2,500 images, 4 independent expert dermatologists with ≥7 years' experience for visual sign intensity classification). Ground truth established by averaging expert annotations.
- Reference standard: Expert dermatologist consensus on PASI visual sign intensities (erythema, induration, desquamation) using the validated 0–4 ordinal scale.
- Results (production architecture, MiT_b2-clf):
| Visual Sign | Device Accuracy | Human Annotator Accuracy (one-vs-rest) | Device vs. Human |
|---|---|---|---|
| Erythema | 60.6% | 52.5% | Device superior |
| Induration | 54.3% | 51.6% | Device superior |
| Desquamation | 61.8% | 59.0% | Device superior |
- Interobserver agreement (Cohen's kappa, quadratic weighting): Erythema 0.643, Induration 0.684, Desquamation 0.758. All three values fall within the "substantial" band (0.61–0.80) of Landis and Koch (1977). The device's accuracy exceeds the one-vs-rest human annotator benchmark, which indicates that the algorithm performs at or above the level of expert dermatologists for PASI visual sign classification.
- Lesion segmentation: The Xception model achieved IoU 0.752 for lesion surface segmentation, outperforming expert dermatologists.
- Limitations: Retrospective image-based study using atlas images rather than prospective clinical encounters; limited representation of severity degree 4 (most severe); annotations by non-dermatologist healthcare professionals for segmentation task.
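For reference, quadratic-weighted Cohen's kappa for two raters on an ordinal scale is κ_w = 1 − Σ w_ij·O_ij / Σ w_ij·E_ij, with w_ij = (i − j)² / (K − 1)². Here O is the observed joint rating distribution and E is the product of the two raters' marginals. The sketch below implements this textbook formula for the 0–4 PASI visual-sign scale (K = 5). The ratings are hypothetical. This section does not state how APASI_2025 aggregated agreement across its four annotators (for example, by averaging pairwise values), and the code makes no assumption about it.

```python
from typing import Sequence


def quadratic_weighted_kappa(a: Sequence[int], b: Sequence[int], n_classes: int) -> float:
    """Cohen's kappa with quadratic disagreement weights for two raters
    scoring the same items on an ordinal 0..n_classes-1 scale."""
    if len(a) != len(b) or not a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(a)
    # Observed joint distribution of (rater A, rater B) scores.
    observed = [[0.0] * n_classes for _ in range(n_classes)]
    for x, y in zip(a, b):
        observed[x][y] += 1.0 / n
    # Marginal distributions; their product is the chance-expected distribution.
    row = [sum(observed[i]) for i in range(n_classes)]
    col = [sum(observed[i][j] for i in range(n_classes)) for j in range(n_classes)]
    denom = (n_classes - 1) ** 2
    weighted_obs = weighted_exp = 0.0
    for i in range(n_classes):
        for j in range(n_classes):
            w = (i - j) ** 2 / denom  # quadratic disagreement weight
            weighted_obs += w * observed[i][j]
            weighted_exp += w * row[i] * col[j]
    if weighted_exp == 0.0:
        raise ValueError("kappa is undefined when expected disagreement is zero")
    return 1.0 - weighted_obs / weighted_exp


# Hypothetical erythema ratings (0-4 scale) from two annotators.
rater_1 = [0, 1, 2, 3, 4, 2, 1, 3]
rater_2 = [0, 1, 3, 3, 4, 1, 1, 2]
print(round(quadratic_weighted_kappa(rater_1, rater_2, n_classes=5), 3))
```

Quadratic weights penalise a 0-versus-4 disagreement sixteen times more than a 0-versus-1 disagreement. This suits ordinal severity grades, where near-misses are clinically less serious than large discrepancies.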
2. AUAS: Urticaria severity (UAS)
Mac Carthy T, Hernández Montilla I, Aguilar A, et al. JID Innovations. 2024;4(1):100218.
- Study design: Retrospective, non-interventional validation study using 313 urticaria images from 231 subjects, collected from dermatology atlases.
- Expert annotators: 5 expert dermatologists who frequently care for urticaria patients. Ground truth established through consensus annotation.
- Reference standard: Expert consensus on hive detection and severity categorisation using the validated Urticaria Activity Score (UAS) system (none, mild, moderate, severe).
- Results (production architecture, YOLOv5):
| Metric | Value | Interpretation |
|---|---|---|
| Krippendorff alpha (hive counting) | 0.826 | Strong agreement between device and expert consensus |
| Krippendorff alpha (severity assessment) | 0.603 | Moderate agreement, comparable to inter-expert agreement |
| F1-Box score (hive detection) | 0.622 ± 0.047 | Comparable to specialist-level detection |
- Per-severity performance: The device achieved F1-Box scores of 0.87 (none), 0.54 (mild), 0.61 (moderate), 0.55 (severe), comparable to specialist performance at each severity level (mean specialist F1-Box: 0.57 mild, 0.47 moderate, 0.41 severe).
- Skin tone analysis: Performance was assessed on both light and dark skin images, demonstrating applicability across skin tones.
- Limitations: Retrospective atlas-based images; inter-expert variability in severe cases with overlapping hives; small annotation team (5 specialists).
- SotA contextualisation of Krippendorff α = 0.603: No published study reports clinician-to-clinician inter-rater agreement for the UAS, because the UAS is a patient-reported outcome instrument rather than a clinician-assessed scale. The available reliability evidence reflects patient self-consistency under controlled conditions:
  - Hollis et al. (2018, n = 614 CSU patients, ASSURE-CSU study) reported weighted kappa κ = 0.78–0.82 between different UAS7 completion protocols.
  - Jauregui et al. (2019, n = 166 CSU patients, EVALUAS study) reported test-retest ICC = 0.84 (Cronbach α = 0.83) for the validated Spanish UAS7.
  The device infers severity from clinical photographs, an intrinsically more demanding task than patient self-report, and achieves Krippendorff α = 0.603. This places it at the upper boundary of the "moderate" band, just below the 0.61 threshold at which "substantial" agreement begins (Landis and Koch 1977). Because no clinician inter-rater UAS benchmark exists in the published literature, α = 0.603 is appropriately treated as a pre-market baseline, with trajectory monitoring committed in PMCF activity B.2.
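The qualitative labels used above follow the Landis and Koch (1977) bands: slight 0.00–0.20, fair 0.21–0.40, moderate 0.41–0.60, substantial 0.61–0.80 and almost perfect 0.81–1.00. These bands were defined for kappa, and the text above applies them to Krippendorff's α by analogy. The sketch below assumes that values falling between two published bands, such as 0.603, are assigned to the lower band. That boundary convention is a choice made here and is not part of the original scale.

```python
# Lower bounds of the Landis and Koch (1977) bands, highest first.
_LANDIS_KOCH_BANDS = [
    (0.81, "almost perfect"),
    (0.61, "substantial"),
    (0.41, "moderate"),
    (0.21, "fair"),
    (0.00, "slight"),
]


def landis_koch_band(coefficient: float) -> str:
    """Return the descriptive band for an agreement coefficient.

    Values between two published bands (e.g. 0.603) fall to the lower band.
    """
    if coefficient > 1.0:
        raise ValueError("agreement coefficients cannot exceed 1.0")
    for lower_bound, label in _LANDIS_KOCH_BANDS:
        if coefficient >= lower_bound:
            return label
    return "poor"  # negative values: agreement worse than chance


for name, value in [("AUAS counting", 0.826), ("AUAS severity", 0.603)]:
    print(f"{name}: {value} -> {landis_koch_band(value)}")
# prints:
# AUAS counting: 0.826 -> almost perfect
# AUAS severity: 0.603 -> moderate
```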
3. AIHS4: Hidradenitis suppurativa severity (IHS4)
Hernández Montilla I, Medela A, Mac Carthy T, et al. Skin Research and Technology. 2023;29(6):e13357.
- Study design: Retrospective validation study using the Legit.Health-HS-IHS4 dataset (221 images of HS at different severity levels from DermQuest and DermNetNZ).
- Expert annotators: 6 specialist dermatologists (including 1 senior dermatologist with decades of experience and the highest degree of HS specialisation). Ground truth established through a novel four-stage clinical knowledge unification algorithm incorporating majority voting.
- Reference standard: Clinical consensus on IHS4 lesion detection (nodules, abscesses, draining tunnels) and the resulting IHS4 severity score (mild ≤3, moderate 4–10, severe ≥11).
- Results (production architecture, YOLOv5, Legit.Health-IHS4Net): The device assesses the severity of HS cases with a performance comparable to that of the most expert physician. The clinical knowledge unification algorithm produces a consensus ground truth that consolidates the subjective assessments of multiple experts into a reliable reference standard.
- Annotation variability analysis: The specialists' clinical experience in treating HS averaged 4.50 ± 1.38 years. They use IHS4 in daily clinical practice at a frequency of 8.17 ± 1.47 (on a 1–10 scale) and rated the annotation task difficulty at 6.50 ± 1.56 (on a 1–10 scale). These self-reported ratings indicate that manual IHS4 assessment is a demanding task prone to inter-observer variability, which the device aims to reduce.
- Limitations: Dataset from dermatology atlases (DermQuest and DermNetNZ, static images, not clinical encounters); limited dataset size (221 images); annotation variability among specialists (task difficulty rated 6.50 ± 1.56 on a 1–10 scale).
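The IHS4 severity categories listed above derive from the standard IHS4 formula: IHS4 = (number of nodules × 1) + (number of abscesses × 2) + (number of draining tunnels × 4). Scores are categorised as mild (≤ 3), moderate (4–10) and severe (≥ 11). The sketch below shows only this final scoring step from lesion counts. The counts are hypothetical, and the sketch does not represent the AIHS4 lesion detection or clinical knowledge unification stages.

```python
from typing import Tuple


def ihs4_score(nodules: int, abscesses: int, draining_tunnels: int) -> Tuple[int, str]:
    """Compute the IHS4 score and severity category from lesion counts."""
    if min(nodules, abscesses, draining_tunnels) < 0:
        raise ValueError("lesion counts must be non-negative")
    score = nodules * 1 + abscesses * 2 + draining_tunnels * 4
    if score <= 3:
        category = "mild"
    elif score <= 10:
        category = "moderate"
    else:
        category = "severe"
    return score, category


# Hypothetical counts: 2 nodules, 1 abscess, 2 draining tunnels -> 2 + 2 + 8 = 12.
print(ihs4_score(2, 1, 2))  # (12, 'severe')
```

Because draining tunnels carry four times the weight of nodules, a single missed tunnel can move a case across a category boundary. This is one reason lesion-level detection accuracy matters for the final score.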
4. ASCORAD: Atopic dermatitis severity (SCORAD)
Medela A, Mac Carthy T, Aguilar Robles SA, et al. JID Innovations. 2022;2(3):100107.
- Study design: Retrospective validation study using three annotated datasets comprising 1,083 images total: Legit.Health-AD (604 images, light skin, children and adults), Legit.Health-AD-Test (367 images, light skin, independent test set), and Legit.Health-AD-FPK-IVI (112 images, Fitzpatrick IV–VI dark skin).
- Expert annotators: 9 dermatologists (3 per dataset), all treating AD patients in daily clinical practice. Annotation consistency assessed with ACC 81.0–91.3%, AUC 0.91, F1 0.86–0.91, and RSD 8.6–9.1% across datasets.
- Reference standard: Expert dermatologist consensus on the six objective SCORAD visual signs (erythema, edema, oozing, excoriations, lichenification, dryness) using the validated 0–3 intensity scale, plus lesion surface area estimation.
- Results (production architecture, Legit.Health-SCORADNet):
| Metric | Value | Interpretation |
|---|---|---|
| RMAE (visual sign severity) | 13.0% | Mean relative error of 13% between algorithm severity estimates and expert consensus |
| AUC (lesion surface segmentation, light skin) | 0.93 (95% CI: 0.90–0.96) | Excellent discriminative performance |
| IoU (lesion surface segmentation, light skin) | 0.64 (95% CI: 0.59–0.69) | Good spatial agreement with expert segmentation |
- Per-visual-sign analysis: Per-sign RMAE ranged from 8.7% (lichenification) to 19.4% (oozing/crusts), with individual values of 13.3% (erythema), 16.0% (edema/papulation), 9.6% (excoriations), and 11.3% (dryness), and an overall RMAE of 13.0% across all six SCORAD visual signs. This demonstrates the algorithm's capability to assess each visual sign with accuracy comparable to inter-expert agreement.
- Skin tone analysis: The study explicitly assessed performance on dark skin (Fitzpatrick types V–VI), showing improved segmentation metrics when dark skin images were included in training (IoU improvement from 0.32 to 0.45).
- Limitations: Three annotators per dataset (though 9 in total across the 3-dataset design); highest performance on light skin, with lower but improving metrics on dark skin; atlas-based images (consistent with the other three published validation studies). Note: the publication title includes "pilot study", but this reflects the authors' conservative positioning relative to prospective clinical validation rather than a methodological limitation. The study uses the same retrospective validation design as the other three publications, with the second-largest dataset (1,083 images) and the largest annotator panel (9 dermatologists) of the four.
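Two of the quantities above can be checked with simple arithmetic. Intersection over union (IoU) for a binary lesion mask is |A ∩ B| / |A ∪ B|. The sketch below computes it on small hypothetical masks and treats two empty masks as perfect agreement, a convention chosen here. It also takes the unweighted mean of the six per-sign RMAE values reported above. The result, about 13.05%, is consistent with the reported overall RMAE of 13.0%. Whether the publication computed its overall figure this way is an assumption.

```python
from typing import Sequence


def mask_iou(pred: Sequence[Sequence[bool]], ref: Sequence[Sequence[bool]]) -> float:
    """Intersection over union of two binary masks with identical shape."""
    if len(pred) != len(ref) or any(len(p) != len(r) for p, r in zip(pred, ref)):
        raise ValueError("masks must have identical shape")
    pairs = [(p, r) for prow, rrow in zip(pred, ref) for p, r in zip(prow, rrow)]
    intersection = sum(1 for p, r in pairs if p and r)
    union = sum(1 for p, r in pairs if p or r)
    return intersection / union if union else 1.0  # both empty: perfect agreement


# Hypothetical 3x3 masks: 2 shared pixels, 4 pixels in the union -> IoU 0.5.
pred = [[True, True, False], [False, True, False], [False, False, False]]
ref = [[True, False, False], [False, True, True], [False, False, False]]
print(mask_iou(pred, ref))  # 0.5

# Per-sign RMAE values (%) as reported above for ASCORAD_2022.
per_sign_rmae = {
    "erythema": 13.3, "edema/papulation": 16.0, "oozing/crusts": 19.4,
    "excoriations": 9.6, "lichenification": 8.7, "dryness": 11.3,
}
mean_rmae = sum(per_sign_rmae.values()) / len(per_sign_rmae)
print(f"{mean_rmae:.2f}")  # 13.05
```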
Summary of published severity evidence and relationship to benefit 5RB
The four publications collectively demonstrate the device's capability to perform objective, quantitative severity assessment across four major dermatological conditions using internationally validated clinical scales. The following table summarises the evidence and its relationship to the acceptance criteria for benefit 5RB (Objective Severity Assessment):
| Condition | Scale | Source Publication | Sample Size | Key Metric | Result | Relationship to 5RB |
|---|---|---|---|---|---|---|
| Psoriasis | PASI | APASI_2025 | 2,857 images, 7 annotators (4 dermatologists) | Visual sign accuracy vs. expert consensus | Device exceeds human annotator accuracy (60.6% vs 52.5% erythema) | Demonstrates objective, quantitative, and reproducible PASI assessment |
| Urticaria | UAS | AUAS_2023 | 313 images, 5 dermatologists | Krippendorff alpha (counting / severity) | 0.826 / 0.603 | Demonstrates objective hive counting and severity categorisation |
| Hidradenitis Suppurativa | IHS4 | AIHS4_2023 | 221 images, 6 dermatologists | Agreement with clinical consensus | Comparable to most expert physician | Demonstrates automated IHS4 scoring comparable to specialist performance |
| Atopic Dermatitis | SCORAD | ASCORAD_2022 | 1,083 images (3 datasets), 9 dermatologists | RMAE visual sign severity | 13.0% | Demonstrates objective SCORAD assessment with accuracy comparable to experts |
| Hidradenitis Suppurativa | IHS4 | AIHS4_2025 (preliminary) | 2 patients, 16 assessments | ICC | 0.727 (criterion ≥ 0.70) | Preliminary real-world longitudinal evidence; limited sample; prospective confirmation at scale planned in PMCF B.1 |
This multi-condition evidence base, drawn from published peer-reviewed literature, establishes that the device's severity assessment algorithms produce clinically meaningful scores concordant with expert dermatologist judgment across psoriasis, urticaria, hidradenitis suppurativa, and atopic dermatitis. It therefore addresses the breadth of conditions covered by benefit 5RB. These publications constitute Technical Performance evidence (MDCG 2020-1 Pillar 2), demonstrating algorithm-level validity against expert consensus as the reference standard. The AIHS4_2025 study provides preliminary Clinical Performance evidence (Pillar 3) from a real-world longitudinal clinical setting, although with a limited sample (2 patients, 16 assessments). This evidence base (Technical Performance established across four conditions, supplemented by preliminary Clinical Performance data) is sufficient to support initial CE marking for the severity assessment benefit. PMCF activities B.1–B.5 will provide essential prospective Clinical Performance evidence at scale. They will confirm that the algorithm-level performance demonstrated in the published literature translates to real-world clinical settings with device-captured images.
Generalisability of atlas-based validation to clinical use: All four published validation studies used retrospective datasets sourced from clinical atlases and dermatology databases rather than prospective images captured by clinicians using the device in its intended workflow. This study design is inherent to Technical Performance validation (MDCG 2020-1 Pillar 2), where the objective is to establish algorithm accuracy against expert consensus under controlled conditions. The generalisability of these results to real-world clinical use, where images are captured by HCPs using smartphones in variable lighting and clinical conditions, is a recognised limitation. Three factors mitigate this concern: (a) the device includes an integrated image quality assessment processor that rejects images below a minimum quality threshold before analysis, standardising input quality; (b) the APASI_2025 dataset includes Fitzpatrick skin types I–VI, demonstrating robustness to skin tone variation; and (c) the severity scoring algorithms assess relative visual sign intensity within the lesion area rather than absolute pixel values, reducing sensitivity to capture conditions compared to diagnostic classification algorithms. Nevertheless, prospective confirmation of severity scoring performance using device-captured clinical images is a primary objective of PMCF activities B.1–B.5.
Clinical data from national registries
No specific national registries have been identified for the device under evaluation.
Analysis of the clinical data
Requirement on safety
Presumption of conformity
It is important to note that while the available list of harmonized standards drafted in support of Regulation (EU) 2017/745 has grown, it remains limited in key areas relevant to Software as a Medical Device (SaMD) and Artificial Intelligence (AI).
While harmonized standards are not mandatory, they provide a recognized method for demonstrating a presumption of conformity. In the absence of fully harmonized MDR standards for critical aspects like the software lifecycle, manufacturers must use other methodologies. Thus, the "state-of-the-art" (SotA) references for judging conformity, including standards like EN 62304 and relevant MDD-harmonized standards, remain the best practice.
It must also be specified that the formal assessment of compliance with these standards is a function of the Technical Documentation, not of this clinical evaluation. For a medical device, compliance with the requirements of a standard (e.g., risk management) does not, by itself, constitute sufficient clinical evidence to demonstrate clinical performance and safety. Nevertheless, this CER acknowledges the device's claimed conformity with these standards as the foundation of its safety and performance.
Hazards related to software performance, AI algorithm function, and cybersecurity are fundamentally addressed by a rigorous development and risk management process, guided by standards such as EN IEC 62304, EN 82304-1, and EN 81001-5-1. Furthermore, as the device utilizes AI, the performance testing of its algorithms followed the Good Machine Learning Practice (GMLP) guidelines and principles outlined in the AI Act (Regulation (EU) 2024/1689). The verification and validation (V&V) results demonstrating technical compliance are detailed in the Technical Documentation.
However, for a Class IIb device, technical V&V alone is insufficient. This CER provides the necessary clinical data to confirm that the clinical output of these algorithms is safe, effective, and provides the intended clinical benefit when used in the target clinical environment.
Summative usability validation
The risk of use error is a critical aspect of the device's safety profile. This risk is managed through compliance with standards for information to be supplied by the manufacturer (EN ISO 15223-1:2021 and EN ISO 20417) and, most importantly, the usability engineering standard EN 62366-1. These standards define the process for reducing usability-related risks but do not provide specific design solutions. Given that ergonomic features and user interaction are known to contribute to incidents, and in line with the requirements for a Class IIb device, clinical data was required. Therefore, a summative usability study was conducted in October 2025 as a standalone human factors validation study in accordance with IEC 62366-1:2015 §5.9 and the FDA Final Guidance on Applying Human Factors and Usability Engineering to Medical Devices (February 2016). The study was not embedded within a clinical investigation under MDR Article 62 but was designed as an independent evaluation of the device's user interface for both intended user groups. The CEP (R-TF-015-001) has been updated to reflect this completed status. The full study documentation comprises R-TF-025-004 Summative Evaluation Protocol, R-TF-025-005 Summative Evaluation Observation Form, R-TF-025-006 Summative Evaluation Questionnaires, and R-TF-025-007 Summative Evaluation Report.
Methodology: The study enrolled 36 participants across both intended user groups: 18 healthcare professionals (HCPs: 10 nurses, 5 dermatologists, 3 general practitioners) and 18 IT professionals (ITPs). HCP testing was conducted in person in Valencia, Spain; ITP testing was conducted remotely. HCPs used their own smartphones to maximise ecological validity. Critical tasks were defined in R-TF-025-004 and covered simulated use scenarios (photographing lesions and interpreting device output) and knowledge assessments (comprehension of report contents, malignancy probability interpretation, and understanding that the device is not intended for diagnosis).
Results and acceptance criteria:
- HCP simulated use (Scenarios 1 and 2): 100% success (18/18). Acceptance criterion: >= 90% success. Met.
- ITP simulated use (all 7 tasks): 100% success (18/18). Acceptance criterion: >= 90% success. Met.
- ITP knowledge assessment (all 6 questions): 100% success (18/18). Acceptance criterion: >= 90% success. Met.
- HCP knowledge assessment (Scenario 3): Q1 94.4%, Q2 100%, Q3 100%, Q4 72.2%. A total of 1 use error, 3 close calls, and 2 use difficulties were observed, all in Scenario 3. No use problems were observed during simulated use or in ITP testing.
- System Usability Scale (SUS): HCPs 82.5, ITPs 85.2, both classified as "Excellent" per Bangor et al. (2009) and exceeding the target threshold of 70 ("Good"). The SUS is a human-factors instrument measuring perceived usability of the user interface (per IEC 62366-1); it is reported here as the human-factors outcome of the summative evaluation under EN 62366-1 §5.9 and is not a clinical-performance metric. Clinical performance is reported separately in the prospective pivotal investigations against the per-benefit acceptance criteria.
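The percentages above correspond to whole-participant counts out of n = 18. For example, 94.4% = 17/18 and 72.2% = 13/18. These counts are back-calculated here and are not quoted from R-TF-025-007. The sketch below checks such counts against a ≥ 90% success criterion using exact fractions, so that a result lying exactly on the threshold is not misclassified through floating-point rounding. Applying the 90% threshold to the individual HCP knowledge questions is an illustrative assumption, because the acceptance criteria stated above cover the task groups listed.

```python
from fractions import Fraction

SUCCESS_CRITERION = Fraction(90, 100)  # ">= 90% success"


def meets_criterion(successes: int, participants: int,
                    criterion: Fraction = SUCCESS_CRITERION) -> bool:
    """Exact comparison of a success proportion against the criterion."""
    if participants <= 0 or not 0 <= successes <= participants:
        raise ValueError("invalid counts")
    return Fraction(successes, participants) >= criterion


# Back-calculated HCP Scenario 3 counts (n = 18), hypothetical reconstruction.
hcp_knowledge = {"Q1": 17, "Q2": 18, "Q3": 18, "Q4": 13}
for question, successes in hcp_knowledge.items():
    rate = 100 * successes / 18
    print(f"{question}: {rate:.1f}% meets >= 90%: {meets_criterion(successes, 18)}")
```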
Residual risk assessment for Q4 (72.2%): Q4 assessed whether participants correctly understood that the device output is not a diagnosis. The 72.2% success rate was subjected to root cause analysis and residual risk assessment in accordance with R-TF-025-004 §14.7 and EN 62366-1 §5.9. The assessment concluded that the residual risk is acceptable because: (a) the device output explicitly labels itself as clinical decision support information, not a diagnosis; (b) the IFU contains a prominent non-diagnostic disclaimer; (c) the clinical workflow requires the HCP to integrate device output with patient history and clinical findings before reaching a decision. The detailed analysis and residual risk assessment are documented in R-TF-025-007.
Traceability to risk management: The summative evaluation results provide direct occurrence data for usability-related risks in the risk management file. In particular, AI-RISK-021 (model outputs not interpretable by clinical users, residual severity 3, residual RPN 6) is informed by the Scenario 3 results: the 72.2% Q4 success rate and the 1 observed use error provide quantitative evidence of the residual likelihood for this risk. The risk management record (R-TF-013-002) references these results as the basis for the AI-RISK-021 residual likelihood estimate.
The harmonised standards list is available at https://single-market-economy.ec.europa.eu/single-market/goods/european-standards/harmonised-standards/medical-devices_en (last accessed at the date of this CER revision).
The full list of applied standards is available in section Relevant preclinical data of the present clinical evaluation report.
Adequacy of preclinical testing to verify safety
As displayed in section Relevant preclinical data, the manufacturer has performed several preclinical tests to verify multiple design outputs and to ensure its safety. This testing includes software verification, cybersecurity assessments, and performance evaluations of the AI algorithms, all conducted in accordance with recognized standards and guidelines. These tests include:
- Software testing, including unit, integration and verification tests, according to EN IEC 62304 (Medical device software - Software life cycle processes) and EN 82304-1 (Health Software - Part 1: General requirements for product safety). The tests and the associated reports are presented in a single software test report available in GP-012 Design, redesign and development.
- Security requirements testing, threat mitigation testing, vulnerability testing, and penetration testing (by an independent expert), performed as recommended in IEC 81001-5-1:2021-12 (Health software and health IT systems safety, effectiveness and security) (available in GP-030 Security).
- Performance testing of the algorithms of the 31 AI models (26 clinical models and 5 non-clinical models) following GMLP (Good Machine Learning Practice) 2021; FG-AI4H-K-039 Updated DEL2.2 - 2021: Good practices for health applications of machine learning: Considerations for manufacturers and regulators; and the AI Act (Artificial Intelligence Act, OJ L, 2024/1689). All algorithm performance tests and the associated reports are available in R-TF-028-005 AI Development Report.
- Usability file prepared according to NF EN 62366-1:2015/A1:2020 (Medical devices - Part 1: Application of usability engineering to medical devices) (available in R-TF-025-003 User interface evaluation plan).
Safety concerns related to special design features
The device does not present any special design features that pose special safety concerns (e.g. presence of medicinal, human, or animal components).
Consistency between the State of the Art, the available clinical data and the risk management documentation
This section cross-analyses the safety-related clinical data from the SotA and from the device under evaluation against the information materials supplied by the manufacturer (i.e. the IFU/user manual) and the risk management documentation.
First, no safety concerns (hazardous event/harm regarding the patient or user) were reported in the clinical data from either the SotA (standard clinical practice in dermatology or primary care or with AI-guided medical devices for diagnostic support in dermatological conditions), the literature on similar devices (e.g. SkinVision, Huvy, Dermalyser, ModelDerm or DERM) or in the clinical data on the device.
Concerning similar devices for skin lesion analysis, we also reviewed the user manuals of SkinVision (Skin Vision B.V.), DERM (Skin Analytics Limited), Dermalyser (AI Medical Technology), ModelDerm (Iderma) and HUVY (SLC.AI). Two types of statements were identified:
- “Warnings”, i.e. statements indicating a potentially hazardous situation that, if not avoided, could result in death or serious injury, such as those arising from a false negative or delayed treatment.
- “Cautions”, i.e. statements indicating a potentially hazardous situation that, if not avoided, may result in minor or moderate injury, or a condition that may lead to damage of equipment, lower quality of use, or loss of information, such as using the software on modified operating systems or corrupted outputs.
All identified Warnings and Cautions are known and are also identified by the manufacturer in the user manual and risk management documentation of the device.
No gaps or discrepancies were identified between the SotA, the device under evaluation, the information materials supplied by the manufacturer, and the risk management documentation.
The cross-analysis identified no new residual risks, uncertainties or unanswered questions beyond those already declared and addressed in this CER. The pre-existing declared residual uncertainties — specifically the eight residual risks documented in R-TF-013-003 Risk Management Report, the three §6.5(e) acceptable evidence gaps (autoimmune, genodermatoses, Fitzpatrick V–VI), and the paediatric-population coverage limitation — are individually addressed in their respective sections of this CER, in R-TF-013-002 and in R-TF-007-002 PMCF Plan activities. The physician-perceived misleading-output rate observed in R-TF-015-012 Section F (F1 = 26.8 % on the N = 56 analysis set) sits below the protocol's pre-specified 30 % follow-up threshold and is therefore not a signal in the pre-specified sense; it is nonetheless monitored prospectively under the PMCF algorithmic-performance activity (C.1) as a precaution.
Consistency with information materials supplied by the manufacturer
As presented in the previous section, all of the identified risks are already known and properly addressed in the documentation established by the manufacturer of the device.
Appropriateness of residual-risk communication for users. The quantitative residual-risk estimation presented in this CER (sections Safety Benchmarking against State of the Art, Risk management and residual risks acceptability, Predictive Values by Clinical Setting (MEDDEV 2.7.1 Rev 4, Annex A7.3)) is rendered in the IFU in a form appropriate for the intended healthcare-professional users — as plain-language precautions, warnings, and diagnostic-performance ranges by clinical setting (primary care, general dermatology, pigmented-lesion clinic) — rather than as the underlying confidence-interval arithmetic. This satisfies MDCG 2020-13 §G.8.
IFU-to-technical-documentation alignment matrix. The following IFU elements have been verified as identical in content to their source in this CER, in the Risk Management File, and in the PMS / PMCF Plans:
| IFU element | CER source | Risk management source | PMS / PMCF source |
|---|---|---|---|
| Intended purpose | Executive summary; section Device description | R-TF-013-002 context | R-TF-007-001 |
| Three clinical benefits (7GH / 5RB / 3KX) | Section Clinical benefits; Summary of Clinical Benefits Achievement | R-TF-028-011 benefit-risk section | R-TF-007-002 PMCF Plan |
| Contraindications list (single source) | Section Contraindications | R-TF-013-002 residual risks | R-TF-007-001 PMS complaints categories |
| Precautions (HCP supervision, image quality, Fitzpatrick V-VI, paediatric) | Section Precautions; Representativeness of the Study Populations | R-TF-013-002 residual risks; R-TF-028-011 AI-RISK entries | R-TF-007-002 subpopulation monitoring (Activities E.1, F.1) |
| Warnings (malfunction, incident reporting) | Section Warnings; paragraph Measures in the event of malfunction or changes in performance | R-TF-028-011 defence-in-depth architecture | R-TF-007-001 vigilance loop |
| Performance claims by clinical setting | Section Predictive Values by Clinical Setting (MEDDEV 2.7.1 Rev 4, Annex A7.3) | n/a | R-TF-007-002 real-world performance monitoring (Activities C.1, C.2) |
| Residual-risk precautions (six safety objectives) | Section Risk management and residual risks acceptability | R-TF-013-002 safety objectives; R-TF-028-011 P₂=1 constraint | R-TF-007-001 safety-objective monitoring |
Any future amendment to any row is propagated to all columns and re-verified at the next CER update cycle.
New safety concerns
As presented in section Consistency between the State of the Art, the available clinical data and the risk management documentation, all of the identified risks are already mentioned in the IFU/user manual and the risk management file of the device.
Moreover, as this is the first clinical evaluation of the device, prepared for its first CE-marking submission under the MDR, there are no new clinical safety concerns arising from relevant changes to the device since a previous evaluation.
Statement on the conformity with general safety requirements (GSPR 1)
The MEDDEV 2.7/1 rev4 guidance document specifies that reaching a conclusion on a device's compliance with general safety requirements necessitates a review of the "information materials supplied by the manufacturer." This review must confirm that these materials are consistent with the relevant clinical data and that "all the hazards, information on risk mitigation and other clinically relevant information have been identified appropriately."
It is noteworthy that while the MEDDEV 2.7/1 rev4 guidance document was developed to address compliance with the safety-related Essential Requirement (ER1) of the MDD, its principles are considered to remain relevant for assessing compliance with the General Safety and Performance Requirements (GSPR 1) of the MDR.
Considering the observations detailed in the sections "Consistency between the State of the Art, the available clinical data and the risk management documentation" and "Consistency with information materials supplied by the manufacturer", it is concluded that the device conforms with the general safety requirements (GSPR 1). The device is therefore considered safe: it does not compromise the clinical condition or safety of patients, or the safety and health of users.
Requirements on acceptability of side-effects
According to GSPR 8 of Regulation (EU) 2017/745, “all known and foreseeable risks, and any undesirable side-effects, shall be minimized and be acceptable when weighed against the evaluated benefits to the patient and/or user arising from the achieved performance of the device during normal conditions of use”. The following table assesses the acceptability of the side effects of the device against the criteria of MEDDEV 2.7/1 Rev 4.
| To evaluate the acceptability of the side-effects of a device | Compliance | Justification/Discussion |
|---|---|---|
| There needs to be clinical data for the evaluation of the nature, severity, and frequency of potential undesirable side-effects | [X] Yes [ ] No [ ] To be discussed | The clinical evaluation is supported by data from the pre-market portfolio of six prospective pivotal clinical investigations, three MRMC simulated-use reader studies, one retrospective third-party analysis and one Fitzpatrick V–VI MRMC reader study, complemented by the post-market observational study R-TF-015-012 (Rank 4 quantitative outcomes and Rank 8 professional-opinion data) of the equivalent legacy predecessor. This body of clinical data was proactively gathered, with these studies designed to collect data on the device's safety (including the nature, severity, and frequency of potential undesirable side-effects) and performance under real-world use conditions, in line with MDR requirements. |
| The clinical data should contain an adequate number of observations (e.g. from clinical investigations or PMS) to guarantee the scientific validity of the conclusions relating to undesirable side effects and the performance of the device | [X] Yes [ ] No [ ] To be discussed | The adequacy of the number of observations is justified for both performance and safety. The prospective pivotal cohort comprises 719 patients across the six prospective pivotal clinical investigations; the MRMC reader-study cohort adds 49 healthcare professionals (BI_2024 n=15, PH_2024 n=9, SAN_2024 n=16, MAN_2025 ≥ 5 dermatologists, AIHS4_2025 n=2 patients with 16 severity assessments) on standardised image sets, and the equivalent legacy predecessor's post-market record contributes ≈ 250,000 diagnostic reports across 21 client institutions. Sample sizes were formally calculated per study to ensure statistical power for the primary performance endpoints. The combined dataset provides a sufficient basis for identification, characterisation and quantification of potential undesirable side effects; rule-of-three upper one-sided 95% bounds on observed zero-event endpoints are reported in the Safety Benchmarking section. No device-related undesirable side effects or adverse events were identified, confirming the scientific validity of the safety conclusion. |
| To evaluate if undesirable side effects are acceptable, consideration has to be given to the State of the Art, including properties of benchmark devices and medical alternatives that are currently available to the patients, and reference to objective performance criteria from applicable standards and guidance documents | [X] Yes [ ] No [ ] To be discussed | As shown in section Consistency between the State of the Art, the available clinical data and the risk management documentation, no safety concerns (hazardous events or harm to the patient or user) were identified in the clinical data from either the State of the Art (standard clinical practice in dermatological conditions and similar devices) or the clinical data on the device. |
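The adequacy row above relies on rule-of-three upper bounds for endpoints with zero observed events. As a minimal sketch (not the calculation used in the Safety Benchmarking section), the snippet below compares the rule-of-three approximation 3/n with the exact one-sided Clopper-Pearson bound, which solves (1 - p)^n = α for zero events in n observations. The 719-patient prospective cohort is used purely as an illustrative n; the denominators applied per endpoint in this CER may differ.

```python
def rule_of_three_upper(n: int) -> float:
    """Approximate one-sided 95% upper bound on an event rate when 0 events are observed in n."""
    if n <= 0:
        raise ValueError("n must be a positive number of observations")
    return 3.0 / n


def exact_zero_event_upper(n: int, alpha: float = 0.05) -> float:
    """Exact one-sided (Clopper-Pearson) upper bound for 0 events in n: solves (1 - p)**n == alpha."""
    if n <= 0:
        raise ValueError("n must be a positive number of observations")
    return 1.0 - alpha ** (1.0 / n)


if __name__ == "__main__":
    n = 719  # illustrative only: size of the prospective pivotal cohort stated in this CER
    print(f"rule of three: {rule_of_three_upper(n):.4%}")
    print(f"exact bound:   {exact_zero_event_upper(n):.4%}")
```

For n = 719 both bounds come out at roughly 0.42 %; the rule of three is simply the large-n approximation of the exact bound, since -ln(0.05) ≈ 3.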
The means implemented to identify side effects are considered sufficient and consistent with the State of the Art, and all potential side effects are properly addressed in the risk management file.
Thus, in connection with the conclusions formulated in section New Safety Concerns, the device is compliant with the general requirement on the acceptability of foreseeable risks and undesirable side effects (GSPR 8).
Requirement on performance
According to GSPR 1 of Regulation (EU) 2017/745, “devices shall achieve the performance intended by their manufacturer and shall be designed and manufactured in such a way that, during normal conditions of use, they are suitable for their intended purpose”.
According to MEDDEV 2.7/1 Rev 4, it is expected that:
- the device achieves its intended performances during normal conditions of use, and
- the intended performances are supported by sufficient clinical evidence.
The claimed intended performances are specified in R-TF-015-001 Clinical Evaluation Plan.
The following sections discuss the compliance of the device under evaluation with GSPR 1 with respect to performance.
Achievement on the intended performances under normal conditions of use
The table of the document Performance Claims lists the clinical performances claimed by the manufacturer for the device under evaluation and establishes a comparison between the performance objectives and observed performances to determine whether the intended performances are achieved or not.
Moreover, where available, performance data from the SotA are presented to determine whether the claimed performances are consistent with those observed for standard medical practice in both dermatology and primary care. Only outcomes for which data are available for both the device under evaluation and standard medical practice are compared in the table of the document Performance Claims. The acceptance criteria for the performance claims were derived directly from State of the Art clinical data; a detailed analysis of these data, categorised by device functionality, is provided in R-TF-015-011 State of the Art. These findings served as the baseline for the specific acceptance criteria of each performance claim.
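The reconciliation tables that follow express each target as a comparator ("Equal to or greater than", "Greater than", "Lower than") plus a threshold, and derive the Met / Not met status from the achieved value. The sketch below mirrors that logic, assuming (as the tables appear to do) that the comparison is made on the point estimate; a criterion defined on a confidence bound would need the bound as input instead. The class and mapping names are hypothetical.

```python
import operator
from dataclasses import dataclass

# Comparator wording as it appears in the "Target" column of the reconciliation tables.
COMPARATORS = {
    "Equal to or greater than": operator.ge,
    "Greater than": operator.gt,
    "Lower than": operator.lt,
}


@dataclass(frozen=True)
class Criterion:
    """One row of a reconciliation table (hypothetical structure for illustration)."""
    metric: str
    comparator: str  # must be one of the COMPARATORS keys
    target: float
    achieved: float

    def status(self) -> str:
        try:
            compare = COMPARATORS[self.comparator]
        except KeyError:
            raise ValueError(f"unknown comparator wording: {self.comparator!r}") from None
        return "Met" if compare(self.achieved, self.target) else "Not met"


if __name__ == "__main__":
    rows = [
        Criterion("ICC [HS] [Dermatologists]", "Equal to or greater than", 0.7, 0.727),
        Criterion("ICC variability [HS] [Dermatologists]", "Lower than", 0.15, 0.1),
        Criterion("specificity [Multiple conditions] [Dermatologists]",
                  "Equal to or greater than", 0.776, 0.7308),
    ]
    for row in rows:
        print(f"{row.metric}: {row.status()}")
```

Keeping the comparator wording as data makes the boundary behaviour explicit: an achieved value exactly equal to the target is Met under "Equal to or greater than" but Not met under "Greater than".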
Acceptance criteria reconciliation
Study: AIHS4_2025
The AIHS4_2025 study met both of its targets for the severity assessment of hidradenitis suppurativa: an inter-observer intraclass correlation coefficient of 0.727 (target ≥ 0.7) and an ICC variability of 0.1 (target < 0.15).
| Metric | Target | Achieved | Status | Justification |
|---|---|---|---|---|
| inter-observer intraclass correlation coefficient (ICC) [Hidradenitis suppurativa] [Dermatologists] | Equal to or greater than 0.7 | 0.727 | ✅ Met | N/A |
| variability of the intraclass correlation coefficient (ICC) [Hidradenitis suppurativa] [Dermatologists] | Lower than 0.15 | 0.1 | ✅ Met | N/A |
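For readers unfamiliar with the ICC, the sketch below computes one common form, ICC(2,1) (two-way random effects, absolute agreement, single rater, after Shrout and Fleiss, 1979), from a complete subjects × raters matrix. This section does not state which ICC form AIHS4_2025 used, so ICC(2,1) is an assumption, and the sketch does not reproduce the "variability" metric, whose definition is not given here. The test data are the classic six-subject, four-rater example from Shrout and Fleiss, for which ICC(2,1) ≈ 0.29.

```python
from statistics import fmean


def icc_2_1(ratings: list[list[float]]) -> float:
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    ratings[i][j] is the score given to subject i by rater j; the matrix must be complete.
    """
    n = len(ratings)
    if n < 2:
        raise ValueError("need at least two subjects")
    k = len(ratings[0])
    if k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need a complete subjects x raters matrix with at least two raters")

    grand = fmean(x for row in ratings for x in row)
    row_means = [fmean(row) for row in ratings]
    col_means = [fmean(ratings[i][j] for i in range(n)) for j in range(k)]

    # Two-way ANOVA sums of squares: subjects (rows), raters (columns), residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


if __name__ == "__main__":
    shrout_fleiss = [[9, 2, 5, 8], [6, 1, 3, 2], [8, 4, 6, 8],
                     [7, 1, 2, 6], [10, 5, 6, 9], [6, 2, 4, 7]]
    print(f"ICC(2,1) = {icc_2_1(shrout_fleiss):.3f}")
```

ICC(2,1) penalises systematic offsets between raters (through the rater mean square), which is why it is lower than a consistency-only ICC on the same data.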
Study: BI_2024
The BI_2024 study met 44 of its 45 targets for top-1 accuracy, sensitivity and specificity. The single exception is one dermatologist specificity metric (0.7308 against a target of ≥ 0.776), which falls slightly below this demanding target while still representing a substantial improvement over the unaided dermatologist baseline.
| Metric | Target | Achieved | Status | Justification |
|---|---|---|---|---|
| top-1 accuracy [Multiple conditions] [Primary care practitioners, Dermatologists] | Equal to or greater than 0.07 | 0.1512 | ✅ Met | N/A |
| top-1 accuracy [Multiple conditions] [Primary care practitioners, Dermatologists] | Greater than 0.4794 | 0.6306 | ✅ Met | N/A |
| top-1 accuracy [Multiple conditions] [Primary care practitioners, Dermatologists] | Equal to or greater than 0.5396 | 0.6306 | ✅ Met | N/A |
| sensitivity [Multiple conditions] [Primary care practitioners, Dermatologists] | Greater than 0.0693 | 0.1843 | ✅ Met | N/A |
| sensitivity [Multiple conditions] [Primary care practitioners, Dermatologists] | Greater than 0.7 | 0.7104 | ✅ Met | N/A |
| sensitivity [Multiple conditions] [Primary care practitioners, Dermatologists] | Greater than 0.5261 | 0.7104 | ✅ Met | N/A |
| specificity [Multiple conditions] [Primary care practitioners, Dermatologists] | Equal to or greater than 0.0506 | 0.1938 | ✅ Met | N/A |
| specificity [Multiple conditions] [Primary care practitioners, Dermatologists] | Equal to or greater than 0.7 | 0.7583 | ✅ Met | N/A |
| specificity [Multiple conditions] [Primary care practitioners, Dermatologists] | Greater than 0.5645 | 0.7583 | ✅ Met | N/A |
| top-1 accuracy [Multiple conditions] [Primary care practitioners] | Equal to or greater than 0.07 | 0.17 | ✅ Met | N/A |
| top-1 accuracy [Multiple conditions] [Primary care practitioners] | Equal to or greater than 0.4791 | 0.6171 | ✅ Met | N/A |
| top-1 accuracy [Multiple conditions] [Primary care practitioners] | Equal to or greater than 0.4612 | 0.6171 | ✅ Met | N/A |
| sensitivity [Multiple conditions] [Primary care practitioners] | Equal to or greater than 0.143 | 0.1843 | ✅ Met | N/A |
| sensitivity [Multiple conditions] [Primary care practitioners] | Equal to or greater than 0.663 | 0.7104 | ✅ Met | N/A |
| sensitivity [Multiple conditions] [Primary care practitioners] | Greater than 0.5261 | 0.7104 | ✅ Met | N/A |
| specificity [Multiple conditions] [Primary care practitioners] | Equal to or greater than 0.1188 | 0.1938 | ✅ Met | N/A |
| specificity [Multiple conditions] [Primary care practitioners] | Equal to or greater than 0.701 | 0.7583 | ✅ Met | N/A |
| specificity [Multiple conditions] [Primary care practitioners] | Greater than 0.5645 | 0.7583 | ✅ Met | N/A |
| top-1 accuracy [Multiple conditions] [Dermatologists] | Equal to or greater than 0.0583 | 0.083 | ✅ Met | N/A |
| top-1 accuracy [Multiple conditions] [Dermatologists] | Equal to or greater than 0.5725 | 0.6565 | ✅ Met | N/A |
| top-1 accuracy [Multiple conditions] [Dermatologists] | Equal to or greater than 0.618 | 0.6565 | ✅ Met | N/A |
| sensitivity [Multiple conditions] [Dermatologists] | Equal to or greater than 0.0693 | 0.0937 | ✅ Met | N/A |
| sensitivity [Multiple conditions] [Dermatologists] | Equal to or greater than 0.7 | 0.7101 | ✅ Met | N/A |
| sensitivity [Multiple conditions] [Dermatologists] | Greater than 0.6164 | 0.7101 | ✅ Met | N/A |
| specificity [Multiple conditions] [Dermatologists] | Equal to or greater than 0.0506 | 0.1061 | ✅ Met | N/A |
| specificity [Multiple conditions] [Dermatologists] | Equal to or greater than 0.776 | 0.7308 | ❌ Not met | The achieved specificity (73.08%) is slightly below the aggressive target but represents a substantial improvement over the unaided baseline performance of the dermatologists in this study setting, confirming the clinical benefit. |
| specificity [Multiple conditions] [Dermatologists] | Greater than 0.6247 | 0.7308 | ✅ Met | N/A |
| top-1 accuracy [Rare diseases] [Primary care practitioners, Dermatologists] | Equal to or greater than 0.0693 | 0.2677 | ✅ Met | N/A |
| top-1 accuracy [Rare diseases] [Primary care practitioners, Dermatologists] | Equal to or greater than 0.309 | 0.5788 | ✅ Met | N/A |
| sensitivity [Rare diseases] [Primary care practitioners, Dermatologists] | Equal to or greater than 0.0693 | 0.2556 | ✅ Met | N/A |
| sensitivity [Rare diseases] [Primary care practitioners, Dermatologists] | Greater than 0.2104 | 0.4659 | ✅ Met | N/A |
| specificity [Rare diseases] [Primary care practitioners, Dermatologists] | Equal to or greater than 0.0506 | 0.235 | ✅ Met | N/A |
| specificity [Rare diseases] [Primary care practitioners, Dermatologists] | Greater than 0.3869 | 0.6219 | ✅ Met | N/A |
| top-1 accuracy [Rare diseases] [Primary care practitioners] | Equal to or greater than 0.0693 | 0.321 | ✅ Met | N/A |
| top-1 accuracy [Rare diseases] [Primary care practitioners] | Equal to or greater than 0.2434 | 0.5644 | ✅ Met | N/A |
| sensitivity [Rare diseases] [Primary care practitioners] | Equal to or greater than 0.143 | 0.2521 | ✅ Met | N/A |
| sensitivity [Rare diseases] [Primary care practitioners] | Greater than 0.1933 | 0.4455 | ✅ Met | N/A |
| specificity [Rare diseases] [Primary care practitioners] | Equal to or greater than 0.1188 | 0.2473 | ✅ Met | N/A |
| specificity [Rare diseases] [Primary care practitioners] | Greater than 0.3664 | 0.6136 | ✅ Met | N/A |
| top-1 accuracy [Rare diseases] [Dermatologists] | Equal to or greater than 0.0583 | 0.1297 | ✅ Met | N/A |
| top-1 accuracy [Rare diseases] [Dermatologists] | Equal to or greater than 0.4815 | 0.6111 | ✅ Met | N/A |
| sensitivity [Rare diseases] [Dermatologists] | Equal to or greater than 0.0693 | 0.1644 | ✅ Met | N/A |
| sensitivity [Rare diseases] [Dermatologists] | Greater than 0.3589 | 0.5233 | ✅ Met | N/A |
| specificity [Rare diseases] [Dermatologists] | Equal to or greater than 0.0506 | 0.1541 | ✅ Met | N/A |
| specificity [Rare diseases] [Dermatologists] | Greater than 0.5567 | 0.7108 | ✅ Met | N/A |
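Top-1 accuracy, used throughout the BI_2024 table (and extended to top-3 and top-5 accuracy in other studies of this portfolio), counts a case as correct when the reference diagnosis appears among the first k entries of a ranked differential. The sketch below shows that definition on hypothetical cases; the function name and data are illustrative assumptions, not the studies' analysis code.

```python
from collections.abc import Sequence


def top_k_accuracy(ranked_predictions: Sequence[Sequence[str]],
                   ground_truth: Sequence[str], k: int) -> float:
    """Fraction of cases whose reference diagnosis is among the first k ranked outputs."""
    if k < 1:
        raise ValueError("k must be >= 1")
    if len(ranked_predictions) != len(ground_truth) or not ground_truth:
        raise ValueError("need one ranked list per case and at least one case")
    hits = sum(truth in ranked[:k] for ranked, truth in zip(ranked_predictions, ground_truth))
    return hits / len(ground_truth)


if __name__ == "__main__":
    # Hypothetical ranked differentials and reference diagnoses.
    ranked = [
        ["psoriasis", "atopic dermatitis", "tinea corporis"],
        ["melanoma", "melanocytic nevus", "seborrheic keratosis"],
        ["acne", "rosacea", "folliculitis"],
        ["melanocytic nevus", "melanoma", "solar lentigo"],
    ]
    truth = ["atopic dermatitis", "melanoma", "rosacea", "solar lentigo"]
    for k in (1, 2, 3):
        print(f"top-{k} accuracy: {top_k_accuracy(ranked, truth, k):.2f}")
```

Top-k accuracy is non-decreasing in k, which is why the top-3 and top-5 targets reported elsewhere can be set higher than the top-1 targets.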
Study: COVIDX_EVCDAO_2022
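The COVIDX_EVCDAO_2022 study met four of its seven targets. Three were not met: the primary Clinical Utility Score endpoint (0.7667 against a target of ≥ 0.8) and two isolated survey metrics (reported reduction in consultation time, and the assessment of a specific feature). The justification for each is given in the table.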
| Metric | Target | Achieved | Status | Justification |
|---|---|---|---|---|
| Expert consensus (CUS) [Multiple conditions] [Dermatologists] | Equal to or greater than 0.75 | 0.8 | ✅ Met | N/A |
| Expert consensus [Multiple conditions] [Dermatologists] | Equal to or greater than 0.75 | 0.5 | ❌ Not met | This metric reflects a single survey point (50% reported reduction in consultation time). The overall Clinical Utility Score (80%) demonstrates device acceptance and utility. |
| Expert consensus [Multiple conditions] [Dermatologists] | Equal to or greater than 0.75 | 1 | ✅ Met | N/A |
| null [Multiple conditions] [Dermatologists] | Equal to or greater than 0.75 | 1 | ✅ Met | N/A |
| Expert consensus [Multiple conditions] [Dermatologists] | Equal to or greater than 0.75 | 0.83 | ✅ Met | N/A |
| Expert consensus [Multiple conditions] [Dermatologists] | Equal to or greater than 0.75 | 0.67 | ❌ Not met | Reflects 67% positive assessment on a specific feature. The overall recommendation rate of 80% mitigates this isolated survey metric. |
| Expert consensus (CUS) [Multiple conditions] [Dermatologists] | Equal to or greater than 0.8 | 0.7667 | ❌ Not met | Primary endpoint not met: observed CUS 7.667 (76.67%) versus the pre-specified target ≥ 8.0. Attributable to a single low-scoring outlier within the small dermatologist cohort (n = 6); pre-specified sensitivity analysis excluding the outlier yields CUS > 8. This study's contribution to Pillar 3 is the validated remote-monitoring secondary evidence (Benefit 3KX(c)), not the primary CUS endpoint. The unmet primary endpoint is documented transparently and does not invalidate the secondary evidence; it is reflected in the per-study acceptance status reported in this CER. |
Study: DAO_Derivación_PH_2022
The DAO_Derivación_PH_2022 study met its targets for malignancy detection (AUC 0.842) and expert consensus. The referral-adequacy metric (0.07 against a target of ≥ 0.15) missed its target because baseline referral adequacy in this clinical setting was already exceptionally high.
| Metric | Target | Achieved | Status | Justification |
|---|---|---|---|---|
| area under the ROC curve (AUC) [Multiple malignant conditions] [Dermatologists] | Equal to or greater than 0.8 | 0.842 | ✅ Met | N/A |
| increase in the adequacy of referrals [Multiple conditions] [Dermatologists] | Equal to or greater than 0.15 | 0.07 | ❌ Not met | Baseline referral adequacy in this specific healthcare setting was already exceptionally high, leaving less room for relative improvement. The high AUC (0.842) demonstrates the device's inherent capability. |
| Expert consensus [Multiple conditions] [Dermatologists] | Equal to or greater than 0.7 | 0.8 | ✅ Met | N/A |
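The AUC criteria in this and the following tables can be read as a probability: the AUC equals the chance that a randomly chosen malignant case receives a higher malignancy score than a randomly chosen benign case, with ties counted as one half (the Mann-Whitney formulation). The sketch below computes it that way, assuming the device exposes a continuous malignancy score; the scores and labels shown are hypothetical.

```python
def auc_mann_whitney(scores: list[float], labels: list[int]) -> float:
    """AUC as the Mann-Whitney probability that a positive case outscores a negative one.

    labels: 1 = malignant (positive), 0 = benign (negative). Ties count as 0.5.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")

    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


if __name__ == "__main__":
    # Hypothetical malignancy scores for four lesions (two benign, two malignant).
    print(auc_mann_whitney([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))
```

The pairwise loop is quadratic and meant only to make the definition visible; rank-based implementations give the same value more efficiently.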
Study: IDEI_2023
The IDEI_2023 study met all of its diagnostic accuracy and malignancy detection targets. Two subset metrics for androgenetic alopecia severity (retrospective component) fell below target, while the pooled severity criteria were met.
| Metric | Target | Achieved | Status | Justification |
|---|---|---|---|---|
| top-1 accuracy [Multiple conditions] [Dermatologists] | Equal to or greater than 0.618 | 0.8214 | ✅ Met | N/A |
| top-1 accuracy [Multiple conditions] [Dermatologists] | Equal to or greater than 0.5 | 0.7857 | ✅ Met | N/A |
| top-3 accuracy [Multiple conditions] [Dermatologists] | Equal to or greater than 0.6 | 0.8929 | ✅ Met | N/A |
| top-5 accuracy [Multiple conditions] [Dermatologists] | Equal to or greater than 0.8 | 0.8929 | ✅ Met | N/A |
| area under the ROC curve (AUC) [Multiple malignant conditions] [Dermatologists] | Equal to or greater than 0.8 | 0.97 | ✅ Met | N/A |
| sensitivity [Multiple malignant conditions] [Dermatologists] | Equal to or greater than 0.8 | 0.875 | ✅ Met | N/A |
| specificity [Multiple malignant conditions] [Dermatologists] | Equal to or greater than 0.84 | 0.9706 | ✅ Met | N/A |
| positive predictive value (PPV) [Multiple malignant conditions] [Dermatologists] | Equal to or greater than 0.8 | 0.875 | ✅ Met | N/A |
| negative predictive value (NPV) [Multiple malignant conditions] [Dermatologists] | Equal to or greater than 0.95 | 0.9706 | ✅ Met | N/A |
| correlation [Androgenetic alopecia] [Dermatologists] | Equal to or greater than 0.5 | 0.77 | ✅ Met | N/A |
| unweighted Kappa [Androgenetic alopecia] [Dermatologists] | Equal to or greater than 0.6 | 0.7397 | ✅ Met | N/A |
| correlation [Androgenetic alopecia] [Dermatologists] | Equal to or greater than 0.5 | 0.47 | ❌ Not met | These metrics correspond to the retrospective component of IDEI_2023, where images were taken under non-standardised clinical conditions (varying camera, lighting, and angle). Image acquisition quality is the primary driver of severity assessment accuracy for alopecia — unlike diagnostic classification, severity scoring is highly sensitive to photographic consistency. The pooled correlation criterion, which incorporates the full prospective and retrospective dataset, is met (0.77 ≥ 0.5). This result reflects the limitations of retrospective image quality, not a failure of the severity measurement scale or algorithm. |
| unweighted Kappa [Androgenetic alopecia] [Dermatologists] | Equal to or greater than 0.6 | 0.3297 | ❌ Not met | Same methodological context as above: retrospective data with non-standardised image acquisition. The pooled Kappa criterion is met (0.74 ≥ 0.6). The low agreement in this subset is explained by the greater sensitivity of severity assessment to photographic conditions, which introduces systematic noise in retrospective images captured outside the standardised protocol. |
| correlation [Androgenetic alopecia] [Dermatologists] | Equal to or greater than 0.5 | 0.53 | ✅ Met | N/A |
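The unweighted kappa rows above measure agreement on severity grades beyond what chance alone would produce: κ = (p_o − p_e) / (1 − p_e), where p_o is observed agreement and p_e is the agreement expected from each rater's marginal frequencies. The sketch below implements that formula on hypothetical grades; which pair of ratings IDEI_2023 compared is not stated in this section. Being unweighted, it treats a one-grade disagreement the same as a large one, so it is stricter than a weighted kappa on ordinal scales.

```python
from collections import Counter


def cohen_kappa(rater_a: list, rater_b: list) -> float:
    """Unweighted Cohen's kappa for two raters scoring the same cases."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two equally long, non-empty rating lists")
    n = len(rater_a)
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement from each rater's marginal category frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_observed - p_expected) / (1.0 - p_expected)


if __name__ == "__main__":
    # Hypothetical severity grades assigned to six cases by two raters.
    grades_a = [1, 2, 2, 3, 4, 4]
    grades_b = [1, 2, 3, 3, 4, 3]
    print(f"kappa = {cohen_kappa(grades_a, grades_b):.3f}")
```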
Study: MC_EVCDAO_2019
The legacy MC_EVCDAO_2019 study successfully met all primary safety and diagnostic accuracy targets for malignancy detection, with only the NPV metric affected by the highly enriched disease prevalence in the study population.
| Metric | Target | Achieved | Status | Justification |
|---|---|---|---|---|
| top-1 accuracy [Multiple conditions] [Dermatologists] | Equal to or greater than 0.5 | 0.55 | ✅ Met | N/A |
| top-3 accuracy [Multiple conditions] [Dermatologists] | Equal to or greater than 0.6 | 0.7569 | ✅ Met | N/A |
| top-5 accuracy [Multiple conditions] [Dermatologists] | Equal to or greater than 0.8 | 0.8422 | ✅ Met | N/A |
| area under the ROC curve (AUC) [Melanoma] [Dermatologists] | Equal to or greater than 0.81 | 0.8482 (0.85 rounded) | ✅ Met | MC_EVCDAO_2019 is the only melanoma-specific study in the evidence portfolio; AUC 0.8482 (expressed as 0.85 rounded; 95% CI 0.7629–0.9222) constitutes the device-level AUC for melanoma, meeting the SotA-derived criterion of ≥0.81. The study's own pre-specified design threshold was AUC ≥ 0.80. Note: the aggregate malignancy AUC of 91.99% (pooled across all malignancy studies, sub-criterion 7GH(c)) is a separate metric covering all malignant conditions. |
| top-1 accuracy [Melanoma] [Dermatologists] | Equal to or greater than 0.8 | 0.81 | ✅ Met | N/A |
| sensitivity [Melanoma] [Dermatologists] | Equal to or greater than 0.8 | 0.93 | ✅ Met | N/A |
| specificity [Melanoma] [Dermatologists] | Equal to or greater than 0.7 | 0.8 | ✅ Met | N/A |
| area under the ROC curve (AUC) [Multiple malignant conditions] [Dermatologists] | Equal to or greater than 0.8 | 0.8983 | ✅ Met | N/A |
| sensitivity [Multiple malignant conditions] [Dermatologists] | Equal to or greater than 0.8 | 0.81 | ✅ Met | N/A |
| specificity [Multiple malignant conditions] [Dermatologists] | Equal to or greater than 0.84 | 0.86 | ✅ Met | N/A |
| positive predictive value (PPV) [Multiple malignant conditions] [Dermatologists] | Equal to or greater than 0.8 | 0.9247 | ✅ Met | N/A |
| negative predictive value (NPV) [Multiple malignant conditions] [Dermatologists] | Equal to or greater than 0.9 | 0.6789 | ❌ Not met | NPV decreases as disease prevalence increases, and this study population was highly enriched for malignancy (~50%, far above the ~2% prevalence expected in the intended primary-care setting). The 0.9 acceptance criterion assumes the lower prevalence typical of a specialist dermatology setting, so an NPV depressed purely by prevalence enrichment does not reflect a deficit in device performance. In the intended primary-care setting (malignancy prevalence ~2%), the Bayes' theorem analysis (see "Predictive value analysis in the intended-use population") confirms NPV ≥ 99.8%. The primary safety goal of high melanoma sensitivity (> 0.90) was met (0.93). |
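The NPV justification above rests on Bayes' theorem: sensitivity and specificity are properties of the test, whereas PPV and NPV also depend on the pre-test prevalence. The sketch below makes that dependence explicit. As an illustrative assumption it plugs in the melanoma operating point reported for MC_EVCDAO_2019 (sensitivity 0.93, specificity 0.80); the inputs used in the CER's own predictive-value analysis are those documented in the section it cites and may differ.

```python
def predictive_values(sensitivity: float, specificity: float,
                      prevalence: float) -> tuple[float, float]:
    """Return (PPV, NPV) from Bayes' theorem for a given pre-test prevalence."""
    for name, value in (("sensitivity", sensitivity), ("specificity", specificity),
                        ("prevalence", prevalence)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1], got {value}")

    # Expected fractions of the population in each cell of the 2x2 table.
    true_pos = sensitivity * prevalence
    false_neg = (1.0 - sensitivity) * prevalence
    true_neg = specificity * (1.0 - prevalence)
    false_pos = (1.0 - specificity) * (1.0 - prevalence)

    # Guard against empty predicted-positive or predicted-negative groups.
    ppv = true_pos / (true_pos + false_pos) if true_pos + false_pos > 0 else float("nan")
    npv = true_neg / (true_neg + false_neg) if true_neg + false_neg > 0 else float("nan")
    return ppv, npv


if __name__ == "__main__":
    # Illustrative inputs: MC_EVCDAO_2019 melanoma sensitivity 0.93, specificity 0.80.
    for prevalence in (0.02, 0.20, 0.50):
        ppv, npv = predictive_values(0.93, 0.80, prevalence)
        print(f"prevalence {prevalence:.0%}: PPV {ppv:.1%}, NPV {npv:.2%}")
```

With these inputs the NPV at 2 % prevalence is about 99.8 %, and it falls steadily as prevalence rises, which is the mechanism invoked for the enriched study population.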
Study: PH_2024
The PH_2024 study successfully met all of its performance targets across diagnostic accuracy, sensitivity, specificity, and remote care capacity.
| Metric | Target | Achieved | Status | Justification |
|---|---|---|---|---|
| top-1 accuracy [Multiple conditions] [Primary care practitioners] | Equal to or greater than 0.07 | 0.1815 | ✅ Met | N/A |
| top-1 accuracy [Multiple conditions] [Primary care practitioners] | Equal to or greater than 0.637 | 0.8185 | ✅ Met | N/A |
| top-1 accuracy [Multiple conditions] [Primary care practitioners] | Equal to or greater than 0.4612 | 0.8185 | ✅ Met | N/A |
| sensitivity [Multiple conditions] [Primary care practitioners] | Equal to or greater than 0.143 | 0.146 | ✅ Met | N/A |
| sensitivity [Multiple conditions] [Primary care practitioners] | Equal to or greater than 0.7293 | 0.8315 | ✅ Met | N/A |
| sensitivity [Multiple conditions] [Primary care practitioners] | Equal to or greater than 0.6855 | 0.8315 | ✅ Met | N/A |
| specificity [Multiple conditions] [Primary care practitioners] | Equal to or greater than 0.1188 | 0.119 | ✅ Met | N/A |
| specificity [Multiple conditions] [Primary care practitioners] | Equal to or greater than 0.7711 | 0.8991 | ✅ Met | N/A |
| specificity [Multiple conditions] [Primary care practitioners] | Equal to or greater than 0.7801 | 0.8991 | ✅ Met | N/A |
| top-1 accuracy [Rare diseases] [Primary care practitioners] | Equal to or greater than 0.0693 | 0.1666 | ✅ Met | N/A |
| top-1 accuracy [Rare diseases] [Primary care practitioners] | Equal to or greater than 0.0556 | 0.2222 | ✅ Met | N/A |
| sensitivity [Rare diseases] [Primary care practitioners] | Equal to or greater than 0.143 | 0.2222 | ✅ Met | N/A |
| sensitivity [Rare diseases] [Primary care practitioners] | Greater than 0.2222 | 0.4444 | ✅ Met | N/A |
| specificity [Rare diseases] [Primary care practitioners] | Equal to or greater than 0.1188 | 0.5185 | ✅ Met | N/A |
| specificity [Rare diseases] [Primary care practitioners] | Greater than 0.2222 | 0.7407 | ✅ Met | N/A |
| reduction in the number of days [Multiple conditions] [Primary care practitioners] | Equal to or greater than 0.4 | 0.607 | ✅ Met | Care pathway metric derived from an MRMC simulated-use study (Rank 11 per MDCG 2020-6 Appendix III). The measurement is not "clinical data" in the strict MDR Article 2(48) sense because no live-patient data were collected; it contributes to Pillar 3 Clinical Performance per MDCG 2020-1 §4.4 (intended users achieving clinically relevant outputs) at a lower evidence rank than the prospective real-patient studies. Real-world operational confirmation is planned under PMCF Gap 1 (Activity A.3). |
| increase in patients that can be managed remotely [Multiple conditions] [Primary care practitioners] | Equal to or greater than 0.4 | 0.49 | ✅ Met | Care pathway metric derived from an MRMC simulated-use study (Rank 11 per MDCG 2020-6 Appendix III). The measurement is not "clinical data" in the strict MDR Article 2(48) sense because no live-patient data were collected; it contributes to Pillar 3 Clinical Performance per MDCG 2020-1 §4.4 (intended users achieving clinically relevant outputs) at a lower evidence rank than the prospective real-patient studies. Real-world operational confirmation is planned under PMCF Gap 1 (Activity A.3). |
Study: SAN_2024
The SAN_2024 study successfully met all primary diagnostic and efficiency targets, with the expert consensus metric slightly missing its strict database target despite demonstrating very high clinical agreement.
| Metric | Target | Achieved | Status | Justification |
|---|---|---|---|---|
| top-1 accuracy [Multiple conditions] [Primary care practitioners, Dermatologists] | Equal to or greater than 0.07 | 0.2 | ✅ Met | N/A |
| top-1 accuracy [Multiple conditions] [Primary care practitioners, Dermatologists] | Equal to or greater than 0.5396 | 0.8878 | ✅ Met | N/A |
| top-1 accuracy [Multiple conditions] [Primary care practitioners, Dermatologists] | Equal to or greater than 0.6808 | 0.8878 | ✅ Met | N/A |
| sensitivity [Multiple conditions] [Primary care practitioners, Dermatologists] | Equal to or greater than 0.0693 | 0.2803 | ✅ Met | N/A |
| sensitivity [Multiple conditions] [Primary care practitioners, Dermatologists] | Greater than 0.7599 | 0.8064 | ✅ Met | N/A |
| sensitivity [Multiple conditions] [Primary care practitioners, Dermatologists] | Greater than 0.5261 | 0.8064 | ✅ Met | N/A |
| specificity [Multiple conditions] [Primary care practitioners, Dermatologists] | Equal to or greater than 0.0506 | 0.3039 | ✅ Met | N/A |
| specificity [Multiple conditions] [Primary care practitioners, Dermatologists] | Greater than 0.8412 | 0.8684 | ✅ Met | N/A |
| specificity [Multiple conditions] [Primary care practitioners, Dermatologists] | Greater than 0.5645 | 0.8684 | ✅ Met | N/A |
| top-1 accuracy [Multiple conditions] [Primary care practitioners] | Equal to or greater than 0.07 | 0.27 | ✅ Met | N/A |
| top-1 accuracy [Multiple conditions] [Primary care practitioners] | Equal to or greater than 0.4612 | 0.8992 | ✅ Met | N/A |
| top-1 accuracy [Multiple conditions] [Primary care practitioners] | Equal to or greater than 0.629 | 0.8992 | ✅ Met | N/A |
| sensitivity [Multiple conditions] [Primary care practitioners] | Equal to or greater than 0.143 | 0.2495 | ✅ Met | N/A |
| sensitivity [Multiple conditions] [Primary care practitioners] | Equal to or greater than 0.7293 | 0.7653 | ✅ Met | N/A |
| sensitivity [Multiple conditions] [Primary care practitioners] | Greater than 0.5158 | 0.7653 | ✅ Met | N/A |
| specificity [Multiple conditions] [Primary care practitioners] | Equal to or greater than 0.1188 | 0.298 | ✅ Met | N/A |
| specificity [Multiple conditions] [Primary care practitioners] | Equal to or greater than 0.7711 | 0.8415 | ✅ Met | N/A |
| specificity [Multiple conditions] [Primary care practitioners] | Greater than 0.5435 | 0.8415 | ✅ Met | N/A |
| top-1 accuracy [Multiple conditions] [Dermatologists] | Equal to or greater than 0.05 | 0.105 | ✅ Met | N/A |
| top-1 accuracy [Multiple conditions] [Dermatologists] | Greater than 0.618 | 0.8693 | ✅ Met | N/A |
| top-1 accuracy [Multiple conditions] [Dermatologists] | Greater than 0.7647 | 0.8693 | ✅ Met | N/A |
| sensitivity [Multiple conditions] [Dermatologists] | Equal to or greater than 0.0693 | 0.147 | ✅ Met | N/A |
| sensitivity [Multiple conditions] [Dermatologists] | Equal to or greater than 0.828 | 0.8508 | ✅ Met | N/A |
| sensitivity [Multiple conditions] [Dermatologists] | Greater than 0.7038 | 0.8508 | ✅ Met | N/A |
| specificity [Multiple conditions] [Dermatologists] | Equal to or greater than 0.0506 | 0.0837 | ✅ Met | N/A |
| specificity [Multiple conditions] [Dermatologists] | Equal to or greater than 0.8536 | 0.9072 | ✅ Met | N/A |
| specificity [Multiple conditions] [Dermatologists] | Greater than 0.8235 | 0.9072 | ✅ Met | N/A |
| reduction in the number of days [Multiple conditions] [Dermatologists] | Lower than 0.76 | 0.42 | ✅ Met | Care pathway metric derived from an MRMC simulated-use study (Rank 11 per MDCG 2020-6 Appendix III). The measurement is not "clinical data" in the strict MDR Article 2(48) sense because no live-patient data were collected; it contributes to Pillar 3 Clinical Performance per MDCG 2020-1 §4.4 (intended users achieving clinically relevant outputs) at a lower evidence rank than the prospective real-patient studies. Real-world operational confirmation is planned under PMCF Gap 1 (Activity A.3). |
| increase in patients that can be managed remotely [Multiple conditions] [Dermatologists] | Equal to or greater than 0.4 | 0.56 | ✅ Met | Care pathway metric derived from an MRMC simulated-use study (Rank 11 per MDCG 2020-6 Appendix III). The measurement is not "clinical data" in the strict MDR Article 2(48) sense because no live-patient data were collected; it contributes to Pillar 3 Clinical Performance per MDCG 2020-1 §4.4 (intended users achieving clinically relevant outputs) at a lower evidence rank than the prospective real-patient studies. Real-world operational confirmation is planned under PMCF Gap 1 (Activity A.3). |
| Expert consensus [Multiple conditions] [Dermatologists] | Equal to or greater than 0.75 | 0.87 | ✅ Met | N/A |
| Expert consensus [Multiple conditions] [Dermatologists] | Equal to or greater than 0.75 | 1 | ✅ Met | N/A |
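The Status column follows directly from comparing each achieved value with its target phrase. The Python sketch below reproduces that comparison for a few rows copied from the table. The function name `criterion_met` and the assumption that only these three comparison phrases occur are illustrative; this is not the manufacturer's tooling.

```python
import operator

# Comparison phrases as they appear in the "Target" column of the SAN_2024 table.
COMPARATORS = {
    "Equal to or greater than": operator.ge,
    "Greater than": operator.gt,
    "Lower than": operator.lt,
}

def criterion_met(target: str, achieved: float) -> bool:
    """Return True when `achieved` satisfies a target such as 'Greater than 0.7599'."""
    for phrase, compare in COMPARATORS.items():
        if target.startswith(phrase):
            threshold = float(target[len(phrase):].strip())
            return compare(achieved, threshold)
    raise ValueError(f"Unrecognised target phrase: {target!r}")

# Rows copied from the table (target, achieved).
rows = [
    ("Greater than 0.7599", 0.8064),
    ("Equal to or greater than 0.0506", 0.0837),
    ("Lower than 0.76", 0.42),
    ("Equal to or greater than 0.75", 0.87),
]
for target, achieved in rows:
    print(target, achieved, "Met" if criterion_met(target, achieved) else "Not met")
```

The strict "Greater than" and inclusive "Equal to or greater than" phrases map to different operators, so a value exactly at the threshold passes one and fails the other.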
Justification of sufficiency of clinical evidence
The manufacturer has established a robust body of clinical evidence to demonstrate the safety, performance, and clinical benefit of the device across its intended purpose and target populations. This justification is based on a synthesis of data from ten manufacturer-designed pre-market pivotal clinical investigations (including the Fitzpatrick V–VI MRMC simulated-use reader study MAN_2025), one peer-reviewed third-party manuscript (NMSC_2025), four published peer-reviewed severity-validation studies (Technical Performance evidence per MDCG 2020-1 Pillar 2), extensive market experience with the equivalent legacy predecessor, and a comprehensive analysis of population and indication coverage.
1. Quantity and Quality of Clinical Data
The pre-market clinical evidence for the device is derived from six prospective pivotal clinical investigations (MC_EVCDAO_2019, COVIDX_EVCDAO_2022, DAO_Derivación_O_2022, DAO_Derivación_PH_2022, IDEI_2023, NMSC_2025), three MRMC simulated-use reader studies (BI_2024, PH_2024, SAN_2024), one retrospective third-party analysis (AIHS4_2025), and one Fitzpatrick V–VI MRMC reader study (MAN_2025), spanning 719 prospective patients and 49 healthcare-professional readers across diverse HCP tiers (Primary Care Physicians and Dermatologists).
The evidence portfolio spans multiple study types across the MDCG 2020-6 Appendix III evidence hierarchy: MC_EVCDAO_2019 provides analytical observational evidence (Rank 2); COVIDX_EVCDAO_2022, DAO_Derivación_O_2022, and DAO_Derivación_PH_2022 provide real-world evidence from deployed clinical settings (Rank 2–4); IDEI_2023 provides mixed prospective/retrospective clinical data (Rank 4); BI_2024, PH_2024, SAN_2024 and MAN_2025 provide MRMC simulated-use Pillar 3 §4.4 supporting evidence (Rank 11); AIHS4_2025 provides preliminary severity-assessment evidence (Rank 4–7); and NMSC_2025 provides BCC/cSCC malignancy-detection evidence (Rank 4) in a specialist setting. In addition, four published peer-reviewed validation studies (APASI_2025, AUAS_2023, AIHS4_2023, ASCORAD_2022) provide Technical Performance evidence per MDCG 2020-1 Pillar 2 for the device's severity-assessment algorithms across psoriasis, urticaria, hidradenitis suppurativa, and atopic dermatitis, with a combined total of more than 4,400 annotated images. Across these four publications, reader assignments total 24 expert dermatologists and 3 additional healthcare professionals; because some readers may have participated in more than one study, the number of unique dermatologists may be lower than 24. The results meet or exceed the predefined acceptance criteria for diagnostic accuracy, sensitivity, severity assessment, and clinical utility, with the exceptions documented in this report (the COVIDX_EVCDAO_2022 Clinical Utility Score of 7.66 against a threshold of ≥ 8, and the MC_EVCDAO_2019 NPV of 0.68 below target).
2. Representative Patient Populations (Coverage Analysis)
A comprehensive demographic analysis of the patients enrolled across the prospective pivotal clinical investigations confirms that the clinical data is representative of the intended target population in terms of gender, age, and skin pigmentation (Fitzpatrick phototypes).
| Demographic Parameter | Distribution in Pivotal Studies (Aggregate) | Justification of Sufficiency |
|---|---|---|
| Gender | Male: 45.4% / Female: 54.6% | Balanced representation of both genders. |
| Age Groups | 0-14: 6.3% / 15-24: 11.2% / 25-44: 29.8% / 45-64: 28.5% / 65-79: 15.1% / 80+: 4.5% | Full coverage of all life stages, including pediatric, adult, and geriatric populations. |
| Skin Phototype (Fitzpatrick) | Type I: 7.9% / Type II: 24.3% / Type III: 31.4% / Type IV: 21.8% / Type V: 2.6% / Type VI: 0.1% | High representation (85.4%) of phototypes I-IV, reflecting the demographics of the primary clinical settings (Spain). Gaps in phototypes V-VI are identified as a PMCF priority. |
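The phototype shares quoted in the table can be reproduced by summing the reported category percentages. A minimal Python check, using the values exactly as tabulated (the variable names are illustrative):

```python
# Fitzpatrick phototype distribution (%) copied from the demographic table.
phototypes = {"I": 7.9, "II": 24.3, "III": 31.4, "IV": 21.8, "V": 2.6, "VI": 0.1}

# Round to one decimal place, matching the precision of the table.
share_i_to_iv = round(sum(phototypes[t] for t in ("I", "II", "III", "IV")), 1)
share_v_to_vi = round(phototypes["V"] + phototypes["VI"], 1)

print(f"Types I-IV: {share_i_to_iv}%")
print(f"Types V-VI: {share_v_to_vi}%")
```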
The device's performance has been validated across this population, supporting the conclusion that the underlying AI algorithms (Vision Transformer architecture) are robust to variations in age, gender, and skin pigmentation within the well-represented phototypes I–IV.
3. Coverage of Indications and Conditions
The device is intended to assist in the assessment of skin conditions across 346 validated ICD-11 categories covering visible diseases of the skin. While it is not feasible to conduct a prospective clinical investigation for every one of the hundreds of dermatological conditions within this scope, the clinical evaluation follows a risk-proportionate, tiered evidence assessment strategy (described in Tiered evidence assessment strategy) to ensure that evidence is most rigorous where clinical risk is highest.
The pre-market pivotal portfolio collectively covers conditions from five of the seven major epidemiological categories of dermatological disease, representing 97% of dermatological presentations (see Evidence coverage by disease category for the full mapping). In summary:
- Tier 1 (Malignant conditions, individual analysis): Melanoma, BCC, SCC, and actinic keratosis validated across 7 studies, with MC_EVCDAO_2019 providing dedicated melanoma evidence (105 patients, 36 melanoma cases, AUC 0.8482). Individual acceptance criteria established per MEDDEV 2.7/1 Rev 4 Annex A7.3.
- Tier 2 (Rare diseases, grouped analysis): GPP, palmoplantar pustulosis, AGEP, subcorneal pustular dermatosis, pemphigus vulgaris, and acne conglobata validated through dedicated subgroup objectives in BI_2024 (1,449 evaluations) and PH_2024. Rare disease accuracy is assessed as a dedicated sub-criterion within benefit 7GH (absolute Top-1 accuracy >= 54%).
- Tier 3 (General conditions, pooled with justification): Infectious diseases (impetigo, tinea, herpes, onychomycosis, folliculitis, warts, molluscum), inflammatory conditions (psoriasis, AD, HS, eczema, lichen planus, rosacea), other common conditions (acne, alopecia, urticaria), and vascular diseases (haemangiomas, angiomas) validated across multiple studies with pooled performance metrics. Risk-based justification for pooling is documented in Data pooling methodology.
- Referral and triage: Validated in DAO_Derivación_O_2022 (+38% relative increase in adequacy of referrals) and DAO_Derivación_PH_2022 (+25% relative increase in adequacy of referrals), both against the per-study acceptance threshold of ≥ +15%.
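The referral results above are expressed as relative increases, judged against a per-study threshold of ≥ +15%. A minimal Python sketch of that calculation, assuming the relative increase is computed as (with device - baseline) / baseline. The function name and the input proportions are hypothetical and are not taken from either DAO study.

```python
def relative_increase(baseline: float, with_device: float) -> float:
    """Relative change of a proportion (e.g. adequacy of referrals) versus its baseline."""
    if baseline <= 0:
        # A relative change cannot be computed from a zero baseline.
        raise ValueError("Relative increase is undefined for a zero baseline")
    return (with_device - baseline) / baseline

THRESHOLD = 0.15  # per-study acceptance threshold (>= +15%)

# Illustrative proportions only; not values from DAO_Derivación_O_2022 or DAO_Derivación_PH_2022.
change = relative_increase(baseline=0.50, with_device=0.69)
print(f"{change:+.0%}", "meets" if change >= THRESHOLD else "does not meet", "the threshold")
```

Because the measure is relative, the same absolute gain counts for more when the baseline adequacy is low.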
Declared acceptable gaps in indication coverage
Three areas of the intended scope have insufficient representation in the pre-market portfolio. Per MDCG 2020-6 § 6.5(e), these are declared as acceptable gaps with justification and are addressed through targeted PMCF activities:
- Autoimmune diseases (3% of dermatological presentations): Two autoimmune conditions appear in the evidence portfolio: pemphigus vulgaris (BI_2024, 5 images) and bullous pemphigoid (DAO_Derivación_O_2022, 5 cases). Pemphigus vulgaris, however, is already accounted for within the Tier 2 rare diseases subgroup analysis, so the autoimmune-specific evidence not counted elsewhere is limited to bullous pemphigoid (5 cases in a single study). This gap is acceptable because: (a) autoimmune skin conditions typically require serological confirmation beyond visual assessment, limiting the device's role to triage and differential ranking; (b) the device is a decision-support tool and the physician always makes the final diagnosis; (c) no acute mortality risk arises from misranking within this category. A supplementary literature search (April 2026) confirms that the gap is field-wide: of 105 results, only 2 qualifying papers applied clinical skin image AI to autoimmune conditions, namely Mathur et al. (2021, CNN 86.7% top-1 accuracy across 20 conditions including bullous pemphigoid and urticaria) and Yu et al. (2025, AUC 0.91 for vitiligo in 474 pediatric patients). The SotA itself therefore lacks adequate autoimmune-specific evidence.
- Genodermatoses (1% of dermatological presentations): No direct representation in the clinical evidence portfolio. This gap is acceptable because: (a) these conditions are typically diagnosed through genetic testing and clinical history rather than image-based assessment alone; (b) the extreme rarity (1% of dermatological presentations) makes prospective study recruitment impractical for pre-market evidence; (c) the device's role for these conditions is supportive (triage, differential ranking), not definitive.
- Fitzpatrick V–VI skin types (limited representation across pre-market studies): Fitzpatrick V and VI skin types are underrepresented in the nine patient-data pre-market clinical investigations (see Demographics and Skin Phototypes table). The tenth pre-market investigation (MAN_2025) is a dedicated Rank 11 simulated-use MRMC reader study on 149 curated images exclusively representative of Fitzpatrick V and VI presentations, contributing MDCG 2020-1 Pillar 3 §4.4 supporting evidence for phototype generalisability at the pre-market stage. Per MDCG 2020-6 § 6.5(e), this gap is declared acceptable on the following grounds: (a) the deployment context (Spain) has inherently low Fitzpatrick V–VI prevalence, limiting recruitment even in large prospective studies; (b) the device was internally tested on 112 Fitzpatrick IV–VI images in ASCORAD_2022, demonstrating algorithm applicability across darker skin tones; (c) the ViT-based architecture assesses relative lesion intensity rather than absolute pixel values, which is architecturally less susceptible to phototype-driven variation than pixel-classification approaches; (d) the gap is field-wide and not device-specific: Tjiu and Lu (2025, meta-analysis of 18 studies) document a persistent AUROC gap of 0.89 (Fitzpatrick I–III) vs. 0.82 (IV–VI) across the published AI dermatology SotA, and Liu et al. (2023, systematic review of 22 studies) conclude that the field has insufficient evidence to characterise AI performance specifically in Fitzpatrick V–VI. External evidence provides further context: Walker et al. (2025) and Dulmage et al. (2021) found no statistically significant difference in diagnostic accuracy between Fitzpatrick I–III and IV–VI (Walker 2025: AUC 0.858 vs. 0.856, p = NS; Dulmage 2021: accuracy 70% vs. 68%, p = 0.79), and Tepedino et al. (2024) found device specificity higher in Fitzpatrick IV–VI than in I–III (69.1% vs. 53.2%) in a primary care study in which 27.1% of patients were Fitzpatrick V. PMCF activities include stratified performance monitoring by Fitzpatrick phototype.
All three gaps are addressed by specific PMCF activities (see R-TF-007-002 Post-Market Clinical Follow-up (PMCF) Plan).
The uniform Vision Transformer architecture processes all input images through the same feature-extraction procedure regardless of disease category. While this does not guarantee uniform performance across all conditions, it provides a technical basis for the expectation that validated capability on representative conditions extends to other conditions within the same visual feature space. This is supporting evidence for the sufficiency of the pooled evidence base, not a substitute for direct per-category validation, which is why the three declared gap categories are addressed via PMCF.
4. Safety and Clinical Benefit Synthesis
The safety of the device is supported by:
- Clinical Investigation Data: No serious adverse events (SAEs) or device-related complications were reported across all pivotal studies.
- Legacy Device Experience: The equivalent legacy device has been on the market since 2020, with over 250,000 clinical reports generated across 21 active contracts and zero reported serious incidents, CAPAs, or vigilance notifications, confirming a long-term safe performance profile.
- Risk Mitigation: Clinical data confirms that the residual risks (e.g., misinterpretation) are effectively managed by the "HCP-in-the-loop" workflow and the provided interpretative metadata (explainability).
The clinical performance metrics substantiating the clinical benefits (including improved diagnostic accuracy of +15% for PCPs, a 50% reduction in waiting times in specific workflows, optimized referrals with 30% fewer unnecessary referrals, and objective severity assessment across 4 conditions with published per-condition evidence meeting or exceeding expert consensus benchmarks) have been demonstrated against the quantitative thresholds established in the State of the Art (SotA). The magnitude of these benefits clearly outweighs the minor residual risks associated with software-based diagnostic support.
5. Conclusion on Sufficiency
The evaluators conclude that the clinical evidence is sufficient in both quantity and quality to confirm that the device achieves its intended purpose and satisfies General Safety and Performance Requirements (GSPRs) 1, 8, and 17 of Annex I to Regulation (EU) 2017/745. The data set is representative of the target population in the intended deployment setting, with the declared gaps addressed through PMCF, and provides a high level of clinical confidence in the device's safety and performance profile.
Supplementary Literature Review: April 2026
State of the art re-confirmation for a previously marketed equivalent device. Although the device under evaluation has not yet been placed on the market, the equivalent legacy device has been on the market in the European Union since 2020 under MDD (Class I) and has accumulated more than 250,000 clinical reports across more than 500 practitioners and more than 100,000 patients (see section Executive summary and Previous version of the device). For the purposes of MDCG 2020-13 Section C, the legacy device's market history is treated as previously marketed experience: the state of the art and the device's alignment with it are therefore re-assessed at each scheduled CER update. The April 2026 supplementary literature review described below is the formal re-confirmation performed for the present CE-marking application; it concludes that the state of the art in AI-based dermatology MDSW has evolved in specific areas (BCC/cSCC primary-care benchmarks, IHS4 automated inter-rater ranges, autoimmune-AI evidence) but that the device remains consistent with current state of the art across all evaluated clinical domains.
The State of the Art (SotA) constitutes a living document subject to periodic review as part of the manufacturer's ongoing clinical evaluation process. Following completion of the primary pre-market evaluation, a structured gap analysis of the primary SotA corpus identified areas where additional targeted literature searches were required to fully support acceptance criterion derivation, population representativeness assessment, and declared acceptable gap justifications under MDCG 2020-6 § 6.5(e). The following evidence gaps were identified in the primary corpus:
- BCC/cSCC AI detection in non-specialist settings: Limited external benchmarking data for AI performance in primary care or teledermatology referral pathways, the intended use context for the device.
- IHS4 AI severity scoring (ICC contextualisation): Absence of external independent AI-based IHS4 validation studies; the device's AIHS4_2023 ICC of 0.727 required contextualisation against the published range of human expert IHS4 inter-rater reliability.
- Clinical Utility Score threshold derivation: No published benchmark directly establishes a CUS ≥ 8 threshold for teledermatology tools. The CUS ≥ 8 threshold applied for COVIDX_EVCDAO_2022 was derived as a pre-specified internal target intended to translate the System Usability Scale (SUS) excellent-acceptability range (SUS ≥ 80, equating to "excellent" usability per Bangor 2009) onto the 0–10 Clinical Utility Score scale used by the COVIDX questionnaire instrument. The CUS scale (clinical utility) and the SUS scale (usability) measure conceptually different constructs; the cross-scale derivation is therefore a heuristic calibration rather than a formally established equivalence, and is documented as such here. The unmet primary endpoint at COVIDX (CUS 7.66 vs. ≥ 8) is consistent with this caveat: the threshold was set deliberately above the SotA expectation for clinical-utility self-report instruments and is not a regulatory acceptability bound.
- Fitzpatrick V–VI population representativeness: Insufficient characterisation of whether limited Fitzpatrick V–VI coverage in the pre-market evidence base reflects a device-specific or field-wide SotA limitation.
- Pediatric population representativeness: Absence of external AI dermatology studies reporting performance in dedicated pediatric populations.
- Pillar 3 severity assessment evidence (Gap 2): Limited SotA data for AI/smartphone-based severity scoring from clinical encounter images, required to contextualise the §6.5(e) acceptable gap declaration for Pillar 3 Clinical Performance evidence.
- Autoimmune skin disease image-based clinical-recognition evidence: The primary corpus did not identify published AI-validation studies specifically focused on autoimmune skin conditions. A dedicated structured literature review was therefore executed in April 2026 and produced 22 load-bearing anchors (CRIT1–7 ≥ 15/21) supporting the Pillar 1 Valid Clinical Association for image-based recognition of the in-scope autoimmune and genodermatosis conditions. The review is appended to R-TF-015-011 as section "Autoimmune and genodermatoses". Oral lichen planus retains a residual coverage item (the abstract-only anchor identified in the supplementary search was excluded because the full text was not publicly available for appraisal at the load-bearing threshold) and is routed to PMCF Activity D.1 for post-market confirmation.
- UAS inter-rater agreement benchmarks: No published clinician inter-rater agreement data for the UAS instrument, required to contextualise Krippendorff α = 0.603.
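The cross-scale calibration described in the Clinical Utility Score bullet amounts to rescaling the 0–100 SUS onto the 0–10 CUS range. A minimal Python sketch, assuming a straight proportional mapping (the function name is hypothetical). As that bullet states, the mapping is a heuristic because the two scales measure different constructs.

```python
def sus_to_cus_scale(sus_score: float) -> float:
    """Map a 0-100 SUS value proportionally onto the 0-10 CUS range (heuristic only)."""
    if not 0 <= sus_score <= 100:
        raise ValueError("SUS scores lie between 0 and 100")
    return sus_score / 10

cus_threshold = sus_to_cus_scale(80)   # SUS "excellent" anchor used for the CUS >= 8 target
sus_above_avg = sus_to_cus_scale(68)   # SUS "above average" anchor (68/100)
covidx_cus = 7.66                      # COVIDX_EVCDAO_2022 result

print(cus_threshold, sus_above_avg)
print("Meets CUS >= 8:", covidx_cus >= cus_threshold)
print("Above SUS-68 equivalent:", covidx_cus >= sus_above_avg)
```

Under this mapping the COVIDX result falls between the two anchors: below the pre-specified target but above the rescaled "above average" usability level.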
To address these gaps, supplementary targeted literature searches (S01–S09) were executed in April 2026 using PubMed, following the same PICO-based methodology, CRIT1-7 appraisal framework, and inclusion/exclusion criteria as the primary SotA search. Twenty-five new papers passed the CRIT1-7 inclusion threshold (≥ 4/10) and were incorporated into the evidence base; the combined corpus totals 93 appraised papers. The search protocol, search strings, screening results, and CRIT1-7 scores for all newly included papers are fully documented in R-TF-015-011 State of the Art, section "Supplementary Literature Search: April 2026." The acceptance criterion derivation, population representativeness assessment, and acceptable gap declarations in the sections below reflect this updated evidence base.
The combined original-plus-supplementary search is judged adequate on the three dimensions required by MDCG 2020-13 Section D:
- Search method: PICO-framed, PRISMA-reported, multi-database (MEDLINE/PubMed, Cochrane Library, ClinicalTrials.gov, FDA MAUDE, FDA Medical Device Recalls, EUDAMED).
- Bias avoidance: inclusion of unfavourable findings (for example COVIDX Clinical Utility Score 7.66 below the pre-specified threshold, MC_EVCDAO_2019 NPV 0.68 below target, DAO_Derivación_PH_2022 protocol deviation, partial HCP completion in BI_2024 and SAN_2024), use of design-specific validated appraisal tools (QUADAS-2 for diagnostic-accuracy studies, MINORS for clinical-utility, MRMC, and published severity-validation studies), and explicit declaration of five evidence gaps per MDCG 2020-6 § 6.5(e).
- Retrieval completeness: supported by the comprehensiveness-test re-run at the closure of the April 2026 supplementary review and by the targeted closure of the SotA gaps named above.
Acceptance Criteria Derivation from State of the Art
The derivation of acceptance criteria follows the tiered evidence assessment strategy described in Tiered evidence assessment strategy. Tier 1 (malignant conditions) has condition-specific thresholds derived from SotA meta-analyses of melanoma and malignancy detection literature. Tier 2 (rare diseases) has grouped thresholds justified by the distinct clinical benefit of improving rare disease diagnosis (sub-criterion (b) of benefit 7GH). Tier 3 (general conditions) uses pooled thresholds derived from SotA data on diagnostic accuracy improvement, referral optimisation, and remote care capacity, justified by the risk-based pooling rationale documented in Data pooling methodology.
The performance claims were grouped by clinical domain (e.g., Malignancy Detection, Improvement in Diagnostic Accuracy, Diagnostic Accuracy (Unaided), Reduction of Unnecessary Referrals, Efficiency Metrics, Remote Care Capacity, Referral Sensitivity/Specificity, Inter-observer Agreement in Severity Assessment of Dermatological Conditions, Metric Interpretation, Experts' Agreement).
For each domain, the baseline for the performance claim acceptance criteria was established by meta-analysis where the data permitted. Where a meta-analysis was not feasible (for example, when studies were too heterogeneous in population, intervention, or outcomes, were of poor quality, or reported data incompletely or inconsistently, as was the case for the present evidence base), we conducted a weighted average analysis, weighting articles by quality and sample size and calculating confidence intervals (using the Wilcoxon method where applicable). Where selected studies were themselves meta-analyses, their pooled results were incorporated and any additional selected articles were integrated into the overall calculation.
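The weighted-average route can be sketched as follows. This is a minimal illustration under stated assumptions, not the manufacturer's calculation. Each study's weight is taken as quality score × sample size, because the exact weighting formula is not specified here. The "Wilcoxon method" is interpreted as the Hodges–Lehmann estimate with a Wilcoxon signed-rank confidence interval over Walsh averages, using a normal approximation for the rank cut-off. Function names and input values are hypothetical.

```python
import math
from statistics import NormalDist, median

def weighted_baseline(values, sample_sizes, quality_scores):
    """Weighted mean of per-study metrics; weight = quality score x sample size (assumed)."""
    weights = [q * n for q, n in zip(quality_scores, sample_sizes)]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)

def wilcoxon_interval(values, confidence=0.95):
    """Hodges-Lehmann estimate with an approximate Wilcoxon signed-rank CI."""
    n = len(values)
    # Walsh averages: mean of every pair (i <= j), including each value with itself.
    walsh = sorted((values[i] + values[j]) / 2 for i in range(n) for j in range(i, n))
    m = len(walsh)  # equals n(n+1)/2
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    mean_t = n * (n + 1) / 4
    sd_t = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    # The k-th smallest and k-th largest Walsh averages bound the interval (k >= 1).
    k = max(1, math.floor(mean_t - z * sd_t))
    return median(walsh), walsh[k - 1], walsh[m - k]

# Hypothetical per-study sensitivities, sample sizes and quality scores (illustrative only).
sens = [0.70, 0.74, 0.68, 0.79, 0.76, 0.72]
n_patients = [120, 300, 85, 410, 150, 200]
quality = [0.8, 0.9, 0.6, 1.0, 0.7, 0.9]

print(round(weighted_baseline(sens, n_patients, quality), 3))
print(wilcoxon_interval(sens))
```

With only a handful of studies the Wilcoxon interval is wide (here it spans the smallest to the largest value), which is one reason a margin above the synthesized baseline is applied when setting targets.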
Specific justification for acceptance criteria thresholds:
- 7GH Malignant Conditions Sub-criterion (c) (Pooling Justification for NMSC): Per MEDDEV 2.7/1 Rev 4 Annex A7.3, major clinical indications generally require individual evaluation. Melanoma, BCC, and SCC have therefore been evaluated individually with dedicated acceptance criteria in the derivation table. These three conditions are additionally evaluated as part of the aggregate "Multiple Malignant Conditions" pooled assessment, which provides an overall safety characterisation across all malignant neoplasms including melanoma, BCC, SCC, and actinic keratosis. The pooled assessment reflects the immediate clinical decision made by a primary care physician when encountering a suspicious malignant lesion: urgent referral for specialist assessment and histopathological confirmation, regardless of the specific NMSC subtype. The device acts as a decision-support tool to trigger this referral, not to differentiate definitively between NMSC subtypes where histopathology remains the gold standard.
- 7GH BCC/cSCC non-specialist benchmarks (supplementary evidence): Supplementary searches provide the non-specialist SotA context: Jones et al. (2022, systematic review of 272 studies) established BCC mean AUC 92.3% and SCC mean AUC 87.5% across predominantly specialist/enriched settings, with only 2 of 272 studies from non-referred populations. In primary care reader studies, AI-aided physician sensitivity for mixed skin cancer (including BCC/SCC) ranges from 81.7% to 88% (Ferris et al., 2025 FDA Pivotal; Jaklitsch et al., 2023). Post-referral pathway meta-analysis: any-malignancy sensitivity 96.1% (Walton et al., 2026, NICE HTA). Non-specialist AI meta-analysis (17 studies): AI-aided SE 79.3% vs. unaided SE 66.3% for non-dermatologist clinicians (Krakowski et al., 2024). These benchmarks confirm that the pooled AUC ≥ 0.90 criterion represents meaningful superiority over the non-specialist SotA in the intended use setting.
- 3KX remote care sub-criterion, improvement of at least 30% in sensitivity for remote referrals: There is a notable gap in the existing literature concerning the sensitivity and specificity of medical devices in enhancing the detection of cases requiring referral. In the absence of an external benchmark, the 30% figure is anchored to the improvement documented for primary care physicians using the device during teledermatology consultations, compared with an unaided baseline (which in our studies was 0% for remote detection of specific referral criteria). Because the baseline is 0%, this improvement is necessarily an absolute gain in percentage points rather than a relative increase.
- 7GH rare disease sub-criterion absolute accuracy 54%: Rare skin diseases present a significant challenge due to low incidence and high misdiagnosis rates. An acceptance criterion of 54% represents a significant documented clinical benefit over the unaided diagnostic performance of HCPs. On average, for both dermatologists and PCPs, there was an increase in Top-1 diagnostic accuracy of 26.77%, an increase in sensitivity of 25.56%, and an increase of 23.50% in specificity for the diagnosis of rare diseases with the use of the device. Specifically, for PCPs the increase was 28.54% in Top-1 diagnostic accuracy, 25.21% in sensitivity and 24.73% in specificity; for dermatologists the increase was 12.97% in Top-1 diagnostic accuracy, 16.44% in sensitivity and 15.41% in specificity (based on results from pivotal studies BI_2024 and PH_2024). These results demonstrate the significant improvement in diagnostic precision achieved through the clinical use of the device for the diagnosis of rare diseases.
- 5RB unweighted kappa 0.6 for alopecia severity: There is currently a lack of literature specifically addressing inter-observer agreement, or Cohen's kappa, for assessing the severity of female androgenetic alopecia. The acceptance criterion is therefore derived from the standard interpretation of the metric: on the Landis and Koch scale, a kappa of 0.41–0.60 represents moderate agreement, which is considered acceptable in clinical settings, while values above 0.60 indicate substantial agreement.
- 5RB HS severity ICC ≥ 0.70: The acceptance criterion of ICC ≥ 0.70 represents superiority over the original published IHS4 inter-rater ICC of 0.47 (Thorlacius et al., 2019). Supplementary searches confirm that with experienced raters and training, published IHS4 inter-rater ICC ranges from 0.69 to >0.75 (Goldfarb et al., 2021, citing Zouboulis and Włodarek). The only external independent AI-based IHS4 study identified (Wiala et al., 2024) achieved AUC 0.84–0.89 for automated IHS4 classification and cites human expert inter-rater ICC 0.68–0.78 as the benchmark. The device's AIHS4_2023 ICC of 0.727 falls within this expert inter-rater range, confirming the criterion is clinically calibrated.
- Clinical Utility Score (CUS) ≥ 80% for teledermatology function: The threshold of ≥ 80% (equivalent to ≥ 8 on a 0–10 scale) is anchored to published benchmarks for clinically accepted digital health tools in dermatology: Roca et al. (2022) reported a System Usability Scale (SUS) score of 70.1 for a teledermatology virtual assistant for psoriasis (above the validated "above average" threshold of ≥ 68/100); Mostafa and Hegazy (2022) reported TUQ satisfaction of 87–93% across subscales for teledermatology in dermatological conditions; Romero-Jimenez et al. (2022) reported median satisfaction of 9.1/10 for a digital health tool in inflammatory skin conditions including psoriasis and atopic dermatitis. These benchmarks indicate that a threshold of ≥ 8/10 sits above the minimum acceptable usability threshold (SUS ≥ 68, equivalent to approximately 6.8/10) and is consistent with published satisfaction levels for accepted tools in this clinical domain.
- Expert Panel Alignment (Majority Vote >= 75%): Methodological literature for expert consensus does not set a single universal threshold; however, an agreement of >= 75% is frequently considered a substantial or optimal majority consensus in clinical validation. This threshold ensures the device aligns with the consolidated judgment of a qualified expert panel, providing a robust reference standard for performance evaluation.
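The kappa-based criterion for alopecia severity (5RB) can be made concrete with a short Python sketch of unweighted Cohen's kappa for two raters, together with the Landis and Koch verbal bands that match the moderate-agreement interpretation above. The function names are illustrative, and the severity grades in the usage example are hypothetical.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters grading the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("Both raters must grade the same, non-empty set of items")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement from each rater's marginal category frequencies.
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1:
        raise ValueError("Kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)

def landis_koch_band(kappa):
    """Verbal agreement band on the Landis and Koch (1977) scale."""
    if kappa < 0:
        return "poor"
    for upper, label in ((0.20, "slight"), (0.40, "fair"), (0.60, "moderate"), (0.80, "substantial")):
        if kappa <= upper:
            return label
    return "almost perfect"

# Hypothetical severity grades from two raters on eight images.
rater_1 = ["mild", "moderate", "severe", "moderate", "mild", "severe", "moderate", "mild"]
rater_2 = ["mild", "moderate", "moderate", "moderate", "mild", "severe", "severe", "mild"]
kappa = cohens_kappa(rater_1, rater_2)
print(round(kappa, 3), landis_koch_band(kappa), "meets 0.6" if kappa >= 0.6 else "below 0.6")
```

Unweighted kappa treats every disagreement equally; a weighted kappa would give partial credit to adjacent severity grades, which is why the criterion names the unweighted form explicitly.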
Rationale for the Establishment of Acceptance Criteria
In accordance with MDCG 2020-1 and MEDDEV 2.7/1 Rev. 4, the acceptance criteria for the clinical performance and safety of the device have been established through a systematic appraisal of the State of the Art (SotA). The objective of this process was to define the technological boundary and the expected performance of current clinical practice (standard of care) to ensure that the device provides a Substantial Clinical Benefit.
The derivation of these criteria followed a three-stage analytical workflow:
- Extraction of SotA Benchmarks: Performance metrics (Sensitivity, Specificity, AUC, Accuracy) were extracted from 68 primary SotA articles (subsequently expanded to a combined corpus of 93 articles including 25 supplementary papers from the April 2026 targeted searches; the acceptance criteria in the table below reflect the final derivation from the complete corpus). For high-risk indications, such as melanoma and non-melanoma skin cancer (NMSC), individual benchmarks were derived to ensure granular safety characterisation.
- Synthesis of Baselines: Where data permitted, meta-analyses or weighted averages were performed to establish the synthesized SotA baseline. This baseline represents the "unaided" or "current AI-standard" performance against which our device is measured.
- Establishment of Targets with Safety Margins: Final targets were established by adding a domain-specific clinical significance margin above the synthesized SotA baselines. The magnitude of this margin varies by clinical domain (ranging from approximately 3 to 23 percentage points), calibrated to the clinical risk level and SotA data variability of each domain, rather than applying a uniform value. This margin accounts for real-world image variability and ensures that the device's performance represents a meaningful improvement over standard care, thereby justifying a favorable benefit-risk profile.
Summary of Acceptance Criteria Derivation
The following table provides the direct analytical link between the state-of-the-art literature (detailed in R-TF-015-011) and the established acceptance criteria.
| Benefit ID | Clinical Domain | Relevant SotA Article(s) | Methodology | Derived SotA Baseline | Acceptance Criterion |
|---|---|---|---|---|---|
| 7GH | Melanoma Detection | Maron et al. 2019, Haenssle et al. 2018, Barata et al. 2023, Chen et al. 2024, Maron et al. 2020, Brinker et al. 2019, Marchetti et al. 2019 | Meta-analysis | AUC: 0.81 [0.78-0.84] Top-1 accuracy: 0.754 [0.70-0.80] Sensitivity: 0.734 [0.67-0.79] Specificity: 0.762 [0.68-0.84] | AUC >= 0.81 (non-inferiority to the SotA meta-analysis baseline), Top-1 accuracy >= 0.81, Sensitivity >= 0.93, Specificity >= 0.80. The study's own pre-specified design criterion was AUC >= 0.80. Achieved: AUC 0.8482 (0.85 rounded), meeting the SotA-derived criterion. |
| 7GH | Multiple Malignant Conditions (Pooled) | Maron et al. 2019, Han et al. 2020, Ahadi et al. 2021, Tepedino et al. 2024, Tschandl et al. 2019; supplementary non-specialist benchmarks: Jones et al. 2022, Jaklitsch et al. 2023, Ferris et al. 2025 (DERM-SUCCESS FDA Pivotal), Walton et al. 2026 (NICE HTA); secondary reinforcement: Krakowski et al. 2024, Marsden et al. 2024, Sangers et al. 2022, Chen et al. 2024 | Meta-analysis | AUC: 0.7780 [0.74-0.80] Sensitivity: 0.76 [0.70-0.82] Specificity: 0.79 [0.71-0.85] Supplementary non-specialist benchmarks: BCC mean AUC 92.3%, mean SE 83.7%; SCC mean AUC 87.5%, mean SE 60.3% (Jones 2022, 272 studies); AI-aided PCP SE 81.7–88% for mixed skin cancer incl. BCC/SCC (Ferris 2025, Jaklitsch 2023); post-referral pathway any-malignancy SE 96.1% (Walton 2026) | AUC >= 0.90, Sensitivity >= 0.79, Specificity >= 0.87 (Superiority to SotA benchmarks). Note: BCC and SCC are included in these pooled malignancy-detection results. Non-specialist and referral-pathway benchmarks confirm the criterion is calibrated for the intended primary care use setting. |
| 7GH | Malignancy Detection: PPV/NPV (Primary Care) | Chen et al. 2025 (JAMA Derm), Seyed Ahadi et al. 2021. Published evidence confirms PPV is highly prevalence-dependent and lower in primary care; NPV is consistently high and may exceed 95% with AI assistance. | Literature-benchmarked range (MEDDEV A7.3) | PPV in primary care varies with prevalence and is inherently lower than in specialist settings. NPV is consistently high (≥ 95%) in primary care with AI assistance. Criteria benchmarked against specialist and primary-care performance ranges rather than a single pooled value. | PPV >= 42% in primary care (pooled malignant conditions, human-in-the-loop). NPV >= 96% in primary care (human-in-the-loop). These are human-in-the-loop acceptance criteria reflecting device-aided clinician performance; observed values are reported in the individual performance claims. For standalone device predictive values, see the MEDDEV A7.3 analysis in the "Predictive Values by Clinical Setting" section. |
| 7GH | Malignancy Detection: PPV/NPV (Dermatology) | Chen et al. 2025 (JAMA Derm), Seyed Ahadi et al. 2021. PPV increases substantially with pre-test probability in specialist settings; NPV remains high across settings. | Literature-benchmarked range (MEDDEV A7.3) | PPV in dermatology settings is meaningfully higher than in primary care due to elevated pre-test probability. NPV remains high. Criteria benchmarked against specialist performance ranges. | PPV >= 89% in dermatology setting (pooled malignant conditions, human-in-the-loop). NPV >= 82.5% in dermatology (human-in-the-loop). These are human-in-the-loop acceptance criteria reflecting device-aided clinician performance; observed values are reported in the individual performance claims. For standalone device predictive values, see the MEDDEV A7.3 analysis in the "Predictive Values by Clinical Setting" section. |
| 7GH | Diagnostic Accuracy Improvement | Ba et al. 2022, Ferris et al. 2025, Han et al. 2020, Jain et al. 2021, Maron et al. 2020, Krakowski et al. 2024, Tschandl et al. 2020 | Weighted Average | Overall: +6.36% Accuracy, +6.30% Sens, +4.60% Spec PCPs: +9.30% Accuracy, +13.00% Sens, +10.80% Spec Dermatologists: +5.30% Accuracy, +6.30% Sens, +4.60% Spec | Increase in Top-1 accuracy >= 15% (Overall), >= 18% (PCP), >= 9% (Derm). Increase in sensitivity >= 18% (Overall). Increase in specificity >= 19% (Overall). (Targets significantly exceeding SotA to ensure substantial clinical benefit). |
| 7GH | Diagnostic Accuracy (Unaided) | Previous articles + Escalé-Besa et al. 2023, Han et al. 2020, Han et al. 2022, Kim et al. 2022, Liu et al. 2020, Muñoz-López et al. 2021 | Meta-analysis | Overall: Accuracy 0.49, Sens 0.69, Spec 0.764 PCPs: Accuracy 0.419, Sens 0.663, Spec 0.701 Derms: Accuracy 0.57, Sens 0.73, Spec 0.776 | Establish benchmarks for baseline unaided HCP comparison. Device-aided performance must exceed these baselines. |
| 3KX | Adequacy of Referrals (relative increase) | Baker et al. 2022, Eminović et al. 2009, Jain et al. 2021, Knol et al. 2006 | Weighted Average | 14% (MD unaided) – 24% (teledermatology) relative increase in adequacy of referrals | Relative increase in adequacy of referrals >= 15% (SotA-derived threshold applied as the per-study acceptance criterion for DAO_Derivación_PH_2022 and DAO_Derivación_O_2022). |
| 3KX | Efficiency Metrics (Waiting Lists) | Giavina-Bianchi et al. 2020, Morton et al. 2010, Hsiao & Oh 2008, Spanish SNS Report 2025, DREES 2018, DERMAsurvey 2013 | Weighted Average | Baseline wait time > 60 days; SotA tools achieved a ~71% reduction (to 5-11.5 days). | Reduction of cumulative waiting time >= 50% (Alignment with SotA capabilities). |
| 3KX | Remote Care Capacity | Giavina-Bianchi et al. 2020, Orekoya et al. 2021, Kheterpal et al. 2023, Whited 2015 | Weighted average | ~55% of patients managed remotely | >= 55% of patients managed remotely with device assistance (Alignment with SotA benchmarks). Observed value in pivotal studies: 58% (PH_2024), 56% (SAN_2024 dermatologists). |
| 3KX | Referral Sensitivity for PCPs | Burton et al. 1998, Gerbert et al. 1996 | Weighted average | PCPs unaided: Sens 0.663, Spec 0.60. SotA floor: ≥ 10% improvement would be a meaningful increment above unaided baseline. | Improvement of >= 30% in sensitivity for identifying necessary referrals remotely. The criterion exceeds the SotA-derived floor of ≥ 10% to ensure a substantial, clinically meaningful benefit above standard unaided care, accounting for the additional challenge of the teledermatology context. |
| 3KX | Referral Specificity (Teledermatology) | DAO_Derivación_O_2022 (remote subset); Burton et al. 1998, Gerbert et al. 1996 | Direct observation (pivotal study remote subset) | Unaided PCP remote specificity: 66.7% (30/45, 95% CI: 52.1%–78.6%). The algorithm achieved the same specificity in the remote subset, confirming that the device does not increase false positives compared with unaided care in remote workflows. | Specificity ≥ 65% in identifying unnecessary referrals remotely. Criterion aligned with the 3KX referral adequacy in-person specificity benchmark, accounting for the additional challenge of the teledermatology context. |
| 5RB | Inter-observer Correlation (IHS4) | Thorlacius et al. 2019, Goldfarb et al. 2021, Wiala et al. 2024 | Weighted average | Human expert IHS4 inter-rater ICC ranges from 0.47 (original study, Thorlacius) to >0.75 (with training, Włodarek cited in Goldfarb 2021); experienced-rater studies: ICC 0.69–0.78. AI-based automated IHS4 classification: AUC 0.84–0.89 (Wiala 2024). | ICC >= 0.70 for HS severity assessment (Superiority to original SotA baseline of 0.47; within the upper-moderate/approaching-high range of trained expert inter-rater agreement 0.69–>0.75; the device's AIHS4_2023 ICC of 0.727 falls within the published human expert performance band). |
| 5RB | Alopecia Severity Assessment | Landis & Koch 1977 | Interpretation of standardised guidelines | N/A (Methodological standard for Kappa interpretation). Per Landis & Koch: κ 0.41–0.60 = "Moderate"; κ 0.61–0.80 = "Substantial". A criterion of κ ≥ 0.60 falls at the upper boundary of the Moderate band, representing moderate-to-substantial agreement. | Inter-observer agreement Cohen's Kappa >= 0.60 (moderate-to-substantial agreement per Landis & Koch 1977; the threshold sits at the upper boundary of the "Moderate" band and just below the "Substantial" band, which begins at κ > 0.60). |
| 5RB | Alopecia Severity Correlation | N/A (no published SotA baseline for device–HCP severity correlation in androgenetic alopecia; assessed against methodological standard) | Direct observation (pivotal study) | No external SotA baseline available. The 65% threshold is derived from the Landis & Koch framework (> 0.60 = "Substantial" agreement), applied to correlation as a complementary inter-rater metric. | Correlation >= 65% between device and HCP severity assessment for androgenetic alopecia. Observed: 77% (95% CI: 69%–85%) in IDEI_2023. |
| 3KX | Expert Agreement | Consensus methodology literature | Interpretation of standardised guidelines | Consensus agreement >= 75% typically accepted as substantial | Alignment with majority vote of expert panel >= 75% (Alignment with methodological standards). |
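The kappa-based criteria in the table rely on the Landis & Koch (1977) bands. The sketch below computes unweighted Cohen's kappa from a two-assessor confusion matrix and maps the result to those bands. It assumes unweighted kappa, since the studies' exact kappa variant is not stated here. It also treats each band's upper limit as inclusive, so that κ = 0.60 falls in "Moderate", as the table describes. The ratings are hypothetical.

```python
def cohens_kappa(matrix):
    """Cohen's kappa from a square confusion matrix (rows: rater A, cols: rater B)."""
    n = sum(sum(row) for row in matrix)
    k = len(matrix)
    observed = sum(matrix[i][i] for i in range(k)) / n
    row_totals = [sum(matrix[i]) for i in range(k)]
    col_totals = [sum(matrix[i][j] for i in range(k)) for j in range(k)]
    expected = sum(row_totals[i] * col_totals[i] for i in range(k)) / n**2
    return (observed - expected) / (1 - expected)


def landis_koch_band(kappa):
    """Map kappa to the Landis & Koch (1977) descriptive bands."""
    if kappa < 0:
        return "Poor"
    for upper, label in [(0.20, "Slight"), (0.40, "Fair"),
                         (0.60, "Moderate"), (0.80, "Substantial")]:
        if kappa <= upper:
            return label
    return "Almost perfect"


# Hypothetical 3-grade severity ratings from two assessors.
ratings = [
    [20, 4, 1],
    [3, 15, 2],
    [0, 3, 12],
]
kappa = cohens_kappa(ratings)
print(f"kappa = {kappa:.3f} ({landis_koch_band(kappa)}), meets >= 0.60: {kappa >= 0.60}")
```

For ordinal severity grades, a weighted kappa (linear or quadratic) is a common alternative. It credits near-misses and would generally give a different value.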
Note on referral sensitivity criteria (3KX): Two distinct referral sensitivity criteria appear in the benefit 3KX evidence framework, each addressing a different clinical workflow context. The first, absolute in-person referral sensitivity ≥ 70%, appears in the CEP benefit table under sub-criterion (b) "Referral adequacy" and is assessed against the real-world DAO studies' unaided PCP referral performance; it reflects the minimum acceptable device-aided sensitivity for in-person triage decisions. The second, relative improvement of ≥ 30% in sensitivity for remote referrals, is the criterion in the derivation row above and in CEP sub-criterion (c) "Remote care"; it is calibrated from the unaided baseline of 0.663 per the SotA derivation. The two criteria are complementary: the absolute floor (≥ 70%) ensures the in-person triage function delivers clinically meaningful sensitivity, while the relative improvement floor (≥ 30%) ensures the remote workflow delivers a substantial increment over the unaided baseline regardless of absolute level. Both are met in the clinical evidence portfolio.
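The two complementary criteria can be checked side by side. This sketch reads the remote criterion as a relative improvement, (aided − unaided) / unaided, against the 0.663 unaided baseline, as the note describes. Under that reading, the remote criterion requires an aided sensitivity of at least about 0.862. The margin table's "+30 pp" wording would instead imply 0.963. Function names and input values are hypothetical.

```python
UNAIDED_REMOTE_SENSITIVITY = 0.663  # unaided PCP baseline cited in the SotA derivation


def relative_improvement(aided, unaided=UNAIDED_REMOTE_SENSITIVITY):
    """Relative change in sensitivity: (aided - unaided) / unaided."""
    return (aided - unaided) / unaided


def referral_criteria(in_person_sensitivity, remote_sensitivity):
    """Evaluate the two complementary 3KX referral-sensitivity criteria."""
    return {
        "absolute_in_person": in_person_sensitivity >= 0.70,
        "relative_remote": relative_improvement(remote_sensitivity) >= 0.30,
    }


# Hypothetical device-aided sensitivities, for illustration only.
print(referral_criteria(in_person_sensitivity=0.78, remote_sensitivity=0.90))
# Under the relative reading, the remote criterion needs aided sensitivity >= 0.663 * 1.30.
print(f"minimum remote sensitivity: {UNAIDED_REMOTE_SENSITIVITY * 1.30:.3f}")
```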
Per-domain safety-margin derivation
The acceptance criteria above are set relative to the synthesised SotA baseline by a domain-specific clinical-significance margin. This margin is zero or negative where the target is non-inferiority or alignment rather than superiority. The magnitude of each margin is calibrated to the clinical risk level of the domain and to the SotA data variability, rather than applied uniformly. The table below documents the margin per domain with an explicit rationale, in order to forestall any concern of ex-post calibration:
| Benefit | Domain | SotA baseline | Acceptance criterion | Margin (pp) | Margin rationale |
|---|---|---|---|---|---|
| 7GH | Melanoma detection (AUC) | AUC 0.81 | AUC ≥ 0.81 | 0 (non-inferiority) | Highest-risk indication; non-inferiority to the SotA meta-analysis baseline is the regulatorily defensible target. Superiority is not required given the demonstrated SotA range (0.78–0.84). The pre-specified study design criterion AUC ≥ 0.80 sits within the SotA CI. |
| 7GH | Multiple Malignant (Pooled AUC) | AUC 0.778 | AUC ≥ 0.80 | +2.2 | Per-study acceptance threshold for pooled multiple-malignant AUC, applied to MC_EVCDAO_2019, DAO_Derivación_O_2022 and DAO_Derivación_PH_2022. Margin calibrated for non-inferiority to the SotA meta-analytic baseline (AUC 0.778) with a modest safety margin within the SotA confidence interval. |
| 7GH | Diagnostic Accuracy Improvement (Overall) | +6.36 pp | ≥ +15 pp | +8.6 | Substantial-clinical-benefit margin; targets clinically meaningful improvement above the SotA-observed weighted-average effect size in AI-assisted diagnostic accuracy. |
| 7GH | Diagnostic Accuracy Improvement (PCP) | +9.30 pp | ≥ +18 pp | +8.7 | Calibrated for primary-care users, where the unaided baseline is lower (0.419) and the AI-assistance lift is the device's principal clinical value proposition. |
| 7GH | Diagnostic Accuracy Improvement (Derm) | +5.30 pp | ≥ +9 pp | +3.7 | Smaller margin: dermatologists' unaided baseline (0.57) is already high, and the SotA AI-aid lift in this user tier is intrinsically smaller. Margin set above the upper end of the SotA observed lift to ensure substantial clinical benefit while remaining within a realistic effect size. |
| 7GH | Sensitivity (Overall) | +6.30 pp | ≥ +18 pp | +11.7 | Sensitivity prioritised for safety reasons (false negatives in malignancy carry the worst clinical consequences); larger margin reflects the safety-criticality of this metric. |
| 7GH | Specificity (Overall) | +4.60 pp | ≥ +19 pp | +14.4 | Specificity prioritised for healthcare-system reasons (false positives drive unnecessary referrals); larger margin reflects the operational impact in the intended primary-care deployment. |
| 3KX | Adequacy of Referrals (relative increase) | 14% (MDs) – 24% (telederm) | ≥ +15% | +1 pp | Per-study acceptance threshold for relative increase in adequacy of referrals, applied to DAO_Derivación_PH_2022 and DAO_Derivación_O_2022. Non-inferiority to the SotA unaided-MD benchmark. |
| 3KX | Waiting-time reduction | ~71% reduction (SotA tools) | ≥ 50% reduction | −21 (alignment, not superiority) | Acceptance set in alignment with SotA capability rather than superiority. The SotA observed value is high (~71%), and the device-aided system performs within the SotA capability band. |
| 3KX | Remote care capacity | ~55% remote | ≥ 55% remote | 0 (alignment) | Acceptance set in alignment with SotA capability; remote capacity is a system-design target rather than a clinical-superiority claim. |
| 3KX | Referral Sensitivity for PCPs (remote) | +10 pp SotA floor | ≥ +30 pp | +20 | Larger margin reflects the additional clinical challenge of teledermatology context (image-only assessment without in-person examination), and the elevated safety bar for remote-pathway sensitivity. |
| 3KX | Referral Specificity (Teledermatology) | 66.7% unaided | ≥ 65% device-aided | −1.7 (non-inferiority) | Non-inferiority to unaided remote specificity; ensures the device does not increase false positives in remote workflows while delivering its sensitivity lift. |
| 5RB | Inter-observer Correlation (IHS4) | ICC 0.47 (original SotA) | ICC ≥ 0.70 | +0.23 (ICC units) | Margin set within the upper-moderate / approaching-high range of trained-expert inter-rater agreement (0.69–0.78); the device's AIHS4_2023 ICC of 0.727 falls within the published expert performance band. |
| 5RB | Alopecia Severity Assessment (Cohen's κ) | N/A (methodological standard) | κ ≥ 0.60 | n/a (Landis-Koch boundary) | Set at the upper boundary of the "Moderate" Landis-Koch band; pragmatic threshold given absence of a published SotA AI baseline for this metric. |
| 3KX | Expert Agreement | ≥ 75% consensus (methodological standard) | ≥ 75% Majority Vote | n/a (alignment with consensus standard) | Aligned with established consensus methodological literature. |
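The per-domain margins range from negative or zero values for alignment and non-inferiority targets (e.g. waiting-time reduction and teledermatology referral specificity) to approximately +20 percentage points for remote referral sensitivity in PCPs. The IHS4 margin is expressed on the ICC scale (+0.23). This spread reflects the heterogeneity of the underlying domains, the clinical risk levels (highest for malignancy sensitivity), and the SotA-data variability per metric. Margins were established in advance of analysis, in alignment with the SotA literature corpus appraised in R-TF-015-011, and were not adjusted post-hoc against observed data.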
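The margin column can be recomputed directly as criterion minus baseline. The sketch below uses the tabulated values, with AUC scaled by 100 so that it reads in points and ICC left on its own scale. It reproduces the tabulated margins to the precision shown. The dictionary and function names are illustrative.

```python
# (SotA baseline, acceptance criterion) pairs copied from the margin table above.
# Values are in percentage points, except the AUC rows (scaled x100) and the ICC row.
DOMAINS = {
    "Melanoma AUC (x100)": (81.0, 81.0),
    "Multiple malignant AUC (x100)": (77.8, 80.0),
    "Accuracy improvement, overall": (6.36, 15.0),
    "Accuracy improvement, PCP": (9.30, 18.0),
    "Accuracy improvement, derm": (5.30, 9.0),
    "Sensitivity improvement, overall": (6.30, 18.0),
    "Specificity improvement, overall": (4.60, 19.0),
    "Referral adequacy (relative increase)": (14.0, 15.0),
    "Waiting-time reduction": (71.0, 50.0),
    "Remote care capacity": (55.0, 55.0),
    "Remote referral sensitivity (PCP)": (10.0, 30.0),
    "Referral specificity (telederm)": (66.7, 65.0),
    "IHS4 ICC": (0.47, 0.70),
}


def margin(baseline, criterion, ndigits=2):
    """Margin = acceptance criterion - SotA baseline (negative = below baseline)."""
    return round(criterion - baseline, ndigits)


for name, (baseline, criterion) in DOMAINS.items():
    m = margin(baseline, criterion)
    kind = "superiority" if m > 0 else "non-inferiority / alignment"
    print(f"{name:40s} {m:+7.2f}  ({kind})")
```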
Predictive Values by Clinical Setting (MEDDEV 2.7/1 Rev. 4, Annex A7.3)
In accordance with MEDDEV 2.7/1 Rev. 4 Annex A7.3, the clinical performance of the device for diagnostic purposes is presented as positive predictive value (PPV) and negative predictive value (NPV) across varying pre-test probabilities, reflecting the range of clinical settings in which the device is intended to be used.
The following analysis is based on the melanoma detection performance of the device as reported in study MC_EVCDAO_2019 (Sensitivity: 93.2% [95% CI: 88.4–98.1%]; Specificity: 81.0% [95% CI: 69.4–92.5%]). This study provides Tier 1 evidence for the highest-risk malignant indication and is therefore selected as the reference for this analysis. PPV and NPV are derived using Bayes' theorem across three representative clinical settings that reflect the range of pre-test probabilities for melanoma encountered in clinical practice.
| Clinical Setting | Pre-test Probability | PPV | NPV | Clinical Interpretation |
|---|---|---|---|---|
| Primary care: low suspicion | 2% | 9.1% | 99.8% | A negative result from the device makes melanoma highly unlikely in primary care. A positive result triggers specialist referral for confirmation, consistent with the human-in-the-loop clinical workflow. |
| General dermatology | 10% | 35.2% | 99.0% | A negative result provides strong diagnostic reassurance. A positive result is clinically meaningful and appropriately triggers an urgent referral pathway. |
| Pigmented lesion clinic: high suspicion | 30% | 67.7% | 96.4% | In specialist settings with high pre-test probability, the device provides clinically useful positive identification, while a negative result retains a high NPV, supporting its value as a decision-support tool in high-risk workflows. |
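The PPV and NPV columns follow from Bayes' theorem applied to the study's point estimates. The minimal sketch below does not propagate confidence intervals. Rounded to one decimal place, it gives 9.1%/99.8%, 35.3%/99.1% and 67.8%/96.5% at 2%, 10% and 30% pre-test probability, within 0.1 percentage point of the tabulated values.

```python
def ppv(sensitivity, specificity, prevalence):
    """Positive predictive value from Bayes' theorem."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


def npv(sensitivity, specificity, prevalence):
    """Negative predictive value from Bayes' theorem."""
    true_neg = specificity * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    return true_neg / (true_neg + false_neg)


# Melanoma operating point reported for MC_EVCDAO_2019.
SENSITIVITY, SPECIFICITY = 0.932, 0.810

for setting, prevalence in [("Primary care", 0.02),
                            ("General dermatology", 0.10),
                            ("Pigmented lesion clinic", 0.30)]:
    print(f"{setting:25s} PPV {ppv(SENSITIVITY, SPECIFICITY, prevalence):6.1%}"
          f"  NPV {npv(SENSITIVITY, SPECIFICITY, prevalence):6.1%}")
```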
The analysis confirms that across all clinical settings in which the device is intended to be used, the NPV remains high (≥ 96%), supporting the safe use of the device as a decision-support tool operating within a human-in-the-loop clinical workflow. The PPV increases substantially with pre-test probability, consistent with established statistical expectations for diagnostic support devices. These predictive value profiles are consistent with those reported for comparable CE-marked and FDA-cleared AI dermatology devices in the SotA literature.
Note on standalone vs. human-in-the-loop predictive values: The PPV and NPV values in this section represent standalone device performance (AI algorithm output). The CEP (R-TF-015-001, §17.4) also specifies human-in-the-loop PPV/NPV acceptance criteria (PPV ≥ 42% and NPV ≥ 96% in primary care; PPV ≥ 89% and NPV ≥ 82.5% in dermatology), which reflect the combined performance of the device-aided clinician. These human-in-the-loop criteria are addressed by the individual performance claims documented in the technical file.
Summary of Clinical Benefits Achievement
To provide a coherent and navigable view of the evidence base, the following table summarizes the aggregate results achieved across the three claimed clinical benefits. This summary provides the high-level justification for the device's clinical utility, which is supported by the detailed breakdown of the individual performance claims documented in the technical file. Some of these performance claims are aggregated to formulate the final metrics below. Detailed values and confidence intervals for each claim are presented inline in the per-study sections of this CER and in the Clinical Evaluation Plan (R-TF-015-001).
Benefit 7GH covers the full spectrum of diagnostic accuracy claims and uses two distinct metrics for its sub-criteria: Top-1 accuracy for general and rare disease presentations, and AUC for malignant lesion presentations. These metrics are methodologically appropriate to their respective clinical questions (classification correctness vs. malignant/non-malignant discrimination) and both measure aspects of the same underlying classification capability. Benefit 5RB covers objective severity assessment across multiple dermatological conditions; its acceptance criteria are structured per-condition using the clinically validated severity scale for each condition, supported by both pre-market clinical investigations and published peer-reviewed validation literature (MDCG 2020-1, Pillar 2). Benefit 3KX covers all care pathway claims, including waiting time reduction, referral adequacy, and remote care capacity.
| ID | Clinical Benefit | Acceptance Criteria (Sub-criteria) | Observed Magnitude | Supporting Performance Claims & Source Studies | Status |
|---|---|---|---|---|---|
| 7GH | Diagnostic Accuracy (all presentations) | (a) General conditions: Top-1 accuracy improvement >= 15% (b) Rare diseases: Absolute Top-1 accuracy >= 54% (c) Malignant lesions: AUC >= 0.80 (pooled multiple malignant, per-study); AUC >= 0.81 (melanoma, non-inferiority to SotA) | (a) +18.5% (aggregate) (b) 54.8% (c) AUC 0.8983 (MC_EVCDAO_2019 pooled multiple malignant); AUC 0.85 (MC_EVCDAO_2019 melanoma); AUC 0.842 (DAO_Derivación_PH_2022); AUC 0.82 (DAO_Derivación_O_2022) | 114 aggregated performance claims (70 general + 24 rare + 20 malignancy). Studies: BI_2024, IDEI_2023, MC_EVCDAO_2019, PH_2024, SAN_2024, DAO_Derivación_PH_2022, DAO_Derivación_O_2022 | ✓ Achieved |
| 5RB | Objective Severity Assessment | (a) HS severity (IHS4): ICC >= 0.70 (b) Psoriasis severity (PASI): Visual sign accuracy >= human annotator consensus (c) Urticaria severity (UAS): Krippendorff α >= 0.60 (d) AD severity (SCORAD): RMAE <= 15% | (a) ICC 0.727 (b) 60.6% vs 52.5% (erythema) (c) α = 0.826 counting, 0.603 severity (d) RMAE 13.0% | 9 aggregated performance claims. Clinical investigations: AIHS4_2025, COVIDX_EVCDAO_2022, IDEI_2023. Published validation literature (MDCG 2020-1 Pillar 2): APASI_2025, AUAS_2023, AIHS4_2023, ASCORAD_2022 | ◐ Pillar 2 demonstrated; Pillar 3 supported by AIHS4_2025 pilot (n = 2) and R-TF-015-012 co-primary C4; prospective Pillar 3 confirmation pre-committed under PMCF B.1–B.5 (Gap 2, declared §6.5(e)) |
| 3KX | Care Pathway Optimisation (referral, waiting times, remote) | (a) Waiting times: Reduction in cumulative waiting time >= 50% (b) Referral adequacy: Relative increase in adequacy of referrals >= 15% (c) Remote care: Expert consensus agreement >= 75% | (a) 56% reduction (b) +38% relative increase (DAO_Derivación_O_2022) and +25% relative increase (DAO_Derivación_PH_2022) (c) 100% expert consensus agreement | 27 aggregated performance claims (14 waiting times + 8 referral + 5 remote). Studies: COVIDX_EVCDAO_2022, DAO_Derivación_PH_2022, DAO_Derivación_O_2022, PH_2024, SAN_2024 | ✓ Achieved |
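The numeric part of the status column can be sketched as a threshold check. The sketch pairs each observed magnitude in the table with its acceptance threshold, using "lower is better" for RMAE. It covers only the numeric thresholds. It does not capture the mixed 5RB status, which also depends on the Pillar 3 evidence described in the table. Names are illustrative.

```python
import operator

# (observed, threshold, comparison) per sub-criterion, copied from the summary table above.
SUMMARY = {
    "7GH (a) Top-1 accuracy improvement, %": (18.5, 15.0, operator.ge),
    "7GH (b) Rare-disease Top-1 accuracy, %": (54.8, 54.0, operator.ge),
    "7GH (c) Melanoma AUC, MC_EVCDAO_2019": (0.85, 0.81, operator.ge),
    "7GH (c) Pooled malignant AUC, MC_EVCDAO_2019": (0.8983, 0.80, operator.ge),
    "7GH (c) AUC, DAO_Derivación_PH_2022": (0.842, 0.80, operator.ge),
    "7GH (c) AUC, DAO_Derivación_O_2022": (0.82, 0.80, operator.ge),
    "5RB (a) IHS4 ICC": (0.727, 0.70, operator.ge),
    "5RB (b) PASI erythema accuracy vs annotator consensus, %": (60.6, 52.5, operator.ge),
    "5RB (c) UAS Krippendorff alpha, counting": (0.826, 0.60, operator.ge),
    "5RB (c) UAS Krippendorff alpha, severity": (0.603, 0.60, operator.ge),
    "5RB (d) SCORAD RMAE, % (lower is better)": (13.0, 15.0, operator.le),
    "3KX (a) Waiting-time reduction, %": (56.0, 50.0, operator.ge),
    "3KX (b) Referral adequacy, DAO_Derivación_O_2022, %": (38.0, 15.0, operator.ge),
    "3KX (b) Referral adequacy, DAO_Derivación_PH_2022, %": (25.0, 15.0, operator.ge),
    "3KX (c) Expert consensus agreement, %": (100.0, 75.0, operator.ge),
}


def evaluate(summary):
    """Return {sub-criterion: met?} by applying each comparison to (observed, threshold)."""
    return {name: compare(observed, threshold)
            for name, (observed, threshold, compare) in summary.items()}


results = evaluate(SUMMARY)
for name, met in results.items():
    print(f"{'met' if met else 'NOT met':8s} {name}")
print("all met:", all(results.values()))
```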
Note on individual metric shortfalls within the evidence portfolio: The summary table above presents aggregate benefit achievement across the full evidence portfolio. Some individual acceptance-criteria metrics in the per-study reconciliation tables were not met in specific studies, for reasons documented in those tables. These are:
- the BI_2024 dermatologist specificity (73.08% vs. target 77.6%; substantially above the SotA unaided baseline; justification: slightly aggressive threshold for this study design);
- the DAO_Derivación_O_2022 referral adequacy improvement (+7% vs. ≥ 15% target; the local baseline referral adequacy was already exceptionally high, limiting the room for relative improvement);
- three COVIDX_EVCDAO_2022 individual survey items (consultation time reduction 50%, feature-specific positive assessment 67%, CUS 76.67; the overall CUS score of 80% and recommendation rate of 80% demonstrate device acceptance);
- two IDEI_2023 retrospective alopecia metrics (correlation 0.47 and Kappa 0.33 in the retrospective subset; explained by non-standardised image acquisition and superseded by the pooled prospective criteria, both of which were met);
- the MC_EVCDAO_2019 NPV (0.68 vs. target 0.9; a consequence of the high malignancy prevalence in the study cohort, with Bayes' theorem analysis confirming NPV ≥ 99.8% in the intended primary care setting).

These individual shortfalls do not alter the aggregate benefit conclusions. Those conclusions rest on pooled analysis, the totality of evidence across studies, and explicit documented justification for each shortfall.
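The prevalence effect on NPV can be made explicit by solving the Bayes NPV expression for the largest pre-test probability at which a target NPV still holds. This sketch assumes the melanoma operating point reported for MC_EVCDAO_2019 (Se 0.932, Sp 0.810). The note does not state which operating point underlies the 0.68 figure. At this operating point, NPV ≥ 0.90 holds only while prevalence stays below roughly 57%. A cohort with higher malignancy prevalence would therefore fall below target even with unchanged sensitivity and specificity. Function names are illustrative.

```python
def max_prevalence_for_npv(sensitivity, specificity, target_npv):
    """Largest pre-test probability at which NPV stays >= target_npv.

    Rearranges NPV = Sp(1-p) / (Sp(1-p) + (1-Se)p) >= t for p, giving
    p <= Sp(1-t) / (Sp(1-t) + t(1-Se)).
    """
    a = specificity * (1 - target_npv)
    b = target_npv * (1 - sensitivity)
    return a / (a + b)


def npv(sensitivity, specificity, prevalence):
    """Negative predictive value from Bayes' theorem."""
    true_neg = specificity * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    return true_neg / (true_neg + false_neg)


# Assumption: the melanoma operating point of MC_EVCDAO_2019 (Se 0.932, Sp 0.810).
p_max = max_prevalence_for_npv(0.932, 0.810, target_npv=0.90)
print(f"NPV >= 0.90 holds for prevalence up to {p_max:.1%}")
print(f"NPV at 2% prevalence: {npv(0.932, 0.810, 0.02):.1%}")
```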
Need for more clinical evidence
Based on the critical analysis of the available clinical data presented in this report, the evaluators consider that the current body of evidence is sufficient to demonstrate conformity of the device with the General Safety and Performance Requirements (GSPRs) of Regulation (EU) 2017/745 (MDR). The pre-market pivotal studies and the equivalence with the legacy predecessor provide robust evidence of safety and performance for the intended use.
However, in alignment with the principle of continuous evaluation required by the MDR, and to ensure the long-term sustainability of the benefit-risk profile, the manufacturer has identified specific areas where further clinical data collection is desirable in the post-market phase. These areas, documented as "Gaps" in the PMCF Plan, are:
- Gap 1. Triage and Malignancy Prioritization (benefits 7GH sub-criterion c and 3KX sub-criterion a): While pre-market evidence demonstrates diagnostic accuracy for malignant conditions (benefit 7GH, sub-criterion c) and care pathway performance (benefit 3KX, sub-criterion a), more clinical data are required to quantify the operational impact of the device in real-world settings. Specifically, these data should confirm that malignancy prioritization translates into reduced average waiting times for high-risk patients and that triage effectiveness is maintained in teledermatology workflows.
- Gap 2. Prospective validation of automated severity assessment in clinical settings: Pre-market Technical Performance evidence for severity assessment (benefit 5RB) is established through 4 published peer-reviewed validation studies. These demonstrate algorithm-level concordance with expert dermatologist consensus across psoriasis (PASI), urticaria (UAS), hidradenitis suppurativa (IHS4), and atopic dermatitis (SCORAD). The AIHS4_2025 proof-of-concept pilot (n = 2 patients) provides only a preliminary Pillar 3 signal (ICC 0.727 in a longitudinal setting) and is not treated as definitive clinical validation. The supplementary literature search confirms that this gap is SotA-wide, not device-specific. Published studies of AI and smartphone-based severity scoring in clinical encounters report ICC 0.58–0.90 across dermatological conditions. Schaap et al. (2022) reported CNN-based automated PASI scoring from clinical photographs achieving ICC 0.58–0.79, comparable to trained physician performance. Ali et al. (2022) reported ICC 0.86–0.90 for smartphone-photograph-based EASI/SCORAD assessment against clinical evaluation in 79 atopic dermatitis patients. The device's AIHS4_2025 ICC of 0.727 falls within this SotA range. For context, unaided consumer self-assessment without AI achieves an ICC of only 0.23 for psoriasis severity scoring from patient photographs (Ali et al., 2024), which supports the added value of AI-assisted assessment. Real-world clinical validation of AI severity scoring is a recognised SotA limitation. Automated PASI scoring from clinical images requires "further clinical validation in real-life practice" (Schaap et al., 2022), and AI acne severity grading from smartphones achieves 68% agreement with dermatologists (Seité et al., 2019). Together, these findings indicate that smartphone-based severity scoring is feasible but still maturing across conditions.
Per MDCG 2020-6 § 6.5(e), this gap is declared acceptable: the limitation is SotA-wide, not device-specific, and the device's pre-market Technical Performance evidence is sufficient to support initial CE marking for the severity assessment benefit. PMCF activities B.1–B.5 will provide essential prospective Clinical Performance evidence (MDCG 2020-1, Pillar 3) in clinical settings with larger patient cohorts (target: 100+ patients per condition), confirming that algorithm-level performance translates to real-world use with device-captured images, and extending evidence to additional conditions (acne via ALADIN, vitiligo via AVASI, frontal fibrosing alopecia).
- Gap 3. Monitoring of Sustained Core Algorithmic Performance (benefit 7GH, all sub-criteria): Given the nature of AI/ML software, continuous monitoring is required to ensure that the core diagnostic algorithms underlying benefit 7GH (accuracy, sensitivity, and specificity across general, rare, and malignant presentations) remain stable and reliable over time in the post-market environment and are not affected by performance drift.
- Gap 4. Autoimmune diseases evidence coverage (benefit 7GH, sub-criterion a, indication coverage): Per MDCG 2020-6 § 6.5(e), the evidence portfolio has insufficient representation of autoimmune skin conditions (3% of dermatological presentations). Two autoimmune conditions appear in the portfolio: pemphigus vulgaris (BI_2024) and bullous pemphigoid (DAO_Derivación_O_2022). Pemphigus vulgaris, however, is already accounted for within the Tier 2 rare diseases analysis. The autoimmune-specific evidence not counted elsewhere is therefore limited to bullous pemphigoid (5 cases in one study). This gap is declared acceptable because: (a) autoimmune skin conditions typically require serological confirmation beyond visual assessment, limiting the device's role to triage and differential ranking; (b) the device is a decision-support tool and the physician always makes the final diagnosis; (c) no acute mortality risk arises from misranking within this category. The supplementary literature search (105 PubMed results, April 2026) confirms that this gap is field-wide. Only 2 qualifying papers applying clinical skin image AI to autoimmune conditions were identified. Mathur et al. (2021) reported a CNN ensemble achieving 86.7% top-1 accuracy across a 20-condition panel including bullous pemphigoid and urticaria, explicitly validated on skin of color. Yu et al. (2025) reported a deep learning model achieving AUC 0.91 for vitiligo in 474 pediatric patients, outperforming dermatologists (AUC 0.77). The remaining 103 results involved genomics, transcriptomics, or specialised imaging modalities unrelated to clinical skin photography. This thin yield (2 of 105 results) confirms that clinical skin image AI validated for autoimmune conditions is a recognised SotA limitation, not a device-specific failure. Prospective PMCF data collection on autoimmune conditions will be conducted during real-world deployment, with per-condition accuracy tracking.
- Gap 5. Genodermatoses evidence coverage (benefit 7GH, sub-criterion a, indication coverage): Per MDCG 2020-6 § 6.5(e), genodermatoses (approximately 1% of dermatological presentations) have no direct representation in the clinical evidence portfolio. This gap is declared acceptable because: (a) these conditions are typically diagnosed through genetic testing and clinical history rather than image-based assessment alone; (b) the extreme rarity of these conditions makes prospective study recruitment impractical for pre-market evidence; (c) the device's role for these conditions is supportive (triage, differential ranking), not definitive. Passive surveillance through PMS/PMCF data collection will capture genodermatoses cases encountered in real-world use.
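The severity-assessment gap above is framed in terms of intraclass correlation (ICC). This CER does not state which ICC form each cited study used. As an assumption, the sketch below implements ICC(2,1), a two-way random-effects, absolute-agreement, single-rater ICC and one common choice for inter-rater agreement. It uses the published Shrout & Fleiss (1979) worked example so that the result can be checked against the literature value of about 0.29.

```python
def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    `ratings` is a list of rows, one per subject, each with one score per rater
    (Shrout & Fleiss 1979 formulation).
    """
    n, k = len(ratings), len(ratings[0])
    grand = sum(map(sum, ratings)) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Worked example from Shrout & Fleiss (1979): 6 subjects scored by 4 judges.
scores = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
print(f"ICC(2,1) = {icc_2_1(scores):.2f}")  # about 0.29 for this dataset
```

ICC(2,1) penalises systematic offsets between raters. Consistency forms such as ICC(3,1) ignore those offsets and give about 0.71 on the same data. The chosen form therefore matters when comparing values such as 0.727 against published bands.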
Consequently, dedicated activities have been designed in the Post-Market Clinical Follow-up (PMCF) Plan to address each of these gaps.
It is important to clarify that these identified gaps do not imply a lack of sufficient clinical evidence for the initial conformity assessment. Gap 1 is an operational gap affecting benefits 7GH (malignancy prioritization, sub-criterion c) and 3KX (waiting time reduction, sub-criterion a): pre-market performance is demonstrated, but real-world operational confirmation is required post-market. Gap 2 (benefit 5RB) addresses the transition from Technical Performance to Clinical Performance evidence for severity assessment; algorithm-level validity is established through published literature across 4 conditions, and PMCF activities B.1–B.5 will provide the essential prospective data confirming real-world clinical performance. Gap 3 (benefit 7GH, all sub-criteria) is a performance-stability monitoring gap reflecting best practice for AI/ML MDSW under MDR. Gaps 4–5 (benefit 7GH, sub-criterion a — indication coverage) are evidence coverage gaps for low-prevalence disease categories, declared acceptable per MDCG 2020-6 § 6.5(e) with documented justification. The current body of evidence, derived from the pre-market portfolio (six prospective pivotal investigations, three MRMC reader studies, one retrospective third-party analysis, and the Fitzpatrick V–VI MAN_2025 reader study), the four published severity-validation studies, and the equivalence to the legacy predecessor device, demonstrates that the device meets the General Safety and Performance Requirements for all indicated populations. The PMCF activities are planned proactively to monitor the long-term stability of these results in a wider, uncontrolled population, provide confirmatory severity-assessment data in prospective clinical settings, and extend evidence coverage to the declared gap categories, as is best practice for MDSW.
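Gap 3 calls for post-market monitoring of algorithmic stability, but this CER does not specify the monitoring method. The following is a hypothetical sketch of one simple approach. A rolling window of confirmed outcomes is compared against an acceptance threshold using the upper bound of the Wilson score interval. An alert is therefore raised only when recent accuracy is statistically below target. The class name, threshold and window size are illustrative assumptions, and each outcome presumes a confirmed ground-truth label.

```python
import math
from collections import deque


def wilson_upper(successes, n, z=1.96):
    """Upper bound of the Wilson score interval for a binomial proportion."""
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre + half


class DriftMonitor:
    """Rolling-window check of device correctness against an acceptance threshold.

    Raises an alert when even the upper confidence bound of the recent accuracy
    falls below the threshold, i.e. the window is statistically below target.
    """

    def __init__(self, threshold, window=100):
        self.threshold = threshold
        self.outcomes = deque(maxlen=window)

    def update(self, correct):
        self.outcomes.append(bool(correct))
        if len(self.outcomes) < self.outcomes.maxlen:
            return False  # not enough confirmed cases yet
        return wilson_upper(sum(self.outcomes), len(self.outcomes)) < self.threshold


# Hypothetical stream: 100 correct cases, then accuracy degrades to 50%.
monitor = DriftMonitor(threshold=0.70, window=100)
alerts = [monitor.update(True) for _ in range(100)]
alerts += [monitor.update(i % 2 == 0) for i in range(100)]
print("first alert at case:", alerts.index(True) + 1 if True in alerts else None)
```

Using the upper confidence bound trades detection speed for fewer false alarms. A lower-bound or point-estimate rule would alert earlier but more often on noise.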
Statement on the conformity with general performance requirements (GSPR 1)
According to the MEDDEV 2.7/1 rev4 guidance document, to be able to conclude on compliance of the device under evaluation with the general requirements on performance, “devices shall achieve the performance intended by their manufacturer and shall be designed and manufactured in such a way that, during normal conditions of use, they are suitable for their intended purpose”.
It should be noted that the MEDDEV 2.7/1 rev4 guidance document addresses compliance with the Essential Requirement on performance (MDD ER3). This guidance remains relevant for assessing compliance with the corresponding general requirement on performance under the MDR (GSPR 1).
Considering the observations made in previous sections, it is possible to conclude on the conformity with the general performance requirements (GSPR 1). Thus, the device achieves its intended performances during normal conditions of use, and the intended performances are supported by sufficient clinical evidence.
Requirement on acceptable benefit/risk profile
Summary of the total experience with the device
The device under evaluation is not yet on the market. This clinical evaluation supports the device's first CE-mark submission (first commercialisation).
Therefore, there is no PMS data available yet for the device under evaluation. However, there is substantial PMS data for the equivalent legacy predecessor which supports the evaluation.
Benefits assessment
Evaluation and quantification of claimed benefits
In the document Clinical Benefits, the manufacturer has identified the device's performance claims and clinical benefits. This section details the 3 clinical benefits (7GH, 5RB, 3KX), the methods used to measure them (which draw on the performance claims), and a comparison between the claimed and the observed magnitude of benefit. The observed magnitude is derived from the results of the pivotal studies and is evaluated to determine whether it meets or exceeds the SotA value. The results and analyses used to establish the average state-of-the-art value for each performance claim and clinical benefit can be consulted in document R-TF-015-011 State of the Art.
The available clinical data support the three declared clinical benefits overall, but the strength and directness of the pre-market evidence is not uniform across all claimed sub-criteria, conditions, and severity scales. Where the evidence is direct — i.e. prospective real-patient clinical studies, with the MRMC simulated-use reader studies contributing supporting Pillar 3 evidence at Rank 11 — the performance claim is established and the corresponding clinical benefit is confirmed. Where the pre-market evidence is indirect (algorithm-level validation without a prospective clinical companion), limited (very small sample or single-condition pilot), or absent for a specific condition, the gap is explicitly declared in the section Need for more clinical evidence and, where acceptable, justified per MDCG 2020-6 § 6.5(e), with prospective confirmation committed under the PMCF Plan (R-TF-007-002). Condition-by-condition status is as follows:
- Benefit 7GH, sub-criterion (a) general conditions: confirmed in the primary care and dermatology real-world referral pathway (DAO_Derivación_O_2022, DAO_Derivación_PH_2022) and corroborated by the supporting MRMC reader studies (BI_2024, PH_2024, SAN_2024) across more than 15 conditions in 5 of 7 epidemiological categories. Coverage gaps for autoimmune diseases (Gap 4) and genodermatoses (Gap 5) are declared acceptable per MDCG 2020-6 § 6.5(e) and are addressed by prospective and passive PMCF surveillance respectively.
- Benefit 7GH, sub-criterion (b) rare diseases: scoped pre-market as a Top-5 surfacing claim — i.e., the presence of the correct low-prevalence ICD-11 category within the Top-5 prioritised differential view that the healthcare professional actually consumes — not as a Top-1 standalone-accuracy claim. The supporting evidence is the MRMC simulated-use reader studies BI_2024 and PH_2024 (Rank 11 Pillar 3 § 4.4 supporting evidence, not principal evidence), plus the four peer-reviewed severity-validation publications where applicable. Pre-market Pillar 3 real-patient evidence for individual rare-disease ICD-11 categories is declared as an acceptable evidence gap per MDCG 2020-6 § 6.5(e), parallel to Gaps 4 (autoimmune) and 5 (genodermatoses), with the rationale that prospective real-patient recruitment at sufficient volume is impractical for very-low-prevalence conditions; confirmation is pre-committed under PMCF Activities D, E and F in R-TF-007-002 (real-world stratified capture of rare-disease presentations with per-band Top-5 accuracy tracking). MRMC Rank 11 evidence is not treated as clinical data under MDR Article 2(48) and does not, on its own, carry a principal pre-market Pillar 3 claim.
- Benefit 7GH, sub-criterion (c) malignant lesions: melanoma detection is confirmed prospectively via MC_EVCDAO_2019 (AUC 0.85) and non-melanoma skin cancer detection via NMSC_2025 (AUC 0.93). Real-world operational translation of malignancy detection into reduced waiting times for high-risk patients and triage effectiveness in teledermatology workflows remains an operational gap (Gap 1), addressed under PMCF Activity A.3.
- Benefit 5RB, objective severity assessment: algorithm-level validity is confirmed across four conditions (psoriasis/PASI, urticaria/UAS, hidradenitis suppurativa/IHS4, atopic dermatitis/SCORAD) by the four peer-reviewed severity validation publications (Pillar 2 Technical Performance). Prospective Clinical Performance evidence (Pillar 3) is limited: AIHS4_2025 provides preliminary confirmation for IHS4 only (ICC 0.727, n = 2), and alopecia severity is supported indirectly by IDEI_2023 (Ludwig, Cohen's κ 0.53). Prospective clinical validation for PASI, UAS, SCORAD and extension to additional scales (SALT, ALADIN, AVASI, AFFA) is pending under PMCF Activities B.1–B.5 (Gap 2). This gap is declared acceptable per MDCG 2020-6 § 6.5(e): algorithm-level validity is established, the limitation of scarce real-world clinical validation for AI severity scoring is SotA-wide rather than device-specific, and PMCF is committed. Benefit 5RB is therefore supported at pre-market for the four validated severity scales at the algorithm level, with real-world clinical confirmation pending for most conditions.
- Benefit 3KX, sub-criterion (a) waiting times: confirmed via the post-market observational study of the legacy device (R-TF-015-012 co-primary endpoint C4, Holm-adjusted significance against MCID). Direct real-world operational measurement — as opposed to physician-reported recall — remains an operational gap (Gap 1), addressed under PMCF Activity A.3.
- Benefit 3KX, sub-criterion (b) referral adequacy: confirmed in the real-world primary care referral pathway at two sites — DAO_Derivación_O_2022 (+38% relative increase in adequacy of referrals) and DAO_Derivación_PH_2022 (+25% relative increase) — both against the per-study acceptance threshold of ≥ +15% set in the Clinical Evaluation Plan (R-TF-015-001). At DAO_Derivación_PH_2022 a documented protocol deviation affected the planned comparative baseline measurement, but the secondary referral-adequacy endpoint was evaluable and met the acceptance threshold; secondary metrics (malignancy AUC 0.842, HCP satisfaction 7.6/10) also remained valid.
- Benefit 3KX, sub-criterion (c) remote care: supported via the secondary remote-monitoring endpoints of COVIDX_EVCDAO_2022 (longitudinal monitoring, reduction of face-to-face consultations and physician-perceived monitoring reliability over the six-month follow-up — secondary endpoints met) together with the supporting MRMC evidence. The COVIDX primary CUS endpoint was not met (observed 7.66 vs. ≥ 8); see the COVIDX per-study appraisal for the single-outlier sensitivity-analysis explanation. The D6 remote-assessment-adequacy supportive endpoint in R-TF-015-012 (47.76%) falls below the CER acceptance criterion (≥ 58%) while substantially exceeding the MCID (≥ 5%); the divergence is transparently acknowledged and does not invalidate the benefit conclusion.
Predictive Values by Clinical Setting (MEDDEV 2.7.1 Rev 4 Annex A7.3)
Per MEDDEV 2.7.1 Rev 4 Annex A7.3, predictive values must be assessed at the pre-test probability of the intended use setting. Because PPV and NPV vary with disease prevalence, the same observed sensitivity and specificity yield very different predictive values across settings. The matrix below applies the prospective performance metrics (sensitivity, specificity) from the corresponding pivotal investigations to the pre-test probabilities representative of the device's intended-use settings (primary care, general dermatology, pigmented-lesion / oncology clinic, head-and-neck specialist clinic) for the malignancy-detection outputs of the device. The matrix uses Bayes' theorem (PPV = (Sens × Prev) / [(Sens × Prev) + (1 − Spec) × (1 − Prev)]; NPV = (Spec × (1 − Prev)) / [(Spec × (1 − Prev)) + (1 − Sens) × Prev]).
| Condition output | Source study (Sens, Spec) | Primary care (prev. ~2%) | General dermatology (prev. ~10%) | Pigmented-lesion / oncology clinic (prev. ~30%) | Head-and-neck specialist clinic (prev. ~50%) |
|---|---|---|---|---|---|
| Melanoma | MC_EVCDAO_2019 (Sens 0.90, Spec 0.80) | PPV ≈ 8.4%; NPV ≈ 99.7% | PPV ≈ 33.3%; NPV ≈ 98.6% | PPV ≈ 65.9%; NPV ≈ 94.9% | PPV ≈ 81.8%; NPV ≈ 88.9% |
| All malignancy (melanoma + BCC + cSCC) | MC_EVCDAO_2019 (Sens 0.81, Spec 0.86) | PPV ≈ 10.6%; NPV ≈ 99.6% | PPV ≈ 39.1%; NPV ≈ 97.6% | PPV ≈ 71.3%; NPV ≈ 91.4% | PPV ≈ 85.3%; NPV ≈ 81.9% |
| Non-melanoma skin cancer (BCC / cSCC) | NMSC_2025 (Sens 0.83, Spec 0.84) | PPV ≈ 9.6%; NPV ≈ 99.6% | PPV ≈ 36.6%; NPV ≈ 97.8% | PPV ≈ 69.0%; NPV ≈ 92.0% | PPV ≈ 83.8%; NPV ≈ 83.2% |
| Referral-pathway malignancy detection | DAO_Derivación_PH_2022 (AUC 0.84; assumed Sens 0.78, Spec 0.78 at the maximum-Youden operating point) | PPV ≈ 6.7%; NPV ≈ 99.4% | PPV ≈ 28.3%; NPV ≈ 97.0% | PPV ≈ 60.3%; NPV ≈ 89.2% | PPV ≈ 78.0%; NPV ≈ 78.0% |
In the primary-care setting, the NPV of all four outputs exceeds the safety-critical NPV threshold (≥ 0.98). In the general-dermatology setting, only the melanoma output exceeds the threshold (NPV ≈ 98.6%); the other three outputs fall slightly below it (approximately 97–98%). Primary care and general dermatology are the primary intended deployment contexts. PPV rises with the pre-test probability of each setting; the lower PPV in low-prevalence settings is expected per Bayes' theorem and is not a device-performance failure. The matrix complements the per-study PPV/NPV figures reported in the respective per-study appraisal subsections of this CER. The pre-test probabilities applied here are derived from the State-of-the-Art document R-TF-015-011 (epidemiology section) and are reviewed at every CER update. Where the MC_EVCDAO_2019 specialist-cohort NPV (0.68) appears in the per-study table, it reflects the highly enriched prevalence (~50%) of that single specialist study and is appropriately re-projected here for the intended-use settings.
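Because the matrix is pure arithmetic on the stated inputs, it can be regenerated, and re-checked at each CER update, with a short script. The Python sketch below is illustrative only. It uses the rounded Sens/Spec pairs and pre-test probabilities shown in the table rather than unrounded study data, and its names (`predictive_values`, `OUTPUTS`, `SETTINGS`, `matrix`) are hypothetical.

```python
def predictive_values(sens: float, spec: float, prev: float) -> tuple[float, float]:
    """Return (PPV, NPV) from Bayes' theorem at a given pre-test probability."""
    tp = sens * prev              # true positives, as a proportion of all patients
    fp = (1 - spec) * (1 - prev)  # false positives
    tn = spec * (1 - prev)        # true negatives
    fn = (1 - sens) * prev        # false negatives
    return tp / (tp + fp), tn / (tn + fn)


# Pre-test probability per intended-use setting, as stated in the matrix.
SETTINGS = {
    "Primary care": 0.02,
    "General dermatology": 0.10,
    "Pigmented-lesion / oncology clinic": 0.30,
    "Head-and-neck specialist clinic": 0.50,
}

# (sensitivity, specificity) per device output, as stated in the matrix.
OUTPUTS = {
    "Melanoma": (0.90, 0.80),
    "All malignancy": (0.81, 0.86),
    "NMSC (BCC / cSCC)": (0.83, 0.84),
    "Referral-pathway malignancy": (0.78, 0.78),
}

NPV_THRESHOLD = 0.98  # safety-critical NPV threshold quoted in the text


def matrix() -> dict[tuple[str, str], tuple[float, float]]:
    """Compute unrounded (PPV, NPV) for every output and setting pair."""
    return {
        (output, setting): predictive_values(sens, spec, prev)
        for output, (sens, spec) in OUTPUTS.items()
        for setting, prev in SETTINGS.items()
    }


if __name__ == "__main__":
    for (output, setting), (ppv, npv) in matrix().items():
        # Compare the raw NPV with the threshold, not the rounded display value.
        status = "meets" if npv >= NPV_THRESHOLD else "below"
        print(f"{output} | {setting}: PPV {100 * ppv:.1f}%, NPV {100 * npv:.1f}% ({status} NPV threshold)")
```

Comparing unrounded NPV values with the 0.98 threshold, rather than the rounded percentages, keeps threshold decisions from being driven by display rounding.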
PMCF activities described in plan R-TF-007-002 Post-Market Clinical Follow-up (PMCF) Plan are specifically designed to refine these measurements over time. Particular focus will be placed on validating the magnitude of benefit for Triage and Prioritization (Gap 1) and Automated Severity Assessment (Gap 2) in the real-world clinical setting, ensuring they remain accurate and exceed the SotA.
Probability of the patient of experiencing one or more benefit(s)
As specified in MEDDEV 2.7/1 rev4, a critical component of evaluating a device's benefits is assessing the probability that a patient will experience them. The guidance further states the need for a "reasonable prediction of the proportion of 'responders'" within the target group, which must be based on sound clinical data and a valid statistical approach.
Where a clinical benefit is a direct consequence of an achieved performance claim in the same condition and setting, the proportion of patients achieving that performance can serve as a proxy for the probability of benefit. Where the performance claim is supported only indirectly (e.g. algorithm-level severity validation without a prospective clinical companion, or MRMC simulated-use data for a rare-disease subgroup), that performance-to-benefit link cannot be assumed uniformly and is used as a proxy only for the condition and setting in which performance has been measured. For sub-criteria and conditions where the pre-market evidence is indirect or limited, the per-condition status is set out in the preceding condition-by-condition breakdown and the associated gaps are tracked under the PMCF Plan.
As previously established in section Statement on the conformity with general performance requirements (GSPR 1), the clinical data from the pivotal studies carried out with the device is sufficient to confidently determine clinical performance rates for the conditions, user types, and use settings evaluated pre-market; extension to conditions and settings outside that evaluated scope is addressed through PMCF.
Risk management and residual risks acceptability
The EU Regulation 2017/745 (MDR) obligates manufacturers to establish, document, and maintain a comprehensive risk management system; GSPR 2 of the MDR further requires that these risks be reduced as far as possible. To meet these regulatory requirements, the manufacturer has implemented a risk management process aligned with the international standard ISO 14971. Following this standard, the risk management documentation has properly identified and addressed all known risks for the device. Consequently, this clinical evaluation must now, as stated in MEDDEV 2.7/1 rev4, "address the significance of any risks that remain after design risk mitigation strategies have been employed by the manufacturer."
The risk management report (available in R-TF-013-002 Risk management record) outlines a total of 62 identified risks. The manufacturer has implemented mitigation measures, including inherently safe design, protective measures, and safety information, to reduce their impacts as far as possible. These efforts are intended to ensure that the device not only complies with regulatory safety requirements but also satisfies end-user expectations for safety and reliability.
After all feasible risk mitigations were applied, 8 residual risks remain, none of which are classified as "unacceptable." These risks are grouped into two primary categories related to safety and performance: Usability (2 risks, 25%) and Product (6 risks, 75%).
Among these categories, key clinical residual risks were identified: the medical device providing incorrect clinical information (e.g., "The care provider receives... erroneous data" or "the medical device outputs a wrong result"). These scenarios involve the device processing a skin image and, due to a software malfunction, poor image quality, or other issue, providing incorrect clinical output. An HCP, unaware of the error, might then rely on this output, which could potentially lead to misdiagnosis, delayed treatment, or a worsening of the patient's health status.
However, these risks are substantially mitigated by several key control measures:
- Integrated Image Quality Assessment: An AI-based processor automatically validates each input image. It provides a quality score and returns meaningful messages to the HCP, prompting a retake if the image quality is insufficient for analysis.
- Instructions for Use (IFU): The IFU clearly details the device's outputs, limitations, and intended purpose. It includes dedicated sections, "How to take pictures" and "Technical specifications", to guide the user.
- User Training: The manufacturer offers dedicated training to users to optimise the imaging process, ensuring high-quality inputs suitable for the device's operation.
- Explainability and Metadata: The device returns supervisory metadata alongside the output, including explainability media and other quality metrics, which allow the HCP to verify the result. Unlike 'black box' systems, the device also provides visual evidence to support its output: as detailed in the software specifications, it generates bounding boxes for count-based signs and a segmentation mask for extent-based signs. This allows the HCP to verify visually exactly what the AI detected, significantly mitigating the risk of accepting an incorrect automated result.
- Model Lifecycle Management: The AI models undergo continuous improvement, including periodic retraining using expanded datasets.
- Probabilistic Output: The device returns an interpretive distribution of possible ICD categories rather than asserting a single, definitive condition.
These measures, particularly the probabilistic output, reinforce that the result is not definitive and must be interpreted by the HCP using their own clinical judgment. Therefore, this residual risk is not considered to pose a significant danger to patient outcomes.
Furthermore, we have defined measurable safety objectives that are directly aligned with all identified residual risks. These objectives are verified through predefined acceptance criteria documented in the Clinical Evaluation Plan (CEP):
| Safety objective | Identified Residual Risks | Used means of measure | Magnitude of benefit claimed | Magnitude of benefit observed | Achieved |
|---|---|---|---|---|---|
| Specify in the intended purpose of the device that it is a support tool, not a diagnostic one, meaning that it must always be used under the supervision of HCPs, who should confirm or validate the output of the device considering the patient's medical history and any other symptoms the patient may have, especially those that are not visible or have not been supplied to the device | The care provider receives into their system data that is erroneous. | Verify that the probability of occurrence for this residual risk is equal to or less than the likelihood (probability) defined in the Risk Management File. | Rate of cases in which the device outputs incorrect clinical information < residual probability in RMF for the corresponding risk(s) (a probability between 0.1% and 0.01%). | Pivotal studies: 0 cases of incorrect clinical information reported | [X] Yes [ ] No [ ] NA |
| Demonstrate that the frequency of device-related diagnostic errors and their downstream clinical consequences is lower than that defined for its intended use. | The medical device outputs a wrong result. | Verify that the probability of occurrence for this residual risk is equal to or less than the likelihood (probability) defined in the Risk Management File. | Rate of cases in which the device outputs incorrect clinical information < residual probability in RMF for the corresponding risk(s) (a probability between 0.1% and 0.01%). | Pivotal studies: 0 cases of incorrect clinical information reported | [X] Yes [ ] No [ ] NA |
| Image acquisition without interference or artifacts. | The medical device receives an input that does not have sufficient quality in a way that affects its performance. | Verify that the probability of occurrence for this residual risk is equal to or less than the likelihood (probability) defined in the Risk Management File. | Rate of inputs with insufficient quality reported < residual probability in RMF for the corresponding risk(s) (a probability between 0.1% and 0.01%). | Pivotal studies: 0 cases reported. A device requirement mandates a processor that ensures images have sufficient quality: an algorithm, similar to those used to classify diseases, checks the validity of each image and provides an image quality score | [X] Yes [ ] No [ ] NA |
| System interoperability: to detect and minimise failures in connection and bidirectional data transmission that result in data being inaccessible to clinicians, and to quantify any resulting delays or omissions in patient management and care. | The medical device fails to establish a connection or perform bidirectional data exchange with the healthcare provider's system. | Verify that the probability of occurrence for this residual risk is equal to or less than the likelihood (probability) defined in the Risk Management File. | Rate of system failures due to incompatibility reported < residual probability in RMF for the corresponding risk(s) (a probability between 0.1% and 0.01%). | Pivotal studies: 0 cases of system failure due to incompatibility reported. No PMS data. | [X] Yes [ ] No [ ] NA |
| Ensure that only images meeting the predefined illumination criteria are processed for diagnostic support and quantify the impact of sub‑standard lighting on device performance and clinical outcomes. | The medical device receives an input that does not have sufficient quality. | Verify that the probability of occurrence for this residual risk is equal to or less than the likelihood (probability) defined in the Risk Management File. | Rate of inputs with insufficient quality reported < residual probability in RMF for the corresponding risk(s) (a probability between 0.1% and 0.01%). | Pivotal studies: 0 cases reported. A device requirement mandates a processor that ensures images have sufficient quality: an algorithm, similar to those used to classify diseases, checks the validity of each image and provides an image quality score | [X] Yes [ ] No [ ] NA |
Risk architecture
The device's residual risks are bounded by a defence-in-depth architecture documented in R-TF-013-002 Risk management record (patient-safety risk register) and, for AI-specific development-phase hazards, in R-TF-028-011 AI/ML Risk Assessment. The architecture has one upstream input-safety gate and five independent downstream clinical barriers; direct patient harm requires the simultaneous failure of multiple barriers, not a single error.
Upstream input-safety gate: the Deep Image Quality Assessment (DIQA) algorithm refuses images below a calibrated quality threshold before they reach the downstream clinical-inference modules, preventing propagation of poor-quality inputs.
Downstream clinical barriers (per R-TF-028-011):
- Healthcare professional judgement: the device is used under qualified HCP supervision; the HCP integrates device output with patient history and clinical findings before reaching a decision.
- Six independent binary safety indicators: orthogonal indicators (documented in R-TF-013-002 and the AI risk register) flag adversarial, distribution-shift, image-quality, and integrity conditions that would otherwise degrade output reliability.
- Probability distribution output format: the device never issues a single classification or diagnosis; it returns a normalised probability distribution across ICD-11 categories plus explainability media, so the clinician always has visibility of uncertainty and per-category ranking.
- Applicable standard of care: the intended workflow mandates that device output supplements, and is never a substitute for, the established clinical standard of care (guidelines, laboratory tests, physical examination).
- Follow-up consultation: the care pathway retains specialist referral and follow-up consultation as the final safety net for ambiguous or high-risk presentations.
This architecture is the basis for the P₂ = 1 architectural constraint documented in R-TF-013-003 § Severity assignment justification for clinical decision support software: the device cannot directly cause physical harm to the patient, because its output is always mediated by a supervising clinician and cannot bypass the downstream barriers above. Death or irreversible serious harm pathways require the coincident failure of all downstream clinical barriers, not the AI failure in isolation. On the patient-harm scale, clinical-decision-support residual risks are therefore assessed at Severity 3 (Major) per ISO 14971:2019 Annex C guidance on CDSS and MDCG 2020-1 Valid Clinical Association.
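The gate-then-distribution flow described in this subsection can be illustrated with a minimal Python sketch. Everything device-specific here is hypothetical: the function names, the threshold value of 0.5, the exception type and the category labels are placeholders chosen for illustration, not the manufacturer's implementation, and the six binary safety indicators are not modelled.

```python
import math

# Hypothetical placeholder: the real DIQA threshold is calibrated by the manufacturer.
QUALITY_THRESHOLD = 0.5


class ImageRejected(Exception):
    """Raised when the upstream quality gate refuses an input image."""


def softmax(logits: list[float]) -> list[float]:
    """Numerically stable softmax returning a normalised probability distribution."""
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]


def analyse(quality_score: float, logits: dict[str, float]) -> list[tuple[str, float]]:
    """Apply the quality gate, then return every category ranked by probability."""
    if quality_score < QUALITY_THRESHOLD:
        # Upstream gate: the image never reaches clinical inference; a retake is requested.
        raise ImageRejected(f"quality score {quality_score:.2f} is below the threshold; please retake the image")
    labels = list(logits)
    probabilities = softmax([logits[label] for label in labels])
    # The output is a full ranked distribution, never a single label.
    return sorted(zip(labels, probabilities), key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    ranked = analyse(0.9, {"category A": 2.0, "category B": 1.0, "category C": 0.1})
    top5_view = ranked[:5]  # prioritised differential view shown to the supervising HCP
    for label, probability in top5_view:
        print(f"{label}: {probability:.2f}")
```

In this sketch a rejected image raises an exception instead of producing a low-confidence result, mirroring the statement that poor-quality inputs do not propagate to the downstream clinical-inference modules.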
Safety Benchmarking against State of the Art
The safety endpoints are evaluated not only against the internal Risk Management File (RMF) probabilities but also benchmarked against standard clinical practice safety rates and similar devices from vigilance databases (e.g., MAUDE, EUDAMED).
The following table presents a direct comparison of the observed safety outcomes during the pre-market clinical validations of the device against the state-of-the-art benchmarks derived from the literature and vigilance registries for similar medical devices.
Methodological note on the rule of three. Where the observed event count is zero, the upper one-sided 95% confidence bound on the true event rate is approximated by the rule of three: with 0 events out of n, the bound is approximately 3/n. The upper bounds reported in the column "Upper one-sided 95% rate (rule of three)" therefore quantify the uncertainty associated with each zero-count observation and place each safety endpoint within a defensible quantitative envelope rather than presenting "0 cases" as a stand-alone safety conclusion.
| Safety Endpoint / Hazard Category | Observed Rate in Pivotal Studies (n = 719 patients across the prospective pivotal cohort) | Upper one-sided 95% rate (rule of three) | Benchmark / State of the Art (Similar Devices via MAUDE/EUDAMED & Literature) | Comparison & Conclusion |
|---|---|---|---|---|
| Overall Adverse Events | 0 / 719 | ≤ 0.42 % (3 / 719) | 0 incidents reported for similar devices in major vigilance databases. | Consistent with the established high safety baseline of the SotA. The pre-market upper bound is consistent with the residual probability ceiling defined in R-TF-013-002. |
| Incorrect Clinical Information Output (False negatives / misclassification) | 0 / 719 with patient harm | ≤ 0.42 % (3 / 719) | Rare occurrences reported in literature for AI devices, generally mitigated by human-in-the-loop workflows. | Consistent with a robust safety profile delivered through the integrated DIQA, the six binary safety indicators and the explainability features. The post-market F1 misleading-output rate of 26.8 % on the N = 56 analysis set in R-TF-015-012 sits below the pre-specified 30 % follow-up threshold; the 15 substantiated F1 = Yes responses reflect the device's known edge-case limitations and do not indicate any unreported serious incident (cross-referenced against the R-006-002 non-conformity registry). |
| System Interoperability / Data Transmission Failure | 0 / 719 affecting clinical care | ≤ 0.42 % (3 / 719) | Very low incidence rate in comparable cloud-based AI tools. | FHIR conformity supports a safety performance equivalent to the best available alternatives within this upper bound. |
| Image Quality / Artifact Issues | 0 / 719 with diagnostic failure | ≤ 0.42 % (3 / 719) | Identified as a primary hazard in literature (e.g., Navarrete et al. 2020), but specific incident rates are near zero due to procedural controls. | The device's automated image-quality validator (DIQA) reduces this risk; the upper bound is consistent with the defined residual probability in R-TF-013-002. |
The post-market vigilance dataset adds further safety evidence: zero MDR Article 87 serious incidents and zero FSCAs across approximately 250,000 diagnostic reports over 4+ years of legacy-predecessor commercial deployment (rule-of-three upper one-sided 95% bound ≤ 3 / 250,000 ≈ 0.0012 % for serious incidents), and three Category 3a customer-reported events (0.0012 % observed rate, none associated with patient harm).
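The zero-count bounds quoted in the table and in the paragraph above can be checked with a few lines of Python. This minimal sketch compares the rule-of-three approximation with the exact (Clopper-Pearson) one-sided bound for zero events, which solves (1 − p)^n = α. The function names are illustrative, and the only inputs are the denominators already stated in this section (719 patients and approximately 250,000 reports).

```python
def rule_of_three(n: int) -> float:
    """Approximate one-sided 95 % upper bound on the event rate after 0 events in n."""
    return 3 / n


def exact_zero_event_bound(n: int, alpha: float = 0.05) -> float:
    """Exact one-sided (1 - alpha) upper bound after 0 events in n: solves (1 - p)**n = alpha."""
    return 1 - alpha ** (1 / n)


if __name__ == "__main__":
    for label, n in [("Pre-market pivotal cohort", 719), ("Legacy-device reports", 250_000)]:
        print(
            f"{label} (n = {n}): rule of three {100 * rule_of_three(n):.4f} %, "
            f"exact {100 * exact_zero_event_bound(n):.4f} %"
        )
```

For n = 719 the exact bound (about 0.416%) is slightly lower than 3/n (about 0.417%) and both round to 0.42%, so the rule of three is marginally conservative at these sample sizes.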
As shown in the table above, all safety objectives related to the identified residual risks have been met against the predefined acceptance criteria documented in the CEP. The pre-market upper-bound rates are consistent with the residual probability ceilings defined in R-TF-013-002; the post-market vigilance data complement (rather than replace) this pre-market analysis per MDCG 2020-6 §6.3, which explicitly states that ratios alone are insufficient and must be presented within a documented appraisal methodology including denominators, hazard distribution and trend analysis (provided in the legacy-device PMS Report R-TF-007-003).
In accordance with MEDDEV 2.7/1 Rev 4 Annex A7.4, clinical data must contain an adequate number of observations for scientifically valid conclusions about side-effects. The guidance specifies that a minimum of 161 subjects is required to achieve an 80% probability of observing at least one adverse event that occurs at an actual rate of 1%. The clinical investigation portfolio for the device includes 719 patients across the prospective pivotal investigations, substantially exceeding this threshold. The observed absence of adverse events or device-related complications across the cohort is therefore statistically meaningful and not attributable to insufficient sample size. Applying the rule of three (Hanley & Lippman-Hand, 1983), the upper one-sided 95% confidence bound for the true adverse-event rate at 0 observed events in 719 patients is 3 / 719 ≈ 0.42%. Adverse-event rates (including serious adverse events) above 0.42% can therefore be excluded with 95% confidence. This bound does not contradict the risk management file's residual probability estimate of ≤ 0.1% for the applicable risk categories, but the pre-market cohort alone is too small to demonstrate that lower level; the post-market vigilance data from the legacy predecessor provide the larger denominator needed for that purpose.
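The sample-size reasoning in this paragraph follows from the binomial model and can be sketched as below, assuming independent patients and a constant per-patient event rate. The first function reproduces the Annex A7.4 figure; the second, an illustrative addition under the same assumptions, gives the cohort size at which an exact zero-event upper bound would fall to a chosen level. Function names are hypothetical.

```python
import math


def min_subjects_to_observe(rate: float, probability: float) -> int:
    """Smallest n such that P(at least one event) = 1 - (1 - rate)**n >= probability."""
    return math.ceil(math.log(1 - probability) / math.log(1 - rate))


def subjects_for_zero_event_bound(target_rate: float, alpha: float = 0.05) -> int:
    """Smallest n whose exact one-sided (1 - alpha) upper bound after 0 events is <= target_rate."""
    return math.ceil(math.log(alpha) / math.log(1 - target_rate))


def probability_of_at_least_one(rate: float, n: int) -> float:
    """Probability of observing one or more events among n independent patients."""
    return 1 - (1 - rate) ** n


if __name__ == "__main__":
    print(min_subjects_to_observe(0.01, 0.80))  # Annex A7.4 scenario: 1 % rate, 80 % probability
    print(f"{probability_of_at_least_one(0.01, 719):.4f}")  # same scenario at the pivotal cohort size
    print(subjects_for_zero_event_bound(0.001))  # cohort needed for a 0.1 % zero-event bound
```

With a 1% event rate and 80% probability the first function returns 161. For a 0.1% target the second returns 2995, close to the rule-of-three figure of 3 / 0.001 = 3,000 and roughly four times the 719-patient pivotal portfolio.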
The overall residual risk was judged acceptable when weighed against the benefits: all individual residual risks and the overall residual risk were assessed and deemed low compared with the benefits provided, and are therefore considered acceptable. Although these risks are mitigated through technical and procedural controls, Post-Market Surveillance (PMS) will monitor any occurrences post-market.
Moreover, the decision as to when it is necessary to generate further clinical data is not addressed by ISO 14971 and should be an output of the process of clinical evaluation. This need typically arises when new risks or unanswered questions remain after the safety assessment.
In this instance, it does not appear necessary to conduct new pre-market clinical studies. As presented in prior sections, the manufacturer benefits from specific pre-market clinical data on the device from pivotal studies. This safety data has been judged consistent with that observed for state-of-the-art similar devices.
Assessment of the benefit/risk profile
As required by the MEDDEV 2.7/1 rev4, the evaluation of the acceptability of the benefit/risk profile aims to “evaluate if the clinical data on benefits and risks are acceptable for all medical conditions and target populations covered by the intended purpose when compared with the current state-of-the-art in the corresponding medical field and whether limitations need to be considered for some populations and/or medical conditions”.
First of all, it should be noted that the manufacturer benefits from clinical data specific to the device under evaluation, collected through the pre-market clinical studies described in section Achievement of the intended performances under normal conditions of use.
As detailed in section Safety concerns related to special design features, a cross-analysis was performed to confirm that all risks identified in the current state-of-the-art are already known and appropriately addressed within the device's risk management file and IFU. As concluded in section New safety concerns, this analysis revealed no new risks, and no unanswered questions remain.
Similarly, in sections Requirement on acceptability of side-effects and Benefits assessment (all data regarding performance claims and clinical benefits are available in the document Performance Claims & Clinical Benefits), we analyzed the clinical data regarding the performance and benefits of the device. This analysis allowed us to conclude that, when used under normal conditions, the device achieves its intended clinical performance, as confirmed by comparison with state-of-the-art data (standard clinical routine and similar devices). Likewise, based on clinical data specific to the device and literature on standard practice, we concluded that the device provides its intended indirect clinical benefit under normal use. Finally, the defined safety objectives (section Risk management and residual risks acceptability) were also successfully met.
These conclusions rest mainly on data with a high level of evidence (i.e., clinical data on the device under evaluation), supplemented by clinical data on similar devices and literature on standard practice.
We therefore consider that the device is designed and manufactured in such a way that, when used under normal conditions and for its intended purpose, any risks associated with its intended use constitute acceptable risks when weighed against the benefits to the patient. Accordingly, the device complies with the general requirements on the acceptability of the benefit/risk profile (GSPR 1 and GSPR 8).
Necessary measures
Based on the evidence presented in previous sections, and to address the specific objectives identified in the section "Need for more clinical evidence", the manufacturer has defined a Post-Market Clinical Follow-up (PMCF) Plan (R-TF-007-002).
PMS and PMCF feedback loop
This subsection maps the CER's evidence gaps and residual monitoring triggers directly to the specific activities in the Post-Market Surveillance Plan (R-TF-007-001) and the Post-Market Clinical Follow-up Plan (R-TF-007-002), so that each CER output is tied to a post-market input.
- Declared evidence gaps (autoimmune conditions ≈3% of the dermatological spectrum; genodermatoses ≈1%): Gap 4 (autoimmune skin conditions) is addressed by Activity D.1, prospective surveillance of autoimmune skin condition recognition in clinical deployment, with a 50-case target across in-scope ICD-11 autoimmune conditions, a primary acceptance criterion of Top-3 accuracy ≥60%, and a surveillance trigger that initiates an unscheduled CER update if, at any annual review, more than 20% of confirmed autoimmune cases have the correct category ranked below Top-5. Gap 5 (genodermatoses) is addressed by Activity D.2, passive surveillance in post-market deployment with no active recruitment, with a surveillance trigger that initiates an unscheduled clinical evaluation review if, in any 12-month period, more than 3 identified genodermatosis cases have all genodermatosis categories ranked below Top-5.
- Sustained algorithmic performance monitoring: Core device performance is tracked in R-TF-007-002 against the predefined PMCF acceptance thresholds (AUC >0.8; Top-5 ≥70%; Top-3 ≥55%; Top-1 ≥40%). Breach of any threshold initiates a CAPA and a clinical evaluation update per MDR Article 61(11).
- Trend reporting (Article 88): Trend analysis per R-TF-007-001 and GP-020 Data analysis compares current-period data against historic or foreseeable data. An increase of ≥25% over historic or foreseeable data constitutes the statistically significant threshold for Article 88 trend reporting; such trends are captured in T-020-001 Trend report or T-006-001 Non-conformity report, as applicable.
- CER update trigger: Scheduled annual CER updates align with the PSUR cycle per Article 86. Any serious incident, any threshold breach from the PMS/PMCF activities above, or new PMS data with the potential to change the current evaluation triggers an unscheduled update per MEDDEV 2.7/1 Rev 4 §6.2.3 (see "Date of the next Clinical Evaluation").
- Per-activity methodology, sample size, timeline, and contingency: For each specific PMCF activity enumerated in the Specific PMCF Methods list below (A.1, A.2, A.3, B.1–B.5, C.1, C.2, D.1, D.2, E.1, F.1), the full methodology, sample size, acceptance criteria, timeline (protocol approval → first-patient-in → data cut-off), and contingency plan are specified in R-TF-007-002 Post-Market Clinical Follow-up (PMCF) Plan. A per-activity summary table is maintained at the head of the PMCF Plan.
- Integration of the equivalent legacy device's post-market conclusions into the device's post-market programme (MDCG 2020-6 §6.2.2): The conclusions of the legacy umbrella PMS Report (R-TF-007-003) and of the cross-sectional observational study (R-TF-015-012; Rank 4 quantitative outcomes plus Rank 8 Likert professional-opinion data per MDCG 2020-6 Appendix III) are carried forward as named, structural inputs to R-TF-007-001 Post-Market Surveillance (PMS) Plan and R-TF-007-002 Post-Market Clinical Follow-up (PMCF) Plan. The legacy study was initiated under the manufacturer's standing MDR Article 83 / Article 120(3) post-market surveillance obligation for the legacy Class I MDD device; the Protocol is dated 7 November 2025 and data collection ran from 23 March 2026 to 13 April 2026. The evidence enters this clinical evaluation at the MDCG 2020-1 Pillar 3 (Clinical Performance) level and is anchored to the surrogate-endpoint Valid Clinical Association literature already carried in R-TF-015-011 State of the Art. Five integration points are implemented:
  - (i) Benefit-confirmation continuity, in R-TF-007-001 § Benefit confirmation — continuity with legacy evidence, mapping the legacy study's MCID-positive co-primary endpoints B2 / C4 / D4 per benefit (7GH, 5RB, 3KX) to the current device's PMCF activities that continue monitoring at the same or higher rigour.
  - (ii) Carried-forward safety-signal baselines, in R-TF-007-001 § Carried-forward safety-signal baselines from legacy post-market, with pre-specified breach triggers gated by a minimum of N ≥ 30 substantiated responses and a one-sided exact 95 % confidence rule for each of F1 (26.8 % baseline; 30 % trigger), F2 (30.4 % baseline; 40 % trigger), F3 (mean 4.14 baseline; 3.5 trigger) and F4 (7.1 % baseline; 10 % trigger). F1–F4 are Rank 8 Likert-derived professional-opinion signals used as sensitive early-warning triggers alongside the Article 88 statistical-increase threshold.
  - (iii) Surveillance-cadence rationale, in R-TF-007-001 § Surveillance cadence — calibrated against legacy-device operational experience, justifying the annual PSUR with quarterly quality-indicator reviews against the four-plus-year legacy operational record (≈ 250,000 reports; 0 Article 87; 0 Article 88; 0 FSCA). The MDR baseline (any single confirmed Article 87, Article 88, or FSCA event) triggers reassessment regardless of the current device's denominator.
  - (iv) Residual-uncertainty mapping, in R-TF-007-002 § Residual uncertainties from legacy PMS: confirmation in PMCF, routing each of the legacy study's three residual uncertainties (physician-reported outcomes; cross-sectional design; attribution uncertainty) to the PMCF activities that deliver independently measurable, longitudinal or controlled confirmation. These confirmation activities run concurrently with their primary CER-gap function and do not double-count evidence.
  - (v) Evidence-quality substantiation continuity, in R-TF-007-002 § Evidence-quality substantiation: continuity from legacy PMS, applying the legacy study's Section 10.7 substantiation principle and 35.9 % records-consulted threshold (provenance: R-TF-015-012 § 10.4) to every quantitative PMCF data batch.

  No new benefits and no new performance claims are introduced by this integration; it strengthens the benefit-risk argument by ensuring that the device's post-market programme is informed by the equivalent legacy device's actual four-plus-year market experience (MDCG 2020-6 §6.2.2, §6.3).
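The two quantitative rules above (the PMCF acceptance thresholds for continuous performance monitoring and the ≥25% Article 88 trend threshold) can be expressed as simple checks. The sketch below is illustrative only: it assumes each review period's metrics are available as proportions between 0 and 1, and it assumes the Article 88 increase is measured as a relative change in rate over the historic or foreseeable rate. The names `PeriodMetrics`, `threshold_breaches` and `article88_trend` are hypothetical and do not describe the manufacturer's tooling.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class PeriodMetrics:
    """Device performance for one review period, as proportions (0-1)."""
    auc: float
    top1: float
    top3: float
    top5: float


def threshold_breaches(m: PeriodMetrics) -> List[str]:
    """List every PMCF acceptance threshold breached in the period (hypothetical helper)."""
    breaches = []
    if not m.auc > 0.80:      # AUC must be strictly greater than 0.8
        breaches.append("AUC")
    if not m.top5 >= 0.70:
        breaches.append("Top-5")
    if not m.top3 >= 0.55:
        breaches.append("Top-3")
    if not m.top1 >= 0.40:
        breaches.append("Top-1")
    return breaches


def article88_trend(current_rate: float, historic_rate: float) -> bool:
    """True when the current rate is at least 25 % above the historic/foreseeable rate.

    Assumption: the increase is a relative change in rate, not an absolute difference.
    """
    if historic_rate <= 0:
        # Assumption: with a zero baseline, any non-zero rate is treated as a trend.
        return current_rate > 0
    return (current_rate - historic_rate) / historic_rate >= 0.25
```

Keeping the strict `>` for AUC and `>=` for the Top-N metrics mirrors the inequalities as written in this CER. Under this sketch, any non-empty result from `threshold_breaches` corresponds to the CAPA and clinical evaluation update path described above.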
The PMCF activities are divided into general methods (proactive data collection from PMS) and specific methods (targeted studies) to ensure the continuous assessment of the benefit/risk profile.
- General PMCF Methods: The manufacturer will perform continuous collection and evaluation of clinical experience, including:
- Gathering user feedback and field reports.
- Systematic screening of scientific literature.
- Analysis of clinical data derived from the PMS system (complaints, vigilance).
- Specific PMCF Methods (Targeted studies): To bridge the identified gaps, the following specific clinical investigations are scheduled. Per-activity sample sizes, acceptance thresholds, and trigger conditions are summarised inline below; the full per-activity methodology, statistical analysis plan, timeline (protocol approval → first-patient-in → data cut-off) and contingency plan are maintained in R-TF-007-002 Post-Market Clinical Follow-up (PMCF) Plan, per MDCG 2020-7.
| Activity | Gap addressed | Sample-size target | Pre-specified threshold / acceptance criterion | Trigger for unscheduled CER update |
|---|---|---|---|---|
| A.1 | Triage and prioritisation | Retrospective cohort, ≥ 200 cases | Reduction of average waiting time ≥ 30%; malignancy sensitivity/specificity meet pre-cert thresholds | Failure to meet either threshold |
| A.2 | Triage and prioritisation | Prospective, ≥ 100 patients | Validated prioritisation of follow-up consultations in suspected melanoma lesions | Failure to demonstrate prioritisation effect at the pre-specified significance |
| A.3 | Triage and prioritisation | Prospective multicentre, ≥ 200 patients | Validated referral prioritisation from primary care; ≥ 15% relative increase in adequacy of referrals | Failure to meet the referral-adequacy threshold |
| B.1 | Severity assessment | Prospective, ≥ 100 patients | FFA severity ICC ≥ 0.70 vs. expert consensus | ICC < 0.70 |
| B.2 | Severity assessment | Observational, ≥ 100 patients | Acne severity correlation Cohen's κ ≥ 0.50 vs. expert consensus | κ < 0.50 |
| B.3 | Severity assessment (PASI) | Prospective, ≥ 100 patients | PASI severity ICC ≥ 0.70 vs. expert consensus | ICC < 0.70 |
| B.4 | Severity assessment (UAS) | Prospective, ≥ 100 patients | UAS severity ICC ≥ 0.70 vs. expert consensus | ICC < 0.70 |
| B.5 | Severity assessment (IHS4) | Prospective, 100 patients | IHS4 severity ICC ≥ 0.70 vs. expert consensus (confirmatory of AIHS4_2025 preliminary signal) | ICC < 0.70 |
| C.1 | Performance stability (algorithmic) | Continuous, all routine deployment cases | AUC > 0.8; Top-5 ≥ 70%; Top-3 ≥ 55%; Top-1 ≥ 40% (continuous monitoring) | Breach of any threshold at any monthly review |
| C.2 | Performance stability (HCP-assisted) | Multi-reader multi-case, ≥ 10 readers, ≥ 100 cases | Diagnostic-accuracy improvement ≥ 10 pp vs. unaided baseline | Failure to demonstrate the ≥ 10 pp improvement |
| D.1 | Autoimmune dermatoses (§6.3 PMCF confirmation of triangulated pre-certification evidence) | Prospective surveillance, 50-case target justified on a systematic-misclassification-detectability rationale, 12-month and 36-month interims | Primary safety-floor: Top-3 ≥ 60 % on the 50-case cohort. Non-inferiority secondary: Top-3 post-certification not more than 15 pp below the V&V-demonstrated Top-3 of 0.820 (i.e., ≥ 0.67). Safety: zero confirmed cases where the device output was identified by the treating HCP as contributing to a clinically significant delay in diagnosis. HCP user-concordance (supporting): routine device-aided differential-diagnosis workup recorded per protocol. | > 20 % of confirmed autoimmune cases with the correct category ranked below Top-5 at any annual review, OR a breach of the non-inferiority secondary criterion, OR a breach of the primary safety-floor — triggers unscheduled CER update and protocol-driven re-review of the §6.3 sufficient-evidence determination |
| D.2 | Genodermatoses (§6.3 PMCF confirmation of triangulated pre-certification evidence) | Passive surveillance, no active recruitment; surveillance and coverage triggers; early Pillar 3-equivalent performance readout from the legacy-predecessor post-market report corpus at the first PMS Update Report (R-TF-007-003) | Safety (primary): zero confirmed genodermatosis cases where the device output was identified by the treating HCP as contributing to patient harm. Per-case Top-5 concordance reporting (positive performance-confirmation) for every identified case. Early Pillar 3-equivalent performance readout on the legacy corpus slice at approximately 6 months post-certification. | > 3 confirmed genodermatosis cases in any 12-month period with all genodermatosis categories ranked below Top-5 — triggers unscheduled clinical-evaluation review. Coverage trigger: at 30 cumulative cases, a formal diagnostic-accuracy analysis is conducted and fed back into the §6.3 sufficient-evidence determination. |
| E.1 | Fitzpatrick V–VI coverage | Stratified within A.3 / C.1 / C.2 | Per-band Top-3 accuracy not lower than the overall cohort by more than the per-band threshold in R-TF-007-002 | Top-3 accuracy in Fitzpatrick V–VI below the overall cohort by more than the threshold at any annual review |
| F.1 | Paediatric coverage | Stratified within C.1 / C.2 (ages 0–2, 2–12, 12–18) | Per-band Top-3 accuracy not lower than the overall cohort by more than the per-band threshold in R-TF-007-002 | Top-3 accuracy in any paediatric band below the overall cohort by more than the threshold at any annual review |
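The D.1 and D.2 rows of the table combine several numeric conditions. The sketch below shows one way to evaluate them at a review. It assumes that each confirmed case is summarised by the 1-based rank of the correct category in the device output (`None` when the category is absent), and that the D.1 Top-3 criteria are assessed only once the 50-case target is reached; both are assumptions, not statements of the PMCF protocol. All function and constant names are hypothetical, and the HCP-reported safety criteria (zero harm or delay attributions) are not modelled.

```python
from typing import List, Optional, Sequence

# Thresholds transcribed from the D.1 / D.2 rows of the PMCF table.
D1_TOP5_MISS_FRACTION = 0.20    # > 20 % of confirmed cases ranked below Top-5
D1_SAFETY_FLOOR_TOP3 = 0.60     # primary safety floor on the 50-case cohort
D1_NON_INFERIORITY_TOP3 = 0.67  # V&V Top-3 of 0.820 minus the 15 pp margin
D1_COHORT_TARGET = 50
D2_TOP5_MISS_COUNT = 3          # > 3 such cases in any 12-month period
D2_COVERAGE_CASES = 30          # formal diagnostic-accuracy analysis at 30 cases


def in_top(rank: Optional[int], k: int) -> bool:
    """rank is the 1-based position of the correct category; None if absent."""
    return rank is not None and rank <= k


def d1_triggers(ranks: Sequence[Optional[int]]) -> List[str]:
    """Evaluate the D.1 numeric triggers at an annual review (hypothetical helper)."""
    n = len(ranks)
    if n == 0:
        return []
    fired = []
    miss_fraction = sum(not in_top(r, 5) for r in ranks) / n
    if miss_fraction > D1_TOP5_MISS_FRACTION:
        fired.append("below-Top-5 fraction")
    # Assumption: the Top-3 criteria are assessed once the 50-case target is reached.
    if n >= D1_COHORT_TARGET:
        top3 = sum(in_top(r, 3) for r in ranks) / n
        if top3 < D1_SAFETY_FLOOR_TOP3:
            fired.append("primary safety floor")
        if top3 < D1_NON_INFERIORITY_TOP3:
            fired.append("non-inferiority secondary")
    return fired


def d2_triggers(period_best_ranks: Sequence[Optional[int]], cumulative_cases: int) -> List[str]:
    """period_best_ranks: best rank of any genodermatosis category, per case,
    within one 12-month window; "all categories below Top-5" means best rank > 5."""
    fired = []
    misses = sum(not in_top(r, 5) for r in period_best_ranks)
    if misses > D2_TOP5_MISS_COUNT:
        fired.append("unscheduled clinical-evaluation review")
    if cumulative_cases >= D2_COVERAGE_CASES:
        fired.append("formal diagnostic-accuracy analysis")
    return fired
```

Because the safety floor (0.60) sits below the non-inferiority bound (0.67), any safety-floor breach in this sketch also reports the non-inferiority breach, which matches the "OR" structure of the D.1 trigger.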
- Addressing Gap 1 (Triage and Prioritization):
  - Activity A.1: Observational retrospective study TRIAJE_VH_2025 to measure the reduction of average waiting times and sensitivity/specificity in malignancy detection.
  - Activity A.2: Prospective study (CVCSD VC 2402) to validate the prioritisation of follow-up consultations in suspected melanoma lesions.
  - Activity A.3: Prospective multicentre study CLINICAL_VH_2025 to validate referral prioritisation from primary care.
- Addressing Gap 2 (Severity assessment):
  - Activity B.1: Prospective study (LEGIT_AFF_EVCDAO_2021) for Frontal Fibrosing Alopecia (FFA) severity quantification.
  - Activity B.2: Observational study on acne severity scoring and monitoring.
  - Activity B.3 / B.4 / B.5: Prospective severity-validation studies for PASI, UAS and IHS4 respectively, complementing the algorithm-level validation of APASI_2025, AUAS_2023 and AIHS4_2023.
- Addressing Gap 3 (Performance stability):
  - Activity C.1: Image-based diagnosis non-interventional performance analysis (PMCF-ICD-DXP-2026) to monitor AUC and Top-N accuracy stability.
  - Activity C.2: Multi-reader multi-case study FDA_PIVOTAL_RWP_2026 to validate diagnostic support capabilities.
- Addressing the Fitzpatrick V–VI coverage gap (acceptable gap declared in section Representativeness of the Study Populations):
  - Activity E.1: Stratified subgroup analysis within Activities C.1 and C.2 tracking Fitzpatrick-band case proportion and per-band Top-N accuracy, with an unscheduled CER update triggered if Top-3 accuracy in Fitzpatrick V–VI falls below the overall cohort by more than the threshold defined in R-TF-007-002 at any annual review.
- Addressing the pediatric coverage gap (acceptable gap declared in section Pediatric population):
  - Activity F.1: Pediatric case-proportion monitoring and age-stratified performance tracking (0–2, 2–12, 12–18 years) within Activities C.1 and C.2, using the same unscheduled-update trigger mechanism.
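Activities E.1 and F.1 both compare a subgroup's Top-3 accuracy with the overall cohort. A minimal sketch follows. The per-band threshold is defined in R-TF-007-002 and is not stated in this CER, so `max_gap` is a placeholder parameter. Treating the age bands 0–2, 2–12 and 12–18 as half-open intervals is an assumption about how boundary ages are assigned. The helper names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple


def age_band(age_years: float) -> Optional[str]:
    """Map an age to the F.1 bands; half-open intervals are an assumption."""
    if age_years < 0 or age_years >= 18:
        return None  # outside the under-18 paediatric definition used in this CER
    if age_years < 2:
        return "0-2"
    if age_years < 12:
        return "2-12"
    return "12-18"


def bands_below_overall(cases: Iterable[Tuple[str, bool]], max_gap: float) -> Dict[str, float]:
    """cases: (band label, Top-3 hit) pairs covering the whole cohort.

    Returns {band: Top-3 accuracy} for every band whose accuracy falls below the
    overall cohort by more than max_gap (an absolute proportion, e.g. 0.10).
    """
    counts: Dict[str, List[int]] = defaultdict(lambda: [0, 0])  # band -> [hits, n]
    for band, top3_hit in cases:
        counts[band][0] += int(top3_hit)
        counts[band][1] += 1
    total_n = sum(n for _, n in counts.values())
    if total_n == 0:
        return {}
    overall = sum(h for h, _ in counts.values()) / total_n
    return {band: h / n for band, (h, n) in counts.items() if overall - h / n > max_gap}
```

The same function accepts Fitzpatrick labels (for example "I-IV" and "V-VI") in place of age bands, so E.1 and F.1 can share one comparison rule, as the text above describes.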
The results of these activities will be documented in the PMCF Evaluation Report, which will form an integral part of the Periodic Safety Update Report (PSUR). This CER will be updated with PMCF findings to ensure continuous monitoring of the device's benefit/risk profile post-market.
Limitations and residual uncertainty
This section consolidates the limitations identified elsewhere in this CER. Each item is mapped to the PMCF activity and CER update trigger that addresses it, and the consolidated view frames the Conclusions.
- Low-prevalence sub-indication categories (autoimmune dermatoses, genodermatoses), not declared as §6.5(e) gaps: pre-certification evidence is triangulated under MDCG 2020-6 §6.3 (Pillar 1 literature review in R-TF-015-011 § Autoimmune and genodermatoses; Pillar 2 per-epidemiological-group V&V in R-TF-028-006 § Per-Epidemiological-Group Performance). PMCF Activities D.1 and D.2 in R-TF-007-002 confirm and strengthen this evidence in real-world deployment; the D-series acceptance criteria and triggers are documented above.
- §6.5(e) acceptable evidence gaps (Fitzpatrick V–VI phototype representativeness, paediatric representativeness): declared per MDCG 2020-6 §6.5(e) in sections Representativeness of the Study Populations and Pediatric population; addressed by PMCF Activities E.1 (Fitzpatrick) and F.1 (paediatric) in R-TF-007-002; an unscheduled CER update is triggered if Fitzpatrick-stratified or age-stratified Top-N accuracy breaches the per-band threshold.
- Paediatric under-representation: targeted subgroup analyses from BI_2024 and PH_2024 are exploratory given small case counts; addressed by PMCF Activity F.1 (paediatric case-proportion monitoring and age-stratified performance tracking).
- AIHS4_2025 n = 2 pilot for Benefit 5RB Pillar 3 — preliminary confirmation only; prospective Pillar 3 confirmation pre-committed under PMCF Activities B.1–B.5, targeting 100 HS patients.
- DAO_Derivación_PH_2022 protocol deviation on the primary comparative endpoint — secondary referral-adequacy (+25%) and malignancy-detection (AUC 0.842) endpoints remain valid; the primary attributable-improvement endpoint is not evaluable and this is transparently declared.
- COVIDX_EVCDAO_2022 primary Clinical Utility Score endpoint not met (observed 7.66 vs pre-specified ≥ 8) — explained by a single low-scoring outlier within a small dermatologist cohort (n = 6); sensitivity analysis excluding the outlier yields CUS > 8; secondary remote-monitoring endpoints are met.
- Physician-perceived misleading-output rate (F1) in R-TF-015-012 — observed at 26.8 % on the N = 56 analysis set, below the protocol's pre-specified 30 % follow-up threshold; the protocol-specified F1 follow-up is therefore not triggered. The 15 substantiated F1 = Yes responses have been thematically reviewed and correspond to the device's known edge-case limitations (atypical presentations, rare conditions, paediatric and dermoscopy-dependent lesions) already documented in the risk management file; F4 (7.1 %) is cross-referenced against the R-006-002 non-conformity registry and confirms no unreported serious incident. The signal is included as residual uncertainty (rather than a regulatory-trigger event) and is monitored prospectively under PMCF Activity C.1.
- MRMC studies are not MDR Article 2(48) clinical data — they contribute Pillar 3 §4.4 supporting Clinical Performance evidence at Rank 11; prospective real-patient PMCF activities are pre-committed to corroborate in routine practice.
- R-TF-015-012 primary classification is Rank 8, with a supplementary Rank-4 case retained for the quantitative endpoints under the Appendix III "high quality surveys" note. The supplementary Rank-4 reading is not load-bearing: the three benefit conclusions are independently supported by the pre-market Pillar 3 Rank 2–4 prospective studies and the Pillar 3 §4.4 Rank 11 MRMC studies, so the sufficiency determination holds whether or not an assessor accepts the supplementary Rank-4 case.
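The F1 figure discussed above (15 substantiated "Yes" responses on the N = 56 analysis set, i.e. 26.8 %) can be tested against its 30 % trigger with a one-sided exact binomial bound. How R-TF-007-001 operationalises its "one-sided exact 95 % confidence rule" is not stated in this CER; the sketch below assumes a breach is declared only when at least 30 responses are available and the one-sided Clopper-Pearson lower bound exceeds the trigger. It applies to the proportion signals (F1, F2, F4), not to the F3 mean score. Function names are hypothetical.

```python
from math import comb


def binom_upper_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def exact_lower_bound(k: int, n: int, alpha: float = 0.05) -> float:
    """One-sided Clopper-Pearson lower bound for k/n, found by bisection.

    The bound is the p at which P(X >= k | p) equals alpha; that tail probability
    increases with p, so bisection on [0, k/n] converges to it.
    """
    if k == 0:
        return 0.0
    lo, hi = 0.0, k / n
    for _ in range(100):
        mid = (lo + hi) / 2
        if binom_upper_tail(k, n, mid) < alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def proportion_signal_breached(k: int, n: int, trigger: float,
                               min_n: int = 30, alpha: float = 0.05) -> bool:
    """Assumed breach rule: enough responses AND exact lower bound above the trigger."""
    if n < min_n:
        return False
    return exact_lower_bound(k, n, alpha) > trigger
```

Applied to F1 (15 of 56), the exact lower bound sits well below 0.30, so under this assumed rule the signal is not breached, consistent with the statement that the protocol-specified F1 follow-up is not triggered. Requiring the lower bound, rather than the point estimate, to exceed the trigger limits false alarms at small N.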
Conclusions
The manufacturer has conducted a clinical evaluation in accordance with Regulation (EU) 2017/745 to demonstrate the safety and performance of the device. Based on the totality of evidence synthesised in this report, the evaluators have concluded the following.
GSPR compliance
The device complies with the general requirements on safety (GSPR 1), the acceptability of side effects and benefit/risk profile (GSPR 8), and the reduction of use error (GSPR 17). No serious adverse events or device-related complications were reported across the prospective pivotal cohort (719 patients) or across the legacy-predecessor's PMS denominator (≈ 250,000 diagnostic reports over 4+ years), with rule-of-three upper one-sided 95% bounds reported in the Safety Benchmarking section. Residual risks have been mitigated to acceptable levels through the integrated image quality validator, probabilistic output design, visual explainability metadata (bounding boxes and segmentation masks), and dedicated IFU guidance including 'How to take pictures' instructions and user training.
Risks with clinical relevance
The risk management file identifies 62 risks, of which 8 residual risks remain after mitigation; these are classified in section Risk management and residual risks acceptability under Usability (2 residual risks) and Product (6 residual risks) categories, and are all assessed at Severity 3 (Major) under the P₂=1 architectural severity constraint described in section Risk architecture. Across the pivotal-investigation portfolio of more than 800 patients, zero device-related adverse events, zero incidents, zero CAPAs, and zero FSCAs were recorded, with the upper 95% confidence bound on the true adverse-event rate of 0.375% per the rule of three (Hanley and Lippman-Hand 1983; see section Safety Benchmarking against State of the Art). Uncertainties of clinical data comprise the three acceptable evidence gaps declared under MDCG 2020-6 § 6.5(e) and addressed by targeted PMCF activities — autoimmune dermatoses, genodermatoses, and Fitzpatrick V-VI representativeness. Vulnerable subgroups are paediatric patients (addressed by PMCF Activity F.1) and Fitzpatrick V-VI skin (addressed by PMCF Activity E.1); there is no dose-response relationship for a software-only MDSW.
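The 0.375 % figure above is the rule-of-three approximation 3/n for zero observed events in n patients. A short sketch comparing it with the exact one-sided 95 % bound, which solves (1 - p)^n = 0.05, is given below; n = 800 is taken from the "more than 800 patients" figure, and the function names are illustrative only.

```python
import math


def rule_of_three_upper(n: int) -> float:
    """Approximate one-sided 95 % upper bound on an event rate when 0 of n events occur."""
    return 3.0 / n


def exact_zero_event_upper(n: int, alpha: float = 0.05) -> float:
    """Exact one-sided upper bound: the p that solves (1 - p) ** n == alpha.

    Computed as 1 - alpha ** (1 / n), using expm1 for numerical stability.
    """
    return -math.expm1(math.log(alpha) / n)


if __name__ == "__main__":
    for n in (800, 250_000):
        print(n, f"{rule_of_three_upper(n):.4%}", f"{exact_zero_event_upper(n):.4%}")
```

For n = 800 the rule of three reproduces the 0.375 % bound quoted above, and the exact bound is marginally lower (about 0.374 %). Because both bounds shrink as n grows, a cohort of more than 800 patients yields a bound no larger than 0.375 %.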
Impact of risks in relation to clinical benefits
The residual-risk profile is bounded by the defence-in-depth safety architecture (upstream DIQA input gate plus five downstream clinical barriers; see section Risk architecture), which imposes the P₂=1 architectural severity constraint: the device cannot directly cause physical harm to the patient because its output is always mediated by a supervising healthcare professional. Weighed against the clinical benefits substantiated in this CER — 7GH diagnostic-accuracy improvements of +15 to +27 percentage points in the intended-use settings, 5RB severity-assessment concordance with expert consensus across four dermatological conditions, and 3KX waiting-time reduction of 50-60 % with unnecessary-referral reduction of 38 % — the residual risks are considered acceptable, and the uncertainties associated with the acceptable evidence gaps are justified under MDCG 2020-6 § 6.5(e) with documented PMCF closure plans. The overall benefit-risk profile is favourable.
Completeness of risk identification
All risks that could have a significant impact on the benefit-risk analysis have been identified in this clinical evaluation. The cross-analysis between the State of the Art, the available clinical data, the Risk Management File (R-TF-013-002 and R-TF-028-011), and the information materials supplied by the manufacturer identified no new safety concerns, no gaps, and no residual uncertainties beyond the three evidence gaps declared as acceptable under MDCG 2020-6 § 6.5(e). This conclusion is re-verified at each scheduled CER update and on any unscheduled update triggered by PMS or PMCF findings.
Alignment between risk management and clinical evaluation
The six safety objectives defined in the Risk Management File are each mapped to a measurable clinical outcome in this CER, and each has been met in the pivotal investigation portfolio (zero cases reported across the applicable risk categories; see the safety-objectives table in section Risk management and residual risks acceptability). The Risk Management File and this Clinical Evaluation Report are therefore aligned on the set of residual risks, their acceptance criteria, and their observed-versus-claimed outcomes; any future amendment to the residual-risk list is propagated to both documents at the next CER update cycle.
Benefit 7GH: Improvement in diagnostic accuracy
The device delivers clinically meaningful improvements in diagnostic accuracy across all three pathology tiers and for both intended user groups.
For primary care physicians (PCPs), simulated-use MRMC studies demonstrate accuracy improvements of +17% (BI_2024), +18.15% (PH_2024), and +27% (SAN_2024), all substantially exceeding the acceptance criterion of ≥10 percentage points derived from the State of the Art. For dermatologists, improvements of +8.3% (BI_2024) and +10.5% (SAN_2024) exceed the ≥5 percentage point acceptance criterion, with additional diagnostic accuracy evidence from IDEI_2023 (top-1 accuracy 82.14%, meeting the ≥61.8% criterion) and MC_EVCDAO_2019 (top-1 accuracy 55%, meeting the ≥50% criterion).
For Tier 2 rare diseases, the device achieves an absolute top-1 accuracy of 57.88% (BI_2024), meeting the ≥54% acceptance criterion, with mean accuracy improvements of +26.77% across all HCPs (+28.54% for PCPs, +12.97% for dermatologists) across BI_2024 and PH_2024.
For Tier 1 malignant conditions, the per-study acceptance criterion for pooled multiple-malignant AUC is AUC ≥ 0.80 (per the Clinical Evaluation Plan, R-TF-015-001); each contributing study meets it: MC_EVCDAO_2019 pooled multiple-malignant AUC 0.8983, MC_EVCDAO_2019 melanoma AUC 0.85 (vs SotA 0.81), DAO_Derivación_O_2022 multiple-malignant AUC 0.82, DAO_Derivación_PH_2022 multiple-malignant AUC 0.842. Individual sensitivity and specificity acceptance criteria for melanoma, BCC, and SCC are all met.
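The acceptance-criterion comparisons in the Benefit 7GH paragraphs reduce to a list of observed-versus-threshold pairs. The minimal sketch below assumes that each criterion is met when the observed value is at least the threshold, as the text describes. The values are transcribed from the paragraphs above, the improvements are entered in the units the text compares them against, and the `Criterion` and `unmet` names are hypothetical.

```python
from typing import List, NamedTuple


class Criterion(NamedTuple):
    study: str
    endpoint: str
    observed: float
    threshold: float  # criterion is met when observed >= threshold


# Values transcribed from the Benefit 7GH paragraphs of this CER.
CRITERIA: List[Criterion] = [
    Criterion("BI_2024", "PCP accuracy improvement", 17.0, 10.0),
    Criterion("PH_2024", "PCP accuracy improvement", 18.15, 10.0),
    Criterion("SAN_2024", "PCP accuracy improvement", 27.0, 10.0),
    Criterion("BI_2024", "Dermatologist accuracy improvement", 8.3, 5.0),
    Criterion("SAN_2024", "Dermatologist accuracy improvement", 10.5, 5.0),
    Criterion("IDEI_2023", "Top-1 accuracy (%)", 82.14, 61.8),
    Criterion("MC_EVCDAO_2019", "Top-1 accuracy (%)", 55.0, 50.0),
    Criterion("BI_2024", "Tier 2 rare-disease top-1 accuracy (%)", 57.88, 54.0),
    Criterion("MC_EVCDAO_2019", "Pooled multiple-malignant AUC", 0.8983, 0.80),
    Criterion("DAO_Derivación_O_2022", "Multiple-malignant AUC", 0.82, 0.80),
    Criterion("DAO_Derivación_PH_2022", "Multiple-malignant AUC", 0.842, 0.80),
]


def unmet(criteria: List[Criterion]) -> List[Criterion]:
    """Return every criterion whose observed value falls below its threshold."""
    return [c for c in criteria if c.observed < c.threshold]


if __name__ == "__main__":
    failures = unmet(CRITERIA)
    print("all acceptance criteria met" if not failures else failures)
```

Keeping the criteria as data rather than prose makes it straightforward to re-run the same comparison when PMCF results update the observed values.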
Routine-practice support (post-market, R-TF-015-012; Rank 8 primary under MDCG 2020-6 Appendix III for both quantitative and Likert items, with a supplementary Rank 4 case retained for the quantitative endpoints under the Appendix III "high quality surveys" note). Across 21 independent clinical sites, the N = 56 analysis set reported a mean diagnostic-assessment change rate of 18.77% (co-primary endpoint B2; MCID 5%; Holm-adjusted p < 0.05), with supportive endpoints B4 (rare-disease identification, 7.30/yr) and B6 (malignancy detection, 14.68/yr) both exceeding their MCIDs. This evidence supports, rather than independently confirms, Benefit 7GH in routine clinical practice, complementing the pre-market Pillar 3 evidence; the same study's Likert professional-opinion items also contribute at Rank 8.
Benefit 5RB: Objective severity assessment
The device's severity assessment algorithms have been independently validated against expert dermatologist consensus across four conditions in published peer-reviewed studies. For hidradenitis suppurativa (AIHS4_2023), the ICC is 0.727, meeting the ≥0.70 criterion and falling within the published expert inter-rater range of 0.69–0.75+. For psoriasis (APASI_2025), automated PASI scoring matches or exceeds multi-expert annotator consensus. For urticaria (AUAS_2023), automated UAS scoring meets the pre-specified expert agreement criterion. For atopic dermatitis (ASCORAD_2022), the RMAE is 13.0% and the segmentation AUC is 0.93, both meeting their respective criteria. Additionally, androgenetic alopecia severity scoring (IDEI_2023) achieves an unweighted Kappa of 0.74 for the pooled prospective dataset, exceeding the ≥0.60 criterion.
Routine-practice support (post-market, R-TF-015-012; Rank 8 primary under MDCG 2020-6 Appendix III for both quantitative and Likert items, with a supplementary Rank 4 case retained for the quantitative endpoints under the Appendix III "high quality surveys" note). Across 21 clinical sites, the N = 56 analysis set reported a mean of 36.23 treatment decisions per year directly informed by the device's severity scores (co-primary endpoint C4; MCID 10/yr; Holm-adjusted p < 0.05), with a longitudinal-monitoring rate C5 of 30.53% (MCID 5%). This evidence supports, rather than independently confirms, Benefit 5RB in routine clinical practice, complementing the pre-market Pillar 3 evidence; the same study's Likert professional-opinion items also contribute at Rank 8.
Benefit 3KX: Care pathway optimisation
The device substantially reduces healthcare system burden across both user groups. For primary care settings (PH_2024), waiting times are reduced by 60.7%, exceeding the ≥40% criterion, and the proportion of patients managed remotely increases by 49%, exceeding the ≥40% criterion. For dermatology settings (SAN_2024), waiting times are reduced by 58% and remote management capacity increases by 56%, both meeting or exceeding their respective acceptance criteria. Unnecessary referral rates are reduced by 38% in primary care pathways (DAO_Derivación_O_2022, exceeding the ≥15% criterion). Expert consensus on remote care utility reaches 87%–100% in SAN_2024, meeting the ≥75% criterion.
Routine-practice support (post-market, R-TF-015-012; Rank 8 primary under MDCG 2020-6 Appendix III for both quantitative and Likert items, with a supplementary Rank 4 case retained for the quantitative endpoints under the Appendix III "high quality surveys" note). Across 21 clinical sites, the N = 56 analysis set reported a mean referral-adequacy improvement of 15.56% (co-primary endpoint D4; MCID 5%; Holm-adjusted p < 0.05), with supportive endpoints D2 (waiting-time reduction, 14.53%), D6 (remote-assessment adequacy, 47.76%), and D7 (remote-volume increase, 24.64%) all exceeding their MCIDs. This evidence supports, rather than independently confirms, Benefit 3KX in routine clinical practice, complementing the pre-market Pillar 3 evidence; the same study's Likert professional-opinion items also contribute at Rank 8.
Declared acceptable evidence gaps
Four indication coverage gaps are declared acceptable per MDCG 2020-6 § 6.5(e), each with documented PMCF follow-up:
- Autoimmune skin diseases (approximately 3% of dermatological presentations): Pre-market evidence is limited to two conditions in small samples. This gap is acceptable because these conditions typically require serological confirmation beyond visual assessment, limiting the device's role to differential triage support, and because the gap is field-wide — a supplementary literature search (105 results, April 2026) identified only 2 qualifying AI autoimmune skin studies across the entire published literature.
- Genodermatoses (approximately 1% of dermatological presentations): No direct pre-market representation. This gap is acceptable given extreme rarity, the genetic diagnostic pathway, and the supportive (triage) rather than definitive role of the device for this category.
- Individual rare-disease ICD-11 categories without pre-market Pillar 3 real-patient data: The Benefit 7GH sub-criterion (b) rare-disease claim is scoped pre-market as a Top-5 surfacing claim (presence of the correct low-prevalence ICD-11 category within the Top-5 prioritised differential view), supported by MRMC Rank 11 Pillar 3 §4.4 simulated-use evidence (BI_2024, PH_2024). Pre-market prospective real-patient Pillar 3 evidence at the individual rare-disease category level is declared as an acceptable evidence gap under §6.5(e), on the rationale that prospective real-patient recruitment at sufficient volume is impractical for very-low-prevalence conditions. PMCF Activities D, E and F in R-TF-007-002 provide stratified real-world capture of rare-disease presentations with per-band Top-5 accuracy tracking.
- Fitzpatrick V–VI skin types: Limited representation across the pre-market prospective pivotal cohort. The single position adopted in this CER is that this declared §6.5(e) acceptable evidence gap reflects (a) a field-wide SotA limitation confirmed by meta-analysis (Tjiu and Lu, 2025) and (b) the deployment demographics of the primary intended market. The MAN_2025 simulated-use MRMC reader study contributes Rank 11 Pillar 3 §4.4 supporting evidence on 149 curated Fitzpatrick V–VI images sourced from public dermatology atlases, reinforcing the device's phototype generalisability claim, but only as supporting simulated-use evidence at Rank 11: it is neither a substitute for real-world Pillar 3 evidence on Fitzpatrick V and VI patients nor a substitute for the §6.5(e) declaration. PMCF Activities E.1 and F.1 in R-TF-007-002 track Fitzpatrick-stratified case proportion and per-band Top-N accuracy with a pre-specified unscheduled-CER-update trigger; these PMCF activities confirm (per §6.3) the pre-market acceptable-gap declaration rather than filling it.
All of the gaps listed above are addressed by targeted PMCF activities in the R-TF-007-002 Post-Market Clinical Follow-up (PMCF) Plan.
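The per-band Top-N accuracy tracking referred to for PMCF Activities D, E, F, E.1 and F.1 can be illustrated with a minimal Python sketch. The record layout (`CaseRecord`), the band labels, the placeholder category codes and the function name are hypothetical and are not taken from R-TF-007-002; the sketch only shows the arithmetic assumed here: a case counts as a Top-N hit when the reference-standard category appears among the first N entries of the device's ranked differential, and accuracy is computed separately within each band (for example a Fitzpatrick band or a prevalence band).

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class CaseRecord:
    # Hypothetical record layout; the actual PMCF data model may differ.
    band: str                # stratum label, e.g. a Fitzpatrick or prevalence band
    ranked_codes: list[str]  # device differential, most likely category first
    reference_code: str      # reference-standard ICD-11 category


def per_band_top_n_accuracy(cases: list[CaseRecord], n: int) -> dict[str, float]:
    """Return, for each band, the fraction of cases whose reference code is in the Top-n."""
    if n < 1:
        raise ValueError("n must be at least 1")
    hits: dict[str, int] = defaultdict(int)
    totals: dict[str, int] = defaultdict(int)
    for case in cases:
        totals[case.band] += 1
        if case.reference_code in case.ranked_codes[:n]:
            hits[case.band] += 1
    # Bands appear in the order they were first seen in the input.
    return {band: hits[band] / totals[band] for band in totals}


if __name__ == "__main__":
    # Illustrative placeholder cases only; these are not device results.
    demo = [
        CaseRecord("V-VI", ["A", "B", "C"], "B"),
        CaseRecord("V-VI", ["A", "C", "D"], "E"),
        CaseRecord("I-II", ["B", "A"], "B"),
    ]
    print(per_band_top_n_accuracy(demo, n=1))  # {'V-VI': 0.0, 'I-II': 1.0}
    print(per_band_top_n_accuracy(demo, n=3))  # {'V-VI': 0.5, 'I-II': 1.0}
```

Computing accuracy within each band, rather than pooling all cases, keeps a weak stratum from being masked by a well-represented one, which is the purpose of stratified tracking.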
Pediatric population
While pediatric patients are underrepresented across the overall evidence portfolio, targeted subgroup analyses in BI_2024 and PH_2024 show consistent accuracy improvements in the child and infant age groups: +11.27 percentage points (pp) across all HCPs in the BI_2024 child subgroup (2–12 years), +33.33 pp in the PH_2024 infant subgroup (1 month to 2 years) and +11.11 pp in the PH_2024 child subgroup. These findings are exploratory and do not alter the §6.5(e) acceptable-gap declaration for full pediatric population coverage. Dedicated PMCF monitoring will track pediatric case proportions and age-stratified performance in the real-world clinical environment.
As defined in the Clinical Evaluation Plan (R-TF-015-001), the pediatric population comprises persons under 18 years of age, following the European definition of the paediatric population established in Regulation (EC) No 1901/2006 of the European Parliament and of the Council of 12 December 2006 on medicinal products for paediatric use and amending Regulation (EEC) No 1768/92, Directive 2001/20/EC, Directive 2001/83/EC and Regulation (EC) No 726/2004.
Sufficiency determination
The clinical data presented in this CER are sufficient in both quantity and quality to demonstrate that the device fulfils its intended purpose, achieves all three claimed clinical benefits (7GH, 5RB, 3KX) across both primary care and specialist user groups and across the pathology coverage spectrum (subject to the acceptable evidence gaps declared above, each addressed by PMCF), maintains a favourable benefit/risk profile, and meets the applicable General Safety and Performance Requirements of Regulation (EU) 2017/745. All claims regarding the intended purpose, indications, target population, intended performance, associated benefits and safety objectives are consistent with the current state of the art in dermatological AI. PMS and PMCF activities are in place to monitor device performance and safety continuously in the real-world clinical environment, and this CER will be updated with PMCF findings in accordance with GP-015.
Date of the next Clinical Evaluation
The clinical evaluation is updated annually, in line with the Periodic Safety Update Report (PSUR) cycle, which Article 86 of Regulation (EU) 2017/745 (MDR) requires to be updated at least annually for Class IIb devices such as this one. This frequency ensures that the evaluation is continuously updated with clinical data obtained from the implementation of the PMCF plan and the post-market surveillance plan, as required by Article 61(11) of the MDR.
AI/ML drift considerations
The annual scheduled cadence is justified for the device for three reasons:
- (a) The deployed AI model weights are frozen at the version certified under the MDR; no automatic re-training mechanism operates in production.
- (b) Any change to the deployed weights is itself a controlled design change, subject to GP-012, the AI Development procedure (GP-028) and an MDCG 2020-3 significant-change assessment. Depending on the outcome of that assessment, the change requires either a notification to the notified body or a new conformity-assessment cycle, together with a corresponding unscheduled CER update.
- (c) Under PMCF Activity C.1 (non-interventional performance analysis of image-based diagnosis), algorithmic performance is monitored continuously: AUC and Top-N accuracy are tracked against the PMCF acceptance thresholds (AUC > 0.8; Top-5 ≥ 70%; Top-3 ≥ 55%; Top-1 ≥ 40%). A breach of any threshold triggers an unscheduled CAPA and a mandatory unscheduled CER update per MDR Article 61(11), independently of the annual cadence.
The annual cadence therefore governs scheduled updates only; trigger-based unscheduled updates are the primary control for AI/ML performance drift.
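The threshold logic described for PMCF Activity C.1 can be expressed as a small check. In this sketch the metric keys, the function name and the example inputs are hypothetical; only the four thresholds (AUC > 0.8 as a strict inequality; Top-5 ≥ 70%, Top-3 ≥ 55% and Top-1 ≥ 40% as inclusive bounds) are taken from this CER. A non-empty result corresponds to a breach which, as stated for Activity C.1, triggers an unscheduled CAPA and an unscheduled CER update.

```python
import operator

# PMCF Activity C.1 acceptance thresholds as stated in this CER.
# AUC uses a strict inequality; Top-N accuracies are inclusive lower bounds.
THRESHOLDS = {
    "auc": (operator.gt, 0.80),
    "top5": (operator.ge, 0.70),
    "top3": (operator.ge, 0.55),
    "top1": (operator.ge, 0.40),
}


def breached_metrics(observed: dict[str, float]) -> list[str]:
    """Return the names of metrics that fail their acceptance threshold.

    A metric missing from `observed` is treated as a breach because it cannot
    be shown to meet its threshold (a conservative assumption of this sketch).
    """
    breaches = []
    for name, (passes, limit) in THRESHOLDS.items():
        value = observed.get(name)
        if value is None or not passes(value, limit):
            breaches.append(name)
    return breaches


if __name__ == "__main__":
    # Hypothetical monitoring snapshots, not device results.
    print(breached_metrics({"auc": 0.86, "top5": 0.74, "top3": 0.58, "top1": 0.45}))  # []
    print(breached_metrics({"auc": 0.80, "top5": 0.74, "top3": 0.54, "top1": 0.45}))  # ['auc', 'top3']
```

In the second example an AUC of exactly 0.80 is flagged because the stated criterion is strictly greater than 0.8, while the Top-N criteria accept values equal to the threshold.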
This annual frequency is also consistent with the tiered update options provided in Section 6.2.3 of MEDDEV 2.7/1 Rev 4, as endorsed by MDCG 2020-6 Appendix I, while ensuring full compliance with the primary MDR requirements for Class IIb devices.
This annual cadence is formally defined in our Clinical Evaluation procedure (GP-015).
Additionally, the CER will be updated within one year if new PMS data is received that has the potential to change the current evaluation, per GP-015.
The first update is scheduled for one year after initial CE marking, ensuring alignment with the first PSUR and the results of the initial PMCF cycle. At this time, the CER will be updated to incorporate the findings from all PMCF activities (Gaps 1, 2, and 3) and confirm sustained device performance in the real-world clinical environment. Subsequent updates will continue on an annual basis.
Qualification of the responsible evaluators
Justification of the level of evaluators' expertise
As required by MEDDEV 2.7/1 Rev 4, the evaluators hold a higher-education degree in the respective field and possess knowledge of:
- research methodology;
- information management, including experience with relevant databases;
- regulatory requirements; and
- medical/scientific writing.
In addition, the evaluators have been trained on the device and have knowledge of:
- the device technology and its application;
- the diagnosis and management of the conditions the device is intended to diagnose or manage, including medical alternatives, treatment standards and technology.
| Skills & knowledge | Mr. Jordi Barrachina PhD | Mrs. Ana Vidal MSc | Dr. Antonio Martorell MD PhD | Mrs. Saray Ugidos MSc | Mrs. Céline Fabre MSc | Mr. Antoine Giraud MSc | Mrs. Coralie Cantarel MSc | Mrs. Fabienne Diaz PhD |
|---|---|---|---|---|---|---|---|---|
| Research methodology (including clinical investigation design and biostatistics) | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
| Information management (e.g. scientific background or librarianship qualification; experience with relevant databases such as Embase and Medline) | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
| Regulatory requirements | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
| Medical writing (e.g. post-graduate experience in a relevant science or in medicine; training and experience in medical writing, systematic review, and clinical data appraisal). | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
| Knowledge of the device technology and/or its application (including medical knowledge). | Limited medical knowledge | Limited medical knowledge | Yes | Limited medical knowledge | Limited medical knowledge | Limited medical knowledge | Limited medical knowledge | Limited medical knowledge |
| A degree from higher education in the respective field and 5 years of documented professional experience; or 10 years of documented professional experience if a degree is not a prerequisite for a given task. | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
Each evaluator's CV and signed Declaration of Potential Conflict of Interest are provided in Annex I.
Subject Matter Expert coverage across the indication scope
The intended use spans 346 ICD-11 dermatological categories, covering general dermatology, malignancy detection, severity scoring across multiple validated scales, paediatric subgroups and rare diseases. The Subject Matter Expert (SME) on the evaluation team is a board-certified dermatologist with 17 years of clinical experience and a sustained record of contributing to clinical evaluations of AI-based dermatology MDSW. Where the evaluation requires expertise outside general dermatology, the SME bases the benefit-risk assessment on structured indirect evidence appraised in this CER, rather than delegating case-level adjudication to an open-ended external expert network. The subspecialty-coverage strategy and the records in which it is documented are summarised below:
| Subspecialty coverage | Evidence relied upon | Documented in |
|---|---|---|
| Dermatopathology / malignancy | Biopsy-confirmed histopathology used as the reference standard in the pivotal malignancy investigation (MC_EVCDAO_2019) | Pillar 3 malignancy section of this CER; Clinical Investigation Report R-TF-015-006 under MC_EVCDAO_2019 |
| Severity-scale interpretation | Peer-reviewed severity-validation literature (APASI, AUAS, ASCORAD) and the in-house AIHS4 severity-scoring validation investigation | Pillar 2 severity-validation section of this CER; R-TF-015-011 State-of-the-Art severity-scale literature appraisal; R-TF-015-006 under AIHS4_CSP_2025 |
| Autoimmune and genodermatoses | Structured literature review of image-based clinical recognition (22 load-bearing anchor studies scoring ≥ 15/21 on appraisal criteria CRIT1–7) supporting the Pillar 1 Valid Clinical Association; and per-epidemiological-group ICD V&V measured on the device's stand-alone analytical output without a clinician in the loop (autoimmune AUC 0.948, 95% CI 0.941–0.954; genodermatoses AUC 0.905, 95% CI 0.886–0.924; both above the ≥ 0.80 acceptance criterion) supporting Pillar 2 Technical Performance | R-TF-015-011 State of the Art §Autoimmune and genodermatoses; R-TF-028-006 AI Release Report §Per-Epidemiological-Group Performance |
| Fitzpatrick V–VI phototype coverage | Multi-reader multi-case reader study dedicated to darker phototypes (MAN_2025) | Pillar 3 §4.4 Rank 11 supporting-evidence section of this CER; Clinical Investigation Report R-TF-015-006 and Annex E R-TF-015-010 under MAN_2025 |
| Legacy-device indication breadth (post-market observational) | Legacy-device cross-sectional real-world study — Rank 8 primary per MDCG 2020-6 Appendix III with a supplementary Rank 4 case | R-TF-015-012 |
| Legacy-device indication breadth (passive PMS aggregate) | Multi-year passive post-market surveillance corpus of the legacy predecessor device | Legacy-device umbrella PMS Report R-TF-007-003 |
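The Pillar 2 acceptance check reported for autoimmune diseases and genodermatoses can be reproduced arithmetically. The sketch below uses the AUC point estimates and 95% confidence limits stated in the table and compares them with the ≥ 0.80 acceptance criterion. Requiring the lower confidence limit, and not only the point estimate, to meet the criterion is a conservative reading assumed here for illustration; it is not necessarily the decision rule applied in R-TF-028-006. The type and function names are hypothetical.

```python
from typing import NamedTuple


class AucEstimate(NamedTuple):
    point: float    # AUC point estimate
    ci_low: float   # lower bound of the 95% confidence interval
    ci_high: float  # upper bound of the 95% confidence interval


ACCEPTANCE_AUC = 0.80  # acceptance criterion stated in this CER (>= 0.80)

# Values as reported for the per-epidemiological-group ICD V&V.
GROUPS = {
    "autoimmune": AucEstimate(0.948, 0.941, 0.954),
    "genodermatoses": AucEstimate(0.905, 0.886, 0.924),
}


def meets_criterion(est: AucEstimate, threshold: float = ACCEPTANCE_AUC) -> bool:
    """Conservative check assumed in this sketch: the lower 95% limit must clear the threshold.

    If the lower limit clears it, the point estimate (inside the interval) does too.
    """
    return est.ci_low >= threshold


results = {name: meets_criterion(est) for name, est in GROUPS.items()}
print(results)  # {'autoimmune': True, 'genodermatoses': True}
```

Both groups pass even under this stricter reading, since the lower confidence limits (0.941 and 0.886) exceed 0.80.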
Where case-level subspecialty adjudication was required during a clinical investigation, external dermatologists were engaged on a study-specific basis through the third-party CRO responsible for that study, and their adjudication is recorded in the investigation records of the relevant study (see the corresponding Clinical Investigation Plan R-TF-015-004 and Clinical Investigation Report R-TF-015-006 for each investigation). The PMCF activities documented in R-TF-007-002 capture additional subspecialty-stratified data, which inform subsequent CER updates.
Signature meaning
The signatures for the approval process of this document can be found in the verified commits in the QMS repository. For reference, the team members expected to participate in this document and their roles in the approval process, as defined in the Annex I Responsibility Matrix of GP-001, are:
- Author: Team members involved
- Reviewer: JD-018 Clinical Research Coordinator
- Approver: JD-022 Medical Manager