
Daneshjou 2022 — Disparities in dermatology AI performance on a diverse clinical image set (DDI) [BALANCING]

Citation

Daneshjou R, Vodrahalli K, Novoa RA, Jenkins M, Liang W, Rotemberg V, et al. Disparities in dermatology AI performance on a diverse, curated clinical image set. Sci Adv. 2022 Aug 12;8(32):eabq6147. DOI: 10.1126/sciadv.abq6147. PMID 35960806.

Study design and population

External-validation / bias audit using the Diverse Dermatology Images (DDI) dataset: 656 pathologically confirmed clinical images across Fitzpatrick skin types (FST) I–VI. Three state-of-the-art dermatology AI models were evaluated: ModelDerm, DeepDerm, and a HAM10000-trained model.

Reported metrics

  • DDI overall AUC: ModelDerm 0.65 (95 % CI 0.61–0.70); DeepDerm 0.56 (0.51–0.61); HAM10000 0.67 (0.62–0.71) — 27–36 % drop vs. original benchmark reports
  • FST I–II vs. V–VI stratification: HAM10000 AUC 0.72 (0.63–0.79) vs. 0.57 (0.48–0.67)
  • Balanced-accuracy gap: ModelDerm 0.67 (FST I–II) → 0.51 (FST V–VI)
  • Fine-tuning on DDI partially closed the gap
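
As a quick arithmetic check on the stratified figures above, the following sketch computes the absolute and relative gaps between FST I–II and FST V–VI, and tests whether the two reported HAM10000 confidence intervals overlap. The function and variable names are illustrative assumptions; the numbers are those listed in the bullets.

```python
# Illustrative check of the phototype gaps reported above.
# Names are hypothetical; the numbers come from the bullet list.

def gap(light: float, dark: float) -> tuple[float, float]:
    """Return (absolute gap, relative drop) from FST I-II to FST V-VI."""
    absolute = light - dark
    relative = absolute / light
    return round(absolute, 2), round(relative, 3)

def intervals_overlap(a: tuple[float, float], b: tuple[float, float]) -> bool:
    """True if two closed intervals share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]

ham_auc_gap = gap(0.72, 0.57)        # HAM10000 AUC
modelderm_ba_gap = gap(0.67, 0.51)   # ModelDerm balanced accuracy
ham_ci_overlap = intervals_overlap((0.63, 0.79), (0.48, 0.67))

print(ham_auc_gap, modelderm_ba_gap, ham_ci_overlap)
```

The two HAM10000 intervals share the range 0.63–0.67. Overlapping intervals do not by themselves settle whether the difference is statistically significant, so the point-estimate gap is best read alongside the per-stratum sample sizes.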

Surrogate-to-outcome linkage

Quantifies skin-tone spectrum bias: the surrogate-to-outcome chain breaks for under-represented populations if the diagnostic-accuracy claim is assumed to be uniform across phototypes. This is the MANDATORY balancing reference. It demonstrates that current diagnostic-accuracy performance is overstated for FST IV–VI and that the gap must be addressed through stratified PMCF performance monitoring and diverse training data.
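
One way to picture stratified performance monitoring is a per-phototype AUC computed on the same scoring output, with a flag raised when a group falls well below a reference group. The sketch below is a minimal, standard-library illustration under stated assumptions: the record layout, the group labels, and the 0.10 gap threshold are hypothetical and are not taken from R-TF-007-002 or from the DDI paper, and the data are toy values.

```python
# Minimal sketch of phototype-stratified AUC monitoring.
# Assumptions: record layout, group labels and the 0.10 gap threshold are
# hypothetical and are not taken from R-TF-007-002 or the DDI paper.
from collections import defaultdict

def auc(labels, scores):
    """AUC as the probability that a random positive outscores a random
    negative (ties count one half). Returns None if a class is missing."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        return None
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def stratified_auc(records):
    """records: iterable of (fst_group, label, score) tuples."""
    by_group = defaultdict(lambda: ([], []))
    for group, label, score in records:
        by_group[group][0].append(label)
        by_group[group][1].append(score)
    return {g: auc(ys, ss) for g, (ys, ss) in by_group.items()}

def flag_gap(aucs, reference, threshold=0.10):
    """Groups whose AUC is more than `threshold` below the reference group."""
    ref = aucs[reference]
    return sorted(g for g, a in aucs.items()
                  if a is not None and ref - a > threshold)

# Toy data for illustration only (not study data).
records = [
    ("I-II", 1, 0.9), ("I-II", 1, 0.8), ("I-II", 0, 0.3), ("I-II", 0, 0.2),
    ("V-VI", 1, 0.6), ("V-VI", 1, 0.4), ("V-VI", 0, 0.5), ("V-VI", 0, 0.3),
]
aucs = stratified_auc(records)
flagged = flag_gap(aucs, reference="I-II")
print(aucs, flagged)
```

A real monitoring plan would also need confidence intervals and minimum per-stratum sample sizes before acting on a flag, since small strata give unstable AUC estimates.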

CRIT1–7 appraisal

Criterion | Score | Justification
CRIT1 Relevance | 3 | Direct: quantifies the generalisability limit of the primary surrogate.
CRIT2 Methodology | 3 | Purpose-built diverse dataset; multi-model comparison; FST-stratified analysis; fine-tuning experiments.
CRIT3 Reporting | 3 | AUC and balanced accuracy with 95 % CIs, stratified by phototype.
CRIT4 Applicability | 3 | Directly addresses the intended-population equity requirement under MDR Annex I §17.2.
CRIT5 Evidence weight | 1 | Retrospective external-validation / bias-audit study.
CRIT6 Risk of bias | 2 | Moderate dataset size (656 images); single-institution curation; Fitzpatrick scale imperfect proxy for melanin; limited long-tail disease coverage.
CRIT7 Contribution | 3 | MANDATORY balancing reference: quantifies the critical failure mode of the surrogate and anchors the PMCF subgroup-monitoring commitment.

Aggregate: very strong (as a balancing reference).
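
For traceability, the individual scores can be tallied as in the sketch below. It assumes each criterion is scored out of 3, which is an assumption made here for illustration. The rule that maps a tally to a label such as "very strong" is not stated on this page, so it is not modelled.

```python
# Tally of the CRIT1-7 scores listed in the appraisal table.
# Assumption: each criterion is scored out of 3; the mapping from a total
# to a label such as "very strong" is not specified here and is not modelled.
scores = {
    "CRIT1 Relevance": 3,
    "CRIT2 Methodology": 3,
    "CRIT3 Reporting": 3,
    "CRIT4 Applicability": 3,
    "CRIT5 Evidence weight": 1,
    "CRIT6 Risk of bias": 2,
    "CRIT7 Contribution": 3,
}
MAX_PER_CRITERION = 3  # assumed scale ceiling

total = sum(scores.values())
ceiling = MAX_PER_CRITERION * len(scores)
lowest = min(scores, key=scores.get)
print(f"{total}/{ceiling}; lowest: {lowest}")
```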

Limitations and notes

The Fitzpatrick scale is a known coarse proxy for melanin; curation was single-institution; per-FST stratum sizes are small; not all of the legacy models were designed for non-dermoscopic clinical images.

Strength as anchor

Mandatory inclusion. Demonstrates balanced citation practice (per BSI Erin's attention to selective citation) and directly motivates the PMCF stratified-performance-monitoring commitment in R-TF-007-002. Supported by Han 2018 (cross-ethnicity) and informed by Dick 2019 (independent-test-set gap).

All the information contained in this QMS is confidential. The recipient agrees not to transmit or reproduce the information, neither by himself nor by third parties, through whichever means, without obtaining the prior written permission of Legit.Health (AI Labs Group S.L.)