
Esteva 2017 — Dermatologist-level classification of skin cancer with deep neural networks

Citation

Esteva A, Kuprel B, Novoa RA, Ko J, Swetter SM, Blau HM, Thrun S. Dermatologist-level classification of skin cancer with deep neural networks. Nature. 2017 Feb 2;542(7639):115–118. DOI: 10.1038/nature21056. PMID 28117445.

Study design and population

Retrospective diagnostic-accuracy validation of an Inception-v3 CNN trained on 129,450 clinical images across 2,032 disease classes; head-to-head reader study against 21 board-certified dermatologists on two binary tasks — keratinocyte carcinoma vs. benign seborrheic keratosis; malignant melanoma vs. benign nevi — across clinical photography and dermoscopy. Single-institution (Stanford) development; biopsy-proven test sets.

Reported metrics

  • Keratinocyte carcinoma vs. benign seborrheic keratosis: AUC 0.96
  • Melanoma vs. benign nevi (clinical photography): AUC 0.94
  • Melanoma vs. benign nevi (dermoscopy): AUC 0.91
  • The CNN's sensitivity/specificity operating points matched or exceeded the mean dermatologist operating point on all three tasks
  • 95% CIs are not reported in the primary paper
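The AUC and operating-point figures above can be illustrated with a minimal sketch. The scores and threshold below are made-up toy values, not the paper's data; the AUC is computed via the Mann–Whitney formulation, i.e. the probability that a randomly chosen malignant lesion outscores a randomly chosen benign one, with ties counted as one half.

```python
def auc(scores_pos, scores_neg):
    """P(random positive outranks random negative), ties count 1/2."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))

def operating_point(scores_pos, scores_neg, threshold):
    """Sensitivity and specificity at a fixed decision threshold."""
    sens = sum(s >= threshold for s in scores_pos) / len(scores_pos)
    spec = sum(s < threshold for s in scores_neg) / len(scores_neg)
    return sens, spec

# Hypothetical classifier scores against biopsy-proven labels:
malignant = [0.9, 0.8, 0.7, 0.35]
benign = [0.6, 0.4, 0.3, 0.2]

print(auc(malignant, benign))                   # 0.875
print(operating_point(malignant, benign, 0.5))  # (0.75, 0.75)
```

A reader study then places each dermatologist's single sensitivity/specificity pair on the same ROC axes and asks whether it falls below the model's curve, which is the comparison the paper reports.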

Surrogate-to-outcome linkage

Establishes diagnostic accuracy (AUC, sensitivity, specificity against histopathology) as the accepted output-level surrogate for a dermatology image-classifier. The paper positions classification accuracy as a direct proxy for the biopsy/referral decision, which sits on the causal pathway to earlier-stage detection (and, in melanoma, to melanoma-specific survival per the AJCC stage-survival gradient).

CRIT1–7 appraisal

| Criterion | Score | Justification |
| --- | --- | --- |
| CRIT1 Relevance | 3 | Direct match — image-based dermatology AI on malignancy classification; surrogate domain is diagnostic accuracy (7GH). |
| CRIT2 Methodology | 2 | Large training corpus; well-defined test sets; head-to-head against 21 dermatologists with histopathology reference. Not a prospective clinical deployment. |
| CRIT3 Reporting | 2 | AUCs and reader operating points reported; no 95% CIs; methods reproducible. |
| CRIT4 Applicability | 2 | Consistent with intended use (clinician-supervised decision support). Limited Fitzpatrick IV–VI representation. |
| CRIT5 Evidence weight | 1 | Retrospective reader study on curated test sets. |
| CRIT6 Risk of bias | 2 | Spectrum bias (biopsy-preselected lesions); single institution; curated image quality may not generalise. |
| CRIT7 Contribution | 3 | Foundational reference establishing dermatologist-level AI classification; universally cited as the anchor for the diagnostic-accuracy surrogate. |

Aggregate: strong (landmark inclusion).
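For traceability, the roll-up from the seven criterion scores to the aggregate verdict can be sketched as simple arithmetic. The band thresholds below are illustrative assumptions; this entry does not state the rolling table's actual aggregation rule.

```python
# Scores from the CRIT1-7 appraisal table above (0-3 scale assumed).
CRIT_SCORES = {
    "CRIT1": 3, "CRIT2": 2, "CRIT3": 2, "CRIT4": 2,
    "CRIT5": 1, "CRIT6": 2, "CRIT7": 3,
}

def aggregate(scores, strong_at=14, moderate_at=10):
    """Sum criterion scores and map the total to a verdict band.

    The strong_at / moderate_at cut-offs are hypothetical, chosen only
    to reproduce this entry's 'strong' verdict for illustration.
    """
    total = sum(scores.values())
    if total >= strong_at:
        return total, "strong"
    if total >= moderate_at:
        return total, "moderate"
    return total, "weak"

print(aggregate(CRIT_SCORES))  # (15, 'strong')
```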

Limitations and notes

Spectrum bias; curated image quality; Fitzpatrick I–III dominant; single institution. Paired balancing references (Daneshjou 2022, Han 2018) are required to declare generalisability limits.

Strength as anchor

Strong for accepted-surrogate claim (diagnostic accuracy as the canonical MDSW-output endpoint in dermatology AI). Not load-bearing for the quantitative surrogate-to-outcome magnitude (the AJCC staging literature carries that anchor).

All the information contained in this QMS is confidential. The recipient agrees not to transmit or reproduce the information, neither by himself nor by third parties, through whichever means, without obtaining the prior written permission of Legit.Health (AI Labs Group S.L.)