Haenssle 2018 — Man against machine: CNN vs 58 dermatologists for melanoma recognition

Citation

Haenssle HA, Fink C, Schneiderbauer R, Toberer F, Buhl T, Blum A, et al. Man against machine: diagnostic performance of a deep learning convolutional neural network for dermoscopic melanoma recognition in comparison to 58 dermatologists. Ann Oncol. 2018 Aug;29(8):1836–1842. DOI: 10.1093/annonc/mdy166. PMID 29846502.

Study design and population

Pre-registered (DRKS00013570) cross-sectional comparative reader study. Google Inception-v4 CNN vs. 58 international dermatologists (17 countries; 30 experts) on a 100-image dermoscopic test set at two information levels (Level I — dermoscopy only; Level II — dermoscopy + clinical context). Enriched melanoma prevalence (~20 %).

Reported metrics

Level I dermatologists — sensitivity 86.6 % ± 9.3 SD; specificity 71.3 % ± 11.2 SD
Level II dermatologists — sensitivity 88.9 %; specificity 75.7 %
CNN ROC-AUC 0.86 vs. dermatologist mean ROC-AUC 0.79 (p < 0.01)
At the dermatologists' mean Level I sensitivity (86.6 %), CNN specificity 82.5 % vs. dermatologist 71.3 % (p < 0.01)
95 % CIs not reported (SDs only)
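
Because only SDs are given, a rough 95 % interval for the reader-group mean can be sketched from the SD and the number of readers (n = 58). This is an illustration under stated assumptions, not a value from the paper: it uses a normal approximation, treats the 58 reader-level estimates as independent, and the function name is hypothetical.

```python
import math

def approx_mean_ci(mean, sd, n, z=1.96):
    """Normal-approximation interval for a group mean: mean +/- z * sd / sqrt(n).

    Assumption: reader-level estimates are independent and roughly normal.
    """
    half_width = z * sd / math.sqrt(n)
    return mean - half_width, mean + half_width

# Level I dermatologist values as listed above; n = 58 readers.
sens_ci = approx_mean_ci(86.6, 9.3, 58)
spec_ci = approx_mean_ci(71.3, 11.2, 58)

print(f"sensitivity mean, approx. 95 % interval: {sens_ci[0]:.1f} to {sens_ci[1]:.1f}")
print(f"specificity mean, approx. 95 % interval: {spec_ci[0]:.1f} to {spec_ci[1]:.1f}")
```

The interval describes uncertainty in the group mean, not the spread between individual readers (which the SD already conveys). All readers scored the same 100 images, so the independence assumption is optimistic and the true interval is likely wider.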

Surrogate-to-outcome linkage

Higher specificity at equal sensitivity translates directly to fewer unnecessary benign excisions while preserving melanoma detection — i.e., improved "appropriate biopsy rate". Higher sensitivity at equal specificity reduces missed melanomas, the proximal mechanism feeding the stage-at-detection → melanoma-specific survival gradient.
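
The specificity half of this linkage can be made concrete with simple arithmetic: at fixed sensitivity, the expected number of benign lesions flagged as suspicious is the benign count times (1 − specificity). The sketch below plugs in the specificities listed under Reported metrics and a 100-lesion set at roughly 20 % melanoma prevalence. Treating every flagged benign lesion as an excision is a simplifying assumption, and the function name is hypothetical.

```python
def benign_flagged(n_lesions, prevalence, specificity):
    """Expected number of benign lesions classified as suspicious."""
    n_benign = n_lesions * (1.0 - prevalence)
    return n_benign * (1.0 - specificity)

# Test-set-like scenario: 100 lesions, ~20 % melanoma (enriched prevalence).
n_lesions, prevalence = 100, 0.20

derm = benign_flagged(n_lesions, prevalence, 0.713)  # dermatologists, Level I
cnn = benign_flagged(n_lesions, prevalence, 0.825)   # CNN at matched sensitivity
avoided = derm - cnn

print(f"dermatologists: {derm:.1f}, CNN: {cnn:.1f}, difference: {avoided:.1f}")
```

The absolute difference scales with the benign count, so it depends heavily on prevalence. The enriched test set is one reason this figure should be read as a direction of effect rather than a clinical estimate.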

CRIT1–7 appraisal

Criterion	Score	Justification
CRIT1 Relevance	3	Direct dermoscopic melanoma classification; surrogate domain: diagnostic accuracy.
CRIT2 Methodology	2	Pre-registered; 58-dermatologist comparator cohort; two information-level design; reference standard histopathology.
CRIT3 Reporting	2	Operating points and AUCs reported; no 95 % CIs (SDs only).
CRIT4 Applicability	2	Matches intended use (clinician + device). Phototype distribution not reported.
CRIT5 Evidence weight	2	Large pre-registered cross-sectional multi-reader study (not RCT, not meta-analysis).
CRIT6 Risk of bias	2	Enriched melanoma prevalence; 100-image artificial test set; possible post-hoc operating-point selection.
CRIT7 Contribution	3	Core anchor: CNN specificity exceeds the dermatologist mean at matched sensitivity; links directly to unnecessary-biopsy reduction.

Aggregate: strong.

Limitations and notes

Artificial reading environment; enriched prevalence; methodology critique on operating-point selection published elsewhere; Fitzpatrick distribution unreported.

Strength as anchor

Strong for the directional claim (improved accuracy → appropriate-biopsy outcome). Regulator-familiar landmark reference; complements the quantitative AJCC anchor.
