Haenssle 2018 — Man against machine: CNN vs 58 dermatologists for melanoma recognition
Citation
Haenssle HA, Fink C, Schneiderbauer R, Toberer F, Buhl T, Blum A, et al. Man against machine: diagnostic performance of a deep learning convolutional neural network for dermoscopic melanoma recognition in comparison to 58 dermatologists. Ann Oncol. 2018 Aug;29(8):1836–1842. DOI: 10.1093/annonc/mdy166. PMID 29846502.
Study design and population
Pre-registered (DRKS00013570) cross-sectional comparative reader study. Google Inception-v4 CNN vs. 58 international dermatologists (17 countries; 30 experts) on a 100-image dermoscopic test set at two information levels (Level I — dermoscopy only; Level II — dermoscopy + clinical context). Enriched melanoma prevalence (~20 %).
Reported metrics
- Level I dermatologists — sensitivity 86.6 % ± 9.3 SD; specificity 71.3 % ± 11.2 SD
- Level II dermatologists — sensitivity 88.9 %; specificity 75.7 %
- CNN ROC-AUC 0.86 vs. dermatologist mean ROC-AUC 0.79 (p < 0.01)
- At the dermatologists' mean Level I sensitivity (86.6 %), CNN specificity 82.5 % vs. dermatologist 71.3 % (p < 0.01)
- 95 % CIs not reported (SDs only)
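Because only SDs are listed, a rough interval for the mean reader performance can be recovered from the SD and the reader count. The sketch below is a normal-approximation estimate, not a value from the paper. It assumes all 58 dermatologists contributed to the Level I figures and that readers can be treated as independent; it ignores case-level variability in the 100-image set. The function name `approx_ci_of_mean` is illustrative.

```python
import math

def approx_ci_of_mean(mean, sd, n, z=1.96):
    """Normal-approximation 95 % CI for a mean taken across n readers."""
    half_width = z * sd / math.sqrt(n)  # z times the standard error of the mean
    return mean - half_width, mean + half_width

N_READERS = 58  # assumption: every dermatologist in the cohort read at Level I

# Level I values as listed above (percent, mean and SD across readers)
sens_ci = approx_ci_of_mean(86.6, 9.3, N_READERS)
spec_ci = approx_ci_of_mean(71.3, 11.2, N_READERS)

print("Level I sensitivity, approx. 95 %% CI of the mean: %.1f to %.1f" % sens_ci)
print("Level I specificity, approx. 95 %% CI of the mean: %.1f to %.1f" % spec_ci)
```

These intervals describe uncertainty in the average reader only; they are not a substitute for the per-reader or case-resampled CIs that the publication does not provide.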
Surrogate-to-outcome linkage
Higher specificity at equal sensitivity means fewer benign lesions flagged for excision with no loss of melanoma detection, i.e., an improved "appropriate biopsy rate", provided management decisions follow the classification. Higher sensitivity at equal specificity reduces missed melanomas, the proximal mechanism feeding the stage-at-detection → melanoma-specific survival gradient.
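The linkage can be made concrete with simple arithmetic on the reported operating points. The sketch below is illustrative, not an analysis from the paper: it assumes a 100-lesion set at 20 % melanoma prevalence (20 melanomas, 80 benign), takes the CNN's matched sensitivity to be the dermatologists' mean of 86.6 %, and treats every flagged benign lesion as a potential unnecessary excision. The function name `per_100_lesions` is hypothetical.

```python
def per_100_lesions(sensitivity, specificity, prevalence, n=100):
    """Expected counts for n lesions at a given operating point and prevalence."""
    n_mel = n * prevalence
    n_benign = n - n_mel
    tp = sensitivity * n_mel             # melanomas correctly flagged
    fp = (1.0 - specificity) * n_benign  # benign lesions flagged
    return {
        "missed_melanomas": n_mel - tp,
        "benign_flagged": fp,
        "ppv": tp / (tp + fp),
    }

PREVALENCE = 0.20  # assumption: enriched test-set prevalence of about 20 %

derm = per_100_lesions(0.866, 0.713, PREVALENCE)  # dermatologists, Level I
cnn = per_100_lesions(0.866, 0.825, PREVALENCE)   # CNN at matched sensitivity

print("benign flagged, dermatologists: %.2f" % derm["benign_flagged"])
print("benign flagged, CNN:            %.2f" % cnn["benign_flagged"])
print("difference per 100 lesions:     %.2f" % (derm["benign_flagged"] - cnn["benign_flagged"]))
print("PPV, dermatologists vs. CNN:    %.3f vs. %.3f" % (derm["ppv"], cnn["ppv"]))
```

Under these assumptions the CNN flags roughly nine fewer benign lesions per 100 at the same number of missed melanomas. Both the absolute difference and the PPV depend on prevalence, so the figures should not be carried over unchanged to a lower-prevalence clinical population.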
CRIT1–7 appraisal
| Criterion | Score | Justification |
|---|---|---|
| CRIT1 Relevance | 3 | Direct dermoscopic melanoma classification; surrogate domain diagnostic accuracy. |
| CRIT2 Methodology | 2 | Pre-registered; 58-dermatologist comparator cohort; two information-level design; reference standard histopathology. |
| CRIT3 Reporting | 2 | Operating points and AUCs reported; no 95 % CIs (SDs only). |
| CRIT4 Applicability | 2 | Matches intended use (clinician + device). Phototype distribution not reported. |
| CRIT5 Evidence weight | 2 | Pre-registered multi-reader study with a large reader cohort (58 dermatologists); not an RCT or meta-analysis. |
| CRIT6 Risk of bias | 2 | Enriched melanoma prevalence; 100-image artificial test set; possible post-hoc operating-point selection. |
| CRIT7 Contribution | 3 | Core anchor: CNN specificity exceeds the dermatologist group mean at matched sensitivity; links directly to unnecessary-biopsy reduction. |
Aggregate: strong.
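For traceability, the criterion scores in the table can be tallied as below. This is a minimal sketch: it assumes each criterion is scored on a 0 to 3 scale (3 is the highest value appearing in the table) and that all criteria are weighted equally. The rule that maps the total to the "strong" rating is not stated in this record, so the code reports the sum only.

```python
# Scores copied from the CRIT1-7 appraisal table
scores = {
    "CRIT1 Relevance": 3,
    "CRIT2 Methodology": 2,
    "CRIT3 Reporting": 2,
    "CRIT4 Applicability": 2,
    "CRIT5 Evidence weight": 2,
    "CRIT6 Risk of bias": 2,
    "CRIT7 Contribution": 3,
}

MAX_PER_CRITERION = 3  # assumption: 0-3 scale, not confirmed by the record

total = sum(scores.values())
max_total = MAX_PER_CRITERION * len(scores)

print("total score: %d of %d (%.0f %%)" % (total, max_total, 100.0 * total / max_total))
```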
Limitations and notes
Artificial reading environment; enriched prevalence; methodology critique on operating-point selection published elsewhere; Fitzpatrick distribution unreported.
Strength as anchor
Strong for the directional claim (improved accuracy → appropriate-biopsy outcome). Regulator-familiar landmark reference; complements the quantitative AJCC anchor.