Haenssle 2020 — Man against machine reloaded: market-approved CNN (Moleanalyzer Pro) vs 96 dermatologists
Citation
Haenssle HA, Fink C, Toberer F, Winkler J, Stolz W, Deinlein T, et al. Man against machine reloaded: performance of a market-approved convolutional neural network in classifying a broad spectrum of skin lesions in comparison with 96 dermatologists working under less artificial conditions. Ann Oncol. 2020 Jan;31(1):137–143. DOI: 10.1016/j.annonc.2019.10.013.
Study design and population
Two-level comparative reader study of the CE-marked CNN Moleanalyzer Pro (FotoFinder) vs. 96 dermatologists across beginner, skilled and expert tiers. Test set: 100 pigmented and non-pigmented lesion cases. At level I readers saw dermoscopy only; at level II they also saw clinical close-ups and textual case information. Multinational European reader group.
Reported metrics
- Level I dermatologist sensitivity 83.8 % (95 % CI 81.8–85.8); specificity 77.6 % (95 % CI 75.2–80.0)
- Level II dermatologist sensitivity 90.6 % (95 % CI 89.3–92.0); specificity 82.4 % (95 % CI 80.5–84.3)
- CNN sensitivity 95.0 % (95 % CI 83.5–98.6); specificity 76.7 % (95 % CI 64.6–85.6); AUC 0.918 (95 % CI 0.866–0.970)
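The CNN intervals above can be sanity-checked with a Wilson score interval. The sketch below assumes the 100 cases split into 40 malignant and 60 benign lesions, so that 95.0 % sensitivity corresponds to 38/40 and 76.7 % specificity to 46/60. These counts are inferred from the reported percentages and are not stated in this note. That the authors used the Wilson method is likewise an assumption, and the function name `wilson_interval` is illustrative.

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z = 1.96 for ~95 %)."""
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half

# Assumed counts (inferred from the percentages, not given in the source):
# 38 of 40 malignant lesions detected, 46 of 60 benign lesions cleared.
sens_lo, sens_hi = wilson_interval(38, 40)
spec_lo, spec_hi = wilson_interval(46, 60)

print(f"sensitivity CI: {sens_lo * 100:.1f}-{sens_hi * 100:.1f}")  # 83.5-98.6
print(f"specificity CI: {spec_lo * 100:.1f}-{spec_hi * 100:.1f}")  # 64.6-85.6
```

Under these assumed counts the sketch reproduces the reported CNN intervals to one decimal place. The dermatologist intervals are much narrower, presumably because they pool decisions from 96 readers, and cannot be checked this way.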
Surrogate-to-outcome linkage
Because the device tested is a CE-marked medical device and the study used realistic reading conditions (dermoscopy plus clinical context), its sensitivity, specificity and AUC can serve as a regulatory-grade diagnostic-accuracy surrogate. The link to outcomes rests on the premise that AI assistance benefits less-experienced users most. This is the operational mechanism by which a Class IIb CDS device would raise appropriate-biopsy and referral rates when deployed at primary-care level.
CRIT1–7 appraisal
| Criterion | Score | Justification |
|---|---|---|
| CRIT1 Relevance | 3 | CE-marked dermatology CNN; directly analogous regulatory context. |
| CRIT2 Methodology | 2 | 96-reader prospective design across three experience tiers; reference standard histopathology. |
| CRIT3 Reporting | 3 | Point estimates with 95 % CIs reported for sensitivity, specificity and AUC. |
| CRIT4 Applicability | 3 | Highly applicable — CE-marked device, "less artificial" conditions, intended-use population. |
| CRIT5 Evidence weight | 2 | Large prospective multi-reader study on market-approved device. |
| CRIT6 Risk of bias | 2 | 100-case curated dataset; limited phototype diversity; manufacturer co-authorship flag. |
| CRIT7 Contribution | 3 | Highest-applicability reference — demonstrates regulatory-grade diagnostic performance of a commercial CE-marked CNN. |
Aggregate: very strong.
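As a quick cross-check on the table above, the seven scores can be tallied. This is a minimal sketch: the unweighted sum and the ceiling of 3 per criterion are assumptions, and the note does not state how the "very strong" label is derived from the scores.

```python
# Scores copied from the CRIT1-7 table; the aggregation rule is assumed.
scores = {
    "CRIT1 Relevance": 3,
    "CRIT2 Methodology": 2,
    "CRIT3 Reporting": 3,
    "CRIT4 Applicability": 3,
    "CRIT5 Evidence weight": 2,
    "CRIT6 Risk of bias": 2,
    "CRIT7 Contribution": 3,
}

ASSUMED_MAX_PER_CRITERION = 3  # hypothetical ceiling, not stated in the note

total = sum(scores.values())
maximum = ASSUMED_MAX_PER_CRITERION * len(scores)
print(f"{total} of {maximum}")  # 18 of 21
```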
Limitations and notes
Manufacturer co-authorship; small (100-case) curated test set; limited phototype diversity; no patient-outcome follow-up.
Strength as anchor
Very strong: the reference of choice for the regulator-facing argument that a Class IIb dermatology CDS can meet diagnostic-accuracy endpoints adequate for safe deployment. It reports 95 % CIs, which raises its CRIT3 score relative to Esteva 2017 and Haenssle 2018.