Haenssle 2020 — Man against machine reloaded: market-approved CNN (Moleanalyzer Pro) vs 96 dermatologists

Citation

Haenssle HA, Fink C, Toberer F, Winkler J, Stolz W, Deinlein T, et al. Man against machine reloaded: performance of a market-approved convolutional neural network in classifying a broad spectrum of skin lesions in comparison with 96 dermatologists working under less artificial conditions. Ann Oncol. 2020 Jan;31(1):137–143. DOI: 10.1016/j.annonc.2019.10.013.

Study design and population

Two-level comparative reader study: CE-marked CNN Moleanalyzer Pro (FotoFinder) vs. 96 dermatologists across beginner, skilled and expert tiers, on 100 pigmented and non-pigmented lesion cases. Level I: dermoscopy only; Level II: dermoscopy plus clinical close-ups and textual context. Multinational European reader group.

Reported metrics

Level I dermatologist sensitivity 83.8 % (95 % CI 81.8–85.8); specificity 77.6 % (95 % CI 75.2–80.0)
Level II dermatologist sensitivity 90.6 % (95 % CI 89.3–92.0); specificity 82.4 % (95 % CI 80.5–84.3)
CNN sensitivity 95.0 % (95 % CI 83.5–98.6); specificity 76.7 % (95 % CI 64.6–85.6); AUC 0.918 (95 % CI 0.866–0.970)
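
As a plausibility check on the CNN intervals, the sketch below computes Wilson score intervals for a binomial proportion. The counts are an assumption inferred from the reported percentages (38 of 40 malignant cases and 46 of 60 benign cases would give 95.0 % and 76.7 %); they are not stated in this summary, and the interval method the authors used is likewise assumed rather than confirmed. The function name `wilson_interval` is illustrative.

```python
import math


def wilson_interval(successes: int, total: int, z: float = 1.96) -> tuple[float, float]:
    """Two-sided Wilson score interval for a binomial proportion (z = 1.96 for ~95 %)."""
    p = successes / total
    denom = 1 + z**2 / total
    centre = (p + z**2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return centre - half, centre + half


# Assumed counts, inferred from the reported percentages (not given in this summary):
# 38/40 malignant lesions flagged, 46/60 benign lesions correctly classified.
sens_lo, sens_hi = wilson_interval(38, 40)
spec_lo, spec_hi = wilson_interval(46, 60)

print(f"sensitivity {38 / 40:.1%} (95% CI {sens_lo:.1%} to {sens_hi:.1%})")
print(f"specificity {46 / 60:.1%} (95% CI {spec_lo:.1%} to {spec_hi:.1%})")
# sensitivity 95.0% (95% CI 83.5% to 98.6%)
# specificity 76.7% (95% CI 64.6% to 85.6%)
```

Under these assumed counts the Wilson intervals coincide with the reported CNN intervals. Their width reflects the small per-class case numbers in a 100-case test set.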

Surrogate-to-outcome linkage

The device tested is a CE-marked medical device, and the comparator dermatologists worked under "less artificial" conditions (dermoscopy plus clinical context), so the accuracy metrics serve directly as a regulatory-grade diagnostic-accuracy surrogate. AI assistance benefits less-experienced users most, which is the operational mechanism by which a Class IIb clinical decision support (CDS) device would raise appropriate-biopsy and referral rates when deployed at primary-care level.

CRIT1–7 appraisal

Criterion	Score	Justification
CRIT1 Relevance	3	CE-marked dermatology CNN; directly analogous regulatory context.
CRIT2 Methodology	2	96-reader prospective design across three experience tiers; reference standard histopathology.
CRIT3 Reporting	3	Point estimates with 95 % CIs reported for sensitivity, specificity and AUC.
CRIT4 Applicability	3	Highly applicable — CE-marked device, "less artificial" conditions, intended-use population.
CRIT5 Evidence weight	2	Large prospective multi-reader study on market-approved device.
CRIT6 Risk of bias	2	100-case curated dataset; limited phototype diversity; manufacturer co-authorship flag.
CRIT7 Contribution	3	Highest-applicability reference — demonstrates regulatory-grade diagnostic performance of a commercial CE-marked CNN.

Aggregate: very strong.
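
For orientation, the table scores can be totalled as in the sketch below. It assumes each criterion is scored out of 3 with equal weight; neither the scale nor the weighting is stated in this section, and the mapping from a total to the "very strong" band is not reproduced here.

```python
# CRIT1-7 scores copied from the appraisal table above.
scores = {
    "CRIT1 Relevance": 3,
    "CRIT2 Methodology": 2,
    "CRIT3 Reporting": 3,
    "CRIT4 Applicability": 3,
    "CRIT5 Evidence weight": 2,
    "CRIT6 Risk of bias": 2,
    "CRIT7 Contribution": 3,
}

MAX_PER_CRITERION = 3  # assumption: maximum score per criterion is 3

total = sum(scores.values())
max_total = MAX_PER_CRITERION * len(scores)
print(f"{total}/{max_total} ({total / max_total:.0%})")  # 18/21 (86%)
```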

Limitations and notes

Manufacturer co-authorship; small (100-case) curated test set; limited phototype diversity; no patient-outcome follow-up.

Strength as anchor

Very strong — reference of choice for the regulator-facing argument that a Class IIb dermatology CDS can meet diagnostic-accuracy endpoints adequate for safe deployment. Reports 95 % CIs, increasing CRIT3 score relative to Esteva 2017 / Haenssle 2018.
