Winkler 2023 — Dermatologists cooperating with a CNN: prospective clinical study

Citation

Winkler JK, Blum A, Kommoss K, Enk A, Toberer F, Rosenberger A, Haenssle HA. Assessment of Diagnostic Performance of Dermatologists Cooperating With a Convolutional Neural Network in a Prospective Clinical Study: Human With Machine. JAMA Dermatol. 2023 Jun 1;159(6):621–627. DOI: 10.1001/jamadermatol.2023.0905. PMID 37133847.

Study design and population

Prospective two-centre clinical study. Twenty-two dermatologists evaluated 228 suspect melanocytic lesions, first unaided and then with support from a market-approved CNN (Moleanalyzer Pro, FotoFinder). A histopathological reference was available for 54.8 % of lesions.

Reported metrics

Sensitivity: dermatologist alone 84.2 % (95 % CI 69.6–92.6) → with CNN 100.0 % (95 % CI 90.8–100.0); p = 0.03
Specificity: 72.1 % → 83.7 %; p < 0.001
ROC AUC: 0.895 (95 % CI 0.836–0.954) → 0.968 (95 % CI 0.948–0.988); p = 0.005
CNN guidance reduced unnecessary excision of benign nevi by 19.2 %
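
The sensitivity intervals above can be sanity-checked with a short calculation. The sketch below computes Wilson score intervals for a binomial proportion. The counts are assumptions, not figures stated in this note: 38 melanomas, of which 32 were detected unaided and 38 with CNN support, back-calculated from the reported 84.2 % and 100.0 %. Whether the authors used the Wilson method is likewise an assumption.

```python
from math import sqrt
from statistics import NormalDist


def wilson_interval(successes: int, n: int, level: float = 0.95) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # two-sided critical value
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    # Clamp to [0, 1] to absorb floating-point overshoot at p = 0 or p = 1.
    return max(0.0, centre - half), min(1.0, centre + half)


# ASSUMPTION: counts are inferred from the reported percentages,
# they are not given in this note.
N_MELANOMA = 38
alone = wilson_interval(32, N_MELANOMA)     # dermatologist alone
with_cnn = wilson_interval(38, N_MELANOMA)  # dermatologist with CNN

for label, (lo, hi) in (("alone", alone), ("with CNN", with_cnn)):
    print(f"{label}: {100 * lo:.1f} to {100 * hi:.1f} %")
```

Under these assumed counts the intervals agree with the reported ones to one decimal place, which suggests the sensitivity estimates rest on a small number of melanomas and explains the wide interval for unaided reading.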

Surrogate-to-outcome linkage

Prospective, real-world evidence that the CNN-assisted gain in diagnostic accuracy translates into a measurable reduction in unnecessary procedures (19.2 % fewer excisions of benign nevi), while sensitivity rose to 100.0 %, i.e. no melanoma in the study sample was missed with CNN support. This closes the loop from the accuracy surrogate to a patient-relevant outcome, iatrogenic harm.

CRIT1–7 appraisal

Criterion	Score	Justification
CRIT1 Relevance	3	Prospective clinical study of CE-marked CNN in the intended clinician-supervised workflow.
CRIT2 Methodology	3	Prospective, two-centre; within-subject before-after design; histopathology reference.
CRIT3 Reporting	3	Sensitivity, specificity, AUC with 95 % CIs and p-values reported.
CRIT4 Applicability	3	Direct match — dermatologist + CNN in real clinical workflow.
CRIT5 Evidence weight	2	Prospective clinical study (not RCT, not meta-analysis).
CRIT6 Risk of bias	2	Within-subject design; two-centre; histopathology available for 54.8 % only.
CRIT7 Contribution	3	Core anchor — links accuracy uplift to reduced benign excisions, a patient-relevant outcome.

Aggregate: very strong.
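
For transparency, the seven criterion scores can be totalled as in the sketch below. It assumes that 3 is the top of the scale and that criteria are weighted equally; neither is stated in this note, and the rule that maps a total to the verbal grade "very strong" is not defined here, so the code reports only the sum.

```python
# Scores copied from the CRIT1–7 appraisal table.
scores = {
    "CRIT1 Relevance": 3,
    "CRIT2 Methodology": 3,
    "CRIT3 Reporting": 3,
    "CRIT4 Applicability": 3,
    "CRIT5 Evidence weight": 2,
    "CRIT6 Risk of bias": 2,
    "CRIT7 Contribution": 3,
}

# ASSUMPTION: each criterion is scored on a scale whose maximum is 3,
# and all criteria carry equal weight.
MAX_PER_CRITERION = 3

total = sum(scores.values())
maximum = MAX_PER_CRITERION * len(scores)
print(f"Total: {total}/{maximum}")
```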

Limitations and notes

Two-centre design; histopathological reference available for only 54.8 % of lesions; industry-affiliated device developer in author list.

Strength as anchor

Very strong — one of the few prospective real-world studies quantifying the patient-relevant outcome (avoided benign excisions) downstream of AI-supported accuracy. Complements Tschandl 2020 (simulated reader) with real-deployment evidence.
