Dick 2019 — Accuracy of computer-aided diagnosis of melanoma: a meta-analysis

Citation

Dick V, Sinz C, Mittlböck M, Kittler H, Tschandl P. Accuracy of computer-aided diagnosis of melanoma: a meta-analysis. JAMA Dermatol. 2019 Nov 1;155(11):1291–1299. DOI: 10.1001/jamadermatol.2019.1375. PMID 31215969.

Study design and population

Systematic review and bivariate random-effects meta-analysis of 132 computer-aided melanoma diagnosis studies (January 2002 – December 2018); 70 studies quantitatively pooled.

Reported metrics

Pooled CAD melanoma sensitivity 0.74 (95 % CI 0.66–0.80)
Pooled CAD melanoma specificity 0.84 (95 % CI 0.79–0.88)
Independent test sets: sensitivity 0.51 (95 % CI 0.34–0.69) vs. non-independent 0.82 (95 % CI 0.77–0.86), p < 0.001
CAD sensitivity approximately equivalent to that of dermatologists; CAD specificity ~10 percentage points lower (difference not statistically significant)

Surrogate-to-outcome linkage

Highest-weight aggregate evidence that AI-derived diagnostic accuracy (sensitivity/specificity) is the accepted endpoint class for melanoma recognition in dermatology AI. The drop in pooled sensitivity on independent test sets (0.51 vs. 0.82 on non-independent test sets) quantifies the external-validity gap that directly motivates post-market clinical follow-up (PMCF) performance monitoring under the EU MDR.
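The size of the independent-test-set drop can be made explicit with a short calculation. The following is a minimal sketch using only the two pooled sensitivity point estimates reported above; the function and variable names are illustrative assumptions, and the calculation ignores the confidence intervals and the bivariate model behind the pooled figures.

```python
# Pooled sensitivity point estimates copied from the reported metrics.
SENS_NON_INDEPENDENT = 0.82  # studies evaluated on non-independent test sets
SENS_INDEPENDENT = 0.51      # studies evaluated on independent test sets


def sensitivity_gap(non_independent, independent):
    """Return (absolute gap in percentage points, relative drop as a fraction).

    Hypothetical helper for illustration; point estimates only.
    """
    difference = non_independent - independent
    absolute_pp = difference * 100.0
    relative = difference / non_independent
    return absolute_pp, relative


gap_pp, gap_rel = sensitivity_gap(SENS_NON_INDEPENDENT, SENS_INDEPENDENT)
print(f"Absolute gap: {gap_pp:.0f} percentage points")
print(f"Relative drop: {gap_rel:.0%} of non-independent sensitivity")
```

On these point estimates the absolute gap is 31 percentage points, a bit more than a third of the sensitivity seen on non-independent test sets.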

CRIT1–7 appraisal

Criterion	Score	Justification
CRIT1 Relevance	3	Direct — CAD melanoma diagnostic accuracy.
CRIT2 Methodology	3	Systematic review with bivariate random-effects meta-analysis; independent-vs-non-independent test-set stratification.
CRIT3 Reporting	3	Pooled estimates with 95 % CIs; effect of test-set independence quantified.
CRIT4 Applicability	2	Aggregates predominantly curated-dataset studies; generalisability to primary-care populations limited.
CRIT5 Evidence weight	3	Meta-analysis — highest tier.
CRIT6 Risk of bias	2	High heterogeneity across primary studies; publication bias likely; limited phototype diversity in included studies.
CRIT7 Contribution	3	Core aggregate anchor for the accepted-surrogate claim; quantifies the independent-test-set gap as a declared PMCF target.

Aggregate: very strong.
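For a quick cross-check of the table, the scores can be tallied as below. This is only a sketch: the record does not state how the aggregate rating is derived from the individual scores, so the simple sum and the assumed maximum of 3 per criterion (the highest score appearing in the table) are illustrative assumptions, not the appraisal method itself.

```python
# Scores copied from the CRIT1-7 appraisal table.
crit_scores = {
    "CRIT1 Relevance": 3,
    "CRIT2 Methodology": 3,
    "CRIT3 Reporting": 3,
    "CRIT4 Applicability": 2,
    "CRIT5 Evidence weight": 3,
    "CRIT6 Risk of bias": 2,
    "CRIT7 Contribution": 3,
}

# Assumption: 3 is the top of the scale (it is the highest score in the table).
MAX_PER_CRITERION = 3

total = sum(crit_scores.values())
maximum = MAX_PER_CRITERION * len(crit_scores)

# Criteria that fall short of the assumed maximum.
below_max = [name for name, score in crit_scores.items() if score < MAX_PER_CRITERION]

print(f"Total: {total} of {maximum}")
print("Below maximum:", ", ".join(below_max))
```

The two criteria below the assumed maximum, applicability and risk of bias, are the same ones discussed under limitations.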

Limitations and notes

Heterogeneous primary studies; many from computer-science literature rather than clinical dermatology; publication bias; homogeneous phototype coverage.

Strength as anchor

Very strong — the highest-tier aggregate evidence in the domain. Used as the regulator-facing weight for the accepted-surrogate claim and the motivating citation for the phototype-bias + PMCF-performance-monitoring narrative.
