Salinas 2024 — Systematic review and meta-analysis of AI vs. clinicians for skin cancer diagnosis
Citation
Salinas MP, Sepúlveda J, Hidalgo L, Peirano D, Morel M, Uribe P, et al. A systematic review and meta-analysis of artificial intelligence versus clinicians for skin cancer diagnosis. NPJ Digit Med. 2024 May 14;7(1):125. DOI: 10.1038/s41746-024-01103-x. PMID 38744955.
Study design and population
Pre-registered systematic review and meta-analysis (PRISMA 2020, QUADAS-2); 53 comparative studies included in the systematic review, 19 of them in the bivariate meta-analysis. AI algorithms were compared with clinicians on benign/malignant classification against a histopathology reference.
Reported metrics
- Pooled AI sensitivity 87.0 % (95 % CI 81.7–90.9); specificity 77.1 % (95 % CI 69.8–83.0)
- All clinicians — sensitivity 79.8 % (95 % CI 73.2–85.1); specificity 73.6 % (95 % CI 66.5–79.6)
- AI vs. expert dermatologists — clinically comparable (AI sens 86.3 %, spec 78.4 % vs. expert sens 84.2 %, spec 74.4 %)
- AI vs. generalists — AI markedly superior in sensitivity (92.5 % vs. 64.6 %)
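To make the pooled figures above more concrete, the following minimal sketch converts sensitivity and specificity into expected missed malignancies and false alarms per 1,000 lesions. The 5 % malignancy prevalence is a hypothetical value chosen only for illustration; it is not taken from the review. The function names are likewise illustrative. The pooled estimates come from different study subsets, so the output is an orientation aid rather than a head-to-head result.

```python
# Illustrative arithmetic only: ASSUMED_PREVALENCE is a hypothetical value,
# not a figure reported by Salinas 2024.
ASSUMED_PREVALENCE = 0.05

# Pooled point estimates transcribed from the "Reported metrics" list
# as (sensitivity, specificity).
POOLED = {
    "AI (pooled)": (0.870, 0.771),
    "All clinicians": (0.798, 0.736),
}

# Generalist subgroup: only sensitivity is listed in this appraisal.
GENERALIST_SUBGROUP_SENSITIVITY = {
    "AI": 0.925,
    "Generalists": 0.646,
}


def missed_malignancies(sensitivity, prevalence, n=1000):
    """Expected false negatives among n lesions."""
    return n * prevalence * (1.0 - sensitivity)


def false_alarms(specificity, prevalence, n=1000):
    """Expected false positives among n lesions."""
    return n * (1.0 - prevalence) * (1.0 - specificity)


if __name__ == "__main__":
    for reader, (sens, spec) in POOLED.items():
        fn = missed_malignancies(sens, ASSUMED_PREVALENCE)
        fp = false_alarms(spec, ASSUMED_PREVALENCE)
        print(f"{reader}: {fn:.1f} missed, {fp:.1f} false alarms per 1,000")

    for reader, sens in GENERALIST_SUBGROUP_SENSITIVITY.items():
        fn = missed_malignancies(sens, ASSUMED_PREVALENCE)
        print(f"{reader} (generalist subgroup): {fn:.1f} missed per 1,000")
```

Under the assumed 5 % prevalence, the pooled estimates correspond to roughly 6.5 (AI) versus 10.1 (all clinicians) missed malignancies per 1,000 lesions. A different prevalence scales these counts proportionally without changing the direction of the comparison.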
Surrogate-to-outcome linkage
At the meta-analytic level, the review indicates that AI diagnostic accuracy is comparable to that of expert dermatologists and markedly superior in sensitivity to that of non-specialists, who are the initial point of contact for most skin-lesion presentations. This supports the use of sensitivity/specificity as a surrogate for clinically relevant triage accuracy at the primary-care step.
CRIT1–7 appraisal
| Criterion | Score | Justification |
|---|---|---|
| CRIT1 Relevance | 3 | Direct: AI vs. clinicians for skin-cancer diagnosis. |
| CRIT2 Methodology | 3 | PRISMA 2020, pre-registered, QUADAS-2, bivariate meta-analysis. |
| CRIT3 Reporting | 3 | Pooled sensitivity/specificity with 95 % CIs by comparator subgroup. |
| CRIT4 Applicability | 3 | Subgroup analyses (expert vs. generalist) match the intended-use clinical context. |
| CRIT5 Evidence weight | 3 | Meta-analysis — highest tier. |
| CRIT6 Risk of bias | 2 | QUADAS-2 concerns in constituent studies; predominantly curated-dataset evaluations. |
| CRIT7 Contribution | 3 | Contemporary anchor for the accepted-surrogate and directional claims; adds the generalist-comparator data that Dick 2019 lacks. |
Aggregate: very strong.
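The aggregate can be cross-checked by summing the scores in the table. The sketch below assumes each criterion is scored on a scale with a maximum of 3; that maximum is inferred from the table and is not stated in this appraisal. The mapping from a total to the label "very strong" is also not defined here, so the code reports only the total and its share of the assumed maximum.

```python
# CRIT1-7 scores transcribed from the appraisal table.
CRIT_SCORES = {
    "CRIT1 Relevance": 3,
    "CRIT2 Methodology": 3,
    "CRIT3 Reporting": 3,
    "CRIT4 Applicability": 3,
    "CRIT5 Evidence weight": 3,
    "CRIT6 Risk of bias": 2,
    "CRIT7 Contribution": 3,
}

# Assumption: 3 is the top score per criterion (not stated in the appraisal).
ASSUMED_MAX_PER_CRITERION = 3


def aggregate(scores, max_per_criterion):
    """Return (total, assumed maximum, total as a fraction of the maximum)."""
    total = sum(scores.values())
    maximum = max_per_criterion * len(scores)
    return total, maximum, total / maximum


if __name__ == "__main__":
    total, maximum, fraction = aggregate(CRIT_SCORES, ASSUMED_MAX_PER_CRITERION)
    print(f"Aggregate score: {total}/{maximum} ({fraction:.0%})")
```

With the scores as tabulated, the total is 20 out of an assumed maximum of 21; the single point lost is on CRIT6 (risk of bias).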
Limitations and notes
Constituent studies largely use curated datasets; phototype coverage inconsistently reported; no direct patient-outcome linkage.
Strength as anchor
Very strong — complements Dick 2019 with more recent data (search to August 2022) and an explicit expert-vs-generalist comparator stratification that anchors the CDS primary-care-uplift argument.