Liu 2020 — A deep learning system for differential diagnosis of skin diseases
Citation
Liu Y, Jain A, Eng C, Way DH, Lee K, Bui P, et al. A deep learning system for differential diagnosis of skin diseases. Nat Med. 2020 Jun;26(6):900–908. DOI: 10.1038/s41591-020-0842-3. PMID 32424212.
Study design and population
Development and temporal-split validation of a deep-learning system (DLS) that classifies 26 common skin conditions, with a secondary output covering 419 conditions, using 16,114 de-identified teledermatology cases from 17 US primary-care-affiliated sites. Reader comparison: 6 dermatologists, 6 primary care physicians (PCPs), and 6 nurse practitioners (NPs) on a 963-case subset.
Reported metrics
- DLS top-1 accuracy 0.66 (95 % CI 0.63–0.68); top-3 0.90 (95 % CI 0.88–0.91)
- DLS top-1 accuracy non-inferior to dermatologists (dermatologist top-1 0.63; p < 0.001)
- DLS top-1 accuracy superior to PCPs (0.44) and NPs (0.40) (p < 0.001)
- Reference standard: dermatologist consensus (not histopathology for most cases)
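As a rough plausibility check on the reported top-1 interval, the sketch below computes a Wilson score interval for a single proportion. It assumes, for illustration only, that the 0.66 figure refers to the 963-case reader subset and that cases can be treated as independent binomial trials; the function name `wilson_interval` and the variable names are hypothetical, and the paper's own interval method is not restated in this summary, so exact agreement with 0.63–0.68 is not expected.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 for ~95 %)."""
    p_hat = successes / n
    denom = 1 + z ** 2 / n
    centre = (p_hat + z ** 2 / (2 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n + z ** 2 / (4 * n ** 2)) / denom
    return centre - half_width, centre + half_width


# Assumption: 0.66 top-1 accuracy measured on the 963-case reader subset.
n_cases = 963
n_correct = round(0.66 * n_cases)  # approximate count of top-1 hits

low, high = wilson_interval(n_correct, n_cases)
print(f"top-1 accuracy {n_correct / n_cases:.2f}, approx. 95% CI {low:.2f}-{high:.2f}")
```

An interval of similar width to the reported one suggests the precision is consistent with a sample of this size; it does not reproduce the authors' analysis.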
Surrogate-to-outcome linkage
Top-k differential-diagnosis accuracy serves as the surrogate for appropriate downstream management (referral, empirical therapy, biopsy). The DLS's standalone top-1 accuracy is at dermatologist level and above that of non-specialists (PCPs and NPs), the group most affected by diagnostic error in primary care, so it could plausibly narrow that gap and reduce referral errors and diagnostic delay.
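For readers less familiar with the surrogate, a minimal sketch of top-k accuracy follows. It assumes each case has one reference diagnosis and a ranked list of predicted diagnoses; the function name `top_k_accuracy` and the four toy cases are hypothetical and are not data from the study, whose reference standard and matching rules are more involved.

```python
def top_k_accuracy(ranked_predictions, reference_labels, k):
    """Fraction of cases whose reference diagnosis is among the first k predictions."""
    if len(ranked_predictions) != len(reference_labels):
        raise ValueError("one ranked prediction list is required per reference label")
    hits = sum(
        reference in predictions[:k]
        for predictions, reference in zip(ranked_predictions, reference_labels)
    )
    return hits / len(reference_labels)


# Hypothetical toy data: ranked differentials for four cases.
predictions = [
    ["eczema", "psoriasis", "tinea"],
    ["acne", "rosacea", "folliculitis"],
    ["psoriasis", "eczema", "tinea"],
    ["melanocytic nevus", "seborrheic keratosis", "melanoma"],
]
references = ["eczema", "rosacea", "tinea", "melanoma"]

top1 = top_k_accuracy(predictions, references, k=1)
top3 = top_k_accuracy(predictions, references, k=3)
print(top1, top3)
```

Top-3 is never lower than top-1 on the same cases, which is why the two figures are reported as a pair.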
CRIT1–7 appraisal
| Criterion | Score | Justification |
|---|---|---|
| CRIT1 Relevance | 3 | Direct match — DLS for dermatology differential diagnosis, clinician-supervised workflow. |
| CRIT2 Methodology | 2 | Large teledermatology corpus; temporal-split validation; comparator cohorts across experience tiers. Not prospective/RCT. |
| CRIT3 Reporting | 3 | Accuracy metrics with 95 % CIs reported. |
| CRIT4 Applicability | 3 | Intended-use population directly analogous — PCP/NP use with clinician supervision. |
| CRIT5 Evidence weight | 1 | Retrospective temporal-split validation. |
| CRIT6 Risk of bias | 2 | Reference standard is dermatologist consensus, not histopathology; only ~3 % of cases are Fitzpatrick skin type (FST) V–VI; single US teledermatology service; industry (Google) funding. |
| CRIT7 Contribution | 3 | Core anchor — quantifies the AI-to-non-specialist accuracy uplift central to the directional claim. |
Aggregate: strong.
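The tally behind the aggregate can be sketched as below. The scores are copied from the table; the 0–3 scale maximum and the use of a simple unweighted sum are assumptions made for illustration, since the rule that maps scores to "strong" is not stated in this note.

```python
# Scores as listed in the CRIT1-7 table above.
crit_scores = {
    "CRIT1 Relevance": 3,
    "CRIT2 Methodology": 2,
    "CRIT3 Reporting": 3,
    "CRIT4 Applicability": 3,
    "CRIT5 Evidence weight": 1,
    "CRIT6 Risk of bias": 2,
    "CRIT7 Contribution": 3,
}

MAX_PER_CRITERION = 3  # assumption: each criterion is scored on a 0-3 scale

total = sum(crit_scores.values())
maximum = MAX_PER_CRITERION * len(crit_scores)
print(f"unweighted total: {total}/{maximum} ({total / maximum:.0%})")
```

The lowest single score is CRIT5, reflecting the retrospective design rather than any weakness in relevance or reporting.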
Limitations and notes
Dermatologist-consensus reference standard; limited representation of Fitzpatrick skin types V–VI; single US teledermatology service; retrospective design; industry funding.
Strength as anchor
Strong — one of the few large studies with a non-specialist comparator cohort, directly evidencing the primary-care diagnostic-uplift mechanism. Supports both diagnostic-accuracy and referral-optimisation domains (cross-referenced).