Liu 2020 — A deep learning system for differential diagnosis of skin diseases

Citation

Liu Y, Jain A, Eng C, Way DH, Lee K, Bui P, et al. A deep learning system for differential diagnosis of skin diseases. Nat Med. 2020 Jun;26(6):900–908. DOI: 10.1038/s41591-020-0842-3. PMID 32424212.

Study design and population

Development and temporal-split validation of a deep-learning system (DLS) that classifies 26 common skin conditions, with a secondary prediction covering 419 conditions, on 16,114 de-identified teledermatology cases from 17 US primary-care-affiliated sites. Reader subset: 6 dermatologists, 6 primary care physicians (PCPs) and 6 nurse practitioners (NPs) on 963 cases.

Reported metrics

DLS top-1 accuracy 0.66 (95 % CI 0.63–0.68); top-3 0.90 (95 % CI 0.88–0.91)
DLS top-1 non-inferior to dermatologists (dermatologists 0.63; non-inferiority p < 0.001)
DLS top-1 superior to PCPs (0.44) and NPs (0.40), both p < 0.001
Reference standard: dermatologist consensus (not histopathology for most cases)
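
The size of the gap between the DLS and each reader group follows directly from the reported top-1 figures. The sketch below is illustrative arithmetic only: the function and variable names are hypothetical, the inputs are the rounded point estimates listed above, and the result describes standalone DLS accuracy against unaided readers, not clinician performance with DLS assistance.

```python
# Top-1 accuracies as reported in the list above (rounded point estimates).
DLS_TOP1 = 0.66
READER_TOP1 = {"dermatologists": 0.63, "PCPs": 0.44, "NPs": 0.40}


def accuracy_gap(dls: float, reader: float) -> tuple[float, float]:
    """Return the (absolute, relative) top-1 difference of the DLS over a reader group."""
    absolute = dls - reader
    relative = absolute / reader
    # Round to two decimals, matching the precision of the reported figures.
    return round(absolute, 2), round(relative, 2)


gaps = {group: accuracy_gap(DLS_TOP1, acc) for group, acc in READER_TOP1.items()}

for group, (absolute, relative) in gaps.items():
    print(f"{group}: +{absolute:.2f} absolute, +{relative:.0%} relative")
```

On these inputs the absolute gap is 0.22 for PCPs and 0.26 for NPs, against 0.03 for dermatologists.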

Surrogate-to-outcome linkage

Top-k differential-diagnosis accuracy serves as the surrogate for appropriate downstream management (referral, empirical therapy, biopsy). Standalone DLS accuracy was non-inferior to dermatologists and exceeded that of PCPs and NPs, the non-specialist groups most affected by diagnostic error in primary care. If that accuracy carries over to clinician-supervised use, it could plausibly reduce referral errors and diagnostic delay.
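
For readers unfamiliar with the surrogate, top-k accuracy can be sketched as the share of cases whose reference diagnosis appears among the first k entries of the ranked differential. The example below is a minimal sketch under a simplifying assumption of one reference diagnosis per case; the study's own matching rules may differ. The function name and the toy cases are invented for illustration and are not study data.

```python
from typing import Sequence


def top_k_accuracy(
    ranked_predictions: Sequence[Sequence[str]],
    reference: Sequence[str],
    k: int,
) -> float:
    """Fraction of cases whose reference diagnosis is among the first k predictions."""
    if len(ranked_predictions) != len(reference):
        raise ValueError("one ranked list per reference label is required")
    if not reference:
        raise ValueError("at least one case is required")
    hits = sum(ref in preds[:k] for preds, ref in zip(ranked_predictions, reference))
    return hits / len(reference)


# Toy data, invented for illustration only (not from the study).
ranked = [
    ["eczema", "psoriasis", "tinea"],
    ["acne", "rosacea", "folliculitis"],
    ["psoriasis", "eczema", "seborrheic dermatitis"],
    ["melanocytic nevus", "seborrheic keratosis", "melanoma"],
]
truth = ["eczema", "rosacea", "tinea", "seborrheic keratosis"]

top1 = top_k_accuracy(ranked, truth, k=1)
top3 = top_k_accuracy(ranked, truth, k=3)
print(f"top-1 = {top1:.2f}, top-3 = {top3:.2f}")
```

Top-3 is never lower than top-1 on the same cases, which is why the two figures should be read together rather than compared with each other.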

CRIT1–7 appraisal

Criterion	Score	Justification
CRIT1 Relevance	3	Direct match — DLS for dermatology differential diagnosis, clinician-supervised workflow.
CRIT2 Methodology	2	Large teledermatology corpus; temporal-split validation; comparator cohorts across experience tiers. Not prospective/RCT.
CRIT3 Reporting	3	Accuracy metrics with 95 % CIs reported.
CRIT4 Applicability	3	Intended-use population directly analogous — PCP/NP use with clinician supervision.
CRIT5 Evidence weight	1	Retrospective temporal-split validation.
CRIT6 Risk of bias	2	Reference standard is dermatologist consensus, not histopathology; ~3 % Fitzpatrick skin type (FST) V–VI; single US teledermatology service; industry (Google) funding.
CRIT7 Contribution	3	Core anchor — quantifies the AI-to-non-specialist accuracy uplift central to the directional claim.

Aggregate: strong.

Limitations and notes

Dermatologist-consensus reference standard; limited FST V–VI; single US teledermatology service; retrospective; industry funding.

Strength as anchor

Strong — one of the few large studies with a non-specialist comparator cohort, directly evidencing the primary-care diagnostic-uplift mechanism. Supports both diagnostic-accuracy and referral-optimisation domains (cross-referenced).
