Han 2018 — Clinical-image classification for benign and malignant tumours (cross-ethnicity) [BALANCING]
Citation
Han SS, Kim MS, Lim W, Park GH, Park I, Chang SE. Classification of the clinical images for benign and malignant cutaneous tumors using a deep learning algorithm. J Invest Dermatol. 2018 Jul;138(7):1529–1538. DOI: 10.1016/j.jid.2018.01.028. PMID 29428356.
Study design and population
Retrospective multi-dataset validation of a ResNet-152 CNN trained on the Asan (Korean, FST III–V) and Atlas datasets (~19,398 images, 12 diagnostic categories). Performance was evaluated on the internal Asan test set and on the external Edinburgh (Caucasian) and Hallym test sets. The classifier was compared with 16 Korean dermatologists on a 480-image subset.
Reported metrics
- Asan internal test — AUC BCC 0.96 ± 0.01; SCC 0.83; IEC 0.82; melanoma 0.96
- Edinburgh external test — AUC BCC 0.90; SCC 0.91; IEC 0.83; melanoma 0.88
- Hallym external test — BCC sensitivity 87.1 % ± 6.0 %
- 95 % CIs not reported; the ± values are cross-validation SDs
Surrogate-to-outcome linkage
Cross-ethnicity external-validation evidence: the melanoma AUC falls from 0.96 on the Asan internal test set (Asian training and test population) to 0.88 on the Edinburgh (Caucasian) test set, a drop of 0.08 that illustrates generalisation limits. Diagnostic accuracy as a proxy for appropriate biopsy is therefore population-dependent: the ethnic composition of the training set materially affects performance, which weakens the surrogate-to-outcome chain in under-represented phenotypes.
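The size of the internal-to-external shift can be tabulated per category from the AUCs listed under "Reported metrics". The following is a minimal sketch, not part of the source study: the function name `auc_shift` and the dictionary layout are illustrative assumptions, and the calculation uses point estimates only, since no CIs are reported.

```python
# AUC point estimates transcribed from the "Reported metrics" list.
asan_auc = {"BCC": 0.96, "SCC": 0.83, "IEC": 0.82, "melanoma": 0.96}
edinburgh_auc = {"BCC": 0.90, "SCC": 0.91, "IEC": 0.83, "melanoma": 0.88}


def auc_shift(internal, external):
    """Return (external - internal) AUC per category, rounded to 2 decimals.

    Hypothetical helper for illustration; a negative value means the
    classifier scored lower on the external test set.
    """
    return {cat: round(external[cat] - internal[cat], 2) for cat in internal}


shifts = auc_shift(asan_auc, edinburgh_auc)
for category, delta in shifts.items():
    print(f"{category}: {delta:+.2f}")
```

On these point estimates the shift is not uniform: BCC and melanoma are lower on Edinburgh, while SCC and IEC are not. Without CIs, none of these differences can be tested formally.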
CRIT1–7 appraisal
| Criterion | Score | Justification |
|---|---|---|
| CRIT1 Relevance | 3 | Direct — classifier performance across ethnic / phototype groups. |
| CRIT2 Methodology | 2 | Multi-dataset external validation; head-to-head dermatologist comparison; no prospective deployment. |
| CRIT3 Reporting | 2 | Per-dataset AUCs with SDs; no parametric CIs. |
| CRIT4 Applicability | 3 | Directly relevant to MDR Annex I §17.2 intended-population generalisability. |
| CRIT5 Evidence weight | 1 | Retrospective multi-dataset validation. |
| CRIT6 Risk of bias | 2 | Training-set ethnically homogeneous (Korean); external datasets vary; no outcome follow-up. |
| CRIT7 Contribution | 3 | Complementary balancing reference (with Daneshjou 2022) — explicit cross-ethnicity generalisability quantification. |
Aggregate: strong (as balancing reference).
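For traceability, the scores in the table can be tallied as below. This is a sketch under stated assumptions: a maximum of 3 points per criterion is assumed from the scores shown, and the rule that maps scores to the "strong" label is not stated in this record, so the code reports the total only and assigns no label.

```python
# CRIT1-7 scores transcribed from the appraisal table.
crit_scores = {
    "CRIT1": 3,  # Relevance
    "CRIT2": 2,  # Methodology
    "CRIT3": 2,  # Reporting
    "CRIT4": 3,  # Applicability
    "CRIT5": 1,  # Evidence weight
    "CRIT6": 2,  # Risk of bias
    "CRIT7": 3,  # Contribution
}

# Assumption: each criterion is scored out of 3 (not stated in this record).
MAX_PER_CRITERION = 3

total = sum(crit_scores.values())
maximum = MAX_PER_CRITERION * len(crit_scores)
print(f"Total: {total} / {maximum}")
```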
Limitations and notes
Ethnic / skin-tone heterogeneity is assessed by comparing datasets rather than by stratifying on FST; external dataset sizes are modest.
Strength as anchor
Strong as a balancing reference. Complements Daneshjou 2022 by providing cross-ethnic (not just cross-FST) external-validation evidence. Supports the spectrum-bias concern raised in the Dick 2019 meta-analysis, with population-level granularity.