Han 2018 — Clinical-image classification for benign and malignant tumours (cross-ethnicity) [BALANCING]

Citation

Han SS, Kim MS, Lim W, Park GH, Park I, Chang SE. Classification of the clinical images for benign and malignant cutaneous tumors using a deep learning algorithm. J Invest Dermatol. 2018 Jul;138(7):1529–1538. DOI: 10.1016/j.jid.2018.01.028. PMID 29428356.

Study design and population

Retrospective multi-dataset validation of a ResNet-152 CNN trained on the Asan (Korean, FST III–V) and Atlas datasets (19,398 images, 12 diagnostic categories). Performance was assessed on the Asan internal test set and on the Edinburgh (Caucasian) and Hallym external test sets. The classifier was compared with 16 Korean dermatologists on a 480-image subset.

Reported metrics

Asan internal test — AUC BCC 0.96 ± 0.01; SCC 0.83; IEC 0.82; melanoma 0.96
Edinburgh external test — AUC BCC 0.90; SCC 0.91; IEC 0.83; melanoma 0.88
Hallym external test — BCC sensitivity 87.1 % ± 6.0 %
95 % CIs not reported; the ± values are cross-validation SDs.

Surrogate-to-outcome linkage

Cross-ethnicity external-validation evidence: the melanoma AUC falls from 0.96 on the Asan internal test set (Korean) to 0.88 on the Edinburgh external test set (Caucasian), an absolute drop of 0.08 that illustrates the limits of generalisation. Diagnostic accuracy as a proxy for appropriate biopsy is therefore population-dependent: training-set ethnic composition materially affects performance, which has implications for how far the surrogate-to-outcome chain can be assumed to hold in under-represented phenotypes.
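The internal-to-external shift can be tabulated for all four reported classes, not only melanoma. The following is a minimal sketch using the point estimates from the reported metrics; the names `ASAN_INTERNAL`, `EDINBURGH_EXTERNAL` and `auc_shift` are illustrative and do not come from the paper. It ignores the reported SDs, so small differences should not be over-read.

```python
# AUC point estimates transcribed from the "Reported metrics" section.
# Variable and function names are hypothetical, chosen for illustration.
ASAN_INTERNAL = {"BCC": 0.96, "SCC": 0.83, "IEC": 0.82, "melanoma": 0.96}
EDINBURGH_EXTERNAL = {"BCC": 0.90, "SCC": 0.91, "IEC": 0.83, "melanoma": 0.88}


def auc_shift(internal, external):
    """Return {class: absolute AUC change} going from internal to external test."""
    return {
        label: round(external[label] - auc_internal, 2)
        for label, auc_internal in internal.items()
    }


SHIFTS = auc_shift(ASAN_INTERNAL, EDINBURGH_EXTERNAL)

if __name__ == "__main__":
    for label, delta in SHIFTS.items():
        direction = "lower" if delta < 0 else "higher"
        print(f"{label:<9} {delta:+.2f} AUC ({direction} on Edinburgh)")
```

On these figures the melanoma and BCC AUCs are lower on the Edinburgh set, while the SCC AUC is higher, so the direction of the shift is class-dependent rather than uniform.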

CRIT1–7 appraisal

Criterion	Score	Justification
CRIT1 Relevance	3	Direct — classifier performance across ethnic / phototype groups.
CRIT2 Methodology	2	Multi-dataset external validation; head-to-head dermatologist comparison; no prospective deployment.
CRIT3 Reporting	2	Per-dataset AUCs with SDs; no parametric CIs.
CRIT4 Applicability	3	Directly relevant to MDR Annex I §17.2 intended-population generalisability.
CRIT5 Evidence weight	1	Retrospective multi-dataset validation.
CRIT6 Risk of bias	2	Training-set ethnically homogeneous (Korean); external datasets vary; no outcome follow-up.
CRIT7 Contribution	3	Complementary balancing reference (with Daneshjou 2022) — explicit cross-ethnicity generalisability quantification.

Aggregate: strong (as balancing reference).
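For cross-checking against other appraisal records, the seven scores can be tallied as below. This is only a sketch: the source does not state how the aggregate label is derived from the individual scores, so no score-to-label mapping is modelled here, and the names `CRIT_SCORES`, `TOTAL`, `MEAN` and `LOWEST` are illustrative.

```python
# Scores transcribed from the CRIT1-7 table above.
# The rule that maps scores to the "strong" aggregate is not given in the
# source, so this sketch only summarises the raw scores.
CRIT_SCORES = {
    "CRIT1 Relevance": 3,
    "CRIT2 Methodology": 2,
    "CRIT3 Reporting": 2,
    "CRIT4 Applicability": 3,
    "CRIT5 Evidence weight": 1,
    "CRIT6 Risk of bias": 2,
    "CRIT7 Contribution": 3,
}

TOTAL = sum(CRIT_SCORES.values())
MEAN = TOTAL / len(CRIT_SCORES)
# Criterion with the lowest score, i.e. the weakest point of the record.
LOWEST = min(CRIT_SCORES, key=CRIT_SCORES.get)

if __name__ == "__main__":
    print(f"total = {TOTAL}, mean = {MEAN:.2f}, lowest = {LOWEST}")
```

The lowest-scoring criterion is CRIT5 (evidence weight), which matches the retrospective design noted in the table.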

Limitations and notes

Ethnic / skin-tone heterogeneity handled via dataset comparison rather than FST stratification; external dataset sizes modest.

Strength as anchor

Strong as a balancing reference. Complements Daneshjou 2022 by providing cross-ethnic (not just cross-FST) external-validation evidence. Confirms the spectrum-bias concern raised in the Dick 2019 meta-analysis, with population-level granularity.
