Huang 2023 — AI-based PASI severity assessment: real-world study (SkinTeller)
Citation
Huang Y, Wei Q, Li Y, et al. Artificial Intelligence–Based Psoriasis Severity Assessment: Real-World Study With PASI as a Benchmark. JMIR Dermatol. 2023;6:e44932. DOI: 10.2196/44932.
Study design and population
Development and prospective validation of a deep-learning system for automated PASI scoring. Training set: 14,096 images from 2,367 patients. Internal validation cohort: 405 patients. Comparator: 43 experienced dermatologists from 18 hospitals. Subsequent real-world deployment via the SkinTeller app (3,369 uses across 18 hospitals).
Reported metrics
- Mean absolute error (MAE) 2.05 PASI points using 3 input images
- AI outperformed the 43-dermatologist mean by 33.2 % on PASI estimation
- Lin's concordance correlation coefficient (CCC) ≈ 0.86; Pearson r ≈ 0.90 vs. trained-dermatologist PASI
- Sub-score improvements: erythema 23 %, induration 7 %, desquamation 11 %, area ratio 12 %
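The agreement statistics listed above can be computed from paired scores as in the following minimal sketch. It assumes one AI PASI and one reference PASI per patient. The function name `agreement_metrics` and the sample values are hypothetical and are not taken from the paper.

```python
from math import sqrt
from statistics import fmean


def agreement_metrics(ai_scores, reference_scores):
    """Return MAE, Pearson r and Lin's CCC for paired PASI scores."""
    if len(ai_scores) != len(reference_scores) or len(ai_scores) < 2:
        raise ValueError("need two equal-length sequences with at least 2 pairs")
    n = len(ai_scores)
    mean_ai = fmean(ai_scores)
    mean_ref = fmean(reference_scores)
    # Population (1/n) moments, as in Lin's definition of the CCC.
    # Constant inputs (zero variance) are not handled in this sketch.
    var_ai = sum((x - mean_ai) ** 2 for x in ai_scores) / n
    var_ref = sum((y - mean_ref) ** 2 for y in reference_scores) / n
    cov = sum(
        (x - mean_ai) * (y - mean_ref)
        for x, y in zip(ai_scores, reference_scores)
    ) / n
    mae = fmean(abs(x - y) for x, y in zip(ai_scores, reference_scores))
    pearson_r = cov / sqrt(var_ai * var_ref)
    # The squared mean difference in the denominator penalises systematic bias.
    ccc = 2 * cov / (var_ai + var_ref + (mean_ai - mean_ref) ** 2)
    return {"mae": mae, "pearson_r": pearson_r, "ccc": ccc}


# Made-up scores, for illustration only (not data from the study).
ai = [1.0, 2.0, 3.0, 4.0]
reference = [3.0, 4.0, 5.0, 6.0]  # same ranking, constant +2 offset
print(agreement_metrics(ai, reference))
```

In this toy case Pearson r is 1 while the CCC is well below 1, because the constant offset counts against agreement but not against correlation. The CCC is never larger than |r|; the gap between them reflects a location or scale shift between the two sets of scores.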
Surrogate-to-outcome linkage
Shows that AI-automated PASI not only achieves acceptable concordance with expert-panel scoring but also reduces rater variability. This directly supports the clinical claim that automated severity scoring is a valid surrogate for manual scoring and, at sub-score level, a superior one. Real-world deployment data across 18 hospitals add ecological validity.
CRIT1–7 appraisal
| Criterion | Score | Justification |
|---|---|---|
| CRIT1 Relevance | 3 | Direct — AI PASI, intended-use device modality. |
| CRIT2 Methodology | 2 | Large training set; prospective validation cohort; multi-centre dermatologist comparator (43 readers, 18 hospitals). |
| CRIT3 Reporting | 2 | MAE, concordance correlation and sub-score gains reported; 95 % CIs not reported for all metrics. |
| CRIT4 Applicability | 3 | Image-based, matches CDS modality; real-world deployment data. |
| CRIT5 Evidence weight | 2 | Prospective validation with large real-world comparator cohort. |
| CRIT6 Risk of bias | 2 | Single-country (China); dermatologists used for ground truth rather than biopsy; an MAE of 2.05 PASI points may still move individual patients across a treatment threshold. |
| CRIT7 Contribution | 3 | Strong modern anchor: AI PASI outperforms the dermatologist mean rather than merely matching it. |
Aggregate: strong.
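The note does not state how the aggregate rating is derived from the seven scores. As a rough cross-check, the sketch below simply sums the scores from the table and compares the sum with the maximum, assuming each criterion tops out at 3. Both the summation and the maximum are assumptions, not a documented scoring rule.

```python
# CRIT1-7 scores copied from the appraisal table above.
crit_scores = {
    "CRIT1 Relevance": 3,
    "CRIT2 Methodology": 2,
    "CRIT3 Reporting": 2,
    "CRIT4 Applicability": 3,
    "CRIT5 Evidence weight": 2,
    "CRIT6 Risk of bias": 2,
    "CRIT7 Contribution": 3,
}

MAX_PER_CRITERION = 3  # assumption: the scale tops out at 3

total = sum(crit_scores.values())
maximum = MAX_PER_CRITERION * len(crit_scores)
print(f"{total}/{maximum} ({total / maximum:.0%})")  # prints "17/21 (81%)"
```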
Limitations and notes
Single-country (China); dermatologist consensus reference standard; an MAE of 2.05 points can still flip treatment decisions for patients near a threshold; no phototype stratification.
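The borderline-threshold concern can be made concrete with a small sketch that flags AI scores lying within one MAE of a cutoff. The cutoff of PASI 10 is an illustrative assumption and is not taken from the paper; the function name `may_cross_threshold` is hypothetical. The MAE is an average error, not a bound, so this is a coarse screen rather than a confidence interval.

```python
def may_cross_threshold(ai_pasi, threshold=10.0, mae=2.05):
    """Return True if the threshold lies within +/- mae of the AI score."""
    return abs(ai_pasi - threshold) <= mae


# Illustrative scores only (not study data).
for score in (6.0, 9.0, 11.5, 15.0):
    print(score, may_cross_threshold(score))
# 6.0 False
# 9.0 True
# 11.5 True
# 15.0 False
```

Scores flagged `True` are those for which an error of typical size could place the patient on the other side of the cutoff, so they are candidates for manual review.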
Strength as anchor
Strong complement to Schaap 2022: where Schaap demonstrates CNN agreement within the range of physician raters, Huang shows the CNN beating the dermatologist mean, on overall PASI estimation and at sub-score level, across 43 readers. Together they are sufficient to anchor the AI-PASI analytic-validity claim.