Huang 2023 — AI-based PASI severity assessment: real-world study (SkinTeller)

Citation

Huang Y, Wei Q, Li Y, et al. Artificial Intelligence–Based Psoriasis Severity Assessment: Real-World Study With PASI as a Benchmark. JMIR Dermatol. 2023;6:e44932. DOI: 10.2196/44932.

Study design and population

Development and prospective validation of a deep-learning system for automated PASI scoring. Training set: 14,096 images from 2,367 patients. Internal validation cohort: 405 patients. Comparator: 43 experienced dermatologists from 18 hospitals. Subsequent real-world deployment via the SkinTeller app (3,369 uses across 18 hospitals).

Reported metrics

Mean absolute error (MAE) 2.05 PASI points using 3 input images
AI outperformed the 43-dermatologist mean by 33.2 % on PASI estimation
Lin's concordance correlation coefficient (CCC) ≈ 0.86; Pearson r ≈ 0.90 vs. trained-dermatologist PASI
Sub-score improvements: erythema 23 %, induration 7 %, desquamation 11 %, area ratio 12 %
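
For readers re-deriving agreement statistics of this kind from paired scores, the sketch below shows how MAE, Pearson r and Lin's concordance correlation coefficient are conventionally computed. It is illustrative only: the function name `agreement_metrics` and the sample scores are hypothetical and are not taken from Huang et al., and the population (1/n) form of variance and covariance is assumed.

```python
from math import sqrt


def agreement_metrics(ai_scores, ref_scores):
    """Return MAE, Pearson r and Lin's CCC for paired PASI scores.

    Assumption: population (1/n) variance and covariance are used.
    """
    n = len(ai_scores)
    if n != len(ref_scores) or n < 2:
        raise ValueError("need two equally long lists with at least 2 pairs")

    # Mean absolute error: average size of the per-patient disagreement.
    mae = sum(abs(a - r) for a, r in zip(ai_scores, ref_scores)) / n

    mean_a = sum(ai_scores) / n
    mean_r = sum(ref_scores) / n
    var_a = sum((a - mean_a) ** 2 for a in ai_scores) / n
    var_r = sum((r - mean_r) ** 2 for r in ref_scores) / n
    cov = sum((a - mean_a) * (r - mean_r)
              for a, r in zip(ai_scores, ref_scores)) / n

    # Pearson r measures linear association only.
    pearson_r = cov / sqrt(var_a * var_r)
    # Lin's CCC additionally penalises a systematic shift between the means.
    lin_ccc = 2 * cov / (var_a + var_r + (mean_a - mean_r) ** 2)

    return {"mae": mae, "pearson_r": pearson_r, "lin_ccc": lin_ccc}


if __name__ == "__main__":
    # Hypothetical scores, NOT study data: the "AI" reads 2 points high throughout.
    reference = [2.0, 4.0, 6.0, 8.0]
    ai = [4.0, 6.0, 8.0, 10.0]
    print(agreement_metrics(ai, reference))
```

In this hypothetical example a constant 2-point offset leaves Pearson r at 1 while lowering the CCC, which is why a concordance coefficient is the more informative of the two when the question is agreement rather than correlation.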

Surrogate-to-outcome linkage

Confirms that AI-automated PASI not only achieves acceptable concordance with expert-panel scoring but also reduces rater variability, directly supporting the clinical claim that automated severity scoring is a valid and (at sub-score level) superior surrogate for manual scoring. Real-world deployment data across 18 hospitals add ecological validity.

CRIT1–7 appraisal

Criterion	Score	Justification
CRIT1 Relevance	3	Direct — AI PASI, intended-use device modality.
CRIT2 Methodology	2	Large training set; prospective validation cohort; multi-centre dermatologist comparator (43 readers, 18 hospitals).
CRIT3 Reporting	2	MAE, concordance correlation and sub-score gains reported; 95 % CIs not all reported.
CRIT4 Applicability	3	Image-based, matches CDS modality; real-world deployment data.
CRIT5 Evidence weight	2	Prospective validation with large real-world comparator cohort.
CRIT6 Risk of bias	2	Single-country (China); reference standard is dermatologist scoring rather than biopsy; an MAE of 2.05 PASI points may still move individual patients across a treatment threshold.
CRIT7 Contribution	3	Strong modern anchor — AI PASI outperforms dermatologist mean, not merely matches it.

Aggregate: strong.

Limitations and notes

Single-country (China); dermatologist consensus used as the reference standard; an MAE of 2.05 PASI points can still move borderline patients across a treatment threshold; no stratification by skin phototype.
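
The threshold concern can be made concrete with a small check. The sketch below is a hedged illustration, not a method from the paper: the function name `may_cross_threshold` is hypothetical, the cutoff of PASI 10 is an assumed example value that the study does not specify, and the MAE is used as a rough "typical error" even though it is an average rather than a bound on any single patient's error.

```python
def may_cross_threshold(pasi_score, threshold=10.0, typical_error=2.05):
    """Flag scores lying strictly within `typical_error` of `threshold`.

    Assumptions (not from the study): `threshold` is an illustrative
    cutoff, and `typical_error` treats the reported MAE as a typical
    per-patient error, which understates the worst case.
    """
    return abs(pasi_score - threshold) < typical_error


if __name__ == "__main__":
    # Hypothetical automated scores, NOT study data.
    for score in (3.0, 9.0, 11.5, 15.0):
        status = "review manually" if may_cross_threshold(score) else "clear of cutoff"
        print(f"PASI {score:4.1f}: {status}")
```

Under these assumptions, scores close to the cutoff are the ones for which a typical error could change the treatment decision, while scores well above or below it are unaffected.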

Strength as anchor

Strong complement to Schaap 2022: where Schaap demonstrates CNN agreement within the physician range, Huang shows the CNN outperforming the mean of 43 dermatologists on overall PASI and on each sub-score. Together they sufficiently anchor the AI-PASI analytic-validity claim.
