Appraisal log — CRIT1–7 rolling table
Rolling scoring table across all included references. Per-publication appraisal files live in `references/<domain>/<first-author-year-keyword>.md`. CRIT scores follow the MEDDEV 2.7/1 Rev 4 / CEP literature-review methodology: each criterion is scored 1 (weak) to 3 (strong).
Legend
- CRIT1 Relevance to device / indication / surrogate domain
- CRIT2 Quality of study methodology (design, sample size, controls)
- CRIT3 Quality of reporting (endpoint definitions, statistical analysis, 95 % CIs)
- CRIT4 Applicability to intended population (image-based dermatology, clinician-supervised use)
- CRIT5 Evidence weight (1 = retrospective / validation; 2 = RCT / prospective cohort / consensus; 3 = meta-analysis / systematic review / regulatory guideline)
- CRIT6 Risk of bias
- CRIT7 Contribution to specific surrogate-validity claim
Inclusion threshold: the aggregate across CRIT1–CRIT7 must show that the reference materially contributes to an anchor claim. Balancing references are intentionally included for completeness.
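The legend above can be sketched as a small data structure. This is an illustrative sketch only: the class name, field names, and the numeric threshold in `include` are assumptions, since the log states the inclusion rule qualitatively ("materially contributes to an anchor claim") rather than as a fixed cut-off.

```python
from dataclasses import dataclass

@dataclass
class Appraisal:
    """One row of the CRIT1-7 rolling table (illustrative sketch)."""
    reference: str
    scores: dict            # e.g. {"CRIT1": 3, ...}; None where a criterion is "n/a"
    balancing: bool = False

    def aggregate(self) -> int:
        # Sum only the scored criteria; "n/a" entries (None) are skipped,
        # as for the EMA 2004 and Olsen 2004 rows in Domain 2.
        return sum(v for v in self.scores.values() if v is not None)

def include(appraisal: Appraisal, threshold: int = 14) -> bool:
    # The numeric threshold is an assumed placeholder; balancing
    # references are retained regardless of their aggregate score.
    return appraisal.balancing or appraisal.aggregate() >= threshold
```

For example, the Esteva 2017 row (3, 2, 2, 2, 1, 2, 3) aggregates to 15 and clears the assumed threshold.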
Domain 1 — Diagnostic accuracy (benefit 7GH)
| Reference | C1 | C2 | C3 | C4 | C5 | C6 | C7 | Strength |
|---|---|---|---|---|---|---|---|---|
| Esteva 2017 (Nature) | 3 | 2 | 2 | 2 | 1 | 2 | 3 | Strong (landmark) |
| Haenssle 2018 (Ann Oncol) | 3 | 2 | 2 | 2 | 2 | 2 | 3 | Strong |
| Haenssle 2020 (Ann Oncol, CE-marked) | 3 | 2 | 3 | 3 | 2 | 2 | 3 | Very strong |
| Tschandl 2020 (Nat Med, human-AI collab) | 3 | 3 | 3 | 3 | 2 | 2 | 3 | Very strong |
| Liu 2020 (Nat Med, DLS) | 3 | 2 | 3 | 3 | 1 | 2 | 3 | Strong |
| Dick 2019 (JAMA Derm, meta-analysis) | 3 | 3 | 3 | 2 | 3 | 2 | 3 | Very strong |
| Salinas 2024 (NPJ Digit Med, SR/MA) | 3 | 3 | 3 | 3 | 3 | 2 | 3 | Very strong |
| Winkler 2023 (JAMA Derm, prospective) | 3 | 3 | 3 | 3 | 2 | 2 | 3 | Very strong |
| Gershenwald 2017 (CA Cancer J Clin, AJCC 8) | 3 | 3 | 3 | 3 | 2 | 2 | 3 | Very strong (outcome anchor) |
| Conic 2018 (JAAD, NCDB surgical delay) | 3 | 2 | 2 | 3 | 1 | 2 | 3 | Strong (outcome bridge) |
| Daneshjou 2022 (Sci Adv) — BALANCING | 3 | 3 | 3 | 3 | 1 | 2 | 3 | Very strong (balancing) |
| Han 2018 (J Invest Dermatol) — BALANCING | 3 | 2 | 2 | 3 | 1 | 2 | 3 | Strong (balancing) |
| Freeman 2020 (BMJ, SR) — BALANCING | 3 | 3 | 3 | 3 | 3 | 2 | 3 | Very strong (balancing) |
Domain 1 summary: 13 references (10 positive + 3 balancing). Minimum ≥ 8 cleared; target 10–12 exceeded. Evidence base anchored by 3 systematic reviews / meta-analyses (Dick 2019, Salinas 2024, Freeman 2020), 1 prospective clinical study (Winkler 2023), 4 landmark reader studies (Esteva 2017, Haenssle 2018/2020, Tschandl 2020), 1 large clinical-use validation (Liu 2020), 2 outcome-anchor references (Gershenwald 2017, Conic 2018), and 2 cross-ethnic/phototype balancing references (Daneshjou 2022, Han 2018).
Domain 2 — Severity scoring (benefit 5RB)
| Reference | C1 | C2 | C3 | C4 | C5 | C6 | C7 | Strength |
|---|---|---|---|---|---|---|---|---|
| EMA 2004 (CHMP/EWP/2454/02, psoriasis guideline) | 3 | n/a | 3 | 3 | 3 | 2 | 3 | Very strong (regulatory) |
| Schmitt 2014 (J Allergy Clin Immunol, HOME IV) | 3 | 3 | 3 | 3 | 3 | 2 | 3 | Very strong (consensus) |
| Simpson 2016 (NEJM, dupilumab SOLO 1/2) | 3 | 3 | 3 | 3 | 2 | 2 | 3 | Very strong (pivotal RCT) |
| King 2022 (NEJM, baricitinib BRAVE-AA) | 3 | 3 | 3 | 3 | 2 | 2 | 3 | Very strong (pivotal RCT) |
| Olsen 2004 (JAAD, SALT definition) | 3 | n/a | 2 | 3 | 1 | 2 | 3 | Strong (foundational) |
| Mattei 2014 (JEADV, PASI-DLQI SR) | 3 | 3 | 2 | 3 | 3 | 2 | 3 | Very strong (quantitative anchor) |
| Mrowietz 2011 (Arch Dermatol Res, European treat-to-target) | 3 | 2 | 3 | 3 | 2 | 2 | 3 | Strong (operational anchor) |
| Fink 2018 (JEADV, PASI variability) | 3 | 2 | 2 | 3 | 1 | 2 | 3 | Strong (reliability anchor) |
| Schaap 2022 (JEADV, CNN PASI) | 3 | 2 | 2 | 3 | 1 | 2 | 3 | Strong (AI analytic validity) |
| Huang 2023 (JMIR Derm, AI PASI SkinTeller) | 3 | 2 | 2 | 3 | 2 | 2 | 3 | Strong (AI outperforms mean dermatologist) |
Domain 2 summary: 10 references. Minimum ≥ 6 cleared; target 8–10 achieved. Evidence base anchored by 1 EU regulatory guideline (EMA 2004), 1 international consensus statement (Schmitt 2014, HOME IV), 2 pivotal phase-3 RCT papers (Simpson 2016 dupilumab, King 2022 baricitinib), 1 foundational instrument-definition paper (Olsen 2004), 2 quantitative-linkage references (Mattei 2014, Mrowietz 2011), 1 manual-reliability reference (Fink 2018), and 2 AI-PASI analytic-validity references (Schaap 2022, Huang 2023).
Domain 3 — Referral optimisation / care-pathway (benefit 3KX)
| Reference | C1 | C2 | C3 | C4 | C5 | C6 | C7 | Strength |
|---|---|---|---|---|---|---|---|---|
| Eminović 2009 (Arch Dermatol, cluster RCT) | 3 | 3 | 3 | 3 | 2 | 2 | 3 | Very strong (RCT, referral reduction) |
| Whited 2013 (J Telemed Telecare, RCT) | 3 | 3 | 2 | 2 | 2 | 2 | 3 | Strong (RCT, outcome equivalence) |
| Armstrong 2018 (JAMA Netw Open, equivalency RCT) | 3 | 3 | 3 | 3 | 2 | 2 | 3 | Very strong (RCT, chronic disease) |
| Finnane 2017 (JAMA Derm, SR) | 3 | 3 | 2 | 3 | 3 | 2 | 3 | Very strong (SR) |
| Chuchu 2018 (Cochrane, SR) | 3 | 3 | 3 | 3 | 3 | 2 | 3 | Very strong (Cochrane SR) |
| Giavina-Bianchi 2020 (eClinicalMedicine, 30K pts) | 3 | 2 | 2 | 2 | 1 | 2 | 3 | Strong (real-world wait-time anchor) |
| Moreno-Ramirez 2007 (Arch Dermatol, Seville) | 3 | 2 | 3 | 3 | 1 | 2 | 3 | Strong (EU wait-time anchor) |
| Snoswell 2016 (JAMA Derm, cost-effectiveness SR) | 3 | 3 | 2 | 3 | 3 | 2 | 3 | Very strong (health-economic SR) |
| Jain 2021 (JAMA Netw Open, AI triage PCP / NP) | 3 | 2 | 3 | 3 | 2 | 2 | 3 | Strong (AI-decision-support uplift) |
Domain 3 summary: 9 references. Minimum ≥ 6 cleared; target 8–10 achieved. Evidence base anchored by 3 RCTs (Eminović 2009, Whited 2013, Armstrong 2018), 3 systematic reviews (Finnane 2017, Chuchu 2018 Cochrane, Snoswell 2016), 2 large real-world wait-time references (Giavina-Bianchi 2020, Moreno-Ramirez 2007) and 1 AI-triage uplift reference (Jain 2021).
Aggregate coverage
| Domain | Minimum | Target | Achieved | Primary source of evidence weight |
|---|---|---|---|---|
| Diagnostic accuracy (7GH) | ≥ 8 | 10–12 | 13 | 3 SR/MA + 1 prospective + 4 reader studies + 2 outcome anchors + 3 balancing |
| Severity scoring (5RB) | ≥ 6 | 8–10 | 10 | 1 EMA guideline + 1 international consensus + 2 phase-3 RCTs + 2 AI-PASI validations |
| Referral optimisation (3KX) | ≥ 6 | 8–10 | 9 | 3 RCTs + 3 SRs + 2 real-world wait-time + 1 AI-triage |
| Total | ≥ 20 | 26–32 | 32 | |
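As a quick consistency check, the per-domain counts in the table above sum as follows. A minimal sketch: the dict keys mirror the benefit IDs in this log, and the values are copied from the table rather than recomputed from the per-domain tables.

```python
# Arithmetic cross-check of the aggregate-coverage table.
coverage = {
    "7GH": {"minimum": 8, "target": (10, 12), "achieved": 13},
    "5RB": {"minimum": 6, "target": (8, 10), "achieved": 10},
    "3KX": {"minimum": 6, "target": (8, 10), "achieved": 9},
}

total_achieved = sum(d["achieved"] for d in coverage.values())
assert total_achieved == 32  # matches the Total row
assert all(d["achieved"] >= d["minimum"] for d in coverage.values())
```

Note that Domain 1 (13 achieved) exceeds its stated 10–12 target band, which the check above deliberately does not flag as an error.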
Mandatory balancing references present:
- Daneshjou 2022 (phototype bias) — confirms generalisability limit
- Han 2018 (cross-ethnicity generalisation) — complementary phototype evidence
- Freeman 2020 (BMJ SR of smartphone apps) — confirms AI-dermatology heterogeneity across products; underpins device-specific clinical-data requirement
Declared evidence gaps (for CER and PMCF declaration)
- No AI-dermatology RCT with mortality or stage-shift as the primary endpoint. The surrogate-to-outcome chain rests on reader-study accuracy equivalence plus the independently established AJCC stage-survival gradient.
- Phototype-stratified prospective evidence is sparse. Daneshjou 2022 quantifies the gap and motivates PMCF subgroup-stratified performance monitoring.
- Long-term outcome data for automated severity scoring are thin. Analytic-validity evidence (Schaap 2022, Huang 2023) is strong, but durable DLQI / POEM outcome data following AI-PASI deployment are not yet available.
- AI-triage RCT evidence lags teledermatology RCT evidence. Eminović 2009, Whited 2013, and Armstrong 2018 test human-teledermatologist workflows; a direct RCT of AI triage vs. teledermatology has not yet been published.
These gaps are declared in surrogate-validity-review.md §8 and tracked as PMCF-plan commitments in R-TF-007-002.