Appraisal log — CRIT1–7 rolling table
Rolling scoring table across all included references. Per-publication appraisal files live in `references/<domain>/<first-author-year-keyword>.md`. CRIT scores follow the MEDDEV 2.7/1 Rev 4 / CEP literature-review methodology: each criterion is scored 1 (weak) to 3 (strong).
Legend
- CRIT1 Relevance to device / indication / surrogate domain
- CRIT2 Quality of study methodology (design, sample size, controls)
- CRIT3 Quality of reporting (endpoint definitions, statistical analysis, 95 % CIs)
- CRIT4 Applicability to intended population (image-based dermatology, clinician-supervised use)
- CRIT5 Evidence weight (1 = retrospective / validation; 2 = RCT / prospective cohort / consensus; 3 = meta-analysis / systematic review / regulatory guideline)
- CRIT6 Risk of bias
- CRIT7 Contribution to specific surrogate-validity claim
Inclusion threshold: the aggregate across CRIT1–CRIT7 must show that the reference materially contributes to an anchor claim. Balancing references are intentionally included for completeness.
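The legend above can be sketched as a small data structure. This is an illustrative sketch only: the class name, field names, and the numeric threshold in `include` are assumptions, since the log states the inclusion rule qualitatively ("materially contributes to an anchor claim") rather than as a fixed cut-off.

```python
from dataclasses import dataclass

@dataclass
class Appraisal:
    """One row of the CRIT1-7 rolling table (illustrative sketch)."""
    reference: str
    scores: dict            # e.g. {"CRIT1": 3, ...}; None where a criterion is "n/a"
    balancing: bool = False

    def aggregate(self) -> int:
        # Sum only the scored criteria; "n/a" entries (None) are skipped,
        # as for the EMA 2004 and Olsen 2004 rows in Domain 2.
        return sum(v for v in self.scores.values() if v is not None)

def include(appraisal: Appraisal, threshold: int = 14) -> bool:
    # The numeric threshold is an assumed placeholder; balancing
    # references are retained regardless of their aggregate score.
    return appraisal.balancing or appraisal.aggregate() >= threshold
```

For example, the Esteva 2017 row (3, 2, 2, 2, 1, 2, 3) aggregates to 15 and clears the assumed threshold.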
Domain 1 — Diagnostic accuracy (benefit 7GH)
| Reference | C1 | C2 | C3 | C4 | C5 | C6 | C7 | Strength |
|---|---|---|---|---|---|---|---|---|
| Esteva 2017 (Nature) | 3 | 2 | 2 | 2 | 1 | 2 | 3 | Strong (landmark) |
| Haenssle 2018 (Ann Oncol) | 3 | 2 | 2 | 2 | 2 | 2 | 3 | Strong |
| Haenssle 2020 (Ann Oncol, CE-marked) | 3 | 2 | 3 | 3 | 2 | 2 | 3 | Very strong |
| Tschandl 2020 (Nat Med, human-AI collab) | 3 | 3 | 3 | 3 | 2 | 2 | 3 | Very strong |
| Liu 2020 (Nat Med, DLS) | 3 | 2 | 3 | 3 | 1 | 2 | 3 | Strong |
| Dick 2019 (JAMA Derm, meta-analysis) | 3 | 3 | 3 | 2 | 3 | 2 | 3 | Very strong |
| Salinas 2024 (NPJ Digit Med, SR/MA) | 3 | 3 | 3 | 3 | 3 | 2 | 3 | Very strong |
| Winkler 2023 (JAMA Derm, prospective) | 3 | 3 | 3 | 3 | 2 | 2 | 3 | Very strong |
| Gershenwald 2017 (CA Cancer J Clin, AJCC 8) | 3 | 3 | 3 | 3 | 2 | 2 | 3 | Very strong (outcome anchor) |
| Conic 2018 (JAAD, NCDB surgical delay) | 3 | 2 | 2 | 3 | 1 | 2 | 3 | Strong (outcome bridge) |
| Daneshjou 2022 (Sci Adv) — BALANCING | 3 | 3 | 3 | 3 | 1 | 2 | 3 | Very strong (balancing) |
| Han 2018 (J Invest Dermatol) — BALANCING | 3 | 2 | 2 | 3 | 1 | 2 | 3 | Strong (balancing) |
| Freeman 2020 (BMJ, SR) — BALANCING | 3 | 3 | 3 | 3 | 3 | 2 | 3 | Very strong (balancing) |
Domain 1 summary: 13 references (10 positive + 3 balancing). Minimum ≥ 8 cleared; target 10–12 exceeded. Evidence base anchored by 3 systematic reviews / meta-analyses (Dick 2019, Salinas 2024, Freeman 2020), 1 prospective clinical study (Winkler 2023), 4 landmark reader studies (Esteva 2017, Haenssle 2018/2020, Tschandl 2020), 1 large clinical-use validation (Liu 2020), 2 outcome-anchor references (Gershenwald 2017, Conic 2018), and 2 cross-ethnic/phototype balancing references (Daneshjou 2022, Han 2018).
Domain 2 — Severity scoring (benefit 5RB)
| Reference | C1 | C2 | C3 | C4 | C5 | C6 | C7 | Strength |
|---|---|---|---|---|---|---|---|---|
| EMA 2004 (CHMP/EWP/2454/02, psoriasis guideline) | 3 | n/a | 3 | 3 | 3 | 2 | 3 | Very strong (regulatory) |
| Schmitt 2014 (J Allergy Clin Immunol, HOME IV) | 3 | 3 | 3 | 3 | 3 | 2 | 3 | Very strong (consensus) |
| Simpson 2016 (NEJM, dupilumab SOLO 1/2) | 3 | 3 | 3 | 3 | 2 | 2 | 3 | Very strong (pivotal RCT) |
| King 2022 (NEJM, baricitinib BRAVE-AA) | 3 | 3 | 3 | 3 | 2 | 2 | 3 | Very strong (pivotal RCT) |
| Olsen 2004 (JAAD, SALT definition) | 3 | n/a | 2 | 3 | 1 | 2 | 3 | Strong (foundational) |
| Mattei 2014 (JEADV, PASI-DLQI SR) | 3 | 3 | 2 | 3 | 3 | 2 | 3 | Very strong (quantitative anchor) |
| Mrowietz 2011 (Arch Dermatol Res, European treat-to-target) | 3 | 2 | 3 | 3 | 2 | 2 | 3 | Strong (operational anchor) |
| Fink 2018 (JEADV, PASI variability) | 3 | 2 | 2 | 3 | 1 | 2 | 3 | Strong (reliability anchor) |
| Schaap 2022 (JEADV, CNN PASI) | 3 | 2 | 2 | 3 | 1 | 2 | 3 | Strong (AI analytic validity) |
| Huang 2023 (JMIR Derm, AI PASI SkinTeller) | 3 | 2 | 2 | 3 | 2 | 2 | 3 | Strong (AI outperforms mean dermatologist) |
Domain 2 summary: 10 references. Minimum ≥ 6 cleared; target 8–10 achieved. Evidence base anchored by 1 EU regulatory guideline (EMA 2004), 1 international consensus statement (Schmitt 2014, HOME IV), 2 pivotal phase-3 RCT papers (Simpson 2016 dupilumab, King 2022 baricitinib), 1 foundational instrument-definition paper (Olsen 2004), 2 quantitative-linkage references (Mattei 2014, Mrowietz 2011), 1 manual-reliability reference (Fink 2018), and 2 AI-PASI analytic-validity references (Schaap 2022, Huang 2023).
Domain 3 — Referral optimisation / care-pathway (benefit 3KX)
| Reference | C1 | C2 | C3 | C4 | C5 | C6 | C7 | Strength |
|---|---|---|---|---|---|---|---|---|
| Eminović 2009 (Arch Dermatol, cluster RCT) | 3 | 3 | 3 | 3 | 2 | 2 | 3 | Very strong (RCT, referral reduction) |
| Whited 2013 (J Telemed Telecare, RCT) | 3 | 3 | 2 | 2 | 2 | 2 | 3 | Strong (RCT, outcome equivalence) |
| Armstrong 2018 (JAMA Netw Open, equivalency RCT) | 3 | 3 | 3 | 3 | 2 | 2 | 3 | Very strong (RCT, chronic disease) |
| Finnane 2017 (JAMA Derm, SR) | 3 | 3 | 2 | 3 | 3 | 2 | 3 | Very strong (SR) |
| Chuchu 2018 (Cochrane, SR) | 3 | 3 | 3 | 3 | 3 | 2 | 3 | Very strong (Cochrane SR) |
| Giavina-Bianchi 2020 (eClinicalMedicine, 30K pts) | 3 | 2 | 2 | 2 | 1 | 2 | 3 | Strong (real-world wait-time anchor) |
| Moreno-Ramirez 2007 (Arch Dermatol, Seville) | 3 | 2 | 3 | 3 | 1 | 2 | 3 | Strong (EU wait-time anchor) |
| Snoswell 2016 (JAMA Derm, cost-effectiveness SR) | 3 | 3 | 2 | 3 | 3 | 2 | 3 | Very strong (health-economic SR) |
| Jain 2021 (JAMA Netw Open, AI triage PCP / NP) | 3 | 2 | 3 | 3 | 2 | 2 | 3 | Strong (AI-decision-support uplift) |
Domain 3 summary: 9 references. Minimum ≥ 6 cleared; target 8–10 achieved. Evidence base anchored by 3 RCTs (Eminović 2009, Whited 2013, Armstrong 2018), 3 systematic reviews (Finnane 2017, Chuchu 2018 Cochrane, Snoswell 2016), 2 large real-world wait-time references (Giavina-Bianchi 2020, Moreno-Ramirez 2007) and 1 AI-triage uplift reference (Jain 2021).
Aggregate coverage
| Domain | Minimum | Target | Achieved | Primary source of evidence weight |
|---|---|---|---|---|
| Diagnostic accuracy (7GH) | ≥ 8 | 10–12 | 13 | 3 SR/MA + 1 prospective + 4 reader studies + 2 outcome anchors + 3 balancing |
| Severity scoring (5RB) | ≥ 6 | 8–10 | 10 | 1 EMA guideline + 1 international consensus + 2 phase-3 RCTs + 2 AI-PASI validations |
| Referral optimisation (3KX) | ≥ 6 | 8–10 | 9 | 3 RCTs + 3 SRs + 2 real-world wait-time + 1 AI-triage |
| Total | ≥ 20 | 26–32 | 32 | |
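As a quick consistency check, the per-domain counts in the table above sum as follows. A minimal sketch: the dict keys mirror the benefit IDs in this log, and the values are copied from the table rather than recomputed from the per-domain tables.

```python
# Arithmetic cross-check of the aggregate-coverage table.
coverage = {
    "7GH": {"minimum": 8, "target": (10, 12), "achieved": 13},
    "5RB": {"minimum": 6, "target": (8, 10), "achieved": 10},
    "3KX": {"minimum": 6, "target": (8, 10), "achieved": 9},
}

total_achieved = sum(d["achieved"] for d in coverage.values())
assert total_achieved == 32  # matches the Total row
assert all(d["achieved"] >= d["minimum"] for d in coverage.values())
```

Note that Domain 1 (13 achieved) exceeds its stated 10–12 target band, which the check above deliberately does not flag as an error.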
Mandatory balancing references present:
- Daneshjou 2022 (phototype bias) — confirms generalisability limit
- Han 2018 (cross-ethnicity generalisation) — complementary phototype evidence
- Freeman 2020 (BMJ SR of smartphone apps) — confirms AI-dermatology heterogeneity across products; underpins device-specific clinical-data requirement
Declared evidence gaps (for CER and PMCF declaration)
- No AI-dermatology RCT with mortality or stage-shift as the primary endpoint. The surrogate-to-outcome chain rests on reader-study accuracy equivalence plus the independently established AJCC stage-survival gradient.
- Phototype-stratified prospective evidence is sparse. Daneshjou 2022 quantifies the gap and motivates PMCF subgroup-stratified performance monitoring.
- Long-term outcome data for automated severity scoring are thin. Analytic-validity evidence (Schaap 2022, Huang 2023) is strong, but durable DLQI / POEM outcome data following AI-PASI deployment are not yet available.
- AI-triage RCT evidence lags teledermatology RCT evidence. Eminović 2009, Whited 2013, and Armstrong 2018 test human-teledermatologist workflows; a direct RCT of AI triage vs. teledermatology has not yet been published.
These gaps are declared in surrogate-validity-review.md §8 and tracked as PMCF-plan commitments in R-TF-007-002.