
Do we need this task?

Status (2026-04-21): Decision document. Written after the folder was set up and before significant Claude-time is spent appraising references. If you agree with the conclusion, close or rescope the task. If you disagree, record why at the bottom of this file and proceed.


TL;DR​

The task as currently scoped (≥ 20 references, 3 domains, full CRIT1–7 per reference, ~5–7 days of collaborative work) is over-scoped.

Three reasons:

  1. Horiana's §2.1.3 says "at least one of" — structured literature review OR formalised PMCF strategy. The PMCF-plan half is already delivered via R-TF-007-002 + task-3b5 Ingredients 1/3/5/6. Her own 2026-04-20 follow-up call did not reach §4.3; she did not endorse doing this pre-submission.
  2. R-TF-015-011 already contains the VCA literature. Roughly 15–20 diagnostic-accuracy references, 3–5 severity-scoring references, and 5–8 referral-optimisation references already live there with CRIT1–7 appraisal done. The task would largely restructure and reframe existing evidence, not add new evidence.
  3. BSI Round 1 did not raise this. There is no clinical-review non-conformity tied to surrogate-endpoint validity. §2.1.3 is a Horiana pre-emption of a Class IIb scrutiny risk — it strengthens posture, it does not close a BSI-flagged gap.

What the task would actually add (genuinely new, not already in R-TF-015-011): ~3–6 references anchoring the surrogate-to-outcome chain that the existing SotA doesn't cover — stage-at-detection → melanoma survival, regulatory acceptance of PASI/EASI/SCORAD as FDA/EMA endpoints, diagnostic-delay → outcome. Plus the causal-pathway paragraph Horiana explicitly required in the CER.

What it would NOT achieve:

  • No Round 1 BSI non-conformity closure (none is contingent on this).
  • No change to the Pillar 1 VCA acceptability verdict (the CER already closes VCA; Celine's §5 verdict already stated no expected bar to overall acceptability on the 9-study sample).
  • No new primary evidence — literature only.
  • No change to pillar mapping, rank assignments, integrator-responsibility language, acceptance-criteria thresholds, or any other Round 1 critical axis.

Recommendation: Rescope to a minimum-viable surrogate-validity anchor (1–2 days, ~8–12 targeted references focused on the 3 genuinely-new anchoring claims) + the CER causal-pathway paragraph that Horiana §2.1.3 requires regardless. Drop the full 20–32-reference structured review from the pre-submission scope. Park the full version as a Round 2 enhancement if BSI Round 1 feedback signals that surrogate-validity depth is a live issue.


The question, in full​

Horiana's 2026-04-17 Recommendations §2.1.3 gave two mitigations for the Class IIb indirect-benefit concern and said "Expectation: at least one of the following":

  1. Structured literature review on surrogate-endpoint validity.
  2. Formalised PMCF strategy with objectives and timelines for direct outcome data.

Plus, separately from the two mitigations, she said the CER should:

  • Clearly articulate the causal pathway (performance → decision → benefit).
  • Explicitly position the selected endpoints as clinically meaningful.

These two CER asks are non-optional: they must be met regardless of which of the two mitigations is chosen.

The current task plan is to deliver mitigation (1) even though mitigation (2) is already on the roadmap — on the rationale that "the safer posture for a Class IIb submission is to deliver both." That is an internal hedge, not a Horiana requirement.


What's already in the audit-visible documents​

R-TF-015-011 State of the Art — 1,242 lines​

Diagnostic accuracy (domain → 7GH). Already cited and appraised:

  • Haenssle 2018 (Annals of Oncology — CNN vs 58 dermatologists)
  • Maron 2019 (European Journal of Cancer — AI-based decision support for melanoma)
  • Maron 2020 (JMIR — human-AI collaboration on pigmented lesions)
  • Tschandl 2019 (Lancet Oncology — ML vs humans, international diagnostic study)
  • Tschandl 2020 (Nature Medicine — human-computer collaboration)
  • Han 2020, 2022 · Jain 2021 · Marsden 2024 · Ferris 2025 · Krakowski 2024 · Brinker 2019 · Marchetti 2019 · Ba 2022 · Chen 2024 · Barata 2023 · Zanchetta 2025 · Muñoz-López 2021
  • Pooled-malignancy meta-analysis (Maron 2019, Han 2020, Ahadi 2021, Tepedino 2024, Tschandl 2019) — AUC 0.778 baseline
  • Melanoma sub-domain meta-analysis — AUC 0.81 baseline (Maron 2019, Haenssle 2018, Barata 2023, Chen 2024, Maron 2020, Brinker 2019, Marchetti 2019, Brinker 2019b)

Severity scoring (domain → 5RB). Already cited and appraised:

  • IHS4 inter-observer reliability: Wiala 2024 · Goldfarb 2021 · Thorlacius 2019 — ICC 0.47 baseline, 0.70 acceptance criterion
  • Separate searches already done for PASI / SCORAD / UAS / IHS4 AI validation (S02, S03, S07)

Referral optimisation (domain → 3KX). Already cited and appraised:

  • Eminović 2009 (Archives of Dermatology — patient-assisted teledermatology referral rates)
  • Hsiao 2008 (JAAD — impact on outpatient care and referrals)
  • Knol 2006 (J Telemedicine and Telecare — RCT of teledermatology referral decision value)
  • Giavina-Bianchi 2020 · Mostafa 2022 · Roca 2022
  • Weighted-average analysis: referral adequacy improves from a 14% MD-unaided baseline to 24% with teledermatology
  • Waiting-time reduction: ~71% achievable; ≥50% set as conservative acceptance criterion
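The weighted-average baseline above is mechanical to reproduce. A minimal sketch in Python; the per-study adequacy rates and sample sizes below are placeholders for illustration, not the actual figures extracted from Eminović, Hsiao, and Knol in R-TF-015-011:

```python
# Sample-size-weighted average of per-study referral-adequacy rates.
# Study names are real (cited above); the rates and n values are
# PLACEHOLDERS, not the figures appraised in R-TF-015-011.
studies = [
    ("Eminovic 2009", 0.22, 300),  # (study, adequacy rate, sample size)
    ("Hsiao 2008",    0.25, 180),
    ("Knol 2006",     0.26, 120),
]

total_n = sum(n for _, _, n in studies)
weighted_adequacy = sum(rate * n for _, rate, n in studies) / total_n

print(f"Pooled n = {total_n}, weighted adequacy = {weighted_adequacy:.1%}")
```

The same pattern applies to the other pooled baselines in the SotA: each acceptance criterion is a weighted summary of the appraised studies, which keeps the derivation auditable.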

R-TF-015-003 Clinical Evaluation Report​

Already contains:

  • Explicit "Pillar 1: Valid Clinical Association (VCA)" subsection stating "No VCA gaps were identified for any claimed output."
  • "Acceptance Criteria Derivation from State of the Art" table linking each benefit to specific SotA articles, methodology (meta-analysis / weighted average), derived baseline, and acceptance criterion.
  • Per-benefit sub-criteria achievement statements with measured values vs SotA baselines (7GH: melanoma AUC 0.85 vs 0.81, pooled-malignancy AUC 0.8983 vs 0.778; 5RB: ICC achievement vs 0.47 baseline; 3KX: referral-adequacy achievement vs 14–24% baseline).
  • Route A (systematic literature review) as Rank 6–7 contribution to the evidence portfolio under MDCG 2020-6 Appendix III.

What the new task would actually add​

An honest reading of the task CLAUDE.md against the existing SotA puts the delta at roughly 80/20:

80% overlap with existing R-TF-015-011​

Diagnostic-accuracy AI-vs-human literature (Maron, Haenssle, Tschandl, etc.), severity-scale inter-rater variability studies (Wiala, Goldfarb, Thorlacius), teledermatology → referral-adequacy studies (Eminović, Hsiao, Knol) are all already in R-TF-015-011 with CRIT1–7 appraisal done.

If the new task re-curates these under a "surrogate-validity" frame, it produces a re-structured presentation of existing evidence. That is cosmetic, not substantive. It is a reframing of what R-TF-015-011 already does.

20% genuinely new anchoring material (the part that matters)​

Three bodies of literature that the existing SotA does not cover, because the existing SotA is benchmark-focused (what is the SotA performance level?) rather than outcome-anchor-focused (does improving this surrogate translate to patient benefit?):

  1. Stage-at-detection → survival (7GH anchor). Breslow-thickness → 5-year survival; AJCC 8th-edition staging; melanoma-specific mortality as a function of time-to-treatment. Classic oncology-outcome literature — not in the current SotA because the current SotA is about AI-vs-dermatologist accuracy, not about what happens to patients when diagnosis is delayed. This is the anchor that lets the CER say: "an AUC improvement of X on melanoma detection maps via Breslow-thickness → stage migration → survival."

  2. Regulatory acceptance of dermatology severity scales as surrogate endpoints (5RB anchor). FDA/EMA label-history of PASI 75/90/100 for psoriasis (Langley 2004, Mrowietz 2011), EASI 50/75/90 for atopic dermatitis (Simpson 2014), SCORAD in label text, SALT in alopecia areata. This is a classical surrogate-validation anchor because a scale that regulators accept as the primary endpoint in Phase III drug trials is, by definition, an accepted surrogate for treatment response. The existing SotA does not cite these regulatory histories — it cites severity-scale inter-rater studies, which is a different claim.

  3. Diagnostic-delay → patient outcome (all three anchors, especially 3KX). Dermatology diagnostic-delay literature, waiting-time → outcome studies, primary-care-to-specialist referral adequacy → diagnostic-delay → outcome. The existing SotA cites teledermatology → referral-adequacy; the delta is the next link in the chain — adequacy → outcome.

Delivering those three anchoring bodies is an ~8–12-reference targeted search, not a ≥ 20–32-reference full structured review.

Plus the non-optional CER edit​

Horiana's §2.1.3 also requires, regardless of which mitigation is chosen:

  • A causal-pathway paragraph in the CER (performance → decision → benefit).
  • An explicit clinically-meaningful positioning of the three surrogate endpoints.

These are CER prose edits, not literature-review work. A few hours at most. They must happen even if this task is cancelled.


So would it achieve something substantial?​

Yes and no; it depends on the yardstick.

| Yardstick | Verdict | Why |
| --- | --- | --- |
| Closes a Round 1 BSI non-conformity | No | BSI Round 1 did not raise a surrogate-endpoint NC. No Round 1 item is contingent on this. |
| Closes Horiana §2.1.3 bullet 1 | Partially | Horiana said "at least one of." The PMCF route (mitigation 2) alone already closes her expectation per her own wording. Adding mitigation 1 is belt-and-braces, not a closure. |
| Fulfils Horiana §2.1.3's non-optional CER asks (causal pathway + clinical-meaningfulness positioning) | Overkill | Those two asks are CER prose edits of a few hours. A 20–32-reference structured review is the wrong tool for them. |
| Strengthens the Pillar 1 VCA narrative | Marginally | ~80% of what would end up in the structured review is already in R-TF-015-011. The genuine delta is ~3 bodies of literature. |
| Pre-empts BSI's Nick-style challenge ("controlled-study accuracy doesn't automatically translate to practice outcomes") | Yes, but narrowly | Only the ~20% genuinely-new material (stage-at-detection → survival, regulatory endpoint acceptance, diagnostic-delay → outcome) actually rebuts Nick's challenge. The re-curated 80% does not — it's the same evidence he already considers indirect. |
| Produces a defensive asset for BSI Round 2 if indirect-benefit becomes a live issue | Yes | This is the strongest argument. A written, CRIT-appraised surrogate-validity review is a useful asset to have on file regardless of whether BSI asks for it. |
| Fits the one-week-to-resubmission window | No | Full task as scoped = 5–7 days of collaborative work. Under time pressure, quality risk on a ≥ 20-reference CRIT-appraised review is high. |
| Opportunity cost vs Items 2, 3, 6 | Adverse | Those items have actual Round 1 NCs with BSI-visible clocks. This task does not. |

The honest answer​

If you deliver the full task (≥ 20 references, 3 domains, full CRIT1–7 per reference), you will achieve:

  • A slightly stronger Pillar 1 VCA narrative than today. Not transformative, because R-TF-015-011 already has most of the evidence.
  • A defensive asset for Round 2, useful if and only if BSI Round 2 challenges indirect benefit directly.
  • The ~20% genuinely-new anchoring that would actually rebut Nick — but that is an ~8–12-reference delta, not a ≥ 20-reference undertaking.

What you will NOT achieve: closure of any Round 1 blocker, a verdict change on acceptability, or a response to a BSI-issued question.

If you deliver the minimum-viable version (the ~8–12 genuinely-new references + the CER causal-pathway paragraph + a named subsection in R-TF-015-011 that ties the existing SotA into a surrogate-validity frame):

  • You close the non-optional Horiana §2.1.3 CER asks.
  • You capture ~90–95% of the regulatory benefit of the full task.
  • You preserve the option to expand the review to full scope post-submission if Round 2 signals a need.
  • You spend 1–2 days, not 5–7.

Three scoping options​

Option A — Drop entirely​

Close the task. Rely on the PMCF mitigation (2) alone for Horiana §2.1.3. Do the CER causal-pathway paragraph (non-optional) as a standalone 1–2-hour edit.

  • Pros: zero additional work, no time-pressure quality risk, matches Horiana's "at least one of" wording literally.
  • Cons: no defensive asset if BSI Round 2 challenges indirect-benefit substance. No new anchoring for Nick's specific concern.
  • When to pick: if Item 2/3/6 workload is genuinely dominating the week and Horiana signals comfort with PMCF-alone in a later touch.

Option B — Minimum viable (RECOMMENDED)​

  1. Write the CER causal-pathway paragraph + clinical-meaningfulness positioning (non-optional, a few hours).
  2. Targeted ~8–12-reference search across the three genuinely-new anchoring claims:
    • Melanoma stage-at-detection → survival (3–4 references).
    • Regulatory acceptance of dermatology severity scales as surrogate endpoints (3–4 references).
    • Diagnostic-delay / referral-adequacy → outcome (3–4 references).
  3. Integrate as a named subsection "Surrogate endpoint validity" in R-TF-015-011, cross-referencing the existing SotA corpus without duplicating it.
  4. Cross-reference from the CER's Pillar 1 VCA subsection and from the CEP's evidence-hierarchy table.
  • Pros: captures ~90–95% of regulatory benefit at ~20% of cost; fits the week; avoids quality risk of large-N review under time pressure.
  • Cons: reads leaner than the 20–32-reference version; if BSI Round 2 asks for a full structured review, we would need to extend it then.
  • When to pick: default. This is the version that matches the actual risk profile.

Option C — Full task as currently scoped​

Proceed per existing CLAUDE.md in this folder — ≥ 20 references, 3 domains, full CRIT1–7 appraisal per reference, full structured review document.

  • Pros: most defensive; single-shot delivery; maximum Round-2 preparedness.
  • Cons: 5–7 days of collaborative work in the resubmission week; 80% overlap with existing SotA; time-pressure quality risk on the appraisal layer; no Round 1 NC closure to show for the effort.
  • When to pick: only if Horiana signals (in a follow-up touch) that this is material to her comfort on Class IIb sign-off, or if BSI Round 1 response suggests surrogate-validity is surfacing as a live concern.

Recommendation​

Option B — minimum viable.

Execution plan:

  1. Keep the folder but rescope CLAUDE.md to the minimum-viable target: 8–12 references across the three genuinely-new anchoring claims + the CER causal-pathway edit. Update the reference-count table accordingly.
  2. Do the CER edit first (non-optional per Horiana, independent of literature-review scope). This closes the Horiana §2.1.3 CER asks regardless of what happens to the task.
  3. Targeted sourcing: user supplies 3–4 references per claim focused on the specific anchoring body (not a re-do of the SotA accuracy literature).
  4. Short structured review (~5–10 pages, not 20–30). Named subsection in R-TF-015-011. CRIT appraisal kept, but proportionate.
  5. Preserve the path to Option C — if Round 2 signals a need, the folder scaffolding and any per-reference appraisal already done can be extended.
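Step 4's "proportionate" CRIT appraisal can still be kept machine-checkable. One way the rolling-table rows could be held is sketched below; nothing here is taken from the actual appraisal log — the field names and the 0–2 scoring scale are invented for illustration, and the real CRIT1–7 criterion definitions live in the appraisal-log document:

```python
from dataclasses import dataclass, field

@dataclass
class Appraisal:
    """One row of a CRIT1-7 rolling appraisal table (illustrative only).

    Criterion semantics are defined in the appraisal log, not here; scores
    use a hypothetical 0-2 scale (0 = fail, 1 = partial, 2 = pass).
    """
    reference: str       # e.g. "Langley 2004"
    anchoring_claim: str # which of the three genuinely-new claims it supports
    scores: dict = field(default_factory=dict)  # {"CRIT1": 2, ..., "CRIT7": 1}

    def complete(self) -> bool:
        # A row is complete once all seven criteria have been scored.
        return all(f"CRIT{i}" in self.scores for i in range(1, 8))

    def verdict(self) -> str:
        if not self.complete():
            return "pending"
        return "include" if min(self.scores.values()) >= 1 else "exclude"

row = Appraisal(
    reference="Langley 2004",
    anchoring_claim="regulatory acceptance of PASI as a surrogate endpoint",
    scores={f"CRIT{i}": 2 for i in range(1, 8)},
)
```

Keeping the appraisal in a structure like this preserves the path to Option C: extending to the full review later is adding rows, not redoing the scaffolding.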

Open questions for the user​

  • Do you agree that the PMCF mitigation (via R-TF-007-002 + task-3b5) plus the CER causal-pathway edit is a literal-wording closure of Horiana §2.1.3?
  • Is there a specific BSI reviewer signal (from the 2026-03-25 call or elsewhere) that makes surrogate-endpoint-validity depth a higher priority than it reads on paper?
  • If we scope this to Option B, do we need Horiana sign-off on the rescope, or is this a decision we own?
  • Does Option A (drop entirely) materially change the defensibility posture, or is the CER causal-pathway paragraph alone enough for Round 1?

If you disagree with this recommendation​

Record your rationale below this line. Document the yardstick of "substantial" you're using and why the cost-benefit comes out differently.

All the information contained in this QMS is confidential. The recipient agrees not to transmit or reproduce the information, neither by himself nor by third parties, through whichever means, without obtaining the prior written permission of Legit.Health (AI Labs Group S.L.)