
Research prompts — external deep-research tools

Prompts prepared on 2026-04-20 to seed the surrogate-endpoint literature review with candidate peer-reviewed references. The user runs these in three external tools (Perplexity Deep Research, Gemini Deep Research, Claude/Opus Deep Research), pastes the outputs back into this task, and Claude performs CRIT1–7 appraisal, populates references/<domain>/<author-year-keyword>.md, and drafts surrogate-validity-review.md.
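The per-reference filename convention above can be sketched as a tiny helper. This is an illustrative sketch, not part of the QMS: the function name and the slugging rules (lowercase, hyphen-joined) are assumptions; only the references/&lt;domain&gt;/&lt;author-year-keyword&gt;.md pattern comes from this task.

```python
# Hypothetical helper: build the references/<domain>/<author-year-keyword>.md
# path used for appraised references. Slug details are an assumption.
def reference_path(domain: str, author: str, year: int, keyword: str) -> str:
    """Return the per-reference file path for an appraised citation."""
    slug = f"{author}-{year}-{keyword}".lower().replace(" ", "-")
    return f"references/{domain}/{slug}.md"

print(reference_path("diagnostic-accuracy", "Esteva", 2017, "melanoma"))
# references/diagnostic-accuracy/esteva-2017-melanoma.md
```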

Each prompt is self-contained and copy-paste ready. Framing is varied so the three tools complement rather than duplicate each other:

  • Perplexity — breadth, fast retrieval with DOIs.
  • Gemini Deep Research — depth, structured multi-step survey with a written report.
  • Opus / Claude Deep Research — reasoning-heavy appraisal with pre-graded evidence weights, optimised for regulatory defensibility.

Do not edit the bracketed domain headers or the CRIT-aligned field requests — they map 1:1 to the folder structure and to the CER's CRIT1–7 appraisal methodology.


Prompt 1 — Perplexity (Deep Research mode)

You are assembling a regulatory literature bibliography for a Class IIb CE-marked AI-based dermatology clinical decision support device under MDR Annex I/XIV and MDCG 2020-1 "Pillar 1 — Valid Clinical Association". The device's clinical benefit is demonstrated indirectly via three surrogate-endpoint families, and I need peer-reviewed evidence that each of those surrogates is an accepted proxy for a patient-relevant outcome.

Task: produce a structured bibliography of peer-reviewed references, grouped by the three domains below. For each reference give full citation (authors, year, title, journal, volume/pages, DOI) and a 2–3 sentence extract stating (i) the surrogate-to-outcome linkage it supports and (ii) the quantitative claim if any (effect size, AUC, HR, % reduction, with 95% CI where reported).

Domains (minimum counts are hard floors; target counts are ideal):

[DIAGNOSTIC ACCURACY] min 8, target 10–12.
Questions to anchor evidence for:
- Is diagnostic accuracy (sensitivity/specificity/AUC/concordance with histopathology) an accepted surrogate in dermatology/AI-dermatology literature?
- Do improvements in diagnostic accuracy translate into earlier appropriate treatment, reduced diagnostic delay, or improved stage-at-detection for skin cancer (melanoma and NMSC)?
- Is there meta-analytic or large-cohort evidence linking AI-assisted or teledermatology diagnostic concordance to downstream clinical management or outcomes?
Must-have landmarks to cover if you can retrieve them: Esteva 2017 (Nature), Haenssle 2018, Tschandl 2020 (Nature Medicine), Liu 2020 meta-analysis of AI diagnostic accuracy, plus stage-at-detection → survival evidence for melanoma (SEER / AJCC-based).
Also include at least 1–2 balancing references (generalisability limits, phototype bias, e.g. Daneshjou 2022, Han 2018).

[SEVERITY SCORING] min 6, target 8–10.
Questions to anchor evidence for:
- Are established severity scales (PASI, EASI, SCORAD, IGA, SALT, GAGS) accepted clinical endpoints in regulatory drug-approval pathways (FDA/EMA)?
- Does objective/automated/digital severity scoring show acceptable concordance (ICC, kappa, correlation) with expert-panel scoring?
- Does tighter severity tracking drive better treatment titration and improved PRO / disease-control outcomes (DLQI, POEM, clinical remission)?
Prefer regulatory-acceptance history papers, inter-observer-variability studies, and validation studies of objective/digital severity scoring.

[REFERRAL OPTIMISATION / CARE-PATHWAY METRICS] min 6, target 8–10.
Questions to anchor evidence for:
- Do teledermatology / AI-triage interventions improve referral appropriateness, waiting times, and access to specialist care?
- Is there outcome-equivalence evidence for remote vs in-person dermatology assessment (especially skin cancer, chronic inflammatory disease)?
- Are there health-economic / pragmatic trial results on AI or teledermatology triage at primary-care level?
Must-have landmarks if available: Armstrong 2021, Bashshur 2015, recent Cochrane or systematic reviews on teledermatology.

Output format:
- Three sections, one per [DOMAIN].
- Within each domain, one bullet per reference: full citation → DOI → 2–3 sentence extract covering linkage + quantitative claim.
- Strongly prefer meta-analyses, systematic reviews, landmark RCTs, large prospective cohorts, and regulatory-endpoint validation papers. Avoid editorials, narrative reviews without new data, preprints without peer review, and non-English sources unless landmark.
- End with a short "Gaps / thin areas" paragraph flagging where the literature is weakest per domain.

Return only the bibliography and the gaps paragraph. Do not summarise the device or the regulatory context back to me.

Prompt 2 — Gemini Deep Research

Run a deep research survey producing a structured report on the validity of three surrogate-endpoint families as accepted proxies for patient-relevant outcomes in dermatology. The regulatory context is MDCG 2020-1 "Pillar 1 — Valid Clinical Association" for a Class IIb CE-marked AI-based dermatology clinical decision support device; the research will anchor the Pillar 1 literature review of the device's Clinical Evaluation Report. Scope is peer-reviewed evidence only.

Structure the final report in exactly three parts, one per surrogate-endpoint domain. Each part must independently answer three anchor questions, in order:

(1) Accepted-surrogate claim — is this surrogate accepted as a clinical endpoint in the peer-reviewed dermatology / regulatory literature? Cite validation and regulatory-acceptance history.
(2) Directional claim — are improvements in the surrogate associated with improvements in patient-relevant outcomes (earlier diagnosis, better disease control, faster access to specialist care)?
(3) Quantitative claim — where has the magnitude of the surrogate-to-outcome association been estimated in at least one peer-reviewed source? Report effect sizes with 95% CI where available.

Domains:

A. DIAGNOSTIC ACCURACY (sensitivity, specificity, AUC, top-1/top-5 concordance with histopathology reference standard; AI-assisted vs unassisted reader performance; teledermatology concordance). Patient-relevant outcomes: earlier/more accurate diagnosis → earlier appropriate treatment → reduced morbidity/mortality, the most load-bearing link being skin cancer stage-at-detection → survival. Minimum 8 peer-reviewed references, target 10–12. Cover meta-analyses of AI dermatology diagnostic performance, teledermatology concordance → outcome studies, and stage-at-detection → survival evidence for melanoma and non-melanoma skin cancer. Include balancing references on generalisability limits and phototype bias.

B. SEVERITY SCORING (inter-observer agreement, concordance of objective/automated/digital scoring with expert panels, regulatory acceptance of PASI, EASI, SCORAD, IGA, SALT, GAGS in drug approvals). Patient-relevant outcomes: improved treatment titration and disease control, improved PROs (DLQI, POEM), higher clinical remission rates. Minimum 6 references, target 8–10. Cover FDA/EMA regulatory-endpoint history, inter-observer variability studies, objective/digital severity validation, and severity-driven treatment-escalation outcome studies.

C. REFERRAL OPTIMISATION / CARE-PATHWAY METRICS (referral appropriateness rate, waiting-time reduction, remote-assessment adequacy, proportion manageable remotely). Patient-relevant outcomes: faster specialist access for genuine cases, equitable access, reduced system bottlenecks, unchanged-or-improved outcomes at lower cost. Minimum 6 references, target 8–10. Cover teledermatology waiting-time and referral-appropriateness systematic reviews, access-to-care outcome studies (skin cancer, chronic inflammatory disease), remote-care outcome-equivalence studies, and health-economic / pragmatic trials of AI or teledermatology triage.

Evidence-quality preference order: meta-analysis > systematic review > landmark RCT > large prospective cohort > regulatory-endpoint validation paper > retrospective cohort. Exclude editorials, narrative reviews that add no new data, non-peer-reviewed preprints, and non-English sources unless landmark.

For each reference, report: full citation (authors, year, title, journal, volume/pages, DOI), study design, study population, primary metric with 95% CI, and a two-sentence extract of the surrogate-to-outcome linkage it supports.

Close with a cross-domain synthesis paragraph articulating the causal pathway: diagnostic accuracy → clinical decision-making → expected clinical and organisational benefits, with severity tracking and referral optimisation as the two organisational-outcome channels. End with a "residual uncertainty" paragraph flagging where evidence is thinnest.

Prompt 3 — Opus / Claude Deep Research

Produce a structured, regulatory-defensible bibliography supporting a "Valid Clinical Association" (MDCG 2020-1 Pillar 1) argument for three surrogate-endpoint families used by a Class IIb AI-based dermatology clinical decision support device under MDR. I will use the output directly as the reference pool for a structured literature review; after you return it, I will apply a CRIT1–7 appraisal and write the synthesis prose.

Your job: retrieve peer-reviewed evidence that each of the three surrogate domains below is an accepted proxy for a patient-relevant outcome in dermatology, and pre-structure the references in a way that makes appraisal frictionless.

For each reference you return, provide these fields:
1. Full citation (authors, year, title, journal, volume/pages, DOI).
2. Study design and population (n, setting, country/region if relevant).
3. Primary metric with 95% CI (effect size, AUC, HR, ICC, kappa, % reduction, etc.).
4. Surrogate-to-outcome linkage in two sentences — explicitly stating which patient-relevant outcome is being anchored.
5. Pre-graded evidence weight on a 1–3 scale under the CRIT5 criterion: 3 = meta-analysis or systematic review, 2 = RCT or large prospective cohort, 1 = retrospective cohort / case series / validation study.
6. Known limitations or risk-of-bias concerns in one sentence.
7. One-line "why this anchors the claim" — what specific sub-claim of the surrogate-to-outcome argument it supports.
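The CRIT5 pre-grading rule in field 5 can be expressed as a one-line mapping. A minimal sketch (the function name is hypothetical; the 1–3 weights and design categories come from the prompt above):

```python
# Illustrative sketch, not part of the QMS: map a study-design label to the
# 1-3 CRIT5 evidence weight defined in field 5 of the prompt.
def crit5_weight(design: str) -> int:
    """Return the CRIT5 evidence weight for a study-design label."""
    design = design.lower()
    if "meta-analysis" in design or "systematic review" in design:
        return 3  # meta-analysis or systematic review
    if "rct" in design or "prospective cohort" in design:
        return 2  # RCT or large prospective cohort
    return 1      # retrospective cohort / case series / validation study

print(crit5_weight("Systematic review and meta-analysis"))  # 3
print(crit5_weight("Multicentre RCT"))                      # 2
print(crit5_weight("Retrospective validation study"))       # 1
```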

Three domains, each with a hard minimum and target:

DOMAIN 1 — Diagnostic accuracy as a proxy for clinical outcome (min 8, target 10–12).
Anchor-claim sub-targets to hit:
(a) Diagnostic accuracy is an accepted proxy for downstream clinical management in dermatology / AI-dermatology.
(b) Improved accuracy translates into earlier appropriate treatment and/or reduced diagnostic delay.
(c) Stage-at-detection for skin cancer (melanoma, NMSC) predicts survival, so earlier detection is on the causal pathway to patient benefit.
(d) AI-assisted and teledermatology concordance with reference standard is consistent with, or superior to, unassisted clinicians in representative populations.
Include at least one balancing reference on generalisability / phototype bias (Daneshjou 2022 or equivalent) and one on AI dermatology classifier limits (Han 2018 or equivalent).

DOMAIN 2 — Severity scoring as a proxy for treatment optimisation (min 6, target 8–10).
Anchor-claim sub-targets:
(a) PASI, EASI, SCORAD, IGA, SALT, GAGS are accepted regulatory endpoints in FDA/EMA dermatology drug approvals.
(b) Manual expert severity scoring has non-trivial inter-observer variability that affects treatment decisions.
(c) Objective / automated / digital severity scoring shows acceptable concordance with expert consensus.
(d) Severity-driven treatment escalation improves disease control and PRO outcomes.

DOMAIN 3 — Referral optimisation / care-pathway metrics as a proxy for access-to-care and system outcomes (min 6, target 8–10).
Anchor-claim sub-targets:
(a) Teledermatology / AI triage improves referral appropriateness and reduces waiting times.
(b) Remote-care outcomes are equivalent to in-person for relevant dermatology indications.
(c) AI triage or decision support in primary care improves access-to-care, especially for underserved populations.
(d) Health-economic analyses show favourable cost-outcome trade-offs for teledermatology / AI-triage pathways.

Return the output in three clearly delimited sections (DOMAIN 1, DOMAIN 2, DOMAIN 3), then a short cross-domain synthesis paragraph articulating the causal pathway diagnostic accuracy → clinical decision-making → expected clinical and organisational benefits, and a final "gaps" paragraph naming sub-claims that are under-evidenced in the retrieved set so I know where to search more.

Exclude: editorials, narrative reviews adding no new data, non-peer-reviewed preprints, non-English sources unless landmark. Prefer sources from 2015 onward for AI/teledermatology evidence, no date restriction for regulatory-endpoint history papers.

Where to paste the returned output

When the three tools return results, save each dump verbatim under a new subfolder so we keep the raw source material separate from the appraised references:

task-3b6-surrogate-endpoint-literature-review/
├── research-prompts.md              ← this file
└── raw-research-output/
    ├── perplexity-2026-04-20.md     ← paste Perplexity output here
    ├── gemini-2026-04-20.md         ← paste Gemini Deep Research output here
    └── opus-2026-04-20.md           ← paste Claude/Opus Deep Research output here

From those three raw dumps, Claude will dedupe, triage against the ≥20-total / per-domain minima, and populate references/<domain>/<author-year-keyword>.md with full CRIT1–7 appraisal. See this task's CLAUDE.md for the working model and per-reference file schema.
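The triage step against the ≥20-total and per-domain floors can be sketched as a quick check. This is a hypothetical helper, not part of the QMS: the domain keys are assumed folder names; the floors (8/6/6, ≥20 total) come from the prompts above.

```python
# Illustrative sketch: check a deduplicated reference pool against the
# per-domain hard minima and the >=20-total floor. Domain keys are assumed.
MINIMA = {
    "diagnostic-accuracy": 8,     # DOMAIN 1 hard minimum
    "severity-scoring": 6,        # DOMAIN 2 hard minimum
    "referral-optimisation": 6,   # DOMAIN 3 hard minimum
}

def triage_gaps(counts: dict) -> list:
    """Return human-readable shortfalls; empty list means all floors are met."""
    gaps = [
        f"{domain}: {counts.get(domain, 0)}/{floor}"
        for domain, floor in MINIMA.items()
        if counts.get(domain, 0) < floor
    ]
    if sum(counts.values()) < 20:
        gaps.append(f"total: {sum(counts.values())}/20")
    return gaps

# Example: severity scoring is one reference short of its floor.
print(triage_gaps({"diagnostic-accuracy": 10,
                   "severity-scoring": 5,
                   "referral-optimisation": 7}))
# → ['severity-scoring: 5/6']
```

A search round is only "done" when the returned list is empty for the deduplicated pool.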

All the information contained in this QMS is confidential. The recipient agrees not to transmit or reproduce the information, neither by himself nor by third parties, through whichever means, without obtaining the prior written permission of Legit.Health (AI Labs Group S.L.)