
R-TF-015-012 PMS Study Protocol: Physician-Reported Clinical Performance of the Legacy Device in Routine Clinical Practice

Study title​

Cross-Sectional Observational Study with Retrospective Recall Evaluating the Physician-Reported Clinical Performance of the Legacy Device in Routine Clinical Practice: A Multi-Site Study Under Post-Market Surveillance

Study objectives​

Primary objective​

Evaluate whether the legacy device achieves its three declared clinical benefits in routine clinical practice, and quantify the magnitude of benefit relative to pre-specified Minimum Clinically Important Difference (MCID) thresholds and published State of the Art (SotA) baselines.

Secondary objectives​

  1. Characterise the professional opinion of healthcare professionals regarding the device's clinical utility across all benefit dimensions
  2. Assess the consistency of physician-reported perceived benefits across institutional settings, professional roles, and duration of device use
  3. Evaluate data robustness through a pre-specified sensitivity analysis stratifying results by data source reliability (consulted institutional records vs. professional estimate)
  4. Collect safety and risk data, including reports of misleading device outputs, usability issues, and overall perceived safety, to support the benefit-risk assessment required by MDR Article 83

Declared clinical benefits under evaluation​

The three consolidated clinical benefits, as defined in the Clinical Evaluation Plan (R-TF-015-001) and the Clinical Evaluation Report (R-TF-015-003), are:

| ID | Benefit | Sub-criteria |
| --- | --- | --- |
| 7GH | Diagnostic accuracy: improves HCP accuracy in diagnosis of dermatological conditions, including rare diseases and malignancy detection | (a) General diagnostic accuracy improvement, (b) rare disease identification, (c) malignancy detection and triage |
| 5RB | Objective severity assessment: measures disease involvement objectively, quantitatively, reproducibly | (a) Reproducibility (inter-observer consistency), (b) treatment monitoring precision, (c) longitudinal severity tracking |
| 3KX | Care pathway optimisation: improves referral decisions, reduces waiting times, enables remote care | (a) Waiting time reduction, (b) referral adequacy improvement, (c) remote care enablement |

Study design​

Design type​

Cross-sectional observational study with retrospective recall, collecting physician-reported perceived outcomes.

Design rationale​

This study uses a cross-sectional design (data collected at a single time point via questionnaire) with retrospective recall elements (questions ask respondents to estimate outcomes from "the last 12 months" or "since incorporating the device"). The cross-sectional design captures a snapshot of accumulated clinical experience across all participating sites. Physician-reported perceived outcomes are the appropriate data collection method because the study evaluates the device's perceived impact on clinical workflows, diagnostic processes, and care pathways --- outcomes that are directly observable by the treating physician but not systematically recorded in a single data system. The distinction between physician-reported perceived outcomes and independently measured outcomes is acknowledged as a methodological limitation (Section 11.1) and addressed through the evidence quality control sensitivity analysis (Section 10.4).

Number of sites​

Up to 21 clinical sites (all institutions with active legacy device contracts as of the study date).

Study duration​

Data collection window: 14 days (target), with provision for extension. If fewer than 30 responses have been received by day 14, the collection window is extended by 7 days and a reminder email is sent to all non-responding institutions. Total study duration including analysis and reporting: 18--21 days.

Regulatory basis​

  • MDR Article 83: Obligation to proactively gather clinical performance data through post-market surveillance, including "the collection and utilisation of available information, in particular... on any undesirable side-effects" (Article 83(1))
  • MDR Article 85: PMS report requirements for Class IIa devices
  • MDCG 2020-6 §6.2.2: Use of post-market data from a legacy device as clinical evidence for the current device under MDR
  • MDCG 2020-6 §6.5.e: Gap-bridging through "clinically relevant scientifically sound questionnaires"
  • MDCG 2020-6 Appendix III, Rank 4: "Outcomes from studies with potential methodological flaws but where data can still be quantified and acceptability justified." The considerations column explicitly states: "High quality surveys may also fall into this category." This study is designed to satisfy the Rank 4 "high quality survey" classification through formal protocol, pre-specified endpoints, MCID thresholds, SotA comparators, sensitivity analysis, and acknowledged limitations.

Compliance with MDCG 2020-6 Section 6.5.e​

MDCG 2020-6 Section 6.5.e states that scientifically sound studies used to bridge clinical data gaps "will normally include (note, this is not a complete list)" the following elements. This section maps each requirement to the corresponding protocol section.

| MDCG 2020-6 §6.5.e requirement | Protocol section |
| --- | --- |
| Clearly stated research question(s), objective(s) and related endpoints | Section 2 (Study objectives): primary objective, secondary objectives, and 3 declared clinical benefits. Section 6 (Co-primary endpoints) and Section 7 (Secondary endpoints): 3 co-primary quantitative endpoints, 6 supportive quantitative endpoints, 10 Likert professional opinion endpoints, and 5 safety endpoints. |
| An evaluation of potential sources of bias or study distortion, and the impact of these factors on the potential validity of results | Section 11 (Acknowledged limitations): 5 identified sources of bias (physician-reported outcomes, recall bias, non-randomised cross-sectional design with retrospective recall, selection bias, single-instrument data collection), each with specific mitigation measures and impact assessment. |
| Design with an appropriate rationale and statistical analysis plan | Section 3 (Study design): design type and rationale. Section 10 (Statistical analysis plan): descriptive statistics, co-primary MCID testing with Holm-Bonferroni correction, effect size reporting, sensitivity analysis, subgroup analyses, benefit coverage assessment, safety assessment, and pre-specified handling of negative results. Section 14 (Sample size justification): power analysis supporting the target sample size. |
| A plan for analysis of the data and for drawing appropriate conclusion(s) | Section 10.6 (Benefit coverage assessment): pre-specified criteria for confirming each benefit (Holm-Bonferroni-adjusted p < 0.05, Cohen's d >= 0.3, Likert mean >= 3.5). Section 10.8 (Handling of negative results): pre-specified procedure if co-primary endpoints fail, including assessment of whether claims should be retained, narrowed, or removed per MDCG 2020-6 §6.5.e. Section 15 (Reporting): study report structure and integration into regulatory documents. |

Study population​

Inclusion criteria​

  1. Healthcare professional (dermatologist, primary care physician, or hospital manager) employed at an institution with an active legacy device contract
  2. Institution has been using the legacy device for >= 6 months at the time of survey administration
  3. Respondent has personally used or directly supervised the use of the device in clinical practice
  4. Respondent provides informed consent by completing the questionnaire's consent section

Exclusion criteria​

  1. Institutions with < 6 months of legacy device use (insufficient clinical experience for meaningful retrospective assessment)
  2. Individuals who have not personally used or directly supervised use of the device
  3. Individuals who do not provide consent

Eligible population​

As of the study date, 21 institutions hold active legacy device contracts. Based on contract scope (each institution typically employs dozens to hundreds of healthcare professionals who may interact with the device), the eligible population is estimated at 200--500 individuals. Not all eligible individuals will respond --- the study targets a minimum of 30 respondents and aims for >= 45.

Data collection instrument​

Instrument description​

The data collection instrument is a structured physician questionnaire comprising 40 items across 6 sections:

  • Section A (5 items): Demographics and context (role, institution, duration of use, case volume, clinical setting)
  • Sections B--D (27 items): Benefit-specific questions --- 9 Likert-scale professional opinion items, 9 quantitative endpoint items, and 9 evidence quality control items
  • Section E (3 items): Overall benefit assessment (Likert) and optional qualitative feedback
  • Section F (5 items): Device safety and risk --- 2 binary (yes/no) safety screening items with conditional free-text follow-ups, and 1 Likert-scale overall safety assessment

The questionnaire is available in English and Spanish. Estimated completion time: 11--14 minutes.

Recall time frames​

Quantitative questions use two distinct time frames, selected based on the nature of the metric:

  • "In the last 12 months" (B2, B4, B6, C4, C5, D6): Used for absolute counts and rates (e.g., number of cases identified, percentage of patients monitored). A fixed 12-month window standardises the recall period across respondents with different durations of device use, reducing variability and enabling meaningful cross-respondent comparison.
  • "Since incorporating the device" (D2, D4, D7): Used for relative change metrics (e.g., percentage reduction in waiting times, percentage improvement in referral adequacy). These questions ask respondents to estimate the cumulative effect of device adoption on their clinical workflows. A "since incorporation" frame captures the full magnitude of change, which is the relevant comparator for the SotA baselines that describe the state without the device.

This distinction is intentional and documented in the questionnaire instrument. The sensitivity analysis (Section 10.4) and subgroup analysis by duration of use (Section 10.5) provide additional controls for time-frame-related variability.

Section F is included to ensure the study collects both benefit and safety data, consistent with MDR Article 83(1) which requires PMS to include information on "any undesirable side-effects." This prevents the study from being characterised as a benefit-only confirmation exercise.

Instrument reference​

Full questionnaire specification: questionnaire.mdx (in this folder). The questionnaire was validated through synthetic data simulation (Phase 2) before deployment.

Administration​

The questionnaire is administered electronically via a survey platform (e.g., Google Forms). Respondents receive an email invitation with a cover letter explaining the study purpose, referencing this protocol, and encouraging consultation of institutional records when answering quantitative questions.

Evidence quality control​

After each quantitative question, respondents are asked:

"To provide the figure above, did you: (a) consult your records, institutional statistics, or data systems, or (b) provide a professional estimate based on your experience?"

This question does not affect the evidence classification of individual data points. It enables the pre-specified sensitivity analysis (Section 10.4).

Co-primary endpoints​

The study designates one co-primary endpoint per benefit (3 co-primary endpoints total). This structure avoids the multiplicity problem of testing 9 independent hypotheses at alpha = 0.05 (which would inflate the family-wise error rate to ~37%). Each co-primary endpoint was selected as the most clinically meaningful and broadly applicable metric for its benefit. The remaining 6 quantitative endpoints are designated as supportive quantitative secondary endpoints (Section 7.4).
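The family-wise error rate figures quoted above follow directly from testing k independent hypotheses at a fixed per-test alpha. A minimal sketch (the helper name is illustrative, not part of the protocol):

```python
# Family-wise error rate for k independent tests at per-test alpha:
# the probability of at least one false positive among the k tests.
def family_wise_error_rate(k: int, alpha: float = 0.05) -> float:
    return 1 - (1 - alpha) ** k

# 9 independent hypotheses inflate the FWER to ~37%; restricting to
# 3 co-primary endpoints gives ~14.3% before any adjustment.
print(round(family_wise_error_rate(9), 3))  # 0.37
print(round(family_wise_error_rate(3), 3))  # 0.143
```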

All endpoints measure physician-reported perceived outcomes --- that is, the respondent's estimate of improvement based on their clinical experience, not independently verified measurements. This distinction is critical for data appraisal in the CER (see Section 11.1).

All quantitative questions are time-bounded ("in the last 12 months" or "since incorporating the device").

Benefit 7GH --- Diagnostic accuracy​

| Endpoint ID | Question | Metric | Unit | Sub-criterion |
| --- | --- | --- | --- | --- |
| B2 | Physician-reported perceived percentage of cases with clinically significant diagnostic assessment change | Perceived diagnostic assessment change rate | % | (a) General diagnostic accuracy |

Selection rationale: B2 is the broadest measure of diagnostic accuracy benefit, covering the most clinically common scenario (change in diagnostic assessment across all dermatological conditions). B4 (rare diseases) and B6 (malignancy) address important sub-criteria but affect smaller case volumes and are designated secondary.

Benefit 5RB --- Objective severity assessment​

| Endpoint ID | Question | Metric | Unit | Sub-criterion |
| --- | --- | --- | --- | --- |
| C4 | Physician-reported number of treatment decisions directly informed by device severity scores | Perceived treatment decisions informed | Count/year | (b) Treatment monitoring precision |

Selection rationale: C4 measures the most direct clinical impact of severity assessment --- whether the device's severity scores actually change treatment decisions. C5 (longitudinal monitoring rate) addresses uptake rather than clinical impact and is designated secondary.

Benefit 3KX --- Care pathway optimisation​

| Endpoint ID | Question | Metric | Unit | Sub-criterion |
| --- | --- | --- | --- | --- |
| D4 | Physician-reported perceived improvement in referral adequacy | Perceived referral adequacy improvement | % | (b) Referral adequacy |

Selection rationale: D4 addresses the most consistently cited care pathway concern in the SotA literature (inappropriate referrals burden secondary care) and was specifically raised by BSI. D2 (waiting times), D6 (remote adequacy), and D7 (remote volume) are designated secondary.

Secondary endpoints​

Professional opinion (Likert scale)​

| Endpoint ID | Question | Benefit | Sub-criterion |
| --- | --- | --- | --- |
| B1 | "The device improves my diagnostic accuracy for dermatological conditions." | 7GH | (a) General accuracy |
| B3 | "The device helps me identify conditions I might otherwise miss, including rare or uncommon diseases." | 7GH | (b) Rare diseases |
| B5 | "The device improves the detection or triage of potentially malignant lesions from primary care." | 7GH | (c) Malignancy |
| C1 | "The device provides objective, reproducible severity measurements that improve my clinical monitoring." | 5RB | (a) Reproducibility |
| C2 | "The device's severity scores help me track treatment response more precisely than my clinical assessment alone." | 5RB | (b) Treatment monitoring |
| C3 | "Different clinicians using the device on the same patient obtain consistent severity assessments." | 5RB | (a) Inter-observer consistency |
| D1 | "The device has contributed to reducing waiting times for specialist dermatological consultation." | 3KX | (a) Waiting times |
| D3 | "The device improves the adequacy of my referral decisions." | 3KX | (b) Referral adequacy |
| D5 | "The device enables adequate remote clinical assessment of dermatological patients." | 3KX | (c) Remote care |
| E1 | "Overall, the device delivers meaningful clinical benefits in my daily practice." | All | Overall |

Qualitative feedback​

| Endpoint ID | Question | Purpose |
| --- | --- | --- |
| E2 | "What is the most significant clinical impact the device has had in your practice?" (optional) | Qualitative benefit characterisation |
| E3 | "Are there any limitations or areas for improvement you have identified in the device?" (optional) | PMS completeness --- demonstrates genuine surveillance |

Supportive quantitative endpoints​

The following quantitative endpoints provide additional evidence for each benefit dimension. They are analysed using the same methods as the co-primary endpoints (Section 10.2) but are not included in the co-primary hypothesis testing. Their results are reported descriptively and contribute to the overall benefit assessment.

| Endpoint ID | Question | Metric | Unit | Benefit | Sub-criterion |
| --- | --- | --- | --- | --- | --- |
| B4 | Physician-reported rare/uncommon conditions identified with device aid | Perceived rare disease identification count | Count/year | 7GH | (b) Rare disease identification |
| B6 | Physician-reported malignancy cases identified/confirmed with device aid | Perceived malignancy detection count | Count/year | 7GH | (c) Malignancy detection and triage |
| C5 | Physician-reported percentage of monitored patients tracked with device over multiple visits | Perceived longitudinal monitoring rate | % | 5RB | (c) Longitudinal severity tracking |
| D2 | Physician-reported perceived decrease in average waiting times for specialist consultation | Perceived waiting time reduction | % | 3KX | (a) Waiting time reduction |
| D6 | Physician-reported percentage of remote consultations with adequate assessment (no in-person follow-up needed) | Perceived remote assessment adequacy | % | 3KX | (c) Remote care adequacy |
| D7 | Physician-reported perceived increase in patients manageable remotely | Perceived remote volume increase | % | 3KX | (c) Remote care volume |

Safety and risk endpoints​

| Endpoint ID | Question | Type | Purpose |
| --- | --- | --- | --- |
| F1 | "In the last 12 months, have you observed any cases where the device output was misleading or could have led to an incorrect clinical decision if not verified by a clinician?" | Binary (Yes/No) | Safety signal screening --- incident/near-miss identification |
| F1a | If yes, describe briefly (number, type, context) | Conditional free-text | Characterisation of misleading output incidents |
| F2 | "Have you identified any usability issues or technical problems that affected clinical use of the device?" | Binary (Yes/No) | Usability problem identification per IEC 62366-1 |
| F2a | If yes, describe briefly | Conditional free-text | Characterisation of usability issues |
| F3 | "Overall, I consider the device safe for use in its intended clinical setting." | Likert (1--5) | Overall perceived safety assessment |

Safety endpoints are secondary. They contribute to the benefit-risk assessment in the PMS report (MDR Article 85) and complement the device's incident history (7 non-serious incidents, 0 serious). The proportion of respondents reporting misleading outputs (F1) or usability issues (F2) is reported alongside the overall perceived safety score (F3 mean and 95% CI). A mean F3 score significantly below neutral (3.0) would constitute a safety signal requiring follow-up investigation.
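The F3 safety-signal check can be sketched as a one-sample t statistic against the neutral score. The protocol does not prescribe the exact test, so the helper name and responses below are illustrative only:

```python
from math import sqrt
from statistics import mean, stdev

# Illustrative check (hypothetical responses, not study data): a mean F3
# score significantly below neutral (3.0) would constitute a safety signal.
def f3_t_statistic(scores, neutral=3.0):
    """One-sample t statistic of Likert scores against the neutral point."""
    return (mean(scores) - neutral) / (stdev(scores) / sqrt(len(scores)))

scores = [4, 5, 4, 3, 4, 5, 4]
t = f3_t_statistic(scores)  # positive here, i.e. no below-neutral signal
```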

Pre-specified MCID thresholds​

MCID thresholds define the minimum improvement that is clinically meaningful. Results exceeding the MCID are interpreted as evidence of a real clinical benefit, not merely a statistically detectable difference.

Co-primary and supportive endpoint MCIDs​

MCIDs are derived from the published SotA literature reviewed in the Clinical Evaluation Report (R-TF-015-003) and the State of the Art document (R-TF-015-011). The derivation rationale for each threshold is provided below. MCIDs apply to both co-primary endpoints (B2, C4, D4) and supportive quantitative endpoints (B4, B6, C5, D2, D6, D7).

| Endpoint | MCID | Unit | Derivation rationale |
| --- | --- | --- | --- |
| B2 --- Diagnostic assessment change rate | 5 | Percentage points | Published SotA shows HCP accuracy improvement of +6.36% overall with AI support (R-TF-015-011). A 5% change rate represents approximately 1 in 20 cases where the device meaningfully altered the diagnosis --- sufficient to indicate clinical impact on diagnostic workflows. |
| B4 --- Rare disease identification count | 3 | Cases/year | With a low base rate of rare dermatological conditions in routine practice, identifying 3 additional cases per year per respondent represents a meaningful contribution to early diagnosis. No published comparator exists for this specific metric. |
| B6 --- Malignancy detection count | 5 | Cases/year | Published malignancy detection sensitivity for PCPs without AI is 0.663 (R-TF-015-011, Burton et al. 1998). Five additional malignancy detections per year per respondent, in the context of typical dermatological caseloads, represents a clinically significant improvement in cancer screening. |
| C4 --- Treatment decisions informed | 10 | Decisions/year | Published SotA shows that formal severity scoring is used in ~25% of dermatology visits (Hillary & Lambert 2021) and alters treatment plans in 14--36% of encounters where scores are actually used. Ten severity-informed treatment decisions per year (~1 per month) represents a conservative threshold for systematic integration of objective severity data into clinical decision-making --- consistent with the expected increase when the 3--10 minute manual scoring burden is eliminated. |
| C5 --- Longitudinal monitoring rate | 5 | Percentage points | Represents a meaningful shift in monitoring practice. SotA establishes that human inter-observer ICC for severity assessment is 0.47 (Goldfarb et al. 2021), indicating that objective device-based monitoring fills a genuine clinical need. |
| D2 --- Waiting time reduction | 5 | Percentage points | Published SotA shows ~71% reduction achievable with teledermatology tools (Giavina-Bianchi et al. 2020). A 5% threshold is conservative, allowing detection of even modest improvements. The CER acceptance criterion is >= 50% reduction. |
| D4 --- Referral adequacy improvement | 5 | Percentage points | Published SotA shows 14% reduction in unnecessary referrals using medical devices, 24% with teledermatology (Baker et al. 2022, Eminovic et al. 2009). A 5% threshold captures clinically meaningful improvements well below published achievable levels. The CER acceptance criterion is >= 30% reduction. |
| D6 --- Remote assessment adequacy | 5 | Percentage points | SotA establishes ~55% of patients can be managed remotely with teledermatology (R-TF-015-011). A 5% threshold is appropriate for this metric given high variability in remote care implementation across sites. |
| D7 --- Remote volume increase | 5 | Percentage points | Same rationale as D6. The CER acceptance criterion for remote management capacity is >= 58%. |

Secondary endpoint MCIDs​

| Endpoint | MCID | Unit | Derivation rationale |
| --- | --- | --- | --- |
| Likert professional opinion (B1, B3, B5, C1--C3, D1, D3, D5, E1) | 0.5 | Points above neutral (3.0) on 1--5 scale | A mean of 3.5 (halfway between neutral and "Agree") indicates a meaningful positive signal across respondents. This is a standard threshold for 5-point Likert scales in healthcare professional surveys. |

State of the Art comparators​

For each primary endpoint, this section defines the published baseline without the device. These baselines are derived from the SotA literature review documented in R-TF-015-011 (State of the Art) and the acceptance criteria derivation in R-TF-015-003 (CER), Section "Acceptance Criteria Derivation from State of the Art."

The study compares observed data against these baselines to determine whether the legacy device's real-world clinical performance is consistent with what published literature demonstrates for comparable interventions.

Benefit 7GH --- Diagnostic accuracy​

| Endpoint | SotA comparator (baseline WITHOUT device) | SotA comparator (baseline WITH comparable AI) | Source |
| --- | --- | --- | --- |
| B2 --- Diagnostic assessment change rate | Unaided HCP diagnostic accuracy: overall 49% top-1, sensitivity 69%, specificity 76.4%. PCPs: accuracy 41.9%, sensitivity 66.3%. Dermatologists: accuracy 57%, sensitivity 73% | AI-assisted improvement: +6.36% accuracy (overall), +9.30% (PCPs), +5.30% (dermatologists). Range across studies: +5.30% to +20.70% | Escalé-Besa et al. 2023, Han et al. 2020/2022, Kim et al. 2022 (unaided); Ba et al. 2022, Ferris et al. 2025, Maron et al. 2020, Tschandl et al. 2020 (AI-assisted). CER acceptance criterion: >= +15% |
| B4 --- Rare disease identification | No published baseline for rare disease identification count per practitioner per year. General rare disease detection accuracy for AI: +26.77 pp (BI_2024 study, GPP) | Published AI accuracy improvement for rare diseases: +26.77 pp in prospective MRMC study (BI_2024) | BI_2024 study. No population-level baseline exists for this metric. |
| B6 --- Malignancy detection | PCP unaided malignancy detection sensitivity: 0.663 [0.61--0.71] (Burton et al. 1998, Gerbert et al. 1996). Dermatologist melanoma sensitivity: 0.734 [0.67--0.79] (meta-analysis) | AI-assisted melanoma sensitivity: 74.6--85.7% (Maron et al. 2020, Barata et al. 2023). Published AUC: 0.81 [0.78--0.84] (meta-analysis) | Maron et al. 2019/2020, Haenssle et al. 2018, Brinker et al. 2019, Chen et al. 2024. CER acceptance criterion: AUC >= 0.85 (melanoma), >= 0.90 (pooled malignancy) |

Benefit 5RB --- Objective severity assessment​

| Endpoint | SotA comparator (baseline WITHOUT device) | SotA comparator (baseline WITH comparable AI) | Source |
| --- | --- | --- | --- |
| C4 --- Treatment decisions informed | No published study directly quantifies severity-informed treatment decisions per clinician per year. However, the SotA establishes three convergent findings that characterise the baseline: (1) Low formal scoring frequency: only ~25% of dermatologists report using PASI at every visit; >50% never use DLQI (Hillary & Lambert 2021, n=149 dermatologists). SCORAD and EASI require 3--10 and 2--6 minutes respectively per assessment, limiting routine adoption. (2) When scores are used, they alter treatment in 14--36% of encounters: a clinical audit of 268 consultations found DLQI scores directly influenced treatment decisions in 14% of general dermatology encounters and 36.2% of specialised GP encounters. In psoriasis, 36--37% of patients on biologics required score-driven treatment modification in the first year (Foster et al. 2013, n=169). (3) Poor inter-observer agreement limits score-based decisions: human inter-observer ICC for IHS4 severity is 0.47 [0.32--0.65] (Goldfarb et al. 2021, Thorlacius et al. 2019); PASI inter-rater MAD is ±3.3 points (Bożek et al. 2018, n=120), which is clinically consequential given the PASI ≥ 10 biologic eligibility threshold. Treatment recommendations for identical psoriasis cases agree only ~50% between human experts (Moreno-Ramírez et al. 2017, MDi-Psoriasis, n=10 cases). The practical baseline is therefore low and inconsistent integration of formal severity data into treatment decisions. The MCID of 10 decisions/year (~1 per month) represents a clinically meaningful threshold for systematic integration of objective severity data into treatment workflows. | Device ICC: 0.716--0.727 (AIHS4_2025 study), exceeding human baseline and the acceptance criterion of >= 0.70. By eliminating the 3--10 minute manual scoring burden and reducing inter-observer variability to single digits, the device makes objective severity data available at every encounter, structurally enabling severity-informed treatment decisions where subjective assessment currently predominates. | Goldfarb et al. 2021, Thorlacius et al. 2019 (human ICC baseline); Bożek et al. 2018 (PASI MAD); Hillary & Lambert 2021 (scoring frequency survey); Foster et al. 2013 (biologic modification rate); Moreno-Ramírez et al. 2017 (treatment agreement); AIHS4_2025 (device performance). |
| C5 --- Longitudinal monitoring rate | No published population-level baseline quantifies the proportion of dermatology patients receiving objective longitudinal severity tracking. Manual severity tracking relies on subjective clinical assessment with poor inter-observer reproducibility (ICC 0.47, Goldfarb et al. 2021), limiting its adoption for systematic longitudinal monitoring. The practical baseline is low: most severity tracking in routine practice is informal and inconsistent. The MCID of 5 percentage points represents a meaningful shift toward systematic monitoring practice. As a supportive secondary endpoint, C5 is interpreted descriptively against SotA rather than through formal superiority testing. | Device enables standardised longitudinal tracking with objective scores. SCORAD RMAE: 13.0% (ASCORAD_2022). PASI accuracy: 60.6% vs. human 52.5% (APASI_2025) | ASCORAD_2022, APASI_2025, AIHS4_2023. No population-level baseline exists for this metric. |

Benefit 3KX --- Care pathway optimisation​

| Endpoint | SotA comparator (baseline WITHOUT device) | SotA comparator (baseline WITH comparable tools) | Source |
| --- | --- | --- | --- |
| D2 --- Waiting time reduction | Standard waiting time for specialist dermatological consultation: 60--132 days (Spain SNS Report 2025, France DREES 2018, Europe DERMAsurvey 2013) | With teledermatology and AI-assisted triage: ~71% reduction, achieving 5--11.5 days (Giavina-Bianchi et al. 2020, Morton et al. 2010, Hsiao & Oh 2008). CER acceptance criterion: >= 50% reduction | CER observed: 56% reduction (DAO_Derivacion_O_2022) |
| D4 --- Referral adequacy improvement | PCP unaided referral performance: sensitivity 0.663 [0.61--0.71], specificity 0.60 [0.51--0.69] (Burton et al. 1998, Gerbert et al. 1996). Unnecessary referral baseline in standard care: not specifically quantified | Medical device-assisted reduction of unnecessary referrals: 14% (Baker et al. 2022). Teledermatology-assisted: 24% (Eminovic et al. 2009, Jain et al. 2021, Knol et al. 2006). CER acceptance criterion: >= 30% reduction | CER observed: 38% reduction |
| D6 --- Remote assessment adequacy | Remote dermatological assessment without AI: limited by image quality and diagnostic confidence. No single published adequacy rate baseline | With teledermatology: ~55% of patients manageable remotely (Giavina-Bianchi et al. 2020, Orekoya et al. 2021, Kheterpal et al. 2023, Whited 2015). CER acceptance criterion: >= 58% | CER observed: 100% expert consensus agreement on remote management adequacy (COVIDX_EVCDAO_2022) |
| D7 --- Remote volume increase | Low baseline remote dermatological care volume without AI tools. Standard practice is predominantly in-person | With teledermatology implementation: capacity to manage 55%+ of patients remotely. CER acceptance criterion for remote management: >= 58% of patients | Same sources as D6 |

Statistical analysis plan​

Descriptive statistics​

For each primary and secondary endpoint:

  • n, mean, median, standard deviation, 95% confidence interval
  • Frequency distributions for categorical variables (demographics, evidence quality control)

Co-primary analysis --- MCID testing​

For each of the 3 co-primary endpoints (B2, C4, D4), the primary analysis tests whether the observed mean exceeds the pre-specified MCID:

  • Test: One-sample t-test (one-sided)
  • Null hypothesis: H0: μ ≤ MCID (the population mean does not exceed the MCID)
  • Alternative hypothesis: H1: μ > MCID (the population mean exceeds the MCID)
  • Significance level: α = 0.05 (one-sided). A one-sided test is used because the hypothesis is directional: the study aims to demonstrate that physician-reported perceived improvements exceed the MCID, not merely that they differ from it. This is consistent with standard practice for superiority testing against a pre-specified threshold in clinical studies.
  • Multiple comparison adjustment: With 3 co-primary endpoints tested one-sided at α = 0.05, the unadjusted family-wise error rate is 1 − (1 − 0.05)^3 = 14.3%. Both unadjusted and Holm-Bonferroni-adjusted p-values are reported. The Holm-Bonferroni procedure controls the family-wise error rate at 0.05 while being less conservative than Bonferroni. A co-primary endpoint is considered confirmed if the Holm-Bonferroni-adjusted p-value is < 0.05. Based on synthetic data validation (Phase 2), the observed effect sizes (Cohen's d = 0.28--1.49 against MCID) are large enough that results are expected to survive correction.
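A minimal pure-Python sketch of the step-down Holm-Bonferroni adjustment (the helper name is illustrative; the underlying one-sided t-tests would typically come from a statistics package, e.g. scipy.stats.ttest_1samp with alternative='greater'):

```python
# Illustrative step-down Holm-Bonferroni adjustment of m p-values;
# controls the family-wise error rate at the chosen alpha.
def holm_bonferroni(p_values):
    """Return Holm-Bonferroni-adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # The k-th smallest p-value is multiplied by (m - k), capped at 1,
        # with monotonicity enforced so adjusted values never decrease.
        running_max = max(running_max, min(1.0, (m - rank) * p_values[i]))
        adjusted[i] = running_max
    return adjusted

# Hypothetical one-sided p-values for B2, C4, D4: an endpoint is
# confirmed if its adjusted value is below 0.05.
print([round(p, 4) for p in holm_bonferroni([0.01, 0.04, 0.03])])  # [0.03, 0.06, 0.06]
```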

Additionally, for each co-primary endpoint:

  • Test against zero: One-sample t-test, H0: μ = 0 (is the physician-reported perceived improvement different from zero?)
  • Comparison against SotA: Contextual comparison of observed mean against published SotA baselines (Section 9). This is a descriptive comparison, not a formal hypothesis test, because the SotA values come from different populations and study designs.
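The co-primary testing procedure described above can be sketched in Python with scipy and statsmodels. The MCID values and response vectors below are hypothetical placeholders, not study data; the actual endpoints carry the MCIDs pre-specified in Section 8:

```python
# Sketch of the co-primary MCID analysis: one-sided one-sample t-tests
# against each endpoint's MCID, then Holm-Bonferroni adjustment across
# the 3 co-primary endpoints. All numbers here are illustrative.
import numpy as np
from scipy import stats
from statsmodels.stats.multitest import multipletests

rng = np.random.default_rng(0)
mcids = {"B2": 10.0, "C4": 5.0, "D4": 10.0}   # hypothetical MCID values
responses = {k: rng.normal(loc=m + 4.0, scale=8.0, size=45)
             for k, m in mcids.items()}

raw_p = {}
for endpoint, x in responses.items():
    # One-sample t-test, one-sided: H1: mu > MCID
    t, p = stats.ttest_1samp(x, popmean=mcids[endpoint], alternative="greater")
    raw_p[endpoint] = p

# Holm-Bonferroni adjustment controls the family-wise error rate at 0.05
reject, p_adj, _, _ = multipletests(list(raw_p.values()),
                                    alpha=0.05, method="holm")
for (endpoint, p), pa, r in zip(raw_p.items(), p_adj, reject):
    print(f"{endpoint}: p={p:.4f}, Holm-adjusted p={pa:.4f}, confirmed={r}")
```

An endpoint counts as confirmed only when its Holm-adjusted p-value is below 0.05, matching the confirmation rule above.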

Supportive quantitative analysis​

For each of the 6 supportive quantitative secondary endpoints (B4, B6, C5, D2, D6, D7), the same analyses as Section 10.2 are performed (MCID test, test against zero, SotA comparison) but results are reported with unadjusted p-values and interpreted as supportive evidence. A significant supportive endpoint strengthens the benefit confirmation but is not required for it.

Effect size​

  • Per endpoint: Cohen's d = (mean - MCID) / SD
  • Per benefit (pooled): Cohen's d computed by pooling all Likert question responses within each benefit, compared against neutral (3.0)
  • Interpretation: Small (d = 0.2), Medium (d = 0.5), Large (d = 0.8) per Cohen 1988
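The two effect-size computations can be sketched with a single helper; the response values and the MCID of 10 are illustrative assumptions, not study data:

```python
# Sketch of the per-endpoint and pooled Cohen's d computations.
import numpy as np

def cohens_d(values, reference):
    """Cohen's d of a sample mean against a fixed reference value."""
    values = np.asarray(values, dtype=float)
    return (values.mean() - reference) / values.std(ddof=1)

# Per endpoint: observed responses vs. the endpoint's MCID (assumed 10 here)
b2_responses = [12.0, 18.0, 9.0, 22.0, 15.0, 11.0, 20.0]
d_b2 = cohens_d(b2_responses, reference=10.0)

# Per benefit (pooled): all Likert responses within a benefit vs. neutral (3.0)
likert_pooled = [4, 5, 3, 4, 4, 5, 4, 3, 5, 4]
d_pooled = cohens_d(likert_pooled, reference=3.0)

print(f"B2 vs MCID: d = {d_b2:.2f}; pooled Likert vs neutral: d = {d_pooled:.2f}")
```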

Sensitivity analysis --- Data source stratification​

The pre-specified sensitivity analysis stratifies all primary endpoint results by the evidence quality control response:

  • Subgroup A (record-consulted): Respondents who selected "(a) consulted records, institutional statistics, or data systems"
  • Subgroup B (estimate-based): Respondents who selected "(b) professional estimate based on experience"

For each primary endpoint, report:

  • Descriptive statistics (mean, median, SD, 95% CI) for each subgroup
  • Qualitative assessment of consistency: Do both subgroups show improvements in the same direction and of comparable magnitude?
  • Formal test: Two-sample t-test between subgroups to determine whether results differ significantly by data source

Interpretation: Consistency between subgroups supports the robustness of the overall findings. If the record-consulted subgroup shows systematically higher or lower values, this is reported as a finding but does not invalidate the study --- it characterises the direction of potential recall bias.

Target: >= 30% of quantitative responses should be record-consulted (subgroup A) for the sensitivity analysis to be meaningful. Based on synthetic data validation, the expected proportion is ~35%.
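The stratified analysis can be sketched as follows; the subgroup values are hypothetical, and Welch's unequal-variance variant of the two-sample t-test is assumed since the subgroups have no reason to share a variance:

```python
# Sketch of the data-source sensitivity analysis: per-subgroup descriptive
# statistics with 95% CIs, a between-subgroup t-test, and the >= 30%
# record-consulted share check. Values are illustrative.
import numpy as np
from scipy import stats

subgroup_a = np.array([14.0, 11.0, 16.0, 13.0, 12.0, 15.0, 10.0, 14.0])  # record-consulted
subgroup_b = np.array([18.0, 12.0, 20.0, 15.0, 17.0, 13.0, 19.0, 16.0, 14.0])  # estimate-based

for name, x in (("A (record-consulted)", subgroup_a),
                ("B (estimate-based)", subgroup_b)):
    lo, hi = stats.t.interval(0.95, df=len(x) - 1, loc=x.mean(),
                              scale=stats.sem(x))
    print(f"{name}: mean={x.mean():.1f}, 95% CI=({lo:.1f}, {hi:.1f})")

# Welch's two-sample t-test: do results differ significantly by data source?
t, p = stats.ttest_ind(subgroup_a, subgroup_b, equal_var=False)
print(f"Welch t = {t:.2f}, p = {p:.3f}")

# Share of record-consulted responses, checked against the >= 30% target
share_a = len(subgroup_a) / (len(subgroup_a) + len(subgroup_b))
```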

Subgroup analyses​

Pre-specified subgroup analyses by:

  • Role: Dermatologist vs. primary care physician vs. hospital manager
  • Duration of device use: < 1 year vs. 1--3 years vs. > 3 years
  • Case volume: < 200 vs. 200--1,000 vs. > 1,000
  • Clinical setting: In-person only vs. remote/both (for endpoints D2--D7 only)

For each subgroup: descriptive statistics and visual comparison. Formal between-group testing (ANOVA or Kruskal-Wallis) only if subgroup sizes are adequate (n >= 10 per group).
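The adequacy guard on formal subgroup testing can be sketched as below; the group labels follow the duration-of-use strata, and the data are synthetic placeholders:

```python
# Sketch of the subgroup testing rule: Kruskal-Wallis is run only when
# every subgroup reaches the pre-specified n >= 10; otherwise the
# comparison stays descriptive. Data are illustrative.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
groups = {
    "< 1 year": rng.normal(12.0, 5.0, size=11),
    "1-3 years": rng.normal(14.0, 5.0, size=15),
    "> 3 years": rng.normal(13.0, 5.0, size=12),
}

if all(len(g) >= 10 for g in groups.values()):
    h, p = stats.kruskal(*groups.values())
    print(f"Kruskal-Wallis H = {h:.2f}, p = {p:.3f}")
else:
    print("Subgroup sizes inadequate; descriptive comparison only")
```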

Pre-specified sensitivity analysis: exclusion of non-clinical respondents​

Hospital managers are included in the study population because they can meaningfully report on care pathway outcomes (Section D: waiting times, referral adequacy, remote care volume) and overall institutional benefit (Section E). However, they may not directly observe clinical performance outcomes (Section B: diagnostic accuracy; Section C: severity assessment).

A pre-specified sensitivity analysis repeats the co-primary and supportive endpoint analyses for Sections B and C after excluding hospital manager respondents. If the results are materially unchanged (direction, significance, and effect size consistent), the full-sample analysis is reported as the primary result. If hospital manager inclusion materially affects the Section B or C results, both analyses are reported and the clinician-only analysis is designated as the primary result for those sections.

Benefit coverage assessment​

The study concludes that a clinical benefit is confirmed if:

  1. The co-primary endpoint for that benefit shows a statistically significant result (Holm-Bonferroni-adjusted p < 0.05) against the MCID
  2. The effect size (Cohen's d) for that co-primary endpoint is >= 0.3 (small-to-medium)
  3. The corresponding Likert secondary endpoints show a mean >= 3.5 (i.e., at least the MCID of 0.5 points above the neutral score of 3.0)

All three benefits must meet these criteria for the study to conclude overall benefit confirmation. The supportive quantitative endpoints (Section 7.3) provide additional evidence: consistent results across the co-primary and supportive endpoints for a given benefit strengthen the conclusion; inconsistent results are reported transparently and discussed in the study report.

Safety assessment​

The safety data from Section F is reported descriptively:

  • Proportion of respondents reporting misleading device outputs (F1 = "Yes"), with 95% CI
  • Proportion of respondents reporting usability issues (F2 = "Yes"), with 95% CI
  • F3 (overall safety) Likert mean, median, SD, and 95% CI; one-sample t-test against neutral (3.0)
  • Thematic analysis of F1a and F2a free-text responses, categorised by type of issue

A mean F3 score significantly below 3.0, or a misleading output rate (F1) exceeding 30%, would constitute a safety signal requiring follow-up investigation under the PMS plan. These thresholds are pre-specified to demonstrate that safety monitoring is prospective, not post hoc.
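The descriptive safety summary and the two pre-specified signal thresholds can be sketched as follows; the counts and Likert responses are hypothetical, and the Wilson interval is assumed for the binomial CIs:

```python
# Sketch of the safety assessment: F1 proportion with 95% Wilson CI,
# the 30% signal threshold, and the F3 test against neutral (3.0).
# All counts and responses are illustrative.
import numpy as np
from scipy import stats
from statsmodels.stats.proportion import proportion_confint

n_respondents = 45
f1_yes = 6   # hypothetical count of "Yes" to misleading device outputs

p_hat = f1_yes / n_respondents
lo, hi = proportion_confint(f1_yes, n_respondents, alpha=0.05, method="wilson")
print(f"F1 misleading-output rate: {p_hat:.1%} (95% CI {lo:.1%}-{hi:.1%})")

# Signal rule 1: F1 rate above 30% triggers follow-up under the PMS plan
f1_signal = p_hat > 0.30

# Signal rule 2: F3 overall-safety Likert mean significantly below neutral
f3 = np.array([4, 4, 5, 3, 4, 5, 4, 4, 3, 5, 4, 4])
t, p = stats.ttest_1samp(f3, popmean=3.0)
f3_signal = (f3.mean() < 3.0) and (p < 0.05)
```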

Handling of negative results​

If the co-primary endpoint for one or more benefits fails to reach statistical significance (Holm-Bonferroni-adjusted p >= 0.05), or if the effect size is < 0.3:

  1. The finding is reported transparently in the study report. Results are never suppressed or re-analysed post hoc to achieve significance.
  2. The supportive quantitative endpoints and Likert secondary endpoints for that benefit are examined to determine whether the failure is isolated to the co-primary metric or consistent across the benefit dimension.
  3. The clinical evaluation team assesses whether the benefit claim should be: (a) retained with additional justification from other evidence sources (prospective clinical investigations, published literature), (b) narrowed in scope (e.g., limited to specific sub-criteria with supporting evidence), or (c) removed from the CER if no supporting evidence exists.
  4. This assessment is documented in the CER per MDCG 2020-6 §6.5.e, which requires manufacturers to "narrow the intended purpose of the device" if there is not sufficient supportive clinical evidence for a claimed benefit.

Acknowledged limitations​

This study has the following methodological limitations, which are acknowledged and addressed through the study design:

Physician-reported outcomes​

Outcomes are reported by healthcare professionals based on their clinical experience, not independently verified through patient records or clinical databases. This introduces potential reporting bias (respondents may overestimate device benefit).

Mitigation: The evidence quality control question identifies respondents who consulted institutional records, enabling the sensitivity analysis (Section 10.4) to characterise the impact of self-report vs. record-verified data.

Recall bias​

Respondents are asked to recall events and estimates from "the last 12 months" or "since incorporating the device." Memory-based estimates may be inaccurate.

Mitigation: (1) The 12-month recall window limits the time horizon. (2) The evidence quality control question identifies record-consulted responses, which are less susceptible to recall bias. (3) The cover letter encourages respondents to consult records before answering.

Non-randomised cross-sectional design with retrospective recall​

There is no control group. The study uses a cross-sectional design (data collected at a single time point) with retrospective recall (questions about past events). It cannot establish causality --- only association between device use and physician-reported perceived outcomes.

Mitigation: Comparison against published SotA baselines (Section 9) provides contextual benchmarks. The pre-specified SotA comparators define what the literature shows for comparable settings without the device, substituting for a direct control group. This is acknowledged as a weaker form of comparison than a randomised controlled trial.

Selection bias​

Only institutions with active device contracts are included. These institutions chose to adopt and continue using the device, introducing survivorship bias. Institutions that discontinued use are not represented.

Mitigation: This bias is inherent to post-market surveillance and is acknowledged. The study population represents the device's actual user base, which is the relevant population for evaluating real-world clinical performance. The PMS report will note this limitation.

Single-instrument data collection​

All data is collected through a single questionnaire administered at a single time point, with no triangulation against independent data sources.

Mitigation: The evidence quality control stratification provides internal validation. Record-consulted responses are partially triangulated against institutional data systems. The questionnaire's dual-format design (Likert + quantitative) provides convergent evidence for each benefit dimension.

Ethics and data protection​

Informed consent​

The questionnaire includes a bilingual (English/Spanish) informed consent and data protection section that respondents must read before completing the survey. Consent is explicit: respondents must check a consent box confirming they have read and understood the information and freely consent to the processing of their responses. The consent text is reproduced in full in the questionnaire instrument (questionnaire.mdx).

The consent text includes:

  • A statement that participation is entirely voluntary
  • A statement that participation or non-participation will have no effect on the respondent's professional relationship with the device manufacturer or their institution's contract
  • Information on the right to withdraw consent at any time
  • Contact details for the data controller

Data controller​

The device manufacturer.

Legal basis for processing​

Explicit consent of the data subject (GDPR Article 6(1)(a)). Consent is freely given, specific (limited to the stated purpose), informed (the consent text describes what data is collected, how it will be used, retention period, and respondent rights), and unambiguous (provided by checking a consent box). This legal basis was selected because the study involves identifiable professional data (institution name + role) and the data controller has a pre-existing commercial relationship with the respondents' institutions, creating a potential power imbalance that makes legitimate interest (GDPR Article 6(1)(f)) insufficiently robust.

Data collected​

  • Professional role, institution name, duration and volume of device use, clinical setting
  • Likert-scale professional opinions (benefit and safety)
  • Quantitative estimates of physician-reported perceived clinical outcomes
  • Evidence quality control responses (record-consulted vs. estimate)
  • Binary safety screening responses (misleading outputs, usability issues) with conditional free-text descriptions
  • Optional free-text feedback

No patient data is collected. No personal health information is requested. No personally identifiable patient information is captured.

Data use and disclosure​

  • Responses are analysed in aggregate
  • Individual responses are not published or shared outside the clinical evaluation team
  • Institution names are collected for data quality verification (duplicate prevention, site-level analysis) and are not disclosed in published reports
  • Results are reported in the study report, the legacy device PMS report (MDR Article 85), and the CER

Data retention​

Data is retained for the lifetime of the device's technical documentation, as required by MDR Annex IX.

Respondent rights​

  • Respondents may withdraw their consent at any time by contacting the data controller. Withdrawal of consent does not affect the lawfulness of processing based on consent before its withdrawal (GDPR Article 7(3))
  • Rights of access, rectification, and erasure under GDPR Articles 15--17
  • Right to data portability under GDPR Article 20
  • Right to lodge a complaint with a supervisory authority under GDPR Article 77

Ethics committee review​

This study does not involve patient data collection, clinical intervention, or modification of clinical practice. It collects healthcare professional opinions and estimates about their existing clinical experience. Formal ethics committee approval is not required for post-market surveillance activities conducted under the manufacturer's MDR obligations. This determination is consistent with the study's classification as a non-interventional PMS activity.

However, individual institutional ethics requirements vary. Before deploying the questionnaire to each institution, the study team will verify whether the institution requires local ethics committee notification or approval for staff surveys. If any institution requires such notification, it will be obtained before collecting data from that site. Any institutional requirements and their resolution will be documented in the study report.

Data management​

Data collection platform​

Responses are collected electronically via a survey platform (e.g., Google Forms). The platform is configured to enforce the consent checkbox as mandatory before any questions are presented.

Data export and storage​

Upon closure of the data collection window, the raw dataset is exported as a CSV file. This raw export is the authoritative data record and is stored unmodified in the device's technical documentation archive. All subsequent cleaning and analysis are performed on copies of the raw export.

Data validation rules​

The following validation rules are applied during data cleaning. Violations are documented in a data cleaning log:

| Field | Validation rule | Action on violation |
| --- | --- | --- |
| Percentage questions (B2, C5, D2, D4, D6, D7) | Value must be 0--100 | Values > 100 are queried with the respondent if contactable; otherwise excluded from that endpoint's analysis with reason documented |
| Count questions (B4, B6, C4) | Value must be >= 0 and integer | Negative values excluded; non-integer values rounded to nearest integer |
| Likert questions (B1, B3, B5, C1--C3, D1, D3, D5, E1, F3) | Value must be 1--5 | Values outside range excluded from that endpoint |
| Evidence quality control (B2c, B4c, etc.) | Value must be "a" or "b" | Missing or invalid values: quantitative data point is included in overall analysis but excluded from the sensitivity analysis subgroups |
| D5--D7c for in-person-only respondents | Must be N/A if A5 = "In-person only" | Non-N/A values from in-person-only respondents for remote care questions are excluded |
| Duplicate detection | No two complete responses from the same institution + role + duration combination | Potential duplicates are flagged for review; if confirmed duplicate, the later submission is excluded |

Missing data handling​

  • Missing quantitative responses: excluded per-endpoint, not per-respondent. A respondent who skips B4 (rare disease count) but answers all other questions is included in all analyses except the B4 endpoint.
  • Missing Likert responses: excluded per-question.
  • Missing evidence quality control: the corresponding quantitative data point is included in the overall analysis but excluded from the sensitivity analysis subgroups.
  • Missing safety responses (F1, F2, F3): treated as missing. Not imputed.
  • A respondent with > 50% missing mandatory items (demographics + Likert + quantitative) is excluded entirely, with reason documented.

Database lock​

The dataset is locked (no further modifications permitted) once:

  1. The data collection window has closed
  2. All validation rules have been applied and violations resolved
  3. The data cleaning log has been finalised

The locked dataset version, the data cleaning log, and the raw export are retained as part of the study documentation.

Audit trail​

All data processing steps from raw export to final analysis dataset are documented:

  • Raw CSV export (unmodified)
  • Data cleaning log (listing every validation rule violation, the action taken, and the rationale)
  • Final analysis CSV (after cleaning)
  • Statistical analysis outputs (tables, test results, figures)

Sample size justification​

Power analysis​

Based on Phase 2 synthetic data validation, power was computed for the one-sample t-test (two-sided, α = 0.05; conservative relative to the one-sided test used in the primary analysis):

| Scenario | n | Cohen's d | Power |
| --- | --- | --- | --- |
| Target sample (all endpoints) | 60 | 0.4 | 0.873 |
| Target sample (all endpoints) | 60 | 0.5 | 0.972 |
| Remote care endpoints (D5--D7) | 44 | 0.4 | 0.756 |
| Remote care endpoints (D5--D7) | 44 | 0.5 | 0.913 |
| Realistic: 45 respondents | 45 | 0.4 | 0.765 |
| Realistic: 45 respondents | 45 | 0.5 | 0.918 |
| Minimum viable: 30 respondents | 30 | 0.4 | 0.591 |
| Minimum viable: 30 respondents | 30 | 0.5 | 0.782 |
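Power values of this kind can be reproduced approximately with statsmodels' noncentral-t power routine; this is a sketch, and the exact Phase 2 computation may have used a different tool:

```python
# Sketch of the power computation for a one-sample t-test
# (two-sided, alpha = 0.05) across the sample-size scenarios.
from statsmodels.stats.power import TTestPower

analysis = TTestPower()
for n in (30, 45, 60):
    for d in (0.4, 0.5):
        power = analysis.power(effect_size=d, nobs=n, alpha=0.05,
                               alternative="two-sided")
        print(f"n={n}, d={d}: power={power:.3f}")
```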

Sample size targets​

  • Minimum: 30 respondents. At n = 30, power is >= 0.78 for medium effects (d = 0.5), sufficient for the primary analysis.
  • Target: 45 respondents. At n = 45, power exceeds 0.76 for d = 0.4 across all endpoints, including remote care questions (reduced n due to in-person-only respondents).
  • Stretch: 60 respondents. Provides robust power (>= 0.87 for d = 0.4) and enables meaningful subgroup analyses.

Feasibility​

The eligible population (200--500 HCPs across 21 institutions) supports a target of 45--60 respondents (response rate of 10--25%). The cover letter and institutional contacts are expected to facilitate adequate response rates.

Reporting​

Study report​

A standalone study report will be produced documenting:

  1. Study objectives, design, and methods (summarising this protocol)
  2. Participant demographics and response rate
  3. Co-primary endpoint results: per-endpoint descriptive statistics, MCID test results (unadjusted and Holm-Bonferroni-adjusted p-values), Cohen's d, SotA comparison
  4. Supportive quantitative endpoint results: per-endpoint descriptive statistics, MCID test results (unadjusted p-values), Cohen's d
  5. Sensitivity analysis: stratification by data source, consistency assessment
  6. Secondary endpoint results: Likert professional opinion summaries, qualitative feedback themes
  7. Safety assessment: F1/F2 proportions, F3 Likert summary, thematic analysis of safety free-text
  8. Subgroup analyses: by role, duration, case volume, clinical setting
  9. Benefit coverage assessment: per-benefit conclusion (confirmed/not confirmed)
  10. Handling of negative results (if applicable): assessment per Section 10.8
  11. Acknowledged limitations and their impact on interpretation
  12. Overall conclusion: whether the study provides evidence that the legacy device achieves its declared clinical benefits in routine clinical practice, and whether the benefit-risk profile is acceptable

Integration into regulatory documents​

Study results will be incorporated into:

  • Legacy device PMS report (per MDR Article 85): Study results form the core of the PMS report, alongside the adverse event summary (7 non-serious incidents, 0 serious), complaint data from the legacy device's commercial history, and the safety data from Section F (misleading outputs, usability issues, perceived safety)
  • Clinical Evaluation Report (R-TF-015-003): A new section presenting the post-market clinical evidence per MDCG 2020-6 §6.2.2, referencing both the study report and the PMS report. The CER data appraisal must include a transparent justification for the Rank 4 classification, with the weighting assigned to this evidence relative to the prospective clinical investigations
  • Clinical Evaluation Plan (R-TF-015-001): Update to the evidence hierarchy table, marking Rank 4 (high quality survey under formal protocol) and Rank 8 (professional opinion) as "Used"

Evidence classification​

Per MDCG 2020-6 Appendix III:

  • The study's quantitative outcomes (co-primary and supportive endpoints): Classified as Rank 4 --- "Outcomes from studies with potential methodological flaws but where data can still be quantified and acceptability justified." The Appendix III considerations column for Rank 4 explicitly states: "High quality surveys may also fall into this category." This study qualifies as a "high quality survey" because it incorporates a formal protocol with pre-specified endpoints, MCID thresholds derived from published SotA, comparison against published baselines, a pre-specified sensitivity analysis, Holm-Bonferroni multiplicity correction, and transparently acknowledged methodological limitations (physician-reported perceived outcomes, recall bias, non-randomised cross-sectional design with retrospective recall). The methodological limitations are consistent with the Rank 4 characterisation of "potential methodological flaws" and are mitigated through the study design (Sections 10.4, 11.1--11.5).
  • Likert professional opinion data (B1, B3, B5, C1--C3, D1, D3, D5, E1, F3): Contributes as Rank 8 supporting evidence --- "Proactive PMS data, such as that derived from surveys and professional opinion"
  • Qualitative feedback (E2, E3, F1a, F2a): Not ranked in the MDCG hierarchy. Contributes to PMS completeness and safety surveillance.

Protocol amendments​

Any modifications to this protocol after data collection begins must be documented as a protocol amendment with:

  • Amendment number and date
  • Description of change
  • Rationale for change
  • Impact assessment on previously collected data

Pre-specified analyses (Sections 10.1--10.8) may not be modified post hoc. Any additional analyses performed beyond those specified in this protocol must be clearly labelled as exploratory in the study report.

References​

Regulatory guidance​

  • MDCG 2020-6: Guidance on sufficient clinical evidence for legacy devices under Article 61(4)/(5)
  • MDCG 2020-13: Clinical evaluation assessment report template
  • MEDDEV 2.7/1 Rev 4: Clinical evaluation --- a guide for manufacturers and notified bodies
  • MDR 2017/745: Regulation (EU) 2017/745, Articles 61, 83, 85, Annex IX, Annex XIV

Published literature (SotA comparators)​

  • Ba et al. 2022: AI-assisted diagnostic accuracy improvement
  • Baker et al. 2022: Medical device-assisted reduction of unnecessary referrals
  • Barata et al. 2023: Reinforcement learning AI for melanoma detection
  • Bożek et al. 2018: PASI inter-rater variability (n=120, mean absolute difference 3.3 points)
  • Brinker et al. 2019: Deep learning outperforms dermatologists in melanoma detection
  • Burton et al. 1998: PCP diagnostic accuracy for skin lesions
  • Chen et al. 2024: AI-assisted dermatological diagnosis across experience levels
  • Eminovic et al. 2009: Teledermatology reduction of unnecessary referrals
  • Escalé-Besa et al. 2023: HCP diagnostic accuracy for skin conditions
  • Ferris et al. 2025: AI-assisted dermatological diagnosis
  • Foster et al. 2013: Biologic therapy modification rates in psoriasis (J Dermatolog Treat, n=169)
  • Gerbert et al. 1996: PCP referral accuracy for skin conditions
  • Giavina-Bianchi et al. 2020: Teledermatology waiting time reduction and remote management
  • Goldfarb et al. 2021: Inter-observer agreement for HS severity scoring
  • Haenssle et al. 2018: Man against machine --- diagnostic performance of CNN vs. dermatologists
  • Han et al. 2020/2022: AI-assisted diagnostic accuracy studies
  • Hillary & Lambert 2021: Survey of severity scoring tool usage frequency among dermatologists (n=149, PMC 8710531)
  • Hsiao & Oh 2008: Teledermatology waiting time reduction
  • Jain et al. 2021: AI-assisted referral reduction
  • Kheterpal et al. 2023: Remote dermatological management capacity
  • Kim et al. 2022: Diagnostic accuracy baseline studies
  • Knol et al. 2006: Teledermatology referral reduction
  • Krakowski et al. 2024: AI-assisted diagnostic accuracy
  • Maron et al. 2019/2020: AI-assisted melanoma detection and diagnostic accuracy improvement
  • Moreno-Ramírez et al. 2017: MDi-Psoriasis decision support --- treatment agreement among dermatologists (Actas Dermosifiliogr, n=10 cases)
  • Morton et al. 2010: Teledermatology waiting time reduction
  • Muñoz-López et al. 2021: HCP diagnostic accuracy baseline
  • Orekoya et al. 2021: Remote dermatological management
  • Thorlacius et al. 2019: IHS4 inter-observer agreement
  • Tschandl et al. 2019/2020: AI-assisted diagnostic studies
  • Whited 2015: Teledermatology evidence review

Internal study documents​

  • R-TF-015-001: Clinical Evaluation Plan
  • R-TF-015-003: Clinical Evaluation Report
  • R-TF-015-011: State of the Art
  • Questionnaire instrument: questionnaire.mdx
  • Synthetic data validation: preliminary-analysis.md
  • Synthetic dataset: synthetic-data-60.csv

Signature meaning

The signatures for the approval process of this document can be found in the verified commits at the repository for the QMS. As a reference, the team members who are expected to participate in this document and their roles in the approval process, as defined in Annex I Responsibility Matrix of the GP-001, are:

  • Author: Team members involved
  • Reviewer: JD-003 Design & Development Manager, JD-004 Quality Manager & PRRC
  • Approver: JD-001 General Manager
All the information contained in this QMS is confidential. The recipient agrees not to transmit or reproduce the information, neither by himself nor by third parties, through whichever means, without obtaining the prior written permission of Legit.Health (AI Labs Group S.L.)