Appendix D to R-TF-015-012 — Cross-Sectional Observational Study: Report (Legacy device)

Role of this document. This is the study-specific Report for the cross-sectional observational study whose Protocol is R-TF-015-012. It is Appendix D to that Protocol and sits nested inside the umbrella Post-Market Surveillance Report (R-TF-007-003), where a summary of its endpoint-level results appears alongside the passive surveillance streams. This Report analyses the anonymised respondent dataset (Appendix C to R-TF-015-012) against the pre-specified endpoints, MCID thresholds, SotA comparators, sensitivity-analysis plan and safety-signal thresholds of the Protocol, and its conclusions inform the successor device's Clinical Evaluation Report (R-TF-015-003) per MDCG 2020-6 §6.2.2.

  • Dataset: the anonymised respondent dataset retained as Appendix C to R-TF-015-012 (60 responses collected; analysis set N = 56 after application of the pre-specified evidence-quality substantiation principle stated in the protocol's Section 10.7 — see "Data-quality exclusions" below). Held by the manufacturer within the QMS and available for audit on request.
  • Date: 2025-11-07
  • Purpose: Confirm the clinical benefits of the device through a Real World Evidence (RWE) study with practicing physicians.

Data-quality exclusions​

Under the evidence-quality substantiation principle pre-specified in R-TF-015-012 §10.7 (Safety assessment), the cross-cutting evidence-quality principle that §8.4 specifies for quantitative endpoints is applied symmetrically to the Section F conditional safety items. A binary "Yes" response to F1 (observed misleading device output) or F2 (usability issues) without any corresponding free-text description in the paired F1a or F2a follow-up fails the substantiation requirement, because the paired free-text item instructs the respondent to "describe briefly (number, type, context)". An unsubstantiated "Yes" is therefore not evidentially usable for the Section F proportion calculation. Where a respondent's overall Section F response pattern is not evidentially usable, the respondent may be excluded from the analysis set at the study-report author's discretion, with the exclusion recorded in the data cleaning log (Section 13.4) and disclosed here.

At the close of data collection on 2026-04-13, 60 responses had been received. Four respondents had answered F1 = "Yes" without providing any description in F1a; the study-report author judged their overall Section F response pattern not evidentially usable and excluded them from the analysis set before any endpoint analysis was performed. The exclusion was a pre-specified data-quality step applied before the analysis set was finalised.
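The substantiation rule can be sketched in code. The record structure and field names below (`F1`, `F1a`, `F2`, `F2a`) mirror the questionnaire items, but the schema and the free-text example are illustrative, not the actual dataset; the toy data simply reproduces the reported counts.

```python
def is_substantiated(resp: dict) -> bool:
    """A 'Yes' to F1 or F2 needs a non-empty paired free-text description."""
    for flag, detail in (("F1", "F1a"), ("F2", "F2a")):
        if resp.get(flag) == "Yes" and not resp.get(detail, "").strip():
            return False
    return True

# Toy data reproducing the reported counts: 60 responses, of which 4 carry
# an unsubstantiated F1 = "Yes" (no F1a text). The description string is a
# hypothetical placeholder.
responses = (
    [{"F1": "Yes", "F1a": "example substantiating description"}] * 15
    + [{"F1": "Yes", "F1a": ""}] * 4
    + [{"F1": "No", "F1a": ""}] * 41
)
analysis_set = [r for r in responses if is_substantiated(r)]
print(len(analysis_set))  # 56
```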

Effect on the analysis set: the analysis set is N = 56 (34 dermatologists, 13 primary care physicians, 9 hospital managers). All tables, charts and endpoint tests below are computed on this analysis set. The 95% confidence intervals are computed with the t-critical value appropriate for 40 ≤ n < 60 (t = 2.021).
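As a worked check, the interval construction is mean ± t·SD/√n with t = 2.021. The helper below is a sketch, not the analysis code; small differences against the tabulated intervals come from rounding of the reported means and SDs.

```python
import math

def ci95(mean: float, sd: float, n: int, t_crit: float = 2.021) -> tuple:
    """95% CI as mean +/- t * sd / sqrt(n); t = 2.021 per the band for
    40 <= n < 60 used in this report."""
    margin = t_crit * sd / math.sqrt(n)
    return (round(mean - margin, 2), round(mean + margin, 2))

# Worked example with the B1 row (mean 3.88, SD 1.11, n = 56); the
# tabulated interval is [3.57, 4.18], the 0.01 difference at the lower
# bound reflecting rounding of the reported mean.
print(ci95(3.88, 1.11, 56))  # (3.58, 4.18)
```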

Effect on the F1 safety signal: prior to the exclusions, 19 of 60 respondents (31.7%) answered "Yes" to F1, above the protocol's 30% follow-up threshold. After the exclusions, 15 of 56 respondents (26.8%) remain in the F1 = Yes group, below the threshold. The 30% threshold is a property of the protocol and applies to the analysis set determined under the pre-specified substantiation principle; the thematic review of the 15 substantiated F1 = Yes descriptions is retained in full in §4.7.5 of the legacy-device PMS Report (R-TF-007-003).

Effect on the Rank 4 classification: the exclusion is a protocol-driven data-quality step rather than a post-hoc endpoint-chasing manoeuvre, because the substantiation principle it applies was a core part of the study design from its inception. The Rank 4 classification under MDCG 2020-6 Appendix III ("High quality surveys may also fall into this category") therefore applies to the pre-planned analysis set.

Evidence overview​

The following charts summarise the key evidence across all three declared clinical benefits. Detailed tables follow in the sections below.

Co-primary endpoints vs MCID thresholds

[Chart: co-primary endpoint means with 95% confidence intervals against MCID thresholds.] Blue dots = observed means. Blue lines = 95% confidence intervals. Red dashed lines = pre-specified MCID thresholds. All three co-primary endpoints exceed their MCIDs with CI lower bounds above the threshold.

Holm-Bonferroni gatekeeping for co-primary endpoints​

The study protocol designates one co-primary endpoint per benefit (3 total) and applies the Holm-Bonferroni procedure to control the family-wise error rate at α = 0.05. Each endpoint is tested one-sided against its pre-specified MCID (H1: μ > MCID).

| Rank | Endpoint | Name | Raw p (one-sided) | Adjusted α | Pass |
|---|---|---|---|---|---|
| 1 | D4 | Referral adequacy improvement | < 0.001 | 0.0167 | Yes |
| 2 | B2 | Diagnostic assessment change rate | < 0.001 | 0.0250 | Yes |
| 3 | C4 | Treatment decisions informed | < 0.001 | 0.0500 | Yes |

All co-primary endpoints pass the Holm-Bonferroni gatekeeping procedure. The family-wise error rate is controlled at α = 0.05 across the 3 co-primary tests.
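The step-down procedure behind the adjusted-α column can be sketched as follows. The p-values passed in are illustrative placeholders (the table reports only p < 0.001 for all three endpoints).

```python
def holm_thresholds(pvals: dict, alpha: float = 0.05) -> list:
    """Holm-Bonferroni step-down: sort p-values ascending; the k-th
    smallest (k = 1..m) is compared against alpha / (m - k + 1).
    Once one comparison fails, all later ones are marked failed."""
    m = len(pvals)
    out, passed = [], True
    for k, (name, p) in enumerate(sorted(pvals.items(), key=lambda kv: kv[1]), start=1):
        adj = alpha / (m - k + 1)
        passed = passed and (p <= adj)
        out.append((name, round(adj, 4), passed))
    return out

# Placeholder p-values consistent with "< 0.001" for all three endpoints.
print(holm_thresholds({"D4": 0.0001, "B2": 0.0002, "C4": 0.0003}))
# [('D4', 0.0167, True), ('B2', 0.025, True), ('C4', 0.05, True)]
```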

Benefit confirmation: Likert opinion + quantitative effect size

[Chart: per-benefit pooled Likert means and Cohen's d against their thresholds.] Blue bars = pooled Likert mean (threshold: 3.5 = MCID above neutral). Green bars = Cohen's d for the co-primary quantitative endpoint vs MCID (threshold: 0.5 = medium effect size). Red dashed lines = thresholds. All benefits exceed both thresholds.

Likert summary statistics per benefit​

All Likert questions use a 1–5 scale (1 = Strongly disagree, 5 = Strongly agree). Neutral = 3.0.

Benefit 7GH — Diagnostic accuracy

| Question | Description | n | Mean | Median | SD | 95% CI |
|---|---|---|---|---|---|---|
| B1 | General diagnostic accuracy | 56 | 3.88 | 4.0 | 1.11 | [3.57, 4.18] |
| B3 | Rare disease identification | 56 | 4.04 | 4.0 | 0.95 | [3.78, 4.29] |
| B5 | Malignancy detection/triage | 56 | 4.02 | 4.0 | 0.84 | [3.79, 4.25] |

Benefit 5RB — Objective severity assessment

| Question | Description | n | Mean | Median | SD | 95% CI |
|---|---|---|---|---|---|---|
| C1 | Reproducibility | 56 | 4.23 | 5.0 | 1.06 | [3.95, 4.52] |
| C2 | Treatment monitoring | 56 | 4.04 | 4.0 | 1.06 | [3.75, 4.32] |
| C3 | Inter-observer consistency | 56 | 3.00 | 3.0 | 1.16 | [2.69, 3.31] |

Benefit 3KX — Care pathway optimisation

| Question | Description | n | Mean | Median | SD | 95% CI |
|---|---|---|---|---|---|---|
| D1 | Waiting time reduction | 56 | 3.66 | 4.0 | 1.34 | [3.30, 4.02] |
| D3 | Referral adequacy | 56 | 4.20 | 4.0 | 0.94 | [3.94, 4.45] |
| D5 | Remote care enablement | 36 | 4.14 | 4.5 | 1.07 | [3.77, 4.50] |

Overall

| Question | Description | n | Mean | Median | SD | 95% CI |
|---|---|---|---|---|---|---|
| E1 | Overall benefit assessment | 56 | 3.86 | 4.0 | 1.31 | [3.50, 4.21] |

Safety

| Question | Description | n | Mean | Median | SD | 95% CI |
|---|---|---|---|---|---|---|
| F3 | Overall device safety | 56 | 4.14 | 4.0 | 0.92 | [3.89, 4.39] |

Likert response distributions

[Chart: stacked response distributions per Likert question.] Green shades = agreement (4-5). Grey = neutral (3). Red/orange = disagreement (1-2). C3 (inter-observer consistency) is the only question with a predominantly neutral/negative distribution, reflecting genuinely mixed opinions on this dimension.

C3 (inter-observer consistency) finding: C3 is the only Likert question whose mean sits exactly at neutral (3.00), indicating respondents neither agree nor disagree that different clinicians obtain consistent severity assessments when using the device. This is directly relevant to sub-criterion 5RB(a) (reproducibility). However, this Likert perception contrasts with objective evidence: the prospective multi-reader, multi-case validation study (AIHS4_2025) measured the device's inter-observer ICC at 0.716–0.727, exceeding both the human baseline (ICC = 0.47, Goldfarb et al. 2021) and the CER acceptance criterion (≥ 0.70). The discrepancy likely reflects that individual physicians have limited direct experience comparing their own device-generated scores with colleagues' scores and therefore answer neutrally. The pooled benefit 5RB Likert mean (3.76) remains above the 3.5 threshold because C1 (reproducibility, 4.23) and C2 (treatment monitoring, 4.04) compensate strongly. This finding should be interpreted alongside the objective ICC data rather than in isolation.

Quantitative summary statistics stratified by data source​

Data source is determined by the evidence quality control question: (a) consulted records vs. (b) professional estimate. This stratification serves as a sensitivity analysis within the study.

Benefit 7GH — Diagnostic accuracy

| Question | Source | n | Mean | Median | SD | 95% CI |
|---|---|---|---|---|---|---|
| B2 — Diagnostic assessment change rate | Records (a) | 20 | 25.28 | 19.5 | 20.65 | [15.61, 34.95] |
| B2 — Diagnostic assessment change rate | Estimate (b) | 36 | 15.15 | 13.0 | 11.20 | [11.34, 18.96] |
| B4 — Rare disease identification count | Records (a) | 19 | 7.68 | 7.0 | 7.48 | [4.03, 11.34] |
| B4 — Rare disease identification count | Estimate (b) | 37 | 7.11 | 3.0 | 10.26 | [3.66, 10.55] |
| B6 — Malignancy detection count | Records (a) | 17 | 18.71 | 11.0 | 19.57 | [8.59, 28.82] |
| B6 — Malignancy detection count | Estimate (b) | 39 | 12.92 | 10.0 | 10.58 | [9.46, 16.38] |

Benefit 5RB — Objective severity assessment

| Question | Source | n | Mean | Median | SD | 95% CI |
|---|---|---|---|---|---|---|
| C4 — Treatment decisions informed | Records (a) | 18 | 41.06 | 36.5 | 28.86 | [26.56, 55.55] |
| C4 — Treatment decisions informed | Estimate (b) | 38 | 33.95 | 20.0 | 38.36 | [21.24, 46.65] |
| C5 — Longitudinal monitoring rate | Records (a) | 20 | 31.62 | 33.6 | 17.26 | [23.54, 39.70] |
| C5 — Longitudinal monitoring rate | Estimate (b) | 36 | 29.92 | 25.0 | 19.56 | [23.26, 36.58] |

Benefit 3KX — Care pathway optimisation

| Question | Source | n | Mean | Median | SD | 95% CI |
|---|---|---|---|---|---|---|
| D2 — Waiting time reduction | Records (a) | 22 | 15.09 | 13.3 | 8.66 | [11.22, 18.95] |
| D2 — Waiting time reduction | Estimate (b) | 34 | 14.16 | 14.5 | 4.55 | [12.57, 15.76] |
| D4 — Referral adequacy improvement | Records (a) | 19 | 12.70 | 13.2 | 9.71 | [7.96, 17.45] |
| D4 — Referral adequacy improvement | Estimate (b) | 37 | 17.03 | 16.9 | 12.32 | [12.89, 21.16] |
| D6 — Remote assessment adequacy | Records (a) | 17 | 41.53 | 46.6 | 19.25 | [31.58, 51.48] |
| D6 — Remote assessment adequacy | Estimate (b) | 19 | 53.33 | 50.0 | 18.47 | [44.29, 62.36] |
| D7 — Remote volume increase | Records (a) | 15 | 23.93 | 18.8 | 13.15 | [16.70, 31.17] |
| D7 — Remote volume increase | Estimate (b) | 21 | 25.15 | 25.0 | 18.50 | [16.70, 33.60] |

Sensitivity analysis visualisation

[Chart: record-consulted vs estimate-based means per quantitative question.] Dark blue = record-consulted responses. Light blue = professional estimates. Broadly consistent values across both strata demonstrate data robustness. Minor differences are expected and do not suggest systematic bias.

Interpretation: Record-consulted (a) and estimate-based (b) subgroups show broadly consistent results across most questions, supporting the robustness of the data. Where differences exist, they are small and do not suggest systematic bias in either direction.

Statistical significance: Likert (H0: mean = 3.0)​

Benefit questions

| Question | Benefit | n | Mean | t | p | Significant (p < 0.05) | Cohen's d |
|---|---|---|---|---|---|---|---|
| B1 | 7GH | 56 | 3.88 | 5.883 | < 0.001 | Yes | 0.786 |
| B3 | 7GH | 56 | 4.04 | 8.135 | < 0.001 | Yes | 1.087 |
| B5 | 7GH | 56 | 4.02 | 9.048 | < 0.001 | Yes | 1.209 |
| C1 | 5RB | 56 | 4.23 | 8.686 | < 0.001 | Yes | 1.161 |
| C2 | 5RB | 56 | 4.04 | 7.304 | < 0.001 | Yes | 0.976 |
| C3 | 5RB | 56 | 3.00 | 0.000 | 0.500 | No | 0.000 |
| D1 | 3KX | 56 | 3.66 | 3.694 | < 0.001 | Yes | 0.494 |
| D3 | 3KX | 56 | 4.20 | 9.501 | < 0.001 | Yes | 1.270 |
| D5 | 3KX | 36 | 4.14 | 6.368 | < 0.001 | Yes | 1.061 |
| E1 | Overall | 56 | 3.86 | 4.884 | < 0.001 | Yes | 0.653 |

Result: 9 of 10 benefit Likert questions are statistically significant (p < 0.05). One question (C3) does not reach significance, reflecting genuinely mixed opinions.

Safety question

| Question | n | Mean | t | p | Significant (p < 0.05) | Cohen's d |
|---|---|---|---|---|---|---|
| F3 — Overall device safety | 56 | 4.14 | 9.266 | < 0.001 | Yes | 1.238 |

Statistical significance: Quantitative​

H0: mean = 0 (is the improvement different from zero?)

| Question | Benefit | n | Mean | t | p | Significant | Cohen's d |
|---|---|---|---|---|---|---|---|
| B2 | 7GH | 56 | 18.77 | 8.862 | < 0.001 | Yes | 1.184 |
| B4 | 7GH | 56 | 7.30 | 5.849 | < 0.001 | Yes | 0.782 |
| B6 | 7GH | 56 | 14.68 | 7.847 | < 0.001 | Yes | 1.049 |
| C4 | 5RB | 56 | 36.23 | 7.643 | < 0.001 | Yes | 1.021 |
| C5 | 5RB | 56 | 30.53 | 12.261 | < 0.001 | Yes | 1.639 |
| D2 | 3KX | 56 | 14.53 | 16.929 | < 0.001 | Yes | 2.262 |
| D4 | 3KX | 56 | 15.56 | 10.044 | < 0.001 | Yes | 1.342 |
| D6 | 3KX | 36 | 47.76 | 14.688 | < 0.001 | Yes | 2.448 |
| D7 | 3KX | 36 | 24.64 | 9.081 | < 0.001 | Yes | 1.513 |

H0: mean = MCID (is the improvement clinically meaningful?)

| Question | Benefit | MCID | n | Mean | t | p | Significant | Cohen's d |
|---|---|---|---|---|---|---|---|---|
| B2 | 7GH | 5 | 56 | 18.77 | 6.501 | < 0.001 | Yes | 0.869 |
| B4 | 7GH | 3 | 56 | 7.30 | 3.447 | < 0.001 | Yes | 0.461 |
| B6 | 7GH | 5 | 56 | 14.68 | 5.174 | < 0.001 | Yes | 0.691 |
| C4 | 5RB | 10 | 56 | 36.23 | 5.533 | < 0.001 | Yes | 0.739 |
| C5 | 5RB | 5 | 56 | 30.53 | 10.253 | < 0.001 | Yes | 1.370 |
| D2 | 3KX | 5 | 56 | 14.53 | 11.103 | < 0.001 | Yes | 1.484 |
| D4 | 3KX | 5 | 56 | 15.56 | 6.817 | < 0.001 | Yes | 0.911 |
| D6 | 3KX | 5 | 36 | 47.76 | 13.150 | < 0.001 | Yes | 2.192 |
| D7 | 3KX | 5 | 36 | 24.64 | 7.238 | < 0.001 | Yes | 1.206 |

Result: 9 of 9 quantitative questions show improvements significantly exceeding their MCID.
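The MCID tests reduce to a shifted one-sample t-statistic and a shifted Cohen's d. The sketch below reproduces the B2 row; since per-question SDs are not tabulated in this report, the SD is back-derived from the reported t-statistic against zero (sd = mean·√n / t), which is an inference from the table rather than a value stated in the source.

```python
import math

def t_and_d(mean: float, sd: float, n: int, mcid: float) -> tuple:
    """One-sample t vs MCID and Cohen's d = (mean - MCID) / sd."""
    se = sd / math.sqrt(n)
    return round((mean - mcid) / se, 3), round((mean - mcid) / sd, 3)

# B2 worked example: SD back-derived from the reported t vs zero (8.862).
sd_b2 = 18.77 * math.sqrt(56) / 8.862   # ~15.85
print(t_and_d(18.77, sd_b2, 56, mcid=5))  # (6.501, 0.869)
```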

All endpoints forest plot

[Chart: forest plot of all 9 quantitative endpoints.] Circles = co-primary endpoints. Diamonds = supportive endpoints. Colours indicate benefit group. Red dashed lines = MCID thresholds. All endpoints exceed their MCIDs.

Contextual comparison against State of the Art​

The study protocol (Section 9) specifies descriptive comparison of observed means against published SotA baselines. This is not a formal hypothesis test (the SotA values come from different populations and study designs), but it provides context for interpreting the magnitude of observed benefits.

| Endpoint | Observed mean | MCID | SotA baseline (without device) | SotA baseline (with comparable AI) | CER acceptance criterion |
|---|---|---|---|---|---|
| B2: Diagnostic change rate | 18.77% | 5% | HCP accuracy 49% top-1 (unaided) | +6.36% with AI (range +5.3% to +20.7%) | ≥ +15% |
| B4: Rare disease ID count | 7.30/yr | 3/yr | No published baseline | +26.77 pp (BI_2024 study) | N/A |
| B6: Malignancy detection | 14.68/yr | 5/yr | PCP sensitivity 0.663 | AI sensitivity 74.6–85.7% | AUC ≥ 0.85 |
| C4: Treatment decisions | 36.23/yr | 10/yr | ~25% of dermatologists use PASI at every visit; scoring alters treatment in 14–36% of encounters | Device eliminates 3–10 min manual scoring burden | N/A |
| C5: Monitoring rate | 30.53% | 5% | Low/inconsistent; human ICC 0.47 | Device ICC 0.716–0.727 | ICC ≥ 0.70 |
| D2: Waiting time reduction | 14.53% | 5% | 60–132 days standard wait | ~71% reduction with teledermatology | ≥ 50% reduction |
| D4: Referral adequacy | 15.56% | 5% | PCP specificity 0.60 for referrals | 14–24% reduction in unnecessary referrals | ≥ 30% reduction |
| D6: Remote adequacy | 47.76% | 5% | Limited without AI | ~55% with teledermatology | ≥ 58% |
| D7: Remote volume increase | 24.64% | 5% | Low baseline remote care | Capacity for 55%+ remote | ≥ 58% |

Interpretation: All observed means substantially exceed their MCIDs. For the three co-primary endpoints (B2, C4, D4), observed values are consistent with the range reported in the published SotA literature for comparable AI-assisted interventions. B2 (18.77%) exceeds the CER acceptance criterion of ≥ +15%. D4 (15.56%) falls below the CER acceptance criterion of ≥ 30% reduction but substantially exceeds the study MCID of 5% and sits within the SotA range of a 14–24% reduction with comparable tools. D2 (14.53%) falls well below the CER acceptance criterion of ≥ 50% reduction but exceeds the MCID, reflecting the difference between controlled teledermatology implementations (SotA) and real-world physician-estimated impact. These discrepancies against the CER acceptance criteria are expected: the CER criteria derive from best-case published studies, while this PMS study measures real-world physician-perceived outcomes with inherent recall imprecision.

Effect size: Benefit-level Cohen's d​

Pooled across all Likert questions within each benefit, compared against neutral (3.0):

| Benefit | Likert questions pooled | n (responses) | Pooled mean | Pooled SD | Cohen's d | Interpretation |
|---|---|---|---|---|---|---|
| 7GH — Diagnostic accuracy | B1, B3, B5 | 168 | 3.98 | 0.97 | 1.004 | Large |
| 5RB — Severity assessment | C1, C2, C3 | 168 | 3.76 | 1.22 | 0.622 | Medium |
| 3KX — Care pathway | D1, D3, D5 | 148 | 3.98 | 1.16 | 0.846 | Large |
| Overall | E1 | 56 | 3.86 | 1.31 | 0.653 | Medium |
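The benefit-level effect size is simply the pooled-mean distance from neutral in pooled-SD units. Computing it from the rounded table values gives 1.01 for 7GH versus the reported 1.004; the small difference comes from the report presumably using unrounded pooled statistics.

```python
def cohens_d(mean: float, sd: float, neutral: float = 3.0) -> float:
    """Benefit-level Cohen's d against the neutral Likert point (3.0)."""
    return round((mean - neutral) / sd, 3)

# 7GH from the rounded tabulated values (pooled mean 3.98, pooled SD 0.97).
print(cohens_d(3.98, 0.97))  # 1.01
```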

Subgroup analysis​

By role

| Subgroup | n | B1 | B3 | B5 | C1 | C2 | C3 | D1 | D3 | E1 |
|---|---|---|---|---|---|---|---|---|---|---|
| Dermatologist | 34 | 4.09 | 4.12 | 4.09 | 4.32 | 4.12 | 3.09 | 3.68 | 4.26 | 4.09 |
| Primary care physician | 13 | 3.08 | 3.54 | 3.77 | 4.15 | 3.77 | 2.62 | 3.38 | 3.92 | 3.08 |
| Hospital manager | 9 | 4.22 | 4.44 | 4.11 | 4.00 | 4.11 | 3.22 | 4.00 | 4.33 | 4.11 |

By duration of use

| Subgroup | n | B1 | B3 | B5 | C1 | C2 | C3 | D1 | D3 | E1 |
|---|---|---|---|---|---|---|---|---|---|---|
| <6 months | 4 | 3.50 | 3.00 | 3.50 | 4.00 | 4.00 | 2.25 | 3.50 | 3.50 | 2.50 |
| 6-12 months | 6 | 4.00 | 4.17 | 3.83 | 4.00 | 4.17 | 3.33 | 3.00 | 4.17 | 3.33 |
| 1-2 years | 15 | 4.00 | 4.13 | 3.93 | 4.00 | 3.73 | 2.93 | 3.53 | 3.87 | 4.20 |
| 2-3 years | 15 | 3.87 | 4.13 | 4.13 | 4.07 | 4.00 | 3.00 | 3.27 | 4.20 | 3.87 |
| >3 years | 16 | 3.81 | 4.06 | 4.19 | 4.75 | 4.31 | 3.13 | 4.44 | 4.69 | 4.06 |

Perceived benefit by duration of use

[Chart: mean benefit scores by duration of device use.] Respondents with longer device usage tend to report higher benefit scores. This adoption maturity effect is consistent with increasing integration of the device into clinical workflows over time, supporting its real-world clinical utility.

Interpretation: A positive trend is visible: respondents with longer usage durations tend to report higher benefit scores. This adoption maturity pattern is consistent with progressive integration of the device into clinical workflows over time.

Role-based differences: Primary care physicians (PCPs) report lower benefit scores than dermatologists on the most device-accuracy-dependent dimensions — particularly B1 (general diagnostic accuracy: PCP mean 3.08 vs. dermatologist 4.09) and E1 (overall benefit: PCP 3.08 vs. dermatologist 4.09). PCP means for B1 and E1 sit only just above neutral. B3 (rare disease identification) is less differentiated (PCP 3.54 vs. dermatologist 4.12). This pattern may reflect differences in clinical context: PCPs see a broader case mix with lower dermatological complexity, and may have different expectations for a dermatology-focused decision-support tool. It may also reflect less intensive device usage or less integration into PCP workflows. Since PCPs are a key intended user group, this subgroup signal warrants monitoring in subsequent PMS cycles and, if confirmed, may inform targeted training or onboarding interventions. Hospital managers (n = 9) report high scores across most questions, which is consistent with their perspective on institutional-level benefits (pathway efficiency, referral adequacy) rather than individual clinical accuracy.

Evidence quality breakdown​

Per question

| Question | Benefit | Records (a) | Estimates (b) | Total | Records % |
|---|---|---|---|---|---|
| B2 | 7GH | 20 | 36 | 56 | 35.7% |
| B4 | 7GH | 19 | 37 | 56 | 33.9% |
| B6 | 7GH | 17 | 39 | 56 | 30.4% |
| C4 | 5RB | 18 | 38 | 56 | 32.1% |
| C5 | 5RB | 20 | 36 | 56 | 35.7% |
| D2 | 3KX | 22 | 34 | 56 | 39.3% |
| D4 | 3KX | 19 | 37 | 56 | 33.9% |
| D6 | 3KX | 17 | 19 | 36 | 47.2% |
| D7 | 3KX | 15 | 21 | 36 | 41.7% |

Aggregate

| Metric | Value |
|---|---|
| Total record-consulted data points | 167 |
| Total estimate-based data points | 297 |
| Total quantitative data points | 464 |
| Records proportion | 36.0% |
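The aggregate figures can be cross-checked against the per-question counts above:

```python
# Per-question (records, estimates) counts from the per-question table.
counts = {"B2": (20, 36), "B4": (19, 37), "B6": (17, 39),
          "C4": (18, 38), "C5": (20, 36), "D2": (22, 34),
          "D4": (19, 37), "D6": (17, 19), "D7": (15, 21)}
records = sum(a for a, _ in counts.values())    # 167
estimates = sum(b for _, b in counts.values())  # 297
print(records, estimates, round(100 * records / (records + estimates), 1))
# 167 297 36.0
```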

Safety data summary (Section F)​

Section F captures device safety data alongside benefit data, consistent with MDR Article 83(1). This ensures the study is not a benefit-only confirmation exercise.

F1 — Misleading device output

| Response | n | % |
|---|---|---|
| Yes | 15 | 27% |
| No | 41 | 73% |

F2 — Usability issues

| Response | n | % |
|---|---|---|
| Yes | 17 | 30% |
| No | 39 | 70% |

F3 — Overall safety assessment

| n | Mean | Median | SD | 95% CI |
|---|---|---|---|---|
| 56 | 4.14 | 4.0 | 0.92 | [3.89, 4.39] |

Interpretation: Despite respondents reporting misleading output and usability issues, the overall safety assessment remains high. This is consistent with a device where occasional edge-case errors exist but are caught by clinical oversight (the device is a decision-support tool, not autonomous). The combination of identified safety signals with overall safety confidence demonstrates genuine surveillance, not benefit cherry-picking.

F1 misleading-output rate against the pre-specified follow-up threshold

The pre-specified safety-signal threshold (protocol Section 10.7) states that a misleading-output rate (F1 = Yes) exceeding 30% constitutes a safety signal requiring follow-up investigation under the PMS plan. In the N = 56 analysis set, the observed F1 = Yes rate is 26.8% (15/56), which is below the 30% follow-up threshold. The pre-specified follow-up is therefore not triggered per protocol.
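The threshold check itself reduces to a simple proportion comparison, sketched below with the reported counts:

```python
def f1_signal(yes: int, n: int, threshold: float = 0.30) -> tuple:
    """Observed F1 = Yes rate (as a %) and whether it breaches the 30%
    follow-up threshold from protocol Section 10.7."""
    rate = yes / n
    return round(100 * rate, 1), rate > threshold

print(f1_signal(19, 60))  # (31.7, True)  — before exclusions
print(f1_signal(15, 56))  # (26.8, False) — analysis set
```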

The 15 substantiated F1 = Yes responses are retained in the analysis for transparency. Thematic review of those 15 descriptions shows the reported incidents are consistent with the device's known edge-case limitations (atypical presentations, rare conditions, paediatric skin types, dermoscopy-dependent lesions) already documented in the risk management file, with no new category of misleading behaviour emerging. The detailed thematic analysis is presented in the legacy-device PMS Report (R-TF-007-003), §4.7.5 and §6.2.

Supporting context for the benefit-risk assessment:

  1. The device is designed, labelled and deployed as a clinical decision-support tool whose outputs are interpreted by a supervising healthcare professional; this use condition is a manufacturer-mandated integration requirement specified in the Instructions for Use, not a delegation of safety responsibility to the clinician.
  2. F3 (overall safety Likert) mean of 4.14 indicates strong physician confidence that the device is safe in practice, despite awareness of occasional misleading outputs.
  3. F4 (formal adverse event reports) cross-referenced against the R-006-002 non-conformity registry confirms that no unreported serious incidents exist: across the full reporting period the registry records zero Article 87 serious incidents and zero Article 88 trend reports.
  4. Prior to the data-quality exclusions described in the "Data-quality exclusions" section above, the F1 = Yes proportion was 19/60 (31.7%), marginally above the 30% threshold. The four excluded responses had flagged F1 = Yes without providing any substantiating description in F1a, and were therefore not evidentially usable under the protocol's Section 10.7 evidence-quality substantiation principle. The drop from 31.7% to 26.8% reflects the removal of unsubstantiated flags, not the suppression of substantiated incidents.

The 30 % F1 threshold will continue to be monitored in subsequent PMS cycles.

Sample size adequacy and statistical power​

Power calculations for the one-sample t-test (two-sided at α = 0.05, providing conservative estimates for the one-sided test specified in the co-primary analysis):

| Scenario | n | Cohen's d | Power |
|---|---|---|---|
| Full sample, small-medium | 60 | 0.4 | 0.943 |
| Full sample, medium | 60 | 0.5 | 0.990 |
| Full sample, large | 60 | 0.8 | 1.000 |
| Remote care questions | 39 | 0.4 | 0.836 |
| Remote care questions | 39 | 0.5 | 0.945 |
| Realistic: 45 respondents | 45 | 0.4 | 0.878 |
| Realistic: 45 respondents | 45 | 0.5 | 0.966 |
| Realistic: 30 respondents | 30 | 0.4 | 0.745 |
| Realistic: 30 respondents | 30 | 0.5 | 0.889 |
| Realistic: 30 respondents | 30 | 0.8 | 0.998 |
| Minimum viable: 20 respondents | 20 | 0.5 | 0.760 |
| Minimum viable: 20 respondents | 20 | 0.8 | 0.980 |
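For reference, power for a two-sided one-sample t-test can be computed from the noncentral t distribution. This is a sketch under standard assumptions (df = n − 1, noncentrality d·√n); exact values depend on the approximation and sidedness used by the software that produced the table, so the sketch's output may differ somewhat from the tabulated figures.

```python
from scipy import stats

def power_two_sided(n: int, d: float, alpha: float = 0.05) -> float:
    """Power of a two-sided one-sample t-test: reject beyond +/- t_crit,
    with the test statistic noncentral t under the alternative."""
    df, nc = n - 1, d * n ** 0.5
    t_crit = stats.t.ppf(1 - alpha / 2, df)
    return float(1 - stats.nct.cdf(t_crit, df, nc) + stats.nct.cdf(-t_crit, df, nc))

for n, d in [(60, 0.4), (60, 0.5), (30, 0.8), (20, 0.8)]:
    print(n, d, round(power_two_sided(n, d), 3))
```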

Benefit coverage check​

| Benefit | Quantitative questions | Significant vs zero (p < 0.05) | Significant vs MCID (p < 0.05) |
|---|---|---|---|
| 7GH — Diagnostic accuracy | B2, B4, B6 | 3/3 | 3/3 |
| 5RB — Severity assessment | C4, C5 | 2/2 | 2/2 |
| 3KX — Care pathway | D2, D4, D6, D7 | 4/4 | 4/4 |

Quality indicators evaluation​

| Indicator | Target | Result | Status |
|---|---|---|---|
| Questionnaire length | ≤13 min | 11–14 min estimated | Acceptable |
| Power for Likert (n=56, d=0.4) | ≥0.80 | 0.930 | Acceptable |
| Records proportion (sensitivity analysis) | ≥30% | 36.0% | Acceptable |
| Real response target | ≥30 respondents | 56 respondents | Acceptable |
| Benefit coverage | All 3 benefits with ≥3 questions | 7GH: 6, 5RB: 5, 3KX: 7 | Acceptable |
| Sub-criteria coverage | All 8 with ≥1 quantitative | 8/8 covered | Acceptable |
| Evidence traceability | Every question mapped to ≥1 benefit | 40/40 mapped | Acceptable |
| Quantitative coverage per benefit | All 3 with ≥2 quantitative | 7GH: 3, 5RB: 2, 3KX: 4 | Acceptable |
| Safety data collection | F1 + F2 + F3 present | 15 misleading, 17 usability issues | Acceptable |
| Likert significance (vs neutral) | ≥8/10 significant | 9/10 | Acceptable |

Go/no-go recommendation​

GO. The questionnaire design is validated:

  • 9/10 benefit Likert questions are statistically significant (p < 0.05)
  • All 9 quantitative questions show improvements significantly different from zero
  • 9/9 quantitative questions exceed their pre-specified MCID
  • Records proportion (36.0%) supports a meaningful sensitivity analysis
  • Statistical power is adequate for the full sample (n=56)
  • Safety questions (F1-F3) produce realistic incident rates and high overall safety confidence
  • All quality indicators are in the "Acceptable" range
  • Every benefit and sub-criterion has sufficient quantitative coverage
All the information contained in this QMS is confidential. The recipient agrees not to transmit or reproduce the information, neither by himself nor by third parties, through whichever means, without obtaining the prior written permission of Legit.Health (AI Labs Group S.L.)