
# MRMC cross-study comparison — BI_2024 · PH_2024 · SAN_2024 · MAN_2025

Scope: internal, audit-invisible. All four MRMC investigations in the Plus technical file are compared against the same endpoint template. Figures are pulled from the signed CIRs and, for MAN_2025, computed live from the locked de-identified dataset at `apps/qms/docs/legit-health-plus-version-1-1-0-0/product-verification-and-validation/clinical/Investigation/man-2025/data/`.

Last refresh: 2026-04-20 (MAN_2025 data lock 2026-04-17; BI/PH/SAN values from their signed CIRs).

Primary endpoint template (all four studies): paired top-1 diagnostic accuracy, HCP unaided vs HCP aided by the device, pre-specified acceptance criterion ≥10 pp absolute improvement, McNemar two-sided p<0.05.
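
For concreteness, the acceptance check and the McNemar statistic under this template reduce to a few lines. This is an illustrative sketch only; the helper names below are mine, not taken from the QMS analytics modules:

```python
def mcnemar_cc_chi2(b: int, c: int) -> float:
    """Continuity-corrected McNemar chi-square from the discordant pair counts:
    b = correct unaided -> incorrect aided, c = incorrect unaided -> correct aided."""
    return (abs(b - c) - 1) ** 2 / (b + c)

def meets_acceptance(unaided_pct: float, aided_pct: float, threshold_pp: float = 10.0) -> bool:
    """Pre-specified criterion: at least threshold_pp absolute improvement in top-1 accuracy."""
    return (aided_pct - unaided_pct) >= threshold_pp

# Example: MAN_2025 discordant counts from the computation-provenance section
print(round(mcnemar_cc_chi2(34, 587), 2))   # 490.67
print(meets_acceptance(41.79, 65.07))       # True (+23.28 pp >= 10 pp)
```

Plugging in the MAN_2025 discordant counts (b=34, c=587) reproduces the χ² cc value of 490.67 reported further down.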

## Study design at a glance

| Study | Purpose | CIP location | CIR location | Readers | Cases / image set | Paired obs | Design |
|---|---|---|---|---|---|---|---|
| BI_2024 | Rare dermatological diseases | `Investigation/bi-2024/r-tf-015-005.mdx` | `Investigation/bi-2024/r-tf-015-006.mdx` | 15 (11 PCP + 4 derm) | ~100 curated images, rare-disease focus | 1,449 | Self-controlled MRMC, paired unaided → aided, remote web platform |
| PH_2024 | Pigmented skin lesions / photographic | `Investigation/ph-2024/r-tf-015-005.mdx` | `Investigation/ph-2024/r-tf-015-006.mdx` | 9 | 30 image sets, 8 diagnostic classes | ~270 | Self-controlled MRMC, paired unaided → aided |
| SAN_2024 | General dermatology, mixed conditions | `Investigation/san-2024/r-tf-015-005.mdx` | `Investigation/san-2024/r-tf-015-006.mdx` | 16 (10 PCP + 6 derm) | 29 images × 16 readers (401 completed) | 401 | Prospective observational MRMC, remote web platform |
| MAN_2025 | Fitzpatrick V–VI phototypes only (PMCF) | `Investigation/man-2025/r-tf-015-004.mdx` | `Investigation/man-2025/r-tf-015-006.mdx` | 16 (primary) / 19 enrolled | 149 curated atlas images (all FP V–VI) | 2,376 | Three-stage MRMC (unaided → aided → referral), self-controlled |

## Primary endpoint — paired top-1 accuracy improvement (pooled primary cohort)

| Study | Unaided | Aided | Δ (pp) | McNemar p | Acceptance | Status |
|---|---|---|---|---|---|---|
| BI_2024 | 47.94% | 63.06% | +15.12 | <0.001 | ≥10 pp | PASS |
| PH_2024 | 63.70% | 81.85% | +18.15 | <0.001 | ≥10 pp | PASS |
| SAN_2024 | 68.08% | 88.78% | +20.70 | <0.0001 | ≥10 pp | PASS |
| MAN_2025 | 41.79% | 65.07% | +23.27 | ≈1.0×10⁻¹⁰⁸ (χ² cc = 490.67) | ≥10 pp | PASS |

All four studies clear the ≥10 pp pre-specified bar. MAN_2025 has the largest improvement (+23.27 pp) and the lowest unaided baseline.

### MAN_2025 computation provenance

The MAN_2025 numbers above are computed directly from the locked dataset in this repo:

```python
# One-shot computation run from repo root on 2026-04-20
import json

base = 'apps/qms/docs/legit-health-plus-version-1-1-0-0/product-verification-and-validation/clinical/Investigation/man-2025/data'
subs = json.load(open(f'{base}/submissions.json'))   # 8542 rows; categories: diagnosis / assisted-diagnosis / referral
meta = json.load(open(f'{base}/meta.json'))          # qualifiedReaders list (16 in primary cohort)
cases = json.load(open(f'{base}/cases.json'))        # 149 cases

# Pair (readerCode, caseId) submissions for category in {diagnosis, assisted-diagnosis};
# compare each answer to case.correctCondition (exact string match).
# Result: 2376 paired obs · unaided 993/2376 (41.79%) · aided 1546/2376 (65.07%) · Δ +23.27 pp
# McNemar discordant pairs: b=34 (correct→incorrect) · c=587 (incorrect→correct) · χ² cc = 490.67
```

These numbers match what `<Man2025PrimaryOutcomeTable />` and `<Man2025AcceptanceCriteriaResultsTable />` render in the CIR at build time. The analytics live in `apps/qms/src/components/Man2025/analytics.ts` (deterministic, no network, unit-testable). The thresholds (≥10 pp, etc.) live once in `packages/ui/src/components/PerformanceClaimsAndClinicalBenefits/performanceClaims.ts` under the `studyId: "MAN_2025"` rows (M2N, M2A, M2R). See `apps/qms/src/components/Man2025/CLAUDE.md` for the pipeline.
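
For anyone reproducing the pairing outside the TypeScript module, the logic described in the snippet's comments can be sketched in Python. The field names `answer` (reader's response) and `id` (case identifier) are assumptions of mine; `category`, `correctCondition`, `readerCode`, and `caseId` come from the snippet, and the TypeScript analytics module remains the authoritative implementation:

```python
from collections import defaultdict

def paired_accuracy(submissions: list[dict], cases: list[dict]) -> tuple[int, int, int]:
    """Pair each (readerCode, caseId) across the unaided ('diagnosis') and aided
    ('assisted-diagnosis') categories, then score both answers against truth."""
    truth = {c['id']: c['correctCondition'] for c in cases}   # 'id' is an assumed key
    by_key: dict[tuple, dict] = defaultdict(dict)
    for s in submissions:
        if s['category'] in ('diagnosis', 'assisted-diagnosis'):
            by_key[(s['readerCode'], s['caseId'])][s['category']] = s['answer']
    # Keep only complete pairs (both an unaided and an aided submission exist).
    pairs = [(v['diagnosis'], v['assisted-diagnosis'], truth[k[1]])
             for k, v in by_key.items()
             if 'diagnosis' in v and 'assisted-diagnosis' in v]
    unaided = sum(a == t for a, _, t in pairs)
    aided = sum(b == t for _, b, t in pairs)
    return len(pairs), unaided, aided
```

Run against the locked dataset, this style of pairing is what yields the 2,376 paired observations reported above; incomplete pairs are dropped.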

## Secondary endpoints

### Sensitivity / specificity (HCP decision, unaided → aided)

| Study | Sensitivity | Δ (pp) | Specificity | Δ (pp) | Notes |
|---|---|---|---|---|---|
| BI_2024 | 52.61% → 71.04% | +18.43 | 56.45% → 75.83% | +19.38 | Pooled across primary cohort |
| PH_2024 | 68.55% → 83.15% | +14.60 | 78.01% → 89.91% | +11.90 | Pooled across 9 readers |
| SAN_2024 | not separately reported — top-1 accuracy is the confirmatory endpoint | — | — | — | See SAN_2024 CIR §Secondary endpoints |
| MAN_2025 | malignant-case accuracy 56.88% → 78.75% (n=160 paired obs on 10 malignant cases) | +21.88 | — | — | Descriptive only — see §Stage 3 referral for referral-sensitivity |

### Specialty strata — paired top-1 accuracy, broken out by reader specialty

The four studies were designed with different reader mixes, so cross-study specialty comparison is not a like-for-like contrast, but the shape of the result is consistent: the lower the unaided baseline, the larger the Δ from device assistance, across every specialty and every study.

#### Dermatologists (attending + residents, where applicable)

| Study | n readers | Paired obs | Unaided | Aided | Δ (pp) | Threshold | Status |
|---|---|---|---|---|---|---|---|
| BI_2024 | 4 (att.) | 400 | 57.25% | 65.65% | +8.39 | ≥5 pp | Directional, supportive (n=4, under-powered) |
| PH_2024 | 0 | — | — | — | — | — | No dermatologists in PH_2024 panel |
| SAN_2024 | 6 (att.) | 91 | 76.47% | 86.93% | +10.50 | ≥5 pp | PASS |
| MAN_2025 | 9 (3 att. + 6 res.) | 1,334 | 47.90% | 64.24% | +16.34 | — (pooled is the endpoint) | Exploratory; derm baseline is low because FP V–VI images are harder even for specialists |

#### Primary care / general practitioners (physicians)

| Study | n readers | Paired obs | Unaided | Aided | Δ (pp) | Threshold | Status |
|---|---|---|---|---|---|---|---|
| BI_2024 | 11 | 1,049 | 44.71% | 61.71% | +17.00 | ≥10 pp | PASS |
| PH_2024 | 9 | ~270 | 63.70% | 81.85% | +18.15 | ≥10 pp | PASS (PH_2024 is PCP-only by design) |
| SAN_2024 | 10 | 310 | 62.90% | 89.92% | +27.00 | ≥10 pp | PASS |
| MAN_2025 | 4 (1 att. + 3 res.) | 596 | 36.58% | 64.93% | +28.36 | — (pooled is the endpoint) | Exploratory; 4 primary-care readers in MAN_2025 |

#### Nursing (MAN_2025 only — CIP-eligible category)

| Study | n readers | Paired obs | Unaided | Aided | Δ (pp) |
|---|---|---|---|---|---|
| MAN_2025 | 3 | 446 | 30.49% | 67.71% | +37.22 |

Nursing was not admitted as a reader category in BI_2024, PH_2024 or SAN_2024 (those CIPs pre-date the CIP correction that explicitly admits licensed nurses with skin/wound scope). MAN_2025 is the first study in the programme to include them.

#### MAN_2025 by qualification tier (fully-qualified attendings + senior nurses vs MIR residents)

| Tier | n readers | Paired obs | Unaided | Aided | Δ (pp) |
|---|---|---|---|---|---|
| Fully-qualified-target (attendings + senior nurses) | 7 | 1,039 | 38.50% | 69.59% | +31.09 |
| Resident-target (MIR residents) | 9 | 1,337 | 44.35% | 61.56% | +17.20 |

Counter-intuitive at first glance: residents have a higher unaided baseline than attendings in MAN_2025. That's because the "fully-qualified" bucket mixes attending dermatologists, attending primary-care, and three licensed nurses with dermatology/wound scope — the nurse readers drag the unaided baseline down, and the device then pulls them up more than it pulls up the MIR residents. The CIR's sensitivity row reports this explicitly; the regulatory endpoint remains the pooled primary-cohort Δ of +23.27 pp.
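
The mixing effect can be quantified directly from the tables above. This is a rough cross-check using the rounded table percentages, not a CIR figure:

```python
# Back out the non-nurse unaided baseline of the fully-qualified bucket by
# removing the nursing stratum's paired observations (rounded table values).
fq_obs, fq_unaided = 1039, 0.3850       # fully-qualified-target bucket
nurse_obs, nurse_unaided = 446, 0.3049  # nursing stratum, entirely inside that bucket

physician_obs = fq_obs - nurse_obs                                             # 593 paired obs
physician_unaided = (fq_unaided * fq_obs - nurse_unaided * nurse_obs) / physician_obs
print(f'{physician_unaided:.2%}')  # 44.52%, essentially the residents' 44.35%
```

In other words, once the nurse readers are removed, attendings and residents sit at nearly the same unaided baseline, which is exactly the point the paragraph above makes.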

### What the per-specialty patterns say

  1. Specialists converge aided. Dermatologists end up in a 64–87% aided-accuracy band across all three studies that recruited them. BI_2024 dermatologist aided (65.65%) and MAN_2025 dermatologist aided (64.24%) sit almost on top of each other despite the cohorts being entirely different — in BI it's rare pustular dermatoses on mostly European skin; in MAN it's common conditions on FP V–VI skin. The device plateau is similar.

  2. PCPs benefit most in absolute terms. +17.00 (BI), +18.15 (PH), +27.00 (SAN), +28.36 (MAN) — every PCP stratum clears ≥10 pp with room to spare. MAN_2025 has the largest PCP Δ because the PCP baseline on FP V–VI is very low (36.58%).

  3. Nursing shows the strongest Δ of any stratum in the programme (+37.22 pp). That matters directly for the regulatory argument: it supports the CEP claim that the device delivers value for the full intended-user population, not just physicians, and it supports Celine's Pillar-3 indirect-benefit causal chain (less-expert reader → more device benefit → better patient decision).

  4. Cross-study specialty comparisons must be read carefully. The image sets differ; the conditions differ; reader experience distributions differ. You can compare shapes (derms always lowest Δ, PCPs always larger, nursing largest where present) but you cannot compare levels (SAN's 89.92% aided PCP accuracy on easy derm cases does not mean SAN's PCPs are better than MAN's PCPs; the cases are easier).

### Data provenance for the MAN_2025 per-specialty rows

Computed from the locked dataset by bucketing qualified readers by `readers.json` → `specialty`:

| Specialty (onboarding form) | Qualified readers in MAN_2025 |
|---|---|
| dermatology | R-01, R-04, R-05, R-07, R-08, R-09, R-10, R-12, R-19 (n=9) |
| general (primary care) | R-02, R-13, R-14, R-18 (n=4) |
| nursing | R-06, R-16, R-17 (n=3) |

Then the same (diagnosis, assisted-diagnosis) paired comparison as the primary endpoint is applied, filtered to each specialty bucket. The code for this lives in `apps/qms/src/components/Man2025/analytics.ts` (`filterReaders` + `computePaired`); see `apps/qms/src/components/Man2025/CLAUDE.md` for cohort-API semantics.
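
The bucketing step can be sketched as follows. Field names here are assumed from the table above; `filterReaders` and `computePaired` in `analytics.ts` are the authoritative implementation:

```python
def readers_by_specialty(readers: list[dict], specialty: str) -> set[str]:
    """Reader codes whose onboarding-form specialty matches, e.g. 'nursing'."""
    return {r['code'] for r in readers if r['specialty'] == specialty}

def filter_submissions(submissions: list[dict], bucket: set[str]) -> list[dict]:
    """Restrict submissions to readers in the bucket; the primary-endpoint
    paired comparison is then re-run unchanged on this subset."""
    return [s for s in submissions if s['readerCode'] in bucket]
```

The per-specialty rows above are exactly this: a reader-code filter followed by the unchanged pooled computation.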

### Stage 3 referral (MAN_2025 only — exploratory descriptive)

| Metric | Result |
|---|---|
| Malignant cases referred for specialist review | 150/160 = 93.75% |
| Benign cases correctly NOT referred | 875/2220 = 39.41% |
| Device-level ROC AUC on malignancy (atlas truth) | 0.878 (10 malig / 139 benign) |

The stage 3 referral readout is descriptive only. 10 malignant cases is insufficient for a confirmatory malignancy-accuracy or malignancy-referral-sensitivity claim; that is delegated to the NMSC dedicated investigation and to the PMCF Plan (R-TF-007-002).
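
As a sanity check, the two referral rates in the table are pure arithmetic on the reported counts; no dataset access is needed:

```python
# Re-derive the Stage 3 descriptive rates from the counts reported above.
referral_sensitivity = 150 / 160    # malignant paired obs referred for specialist review
referral_specificity = 875 / 2220   # benign paired obs correctly not referred
print(f'{referral_sensitivity:.2%}')  # 93.75%
print(f'{referral_specificity:.2%}')  # 39.41%
```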

### Usability / workflow (SAN_2024 and PH_2024)

| Study | No-referral rate | Remote-consult feasibility | Utility score |
|---|---|---|---|
| BI_2024 | not reported | not reported | not reported |
| PH_2024 | 48.89% | 60.74% | — |
| SAN_2024 | 58.1% | 55.11% | 8.0/10 usability · 7.3/10 diagnostic utility |
| MAN_2025 | referral data reported as Stage 3 descriptive (above) | — | — |

## Cross-study patterns worth calling out

  1. All four studies pass the ≥10 pp bar. Across four independent cohorts and four different reader panels (total 56 HCPs, 4,500+ paired observations), the device's core diagnostic-accuracy claim holds. This is the argument the CER uses in §Clinical performance — confirmatory MRMC programme.

  2. Baseline accuracy tracks cohort difficulty exactly as expected.

    - SAN_2024 (general derm, mostly FP I–III): highest unaided baseline, 68.08%
    - PH_2024 (pigmented lesions, experienced readers): 63.70%
    - BI_2024 (rare diseases, harder diagnoses even for specialists): 47.94%
    - MAN_2025 (FP V–VI, under-represented phototypes, harder for HCPs AND harder for the device): 41.79%
  3. The harder the cohort, the larger the lift from the device. This is the main regulatory story: MAN_2025 has the lowest baseline and the largest Δ (+23.27 pp). BI_2024 rare-disease pooled stratum shows +32.32 pp. SAN_2024 PCP stratum shows +27.00 pp. The device delivers the most incremental value where the clinician is least confident. This supports the Pillar-3 clinical-performance claim and the indirect-benefit causal chain (see celine-clinical-consultant agent).

  4. Aided performance converges across cohorts. Aided top-1 accuracy sits in a 63–89% band regardless of cohort difficulty, and the whole range shifts upward post-device (unaided 41.79–68.08% → aided 63.06–88.78%), consistent with the device compressing reader variance — a secondary argument that can be deployed in PMCF reasoning if reader-variance claims become relevant.

  5. Statistical power is not the constraint in any of these. Even PH_2024 with n=9 readers produces p<0.001 on a self-controlled design. MAN_2025's 2,376 paired observations (χ² cc = 490.67, p ≈ 1×10⁻¹⁰⁸) make it by far the most statistically robust of the four.

  6. BSI's "MRMC is not clinical data" stance (Nick, clarification meeting) does not hurt us. The four MRMC studies are framed as Rank 11 Pillar 3 simulated-use performance evidence — supporting, not primary. Primary real-world evidence is delivered by the legacy-device RWE study (task-3b2-3b3-legacy-rwe-study/). The MRMC programme shows controlled-environment performance; the RWE study shows real-world performance. Together they cover both dimensions (see §Evidence-hierarchy positioning in CLAUDE.md).

## Caveats worth documenting

  - BI_2024 and PH_2024 per-pathology breakdowns are exploratory, not confirmatory. No multiple-testing correction was pre-specified; only the aggregate pooled endpoints support the CE claim.
  - SAN_2024 does NOT support Fitzpatrick V–VI claims (Fitzpatrick V: 1 image = 3.6%; Fitzpatrick VI: 0 images in its set). MAN_2025 exists specifically to close this gap. SAN_2024's limitations section makes this explicit.
  - MAN_2025's malignancy readout is descriptive only. 10 malignant cases (7 melanoma, 3 BCC) are not enough for confirmatory malignancy claims. The NMSC study is the confirmatory source.
  - Cross-study Δ comparisons are directionally meaningful, not like-for-like. The image sets differ by design; cohort baselines differ; reader panels differ in specialty mix. The common axis is the device and the ≥10 pp threshold — not the absolute numbers. Do not draw inferences like "MAN_2025 is 1.5× better than BI_2024"; that is meaningless given the differences in case difficulty.
  - MAN_2025 uses public-atlas images, not trial-enrolled patients. The CIP is explicit that the study subjects are the READERS, not the individuals in the images. The ISO 14155 Annex E ethics-non-applicability determination is documented in the R-TF-015-010 MAN_2025 instance.

## Data / script provenance

| Artefact | Location |
|---|---|
| MAN_2025 raw dataset | `apps/qms/docs/legit-health-plus-version-1-1-0-0/product-verification-and-validation/clinical/Investigation/man-2025/data/` |
| MAN_2025 extract scripts | `apps/qms/scripts/fetch-man2025-sheets.mjs` · `apps/qms/scripts/build-man2025-dataset.py` |
| MAN_2025 analytics module | `apps/qms/src/components/Man2025/analytics.ts` |
| MAN_2025 renderers | `apps/qms/src/components/Man2025/{PrimaryOutcomeTable,AcceptanceCriteriaResultsTable,ReaderDemographicsTable}.tsx` |
| Shared acceptance-criteria SoT | `packages/ui/src/components/PerformanceClaimsAndClinicalBenefits/performanceClaims.ts` (rows M2N, M2A, M2R) |
| BI_2024 / PH_2024 / SAN_2024 CIRs | signed documents under `Investigation/{bi,ph,san}-2024/r-tf-015-006.mdx` |
| CER cross-reference | `legit-health-plus-version-1-1-0-0/.../Evaluation/R-TF-015-003-Clinical-Evaluation-Report.mdx` §Clinical performance |
| Statistical-summary CER appendix | `legit-health-plus-version-1-1-0-0/.../Evaluation/r-tf-015-013-statistical-summary.mdx` |

## How to refresh MAN_2025 numbers

```shell
# Pull latest sheet snapshot (service-account creds required)
node apps/qms/scripts/fetch-man2025-sheets.mjs

# Rebuild the de-identified dataset (writes data/*.json into the CIR folder)
python apps/qms/scripts/build-man2025-dataset.py

# Type-check and render
cd apps/qms && npx tsc --noEmit -p .
npm run start  # QMS on localhost:3000 — MAN_2025 CIR tables render from the JSON
```

Both the CIP's `<AcceptanceCriteriaTable studyCode="MAN_2025" />` and the CIR's `<Man2025AcceptanceCriteriaResultsTable />` pick up new thresholds / observed values automatically on the next build. No hand-editing of numerical tables anywhere in the MDX.

## Related internal workspaces

- `task-3b2-3b3-legacy-rwe-study/` — the real-world-evidence study (primary clinical data under Nick's hierarchy) that pairs with this MRMC programme.
- `task-3b13-man-2025-cep-cip-completeness/` — downstream CEP-row pull-through for MAN_2025.
- `task-3b14-ifu-integration-requirements-verification/` — integrator-responsibility mandate wording (Celine's agent check).
All the information contained in this QMS is confidential. The recipient agrees not to transmit or reproduce the information, neither by himself nor by third parties, through whichever means, without obtaining the prior written permission of Legit.Health (AI Labs Group S.L.)