# MAN_2025 adequacy-review fixes (2026-04-19)
Internal record of the fixes applied to the MAN_2025 investigation folder in response to the triple adequacy review (bsi-clinical-auditor, audit-deliverable-reviewer, markdown-style) run via `/review-clinical-investigation` on 2026-04-19. Companion to `san-2024-fixes.md`; this note records only what differed from or was additional to the SAN_2024 pass.
## Scope
- Folder: `apps/qms/docs/legit-health-plus-version-1-1-0-0/product-verification-and-validation/clinical/Investigation/man-2025/`
- Files edited: `r-tf-015-004.mdx` (CIP), `r-tf-015-006.mdx` (CIR), `r-tf-015-010.mdx` (Annex E)
- Other files edited (out-of-folder, audit-visible): `apps/qms/docs/legit-health-plus-version-1-1-0-0/product-verification-and-validation/clinical/Evaluation/R-TF-015-003-Clinical-Evaluation-Report.mdx` — MRMC → Pillar 2 reclassification across 11 locations (all four MRMC studies, see below).
- Other files edited (internal engineering, behaviour change):
  - `packages/ui/src/components/PerformanceClaimsAndClinicalBenefits/performanceClaims.ts` — three MAN_2025 rows added (M2N relative ≥ +10 pp, M2A absolute ≥ 60%, M2R malignancy referral sensitivity ≥ 90%).
  - `packages/ui/src/components/ClinicalValidation/clinicalStudiesData.ts` — MAN_2025 `protocolDate` corrected to `2026-01-21` (was `2026-04-14`).
  - `apps/qms/src/components/Man2025/AcceptanceCriteriaResultsTable.tsx` — rewritten to read thresholds from `performanceClaims.ts` (no hard-coded AC thresholds). See "New pattern" below.
  - `apps/qms/src/components/Man2025/CLAUDE.md` — new section documenting the data-driven AC pattern.
- Build: `npx turbo run build --filter=qms` — passes.
## Reused from SAN_2024 without adaptation
The following SAN_2024 decisions were applied verbatim to MAN_2025; refer to `san-2024-fixes.md` for the rationale:
- Q1 (MRMC Rank-11 reframe, (a) full): CIR Research Title, Summary Title, Summary Introduction, main Introduction and Implications for Future Research all rewritten as simulated-use Rank-11 evidence only; real-world patient-outcome, triage, teledermatology, time-to-treatment and healthcare-economics framing removed.
- Q2 (threshold justification, (a) cite R-TF-015-011): new "Justification of acceptance thresholds" subsection added to CIP immediately after `<AcceptanceCriteriaTable />`.
- Q3 (per-pathology / per-specialty multiplicity, (a) exploratory): applied across CIP and CIR.
- Q4 (device version, identity bridge): v1.1.0.0 is the only released version; the identity bridge applies, so no clinical-relevance assessment is needed.
- Q6 (ethics committee, (b) sponsor determination-of-non-applicability): Annex E uses the `<EthicsCommitteeNonApplicability />` reusable component introduced in the SAN_2024 pass.
- Q7 (Investigator's Brochure, (a) IFU v1.1.0.0): Annex E IB row set TRUE with IFU reference.
- Q8 (product-name anonymisation, introduce-once convention): applied across the CIR title and the CIP/CIR/Annex E "hereinafter" sentence.
- Q9 (pilot → investigation): "pilot" nomenclature absent from MAN_2025 from the outset; no sweep required.
## Decisions different from / additional to SAN_2024
### A — R-14 onboarding-form CV (new decision, no SAN_2024 precedent)
Decision: retain R-14 in the primary cohort unchanged; remove all references to a CV-legibility issue.
Rationale: R-14 reached out during the investigation to report an onboarding-form upload problem and sent CV + certification by email. The document is on file with the Principal Investigator; the fact that the platform upload failed is not a clinical-evidence issue and does not belong in the audit-visible CIR. The CIR no longer describes this as a limitation; no CAPA reference remains in audit-visible prose.
### B — Actual investigation duration (different numbers vs SAN_2024)
Decision: data collection started 21 January 2026 and ended 3 April 2026; data lock performed on 19 April 2026.
Scope of change:
- CIP §Duration and §Calendar: "approximately three months" (down from "4 months"); the narrative no longer commits to a four-month window because the investigation was run in a tighter window.
- CIR §Initiation and completion: dates corrected from the draft "14 April 2026 → 19 April 2026" (which implied a 5-day data-collection window and made the BSI auditor question reader fatigue and per-case session duration).
- CER §Fitzpatrick V–VI supporting detail: dates corrected to "enrolled between 21 January and 3 April 2026" and "data lock was performed on 19 April 2026".
- `clinicalStudiesData.ts` MAN_2025 `protocolDate`: `2026-01-21` (was `2026-04-14`, which was the CIR draft date rather than the protocol signature / investigation-start date).
### C — Touched CER in this pass (different from SAN_2024 scope)
The SAN_2024 adequacy-review pass did not edit the CER. This pass did.
What changed in the CER:
- All four MRMC studies reclassified as Pillar 2 supporting evidence (see §E below) — 11 targeted replacements across the Executive Summary, the Three-Pillar Framework narrative, the methodological-adequacy appraisal, the per-benefit narrative and the acceptance-criteria evidence table footnotes.
- MAN_2025-specific phrasing updated: dates corrected to reflect the 21 Jan → 3 Apr enrolment window and 19 Apr data lock; the primary-cohort framing replaced "the fourteen readers who met the CIP §Inclusion criteria and substantially completed the study" with a paired-observation framing referencing the pre-specified ≥ 50%-completers and 100%-completers sensitivity analyses.
- Image-set provenance wording: "case set derived from the source MRMC studies" → "image set derived from the source MRMC studies through a controlled phototype-conversion step" where specifically discussing MAN_2025 provenance.
### D — Acceptance criteria as data-driven artefacts (new pattern, no SAN_2024 precedent)
Decision: `AcceptanceCriteriaResultsTable.tsx` (the CIR's acceptance-criteria renderer) reads thresholds from `performanceClaims.ts` instead of hard-coding them locally.
Motivation: the SAN_2024 pass added Q2's "Justification of acceptance thresholds" subsection to the CIP, which cites R-TF-015-011 and reads thresholds via `<AcceptanceCriteriaTable studyCode="SAN_2024" />` (a `packages/ui` component that renders from `performanceClaims.ts`). The CIP-side thresholds therefore come from the data file. However, the CIR-side `AcceptanceCriteriaResultsTable.tsx` was still hard-coded. This is a single-source-of-truth gap: BSI could in principle find a CIP threshold different from a CIR threshold because they were maintained in two places.
Fix:
- Three MAN_2025 rows added to `packages/ui/src/components/PerformanceClaimsAndClinicalBenefits/performanceClaims.ts`:
  - M2N — top-1 diagnostic accuracy, relative change ≥ +10 pp (SotA baseline +6.36% per R-TF-015-011).
  - M2A — top-1 diagnostic accuracy, absolute value ≥ 60% (SotA baseline 49.05%, CI 46–54%, per R-TF-015-011).
  - M2R — malignancy referral sensitivity, absolute value ≥ 90% (safety-critical, sample-derived).
- `AcceptanceCriteriaResultsTable.tsx` rewritten: filters the MAN_2025 rows from `performanceClaims.ts`, maps each row to a live observed value via a dispatch on the `(domain, metric, valueMagnitude)` triple, and renders an AC row with the observed value, threshold and status. The McNemar p-value is treated as the CIP-mandated statistical-significance test attached to the paired-improvement criterion (M2N), not as a standalone AC.
- The dispatch throws on unmapped triples, so adding a new MAN_2025 row in `performanceClaims.ts` without extending the dispatch causes a build failure — preventing silent drift.
- `apps/qms/src/components/Man2025/CLAUDE.md` updated with a new section, "Acceptance criteria — single source of truth in `performanceClaims.ts`", documenting how to change a threshold and how to add a new acceptance criterion.
Knock-on benefit: both the CIP-side and CIR-side renderers now pull from the same data source. Changing a threshold in `performanceClaims.ts` updates both documents in the next build.
Applicable to other investigations: the same refactor could be applied to BI_2024, PH_2024, SAN_2024 if each has a bespoke "acceptance-criteria results" component that hard-codes thresholds. Follow-up flagged below.
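The shape of the pattern can be sketched as follows. Everything here is an illustrative placeholder — the row fields, dispatch keys and observed numbers are assumptions, not the actual contents of `performanceClaims.ts` or `AcceptanceCriteriaResultsTable.tsx`:

```typescript
// Sketch of the data-driven acceptance-criteria pattern.
// All identifiers and values are illustrative, not the repo's real data.
type PerformanceClaim = {
  studyId: string;
  code: string;                           // e.g. "M2N"
  domain: string;
  metric: string;
  valueMagnitude: "relative" | "absolute";
  threshold: number;                      // pp for relative, % for absolute
};

const performanceClaims: PerformanceClaim[] = [
  { studyId: "MAN_2025", code: "M2N", domain: "diagnosticAccuracy", metric: "top1", valueMagnitude: "relative", threshold: 10 },
  { studyId: "MAN_2025", code: "M2A", domain: "diagnosticAccuracy", metric: "top1", valueMagnitude: "absolute", threshold: 60 },
  { studyId: "MAN_2025", code: "M2R", domain: "malignancyReferral", metric: "sensitivity", valueMagnitude: "absolute", threshold: 90 },
];

// Observed values keyed on the (domain, metric, valueMagnitude) triple.
// The numbers stand in for whatever the analytics environment emits.
const observed: Record<string, number> = {
  "diagnosticAccuracy|top1|relative": 12.4,
  "diagnosticAccuracy|top1|absolute": 63.0,
  "malignancyReferral|sensitivity|absolute": 92.0,
};

function acceptanceCriteriaRows(studyId: string) {
  return performanceClaims
    .filter((claim) => claim.studyId === studyId)
    .map((claim) => {
      const key = `${claim.domain}|${claim.metric}|${claim.valueMagnitude}`;
      const value = observed[key];
      // Throwing on an unmapped triple turns silent drift into a loud failure.
      if (value === undefined) {
        throw new Error(`No observed value mapped for triple ${key}`);
      }
      return { code: claim.code, observed: value, threshold: claim.threshold, pass: value >= claim.threshold };
    });
}
```

With this shape, changing a threshold touches one row in the data file, and forgetting to map a new triple fails at build time instead of rendering a stale table.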
### E — MRMC Pillar 2 reclassification across ALL four MRMC studies (decision that extends beyond SAN_2024 scope)
Decision: reclassify BI_2024, PH_2024, SAN_2024 and MAN_2025 from MDCG 2020-1 Pillar 3 §4.4 ("supporting Clinical Performance evidence") to MDCG 2020-1 Pillar 2 (Technical Performance — reader-aided algorithm accuracy, reader-variability characterisation, phototype generalisability) — consistently, everywhere the CER frames MRMC evidence.
Why Pillar 2 and not Pillar 3 — ultrathink summary:
- Rank–pillar consistency. MDCG 2020-6 Appendix III ranks MRMC at Rank 11 because it is simulated-use. Placing Rank-11 evidence in Pillar 3 (Clinical Performance) creates an awkward "Rank 11 Pillar 3" paradox that invites BSI to question Pillar 3 sufficiency. Pillar 2 framing is honest about what MRMC measures (algorithm-level technical performance on curated inputs, with readers in the loop) and aligns pillar with rank.
- Aligns with Nick's documented guidance (bsi-clinical-auditor system prompt): "MRMC on real images is Rank 11, not clinical data on real patients within the meaning of MDR Article 2(48)." MRMC is explicitly not clinical data; Pillar 2 framing matches that.
- Pillar 3 stays strong without MRMC. Real-world Pillar 3 Clinical Performance is supported by six prospective pivotal clinical investigations on real patients (MC_EVCDAO_2019, COVIDX_EVCDAO_2022, DAO_Derivación_O_2022, DAO_Derivación_PH_2022, IDEI_2023, NMSC_2025 — Ranks 2–4), the post-market observational study of the equivalent legacy device (R-TF-015-012, Rank 8), NMSC_2025 as a published Clinical Performance manuscript, and AIHS4_2025 as a retrospective third-party analysis. Moving MRMC out of Pillar 3 does not thin Pillar 3.
- Pillar 2 gets stronger. Pillar 2 already holds the four published Technical Performance manuscripts (APASI_2025, AUAS_2023, AIHS4_2023, ASCORAD_2022) for severity-assessment algorithms. Adding the four MRMC studies gives Pillar 2 a second, diagnostic-accuracy dimension — a richer, two-part Technical Performance story.
- Scrutiny test. Under the old framing, if BSI asks "why is your primary Pillar 3 evidence Rank 11?" there is no clean answer. Under Pillar 2 framing, the answer is: "Pillar 2 is supported by algorithm-level Technical Performance evidence (including MRMC at Rank 11); Pillar 3 is supported by real-world evidence at Ranks 2–8." This framing survives the question that the old framing fails.
- Consistency requirement. The SAN_2024 adequacy-review pass reclassified SAN_2024 to "Pillar-2 confirmatory" at CIP/CIR level but left the CER calling all MRMC "Pillar 3 §4.4 supporting evidence". This is the exact mismatch the user flagged. Fixing SAN_2024 in isolation without also fixing BI_2024 and PH_2024 creates a new inconsistency within the CER itself. The only internally consistent move is to apply the reclassification to all four MRMC studies together.
Scope of change in the CER (11 targeted replacements):
- Executive Summary bulleted evidence summary — "Rank 11 Pillar 3 §4.4 evidence" → "Rank 11 Pillar 2 Technical Performance evidence".
- Justification of Sufficiency — "for supporting darker-phototype coverage" (MAN_2025) → "providing Rank 11 Pillar 2 supporting evidence for phototype generalisability".
- Three-Pillar framing — "MRMC simulated-use studies are positioned within Pillar 3 Clinical Performance" paragraph rewritten to position MRMC within Pillar 2 with a clean statement of what MRMC measures (algorithm's reader-aided accuracy on curated images).
- MDCG 2020-6 Appendix III characterisation — "MRMC contribute Clinical Performance evidence to Pillar 3 of the MDCG 2020-1 three-pillar framework" → "contribute Technical Performance evidence to Pillar 2".
- "Supporting Pillar 3 evidence" → "supporting Pillar 2 Technical Performance evidence" (multi-location replacements).
- BI_2024 "Rank 11 Pillar 3 §4.4 supporting evidence" → "Rank 11 Pillar 2 Technical Performance supporting evidence".
- Pillar 3 section summary — "MRMC contribute Pillar 3 evidence at a lower evidence rank" → "MRMC — positioned in Pillar 2 Technical Performance at Rank 11 — corroborate these Pillar 3 findings".
- Consistency statement — "MRMC and prospective studies — both contributing to Pillar 3" → "MRMC (Pillar 2 at Rank 11) and prospective studies (Pillar 3 at Ranks 2–4)".
- Methodological-adequacy appraisal — reader-studies characterisation rewritten to name Pillar 2 Technical Performance as the MRMC pillar; MAN_2025 added to the MRMC list.
- Performance-claims-table footnote (care-pathway metrics, replace-all): "contributes to Pillar 3 Clinical Performance per MDCG 2020-1 §4.4 ... at a lower evidence rank" → "contributes to Pillar 2 Technical Performance (reader-aided algorithm output driving care-pathway-relevant decisions on curated images) at Rank 11".
- Benefit 7GH rare-disease sub-criterion — "This evidence is at Rank 11 and is positioned as supporting Pillar 3 evidence" → "supporting Pillar 2 Technical Performance evidence".
What was explicitly not changed in the CER:
- All Pillar 3 references about the prospective real-patient clinical studies, the NMSC_2025 manuscript, AIHS4_2025, and the R-TF-015-012 PMS observational study — these remain Pillar 3 because they are genuinely Clinical Performance evidence on patients.
- The MDCG 2020-6 §6.5(e) declared-gap text for rare diseases, autoimmune conditions, genodermatoses and Fitzpatrick V/VI — unchanged; reclassifying MRMC does not change the declared-gap logic.
## Follow-ups to track
- BI_2024 and PH_2024 CIPs/CIRs. Their own Pillar-2 / Pillar-3 framing has not been reconciled with the CER's new Pillar 2 framing. Apply the same treatment the SAN_2024 adequacy-review pass applied to SAN_2024 (re-classify as Pillar 2 confirmatory, strip any residual Pillar 3 claim, align the Research Title and Summary Introduction to the Rank-11 simulated-use frame). Out of scope for this MAN_2025 pass but now visibly inconsistent with the reclassified CER.
- Data-driven acceptance-criteria renderer rollout. The CIR-side `AcceptanceCriteriaResultsTable.tsx` for MAN_2025 now reads from `performanceClaims.ts`. BI_2024, PH_2024 and SAN_2024 likely have analogous CIR-side renderers that still hard-code thresholds. Apply the same refactor to each (filter by `studyId`, dispatch on the `(domain, metric, valueMagnitude)` triple, throw on unmapped triples, document in the per-study `CLAUDE.md`). This closes the single-source-of-truth gap across all four investigations.
- Translation parity for the `<EthicsCommitteeNonApplicability />` snippet — unchanged from SAN_2024's follow-up note (#2).
- Sensitivity-analysis numeric tables. The MAN_2025 CIR references the ≥ 50%-completers and 100%-completers sensitivity analyses qualitatively. Numeric tables backing those sensitivity analyses are not yet rendered in the CIR body. Same pattern as SAN_2024 follow-up #4 — back-propagate the numeric tables when the analytics environment emits them.
- Investigator-qualification records in the TMF. The Annex E now promises that reader CVs, certification evidence and signed participation agreements are retained by the Principal Investigator "and available for audit on request". Confirm that these records are in fact organised under the Principal Investigator's custody in a form that can be produced on 48 hours' notice; Annex E makes the commitment, but the physical filing discipline sits outside this repository.
- Source-study CIP dates. If the BSI reviewer asks why the MAN_2025 CIP was dated before the platform went live, note that `clinicalStudiesData.ts` now records the protocol date as 21 January 2026, aligned with data-collection start. If the protocol was actually signed earlier than that, update `clinicalStudiesData.ts` again.
## Build verification
`npx turbo run build --filter=qms` completed successfully after all edits (2026-04-19).
## Addendum 2026-04-19 — Pillar 2 reclassification REVERSED after reconciliation with Celine/Saray framework
### What happened
Earlier in this same session I argued for and applied a reclassification of all four MRMC studies (BI_2024, PH_2024, SAN_2024, MAN_2025) from MDCG 2020-1 Pillar 3 §4.4 to MDCG 2020-1 Pillar 2 (Technical Performance). The reasoning I gave at the time — that Rank 11 simulated-use evidence fits Pillar 2 better than Pillar 3, and that Pillar 3 remained strong without MRMC because of six prospective pivotal investigations — was methodologically wrong when checked against the external consultant's framework.
On review of `apps/qms/docs/bsi-non-conformities/clinical-review/round-1/item-0/_resources/celines-feedback/`, the reclassification was reversed, and all eleven CER edits, the MAN_2025 CIP §Nature-and-positioning section, and the MAN_2025 CIR Research Title and Conclusions were restored (or rewritten) to frame MRMC as Pillar 3 Clinical Performance supporting evidence at Rank 11.
### Why Pillar 3, not Pillar 2 — the Celine/Saray framework
Celine (external methodological consultant, feedback dated 2026-04-17) and Saray Ugidos (internal strategic translation, Slack DM dated 2026-04-18) define the three pillars orthogonally to rank:
- Pillar 1 — Valid Clinical Association (VCA): literature anchoring the surrogate endpoint (faster/more accurate diagnosis) to patient outcome.
- Pillar 2 — Technical / Analytical Performance: the algorithm correctly classifies across all 346 ICD-11 categories — an API-level, engineering claim, independent of clinical workflow. Evidence: AI model V&V + four published peer-reviewed severity-validation manuscripts (APASI_2025, AUAS_2023, AIHS4_2023, ASCORAD_2022).
- Pillar 3 — Clinical Performance: the clinician, using the device's Top-5 prioritised differential view, makes measurably better decisions than without the device. Evidence: prospective real-patient clinical investigations (Ranks 2–4), MRMC simulated-use reader studies (Rank 11), post-market observational study of the legacy device (Rank 8).
Key insight: Rank and Pillar are orthogonal axes. Rank describes the methodological quality of a measurement (real-patient prospective vs simulated-use MRMC vs PMS survey); Pillar describes what is being evidenced (VCA / API-level / clinician+device). An MRMC study is Pillar 3 at Rank 11 — the pillar reflects what it measures (clinician+device decision-making), the rank reflects how it measures it (simulated-use, not real-patient).
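The orthogonality can be illustrated with a small typed record (a sketch with invented names, not repo code): rank and pillar are independent fields on every evidence item, so "Rank 11" never implies a pillar.

```typescript
// Rank = methodological quality of the measurement;
// Pillar = what is being evidenced. Independent axes on every item.
type EvidenceItem = {
  study: string;
  pillar: 1 | 2 | 3; // 1 = VCA, 2 = technical/analytical, 3 = clinical performance
  rank: number;      // MDCG 2020-6 Appendix III rank
};

// An MRMC reader study: measures clinician+device decision-making (Pillar 3)
// via simulated use (Rank 11).
const man2025: EvidenceItem = { study: "MAN_2025", pillar: 3, rank: 11 };

// A prospective real-patient investigation shares the pillar at a stronger rank
// (rank value illustrative; the prospective studies sit at Ranks 2–4).
const prospectiveExample: EvidenceItem = { study: "NMSC_2025", pillar: 3, rank: 2 };
```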
### The trap I fell into
I treated Rank 11 as a pillar statement rather than a rank statement. I read Nick's "MRMC is supporting evidence, not primary clinical data" as meaning MRMC is not Pillar 3; in fact Nick was saying MRMC is Rank 11 supporting (vs Rank 2–4 primary), both of which are Pillar 3. I then constructed a paradox ("Rank 11 Pillar 3 is awkward") that does not exist in the Celine/Saray framework — Rank 11 Pillar 3 is the correct, expected position for MRMC reader studies. My Pillar 2 reclassification committed exactly the conflation Saray warned against: "If Pillars 2 and 3 are conflated (e.g. we claim 'clinical benefit = 346-category accuracy'), BSI will find the seam."
### The causal chain Celine/Saray want the CER to close
VCA literature → Pillar 2 (API-level, 346 cats) → Pillar 3 (clinician+device on Top-5) → clinician decision → (literature-anchored) patient benefit
Moving MRMC to Pillar 2 orphans Pillar 3 of its reader-study evidence and contaminates Pillar 2 with clinician-in-the-loop data. The chain breaks. Keeping MRMC in Pillar 3 at Rank 11, with Pillar 2 reserved for pure API-level analytical validation, keeps the chain intact.
### What was reversed
- CER — 16 targeted edits reversed (swapped Pillar 2 language back to Pillar 3 §4.4). Two pre-existing Pillar-2-for-MRMC references that predated this session (in the Three-Pillar framework narrative and the Clinical Development Plan) were also corrected to Pillar 3 for consistency with Celine/Saray.
- MAN_2025 CIP (`r-tf-015-004.mdx`) — §Nature and positioning section rewritten: frames MAN_2025 as Pillar 3 Clinical Performance supporting evidence at Rank 11, measuring clinician+device decision-making on the Top-5 prioritised view. Pillar 2 (API-level 346-cat analytical) explicitly noted as evidenced independently via the technical-validation record and published severity manuscripts. Pillar 1 (VCA literature) noted as evidenced in R-TF-015-011. Three pillars jointly defend the Class IIb indirect benefit.
- MAN_2025 CIR (`r-tf-015-006.mdx`) — Research Title Rank-11 paragraph, Summary Nature-and-positioning section, and Conclusions all rewritten: MRMC = Pillar 3 §4.4 at Rank 11. Reframed "technical-performance generalisability" phrasing to "Pillar 3 Clinical Performance supporting evidence for Fitzpatrick-phototype generalisability" (the clinician, using the Top-5 prioritised differential, makes measurably better diagnostic decisions on V and VI presentations).
### Follow-up triggered by this reversal
- SAN_2024 CIP/CIR still contain Pillar-2-confirmatory framing from the 2026-04-19 san-2024-fixes Q9 decision (applied BEFORE this reconciliation with Celine/Saray). See the `san-2024-fixes.md` addendum — that framing is now known to be misaligned and must be reversed in a separate pass.
- BI_2024 and PH_2024 CIPs/CIRs need a consistency check against the Celine/Saray framework. If they currently say Pillar 3, they are correct and need no edit. If they contain any Pillar 2 framing for the MRMC evidence, they need the same reversal as MAN_2025.
- A new agent, `celine-clinical-consultant`, has been created at `.claude/agents/celine-clinical-consultant.md` carrying this framework and registered as a proactive reviewer in `apps/qms/CLAUDE.md`. It fires automatically on any clinical-evidence edit and flags pillar-mapping errors, Rank-vs-Pillar conflation, integrator-responsibility language, VCA gaps, and causal-chain breaks. This closes the loop: the Celine/Saray framework is now carried as repo-level project knowledge rather than conversation-level memory, so future sessions will not repeat the Pillar 2 error.
### Integrator-responsibility language (Saray's second gap — noted here for future passes)
Saray also flagged a forbidden language pattern in the CER: wording that externalises the clinical-UI format to the integrator (e.g. "the number of categories displayed … are determined by the integrating system's interface design, not by the device itself"). The device must instead mandate integration requirements in the IFU. This gap has NOT been fixed in this pass — it is flagged for a separate CER + IFU pass. The new `celine-clinical-consultant` agent will flag any remaining instances.
### Build verification after reversal
`npx turbo run build --filter=qms` completed successfully after the Pillar 3 restoration.