MAN_2025 — De-identified study dataset

This directory holds the de-identified dataset used by the Clinical Investigation Report (CIR) for the MAN_2025 MRMC study. The JSON files here are the single source of truth for every numerical table in apps/qms/docs/legit-health-plus-version-1-1-0-0/product-verification-and-validation/clinical/Investigation/man-2025/r-tf-015-006.mdx.

| File | What it is |
| --- | --- |
| `readers.json` | Anonymised reader demographics (one row per enrolled reader, with an anonymised `R-NN` code in place of the reader's name). |
| `submissions.json` | One row per answer captured at Stages 1 (unassisted diagnosis), 2 (assisted diagnosis) and 3 (referral). ICD-11 codes stored as strings; Google Sheets' scientific-notation corruption (`1e+91`) is already reversed. |
| `cases.json` | Per-case metadata for the 149 cases in the study set (ground truth, device output on the converted image, Fitzpatrick phototype, source study). |
| `pathology-map.json` | ICD-11 code → display name dictionary used when rendering tables. |
| `meta.json` | Export timestamp, data counts and data freshness note used by the banner component. |
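
The scientific-notation issue mentioned for `submissions.json` arises when a spreadsheet reads a code such as `1E91` as the number 1e+91. The following is a minimal sketch of how such a value could be turned back into a code string. It is not taken from the extractor: the function name `restore_icd11_code` and the assumed shape of the corrupted value (one digit, `e+`, then the exponent digits) are illustrative assumptions, and the real script may handle more cases.

```python
import re

# Assumption: a corrupted code looks like "1e+91" or "1E+91", i.e. one
# mantissa digit, an explicit plus sign, then the exponent digits.
_SCI_NOTATION = re.compile(r"^(\d)[eE]\+(\d+)$")


def restore_icd11_code(value):
    """Return the value as a string, undoing scientific-notation coercion.

    Hypothetical helper for illustration only. Values that do not match
    the assumed corrupted pattern are returned unchanged (as strings).
    """
    text = str(value).strip()
    match = _SCI_NOTATION.match(text)
    if match is None:
        return text
    mantissa, exponent = match.groups()
    # Rebuild the code without the plus sign, with an upper-case "E".
    return f"{mantissa}E{exponent}"


if __name__ == "__main__":
    for raw in ("1e+91", "EA80", 1e91):
        print(repr(raw), "->", restore_icd11_code(raw))
```

Because the helper accepts any value and always returns a string, it also covers the case where the code has already been parsed as a float before it reaches the cleaning step.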

How the dataset is produced

The MAN_2025 study platform (multireader-multicase repository, study id man-2025) writes every reader response to a Google Sheets spreadsheet. The extraction script at scripts/build-man2025-dataset.py reads the sheet via the mrmc-sheets@mrmc-man-2025.iam.gserviceaccount.com service account, joins it against the case metadata in the multireader-multicase repo and produces the five JSON files above.

Running the extraction:

# 1. Fetch the current Google Sheets credentials from Firebase
firebase apphosting:secrets:access GOOGLE_SHEETS_PRIVATE_KEY --project=mrmc-man-2025 > /tmp/mrmc-man-2025-key.pem

# 2. Run the extractor (writes to this directory)
python scripts/build-man2025-dataset.py

The extractor never persists personally identifying information (PII). Reader names are replaced with sequential anonymous codes (R-01, R-02, …), and upload URLs for CVs and certifications are reduced to boolean hasCvFile / hasCertificationFile flags, so the URLs themselves are never stored. The only PII handling required outside this repo is the Principal Investigator's secure storage of the reader onboarding master list, which maps R-NN codes back to names.
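
The anonymisation described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the extractor's actual code: the function name `anonymise_readers`, the input field names `name`, `cvUrl` and `certificationUrl`, and the output key `code` are hypothetical. Only the `R-NN` code format and the `hasCvFile` / `hasCertificationFile` flag names come from this README.

```python
def anonymise_readers(rows):
    """Build anonymised reader records from raw onboarding rows.

    Hypothetical sketch: each output record is built from scratch, so a
    field reaches the output only if it is copied explicitly. Names and
    URLs are therefore dropped by construction.
    """
    anonymised = []
    for index, row in enumerate(rows, start=1):
        anonymised.append(
            {
                # Sequential code in input order: R-01, R-02, ...
                "code": f"R-{index:02d}",
                # Keep only whether a file was uploaded, never the URL.
                "hasCvFile": bool(row.get("cvUrl")),
                "hasCertificationFile": bool(row.get("certificationUrl")),
            }
        )
    return anonymised


if __name__ == "__main__":
    # Made-up rows, not study data.
    sample = [
        {"name": "Reader A", "cvUrl": "https://example.org/cv.pdf", "certificationUrl": ""},
        {"name": "Reader B"},
    ]
    for record in anonymise_readers(sample):
        print(record)
```

Building each record from an explicit set of fields, rather than copying the row and deleting sensitive keys, means a column added to the sheet later cannot leak into the output by accident.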

Do not edit by hand

These JSON files are generated. Manual edits will be overwritten on the next extraction. To change how the data is aggregated, edit the extractor script or the analytics module (apps/qms/src/components/Man2025/analytics.ts).
