MAN_2025 — De-identified study dataset
This directory holds the de-identified dataset used by the Clinical
Investigation Report (CIR) for the MAN_2025 MRMC study. The JSON files here
are the single source of truth for every numerical table in
apps/qms/docs/legit-health-plus-version-1-1-0-0/product-verification-and-validation/clinical/Investigation/man-2025/r-tf-015-006.mdx.
Contents
| File | What it is |
|---|---|
| readers.json | Anonymised reader demographics (one row per enrolled reader, with an anonymised R-NN code in place of the reader's name). |
| submissions.json | One row per answer captured at Stages 1 (unassisted diagnosis), 2 (assisted diagnosis) and 3 (referral). ICD-11 codes stored as strings; Google Sheets' scientific-notation corruption (1e+91) is already reversed. |
| cases.json | Per-case metadata for the 149 cases in the study set (ground truth, device output on the converted image, Fitzpatrick phototype, source study). |
| pathology-map.json | ICD-11 code → display name dictionary used when rendering tables. |
| meta.json | Export timestamp, data counts and data freshness note used by the banner component. |
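
The scientific-notation note on submissions.json can be illustrated with a short sketch. A spreadsheet that auto-parses a code shaped like 1E91 as a number displays it as 1e+91. The helper below is hypothetical and is not taken from scripts/build-man2025-dataset.py; it assumes the damaged codes were four characters long with a single leading digit, so the exponent is zero-padded to two digits.

```python
import re

# Matches spreadsheet renderings such as "1e+91" or "1.00E+91".
_SCI_NOTATION = re.compile(r"(\d)(?:\.0+)?[eE]\+(\d{1,2})")


def restore_icd11_code(cell: str) -> str:
    """Undo scientific-notation mangling of a code such as 1E91.

    Hypothetical helper: assumes the original code had one leading digit,
    the letter E, and a two-character numeric tail.
    """
    text = cell.strip()
    match = _SCI_NOTATION.fullmatch(text)
    if match is None:
        # Not mangled (for example "EA80"): return the code unchanged.
        return text
    mantissa, exponent = match.groups()
    return f"{mantissa}E{exponent.zfill(2)}"


print(restore_icd11_code("1e+91"))
print(restore_icd11_code("EA80"))
```

Keeping the restored value as a string, rather than casting it back to a number anywhere downstream, is what prevents the same corruption from recurring.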
How the dataset is produced
The MAN_2025 study platform (multireader-multicase repository, study id
man-2025) writes every reader response to a Google Sheets spreadsheet. The
extraction script at scripts/build-man2025-dataset.py reads the sheet via
the mrmc-sheets@mrmc-man-2025.iam.gserviceaccount.com service account,
joins it against the case metadata in the multireader-multicase repo and
produces the five JSON files above.
Running the extraction:
```bash
# 1. Fetch the current Google Sheets credentials from Firebase
firebase apphosting:secrets:access GOOGLE_SHEETS_PRIVATE_KEY --project=mrmc-man-2025 > /tmp/mrmc-man-2025-key.pem
# 2. Run the extractor (writes to this directory)
python scripts/build-man2025-dataset.py
```
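
As a rough illustration of the join step described above, the sketch below attaches case metadata to each reader response. It is not the extractor's actual code: the function name and the field names ("caseId", "id", "groundTruth", "readerCode") are assumptions made for this example only.

```python
from typing import Any


def join_submissions_to_cases(
    submissions: list[dict[str, Any]],
    cases: list[dict[str, Any]],
) -> list[dict[str, Any]]:
    """Attach case metadata to each submission row.

    Field names ("caseId", "id") are assumptions for illustration only.
    """
    # Index the cases once so each lookup is a dictionary access.
    cases_by_id = {case["id"]: case for case in cases}
    joined = []
    for row in submissions:
        case = cases_by_id.get(row["caseId"])
        if case is None:
            # Fail loudly rather than emit a row with no case metadata.
            raise KeyError(f"unknown case id: {row['caseId']!r}")
        joined.append({**row, "case": case})
    return joined


# Made-up rows, only to show the call shape.
example_cases = [{"id": "case-001", "groundTruth": "EA80"}]
example_submissions = [{"readerCode": "R-01", "caseId": "case-001"}]
print(join_submissions_to_cases(example_submissions, example_cases))
```

Raising on an unknown case id is a design choice in this sketch: a silent skip would shrink the row count without any trace in the generated files.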
The extractor never persists personally identifiable information (PII). Reader
names are replaced with sequential anonymous codes (R-01, R-02, …), and
uploaded CVs and certifications are recorded only as boolean hasCvFile /
hasCertificationFile flags, never as URLs. The only PII handling required
outside this repo is the Principal Investigator's secure storage of the reader
onboarding master list that maps R-NN codes back to names.
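
The anonymisation rules above can be sketched as follows. This is a minimal illustration, not the extractor's implementation: the function name, the input keys ("name", "cvUrl", "certificationUrl") and the output key "code" are assumptions, while the R-NN format and the hasCvFile / hasCertificationFile flags come from the description above.

```python
from typing import Any


def anonymise_readers(raw_readers: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Replace names with R-NN codes and upload URLs with boolean flags.

    Input keys ("name", "cvUrl", "certificationUrl") and the output key
    "code" are hypothetical; other demographic fields are omitted here.
    """
    anonymised = []
    for index, reader in enumerate(raw_readers, start=1):
        anonymised.append(
            {
                # Sequential code: R-01, R-02, ...
                "code": f"R-{index:02d}",
                # Record only whether a file was uploaded, never its URL.
                "hasCvFile": bool(reader.get("cvUrl")),
                "hasCertificationFile": bool(reader.get("certificationUrl")),
            }
        )
    return anonymised


# Made-up input, only to show the call shape.
example = [
    {"name": "Reader One", "cvUrl": "https://example.com/cv.pdf"},
    {"name": "Reader Two", "certificationUrl": ""},
]
print(anonymise_readers(example))
```

Because the output dictionaries are built from scratch rather than copied from the input, a newly added identifying column in the sheet cannot leak into the result by default.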
Do not edit by hand
These JSON files are generated. Manual edits will be overwritten on the next
extraction. To change how the data is aggregated, edit the extractor script
or the analytics module (apps/qms/src/components/Man2025/analytics.ts).