R-TF-028-004 Data Annotation Instructions - ICD-11 Mapping
Context
The Legit.Health Plus device development dataset is compiled from multiple sources: retrospective data (public atlases) and prospective data (clinical studies). Each source provides diagnostic labels in various formats and nomenclatures. Retrospective datasets may use abbreviated terms (e.g., "BCC", "SCC"), common names, or legacy coding systems, while prospective data may use formal diagnoses or other standardized terminologies.
To ensure consistency, clinical validity, and regulatory compliance, all diagnostic labels must be mapped to a single, standardized classification system: ICD-11 (International Classification of Diseases, 11th Revision).
This document describes the formal process for standardizing and mapping all diagnosis labels from heterogeneous data sources to their corresponding ICD-11 categories. The mapping will be performed by a qualified medical expert who will review each unique diagnosis string present in the merged dataset and assign the appropriate ICD-11 code and description based on established medical literature and clinical guidelines.
Objectives
The primary objectives of this annotation procedure are:
- To create a definitive, standardized ICD-11 mapping table that formally links every unique diagnostic label string from retrospective and prospective datasets to its corresponding ICD-11 code and description.
- To ensure this mapping is clinically accurate, consistent, and justifiable based on current medical knowledge and ICD-11 classification guidelines.
- To resolve ambiguities and variations in diagnostic nomenclature (e.g., "BCC" → "Basal cell carcinoma" → ICD-11 code) to establish a unified diagnostic vocabulary.
- To produce a version-controlled artifact that serves as the ground truth diagnostic classification for all images in the development dataset, as specified in the R-TF-028-001 AI/ML Description.
Annotation Personnel
Role
Medical expert (Dermatologist).
Qualifications
- Required: Board-certified dermatologist.
- Recommended: Extensive clinical experience (>5 years) in diagnosing a comprehensive range of dermatological diseases, including neoplastic, inflammatory, and infectious conditions.
Responsibilities
- To review the complete list of unique diagnosis strings extracted from both retrospective and prospective datasets.
- To assign the appropriate ICD-11 code and description to each unique diagnosis string, ensuring clinical accuracy and consistency.
- To provide written justification referencing medical literature for any ambiguous or complex mappings.
Annotation Protocol
The creation of the ICD-11 mapping follows a structured, multi-step process.
Step 1: Data Merging and Diagnosis String Extraction
The AI Team will merge the final curated datasets from all retrospective and prospective sources. From this merged dataset, a comprehensive list of all unique diagnosis label strings will be extracted. This list will include all variations and nomenclatures used across different sources (e.g., "BCC", "basal cell carcinoma", "Basalioma").
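As an illustration of the extraction described above, the following is a minimal sketch rather than the AI Team's actual tooling. It assumes the merged dataset can be read as (source, label) pairs; the function name `extract_unique_diagnoses` and the dataset names are hypothetical. Labels are kept verbatim apart from stripping surrounding whitespace, which is itself an assumption, so that every variant remains a separate entry for the expert.

```python
from collections import defaultdict


def extract_unique_diagnoses(records):
    """Return {diagnosis_string: sorted list of source datasets}.

    `records` is an iterable of (source_name, diagnosis_label) pairs.
    Only surrounding whitespace is stripped (an assumption), so "BCC" and
    "basal cell carcinoma" remain distinct strings for expert review.
    """
    sources_by_label = defaultdict(set)
    for source, label in records:
        label = label.strip()
        if label:  # skip empty labels
            sources_by_label[label].add(source)
    return {label: sorted(sources_by_label[label]) for label in sorted(sources_by_label)}


# Hypothetical example records; dataset names are placeholders.
records = [
    ("atlas_a", "BCC"),
    ("atlas_a", "melanoma"),
    ("study_b", "basal cell carcinoma"),
    ("study_b", "BCC "),
    ("atlas_c", "Basalioma"),
]
unique = extract_unique_diagnoses(records)
for label, sources in unique.items():
    print(label, sources)
```

No case folding or synonym merging is done at this stage, because deciding that two strings denote the same condition is left to the medical expert in Step 3.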
Step 2: Mapping Matrix Preparation
The AI Team will prepare a data entry spreadsheet (the "mapping matrix"). This spreadsheet will contain:
- Rows: Each row will represent a unique diagnosis string found in the merged dataset (e.g., "BCC", "melanoma", "psoriasis vulgaris").
- Columns:
  - Original Diagnosis String: The exact label as it appears in the source dataset.
  - ICD-11 Code: To be filled by the medical expert.
  - ICD-11 Description: The standardized full description corresponding to the ICD-11 code.
  - Source Dataset(s): Indication of which dataset(s) contain this label.
  - Justification: A column for the annotator to add comments, rationale, or literature references for the mapping decision.
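The matrix layout described above could be generated along the following lines. This is a sketch under stated assumptions: the CSV format, the helper names `build_mapping_matrix` and `to_csv`, and the "; " separator for multiple sources are illustrative choices, not requirements of this procedure. The expert-owned columns are left empty.

```python
import csv
import io

# Column names taken from the matrix description above.
COLUMNS = [
    "Original Diagnosis String",
    "ICD-11 Code",
    "ICD-11 Description",
    "Source Dataset(s)",
    "Justification",
]


def build_mapping_matrix(sources_by_label):
    """Create one empty-to-fill row per unique diagnosis string."""
    rows = []
    for label in sorted(sources_by_label):
        rows.append({
            "Original Diagnosis String": label,
            "ICD-11 Code": "",          # filled by the medical expert
            "ICD-11 Description": "",   # filled by the medical expert
            "Source Dataset(s)": "; ".join(sorted(sources_by_label[label])),
            "Justification": "",        # filled by the medical expert
        })
    return rows


def to_csv(rows):
    """Serialize the matrix to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical input: label -> datasets containing it.
matrix = build_mapping_matrix({"BCC": ["atlas_a", "study_b"], "melanoma": ["atlas_a"]})
csv_text = to_csv(matrix)
print(csv_text)
```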
Step 3: Medical Review and ICD-11 Assignment
The designated Medical Expert will review the matrix row by row. For each unique diagnosis string, the expert will:
- Identify the appropriate ICD-11 category based on clinical knowledge, medical literature, and the official ICD-11 classification guidelines.
- Assign the corresponding ICD-11 code (e.g., "2C32" for Basal cell carcinoma of skin).
- Enter the standardized ICD-11 description (e.g., "Basal cell carcinoma of skin").
- Document justification for any ambiguous cases, multiple possible mappings, or when clinical judgment was required to select between similar categories.
Mapping Guidelines:
- Abbreviations and Acronyms: Map common abbreviations to their full clinical equivalents (e.g., "BCC" → ICD-11 code for "Basal cell carcinoma of skin").
- Synonyms and Variants: Multiple diagnosis strings that refer to the same condition should be mapped to the same ICD-11 code (e.g., "BCC", "basal cell carcinoma", "Basalioma" → same ICD-11 code).
- Ambiguous Labels: When a diagnosis string is ambiguous or could map to multiple ICD-11 codes, the expert should:
  - Select the most clinically appropriate and specific code based on context.
  - Document the rationale and alternative codes considered in the justification column.
- Non-specific or Incomplete Labels: If a diagnosis string is too vague to map to a specific ICD-11 code, map to the most appropriate parent category and document the limitation.
- Legacy Coding Systems: For labels using older classification systems (ICD-10, SNOMED, etc.), translate to the corresponding ICD-11 code using official crosswalk tables when available, verified by clinical expertise.
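Some of the guidelines above lend themselves to an automated sanity check before the matrix is submitted. The sketch below is one possible check, not part of the defined procedure. It flags rows with a missing code or description, and codes that were entered with more than one description, which would suggest that synonyms were not mapped consistently. The function name `check_matrix` is hypothetical, and the codes in the example are placeholders, not real ICD-11 codes.

```python
from collections import defaultdict


def check_matrix(rows):
    """Return a list of human-readable problems found in a completed matrix."""
    problems = []
    descriptions_by_code = defaultdict(set)
    for row in rows:
        label = row["Original Diagnosis String"]
        code = row["ICD-11 Code"].strip()
        description = row["ICD-11 Description"].strip()
        if not code or not description:
            problems.append(f"{label!r}: missing ICD-11 code or description")
            continue
        descriptions_by_code[code].add(description)
    # The same code should always carry the same standardized description.
    for code, descriptions in sorted(descriptions_by_code.items()):
        if len(descriptions) > 1:
            problems.append(f"{code}: inconsistent descriptions {sorted(descriptions)}")
    return problems


# Placeholder codes and descriptions for illustration only.
rows = [
    {"Original Diagnosis String": "BCC", "ICD-11 Code": "CODE_A",
     "ICD-11 Description": "Condition A"},
    {"Original Diagnosis String": "Basalioma", "ICD-11 Code": "CODE_A",
     "ICD-11 Description": "Condition A (variant spelling)"},
    {"Original Diagnosis String": "rash", "ICD-11 Code": "",
     "ICD-11 Description": ""},
]
problems = check_matrix(rows)
for problem in problems:
    print(problem)
```

A check like this only catches clerical inconsistencies. Whether a mapping is clinically correct remains a matter for the expert's judgment and the documented justification.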
Step 4: Finalization
Once the matrix is fully populated, the Medical Expert will conduct a final self-review before submitting it to the AI Team.
Quality Control and Review
To ensure the highest level of clinical accuracy and robustness, the following quality control steps will be implemented:
- Primary Review: The completed matrix will be reviewed by the annotating expert to ensure completeness and internal consistency.
- Secondary Review: The completed and justified matrix will be independently reviewed by a second board-certified dermatologist who was not involved in the initial annotation.
- Consensus Resolution: Any discrepancies between the primary annotation and the secondary review will be resolved through a consensus meeting between the two experts. The final decision and its rationale will be documented.
- Final Approval: The consensus-driven matrix is formally approved and version-controlled. This finalized matrix will serve as the definitive logic and ground truth basis for all subsequent validation of the binary indicators.
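To prepare the consensus meeting described above, the primary and secondary assignments can be compared row by row. The following is a minimal sketch, assuming each reviewer's result is available as a mapping from diagnosis string to assigned code. The function name `find_discrepancies` is hypothetical and the codes are placeholders, not real ICD-11 codes.

```python
def find_discrepancies(primary, secondary):
    """Compare two {diagnosis_string: code} mappings.

    Returns a sorted list of (label, primary_code, secondary_code) tuples for
    every label on which the reviewers differ. A label missing from one
    reviewer's mapping is reported with None for that reviewer.
    """
    discrepancies = []
    for label in sorted(set(primary) | set(secondary)):
        primary_code = primary.get(label)
        secondary_code = secondary.get(label)
        if primary_code != secondary_code:
            discrepancies.append((label, primary_code, secondary_code))
    return discrepancies


# Placeholder codes for illustration only.
primary = {"BCC": "CODE_A", "melanoma": "CODE_B", "psoriasis vulgaris": "CODE_C"}
secondary = {"BCC": "CODE_A", "melanoma": "CODE_D", "psoriasis vulgaris": "CODE_C"}
discrepancies = find_discrepancies(primary, secondary)
for label, first, second in discrepancies:
    print(f"{label}: primary={first}, secondary={second}")
```

The resulting list gives the two experts a concrete agenda. The resolution itself and its rationale are still recorded manually as required above.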
Signature meaning
The signatures for the approval process of this document can be found in the verified commits at the repository for the QMS. As a reference, the team members who are expected to participate in this document and their roles in the approval process, as defined in Annex I Responsibility Matrix of the GP-001, are:
- Author: Team members involved
- Reviewer: JD-003, JD-004
- Approver: JD-001