R-TF-028-004 Data Annotation Instructions - ICD-11 Mapping
Context
The Legit.Health Plus device development dataset, known as LegitHealth-DX, is compiled from multiple heterogeneous sources and is prepared first for the development of the ICD Category Distribution, which requires the diagnoses to be correctly labelled according to the ICD-11 classification system. This document focuses strictly on that labelling task.
The sources of the data include:
- Archive Data: Images of skin lesions from repositories with confirmed diagnoses.
 - Custom Gathered Data: Clinical studies and prospectively collected datasets.
 
Each source provides diagnostic labels in various formats and nomenclatures. Some archive sources may use abbreviated terms (e.g., "BCC", "SCC"), common names, alternative spellings (e.g., "Hemangioma" vs "Haemangioma"), or legacy coding systems, while custom gathered data may use more structured diagnoses or other standardized terminologies.
To ensure consistency, clinical validity, and regulatory compliance across all data sources, all diagnostic labels must be mapped to a single, standardized classification system: ICD-11 (International Classification of Diseases, 11th Revision).
This document describes the formal, multi-stage process for standardizing and mapping all diagnosis labels from these heterogeneous data sources to their corresponding ICD-11 categories. The mapping will be performed by the data science team, which will review each unique diagnosis string present in the merged dataset and assign the appropriate ICD-11 code and description based on established medical literature, clinical guidelines, and the official ICD-11 classification system. Dermatologists will also review and validate the mappings to ensure clinical accuracy.
Data Sources Summary
The following table summarizes all data sources gathered and described in the R-TF-028-003 collection documents:
| ID | Dataset Name | Type | Description | ICD-11 Mapping | Crops | Diff. Dx | Sex | Age | 
|---|---|---|---|---|---|---|---|---|
| 1 | Torrejon-HCP-diverse-conditions | Multiple | Dataset of skin images by physicians with good photographic skills | ✓ Yes | Varies | ✓ | ✓ | ✓ | 
| 2 | Abdominal-skin | Archive | Small dataset of abdominal pictures with segmentation masks for `Non-specific lesion` class | ✗ No | Yes (programmatic) | — | — | — | 
| 3 | Basurto-Cruces-Melanoma | Custom gathered | Clinical validation study dataset (`MC EVCDAO 2019`) | ✓ Yes | Yes (in-house crops) | — | ✓ | ✓ | 
| 4 | BI-GPP (batch 1) | Archive | Small set of GPP images from Boehringer Ingelheim (first batch) | ✓ Yes | No | — | — | — | 
| 5 | BI-GPP (batch 2) | Archive | Large dataset of GPP images from Boehringer Ingelheim (second batch) | ✓ Yes | Yes (programmatic) | — | ✓ | ✓ | 
| 6 | Chiesa-dataset | Archive | Sample of head and neck lesions (Medela et al., 2024) | ✓ Yes | Yes (in-house crops) | — | ◐ | ◐ | 
| 7 | Figaro 1K | Archive | Hair style classification and segmentation dataset, repurposed for `Non-specific finding` | ✗ No | Yes (in-house crops) | — | — | — | 
| 8 | Hand Gesture Recognition (HGR) | Archive | Small dataset of hands repurposed for non-specific images | ✗ No | Yes (programmatic) | — | — | — | 
| 9 | IDEI 2024 (pigmented) | Archive | Prospective and retrospective studies at IDEI (DERMATIA project), pigmented lesions only | ✓ Yes | Yes (programmatic) | — | ✓ | ◐ | 
| 10 | Manises-HS | Archive | Large collection of hidradenitis suppurativa images | ✗ No | Not yet | — | ✓ | ✓ | 
| 11 | Nails segmentation | Archive | Small nail segmentation dataset repurposed for `non-specific lesion` | ✗ No | Yes (programmatic) | — | — | — | 
| 12 | Non-specific lesion V2 | Archive | Small representative collection repurposed for `non-specific lesion` | ✗ No | Yes (programmatic) | — | — | — | 
| 13 | Osakidetza-derivation | Archive | Clinical validation study dataset (`DAO Derivación O 2022`) | ✓ Yes | Yes (in-house crops) | ◐ | ✓ | ✓ | 
| 14 | Ribera ulcers | Archive | Collection of ulcer images from Ribera Salud | ✗ No | Yes (from wound masks, not all) | — | — | — | 
| 15 | Transient Biometrics Nails V1 | Archive | Biometric dataset of nail images | ✗ No | Yes (programmatic) | — | — | — | 
| 16 | Transient Biometrics Nails V2 | Archive | Biometric dataset of nail images | ✗ No | No (close-ups) | — | — | — | 
| 17 | WoundsDB | Archive | Small chronic wounds database | ✓ Yes | No | — | ✓ | ◐ | 
Total datasets: 51 | With ICD-11 mapping: 37
Legend: ✓ = Yes | ◐ = Partial/Pending | — = No
Objectives
The primary objectives of this annotation procedure are:
- To create a definitive, standardized "Visible ICD-11" mapping table that formally links every unique diagnostic label string from retrospective and prospective datasets to visually-determined diagnostic categories. Each "Visible ICD-11" category may correspond to a single ICD-11 code or to an array of multiple ICD-11 codes that share indistinguishable or highly similar visual features.
 - To ensure this mapping is clinically accurate, consistent, and justifiable based on current medical knowledge and ICD-11 classification guidelines, while recognizing the limitations of visual assessment alone.
 - To resolve ambiguities and variations in diagnostic nomenclature (e.g., "BCC" → "Basal cell carcinoma" → ICD-11 code) to establish a unified diagnostic vocabulary.
 - To identify and consolidate diagnostically distinct conditions that cannot be reliably distinguished based on visual features alone, preventing the model from learning spurious artifacts. For example, contact dermatitis and atopic dermatitis have different ICD-11 codes but share overlapping visual presentations; these are consolidated into a single "Visible ICD-11" target category (e.g., "Eczematous dermatitis") that encompasses both conditions. The final differentiation between such conditions is the responsibility of the healthcare professional, who has access to additional clinical information (patient history, symptoms, triggering factors, etc.) beyond what is visible in the image.
 - To produce a version-controlled artifact that serves as the ground truth diagnostic classification for all images in the development dataset, as specified in the R-TF-028-001 AI/ML Description.
Annotation Personnel
Primary Annotation Role: JD-009 Medical Data Scientist
Qualifications
- Required: Position JD-009 as defined in the organizational structure, with expertise in medical data processing and standardization.
 - Recommended: Experience with medical terminologies, classification systems (ICD-10, ICD-11, SNOMED CT), and dermatological datasets.
 - Required Knowledge: Understanding of dermatological conditions and their visual manifestations sufficient to perform initial mapping decisions.
 
Responsibilities
The JD-009 Medical Data Scientist performs the following processing work:
- To review the complete list of unique diagnosis strings extracted from all dataset sources.
 - To assign the appropriate "Visible ICD-11" category name and ICD-11 code(s) to each unique diagnosis string, leveraging the existing diagnostic labels already present in the source datasets.
 - To identify synonyms, abbreviations, and spelling variations and map them to standardized categories.
 - To use the official ICD-11 browser, medical literature, and clinical resources to perform initial mappings.
 - To identify cases requiring clinical consultation (e.g., decisions about merging visually indistinguishable conditions, ambiguous diagnoses, or complex differential diagnoses).
 - To document all mapping decisions and maintain the master mapping spreadsheet.
 - To coordinate with the dermatologist for validation of clinically complex or ambiguous mappings.
 
Supporting Clinical Role: JD-022 Medical Manager
Qualifications
- Required: Board-certified dermatologist.
 - Recommended: Extensive clinical experience (>10 years) in diagnosing a comprehensive range of dermatological diseases, including neoplastic, inflammatory, and infectious conditions.
 
Responsibilities
The dermatologist provides clinical expertise for specific decisions, including:
- To provide clinical consultation on ambiguous or complex mapping decisions identified by the data scientist.
 - To validate decisions regarding the consolidation of multiple ICD-11 codes into single "Visible ICD-11" categories when conditions cannot be reliably distinguished based on visual features alone.
 - To resolve clinical doubts about differential diagnoses, overlapping conditions, or borderline cases.
 - To review and approve category mergers and exclusions proposed by the data scientist.
 - To provide written justification referencing medical literature for clinically complex mappings.
 - To conduct periodic quality control reviews of completed mappings to ensure clinical accuracy.
 
Annotation Protocol
The creation of the ICD-11 mapping follows a structured, multi-step process that integrates data from multiple sources into a unified, standardized dataset.
Data Source Processing and Label Extraction
For each new data source added to LegitHealth-DX, the Medical Data Science (MDS) Team will:
- Create a source-specific processing script named `add_XXX_images.py` in the `sources/XXX` folder (where XXX is the source name).
- Generate a dataset CSV (`XXX_dataset.csv`) with standardized metadata, including image paths and the diagnostic labels as they appear in the original source.
- Extract the unique diagnosis strings and create a source-specific renaming file (`XXX_renaming.csv`) containing all unique diagnostic labels from that source.
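The label-extraction step above can be sketched in Python. This is a minimal illustration, not the actual `add_XXX_images.py` code: it assumes the dataset CSV has a column named `diagnosis` and that the renaming file uses the column names in `RENAMING_FIELDS` (both hypothetical). Each label is kept exactly as written so that the Source Label column preserves the original strings.

```python
import csv
import io

# Hypothetical column names for XXX_renaming.csv
RENAMING_FIELDS = ["source_label", "target_name", "icd11_codes", "notes"]

def build_renaming_rows(dataset_csv_text, label_column="diagnosis"):
    """Return one renaming row per unique diagnostic label, in first-seen order."""
    reader = csv.DictReader(io.StringIO(dataset_csv_text))
    unique = {}  # dicts preserve insertion order (Python 3.7+)
    for row in reader:
        label = row[label_column]
        if label.strip():  # skip blank labels, but keep the exact source string
            unique.setdefault(label, None)
    # Target Name and ICD-11 Code(s) stay empty until the medical expert fills them in
    return [{"source_label": lbl, "target_name": "", "icd11_codes": "", "notes": ""}
            for lbl in unique]

def to_renaming_csv(rows):
    """Serialise renaming rows to CSV text (the content of XXX_renaming.csv)."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=RENAMING_FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()

sample = "image_path,diagnosis\na.jpg,BCC\nb.jpg,Basalioma\nc.jpg,BCC\n"
rows = build_renaming_rows(sample)
print([r["source_label"] for r in rows])  # ['BCC', 'Basalioma']
```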
Master Mapping Matrix Preparation
The MDS Team will:
- Merge all source-specific renaming files into a single master mapping spreadsheet (Google Sheets: "LegitHealth-DX ICD category management").
- Create a dedicated tab for each data source containing:
  - Source Label: The exact diagnostic string as it appears in the source dataset.
  - Target Name: The standardized "Visible ICD-11" category name (to be filled in by the medical expert).
  - ICD-11 Code(s): The official ICD-11 code(s) represented by this visible category; this may be a single code or an array of codes for visually indistinguishable conditions (to be filled in by the medical expert).
  - Notes/Justification: Comments, rationale, or literature references explaining the mapping and any consolidation decisions.
 
 
This master spreadsheet serves as the central repository for all diagnostic mappings across all data sources.
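As a rough sketch of how the master matrix could be represented in code, the snippet below models one spreadsheet tab per source as a list of rows carrying the four columns listed above. `MappingRow`, `build_master`, `is_complete` and `pending_rows` are hypothetical names; the matrix itself lives in Google Sheets, and this structure only illustrates how unfinished rows could be listed before medical review.

```python
from dataclasses import dataclass, field

@dataclass
class MappingRow:
    source_label: str                                 # exact string from the source dataset
    target_name: str = ""                             # "Visible ICD-11" category name (expert)
    icd11_codes: list = field(default_factory=list)   # one or more ICD-11 codes (expert)
    notes: str = ""                                   # justification / literature references

def build_master(labels_by_source):
    """Return {source_name: [MappingRow, ...]}, one entry per spreadsheet tab."""
    return {source: [MappingRow(source_label=lbl) for lbl in labels]
            for source, labels in sorted(labels_by_source.items())}

def is_complete(row):
    # Rows marked "-" (exclusion) need no ICD-11 code
    return row.target_name == "-" or bool(row.target_name and row.icd11_codes)

def pending_rows(master):
    """List (source, label) pairs that still lack a Target Name or an ICD-11 code."""
    return [(source, row.source_label)
            for source, rows in master.items()
            for row in rows
            if not is_complete(row)]
```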
Medical Review and "Visible ICD-11" Assignment
The designated Medical Expert(s) will review each tab of the master spreadsheet. For each unique diagnosis string from every source, the expert will:
- Identify the appropriate "Visible ICD-11" category based on visual features that can be reliably determined from images, using clinical knowledge, medical literature, and the official ICD-11 browser.
 - Assign the Target Name (standardized "Visible ICD-11" category name).
 - Assign the ICD-11 Code(s) (e.g., "2C32" for Basal cell carcinoma of skin, or an array such as ["EA80", "EA81"] for conditions that are visually indistinguishable but require additional clinical context to differentiate).
 - Document justification for any ambiguous cases, multiple possible mappings, consolidation of multiple ICD-11 codes into a single visible category, or when clinical judgment was required.
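Because the ICD-11 Code(s) cell may hold either one code or an array, downstream scripts need to read both forms consistently. The helper below is a hypothetical sketch that assumes arrays are stored as JSON-style text (a bracketed list of double-quoted codes); the real spreadsheet encoding may differ.

```python
import json

def parse_icd11_cell(cell):
    """Parse an 'ICD-11 Code(s)' cell into a list of codes.

    Accepts a single code or a JSON-style array of quoted codes.
    """
    text = cell.strip()
    if not text:
        return []
    if text.startswith("["):
        codes = json.loads(text)  # raises json.JSONDecodeError (a ValueError) if malformed
        if not isinstance(codes, list) or not all(isinstance(c, str) for c in codes):
            raise ValueError(f"Malformed code array: {cell!r}")
    else:
        codes = [text]
    # Normalise case/whitespace and drop duplicates while keeping the original order
    result = []
    for code in codes:
        code = code.strip().upper()
        if code and code not in result:
            result.append(code)
    return result
```

Returning a list in every case lets single-code and multi-code categories share the same downstream handling.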
 
Mapping Guidelines
- Abbreviations and Acronyms: Map common abbreviations to their full clinical equivalents (e.g., "BCC" → "Basal cell carcinoma of skin" → ICD-11 code 2C32).
 - Synonyms and Variants: Multiple diagnosis strings that refer to the same condition should be mapped to the same Target Name and ICD-11 code(s) (e.g., "BCC", "basal cell carcinoma", "Basalioma" → same "Visible ICD-11" category).
 - Spelling Variations: Handle alternative spellings consistently (e.g., "Hemangioma" vs "Haemangioma" → same Target Name).
 - Visually Indistinguishable Conditions: When multiple distinct ICD-11 diagnoses share the same or highly similar visual presentations and cannot be reliably differentiated from images alone, they should be consolidated into a single "Visible ICD-11" category:
   - The Target Name should reflect the broader category (e.g., "Eczematous dermatitis" for both contact and atopic dermatitis).
   - The ICD-11 Code(s) field should contain an array of all relevant codes (e.g., ["EA80", "EK00"] for atopic eczema and allergic contact dermatitis).
   - Document the clinical rationale for consolidation and specify what additional information healthcare professionals would need to make the final differentiation (e.g., patient history, allergen exposure, chronicity).
 
 - Ambiguous Labels: When a diagnosis string is ambiguous or could map to multiple ICD-11 codes that are not visually similar, the expert should:
   - Select the most clinically appropriate and specific code based on available context.
   - Document the rationale and alternative codes considered in the justification column.
 
 - Non-specific or Incomplete Labels: If a diagnosis string is too vague to map to a specific ICD-11 code, map to the most appropriate parent category and document the limitation.
 - Legacy Coding Systems: For labels using older classification systems (ICD-10, SNOMED, etc.), translate to the corresponding ICD-11 code(s) using official crosswalk tables when available, verified by clinical expertise.
 - Exclusions: Assign `-` as the Target Name for images that should be excluded from the dataset (e.g., poor quality, non-dermatological content, or images that cannot be reliably diagnosed).
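Once the expert has filled in the spreadsheet, the guidelines above can be applied mechanically. In this hypothetical sketch, labels are compared after case, accent and whitespace normalisation (an assumption, not a documented rule), conflicting assignments for the same normalised label are rejected, and the `-` Target Name marks an exclusion. Genuinely different spellings such as "Hemangioma" and "Haemangioma" still need their own explicit rows.

```python
import unicodedata

EXCLUDE = "-"  # Target Name that marks images for exclusion

def normalise_label(label):
    """Fold case, accents and repeated whitespace into one lookup key (assumed rule)."""
    text = unicodedata.normalize("NFKD", label)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    return " ".join(text.casefold().split())

def build_lookup(pairs):
    """pairs: (source_label, target_name) rows from the spreadsheet."""
    lookup = {}
    for source_label, target in pairs:
        key = normalise_label(source_label)
        if lookup.get(key, target) != target:
            raise ValueError(f"Conflicting Target Names for {source_label!r}")
        lookup[key] = target
    return lookup

def map_label(label, lookup):
    """Return the Target Name, or None if excluded; raise KeyError if unmapped."""
    target = lookup[normalise_label(label)]
    return None if target == EXCLUDE else target
```

Raising on unmapped labels, rather than returning a default, keeps new source strings from slipping into the dataset without a medical decision.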
Category Management and Refinement
After initial mapping, a secondary review process is conducted to manage the complete ICD-11 category set:
- Automated Detection: The system automatically detects any new Target Names that were not present in the category management file and flags them for revision.
 - Category Consolidation: Medical experts review the complete list of mapped categories (`DXvXX_classes_stage3.csv`) to identify:
   - Redundant categories that should be merged (e.g., closely related diagnostic terms).
   - Categories that should be excluded due to insufficient clinical relevance or data quality.
 
 - Updates to Master Spreadsheet: All category-level decisions (mergers, exclusions, corrections) are documented in the "LegitHealth-DX ICD category management" spreadsheet (tab: "Pathologies to exclude and merge").
 
Important: Any changes to Target Names must be manually applied across all relevant tabs in the master mapping spreadsheet to ensure consistency.
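The automated detection of new Target Names and the category-level decisions can be sketched as simple set and dictionary operations. The function names and the `{old_name: new_name}` merge format are assumptions for illustration. Merges are resolved transitively so that a chain of renames ends at a single category, and cycles are reported as errors.

```python
def find_new_targets(mapped_targets, known_categories):
    """Target Names absent from the category management file, to be flagged for revision."""
    return sorted(set(mapped_targets) - set(known_categories) - {"-"})

def resolve_merge(name, merges):
    """Follow {old_name: new_name} merges until a final category is reached."""
    seen = set()
    while name in merges:
        if name in seen:
            raise ValueError(f"Merge cycle involving {name!r}")
        seen.add(name)
        name = merges[name]
    return name

def apply_category_decisions(targets, merges, exclusions):
    """Apply merges, then drop excluded categories (input order is preserved)."""
    result = []
    for target in targets:
        final = resolve_merge(target, merges)
        if final not in exclusions:
            result.append(final)
    return result
```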
ICD-11 Code Validation
To ensure all categories have valid ICD-11 codes:
- The AI Team runs the automated script `label_ICD_codes.py` to verify that every Target Name in the master spreadsheet has at least one assigned ICD-11 code.
- Any missing codes are flagged and sent back to the medical expert for completion.
 - The ICD-11 API is used to validate code accuracy and retrieve official descriptions for all codes in the mapping, whether single codes or arrays of multiple codes.
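A local pre-check can catch missing or obviously malformed codes before the ICD-11 API is queried. The sketch below is not `label_ICD_codes.py`: the regular expression is a loose, assumed approximation of the ICD-11 code shape (a four-character stem such as `EA80`, optionally followed by a one- or two-character extension after a dot). It only checks syntax, so confirming that a code exists and retrieving its official description still requires the API.

```python
import re

# Loose, assumed shape of an ICD-11 code: chapter character, a letter, a digit,
# one more character, then an optional ".x"/".xx" extension (e.g. "EA80").
ICD11_CODE = re.compile(r"[0-9A-Z][A-Z][0-9][0-9A-Z](?:\.[0-9A-Z]{1,2})?")

def validate_mapping(mapping):
    """mapping: {target_name: [codes]}. Return {target_name: problem} for every issue."""
    problems = {}
    for target, codes in mapping.items():
        if target == "-":          # exclusions carry no code
            continue
        if not codes:
            problems[target] = "missing ICD-11 code"
            continue
        bad = [c for c in codes if not ICD11_CODE.fullmatch(c)]
        if bad:
            problems[target] = "malformed code(s): " + ", ".join(bad)
    return problems
```

A legacy ICD-10 code such as `C44.9` fails this check because its second character is a digit, which is a cheap way to spot labels that were never translated.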
 
Dataset Generation and Finalization
Once all mappings are complete and validated:
- The AI Team runs `generate_DX_dataset.py` to:
  - Download the latest version of the master mapping spreadsheet.
  - Apply all mappings to convert source-specific labels to standardized "Visible ICD-11" categories.
  - Generate the unified LegitHealth-DX dataset with standardized labels.
- Images are organized into folders by "Visible ICD-11" category name.
- The complete mapping (including both single-code and multi-code categories) is version-controlled and documented in the dataset metadata files (`DXvXX_selected_unsampled.csv`).
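The mapping step of `generate_DX_dataset.py` can be illustrated with the following hypothetical sketch. It assumes mappings are looked up by `(source, source_label)` pairs, mirroring the per-source tabs, and that each image is placed in a folder named after its "Visible ICD-11" category. Unmapped labels are returned separately instead of being silently dropped, and `-` exclusions are skipped.

```python
from pathlib import PurePosixPath

EXCLUDE = "-"

def apply_mappings(dataset_rows, lookup):
    """dataset_rows: dicts with 'source', 'image_path' and 'diagnosis' keys.
    lookup: {(source, source_label): target_name}.
    Returns (kept_rows, unmapped_rows)."""
    kept, unmapped = [], []
    for row in dataset_rows:
        target = lookup.get((row["source"], row["diagnosis"]))
        if target is None:
            unmapped.append(row)                 # no mapping yet: send back for review
        elif target != EXCLUDE:                  # "-" rows are dropped from the dataset
            filename = PurePosixPath(row["image_path"]).name
            kept.append({**row,
                         "category": target,
                         # one folder per "Visible ICD-11" category
                         "dest_path": str(PurePosixPath(target) / filename)})
    return kept, unmapped
```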
Quality Control and Review
To ensure the highest level of clinical accuracy and robustness, the following quality control steps are implemented:
- Primary Review: The completed matrix is reviewed by a second JD-009 Medical Data Scientist who did not take part in its creation, to ensure completeness and internal consistency.
 - Secondary Review: The completed and justified matrix is independently reviewed by a board-certified dermatologist (JD-022 Medical Manager) who was not involved in the initial annotation.
 - Consensus Resolution: Any discrepancies between the primary annotation and the secondary review are resolved by adopting the secondary review's decision as correct.
 - Automated Validation: The ICD-11 API is used to programmatically validate all assigned codes and ensure they correspond to valid ICD-11 categories.
 - Final Approval: The consensus-driven matrix is formally approved and version-controlled. This finalized matrix serves as the definitive ground truth diagnostic classification for all images in the LegitHealth-DX dataset.
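The consensus rule, under which the secondary review prevails, is simple to apply but should leave a record of every override. This sketch, with hypothetical names and data layout, returns both the final mapping and the list of discrepancies, which could then be documented alongside the approved matrix.

```python
def resolve_reviews(primary, secondary):
    """Both arguments: {source_label: (target_name, tuple_of_codes)}.

    The secondary review prevails; each override is reported as
    (label, primary_decision, secondary_decision).
    """
    final = dict(primary)
    discrepancies = []
    for label, decision in secondary.items():
        if label in primary and primary[label] != decision:
            discrepancies.append((label, primary[label], decision))
        final[label] = decision
    return final, sorted(discrepancies)
```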
 
Version Control and Traceability
Each iteration of the LegitHealth-DX dataset is assigned a version number (e.g., DXv27.1). For each version:
- The complete ICD-11 mapping spreadsheet is downloaded and archived with the corresponding version identifier.
 - All processing scripts, renaming files, and dataset metadata are version-controlled in the project repository.
 - The mapping between source labels and standardized ICD-11 categories is fully traceable through the master spreadsheet and source-specific renaming files.
 - Changes to category names, mergers, or exclusions are documented in the "Pathologies to exclude and merge" tab of the category management spreadsheet.
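One way to make archived mapping snapshots traceable is to embed both the dataset version and a content hash in the file name. The naming scheme below is a hypothetical convention, not the project's actual one; it only assumes that version identifiers follow the `DXv27.1` pattern given above.

```python
import hashlib
import re

VERSION_PATTERN = re.compile(r"DXv\d+\.\d+")  # e.g. "DXv27.1"

def archive_name(version, mapping_csv):
    """Build a hypothetical archive file name for a downloaded mapping snapshot.

    mapping_csv: the raw bytes of the exported spreadsheet.
    """
    if not VERSION_PATTERN.fullmatch(version):
        raise ValueError(f"Unexpected version identifier: {version!r}")
    digest = hashlib.sha256(mapping_csv).hexdigest()[:12]  # short content fingerprint
    return f"{version}_icd11_mapping_{digest}.csv"
```

Including the hash means two snapshots archived under the same version but with different content remain distinguishable.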
 
Dataset Processing Workflow
The complete workflow for generating the LegitHealth-DX dataset with standardized ICD-11 labels follows these stages:
Source-Specific Processing
For each data source:
- Create the processing script (`add_XXX_images.py`) in the `sources/XXX/` folder.
- Generate the standardized dataset CSV (`XXX_dataset.csv`).
- Extract unique diagnostic labels and create the renaming file (`XXX_renaming.csv`).
- Upload the files to AWS (`s3://skin-pathology-dl/clinical-imaging/diagnose/LegitHealth-DX/data-sources/`).
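The S3 destination for a source's files can be derived from the bucket and prefix given above. The per-source subfolder and the choice to upload only the two CSV files are assumptions. With `boto3`, each key could then be passed to an S3 client's `upload_file(Filename, Bucket, Key)` call, as shown in the trailing comment.

```python
BUCKET = "skin-pathology-dl"
PREFIX = "clinical-imaging/diagnose/LegitHealth-DX/data-sources"

def s3_keys(source):
    """Map each per-source CSV to an assumed key under data-sources/<source>/."""
    files = [f"{source}_dataset.csv", f"{source}_renaming.csv"]
    return {name: f"{PREFIX}/{source}/{name}" for name in files}

def s3_uri(key):
    """Full s3:// URI for a key in the dataset bucket."""
    return f"s3://{BUCKET}/{key}"

# With boto3 installed and credentials configured, the upload could look like:
#   import boto3
#   s3 = boto3.client("s3")
#   for local_name, key in s3_keys("WoundsDB").items():
#       s3.upload_file(local_name, BUCKET, key)
```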
ICD-11 Mapping
- Copy renaming CSV to master Google Sheets ("LegitHealth-DX ICD category management").
 - Medical expert fills in Target Name and ICD-11 Code for each source label using the ICD-11 browser.
 - Validate all codes using `label_ICD_codes.py`.
Dataset Generation
- Run `generate_DX_dataset.py` to download the latest mappings and create the unified dataset.
- Run `get_DX_dataset_from_AWS.py` to download all images with valid category assignments.
- Run `check_prepare_DX_dataset.py` to identify and remove invalid images.
Metadata Extraction
- Run `extract_skin_tone.py` to estimate the Fitzpatrick skin type distribution.
- Run `extract_image_domain.py` to classify images as clinical, dermoscopy, or non-dermatological.
- Run the body parts classification model to extract anatomical location metadata.
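The Fitzpatrick summary file essentially reduces per-image estimates to counts and percentages per skin type. The sketch below assumes `extract_skin_tone.py` outputs an integer from 1 to 6 per image (an assumption about its output format) and reports types I to VI even when a type has no images, since empty types matter when assessing skin-tone coverage.

```python
from collections import Counter

FST_LABELS = ["I", "II", "III", "IV", "V", "VI"]

def fitzpatrick_summary(skin_types):
    """skin_types: one integer 1-6 per image. Returns {label: (count, percent)}."""
    counts = Counter(skin_types)
    invalid = set(counts) - set(range(1, 7))
    if invalid:
        raise ValueError(f"Unexpected Fitzpatrick values: {sorted(invalid)}")
    total = sum(counts.values())
    summary = {}
    for value, label in enumerate(FST_LABELS, start=1):
        n = counts.get(value, 0)
        summary[label] = (n, round(100 * n / total, 1) if total else 0.0)
    return summary
```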
 
Crop Annotation (Region of Interest)
- Use the internal image annotation tool to annotate lesion regions for new images.
- Run `add_crops_to_DX.py` to combine in-house annotations with programmatically extracted crops from sources with segmentation masks.
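For sources that ship segmentation masks, a crop can be derived from the mask's bounding box. This is a minimal, dependency-free sketch assuming a binary mask given as a 2-D list; it is not the logic of `add_crops_to_DX.py`, and the optional `margin` parameter is hypothetical and clipped to the image borders.

```python
def bbox_from_mask(mask, margin=0):
    """mask: 2-D list of 0/1 values (rows of pixels).

    Returns (x_min, y_min, x_max, y_max) with inclusive bounds, or None if empty.
    """
    rows = [y for y, row in enumerate(mask) if any(row)]
    cols = [x for x in range(len(mask[0])) if any(row[x] for row in mask)] if mask else []
    if not rows or not cols:
        return None
    height, width = len(mask), len(mask[0])
    # Expand by the margin, but never beyond the image borders
    return (max(cols[0] - margin, 0), max(rows[0] - margin, 0),
            min(cols[-1] + margin, width - 1), min(rows[-1] + margin, height - 1))
```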
Category Management and Filtering
- Review `DXvXX_classes_stage1.csv` (raw categories after initial mapping).
- Apply exclusions and mergers from the "Pathologies to exclude and merge" spreadsheet tab.
- Generate `DXvXX_classes_stage2.csv` (cleaned categories).
- Run `create_train_val_splits.py` to:
  - Apply category filters (minimum image count threshold).
  - Generate `DXvXX_classes_stage3.csv` (final category list for training).
  - Create train/validation/test splits.
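The filtering and splitting done by `create_train_val_splits.py` can be approximated as follows. The threshold, split fractions and seed are hypothetical parameters, since their values are not stated here. Splitting within each category keeps every retained class represented in each split, and the fixed seed makes the split reproducible.

```python
import random
from collections import defaultdict

def filter_and_split(samples, min_images, fractions=(0.8, 0.1, 0.1), seed=0):
    """samples: list of (image_id, category).

    Drops categories with fewer than min_images images, then splits each
    remaining category into train/val/test using a seeded shuffle.
    """
    by_cat = defaultdict(list)
    for image_id, cat in samples:
        by_cat[cat].append(image_id)
    kept = sorted(c for c, ids in by_cat.items() if len(ids) >= min_images)
    rng = random.Random(seed)
    splits = {"train": [], "val": [], "test": []}
    for cat in kept:
        ids = sorted(by_cat[cat])          # sort first so the shuffle is reproducible
        rng.shuffle(ids)
        n_train = int(len(ids) * fractions[0])
        n_val = int(len(ids) * fractions[1])
        splits["train"] += [(i, cat) for i in ids[:n_train]]
        splits["val"] += [(i, cat) for i in ids[n_train:n_train + n_val]]
        splits["test"] += [(i, cat) for i in ids[n_train + n_val:]]
    return kept, splits
```

This sketch splits by image only; if one patient contributes several images, a patient-grouped split would be needed to avoid leakage between training and test data.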
 
 
Soft Label Generation
- Run `generate_soft_labels.py` to create soft labels for images with differential diagnoses.
- Generate the final training files:
  - `DXvXX_selected_unsampled.csv`: Hard labels for standard training.
  - `DXvXX_selected_unsampled_soft.csv`: Soft labels for differential diagnosis training.
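The weighting used by `generate_soft_labels.py` for differential diagnoses is not specified here. Purely as an illustration, the sketch below assumes a ranked differential list in which the first diagnosis receives a fixed weight (`primary_weight`, a hypothetical parameter) and the remaining diagnoses share the rest equally, so each soft label sums to 1. Diagnoses outside the training categories are ignored.

```python
def soft_label(differential, categories, primary_weight=0.6):
    """Turn a ranked differential diagnosis into a probability vector over categories."""
    # Keep first occurrence of each diagnosis that is a training category
    dx = [d for d in dict.fromkeys(differential) if d in categories]
    if not dx:
        raise ValueError("No differential diagnosis maps to a training category")
    vec = {c: 0.0 for c in categories}
    if len(dx) == 1:
        vec[dx[0]] = 1.0                   # single usable diagnosis: hard label
        return vec
    vec[dx[0]] = primary_weight
    share = (1.0 - primary_weight) / (len(dx) - 1)
    for d in dx[1:]:
        vec[d] = share
    return vec
```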
 
Output Files
The complete workflow generates the following key files:
- `DXvXX_selected_unsampled.csv`: Training dataset with hard labels.
- `DXvXX_selected_unsampled_soft.csv`: Training dataset with soft labels (for differential diagnoses).
- `DXvXX_fitzpatrick_distribution.csv` and `DXvXX_fitzpatrick_summary.csv`: Fitzpatrick skin type distribution.
- `DXvXX_imagetype_distribution.csv` and `DXvXX_imagetype_summary.csv`: Image type (clinical/dermoscopy) distribution.
- `DXvXX_classes_stage1.csv`: Raw category list after initial mapping.
- `DXvXX_classes_stage2.csv`: Cleaned category list after exclusions/mergers.
- `DXvXX_classes_stage3.csv`: Final category list meeting the minimum threshold for training.
Signature meaning
The signatures for the approval process of this document can be found in the verified commits at the repository for the QMS. As a reference, the team members who are expected to participate in this document and their roles in the approval process, as defined in Annex I Responsibility Matrix of the GP-001, are:
- Author: Team members involved
 - Reviewer: JD-003, JD-004
 - Approver: JD-001