R-TF-028-005 AI Development Report

Table of contents
  • Introduction
    • Context
    • Algorithms Description
    • AI Standalone Evaluation Objectives
  • Data Management
    • Overview
    • Data Collection
    • Foundational Annotation: ICD-11 Mapping
  • Model Development and Validation
    • ICD Category Distribution and Binary Indicators
      • Model Overview
      • Data Requirements and Annotation
      • Training Methodology
      • Performance Results
      • Verification and Validation Protocol
      • Bias Analysis and Fairness Evaluation
    • Erythema Intensity Quantification
      • Model Overview
      • Data Requirements and Annotation
      • Training Methodology
      • Performance Results
      • Verification and Validation Protocol
      • Bias Analysis and Fairness Evaluation
    • Desquamation Intensity Quantification
      • Model Overview
      • Data Requirements and Annotation
      • Training Methodology
      • Performance Results
      • Verification and Validation Protocol
      • Bias Analysis and Fairness Evaluation
    • Induration Intensity Quantification
      • Model Overview
      • Data Requirements and Annotation
      • Training Methodology
      • Performance Results
      • Verification and Validation Protocol
      • Bias Analysis and Fairness Evaluation
    • Pustule Intensity Quantification
      • Model Overview
      • Data Requirements and Annotation
      • Training Methodology
      • Performance Results
      • Verification and Validation Protocol
      • Bias Analysis and Fairness Evaluation
    • Crusting Intensity Quantification
      • Model Overview
      • Data Requirements and Annotation
      • Training Methodology
      • Performance Results
      • Verification and Validation Protocol
      • Bias Analysis and Fairness Evaluation
    • Xerosis Intensity Quantification
      • Model Overview
      • Data Requirements and Annotation
      • Training Methodology
      • Performance Results
      • Verification and Validation Protocol
      • Bias Analysis and Fairness Evaluation
    • Swelling Intensity Quantification
      • Model Overview
      • Data Requirements and Annotation
      • Training Methodology
      • Performance Results
      • Verification and Validation Protocol
      • Bias Analysis and Fairness Evaluation
    • Oozing Intensity Quantification
      • Model Overview
      • Data Requirements and Annotation
      • Training Methodology
      • Performance Results
      • Verification and Validation Protocol
      • Bias Analysis and Fairness Evaluation
    • Excoriation Intensity Quantification
      • Model Overview
      • Data Requirements and Annotation
      • Training Methodology
      • Performance Results
      • Verification and Validation Protocol
      • Bias Analysis and Fairness Evaluation
    • Lichenification Intensity Quantification
      • Model Overview
      • Data Requirements and Annotation
      • Training Methodology
      • Performance Results
      • Verification and Validation Protocol
      • Bias Analysis and Fairness Evaluation
    • Wound Characteristic Assessment
      • Model Overview
      • Data Requirements and Annotation
      • Training Methodology
      • Performance Results
      • Verification and Validation Protocol
      • Bias Analysis and Fairness Evaluation
    • Inflammatory Nodular Lesion Quantification
      • Model Overview
      • Data Requirements and Annotation
      • Training Methodology
      • Performance Results
      • Verification and Validation Protocol
      • Bias Analysis and Fairness Evaluation
    • Acneiform Lesion Type Quantification
      • Model Overview
      • Data Requirements and Annotation
      • Training Methodology
      • Performance Results
      • Verification and Validation Protocol
      • Bias Analysis and Fairness Evaluation
    • Inflammatory Lesion Quantification
      • Model Overview
      • Data Requirements and Annotation
      • Training Methodology
      • Performance Results
      • Verification and Validation Protocol
      • Bias Analysis and Fairness Evaluation
    • Hive Lesion Quantification
      • Model Overview
      • Data Requirements and Annotation
      • Training Methodology
      • Performance Results
      • Verification and Validation Protocol
      • Bias Analysis and Fairness Evaluation
    • Body Surface Segmentation
      • Model Overview
      • Data Requirements and Annotation
      • Training Methodology
      • Performance Results
      • Verification and Validation Protocol
      • Bias Analysis and Fairness Evaluation
    • Wound Surface Quantification
      • Model Overview
      • Data Requirements and Annotation
      • Training Methodology
      • Performance Results
      • Verification and Validation Protocol
      • Bias Analysis and Fairness Evaluation
    • Hair Loss Surface Quantification
      • Model Overview
      • Data Requirements and Annotation
      • Training Methodology
      • Performance Results
      • Verification and Validation Protocol
      • Bias Analysis and Fairness Evaluation
    • Nail Lesion Surface Quantification
      • Model Overview
      • Data Requirements and Annotation
      • Training Methodology
      • Performance Results
      • Verification and Validation Protocol
      • Bias Analysis and Fairness Evaluation
    • Hypopigmentation/Depigmentation Surface Quantification
      • Model Overview
      • Data Requirements and Annotation
      • Training Methodology
      • Performance Results
      • Verification and Validation Protocol
      • Bias Analysis and Fairness Evaluation
    • Skin Surface Segmentation
      • Model Overview
      • Data Requirements and Annotation
      • Training Methodology
      • Performance Results
      • Verification and Validation Protocol
      • Bias Analysis and Fairness Evaluation
    • Surface Area Quantification
      • Model Overview
      • Data Requirements and Annotation
      • Training Methodology
      • Performance Results
      • Verification and Validation Protocol
      • Bias Analysis and Fairness Evaluation
    • Acneiform Inflammatory Pattern Identification
      • Model Overview
      • Data Requirements and Annotation
      • Training Methodology
      • Performance Results
      • Verification and Validation Protocol
      • Bias Analysis and Fairness Evaluation
    • Follicular and Inflammatory Pattern Identification
      • Model Overview
      • Data Requirements and Annotation
      • Training Methodology
      • Performance Results
      • Verification and Validation Protocol
      • Bias Analysis and Fairness Evaluation
    • Inflammatory Pattern Identification
      • Model Overview
      • Data Requirements and Annotation
      • Training Methodology
      • Performance Results
      • Verification and Validation Protocol
      • Bias Analysis and Fairness Evaluation
    • Inflammatory Pattern Indicator
      • Model Overview
      • Data Requirements and Annotation
      • Training Methodology
      • Performance Results
      • Verification and Validation Protocol
      • Bias Analysis and Fairness Evaluation
    • Dermatology Image Quality Assessment (DIQA)
      • Model Overview
      • Data Requirements and Annotation
      • Training Methodology
      • Performance Results
      • Verification and Validation Protocol
      • Bias Analysis and Fairness Evaluation
    • Fitzpatrick Skin Type Identification
      • Model Overview
      • Data Requirements and Annotation
      • Training Methodology
      • Performance Results
      • Verification and Validation Protocol
      • Bias Analysis and Fairness Evaluation
    • Domain Validation
      • Model Overview
      • Data Requirements and Annotation
      • Training Methodology
      • Performance Results
      • Verification and Validation Protocol
      • Bias Analysis and Fairness Evaluation
    • Body Site Identification
      • Model Overview
      • Data Requirements and Annotation
      • Training Methodology
      • Performance Results
      • Verification and Validation Protocol
      • Bias Analysis and Fairness Evaluation
  • Summary and Conclusion
  • State of the Art Compliance and Development Lifecycle
    • Software Development Lifecycle Compliance
    • State of the Art in AI Development
    • Risk Management Throughout Lifecycle
    • Information Security
    • Verification and Validation Strategy
  • AI Risks Assessment Report
    • AI Risk Assessment
    • AI Risk Treatment
    • Residual AI Risk Assessment
    • AI Risk and Traceability with Safety Risk
    • Conclusion
  • Related Documents
    • Project Design and Plan
    • Data Collection and Annotation

Introduction​

Context​

This report documents the development, verification, and validation of the AI algorithm package for the Legit.Health Plus medical device. The development process was conducted in accordance with the procedures outlined in GP-028 AI Development and followed the methodologies specified in the R-TF-028-002 AI Development Plan.

The algorithms are designed as offline (static) models. They were trained on a fixed dataset prior to release and do not adapt or learn from new data after deployment. This ensures predictable and consistent performance in the clinical environment.

Algorithms Description​

The Legit.Health Plus device incorporates 31 AI models that work together to fulfill the device's intended purpose. A comprehensive description of all models, their clinical objectives, and performance specifications is provided in R-TF-028-001 AI/ML Description.

The AI algorithm package includes:

Clinical Models (directly fulfilling the intended purpose):

  • ICD Category Distribution and Binary Indicators (1 model): Provides an interpretative distribution of ICD-11 categories and binary risk indicators (malignancy, pre-malignant, associated with malignancy, pigmented lesion, urgent referral, high-priority referral).
  • Visual Sign Intensity Quantification Models (10 models): Quantify the intensity of clinical signs including erythema, desquamation, induration, pustule, crusting, xerosis, swelling, oozing, excoriation, and lichenification.
  • Wound Characteristic Assessment (1 model): Evaluates wound tissue types and characteristics.
  • Lesion Quantification Models (4 models):
    • Inflammatory Nodular Lesion Quantification
    • Acneiform Lesion Type Quantification (multi-class detection of papules, pustules, comedones, nodules, cysts)
    • Inflammatory Lesion Quantification
    • Hive Lesion Quantification
  • Surface Area Quantification Models (6 models):
    • Body Surface Segmentation
    • Wound Surface Quantification
    • Hair Loss Surface Quantification
    • Nail Lesion Surface Quantification
    • Hypopigmentation/Depigmentation Surface Quantification
    • Surface Area Quantification (generic measurement model)
  • Pattern Identification Models (4 models):
    • Acneiform Inflammatory Pattern Identification
    • Follicular and Inflammatory Pattern Identification
    • Inflammatory Pattern Identification
    • Inflammatory Pattern Indicator

Non-Clinical Models (supporting proper functioning - 5 models):

  • Dermatology Image Quality Assessment (DIQA): Ensures image quality is suitable for analysis.
  • Fitzpatrick Skin Type Identification: Identifies skin phototype to support equity and bias mitigation.
  • Domain Validation: Verifies images are within the validated domain.
  • Skin Surface Segmentation: Identifies skin regions for analysis.
  • Body Site Identification: Determines anatomical location.

Total: 26 Clinical Models + 5 Non-Clinical Models = 31 Models

This report covers the development methodology, data management processes, and validation results for all models. All models share a common data foundation, but each may require model-specific annotation procedures, as detailed in the respective data annotation instructions.

AI Standalone Evaluation Objectives​

The standalone validation aimed to confirm that all AI models meet their predefined performance criteria as outlined in R-TF-028-001 AI/ML Description.

Performance specifications and success criteria vary by model type and are detailed in the individual model sections of this report. All models were evaluated on independent, held-out test sets that were not used during training or model selection.

Data Management​

Overview​

The development of all AI models in the Legit.Health Plus device relies on a comprehensive dataset compiled from multiple sources and annotated through a multi-stage process. This section describes the general data management workflow that applies to all models, including collection, foundational annotation (ICD-11 mapping), and partitioning. Model-specific annotation procedures are detailed in the individual model sections.

Data Collection​

The dataset was compiled from multiple distinct sources as detailed in R-TF-028-003 Data Collection Instructions - Custom Gathered Data and R-TF-028-003 Data Collection Instructions - Archive Data:

  • Archive Data: Images sourced from reputable online sources and private institutions.
  • Custom Gathered Data: Images collected under formal protocols at clinical sites.

This combined approach resulted in a comprehensive dataset covering diverse demographic characteristics (age, sex, Fitzpatrick skin types I-VI), anatomical sites, imaging conditions, and pathological conditions.

Dataset summary:

  • Total images: [NUMBER OF IMAGES] (to be completed)
  • Sources: 17
  • ICD-11 categories: [NUMBER OF CATEGORIES] (to be completed)
  • Demographic diversity: Ages [AGE RANGE], Fitzpatrick types I-VI, global geographic representation

| ID | Dataset Name | Type | Description | ICD-11 Mapping | Crops | Diff. Dx | Sex | Age |
|---|---|---|---|---|---|---|---|---|
| 1 | Torrejon-HCP-diverse-conditions | Multiple | Dataset of skin images by physicians with good photographic skills | ✓ Yes | Varies | ✓ | ✓ | ✓ |
| 2 | Abdominal-skin | Archive | Small dataset of abdominal pictures with segmentation masks for `Non-specific lesion` class | ✗ No | Yes (programmatic) | — | — | — |
| 3 | Basurto-Cruces-Melanoma | Custom gathered | Clinical validation study dataset (`MC EVCDAO 2019`) | ✓ Yes | Yes (in-house crops) | — | ✓ | ✓ |
| 4 | BI-GPP (batch 1) | Archive | Small set of GPP images from Boehringer Ingelheim (first batch) | ✓ Yes | No | — | — | — |
| 5 | BI-GPP (batch 2) | Archive | Large dataset of GPP images from Boehringer Ingelheim (second batch) | ✓ Yes | Yes (programmatic) | — | ✓ | ✓ |
| 6 | Chiesa-dataset | Archive | Sample of head and neck lesions (Medela et al., 2024) | ✓ Yes | Yes (in-house crops) | — | ◐ | ◐ |
| 7 | Figaro 1K | Archive | Hair style classification and segmentation dataset, repurposed for `Non-specific finding` | ✗ No | Yes (in-house crops) | — | — | — |
| 8 | Hand Gesture Recognition (HGR) | Archive | Small dataset of hands repurposed for non-specific images | ✗ No | Yes (programmatic) | — | — | — |
| 9 | IDEI 2024 (pigmented) | Archive | Prospective and retrospective studies at IDEI (DERMATIA project), pigmented lesions only | ✓ Yes | Yes (programmatic) | — | ✓ | ◐ |
| 10 | Manises-HS | Archive | Large collection of hidradenitis suppurativa images | ✗ No | Not yet | — | ✓ | ✓ |
| 11 | Nails segmentation | Archive | Small nail segmentation dataset repurposed for `non-specific lesion` | ✗ No | Yes (programmatic) | — | — | — |
| 12 | Non-specific lesion V2 | Archive | Small representative collection repurposed for `non-specific lesion` | ✗ No | Yes (programmatic) | — | — | — |
| 13 | Osakidetza-derivation | Archive | Clinical validation study dataset (`DAO Derivación O 2022`) | ✓ Yes | Yes (in-house crops) | ◐ | ✓ | ✓ |
| 14 | Ribera ulcers | Archive | Collection of ulcer images from Ribera Salud | ✗ No | Yes (from wound masks, not all) | — | — | — |
| 15 | Transient Biometrics Nails V1 | Archive | Biometric dataset of nail images | ✗ No | Yes (programmatic) | — | — | — |
| 16 | Transient Biometrics Nails V2 | Archive | Biometric dataset of nail images | ✗ No | No (close-ups) | — | — | — |
| 17 | WoundsDB | Archive | Small chronic wounds database | ✓ Yes | No | — | ✓ | ◐ |

Total datasets: 17 | With ICD-11 mapping: 8

Legend: ✓ = Yes | ◐ = Partial/Pending | — = No

Foundational Annotation: ICD-11 Mapping​

Before any model-specific training could begin, all diagnostic labels across all data sources were standardized to the ICD-11 classification system. This foundational annotation step is required for all models and is detailed in R-TF-028-004 Data Annotation Instructions - ICD-11 Mapping.

The ICD-11 mapping process involved:

  1. Label Extraction: Extracting all unique diagnostic labels from each data source
  2. Standardization: Mapping source-specific labels (abbreviations, alternative spellings, legacy coding systems) to standardized ICD-11 categories
  3. Clinical Validation: Expert dermatologist review and validation of all mappings
  4. Visible Category Consolidation: Grouping ICD-11 codes that cannot be reliably distinguished based on visual features alone into unified "Visible ICD-11" categories

This standardization ensures:

  • Consistent diagnostic ground truth across all data sources
  • Clinical validity and regulatory compliance (ICD-11 is the WHO standard)
  • Proper handling of visually similar conditions that require additional clinical information for differentiation
  • A unified diagnostic vocabulary for the ICD Category Distribution model and all other clinical models

Key outputs:

  • Master ICD-11 mapping matrix linking all source labels to standardized categories
  • Documentation of clinical rationale for category consolidation decisions
  • Version-controlled ground truth diagnostic classification for the entire dataset
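
The label standardization step can be pictured with a minimal sketch. The source labels, the category names, and the helpers `normalize` and `map_labels` below are hypothetical and serve only as an illustration; the actual mapping matrix is the one defined in R-TF-028-004 Data Annotation Instructions - ICD-11 Mapping. The sketch assumes that labels without a known mapping are set aside for dermatologist review rather than guessed.

```python
# Hypothetical lookup table: normalized source label -> standardized category.
# The entries are illustrative and are NOT taken from the real mapping matrix.
SOURCE_TO_CATEGORY = {
    "bcc": "Basal cell carcinoma",
    "basal cell ca.": "Basal cell carcinoma",
    "mm": "Melanoma",
}


def normalize(label):
    """Lower-case the label and collapse repeated whitespace."""
    return " ".join(label.lower().split())


def map_labels(labels, table):
    """Split raw labels into mapped ones and ones needing expert review."""
    mapped, unmapped = {}, []
    for raw in labels:
        key = normalize(raw)
        if key in table:
            mapped[raw] = table[key]
        else:
            unmapped.append(raw)  # assumed to go to clinical validation
    return mapped, unmapped


mapped, unmapped = map_labels(
    ["BCC", " Basal  cell ca. ", "unknown rash"], SOURCE_TO_CATEGORY
)
```

Keeping the unmapped labels in a separate list makes the clinical validation step explicit: nothing enters the ground truth without a reviewed mapping.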

(to be completed)

(include the csv files detailed in the R-TF-028-004 Data Annotation Instructions - ICD-11 Mapping)

Model Development and Validation​

This section details the development, training, and validation of all AI models in the Legit.Health Plus device. Each model subsection includes:

  • Model-specific data annotation requirements
  • Training methodology and architecture
  • Performance evaluation results
  • Bias analysis and fairness considerations

ICD Category Distribution and Binary Indicators​

Model Overview​

Reference: R-TF-028-001 AI/ML Description - ICD Category Distribution and Binary Indicators section

The ICD Category Distribution model is a deep learning classifier that outputs a probability distribution across ICD-11 disease categories. The Binary Indicators are derived from this distribution using an expert-curated mapping matrix.

Models included:

  • ICD Category Distribution (outputs top-5 diagnoses with probabilities)
  • Binary Indicators (6 derived indicators):
    • Malignant
    • Pre-malignant
    • Associated with malignancy
    • Pigmented lesion
    • Urgent referral (≤48h)
    • High-priority referral (≤2 weeks)

Data Requirements and Annotation​

Foundational annotation: ICD-11 mapping (completed via R-TF-028-004 Data Annotation Instructions - ICD-11 Mapping)

All images in the training, validation, and test sets were annotated with standardized ICD-11 diagnostic labels following the comprehensive mapping process described in the Data Management section.

Binary Indicator Mapping: A dermatologist-validated mapping matrix was created to link each ICD-11 category to the six binary indicators. This mapping defines which disease categories contribute to each indicator (e.g., melanoma, squamous cell carcinoma, and basal cell carcinoma all contribute to the "Malignant" indicator).
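
A minimal sketch of how such a mapping matrix could turn the category distribution into indicator scores is given below. It assumes a binary matrix and assumes that each indicator score is the sum of the probabilities of its contributing categories; the report does not state the aggregation rule, so this is an assumption. The category list, the matrix entries, and the function `binary_indicators` are hypothetical.

```python
# Hypothetical categories and indicators, for illustration only.
CATEGORIES = ["melanoma", "basal cell carcinoma", "psoriasis", "melanocytic nevus"]
INDICATORS = ["malignant", "pigmented_lesion"]

# MAPPING[i][j] = 1 if category j contributes to indicator i (assumed binary).
MAPPING = [
    [1, 1, 0, 0],  # malignant
    [1, 0, 0, 1],  # pigmented_lesion
]


def binary_indicators(probs):
    """Sum the probabilities of the categories mapped to each indicator."""
    if len(probs) != len(CATEGORIES):
        raise ValueError("one probability per category is required")
    return {
        name: sum(w * p for w, p in zip(row, probs))
        for name, row in zip(INDICATORS, MAPPING)
    }


# Example distribution over the four hypothetical categories.
scores = binary_indicators([0.5, 0.25, 0.125, 0.125])
```

With this formulation the indicators need no training of their own: any change to the clinical mapping is a change to the matrix, not to the classifier.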

Dataset statistics:

  • Total images with ICD-11 labels: [NUMBER] (to be completed)
  • Number of ICD-11 categories: [NUMBER] (to be completed)
  • Training set: [NUMBER] images
  • Validation set: [NUMBER] images
  • Test set: [NUMBER] images

Training Methodology​

Pre-processing:

  • Input images resized to model-required dimensions
  • Data augmentation during training: random cropping (guided by bounding boxes where available), rotations, color jittering, histogram equalization
  • No augmentation applied to test data

Architecture: [ViT or EfficientNet - to be determined]

  • Vision Transformer (ViT) or EfficientNet architecture
  • Transfer learning from large-scale pre-trained weights

Training:

  • Optimizer: Adam
  • Loss function: Cross-entropy
  • Learning rate policy: One-cycle policy for super-convergence
  • Early stopping based on validation set performance
  • Training duration: [NUMBER] epochs

Post-processing:

  • Temperature scaling for probability calibration
  • Test-time augmentation (TTA) for robust predictions
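
The two post-processing steps can be sketched as follows. Temperature scaling divides the logits by a scalar T before the softmax, with T typically fitted on the validation set. How the TTA predictions are combined is not stated in this report, so simple averaging is assumed here. The function names and example values are hypothetical.

```python
import math


def softmax_with_temperature(logits, temperature=1.0):
    """Softmax of logits / T; T > 1 softens and T < 1 sharpens the output."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def average_tta(prob_lists):
    """Average the probability vectors of several augmented views (assumed rule)."""
    n = len(prob_lists)
    return [sum(col) / n for col in zip(*prob_lists)]


logits = [2.0, 1.0, 0.0]
raw = softmax_with_temperature(logits, 1.0)
calibrated = softmax_with_temperature(logits, 2.0)  # T = 2 is illustrative
```

Temperature scaling never changes the ranking of the classes, so top-k accuracy is unaffected; only the confidence values change.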

Performance Results​

ICD Category Distribution Performance:

| Metric | Result | Success Criterion | Outcome |
|---|---|---|---|
| Top-1 Accuracy | [TO FILL] | ≥ 55% | [PENDING] |
| Top-3 Accuracy | [TO FILL] | ≥ 70% | [PENDING] |
| Top-5 Accuracy | [TO FILL] | ≥ 80% | [PENDING] |

Binary Indicator Performance:

| Indicator | Result (AUC) | Success Criterion | Outcome |
|---|---|---|---|
| Malignant | [TO FILL] | ≥ 0.80 | [PENDING] |
| Pre-malignant | [TO FILL] | ≥ 0.80 | [PENDING] |
| Associated with malignancy | [TO FILL] | ≥ 0.80 | [PENDING] |
| Pigmented lesion | [TO FILL] | ≥ 0.80 | [PENDING] |
| Urgent referral | [TO FILL] | ≥ 0.80 | [PENDING] |
| High-priority referral | [TO FILL] | ≥ 0.80 | [PENDING] |

Verification and Validation Protocol​

Test Design:

  • Held-out test set sequestered from training and validation
  • Stratified sampling to ensure representation across ICD-11 categories
  • Independent evaluation on external datasets (DDI, clinical study data)

Complete Test Protocol:

  • Input: RGB images from test set
  • Processing: Model inference with TTA
  • Output: ICD-11 probability distribution and binary indicator scores
  • Ground truth comparison: Expert-labeled ICD-11 categories and binary mappings
  • Statistical analysis: Top-k accuracy, AUC-ROC with 95% confidence intervals

Data Analysis Methods:

  • Top-k accuracy calculation with bootstrapping for confidence intervals
  • ROC curve analysis and AUC calculation for binary indicators
  • Confusion matrix analysis for error pattern identification
  • Statistical significance testing (DeLong test for AUC comparisons)
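
As an illustration of the first analysis method, the sketch below computes top-k accuracy and a percentile bootstrap confidence interval over per-image hits. The percentile variant, the number of resamples, and all names and example values are assumptions; the report does not specify which bootstrap variant is used.

```python
import random


def top_k_hit(probs, true_idx, k):
    """True if the true class is among the k highest-probability classes."""
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return true_idx in ranked[:k]


def top_k_accuracy(all_probs, labels, k):
    """Fraction of images whose true class is within the top k predictions."""
    hits = [top_k_hit(p, y, k) for p, y in zip(all_probs, labels)]
    return sum(hits) / len(hits)


def bootstrap_ci(values, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the mean of per-image values."""
    rng = random.Random(seed)
    n = len(values)
    means = sorted(
        sum(rng.choice(values) for _ in range(n)) / n for _ in range(n_boot)
    )
    lo = means[int(round((alpha / 2) * n_boot))]
    hi = means[int(round((1 - alpha / 2) * n_boot)) - 1]
    return lo, hi


# Toy predictions over three classes for four images.
probs = [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3], [0.1, 0.2, 0.7], [0.4, 0.35, 0.25]]
labels = [0, 2, 2, 1]
acc1 = top_k_accuracy(probs, labels, k=1)
acc2 = top_k_accuracy(probs, labels, k=2)
ci_low, ci_high = bootstrap_ci([top_k_hit(p, y, 1) for p, y in zip(probs, labels)])
```

Resampling at the image level, as done here, assumes images are independent; if several images come from the same patient, resampling by patient would be the more conservative choice.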

Test Conclusions: (to be completed after validation)

Bias Analysis and Fairness Evaluation​

Objective: Evaluate model performance across demographic subpopulations to identify and mitigate potential biases that could affect clinical safety and effectiveness.

Subpopulation Analysis Protocol:

1. Fitzpatrick Skin Type Analysis:

  • Performance metrics (Top-k accuracy, AUC) disaggregated by Fitzpatrick types I-VI
  • Datasets: DDI dataset, internal test set with Fitzpatrick annotations
  • Statistical comparison: Chi-square test for performance differences across groups
  • Success criterion: No Fitzpatrick type shows a statistically significant performance degradation (p < 0.05) that brings it below the overall acceptance thresholds

2. Age Group Analysis:

  • Stratification: Pediatric (under 18 years), Adult (18-65 years), Elderly (over 65 years)
  • Metrics: Top-k accuracy and AUC per age group
  • Data sources: Clinical study datasets with age metadata
  • Success criterion: Performance within ±10% across age groups

3. Anatomical Site Analysis:

  • Site categories: Face, trunk, extremities, intertriginous areas, acral sites
  • Evaluation: Top-k accuracy per anatomical location
  • Success criterion: No anatomical site with performance below acceptance threshold

4. Sex/Gender Analysis:

  • Performance comparison between male and female subgroups
  • Statistical testing for significant differences
  • Success criterion: No gender-based performance disparity >5%

5. Image Quality Impact:

  • Analysis of performance degradation with varying image quality (DIQA scores)
  • Identification of quality thresholds for reliable predictions
  • Mitigation: DIQA-based rejection criteria for low-quality images

6. Rare Condition Representation:

  • Analysis of performance on rare vs. common ICD-11 categories
  • Class-wise sensitivity and specificity reporting
  • Mitigation strategies for underrepresented conditions
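
The disaggregation used throughout the subpopulation protocol above can be sketched as a per-group average of per-image outcomes, followed by a check against an acceptance threshold. The group labels, the hit flags, the threshold value, and the function names below are illustrative assumptions, not results.

```python
from collections import defaultdict


def accuracy_by_group(hits, groups):
    """Mean of per-image hit flags within each subgroup label."""
    totals = defaultdict(lambda: [0, 0])  # group -> [hits, count]
    for hit, group in zip(hits, groups):
        totals[group][0] += int(hit)
        totals[group][1] += 1
    return {g: h / n for g, (h, n) in totals.items()}


def flag_groups(per_group, threshold):
    """Subgroups whose metric falls below the acceptance threshold."""
    return sorted(g for g, v in per_group.items() if v < threshold)


# Toy data: 1 = true class within top-k, 0 = miss.
hits = [1, 1, 0, 1, 0, 0, 1, 1]
groups = ["I-II", "I-II", "I-II", "III-IV", "III-IV", "V-VI", "V-VI", "V-VI"]
per_group = accuracy_by_group(hits, groups)
below = flag_groups(per_group, threshold=0.6)  # threshold is illustrative
```

The same two functions apply unchanged to age group, sex, or anatomical site by swapping the `groups` list.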

Bias Mitigation Strategies:

  • Multi-source data collection ensuring demographic diversity
  • Fitzpatrick type identification for bias monitoring
  • Data augmentation targeting underrepresented subgroups
  • Threshold optimization per subpopulation if necessary
  • Clinical validation with diverse patient populations

Results Summary: (to be completed after bias analysis)

| Subpopulation | Metric | Result | Comparison to Overall | Assessment |
|---|---|---|---|---|
| Fitzpatrick I-II | Top-5 Acc. | [TO FILL] | [TO FILL] | [PASS/FAIL] |
| Fitzpatrick III-IV | Top-5 Acc. | [TO FILL] | [TO FILL] | [PASS/FAIL] |
| Fitzpatrick V-VI | Top-5 Acc. | [TO FILL] | [TO FILL] | [PASS/FAIL] |
| Age: Pediatric | Top-5 Acc. | [TO FILL] | [TO FILL] | [PASS/FAIL] |
| Age: Adult | Top-5 Acc. | [TO FILL] | [TO FILL] | [PASS/FAIL] |
| Age: Elderly | Top-5 Acc. | [TO FILL] | [TO FILL] | [PASS/FAIL] |
| Sex: Male | Top-5 Acc. | [TO FILL] | [TO FILL] | [PASS/FAIL] |
| Sex: Female | Top-5 Acc. | [TO FILL] | [TO FILL] | [PASS/FAIL] |
| Site: Face | Top-5 Acc. | [TO FILL] | [TO FILL] | [PASS/FAIL] |
| Site: Trunk | Top-5 Acc. | [TO FILL] | [TO FILL] | [PASS/FAIL] |
| Site: Extremities | Top-5 Acc. | [TO FILL] | [TO FILL] | [PASS/FAIL] |

Bias Analysis Conclusion: (to be completed)


Erythema Intensity Quantification​

Model Overview​

Reference: R-TF-028-001 AI/ML Description - Erythema Intensity Quantification section

This model quantifies erythema (redness) intensity on an ordinal scale (0-9), outputting a probability distribution that is converted to a continuous severity score via weighted expected value calculation.

Clinical Significance: Erythema is a cardinal sign of inflammation in numerous dermatological conditions including psoriasis, atopic dermatitis, and other inflammatory dermatoses.

Data Requirements and Annotation​

Foundational annotation: ICD-11 mapping (completed)

Model-specific annotation: Erythema intensity scoring (R-TF-028-004 Data Annotation Instructions - Visual Signs)

Medical experts (dermatologists) annotated images with erythema intensity scores following standardized clinical scoring protocols (e.g., Clinician's Erythema Assessment scale). Annotations include:

  • Ordinal intensity scores (0-9): 0 = none, 9 = maximum erythema
  • Multi-annotator consensus for ground truth establishment (minimum 2-3 dermatologists per image)
  • Quality control and senior dermatologist review for ambiguous cases

Dataset statistics:

  • Images with erythema annotations: [NUMBER] (to be completed)
  • Training set: [NUMBER] images
  • Validation set: [NUMBER] images
  • Test set: [NUMBER] images
  • Average inter-annotator agreement (ICC): [VALUE] (to be completed)
  • Conditions represented: Psoriasis, atopic dermatitis, rosacea, contact dermatitis, etc.

Training Methodology​

Architecture: [CNN-based or ViT-based - to be determined]

  • Deep learning model tailored for ordinal regression
  • Transfer learning from pre-trained weights (ImageNet or domain-specific)
  • Input size: [SIZE] pixels

Training approach:

  • Loss function: Ordinal cross-entropy or weighted expected value optimization
  • Optimizer: Adam with learning rate [LR]
  • Data augmentation: Rotations, color jittering (carefully controlled to preserve erythema characteristics), cropping
  • Regularization: Dropout, weight decay
  • Training duration: [NUMBER] epochs with early stopping

Post-processing:

  • Weighted expected value calculation for continuous score
  • Probability calibration if needed
  • Output range: 0-9 continuous scale
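
The weighted expected value is the sum of each grade multiplied by its predicted probability. A minimal sketch, assuming ten grades (0-9) and a probability vector that sums to 1; the function name and the example vector are hypothetical:

```python
def expected_severity(probs):
    """Probability-weighted mean of the ordinal grades 0..len(probs)-1."""
    if abs(sum(probs) - 1.0) > 1e-6:
        raise ValueError("probabilities must sum to 1")
    return sum(grade * p for grade, p in enumerate(probs))


# Ten grades (0-9); probability mass split evenly between grades 3 and 4.
probs = [0, 0, 0, 0.5, 0.5, 0, 0, 0, 0, 0]
score = expected_severity(probs)  # 3 * 0.5 + 4 * 0.5 = 3.5
```

Unlike taking the most probable grade, the expected value yields intermediate scores such as 3.5 when the model is split between adjacent grades, which is what makes the output continuous.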

Performance Results​

Performance evaluated using Relative Mean Absolute Error (RMAE) compared to expert consensus.

Success criterion: RMAE ≤ 20% (performance superior to inter-observer variability)

| Metric | Result | Success Criterion | Outcome |
|---|---|---|---|
| RMAE (Overall) | [TO FILL] | ≤ 20% | [PENDING] |
| Pearson Correlation | [TO FILL] | ≥ 0.85 | [PENDING] |
| Expert Inter-observer ICC | [TO FILL] | Reference | N/A |
| Model vs. Expert ICC | [TO FILL] | ≥ Expert ICC | [PENDING] |

Verification and Validation Protocol​

Test Design:

  • Independent test set with multi-annotator ground truth (minimum 3 dermatologists per image)
  • Comparison against expert consensus (mean of expert scores)
  • Evaluation across diverse conditions (psoriasis, eczema, rosacea), Fitzpatrick skin types, and anatomical sites

Complete Test Protocol:

  • Input: RGB images from test set with expert erythema intensity annotations
  • Processing: Model inference with probability distribution output
  • Output: Continuous erythema severity score (0-9) via weighted expected value
  • Ground truth: Consensus intensity score from multiple expert dermatologists
  • Statistical analysis: RMAE, ICC, Pearson/Spearman correlation, Bland-Altman analysis

Data Analysis Methods:

  • RMAE calculation: Relative Mean Absolute Error comparing model predictions to expert consensus
  • Inter-observer variability measurement (ICC among experts as benchmark)
  • Correlation analysis: Pearson and Spearman correlation coefficients
  • Bland-Altman plots for agreement assessment
  • Bootstrap resampling (1000 iterations) for 95% confidence intervals
  • Subgroup analysis for bias detection
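
A minimal sketch of the RMAE computation against expert consensus follows. This section does not give the exact normalization, so the sketch assumes the mean absolute error is divided by the range of the scale (9 for a 0-9 scale); the consensus is taken as the mean of the expert scores, as stated in the test design. All names and example scores are hypothetical.

```python
def consensus_score(expert_scores):
    """Mean of the expert scores for one image."""
    return sum(expert_scores) / len(expert_scores)


def rmae(predictions, consensus, scale_min=0.0, scale_max=9.0):
    """MAE divided by the scale range (assumed normalization), as a fraction."""
    if len(predictions) != len(consensus) or not predictions:
        raise ValueError("inputs must be non-empty and of equal length")
    mae = sum(abs(p - c) for p, c in zip(predictions, consensus)) / len(predictions)
    return mae / (scale_max - scale_min)


# Three images, each scored by three experts (toy values).
experts = [[3, 4, 5], [0, 0, 0], [9, 9, 6]]
truth = [consensus_score(s) for s in experts]  # [4.0, 0.0, 8.0]
preds = [4.5, 0.0, 7.0]
value = rmae(preds, truth)  # MAE = 0.5, so RMAE = 0.5 / 9
meets_criterion = value <= 0.20
```

Under this assumed definition, the 20% criterion corresponds to a mean absolute error of at most 1.8 grades on the 0-9 scale.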

Test Conclusions: (to be completed after validation)

Bias Analysis and Fairness Evaluation​

Objective: Ensure erythema quantification performs consistently across demographic subpopulations, with special attention to Fitzpatrick skin types where erythema visualization varies.

Subpopulation Analysis Protocol:

1. Fitzpatrick Skin Type Analysis (Critical for erythema):

  • RMAE calculation per Fitzpatrick type (I-II, III-IV, V-VI)
  • Recognition that erythema contrast decreases with increasing melanin content
  • Comparison of model performance vs. expert inter-observer variability per skin type
  • Success criterion: RMAE ≤ 20% maintained across all skin types

2. Disease Condition Analysis:

  • Performance per condition: Psoriasis, atopic dermatitis, rosacea, contact dermatitis, cellulitis
  • Disease-specific annotation challenges and inter-observer variability
  • Success criterion: Model performance better than or equal to expert variability for each condition

3. Anatomical Site Analysis:

  • Site-specific performance: Face, trunk, extremities, intertriginous areas
  • Recognition of site-specific visualization challenges (shadows, curvature)
  • Success criterion: No site with RMAE > 25%

4. Severity Range Analysis:

  • Performance stratified by severity: Mild (0-3), Moderate (4-6), Severe (7-9)
  • Detection of ceiling or floor effects
  • Success criterion: Consistent RMAE across severity levels

5. Image Quality Impact:

  • RMAE correlation with DIQA scores
  • Performance degradation with poor lighting/focus
  • Mitigation: DIQA-based quality filtering

6. Age Group Analysis:

  • Performance in pediatric, adult, elderly populations
  • Age-related skin changes (thinner skin, vascular changes)
  • Success criterion: No age group with significantly degraded performance

Bias Mitigation Strategies:

  • Training data balanced across Fitzpatrick types (minimum 20% representation of types V-VI)
  • Fitzpatrick-specific data augmentation
  • Potential Fitzpatrick-conditional model calibration
  • Collaborative training with other chromatic intensity models (desquamation, induration)

Results Summary: (to be completed after bias analysis)

| Subpopulation | RMAE | Expert ICC | Model vs Expert | Assessment |
| --- | --- | --- | --- | --- |
| Fitzpatrick I-II | [TO FILL] | [TO FILL] | [TO FILL] | [PASS/FAIL] |
| Fitzpatrick III-IV | [TO FILL] | [TO FILL] | [TO FILL] | [PASS/FAIL] |
| Fitzpatrick V-VI | [TO FILL] | [TO FILL] | [TO FILL] | [PASS/FAIL] |
| Psoriasis | [TO FILL] | [TO FILL] | [TO FILL] | [PASS/FAIL] |
| Atopic Dermatitis | [TO FILL] | [TO FILL] | [TO FILL] | [PASS/FAIL] |
| Rosacea | [TO FILL] | [TO FILL] | [TO FILL] | [PASS/FAIL] |
| Mild Severity (0-3) | [TO FILL] | [TO FILL] | [TO FILL] | [PASS/FAIL] |
| Moderate Severity (4-6) | [TO FILL] | [TO FILL] | [TO FILL] | [PASS/FAIL] |
| Severe Severity (7-9) | [TO FILL] | [TO FILL] | [TO FILL] | [PASS/FAIL] |

Bias Analysis Conclusion: (to be completed)


Desquamation Intensity Quantification​

Model Overview​

Reference: R-TF-028-001 AI/ML Description - Desquamation Intensity Quantification section

This model quantifies desquamation (scaling/peeling) intensity on an ordinal scale (0-9), critical for assessment of psoriasis, seborrheic dermatitis, and other scaling conditions.

Clinical Significance: Desquamation is one of the three cardinal signs in PASI scoring for psoriasis and a key indicator in many inflammatory dermatoses.

Data Requirements and Annotation​

Foundational annotation: ICD-11 mapping (completed)

Model-specific annotation: Desquamation intensity scoring (R-TF-028-004 Data Annotation Instructions - Visual Signs)

Dataset statistics: (to be completed)

Training Methodology​

Architecture: (to be determined)

Performance Results​

| Metric | Result | Success Criterion | Outcome |
| --- | --- | --- | --- |
| RMAE (Overall) | [TO FILL] | ≤ 20% | [PENDING] |
| Pearson Correlation | [TO FILL] | ≥ 0.85 | [PENDING] |

Verification and Validation Protocol​

(Follows the same protocol as the Erythema Intensity Quantification model.)

Bias Analysis and Fairness Evaluation​

Subpopulation Analysis: Fitzpatrick types, disease conditions (psoriasis, eczema, seborrheic dermatitis), anatomical sites, severity ranges.

Results Summary: (to be completed)


Induration Intensity Quantification​

Model Overview​

Reference: R-TF-028-001 AI/ML Description - Induration Intensity Quantification section

This model quantifies induration (plaque thickness/elevation) on an ordinal scale (0-9), essential for psoriasis PASI scoring and assessment of infiltrative conditions.

Clinical Significance: Induration reflects tissue infiltration and is a key component of psoriasis severity assessment.

Data Requirements and Annotation​

Foundational annotation: ICD-11 mapping (completed)

Model-specific annotation: Induration intensity scoring (R-TF-028-004 Data Annotation Instructions - Visual Signs)

Dataset statistics: (to be completed)

Training Methodology​

Performance Results​

| Metric | Result | Success Criterion | Outcome |
| --- | --- | --- | --- |
| RMAE (Overall) | [TO FILL] | ≤ 20% | [PENDING] |

Verification and Validation Protocol​

Bias Analysis and Fairness Evaluation​


Pustule Intensity Quantification​

Model Overview​

Reference: R-TF-028-001 AI/ML Description - Pustule Intensity Quantification section

This model quantifies pustule intensity/density on an ordinal scale (0-9), critical for pustular psoriasis, acne, and other pustular dermatoses.

Data Requirements and Annotation​

Training Methodology​

Performance Results​

Verification and Validation Protocol​

Bias Analysis and Fairness Evaluation​


Crusting Intensity Quantification​

Model Overview​

Reference: R-TF-028-001 AI/ML Description - Crusting Intensity Quantification section

This model quantifies crusting severity on an ordinal scale (0-9), important for atopic dermatitis scoring (the oozing/crusting item of SCORAD) and wound assessment.

Data Requirements and Annotation​

Training Methodology​

Performance Results​

Verification and Validation Protocol​

Bias Analysis and Fairness Evaluation​


Xerosis Intensity Quantification​

Model Overview​

Reference: R-TF-028-001 AI/ML Description - Xerosis Intensity Quantification section

This model quantifies xerosis (dry skin) severity on an ordinal scale (0-9), fundamental for skin barrier assessment.

Data Requirements and Annotation​

Training Methodology​

Performance Results​

Verification and Validation Protocol​

Bias Analysis and Fairness Evaluation​


Swelling Intensity Quantification​

Model Overview​

Reference: R-TF-028-001 AI/ML Description - Swelling Intensity Quantification section

This model quantifies swelling/edema severity on an ordinal scale (0-9), relevant for acute inflammatory conditions.

Data Requirements and Annotation​

Training Methodology​

Performance Results​

Verification and Validation Protocol​

Bias Analysis and Fairness Evaluation​


Oozing Intensity Quantification​

Model Overview​

Reference: R-TF-028-001 AI/ML Description - Oozing Intensity Quantification section

This model quantifies oozing/exudation severity on an ordinal scale (0-9), important for acute eczema and wound assessment.

Data Requirements and Annotation​

Training Methodology​

Performance Results​

Verification and Validation Protocol​

Bias Analysis and Fairness Evaluation​


Excoriation Intensity Quantification​

Model Overview​

Reference: R-TF-028-001 AI/ML Description - Excoriation Intensity Quantification section

This model quantifies excoriation (scratch marks) severity on an ordinal scale (0-9), relevant for atopic dermatitis and pruritic conditions.

Data Requirements and Annotation​

Training Methodology​

Performance Results​

Verification and Validation Protocol​

Bias Analysis and Fairness Evaluation​


Lichenification Intensity Quantification​

Model Overview​

Reference: R-TF-028-001 AI/ML Description - Lichenification Intensity Quantification section

This model quantifies lichenification (skin thickening with exaggerated skin markings) severity on an ordinal scale (0-9), important for chronic dermatitis assessment.

Data Requirements and Annotation​

Training Methodology​

Performance Results​

Verification and Validation Protocol​

Bias Analysis and Fairness Evaluation​


Wound Characteristic Assessment​

Model Overview​

Reference: R-TF-028-001 AI/ML Description - Wound Characteristic Assessment section

This model assesses wound characteristics including tissue types (granulation, slough, necrotic, epithelial), wound bed appearance, exudate level, and other clinically relevant features for comprehensive wound assessment.

Clinical Significance: Accurate wound characterization is essential for wound care planning, treatment selection, and healing progress monitoring.

Data Requirements and Annotation​

Dataset statistics: (to be completed)

Training Methodology​

Performance Results​

Verification and Validation Protocol​

Bias Analysis and Fairness Evaluation​


Inflammatory Nodular Lesion Quantification​

Model Overview​

Reference: R-TF-028-001 AI/ML Description - Inflammatory Nodular Lesion Quantification section

This model uses object detection to count inflammatory nodular lesions, critical for hidradenitis suppurativa (HS) severity assessment using scores like IHS4, Hurley staging, and HS-PGA.

Clinical Significance: Nodule counting is essential for HS assessment, treatment response monitoring, and clinical trial endpoints.

Data Requirements and Annotation​

Foundational annotation: ICD-11 mapping (completed)

Model-specific annotation: Bounding box annotations for nodular lesions (R-TF-028-004 Data Annotation Instructions - Visual Signs)

Medical experts (dermatologists specializing in HS) drew bounding boxes around each discrete nodular lesion:

  • Tight rectangles containing entire nodule with minimal background
  • Individual boxes for overlapping but clinically distinguishable nodules
  • Complete coverage of all nodules in each image
  • Multi-annotator consensus for challenging cases

Dataset statistics:

  • Images with nodule annotations: [NUMBER] (to be completed)
  • Training set: [NUMBER] images
  • Validation set: [NUMBER] images
  • Test set: [NUMBER] images
  • Average nodules per image: [NUMBER] (to be completed)
  • Conditions represented: Hidradenitis suppurativa, Hurley stages I-III

Training Methodology​

Architecture: [YOLO, Faster R-CNN, or similar - to be determined]

  • Object detection model optimized for small-to-medium lesions
  • Transfer learning from pre-trained detection weights
  • Input size: [SIZE] pixels

Training approach:

  • Loss function: Detection loss (bounding box regression + classification)
  • Optimizer: [Adam/SGD with momentum]
  • Non-maximum suppression (NMS) threshold: [VALUE]
  • Data augmentation: Rotations, scaling, color adjustments, flips
  • Hard negative mining for challenging backgrounds
  • Training duration: [NUMBER] epochs

Post-processing:

  • NMS for overlapping predictions
  • Confidence threshold optimization
  • Count aggregation from detected boxes
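The three post-processing steps above can be sketched as a confidence filter followed by greedy NMS and a count. This is an illustrative sketch only: the architecture and thresholds are still undetermined in this report, so the box format `(x1, y1, x2, y2)`, the function names, and the threshold values `0.5` are placeholders chosen for the example.

```python
def box_iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def filter_and_suppress(detections, conf_threshold, nms_threshold):
    """detections: list of (box, score). Returns the detections kept after
    confidence filtering and greedy non-maximum suppression."""
    candidates = sorted(
        (d for d in detections if d[1] >= conf_threshold),
        key=lambda d: d[1],
        reverse=True,
    )
    kept = []
    for box, score in candidates:
        # Keep a box only if it does not overlap a higher-scoring kept box
        if all(box_iou(box, kept_box) <= nms_threshold for kept_box, _ in kept):
            kept.append((box, score))
    return kept


# Hypothetical raw detections (not model output)
raw = [
    ((10, 10, 50, 50), 0.90),
    ((12, 12, 52, 52), 0.80),      # near-duplicate of the first box
    ((100, 100, 140, 140), 0.70),
    ((200, 200, 220, 220), 0.20),  # below the confidence threshold
]
kept = filter_and_suppress(raw, conf_threshold=0.5, nms_threshold=0.5)
nodule_count = len(kept)  # count aggregation from the surviving boxes
```

Because the count is derived directly from the surviving boxes, both thresholds affect counting accuracy as well as detection metrics, which is one reason to tune them on the validation set rather than the test set.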

Performance Results​

Success criteria:

  • Precision ≥ 0.80 (minimize false positives)
  • Recall ≥ 0.80 (minimize missed nodules)
  • F1-score ≥ 0.80 (balanced performance)
  • mAP@IoU=0.5 ≥ 0.75
  • Count correlation (Pearson r) ≥ 0.90

| Metric | Result | Success Criterion | Outcome |
| --- | --- | --- | --- |
| Precision | [TO FILL] | ≥ 0.80 | [PENDING] |
| Recall | [TO FILL] | ≥ 0.80 | [PENDING] |
| F1-score | [TO FILL] | ≥ 0.80 | [PENDING] |
| mAP@IoU=0.5 | [TO FILL] | ≥ 0.75 | [PENDING] |
| Count Correlation (Pearson) | [TO FILL] | ≥ 0.90 | [PENDING] |

Verification and Validation Protocol​

Test Design:

  • Independent test set with expert bounding box annotations
  • Multi-annotator consensus for lesion counts (minimum 2 HS specialists)
  • Evaluation across HS severity stages (Hurley I, II, III)

Complete Test Protocol:

  • Input: RGB images from test set with expert nodule annotations
  • Processing: Object detection inference with NMS
  • Output: Predicted bounding boxes with confidence scores and nodule counts
  • Ground truth: Expert-annotated boxes and manual nodule counts
  • Statistical analysis: Precision, recall, F1, mAP, count correlation, Bland-Altman

Data Analysis Methods:

  • IoU calculation for box matching (threshold = 0.5)
  • Precision-Recall curves
  • mAP calculation across IoU thresholds (0.5, 0.75)
  • Lesion count correlation: Pearson and Spearman coefficients
  • Count error metrics: MAE, RMSE, mean percentage error
  • Bland-Altman plots for count agreement
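As an illustration of the box-matching step, the sketch below matches predictions to expert boxes one-to-one at IoU ≥ 0.5 and derives precision, recall, and F1 from the resulting counts. The greedy highest-confidence-first matching rule is an assumption (the report does not specify the matching algorithm), and the function names and sample boxes are hypothetical.

```python
def box_iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def match_detections(predictions, truth_boxes, iou_threshold=0.5):
    """Greedy one-to-one matching. predictions: list of (box, score).
    Returns (true_positives, false_positives, false_negatives)."""
    unmatched = list(range(len(truth_boxes)))
    tp = 0
    for box, _ in sorted(predictions, key=lambda d: d[1], reverse=True):
        best_index, best_iou = None, iou_threshold
        for j in unmatched:
            overlap = box_iou(box, truth_boxes[j])
            if overlap >= best_iou:
                best_index, best_iou = j, overlap
        if best_index is not None:
            unmatched.remove(best_index)  # each expert box is matched once
            tp += 1
    return tp, len(predictions) - tp, len(unmatched)


def precision_recall_f1(tp, fp, fn):
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical example (not study data)
truth = [(10, 10, 50, 50), (100, 100, 140, 140), (200, 200, 240, 240)]
preds = [
    ((12, 12, 52, 52), 0.9),
    ((100, 100, 140, 140), 0.8),
    ((300, 300, 340, 340), 0.6),  # no expert box nearby: false positive
]
tp, fp, fn = match_detections(preds, truth)
precision, recall, f1 = precision_recall_f1(tp, fp, fn)
```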

Test Conclusions: (to be completed after validation)

Bias Analysis and Fairness Evaluation​

Objective: Ensure nodule detection performs consistently across demographic subpopulations and disease severity levels.

Subpopulation Analysis Protocol:

1. Fitzpatrick Skin Type Analysis:

  • Precision, recall, F1 disaggregated by skin type
  • Recognition that nodule contrast varies with pigmentation
  • Success criterion: F1-score ≥ 0.75 across all Fitzpatrick types

2. Lesion Density Analysis:

  • Low density: 1-5 nodules
  • Medium density: 6-15 nodules
  • High density: 16+ nodules
  • Success criterion: Consistent F1-score across densities

3. Lesion Size Analysis:

  • Small (< 5 mm), Medium (5-10 mm), Large (> 10 mm)
  • Success criterion: Recall ≥ 0.75 for small nodules

4. Anatomical Site Analysis:

  • Axillary, inguinal, gluteal, inframammary regions
  • Site-specific challenges (folds, shadows, hair)
  • Success criterion: F1 variation ≤ 15% across sites

5. Disease Severity (Hurley Stage):

  • Stage I (mild), Stage II (moderate), Stage III (severe)
  • Success criterion: Consistent performance across stages

6. Image Quality Impact:

  • Performance vs. DIQA scores
  • Mitigation: Quality-based filtering
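The "F1 variation ≤ 15% across sites" criterion can be read in more than one way. The sketch below computes two plausible readings, the absolute gap between the best and worst subgroup and that gap relative to the best subgroup, so the chosen interpretation can be made explicit when results are filled in. Both readings, the function name, and the sample F1 values are assumptions for illustration.

```python
def f1_spread(f1_by_group):
    """Return (absolute_gap, relative_gap) between best and worst subgroup."""
    values = list(f1_by_group.values())
    if not values:
        raise ValueError("no subgroups provided")
    best, worst = max(values), min(values)
    absolute_gap = best - worst
    relative_gap = absolute_gap / best if best > 0 else 0.0
    return absolute_gap, relative_gap


# Hypothetical per-site F1 scores (not study data)
f1_by_site = {
    "axillary": 0.84,
    "inguinal": 0.80,
    "gluteal": 0.78,
    "inframammary": 0.74,
}
abs_gap, rel_gap = f1_spread(f1_by_site)
passes_absolute = abs_gap <= 0.15  # reading 1: percentage points
passes_relative = rel_gap <= 0.15  # reading 2: relative to the best site
```

The two readings can disagree when the best subgroup's F1 is well below 1, so the results table should state which one was applied.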

Bias Mitigation Strategies:

  • Training data balanced across Fitzpatrick types and severities
  • Small lesion augmentation and oversampling
  • Hard negative mining
  • Multi-scale detection

Results Summary: (to be completed)

| Subpopulation | F1-score | Count Correlation | Assessment |
| --- | --- | --- | --- |
| Fitzpatrick I-II | [TO FILL] | [TO FILL] | [PASS/FAIL] |
| Fitzpatrick III-IV | [TO FILL] | [TO FILL] | [PASS/FAIL] |
| Fitzpatrick V-VI | [TO FILL] | [TO FILL] | [PASS/FAIL] |
| Low Density (1-5) | [TO FILL] | [TO FILL] | [PASS/FAIL] |
| Medium Density (6-15) | [TO FILL] | [TO FILL] | [PASS/FAIL] |
| High Density (16+) | [TO FILL] | [TO FILL] | [PASS/FAIL] |

Bias Analysis Conclusion: (to be completed)


Acneiform Lesion Type Quantification​

Model Overview​

Reference: R-TF-028-001 AI/ML Description - Acneiform Lesion Type Quantification section

This is a single multi-class object detection model that detects and counts all five acneiform lesion types simultaneously: papules, pustules, comedones, nodules, and cysts. The model outputs bounding boxes with associated class labels and confidence scores for each detected lesion, enabling comprehensive acne severity assessment.

Clinical Significance: This unified model provides complete acneiform lesion profiling essential for acne grading systems (e.g., Global Acne Grading System, Investigator's Global Assessment) and treatment selection. By detecting all lesion types in a single inference, it ensures consistent assessment across lesion categories.

Data Requirements and Annotation​

Foundational annotation: ICD-11 mapping (completed)

Model-specific annotation: Bounding box annotations for all acneiform lesion types (R-TF-028-004 Data Annotation Instructions - Visual Signs)

Medical experts (dermatologists specializing in acne) drew bounding boxes around each discrete lesion and assigned class labels:

  • Papules: Inflammatory, raised lesions without pus (typically < 5 mm)
  • Pustules: Pus-filled inflammatory lesions
  • Comedones: Open (blackheads) and closed (whiteheads) comedones
  • Nodules: Large, deep inflammatory lesions (≥ 5 mm)
  • Cysts: Large, fluid-filled lesions (most severe form)

Annotation guidelines:

  • Tight rectangles containing entire lesion with minimal background
  • Individual boxes for overlapping but distinguishable lesions
  • Clear class label assignment based on morphological features
  • Multi-annotator consensus for ambiguous cases
  • Complete coverage of all lesions in each image

Dataset statistics:

  • Images with acneiform lesion annotations: [NUMBER] (to be completed)
  • Training set: [NUMBER] images
  • Validation set: [NUMBER] images
  • Test set: [NUMBER] images
  • Total lesions annotated: [NUMBER] across all 5 classes
  • Distribution by lesion type: (to be completed)
  • Acne severity range: Mild to severe
  • Anatomical sites: Face, back, chest

Training Methodology​

Architecture: [Multi-class object detection model (e.g., YOLO, Faster R-CNN) - to be determined]

  • Multi-class object detection optimized for small-to-medium lesions
  • Transfer learning from pre-trained detection weights
  • Input size: [SIZE] pixels

Training approach:

  • Loss function: Multi-class detection loss (bounding box regression + classification + confidence)
  • Optimizer: [Adam/SGD with momentum]
  • Non-maximum suppression (NMS) threshold: [VALUE]
  • Data augmentation: Rotations, scaling, color adjustments, flips, lighting variations
  • Class balancing strategy for imbalanced lesion types
  • Hard negative mining for challenging backgrounds
  • Training duration: [NUMBER] epochs

Post-processing:

  • Class-specific NMS for overlapping predictions
  • Confidence threshold optimization per class
  • Count aggregation from detected boxes by lesion type
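A minimal sketch of the last two post-processing steps, per-class confidence thresholds followed by per-class counting, is shown below. It assumes detections have already passed class-specific NMS. The class identifiers, the threshold values, and the `(class_name, score)` tuple format are hypothetical.

```python
LESION_CLASSES = ("papule", "pustule", "comedone", "nodule", "cyst")


def count_by_class(detections, thresholds):
    """detections: list of (class_name, score) after class-specific NMS.
    Returns a count for every lesion class, including zero counts."""
    counts = {name: 0 for name in LESION_CLASSES}
    for class_name, score in detections:
        if score >= thresholds[class_name]:
            counts[class_name] += 1
    return counts


# Placeholder thresholds; real values would come from validation-set tuning
thresholds = {
    "papule": 0.5,
    "pustule": 0.5,
    "comedone": 0.4,
    "nodule": 0.6,
    "cyst": 0.6,
}

# Hypothetical detections (not model output)
detections = [
    ("papule", 0.90),
    ("papule", 0.45),    # below the papule threshold
    ("comedone", 0.42),
    ("nodule", 0.55),    # below the nodule threshold
    ("pustule", 0.70),
]
counts = count_by_class(detections, thresholds)
```

Returning explicit zeros for absent classes keeps downstream count-error metrics well defined for every lesion type.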

Performance Results​

Success criteria: The model must achieve performance within or better than expert inter-observer variability for each lesion type.

| Lesion Type | MAE Threshold | F1-score | Precision | Recall |
| --- | --- | --- | --- | --- |
| Papules | ≤ Expert Inter-observer Variability | ≥ 0.85 | ≥ 0.80 | ≥ 0.80 |
| Pustules | ≤ Expert Inter-observer Variability | ≥ 0.90 | ≥ 0.85 | ≥ 0.85 |
| Comedones | ≤ Expert Inter-observer Variability | ≥ 0.75 | ≥ 0.70 | ≥ 0.75 |
| Nodules | ≤ Expert Inter-observer Variability | ≥ 0.80 | ≥ 0.75 | ≥ 0.80 |
| Cysts | ≤ Expert Inter-observer Variability | ≥ 0.80 | ≥ 0.75 | ≥ 0.80 |

Performance Summary Table:

| Lesion Type | MAE | F1-score | Precision | Recall | mAP@0.5 | Outcome |
| --- | --- | --- | --- | --- | --- | --- |
| Papules | [TO FILL] | [TO FILL] | [TO FILL] | [TO FILL] | [TO FILL] | [PENDING] |
| Pustules | [TO FILL] | [TO FILL] | [TO FILL] | [TO FILL] | [TO FILL] | [PENDING] |
| Comedones | [TO FILL] | [TO FILL] | [TO FILL] | [TO FILL] | [TO FILL] | [PENDING] |
| Nodules | [TO FILL] | [TO FILL] | [TO FILL] | [TO FILL] | [TO FILL] | [PENDING] |
| Cysts | [TO FILL] | [TO FILL] | [TO FILL] | [TO FILL] | [TO FILL] | [PENDING] |
| Overall | [TO FILL] | [TO FILL] | [TO FILL] | [TO FILL] | [TO FILL] | [PENDING] |

Verification and Validation Protocol​

Test Design:

  • Independent test set with expert bounding box annotations for all 5 lesion types
  • Multi-annotator consensus for lesion counts and class labels (minimum 2 dermatologists)
  • Evaluation across acne severity grades (mild, moderate, severe)
  • Coverage of diverse anatomical sites (face, back, chest)

Complete Test Protocol:

  • Input: RGB images from test set with expert multi-class annotations
  • Processing: Multi-class object detection inference with class-specific NMS
  • Output: Predicted bounding boxes with class labels, confidence scores, and per-class counts
  • Ground truth: Expert-annotated boxes with consensus labels and manual counts
  • Statistical analysis: Per-class precision, recall, F1, mAP, count agreement (MAE, Pearson and Spearman correlation), Bland-Altman

Data Analysis Methods:

  • IoU calculation for box matching (threshold = 0.5) with class agreement requirement
  • Class-specific Precision-Recall curves
  • Mean Average Precision (mAP) calculation across IoU thresholds (0.5, 0.75)
  • Lesion count correlation per class: Pearson and Spearman coefficients
  • Count error metrics per class: MAE, RMSE, mean percentage error
  • Bland-Altman plots for count agreement per lesion type
  • Confusion matrix for class assignment errors
  • Cross-class misclassification analysis
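The count-agreement part of the analysis (MAE, Pearson correlation, and Bland-Altman bias with limits of agreement) can be sketched for one lesion class as below. This uses the conventional bias ± 1.96 SD limits; the function name and the sample counts are hypothetical, and the same function would be applied once per lesion type.

```python
import math
from statistics import mean, stdev


def count_agreement(expert_counts, model_counts):
    """Agreement statistics between expert and model lesion counts."""
    if len(expert_counts) != len(model_counts) or len(expert_counts) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    diffs = [m - e for e, m in zip(expert_counts, model_counts)]
    bias = mean(diffs)                # Bland-Altman mean difference
    spread = 1.96 * stdev(diffs)      # half-width of the limits of agreement
    mean_e, mean_m = mean(expert_counts), mean(model_counts)
    cov = sum((e - mean_e) * (m - mean_m)
              for e, m in zip(expert_counts, model_counts))
    var_e = sum((e - mean_e) ** 2 for e in expert_counts)
    var_m = sum((m - mean_m) ** 2 for m in model_counts)
    denom = math.sqrt(var_e * var_m)
    return {
        "mae": mean([abs(d) for d in diffs]),
        "bias": bias,
        "limits_of_agreement": (bias - spread, bias + spread),
        "pearson_r": cov / denom if denom > 0 else float("nan"),
    }


# Hypothetical per-image papule counts (not study data)
expert = [4, 10, 7, 15, 2]
model = [5, 9, 7, 13, 3]
agreement = count_agreement(expert, model)
```

A high Pearson r alone does not rule out a systematic over- or under-count, which is why the bias and limits of agreement are reported alongside it.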

Test Conclusions: (to be completed after validation)

Bias Analysis and Fairness Evaluation​

Objective: Ensure the multi-class acneiform lesion detection model performs consistently across demographic subpopulations, lesion densities, and severity levels for all five lesion types.

Subpopulation Analysis Protocol:

1. Fitzpatrick Skin Type Analysis:

  • Per-class precision, recall, F1 disaggregated by skin type
  • Recognition that lesion contrast varies with pigmentation (e.g., comedones harder to detect on darker skin)
  • Success criterion: F1-score ≥ 0.70 for each lesion class across all Fitzpatrick types
  • Special attention to comedone detection on Fitzpatrick V-VI

2. Lesion Density Analysis:

  • Performance stratified by total lesion count:
    • Low density: 1-10 total lesions
    • Medium density: 11-30 total lesions
    • High density: 31+ total lesions
  • Success criterion: Consistent per-class F1-score across densities

3. Lesion Size Analysis:

  • Small (< 3 mm), Medium (3-6 mm), Large (> 6 mm)
  • Success criterion: Recall ≥ 0.70 for small lesions (papules, comedones)
  • Ensure no systematic bias against smaller lesion types

4. Anatomical Site Analysis:

  • Face, back, chest performance comparison
  • Site-specific challenges (facial shadows, back hair, chest skin folds)
  • Success criterion: F1 variation ≤ 15% across sites for each class

5. Acne Severity (Lesion Mix) Analysis:

  • Mild acne (predominantly comedones, few inflammatory lesions)
  • Moderate acne (mixed papules, pustules, comedones)
  • Severe acne (nodules, cysts, extensive inflammatory lesions)
  • Success criterion: Consistent performance across severity spectrum

6. Class Confusion Analysis:

  • Systematic misclassifications (e.g., papules vs. nodules, pustules vs. cysts)
  • Clinical impact of misclassifications (minor vs. severe)
  • Success criterion: Clinically significant misclassification rate ≤ 5%

7. Image Quality Impact:

  • Performance vs. DIQA scores
  • Mitigation: Quality-based filtering
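For the class confusion criterion in item 6, a rate can be derived from the confusion counts of matched lesions, as sketched below. Which confusions count as clinically significant is not defined in this section; the pairs used here (papule/nodule and pustule/cyst, taken from the examples in the protocol) are an assumption, as are the denominator (all matched lesions), the function name, and the sample counts.

```python
def significant_misclassification_rate(confusion, significant_pairs):
    """confusion maps (true_class, predicted_class) to a number of matched
    lesions. Returns the share of matched lesions that fall into a
    clinically significant confusion."""
    total = sum(confusion.values())
    if total == 0:
        raise ValueError("confusion matrix is empty")
    significant = sum(
        n for pair, n in confusion.items() if pair in significant_pairs
    )
    return significant / total


# Assumed set of clinically significant confusions (true, predicted)
SIGNIFICANT_PAIRS = {
    ("papule", "nodule"), ("nodule", "papule"),
    ("pustule", "cyst"), ("cyst", "pustule"),
}

# Hypothetical confusion counts (not study data)
confusion = {
    ("papule", "papule"): 80,
    ("papule", "nodule"): 3,
    ("nodule", "nodule"): 30,
    ("nodule", "papule"): 2,
    ("pustule", "pustule"): 40,
    ("pustule", "papule"): 5,   # a confusion, but not in the significant set
}
rate = significant_misclassification_rate(confusion, SIGNIFICANT_PAIRS)
meets_criterion = rate <= 0.05
```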

Bias Mitigation Strategies:

  • Training data balanced across Fitzpatrick types and severity levels
  • Class-specific augmentation and oversampling for underrepresented types
  • Hard negative mining for challenging examples
  • Multi-scale detection architecture
  • Class-balanced loss function

Results Summary: (to be completed)

| Lesion Type / Subpopulation | F1-score | Count Correlation | Assessment |
| --- | --- | --- | --- |
| Papules - Fitz I-II | [TO FILL] | [TO FILL] | [PASS/FAIL] |
| Papules - Fitz V-VI | [TO FILL] | [TO FILL] | [PASS/FAIL] |
| Pustules - Fitz I-II | [TO FILL] | [TO FILL] | [PASS/FAIL] |
| Pustules - Fitz V-VI | [TO FILL] | [TO FILL] | [PASS/FAIL] |
| Comedones - Fitz I-II | [TO FILL] | [TO FILL] | [PASS/FAIL] |
| Comedones - Fitz V-VI | [TO FILL] | [TO FILL] | [PASS/FAIL] |
| Nodules - Severe Acne | [TO FILL] | [TO FILL] | [PASS/FAIL] |
| Cysts - Severe Acne | [TO FILL] | [TO FILL] | [PASS/FAIL] |
| High Density Scenarios | [TO FILL] | [TO FILL] | [PASS/FAIL] |

Bias Analysis Conclusion: (to be completed)


Inflammatory Lesion Quantification​

Model Overview​

Reference: R-TF-028-001 AI/ML Description - Inflammatory Lesion Quantification section

This model detects and counts general inflammatory lesions across various dermatological conditions.

Clinical Significance: Inflammatory lesion counts are used in various dermatology scoring systems.

Data Requirements and Annotation​

Training Methodology​

Performance Results​

Verification and Validation Protocol​

Bias Analysis and Fairness Evaluation​


Hive Lesion Quantification​

Model Overview​

Reference: R-TF-028-001 AI/ML Description - Hive Lesion Quantification section

This model detects and counts hives (wheals) for urticaria severity assessment using UAS7 (the Urticaria Activity Score summed over 7 days).

Clinical Significance: Daily hive counts are essential for urticaria diagnosis and treatment monitoring.

Data Requirements and Annotation​

Training Methodology​

Performance Results​

Verification and Validation Protocol​

Bias Analysis and Fairness Evaluation​


Body Surface Segmentation​

Model Overview​

Reference: R-TF-028-001 AI/ML Description - Body Surface Segmentation section

This model segments the affected body surface area in conditions such as psoriasis, atopic dermatitis, and vitiligo, providing the area input for scores such as BSA%, PASI, EASI, and VASI.

Clinical Significance: BSA percentage is critical for disease severity classification and treatment decisions (e.g., systemic therapy eligibility).

Data Requirements and Annotation​

Foundational annotation: ICD-11 mapping (completed)

Model-specific annotation: Polygon annotations for affected areas (R-TF-028-004 Data Annotation Instructions - Visual Signs)

Medical experts traced precise boundaries of affected skin:

  • Polygon tool for accurate edge delineation
  • Separate polygons for non-contiguous patches
  • High spatial precision for reliable area calculation
  • Multi-annotator consensus for boundary agreement

Dataset statistics:

  • Images with BSA annotations: [NUMBER] (to be completed)
  • Training set: [NUMBER] images
  • Validation set: [NUMBER] images
  • Test set: [NUMBER] images
  • Conditions: Psoriasis, atopic dermatitis, vitiligo, others

Training Methodology​

Architecture: [U-Net, DeepLabV3+, or similar - to be determined]

  • Semantic segmentation model with encoder-decoder structure
  • Transfer learning from pre-trained weights
  • Input size: [SIZE] pixels

Training approach:

  • Loss function: Dice loss + Binary Cross-Entropy
  • Optimizer: [Adam/SGD]
  • Data augmentation: Rotations, scaling, color jitter, flips
  • Multi-scale training for varied lesion sizes
  • Training duration: [NUMBER] epochs
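The combined loss named above can be illustrated on flattened per-pixel probabilities. This is a plain-Python sketch for clarity rather than training code: the equal weighting of the two terms, the smoothing constant, and the function name are assumptions, since the report only states "Dice loss + Binary Cross-Entropy".

```python
import math


def dice_bce_loss(probs, targets, smooth=1.0, eps=1e-7):
    """probs: predicted foreground probabilities in [0, 1];
    targets: ground-truth labels (0 or 1), same length as probs."""
    if len(probs) != len(targets) or not probs:
        raise ValueError("need two non-empty sequences of equal length")
    intersection = sum(p * t for p, t in zip(probs, targets))
    dice = (2.0 * intersection + smooth) / (sum(probs) + sum(targets) + smooth)
    bce = -sum(
        t * math.log(max(p, eps)) + (1 - t) * math.log(max(1 - p, eps))
        for p, t in zip(probs, targets)
    ) / len(probs)
    return (1.0 - dice) + bce  # assumed 1:1 weighting of the two terms


# Hypothetical four-pixel example
targets = [1, 0, 1, 0]
perfect = dice_bce_loss([1.0, 0.0, 1.0, 0.0], targets)
uncertain = dice_bce_loss([0.6, 0.4, 0.6, 0.4], targets)
```

The Dice term responds to region overlap and is less sensitive to foreground/background imbalance, while the cross-entropy term gives a per-pixel gradient, which is a common motivation for combining them.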

Post-processing:

  • Morphological operations for boundary refinement
  • Small region filtering
  • Surface area calculation with calibration
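The calibrated area step can be sketched as a pixel count scaled by the physical pixel size. The sketch assumes a single uniform scale in millimetres per pixel obtained from a calibration marker and a roughly flat, camera-facing surface; curvature and perspective are ignored. The function names, the use of a separate skin mask as the percentage denominator, and the sample masks are hypothetical.

```python
def mask_area_cm2(mask, mm_per_pixel):
    """mask: 2D list of 0/1 values. Returns the foreground area in cm²."""
    pixels = sum(1 for row in mask for value in row if value)
    return pixels * (mm_per_pixel ** 2) / 100.0  # 100 mm² per cm²


def affected_percentage(lesion_mask, skin_mask):
    """Share of visible skin pixels that are also lesion pixels, in percent."""
    skin = sum(1 for row in skin_mask for value in row if value)
    lesion = sum(
        1
        for lesion_row, skin_row in zip(lesion_mask, skin_mask)
        for l, s in zip(lesion_row, skin_row)
        if l and s
    )
    return 100.0 * lesion / skin if skin else 0.0


# Hypothetical 4x4 masks (not study data)
lesion_mask = [
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
skin_mask = [
    [1, 1, 1, 1],
    [1, 1, 1, 1],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
]
area = mask_area_cm2(lesion_mask, mm_per_pixel=0.5)
percentage = affected_percentage(lesion_mask, skin_mask)
```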

Performance Results​

Success criteria:

  • IoU ≥ 0.75 (segmentation accuracy)
  • Dice coefficient ≥ 0.85 (overlap similarity)
  • Surface area error ≤ 15% (measurement accuracy)
  • Pixel accuracy ≥ 0.90

| Metric | Result | Success Criterion | Outcome |
| --- | --- | --- | --- |
| IoU | [TO FILL] | ≥ 0.75 | [PENDING] |
| Dice | [TO FILL] | ≥ 0.85 | [PENDING] |
| Area Error (%) | [TO FILL] | ≤ 15% | [PENDING] |
| Pixel Accuracy | [TO FILL] | ≥ 0.90 | [PENDING] |

Verification and Validation Protocol​

Test Design:

  • Independent test set with expert polygon annotations
  • Multi-annotator consensus for segmentation masks (minimum 2 dermatologists)
  • Evaluation across lesion sizes and morphologies

Complete Test Protocol:

  • Input: RGB images with calibration markers
  • Processing: Semantic segmentation inference
  • Output: Predicted masks and calculated BSA%
  • Ground truth: Expert-annotated masks and reference measurements
  • Statistical analysis: IoU, Dice, area correlation, Bland-Altman

Data Analysis Methods:

  • IoU: Intersection/union of predicted and ground truth
  • Dice: 2×intersection/(area_pred + area_gt)
  • Pixel-wise sensitivity, specificity, accuracy
  • Calibrated area calculation
  • Bland-Altman plots for BSA% agreement
  • Pearson/Spearman correlation for area measurements
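The mask-level metrics listed above follow directly from the pixel confusion counts. The sketch below is a minimal illustration on binary masks stored as nested lists; the function name, the convention of scoring two empty masks as perfect agreement, and the sample masks are assumptions.

```python
def segmentation_metrics(pred_mask, truth_mask):
    """Pixel-wise metrics for two binary masks of identical shape."""
    tp = fp = fn = tn = 0
    for pred_row, truth_row in zip(pred_mask, truth_mask):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                tp += 1
            elif p:
                fp += 1
            elif t:
                fn += 1
            else:
                tn += 1
    union = tp + fp + fn
    total = union + tn
    return {
        # Two empty masks are treated as perfect agreement (assumption)
        "iou": tp / union if union else 1.0,
        "dice": 2 * tp / (2 * tp + fp + fn) if union else 1.0,
        "sensitivity": tp / (tp + fn) if tp + fn else 1.0,
        "specificity": tn / (tn + fp) if tn + fp else 1.0,
        "accuracy": (tp + tn) / total if total else 1.0,
    }


# Hypothetical 2x3 masks (not study data)
pred = [[1, 1, 0], [0, 1, 0]]
truth = [[1, 0, 0], [0, 1, 1]]
metrics = segmentation_metrics(pred, truth)
```

For a single pair of masks the two overlap measures are linked by Dice = 2·IoU / (1 + IoU), so an IoU of 0.75 corresponds to a Dice of about 0.857. This relation does not carry over exactly to averages across images.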

Test Conclusions: (to be completed after validation)

Bias Analysis and Fairness Evaluation​

Objective: Ensure BSA segmentation performs consistently across skin types, lesion sizes, and anatomical locations.

Subpopulation Analysis Protocol:

1. Fitzpatrick Skin Type Analysis:

  • Dice scores disaggregated by skin type
  • Recognition that lesion boundaries may have different contrast on darker skin
  • Success criterion: Dice ≥ 0.80 across all Fitzpatrick types

2. Lesion Size Analysis:

  • Small (< 5 cm²), Medium (5-50 cm²), Large (> 50 cm²)
  • Success criterion: IoU ≥ 0.70 for all sizes

3. Lesion Morphology Analysis:

  • Well-defined vs. ill-defined borders
  • Regular vs. irregular shapes
  • Success criterion: Dice variation ≤ 10% across morphologies

4. Anatomical Site Analysis:

  • Flat surfaces vs. curved/folded areas
  • Success criterion: IoU variation ≤ 20% across sites

5. Disease Condition Analysis:

  • Psoriasis, atopic dermatitis, vitiligo performance
  • Success criterion: Dice ≥ 0.80 for each condition

6. Image Quality Impact:

  • Performance vs. DIQA scores, angle, distance
  • Mitigation: Quality filtering, perspective correction

Bias Mitigation Strategies:

  • Balanced training data across Fitzpatrick types
  • Multi-scale augmentation
  • Boundary refinement post-processing

Results Summary: (to be completed)

| Subpopulation | Dice | IoU | Area Error | Assessment |
| --- | --- | --- | --- | --- |
| Fitzpatrick I-II | [TO FILL] | [TO FILL] | [TO FILL] | [PASS/FAIL] |
| Fitzpatrick III-IV | [TO FILL] | [TO FILL] | [TO FILL] | [PASS/FAIL] |
| Fitzpatrick V-VI | [TO FILL] | [TO FILL] | [TO FILL] | [PASS/FAIL] |
| Small Lesions | [TO FILL] | [TO FILL] | [TO FILL] | [PASS/FAIL] |
| Large Lesions | [TO FILL] | [TO FILL] | [TO FILL] | [PASS/FAIL] |

Bias Analysis Conclusion: (to be completed)


Wound Surface Quantification​

Model Overview​

Reference: R-TF-028-001 AI/ML Description - Wound Surface Quantification section

This model segments wound areas for accurate wound size monitoring and healing progress assessment.

Clinical Significance: Wound area tracking is essential for treatment effectiveness evaluation and clinical documentation.

Data Requirements and Annotation​

Training Methodology​

Performance Results​

Verification and Validation Protocol​

Bias Analysis and Fairness Evaluation​


Hair Loss Surface Quantification​

Model Overview​

Reference: R-TF-028-001 AI/ML Description - Hair Loss Surface Quantification section

This model segments areas of hair loss for alopecia severity assessment and treatment monitoring.

Clinical Significance: Hair loss area quantification is critical for alopecia areata severity scoring (SALT score).

Data Requirements and Annotation​

Training Methodology​

Performance Results​

Verification and Validation Protocol​

Bias Analysis and Fairness Evaluation​


Nail Lesion Surface Quantification​

Model Overview​

Reference: R-TF-028-001 AI/ML Description - Nail Lesion Surface Quantification section

This model segments nail lesion areas for psoriatic nail disease and onychomycosis assessment.

Clinical Significance: Nail involvement percentage is used in NAPSI (Nail Psoriasis Severity Index) scoring.

Data Requirements and Annotation​

Training Methodology​

Performance Results​

Verification and Validation Protocol​

Bias Analysis and Fairness Evaluation​


Hypopigmentation/Depigmentation Surface Quantification​

Model Overview​

Reference: R-TF-028-001 AI/ML Description - Hypopigmentation/Depigmentation Surface Quantification section

This model segments hypopigmented or depigmented areas for vitiligo extent assessment and repigmentation tracking.

Clinical Significance: Depigmentation area is essential for VASI (Vitiligo Area Scoring Index) and treatment response.

Data Requirements and Annotation​

Training Methodology​

Performance Results​

Verification and Validation Protocol​

Bias Analysis and Fairness Evaluation​


Skin Surface Segmentation​

Model Overview​

Reference: R-TF-028-001 AI/ML Description - Skin Surface Segmentation section

This model segments skin regions to distinguish skin from non-skin areas (clothing, background) for accurate lesion area calculations.

Clinical Significance: Accurate skin segmentation is a prerequisite for calculating lesion percentages relative to visible skin area.

Data Requirements and Annotation​

Training Methodology​

Performance Results​

Verification and Validation Protocol​

Bias Analysis and Fairness Evaluation​


Surface Area Quantification​

Model Overview​

Reference: R-TF-028-001 AI/ML Description - Surface Area Quantification section

This is a generic surface area quantification model that processes segmentation masks from other surface area models and calibration data to compute accurate physical measurements (cm², percentages, etc.).

Clinical Significance: Provides standardized, accurate surface area measurements essential for scoring systems like PASI, EASI, BSA%, and VASI.

Data Requirements and Annotation​

Training Methodology​

Performance Results​

Verification and Validation Protocol​

Bias Analysis and Fairness Evaluation​


Acneiform Inflammatory Pattern Identification​

Model Overview​

Reference: R-TF-028-001 AI/ML Description - Acneiform Inflammatory Pattern Identification section

This model identifies spatial patterns of acneiform inflammatory lesions (papules, pustules, nodules, cysts) to support differential diagnosis and severity assessment.

Clinical Significance: Pattern recognition aids in distinguishing acne subtypes and assessing distribution characteristics relevant for treatment planning.

Data Requirements and Annotation​

Training Methodology​

Performance Results​

Verification and Validation Protocol​

Bias Analysis and Fairness Evaluation​


Follicular and Inflammatory Pattern Identification​

Model Overview​

Reference: R-TF-028-001 AI/ML Description - Follicular and Inflammatory Pattern Identification section

This model identifies follicular patterns (folliculitis, follicular prominence) and associated inflammatory patterns for conditions involving hair follicles.

Clinical Significance: Essential for diagnosing and characterizing follicular dermatoses and differentiating follicular from non-follicular inflammatory conditions.

Data Requirements and Annotation​

Training Methodology​

Performance Results​

Verification and Validation Protocol​

Bias Analysis and Fairness Evaluation​


Inflammatory Pattern Identification​

Model Overview​

Reference: R-TF-028-001 AI/ML Description - Inflammatory Pattern Identification section

This model identifies general inflammatory patterns across various dermatological conditions (e.g., distribution patterns in psoriasis, atopic dermatitis, lichen planus).

Clinical Significance: Pattern recognition supports differential diagnosis and disease characterization.

Data Requirements and Annotation​

Training Methodology​

Performance Results​

Verification and Validation Protocol​

Bias Analysis and Fairness Evaluation​


Inflammatory Pattern Indicator​

Model Overview​

Reference: R-TF-028-001 AI/ML Description - Inflammatory Pattern Indicator section

This model provides binary indicators for the presence/absence of specific inflammatory patterns, supporting rapid pattern-based screening and classification.

Clinical Significance: Binary indicators enable efficient triaging and initial assessment of inflammatory pattern presence.

Data Requirements and Annotation​

Training Methodology​

Performance Results​

Verification and Validation Protocol​

Bias Analysis and Fairness Evaluation​


Dermatology Image Quality Assessment (DIQA)​

Model Overview​

Reference: R-TF-028-001 AI/ML Description - DIQA section

This model assesses image quality to filter out images unsuitable for clinical analysis, ensuring reliable downstream model performance.

Clinical Significance: DIQA is critical for patient safety by preventing low-quality images from being analyzed, which could lead to incorrect clinical assessments.

Data Requirements and Annotation​

Data Requirements: Images annotated with quality scores (focus, lighting, angle, resolution)

Annotation document needed: R-TF-028-004 Data Annotation Instructions - DIQA (to be created)

Medical imaging experts and dermatologists rate images on:

  • Focus/sharpness (sharp, acceptable, blurry)
  • Lighting quality (well-lit, acceptable, poor)
  • Viewing angle (optimal, acceptable, extreme)
  • Overall usability (excellent, good, acceptable, poor, unusable)

Dataset statistics: (to be completed)

Training Methodology​

Architecture: [Classification model - to be determined]

Training approach:

  • Multi-task learning: Quality dimensions and overall score
  • Loss function: [Cross-entropy or ordinal regression]
  • Data augmentation: Synthetic blur, lighting variations, noise injection
  • Training duration: [NUMBER] epochs

Performance Results​

Success criteria:

  • Multi-class accuracy ≥ 0.85 for quality categories
  • Binary accuracy ≥ 0.90 for usable vs. unusable
  • AUC-ROC ≥ 0.90 for quality score prediction
Metric | Result | Success Criterion | Outcome
Multi-class Accuracy | [TO FILL] | ≥ 0.85 | [PENDING]
Binary Accuracy | [TO FILL] | ≥ 0.90 | [PENDING]
AUC-ROC | [TO FILL] | ≥ 0.90 | [PENDING]
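
The three DIQA criteria can be checked with a few lines of code. The following is a minimal sketch, not the project's evaluation pipeline: the function names, the 0.5 decision threshold, and the choice to compute AUC-ROC on the usable vs. unusable task are assumptions made for illustration.

```python
from typing import Dict, Sequence, Tuple

# Thresholds copied from the success criteria listed above.
CRITERIA = {"multi_class_accuracy": 0.85, "binary_accuracy": 0.90, "auc_roc": 0.90}


def accuracy(y_true: Sequence, y_pred: Sequence) -> float:
    """Fraction of predictions that exactly match the reference labels."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def auc_roc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a positive outranks a negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def evaluate_diqa(
    quality_true: Sequence[str],
    quality_pred: Sequence[str],
    usable_true: Sequence[int],
    usable_score: Sequence[float],
    threshold: float = 0.5,  # hypothetical operating point
) -> Dict[str, Tuple[float, bool]]:
    """Return {metric: (value, meets_criterion)} for the three DIQA criteria."""
    usable_pred = [int(s >= threshold) for s in usable_score]
    values = {
        "multi_class_accuracy": accuracy(quality_true, quality_pred),
        "binary_accuracy": accuracy(usable_true, usable_pred),
        "auc_roc": auc_roc(usable_true, usable_score),
    }
    return {name: (value, value >= CRITERIA[name]) for name, value in values.items()}
```

Each entry of the returned dictionary maps directly onto one row of the results table (Result and Outcome columns).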

Verification and Validation Protocol​

Test Design:

  • Test set with expert quality annotations across quality spectrum
  • Correlation with objective quality metrics (blur, contrast, exposure)
  • Clinical relevance testing: correlation between DIQA scores and downstream model performance

Complete Test Protocol:

  • Input: Images with varying quality levels
  • Processing: DIQA model inference
  • Output: Quality scores and usability classification
  • Ground truth: Expert quality assessments and objective measurements
  • Statistical analysis: Classification accuracy, AUC, correlation with objective metrics

Data Analysis Methods:

  • Confusion matrix for multi-class quality assessment
  • ROC curve analysis for binary usability classification
  • Correlation analysis: DIQA scores vs. objective metrics (blur kernel size, SNR, dynamic range)
  • Clinical impact analysis: Correlation between DIQA scores and clinical model performance
  • Success criterion: Images flagged as "poor" show significantly degraded clinical model performance
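
As an illustration of the correlation analysis, the sketch below computes a Spearman rank correlation between DIQA scores and an objective blur measure. Spearman is an assumption here (the protocol only says "correlation analysis"), and the sample values in the usage note are made up.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / math.sqrt(var_x * var_y)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation: Pearson correlation of the rank-transformed samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two samples of equal length >= 2")
    return pearson(average_ranks(x), average_ranks(y))
```

For example, `spearman([0.9, 0.7, 0.5, 0.2], [1.0, 2.0, 4.0, 9.0])` compares four DIQA scores with four blur kernel sizes; a strongly negative value would indicate that predicted quality falls as blur increases.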

Test Conclusions: (to be completed after validation)

Bias Analysis and Fairness Evaluation​

Objective: Ensure DIQA performs consistently across devices, conditions, and populations without unfairly rejecting valid images.

Subpopulation Analysis Protocol:

1. Device Analysis:

  • Performance consistency across imaging devices (smartphones, tablets, professional cameras)
  • Success criterion: Consistent accuracy across device types

2. Lighting Condition Analysis:

  • Accuracy across various lighting conditions (natural, artificial, mixed)
  • Success criterion: No systematic bias against specific lighting types

3. Skin Type Analysis:

  • Consistency for different Fitzpatrick skin types
  • Ensure darker skin images aren't systematically rated lower quality
  • Success criterion: No correlation between Fitzpatrick type and false rejection rate

4. Anatomical Site Analysis:

  • Performance across body sites with varying texture/features
  • Success criterion: Consistent performance across sites
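
The false rejection rate used in this protocol can be computed per subgroup as sketched below. The record layout `(group, expert_usable, model_accepted)` and the function names are hypothetical; the rate is taken here as the share of expert-usable images that the model rejects.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Record = Tuple[str, bool, bool]  # (group, expert_usable, model_accepted)


def false_rejection_rates(records: Iterable[Record]) -> Dict[str, float]:
    """Per group: rejected usable images / all usable images."""
    usable = defaultdict(int)
    rejected = defaultdict(int)
    for group, expert_usable, model_accepted in records:
        if not expert_usable:
            continue  # images the experts rate unusable cannot be falsely rejected
        usable[group] += 1
        if not model_accepted:
            rejected[group] += 1
    return {group: rejected[group] / usable[group] for group in usable}


def max_gap(rates: Dict[str, float]) -> float:
    """Largest difference in false rejection rate between any two groups."""
    return max(rates.values()) - min(rates.values())
```

The same helper applies to any of the four stratifications (device, lighting, Fitzpatrick type, anatomical site) by changing what is stored in the `group` field.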

Bias Mitigation Strategies:

  • Training on diverse imaging conditions and device types
  • Balanced dataset across Fitzpatrick types
  • Validation that quality assessment is independent of skin tone

Results Summary: (to be completed)

Subpopulation | Accuracy | False Rejection Rate | Assessment
Smartphone | [TO FILL] | [TO FILL] | [PASS/FAIL]
Tablet | [TO FILL] | [TO FILL] | [PASS/FAIL]
Fitzpatrick I-II | [TO FILL] | [TO FILL] | [PASS/FAIL]
Fitzpatrick V-VI | [TO FILL] | [TO FILL] | [PASS/FAIL]

Bias Analysis Conclusion: (to be completed)


Fitzpatrick Skin Type Identification​

Model Overview​

Reference: R-TF-028-001 AI/ML Description - Fitzpatrick section

This model classifies skin into Fitzpatrick types (I-VI) to enable bias monitoring and fairness evaluation across all clinical models.

Clinical Significance: Essential for ensuring equitable performance across diverse patient populations and detecting potential algorithmic bias.

Data Requirements and Annotation​

Data Requirements: Images annotated with Fitzpatrick skin type (I-VI)

Annotation document needed: R-TF-028-004 Data Annotation Instructions - Fitzpatrick (to be created)

Dermatologists classify skin type based on:

  • Visual assessment from images
  • Clinical history (sun sensitivity, tanning response)
  • Consensus from multiple experts for borderline cases

Dataset statistics: (to be completed)

Training Methodology​

Architecture: [Classification model - to be determined]

Training approach:

  • Multi-class classification (I through VI)
  • Hierarchical classification option (grouped types)
  • Data augmentation: Lighting variations
  • Training duration: [NUMBER] epochs

Performance Results​

Success criteria:

  • Overall accuracy ≥ 0.75 (6-class classification)
  • Grouped accuracy ≥ 0.85 (Light I-II, Medium III-IV, Dark V-VI)
  • Cohen's kappa ≥ 0.70 (agreement with expert classification)
Metric | Result | Success Criterion | Outcome
Overall Accuracy | [TO FILL] | ≥ 0.75 | [PENDING]
Grouped Accuracy | [TO FILL] | ≥ 0.85 | [PENDING]
Cohen's Kappa | [TO FILL] | ≥ 0.70 | [PENDING]
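
To make the grouped accuracy and Cohen's kappa criteria concrete, here is a minimal sketch. The grouping follows the Light/Medium/Dark split stated above; the function names are illustrative, and the kappa shown is the unweighted form, which is an assumption since the report does not say whether a weighted variant is intended.

```python
from collections import Counter
from typing import Sequence

# Grouping taken from the success criteria: Light I-II, Medium III-IV, Dark V-VI.
GROUP = {"I": "light", "II": "light", "III": "medium",
         "IV": "medium", "V": "dark", "VI": "dark"}


def accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Fraction of exact matches between reference and predicted labels."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def grouped_accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Accuracy after collapsing the six types into three groups."""
    return accuracy([GROUP[t] for t in y_true], [GROUP[p] for p in y_pred])


def cohens_kappa(rater_a: Sequence[str], rater_b: Sequence[str]) -> float:
    """Unweighted Cohen's kappa: (p_o - p_e) / (1 - p_e)."""
    n = len(rater_a)
    observed = accuracy(rater_a, rater_b)
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)
```

Because Fitzpatrick types are ordered, a weighted kappa that penalizes a I vs. VI confusion more than a I vs. II confusion may be a better fit; that choice is left to the validation protocol.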

Verification and Validation Protocol​

Test Design:

  • Test set with dermatologist-confirmed Fitzpatrick types
  • Diverse anatomical sites and lighting conditions
  • Validation against self-reported and expert-assessed types

Complete Test Protocol:

  • Input: Images of skin from diverse populations
  • Processing: Fitzpatrick classification model inference
  • Output: Predicted Fitzpatrick type (I-VI) with confidence scores
  • Ground truth: Expert dermatologist assessment with clinical history
  • Statistical analysis: Accuracy, Cohen's kappa, confusion matrix

Data Analysis Methods:

  • Multi-class classification accuracy (exact type matching)
  • Grouped classification accuracy (I-II vs. III-IV vs. V-VI)
  • Confusion matrix analysis to identify systematic misclassifications
  • Cohen's kappa for agreement between model predictions and expert classification
  • Clinical impact: Verification that misclassifications do not lead to bias in clinical models

Test Conclusions: (to be completed after validation)

Bias Analysis and Fairness Evaluation​

Objective: This model itself is a bias mitigation tool. Validation ensures accurate identification across the full Fitzpatrick spectrum.

Validation Strategy:

1. Spectrum Coverage:

  • Accurate identification across full Fitzpatrick I-VI spectrum
  • No systematic over/under-prediction of certain types
  • Success criterion: Balanced precision/recall across all types

2. Anatomical Site Consistency:

  • Consistent performance across body sites
  • Recognition that pigmentation varies by site
  • Success criterion: Site-independent accuracy

3. Lighting Robustness:

  • Minimal impact from lighting variations
  • Success criterion: Consistent classification under various lighting

4. Clinical Outcome Validation:

  • Ensures proper functioning of bias monitoring in clinical models
  • Comparison with gold standard expert assessment
  • Cross-validation with population-representative datasets
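
Balanced precision and recall across types can be checked with a per-class report such as the sketch below. The function name and the convention of reporting 0.0 when a denominator is empty are assumptions for illustration.

```python
from typing import Dict, Sequence

FITZPATRICK_TYPES = ["I", "II", "III", "IV", "V", "VI"]


def per_class_report(
    y_true: Sequence[str],
    y_pred: Sequence[str],
    labels: Sequence[str] = FITZPATRICK_TYPES,
) -> Dict[str, Dict[str, float]]:
    """One-vs-rest precision, recall and F1 for every label."""
    report = {}
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        # Empty denominators are reported as 0.0 (assumed convention).
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        report[label] = {"precision": precision, "recall": recall, "f1": f1}
    return report
```

Each inner dictionary corresponds to one row of the per-type results table; a large spread between types would point to systematic over- or under-prediction.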

Bias Mitigation Strategies:

  • Balanced training data across all Fitzpatrick types
  • Multiple anatomical sites per type
  • Diverse lighting conditions

Results Summary: (to be completed)

Fitzpatrick Type | Precision | Recall | F1-score | Assessment
Type I | [TO FILL] | [TO FILL] | [TO FILL] | [PASS/FAIL]
Type II | [TO FILL] | [TO FILL] | [TO FILL] | [PASS/FAIL]
Type III | [TO FILL] | [TO FILL] | [TO FILL] | [PASS/FAIL]
Type IV | [TO FILL] | [TO FILL] | [TO FILL] | [PASS/FAIL]
Type V | [TO FILL] | [TO FILL] | [TO FILL] | [PASS/FAIL]
Type VI | [TO FILL] | [TO FILL] | [TO FILL] | [PASS/FAIL]

Bias Analysis Conclusion: (to be completed)


Domain Validation​

Model Overview​

Reference: R-TF-028-001 AI/ML Description - Domain Validation section

This model verifies that input images are within the validated domain (dermatological images) vs. non-skin images, preventing clinical models from processing invalid inputs.

Clinical Significance: Critical safety function preventing misuse and ensuring clinical models only analyze appropriate dermatological images.

Data Requirements and Annotation​

Data Requirements:

  • Positive set: Diverse dermatological images (all conditions, sites, qualities)
  • Negative set: Non-skin images (objects, animals, scenes, other body parts)

Dataset statistics: (to be completed)

Training Methodology​

Architecture: [Binary classifier - to be determined]

Training approach:

  • Binary classification (in-domain vs. out-of-domain)
  • Loss function: [Binary cross-entropy]
  • Negative sampling strategy for diverse out-of-domain examples
  • Training duration: [NUMBER] epochs

Performance Results​

Success criteria:

  • Sensitivity ≥ 0.95 (correctly identify valid dermatological images)
  • Specificity ≥ 0.99 (correctly reject non-dermatological images)
  • False positive rate ≤ 1% (minimize incorrect acceptance of non-dermatological images; equivalent to specificity ≥ 0.99)
Metric | Result | Success Criterion | Outcome
Sensitivity | [TO FILL] | ≥ 0.95 | [PENDING]
Specificity | [TO FILL] | ≥ 0.99 | [PENDING]
False Positive Rate | [TO FILL] | ≤ 1% | [PENDING]

Verification and Validation Protocol​

Test Design:

  • Positive set: Diverse dermatological images (all conditions, sites, qualities)
  • Negative set: Non-skin images (objects, animals, indoor/outdoor scenes, other body parts)
  • Edge cases: Images with partial skin, heavily zoomed, extreme angles

Complete Test Protocol:

  • Input: Mixed dataset of in-domain and out-of-domain images
  • Processing: Binary domain classification
  • Output: In-domain probability score
  • Ground truth: Expert-confirmed domain labels
  • Statistical analysis: Sensitivity, specificity, ROC-AUC, threshold optimization

Data Analysis Methods:

  • ROC curve analysis to determine optimal threshold
  • Sensitivity/specificity trade-off analysis
  • False positive analysis: Characterization of out-of-domain images incorrectly accepted as dermatological
  • False negative analysis: Characterization of valid dermatological images incorrectly rejected
  • Clinical safety evaluation: Ensure out-of-domain images don't reach clinical models
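
One way to carry out the threshold analysis is sketched below, treating "in-domain" as the positive class. The selection rule (highest sensitivity among thresholds that meet the specificity floor) is an assumption, not the documented procedure, and the function names are illustrative.

```python
from typing import Optional, Sequence, Tuple


def sensitivity_specificity(
    y_true: Sequence[int], scores: Sequence[float], threshold: float
) -> Tuple[float, float]:
    """y_true: 1 = in-domain, 0 = out-of-domain; accept when score >= threshold."""
    tp = fn = tn = fp = 0
    for truth, score in zip(y_true, scores):
        accepted = score >= threshold
        if truth == 1:
            tp += int(accepted)
            fn += int(not accepted)
        else:
            fp += int(accepted)
            tn += int(not accepted)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present")
    return tp / (tp + fn), tn / (tn + fp)


def pick_threshold(
    y_true: Sequence[int], scores: Sequence[float], min_specificity: float = 0.99
) -> Optional[Tuple[float, float, float]]:
    """Return (threshold, sensitivity, specificity), or None if no threshold qualifies.

    Candidates are the observed scores; among those meeting the specificity
    floor, the one with the highest sensitivity wins (lowest threshold on ties).
    """
    best = None
    for threshold in sorted(set(scores)):
        sens, spec = sensitivity_specificity(y_true, scores, threshold)
        if spec >= min_specificity and (best is None or sens > best[1]):
            best = (threshold, sens, spec)
    return best
```

A `None` result means the classifier cannot reach the required specificity on the evaluated data at any operating point.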

Test Conclusions: (to be completed after validation)

Bias Analysis and Fairness Evaluation​

Objective: Ensure domain validation doesn't unfairly reject valid dermatological images from any subpopulation.

Subpopulation Analysis Protocol:

1. Fitzpatrick Skin Type Analysis:

  • Equal sensitivity across all Fitzpatrick types
  • Success criterion: No correlation between skin type and false rejection

2. Anatomical Site Analysis:

  • Consistent performance across all body sites
  • Success criterion: No site-specific rejection bias

3. Condition Diversity Analysis:

  • Robust to various imaging conditions and devices
  • Success criterion: Consistent specificity across conditions

4. Safety Validation:

  • Ensure unusual but valid dermatological presentations aren't rejected
  • Rare conditions, severe cases tested explicitly
  • Success criterion: Sensitivity ≥ 0.90 for rare but valid conditions

Bias Mitigation Strategies:

  • Training on comprehensive dermatological diversity
  • Explicit inclusion of rare conditions in positive set
  • Conservative threshold setting favoring sensitivity

Results Summary: (to be completed)

Subpopulation | Sensitivity | Specificity | Assessment
Fitzpatrick I-II | [TO FILL] | [TO FILL] | [PASS/FAIL]
Fitzpatrick V-VI | [TO FILL] | [TO FILL] | [PASS/FAIL]
Rare Conditions | [TO FILL] | [TO FILL] | [PASS/FAIL]
Severe Presentations | [TO FILL] | [TO FILL] | [PASS/FAIL]

Bias Analysis Conclusion: (to be completed)


Body Site Identification​

Model Overview​

Reference: R-TF-028-001 AI/ML Description - Body Site Identification section

This model identifies anatomical locations from images to support site-specific scoring systems and bias monitoring.

Clinical Significance: Anatomical site identification enables automatic application of site-specific scoring rules and demographic performance monitoring.

Data Requirements and Annotation​

Data Requirements: Images annotated with anatomical location

Medical experts label images with:

  • Fine-grained anatomical sites (e.g., dorsal hand, volar forearm)
  • Body regions (head/neck, trunk, upper extremity, lower extremity)
  • Consensus for ambiguous cases

Dataset statistics: (to be completed)

Training Methodology​

Architecture: [Multi-class classifier - to be determined]

Training approach:

  • Hierarchical classification (fine-grained and grouped)
  • Loss function: [Cross-entropy or hierarchical loss]
  • Data augmentation: Rotations, scaling, perspective
  • Training duration: [NUMBER] epochs

Performance Results​

Success criteria:

  • Top-1 accuracy ≥ 0.70 (exact anatomical site)
  • Top-3 accuracy ≥ 0.85 (includes correct site in top 3)
  • Grouped accuracy ≥ 0.90 (body region level)
Metric | Result | Success Criterion | Outcome
Top-1 Accuracy | [TO FILL] | ≥ 0.70 | [PENDING]
Top-3 Accuracy | [TO FILL] | ≥ 0.85 | [PENDING]
Grouped Accuracy | [TO FILL] | ≥ 0.90 | [PENDING]
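
The top-k and grouped metrics can be illustrated as follows. This is a sketch only: the site-to-region mapping is a hypothetical subset, and the model output is assumed to be a list of site labels ordered from most to least likely.

```python
from typing import Dict, List, Sequence

# Hypothetical fine-grained site -> body region mapping (illustrative subset).
REGION_OF: Dict[str, str] = {
    "scalp": "head/neck",
    "face": "head/neck",
    "dorsal hand": "upper extremity",
    "volar forearm": "upper extremity",
    "upper arm": "upper extremity",
    "shin": "lower extremity",
}


def top_k_accuracy(
    y_true: Sequence[str], ranked_predictions: Sequence[List[str]], k: int
) -> float:
    """Share of images whose true site is among the k highest-ranked predictions."""
    if k < 1:
        raise ValueError("k must be at least 1")
    hits = sum(truth in ranked[:k] for truth, ranked in zip(y_true, ranked_predictions))
    return hits / len(y_true)


def grouped_accuracy(
    y_true: Sequence[str], ranked_predictions: Sequence[List[str]]
) -> float:
    """Share of images whose top prediction lies in the same body region as the truth."""
    hits = sum(
        REGION_OF[truth] == REGION_OF[ranked[0]]
        for truth, ranked in zip(y_true, ranked_predictions)
    )
    return hits / len(y_true)
```

Grouped accuracy can exceed top-1 accuracy because confusing two neighbouring sites in the same region (for example upper arm vs. volar forearm) still counts as correct at region level.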

Verification and Validation Protocol​

Test Design:

  • Test set with expert-confirmed anatomical locations
  • Coverage of all major body sites
  • Challenging cases: Similar sites, unusual angles, partial views

Complete Test Protocol:

  • Input: Dermatological images from test set
  • Processing: Multi-class anatomical site classification
  • Output: Predicted body site with confidence scores
  • Ground truth: Expert-confirmed anatomical labels
  • Statistical analysis: Top-k accuracy, confusion matrix, grouped accuracy

Data Analysis Methods:

  • Multi-class classification accuracy (fine-grained sites)
  • Hierarchical classification accuracy (body regions)
  • Confusion matrix to identify commonly confused sites
  • Clinical impact analysis: Assess whether site misidentifications affect clinical model outputs

Test Conclusions: (to be completed after validation)

Bias Analysis and Fairness Evaluation​

Objective: Ensure anatomical site identification performs consistently across demographics.

Subpopulation Analysis Protocol:

1. Fitzpatrick Skin Type Analysis:

  • Performance across skin types
  • Recognition that anatomical identification may vary with pigmentation
  • Success criterion: Accuracy variation ≤ 10% across Fitzpatrick types

2. Age Group Analysis:

  • Accuracy for pediatric vs. adult anatomy
  • Success criterion: Consistent performance across age groups

3. Sex/Gender Analysis:

  • Performance across sex/gender (anatomical differences)
  • Success criterion: No systematic bias in site recognition

4. Body Habitus Analysis:

  • Performance across different body types
  • Success criterion: Consistent accuracy regardless of body habitus
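
The "accuracy variation ≤ 10%" criterion can be read in more than one way. The sketch below assumes it means an absolute spread of at most 0.10 between the best and worst subgroup; that interpretation, the record layout, and the function names are assumptions.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Record = Tuple[str, str, str]  # (subgroup, true_site, predicted_site)


def accuracy_by_group(records: Iterable[Record]) -> Dict[str, float]:
    """Top-1 accuracy computed separately for each subgroup."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for group, truth, predicted in records:
        total[group] += 1
        correct[group] += int(truth == predicted)
    return {group: correct[group] / total[group] for group in total}


def within_tolerance(
    group_accuracy: Dict[str, float], max_spread: float = 0.10
) -> Tuple[float, bool]:
    """Return (best - worst accuracy, whether that spread is within max_spread)."""
    spread = max(group_accuracy.values()) - min(group_accuracy.values())
    return spread, spread <= max_spread
```

With small subgroups the observed spread is noisy, so the pass/fail flag is best read together with the per-group sample sizes.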

Bias Mitigation Strategies:

  • Training data balanced across demographics and anatomical sites
  • Diverse representation of body types
  • Age-diverse dataset

Results Summary: (to be completed)

Subpopulation | Accuracy | Grouped Accuracy | Assessment
Fitzpatrick I-II | [TO FILL] | [TO FILL] | [PASS/FAIL]
Fitzpatrick V-VI | [TO FILL] | [TO FILL] | [PASS/FAIL]
Pediatric | [TO FILL] | [TO FILL] | [PASS/FAIL]
Adult | [TO FILL] | [TO FILL] | [PASS/FAIL]

Bias Analysis Conclusion: (to be completed)


Summary and Conclusion​

The development and validation activities described in this report provide objective evidence that the AI algorithms for Legit.Health Plus meet their predefined specifications and performance requirements.

Status of model development and validation:

  • ICD Category Distribution and Binary Indicators: [Status to be updated]
  • Visual Sign Intensity Models: [Status to be updated]
  • Lesion Quantification Models: [Status to be updated]
  • Surface Area Models: [Status to be updated]
  • Non-Clinical Support Models: [Status to be updated]

The development process adhered to the company's QMS and followed Good Machine Learning Practices. Models meeting their success criteria are considered verified, validated, and suitable for release and integration into the Legit.Health Plus medical device.

State of the Art Compliance and Development Lifecycle​

Software Development Lifecycle Compliance​

The AI models in Legit.Health Plus were developed in accordance with state-of-the-art software development practices and international standards:

Applicable Standards and Guidelines:

  • IEC 62304:2006+AMD1:2015 - Medical device software lifecycle processes
  • ISO 13485:2016 - Quality management systems for medical devices
  • ISO 14971:2019 - Application of risk management to medical devices
  • ISO/IEC 25010:2011 - Systems and software quality requirements and evaluation (SQuaRE)
  • FDA Guidance on Software as a Medical Device (SaMD) - Clinical evaluation and predetermined change control plans
  • IMDRF/SaMD WG/N41 FINAL:2017 - Software as a Medical Device (SaMD): Clinical Evaluation
  • Good Machine Learning Practice (GMLP) - FDA/Health Canada/UK MHRA Guiding Principles (2021)
  • Proposed Regulatory Framework for Modifications to AI/ML-Based SaMD - FDA Discussion Paper (2019)

Development Lifecycle Phases Implemented:

  1. Requirements Analysis: Comprehensive AI model specifications defined in R-TF-028-001 AI/ML Description
  2. Development Planning: Structured development plan in R-TF-028-002 AI Development Plan
  3. Risk Management: AI-specific risk analysis in R-TF-028-011 AI Risk Matrix
  4. Design and Architecture: State-of-the-art architectures (Vision Transformers, CNNs, object detection, segmentation)
  5. Implementation: Following coding standards and version control practices
  6. Verification: Unit testing, integration testing, and algorithm validation
  7. Validation: Clinical performance testing against predefined success criteria
  8. Release: Version-controlled releases with complete traceability
  9. Maintenance: Post-market surveillance and performance monitoring

Version Control and Traceability:

  • All model versions tracked in version control systems (Git)
  • Complete traceability from requirements to validation results
  • Dataset versions documented with checksums and provenance
  • Model artifacts stored with complete training metadata
  • Documented change control process for model updates
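
Documenting dataset versions with checksums could look like the sketch below. SHA-256 and the manifest layout are assumptions (the report does not name a hash function), and the helpers work on any binary stream so they apply equally to files opened with `open(path, "rb")`.

```python
import hashlib
import io
from typing import BinaryIO, Dict, List


def sha256_of_stream(stream: BinaryIO, chunk_size: int = 1 << 20) -> str:
    """Hex SHA-256 digest of a binary stream, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def build_manifest(named_streams: Dict[str, BinaryIO]) -> Dict[str, str]:
    """Map each item name to its checksum; the manifest identifies a dataset version."""
    return {name: sha256_of_stream(stream) for name, stream in named_streams.items()}


def changed_entries(manifest: Dict[str, str], reference: Dict[str, str]) -> List[str]:
    """Names that were added, removed, or whose content differs between two manifests."""
    names = manifest.keys() | reference.keys()
    return sorted(n for n in names if manifest.get(n) != reference.get(n))


# In-memory demonstration with two hypothetical image payloads.
demo = build_manifest({"img_001.jpg": io.BytesIO(b"\x00\x01"),
                       "img_002.jpg": io.BytesIO(b"\x02\x03")})
```

Storing the manifest next to the model's training metadata lets a reviewer confirm that a validation run used exactly the dataset version named in the record.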

State of the Art in AI Development​

Best Practices Implemented:

1. Data Management Excellence:

  • Multi-source data collection with demographic diversity
  • Rigorous data quality control and curation processes
  • Systematic annotation protocols with multi-expert consensus
  • Data partitioning strategies preventing data leakage
  • Sequestered test sets for unbiased evaluation
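
A common way to prevent leakage between partitions is to split at patient level, so that all images of one patient land in the same partition. The sketch below shows that idea with a deterministic hash; the patient-level grouping, the 20% test fraction, and the salt are assumptions, not the documented partitioning strategy.

```python
import hashlib
from typing import Dict, Iterable, List, Tuple


def assign_partition(
    patient_id: str, test_fraction: float = 0.2, salt: str = "split-v1"
) -> str:
    """Deterministically map a patient to 'train' or 'test'.

    The hash of (salt, patient_id) is scaled to [0, 1); the same patient
    therefore always receives the same partition, run after run.
    """
    digest = hashlib.sha256(f"{salt}:{patient_id}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return "test" if bucket < test_fraction else "train"


def split_images(images: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """images: (image_id, patient_id) pairs. Returns image ids per partition."""
    partitions: Dict[str, List[str]] = {"train": [], "test": []}
    for image_id, patient_id in images:
        partitions[assign_partition(patient_id)].append(image_id)
    return partitions
```

Because assignment depends only on the patient identifier, images added later for an existing patient cannot drift into the other partition.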

2. Model Architecture Selection:

  • Use of state-of-the-art architectures (Vision Transformers for classification, YOLO/Faster R-CNN for detection, U-Net/DeepLab for segmentation)
  • Transfer learning from large-scale pre-trained models
  • Architecture selection based on published benchmark performance
  • Justification of architecture choices documented per model

3. Training Best Practices:

  • Systematic hyperparameter optimization
  • Cross-validation and early stopping to prevent overfitting
  • Data augmentation for robustness and generalization
  • Multi-task learning where clinically appropriate
  • Monitoring of training metrics and convergence

4. Model Calibration and Post-Processing:

  • Temperature scaling for probability calibration
  • Test-time augmentation for robust predictions
  • Ensemble methods where applicable
  • Uncertainty quantification for model predictions
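
Temperature scaling divides the logits by a single scalar T, fitted on held-out data, before the softmax. The sketch below fits T by a simple grid search over the negative log-likelihood; the grid search stands in for the gradient-based optimizers usually used, and the grid range is an arbitrary assumption.

```python
import math
from typing import List, Optional, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax of logits / temperature."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def nll(logit_rows: Sequence[Sequence[float]], labels: Sequence[int],
        temperature: float) -> float:
    """Mean negative log-likelihood of the true labels at a given temperature."""
    losses = [-math.log(softmax(row, temperature)[label])
              for row, label in zip(logit_rows, labels)]
    return sum(losses) / len(losses)


def fit_temperature(logit_rows: Sequence[Sequence[float]], labels: Sequence[int],
                    grid: Optional[Sequence[float]] = None) -> float:
    """Pick the temperature on the grid that minimizes validation NLL."""
    if grid is None:
        grid = [0.5 + 0.05 * i for i in range(91)]  # 0.5 to 5.0, assumed range
    return min(grid, key=lambda t: nll(logit_rows, labels, t))
```

Dividing by a positive T never changes which class has the highest score, so accuracy is unaffected; only the reported probabilities move. A fitted T above 1 indicates the uncalibrated model was overconfident.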

5. Comprehensive Validation:

  • Independent test sets never used during development
  • External validation on diverse datasets
  • Clinical ground truth from expert consensus
  • Statistical rigor with confidence intervals
  • Comprehensive subpopulation analysis
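
For proportion-type metrics such as accuracy, sensitivity, or specificity, a confidence interval can be attached as in the sketch below. The Wilson score interval is one common choice and is an assumption here; the report does not state which interval method is used.

```python
import math
from typing import Tuple


def wilson_interval(successes: int, n: int, z: float = 1.959963984540054) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion (default z gives ~95%)."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("require n > 0 and 0 <= successes <= n")
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, centre - half_width), min(1.0, centre + half_width)


def meets_criterion(successes: int, n: int, threshold: float) -> bool:
    """Stricter reading of a success criterion: the lower bound must reach it."""
    lower, _ = wilson_interval(successes, n)
    return lower >= threshold
```

For instance, 90 correct out of 100 gives a point estimate of 0.90 with an interval of roughly 0.83 to 0.94, so a criterion of ≥ 0.85 would be met by the point estimate but not by the lower bound. Whether criteria apply to the point estimate or the bound is a protocol decision.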

6. Bias Mitigation and Fairness:

  • Systematic bias analysis across demographic subpopulations
  • Fitzpatrick skin type stratification in all analyses
  • Data collection strategies ensuring demographic diversity
  • Bias monitoring models (DIQA, Fitzpatrick identification)
  • Transparent reporting of performance disparities

7. Explainability and Transparency:

  • Attention visualization for model interpretability (where applicable)
  • Clinical reasoning transparency (top-k predictions with probabilities)
  • Documentation of model limitations and known failure modes
  • Clear communication of uncertainty in predictions

Risk Management Throughout Lifecycle​

Risk Management Process:

Risk management is integrated throughout the entire AI development lifecycle following ISO 14971:

1. Risk Analysis:

  • Identification of AI-specific hazards (data bias, model errors, distribution shift)
  • Hazardous situation analysis (incorrect predictions leading to clinical harm)
  • Risk estimation combining probability and severity

2. Risk Evaluation:

  • Comparison of risks against predefined acceptability criteria
  • Benefit-risk analysis for each AI model
  • Clinical impact assessment of potential errors

3. Risk Control:

  • Inherent safety by design (offline models, no learning from deployment data)
  • Protective measures (DIQA filtering, domain validation, confidence thresholds)
  • Information for safety (user training, clinical decision support context)

4. Residual Risk Evaluation:

  • Assessment of risks after control measures
  • Verification that all risks reduced to acceptable levels
  • Overall residual risk acceptability

5. Risk Management Review:

  • Production and post-production information review
  • Update of risk management file
  • Traceability to safety risk matrix (R-TF-028-011 AI Risk Matrix)

AI-Specific Risk Controls:

  • Data Quality Risks: Multi-source collection, systematic annotation, quality control
  • Model Overfitting: Sequestered test sets, cross-validation, regularization
  • Bias and Fairness: Demographic diversity, subpopulation analysis, bias monitoring
  • Model Uncertainty: Calibration, confidence scores, uncertainty quantification
  • Distribution Shift: Domain validation, DIQA filtering, performance monitoring
  • Clinical Misinterpretation: Clear communication, clinical context, user training
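
The protective measures named above (domain validation, DIQA filtering, confidence thresholds) can be pictured as a sequence of gates in front of the clinical models. The sketch below is purely illustrative: the class names, the gate order, and all threshold values are hypothetical and are not taken from the device design.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GateThresholds:
    """Hypothetical operating points; real values would come from validation."""
    min_domain_score: float = 0.5
    min_quality_score: float = 0.5
    min_confidence: float = 0.5


@dataclass(frozen=True)
class GateDecision:
    accepted: bool
    reason: str


def gate(domain_score: float, quality_score: float, top_confidence: float,
         thresholds: GateThresholds = GateThresholds()) -> GateDecision:
    """Apply the three checks in order and report the first one that fails."""
    if domain_score < thresholds.min_domain_score:
        return GateDecision(False, "out_of_domain")
    if quality_score < thresholds.min_quality_score:
        return GateDecision(False, "insufficient_quality")
    if top_confidence < thresholds.min_confidence:
        return GateDecision(False, "low_confidence")
    return GateDecision(True, "ok")
```

Returning the reason alongside the decision lets the user interface explain why an image was not analyzed, which supports the "information for safety" control.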

Information Security​

Cybersecurity Considerations:

The AI models are designed with information security principles integrated throughout development:

1. Model Security:

  • Model parameters stored securely with access controls
  • Model integrity verification (checksums, digital signatures)
  • Protection against model extraction or reverse engineering
  • Secure deployment pipelines

2. Data Security:

  • Patient data protection throughout development (de-identification, anonymization)
  • Secure data storage with encryption at rest
  • Secure data transmission with encryption in transit
  • Access controls and audit logging for training data

3. Inference Security:

  • Secure API endpoints for model inference
  • Input validation to prevent adversarial attacks
  • Rate limiting and authentication
  • Output validation and sanity checking

4. Privacy Considerations:

  • No patient-identifiable information stored in models
  • Training data anonymization and de-identification
  • Compliance with GDPR, HIPAA, and applicable privacy regulations
  • Data minimization principles applied

5. Vulnerability Management:

  • Regular security assessments of AI infrastructure
  • Dependency scanning for software libraries
  • Patch management for underlying frameworks
  • Incident response procedures

6. Adversarial Robustness:

  • Consideration of adversarial attack scenarios
  • Input preprocessing to detect anomalous inputs
  • Domain validation to reject out-of-distribution inputs
  • DIQA filtering to reject manipulated or low-quality images

Cybersecurity Risk Assessment:

Cybersecurity risks are addressed in the overall device risk management file, including:

  • Threat modeling for AI components
  • Attack surface analysis
  • Mitigation strategies and security controls
  • Monitoring and incident response

Verification and Validation Strategy​

Verification Activities (confirming that the AI models implement their specifications):

  • Code reviews and static analysis
  • Unit testing of model components
  • Integration testing of model pipelines
  • Architecture validation against specifications
  • Performance benchmarking against target metrics

Validation Activities (confirming that AI models meet intended use):

  • Independent test set evaluation with sequestered data
  • External validation on diverse datasets
  • Clinical ground truth comparison
  • Subpopulation performance analysis
  • Real-world performance assessment
  • Usability and clinical workflow validation

Documentation of Verification and Validation:

Complete documentation is maintained for all verification and validation activities:

  • Test protocols with detailed methodology
  • Complete test results with statistical analysis
  • Data summaries and test conclusions
  • Traceability from requirements to test results
  • Identified deviations and their resolutions

This comprehensive approach ensures compliance with GSPR 17.2 requirements for software development in accordance with state of the art, incorporating development lifecycle management, risk management, information security, verification, and validation.

AI Risks Assessment Report​

AI Risk Assessment​

A comprehensive risk assessment was conducted throughout the development lifecycle in accordance with the R-TF-028-002 AI Development Plan. All identified AI-specific risks related to data, model training, and performance were documented and analyzed in the R-TF-028-011 AI Risk Matrix.

AI Risk Treatment​

Control measures were implemented to mitigate all identified risks. Key controls included:

  • Rigorous data curation and multi-source collection to mitigate bias.
  • Systematic model training and validation procedures to prevent overfitting.
  • Use of a sequestered test set to ensure unbiased performance evaluation.
  • Implementation of model calibration to improve the reliability of outputs.

Residual AI Risk Assessment​

After the implementation of control measures, a residual risk analysis was performed. All identified AI risks were successfully reduced to an acceptable level.

AI Risk and Traceability with Safety Risk​

Safety risks related to the AI algorithms (e.g., incorrect diagnosis suggestion, misinterpretation of data) were identified and traced back to their root causes in the AI development process. These safety risks have been escalated for management in the overall device Safety Risk Matrix, in line with ISO 14971.

Conclusion​

The AI development process has successfully managed and mitigated inherent risks to an acceptable level. The benefits of using the Legit.Health Plus algorithms as a clinical decision support tool are judged to outweigh the residual risks.

Related Documents​

Project Design and Plan​

  • R-TF-028-001 AI/ML Description - Complete specifications for all AI models
  • R-TF-028-002 AI Development Plan - Development methodology and lifecycle
  • R-TF-028-011 AI Risk Matrix - AI-specific risk assessment and mitigation

Data Collection and Annotation​

  • R-TF-028-003 Data Collection Instructions - Public datasets and clinical study data collection protocols
  • R-TF-028-004 Data Annotation Instructions - ICD-11 Mapping - Foundational diagnostic label standardization (completed)
  • R-TF-028-004 Data Annotation Instructions - Visual Signs - Intensity, count, and extent annotations for visual sign models (completed)
  • R-TF-028-004 Data Annotation Instructions - DIQA - Image quality assessment annotations (to be created)
  • R-TF-028-004 Data Annotation Instructions - Fitzpatrick - Skin type annotations (to be created)
  • R-TF-028-004 Data Annotation Instructions - Body Site - Anatomical location annotations (if needed)

Signature meaning

The signatures for the approval process of this document can be found in the verified commits at the repository for the QMS. As a reference, the team members who are expected to participate in this document and their roles in the approval process, as defined in Annex I Responsibility Matrix of the GP-001, are:

  • Author: Team members involved
  • Reviewer: JD-003, JD-004
  • Approver: JD-001
      • Bias Analysis and Fairness Evaluation
    • Skin Surface Segmentation
      • Model Overview
      • Data Requirements and Annotation
      • Training Methodology
      • Performance Results
      • Verification and Validation Protocol
      • Bias Analysis and Fairness Evaluation
    • Surface Area Quantification
      • Model Overview
      • Data Requirements and Annotation
      • Training Methodology
      • Performance Results
      • Verification and Validation Protocol
      • Bias Analysis and Fairness Evaluation
    • Acneiform Inflammatory Pattern Identification
      • Model Overview
      • Data Requirements and Annotation
      • Training Methodology
      • Performance Results
      • Verification and Validation Protocol
      • Bias Analysis and Fairness Evaluation
    • Follicular and Inflammatory Pattern Identification
      • Model Overview
      • Data Requirements and Annotation
      • Training Methodology
      • Performance Results
      • Verification and Validation Protocol
      • Bias Analysis and Fairness Evaluation
    • Inflammatory Pattern Identification
      • Model Overview
      • Data Requirements and Annotation
      • Training Methodology
      • Performance Results
      • Verification and Validation Protocol
      • Bias Analysis and Fairness Evaluation
    • Inflammatory Pattern Indicator
      • Model Overview
      • Data Requirements and Annotation
      • Training Methodology
      • Performance Results
      • Verification and Validation Protocol
      • Bias Analysis and Fairness Evaluation
    • Dermatology Image Quality Assessment (DIQA)
      • Model Overview
      • Data Requirements and Annotation
      • Training Methodology
      • Performance Results
      • Verification and Validation Protocol
      • Bias Analysis and Fairness Evaluation
    • Fitzpatrick Skin Type Identification
      • Model Overview
      • Data Requirements and Annotation
      • Training Methodology
      • Performance Results
      • Verification and Validation Protocol
      • Bias Analysis and Fairness Evaluation
    • Domain Validation
      • Model Overview
      • Data Requirements and Annotation
      • Training Methodology
      • Performance Results
      • Verification and Validation Protocol
      • Bias Analysis and Fairness Evaluation
    • Body Site Identification
      • Model Overview
      • Data Requirements and Annotation
      • Training Methodology
      • Performance Results
      • Verification and Validation Protocol
      • Bias Analysis and Fairness Evaluation
  • Summary and Conclusion
  • State of the Art Compliance and Development Lifecycle
    • Software Development Lifecycle Compliance
    • State of the Art in AI Development
    • Risk Management Throughout Lifecycle
    • Information Security
    • Verification and Validation Strategy
  • AI Risks Assessment Report
    • AI Risk Assessment
    • AI Risk Treatment
    • Residual AI Risk Assessment
    • AI Risk and Traceability with Safety Risk
    • Conclusion
  • Related Documents
    • Project Design and Plan
    • Data Collection and Annotation
All information contained in this QMS is confidential. The recipient agrees not to transmit or reproduce it, either directly or through third parties, by any means, without the prior written permission of Legit.Health (AI LABS GROUP S.L.).