R-TF-028-001 AI Description
Table of contents
- Purpose
- Scope
- Non-Clinical Model Overview
- Description and Specifications
- Integration and Environment
- References
- Traceability to QMS Records
Purpose
This document defines the specifications, performance requirements, and data needs for the non-clinical Artificial Intelligence (AI) models used in the Legit.Health Plus device.
Scope
This document details the design and performance specifications for all non-clinical AI algorithms integrated into the Legit.Health Plus device. It establishes the foundation for the development, validation, and risk management of these models.
This description covers the following key areas for each algorithm:
- Algorithm description, clinical objectives, and justification.
- Performance endpoints and acceptance criteria.
- Specifications for the data required for development and evaluation.
- Requirements related to cybersecurity, transparency, and integration.
- Links between the AI specifications and the overall risk management process.
Non-Clinical Model Overview
The Legit.Health Plus device integrates several non-clinical AI models that are essential for robust, equitable, and high-quality operation of the system. These models do not provide clinical diagnostic outputs, but instead perform technical, quality, and contextual functions that support the overall performance, safety, and fairness of the device. Non-clinical models include:
- Image quality and preprocessing models (e.g., color correction)
- Contextual attribute models (e.g., skin tone identification, body site identification)
- Technical validation models (e.g., 3D reconstruction for area quantification)
These models:
- Perform quality assurance, preprocessing, and technical validation
- Enable downstream clinical models to operate within validated domains and with standardized inputs
- Support equity, bias mitigation, and performance monitoring across diverse populations
- Provide structured, non-clinical metadata (e.g., skin tone, body site, image quality) to enhance device reliability and fairness
- Do not generate clinical diagnostic or severity outputs, nor do they provide interpretative distributions of ICD categories
Key Non-Clinical Models and Their Functions:
- Acneiform Inflammatory Pattern Identification: Translates objective lesion counts and density into standardized IGA severity scores, supporting consistent acne severity assessment.
- Skin Tone Identification: Automatically classifies images by Fitzpatrick and Monk skin tone scales to support bias mitigation, personalization, and regulatory compliance.
- Body Site Identification: Detects anatomical regions present in images, enabling context-aware processing, BSA calculations, and site-specific workflow optimization.
- 3D Surface Area Quantification: Transforms 2D image segmentations into real-world 3D measurements, supporting accurate, reproducible area and volume calculations for research and quality assurance.
- Color Correction: Standardizes color representation in images using reference markers, ensuring reliable color features for downstream models and human interpretation.
These non-clinical models are described in detail in the following section.
Description and Specifications
Acneiform Inflammatory Pattern Identification
Description
A mathematical equation ingests the tabular features derived from the Acneiform Inflammatory Lesion Quantification algorithm and outputs a score on the [0, 4] scale, aligned with the Investigator's Global Assessment (IGA).
The equation, with its fitted parameters, takes as input:
- The number of acneiform inflammatory lesions, N.
- The density of acneiform inflammatory lesions, D.
The final output is scaled by a factor of 2.5 to align with a [0, 10] scale rather than [0, 4], for a more granular output.
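The exact functional form and fitted parameters of the equation are defined elsewhere in the device documentation. Purely as an illustration, a hypothetical saturating mapping from the lesion count N and density D to an IGA-aligned score, with the x2.5 rescaling to [0, 10], might look like the sketch below. The function names, the logistic form, and the values of `alpha` and `beta` are all assumptions, not the device's validated equation.

```python
import math

def iga_score(n_lesions: float, density: float,
              alpha: float = 0.05, beta: float = 8.0) -> float:
    """Hypothetical IGA-aligned score on [0, 4].

    Combines lesion count and density through a saturating
    (logistic-style) curve; alpha and beta are illustrative
    parameters, not the device's validated values.
    """
    x = alpha * n_lesions + beta * density
    return 4.0 / (1.0 + math.exp(-(x - 2.0)))  # saturates toward 4

def aladin_score(n_lesions: float, density: float) -> float:
    """Rescale the [0, 4] IGA-aligned score to [0, 10] (x2.5)."""
    return 2.5 * iga_score(n_lesions, density)
```

The key property any such mapping must preserve is monotonicity: more lesions or higher density never lowers the severity score.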
Figure: Sample images with acneiform inflammatory lesion detections and their confidence, the number of lesions (N), the density of the lesions (D), the calculated IGA scores, and the calculated ALADIN scores.
Objectives
- Support healthcare professionals in providing standardized acne severity assessment using the validated Investigator's Global Assessment (IGA) scale.
- Reduce inter-observer variability in IGA scoring, which shows moderate agreement (κ = 0.50-0.70) between raters in clinical practice [112].
- Enable automated severity classification by translating objective lesion counts and density into clinically meaningful IGA categories.
- Ensure reproducibility by basing severity assessment on quantitative features rather than subjective visual impression.
- Facilitate treatment decision-making by providing standardized severity grades that align with evidence-based treatment guidelines (e.g., topical therapy for mild, systemic therapy for severe).
- Support clinical trial endpoints by providing consistent, reproducible IGA assessments as required by regulatory agencies.
Justification (Clinical Evidence):
- The IGA scale is a widely validated tool for acne severity assessment and is the most commonly used primary endpoint in acne clinical trials [113, 114].
- Manual IGA assessment shows substantial inter-observer variability (κ = 0.50-0.70), with particular difficulty in distinguishing between adjacent grades [112].
- Objective lesion counting combined with algorithmic severity classification has been shown to improve consistency (κ improvement to 0.75-0.85) compared to purely visual IGA assessment [115].
- Treatment guidelines are explicitly linked to IGA grades, with clear recommendations for topical monotherapy (IGA 1-2), combination therapy (IGA 2-3), and systemic therapy consideration (IGA 3-4) [116].
- Regulatory agencies require validated severity measures for acne trials, with IGA being the most accepted scale for primary efficacy endpoints [117].
- Studies show that automated severity grading reduces assessment time by 40-60% while maintaining or improving accuracy compared to manual grading [118].
Endpoints and Requirements
Performance is evaluated using the Pearson correlation coefficient between the predicted scores and the expert consensus, to ensure that the model aligns with the criteria of expert dermatologists.
| Metric | Threshold | Interpretation |
|---|---|---|
| Pearson correlation | ≥ Expert inter-observer correlation | Model performance is non-inferior to expert inter-observer variability |
Justification of the success criteria:
- IGA is the scoring system recommended by the FDA for acne severity assessment in clinical trials. Therefore, we seek a high correlation with this scale.
- IGA is inherently subjective, with documented inter-observer variability among dermatologists.
- The established success criterion ensures that the model's predictions are no less reliable than those made by expert dermatologists, making it suitable for clinical and research applications.
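A minimal sketch of how the non-inferiority criterion could be checked, assuming a bootstrap 95% confidence interval on the Pearson correlation; the helper names and the bootstrap approach are illustrative, not a prescribed method:

```python
import numpy as np

def pearson_with_ci(pred, consensus, n_boot=2000, seed=0):
    """Pearson r between model predictions and expert consensus,
    with a bootstrap 95% confidence interval."""
    pred = np.asarray(pred, dtype=float)
    consensus = np.asarray(consensus, dtype=float)
    r = float(np.corrcoef(pred, consensus)[0, 1])
    rng = np.random.default_rng(seed)
    idx = rng.integers(0, len(pred), size=(n_boot, len(pred)))
    boot = np.array([np.corrcoef(pred[i], consensus[i])[0, 1] for i in idx])
    lo, hi = np.percentile(boot, [2.5, 97.5])
    return r, (float(lo), float(hi))

def non_inferior(r_ci_low, r_inter_observer):
    """Succeed if the lower CI bound of the model-consensus correlation
    is at least the expert inter-observer correlation."""
    return r_ci_low >= r_inter_observer
```

Comparing the lower bound of the confidence interval (rather than the point estimate) against the expert baseline is one conservative way to operationalize "non-inferior with 95% confidence".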
Requirements:
- Implement a tabular model (e.g., gradient boosting, mathematical equation, random forest, neural network, or other ML model) that:
- Accepts numerical inputs derived from Acneiform Inflammatory Lesion Quantification, such as the total inflammatory lesion count, lesion density, anatomical site identifiers, and affected surface area.
- Outputs a severity score highly correlated to the IGA scale.
- Demonstrate a correlation with the ground-truth data non-inferior to the inter-observer variability among expert dermatologists.
- Report all metrics with 95% confidence intervals.
- Validate the model on an independent and diverse dataset including:
- Full range of IGA grades (0-4)
- Diverse patient populations (e.g., various Fitzpatrick skin types)
- Ensure outputs are compatible with:
- FHIR-based structured reporting for interoperability
- Clinical decision support systems for acne treatment recommendations
- Treatment guidelines that specify interventions based on IGA grade
- Clinical trial data collection systems requiring standardized IGA assessments
- Document the model optimization strategy including:
- Feature design
- Hyperparameter optimization methodology
- Rationale for model selection (if multiple architectures compared)
- Provide evidence that:
- The model generalizes across different patient populations
- Predictions align with dermatologist consensus and clinical treatment guidelines
Skin Tone Identification
Description
A deep learning multi-class classification model ingests a clinical dermatological image and outputs two probability distributions: one for the six Fitzpatrick skin tone categories and another for the ten Monk skin tone categories.
Fitzpatrick Skin Tone Classification
The model outputs a probability vector $(p_1, \dots, p_6)$, where each $p_i$ corresponds to the probability that the skin in the image belongs to Fitzpatrick skin tone $i$, and $\sum_{i=1}^{6} p_i = 1$.
The Fitzpatrick skin tones are defined as:
- Type I: Very fair skin, always burns, never tans (pale white skin, often with red/blonde hair)
- Type II: Fair skin, usually burns, tans minimally (white skin, burns easily)
- Type III: Medium skin, sometimes burns, tans uniformly (cream white skin, burns moderately)
- Type IV: Olive skin, rarely burns, tans easily (moderate brown skin)
- Type V: Brown skin, very rarely burns, tans very easily (dark brown skin)
- Type VI: Dark brown to black skin, never burns, tans very easily (deeply pigmented dark brown to black skin)
The predicted Fitzpatrick type is:

$\hat{i} = \arg\max_{i \in \{1, \dots, 6\}} p_i$
Monk Skin Tone Classification
The model outputs a probability vector $(q_0, \dots, q_9)$, where each $q_j$ corresponds to the probability that the skin in the image belongs to Monk skin tone category $j$, and $\sum_{j=0}^{9} q_j = 1$.
The Monk skin tone categories are defined as:
- Category 0: Lightest skin tone
- Category 1: Very light skin tone
- Category 2: Light skin tone
- Category 3: Light-medium skin tone
- Category 4: Medium skin tone
- Category 5: Medium-dark skin tone
- Category 6: Dark skin tone
- Category 7: Very dark skin tone
- Category 8: Deeply dark skin tone
- Category 9: Darkest skin tone
The predicted Monk skin tone is:

$\hat{j} = \arg\max_{j \in \{0, \dots, 9\}} q_j$
Additional Outputs
For both the Fitzpatrick and Monk classifications, the model outputs a confidence score representing the certainty of the classification.
Figure: Sample images with their predicted Fitzpatrick and Monk skin tone categories.
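As an illustration of the decision rule described above, the argmax over each probability vector can be sketched as follows. The assumption that the confidence score is the maximum class probability is ours; the document does not mandate a particular confidence definition.

```python
import numpy as np

def classify_skin_tone(fitz_probs, monk_probs):
    """Return (predicted class, confidence) for each scale.

    Confidence here is taken as the maximum class probability,
    an assumption consistent with the argmax decision rule.
    Fitzpatrick types are indexed 1-6, Monk categories 0-9.
    """
    fitz_probs = np.asarray(fitz_probs, dtype=float)
    monk_probs = np.asarray(monk_probs, dtype=float)
    assert len(fitz_probs) == 6 and np.isclose(fitz_probs.sum(), 1.0)
    assert len(monk_probs) == 10 and np.isclose(monk_probs.sum(), 1.0)
    fitz_type = int(np.argmax(fitz_probs)) + 1   # types indexed 1..6
    monk_cat = int(np.argmax(monk_probs))        # categories indexed 0..9
    return ((fitz_type, float(fitz_probs.max())),
            (monk_cat, float(monk_probs.max())))
```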
Objectives
- Enable automated skin tone detection to support personalized dermatological AI models that require skin tone information for accurate predictions.
- Reduce assessment variability in skin tone classification, which shows moderate inter-observer agreement (κ = 0.50-0.65) even among dermatologists [213, 214].
- Support bias mitigation in AI models by identifying underrepresented skin tones in datasets and ensuring equitable performance across all Fitzpatrick and Monk skin tones.
- Facilitate treatment personalization by providing objective skin tone information relevant for phototherapy dosing, laser treatment parameters, and topical therapy selection.
- Enable research stratification by providing consistent skin tone classification for clinical trials and real-world evidence studies.
- Support regulatory compliance by ensuring AI models are validated across diverse skin tones as required by regulatory guidelines.
- Improve telemedicine accessibility by providing automated skin tone assessment in remote settings where patient-reported skin tone may be unreliable.
Justification (Clinical Evidence):
- Fitzpatrick skin tone is a critical factor in dermatological assessment, influencing disease presentation, treatment selection, and AI model performance [215, 216].
- Self-reported Fitzpatrick type shows poor accuracy, with concordance to expert assessment ranging from 40-60%, particularly for intermediate types (III-IV) [217, 218].
- AI model performance shows significant disparities across skin tones, with accuracy degradation of 10-30% for darker skin tones (V-VI) when models are trained on predominantly lighter skin datasets [219, 220].
- Automated skin tone detection enables adaptive AI models that adjust prediction thresholds or use skin tone-specific models, improving accuracy by 15-25% for underrepresented groups [221].
- Treatment dosing for phototherapy and laser procedures requires accurate skin tone assessment, with misclassification leading to suboptimal efficacy or adverse events in 15-20% of cases [222].
- Clinical trials increasingly require Fitzpatrick type stratification to demonstrate equitable treatment efficacy and safety across diverse populations [223].
- Studies show that objective skin tone classification improves inter-rater reliability from κ = 0.50-0.65 (manual) to κ = 0.75-0.85 (automated) [224].
- Automated detection addresses the limitation of visual assessment under different lighting conditions, which can shift perceived skin tone by 1-2 categories [225].
Endpoints and Requirements
Performance is evaluated using classification accuracy and mean absolute error compared to expert Fitzpatrick and Monk assessments.
| Metric | Threshold | Interpretation |
|---|---|---|
| Fitzpatrick Accuracy | ≥ inter-rater variability | Performance non-inferior to expert criteria. |
| Fitzpatrick MAE | ≤ 1 | Average error of at most one category from expert criteria. |
| Monk Accuracy | ≥ inter-rater variability | Performance non-inferior to expert criteria. |
| Monk MAE | ≤ 1 | Average error of at most one category from expert criteria. |
All thresholds must be achieved with 95% confidence intervals.
Threshold Justification:
- The difficulty of skin tone assessment varies significantly between clinical and non-clinical settings.
- The difficulty of skin tone assessment depends on the illuminance quality of the images, which varies significantly between datasets.
- Therefore, it is more appropriate to evaluate model performance against the inter-rater variability established for the specific evaluation dataset, rather than a fixed absolute threshold.
- The ordinal nature of skin tone categories means that adjacent-type misclassifications are more clinically acceptable than distant errors. Therefore, a Mean Absolute Error (MAE) of ≤ 1 category allows for acceptable errors in the vicinity of the true stage, reflecting real-world clinical variability.
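The accuracy and MAE endpoints above can be sketched as a small evaluation helper; this is illustrative only, assuming `pred` and `truth` are integer category labels on the same ordinal scale:

```python
import numpy as np

def ordinal_metrics(pred, truth):
    """Accuracy and mean absolute error for ordinal skin tone labels.

    MAE treats the categories as ordinal, so an adjacent-category
    error contributes 1 and a two-category error contributes 2.
    """
    pred = np.asarray(pred, dtype=float)
    truth = np.asarray(truth, dtype=float)
    accuracy = float(np.mean(pred == truth))
    mae = float(np.mean(np.abs(pred - truth)))
    return accuracy, mae
```

Because MAE weights errors by ordinal distance, it captures the requirement that adjacent-category misclassifications are more acceptable than distant ones.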
Requirements:
- Implement a deep learning classification architecture optimized for skin tone analysis.
- Output structured data including:
- Probability distribution across all categories
- Predicted skin tone with confidence score
- Demonstrate performance meeting or exceeding all thresholds:
- Overall accuracy ≥ inter-rater variability
- MAE ≤ 1
- Ensure outputs are compatible with:
- Downstream AI models that require skin tone information as input
- FHIR-based structured reporting for interoperability
- Clinical decision support systems for treatment personalization
- Bias monitoring dashboards tracking AI performance across skin tones
- Research data collection systems for clinical trial stratification
- Document the training strategy including:
- Data collection protocol ensuring balanced representation
- Multi-expert annotation protocol for ground truth establishment
- Handling of class imbalance (if present)
- Data augmentation strategies preserving skin tone characteristics
- Regularization and calibration techniques
- Transfer learning approach (if applicable)
- Provide evidence that:
- The model provides equitable performance for all categories (no systematic bias)
- Predictions align with expert dermatologist consensus
- Include bias assessment and mitigation:
- Regular auditing of performance disparities across skin tones
- Documentation of dataset composition by skin tone category
- Strategies for addressing underrepresentation in training data
- Transparency reporting on per-type performance metrics
- Continuous monitoring of real-world performance across diverse populations
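A minimal sketch of the per-category performance audit described above, assuming accuracy as the monitored metric; the disparity definition here (maximum minus minimum per-category accuracy) is our assumption, not a mandated formula:

```python
import numpy as np

def per_category_accuracy(pred, truth, categories):
    """Per-skin-tone accuracy plus the largest pairwise disparity,
    for auditing performance across skin tone categories."""
    pred, truth = np.asarray(pred), np.asarray(truth)
    acc = {}
    for c in categories:
        mask = truth == c
        if mask.any():  # skip categories absent from the audit set
            acc[c] = float(np.mean(pred[mask] == truth[mask]))
    disparity = max(acc.values()) - min(acc.values())
    return acc, disparity
```

Reporting the disparity alongside per-category accuracy makes systematic bias visible in a single number that can be tracked over time.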
Clinical Impact:
The Skin Tone Identification model serves multiple critical functions:
- Bias mitigation: Enables skin tone-aware AI models that maintain equitable performance across all populations
- Treatment personalization: Supports accurate dosing for phototherapy, laser procedures, and skin-tone-specific therapeutics
- Research equity: Ensures clinical trials include and stratify diverse skin tones for representative evidence
- Quality assurance: Validates that dermatological AI systems perform equitably across all skin tone categories
- Regulatory compliance: Demonstrates AI model validation across diverse populations as required by regulatory agencies
- Clinical workflow integration: Provides automated skin tone documentation for electronic health records
Skin 3D Reconstruction
Description
This method transforms 2D pixel coordinates from a standard 2D image into 3D metric world coordinates, enabling comprehensive and accurate spatial analysis of skin surfaces.
This method leverages 3D metric maps and camera calibration parameters to convert pixel coordinates into real-world measurements, accounting for depth variations and perspective distortion.
For any given pixel coordinate $(u, v)$, the 3D metric world coordinates $(X, Y, Z)$ are computed using the following equation:

$$\begin{pmatrix} X \\ Y \\ Z \end{pmatrix} = d(u, v) \, K^{-1} \begin{pmatrix} u \\ v \\ 1 \end{pmatrix}$$

Where:
- $K$ is the camera intrinsic matrix.
- $d(u, v)$ is the metric depth value at pixel $(u, v)$.
By applying this mathematical transformation to every point of a target surface segmentation, the method allows for straightforward geometric analysis, including the calculation of area, perimeter, axes, volume, and depth.
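Under the pinhole camera model implied by the description, the back-projection of a pixel with known metric depth can be sketched as follows; the intrinsic values in `K` are illustrative placeholders, not calibration results from the device:

```python
import numpy as np

def backproject(u, v, depth, K):
    """Back-project pixel (u, v) with metric depth d into 3D camera
    coordinates: X = d * K^{-1} [u, v, 1]^T."""
    pix = np.array([u, v, 1.0])
    return depth * np.linalg.inv(K) @ pix

# Illustrative intrinsics: focal lengths fx = fy = 800 px,
# principal point (cx, cy) = (320, 240). Assumed values only.
K = np.array([[800.0,   0.0, 320.0],
              [  0.0, 800.0, 240.0],
              [  0.0,   0.0,   1.0]])
```

As a sanity check, the principal point at depth 0.5 m back-projects to (0, 0, 0.5): the ray through the optical center has no lateral offset. Applying this to every pixel of a segmentation yields a 3D point grid whose triangulated faces give area, perimeter, and volume.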
Figure: Left: Sample image with a delineated target surface and 3 reference markers; results show the area (A), perimeter (P), depth (D), volume (V), and dimensions (width x height) in cm. Center: depth map of the image and marker detections. Right: 3D visualization of the target surface, markers, and surface axes (width, height, and depth), side perspective.
Objectives
- Enable accurate surface area quantification in body surface area (BSA) affected calculations for severity scoring systems (PASI, EASI, burn assessment, vitiligo VASI).
- Account for depth variation across non-planar body surfaces, providing more accurate measurements than simple 2D planimetry.
- Reduce measurement error associated with perspective distortion, camera angle, and irregular body surface curvature.
- Provide calibrated measurements in standardized physical units (e.g., mm²) for clinical documentation and research.
- Enable automated BSA percentage calculation by combining surface area measurements with body site identification.
- Support telemedicine workflows where physical ruler measurements are impractical or unavailable.
Justification (Clinical Evidence):
- Body surface area quantification is fundamental to severity scoring in dermatology, with PASI, EASI, and burn assessment all requiring accurate BSA affected estimates [275, 276].
- Manual BSA estimation shows high inter-observer variability (coefficient of variation 20-40%), particularly for irregular lesions or when visual estimation methods are used [277, 278].
- Simple 2D planimetry without depth correction introduces systematic errors of 15-35% when measuring non-planar body surfaces due to perspective distortion and surface curvature [279].
- Reference-based calibration has been validated in wound measurement showing accuracy within 5-10% of gold-standard methods (water displacement, 3D scanning) [280, 281].
- Monocular depth estimation combined with calibration markers achieves mean absolute error <8% for surface area quantification on curved surfaces [282].
- Automated BSA quantification improves reproducibility in clinical trials, with standardized measurements showing 50-70% reduction in outcome variability compared to visual estimation [283].
- Depth-aware surface area calculation is particularly critical for body sites with significant curvature (joints, torso, scalp) where 2D approximations introduce substantial error [284].