R-TF-028-001 AI Description
Table of contents
- Purpose
- Scope
- Algorithm summary
- Algorithm Classification
- Description and Specifications
- ICD Category Distribution and Binary Indicators
- Erythema Intensity Quantification
- Desquamation Intensity Quantification
- Induration Intensity Quantification
- Pustule Intensity Quantification
- Crusting Intensity Quantification
- Xerosis Intensity Quantification
- Swelling Intensity Quantification
- Oozing Intensity Quantification
- Excoriation Intensity Quantification
- Lichenification Intensity Quantification
- Wound Characteristic Assessment
- Body Surface Segmentation
- Wound Surface Quantification
- Hair Loss Surface Quantification
- Hair Follicle Quantification
- Inflammatory Nodular Lesion Quantification
- Acneiform Lesion Type Quantification
- Inflammatory Lesion Quantification
- Hive Lesion Quantification
- Nail Lesion Surface Quantification
- Hypopigmentation or Depigmentation Surface Quantification
- Hyperpigmentation Surface Quantification
- Acneiform Inflammatory Pattern Identification
- Follicular and Inflammatory Pattern Identification
- Inflammatory Pattern Identification
- Inflammatory Pattern Indicator
- Dermatology Image Quality Assessment (DIQA)
- Fitzpatrick Skin Type Identification
- Domain Validation
- Skin Surface Segmentation
- Surface Area Quantification
- Body Site Identification
- Head Detection
- Data Specifications
- Other Specifications
- Cybersecurity and Transparency
- Specifications and Risks
- Integration and Environment
- References
- Traceability to QMS Records
Purpose
This document defines the specifications, performance requirements, and data needs for the Artificial Intelligence (AI) models used in the Legit.Health Plus device.
Scope
This document details the design and performance specifications for all AI algorithms integrated into the Legit.Health Plus device. It establishes the foundation for the development, validation, and risk management of these models.
This description covers the following key areas for each algorithm:
- Algorithm description, clinical objectives, and justification.
 - Performance endpoints and acceptance criteria.
 - Specifications for the data required for development and evaluation.
 - Requirements related to cybersecurity, transparency, and integration.
 - Links between the AI specifications and the overall risk management process.
 
Algorithm summary
| ID | Model Name | Type | Task Type | Visible Signs | Clinical Context | 
|---|---|---|---|---|---|
| 1 | ICD Category Distribution and Binary Indicators | 🔬 Clinical | Classification | All Dermatological Conditions | ICD-11, Diagnosis, Triage | 
| 2 | Erythema Intensity Quantification | 🔬 Clinical | Ordinal Classification | Erythema | PASI, EASI, SCORAD, GPPGA, PPPASI | 
| 3 | Desquamation Intensity Quantification | 🔬 Clinical | Ordinal Classification | Desquamation | PASI, GPPGA, PPPASI | 
| 4 | Induration Intensity Quantification | 🔬 Clinical | Ordinal Classification | Induration | PASI | 
| 5 | Pustule Intensity Quantification | 🔬 Clinical | Ordinal Classification | Pustule | PPPASI, GPPGA, Acne | 
| 6 | Crusting Intensity Quantification | 🔬 Clinical | Ordinal Classification | Crusting | EASI, SCORAD | 
| 7 | Xerosis Intensity Quantification | 🔬 Clinical | Ordinal Classification | Xerosis | EASI, SCORAD, ODS | 
| 8 | Swelling Intensity Quantification | 🔬 Clinical | Ordinal Classification | Swelling | EASI, SCORAD | 
| 9 | Oozing Intensity Quantification | 🔬 Clinical | Ordinal Classification | Oozing | EASI, SCORAD | 
| 10 | Excoriation Intensity Quantification | 🔬 Clinical | Ordinal Classification | Excoriation | EASI, SCORAD | 
| 11 | Lichenification Intensity Quantification | 🔬 Clinical | Ordinal Classification | Lichenification | EASI, SCORAD | 
| 12 | Wound Characteristic Assessment | 🔬 Clinical | Multi Task Multi Output | Perilesional Erythema, Damaged Edges, Delimited Edges, Diffuse Edges, Thickened Edges, Indistinguishable Edges, Perilesional Maceration, Biofilm-Compatible Tissue, Fibrinous Exudate, Purulent Exudate, Bloody Exudate, Serous Exudate, Greenish Exudate | AWOSI, Wound Assessment, NPUAP | 
| 13 | Wound Surface Quantification | 🔬 Clinical | Multi Class Segmentation | Erythema, Wound Bed, Angiogenesis and Granulation Tissue, Biofilm and Slough, Necrosis, Maceration, Orthopedic Material, Bone, Cartilage, or Tendon | Wound Assessment, AWOSI | 
| 14 | Hair Loss Surface Quantification | 🔬 Clinical | Segmentation | Alopecia | SALT, APULSI, Alopecia Assessment | 
| 15 | Hair Follicle Quantification | 🔬 Clinical | Object Detection | Hair Follicles | Androgenetic Alopecia, Alopecia Areata, Telogen Effluvium, Hair Transplantation, Treatment Monitoring | 
| 16 | Inflammatory Nodular Lesion Quantification | 🔬 Clinical | Multi Class Object Detection | Nodule, Abscess, Non-draining Tunnel, Draining Tunnel | IHS4, Hidradenitis Suppurativa | 
| 17 | Acneiform Lesion Type Quantification | 🔬 Clinical | Multi Class Object Detection | Papule, Pustule, Cyst, Comedone, Nodule | GAGS, IGA, ASI, Acne Assessment | 
| 18 | Inflammatory Lesion Quantification | 🔬 Clinical | Object Detection | Inflammatory Lesion | PASI, EASI, Inflammatory Dermatoses | 
| 19 | Hive Lesion Quantification | 🔬 Clinical | Object Detection | Hive | UAS7, UCT, Urticaria Assessment | 
| 20 | Nail Lesion Surface Quantification | 🔬 Clinical | Segmentation | Nail Lesion | NAPSI, Nail Psoriasis, Onychomycosis | 
| 21 | Hypopigmentation or Depigmentation Surface Quantification | 🔬 Clinical | Segmentation | Hypopigmentation or Depigmentation | VASI, VETF, Vitiligo Assessment | 
| 22 | Hyperpigmentation Surface Quantification | 🔬 Clinical | Segmentation | Hyperpigmentation | MASI, mMASI, Melasma Assessment, PIH Assessment | 
| 23 | Acneiform Inflammatory Pattern Identification | 🔬 Clinical | Tabular Classification | Inflammatory Lesion Count, Lesion Density | IGA, Acne Assessment | 
| 24 | Follicular and Inflammatory Pattern Identification | 🔬 Clinical | Classification | — | Hidradenitis Suppurativa, Martorell Classification, HS Phenotyping | 
| 25 | Inflammatory Pattern Identification (Hurley Staging) | 🔬 Clinical | Classification | — | Hidradenitis Suppurativa, Hurley Staging, HS Severity | 
| 26 | Inflammatory Pattern Indicator | 🔬 Clinical | Classification | — | Hidradenitis Suppurativa, Disease Activity, Treatment Selection | 
| 27 | Body Surface Segmentation | 🛠️ Non-Clinical | Multi Class Segmentation | — | PASI, EASI, BSA Calculation, Burn Assessment | 
| 28 | Surface Area Quantification | 🛠️ Non-Clinical | Regression | — | BSA Calculation, Surface Area Measurement, Calibration | 
| 29 | Dermatology Image Quality Assessment (DIQA) | 🛠️ Non-Clinical | Regression | — | Quality Control, Telemedicine | 
| 30 | Fitzpatrick Skin Type Identification | 🛠️ Non-Clinical | Classification | — | Bias Monitoring, Equity Assessment, Performance Stratification | 
| 31 | Domain Validation | 🛠️ Non-Clinical | Classification | — | Image Routing, Quality Control, Domain Classification | 
| 32 | Skin Surface Segmentation | 🛠️ Non-Clinical | Segmentation | — | Preprocessing, ROI Extraction, Skin Detection | 
| 33 | Body Site Identification | 🛠️ Non-Clinical | Classification | — | Anatomical Context, Site-Specific Analysis, Documentation | 
| 34 | Head Detection | 🛠️ Non-Clinical | Object Detection | — | Privacy Protection, Quality Control, Patient Counting, Multi-patient Detection | 
Algorithm Classification
The AI algorithms in the Legit.Health Plus device are classified into two categories based on their relationship to the device's intended purpose as defined in the Technical Documentation.
Clinical Models
Clinical models are AI algorithms that directly fulfill the device's intended purpose by providing one or more of the following outputs to healthcare professionals:
- Quantitative data on clinical signs (severity measurement of dermatological features)
 - Interpretative distribution of ICD categories (diagnostic support for skin conditions)
 
These models:
- Directly contribute to the device's medical purpose of supporting healthcare providers in assessing skin structures
 - Provide outputs that healthcare professionals use for diagnosis, monitoring, or treatment decisions
 - Generate quantitative measurements or probability distributions that constitute medical information
 - Are integral to the clinical claims and intended use of the device
 - Are subject to full clinical validation and regulatory requirements under MDR 2017/745 and RDC 751/2022
 
Non-Clinical Models
Non-clinical models are AI algorithms that enable the proper functioning of the device but do not themselves provide the outputs defined in the intended purpose. These models:
- Perform quality assurance, preprocessing, or technical validation functions
 - Ensure that clinical models receive appropriate inputs and operate within their validated domains
 - Support equity, bias mitigation, and performance monitoring across diverse populations
 - Do not generate quantitative data on clinical signs or interpretative distributions of ICD categories
 - Do not independently provide medical information used for diagnosis, monitoring, or treatment decisions
 - Serve as auxiliary technical infrastructure supporting clinical model performance and patient safety
 
Important Distinctions:
- Clinical models directly fulfill the intended purpose: "to provide quantitative data on clinical signs and an interpretative distribution of ICD categories to healthcare professionals for assessing skin structures"
 - Non-clinical models enable clinical models to function properly but do not themselves provide the quantitative or interpretative outputs defined in the intended purpose
 
Description and Specifications
ICD Category Distribution and Binary Indicators
Model Classification: 🔬 Clinical Model
Description
ICD Category Distribution
We employ a deep learning model to analyze clinical or dermoscopic lesion images and output a probability distribution across ICD-11 categories. Deep learning-based image classifiers can be designed to recognize fine-grained disease categories with high variability, leveraging mechanisms to capture both local and global image features [1,2,9].
Given an image and optional basic clinical metadata (age, sex, and body site), this model outputs a normalized probability vector:

$$\mathbf{p} = (p_1, p_2, \dots, p_K)$$

where each $p_i$ corresponds to the probability that the lesion belongs to the $i$-th ICD-11 category, and $\sum_{i=1}^{K} p_i = 1$.
The system highlights the top five ICD-11 disease categories, each accompanied by its corresponding code and confidence score, thereby supporting clinicians with both ranking and probability information, a strategy shown to enhance diagnostic confidence and interpretability in multi-class dermatological AI systems [2,3].
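For illustration, the ranked shortlist can be derived from the probability vector by simple sorting. A minimal sketch, in which the category codes and probabilities are illustrative placeholders rather than outputs of the actual model:

```python
def top_k_categories(probs, codes, k=5):
    """Return the k highest-probability categories as (code, probability) pairs."""
    ranked = sorted(zip(codes, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]

# Illustrative 4-category distribution; codes are placeholders, not real ICD-11 mappings
probs = [0.05, 0.60, 0.25, 0.10]
codes = ["CAT-A", "CAT-B", "CAT-C", "CAT-D"]
print(top_k_categories(probs, codes, k=2))  # [('CAT-B', 0.6), ('CAT-C', 0.25)]
```

In deployment, each returned code would be paired with its ICD-11 label before presentation to the clinician.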
Binary Indicators
Binary indicators are derived from the ICD-11 probability distribution as a post-processing step using a dermatologist-defined mapping matrix. Each indicator reflects the aggregated probability that a case belongs to clinically meaningful categories requiring differential triage or diagnostic attention.
The six binary indicators are:
- Malignant: probability that the lesion is a confirmed malignancy (e.g., melanoma, squamous cell carcinoma, basal cell carcinoma).
 - Pre-malignant: probability of conditions with malignant potential (e.g., actinic keratosis, Bowen's disease).
 - Associated with malignancy: benign or inflammatory conditions with frequent overlap or mimicry of malignant presentations (e.g., atypical nevi, pigmented seborrheic keratoses).
 - Pigmented lesion: probability that the lesion belongs to the pigmented subgroup, important for melanoma risk assessment.
 - Urgent referral: lesions likely requiring dermatological evaluation within 48 hours (e.g., suspected melanoma, rapidly growing nodular lesions, bleeding or ulcerated malignancies).
 - High-priority referral: lesions that should be seen within 2 weeks according to dermatology referral guidelines (e.g., suspected non-melanoma skin cancer, premalignant lesions with malignant potential).
 
For $K$ categories and 6 indicators, the mapping matrix $M$ has a size of $6 \times K$. Thus, the computation of each indicator is defined as:

$$b_j = \sum_{i=1}^{K} M_{ji} \, p_i, \quad j = 1, \dots, 6$$

where $p_i$ is the probability for the $i$-th ICD-11 category, and $M_{ji}$ is the binary weight coefficient ($M_{ji} \in \{0, 1\}$) that indicates whether category $i$ contributes to indicator $j$.
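For illustration, the aggregation reduces to multiplying each 0/1 row of the mapping matrix with the probability vector. A minimal sketch with a toy matrix; the dermatologist-defined mapping is not reproduced here:

```python
def binary_indicators(probs, mapping):
    """Aggregate category probabilities into indicator scores.

    probs:   K category probabilities (summing to 1).
    mapping: rows of 0/1 weights; row j selects the categories
             that contribute to indicator j.
    """
    return [sum(w * p for w, p in zip(row, probs)) for row in mapping]

# Toy example: K = 4 categories, two of the six indicators shown
probs = [0.10, 0.50, 0.30, 0.10]
mapping = [
    [1, 1, 0, 0],  # illustrative "malignant" row
    [0, 0, 1, 0],  # illustrative "pre-malignant" row
]
print(binary_indicators(probs, mapping))  # [0.6, 0.3]
```

Because the weights are binary, each indicator is simply the summed probability mass of its contributing categories.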
Objectives
ICD Category Distribution Objectives
- Improve diagnostic accuracy, aiming for an uplift of approximately 10–15% in top-1 and top-5 prediction metrics compared to baseline approaches [4,5,6].
 - Assist clinicians in differential diagnosis, especially in ambiguous or rare cases, by presenting a ranked shortlist that enables efficient decision-making.
 - Enhance trust and interpretability—leveraging attention maps to offer transparent reasoning and evidence for suggested categories [7].
 
Justification: Presenting a ranked list of likely diagnoses (e.g., top-5) is evidence-based.
- In reader studies, AI-based multiclass probabilities improved clinician accuracy beyond AI or physicians alone, with the largest benefit for less experienced clinicians [8,9].
 - Han et al. reported sensitivity +12.1%, specificity +1.1%, and top-1 accuracy +7.0% improvements when physicians were supported with AI outputs including top-k predictions [9].
 - Clinical decision support tools providing ranked differentials improved diagnostic accuracy by up to 34% without prolonging consultations [10].
 - Systematic reviews confirm that AI assistance consistently improves clinician accuracy, especially for non-specialists [11,12].
 
Binary Indicator Objectives
- Clinical triage support: Provide clinicians with clear case-prioritization signals, improving patient flow and resource allocation [13, 14].
 - Malignancy risk quantification: Objectively assess malignancy and premalignancy likelihood to reduce missed diagnoses [15].
 - Referral urgency standardization: Align algorithm outputs with international clinical guidelines for dermatology referrals, e.g., NICE and EADV recommendations: urgent (≤48h), high-priority (≤2 weeks) [16, 17].
 - Improve patient safety: Flag high-risk pigmented lesions for expedited evaluation, ensuring melanoma is not delayed in triage [18, 19].
 - Reduce variability: Decrease inter-observer variation in urgency assignment by providing consistent, evidence-based binary outputs [20].
 
Justification:
- Binary classification systems for malignancy risk have demonstrated clinical utility in improving referral appropriateness and reducing diagnostic delays [13, 15].
 - Standardized triage tools based on objective criteria show reduced inter-observer variability (κ improvement from 0.45 to 0.82) compared to subjective clinical judgment alone [20].
 - Integration of urgency indicators into clinical workflows has been associated with improved melanoma detection rates and reduced time to specialist evaluation [18, 19].
 
Endpoints and Requirements
ICD Category Distribution Endpoints and Requirements
Performance is evaluated using Top-k Accuracy compared to expert-labeled ground truth.
| Metric | Threshold | Interpretation | 
|---|---|---|
| Top-1 Accuracy | ≥ 55% | Meets minimum diagnostic utility | 
| Top-3 Accuracy | ≥ 70% | Reliable differential diagnosis | 
| Top-5 Accuracy | ≥ 80% | Substantial agreement with expert performance | 
All thresholds have been set according to existing literature on fine-grained skin disease classification [1,9], and they must be achieved with 95% confidence intervals.
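Top-k Accuracy counts a case as correct when the true category appears among the k highest-probability predictions. A minimal sketch on toy data:

```python
def top_k_accuracy(prob_rows, truths, k):
    """Fraction of cases whose true category index appears in the top-k predictions."""
    hits = 0
    for probs, truth in zip(prob_rows, truths):
        # Indices of the k largest probabilities
        top_k = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
        hits += truth in top_k
    return hits / len(truths)

# Three toy cases over 4 categories; true labels are category indices
rows = [[0.7, 0.1, 0.1, 0.1],
        [0.2, 0.5, 0.2, 0.1],
        [0.4, 0.3, 0.2, 0.1]]
truths = [0, 2, 3]
print(top_k_accuracy(rows, truths, k=1))  # 1/3 correct at top-1
```

The same function evaluated with k = 1, 3, and 5 yields the three metrics in the table above.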
Requirements:
- Implement image analysis models capable of ICD classification [15].
 - Output normalized probability distributions (sum = 100%).
 - Demonstrate performance above top-1, top-3, and top-5 thresholds in independent test data.
 - Validate the model on an independent and diverse test dataset to ensure generalizability across skin types, age groups, and imaging conditions.
 
Binary Indicator Endpoints and Requirements
Performance of binary indicators is evaluated using AUC (Area Under the ROC Curve) against dermatologists' consensus labels.
| AUC Score | Agreement Category | Interpretation | 
|---|---|---|
| < 0.70 | Poor | Not acceptable for clinical use | 
| 0.70 - 0.79 | Fair | Below acceptance threshold | 
| 0.80 - 0.89 | Good | Meets acceptance threshold | 
| 0.90 - 0.95 | Excellent | High robustness | 
| > 0.95 | Outstanding | Near-expert level performance | 
Success criteria:
Each binary indicator must achieve AUC ≥ 0.80 with 95% confidence intervals, validated against independent datasets including malignant, premalignant, associated-with-malignancy, pigmented, urgent, and high-priority referral cases.
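For reference, the AUC of one indicator can be computed with the rank-based (Mann–Whitney) formulation: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one. A self-contained sketch on toy labels:

```python
def auc(scores, labels):
    """Rank-based AUC: probability that a random positive outscores a random
    negative; ties count as half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy indicator scores against consensus labels (1 = positive case)
scores = [0.9, 0.7, 0.6, 0.2]
labels = [1, 0, 1, 0]
print(auc(scores, labels))  # 0.75
```

In practice, confidence intervals around the AUC would be obtained with a resampling method such as bootstrapping over the test set.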
Requirements:
- Implement all six binary indicators:
  - Malignant
  - Pre-malignant
  - Associated with malignancy
  - Pigmented lesion
  - Urgent referral (≤48h)
  - High-priority referral (≤2 weeks)
- Define and document the dermatologist-validated mapping matrix $M$.
- Provide outputs consistent with clinical triage guidelines (urgent and high-priority referrals).
- Validate performance on diverse and independent datasets representing both common and rare conditions, as well as positive and negative cases for each indicator.
- Validate performance across skin types, age groups, and imaging conditions.
- Ensure AUC ≥ 0.80 across all indicators with reporting of 95% confidence intervals.
 
Erythema Intensity Quantification
Model Classification: 🔬 Clinical Model
Description
A deep learning model ingests a clinical image of a skin lesion and outputs a probability vector:

$$\mathbf{p} = (p_1, p_2, \dots, p_{10})$$

where each $p_i$ (for $i = 1, \dots, 10$) corresponds to the model’s softmax-normalized probability that the erythema intensity belongs to ordinal category $i$ (ranging from minimal to maximal erythema).
Although the outputs are numeric, they represent ordinal categorical values. To derive a continuous erythema severity score $S$, a weighted expected value is computed:

$$S = \sum_{i=1}^{10} i \cdot p_i$$

This post-processing step ensures that the prediction accounts for the full probability distribution rather than only the most likely class, yielding a more stable and clinically interpretable severity score.
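Assuming each ordinal category is weighted by its index, an illustrative convention rather than the deployed specification, the expected-value computation is a one-liner:

```python
def severity_score(probs):
    """Continuous severity score as the expected value over ordinal categories 1..N."""
    return sum(i * p for i, p in enumerate(probs, start=1))

# Softmax output over 10 ordinal erythema categories, concentrated around category 4
probs = [0.0, 0.1, 0.2, 0.4, 0.2, 0.1, 0.0, 0.0, 0.0, 0.0]
print(severity_score(probs))  # expected value close to 4
```

Note that a symmetric distribution around a category yields that category's index, whereas an argmax would discard how the remaining probability mass is spread.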
Objectives
- Support healthcare professionals in the assessment of erythema severity by providing an objective, quantitative measure.
 - Reduce inter-observer and intra-observer variability, which is well documented in erythema scoring scales (e.g., Clinician’s Erythema Assessment [CEA] interrater ICC ≈ 0.60, weighted κ ≈ 0.69) [cite: Tan 2014].
 - Ensure reproducibility and robustness across imaging conditions (e.g., brightness, contrast, device type).
 - Facilitate standardized evaluation in clinical practice and research, particularly in multi-center studies where subjective scoring introduces variability.
 
Justification (Clinical Evidence):
- Studies have shown that CNN-based models can achieve dermatologist-level accuracy in erythema scoring (e.g., ResNet models reached ~99% accuracy in erythema detection under varying conditions) [cite: Lee 2021, Cho 2021].
 - Automated erythema quantification has demonstrated reduced variability compared to human raters in tasks such as Minimum Erythema Dose (MED) and SPF index assessments [cite: Kim 2023].
 - Clinical scales such as the CEA, though widely used, suffer from subjectivity; integrating AI quantification can strengthen reliability and reproducibility [cite: Tan 2014].
 
Endpoints and Requirements
Performance is evaluated using Relative Mean Absolute Error (RMAE) against ground truth labeled by multiple experts, with the expectation that the algorithm achieves a lower error than the average disagreement among the experts.
| Metric | Threshold | Interpretation | 
|---|---|---|
| RMAE | ≤ 20% | Algorithm predictions deviate on average less than 20% from expert consensus, with performance superior to inter-observer variability. | 
All thresholds must be achieved with 95% confidence intervals.
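As a sketch, assuming RMAE is the mean absolute error normalized by the width of the severity scale (the exact normalization used in deployment is defined by the evaluation protocol):

```python
def rmae(preds, consensus, scale_range):
    """Mean absolute error between predictions and expert consensus,
    expressed as a fraction of the severity-scale width."""
    mae = sum(abs(p - c) for p, c in zip(preds, consensus)) / len(preds)
    return mae / scale_range

# Continuous scores on a 10-category ordinal scale (width 9)
preds = [3.2, 5.1, 7.0]
consensus = [3.0, 6.0, 7.5]
print(round(rmae(preds, consensus, scale_range=9), 3))  # well under the 20% threshold
```

The ≤ 20% acceptance criterion corresponds to `rmae(...) <= 0.20` under this convention.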
Requirements:
- Output a normalized probability distribution across 10 ordinal erythema categories (softmax output, sum = 1).
 - Convert probability outputs into a continuous score using the weighted expected value formula $S = \sum_{i=1}^{10} i \cdot p_i$.