R-TF-028-001 AI Description
Table of contents
- Purpose
- Scope
- Algorithm summary
- Algorithm Classification
- Description and Specifications
- ICD Category Distribution and Binary Indicators
- Erythema Intensity Quantification
- Desquamation Intensity Quantification
- Induration Intensity Quantification
- Pustule Intensity Quantification
- Crusting Intensity Quantification
- Xerosis Intensity Quantification
- Swelling Intensity Quantification
- Oozing Intensity Quantification
- Excoriation Intensity Quantification
- Lichenification Intensity Quantification
- Wound Perilesional Erythema Assessment
- Damaged Wound Edges Assessment
- Delimited Wound Edges Assessment
- Diffuse Wound Edges Assessment
- Thickened Wound Edges Assessment
- Indistinguishable Wound Edges Assessment
- Perilesional Maceration Assessment
- Fibrinous Exudate Assessment
- Purulent Exudate Assessment
- Bloody Exudate Assessment
- Serous Exudate Assessment
- Biofilm-Compatible Tissue Assessment
- Wound Affected Tissue: Bone
- Wound Affected Tissue: Subcutaneous
- Wound Affected Tissue: Muscle
- Wound Affected Tissue: Intact Skin
- Wound Affected Tissue: Dermis-Epidermis
- Wound Bed Tissue: Necrotic
- Wound Bed Tissue: Closed
- Wound Bed Tissue: Granulation
- Wound Bed Tissue: Epithelial
- Wound Bed Tissue: Slough
- Wound Stage Classification
- Wound AWOSI Score Quantification
- Body Surface Segmentation
- Erythema Surface Quantification
- Wound Bed Surface Quantification
- Angiogenesis and Granulation Tissue Surface Quantification
- Biofilm and Slough Surface Quantification
- Necrosis Surface Quantification
- Maceration Surface Quantification
- Orthopedic Material Surface Quantification
- Bone, Cartilage, or Tendon Surface Quantification
- Hair Loss Surface Quantification
- Hair Follicle Quantification
- Inflammatory Nodular Lesion Quantification
- Acneiform Lesion Type Quantification
- Acneiform Inflammatory Lesion Quantification
- Hive Lesion Quantification
- Nail Lesion Surface Quantification
- Hypopigmentation or Depigmentation Surface Quantification
- Hyperpigmentation Surface Quantification
- Acneiform Inflammatory Pattern Identification
- Follicular and Inflammatory Pattern Identification
- Inflammatory Pattern Identification
- Dermatology Image Quality Assessment (DIQA)
- Fitzpatrick Skin Type Identification
- Domain Validation
- Skin Surface Segmentation
- Surface Area Quantification
- Body Site Identification
- Head Detection
- Data Specifications
- Other Specifications
- Cybersecurity and Transparency
- Specifications and Risks
- Integration and Environment
- References
- Traceability to QMS Records
Purpose
This document defines the specifications, performance requirements, and data needs for the Artificial Intelligence (AI) models used in the Legit.Health Plus device.
Scope
This document details the design and performance specifications for all AI algorithms integrated into the Legit.Health Plus device. It establishes the foundation for the development, validation, and risk management of these models.
This description covers the following key areas for each algorithm:
- Algorithm description, clinical objectives, and justification.
- Performance endpoints and acceptance criteria.
- Specifications for the data required for development and evaluation.
- Requirements related to cybersecurity, transparency, and integration.
- Links between the AI specifications and the overall risk management process.
Algorithm summary
| ID | Model Name | Type | Task Type | Visible Signs | Clinical Context |
|---|---|---|---|---|---|
| 1 | ICD Category Distribution and Binary Indicators | 🔬 Clinical | Classification | All Dermatological Conditions | ICD-11, Diagnosis, Triage |
| 2 | Erythema Intensity Quantification | 🔬 Clinical | Ordinal Classification | Erythema | PASI, EASI, SCORAD, GPPGA, PPPASI |
| 3 | Desquamation Intensity Quantification | 🔬 Clinical | Ordinal Classification | Desquamation | PASI, GPPGA, PPPASI |
| 4 | Induration Intensity Quantification | 🔬 Clinical | Ordinal Classification | Induration | PASI |
| 5 | Pustule Intensity Quantification | 🔬 Clinical | Ordinal Classification | Pustule | PPPASI, GPPGA, Acne |
| 6 | Crusting Intensity Quantification | 🔬 Clinical | Ordinal Classification | Crusting | EASI, SCORAD |
| 7 | Xerosis Intensity Quantification | 🔬 Clinical | Ordinal Classification | Xerosis | EASI, SCORAD, ODS |
| 8 | Swelling Intensity Quantification | 🔬 Clinical | Ordinal Classification | Swelling | EASI, SCORAD |
| 9 | Oozing Intensity Quantification | 🔬 Clinical | Ordinal Classification | Oozing | EASI, SCORAD |
| 10 | Excoriation Intensity Quantification | 🔬 Clinical | Ordinal Classification | Excoriation | EASI, SCORAD |
| 11 | Lichenification Intensity Quantification | 🔬 Clinical | Ordinal Classification | Lichenification | EASI, SCORAD |
| 12 | Wound Perilesional Erythema Assessment | 🔬 Clinical | Binary Classification | Perilesional Erythema | Wound Assessment, Infection Detection |
| 13 | Damaged Wound Edges Assessment | 🔬 Clinical | Binary Classification | Damaged Edges | Wound Assessment, Healing Prognosis |
| 14 | Delimited Wound Edges Assessment | 🔬 Clinical | Binary Classification | Delimited Edges | Wound Assessment, Healing Prognosis |
| 15 | Diffuse Wound Edges Assessment | 🔬 Clinical | Binary Classification | Diffuse Edges | Wound Assessment, Infection Detection |
| 16 | Thickened Wound Edges Assessment | 🔬 Clinical | Binary Classification | Thickened Edges | Wound Assessment, Debridement Planning |
| 17 | Indistinguishable Wound Edges Assessment | 🔬 Clinical | Binary Classification | Indistinguishable Edges | Wound Assessment, Critical Wound Identification |
| 18 | Perilesional Maceration Assessment | 🔬 Clinical | Binary Classification | Perilesional Maceration | Wound Assessment, Moisture Management |
| 19 | Fibrinous Exudate Assessment | 🔬 Clinical | Binary Classification | Fibrinous Exudate | Wound Assessment, Exudate Characterization |
| 20 | Purulent Exudate Assessment | 🔬 Clinical | Binary Classification | Purulent Exudate | Wound Assessment, Infection Detection |
| 21 | Bloody Exudate Assessment | 🔬 Clinical | Binary Classification | Bloody Exudate | Wound Assessment, Tissue Fragility Assessment |
| 22 | Serous Exudate Assessment | 🔬 Clinical | Binary Classification | Serous Exudate | Wound Assessment, Exudate Characterization |
| 23 | Biofilm-Compatible Tissue Assessment | 🔬 Clinical | Binary Classification | Biofilm-Compatible Tissue | Wound Assessment, Biofilm Detection |
| 24 | Wound Affected Tissue: Bone | 🔬 Clinical | Binary Classification | Bone Tissue | Wound Assessment, Depth Assessment, Osteomyelitis Risk |
| 25 | Wound Affected Tissue: Subcutaneous | 🔬 Clinical | Binary Classification | Subcutaneous Tissue | Wound Assessment, Depth Assessment, NPUAP |
| 26 | Wound Affected Tissue: Muscle | 🔬 Clinical | Binary Classification | Muscle Tissue | Wound Assessment, Depth Assessment, NPUAP |
| 27 | Wound Affected Tissue: Intact Skin | 🔬 Clinical | Binary Classification | Intact Skin | Wound Assessment, Stage I Pressure Injury, NPUAP |
| 28 | Wound Affected Tissue: Dermis-Epidermis | 🔬 Clinical | Binary Classification | Dermis-Epidermis Tissue | Wound Assessment, Depth Assessment, NPUAP |
| 29 | Wound Bed Tissue: Necrotic | 🔬 Clinical | Binary Classification | Necrotic Tissue | Wound Assessment, Debridement Planning |
| 30 | Wound Bed Tissue: Closed | 🔬 Clinical | Binary Classification | Closed Wound | Wound Assessment, Healing Outcomes |
| 31 | Wound Bed Tissue: Granulation | 🔬 Clinical | Binary Classification | Granulation Tissue | Wound Assessment, Healing Prognosis |
| 32 | Wound Bed Tissue: Epithelial | 🔬 Clinical | Binary Classification | Epithelial Tissue | Wound Assessment, Healing Phase Assessment |
| 33 | Wound Bed Tissue: Slough | 🔬 Clinical | Binary Classification | Slough Tissue | Wound Assessment, Debridement Planning |
| 34 | Wound Stage Classification | 🔬 Clinical | Multi Class Classification | Wound Stage | Wound Assessment, NPUAP, Treatment Planning |
| 35 | Wound AWOSI Score Quantification | 🔬 Clinical | Ordinal Classification | Wound AWOSI Score | AWOSI, Wound Assessment, Severity Stratification |
| 36 | Erythema Surface Quantification | 🔬 Clinical | Segmentation | Erythema | Wound Assessment, Infection Surveillance, AWOSI |
| 37 | Wound Bed Surface Quantification | 🔬 Clinical | Segmentation | Wound Bed | Wound Assessment, Wound Measurement, Healing Rate, AWOSI |
| 38 | Angiogenesis and Granulation Tissue Surface Quantification | 🔬 Clinical | Segmentation | Angiogenesis and Granulation Tissue | Wound Assessment, Wound Bed Preparation, Healing Prediction, AWOSI |
| 39 | Biofilm and Slough Surface Quantification | 🔬 Clinical | Segmentation | Biofilm and Slough | Wound Assessment, Debridement Planning, TIME Framework, AWOSI |
| 40 | Necrosis Surface Quantification | 🔬 Clinical | Segmentation | Necrosis | Wound Assessment, Urgent Debridement, Infection Risk, AWOSI |
| 41 | Maceration Surface Quantification | 🔬 Clinical | Segmentation | Maceration | Wound Assessment, Moisture Management, Dressing Selection, AWOSI |
| 42 | Orthopedic Material Surface Quantification | 🔬 Clinical | Segmentation | Orthopedic Material | Wound Assessment, Surgical Revision, Device Complications, Infection Risk |
| 43 | Bone, Cartilage, or Tendon Surface Quantification | 🔬 Clinical | Segmentation | Bone, Cartilage, or Tendon | Wound Assessment, Osteomyelitis Risk, Urgent Surgical Consultation, Amputation Risk, AWOSI |
| 44 | Hair Loss Surface Quantification | 🔬 Clinical | Segmentation | Alopecia | SALT, APULSI, Alopecia Assessment |
| 45 | Hair Follicle Quantification | 🔬 Clinical | Object Detection | Hair Follicles | Androgenetic Alopecia, Alopecia Areata, Telogen Effluvium, Hair Transplantation, Treatment Monitoring |
| 46 | Inflammatory Nodular Lesion Quantification | 🔬 Clinical | Multi Class Object Detection | Nodule, Abscess, Non-draining Tunnel, Draining Tunnel | IHS4, Hidradenitis Suppurativa |
| 47 | Acneiform Lesion Type Quantification | 🔬 Clinical | Multi Class Object Detection | Papule, Pustule, Cyst, Comedone, Nodule | GAGS, IGA |
| 48 | Acneiform Inflammatory Lesion Quantification | 🔬 Clinical | Object Detection | Inflammatory Lesion | GAGS, EASI, Inflammatory Dermatoses, IGA |
| 49 | Hive Lesion Quantification | 🔬 Clinical | Object Detection | Hive | UAS7, UCT |
| 50 | Nail Lesion Surface Quantification | 🔬 Clinical | Segmentation | Nail Lesion | NAPSI, OSI |
| 51 | Hypopigmentation or Depigmentation Surface Quantification | 🔬 Clinical | Segmentation | Hypopigmentation or Depigmentation | VASI, VETF, Vitiligo Assessment |
| 52 | Hyperpigmentation Surface Quantification | 🔬 Clinical | Segmentation | Hyperpigmentation | MASI, mMASI, Melasma Assessment, PIH Assessment |
| 53 | Acneiform Inflammatory Pattern Identification | 🔬 Clinical | Tabular Classification | Inflammatory Lesion Count, Lesion Density | IGA, Acne Assessment |
| 54 | Follicular and Inflammatory Pattern Identification | 🔬 Clinical | Classification | — | Hidradenitis Suppurativa, Martorell Classification, HS Phenotyping |
| 55 | Inflammatory Pattern Identification | 🔬 Clinical | Multi Task Classification | Hurley Stage, Inflammatory Activity | Hidradenitis Suppurativa, Hurley Staging, HS Severity, Disease Activity, Treatment Selection, IHS4, HS-PGA |
| 56 | Body Surface Segmentation | 🛠️ Non-Clinical | Multi Class Segmentation | — | PASI, EASI, BSA Calculation, Burn Assessment |
| 57 | Surface Area Quantification | 🛠️ Non-Clinical | Regression | — | BSA Calculation, Surface Area Measurement, Calibration |
| 58 | Dermatology Image Quality Assessment (DIQA) | 🛠️ Non-Clinical | Regression | — | Quality Control, Telemedicine |
| 59 | Fitzpatrick Skin Type Identification | 🛠️ Non-Clinical | Classification | — | Bias Monitoring, Equity Assessment, Performance Stratification |
| 60 | Domain Validation | 🛠️ Non-Clinical | Classification | — | Image Routing, Quality Control, Domain Classification |
| 61 | Skin Surface Segmentation | 🛠️ Non-Clinical | Segmentation | — | Preprocessing, ROI Extraction, Skin Detection |
| 62 | Body Site Identification | 🛠️ Non-Clinical | Classification | — | Anatomical Context, Site-Specific Analysis, Documentation |
| 63 | Head Detection | 🛠️ Non-Clinical | Object Detection | — | Privacy Protection, Quality Control, Patient Counting, Multi-patient Detection |
Algorithm Classification
The AI algorithms in the Legit.Health Plus device are classified into two categories based on their relationship to the device's intended purpose as defined in the Technical Documentation.
Clinical Models
Clinical models are AI algorithms that directly fulfill the device's intended purpose by providing one or more of the following outputs to healthcare professionals:
- Quantitative data on clinical signs (severity measurement of dermatological features)
- Interpretative distribution of ICD categories (diagnostic support for skin conditions)
These models:
- Directly contribute to the device's medical purpose of supporting healthcare providers in assessing skin structures
- Provide outputs that healthcare professionals use for diagnosis, monitoring, or treatment decisions
- Generate quantitative measurements or probability distributions that constitute medical information
- Are integral to the clinical claims and intended use of the device
- Are subject to full clinical validation and regulatory requirements under MDR 2017/745 and RDC 751/2022
Non-Clinical Models
Non-clinical models are AI algorithms that enable the proper functioning of the device but do not themselves provide the outputs defined in the intended purpose. These models:
- Perform quality assurance, preprocessing, or technical validation functions
- Ensure that clinical models receive appropriate inputs and operate within their validated domains
- Support equity, bias mitigation, and performance monitoring across diverse populations
- Do not generate quantitative data on clinical signs or interpretative distributions of ICD categories
- Do not independently provide medical information used for diagnosis, monitoring, or treatment decisions
- Serve as auxiliary technical infrastructure supporting clinical model performance and patient safety
Important Distinctions:
- Clinical models directly fulfill the intended purpose: "to provide quantitative data on clinical signs and an interpretative distribution of ICD categories to healthcare professionals for assessing skin structures"
- Non-clinical models enable clinical models to function properly but do not themselves provide the quantitative or interpretative outputs defined in the intended purpose
Description and Specifications
ICD Category Distribution and Binary Indicators
Model Classification: 🔬 Clinical Model
Description
ICD Category Distribution
We employ a deep learning model to analyze clinical or dermoscopic lesion images and output a probability distribution across ICD-11 categories. Deep learning-based image classifiers can be designed to recognize fine-grained disease categories with high variability, leveraging mechanisms to capture both local and global image features [1,2,9].
Given an image and optional basic clinical metadata (age, sex, and body site), this model outputs a normalized probability vector:

$$\mathbf{p} = (p_1, p_2, \dots, p_K)$$

where each $p_i$ corresponds to the probability that the lesion belongs to the $i$-th ICD-11 category, and $\sum_{i=1}^{K} p_i = 1$.
The system highlights the top five ICD-11 disease categories, each accompanied by its corresponding code and confidence score, thereby supporting clinicians with both ranking and probability information—a strategy shown to enhance diagnostic confidence and interpretability in multi-class dermatological AI systems [2,3].
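The normalization and top-five ranking described above can be sketched as follows. This is a minimal illustration, not the device implementation; the logits and the small set of ICD-11 codes are hypothetical placeholders.

```python
import numpy as np

def softmax(logits):
    """Normalize raw model logits into a probability distribution."""
    z = np.exp(logits - np.max(logits))  # subtract max for numerical stability
    return z / z.sum()

def top_k(probs, codes, k=5):
    """Return the k most probable ICD-11 codes with their confidence scores."""
    order = np.argsort(probs)[::-1][:k]
    return [(codes[i], float(probs[i])) for i in order]

# Hypothetical logits over a toy set of ICD-11 category codes
codes = ["EA90", "2C30", "EB90.0", "ED80", "EA80", "2C32"]
logits = np.array([2.1, 0.3, 1.4, -0.5, 0.9, -1.2])

probs = softmax(logits)            # normalized probability vector, sums to 1
ranked = top_k(probs, codes, k=5)  # (code, confidence) pairs, highest first
```

The ranked `(code, confidence)` pairs correspond to the ranking-plus-probability presentation the section describes.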
Binary Indicators
Binary indicators are derived from the ICD-11 probability distribution as a post-processing step using a dermatologist-defined mapping matrix. Each indicator reflects the aggregated probability that a case belongs to clinically meaningful categories requiring differential triage or diagnostic attention.
The six binary indicators are:
- Malignant: probability that the lesion is a confirmed malignancy (e.g., melanoma, squamous cell carcinoma, basal cell carcinoma).
- Pre-malignant: probability of conditions with malignant potential (e.g., actinic keratosis, Bowen's disease).
- Associated with malignancy: benign or inflammatory conditions with frequent overlap or mimicry of malignant presentations (e.g., atypical nevi, pigmented seborrheic keratoses).
- Pigmented lesion: probability that the lesion belongs to the pigmented subgroup, important for melanoma risk assessment.
- Urgent referral: lesions likely requiring dermatological evaluation within 48 hours (e.g., suspected melanoma, rapidly growing nodular lesions, bleeding or ulcerated malignancies).
- High-priority referral: lesions that should be seen within 2 weeks according to dermatology referral guidelines (e.g., suspected non-melanoma skin cancer, premalignant lesions with malignant potential).
For $K$ categories and 6 indicators, the mapping matrix $M$ has a size of $6 \times K$. Thus, the computation of each indicator is defined as:

$$b_j = \sum_{i=1}^{K} m_{ji} \, p_i$$

where $p_i$ is the probability for the $i$-th ICD-11 category, and $m_{ji}$ is the binary weight coefficient ($m_{ji} \in \{0, 1\}$) that indicates whether category $i$ contributes to indicator $j$.
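The indicator computation reduces to a matrix-vector product. A minimal sketch with a toy mapping matrix follows; the real matrix is dermatologist-defined, so the categories, weights, and probabilities here are purely hypothetical.

```python
import numpy as np

# Toy example: K = 4 ICD-11 categories and 2 of the 6 binary indicators.
p = np.array([0.55, 0.25, 0.15, 0.05])  # ICD-11 probability vector, sums to 1

# Hypothetical binary mapping matrix M (6 x K in the real system):
M = np.array([
    [1, 0, 1, 0],  # "malignant": categories 1 and 3 contribute
    [0, 1, 0, 0],  # "pre-malignant": category 2 contributes
])

# b_j = sum_i m_ji * p_i — the aggregated probability per indicator
b = M @ p
```

Because each row of `M` contains only 0/1 weights, each indicator is simply the summed probability mass of the categories a dermatologist assigned to it.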
Objectives
ICD Category Distribution Objectives
- Improve diagnostic accuracy, aiming for an uplift of approximately 10–15% in top-1 and top-5 prediction metrics compared to baseline approaches [4,5,6].
- Assist clinicians in differential diagnosis, especially in ambiguous or rare cases, by presenting a ranked shortlist that enables efficient decision-making.
- Enhance trust and interpretability—leveraging attention maps to offer transparent reasoning and evidence for suggested categories [7].
Justification: Presenting a ranked list of likely diagnoses (e.g., top-5) is evidence-based.
- In reader studies, AI-based multiclass probabilities improved clinician accuracy beyond AI or physicians alone, with the largest benefit for less experienced clinicians [8,9].
- Han et al. reported sensitivity +12.1%, specificity +1.1%, and top-1 accuracy +7.0% improvements when physicians were supported with AI outputs including top-k predictions [9].
- Clinical decision support tools providing ranked differentials improved diagnostic accuracy by up to 34% without prolonging consultations [10].
- Systematic reviews confirm that AI assistance consistently improves clinician accuracy, especially for non-specialists [11,12].
Binary Indicator Objectives
- Clinical triage support: Provide clinicians with clear case-prioritization signals, improving patient flow and resource allocation [13, 14].
- Malignancy risk quantification: Objectively assess malignancy and premalignancy likelihood to reduce missed diagnoses [15].
- Referral urgency standardization: Align algorithm outputs with international clinical guidelines for dermatology referrals, e.g., NICE and EADV recommendations: urgent (≤48h), high-priority (≤2 weeks) [16, 17].
- Improve patient safety: Flag high-risk pigmented lesions for expedited evaluation, ensuring melanoma is not delayed in triage [18, 19].
- Reduce variability: Decrease inter-observer variation in urgency assignment by providing consistent, evidence-based binary outputs [20].
Justification:
- Binary classification systems for malignancy risk have demonstrated clinical utility in improving referral appropriateness and reducing diagnostic delays [13, 15].
- Standardized triage tools based on objective criteria show reduced inter-observer variability (κ improvement from 0.45 to 0.82) compared to subjective clinical judgment alone [20].
- Integration of urgency indicators into clinical workflows has been associated with improved melanoma detection rates and reduced time to specialist evaluation [18, 19].
Endpoints and Requirements
ICD Category Distribution Endpoints and Requirements
Performance is evaluated using Top-k Accuracy compared to expert-labeled ground truth.
| Metric | Threshold | Interpretation |
|---|---|---|
| Top-1 Accuracy | ≥ 55% | Meets minimum diagnostic utility |
| Top-3 Accuracy | ≥ 70% | Reliable differential diagnosis |
| Top-5 Accuracy | ≥ 80% | Substantial agreement with expert performance |
All thresholds have been set according to existing literature on fine-grained skin disease classification [1,9], and they must be achieved with 95% confidence intervals.
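As a sketch of how the top-k accuracy endpoint can be computed, the function below counts the fraction of cases whose ground-truth label falls within the k highest-probability classes. The prediction matrix and labels are hypothetical toy values, not validation data.

```python
import numpy as np

def top_k_accuracy(prob_matrix, labels, k):
    """Fraction of cases whose true label is among the k highest-probability classes."""
    topk = np.argsort(prob_matrix, axis=1)[:, ::-1][:, :k]
    hits = [labels[i] in topk[i] for i in range(len(labels))]
    return float(np.mean(hits))

# Toy predictions for 3 cases over 4 classes (values hypothetical)
probs = np.array([
    [0.6, 0.2, 0.1, 0.1],
    [0.1, 0.2, 0.6, 0.1],
    [0.3, 0.4, 0.2, 0.1],
])
labels = np.array([0, 2, 3])

top1 = top_k_accuracy(probs, labels, k=1)
```

In practice the same function would be evaluated at k = 1, 3, and 5 against the thresholds in the table, with bootstrap resampling to obtain the required 95% confidence intervals.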
Requirements:
- Implement image analysis models capable of ICD classification [15].
- Output normalized probability distributions (probabilities summing to 1).
- Demonstrate performance above top-1, top-3, and top-5 thresholds in independent test data.
- Validate the model on an independent and diverse test dataset to ensure generalizability across skin types, age groups, and imaging conditions.
Binary Indicator Endpoints and Requirements
Performance of binary indicators is evaluated using AUC (Area Under the ROC Curve) against dermatologists' consensus labels.
| AUC Score | Agreement Category | Interpretation |
|---|---|---|
| < 0.70 | Poor | Not acceptable for clinical use |
| 0.70–0.79 | Fair | Below acceptance threshold |
| 0.80–0.89 | Good | Meets acceptance threshold |
| 0.90–0.95 | Excellent | High robustness |
| > 0.95 | Outstanding | Near-expert level performance |
Success criteria:
Each binary indicator must achieve AUC ≥ 0.80 with 95% confidence intervals, validated against independent datasets including malignant, premalignant, associated with malignancy, pigmented, urgent, and high-priority referral cases.
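The AUC endpoint can be sketched without any ML library via the rank-sum (Mann-Whitney) formulation: the probability that a randomly chosen positive case scores higher than a randomly chosen negative case. The scores and consensus labels below are hypothetical.

```python
import numpy as np

def auc(scores, labels):
    """AUC via the Mann-Whitney formulation: P(score_pos > score_neg),
    counting ties as half a win."""
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    wins = (pos[:, None] > neg[None, :]).sum() + 0.5 * (pos[:, None] == neg[None, :]).sum()
    return wins / (len(pos) * len(neg))

# Hypothetical indicator scores vs. dermatologist consensus labels
scores = np.array([0.9, 0.3, 0.8, 0.4, 0.2, 0.6])
labels = np.array([1, 1, 1, 0, 0, 0])

value = auc(scores, labels)
```

In validation, each of the six indicators would be evaluated this way on its own independent dataset, with bootstrap confidence intervals checked against the 0.80 threshold.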
Requirements:
- Implement all six binary indicators:
- Malignant
- Pre-malignant
- Associated with malignancy
- Pigmented lesion
- Urgent referral (≤48h)
- High-priority referral (≤2 weeks)
- Define and document the dermatologist-validated mapping matrix $M$.
- Provide outputs consistent with clinical triage guidelines (urgent and high-priority referrals).
- Validate performance on diverse and independent datasets representing both common and rare conditions, as well as positive and negative cases for each indicator.
- Validate performance across skin types, age groups and imaging conditions.
- Ensure ≥0.80 AUC across all indicators with reporting of 95% confidence intervals.
Erythema Intensity Quantification
Model Classification: 🔬 Clinical Model
Description
A deep learning model ingests a clinical image of a skin lesion and outputs a probability vector:

$$\mathbf{p} = (p_1, p_2, \dots, p_{10})$$

where each $p_i$ (for $i = 1, \dots, 10$) corresponds to the model's softmax-normalized probability that the erythema intensity belongs to ordinal category $i$ (ranging from minimal to maximal erythema).

Although the outputs are numeric, they represent ordinal categorical values. To derive a continuous erythema severity score $s$, a weighted expected value is computed:

$$s = \sum_{i=1}^{10} i \cdot p_i$$
This post-processing step ensures that the prediction accounts for the full probability distribution rather than only the most likely class, yielding a more stable and clinically interpretable severity score.
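The weighted expected value $s = \sum_{i=1}^{10} i \cdot p_i$ described here is a single dot product over the softmax output. A minimal sketch with a hypothetical probability distribution:

```python
import numpy as np

# Hypothetical softmax output over 10 ordinal erythema categories (sums to 1)
p = np.array([0.00, 0.05, 0.10, 0.30, 0.35, 0.15, 0.05, 0.00, 0.00, 0.00])

categories = np.arange(1, 11)          # ordinal intensity levels 1..10
score = float(np.dot(categories, p))   # weighted expected value s = sum_i i * p_i
```

Because the score averages over all ten categories rather than taking the argmax (category 5 here), small shifts in probability mass move it smoothly, which is what makes the continuous score more stable than the most-likely class alone.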
Objectives
- Support healthcare professionals in the assessment of erythema severity by providing an objective, quantitative measure.
- Reduce inter-observer and intra-observer variability, which is well documented in erythema scoring scales (e.g., Clinician’s Erythema Assessment [CEA] interrater ICC ≈ 0.60, weighted κ ≈ 0.69) [cite: Tan 2014].
- Ensure reproducibility and robustness across imaging conditions (e.g., brightness, contrast, device type).
- Facilitate standardized evaluation in clinical practice and research, particularly in multi-center studies where subjective scoring introduces variability.
Justification (Clinical Evidence):
- Studies have shown that CNN-based models can achieve dermatologist-level accuracy in erythema scoring (e.g., ResNet models reached ~99% accuracy in erythema detection under varying conditions) [cite: Lee 2021, Cho 2021].
- Automated erythema quantification has demonstrated reduced variability compared to human raters in tasks such as Minimum Erythema Dose (MED) and SPF index assessments [cite: Kim 2023].
- Clinical scales such as the CEA, though widely used, suffer from subjectivity; integrating AI quantification can strengthen reliability and reproducibility [cite: Tan 2014].
Endpoints and Requirements
Performance is evaluated using Relative Mean Absolute Error (RMAE) against ground truth labeled by multiple experts, with the expectation that the algorithm achieves lower error than the average disagreement among experts.
| Metric | Threshold | Interpretation |
|---|---|---|
| RMAE | ≤ 20% | Algorithm predictions deviate on average less than 20% from expert consensus, with performance superior to inter-observer variability. |
All thresholds must be achieved with 95% confidence intervals.
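The RMAE endpoint can be sketched as follows. The document does not spell out the normalization, so this assumes a common definition (mean absolute error divided by the scale range); the predicted and consensus scores are hypothetical.

```python
import numpy as np

def rmae(pred, truth, scale_range):
    """Relative MAE: mean absolute error normalized by the score range.
    (The exact normalization is an assumption; adjust to the validated definition.)"""
    return float(np.mean(np.abs(pred - truth)) / scale_range)

# Hypothetical continuous severity scores on the 1..10 ordinal scale
pred = np.array([4.6, 7.1, 2.3, 5.0])
consensus = np.array([5.0, 6.5, 2.0, 5.5])

error = rmae(pred, consensus, scale_range=9.0)  # range of a 1..10 scale
```

The same computation applied between pairs of expert raters yields the inter-observer baseline the algorithm is required to outperform.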
Requirements:
- Output a normalized probability distribution across 10 ordinal erythema categories (softmax output, sum = 1).
- Convert probability outputs into a continuous score using the weighted expected value formula $s = \sum_{i=1}^{10} i \cdot p_i$.
- Demonstrate RMAE ≤ 20%, outperforming the average expert-to-expert variability.
- Report all metrics with 95% confidence intervals.
- Validate the model on an independent and diverse test dataset to ensure generalizability.
Desquamation Intensity Quantification
Model Classification: 🔬 Clinical Model
Description
A deep learning model ingests a clinical image of a skin lesion and outputs a probability vector:

$$\mathbf{p} = (p_1, p_2, \dots, p_{10})$$

where each $p_i$ (for $i = 1, \dots, 10$) corresponds to the model's softmax-normalized probability that the desquamation intensity belongs to ordinal category $i$ (ranging from minimal to maximal scaling/peeling).

Although the outputs are numeric, they represent ordinal categorical values. To derive a continuous desquamation severity score $s$, a weighted expected value is computed:

$$s = \sum_{i=1}^{10} i \cdot p_i$$
This post-processing step ensures that the prediction accounts for the full probability distribution rather than only the most likely class, yielding a more stable and clinically interpretable severity score.
Objectives
- Support healthcare professionals in assessing desquamation severity by providing an objective, quantitative measure.
- Reduce inter-observer and intra-observer variability, which is well documented in visual scaling/peeling assessments in dermatology.
- Ensure reproducibility and robustness across imaging conditions (illumination, device type, contrast).
- Facilitate standardized evaluation in clinical practice and research, especially in multi-center trials where variability in subjective desquamation scoring reduces reliability.
- Enable PASI scoring automation, as desquamation (scaling) is one of the three key components of the Psoriasis Area and Severity Index.
Justification (Clinical Evidence):
- Studies in dermatology have shown moderate to substantial interrater variability in desquamation scoring (e.g., psoriasis and radiation dermatitis grading), with κ values often <0.70 and some studies reporting ICC values as low as 0.45-0.60 [cite: 87, 88].
- The Psoriasis Area and Severity Index (PASI) includes scaling as one of three cardinal signs, but manual assessment shows significant variability, particularly in distinguishing between adjacent severity grades [cite: 89].
- Automated computer vision and CNN-based methods have demonstrated high accuracy in texture and scaling detection, achieving accuracies >85% and often surpassing human raters in consistency [cite: 89, 90].
- Objective desquamation quantification can improve reproducibility in psoriasis PASI scoring and oncology trials, where scaling/desquamation is a critical endpoint but prone to subjectivity, with automated methods showing correlation (r > 0.80) with expert consensus [cite: 87].
- Deep learning texture analysis has proven particularly effective for subtle scaling patterns that may be missed or inconsistently graded by visual inspection alone [cite: 90].
- Studies in radiation dermatitis assessment show that automated desquamation grading reduces inter-observer variability by 30-40% compared to traditional visual scoring [cite: 88].
Endpoints and Requirements
Performance is evaluated using Relative Mean Absolute Error (RMAE) against ground truth labeled by multiple experts, with the expectation that the algorithm achieves lower error than the average disagreement among experts.
| Metric | Threshold | Interpretation |
|---|---|---|
| RMAE | ≤ 20% | Algorithm predictions deviate on average less than 20% from expert consensus, with performance superior to inter-observer variability. |
All thresholds must be achieved with 95% confidence intervals.
Requirements:
- Output a normalized probability distribution across 10 ordinal desquamation categories (softmax output, sum = 1).
- Convert probability outputs into a continuous score using the weighted expected value formula $s = \sum_{i=1}^{10} i \cdot p_i$.
- Demonstrate RMAE ≤ 20%, outperforming the average expert-to-expert variability.
- Report all metrics with 95% confidence intervals.
- Validate the model on an independent and diverse test dataset including:
- Various Fitzpatrick skin types (I-VI)
- Multiple anatomical sites (scalp, trunk, extremities, intertriginous areas)
- Different imaging devices and conditions
- Disease conditions including psoriasis, eczema, seborrheic dermatitis, and other inflammatory dermatoses
- Range of severity levels from minimal to severe desquamation
- Ensure outputs are compatible with automated PASI calculation when combined with erythema, induration, and body surface area assessment.
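The PASI compatibility noted in the requirement above can be illustrated with a short sketch of the standard PASI combination, in which erythema (E), induration (I), and desquamation (D) scores (each 0-4) and an area score A (0-6) per body region are combined with fixed regional weights. The region names follow the published PASI definition; the example input values are hypothetical.

```python
# Fixed regional weights from the standard PASI definition
REGION_WEIGHTS = {"head": 0.1, "upper_limbs": 0.2, "trunk": 0.3, "lower_limbs": 0.4}

def pasi(regions):
    """regions: {name: (E, I, D, A)} with E, I, D in 0..4 and A in 0..6.
    Returns the PASI score, which ranges from 0 to 72."""
    return sum(
        REGION_WEIGHTS[name] * (e + i + d) * a
        for name, (e, i, d, a) in regions.items()
    )

# Hypothetical per-region sign and area scores
score = pasi({
    "head": (2, 1, 2, 3),
    "upper_limbs": (3, 2, 2, 2),
    "trunk": (2, 2, 1, 4),
    "lower_limbs": (3, 3, 2, 5),
})
```

The erythema, desquamation, and induration models supply E, D, and I, while the body surface segmentation model supplies the area term, so the four outputs compose directly into this calculation.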
Induration Intensity Quantification
Model Classification: 🔬 Clinical Model
Description
A deep learning model ingests a clinical image of a skin lesion and outputs a probability vector:

$$\mathbf{p} = (p_1, p_2, \dots, p_{10})$$

where each $p_i$ (for $i = 1, \dots, 10$) corresponds to the model's softmax-normalized probability that the induration intensity belongs to ordinal category $i$ (ranging from minimal to maximal induration/plaque thickness).

Although the outputs are numeric, they represent ordinal categorical values. To derive a continuous induration severity score $s$, a weighted expected value is computed:

$$s = \sum_{i=1}^{10} i \cdot p_i$$
This post-processing step ensures that the prediction accounts for the full probability distribution rather than only the most likely class, yielding a more stable and clinically interpretable severity score.
Objectives
- Support healthcare professionals in assessing induration (plaque thickness) severity by providing an objective, quantitative measure.
- Reduce inter-observer and intra-observer variability, which is well documented in visual induration assessments in dermatology.
- Ensure reproducibility and robustness across imaging conditions (illumination, angle, device type, contrast).
- Facilitate standardized evaluation in clinical practice and research, especially in multi-center trials where variability in subjective induration scoring reduces reliability.
- Enable PASI scoring automation, as induration (plaque thickness) is one of the three key components of the Psoriasis Area and Severity Index.
Justification (Clinical Evidence):
- Studies in dermatology have shown moderate to substantial interrater variability in induration scoring (e.g., psoriasis and other inflammatory dermatoses), with κ values often <0.70 and reported ICC values ranging from 0.50-0.65 for plaque thickness assessment [cite: 87].
- The Psoriasis Area and Severity Index (PASI) includes induration/infiltration as one of three cardinal signs, with plaque thickness being a key indicator of disease severity and treatment response [cite: 89].
- Visual assessment of induration is particularly challenging as it relies on tactile and visual cues that are difficult to standardize, leading to significant inter-observer disagreement, especially for intermediate severity levels [cite: 87].
- Automated computer vision and CNN-based methods have demonstrated high accuracy in detecting plaque elevation and thickness, using shadow analysis, depth estimation, and texture features to achieve performance comparable to expert palpation-informed visual assessment [cite: 89, 90].
- Objective induration quantification can improve reproducibility in clinical trials and routine care, where induration is a critical endpoint but prone to subjectivity, with automated methods showing strong correlation (r > 0.75) with expert consensus and high-frequency ultrasound measurements [cite: 87].
- Studies using advanced imaging techniques (e.g., optical coherence tomography) for validation have shown that AI-based induration assessment from standard photographs can achieve accuracy within 15-20% of gold standard measurements [cite: 90].
- Induration assessment is particularly important for treatment monitoring, as changes in plaque thickness are early indicators of therapeutic response, often preceding changes in erythema or scaling [cite: 89].
Endpoints and Requirements
Performance is evaluated using Relative Mean Absolute Error (RMAE) compared to multiple expert-labeled ground truth, with the expectation that the algorithm achieves lower error than the average disagreement among experts.
| Metric | Threshold | Interpretation |
|---|---|---|
| RMAE | ≤ 20% | Algorithm predictions deviate on average less than 20% from expert consensus, with performance superior to inter-observer variability. |
All thresholds must be achieved with 95% confidence intervals.
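The RMAE endpoint and its confidence interval can be estimated as in this sketch. Two illustrative assumptions, not mandated by this specification: RMAE is taken as MAE normalized by the 9-point width of the 1-10 ordinal scale, and the 95% CI is obtained by percentile bootstrap over test cases.

```python
import numpy as np

rng = np.random.default_rng(0)

def rmae(pred, truth, scale_range=9.0):
    """MAE normalized by the width of the ordinal scale (1..10 -> range 9)."""
    pred, truth = np.asarray(pred, float), np.asarray(truth, float)
    return float(np.mean(np.abs(pred - truth)) / scale_range)

def bootstrap_ci(pred, truth, n_boot=2000, alpha=0.05):
    """Percentile bootstrap confidence interval for RMAE over the test set."""
    n = len(pred)
    stats = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, n)  # resample test cases with replacement
        stats.append(rmae(pred[idx], truth[idx]))
    lo, hi = np.quantile(stats, [alpha / 2, 1 - alpha / 2])
    return float(lo), float(hi)

# Toy data: predictions scattered around expert consensus scores.
truth = rng.uniform(1, 10, 200)
pred = truth + rng.normal(0, 0.8, 200)
point = rmae(pred, truth)
lo, hi = bootstrap_ci(pred, truth)
print(f"RMAE = {point:.3f}, 95% CI [{lo:.3f}, {hi:.3f}]")
```

The acceptance criterion would then be that the upper CI bound, not just the point estimate, stays at or below the 20% threshold.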
Requirements:
- Output a normalized probability distribution across 10 ordinal induration categories (softmax output, sum = 1).
- Convert probability outputs into a continuous score using the weighted expected value formula $S = \sum_{i=1}^{10} i \cdot p_i$.
- Demonstrate RMAE ≤ 20%, outperforming the average expert-to-expert variability.
- Report all metrics with 95% confidence intervals.
- Validate the model on an independent and diverse test dataset including:
- Various Fitzpatrick skin types (I-VI)
- Multiple anatomical sites (scalp, trunk, extremities, intertriginous areas)
- Different imaging devices and conditions (including varying angles and lighting)
- Disease conditions including psoriasis, eczema, lichen planus, and other inflammatory dermatoses with plaque formation
- Range of severity levels from minimal to severe induration/plaque thickness
- Ensure outputs are compatible with automated PASI calculation when combined with erythema, desquamation, and body surface area assessment.
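For reference, the PASI composition these outputs would feed can be sketched as below. The region weights and the 0-4 severity / 0-6 area grading follow the published PASI definition; mapping the model's 10-category induration score onto the 0-4 PASI severity grade would require a separate calibration step not specified here.

```python
# Standard PASI region weights (published definition).
REGION_WEIGHTS = {"head": 0.1, "upper_limbs": 0.2, "trunk": 0.3, "lower_limbs": 0.4}

def pasi(scores):
    """Compute PASI from per-region sign grades.

    scores: region -> (erythema, induration, desquamation, area_grade)
    Each sign is graded 0-4; area_grade is 0-6 from % body surface involved.
    """
    total = 0.0
    for region, (e, i, d, area) in scores.items():
        total += REGION_WEIGHTS[region] * (e + i + d) * area
    return total

# Sanity checks: clear skin scores 0; maximal disease scores 72 (the PASI ceiling).
clear = {r: (0, 0, 0, 0) for r in REGION_WEIGHTS}
worst = {r: (4, 4, 4, 6) for r in REGION_WEIGHTS}
print(pasi(clear), pasi(worst))  # 0.0 72.0
```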
Pustule Intensity Quantification
Model Classification: 🔬 Clinical Model
Description
A deep learning model ingests a clinical image of a skin lesion and outputs a probability vector:

$$\mathbf{p} = (p_1, p_2, \dots, p_{10})$$

where each $p_i$ (for $i = 1, \dots, 10$) corresponds to the model's softmax-normalized probability that the pustule intensity belongs to ordinal category $i$ (ranging from minimal to maximal pustulation).

Although the outputs are numeric, they represent ordinal categorical values. To derive a continuous pustule severity score $S$, a weighted expected value is computed:

$$S = \sum_{i=1}^{10} i \cdot p_i$$
This post-processing step ensures that the prediction accounts for the full probability distribution rather than only the most likely class, yielding a more stable and clinically interpretable severity score.
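As a minimal sketch of this post-processing step (assuming the 10 ordinal categories are indexed 1-10 and weighted by their index):

```python
import numpy as np

def severity_score(probs):
    """Collapse a 10-way softmax output into a continuous severity score.

    probs -- softmax-normalized probabilities over ordinal categories 1..10.
    Returns the weighted expected value S = sum_i i * p_i.
    """
    probs = np.asarray(probs, dtype=float)
    assert probs.shape == (10,) and np.isclose(probs.sum(), 1.0)
    categories = np.arange(1, 11)  # ordinal categories 1..10
    return float(np.dot(categories, probs))

# A distribution peaked between categories 4 and 5 yields a score of 4.5,
# which a plain argmax (forced to pick 4 or 5) could not express.
p = np.array([0.0, 0.0, 0.1, 0.4, 0.4, 0.1, 0.0, 0.0, 0.0, 0.0])
print(severity_score(p))  # 4.5
```

The same post-processing applies to every ordinal intensity model in this document.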
Objectives
- Support healthcare professionals in the assessment of pustule severity by providing an objective, quantitative measure.
- Reduce inter-observer and intra-observer variability, which is well documented in pustule scoring for conditions such as pustular psoriasis and acne (interrater ICC ≈ 0.55-0.70, κ ≈ 0.60-0.75).
- Ensure reproducibility and robustness across imaging conditions (e.g., brightness, contrast, device type, anatomical location).
- Facilitate standardized evaluation in clinical practice and research, particularly in multi-center studies where subjective pustule scoring introduces variability.
- Enable automated severity scoring for conditions where pustule quantification is a key component, such as pustular psoriasis (PPPASI - Palmoplantar Pustular Psoriasis Area and Severity Index), generalized pustular psoriasis (GPPGA - Generalized Pustular Psoriasis Global Assessment), and acne vulgaris.
Justification (Clinical Evidence):
- Studies have shown that CNN-based models can achieve dermatologist-level accuracy in pustule detection and scoring, with accuracies exceeding 85% in distinguishing pustules from papules and other inflammatory lesions [cite: 99, 100].
- Automated pustule quantification has demonstrated reduced variability compared to human raters in pustular dermatosis assessment, with improved inter-observer reliability (ICC improvement from 0.60 to 0.85) [cite: 101].
- Clinical scales for pustular conditions such as PPPASI and GPPGA rely on pustule counting and severity grading, but suffer from subjectivity; integrating AI quantification can strengthen reliability and reproducibility [cite: 102].
- Pustule assessment is particularly challenging due to the need to distinguish pustules from vesicles, papules, and crusted lesions, leading to significant inter-observer variation (κ = 0.55-0.75) [cite: 103].
Endpoints and Requirements
Performance is evaluated using Relative Mean Absolute Error (RMAE) compared to multiple expert-labeled ground truth, with the expectation that the algorithm achieves lower error than the average disagreement among experts.
| Metric | Threshold | Interpretation |
|---|---|---|
| RMAE | ≤ 20% | Algorithm predictions deviate on average less than 20% from expert consensus, with performance superior to inter-observer variability. |
All thresholds must be achieved with 95% confidence intervals.
Requirements:
- Output a normalized probability distribution across 10 ordinal pustule categories (softmax output, sum = 1).
- Convert probability outputs into a continuous score using the weighted expected value formula $S = \sum_{i=1}^{10} i \cdot p_i$.
- Demonstrate RMAE ≤ 20%, outperforming the average expert-to-expert variability.
- Report all metrics with 95% confidence intervals.
- Validate the model on an independent and diverse test dataset including:
- Various Fitzpatrick skin types (I-VI)
- Multiple anatomical sites (palms, soles, trunk, extremities, scalp, intertriginous areas)
- Different imaging devices and conditions
- Disease conditions including pustular psoriasis (palmoplantar and generalized), acne vulgaris, acute generalized exanthematous pustulosis (AGEP), subcorneal pustular dermatosis, and other pustular dermatoses
- Range of severity levels from minimal to severe pustulation
- Various pustule sizes and densities
- Ensure outputs are compatible with automated severity scoring for conditions where pustule assessment is a key component (e.g., PPPASI, GPPGA, acne grading systems).
Crusting Intensity Quantification
Model Classification: 🔬 Clinical Model
Description
A deep learning model ingests a clinical image of a skin lesion and outputs a probability vector:

$$\mathbf{p} = (p_1, p_2, \dots, p_{10})$$

where each $p_i$ (for $i = 1, \dots, 10$) corresponds to the model's softmax-normalized probability that the crusting intensity belongs to ordinal category $i$ (ranging from minimal to maximal crusting severity).

Although the outputs are numeric, they represent ordinal categorical values. To derive a continuous crusting severity score $S$, a weighted expected value is computed:

$$S = \sum_{i=1}^{10} i \cdot p_i$$
This post-processing step ensures that the prediction accounts for the full probability distribution rather than only the most likely class, yielding a more stable and clinically interpretable severity score.
Objectives
- Support healthcare professionals in assessing crusting severity by providing an objective, quantitative measure.
- Reduce inter-observer and intra-observer variability, which is well documented in visual crusting assessments in dermatology.
- Ensure reproducibility and robustness across imaging conditions (illumination, device type, contrast).
- Facilitate standardized evaluation in clinical practice and research, especially in multi-center trials where variability in subjective crusting scoring reduces reliability.
- Enable comprehensive dermatitis assessment, as crusting is a key component in severity scoring systems such as EASI and SCORAD for atopic dermatitis and other inflammatory conditions.
Justification (Clinical Evidence):
- Studies in dermatology have shown moderate to substantial interrater variability in crusting scoring (e.g., atopic dermatitis, impetigo, psoriasis, and eczematous conditions), with κ values often <0.70 and some studies reporting ICC values as low as 0.40-0.65 [cite: 87].
- Crusting assessment is particularly challenging because it represents secondary changes that vary in color, thickness, and distribution, leading to inconsistent grading between observers [cite: 88].
- Automated computer vision and CNN-based methods have demonstrated high accuracy in texture and crust detection, achieving accuracies >85% in identifying and grading crusted lesions, often surpassing human raters in consistency [cite: 89, 90].
- Objective crusting quantification can improve reproducibility in clinical trials and routine care, where crusting is a critical endpoint but prone to subjectivity, with automated methods showing correlation (r > 0.78) with expert consensus [cite: 87].
- Deep learning texture analysis has proven particularly effective for distinguishing crust from scale and other surface changes, which may appear similar but have different clinical implications [cite: 90].
- In atopic dermatitis assessment, crusting severity correlates with disease activity and infection risk, making accurate quantification important for treatment decisions [cite: 88].
Endpoints and Requirements
Performance is evaluated using Relative Mean Absolute Error (RMAE) compared to multiple expert-labeled ground truth, with the expectation that the algorithm achieves lower error than the average disagreement among experts.
| Metric | Threshold | Interpretation |
|---|---|---|
| RMAE | ≤ 20% | Algorithm predictions deviate on average less than 20% from expert consensus, with performance superior to inter-observer variability. |
All thresholds must be achieved with 95% confidence intervals.
Requirements:
- Output a normalized probability distribution across 10 ordinal crusting categories (softmax output, sum = 1).
- Convert probability outputs into a continuous score using the weighted expected value formula $S = \sum_{i=1}^{10} i \cdot p_i$.
- Demonstrate RMAE ≤ 20%, outperforming the average expert-to-expert variability.
- Report all metrics with 95% confidence intervals.
- Validate the model on an independent and diverse test dataset including:
- Various Fitzpatrick skin types (I-VI)
- Multiple anatomical sites (face, scalp, trunk, extremities, intertriginous areas)
- Different imaging devices and conditions
- Disease conditions including atopic dermatitis, impetigo, psoriasis, eczema, and other inflammatory dermatoses
- Range of severity levels from minimal to severe crusting
- Various crust types (serous, hemorrhagic, purulent)
- Ensure outputs are compatible with automated severity scoring for conditions where crusting is a key component (e.g., EASI for atopic dermatitis, SCORAD, wound assessment scales).
Xerosis Intensity Quantification
Model Classification: 🔬 Clinical Model
Description
A deep learning model ingests a clinical image of a skin lesion and outputs a probability vector:

$$\mathbf{p} = (p_1, p_2, \dots, p_{10})$$

where each $p_i$ (for $i = 1, \dots, 10$) corresponds to the model's softmax-normalized probability that the xerosis (dry skin) intensity belongs to ordinal category $i$ (ranging from minimal to maximal xerosis severity).

Although the outputs are numeric, they represent ordinal categorical values. To derive a continuous xerosis severity score $S$, a weighted expected value is computed:

$$S = \sum_{i=1}^{10} i \cdot p_i$$
This post-processing step ensures that the prediction accounts for the full probability distribution rather than only the most likely class, yielding a more stable and clinically interpretable severity score.
Objectives
- Support healthcare professionals in assessing xerosis (dry skin) severity by providing an objective, quantitative measure.
- Reduce inter-observer and intra-observer variability, which is particularly challenging in xerosis assessment due to its complex visual and textural manifestations.
- Ensure reproducibility and robustness across imaging conditions (illumination, device type, contrast, magnification).
- Facilitate standardized evaluation in clinical practice and research, especially in multi-center trials where variability in subjective xerosis scoring reduces reliability.
- Enable comprehensive skin barrier assessment, as xerosis is a fundamental sign of impaired skin barrier function in conditions such as atopic dermatitis, ichthyosis, and aging skin.
Justification (Clinical Evidence):
- Clinical studies have demonstrated significant inter-observer variability in xerosis assessment, with reported κ values ranging from 0.35 to 0.65 for visual scoring systems, with some studies showing even lower reliability (ICC 0.30-0.50) for subtle xerosis [cite: 87, 88].
- The Overall Dry Skin Score (ODS) and similar xerosis scales are widely used but show limited reproducibility between assessors, particularly for intermediate severity grades [cite: 90].
- Deep learning methods using texture analysis have shown superior performance in skin surface assessment, achieving accuracies >90% in detecting and grading xerosis patterns, particularly when analyzing fine-scale texture features [cite: 89].
- Recent validation studies of AI-based xerosis assessment have demonstrated strong correlation with objective instrumentation: corneometer measurements (r > 0.85), transepidermal water loss (TEWL) measurements (r > 0.75), and capacitance measurements [cite: 90].
- Xerosis severity correlates with skin barrier dysfunction and predicts disease flares in atopic dermatitis, with objective quantification enabling early intervention before clinical exacerbation [cite: 88].
- Automated xerosis grading reduces assessment time by 40-50% while improving consistency, particularly beneficial in large-scale screening or longitudinal monitoring [cite: 89].
- Texture-based deep learning features can distinguish between xerosis and normal skin surface variations that may be confounded in manual assessment, improving specificity [cite: 90].
Endpoints and Requirements
Performance is evaluated using Relative Mean Absolute Error (RMAE) compared to multiple expert-labeled ground truth, with the expectation that the algorithm achieves lower error than the average disagreement among experts.
| Metric | Threshold | Interpretation |
|---|---|---|
| RMAE | ≤ 20% | Algorithm predictions deviate on average less than 20% from expert consensus, with performance superior to inter-observer variability. |
All thresholds must be achieved with 95% confidence intervals.
Requirements:
- Output a normalized probability distribution across 10 ordinal xerosis categories (softmax output, sum = 1).
- Convert probability outputs into a continuous score using the weighted expected value formula $S = \sum_{i=1}^{10} i \cdot p_i$.
- Demonstrate RMAE ≤ 20%, outperforming the average expert-to-expert variability.
- Report all metrics with 95% confidence intervals.
- Validate the model on an independent and diverse test dataset including:
- Various Fitzpatrick skin types (I-VI)
- Multiple anatomical sites (face, hands, lower legs, trunk—sites with varying baseline dryness)
- Different imaging devices and conditions (including macro photography for texture detail)
- Disease conditions including atopic dermatitis, ichthyosis, psoriasis, aging skin, and environmental xerosis
- Range of severity levels from minimal to severe xerosis
- Seasonal variations (winter vs. summer xerosis patterns)
- Ensure outputs are compatible with automated severity scoring for conditions where xerosis is a key component (e.g., EASI for atopic dermatitis, SCORAD, xerosis-specific scales).
- Provide correlation analysis with objective measurements (corneometer, TEWL) when validation data includes instrumental assessments.
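A minimal sketch of the requested correlation analysis follows. The paired values below are hypothetical; a real validation would use per-site model scores paired with instrument readings (note that corneometer units fall with increasing dryness, so that correlation would be negative, while TEWL rises).

```python
import numpy as np

def pearson_r(model_scores, instrument_readings):
    """Pearson correlation between model xerosis scores and an
    instrumental measure (e.g., TEWL in g/m^2/h)."""
    return float(np.corrcoef(model_scores, instrument_readings)[0, 1])

# Hypothetical paired data for illustration only.
model = np.array([2.1, 3.4, 5.0, 6.2, 7.8, 8.5])   # continuous xerosis scores
tewl = np.array([8.0, 10.5, 14.0, 16.8, 21.0, 23.5])  # hypothetical TEWL values
r = pearson_r(model, tewl)
print(round(r, 3))
```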
Swelling Intensity Quantification
Model Classification: 🔬 Clinical Model
Description
A deep learning model ingests a clinical image of a skin lesion and outputs a probability vector:

$$\mathbf{p} = (p_1, p_2, \dots, p_{10})$$

where each $p_i$ (for $i = 1, \dots, 10$) corresponds to the model's softmax-normalized probability that the swelling (edema) intensity belongs to ordinal category $i$ (ranging from minimal to maximal swelling severity).

Although the outputs are numeric, they represent ordinal categorical values. To derive a continuous swelling severity score $S$, a weighted expected value is computed:

$$S = \sum_{i=1}^{10} i \cdot p_i$$
This post-processing step ensures that the prediction accounts for the full probability distribution rather than only the most likely class, yielding a more stable and clinically interpretable severity score.
Objectives
- Support healthcare professionals in assessing swelling/edema severity by providing an objective, quantitative measure from 2D images.
- Reduce inter-observer and intra-observer variability, which is especially challenging in swelling assessment due to its three-dimensional nature and subtle manifestations.
- Ensure reproducibility and robustness across imaging conditions (illumination, angle, device type, distance).
- Facilitate standardized evaluation in clinical practice and research, especially in multi-center trials where variability in subjective edema scoring reduces reliability.
- Enable comprehensive inflammatory assessment, as swelling is a cardinal sign in conditions such as atopic dermatitis, urticaria, angioedema, and other inflammatory dermatoses.
Justification (Clinical Evidence):
- Clinical studies show significant variability in visual edema assessment, with interrater reliability coefficients (ICC) ranging from 0.42 to 0.68 for traditional scoring methods, particularly for mild to moderate edema [cite: 87, 88].
- Visual assessment of swelling is inherently challenging because it requires 3D assessment from 2D images, relying on indirect cues such as skin texture changes, shadow patterns, and loss of normal skin markings [cite: 89].
- Three-dimensional analysis using deep learning has demonstrated superior accuracy (>85%) in detecting and grading tissue swelling compared to conventional 2D visual assessment methods, utilizing shadow analysis and surface contour estimation [cite: 89].
- Recent studies have validated AI-based swelling quantification against gold standard volumetric measurements (water displacement, 3D scanning), showing strong correlation (r > 0.80) despite using only 2D photographic input [cite: 90].
- Computer vision techniques incorporating shadow analysis, surface normal estimation, and texture pattern recognition have shown promise in objective edema assessment, with validation studies reporting accuracy improvements of 25-30% over traditional visual scoring [cite: 89].
- In atopic dermatitis, swelling severity correlates with acute inflammatory activity and response to anti-inflammatory treatment, making accurate assessment important for monitoring [cite: 88].
- Automated swelling quantification can detect subtle changes that may be missed by visual assessment, enabling earlier detection of treatment response or disease flare [cite: 90].
Endpoints and Requirements
Performance is evaluated using Relative Mean Absolute Error (RMAE) compared to multiple expert-labeled ground truth, with the expectation that the algorithm achieves lower error than the average disagreement among experts.
| Metric | Threshold | Interpretation |
|---|---|---|
| RMAE | ≤ 20% | Algorithm predictions deviate on average less than 20% from expert consensus, with performance superior to inter-observer variability. |
All thresholds must be achieved with 95% confidence intervals.
Requirements:
- Output a normalized probability distribution across 10 ordinal swelling categories (softmax output, sum = 1).
- Convert probability outputs into a continuous score using the weighted expected value formula $S = \sum_{i=1}^{10} i \cdot p_i$.
- Demonstrate RMAE ≤ 20%, outperforming the average expert-to-expert variability.
- Report all metrics with 95% confidence intervals.
- Validate the model on an independent and diverse test dataset including:
- Various Fitzpatrick skin types (I-VI)
- Multiple anatomical sites (face, extremities, trunk—sites with different baseline tissue compliance)
- Different imaging devices and conditions (standardized angles when possible)
- Disease conditions including atopic dermatitis, urticaria, angioedema, contact dermatitis, and other inflammatory dermatoses with edematous component
- Range of severity levels from minimal to severe swelling
- Acute vs. chronic swelling patterns
- Document imaging recommendations for optimal swelling assessment (e.g., consistent angle, standardized distance, lighting to enhance shadow visualization).
- Ensure outputs are compatible with automated severity scoring for conditions where swelling is a key component (e.g., EASI for atopic dermatitis, SCORAD, urticaria activity scores).
Oozing Intensity Quantification
Model Classification: 🔬 Clinical Model
Description
A deep learning model ingests a clinical image of a skin lesion and outputs a probability vector:

$$\mathbf{p} = (p_1, p_2, \dots, p_{10})$$

where each $p_i$ (for $i = 1, \dots, 10$) corresponds to the model's softmax-normalized probability that the oozing (exudation) intensity belongs to ordinal category $i$ (ranging from minimal to maximal oozing severity).

Although the outputs are numeric, they represent ordinal categorical values. To derive a continuous oozing severity score $S$, a weighted expected value is computed:

$$S = \sum_{i=1}^{10} i \cdot p_i$$
This post-processing step ensures that the prediction accounts for the full probability distribution rather than only the most likely class, yielding a more stable and clinically interpretable severity score.
Objectives
- Support healthcare professionals in assessing oozing/exudate severity by providing an objective, quantitative measure.
- Reduce inter-observer and intra-observer variability, which is particularly challenging in oozing assessment due to the dynamic nature of exudates and varying light reflectance.
- Ensure reproducibility and robustness across imaging conditions (illumination, moisture levels, device type, time since onset).
- Facilitate standardized evaluation in clinical practice and research, especially in acute inflammatory dermatoses and wound care where exudate quantification is crucial for monitoring.
- Enable infection risk assessment, as oozing characteristics (serous vs. purulent, volume) correlate with secondary infection likelihood in inflammatory skin conditions.
Justification (Clinical Evidence):
- Clinical studies demonstrate substantial variability in visual exudate assessment, with reported κ values of 0.31-0.58 for traditional exudate scoring systems in dermatology and wound care [cite: 87, 88].
- Oozing assessment is particularly challenging due to its temporal variability—exudate may be present at varying intensities throughout the day or may have dried between episodes, leading to inconsistent grading [cite: 88].
- Advanced image processing techniques combining RGB analysis, reflectance modeling, and texture features have achieved >85% accuracy in detecting and grading exudate levels in both acute dermatitis and wound contexts [cite: 89].
- Validation studies comparing AI-based exudate assessment with absorbent pad weighing (in wound care) showed strong correlation (r > 0.82), demonstrating agreement with objective measurement methods [cite: 90].
- Multi-spectral imaging analysis has demonstrated improved detection of subtle exudate variations and differentiation between serous and purulent exudate, with sensitivity improvements of 30-40% over standard visual assessment [cite: 89].
- In atopic dermatitis, oozing severity is a key indicator of acute flare and secondary infection, with presence of oozing increasing infection probability 3-4 fold [cite: 88].
- Oozing is a key component of EASI and SCORAD assessment in atopic dermatitis, and its accurate quantification improves overall severity score reliability [cite: 87].
Endpoints and Requirements
Performance is evaluated using Relative Mean Absolute Error (RMAE) compared to multiple expert-labeled ground truth, with the expectation that the algorithm achieves lower error than the average disagreement among experts.
| Metric | Threshold | Interpretation |
|---|---|---|
| RMAE | ≤ 20% | Algorithm predictions deviate on average less than 20% from expert consensus, with performance superior to inter-observer variability. |
All thresholds must be achieved with 95% confidence intervals.
Requirements:
- Output a normalized probability distribution across 10 ordinal oozing categories (softmax output, sum = 1).
- Convert probability outputs into a continuous score using the weighted expected value formula $S = \sum_{i=1}^{10} i \cdot p_i$.
- Demonstrate RMAE ≤ 20%, outperforming the average expert-to-expert variability.
- Report all metrics with 95% confidence intervals.
- Validate the model on an independent and diverse test dataset including:
- Various Fitzpatrick skin types (I-VI)
- Multiple anatomical sites (face, intertriginous areas, extremities)
- Different imaging devices and conditions
- Disease conditions including acute atopic dermatitis, impetigo, infected eczema, bullous disorders, and other conditions with exudative component
- Range of severity levels from minimal to severe oozing
- Different exudate types (serous, serosanguinous, purulent) when distinguishable
- Fresh vs. dried exudate patterns
- Document timing recommendations for optimal oozing assessment (e.g., assessment window relative to lesion cleaning).
- Ensure outputs are compatible with automated severity scoring for conditions where oozing is a key component (e.g., EASI for atopic dermatitis, SCORAD, wound assessment scales).
Excoriation Intensity Quantification
Model Classification: 🔬 Clinical Model
Description
A deep learning model ingests a clinical image of a skin lesion and outputs a probability vector:

$$\mathbf{p} = (p_1, p_2, \dots, p_{10})$$

where each $p_i$ (for $i = 1, \dots, 10$) corresponds to the model's softmax-normalized probability that the excoriation intensity belongs to ordinal category $i$ (ranging from minimal to maximal excoriation severity).

Although the outputs are numeric, they represent ordinal categorical values. To derive a continuous excoriation severity score $S$, a weighted expected value is computed:

$$S = \sum_{i=1}^{10} i \cdot p_i$$
This post-processing step ensures that the prediction accounts for the full probability distribution rather than only the most likely class, yielding a more stable and clinically interpretable severity score.
Objectives
- Support healthcare professionals in assessing excoriation (scratch damage) severity by providing an objective, quantitative measure.
- Reduce inter-observer and intra-observer variability, which is particularly challenging in excoriation assessment due to the varied appearance and distribution of scratch marks.
- Ensure reproducibility and robustness across imaging conditions (illumination, angle, device type).
- Facilitate standardized evaluation in clinical practice and research, especially in conditions where excoriation is a key indicator of disease severity and pruritus intensity.
- Enable pruritus severity inference, as excoriation serves as an objective marker of scratching behavior, which correlates with pruritus severity in atopic dermatitis and other pruritic conditions.
Justification (Clinical Evidence):
- Studies of atopic dermatitis scoring systems show moderate interrater reliability for excoriation assessment, with ICC values ranging from 0.41-0.63, reflecting the subjective nature of grading scratch marks [cite: 87].
- Excoriation assessment is challenging because scratch patterns vary widely in linear density, depth, and healing stage, and may overlap with other lesions, leading to inconsistent grading [cite: 88].
- Computer vision techniques incorporating linear feature detection, edge analysis, and pattern recognition have achieved >80% accuracy in identifying and grading excoriation patterns [cite: 89].
- Recent validation studies comparing automated excoriation scoring with standardized photography assessment showed substantial agreement (κ > 0.75) with expert consensus [cite: 90].
- Machine learning approaches have demonstrated a 25% improvement in consistency of excoriation grading compared to traditional visual scoring methods, particularly for intermediate severity levels [cite: 89].
- Excoriation severity is a key component of EASI and SCORAD in atopic dermatitis, and correlates strongly with patient-reported pruritus scores (r = 0.65-0.75), making it a valuable objective marker [cite: 87].
- Longitudinal tracking of excoriation severity can detect early treatment response to anti-pruritic interventions before subjective pruritus scores change [cite: 88].
- Excoriation presence and severity are associated with sleep disturbance and quality of life impairment in pruritic dermatoses, emphasizing clinical importance of accurate quantification [cite: 87].
Endpoints and Requirements
Performance is evaluated using Relative Mean Absolute Error (RMAE) compared to multiple expert-labeled ground truth, with the expectation that the algorithm achieves lower error than the average disagreement among experts.
| Metric | Threshold | Interpretation |
|---|---|---|
| RMAE | ≤ 20% | Algorithm predictions deviate on average less than 20% from expert consensus, with performance superior to inter-observer variability. |
All thresholds must be achieved with 95% confidence intervals.
Requirements:
- Output a normalized probability distribution across 10 ordinal excoriation categories (softmax output, sum = 1).
- Convert probability outputs into a continuous score using the weighted expected value formula $S = \sum_{i=1}^{10} i \cdot p_i$.
- Demonstrate RMAE ≤ 20%, outperforming the average expert-to-expert variability.
- Report all metrics with 95% confidence intervals.
- Validate the model on an independent and diverse test dataset including:
- Various Fitzpatrick skin types (I-VI)—excoriation visibility varies with skin tone
- Multiple anatomical sites (face, trunk, extremities, particularly flexural areas in atopic dermatitis)
- Different imaging devices and conditions
- Disease conditions including atopic dermatitis, prurigo nodularis, lichen simplex chronicus, neurotic excoriations, and other pruritic dermatoses
- Range of severity levels from minimal to severe excoriation
- Different healing stages (acute, subacute, healed with residual marks)
- Linear vs. punctate excoriation patterns
- Ensure outputs are compatible with automated severity scoring for conditions where excoriation is a key component (e.g., EASI for atopic dermatitis, SCORAD, prurigo scoring systems).
Lichenification Intensity Quantification
Model Classification: 🔬 Clinical Model
Description
A deep learning model ingests a clinical image of a skin lesion and outputs a probability vector:

$$\mathbf{p} = (p_1, p_2, \dots, p_{10})$$

where each $p_i$ (for $i = 1, \dots, 10$) corresponds to the model's softmax-normalized probability that the lichenification intensity belongs to ordinal category $i$ (ranging from minimal to maximal lichenification severity).

Although the outputs are numeric, they represent ordinal categorical values. To derive a continuous lichenification severity score $S$, a weighted expected value is computed:

$$S = \sum_{i=1}^{10} i \cdot p_i$$
This post-processing step ensures that the prediction accounts for the full probability distribution rather than only the most likely class, yielding a more stable and clinically interpretable severity score.
Objectives
- Support healthcare professionals in assessing lichenification (skin thickening with accentuated skin markings) severity by providing an objective, quantitative measure.
- Reduce inter-observer and intra-observer variability, which is particularly challenging due to the subtle gradations in skin texture and thickness.
- Ensure reproducibility and robustness across imaging conditions (illumination, angle, magnification, distance).
- Facilitate standardized evaluation in clinical practice and research, especially in chronic conditions where lichenification is a key indicator of disease chronicity and associated treatment resistance.
- Enable chronicity assessment, as lichenification represents chronic rubbing/scratching and is a marker of established, potentially treatment-resistant dermatosis requiring more aggressive intervention.
Justification (Clinical Evidence):
- Analysis of scoring systems for chronic skin conditions shows significant variability in lichenification assessment, with reported κ values of 0.45-0.70, reflecting difficulty in standardizing texture and thickness grading [cite: 87].
- Lichenification assessment is particularly challenging because it requires evaluating subtle changes in skin surface texture, accentuation of normal skin lines, and thickness—features that are difficult to quantify visually and may require tactile assessment [cite: 88].
- Advanced texture analysis algorithms have demonstrated superior detection of lichenified patterns, achieving accuracy rates >85% in identifying skin thickening and texture changes characteristic of lichenification [cite: 89].
- Validation studies comparing AI-based lichenification assessment with high-frequency ultrasound measurements (20-100 MHz) showed strong correlation (r > 0.78) with objective epidermal and dermal thickness measurements [cite: 90].
- Deep learning approaches incorporating depth estimation, shadow analysis, and fine-scale texture pattern recognition have shown 35% improvement in consistency compared to traditional visual scoring methods [cite: 89].
- Lichenification severity is a key component of EASI and SCORAD in atopic dermatitis, and its presence indicates chronic disease requiring intensified treatment, including consideration of systemic therapy [cite: 87].
- Lichenification correlates with treatment resistance—lichenified lesions respond more slowly to topical corticosteroids and require longer treatment duration [cite: 88].
- In lichen simplex chronicus, lichenification severity predicts time to resolution and recurrence risk, making accurate assessment important for prognosis [cite: 90].
Endpoints and Requirements
Performance is evaluated using Relative Mean Absolute Error (RMAE) against ground truth labeled by multiple experts, with the expectation that the algorithm achieves lower error than the average disagreement among the experts.
| Metric | Threshold | Interpretation |
|---|---|---|
| RMAE | ≤ 20% | Algorithm predictions deviate on average less than 20% from expert consensus, with performance superior to inter-observer variability. |
All thresholds must be achieved with 95% confidence intervals.
Requirements:
- Output a normalized probability distribution across 10 ordinal lichenification categories (softmax output, sum = 1).
- Convert probability outputs into a continuous score using the weighted expected value formula $S = \sum_{i=1}^{10} i \cdot p_i$.
- Demonstrate RMAE ≤ 20%, outperforming the average expert-to-expert variability.
- Report all metrics with 95% confidence intervals.
- Validate the model on an independent and diverse test dataset including:
- Various Fitzpatrick skin types (I-VI)—lichenification appearance varies with pigmentation
- Multiple anatomical sites (nape of neck, ankles, wrists, antecubital/popliteal fossae—common lichenification sites)
- Different imaging devices and conditions (macro photography beneficial for texture detail)
- Disease conditions including chronic atopic dermatitis, lichen simplex chronicus, prurigo nodularis, chronic contact dermatitis, and other chronic pruritic dermatoses
- Range of severity levels from minimal to severe lichenification
- Early vs. advanced lichenification (subtle accentuation vs. pronounced thickening)
- Document imaging recommendations for optimal lichenification assessment (e.g., lighting angle to enhance skin markings, appropriate magnification for texture detail).
- Ensure outputs are compatible with automated severity scoring for conditions where lichenification is a key component (e.g., EASI for atopic dermatitis, SCORAD, lichen simplex chronicus severity scores).
- Provide correlation analysis with objective measurements (ultrasound thickness, tactile assessment) when validation data includes instrumental or palpation-based assessments.
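The RMAE endpoint can be sketched as below. The normalisation by the scale range is an assumption (the document does not define RMAE precisely); the expert-to-expert baseline follows the stated requirement that the model outperform average inter-expert disagreement.

```python
import numpy as np

def rmae(pred, consensus, scale_range):
    """Mean absolute error normalised by the scale range, as a percentage.
    The normalisation choice is an assumption for this sketch."""
    pred = np.asarray(pred, dtype=float)
    consensus = np.asarray(consensus, dtype=float)
    return 100.0 * float(np.mean(np.abs(pred - consensus))) / scale_range

def expert_disagreement(expert_scores, scale_range):
    """Average pairwise absolute disagreement among raters, in the same
    units as rmae(); rows are cases, columns are individual experts."""
    s = np.asarray(expert_scores, dtype=float)
    n = s.shape[1]
    pair_mae = [np.mean(np.abs(s[:, i] - s[:, j]))
                for i in range(n) for j in range(i + 1, n)]
    return 100.0 * float(np.mean(pair_mae)) / scale_range

# Illustrative numbers only (not validation data):
model_rmae = rmae([3, 5, 7], [4, 5, 6], scale_range=9)
baseline = expert_disagreement([[3, 5, 4], [5, 7, 6], [6, 8, 7]],
                               scale_range=9)
# Acceptance: model_rmae <= 20 and model_rmae < baseline
```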
Wound Perilesional Erythema Assessment
Model Classification: 🔬 Clinical Model
Description
A deep learning binary classification model ingests a clinical image of a wound and outputs a probability distribution:

$$\mathbf{p} = (p_0, p_1)$$

where $p_1$ represents the probability that perilesional erythema is present around the wound, and $p_0 + p_1 = 1$.
The predicted presence is:

$$\hat{y} = \begin{cases} 1 & \text{if } p_1 \geq 0.5 \\ 0 & \text{otherwise} \end{cases}$$
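A minimal sketch of this decision rule, assuming a two-logit softmax head and a 0.5 operating point (both implementation assumptions; in practice the operating point may be tuned to meet the sensitivity target below):

```python
import numpy as np

def presence_from_logits(logits, threshold=0.5):
    """Softmax over (absent, present) logits, then a threshold on the
    'present' probability. Returns (decision, p_present)."""
    z = np.asarray(logits, dtype=float)
    e = np.exp(z - z.max())          # numerically stable softmax
    p = e / e.sum()
    p_present = float(p[1])
    return p_present >= threshold, p_present

detected, p1 = presence_from_logits([0.0, 2.0])  # p1 = sigmoid(2) ~ 0.88
```

The same pattern applies to every binary presence model in this document; only the training target (erythema, maceration, exudate type, tissue type, and so on) changes.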
Objectives
- Detect inflammatory response in tissue surrounding the wound, indicating infection risk or inflammatory conditions.
- Monitor treatment response by tracking changes in perilesional inflammation.
- Enable early infection detection through objective assessment of erythema extent.
- Reduce inter-observer variability in perilesional erythema assessment (κ = 0.45-0.65).
Justification (Clinical Evidence):
- Perilesional erythema extending >2 cm from the wound edge is 90% sensitive for wound infection [116].
- Perilesional erythema is a key indicator of wound infection and inflammatory response, with inter-observer agreement (κ) ranging from 0.45-0.65 [107, 108].
- Automated erythema assessment in wounds has shown correlation (r > 0.75) with expert visual assessment and clinical infection markers [109].
Endpoints and Requirements
| Metric | Threshold | Interpretation |
|---|---|---|
| AUC | ≥ 0.80 | Strong discriminative ability. |
| Sensitivity | ≥ 0.80 | High sensitivity for detecting erythema (critical). |
| Specificity | ≥ 0.75 | Acceptable specificity. |
| F1-Score | ≥ 0.75 | Balanced performance. |
| Cohen's Kappa | ≥ 0.65 | Substantial agreement with expert assessment. |
All thresholds must be achieved with 95% confidence intervals.
Requirements:
- Binary classification output with confidence score
- Validate on diverse wound types, patient populations, and imaging conditions
- Ensure compatibility with FHIR reporting and clinical decision support systems
Damaged Wound Edges Assessment
Model Classification: 🔬 Clinical Model
Description
A deep learning binary classification model assesses whether wound edges show signs of damage or compromise.
Objectives
- Support identification of compromised wound margins, which indicate poor healing potential and increased risk of chronic wounds.
- Enable treatment planning by objectively documenting edge viability and guiding debridement decisions.
- Predict healing outcomes based on edge integrity assessment.
Justification (Clinical Evidence):
- Damaged wound edges are associated with delayed healing and predict chronic wound development (OR 3.2-4.5) [111].
- Edge assessment is critical for determining debridement needs and healing prognosis.
- Studies show damaged edges increase time to closure by 40-60% compared to intact edges.
Endpoints and Requirements
| Metric | Threshold | Interpretation |
|---|---|---|
| AUC | ≥ 0.75 | Good discriminative ability. |
| Sensitivity | ≥ 0.75 | Good sensitivity for detecting damaged edges. |
| Specificity | ≥ 0.75 | Good specificity. |
| F1-Score | ≥ 0.75 | Balanced performance. |
| Cohen's Kappa | ≥ 0.60 | Moderate to substantial agreement. |
All thresholds must be achieved with 95% confidence intervals.
Delimited Wound Edges Assessment
Model Classification: 🔬 Clinical Model
Description
A deep learning binary classification model assesses whether wound boundaries are well-defined and delimited.
Objectives
- Assess wound boundary definition, which indicates healing progression and epithelialization potential.
- Support prognostic assessment of wound healing trajectory based on edge clarity.
- Enable standardized edge assessment reducing subjective interpretation.
Justification (Clinical Evidence):
- Well-delimited edges correlate with improved healing outcomes and reduced time to closure [112].
- Clear wound boundaries indicate organized healing response and predict successful closure.
- Delimited edges are associated with 30-40% faster healing rates compared to poorly defined boundaries.
Endpoints and Requirements
| Metric | Threshold | Interpretation |
|---|---|---|
| AUC | ≥ 0.75 | Good discriminative ability. |
| Sensitivity | ≥ 0.75 | Good sensitivity for detecting delimited edges. |
| Specificity | ≥ 0.75 | Good specificity. |
| F1-Score | ≥ 0.75 | Balanced performance. |
| Cohen's Kappa | ≥ 0.60 | Moderate to substantial agreement. |
All thresholds must be achieved with 95% confidence intervals.
Diffuse Wound Edges Assessment
Model Classification: 🔬 Clinical Model
Description
A deep learning binary classification model detects poorly defined or diffuse wound boundaries.
Objectives
- Identify poorly defined wound boundaries, indicating inflammation, infection, or underlying pathology.
- Flag high-risk wounds requiring enhanced monitoring and intervention.
- Enable early intervention for wounds with concerning edge characteristics.
Justification (Clinical Evidence):
- Diffuse wound edges are associated with higher infection rates (2.5-fold increase) and impaired healing [113].
- Poorly defined boundaries indicate active inflammation or infection requiring treatment intensification.
- Diffuse edges predict chronic wound development with 70-80% sensitivity.
Endpoints and Requirements
| Metric | Threshold | Interpretation |
|---|---|---|
| AUC | ≥ 0.75 | Good discriminative ability. |
| Sensitivity | ≥ 0.80 | High sensitivity for detecting diffuse edges (critical). |
| Specificity | ≥ 0.70 | Acceptable specificity. |
| F1-Score | ≥ 0.75 | Balanced performance. |
| Cohen's Kappa | ≥ 0.60 | Moderate to substantial agreement. |
All thresholds must be achieved with 95% confidence intervals.
Thickened Wound Edges Assessment
Model Classification: 🔬 Clinical Model
Description
A deep learning binary classification model detects hyperkeratotic or rolled (thickened) wound edges.
Objectives
- Detect hyperkeratotic or rolled edges, which represent mechanical barriers to epithelialization.
- Guide debridement strategy by identifying edge pathology requiring intervention.
- Enable objective edge assessment for treatment planning.
Justification (Clinical Evidence):
- Thickened wound edges require mechanical or surgical debridement to facilitate healing progression [114].
- Rolled or hyperkeratotic edges create physical barriers preventing epithelial migration.
- Edge debridement in wounds with thickening improves healing rates by 50-65%.
Endpoints and Requirements
| Metric | Threshold | Interpretation |
|---|---|---|
| AUC | ≥ 0.75 | Good discriminative ability. |
| Sensitivity | ≥ 0.75 | Good sensitivity for detecting thickened edges. |
| Specificity | ≥ 0.75 | Good specificity. |
| F1-Score | ≥ 0.75 | Balanced performance. |
| Cohen's Kappa | ≥ 0.60 | Moderate to substantial agreement. |
All thresholds must be achieved with 95% confidence intervals.
Indistinguishable Wound Edges Assessment
Model Classification: 🔬 Clinical Model
Description
A deep learning binary classification model identifies wounds where edges cannot be clearly determined.
Objectives
- Identify severe edge compromise where wound boundaries cannot be clinically determined.
- Flag critical wounds requiring urgent specialized wound care intervention.
- Enable risk stratification for poor healing outcomes.
Justification (Clinical Evidence):
- Indistinguishable edges indicate severe tissue damage and predict poor outcomes without aggressive intervention [115].
- Inability to define wound boundaries correlates with extensive tissue necrosis or severe infection.
- Wounds with indistinguishable edges have 85-90% risk of chronic wound development without intensive intervention.
Endpoints and Requirements
| Metric | Threshold | Interpretation |
|---|---|---|
| AUC | ≥ 0.80 | Strong discriminative ability (critical). |
| Sensitivity | ≥ 0.85 | High sensitivity for flagging critical cases. |
| Specificity | ≥ 0.75 | Acceptable specificity. |
| F1-Score | ≥ 0.80 | Balanced performance. |
| Cohen's Kappa | ≥ 0.65 | Substantial agreement. |
All thresholds must be achieved with 95% confidence intervals.
Perilesional Maceration Assessment
Model Classification: 🔬 Clinical Model
Description
A deep learning binary classification model detects moisture-related damage in periwound skin.
Objectives
- Identify moisture-related damage in periwound skin, which compromises healing and increases wound size.
- Guide moisture management and barrier protection strategies.
- Enable objective maceration assessment for treatment optimization.
Justification (Clinical Evidence):
- Perilesional maceration increases wound enlargement risk by 60-80% and delays healing [117].
- Maceration extent correlates with exudate volume and predicts dressing change frequency requirements [152].
- Resolution of maceration improves healing rates by 35-45% [153].
Endpoints and Requirements
| Metric | Threshold | Interpretation |
|---|---|---|
| AUC | ≥ 0.80 | Strong discriminative ability. |
| Sensitivity | ≥ 0.80 | High sensitivity for detecting maceration. |
| Specificity | ≥ 0.75 | Acceptable specificity. |
| F1-Score | ≥ 0.75 | Balanced performance. |
| Cohen's Kappa | ≥ 0.65 | Substantial agreement. |
All thresholds must be achieved with 95% confidence intervals.
Fibrinous Exudate Assessment
Model Classification: 🔬 Clinical Model
Description
A deep learning binary classification model identifies fibrinous exudate in wounds.
Objectives
- Identify normal healing exudate, which indicates active wound repair processes.
- Differentiate fibrinous from purulent exudate for appropriate treatment selection.
- Support exudate characterization in wound assessment protocols.
Justification (Clinical Evidence):
- Fibrinous exudate represents physiologic healing response [121].
- Presence of fibrin indicates active tissue repair and angiogenesis.
- Fibrinous exudate is a normal finding in healing wounds and should not trigger antimicrobial intervention.
Endpoints and Requirements
| Metric | Threshold | Interpretation |
|---|---|---|
| AUC | ≥ 0.75 | Good discriminative ability. |
| Sensitivity | ≥ 0.75 | Good sensitivity for detecting fibrin. |
| Specificity | ≥ 0.75 | Good specificity. |
| F1-Score | ≥ 0.75 | Balanced performance. |
| Cohen's Kappa | ≥ 0.60 | Moderate to substantial agreement. |
All thresholds must be achieved with 95% confidence intervals.
Purulent Exudate Assessment
Model Classification: 🔬 Clinical Model
Description
A deep learning binary classification model detects purulent (infected) exudate in wounds.
Objectives
- Detect infection indicators requiring antimicrobial intervention.
- Enable early infection identification before systemic signs develop.
- Support antimicrobial stewardship by objective infection assessment.
Justification (Clinical Evidence):
- Purulent exudate has 85-95% positive predictive value for wound infection [122].
- Detection of purulent drainage is a validated clinical sign of wound infection.
- Early identification of purulent exudate enables prompt antimicrobial therapy, reducing complications by 40-50%.
Endpoints and Requirements
| Metric | Threshold | Interpretation |
|---|---|---|
| AUC | ≥ 0.85 | Strong discriminative ability (critical). |
| Sensitivity | ≥ 0.85 | High sensitivity for infection detection. |
| Specificity | ≥ 0.80 | High specificity to avoid false alarms. |
| F1-Score | ≥ 0.80 | Balanced performance. |
| Cohen's Kappa | ≥ 0.70 | Substantial agreement. |
All thresholds must be achieved with 95% confidence intervals.
Bloody Exudate Assessment
Model Classification: 🔬 Clinical Model
Description
A deep learning binary classification model identifies bloody or hemorrhagic exudate in wounds.
Objectives
- Identify vascular injury or fragile granulation tissue.
- Detect trauma or mechanical disruption of healing tissue.
- Support assessment of angiogenesis quality and tissue fragility.
Justification (Clinical Evidence):
- Bloody exudate may indicate trauma, friable tissue, or neovascularization [121].
- Persistent bloody drainage suggests fragile granulation or vascular abnormalities.
- Recognition of bloody exudate guides gentler wound handling and dressing selection.
Endpoints and Requirements
| Metric | Threshold | Interpretation |
|---|---|---|
| AUC | ≥ 0.75 | Good discriminative ability. |
| Sensitivity | ≥ 0.75 | Good sensitivity for detecting blood. |
| Specificity | ≥ 0.75 | Good specificity. |
| F1-Score | ≥ 0.75 | Balanced performance. |
| Cohen's Kappa | ≥ 0.60 | Moderate to substantial agreement. |
All thresholds must be achieved with 95% confidence intervals.
Serous Exudate Assessment
Model Classification: 🔬 Clinical Model
Description
A deep learning binary classification model identifies serous (clear/watery) exudate in wounds.
Objectives
- Assess normal wound exudate in early healing phases.
- Differentiate serous from purulent drainage for infection assessment.
- Support exudate volume and type documentation.
Justification (Clinical Evidence):
- Serous exudate is characteristic of inflammatory phase healing [121].
- Clear serous drainage indicates normal wound fluid without infection.
- Serous exudate assessment helps guide moisture management strategies.
Endpoints and Requirements
| Metric | Threshold | Interpretation |
|---|---|---|
| AUC | ≥ 0.75 | Good discriminative ability. |
| Sensitivity | ≥ 0.75 | Good sensitivity for detecting serous fluid. |
| Specificity | ≥ 0.75 | Good specificity. |
| F1-Score | ≥ 0.75 | Balanced performance. |
| Cohen's Kappa | ≥ 0.60 | Moderate to substantial agreement. |
All thresholds must be achieved with 95% confidence intervals.
Biofilm-Compatible Tissue Assessment
Model Classification: 🔬 Clinical Model
Description
A deep learning binary classification model detects visual indicators of biofilm presence in wound tissue.
Objectives
- Detect visual indicators of biofilm presence, which represents a major barrier to healing.
- Guide antimicrobial strategy by identifying wounds requiring biofilm-targeted interventions.
- Enable early biofilm detection before clinical infection develops.
Justification (Clinical Evidence):
- Biofilm presence extends healing time 3- to 4-fold and increases infection risk [118].
- Visual biofilm indicators include glossy appearance, slough adherence, and characteristic patterns.
- Biofilm-targeted treatment (debridement + antimicrobials) improves healing rates by 45-60%.
Endpoints and Requirements
| Metric | Threshold | Interpretation |
|---|---|---|
| AUC | ≥ 0.80 | Strong discriminative ability. |
| Sensitivity | ≥ 0.75 | Good sensitivity for biofilm detection. |
| Specificity | ≥ 0.80 | High specificity (avoid false positives). |
| F1-Score | ≥ 0.75 | Balanced performance. |
| Cohen's Kappa | ≥ 0.65 | Substantial agreement. |
All thresholds must be achieved with 95% confidence intervals.
Wound Affected Tissue: Bone
Model Classification: 🔬 Clinical Model
Description
A deep learning binary classification model detects bone involvement or exposure in wounds.
Objectives
- Detect bone exposure or involvement, indicating deep wound requiring specialized management.
- Enable accurate wound depth staging based on tissue layer involvement.
- Guide surgical consultation and osteomyelitis risk assessment.
Justification (Clinical Evidence):
- Bone exposure in diabetic foot ulcers indicates osteomyelitis in 60-90% of cases [159].
- Wounds with bone involvement have 10- to 20-fold longer healing times than soft tissue wounds [160].
- Bone exposure extent predicts amputation risk: exposure >2 cm² increases risk 5-fold [161].
Endpoints and Requirements
| Metric | Threshold | Interpretation |
|---|---|---|
| AUC | ≥ 0.85 | Strong discriminative ability (critical). |
| Sensitivity | ≥ 0.85 | High sensitivity for detecting bone (critical). |
| Specificity | ≥ 0.80 | High specificity. |
| F1-Score | ≥ 0.80 | Balanced performance. |
| Cohen's Kappa | ≥ 0.70 | Substantial agreement. |
All thresholds must be achieved with 95% confidence intervals.
Wound Affected Tissue: Subcutaneous
Model Classification: 🔬 Clinical Model
Description
A deep learning binary classification model detects subcutaneous tissue involvement in wounds.
Objectives
- Identify subcutaneous fat layer involvement, critical for accurate wound staging.
- Enable Stage II vs. Stage III differentiation in pressure injury classification.
- Guide treatment planning based on wound depth assessment.
Justification (Clinical Evidence):
- Subcutaneous involvement defines Stage III pressure injuries per NPUAP/EPUAP guidelines.
- Accurate depth assessment is fundamental to wound staging and treatment selection [119, 120].
- Wounds extending to subcutaneous tissue require more intensive management and have longer healing times.
Endpoints and Requirements
| Metric | Threshold | Interpretation |
|---|---|---|
| AUC | ≥ 0.80 | Strong discriminative ability. |
| Sensitivity | ≥ 0.80 | High sensitivity for detecting subcutaneous tissue. |
| Specificity | ≥ 0.80 | High specificity. |
| F1-Score | ≥ 0.80 | Balanced performance. |
| Cohen's Kappa | ≥ 0.70 | Substantial agreement. |
All thresholds must be achieved with 95% confidence intervals.
Wound Affected Tissue: Muscle
Model Classification: 🔬 Clinical Model
Description
A deep learning binary classification model detects muscle tissue involvement or exposure in wounds.
Objectives
- Identify muscle layer involvement, indicating Stage IV wounds or severe injury.
- Enable accurate depth-based staging for treatment protocol selection.
- Guide surgical consultation for potential flap coverage or complex closure.
Justification (Clinical Evidence):
- Muscle involvement defines Stage IV pressure injuries per NPUAP/EPUAP classification.
- Wounds exposing muscle require surgical intervention in 70-85% of cases.
- Muscle exposure is associated with significantly prolonged healing (3-6 months average) and high complication rates.
Endpoints and Requirements
| Metric | Threshold | Interpretation |
|---|---|---|
| AUC | ≥ 0.85 | Strong discriminative ability (critical). |
| Sensitivity | ≥ 0.85 | High sensitivity for detecting muscle (critical). |
| Specificity | ≥ 0.80 | High specificity. |
| F1-Score | ≥ 0.80 | Balanced performance. |
| Cohen's Kappa | ≥ 0.70 | Substantial agreement. |
All thresholds must be achieved with 95% confidence intervals.
Wound Affected Tissue: Intact Skin
Model Classification: 🔬 Clinical Model
Description
A deep learning binary classification model assesses whether wound area shows intact (unbroken) skin.
Objectives
- Identify Stage I pressure injuries with intact skin but underlying tissue damage.
- Detect closed wounds vs. open ulcerations for staging purposes.
- Support accurate classification of non-blanchable erythema.
Justification (Clinical Evidence):
- Stage I pressure injuries present with intact skin and non-blanchable erythema, requiring different management than open wounds.
- Intact skin overlying deep tissue injury represents evolving tissue damage requiring monitoring.
- Recognition of intact vs. broken skin is fundamental to pressure injury staging.
Endpoints and Requirements
| Metric | Threshold | Interpretation |
|---|---|---|
| AUC | ≥ 0.80 | Strong discriminative ability. |
| Sensitivity | ≥ 0.80 | High sensitivity for detecting intact skin. |
| Specificity | ≥ 0.80 | High specificity. |
| F1-Score | ≥ 0.80 | Balanced performance. |
| Cohen's Kappa | ≥ 0.70 | Substantial agreement. |
All thresholds must be achieved with 95% confidence intervals.
Wound Affected Tissue: Dermis-Epidermis
Model Classification: 🔬 Clinical Model
Description
A deep learning binary classification model detects partial-thickness skin loss involving dermis and/or epidermis.
Objectives
- Identify Stage II pressure injuries with partial-thickness skin loss.
- Differentiate superficial from deep wounds for appropriate treatment selection.
- Enable accurate staging based on depth of tissue involvement.
Justification (Clinical Evidence):
- Partial-thickness wounds involving dermis/epidermis define Stage II pressure injuries.
- Dermal involvement without subcutaneous exposure indicates superficial wound with generally favorable healing prognosis.
- Accurate differentiation of dermal vs. deeper involvement guides treatment intensity and healing time estimates.
Endpoints and Requirements
| Metric | Threshold | Interpretation |
|---|---|---|
| AUC | ≥ 0.80 | Strong discriminative ability. |
| Sensitivity | ≥ 0.80 | High sensitivity for detecting dermal involvement. |
| Specificity | ≥ 0.80 | High specificity. |
| F1-Score | ≥ 0.80 | Balanced performance. |
| Cohen's Kappa | ≥ 0.70 | Substantial agreement. |
All thresholds must be achieved with 95% confidence intervals.
Wound Bed Tissue: Necrotic
Model Classification: 🔬 Clinical Model
Description
A deep learning binary classification model detects necrotic tissue presence in the wound bed.
Objectives
- Identify non-viable tissue requiring urgent debridement.
- Enable objective necrosis assessment for treatment prioritization.
- Support debridement planning and monitoring.
Justification (Clinical Evidence):
- Necrosis presence is absolute indication for debridement and predictor of poor outcomes [126].
- Necrotic tissue increases infection risk 5- to 8-fold and delays healing.
- Complete necrosis removal improves healing rates by 50-70% [150].
Endpoints and Requirements
| Metric | Threshold | Interpretation |
|---|---|---|
| AUC | ≥ 0.85 | Strong discriminative ability (critical). |
| Sensitivity | ≥ 0.85 | High sensitivity for detecting necrosis. |
| Specificity | ≥ 0.80 | High specificity. |
| F1-Score | ≥ 0.80 | Balanced performance. |
| Cohen's Kappa | ≥ 0.70 | Substantial agreement. |
All thresholds must be achieved with 95% confidence intervals.
Wound Bed Tissue: Closed
Model Classification: 🔬 Clinical Model
Description
A deep learning binary classification model assesses whether a wound bed is closed (healed) or open.
Objectives
- Identify wound closure as primary healing outcome.
- Support healing assessment and endpoint determination.
- Enable objective closure documentation for reimbursement and outcomes tracking.
Justification (Clinical Evidence):
- Wound closure is the primary outcome measure in wound healing trials.
- Objective closure assessment reduces inter-observer variability in healing determination.
- Documentation of wound closure is required for treatment discontinuation and outcomes reporting.
Endpoints and Requirements
| Metric | Threshold | Interpretation |
|---|---|---|
| AUC | ≥ 0.85 | Strong discriminative ability. |
| Sensitivity | ≥ 0.85 | High sensitivity for detecting closure. |
| Specificity | ≥ 0.85 | High specificity (avoid premature closure calls). |
| F1-Score | ≥ 0.85 | Balanced performance. |
| Cohen's Kappa | ≥ 0.75 | Substantial to excellent agreement. |
All thresholds must be achieved with 95% confidence intervals.
Wound Bed Tissue: Granulation
Model Classification: 🔬 Clinical Model
Description
A deep learning binary classification model detects healthy granulation tissue in the wound bed.
Objectives
- Assess healthy healing tissue formation, indicating active repair.
- Predict healing success based on granulation presence.
- Guide treatment decisions regarding wound bed preparation adequacy.
Justification (Clinical Evidence):
- Granulation tissue presence is strongest predictor of healing success (OR 8.5) [127].
- Granulation tissue covering >75% of wound bed predicts healing with OR 8.2-12.5 [127, 142].
- Presence of healthy granulation indicates adequate vascular supply and healing potential.
Endpoints and Requirements
| Metric | Threshold | Interpretation |
|---|---|---|
| AUC | ≥ 0.80 | Strong discriminative ability. |
| Sensitivity | ≥ 0.80 | High sensitivity for detecting granulation. |
| Specificity | ≥ 0.80 | High specificity. |
| F1-Score | ≥ 0.80 | Balanced performance. |
| Cohen's Kappa | ≥ 0.70 | Substantial agreement. |
All thresholds must be achieved with 95% confidence intervals.
Wound Bed Tissue: Epithelial
Model Classification: 🔬 Clinical Model
Description
A deep learning binary classification model detects epithelialization in the wound bed.
Objectives
- Detect epithelialization, indicating advanced healing and imminent closure.
- Support healing phase assessment for treatment optimization.
- Predict imminent wound closure for care planning.
Justification (Clinical Evidence):
- Epithelialization is the final healing phase and predictor of imminent wound closure [128].
- Presence of epithelial tissue indicates successful wound bed preparation and healing progression.
- Epithelial advancement from the wound edges is the hallmark of re-epithelialization and approaching closure.
Endpoints and Requirements
| Metric | Threshold | Interpretation |
|---|---|---|
| AUC | ≥ 0.80 | Strong discriminative ability. |
| Sensitivity | ≥ 0.80 | High sensitivity for detecting epithelialization. |
| Specificity | ≥ 0.80 | High specificity. |
| F1-Score | ≥ 0.80 | Balanced performance. |
| Cohen's Kappa | ≥ 0.70 | Substantial agreement. |
All thresholds must be achieved with 95% confidence intervals.
Wound Bed Tissue: Slough
Model Classification: 🔬 Clinical Model
Description
A deep learning binary classification model detects slough (devitalized tissue) in the wound bed.
Objectives
- Detect devitalized tissue requiring debridement for healing progression.
- Guide debridement strategy and wound bed preparation.
- Monitor debridement efficacy through serial assessments.
Justification (Clinical Evidence):
- Slough presence delays healing and increases infection risk by 40-60% [125].
- Slough covering >30% of wound bed delays healing by average 6-8 weeks [146].
- Complete debridement to <10% slough coverage improves healing rates by 45-60% [147].
Endpoints and Requirements
| Metric | Threshold | Interpretation |
|---|---|---|
| AUC | ≥ 0.80 | Strong discriminative ability. |
| Sensitivity | ≥ 0.80 | High sensitivity for detecting slough. |
| Specificity | ≥ 0.75 | Good specificity. |
| F1-Score | ≥ 0.75 | Balanced performance. |
| Cohen's Kappa | ≥ 0.65 | Substantial agreement. |
All thresholds must be achieved with 95% confidence intervals.
Wound Stage Classification
Model Classification: 🔬 Clinical Model
Description
A deep learning multi-class classification model assigns wounds to standardized stages (0, I, II, III, IV), outputting a probability distribution:

$$\mathbf{p} = (p_0, p_1, p_2, p_3, p_4)$$

where each $p_s$ represents the probability of the wound belonging to stage $s$, and $\sum_{s=0}^{4} p_s = 1$.
The predicted stage is:

$$\hat{s} = \arg\max_{s} \, p_s$$
Objectives
- Provide standardized staging according to internationally recognized wound classification systems (NPUAP/EPUAP).
- Enable treatment protocol selection based on validated stage-specific guidelines.
- Facilitate outcome prediction using stage-based prognostic models.
- Support documentation and reimbursement with objective staging classification.
- Reduce inter-observer variability in wound staging (κ = 0.55-0.70).
Justification (Clinical Evidence):
- Wound staging is fundamental to treatment planning, with stage determining intervention intensity and expected healing time [129].
- Inter-observer agreement for manual staging shows moderate reliability (κ = 0.55-0.70), highlighting need for objective tools [130].
- Stage-based treatment protocols improve healing rates by 25-35% compared to non-standardized care [131].
- Accurate staging is required for reimbursement and quality metrics in many healthcare systems.
Endpoints and Requirements
| Metric | Threshold | Interpretation |
|---|---|---|
| Overall Accuracy | ≥ 75% | Correct stage classification in 3 out of 4 cases. |
| Weighted Kappa (κw) | ≥ 0.70 | Substantial agreement with expert staging. |
| Adjacent Stage Accuracy | ≥ 90% | Within one stage of expert assessment (clinically safe). |
| Macro F1-Score | ≥ 0.70 | Balanced performance across all stages. |
| Class-specific F1 | ≥ 0.65 | Minimum acceptable F1 for each individual stage. |
All thresholds must be achieved with 95% confidence intervals.
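The weighted-kappa endpoint can be computed as below. Quadratic disagreement weights are an assumption for this sketch, since the table specifies only "Weighted Kappa (κw)"; stages 0, I, II, III, IV are encoded as integers 0-4.

```python
import numpy as np

def quadratic_weighted_kappa(rater_a, rater_b, n_classes=5):
    """Weighted kappa with quadratic disagreement weights: a four-stage
    disagreement is penalised 16x more than a one-stage disagreement."""
    a = np.asarray(rater_a, dtype=int)
    b = np.asarray(rater_b, dtype=int)
    observed = np.zeros((n_classes, n_classes))
    for i, j in zip(a, b):
        observed[i, j] += 1
    observed /= observed.sum()
    idx = np.arange(n_classes)
    weights = np.subtract.outer(idx, idx) ** 2 / (n_classes - 1) ** 2
    # Expected agreement matrix under independent marginals
    expected = np.outer(observed.sum(axis=1), observed.sum(axis=0))
    return 1.0 - (weights * observed).sum() / (weights * expected).sum()
```

Perfect agreement yields κw = 1, and a prediction one stage off costs far less than one four stages off, matching the adjacent-stage-accuracy rationale above.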
Requirements:
- Multi-class classification (5 classes: 0, I, II, III, IV)
- Output probability distribution and predicted stage with confidence
- Implement ordinal loss functions (penalize a Stage I → IV error more heavily than a Stage I → II error)
- Validate on diverse wound types and patient populations
- Ensure compatibility with NPUAP/EPUAP staging systems and FHIR reporting
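The ordinal-loss requirement can be illustrated with a minimal expected-stage-distance penalty. This is a sketch only; production ordinal losses (e.g. CORAL-style ordinal regression heads) are more elaborate, but the distance-aware principle is the same.

```python
import numpy as np

def expected_stage_distance(probs, true_stage):
    """Expected absolute distance between the predicted stage distribution
    and the true stage: mis-staging I as IV (distance 3) costs three times
    as much as mis-staging I as II (distance 1)."""
    probs = np.asarray(probs, dtype=float)
    stages = np.arange(probs.size)       # stages 0, I, II, III, IV -> 0..4
    return float(np.dot(np.abs(stages - true_stage), probs))

# All mass on the true stage -> zero penalty; distant mass costs more.
```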
Wound AWOSI Score Quantification
Model Classification: 🔬 Clinical Model
Description
A deep learning ordinal regression model quantifies wound severity using the AWOSI (Annotated Wound Observational Severity Index) scale (0-20), outputting a probability distribution:

$$\mathbf{p} = (p_0, p_1, \dots, p_{20})$$

where each $p_k$ represents the probability that the wound has AWOSI score $k$, and $\sum_{k=0}^{20} p_k = 1$.
The continuous AWOSI score is derived using the weighted expected value:

$$S = \sum_{k=0}^{20} k \cdot p_k$$
Objectives
- Provide composite severity assessment integrating multiple wound characteristics into a single validated score.
- Enable objective severity stratification for clinical decision-making and resource allocation.
- Track healing progression using standardized numerical scale over time.
- Facilitate clinical trial endpoints with validated, reproducible severity metric.
- Support treatment intensification decisions based on objective severity thresholds.
Justification (Clinical Evidence):
- Composite wound scores like AWOSI show strong correlation with healing time (r = 0.72-0.85) and clinical outcomes [132].
- Validated wound intensity scores improve inter-observer reliability from κ 0.45-0.60 to κ 0.75-0.85 [133].
- Longitudinal wound intensity tracking enables early identification of non-healing wounds (sensitivity 78-85%) [134].
- AWOSI scores predict healing outcomes: scores >15 associated with chronic wound risk (OR 4.2-6.8).
- Quantitative severity scores enable objective treatment escalation protocols and resource prioritization.
Endpoints and Requirements
| Metric | Threshold | Interpretation |
|---|---|---|
| RMAE (Relative MAE) | ≤ 20% | Predictions deviate ≤20% from expert consensus on average. |
| MAE (Mean Absolute Error) | ≤ 3.0 | Average error ≤3 points on 0-20 scale (15% of range). |
| Within-2-Points Accuracy | ≥ 75% | Predictions within ±2 points of expert score in 75% of cases. |
| Correlation (Pearson r) | ≥ 0.80 | Strong correlation with expert AWOSI scores. |
| ICC (Intraclass Corr.) | ≥ 0.75 | Substantial agreement with expert scoring. |
All thresholds must be achieved with 95% confidence intervals.
Requirements:
- Ordinal regression model outputting probability distribution across 0-20 scale
- Calculate continuous score using weighted expected value
- Implement ordinal loss functions (preserve score ordering)
- Demonstrate RMAE ≤ 20%, MAE ≤ 3.0, within-2-points accuracy ≥ 75%
- Validate on diverse wound types, stages, and patient populations
- Report correlation with expert AWOSI scores and healing outcomes
- Ensure compatibility with AWOSI calculation protocols and FHIR reporting
- Support longitudinal tracking for treatment response monitoring
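A sketch of how the MAE, RMAE, and within-2-points endpoints might be computed against expert consensus scores. The RMAE normalization (by the mean expert score) is an assumption for illustration; the AWOSI protocol may define the denominator differently.

```python
def awosi_endpoint_metrics(predicted, expert):
    """predicted, expert: parallel lists of AWOSI scores on the 0-20 scale."""
    # Absolute errors against expert consensus scores.
    errors = [abs(p - e) for p, e in zip(predicted, expert)]
    mae = sum(errors) / len(errors)
    # NOTE: assumed RMAE definition -- MAE relative to the mean expert
    # score; the exact normalization is not specified here.
    rmae = mae / (sum(expert) / len(expert))
    # Fraction of predictions within +/-2 points of the expert score.
    within_2 = sum(e <= 2 for e in errors) / len(errors)
    return {"mae": mae, "rmae": rmae, "within_2_points": within_2}
```

Each returned value maps directly onto one row of the endpoints table (MAE ≤ 3.0, RMAE ≤ 20%, within-2-points ≥ 75%).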
Body Surface Segmentation
Model Classification: 🛠️ Non-Clinical Model
Algorithm Description
A deep learning multi-class segmentation model ingests a clinical image and outputs a pixel-wise probability map across $K$ anatomical body region categories:

$$\hat{Y} = \{\hat{y}_{x,y}\}, \quad \hat{y}_{x,y} \in \{1, \ldots, K\}$$

where $\hat{y}_{x,y}$ represents the predicted body region class for pixel $(x, y)$.

The model architecture outputs a probability distribution over all $K$ region classes for each pixel:

$$P_{x,y} = [p_{x,y,1}, \ldots, p_{x,y,K}]$$

where $\sum_{k=1}^{K} p_{x,y,k} = 1$ for each pixel.

The predicted body region for each pixel is:

$$\hat{y}_{x,y} = \arg\max_{k} \, p_{x,y,k}$$
From this segmentation, the algorithm computes body surface area (BSA) percentages for each anatomical region, enabling automated severity scoring calculations: