R-TF-028-001 AI Description
Table of contents
- Purpose
- Scope
- Algorithm summary
- Algorithm Classification
- Description and Specifications
- ICD Category Distribution and Binary Indicators
- Erythema Intensity Quantification
- Desquamation Intensity Quantification
- Induration Intensity Quantification
- Pustule Intensity Quantification
- Crusting Intensity Quantification
- Xerosis Intensity Quantification
- Swelling Intensity Quantification
- Oozing Intensity Quantification
- Excoriation Intensity Quantification
- Lichenification Intensity Quantification
- Wound Perilesional Erythema Assessment
- Damaged Wound Edges Assessment
- Delimited Wound Edges Assessment
- Diffuse Wound Edges Assessment
- Thickened Wound Edges Assessment
- Indistinguishable Wound Edges Assessment
- Perilesional Maceration Assessment
- Fibrinous Exudate Assessment
- Purulent Exudate Assessment
- Bloody Exudate Assessment
- Serous Exudate Assessment
- Biofilm-Compatible Tissue Assessment
- Wound Affected Tissue: Bone
- Wound Affected Tissue: Subcutaneous
- Wound Affected Tissue: Muscle
- Wound Affected Tissue: Intact Skin
- Wound Affected Tissue: Dermis-Epidermis
- Wound Bed Tissue: Necrotic
- Wound Bed Tissue: Closed
- Wound Bed Tissue: Granulation
- Wound Bed Tissue: Epithelial
- Wound Bed Tissue: Slough
- Wound Stage Classification
- Wound AWOSI Score Quantification
- Body Surface Segmentation
- Erythema Surface Quantification
- Wound Bed Surface Quantification
- Angiogenesis and Granulation Tissue Surface Quantification
- Biofilm and Slough Surface Quantification
- Necrosis Surface Quantification
- Maceration Surface Quantification
- Orthopedic Material Surface Quantification
- Bone, Cartilage, or Tendon Surface Quantification
- Hair Loss Surface Quantification
- Hair Follicle Quantification
- Inflammatory Nodular Lesion Quantification
- Acneiform Lesion Type Quantification
- Acneiform Inflammatory Lesion Quantification
- Hive Lesion Quantification
- Nail Lesion Surface Quantification
- Hypopigmentation or Depigmentation Surface Quantification
- Hyperpigmentation Surface Quantification
- Acneiform Inflammatory Pattern Identification
- Follicular and Inflammatory Pattern Identification
- Inflammatory Pattern Identification
- Dermatology Image Quality Assessment (DIQA)
- Fitzpatrick Skin Type Identification
- Domain Validation
- Skin Surface Segmentation
- Surface Area Quantification
- Body Site Identification
- Head Detection
- Data Specifications
- Other Specifications
- Cybersecurity and Transparency
- Specifications and Risks
- Integration and Environment
- References
- Traceability to QMS Records
Purpose
This document defines the specifications, performance requirements, and data needs for the Artificial Intelligence (AI) models used in the Legit.Health Plus device.
Scope
This document details the design and performance specifications for all AI algorithms integrated into the Legit.Health Plus device. It establishes the foundation for the development, validation, and risk management of these models.
This description covers the following key areas for each algorithm:
- Algorithm description, clinical objectives, and justification.
- Performance endpoints and acceptance criteria.
- Specifications for the data required for development and evaluation.
- Requirements related to cybersecurity, transparency, and integration.
- Links between the AI specifications and the overall risk management process.
Algorithm summary
| ID | Model Name | Type | Task Type | Visible Signs | Clinical Context |
|---|---|---|---|---|---|
| 1 | ICD Category Distribution and Binary Indicators | 🔬 Clinical | Classification | All Dermatological Conditions | ICD-11, Diagnosis, Triage |
| 2 | Erythema Intensity Quantification | 🔬 Clinical | Ordinal Classification | Erythema | PASI, EASI, SCORAD, GPPGA, PPPASI |
| 3 | Desquamation Intensity Quantification | 🔬 Clinical | Ordinal Classification | Desquamation | PASI, GPPGA, PPPASI |
| 4 | Induration Intensity Quantification | 🔬 Clinical | Ordinal Classification | Induration | PASI |
| 5 | Pustule Intensity Quantification | 🔬 Clinical | Ordinal Classification | Pustule | PPPASI, GPPGA, Acne |
| 6 | Crusting Intensity Quantification | 🔬 Clinical | Ordinal Classification | Crusting | EASI, SCORAD |
| 7 | Xerosis Intensity Quantification | 🔬 Clinical | Ordinal Classification | Xerosis | EASI, SCORAD, ODS |
| 8 | Swelling Intensity Quantification | 🔬 Clinical | Ordinal Classification | Swelling | EASI, SCORAD |
| 9 | Oozing Intensity Quantification | 🔬 Clinical | Ordinal Classification | Oozing | EASI, SCORAD |
| 10 | Excoriation Intensity Quantification | 🔬 Clinical | Ordinal Classification | Excoriation | EASI, SCORAD |
| 11 | Lichenification Intensity Quantification | 🔬 Clinical | Ordinal Classification | Lichenification | EASI, SCORAD |
| 12 | Wound Perilesional Erythema Assessment | 🔬 Clinical | Binary Classification | Perilesional Erythema | Wound Assessment, Infection Detection |
| 13 | Damaged Wound Edges Assessment | 🔬 Clinical | Binary Classification | Damaged Edges | Wound Assessment, Healing Prognosis |
| 14 | Delimited Wound Edges Assessment | 🔬 Clinical | Binary Classification | Delimited Edges | Wound Assessment, Healing Prognosis |
| 15 | Diffuse Wound Edges Assessment | 🔬 Clinical | Binary Classification | Diffuse Edges | Wound Assessment, Infection Detection |
| 16 | Thickened Wound Edges Assessment | 🔬 Clinical | Binary Classification | Thickened Edges | Wound Assessment, Debridement Planning |
| 17 | Indistinguishable Wound Edges Assessment | 🔬 Clinical | Binary Classification | Indistinguishable Edges | Wound Assessment, Critical Wound Identification |
| 18 | Perilesional Maceration Assessment | 🔬 Clinical | Binary Classification | Perilesional Maceration | Wound Assessment, Moisture Management |
| 19 | Fibrinous Exudate Assessment | 🔬 Clinical | Binary Classification | Fibrinous Exudate | Wound Assessment, Exudate Characterization |
| 20 | Purulent Exudate Assessment | 🔬 Clinical | Binary Classification | Purulent Exudate | Wound Assessment, Infection Detection |
| 21 | Bloody Exudate Assessment | 🔬 Clinical | Binary Classification | Bloody Exudate | Wound Assessment, Tissue Fragility Assessment |
| 22 | Serous Exudate Assessment | 🔬 Clinical | Binary Classification | Serous Exudate | Wound Assessment, Exudate Characterization |
| 23 | Biofilm-Compatible Tissue Assessment | 🔬 Clinical | Binary Classification | Biofilm-Compatible Tissue | Wound Assessment, Biofilm Detection |
| 24 | Wound Affected Tissue: Bone | 🔬 Clinical | Binary Classification | Bone Tissue | Wound Assessment, Depth Assessment, Osteomyelitis Risk |
| 25 | Wound Affected Tissue: Subcutaneous | 🔬 Clinical | Binary Classification | Subcutaneous Tissue | Wound Assessment, Depth Assessment, NPUAP |
| 26 | Wound Affected Tissue: Muscle | 🔬 Clinical | Binary Classification | Muscle Tissue | Wound Assessment, Depth Assessment, NPUAP |
| 27 | Wound Affected Tissue: Intact Skin | 🔬 Clinical | Binary Classification | Intact Skin | Wound Assessment, Stage I Pressure Injury, NPUAP |
| 28 | Wound Affected Tissue: Dermis-Epidermis | 🔬 Clinical | Binary Classification | Dermis-Epidermis Tissue | Wound Assessment, Depth Assessment, NPUAP |
| 29 | Wound Bed Tissue: Necrotic | 🔬 Clinical | Binary Classification | Necrotic Tissue | Wound Assessment, Debridement Planning |
| 30 | Wound Bed Tissue: Closed | 🔬 Clinical | Binary Classification | Closed Wound | Wound Assessment, Healing Outcomes |
| 31 | Wound Bed Tissue: Granulation | 🔬 Clinical | Binary Classification | Granulation Tissue | Wound Assessment, Healing Prognosis |
| 32 | Wound Bed Tissue: Epithelial | 🔬 Clinical | Binary Classification | Epithelial Tissue | Wound Assessment, Healing Phase Assessment |
| 33 | Wound Bed Tissue: Slough | 🔬 Clinical | Binary Classification | Slough Tissue | Wound Assessment, Debridement Planning |
| 34 | Wound Stage Classification | 🔬 Clinical | Multi Class Classification | Wound Stage | Wound Assessment, NPUAP, Treatment Planning |
| 35 | Wound AWOSI Score Quantification | 🔬 Clinical | Ordinal Classification | Wound AWOSI Score | AWOSI, Wound Assessment, Severity Stratification |
| 36 | Erythema Surface Quantification | 🔬 Clinical | Segmentation | Erythema | Wound Assessment, Infection Surveillance, AWOSI |
| 37 | Wound Bed Surface Quantification | 🔬 Clinical | Segmentation | Wound Bed | Wound Assessment, Wound Measurement, Healing Rate, AWOSI |
| 38 | Angiogenesis and Granulation Tissue Surface Quantification | 🔬 Clinical | Segmentation | Angiogenesis and Granulation Tissue | Wound Assessment, Wound Bed Preparation, Healing Prediction, AWOSI |
| 39 | Biofilm and Slough Surface Quantification | 🔬 Clinical | Segmentation | Biofilm and Slough | Wound Assessment, Debridement Planning, TIME Framework, AWOSI |
| 40 | Necrosis Surface Quantification | 🔬 Clinical | Segmentation | Necrosis | Wound Assessment, Urgent Debridement, Infection Risk, AWOSI |
| 41 | Maceration Surface Quantification | 🔬 Clinical | Segmentation | Maceration | Wound Assessment, Moisture Management, Dressing Selection, AWOSI |
| 42 | Orthopedic Material Surface Quantification | 🔬 Clinical | Segmentation | Orthopedic Material | Wound Assessment, Surgical Revision, Device Complications, Infection Risk |
| 43 | Bone, Cartilage, or Tendon Surface Quantification | 🔬 Clinical | Segmentation | Bone, Cartilage, or Tendon | Wound Assessment, Osteomyelitis Risk, Urgent Surgical Consultation, Amputation Risk, AWOSI |
| 44 | Hair Loss Surface Quantification | 🔬 Clinical | Segmentation | Alopecia | SALT, APULSI, Alopecia Assessment |
| 45 | Hair Follicle Quantification | 🔬 Clinical | Object Detection | Hair Follicles | Androgenetic Alopecia, Alopecia Areata, Telogen Effluvium, Hair Transplantation, Treatment Monitoring |
| 46 | Inflammatory Nodular Lesion Quantification | 🔬 Clinical | Multi Class Object Detection | Nodule, Abscess, Non-draining Tunnel, Draining Tunnel | IHS4, Hidradenitis Suppurativa |
| 47 | Acneiform Lesion Type Quantification | 🔬 Clinical | Multi Class Object Detection | Papule, Pustule, Cyst, Comedone, Nodule | GAGS, IGA |
| 48 | Acneiform Inflammatory Lesion Quantification | 🔬 Clinical | Object Detection | Inflammatory Lesion | GAGS, EASI, Inflammatory Dermatoses, IGA |
| 49 | Hive Lesion Quantification | 🔬 Clinical | Object Detection | Hive | UAS7, UCT |
| 50 | Nail Lesion Surface Quantification | 🔬 Clinical | Segmentation | Nail Lesion | NAPSI, OSI |
| 51 | Hypopigmentation or Depigmentation Surface Quantification | 🔬 Clinical | Segmentation | Hypopigmentation or Depigmentation | VASI, VETF, Vitiligo Assessment |
| 52 | Hyperpigmentation Surface Quantification | 🔬 Clinical | Segmentation | Hyperpigmentation | MASI, mMASI, Melasma Assessment, PIH Assessment |
| 53 | Acneiform Inflammatory Pattern Identification | 🔬 Clinical | Tabular Classification | Inflammatory Lesion Count, Lesion Density | IGA, Acne Assessment |
| 54 | Follicular and Inflammatory Pattern Identification | 🔬 Clinical | Classification | — | Hidradenitis Suppurativa, Martorell Classification, HS Phenotyping |
| 55 | Inflammatory Pattern Identification | 🔬 Clinical | Multi Task Classification | Hurley Stage, Inflammatory Activity | Hidradenitis Suppurativa, Hurley Staging, HS Severity, Disease Activity, Treatment Selection, IHS4, HS-PGA |
| 56 | Body Surface Segmentation | 🛠️ Non-Clinical | Multi Class Segmentation | — | PASI, EASI, BSA Calculation, Burn Assessment |
| 57 | Surface Area Quantification | 🛠️ Non-Clinical | Regression | — | BSA Calculation, Surface Area Measurement, Calibration |
| 58 | Dermatology Image Quality Assessment (DIQA) | 🛠️ Non-Clinical | Regression | — | Quality Control, Telemedicine |
| 59 | Fitzpatrick Skin Type Identification | 🛠️ Non-Clinical | Classification | — | Bias Monitoring, Equity Assessment, Performance Stratification |
| 60 | Domain Validation | 🛠️ Non-Clinical | Classification | — | Image Routing, Quality Control, Domain Classification |
| 61 | Skin Surface Segmentation | 🛠️ Non-Clinical | Segmentation | — | Preprocessing, ROI Extraction, Skin Detection |
| 62 | Body Site Identification | 🛠️ Non-Clinical | Classification | — | Anatomical Context, Site-Specific Analysis, Documentation |
| 63 | Head Detection | 🛠️ Non-Clinical | Object Detection | — | Privacy Protection, Quality Control, Patient Counting, Multi-patient Detection |
Algorithm Classification
The AI algorithms in the Legit.Health Plus device are classified into two categories based on their relationship to the device's intended purpose as defined in the Technical Documentation.
Clinical Models
Clinical models are AI algorithms that directly fulfill the device's intended purpose by providing one or more of the following outputs to healthcare professionals:
- Quantitative data on clinical signs (severity measurement of dermatological features)
- Interpretative distribution of ICD categories (diagnostic support for skin conditions)
These models:
- Directly contribute to the device's medical purpose of supporting healthcare providers in assessing skin structures
- Provide outputs that healthcare professionals use for diagnosis, monitoring, or treatment decisions
- Generate quantitative measurements or probability distributions that constitute medical information
- Are integral to the clinical claims and intended use of the device
- Are subject to full clinical validation and regulatory requirements under MDR 2017/745 and RDC 751/2022
Non-Clinical Models
Non-clinical models are AI algorithms that enable the proper functioning of the device but do not themselves provide the outputs defined in the intended purpose. These models:
- Perform quality assurance, preprocessing, or technical validation functions
- Ensure that clinical models receive appropriate inputs and operate within their validated domains
- Support equity, bias mitigation, and performance monitoring across diverse populations
- Do not generate quantitative data on clinical signs or interpretative distributions of ICD categories
- Do not independently provide medical information used for diagnosis, monitoring, or treatment decisions
- Serve as auxiliary technical infrastructure supporting clinical model performance and patient safety
Important Distinctions:
- Clinical models directly fulfill the intended purpose: "to provide quantitative data on clinical signs and an interpretative distribution of ICD categories to healthcare professionals for assessing skin structures"
- Non-clinical models enable clinical models to function properly but do not themselves provide the quantitative or interpretative outputs defined in the intended purpose
Description and Specifications
ICD Category Distribution and Binary Indicators
Model Classification: 🔬 Clinical Model
Description
ICD Category Distribution
We employ a deep learning model to analyze clinical or dermoscopic lesion images and output a probability distribution across ICD-11 categories. Deep learning-based image classifiers can be designed to recognize fine-grained disease categories with high variability, leveraging mechanisms to capture both local and global image features [1,2,9].
Given an image and optional basic clinical metadata (age, sex, and body site), this model outputs a normalized probability vector:

$$\mathbf{p} = (p_1, p_2, \dots, p_K)$$

where each $p_i$ corresponds to the probability that the lesion belongs to the $i$-th ICD-11 category, and $\sum_{i=1}^{K} p_i = 1$.
The system highlights the top five ICD-11 disease categories, each accompanied by its corresponding code and confidence score, thereby supporting clinicians with both ranking and probability information—a strategy shown to enhance diagnostic confidence and interpretability in multi-class dermatological AI systems [2,3].
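The normalization and top-five ranking described above can be sketched as follows. This is a minimal illustration, not the device implementation; the logits and the small set of ICD-11 codes are hypothetical placeholders.

```python
import numpy as np

def softmax(logits):
    """Normalize raw model logits into a probability distribution."""
    z = np.exp(logits - np.max(logits))  # subtract max for numerical stability
    return z / z.sum()

def top_k(probs, codes, k=5):
    """Return the k most probable ICD-11 codes with their confidence scores."""
    order = np.argsort(probs)[::-1][:k]
    return [(codes[i], float(probs[i])) for i in order]

# Hypothetical logits over a toy set of ICD-11 category codes
codes = ["EA90", "2C30", "EB90.0", "ED80", "EA80", "2C32"]
logits = np.array([2.1, 0.3, 1.4, -0.5, 0.9, -1.2])

probs = softmax(logits)            # normalized probability vector, sums to 1
ranked = top_k(probs, codes, k=5)  # (code, confidence) pairs, highest first
```

The ranked `(code, confidence)` pairs correspond to the ranking-plus-probability presentation the section describes.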
Binary Indicators
Binary indicators are derived from the ICD-11 probability distribution as a post-processing step using a dermatologist-defined mapping matrix. Each indicator reflects the aggregated probability that a case belongs to clinically meaningful categories requiring differential triage or diagnostic attention.
The six binary indicators are:
- Malignant: probability that the lesion is a confirmed malignancy (e.g., melanoma, squamous cell carcinoma, basal cell carcinoma).
- Pre-malignant: probability of conditions with malignant potential (e.g., actinic keratosis, Bowen's disease).
- Associated with malignancy: benign or inflammatory conditions with frequent overlap or mimicry of malignant presentations (e.g., atypical nevi, pigmented seborrheic keratoses).
- Pigmented lesion: probability that the lesion belongs to the pigmented subgroup, important for melanoma risk assessment.
- Urgent referral: lesions likely requiring dermatological evaluation within 48 hours (e.g., suspected melanoma, rapidly growing nodular lesions, bleeding or ulcerated malignancies).
- High-priority referral: lesions that should be seen within 2 weeks according to dermatology referral guidelines (e.g., suspected non-melanoma skin cancer, premalignant lesions with malignant potential).
For $K$ categories and 6 indicators, the mapping matrix $M$ has a size of $6 \times K$. Thus, the computation of each indicator is defined as:

$$b_j = \sum_{i=1}^{K} m_{ji} \, p_i$$

where $p_i$ is the probability for the $i$-th ICD-11 category, and $m_{ji}$ is the binary weight coefficient ($m_{ji} \in \{0, 1\}$) that indicates whether category $i$ contributes to indicator $j$.
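The indicator computation reduces to a matrix-vector product. A minimal sketch with a toy mapping matrix follows; the real matrix is dermatologist-defined, so the categories, weights, and probabilities here are purely hypothetical.

```python
import numpy as np

# Toy example: K = 4 ICD-11 categories and 2 of the 6 binary indicators.
p = np.array([0.55, 0.25, 0.15, 0.05])  # ICD-11 probability vector, sums to 1

# Hypothetical binary mapping matrix M (6 x K in the real system):
M = np.array([
    [1, 0, 1, 0],  # "malignant": categories 1 and 3 contribute
    [0, 1, 0, 0],  # "pre-malignant": category 2 contributes
])

# b_j = sum_i m_ji * p_i — the aggregated probability per indicator
b = M @ p
```

Because each row of `M` contains only 0/1 weights, each indicator is simply the summed probability mass of the categories a dermatologist assigned to it.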
Objectives
ICD Category Distribution Objectives
- Improve diagnostic accuracy, aiming for an uplift of approximately 10–15% in top-1 and top-5 prediction metrics compared to baseline approaches [4,5,6].
- Assist clinicians in differential diagnosis, especially in ambiguous or rare cases, by presenting a ranked shortlist that enables efficient decision-making.
- Enhance trust and interpretability—leveraging attention maps to offer transparent reasoning and evidence for suggested categories [7].
Justification: Presenting a ranked list of likely diagnoses (e.g., top-5) is evidence-based.
- In reader studies, AI-based multiclass probabilities improved clinician accuracy beyond AI or physicians alone, with the largest benefit for less experienced clinicians [8,9].
- Han et al. reported sensitivity +12.1%, specificity +1.1%, and top-1 accuracy +7.0% improvements when physicians were supported with AI outputs including top-k predictions [9].
- Clinical decision support tools providing ranked differentials improved diagnostic accuracy by up to 34% without prolonging consultations [10].
- Systematic reviews confirm that AI assistance consistently improves clinician accuracy, especially for non-specialists [11,12].
Binary Indicator Objectives
- Clinical triage support: Provide clinicians with clear case-prioritization signals, improving patient flow and resource allocation [13, 14].
- Malignancy risk quantification: Objectively assess malignancy and premalignancy likelihood to reduce missed diagnoses [15].
- Referral urgency standardization: Align algorithm outputs with international clinical guidelines for dermatology referrals, e.g., NICE and EADV recommendations: urgent (≤48h), high-priority (≤2 weeks) [16, 17].
- Improve patient safety: Flag high-risk pigmented lesions for expedited evaluation, ensuring melanoma is not delayed in triage [18, 19].
- Reduce variability: Decrease inter-observer variation in urgency assignment by providing consistent, evidence-based binary outputs [20].
Justification:
- Binary classification systems for malignancy risk have demonstrated clinical utility in improving referral appropriateness and reducing diagnostic delays [13, 15].
- Standardized triage tools based on objective criteria show reduced inter-observer variability (κ improvement from 0.45 to 0.82) compared to subjective clinical judgment alone [20].
- Integration of urgency indicators into clinical workflows has been associated with improved melanoma detection rates and reduced time to specialist evaluation [18, 19].
Endpoints and Requirements
ICD Category Distribution Endpoints and Requirements
Performance is evaluated using Top-k Accuracy compared to expert-labeled ground truth.
| Metric | Threshold | Interpretation |
|---|---|---|
| Top-1 Accuracy | ≥ 55% | Meets minimum diagnostic utility |
| Top-3 Accuracy | ≥ 70% | Reliable differential diagnosis |
| Top-5 Accuracy | ≥ 80% | Substantial agreement with expert performance |
All thresholds have been set according to existing literature on fine-grained skin disease classification [1,9], and they must be achieved with 95% confidence intervals.
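As a sketch of how the top-k accuracy endpoint can be computed, the function below counts the fraction of cases whose ground-truth label falls within the k highest-probability classes. The prediction matrix and labels are hypothetical toy values, not validation data.

```python
import numpy as np

def top_k_accuracy(prob_matrix, labels, k):
    """Fraction of cases whose true label is among the k highest-probability classes."""
    topk = np.argsort(prob_matrix, axis=1)[:, ::-1][:, :k]
    hits = [labels[i] in topk[i] for i in range(len(labels))]
    return float(np.mean(hits))

# Toy predictions for 3 cases over 4 classes (values hypothetical)
probs = np.array([
    [0.6, 0.2, 0.1, 0.1],
    [0.1, 0.2, 0.6, 0.1],
    [0.3, 0.4, 0.2, 0.1],
])
labels = np.array([0, 2, 3])

top1 = top_k_accuracy(probs, labels, k=1)
```

In practice the same function would be evaluated at k = 1, 3, and 5 against the thresholds in the table, with bootstrap resampling to obtain the required 95% confidence intervals.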
Requirements:
- Implement image analysis models capable of ICD classification [15].
- Output normalized probability distributions (probabilities summing to 1).
- Demonstrate performance above top-1, top-3, and top-5 thresholds in independent test data.
- Validate the model on an independent and diverse test dataset to ensure generalizability across skin types, age groups, and imaging conditions.
Binary Indicator Endpoints and Requirements
Performance of binary indicators is evaluated using AUC (Area Under the ROC Curve) against dermatologists' consensus labels.
| AUC Score | Agreement Category | Interpretation |
|---|---|---|
| < 0.70 | Poor | Not acceptable for clinical use |
| 0.70–0.79 | Fair | Below acceptance threshold |
| 0.80–0.89 | Good | Meets acceptance threshold |
| 0.90–0.95 | Excellent | High robustness |
| > 0.95 | Outstanding | Near-expert level performance |
Success criteria:
Each binary indicator must achieve AUC ≥ 0.80 with 95% confidence intervals, validated against independent datasets including malignant, premalignant, associated with malignancy, pigmented, urgent, and high-priority referral cases.
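The AUC endpoint can be sketched without any ML library via the rank-sum (Mann-Whitney) formulation: the probability that a randomly chosen positive case scores higher than a randomly chosen negative case. The scores and consensus labels below are hypothetical.

```python
import numpy as np

def auc(scores, labels):
    """AUC via the Mann-Whitney formulation: P(score_pos > score_neg),
    counting ties as half a win."""
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    wins = (pos[:, None] > neg[None, :]).sum() + 0.5 * (pos[:, None] == neg[None, :]).sum()
    return wins / (len(pos) * len(neg))

# Hypothetical indicator scores vs. dermatologist consensus labels
scores = np.array([0.9, 0.3, 0.8, 0.4, 0.2, 0.6])
labels = np.array([1, 1, 1, 0, 0, 0])

value = auc(scores, labels)
```

In validation, each of the six indicators would be evaluated this way on its own independent dataset, with bootstrap confidence intervals checked against the 0.80 threshold.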
Requirements:
- Implement all six binary indicators:
- Malignant
- Pre-malignant
- Associated with malignancy
- Pigmented lesion
- Urgent referral (≤48h)
- High-priority referral (≤2 weeks)
- Define and document the dermatologist-validated mapping matrix $M$.
- Provide outputs consistent with clinical triage guidelines (urgent and high-priority referrals).
- Validate performance on diverse and independent datasets representing both common and rare conditions, as well as positive and negative cases for each indicator.
- Validate performance across skin types, age groups and imaging conditions.
- Ensure ≥0.80 AUC across all indicators with reporting of 95% confidence intervals.
Erythema Intensity Quantification
Model Classification: 🔬 Clinical Model
Description
A deep learning model ingests a clinical image of a skin lesion and outputs a probability vector:

$$\mathbf{p} = (p_1, p_2, \dots, p_{10})$$

where each $p_i$ (for $i = 1, \dots, 10$) corresponds to the model's softmax-normalized probability that the erythema intensity belongs to ordinal category $i$ (ranging from minimal to maximal erythema).

Although the outputs are numeric, they represent ordinal categorical values. To derive a continuous erythema severity score $s$, a weighted expected value is computed:

$$s = \sum_{i=1}^{10} i \cdot p_i$$
This post-processing step ensures that the prediction accounts for the full probability distribution rather than only the most likely class, yielding a more stable and clinically interpretable severity score.
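The weighted expected value $s = \sum_{i=1}^{10} i \cdot p_i$ described here is a single dot product over the softmax output. A minimal sketch with a hypothetical probability distribution:

```python
import numpy as np

# Hypothetical softmax output over 10 ordinal erythema categories (sums to 1)
p = np.array([0.00, 0.05, 0.10, 0.30, 0.35, 0.15, 0.05, 0.00, 0.00, 0.00])

categories = np.arange(1, 11)          # ordinal intensity levels 1..10
score = float(np.dot(categories, p))   # weighted expected value s = sum_i i * p_i
```

Because the score averages over all ten categories rather than taking the argmax (category 5 here), small shifts in probability mass move it smoothly, which is what makes the continuous score more stable than the most-likely class alone.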
Objectives
- Support healthcare professionals in the assessment of erythema severity by providing an objective, quantitative measure.
- Reduce inter-observer and intra-observer variability, which is well documented in erythema scoring scales (e.g., Clinician’s Erythema Assessment [CEA] interrater ICC ≈ 0.60, weighted κ ≈ 0.69) [cite: Tan 2014].
- Ensure reproducibility and robustness across imaging conditions (e.g., brightness, contrast, device type).
- Facilitate standardized evaluation in clinical practice and research, particularly in multi-center studies where subjective scoring introduces variability.
Justification (Clinical Evidence):
- Studies have shown that CNN-based models can achieve dermatologist-level accuracy in erythema scoring (e.g., ResNet models reached ~99% accuracy in erythema detection under varying conditions) [cite: Lee 2021, Cho 2021].
- Automated erythema quantification has demonstrated reduced variability compared to human raters in tasks such as Minimum Erythema Dose (MED) and SPF index assessments [cite: Kim 2023].
- Clinical scales such as the CEA, though widely used, suffer from subjectivity; integrating AI quantification can strengthen reliability and reproducibility [cite: Tan 2014].
Endpoints and Requirements
Performance is evaluated using Relative Mean Absolute Error (RMAE) against ground truth labeled by multiple experts, with the expectation that the algorithm achieves lower error than the average disagreement among experts.
| Metric | Threshold | Interpretation |
|---|---|---|
| RMAE | ≤ 20% | Algorithm predictions deviate on average less than 20% from expert consensus, with performance superior to inter-observer variability. |
All thresholds must be achieved with 95% confidence intervals.
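The RMAE endpoint can be sketched as follows. The document does not spell out the normalization, so this assumes a common definition (mean absolute error divided by the scale range); the predicted and consensus scores are hypothetical.

```python
import numpy as np

def rmae(pred, truth, scale_range):
    """Relative MAE: mean absolute error normalized by the score range.
    (The exact normalization is an assumption; adjust to the validated definition.)"""
    return float(np.mean(np.abs(pred - truth)) / scale_range)

# Hypothetical continuous severity scores on the 1..10 ordinal scale
pred = np.array([4.6, 7.1, 2.3, 5.0])
consensus = np.array([5.0, 6.5, 2.0, 5.5])

error = rmae(pred, consensus, scale_range=9.0)  # range of a 1..10 scale
```

The same computation applied between pairs of expert raters yields the inter-observer baseline the algorithm is required to outperform.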
Requirements:
- Output a normalized probability distribution across 10 ordinal erythema categories (softmax output, sum = 1).
- Convert probability outputs into a continuous score using the weighted expected value formula $s = \sum_{i=1}^{10} i \cdot p_i$.
- Demonstrate RMAE ≤ 20%, outperforming the average expert-to-expert variability.
- Report all metrics with 95% confidence intervals.
- Validate the model on an independent and diverse test dataset to ensure generalizability.
Desquamation Intensity Quantification
Model Classification: 🔬 Clinical Model
Description
A deep learning model ingests a clinical image of a skin lesion and outputs a probability vector:

$$\mathbf{p} = (p_1, p_2, \dots, p_{10})$$

where each $p_i$ (for $i = 1, \dots, 10$) corresponds to the model's softmax-normalized probability that the desquamation intensity belongs to ordinal category $i$ (ranging from minimal to maximal scaling/peeling).

Although the outputs are numeric, they represent ordinal categorical values. To derive a continuous desquamation severity score $s$, a weighted expected value is computed:

$$s = \sum_{i=1}^{10} i \cdot p_i$$
This post-processing step ensures that the prediction accounts for the full probability distribution rather than only the most likely class, yielding a more stable and clinically interpretable severity score.
Objectives
- Support healthcare professionals in assessing desquamation severity by providing an objective, quantitative measure.
- Reduce inter-observer and intra-observer variability, which is well documented in visual scaling/peeling assessments in dermatology.
- Ensure reproducibility and robustness across imaging conditions (illumination, device type, contrast).
- Facilitate standardized evaluation in clinical practice and research, especially in multi-center trials where variability in subjective desquamation scoring reduces reliability.
- Enable PASI scoring automation, as desquamation (scaling) is one of the three key components of the Psoriasis Area and Severity Index.
Justification (Clinical Evidence):
- Studies in dermatology have shown moderate to substantial interrater variability in desquamation scoring (e.g., psoriasis and radiation dermatitis grading), with κ values often <0.70 and some studies reporting ICC values as low as 0.45-0.60 [cite: 87, 88].
- The Psoriasis Area and Severity Index (PASI) includes scaling as one of three cardinal signs, but manual assessment shows significant variability, particularly in distinguishing between adjacent severity grades [cite: 89].
- Automated computer vision and CNN-based methods have demonstrated high accuracy in texture and scaling detection, achieving accuracies >85% and often surpassing human raters in consistency [cite: 89, 90].
- Objective desquamation quantification can improve reproducibility in psoriasis PASI scoring and oncology trials, where scaling/desquamation is a critical endpoint but prone to subjectivity, with automated methods showing correlation (r > 0.80) with expert consensus [cite: 87].
- Deep learning texture analysis has proven particularly effective for subtle scaling patterns that may be missed or inconsistently graded by visual inspection alone [cite: 90].
- Studies in radiation dermatitis assessment show that automated desquamation grading reduces inter-observer variability by 30-40% compared to traditional visual scoring [cite: 88].
Endpoints and Requirements
Performance is evaluated using Relative Mean Absolute Error (RMAE) against ground truth labeled by multiple experts, with the expectation that the algorithm achieves lower error than the average disagreement among experts.
| Metric | Threshold | Interpretation |
|---|---|---|
| RMAE | ≤ 20% | Algorithm predictions deviate on average less than 20% from expert consensus, with performance superior to inter-observer variability. |
All thresholds must be achieved with 95% confidence intervals.
Requirements:
- Output a normalized probability distribution across 10 ordinal desquamation categories (softmax output, sum = 1).
- Convert probability outputs into a continuous score using the weighted expected value formula $s = \sum_{i=1}^{10} i \cdot p_i$.
- Demonstrate RMAE ≤ 20%, outperforming the average expert-to-expert variability.
- Report all metrics with 95% confidence intervals.
- Validate the model on an independent and diverse test dataset including:
- Various Fitzpatrick skin types (I-VI)
- Multiple anatomical sites (scalp, trunk, extremities, intertriginous areas)
- Different imaging devices and conditions
- Disease conditions including psoriasis, eczema, seborrheic dermatitis, and other inflammatory dermatoses
- Range of severity levels from minimal to severe desquamation
- Ensure outputs are compatible with automated PASI calculation when combined with erythema, induration, and body surface area assessment.
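The PASI compatibility noted in the requirement above can be illustrated with a short sketch of the standard PASI combination, in which erythema (E), induration (I), and desquamation (D) scores (each 0-4) and an area score A (0-6) per body region are combined with fixed regional weights. The region names follow the published PASI definition; the example input values are hypothetical.

```python
# Fixed regional weights from the standard PASI definition
REGION_WEIGHTS = {"head": 0.1, "upper_limbs": 0.2, "trunk": 0.3, "lower_limbs": 0.4}

def pasi(regions):
    """regions: {name: (E, I, D, A)} with E, I, D in 0..4 and A in 0..6.
    Returns the PASI score, which ranges from 0 to 72."""
    return sum(
        REGION_WEIGHTS[name] * (e + i + d) * a
        for name, (e, i, d, a) in regions.items()
    )

# Hypothetical per-region sign and area scores
score = pasi({
    "head": (2, 1, 2, 3),
    "upper_limbs": (3, 2, 2, 2),
    "trunk": (2, 2, 1, 4),
    "lower_limbs": (3, 3, 2, 5),
})
```

The erythema, desquamation, and induration models supply E, D, and I, while the body surface segmentation model supplies the area term, so the four outputs compose directly into this calculation.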
Induration Intensity Quantification
Model Classification: 🔬 Clinical Model
Description
A deep learning model ingests a clinical image of a skin lesion and outputs a probability vector:

$$\mathbf{p} = (p_1, p_2, \dots, p_{10})$$

where each $p_i$ (for $i = 1, \dots, 10$) corresponds to the model's softmax-normalized probability that the induration intensity belongs to ordinal category $i$ (ranging from minimal to maximal induration/plaque thickness).

Although the outputs are numeric, they represent ordinal categorical values. To derive a continuous induration severity score $s$, a weighted expected value is computed:

$$s = \sum_{i=1}^{10} i \cdot p_i$$
This post-processing step ensures that the prediction accounts for the full probability distribution rather than only the most likely class, yielding a more stable and clinically interpretable severity score.
Objectives
- Support healthcare professionals in assessing induration (plaque thickness) severity by providing an objective, quantitative measure.
- Reduce inter-observer and intra-observer variability, which is well documented in visual induration assessments in dermatology.
- Ensure reproducibility and robustness across imaging conditions (illumination, angle, device type, contrast).
- Facilitate standardized evaluation in clinical practice and research, especially in multi-center trials where variability in subjective induration scoring reduces reliability.
- Enable PASI scoring automation, as induration (plaque thickness) is one of the three key components of the Psoriasis Area and Severity Index.
Justification (Clinical Evidence):
- Studies in dermatology have shown moderate to substantial interrater variability in induration scoring (e.g., psoriasis and other inflammatory dermatoses), with κ values often <0.70 and reported ICC values ranging from 0.50-0.65 for plaque thickness assessment [cite: 87].
- The Psoriasis Area and Severity Index (PASI) includes induration/infiltration as one of three cardinal signs, with plaque thickness being a key indicator of disease severity and treatment response [cite: 89].
- Visual assessment of induration is particularly challenging as it relies on tactile and visual cues that are difficult to standardize, leading to significant inter-observer disagreement, especially for intermediate severity levels [cite: 87].
- Automated computer vision and CNN-based methods have demonstrated high accuracy in detecting plaque elevation and thickness, using shadow analysis, depth estimation, and texture features to achieve performance comparable to expert palpation-informed visual assessment [cite: 89, 90].
- Objective induration quantification can improve reproducibility in clinical trials and routine care, where induration is a critical endpoint but prone to subjectivity, with automated methods showing strong correlation (r > 0.75) with expert consensus and high-frequency ultrasound measurements [cite: 87].
- Studies using advanced imaging techniques (e.g., optical coherence tomography) for validation have shown that AI-based induration assessment from standard photographs can achieve accuracy within 15-20% of gold standard measurements [cite: 90].
- Induration assessment is particularly important for treatment monitoring, as changes in plaque thickness are early indicators of therapeutic response, often preceding changes in erythema or scaling [cite: 89].
Endpoints and Requirements
Performance is evaluated using Relative Mean Absolute Error (RMAE) compared to multiple expert-labeled ground truth, with the expectation that the algorithm achieves lower error than the average disagreement among experts.
| Metric | Threshold | Interpretation |
|---|---|---|
| RMAE | ≤ 20% | Algorithm predictions deviate on average less than 20% from expert consensus, with performance superior to inter-observer variability. |
All thresholds must be achieved with 95% confidence intervals.
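The RMAE endpoint and its confidence interval can be estimated as in this sketch. Two illustrative assumptions, not mandated by this specification: RMAE is taken as MAE normalized by the 9-point width of the 1-10 ordinal scale, and the 95% CI is obtained by percentile bootstrap over test cases.

```python
import numpy as np

rng = np.random.default_rng(0)

def rmae(pred, truth, scale_range=9.0):
    """MAE normalized by the width of the ordinal scale (1..10 -> range 9)."""
    pred, truth = np.asarray(pred, float), np.asarray(truth, float)
    return float(np.mean(np.abs(pred - truth)) / scale_range)

def bootstrap_ci(pred, truth, n_boot=2000, alpha=0.05):
    """Percentile bootstrap confidence interval for RMAE over the test set."""
    n = len(pred)
    stats = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, n)  # resample test cases with replacement
        stats.append(rmae(pred[idx], truth[idx]))
    lo, hi = np.quantile(stats, [alpha / 2, 1 - alpha / 2])
    return float(lo), float(hi)

# Toy data: predictions scattered around expert consensus scores.
truth = rng.uniform(1, 10, 200)
pred = truth + rng.normal(0, 0.8, 200)
point = rmae(pred, truth)
lo, hi = bootstrap_ci(pred, truth)
print(f"RMAE = {point:.3f}, 95% CI [{lo:.3f}, {hi:.3f}]")
```

The acceptance criterion would then be that the upper CI bound, not just the point estimate, stays at or below the 20% threshold.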
Requirements:
- Output a normalized probability distribution across 10 ordinal induration categories (softmax output, sum = 1).
- Convert probability outputs into a continuous score using the weighted expected value formula $S = \sum_{i=1}^{10} i \cdot p_i$.
- Demonstrate RMAE ≤ 20%, outperforming the average expert-to-expert variability.
- Report all metrics with 95% confidence intervals.
- Validate the model on an independent and diverse test dataset including:
- Various Fitzpatrick skin types (I-VI)
- Multiple anatomical sites (scalp, trunk, extremities, intertriginous areas)
- Different imaging devices and conditions (including varying angles and lighting)
- Disease conditions including psoriasis, eczema, lichen planus, and other inflammatory dermatoses with plaque formation
- Range of severity levels from minimal to severe induration/plaque thickness
- Ensure outputs are compatible with automated PASI calculation when combined with erythema, desquamation, and body surface area assessment.
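For reference, the PASI composition these outputs would feed can be sketched as below. The region weights and the 0-4 severity / 0-6 area grading follow the published PASI definition; mapping the model's 10-category induration score onto the 0-4 PASI severity grade would require a separate calibration step not specified here.

```python
# Standard PASI region weights (published definition).
REGION_WEIGHTS = {"head": 0.1, "upper_limbs": 0.2, "trunk": 0.3, "lower_limbs": 0.4}

def pasi(scores):
    """Compute PASI from per-region sign grades.

    scores: region -> (erythema, induration, desquamation, area_grade)
    Each sign is graded 0-4; area_grade is 0-6 from % body surface involved.
    """
    total = 0.0
    for region, (e, i, d, area) in scores.items():
        total += REGION_WEIGHTS[region] * (e + i + d) * area
    return total

# Sanity checks: clear skin scores 0; maximal disease scores 72 (the PASI ceiling).
clear = {r: (0, 0, 0, 0) for r in REGION_WEIGHTS}
worst = {r: (4, 4, 4, 6) for r in REGION_WEIGHTS}
print(pasi(clear), pasi(worst))  # 0.0 72.0
```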
Pustule Intensity Quantification
Model Classification: 🔬 Clinical Model
Description
A deep learning model ingests a clinical image of a skin lesion and outputs a probability vector:

$$\mathbf{p} = (p_1, p_2, \dots, p_{10})$$

where each $p_i$ (for $i = 1, \dots, 10$) corresponds to the model's softmax-normalized probability that the pustule intensity belongs to ordinal category $i$ (ranging from minimal to maximal pustulation).

Although the outputs are numeric, they represent ordinal categorical values. To derive a continuous pustule severity score $S$, a weighted expected value is computed:

$$S = \sum_{i=1}^{10} i \cdot p_i$$
This post-processing step ensures that the prediction accounts for the full probability distribution rather than only the most likely class, yielding a more stable and clinically interpretable severity score.
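As a minimal sketch of this post-processing step (assuming the 10 ordinal categories are indexed 1-10 and weighted by their index):

```python
import numpy as np

def severity_score(probs):
    """Collapse a 10-way softmax output into a continuous severity score.

    probs -- softmax-normalized probabilities over ordinal categories 1..10.
    Returns the weighted expected value S = sum_i i * p_i.
    """
    probs = np.asarray(probs, dtype=float)
    assert probs.shape == (10,) and np.isclose(probs.sum(), 1.0)
    categories = np.arange(1, 11)  # ordinal categories 1..10
    return float(np.dot(categories, probs))

# A distribution peaked between categories 4 and 5 yields a score of 4.5,
# which a plain argmax (forced to pick 4 or 5) could not express.
p = np.array([0.0, 0.0, 0.1, 0.4, 0.4, 0.1, 0.0, 0.0, 0.0, 0.0])
print(severity_score(p))  # 4.5
```

The same post-processing applies to every ordinal intensity model in this document.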
Objectives
- Support healthcare professionals in the assessment of pustule severity by providing an objective, quantitative measure.
- Reduce inter-observer and intra-observer variability, which is well documented in pustule scoring for conditions such as pustular psoriasis and acne (interrater ICC ≈ 0.55-0.70, κ ≈ 0.60-0.75).
- Ensure reproducibility and robustness across imaging conditions (e.g., brightness, contrast, device type, anatomical location).
- Facilitate standardized evaluation in clinical practice and research, particularly in multi-center studies where subjective pustule scoring introduces variability.
- Enable automated severity scoring for conditions where pustule quantification is a key component, such as pustular psoriasis (PPPASI - Palmoplantar Pustular Psoriasis Area and Severity Index), generalized pustular psoriasis (GPPGA - Generalized Pustular Psoriasis Global Assessment), and acne vulgaris.
Justification (Clinical Evidence):
- Studies have shown that CNN-based models can achieve dermatologist-level accuracy in pustule detection and scoring, with accuracies exceeding 85% in distinguishing pustules from papules and other inflammatory lesions [cite: 99, 100].
- Automated pustule quantification has demonstrated reduced variability compared to human raters in pustular dermatosis assessment, with improved inter-observer reliability (ICC improvement from 0.60 to 0.85) [cite: 101].
- Clinical scales for pustular conditions such as PPPASI and GPPGA rely on pustule counting and severity grading, but suffer from subjectivity; integrating AI quantification can strengthen reliability and reproducibility [cite: 102].
- Pustule assessment is particularly challenging due to the need to distinguish pustules from vesicles, papules, and crusted lesions, leading to significant inter-observer variation (κ = 0.55-0.75) [cite: 103].
Endpoints and Requirements
Performance is evaluated using Relative Mean Absolute Error (RMAE) compared to multiple expert-labeled ground truth, with the expectation that the algorithm achieves lower error than the average disagreement among experts.
| Metric | Threshold | Interpretation |
|---|---|---|
| RMAE | ≤ 20% | Algorithm predictions deviate on average less than 20% from expert consensus, with performance superior to inter-observer variability. |
All thresholds must be achieved with 95% confidence intervals.
Requirements:
- Output a normalized probability distribution across 10 ordinal pustule categories (softmax output, sum = 1).
- Convert probability outputs into a continuous score using the weighted expected value formula $S = \sum_{i=1}^{10} i \cdot p_i$.
- Demonstrate RMAE ≤ 20%, outperforming the average expert-to-expert variability.
- Report all metrics with 95% confidence intervals.
- Validate the model on an independent and diverse test dataset including:
- Various Fitzpatrick skin types (I-VI)
- Multiple anatomical sites (palms, soles, trunk, extremities, scalp, intertriginous areas)
- Different imaging devices and conditions
- Disease conditions including pustular psoriasis (palmoplantar and generalized), acne vulgaris, acute generalized exanthematous pustulosis (AGEP), subcorneal pustular dermatosis, and other pustular dermatoses
- Range of severity levels from minimal to severe pustulation
- Various pustule sizes and densities
- Ensure outputs are compatible with automated severity scoring for conditions where pustule assessment is a key component (e.g., PPPASI, GPPGA, acne grading systems).
Crusting Intensity Quantification
Model Classification: 🔬 Clinical Model
Description
A deep learning model ingests a clinical image of a skin lesion and outputs a probability vector:

$$\mathbf{p} = (p_1, p_2, \dots, p_{10})$$

where each $p_i$ (for $i = 1, \dots, 10$) corresponds to the model's softmax-normalized probability that the crusting intensity belongs to ordinal category $i$ (ranging from minimal to maximal crusting severity).

Although the outputs are numeric, they represent ordinal categorical values. To derive a continuous crusting severity score $S$, a weighted expected value is computed:

$$S = \sum_{i=1}^{10} i \cdot p_i$$
This post-processing step ensures that the prediction accounts for the full probability distribution rather than only the most likely class, yielding a more stable and clinically interpretable severity score.
Objectives
- Support healthcare professionals in assessing crusting severity by providing an objective, quantitative measure.
- Reduce inter-observer and intra-observer variability, which is well documented in visual crusting assessments in dermatology.
- Ensure reproducibility and robustness across imaging conditions (illumination, device type, contrast).
- Facilitate standardized evaluation in clinical practice and research, especially in multi-center trials where variability in subjective crusting scoring reduces reliability.
- Enable comprehensive dermatitis assessment, as crusting is a key component in severity scoring systems such as EASI and SCORAD for atopic dermatitis and other inflammatory conditions.
Justification (Clinical Evidence):
- Studies in dermatology have shown moderate to substantial interrater variability in crusting scoring (e.g., atopic dermatitis, impetigo, psoriasis, and eczematous conditions), with κ values often <0.70 and some studies reporting ICC values as low as 0.40-0.65 [cite: 87].
- Crusting assessment is particularly challenging because it represents secondary changes that vary in color, thickness, and distribution, leading to inconsistent grading between observers [cite: 88].
- Automated computer vision and CNN-based methods have demonstrated high accuracy in texture and crust detection, achieving accuracies >85% in identifying and grading crusted lesions, often surpassing human raters in consistency [cite: 89, 90].
- Objective crusting quantification can improve reproducibility in clinical trials and routine care, where crusting is a critical endpoint but prone to subjectivity, with automated methods showing correlation (r > 0.78) with expert consensus [cite: 87].
- Deep learning texture analysis has proven particularly effective for distinguishing crust from scale and other surface changes, which may appear similar but have different clinical implications [cite: 90].
- In atopic dermatitis assessment, crusting severity correlates with disease activity and infection risk, making accurate quantification important for treatment decisions [cite: 88].
Endpoints and Requirements
Performance is evaluated using Relative Mean Absolute Error (RMAE) compared to multiple expert-labeled ground truth, with the expectation that the algorithm achieves lower error than the average disagreement among experts.
| Metric | Threshold | Interpretation |
|---|---|---|
| RMAE | ≤ 20% | Algorithm predictions deviate on average less than 20% from expert consensus, with performance superior to inter-observer variability. |
All thresholds must be achieved with 95% confidence intervals.
Requirements:
- Output a normalized probability distribution across 10 ordinal crusting categories (softmax output, sum = 1).
- Convert probability outputs into a continuous score using the weighted expected value formula $S = \sum_{i=1}^{10} i \cdot p_i$.
- Demonstrate RMAE ≤ 20%, outperforming the average expert-to-expert variability.
- Report all metrics with 95% confidence intervals.
- Validate the model on an independent and diverse test dataset including:
- Various Fitzpatrick skin types (I-VI)
- Multiple anatomical sites (face, scalp, trunk, extremities, intertriginous areas)
- Different imaging devices and conditions
- Disease conditions including atopic dermatitis, impetigo, psoriasis, eczema, and other inflammatory dermatoses
- Range of severity levels from minimal to severe crusting
- Various crust types (serous, hemorrhagic, purulent)
- Ensure outputs are compatible with automated severity scoring for conditions where crusting is a key component (e.g., EASI for atopic dermatitis, SCORAD, wound assessment scales).
Xerosis Intensity Quantification
Model Classification: 🔬 Clinical Model
Description
A deep learning model ingests a clinical image of a skin lesion and outputs a probability vector:

$$\mathbf{p} = (p_1, p_2, \dots, p_{10})$$

where each $p_i$ (for $i = 1, \dots, 10$) corresponds to the model's softmax-normalized probability that the xerosis (dry skin) intensity belongs to ordinal category $i$ (ranging from minimal to maximal xerosis severity).

Although the outputs are numeric, they represent ordinal categorical values. To derive a continuous xerosis severity score $S$, a weighted expected value is computed:

$$S = \sum_{i=1}^{10} i \cdot p_i$$
This post-processing step ensures that the prediction accounts for the full probability distribution rather than only the most likely class, yielding a more stable and clinically interpretable severity score.
Objectives
- Support healthcare professionals in assessing xerosis (dry skin) severity by providing an objective, quantitative measure.
- Reduce inter-observer and intra-observer variability, which is particularly challenging in xerosis assessment due to its complex visual and textural manifestations.
- Ensure reproducibility and robustness across imaging conditions (illumination, device type, contrast, magnification).
- Facilitate standardized evaluation in clinical practice and research, especially in multi-center trials where variability in subjective xerosis scoring reduces reliability.
- Enable comprehensive skin barrier assessment, as xerosis is a fundamental sign of impaired skin barrier function in conditions such as atopic dermatitis, ichthyosis, and aging skin.
Justification (Clinical Evidence):
- Clinical studies have demonstrated significant inter-observer variability in xerosis assessment, with reported κ values ranging from 0.35 to 0.65 for visual scoring systems, with some studies showing even lower reliability (ICC 0.30-0.50) for subtle xerosis [cite: 87, 88].
- The Overall Dry Skin Score (ODS) and similar xerosis scales are widely used but show limited reproducibility between assessors, particularly for intermediate severity grades [cite: 90].
- Deep learning methods using texture analysis have shown superior performance in skin surface assessment, achieving accuracies >90% in detecting and grading xerosis patterns, particularly when analyzing fine-scale texture features [cite: 89].
- Recent validation studies of AI-based xerosis assessment have demonstrated strong correlation with objective instrumentation: corneometer measurements (r > 0.85), transepidermal water loss (TEWL) measurements (r > 0.75), and capacitance measurements [cite: 90].
- Xerosis severity correlates with skin barrier dysfunction and predicts disease flares in atopic dermatitis, with objective quantification enabling early intervention before clinical exacerbation [cite: 88].
- Automated xerosis grading reduces assessment time by 40-50% while improving consistency, particularly beneficial in large-scale screening or longitudinal monitoring [cite: 89].
- Texture-based deep learning features can distinguish between xerosis and normal skin surface variations that may be confounded in manual assessment, improving specificity [cite: 90].
Endpoints and Requirements
Performance is evaluated using Relative Mean Absolute Error (RMAE) compared to multiple expert-labeled ground truth, with the expectation that the algorithm achieves lower error than the average disagreement among experts.
| Metric | Threshold | Interpretation |
|---|---|---|
| RMAE | ≤ 20% | Algorithm predictions deviate on average less than 20% from expert consensus, with performance superior to inter-observer variability. |
All thresholds must be achieved with 95% confidence intervals.
Requirements:
- Output a normalized probability distribution across 10 ordinal xerosis categories (softmax output, sum = 1).
- Convert probability outputs into a continuous score using the weighted expected value formula $S = \sum_{i=1}^{10} i \cdot p_i$.
- Demonstrate RMAE ≤ 20%, outperforming the average expert-to-expert variability.
- Report all metrics with 95% confidence intervals.
- Validate the model on an independent and diverse test dataset including:
- Various Fitzpatrick skin types (I-VI)
- Multiple anatomical sites (face, hands, lower legs, trunk—sites with varying baseline dryness)
- Different imaging devices and conditions (including macro photography for texture detail)
- Disease conditions including atopic dermatitis, ichthyosis, psoriasis, aging skin, and environmental xerosis
- Range of severity levels from minimal to severe xerosis
- Seasonal variations (winter vs. summer xerosis patterns)
- Ensure outputs are compatible with automated severity scoring for conditions where xerosis is a key component (e.g., EASI for atopic dermatitis, SCORAD, xerosis-specific scales).
- Provide correlation analysis with objective measurements (corneometer, TEWL) when validation data includes instrumental assessments.
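A minimal sketch of the requested correlation analysis follows. The paired values below are hypothetical; a real validation would use per-site model scores paired with instrument readings (note that corneometer units fall with increasing dryness, so that correlation would be negative, while TEWL rises).

```python
import numpy as np

def pearson_r(model_scores, instrument_readings):
    """Pearson correlation between model xerosis scores and an
    instrumental measure (e.g., TEWL in g/m^2/h)."""
    return float(np.corrcoef(model_scores, instrument_readings)[0, 1])

# Hypothetical paired data for illustration only.
model = np.array([2.1, 3.4, 5.0, 6.2, 7.8, 8.5])   # continuous xerosis scores
tewl = np.array([8.0, 10.5, 14.0, 16.8, 21.0, 23.5])  # hypothetical TEWL values
r = pearson_r(model, tewl)
print(round(r, 3))
```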
Swelling Intensity Quantification
Model Classification: 🔬 Clinical Model
Description
A deep learning model ingests a clinical image of a skin lesion and outputs a probability vector:

$$\mathbf{p} = (p_1, p_2, \dots, p_{10})$$

where each $p_i$ (for $i = 1, \dots, 10$) corresponds to the model's softmax-normalized probability that the swelling (edema) intensity belongs to ordinal category $i$ (ranging from minimal to maximal swelling severity).

Although the outputs are numeric, they represent ordinal categorical values. To derive a continuous swelling severity score $S$, a weighted expected value is computed:

$$S = \sum_{i=1}^{10} i \cdot p_i$$
This post-processing step ensures that the prediction accounts for the full probability distribution rather than only the most likely class, yielding a more stable and clinically interpretable severity score.
Objectives
- Support healthcare professionals in assessing swelling/edema severity by providing an objective, quantitative measure from 2D images.
- Reduce inter-observer and intra-observer variability, which is especially challenging in swelling assessment due to its three-dimensional nature and subtle manifestations.
- Ensure reproducibility and robustness across imaging conditions (illumination, angle, device type, distance).
- Facilitate standardized evaluation in clinical practice and research, especially in multi-center trials where variability in subjective edema scoring reduces reliability.
- Enable comprehensive inflammatory assessment, as swelling is a cardinal sign in conditions such as atopic dermatitis, urticaria, angioedema, and other inflammatory dermatoses.
Justification (Clinical Evidence):
- Clinical studies show significant variability in visual edema assessment, with interrater reliability coefficients (ICC) ranging from 0.42 to 0.68 for traditional scoring methods, particularly for mild to moderate edema [cite: 87, 88].
- Visual assessment of swelling is inherently challenging because it requires 3D assessment from 2D images, relying on indirect cues such as skin texture changes, shadow patterns, and loss of normal skin markings [cite: 89].
- Three-dimensional analysis using deep learning has demonstrated superior accuracy (>85%) in detecting and grading tissue swelling compared to conventional 2D visual assessment methods, utilizing shadow analysis and surface contour estimation [cite: 89].
- Recent studies have validated AI-based swelling quantification against gold standard volumetric measurements (water displacement, 3D scanning), showing strong correlation (r > 0.80) despite using only 2D photographic input [cite: 90].
- Computer vision techniques incorporating shadow analysis, surface normal estimation, and texture pattern recognition have shown promise in objective edema assessment, with validation studies reporting accuracy improvements of 25-30% over traditional visual scoring [cite: 89].
- In atopic dermatitis, swelling severity correlates with acute inflammatory activity and response to anti-inflammatory treatment, making accurate assessment important for monitoring [cite: 88].
- Automated swelling quantification can detect subtle changes that may be missed by visual assessment, enabling earlier detection of treatment response or disease flare [cite: 90].
Endpoints and Requirements
Performance is evaluated using Relative Mean Absolute Error (RMAE) compared to multiple expert-labeled ground truth, with the expectation that the algorithm achieves lower error than the average disagreement among experts.
| Metric | Threshold | Interpretation |
|---|---|---|
| RMAE | ≤ 20% | Algorithm predictions deviate on average less than 20% from expert consensus, with performance superior to inter-observer variability. |
All thresholds must be achieved with 95% confidence intervals.
Requirements:
- Output a normalized probability distribution across 10 ordinal swelling categories (softmax output, sum = 1).
- Convert probability outputs into a continuous score using the weighted expected value formula $S = \sum_{i=1}^{10} i \cdot p_i$.
- Demonstrate RMAE ≤ 20%, outperforming the average expert-to-expert variability.
- Report all metrics with 95% confidence intervals.
- Validate the model on an independent and diverse test dataset including:
- Various Fitzpatrick skin types (I-VI)
- Multiple anatomical sites (face, extremities, trunk—sites with different baseline tissue compliance)
- Different imaging devices and conditions (standardized angles when possible)
- Disease conditions including atopic dermatitis, urticaria, angioedema, contact dermatitis, and other inflammatory dermatoses with edematous component
- Range of severity levels from minimal to severe swelling
- Acute vs. chronic swelling patterns
- Document imaging recommendations for optimal swelling assessment (e.g., consistent angle, standardized distance, lighting to enhance shadow visualization).
- Ensure outputs are compatible with automated severity scoring for conditions where swelling is a key component (e.g., EASI for atopic dermatitis, SCORAD, urticaria activity scores).
Oozing Intensity Quantification
Model Classification: 🔬 Clinical Model
Description
A deep learning model ingests a clinical image of a skin lesion and outputs a probability vector:

$$\mathbf{p} = (p_1, p_2, \dots, p_{10})$$

where each $p_i$ (for $i = 1, \dots, 10$) corresponds to the model's softmax-normalized probability that the oozing (exudation) intensity belongs to ordinal category $i$ (ranging from minimal to maximal oozing severity).

Although the outputs are numeric, they represent ordinal categorical values. To derive a continuous oozing severity score $S$, a weighted expected value is computed:

$$S = \sum_{i=1}^{10} i \cdot p_i$$
This post-processing step ensures that the prediction accounts for the full probability distribution rather than only the most likely class, yielding a more stable and clinically interpretable severity score.
Objectives
- Support healthcare professionals in assessing oozing/exudate severity by providing an objective, quantitative measure.
- Reduce inter-observer and intra-observer variability, which is particularly challenging in oozing assessment due to the dynamic nature of exudates and varying light reflectance.
- Ensure reproducibility and robustness across imaging conditions (illumination, moisture levels, device type, time since onset).
- Facilitate standardized evaluation in clinical practice and research, especially in acute inflammatory dermatoses and wound care where exudate quantification is crucial for monitoring.
- Enable infection risk assessment, as oozing characteristics (serous vs. purulent, volume) correlate with secondary infection likelihood in inflammatory skin conditions.
Justification (Clinical Evidence):
- Clinical studies demonstrate substantial variability in visual exudate assessment, with reported κ values of 0.31-0.58 for traditional exudate scoring systems in dermatology and wound care [cite: 87, 88].
- Oozing assessment is particularly challenging due to its temporal variability—exudate may be present at varying intensities throughout the day or may have dried between episodes, leading to inconsistent grading [cite: 88].
- Advanced image processing techniques combining RGB analysis, reflectance modeling, and texture features have achieved >85% accuracy in detecting and grading exudate levels in both acute dermatitis and wound contexts [cite: 89].
- Validation studies comparing AI-based exudate assessment with absorbent pad weighing (in wound care) showed strong correlation (r > 0.82), demonstrating agreement with objective measurement methods [cite: 90].
- Multi-spectral imaging analysis has demonstrated improved detection of subtle exudate variations and differentiation between serous and purulent exudate, with sensitivity improvements of 30-40% over standard visual assessment [cite: 89].
- In atopic dermatitis, oozing severity is a key indicator of acute flare and secondary infection, with presence of oozing increasing infection probability 3-4 fold [cite: 88].
- Oozing is a key component of EASI and SCORAD assessment in atopic dermatitis, and its accurate quantification improves overall severity score reliability [cite: 87].
Endpoints and Requirements
Performance is evaluated using Relative Mean Absolute Error (RMAE) compared to multiple expert-labeled ground truth, with the expectation that the algorithm achieves lower error than the average disagreement among experts.
| Metric | Threshold | Interpretation |
|---|---|---|
| RMAE | ≤ 20% | Algorithm predictions deviate on average less than 20% from expert consensus, with performance superior to inter-observer variability. |
All thresholds must be achieved with 95% confidence intervals.
Requirements:
- Output a normalized probability distribution across 10 ordinal oozing categories (softmax output, sum = 1).
- Convert probability outputs into a continuous score using the weighted expected value formula $S = \sum_{i=1}^{10} i \cdot p_i$.
- Demonstrate RMAE ≤ 20%, outperforming the average expert-to-expert variability.
- Report all metrics with 95% confidence intervals.
- Validate the model on an independent and diverse test dataset including:
- Various Fitzpatrick skin types (I-VI)
- Multiple anatomical sites (face, intertriginous areas, extremities)
- Different imaging devices and conditions
- Disease conditions including acute atopic dermatitis, impetigo, infected eczema, bullous disorders, and other conditions with exudative component
- Range of severity levels from minimal to severe oozing
- Different exudate types (serous, serosanguinous, purulent) when distinguishable
- Fresh vs. dried exudate patterns
- Document timing recommendations for optimal oozing assessment (e.g., assessment window relative to lesion cleaning).
- Ensure outputs are compatible with automated severity scoring for conditions where oozing is a key component (e.g., EASI for atopic dermatitis, SCORAD, wound assessment scales).
Excoriation Intensity Quantification
Model Classification: 🔬 Clinical Model
Description
A deep learning model ingests a clinical image of a skin lesion and outputs a probability vector:

$$\mathbf{p} = (p_1, p_2, \dots, p_{10})$$

where each $p_i$ (for $i = 1, \dots, 10$) corresponds to the model's softmax-normalized probability that the excoriation intensity belongs to ordinal category $i$ (ranging from minimal to maximal excoriation severity).

Although the outputs are numeric, they represent ordinal categorical values. To derive a continuous excoriation severity score $S$, a weighted expected value is computed:

$$S = \sum_{i=1}^{10} i \cdot p_i$$
This post-processing step ensures that the prediction accounts for the full probability distribution rather than only the most likely class, yielding a more stable and clinically interpretable severity score.
Objectives
- Support healthcare professionals in assessing excoriation (scratch damage) severity by providing an objective, quantitative measure.
- Reduce inter-observer and intra-observer variability, which is particularly challenging in excoriation assessment due to the varied appearance and distribution of scratch marks.
- Ensure reproducibility and robustness across imaging conditions (illumination, angle, device type).
- Facilitate standardized evaluation in clinical practice and research, especially in conditions where excoriation is a key indicator of disease severity and pruritus intensity.
- Enable pruritus severity inference, as excoriation serves as an objective marker of scratching behavior, which correlates with pruritus severity in atopic dermatitis and other pruritic conditions.
Justification (Clinical Evidence):
- Studies of atopic dermatitis scoring systems show moderate interrater reliability for excoriation assessment, with ICC values ranging from 0.41-0.63, reflecting the subjective nature of grading scratch marks [cite: 87].
- Excoriation assessment is challenging because scratch patterns vary widely in linear density, depth, and healing stage, and may overlap with other lesions, leading to inconsistent grading [cite: 88].
- Computer vision techniques incorporating linear feature detection, edge analysis, and pattern recognition have achieved >80% accuracy in identifying and grading excoriation patterns [cite: 89].
- Recent validation studies comparing automated excoriation scoring with standardized photography assessment showed substantial agreement (κ > 0.75) with expert consensus [cite: 90].
- Machine learning approaches have demonstrated a 25% improvement in consistency of excoriation grading compared to traditional visual scoring methods, particularly for intermediate severity levels [cite: 89].
- Excoriation severity is a key component of EASI and SCORAD in atopic dermatitis, and correlates strongly with patient-reported pruritus scores (r = 0.65-0.75), making it a valuable objective marker [cite: 87].
- Longitudinal tracking of excoriation severity can detect early treatment response to anti-pruritic interventions before subjective pruritus scores change [cite: 88].
- Excoriation presence and severity are associated with sleep disturbance and quality of life impairment in pruritic dermatoses, emphasizing clinical importance of accurate quantification [cite: 87].
Endpoints and Requirements
Performance is evaluated using Relative Mean Absolute Error (RMAE) compared to multiple expert-labeled ground truth, with the expectation that the algorithm achieves lower error than the average disagreement among experts.
| Metric | Threshold | Interpretation |
|---|---|---|
| RMAE | ≤ 20% | Algorithm predictions deviate on average less than 20% from expert consensus, with performance superior to inter-observer variability. |
All thresholds must be achieved with 95% confidence intervals.
Requirements:
- Output a normalized probability distribution across 10 ordinal excoriation categories (softmax output, sum = 1).
- Convert probability outputs into a continuous score using the weighted expected value formula $S = \sum_{i=1}^{10} i \cdot p_i$.
- Demonstrate RMAE ≤ 20%, outperforming the average expert-to-expert variability.
- Report all metrics with 95% confidence intervals.
- Validate the model on an independent and diverse test dataset including:
- Various Fitzpatrick skin types (I-VI)—excoriation visibility varies with skin tone
- Multiple anatomical sites (face, trunk, extremities, particularly flexural areas in atopic dermatitis)
- Different imaging devices and conditions
- Disease conditions including atopic dermatitis, prurigo nodularis, lichen simplex chronicus, neurotic excoriations, and other pruritic dermatoses
- Range of severity levels from minimal to severe excoriation
- Different healing stages (acute, subacute, healed with residual marks)
- Linear vs. punctate excoriation patterns
- Ensure outputs are compatible with automated severity scoring for conditions where excoriation is a key component (e.g., EASI for atopic dermatitis, SCORAD, prurigo scoring systems).
Lichenification Intensity Quantification
Model Classification: 🔬 Clinical Model
Description
A deep learning model ingests a clinical image of a skin lesion and outputs a probability vector:

$$\mathbf{p} = (p_1, p_2, \dots, p_{10})$$

where each $p_i$ (for $i = 1, \dots, 10$) corresponds to the model's softmax-normalized probability that the lichenification intensity belongs to ordinal category $i$ (ranging from minimal to maximal lichenification severity).

Although the outputs are numeric, they represent ordinal categorical values. To derive a continuous lichenification severity score $S$, a weighted expected value is computed:

$$S = \sum_{i=1}^{10} i \cdot p_i$$
This post-processing step ensures that the prediction accounts for the full probability distribution rather than only the most likely class, yielding a more stable and clinically interpretable severity score.
Objectives
- Support healthcare professionals in assessing lichenification (skin thickening with accentuated skin markings) severity by providing an objective, quantitative measure.
- Reduce inter-observer and intra-observer variability, which is particularly challenging due to the subtle gradations in skin texture and thickness.
- Ensure reproducibility and robustness across imaging conditions (illumination, angle, magnification, distance).
- Facilitate standardized evaluation in clinical practice and research, especially in chronic conditions where lichenification is a key indicator of disease chronicity and associated treatment resistance.
- Enable chronicity assessment, as lichenification represents chronic rubbing/scratching and is a marker of established, potentially treatment-resistant dermatosis requiring more aggressive intervention.
Justification (Clinical Evidence):
- Analysis of scoring systems for chronic skin conditions shows significant variability in lichenification assessment, with reported κ values of 0.45-0.70, reflecting difficulty in standardizing texture and thickness grading [cite: 87].
- Lichenification assessment is particularly challenging because it requires evaluating subtle changes in skin surface texture, accentuation of normal skin lines, and thickness—features that are difficult to quantify visually and may require tactile assessment [cite: 88].
- Advanced texture analysis algorithms have demonstrated superior detection of lichenified patterns, achieving accuracy rates >85% in identifying skin thickening and texture changes characteristic of lichenification [cite: 89].
- Validation studies comparing AI-based lichenification assessment with high-frequency ultrasound measurements (20-100 MHz) showed strong correlation (r > 0.78) with objective epidermal and dermal thickness measurements [cite: 90].
- Deep learning approaches incorporating depth estimation, shadow analysis, and fine-scale texture pattern recognition have shown 35% improvement in consistency compared to traditional visual scoring methods [cite: 89].
- Lichenification severity is a key component of EASI and SCORAD in atopic dermatitis, and its presence indicates chronic disease requiring intensified treatment, including consideration of systemic therapy [cite: 87].
- Lichenification correlates with treatment resistance—lichenified lesions respond more slowly to topical corticosteroids and require longer treatment duration [cite: 88].
- In lichen simplex chronicus, lichenification severity predicts time to resolution and recurrence risk, making accurate assessment important for prognosis [cite: 90].
Endpoints and Requirements
Performance is evaluated using Relative Mean Absolute Error (RMAE) against ground truth labeled by multiple experts, with the expectation that the algorithm achieves lower error than the average disagreement among the experts.
| Metric | Threshold | Interpretation |
|---|---|---|
| RMAE | ≤ 20% | Algorithm predictions deviate on average less than 20% from expert consensus, with performance superior to inter-observer variability. |
All thresholds must be achieved with 95% confidence intervals.
Requirements:
- Output a normalized probability distribution across 10 ordinal lichenification categories (softmax output, sum = 1).
- Convert probability outputs into a continuous score using the weighted expected value formula $S = \sum_{i=1}^{10} i \cdot p_i$.
- Demonstrate RMAE ≤ 20%, outperforming the average expert-to-expert variability.
- Report all metrics with 95% confidence intervals.
- Validate the model on an independent and diverse test dataset including:
- Various Fitzpatrick skin types (I-VI)—lichenification appearance varies with pigmentation
- Multiple anatomical sites (nape of neck, ankles, wrists, antecubital/popliteal fossae—common lichenification sites)
- Different imaging devices and conditions (macro photography beneficial for texture detail)
- Disease conditions including chronic atopic dermatitis, lichen simplex chronicus, prurigo nodularis, chronic contact dermatitis, and other chronic pruritic dermatoses
- Range of severity levels from minimal to severe lichenification
- Early vs. advanced lichenification (subtle accentuation vs. pronounced thickening)
- Document imaging recommendations for optimal lichenification assessment (e.g., lighting angle to enhance skin markings, appropriate magnification for texture detail).
- Ensure outputs are compatible with automated severity scoring for conditions where lichenification is a key component (e.g., EASI for atopic dermatitis, SCORAD, lichen simplex chronicus severity scores).
- Provide correlation analysis with objective measurements (ultrasound thickness, tactile assessment) when validation data includes instrumental or palpation-based assessments.
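The RMAE endpoint can be sketched as below. The normalisation by the scale range is an assumption (the document does not define RMAE precisely); the expert-to-expert baseline follows the stated requirement that the model outperform average inter-expert disagreement.

```python
import numpy as np

def rmae(pred, consensus, scale_range):
    """Mean absolute error normalised by the scale range, as a percentage.
    The normalisation choice is an assumption for this sketch."""
    pred = np.asarray(pred, dtype=float)
    consensus = np.asarray(consensus, dtype=float)
    return 100.0 * float(np.mean(np.abs(pred - consensus))) / scale_range

def expert_disagreement(expert_scores, scale_range):
    """Average pairwise absolute disagreement among raters, in the same
    units as rmae(); rows are cases, columns are individual experts."""
    s = np.asarray(expert_scores, dtype=float)
    n = s.shape[1]
    pair_mae = [np.mean(np.abs(s[:, i] - s[:, j]))
                for i in range(n) for j in range(i + 1, n)]
    return 100.0 * float(np.mean(pair_mae)) / scale_range

# Illustrative numbers only (not validation data):
model_rmae = rmae([3, 5, 7], [4, 5, 6], scale_range=9)
baseline = expert_disagreement([[3, 5, 4], [5, 7, 6], [6, 8, 7]],
                               scale_range=9)
# Acceptance: model_rmae <= 20 and model_rmae < baseline
```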
Wound Perilesional Erythema Assessment
Model Classification: 🔬 Clinical Model
Description
A deep learning binary classification model ingests a clinical image of a wound and outputs a probability distribution:

$$\mathbf{p} = (p_0, p_1)$$

where $p_1$ represents the probability that perilesional erythema is present around the wound, and $p_0 + p_1 = 1$.
The predicted presence is:

$$\hat{y} = \begin{cases} 1 & \text{if } p_1 \geq 0.5 \\ 0 & \text{otherwise} \end{cases}$$
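A minimal sketch of this decision rule, assuming a two-logit softmax head and a 0.5 operating point (both implementation assumptions; in practice the operating point may be tuned to meet the sensitivity target below):

```python
import numpy as np

def presence_from_logits(logits, threshold=0.5):
    """Softmax over (absent, present) logits, then a threshold on the
    'present' probability. Returns (decision, p_present)."""
    z = np.asarray(logits, dtype=float)
    e = np.exp(z - z.max())          # numerically stable softmax
    p = e / e.sum()
    p_present = float(p[1])
    return p_present >= threshold, p_present

detected, p1 = presence_from_logits([0.0, 2.0])  # p1 = sigmoid(2) ~ 0.88
```

The same pattern applies to every binary presence model in this document; only the training target (erythema, maceration, exudate type, tissue type, and so on) changes.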
Objectives
- Detect inflammatory response in tissue surrounding the wound, indicating infection risk or inflammatory conditions.
- Monitor treatment response by tracking changes in perilesional inflammation.
- Enable early infection detection through objective assessment of erythema extent.
- Reduce inter-observer variability in perilesional erythema assessment (κ = 0.45-0.65).
Justification (Clinical Evidence):
- Perilesional erythema extending >2 cm from the wound edge is 90% sensitive for wound infection [116].
- Perilesional erythema is a key indicator of wound infection and inflammatory response, with inter-observer agreement (κ) ranging from 0.45-0.65 [107, 108].
- Automated erythema assessment in wounds has shown correlation (r > 0.75) with expert visual assessment and clinical infection markers [109].
Endpoints and Requirements
| Metric | Threshold | Interpretation |
|---|---|---|
| AUC | ≥ 0.80 | Strong discriminative ability. |
| Sensitivity | ≥ 0.80 | High sensitivity for detecting erythema (critical). |
| Specificity | ≥ 0.75 | Acceptable specificity. |
| F1-Score | ≥ 0.75 | Balanced performance. |
| Cohen's Kappa | ≥ 0.65 | Substantial agreement with expert assessment. |
All thresholds must be achieved with 95% confidence intervals.
Requirements:
- Binary classification output with confidence score
- Validate on diverse wound types, patient populations, and imaging conditions
- Ensure compatibility with FHIR reporting and clinical decision support systems
Damaged Wound Edges Assessment
Model Classification: 🔬 Clinical Model
Description
A deep learning binary classification model assesses whether wound edges show signs of damage or compromise.
Objectives
- Support identification of compromised wound margins, which indicate poor healing potential and increased risk of chronic wounds.
- Enable treatment planning by objectively documenting edge viability and guiding debridement decisions.
- Predict healing outcomes based on edge integrity assessment.
Justification (Clinical Evidence):
- Damaged wound edges are associated with delayed healing and predict chronic wound development (OR 3.2-4.5) [111].
- Edge assessment is critical for determining debridement needs and healing prognosis.
- Studies show damaged edges increase time to closure by 40-60% compared to intact edges.
Endpoints and Requirements
| Metric | Threshold | Interpretation |
|---|---|---|
| AUC | ≥ 0.75 | Good discriminative ability. |
| Sensitivity | ≥ 0.75 | Good sensitivity for detecting damaged edges. |
| Specificity | ≥ 0.75 | Good specificity. |
| F1-Score | ≥ 0.75 | Balanced performance. |
| Cohen's Kappa | ≥ 0.60 | Moderate to substantial agreement. |
All thresholds must be achieved with 95% confidence intervals.
Delimited Wound Edges Assessment
Model Classification: 🔬 Clinical Model
Description
A deep learning binary classification model assesses whether wound boundaries are well-defined and delimited.
Objectives
- Assess wound boundary definition, which indicates healing progression and epithelialization potential.
- Support prognostic assessment of wound healing trajectory based on edge clarity.
- Enable standardized edge assessment reducing subjective interpretation.
Justification (Clinical Evidence):
- Well-delimited edges correlate with improved healing outcomes and reduced time to closure [112].
- Clear wound boundaries indicate organized healing response and predict successful closure.
- Delimited edges are associated with 30-40% faster healing rates compared to poorly defined boundaries.
Endpoints and Requirements
| Metric | Threshold | Interpretation |
|---|---|---|
| AUC | ≥ 0.75 | Good discriminative ability. |
| Sensitivity | ≥ 0.75 | Good sensitivity for detecting delimited edges. |
| Specificity | ≥ 0.75 | Good specificity. |
| F1-Score | ≥ 0.75 | Balanced performance. |
| Cohen's Kappa | ≥ 0.60 | Moderate to substantial agreement. |
All thresholds must be achieved with 95% confidence intervals.
Diffuse Wound Edges Assessment
Model Classification: 🔬 Clinical Model
Description
A deep learning binary classification model detects poorly defined or diffuse wound boundaries.
Objectives
- Identify poorly defined wound boundaries, indicating inflammation, infection, or underlying pathology.
- Flag high-risk wounds requiring enhanced monitoring and intervention.
- Enable early intervention for wounds with concerning edge characteristics.
Justification (Clinical Evidence):
- Diffuse wound edges are associated with higher infection rates (2.5-fold increase) and impaired healing [113].
- Poorly defined boundaries indicate active inflammation or infection requiring treatment intensification.
- Diffuse edges predict chronic wound development with 70-80% sensitivity.
Endpoints and Requirements
| Metric | Threshold | Interpretation |
|---|---|---|
| AUC | ≥ 0.75 | Good discriminative ability. |
| Sensitivity | ≥ 0.80 | High sensitivity for detecting diffuse edges (critical). |
| Specificity | ≥ 0.70 | Acceptable specificity. |
| F1-Score | ≥ 0.75 | Balanced performance. |
| Cohen's Kappa | ≥ 0.60 | Moderate to substantial agreement. |
All thresholds must be achieved with 95% confidence intervals.
Thickened Wound Edges Assessment
Model Classification: 🔬 Clinical Model
Description
A deep learning binary classification model detects hyperkeratotic or rolled (thickened) wound edges.
Objectives
- Detect hyperkeratotic or rolled edges, which represent mechanical barriers to epithelialization.
- Guide debridement strategy by identifying edge pathology requiring intervention.
- Enable objective edge assessment for treatment planning.
Justification (Clinical Evidence):
- Thickened wound edges require mechanical or surgical debridement to facilitate healing progression [114].
- Rolled or hyperkeratotic edges create physical barriers preventing epithelial migration.
- Edge debridement in wounds with thickening improves healing rates by 50-65%.
Endpoints and Requirements
| Metric | Threshold | Interpretation |
|---|---|---|
| AUC | ≥ 0.75 | Good discriminative ability. |
| Sensitivity | ≥ 0.75 | Good sensitivity for detecting thickened edges. |
| Specificity | ≥ 0.75 | Good specificity. |
| F1-Score | ≥ 0.75 | Balanced performance. |
| Cohen's Kappa | ≥ 0.60 | Moderate to substantial agreement. |
All thresholds must be achieved with 95% confidence intervals.
Indistinguishable Wound Edges Assessment
Model Classification: 🔬 Clinical Model
Description
A deep learning binary classification model identifies wounds where edges cannot be clearly determined.
Objectives
- Identify severe edge compromise where wound boundaries cannot be clinically determined.
- Flag critical wounds requiring urgent specialized wound care intervention.
- Enable risk stratification for poor healing outcomes.
Justification (Clinical Evidence):
- Indistinguishable edges indicate severe tissue damage and predict poor outcomes without aggressive intervention [115].
- Inability to define wound boundaries correlates with extensive tissue necrosis or severe infection.
- Wounds with indistinguishable edges have 85-90% risk of chronic wound development without intensive intervention.
Endpoints and Requirements
| Metric | Threshold | Interpretation |
|---|---|---|
| AUC | ≥ 0.80 | Strong discriminative ability (critical). |
| Sensitivity | ≥ 0.85 | High sensitivity for flagging critical cases. |
| Specificity | ≥ 0.75 | Acceptable specificity. |
| F1-Score | ≥ 0.80 | Balanced performance. |
| Cohen's Kappa | ≥ 0.65 | Substantial agreement. |
All thresholds must be achieved with 95% confidence intervals.
Perilesional Maceration Assessment
Model Classification: 🔬 Clinical Model
Description
A deep learning binary classification model detects moisture-related damage in periwound skin.
Objectives
- Identify moisture-related damage in periwound skin, which compromises healing and increases wound size.
- Guide moisture management and barrier protection strategies.
- Enable objective maceration assessment for treatment optimization.
Justification (Clinical Evidence):
- Perilesional maceration increases wound enlargement risk by 60-80% and delays healing [117].
- Maceration extent correlates with exudate volume and predicts dressing change frequency requirements [152].
- Resolution of maceration improves healing rates by 35-45% [153].
Endpoints and Requirements
| Metric | Threshold | Interpretation |
|---|---|---|
| AUC | ≥ 0.80 | Strong discriminative ability. |
| Sensitivity | ≥ 0.80 | High sensitivity for detecting maceration. |
| Specificity | ≥ 0.75 | Acceptable specificity. |
| F1-Score | ≥ 0.75 | Balanced performance. |
| Cohen's Kappa | ≥ 0.65 | Substantial agreement. |
All thresholds must be achieved with 95% confidence intervals.
Fibrinous Exudate Assessment
Model Classification: 🔬 Clinical Model
Description
A deep learning binary classification model identifies fibrinous exudate in wounds.
Objectives
- Identify normal healing exudate, which indicates active wound repair processes.
- Differentiate fibrinous from purulent exudate for appropriate treatment selection.
- Support exudate characterization in wound assessment protocols.
Justification (Clinical Evidence):
- Fibrinous exudate represents physiologic healing response [121].
- Presence of fibrin indicates active tissue repair and angiogenesis.
- Fibrinous exudate is a normal finding in healing wounds and should not trigger antimicrobial intervention.
Endpoints and Requirements
| Metric | Threshold | Interpretation |
|---|---|---|
| AUC | ≥ 0.75 | Good discriminative ability. |
| Sensitivity | ≥ 0.75 | Good sensitivity for detecting fibrin. |
| Specificity | ≥ 0.75 | Good specificity. |
| F1-Score | ≥ 0.75 | Balanced performance. |
| Cohen's Kappa | ≥ 0.60 | Moderate to substantial agreement. |
All thresholds must be achieved with 95% confidence intervals.
Purulent Exudate Assessment
Model Classification: 🔬 Clinical Model
Description
A deep learning binary classification model detects purulent (infected) exudate in wounds.
Objectives
- Detect infection indicators requiring antimicrobial intervention.
- Enable early infection identification before systemic signs develop.
- Support antimicrobial stewardship by objective infection assessment.
Justification (Clinical Evidence):
- Purulent exudate has 85-95% positive predictive value for wound infection [122].
- Detection of purulent drainage is a validated clinical sign of wound infection.
- Early identification of purulent exudate enables prompt antimicrobial therapy, reducing complications by 40-50%.
Endpoints and Requirements
| Metric | Threshold | Interpretation |
|---|---|---|
| AUC | ≥ 0.85 | Strong discriminative ability (critical). |
| Sensitivity | ≥ 0.85 | High sensitivity for infection detection. |
| Specificity | ≥ 0.80 | High specificity to avoid false alarms. |
| F1-Score | ≥ 0.80 | Balanced performance. |
| Cohen's Kappa | ≥ 0.70 | Substantial agreement. |
All thresholds must be achieved with 95% confidence intervals.
Bloody Exudate Assessment
Model Classification: 🔬 Clinical Model
Description
A deep learning binary classification model identifies bloody or hemorrhagic exudate in wounds.
Objectives
- Identify vascular injury or fragile granulation tissue.
- Detect trauma or mechanical disruption of healing tissue.
- Support assessment of angiogenesis quality and tissue fragility.
Justification (Clinical Evidence):
- Bloody exudate may indicate trauma, friable tissue, or neovascularization [121].
- Persistent bloody drainage suggests fragile granulation or vascular abnormalities.
- Recognition of bloody exudate guides gentler wound handling and dressing selection.
Endpoints and Requirements
| Metric | Threshold | Interpretation |
|---|---|---|
| AUC | ≥ 0.75 | Good discriminative ability. |
| Sensitivity | ≥ 0.75 | Good sensitivity for detecting blood. |
| Specificity | ≥ 0.75 | Good specificity. |
| F1-Score | ≥ 0.75 | Balanced performance. |
| Cohen's Kappa | ≥ 0.60 | Moderate to substantial agreement. |
All thresholds must be achieved with 95% confidence intervals.
Serous Exudate Assessment
Model Classification: 🔬 Clinical Model
Description
A deep learning binary classification model identifies serous (clear/watery) exudate in wounds.
Objectives
- Assess normal wound exudate in early healing phases.
- Differentiate serous from purulent drainage for infection assessment.
- Support exudate volume and type documentation.
Justification (Clinical Evidence):
- Serous exudate is characteristic of inflammatory phase healing [121].
- Clear serous drainage indicates normal wound fluid without infection.
- Serous exudate assessment helps guide moisture management strategies.
Endpoints and Requirements
| Metric | Threshold | Interpretation |
|---|---|---|
| AUC | ≥ 0.75 | Good discriminative ability. |
| Sensitivity | ≥ 0.75 | Good sensitivity for detecting serous fluid. |
| Specificity | ≥ 0.75 | Good specificity. |
| F1-Score | ≥ 0.75 | Balanced performance. |
| Cohen's Kappa | ≥ 0.60 | Moderate to substantial agreement. |
All thresholds must be achieved with 95% confidence intervals.
Biofilm-Compatible Tissue Assessment
Model Classification: 🔬 Clinical Model
Description
A deep learning binary classification model detects visual indicators of biofilm presence in wound tissue.
Objectives
- Detect visual indicators of biofilm presence, which represents a major barrier to healing.
- Guide antimicrobial strategy by identifying wounds requiring biofilm-targeted interventions.
- Enable early biofilm detection before clinical infection develops.
Justification (Clinical Evidence):
- Biofilm presence extends healing time 3- to 4-fold and increases infection risk [118].
- Visual biofilm indicators include glossy appearance, slough adherence, and characteristic patterns.
- Biofilm-targeted treatment (debridement + antimicrobials) improves healing rates by 45-60%.
Endpoints and Requirements
| Metric | Threshold | Interpretation |
|---|---|---|
| AUC | ≥ 0.80 | Strong discriminative ability. |
| Sensitivity | ≥ 0.75 | Good sensitivity for biofilm detection. |
| Specificity | ≥ 0.80 | High specificity (avoid false positives). |
| F1-Score | ≥ 0.75 | Balanced performance. |
| Cohen's Kappa | ≥ 0.65 | Substantial agreement. |
All thresholds must be achieved with 95% confidence intervals.
Wound Affected Tissue: Bone
Model Classification: 🔬 Clinical Model
Description
A deep learning binary classification model detects bone involvement or exposure in wounds.
Objectives
- Detect bone exposure or involvement, indicating deep wound requiring specialized management.
- Enable accurate wound depth staging based on tissue layer involvement.
- Guide surgical consultation and osteomyelitis risk assessment.
Justification (Clinical Evidence):
- Bone exposure in diabetic foot ulcers indicates osteomyelitis in 60-90% of cases [159].
- Wounds with bone involvement have 10- to 20-fold longer healing times than soft tissue wounds [160].
- Bone exposure extent predicts amputation risk: exposure >2 cm² increases risk 5-fold [161].
Endpoints and Requirements
| Metric | Threshold | Interpretation |
|---|---|---|
| AUC | ≥ 0.85 | Strong discriminative ability (critical). |
| Sensitivity | ≥ 0.85 | High sensitivity for detecting bone (critical). |
| Specificity | ≥ 0.80 | High specificity. |
| F1-Score | ≥ 0.80 | Balanced performance. |
| Cohen's Kappa | ≥ 0.70 | Substantial agreement. |
All thresholds must be achieved with 95% confidence intervals.
Wound Affected Tissue: Subcutaneous
Model Classification: 🔬 Clinical Model
Description
A deep learning binary classification model detects subcutaneous tissue involvement in wounds.
Objectives
- Identify subcutaneous fat layer involvement, critical for accurate wound staging.
- Enable Stage II vs. Stage III differentiation in pressure injury classification.
- Guide treatment planning based on wound depth assessment.
Justification (Clinical Evidence):
- Subcutaneous involvement defines Stage III pressure injuries per NPUAP/EPUAP guidelines.
- Accurate depth assessment is fundamental to wound staging and treatment selection [119, 120].
- Wounds extending to subcutaneous tissue require more intensive management and have longer healing times.
Endpoints and Requirements
| Metric | Threshold | Interpretation |
|---|---|---|
| AUC | ≥ 0.80 | Strong discriminative ability. |
| Sensitivity | ≥ 0.80 | High sensitivity for detecting subcutaneous tissue. |
| Specificity | ≥ 0.80 | High specificity. |
| F1-Score | ≥ 0.80 | Balanced performance. |
| Cohen's Kappa | ≥ 0.70 | Substantial agreement. |
All thresholds must be achieved with 95% confidence intervals.
Wound Affected Tissue: Muscle
Model Classification: 🔬 Clinical Model
Description
A deep learning binary classification model detects muscle tissue involvement or exposure in wounds.
Objectives
- Identify muscle layer involvement, indicating Stage IV wounds or severe injury.
- Enable accurate depth-based staging for treatment protocol selection.
- Guide surgical consultation for potential flap coverage or complex closure.
Justification (Clinical Evidence):
- Muscle involvement defines Stage IV pressure injuries per NPUAP/EPUAP classification.
- Wounds exposing muscle require surgical intervention in 70-85% of cases.
- Muscle exposure is associated with significantly prolonged healing (3-6 months average) and high complication rates.
Endpoints and Requirements
| Metric | Threshold | Interpretation |
|---|---|---|
| AUC | ≥ 0.85 | Strong discriminative ability (critical). |
| Sensitivity | ≥ 0.85 | High sensitivity for detecting muscle (critical). |
| Specificity | ≥ 0.80 | High specificity. |
| F1-Score | ≥ 0.80 | Balanced performance. |
| Cohen's Kappa | ≥ 0.70 | Substantial agreement. |
All thresholds must be achieved with 95% confidence intervals.
Wound Affected Tissue: Intact Skin
Model Classification: 🔬 Clinical Model
Description
A deep learning binary classification model assesses whether wound area shows intact (unbroken) skin.
Objectives
- Identify Stage I pressure injuries with intact skin but underlying tissue damage.
- Detect closed wounds vs. open ulcerations for staging purposes.
- Support accurate classification of non-blanchable erythema.
Justification (Clinical Evidence):
- Stage I pressure injuries present with intact skin and non-blanchable erythema, requiring different management than open wounds.
- Intact skin overlying deep tissue injury represents evolving tissue damage requiring monitoring.
- Recognition of intact vs. broken skin is fundamental to pressure injury staging.
Endpoints and Requirements
| Metric | Threshold | Interpretation |
|---|---|---|
| AUC | ≥ 0.80 | Strong discriminative ability. |
| Sensitivity | ≥ 0.80 | High sensitivity for detecting intact skin. |
| Specificity | ≥ 0.80 | High specificity. |
| F1-Score | ≥ 0.80 | Balanced performance. |
| Cohen's Kappa | ≥ 0.70 | Substantial agreement. |
All thresholds must be achieved with 95% confidence intervals.
Wound Affected Tissue: Dermis-Epidermis
Model Classification: 🔬 Clinical Model
Description
A deep learning binary classification model detects partial-thickness skin loss involving dermis and/or epidermis.
Objectives
- Identify Stage II pressure injuries with partial-thickness skin loss.
- Differentiate superficial from deep wounds for appropriate treatment selection.
- Enable accurate staging based on depth of tissue involvement.
Justification (Clinical Evidence):
- Partial-thickness wounds involving dermis/epidermis define Stage II pressure injuries.
- Dermal involvement without subcutaneous exposure indicates superficial wound with generally favorable healing prognosis.
- Accurate differentiation of dermal vs. deeper involvement guides treatment intensity and healing time estimates.
Endpoints and Requirements
| Metric | Threshold | Interpretation |
|---|---|---|
| AUC | ≥ 0.80 | Strong discriminative ability. |
| Sensitivity | ≥ 0.80 | High sensitivity for detecting dermal involvement. |
| Specificity | ≥ 0.80 | High specificity. |
| F1-Score | ≥ 0.80 | Balanced performance. |
| Cohen's Kappa | ≥ 0.70 | Substantial agreement. |
All thresholds must be achieved with 95% confidence intervals.
Wound Bed Tissue: Necrotic
Model Classification: 🔬 Clinical Model
Description
A deep learning binary classification model detects necrotic tissue presence in the wound bed.
Objectives
- Identify non-viable tissue requiring urgent debridement.
- Enable objective necrosis assessment for treatment prioritization.
- Support debridement planning and monitoring.
Justification (Clinical Evidence):
- Necrosis presence is absolute indication for debridement and predictor of poor outcomes [126].
- Necrotic tissue increases infection risk 5- to 8-fold and delays healing.
- Complete necrosis removal improves healing rates by 50-70% [150].
Endpoints and Requirements
| Metric | Threshold | Interpretation |
|---|---|---|
| AUC | ≥ 0.85 | Strong discriminative ability (critical). |
| Sensitivity | ≥ 0.85 | High sensitivity for detecting necrosis. |
| Specificity | ≥ 0.80 | High specificity. |
| F1-Score | ≥ 0.80 | Balanced performance. |
| Cohen's Kappa | ≥ 0.70 | Substantial agreement. |
All thresholds must be achieved with 95% confidence intervals.
Wound Bed Tissue: Closed
Model Classification: 🔬 Clinical Model
Description
A deep learning binary classification model assesses whether a wound bed is closed (healed) or open.
Objectives
- Identify wound closure as primary healing outcome.
- Support healing assessment and endpoint determination.
- Enable objective closure documentation for reimbursement and outcomes tracking.
Justification (Clinical Evidence):
- Wound closure is the primary outcome measure in wound healing trials.
- Objective closure assessment reduces inter-observer variability in healing determination.
- Documentation of wound closure is required for treatment discontinuation and outcomes reporting.
Endpoints and Requirements
| Metric | Threshold | Interpretation |
|---|---|---|
| AUC | ≥ 0.85 | Strong discriminative ability. |
| Sensitivity | ≥ 0.85 | High sensitivity for detecting closure. |
| Specificity | ≥ 0.85 | High specificity (avoid premature closure calls). |
| F1-Score | ≥ 0.85 | Balanced performance. |
| Cohen's Kappa | ≥ 0.75 | Substantial to excellent agreement. |
All thresholds must be achieved with 95% confidence intervals.
Wound Bed Tissue: Granulation
Model Classification: 🔬 Clinical Model
Description
A deep learning binary classification model detects healthy granulation tissue in the wound bed.
Objectives
- Assess healthy healing tissue formation, indicating active repair.
- Predict healing success based on granulation presence.
- Guide treatment decisions regarding wound bed preparation adequacy.
Justification (Clinical Evidence):
- Granulation tissue presence is strongest predictor of healing success (OR 8.5) [127].
- Granulation tissue covering >75% of wound bed predicts healing with OR 8.2-12.5 [127, 142].
- Presence of healthy granulation indicates adequate vascular supply and healing potential.
Endpoints and Requirements
| Metric | Threshold | Interpretation |
|---|---|---|
| AUC | ≥ 0.80 | Strong discriminative ability. |
| Sensitivity | ≥ 0.80 | High sensitivity for detecting granulation. |
| Specificity | ≥ 0.80 | High specificity. |
| F1-Score | ≥ 0.80 | Balanced performance. |
| Cohen's Kappa | ≥ 0.70 | Substantial agreement. |
All thresholds must be achieved with 95% confidence intervals.
Wound Bed Tissue: Epithelial
Model Classification: 🔬 Clinical Model
Description
A deep learning binary classification model detects epithelialization in the wound bed.
Objectives
- Detect epithelialization, indicating advanced healing and imminent closure.
- Support healing phase assessment for treatment optimization.
- Predict imminent wound closure for care planning.
Justification (Clinical Evidence):
- Epithelialization is the final healing phase and predictor of imminent wound closure [128].
- Presence of epithelial tissue indicates successful wound bed preparation and healing progression.
- Epithelial advancement from the wound edges is the hallmark of re-epithelialization and approaching closure.
Endpoints and Requirements
| Metric | Threshold | Interpretation |
|---|---|---|
| AUC | ≥ 0.80 | Strong discriminative ability. |
| Sensitivity | ≥ 0.80 | High sensitivity for detecting epithelialization. |
| Specificity | ≥ 0.80 | High specificity. |
| F1-Score | ≥ 0.80 | Balanced performance. |
| Cohen's Kappa | ≥ 0.70 | Substantial agreement. |
All thresholds must be achieved with 95% confidence intervals.
Wound Bed Tissue: Slough
Model Classification: 🔬 Clinical Model
Description
A deep learning binary classification model detects slough (devitalized tissue) in the wound bed.
Objectives
- Detect devitalized tissue requiring debridement for healing progression.
- Guide debridement strategy and wound bed preparation.
- Monitor debridement efficacy through serial assessments.
Justification (Clinical Evidence):
- Slough presence delays healing and increases infection risk by 40-60% [125].
- Slough covering >30% of wound bed delays healing by average 6-8 weeks [146].
- Complete debridement to <10% slough coverage improves healing rates by 45-60% [147].
Endpoints and Requirements
| Metric | Threshold | Interpretation |
|---|---|---|
| AUC | ≥ 0.80 | Strong discriminative ability. |
| Sensitivity | ≥ 0.80 | High sensitivity for detecting slough. |
| Specificity | ≥ 0.75 | Good specificity. |
| F1-Score | ≥ 0.75 | Balanced performance. |
| Cohen's Kappa | ≥ 0.65 | Substantial agreement. |
All thresholds must be achieved with 95% confidence intervals.
Wound Stage Classification
Model Classification: 🔬 Clinical Model
Description
A deep learning multi-class classification model assigns wounds to standardized stages (0, I, II, III, IV), outputting a probability distribution:

$$\mathbf{p} = (p_0, p_1, p_2, p_3, p_4)$$

where each $p_s$ represents the probability of the wound belonging to stage $s$, and $\sum_{s=0}^{4} p_s = 1$.
The predicted stage is:

$$\hat{s} = \arg\max_{s} \, p_s$$
Objectives
- Provide standardized staging according to internationally recognized wound classification systems (NPUAP/EPUAP).
- Enable treatment protocol selection based on validated stage-specific guidelines.
- Facilitate outcome prediction using stage-based prognostic models.
- Support documentation and reimbursement with objective staging classification.
- Reduce inter-observer variability in wound staging (κ = 0.55-0.70).
Justification (Clinical Evidence):
- Wound staging is fundamental to treatment planning, with stage determining intervention intensity and expected healing time [129].
- Inter-observer agreement for manual staging shows moderate reliability (κ = 0.55-0.70), highlighting need for objective tools [130].
- Stage-based treatment protocols improve healing rates by 25-35% compared to non-standardized care [131].
- Accurate staging is required for reimbursement and quality metrics in many healthcare systems.
Endpoints and Requirements
| Metric | Threshold | Interpretation |
|---|---|---|
| Overall Accuracy | ≥ 75% | Correct stage classification in 3 out of 4 cases. |
| Weighted Kappa (κw) | ≥ 0.70 | Substantial agreement with expert staging. |
| Adjacent Stage Accuracy | ≥ 90% | Within one stage of expert assessment (clinically safe). |
| Macro F1-Score | ≥ 0.70 | Balanced performance across all stages. |
| Class-specific F1 | ≥ 0.65 | Minimum acceptable F1 for each individual stage. |
All thresholds must be achieved with 95% confidence intervals.
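The weighted-kappa endpoint can be computed as below. Quadratic disagreement weights are an assumption for this sketch, since the table specifies only "Weighted Kappa (κw)"; stages 0, I, II, III, IV are encoded as integers 0-4.

```python
import numpy as np

def quadratic_weighted_kappa(rater_a, rater_b, n_classes=5):
    """Weighted kappa with quadratic disagreement weights: a four-stage
    disagreement is penalised 16x more than a one-stage disagreement."""
    a = np.asarray(rater_a, dtype=int)
    b = np.asarray(rater_b, dtype=int)
    observed = np.zeros((n_classes, n_classes))
    for i, j in zip(a, b):
        observed[i, j] += 1
    observed /= observed.sum()
    idx = np.arange(n_classes)
    weights = np.subtract.outer(idx, idx) ** 2 / (n_classes - 1) ** 2
    # Expected agreement matrix under independent marginals
    expected = np.outer(observed.sum(axis=1), observed.sum(axis=0))
    return 1.0 - (weights * observed).sum() / (weights * expected).sum()
```

Perfect agreement yields κw = 1, and a prediction one stage off costs far less than one four stages off, matching the adjacent-stage-accuracy rationale above.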
Requirements:
- Multi-class classification (5 classes: 0, I, II, III, IV)
- Output probability distribution and predicted stage with confidence
- Implement ordinal loss functions (penalize a Stage I → IV error more heavily than a Stage I → II error)
- Validate on diverse wound types and patient populations
- Ensure compatibility with NPUAP/EPUAP staging systems and FHIR reporting
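The ordinal-loss requirement can be illustrated with a minimal expected-stage-distance penalty. This is a sketch only; production ordinal losses (e.g. CORAL-style ordinal regression heads) are more elaborate, but the distance-aware principle is the same.

```python
import numpy as np

def expected_stage_distance(probs, true_stage):
    """Expected absolute distance between the predicted stage distribution
    and the true stage: mis-staging I as IV (distance 3) costs three times
    as much as mis-staging I as II (distance 1)."""
    probs = np.asarray(probs, dtype=float)
    stages = np.arange(probs.size)       # stages 0, I, II, III, IV -> 0..4
    return float(np.dot(np.abs(stages - true_stage), probs))

# All mass on the true stage -> zero penalty; distant mass costs more.
```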
Wound AWOSI Score Quantification
Model Classification: 🔬 Clinical Model
Description
A deep learning ordinal regression model quantifies wound severity using the AWOSI (Annotated Wound Observational Severity Index) scale (0-20), outputting a probability distribution:

$$\mathbf{p} = (p_0, p_1, \dots, p_{20})$$

where each $p_k$ represents the probability that the wound has AWOSI score $k$, and $\sum_{k=0}^{20} p_k = 1$.
The continuous AWOSI score is derived using the weighted expected value:

$$S = \sum_{k=0}^{20} k \cdot p_k$$
Objectives
- Provide composite severity assessment integrating multiple wound characteristics into a single validated score.
- Enable objective severity stratification for clinical decision-making and resource allocation.
- Track healing progression using standardized numerical scale over time.
- Facilitate clinical trial endpoints with validated, reproducible severity metric.
- Support treatment intensification decisions based on objective severity thresholds.
Justification (Clinical Evidence):
- Composite wound scores like AWOSI show strong correlation with healing time (r = 0.72-0.85) and clinical outcomes [132].
- Validated wound intensity scores improve inter-observer reliability from κ 0.45-0.60 to κ 0.75-0.85 [133].
- Longitudinal wound intensity tracking enables early identification of non-healing wounds (sensitivity 78-85%) [134].
- AWOSI scores predict healing outcomes: scores >15 associated with chronic wound risk (OR 4.2-6.8).
- Quantitative severity scores enable objective treatment escalation protocols and resource prioritization.
Endpoints and Requirements
| Metric | Threshold | Interpretation |
|---|---|---|
| RMAE (Relative MAE) | ≤ 20% | Predictions deviate ≤20% from expert consensus on average. |
| MAE (Mean Absolute Error) | ≤ 3.0 | Average error ≤3 points on 0-20 scale (15% of range). |
| Within-2-Points Accuracy | ≥ 75% | Predictions within ±2 points of expert score in 75% of cases. |
| Correlation (Pearson r) | ≥ 0.80 | Strong correlation with expert AWOSI scores. |
| ICC (Intraclass Corr.) | ≥ 0.75 | Substantial agreement with expert scoring. |
All thresholds must be achieved with 95% confidence intervals.
Requirements:
- Ordinal regression model outputting probability distribution across 0-20 scale
- Calculate continuous score using weighted expected value
- Implement ordinal loss functions (preserve score ordering)
- Demonstrate RMAE ≤ 20%, MAE ≤ 3.0, within-2-points accuracy ≥ 75%
- Validate on diverse wound types, stages, and patient populations
- Report correlation with expert AWOSI scores and healing outcomes
- Ensure compatibility with AWOSI calculation protocols and FHIR reporting
- Support longitudinal tracking for treatment response monitoring
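A sketch of how the MAE, RMAE, and within-2-points endpoints might be computed against expert consensus scores. The RMAE normalization (by the mean expert score) is an assumption for illustration; the AWOSI protocol may define the denominator differently.

```python
def awosi_endpoint_metrics(predicted, expert):
    """predicted, expert: parallel lists of AWOSI scores on the 0-20 scale."""
    # Absolute errors against expert consensus scores.
    errors = [abs(p - e) for p, e in zip(predicted, expert)]
    mae = sum(errors) / len(errors)
    # NOTE: assumed RMAE definition -- MAE relative to the mean expert
    # score; the exact normalization is not specified here.
    rmae = mae / (sum(expert) / len(expert))
    # Fraction of predictions within +/-2 points of the expert score.
    within_2 = sum(e <= 2 for e in errors) / len(errors)
    return {"mae": mae, "rmae": rmae, "within_2_points": within_2}
```

Each returned value maps directly onto one row of the endpoints table (MAE ≤ 3.0, RMAE ≤ 20%, within-2-points ≥ 75%).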
Body Surface Segmentation
Model Classification: 🛠️ Non-Clinical Model
Algorithm Description
A deep learning multi-class segmentation model ingests a clinical image and outputs a pixel-wise probability map across $K$ anatomical body region categories:

$$\hat{Y} = \{\hat{y}_{x,y}\}, \quad \hat{y}_{x,y} \in \{1, \ldots, K\}$$

where $\hat{y}_{x,y}$ represents the predicted body region class for pixel $(x, y)$.

The model architecture outputs a probability distribution over all $K$ region classes for each pixel:

$$P_{x,y} = [p_{x,y,1}, \ldots, p_{x,y,K}]$$

where $\sum_{k=1}^{K} p_{x,y,k} = 1$ for each pixel.

The predicted body region for each pixel is:

$$\hat{y}_{x,y} = \arg\max_{k} \, p_{x,y,k}$$
From this segmentation, the algorithm computes body surface area (BSA) percentages for each anatomical region, enabling automated severity scoring calculations: