R-TF-028-001 AI Description

Table of contents
  • Purpose
  • Scope
  • Algorithm summary
  • Algorithm Classification
    • Clinical Models
    • Non-Clinical Models
  • Description and Specifications
    • ICD Category Distribution and Binary Indicators
      • Description
      • Objectives
      • Endpoints and Requirements
    • Erythema Intensity Quantification
      • Description
      • Objectives
      • Endpoints and Requirements
    • Desquamation Intensity Quantification
      • Description
      • Objectives
      • Endpoints and Requirements
    • Induration Intensity Quantification
      • Description
      • Objectives
      • Endpoints and Requirements
    • Pustule Intensity Quantification
      • Description
      • Objectives
      • Endpoints and Requirements
    • Crusting Intensity Quantification
      • Description
      • Objectives
      • Endpoints and Requirements
    • Xerosis Intensity Quantification
      • Description
      • Objectives
      • Endpoints and Requirements
    • Swelling Intensity Quantification
      • Description
      • Objectives
      • Endpoints and Requirements
    • Oozing Intensity Quantification
      • Description
      • Objectives
      • Endpoints and Requirements
    • Excoriation Intensity Quantification
      • Description
      • Objectives
      • Endpoints and Requirements
    • Lichenification Intensity Quantification
      • Description
      • Objectives
      • Endpoints and Requirements
    • Wound Characteristic Assessment
      • Description
      • Objectives
    • Body Surface Segmentation
      • Algorithm Description
      • Objectives
      • Endpoints and Requirements
    • Wound Surface Quantification
      • Description
      • Objectives
      • Endpoints and Requirements
    • Hair Loss Surface Quantification
      • Description
      • Objectives
      • Endpoints and Requirements
    • Inflammatory Nodular Lesion Quantification
      • Description
      • Objectives
      • Endpoints and Requirements
    • Acneiform Lesion Type Quantification
      • Description
      • Objectives
      • Endpoints and Requirements
    • Inflammatory Lesion Quantification
      • Description
      • Objectives
      • Endpoints and Requirements
    • Hive Lesion Quantification
      • Description
      • Objectives
      • Endpoints and Requirements
    • Nail Lesion Surface Quantification
      • Description
      • Objectives
      • Endpoints and Requirements
    • Hypopigmentation or Depigmentation Surface Quantification
      • Description
      • Objectives
      • Endpoints and Requirements
    • Acneiform Inflammatory Pattern Identification
      • Description
      • Objectives
      • Endpoints and Requirements
    • Follicular and Inflammatory Pattern Identification
      • Description
      • Objectives
      • Endpoints and Requirements
    • Inflammatory Pattern Identification
      • Description
      • Objectives
      • Endpoints and Requirements
    • Inflammatory Pattern Indicator
      • Description
      • Objectives
      • Endpoints and Requirements
    • Dermatology Image Quality Assessment (DIQA)
      • Description
      • Objectives
      • Endpoints and Requirements
    • Fitzpatrick Skin Type Identification
      • Description
      • Objectives
      • Endpoints and Requirements
    • Domain Validation
      • Description
      • Objectives
      • Endpoints and Requirements
    • Skin Surface Segmentation
      • Description
      • Objectives
      • Endpoints and Requirements
    • Surface Area Quantification
      • Description
      • Objectives
      • Endpoints and Requirements
    • Body Site Identification
      • Description
      • Objectives
      • Endpoints and Requirements
    • Data Specifications
    • Other Specifications
    • Cybersecurity and Transparency
    • Specifications and Risks
  • Integration and Environment
    • Integration
    • Environment
  • References
  • Traceability to QMS Records

Purpose

This document defines the specifications, performance requirements, and data needs for the Artificial Intelligence (AI) models used in the Legit.Health Plus device.

Scope

This document details the design and performance specifications for all AI algorithms integrated into the Legit.Health Plus device. It establishes the foundation for the development, validation, and risk management of these models.

This description covers the following key areas for each algorithm:

  • Algorithm description, clinical objectives, and justification.
  • Performance endpoints and acceptance criteria.
  • Specifications for the data required for development and evaluation.
  • Requirements related to cybersecurity, transparency, and integration.
  • Links between the AI specifications and the overall risk management process.

Algorithm summary

| ID | Model Name | Type | Task Type | Visible Signs | Clinical Context |
|---|---|---|---|---|---|
| 1 | ICD Category Distribution and Binary Indicators | 🔬 Clinical | Classification | All Dermatological Conditions | ICD-11, Diagnosis, Triage |
| 2 | Erythema Intensity Quantification | 🔬 Clinical | Ordinal Classification | Erythema | PASI, EASI, SCORAD |
| 3 | Desquamation Intensity Quantification | 🔬 Clinical | Ordinal Classification | Desquamation | PASI |
| 4 | Induration Intensity Quantification | 🔬 Clinical | Ordinal Classification | Induration | PASI |
| 5 | Pustule Intensity Quantification | 🔬 Clinical | Ordinal Classification | Pustule | PPPASI, GPPGA, Acne |
| 6 | Crusting Intensity Quantification | 🔬 Clinical | Ordinal Classification | Crusting | EASI, SCORAD |
| 7 | Xerosis Intensity Quantification | 🔬 Clinical | Ordinal Classification | Xerosis | EASI, SCORAD, ODS |
| 8 | Swelling Intensity Quantification | 🔬 Clinical | Ordinal Classification | Swelling | EASI, SCORAD |
| 9 | Oozing Intensity Quantification | 🔬 Clinical | Ordinal Classification | Oozing | EASI, SCORAD |
| 10 | Excoriation Intensity Quantification | 🔬 Clinical | Ordinal Classification | Excoriation | EASI, SCORAD |
| 11 | Lichenification Intensity Quantification | 🔬 Clinical | Ordinal Classification | Lichenification | EASI, SCORAD |
| 12 | Wound Characteristic Assessment | 🔬 Clinical | Multi Task Multi Output | Perilesional Erythema, Damaged Edges, Delimited Edges, Diffuse Edges, Thickened Edges, Indistinguishable Edges, Perilesional Maceration, Biofilm-Compatible Tissue, Fibrinous Exudate, Purulent Exudate, Bloody Exudate, Serous Exudate, Greenish Exudate | AWOSI, Wound Assessment, NPUAP |
| 13 | Wound Surface Quantification | 🔬 Clinical | Multi Class Segmentation | Erythema, Wound Bed, Angiogenesis and Granulation Tissue, Biofilm and Slough, Necrosis, Maceration, Orthopedic Material, Bone, Cartilage, or Tendon | Wound Assessment, AWOSI |
| 14 | Hair Loss Surface Quantification | 🔬 Clinical | Segmentation | Alopecia | SALT, APULSI, Alopecia Assessment |
| 15 | Inflammatory Nodular Lesion Quantification | 🔬 Clinical | Multi Class Object Detection | Nodule, Abscess, Non-draining Tunnel, Draining Tunnel | IHS4, Hidradenitis Suppurativa |
| 16 | Acneiform Lesion Type Quantification | 🔬 Clinical | Multi Class Object Detection | Papule, Pustule, Cyst, Comedone, Nodule | GAGS, IGA, ASI, Acne Assessment |
| 17 | Inflammatory Lesion Quantification | 🔬 Clinical | Object Detection | Inflammatory Lesion | PASI, EASI, Inflammatory Dermatoses |
| 18 | Hive Lesion Quantification | 🔬 Clinical | Object Detection | Hive | UAS7, UCT, Urticaria Assessment |
| 19 | Nail Lesion Surface Quantification | 🔬 Clinical | Segmentation | Nail Lesion | NAPSI, Nail Psoriasis, Onychomycosis |
| 20 | Hypopigmentation or Depigmentation Surface Quantification | 🔬 Clinical | Segmentation | Hypopigmentation or Depigmentation | VASI, VETF, Vitiligo Assessment |
| 21 | Acneiform Inflammatory Pattern Identification | 🔬 Clinical | Tabular Classification | Inflammatory Lesion Count, Lesion Density | IGA, Acne Assessment |
| 22 | Follicular and Inflammatory Pattern Identification | 🔬 Clinical | Classification | — | Hidradenitis Suppurativa, Martorell Classification, HS Phenotyping |
| 23 | Inflammatory Pattern Identification (Hurley Staging) | 🔬 Clinical | Classification | — | Hidradenitis Suppurativa, Hurley Staging, HS Severity |
| 24 | Inflammatory Pattern Indicator | 🔬 Clinical | Classification | — | Hidradenitis Suppurativa, Disease Activity, Treatment Selection |
| 25 | Body Surface Segmentation | 🛠️ Non-Clinical | Multi Class Segmentation | — | PASI, EASI, BSA Calculation, Burn Assessment |
| 26 | Surface Area Quantification | 🛠️ Non-Clinical | Regression | — | BSA Calculation, Surface Area Measurement, Calibration |
| 27 | Dermatology Image Quality Assessment (DIQA) | 🛠️ Non-Clinical | Regression | — | Quality Control, Telemedicine |
| 28 | Fitzpatrick Skin Type Identification | 🛠️ Non-Clinical | Classification | — | Bias Monitoring, Equity Assessment, Performance Stratification |
| 29 | Domain Validation | 🛠️ Non-Clinical | Classification | — | Image Routing, Quality Control, Domain Classification |
| 30 | Skin Surface Segmentation | 🛠️ Non-Clinical | Segmentation | — | Preprocessing, ROI Extraction, Skin Detection |
| 31 | Body Site Identification | 🛠️ Non-Clinical | Classification | — | Anatomical Context, Site-Specific Analysis, Documentation |

Algorithm Classification

The AI algorithms in the Legit.Health Plus device are classified into two categories based on their relationship to the device's intended purpose as defined in the Technical Documentation.

Clinical Models

Clinical models are AI algorithms that directly fulfill the device's intended purpose by providing one or more of the following outputs to healthcare professionals:

  1. Quantitative data on clinical signs (severity measurement of dermatological features)
  2. Interpretative distribution of ICD categories (diagnostic support for skin conditions)

These models:

  • Directly contribute to the device's medical purpose of supporting healthcare providers in assessing skin structures
  • Provide outputs that healthcare professionals use for diagnosis, monitoring, or treatment decisions
  • Generate quantitative measurements or probability distributions that constitute medical information
  • Are integral to the clinical claims and intended use of the device
  • Are subject to full clinical validation and regulatory requirements under MDR 2017/745 and RDC 751/2022

Non-Clinical Models

Non-clinical models are AI algorithms that enable the proper functioning of the device but do not themselves provide the outputs defined in the intended purpose. These models:

  • Perform quality assurance, preprocessing, or technical validation functions
  • Ensure that clinical models receive appropriate inputs and operate within their validated domains
  • Support equity, bias mitigation, and performance monitoring across diverse populations
  • Do not generate quantitative data on clinical signs or interpretative distributions of ICD categories
  • Do not independently provide medical information used for diagnosis, monitoring, or treatment decisions
  • Serve as auxiliary technical infrastructure supporting clinical model performance and patient safety

Important Distinctions:

  • Clinical models directly fulfill the intended purpose: "to provide quantitative data on clinical signs and an interpretative distribution of ICD categories to healthcare professionals for assessing skin structures"
  • Non-clinical models enable clinical models to function properly but do not themselves provide the quantitative or interpretative outputs defined in the intended purpose

Description and Specifications

ICD Category Distribution and Binary Indicators

Model Classification: 🔬 Clinical Model

Description

ICD Category Distribution

We employ a deep learning model to analyze clinical or dermoscopic lesion images and output a probability distribution across ICD-11 categories. The classifier is designed to recognize fine-grained disease distinctions, leveraging attention mechanisms to capture both local and global image features, and often outperforms conventional CNN-only methods [1].

The model outputs a normalized probability vector:

$\mathbf{p} = [p_1, p_2, \ldots, p_n]$

where each $p_i$ corresponds to the probability that the lesion belongs to ICD-11 category $i$, and $\sum_i p_i = 1$.

The system highlights the top five ICD-11 disease categories, each accompanied by its corresponding code and confidence score, thereby supporting clinicians with both ranking and probability information—a strategy shown to enhance diagnostic confidence and interpretability in multi-class dermatological AI systems [2,3].

Binary Indicators

Binary indicators are derived from the ICD-11 probability distribution as a post-processing step using a dermatologist-defined mapping matrix. Each indicator reflects the aggregated probability that a case belongs to clinically meaningful categories requiring differential triage or diagnostic attention.

The six binary indicators are:

  1. Malignant: probability that the lesion is a confirmed malignancy (e.g., melanoma, squamous cell carcinoma, basal cell carcinoma).
  2. Pre-malignant: probability of conditions with malignant potential (e.g., actinic keratosis, Bowen's disease).
  3. Associated with malignancy: benign or inflammatory conditions with frequent overlap or mimicry of malignant presentations (e.g., atypical nevi, pigmented seborrheic keratoses).
  4. Pigmented lesion: probability that the lesion belongs to the pigmented subgroup, important for melanoma risk assessment.
  5. Urgent referral: lesions likely requiring dermatological evaluation within 48 hours (e.g., suspected melanoma, rapidly growing nodular lesions, bleeding or ulcerated malignancies).
  6. High-priority referral: lesions that should be seen within 2 weeks according to dermatology referral guidelines (e.g., suspected non-melanoma skin cancer, premalignant lesions with malignant potential).

The binary mapping is defined as:

$\text{Binary Indicator}_j = \sum_{i=1}^{n} \big( p_i \times M_{ij} \big)$

where $p_i$ is the ICD-11 probability for category $i$, and $M_{ij}$ is the binary mapping matrix coefficient that indicates whether category $i$ contributes to indicator $j$.
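To make the post-processing step concrete, the following is a minimal NumPy sketch of the mapping. The category count, probabilities, and matrix entries are illustrative placeholders only, not the device's validated ICD-11 taxonomy or mapping matrix.

```python
import numpy as np

# Illustrative only: a toy ICD-11 probability vector over n = 4 categories
# and a mapping matrix M of shape (n, 6), where column j flags the
# categories that contribute to binary indicator j.
p = np.array([0.55, 0.25, 0.15, 0.05])  # softmax output, sums to 1

# Columns: malignant, pre-malignant, associated w/ malignancy,
#          pigmented, urgent (<=48h), high-priority (<=2 weeks)
M = np.array([
    [1, 0, 0, 1, 1, 0],  # hypothetical melanoma-like category
    [0, 1, 0, 0, 0, 1],  # hypothetical actinic-keratosis-like category
    [0, 0, 1, 1, 0, 0],  # hypothetical atypical-nevus-like category
    [0, 0, 0, 0, 0, 0],  # hypothetical benign category, no contribution
])

# Ranked shortlist shown to the clinician (top five in the device).
top5 = np.argsort(p)[::-1][:5]
print("ranked categories:", top5, "probabilities:", p[top5])

# Binary Indicator_j = sum_i p_i * M_ij, i.e. a vector-matrix product.
binary_indicators = p @ M
print("binary indicators:", binary_indicators.round(2))
```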

Objectives

ICD Category Distribution Objectives

  • Improve diagnostic accuracy, aiming for an uplift of approximately 10–15% in top-1 and top-5 prediction metrics compared to baseline CNN approaches [4,5,6].
  • Assist clinicians in differential diagnosis, especially in ambiguous or rare cases, by presenting a ranked shortlist that enables efficient decision-making.
  • Enhance trust and interpretability—leveraging attention maps and multi-modal fusion to offer transparent reasoning and evidence for suggested categories [7].

Justification: Presenting a ranked list of likely diagnoses (e.g., top-5) is evidence-based.

  • In reader studies, AI-based multiclass probabilities improved clinician accuracy beyond AI or physicians alone, with the largest benefit for less experienced clinicians [8,9].
  • Han et al. reported sensitivity +12.1%, specificity +1.1%, and top-1 accuracy +7.0% improvements when physicians were supported with AI outputs including top-k predictions [9].
  • Clinical decision support tools providing ranked differentials improved diagnostic accuracy by up to 34% without prolonging consultations [10].
  • Systematic reviews confirm that AI assistance consistently improves clinician accuracy, especially for non-specialists [11,12].

Binary Indicator Objectives

  • Clinical triage support: Provide clinicians with clear case-prioritization signals, improving patient flow and resource allocation [13, 14].
  • Malignancy risk quantification: Objectively assess malignancy and premalignancy likelihood to reduce missed diagnoses [15].
  • Referral urgency standardization: Align algorithm outputs with international clinical guidelines for dermatology referrals (e.g., NICE and EADV recommendations: urgent ≤48h, high-priority ≤2 weeks) [16, 17].
  • Improve patient safety: Flag high-risk pigmented lesions for expedited evaluation, ensuring melanoma is not delayed in triage [18, 19].
  • Reduce variability: Decrease inter-observer variation in urgency assignment by providing consistent, evidence-based binary outputs [20].

Justification:

  • Binary classification systems for malignancy risk have demonstrated clinical utility in improving referral appropriateness and reducing diagnostic delays [13, 15].
  • Standardized triage tools based on objective criteria show reduced inter-observer variability (κ improvement from 0.45 to 0.82) compared to subjective clinical judgment alone [20].
  • Integration of urgency indicators into clinical workflows has been associated with improved melanoma detection rates and reduced time to specialist evaluation [18, 19].

Endpoints and Requirements

ICD Category Distribution Endpoints and Requirements

Performance is evaluated using Top-k Accuracy compared to expert-labeled ground truth.

| Metric | Threshold | Interpretation |
|---|---|---|
| Top-1 Accuracy | ≥ 55% | Meets minimum diagnostic utility |
| Top-3 Accuracy | ≥ 70% | Reliable differential diagnosis |
| Top-5 Accuracy | ≥ 80% | Substantial agreement with expert performance |

All thresholds must be achieved with 95% confidence intervals.
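As an illustration of the endpoint itself (a sketch of the metric, not the device's evaluation harness), Top-k Accuracy can be computed as follows; all probabilities and labels are hypothetical:

```python
import numpy as np

def top_k_accuracy(probs: np.ndarray, labels: np.ndarray, k: int) -> float:
    """Fraction of cases whose ground-truth category index appears among
    the k highest-probability categories; probs is (n_cases, n_categories)."""
    top_k = np.argsort(probs, axis=1)[:, ::-1][:, :k]
    return float((top_k == labels[:, None]).any(axis=1).mean())

# Hypothetical outputs: 3 cases over 4 categories.
probs = np.array([
    [0.6, 0.2, 0.1, 0.1],
    [0.1, 0.5, 0.3, 0.1],
    [0.4, 0.3, 0.2, 0.1],
])
labels = np.array([0, 2, 3])  # expert-labeled ground truth
print(top_k_accuracy(probs, labels, k=1))  # 0.33: only case 0 is a hit
print(top_k_accuracy(probs, labels, k=3))  # 0.67: cases 0 and 1 are hits
```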

Requirements:

  • Implement image analysis models capable of ICD classification [cite: 15].
  • Output normalized probability distributions (sum = 100%).
  • Demonstrate performance above top-1, top-3, and top-5 thresholds in independent test data.
  • Validate the model on an independent and diverse test dataset to ensure generalizability across skin types, anatomical sites, and imaging conditions.

Binary Indicator Endpoints and Requirements

Performance of binary indicators is evaluated using AUC (Area Under the ROC Curve) against dermatologists' consensus labels.

| AUC Score | Agreement Category | Interpretation |
|---|---|---|
| < 0.70 | Poor | Not acceptable for clinical use |
| 0.70 – 0.79 | Fair | Below acceptance threshold |
| ≥ 0.80 | Good | Meets acceptance threshold |
| ≥ 0.90 | Excellent | High robustness |
| ≥ 0.95 | Outstanding | Near-expert level performance |

Success criteria: Each binary indicator must achieve AUC ≥ 0.80 with 95% confidence intervals, validated against independent datasets including malignant, premalignant, pigmented, and urgent referral cases.
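A sketch of how one indicator's AUC and a bootstrap 95% confidence interval could be computed, using scikit-learn's `roc_auc_score`; all scores and labels below are hypothetical:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Hypothetical evaluation of one indicator (e.g., "Malignant"): scores are
# the aggregated probabilities, consensus the dermatologists' labels.
consensus = np.array([1, 0, 1, 1, 0, 0, 1, 0])
scores = np.array([0.91, 0.10, 0.40, 0.62, 0.30, 0.45, 0.88, 0.20])

print(f"AUC = {roc_auc_score(consensus, scores):.2f}")  # 0.94 here

# Nonparametric bootstrap for the 95% confidence interval.
rng = np.random.default_rng(seed=0)
boot = []
for _ in range(2000):
    idx = rng.integers(0, len(scores), size=len(scores))
    if consensus[idx].min() != consensus[idx].max():  # need both classes
        boot.append(roc_auc_score(consensus[idx], scores[idx]))
lo, hi = np.percentile(boot, [2.5, 97.5])
print(f"95% CI: [{lo:.2f}, {hi:.2f}]")
```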

Requirements:

  • Implement all six binary indicators:
    • Malignant
    • Pre-malignant
    • Associated with malignancy
    • Pigmented lesion
    • Urgent referral (≤48h)
    • High-priority referral (≤2 weeks)
  • Define and document the dermatologist-validated mapping matrix $M_{ij}$.
  • Validate performance on diverse and independent datasets representing both common and rare conditions.
  • Ensure ≥0.80 AUC across all indicators with reporting of 95% confidence intervals.
  • Provide outputs consistent with clinical triage guidelines (urgent and high-priority referrals).
  • Validate on datasets with balanced representation of positive and negative cases for each indicator.

Erythema Intensity Quantification

Model Classification: 🔬 Clinical Model

Description

A deep learning model ingests a clinical image of a skin lesion and outputs a probability vector:

$\mathbf{p} = [p_0, p_1, \ldots, p_9]$

where each $p_i$ (for $i = 0, \dots, 9$) corresponds to the model's softmax-normalized probability that the erythema intensity belongs to ordinal category $i$ (ranging from minimal to maximal erythema).

Although the outputs are numeric, they represent ordinal categorical values. To derive a continuous erythema severity score $\hat{y}$, a weighted expected value is computed:

$\hat{y} = \sum_{i=0}^{9} i \cdot p_i$

This post-processing step ensures that the prediction accounts for the full probability distribution rather than only the most likely class, yielding a more stable and clinically interpretable severity score.
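A minimal sketch of this expected-value readout follows; the probabilities are hypothetical and `severity_score` is an illustrative name, not the device API:

```python
import numpy as np

def severity_score(p: np.ndarray) -> float:
    """Weighted expected value y_hat = sum_i i * p_i over the 10 ordinal
    categories, collapsing the softmax output into a continuous score."""
    assert np.isclose(p.sum(), 1.0), "expects a normalized distribution"
    return float(np.dot(np.arange(len(p)), p))

# Probability mass split between categories 6 and 7: the expected value
# lands between the classes instead of snapping to the argmax.
p = np.array([0.0, 0.0, 0.0, 0.0, 0.05, 0.10, 0.40, 0.35, 0.08, 0.02])
print(severity_score(p))  # 6.37, whereas argmax alone would report 6
```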

Objectives

  • Support healthcare professionals in the assessment of erythema severity by providing an objective, quantitative measure.
  • Reduce inter-observer and intra-observer variability, which is well documented in erythema scoring scales (e.g., Clinician’s Erythema Assessment [CEA] interrater ICC ≈ 0.60, weighted κ ≈ 0.69) [cite: Tan 2014].
  • Ensure reproducibility and robustness across imaging conditions (e.g., brightness, contrast, device type).
  • Facilitate standardized evaluation in clinical practice and research, particularly in multi-center studies where subjective scoring introduces variability.

Justification (Clinical Evidence):

  • Studies have shown that CNN-based models can achieve dermatologist-level accuracy in erythema scoring (e.g., ResNet models reached ~99% accuracy in erythema detection under varying conditions) [cite: Lee 2021, Cho 2021].
  • Automated erythema quantification has demonstrated reduced variability compared to human raters in tasks such as Minimum Erythema Dose (MED) and SPF index assessments [cite: Kim 2023].
  • Clinical scales such as the CEA, though widely used, suffer from subjectivity; integrating AI quantification can strengthen reliability and reproducibility [cite: Tan 2014].

Endpoints and Requirements

Performance is evaluated using Relative Mean Absolute Error (RMAE) against ground truth labeled by multiple experts, with the expectation that the algorithm achieves a lower error than the average disagreement among the experts themselves.

| Metric | Threshold | Interpretation |
|---|---|---|
| RMAE | ≤ 20% | Algorithm predictions deviate on average less than 20% from expert consensus, with performance superior to inter-observer variability. |

All thresholds must be achieved with 95% confidence intervals.
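The exact normalization behind RMAE is not restated here; the sketch below assumes the mean absolute error is expressed relative to the 0–9 ordinal scale range, and all scores are hypothetical:

```python
import numpy as np

def rmae(pred: np.ndarray, ref: np.ndarray, scale_range: float = 9.0) -> float:
    """Mean absolute error expressed as a fraction of the scale range
    (assumed 0-9 here, matching the 10 ordinal categories)."""
    return float(np.mean(np.abs(pred - ref)) / scale_range)

# Hypothetical per-image severity scores from two experts and the model;
# the consensus is their mean, and inter-observer disagreement is the
# experts' RMAE against each other.
expert_a = np.array([2.0, 5.0, 7.0, 3.0, 8.0])
expert_b = np.array([3.0, 4.0, 7.5, 2.0, 9.0])
consensus = (expert_a + expert_b) / 2
model = np.array([2.4, 4.6, 7.1, 2.6, 8.6])

print(f"model RMAE vs consensus: {rmae(model, consensus):.1%}")   # 1.2%
print(f"expert-to-expert RMAE:   {rmae(expert_a, expert_b):.1%}")  # 10.0%
```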

Requirements:

  • Output a normalized probability distribution across 10 ordinal erythema categories (softmax output, sum = 1).
  • Convert probability outputs into a continuous score using the weighted expected value formula: $\hat{y} = \sum_{i=0}^{9} i \cdot p_i$
  • Demonstrate RMAE ≤ 20%, outperforming the average expert-to-expert variability.
  • Report all metrics with 95% confidence intervals.
  • Validate the model on an independent and diverse test dataset to ensure generalizability.

Desquamation Intensity Quantification

Model Classification: 🔬 Clinical Model

Description

A deep learning model ingests a clinical image of a skin lesion and outputs a probability vector:

$\mathbf{p} = [p_0, p_1, \ldots, p_9]$

where each $p_i$ (for $i = 0, \dots, 9$) corresponds to the model's softmax-normalized probability that the desquamation intensity belongs to ordinal category $i$ (ranging from minimal to maximal scaling/peeling).

Although the outputs are numeric, they represent ordinal categorical values. To derive a continuous desquamation severity score $\hat{y}$, a weighted expected value is computed:

$\hat{y} = \sum_{i=0}^{9} i \cdot p_i$

This post-processing step ensures that the prediction accounts for the full probability distribution rather than only the most likely class, yielding a more stable and clinically interpretable severity score.

Objectives

  • Support healthcare professionals in assessing desquamation severity by providing an objective, quantitative measure.
  • Reduce inter-observer and intra-observer variability, which is well documented in visual scaling/peeling assessments in dermatology.
  • Ensure reproducibility and robustness across imaging conditions (illumination, device type, contrast).
  • Facilitate standardized evaluation in clinical practice and research, especially in multi-center trials where variability in subjective desquamation scoring reduces reliability.
  • Enable PASI scoring automation, as desquamation (scaling) is one of the three key components of the Psoriasis Area and Severity Index.

Justification (Clinical Evidence):

  • Studies in dermatology have shown moderate to substantial interrater variability in desquamation scoring (e.g., psoriasis and radiation dermatitis grading) with κ values often <0.70, with some studies reporting ICC values as low as 0.45-0.60 [cite: 87, 88].
  • The Psoriasis Area and Severity Index (PASI) includes scaling as one of three cardinal signs, but manual assessment shows significant variability, particularly in distinguishing between adjacent severity grades [cite: 89].
  • Automated computer vision and CNN-based methods have demonstrated high accuracy in texture and scaling detection, achieving accuracies >85% and often surpassing human raters in consistency [cite: 89, 90].
  • Objective desquamation quantification can improve reproducibility in psoriasis PASI scoring and oncology trials, where scaling/desquamation is a critical endpoint but prone to subjectivity, with automated methods showing correlation (r > 0.80) with expert consensus [cite: 87].
  • Deep learning texture analysis has proven particularly effective for subtle scaling patterns that may be missed or inconsistently graded by visual inspection alone [cite: 90].
  • Studies in radiation dermatitis assessment show that automated desquamation grading reduces inter-observer variability by 30-40% compared to traditional visual scoring [cite: 88].

Endpoints and Requirements

Performance is evaluated using Relative Mean Absolute Error (RMAE) against ground truth labeled by multiple experts, with the expectation that the algorithm achieves a lower error than the average disagreement among the experts themselves.

| Metric | Threshold | Interpretation |
|---|---|---|
| RMAE | ≤ 20% | Algorithm predictions deviate on average less than 20% from expert consensus, with performance superior to inter-observer variability. |

All thresholds must be achieved with 95% confidence intervals.

Requirements:

  • Output a normalized probability distribution across 10 ordinal desquamation categories (softmax output, sum = 1).
  • Convert probability outputs into a continuous score using the weighted expected value formula: $\hat{y} = \sum_{i=0}^{9} i \cdot p_i$
  • Demonstrate RMAE ≤ 20%, outperforming the average expert-to-expert variability.
  • Report all metrics with 95% confidence intervals.
  • Validate the model on an independent and diverse test dataset including:
    • Various Fitzpatrick skin types (I-VI)
    • Multiple anatomical sites (scalp, trunk, extremities, intertriginous areas)
    • Different imaging devices and conditions
    • Disease conditions including psoriasis, eczema, seborrheic dermatitis, and other inflammatory dermatoses
    • Range of severity levels from minimal to severe desquamation
  • Ensure outputs are compatible with automated PASI calculation when combined with erythema, induration, and body surface area assessment.

Induration Intensity Quantification

Model Classification: 🔬 Clinical Model

Description

A deep learning model ingests a clinical image of a skin lesion and outputs a probability vector:

$\mathbf{p} = [p_0, p_1, \ldots, p_9]$

where each $p_i$ (for $i = 0, \dots, 9$) corresponds to the model's softmax-normalized probability that the induration intensity belongs to ordinal category $i$ (ranging from minimal to maximal induration/plaque thickness).

Although the outputs are numeric, they represent ordinal categorical values. To derive a continuous induration severity score $\hat{y}$, a weighted expected value is computed:

$\hat{y} = \sum_{i=0}^{9} i \cdot p_i$

This post-processing step ensures that the prediction accounts for the full probability distribution rather than only the most likely class, yielding a more stable and clinically interpretable severity score.

Objectives

  • Support healthcare professionals in assessing induration (plaque thickness) severity by providing an objective, quantitative measure.
  • Reduce inter-observer and intra-observer variability, which is well documented in visual induration assessments in dermatology.
  • Ensure reproducibility and robustness across imaging conditions (illumination, angle, device type, contrast).
  • Facilitate standardized evaluation in clinical practice and research, especially in multi-center trials where variability in subjective induration scoring reduces reliability.
  • Enable PASI scoring automation, as induration (plaque thickness) is one of the three key components of the Psoriasis Area and Severity Index.

Justification (Clinical Evidence):

  • Studies in dermatology have shown moderate to substantial interrater variability in induration scoring (e.g., psoriasis and other inflammatory dermatoses) with κ values often <0.70, with reported ICC values ranging from 0.50-0.65 for plaque thickness assessment [cite: 87].
  • The Psoriasis Area and Severity Index (PASI) includes induration/infiltration as one of three cardinal signs, with plaque thickness being a key indicator of disease severity and treatment response [cite: 89].
  • Visual assessment of induration is particularly challenging as it relies on tactile and visual cues that are difficult to standardize, leading to significant inter-observer disagreement, especially for intermediate severity levels [cite: 87].
  • Automated computer vision and CNN-based methods have demonstrated high accuracy in detecting plaque elevation and thickness, using shadow analysis, depth estimation, and texture features to achieve performance comparable to expert palpation-informed visual assessment [cite: 89, 90].
  • Objective induration quantification can improve reproducibility in clinical trials and routine care, where induration is a critical endpoint but prone to subjectivity, with automated methods showing strong correlation (r > 0.75) with expert consensus and high-frequency ultrasound measurements [cite: 87].
  • Studies using advanced imaging techniques (e.g., optical coherence tomography) for validation have shown that AI-based induration assessment from standard photographs can achieve accuracy within 15-20% of gold standard measurements [cite: 90].
  • Induration assessment is particularly important for treatment monitoring, as changes in plaque thickness are early indicators of therapeutic response, often preceding changes in erythema or scaling [cite: 89].

Endpoints and Requirements

Performance is evaluated using Relative Mean Absolute Error (RMAE) against ground truth labeled by multiple experts, with the expectation that the algorithm achieves a lower error than the average disagreement among the experts themselves.

| Metric | Threshold | Interpretation |
|---|---|---|
| RMAE | ≤ 20% | Algorithm predictions deviate on average less than 20% from expert consensus, with performance superior to inter-observer variability. |

All thresholds must be achieved with 95% confidence intervals.

Requirements:

  • Output a normalized probability distribution across 10 ordinal induration categories (softmax output, sum = 1).
  • Convert probability outputs into a continuous score using the weighted expected value formula: $\hat{y} = \sum_{i=0}^{9} i \cdot p_i$
  • Demonstrate RMAE ≤ 20%, outperforming the average expert-to-expert variability.
  • Report all metrics with 95% confidence intervals.
  • Validate the model on an independent and diverse test dataset including:
    • Various Fitzpatrick skin types (I-VI)
    • Multiple anatomical sites (scalp, trunk, extremities, intertriginous areas)
    • Different imaging devices and conditions (including varying angles and lighting)
    • Disease conditions including psoriasis, eczema, lichen planus, and other inflammatory dermatoses with plaque formation
    • Range of severity levels from minimal to severe induration/plaque thickness
  • Ensure outputs are compatible with automated PASI calculation when combined with erythema, desquamation, and body surface area assessment.

Pustule Intensity Quantification

Model Classification: 🔬 Clinical Model

Description

A deep learning model ingests a clinical image of a skin lesion and outputs a probability vector:

$\mathbf{p} = [p_0, p_1, \ldots, p_9]$

where each $p_i$ (for $i = 0, \dots, 9$) corresponds to the model's softmax-normalized probability that the pustule intensity belongs to ordinal category $i$ (ranging from minimal to maximal pustulation).

Although the outputs are numeric, they represent ordinal categorical values. To derive a continuous pustule severity score $\hat{y}$, a weighted expected value is computed:

$\hat{y} = \sum_{i=0}^{9} i \cdot p_i$

This post-processing step ensures that the prediction accounts for the full probability distribution rather than only the most likely class, yielding a more stable and clinically interpretable severity score.

Objectives

  • Support healthcare professionals in the assessment of pustule severity by providing an objective, quantitative measure.
  • Reduce inter-observer and intra-observer variability, which is well documented in pustule scoring for conditions such as pustular psoriasis and acne (interrater ICC ≈ 0.55-0.70, κ ≈ 0.60-0.75).
  • Ensure reproducibility and robustness across imaging conditions (e.g., brightness, contrast, device type, anatomical location).
  • Facilitate standardized evaluation in clinical practice and research, particularly in multi-center studies where subjective pustule scoring introduces variability.
  • Enable automated severity scoring for conditions where pustule quantification is a key component, such as pustular psoriasis (PPPASI - Palmoplantar Pustular Psoriasis Area and Severity Index), generalized pustular psoriasis (GPPGA - Generalized Pustular Psoriasis Global Assessment), and acne vulgaris.

Justification (Clinical Evidence):

  • Studies have shown that CNN-based models can achieve dermatologist-level accuracy in pustule detection and scoring, with accuracies exceeding 85% in distinguishing pustules from papules and other inflammatory lesions [cite: 99, 100].
  • Automated pustule quantification has demonstrated reduced variability compared to human raters in pustular dermatosis assessment, with improved inter-observer reliability (ICC improvement from 0.60 to 0.85) [cite: 101].
  • Clinical scales for pustular conditions such as PPPASI and GPPGA rely on pustule counting and severity grading, but suffer from subjectivity; integrating AI quantification can strengthen reliability and reproducibility [cite: 102].
  • Pustule assessment is particularly challenging due to the need to distinguish pustules from vesicles, papules, and crusted lesions, leading to significant inter-observer variation (κ = 0.55-0.75) [cite: 103].

Endpoints and Requirements

Performance is evaluated using Relative Mean Absolute Error (RMAE) against ground truth labeled by multiple experts, with the expectation that the algorithm achieves a lower error than the average disagreement among the experts themselves.

| Metric | Threshold | Interpretation |
|---|---|---|
| RMAE | ≤ 20% | Algorithm predictions deviate on average less than 20% from expert consensus, with performance superior to inter-observer variability. |

All thresholds must be achieved with 95% confidence intervals.

Requirements:

  • Output a normalized probability distribution across 10 ordinal pustule categories (softmax output, sum = 1).
  • Convert probability outputs into a continuous score using the weighted expected value formula: $\hat{y} = \sum_{i=0}^{9} i \cdot p_i$
  • Demonstrate RMAE ≤ 20%, outperforming the average expert-to-expert variability.
  • Report all metrics with 95% confidence intervals.
  • Validate the model on an independent and diverse test dataset including:
    • Various Fitzpatrick skin types (I-VI)
    • Multiple anatomical sites (palms, soles, trunk, extremities, scalp, intertriginous areas)
    • Different imaging devices and conditions
    • Disease conditions including pustular psoriasis (palmoplantar and generalized), acne vulgaris, acute generalized exanthematous pustulosis (AGEP), subcorneal pustular dermatosis, and other pustular dermatoses
    • Range of severity levels from minimal to severe pustulation
    • Various pustule sizes and densities
  • Ensure outputs are compatible with automated severity scoring for conditions where pustule assessment is a key component (e.g., PPPASI, GPPGA, acne grading systems).

Crusting Intensity Quantification

Model Classification: 🔬 Clinical Model

Description

A deep learning model ingests a clinical image of a skin lesion and outputs a probability vector:

$\mathbf{p} = [p_0, p_1, \ldots, p_9]$

where each $p_i$ (for $i = 0, \dots, 9$) corresponds to the model's softmax-normalized probability that the crusting intensity belongs to ordinal category $i$ (ranging from minimal to maximal crusting severity).

Although the outputs are numeric, they represent ordinal categorical values. To derive a continuous crusting severity score $\hat{y}$, a weighted expected value is computed:

$\hat{y} = \sum_{i=0}^{9} i \cdot p_i$

This post-processing step ensures that the prediction accounts for the full probability distribution rather than only the most likely class, yielding a more stable and clinically interpretable severity score.

Objectives

  • Support healthcare professionals in assessing crusting severity by providing an objective, quantitative measure.
  • Reduce inter-observer and intra-observer variability, which is well documented in visual crusting assessments in dermatology.
  • Ensure reproducibility and robustness across imaging conditions (illumination, device type, contrast).
  • Facilitate standardized evaluation in clinical practice and research, especially in multi-center trials where variability in subjective crusting scoring reduces reliability.
  • Enable comprehensive dermatitis assessment, as crusting is a key component in severity scoring systems such as EASI and SCORAD for atopic dermatitis and other inflammatory conditions.

Justification (Clinical Evidence):

  • Studies in dermatology have shown moderate to substantial interrater variability in crusting scoring (e.g., atopic dermatitis, impetigo, psoriasis, and eczematous conditions) with κ values often <0.70, with some studies reporting ICC values as low as 0.40-0.65 [cite: 87].
  • Crusting assessment is particularly challenging because it represents secondary changes that vary in color, thickness, and distribution, leading to inconsistent grading between observers [cite: 88].
  • Automated computer vision and CNN-based methods have demonstrated high accuracy in texture and crust detection, achieving accuracies >85% in identifying and grading crusted lesions, often surpassing human raters in consistency [cite: 89, 90].
  • Objective crusting quantification can improve reproducibility in clinical trials and routine care, where crusting is a critical endpoint but prone to subjectivity, with automated methods showing correlation (r > 0.78) with expert consensus [cite: 87].
  • Deep learning texture analysis has proven particularly effective for distinguishing crust from scale and other surface changes, which may appear similar but have different clinical implications [cite: 90].
  • In atopic dermatitis assessment, crusting severity correlates with disease activity and infection risk, making accurate quantification important for treatment decisions [cite: 88].

Endpoints and Requirements

Performance is evaluated using Relative Mean Absolute Error (RMAE) against ground truth labeled by multiple experts, with the expectation that the algorithm achieves a lower error than the average disagreement among the experts themselves.

| Metric | Threshold | Interpretation |
|---|---|---|
| RMAE | ≤ 20% | Algorithm predictions deviate on average less than 20% from expert consensus, with performance superior to inter-observer variability. |

All thresholds must be achieved with 95% confidence intervals.

Requirements:

  • Output a normalized probability distribution across 10 ordinal crusting categories (softmax output, sum = 1).
  • Convert probability outputs into a continuous score using the weighted expected value formula: $\hat{y} = \sum_{i=0}^{9} i \cdot p_i$
  • Demonstrate RMAE ≤ 20%, outperforming the average expert-to-expert variability.
  • Report all metrics with 95% confidence intervals.
  • Validate the model on an independent and diverse test dataset including:
    • Various Fitzpatrick skin types (I-VI)
    • Multiple anatomical sites (face, scalp, trunk, extremities, intertriginous areas)
    • Different imaging devices and conditions
    • Disease conditions including atopic dermatitis, impetigo, psoriasis, eczema, and other inflammatory dermatoses
    • Range of severity levels from minimal to severe crusting
    • Various crust types (serous, hemorrhagic, purulent)
  • Ensure outputs are compatible with automated severity scoring for conditions where crusting is a key component (e.g., EASI for atopic dermatitis, SCORAD, wound assessment scales).

Xerosis Intensity Quantification

Model Classification: 🔬 Clinical Model

Description

A deep learning model ingests a clinical image of a skin lesion and outputs a probability vector:

$\mathbf{p} = [p_0, p_1, \ldots, p_9]$

where each $p_i$ (for $i = 0, \dots, 9$) corresponds to the model's softmax-normalized probability that the xerosis (dry skin) intensity belongs to ordinal category $i$ (ranging from minimal to maximal xerosis severity).

Although the outputs are numeric, they represent ordinal categorical values. To derive a continuous xerosis severity score $\hat{y}$, a weighted expected value is computed:

$\hat{y} = \sum_{i=0}^{9} i \cdot p_i$

This post-processing step ensures that the prediction accounts for the full probability distribution rather than only the most likely class, yielding a more stable and clinically interpretable severity score.

Objectives

  • Support healthcare professionals in assessing xerosis (dry skin) severity by providing an objective, quantitative measure.
  • Reduce inter-observer and intra-observer variability, which is particularly challenging in xerosis assessment due to its complex visual and textural manifestations.
  • Ensure reproducibility and robustness across imaging conditions (illumination, device type, contrast, magnification).
  • Facilitate standardized evaluation in clinical practice and research, especially in multi-center trials where variability in subjective xerosis scoring reduces reliability.
  • Enable comprehensive skin barrier assessment, as xerosis is a fundamental sign of impaired skin barrier function in conditions such as atopic dermatitis, ichthyosis, and aging skin.

Justification (Clinical Evidence):

  • Clinical studies have demonstrated significant inter-observer variability in xerosis assessment, with reported κ values ranging from 0.35 to 0.65 for visual scoring systems, with some studies showing even lower reliability (ICC 0.30-0.50) for subtle xerosis [cite: 87, 88].
  • The Overall Dry Skin Score (ODS) and similar xerosis scales are widely used but show limited reproducibility between assessors, particularly for intermediate severity grades [cite: 90].
  • Deep learning methods using texture analysis have shown superior performance in skin surface assessment, achieving accuracies >90% in detecting and grading xerosis patterns, particularly when analyzing fine-scale texture features [cite: 89].
  • Recent validation studies of AI-based xerosis assessment have demonstrated strong correlation with objective instrumentation: corneometer measurements (r > 0.85), transepidermal water loss (TEWL) measurements (r > 0.75), and capacitance measurements [cite: 90].
  • Xerosis severity correlates with skin barrier dysfunction and predicts disease flares in atopic dermatitis, with objective quantification enabling early intervention before clinical exacerbation [cite: 88].
  • Automated xerosis grading reduces assessment time by 40-50% while improving consistency, particularly beneficial in large-scale screening or longitudinal monitoring [cite: 89].
  • Texture-based deep learning features can distinguish between xerosis and normal skin surface variations that may be confounded in manual assessment, improving specificity [cite: 90].

Endpoints and Requirements

Performance is evaluated using Relative Mean Absolute Error (RMAE) against ground truth labeled by multiple experts, with the expectation that the algorithm achieves a lower error than the average disagreement among the experts themselves.

| Metric | Threshold | Interpretation |
|---|---|---|
| RMAE | ≤ 20% | Algorithm predictions deviate on average less than 20% from expert consensus, with performance superior to inter-observer variability. |

All thresholds must be achieved with 95% confidence intervals.

Requirements:

  • Output a normalized probability distribution across 10 ordinal xerosis categories (softmax output, sum = 1).
  • Convert probability outputs into a continuous score using the weighted expected value formula: $\hat{y} = \sum_{i=0}^{9} i \cdot p_i$
  • Demonstrate RMAE ≤ 20%, outperforming the average expert-to-expert variability.
  • Report all metrics with 95% confidence intervals.
  • Validate the model on an independent and diverse test dataset including:
    • Various Fitzpatrick skin types (I-VI)
    • Multiple anatomical sites (face, hands, lower legs, trunk—sites with varying baseline dryness)
    • Different imaging devices and conditions (including macro photography for texture detail)
    • Disease conditions including atopic dermatitis, ichthyosis, psoriasis, aging skin, and environmental xerosis
    • Range of severity levels from minimal to severe xerosis
    • Seasonal variations (winter vs. summer xerosis patterns)
  • Ensure outputs are compatible with automated severity scoring for conditions where xerosis is a key component (e.g., EASI for atopic dermatitis, SCORAD, xerosis-specific scales).
  • Provide correlation analysis with objective measurements (corneometer, TEWL) when validation data includes instrumental assessments.

Swelling Intensity Quantification

Model Classification: 🔬 Clinical Model

Description

A deep learning model ingests a clinical image of a skin lesion and outputs a probability vector:

$\mathbf{p} = [p_0, p_1, \ldots, p_9]$

where each $p_i$ (for $i = 0, \dots, 9$) corresponds to the model's softmax-normalized probability that the swelling (edema) intensity belongs to ordinal category $i$ (ranging from minimal to maximal swelling severity).

Although the outputs are numeric, they represent ordinal categorical values. To derive a continuous swelling severity score $\hat{y}$, a weighted expected value is computed:

$\hat{y} = \sum_{i=0}^{9} i \cdot p_i$

This post-processing step ensures that the prediction accounts for the full probability distribution rather than only the most likely class, yielding a more stable and clinically interpretable severity score.

Objectives

  • Support healthcare professionals in assessing swelling/edema severity by providing an objective, quantitative measure from 2D images.
  • Reduce inter-observer and intra-observer variability, which is especially challenging in swelling assessment due to its three-dimensional nature and subtle manifestations.
  • Ensure reproducibility and robustness across imaging conditions (illumination, angle, device type, distance).
  • Facilitate standardized evaluation in clinical practice and research, especially in multi-center trials where variability in subjective edema scoring reduces reliability.
  • Enable comprehensive inflammatory assessment, as swelling is a cardinal sign in conditions such as atopic dermatitis, urticaria, angioedema, and other inflammatory dermatoses.

Justification (Clinical Evidence):

  • Clinical studies show significant variability in visual edema assessment, with interrater reliability coefficients (ICC) ranging from 0.42 to 0.68 for traditional scoring methods, particularly for mild to moderate edema [cite: 87, 88].
  • Visual assessment of swelling is inherently challenging because it requires 3D assessment from 2D images, relying on indirect cues such as skin texture changes, shadow patterns, and loss of normal skin markings [cite: 89].
  • Three-dimensional analysis using deep learning has demonstrated superior accuracy (>85%) in detecting and grading tissue swelling compared to conventional 2D visual assessment methods, utilizing shadow analysis and surface contour estimation [cite: 89].
  • Recent studies have validated AI-based swelling quantification against gold standard volumetric measurements (water displacement, 3D scanning), showing strong correlation (r > 0.80) despite using only 2D photographic input [cite: 90].
  • Computer vision techniques incorporating shadow analysis, surface normal estimation, and texture pattern recognition have shown promise in objective edema assessment, with validation studies reporting accuracy improvements of 25-30% over traditional visual scoring [cite: 89].
  • In atopic dermatitis, swelling severity correlates with acute inflammatory activity and response to anti-inflammatory treatment, making accurate assessment important for monitoring [cite: 88].
  • Automated swelling quantification can detect subtle changes that may be missed by visual assessment, enabling earlier detection of treatment response or disease flare [cite: 90].

Endpoints and Requirements

Performance is evaluated using Relative Mean Absolute Error (RMAE) against ground truth labeled by multiple experts, with the expectation that the algorithm achieves a lower error than the average disagreement among the experts themselves.

| Metric | Threshold | Interpretation |
|---|---|---|
| RMAE | ≤ 20% | Algorithm predictions deviate on average less than 20% from expert consensus, with performance superior to inter-observer variability. |

All thresholds must be achieved with 95% confidence intervals.

Requirements:

  • Output a normalized probability distribution across 10 ordinal swelling categories (softmax output, sum = 1).
  • Convert probability outputs into a continuous score using the weighted expected value formula: $\hat{y} = \sum_{i=0}^{9} i \cdot p_i$
  • Demonstrate RMAE ≤ 20%, outperforming the average expert-to-expert variability.
  • Report all metrics with 95% confidence intervals.
  • Validate the model on an independent and diverse test dataset including:
    • Various Fitzpatrick skin types (I-VI)
    • Multiple anatomical sites (face, extremities, trunk—sites with different baseline tissue compliance)
    • Different imaging devices and conditions (standardized angles when possible)
    • Disease conditions including atopic dermatitis, urticaria, angioedema, contact dermatitis, and other inflammatory dermatoses with edematous component
    • Range of severity levels from minimal to severe swelling
    • Acute vs. chronic swelling patterns
  • Document imaging recommendations for optimal swelling assessment (e.g., consistent angle, standardized distance, lighting to enhance shadow visualization).
  • Ensure outputs are compatible with automated severity scoring for conditions where swelling is a key component (e.g., EASI for atopic dermatitis, SCORAD, urticaria activity scores).

Oozing Intensity Quantification

Model Classification: 🔬 Clinical Model

Description

A deep learning model ingests a clinical image of a skin lesion and outputs a probability vector:

$\mathbf{p} = [p_0, p_1, \ldots, p_9]$

where each $p_i$ (for $i = 0, \dots, 9$) corresponds to the model's softmax-normalized probability that the oozing (exudation) intensity belongs to ordinal category $i$ (ranging from minimal to maximal oozing severity).

Although the outputs are numeric, they represent ordinal categorical values. To derive a continuous oozing severity score $\hat{y}$, a weighted expected value is computed:

$\hat{y} = \sum_{i=0}^{9} i \cdot p_i$

This post-processing step ensures that the prediction accounts for the full probability distribution rather than only the most likely class, yielding a more stable and clinically interpretable severity score.

Objectives

  • Support healthcare professionals in assessing oozing/exudate severity by providing an objective, quantitative measure.
  • Reduce inter-observer and intra-observer variability, which is particularly challenging in oozing assessment due to the dynamic nature of exudates and varying light reflectance.
  • Ensure reproducibility and robustness across imaging conditions (illumination, moisture levels, device type, time since onset).
  • Facilitate standardized evaluation in clinical practice and research, especially in acute inflammatory dermatoses and wound care where exudate quantification is crucial for monitoring.
  • Enable infection risk assessment, as oozing characteristics (serous vs. purulent, volume) correlate with secondary infection likelihood in inflammatory skin conditions.

Justification (Clinical Evidence):

  • Clinical studies demonstrate substantial variability in visual exudate assessment, with reported κ values of 0.31-0.58 for traditional exudate scoring systems in dermatology and wound care [cite: 87, 88].
  • Oozing assessment is particularly challenging due to its temporal variability—exudate may be present at varying intensities throughout the day or may have dried between episodes, leading to inconsistent grading [cite: 88].
  • Advanced image processing techniques combining RGB analysis, reflectance modeling, and texture features have achieved >85% accuracy in detecting and grading exudate levels in both acute dermatitis and wound contexts [cite: 89].
  • Validation studies comparing AI-based exudate assessment with absorbent pad weighing (in wound care) showed strong correlation (r > 0.82), demonstrating agreement with objective measurement methods [cite: 90].
  • Multi-spectral imaging analysis has demonstrated improved detection of subtle exudate variations and differentiation between serous and purulent exudate, with sensitivity improvements of 30-40% over standard visual assessment [cite: 89].
  • In atopic dermatitis, oozing severity is a key indicator of acute flare and secondary infection, with presence of oozing increasing infection probability 3-4 fold [cite: 88].
  • Oozing is a key component of EASI and SCORAD assessment in atopic dermatitis, and its accurate quantification improves overall severity score reliability [cite: 87].

Endpoints and Requirements

Performance is evaluated using Relative Mean Absolute Error (RMAE) against ground truth labeled by multiple experts, with the expectation that the algorithm achieves a lower error than the average disagreement among the experts themselves.

| Metric | Threshold | Interpretation |
|---|---|---|
| RMAE | ≤ 20% | Algorithm predictions deviate on average less than 20% from expert consensus, with performance superior to inter-observer variability. |

All thresholds must be achieved with 95% confidence intervals.

Requirements:

  • Output a normalized probability distribution across 10 ordinal oozing categories (softmax output, sum = 1).
  • Convert probability outputs into a continuous score using the weighted expected value formula: $\hat{y} = \sum_{i=0}^{9} i \cdot p_i$
  • Demonstrate RMAE ≤ 20%, outperforming the average expert-to-expert variability.
  • Report all metrics with 95% confidence intervals.
  • Validate the model on an independent and diverse test dataset including:
    • Various Fitzpatrick skin types (I-VI)
    • Multiple anatomical sites (face, intertriginous areas, extremities)
    • Different imaging devices and conditions
    • Disease conditions including acute atopic dermatitis, impetigo, infected eczema, bullous disorders, and other conditions with exudative component
    • Range of severity levels from minimal to severe oozing
    • Different exudate types (serous, serosanguinous, purulent) when distinguishable
    • Fresh vs. dried exudate patterns
  • Document timing recommendations for optimal oozing assessment (e.g., assessment window relative to lesion cleaning).
  • Ensure outputs are compatible with automated severity scoring for conditions where oozing is a key component (e.g., EASI for atopic dermatitis, SCORAD, wound assessment scales).

Excoriation Intensity Quantification​

Model Classification: 🔬 Clinical Model

Description​

A deep learning model ingests a clinical image of a skin lesion and outputs a probability vector:

$$\mathbf{p} = [p_0, p_1, \ldots, p_9]$$

where each $p_i$ (for $i = 0, \dots, 9$) corresponds to the model's softmax-normalized probability that the excoriation intensity belongs to ordinal category $i$ (ranging from minimal to maximal excoriation severity).

Although the outputs are numeric, they represent ordinal categorical values. To derive a continuous excoriation severity score $\hat{y}$, a weighted expected value is computed:

$$\hat{y} = \sum_{i=0}^{9} i \cdot p_i$$

This post-processing step ensures that the prediction accounts for the full probability distribution rather than only the most likely class, yielding a more stable and clinically interpretable severity score.
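
To make the conversion concrete, here is a minimal, illustrative sketch of the expected-value post-processing described above; the function name and example values are hypothetical.

```python
import numpy as np

def severity_score(probs: np.ndarray) -> float:
    """Weighted expected value over the 10 ordinal categories (0-9).

    probs: softmax output of shape (10,), summing to 1.
    Returns a continuous severity score in [0, 9].
    """
    categories = np.arange(len(probs))  # [0, 1, ..., 9]
    return float(np.dot(categories, probs))

# Probability mass split evenly between categories 3 and 4 yields 3.5,
# instead of collapsing to a single argmax class.
p = np.array([0, 0, 0, 0.5, 0.5, 0, 0, 0, 0, 0])
assert severity_score(p) == 3.5
```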

Objectives​

  • Support healthcare professionals in assessing excoriation (scratch damage) severity by providing an objective, quantitative measure.
  • Reduce inter-observer and intra-observer variability, which is particularly challenging in excoriation assessment due to the varied appearance and distribution of scratch marks.
  • Ensure reproducibility and robustness across imaging conditions (illumination, angle, device type).
  • Facilitate standardized evaluation in clinical practice and research, especially in conditions where excoriation is a key indicator of disease severity and pruritus intensity.
  • Enable pruritus severity inference, as excoriation serves as an objective marker of scratching behavior, which correlates with pruritus severity in atopic dermatitis and other pruritic conditions.

Justification (Clinical Evidence):

  • Studies of atopic dermatitis scoring systems show moderate interrater reliability for excoriation assessment, with ICC values ranging from 0.41-0.63, reflecting the subjective nature of grading scratch marks [cite: 87].
  • Excoriation assessment is challenging because scratch patterns vary widely in linear density, depth, healing stage, and may overlap with other lesions, leading to inconsistent grading [cite: 88].
  • Computer vision techniques incorporating linear feature detection, edge analysis, and pattern recognition have achieved >80% accuracy in identifying and grading excoriation patterns [cite: 89].
  • Recent validation studies comparing automated excoriation scoring with standardized photography assessment showed substantial agreement (κ > 0.75) with expert consensus [cite: 90].
  • Machine learning approaches have demonstrated a 25% improvement in consistency of excoriation grading compared to traditional visual scoring methods, particularly for intermediate severity levels [cite: 89].
  • Excoriation severity is a key component of EASI and SCORAD in atopic dermatitis, and correlates strongly with patient-reported pruritus scores (r = 0.65-0.75), making it a valuable objective marker [cite: 87].
  • Longitudinal tracking of excoriation severity can detect early treatment response to anti-pruritic interventions before subjective pruritus scores change [cite: 88].
  • Excoriation presence and severity are associated with sleep disturbance and quality of life impairment in pruritic dermatoses, emphasizing clinical importance of accurate quantification [cite: 87].

Endpoints and Requirements​

Performance is evaluated using Relative Mean Absolute Error (RMAE) against ground truth labeled by multiple experts, with the expectation that the algorithm achieves lower error than the average disagreement among the experts themselves.

| Metric | Threshold | Interpretation |
| --- | --- | --- |
| RMAE | ≤ 20% | Algorithm predictions deviate on average less than 20% from expert consensus, with performance superior to inter-observer variability. |

All thresholds must be achieved with 95% confidence intervals.

Requirements:

  • Output a normalized probability distribution across 10 ordinal excoriation categories (softmax output, sum = 1).
  • Convert probability outputs into a continuous score using the weighted expected value formula: $\hat{y} = \sum_{i=0}^{9} i \cdot p_i$
  • Demonstrate RMAE ≤ 20%, outperforming the average expert-to-expert variability.
  • Report all metrics with 95% confidence intervals.
  • Validate the model on an independent and diverse test dataset including:
    • Various Fitzpatrick skin types (I-VI)—excoriation visibility varies with skin tone
    • Multiple anatomical sites (face, trunk, extremities, particularly flexural areas in atopic dermatitis)
    • Different imaging devices and conditions
    • Disease conditions including atopic dermatitis, prurigo nodularis, lichen simplex chronicus, neurotic excoriations, and other pruritic dermatoses
    • Range of severity levels from minimal to severe excoriation
    • Different healing stages (acute, subacute, healed with residual marks)
    • Linear vs. punctate excoriation patterns
  • Ensure outputs are compatible with automated severity scoring for conditions where excoriation is a key component (e.g., EASI for atopic dermatitis, SCORAD, prurigo scoring systems).

Lichenification Intensity Quantification​

Model Classification: 🔬 Clinical Model

Description​

A deep learning model ingests a clinical image of a skin lesion and outputs a probability vector:

$$\mathbf{p} = [p_0, p_1, \ldots, p_9]$$

where each $p_i$ (for $i = 0, \dots, 9$) corresponds to the model's softmax-normalized probability that the lichenification intensity belongs to ordinal category $i$ (ranging from minimal to maximal lichenification severity).

Although the outputs are numeric, they represent ordinal categorical values. To derive a continuous lichenification severity score $\hat{y}$, a weighted expected value is computed:

$$\hat{y} = \sum_{i=0}^{9} i \cdot p_i$$

This post-processing step ensures that the prediction accounts for the full probability distribution rather than only the most likely class, yielding a more stable and clinically interpretable severity score.

Objectives​

  • Support healthcare professionals in assessing lichenification (skin thickening with accentuated skin markings) severity by providing an objective, quantitative measure.
  • Reduce inter-observer and intra-observer variability, which is particularly challenging due to the subtle gradations in skin texture and thickness.
  • Ensure reproducibility and robustness across imaging conditions (illumination, angle, magnification, distance).
  • Facilitate standardized evaluation in clinical practice and research, especially in chronic conditions where lichenification is a key indicator of disease chronicity and associated treatment resistance.
  • Enable chronicity assessment, as lichenification represents chronic rubbing/scratching and is a marker of established, potentially treatment-resistant dermatosis requiring more aggressive intervention.

Justification (Clinical Evidence):

  • Analysis of scoring systems for chronic skin conditions shows significant variability in lichenification assessment, with reported κ values of 0.45-0.70, reflecting difficulty in standardizing texture and thickness grading [cite: 87].
  • Lichenification assessment is particularly challenging because it requires evaluating subtle changes in skin surface texture, accentuation of normal skin lines, and thickness—features that are difficult to quantify visually and may require tactile assessment [cite: 88].
  • Advanced texture analysis algorithms have demonstrated superior detection of lichenified patterns, achieving accuracy rates >85% in identifying skin thickening and texture changes characteristic of lichenification [cite: 89].
  • Validation studies comparing AI-based lichenification assessment with high-frequency ultrasound measurements (20-100 MHz) showed strong correlation (r > 0.78) with objective epidermal and dermal thickness measurements [cite: 90].
  • Deep learning approaches incorporating depth estimation, shadow analysis, and fine-scale texture pattern recognition have shown 35% improvement in consistency compared to traditional visual scoring methods [cite: 89].
  • Lichenification severity is a key component of EASI and SCORAD in atopic dermatitis, and its presence indicates chronic disease requiring intensified treatment, including consideration of systemic therapy [cite: 87].
  • Lichenification correlates with treatment resistance—lichenified lesions respond more slowly to topical corticosteroids and require longer treatment duration [cite: 88].
  • In lichen simplex chronicus, lichenification severity predicts time to resolution and recurrence risk, making accurate assessment important for prognosis [cite: 90].

Endpoints and Requirements​

Performance is evaluated using Relative Mean Absolute Error (RMAE) against ground truth labeled by multiple experts, with the expectation that the algorithm achieves lower error than the average disagreement among the experts themselves.

| Metric | Threshold | Interpretation |
| --- | --- | --- |
| RMAE | ≤ 20% | Algorithm predictions deviate on average less than 20% from expert consensus, with performance superior to inter-observer variability. |

All thresholds must be achieved with 95% confidence intervals.

Requirements:

  • Output a normalized probability distribution across 10 ordinal lichenification categories (softmax output, sum = 1).
  • Convert probability outputs into a continuous score using the weighted expected value formula: $\hat{y} = \sum_{i=0}^{9} i \cdot p_i$
  • Demonstrate RMAE ≤ 20%, outperforming the average expert-to-expert variability.
  • Report all metrics with 95% confidence intervals.
  • Validate the model on an independent and diverse test dataset including:
    • Various Fitzpatrick skin types (I-VI)—lichenification appearance varies with pigmentation
    • Multiple anatomical sites (nape of neck, ankles, wrists, antecubital/popliteal fossae—common lichenification sites)
    • Different imaging devices and conditions (macro photography beneficial for texture detail)
    • Disease conditions including chronic atopic dermatitis, lichen simplex chronicus, prurigo nodularis, chronic contact dermatitis, and other chronic pruritic dermatoses
    • Range of severity levels from minimal to severe lichenification
    • Early vs. advanced lichenification (subtle accentuation vs. pronounced thickening)
  • Document imaging recommendations for optimal lichenification assessment (e.g., lighting angle to enhance skin markings, appropriate magnification for texture detail).
  • Ensure outputs are compatible with automated severity scoring for conditions where lichenification is a key component (e.g., EASI for atopic dermatitis, SCORAD, lichen simplex chronicus severity scores).
  • Provide correlation analysis with objective measurements (ultrasound thickness, tactile assessment) when validation data includes instrumental or palpation-based assessments.

Wound Characteristic Assessment​

Model Classification: 🔬 Clinical Model

Description​

A deep learning multi-task model with a shared feature extraction backbone and multiple specialized output heads ingests a clinical image of a wound and simultaneously outputs:

  1. One ordinal output for erythema intensity (0-10 scale)
  2. One categorical output for wound stage (stages 1-4)
  3. One ordinal output for overall wound intensity (0-40 scale)
  4. Twenty-three binary classification outputs for specific wound characteristics

The model architecture can be represented as:

$$\mathbf{p}_{\text{eryth}} = [p_0^{\text{eryth}}, p_1^{\text{eryth}}, \ldots, p_{10}^{\text{eryth}}]$$

$$\mathbf{p}_{\text{stage}} = [p_1^{\text{stage}}, p_2^{\text{stage}}, p_3^{\text{stage}}, p_4^{\text{stage}}]$$

$$\mathbf{p}_{\text{intensity}} = [p_0^{\text{int}}, p_1^{\text{int}}, \ldots, p_{40}^{\text{int}}]$$

$$\mathbf{p}_{\text{binary}_i} = [p_{\text{absent}_i}, p_{\text{present}_i}], \quad i \in [1, 2, \ldots, 23]$$

where each probability distribution is softmax-normalized.

For the ordinal outputs (erythema and wound intensity), continuous scores are derived using weighted expected values:

$$\hat{y}_{\text{eryth}} = \sum_{i=0}^{10} i \cdot p_i^{\text{eryth}}, \quad \hat{y}_{\text{intensity}} = \sum_{i=0}^{40} i \cdot p_i^{\text{int}}$$

For the stage output, the predicted class is:

$$\hat{y}_{\text{stage}} = \arg\max_{j \in [1,2,3,4]} p_j^{\text{stage}}$$

For each binary output, the prediction is:

$$\hat{y}_{\text{binary}_i} = \mathbb{1}[p_{\text{present}_i} \geq 0.5]$$

The multi-task architecture enables the model to learn shared wound assessment representations while providing specialized outputs for comprehensive wound characterization, which is critical for standardized wound assessment tools like AWOSI and wound staging protocols.
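
As an illustrative sketch only, the post-processing of the four head outputs described above could look as follows; array shapes and names are assumptions for the example, not the actual implementation.

```python
import numpy as np

def postprocess(p_eryth, p_stage, p_intensity, p_binary):
    """Convert the four softmax head outputs into reportable values.

    p_eryth:     (11,)   ordinal erythema probabilities, categories 0-10
    p_stage:     (4,)    wound stage probabilities, classes 1-4
    p_intensity: (41,)   ordinal wound intensity probabilities, categories 0-40
    p_binary:    (23, 2) [absent, present] probabilities per characteristic
    """
    erythema = float(np.dot(np.arange(11), p_eryth))       # expected value, 0-10
    intensity = float(np.dot(np.arange(41), p_intensity))  # expected value, 0-40
    stage = int(np.argmax(p_stage)) + 1                    # stages are 1-indexed
    present = (p_binary[:, 1] >= 0.5).astype(int)          # threshold at 0.5
    return erythema, stage, intensity, present
```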

Objectives​

Erythema Intensity Quantification​
  • Support healthcare professionals in objectively assessing perilesional erythema, which indicates inflammatory response and potential infection.
  • Reduce inter-observer variability in erythema grading around wounds, where consistency is crucial for infection monitoring.
  • Enable standardized wound assessment by providing reproducible erythema measurements for AWOSI and similar scoring systems.
  • Facilitate infection surveillance through objective tracking of inflammatory changes over time.

Justification (Clinical Evidence):

  • Perilesional erythema is a key indicator of wound infection and inflammatory response, with inter-observer agreement (κ) ranging from 0.45-0.65 [107, 108].
  • Automated erythema assessment in wounds has shown correlation (r > 0.75) with expert visual assessment and clinical infection markers [109].
  • Objective quantification of wound erythema improves early detection of complications and treatment response monitoring [110].
Wound Edge Characteristics​
Damaged Edges​
  • Support identification of compromised wound margins, which indicate poor healing potential and increased risk of chronic wounds.
  • Enable treatment planning by objectively documenting edge viability and guiding debridement decisions.

Justification: Damaged wound edges are associated with delayed healing and predict chronic wound development (OR 3.2-4.5) [111].

Delimited Edges​
  • Assess wound boundary definition, which indicates healing progression and epithelialization potential.
  • Support prognostic assessment of wound healing trajectory based on edge clarity.

Justification: Well-delimited edges correlate with improved healing outcomes and reduced time to closure [112].

Diffuse Edges​
  • Identify poorly defined wound boundaries, indicating inflammation, infection, or underlying pathology.
  • Flag high-risk wounds requiring enhanced monitoring and intervention.

Justification: Diffuse wound edges are associated with higher infection rates (2.5-fold increase) and impaired healing [113].

Thickened Edges​
  • Detect hyperkeratotic or rolled edges, which represent mechanical barriers to epithelialization.
  • Guide debridement strategy by identifying edge pathology requiring intervention.

Justification: Thickened wound edges require mechanical or surgical debridement to facilitate healing progression [114].

Indistinguishable Edges​
  • Identify severe edge compromise where wound boundaries cannot be clinically determined.
  • Flag critical wounds requiring urgent specialized wound care intervention.

Justification: Indistinguishable edges indicate severe tissue damage and predict poor outcomes without aggressive intervention [115].

Perilesional Characteristics​
Perilesional Erythema​
  • Detect inflammatory response in tissue surrounding the wound, indicating infection risk or inflammatory conditions.
  • Monitor treatment response by tracking changes in perilesional inflammation.

Justification: Perilesional erythema >2cm from wound edge is 90% sensitive for wound infection [116].

Perilesional Maceration​
  • Identify moisture-related damage in periwound skin, which compromises healing and increases wound size.
  • Guide moisture management and barrier protection strategies.

Justification: Perilesional maceration increases wound enlargement risk by 60-80% and delays healing [117].

Tissue Characteristics​
Biofilm-Compatible Tissue​
  • Detect visual indicators of biofilm presence, which represents a major barrier to healing.
  • Guide antimicrobial strategy by identifying wounds requiring biofilm-targeted interventions.

Justification: Biofilm presence extends healing time by 3-4 fold and increases infection risk [118].

Affected Tissue Types (Bone/Adjacent, Dermis/Epidermis, Muscle, Subcutaneous, Scarred Skin)​
  • Assess wound depth and tissue involvement, which determines staging, treatment approach, and prognosis.
  • Enable precise wound classification according to depth-based staging systems.
  • Guide surgical planning and reconstructive approach based on tissue layers involved.

Justification: Accurate tissue depth assessment is fundamental to wound staging and treatment selection, with depth being the strongest predictor of healing time [119, 120].

Exudate Characteristics​
Fibrinous Exudate​
  • Identify normal healing exudate, which indicates active wound repair processes.

Justification: Fibrinous exudate represents physiologic healing response [121].

Purulent Exudate​
  • Detect infection indicators requiring antimicrobial intervention.

Justification: Purulent exudate has 85-95% positive predictive value for wound infection [122].

Bloody Exudate​
  • Identify vascular injury or fragile granulation tissue.

Justification: Bloody exudate may indicate trauma, friable tissue, or neovascularization [121].

Serous Exudate​
  • Assess normal wound exudate in early healing phases.

Justification: Serous exudate is characteristic of inflammatory phase healing [121].

Greenish Exudate​
  • Detect Pseudomonas or other bacterial colonization requiring specific antimicrobial coverage.

Justification: Green exudate is highly specific (>95%) for Pseudomonas aeruginosa infection [123].

Wound Bed Tissue Types​
Scarred Tissue​
  • Identify mature scar formation within wound bed, indicating healing progression.

Justification: Scar tissue formation represents advanced healing stage [124].

Sloughy Tissue​
  • Detect devitalized tissue requiring debridement for healing progression.

Justification: Slough presence delays healing and increases infection risk by 40-60% [125].

Necrotic Tissue​
  • Identify non-viable tissue requiring urgent debridement.

Justification: Necrosis presence is absolute indication for debridement and predictor of poor outcomes [126].

Granulation Tissue​
  • Assess healthy healing tissue formation, indicating active repair.

Justification: Granulation tissue presence is strongest predictor of healing success (OR 8.5) [127].

Epithelial Tissue​
  • Detect epithelialization, indicating advanced healing and imminent closure.

Justification: Epithelialization is the final healing phase and predictor of imminent wound closure [128].

Wound Stage Classification​
  • Provide standardized staging according to internationally recognized wound classification systems.
  • Enable treatment protocol selection based on validated stage-specific guidelines.
  • Facilitate outcome prediction using stage-based prognostic models.
  • Support documentation and reimbursement with objective staging classification.

Justification (Clinical Evidence):

  • Wound staging is fundamental to treatment planning, with stage determining intervention intensity and expected healing time [129].
  • Inter-observer agreement for manual staging shows moderate reliability (κ = 0.55-0.70), highlighting need for objective tools [130].
  • Stage-based treatment protocols improve healing rates by 25-35% compared to non-standardized care [131].
Wound Intensity Quantification​
  • Provide composite severity assessment integrating multiple wound characteristics into a single validated score.
  • Enable objective severity stratification for clinical decision-making and resource allocation.
  • Track healing progression using standardized numerical scale over time.
  • Facilitate clinical trial endpoints with validated, reproducible severity metric.

Justification (Clinical Evidence):

  • Composite wound scores like AWOSI show strong correlation with healing time (r = 0.72-0.85) and clinical outcomes [132].
  • Validated wound intensity scores improve inter-observer reliability from κ 0.45-0.60 to κ 0.75-0.85 [133].
  • Longitudinal wound intensity tracking enables early identification of non-healing wounds (sensitivity 78-85%) [134].

Body Surface Segmentation​

Model Classification: 🛠️ Non-Clinical Model

Algorithm Description​

A deep learning multi-class segmentation model ingests a clinical image and outputs a pixel-wise probability map across anatomical body region categories:

$$M(x, y) \in [\text{Background}, \text{Head/Neck}, \text{Upper Extremities}, \text{Trunk}, \text{Lower Extremities}], \quad \forall (x, y) \in \text{Image}$$

where $M(x, y)$ represents the predicted body region class for pixel $(x, y)$.

The model architecture outputs a probability distribution over all region classes for each pixel:

$$\mathbf{p}_{(x,y)} = [p_{\text{bg}}, p_{\text{head/neck}}, p_{\text{upper}}, p_{\text{trunk}}, p_{\text{lower}}]_{(x,y)}$$

where $\sum_{\text{class}} p_{\text{class}} = 1$ for each pixel.

The predicted body region for each pixel is:

$$\hat{M}(x, y) = \arg\max_{\text{region}} p_{\text{region}}(x, y)$$

From this segmentation, the algorithm computes body surface area (BSA) percentages for each anatomical region, enabling automated severity scoring calculations:

$$\text{BSA}_{\text{region}} = \frac{\sum_{(x,y)} \mathbb{1}[\hat{M}(x,y) = \text{region}]}{\sum_{(x,y)} \mathbb{1}[\hat{M}(x,y) \neq \text{Background}]} \times 100$$

This provides anatomical region segmentation that accounts for body site boundaries even when partially obscured by clothing, hair, or positioning, enabling accurate BSA estimation for severity scoring systems.
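
The regional BSA computation above can be sketched as follows, assuming an integer-coded argmax mask; the class coding and function name are illustrative.

```python
import numpy as np

# Illustrative integer coding of the five segmentation classes (0 = Background)
REGIONS = {1: "Head/Neck", 2: "Upper Extremities", 3: "Trunk", 4: "Lower Extremities"}

def bsa_percentages(mask: np.ndarray) -> dict:
    """Share of the visible (non-background) body surface per region.

    mask: (H, W) array of per-pixel argmax class indices.
    """
    body_pixels = np.count_nonzero(mask != 0)
    if body_pixels == 0:
        return {name: float("nan") for name in REGIONS.values()}
    return {
        name: 100.0 * np.count_nonzero(mask == cls) / body_pixels
        for cls, name in REGIONS.items()
    }
```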

Objectives​

  • Enable automated body surface area (BSA) calculation for severity scoring systems (PASI, EASI, burn assessment) by segmenting anatomical regions regardless of clothing or occlusion.
  • Support PASI scoring which requires BSA affected percentages for four body regions: head/neck (10%), upper extremities (20%), trunk (30%), and lower extremities (40%).
  • Facilitate EASI scoring which uses similar regional BSA assessment for atopic dermatitis severity quantification.
  • Handle real-world clinical scenarios where patients are partially clothed or positioned in ways that obscure complete body visualization.
  • Provide robust anatomical boundaries by learning body region spatial relationships and proportions rather than requiring complete skin visibility.
  • Enable automated lesion-to-BSA mapping by combining body region segmentation with lesion segmentation to calculate affected BSA percentages.
  • Support treatment monitoring by providing consistent BSA measurements across longitudinal assessments regardless of positioning or clothing variations.
  • Reduce assessment variability in BSA estimation, which shows high inter-observer disagreement (coefficient of variation 20-40%) with manual methods.

Justification (Clinical Evidence):

  • Body surface area assessment is fundamental to severity scoring in dermatology, with PASI and EASI requiring accurate regional BSA estimation [285, 286].
  • Manual BSA estimation using the "rule of nines" or hand-palm method shows substantial inter-observer variability (κ = 0.45-0.65), particularly for irregular lesion distributions [287, 288].
  • The Psoriasis Area and Severity Index (PASI) requires BSA affected calculation for four body regions with specific weightings: head/neck 10%, upper extremities 20%, trunk 30%, lower extremities 40% [289].
  • Traditional BSA estimation methods require complete body visualization, which is often impractical in clinical settings where patients are partially clothed [290].
  • Automated body region segmentation has demonstrated superior consistency (ICC > 0.90) compared to manual BSA estimation (ICC 0.55-0.75) [291].
  • Studies show that BSA estimation errors directly propagate to severity score calculations, with 20% BSA estimation error leading to 15-30% variation in final PASI/EASI scores [292].
  • Body region segmentation enables objective lesion distribution analysis, identifying disease patterns (e.g., predominant truncal vs. extremity involvement) relevant for treatment selection [293].
  • Automated BSA calculation reduces assessment time by 50-70% compared to manual methods while improving reproducibility [294].
  • The ability to handle partially clothed patients addresses a major practical limitation of current automated methods, enabling deployment in real-world clinical workflows [295].

Endpoints and Requirements​

Performance is evaluated using Intersection over Union (IoU) for each body region and BSA percentage estimation accuracy compared to expert annotations.

| Metric | Threshold | Interpretation |
| --- | --- | --- |
| Mean IoU (all regions) | ≥ 0.75 | Good overall segmentation quality across all body regions. |
| Head/Neck IoU | ≥ 0.70 | Acceptable segmentation for head/neck region (10% BSA weight). |
| Upper Extremities IoU | ≥ 0.75 | Good segmentation for arms (20% BSA weight). |
| Trunk IoU | ≥ 0.75 | Good segmentation for trunk (30% BSA weight). |
| Lower Extremities IoU | ≥ 0.75 | Good segmentation for legs (40% BSA weight). |
| Pixel Accuracy | ≥ 0.85 | Overall classification accuracy across all pixels. |
| BSA Percentage Relative Error | ≤ 15% | Regional BSA estimates within 15% of ground truth (critical for PASI/EASI accuracy). |
| Boundary F1-Score | ≥ 0.70 | Accurate delineation of body region boundaries. |
| Region Proportion Consistency | ≥ 0.80 | Correlation between predicted and expected anatomical region proportions (10/20/30/40). |
| Robustness to Partial Occlusion | ≥ 0.70 IoU | Maintains segmentation accuracy when body regions are partially clothed/obscured. |

All thresholds must be achieved with 95% confidence intervals.

Requirements:

  • Implement a multi-class segmentation architecture with:
    • Encoder-decoder structure (e.g., U-Net, DeepLabV3+, HRNet, or similar)
    • Five output classes: Background, Head/Neck (10% BSA), Upper Extremities (20% BSA), Trunk (30% BSA), Lower Extremities (40% BSA)
    • Pixel-wise probability distributions (softmax output, sum = 1 per pixel)
    • Anatomical prior integration to maintain realistic body region proportions
  • Output structured data including:
    • Segmentation masks for each body region
    • BSA percentages for each region relative to total visible body surface
    • Regional BSA affected when combined with lesion segmentation: $\text{BSA affected}_{\text{region}} = \frac{\text{Lesion pixels in region}}{\text{Total region pixels}} \times \text{Regional BSA weight}$
    • Total BSA affected summed across all regions for PASI/EASI calculation
    • Confidence maps indicating segmentation certainty for each region
    • Occlusion indicators flagging partially visible or obscured regions
    • Body region visibility percentages for quality assessment
  • Demonstrate performance meeting or exceeding all thresholds:
    • Mean IoU ≥ 0.75 across all body regions
    • Region-specific IoU thresholds for each anatomical area
    • BSA percentage relative error ≤ 15%
    • Region proportion consistency ≥ 0.80
  • Report all metrics with 95% confidence intervals for each region independently.
  • Validate the model on an independent and diverse test dataset including:
    • Various body positions: Standing, sitting, lying down, partial views
    • Different clothing scenarios:
      • Fully unclothed (reference standard)
      • Partially clothed (shirt only, pants only, underwear)
      • Clinical gowns with partial exposure
      • Hair covering scalp/face
    • Multiple imaging perspectives: Frontal, posterior, lateral, oblique views
    • Diverse patient populations:
      • Various body habitus (BMI ranges, body proportions)
      • Different ages (pediatric, adult, geriatric with body proportion differences)
      • Various Fitzpatrick skin types (I-VI)
      • Different genders and anatomical variations
    • Various imaging conditions: Different lighting, distances, camera angles
    • Skin conditions: Healthy skin and various dermatoses across all regions
  • Handle anatomical variability and occlusion:
    • Partial body visibility: Only head/neck and upper trunk visible
    • Clothing occlusion: T-shirts obscuring trunk, pants covering lower extremities
    • Hair coverage: Long hair obscuring neck, scalp, upper back
    • Positioning artifacts: Crossed arms, bent limbs affecting visible surface area
    • Extreme body habitus: Obesity, cachexia affecting regional proportions
    • Pediatric proportions: Different head-to-body ratios in children
  • Implement anatomical constraints:
    • Spatial priors: Body regions should be spatially coherent and contiguous
    • Proportion constraints: Regional BSA percentages should approximate anatomical standards (10/20/30/40) when full body visible
    • Boundary smoothness: Enforce realistic anatomical boundaries between regions
    • Occlusion handling: Infer complete region boundaries even when partially obscured
  • Ensure outputs are compatible with:
    • PASI calculation systems: Provide head/neck (×0.1), upper extremities (×0.2), trunk (×0.3), lower extremities (×0.4) BSA affected
    • EASI calculation systems: Similar regional BSA assessment for atopic dermatitis
    • Burn assessment tools: Total body surface area involved in burns
    • FHIR-based structured reporting for interoperability
    • Clinical decision support systems requiring anatomical BSA data
    • Lesion-to-BSA mapping algorithms combining body region and lesion segmentation
  • Provide BSA calculation features:
    • Regional BSA percentages for each anatomical area
    • Lesion-specific BSA affected when lesion masks are provided:
      • Calculate overlap between lesion mask and each body region
      • Weight by standard anatomical BSA percentages (10/20/30/40)
      • Sum across regions for total BSA affected
    • Confidence intervals for BSA estimates based on segmentation uncertainty
    • Quality flags indicating:
      • Partial body visibility (may affect BSA accuracy)
      • Extreme positioning (non-standard anatomical proportions)
      • Occlusion severity (percentage of regions obscured)
  • Document the training strategy including:
    • Multi-expert annotation protocol for anatomical region ground truth
    • Handling of ambiguous boundaries (e.g., neck-trunk, shoulder-arm transitions)
    • Data augmentation strategies:
      • Simulated clothing occlusion
      • Body position variations
      • Simulated partial body visibility
    • Loss function design:
      • Combined Dice + Cross-Entropy for segmentation
      • Anatomical proportion regularization to maintain realistic BSA percentages
      • Boundary-aware losses for accurate region delineation
    • Class balancing for anatomical regions with different prevalence
  • Implement occlusion robustness:
    • Train on datasets with varying degrees of clothing coverage
    • Learn to infer complete body region boundaries from partial visibility
    • Use spatial context and body proportion priors to complete occluded regions
    • Provide occlusion confidence scores indicating reliability of inferred boundaries
  • Provide evidence that:
    • The model generalizes across different body positions and viewing angles
    • Performance is maintained with partial clothing and occlusion
    • BSA estimates are accurate across diverse body habitus and age groups
    • Regional proportions are realistic even with partial body visibility
    • The model maintains accuracy across different Fitzpatrick skin types
    • Anatomical boundaries are consistent with expert annotations
  • Include interpretability features:
    • Visualization overlays showing detected body regions with BSA percentages
    • Anatomical boundary highlighting for region transitions
    • Occlusion maps indicating which regions are partially obscured
    • BSA breakdown showing contribution of each region to total affected area
    • Confidence visualization for segmentation uncertainty
  • Implement quality control mechanisms:
    • Body visibility score: Percentage of standard body surface visible in image
    • Regional occlusion flags: Indicate which regions are partially hidden
    • Anatomical proportion validation: Flag images with unrealistic body proportions
    • Segmentation confidence thresholds: Recommend manual review for low-confidence cases
    • BSA calculation warnings: Alert when partial visibility may affect accuracy
  • Document failure modes and limitations:
    • Extreme occlusion: Performance degrades when >60% of body is obscured
    • Atypical positioning: Non-standard body positions may affect boundary accuracy
    • Extreme body habitus: Severe obesity or cachexia may alter regional proportions
    • Pediatric extremes: Very young children have different head-to-body ratios
    • Incomplete body views: Single limb or small body region only (insufficient for BSA)
  • Provide PASI/EASI integration (see the sketch after this list):
    • Accept lesion segmentation mask as input
    • Calculate lesion area within each body region
    • Apply standard BSA weightings:
      • Head/Neck lesion area × 0.1
      • Upper Extremities lesion area × 0.2
      • Trunk lesion area × 0.3
      • Lower Extremities lesion area × 0.4
    • Output total BSA affected for PASI/EASI calculation
    • Provide regional BSA affected breakdown for detailed analysis
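
The following minimal sketch illustrates the lesion-to-BSA mapping above, assuming an integer-coded region mask (matching the illustrative coding used earlier) and a binary lesion mask; names are hypothetical.

```python
import numpy as np

# Standard PASI/EASI regional weights, keyed by illustrative region class index
BSA_WEIGHTS = {1: 0.10, 2: 0.20, 3: 0.30, 4: 0.40}

def total_bsa_affected(region_mask: np.ndarray, lesion_mask: np.ndarray) -> float:
    """Total BSA affected (%) from a region mask and a binary lesion mask."""
    total = 0.0
    for cls, weight in BSA_WEIGHTS.items():
        region = region_mask == cls
        region_px = np.count_nonzero(region)
        if region_px == 0:
            continue  # region not visible in this image
        lesion_px = np.count_nonzero(region & (lesion_mask > 0))
        # (lesion pixels in region / total region pixels) x regional BSA weight
        total += (lesion_px / region_px) * weight * 100.0
    return total
```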

Clinical Impact:

The Body Surface Segmentation model serves critical functions for severity assessment:

  1. Automated PASI/EASI calculation: Enables accurate BSA affected quantification for severity scoring
  2. Real-world applicability: Handles partially clothed patients common in clinical practice
  3. Reproducibility: Reduces inter-observer variability in BSA estimation from 20-40% to <10%
  4. Clinical efficiency: Reduces assessment time by 50-70% compared to manual BSA estimation
  5. Lesion distribution analysis: Identifies disease patterns (predominant body regions affected)
  6. Treatment monitoring: Provides consistent BSA tracking across longitudinal assessments
  7. Telemedicine enablement: Supports remote severity assessment with patient-captured images

Body Region Definitions (PASI/EASI Standard):

The model segments the body according to standard dermatological severity scoring conventions:

  1. Head and Neck (10% BSA): Entire head, scalp, face, ears, neck (anterior and posterior)

  2. Upper Extremities (20% BSA): Arms from shoulder to fingertips including:

    • Shoulders
    • Upper arms (anterior and posterior)
    • Elbows
    • Forearms
    • Wrists
    • Hands (palms and dorsal)
    • Axillae (included in upper extremity region for PASI)
  3. Trunk (30% BSA): Torso from base of neck to top of legs including:

    • Chest (anterior trunk)
    • Abdomen
    • Back (entire posterior trunk)
    • Buttocks/gluteal region
  4. Lower Extremities (40% BSA): Legs from top of thigh to toes including:

    • Thighs (anterior, posterior, medial, lateral)
    • Knees
    • Lower legs (shins, calves)
    • Ankles
    • Feet (soles and dorsal)
    • Inguinal/groin region (included in lower extremity for PASI)

PASI Calculation Integration:

When combined with lesion segmentation and severity assessment (erythema, induration, desquamation), the body region segmentation enables complete automated PASI calculation:

$$\text{PASI} = 0.1 \times (E_h + I_h + D_h) \times A_h + 0.2 \times (E_u + I_u + D_u) \times A_u + 0.3 \times (E_t + I_t + D_t) \times A_t + 0.4 \times (E_l + I_l + D_l) \times A_l$$

where:

  • $E, I, D$ = Erythema, Induration, Desquamation scores (0-4)
  • $A$ = Area score (0-6, based on the BSA affected percentage)
  • Subscripts: $h$ = head/neck, $u$ = upper extremities, $t$ = trunk, $l$ = lower extremities

The body region segmentation provides the area score $A$ for each region by calculating lesion coverage within that anatomical area.
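
For illustration, a minimal sketch of this calculation, assuming the standard PASI area-score bands (0 = 0%, 1 = <10%, 2 = 10-29%, 3 = 30-49%, 4 = 50-69%, 5 = 70-89%, 6 = ≥90%); function names and the example values are hypothetical.

```python
def area_score(bsa_percent: float) -> int:
    """PASI area score A (0-6) from the percentage of a region affected."""
    for threshold, score in [(90, 6), (70, 5), (50, 4), (30, 3), (10, 2), (1, 1)]:
        if bsa_percent >= threshold:
            return score
    return 0

def pasi(regions: dict) -> float:
    """PASI from per-region (E, I, D, A) tuples keyed 'h', 'u', 't', 'l'."""
    weights = {"h": 0.1, "u": 0.2, "t": 0.3, "l": 0.4}
    total = 0.0
    for r, w in weights.items():
        e, i, d, a = regions[r]
        total += w * (e + i + d) * a
    return total

# Trunk-only involvement: E=3, I=2, D=2, 35% of trunk affected (A=3)
scores = {"h": (0, 0, 0, 0), "u": (0, 0, 0, 0), "t": (3, 2, 2, 3), "l": (0, 0, 0, 0)}
print(round(pasi(scores), 1))  # 0.3 * (3+2+2) * 3 = 6.3
```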

Technical Details:

Occlusion Handling Strategy:

  • Spatial context learning: Model learns typical body region spatial relationships to infer occluded boundaries
  • Anatomical priors: Enforces realistic body proportions (e.g., head typically ~10% of total body surface)
  • Multi-scale processing: Captures both local boundaries and global body structure
  • Confidence calibration: Provides uncertainty estimates, with higher uncertainty assigned to inferred (occluded) regions

Boundary Delineation:

  • Anatomical landmarks: Head-neck junction, shoulder line, waist, hip crease
  • Smooth transitions: Enforces gradual boundaries rather than sharp transitions
  • Bilateral symmetry: Leverages left-right body symmetry for improved robustness
  • Position invariance: Maintains boundary accuracy across different body positions

Quality Assurance:

  • Body visibility check: Flags images where <50% of body surface is visible
  • Proportion validation: Alerts when detected regional proportions deviate >20% from anatomical standards
  • Occlusion severity: Quantifies percentage of each region obscured by clothing/positioning
  • BSA calculation confidence: Adjusts confidence based on visibility and segmentation quality
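
A minimal sketch of how such quality-control flags might be derived, using the thresholds named in the checks above; the flag names, inputs, and conventions are illustrative assumptions.

```python
def quality_flags(visible_fraction: float, region_props: dict, expected: dict) -> list:
    """Illustrative checks mirroring the quality-assurance rules above.

    visible_fraction: estimated fraction of the body surface visible (0-1)
    region_props:     detected regional proportions, e.g. {"Trunk": 0.27, ...}
    expected:         anatomical standards, e.g. {"Trunk": 0.30, ...}
    """
    flags = []
    if visible_fraction < 0.50:
        flags.append("LOW_BODY_VISIBILITY")  # <50% of body surface visible
    for region, expected_prop in expected.items():
        detected = region_props.get(region)
        if detected is not None and abs(detected - expected_prop) / expected_prop > 0.20:
            flags.append(f"PROPORTION_DEVIATION:{region}")  # >20% from standard
    return flags
```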

Note: This is a Non-Clinical model that performs anatomical body region segmentation to enable BSA calculation for severity scoring systems (PASI, EASI, burn assessment). It does not make medical diagnoses or clinical assessments. The model provides quantitative anatomical region boundaries and BSA percentages that are used as inputs to clinical scoring systems operated by healthcare professionals. While it handles partially clothed patients and occluded body regions, extreme occlusion (>60%) may reduce accuracy and require clinical judgment.

Endpoints and Requirements​

Performance of the Wound Characteristic Assessment model is evaluated using task-appropriate metrics for each output type: RMAE for ordinal outputs, accuracy and F1-score for categorical staging, AUC for binary classifications, and overall multi-task performance.

| Output Type | Specific Output | Metric | Threshold | Interpretation |
| --- | --- | --- | --- | --- |
| Ordinal (0-10) | Erythema | RMAE | ≤ 20% | Predictions within 20% of expert consensus. |
| Categorical (1-4) | Wound Stage | Accuracy | ≥ 75% | Correct stage classification in 3 out of 4 cases. |
| | | F1-score | ≥ 0.70 | Balanced precision and recall across stages. |
| | | Kappa (κ) | ≥ 0.65 | Substantial agreement with expert staging. |
| Ordinal (0-40) | Wound Intensity | RMAE | ≤ 15% | Predictions within 15% of expert consensus for the composite score. |
| | | ICC | ≥ 0.75 | Strong reliability compared to expert raters. |
| Binary (23 outputs) | Edge characteristics (5) | AUC (avg) | ≥ 0.80 | Good discrimination for all edge types. |
| | Perilesional features (2) | AUC (avg) | ≥ 0.80 | Good discrimination for perilesional characteristics. |
| | Tissue types (5) | AUC (avg) | ≥ 0.80 | Good discrimination for tissue depth assessment. |
| | Exudate types (5) | AUC (avg) | ≥ 0.80 | Good discrimination for exudate characterization. |
| | Wound bed tissue (5) | AUC (avg) | ≥ 0.80 | Good discrimination for wound bed tissue types. |
| | Individual critical signs | AUC (min) | ≥ 0.75 | Minimum acceptable performance for any single binary output. |
| Multi-task Overall | Composite performance | mAP | ≥ 0.75 | Mean average precision across all outputs. |

All thresholds must be achieved with 95% confidence intervals.

Requirements:

  • Implement a multi-task architecture with:
    • Shared feature extraction backbone (e.g., CNN or Vision Transformer)
    • Specialized output heads:
      • One ordinal head (11 classes) for erythema (0-10)
      • One categorical head (4 classes) for wound stage (1-4)
      • One ordinal head (41 classes) for wound intensity (0-40)
      • Twenty-three binary classification heads for specific wound characteristics
  • Output structured data including:
    • Erythema score (0-10 continuous)
    • Wound stage (1-4 categorical)
    • Wound intensity score (0-40 continuous)
    • Binary presence/absence for each of 23 wound characteristics
    • Confidence scores for all predictions
  • Demonstrate performance meeting or exceeding all thresholds for:
    • RMAE ≤ 20% for erythema, RMAE ≤ 15% for wound intensity
    • Accuracy ≥ 75% and κ ≥ 0.65 for wound staging
    • AUC ≥ 0.80 (average) and AUC ≥ 0.75 (minimum) for binary outputs
    • mAP ≥ 0.75 for overall multi-task performance
  • Report all metrics with 95% confidence intervals for each output independently.
  • Validate the model on an independent and diverse test dataset including:
    • Various wound etiologies (pressure injuries, diabetic foot ulcers, venous leg ulcers, surgical wounds, traumatic wounds)
    • All wound stages (1-4)
    • Diverse patient populations (various Fitzpatrick skin types, ages, comorbidities)
    • Multiple anatomical locations
    • Various imaging conditions and devices
  • Ensure outputs are compatible with:
    • Standardized wound assessment protocols (NPUAP/EPUAP staging, TIME framework)
    • AWOSI scoring system calculation and interpretation
    • FHIR-based structured reporting for interoperability
    • Clinical decision support systems for wound care pathways
  • Document the joint training strategy including:
    • Loss weighting scheme for multiple output types (ordinal, categorical, binary)
    • Handling of class imbalance in binary outputs
    • Regularization strategies for multi-task learning
  • Provide evidence that:
    • Multi-task learning improves individual task performance compared to single-task baselines
    • The model maintains performance across different wound types and stages
    • Predictions align with clinical wound assessment guidelines and expert consensus

Wound Surface Quantification​

Model Classification: 🔬 Clinical Model

Description​

A deep learning multi-class segmentation model ingests a clinical image of a wound and outputs a nine-class probability map for each pixel:

$$M(x, y) \in [\text{Background}, \text{Erythema}, \text{Wound}, \text{Granulation}, \text{Biofilm/Slough}, \text{Necrosis}, \text{Maceration}, \text{Orthopedic}, \text{Bone/Cartilage/Tendon}]$$

for all $(x, y)$ in the image.

The model architecture outputs a probability distribution over all classes for each pixel:

$$\mathbf{p}_{(x,y)} = [p_{\text{bg}}, p_{\text{eryth}}, p_{\text{wound}}, p_{\text{gran}}, p_{\text{bio}}, p_{\text{nec}}, p_{\text{mac}}, p_{\text{orth}}, p_{\text{bone}}]_{(x,y)}$$

where $\sum_{\text{class}} p_{\text{class}} = 1$ for each pixel.

The predicted class for each pixel is:

$$\hat{M}(x, y) = \arg\max_{\text{class}} p_{\text{class}}(x, y)$$

From this segmentation, the algorithm computes percentage surface area for each tissue type relative to the total wound area:

$$\hat{y}_{\text{class}} = \frac{\sum_{(x,y)} \mathbb{1}[\hat{M}(x,y) = \text{class}]}{\sum_{(x,y)} \mathbb{1}[\hat{M}(x,y) \neq \text{Background}]} \times 100$$

This provides objective, reproducible quantification of wound tissue composition, enabling standardized wound bed assessment and tracking of healing progression.
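
The tissue-composition computation above can be sketched as follows, assuming an integer-coded argmax mask; the class coding is an assumption for the example.

```python
import numpy as np

# Illustrative integer coding of the nine classes (index 0 = Background)
CLASSES = ["Background", "Erythema", "Wound", "Granulation", "Biofilm/Slough",
           "Necrosis", "Maceration", "Orthopedic", "Bone/Cartilage/Tendon"]

def tissue_composition(mask: np.ndarray) -> dict:
    """Surface percentage of each tissue class relative to total wound area."""
    wound_area = np.count_nonzero(mask != 0)  # all non-background pixels
    if wound_area == 0:
        return {}
    return {name: 100.0 * np.count_nonzero(mask == idx) / wound_area
            for idx, name in enumerate(CLASSES) if idx != 0}
```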

Objectives​

Erythema Surface Quantification​
  • Quantify perilesional and wound bed erythema extent, which indicates inflammatory response and infection risk.
  • Enable objective tracking of inflammatory changes over time for infection surveillance.
  • Support clinical decision-making by providing quantitative measures of wound inflammation.
  • Reduce variability in visual erythema extent estimation.

Justification (Clinical Evidence):

  • Extent of perilesional erythema is a validated predictor of wound infection (sensitivity 78-85%) [135].
  • Automated erythema surface quantification shows strong correlation (r = 0.76-0.84) with clinical infection diagnosis [136].
  • Percentage erythema surface area >20% of wound perimeter is associated with 3-fold increased infection risk [137].
Wound Bed Surface Quantification​
  • Quantify total wound surface area, which is fundamental for wound size assessment and healing trajectory monitoring.
  • Enable accurate wound measurement eliminating ruler-based measurement errors and irregular wound shape challenges.
  • Track wound closure progression using objective, reproducible surface area measurements.
  • Calculate wound healing rate (% area reduction per week) for treatment efficacy assessment.

Justification (Clinical Evidence):

  • Manual wound measurement shows high variability (coefficient of variation 15-30%) particularly for irregular wounds [138].
  • Digital planimetry using segmentation achieves agreement with expert tracing (ICC > 0.90) [139].
  • Wound surface area is the primary outcome measure in wound healing trials, requiring accurate quantification [140].
  • Healing rate (% area reduction) is the strongest predictor of eventual wound closure [141].
Angiogenesis and Granulation Tissue Surface Quantification​
  • Quantify healthy granulation tissue extent, which indicates active wound healing and predicts successful closure.
  • Assess wound bed preparation adequacy for advanced therapies or surgical closure.
  • Monitor angiogenesis progression as indicator of healing phase and vascular response.
  • Guide treatment decisions by identifying wounds with inadequate granulation requiring intervention.

Justification (Clinical Evidence):

  • Granulation tissue covering >75% of wound bed is strongest predictor of healing (OR 8.2-12.5) [127, 142].
  • Automated granulation quantification shows excellent agreement with expert assessment (κ = 0.82-0.88) [143].
  • Granulation tissue percentage correlates strongly with time to wound closure (r = -0.78) [144].
  • Low granulation tissue (<40%) predicts chronic wound development with 82% sensitivity [145].
Biofilm and Slough Surface Quantification​
  • Quantify devitalized tissue burden, which requires debridement before healing can progress.
  • Identify biofilm presence and extent, a major barrier to wound healing requiring targeted intervention.
  • Guide debridement strategy by quantifying tissue requiring removal.
  • Monitor debridement efficacy through serial measurements of slough/biofilm reduction.

Justification (Clinical Evidence):

  • Slough covering >30% of wound bed delays healing by average 6-8 weeks [146].
  • Biofilm presence extends healing time by 3-4 fold and increases infection risk [118].
  • Complete debridement to <10% slough coverage improves healing rates by 45-60% [147].
  • Automated slough quantification enables objective debridement endpoints (target <20% coverage) [148].
Necrosis Surface Quantification​
  • Quantify necrotic tissue extent, indicating non-viable tissue requiring urgent debridement.
  • Prioritize surgical intervention for wounds with extensive necrosis.
  • Monitor debridement completeness by tracking necrosis elimination.
  • Assess infection risk, as necrotic tissue is prime substrate for bacterial growth.

Justification (Clinical Evidence):

  • Necrotic tissue presence is absolute indication for debridement and major risk factor for infection [126].
  • Necrosis covering >25% of wound bed increases amputation risk 4-fold in diabetic foot ulcers [149].
  • Complete necrosis removal improves healing rates by 50-70% compared to partial debridement [150].
  • Time to necrosis debridement predicts outcomes: debridement within 2 weeks reduces complications by 60% [151].
Maceration Surface Quantification​
  • Quantify periwound moisture damage extent, which enlarges wounds and delays healing.
  • Guide moisture management strategy including absorbent dressing selection and frequency.
  • Monitor treatment efficacy by tracking maceration reduction with barrier products.
  • Identify wounds at risk of enlargement due to excessive exudate.

Justification (Clinical Evidence):

  • Periwound maceration increases wound enlargement risk by 60-80% [117].
  • Maceration extent correlates with exudate volume and predicts dressing change frequency requirements [152].
  • Resolution of maceration improves healing rates by 35-45% [153].
  • Maceration affecting >2cm perimeter is associated with delayed healing (HR 2.1-2.8) [154].
Orthopedic Material Surface Quantification​
  • Detect and quantify exposed orthopedic hardware or materials, which indicates device-related complications.
  • Identify hardware exposure requiring surgical revision or coverage procedures.
  • Assess infection risk associated with exposed prosthetic materials.
  • Guide treatment planning for hardware-associated wound complications.

Justification (Clinical Evidence):

  • Exposed orthopedic hardware increases infection risk 8-12 fold [155].
  • Hardware exposure requires surgical intervention in 75-85% of cases [156].
  • Early detection of hardware exposure enables preventive interventions reducing major complications by 40-50% [157].
  • Extent of hardware exposure correlates with complexity of required revision surgery [158].
Bone, Cartilage, or Tendon Surface Quantification​
  • Detect and quantify exposed deep structures, indicating severe wounds with osteomyelitis or septic arthritis risk.
  • Enable accurate wound staging based on tissue depth involvement.
  • Guide surgical planning for coverage procedures or amputation consideration.
  • Assess osteomyelitis risk, as bone exposure is major risk factor.

Justification (Clinical Evidence):

  • Bone exposure in diabetic foot ulcers indicates osteomyelitis in 60-90% of cases [159].
  • Wounds with exposed bone/tendon have 10-20 fold longer healing times compared to soft tissue wounds [160].
  • Bone exposure extent predicts amputation risk: >2cm² exposure increases risk 5-fold [161].
  • Early identification of deep structure exposure enables prompt orthopedic/plastic surgery consultation reducing complications [162].

Endpoints and Requirements​

Performance is evaluated using Intersection over Union (IoU) for each tissue class and Relative Error (RE%) for percentage surface area compared to expert annotations.

| Tissue Class | IoU Threshold | RE% Threshold | Interpretation |
| --- | --- | --- | --- |
| Erythema | ≥ 0.60 | ≤ 20% | Good segmentation and accurate surface percentage for erythema. |
| Wound Bed | ≥ 0.75 | ≤ 15% | High accuracy for total wound area, the primary outcome measure. |
| Granulation Tissue | ≥ 0.70 | ≤ 20% | Good segmentation of healthy healing tissue. |
| Biofilm/Slough | ≥ 0.65 | ≤ 25% | Acceptable performance for challenging devitalized tissue. |
| Necrosis | ≥ 0.70 | ≤ 20% | Good detection of non-viable tissue requiring debridement. |
| Maceration | ≥ 0.60 | ≤ 25% | Acceptable performance for periwound moisture damage. |
| Orthopedic Material | ≥ 0.65 | ≤ 25% | Acceptable detection of exposed hardware. |
| Bone/Cartilage/Tendon | ≥ 0.65 | ≤ 25% | Acceptable detection of exposed deep structures. |
| Mean IoU (across all classes) | ≥ 0.65 | - | Overall segmentation quality across all tissue types. |
| Pixel Accuracy | ≥ 0.85 | - | Overall classification accuracy across all pixels. |

All thresholds must be achieved with 95% confidence intervals.
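
For illustration, a minimal sketch of the two evaluation metrics, assuming RE% is defined relative to the expert-annotated percentage; names are hypothetical.

```python
import numpy as np

def iou(pred: np.ndarray, truth: np.ndarray, cls: int) -> float:
    """Intersection over Union for one tissue class between two masks."""
    p, t = pred == cls, truth == cls
    union = np.count_nonzero(p | t)
    return np.count_nonzero(p & t) / union if union else float("nan")

def relative_error_pct(pred_pct: float, truth_pct: float) -> float:
    """RE% of a predicted surface percentage vs. the expert-annotated one."""
    if truth_pct == 0:
        return float("nan")  # undefined when the class is absent in ground truth
    return abs(pred_pct - truth_pct) / truth_pct * 100.0
```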

Requirements:

  • Implement a multi-class segmentation architecture with:
    • Encoder-decoder structure (e.g., U-Net, DeepLabV3+, or similar)
    • Nine output classes: Background, Erythema, Wound, Granulation, Biofilm/Slough, Necrosis, Maceration, Orthopedic, Bone/Cartilage/Tendon
    • Pixel-wise probability distributions (softmax output, sum = 1 per pixel)
  • Output structured data including:
    • Segmentation masks for each tissue class
    • Percentage surface area for each tissue type relative to total wound area
    • Absolute surface area in cm² or mm² (requires calibration or scale reference)
    • Confidence maps indicating segmentation certainty
    • Total wound surface area measurement
  • Demonstrate performance meeting or exceeding all thresholds:
    • IoU thresholds for each tissue class
    • RE% thresholds for surface area percentage estimation
    • Mean IoU ≥ 0.65 across all classes
    • Pixel Accuracy ≥ 0.85 overall
  • Report all metrics with 95% confidence intervals for each tissue class independently.
  • Validate the model on an independent and diverse test dataset including:
    • Various wound etiologies (pressure injuries, diabetic foot ulcers, venous leg ulcers, surgical wounds, traumatic wounds)
    • All wound stages and severity levels
    • Diverse patient populations (various Fitzpatrick skin types, ages, comorbidities)
    • Multiple anatomical locations
    • Various wound bed compositions (from clean granulating to heavily necrotic)
    • Various imaging conditions, devices, and lighting scenarios
  • Handle class imbalance appropriately:
    • Some classes (bone exposure, orthopedic material) are rare
    • Implement appropriate loss weighting or sampling strategies
    • Report class-specific performance metrics
  • Ensure outputs are compatible with:
    • Standardized wound assessment protocols (TIME framework, wound bed preparation)
    • Wound measurement standards and documentation requirements
    • FHIR-based structured reporting for interoperability
    • Clinical decision support systems for wound care pathways
  • Provide calibration and scale handling:
    • Accept scale reference (ruler, calibration marker) when available
    • Output measurements in calibrated units (cm²) when possible
    • Provide dimensionless percentages when calibration unavailable
  • Document the training strategy including:
    • Loss function design (e.g., weighted cross-entropy, Dice loss, focal loss)
    • Class balancing approach
    • Data augmentation strategy accounting for realistic wound variations
    • Handling of ambiguous boundaries between tissue types
  • Provide evidence that:
    • Segmentation performance generalizes across wound types and patient populations
    • Surface area measurements correlate with expert assessment and clinical outcomes
    • The model maintains performance across different imaging conditions
    • Predictions align with clinical wound assessment guidelines and expert consensus

Hair Loss Surface Quantification​

Model Classification: 🔬 Clinical Model

Description​

A deep learning segmentation model ingests a clinical image of the scalp and outputs a three-class probability map for each pixel:

$$M(x, y) \in [\text{Hair}, \text{No Hair}, \text{Non-Scalp}], \quad \forall (x, y) \in \text{Image}$$

where:
  • Hair = scalp region with visible hair coverage
  • No Hair = scalp region with hair loss
  • Non-Scalp = background, face, ears, or any non-scalp area

From this segmentation, the algorithm computes the percentage of hair loss surface area relative to the total scalp surface:

y^=∑M(x,y)=No Hair∑M(x,y)∈[Hair,No Hair]×100\hat{y} = \frac{\sum M(x, y) = \text{No Hair}}{\sum M(x, y) \in [\text{Hair}, \text{No Hair}]} \times 100y^​=∑M(x,y)∈[Hair,No Hair]∑M(x,y)=No Hair​×100

This provides an objective and reproducible measure of the extent of alopecia, excluding background and non-scalp regions.
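
The percentage above is a short computation once per-pixel labels are available. A minimal sketch, assuming an (H, W) label mask and an illustrative integer encoding of the three classes:

```python
import numpy as np

HAIR, NO_HAIR, NON_SCALP = 0, 1, 2  # illustrative label encoding

def hair_loss_percentage(mask: np.ndarray) -> float:
    """Percentage of scalp surface with hair loss, per the formula above.

    `mask` is an (H, W) array of class labels; Non-Scalp pixels are
    excluded from the denominator so image framing cannot bias the result.
    """
    scalp = np.isin(mask, (HAIR, NO_HAIR))
    scalp_pixels = int(scalp.sum())
    if scalp_pixels == 0:  # no scalp visible: percentage is undefined
        raise ValueError("no scalp pixels found in mask")
    return 100.0 * int((mask[scalp] == NO_HAIR).sum()) / scalp_pixels
```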

Objectives

  • Support healthcare professionals by providing precise and reproducible quantification of alopecia surface extent.
  • Reduce subjectivity in clinical indices such as the Severity of Alopecia Tool (SALT), which relies on visual estimates of scalp surface affected [Hasan 2023].
  • Enable automatic calculation of validated severity scores (e.g., SALT, APULSI) directly from images.
  • Improve robustness by excluding non-scalp regions, ensuring consistent results across varied image framing conditions.
  • Facilitate standardization across clinical practice and trials where manual estimation introduces variability.

Justification (Clinical Evidence):

  • Hair loss evaluation is extent-based (surface area involved), making it distinct from lesion counting or intensity scoring [103].
  • Manual estimation of scalp surface involvement is subjective and variable, particularly in diffuse hair thinning or patchy alopecia areata [105].
  • Deep learning segmentation methods have shown expert-level agreement in skin lesion and hair density mapping, demonstrating robustness across imaging conditions [104].
  • Standardized, automated quantification strengthens trial endpoints and improves reproducibility in therapeutic monitoring [106].

Endpoints and Requirements

Performance is evaluated using Intersection over Union (IoU) for scalp segmentation and Relative Error (RE%) for percentage hair loss compared to expert annotations.

| Metric | Threshold | Interpretation |
| --- | --- | --- |
| IoU (Scalp segmentation) | ≥ 0.50 | Segmentation of hair/no-hair vs. scalp achieves clinical utility. |
| Relative Error (Hair loss %) | ≤ 20% | Predicted hair loss percentage deviates ≤ 20% from expert consensus. |

Success criteria: The algorithm must achieve IoU ≥ 0.50 for segmentation and RE ≤ 20% for surface percentage estimation, with 95% confidence intervals.

Requirements:

  • Perform three-class segmentation (Hair, No Hair, Non-Scalp).
  • Compute percentage of hair loss relative to total scalp.
  • Demonstrate IoU ≥ 0.50 and RE ≤ 20% compared to expert consensus.
  • Validate on diverse populations (age, sex, skin tone, hair type, alopecia subtype).
  • Provide outputs in a FHIR-compliant structured format for interoperability.

Inflammatory Nodular Lesion Quantification

Model Classification: 🔬 Clinical Model

Description

A deep learning multi-class object detection model ingests a clinical image and outputs bounding boxes with associated class labels and confidence scores for each detected lesion:

$$\mathbf{D} = [(b_1, l_1, c_1), (b_2, l_2, c_2), \ldots, (b_n, l_n, c_n)]$$

where $b_i$ is the bounding box for the $i$-th predicted lesion, $l_i \in [\text{Nodule}, \text{Abscess}, \text{Non-draining tunnel}, \text{Draining tunnel}]$ is the class label, and $c_i \in [0,1]$ is the associated confidence score. After applying non-maximum suppression (NMS) to remove duplicate detections, the algorithm outputs separate counts for each lesion type:

$$\hat{y}_{\text{class}} = \sum_{i=1}^{n} \mathbb{1}[l_i = \text{class} \land c_i \geq \tau]$$

where $\tau$ is a confidence threshold.

This provides objective, reproducible counts of nodules, abscesses, non-draining tunnels, and draining tunnels directly from clinical images, without requiring manual annotation by clinicians.
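
To make the counting rule concrete, the sketch below applies greedy per-class NMS and then the thresholded count $\hat{y}_{\text{class}}$ defined above. The IoU threshold, data layout, and function names are illustrative assumptions, not the device's actual pipeline:

```python
from collections import Counter

def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    def area(r):
        return (r[2] - r[0]) * (r[3] - r[1])

    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0

def count_lesions(detections, tau=0.5, nms_iou=0.5):
    """Greedy per-class NMS followed by the thresholded count defined above.

    `detections` is a list of (box, label, confidence) tuples; returns a
    Counter mapping each lesion class to its count.
    """
    kept = []
    for det in sorted(detections, key=lambda d: d[2], reverse=True):
        if all(d[1] != det[1] or box_iou(d[0], det[0]) < nms_iou for d in kept):
            kept.append(det)
    return Counter(label for _, label, conf in kept if conf >= tau)
```

For example, `count_lesions([((12, 30, 58, 80), "Nodule", 0.91)])` returns `Counter({"Nodule": 1})`; duplicate boxes of the same class that overlap above `nms_iou` are suppressed before counting.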

Objectives

Nodule Lesion Quantification
  • Support healthcare professionals in quantifying nodular burden, which is essential for severity assessment in conditions such as hidradenitis suppurativa (HS), acne, and cutaneous lymphomas.
  • Reduce inter-observer and intra-observer variability in lesion counting, which is common in clinical practice and clinical trials [101].
  • Enable automated severity scoring by integrating nodule counts into composite indices such as the International Hidradenitis Suppurativa Severity Score System (IHS4), which uses the counts of nodules, abscesses, and draining tunnels [102].
  • Ensure reproducibility and robustness across imaging conditions (lighting, orientation, device type) [99, 100].

Justification (Clinical Evidence):

  • Clinical guidelines emphasize lesion counts (e.g., nodules) as a cornerstone for HS severity scoring (IHS4) and for acne grading systems [102].
  • Human counting is prone to fatigue and subjective error, including disagreement over whether a lesion qualifies as a nodule and lesions being double-counted or omitted [REQ_002].
  • Automated counting has shown high accuracy: AI-based acne lesion counting achieved F1 scores >0.80 for inflammatory lesions [101].
  • Object detection approaches (CNN + attention mechanisms) are validated in lesion-counting tasks and other biomedical domains, offering superior reproducibility compared to human raters [Cai 2019; Wang 2021].
Abscess Lesion Quantification
  • Support accurate identification of abscesses, which are critical indicators of severe disease activity in hidradenitis suppurativa and require differentiation from nodules [102].
  • Reduce diagnostic variability in distinguishing abscesses from other inflammatory lesions, improving consistency in severity assessment.
  • Enable precise IHS4 scoring, where abscess count is weighted more heavily than nodules (multiplication factor of 2) due to greater clinical significance [102].
  • Facilitate treatment decision-making, as abscess presence and count influence therapeutic choices including systemic therapy initiation.

Justification (Clinical Evidence):

  • The IHS4 scoring system assigns double weight to abscesses compared to nodules, reflecting their greater clinical importance in HS severity assessment [102].
  • Inter-observer variability in abscess identification ranges from moderate to substantial (κ = 0.55-0.75), highlighting the need for objective assessment tools [101].
  • Automated detection systems can distinguish abscesses from nodules based on visual features such as fluctuance appearance, size, and surrounding inflammation with >85% accuracy [101].
  • Accurate abscess quantification is essential for treatment monitoring and response assessment in clinical trials [102].
Non-Draining Tunnel Lesion Quantification
  • Support identification of non-draining tunnels (sinus tracts), which represent chronic disease progression and structural tissue damage in hidradenitis suppurativa.
  • Reduce detection variability, as non-draining tunnels may be subtle and easily missed during clinical examination, leading to underestimation of disease severity.
  • Enable comprehensive severity assessment, as tunnel presence indicates advanced disease requiring more aggressive therapeutic interventions.
  • Facilitate longitudinal monitoring of disease progression and treatment response, particularly for therapies targeting tunnel resolution.

Justification (Clinical Evidence):

  • Non-draining tunnels are often underreported in clinical assessments, with detection rates varying significantly between observers (κ = 0.40-0.65) [101].
  • Presence of tunnels (draining or non-draining) is associated with higher disease burden and poorer quality of life outcomes in HS patients [102].
  • Visual assessment of tunnels shows significant inter-observer disagreement, particularly in distinguishing non-draining from draining tunnels [101].
  • Automated detection can improve tunnel identification by analyzing subtle surface irregularities and linear patterns indicative of underlying sinus tracts [101].
Draining Tunnel Lesion Quantification
  • Support accurate identification of draining tunnels, which are the most severe manifestation in hidradenitis suppurativa and the most heavily weighted component in IHS4 scoring (multiplication factor of 4) [102].
  • Reduce assessment variability in detecting active drainage, which can be subtle or intermittent during examination.
  • Enable precise severity stratification, as draining tunnel count is the strongest predictor of severe disease requiring advanced therapeutic interventions.
  • Facilitate treatment monitoring, as reduction in draining tunnels is a key endpoint in HS clinical trials and therapeutic response assessment.

Justification (Clinical Evidence):

  • Draining tunnels carry the highest weight in IHS4 scoring (×4 multiplier), reflecting their role as the most severe disease manifestation [102].
  • Inter-observer agreement for draining tunnel detection ranges from moderate to good (κ = 0.60-0.80), with variability influenced by drainage activity at time of assessment [101].
  • Automated detection systems can identify drainage-associated features including moisture, exudate patterns, and surrounding inflammation with high sensitivity [101].
  • Draining tunnel count is a primary efficacy endpoint in phase 3 clinical trials for HS therapeutics, emphasizing the importance of accurate quantification [102].

Endpoints and Requirements

Performance is evaluated using Mean Absolute Error (MAE) of the predicted counts for each lesion type compared to expert-annotated ground truth, with the expectation that the algorithm achieves performance within or better than the variability among experts.

| Lesion Type | Metric | Threshold | Interpretation |
| --- | --- | --- | --- |
| Nodule | MAE | ≤ Expert Inter-observer Variability | Algorithm counts are on average as close to consensus as individual experts. |
| | Deviation | ≤ 10% of inter-observer variance | Predictions remain within acceptable clinical tolerance. |
| | F1-score | ≥ 0.70 | Acceptable detection performance for nodules. |
| Abscess | MAE | ≤ Expert Inter-observer Variability | Algorithm counts are on average as close to consensus as individual experts. |
| | Deviation | ≤ 10% of inter-observer variance | Predictions remain within acceptable clinical tolerance. |
| | F1-score | ≥ 0.75 | Good detection performance for abscesses. |
| Non-draining Tunnel | MAE | ≤ Expert Inter-observer Variability | Algorithm counts are on average as close to consensus as individual experts. |
| | Deviation | ≤ 10% of inter-observer variance | Predictions remain within acceptable clinical tolerance. |
| | F1-score | ≥ 0.65 | Acceptable detection performance given subtlety of non-draining tunnels. |
| Draining Tunnel | MAE | ≤ Expert Inter-observer Variability | Algorithm counts are on average as close to consensus as individual experts. |
| | Deviation | ≤ 10% of inter-observer variance | Predictions remain within acceptable clinical tolerance. |
| | F1-score | ≥ 0.70 | Acceptable detection performance for draining tunnels. |

All thresholds must be achieved with 95% confidence intervals.

Requirements:

  • Output structured numerical data representing the exact count of each lesion type: nodules, abscesses, non-draining tunnels, and draining tunnels.
  • Demonstrate MAE ≤ inter-observer variability for each lesion type, with a maximum deviation ≤10% of expert variance.
  • Report precision, recall, and F1-score for object detection for each class, meeting the F1 thresholds specified above.
  • Validate performance on independent and diverse datasets, including hidradenitis suppurativa images across disease stages, skin tones, anatomical sites, and acquisition devices.
  • Ensure outputs are compatible with FHIR-based structured reporting for interoperability.
  • Enable automated IHS4 calculation using the formula: IHS4 = (Nodules × 1) + (Abscesses × 2) + (Draining tunnels × 4).
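
The IHS4 requirement in the last bullet is a direct weighted sum. A minimal sketch, with the published IHS4 severity bands (mild ≤ 3, moderate 4-10, severe ≥ 11) included for context:

```python
def ihs4(nodules: int, abscesses: int, draining_tunnels: int) -> tuple[int, str]:
    """IHS4 = (nodules × 1) + (abscesses × 2) + (draining tunnels × 4)."""
    score = nodules * 1 + abscesses * 2 + draining_tunnels * 4
    severity = "mild" if score <= 3 else "moderate" if score <= 10 else "severe"
    return score, severity
```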

Acneiform Lesion Type Quantification

Model Classification: 🔬 Clinical Model

Description

A deep learning multi-class object detection model ingests a clinical image and outputs bounding boxes with associated class labels and confidence scores for each detected acneiform lesion:

$$\mathbf{D} = [(b_1, l_1, c_1), (b_2, l_2, c_2), \ldots, (b_n, l_n, c_n)]$$

where $b_i$ is the bounding box for the $i$-th predicted lesion, $l_i \in [\text{Papule}, \text{Pustule}, \text{Cyst}, \text{Comedone}, \text{Nodule}]$ is the class label, and $c_i \in [0,1]$ is the associated confidence score. After applying non-maximum suppression (NMS) to remove duplicate detections, the algorithm outputs separate counts for each lesion type:

$$\hat{y}_{\text{class}} = \sum_{i=1}^{n} \mathbb{1}[l_i = \text{class} \land c_i \geq \tau]$$

where $\tau$ is a confidence threshold.

This provides objective, reproducible counts of papules, pustules, cysts, comedones, and nodules directly from clinical images, without requiring manual annotation by clinicians. These counts are essential for comprehensive acne severity assessment using validated scoring systems.

Objectives

Papule Lesion Quantification
  • Support healthcare professionals in quantifying papular burden, which is essential for severity assessment in acne vulgaris and other inflammatory dermatoses.
  • Reduce inter-observer and intra-observer variability in papule counting, which is particularly challenging due to their small size and variable appearance.
  • Enable automated severity scoring by integrating papule counts into validated systems such as the Global Acne Grading System (GAGS) and Investigator's Global Assessment (IGA).
  • Ensure reproducibility and robustness across imaging conditions (lighting, orientation, device type).

Justification (Clinical Evidence):

  • Manual papule counting shows significant variability, with reported inter-rater reliability coefficients (ICC) ranging from 0.55 to 0.72 in acne assessment studies [101].
  • Automated detection systems have demonstrated superior accuracy, with CNN-based approaches achieving F1 scores >0.85 specifically for papular lesions [101].
  • Studies comparing AI-based papule counting with expert dermatologist assessments show strong correlation (r > 0.82) and reduced time requirements [101].
  • Deep learning methods incorporating multi-scale feature analysis have shown particular effectiveness in distinguishing papules from other inflammatory lesions, with reported accuracy improvements of 20-30% over traditional assessment methods [101].
Pustule Lesion Quantification
  • Support accurate identification and counting of pustules, which are key inflammatory lesions indicating active infection and requiring differentiation from papules for appropriate treatment selection.
  • Reduce diagnostic variability in distinguishing pustules from other acneiform lesions, improving consistency in severity assessment.
  • Enable precise acne grading, as pustule presence and count are weighted indicators in systems like GAGS and the Acne Severity Index (ASI).
  • Facilitate treatment monitoring, as pustule count reduction is a primary efficacy endpoint in acne clinical trials.

Justification (Clinical Evidence):

  • Manual pustule counting is prone to subjective bias and variability, particularly in moderate to severe acne where pustules may be numerous and closely spaced [101].
  • Automated detection systems have demonstrated high sensitivity and specificity, with CNN-based approaches achieving F1 scores >0.90 for pustular lesions [101].
  • Studies comparing AI-based pustule counting with expert dermatologist assessments show excellent correlation (r > 0.85) and improved efficiency [101].
  • Deep learning methods utilizing spatial attention mechanisms have shown enhanced performance in detecting and counting pustules, with reported accuracy improvements of 15-25% over traditional methods [101].
Cyst Lesion Quantification
  • Support identification of cystic lesions, which represent severe inflammatory acne and are associated with increased risk of scarring and psychological impact.
  • Reduce detection variability, as cysts may be subtle in early stages or confused with deep nodules during clinical examination.
  • Enable severity stratification, as cyst presence indicates severe acne (Grade 4) requiring aggressive therapeutic intervention including systemic treatments.
  • Facilitate treatment decision-making, as cystic acne influences therapeutic choices including isotretinoin consideration.

Justification (Clinical Evidence):

  • Cystic acne represents the most severe form of inflammatory acne and is associated with significant scarring risk, requiring accurate identification for appropriate treatment escalation [101].
  • Inter-observer variability in distinguishing cysts from large nodules ranges from moderate to substantial (κ = 0.50-0.70) [101].
  • Automated detection systems can identify cysts based on visual features such as size (>5mm), depth appearance, and characteristic fluctuant quality with >80% accuracy [101].
  • Accurate cyst quantification is critical for treatment monitoring in severe acne management and clinical trials [101].
Comedone Lesion Quantification
  • Support identification and counting of comedones (both open and closed), which are the primary non-inflammatory lesions in acne and indicate follicular obstruction.
  • Reduce assessment variability in comedone detection, which can be challenging for closed comedones (whiteheads) due to their subtle appearance.
  • Enable comprehensive acne assessment, as comedone count is a key component in acne grading systems and indicates the need for comedolytic therapy.
  • Facilitate treatment monitoring, particularly for retinoid therapy where comedone reduction is a primary endpoint.

Justification (Clinical Evidence):

  • Comedones are often undercounted in clinical assessments, with detection rates varying significantly between observers, particularly for closed comedones (κ = 0.45-0.65) [101].
  • Automated detection systems using texture analysis and contrast enhancement have achieved >85% accuracy in identifying both open and closed comedones [101].
  • Deep learning methods can distinguish comedones from other acneiform lesions by analyzing pore appearance, surface texture, and coloration patterns [101].
  • Comedone count is a critical endpoint in retinoid efficacy trials, emphasizing the importance of accurate quantification [101].
Nodule Lesion Quantification
  • Support accurate identification of acne nodules, which are solid inflammatory lesions >5mm that indicate moderate to severe acne.
  • Reduce assessment variability in distinguishing nodules from papules (based on size threshold) and from cysts (based on solid vs. fluid-filled character).
  • Enable precise severity grading, as nodule count is a major component in acne severity classification systems.
  • Facilitate treatment monitoring and therapeutic decision-making, as nodular acne typically requires systemic therapy.

Justification (Clinical Evidence):

  • Inter-observer agreement for nodule detection and sizing shows moderate reliability (ICC = 0.58-0.75), with particular variability at the papule-nodule size threshold (5mm) [101].
  • Automated detection systems can provide objective size measurement and consistent classification, reducing the subjectivity inherent in visual estimation [101].
  • CNN-based approaches have demonstrated >80% accuracy in distinguishing nodules from papules and cysts based on visual and textural features [101].
  • Nodule count is a weighted component in multiple acne severity scoring systems, requiring accurate quantification for proper severity stratification [101].

Endpoints and Requirements

Performance is evaluated using Mean Absolute Error (MAE) of the predicted counts for each lesion type compared to expert-annotated ground truth, with the expectation that the algorithm achieves performance within or better than the variability among experts.

| Lesion Type | Metric | Threshold | Interpretation |
| --- | --- | --- | --- |
| Papule | MAE | ≤ Expert Inter-observer Variability | Algorithm counts are on average as close to consensus as individual experts. |
| | Deviation | ≤ 10% of inter-observer variance | Predictions remain within acceptable clinical tolerance. |
| | F1-score | ≥ 0.85 | High detection performance for papules. |
| Pustule | MAE | ≤ Expert Inter-observer Variability | Algorithm counts are on average as close to consensus as individual experts. |
| | Deviation | ≤ 10% of inter-observer variance | Predictions remain within acceptable clinical tolerance. |
| | F1-score | ≥ 0.90 | Excellent detection performance for pustules. |
| Cyst | MAE | ≤ Expert Inter-observer Variability | Algorithm counts are on average as close to consensus as individual experts. |
| | Deviation | ≤ 10% of inter-observer variance | Predictions remain within acceptable clinical tolerance. |
| | F1-score | ≥ 0.80 | Good detection performance for cysts. |
| Comedone | MAE | ≤ Expert Inter-observer Variability | Algorithm counts are on average as close to consensus as individual experts. |
| | Deviation | ≤ 10% of inter-observer variance | Predictions remain within acceptable clinical tolerance. |
| | F1-score | ≥ 0.75 | Acceptable detection performance given subtlety of closed comedones. |
| Nodule | MAE | ≤ Expert Inter-observer Variability | Algorithm counts are on average as close to consensus as individual experts. |
| | Deviation | ≤ 10% of inter-observer variance | Predictions remain within acceptable clinical tolerance. |
| | F1-score | ≥ 0.80 | Good detection performance for nodules. |

All thresholds must be achieved with 95% confidence intervals.

Requirements:

  • Output structured numerical data representing the exact count of each lesion type: papules, pustules, cysts, comedones, and nodules.
  • Demonstrate MAE ≤ inter-observer variability for each lesion type, with a maximum deviation ≤10% of expert variance.
  • Report precision, recall, and F1-score for object detection for each class, meeting the F1 thresholds specified above.
  • Validate performance on independent and diverse datasets, including acne images across severity grades, skin tones (Fitzpatrick types I-VI), anatomical sites (face, back, chest), and acquisition devices.
  • Ensure outputs are compatible with FHIR-based structured reporting for interoperability.
  • Enable automated acne severity scoring including calculation of validated indices such as GAGS, IGA, and ASI based on the lesion counts.
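
As one example of the severity-scoring requirement above, a GAGS computation can be driven directly by the detected lesion types per region. This sketch uses the regional factors and 0-4 grades from the commonly published GAGS definition; the region keys, and treating cysts as grade 4 alongside nodules, are assumptions:

```python
# Regional weighting factors as commonly published for GAGS (assumed here).
GAGS_FACTORS = {
    "forehead": 2, "right_cheek": 2, "left_cheek": 2,
    "nose": 1, "chin": 1, "chest_and_upper_back": 3,
}

# Severity grade per region from the most severe lesion type present;
# treating cysts as grade 4 alongside nodules is an assumption.
LESION_GRADE = {"comedone": 1, "papule": 2, "pustule": 3, "nodule": 4, "cyst": 4}

def gags_score(lesions_by_region: dict[str, set[str]]) -> int:
    """Global GAGS score: sum over regions of factor × local grade (0-44)."""
    total = 0
    for region, factor in GAGS_FACTORS.items():
        types = lesions_by_region.get(region, set())
        grade = max((LESION_GRADE[t] for t in types), default=0)
        total += factor * grade
    return total
```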

Inflammatory Lesion Quantification

Model Classification: 🔬 Clinical Model

Description

A deep learning object detection model ingests a clinical image and outputs bounding boxes with associated confidence scores for detected inflammatory lesions:

$$\mathbf{D} = [(b_1, c_1), (b_2, c_2), \ldots, (b_n, c_n)]$$

where $b_i$ is the bounding box for the $i$-th predicted inflammatory lesion, and $c_i \in [0,1]$ is the associated confidence score. After applying non-maximum suppression (NMS) to remove duplicate detections, the algorithm outputs the total count of inflammatory lesions:

$$\hat{y} = \sum_{i=1}^{n} \mathbb{1}[c_i \geq \tau]$$

where $\tau$ is a confidence threshold.

This provides objective, reproducible counts of inflammatory lesions directly from clinical images, without requiring manual annotation by clinicians.

Objectives

  • Support healthcare professionals in quantifying inflammatory lesion burden, which is essential for severity assessment in conditions such as psoriasis, atopic dermatitis, rosacea, and other inflammatory dermatoses.
  • Reduce inter-observer and intra-observer variability in lesion counting, which is well documented in clinical practice and clinical trials.
  • Enable automated severity scoring by integrating inflammatory lesion counts into composite indices such as PASI (Psoriasis Area and Severity Index), EASI (Eczema Area and Severity Index), and disease-specific scoring systems.
  • Ensure reproducibility and robustness across imaging conditions (lighting, orientation, device type, anatomical sites).
  • Facilitate longitudinal monitoring of disease activity and treatment response by providing consistent lesion quantification over time.

Justification (Clinical Evidence):

  • Clinical guidelines emphasize lesion counts as a cornerstone for severity assessment in inflammatory dermatoses, but manual counting shows significant inter-observer variability (ICC 0.45-0.70) [163, 164].
  • Human counting is prone to fatigue and subjective error, with discrepancies particularly evident in high lesion count scenarios or when lesions are clustered [165].
  • Automated counting has shown high accuracy: AI-based inflammatory lesion counting achieved F1 scores >0.80 in validation studies across multiple inflammatory conditions [166].
  • Object detection approaches using CNNs and attention mechanisms are validated in lesion-counting tasks, offering superior reproducibility compared to human raters [167].
  • Objective lesion quantification improves treatment response assessment, with studies showing 30-40% reduction in assessment time while maintaining or improving accuracy [168].

Endpoints and Requirements

Performance is evaluated using Mean Absolute Error (MAE) of the predicted counts compared to expert-annotated ground truth, with the expectation that the algorithm achieves performance within or better than the variability among experts.

| Metric | Threshold | Interpretation |
| --- | --- | --- |
| MAE | ≤ Expert Inter-observer Variability | Algorithm counts are on average as close to consensus as individual experts. |
| Deviation | ≤ 10% of inter-observer variance | Predictions remain within acceptable clinical tolerance. |
| F1-score | ≥ 0.80 | Good detection performance for inflammatory lesions. |
| Precision | ≥ 0.75 | Acceptable false positive rate for lesion detection. |
| Recall | ≥ 0.75 | Acceptable sensitivity for lesion detection. |

All thresholds must be achieved with 95% confidence intervals.

Requirements:

  • Output structured numerical data representing the total count of inflammatory lesions.
  • Demonstrate MAE ≤ inter-observer variability, with a maximum deviation ≤10% of expert variance.
  • Report precision, recall, and F1-score for object detection, meeting the thresholds specified above.
  • Validate performance on independent and diverse datasets, including:
    • Multiple inflammatory conditions (psoriasis, atopic dermatitis, rosacea, acne, seborrheic dermatitis)
    • Various disease severities (mild, moderate, severe)
    • Diverse patient populations (various Fitzpatrick skin types I-VI, ages, comorbidities)
    • Multiple anatomical sites (face, trunk, extremities, scalp, intertriginous areas)
    • Various imaging conditions and acquisition devices
  • Handle high lesion density scenarios where lesions may be closely spaced or confluent.
  • Ensure outputs are compatible with:
    • FHIR-based structured reporting for interoperability
    • Automated severity scoring systems (PASI, EASI, disease-specific indices)
    • Clinical decision support systems for treatment selection and monitoring
  • Provide confidence scores for each detected lesion to enable quality assessment and manual review when needed.
  • Document the detection strategy including:
    • Handling of lesion size variability (from small papules to large plaques)
    • Management of overlapping or confluent lesions
    • Approach to lesion boundary definition
    • Quality control mechanisms for low-confidence detections
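
One way to realize the confidence-score and quality-control requirements above is a two-threshold gate: high-confidence detections are counted automatically, while a gray zone is flagged for manual review. A minimal sketch with illustrative thresholds:

```python
def triage_detections(detections, accept_tau=0.60, review_tau=0.35):
    """Split detections into auto-counted and flagged-for-review groups.

    `detections` is a list of (box, confidence) pairs; confidences in
    [review_tau, accept_tau) are surfaced for manual review instead of
    being silently counted or discarded.
    """
    counted = [d for d in detections if d[1] >= accept_tau]
    review = [d for d in detections if review_tau <= d[1] < accept_tau]
    return {"count": len(counted), "counted": counted, "needs_review": review}
```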

Hive Lesion Quantification

Model Classification: 🔬 Clinical Model

Description

A deep learning object detection model ingests a clinical image and outputs bounding boxes with associated confidence scores for each detected hive:

$$\mathbf{D} = [(b_1, c_1), (b_2, c_2), \ldots, (b_n, c_n)]$$

where $b_i$ is the bounding box for the $i$-th predicted hive, and $c_i \in [0,1]$ is the associated confidence score. After applying non-maximum suppression (NMS) to remove duplicate detections, the algorithm outputs the total count of hives:

$$\hat{y} = \sum_{i=1}^{n} \mathbb{1}[c_i \geq \tau]$$

where $\tau$ is a confidence threshold.

This provides objective, reproducible counts of urticarial wheals (hives) directly from clinical images, without requiring manual annotation by clinicians.

Objectives

  • Support healthcare professionals in quantifying urticaria severity by providing an objective, reproducible count of hives.
  • Reduce inter-observer and intra-observer variability in hive counting, which is particularly challenging due to the transient and variable nature of urticarial lesions.
  • Enable automated severity scoring by integrating hive counts into validated systems such as the Urticaria Activity Score (UAS7) and Urticaria Control Test (UCT).
  • Ensure reproducibility and robustness across imaging conditions, as urticaria presentation varies widely in size, shape, and confluence.
  • Facilitate treatment monitoring by providing consistent lesion quantification for assessing response to antihistamines, biologics, or other therapeutic interventions.
  • Support clinical trials by providing standardized, objective endpoints for urticaria severity assessment.

Justification (Clinical Evidence):

  • Urticaria severity assessment relies heavily on wheal counting, but manual counting shows significant variability, with inter-observer agreement (κ) ranging from 0.40 to 0.65 [169, 170].
  • The Urticaria Activity Score (UAS7) is a validated tool that requires daily wheal counting over 7 days, but patient self-assessment shows poor reliability (ICC 0.45-0.60) compared to clinician assessment [171].
  • Hives are transient lesions that can change rapidly in size, shape, and number, making consistent quantification challenging without objective tools [172].
  • Automated hive detection has shown promising accuracy in preliminary studies, with CNN-based approaches achieving F1 scores >0.75 for wheal detection [173].
  • Objective quantification addresses a major unmet need in urticaria management, where treatment decisions rely on subjective patient reporting and inconsistent clinical assessment [174].
  • Studies show that standardized photography combined with automated counting improves treatment response assessment and reduces subjective bias in clinical trials [175].

Endpoints and Requirements

Performance is evaluated using Mean Absolute Error (MAE) of the predicted counts compared to expert-annotated ground truth, with the expectation that the algorithm achieves performance within or better than the variability among experts.

| Metric | Threshold | Interpretation |
| --- | --- | --- |
| MAE | ≤ Expert Inter-observer Variability | Algorithm counts are on average as close to consensus as individual experts. |
| Deviation | ≤ 15% of inter-observer variance | Predictions remain within acceptable clinical tolerance. |
| F1-score | ≥ 0.75 | Acceptable detection performance for hives. |
| Precision | ≥ 0.70 | Acceptable false positive rate for hive detection. |
| Recall | ≥ 0.75 | Acceptable sensitivity for hive detection. |

All thresholds must be achieved with 95% confidence intervals.

Requirements:

  • Output structured numerical data representing the total count of hives (wheals).
  • Demonstrate MAE ≤ inter-observer variability, with a maximum deviation ≤15% of expert variance.
  • Report precision, recall, and F1-score for object detection, meeting the thresholds specified above.
  • Validate performance on independent and diverse datasets, including:
    • Various urticaria types (acute, chronic spontaneous urticaria, physical urticaria)
    • Different wheal morphologies (small discrete wheals, large confluent plaques, annular patterns)
    • Various disease severities (mild, moderate, severe based on UAS7 categories)
    • Diverse patient populations (various Fitzpatrick skin types I-VI, ages)
    • Multiple anatomical sites (trunk, extremities, face)
    • Various imaging conditions and acquisition devices
  • Handle challenging scenarios including:
    • Confluent wheals where boundaries are indistinct
    • Partially visible wheals at image edges
    • Wheals with varying degrees of erythema and edema
    • Background erythema or dermographism
  • Ensure outputs are compatible with:
    • UAS7 (Urticaria Activity Score) calculation: wheal count scoring (0 = none, 1 = <20, 2 = 20-50, 3 = >50 or large confluent)
    • FHIR-based structured reporting for interoperability
    • Clinical decision support systems for urticaria management
    • Patient monitoring applications for home-based assessment
  • Provide size categorization of detected wheals when possible:
    • Small wheals (<1cm diameter)
    • Medium wheals (1-3cm diameter)
    • Large wheals (>3cm diameter)
  • Document the detection strategy including:
    • Handling of confluent vs. discrete wheals
    • Approach to wheal boundary definition in cases of indistinct margins
    • Management of partially visible or edge-case wheals
    • Quality control mechanisms for uncertain detections
  • Provide confidence scoring to enable manual review of uncertain detections and support clinical validation.
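
The UAS7 compatibility requirement maps a wheal count onto the 0-3 bands listed above. A minimal sketch, where the `large_confluent` flag stands in for the qualitative ">50 or large confluent" criterion:

```python
def uas_wheal_subscore(count: int, large_confluent: bool = False) -> int:
    """Daily UAS wheal sub-score from a hive count, per the bands above."""
    if large_confluent or count > 50:
        return 3
    if count >= 20:
        return 2
    if count >= 1:
        return 1
    return 0
```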

Nail Lesion Surface Quantification

Model Classification: 🔬 Clinical Model

Description

A deep learning segmentation model ingests a clinical image of the nail and outputs a three-class probability map for each pixel:

$$M(x, y) \in [\text{Healthy Nail}, \text{Lesion}, \text{Background}], \quad \forall (x, y) \in \text{Image}$$

  • Healthy Nail = nail region without lesions or disease manifestations
  • Lesion = nail region with visible pathological changes (discoloration, pitting, onycholysis, subungual hyperkeratosis, etc.)
  • Background = non-nail area including skin, surrounding tissue, or any background elements

From this segmentation, the algorithm computes the percentage of nail surface affected by lesions relative to the total nail area:

$$\hat{y} = \frac{\sum_{(x,y)} \mathbb{1}[M(x, y) = \text{Lesion}]}{\sum_{(x,y)} \mathbb{1}[M(x, y) \in [\text{Healthy Nail}, \text{Lesion}]]} \times 100$$

This provides an objective and reproducible measure of nail disease extent, excluding background and non-nail regions.

Objectives

  • Support healthcare professionals by providing precise and reproducible quantification of nail disease extent.
  • Reduce subjectivity in nail severity assessment, particularly for conditions such as nail psoriasis (NAPSI - Nail Psoriasis Severity Index), onychomycosis, and nail lichen planus.
  • Enable automatic calculation of validated severity scores directly from images, improving consistency across assessments.
  • Improve robustness by excluding non-nail regions, ensuring consistent results across varied image framing and positioning.
  • Facilitate standardized evaluation in clinical practice and trials where manual nail assessment introduces significant variability.
  • Support longitudinal monitoring of treatment response in nail diseases, which typically show slow progression requiring objective tracking.

Justification (Clinical Evidence):

  • Nail disease evaluation is extent-based (percentage of nail surface involved), making objective measurement critical for severity assessment [176, 177].
  • Manual estimation of nail involvement shows substantial inter-observer variability, with reported κ values of 0.35-0.60 for NAPSI scoring, particularly for subtle manifestations [178, 179].
  • The Nail Psoriasis Severity Index (NAPSI) and similar scales rely on visual estimation of affected area, which shows poor reproducibility between assessors [180].
  • Deep learning segmentation methods have demonstrated superior consistency compared to manual assessment in nail disease quantification [181].
  • Automated nail lesion quantification addresses the clinical challenge of slow disease progression, where subjective assessment may miss subtle changes important for treatment response evaluation [182].
  • Studies validating AI-based nail assessment show strong correlation (r > 0.80) with expert consensus while significantly reducing assessment time [183].

Endpoints and Requirements

Performance is evaluated using Intersection over Union (IoU) for nail segmentation and Relative Error (RE%) for percentage nail lesion area compared to expert annotations.

| Metric | Threshold | Interpretation |
| --- | --- | --- |
| IoU (Nail segmentation) | ≥ 0.60 | Good segmentation of nail vs. background achieves clinical utility. |
| Relative Error (Lesion area %) | ≤ 20% | Predicted lesion percentage deviates ≤ 20% from expert consensus. |
| Pixel Accuracy (within nail) | ≥ 0.75 | Acceptable classification accuracy for healthy vs. lesion nail pixels. |

Success criteria: The algorithm must achieve IoU ≥ 0.60 for nail segmentation, RE ≤ 20% for lesion percentage estimation, and Pixel Accuracy ≥ 0.75 within the nail region, with 95% confidence intervals.

Requirements:

  • Perform three-class segmentation (Healthy Nail, Lesion, Background).
  • Compute percentage of nail area affected by lesions relative to total nail surface.
  • Demonstrate IoU ≥ 0.60 for nail segmentation, RE ≤ 20% for lesion quantification, and Pixel Accuracy ≥ 0.75 compared to expert consensus.
  • Validate on diverse datasets including:
    • Multiple nail pathologies (psoriasis, onychomycosis, lichen planus, trauma, melanonychia)
    • Various nail locations (fingernails, toenails)
    • Different lesion types (pitting, onycholysis, discoloration, hyperkeratosis, splinter hemorrhages)
    • Diverse patient populations (various skin types, ages)
    • Multiple imaging conditions (lighting, angles, devices)
  • Handle challenging scenarios including:
    • Nails with multiple simultaneous pathologies
    • Subtle early-stage lesions with minimal visual contrast
    • Distal nail involvement where nail-background boundaries are ambiguous
    • Artificial nails, nail polish, or external artifacts
  • Ensure outputs are compatible with:
    • NAPSI (Nail Psoriasis Severity Index) calculation and interpretation
    • FHIR-based structured reporting for interoperability
    • Clinical decision support systems for nail disease management
    • Longitudinal tracking systems for treatment response monitoring
  • Provide detailed output including:
    • Total nail surface area (when calibration available)
    • Percentage of nail affected by lesions
    • Spatial distribution of lesions (proximal, middle, distal nail regions)
    • Confidence maps indicating segmentation certainty
  • Document the segmentation strategy including:
    • Handling of nail plate boundaries and cuticle regions
    • Approach to distinguishing subtle lesions from healthy nail variations
    • Management of image quality issues (blur, glare, poor lighting)
    • Quality control mechanisms for low-confidence segmentations

Hypopigmentation or Depigmentation Surface Quantification

Model Classification: 🔬 Clinical Model

Description

A deep learning segmentation model ingests a clinical image of the skin and outputs a three-class probability map for each pixel:

$$M(x, y) \in [\text{Normal Skin}, \text{Hypopigmented or Depigmented}, \text{Background}], \quad \forall (x, y) \in \text{Image}$$

  • Normal Skin = skin region with normal pigmentation matching the patient's baseline skin tone
  • Hypopigmented or Depigmented = skin region with reduced melanin (hypopigmentation) or complete absence of melanin (depigmentation), detected together without distinction
  • Background = non-skin area including clothing, hair, or any background elements

From this segmentation, the algorithm computes the percentage of skin surface affected by pigmentary loss relative to the total skin area:

$$\hat{y} = \frac{\sum_{(x,y)} \mathbb{1}[M(x, y) = \text{Hypopigmented or Depigmented}]}{\sum_{(x,y)} \mathbb{1}[M(x, y) \in [\text{Normal Skin}, \text{Hypopigmented or Depigmented}]]} \times 100$$

This provides an objective and reproducible measure of pigmentary disorder extent, excluding background and non-skin regions.

Objectives

  • Support healthcare professionals by providing precise and reproducible quantification of pigmentary loss extent.
  • Reduce subjectivity in pigmentary disorder assessment, particularly for conditions such as vitiligo (VASI - Vitiligo Area Scoring Index, VETF - Vitiligo European Task Force), post-inflammatory hypopigmentation, pityriasis alba, and chemical leukoderma.
  • Enable automatic calculation of validated severity scores directly from images, including VASI and VETF scoring systems.
  • Improve robustness by excluding non-skin regions, ensuring consistent results across varied image framing, body sites, and baseline skin tones.
  • Facilitate standardized evaluation in clinical practice and trials where manual assessment of pigmentary changes introduces significant variability.
  • Support longitudinal monitoring of treatment response, particularly for repigmentation therapies in vitiligo.

Justification (Clinical Evidence):

  • Pigmentary disorder evaluation is extent-based (percentage of body surface involved), making objective measurement critical for severity assessment and treatment monitoring [184, 185].
  • Manual estimation of vitiligo and other pigmentary disorder extent shows substantial inter-observer variability, with reported κ values of 0.40-0.65 for VASI scoring [186, 187].
  • The Vitiligo Area Scoring Index (VASI) relies on visual estimation of affected area, which shows poor reproducibility between assessors and limited sensitivity to detect small changes [188].
  • Deep learning segmentation methods have demonstrated superior consistency compared to manual assessment in vitiligo extent quantification, with strong correlation (r > 0.85) to expert assessment [191].
  • Automated quantification addresses the clinical challenge of detecting subtle repigmentation during treatment, which may be missed by subjective visual assessment [192].
  • Studies show that objective vitiligo quantification improves early detection of treatment response, enabling timely therapy modifications [193].
  • Baseline skin tone variability across Fitzpatrick types introduces additional complexity in manual assessment that objective methods can address through normalization [194].

Endpoints and Requirements

Performance is evaluated using Intersection over Union (IoU) for skin segmentation and Relative Error (RE%) for percentage pigmentary loss area compared to expert annotations.

| Metric | Threshold | Interpretation |
| --- | --- | --- |
| IoU (Skin segmentation) | ≥ 0.65 | Good segmentation of skin vs. background achieves clinical utility. |
| Relative Error (Pigmentary loss %) | ≤ 20% | Predicted pigmentary loss percentage deviates ≤ 20% from expert consensus. |
| Pixel Accuracy (within skin) | ≥ 0.75 | Acceptable classification accuracy for normal vs. hypopigmented/depigmented skin. |
| Class-specific IoU (Pigmentary loss) | ≥ 0.60 | Good detection of hypopigmented or depigmented areas. |

Success criteria: The algorithm must achieve IoU ≥ 0.65 for skin segmentation, RE ≤ 20% for pigmentary loss percentage estimation, Pixel Accuracy ≥ 0.75, and class-specific IoU ≥ 0.60, with 95% confidence intervals.

Requirements:

  • Perform three-class segmentation (Normal Skin, Hypopigmented or Depigmented, Background).
  • Compute percentage of skin area affected by pigmentary loss (hypopigmentation or depigmentation combined).
  • Demonstrate IoU ≥ 0.65 for overall skin segmentation, RE ≤ 20% for pigmentary loss quantification, Pixel Accuracy ≥ 0.75, and class-specific IoU ≥ 0.60 compared to expert consensus.
  • Validate on diverse datasets including:
    • Multiple pigmentary disorders (vitiligo, post-inflammatory hypopigmentation, pityriasis alba, chemical leukoderma, hypopigmented mycosis fungoides)
    • Various baseline skin tones (Fitzpatrick types I-VI)
    • Different anatomical sites (face, hands, trunk, extremities, acral areas)
    • Various disease stages (early, progressive, stable, repigmentation)
    • Diverse patient populations (ages, ethnicities)
    • Multiple imaging conditions (natural light, clinical photography, Wood's lamp when applicable)
  • Handle challenging scenarios including:
    • Subtle pigmentary loss on light skin (Fitzpatrick I-II)
    • Perifollicular repigmentation (small dots of repigmentation within affected patches)
    • Mixed patterns with varying degrees of pigmentary loss
    • Confetti-like or scattered macules
    • Sun-exposed vs. non-exposed skin tone variations
  • Ensure outputs are compatible with:
    • VASI (Vitiligo Area Scoring Index) calculation: body site-specific involvement percentages
    • VETF (Vitiligo European Task Force) assessment guidelines
    • Rule of Nines for body surface area estimation
    • FHIR-based structured reporting for interoperability
    • Clinical decision support systems for vitiligo and pigmentary disorder management
    • Longitudinal tracking systems for repigmentation monitoring
  • Provide detailed output including:
    • Total skin surface area evaluated (when calibration available)
    • Percentage of skin with pigmentary loss
    • Body site-specific involvement (when body site is specified or detected)
    • Repigmentation indicators (reduction in affected area over time)
    • Confidence maps indicating segmentation certainty
  • Implement skin tone normalization strategies:
    • Adapt detection thresholds based on baseline skin tone (Fitzpatrick type)
    • Account for natural skin tone variation within the same patient
    • Use reference normal skin regions when available in the image
  • Document the segmentation strategy including:
    • Approach to detecting pigmentary loss across different skin tones
    • Handling of perifollicular repigmentation and mixed patterns
    • Management of lighting variations and image quality issues
    • Skin tone normalization methodology
    • Quality control mechanisms for low-confidence segmentations
    • Handling of hair, tattoos, and other confounding factors
  • Enable longitudinal comparison features:
    • Track changes in pigmentary loss area over time
    • Detect repigmentation patterns (marginal, perifollicular, diffuse)
    • Calculate repigmentation rate for treatment efficacy assessment
    • Flag new areas of pigmentary loss (disease progression)
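
The longitudinal-comparison requirements reduce to differencing affected percentages between visits. A minimal sketch, with illustrative naming and a per-30-day normalization so uneven follow-up intervals remain comparable:

```python
from datetime import date

def repigmentation_summary(pct_before: float, pct_after: float,
                           d0: date, d1: date) -> dict:
    """Change in pigmentary-loss surface between two assessments.

    A positive `repigmented_pct` means the affected area shrank; the rate
    is normalized per 30 days so uneven follow-up intervals compare.
    """
    days = (d1 - d0).days
    if days <= 0:
        raise ValueError("second assessment must postdate the first")
    delta = pct_before - pct_after  # percentage points recovered
    return {
        "repigmented_pct": delta,
        "rate_per_30_days": 30.0 * delta / days,
        "progression_flag": delta < 0,  # new pigmentary loss appeared
    }
```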

Acneiform Inflammatory Pattern Identification

Model Classification: 🔬 Clinical Model

Description

A machine learning classification model ingests tabular features derived from the Inflammatory Lesion Quantification algorithm and outputs a probability distribution over Investigator's Global Assessment (IGA) severity categories:

$$\mathbf{p}_{\text{IGA}} = [p_0, p_1, p_2, p_3, p_4]$$

where each $p_i$ corresponds to the probability that the acne severity belongs to IGA category $i$:

  • Grade 0: Clear (no inflammatory lesions)
  • Grade 1: Almost Clear (rare non-inflammatory lesions, no inflammatory lesions)
  • Grade 2: Mild (some non-inflammatory lesions, few inflammatory lesions)
  • Grade 3: Moderate (many non-inflammatory lesions, some inflammatory lesions)
  • Grade 4: Severe (covered with non-inflammatory lesions, many inflammatory lesions)

The model inputs are numerical features including:

  • Total inflammatory lesion count from the Inflammatory Lesion Quantification algorithm
  • Lesion density (lesions per unit area) from the Inflammatory Lesion Quantification algorithm
  • Additional contextual features such as anatomical site, affected surface area (when available)

The predicted IGA grade is:

$$\hat{y}_{\text{IGA}} = \arg\max_{i \in [0, 1, 2, 3, 4]} p_i$$

This model performs tabular classification rather than image analysis, using structured numerical outputs from upstream computer vision models to make severity assessments aligned with standardized clinical grading systems.
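
A minimal sketch of such a tabular classifier, using scikit-learn gradient boosting purely as an example; the document does not prescribe a library, and the toy training rows below are illustrative placeholders, not real data:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

# Toy feature rows: [inflammatory lesion count, lesion density per unit area,
# affected surface area] -- illustrative placeholders, not real data.
X_train = np.array([
    [0, 0.0, 0.0], [3, 0.4, 2.1], [12, 1.8, 6.5],
    [28, 3.9, 11.0], [55, 7.2, 18.4],
])
y_train = np.array([0, 1, 2, 3, 4])  # expert-assigned IGA grades

clf = GradientBoostingClassifier(random_state=0).fit(X_train, y_train)

case = np.array([[17, 2.5, 8.0]])
p_iga = clf.predict_proba(case)[0]           # probability distribution over grades 0-4
grade = int(clf.classes_[np.argmax(p_iga)])  # arg-max rule from the description
```

In practice the training set would be expert-labeled cases, and an ordinal-aware loss or post-hoc calibration would be layered on top to satisfy the ordinal-handling requirements listed below.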

Objectives

  • Support healthcare professionals in providing standardized acne severity assessment using the validated Investigator's Global Assessment (IGA) scale.
  • Reduce inter-observer variability in IGA scoring, which shows moderate agreement (κ = 0.50-0.70) between raters in clinical practice [195, 196].
  • Enable automated severity classification by translating objective lesion counts and density into clinically meaningful IGA categories.
  • Ensure reproducibility by basing severity assessment on quantitative features rather than subjective visual impression.
  • Facilitate treatment decision-making by providing standardized severity grades that align with evidence-based treatment guidelines (e.g., topical therapy for mild, systemic therapy for severe).
  • Support clinical trial endpoints by providing consistent, reproducible IGA assessments as required by regulatory agencies.

Justification (Clinical Evidence):

  • The IGA scale is a widely validated tool for acne severity assessment and is the most commonly used primary endpoint in acne clinical trials [197, 198].
  • Manual IGA assessment shows substantial inter-observer variability (κ = 0.50-0.70), with particular difficulty in distinguishing between adjacent grades [195, 196].
  • Objective lesion counting combined with algorithmic severity classification has been shown to improve consistency (κ improvement to 0.75-0.85) compared to purely visual IGA assessment [199].
  • Treatment guidelines are explicitly linked to IGA grades, with clear recommendations for topical monotherapy (IGA 1-2), combination therapy (IGA 2-3), and systemic therapy consideration (IGA 3-4) [200].
  • Regulatory agencies require validated severity measures for acne trials, with IGA being the most accepted scale for primary efficacy endpoints [201].
  • Studies show that automated severity grading reduces assessment time by 40-60% while maintaining or improving accuracy compared to manual grading [202].

Endpoints and Requirements

Performance is evaluated using accuracy, weighted kappa (κw), and class-specific metrics compared to expert dermatologist IGA assessments.

| Metric | Threshold | Interpretation |
| --- | --- | --- |
| Overall Accuracy | ≥ 70% | Correct IGA classification in 7 out of 10 cases. |
| Weighted Kappa (κw) | ≥ 0.65 | Substantial agreement with expert IGA assessment. |
| Adjacent Grade Accuracy | ≥ 85% | Within one grade of expert assessment (clinically acceptable). |
| Macro F1-score | ≥ 0.65 | Balanced performance across all IGA grades. |
| Class-specific F1 (Grade) | ≥ 0.60 | Minimum acceptable F1 for each individual IGA grade (0-4). |
| Mean Absolute Error (MAE) | ≤ 0.5 | Average error less than half a grade from expert consensus. |

All thresholds must be achieved with 95% confidence intervals.
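
For reference, these endpoint metrics can be computed as below; using quadratic weights for κw is an assumption, since the document does not state the weighting scheme:

```python
import numpy as np
from sklearn.metrics import cohen_kappa_score

def iga_endpoint_metrics(y_true, y_pred) -> dict:
    """Accuracy, weighted kappa, adjacent-grade accuracy, and MAE."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return {
        "accuracy": float((y_true == y_pred).mean()),
        "weighted_kappa": float(cohen_kappa_score(y_true, y_pred,
                                                  weights="quadratic")),
        "adjacent_grade_accuracy": float((np.abs(y_true - y_pred) <= 1).mean()),
        "mae": float(np.abs(y_true - y_pred).mean()),
    }
```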

Requirements:

  • Implement a tabular classification model (e.g., gradient boosting, random forest, neural network, or other ML classifier) that:
    • Accepts numerical inputs from Inflammatory Lesion Quantification:
      • Total inflammatory lesion count
      • Lesion density (lesions per unit area)
      • Optional: anatomical site identifier, affected surface area
    • Outputs a probability distribution over 5 IGA grades (0-4)
    • Provides the predicted IGA grade and associated confidence scores
  • Demonstrate performance meeting or exceeding all thresholds:
    • Overall Accuracy ≥ 70%
    • Weighted Kappa ≥ 0.65 (substantial agreement)
    • Adjacent Grade Accuracy ≥ 85% (within one grade tolerance)
    • Macro F1 ≥ 0.65 and class-specific F1 ≥ 0.60 for all grades
    • MAE ≤ 0.5 grades
  • Report all metrics with 95% confidence intervals and confusion matrices showing distribution of predictions across IGA grades.
  • Validate the model on an independent and diverse test dataset including:
    • Full range of IGA grades (0-4) with balanced representation
    • Various acne presentations (facial, truncal)
    • Diverse patient populations (various Fitzpatrick skin types, ages, genders)
    • Different anatomical sites (face, back, chest)
    • Data from multiple imaging devices and clinical settings
  • Handle ordinal nature of IGA scale:
    • Implement ordinal classification techniques or apply ordinal loss functions
    • Penalize distant grade errors more heavily than adjacent grade errors
    • Ensure predictions respect the natural ordering of severity grades
  • Ensure outputs are compatible with:
    • FHIR-based structured reporting for interoperability
    • Clinical decision support systems for acne treatment recommendations
    • Treatment guidelines that specify interventions based on IGA grade
    • Clinical trial data collection systems requiring standardized IGA assessments
  • Provide interpretability features:
    • Feature importance scores showing contribution of lesion count vs. density
    • Threshold values for lesion count/density associated with grade transitions
    • Confidence intervals for predictions to flag uncertain cases requiring manual review
  • Document the model training strategy including:
    • Feature engineering approach (e.g., normalization, binning of lesion counts)
    • Handling of class imbalance (if present in training data)
    • Hyperparameter optimization methodology
    • Cross-validation strategy to ensure robust performance estimates
    • Rationale for model selection (if multiple architectures compared)
  • Provide evidence that:
    • The model generalizes across different patient populations and anatomical sites
    • Performance is consistent across all IGA grades (no systematic bias toward certain grades)
    • The model maintains performance with varying lesion densities (low to very high)
    • Predictions align with dermatologist consensus and clinical treatment guidelines
  • Include failure mode analysis:
    • Identify scenarios where model performance degrades (e.g., borderline cases between grades)
    • Establish confidence thresholds for automatic vs. manual review recommendations
    • Document expected performance in edge cases (e.g., very low or very high lesion counts)

Follicular and Inflammatory Pattern Identification

Model Classification: 🔬 Clinical Model

Description

A deep learning multi-class classification model ingests clinical images of skin lesions and outputs a probability distribution across the three HS phenotypes defined by the Martorell classification system:

$$\mathbf{p}_{\text{phenotype}} = [p_{\text{follicular}}, p_{\text{inflammatory}}, p_{\text{mixed}}]$$

where each $p_i$ corresponds to the probability that the HS presentation belongs to phenotype $i$, and $\sum p_i = 1$.

The model classifies HS into three distinct phenotypes:

  • Follicular Phenotype: Lesions originating from hair follicles, characterized by comedones (blackheads), papules, pustules, leading to sinus tracts and scarring. Typically shows a more insidious onset with progressive follicular occlusion.
  • Inflammatory Phenotype: Sudden-onset, highly inflammatory presentation with abscess-like nodules and abscesses without prominent follicular lesions. Characterized by acute inflammatory episodes.
  • Mixed Phenotype: Combination of both follicular and inflammatory features, such as background comedones and follicular papules with recurrent large inflammatory abscesses. Acknowledges the heterogeneous nature and spectrum of HS presentations.

The predicted phenotype is:

$$\text{Phenotype} = \arg\max_{k \in [\text{follicular}, \text{inflammatory}, \text{mixed}]} p_k$$

Additionally, the model outputs a continuous confidence score representing the certainty of the classification.

Objectives

Follicular Phenotype Identification
  • Enable early identification of the follicular phenotype to guide early intervention with targeted immunomodulatory therapies.
  • Support personalized treatment planning by identifying patients likely to progress to extensive sinus tract formation and scarring.
  • Guide surgical planning for patients with predominant follicular disease who may benefit from early excisional procedures.
  • Facilitate clinical research by enabling consistent phenotype classification across different centers and studies.

Justification (Clinical Evidence):

  • Martorell et al. (2020) demonstrated that the follicular phenotype has distinct clinical and epidemiological characteristics, with different disease progression patterns [ref: Martorell A, et al. JEADV 2020].
  • Follicular phenotype patients may benefit from early targeted therapies before extensive tract formation occurs, potentially improving long-term prognosis [ref: 163].
  • Recognition of the follicular pattern helps predict disease course and surgical needs, with follicular disease showing more extensive scarring and tract formation [ref: 164].
  • The follicular phenotype shows different response rates to biologic therapies compared to inflammatory phenotype (response rates differ by 15-25%) [ref: 165].

Inflammatory Phenotype Identification
  • Identify candidates for early biologic therapy, as inflammatory phenotype typically shows better response to immunomodulatory agents.
  • Guide acute management strategies for patients with sudden-onset inflammatory episodes requiring urgent intervention.
  • Predict treatment response patterns based on phenotype-specific therapy outcomes documented in clinical trials.
  • Enable risk stratification for disease severity and potential complications.

Justification (Clinical Evidence):

  • Inflammatory phenotype shows superior response to biologics (adalimumab, secukinumab) compared to follicular phenotype, with clinical improvement in 60-75% vs 40-50% respectively [ref: 166, 167].
  • Early identification of inflammatory phenotype enables prompt initiation of systemic therapy, reducing disease burden and preventing progression [ref: 168].
  • The inflammatory phenotype has distinct cytokine profiles (higher IL-17, TNF-α) that correlate with specific therapeutic targets [ref: 169].
  • Patients with inflammatory phenotype have different surgical outcomes, with higher recurrence rates post-excision (35% vs 20% for follicular) [ref: 170].

Mixed Phenotype Identification
  • Recognize phenotypic evolution in patients transitioning between or combining follicular and inflammatory features.
  • Guide multimodal treatment approaches for patients requiring both surgical and medical management.
  • Support longitudinal monitoring to detect phenotype shifts that may require treatment adjustment.
  • Improve clinical trial stratification by identifying this heterogeneous patient subgroup.

Justification (Clinical Evidence):

  • Mixed phenotype represents 30-40% of HS cases in clinical practice, requiring recognition for appropriate management [ref: 171].
  • Patients with mixed phenotype require combination therapeutic approaches, often needing both biologics and surgical intervention [ref: 172].
  • The mixed phenotype shows intermediate treatment responses and disease behavior, necessitating individualized treatment plans [ref: 173].
  • Phenotype can evolve over time, with up to 25% of patients transitioning from pure to mixed phenotype within 2-3 years [ref: 174].

Endpoints and Requirements

| Metric | Threshold | Justification |
| --- | --- | --- |
| Overall Accuracy | ≥ 75% | Acceptable classification performance for triaging patients to phenotype-specific treatment pathways. |
| Weighted Kappa | ≥ 0.70 | Substantial agreement with expert dermatologist phenotype classification. |
| Follicular F1-Score | ≥ 0.80 | High precision/recall for follicular phenotype, crucial for early surgical planning. |
| Inflammatory F1-Score | ≥ 0.80 | High precision/recall for inflammatory phenotype, essential for biologic therapy selection. |
| Mixed F1-Score | ≥ 0.70 | Acceptable performance for mixed phenotype given inherent classification difficulty. |
| Macro F1-Score | ≥ 0.75 | Balanced performance across all three phenotypes. |
| Top-2 Accuracy | ≥ 90% | Algorithm provides correct phenotype within top-2 predictions (important for mixed/borderline cases). |
| Calibration Error (ECE) | ≤ 0.10 | Confidence scores accurately reflect true classification probability for clinical decision support. |
| AUC-ROC per class | ≥ 0.85 for each phenotype | Strong discriminative ability for each individual phenotype. |

All thresholds must be achieved with 95% confidence intervals on an independent test set.
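
As a non-normative illustration of how this confidence-interval requirement can be met, the sketch below estimates the weighted kappa and macro F1 endpoints with percentile-bootstrap 95% confidence intervals; the label arrays, quadratic kappa weighting, and resampling scheme are assumptions of the sketch, not prescribed by this specification:

```python
import numpy as np
from sklearn.metrics import cohen_kappa_score, f1_score

rng = np.random.default_rng(0)
# Placeholder labels standing in for expert vs. model phenotype assignments
y_true = rng.integers(0, 3, size=500)
y_pred = np.where(rng.random(500) < 0.8, y_true, rng.integers(0, 3, size=500))

def bootstrap_ci(metric, y_true, y_pred, n_boot=2000, alpha=0.05):
    """Percentile bootstrap CI obtained by resampling cases with replacement."""
    n = len(y_true)
    stats = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)
        stats.append(metric(y_true[idx], y_pred[idx]))
    lo, hi = np.quantile(stats, [alpha / 2, 1 - alpha / 2])
    return metric(y_true, y_pred), (float(lo), float(hi))

kappa, kappa_ci = bootstrap_ci(
    lambda t, p: cohen_kappa_score(t, p, weights="quadratic"), y_true, y_pred)
macro_f1, f1_ci = bootstrap_ci(
    lambda t, p: f1_score(t, p, average="macro"), y_true, y_pred)
print(f"weighted kappa {kappa:.3f} (95% CI {kappa_ci}), macro F1 {macro_f1:.3f} (95% CI {f1_ci})")
```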

Requirements:

  • Implement a deep learning classification architecture (e.g., CNN, Vision Transformer, or hybrid) optimized for dermatological image analysis.
  • Output structured data including:
    • Probability distribution across all three phenotypes (follicular, inflammatory, mixed)
    • Predicted phenotype class with confidence score
    • Secondary phenotype probability to identify borderline/transitional cases
    • Feature attribution maps highlighting image regions supporting the classification
  • Demonstrate performance meeting or exceeding all thresholds for:
    • Overall accuracy ≥ 75% and weighted kappa ≥ 0.70
    • Class-specific F1-scores: Follicular ≥ 0.80, Inflammatory ≥ 0.80, Mixed ≥ 0.70
    • Calibration error ≤ 0.10 ensuring confidence scores are clinically meaningful
  • Report all metrics with 95% confidence intervals using stratified sampling to account for phenotype distribution.
  • Validate the model on an independent and diverse test dataset including:
    • All Hurley stages (I, II, III) represented across phenotypes
    • Multiple anatomical sites (axillary, inguinal, perianal, inframammary)
    • Various skin tones (Fitzpatrick I-VI) to ensure equitable performance
    • Different imaging conditions (clinical photography, dermoscopy where applicable)
    • Longitudinal cases showing phenotype evolution
  • Ensure outputs are compatible with:
    • Electronic Health Records (EHR) for phenotype documentation
    • Clinical decision support systems providing phenotype-specific treatment recommendations
    • Clinical trial enrollment systems for phenotype-based patient stratification
    • Treatment response monitoring platforms tracking phenotype-therapy correlations
  • Document the training strategy including:
    • Data augmentation techniques addressing class imbalance (if present)
    • Handling of borderline/ambiguous cases in training data
    • Multi-expert annotation protocol for ground truth establishment
    • Regularization strategies to prevent overfitting
    • Transfer learning approach (if using pre-trained models)
  • Provide evidence that:
    • The model generalizes across different HS severity levels (Hurley I-III)
    • Performance is maintained across diverse patient demographics
    • The model can identify phenotype transitions in longitudinal assessments
    • Predictions align with the Martorell classification criteria and expert consensus
    • The algorithm performs consistently across different anatomical sites
  • Include interpretability features:
    • Visualization of discriminative features for each phenotype (e.g., Grad-CAM, attention maps)
    • Quantitative analysis of follicular vs. inflammatory lesion patterns
    • Confidence thresholds for automatic classification vs. expert review
    • Documentation of model decision-making process for regulatory compliance
  • Establish clinical validation protocol:
    • Prospective validation with expert dermatologist panel assessment
    • Inter-rater reliability comparison (AI vs. multiple experts)
    • Clinical utility assessment in real-world treatment decision scenarios
    • Patient outcome correlation with phenotype-guided therapy selection
  • Document failure modes and limitations:
    • Performance in early-stage disease where phenotype is not yet established
    • Handling of atypical presentations not fitting classical Martorell criteria
    • Confidence scoring for images with insufficient lesion visibility
    • Recommendations for cases requiring manual expert classification

Clinical Impact:

This phenotype classification model directly supports the implementation of the Martorell classification system in clinical practice, enabling:

  1. Personalized treatment selection: Inflammatory phenotype → early biologics; Follicular phenotype → consideration of early surgical intervention
  2. Improved prognostication: Different phenotypes have distinct progression patterns and surgical outcomes
  3. Clinical trial optimization: Phenotype-based stratification improves trial design and outcome interpretation
  4. Treatment response prediction: Phenotype correlates with response to specific therapeutic modalities
  5. Disease monitoring: Early detection of phenotype evolution guides treatment adjustment

Inflammatory Pattern Identification

Model Classification: 🔬 Clinical Model

Description

A deep learning multi-class classification model ingests clinical images of inflammatory lesions and outputs a probability distribution across the three Hurley stages:

$$\mathbf{p}_{\text{Hurley}} = [p_{\text{I}}, p_{\text{II}}, p_{\text{III}}]$$

where each $p_i$ corresponds to the probability that the inflammatory lesion presentation belongs to Hurley stage $i$, and $\sum p_i = 1$.

The model classifies inflammatory lesions into three severity stages:

  • Hurley Stage I: Single or multiple isolated abscesses without sinus tracts or scarring. Lesions are separated and do not form interconnected areas.
  • Hurley Stage II: Recurrent abscesses with sinus tract formation and scarring. One or more widely separated lesions with limited interconnection.
  • Hurley Stage III: Diffuse or broad involvement with multiple interconnected sinus tracts and abscesses across an entire anatomical area. Extensive scarring and coalescence of lesions.

The predicted Hurley stage is:

$$\text{Hurley Stage} = \arg\max_{k \in [\text{I}, \text{II}, \text{III}]} p_k$$

Additionally, the model outputs a continuous confidence score representing the certainty of the classification.

Objectives

  • Support healthcare professionals in providing standardized severity staging of inflammatory lesions using the validated Hurley staging system.
  • Reduce inter-observer variability in Hurley staging, which shows moderate agreement (κ = 0.55-0.70) between clinicians in practice, particularly in distinguishing Stage II from Stage III [234, 235].
  • Enable automated severity classification by translating visual lesion patterns, sinus tract presence, and scarring extent into clinically meaningful stage categories.
  • Ensure reproducibility by basing staging on objective visual features rather than subjective clinical impression.
  • Facilitate treatment decision-making by providing standardized severity stages that align with evidence-based treatment guidelines (e.g., medical management for Stage I-II, surgical intervention consideration for Stage II-III).
  • Support clinical trial endpoints by providing consistent, reproducible staging assessments as used in therapeutic efficacy studies.
  • Guide prognosis and patient counseling by providing objective disease severity classification associated with known clinical outcomes.

Justification (Clinical Evidence):

  • The Hurley staging system is the most widely used classification for hidradenitis suppurativa severity and is fundamental for treatment planning [236, 237].
  • Manual Hurley staging shows moderate inter-observer variability (κ = 0.55-0.70), with particular difficulty in distinguishing between Stage II and Stage III, where sinus tract extent and interconnection must be assessed [234, 235].
  • Treatment guidelines are explicitly linked to Hurley stages, with clear recommendations: Stage I → topical/oral antibiotics; Stage II → systemic therapy including biologics; Stage III → surgical intervention [238, 239].
  • Hurley stage correlates strongly with disease burden, quality of life impairment, and treatment response, making accurate staging critical for clinical decision-making [240].
  • Objective staging reduces treatment delays by 30-40% by enabling prompt identification of patients requiring advanced therapies or surgical referral [241].
  • Studies show that standardized staging improves treatment outcomes through appropriate therapy selection aligned with disease severity [242].

Endpoints and Requirements

Performance is evaluated using accuracy, weighted kappa (κw), and class-specific metrics compared to expert dermatologist Hurley stage assessments.

| Metric | Threshold | Interpretation |
| --- | --- | --- |
| Overall Accuracy | ≥ 75% | Correct Hurley stage classification in 3 out of 4 cases. |
| Weighted Kappa (κw) | ≥ 0.70 | Substantial agreement with expert Hurley staging. |
| Adjacent Stage Accuracy | ≥ 90% | Within one stage of expert assessment (clinically safe). |
| Macro F1-Score | ≥ 0.70 | Balanced performance across all three Hurley stages. |
| Class-specific F1 (per stage) | ≥ 0.65 | Minimum acceptable F1 for each individual Hurley stage. |
| Mean Absolute Error (MAE) | ≤ 0.4 | Average error less than half a stage from expert consensus. |

All thresholds must be achieved with 95% confidence intervals.
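
The ordinal endpoints above reduce to simple arithmetic on integer-coded stages; a minimal, non-normative sketch follows, assuming stages I-III are encoded as the integers 1-3 and the arrays hold expert versus model assignments (placeholder values):

```python
import numpy as np

# Placeholder stage assignments; in validation these come from the test set
y_true = np.array([1, 2, 3, 2, 1, 3, 2, 2])   # expert Hurley stages
y_pred = np.array([1, 2, 2, 2, 1, 3, 3, 2])   # model Hurley stages

accuracy = np.mean(y_pred == y_true)
adjacent = np.mean(np.abs(y_pred - y_true) <= 1)   # within one stage of expert
mae = np.mean(np.abs(y_pred - y_true))             # average stage error

print(f"accuracy={accuracy:.2f}  adjacent-stage accuracy={adjacent:.2f}  MAE={mae:.2f}")
```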

Requirements:

  • Implement a deep learning classification architecture (e.g., CNN, Vision Transformer, or hybrid) optimized for dermatological lesion pattern analysis.
  • Output structured data including:
    • Probability distribution across all three Hurley stages (I, II, III)
    • Predicted Hurley stage with confidence score
    • Visual features detected supporting the classification (e.g., presence of sinus tracts, scarring extent, lesion interconnection)
    • Treatment recommendation category based on stage-specific guidelines
  • Demonstrate performance meeting or exceeding all thresholds:
    • Overall Accuracy ≥ 75% and Weighted Kappa ≥ 0.70
    • Adjacent Stage Accuracy ≥ 90% (critical for patient safety—misclassification by one stage is clinically acceptable)
    • Macro F1 ≥ 0.70 and class-specific F1 ≥ 0.65 for all stages
    • MAE ≤ 0.4 stages
  • Report all metrics with 95% confidence intervals and confusion matrices showing distribution of predictions across Hurley stages.
  • Validate the model on an independent and diverse test dataset including:
    • Full range of Hurley stages (I, II, III) with balanced representation
    • Multiple anatomical sites (axillary, inguinal, perianal, inframammary, gluteal)
    • Diverse patient populations (various Fitzpatrick skin types I-VI, ages, genders, body mass index)
    • Different disease presentations (active inflammation vs. quiescent disease, varying lesion densities)
    • Various imaging conditions and acquisition devices
    • Borderline cases between stages to test model robustness
  • Handle ordinal nature of Hurley stages:
    • Implement ordinal classification techniques or apply ordinal loss functions (one possible formulation is sketched after this requirements list)
    • Penalize Stage I → Stage III misclassification more heavily than I → II errors
    • Ensure predictions respect the natural severity progression
  • Address challenging scenarios:
    • Early Stage II where sinus tracts are minimal or subtle
    • Extensive Stage II vs. early Stage III where differentiation requires assessing interconnection extent
    • Quiescent disease where active inflammation is minimal but scarring and tracts indicate advanced stage
    • Post-surgical areas where staging must account for treated regions
    • Multiple anatomical sites with different severity stages
  • Ensure outputs are compatible with:
    • FHIR-based structured reporting for interoperability
    • Clinical decision support systems providing stage-specific treatment recommendations
    • Treatment guidelines (EDF, AAD, BAD) that specify interventions based on Hurley stage
    • Clinical trial enrollment systems requiring Hurley stage inclusion/exclusion criteria
    • Surgical planning systems for Stage III cases requiring intervention
  • Provide interpretability features:
    • Saliency maps highlighting image regions supporting stage classification (sinus tracts, scarring, lesion distribution)
    • Feature detection outputs indicating presence of key staging criteria:
      • Isolated abscesses (Stage I indicator)
      • Sinus tracts (Stage II-III indicator)
      • Extensive scarring (Stage II-III indicator)
      • Lesion interconnection (Stage III indicator)
    • Confidence thresholds for automatic staging vs. expert review
    • Stage-specific guidance for clinical decision-making
  • Document the training strategy including:
    • Multi-expert annotation protocol for Hurley staging ground truth (consensus among dermatologists with HS expertise)
    • Handling of class imbalance (Stage III may be less prevalent in training data)
    • Data augmentation strategies accounting for realistic lesion variations
    • Regularization strategies to prevent overfitting
    • Transfer learning approach (if using pre-trained models)
  • Implement quality control mechanisms:
    • Automatic detection of images unsuitable for Hurley staging (insufficient lesion visibility, extreme cropping, quality issues)
    • Flagging of ambiguous cases requiring manual expert review
    • Confidence scoring calibration for borderline stage classifications
  • Provide evidence that:
    • The model generalizes across different anatomical sites and patient populations
    • Performance is consistent across all Hurley stages (no systematic bias toward certain stages)
    • The model maintains performance across various disease presentations (active vs. quiescent)
    • Predictions align with dermatologist consensus and established Hurley criteria
    • Stage predictions correlate with clinical outcomes (treatment response, disease burden)
  • Include failure mode analysis:
    • Performance on borderline cases between stages (particularly Stage II/III boundary)
    • Handling of atypical presentations not clearly fitting classical Hurley criteria
    • Behavior with images showing multiple anatomical sites at different stages
    • Confidence scoring for cases with limited lesion visibility or image quality issues
    • Recommendations for cases requiring in-person clinical examination
  • Establish clinical validation protocol:
    • Prospective validation with expert dermatologist panel Hurley staging
    • Inter-rater reliability comparison (AI vs. multiple HS specialists)
    • Clinical utility assessment in treatment decision scenarios
    • Patient outcome correlation with AI-assigned vs. clinician-assigned stages
    • Real-world deployment validation in diverse clinical settings
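
As referenced in the ordinal-handling requirement above, one possible (non-prescriptive) ordinal-aware loss combines standard cross-entropy with an expected stage-distance penalty, so a Stage I/Stage III confusion costs more than an adjacent-stage error; alternatives such as CORAL-style ordinal regression heads would equally satisfy the requirement. The function name and weighting factor are illustrative:

```python
import torch
import torch.nn.functional as F

def ordinal_staging_loss(logits: torch.Tensor, target: torch.Tensor,
                         lam: float = 1.0) -> torch.Tensor:
    """Cross-entropy plus the expected absolute stage distance under the softmax.

    logits: (N, 3) raw scores for Hurley stages I-III; target: (N,) in {0, 1, 2}.
    """
    ce = F.cross_entropy(logits, target)
    probs = F.softmax(logits, dim=1)
    stages = torch.arange(logits.size(1), device=logits.device, dtype=probs.dtype)
    # |candidate stage - true stage| for every class, weighted by its probability
    dist = (stages.unsqueeze(0) - target.unsqueeze(1).to(probs.dtype)).abs()
    expected_distance = (probs * dist).sum(dim=1).mean()
    return ce + lam * expected_distance

logits = torch.randn(4, 3, requires_grad=True)   # toy batch of 4 images
target = torch.tensor([0, 2, 1, 2])              # expert stages (0 = Hurley I)
ordinal_staging_loss(logits, target).backward()  # differentiable end to end
```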

Clinical Impact:

The Hurley Staging model directly supports clinical decision-making:

  1. Treatment selection: Enables evidence-based therapy choice aligned with stage-specific guidelines
  2. Surgical planning: Identifies Stage III patients requiring surgical consultation
  3. Prognostication: Provides severity classification associated with known disease trajectories
  4. Clinical trial eligibility: Supports patient stratification and enrollment decisions
  5. Disease monitoring: Enables objective tracking of disease progression or response to therapy
  6. Resource allocation: Facilitates appropriate referral to specialized HS centers for advanced disease

Note: This model provides staging assessment based on visual lesion patterns, sinus tract presence, and scarring extent. While staging informs treatment decisions, the model outputs severity classification (quantitative data on disease extent and pattern) rather than diagnostic confirmation or specific treatment prescriptions.

Inflammatory Pattern Indicator

Model Classification: 🔬 Clinical Model

Description

A deep learning binary classification model ingests clinical images of hidradenitis suppurativa lesions and outputs a probability distribution indicating the presence or absence of active inflammatory activity:

$$\mathbf{p}_{\text{inflammatory}} = [p_{\text{non-inflammatory}}, p_{\text{inflammatory}}]$$

where each $p_i$ corresponds to the probability that the HS presentation belongs to category $i$, and $\sum p_i = 1$.

The model classifies HS lesions into two states:

  • Non-Inflammatory: Inactive disease characterized by post-inflammatory changes including scars, fibrotic tracts, comedones without surrounding erythema, and healed lesions without active inflammation.
  • Inflammatory: Active disease characterized by erythematous nodules, abscesses, draining sinus tracts with active discharge, acute inflammatory flares, and lesions with signs of acute inflammation (warmth, tenderness, active suppuration).

The predicted inflammatory status is:

$$\text{Inflammatory Status} = \mathbb{1}[p_{\text{inflammatory}} \geq 0.5]$$

Additionally, the model outputs a continuous confidence score representing the certainty of the classification.

Objectives

  • Support healthcare professionals in objectively identifying active inflammatory disease requiring immediate therapeutic intervention versus quiescent disease.
  • Enable treatment decision-making by distinguishing patients who require anti-inflammatory therapy (antibiotics, biologics, immunosuppressants) from those who may benefit from surgical intervention for non-inflammatory sequelae.
  • Facilitate disease monitoring by providing objective assessment of inflammatory activity changes over time in response to treatment.
  • Improve clinical trial design by enabling objective stratification of patients based on inflammatory activity status for enrollment and outcome assessment.
  • Guide urgent care triage by identifying acute inflammatory flares requiring prompt intervention versus chronic stable disease.
  • Support treatment escalation decisions by objectively documenting persistent or recurrent inflammatory activity despite current therapy.

Justification (Clinical Evidence):

  • Distinguishing inflammatory from non-inflammatory HS is critical for treatment selection, as inflammatory disease requires anti-inflammatory therapies (systemic antibiotics, biologics) while non-inflammatory sequelae may benefit from surgical management [243, 244].
  • Manual assessment of inflammatory activity shows moderate inter-observer variability (κ = 0.50-0.68), particularly in distinguishing subtle inflammatory changes from post-inflammatory erythema [245].
  • The 2024 European HS Guidelines emphasize the importance of assessing inflammatory activity for treatment decisions, with active inflammation being an indication for medical therapy and inactive disease potentially benefiting from definitive surgical management [246].
  • Inflammatory burden assessment is a key component of validated severity scores (IHS4, HS-PGA) and correlates with patient-reported pain, quality of life impairment, and treatment response [247, 248].
  • Studies show that objective inflammatory activity assessment predicts response to biologic therapy, with active inflammation at baseline associated with 60-75% response rates versus 25-40% in predominantly non-inflammatory disease [249].
  • Inflammatory flares represent critical intervention points where treatment escalation can prevent disease progression and reduce long-term sequelae [250].
  • Automated inflammatory activity detection can identify subclinical inflammation that may be underappreciated in visual assessment but predicts disease progression [251].
  • The distinction between inflammatory and non-inflammatory disease impacts surgical timing and approach, with active inflammation increasing perioperative complications and recurrence risk [252].

Endpoints and Requirements

Performance is evaluated using binary classification metrics (AUC, sensitivity, specificity, F1-score) compared to expert dermatologist inflammatory activity assessments.

| Metric | Threshold | Interpretation |
| --- | --- | --- |
| AUC (ROC) | ≥ 0.85 | Strong discriminative ability for inflammatory vs. non-inflammatory classification. |
| Sensitivity | ≥ 0.80 | High sensitivity for detecting active inflammation (minimize missed active cases). |
| Specificity | ≥ 0.80 | High specificity for identifying non-inflammatory disease (minimize false alarms). |
| F1-Score | ≥ 0.80 | Balanced precision and recall for inflammatory activity detection. |
| Positive Predictive Value | ≥ 0.75 | Acceptable precision for inflammatory classification. |
| Negative Predictive Value | ≥ 0.85 | Strong precision for non-inflammatory classification (important for surgical timing). |
| Balanced Accuracy | ≥ 0.80 | Ensures equitable performance for both inflammatory and non-inflammatory classes. |
| Cohen's Kappa (κ) | ≥ 0.70 | Substantial agreement with expert inflammatory activity assessment. |

All thresholds must be achieved with 95% confidence intervals.
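
For illustration, the binary endpoints above can all be derived from a single confusion matrix plus the model's probability scores; the sketch below uses placeholder data and the 0.5 operating threshold from the status definition (1 = inflammatory, 0 = non-inflammatory):

```python
import numpy as np
from sklearn.metrics import confusion_matrix, roc_auc_score, cohen_kappa_score

y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])                 # expert labels
p_inf = np.array([0.9, 0.2, 0.7, 0.4, 0.1, 0.6, 0.8, 0.3])  # model probabilities
y_pred = (p_inf >= 0.5).astype(int)                         # status definition

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
sensitivity = tp / (tp + fn)            # missed-active-case control
specificity = tn / (tn + fp)            # false-alarm control
ppv = tp / (tp + fp)                    # positive predictive value
npv = tn / (tn + fn)                    # negative predictive value
balanced_accuracy = (sensitivity + specificity) / 2

print(f"AUC={roc_auc_score(y_true, p_inf):.2f}  sens={sensitivity:.2f}  "
      f"spec={specificity:.2f}  PPV={ppv:.2f}  NPV={npv:.2f}  "
      f"balanced acc={balanced_accuracy:.2f}  kappa={cohen_kappa_score(y_true, y_pred):.2f}")
```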

Requirements:

  • Implement a deep learning binary classification architecture (e.g., CNN, Vision Transformer, or hybrid) optimized for detecting inflammatory features in dermatological images.
  • Output structured data including:
    • Binary inflammatory status (Inflammatory / Non-Inflammatory)
    • Probability score for inflammatory activity (0-1 continuous)
    • Confidence score indicating certainty of classification
    • Visual features detected supporting the classification:
      • Erythema presence and intensity
      • Active discharge/drainage detection
      • Lesion warmth indicators (indirect visual cues)
      • Acute vs. chronic lesion morphology
  • Demonstrate performance meeting or exceeding all thresholds:
    • AUC ≥ 0.85
    • Sensitivity and Specificity both ≥ 0.80
    • F1-score ≥ 0.80
    • Cohen's Kappa ≥ 0.70
  • Report all metrics with 95% confidence intervals and provide ROC curves and confusion matrices.
  • Validate the model on an independent and diverse test dataset including:
    • Full spectrum of inflammatory activity:
      • Acute inflammatory flares
      • Moderate inflammatory activity
      • Minimal/resolving inflammation
      • Completely quiescent disease
      • Post-inflammatory changes and scarring
    • All Hurley stages (I, II, III) to ensure performance across disease severity spectrum
    • Multiple anatomical sites (axillary, inguinal, perianal, inframammary, gluteal)
    • Diverse patient populations (various Fitzpatrick skin types I-VI, ages, genders)
    • Different disease presentations:
      • Active abscesses with surrounding erythema
      • Draining sinus tracts with active discharge
      • Nodules with varying degrees of inflammation
      • Chronic scarred lesions without active inflammation
      • Mixed presentations (some inflammatory, some non-inflammatory lesions)
    • Various imaging conditions and acquisition devices
    • Treatment contexts:
      • Baseline untreated disease
      • Partially treated with residual inflammation
      • Post-treatment quiescent disease
      • Relapsing disease with new inflammatory activity
  • Address challenging scenarios:
    • Post-inflammatory erythema vs. active inflammation (persistent redness without active disease)
    • Early inflammatory changes before overt abscess formation
    • Draining tracts that may have chronic drainage without acute inflammation
    • Mixed lesions where some areas show inflammation while others are scarred
    • Skin type variations where erythema visibility differs (Fitzpatrick V-VI may show less obvious erythema)
  • Ensure outputs are compatible with:
    • FHIR-based structured reporting for interoperability
    • Clinical decision support systems providing treatment recommendations based on inflammatory status
    • Treatment selection algorithms:
      • Inflammatory → medical therapy (antibiotics, biologics, immunosuppressants)
      • Non-inflammatory → consider surgical intervention
    • Disease monitoring dashboards tracking inflammatory activity over time
    • Clinical trial systems for patient stratification and outcome assessment
  • Provide interpretability features:
    • Saliency maps or attention visualizations highlighting image regions indicating inflammation:
      • Erythematous areas
      • Drainage sites
      • Acute lesion morphology
    • Feature-based explanation indicating detected inflammatory signs:
      • "High perilesional erythema detected"
      • "Active drainage visible"
      • "Acute inflammatory morphology"
    • Confidence calibration for borderline cases requiring expert review
    • Inflammatory activity score (continuous 0-1) in addition to binary classification
  • Document the training strategy including:
    • Multi-expert annotation protocol for inflammatory activity ground truth (consensus among dermatologists with HS expertise)
    • Handling of class imbalance (if present in training data)
    • Data augmentation strategies accounting for realistic inflammatory variations
    • Regularization strategies to prevent overfitting
    • Transfer learning approach (if using pre-trained models)
    • Loss function optimization for balanced sensitivity/specificity
  • Implement quality control mechanisms:
    • Automatic detection of images unsuitable for inflammatory assessment (poor quality, insufficient lesion visibility)
    • Flagging of ambiguous cases (borderline inflammation) requiring manual expert review
    • Confidence thresholds for automatic classification vs. expert consultation
  • Provide evidence that:
    • The model generalizes across different anatomical sites and patient populations
    • Performance is maintained across all Hurley stages and disease severities
    • The model can detect subtle inflammatory changes not immediately apparent to non-experts
    • Classifications correlate with clinical outcomes (treatment response, disease progression)
    • Performance is equitable across different Fitzpatrick skin types (no bias toward lighter skin)
    • The model distinguishes post-inflammatory changes from active inflammation
  • Include failure mode analysis:
    • Performance on borderline inflammatory cases
    • Handling of mixed presentations (simultaneous inflammatory and non-inflammatory lesions)
    • Behavior with atypical inflammatory presentations
    • Impact of image quality on classification accuracy
    • Confidence scoring for cases requiring clinical examination (palpation for warmth/tenderness)
  • Establish clinical validation protocol:
    • Prospective validation with expert dermatologist panel assessment
    • Inter-rater reliability comparison (AI vs. multiple HS specialists)
    • Clinical utility assessment in treatment decision scenarios
    • Correlation with inflammatory biomarkers (when available: CRP, ESR, cytokine profiles)
    • Patient outcome correlation with inflammatory status classification
    • Treatment response prediction validation (inflammatory cases → biologic response)

Clinical Impact:

The Inflammatory Activity Binary Indicator directly supports clinical decision-making:

  1. Treatment selection: Enables evidence-based differentiation between patients requiring anti-inflammatory therapy vs. surgical management
  2. Treatment escalation: Objectively documents persistent inflammatory activity warranting therapy intensification
  3. Surgical timing: Identifies quiescent disease optimal for surgical intervention with lower complication risk
  4. Flare detection: Enables early identification of inflammatory exacerbations requiring prompt intervention
  5. Treatment monitoring: Provides objective tracking of inflammatory activity changes in response to therapy
  6. Clinical trial optimization: Enables stratification of patients by inflammatory activity for targeted enrollment and outcome assessment
  7. Resource allocation: Facilitates appropriate urgency assignment and referral decisions

Note: This model provides binary inflammatory activity classification based on visual assessment of erythema, drainage, and acute lesion morphology. While it informs treatment decisions, the model outputs inflammatory status (quantitative data on disease activity) rather than diagnostic confirmation or specific treatment prescriptions. Clinical correlation including palpation for warmth and tenderness remains important for comprehensive inflammatory assessment.

Dermatology Image Quality Assessment (DIQA)

Model Classification: 🛠️ Non-Clinical Model

Description

A deep learning regression model ingests a clinical dermatological image and outputs a continuous quality score:

$$\hat{q} \in [0, 10]$$

where $\hat{q}$ represents the overall image quality on a continuous scale from 0 (unacceptable, non-diagnostic quality) to 10 (excellent, optimal diagnostic quality).

The quality score integrates multiple technical and clinical quality dimensions:

  • Technical Quality Factors:

    • Focus/sharpness (blur assessment)
    • Lighting conditions (over/underexposure, shadow artifacts)
    • Resolution adequacy
    • Motion artifacts
    • Noise levels
    • Color accuracy and white balance
  • Clinical Quality Factors:

    • Lesion visibility and framing
    • Appropriate field of view
    • Anatomical context
    • Scale/measurement reference (when required)
    • Absence of obstructions (hair, clothing, jewelry, cosmetics)
    • Appropriate imaging distance

The model may also output dimension-specific subscores for detailed quality assessment:

$$\mathbf{q}_{\text{subscores}} = [q_{\text{focus}}, q_{\text{lighting}}, q_{\text{framing}}, q_{\text{artifacts}}, q_{\text{resolution}}]$$

This enables automated quality control in clinical workflows, identifying images that require retaking before downstream AI analysis or clinical review.
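
A minimal, non-normative sketch of the quality gate this enables is shown below; the band boundaries mirror the clinical thresholds defined in the endpoints section that follows (acceptance at a score of 6 or above), while the band names, subscore keys, and feedback wording are illustrative assumptions:

```python
def quality_gate(score: float, subscores: dict) -> dict:
    """Map a 0-10 quality score to an accept/retake decision with feedback."""
    if score >= 8:
        band = "excellent"
    elif score >= 6:
        band = "good"
    elif score >= 4:
        band = "marginal"
    else:
        band = "poor"
    worst = sorted(subscores, key=subscores.get)[:2]   # main quality deficiencies
    return {
        "score": score,
        "band": band,
        "accept": score >= 6,                          # critical clinical threshold
        "feedback": None if score >= 6
        else f"Retake recommended: low {', '.join(worst)}",
    }

print(quality_gate(4.8, {"focus": 3.1, "lighting": 4.0, "framing": 7.5,
                         "artifacts": 8.0, "resolution": 9.0}))
```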

Objectives

  • Enable automated quality control in dermatological imaging workflows to ensure only diagnostic-quality images are analyzed or stored.
  • Reduce variability in image acquisition by providing real-time feedback to healthcare professionals and patients during image capture.
  • Prevent downstream AI failures by filtering out poor-quality images that could lead to inaccurate predictions from diagnostic or assessment AI models.
  • Improve clinical efficiency by reducing the need for image retakes discovered only after clinical review or AI analysis failure.
  • Support telemedicine applications by providing objective quality standards for patient-captured images in remote monitoring scenarios.
  • Ensure data quality in clinical trials and research by establishing objective inclusion criteria for image datasets.
  • Guide user behavior through real-time quality feedback during image acquisition, improving overall imaging practices.

Justification (Clinical Evidence):

  • Image quality is a critical determinant of AI model performance, with studies showing accuracy degradation of 15-40% when analyzing poor-quality images [203, 204].
  • Manual quality assessment shows substantial inter-observer variability (κ = 0.45-0.70), with inconsistent standards for "acceptable" quality across different clinicians and institutions [205].
  • Poor image quality is a leading cause of AI failure in real-world deployments, with 20-35% of clinical images being rejected or requiring retakes due to quality issues [206, 207].
  • Automated quality assessment has been shown to improve diagnostic accuracy by 12-25% through proactive filtering of suboptimal images before analysis [208].
  • Patient-captured images in telemedicine show significantly higher rates of quality issues (40-60%) compared to professional photography (5-15%), highlighting the need for automated guidance [209].
  • Real-time quality feedback during image acquisition has been shown to reduce retake rates by 50-70% and improve first-capture success rates [210].
  • Standardized quality thresholds improve reproducibility in clinical trials, with quality-controlled datasets showing 30-50% reduction in outcome measure variability [211].
  • Image quality directly impacts inter-rater reliability in manual lesion assessment, with high-quality images showing κ improvement of 0.15-0.25 compared to poor-quality images [212].

Endpoints and Requirements

Performance is evaluated using correlation with expert quality ratings and classification accuracy at clinically relevant quality thresholds.

| Metric | Threshold | Interpretation |
| --- | --- | --- |
| Pearson Correlation (r) | ≥ 0.80 | Strong linear correlation with expert consensus quality scores. |
| Spearman Correlation (ρ) | ≥ 0.80 | Strong rank correlation with expert quality rankings. |
| Mean Absolute Error (MAE) | ≤ 0.8 | Predicted score deviates ≤ 0.8 points (on the 0-10 scale) from expert consensus. |
| Binary Classification (Accept/Reject) | ≥ 0.90 | Accuracy in classifying images as acceptable (≥ 6) vs. unacceptable (< 6) quality. |
| Sensitivity (Reject Detection) | ≥ 0.85 | High sensitivity for detecting unacceptable images requiring retake (minimize false negatives). |
| Specificity (Accept Detection) | ≥ 0.85 | High specificity for identifying acceptable images (minimize false positives/unnecessary retakes). |
| Cohen's Kappa (Binary) | ≥ 0.75 | Substantial agreement with expert binary accept/reject decisions. |
| Calibration Error | ≤ 0.15 | Quality scores accurately reflect true image quality levels. |

All thresholds must be achieved with 95% confidence intervals.
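
As a non-normative illustration, the regression endpoints above map directly onto standard statistics; the sketch below uses synthetic expert/model scores and assumes the acceptance cut-off of 6 from the binary accept/reject endpoint:

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

rng = np.random.default_rng(3)
q_expert = rng.uniform(0, 10, size=300)                        # consensus scores
q_model = np.clip(q_expert + rng.normal(0, 0.6, size=300), 0, 10)

r, _ = pearsonr(q_expert, q_model)                             # linear correlation
rho, _ = spearmanr(q_expert, q_model)                          # rank correlation
mae = float(np.mean(np.abs(q_model - q_expert)))
accept_acc = float(np.mean((q_model >= 6) == (q_expert >= 6)))  # accept/reject
print(f"r={r:.2f}  rho={rho:.2f}  MAE={mae:.2f}  accept/reject accuracy={accept_acc:.2f}")
```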

Requirements:

  • Implement a deep learning regression architecture capable of assessing multiple quality dimensions simultaneously.
  • Output structured data including:
    • Overall quality score (continuous, 0-10 scale)
    • Binary recommendation (Accept/Reject) based on quality threshold
    • Dimension-specific subscores for interpretability:
      • Focus/sharpness score
      • Lighting quality score
      • Framing/composition score
      • Artifact severity score (inverse: higher = fewer artifacts)
      • Resolution adequacy score
    • Actionable feedback indicating primary quality deficiencies for rejected images
    • Confidence score indicating certainty of the quality assessment
  • Demonstrate performance meeting or exceeding all thresholds:
    • Pearson correlation ≥ 0.80 with expert consensus
    • Binary classification accuracy ≥ 0.90 for accept/reject decisions
    • Sensitivity and specificity both ≥ 0.85 for quality thresholding
    • MAE ≤ 0.8 points on the 0-10 scale
  • Report all metrics with 95% confidence intervals and demonstrate calibration of quality scores.
  • Validate the model on an independent and diverse test dataset including:
    • Multiple dermatological conditions (inflammatory, pigmented, neoplastic, infectious)
    • Various anatomical sites (face, trunk, extremities, hands, feet, scalp, nails)
    • Different imaging devices (smartphones, digital cameras, dermatoscopes, professional medical cameras)
    • Diverse quality levels (full range from unacceptable to excellent)
    • Common quality defects:
      • Out-of-focus/blurred images
      • Over/underexposed images
      • Images with motion blur
      • Poor framing (lesion partially visible or too distant)
      • Obstructions (hair, clothing, glare, shadows)
      • Low resolution images
      • Incorrect white balance/color cast
    • Various patient populations (different skin tones, ages, body sites)
    • Different acquisition contexts (professional clinical, patient self-capture, telemedicine)
  • Establish quality thresholds for clinical decision-making:
    • Score ≥ 8: Excellent quality, optimal for all AI analyses and clinical review
    • Score 6-8: Good quality, acceptable for clinical use with minor limitations
    • Score 4-6: Marginal quality, may be acceptable for some purposes but retake recommended
    • Score <4: Poor quality, unacceptable for clinical use, retake required
    • Critical threshold: Score ≥ 6 for acceptance in clinical workflows
  • Ensure outputs are compatible with:
    • Real-time feedback systems for image capture guidance (mobile apps, clinical photography systems)
    • Quality control workflows in clinical and research settings
    • FHIR-based structured reporting for quality documentation
    • Image archive systems for automated quality tagging and filtering
    • Clinical decision support systems that require quality-controlled inputs
  • Provide interpretability features:
    • Saliency maps or attention visualizations highlighting image regions affecting quality scores
    • Dimension-specific explanations (e.g., "Image rejected due to: poor focus (3/10), inadequate lighting (4/10)")
    • Actionable recommendations for image retake (e.g., "Move closer to lesion", "Improve lighting", "Hold camera steady")
    • Quality improvement guidance for user training and feedback
  • Document the training strategy including:
    • Multi-expert annotation protocol for quality ground truth (consensus scoring)
    • Handling of quality dimension interactions and trade-offs
    • Data augmentation strategies to simulate common quality defects
    • Loss function design for regression on bounded scale (0-10)
    • Calibration techniques to ensure score reliability
  • Implement real-time processing capabilities:
    • Inference time < 500ms on typical mobile/clinical devices
    • Batch processing capabilities for archive quality assessment
    • Efficient model architecture suitable for edge deployment
  • Provide evidence that:
    • The model generalizes across different dermatological presentations and anatomical sites
    • Performance is consistent across imaging devices and manufacturers
    • The model maintains accuracy across different skin tones (Fitzpatrick I-VI)
    • Quality scores correlate with downstream AI model performance (higher quality → better accuracy)
    • The model can identify subtle quality defects that may not be apparent to non-expert users
    • Binary accept/reject recommendations align with clinical usability requirements
  • Include failure mode analysis:
    • Performance on edge cases (e.g., intentionally artistic or non-standard clinical images)
    • Handling of images with multiple simultaneous quality defects
    • Behavior with images outside the training distribution (e.g., novel imaging modalities)
    • Confidence calibration for borderline quality cases (scores 4-6)
    • Documentation of quality assessment limitations and disclaimers

Clinical Impact:

The DIQA model serves as a critical quality gate in the AI-assisted dermatology workflow:

  1. Pre-processing filter: Ensures only diagnostic-quality images are analyzed by downstream AI models (diagnosis, severity assessment, lesion quantification)
  2. User guidance: Provides real-time feedback during image acquisition, improving imaging practices over time
  3. Workflow efficiency: Reduces clinical time wasted on reviewing or analyzing poor-quality images
  4. Patient safety: Prevents clinical decisions based on non-diagnostic images that could lead to misdiagnosis or inappropriate treatment
  5. Telemedicine enablement: Makes remote dermatology viable by ensuring patient-captured images meet quality standards
  6. Research quality: Ensures dataset quality in clinical trials and research studies through objective inclusion criteria

Note: This is a non-clinical model that assesses technical and clinical image quality characteristics but does not make medical diagnoses or clinical assessments. It serves as a quality control tool to support clinical workflows and other AI models.

Fitzpatrick Skin Type Identification

Model Classification: 🛠️ Non-Clinical Model

Description

A deep learning multi-class classification model ingests a clinical dermatological image and outputs a probability distribution across the six Fitzpatrick skin type categories:

$$\mathbf{p}_{\text{FST}} = [p_{\text{I}}, p_{\text{II}}, p_{\text{III}}, p_{\text{IV}}, p_{\text{V}}, p_{\text{VI}}]$$

where each $p_i$ corresponds to the probability that the skin in the image belongs to Fitzpatrick skin type $i$, and $\sum p_i = 1$.

The Fitzpatrick skin types are defined as:

  • Type I: Very fair skin, always burns, never tans (pale white skin, often with red/blonde hair)
  • Type II: Fair skin, usually burns, tans minimally (white skin, burns easily)
  • Type III: Medium skin, sometimes burns, tans uniformly (cream white skin, burns moderately)
  • Type IV: Olive skin, rarely burns, tans easily (moderate brown skin)
  • Type V: Brown skin, very rarely burns, tans very easily (dark brown skin)
  • Type VI: Dark brown to black skin, never burns, tans very easily (deeply pigmented dark brown to black skin)

The predicted Fitzpatrick type is:

$$\text{FST} = \arg\max_{k \in [\text{I}, \text{II}, \text{III}, \text{IV}, \text{V}, \text{VI}]} p_k$$

Additionally, the model outputs a continuous confidence score representing the certainty of the classification.

Objectives

  • Enable automated skin type detection to support personalized dermatological AI models that require skin tone information for accurate predictions.
  • Reduce assessment variability in skin type classification, which shows moderate inter-observer agreement (κ = 0.50-0.65) even among dermatologists [213, 214].
  • Support bias mitigation in AI models by identifying underrepresented skin types in datasets and ensuring equitable performance across all Fitzpatrick types.
  • Facilitate treatment personalization by providing objective skin type information relevant for phototherapy dosing, laser treatment parameters, and topical therapy selection.
  • Enable research stratification by providing consistent skin type classification for clinical trials and real-world evidence studies.
  • Support regulatory compliance by ensuring AI models are validated across diverse skin types as required by regulatory guidelines.
  • Improve telemedicine accessibility by providing automated skin type assessment in remote settings where patient-reported skin type may be unreliable.

Justification (Clinical Evidence):

  • Fitzpatrick skin type is a critical factor in dermatological assessment, influencing disease presentation, treatment selection, and AI model performance [215, 216].
  • Self-reported Fitzpatrick type shows poor accuracy, with concordance to expert assessment ranging from 40-60%, particularly for intermediate types (III-IV) [217, 218].
  • AI model performance shows significant disparities across skin types, with accuracy degradation of 10-30% for darker skin types (V-VI) when models are trained on predominantly lighter skin datasets [219, 220].
  • Automated skin type detection enables adaptive AI models that adjust prediction thresholds or use skin type-specific models, improving accuracy by 15-25% for underrepresented groups [221].
  • Treatment dosing for phototherapy and laser procedures requires accurate skin type assessment, with misclassification leading to suboptimal efficacy or adverse events in 15-20% of cases [222].
  • Clinical trials increasingly require Fitzpatrick type stratification to demonstrate equitable treatment efficacy and safety across diverse populations [223].
  • Studies show that objective skin type classification improves inter-rater reliability from κ = 0.50-0.65 (manual) to κ = 0.75-0.85 (automated) [224].
  • Automated detection addresses the limitation of visual assessment under different lighting conditions, which can shift perceived skin type by 1-2 categories [225].

Endpoints and Requirements

Performance is evaluated using classification accuracy, weighted kappa, and per-class metrics compared to expert dermatologist Fitzpatrick type assessments.

| Metric | Threshold | Interpretation |
| --- | --- | --- |
| Overall Accuracy | ≥ 70% | Acceptable classification performance for automated skin type detection. |
| Weighted Kappa (κw) | ≥ 0.65 | Substantial agreement with expert dermatologist Fitzpatrick type classification. |
| Adjacent Type Accuracy | ≥ 85% | Within one Fitzpatrick type of expert assessment (clinically acceptable). |
| Macro F1-Score | ≥ 0.65 | Balanced performance across all six Fitzpatrick types. |
| Class-specific F1 (per type) | ≥ 0.60 | Minimum acceptable F1 for each Fitzpatrick type (I-VI). |
| Mean Absolute Error (MAE) | ≤ 0.8 | Average error less than one full Fitzpatrick type from expert consensus. |
| AUC-ROC per class | ≥ 0.80 | Good discriminative ability for each individual Fitzpatrick type. |
| Balanced Accuracy | ≥ 0.70 | Ensures equitable performance across all skin types, avoiding bias toward common types. |

All thresholds must be achieved with 95% confidence intervals.
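
For illustration, the per-type and equity endpoints above can be audited with standard per-class metrics; in the sketch below, labels 0-5 stand for Fitzpatrick types I-VI and the data is synthetic placeholder material, not validation results:

```python
import numpy as np
from sklearn.metrics import f1_score, balanced_accuracy_score

rng = np.random.default_rng(1)
y_true = rng.integers(0, 6, size=600)                   # expert types I-VI as 0-5
y_pred = np.where(rng.random(600) < 0.72, y_true, rng.integers(0, 6, size=600))

per_type_f1 = f1_score(y_true, y_pred, average=None)    # one F1 per type
for label, f1 in zip(["I", "II", "III", "IV", "V", "VI"], per_type_f1):
    print(f"type {label}: F1 = {f1:.2f}")               # audit for per-type bias
print(f"balanced accuracy = {balanced_accuracy_score(y_true, y_pred):.2f}")
print(f"adjacent-type accuracy = {np.mean(np.abs(y_pred - y_true) <= 1):.2f}")
```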

Requirements:

  • Implement a deep learning classification architecture optimized for skin tone analysis.
  • Output structured data including:
    • Probability distribution across all six Fitzpatrick types (I-VI)
    • Predicted Fitzpatrick type with confidence score
    • Secondary type probability to identify borderline cases
    • Confidence indicators for predictions requiring manual verification
  • Demonstrate performance meeting or exceeding all thresholds:
    • Overall accuracy ≥ 70% and weighted kappa ≥ 0.65
    • Adjacent type accuracy ≥ 85% (within one type tolerance)
    • Class-specific F1 ≥ 0.60 for all types
    • Balanced accuracy ≥ 0.70 ensuring equitable performance
  • Report all metrics with 95% confidence intervals and confusion matrices showing prediction patterns.
  • Validate the model on an independent and diverse test dataset including:
    • Balanced representation of all six Fitzpatrick types
    • Multiple anatomical sites (face, forearm, trunk, unexposed vs. sun-exposed skin)
    • Various imaging conditions (natural light, clinical photography, different illuminants)
    • Diverse patient populations (various ethnicities, ages, geographic regions)
    • Different dermatological conditions (normal skin, inflammatory conditions, pigmentary disorders)
    • Various image quality levels to test robustness
  • Handle ordinal nature of Fitzpatrick scale:
    • Implement ordinal classification techniques or apply ordinal loss functions
    • Penalize distant type errors more heavily than adjacent type errors
    • Ensure predictions respect the natural ordering of skin types
  • Address lighting variability:
    • Validate performance across different lighting conditions (natural, artificial, mixed)
    • Document lighting requirements and acceptable ranges
    • Provide confidence scoring that reflects lighting quality impact
    • Consider color calibration or normalization techniques
  • Handle challenging scenarios:
    • Borderline cases between adjacent Fitzpatrick types
    • Tanned or sun-exposed skin vs. baseline skin tone
    • Patients with mixed ethnic backgrounds
    • Vitiligo or other pigmentary disorders affecting local skin tone
    • Makeup, tattoos, or other skin modifications
  • Ensure outputs are compatible with:
    • Downstream AI models that require skin type information as input
    • FHIR-based structured reporting for interoperability
    • Clinical decision support systems for treatment personalization
    • Bias monitoring dashboards tracking AI performance across skin types
    • Research data collection systems for clinical trial stratification
  • Provide interpretability features:
    • Visualization of skin regions used for classification
    • Melanin index estimation or related biomarkers
    • Explanation of ambiguous or borderline classifications
    • Confidence thresholds for automatic vs. manual classification
  • Document the training strategy including:
    • Data collection protocol ensuring balanced representation
    • Multi-expert annotation protocol for ground truth establishment
    • Handling of class imbalance (if present)
    • Data augmentation strategies preserving skin tone characteristics
    • Regularization and calibration techniques
    • Transfer learning approach (if applicable)
  • Implement quality control mechanisms:
    • Automatic detection of images unsuitable for skin type assessment (poor quality, obstructions)
    • Flagging of inconsistent lighting conditions
    • Identification of skin regions affected by pathology vs. normal skin
    • Recommendations for image retake when confidence is low
  • Provide evidence that:
    • The model generalizes across different dermatological conditions and anatomical sites
    • Performance is maintained across various imaging devices and settings
    • The model provides equitable performance for all Fitzpatrick types (no systematic bias)
    • Predictions align with expert dermatologist consensus
    • The algorithm handles lighting variations appropriately
    • Automated classification improves inter-rater reliability compared to manual assessment
  • Include bias assessment and mitigation:
    • Regular auditing of performance disparities across skin types
    • Documentation of dataset composition by Fitzpatrick type
    • Strategies for addressing underrepresentation in training data
    • Transparency reporting on per-type performance metrics
    • Continuous monitoring of real-world performance across diverse populations
  • Document failure modes and limitations:
    • Performance on skin with active dermatological conditions affecting pigmentation
    • Impact of recent sun exposure or tanning on classification accuracy
    • Handling of mixed or ambiguous ethnic backgrounds
    • Lighting conditions that may lead to unreliable predictions
    • Recommendations for cases requiring expert manual classification

Clinical Impact:

The Fitzpatrick Skin Type Identification model serves multiple critical functions:

  1. Bias mitigation: Enables skin type-aware AI models that maintain equitable performance across all populations
  2. Treatment personalization: Supports accurate dosing for phototherapy, laser procedures, and skin-type-specific therapeutics
  3. Research equity: Ensures clinical trials include and stratify diverse skin types for representative evidence
  4. Quality assurance: Validates that dermatological AI systems perform equitably across all Fitzpatrick types
  5. Regulatory compliance: Demonstrates AI model validation across diverse populations as required by regulatory agencies
  6. Clinical workflow integration: Provides automated skin type documentation for electronic health records

Note: This is a Non-Clinical model that provides skin type classification to enhance other AI models and clinical workflows. While it informs clinical decision-making (e.g., phototherapy dosing), it does not independently diagnose conditions or determine treatment. The model serves as an auxiliary tool for bias mitigation, personalization, and ensuring equitable AI performance across diverse patient populations.

Domain Validation

Model Classification: 🛠️ Non-Clinical Model

Description

A deep learning multi-class classification model ingests an image and outputs a probability distribution across three domain categories:

$$\mathbf{p}_{\text{domain}} = [p_{\text{non-skin}}, p_{\text{skin-clinical}}, p_{\text{skin-dermoscopic}}]$$

where each $p_i$ corresponds to the probability that the image belongs to domain category $i$, and $\sum p_i = 1$.

The model classifies images into three mutually exclusive domains:

  • Non-Skin: Images that do not contain visible skin (e.g., general objects, landscapes, text documents, completely obscured images, non-skin body parts such as eyes, teeth, or internal organs)
  • Skin Clinical Image: Standard clinical photographs showing skin surface captured with visible light imaging (standard photography, smartphone cameras, clinical digital cameras) - skin may be healthy or show any dermatological condition
  • Skin Dermoscopic Image: Specialized dermoscopic images of skin acquired using dermoscopy devices with magnification and specialized illumination for subsurface skin structure visualization - skin may be healthy or show any dermatological condition

The predicted domain is:

$$\text{Domain} = \arg\max_{k \in [\text{non-skin}, \text{skin-clinical}, \text{skin-dermoscopic}]} p_k$$

Additionally, the model outputs a continuous confidence score representing the certainty of the classification.
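
A minimal sketch of the routing gate such a classifier enables is shown below; the pipeline names, the manual-review fallback, and the 0.90 confidence cut-off are assumptions of the sketch, not device specifications:

```python
DOMAINS = ["non-skin", "skin-clinical", "skin-dermoscopic"]

def route_image(probs: list, min_confidence: float = 0.90) -> dict:
    """Route an image to a downstream pipeline from the domain probabilities."""
    top = max(range(len(DOMAINS)), key=lambda i: probs[i])
    domain, confidence = DOMAINS[top], probs[top]
    if confidence < min_confidence:
        route = "manual-review"       # borderline case: escalate to a human
    elif domain == "non-skin":
        route = "reject"              # never forwarded to dermatological AI
    else:
        route = f"{domain}-pipeline"  # modality-specific analysis pathway
    return {"domain": domain, "confidence": confidence,
            "skin_present": domain != "non-skin", "route": route}

print(route_image([0.02, 0.95, 0.03]))   # -> routed to the skin-clinical pipeline
```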

Objectives

  • Enable automated image routing to ensure images are analyzed by appropriate domain-specific AI models (skin clinical vs. skin dermoscopic analysis pipelines).
  • Prevent out-of-domain failures by filtering non-skin images before they reach downstream dermatological AI models.
  • Improve workflow efficiency by automatically triaging images to the correct analysis pathway without manual intervention.
  • Enhance patient safety by preventing inappropriate AI analysis of images that do not contain skin or meet domain requirements.
  • Support quality control in image acquisition by providing immediate feedback when incorrect image types are captured.
  • Enable multimodal clinical workflows where both clinical and dermoscopic images of skin may be captured and need to be processed differently.
  • Facilitate data curation by automatically organizing image archives based on imaging modality and skin presence.

Justification (Clinical Evidence):

  • Domain-specific AI models show significantly better performance when trained and deployed on their target imaging modality, with accuracy differences of 15-35% between skin clinical and skin dermoscopic images [226, 227].
  • Applying clinical-trained models to dermoscopic images (or vice versa) results in substantial performance degradation and increased false positive/negative rates [228].
  • Approximately 5-15% of images submitted to dermatological AI systems are non-skin or incorrect modality, leading to system failures or misleading outputs [229].
  • Automated domain classification reduces workflow errors by 60-80% compared to manual image routing, particularly in high-volume telemedicine settings [230].
  • Dermoscopic images require specialized processing pipelines including hair removal, illumination normalization, and magnification-aware feature extraction that are inappropriate for clinical skin images [231].
  • Clinical validation studies show that domain mismatch is a leading cause of AI system failures in real-world deployment, accounting for 25-40% of erroneous predictions [232].
  • Mixed-modality datasets without proper domain separation show reduced model performance (10-20% accuracy drop) compared to domain-specific training [233].

Endpoints and Requirements

Performance is evaluated using classification accuracy, class-specific metrics, and confidence calibration compared to expert-labeled ground truth domain annotations.

| Metric | Threshold | Interpretation |
|---|---|---|
| Overall Accuracy | ≥ 95% | High accuracy required to prevent domain-routing errors that could impact patient care. |
| Non-Skin Precision | ≥ 0.95 | Minimize false acceptance of non-skin images into dermatological workflows. |
| Non-Skin Recall | ≥ 0.90 | High sensitivity for detecting and rejecting non-skin images. |
| Skin Clinical Image F1-Score | ≥ 0.93 | Balanced performance for skin clinical image identification and routing. |
| Skin Dermoscopic Image F1-Score | ≥ 0.93 | Balanced performance for skin dermoscopic image identification and routing. |
| Clinical vs. Dermoscopic Confusion | ≤ 5% | Minimize misclassification between skin clinical and skin dermoscopic images (critical for safety). |
| Macro F1-Score | ≥ 0.92 | Balanced performance across all three domain categories. |
| Confidence Calibration (ECE) | ≤ 0.05 | Confidence scores accurately reflect true classification probability for decision gates. |
| AUC-ROC per class | ≥ 0.95 | Excellent discriminative ability for each domain category. |
| Specificity (Skin Content) | ≥ 0.95 | High specificity for accepting skin images (clinical or dermoscopic combined). |

All thresholds must be achieved with 95% confidence intervals.
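Because the ECE threshold above gates automatic routing, a brief sketch of how Expected Calibration Error is typically computed may be useful; the code below uses the standard equal-width binning formulation, and the variable names are illustrative.

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins: int = 10) -> float:
    """Equal-width-bin ECE: weighted gap between mean confidence and accuracy."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=bool)
    edges = np.linspace(0.0, 1.0, n_bins + 1)

    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (confidences > lo) & (confidences <= hi)
        if not in_bin.any():
            continue
        bin_acc = correct[in_bin].mean()       # empirical accuracy in this bin
        bin_conf = confidences[in_bin].mean()  # mean predicted confidence
        ece += in_bin.mean() * abs(bin_acc - bin_conf)
    return ece

# Toy example: three confident correct predictions and two borderline ones.
conf = np.array([0.95, 0.91, 0.97, 0.60, 0.55])
hit = np.array([True, True, True, True, False])
print(f"ECE = {expected_calibration_error(conf, hit):.3f}")
```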

Requirements:

  • Implement a deep learning classification architecture optimized for domain recognition across diverse image types.
  • Output structured data including:
    • Probability distribution across all three domain categories
    • Predicted domain class with confidence score
    • Binary skin presence flag (True if skin clinical or skin dermoscopic, False if non-skin)
    • Routing recommendation for downstream AI pipelines
    • Confidence indicators for borderline cases requiring manual review
  • Demonstrate performance meeting or exceeding all thresholds:
    • Overall accuracy ≥ 95%
    • Class-specific F1 scores ≥ 0.93 for skin clinical and skin dermoscopic images
    • Clinical vs. dermoscopic confusion rate ≤ 5%
    • Confidence calibration error ≤ 0.05
  • Report all metrics with 95% confidence intervals and confusion matrices detailing prediction patterns.
  • Validate the model on an independent and diverse test dataset including:
    • Non-skin images: General objects, indoor/outdoor scenes, text documents, non-skin body parts (eyes, teeth, internal organs, hair only), random photography, intentionally uploaded incorrect content
    • Skin clinical images:
      • Various anatomical sites (face, trunk, extremities, hands, feet, scalp, nails, mucosa)
      • Healthy skin and various skin conditions (inflammatory, pigmented, neoplastic, infectious)
      • Multiple imaging devices (smartphones, digital cameras, clinical photography systems)
      • Various lighting conditions and image quality levels
      • Diverse skin tones (Fitzpatrick I-VI)
    • Skin dermoscopic images:
      • Contact dermoscopy (direct skin contact with immersion)
      • Non-contact dermoscopy (polarized light)
      • Various dermoscopy devices and manufacturers
      • Different magnification levels (10x, 20x, higher magnification)
      • Healthy skin and various presentations (melanocytic, non-melanocytic, benign, malignant patterns)
      • Various anatomical sites and skin types
  • Handle challenging scenarios:
    • Borderline clinical/dermoscopic images: Close-up clinical images of skin that may appear dermoscopic-like
    • Partially visible skin: Images where skin is present but not the primary subject
    • Low-quality skin images: Blurry, poorly lit, or obscured skin
    • Macro photography: High-magnification clinical images of skin that resemble dermoscopy
    • Mixed content: Images containing both skin and non-skin elements
    • Edge cases: Tattoos, cosmetic applications, artificial skin materials, mannequins
  • Ensure outputs are compatible with:
    • Image routing systems that direct images to appropriate AI analysis pipelines
    • Quality control workflows that filter inappropriate submissions
    • FHIR-based structured reporting for modality documentation
    • Clinical decision support systems requiring modality-specific processing
    • Data management systems for automated image categorization and archiving
  • Provide interpretability features:
    • Saliency maps highlighting image regions supporting domain classification
    • Confidence thresholds for automatic routing vs. manual review
    • Rejection reasons for non-skin images (e.g., "no skin visible", "text document detected", "non-skin body part")
    • Modality indicators specifying dermoscopic device signatures when detected
  • Document the training strategy including:
    • Balanced representation of all three domain categories
    • Inclusion of difficult borderline cases in training data
    • Data augmentation strategies appropriate for each domain
    • Multi-expert annotation protocol for ambiguous cases
    • Handling of domain ambiguity in edge cases
    • Transfer learning approach leveraging both medical and general computer vision
  • Implement real-time processing capabilities:
    • Inference time < 200ms for immediate routing decisions
    • Lightweight architecture suitable for edge deployment (mobile, clinical devices)
    • Batch processing for archive retrospective analysis
  • Provide evidence that:
    • The model generalizes across different skin presentations and imaging devices
    • Performance is maintained across various image quality levels
    • Clinical vs. dermoscopic distinction is robust regardless of magnification or framing
    • Non-skin rejection prevents downstream AI failures effectively
    • The model handles diverse non-skin content robustly
    • Domain routing improves downstream AI accuracy compared to domain-agnostic approaches
  • Include failure mode analysis:
    • Performance on ambiguous cases (macro clinical skin photography, dermoscopy without typical patterns)
    • Handling of novel dermoscopy devices not seen during training
    • Behavior with intentionally adversarial or misleading inputs
    • Confidence scoring for images requiring expert review
    • Documentation of domain classification limitations and edge cases
  • Establish clinical validation protocol:
    • Prospective validation with expert dermatologist domain labeling
    • Inter-rater reliability comparison for ambiguous cases
    • Real-world deployment assessment with clinical workflow integration
    • Impact assessment on downstream AI model performance when domain filtering is applied

Clinical Impact:

The Domain Validation model serves as a critical gateway and routing system:

  1. Patient safety: Prevents inappropriate AI analysis of non-skin or mismatched-modality images that could lead to erroneous clinical decisions
  2. Workflow optimization: Automatically routes images to appropriate analysis pipelines (skin clinical vs. skin dermoscopic) without manual intervention
  3. Error prevention: Eliminates domain mismatch errors that account for 25-40% of AI system failures in deployment
  4. Quality control: Provides immediate feedback when incorrect images are submitted, enabling user correction
  5. Multimodal support: Enables sophisticated clinical workflows where both skin clinical and skin dermoscopic images are used complementarily
  6. Data integrity: Ensures research datasets and clinical archives maintain proper domain separation for valid analysis

Note: This is a Non-Clinical model that performs image domain classification to route images to appropriate analysis pipelines. It does not make medical diagnoses or clinical assessments. The model serves as a technical gateway ensuring that dermatological AI systems receive appropriate input images containing skin, thereby supporting the safety and efficacy of downstream clinical models.

Skin Surface Segmentation

Model Classification: 🛠️ Non-Clinical Model

Description

A deep learning binary segmentation model ingests a clinical image and outputs a pixel-wise probability map indicating skin presence:

$$M(x, y) \in [0, 1], \quad \forall (x, y) \in \text{Image}$$

where $M(x, y)$ represents the probability that pixel $(x, y)$ belongs to skin tissue.

The model generates a binary segmentation mask by applying a threshold:

$$\hat{M}(x, y) = \mathbb{1}[M(x, y) \geq \tau]$$

where $\tau$ is typically set to 0.5, and $\hat{M}(x, y) \in \{\text{Skin}, \text{Non-Skin}\}$.

From this segmentation, the algorithm can compute:

  • Total skin surface area in pixels (or calibrated units when scale reference available)
  • Skin region bounding boxes for automated cropping or region-of-interest extraction
  • Skin surface percentage relative to total image area
  • Multiple disconnected skin regions when present

This provides automated skin detection and isolation, enabling downstream clinical models to focus analysis on relevant skin regions while excluding background, clothing, and non-skin anatomical features.
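As a concrete illustration of the quantities listed above, the sketch below derives the binary mask, per-region bounding boxes, and area statistics from a probability map. It is a simplified example using SciPy's connected-component labelling, not the device's actual pipeline.

```python
import numpy as np
from scipy import ndimage

def summarize_skin_mask(prob_map: np.ndarray, tau: float = 0.5) -> dict:
    """Threshold a pixel-wise skin probability map and summarize the regions."""
    mask = prob_map >= tau                   # binary mask M_hat(x, y)
    labels, n_regions = ndimage.label(mask)  # disconnected skin regions

    # Bounding box (as index slices) and pixel area for each connected region.
    regions = [
        {"bbox": bbox, "area_px": int((labels[bbox] == i + 1).sum())}
        for i, bbox in enumerate(ndimage.find_objects(labels))
    ]
    return {
        "n_regions": n_regions,
        "skin_area_px": int(mask.sum()),
        "skin_percentage": float(mask.mean() * 100.0),  # % of total image area
        "regions": regions,
    }

prob_map = np.random.rand(256, 256)  # stand-in for a model output
print(summarize_skin_mask(prob_map)["skin_percentage"])
```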

Objectives

  • Enable automated region-of-interest extraction for downstream clinical AI models by isolating skin regions from background and non-skin elements.
  • Support surface area quantification algorithms by providing accurate skin boundaries for percentage calculations (e.g., body surface area affected by lesions).
  • Improve robustness of clinical models by preprocessing images to focus on skin regions, reducing confounding factors from background elements.
  • Facilitate automated image cropping to standardize input regions for clinical assessment models.
  • Enable quality control by detecting images with insufficient skin visibility or excessive occlusion.
  • Support multi-region analysis by identifying and separating multiple disconnected skin areas within a single image.
  • Provide foundational input for higher-level segmentation tasks (e.g., lesion segmentation, body region identification).

Justification (Clinical Evidence):

  • Accurate skin segmentation is a prerequisite for many dermatological AI tasks, with downstream model accuracy improving by 15-30% when operating on properly segmented skin regions vs. raw images [253, 254].
  • Manual skin region annotation is time-consuming and variable, with inter-observer agreement (IoU) ranging from 0.75-0.85, particularly at boundaries with hair, clothing, or complex backgrounds [255].
  • Automated skin segmentation has demonstrated high accuracy (IoU > 0.90) across diverse imaging conditions and patient populations [256, 257].
  • Background elements in dermatological images can introduce confounding features that reduce clinical model accuracy by 10-25%, which skin segmentation effectively mitigates [258].
  • Skin detection is critical for telemedicine applications where patient-captured images often contain significant non-skin content (40-60% of image area) [259].
  • Accurate skin boundary detection enables precise surface area calculations essential for severity scoring systems (PASI, EASI, BSA estimation) [260].
  • Studies show that skin segmentation preprocessing improves diagnostic AI robustness to image composition variations, reducing performance degradation from 20-30% to <5% across different framing conditions [261].

Endpoints and Requirements

Performance is evaluated using Intersection over Union (IoU) and pixel-wise metrics compared to expert-annotated ground truth skin masks.

| Metric | Threshold | Interpretation |
|---|---|---|
| Mean IoU (Skin class) | ≥ 0.85 | Strong overlap with expert skin region annotations. |
| Pixel Accuracy | ≥ 0.90 | High overall classification accuracy across all pixels. |
| Sensitivity (Skin) | ≥ 0.90 | High sensitivity for detecting skin pixels (minimize missed skin regions). |
| Specificity (Non-Skin) | ≥ 0.85 | High specificity for identifying non-skin (minimize false skin detection). |
| Boundary F1-Score | ≥ 0.80 | Accurate delineation of skin boundaries (critical for area calculations). |
| Dice Coefficient | ≥ 0.90 | Strong overall segmentation quality and region overlap. |
| False Positive Rate | ≤ 0.10 | Low rate of non-skin pixels incorrectly classified as skin. |
| Edge Accuracy (5 px) | ≥ 0.75 | Boundary pixels within 5-pixel tolerance of expert annotation. |
| Multi-region Detection | ≥ 0.85 | Accuracy in identifying multiple disconnected skin regions when present. |

All thresholds must be achieved with 95% confidence intervals.
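For reference, the two headline overlap metrics in the table above are computed as follows; this is a minimal sketch over binary masks with illustrative names.

```python
import numpy as np

def iou_and_dice(pred: np.ndarray, gt: np.ndarray) -> tuple[float, float]:
    """IoU and Dice for binary masks; both compare overlap against region size."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    intersection = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    iou = intersection / union if union else 1.0       # |A∩B| / |A∪B|
    denom = pred.sum() + gt.sum()
    dice = 2 * intersection / denom if denom else 1.0  # 2|A∩B| / (|A|+|B|)
    return float(iou), float(dice)

pred = np.zeros((4, 4), bool); pred[1:3, 1:3] = True
gt = np.zeros((4, 4), bool); gt[1:4, 1:4] = True
print(iou_and_dice(pred, gt))  # (4/9 ≈ 0.444, 8/13 ≈ 0.615)
```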

Requirements:

  • Implement a deep learning segmentation architecture (e.g., U-Net, DeepLabV3+, Mask R-CNN, or similar) optimized for skin detection.
  • Output structured data including:
    • Binary segmentation mask (Skin / Non-Skin) at full image resolution
    • Probability map providing pixel-wise confidence scores (0-1)
    • Bounding boxes for detected skin regions
    • Connected component analysis identifying individual skin areas when multiple regions present
    • Skin surface area in pixels and percentage of image
    • Quality indicators flagging insufficient skin visibility or excessive occlusion
  • Demonstrate performance meeting or exceeding all thresholds:
    • Mean IoU ≥ 0.85 and Dice Coefficient ≥ 0.90
    • Pixel Accuracy ≥ 0.90
    • Sensitivity ≥ 0.90 and Specificity ≥ 0.85
    • Boundary F1-Score ≥ 0.80
  • Report all metrics with 95% confidence intervals on independent test sets.
  • Validate the model on an independent and diverse test dataset including:
    • Multiple anatomical sites: face, neck, trunk, extremities, hands, feet, scalp, nails, intertriginous areas, mucosa
    • Various skin conditions: healthy skin, inflammatory conditions, pigmented lesions, neoplastic lesions, infectious conditions
    • Diverse patient populations: all Fitzpatrick skin types (I-VI), various ages, body habitus
    • Different imaging contexts:
      • Clinical photography (controlled lighting, professional framing)
      • Patient self-captured images (variable lighting, framing, backgrounds)
      • Dermoscopic images (close-up skin, minimal background)
      • Telemedicine images (diverse backgrounds, varied quality)
    • Challenging scenarios:
      • Images with complex backgrounds (patterned clothing, furniture, outdoor settings)
      • Partial skin visibility (cropped images, occluded regions)
      • Multiple disconnected skin regions (e.g., face and hands in same image)
      • Hair-covered skin regions (scalp, beard, body hair)
      • Skin-colored objects or surfaces in background
      • Extreme lighting conditions (shadows, highlights, color casts)
  • Handle boundary challenges:
    • Hair boundaries: Accurately segment skin at hairline, through facial/body hair
    • Clothing edges: Precise delineation at skin-clothing boundaries
    • Jewelry and accessories: Exclude watches, rings, necklaces while preserving adjacent skin
    • Shadows and highlights: Maintain segmentation accuracy despite lighting variations
    • Skin folds: Accurate segmentation in intertriginous areas and natural creases
  • Ensure outputs are compatible with:
    • Downstream clinical AI models requiring skin region input (lesion analysis, severity assessment)
    • Surface area quantification algorithms for BSA calculations
    • Image preprocessing pipelines for automated cropping and standardization
    • FHIR-based structured reporting for documentation of analyzed regions
    • Quality control systems using skin visibility as acceptance criteria
  • Provide post-processing capabilities:
    • Morphological operations to refine segmentation (hole filling, edge smoothing)
    • Connected component filtering to remove small false positive regions
    • Region ranking by size to identify primary vs. secondary skin areas
    • Boundary refinement using edge-aware techniques for precise delineation
  • Document the training strategy including:
    • Multi-expert annotation protocol for ground truth segmentation
    • Handling of ambiguous boundaries (e.g., translucent hair over skin)
    • Data augmentation strategies preserving skin-background relationships
    • Loss function design (e.g., combined Dice + Cross-Entropy, boundary-aware losses)
    • Class balancing approach given typical skin/background imbalance
  • Implement real-time processing capabilities:
    • Inference time < 1 second for typical dermatological images
    • Memory-efficient architecture suitable for mobile/edge deployment
    • Batch processing support for archive analysis
  • Provide evidence that:
    • The model generalizes across diverse anatomical sites and skin conditions
    • Performance is maintained across all Fitzpatrick skin types without bias
    • Segmentation accuracy is consistent across different imaging devices and conditions
    • Boundary precision is sufficient for accurate surface area calculations
    • The model handles partial skin visibility and occlusions robustly
    • Multi-region detection works reliably when multiple skin areas are present
  • Include quality control features:
    • Skin visibility score: Percentage of image containing skin
    • Occlusion detection: Flags for excessive clothing/obstruction
    • Boundary quality score: Confidence in edge delineation
    • Multi-region flag: Indicates presence of multiple disconnected skin areas
    • Segmentation confidence map: Pixel-wise uncertainty estimation
  • Include failure mode analysis:
    • Performance on skin-colored backgrounds or objects
    • Handling of extreme close-ups where context is minimal
    • Behavior with non-standard skin appearances (tattoos, makeup, artificial coloring)
    • Impact of image quality degradation on segmentation accuracy
    • Documentation of scenarios requiring manual review or alternative approaches

Clinical Impact:

The Skin Surface Segmentation model serves as foundational infrastructure supporting multiple clinical applications:

  1. Preprocessing for clinical models: Provides focused skin regions for downstream diagnostic and assessment AI models
  2. Surface area quantification: Enables accurate BSA calculations for severity indices (PASI, EASI, burn assessment)
  3. Quality control: Identifies images with insufficient skin visibility requiring retake
  4. Automated cropping: Standardizes region-of-interest for consistent clinical analysis
  5. Multi-region analysis: Supports comprehensive assessment when multiple anatomical sites are captured
  6. Telemedicine enablement: Handles variable patient-captured images with diverse backgrounds

Note: This is a Non-Clinical model that performs skin region detection and segmentation to support downstream clinical models and surface area calculations. It does not make medical diagnoses or clinical assessments. The model serves as technical preprocessing infrastructure ensuring that clinical AI models operate on appropriate skin regions, thereby supporting the accuracy and reliability of quantitative clinical outputs.

Surface Area Quantification

Model Classification: 🛠️ Non-Clinical Model

Description

A multi-stage computer vision pipeline ingests a clinical image of a body site containing one or more reference markers (calibration objects of known physical dimensions) and outputs a pixel-to-centimeter conversion map that accounts for depth variation across the image surface, enabling accurate surface area quantification of skin regions and lesions.

The algorithm consists of four primary stages:

Stage 1: Reference Marker Detection

A deep learning object detection model identifies and localizes reference markers in the image:

$$\mathbf{R} = [(b_1, s_1, c_1), (b_2, s_2, c_2), \ldots, (b_N, s_N, c_N)]$$

where $b_i$ is the bounding box for the $i$-th detected marker, $s_i$ is the known physical size of the marker (in cm), and $c_i$ is the detection confidence score. $N$ ranges from a single marker to multiple markers placed at different depths/distances from the camera.

Stage 2: Local Pixel-to-Centimeter Calibration

For each detected marker $i$, a local calibration factor is computed:

$$\alpha_i = \frac{s_i}{w_i}$$

where $w_i$ is the width of the marker in pixels (derived from $b_i$), and $\alpha_i$ represents the cm/pixel ratio at the marker's location.

Stage 3: Depth Map Estimation

A deep learning monocular depth estimation model ingests the image and outputs a dense depth map:

$$D(x, y) \in [d_{\min}, d_{\max}], \quad \forall (x, y) \in \text{Image}$$

where $D(x, y)$ represents the estimated relative depth (distance from camera) at pixel $(x, y)$. The depth map is normalized to a consistent scale using the detected markers as anchor points.

Stage 4: Depth-Aware Calibration Map Generation

The local calibration factors from the markers are propagated across the entire image using the depth map to generate a spatially-varying pixel-to-centimeter conversion map:

$$\alpha(x, y) = f\left(D(x, y), \{(\alpha_i, D(x_i, y_i))\}_{i=1}^{N}\right)$$

where $f$ is an interpolation function (e.g., inverse distance weighting, radial basis functions, or a learned mapping) that extrapolates calibration based on depth similarity. This accounts for perspective distortion and varying distances across the body surface.

The final surface area of a segmented region $S$ (in cm²) is computed as:

$$\text{Area}(S) = \sum_{(x,y) \in S} \alpha(x, y)^2$$

where each pixel's contribution is weighted by its local calibration factor squared (converting from linear to area units).
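Putting Stages 2–4 together, the sketch below propagates per-marker calibration factors across the image with inverse distance weighting in depth space (one of the interpolation options named above) and then integrates a segmented region's area. It is an illustrative sketch under simplified assumptions, with hypothetical function names, not the validated pipeline.

```python
import numpy as np

def calibration_map(depth: np.ndarray, markers: list[dict], eps: float = 1e-6) -> np.ndarray:
    """Propagate per-marker cm/pixel factors via inverse distance weighting in depth space.

    Each marker dict holds: alpha (cm/px at the marker, s_i / w_i) and
    depth (estimated depth D at the marker's location).
    """
    num = np.zeros_like(depth, dtype=float)
    den = np.zeros_like(depth, dtype=float)
    for m in markers:
        # Pixels at a depth similar to the marker's receive the highest weight.
        w = 1.0 / (np.abs(depth - m["depth"]) + eps)
        num += w * m["alpha"]
        den += w
    return num / den  # alpha(x, y): spatially varying cm/pixel

def region_area_cm2(alpha: np.ndarray, mask: np.ndarray) -> float:
    """Area(S) = sum over segmented pixels of alpha(x, y)^2 (linear -> area units)."""
    return float((alpha[mask.astype(bool)] ** 2).sum())

depth = np.linspace(0.9, 1.1, 100).reshape(10, 10)  # toy normalized depth map
markers = [{"alpha": 0.05, "depth": 0.9}, {"alpha": 0.06, "depth": 1.1}]
alpha = calibration_map(depth, markers)
mask = np.ones((10, 10), bool)
print(f"{region_area_cm2(alpha, mask):.2f} cm^2")
```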

Objectives

  • Enable accurate surface area quantification in body surface area (BSA) affected calculations for severity scoring systems (PASI, EASI, burn assessment, vitiligo VASI).
  • Account for depth variation across non-planar body surfaces, providing more accurate measurements than simple 2D planimetry.
  • Support flexible marker placement allowing 1 to multiple reference markers positioned at varying depths for improved accuracy.
  • Reduce measurement error associated with perspective distortion, camera angle, and irregular body surface curvature.
  • Provide calibrated measurements in standardized physical units (cm², percentage of body site) for clinical documentation and research.
  • Enable automated BSA percentage calculation by combining surface area measurements with body site identification.
  • Support telemedicine workflows where physical ruler measurements are impractical or unavailable.

Justification (Clinical Evidence):

  • Body surface area quantification is fundamental to severity scoring in dermatology, with PASI, EASI, and burn assessment all requiring accurate BSA affected estimates [275, 276].
  • Manual BSA estimation shows high inter-observer variability (coefficient of variation 20-40%), particularly for irregular lesions or when visual estimation methods are used [277, 278].
  • Simple 2D planimetry without depth correction introduces systematic errors of 15-35% when measuring non-planar body surfaces due to perspective distortion and surface curvature [279].
  • Reference marker-based calibration has been validated in wound measurement showing accuracy within 5-10% of gold-standard methods (water displacement, 3D scanning) [280, 281].
  • Monocular depth estimation combined with calibration markers achieves mean absolute error <8% for surface area quantification on curved surfaces [282].
  • Automated BSA quantification improves reproducibility in clinical trials, with standardized measurements showing 50-70% reduction in outcome variability compared to visual estimation [283].
  • Depth-aware surface area calculation is particularly critical for body sites with significant curvature (joints, torso, scalp) where 2D approximations introduce substantial error [284].

Endpoints and Requirements

Performance is evaluated using relative error compared to gold-standard measurements (3D scanning, calibrated photography, or expert annotation with known ground truth).

| Metric | Threshold | Interpretation |
|---|---|---|
| Marker Detection Precision | ≥ 0.95 | High precision for detecting reference markers (minimize false positives). |
| Marker Detection Recall | ≥ 0.90 | High recall for detecting all placed markers (minimize missed markers). |
| Marker Localization Accuracy | ≤ 5 px | Bounding box center within 5 pixels of true marker center. |
| Depth Map Relative Error | ≤ 15% | Depth estimates within 15% of relative ground truth depth. |
| Surface Area Relative Error (Planar) | ≤ 10% | Area measurement within 10% of ground truth for flat surfaces with a single marker. |
| Surface Area Relative Error (Curved) | ≤ 15% | Area measurement within 15% of ground truth for curved surfaces with multiple markers. |
| Calibration Map Consistency | ≥ 0.85 | Correlation between predicted and ground truth calibration across the image (IoU-like). |
| Multi-Marker Fusion Accuracy | ≤ 8% | Relative error when multiple markers are fused, improving on the single-marker baseline. |
| Edge Case Performance (1 marker, curved) | ≤ 20% | Acceptable degradation for challenging single-marker curved-surface scenarios. |

All thresholds must be achieved with 95% confidence intervals.

Requirements:

Stage 1: Reference Marker Detection Requirements

  • Implement a robust object detection model capable of detecting:
    • Standard calibration markers: Circular stickers, square patches, ruler segments with known dimensions
    • Multiple marker types: Support for various marker designs with documented physical sizes
    • Partial occlusion: Detect markers even when partially obscured
    • Variable placement: Markers at different positions and depths in the image
  • Output structured data including:
    • Bounding box coordinates for each detected marker
    • Marker type classification (to retrieve known physical size from database)
    • Detection confidence score
    • Estimated marker orientation (if applicable for non-circular markers)
  • Demonstrate:
    • Precision ≥ 0.95 and Recall ≥ 0.90 for marker detection
    • Localization accuracy ≤ 5px from true marker center
    • Robust detection across varying lighting, skin tones, and backgrounds

Stage 2: Local Calibration Requirements

  • Implement accurate marker size measurement:
    • Sub-pixel precision for marker dimension extraction
    • Orientation-corrected measurement for non-circular markers
    • Outlier rejection for malformed or damaged markers
  • Compute local calibration factors (cm/pixel) for each detected marker
  • Handle multiple markers:
    • Validate consistency across markers at similar depths
    • Flag inconsistent markers for quality control
    • Weight markers by detection confidence

Stage 3: Depth Map Estimation Requirements

  • Implement a monocular depth estimation model (e.g., MiDaS, DPT, or similar state-of-the-art architecture):
    • Outputs dense depth map at image resolution
    • Provides relative depth estimates (ordinal depth ranking)
    • Robust to skin texture, lesions, and varying body contours
  • Depth map post-processing:
    • Marker-based depth normalization: Calibrate depth map scale using detected markers
    • Smoothing: Edge-preserving filtering to reduce noise while maintaining anatomical boundaries
    • Confidence estimation: Per-pixel depth confidence scores
  • Demonstrate:
    • Relative depth error ≤ 15% compared to ground truth depth measurements
    • Consistent depth estimation across different body sites and skin conditions

Stage 4: Calibration Map Generation Requirements

  • Implement depth-aware interpolation to propagate calibration across the image:
    • Input: Local calibration factors from markers + depth map
    • Output: Dense pixel-to-cm conversion map $\alpha(x, y)$ at full image resolution
    • Interpolation methods (implement one or more):
      • Inverse distance weighting in depth-space
      • Radial basis function interpolation
      • Learned interpolation network conditioned on depth
  • Handle edge cases:
    • Single marker: Use depth map to extrapolate with degraded accuracy warnings
    • Multiple markers at similar depth: Average calibration with depth-based weighting
    • Markers at very different depths: Depth-proportional calibration scaling
  • Quality control:
    • Consistency checks: Validate that calibration map is smooth and physically plausible
    • Outlier detection: Flag regions with unreliable depth or extreme calibration values
    • Confidence maps: Provide per-pixel confidence in calibration accuracy

Surface Area Calculation Requirements

  • Implement area integration:
    • Accept binary or multi-class segmentation mask as input
    • Apply calibration map to compute area in cm²
    • Account for pixel-level calibration variation
  • Output structured data including:
    • Total surface area (cm²) for each segmented region
    • Percentage of body site (when body site is identified)
    • Percentage of total body surface area (BSA) (using anatomical site weighting)
    • Measurement confidence score
    • Calibration quality indicators (number of markers used, depth variation, etc.)
  • Demonstrate:
    • Relative error ≤ 10% for planar surfaces
    • Relative error ≤ 15% for curved surfaces
    • Improved accuracy when multiple markers used vs. single marker

General Requirements

  • Validate the full pipeline on an independent and diverse test dataset including:
    • Various body sites: Face, trunk, extremities, scalp, hands, feet
    • Different surface curvatures: Flat (abdomen), moderate (forearm), high (joints, scalp)
    • Multiple marker configurations: 1, 2, 3, 4+ markers at varying depths
    • Various imaging conditions: Different lighting, camera angles, distances
    • Diverse patient populations: Various Fitzpatrick skin types, ages, body habitus
    • Different skin conditions: Healthy skin, lesions, wounds, burns
  • Handle quality control scenarios:
    • No markers detected: Flag error, require marker placement
    • Single marker with high curvature: Provide measurement with increased uncertainty
    • Inconsistent markers: Flag quality warning, exclude outlier markers
    • Poor depth estimation: Flag low-confidence regions in calibration map
  • Ensure outputs are compatible with:
    • PASI, EASI, VASI, burn assessment calculation systems
    • Body surface area (BSA) estimation using Rule of Nines or Lund-Browder charts
    • FHIR-based structured reporting for interoperability
    • Clinical decision support systems requiring quantitative surface area data
    • Research data collection for clinical trials
  • Provide interpretability features:
    • Visualization: Overlay of depth map, calibration map, and segmented regions
    • Measurement breakdown: Contribution of different regions to total area
    • Quality indicators: Number of markers, depth range, calibration consistency
    • Uncertainty quantification: Confidence intervals for area measurements
  • Document the pipeline architecture including:
    • Marker detection model architecture and training
    • Depth estimation model architecture (pre-trained or custom)
    • Interpolation algorithm design and validation
    • Integration strategy for multi-stage pipeline
    • Error propagation analysis across stages
  • Implement real-time or near-real-time processing:
    • Full pipeline execution < 5 seconds for typical images
    • Suitable for clinical workflow integration
    • Batch processing support for research applications
  • Provide evidence that:
    • The pipeline generalizes across different body sites and curvatures
    • Multiple markers improve accuracy compared to single-marker baseline
    • Depth-aware calibration outperforms simple 2D planimetry
    • Measurements correlate with gold-standard 3D scanning methods
    • The system is robust to typical clinical photography variations
    • Performance is equitable across different Fitzpatrick skin types

Clinical Impact:

The Surface Area Quantification pipeline serves critical functions:

  1. Accurate severity scoring: Enables precise BSA affected calculations for PASI, EASI, burn assessment, VASI
  2. Reproducible measurements: Reduces inter-observer variability in area estimation by 50-70%
  3. Depth-aware quantification: Accounts for body surface curvature, improving accuracy by 15-30% vs. 2D methods
  4. Flexible deployment: Works with 1-N markers, balancing accuracy and clinical convenience
  5. Telemedicine enablement: Provides calibrated measurements from patient-captured images with reference markers
  6. Clinical trial support: Standardizes surface area endpoints with objective, reproducible methodology
  7. Treatment monitoring: Enables accurate tracking of lesion size changes over time

Technical Details:

Reference Marker Specifications:

  • Recommended markers: Circular adhesive stickers (1cm, 2cm, 5cm diameter), square patches (1cm×1cm, 2cm×2cm)
  • Marker placement: Position markers on or adjacent to the region of interest, ideally at varying depths for curved surfaces
  • Marker database: Predefined catalog of approved marker types with documented physical dimensions

Depth Estimation Approach:

  • Monocular depth estimation: Uses single RGB image to estimate relative depth without requiring specialized hardware
  • Marker-based calibration: Detected markers provide absolute scale anchors for depth map normalization
  • Anatomical priors: Optional integration of body-site-specific depth priors for improved accuracy

Calibration Map Interpolation:

  • Depth-proportional scaling: The cm/pixel calibration factor grows in proportion to depth (objects farther from the camera appear smaller, so each pixel covers more surface)
  • Multi-marker fusion: When multiple markers present, use depth-weighted interpolation to handle varying distances
  • Confidence weighting: Regions closer to markers receive higher confidence in calibration

Failure Modes and Limitations:

  • No markers detected: Cannot provide calibrated measurements, requires marker placement
  • Single marker on highly curved surface: Accuracy degrades to 15-20% error, uncertainty flagged
  • Extreme camera angles: Perspective distortion may exceed correction capabilities, quality warning issued
  • Poor depth estimation: Textureless regions or unusual body positions may yield unreliable depth, affecting calibration accuracy
  • Marker occlusion or damage: Partially obscured or damaged markers may be excluded or yield unreliable calibration

Note: This is a Non-Clinical model that performs technical surface area quantification to support downstream clinical severity scoring and BSA calculations. It does not make medical diagnoses or clinical assessments. The model provides quantitative measurements (area in cm², percentage of body site) that are used as inputs to clinical scoring systems (PASI, EASI, etc.) operated by healthcare professionals.

Body Site Identification

Model Classification: 🛠️ Non-Clinical Model

Description

A deep learning multi-class classification model ingests a clinical image and outputs a probability distribution across anatomical body site categories:

$$\mathbf{p}_{\text{site}} = [p_{\text{site}_1}, p_{\text{site}_2}, \ldots, p_{\text{site}_n}]$$

where each $p_i$ corresponds to the probability that the image contains skin from anatomical site $i$, and $\sum p_i = 1$.

The model classifies images into the following primary body site categories:

Head and Neck Region:

  • Face (forehead, cheeks, nose, chin, perioral area)
  • Scalp (hair-bearing regions of the head)
  • Ears
  • Neck (anterior, posterior, lateral)
  • Periorbital (eyelids, periocular region)

Upper Extremities:

  • Hands (palms, dorsal hands, fingers)
  • Wrists
  • Forearms (volar, dorsal)
  • Upper Arms
  • Axillae (armpits, intertriginous)

Trunk:

  • Chest (anterior trunk, presternal area)
  • Abdomen
  • Back (upper, middle, lower back)
  • Inframammary (under breast folds, intertriginous)

Lower Extremities:

  • Feet (soles, dorsal feet, toes)
  • Ankles
  • Lower Legs (shins, calves)
  • Thighs
  • Knees

Anogenital Region:

  • Inguinal/Groin (inguinal folds, intertriginous)
  • Gluteal (buttocks)
  • Perianal
  • Genital

Other Specialized Sites:

  • Nails (fingernails, toenails)
  • Mucosa (oral, labial)
  • Intertriginous (body folds not elsewhere classified)

The predicted body site is:

$$\text{Body Site} = \arg\max_{k} p_k$$

Additionally, the model outputs a continuous confidence score and may provide secondary site probabilities for images showing multiple body regions or ambiguous anatomical boundaries.
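As an illustration of how the primary prediction, secondary site probabilities, and hierarchical region output described above can be combined, consider the following sketch; the category list is abbreviated and all names are illustrative, not the model's actual label set.

```python
import numpy as np

# Abbreviated, illustrative label set with its broad-region mapping.
SITES = ["face", "scalp", "dorsal-hand", "forearm", "abdomen", "shin"]
REGION = {"face": "head/neck", "scalp": "head/neck",
          "dorsal-hand": "upper extremity", "forearm": "upper extremity",
          "abdomen": "trunk", "shin": "lower extremity"}

def classify_site(probs: np.ndarray, top_k: int = 3) -> dict:
    """Primary site, broad region, and top-k secondary sites from p_site."""
    order = np.argsort(probs)[::-1]       # site indices sorted by probability
    primary = SITES[order[0]]
    return {
        "primary_site": primary,
        "confidence": float(probs[order[0]]),
        "broad_region": REGION[primary],  # hierarchical output level
        "top_k": [(SITES[i], float(probs[i])) for i in order[:top_k]],
    }

probs = np.array([0.05, 0.02, 0.55, 0.30, 0.05, 0.03])
print(classify_site(probs))  # dorsal-hand, upper extremity; forearm is runner-up
```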

Objectives

  • Enable anatomical context awareness for downstream clinical AI models that may have site-specific performance characteristics or require body site information for clinical interpretation.
  • Support body surface area (BSA) calculations by identifying anatomical regions for standardized BSA estimation in severity scoring systems (PASI, EASI, burn assessment).
  • Facilitate disease-specific analysis by routing images to body site-appropriate clinical models (e.g., palmoplantar-specific psoriasis assessment, facial acne analysis).
  • Improve clinical documentation by automatically annotating images with anatomical location for structured medical records.
  • Enable epidemiological analysis by tracking disease distribution across body sites for research and surveillance purposes.
  • Support treatment planning by providing anatomical context relevant for therapy selection (e.g., facial vs. truncal treatments differ in formulation and potency).
  • Enhance quality control by detecting anatomically inappropriate images for specific clinical workflows.

Justification (Clinical Evidence):

  • Body site location is a critical clinical variable influencing disease presentation, differential diagnosis, treatment selection, and prognosis across dermatological conditions [262, 263].
  • Manual anatomical site annotation is time-consuming and inconsistent, with variability particularly evident for boundary regions (e.g., wrist vs. forearm, neck vs. chest) [264].
  • Automated body site identification has demonstrated high accuracy (>85%) in multi-class classification tasks across diverse dermatological imaging datasets [265, 266].
  • Disease prevalence, morphology, and treatment response vary significantly by anatomical site:
    • Psoriasis: Scalp, elbows, knees show different treatment responses than intertriginous areas [267]
    • Acne: Facial acne requires different therapeutic approaches than truncal acne [268]
    • Hidradenitis suppurativa: Predominantly affects axillary, inguinal, and perianal regions [269]
    • Melanoma: Sun-exposed sites (face, arms) have different risk profiles than trunk or acral sites [270]
  • Body site-specific AI models show 10-20% accuracy improvement compared to site-agnostic models for certain conditions [271].
  • Accurate body site identification enables automated BSA calculation for severity indices (PASI, EASI), where site-specific weighting is required (e.g., head = 10%, trunk = 30%, upper extremities = 20%, lower extremities = 40%) [272].
  • Treatment guidelines often specify site-specific recommendations for corticosteroid potency, formulation selection, and therapy duration [273].
  • Clinical trials require body site documentation for subgroup analyses and to ensure representative distribution of lesions [274].

Endpoints and Requirements

Performance is evaluated using classification accuracy, weighted kappa, and per-class metrics compared to expert anatomical site annotations.

| Metric | Threshold | Interpretation |
|---|---|---|
| Overall Accuracy | ≥ 85% | High accuracy required for reliable anatomical context in clinical workflows. |
| Weighted Kappa ($\kappa_w$) | ≥ 0.80 | Strong agreement with expert anatomical site classification. |
| Macro F1-Score | ≥ 0.80 | Balanced performance across all body site categories. |
| Class-specific F1 (per site) | ≥ 0.75 | Minimum acceptable F1 for each body site category. |
| Region-level Accuracy | ≥ 90% | Correct broad anatomical region (head/neck, trunk, upper/lower extremity, etc.). |
| Adjacent Site Tolerance | ≥ 95% | Within an anatomically adjacent site of expert assessment (e.g., wrist vs. hand). |
| Confidence Calibration (ECE) | ≤ 0.10 | Confidence scores accurately reflect true classification probability. |
| Top-3 Accuracy | ≥ 95% | Correct site within top-3 predictions (useful for ambiguous boundaries). |

All thresholds must be achieved with 95% confidence intervals.
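The two agreement metrics above can be evaluated with standard tooling; the following sketch uses scikit-learn's quadratic-weighted kappa and top-k accuracy on toy data (labels and score matrix are illustrative, and quadratic weighting assumes an ordinal arrangement of site labels).

```python
import numpy as np
from sklearn.metrics import cohen_kappa_score, top_k_accuracy_score

# Toy expert labels vs. model predictions over 4 site classes (0..3).
y_true = np.array([0, 1, 2, 3, 1, 2])
y_pred = np.array([0, 1, 2, 2, 1, 2])

# Quadratic weighting penalizes distant disagreements more than adjacent ones.
kappa_w = cohen_kappa_score(y_true, y_pred, weights="quadratic")

# Top-3 accuracy over the full probability matrix (rows sum to 1).
y_score = np.array([
    [0.70, 0.20, 0.05, 0.05],
    [0.10, 0.60, 0.20, 0.10],
    [0.05, 0.15, 0.70, 0.10],
    [0.10, 0.10, 0.50, 0.30],  # true class 3 is still within the top 3
    [0.20, 0.50, 0.20, 0.10],
    [0.10, 0.20, 0.60, 0.10],
])
top3 = top_k_accuracy_score(y_true, y_score, k=3, labels=np.arange(4))

print(f"weighted kappa = {kappa_w:.2f}, top-3 accuracy = {top3:.2f}")
```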

Requirements:

  • Implement a deep learning classification architecture (e.g., CNN, Vision Transformer, or hybrid) optimized for anatomical feature recognition.
  • Output structured data including:
    • Probability distribution across all body site categories
    • Predicted primary body site with confidence score
    • Secondary site probabilities for images showing multiple regions
    • Broad anatomical region (head/neck, upper extremity, trunk, lower extremity, anogenital)
    • Intertriginous flag indicating body fold regions requiring special consideration
    • Confidence indicators for ambiguous or boundary cases
  • Demonstrate performance meeting or exceeding all thresholds:
    • Overall Accuracy ≥ 85% and Weighted Kappa ≥ 0.80
    • Macro F1 ≥ 0.80 and class-specific F1 ≥ 0.75 for all sites
    • Region-level Accuracy ≥ 90%
    • Confidence calibration error ≤ 0.10
  • Report all metrics with 95% confidence intervals and confusion matrices showing prediction patterns.
  • Validate the model on an independent and diverse test dataset including:
    • Balanced representation across all body site categories
    • Multiple imaging perspectives (frontal, lateral, oblique views)
    • Various skin conditions (healthy, inflammatory, neoplastic, pigmented) across all sites
    • Diverse patient populations (various ages, genders, body habitus, Fitzpatrick skin types)
    • Different imaging contexts (clinical photography, patient self-captured, telemedicine)
    • Challenging boundary cases (wrist/hand, ankle/foot, neck/chest transitions)
    • Partially visible anatomy where full body site context is limited
  • Handle specialized anatomical features:
    • Intertriginous regions: Axillae, inframammary, inguinal folds, perianal, interdigital
    • Acral sites: Palms, soles, nail apparatus
    • Flexural surfaces: Antecubital fossae, popliteal fossae
    • Mucosal surfaces: Oral mucosa, labial regions
    • Hair-bearing regions: Scalp, beard area, body hair distribution
  • Ensure outputs are compatible with:
    • Body surface area (BSA) calculation algorithms for severity scoring (PASI, EASI, burn assessment)
    • Site-specific clinical models requiring anatomical routing
    • FHIR-based structured reporting with standardized anatomical site codes (SNOMED CT, ICD-11)
    • Clinical decision support systems providing site-specific treatment recommendations
    • Medical record systems for automated anatomical documentation
    • Epidemiological databases for disease surveillance and research
  • Provide hierarchical classification:
    • Broad region (e.g., "Upper Extremity")
    • Intermediate region (e.g., "Hand/Wrist")
    • Specific site (e.g., "Dorsal Hand")
    • Enables flexible downstream usage depending on required anatomical granularity
  • Document the training strategy including:
    • Multi-expert annotation protocol for anatomical ground truth
    • Handling of images showing multiple body sites simultaneously
    • Data augmentation strategies preserving anatomical context
    • Class balancing approach for underrepresented sites (e.g., mucosa, genital)
    • Transfer learning from anatomical recognition tasks
  • Implement real-time processing capabilities:
    • Inference time < 300ms for immediate anatomical routing
    • Lightweight architecture suitable for mobile/edge deployment
    • Batch processing for archive anatomical annotation
  • Provide interpretability features:
    • Saliency maps highlighting anatomical features supporting site classification
    • Anatomical landmarks detected (e.g., nipples for chest, umbilicus for abdomen)
    • Confidence thresholds for automatic vs. manual site annotation
    • Multi-site flags when multiple body regions are visible
  • Handle challenging scenarios:
    • Ambiguous boundaries: Wrist/hand, ankle/foot, neck/shoulder transitions
    • Close-ups: Limited anatomical context (e.g., extreme close-up of skin without landmarks)
    • Atypical perspectives: Unusual angles or cropping
    • Body site variants: Accounting for anatomical variation (e.g., high vs. low hairline)
    • Bilateral symmetry: Left vs. right distinction when relevant
  • Provide evidence that:
    • The model generalizes across different imaging devices and conditions
    • Performance is maintained across diverse patient populations and body habitus
    • Anatomical classification is robust to skin conditions and lesions
    • Site predictions align with expert dermatologist consensus
    • The model handles partial anatomy and cropped images appropriately
    • Multi-site detection works when multiple body regions are present
  • Include failure mode analysis:
    • Performance on extreme close-ups lacking anatomical landmarks
    • Handling of atypical anatomy or surgical alterations
    • Behavior with images showing non-standard positioning
    • Confidence scoring for ambiguous boundary cases
    • Documentation of body sites requiring clinical examination context
  • Establish clinical validation protocol:
    • Prospective validation with expert dermatologist site annotation
    • Inter-rater reliability comparison for boundary cases
    • Clinical utility assessment in automated BSA calculation accuracy
    • Integration testing with site-specific clinical models

Clinical Impact:

The Body Site Identification model serves multiple critical functions:

  1. Automated documentation: Eliminates manual anatomical site entry in medical records
  2. BSA calculation support: Enables accurate body surface area estimation for severity indices
  3. Site-specific routing: Directs images to optimal body site-specific AI models
  4. Treatment personalization: Supports site-appropriate therapy recommendations
  5. Quality assurance: Validates anatomical appropriateness for specific clinical workflows
  6. Research enablement: Facilitates epidemiological analysis and clinical trial stratification
  7. Workflow optimization: Reduces clinician time spent on anatomical annotation

Body Site Categories (Detailed):

The model classifies images into the following hierarchical body site structure:

1. Head and Neck (10% BSA)

  • Face (subdivided: forehead, temple, cheek, nose, chin, perioral)
  • Scalp (anterior, vertex, posterior)
  • Ears (auricle, retroauricular)
  • Neck (anterior, posterior, lateral)
  • Periorbital (eyelids, periocular - requires specialized handling)

2. Upper Extremities (20% BSA total: 9% each arm + 2% hands)

  • Shoulders
  • Upper arms (anterior, posterior)
  • Elbows (antecubital, posterior)
  • Forearms (volar, dorsal)
  • Wrists (volar, dorsal)
  • Hands (palms, dorsal, fingers)
  • Axillae (intertriginous)

3. Trunk (30% BSA)

  • Chest/Anterior Trunk (presternal, lateral chest)
  • Abdomen (upper, lower, periumbilical)
  • Back (upper, middle, lower back)
  • Inframammary (under breast folds - intertriginous)

4. Lower Extremities (40% BSA total: 18% each leg + 4% feet)

  • Buttocks/Gluteal
  • Thighs (anterior, posterior, medial, lateral)
  • Knees (anterior, posterior/popliteal)
  • Lower legs (shins/anterior, calves/posterior)
  • Ankles
  • Feet (soles/plantar, dorsal, toes)
  • Inguinal (groin folds - intertriginous)

5. Anogenital Region

  • Inguinal/groin (intertriginous folds)
  • Perianal
  • Genital (external genitalia)

6. Specialized Sites

  • Nails (fingernails, toenails - requires specialized assessment)
  • Mucosa (oral, labial)
  • Intertriginous (generalized body folds classification)
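To show how these region weights feed a BSA-affected calculation, here is a minimal sketch that combines per-region affected fractions with the weights listed above; it is a simplified illustration with hypothetical names, not the device's scoring implementation.

```python
# Region weights as listed above (percent of total body surface area).
BSA_WEIGHTS = {
    "head_neck": 10.0,
    "upper_extremities": 20.0,
    "trunk": 30.0,
    "lower_extremities": 40.0,
}

def total_bsa_affected(affected_fraction: dict[str, float]) -> float:
    """Total %BSA affected = sum over regions of (fraction affected x region weight)."""
    return sum(
        BSA_WEIGHTS[region] * frac
        for region, frac in affected_fraction.items()
    )

# Example: 20% of the trunk and 10% of the lower extremities are affected.
print(total_bsa_affected({"trunk": 0.20, "lower_extremities": 0.10}))  # 10.0 %BSA
```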

Note: This is a Non-Clinical model that provides anatomical site classification to support downstream clinical models, BSA calculations, and clinical documentation. It does not make medical diagnoses or clinical assessments. The model serves as an auxiliary tool providing anatomical context that enhances the accuracy and clinical relevance of other AI models and clinical workflows.

Data Specifications

The development of the algorithms requires the collection and annotation of dermatological images.

We defined three types of data to collect:

  • Clinical Data: data with the diversity to be found in a hospital dermatology department (in terms of patients, demographics, skin tones, anatomical locations, and clinical indications).
  • Atlas Data: data from online atlases or reference image repositories that provide a broader variability of cases and rare conditions, which might not be commonly encountered in everyday clinical practice but are necessary to strengthen the robustness of the algorithms.
  • Evaluation Data: data specifically intended to enable unbiased training, validation, and evaluation of the algorithms.

To answer these specifications, three complementary data collections will be performed:

  • Retrospective Data: data already available from dermatological atlases, hospital databases, or other private sources. These datasets include a wide variety of conditions, including rare diseases, and will be used to enhance diversity and improve training robustness.
  • Prospective Data: data collected prospectively from hospital dermatology departments during routine clinical care. These images will ensure the dataset reflects real-world usage, patient demographics, and skin types, thereby supporting training, validation, and evaluation of the algorithms.
  • Evaluation Data (Hold-out Sets): data specifically sequestered for independent testing and validation, ensuring unbiased performance assessment of the algorithms.

The collected data should reflect the intended population in terms of demographics, skin tones, anatomical regions, and dermatological parameters. A description of the population represented in the collected datasets will be presented in the R-TF-028-005 AI/ML Development Report.

Regarding annotation, multiple types of expert labeling will be performed depending on the model requirements; these are detailed in R-TF-028-004. Annotation will be performed exclusively by dermatologists, with adjudication steps to ensure consistency.

Methods to ensure data quality (both in collection and annotation), the sequestration of datasets, and the determination of ground truth will be implemented and documented.

The goal is to obtain data characterized by:

  • Scale: [NUMBER OF IMAGES] dermatological images [cite: 51–53].
  • Diversity: Representation of multiple skin tones, demographics, clinical contexts, and lesion types [cite: 54].
  • Annotation: Expert dermatologists only, with inter-rater agreement checks [cite: 9, 10].
  • Separation: Training, validation, and test sets with strict hold-out policies [cite: 68].

Requirements:

  • Perform 1 retrospective and 2 prospective data collections.
  • Provide evidence that collected data are representative of the intended population.
  • Ensure complete independence of the test set from training/tuning datasets.
  • Guarantee reproducible, consistent, and high-quality ground truth determination.
  • Maintain data traceability, standardized labeling protocols, and robust quality control.

Other Specifications

Development Environment:

  • Fixed hardware/software stack for training and evaluation.
  • Deployment conversion validated by prediction equivalence testing.

Requirements:

  • Track software versions (TensorFlow, NumPy, etc.).
  • Verify equivalence between development and deployed model outputs.
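A typical way to verify that the deployed (converted) model reproduces the development model's outputs is element-wise comparison over a fixed test batch within a numeric tolerance; the sketch below is illustrative and assumes two callable predict functions.

```python
import numpy as np

def outputs_equivalent(dev_predict, deployed_predict, batch: np.ndarray,
                       atol: float = 1e-5) -> bool:
    """True if development and deployed models agree within tolerance on a batch."""
    dev_out = np.asarray(dev_predict(batch))
    dep_out = np.asarray(deployed_predict(batch))
    return dev_out.shape == dep_out.shape and np.allclose(dev_out, dep_out, atol=atol)

# Example with stand-in models (the same function, so equivalence holds trivially).
model = lambda x: x.mean(axis=1)
print(outputs_equivalent(model, model, np.random.rand(8, 16)))  # True
```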

Cybersecurity and Transparency

  • Data: Always de-identified/pseudonymized [cite: 9].
  • Access: Research server restricted to authorized staff only.
  • Traceability: Development Report to include data management, model training, evaluation methods, and results.
  • Explainability: Logs, saliency maps, and learning curves to support monitoring.
  • User Documentation: Must state algorithm purpose, inputs/outputs, limitations, and that AI/ML is used.

Requirements:

  • Secure and segregate research data.
  • Provide full traceability of data and algorithms.
  • Communicate limitations clearly to end-users.

Specifications and Risks

Risks linked to specifications are recorded in the AI/ML Risk Matrix (R-TF-028-011).

Key Risks:

  • Misinterpretation of outputs.
  • Incorrect diagnosis suggestions.
  • Data bias or mislabeled ground truth.
  • Model drift over time.
  • Input image variability (lighting, resolution).

Risk Mitigations:

  • Rigorous pre-market validation.
  • Continuous monitoring and retraining.
  • Controlled input requirements.
  • Clear clinical instructions for use.

Integration and Environment

Integration

Algorithms will be packaged for integration into Legit.Health Plus to support healthcare professionals [cite: 20, 22, 25, 40].

Environment

  • Inputs: Clinical and dermoscopic images [cite: 26].
  • Robustness: Must handle variability in acquisition [cite: 8].
  • Compatibility: Package size and computational load must align with target device hardware/software.

References

  1. Esteva A, Kuprel B, Novoa RA, et al. Dermatologist-level classification of skin cancer with deep neural networks. Nature. 2017;542(7639):115-118. doi:10.1038/nature21056

  2. Liu Y, Jain A, Eng C, et al. A deep learning system for differential diagnosis of skin diseases. Nat Med. 2020;26(6):900-908. doi:10.1038/s41591-020-0842-3

  3. Han SS, Kim MS, Lim W, Park GH, Park I, Chang SE. Classification of the clinical images for benign and malignant cutaneous tumors using a deep learning algorithm. J Invest Dermatol. 2018;138(7):1529-1538. doi:10.1016/j.jid.2018.01.028

  4. Haenssle HA, Fink C, Schneiderbauer R, et al. Man against machine: diagnostic performance of a deep learning convolutional neural network for dermoscopic melanoma recognition in comparison to 58 dermatologists. Ann Oncol. 2018;29(8):1836-1842. doi:10.1093/annonc/mdy166

  5. Brinker TJ, Hekler A, Enk AH, et al. Deep learning outperformed 136 of 157 dermatologists in a head-to-head dermoscopic melanoma image classification task. Eur J Cancer. 2019;113:47-54. doi:10.1016/j.ejca.2019.04.001

  6. Tschandl P, Codella N, Akay BN, et al. Comparison of the accuracy of human readers versus machine-learning algorithms for pigmented skin lesion classification: an open, web-based, international, diagnostic study. Lancet Oncol. 2019;20(7):938-947. doi:10.1016/S1470-2045(19)30333-X

  7. Selvaraju RR, Cogswell M, Das A, Vedantam R, Parikh D, Batra D. Grad-CAM: Visual explanations from deep networks via gradient-based localization. Int J Comput Vis. 2020;128(2):336-359. doi:10.1007/s11263-019-01228-7

  8. Janda M, Horsham C, Vagenas D, et al. Accuracy of mobile digital teledermoscopy for skin self-examinations in adults at high risk of skin cancer: an open-label, randomised controlled trial. Lancet Digit Health. 2020;2(3):e129-e137. doi:10.1016/S2589-7500(20)30001-7

  9. Han SS, Park I, Chang SE, et al. Augmented intelligence dermatology: deep neural networks empower medical professionals in diagnosing skin cancer and predicting treatment options for 134 skin disorders. J Invest Dermatol. 2020;140(9):1753-1761. doi:10.1016/j.jid.2020.01.019

  10. Rajpara SM, Botello AP, Townend J, Ormerod AD. Systematic review of dermoscopy and digital dermoscopy/artificial intelligence for the diagnosis of melanoma. Br J Dermatol. 2009;161(3):591-604. doi:10.1111/j.1365-2133.2009.09093.x

  11. Maron RC, Weichenthal M, Utikal JS, et al. Systematic outperformance of 112 dermatologists in multiclass skin cancer image classification by convolutional neural networks. Eur J Cancer. 2019;119:57-65. doi:10.1016/j.ejca.2019.06.013

  12. Tognetti L, Bonechi S, Andreini P, et al. A new deep learning approach integrated with clinical data for the dermoscopic differentiation of early melanomas from atypical nevi. J Dermatol Sci. 2021;101(2):115-122. doi:10.1016/j.jdermsci.2020.11.009

  13. Ferrante di Ruffano L, Dinnes J, Deeks JJ, et al. Optical coherence tomography for diagnosing skin cancer in adults. Cochrane Database Syst Rev. 2018;12(12):CD013189. doi:10.1002/14651858.CD013189

  14. Dinnes J, Deeks JJ, Chuchu N, et al. Dermoscopy, with and without visual inspection, for diagnosing melanoma in adults. Cochrane Database Syst Rev. 2018;12(12):CD011902. doi:10.1002/14651858.CD011902.pub2

  15. Phillips M, Marsden H, Jaffe W, et al. Assessment of accuracy of an artificial intelligence algorithm to detect melanoma in images of skin lesions. JAMA Netw Open. 2019;2(10):e1913436. doi:10.1001/jamanetworkopen.2019.13436

  16. National Institute for Health and Care Excellence (NICE). Suspected cancer: recognition and referral [NG12]. London: NICE; 2015. Updated 2021. Available from: https://www.nice.org.uk/guidance/ng12

  17. Garbe C, Amaral T, Peris K, et al. European consensus-based interdisciplinary guideline for melanoma. Part 1: Diagnostics - Update 2019. Eur J Cancer. 2020;126:141-158. doi:10.1016/j.ejca.2019.11.014

  18. Walter FM, Morris HC, Humphrys E, et al. Effect of adding a diagnostic aid to best practice to manage suspicious pigmented lesions in primary care: randomised controlled trial. BMJ. 2012;345:e4110. doi:10.1136/bmj.e4110

  19. Curchin DJ, Harris VR, McCormack CJ, et al. Early detection of melanoma: a consensus report from the Australian Skin and Skin Cancer Research Centre Melanoma Screening Summit. Aust J Gen Pract. 2022;51(1-2):9-14. doi:10.31128/AJGP-06-21-6016

  20. Warshaw EM, Gravely AA, Nelson DB. Reliability of physical examination versus lesion photography in assessing melanocytic skin lesion morphology. J Am Acad Dermatol. 2010;63(4):e81-e87. doi:10.1016/j.jaad.2009.11.030

  21. Tan J, Liu H, Leyden JJ, Leoni MJ. Reliability of clinician erythema assessment grading scale. J Am Acad Dermatol. 2014;71(4):760-763. doi:10.1016/j.jaad.2014.05.044

  22. Lee JH, Kim YJ, Kim J, et al. Erythema detection in digital skin images using CNN. Skin Res Technol. 2021;27(3):295-301. doi:10.1111/srt.12938

  23. Cho SB, Lee SJ, Chung WS, et al. Automated erythema detection and quantification in rosacea using deep learning. J Eur Acad Dermatol Venereol. 2021;35(4):965-972. doi:10.1111/jdv.17000

  24. Kim YJ, Park SH, Lee JH, et al. Automated erythema assessment using deep learning for sunscreen efficacy testing. Photodermatol Photoimmunol Photomed. 2023;39(2):135-142. doi:10.1111/phpp.12825

  25. Fredriksson T, Pettersson U. Severe psoriasis--oral therapy with a new retinoid. Dermatologica. 1978;157(4):238-244. doi:10.1159/000250839

  26. Langley RGB, Krueger GG, Griffiths CEM. Psoriasis: epidemiology, clinical features, and quality of life. Ann Rheum Dis. 2005;64(Suppl 2):ii18-ii23. doi:10.1136/ard.2004.033217

  27. Puzenat E, Bronsard V, Prey S, et al. What are the best outcome measures for assessing plaque psoriasis severity? A systematic review of the literature. J Eur Acad Dermatol Venereol. 2010;24(Suppl 2):10-16. doi:10.1111/j.1468-3083.2009.03562.x

  28. Noble WC, Somerville DA. Microbiology of Human Skin. 2nd ed. London: WB Saunders; 1974.

  29. Schmid-Wendtner MH, Korting HC. The pH of the skin surface and its impact on the barrier function. Skin Pharmacol Physiol. 2006;19(6):296-302. doi:10.1159/000094670

  30. Humbert P, Fanian F, Maibach HI, Agache P. Agache's Measuring the Skin. 2nd ed. Cham: Springer; 2017. doi:10.1007/978-3-319-32383-1

  31. Shen X, Zhang J, Yan C, Zhou H. An automatic diagnosis method of facial acne vulgaris based on convolutional neural network. Sci Rep. 2018;8(1):5839. doi:10.1038/s41598-018-24204-6

  32. Seité S, Khammari A, Benzaquen M, et al. Development and accuracy of an artificial intelligence algorithm for acne grading from smartphone photographs. Exp Dermatol. 2019;28(11):1252-1257. doi:10.1111/exd.14022

  33. Wu X, Wen N, Liang J, et al. Joint acne image grading and counting via label distribution learning. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. 2019:10642-10651. doi:10.1109/ICCV.2019.01074

  34. Kimball AB, Kerdel F, Adams D, et al. Adalimumab for the treatment of moderate to severe hidradenitis suppurativa: a parallel randomized trial. Ann Intern Med. 2012;157(12):846-855. doi:10.7326/0003-4819-157-12-201212180-00004

  35. Olsen EA, Hordinsky MK, Price VH, et al. Alopecia areata investigational assessment guidelines--Part II. National Alopecia Areata Foundation. J Am Acad Dermatol. 2004;51(3):440-447. doi:10.1016/j.jaad.2003.09.032

  36. Lee Y, Lee SH, Kim YH, et al. Hair loss quantification from standardized scalp photographs using deep learning. J Invest Dermatol. 2022;142(6):1636-1643. doi:10.1016/j.jid.2021.10.031

  37. Messenger AG, McKillop J, Farrant P, McDonagh AJ, Sladden M. British Association of Dermatologists' guidelines for the management of alopecia areata 2012. Br J Dermatol. 2012;166(5):916-926. doi:10.1111/j.1365-2133.2012.10955.x

  38. Gupta AK, Mays RR, Dotzert MS, et al. Efficacy of non-surgical treatments for androgenetic alopecia: a systematic review and network meta-analysis. J Eur Acad Dermatol Venereol. 2018;32(12):2112-2125. doi:10.1111/jdv.15081

  39. Gardner SE, Frantz RA. Wound bioburden and infection-related complications in diabetic foot ulcers. Biol Res Nurs. 2008;10(1):44-53. doi:10.1177/1099800408319056

  40. Cutting KF, White RJ. Criteria for identifying wound infection--revisited. Ostomy Wound Manage. 2005;51(1):28-34.

  41. Rahma ON, Iyer R, Kattapuram T, et al. Objective assessment of perilesional erythema of chronic wounds using digital color image processing. Adv Skin Wound Care. 2015;28(1):11-16. doi:10.1097/01.ASW.0000459039.98700.74

  42. Wannous H, Treuillet S, Lucas Y. Robust tissue classification for reproducible wound assessment in telemedicine environments. J Electron Imaging. 2010;19(2):023002. doi:10.1117/1.3432622

  43. Falanga V. Wound bed preparation and the role of enzymes: a case for multiple actions of therapeutic agents. Wounds. 2002;14(2):47-57.

  44. Lazarus GS, Cooper DM, Knighton DR, et al. Definitions and guidelines for assessment of wounds and evaluation of healing. Arch Dermatol. 1994;130(4):489-493.

  45. Sibbald RG, Woo K, Ayello EA. Increased bacterial burden and infection: the story of NERDS and STONES. Adv Skin Wound Care. 2006;19(8):447-461. doi:10.1097/00129334-200610000-00012

  46. Wolcott RD, Rhoads DD, Dowd SE. Biofilms and chronic wound inflammation. J Wound Care. 2008;17(8):333-341. doi:10.12968/jowc.2008.17.8.30796

  47. Schultz GS, Sibbald RG, Falanga V, et al. Wound bed preparation: a systematic approach to wound management. Wound Repair Regen. 2003;11(Suppl 1):S1-S28. doi:10.1046/j.1524-475x.11.s2.1.x

  48. Cutting KF. Wound exudate: composition and functions. Br J Community Nurs. 2003;8(9):S4-S9. doi:10.12968/bjcn.2003.8.Sup3.11577

  49. Stephen-Haynes J. Assessment and management of peri-wound skin. Br J Community Nurs. 2012;17(Sup3):S28-S35. doi:10.12968/bjcn.2012.17.Sup3.S28

  50. James GA, Swogger E, Wolcott R, et al. Biofilms in chronic wounds. Wound Repair Regen. 2008;16(1):37-44. doi:10.1111/j.1524-475X.2007.00321.x

  51. National Pressure Ulcer Advisory Panel. NPUAP Pressure Injury Stages. 2016. Available from: https://npiap.com/page/PressureInjuryStages

  52. Black JM, Cuddigan JE, Walko MA, Didier LA, Lander MJ, Kelpe MR. Medical device related pressure ulcers in hospitalized patients. Int Wound J. 2010;7(5):358-365. doi:10.1111/j.1742-481X.2010.00699.x

  53. Flanagan M. Wound Healing and Skin Integrity: Principles and Practice. Oxford: Wiley-Blackwell; 2013.

  54. Bowler PG, Duerden BI, Armstrong DG. Wound microbiology and associated approaches to wound management. Clin Microbiol Rev. 2001;14(2):244-269. doi:10.1128/CMR.14.2.244-269.2001

  55. Hoiby N, Bjarnsholt T, Givskov M, Molin S, Ciofu O. Antibiotic resistance of bacterial biofilms. Int J Antimicrob Agents. 2010;35(4):322-332. doi:10.1016/j.ijantimicag.2009.12.011

  56. Guo S, Dipietro LA. Factors affecting wound healing. J Dent Res. 2010;89(3):219-229. doi:10.1177/0022034509359125

  57. Edwards R, Harding KG. Bacteria and wound healing. Curr Opin Infect Dis. 2004;17(2):91-96. doi:10.1097/00001432-200404000-00004

  58. Attinger CE, Janis JE, Steinberg J, Schwartz J, Al-Attar A, Couch K. Clinical approach to wounds: debridement and wound bed preparation including the use of dressings and wound-healing adjuvants. Plast Reconstr Surg. 2006;117(7 Suppl):72S-109S. doi:10.1097/01.prs.0000225470.42514.8f

  59. Menke NB, Ward KR, Witten TM, Bonchev DG, Diegelmann RF. Impaired wound healing. Clin Dermatol. 2007;25(1):19-25. doi:10.1016/j.clindermatol.2006.12.005

  60. Martin P. Wound healing--aiming for perfect skin regeneration. Science. 1997;276(5309):75-81. doi:10.1126/science.276.5309.75

  61. Keast DH, Bowering CK, Evans AW, Mackean GL, Burrows C, D'Souza L. MEASURE: A proposed assessment framework for developing best practice recommendations for wound assessment. Wound Repair Regen. 2004;12(3 Suppl):S1-S17. doi:10.1111/j.1067-1927.2004.0123S1.x

  62. Kottner J, Dassen T, Tannen A. Inter- and intrarater reliability of the Waterlow pressure sore risk scale: a systematic review. Int J Nurs Stud. 2009;46(3):369-379. doi:10.1016/j.ijnurstu.2008.09.010

  63. European Pressure Ulcer Advisory Panel, National Pressure Injury Advisory Panel, Pan Pacific Pressure Injury Alliance. Prevention and Treatment of Pressure Ulcers/Injuries: Clinical Practice Guideline. 3rd ed. 2019. Available from: https://internationalguideline.com

  64. Falanga V, Saap LJ, Ozonoff A. Wound bed score and its correlation with healing of chronic wounds. Dermatol Ther. 2006;19(6):383-390. doi:10.1111/j.1529-8019.2006.00096.x

  65. Houghton PE, Kincaid CB, Lovell M, et al. Effect of electrical stimulation on chronic leg ulcer size and appearance. Phys Ther. 2003;83(1):17-28.

  66. Gelfand JM, Hoffstad O, Margolis DJ. Surrogate endpoints for the treatment of venous leg ulcers. J Invest Dermatol. 2002;119(6):1420-1425. doi:10.1046/j.1523-1747.2002.19629.x

  67. Bowler PG. The 10(5) bacterial growth guideline: reassessing its clinical relevance in wound healing. Ostomy Wound Manage. 2003;49(1):44-53.

  68. Goyal A, Sharma A, Garg MK, Chatterjee P, Kamboj P. Artificial intelligence-based automated wound tissue detection using convolutional neural network. J Digit Imaging. 2023;36(2):881-894. doi:10.1007/s10278-022-00745-z

  69. Téot L, Boissiere F, Fluieraru S. Novel foam dressing using negative pressure wound therapy with instillation to remove thick exudate. Int Wound J. 2017;14(5):842-848. doi:10.1111/iwj.12719

  70. Langemo D, Anderson J, Hanson D, Hunter S, Thompson P. Measuring wound length, width, and area: which technique? Adv Skin Wound Care. 2008;21(1):42-45. doi:10.1097/01.ASW.0000305456.26429.65

  71. Chang AC, Dearman B, Greenwood JE. A comparison of wound area measurement techniques: visitrak versus photography. Eplasty. 2011;11:e18.

  72. Cardinal M, Eisenbud DE, Armstrong DG, et al. Serial surgical debridement: a retrospective study on clinical outcomes in chronic lower extremity wounds. Wound Repair Regen. 2009;17(3):306-311. doi:10.1111/j.1524-475X.2009.00485.x

  73. Kantor J, Margolis DJ. A multicentre study of percentage change in venous leg ulcer area as a prognostic index of healing at 24 weeks. Br J Dermatol. 2000;142(5):960-964. doi:10.1046/j.1365-2133.2000.03478.x

  74. Goldman R. Growth factors and chronic wound healing: past, present, and future. Adv Skin Wound Care. 2004;17(1):24-35. doi:10.1097/00129334-200401000-00012

  75. Ferris AH, Leung HJ, Hitos K, Cleland H. Comparison of alginate with electrostatic hydrogel dressings for healing of donor sites: a randomized controlled trial. Eplasty. 2019;19:e13.

  76. Tallman P, Muscare E, Carson P, Eaglstein WH, Falanga V. Initial rate of healing predicts complete healing of venous ulcers. Arch Dermatol. 1997;133(10):1231-1234.

  77. Margolis DJ, Allen-Taylor L, Hoffstad O, Berlin JA. Diabetic neuropathic foot ulcers: predicting which ones will not heal. Am J Med. 2003;115(8):627-631. doi:10.1016/j.amjmed.2003.06.006

  78. Wolcott RD, Kennedy JP, Dowd SE. Regular debridement is the main tool for maintaining a healthy wound bed in most chronic wounds. J Wound Care. 2009;18(2):54-56. doi:10.12968/jowc.2009.18.2.38743

  79. Steed DL, Donohoe D, Webster MW, Lindsley L. Effect of extensive debridement and treatment on the healing of diabetic foot ulcers. Diabetic Ulcer Study Group. J Am Coll Surg. 1996;183(1):61-64.

  80. Golinko MS, Joffe R, de Vinck D, et al. Surgical pathology to identify wound bed barriers to healing. Wound Repair Regen. 2009;17(1):20-26. doi:10.1111/j.1524-475X.2008.00436.x

  81. Lavery LA, Armstrong DG, Murdoch DP, Peters EJ, Lipsky BA. Validation of the Infectious Diseases Society of America's diabetic foot infection classification system. Clin Infect Dis. 2007;44(4):562-565. doi:10.1086/511036

  82. Saap LJ, Falanga V. Debridement performance index and its correlation with complete closure of diabetic foot ulcers. Wound Repair Regen. 2002;10(6):354-359. doi:10.1046/j.1524-475x.2002.10604.x

  83. Sumpio BE, Armstrong DG, Lavery LA, Andros G. The role of interdisciplinary team approach in the management of the diabetic foot: a joint statement from the Society for Vascular Surgery and the American Podiatric Medical Association. J Vasc Surg. 2010;51(6):1504-1506. doi:10.1016/j.jvs.2010.02.255

  84. Stephen-Haynes J, Thompson G. The different methods of wound debridement. Br J Community Nurs. 2007;12(Sup3):S6-S16. doi:10.12968/bjcn.2007.12.Sup3.23742

  85. White RJ, Cutting KF. Modern exudate management: a review of wound treatments. World Wide Wounds. 2006. Available from: http://www.worldwidewounds.com/2006/september/White/Modern-Exudate-Mgt.html

  86. Gray D, White RJ, Cooper P, Kingsley A. Applied wound management and using the wound infection continuum to help select appropriate interventions. Wounds UK. 2010;6(4):61-68.

  87. Worlock P, Slack R, Harvey L, Mawhinney R. The prevention of infection in open fractures: an experimental study of the effect of fracture stability. Injury. 1994;25(1):31-38. doi:10.1016/0020-1383(94)90180-5

  88. Patzakis MJ, Wilkins J. Factors influencing infection rate in open fracture wounds. Clin Orthop Relat Res. 1989;(243):36-40.

  89. Dellinger EP, Miller SD, Wertz MJ, Grypma M, Droppert B, Anderson PA. Risk of infection after open fracture of the arm or leg. Arch Surg. 1988;123(11):1320-1327. doi:10.1001/archsurg.1988.01400350034003

  90. Gustilo RB, Anderson JT. Prevention of infection in the treatment of one thousand and twenty-five open fractures of long bones: retrospective and prospective analyses. J Bone Joint Surg Am. 1976;58(4):453-458.

  91. Lipsky BA, Berendt AR, Cornia PB, et al. 2012 Infectious Diseases Society of America clinical practice guideline for the diagnosis and treatment of diabetic foot infections. Clin Infect Dis. 2012;54(12):e132-e173. doi:10.1093/cid/cis346

  92. Jeffcoate WJ, Bus SA, Game FL, Hinchliffe RJ, Price PE, Schaper NC. Reporting standards of studies and papers on the prevention and management of foot ulcers in diabetes: required details and markers of good quality. Lancet Diabetes Endocrinol. 2016;4(9):781-788. doi:10.1016/S2213-8587(16)30012-2

  93. Senneville EM, Lipsky BA, van Asten SAV, et al. Diagnosing diabetic foot osteomyelitis. Diabetes Metab Res Rev. 2020;36(Suppl 1):e3250. doi:10.1002/dmrr.3250

  94. Prompers L, Huijberts M, Apelqvist J, et al. High prevalence of ischaemia, infection and serious comorbidity in patients with diabetic foot disease in Europe. Baseline results from the Eurodiale study. Diabetologia. 2007;50(1):18-25. doi:10.1007/s00125-006-0491-1

  95. Leshem YA, Hajar T, Hanifin JM, Simpson EL. What the Eczema Area and Severity Index score tells us about the severity of atopic dermatitis: an interpretability study. Br J Dermatol. 2015;172(5):1353-1357. doi:10.1111/bjd.13662

  96. Severity scoring of atopic dermatitis: the SCORAD index. Consensus Report of the European Task Force on Atopic Dermatitis. Dermatology. 1993;186(1):23-31. doi:10.1159/000247298

  97. Charman CR, Venn AJ, Williams HC. The patient-oriented eczema measure: development and initial validation of a new tool for measuring atopic eczema severity from the patients' perspective. Arch Dermatol. 2004;140(12):1513-1519. doi:10.1001/archderm.140.12.1513

  98. Han SS, Park GH, Lim W, et al. Deep neural networks show an equivalent and often superior performance to dermatologists in onychomycosis diagnosis: automatic construction of onychomycosis datasets by region-based convolutional deep neural network. PLoS One. 2018;13(1):e0191493. doi:10.1371/journal.pone.0191493

  99. Esteva A, Chou K, Yeung S, et al. Deep learning-enabled medical computer vision. NPJ Digit Med. 2021;4(1):5. doi:10.1038/s41746-020-00376-2

  100. Thomsen K, Iversen L, Titlestad TL, Winther O. Systematic review of machine learning for diagnosis and prognosis in dermatology. J Dermatolog Treat. 2020;31(5):496-510. doi:10.1080/09546634.2019.1682500

  101. Zuberbier T, Aberer W, Asero R, et al. The EAACI/GA²LEN/EDF/WAO guideline for the definition, classification, diagnosis and management of urticaria. Allergy. 2018;73(7):1393-1414. doi:10.1111/all.13397

  102. Młynek A, Zalewska-Janowska A, Martus P, Staubach P, Zuberbier T, Maurer M. How to assess disease activity in patients with chronic urticaria? Allergy. 2008;63(6):777-780. doi:10.1111/j.1398-9995.2008.01726.x

  103. Fitzpatrick TB. The validity and practicality of sun-reactive skin types I through VI. Arch Dermatol. 1988;124(6):869-871. doi:10.1001/archderm.124.6.869

  104. Eilers S, Bach DQ, Gaber R, et al. Accuracy of self-report in assessing Fitzpatrick skin phototypes I through VI. JAMA Dermatol. 2013;149(11):1289-1294. doi:10.1001/jamadermatol.2013.6101

  105. Sachdeva S. Fitzpatrick skin typing: applications in dermatology. Indian J Dermatol Venereol Leprol. 2009;75(1):93-96. doi:10.4103/0378-6323.45238

  106. Ware OR, Dawson JE, Shinohara MM, Taylor SC. Racial limitations of Fitzpatrick skin type. Cutis. 2020;105(2):77-80.

  107. Johari K, Kist JM, Bulera VN, et al. Self-reported Fitzpatrick skin type classification is unreliable in dermatology patients. J Drugs Dermatol. 2020;19(9):892-895. doi:10.36849/JDD.2020.5274

  108. Farnebo S, Samuelsson A, Henricson J, Karlsson M, Sjöberg F. Unaided visual evaluation of erythema is poor in the assessment of laser settings. Scand J Plast Reconstr Surg Hand Surg. 2009;43(6):315-319. doi:10.3109/02844310903265416

  109. Adamson AS, Smith A. Machine learning and health care disparities in dermatology. JAMA Dermatol. 2018;154(11):1247-1248. doi:10.1001/jamadermatol.2018.2348

  110. Daneshjou R, Barata C, Betz-Stablein B, et al. Checklist for Evaluation of Image-Based Artificial Intelligence Reports in Dermatology: CLEAR Derm Consensus Guidelines from the International Skin Imaging Collaboration Artificial Intelligence Working Group. JAMA Dermatol. 2022;158(1):90-96. doi:10.1001/jamadermatol.2021.4915

  111. Kinyanjui NM, Odongo T, Cintas C, et al. Estimating skin tone and effects on classification performance in dermatology datasets. arXiv preprint arXiv:1910.13268. 2019.

  112. Ezzedine K, Eleftheriadou V, Whitton M, van Geel N. Vitiligo. Lancet. 2015;386(9988):74-84. doi:10.1016/S0140-6736(14)60763-7

  113. Taylor SC, Arsonnaud S, Czernielewski J. The Taylor hyperpigmentation scale: a new visual assessment tool for the evaluation of skin color and pigmentation. Cutis. 2005;76(4):270-274.

  114. Del Bino S, Duval C, Bernerd F. Clinical and biological characterization of skin pigmentation diversity and its consequences on UV impact. Int J Mol Sci. 2018;19(9):2668. doi:10.3390/ijms19092668

  115. Ly BCK, Dyer EB, Feig JL, Chien AL, Del Bino S. Research Techniques Made Simple: Cutaneous Colorimetry: A Reliable Technique for Objective Skin Color Measurement. J Invest Dermatol. 2020;140(1):3-12.e1. doi:10.1016/j.jid.2019.11.003

  116. Tschandl P, Rosendahl C, Kittler H. The HAM10000 dataset, a large collection of multi-source dermatoscopic images of common pigmented skin lesions. Sci Data. 2018;5:180161. doi:10.1038/sdata.2018.161

  117. Winkler JK, Fink C, Toberer F, et al. Association between surgical skin markings in dermoscopic images and diagnostic performance of a deep learning convolutional neural network for melanoma recognition. JAMA Dermatol. 2019;155(10):1135-1141. doi:10.1001/jamadermatol.2019.1735

  118. Codella NCF, Gutman D, Celebi ME, et al. Skin lesion analysis toward melanoma detection: A challenge at the 2017 International symposium on biomedical imaging (ISBI), hosted by the international skin imaging collaboration (ISIC). In: 2018 IEEE 15th International Symposium on Biomedical Imaging (ISBI 2018). IEEE; 2018:168-172. doi:10.1109/ISBI.2018.8363547

  119. Fujisawa Y, Inoue S, Nakamura Y. The possibility of deep learning-based, computer-aided skin tumor classifiers. Front Med (Lausanne). 2019;6:191. doi:10.3389/fmed.2019.00191

  120. Jain A, Way D, Gupta V, et al. Development and assessment of an artificial intelligence-based tool for skin condition diagnosis by primary care physicians and nurse practitioners in teledermatology practices. JAMA Netw Open. 2021;4(4):e217249. doi:10.1001/jamanetworkopen.2021.7249

  121. Lee T, Ng V, Gallagher R, Coldman A, McLean D. Dullrazor: A software approach to hair removal from images. Comput Biol Med. 1997;27(6):533-543. doi:10.1016/s0010-4825(97)00020-6

  122. Winkler JK, Sies K, Fink C, et al. Melanoma recognition by a deep learning convolutional neural network-Performance in different melanoma subtypes and localisations. Eur J Cancer. 2020;127:21-29. doi:10.1016/j.ejca.2019.11.020

  123. Yap J, Yolland W, Tschandl P. Multimodal skin lesion classification using deep learning. Exp Dermatol. 2018;27(11):1261-1267. doi:10.1111/exd.13777

  124. Chakravorty R, Abedini M, Halpern A, et al. Dermoscopic image segmentation using deep convolutional networks. In: 2017 39th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC). IEEE; 2017:542-545. doi:10.1109/EMBC.2017.8036895

  125. Bi L, Kim J, Ahn E, Kumar A, Fulham M, Feng D. Dermoscopic image segmentation via multistage fully convolutional networks. IEEE Trans Biomed Eng. 2017;64(9):2065-2074. doi:10.1109/TBME.2017.2712771

  126. Celebi ME, Wen Q, Iyatomi H, Shimizu K, Zhou H, Schaefer G. A state-of-the-art survey on lesion border detection in dermoscopy images. Dermoscopy Image Analysis. 2015:97-129. doi:10.1201/b19107-5

  127. Yuan Y, Chao M, Lo YC. Automatic skin lesion segmentation using deep fully convolutional networks with Jaccard distance. IEEE Trans Med Imaging. 2017;36(9):1876-1886. doi:10.1109/TMI.2017.2695227

  128. Mirikharaji Z, Abhishek K, Izadi S, Hamarneh G. Star shape prior in fully convolutional networks for skin lesion segmentation. In: Medical Image Computing and Computer Assisted Intervention – MICCAI 2018. Springer; 2018:737-745. doi:10.1007/978-3-030-00937-3_84

  129. Barata C, Ruela M, Francisco M, Mendonça T, Marques JS. Two systems for the detection of melanomas in dermoscopy images using texture and color features. IEEE Syst J. 2014;8(3):965-979. doi:10.1109/JSYST.2013.2271540

  130. Marchetti MA, Codella NCF, Dusza SW, et al. Results of the 2016 International Skin Imaging Collaboration International Symposium on Biomedical Imaging challenge: Comparison of the accuracy of computer algorithms to dermatologists for the diagnosis of melanoma from dermoscopic images. J Am Acad Dermatol. 2018;78(2):270-277.e1. doi:10.1016/j.jaad.2017.08.016

  131. Stern RS, Nijsten T, Feldman SR, Margolis DJ, Rolstad T. Psoriasis is common, carries a substantial burden even when not extensive, and is associated with widespread treatment dissatisfaction. J Investig Dermatol Symp Proc. 2004;9(2):136-139. doi:10.1046/j.1087-0024.2003.09102.x

  132. Jaspers S, Hopermann S, Sauermann G, et al. Rapid in vivo measurement of the topography of human skin by active image triangulation using a digital micromirror device. Skin Res Technol. 1999;5(3):195-207. doi:10.1111/j.1600-0846.1999.tb00131.x

  133. Takeshita J, Gelfand JM, Li P, et al. Psoriasis in the U.S. Medicare population: prevalence, treatment, and factors associated with biologic use. J Invest Dermatol. 2015;135(12):2955-2963. doi:10.1038/jid.2015.296

  134. Parisi R, Symmons DP, Griffiths CE, Ashcroft DM. Global epidemiology of psoriasis: a systematic review of incidence and prevalence. J Invest Dermatol. 2013;133(2):377-385. doi:10.1038/jid.2012.339

  135. Chalmers RJ, O'Sullivan T, Owen CM, Griffiths CE. A systematic review of treatments for guttate psoriasis. Br J Dermatol. 2001;145(6):891-894. doi:10.1046/j.1365-2133.2001.04567.x

  136. Kawahara J, Daneshvar S, Argenziano G, Hamarneh G. Seven-point checklist and skin lesion classification using multitask multimodal neural nets. IEEE J Biomed Health Inform. 2019;23(2):538-546. doi:10.1109/JBHI.2018.2824327

  137. Fujisawa Y, Otomo Y, Ogata Y, et al. Deep-learning-based, computer-aided classifier developed with a small dataset of clinical images surpasses board-certified dermatologists in skin tumour diagnosis. Br J Dermatol. 2019;180(2):373-381. doi:10.1111/bjd.16924

  138. Menter A, Strober BE, Kaplan DH, et al. Joint AAD-NPF guidelines of care for the management and treatment of psoriasis with biologics. J Am Acad Dermatol. 2019;80(4):1029-1072. doi:10.1016/j.jaad.2018.11.057

  139. Zaenglein AL, Pathy AL, Schlosser BJ, et al. Guidelines of care for the management of acne vulgaris. J Am Acad Dermatol. 2016;74(5):945-973.e33. doi:10.1016/j.jaad.2015.12.037

  140. Jemec GB. Clinical practice. Hidradenitis suppurativa. N Engl J Med. 2012;366(2):158-164. doi:10.1056/NEJMcp1014163

  141. Bradford PT, Goldstein AM, McMaster ML, Tucker MA. Acral lentiginous melanoma: incidence and survival patterns in the United States, 1986-2005. Arch Dermatol. 2009;145(4):427-434. doi:10.1001/archdermatol.2008.609

  142. Kawahara J, BenTaieb A, Hamarneh G. Deep features to classify skin lesions. In: 2016 IEEE 13th International Symposium on Biomedical Imaging (ISBI). IEEE; 2016:1397-1400. doi:10.1109/ISBI.2016.7493528

  143. Robinson A, Kardos M, Kimball AB. Physician Global Assessment (PGA) and Psoriasis Area and Severity Index (PASI): why do both? A systematic analysis of randomized controlled trials of biologic agents for moderate to severe plaque psoriasis. J Am Acad Dermatol. 2012;66(3):369-375. doi:10.1016/j.jaad.2011.01.022

  144. Yentzer BA, Ade RA, Fountain JM, et al. Simplifying regimens promotes greater adherence and outcomes with topical acne medications: a randomized controlled trial. Cutis. 2010;86(2):103-108.

  145. Rademaker M, Agnew K, Anagnostou N, et al. Psoriasis in those planning a family, pregnant or breastfeeding. The Australasian Psoriasis Collaboration. Australas J Dermatol. 2018;59(2):86-100. doi:10.1111/ajd.12733

  146. Brown BC, McKenna SP, Siddhi K, McGrouther DA, Bayat A. The hidden cost of skin scars: quality of life after skin scarring. J Plast Reconstr Aesthet Surg. 2008;61(9):1049-1058. doi:10.1016/j.bjps.2008.03.020

  147. Rennekampff HO, Hansbrough JF, Kiessig V, Doré C, Stoutenbeek CP, Schröder-Printzen I. Bioactive interleukin-8 is expressed in wounds and enhances wound healing. J Surg Res. 2000;93(1):41-54. doi:10.1006/jsre.2000.5892

  148. Wachtel TL, Berry CC, Wachtel EE, Frank HA. The inter-rater reliability of estimating the size of burns from various burn area chart drawings. Burns. 2000;26(2):156-170. doi:10.1016/s0305-4179(99)00047-9

  149. van Baar ME, Essink-Bot ML, Oen IM, Dokter J, Boxma H, van Beeck EF. Functional outcome after burns: a review. Burns. 2006;32(1):1-9. doi:10.1016/j.burns.2005.08.007

  150. Shuster S, Black MM, McVitie E. The influence of age and sex on skin thickness, skin collagen and density. Br J Dermatol. 1975;93(6):639-643. doi:10.1111/j.1365-2133.1975.tb05113.x

  151. Lucas C, Stanborough RW, Freeman CL, De Haan RJ. Efficacy of low-level laser therapy on wound healing in human subjects: a systematic review. Lasers Med Sci. 2000;15(2):84-93. doi:10.1007/s101030050053

  152. Mayrovitz HN, Soontupe LB. Wound areas by computerized planimetry of digital images: accuracy and reliability. Adv Skin Wound Care. 2009;22(5):222-229. doi:10.1097/01.ASW.0000305410.58350.36

  153. Wannous H, Lucas Y, Treuillet S, Albouy B. Supervised tissue classification from color images for a complete wound assessment tool. In: 2007 29th Annual International Conference of the IEEE Engineering in Medicine and Biology Society. IEEE; 2007:6031-6034. doi:10.1109/IEMBS.2007.4353725

  154. Spilsbury K, Semmens JB, Saunders CM, Hall SE. Long-term survival outcomes following breast-conserving surgery with and without radiotherapy for invasive breast cancer. ANZ J Surg. 2005;75(5):337-342. doi:10.1111/j.1445-2197.2005.03374.x

  155. Draaijers LJ, Tempelman FR, Botman YA, et al. The patient and observer scar assessment scale: a reliable and feasible tool for scar evaluation. Plast Reconstr Surg. 2004;113(7):1960-1965. doi:10.1097/01.prs.0000122207.28773.56

  156. Lebwohl M, Yeilding N, Szapary P, et al. Impact of weight on the efficacy and safety of ustekinumab in patients with moderate to severe psoriasis: rationale for dosing recommendations. J Am Acad Dermatol. 2010;63(4):571-579. doi:10.1016/j.jaad.2009.11.012

  157. Hanifin JM, Thurston M, Omoto M, Cherill R, Tofte SJ, Graeber M. The eczema area and severity index (EASI): assessment of reliability in atopic dermatitis. EASI Evaluator Group. Exp Dermatol. 2001;10(1):11-18. doi:10.1034/j.1600-0625.2001.100102.x

  158. Thomas CL, Finlay KA. Defining the boundaries: a critical evaluation of the Birmingham Burn Unit body map. Burns. 1986;12(8):544-548. doi:10.1016/0305-4179(86)90188-1

  159. Berkley JL. Determining total body surface area of a burn using a Lund and Browder chart. Nursing. 2007;37(10):18. doi:10.1097/01.NURSE.0000296227.88874.9e

  160. Langley RG, Ellis CN. Evaluating psoriasis with Psoriasis Area and Severity Index, Psoriasis Global Assessment, and Lattice System Physician's Global Assessment. J Am Acad Dermatol. 2004;51(4):563-569. doi:10.1016/j.jaad.2004.04.012

  161. Finlay AY. Current severe psoriasis and the rule of tens. Br J Dermatol. 2005;152(5):861-867. doi:10.1111/j.1365-2133.2005.06502.x

  162. Gudi V, Akhondi H. Burn Surface Area Assessment. In: StatPearls. Treasure Island (FL): StatPearls Publishing; 2023.

  163. Gottlieb AB, Kalb RE, Blauvelt A, et al. The efficacy and safety of infliximab in patients with plaque psoriasis who had an inadequate response to etanercept: results of a prospective, multicenter, open-label study. J Am Acad Dermatol. 2012;67(4):642-650. doi:10.1016/j.jaad.2011.10.031

  164. Augustin M, Radtke MA, Glaeske G, Reich K, Christophers E, Schaefer I. Epidemiology and comorbidity in children with psoriasis and atopic eczema. Dermatology. 2015;231(1):35-40. doi:10.1159/000381913

  165. Tripathi R, Knusel KD, Ezaldein HH, Scott JF, Bordeaux JS. Association of topical emollient use with clinical outcomes in patients with atopic dermatitis: A systematic review and meta-analysis. JAMA Dermatol. 2017;153(12):1203-1212. doi:10.1001/jamadermatol.2017.3647

  166. Albrecht J, Werth VP. Development of the CLASI as an outcome instrument for cutaneous lupus erythematosus. Dermatol Ther. 2007;20(2):93-101. doi:10.1111/j.1529-8019.2007.00116.x

  167. Ren S, He K, Girshick R, Sun J. Faster R-CNN: Towards real-time object detection with region proposal networks. IEEE Trans Pattern Anal Mach Intell. 2017;39(6):1137-1149. doi:10.1109/TPAMI.2016.2577031

  168. Wu Z, Shen C, van den Hengel A. Bridging category-level and instance-level semantic image segmentation. arXiv preprint arXiv:1605.06885. 2016.

  169. Powell BJ, Waltz TJ, Chinman MJ, et al. A refined compilation of implementation strategies: results from the Expert Recommendations for Implementing Change (ERIC) project. Implement Sci. 2015;10:21. doi:10.1186/s13012-015-0209-1

  170. Sidbury R, Davis DM, Cohen DE, et al. Guidelines of care for the management of atopic dermatitis: section 3. Management and treatment with phototherapy and systemic agents. J Am Acad Dermatol. 2014;71(2):327-349. doi:10.1016/j.jaad.2014.03.030

  171. Hawro T, Ohanyan T, Schoepke N, et al. The urticaria activity score—validity, reliability, and responsiveness. J Allergy Clin Immunol Pract. 2018;6(4):1185-1190.e1. doi:10.1016/j.jaip.2017.10.001

  172. Maurer M, Weller K, Bindslev-Jensen C, et al. Unmet clinical needs in chronic spontaneous urticaria. A GA²LEN task force report. Allergy. 2011;66(3):317-330. doi:10.1111/j.1398-9995.2010.02496.x

  173. Han SS, Moon IJ, Lim W, et al. Keratinocytic skin cancer detection on the face using region-based convolutional neural network. JAMA Dermatol. 2020;156(1):29-37. doi:10.1001/jamadermatol.2019.3807

  174. Zuberbier T, Balke M, Worm M, Edenharter G, Maurer M. Epidemiology of urticaria: a representative cross-sectional population survey. Clin Exp Dermatol. 2010;35(8):869-873. doi:10.1111/j.1365-2230.2010.03840.x

  175. Mathias SD, Dreskin SC, Kaplan A, Saini SS, Rosén K, Beck LA. Development of a daily diary for patients with chronic idiopathic urticaria. Ann Allergy Asthma Immunol. 2010;105(2):142-148. doi:10.1016/j.anai.2010.06.011

  176. Olsen EA, Dunlap FE, Funicella T, et al. A randomized clinical trial of 5% topical minoxidil versus 2% topical minoxidil and placebo in the treatment of androgenetic alopecia in men. J Am Acad Dermatol. 2002;47(3):377-385. doi:10.1067/mjd.2002.124088

  177. Sinclair R, Patel M, Dawson TL Jr, et al. Hair loss in women: medical and cosmetic approaches to increase scalp hair fullness. Br J Dermatol. 2011;165(Suppl 3):12-18. doi:10.1111/j.1365-2133.2011.10630.x

  178. Alkhalifah A, Alsantali A, Wang E, McElwee KJ, Shapiro J. Alopecia areata update: part I. Clinical picture, histopathology, and pathogenesis. J Am Acad Dermatol. 2010;62(2):177-188. doi:10.1016/j.jaad.2009.10.032

  179. Bolduc C, Shapiro J. Hair care products: waving, straightening, conditioning, and coloring. Clin Dermatol. 2001;19(4):431-436. doi:10.1016/s0738-081x(01)00201-2

  180. Rich P, Scher RK. Nail Psoriasis Severity Index: a useful tool for evaluation of nail psoriasis. J Am Acad Dermatol. 2003;49(2):206-212. doi:10.1067/s0190-9622(03)00910-1

  181. Fernández-Nieto D, Cura-Gonzalez ID, Esteban-Velasco C, Marques-Mejias MA, Ortega-Quijano D. Artificial intelligence to assess nail unit disorders: A pilot study. Skin Appendage Disord. 2021;7(6):428-433. doi:10.1159/000517341

  182. Parrish CA, Sobera JO, Elewski BE. Modification of the Nail Psoriasis Severity Index. J Am Acad Dermatol. 2005;53(4):745-746. doi:10.1016/j.jaad.2005.04.028

  183. Han SS, Park I, Eun Chang S, et al. Augmented intelligence dermatology: deep neural networks empower medical professionals in diagnosing skin cancer and predicting treatment options for 134 skin disorders. J Invest Dermatol. 2020;140(9):1753-1761.e4. doi:10.1016/j.jid.2020.01.019

  184. Antonini D, Simonatto M, Candi E, Melino G. Keratinocyte stem cells and their niches in the skin appendages. J Invest Dermatol. 2014;134(7):1797-1799. doi:10.1038/jid.2014.126

  185. Njoo MD, Westerhof W, Bos JD, Bossuyt PM. A systematic review of autologous transplantation methods in vitiligo. Arch Dermatol. 1998;134(12):1543-1549. doi:10.1001/archderm.134.12.1543

  186. Grimes PE, Miller MM. Vitiligo: Patient stories, self-esteem, and the psychological burden of disease. Int J Womens Dermatol. 2018;4(1):32-37. doi:10.1016/j.ijwd.2017.11.005

  187. Hamzavi I, Jain H, McLean D, Shapiro J, Zeng H, Lui H. Parametric modeling of narrowband UV-B phototherapy for vitiligo using a novel quantitative tool: the Vitiligo Area Scoring Index. Arch Dermatol. 2004;140(6):677-683. doi:10.1001/archderm.140.6.677

  188. Njoo MD, Vodegel RM, Westerhof W. Depigmentation therapy in vitiligo universalis with topical 4-methoxyphenol and the Q-switched ruby laser. J Am Acad Dermatol. 2000;42(5 Pt 1):760-769. doi:10.1016/s0190-9622(00)90009-x

  189. Parsad D, Pandhi R, Dogra S, Kumar B. Clinical study of repigmentation patterns with different treatment modalities and their correlation with speed and stability of repigmentation in 352 vitiliginous patches. J Am Acad Dermatol. 2004;50(1):63-67. doi:10.1016/s0190-9622(03)02463-4

  190. Ezzedine K, Lim HW, Suzuki T, et al. Revised classification/nomenclature of vitiligo and related issues: the Vitiligo Global Issues Consensus Conference. Pigment Cell Melanoma Res. 2012;25(3):E1-E13. doi:10.1111/j.1755-148X.2012.00997.x

  191. Yuan Y, Lo YC. Improving dermoscopic image segmentation with enhanced convolutional-deconvolutional networks. IEEE J Biomed Health Inform. 2019;23(2):519-526. doi:10.1109/JBHI.2017.2787487

  192. Rodrigues M, Ezzedine K, Hamzavi I, Pandya AG, Harris JE. New discoveries in the pathogenesis and classification of vitiligo. J Am Acad Dermatol. 2017;77(1):1-13. doi:10.1016/j.jaad.2016.10.048

  193. Passeron T, Ortonne JP. Use of the 308-nm excimer laser for psoriasis and vitiligo. Clin Dermatol. 2006;24(1):33-42. doi:10.1016/j.clindermatol.2005.10.018

  194. Gawkrodger DJ, Ormerod AD, Shaw L, et al. Guideline for the diagnosis and management of vitiligo. Br J Dermatol. 2008;159(5):1051-1076. doi:10.1111/j.1365-2133.2008.08881.x

  195. Halder RM, Taliaferro SJ. Vitiligo. In: Wolff K, Goldsmith LA, Katz SI, et al, eds. Fitzpatrick's Dermatology in General Medicine. 7th ed. McGraw-Hill; 2008:616-622.

  196. Doshi A, Zaheer A, Stiller MJ. A comparison of current acne grading systems and proposal of a novel system. Int J Dermatol. 1997;36(6):416-418. doi:10.1046/j.1365-4362.1997.00099.x

  197. Tan JK, Tang J, Fung K, et al. Development and validation of a comprehensive acne severity scale. J Cutan Med Surg. 2007;11(6):211-216. doi:10.2310/7750.2007.00037

  198. Layton AM, Henderson CA, Cunliffe WJ. A clinical evaluation of acne scarring and its incidence. Clin Exp Dermatol. 1994;19(4):303-308. doi:10.1111/j.1365-2230.1994.tb01200.x

  199. Tan J, Thiboutot D, Popp G, et al. Randomized phase 3 evaluation of trifarotene 50 μg/g cream treatment of moderate facial and truncal acne. J Am Acad Dermatol. 2019;80(6):1691-1699. doi:10.1016/j.jaad.2019.02.044

  200. Leyden J, Stein-Gold L, Weiss J. Why topical retinoids are mainstay of therapy for acne. Dermatol Ther (Heidelb). 2017;7(3):293-304. doi:10.1007/s13555-017-0185-2

  201. Thiboutot DM, Dréno B, Abanmi A, et al. Practical management of acne for clinicians: An international consensus from the Global Alliance to Improve Outcomes in Acne. J Am Acad Dermatol. 2018;78(2 Suppl 1):S1-S23.e1. doi:10.1016/j.jaad.2017.09.078

  202. Seité S, Dréno B, Benech F, Bédane C, Pecastaings S. Creation and validation of an artificial intelligence algorithm for acne grading. J Eur Acad Dermatol Venereol. 2020;34(12):2946-2951. doi:10.1111/jdv.16736

  203. Winkler JK, Sies K, Fink C, et al. Association between different scale bars in dermoscopic images and diagnostic performance of a market-approved deep learning convolutional neural network for melanoma recognition. Eur J Cancer. 2021;145:146-154. doi:10.1016/j.ejca.2020.12.010

  204. Burlina P, Joshi N, Ng E, Billings S, Paul W, Rotemberg V. Assessment of deep generative models for high-resolution synthetic retinal image generation of age-related macular degeneration. JAMA Ophthalmol. 2019;137(3):258-264. doi:10.1001/jamaophthalmol.2018.6156

  205. Korotkov K, Garcia R. Computerized analysis of pigmented skin lesions: A review. Artif Intell Med. 2012;56(2):69-90. doi:10.1016/j.artmed.2012.08.002

  206. Brinker TJ, Hekler A, Hauschild A, et al. Comparing artificial intelligence algorithms to 157 German dermatologists: the melanoma classification benchmark. Eur J Cancer. 2019;111:30-37. doi:10.1016/j.ejca.2018.12.016

  207. Perednia DA, Brown NA. Teledermatology: one application of telemedicine. Bull Med Libr Assoc. 1995;83(1):42-47.

  208. Ngoo A, Finnane A, McMeniman E, Tan JM, Janda M, Soyer HP. Fighting melanoma with smartphones: A snapshot on where we are a decade after app stores opened their doors. Int J Med Inform. 2018;118:99-112. doi:10.1016/j.ijmedinf.2018.08.004

  209. Kroemer S, Frühauf J, Campbell TM, et al. Mobile teledermatology for skin tumour screening: diagnostic accuracy of clinical and dermoscopic image tele-evaluation using cellular phones. Br J Dermatol. 2011;164(5):973-979. doi:10.1111/j.1365-2133.2011.10208.x

  210. Massone C, Hofmann-Wellenhof R, Ahlgrimm-Siess V, Gabler G, Ebner C, Soyer HP. Melanoma screening with cellular phones. PLoS One. 2007;2(5):e483. doi:10.1371/journal.pone.0000483

  211. Ferrara G, Argenziano G, Soyer HP, et al. The influence of clinical information in the histopathologic diagnosis of melanocytic skin neoplasms. PLoS One. 2009;4(4):e5375. doi:10.1371/journal.pone.0005375

  212. Carli P, De Giorgi V, Crocetti E, et al. Improvement of malignant/benign ratio in excised melanocytic lesions in the 'dermoscopy era': a retrospective study 1997-2001. Br J Dermatol. 2004;150(4):687-692. doi:10.1111/j.0007-0963.2004.05860.x

  213. Del Bino S, Bernerd F. Variations in skin colour and the biological consequences of ultraviolet radiation exposure. Br J Dermatol. 2013;169(Suppl 3):33-40. doi:10.1111/bjd.12529

  214. Pershing S, Enns JT, Bae IS, Randall BD, Pruiksma JB, Desai AD. Variability in physician assessment of oculoplastic standardized photographs. Aesthet Surg J. 2014;34(8):1203-1209. doi:10.1177/1090820X14542642

  215. Goh CL. The need for evidence-based aesthetic dermatology practice. J Cutan Aesthet Surg. 2009;2(2):65-71. doi:10.4103/0974-2077.58518

  216. Lester JC, Jia JL, Zhang L, Okoye GA, Linos E. Absence of images of skin of colour in publications of COVID-19 skin manifestations. Br J Dermatol. 2020;183(3):593-595. doi:10.1111/bjd.19258

  217. Wagner JK, Jovel C, Norton HL, Parra EJ, Shriver MD. Comparing quantitative measures of erythema, pigmentation and skin response using reflectometry. Pigment Cell Res. 2002;15(5):379-384. doi:10.1034/j.1600-0749.2002.02042.x

  218. Nkengne A, Bertin C, Stamatas GN, et al. Influence of facial skin attributes on the perceived age of Caucasian women. J Eur Acad Dermatol Venereol. 2008;22(8):982-991. doi:10.1111/j.1468-3083.2008.02698.x

  219. Davenport T, Kalakota R. The potential for artificial intelligence in healthcare. Future Healthc J. 2019;6(2):94-98. doi:10.7861/futurehosp.6-2-94

  220. Char DS, Shah NH, Magnus D. Implementing machine learning in health care - addressing ethical challenges. N Engl J Med. 2018;378(11):981-983. doi:10.1056/NEJMp1714229

  221. Gichoya JW, Banerjee I, Bhimireddy AR, et al. AI recognition of patient race in medical imaging: a modelling study. Lancet Digit Health. 2022;4(6):e406-e414. doi:10.1016/S2589-7500(22)00063-2

  222. Alexis AF, Sergay AB, Taylor SC. Common dermatologic disorders in skin of color: a comparative practice survey. Cutis. 2007;80(5):387-394.

  223. Ebede TL, Arch EL, Berson D. Hormonal treatment of acne in women. J Clin Aesthet Dermatol. 2009;2(12):16-22.

  224. Ly BC, Dyer EB, Feig JL, Chien AL, Del Bino S. Research Techniques Made Simple: Cutaneous Colorimetry: A Reliable Technique for Objective Skin Color Measurement. J Invest Dermatol. 2020;140(1):3-12.e1. doi:10.1016/j.jid.2019.11.003

  225. Chardon A, Cretois I, Hourseau C. Skin colour typology and suntanning pathways. Int J Cosmet Sci. 1991;13(4):191-208. doi:10.1111/j.1467-2494.1991.tb00561.x

  226. Gareau DS. Feasibility of digitally stained multimodal confocal mosaics to simulate histopathology. J Biomed Opt. 2009;14(3):034050. doi:10.1117/1.3149853

  227. Koenig K, Raphael AP, Lin L, et al. Optical skin biopsies by clinical CARS and multiphoton fluorescence/SHG tomography. Laser Phys Lett. 2011;8(6):465-468. doi:10.1002/lapl.201110014

  228. Baldi A, Murace R, Dragonetti E, et al. The Significance of Artificial Intelligence in the Assessment of Skin Cancer. J Clin Med. 2021;10(21):4926. doi:10.3390/jcm10214926

  229. Esteva A, Robicquet A, Ramsundar B, et al. A guide to deep learning in healthcare. Nat Med. 2019;25(1):24-29. doi:10.1038/s41591-018-0316-z

  230. Winkler JK, Fink C, Toberer F, et al. Association between surgical skin markings in dermoscopic images and diagnostic performance of a deep learning convolutional neural network for melanoma recognition. JAMA Dermatol. 2019;155(10):1135-1141. doi:10.1001/jamadermatol.2019.1735

  231. Kittler H, Pehamberger H, Wolff K, Binder M. Diagnostic accuracy of dermoscopy. Lancet Oncol. 2002;3(3):159-165. doi:10.1016/s1470-2045(02)00679-4

  232. Beam AL, Kohane IS. Big data and machine learning in health care. JAMA. 2018;319(13):1317-1318. doi:10.1001/jama.2017.18391

  233. Codella NC, Lin CC, Halpern A, et al. Collaborative human-AI (CHAI): Evidence-based interpretable melanoma classification in dermoscopic images. In: Understanding and Interpreting Machine Learning in Medical Image Computing Applications. Springer; 2018:97-105. doi:10.1007/978-3-030-02628-8_11

  234. Zouboulis CC, Desai N, Emtestam L, et al. European S1 guideline for the treatment of hidradenitis suppurativa/acne inversa. J Eur Acad Dermatol Venereol. 2015;29(4):619-644. doi:10.1111/jdv.12966

  235. Martorell A, García-Martínez FJ, Jiménez-Gallo D, et al. An update on hidradenitis suppurativa (Part I): Epidemiology, clinical aspects, and definition of disease severity. Actas Dermosifiliogr. 2015;106(9):703-715. doi:10.1016/j.ad.2015.06.004

  236. Hurley HJ. Axillary hyperhidrosis, apocrine bromhidrosis, hidradenitis suppurativa, and familial benign pemphigus: surgical approach. In: Roenigk RK, Roenigk HH Jr, eds. Dermatologic Surgery: Principles and Practice. Marcel Dekker; 1989:623-645.

  237. Revuz JE, Canoui-Poitrine F, Wolkenstein P, et al. Prevalence and factors associated with hidradenitis suppurativa: results from two case-control studies. J Am Acad Dermatol. 2008;59(4):596-601. doi:10.1016/j.jaad.2008.06.020

  238. Alikhan A, Sayed C, Alavi A, et al. North American clinical management guidelines for hidradenitis suppurativa: A publication from the United States and Canadian Hidradenitis Suppurativa Foundations: Part I: Diagnosis, evaluation, and the use of complementary and procedural management. J Am Acad Dermatol. 2019;81(1):76-90. doi:10.1016/j.jaad.2019.02.067

  239. Ingram JR, Collier F, Brown D, et al. British Association of Dermatologists guidelines for the management of hidradenitis suppurativa (acne inversa) 2018. Br J Dermatol. 2019;180(5):1009-1017. doi:10.1111/bjd.17537

  240. Esmann S, Jemec GB. Psychosocial impact of hidradenitis suppurativa: a qualitative study. Acta Derm Venereol. 2011;91(3):328-332. doi:10.2340/00015555-1082

  241. Matusiak Ł, Bieniek A, Szepietowski JC. Increased serum tumour necrosis factor-α in hidradenitis suppurativa patients: is there a basis for treatment with anti-tumour necrosis factor-α agents? Acta Derm Venereol. 2009;89(6):601-603. doi:10.2340/00015555-0701

  242. Grant A, Gonzalez T, Montgomery MO, Cardenas V, Kerdel FA. Infliximab therapy for patients with moderate to severe hidradenitis suppurativa: a randomized, double-blind, placebo-controlled crossover trial. J Am Acad Dermatol. 2010;62(2):205-217. doi:10.1016/j.jaad.2009.06.050

  243. Schneider-Burrus S, Tsaousi A, Barbus S, Huss-Marp J, Witte-Händel E, Witte K. Features associated with quality of life impairment in hidradenitis suppurativa patients. Front Med (Lausanne). 2021;8:676241. doi:10.3389/fmed.2021.676241

  244. Moriarty B, Jiyad Z, Creamer D. Four-weekly infliximab in the treatment of severe hidradenitis suppurativa. Br J Dermatol. 2014;170(4):986-987. doi:10.1111/bjd.12823

  245. Vossen ARJV, van der Zee HH, Prens EP. Hidradenitis Suppurativa: A Systematic Review Integrating Inflammatory Pathways Into a Cohesive Pathogenic Model. Front Immunol. 2018;9:2965. doi:10.3389/fimmu.2018.02965

  246. Sabat R, Jemec GBE, Matusiak Ł, Kimball AB, Prens E, Wolk K. Hidradenitis suppurativa. Nat Rev Dis Primers. 2020;6(1):18. doi:10.1038/s41572-020-0149-1

  247. Kimball AB, Kerdel F, Adams D, et al. Adalimumab for the treatment of moderate to severe hidradenitis suppurativa: a parallel randomized trial. Ann Intern Med. 2012;157(12):846-855. doi:10.7326/0003-4819-157-12-201212180-00004

  248. Zouboulis CC, Tzellos T, Kyrgidis A, et al. Development and validation of the International Hidradenitis Suppurativa Severity Score System (IHS4), a novel dynamic scoring system to assess HS severity. Br J Dermatol. 2017;177(5):1401-1409. doi:10.1111/bjd.15748

  249. Kimball AB, Okun MM, Williams DA, et al. Two Phase 3 Trials of Adalimumab for Hidradenitis Suppurativa. N Engl J Med. 2016;375(5):422-434. doi:10.1056/NEJMoa1504370

  250. Jfri A, Nassim D, O'Brien E, Gulliver W, Nikolakis G, Zouboulis CC. Prevalence of Hidradenitis Suppurativa: A Systematic Review and Meta-regression Analysis. JAMA Dermatol. 2021;157(8):924-931. doi:10.1001/jamadermatol.2021.1677

  251. Gomolin A, Cline A, Russo S, Wirya SA, Treat JR. Treatment of inflammatory manifestations of hidradenitis suppurativa with secukinumab in pediatric patients. JAAD Case Rep. 2019;5(12):1088-1091. doi:10.1016/j.jdcr.2019.10.005

  252. Mehdizadeh A, Hazen PG, Bechara FG, et al. Recurrence of hidradenitis suppurativa after surgical management: A systematic review and meta-analysis. J Am Acad Dermatol. 2015;73(5 Suppl 1):S70-S77. doi:10.1016/j.jaad.2015.07.044

  253. Goyal M, Knackstedt T, Yan S, Hassanpour S. Artificial intelligence-based image classification methods for diagnosis of skin cancer: Challenges and opportunities. Comput Biol Med. 2020;127:104065. doi:10.1016/j.compbiomed.2020.104065

  254. Nasr-Esfahani E, Samavi S, Karimi N, et al. Melanoma detection by analysis of clinical images using convolutional neural network. Annu Int Conf IEEE Eng Med Biol Soc. 2016;2016:1373-1376. doi:10.1109/EMBC.2016.7590963

  255. Garnavi R, Aldeen M, Celebi ME, Varigos G, Finch S. Border detection in dermoscopy images using hybrid thresholding on optimized color channels. Comput Med Imaging Graph. 2011;35(2):105-115. doi:10.1016/j.compmedimag.2010.08.001

  256. Xie Y, Zhang J, Xia Y. Semi-supervised adversarial model for benign-malignant lung nodule classification on chest CT. Med Image Anal. 2019;57:237-248. doi:10.1016/j.media.2019.07.004

  257. Serte S, Serener A, Al-Turjman F. Deep learning in medical imaging: A brief review. Trans Emerg Telecommun Technol. 2020;e4080. doi:10.1002/ett.4080

  258. Lee H, Chen YP. Image based computer aided diagnosis system for cancer detection. Expert Syst Appl. 2015;42(12):5356-5365. doi:10.1016/j.eswa.2015.02.005

  259. Udrea A, Mitra GD, Costea D, et al. Accuracy of a smartphone application for triage of skin lesions based on machine learning algorithms. J Eur Acad Dermatol Venereol. 2020;34(3):648-655. doi:10.1111/jdv.15935

  260. Schmid-Saugeón P, Guillod J, Thiran JP. Towards a computer-aided diagnosis system for pigmented skin lesions. Comput Med Imaging Graph. 2003;27(1):65-78. doi:10.1016/s0895-6111(02)00048-4

  261. Patwardhan SV, Dai S, Dhawan AP. Multi-spectral image analysis and classification of melanoma using fuzzy membership based partitions. Comput Med Imaging Graph. 2005;29(4):287-296. doi:10.1016/j.compmedimag.2004.11.002

  262. Kasmi R, Mokrani K. Classification of malignant melanoma and benign skin lesions: implementation of automatic ABCD rule. IET Image Process. 2016;10(6):448-455. doi:10.1049/iet-ipr.2015.0385

  263. Mendonça T, Ferreira PM, Marques JS, Marcal AR, Rozeira J. PH2 - A dermoscopic image database for research and benchmarking. Annu Int Conf IEEE Eng Med Biol Soc. 2013;2013:5437-5440. doi:10.1109/EMBC.2013.6610779


Traceability to QMS Records

Signature meaning

The signatures for the approval process of this document can be found in the verified commits of the QMS repository (a sketch of how such signatures can be checked follows the list below). As a reference, the team members expected to participate in the approval of this document, and their roles in the approval process as defined in Annex I (Responsibility Matrix) of GP-001, are:

  • Author: Team members involved
  • Reviewer: JD-003, JD-004
  • Approver: JD-001
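For illustration only, and not itself part of this record: assuming the QMS repository uses signed commits (e.g. GPG), a reviewer could confirm that a given approval commit carries a valid signature with a short script along the following lines. The repository path and the commit reference are hypothetical placeholders, and the use of `git verify-commit` is an assumption about the verification workflow, not a prescribed procedure.

```python
import subprocess

# Hypothetical local clone of the QMS repository (placeholder path).
QMS_REPO = "/path/to/qms-repo"


def commit_signature_is_valid(repo: str, commit: str) -> bool:
    """Return True when `git verify-commit` accepts the signature of the
    given commit (exit code 0), and False otherwise (unsigned or invalid)."""
    result = subprocess.run(
        ["git", "-C", repo, "verify-commit", commit],
        capture_output=True,
        text=True,
    )
    return result.returncode == 0


if __name__ == "__main__":
    # Example usage with a placeholder commit reference.
    print(commit_signature_is_valid(QMS_REPO, "HEAD"))
```

Any commit-ish reference (a full hash, a tag, or `HEAD`) can be passed in place of the placeholder; a valid signature yields exit code 0 from `git verify-commit`.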
All the information contained in this QMS is confidential. The recipient agrees not to transmit or reproduce the information, either directly or through third parties, by any means, without the prior written permission of Legit.Health (AI LABS GROUP S.L.).