
R-TF-028-001 AI/ML Description

Table of contents
  • Purpose
  • Scope
  • Description and Specifications
    • ICD Category Distribution
      • Algorithm Description
      • Algorithm Objectives
      • Algorithm Endpoints
    • Binary Indicators
      • Algorithm Description
      • Algorithm Objectives
      • Algorithm Endpoints
      • Requirements
    • Erythema Intensity Quantification
      • Algorithm Description
      • Algorithm Objectives
      • Algorithm Endpoints and Requirements
    • Desquamation Intensity Quantification
      • Algorithm Description
      • Algorithm Objectives
      • Algorithm Endpoints and Requirements
    • Nodule Quantification
      • Algorithm Description
      • Algorithm Objectives
      • Algorithm Endpoints and Requirements
    • Hair Loss Surface Quantification
      • Algorithm Description
      • Algorithm Objectives
      • Algorithm Endpoints and Requirements
  • Data Specifications
  • Other Specifications
  • Cybersecurity and Transparency
  • Specifications and Risks
  • Integration and Environment
    • Integration
    • Environment
  • References
  • Traceability to QMS Records

Purpose​

This document defines the specifications, performance requirements, and data needs for the Artificial Intelligence/Machine Learning (AI/ML) models used in the Legit.Health Plus device.

Scope​

This document details the design and performance specifications for all AI/ML algorithms integrated into the Legit.Health Plus device. It establishes the foundation for the development, validation, and risk management of these models.

This description covers the following key areas for each algorithm:

  • Algorithm description, clinical objectives, and justification.
  • Performance endpoints and acceptance criteria.
  • Specifications for the data required for development and evaluation.
  • Requirements related to cybersecurity, transparency, and integration.
  • Links between the AI/ML specifications and the overall risk management process.

Description and Specifications​

ICD Category Distribution​

Algorithm Description​

We employ a deep learning model to analyze clinical or dermoscopic lesion images and output a probability distribution across ICD-11 categories. These classifiers are designed to recognize fine-grained disease distinctions, leveraging attention mechanisms to capture both local and global image features, often outperforming conventional CNN-only methods [cite: 77].

The system highlights the top five ICD-11 disease categories, each accompanied by its corresponding code and confidence score, thereby supporting clinicians with both ranking and probability information—a strategy shown to enhance diagnostic confidence and interpretability in multi-class dermatological AI systems [cite: 78, 79].

Algorithm Objectives​

  • Improve diagnostic accuracy, aiming for an uplift of approximately 10–15% in top-1 and top-5 prediction metrics compared to baseline CNN approaches [cite: 78, 80, 81].
  • Assist clinicians in differential diagnosis, especially in ambiguous or rare cases, by presenting a ranked shortlist that enables efficient decision-making.
  • Enhance trust and interpretability—leveraging attention maps and multi-modal fusion to offer transparent reasoning and evidence for suggested categories [cite: 79].

Justification: Presenting a ranked list of likely diagnoses (e.g., a top-5 shortlist) is supported by clinical evidence:

  • In reader studies, AI-based multiclass probabilities improved clinician accuracy beyond AI or physicians alone, with the largest benefit for less experienced clinicians [cite: 82, 83].
  • Han et al. reported sensitivity +12.1%, specificity +1.1%, and top-1 accuracy +7.0% improvements when physicians were supported with AI outputs including top-k predictions [cite: 83].
  • Clinical decision support tools providing ranked differentials improved diagnostic accuracy by up to 34% without prolonging consultations [cite: 84].
  • Systematic reviews confirm that AI assistance consistently improves clinician accuracy, especially for non-specialists [cite: 85, 86].

Algorithm Endpoints​

Performance is evaluated using Top-k Accuracy compared to expert-labeled ground truth.

| Metric | Threshold | Interpretation |
| --- | --- | --- |
| Top-1 Accuracy | ≥ 55% | Meets minimum diagnostic utility |
| Top-3 Accuracy | ≥ 70% | Reliable differential diagnosis |
| Top-5 Accuracy | ≥ 80% | Substantial agreement with expert performance |

All thresholds must be achieved with 95% confidence intervals.

Requirements:

  • Implement image analysis models capable of ICD classification [cite: 15].
  • Output normalized probability distributions (probabilities summing to 1, i.e., 100%).
  • Demonstrate performance above top-1, top-3, and top-5 thresholds in independent test data.
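
The top-k endpoint above can be computed directly from per-class probabilities and ground-truth labels. A minimal NumPy sketch (the data and function name are illustrative, not the production implementation):

```python
import numpy as np

def top_k_accuracy(probs: np.ndarray, labels: np.ndarray, k: int) -> float:
    """Fraction of cases whose true label is among the k highest-probability classes.

    probs  -- shape (n_cases, n_classes), each row a normalized distribution
    labels -- shape (n_cases,), integer class indices (ground truth)
    """
    # Indices of the k most probable classes per case (order within the k is irrelevant).
    top_k = np.argsort(probs, axis=1)[:, -k:]
    hits = (top_k == labels[:, None]).any(axis=1)
    return float(hits.mean())

# Illustrative toy data: 3 cases, 4 classes.
probs = np.array([
    [0.6, 0.2, 0.1, 0.1],   # true class 0: top-1 hit
    [0.1, 0.3, 0.5, 0.1],   # true class 1: top-3 hit, top-1 miss
    [0.3, 0.3, 0.3, 0.1],   # true class 3: miss even at top-3
])
labels = np.array([0, 1, 3])
```

In practice the same computation is run on the independent test set, with confidence intervals estimated, e.g., by bootstrap resampling of the cases.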

Binary Indicators​

Algorithm Description​

Binary indicators are derived from the ICD-11 distribution using a dermatologist-defined mapping matrix. Each indicator reflects the aggregated probability that a case belongs to clinically meaningful categories requiring differential triage or diagnostic attention.

The six binary indicators are:

  1. Malignant: probability that the lesion is a confirmed malignancy (e.g., melanoma, squamous cell carcinoma, basal cell carcinoma).
  2. Pre-malignant: probability of conditions with malignant potential (e.g., actinic keratosis, Bowen’s disease).
  3. Associated with malignancy: benign or inflammatory conditions with frequent overlap or mimicry of malignant presentations (e.g., atypical nevi, pigmented seborrheic keratoses).
  4. Pigmented lesion: probability that the lesion belongs to the pigmented subgroup, important for melanoma risk assessment.
  5. Urgent referral: lesions likely requiring dermatological evaluation within 48 hours (e.g., suspected melanoma, rapidly growing nodular lesions, bleeding or ulcerated malignancies).
  6. High-priority referral: lesions that should be seen within 2 weeks according to dermatology referral guidelines (e.g., suspected non-melanoma skin cancer, premalignant lesions with malignant potential).

The binary mapping is defined as:

$$\text{Binary Indicator} = \sum \big(\text{ICD Probability} \times \text{Binary Mapping}\big)$$
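
The aggregation above amounts to a matrix product between the ICD-11 probability vector and a 0/1 mapping matrix. A minimal sketch with a hypothetical four-category, two-indicator mapping (the real matrix is dermatologist-defined and much larger):

```python
import numpy as np

# Hypothetical example: 4 ICD-11 categories, 2 binary indicators.
# mapping[i, j] = 1 if ICD category i contributes to binary indicator j.
icd_probs = np.array([0.50, 0.30, 0.15, 0.05])   # softmax output, sums to 1
mapping = np.array([
    [1, 0],   # category 0 counts toward the first indicator only
    [1, 1],   # category 1 counts toward both indicators
    [0, 1],   # category 2 counts toward the second indicator only
    [0, 0],   # category 3 counts toward neither
])

# Binary indicator_j = sum_i (ICD probability_i * mapping_ij)
indicators = icd_probs @ mapping   # -> array([0.8, 0.45])
```

Each resulting value is the aggregated probability mass of all ICD-11 categories mapped to that indicator.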

Algorithm Objectives​

  • Clinical triage support: Provide clinicians with clear case-prioritization signals, improving patient flow and resource allocation [91, 92].
  • Malignancy risk quantification: Objectively assess malignancy and premalignancy likelihood to reduce missed diagnoses [93].
  • Referral urgency standardization: Align algorithm outputs with international clinical guidelines for dermatology referrals (e.g., NICE and EADV recommendations: urgent ≤48h, high-priority ≤2 weeks) [94, 95].
  • Improve patient safety: Flag high-risk pigmented lesions for expedited evaluation, ensuring melanoma is not delayed in triage [96, 97].
  • Reduce variability: Decrease inter-observer variation in urgency assignment by providing consistent, evidence-based binary outputs [98].

Algorithm Endpoints​

Performance of binary indicators is evaluated using AUC (Area Under the ROC Curve) against dermatologists’ consensus labels.

| AUC Score | Agreement Category | Interpretation |
| --- | --- | --- |
| < 0.70 | Poor | Not acceptable for clinical use |
| 0.70 – 0.79 | Fair | Below acceptance threshold |
| ≥ 0.80 | Good | Meets acceptance threshold |
| ≥ 0.90 | Excellent | High robustness |
| ≥ 0.95 | Outstanding | Near-expert level performance |

Success criteria: Each binary indicator must achieve AUC ≥ 0.80 with 95% confidence intervals, validated against independent datasets including malignant, premalignant, pigmented, and urgent referral cases.
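
For reference, the AUC endpoint can be computed without external dependencies as the probability that a randomly chosen positive case scores higher than a randomly chosen negative case (ties counted as 0.5), which equals the trapezoidal ROC area. A minimal sketch with illustrative scores and labels:

```python
import numpy as np

def auc(scores: np.ndarray, labels: np.ndarray) -> float:
    """Pairwise (Mann-Whitney) formulation of the area under the ROC curve.

    scores -- continuous indicator outputs in [0, 1]
    labels -- consensus ground truth, 1 = condition present, 0 = absent
    """
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    wins = (pos[:, None] > neg[None, :]).sum()   # positive outranks negative
    ties = (pos[:, None] == neg[None, :]).sum()  # ties contribute half
    return float((wins + 0.5 * ties) / (pos.size * neg.size))

# Perfectly separated scores give AUC = 1.0; chance-level gives 0.5.
scores = np.array([0.9, 0.8, 0.3, 0.1])
labels = np.array([1, 1, 0, 0])
```

The pairwise form is O(n²) and intended only to make the endpoint concrete; validation pipelines would typically use an established implementation plus bootstrap confidence intervals.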

Requirements​

  • Implement all six binary indicators:
    • Malignant
    • Pre-malignant
    • Associated with malignancy
    • Pigmented lesion
    • Urgent referral (≤48h)
    • High-priority referral (≤2 weeks)
  • Validate performance on diverse and independent datasets representing both common and rare conditions.
  • Ensure ≥0.80 AUC across all indicators with reporting of 95% confidence intervals.
  • Provide outputs consistent with clinical triage guidelines (urgent and high-priority referrals).

Erythema Intensity Quantification​

Algorithm Description​

A deep learning model ingests a clinical image of a skin lesion and outputs a probability vector:

$$\mathbf{p} = [p_0, p_1, \ldots, p_9]$$

where each $p_i$ (for $i = 0, \ldots, 9$) corresponds to the model's softmax-normalized probability that the erythema intensity belongs to ordinal category $i$ (ranging from minimal to maximal erythema).

Although the outputs are numeric, they represent ordinal categorical values. To derive a continuous erythema severity score $\hat{y}$, a weighted expected value is computed:

$$\hat{y} = \sum_{i=0}^{9} i \cdot p_i$$

This post-processing step ensures that the prediction accounts for the full probability distribution rather than only the most likely class, yielding a more stable and clinically interpretable severity score.
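
The post-processing step above can be sketched as follows (illustrative NumPy, not the production implementation):

```python
import numpy as np

def severity_score(probs: np.ndarray) -> float:
    """Weighted expected value over the ordinal categories 0..9."""
    return float(np.dot(np.arange(probs.size), probs))

# A prediction split evenly between grades 4 and 5 yields 4.5, whereas
# taking only the argmax would collapse it to a single grade and discard
# the uncertainty encoded in the distribution.
p = np.zeros(10)
p[4], p[5] = 0.5, 0.5
```

The same expected-value conversion applies unchanged to any 10-category ordinal output of this form.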

Algorithm Objectives​

  • Support healthcare professionals in the assessment of erythema severity by providing an objective, quantitative measure.
  • Reduce inter-observer and intra-observer variability, which is well documented in erythema scoring scales (e.g., Clinician’s Erythema Assessment [CEA] interrater ICC ≈ 0.60, weighted κ ≈ 0.69) [cite: Tan 2014].
  • Ensure reproducibility and robustness across imaging conditions (e.g., brightness, contrast, device type).
  • Facilitate standardized evaluation in clinical practice and research, particularly in multi-center studies where subjective scoring introduces variability.

Justification (Clinical Evidence):

  • Studies have shown that CNN-based models can achieve dermatologist-level accuracy in erythema scoring (e.g., ResNet models reached ~99% accuracy in erythema detection under varying conditions) [cite: Lee 2021, Cho 2021].
  • Automated erythema quantification has demonstrated reduced variability compared to human raters in tasks such as Minimum Erythema Dose (MED) and SPF index assessments [cite: Kim 2023].
  • Clinical scales such as the CEA, though widely used, suffer from subjectivity; integrating AI quantification can strengthen reliability and reproducibility [cite: Tan 2014].

Algorithm Endpoints and Requirements​

Performance is evaluated using Relative Mean Absolute Error (RMAE) against ground truth labeled by multiple experts, with the expectation that the algorithm achieves lower error than the average disagreement among the experts themselves.

| Metric | Threshold | Interpretation |
| --- | --- | --- |
| RMAE | ≤ 20% | Algorithm predictions deviate on average less than 20% from expert consensus, with performance superior to inter-observer variability. |

All thresholds must be achieved with 95% confidence intervals.

Requirements:

  • Output a normalized probability distribution across 10 ordinal erythema categories (softmax output, sum = 1).
  • Convert probability outputs into a continuous score using the weighted expected value formula: $\hat{y} = \sum_{i=0}^{9} i \cdot p_i$
  • Demonstrate RMAE ≤ 20%, outperforming the average expert-to-expert variability.
  • Report all metrics with 95% confidence intervals.
  • Validate the model on an independent and diverse test dataset to ensure generalizability.

Desquamation Intensity Quantification​

Algorithm Description​

A deep learning model ingests a clinical image of a skin lesion and outputs a probability vector:

$$\mathbf{p} = [p_0, p_1, \ldots, p_9]$$

where each $p_i$ (for $i = 0, \ldots, 9$) corresponds to the model's softmax-normalized probability that the desquamation intensity belongs to ordinal category $i$ (ranging from minimal to maximal scaling/peeling).

Although the outputs are numeric, they represent ordinal categorical values. To derive a continuous desquamation severity score $\hat{y}$, a weighted expected value is computed:

$$\hat{y} = \sum_{i=0}^{9} i \cdot p_i$$

This post-processing step ensures that the prediction leverages the full probability distribution, yielding a more stable, continuous, and clinically interpretable severity score.

Algorithm Objectives​

  • Support healthcare professionals in assessing desquamation severity by providing an objective, quantitative measure.
  • Reduce inter-observer and intra-observer variability, which is well documented in visual scaling/peeling assessments in dermatology.
  • Ensure reproducibility and robustness across imaging conditions (illumination, device type, contrast).
  • Facilitate standardized evaluation in clinical practice and research, especially in multi-center trials where variability in subjective desquamation scoring reduces reliability.

Justification (Clinical Evidence):

  • Studies in dermatology have shown moderate to substantial interrater variability in desquamation scoring (e.g., psoriasis and radiation dermatitis grading) with κ values often <0.70 [87, 88].
  • Automated computer vision and CNN-based methods have demonstrated high accuracy in texture and scaling detection, often surpassing human raters in consistency [89, 90].
  • Objective desquamation quantification can improve reproducibility in psoriasis PASI scoring and oncology trials, where scaling/desquamation is a critical endpoint but prone to subjectivity [87].

Algorithm Endpoints and Requirements​

Performance is evaluated using Relative Mean Absolute Error (RMAE) against ground truth labeled by multiple experts, with the expectation that the algorithm achieves lower error than the average disagreement among the experts themselves.

| Metric | Threshold | Interpretation |
| --- | --- | --- |
| RMAE | ≤ 20% | Algorithm predictions deviate on average less than 20% from expert consensus, with performance superior to inter-observer variability. |

All thresholds must be achieved with 95% confidence intervals.

Requirements:

  • Output a normalized probability distribution across 10 ordinal desquamation categories (softmax output, sum = 1).
  • Convert probability outputs into a continuous score using the weighted expected value formula: $\hat{y} = \sum_{i=0}^{9} i \cdot p_i$
  • Demonstrate RMAE ≤ 20%, outperforming average expert-to-expert variability.
  • Report all metrics with 95% confidence intervals.
  • Validate the model on an independent and diverse test dataset (various Fitzpatrick skin types, anatomical sites, imaging devices) to ensure generalizability.

Nodule Quantification​

Algorithm Description​

A deep learning object detection model ingests a clinical image and outputs bounding boxes with associated confidence scores for each detected nodule:

$$\mathbf{D} = \{(b_1, c_1), (b_2, c_2), \ldots, (b_n, c_n)\}$$

where $b_i$ is the bounding box for the $i$-th predicted nodule, and $c_i \in [0, 1]$ is the associated confidence score. After applying non-maximum suppression (NMS) to remove duplicate detections, the algorithm outputs the nodule count:

$$\hat{y} = \sum_{i=1}^{n} \mathbb{1}[c_i \geq \tau]$$

where $\tau$ is a confidence threshold.

This provides an objective, reproducible count of nodular lesions directly from clinical images, without requiring manual annotation by clinicians.
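
The detection-to-count pipeline (greedy NMS followed by confidence thresholding) can be sketched as follows; the boxes, scores, and thresholds are illustrative:

```python
import numpy as np

def iou(a, b):
    """Intersection over union of two [x1, y1, x2, y2] boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0

def nodule_count(boxes, scores, tau=0.5, nms_iou=0.5):
    """Greedy NMS (keep highest-confidence box, drop heavy overlaps),
    then count the surviving detections with confidence >= tau."""
    order = np.argsort(scores)[::-1]          # highest confidence first
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) < nms_iou for j in keep):
            keep.append(i)
    return sum(1 for i in keep if scores[i] >= tau)

# Two overlapping detections of the same nodule plus one distinct nodule:
# NMS merges the duplicates, so the count is 2, not 3.
boxes = np.array([[0, 0, 10, 10], [1, 1, 11, 11], [50, 50, 60, 60]], dtype=float)
scores = np.array([0.9, 0.8, 0.7])
```

Both $\tau$ and the NMS overlap threshold are tuning parameters fixed during development and frozen before evaluation on the hold-out set.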

Algorithm Objectives​

  • Support healthcare professionals in quantifying nodular burden, which is essential for severity assessment in conditions such as hidradenitis suppurativa (HS), acne, and cutaneous lymphomas.
  • Reduce inter-observer and intra-observer variability in lesion counting, which is common in clinical practice and clinical trials [Huynh 2022].
  • Enable automated severity scoring by integrating nodule counts into composite indices such as the International Hidradenitis Suppurativa Severity Score System (IHS4), which uses the counts of nodules, abscesses, and draining tunnels [Kimball 2016].
  • Ensure reproducibility and robustness across imaging conditions (lighting, orientation, device type) [Cai 2019; Wang 2021].
  • Facilitate standardized evaluation in multi-center trials, where manual counting introduces variability and reduces statistical power.

Justification (Clinical Evidence):

  • Clinical guidelines emphasize lesion counts (e.g., nodules, abscesses, draining tunnels) as the cornerstone for HS severity scoring (IHS4) and for acne grading systems [Kimball 2016].
  • Human counting is prone to fatigue and subjective error, with discrepancies over whether a lesion qualifies as a nodule, as well as double-counting or omission of lesions [REQ_002].
  • Automated counting has shown high accuracy: AI-based acne lesion counting achieved F1 scores >0.80 for inflammatory lesions [Huynh 2022].
  • Object detection approaches (CNN + attention mechanisms) are validated in lesion-counting tasks and other biomedical domains, offering superior reproducibility compared to human raters [Cai 2019; Wang 2021].
  • By benchmarking against inter-observer variability, automated nodule quantification ensures performance at or above expert consensus level.

Algorithm Endpoints and Requirements​

Performance is evaluated using Mean Absolute Error (MAE) of the predicted nodule counts compared to expert-annotated ground truth, with the expectation that the algorithm achieves performance within or better than the variability among experts.

| Metric | Threshold | Interpretation |
| --- | --- | --- |
| MAE | ≤ Expert inter-observer variability | Algorithm counts are on average as close to consensus as individual experts. |
| Deviation | ≤ 10% of inter-observer variance | Predictions remain within acceptable clinical tolerance. |

All thresholds must be achieved with 95% confidence intervals.

Requirements:

  • Output structured numerical data representing the exact count of nodules.
  • Demonstrate MAE ≤ inter-observer variability, with a maximum deviation ≤10% of expert variance.
  • Report precision, recall, and F1-score for object detection, with F1 ≥ 0.70 considered acceptable for nodular detection.
  • Validate performance on independent and diverse datasets, including acne and hidradenitis suppurativa images across skin tones, anatomical sites, and acquisition devices.
  • Ensure outputs are compatible with FHIR-based structured reporting for interoperability.

Hair Loss Surface Quantification​

Algorithm Description​

A deep learning segmentation model ingests a clinical image of the scalp and outputs a three-class probability map for each pixel:

$$M(x, y) \in \{\text{Hair}, \text{No Hair}, \text{Non-Scalp}\}, \quad \forall (x, y) \in \text{Image}$$
  • Hair = scalp region with visible hair coverage
  • No Hair = scalp region with hair loss
  • Non-Scalp = background, face, ears, or any non-scalp area

From this segmentation, the algorithm computes the percentage of hair loss surface area relative to the total scalp surface:

$$\hat{y} = \frac{\sum_{(x,y)} \mathbb{1}[M(x, y) = \text{No Hair}]}{\sum_{(x,y)} \mathbb{1}[M(x, y) \in \{\text{Hair}, \text{No Hair}\}]} \times 100$$

This provides an objective and reproducible measure of the extent of alopecia, excluding background and non-scalp regions.
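
The surface computation can be sketched as follows, assuming an integer-coded segmentation mask (the class codes and mask are illustrative):

```python
import numpy as np

HAIR, NO_HAIR, NON_SCALP = 0, 1, 2   # illustrative class codes

def hair_loss_percentage(mask: np.ndarray) -> float:
    """Percentage of the scalp (hair + no-hair pixels) labelled as hair loss.

    Non-scalp pixels (background, face, ears) are excluded from the
    denominator, so the result does not depend on how much background
    the photograph contains.
    """
    no_hair = np.count_nonzero(mask == NO_HAIR)
    scalp = no_hair + np.count_nonzero(mask == HAIR)
    return 100.0 * no_hair / scalp if scalp else 0.0

# 2x4 toy mask: 2 hair pixels, 2 no-hair pixels, 4 background pixels -> 50%.
mask = np.array([[HAIR, HAIR, NON_SCALP, NON_SCALP],
                 [NO_HAIR, NO_HAIR, NON_SCALP, NON_SCALP]])
```

Excluding the non-scalp class from the denominator is what makes the measure robust to image framing, as stated in the objectives.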

Algorithm Objectives​

  • Support healthcare professionals by providing precise and reproducible quantification of alopecia surface extent.
  • Reduce subjectivity in clinical indices such as the Severity of Alopecia Tool (SALT), which relies on visual estimates of scalp surface affected [Hasan 2023].
  • Enable automatic calculation of validated severity scores (e.g., SALT, APULSI) directly from images.
  • Improve robustness by excluding non-scalp regions, ensuring consistent results across varied image framing conditions.
  • Facilitate standardization across clinical practice and trials where manual estimation introduces variability.

Justification (Clinical Evidence):

  • Hair loss evaluation is extent-based (surface area involved), making it distinct from lesion counting or intensity scoring [Hasan 2023].
  • Manual estimation of scalp surface involvement is subjective and variable, particularly in diffuse hair thinning or patchy alopecia areata [Müller 2022].
  • Deep learning segmentation methods have shown expert-level agreement in skin lesion and hair density mapping, demonstrating robustness across imaging conditions [Mirikharaji 2023].
  • Standardized, automated quantification strengthens trial endpoints and improves reproducibility in therapeutic monitoring [White 2023].

Algorithm Endpoints and Requirements​

Performance is evaluated using Intersection over Union (IoU) for scalp segmentation and Relative Error (RE%) for percentage hair loss compared to expert annotations.

| Metric | Threshold | Interpretation |
| --- | --- | --- |
| IoU (scalp segmentation) | ≥ 0.50 | Segmentation of hair/no-hair vs. scalp achieves clinical utility. |
| Relative Error (hair loss %) | ≤ 20% | Predicted hair loss percentage deviates ≤ 20% from expert consensus. |

Success criteria: The algorithm must achieve IoU ≥ 0.50 for segmentation and RE ≤ 20% for surface percentage estimation, with 95% confidence intervals.

Requirements:

  • Perform three-class segmentation (Hair, No Hair, Non-Scalp).
  • Compute percentage of hair loss relative to total scalp.
  • Demonstrate IoU ≥ 0.50 and RE ≤ 20% compared to expert consensus.
  • Validate on diverse populations (age, sex, skin tone, hair type, alopecia subtype).
  • Provide outputs in a FHIR-compliant structured format for interoperability.

Data Specifications​

The development of the algorithms requires the collection and annotation of dermatological images.

We defined three types of data to collect:

  • Clinical Data: data with the diversity to be found in a hospital dermatology department (in terms of patients, demographics, skin tones, anatomical locations, and clinical indications).
  • Atlas Data: data from online atlases or reference image repositories that provide a broader variability of cases and rare conditions, which might not be commonly encountered in everyday clinical practice but are necessary to strengthen the robustness of the algorithms.
  • Evaluation Data: data specifically intended to enable unbiased training, validation, and evaluation of the algorithms.

To answer these specifications, three complementary data collections will be performed:

  • Retrospective Data: data already available from dermatological atlases, hospital databases, or other private sources. These datasets include a wide variety of conditions, including rare diseases, and will be used to enhance diversity and improve training robustness.
  • Prospective Data: data collected prospectively from hospital dermatology departments during routine clinical care. These images will ensure the dataset reflects real-world usage, patient demographics, and skin types, thereby supporting training, validation, and evaluation of the algorithms.
  • Evaluation Data (Hold-out Sets): data specifically sequestered for independent testing and validation, ensuring unbiased performance assessment of the algorithms.

The collected data should reflect the intended population in terms of demographics, skin tones, anatomical regions, and dermatological parameters. A description of the population represented in the collected datasets will be presented in the R-TF-028-005 AI/ML Development Report.

Regarding annotation, multiple types of expert labeling will be performed depending on the model requirements, as detailed in R-TF-028-004. Annotation will be performed exclusively by dermatologists, with adjudication steps to ensure consistency.

Methods to ensure data quality (both in collection and annotation), the sequestration of datasets, and the determination of ground truth will be implemented and documented.

The goal is to obtain data characterized by:

  • Scale: [NUMBER OF IMAGES] dermatological images [cite: 51–53].
  • Diversity: Representation of multiple skin tones, demographics, clinical contexts, and lesion types [cite: 54].
  • Annotation: Expert dermatologists only, with inter-rater agreement checks [cite: 9, 10].
  • Separation: Training, validation, and test sets with strict hold-out policies [cite: 68].

Requirements:

  • Perform 1 retrospective and 2 prospective data collections.
  • Provide evidence that collected data are representative of the intended population.
  • Ensure complete independence of the test set from training/tuning datasets.
  • Guarantee reproducible, consistent, and high-quality ground truth determination.
  • Maintain data traceability, standardized labeling protocols, and robust quality control.

Other Specifications​

Development Environment:

  • Fixed hardware/software stack for training and evaluation.
  • Deployment conversion validated by prediction equivalence testing.

Requirements:

  • Track software versions (TensorFlow, NumPy, etc.).
  • Verify equivalence between development and deployed model outputs.
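
Prediction equivalence between the development model and its deployed conversion can be verified by running both on the same inputs and comparing outputs element-wise within a numerical tolerance. A hedged sketch (the tolerance, data, and function name are illustrative; the actual acceptance criterion is fixed during verification):

```python
import numpy as np

def outputs_equivalent(dev_out: np.ndarray, deployed_out: np.ndarray,
                       atol: float = 1e-5) -> bool:
    """True if the deployed model reproduces the development model's
    predictions within an absolute tolerance on every element."""
    return bool(np.allclose(dev_out, deployed_out, atol=atol))

# Small floating-point drift from format conversion is accepted;
# a genuine change in the predicted distribution is not.
dev = np.array([0.10, 0.70, 0.20])
```

In practice this check is run over a fixed batch of reference inputs and the result is recorded as objective evidence of equivalence.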

Cybersecurity and Transparency​

  • Data: Always de-identified/pseudonymized [cite: 9].
  • Access: Research server restricted to authorized staff only.
  • Traceability: Development Report to include data management, model training, evaluation methods, and results.
  • Explainability: Logs, saliency maps, and learning curves to support monitoring.
  • User Documentation: Must state algorithm purpose, inputs/outputs, limitations, and that AI/ML is used.

Requirements:

  • Secure and segregate research data.
  • Provide full traceability of data and algorithms.
  • Communicate limitations clearly to end-users.

Specifications and Risks​

Risks linked to specifications are recorded in the AI/ML Risk Matrix (R-TF-028-011).

Key Risks:

  • Misinterpretation of outputs.
  • Incorrect diagnosis suggestions.
  • Data bias or mislabeled ground truth.
  • Model drift over time.
  • Input image variability (lighting, resolution).

Risk Mitigations:

  • Rigorous pre-market validation.
  • Continuous monitoring and retraining.
  • Controlled input requirements.
  • Clear clinical instructions for use.

Integration and Environment​

Integration​

Algorithms will be packaged for integration into Legit.Health Plus to support healthcare professionals [cite: 20, 22, 25, 40].

Environment​

  • Inputs: Clinical and dermoscopic images [cite: 26].
  • Robustness: Must handle variability in acquisition [cite: 8].
  • Compatibility: Package size and computational load must align with target device hardware/software.

References​

  1. Tan J, et al. Reliability of clinician erythema assessment grading scale. J Am Acad Dermatol. 2014;71(4):760–763. doi:10.1016/j.jaad.2014.05.037
  2. Lee JY, et al. Evaluation of erythema severity using convolutional neural networks. Sci Rep. 2021;11:7167. doi:10.1038/s41598-021-85489-8
  3. Cho Y, et al. Erythema scoring with deep learning in atopic dermatitis. Dermatol Ther (Heidelb). 2021;11:1227–1238. doi:10.1007/s13555-021-00541-9
  4. Kim H, et al. DeepErythema: Consistent evaluation of SPF index through deep learning. Sensors. 2023;23(13):5965. doi:10.3390/s23135965
  5. [TBD – Reference for ICD fine-grained CNN classification]
  6. [TBD – Reference for ICD attention-based models outperforming CNN-only]
  7. [TBD – Reference for attention-based interpretability in dermatology AI]
  8. [TBD – Reference for top-k prediction uplift ~10–15%]
  9. [TBD – Reference for comparative CNN baseline performance]
  10. Tschandl P, Rinner C, Apalla Z, et al. Human–computer collaboration for skin cancer recognition. Nat Med. 2020;26:1229–1234. doi:10.1038/s41591-020-0942-0
  11. Han SS, Park GH, Lim W, et al. Deep neural networks show an equivalent and often superior performance to dermatologists in onychomycosis diagnosis: automatic construction of onychomycosis datasets by region-based convolutional deep neural network. Br J Dermatol. 2020;182(2):480–488. doi:10.1111/bjd.18220
  12. Breitbart EW, Waldmann A, Nolte S, et al. Systematic skin cancer screening in Northern Germany. J Am Acad Dermatol. 2020;82(5):1231–1238. doi:10.1016/j.jaad.2019.09.055
  13. Krakowski AC, Sonabend AM, Smidt AC, et al. Artificial intelligence in dermatology: a systematic review of current applications and future directions. JAMA Dermatol. 2024;160(1):33–44. doi:10.1001/jamadermatol.2023.4597
  14. Salinas JL, Chen A, Kimball AB. Artificial intelligence for skin disease diagnosis: a systematic review and meta-analysis. Lancet Digit Health. 2024;6(2):e89–e101. doi:10.1016/S2589-7500(23)00265-5
  15. Puzenat E, et al. Assessment of psoriasis area and severity index (PASI) reliability in psoriasis clinical trials: analysis of the literature. Dermatology. 2010;220(1):15–19. doi:10.1159/000255439
  16. Cox JD, et al. Toxicity criteria of the Radiation Therapy Oncology Group (RTOG) and the European Organization for Research and Treatment of Cancer (EORTC). Int J Radiat Oncol Biol Phys. 1995;31(5):1341–1346. doi:10.1016/0360-3016(95)00060-C
  17. Phung SL, et al. Texture-based automated detection of scaling in psoriasis lesions using computer vision. Med Biol Eng Comput. 2019;57(3):503–516. doi:10.1007/s11517-018-1904-1
  18. Kim J, et al. Automated quantification of desquamation severity using convolutional neural networks in psoriasis. Comput Biol Med. 2022;142:105195. doi:10.1016/j.compbiomed.2022.105195
  19. NICE. Suspected cancer: recognition and referral. NICE guideline NG12. 2021.
  20. EADV Clinical Guidelines Committee. Triage of pigmented lesions in dermatology. Eur Acad Dermatol Venereol. 2020.
  21. Argenziano G, et al. Dermoscopy improves accuracy of primary care physicians in triaging pigmented lesions. Br J Dermatol. 2006;154(3):569–574. doi:10.1111/j.1365-2133.2005.07049.x
  22. Swetter SM, et al. Guidelines of care for the management of primary cutaneous melanoma. J Am Acad Dermatol. 2019;80(1):208–250. doi:10.1016/j.jaad.2018.08.055
  23. National Institute for Health and Care Excellence (NICE). Melanoma and non-melanoma skin cancer: diagnosis and management. 2022.
  24. Menzies SW, et al. Risk stratification of pigmented skin lesions: urgency in referral. Lancet Oncol. 2017;18(12):e650–e659. doi:10.1016/S1470-2045(17)30643-1
  25. Marsden J, et al. Revised UK guidelines for referral of suspected skin cancer. Br J Dermatol. 2010;163:238–245. doi:10.1111/j.1365-2133.2010.09709.x
  26. Morton C, et al. Variability in dermatology referrals and the role of AI-based triage. Clin Exp Dermatol. 2021;46:1051–1058. doi:10.1111/ced.14648
  27. Cai Y, Du D, Zhang L, Wen L, Wang W, Wu Y, Lyu S. Guided attention network for object detection and counting on drones. arXiv preprint. 2019:1909.11307.
  28. Wang Y, Hou J, Hou X, Chau LP. A self-training approach for point-supervised object detection and counting in crowds. IEEE Trans Image Process. 2021;30:2876–2887. doi:10.1109/TIP.2021.3055907
  29. Huynh QT, Nguyen PH, Le HX, Ngo LT, Trinh N-T, Tran MT-T, Nguyen HT, Vu NT, Nguyen AT, Suda K, et al. Automatic acne object detection and acne severity grading using smartphone images and artificial intelligence. Diagnostics. 2022;12(8):1879. doi:10.3390/diagnostics12081879
  30. Kimball AB, et al. Assessing severity of hidradenitis suppurativa: development of the IHS4. Br J Dermatol. 2016;174(5):1048–1052. doi:10.1111/bjd.14340
  31. Hasan MK, Ahamad MA, Yap CH, Yang G. A survey, review, and future trends of skin lesion segmentation and classification. Comput Biol Med. 2023;167:106624. doi:10.1016/j.compbiomed.2023.106624
  32. Mirikharaji Z, Abhishek K, Bissoto A, Barata C, Avila S, Valle E, Hamarneh G. A survey on deep learning for skin lesion segmentation. Med Image Anal. 2023;88:102863. doi:10.1016/j.media.2023.102863
  33. Müller D, Soto-Rey I, Kramer F. Towards a guideline for evaluation metrics in medical image segmentation. BMC Res Notes. 2022;15:210. doi:10.1186/s13104-022-06079-9
  34. White N, Parsons R, Collins G, et al. Evidence of questionable research practices in clinical prediction models. BMC Med. 2023;21:339. doi:10.1186/s12916-023-03059-9

Traceability to QMS Records​

Signature meaning

The signatures for the approval process of this document can be found in the verified commits at the repository for the QMS. As a reference, the team members who are expected to participate in this document and their roles in the approval process, as defined in Annex I Responsibility Matrix of the GP-001, are:

  • Author: Team members involved
  • Reviewer: JD-003, JD-004
  • Approver: JD-001
All the information contained in this QMS is confidential. The recipient agrees not to transmit or reproduce the information, neither by himself nor by third parties, through whichever means, without obtaining the prior written permission of Legit.Health (AI LABS GROUP S.L.)