TEST_003 The user receives quantifiable data on the extent of clinical signs
Test type
System
Linked activities
- MDS-100
- MDS-99
- MDS-173
- MDS-393
Result
- Passed
- Failed
Description
Tests carried out on the automatic visual sign surface quantification algorithms to verify that their performance is comparable to that of an expert dermatologist.
Objective
The goal is to demonstrate that the quantifiable data on the surface quantification of clinical signs delivered to the user is extracted with performance comparable to that of an expert dermatologist.
Acceptance criteria
Area Under the Curve (AUC) greater than 0.80 and Intersection over Union (IoU) greater than 0.5.
Materials & methods
A total of 4596 images were utilized: 1083 of atopic dermatitis and similar eczemas, 1376 of psoriasis, 1826 scalp images with or without hair loss, and 311 of pressure ulcers. Each image set was evaluated by health practitioners specializing in the respective condition: three expert dermatologists for psoriasis, three for atopic dermatitis, three for hair loss, and four expert nurses for pressure ulcers. To establish the ground truth, we employed pixel-wise mean and median statistics.
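For binary masks, a pixel-wise median across annotators acts as a majority vote. The sketch below illustrates that idea; the `consensus_mask` helper and the toy 2x3 masks are invented for illustration and are not taken from the actual pipeline, which may combine mean and median statistics differently.

```python
from statistics import median

def consensus_mask(masks):
    """Pixel-wise median across annotator masks.

    For binary masks, the median at each pixel is a majority vote,
    which is one way to build a single ground-truth mask from
    several expert annotations (hypothetical helper for this sketch).
    """
    height, width = len(masks[0]), len(masks[0][0])
    return [
        [median(m[r][c] for m in masks) for c in range(width)]
        for r in range(height)
    ]

# Three annotators label a 2x3 image; pixels where at least two
# annotators agree end up in the consensus mask.
a1 = [[1, 1, 0], [0, 1, 0]]
a2 = [[1, 0, 0], [0, 1, 1]]
a3 = [[1, 1, 0], [1, 1, 0]]
print(consensus_mask([a1, a2, a3]))  # → [[1, 1, 0], [0, 1, 0]]
```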
To determine whether the datasets were adequate to meet our objectives, we first evaluated the complexity of the tasks at hand. This evaluation involved a thorough analysis of the medical evidence, supplemented by a variability study conducted in collaboration with experienced doctors, whose in-depth understanding of the nuances of the problem enabled us to establish a robust baseline for the analysis.
In this context, we calculated the key metrics for each annotated dataset, namely the Intersection over Union (IoU) and the Area Under the Curve (AUC). Doctors achieved values exceeding 0.9, and for darker skin tones the values remained high, surpassing 0.8. Such strong agreement demonstrates the feasibility of the task and aligns with the expectations set by medical professionals, suggesting that the datasets are labeled consistently enough to address the demands of each task.
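The two metrics above can be computed directly from flattened masks and pixel scores. Below is a minimal sketch: IoU as intersection over union of binary masks, and ROC AUC via the Mann-Whitney formulation (the probability that a random positive pixel scores higher than a random negative one). The function names and toy inputs are assumptions for illustration, not the report's actual evaluation code.

```python
def iou(pred, target):
    """Intersection over Union for two binary masks (flat lists)."""
    inter = sum(p & t for p, t in zip(pred, target))
    union = sum(p | t for p, t in zip(pred, target))
    return inter / union if union else 1.0

def auc(scores, labels):
    """ROC AUC via the Mann-Whitney U statistic: probability that a
    randomly chosen positive pixel scores above a negative one,
    counting ties as half a win."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

pred   = [1, 1, 0, 0, 1]
target = [1, 0, 0, 0, 1]
print(iou(pred, target))                       # 2 shared pixels / 3 in union
print(auc([0.9, 0.8, 0.3, 0.1, 0.7], target))  # 5 wins out of 6 pairs
```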
Each dataset followed a different splitting strategy based on the amount of patient-related metadata available:
- Atopic dermatitis and eczema datasets were split into training, validation, and test sets. Their metrics are reported on the test sets.
- Psoriasis and scalp datasets were randomly split into training and validation sets using K-fold cross-validation. Metrics are reported on the validation sets.
- The pressure ulcer dataset was split patient-wise, using the available metadata to identify which images belonged to the same individual. Metrics are reported on the validation sets using K-fold cross-validation.
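The point of a patient-wise split is that all images of one individual land in the same fold, so no patient leaks between training and validation. A minimal stdlib stand-in for this idea (comparable in spirit to scikit-learn's `GroupKFold`) is sketched below; the helper name, round-robin assignment, and toy patient IDs are assumptions for the sketch.

```python
from collections import defaultdict

def patient_wise_folds(image_patient_ids, n_folds=5):
    """Assign image indices to folds so that all images of a patient
    land in the same fold, preventing patient-level leakage."""
    by_patient = defaultdict(list)
    for image_idx, patient in enumerate(image_patient_ids):
        by_patient[patient].append(image_idx)
    folds = [[] for _ in range(n_folds)]
    # Round-robin patients across folds; a production split would also
    # balance fold sizes and class frequencies.
    for i, patient in enumerate(sorted(by_patient)):
        folds[i % n_folds].extend(by_patient[patient])
    return folds

# Eight images from five patients, split into two folds.
ids = ["p1", "p1", "p2", "p3", "p3", "p3", "p4", "p5"]
print(patient_wise_folds(ids, n_folds=2))  # → [[0, 1, 3, 4, 5, 7], [2, 6]]
```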
For each visual sign surface segmentation task, we applied a U-Net, an architecture originally designed for biomedical image segmentation that showed strong results on cell tracking. Its main contribution was the ability to achieve good results with only hundreds of training examples. The U-Net consists of two paths: a contracting path and an expanding path. The contracting path is a typical convolutional network in which convolution and pooling operations are repeatedly applied. We used ResNet-34, a typical backbone for the contracting path. This architecture also excels in the data regime in which we operate, especially when combined with transfer learning, which enhances the overall performance and adaptability of the model.
For all tasks, we followed a transfer learning strategy: first, we froze all layers (except the last one) of a model pre-trained on large-scale datasets and trained for several epochs; then, we unfroze all layers and trained the full model for additional epochs. Data augmentation was also employed to improve performance, as detailed in Legit.Health Plus description and specifications 2023_001.
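The two-stage schedule above can be illustrated with a toy model. The `Layer` class, layer names, and helper below are invented for the sketch (the actual training code would toggle `requires_grad` on a pre-trained network in a deep learning framework); the point is only the freeze-then-unfreeze ordering.

```python
# Toy illustration of the two-stage transfer-learning schedule:
# train only the head first, then fine-tune the whole network.
class Layer:
    def __init__(self, name):
        self.name, self.trainable = name, True

def set_trainable(layers, trainable, keep_last=False):
    """Set the trainable flag on all layers, optionally leaving the
    final (task-specific) layer untouched."""
    for layer in (layers[:-1] if keep_last else layers):
        layer.trainable = trainable

model = [Layer(n) for n in ("conv1", "conv2", "conv3", "head")]

# Stage 1: freeze everything except the final layer, train a few epochs.
set_trainable(model, False, keep_last=True)
print([l.name for l in model if l.trainable])  # only the head trains

# Stage 2: unfreeze the full model and fine-tune for more epochs.
set_trainable(model, True)
print([l.name for l in model if l.trainable])  # every layer trains
```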
The processors we assessed in this test are ASCORAD, APASI, ASALT, and APULSI.
Results
- The Intersection over Union (IoU) for erythema, edema, oozing, excoriation, lichenification, and dryness quantification stands at 0.93, exceeding the specified requirement of 0.8.
- The Area Under the Curve (AUC) for erythema, induration, and desquamation quantification reaches 0.95, well above the 0.80 requirement.
- Hair density quantification achieved an IoU of 0.71, computed against a consensus ground truth built from the annotations of three experts.
- The IoU values for wound, maceration, necrotic tissue, and bone quantification are 0.76, 0.60, 0.57, and 0.79, respectively, all computed against a consensus ground truth built from the annotations of three experts.
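As a sanity check, the reported values can be compared programmatically against the thresholds listed in the requirement checklist. The dictionary below simply copies the numbers stated in this section; it is an illustrative cross-check, not part of the actual verification tooling.

```python
# (reported value, required threshold), copied from the Results section
# and the requirement checklist of this report.
results = {
    "erythema/induration/desquamation AUC": (0.95, 0.8),
    "hair density IoU":    (0.71, 0.6),
    "wound IoU":           (0.76, 0.5),
    "maceration IoU":      (0.60, 0.5),
    "necrotic tissue IoU": (0.57, 0.5),
    "bone IoU":            (0.79, 0.5),
}
verdicts = {name: value > threshold for name, (value, threshold) in results.items()}
print(verdicts)  # every criterion evaluates to True
```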
Protocol deviations
There were no deviations from the initial protocol.
Conclusions
The quantifiable data regarding the surface quantification of clinical signs provided to the user matches the performance of expert healthcare professionals. This result reflects the quality of the training datasets (size and labeling consistency) and gives healthcare practitioners reliable information to support their clinical assessments.
Test checklist
The following checklist verifies the completion of the goals and metrics specified in the requirement REQ_003.
Requirement verification
- Area Under the Curve (AUC) greater than 0.8 for erythema, induration and desquamation surface quantification
- Area Under the Curve (AUC) greater than 0.8 for erythema, edema, oozing, excoriation, lichenification and dryness surface quantification
- Intersection over Union (IoU) greater than 0.6 for hair density surface quantification
- Intersection over Union (IoU) greater than 0.5 for wound surface quantification
- Intersection over Union (IoU) greater than 0.5 for maceration surface quantification
- Intersection over Union (IoU) greater than 0.5 for necrotic tissue surface quantification
- Intersection over Union (IoU) greater than 0.5 for bone surface quantification
Evidence
Evidence of algorithms for surface quantification can be found in the following attachments and in the activities and deliverables.
Automatic SCOring of Atopic Dermatitis Using Deep Learning: A Pilot Study
Alfonso Medela, Taig Mac Carthy, S. Andy Aguilar Robles, Carlos M. Chiesa-Estomba, Ramon Grimalt
Published: February 10, 2022
DOI: https://doi.org/10.1016/j.xjidi.2022.100107
APASI: Automatic Psoriasis Area Severity Index Estimation using Deep Learning
Alfonso Medela1,*, Taig Mac Carthy1,**,+, Andy Aguilar1,***, Pedro Gómez-Tejerina1,****, Carlos M. Chiesa-Estomba2,3,4, Fernando Alfageme-Roldán5,+, and Gastón Roustan Gullón5,+
1 Department of Medical Computer Vision and PROMs, LEGIT.HEALTH, 48013 Bilbao, Spain
2 Department of Otorhinolaryngology, Osakidetza, Donostia University Hospital, 20014 San Sebastián, Spain
3 Biodonostia Health Research Institute, 20014 San Sebastián, Spain
4 Head & Neck Study Group of Young-Otolaryngologists of the International Federations of Oto-rhino-laryngological Societies (YO-IFOS), 13005 Marseille, France
5 Servicio de Dermatología, Hospital Puerta de Hierro, Majadahonda, Madrid, Spain
Not published at the time of writing this document
A visual example illustrating the performance of hair density quantification:
We present four examples to visually illustrate the performance of our models in quantifying wound, maceration, necrotic tissue, and bone. These examples serve as individual instances, while the table below provides the mean values computed over the entire validation dataset.
The following table shows the results of our models alongside annotator variability. It reports the mean IoU, specificity, and sensitivity computed across the validation dataset, using annotations from three annotators to establish the ground truth. To assess model performance at consensus expert level, we compute these metrics between the models' predictions and the ground truth obtained through a consensus of the three expert annotators. Annotator variability is gauged by taking one annotator as a reference and computing the same metrics against the same three annotators used to evaluate the models.
| Reference | Metric | Lesion | Maceration | Necrotic | Bone |
|---|---|---|---|---|---|
| Annotator 1 | IoU | 0.806 | 0.611 | 0.672 | 0.836 |
| Annotator 1 | Specificity | - | 0.9998 | 0.9998 | 0.9996 |
| Annotator 1 | Sensitivity | 0.885 | 0.252 | 0.467 | 0.192 |
| Model 1 | IoU | 0.767 | 0.606 | 0.411 | 0.313 |
| Model 1 | Specificity | - | 0.9996 | 0.9987 | 0.9985 |
| Model 1 | Sensitivity | 0.847 | 0.190 | 0.450 | 0.311 |
| Model 2 | IoU | 0.767 | 0.606 | 0.575 | 0.797 |
| Model 2 | Specificity | - | 0.9996 | 0.9993 | 0.9991 |
| Model 2 | Sensitivity | 0.847 | 0.209 | 0.403 | 0.157 |
Signature meaning
The signatures for the approval process of this document can be found in the verified commits at the repository for the QMS. As a reference, the team members who are expected to participate in the approval process of this document, with their roles as defined in the Annex I Responsibility Matrix of the GP-001, are:
- Tester: JD-017, JD-009, JD-004
- Approver: JD-005