TEST_008 Notify the user of the image modality and if the image does not represent a skin structure
Test type
System
Linked activities
- MDS-451
- MDS-452
Result
- [x] Passed
- [ ] Failed
Description
Tests carried out on the domain detection algorithm to verify its ability to detect skin structures and the image modality.
Objective
The goal of this test is to evaluate the domain check algorithm and verify its capability to detect skin structures and the image modality.
Acceptance criteria
Area Under the Curve (AUC) is greater than 0.8.
Materials & methods
Dataset
We collected a dataset consisting of more than 100,000 dermatological images representing a variety of skin conditions according to the ICD-11 categories. This dataset is the same one used to train, validate, and test the main image recognition model. In addition, we included images devoid of skin structures from diverse sources, including ImageNet with the exclusion of the person category, MS-COCO (2017) with the exclusion of person instances, Google's Cartoon Set, and the Textures dataset. Each subset was split into a training and a validation set. We reused the training and validation splits of the dermatology image dataset used for the image recognition model (TEST_004), whereas the remaining datasets were split randomly into training and validation sets.
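A minimal sketch of how the non-dermatology subsets could be labelled by source and split randomly into training and validation sets; the directory names, the 80/20 split ratio, and the single `non_dermatology` label are assumptions for illustration, not the exact pipeline used:

```python
import random
from pathlib import Path

# Hypothetical locations of the non-dermatology subsets; the real paths differ.
NON_DERMA_SOURCES = {
    "imagenet_no_person": Path("data/imagenet_filtered"),
    "coco_no_person": Path("data/coco2017_filtered"),
    "cartoon": Path("data/cartoon_set"),
    "textures": Path("data/textures"),
}

def random_split(items, val_fraction=0.2, seed=42):
    """Shuffle and split a list of image paths into train/validation subsets."""
    rng = random.Random(seed)
    items = list(items)
    rng.shuffle(items)
    n_val = int(len(items) * val_fraction)
    return items[n_val:], items[:n_val]

train_set, val_set = [], []
for source, root in NON_DERMA_SOURCES.items():
    images = sorted(root.glob("**/*.jpg"))
    train, val = random_split(images)
    # Every non-dermatology image gets the same "not skin" label regardless of source.
    train_set += [(path, "non_dermatology") for path in train]
    val_set += [(path, "non_dermatology") for path in val]
```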
The reason behind choosing the aforementioned non-dermatology datasets (ImageNet, MS-COCO, Cartoon, Textures) is that they cover all the scenarios that might occur when assessing the domain of an image:
- MS-COCO and ImageNet contain an enormous variety of everyday objects and entities;
- The Cartoon dataset prepares the model for other types of unwanted input images (such as scribbles that may accidentally be uploaded if a user photographed a treatment recipe, for example);
- The Textures dataset prepares the model for scenarios where the input image may be confused with skin structures (leather, fur, etc.).
Model training and validation
To process this dataset, we employed a state-of-the-art image classification architecture known as the Vision Transformer. Specifically, we chose the “ViT Small” model due to its strong performance and speed, encouraged by the compelling results of existing research such as “Vision Transformers are Robust Learners”.
Regarding the training strategy, we used transfer learning to train the network: first, we froze all layers (except for the last linear layer) of a model with pre-trained weights, and trained for several epochs; then, we unfroze all layers and fine-tuned the entire model for another number of epochs.
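A minimal sketch of this two-phase strategy, assuming a PyTorch/timm setup; the model variant string, the number of classes, the learning rates, and the epoch counts are illustrative rather than the values actually used:

```python
import timm
import torch

# Load a ViT Small with pre-trained weights; the class count covering the modality
# classes plus the non-dermatology categories is an assumption here.
model = timm.create_model("vit_small_patch16_224", pretrained=True, num_classes=6)

def set_backbone_trainable(model, trainable):
    """Freeze or unfreeze every parameter except the final classification head."""
    for name, param in model.named_parameters():
        if not name.startswith("head"):
            param.requires_grad = trainable

def run_epochs(model, loader, optimizer, n_epochs):
    """Standard supervised training loop over the given DataLoader."""
    criterion = torch.nn.CrossEntropyLoss()
    model.train()
    for _ in range(n_epochs):
        for images, labels in loader:
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()
            optimizer.step()

# Phase 1: train only the last linear layer (the classification head).
set_backbone_trainable(model, False)
head_optimizer = torch.optim.AdamW(model.head.parameters(), lr=1e-3)
# run_epochs(model, train_loader, head_optimizer, n_epochs=5)   # train_loader assumed

# Phase 2: unfreeze all layers and fine-tune the whole model at a lower learning rate.
set_backbone_trainable(model, True)
full_optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
# run_epochs(model, train_loader, full_optimizer, n_epochs=10)  # train_loader assumed
```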
After training the model, we evaluated the performance on the validation set by converting the multi-class predictions to binary. In this scenario, an image is a positive case when the predicted class is either “clinical” or “dermoscopic”, and negative otherwise. This matches the final use of the model, which is to tell whether an image is valid for analysis or not. While this could be achieved by directly training a binary classifier, we chose to frame it as a multi-class classification problem to obtain a higher level of detail (not only knowing whether the image is valid, but also its type).
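The conversion from multi-class predictions to a binary “valid for analysis” decision, and the AUC computed on it, could look roughly like the sketch below; the class list and the scikit-learn based evaluation are assumptions consistent with the description above, not the exact code used:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Hypothetical class ordering of the multi-class model.
CLASSES = ["clinical", "dermoscopic", "object", "cartoon", "texture", "other"]
POSITIVE = {"clinical", "dermoscopic"}  # classes considered valid for analysis

def binary_scores(probs):
    """Collapse per-class probabilities into a single 'is valid' score.

    probs: array of shape (n_samples, n_classes) with softmax outputs.
    """
    positive_idx = [i for i, c in enumerate(CLASSES) if c in POSITIVE]
    return probs[:, positive_idx].sum(axis=1)

def binary_auc(probs, true_classes):
    """AUC of the binarised problem: positive if the true class is clinical/dermoscopic."""
    y_true = np.array([1 if CLASSES[t] in POSITIVE else 0 for t in true_classes])
    return roc_auc_score(y_true, binary_scores(probs))
```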
Results
The model attains an AUC of 0.9957 on the validation set, well above the acceptance criterion of 0.8.
Protocol deviations
There were no deviations from the initial protocol.
Conclusions
The algorithm reliably detects the presence of a skin structure and the image modality, exceeding the acceptance criterion.
Test checklist
The following checklist verifies the completion of the goals and metrics specified in the requirements REQ_008 and REQ_010.
Requirement verification |
---|
- [x] Users receive information about the presence of a skin structure |
- [x] Skin structures are detected with an Area Under the Curve (AUC) greater than 0.8 |
- [x] Image modality is detected with an Area Under the Curve (AUC) greater than 0.8 |
Evidence
Output of an image that contains a skin structure:
{'isValid': True, 'metrics': {'hasEnoughQuality': True, 'isDermatologyDomain': True}, 'score': 89.0}
Output of an image that does not contain a skin structure:
{'isValid': False, 'metrics': {'hasEnoughQuality': True, 'isDermatologyDomain': False}, 'score': 64.0}
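For illustration only, a caller could use the `isDermatologyDomain` flag in these outputs to notify the user; the response structure shown above is taken from the evidence, while the function name and messages below are assumptions:

```python
def notify_user(result: dict) -> str:
    """Turn the domain-check output into a user-facing message (illustrative only)."""
    if not result["metrics"]["isDermatologyDomain"]:
        return "The image does not appear to contain a skin structure; please retake the photo."
    if not result["isValid"]:
        return "The image cannot be analysed; please retake the photo."
    return "The image is valid for analysis."

# Example with the second output shown above:
print(notify_user({'isValid': False,
                   'metrics': {'hasEnoughQuality': True, 'isDermatologyDomain': False},
                   'score': 64.0}))
```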
Screenshot proof:
The metrics calculated in the deliverable are presented in the table below along with a confusion matrix illustrating the algorithm's correct and incorrect predictions.
Metric | Value |
---|---|
Accuracy (ACC) | 0.9923 |
Balanced accuracy (BACC) | 0.9918 |
Area Under the Curve (AUC) | 0.9957 |
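As a reference for how these figures relate to the confusion matrix, a sketch of the metric computation with scikit-learn; the prediction arrays are placeholders, not the actual validation results:

```python
from sklearn.metrics import accuracy_score, balanced_accuracy_score, confusion_matrix

# y_true / y_pred would be the binarised validation labels and predictions (placeholders here).
y_true = [1, 1, 0, 0, 1, 0]
y_pred = [1, 1, 0, 1, 1, 0]

acc = accuracy_score(y_true, y_pred)            # ACC: fraction of correct predictions
bacc = balanced_accuracy_score(y_true, y_pred)  # BACC: mean recall over both classes
cm = confusion_matrix(y_true, y_pred)           # rows: true class, columns: predicted class
```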
Regarding the image modality, the output JSON comes with either `Clinical` or `Dermatoscopic`. This is an example of an output for 3 clinical images:
Signature meaning
The signatures for the approval process of this document can be found in the verified commits at the repository for the QMS. As a reference, the team members who are expected to participate in the approval process of this document, and their roles as defined in the Annex I Responsibility Matrix of the GP-001, are:
- Tester: JD-017, JD-009, JD-004
- Approver: JD-005