TEST_009 Notify the user if the quality of the image is insufficient
Test type
System
Linked activities
- MDS-232
Result
- Passed
- Failed
Description
Tests carried out on the image quality assessment algorithm to verify that images of insufficient quality are detected and that the user is notified.
Objective
The objective is to demonstrate that the quality assessment algorithm reliably identifies poor-quality images, ensuring that users are notified accurately.
Acceptance criteria
A Pearson linear correlation coefficient (LCC) greater than 0.7 between the predicted quality scores and the ground-truth mean opinion scores.
Materials & methods
Dataset
We gathered a dataset comprising 934 dermatology images (clinical and dermoscopic) taken with smartphones, digital cameras, and dermoscopes. The images were collected from internal sources, including anonymized user images and images from the main dataset used to train our image-based skin disease recognition model (see TEST_004). This ensured that different skin tones were represented, as well as different manifestations of skin pathology (including healthy skin). Each dataset was split into training, validation, and test subsets. For the dermatology images, we stratified the splits patient-wise, whereas the general-domain (natural) images did not require stratification, as every image was unique. The reported results are obtained from the test set.
The 934 images were evaluated by 40 non-expert observers, following the International Telecommunication Union's recommendation ITU-T P.910, together with an evaluation protocol that considered factors such as lighting, focus, and saturation. Based on these criteria, the observers assigned each image a quality score from 1 (worst) to 10 (best), covering the levels of the Absolute Category Rating (ACR) scale: bad, poor, fair, good, excellent. Each image was then given a mean opinion score (MOS) that reflected the overall opinion of the 40 observers.
We also used other image quality assessment datasets to make sure our models were presented with as many different types of distortions (either real or artificial) as possible, as some may not be observed in our dataset. These datasets (KonIQ-10k, SPAQ, Kadid-10k) included mean opinion scores from their corresponding observer groups, annotated following the same ACR scale and focusing on similar quality attributes (lighting, focus, saturation, etc.). In order to make all datasets compatible, the MOS from these datasets were transformed from their original scale into the [1, 10] scale that we used for our dermatology dataset.
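The cross-dataset alignment described above is a simple linear rescaling of each MOS into the [1, 10] range. A minimal sketch (the `rescale_mos` name and the example source scale are ours, for illustration):

```python
def rescale_mos(mos, src_min, src_max, dst_min=1.0, dst_max=10.0):
    """Linearly map a mean opinion score from its source scale to [1, 10]."""
    return dst_min + (mos - src_min) * (dst_max - dst_min) / (src_max - src_min)

# Example: a score of 3.0 on a [1, 5] ACR scale maps to the midpoint of [1, 10]
print(rescale_mos(3.0, 1.0, 5.0))  # -> 5.5
```

The endpoints are preserved by construction: the worst score on the source scale maps to 1 and the best to 10.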
Regarding the annotation protocol, all datasets followed the same guidelines: large panels of non-expert human observers were used to obtain realistic approximations of human visual quality perception, large enough to be robust to occasional noisy annotators (i.e., observers who did not understand the task or who had specific biases). With a large group of observers, those who deviate from the average by a given margin and criterion can be discarded. In our case, we used the absolute difference between an observer's score and the average score of the remaining observers to detect unreliable observers, who were then excluded from the analysis to obtain the final MOS values.
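The observer-screening step can be sketched as follows; the `filter_observers` helper and the deviation threshold are illustrative assumptions, not the exact values used in the study:

```python
import numpy as np

def filter_observers(ratings, max_abs_dev=2.5):
    """Compute final MOS values after excluding unreliable observers.

    ratings: (n_observers, n_images) array of per-image quality scores.
    An observer is flagged as unreliable when the mean absolute difference
    between their scores and the average of the *other* observers exceeds
    max_abs_dev (the threshold value here is an assumed example).
    """
    ratings = np.asarray(ratings, dtype=float)
    keep = []
    for i in range(ratings.shape[0]):
        others = np.delete(ratings, i, axis=0).mean(axis=0)
        if np.mean(np.abs(ratings[i] - others)) <= max_abs_dev:
            keep.append(i)
    mos = ratings[keep].mean(axis=0)  # MOS over reliable observers only
    return mos, keep
```

With four observers, one of whom scores erratically, the erratic observer is dropped and the MOS is computed over the remaining three.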
Training and validation
We split each dataset into a training and a validation set and trained a convolutional neural network to predict the quality score of images. The model was tested on the dermatological image validation set by comparing the predicted quality score to the original mean opinion score using the mean absolute error (MAE). We also evaluated performance in terms of the Pearson linear correlation coefficient (LCC) with the ground-truth quality ratings.
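A minimal sketch of the two evaluation metrics (MAE and Pearson LCC) using NumPy; the `evaluate` helper and the sample scores are ours, for illustration:

```python
import numpy as np

def evaluate(pred, mos):
    """MAE and Pearson LCC between predicted scores and mean opinion scores."""
    pred = np.asarray(pred, dtype=float)
    mos = np.asarray(mos, dtype=float)
    mae = np.mean(np.abs(pred - mos))
    lcc = np.corrcoef(pred, mos)[0, 1]  # Pearson linear correlation coefficient
    return mae, lcc

mae, lcc = evaluate([1, 2, 3, 4], [1.1, 1.9, 3.2, 3.8])
print(f"MAE={mae:.2f}, LCC={lcc:.3f}")
assert lcc > 0.7  # the acceptance criterion from this test
```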
Regarding the training strategy, we used transfer learning to train an EfficientNet-B0 convolutional neural network: first, we froze all layers (except for the last linear layer) of a model with pre-trained weights and trained for several epochs; then, we unfroze all layers and fine-tuned the entire model for an additional number of epochs.
Results
The best-performing model achieved a Pearson linear correlation coefficient (LCC) of 0.806 on the test set.
| Training data | Distortion type | Data used for fine-tuning | LCC |
|---|---|---|---|
| General domain images | Synthetic | Dermatological images | 0.657 |
| General domain images | Real | Dermatological images | 0.806 |
| General domain images | Synthetic + real | Dermatological images | 0.750 |
| General domain + dermatological images | Synthetic + real | No fine-tuning | 0.737 |
Protocol deviations
There were no deviations from the initial protocol.
Conclusions
The quality assessment algorithm effectively identifies low-quality images and ensures that this information is accurately communicated to the user.
Test checklist
The following checklist verifies the completion of the goals and metrics specified in the requirement REQ_009.
| Requirement verification |
|---|
| - [x] Users receive information about image quality |
| - [x] Algorithm achieves a linear correlation greater than 0.7 |
Evidence
Evidence is published in the Journal of the American Academy of Dermatology (JAAD) in our article "Dermatology Image Quality Assessment (DIQA): Artificial intelligence to ensure the clinical utility of images for remote consultations and clinical trials".
Furthermore, a screenshot of the medical device's output is included, demonstrating that users receive information regarding image quality. This feature allows users to identify cases where the image quality is insufficient. Output of a valid image:
```python
{'isValid': True, 'metrics': {'hasEnoughQuality': True, 'isDermatologyDomain': True}, 'score': 92.0}
```
Output of a bad quality image:
```python
{'isValid': False, 'metrics': {'hasEnoughQuality': False, 'isDermatologyDomain': True}, 'score': 34.0}
```
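A sketch of how such an output could be assembled; the `quality_output` helper and the 60-point validity threshold are illustrative assumptions, not the device's actual logic:

```python
def quality_output(score, is_derm_domain, threshold=60.0):
    """Assemble a quality-check response like the device's output.

    The 60-point validity threshold is an assumed example; the real
    cut-off is internal to the device.
    """
    has_quality = score >= threshold
    return {
        "isValid": has_quality and is_derm_domain,
        "metrics": {
            "hasEnoughQuality": has_quality,
            "isDermatologyDomain": is_derm_domain,
        },
        "score": score,
    }

print(quality_output(92.0, True))   # a valid image
print(quality_output(34.0, True))   # a bad-quality image
```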
Proof:
Signature meaning
The signatures for the approval process of this document can be found in the verified commits at the repository for the QMS. As a reference, the team members who are expected to participate in the approval of this document, with their roles as defined in Annex I Responsibility Matrix of the GP-001, are:
- Tester: JD-017, JD-009, JD-004
- Approver: JD-005