REQ_009 Notify the user if the quality of the image is insufficient
Category
Minor
Source
- Alfonso Medela JD-005
Tags: Internal · System inputs and outputs · Alarms, warnings and messages · User · Database and data definition · Architecture
Activities generated
- MDS-232
Causes failure modes
- The quality assessment algorithm is too sensitive, flagging acceptable images as insufficient quality.
- The AI model fails to identify low-quality images, leading to no notification when the image quality is actually insufficient.
- The AI model is not sensitive enough, missing issues in the image quality.
- Notifications are not clear or specific enough for the user to understand why the image quality was flagged as insufficient, leaving them unsure what is wrong with the image.
- The algorithm has been trained on a limited dataset, making it less effective at accurately assessing image quality.
- The criteria for determining image quality are too subjective, leading to inconsistent and unreliable assessments.
- The algorithm fails to assess images of different resolutions or formats consistently.
Related risks
- Image artefacts/resolution: the medical device receives an input that does not have sufficient quality, in a way that affects its performance
- Inaccessible skin areas: the patient cannot capture the affected skin area inside the picture
- The user is unable to provide adequate lighting conditions
- Inaccurate training data: image datasets used in the development of the device are not properly labeled
- Biased or incomplete training data: image datasets used in the development of the device are not properly selected
- The device inputs images that do not represent skin structure
- Stagnation of model performance: the AI/ML models of the device do not benefit from the potential improvement in performance that comes from re-training
- Degradation of model performance: automatic re-training of models decreases the performance of the device
User Requirement, Software Requirement Specification and Design Requirement
To draw meaningful clinical conclusions from an image, it is crucial to take the photo correctly, ensuring that the lesion is well focused and centered. What cannot be seen clearly cannot be properly analyzed, whether by a specialist or by an algorithm.
Taking low-quality images is a common issue, especially among inexperienced users. To address this problem, it's important to develop an algorithm that can identify and filter out such poor-quality images. The requirements deriving from this reasoning are:
- User Requirement 9.1: Users should be informed if the submitted image is of insufficient quality for analysis.
- Software Requirement Specifications 9.2: Deploy image quality assessment algorithms to validate the clarity, focus, and appropriateness of uploaded images for further processing.
- Design Requirement 9.3: Structure device responses such that image quality validation results are clearly communicated through key-value pairs in compliance with medical data standards.
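To make Design Requirement 9.3 concrete, the sketch below shows one possible key-value shape for the quality verdict in a device response. The field names and the threshold value are illustrative assumptions, not the device's actual schema:

```python
# Hypothetical shape of an image-quality validation response (field names
# and threshold are illustrative only, not the device's actual schema).
quality_response = {
    "image_quality": {
        "score": 7.4,           # predicted quality on the 0-10 annotation scale
        "is_sufficient": True,  # True when the score clears the acceptance threshold
        "threshold": 5.0,       # hypothetical acceptance threshold
    }
}

# The client can act on the verdict to guide the user.
if not quality_response["image_quality"]["is_sufficient"]:
    print("Please retake the photo: the image quality is insufficient for analysis.")
```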
Description
As outlined in REQ_008, Convolutional Neural Networks (CNNs) have proven highly effective at their designated tasks. Nevertheless, these models exhibit specific constraints, notably when encountering images that fall outside the designated task scope. Equally significant is the impact of image quality on their performance. It is essential to screen out low-quality images, such as those marred by excessive blurriness, to safeguard users from receiving subpar results.
Evaluating image quality is a complex and subjective challenge. The primary hurdle stems from image annotation: relying solely on a single annotator's judgment may introduce bias into the algorithm's evaluation. To address this concern, we are engaging a diverse panel of 40 individuals who will evaluate various facets of each image and assign a quality rating on a scale from 0 to 10. Their assessments will adhere to a standardized guide to ensure uniformity and consistency, and all annotators receive training beforehand so that they understand the task for which the data is being labeled.
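As a minimal sketch of how the panel's ratings could be aggregated into a label, assuming integer ratings on the 0 to 10 scale, the full normalized histogram of the 40 ratings can be kept instead of collapsing them to a single mean (the function name and the synthetic ratings below are illustrative):

```python
import numpy as np

def ratings_to_distribution(ratings, n_bins=11):
    """Convert per-image annotator ratings (integers 0-10) into a
    normalized histogram over the 11 possible scores."""
    counts = np.bincount(np.asarray(ratings, dtype=int), minlength=n_bins)
    return counts / counts.sum()

# Example: 40 hypothetical ratings for one image.
rng = np.random.default_rng(0)
ratings = rng.integers(4, 9, size=40)
label = ratings_to_distribution(ratings)
print(label, label.sum())  # an 11-bin soft label that sums to 1.0
```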
Following this, our algorithm training process will incorporate a novel approach: rather than conventional methods such as simple score averaging or traditional statistical aggregation, we will leverage the entire distribution of quality ratings to inform the algorithm's learning process. This enhances the robustness and effectiveness of our model.
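The document does not name the loss function, but one established way to learn from a full score distribution, popularized by NIMA-style quality models, is to have the network predict a probability distribution over the 11 score bins and minimize a distribution-matching loss such as the squared earth mover's distance. A minimal PyTorch sketch under that assumption (class and function names are illustrative):

```python
import torch
import torch.nn as nn

class QualityHead(nn.Module):
    """Maps CNN feature vectors to a probability distribution over 11 score bins."""
    def __init__(self, in_features=512, n_bins=11):
        super().__init__()
        self.fc = nn.Linear(in_features, n_bins)

    def forward(self, features):
        return torch.softmax(self.fc(features), dim=-1)

def emd_loss(pred, target):
    """Squared earth mover's distance between two discrete distributions
    over ordered bins, computed via their cumulative sums."""
    cdf_diff = torch.cumsum(pred, dim=-1) - torch.cumsum(target, dim=-1)
    return torch.mean(torch.sum(cdf_diff ** 2, dim=-1))

# Toy usage with random features and stand-in soft labels.
head = QualityHead()
features = torch.randn(8, 512)                       # batch of CNN feature vectors
target = torch.softmax(torch.randn(8, 11), dim=-1)   # stand-in 11-bin soft labels
loss = emd_loss(head(features), target)
loss.backward()
```

Because the bins are ordered, a distribution-matching loss like EMD penalizes predictions that put mass far from the annotators' scores more heavily than a bin-wise loss would, which is the motivation for using the whole distribution rather than its mean.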
Success metrics
For this task, we used the Linear Correlation Coefficient (LCC, or Pearson's correlation) to measure the agreement between human observers and the trained model.
The correlation between two variables (in this case, the ratings of the human observers and the predictions of a deep learning model) can be interpreted using the following ranges:
- A coefficient between 0.9 and 1.0 indicates variables that can be considered very highly correlated.
- A coefficient between 0.7 and 0.9 indicates variables that can be considered highly correlated.
- A coefficient between 0.5 and 0.7 indicates variables that can be considered moderately correlated.
- A coefficient between 0.3 and 0.5 indicates variables with a low correlation.
- A coefficient lower than 0.3 indicates little if any (linear) correlation.
Given the current size limitations of the dermatology-only image dataset, we set the success threshold at 0.70, aiming for a deep image quality assessment (DIQA) model that is highly correlated with human observers. Targeting an even higher value would be risky, as we would likely be overfitting the model to reach it.
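As an illustrative check of this criterion, assuming per-image human ratings and model scores are available as arrays, the LCC can be computed with scipy.stats.pearsonr and compared against the 0.70 threshold (the function name and synthetic data are illustrative):

```python
import numpy as np
from scipy.stats import pearsonr

def meets_success_criterion(human_scores, model_scores, threshold=0.70):
    """Return the linear correlation coefficient (LCC) and whether it
    clears the success threshold."""
    lcc, _p_value = pearsonr(human_scores, model_scores)
    return lcc, lcc >= threshold

# Toy example with correlated synthetic scores.
rng = np.random.default_rng(42)
human = rng.uniform(0, 10, size=100)
model = human + rng.normal(0, 1.5, size=100)  # noisy model predictions
lcc, ok = meets_success_criterion(human, model)
print(f"LCC = {lcc:.2f}, success: {ok}")
```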
| Goal | Metric |
|---|---|
| Users know if the image has enough quality | Users receive data on image quality |
| Bad-quality images are correctly identified | Linear correlation > 0.7 |
Previous related requirements
- REQ_005
- REQ_007
Signature meaning
The signatures for the approval process of this document can be found in the verified commits at the repository for the QMS. As a reference, the team members who are expected to participate in this document, and their roles in the approval process as defined in Annex I Responsibility Matrix of the GP-001, are:
- Tester: JD-017, JD-009, JD-005, JD-004
- Approver: JD-003