R-TF-015-006 Clinical investigation report LEGIT_MC_EVCDAO_2019
Research Title
Clinical validation study of a Computer-aided diagnosis (CADx) system with artificial intelligence algorithms for early non-invasive detection of in vivo cutaneous melanoma.
Description
Clinical validation study of a smartphone-based CADx system with artificial intelligence algorithms for early non-invasive detection of in vivo cutaneous melanoma on patients with skin lesions with suspected malignancy from two hospitals (Hospital Universitario Cruces and Hospital Universitario Basurto) since 2020. The study had an initial cohort of 40 subjects, which was later extended to 105.
Product identification
Information | |
---|---|
Device name | Legit.Health Plus (hereinafter, the device) |
Model and type | NA |
Version | 1.0.0.0 |
Basic UDI-DI | 8437025550LegitCADx6X |
Certificate number (if available) | MDR 792790 |
EMDN code(s) | Z12040192 (General medicine diagnosis and monitoring instruments - Medical device software) |
GMDN code | 65975 |
Class | Class IIb |
Classification rule | Rule 11 |
Novel product (True/False) | FALSE |
Novel related clinical procedure (True/False) | FALSE |
SRN | ES-MF-000025345 |
Promoter Identification and Contact
Manufacturer data | |
---|---|
Legal manufacturer name | AI Labs Group S.L. |
Address | Street Gran Vía 1, BAT Tower, 48001, Bilbao, Bizkaia (Spain) |
SRN | ES-MF-000025345 |
Person responsible for regulatory compliance | Alfonso Medela, María Diez, Giulia Foglia |
E-mail | office@legit.health |
Phone | +34 638127476 |
Trademark | Legit.Health |
Identification of the Clinical Investigation Plan (CIP)
- Title: Clinical validation study of a CADx system with artificial intelligence algorithms for early non-invasive detection of in vivo cutaneous melanoma.
- Protocol code: LEGIT_MC_EVCDAO_2019
- Study design: Cross-sectional analytical observational study of clinical case series
- Product under investigation: Legit.Health Plus
- Version and date: Version 3.0, dated October 28th 2021
Public Access Database
The database used in this study is not publicly accessible due to privacy and confidentiality considerations.
Research Team
Principal investigators
- Dr. Jesus Gardeazabal (Osakidetza)
- Dr. Rosa Mª Izu (Osakidetza)
Collaborators
- Dr. Juan Antonio Ratón Nieto (Servicio de Dermatología, Hospital Universitario Cruces)
- Dr. Ana Sánchez Díez (Servicio Dermatología, Hospital Universitario Basurto)
- Alfonso Medela (AI Labs Group S.L.)
- Andy Aguilar (AI Labs Group S.L.)
- Taig Mac Carthy (AI Labs Group S.L.)
Centers
- Hospital Universitario Cruces
- Hospital Universitario Basurto
Compliance Statement
The clinical investigation was performed according to the Clinical Investigation Plan (CIP) and other applicable guidance and regulations. This includes compliance with:
- Harmonized standard UNE-EN ISO 14155:2021
- Regulation (EU) 2017/745 on medical devices (MDR)
- Harmonized standard UNE-EN ISO 13485:2016
- Regulation (EU) 2016/679 (GDPR)
- Spanish Organic Law 3/2018 on the Protection of Personal Data and guarantee of digital rights
All data processing within the device is carried out in accordance with the highest standards of data protection and privacy. Patient information is managed in an encrypted manner to ensure confidentiality and security.
The research team assumes the role of Data Controller, responsible for the collection and management of study data. Legit.Health acts as the Data Processor and is not involved in the processing of patient data.
The storage and transfer of data comply with European data protection regulations. At the conclusion of the study, all information stored in the device will be permanently and securely deleted.
The device employs robust technical and organizational security measures to safeguard personal data against unauthorized access, alteration, loss, or processing.
Report Date
May 31, 2024.
Report Authors
Ignacio Hernández Montilla
Table of contents
- Research Title
- Description
- Product identification
- Promoter Identification and Contact
- Identification of the Clinical Investigation Plan (CIP)
- Public Access Database
- Research Team
- Compliance Statement
- Report Date
- Report Authors
- Table of contents
- Abbreviations and Definitions
- Summary
- Introduction
- Materials and methods
- Results
- Discussion and Overall Conclusions
- Ethical Aspects of Clinical Research
- Investigators and Administrative Structure of Clinical Research
Abbreviations and Definitions
- CAD: Computer-Aided Diagnosis
- CIP: Clinical Investigation Plan
- CUS: Clinical Utility Questionnaire
- SUS: System Usability Scale
- GCP: Standards of Good Clinical Practice
- ICH: International Conference of Harmonization
- PI: Principal Investigator
- DLQI: Dermatology Quality of Life Index
- AUC: Area Under the ROC Curve
Summary
Research title
Clinical validation study of a CADx system with artificial intelligence algorithms for early non-invasive detection of in vivo cutaneous melanoma.
Introduction
Cutaneous melanoma (CM), a type of skin cancer, has seen a significant rise in incidence and mortality. It is particularly aggressive and can metastasize rapidly, at which point it becomes resistant to chemotherapy and radiotherapy. However, when detected early, it is highly treatable with simple surgical excision. Differentiating between benign and malignant pigmented lesions, especially during visual examination, is challenging.
Due to low public awareness and limited access to dermatologists, melanoma often gets diagnosed at a later stage. To address this, there's growing interest in computer-aided diagnostics (CAD) using artificial intelligence (AI) for early melanoma detection. AI technologies have shown competence comparable to dermatologists in classifying lesions from photographs. Machine vision and AI present a significant opportunity to improve diagnosis.
Preventive activities and early diagnosis campaigns have improved patient survival, pointing to the fact that AI-based devices to assess skin lesion malignancy and distinguish between micro melanomas and other skin lesions like nevus and lentigines may further increase patient survival. This study aims to clinically validate the detection of cutaneous melanoma using computer vision and machine learning applications.
Objectives
Hypothesis
A CADx system powered by computer vision allows early and non-invasive diagnosis of cutaneous melanoma in vivo.
Primary objective
To validate that the artificial intelligence algorithm developed by AI Labs Group SL for the identification of cutaneous melanoma in images of lesions taken with a dermoscopic camera achieves the following values:
- AUC greater than 0.8
- Sensitivity of 80% or higher
- Specificity of 70% or higher
Secondary objective
- To compare the performance of the artificial intelligence algorithm developed by the manufacturer with the performance of healthcare professionals of different specializations:
- Dermatologists
- Primary care physicians
- Validate the usefulness and feasibility of the artificial intelligence algorithm developed by the manufacturer in adverse environments with severe technical limitations, such as lack of instrumentation or internet connection.
The study does not compare the performance of the device against that of primary care physicians; it focuses only on dermatologists. However, it is widely known that dermatologists have a significantly higher diagnostic success rate in the detection of melanoma.
The gold standard in this study was the diagnosis of an expert dermatologist, resorting to a biopsy when needed. This means that the dermatologists' responses were available for every image, but not every image had a biopsy confirmation. Having both types of clinical data enabled us to compare the device performance to the dermatologists' clinical assessments as well as the biopsy results (but in a smaller subset).
Population
Patients with skin lesions suspected of malignancy seen at the Dermatology Departments of the Hospital Universitario Cruces and Hospital Universitario Basurto.
Design and Methods
Design
This study is an analytical observational case series aimed at assessing the performance of a diagnostic test. It involves observing and analyzing data from a specific group of cases without any intervention by the researchers. Since all measurements are taken at one point in time, it is classified as a cross-sectional study.
Number of Subjects
The initial number of subjects for the study was 40. However, since all included lesions were biopsied due to specialist suspicion of malignancy, this is not representative of typical clinical practice in primary care or dermatology. Therefore, to create a less biased dataset that includes both malignant and benign lesions, we decided to include nevi and other types of skin lesions. As a result, the proposed number of subjects was increased to approximately 200 people, with at least 40 having cutaneous melanoma (20% of the sample).
By the end of the study, 105 patients were recruited, of whom 36 presented cutaneous melanoma. Although the sample was smaller than the goal (200 subjects), the ratio of cutaneous melanoma subjects exceeded the one originally planned (34.29% instead of 20%), which provides a better understanding of model performance on melanoma detection. By extending the study, we achieved the goal of including more skin lesions (nevi, haemangioma, basal cell carcinoma and dermatofibroma, among others) to account for typical cases in everyday clinical practice.
Initiation Date
The date of inclusion of the first subject was September 17th, 2020.
Completion Date
The last subject of the initial sample of 40 participants was included on March 24, 2021. The study was closed on November 13, 2023, after recruiting 105 participants.
Duration
This study had a recruitment period of 7 months for the inclusion of the first 40 patients. The recruitment period was extended for the inclusion of up to 200 patients, to include at least 40 cases of melanoma (20% of the sample).
The total duration of the study was 38 months, including the time required after the recruitment of the last subject for closing and editing the database, data analysis and preparation of the final study report. The study was finally closed with 105 subjects, close to the expected number of melanoma cases (36) and surpassing the desired ratio (>20%).
Methods
All the skin lesions were photographed following these technical indications:
- Uncompressed image formats, such as PNG, HEIC or TIFF.
- Taken with the DermLite Foto X dermatoscope from 3Gen Inc.
- Taken from a smartphone with the following characteristics:
- A camera with a resolution of at least 13 megapixels.
- Taken with one of the following models:
- Google Pixel 3 and Google Pixel 3 XL.
- Samsung Galaxy Note 10, Samsung Galaxy S10, Samsung Galaxy S10E
- iPhone X and below
- Disabling all image post-processing, such as HDR, portrait mode, colour filters or digital zoom.
Every month, the research team collected the images and verified their correctness. If any image was not of sufficient quality, the investigator repeated the photograph. The research team also collected diagnostic data from expert dermatologists.
Due to the expected variability in image acquisition settings, all images required a preprocessing step to enhance consistency. This involved cropping the areas of interest that contained the skin lesions essential for analysis. Cropping was crucial to minimize noise, such as background elements and any non-skin structures. Once cropped, the images were processed by the device. This study's analysis included this crucial preprocessing step. For more details, refer to the section Subject and Investigational Product Management.
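The cropping step described above can be sketched as follows. This is a minimal illustration only: the report does not describe how the crop boxes were obtained or stored, so the `crop_region` helper, the `(left, top, right, bottom)` box convention and the list-of-rows image representation are all assumptions made for this example.

```python
from typing import List, Tuple

# Hypothetical representation: a grayscale image as a list of pixel rows.
Image = List[List[int]]


def crop_region(image: Image, box: Tuple[int, int, int, int]) -> Image:
    """Return the pixels inside box = (left, top, right, bottom).

    The right and bottom edges are exclusive. The box is clamped to the
    image so that an annotation slightly outside the frame does not fail.
    """
    left, top, right, bottom = box
    height, width = len(image), len(image[0])
    left, right = max(0, left), min(width, right)
    top, bottom = max(0, top), min(height, bottom)
    if left >= right or top >= bottom:
        raise ValueError("empty crop box")
    return [row[left:right] for row in image[top:bottom]]


# Toy 4x6 image whose pixel value encodes its (row, column) position.
image = [[10 * r + c for c in range(6)] for r in range(4)]
lesion = crop_region(image, (2, 1, 5, 3))
print(lesion)
```

A lesion photographed from far away yields a small box, so the cropped region has few pixels even when the full photograph is high resolution.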
To ensure stable predictions from the device, all the images underwent test-time augmentation (TTA), as detailed in Legit.Health Plus description and specifications.
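As a generic illustration of test-time augmentation, the sketch below averages the class probabilities that a model returns for several transformed views of the same image. The specific augmentations and aggregation used by the device are documented in its own specifications and are not reproduced here; the flips, the `predict_with_tta` helper and the `toy_model` stand-in are assumptions for this example.

```python
from typing import Callable, List, Sequence

Image = List[List[float]]
Probabilities = List[float]


def identity(image: Image) -> Image:
    return image


def horizontal_flip(image: Image) -> Image:
    return [row[::-1] for row in image]


def vertical_flip(image: Image) -> Image:
    return image[::-1]


def predict_with_tta(
    model: Callable[[Image], Probabilities],
    image: Image,
    transforms: Sequence[Callable[[Image], Image]],
) -> Probabilities:
    """Average the per-class probabilities over all augmented views."""
    outputs = [model(transform(image)) for transform in transforms]
    n_classes = len(outputs[0])
    return [sum(out[k] for out in outputs) / len(outputs) for k in range(n_classes)]


def toy_model(image: Image) -> Probabilities:
    # Hypothetical stand-in for a classifier: its output depends on the
    # top-left pixel, so it is deliberately sensitive to flips.
    p = image[0][0]
    return [p, 1.0 - p]


image = [[0.2, 0.4], [0.6, 0.8]]
probs = predict_with_tta(toy_model, image, [identity, horizontal_flip, vertical_flip])
print(probs)
```

Averaging over views smooths out predictions that would otherwise change with the orientation of the photograph.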
Finally, the output predictions were compared to the gold standard to obtain performance metrics: AUC, precision, sensitivity, specificity, and accuracy. Except for AUC, all other metrics were calculated in their top-1, top-3, and top-5 variants (i.e., prediction is successful when the correct class is within the top K predictions). The gold standard was the pathological anatomy results for the biopsied cases and the clinical diagnosis of expert dermatologists for the non-biopsied lesions.
Results
The performance of the device on melanoma detection is excellent when considering the AUC score, indicating a high level of accuracy in distinguishing between melanoma and non-melanoma cases. However, when it comes to predicting malignancy in a broader sense (not just limited to melanoma), the performance is generally reliable, although not as precise or accurate as it is for melanoma detection.
Results of the first sample
The initial sample of 40 subjects was not sufficient to draw sound conclusions. A high percentage of these cases corresponded to very difficult diagnoses for which the dermatologists needed additional tests (i.e. biopsy) to support the naked-eye examination.
Moreover, the 395 images of the skin lesions of these subjects presented a high heterogeneity in terms of image quality (heavy blur, over- and underexposure...), which may also limit the power of the analyses. For that reason, the dataset was expanded to improve image quality and include more common cases.
Results of the second sample
At the end of the study, the number of subjects increased to 105, comprising 565 images. After discarding 2 cases with non-conclusive diagnoses (10 images), the same analysis was planned for the definitive sample of 103 subjects (555 images), obtaining an overall similar performance while improving the malignancy prediction results.
A closer inspection of the 555 images revealed that, despite the extension of the sample, a high percentage of them still presented suboptimal image quality, which could heavily impact the processing capabilities of the device.
Conclusions
The device demonstrates great malignancy prediction and compelling image recognition capacity for melanoma and other pigmented skin lesions such as carcinoma, keratoses or nevi, with results similar to internal validation tests.
Regarding the detection of melanoma, the data collected in this study limits the power of the analysis due to class imbalance, difficult diagnoses, and inconsistent image quality, but the results obtained are compelling even under such challenging conditions.
Introduction
In 2019, the Spanish Society of Medical Oncology (SEOM) estimated that the number of new cancer cases diagnosed in Spain would reach 277,234, 12% more than in 2015 when 247,771 were diagnosed. Among the different types of cancer, cutaneous melanoma (CM) is the type of skin cancer that causes the most deaths, with a significant increase in incidence and mortality in recent decades. It is characterized by a rapidly increasing incidence rate among Caucasian populations and tens of thousands of people worldwide die each year from this cancer. Melanoma is one of the most aggressive malignancies and rapidly metastasizes to distant organs.
When it progresses to the metastatic stage, it establishes powerful mechanisms to resist chemotherapy and radiotherapy, which hinders the efficacy of current medical therapies. However, when detected early, melanoma is treatable in almost all cases with simple surgical excision. On the other hand, there are also benign types of pigmented skin lesions, such as moles, which are natural parts of the skin and share similar visual characteristics. This makes the differentiation between melanoma and non-melanoma a challenging problem for specialists and, even more so, for non-specialists. This problem is particularly significant during a naked-eye examination because early-stage melanomas often resemble benign lesions. A person with a suspicious pigmented skin lesion will go through several steps before a definitive diagnosis of melanoma: self-assessment, evaluation by a primary care physician, evaluation by a specialist, excision, and evaluation by histopathology. Due to low public awareness of the importance of skin cancer prevention and insufficient access to dermatologists in many regions worldwide, melanoma is often diagnosed only after a tumour grows to a medium size.
In light of the above data, prevention and early diagnosis of melanoma have become an extremely important issue. In recent years, there has been increasing demand to develop computer-aided diagnostics and systems that facilitate the early detection of melanoma that could be applied by non-experts and the general public.
Advances in image recognition and artificial intelligence have set in motion innovations in the diagnosis of skin lesions. With appropriate development and proper evaluation, technology could improve diagnostic accuracy. It has been demonstrated that through artificial intelligence (AI) algorithms, it is possible to classify photographs of lesions, including melanoma, with a level of competence comparable to that of dermatologists. The advent of machine vision has revolutionized our understanding of this pathology and presents an enormous opportunity for diagnosis.
Preventive activities and early diagnosis campaigns stand out among the factors that have improved patient survival. Therefore, based on our previous research, we propose a system developed through artificial intelligence to evaluate the malignancy of a skin lesion, as well as to differentiate between micro melanomas and other moles and skin lesions, such as nevi and lentigines.
This study aims to clinically validate the diagnosis of cutaneous melanoma through machine vision and machine learning applications.
Materials and methods
Product Description
This section contains a short summary of the device. A complete description of the intended purpose, including device description, can be found in the record Legit.Health Plus description and specifications.
Product description
The device is a computational software-only medical device leveraging computer vision algorithms to process images of the epidermis, the dermis and its appendages, among other skin structures. Its principal function is to provide a wide range of clinical data from the analyzed images to assist healthcare practitioners in their clinical evaluations and allow healthcare provider organisations to gather data and improve their workflows.
The generated data is intended to aid healthcare practitioners and organizations in their clinical decision-making process, thus enhancing the efficiency and accuracy of care delivery.
The device should never be used to confirm a clinical diagnosis. On the contrary, its result is one element of the overall clinical assessment. Indeed, the device is designed to be used when a healthcare practitioner chooses to obtain additional information to consider a decision.
Intended purpose
The device is a computational software-only medical device intended to support health care providers in the assessment of skin structures, enhancing efficiency and accuracy of care delivery, by providing:
- quantification of intensity, count, extent of visible clinical signs
- interpretative distribution representation of possible International Classification of Diseases (ICD) classes.
Intended previous uses
No specific intended use was designated in prior stages of development.
Product changes during clinical research
The device maintained a consistent performance and features throughout the entire clinical research process. No alterations or modifications were made during this period.
Clinical Investigation Plan
The study aims to validate a CADx system utilizing machine vision for the early and non-invasive in-vivo diagnosis of cutaneous melanoma.
The primary objective is to confirm that the processor developed for melanoma identification in dermoscopic images achieves the expected values of AUC, sensitivity, and specificity.
Secondary objectives include comparing the device's performance with dermatologists, with consideration of primary care physicians in later phases, and assessing the utility of the device in adverse environments.
The study follows ethical guidelines, including the Declaration of Helsinki and data protection laws, and requires informed consent. Data quality assurance is the responsibility of the Principal Investigator, who reviews and approves the protocol and final study report. The subject population consists of patients with suspected malignancy skin lesions, and no specific treatment is administered as part of the research protocol. Statistical analysis employs AUC, sensitivity, and specificity to assess device performance.
Objectives
The objective of this study is to validate the capability of our CADx system, which leverages machine vision, for the early and non-invasive in-vivo diagnosis of cutaneous melanoma.
The primary objective is to demonstrate that the AI algorithm developed for detecting cutaneous melanoma in dermoscopic images achieves:
- AUC > 0.8
- Sensitivity ≥ 80%
- Specificity ≥ 70%
The secondary objectives are:
- Comparing the performance of the device with that of dermatologists, with an evaluation of primary care physicians' assessments planned for subsequent phases.
- Evaluating the practicality and reliability of the device in challenging environments with technical constraints.
Design (type of research, assessment criteria, methods, active group, and control group)
This is an analytical observational case series study assessing the performance of a diagnostic test. Measurements are taken at a single point in time, so it is a cross-sectional study. There is a single group of participants, consisting of patients with skin lesions with suspected malignancy seen at the Dermatology Departments of the Hospital Universitario Cruces and Hospital Universitario Basurto.
Ethical considerations
The conduct of this study adhered to international Good Clinical Practice (GCP) guidelines, the Declaration of Helsinki in its latest amendment, and applicable international and national regulations. Approval from the relevant Ethics Committee was obtained prior to the initiation of the study. Any modifications to the protocol were reviewed and approved by the Principal Investigator (PI) and subsequently evaluated by the Ethics Committee before subjects were enrolled under a modified protocol.
This study was conducted in compliance with European Regulation 2016/679, of 27 April, concerning the protection of natural persons with regard to the processing of personal data and the free movement of such data (General Data Protection Regulation, GDPR), and Organic Law 3/2018, of 5 December, on the Protection of Personal Data and the guarantee of digital rights. In accordance with these regulations, no data enabling the personal identification of participants was collected, and all information was managed securely in an encrypted format.
Patients were informed both orally and in writing about all relevant aspects of the study, with the information being tailored to their level of understanding. They were provided with a copy of the informed consent form and the accompanying patient information sheet. Adequate time was given to patients to ask questions and fully comprehend the details of the study before providing their consent.
The Principal Investigator was responsible for the preparation of the informed consent form, ensuring it included all elements required by the International Conference on Harmonisation (ICH), adhered to current regulatory guidelines, and complied with the ethical principles of GCP and the Declaration of Helsinki.
The original signed informed consent forms were securely stored in a restricted access area under the custody of the PI. These documents remained at the research site at all times. Patients were provided with a copy of their signed consent form for their records.
Data Quality Assurance
The Principal Investigator is responsible for reviewing and approving the protocol, signing the Principal Investigator commitment, guaranteeing that the persons involved in the centre will respect the confidentiality of patient information and protect personal data, and reviewing and approving the final study report together with the sponsor. All the clinical members of the research team assess the eligibility of the patients in the study, inform them and request written informed consent, collect the source data of the study in the clinical record and transfer them to the Data Collection Notebook (DCN) or Case Report Forms (CRF).
Subject Population (inclusion/exclusion criteria and sample size)
Patients with skin lesions suspected of malignancy were seen at the Dermatology Department of the Hospital Universitario Cruces and Hospital Universitario Basurto.
Inclusion criteria
- Patients with skin lesions with suspected malignancy.
- Age over 18 years old.
- Patients who consent to participate in the study by signing the Informed Consent form.
Exclusion criteria
- Patients under 18 years of age.
Treatment
Patients participating in this study did not receive any specific treatment as part of the research protocol.
Concomitant medication or treatment
Patients continued their regular prescribed medications and treatments as directed by their primary healthcare providers. No additional medications or treatments were administered as part of this study.
Follow-Up duration
This study did not require a follow-up of the subjects. Every patient only got their skin lesions photographed at the time of visit.
Statistical analysis
To estimate the device's performance, we used different metrics depending on the task:
Task | Metrics |
---|---|
Melanoma detection | Top-K precision, Top-K sensitivity, Top-K specificity, AUC |
Malignancy prediction | AUC |
Skin lesion recognition | Top-K accuracy |
For this study, we set the value of K to 1, 3, and 5, for they are the most common values to assess classification performance.
Hasan MK, Ahamad MA, Yap CH, Yang G. "A survey, review, and future trends of skin lesion segmentation and classification." Computers in Biology and Medicine. 2023 Mar 1;155:106624.
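For reference, the top-K metrics listed above can be written with the usual confusion-matrix definitions. The notation below is introduced here for illustration and is not taken from the protocol: for melanoma detection, an image counts as a predicted positive when the melanoma class is among the K most probable classes, giving the counts $TP_K$, $FP_K$, $TN_K$ and $FN_K$; for skin lesion recognition, $y_i$ is the confirmed diagnosis of image $i$ out of $N$ images and $\hat{p}_i$ its predicted class probabilities.

```latex
\begin{aligned}
\text{precision}_K   &= \frac{TP_K}{TP_K + FP_K}, \\
\text{sensitivity}_K &= \frac{TP_K}{TP_K + FN_K}, \\
\text{specificity}_K &= \frac{TN_K}{TN_K + FP_K}, \\
\text{accuracy}_K    &= \frac{1}{N} \sum_{i=1}^{N} \mathbf{1}\left[\, y_i \in \text{top-}K(\hat{p}_i) \,\right].
\end{aligned}
```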
Results
Initiation and Completion Date
The first subject was included on September 17, 2020, and the last subject of the initial cohort of 40 subjects was included on March 24, 2021. The readjusted target number of subjects (200) was finally capped at 105, having reached the desired ratio of cutaneous melanoma cases.
Subject and Investigational Product Management
A total of 105 subjects were included in the study (79 from Hospital Universitario Basurto and 26 from Hospital Universitario Cruces).
Variable | Mean | Standard deviation |
---|---|---|
Images per subject | 5.38 | 4.71 |
The definitive image dataset still presents class imbalance:
Diagnosis | # images (initial) | # images (initial + extension) |
---|---|---|
Actinic keratosis | 5 | 7 |
Angiokeratoma | 0 | 4 |
Angioma | 0 | 13 |
Basal cell carcinoma | 65 | 80 |
Blue nevus | 25 | 29 |
Comedo | 0 | 2 |
Compound nevus | 2 | 6 |
Cutaneous melanoma | 267 | 288 |
Dermatofibroma | 0 | 19 |
Dysplastic nevus | 27 | 27 |
Haemangioma | 0 | 4 |
Junctional nevus | 0 | 2 |
Melanocytic nevus | 2 | 21 |
Nevus | 0 | 4 |
Non-conclusive | 0 | 10 |
Not available | 0 | 1 |
Seborrheic keratosis | 2 | 46 |
Spitz nevus | 0 | 2 |
The distribution of cases reveals that not only did we manage to recruit the desired ratio of melanoma cases (20%), but also increase it (34.29%):
Diagnosis | # Subjects |
---|---|
Actinic keratosis | 2 |
Angiokeratoma | 2 |
Angioma | 5 |
Basal cell carcinoma | 13 |
Blue nevus | 4 |
Comedo | 1 |
Compound nevus | 3 |
Cutaneous melanoma | 36 |
Dermatofibroma | 7 |
Dysplastic nevus | 2 |
Haemangioma | 1 |
Junctional nevus | 1 |
Melanocytic nevus | 10 |
Nevus | 2 |
Non-conclusive | 2 |
Not available | 1 |
Seborrheic keratosis | 22 |
Spitz nevus | 1 |
Note that the per-diagnosis subject counts add up to more than 105. That is because some subjects presented more than one type of lesion.
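The figures quoted in this section can be cross-checked with a few lines of arithmetic. The sketch below transcribes the counts from the two tables above; the variable names are chosen for this example only.

```python
# Subjects per diagnosis, transcribed from the table above.
subjects = {
    "Actinic keratosis": 2, "Angiokeratoma": 2, "Angioma": 5,
    "Basal cell carcinoma": 13, "Blue nevus": 4, "Comedo": 1,
    "Compound nevus": 3, "Cutaneous melanoma": 36, "Dermatofibroma": 7,
    "Dysplastic nevus": 2, "Haemangioma": 1, "Junctional nevus": 1,
    "Melanocytic nevus": 10, "Nevus": 2, "Non-conclusive": 2,
    "Not available": 1, "Seborrheic keratosis": 22, "Spitz nevus": 1,
}

# Images per diagnosis as (initial, initial + extension).
images = {
    "Actinic keratosis": (5, 7), "Angiokeratoma": (0, 4), "Angioma": (0, 13),
    "Basal cell carcinoma": (65, 80), "Blue nevus": (25, 29), "Comedo": (0, 2),
    "Compound nevus": (2, 6), "Cutaneous melanoma": (267, 288),
    "Dermatofibroma": (0, 19), "Dysplastic nevus": (27, 27),
    "Haemangioma": (0, 4), "Junctional nevus": (0, 2),
    "Melanocytic nevus": (2, 21), "Nevus": (0, 4), "Non-conclusive": (0, 10),
    "Not available": (0, 1), "Seborrheic keratosis": (2, 46), "Spitz nevus": (0, 2),
}

TOTAL_SUBJECTS = 105

# Share of recruited subjects with cutaneous melanoma, in percent.
melanoma_ratio = round(100 * subjects["Cutaneous melanoma"] / TOTAL_SUBJECTS, 2)

# Row sum exceeds 105 because a subject can appear under several diagnoses.
subject_rows_sum = sum(subjects.values())

images_initial = sum(initial for initial, _ in images.values())
images_total = sum(total for _, total in images.values())
images_conclusive = images_total - images["Non-conclusive"][1]

print(melanoma_ratio, subject_rows_sum, images_initial, images_total, images_conclusive)
```

The totals agree with the 395 initial images, the 565 images of the full sample and the 555 images that remain after the non-conclusive cases are discarded.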
In addition to class imbalance, the first sample consisted of difficult cases (that required biopsy), whereas the second one was more diverse and provided a more representative sample of what healthcare professionals would see in their everyday clinical practice:
Sample | # patients with biopsy | # images with biopsy |
---|---|---|
Initial | 40 | 395 |
Extended | 18 | 35 |
Total | 58 | 430 |
To assess visual quality, we used the device's integrated Dermatology Image Quality Assessment (DIQA) processor. By only analysing the raw images (i.e. without cropping the skin lesion), we see that overall quality is acceptable, with some outliers of very low quality.
However, after cropping the images to extract the regions of interest (i.e. the part of the image where the skin lesion is shown), we observed that there was a significant number of images with a DIQA score below 5.
This happens because in many cases the lesion was photographed at a long distance, which leads to low-resolution crops of the region of interest.
In normal daily clinical practice, the device would automatically detect images with visual quality below a certain DIQA score and reject them. To mimic that behaviour, all images with a DIQA score below 5 were excluded from the analysis, leaving us with 469 images.
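The exclusion rule can be sketched as a simple threshold filter. Only the cut-off of 5 comes from this report; the `filter_by_quality` helper, the image identifiers and the example scores are hypothetical, and the DIQA processor itself is not reproduced here.

```python
from typing import Dict, List

DIQA_THRESHOLD = 5.0  # from the report: images scoring below 5 were excluded


def filter_by_quality(scores: Dict[str, float], threshold: float = DIQA_THRESHOLD) -> List[str]:
    """Return the identifiers of images whose quality score meets the threshold."""
    return [image_id for image_id, score in scores.items() if score >= threshold]


# Hypothetical scores for four cropped images.
scores = {"img_001": 7.8, "img_002": 4.9, "img_003": 5.0, "img_004": 2.1}
kept = filter_by_quality(scores)
print(kept)
```

An image scoring exactly 5 is kept, since the report excludes only scores below 5.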
Subject Demographics
The research did not specifically focus on sex or age factors. The ancestry composition of the population is equivalent to the general population of the region in which the study was carried out.
Source | Female | Male | Total |
---|---|---|---|
Hospital Universitario Basurto | 38 | 41 | 79 |
Hospital Universitario Cruces | 14 | 12 | 26 |
Total | 52 | 53 | 105 |
Source | Age (mean ± standard deviation) |
---|---|
Hospital Universitario Basurto | 61.66 ± 15.43 |
Hospital Universitario Cruces | 63.65 ± 15.23 |
CIP (Clinical Investigation Plan) Compliance
The study adhered to all aspects outlined in the Clinical Investigation Plan (CIP). This ensured that the research was conducted following established protocols, procedures, and ethical standards. Any deviations from the CIP were duly documented and appropriately addressed. Compliance with the CIP was rigorously monitored throughout the study to uphold the integrity and validity of the research findings.
Analysis
Primary analysis
Melanoma detection
To study model performance in terms of melanoma detection, the multi-class output of the device was converted to a top-K binary output. In other words, the output of the device was successful, or positive, when the melanoma class was present in the top-K predictions, and negative otherwise.
Metric name | Value (initial) | Value (extension) | Task type |
---|---|---|---|
Top-1 precision | 0.8241 | 0.8097 | Melanoma vs non-melanoma |
Top-1 sensitivity | 0.7773 | 0.7379 | Melanoma vs non-melanoma |
Top-1 specificity | 0.6238 | 0.8054 | Melanoma vs non-melanoma |
Top-3 precision | 0.7356 | 0.6418 | Melanoma vs non-melanoma |
Top-3 sensitivity | 0.9476 | 0.9032 | Melanoma vs non-melanoma |
Top-3 specificity | 0.2277 | 0.4344 | Melanoma vs non-melanoma |
Top-5 precision | 0.7111 | 0.5914 | Melanoma vs non-melanoma |
Top-5 sensitivity | 0.9782 | 0.9395 | Melanoma vs non-melanoma |
Top-5 specificity | 0.0990 | 0.2715 | Melanoma vs non-melanoma |
As we increase K, top-K sensitivity increases and top-K specificity decreases. This is expected, since looking at the top-K predicted classes gives the "Malignant melanoma" class more chances to appear, which increases sensitivity (more correct detections) but reduces specificity (more false positives). Our top-1 results already show a good balance between sensitivity and specificity.
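The conversion from a multi-class output to a top-K binary decision, and the trade-off just described, can be reproduced on toy data. This is a minimal sketch: the function names and the four example predictions are invented for illustration and are not study data; only the "Malignant melanoma" class name comes from the report.

```python
from typing import Dict, List, Sequence

MELANOMA = "Malignant melanoma"  # class name quoted in the report


def top_k_classes(probs: Dict[str, float], k: int) -> List[str]:
    """Return the k class names with the highest predicted probability."""
    return sorted(probs, key=probs.get, reverse=True)[:k]


def ratio(numerator: int, denominator: int) -> float:
    return numerator / denominator if denominator else float("nan")


def melanoma_top_k_metrics(
    predictions: Sequence[Dict[str, float]], truths: Sequence[str], k: int
) -> Dict[str, float]:
    """Binary metrics where 'positive' means melanoma is in the top-k classes."""
    tp = fp = tn = fn = 0
    for probs, truth in zip(predictions, truths):
        predicted = MELANOMA in top_k_classes(probs, k)
        actual = truth == MELANOMA
        if predicted and actual:
            tp += 1
        elif predicted:
            fp += 1
        elif actual:
            fn += 1
        else:
            tn += 1
    return {
        "precision": ratio(tp, tp + fp),
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
    }


# Invented predictions for four images (not study data).
predictions = [
    {MELANOMA: 0.6, "Nevus": 0.3, "Seborrheic keratosis": 0.1},
    {MELANOMA: 0.3, "Nevus": 0.5, "Seborrheic keratosis": 0.2},
    {MELANOMA: 0.2, "Nevus": 0.7, "Seborrheic keratosis": 0.1},
    {MELANOMA: 0.1, "Nevus": 0.2, "Seborrheic keratosis": 0.7},
]
truths = [MELANOMA, MELANOMA, "Nevus", "Seborrheic keratosis"]

top1 = melanoma_top_k_metrics(predictions, truths, k=1)
top2 = melanoma_top_k_metrics(predictions, truths, k=2)
print(top1)
print(top2)
```

On this toy set, moving from K=1 to K=2 raises sensitivity from 0.5 to 1.0 and lowers specificity from 1.0 to 0.5, the same direction of change seen in the table.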
Additionally, we computed the AUC for melanoma detection using each image's predicted probabilities for the "Malignant melanoma" class. After the extension of the sample, the score becomes excellent (AUC >= 0.80). Note that even the score obtained with the data from the initial phase (0.79) also gets very close to the objective (0.80). The 95% confidence interval of the melanoma AUC on the full dataset was (0.7832, 0.9137).
Metric name | Value (initial) | Value (initial + extension) | Task |
---|---|---|---|
Melanoma AUC | 0.7915 | 0.8482 | Melanoma vs non-melanoma |
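As an illustration of how an AUC and its confidence interval can be obtained from per-image probabilities, the sketch below computes the AUC as a pairwise ranking statistic and estimates a 95% interval with a percentile bootstrap. The report does not state which interval method was used, so the bootstrap is an assumption, as are the function names and the toy data.

```python
import random
from typing import List, Sequence, Tuple


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a positive outscores a negative (ties count 0.5)."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(
    labels: Sequence[int],
    scores: Sequence[float],
    n_resamples: int = 2000,
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[float, float]:
    """Percentile bootstrap interval for the AUC (assumed method, not the report's)."""
    rng = random.Random(seed)
    n = len(labels)
    aucs: List[float] = []
    while len(aucs) < n_resamples:
        idx = [rng.randrange(n) for _ in range(n)]
        sample_labels = [labels[i] for i in idx]
        if len(set(sample_labels)) < 2:
            continue  # resample lacks one of the two classes; draw again
        aucs.append(roc_auc(sample_labels, [scores[i] for i in idx]))
    aucs.sort()
    lower = aucs[int((alpha / 2) * n_resamples)]
    upper = aucs[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# Illustrative data only: 1 = melanoma, 0 = non-melanoma, with the
# probability assigned to the "Malignant melanoma" class for each image.
labels = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2, 0.7, 0.1]

auc = roc_auc(labels, scores)
ci_low, ci_high = bootstrap_auc_ci(labels, scores)
print(f"AUC = {auc:.4f}, 95% CI = ({ci_low:.4f}, {ci_high:.4f})")
```

With only eight images the interval is very wide; the point of the sketch is the procedure, not the numbers.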
Secondary analysis
Skin lesion recognition
In this analysis, the multi-class output of the device was used without any modification, and compared to the confirmed diagnoses. To compute the top-K accuracy, it was necessary to check if the correct diagnosis was within the top-K predictions of the device. Similarly to melanoma detection, increasing K leads to better accuracy metrics.
Metric name | Value (initial) | Value (extension) | Task type |
---|---|---|---|
Top-1 image-level accuracy | 0.6242 | 0.5501 | Skin lesion classification |
Top-3 image-level accuracy | 0.8303 | 0.7569 | Skin lesion classification |
Top-5 image-level accuracy | 0.9152 | 0.8422 | Skin lesion classification |
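The top-K accuracy check described above can be sketched as follows. The diagnoses and rankings are invented for illustration and `top_k_accuracy` is a hypothetical helper name, not the study's implementation.

```python
from typing import Sequence


def top_k_accuracy(
    true_labels: Sequence[str],
    ranked_predictions: Sequence[Sequence[str]],
    k: int,
) -> float:
    """Fraction of images whose confirmed diagnosis is among the top-k predictions."""
    if len(true_labels) != len(ranked_predictions):
        raise ValueError("one ranked prediction list is needed per image")
    hits = sum(
        truth in ranked[:k]
        for truth, ranked in zip(true_labels, ranked_predictions)
    )
    return hits / len(true_labels)


# Illustrative data only: confirmed diagnosis and ranked device output per image.
confirmed = [
    "Nevus",
    "Malignant melanoma",
    "Basal cell carcinoma",
    "Seborrheic keratosis",
]
ranked_output = [
    ["Nevus", "Malignant melanoma", "Seborrheic keratosis"],
    ["Nevus", "Malignant melanoma", "Basal cell carcinoma"],
    ["Seborrheic keratosis", "Nevus", "Basal cell carcinoma"],
    ["Nevus", "Basal cell carcinoma", "Malignant melanoma"],
]

accuracy_by_k = {k: top_k_accuracy(confirmed, ranked_output, k) for k in (1, 2, 3)}
print(accuracy_by_k)
```

Because the top-K list for a larger K always contains the list for a smaller K, the accuracy can only stay the same or increase as K grows.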
Malignancy prediction
Finally, the AUC was computed using the malignancy probability (which is part of the output of the device) and comparing it to the confirmed diagnoses. As in the primary analysis, this requires working in a binary scenario: this was achieved by assigning a positive (or "malignant") label to all images with a diagnosis of a malignant skin lesion (not exclusively melanoma). The 95% confidence interval of the malignancy AUC on the full dataset is (0.8571, 0.9357).
Metric name | Value (initial) | Value (initial + extension) | Task |
---|---|---|---|
Malignancy AUC | 0.8981 | 0.8983 | Malignancy estimation |
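The binary labelling used for the malignancy analysis can be sketched as below. The set of malignant classes is an illustrative assumption, not the taxonomy used by the device, and `malignancy_label` is a hypothetical helper name.

```python
from typing import List

# Assumption for illustration: a small set of diagnoses treated as malignant.
MALIGNANT_CLASSES = frozenset(
    {
        "Malignant melanoma",
        "Basal cell carcinoma",
        "Squamous cell carcinoma",
    }
)


def malignancy_label(confirmed_diagnosis: str) -> int:
    """1 for any malignant diagnosis (not only melanoma), 0 otherwise."""
    return int(confirmed_diagnosis in MALIGNANT_CLASSES)


# Illustrative data only: confirmed diagnoses for four images.
confirmed_diagnoses: List[str] = [
    "Malignant melanoma",
    "Nevus",
    "Basal cell carcinoma",
    "Seborrheic keratosis",
]
malignant_labels = [malignancy_label(d) for d in confirmed_diagnoses]
print(malignant_labels)
```

These labels, paired with the malignancy probability output for each image, are the inputs an AUC computation would need.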
Comparison to dermatologist performance
As a high percentage of images in the final sample had both a clinical diagnosis from a dermatologist and a histopathology result (i.e. a biopsy), we compared the performance of the model to that of the dermatologists in that subset of 363 images. As most of the dermatologists' responses presented just one diagnosis (i.e. not a differential diagnosis), we could not compute top-K metrics for them as we did for the device.
As dermatologists do not assess malignancy by predicting a probability (which would be out of the scope of the study), it was not possible to compute their corresponding melanoma and malignancy AUCs either.
Dermatologist metric | Value (initial) | Value (extension) | Task type |
---|---|---|---|
Accuracy | 0.6394 | 0.6281 | Skin lesion classification |
Precision | 0.8120 | 0.8120 | Melanoma vs non-melanoma |
Sensitivity | 0.7950 | 0.7950 | Melanoma vs non-melanoma |
Specificity | 0.8087 | 0.8087 | Melanoma vs non-melanoma |
Device metric | Value (initial) | Value (extension) | Task type |
---|---|---|---|
Top-1 accuracy | 0.6242 | 0.6006 | Skin lesion classification |
Top-1 precision | 0.8241 | 0.8206 | Melanoma vs non-melanoma |
Top-1 sensitivity | 0.7773 | 0.7657 | Melanoma vs non-melanoma |
Top-1 specificity | 0.6238 | 0.6774 | Melanoma vs non-melanoma |
Melanoma AUC | 0.7915 | 0.8087 | Melanoma vs non-melanoma |
The clinical diagnoses used in this study came from more than one dermatologist, since patients from Hospital Universitario Basurto and Hospital Universitario Cruces were treated by different practitioners. Each case's clinical diagnosis is therefore the assessment of a single expert, not a consensus between dermatologists. This analysis can be seen as a comparison between device performance and what a group of dermatologists would yield in everyday clinical practice.
Adverse Events and Adverse Reactions to the Product
Throughout the study, no adverse events or adverse reactions related to the investigated device were observed. Participants did not experience any negative reactions or side effects associated with the use of the device. This indicates a favourable safety profile of the investigated device in the context of this study.
Product Deficiencies
No deficiencies in the device were observed during this study. As a result, no corrective actions were deemed necessary. The device demonstrated consistent performance in line with the study's objectives.
Subgroup Analysis for Special Populations
In the context of the analyzed pathologies, no special population subgroups were identified for this study. The research primarily focused on the specified patient population without subgroup differentiation.
Accounting for all subjects
A total of 105 subjects were included in the study. However, the heterogeneity of the data (class imbalance and image quality) limited the power of the analyses. Due to inconclusive diagnoses, 2 cases were excluded from the latest analysis. This ensures that all the images of the analysis present skin lesions that can be detected by the device.
Discussion and Overall Conclusions
Clinical Performance, Efficacy, and Safety performance
The device has demonstrated excellent performance in terms of malignancy prediction, which makes it a valuable tool to prioritize patients according to their risk of presenting malignancy.
The AUC for malignancy prediction was 0.8983, which is comparable to that of expert healthcare professionals (HCP) and speaks to the potential of using the device to improve clinical workflows.
Regarding skin lesion recognition in general terms, the Top-5 accuracy was 84.22%, which supports the device's intended use as a clinical decision-support tool. For melanoma specifically, the AUC was 0.8482, which is high and meets the goal set out in the hypotheses of the study. On the downside, the Top-1 accuracy was 55.01% in the multiple ICD classification task, although the Top-3 accuracy increased to 75.69%. However, Top-1 accuracy is not a relevant metric for this study or for the performance of the device, because the device is designed to always output at least the top five predicted classes. This is aligned with its intended purpose as a clinical decision-support tool.
Given the results of the image dataset collected in this study, it is clear that images of better quality would have improved the functioning of the device and provided more valuable insights. Also, as the raw images were usually taken far away from the skin lesion, the cropping resulted in images of suboptimal resolution, which would have also been corrected by higher quality images - or by actually taking the image closer to the lesion.
The data gleaned from our study underscores the imperative need for targeted training programs aimed at enhancing the skills of HCPs in capturing high-quality clinical images. Proper training is paramount to ensure that the images taken in real-world clinical settings are of sufficient quality to yield accurate and reliable diagnostic outcomes. This aligns closely with the findings from our research, indicating that when HCPs are adept at taking good images, the real-world performance of diagnostic tools and assessments can be significantly improved, closely mirroring the positive results obtained in controlled research settings. Thus, investing in comprehensive training for HCPs on effective image-taking techniques stands out as a critical strategy for optimizing patient care and enhancing the overall efficiency of the healthcare system.
Additionally, we believe another factor that limits the results is that some malignant lesions cannot be reliably assessed from the image alone. Indeed, some cases require a biopsy to ascertain a diagnosis, regardless of the experience of the observer. This is not a problem for the performance or the safety of the device because whenever there is a suspicion of melanoma, clinicians universally adhere to the protocol of conducting a biopsy to confirm a diagnostic suspicion. This established clinical practice is rooted in the fundamental understanding that the removal of a melanoma is a minor procedure compared to the significant risks associated with the disease. Consequently, practitioners will never rely solely on the device for information when it comes to identifying melanoma, ensuring a comprehensive and cautious approach to diagnosis and treatment.
Clinical Risks and Benefits
Participants in this study did not undergo any procedures that posed a risk to their safety.
Clinical Relevance
While most of the body of research in this area is focused on the use of computer vision for the classification of pigmented skin lesions[1], our device is capable of recognising a variety of ICD classes, including but not limited to pigmented skin lesions. Compared to the state-of-the-art [2][3], the device presents a comparable performance in terms of malignancy prediction, despite the limitations of the study's current image dataset.
Compared to other works such as that of Han et al.[2], our results in overall skin image recognition also demonstrate the potential of computer vision in dermatology.
[1] Li, Ling-Fang, et al. "Deep learning in skin disease image recognition: A review." IEEE Access 8 (2020): 208264-208280.
[2] Han, Seung Seog, et al. "Augmented intelligence dermatology: deep neural networks empower medical professionals in diagnosing skin cancer and predicting treatment options for 134 skin disorders." Journal of Investigative Dermatology 140.9 (2020): 1753-1761.
[3] Haenssle, Holger A., et al. "Man against machine: diagnostic performance of a deep learning convolutional neural network for dermoscopic melanoma recognition in comparison to 58 dermatologists." Annals of Oncology 29.8 (2018): 1836-1842.
Specific Benefit or Special Precaution
Benefits
- Diagnosis support of skin structures: The device can recognize a variety of ICD classes. This makes the device a diagnosis support tool that could potentially save time for healthcare professionals and lead to faster treatment for patients.
- Longitudinal measure of disease progression: Despite not being tested in this study, the device can measure the severity of skin diseases, which can assist in monitoring the progression of the disease and the effectiveness of treatment.
- Data Collection and Analysis: The device can provide a wide range of clinical data from the analyzed images. This can assist healthcare practitioners in their clinical evaluations and allow healthcare provider organizations to gather data and improve their workflows.
Precautions
- Use on visible skin structures only: The device can only quantify clinical signs that are visible in a clinical or a dermoscopic image. This limitation should be considered when using the device.
- Data demographics, diversity and bias: The device is trained on a large collection of dermatology images to ensure its functioning is stable among all types of populations and skin tones. However, despite the diversity of the data, there may still be some bias depending on the ICD class.
- Image quality: As discussed in this study, visual quality plays a crucial role in the analysis of an image. The device already incorporates the DIQA* algorithm to ensure all images it processes are of sufficient visual quality.
- User Training: As the device is intended for use by healthcare professionals, adequate training should be provided to ensure correct use and interpretation of the results.
*Hernández Montilla, Ignacio, Taig Mac Carthy, Andy Aguilar, and Alfonso Medela. "Dermatology Image Quality Assessment (DIQA): Artificial intelligence to ensure the clinical utility of images for remote consultations and clinical trials." Journal of the American Academy of Dermatology 88, no. 4 (2023): 927-928.
Implications for Future Research
The device's image recognition processor is the result of a continuous improvement, not only of the deep learning model employed but also of the list of ICD classes it is capable of recognising.
Achieving more granularity and detail for the current taxonomy of ICD classes will lead to better results, even with this limited image dataset of suboptimal quality.
The overall image acquisition process will also be supervised more rigorously, so that the person responsible for taking the images properly frames the skin lesion and captures it with good visual quality.
Limitations of Clinical Research
The main limitation of computer vision-based skin image recognition lies in the quantity and quality of the images collected. Variability in illumination, colour, shape, size and focus are determinants, in addition to the number of images per lesion. This means that a large variability within the same lesion (outliers) and an insufficient number of images to reflect that variability can result in an accuracy lower than expected. This has happened in this study, with extremely zoomed-out, blurry, out-of-frame, over- and underexposed images that limited the power of the analyses.
For this reason, the originally proposed size of 40 subjects was readjusted to 200, of which at least 40 (20%) should present cutaneous melanoma. At study closure, 105 subjects had been recruited, with 36 cases of cutaneous melanoma (34.29%). The impact of low-quality data was mitigated by excluding low-quality images from the analyses (DIQA < 5).
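The exclusion rule and the recruitment figures above can be reproduced with a short sketch. The image records and the `diqa` field name are hypothetical, and the sketch assumes only what the report states: images with a DIQA score below 5 were excluded.

```python
from typing import Dict, List, Union

DIQA_THRESHOLD = 5  # images scoring below this were excluded, per the report

Image = Dict[str, Union[str, float]]

# Illustrative data only: hypothetical image records with a DIQA score.
images: List[Image] = [
    {"id": "img-001", "diqa": 7.2},
    {"id": "img-002", "diqa": 4.9},
    {"id": "img-003", "diqa": 5.0},
    {"id": "img-004", "diqa": 2.1},
]

# Keep only images that reach the quality threshold.
kept = [img for img in images if img["diqa"] >= DIQA_THRESHOLD]
kept_ids = [img["id"] for img in kept]

# Recruitment figures reported at study closure.
recruited, melanoma_cases, target = 105, 36, 200
melanoma_share = round(100 * melanoma_cases / recruited, 2)
recruitment_share = round(100 * recruited / target, 2)

print(kept_ids)
print(f"melanoma share: {melanoma_share}%, recruited vs target: {recruitment_share}%")
```

The melanoma share computed here matches the 34.29% quoted in the text, and the recruited cohort amounts to 52.5% of the readjusted target of 200 subjects.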
After analyzing the data from the pilot study with the entire cohort of patients, the results indicated that, despite the good diagnostic capacity of the CADx system, the discrimination between melanoma and other pathologies was not representative of daily clinical practice, since the cohort included exclusively cases with a high suspicion of malignancy. However, the results provide a good understanding of model performance on challenging cases like the ones included. After extending the study and including a wider variety of lesions, the performance was equally compelling and the dataset became more representative of daily clinical practice.
Ethical Aspects of Clinical Research
The conduct of the study conformed to international Good Clinical Practice guidelines, to the Declaration of Helsinki in its latest active amendment, and to international and national rules and regulations. The study was not initiated until approval had been obtained from the Basque Country Committee on Drug Research Ethics (CEIm de Euskadi). Any modification of the protocol was to be reviewed and approved by the Principal Investigator and evaluated by the CEIm de Euskadi for approval before subjects were included under the modified protocol.
The study was conducted in accordance with European Regulation 2016/679, of 27 April, on the protection of natural persons with regard to the processing of personal data and on the free movement of such data, and with Organic Law 3/2018, of 5 December, on the Protection of Personal Data and guarantee of digital rights. No data allowing personal identification of subjects were included in the data processing, and the information was managed in an encrypted manner.
Patients were informed orally and in writing about all the information related to the study, adapted to their level of understanding. A copy of the consent form and information sheet was provided to each patient. Investigators were required to allow patients the necessary time to ask questions about the details of the study.
The preparation of the informed consent form was the responsibility of the Principal Investigator. The form had to include all the elements required by the International Council for Harmonisation (ICH) and current regulatory guidelines, and comply with the GCP Guidelines and the ethical principles that originate from the Declaration of Helsinki.
The investigator or the Principal Investigator's designee kept the original signed informed consent form in a secure restricted-access area in the custody of the Principal Investigator. The original never left the facility, and a copy of the signed consent form was provided to the patient.
Investigators and Administrative Structure of Clinical Research
Brief Description
The clinical investigation team comprises highly esteemed dermatologists and a specialist in artificial intelligence. Dr. Jesús Gardeazabal García and Dr. Rosa María Izu Belloso serve as the Principal Investigators, affiliated with the Dermatology Services of Hospital Universitario Cruces and Hospital Universitario Basurto, respectively.
Collaborating on the study are Dr. Juan Antonio Ratón Nieto and Dr. Ana Sánchez Díez, who are associated with the dermatology departments of Hospital Universitario Cruces and Hospital Universitario Basurto, respectively. Completing the team, Alfonso Medela represents AI Labs Group S.L., bringing expertise in artificial intelligence to the clinical investigation, together with Andy Aguilar and Taig Mac Carthy.
This diverse and skilled team ensures a comprehensive approach to the clinical evaluation of the device, aiming to validate its safety, effectiveness, and performance in a real-world dermatological setting.
Investigators
Principal investigators
- Dr. Jesús Gardeazabal (Osakidetza).
- Dr. Rosa Mª Izu (Osakidetza).
Collaborators
- Dr. Juan Antonio Ratón Nieto (Servicio de Dermatología, Hospital Universitario Cruces).
- Dr. Ana Sánchez Díez (Servicio de Dermatología, Hospital Universitario Basurto).
- Alfonso Medela (AI Labs Group S.L.).
- Andy Aguilar (AI Labs Group S.L.).
- Taig Mac Carthy (AI Labs Group S.L.).
External Organization
No external organizations contributed to this research.
Sponsor and Monitor
AI Labs Group S.L.
Signature meaning
The signatures for the approval process of this document can be found in the verified commits at the repository for the QMS. As a reference, the team members who are expected to participate in this document and their roles in the approval process, as defined in Annex I Responsibility Matrix of the GP-001, are:
- Author: Team members involved
- Reviewer: JD-003, JD-004
- Approver: JD-001