R-TF-015-006 Clinical investigation report

Research Title

Clinical validation study of a Computer-aided diagnosis (CADx) system with artificial intelligence algorithms for early non-invasive detection of in vivo cutaneous melanoma.

Description

Clinical validation study of a smartphone-based CADx system with artificial intelligence algorithms for early non-invasive detection of in vivo cutaneous melanoma in patients with skin lesions suspected of malignancy, recruited from 2020 onwards at two hospitals (Hospital Universitario Cruces and Hospital Universitario Basurto). The study had an initial cohort of 40 subjects, which was later extended to 105.

Product identification

	Information
Device name	Legit.Health Plus (hereinafter, the device)
Model and type	NA
Version	1.1.0.0
Basic UDI-DI	8437025550LegitCADx6X
Certificate number (if available)	MDR 792790
EMDN code(s)	Z12040192 (General medicine diagnosis and monitoring instruments - Medical device software)
GMDN code	65975
EU MDR 2017/745	Class IIb
EU MDR Classification rule	Rule 11
Novel product (True/False)	TRUE
Novel related clinical procedure (True/False)	TRUE
SRN	ES-MF-000025345

	Manufacturer data
Legal manufacturer name	AI Labs Group S.L.
Address	Street Gran Vía 1, BAT Tower, 48001, Bilbao, Bizkaia (Spain)
SRN	ES-MF-000025345
Person responsible for regulatory compliance	Alfonso Medela, Saray Ugidos
E-mail	office@legit.health
Phone	+34 638127476
Trademark	Legit.Health
Authorized Representative	Not applicable (manufacturer is based in EU)

Identification of the Clinical Investigation Plan (CIP)

	CIP
Title of the clinical investigation	Clinical validation study of a CAD system with artificial intelligence algorithms for early noninvasive in vivo cutaneous melanoma detection
Device under investigation	Legit.Health Legacy Device
Protocol version	Version 3.0
Date	2021-10-28
Protocol code	LEGIT_MC_EVCDAO_2019
Sponsor	AI Labs Group S.L.
Coordinating Investigator	Dr. Jesús Gardeazabal García and Dr. Rosa Mª Izu Belloso
Principal Investigator(s)	Dr. Jesús Gardeazabal García and Dr. Rosa Mª Izu Belloso
Investigational site(s)	Hospital Universitario Cruces and Hospital Universitario Basurto
Ethics Committee	Comité de Ética de la Investigación con Medicamentos de Euskadi

Public Access Database

The database used in this study is not publicly accessible due to privacy and confidentiality considerations.

Research Team

Principal investigators

Dr. Jesús Gardeazabal García (Osakidetza, Hospital Universitario Cruces)
Dr. Rosa Mª Izu Belloso (Osakidetza, Hospital Universitario Basurto)

Collaborators

Dr. Juan Antonio Ratón Nieto (Servicio de Dermatología, Hospital Universitario Cruces)
Dr. Ana Sánchez Díez (Servicio de Dermatología, Hospital Universitario Basurto)
Alfonso Medela (AI Labs Group S.L.)
Andy Aguilar (AI Labs Group S.L.)
Taig Mac Carthy (AI Labs Group S.L.)

Centres

Hospital Universitario Cruces
Hospital Universitario Basurto

Compliance Statement

The clinical investigation was performed according to the Clinical Investigation Plan (CIP) and other applicable guidance and regulations. This includes compliance with:

Harmonized standard UNE-EN ISO 14155:2021
Regulation (EU) 2017/745 on medical devices (MDR)
Harmonized standard UNE-EN ISO 13485:2016
Regulation (EU) 2016/679 (GDPR).
Spanish Organic Law 3/2018 on the Protection of Personal Data and guarantee of digital rights.

All data processing within the device is carried out in accordance with the highest standards of data protection and privacy. Patient information is managed in an encrypted manner to ensure confidentiality and security.

The research team assumes the role of Data Controller, responsible for the collection and management of study data. Legit.Health acts as the Data Processor and is not involved in the processing of patient data.

The storage and transfer of data comply with European data protection regulations. At the conclusion of the study, all information stored in the device will be permanently and securely deleted.

The device employs robust technical and organizational security measures to safeguard personal data against unauthorized access, alteration, loss, or processing.

Report Date

May 31, 2024.

Report Author(s)

The full name, the ID and the signature for the authorship, as well as the approval process of this document, can be found in the verified commits at the repository. This information is saved alongside the digital signature, to ensure the integrity of the document.

Table of contents

Research Title
Description
Product identification
Sponsor Identification and Contact
Identification of the Clinical Investigation Plan (CIP)
Public Access Database
Research Team
Compliance Statement
Report Date
Report Author(s)
Table of contents
Abbreviations and Definitions
Summary
Introduction
Materials and methods
Results
Discussion and Overall Conclusions
Ethical Aspects of Clinical Research
Investigators and Administrative Structure of Clinical Research

Abbreviations and Definitions

AE: Adverse Event
AEMPS: Spanish Agency of Medicines and Medical Devices
AEP: Adverse Reaction to Product
AUC: Area Under the ROC Curve
CAD: Computer-Aided Diagnosis
CMD: Data Monitoring Committee
CIP: Clinical Investigation Plan
CUS: Clinical Utility Questionnaire
DLQI: Dermatology Life Quality Index
GCP: Good Clinical Practice
ICH: International Conference on Harmonisation
IFU: Instructions For Use
IRB: Institutional Review Board
N/A: Not Applicable
NCA: National Competent Authority
PI: Principal Investigator
PPV: Positive Predictive Value
NPV: Negative Predictive Value
SAE: Serious Adverse Events
SAEP: Serious Adverse Event to Product
SUAEP: Serious and Unexpected Adverse Event to the Product
SUS: System Usability Scale

Summary

Research title

Clinical validation study of a CADx system with artificial intelligence algorithms for early non-invasive detection of in vivo cutaneous melanoma.

Introduction

Cutaneous melanoma (CM), a type of skin cancer, has seen a significant rise in incidence and mortality. It is particularly aggressive and can metastasize rapidly, at which stage it becomes resistant to chemotherapy and radiotherapy. However, when detected early, it is highly treatable with simple surgical excision. Differentiating between benign and malignant pigmented lesions is challenging, especially during visual examination.

Due to low public awareness and limited access to dermatologists, melanoma often gets diagnosed at a later stage. To address this, there's growing interest in computer-aided diagnostics (CAD) using artificial intelligence (AI) for early melanoma detection. AI technologies have shown competence comparable to dermatologists in classifying lesions from photographs. Machine vision and AI present a significant opportunity to improve diagnosis.

Preventive activities and early diagnosis campaigns have improved patient survival. This suggests that AI-based devices that assess skin lesion malignancy and distinguish micro melanomas from other skin lesions, such as nevi and lentigines, may further increase patient survival. This study aims to clinically validate the detection of cutaneous melanoma using computer vision and machine learning applications.

Objectives

Hypothesis

A CADx system powered by computer vision allows early and non-invasive diagnosis of cutaneous melanoma in vivo.

Primary objective

To validate that the device developed by AI Labs Group SL for the identification of cutaneous melanoma in images of lesions taken with a dermoscopic camera achieves the following values:

top-1 accuracy equal to or greater than 50.00%.
top-3 accuracy equal to or greater than 60.00%.
top-5 accuracy equal to or greater than 80.00%.
AUC (area under the ROC curve) equal to or greater than 80.00% detecting melanoma.
top-1 accuracy equal to or greater than 80.00% detecting melanoma.
sensitivity equal to or greater than 80.00% detecting melanoma.
specificity equal to or greater than 70.00% detecting melanoma.
AUC (area under the ROC curve) equal to or greater than 80.00% detecting malignancy.
sensitivity equal to or greater than 80.00% detecting malignancy.
specificity equal to or greater than 84.00% detecting malignancy.
PPV (positive predictive value) equal to or greater than 80.00% detecting malignancy.
NPV (negative predictive value) equal to or greater than 90.00% detecting malignancy.

Secondary objective

To compare the performance of the device developed by the manufacturer with the performance of healthcare professionals of different specializations:
- Dermatologists
- Primary care practitioners
To validate the usefulness and feasibility of the device developed by the manufacturer in adverse environments with severe technical limitations, such as lack of instrumentation or internet connection.

Performance of primary care practitioners and dermatologists

The study does not compare the performance of the device against the performance of general practitioners; it only focuses on dermatologists. This comparison was planned as a secondary objective for subsequent phases of research but was not conducted in this study due to recruitment difficulties and the heavy workload of the practitioners at the participating centres. However, it is widely known that dermatologists have a significantly higher diagnostic success rate in the detection of melanoma compared to general practitioners.

The gold standard in this study was primarily based on pathological anatomy results (biopsy) when available. For cases where biopsy was not necessary, the consensus diagnosis of expert dermatologists was used as the gold standard, which is a common practice in clinical validation studies. This means that the dermatologists' responses were available for every image, and biopsy confirmation was available for a subset of cases. Having both types of clinical data enabled us to compare the device performance to the dermatologists' clinical assessments as well as the biopsy results (in those cases where biopsy was performed).

Population

Patients with skin lesions suspected of malignancy seen at the Dermatology Departments of Hospital Universitario Cruces and Hospital Universitario Basurto.

Sample size

For this study, we aimed for one melanoma case for every four non-melanoma cases, that is, an allocation ratio N2/N1 of 4 (melanoma in 20% of the sample). To calculate a sample size that allows us to detect significant differences, we used G*Power 3.1.9. We performed a z-test with 95% power at a 5% significance level. This yielded a sample size of 160, with 32 patients with melanoma and 128 without. To account for possible loss to follow-up, we set a sample size of 200, with 40 patients diagnosed with melanoma.
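
The split between melanoma and non-melanoma cases follows from the allocation ratio N2/N1 of 4. As a minimal sketch of that arithmetic only (the function name `split_by_allocation_ratio` is hypothetical and is not part of the study's tooling; the power calculation itself was done in G*Power and is not reproduced here):

```python
from typing import Tuple


def split_by_allocation_ratio(total: int, ratio_n2_n1: int) -> Tuple[int, int]:
    """Split a total sample into (N1, N2) so that N2 / N1 equals ratio_n2_n1."""
    n1, remainder = divmod(total, ratio_n2_n1 + 1)
    if remainder:
        raise ValueError("total must be divisible by ratio_n2_n1 + 1")
    return n1, total - n1


# Sample size from the power calculation: 160 subjects, N2/N1 = 4.
print(split_by_allocation_ratio(160, 4))  # (32, 128)

# Sample size after allowing for loss to follow-up: 200 subjects.
print(split_by_allocation_ratio(200, 4))  # (40, 160)
```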

Design and Methods

Design

This study is an analytical observational case series aimed at assessing the performance of a diagnostic test. It involves observing and analyzing data from a specific group of cases without any intervention by the researchers. Since all measurements are taken at one point in time, it is classified as a cross-sectional study.

Number of Subjects

The initial number of subjects for the study was 40. However, since all included lesions were biopsied due to specialist suspicion of malignancy, this was not representative of typical clinical practice in primary care or dermatology. Therefore, to create a less biased dataset that includes both malignant and benign lesions, we decided to include nevi and other types of skin lesions. As a result, the proposed number of subjects was increased to approximately 200 people, with at least 40 having cutaneous melanoma (20% of the sample).

By the end of the study, 105 patients had been recruited, 36 of whom presented cutaneous melanoma. The study was concluded once the number of melanoma cases approached the target (36 of the 40 planned), as the primary objective of the clinical investigation was to validate the device's performance in detecting melanoma. With 36 melanoma cases, the actual melanoma ratio (34.29%) exceeded the planned 20%, providing sufficient statistical power for the validation of melanoma detection performance and thereby justifying the early closure of the study. Additionally, we included further skin lesion types (nevi, haemangioma, basal cell carcinoma and dermatofibroma, among others) to account for typical cases in everyday clinical practice, addressing the initial limitation that the first cohort consisted primarily of difficult diagnoses requiring biopsy confirmation.

Initiation Date

The study was approved by the Ethics Committee on February 10th, 2020 and the date of inclusion of the first subject was September 17th, 2020.

Completion Date

The last subject of the initial sample of 40 participants was included on March 24, 2021. The study was closed on November 13, 2023, after recruiting 105 participants.

Duration

This study had a recruitment period of 7 months for the inclusion of the first 40 patients. The recruitment period was extended for the inclusion of up to 200 patients, to include at least 40 cases of melanoma (20% of the sample).

The total duration of the study was 38 months, including the time required after the recruitment of the last subject for closing and editing the database, data analysis and preparation of the final study report. The study was finally closed with 105 subjects, close to the expected number of melanoma cases (36 of the 40 planned) and surpassing the desired ratio (34.29% versus the planned 20%).

Methods

All skin lesions were photographed according to the following technical specifications:

Saved in an uncompressed image format, such as PNG, HEIC or TIFF.
Taken with the DermLite Foto X dermatoscope from 3Gen Inc.
Taken with a smartphone meeting the following requirements:
- A camera with a resolution of at least 13 megapixels.
- One of the following models:
  - Google Pixel 3 and Google Pixel 3 XL.
  - Samsung Galaxy Note 10, Samsung Galaxy S10 and Samsung Galaxy S10E.
  - iPhone X and below.
Taken with all image post-processing disabled, such as HDR, portrait mode, colour filters or digital zoom.

Every month, the research team collected the images and verified their correctness. If any image was not of sufficient quality, the investigator repeated the photograph. The research team also collected diagnostic data from expert dermatologists.

Image cropping

Due to the expected variability in image acquisition settings, all images required a preprocessing step to enhance consistency. This involved cropping the areas of interest that contained the skin lesions essential for analysis. Cropping was crucial to minimize noise, such as background elements and any non-skin structures. Once cropped, the images were processed by the device. This study's analysis included this crucial preprocessing step. For more details, refer to the section Subject and Investigational Product Management.

To ensure stable predictions from the device, all the images underwent test-time augmentation (TTA), as detailed in Legit.Health Plus description and specifications.
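
As an illustration of the general idea behind test-time augmentation, the sketch below averages the class probabilities predicted for several views of the same image. It is not the device's implementation: the augmentation set (horizontal and vertical flips), the `predict_with_tta` function and the toy classifier are all assumptions made for this example, and the augmentations actually used are those described in the referenced record.

```python
from typing import Callable, List, Sequence

Image = List[List[float]]  # grayscale image as rows of pixel values
Classifier = Callable[[Image], Sequence[float]]  # returns class probabilities


def augmented_views(image: Image) -> List[Image]:
    """Return the original image plus three flipped views (assumed set)."""
    h_flip = [row[::-1] for row in image]
    v_flip = image[::-1]
    hv_flip = [row[::-1] for row in image[::-1]]
    return [image, h_flip, v_flip, hv_flip]


def predict_with_tta(image: Image, predict_proba: Classifier) -> List[float]:
    """Average the class probabilities predicted for every augmented view."""
    outputs = [predict_proba(view) for view in augmented_views(image)]
    n_classes = len(outputs[0])
    return [sum(o[c] for o in outputs) / len(outputs) for c in range(n_classes)]


def toy_model(image: Image) -> List[float]:
    """Hypothetical two-class model whose output depends on the top-left pixel."""
    p = image[0][0]
    return [p, 1.0 - p]


image = [[0.2, 0.4], [0.6, 0.8]]
print(predict_with_tta(image, toy_model))
```

Because the toy model is sensitive to orientation, each view gives a different prediction, and the averaged output is less dependent on how the photograph happened to be framed.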

Finally, the output predictions were compared to the gold standard to obtain performance metrics: AUC, precision or positive predictive value (PPV), sensitivity, specificity, negative predictive value (NPV), and accuracy. Except for AUC, PPV and NPV, all other metrics were calculated in their top-1, top-3, and top-5 variants (i.e., prediction is successful when the correct class is within the top K predictions). The gold standard was primarily based on pathological anatomy results (biopsy) for biopsied cases. In cases where biopsy was not necessary, the consensus diagnosis of expert dermatologists was used as the gold standard, which is a common and accepted practice in clinical validation studies of diagnostic devices.
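
The top-K criterion described above can be sketched as follows. This is a minimal example on made-up predictions, not the study's analysis code; the function names and the three-class output are assumptions for illustration.

```python
from typing import Dict, List


def top_k_hit(probabilities: Dict[str, float], true_label: str, k: int) -> bool:
    """True when the reference diagnosis is among the k most probable classes."""
    ranked = sorted(probabilities, key=lambda label: probabilities[label], reverse=True)
    return true_label in ranked[:k]


def top_k_accuracy(predictions: List[Dict[str, float]], labels: List[str], k: int) -> float:
    """Fraction of images whose reference diagnosis is within the top k predictions."""
    hits = sum(top_k_hit(p, y, k) for p, y in zip(predictions, labels))
    return hits / len(labels)


# Made-up device outputs for three images (not study data).
predictions = [
    {"melanoma": 0.6, "nevus": 0.3, "seborrheic keratosis": 0.1},
    {"melanoma": 0.2, "nevus": 0.5, "seborrheic keratosis": 0.3},
    {"melanoma": 0.5, "nevus": 0.1, "seborrheic keratosis": 0.4},
]
gold_standard = ["melanoma", "seborrheic keratosis", "seborrheic keratosis"]

for k in (1, 2):
    print(k, top_k_accuracy(predictions, gold_standard, k))
```

In this toy example only the first image is correct at top-1, while all three reference diagnoses fall within the top 2.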

Results

The performance of the device on melanoma detection is excellent when considering the AUC score, indicating a high level of accuracy in distinguishing between melanoma and non-melanoma cases. The obtained AUC scores support that malignancy estimation performance beyond melanoma is also excellent.

Results of the first sample

The initial sample of 40 subjects was not sufficient to draw sound conclusions. A high percentage of these cases corresponded to very difficult diagnoses for which the dermatologists needed additional tests (i.e. biopsy) to support the naked-eye examination.

Moreover, the 395 images of the skin lesions of these subjects presented a high heterogeneity in terms of image quality (heavy blur, over- and underexposure...), which may also limit the power of the analyses. For that reason, the dataset was expanded to improve image quality and include more common cases.

Results of the second sample

At the end of the study, the number of subjects had increased to 105, comprising 565 images. After discarding 2 cases with non-conclusive diagnoses (10 images), the same analysis was performed on the definitive sample of 103 subjects (555 images), yielding similar overall performance and improved malignancy prediction results.

A closer inspection of the 555 images revealed that, despite the extension of the sample, a high percentage of them still presented suboptimal image quality, which could heavily impact the processing capabilities of the device.

Conclusions

The CADx device demonstrates strong potential for malignancy prediction and accurate image recognition of melanoma and other pigmented skin lesions, such as carcinoma, keratoses, and nevi. Its performance in this clinical validation is consistent with previous internal validation tests.

However, the study faced limitations due to class imbalance, challenging diagnoses, and inconsistent image quality. These factors reduced the statistical power for melanoma detection. Despite these challenges, the device still achieved compelling results, even under difficult conditions.

Importantly, the results highlight the need for high-quality, well-framed images to maximize diagnostic accuracy. The exclusion of low-quality images (DIQA < 5) helped mitigate some of the negative impact of poor data.

In summary, the CADx system shows promise as a supportive tool for early melanoma detection, especially in complex or ambiguous cases. Further studies with larger, more balanced cohorts and consistently high-quality images are recommended to confirm and extend these findings.

The clinical data generated in this investigation supports the demonstration of conformity with the General Safety and Performance Requirements (GSPR), specifically GSPR 1 and GSPR 8 of the Medical Device Regulation (EU) 2017/745. The results confirm a positive benefit-risk ratio, as the device provides significant diagnostic support without introducing any safety risks to the patients.

Introduction

In 2019, the Spanish Society of Medical Oncology (SEOM) estimated that the number of new cancer cases diagnosed in Spain would reach 277,234, 12% more than in 2015 when 247,771 were diagnosed. Among the different types of cancer, cutaneous melanoma (CM) is the type of skin cancer that causes the most deaths, with a significant increase in incidence and mortality in recent decades. It is characterized by a rapidly increasing incidence rate among Caucasian populations and tens of thousands of people worldwide die each year from this cancer. Melanoma is one of the most aggressive malignancies and rapidly metastasizes to distant organs.

When it progresses to the metastatic stage, it establishes powerful mechanisms to resist chemotherapy and radiotherapy, which hinders the efficacy of current medical therapies. However, when detected early, melanoma is treatable in almost all cases with simple surgical excision. On the other hand, there are also benign types of pigmented skin lesions, such as moles, which are natural parts of the skin and share similar visual characteristics. This makes the differentiation between melanoma and non-melanoma a challenging problem for specialists and, even more so, for non-specialists. This problem is particularly significant during a naked-eye examination because early-stage melanomas often resemble benign lesions. A person with a suspicious pigmented skin lesion will go through several steps before a definitive diagnosis of melanoma: self-assessment, evaluation by a primary care practitioner, evaluation by a specialist, excision, and evaluation by histopathology. Due to low public awareness of the importance of skin cancer prevention and insufficient access to dermatologists in many regions worldwide, melanoma is often diagnosed only after a tumour grows to a medium size.

In light of the above data, prevention and early diagnosis of melanoma have become an extremely important issue. In recent years, there has been increasing demand to develop computer-aided diagnostics and systems that facilitate the early detection of melanoma that could be applied by non-experts and the general public.

Advances in image recognition and artificial intelligence have set in motion innovations in the diagnosis of skin lesions. With appropriate development and proper evaluation, technology could improve diagnostic accuracy. It has been demonstrated that through artificial intelligence (AI) algorithms, it is possible to classify photographs of lesions, including melanoma, with a level of competence comparable to that of dermatologists. The advent of machine vision has revolutionized our understanding of this pathology and presents an enormous opportunity for diagnosis.

Preventive activities and early diagnosis campaigns stand out among the factors that have improved patient survival. Therefore, based on our previous research, we propose an artificial intelligence system to evaluate the malignancy of a skin lesion and to differentiate micro melanomas from moles and other skin lesions, including nevi and lentigines.

This study aims to clinically validate the diagnosis of cutaneous melanoma through machine vision and machine learning applications.

Materials and methods

Product Description

This section contains a short summary of the device. A complete description of the intended purpose, including device description, can be found in the record Legit.Health Plus description and specifications.

Product description

The device is a computational software-only medical device leveraging computer vision algorithms to process images of the epidermis, the dermis and its appendages, among other skin structures. Its principal function is to provide a wide range of clinical data from the analyzed images to assist healthcare practitioners in their clinical evaluations and allow healthcare provider organisations to gather data and improve their workflows.

The generated data is intended to aid healthcare practitioners and organizations in their clinical decision-making process, thus enhancing the efficiency and accuracy of care delivery.

The device should never be used to confirm a clinical diagnosis. On the contrary, its result is one element of the overall clinical assessment. Indeed, the device is designed to be used when a healthcare practitioner chooses to obtain additional information to consider a decision.

Intended purpose

The device is a computational software-only medical device intended to support health care providers in the assessment of skin structures, enhancing efficiency and accuracy of care delivery, by providing:

quantification of intensity, count, extent of visible clinical signs
interpretative distribution representation of possible International Classification of Diseases (ICD) categories.

Intended previous uses

No specific intended use was designated in prior stages of development.

Product changes during clinical research

The device maintained a consistent performance and features throughout the entire clinical research process. No alterations or modifications were made during this period.

Clinical Investigation Plan

The study aims to validate a CADx system utilizing machine vision for the early and non-invasive in-vivo diagnosis of cutaneous melanoma.

The primary objective is to confirm that the processor developed for melanoma identification in dermoscopic images achieves the expected values of AUC, sensitivity, and specificity.

Secondary objectives include comparing the device's performance with dermatologists, with consideration of primary care practitioners in later phases, and assessing the utility of the device in adverse environments.

The study follows ethical guidelines, including the Declaration of Helsinki and data protection laws, and requires informed consent. Data quality assurance is the responsibility of the Principal Investigator, who reviews and approves the protocol and final study report. The subject population consists of patients with suspected malignancy skin lesions, and no specific treatment is administered as part of the research protocol. Statistical analysis employs AUC, sensitivity, and specificity to assess device performance.

Objectives

The objective of this study is to validate the capability of our CADx system, which leverages machine vision, for the early and non-invasive in-vivo diagnosis of cutaneous melanoma.

The primary objective is to demonstrate that the AI algorithm developed for detecting cutaneous melanoma in dermoscopic images achieves the performance thresholds listed under Objectives in the Summary.

The secondary objectives are:

Comparing the performance of the device with that of dermatologists, with an evaluation of primary care practitioners' assessments planned for subsequent phases.
Evaluating the practicality and reliability of the device in challenging environments with technical constraints.

Design (type of research, assessment criteria, methods, active group, and control group)

This is an analytical observational case series study assessing the performance of a diagnostic test. Measurements are taken at a single point in time, so it is a cross-sectional study. There is a single group of participants, consisting of patients with skin lesions suspected of malignancy seen at the Dermatology Departments of Hospital Universitario Cruces and Hospital Universitario Basurto.

Ethical considerations

This study adhered to international Good Clinical Practice (GCP) guidelines, the Declaration of Helsinki in its latest amendment, and applicable international and national regulations. As applicable, approval from the relevant Ethics Committee was obtained prior to the initiation of the study. When applicable, modifications to the protocol were reviewed and approved by the Principal Investigator (PI) and subsequently evaluated by the Ethics Committee before subjects were enrolled under a modified protocol.

This study was conducted in compliance with European Regulation 2016/679, of 27 April, concerning the protection of natural persons with regard to the processing of personal data and the free movement of such data (General Data Protection Regulation, GDPR), and Organic Law 3/2018, of 5 December, on the Protection of Personal Data and the guarantee of digital rights. In accordance with these regulations, no data enabling the personal identification of participants was collected, and all information was managed securely in an encrypted format.

Participants were informed both orally and in writing about all relevant aspects of the study, with the information being tailored to their level of understanding. They were provided with a copy of the informed consent form and the accompanying patient information sheet. Adequate time was given to patients to ask questions and fully comprehend the details of the study before providing their consent.

The PI was responsible for the preparation of the informed consent form, ensuring it included all elements required by the International Conference on Harmonisation (ICH), adhered to current regulatory guidelines, and complied with the ethical principles of GCP and the Declaration of Helsinki.

The original signed informed consent forms were securely stored in a restricted access area under the custody of the PI. These documents remained at the research site at all times. Participants were provided with a copy of their signed consent form for their records.

Data confidentiality

Current legislation will be complied with in terms of data confidentiality protection (European Regulation 2016/679, of 27 April, on the protection of natural persons with regard to the processing of personal data and the free movement of such data and Organic Law 3/2018, of 5 December, on Personal Data Protection and guarantee of digital rights). For this purpose, when applicable, each participant will receive an alphanumeric identification code in the study that will not include any data allowing personal identification (coded CRF). The Principal Investigator will keep a separate list linking the identification codes of the patients participating in the study with their clinical and personal data. This document will be filed in a secure area with restricted access, under the custody of the Principal Investigator, and will never leave the centre.

Once the paper CRFs are completed and closed by the Principal Investigator, the data will be transferred to a database.

Like the CRFs, the database will comply with current legislation in terms of data confidentiality protection (European Regulation 2016/679, of 27 April, on the protection of natural persons with regard to the processing of personal data and the free movement of such data and Organic Law 3/2018, of 5 December, on the Protection of Personal Data and guarantee of digital rights), and no data allowing personal identification of patients will be included in it.

Data Quality Assurance

The Principal Investigator is responsible for reviewing and approving the protocol, signing the Principal Investigator commitment, guaranteeing that the persons involved in the centre will respect the confidentiality of patient information and protect personal data, and reviewing and approving the final study report together with the sponsor. All the clinical members of the research team assess the eligibility of the patients in the study, inform and request written informed consent, collect the source data of the study in the clinical record and transfer them to the Data Collection Notebook (DCN) or Data Collection Forms (CRF).

Subject Population (inclusion/exclusion criteria and sample size)

Patients with skin lesions suspected of malignancy seen at the Dermatology Departments of Hospital Universitario Cruces and Hospital Universitario Basurto.

Inclusion criteria

Patients with skin lesions with suspected malignancy.
Age over 18 years old.
Patients who consent to participate in the study by signing the Informed Consent form.

Exclusion criteria

Patients under 18 years of age.

Treatment

Patients participating in this study did not receive any specific treatment as part of the research protocol.

Concomitant medication or treatment

Patients continued their regular prescribed medications and treatments as directed by their primary healthcare providers. No additional medications or treatments were administered as part of this study.

Follow-Up duration

This study did not require follow-up of the subjects. Each patient's skin lesions were photographed only at the time of the visit.

Statistical analysis

To estimate the device's performance, we used different metrics depending on the task:

Task	Metrics
Melanoma detection	AUC, Top-K precision, Top-K sensitivity, Top-K specificity, PPV, NPV
Malignancy prediction	AUC, Sensitivity, Specificity, PPV, NPV
Skin lesion recognition	Top-K accuracy

For this study, we set the value of K to 1, 3, and 5, for they are the most common values to assess classification performance.

Performance metrics

Hasan MK, Ahamad MA, Yap CH, Yang G. "A survey, review, and future trends of skin lesion segmentation and classification." Computers in Biology and Medicine. 2023 Mar 1;155:106624.

Results

Initiation and Completion Date

The first subject was included on September 17, 2020, and the last subject of the initial cohort of 40 subjects was included on March 24, 2021. The readjusted target number of subjects was finally capped at 105, having reached the desired ratio of cutaneous melanoma cases.

Subject and Investigational Product Management

A total of 105 subjects were included in the study (79 from Hospital Universitario Basurto and 26 from Hospital Universitario Cruces).

	Mean	Standard deviation
Images per subject	5.38	4.71

The definitive image dataset still presents class imbalance:

Diagnosis	# images (initial)	# images (initial + extension)
Actinic keratosis	5	7
Angiokeratoma	0	4
Angioma	0	13
Basal cell carcinoma	65	80
Blue nevus	25	29
Comedo	0	2
Compound nevus	2	6
Cutaneous melanoma	267	288
Dermatofibroma	0	19
Dysplastic nevus	27	27
Haemangioma	0	4
Junctional nevus	0	2
Melanocytic nevus	2	21
Nevus	0	4
Non-conclusive	0	10
Not available	0	1
Seborrheic keratosis	2	46
Spitz nevus	0	2

The distribution of cases shows that we not only reached the desired ratio of melanoma cases (20%) but exceeded it (34.29%):

Diagnosis	# Subjects
Actinic keratosis	2
Angiokeratoma	2
Angioma	5
Basal cell carcinoma	13
Blue nevus	4
Comedo	1
Compound nevus	3
Cutaneous melanoma	36
Dermatofibroma	7
Dysplastic nevus	2
Haemangioma	1
Junctional nevus	1
Melanocytic nevus	10
Nevus	2
Non-conclusive	2
Not available	1
Seborrheic keratosis	22
Spitz nevus	1

Distribution of cases

You will see that the aggregate of subjects per disease does not add up to 105. That is because some subjects presented more than one type of lesion.

In addition to class imbalance, the first sample consisted of difficult cases (that required biopsy), whereas the second one was more diverse and provided a more representative sample of what healthcare professionals would see in their everyday clinical practice:

Sample	# patients with biopsy	# images with biopsy
Initial	40	395
Extended	18	35
Total	58	430

To assess visual quality, we used the device's integrated Dermatology Image Quality Assessment (DIQA) processor. By only analysing the raw images (i.e. without cropping the skin lesion), we see that overall quality is acceptable, with some outliers of very low quality.

However, after cropping the images to extract the regions of interest (i.e. the part of the image where the skin lesion is shown), we observed that there was a significant number of images with a DIQA score below 5.

This happens because in many cases the lesion was photographed at a long distance, which leads to low-resolution crops of the region of interest.

Excluding low-quality images from analysis

In everyday clinical practice, the device would automatically detect images whose visual quality falls below a certain DIQA score and reject them. To mimic that behaviour, all images with a DIQA score below 5 were excluded from the analysis, leaving us with 469 images.
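
The exclusion rule above can be illustrated with a minimal Python sketch. The record structure, field names and sample scores are hypothetical; only the cut-off of 5 comes from this report, and the sketch does not reproduce the DIQA processor itself.

```python
from dataclasses import dataclass

# Cut-off stated in this report: images scoring below 5 are excluded.
DIQA_THRESHOLD = 5.0


@dataclass
class ImageRecord:
    image_id: str      # hypothetical identifier
    diqa_score: float  # quality score of the cropped region of interest


def keep_acceptable(records, threshold=DIQA_THRESHOLD):
    """Return only the records whose DIQA score is at or above the threshold."""
    return [r for r in records if r.diqa_score >= threshold]


# Toy data, not taken from the study.
sample = [
    ImageRecord("img-001", 7.2),
    ImageRecord("img-002", 4.9),
    ImageRecord("img-003", 5.0),
]
kept = keep_acceptable(sample)
print([r.image_id for r in kept])  # ['img-001', 'img-003']
```

An image scoring exactly 5 is kept here, since the report excludes only scores below 5.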

Subject Demographics

The research did not specifically focus on sex or age factors. The ancestry composition of the population is equivalent to the general population of the region in which the study was carried out.

Source	Female	Male	Total
Hospital Universitario Basurto	38	41	79
Hospital Universitario Cruces	14	12	26
Total	52	53	105

Source	Age (mean ± standard deviation)
Hospital Universitario Basurto	61.66 ± 15.43
Hospital Universitario Cruces	63.65 ± 15.23

CIP (Clinical Investigation Plan) Compliance

The study adhered to all aspects outlined in the Clinical Investigation Plan (CIP). This ensured that the research was conducted following established protocols, procedures, and ethical standards. Any deviations from the CIP were duly documented and appropriately addressed. Compliance with the CIP was rigorously monitored throughout the study to uphold the integrity and validity of the research findings.

Analysis

Primary analysis

Melanoma detection

To study model performance in terms of melanoma detection, the multi-class output of the device was converted to a top-K binary output. In other words, the output of the device was successful, or positive, when the melanoma class was present in the top-K predictions, and negative otherwise.

Metric name	Value (initial)	Value (initial + extension)	Task type
Top-1 precision	0.8241: 178/216 (95% CI: [0.6592-0.9545])	0.8097: 183/226 (95% CI: [0.6555-0.9378])	Melanoma vs non-melanoma
Top-1 sensitivity	0.7773: 178/229 (95% CI: [0.6527-0.8868])	0.7379: 183/248 (95% CI: [0.6093-0.8473])	Melanoma vs non-melanoma
Top-1 specificity	0.6238: 63/101 (95% CI: [0.4130-0.8679])	0.8054: 178/221 (95% CI: [0.6941-0.9254])	Melanoma vs non-melanoma
Top-3 precision	0.7356: 217/295 (95% CI: [0.5600-0.8864])	0.6418: 224/349 (95% CI: [0.4920-0.7758])	Melanoma vs non-melanoma
Top-3 sensitivity	0.9476: 217/229 (95% CI: [0.8967-0.9864])	0.9032: 224/248 (95% CI: [0.8230-0.9617])	Melanoma vs non-melanoma
Top-3 specificity	0.2277: 23/101 (95% CI: [0.0384-0.4828])	0.4344: 96/221 (95% CI: [0.3067-0.5777])	Melanoma vs non-melanoma
Top-5 precision	0.7111: 224/315 (95% CI: [0.5389-0.8639])	0.5914: 233/394 (95% CI: [0.4456-0.7188])	Melanoma vs non-melanoma
Top-5 sensitivity	0.9782: 224/229 (95% CI: [0.9541-0.9958])	0.9395: 233/248 (95% CI: [0.8836-0.9805])	Melanoma vs non-melanoma
Top-5 specificity	0.0990: 10/101 (95% CI: [0.0000-0.2400])	0.2715: 60/221 (95% CI: [0.1852-0.3768])	Melanoma vs non-melanoma
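
As an illustration of the top-K conversion described above, the following Python sketch turns a multi-class output into a binary melanoma decision and derives precision, sensitivity and specificity from it. The function names, class labels and toy probabilities are assumptions made for this example; they are not the device's interface or study data.

```python
def top_k_labels(probabilities, k):
    """Return the k class names with the highest predicted probability."""
    ranked = sorted(probabilities, key=probabilities.get, reverse=True)
    return ranked[:k]


def top_k_binary_metrics(predictions, truths, target, k):
    """Count an image as positive when `target` appears in its top-k classes."""
    tp = fp = tn = fn = 0
    for probs, truth in zip(predictions, truths):
        predicted_positive = target in top_k_labels(probs, k)
        actual_positive = truth == target
        if predicted_positive and actual_positive:
            tp += 1
        elif predicted_positive:
            fp += 1
        elif actual_positive:
            fn += 1
        else:
            tn += 1
    nan = float("nan")
    return {
        "precision": tp / (tp + fp) if tp + fp else nan,
        "sensitivity": tp / (tp + fn) if tp + fn else nan,
        "specificity": tn / (tn + fp) if tn + fp else nan,
    }


# Toy predictions over three hypothetical classes.
predictions = [
    {"melanoma": 0.6, "nevus": 0.3, "bcc": 0.1},
    {"melanoma": 0.2, "nevus": 0.5, "bcc": 0.3},
    {"melanoma": 0.1, "nevus": 0.7, "bcc": 0.2},
    {"melanoma": 0.5, "nevus": 0.4, "bcc": 0.1},
]
truths = ["melanoma", "melanoma", "nevus", "nevus"]

for k in (1, 3):
    print(k, top_k_binary_metrics(predictions, truths, "melanoma", k))
```

With only three classes, K = 3 makes every image positive, so sensitivity reaches 1.0 and specificity drops to 0.0. This is the same trade-off the table shows as K grows.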

The effect of "K"

As we increase K, top-K sensitivity increases and top-K specificity decreases. This is expected: looking at the top-K predicted classes gives the "Malignant melanoma" class more chances to appear, which increases sensitivity (more correct detections) but reduces specificity (more false positives). Our top-1 results already show a good balance between sensitivity and specificity.

Additionally, we computed the AUC for melanoma detection using each image's predicted probabilities for the "Malignant melanoma" class. After the extension of the sample, the score becomes excellent (AUC >= 0.80). Note that the score obtained with the data from the initial phase (0.79) is already very close to the objective (0.80).

Metric name	Value (initial)	Value (initial + extension)	Task
Melanoma AUC	0.7915 (95% CI: [0.6526-0.9181])	0.8482 (95% CI: [0.7629-0.9222])	Melanoma vs non-melanoma
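
The report does not state how the AUC was computed. One common way, sketched below in Python, is the rank-based (Mann-Whitney) formulation: the AUC is the fraction of positive/negative image pairs in which the positive image receives the higher score, with ties counted as one half. The function name and toy scores are assumptions for illustration only.

```python
def roc_auc(scores, labels):
    """Rank-based AUC: share of (positive, negative) pairs ranked correctly."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(positives) * len(negatives))


# Toy melanoma probabilities (1 = melanoma, 0 = non-melanoma).
scores = [0.9, 0.8, 0.4, 0.3, 0.2]
labels = [1, 0, 1, 0, 0]
print(round(roc_auc(scores, labels), 4))  # 0.8333
```

In the toy data, five of the six positive/negative pairs are ordered correctly, giving 5/6.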

Secondary analysis

Skin lesion recognition

In this analysis, the multi-class output of the device was used without any modification, and compared to the confirmed diagnoses. To compute the top-K accuracy, it was necessary to check if the correct diagnosis was within the top-K predictions of the device. Similarly to melanoma detection, increasing K leads to better accuracy metrics.

Metric name	Value (initial)	Value (initial + extension)	Task type
Top-1 image-level accuracy	0.6242: 206/330 (95% CI: [0.4903-0.7524])	0.5501: 258/469 (95% CI: [0.4488-0.6487])	Skin lesion classification
Top-3 image-level accuracy	0.8303: 274/330 (95% CI: [0.7170-0.9246])	0.7569: 355/469 (95% CI: [0.6673-0.8325])	Skin lesion classification
Top-5 image-level accuracy	0.9152: 302/330 (95% CI: [0.8242-0.9813])	0.8422: 395/469 (95% CI: [0.7652-0.9021])	Skin lesion classification
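
The top-K accuracy check described above can be sketched in a few lines of Python. The class names and probabilities are hypothetical. The toy example uses K = 1 and K = 2 because it has only three classes, whereas the study used K = 1, 3 and 5.

```python
def top_k_accuracy(predictions, truths, k):
    """Fraction of images whose confirmed diagnosis is among the top-k classes."""
    hits = 0
    for probs, truth in zip(predictions, truths):
        ranked = sorted(probs, key=probs.get, reverse=True)
        if truth in ranked[:k]:
            hits += 1
    return hits / len(truths)


# Toy multi-class outputs and confirmed diagnoses (not study data).
outputs = [
    {"melanoma": 0.6, "nevus": 0.3, "bcc": 0.1},
    {"melanoma": 0.2, "nevus": 0.5, "bcc": 0.3},
    {"melanoma": 0.1, "nevus": 0.7, "bcc": 0.2},
    {"melanoma": 0.5, "nevus": 0.4, "bcc": 0.1},
]
confirmed = ["melanoma", "bcc", "nevus", "nevus"]

print(top_k_accuracy(outputs, confirmed, 1))  # 0.5
print(top_k_accuracy(outputs, confirmed, 2))  # 1.0
```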

Malignancy prediction

Finally, malignancy AUC, sensitivity, specificity, PPV and NPV were computed using the malignancy probability (which is part of the output of the device) and comparing it to the confirmed diagnoses. As in the primary analysis, this requires working in a binary scenario: this was achieved by assigning a positive (or "malignant") label to all images with a diagnosis of a malignant skin lesion (not exclusively melanoma).

Metric name	Value (initial)	Value (initial + extension)	Task type
Malignancy AUC	0.8981 (95% CI: [0.7528-0.9810])	0.8983 (95% CI: [0.8430-0.9438])	Malignancy estimation

Regarding the sensitivity, specificity, and positive and negative predictive values (PPV/NPV), they were computed at several malignancy thresholds to better understand model performance. In terms of these metrics, we observed that the optimal malignancy threshold for sensitivity and specificity is different from the optimal one for PPV and NPV. Given the right threshold, it was possible to obtain sensitivity and specificity values above the objective. This was done for both phases (initial and initial+extension), obtaining the optimal thresholds of 0.22 and 0.43. The following table presents the metrics obtained for each malignancy threshold:

Threshold	Sensitivity (initial)	Specificity (initial)	PPV (initial)	NPV (initial)	Sensitivity (initial + extension)	Specificity (initial + extension)	PPV (initial + extension)	NPV (initial + extension)
0.00	1.0000	0.0000	0.8697	0.0000	1.0000	0.0000	0.6802	0.0000
0.05	0.9826	0.4419	0.9216	0.7917	0.9404	0.5667	0.8219	0.8173
0.10	0.9582	0.6047	0.9418	0.6842	0.9122	0.6600	0.8509	0.7795
0.15	0.9443	0.6977	0.9542	0.6522	0.8997	0.7200	0.8723	0.7714
0.20	0.9233	0.7442	0.9601	0.5926	0.8809	0.7667	0.8892	0.7516
0.25	0.9164	0.7674	0.9634	0.5789	0.8746	0.7933	0.9000	0.7484
0.30	0.8850	0.7674	0.9621	0.5000	0.8433	0.7933	0.8967	0.7041
0.35	0.8711	0.7674	0.9615	0.4714	0.8307	0.8133	0.9044	0.6932
0.40	0.8606	0.7674	0.9611	0.4521	0.8213	0.8467	0.9193	0.6902
0.45	0.8362	0.7674	0.9600	0.4125	0.7994	0.8600	0.9239	0.6684
0.50	0.8293	0.7674	0.9597	0.4024	0.7931	0.8600	0.9234	0.6615
0.55	0.8153	0.7674	0.9590	0.3837	0.7806	0.8600	0.9222	0.6482
0.60	0.8014	0.8140	0.9664	0.3804	0.7680	0.8733	0.9280	0.6390
0.65	0.7840	0.8140	0.9657	0.3608	0.7492	0.8733	0.9264	0.6209
0.70	0.7770	0.8140	0.9654	0.3535	0.7429	0.8867	0.9331	0.6186
0.75	0.7700	0.8372	0.9693	0.3529	0.7335	0.9000	0.9398	0.6136
0.80	0.7387	0.8372	0.9680	0.3243	0.7022	0.9000	0.9372	0.5870
0.85	0.6934	0.8605	0.9707	0.2960	0.6614	0.9133	0.9420	0.5592
0.90	0.6481	0.8605	0.9687	0.2681	0.6207	0.9333	0.9519	0.5364
0.95	0.5470	0.9535	0.9874	0.2398	0.5204	0.9733	0.9765	0.4883
1.00	0.0000	1.0000	0.0000	0.1303	0.0000	1.0000	0.0000	0.3198

The optimal malignancy threshold was chosen by maximising Youden's index, J = Sensitivity + Specificity - 1. For the resulting thresholds (0.22 for the initial phase and 0.43 for the initial + extension phase), we obtained the following malignancy estimation metrics:

Metric name	Value (initial)	Value (initial + extension)	Task type
Malignancy precision	0.9635: 264/274 (95% CI: [0.9115-1.0000])	0.9247: 258/279 (95% CI: [0.8556-0.9708])	Malignancy estimation
Malignancy sensitivity	0.9199: 264/287 (95% CI: [0.8676-0.9635])	0.8088: 258/319 (95% CI: [0.7175-0.8839])	Malignancy estimation
Malignancy specificity	0.7674: 33/43 (95% CI: [0.4737-1.0000])	0.8600: 129/150 (95% CI: [0.7723-0.9388])	Malignancy estimation
Malignancy PPV	0.9635: 264/274 (95% CI: [0.9115-1.0000])	0.9247: 258/279 (95% CI: [0.8556-0.9708])	Malignancy estimation
Malignancy NPV	0.5893: 33/56 (95% CI: [0.2391-0.8200])	0.6789: 129/190 (95% CI: [0.5427-0.8077])	Malignancy estimation
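
The threshold selection can be sketched as a sweep that evaluates Youden's index at each candidate threshold and keeps the maximum. The Python example below is a minimal sketch. It assumes an image is labelled malignant when its malignancy probability is greater than or equal to the threshold, which this report does not specify. The function names and toy data are likewise hypothetical.

```python
def sensitivity_specificity(scores, labels, threshold):
    """Binarise scores at `threshold` (score >= threshold means malignant)."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    tn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 0)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    return tp / (tp + fn), tn / (tn + fp)


def youden_optimal_threshold(scores, labels, thresholds):
    """Return (threshold, J) maximising J = sensitivity + specificity - 1."""
    best_threshold, best_j = None, float("-inf")
    for t in thresholds:
        sens, spec = sensitivity_specificity(scores, labels, t)
        j = sens + spec - 1
        if j > best_j:  # the first threshold reaching the maximum is kept
            best_threshold, best_j = t, j
    return best_threshold, best_j


# Toy malignancy probabilities (1 = malignant, 0 = benign).
mal_scores = [0.05, 0.2, 0.3, 0.45, 0.6, 0.8, 0.9]
mal_labels = [0, 0, 1, 0, 1, 1, 1]
candidates = [0.0, 0.25, 0.5, 0.75, 1.0]

print(youden_optimal_threshold(mal_scores, mal_labels, candidates))  # (0.5, 0.75)
```

Applied to the figures reported above, the 0.43 threshold gives J = 0.8088 + 0.8600 - 1 = 0.6688, and the 0.22 threshold of the initial phase gives J = 0.9199 + 0.7674 - 1 = 0.6873.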

Comparison to dermatologist performance

As a high percentage of images of the final sample had both a clinical diagnosis from a dermatologist and a pathological anatomy result (i.e. a biopsy), we compared the performance of the model to that of the dermatologists in that subset of 363 images. As most of the dermatologists' responses presented just one diagnosis (i.e. not a differential diagnosis), we could not compute top-K metrics for the dermatologists.

As dermatologists do not assess malignancy by predicting a probability (which would be out of the scope of the study), it was not possible to compute their corresponding melanoma and malignancy AUCs either.

Metric name	Value (initial)	Value (extension)	Task type
Dermatologist image-level accuracy	0.6394: 211/330 (95% CI: [0.4671-0.8018])	0.6281: 228/363 (95% CI: [0.4737-0.7772])	Skin lesion classification
Dermatologist precision	0.8486: 185/218 (95% CI: [0.6502-0.9911])	0.8444: 190/225 (95% CI: [0.6542-0.9834])	Melanoma vs non-melanoma
Dermatologist sensitivity	0.8079: 185/229 (95% CI: [0.6390-0.9482])	0.7950: 190/239 (95% CI: [0.6256-0.9299])	Melanoma vs non-melanoma
Dermatologist specificity	0.6733: 68/101 (95% CI: [0.3412-0.9775])	0.7177: 89/124 (95% CI: [0.4482-0.9655])	Melanoma vs non-melanoma

Metric name	Value (initial)	Value (extension)	Task type
Top-1 image-level accuracy	0.6242: 206/330 (95% CI: [0.4903-0.7524])	0.6006: 218/363 (95% CI: [0.4819-0.7193])	Skin lesion classification
Top-3 image-level accuracy	0.8303: 274/330 (95% CI: [0.7170-0.9246])	0.8072: 293/363 (95% CI: [0.6988-0.8933])	Skin lesion classification
Top-5 image-level accuracy	0.9152: 302/330 (95% CI: [0.8242-0.9813])	0.8871: 322/363 (95% CI: [0.7990-0.9534])	Skin lesion classification
Top-1 precision	0.8241: 178/216 (95% CI: [0.6592-0.9545])	0.8206: 183/223 (95% CI: [0.6648-0.9500])	Melanoma vs non-melanoma
Top-1 sensitivity	0.7773: 178/229 (95% CI: [0.6527-0.8868])	0.7657: 183/239 (95% CI: [0.6412-0.8786])	Melanoma vs non-melanoma
Top-1 specificity	0.6238: 63/101 (95% CI: [0.4130-0.8679])	0.6774: 84/124 (95% CI: [0.5136-0.8740])	Melanoma vs non-melanoma
Top-3 precision	0.7356: 217/295 (95% CI: [0.5600-0.8864])	0.6991: 223/319 (95% CI: [0.5401-0.8415])	Melanoma vs non-melanoma
Top-3 sensitivity	0.9476: 217/229 (95% CI: [0.8967-0.9864])	0.9331: 223/239 (95% CI: [0.8738-0.9776])	Melanoma vs non-melanoma
Top-3 specificity	0.2277: 23/101 (95% CI: [0.0384-0.4828])	0.2258: 28/124 (95% CI: [0.0635-0.4236])	Melanoma vs non-melanoma
Top-5 precision	0.7111: 224/315 (95% CI: [0.5389-0.8639])	0.6725: 230/342 (95% CI: [0.5167-0.8140])	Melanoma vs non-melanoma
Top-5 sensitivity	0.9782: 224/229 (95% CI: [0.9541-0.9958])	0.9623: 230/239 (95% CI: [0.9242-0.9895])	Melanoma vs non-melanoma
Top-5 specificity	0.0990: 10/101 (95% CI: [0.0000-0.2400])	0.0968: 12/124 (95% CI: [0.0120-0.2110])	Melanoma vs non-melanoma
Melanoma AUC	0.7915 (95% CI: [0.6526-0.9181])	0.8087 (95% CI: [0.6972-0.9149])	Melanoma vs non-melanoma

Dermatologists' diagnoses

The clinical diagnoses used in this study came from more than one dermatologist. In other words, patients from Hospital Universitario Basurto and Hospital Universitario Cruces were treated by more than one practitioner. This means that each case's clinical diagnosis is not a consensus between dermatologists but the assessment from a single expert. This analysis can be seen as a comparison between device performance and what a board of dermatologists would yield in everyday clinical practice.

Adverse Events and Adverse Reactions to the Product

Throughout the study, no adverse events or adverse reactions related to the investigated device have been observed. Participants have not experienced any negative reactions or side effects associated with the use of the device. This indicates a favourable safety profile of the investigated device in the context of this study.

Product Deficiencies

No deficiencies in the device have been observed during this study. As a result, no corrective actions have been deemed necessary. The device has demonstrated consistent performance per the study's objectives.

Subgroup Analysis for Special Populations

In the context of the analyzed pathologies, no special population subgroups were identified for this study. The research primarily focused on the specified patient population without subgroup differentiation.

Accounting for all subjects

A total of 105 subjects were included in the study. However, the heterogeneity of the data (class imbalance and image quality) limited the power of the analyses. Due to inconclusive diagnoses, 2 cases were excluded from the latest analysis. This ensures that all the images of the analysis present skin lesions that can be detected by the device.

Discussion and Overall Conclusions

Clinical Performance, Efficacy, and Safety performance

The device has demonstrated an excellent performance in terms of malignancy prediction, which turns it into a valuable tool to prioritise patients according to their risk of presenting malignancy. The AUC metric for the malignancy prediction was 0.8983, which is comparable to that of expert healthcare professionals (HCP).

For the task of melanoma detection, the device also showed an excellent performance in terms of AUC, which was 0.8482 (95% CI: [0.7629-0.9222]). Despite the complexity of the dataset, it was possible to achieve a balance between sensitivity and specificity: top-1 sensitivity was 0.7379: 183/248 (95% CI: [0.6093-0.8473]), and top-1 specificity was 0.8054: 178/221 (95% CI: [0.6941-0.9254]), which met the objective of the study.

Regarding skin lesion recognition in general terms, the Top-5 accuracy was 0.8422: 395/469 (95% CI: [0.7652-0.9021]), which supports the device's intended use as a clinical decision-support tool. Specifically in melanoma, the AUC was 0.8482 (95% CI: [0.7629-0.9222]), which is considerably high and indicates that the goals set out in the hypotheses of the study were achieved. On the downside, the Top-1 accuracy was 0.5501: 258/469 (95% CI: [0.4488-0.6487]) in the multiple ICD classification task, but the Top-3 accuracy increased to 0.7569: 355/469 (95% CI: [0.6673-0.8325]). However, it is important to keep in mind that Top-1 accuracy is not a relevant metric for this study or for the performance of the device, because the device is designed to always output at least the top five predicted classes. This is aligned with its intended purpose as a clinical decision-support tool.

Enhancing Image-Taking Skills Among Healthcare Professionals

The data gleaned from our study underscores the imperative need for targeted training programs aimed at enhancing the skills of HCPs in capturing high-quality clinical images. Proper training is paramount to ensure that the images taken in real-world clinical settings are of sufficient quality to yield accurate and reliable diagnostic outcomes. This aligns closely with the findings from our research, indicating that when HCPs are adept at taking good images, the real-world performance of diagnostic tools and assessments can be significantly improved, closely mirroring the positive results obtained in controlled research settings. Thus, investing in comprehensive training for HCPs on effective image-taking techniques stands out as a critical strategy for optimizing patient care and enhancing the overall efficiency of the healthcare system.

Additionally, we believe another factor that limits the results is that some malignant cases cannot be easily analyzed simply by observing the image. Indeed, some cases require a biopsy to ascertain a diagnosis, regardless of the experience of the observer. This is not a problem for the performance or the safety of the device because whenever there is a suspicion of melanoma, clinicians universally adhere to the protocol of conducting a biopsy to confirm a diagnostic suspicion. This established clinical practice is rooted in the fundamental understanding that the removal of a melanoma is a minor procedure compared to the significant risks associated with the disease. Consequently, practitioners will never rely solely on the device for information when it comes to identifying melanoma, ensuring a comprehensive and cautious approach to diagnosis and treatment.

Clinical Risks and Benefits

Participants in this study did not undergo any procedures that posed a risk to their safety.

Clinical Relevance

While most of the body of research in this area is focused on the use of computer vision for the classification of pigmented skin lesions[1], our device is capable of recognising a variety of ICD categories, including but not limited to pigmented skin lesions. Compared to the state-of-the-art [2][3], the device presents a comparable performance in terms of malignancy prediction, despite the limitations of the study's current image dataset.

Compared to other works such as that of Han et al.[2], our results in overall skin image recognition also demonstrate the potential of computer vision in dermatology.

Regarding the results obtained in this study, it is also important to highlight the high performance of the device: the AUC achieved supports its potential use in clinical triage, helping to prioritise patients if there is suspicion of malignancy [4]. In addition, the identification of malignant lesions, such as melanoma, can enhance referral efficiency to dermatology, reducing unnecessary consultations and optimising healthcare resources [5]. Early detection of skin cancer not only impacts treatment and survival outcomes [6], but may also lead to less aggressive treatment needs [7].

The device demonstrated strong performance in recognising skin lesions, achieving a Top-5 accuracy of 0.8422, and showed diagnostic capabilities comparable to those of dermatologists, particularly in melanoma detection, where it reached an AUC of 0.8482. These findings highlight the device's high capacity for identifying dermatological conditions, consistent with its intended use. Such performance has the potential to enhance diagnostic accuracy among healthcare professionals, especially in primary care settings, where long waiting times and suboptimal referral practices are common [8]. By supporting more accurate diagnoses, the device can contribute to more appropriate referral decisions, ensuring that patients with more serious conditions are prioritised for specialist care [8]. Earlier access to the right treatment not only improves patient outcomes but also leads to healthcare cost savings by reducing delays, unnecessary referrals, and avoidable complications. [9]

References

[1] Li, Ling-Fang, et al. "Deep learning in skin disease image recognition: A review." IEEE Access 8 (2020): 208264-208280.

[2] Han, Seung Seog, et al. "Augmented intelligence dermatology: deep neural networks empower medical professionals in diagnosing skin cancer and predicting treatment options for 134 skin disorders." Journal of Investigative Dermatology 140.9 (2020): 1753-1761. doi: 10.1016/j.jid.2020.01.019. (https://doi.org/10.1016/j.jid.2020.01.019).

[3] Haenssle, Holger A., et al. "Man against machine: diagnostic performance of a deep learning convolutional neural network for dermoscopic melanoma recognition in comparison to 58 dermatologists." Annals of Oncology 29.8 (2018): 1836-1842. doi: 10.1093/annonc/mdy166. (https://doi.org/10.1093/annonc/mdy166).

[4] Papachristou, P, et al. "Evaluation of an artificial intelligence-based decision support for the detection of cutaneous melanoma in primary care: a prospective real-life clinical trial". British Journal of Dermatology 191.1 (2024): 125-133. doi: 10.1093/bjd/ljae021. (https://doi.org/10.1093/bjd/ljae021).

[5] Marsden, H, et al. "Accuracy of an Artificial Intelligence as a medical device as part of a UK-based skin cancer teledermatology service". Frontiers in medicine 11:1302363 (2024). doi: 10.3389/fmed.2024.1302363. (https://doi.org/10.3389/fmed.2024.1302363).

[6] Jerant AF, et al. Early detection and treatment of skin cancer. Am Fam Physician. 2000 Jul 15;62(2):357-68.

[7] Schuldt K, et al. Skin Cancer Screening and Medical Treatment Intensity in Patients with Malignant Melanoma and Non-Melanocytic Skin Cancer. Dtsch Arztebl Int. 2023 Jan 20;120(3):33-39. doi: 10.3238/arztebl.m2022.0364. (https://doi.org/10.3238/arztebl.m2022.0364).

[8] Giavina-Bianchi M, et al. Teledermatology reduces dermatology referrals and improves access to specialists. EClinicalMedicine. 2020 Nov 21;29-30:100641. doi: 10.1016/j.eclinm.2020.100641. (https://doi.org/10.1016/j.eclinm.2020.100641).

[9] Eminovic, Nina, et al. "Teledermatologic consultation and reduction in referrals to dermatologists: a cluster randomized controlled trial". Archives of dermatology 145(5) (2009). doi: 10.1001/archdermatol.2009.44. (https://doi.org/10.1001/archdermatol.2009.44).

Specific Benefit or Special Precaution

Benefits

Diagnosis support of skin structures: The device can recognize a variety of ICD categories. This makes the device a diagnosis support tool that could potentially save time for healthcare professionals and lead to faster treatment for patients.
Longitudinal measure of disease progression: Despite not being tested in this study, the device can measure the severity of skin diseases, which can assist in monitoring the progression of the disease and the effectiveness of treatment.
Data Collection and Analysis: The device can provide a wide range of clinical data from the analyzed images. This can assist healthcare practitioners in their clinical evaluations and allow healthcare provider organizations to gather data and improve their workflows.

Precautions

Use on visible skin structures only: The device can only quantify clinical signs that are visible in a clinical or a dermoscopic image. This limitation should be considered when using the device.
Data demographics, diversity and bias: The device is trained on a large collection of dermatology images to ensure its functioning is stable among all types of populations and skin tones. However, despite the diversity of the data, there may still be some bias depending on the ICD category.
Image quality: As discussed in this study, visual quality plays a crucial role in the analysis of an image. The device already incorporates the DIQA^* algorithm to ensure all images it processes are of sufficient visual quality.
User Training: As the device is intended for use by health care professionals, adequate training should be provided to ensure correct use and interpretation of the results.

DIQA

Hernández Montilla, Ignacio, Taig Mac Carthy, Andy Aguilar, and Alfonso Medela. "Dermatology Image Quality Assessment (DIQA): Artificial intelligence to ensure the clinical utility of images for remote consultations and clinical trials." Journal of the American Academy of Dermatology 88, no. 4 (2023): 927-928.

Implications for Future Research

The device's image recognition processor is the result of a continuous improvement, not only of the deep learning model employed but also of the list of ICD categories it is capable of recognising.

Achieving more granularity and detail for the current taxonomy of ICD categories will lead to better results, even with this limited image dataset of suboptimal quality.

The overall image acquisition process will also be supervised more rigorously, so that the person taking the images frames the skin lesion properly and achieves good visual quality.

Limitations of Clinical Research

The main limitation of computer vision-based skin image recognition lies in the quantity and quality of the images collected. Variability in illumination, colour, shape, size and focus are determinants, in addition to the number of images per lesion. This means that a large variability within the same lesion (outliers) and an insufficient number of images to reflect that variability can result in an accuracy lower than expected. This has happened in this study, with extremely zoomed-out, blurry, out-of-frame, over- and underexposed images that limited the power of the analyses.

For this reason, the originally proposed size of 40 subjects was readjusted to 200, of which at least 40 (20%) should present cutaneous melanoma. At the study closure, 105 subjects had been recruited, with 36 cases of cutaneous melanoma (34.29%), exceeding the proposed ratio of melanoma cases. The impact of low-quality data was compensated by excluding low-quality images from the analyses (DIQA < 5).

After analyzing the data from the pilot study with the entire cohort of patients, the results obtained indicated that despite the good diagnostic capacity of the CADx system, the results of discrimination between melanoma and other pathologies were not representative of daily clinical practice, since it only and exclusively included cases with a high suspicion of malignancy. However, the results provide a good understanding of model performance on challenging cases like the ones included. After extending the study and including a wider variety of lesions, the performance was equally compelling and the dataset became more representative of daily clinical practice.

Ethical Aspects of Clinical Research

Investigators and Administrative Structure of Clinical Research

Brief Description

The clinical investigation team comprises highly esteemed dermatologists and a specialist in artificial intelligence. Dr. Jesús Gardeazabal García and Dr. Rosa María Izu Belloso serve as the Principal Investigators, affiliated with the Dermatology Services of Hospital Universitario Cruces and Hospital Universitario Basurto, respectively.

Collaborating on the study, we have Dr Juan Antonio Ratón Nieto and Dr Ana Sánchez Díez, both of whom are associated with the dermatology departments of Hospital Universitario Cruces and Hospital Universitario Basurto. Completing the team, Alfonso Medela represents AI Labs Group SL, bringing expertise in artificial intelligence to the clinical investigation, together with Andy Aguilar and Taig Mac Carthy.

This diverse and skilled team ensures a comprehensive approach to the clinical evaluation of the device, aiming to validate its safety, effectiveness, and performance in a real-world dermatological setting.

Investigators

Principal investigators

Dr. Jesus Gardeazabal (Osakidetza).
Dr. Rosa Mª Izu (Osakidetza).

Collaborators

Dr. Juan Antonio Ratón Nieto (Servicio de Dermatología, Hospital Universitario Cruces).
Dr. Ana Sánchez Díez (Servicio Dermatología, Hospital Universitario Basurto).
Alfonso Medela (AI Labs Group S.L.).
Andy Aguilar (AI Labs Group S.L.).
Taig Mac Carthy (AI Labs Group S.L.).

External Organization

No external organizations contributed to this research.

AI Labs Group S.L.

Signature meaning

The signatures for the approval process of this document can be found in the verified commits at the repository for the QMS. As a reference, the team members who are expected to participate in this document and their roles in the approval process, as defined in Annex I Responsibility Matrix of the GP-001, are:

Author: Team members involved
Reviewer: JD-003 Design & Development Manager, JD-004 Quality Manager & PRRC
Approver: JD-001 General Manager

