R-TF-015-006 Clinical investigation report Legit.Health_IDEI_2023
Research Title
Optimization of clinical flow in patients with dermatological conditions using Artificial Intelligence.
Description
Clinical study validation using Legit.Health to improve the clinical flow by improving the diagnosis and severity assessment. In this study patients with pigmented lesions, androgenic alopecia and acne from the dermatological clinic IDEI will be recruited. The study is divided into two parts: first recruitment and analysis of patients with pigmented lesions and androgenic alopecia; and the second recruitment of patients with acne and the final analysis.
Product identification
Information | |
---|---|
Device name | Legit.Health Plus (hereinafter, the device) |
Model and type | NA |
Version | 1.0.0.0 |
Basic UDI-DI | 8437025550LegitCADx6X |
Certificate number (if available) | MDR 792790 |
EMDN code(s) | Z12040192 (General medicine diagnosis and monitoring instruments - Medical device software) |
GMDN code | 65975 |
Class | Class IIb |
Classification rule | Rule 11 |
Novel product (True/False) | FALSE |
Novel related clinical procedure (True/False) | FALSE |
SRN | ES-MF-000025345 |
Promoter identification and contact
Manufacturer data | |
---|---|
Legal manufacturer name | AI Labs Group S.L. |
Address | Street Gran Vía 1, BAT Tower, 48001, Bilbao, Bizkaia (Spain) |
SRN | ES-MF-000025345 |
Person responsible for regulatory compliance | Alfonso Medela, María Diez, Giulia Foglia |
Email | office@legit.health |
Phone | +34 638127476 |
Trademark | Legit.Health |
Identification of the Clinical Investigation Plan (CIP)
- Title: Optimization of clinical flow in patients with dermatological conditions using Artificial Intelligence.
- Protocol code: Legit.Health_IDEI_2023.
- Study Design: Prospective observational study with longitudinal and retrospective case series.
- Product under investigation: Legit.Health
- Version and date: Version 12.0, date 27/12/2023
Public Access Database
Please note that the database used in this study is not publicly accessible due to privacy and confidentiality considerations.
Research team
Principal Investigators
- Dr. Miguel Sánchez Viera (Instituto de Dermatología Integral, IDEI)
Collaborating Investigators
- IDEI, Instituto de Dermatología Integral
  - Dr. Concetta D'Alessandro
  - Dr. Alejandra Capote
  - Dr. Pablo Lopez Andina
  - Dr. Allison Marie Bell-Smythe Sorg
  - Dr. Alejandra Vallejos
  - Dr. Isabel del Campo
  - Dr. Juliana Machado
  - Dr. Raúl Lucas Escobar
  - Beatriz Torres
- Legit.Health (AI Labs Group S.L.)
  - Alfonso Medela
  - Taig Mac Carthy
Investigational site
- Instituto de Dermatología Integral (IDEI)
Compliance Statement
The clinical investigation was performed according to the Clinical Investigation Plan (CIP) and other applicable guidances and regulations. This includes compliance with:
- Harmonized standard UNE-EN ISO 14155:2021
- Regulation (EU) 2017/745 on medical devices (MDR)
- Harmonized standard UNE-EN ISO 13485:2016
- Regulation (EU) 2016/679 (GDPR)
- Spanish Organic Law 3/2018 on the Protection of Personal Data and guarantee of digital rights.
All data processing within the device is carried out in accordance with the highest standards of data protection and privacy. Patient information is managed in an encrypted manner to ensure confidentiality and security.
The research team assumes the role of Data Controller, responsible for the collection and management of study data. Legit.Health acts as the Data Processor and is not involved in the processing of patient data.
The storage and transfer of data comply with European data protection regulations. At the conclusion of the study, all information stored in the device will be permanently and securely deleted.
The device employs robust technical and organizational security measures to safeguard personal data against unauthorized access, alteration, loss, or processing.
Report date
October 20, 2024
Report author(s)
The full name, the ID and the signature for the authorship, as well as the approval process of this document, can be found in the verified commits at the repository. This information is saved alongside the digital signature, to ensure the integrity of the document.
Table of contents
- Research Title
- Description
- Product identification
- Promoter identification and contact
- Identification of the Clinical Investigation Plan (CIP)
- Public Access Database
- Research team
- Compliance Statement
- Report date
- Report author(s)
- Table of contents
- Abbreviations and definitions
- Summary
- Introduction
- Materials and methods
- Results
- Discussion and overall Conclusions
- Ethical considerations
- Investigators and administrative structure of clinical research
- Report annexes
Abbreviations and definitions
- CAD: Computer-Aided Diagnosis
- CIP: Clinical Investigation Plan
- CUS: Clinical Utility Questionnaire
- SUS: System Usability Scale
- GCP: Standards of Good Clinical Practice
- ICH: International Conference of Harmonization
- PI: Principal Investigator
- DLQI: Dermatology Quality of Life Index
- AUC: Area Under the ROC Curve
Summary
This is an observational study, both prospective and retrospective, of a series of clinical cases designed to validate whether the device, powered by artificial intelligence, can effectively optimize the clinical workflow and assist in the management of dermatological patients. The study focuses on evaluating patients treated at IDEI, with an estimated inclusion of at least 120 patients, covering cases of pigmented lesions and androgenic alopecia. The primary objective is to assess the tool's impact on reducing the time and cost of patient care by enhancing diagnostic accuracy and appropriately assigning consultations.
Title
Optimization of clinical flow in patients with dermatological conditions using Artificial Intelligence.
Introduction
Image-based artificial intelligence (AI) holds great potential to enhance diagnostic accuracy in the medical field. During the COVID-19 pandemic, limited access to in-person healthcare services accelerated the adoption of telemedicine, highlighting the importance of AI in triage and decision-making to help professionals manage workloads and improve efficiency. In dermatology, common conditions like pigmented lesions, acne, and alopecia require significant resources for triage, clinical evaluation, and follow-up. AI tools can help reduce these demands and optimize workflows.
Advances in image recognition and AI have driven innovations in diagnosing various conditions, including skin disorders. Computer-Aided Diagnosis (CAD) systems and algorithm-based technologies have proven capable of classifying lesion images with expertise comparable to that of a skilled dermatologist.
The primary goal of this study is to validate whether the device improves clinical workflow efficiency and patient care by accurately diagnosing and determining the severity of lesions. This will reduce the need for in-person consultations and associated costs, ensuring patients are directed to the appropriate consultation types. Secondary objectives include reducing wait times for patients with varying degrees of medical urgency, decreasing the number of initial dermatology consultations, improving specialist satisfaction and patient usability, and indirectly benefiting the clinic economically.
This innovation presents a significant opportunity to enhance clinical practice in private clinics, particularly in managing patients with pigmented lesions. By optimizing care processes and clinical workflows, the device has the potential to positively impact the quality of life for patients with these dermatological conditions. Additionally, this advanced technology could facilitate the early detection of severe skin cancer cases, enabling more effective and timely treatment and follow-up for at-risk patients.
Objectives
Hypothesis
Legit.Health improves efficiency in clinical flow and patient care processes, facilitating care and reducing the need for face-to-face care per patient.
Primary objective
To validate that the device optimizes the clinical flow and patient care process, decreasing the time and cost of care per patient, through greater accuracy in medical diagnosis and determining the degree of malignancy or severity.
Secondary objectives
Secondary objectives focus on measuring the diagnostic performance of the device. More specifically:
- To demonstrate that the device improves the ability of healthcare professionals to detect malignant or suspected malignant pigmented lesions.
- Demonstrate that the device improves the ability and accuracy of healthcare professionals to measure the degree of involvement of patients with female androgenic alopecia.
- To demonstrate that the device improves the ability and accuracy of healthcare professionals to measure the degree of involvement of patients with acne.
Also, the study aims to measure the usefulness of this tool. More specifically:
- Automate the initial triage/assessment process in patients consulting for pigmented lesions.
- To evaluate the reduction in the use of healthcare resources by the centre by reducing the number of triage consultations and direct referrals to the appropriate consultation (aesthetic or dermatological).
- Evaluate the degree of usability of the device by the patient.
- Demonstrate that the device increases specialist satisfaction.
- Evaluate the reduction in the use of healthcare resources by reducing the number of triage consultations and directing the patient directly to the appropriate consultation, whether in the aesthetic or dermatological field.
Population
Adult patients (≥ 18 years) with skin pathologies seen at IDEI. These patients should be diagnosed with pigmented lesions, androgenetic alopecia or acne.
Design and methods
Design
This is a prospective observational study with both longitudinal and retrospective case series.
Number of subjects
Prospectively, a minimum of 60 cases will be included:
- 30 with pigmented lesions.
- 15 with androgenic alopecia.
- 15 with inflammatory acne.
Retrospectively, 60 patients with pigmented lesions, 15 with androgenic alopecia and 15 with inflammatory acne will be included.
The sample size was estimated based on the number of patients that the IDEI Dermatology Unit can care for. The data collected from these patients during the study period will be analyzed, and depending on the results obtained it will be assessed whether it is necessary to extend the sample size to include more patients.
By the time of the report, we have recruited:
- 76 retrospective patients with pigmented lesions (88 lesions).
- 32 prospective patients with pigmented lesions (42 lesions).
- 62 retrospective patients with androgenetic alopecia.
- 34 prospective patients with androgenetic alopecia.
By this date, no acne patients have been recruited yet.
Initiation date
February 2nd, 2024.
Completion date
The first part of the study concluded on August 23rd, 2024. The second part of the study is still pending completion.
Duration
This study estimates a recruitment period of 3 months.
The total duration of the study is estimated to be 6 months, including the previous time for retrospective analysis and the time required after the recruitment of the last subject for closing and editing the database, data analysis, and preparation of the final study report.
The total duration of the study for each participant with pigmented lesions will be 1-3 months. The duration for patients with acne and alopecia will be 1 day.
Methods
This study employed both a prospective and retrospective observational analytical design to assess whether the medical device can effectively improve the clinical workflow and assist in the management of dermatological patients. At the time of this report, the investigation included 204 patients with pigmented lesions or androgenic alopecia. Data collection included photograph analysis, severity assessment and the use of questionnaires. The study adhered to strict ethical guidelines, ensuring patient confidentiality and compliance with international standards. Patients were provided with detailed information and gave their informed consent. Python was used for the statistical analysis.
Results
We collected a substantial number of images from 108 patients with pigmented lesions and 96 patients with androgenetic alopecia, combining both retrospective and prospective data. Currently, patient recruitment for acne has not yet begun.
For pigmented lesions, 87.5% of the retrospective images were dermatoscopic and the rest were clinical, while all prospective images were clinical. For alopecia, all images were clinical, with no trichoscopic photographs included.
On the retrospective images, the medical device achieved an AUC of 0.76 in detecting lesion malignancy, while the dermatologists achieved an AUC of 0.79. On the same images, the medical device achieved a top-5 diagnostic accuracy of 0.47, while the dermatologists achieved a top-3 accuracy of 0.45. When the specific kind of nevus is not taken into account, the medical device achieves a top-5 accuracy of 0.78 and the dermatologists a top-3 accuracy of 0.70.
On the prospective images, we compared three configurations: dermatologists aided by the legacy medical device, the legacy medical device on its own, and the current medical device. For malignancy, they achieved an AUC of 0.94, 0.95, and 0.97 respectively. For diagnosis, the dermatologists aided by the legacy medical device achieved a top-1 accuracy of 0.30, while the legacy and the current medical devices achieved a top-5 accuracy of 0.44 and 0.52 respectively. When the specific kind of nevus is not taken into account, these accuracies increase to 0.85, 0.89, and 0.93 respectively.
For androgenetic alopecia, we collected 49 retrospective images in addition to 13 previously obtained. The optimized AI model showed a correlation of 0.77 on this earlier dataset. In the prospective test, 34 images were analyzed without parameter tuning, ensuring an unbiased evaluation of the algorithm's performance. The overall accuracy of the model was 47%, while the accuracy of the latest model optimized for FAA was 53%, based on the investigator's scores. This suggests that the device algorithm can still benefit from further data integration and model optimization.
Conclusions
The device's diagnostic capability in distinguishing malignancy is on par with expert dermatologists, not only in teledermatology but also in in-person consultations. This confirms its reliability as a screening tool for malignant ICD-11 categories, helping to prioritize patients based on urgency and direct them to the appropriate specialist or consultation.
Additionally, we observed a strong correlation in Ludwig scores, despite a decline in the prospective trial, which may be attributed to inconsistencies in criteria alignment.
Introduction
Image-based Artificial Intelligence (AI) holds great potential to enhance diagnostic accuracy in the medical field. During the COVID-19 pandemic, limited access to in-person healthcare services accelerated the adoption of telemedicine, highlighting the importance of AI in triage and decision-making to help professionals manage workloads and improve efficiency. In dermatology, common conditions like pigmented lesions, acne, and alopecia require significant resources for triage, clinical evaluation, and follow-up. AI tools can help reduce these demands and optimize workflows.
Advances in image recognition and AI have driven innovations in diagnosing various conditions, including skin disorders. Computer-Aided Diagnosis (CAD) systems and algorithm-based technologies have proven capable of classifying lesion images with expertise comparable to that of a skilled dermatologist.
This study evaluates the device, an AI tool developed by AI Labs Group S.L., which aims to optimize clinical workflows and patient care in dermatology. The tool automatically prioritizes urgent cases, assigns the appropriate consultation type (dermatological or aesthetic), enhances diagnostic accuracy, and detects malignant pigmented lesions. It also provides a visual record (photograph) for external experts to review.
The primary goal of this study is to validate whether the device improves clinical workflow efficiency and patient care by accurately diagnosing and determining the severity of lesions. This will reduce the need for in-person consultations and associated costs, ensuring patients are directed to the appropriate consultation type. Secondary objectives include reducing wait times for patients with varying degrees of medical urgency, decreasing the number of initial dermatology consultations, improving specialist satisfaction and patient usability, and indirectly benefiting the clinic economically.
This innovation presents a significant opportunity to enhance clinical practice in private clinics, particularly in managing patients with pigmented lesions. By optimizing care processes and clinical workflows, the device has the potential to positively impact the quality of life of patients with these dermatological conditions. Additionally, this advanced technology could facilitate the early detection of severe skin cancer cases, enabling more effective and timely treatment and follow-up for at-risk patients.
Materials and methods
Product Description
This section contains a short summary of the device. A complete description of the intended purpose, including device description, can be found in the record Legit.Health Plus description and specifications.
Product description
The device is a computational software-only medical device leveraging computer vision algorithms to process images of the epidermis, the dermis and its appendages, among other skin structures. Its principal function is to provide a wide range of clinical data from the analyzed images to assist healthcare practitioners in their clinical evaluations and allow healthcare provider organisations to gather data and improve their workflows.
The generated data is intended to aid healthcare practitioners and organizations in their clinical decision-making process, thus enhancing the efficiency and accuracy of care delivery.
The device should never be used to confirm a clinical diagnosis. On the contrary, its result is one element of the overall clinical assessment. Indeed, the device is designed to be used when a healthcare practitioner chooses to obtain additional information to consider a decision.
Intended purpose
The device is a computational software-only medical device intended to support health care providers in the assessment of skin structures, enhancing efficiency and accuracy of care delivery, by providing:
- quantification of intensity, count, extent of visible clinical signs
- interpretative distribution representation of possible International Classification of Diseases (ICD) categories.
Intended previous uses
No specific intended use was designated in prior stages of development.
Product changes during clinical research
The device maintained a consistent performance and features throughout the entire clinical research process. No alterations or modifications were made during this period.
Clinical Investigation Plan (CIP)
This is an observational study, both prospective and retrospective, of a series of clinical cases designed to validate whether the device, powered by artificial intelligence, can effectively optimize the clinical workflow and assist in the management of dermatological patients. The first part of the study focuses on evaluating patients treated at IDEI, with an estimated inclusion of at least 120 patients, covering cases of pigmented lesions and androgenic alopecia. The primary objective is to assess the tool's impact on reducing the time and cost of patient care by enhancing diagnostic accuracy and appropriately assigning consultations.
Objectives
The primary objective is to validate that the device enhances the clinical workflow and patient care process by improving diagnostic accuracy and reducing both the time and cost of care per patient. Secondary objectives include demonstrating the device's effectiveness in increasing healthcare professionals' ability to detect malignant pigmented lesions, assess the severity of female androgenic alopecia and acne, automate the initial triage process, reduce healthcare resource use through more efficient patient referrals, and increase specialist satisfaction and patient usability.
Design
This is an observational study, both prospective and retrospective, focusing on a series of clinical cases. The study does not include an active or control group, as it aims to evaluate the performance of the device in a real-world clinical setting. The assessment relies on photograph submissions through the device platform, with the study centred on analyzing these images. Additionally, retrospective images taken outside the device platform are also included and analyzed separately as part of the retrospective study.
Ethical considerations
The conduct of this study adhered to international Good Clinical Practice (GCP) guidelines, the Declaration of Helsinki in its latest amendment, and applicable international and national regulations. Approval from the relevant Ethics Committee was obtained prior to the initiation of the study. Any modifications to the protocol were reviewed and approved by the Principal Investigator (PI) and subsequently evaluated by the Ethics Committee before subjects were enrolled under a modified protocol.
This study was conducted in compliance with European Regulation 2016/679, of 27 April, concerning the protection of natural persons with regard to the processing of personal data and the free movement of such data (General Data Protection Regulation, GDPR), and Organic Law 3/2018, of 5 December, on the Protection of Personal Data and the guarantee of digital rights. In accordance with these regulations, no data enabling the personal identification of participants was collected, and all information was managed securely in an encrypted format.
Patients were informed both orally and in writing about all relevant aspects of the study, with the information being tailored to their level of understanding. They were provided with a copy of the informed consent form and the accompanying patient information sheet. Adequate time was given to patients to ask questions and fully comprehend the details of the study before providing their consent.
The Principal Investigator was responsible for the preparation of the informed consent form, ensuring it included all elements required by the International Conference on Harmonisation (ICH), adhered to current regulatory guidelines, and complied with the ethical principles of GCP and the Declaration of Helsinki.
The original signed informed consent forms were securely stored in a restricted access area under the custody of the PI. These documents remained at the research site at all times. Patients were provided with a copy of their signed consent form for their records.
Data quality assurance
The Principal Investigator is responsible for reviewing and approving the study protocol and its possible modifications in the future, signing the Principal Investigator's commitment, guaranteeing that the persons involved in the centre will respect the confidentiality of patient information and protect personal data, and reviewing and approving the final study report. All members of the research team will assess the eligibility of the study patients, inform and request written informed consent, collect the study source data in the clinical record and transfer them to the Data Collection Forms (DCF).
Subject population
The study enrolled patients that fulfilled the following criteria:
Inclusion criteria
- Patients aged 18 years or older (16 years in case of acne).
- Patients with pigmented lesions who meet any of the following conditions:
  - Who consult for the first time for any pigmented lesion.
  - Patients who have already had a dermoscopy appointment for the first time or a check-up of pigmented lesions.
- Patients with active inflammatory acne.
- Women with androgenic alopecia.
Exclusion criteria
- Patients who at the investigator's discretion cannot or will not comply with the study procedures.
The study will prospectively include a minimum of 60 cases: 30 with pigmented lesions, 15 with androgenic alopecia, and 15 with inflammatory acne. In the retrospective analysis, 60 patients with pigmented lesions, 15 with androgenic alopecia, and 15 with inflammatory acne will also be included.
Treatment
Patients in this study did not receive any specific treatment as part of the research protocol.
Concomitant medication/treatment
Patients continued their regular prescribed medications and treatments as directed by their primary healthcare providers. No additional medications or treatments were administered as part of this study.
Follow-Up Duration
This study did not require a follow-up of the subjects. Each patient's skin lesions were photographed only at the time of the visit.
Statistical analysis
For the evaluation of diagnostic performance, the pathological examination served as the gold standard across both retrospective and prospective studies. Several statistical techniques were employed to analyze and compare the performance of dermatologists and the medical device. Area Under the Curve (AUC) was calculated to assess firstly the diagnostic performance of dermatologists and the medical device in detecting malignancy, and secondly to compare the performance of dermatologists assisted by the legacy device, the legacy device and the new medical device. Sensitivity and specificity were also calculated.
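As an illustration of the AUC comparison described above, the following minimal sketch computes the AUC with the rank-based (Mann-Whitney) formulation in plain Python. The function name `auc_score` and the example labels and scores are hypothetical and are not taken from the study data; the report does not state which implementation was used.

```python
def auc_score(labels, scores):
    """AUC as the probability that a randomly chosen malignant case
    receives a higher score than a randomly chosen benign case.
    Ties count as one half."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical example: 1 = malignant on pathology, 0 = benign.
example_labels = [1, 0, 1, 0, 0]
example_scores = [0.9, 0.2, 0.4, 0.5, 0.1]
example_auc = auc_score(example_labels, example_scores)  # 5 of 6 pairs ranked correctly
```

The same function can be applied to each rater (dermatologists, legacy device, current device) against the pathology labels to obtain comparable values.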
For skin lesion recognition, top-K accuracy was calculated. This metric measures how often the correct diagnosis is among the top K predictions from the device or the dermatologists. It was calculated for dermatologists, the legacy device, and the new medical device. For this study, we set the value of K to 1, 3, and 5. Differences in diagnostic performance between retrospective and prospective studies were assessed, attributing improvements to the homogeneity of the prospective lesion dataset and the assistance provided by the medical device.
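The top-K accuracy described above can be sketched as follows. This is a minimal illustration, assuming each case comes with a list of predicted diagnoses ordered from most to least likely; the function name `top_k_accuracy` and the example diagnoses are hypothetical.

```python
def top_k_accuracy(true_labels, ranked_predictions, k):
    """Fraction of cases whose true diagnosis appears among the first k
    entries of that case's ranked prediction list."""
    if k < 1:
        raise ValueError("k must be at least 1")
    if not true_labels or len(true_labels) != len(ranked_predictions):
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(
        1
        for truth, ranked in zip(true_labels, ranked_predictions)
        if truth in ranked[:k]
    )
    return hits / len(true_labels)


# Hypothetical example with three cases.
truths = ["melanoma", "nevus", "seborrheic keratosis"]
predictions = [
    ["nevus", "melanoma", "basal cell carcinoma"],
    ["nevus", "lentigo", "melanoma"],
    ["nevus", "lentigo", "melanoma", "dermatofibroma", "seborrheic keratosis"],
]
accuracies = {k: top_k_accuracy(truths, predictions, k) for k in (1, 3, 5)}
```

In this toy example the accuracy rises with K, because a larger K gives each ranked list more chances to contain the true diagnosis.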
For the analysis of androgenic alopecia, a correlation analysis was performed to compare the investigator's Ludwig score and the algorithm developed to assess the severity of androgenic alopecia.
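The report does not state which correlation coefficient was used. As a sketch only, the example below assumes the Pearson coefficient and computes it in plain Python; the function name `pearson_correlation` and the example Ludwig scores are hypothetical.

```python
import math


def pearson_correlation(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length with at least 2 values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return covariance / math.sqrt(var_x * var_y)


# Hypothetical Ludwig grades (1 to 3) for five patients.
investigator_scores = [1, 2, 3, 2, 1]
algorithm_scores = [1, 2, 2, 3, 1]
example_correlation = pearson_correlation(investigator_scores, algorithm_scores)
```

Because the Ludwig scale is ordinal, a rank-based coefficient such as Spearman's could be an equally reasonable choice.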
Results
Initiation and completion date
The first part of the study started on 2024-02-02 and concluded on 2024-08-23. At the time of writing this report, it included 204 subjects. The study will conclude when the second part, which will include patients diagnosed with acne, is completed.
Subject and investigational product management
This study included 204 patients treated at IDEI: 76 retrospective patients with pigmented lesions (88 lesions), 32 prospective patients with pigmented lesions (42 lesions), 62 retrospective patients with androgenetic alopecia and 34 prospective patients with androgenetic alopecia. However, the study is not yet finished: a further 15 patients with acne are planned for prospective inclusion and 15 more for retrospective inclusion.
The investigational products were stored and handled following strict protocols. This included proper storage conditions, handling procedures, and documentation of product usage. The accountability and traceability of investigational products were rigorously maintained throughout the study.
Subject demographics
All participants in this study were from Spain and were Caucasian.
Clinical Investigation Plan (CIP) compliance
The study adhered to all aspects outlined in the CIP. This ensured that the research was conducted in accordance with established protocols, procedures, and ethical standards. Any deviations from the CIP were duly documented and appropriately addressed. The compliance with the CIP was rigorously monitored throughout the study to uphold the integrity and validity of the research findings.
Analysis
Pigmented lesions
Introduction
To validate the performance of the device in distinguishing between malignant and benign skin lesions, we conducted both retrospective and prospective studies.
Datasets
The dataset includes 88 images sourced from 76 distinct retrospective patients, of which 77 images are dermoscopic and 11 are clinical. Each lesion has only one image. Of the clinical images, 10 were manually cropped to focus on the lesion area and enhance the precision of the medical device analysis. Additionally, one extra retrospective clinical image, which falls outside the total set of 88, was excluded from this report due to ambiguity regarding which lesion within the image should be examined.
The dataset also includes 120 images of 42 lesions sourced from 32 different prospective patients. Each lesion has up to 3 images. All of these prospective images are clinical. For each prospective lesion, the dermatologist's recommendation regarding its excision is also provided.
Methodology
We used the ICD-11 categories to calculate the probability of malignancy by summing the probabilities of categories identified as malignant. This approach is based on the post-processing of the output from an image-based recognition model for visible ICD categories, rather than an independent algorithm.
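The post-processing step described above can be sketched as follows. This is a minimal illustration, assuming the recognition model returns a probability per category; the function name `malignancy_probability`, the category names and the probabilities are hypothetical and do not reflect the device's actual output format.

```python
def malignancy_probability(category_probabilities, malignant_categories):
    """Sum the predicted probabilities of the categories flagged as malignant."""
    return sum(
        probability
        for category, probability in category_probabilities.items()
        if category in malignant_categories
    )


# Hypothetical model output for one image (probabilities sum to 1).
example_output = {
    "Melanoma": 0.30,
    "Basal cell carcinoma": 0.15,
    "Melanocytic naevus": 0.50,
    "Seborrhoeic keratosis": 0.05,
}
# Hypothetical set of categories identified as malignant.
example_malignant = {"Melanoma", "Basal cell carcinoma"}

example_probability = malignancy_probability(example_output, example_malignant)  # about 0.45
```

Since the score is derived from the category distribution, any change in the recognition model's output directly changes the malignancy score without retraining a separate model.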
Malignancy scores were calculated for each retrospective and prospective image. Dermatologists diagnosed the cases, and those suspected of skin cancer were biopsied and confirmed through pathological examination, which served as the gold standard. Additionally, investigators assigned a suspicion score from 0 to 10 based on their clinical judgment. These suspicion scores, along with the diagnoses, were used to determine the sensitivity and specificity of the system.
As diagnoses from both dermatologists and pathological examinations, unlike those from the medical device, are presented in plain text, they do not necessarily adhere to the ICD-11 international classification standard. To enable comparison and analysis, these diagnoses have been manually translated into their closest matching ICD-11 categories, among those recognized by the medical device. However, there are cases where this translation may lack the necessary precision for a perfect match. For instance, a dermatologist's diagnosis of carcinoma may not align exactly with a pathological examination identifying squamous cell carcinoma. Although both are malignant, the two outcomes are counted as a mismatch in the diagnosis evaluation.
Androgenetic alopecia
Introduction
To estimate the performance of the device algorithm in predicting female androgenetic alopecia (FAA) by automatically computing the Ludwig score, two analyses were conducted:
- Retrospective analysis: This analysis utilized all 62 images provided for the initial retrospective study. These images were used to search for the best hyperparameters for the neural networks to extract the Ludwig score.
- Prospective analysis: This analysis involved 34 images set aside for prospective evaluation. These images were not used to tune the model, ensuring an unbiased assessment of the model's performance.
Datasets
The dataset comprises 96 images of patients with varying degrees of FAA, collected by expert dermatologists. The dataset is divided as follows:
- 13 images initially received.
- 49 images for retrospective analysis.
- 34 images for prospective analysis.
The first two groups were used to tune the device models for predicting the Ludwig score, and the third set was used for model evaluation.
Methodology
The algorithm designed to determine the Ludwig score is composed of three parts:
- Head cropper: Crops the area of the head from the image.
- Scalp and alopecia segmentation: Segments the total scalp and the part affected by alopecia.
- Ludwig score computation: Computes the Ludwig score.
Head Detector
A YOLO detector is employed to identify and predict the bounding box of the head in the input image, focusing on regions critical for estimating the severity of alopecia.
Scalp and alopecia segmentation
A ResNet50 encoder extracts features from the image, which are then passed to a decoder; together they form a UNet. This UNet segments the scalp and the areas of hair loss. The model was trained on large external datasets covering alopecia cases of varying degree, perspective, illumination, and resolution.
Ludwig score computation
After cropping and segmentation, the percentage of alopecia predicted by the model is calculated. The Ludwig score is derived from the alopecia percentage, which is computed from the segmentation pixel counts using the following equation (Equation 1):
$$\text{Alopecia percentage} = 100 \times \frac{N_{\text{scalp}} - N_{\text{hair}}}{N_{\text{scalp}}}$$
Where:
- $N_{\text{scalp}}$ is the total number of pixels that cover the scalp in the image.
- $N_{\text{hair}}$ is the number of pixels covered with hair.
The counts $N_{\text{scalp}}$ and $N_{\text{hair}}$ depend on the threshold used to convert the logits to their categorical prediction, which in turn affects the Ludwig score. The head cropper's hyperparameters also influence the pixel counts. To determine the optimal hyperparameters, we used two search methods: grid search and Bayesian optimization. Grid search explores the configuration space exhaustively, while Bayesian optimization uses a probabilistic model of the objective to concentrate the search on the most promising configurations.
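The dependence of the pixel counts on the binarization threshold, and an exhaustive search over that threshold, can be sketched as follows. This is a minimal illustration and not the device's implementation: the function names, the flat per-pixel logit layout, the sigmoid activation, the single shared threshold, and the scoring function are all assumptions made for the example.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def alopecia_percentage(scalp_logits, hair_logits, threshold=0.5):
    """Percentage of scalp pixels that are not covered with hair.

    Both inputs are flat lists of per-pixel logits (hypothetical layout).
    A pixel counts as scalp (or hair) when its sigmoid probability reaches
    `threshold`; hair is only counted inside the scalp region.
    """
    n_scalp = 0
    n_hair = 0
    for s, h in zip(scalp_logits, hair_logits):
        if sigmoid(s) >= threshold:
            n_scalp += 1
            if sigmoid(h) >= threshold:
                n_hair += 1
    if n_scalp == 0:
        return 0.0
    return 100.0 * (n_scalp - n_hair) / n_scalp


def grid_search(images, targets, thresholds, score_fn):
    """Try every threshold and return (best_threshold, best_score)."""
    best = None
    for t in thresholds:
        preds = [alopecia_percentage(s, h, t) for s, h in images]
        score = score_fn(preds, targets)
        if best is None or score > best[1]:
            best = (t, score)
    return best


def negative_mae(preds, targets):
    """Toy objective: higher is better, 0 is a perfect match."""
    return -sum(abs(p - t) for p, t in zip(preds, targets)) / len(preds)


# Toy image with five pixels: four on the scalp, one outside it.
scalp = [4.0, 4.0, 4.0, 4.0, -4.0]
hair = [3.0, 0.5, -3.0, -3.0, 3.0]

print(alopecia_percentage(scalp, hair, 0.5))  # → 50.0
print(alopecia_percentage(scalp, hair, 0.7))  # → 75.0
print(grid_search([(scalp, hair)], [75.0], [0.3, 0.5, 0.7], negative_mae)[0])  # → 0.7
```

In the toy example, raising the threshold from 0.5 to 0.7 drops one weakly predicted hair pixel and moves the alopecia percentage from 50 to 75, which illustrates why the threshold has to be tuned. The study reports maximizing the correlation with the investigator's score; the mean absolute error used here is only a stand-in objective.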
Results and discussion
Pigmented lesions
The evaluation of diagnostic performance for pigmented lesions relies on pathological examination as the gold standard. In the following, we present the findings from this evaluation, encompassing both retrospective and prospective studies.
Retrospective analysis
The malignancy results show similar diagnostic performance between the dermatologists and the medical device, with AUCs of 0.79 and 0.76 respectively. Dermatologists reached a sensitivity of 86% and a specificity of 36%, while the medical device recorded a sensitivity of 81% and a specificity of 52%. These values were derived using a threshold of 10, as recommended by the manufacturer. The lower specificity of the dermatologists reflects a more conservative tendency to flag lesions as malignant, which could lead to increased resource utilization in clinical dermatology practice.
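For readers who want to reproduce this kind of evaluation, the sketch below shows how sensitivity, specificity, and AUC can be computed from malignancy scores and biopsy-confirmed labels. It is an illustration under assumptions: the scores and labels are invented, the score scale is taken to be 0 to 100, and a case is treated as flagged when its score is greater than or equal to the threshold. The document does not specify these details.

```python
def sensitivity_specificity(scores, labels, threshold):
    """labels: 1 = malignant (biopsy-confirmed), 0 = benign.

    A case is flagged as malignant when score >= threshold (assumption).
    """
    pairs = list(zip(scores, labels))
    tp = sum(1 for s, y in pairs if y == 1 and s >= threshold)
    fn = sum(1 for s, y in pairs if y == 1 and s < threshold)
    tn = sum(1 for s, y in pairs if y == 0 and s < threshold)
    fp = sum(1 for s, y in pairs if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def auc(scores, labels):
    """AUC as the probability that a malignant case scores higher than a
    benign one, with ties counted as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Invented example data, not study data.
scores = [85, 40, 8, 30, 5, 12, 2]
labels = [1, 1, 1, 0, 0, 0, 0]

sens, spec = sensitivity_specificity(scores, labels, threshold=10)
print(round(sens, 2), round(spec, 2))  # → 0.67 0.5
print(round(auc(scores, labels), 2))   # → 0.83
```

The AUC is independent of the threshold, whereas sensitivity and specificity describe a single operating point, which is why both are reported.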
For the pathology diagnosis analysis, we discarded two samples from the set of 88 images that did not have valid results from the pathological or dermatology examination. Despite not achieving particularly high diagnostic accuracy, the analysis reveals comparable performance between dermatologists and the medical device, as shown in the table below. Note that, for this evaluation, dermatologists only provide up to 3 diagnosis results.
Source | Top-1 Accuracy | Top-3 Accuracy | Top-5 Accuracy |
---|---|---|---|
Dermatologist | 0.33 | 0.45 | -- |
Medical device | 0.23 | 0.38 | 0.47 |
A detailed evaluation of the diagnostic results reveals that 36 out of 86 samples (42%) correspond to different types of nevus. Among these, the dermatologists and the medical device incorrectly classify the specific type of nevus in 24 and 27 of the 36 cases, respectively. To provide a broader view of the diagnostic performance, we relaxed the evaluation criteria and counted any nevus diagnosis as correct when the ground truth is a nevus, irrespective of its specific type. With this generalized approach, the number of misclassifications among the nevus cases drops to 2 for the dermatologists and 0 for the medical device. This adjustment leads to a substantial improvement for both, with the medical device's top-5 accuracy exceeding the dermatologists' top-3 accuracy, the deepest ranking available for them.
Source | Top-1 Accuracy | Top-3 Accuracy | Top-5 Accuracy |
---|---|---|---|
Dermatologist | 0.56 | 0.70 | -- |
Medical device | 0.50 | 0.71 | 0.78 |
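The strict and relaxed top-k evaluations can be sketched as follows. This is a hypothetical illustration: the diagnosis strings, the `group_nevus` rule (any label containing the word "nevus" is collapsed into one class), and the function names are assumptions, and the real evaluation compares ICD-11 categories rather than free text.

```python
def top_k_accuracy(ranked_predictions, truths, k, normalize=lambda d: d):
    """Fraction of cases whose ground truth appears among the first k
    ranked predictions, after applying `normalize` to both sides."""
    hits = 0
    for preds, truth in zip(ranked_predictions, truths):
        candidates = {normalize(p) for p in preds[:k]}
        if normalize(truth) in candidates:
            hits += 1
    return hits / len(truths)


def group_nevus(diagnosis):
    """Collapse every nevus subtype into a single 'nevus' label
    (hypothetical rule for this sketch)."""
    label = diagnosis.lower()
    return "nevus" if "nevus" in label else label


# Invented example cases, not study data.
predictions = [
    ["Dysplastic nevus", "Melanoma", "Blue nevus"],
    ["Seborrheic keratosis", "Basal cell carcinoma"],
    ["Melanoma", "Compound nevus", "Lentigo"],
]
truths = ["Junctional nevus", "Basal cell carcinoma", "Lentigo"]

print(round(top_k_accuracy(predictions, truths, k=1), 2))               # → 0.0
print(round(top_k_accuracy(predictions, truths, k=3), 2))               # → 0.67
print(round(top_k_accuracy(predictions, truths, k=1, normalize=group_nevus), 2))  # → 0.33
print(round(top_k_accuracy(predictions, truths, k=3, normalize=group_nevus), 2))  # → 1.0
```

Because a source that lists at most three diagnoses has no fourth or fifth candidate, its top-5 accuracy is undefined rather than equal to its top-3 accuracy, which is why the tables show "--" in that column.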
Visually inspecting the images, we observe that most of them were captured using a dermatoscope, resulting in significantly higher image quality compared to standard smartphone photos. However, many images fail to centre the lesion of interest or are obscured by a substantial amount of hair covering the lesion. These artefacts can cause the medical device to focus more on analyzing healthy skin rather than the affected areas, potentially affecting its diagnostic accuracy. Addressing these issues during the image capture process could enhance the device's performance in clinical practice.
Lesion not centred in the image:
Lesion covered by hair:
Prospective analysis
The prospective analysis involves evaluating the performance of three distinct sources: dermatologists assisted by the legacy medical device, the legacy medical device on its own, and the new medical device.
The malignancy results, as shown in the table, demonstrate excellent diagnostic performance across the three sources of diagnosis. Notably, dermatologists exhibit strong malignancy detection when assisted by the legacy medical device. The medical device makes a more conservative diagnosis, leading to the identification of a greater number of malignant pathologies. Sensitivity and specificity statistics were calculated using a threshold of 10, as recommended by the manufacturer. However, as shown in Figure 2, this threshold can be adjusted for the medical device to enhance specificity at no sensitivity cost.
Source | AUC | Sensitivity@10 | Specificity@10 |
---|---|---|---|
Dermatologists + legacy device | 0.94 | 88 % | 85 % |
Legacy medical device | 0.95 | 100 % | 71 % |
Medical device | 0.97 | 100 % | 74 % |
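The idea of raising the threshold to gain specificity without losing sensitivity can be illustrated with a small search. The sketch below is an assumption-laden example: the scores and labels are invented, the score scale is taken to be 0 to 100, the candidate grid is arbitrary, and `tune_threshold` is a hypothetical helper rather than a documented feature of the device.

```python
def rates(scores, labels, threshold):
    """Return (sensitivity, specificity); a case is flagged when
    score >= threshold (assumption)."""
    tp = fn = tn = fp = 0
    for s, y in zip(scores, labels):
        flagged = s >= threshold
        if y == 1:
            tp += flagged
            fn += not flagged
        else:
            fp += flagged
            tn += not flagged
    return tp / (tp + fn), tn / (tn + fp)


def tune_threshold(scores, labels, baseline, candidates):
    """Pick the lowest candidate threshold that maximizes specificity
    while keeping sensitivity at least as high as at `baseline`."""
    base_sens, base_spec = rates(scores, labels, baseline)
    best_t, best_spec = baseline, base_spec
    for t in sorted(candidates):
        sens, spec = rates(scores, labels, t)
        if sens >= base_sens and spec > best_spec:
            best_t, best_spec = t, spec
    return best_t, best_spec


# Invented example data, not study data.
scores = [90, 70, 55, 35, 20, 12, 6, 3]
labels = [1, 1, 1, 0, 0, 0, 0, 0]

print(rates(scores, labels, 10))                             # → (1.0, 0.4)
print(tune_threshold(scores, labels, 10, range(10, 100, 5)))  # → (40, 1.0)
```

A threshold tuned this way is fitted to the evaluation data itself, so it would need to be confirmed on an independent set before being relied on.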
Of special relevance, the prospective evaluation results include the dermatologist's recommendation for either follow-up or removal of the lesion. As expected, this recommendation is closely aligned with the malignancy score computed by the medical device. A threshold can therefore be applied to the medical device's malignancy score to indicate when removal may be necessary, providing valuable support for the dermatologist's clinical decision-making. In our analysis, we found that with a malignancy threshold set at 0.40, the medical device predicts the need for removal with an accuracy of 90%.
For the evaluation of the pathology diagnosis, we discarded 15 of the 42 samples because they did not have a confirmed pathological examination, leaving 27 samples. On these, the accuracy of the dermatologists, when supported by the legacy medical device, is comparable to that of the device operating independently.
Source | Top-1 Accuracy | Top-3 Accuracy | Top-5 Accuracy |
---|---|---|---|
Dermatologists + legacy device | 0.30 | -- | -- |
Legacy medical device | 0.22 | 0.33 | 0.44 |
Medical device | 0.26 | 0.37 | 0.52 |
As in the retrospective study, a large share of the samples, 18 of 27 (67%), correspond to various types of nevus. In 60-80% of these cases, the specific type of nevus is misclassified. However, when the exact subtype is disregarded, none of the three sources (dermatologists aided by the legacy medical device, the legacy medical device, and the medical device) makes a diagnostic error on the nevus samples, which improves top-k accuracy. Notably, dermatologists aided by the legacy medical device nearly triple their top-1 accuracy (from 0.30 to 0.85), while the medical device achieves a top-1 accuracy (0.81) approaching that of the dermatologists.
Source | Top-1 Accuracy | Top-3 Accuracy | Top-5 Accuracy |
---|---|---|---|
Dermatologists + legacy device | 0.85 | -- | -- |
Legacy medical device | 0.74 | 0.85 | 0.89 |
Medical device | 0.81 | 0.93 | 0.93 |
The prospective evaluation results show that the current medical device matches or outperforms the legacy device on every reported metric (sensitivity is 100% for both). Additionally, both the dermatologists and the medical device perform better on prospective lesions than on retrospective ones. This disparity can be attributed to several factors. First, the prospective lesions come from a smaller, more homogeneous dataset, predominantly comprising well-known pathologies such as seborrheic keratosis, basal cell carcinoma, and nevus. In contrast, the retrospective lesions exhibit greater variability, encompassing pathologies like dermatofibroma, lentigo, and various carcinomas, which pose more diagnostic challenges. Second, in the prospective evaluation the dermatologists benefit from the assistance of the medical device, which in this case analyzes up to three images per lesion, whereas in the retrospective evaluation only a single image per lesion was available for analysis.
Androgenetic alopecia
Retrospective analysis
For the retrospective analysis, 49 images were collected in addition to 13 images previously received. Since the alopecia models were trained to predict the scalp area and the alopecia area, they could not be directly used to obtain the Ludwig score. Therefore, Equation 1 was designed to compute the Ludwig score from the device alopecia model. Hyperparameter tuning was done using grid search and Bayesian optimization, maximizing the correlation between the predicted grade and the investigator's score. The optimized model achieved a correlation of 0.77 on the previous dataset.
Prospective Analysis
The 34 images used for the prospective analysis were evaluated without tuning any model parameters, ensuring an unbiased assessment of the algorithm's performance. The results are presented in Table 1, comparing the model's predictions with the investigator's results. The overall accuracy of the model was 47%, while the accuracy of the latest model optimized for FAA, using the investigator's score as the ground truth, was 53%. This indicates that the device algorithm can still be improved by incorporating more data and continuously optimizing current models.
NHC | FileName | Ludwig score: Investigator | Ludwig score: LH newest algorithm | Ludwig score: LH algorithm | Alopecia percentage |
---|---|---|---|---|---|
25176 | AYUh87VKrBb | 1 | 1 | 3 | 4 |
69267 | z1QiXRY32xW | 1 | 1 | 0 | 19 |
69267 | JhErfsHvA5p | 1 | 1 | 0 | 20 |
69267 | MEBRrTgpMr7 | 1 | 1 | 0 | 20 |
69267 | DEicMHFj1Ah | 1 | 1 | 0 | 22 |
69267 | rSpfwyy93hE | 1 | 1 | 0 | 22 |
69267 | f8HwKf6DBkC | 1 | 2 | 0 | 26 |
44891 | Mecdm6xSspk | 1 | 2 | 2 | 27 |
69267 | SLijRgf93jA | 1 | 2 | 0 | 27 |
44891 | jqJLrgdoL1P | 1 | 2 | 2 | 29 |
44891 | PQ9PAuYXGfG | 1 | 2 | 2 | 30 |
44891 | n9fcxw3GrCa | 1 | 2 | 2 | 32 |
109847 | 1huKjxoFbe5 | 1 | 3 | 3 | 47 |
51537 | zVTiAobQg8H | 1 | 3 | 3 | 66 |
90908 | 51de3pWwMsQ | 2 | 1 | 2 | 19 |
90908 | z48L66dcGLP | 2 | 1 | 2 | 24 |
60024 | De2LYbvQ3pD | 2 | 1 | 2 | 25 |
54272 | G3Q5A7G1ujD | 2 | 2 | 2 | 25 |
90908 | uQPm9gUSKEp | 2 | 2 | 2 | 28 |
58554 | HKUyjyhNt4r | 2 | 2 | 2 | 28 |
58554 | wbDdjhK9V7V | 2 | 2 | 2 | 30 |
31798 | JBGtD9eD7qw | 2 | 2 | 3 | 33 |
87139 | bxguczSGLzk | 2 | 2 | 3 | 33 |
119023 | ihRxxo4GX3u | 2 | 2 | 2 | 34 |
39877 | CxGjaJxS13h | 2 | 2 | 2 | 35 |
118294 | avWyvdqVLwA | 2 | 2 | 2 | 35 |
90908 | RaYS75i5i5U | 2 | 2 | 2 | 36 |
52669 | PR26K8s3dAW | 2 | 3 | 3 | 45 |
88229 | feoUsREEq7e | 2 | 3 | 3 | 46 |
58554 | 1Ud2duBk3bS | 2 | 3 | 2 | 53 |
31219 | m3wNt42aEwg | 3 | 2 | 3 | 35 |
117484 | 5aGL8DkosRJ | 3 | 2 | 3 | 40 |
108456 | T5aXmVYwSZ8 | 3 | 3 | 3 | 61 |
30810 | Vb4eoyRXUZz | 3 | 3 | 3 | 91 |
Table 1: Results of the predicted grade using the device algorithm and the investigator's score assigned to each image.
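The accuracy figures and the confusion matrix discussed in this section can be recomputed directly from Table 1. The sketch below transcribes the three Ludwig score columns of the table (investigator, newest algorithm, previous algorithm) in row order; the variable and function names are illustrative choices, not part of the device.

```python
from collections import Counter

# (investigator, newest algorithm, previous algorithm), in the row order of Table 1.
ROWS = (
    [(1, 1, 3)] + [(1, 1, 0)] * 5
    + [(1, 2, 0), (1, 2, 2), (1, 2, 0)] + [(1, 2, 2)] * 3 + [(1, 3, 3)] * 2
    + [(2, 1, 2)] * 3 + [(2, 2, 2)] * 4 + [(2, 2, 3)] * 2 + [(2, 2, 2)] * 4
    + [(2, 3, 3)] * 2 + [(2, 3, 2)]
    + [(3, 2, 3)] * 2 + [(3, 3, 3)] * 2
)


def accuracy(rows, column):
    """Share of rows where the prediction in `column` equals the
    investigator's grade (column 0)."""
    return sum(1 for r in rows if r[column] == r[0]) / len(rows)


def confusion(rows, column):
    """Counts keyed by (investigator grade, predicted grade)."""
    return Counter((r[0], r[column]) for r in rows)


acc_newest = accuracy(ROWS, 1)
acc_previous = accuracy(ROWS, 2)
matrix = confusion(ROWS, 1)

print(len(ROWS))                         # → 34
print(f"{round(100 * acc_newest)}%")     # → 53%
print(f"{round(100 * acc_previous)}%")   # → 47%
print(matrix[(1, 2)], matrix[(1, 3)], matrix[(3, 2)])  # → 6 2 2
```

The counts correspond to 18 of 34 exact matches for the newest algorithm and 16 of 34 for the previous one.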
To illustrate the outcomes, we present examples for each grade:
Grade 1 examples
Three examples with Grade 1 from the investigator.
Grade 2 examples
Three examples with Grade 2 from the investigator.
Grade 3 examples
Three examples with Grade 3 from the investigator.
Confusion matrix and correlation
The confusion matrix shows that the primary mismatch occurs between Grade 1 and Grade 2. The model predicted Grade 2 in 6 of the 14 cases that the investigator scored as Grade 1. Additionally, 50% of the investigator's Grade 3 scores (2 of 4) were predicted as Grade 2 by the model. No case scored as Grade 3 by the investigator was predicted as Grade 1, and only 2 of the 14 cases scored as Grade 1 by the investigator were predicted as Grade 3.
The correlation analysis shows that the investigator's score correlates more strongly with the alopecia percentage (0.50) than with the predicted grade (0.34). This suggests that the alopecia percentage predicted by the model is more closely aligned with the investigator's score than the categorical grade, likely because information is lost when the continuous alopecia percentage is converted to a categorical label. This is consistent with the observed confusion matrix: a small change in the alopecia percentage near a grade boundary can shift the final grade by one degree.
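The information loss caused by discretization can be illustrated with a short sketch. Everything here is an assumption made for the example: the document does not state which correlation coefficient was used (Pearson is shown), and the grade cut-offs of 25% and 42% in `to_grade` are hypothetical values chosen only to demonstrate the effect.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


def to_grade(percentage, cutoffs=(25.0, 42.0)):
    """Map an alopecia percentage to a grade from 1 to 3.
    The cut-offs are hypothetical, not the device's values."""
    return 1 + sum(percentage >= c for c in cutoffs)


# Two nearly identical percentages fall on opposite sides of a cut-off.
print(to_grade(24.9), to_grade(25.1))  # → 1 2

# Invented scores: compare the correlation before and after discretization.
investigator = [1, 1, 2, 2, 3, 3]
percentages = [10.0, 24.0, 26.0, 41.0, 43.0, 80.0]
grades = [to_grade(p) for p in percentages]
print(pearson(investigator, percentages), pearson(investigator, grades))
```

A difference of 0.2 percentage points changes the grade from 1 to 2 in this example, which is the kind of boundary effect that can lower agreement on the categorical grade even when the underlying percentage is close.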
Confusion matrix between the model predictions and the GT:
Correlation between the model predictions and the GT:
Adverse events and adverse reactions to the product
Throughout the study, no adverse events or adverse reactions related to the investigational product have been observed. Participants have not experienced any negative reactions or side effects associated with the use of the product. This indicates a favourable safety profile of the investigational product in the context of this study.
Product deficiencies
No deficiencies in the product were observed throughout this study. As a result, no corrective actions have been deemed necessary. The product has demonstrated consistent performance in accordance with the study's objectives.
Subgroup analysis for special populations
In the context of the analyzed pathologies, no special population subgroups were identified for this study. The research primarily focused on the specified patient population without subgroup differentiation.
Accounting for all subjects
A total of 120 patients with pigmented lesions, androgenic alopecia and acne were initially considered for inclusion in this study.
However, since the study has not yet concluded, 202 individuals who met the specified eligibility criteria have been included so far. The second part of the study will include patients with acne.
Discussion and overall Conclusions
Clinical performance, effectiveness, and safety
The medical device performed at a level comparable to that of expert dermatologists in malignancy detection and pathology diagnosis, in both the retrospective and the prospective analysis. This performance was achieved despite the inherent bias in the dataset, which only includes lesions deemed suspicious enough to warrant a biopsy.
The device algorithms demonstrate moderate accuracy in predicting the Ludwig score for FAA. The overall accuracy was 47% in the prospective analysis, improving to 53% with the latest model. Predicted grades rarely differ by two grades from the investigator's score, and the correlation between the alopecia percentage and the investigator's score is 0.50. These results indicate the potential of the device as a tool for estimating the Ludwig score for FAA. In addition, expanding the dataset and incorporating more diverse image samples could enhance the model's robustness and generalizability.
Limitations of clinical research
Several factors reduce the accuracy of predicting the Ludwig score:
- Pictures taken from an angle that is not perpendicular to the top of the head, leading to confusion between the front of the head and areas affected by alopecia.
- Hands positioned on the sides of the head to hold the hair during picture collection, sometimes mistaken for areas of alopecia.
- The ground truth (GT) is based on the annotation of a single specialist. To eliminate bias and increase reliability, a GT based on multiple specialists would be recommended. A variability test would also provide valuable insight into the interpretability of the model performance.
Regarding malignancy detection and diagnosis of pigmented lesions, the main limitation is the image quality and clinical utility of the retrospective pictures. This limitation was addressed in the prospective study.
Clinical risks and benefits
Participants in this study did not undergo any procedures posing a risk to their safety. In terms of benefits, using the device could optimize patient diagnosis, save costs and time, and provide better treatment to patients.
Clinical relevance
The device represents a significant advancement in the field of dermatology. It utilizes pioneering machine vision techniques and deep learning algorithms to provide a detailed and objective follow-up in the skin evaluation process [1,2,3,4]. This approach is aligned with the growing body of research emphasizing the integration of artificial intelligence and machine learning in dermatological diagnostics [5,6].
Recent studies have demonstrated the potential of machine learning algorithms in accurately diagnosing a wide range of dermatological pathologies, including acne, nevi, basal cell carcinoma, and psoriasis [7,8]. Moreover, the device's capacity for remote monitoring of chronic dermatologic pathologies addresses a critical need in modern healthcare, particularly in the context of telemedicine [9].
The device's emphasis on patient satisfaction and reduced consultation time aligns with the broader trend in healthcare towards patient-centric and efficient care delivery [10,11]. Additionally, the absence of adverse events or reactions observed in this study underscores the favourable safety profile of the device, in line with current standards for medical device safety [12].
Compared with other tools, the device distinguishes itself by providing a comprehensive solution that combines diagnostic support with effective pathology tracking. While some existing tools focus primarily on diagnostic accuracy, the device's dual functionality enhances its clinical utility and potential impact on patient care [13,14].
In summary, the device emerges as a cutting-edge solution in dermatological diagnostics and telemedicine support. Its integration of machine learning algorithms, patient-centred approach, and favourable safety profile position it at the forefront of advancements in dermatology technology.
Specific benefit or special precaution
Benefits:
- The device allows the diagnosis of a large set of skin lesions automatically from digital images.
- Automated diagnosis provides quick feedback to the medical practitioner, easing and speeding up their practice.
- Diagnosis insights help to optimise the referrals and teledermatology, reducing the waiting lists and the subsequent cost, and improving the treatment and experience of the patient.
- The device can also evaluate the severity of different diseases, which can assist in monitoring the progression of the disease and the effectiveness of treatment, as well as saving time for the medical practitioner.
Precautions:
- The device must be used as a clinical support and not to replace the expertise of the medical practitioner.
- The device can only analyze visible lesions and provide insight into a closed set of skin lesions. Skin lesions not learnt by the device cannot be diagnosed.
- Low-quality images can lead to a poor diagnosis. To ensure image quality and provide feedback on an image's usefulness, the device incorporates the DIQA algorithm [11].
Implications for future research
The study's positive results suggest several promising directions for future research. For instance, standardizing the automatic Ludwig score could greatly benefit clinical trials by providing a reliable and consistent method for assessing severity. If proven to be a stable and effective tool, it could significantly enhance measurement accuracy. Additionally, evaluating the device's performance with images taken by patients at home could expand its potential applications in detecting malignancies. While these advancements would still require medical oversight, they could improve workflow efficiency by reducing the need for constant supervision.
Limitations of clinical research
The main limitation of machine learning in this context is the quantity and quality of the images collected. Factors such as lighting, colour, shape, size, and focus, as well as the number of images per patient, all play a crucial role. High variability within the same patient and an insufficient number of images to capture this variability can reduce accuracy.
In this study, a specific challenge is analyzing retrospective images, which often suffer from poor quality and may need to be discarded. Unlike prospective studies, which benefit from the Dermatology Image Quality Assessment (DIQA) algorithm that filters out low-quality images, retrospective images may not meet these standards. Additionally, even when past images are of good quality, they are often taken without adhering to the medical device's Instructions for Use (IFU), which can limit the effectiveness of the AI in fully utilizing these images.
Ethical considerations
The conduct of this study adhered to international Good Clinical Practice (GCP) guidelines, the Declaration of Helsinki in its latest amendment, and applicable international and national regulations. Approval from the relevant Ethics Committee was obtained prior to the initiation of the study. Any modifications to the protocol were reviewed and approved by the Principal Investigator (PI) and subsequently evaluated by the Ethics Committee before subjects were enrolled under a modified protocol.
This study was conducted in compliance with European Regulation 2016/679, of 27 April, concerning the protection of natural persons with regard to the processing of personal data and the free movement of such data (General Data Protection Regulation, GDPR), and Organic Law 3/2018, of 5 December, on the Protection of Personal Data and the guarantee of digital rights. In accordance with these regulations, no data enabling the personal identification of participants was collected, and all information was managed securely in an encrypted format.
Patients were informed both orally and in writing about all relevant aspects of the study, with the information being tailored to their level of understanding. They were provided with a copy of the informed consent form and the accompanying patient information sheet. Adequate time was given to patients to ask questions and fully comprehend the details of the study before providing their consent.
The Principal Investigator was responsible for the preparation of the informed consent form, ensuring it included all elements required by the International Conference on Harmonisation (ICH), adhered to current regulatory guidelines, and complied with the ethical principles of GCP and the Declaration of Helsinki.
The original signed informed consent forms were securely stored in a restricted access area under the custody of the PI. These documents remained at the research site at all times. Patients were provided with a copy of their signed consent form for their records.
Investigators and administrative structure of clinical research
Brief description
The clinical investigation team is comprised of highly respected dermatologists. Dr Sánchez Viera, with over 15 years of experience, is a leading expert in dermatology, particularly in Skin Cancer and Cutaneous Aesthetics. He has earned international recognition in these fields and has worked in both major public and private hospitals, including Gregorio Marañón Hospital in Madrid, where he headed the Skin Cancer and Dermatological Surgery department. Currently, he is the founder and director of the Instituto de Dermatología Integral (IDEI) in Madrid and collaborates with several private hospitals. Dr Sánchez Viera also coordinates the Spanish Group of Aesthetic and Therapeutic Dermatology (GEDET) within the Spanish Academy of Dermatology. He has taught at the Complutense University of Madrid and regularly lectures at global courses and congresses. His extensive publication record includes numerous articles in national and international journals, and he serves on the editorial boards of several of these publications. Additionally, he is actively involved in numerous scientific associations and their steering committees.
Dr Sánchez Viera's team at IDEI includes several esteemed dermatologists: Dr Concetta D'Alessandro, Dr Alejandra Capote, Dr Pablo Lopez Andina, Dr Allison Marie Bell-Smythe Sorg, Dr Alejandra Vallejos, Dr Isabel del Campo, Dr Juliana Machado, and Dr Raúl Lucas Escobar.
The team also includes Alfonso Medela from AI Labs Group S.L., who provides crucial expertise in artificial intelligence, alongside Taig Mac Carthy. This diverse and skilled team ensures a thorough approach to evaluating the device's safety, effectiveness, and performance in real-world dermatological settings, including the Sodupe-Güeñes, Balmaseda, Buruaga, and Zurbarán Health Centers.
Investigators
Principal investigator
- Dr Miguel Sánchez Viera
Collaborators
- Dr Concetta D'Alessandro (IDEI)
- Dr Alejandra Capote (IDEI)
- Dr Pablo Lopez Andina (IDEI)
- Dr Allison Marie Bell-Smythe Sorg (IDEI)
- Dr Alejandra Vallejos (IDEI)
- Dr Isabel del Campo (IDEI)
- Dr Juliana Machado (IDEI)
- Dr Raúl Lucas Escobar (IDEI)
- Beatriz Torres (IDEI)
- Alfonso Medela (AI Labs Group S.L.)
- Taig Mac Carthy (AI Labs Group S.L.)
Centers
- IDEI centro dermatológico
External organization
No additional organizations, beyond those previously mentioned, contributed to the clinical research. The study was conducted with the collaboration and resources of the specified entities.
Promoter and monitor
- Legit.Health ®
- AI Labs Group S.L.
- Gran Vía 1, BAT Tower, 48001 Bilbao, Bizkaia, Spain
Report annexes
- Ethics Committee resolution can be found in the document CEIm_Legit.Health_IDEI_2023.pdf.
- Instructions For Use (IFU) can be found in the protocol.
Signature meaning
The signatures for the approval process of this document can be found in the verified commits at the repository for the QMS. As a reference, the team members who are expected to participate in this document and their roles in the approval process, as defined in Annex I Responsibility Matrix of the GP-001, are:
- Author: Team members involved
- Reviewer: JD-003, JD-004
- Approver: JD-001