R-TF-015-006 Clinical Investigation Report

Research Title​

Evaluation of AIHS4 Performance in the M-27134-01 Clinical Trial for Hidradenitis Suppurativa

Product Identification​

Information
  • Device name: Legit.Health Plus (hereinafter, the device)
  • Model and type: NA
  • Version: 1.1.0.0
  • Basic UDI-DI: 8437025550LegitCADx6X
  • Certificate number (if available): MDR 792790
  • EMDN code(s): Z12040192 (General medicine diagnosis and monitoring instruments - Medical device software)
  • GMDN code: 65975
  • EU MDR 2017/745: Class IIb
  • EU MDR Classification rule: Rule 11
  • Novel product (True/False): FALSE
  • Novel related clinical procedure (True/False): FALSE
  • SRN: ES-MF-000025345

Promoter Identification and Contact​

Manufacturer data
  • Legal manufacturer name: AI Labs Group S.L.
  • Address: Street Gran Vía 1, BAT Tower, 48001, Bilbao, Bizkaia (Spain)
  • SRN: ES-MF-000025345
  • Person responsible for regulatory compliance: Alfonso Medela, Saray Ugidos
  • E-mail: office@legit.health
  • Phone: +34 638127476
  • Trademark: Legit.Health

Identification of Sponsors​

  • Sponsor: AI Labs Group S.L.

Clinical Investigation Plan (CIP) Identification​

  • Title: Evaluation of AIHS4 Performance in the M-27134-01 Clinical Trial for Hidradenitis Suppurativa
  • Protocol Code: Legit.Health_AIHS4_2025
  • Study Design: Retrospective observational and longitudinal study
  • Product Under Investigation: Legit.Health Plus
  • Version and Date: Version 1.0, 2025-02-19

Public Access Database​

The database used in this study is not publicly accessible due to privacy and confidentiality considerations.

Research Team​

Principal Investigator​

  • Dr. Antonio Martorell Calatayud

Collaborators​

  • Dr. Gema Ochando
  • AI Labs Group S.L.
    • Mr. Alfonso Medela
    • Mr. Victor Gisbert
    • Mrs. Alba Rodríguez

Centre​

The study was conducted remotely based on clinical trial image evaluations.

Compliance Statement​

The clinical investigation was performed according to the Clinical Investigation Plan (CIP) and other applicable guidance and regulations. This includes compliance with:

  • Harmonized standard UNE-EN ISO 14155:2021
  • Regulation (EU) 2017/745 on medical devices (MDR)
  • Harmonized standard UNE-EN ISO 13485:2016
  • Regulation (EU) 2016/679 (GDPR).
  • Spanish Organic Law 3/2018 on the Protection of Personal Data and guarantee of digital rights.

All data processing within the device is carried out in accordance with the highest standards of data protection and privacy. Patient information is managed in an encrypted manner to ensure confidentiality and security.

The research team assumes the role of Data Controller, responsible for the collection and management of study data. Legit.Health acts as the Data Processor and is not involved in the processing of patient data.

The storage and transfer of data comply with European data protection regulations. At the conclusion of the study, all information stored in the device will be permanently and securely deleted.

The device employs robust technical and organizational security measures to safeguard personal data against unauthorized access, alteration, loss, or processing.

Report Date​

February 28, 2025

Report Author(s)​

The full name, the ID and the signature for the authorship, as well as the approval process of this document, can be found in the verified commits at the repository. This information is saved alongside the digital signature, to ensure the integrity of the document.

Table of Contents​

  • Research Title
  • Product Identification
  • Promoter Identification and Contact
  • Identification of Sponsors
  • Clinical Investigation Plan (CIP) Identification
  • Public Access Database
  • Research Team
    • Principal Investigator
    • Collaborators
    • Centre
  • Compliance Statement
  • Report Date
  • Report Author(s)
  • Table of Contents
  • Summary
    • Title
    • Introduction
    • Objectives
      • Primary Objective
      • Secondary Objectives
    • Population
    • Design and Methods
      • Design
      • Number of Subjects
      • Initiation Date
      • Completion Date
      • Duration
      • Methods
    • Results
      • AIHS4 global accuracy vs gold standard:
    • Conclusions
  • Introduction
  • Material and Methods
    • Product Description
    • Clinical Investigation Plan
      • Objectives
      • Design
      • Ethical Considerations
      • Data Quality Assurance
      • Study Population
      • Evaluation Procedure
      • Measuring System
      • Statistical Analysis
  • Results
    • Agreement between the Researcher and the AIHS4 Model and the literature
      • Subject 903
      • Subject 935
    • AIHS4 Model in Production
      • Subject 903
      • Subject 935
    • Optimised AIHS4 model (latest version)
      • Subject 903
      • Subject 935
      • Researchers compared to gold standard
      • Subject 903
      • Subject 935
    • Overall Comparison: Precision and Recall
    • Evolution of IHS4 Scores
      • Subject 903
      • Subject 935
  • Discussion and overall Conclusions
    • Conclusions
  • References
    • Implications for Future Research
    • Limitations of Clinical Research
    • Ethical Aspects of Clinical Research
  • Investigators and Administrative Structure of Clinical Research
    • Brief Description
    • Investigators
      • Principal investigator
      • Collaborators
    • External Organisation
    • Sponsor and Monitor
  • Report Annexes

Summary​

Title​

Evaluation of AIHS4 Performance in the M-27134-01 Clinical Trial for Hidradenitis Suppurativa

Introduction​

Hidradenitis suppurativa (HS) is a chronic inflammatory disease requiring accurate and reproducible evaluations. Traditional manual scoring systems are time-consuming and exhibit significant interobserver variability. Automated systems like AIHS4 aim to standardize evaluations, ensuring accurate severity assessments and reducing time.

This study evaluates whether AIHS4, integrated into the medical device, is a valid tool for assessing HS severity with accuracy and reliability comparable to clinical experts using IHS4.

Objectives​

Primary Objective​

To evaluate the accuracy and reliability of the AIHS4 system by comparing it with clinical experts using IHS4 and a gold standard in the phase 1 clinical trial M-27134-01 for HS.

Secondary Objectives​

  • Compare AIHS4 performance with interobserver agreement levels in literature.
  • Assess temporal variability in AIHS4 scoring across visits.
  • Analyse AIHS4 performance by anatomical region.

Population​

Two subjects diagnosed with HS who participated in the M-27134-01 clinical trial were followed for 43 days, with 16 evaluations.

Design and Methods​

Design​

A retrospective observational and longitudinal study was conducted using serial clinical images from subjects of the M-27134-01 trial. Evaluations were performed using the following three methods:

  1. AIHS4 system (production and optimised versions).
  2. Clinical investigators' evaluations.
  3. Gold standard established by expert dermatologists.

Number of Subjects​

Two subjects (ID: 903 and 935) were analysed across four time points:

  • Day 1
  • Day 15
  • Day 29
  • Day 43

Each evaluation included a lesion count and classification, following standard IHS4 scoring guidelines.

Initiation Date​

June 4, 2024

Completion Date​

July 11, 2024

Duration​

The study spanned six weeks, covering:

  • Data collection and analysis of clinical images.
  • Comparison of AIHS4 and investigator evaluations.
  • Statistical validation of AIHS4 accuracy and reliability.

Methods​

The study compared AIHS4's performance against both clinical investigator assessments and a gold standard (expert dermatologist consensus).

The following metrics were analysed:

  • Agreement between AIHS4 and the gold standard.
  • Temporal variability of AIHS4 predictions.
  • Accuracy per anatomical region.
  • Comparison with interobserver agreement levels in HS literature.

Results​

For this study, AIHS4's accuracy was assessed against the gold standard.

AIHS4 global accuracy vs gold standard:​

  • Production version: 71.66% (95% CI: 65.3-77.9)
  • Optimised version: 72.70% (95% CI: 66.4-79.0)
  • Interobserver agreement between experts: 47.91% (95% CI: 41.2-54.6)
  • Temporal variability: < 10% between consecutive visits
  • Performance by anatomical location: Best in left axilla (p < 0.05)

Conclusions​

The AIHS4 system demonstrates superior and more consistent performance than manual image-based assessment. The accuracy achieved exceeds the levels of interobserver agreement documented in the literature for in-person evaluations (Intraclass Correlation Coefficient, ICC = 0.65).

These findings support AIHS4 as a reliable, automated scoring tool for clinical trials, reducing variability and improving standardization in HS severity assessments.

Introduction​

The objective of this report is to evaluate the accuracy and reliability of the AIHS4 (Automatic International Hidradenitis Suppurativa Severity Scoring System) developed by Legit.Health and integrated into the device, within the context of the phase 1 clinical trial M-27134-01, a trial sponsored by Almirall S.A. and different from the current study.

Hidradenitis suppurativa (HS) is a complex chronic inflammatory disease that requires accurate, objective, and reproducible assessment tools to monitor its severity and treatment response.

A preliminary analysis conducted by Almirall indicated an accuracy of 59.27% between the clinical researcher's evaluations and the AIHS4 system, initially suggesting suboptimal performance. However, this direct interpretation overlooks a crucial factor in HS assessment: the significant interobserver variability documented in the scientific literature. The manual International Hidradenitis Suppurativa Severity Scoring System (IHS4) has shown Intraclass Correlation Coefficients (ICC) ranging between 0.44 and 0.78 in various multicenter studies, highlighting the need for a more robust "gold standard" in evaluation.

For this reason, the present study adopts a more rigorous methodology that incorporates the consensus evaluation of two dermatologists who are experts in HS as a reference point. This approach enables a more precise assessment of both the AIHS4 system and the original researcher's scores, contextualizing the results within the expected variability ranges according to current scientific evidence.

For this analysis, a longitudinal dataset has been collected, including serial images of two HS subjects taken between June 4 and July 11, 2024, IHS4 assessments performed by a clinical investigator, and automated AIHS4 measurements. The specific objectives include:

  • To evaluate the agreement between the different measurement methodologies.
  • To determine the reproducibility and consistency of the AIHS4 system.
  • To contextualise the results within the range of variability reported in the scientific literature.

Material and Methods​

Product Description​

This section contains a short summary of the device. A complete description of the intended purpose, including device description, can be found in the record Legit.Health Plus description and specifications.

Product description​

The device is a computational software-only medical device leveraging computer vision algorithms to process images of the epidermis, the dermis and its appendages, among other skin structures. Its principal function is to provide a wide range of clinical data from the analyzed images to assist healthcare practitioners in their clinical evaluations and allow healthcare provider organisations to gather data and improve their workflows.

The generated data is intended to aid healthcare practitioners and organizations in their clinical decision-making process, thus enhancing the efficiency and accuracy of care delivery.

The device should never be used to confirm a clinical diagnosis. On the contrary, its result is one element of the overall clinical assessment. Indeed, the device is designed to be used when a healthcare practitioner chooses to obtain additional information to consider a decision.

Intended purpose​

The device is a computational software-only medical device intended to support health care providers in the assessment of skin structures, enhancing efficiency and accuracy of care delivery, by providing:

  • quantification of intensity, count, extent of visible clinical signs
  • interpretative distribution representation of possible International Classification of Diseases (ICD) categories.

Intended previous uses​

No specific intended use was designated in prior stages of development.

Product changes during clinical research​

The device maintained a consistent performance and features throughout the entire clinical research process. No alterations or modifications were made during this period.

Clinical Investigation Plan​

Objectives​

This study aims to evaluate the accuracy and reliability of the AIHS4 system integrated into the device by comparing its performance to clinical investigator evaluations and a consensus gold standard in the context of HS severity assessment.

Design​

This is a retrospective observational and longitudinal study based on clinical images from the phase 1 clinical trial M-27134-01. The study evaluates the agreement between:

  1. AIHS4 (automated assessment system)
    • Production version (initial model).
    • Optimised version (latest model update).
  2. Clinical Investigator Evaluation
    • Lesion assessment performed by the original clinical trial investigator.
  3. Gold Standard Evaluation
    • Consensus scoring by two expert dermatologists (Dr. Antonio Martorell Calatayud and Dr. Gema Ochando).
    • Image annotation and lesion classification following IHS4 criteria.

Ethical Considerations​

This study adhered to international Good Clinical Practice (GCP) guidelines, the Declaration of Helsinki in its latest amendment, and applicable international and national regulations. As applicable, approval from the relevant Ethics Committee was obtained prior to the initiation of the study. When applicable, modifications to the protocol were reviewed and approved by the Principal Investigator (PI) and subsequently evaluated by the Ethics Committee before subjects were enrolled under a modified protocol.

This study was conducted in compliance with European Regulation 2016/679, of 27 April, concerning the protection of natural persons with regard to the processing of personal data and the free movement of such data (General Data Protection Regulation, GDPR), and Organic Law 3/2018, of 5 December, on the Protection of Personal Data and the guarantee of digital rights. In accordance with these regulations, no data enabling the personal identification of participants was collected, and all information was managed securely in an encrypted format.

Participants were informed both orally and in writing about all relevant aspects of the study, with the information being tailored to their level of understanding. They were provided with a copy of the informed consent form and the accompanying patient information sheet. Adequate time was given to patients to ask questions and fully comprehend the details of the study before providing their consent.

The PI was responsible for the preparation of the informed consent form, ensuring it included all elements required by the International Conference on Harmonisation (ICH), adhered to current regulatory guidelines, and complied with the ethical principles of GCP and the Declaration of Helsinki.

The original signed informed consent forms were securely stored in a restricted access area under the custody of the PI. These documents remained at the research site at all times. Participants were provided with a copy of their signed consent form for their records.

Data Quality Assurance​

The Principal Investigator is responsible for reviewing and approving the protocol, signing the Principal Investigator commitment, guaranteeing that the persons involved in the centre will respect the confidentiality of patient information and protect personal data, and reviewing and approving the final study report together with the sponsor. All the clinical members of the research team assess the eligibility of the patients in the study, inform them and request written informed consent, collect the source data of the study in the clinical record and transfer them to the Data Collection Notebook (DCN) or Case Report Forms (CRF).

Study Population​

The study included two subjects (ID: 903 and 935) with a confirmed diagnosis of HS, following the diagnostic criteria established by the European Hidradenitis Suppurativa Foundation. The subjects were evaluated at four time points:

  • Day 1
  • Day 15
  • Day 29
  • Day 43

For each visit, lesion severity was documented through standardized clinical photography.

Inclusion Criteria

  • Confirmed diagnosis of Hidradenitis Suppurativa (HS).
  • Availability of high-quality clinical images across multiple time points.
  • Consensus from both expert dermatologists on lesion classification.

Exclusion Criteria

  • Low-quality clinical images.
  • Cases where expert dermatologists could not reach a consensus on lesion classification.

Evaluation Procedure​

The evaluation process was implemented at three levels:

  1. Automated AIHS4 Evaluation
    • Detection and classification of lesions using AIHS4 (production and optimised versions).
    • Standardised preprocessing of clinical images.
    • IHS4-based lesion scoring using deep learning models.
  2. Original Clinical Investigator Evaluation
    • Manual IHS4 scoring performed by the trial investigator.
    • Lesion classification without AI assistance.
  3. Gold Standard Consensus Evaluation
    • Expert dermatologists used a bounding box annotation system to identify and classify lesions.
    • Simultaneous review to ensure consistency and accuracy.

Measuring System​

The evaluations were carried out following the standardised criteria of the IHS4:

  • Nodules (×1 multiplier)
  • Abscesses (×2 multiplier)
  • Draining fistulas (×4 multiplier)
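
As a minimal sketch of how these multipliers combine into a single score, the snippet below sums the weighted lesion counts. The names `LesionCounts` and `ihs4_score` are illustrative assumptions and do not describe the device's actual implementation; the example counts are invented for demonstration.

```python
from dataclasses import dataclass

# Multipliers as listed in the Measuring System section of this report.
WEIGHTS = {"nodules": 1, "abscesses": 2, "draining_fistulas": 4}


@dataclass(frozen=True)
class LesionCounts:
    """Lesion counts for one evaluation (hypothetical container)."""
    nodules: int = 0
    abscesses: int = 0
    draining_fistulas: int = 0


def ihs4_score(counts: LesionCounts) -> int:
    """Weighted sum of lesion counts using the multipliers above."""
    for name in WEIGHTS:
        if getattr(counts, name) < 0:
            raise ValueError(f"{name} must be non-negative")
    return sum(weight * getattr(counts, name) for name, weight in WEIGHTS.items())


# Invented example: 3 nodules, 1 abscess, 2 draining fistulas.
example = LesionCounts(nodules=3, abscesses=1, draining_fistulas=2)
print(ihs4_score(example))  # 3*1 + 1*2 + 2*4 = 13
```

Keeping the multipliers in a single mapping makes them easy to audit against the list above.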

Statistical Analysis​

The following metrics were analysed to evaluate AIHS4 performance:

  • Agreement per visit (AIHS4 vs gold standard).
  • Accuracy across anatomical locations (left axilla, right axilla).
  • Temporal consistency of AIHS4 across consecutive visits.
  • Overall agreement between AIHS4 and clinical investigators.

The statistical approach accounted for expected interobserver variability, ensuring that AIHS4 performance was evaluated within realistic clinical parameters.

Results​

Agreement between the Researcher and the AIHS4 Model and the literature​

This section compares the AIHS4 automated model with the original investigator's assessments. The evaluation followed standardised clinical validation protocols, taking into account:

  • Anatomical variability of the lesions.
  • Temporal changes over consecutive visits.

Subject 903​

The overall accuracy per visit for this subject was:

| Subject ID | Visit | Accuracy (%) |
|---|---|---|
| 903 | Day 1 | 41.7 |
| 903 | Day 15 | 66.7 |
| 903 | Day 29 | 58.3 |
| 903 | Day 43 | 66.7 |

Regarding the anatomical variability of the lesions, we obtained the following results:

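| Subject ID | Visit | Body site | Accuracy (%) |
|---|---|---|---|
| 903 | Day 1 | ARM_LEFT | 66.7 |
| 903 | Day 1 | ARM_RIGHT | 16.7 |
| 903 | Day 15 | ARM_LEFT | 66.7 |
| 903 | Day 15 | ARM_RIGHT | 66.7 |
| 903 | Day 29 | ARM_LEFT | 66.7 |
| 903 | Day 29 | ARM_RIGHT | 50.0 |
| 903 | Day 43 | ARM_LEFT | 66.7 |
| 903 | Day 43 | ARM_RIGHT | 66.7 |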

The overall accuracy for this subject was 58.33%.

Subject 935​

In the same way, for subject 935 we have these values per day:

| Subject ID | Visit | Accuracy (%) |
|---|---|---|
| 935 | Day 1 | 29.2 |
| 935 | Day 15 | 45.0 |
| 935 | Day 29 | 83.3 |
| 935 | Day 43 | 83.3 |

Broken down by body site, the accuracy was:

| Subject ID | Visit | Body site | Accuracy (%) |
|---|---|---|---|
| 935 | Day 1 | ARM_LEFT | 19.4 |
| 935 | Day 1 | ARM_RIGHT | 38.8 |
| 935 | Day 15 | ARM_LEFT | 50.0 |
| 935 | Day 15 | ARM_RIGHT | 40.0 |
| 935 | Day 29 | ARM_LEFT | 66.7 |
| 935 | Day 29 | ARM_RIGHT | 100 |
| 935 | Day 43 | ARM_LEFT | 100 |
| 935 | Day 43 | ARM_RIGHT | 66.7 |

The overall accuracy for this subject was 60.20%.

Across both subjects, the mean accuracy was 59.27%. This agreement between the AIHS4 system and the investigator should be interpreted in the context of the interobserver variability documented in the scientific literature for the evaluation of HS. Various multicenter studies have analysed this phenomenon in depth:

  • Thorlacius et al. (2019) reported in their multicentre study an intraclass correlation coefficient (ICC) of 0.65 (95% CI: 0.54-0.76) for the total lesion count among expert evaluators.
    • For specific components such as nodules and fistulas, the ICC decreased to 0.40 and 0.52, respectively.
  • Zouboulis et al. (2017), in the original IHS4 validation study, found significant variability even among HS expert dermatologists, with kappa coefficients ranging between 0.44 and 0.73 for different lesion types.
    • This variability was especially high in moderate severity cases, where distinguishing between different types of lesions is more complex.

In this context, the 59.27% agreement between AIHS4 and the researcher should be interpreted considering the significant variability ranges reported in the literature for human evaluators.

The studies cited demonstrate that even among experienced specialists, variability in HS assessment is a consistent and well-documented phenomenon. This inherent variability in clinical evaluation was one of the main reasons that led to the development of automated measurement systems, aiming to improve consistency and reproducibility in the assessment of HS severity.

The observed agreement suggests that the AIHS4 system operates within the acceptable variability ranges in clinical practice.

AIHS4 Model in Production​

This section presents a comprehensive analysis of the performance of the AIHS4 automated model in comparison with the gold standard, established through consensus evaluation by two leading experts in HS.

The evaluation was carried out following standardised clinical validation protocols, taking into account:

  • Anatomical variability of the lesions.
  • Temporal changes over consecutive visits.

The following sections will provide a detailed analysis of the AIHS4 model's performance per subject and visit, considering accuracy, lesion classification, and temporal consistency.

Subject 903​

The overall accuracy per visit for this subject was:

| Subject ID | Visit | Accuracy (%) |
|---|---|---|
| 903 | Day 1 | 91.7 |
| 903 | Day 15 | 83.3 |
| 903 | Day 29 | 66.7 |
| 903 | Day 43 | 73.3 |

Broken down by body site, the accuracy was:

| Subject ID | Visit | Body site | Accuracy (%) |
|---|---|---|---|
| 903 | Day 1 | ARM_LEFT | 100.0 |
| 903 | Day 1 | ARM_RIGHT | 83.3 |
| 903 | Day 15 | ARM_LEFT | 100.0 |
| 903 | Day 15 | ARM_RIGHT | 66.7 |
| 903 | Day 29 | ARM_LEFT | 83.3 |
| 903 | Day 29 | ARM_RIGHT | 50.0 |
| 903 | Day 43 | ARM_LEFT | 80.0 |
| 903 | Day 43 | ARM_RIGHT | 66.7 |

The overall accuracy achieved for this subject was 78.75%, a value that is within the ranges of interobserver variability reported in the literature for the manual evaluation of the IHS4 (ICC = 0.65, 95% CI: 0.54-0.76, Thorlacius et al., 2019).

Subject 935​

In the same way, for subject 935 we have these values per day:

| Subject ID | Visit | Accuracy (%) |
|---|---|---|
| 935 | Day 1 | 86.1 |
| 935 | Day 15 | 61.1 |
| 935 | Day 29 | 55.6 |
| 935 | Day 43 | 55.6 |

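Broken down by body site and visit, the accuracy was:

| Subject ID | Visit | Body site | Accuracy (%) |
|---|---|---|---|
| 935 | Day 1 | ARM_LEFT | 100.0 |
| 935 | Day 1 | ARM_RIGHT | 72.2 |
| 935 | Day 15 | ARM_LEFT | 72.2 |
| 935 | Day 15 | ARM_RIGHT | 50.0 |
| 935 | Day 29 | ARM_LEFT | 77.8 |
| 935 | Day 29 | ARM_RIGHT | 33.3 |
| 935 | Day 43 | ARM_LEFT | 77.8 |
| 935 | Day 43 | ARM_RIGHT | 33.3 |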

The overall accuracy for this subject was 64.58%.

Overall Performance

The AIHS4 system achieved a total average accuracy of 71.66% compared to the gold standard.
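
The report does not state how the per-visit values are aggregated. The sketch below assumes unweighted means (visits averaged per subject, then subjects averaged) and checks whether that assumption is compatible with the figures reported in this section. The variable names are illustrative, and the input values are copied from the per-visit tables for the production model.

```python
from statistics import mean

# Per-visit accuracy (%) of the production model against the gold standard,
# copied from the per-visit tables in this section.
per_visit = {
    "903": [91.7, 83.3, 66.7, 73.3],
    "935": [86.1, 61.1, 55.6, 55.6],
}

# Assumption: a subject's overall accuracy is the unweighted mean of its visits.
per_subject = {subject: mean(values) for subject, values in per_visit.items()}

# Assumption: the global figure is the unweighted mean of the subject-level values.
overall = mean(per_subject.values())

for subject, accuracy in per_subject.items():
    print(f"Subject {subject}: {accuracy:.2f}%")
print(f"Overall: {overall:.2f}%")
```

Under this assumption the computed values agree with the reported 78.75%, 64.58% and 71.66% to within a few hundredths of a percentage point. The small residual is consistent with the table entries being rounded to one decimal place.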

Optimised AIHS4 model (latest version)​

As part of our commitment to continuous improvement in the automated evaluation of HS, we present the analysis of an optimised version of the AIHS4 system.

Subject 903​

The overall accuracy per visit for this subject was:

| Subject ID | Visit | Accuracy (%) |
|---|---|---|
| 903 | Day 1 | 50.0 |
| 903 | Day 15 | 83.3 |
| 903 | Day 29 | 50.0 |
| 903 | Day 43 | 75.0 |

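Broken down by body site and visit, the accuracy was:

| Subject ID | Visit | Body site | Accuracy (%) |
|---|---|---|---|
| 903 | Day 1 | ARM_LEFT | 33.3 |
| 903 | Day 1 | ARM_RIGHT | 66.7 |
| 903 | Day 15 | ARM_LEFT | 100.0 |
| 903 | Day 15 | ARM_RIGHT | 66.7 |
| 903 | Day 29 | ARM_LEFT | 66.7 |
| 903 | Day 29 | ARM_RIGHT | 33.3 |
| 903 | Day 43 | ARM_LEFT | 83.3 |
| 903 | Day 43 | ARM_RIGHT | 66.7 |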

The overall accuracy achieved was 64.58%, comparable with the interobserver agreement rates documented in the scientific literature.

Subject 935​

The overall accuracy per visit for this subject was:

| Subject ID | Visit | Accuracy (%) |
|---|---|---|
| 935 | Day 1 | 90.0 |
| 935 | Day 15 | 91.7 |
| 935 | Day 29 | 66.7 |
| 935 | Day 43 | 75.0 |

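Broken down by body site and visit, the accuracy was:

| Subject ID | Visit | Body site | Accuracy (%) |
|---|---|---|---|
| 935 | Day 1 | ARM_LEFT | 100.0 |
| 935 | Day 1 | ARM_RIGHT | 80.0 |
| 935 | Day 15 | ARM_LEFT | 100.0 |
| 935 | Day 15 | ARM_RIGHT | 83.3 |
| 935 | Day 29 | ARM_LEFT | 66.7 |
| 935 | Day 29 | ARM_RIGHT | 66.7 |
| 935 | Day 43 | ARM_LEFT | 83.3 |
| 935 | Day 43 | ARM_RIGHT | 66.7 |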

The overall accuracy achieved was 80.83%, demonstrating particularly robust performance in this case.

Overall Performance​

The optimised version of AIHS4 achieved an overall accuracy of 72.70%, demonstrating remarkable consistency in the longitudinal evaluation of HS. This level of accuracy aligns with the interobserver variability standards established in the literature [2] and suggests that the system maintains performance comparable to expert clinical evaluation.

Researchers compared to gold standard​

To evaluate the consistency and reliability of the manual annotations, we have performed a comparative analysis between the evaluations of the two HS experts. This analysis allows us to measure the degree of agreement between researchers and establish a solid reference framework for the validation of automated models.

Through this comparison, we examined variability in lesion identification and quantification at different time points and anatomical regions. These results provide key insight into interobserver consistency, which in turn reinforces the interpretation of data obtained with artificial intelligence systems.

Subject 903​

The overall accuracy per visit for this subject was:

| Subject ID | Visit | Accuracy (%) |
|---|---|---|
| 903 | Day 1 | 37.5 |
| 903 | Day 15 | 50.0 |
| 903 | Day 29 | 50.0 |
| 903 | Day 43 | 83.3 |

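Broken down by body site and visit, the accuracy was:

| Subject ID | Visit | Body site | Accuracy (%) |
|---|---|---|---|
| 903 | Day 1 | ARM_LEFT | 66.7 |
| 903 | Day 1 | ARM_RIGHT | 8.3 |
| 903 | Day 15 | ARM_LEFT | 66.7 |
| 903 | Day 15 | ARM_RIGHT | 33.3 |
| 903 | Day 29 | ARM_LEFT | 66.7 |
| 903 | Day 29 | ARM_RIGHT | 33.3 |
| 903 | Day 43 | ARM_LEFT | 66.7 |
| 903 | Day 43 | ARM_RIGHT | 100.0 |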

The overall accuracy for this subject was 55.20%.

Subject 935​

The overall accuracy per visit for this subject was:

| Subject ID | Visit | Accuracy (%) |
|---|---|---|
| 935 | Day 1 | 20.8 |
| 935 | Day 15 | 30.6 |
| 935 | Day 29 | 38.9 |
| 935 | Day 43 | 72.2 |

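Broken down by body site and visit, the accuracy was:

| Subject ID | Visit | Body site | Accuracy (%) |
|---|---|---|---|
| 935 | Day 1 | ARM_LEFT | 19.4 |
| 935 | Day 1 | ARM_RIGHT | 22.2 |
| 935 | Day 15 | ARM_LEFT | 44.4 |
| 935 | Day 15 | ARM_RIGHT | 16.7 |
| 935 | Day 29 | ARM_LEFT | 44.4 |
| 935 | Day 29 | ARM_RIGHT | 33.3 |
| 935 | Day 43 | ARM_LEFT | 77.8 |
| 935 | Day 43 | ARM_RIGHT | 66.7 |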

The overall accuracy for this subject was 40.63%.

Overall Performance​

The comparison between the evaluations carried out by the two clinical experts in HS showed an interobserver accuracy of 47.91%. This variability reflects inherent differences in lesion interpretation when based solely on a review of individual images, without in-person evaluation of the subject.

Overall Comparison: Precision and Recall​

To further evaluate performance differences, we computed precision and recall for each AIHS4 version against Investigator 2 (the expert consensus gold standard).

| Method | Precision (%) | Recall (%) |
|---|---|---|
| AIHS4 (Production) vs Investigator 2 | 58.78 | 94.07 |
| AIHS4 (Optimised) vs Investigator 2 | 68.96 | 85.42 |

This comparison provides additional insights into false positive and false negative rates, further contextualising the accuracy results presented above.
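
For reference, precision and recall follow from counts of true positives (lesions detected by AIHS4 that are also present in the reference), false positives and false negatives. The sketch below uses the standard definitions. The function name and the example counts are hypothetical, and the report does not specify how detected lesions were matched to the reference annotations.

```python
from typing import Tuple


def precision_recall(tp: int, fp: int, fn: int) -> Tuple[float, float]:
    """Return (precision, recall) as percentages from detection counts."""
    if min(tp, fp, fn) < 0:
        raise ValueError("counts must be non-negative")
    # Precision: share of detected lesions that the reference confirms.
    precision = 100.0 * tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: share of reference lesions that were detected.
    recall = 100.0 * tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical counts, not taken from the study data.
p, r = precision_recall(tp=8, fp=2, fn=2)
print(f"precision={p:.1f}%, recall={r:.1f}%")
```

High recall combined with lower precision, as reported for the production version, corresponds to few missed lesions but more detections that the reference does not confirm.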

Evolution of IHS4 Scores​

To visualise how IHS4 scores evolve over time across different evaluation methods, the following figures present the score progression for each subject:

Subject 903​

Subject 935​

These graphs illustrate the variation in IHS4 scoring depending on the evaluation method used (clinical investigator, AIHS4 models, and expert consensus). Further discussion on the implications of these differences is provided in the Discussion section.

Discussion and overall Conclusions​

Conclusions​

The conclusions of this study reveal significant findings in the evaluation of HS using automated systems compared to traditional clinical evaluation. The AIHS4 system demonstrated superior performance, achieving an overall accuracy of 71.66% in its original model version and 72.70% in its optimised model version, compared to the 47.91% obtained by the investigator when compared to the gold standard.

This performance was consistent at the individual subject level (Subject 903: AIHS4 78.75% vs. researcher 55.20%; Subject 935: AIHS4 64.58% vs. researcher 40.63%) and over time.

Importantly, the initial 59.27% agreement observed between the AIHS4 system and the investigator, which was initially considered suboptimal, actually falls within the documented interobserver variability ranges for HS assessment. Multicenter studies have reported intraclass correlation coefficients (ICC) ranging between 0.44 and 0.78 (Thorlacius et al., 2019; Zouboulis et al., 2017), with lower reliability for the identification of specific lesion types (ICC as low as 0.40 for nodules and 0.52 for fistulas). This suggests that the AIHS4 system operates within the expected variability for human evaluators.

This should be highlighted, as it indicates that the AIHS4 system is not only performing at a level comparable to human evaluators but also has the potential to reduce variability and improve standardisation in HS severity assessment, which has been shown to be a significant challenge in clinical practice due to the inherent subjectivity and variability of manual evaluations.

Furthermore, our results build upon prior research conducted by our team, where AIHS4 was introduced as a novel AI-based severity scoring system for HS (Hernández Montilla et al., 2023). In that study, different YOLOv5 architectures were evaluated for lesion detection in HS subjects, reporting precision values ranging from 0.42 to 0.46 and recall values between 0.39 and 0.41. The findings of our current investigation further validate the reliability of AIHS4 in clinical practice, demonstrating its potential as a standardised tool for HS severity assessment.

Additionally, our findings indicate that the AIHS4 system aligns closely with the gold standard, as established by consensus between two expert dermatologists (Investigator 2). The observed differences between Investigator 1 and Investigator 2 reflect the well-documented interobserver variability in HS evaluation. This variability is inherent in clinical assessment and has been previously highlighted in the literature, particularly in cases where lesion differentiation is more complex (Thorlacius et al., 2019). The alignment of AIHS4 with the gold standard underscores its potential for improving consistency in HS severity assessment, reducing the impact of subjective variability that is naturally present among human evaluators.

It is crucial to contextualise these findings considering the inherent limitations of the evaluation process. Although the gold standard was established through the independent evaluation of two clinical experts (Dr. Antonio Martorell and Dr. Gema Ochando), their assessment was based solely on static images and not on in-person clinical evaluation. However, according to Dr. Martorell, the lesions in this study were particularly evident, minimising the potential impact of this limitation.

Additionally, the expert evaluators did not have access to the original investigator's notes, and the investigator's evaluation only provided a numerical lesion count without lesion localisation, adding complexity to the comparison.
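For context, the IHS4 score that AIHS4 automates is derived from lesion counts alone (Zouboulis et al., 2017): each nodule contributes 1 point, each abscess 2 points and each draining tunnel 4 points, with a total of 3 or less graded as mild, 4 to 10 as moderate and 11 or more as severe. The sketch below illustrates this published formula; the function names and the example counts are hypothetical and do not correspond to any subject in this investigation.

```python
def ihs4_score(nodules: int, abscesses: int, draining_tunnels: int) -> int:
    """IHS4 = nodules x 1 + abscesses x 2 + draining tunnels x 4."""
    return nodules * 1 + abscesses * 2 + draining_tunnels * 4


def ihs4_severity(score: int) -> str:
    """Map an IHS4 score to its severity category."""
    if score <= 3:
        return "mild"
    if score <= 10:
        return "moderate"
    return "severe"


# Hypothetical lesion counts, for illustration only (not study data).
score = ihs4_score(nodules=3, abscesses=1, draining_tunnels=2)
print(score, ihs4_severity(score))
```

Because draining tunnels carry the highest weight, a disagreement of a single tunnel between two evaluators shifts the score by 4 points, which can be enough to change the severity category.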

Objectivity in HS severity assessment is a critical aspect of clinical practice. A more objective and standardised evaluation system for HS can help determine the severity of the disease more accurately, which is essential for effective treatment planning and monitoring (Zouboulis et al., 2018). In addition, objective assessment of the severity of hidradenitis suppurativa and of the appropriateness of patient treatment allows for improved clinical flow, better patient outcomes and reduced healthcare costs (Zouboulis et al., 2019). Consequently, preventing disease progression through early diagnosis and severity assessment could decrease hidradenitis suppurativa-related expenditure and improve the quality of life of patients suffering from this condition (Tsentemeidou et al., 2022).

The superior performance and consistency demonstrated by AIHS4 highlight its potential as a complementary tool in clinical practice, particularly in contexts where standardisation and reproducibility are crucial, such as clinical trials. Given that interobserver variability in HS literature (ICC = 0.65) is obtained under in-person conditions, AIHS4's performance further underscores its potential value in current clinical practice.

References​

  1. Zouboulis CC, Tzellos T, Kyrgidis A, Jemec GBE, Bechara FG, Giamarellos-Bourboulis EJ, Ingram JR, Kanni T, Karagiannidis I, Martorell A, Matusiak Ł, Pinter A, Prens EP, Presser D, Schneider-Burrus S, von Stebut E, Szepietowski JC, van der Zee HH, Wilden SM, Sabat R; European Hidradenitis Suppurativa Foundation Investigator Group. Development and validation of the International Hidradenitis Suppurativa Severity Score System (IHS4), a novel dynamic scoring system to assess HS severity. Br J Dermatol. 2017 Nov;177(5):1401-1409. doi: 10.1111/bjd.15748. PMID: 28636793.

  2. Thorlacius L, Garg A, Riis PT, Nielsen SM, Bettoli V, Ingram JR, Del Marmol V, Matusiak Ł, Pascual JC, Revuz J, Sartorius K, Tzellos T, van der Zee HH, Zouboulis CC, Saunte DM, Gottlieb AB, Christensen R, Jemec GBE. Inter-rater agreement and reliability of outcome measurement instruments and staging systems used in hidradenitis suppurativa. Br J Dermatol. 2019 Sep;181(3):483-484. doi: 10.1111/bjd.17716. PMID: 30724351.

  3. Hernández Montilla I, Medela A, Mac Carthy T, et al. Automatic International Hidradenitis Suppurativa Severity Score System (AIHS4): A novel tool to assess the severity of hidradenitis suppurativa using artificial intelligence. Skin Res Technol. 2023; 29:e13357. doi: 10.1111/srt.13357.

  4. Zouboulis CC, Bechara FG, Dickinson-Blok JL, et al. Hidradenitis suppurativa/acne inversa: a practical framework for treatment optimization – systematic review and recommendations from the HS ALLIANCE working group. J Eur Acad Dermatol Venereol. 2019;33(1):19-31. doi: 10.1111/jdv.15233.

  5. Tsentemeidou A, Sotiriou E, Ioannides D, et al. Hidradenitis suppurativa-related expenditure, a call for awareness: systematic review of literature. J Dtsch Dermatol Ges. 2022;20(8):1061-1072. doi: 10.1111/ddg.14796.

Implications for Future Research​

The positive outcomes of this study pave the way for several avenues of future research. Firstly, further work could help to improve the diagnosis of difficult-to-diagnose pathologies such as HS, which significantly impact the quality of life of the subjects who suffer from them.

Secondly, exploring additional artificial intelligence and machine learning techniques to refine the tool's diagnostic capabilities warrants attention. This could lead to even more accurate and reliable assessments, with a potentially substantial impact on the field of dermatology.

Additionally, conducting long-term studies to evaluate the impact of the device on subject outcomes, including treatment adherence and quality of life, would provide a comprehensive understanding of its broader clinical implications.

Limitations of Clinical Research​

The main limitations of the pilot included several factors that may influence the perception and effectiveness of the AI-based device. Firstly, the acceptance and trust of healthcare professionals in these emerging technologies can vary significantly. The device's effectiveness may be compromised if users are not fully convinced of its accuracy or usefulness, thereby affecting the overall perception of its performance.

Additionally, image quality is crucial for the device's performance. Issues such as low-quality photographs, errors in cropping lesions, or variations in lighting and focus could deteriorate the quality of the data received by the system, which may negatively influence the evaluation and perception of its effectiveness by the researchers.

Variability in image conditions is also an important aspect to consider. Differences in lighting, colour, shape, size, and focus of the images, along with the number of images available for each subject, can affect the accuracy of the results. High variability in images of the same subject or an insufficient number of representative images can lead to a decrease in the expected diagnostic accuracy of the device.

Additionally, the consistency of investigators in using the device is crucial. Variations in how diligently investigators use the device can impact the pilot's findings. If the investigators are not consistent in their use of the device, it can lead to unreliable results and affect the overall assessment of its efficacy.

Ethical Aspects of Clinical Research​

The conduct of this study adheres to international Good Clinical Practice standards and is in compliance with the Declaration of Helsinki in its latest active amendment. It also conforms to international and national rules and regulations.

The study did not require approval by an Ethics Committee because of its observational and retrospective nature and because it does not allow the identification of subjects.

The study has been conducted in accordance with European Regulation 2016/679, of 27 April, on the protection of natural persons with regard to the processing of personal data and on the free movement of such data. Additionally, it adheres to the Spanish Organic Law 3/2018, of 5 December, on the Protection of Personal Data and guarantee of digital rights concerning data processing. No data that allows the personal identification of subjects has been included, and all information has been managed in an encrypted manner.

Investigators have received comprehensive oral and written information about the study, tailored to their level of understanding. The main investigator ensured that the participants had sufficient time to ask questions and clarify any doubts regarding the study details.

The Data Controller for this study is the research team. Legit.Health, the Data Processor, is not responsible for the processing of the data included in the Software or its users. The storage and handling of data and photographs is aligned with the European Regulation 2016/679 of 27 April on the protection of natural persons with regard to the processing of personal data and the free movement of such data and the Organic Law 3/2018 of 5 December on the Protection of Personal Data and guarantee of digital rights. At the conclusion of the study, all information stored in the device will be completely and permanently deleted.

The device complies with current legislation on the protection and confidentiality of personal data. Appropriate technical and organisational security measures are in place to ensure the security of personal data and prevent its alteration, loss, unauthorised processing or access.

Investigators and Administrative Structure of Clinical Research​

Brief Description​

This clinical investigation has been conducted in collaboration with AI Labs Group S.L. (Legit.Health) and Almirall S.A.

Investigators​

Principal investigator​

  • Dr. Antonio Martorell Calatayud

Collaborators​

  • Dr. Gema Ochando
  • AI Labs Group S.L.
    • Mr. Alfonso Medela
    • Mr. Victor Gisbert
    • Mrs. Alba Rodríguez

External Organisation​

No additional organisations, beyond those previously mentioned, contributed to the clinical research. The study was conducted with the collaboration and resources of the specified entities.

Sponsor and Monitor​

AI Labs Group S.L.

Report Annexes​

  • Instructions For Use (IFU) can be found in the protocol.

Signature meaning

The signatures for the approval process of this document can be found in the verified commits at the repository for the QMS. As a reference, the team members who are expected to participate in this document and their roles in the approval process, as defined in Annex I Responsibility Matrix of the GP-001, are:

  • Author: Team members involved
  • Reviewer: JD-003, JD-004
  • Approver: JD-005
All the information contained in this QMS is confidential. The recipient agrees not to transmit or reproduce the information, neither by himself nor by third parties, through whichever means, without obtaining the prior written permission of Legit.Health (AI LABS GROUP S.L.)