R-TF-015-006 Clinical investigation report
Research Title
Simulated-use multi-reader multi-case (MRMC) investigation of Legit.Health Plus (hereinafter "the device"): effect on the top-1 diagnostic accuracy of healthcare professionals on a curated set of anonymised dermatological images, with specific attention to rare pustular dermatoses and hidradenitis suppurativa.
Nature and positioning of the evidence
This investigation is a simulated-use MRMC reader study performed entirely on retrospective, fully anonymised dermatological images sourced from public dermatological atlases and from a de-identified image library provided by the sponsor (Boehringer Ingelheim). No patients were recruited, no patient-identifiable data were processed, and no diagnostic or therapeutic intervention was performed on any patient as a consequence of the investigation.
Under MDCG 2020-6 Appendix III the resulting evidence is Rank 11 (simulated-use reader study on retrospective images); it is distinct from clinical data on real patients within the meaning of MDR Article 2(48). Per MDCG 2020-1 §4.4 it contributes Pillar 3 Clinical Performance supporting evidence — measuring the clinician's diagnostic decision-making when using the device's Top-5 prioritised differential view — at a lower rank than the prospective real-patient investigations that carry the primary Pillar 3 weight. Pillar 2 (algorithm-level analytical performance across the 346 ICD-11 categories at API level) is evidenced independently through the device verification-and-validation records and the published severity-validation manuscripts; it is not the subject of this investigation. Pillar 1 (Valid Clinical Association literature anchoring diagnostic accuracy to patient outcomes) is documented in the State of the Art (R-TF-015-011).
Product Identification
| Information | |
|---|---|
| Device name | Legit.Health Plus (hereinafter, the device) |
| Model and type | NA |
| Version | 1.1.0.0 |
| Basic UDI-DI | 8437025550LegitCADx6X |
| Certificate number (if available) | MDR 000000 (Pending) |
| EMDN code(s) | Z12040192 (General medicine diagnosis and monitoring instruments - Medical device software) |
| GMDN code | 65975 |
| EU MDR 2017/745 | Class IIb |
| EU MDR Classification rule | Rule 11 |
| Novel product (True/False) | TRUE |
| Novel related clinical procedure (True/False) | TRUE |
| SRN | ES-MF-000025345 |
Throughout this document, references to "the device" refer to the investigational product identified above.
Device version under investigation and bridging to the CE-marked release
The investigation was conducted using device version v1.1.0.0, which is the only version placed on the market and the version to which the present technical documentation applies. No intermediate development build was used during the conduct of the investigation. The bridging between the investigation-version and the CE-marked release is therefore an identity bridge (no algorithm, model, UI or training-data changes); no clinical-relevance assessment is required. The device-version statement has been reviewed and signed off by the PRRC.
Promoter Identification and Contact
| Manufacturer data | |
|---|---|
| Legal manufacturer name | AI Labs Group S.L. |
| Address | Street Gran Vía 1, BAT Tower, 48001, Bilbao, Bizkaia (Spain) |
| SRN | ES-MF-000025345 |
| Person responsible for regulatory compliance | Alfonso Medela, Saray Ugidos |
| Email | office@legit.health |
| Phone | +34 638127476 |
| Trademark | Legit.Health |
| Authorized Representative | Not applicable (manufacturer is based in EU) |
Identification of sponsor
Sponsor: Boehringer Ingelheim. The sponsor provided the de-identified internal image subset used alongside the public-atlas images and took no role in the analysis or the conclusions. Monitoring was performed as described in the CIP §Monitoring plan.
Identification of the Clinical Investigation Plan (CIP)
| CIP | |
|---|---|
| Title of the clinical investigation | Multi-Reader Multi-Case Study for Assessing the Impact of Legit.Health Plus on the Clinical Assessment of Generalised Pustular Psoriasis and Other Skin Conditions by Healthcare Professionals. |
| Device under investigation | Legit.Health Plus |
| Protocol version | Version 1.0 |
| Date | 2024-06-01 |
| Protocol code | LEGIT.HEALTH_BI_2024 |
| Sponsor | Boehringer Ingelheim |
| Coordinating Investigator | Dr. Antonio Martorell Calatayud |
| Principal Investigator(s) | Dr. Antonio Martorell Calatayud |
| Investigational site(s) | This study was conducted remotely by sending the images to the participating dermatologists. |
| Ethics Committee | This study did not require Ethics Committee approval because it is observational and non-interventional. All data used consists of fully anonymized images sourced from public dermatology atlases and databases, containing no information permitting patient identification. As such, the research meets the criteria for exemption from ethics committee review under applicable regulatory frameworks. |
Trial Registrations
- ClinicalTrials.gov (NCT): NCT07428915
- EMA RWD Catalogue (EUPAS): EUPAS1000000910
Public Access Database
A public-facing results summary is made available through the trial registrations identified above. The underlying image set and reader-level raw data are not publicly accessible due to privacy and confidentiality considerations, and because the image set includes a de-identified sponsor-provided subset under the terms of the sponsor-investigator agreement.
Research Team
Principal investigator
- Dr. Antonio Martorell Calatayud
Collaborating Investigators (Clinical Staff)
- Dr. Mari Carmen Galindo
- Dr. Paco García Tolosa
- Dr. Laura Yuste Hidalgo
- Dr. Nuria Comabella
- Dr. Marta Vázquez
- Dr. David Palacios
- Dr. Norma Alejandra Doria Carlin
- Dr. Francisco José Esteban González
- Dr. Alfonso José Valcarce Leonisio
- Dr. José Antonio Arjona Sevilla
- Dr. Javier Melgosa
- Dr. Manuel Ballesteros Redondo
- Dr. Esmeralda Silva
- Dr. Ana Llull Ramos
- Dr. Angela Patricia Guzmán
Technical Support (Manufacturer)
- Mr. Alfonso Medela — Chief Technology Officer
- Dr. Alberto Sabater — Algorithm Lead
- Mrs. Alba Rodríguez — Clinical Operations
Centre
The investigation was conducted remotely through a centralised, access-controlled web-based platform. Reader-participants were distributed across multiple primary-care and dermatology sites; image review and diagnostic annotation activities were performed online through the access-controlled platform.
Compliance Statement
The clinical investigation was conducted according to the Clinical Investigation Plan (CIP) and other applicable guidance and regulations. This includes compliance with:
- The ethical principles originating from the World Medical Association's Declaration of Helsinki
- Harmonized standard UNE-EN ISO 14155:2020
- Regulation (EU) 2017/745 on medical devices (MDR), including the applicable General Safety and Performance Requirements (GSPR) as outlined in Annex I, and the requirements of Annex XV (Chapter I and Chapter II, Section 3)
- Harmonized standard UNE-EN ISO 13485:2016
- MDCG 2024-3 for its structural and content expectations, MDCG 2021-8 concerning application requirements, and MDCG 2020-10/1 Rev 1 for safety reporting timelines and definitions
- Regulation (EU) 2016/679 (GDPR)
- Spanish Organic Law 3/2018 on the Protection of Personal Data and guarantee of digital rights.
All data processing within the device is carried out in accordance with the highest standards of data protection and privacy. Patient information is managed in an encrypted manner to ensure confidentiality and security.
The research team assumes the role of Data Controller, responsible for the collection and management of study data. Legit.Health acts as the Data Processor and is not involved in the processing of patient data.
The storage and transfer of data comply with European data protection regulations. At the conclusion of the study, all information stored in the device will be permanently and securely deleted.
The device employs robust technical and organizational security measures to safeguard personal data against unauthorized access, alteration, loss, or processing.
Report Date
October 15, 2024
Report author(s)
The full name, the ID and the signature for the authorship, as well as the approval process of this document, can be found in the verified commits at the repository. This information is saved alongside the digital signature, to ensure the integrity of the document.
Table of contents
- Research Title
- Product Identification
- Promoter Identification and Contact
- Identification of sponsor
- Identification of the Clinical Investigation Plan (CIP)
- Public Access Database
- Research Team
- Compliance Statement
- Report Date
- Report author(s)
- Table of contents
- Abbreviations and Definitions
- Summary
- Introduction
- Material and methods
- Results
- Initiation and Completion Dates
- Reader and Device Management
- Image-Case Demographics (n = 100)
- Reader-Participant Characteristics (n = 15)
- Clinical Investigation Plan (CIP) Compliance
- Analysis
- Primary Analysis — confirmatory
- Per-specialty secondary analysis
- Rare-disease pooled secondary analysis
- Exploratory decomposition of device-attributable decision changes
- Exploratory per-pathology analysis
- Exploratory per-specialty, per-pathology tables
- Target-pathology exploratory description
- Adverse Events and Adverse Reactions to the Product
- Product Deficiencies
- Paediatric age-band subgroup — exploratory, non-evaluable for confirmatory claims
- Discussion and Overall Conclusions
- References
- Investigators and Administrative Structure of Clinical Research
- Report Annexes
Abbreviations and Definitions
- AE: Adverse Event
- AEMPS: Spanish Agency of Medicines and Medical Devices
- AEP: Adverse Reaction to Product
- AUC: Area Under the ROC Curve
- CAD: Computer-Aided Diagnosis
- CMD: Data Monitoring Committee
- CIP: Clinical Investigation Plan
- CUS: Clinical Utility Questionnaire
- DLQI: Dermatology Life Quality Index
- GCP: Standards of Good Clinical Practice
- ICH: International Council for Harmonisation
- IFU: Instructions For Use
- IRB: Institutional Review Board
- N/A: Not Applicable
- NCA: National Competent Authority
- PI: Principal Investigator
- PPV: Positive Predictive Value
- NPV: Negative Predictive Value
- SAE: Serious Adverse Events
- SAEP: Serious Adverse Event to Product
- SUAEP: Serious and Unexpected Adverse Event to the Product
- SUS: System Usability Scale
Summary
Title
Simulated-use multi-reader multi-case (MRMC) investigation of the device: effect on the top-1 diagnostic accuracy of healthcare professionals on a curated set of anonymised dermatological images, with specific attention to rare pustular dermatoses and hidradenitis suppurativa.
Introduction
Dermatological conditions represent a significant portion of primary care consultations, constituting approximately 5% of all visits. Discrepancies between diagnoses made by primary care physicians and dermatologists remain substantial, with concordance rates between 57% and 65.52%, particularly in rare and severe conditions such as generalised pustular psoriasis (GPP) and hidradenitis suppurativa (HS); these discrepancies can contribute to misdiagnoses, incorrect referrals and delays in appropriate treatment (the anchoring literature for this Valid Clinical Association is summarised in R-TF-015-011 State of the Art).
This investigation assessed, under simulated-use conditions on a curated set of 100 anonymised dermatological images sourced from public dermatological atlases and from a de-identified sponsor image library, whether use of the device's Top-5 prioritised differential view improves the top-1 diagnostic accuracy of healthcare professionals relative to their unaided reads on the same image set. Fifteen healthcare professionals (11 primary care physicians and 4 dermatologists) each reviewed the image set, first unaided and then with the device. The investigation was conducted following applicable data-protection and research-ethics principles; the sponsor's determination of non-applicability of biomedical-research law and ethics-committee review is documented in Annex E.
Objectives
Primary objective
- To demonstrate that the device, used in its Top-5 prioritised differential-view configuration, increases the top-1 diagnostic accuracy of healthcare professionals on a curated, anonymised image set representing multiple dermatological conditions.
Secondary objectives
- To characterise the paired change in pooled top-1 diagnostic accuracy within the pre-specified rare-disease subgroup (GPP, Acne Conglobata, Palmoplantar Pustulosis, Subcorneal Pustular Dermatosis, AGEP, Pemphigus Vulgaris).
- To characterise the paired change in top-1 diagnostic accuracy stratified by reader specialty (primary care physicians versus dermatologists).
Acceptance criteria
- Top-1 accuracy equal to or greater than 7.00% (User Group: Dermatologists, Primary care practitioners)
- Top-1 accuracy greater than 47.94% (User Group: Dermatologists, Primary care practitioners)
- Top-1 accuracy equal to or greater than 53.96% (User Group: Dermatologists, Primary care practitioners)
- Sensitivity greater than 6.93% (User Group: Dermatologists, Primary care practitioners)
- Sensitivity greater than 70.00% (User Group: Dermatologists, Primary care practitioners)
- Sensitivity greater than 52.61% (User Group: Dermatologists, Primary care practitioners)
- Specificity equal to or greater than 5.06% (User Group: Dermatologists, Primary care practitioners)
- Specificity equal to or greater than 70.00% (User Group: Dermatologists, Primary care practitioners)
- Specificity greater than 56.45% (User Group: Dermatologists, Primary care practitioners)
- Top-1 accuracy equal to or greater than 47.91% (User Group: Primary care practitioners)
- Top-1 accuracy equal to or greater than 46.12% (User Group: Primary care practitioners)
- Sensitivity equal to or greater than 14.30% (User Group: Primary care practitioners)
- Sensitivity equal to or greater than 66.30% (User Group: Primary care practitioners)
- Specificity equal to or greater than 11.88% (User Group: Primary care practitioners)
- Specificity equal to or greater than 70.10% (User Group: Primary care practitioners)
- Top-1 accuracy equal to or greater than 5.83% (User Group: Dermatologists)
- Top-1 accuracy equal to or greater than 57.25% (User Group: Dermatologists)
- Top-1 accuracy equal to or greater than 61.80% (User Group: Dermatologists)
- Sensitivity equal to or greater than 6.93% (User Group: Dermatologists)
- Sensitivity equal to or greater than 70.00% (User Group: Dermatologists)
- Sensitivity greater than 61.64% (User Group: Dermatologists)
- Specificity equal to or greater than 77.60% (User Group: Dermatologists)
- Specificity greater than 62.47% (User Group: Dermatologists)
- Top-1 accuracy equal to or greater than 6.93% (User Group: Dermatologists, Primary care practitioners)
- Top-1 accuracy equal to or greater than 30.90% (User Group: Dermatologists, Primary care practitioners)
- Sensitivity equal to or greater than 6.93% (User Group: Dermatologists, Primary care practitioners)
- Sensitivity greater than 21.04% (User Group: Dermatologists, Primary care practitioners)
- Specificity equal to or greater than 5.06% (User Group: Dermatologists, Primary care practitioners)
- Specificity greater than 38.69% (User Group: Dermatologists, Primary care practitioners)
- Top-1 accuracy equal to or greater than 24.34% (User Group: Primary care practitioners)
- Sensitivity equal to or greater than 14.30% (User Group: Primary care practitioners)
- Sensitivity greater than 19.33% (User Group: Primary care practitioners)
- Specificity equal to or greater than 11.88% (User Group: Primary care practitioners)
- Specificity greater than 36.64% (User Group: Primary care practitioners)
- Top-1 accuracy equal to or greater than 5.83% (User Group: Dermatologists)
- Top-1 accuracy equal to or greater than 48.15% (User Group: Dermatologists)
- Sensitivity greater than 35.89% (User Group: Dermatologists)
- Specificity greater than 55.67% (User Group: Dermatologists)
Population
The reader population consisted of board-certified healthcare professionals (primary care physicians and dermatologists) without prior regular exposure to the device. Fifteen reader-participants were enrolled (11 primary care physicians and 4 dermatologists).
Sample size
The investigation was powered to detect a pre-specified absolute improvement of at least 10 percentage points (pp) in pooled top-1 diagnostic accuracy between unaided and device-aided reads on the same image set, using McNemar's test for paired proportions at a two-sided significance level of 0.05. The target sample of 15 reader-participants each reviewing 100 anonymised dermatological images yields 1,500 paired per-observation reads. For a paired-binary design with an expected unaided baseline accuracy of approximately 50% and an expected aided accuracy of approximately 60% (corresponding to a discordant-pair rate consistent with state-of-the-art aided-HCP performance documented in R-TF-015-011), 1,500 paired observations provide power ≥ 80% to detect the pre-specified 10 pp absolute improvement on the pooled endpoint. Per-pathology, per-age-band and within-specialty stratum analyses were pre-specified as exploratory / hypothesis-generating; the investigation is not powered for confirmatory per-stratum conclusions, and the dermatologist stratum (n = 4) is acknowledged as under-powered for per-specialty confirmatory claims.
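The power statement above can be illustrated with a minimal sketch based on the standard normal approximation for McNemar's test. The discordant-pair rates (`p_gain`, `p_loss`) are hypothetical values chosen only so that the net improvement equals 10 pp; they are not taken from the CIP. The sketch also treats the 1,500 paired reads as independent, which ignores clustering by reader and by case, so it should be read as an optimistic approximation and not as the CIP's own calculation.

```python
from math import sqrt
from statistics import NormalDist


def mcnemar_power(n_pairs: int, p_gain: float, p_loss: float, alpha: float = 0.05) -> float:
    """Approximate power of a two-sided McNemar test (normal approximation).

    p_gain: assumed probability that a pair moves from incorrect (unaided) to correct (aided).
    p_loss: assumed probability that a pair moves from correct (unaided) to incorrect (aided).
    Both rates are illustrative assumptions, not values from the investigation.
    """
    delta = p_gain - p_loss  # net change in accuracy
    psi = p_gain + p_loss    # total discordant-pair rate
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z = (abs(delta) * sqrt(n_pairs) - z_alpha * sqrt(psi)) / sqrt(psi - delta ** 2)
    return NormalDist().cdf(z)


# Hypothetical scenario: 20% of pairs gain, 10% lose, net +10 pp (about 50% -> 60%).
power = mcnemar_power(n_pairs=1500, p_gain=0.20, p_loss=0.10)
print(f"approximate power: {power:.3f}")
```

Under these assumed rates the approximate power is well above the 80% target; a larger total discordant rate for the same net improvement would lower it.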
Design and Methods
Design
The investigation proceeded as follows.
Reader recruitment and image presentation
A centralised, access-controlled web-based platform served as the electronic Case Report Form (eCRF) for image presentation and data capture. The UI surface under evaluation — the Top-5 prioritised differential view — is the integration surface the device mandates on any integrator; the Pillar 3 Clinical Performance claim supported by this investigation applies to that UI configuration. Each reader-participant logged in with individual authenticated credentials and was presented, for each case, with the following pre-specified question set:
- Based on the provided image, what diagnosis do you consider most appropriate? This question was accompanied by anamnesis inquiries regarding allergies, ongoing treatments and other relevant medical history, including systemic symptoms potentially related to conditions such as GPP.
- Considering both the image and the information provided by the device, what diagnosis do you now consider most appropriate? At this step, the same information from question 1 was supplemented with the device's Top-5 prioritised differential diagnoses and their respective confidence levels.
Each reader-participant was presented with 100 cases. These images had been previously confirmed by dermatologists and were sourced from public dermatological atlases and from a de-identified image library provided by the sponsor. The conditions were distributed as follows:
| Condition | ICD-11 code | Number of images |
|---|---|---|
| Generalised pustular psoriasis | EA90.40 | 10 |
| Eczematous dermatitis | EA89 | 10 |
| Acute generalised exanthematous pustulosis | EH67.0 | 5 |
| Acne | ED80 | 5 |
| Acne conglobata | ED80.41 | 10 |
| Severe inflammatory acne | ED80.4 | 5 |
| Seborrheic keratosis | 2F21.0 | 5 |
| Seborrheic dermatitis | EA81 | 5 |
| Palmoplantar pustulosis | EA90.42 | 5 |
| Plaque psoriasis | EA90.0 | 5 |
| Pemphigus vulgaris | EB40.0 | 5 |
| Impetigo | 1B72 | 10 |
| Hidradenitis suppurativa | ED92.0 | 10 |
| Subcorneal pustular dermatosis | EB2Y | 5 |
| Tinea corporis | 1F28.Y | 5 |
The list of conditions includes the target condition GPP, as well as related pustular dermatoses (subcorneal pustular dermatosis, palmoplantar pustulosis, AGEP) and common confounders in primary care (eczematous dermatitis, seborrheic dermatitis, plaque psoriasis, impetigo, tinea corporis).
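As a consistency check on the table above, the following sketch totals the image counts and derives the number of images in the rare-disease subgroup named in the secondary objectives. The counts are copied from the table; the variable names are illustrative, and the subgroup total is a derived figure, not one stated in the CIP.

```python
# Image counts per condition, copied from the distribution table.
IMAGES_PER_CONDITION = {
    "Generalised pustular psoriasis": 10,
    "Eczematous dermatitis": 10,
    "Acute generalised exanthematous pustulosis": 5,
    "Acne": 5,
    "Acne conglobata": 10,
    "Severe inflammatory acne": 5,
    "Seborrheic keratosis": 5,
    "Seborrheic dermatitis": 5,
    "Palmoplantar pustulosis": 5,
    "Plaque psoriasis": 5,
    "Pemphigus vulgaris": 5,
    "Impetigo": 10,
    "Hidradenitis suppurativa": 10,
    "Subcorneal pustular dermatosis": 5,
    "Tinea corporis": 5,
}

# Rare-disease subgroup as listed in the secondary objectives.
RARE_DISEASES = {
    "Generalised pustular psoriasis",
    "Acne conglobata",
    "Palmoplantar pustulosis",
    "Subcorneal pustular dermatosis",
    "Acute generalised exanthematous pustulosis",
    "Pemphigus vulgaris",
}

total_images = sum(IMAGES_PER_CONDITION.values())
rare_images = sum(n for name, n in IMAGES_PER_CONDITION.items() if name in RARE_DISEASES)
print(total_images, rare_images)  # 100 40
```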
All reader responses were recorded in the secure central database and, upon database lock, exported in a controlled, structured tabular format for analysis. The pre-specified statistical analysis was implemented in a deterministic, version-controlled analytics environment maintained by the manufacturer; statistical tests (including McNemar's test for the primary endpoint) and confidence intervals were computed as specified in the CIP §Statistical analysis.
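For orientation, McNemar's test for paired proportions depends only on the two discordant-pair counts: reads that changed from incorrect to correct with the device, and reads that changed from correct to incorrect. The sketch below implements the exact (binomial) two-sided form. It is an illustration only: the variant used in the manufacturer's analytics environment is not stated here, and the function and argument names are hypothetical.

```python
from math import comb


def mcnemar_exact(gained: int, lost: int) -> float:
    """Two-sided exact McNemar p-value from the two discordant-pair counts.

    gained: pairs incorrect unaided and correct aided.
    lost:   pairs correct unaided and incorrect aided.
    """
    n = gained + lost
    if n == 0:
        return 1.0  # no discordant pairs, so no evidence of a change
    k = min(gained, lost)
    # Probability of a split at least this extreme under Binomial(n, 0.5), one tail.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Illustrative counts only, not data from the investigation.
print(mcnemar_exact(gained=5, lost=0))    # 0.0625
print(mcnemar_exact(gained=10, lost=10))  # 1.0
```

Concordant pairs do not enter the statistic, which is why a pathology with no net change in correct reads yields p = 1.00.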
Number of reader-participants
A total of 15 healthcare professionals (11 primary care physicians and 4 dermatologists) were recruited and participated in the investigation.
Initiation and completion dates
- Initiation: 6 June 2024
- Completion (data lock): 15 September 2024
- Report signed off: 15 October 2024
Duration
The investigation spanned approximately three months of data collection (6 June 2024 to 15 September 2024), followed by approximately one month of database cleaning, analysis and report preparation (to 15 October 2024). The total duration from first reader session to signed report was approximately four months.
Methods
The investigation applied a prospective, simulated-use, multi-reader multi-case (MRMC) self-controlled design to evaluate whether the use of the device improved the top-1 diagnostic accuracy of healthcare professionals on a curated set of 100 anonymised dermatological images, with particular attention to rare pustular dermatoses and hidradenitis suppurativa. Each reader-participant served as their own comparator, first providing diagnoses unaided and subsequently providing diagnoses with the device's Top-5 prioritised differential view on the same image set.
Results
Fifteen reader-participants (11 primary care physicians and 4 dermatologists) reviewed the curated 100-image set. Nine reader-participants completed the full 100-image protocol; six completed a partial set (99, 93, 93, 80, 77 and 68 images respectively), for a total of 1,449 paired per-observation reads (96.6% of the planned 1,500). The partial-completion deviation is documented in §Clinical Investigation Plan (CIP) Compliance below; the pre-specified primary analysis population is the per-observation paired set (complete-case handling).
All 95% confidence intervals below are reported using the Wilson score method; Newcombe 95% CIs for the paired difference are reported in the Analysis section.
On the pre-specified primary endpoint (pooled per-observation paired analysis across all healthcare professionals):
- Top-1 diagnostic accuracy: 47.94% (95% Wilson CI 45.4–50.5%) unaided → 63.06% (95% Wilson CI 60.5–65.5%) aided; absolute improvement +15.12 pp (McNemar p < 0.001).
- Diagnostic sensitivity (descriptive secondary): 52.61% unaided → 71.04% aided; absolute improvement +18.43 pp.
- Diagnostic specificity (descriptive secondary): 56.45% unaided → 75.83% aided; absolute improvement +19.38 pp.
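The Wilson score intervals quoted for the primary endpoint can be approximated with the sketch below. It assumes that the denominator for both proportions is the 1,449 paired reads and that reads are treated as independent observations; neither assumption is confirmed by the text above, so this is a plausibility check and not the report's own computation.

```python
from math import sqrt
from statistics import NormalDist


def wilson_interval(p_hat: float, n: int, confidence: float = 0.95) -> tuple[float, float]:
    """Wilson score interval for a proportion p_hat observed on n reads."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    denom = 1 + z ** 2 / n
    centre = (p_hat + z ** 2 / (2 * n)) / denom
    half = z * sqrt(p_hat * (1 - p_hat) / n + z ** 2 / (4 * n ** 2)) / denom
    return centre - half, centre + half


# Reported pooled top-1 accuracies; n = 1449 is an assumption (total paired reads).
for label, p_hat in (("unaided", 0.4794), ("aided", 0.6306)):
    low, high = wilson_interval(p_hat, 1449)
    print(f"{label}: {100 * low:.1f}% to {100 * high:.1f}%")
```

Under these assumptions the computed bounds agree with the reported intervals to within rounding.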
On the pre-specified per-specialty secondary endpoint:
- Primary care physicians (11 readers, 1,049 paired observations): 44.71% → 61.71% top-1 accuracy; absolute improvement +17.00 pp (exceeds the pre-specified ≥ 10 pp threshold).
- Dermatologists (4 readers, 400 paired observations): 57.25% → 65.65% top-1 accuracy; absolute improvement +8.39 pp (exceeds the pre-specified ≥ 5 pp threshold). The dermatologist stratum is acknowledged as under-powered for per-pathology confirmatory claims.
On the pre-specified rare-disease secondary endpoint (pooled across GPP, Acne Conglobata, Palmoplantar Pustulosis, Subcorneal Pustular Dermatosis, AGEP, Pemphigus Vulgaris):
- All healthcare professionals: 25.56% → 57.88% top-1 accuracy; absolute improvement +32.32 pp (on the pooled rare-disease subgroup; see Analysis for per-pathology caveats).
Per-pathology analyses (2–10 images per pathology) are pre-specified as exploratory / hypothesis-generating and are reported in the Analysis section; no per-pathology confirmatory benefit claim is made on the basis of this investigation, and the per-pathology endpoints not met are named explicitly in §Analysis — Endpoints not met.
Conclusions
The pre-specified primary endpoint was met: use of the device's Top-5 prioritised differential view increased pooled top-1 diagnostic accuracy among healthcare professionals from 47.94% to 63.06% (absolute improvement +15.12 pp; McNemar p < 0.001) on the curated anonymised image set. Under MDCG 2020-6 Appendix III this is Rank 11 supporting evidence; per MDCG 2020-1 §4.4 it contributes Pillar 3 Clinical Performance supporting evidence — that the clinician, using the device's Top-5 prioritised differential view, makes measurably more accurate top-1 diagnostic decisions than without the device on the curated image set.
The two pre-specified per-specialty secondary endpoints (PCP ≥ 10 pp, dermatologist ≥ 5 pp) were met on the pooled per-specialty comparisons.
Diagnostic accuracy is a surrogate endpoint. The patient-benefit chain (earlier correct treatment, reduced disease progression, more efficient referral pathways) is indirect and is anchored by published Valid Clinical Association literature summarised in R-TF-015-011 State of the Art; it is not demonstrated by this investigation in isolation and is assessed at Clinical Evaluation level in R-TF-015-003.
Per-pathology analyses were exploratory and under-powered. Specifically, per-pathology paired change did not reach statistical significance for acute generalised exanthematous pustulosis (AGEP) (Δ = 0.00 pp, exploratory p = 1.00), subcorneal pustular dermatosis (Δ = 0.00 pp, exploratory p = 1.00), seborrheic keratosis (Δ = +1.33 pp, exploratory p = 1.00), eczematous dermatitis (Δ = +1.83 pp, exploratory p = 0.63) and plaque psoriasis (Δ = +5.41 pp, exploratory p = 0.13). No per-pathology confirmatory benefit claim is made on the basis of this investigation for any of these conditions; the Clinical Evaluation (R-TF-015-003) addresses per-pathology performance using the broader body of clinical evidence, and the PMCF Plan (R-TF-007-002) records the per-pathology exploratory trends as PMCF topics.
Claims concerning Fitzpatrick V/VI performance, confirmatory paediatric performance, and real-world deployment performance are not made on the basis of this investigation alone and are addressed by dedicated evidence within the Clinical Evaluation and by the PMCF Plan.
Introduction
Dermatological conditions represent a significant portion of primary care consultations, constituting approximately 5% of all visits. Discrepancies between diagnoses made by primary care physicians and dermatologists remain substantial, with concordance rates between 57% and 65.52%, and the limited availability of dermatologists (particularly in rural settings) further complicates patient access. This clinical gap motivates interest in diagnostic-decision-support tools that can operate within the primary-care consultation, and specifically for the recognition of rare pustular dermatoses and hidradenitis suppurativa where delayed recognition is a known source of patient harm.
Under MDCG 2020-6 Appendix III the resulting evidence from this investigation is Rank 11 (simulated-use reader study on retrospective images); per MDCG 2020-1 §4.4 it contributes Pillar 3 Clinical Performance supporting evidence — measuring the clinician's diagnostic decision-making when using the device's Top-5 prioritised differential view on a curated anonymised image set. Pillar 2 (algorithm-level analytical performance) is evidenced independently at Clinical Evaluation level and is not the subject of this investigation; Pillar 1 (Valid Clinical Association literature anchoring diagnostic accuracy to patient outcomes) is documented in R-TF-015-011 State of the Art. This investigation measures the change in top-1 diagnostic accuracy of healthcare professionals attributable to use of the device's Top-5 prioritised differential view on the curated image set, and nothing more; real-world patient-outcome, triage, teledermatology and healthcare-economics claims are explicitly out of scope.
Material and methods
Product Description
This section contains a short summary of the device. A complete description of the intended purpose, including device description, can be found in the record Legit.Health Plus description and specifications.
Product description
The device is a computational software-only medical device leveraging computer vision algorithms to process images of the epidermis, the dermis and its appendages, among other skin structures. Its principal function is to provide a wide range of clinical data from the analyzed images to assist healthcare practitioners in their clinical evaluations and allow healthcare provider organisations to gather data and improve their workflows.
The generated data is intended to aid healthcare practitioners and organizations in their clinical decision-making process, thus enhancing the efficiency and accuracy of care delivery.
The device should never be used to confirm a clinical diagnosis; its result is one element of the overall clinical assessment. The device is designed to be used when a healthcare practitioner chooses to obtain additional information to inform a decision.
Intended purpose
The device is a computational software-only medical device intended to support health care providers in the assessment of skin structures, enhancing efficiency and accuracy of care delivery, by providing:
- quantification of intensity, count, extent of visible clinical signs
- interpretative distribution representation of possible International Classification of Diseases (ICD) categories.
Intended previous uses
No specific intended use was designated in prior stages of development.
Product changes during clinical research
The device's performance and features remained unchanged throughout the clinical investigation. No alterations or modifications were made during this period.
Clinical Investigation Plan
Objectives
To demonstrate that the device, used in its Top-5 prioritised differential-view configuration, increases the top-1 diagnostic accuracy of healthcare professionals on a curated, anonymised image set representing multiple dermatological conditions, and to characterise — as pre-specified secondary endpoints — the paired change in pooled accuracy within the rare-disease subgroup and stratified by reader specialty.
Acceptance criteria
- Top-1 accuracy equal to or greater than 7.00% (User Group: Dermatologists, Primary care practitioners)
- Top-1 accuracy greater than 47.94% (User Group: Dermatologists, Primary care practitioners)
- Top-1 accuracy equal to or greater than 53.96% (User Group: Dermatologists, Primary care practitioners)
- Sensitivity greater than 6.93% (User Group: Dermatologists, Primary care practitioners)
- Sensitivity greater than 70.00% (User Group: Dermatologists, Primary care practitioners)
- Sensitivity greater than 52.61% (User Group: Dermatologists, Primary care practitioners)
- Specificity equal to or greater than 5.06% (User Group: Dermatologists, Primary care practitioners)
- Specificity equal to or greater than 70.00% (User Group: Dermatologists, Primary care practitioners)
- Specificity greater than 56.45% (User Group: Dermatologists, Primary care practitioners)
- Top-1 accuracy equal to or greater than 47.91% (User Group: Primary care practitioners)
- Top-1 accuracy equal to or greater than 46.12% (User Group: Primary care practitioners)
- Sensitivity equal to or greater than 14.30% (User Group: Primary care practitioners)
- Sensitivity equal to or greater than 66.30% (User Group: Primary care practitioners)
- Specificity equal to or greater than 11.88% (User Group: Primary care practitioners)
- Specificity equal to or greater than 70.10% (User Group: Primary care practitioners)
- Top-1 accuracy equal to or greater than 5.83% (User Group: Dermatologists)
- Top-1 accuracy equal to or greater than 57.25% (User Group: Dermatologists)
- Top-1 accuracy equal to or greater than 61.80% (User Group: Dermatologists)
- Sensitivity equal to or greater than 6.93% (User Group: Dermatologists)
- Sensitivity equal to or greater than 70.00% (User Group: Dermatologists)
- Sensitivity greater than 61.64% (User Group: Dermatologists)
- Specificity equal to or greater than 77.60% (User Group: Dermatologists)
- Specificity greater than 62.47% (User Group: Dermatologists)
- Top-1 accuracy equal to or greater than 6.93% (User Group: Dermatologists, Primary care practitioners)
- Top-1 accuracy equal to or greater than 30.90% (User Group: Dermatologists, Primary care practitioners)
- Sensitivity equal to or greater than 6.93% (User Group: Dermatologists, Primary care practitioners)
- Sensitivity greater than 21.04% (User Group: Dermatologists, Primary care practitioners)
- Specificity equal to or greater than 5.06% (User Group: Dermatologists, Primary care practitioners)
- Specificity greater than 38.69% (User Group: Dermatologists, Primary care practitioners)
- Top-1 accuracy equal to or greater than 24.34% (User Group: Primary care practitioners)
- Sensitivity equal to or greater than 14.30% (User Group: Primary care practitioners)
- Sensitivity greater than 19.33% (User Group: Primary care practitioners)
- Specificity equal to or greater than 11.88% (User Group: Primary care practitioners)
- Specificity greater than 36.64% (User Group: Primary care practitioners)
- Top-1 accuracy equal to or greater than 5.83% (User Group: Dermatologists)
- Top-1 accuracy equal to or greater than 48.15% (User Group: Dermatologists)
- Sensitivity greater than 35.89% (User Group: Dermatologists)
- Specificity greater than 55.67% (User Group: Dermatologists)
The primary confirmatory endpoint is the paired absolute improvement of ≥ 10 pp in pooled top-1 diagnostic accuracy across the full reader cohort (all specialties pooled). The secondary confirmatory endpoints are (i) per-specialty paired absolute improvement with thresholds PCP ≥ 10 pp and dermatologist ≥ 5 pp (supportive only on the dermatologist stratum, which is acknowledged as under-powered), and (ii) rare-disease pooled paired absolute improvement using the same PCP ≥ 10 pp / dermatologist ≥ 5 pp structure (pre-specified per CIP §Justification of acceptance thresholds; thresholds derived from R-TF-015-011 State of the Art). All per-pathology and per-age-band comparisons are pre-specified as exploratory / hypothesis-generating.
Design (type of research, assessment criteria, methods, active group, and control group)
The investigation is a prospective, simulated-use, multi-reader multi-case (MRMC) self-controlled reader study. There is no separate active or control group: each reader-participant serves as their own comparator, first diagnosing each image unaided and then re-diagnosing the same image with the device's Top-5 prioritised differential view. The assessment criterion is the paired change in top-1 diagnostic accuracy across the full image set, captured through a secure, access-controlled web-based eCRF platform.
Ethical considerations
This study adhered to international Good Clinical Practice (GCP) guidelines, the Declaration of Helsinki in its latest amendment, and applicable international and national regulations. As applicable, approval from the relevant Ethics Committee was obtained prior to the initiation of the study. When applicable, modifications to the protocol were reviewed and approved by the Principal Investigator (PI) and subsequently evaluated by the Ethics Committee before subjects were enrolled under a modified protocol.
This study was conducted in compliance with European Regulation 2016/679, of 27 April, concerning the protection of natural persons with regard to the processing of personal data and the free movement of such data (General Data Protection Regulation, GDPR), and Organic Law 3/2018, of 5 December, on the Protection of Personal Data and the guarantee of digital rights. In accordance with these regulations, no data enabling the personal identification of participants was collected, and all information was managed securely in an encrypted format.
Participants were informed both orally and in writing about all relevant aspects of the study, with the information tailored to their level of understanding. They were provided with a copy of the informed consent form and the accompanying patient information sheet. Participants were given adequate time to ask questions and to fully understand the details of the study before providing their consent.
The PI was responsible for the preparation of the informed consent form, ensuring it included all elements required by the International Conference on Harmonisation (ICH), adhered to current regulatory guidelines, and complied with the ethical principles of GCP and the Declaration of Helsinki.
The original signed informed consent forms were securely stored in a restricted access area under the custody of the PI. These documents remained at the research site at all times. Participants were provided with a copy of their signed consent form for their records.
Data confidentiality
The study will comply with current legislation on the protection of data confidentiality (European Regulation 2016/679, of 27 April, on the protection of natural persons with regard to the processing of personal data and on the free movement of such data, and Organic Law 3/2018, of 5 December, on the Protection of Personal Data and the guarantee of digital rights). For this purpose, when applicable, each participant will receive an alphanumeric study identification code that does not include any data allowing personal identification (coded CRF). The Principal Investigator will keep a separate list linking the identification codes of the patients participating in the study to their clinical and personal data. This list will be filed in a secure, restricted-access area under the custody of the Principal Investigator and will never leave the centre.
Once the paper CRFs are completed and closed by the Principal Investigator, the data will be transferred to a database.
As with the CRFs, the database will comply with the same data-confidentiality legislation (European Regulation 2016/679 and Organic Law 3/2018, as cited above); no data allowing personal identification of patients will be included in it.
Data Quality Assurance
The Principal Investigator is responsible for reviewing and approving the protocol, signing the Principal Investigator commitment, guaranteeing that the persons involved in the centre respect the confidentiality of patient information and protect personal data, and reviewing and approving the final study report together with the sponsor. All the clinical members of the research team assess the eligibility of the patients in the study, inform them and request their written informed consent, collect the source data of the study in the clinical record, and transfer them to the Data Collection Notebook (DCN) or Data Collection Forms (CRF).
Reader Population
The investigation enrolled 15 healthcare professionals (11 primary care physicians and 4 dermatologists) to review anonymised dermatological images and record their diagnostic decisions. Nine reader-participants completed the full 100-image set; six completed a partial set (68–99 images); 1,449 paired per-observation reads were captured in total (96.6% of the planned 1,500).
Inclusion criteria
- Board-certified primary care physicians and dermatologists, regardless of years of professional experience, without prior regular exposure to the device.
- Dermatological images with confirmed reference diagnosis, meeting pre-specified technical quality criteria (sufficient resolution, adequate focus, lesion clearly framed).
Exclusion criteria
- Images of insufficient technical quality (blurred, poorly framed, or with insufficient lesion coverage) that cannot be properly analysed.
Statistical Analysis
The pre-specified primary analysis tested the paired change in pooled top-1 diagnostic accuracy across the full image set using McNemar's test for paired proportions at a two-sided significance level of 0.05, with Wilson 95% confidence intervals for each proportion and Newcombe 95% confidence intervals for the paired difference. Secondary analyses of the rare-disease pooled subgroup and of the per-specialty strata were reported with Wilson 95% CIs and Newcombe 95% CIs for the paired difference. Per-pathology analyses were pre-specified as exploratory / hypothesis-generating; no multiplicity correction was applied across exploratory analyses, consistent with their hypothesis-generating purpose. Missing reader responses were excluded from the paired analysis for that reader-image pair (complete-case handling); a sensitivity analysis at the 100%-completer reader level (9 readers, 900 paired observations) was pre-specified and confirmed the direction and magnitude of the primary estimate. Zero-cell strata used exact confidence intervals and were flagged as non-evaluable for confirmatory purposes. Inter-reader agreement on the primary endpoint was characterised descriptively. All analyses were executed in a deterministic, version-controlled analytics environment maintained by the manufacturer.
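As an illustration of the paired test named above, the following is a minimal sketch of a two-sided exact (binomial) McNemar test written with the Python standard library only. It is not the manufacturer's analytics code: the report does not state which McNemar variant was used, so the exact form is an assumption, and the function name `mcnemar_exact` and the example counts are hypothetical.

```python
from math import comb


def mcnemar_exact(b: int, c: int) -> float:
    """Two-sided exact McNemar p-value from the two discordant-pair counts.

    b: pairs that were incorrect unaided and correct aided.
    c: pairs that were correct unaided and incorrect aided.
    Concordant pairs (both correct or both incorrect) do not enter the test.
    """
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs, so no evidence of a paired change
    # Under the null hypothesis each discordant pair is equally likely to fall
    # in either direction, so the smaller count follows Binomial(n, 0.5).
    tail = sum(comb(n, k) for k in range(min(b, c) + 1))
    return min(1.0, 2 * tail / 2**n)


# Hypothetical discordant counts, for illustration only (not study data).
p_value = mcnemar_exact(b=30, c=10)
print(f"exact McNemar p = {p_value:.4f}")
```

Only the discordant pairs drive the result, which is why the test suits a self-controlled design in which every reader-image pair is read twice.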
Performance claims cross-reference
Results
Initiation and Completion Dates
- Initiation: 6 June 2024
- Completion (data lock): 15 September 2024
- Report signed off: 15 October 2024
Reader and Device Management
A total of 15 healthcare professionals (reader-participants) took part in the investigation. Each reader-participant was presented with the 100-image set and, for each image, recorded an unaided diagnosis followed by a device-aided diagnosis. The device — a software-only, web-accessible medical device — was provisioned to reader-participants through individual authenticated credentials on the secure eCRF platform; version identification (v1.1.0.0), session logging and complete audit-trail records were maintained for the duration of the investigation and retained under the custody of the Principal Investigator.
Image-Case Demographics (n = 100)
The characteristics reported in this section describe the 100 anonymised image cases presented to the reader-participants (i.e., the patient cases depicted in the images sourced from public atlases and from the de-identified sponsor image library), not the 15 reader-participants. Age and phototype metadata were extracted from the source atlas records; where a source did not report a given attribute, the image was still included in the pooled primary analysis as pre-specified. One source record carried inconsistent gender metadata (counted in one denominator totalling 101 rather than 100); this does not affect the image-count denominator of 100 used for all pooled analyses, and has been reconciled to the 100-image protocol.
Gender (image-case level): Men 63 (63.0%), Women 37 (37.0%).
| Age Group | Count | Percentage |
|---|---|---|
| Newborn (birth to 1 month) | 0 | 0.0% |
| Infant (1 month to 2 years) | 3 | 3.0% |
| Child (2 to 12 years) | 14 | 14.0% |
| Adolescent (12 to 21 years) | 20 | 20.0% |
| Adult (22 to 64 years) | 51 | 51.0% |
| Elderly (≥ 65 years) | 12 | 12.0% |

| Fitzpatrick Phototype | Count | Percentage |
|---|---|---|
| Phototype I | 20 | 20.0% |
| Phototype II | 43 | 43.0% |
| Phototype III | 22 | 22.0% |
| Phototype IV | 9 | 9.0% |
| Phototype V | 6 | 6.0% |
| Phototype VI | 0 | 0.0% |
The image set contains no Fitzpatrick VI cases and only 6 Fitzpatrick V cases. Accordingly, this investigation does not, on its own, support performance claims on Fitzpatrick V or VI; phototype coverage is addressed by dedicated phototype-bridging evidence at Clinical Evaluation level (R-TF-015-003) and by the PMCF Plan (R-TF-007-002).
Reader-Participant Characteristics (n = 15)
Fifteen healthcare professionals (11 primary care physicians and 4 dermatologists) acted as reader-participants. They are identified under pseudonymised codes in the exported analysis dataset; the un-pseudonymised mapping is retained in the investigator identification-code list under the custody of the Principal Investigator and is available on audit request. Reader-participants were board-certified physicians without prior regular exposure to the device.
Clinical Investigation Plan (CIP) Compliance
The investigation was conducted in accordance with the CIP. The following CIP deviations are recorded:
| Ref | Description | Impact assessment | Action |
|---|---|---|---|
| D-01 | Partial completion by 6 of 15 reader-participants (completing 68, 77, 80, 93, 93 and 99 images respectively rather than the full 100), owing to clinical-workload constraints during the reading window | 1,449 of 1,500 planned paired observations captured (96.6%); pre-specified complete-case handling applied; a sensitivity analysis restricted to the 9 full-completer readers (900 paired observations) confirmed the direction and magnitude of the primary estimate. No impact on the primary-endpoint pass/fail call. | Documented as a minor deviation; no CAPA required. |
| D-02 | Source-record gender-metadata discrepancy (one source-record totalling 101 instead of 100) reconciled to the 100-image protocol prior to analysis | No impact on the image-count denominator of 100 used for all pooled analyses; reconciliation recorded in the data-management log. | Documented for traceability; no CAPA required. |
No other CIP deviations occurred during the conduct of the investigation. No deviations impacted reader or image-case safety, data integrity, analysis population, or pre-specified endpoints.
Analysis
Primary Analysis — confirmatory
The pre-specified primary endpoint was the paired change in pooled top-1 diagnostic accuracy across the full image set (1,449 paired per-observation reads captured out of 1,500 planned). Unaided top-1 accuracy was 47.94% (95% Wilson CI 45.4–50.5%); aided top-1 accuracy was 63.06% (95% Wilson CI 60.5–65.5%). The absolute paired improvement was +15.12 pp (95% Newcombe CI for the paired difference is reported in the locked analysis dataset and is consistent in direction and magnitude with the point estimate); McNemar's test for paired proportions gave p < 0.001. The primary endpoint was met.
A pre-specified sensitivity analysis restricted to the 9 full-completer readers (900 paired observations) yielded results consistent in direction and magnitude with the primary estimate (+15 pp absolute improvement; McNemar p < 0.001), supporting the primary estimate under the partial-completion deviation.
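The Wilson intervals quoted for the primary endpoint can be reproduced approximately from the reported proportions and the 1,449 paired observations. The sketch below is illustrative only and assumes the standard Wilson score formula without continuity correction; the function name `wilson_ci` is hypothetical, and the proportions are entered as the rounded values printed in this report rather than as the underlying counts.

```python
from math import sqrt
from statistics import NormalDist


def wilson_ci(p_hat: float, n: int, confidence: float = 0.95) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (no continuity correction)."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    denom = 1 + z**2 / n
    centre = (p_hat + z**2 / (2 * n)) / denom
    half_width = z * sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width


N_PAIRED = 1449  # paired per-observation reads reported for the primary analysis

# Rounded proportions as printed in the report (assumption: used in place of counts).
for label, p_hat in (("unaided", 0.4794), ("aided", 0.6306)):
    lo, hi = wilson_ci(p_hat, N_PAIRED)
    print(f"{label}: {100 * p_hat:.2f}% (95% Wilson CI {100 * lo:.1f}-{100 * hi:.1f}%)")
```

Under these assumptions the computed bounds round to 45.4–50.5% and 60.5–65.5%, in line with the intervals reported above.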
Per-specialty secondary analysis
| Specialty | Readers | Paired obs. | Unaided accuracy | Aided accuracy | Absolute difference | Pre-specified threshold | Call |
|---|---|---|---|---|---|---|---|
| Primary care | 11 | 1,049 | 44.71% | 61.71% | +17.00 pp | ≥ 10 pp | Pass |
| Dermatologists | 4 | 400 | 57.25% | 65.65% | +8.39 pp | ≥ 5 pp | Directional, supportive |
The primary-care secondary endpoint was met on the pre-specified ≥ 10 pp threshold. The dermatologist-stratum point estimate exceeded the ≥ 5 pp threshold; however, with n = 4 readers the stratum is under-powered for confirmatory claims and is therefore treated as a directional, supportive observation only — not as an independent confirmatory pass. Per-pathology analyses within the dermatologist stratum are reported descriptively and labelled exploratory.
Rare-disease pooled secondary analysis
Pooled across the six pre-specified rare-disease pathologies (GPP, Acne Conglobata, Palmoplantar Pustulosis, Subcorneal Pustular Dermatosis, AGEP, Pemphigus Vulgaris):
| Specialty | Unaided accuracy | Aided accuracy | Absolute difference |
|---|---|---|---|
| All healthcare professionals | 25.56% | 57.88% | +32.32 pp |
| Primary care | 24.34% | 56.44% | +32.10 pp |
| Dermatologists | 48.15% | 61.11% | +12.97 pp |
Per-pathology analyses within the rare-disease subgroup are reported below as exploratory / hypothesis-generating; per-pathology confirmatory benefit claims are not made on the basis of this investigation.
Exploratory decomposition of device-attributable decision changes
The following decomposition of the paired per-observation reads is reported as an exploratory / hypothesis-generating description of how the device-aided decision related to the unaided decision and the reference diagnosis. It is not a pre-specified endpoint and is not the basis of any confirmatory claim:
- Device reinforced a correct unaided decision (both reads correct) — 46.13% of reads.
- Device improved an incorrect unaided decision (unaided incorrect, aided correct) — 16.61% of reads.
- No change between reads (unaided and aided decisions identical and incorrect) — 35.42% of reads.
- Device-aided decision switched a correct unaided decision to an incorrect one (automation-bias direction) — 1.77% of reads.
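The four categories above can be derived mechanically from each paired read once both decisions have been scored against the reference diagnosis. The following is a minimal sketch under stated assumptions: the names `classify_pair` and `decompose` and the sample data are hypothetical, and the sketch groups all reads that are incorrect both times into one category, whereas the list above counts only those where the two decisions were also identical.

```python
from collections import Counter

CATEGORIES = ("reinforced", "improved", "unchanged_incorrect", "switched_to_incorrect")


def classify_pair(unaided_correct: bool, aided_correct: bool) -> str:
    """Map one paired read to a decomposition category."""
    if unaided_correct and aided_correct:
        return "reinforced"
    if not unaided_correct and aided_correct:
        return "improved"
    if unaided_correct and not aided_correct:
        return "switched_to_incorrect"
    return "unchanged_incorrect"


def decompose(pairs: list[tuple[bool, bool]]) -> dict[str, float]:
    """Return the percentage of paired reads falling in each category."""
    if not pairs:
        raise ValueError("at least one paired read is required")
    counts = Counter(classify_pair(u, a) for u, a in pairs)
    return {name: 100 * counts[name] / len(pairs) for name in CATEGORIES}


# Hypothetical paired reads (unaided_correct, aided_correct), not study data.
sample = [(True, True)] * 5 + [(False, True)] * 2 + [(False, False)] * 2 + [(True, False)]
for name, pct in decompose(sample).items():
    print(f"{name}: {pct:.1f}%")
```

The "improved" and "switched_to_incorrect" counts are the two discordant cells that a McNemar test on the same paired reads would use.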
The 1.77% device-aided "switch to incorrect" proportion is traced to the risk management record R-TF-013-002 under the two relevant pre-existing risk rows — R-DAG ("Incorrect diagnosis or follow up — the medical device outputs a wrong result and the HCP, unaware of the malfunction, relies on the device output") and R-HAX ("Incorrect interpretation of device outputs — the HCP validates the wrong skin condition") — which together describe the automation-bias direction. The post-observation residual-risk re-evaluation confirms that the observed 1.77% proportion sits within the acceptable band of the existing R-DAG and R-HAX residual-risk determinations and does not require a new risk control. The observed proportion is covered by the existing risk controls (IFU warnings on the non-binding nature of the device output, requirement that the clinician remains the decision-maker, and the Top-5 prioritised differential view rather than a single binding answer), and is monitored as part of the PMCF Plan (R-TF-007-002).
Exploratory per-pathology analysis
Per-pathology analyses are pre-specified as exploratory / hypothesis-generating and are under-powered (2–10 images per category). No per-pathology confirmatory benefit claim is made on the basis of this investigation. P-values below are labelled exploratory and were not corrected for multiplicity; they are reported to characterise directional trends only.
| Condition | Unaided accuracy (%) | Aided accuracy (%) | Absolute difference (pp) | p-value (exploratory) |
|---|---|---|---|---|
| Generalised pustular psoriasis | 23.70 | 46.67 | +22.97 | <0.001 |
| Eczematous dermatitis | 71.34 | 73.17 | +1.83 | 0.629 |
| Acute generalised exanthematous pustulosis | 5.00 | 5.00 | 0.00 | 1.000 |
| Acne | 37.50 | 54.69 | +17.19 | 0.007 |
| Acne conglobata | 18.40 | 37.60 | +19.20 | <0.001 |
| Severe inflammatory acne | 10.61 | 43.94 | +33.33 | <0.001 |
| Seborrheic keratosis | 94.67 | 96.00 | +1.33 | 1.000 |
| Seborrheic dermatitis | 75.34 | 90.41 | +15.07 | 0.001 |
| Palmoplantar pustulosis | 45.31 | 79.69 | +34.38 | <0.001 |
| Plaque psoriasis | 91.89 | 97.30 | +5.41 | 0.125 |
| Pemphigus vulgaris | 28.77 | 56.16 | +27.39 | <0.001 |
| Impetigo | 57.43 | 75.68 | +18.25 | <0.001 |
| Hidradenitis suppurativa | 85.48 | 93.60 | +8.12 | 0.002 |
| Subcorneal pustular dermatosis | 2.67 | 2.67 | 0.00 | 1.000 |
| Tinea corporis | 35.96 | 62.50 | +26.54 | <0.001 |
Exploratory per-pathology comparisons that did not reach statistical significance
These comparisons were never elevated to the status of pre-specified endpoints; they are reported here as part of the exploratory / hypothesis-generating per-pathology analysis, for transparency of the directional pattern:
- Acute generalised exanthematous pustulosis (AGEP): Δ = 0.00 pp, exploratory p = 1.000.
- Subcorneal pustular dermatosis: Δ = 0.00 pp, exploratory p = 1.000.
- Seborrheic keratosis: Δ = +1.33 pp, exploratory p = 1.000.
- Eczematous dermatitis: Δ = +1.83 pp, exploratory p = 0.629.
- Plaque psoriasis: Δ = +5.41 pp, exploratory p = 0.125.
These per-pathology analyses are exploratory and under-powered; the investigation is not powered for, and does not support, per-pathology confirmatory benefit claims for any of these conditions. Claims relating to these pathologies are addressed at Clinical Evaluation level (R-TF-015-003) using the broader body of clinical evidence, and the exploratory trends are recorded as PMCF topics in R-TF-007-002.
Exploratory per-specialty, per-pathology tables
The tables below report per-pathology paired change stratified by reader specialty, for descriptive purposes only. The dermatologist stratum (n = 4) is severely under-powered at per-pathology level and these results must not be interpreted as confirmatory.
Primary care physicians (exploratory)
| Condition | Unaided accuracy (%) | Aided accuracy (%) | Absolute difference (pp) |
|---|---|---|---|
| Generalised pustular psoriasis | 20.20 | 44.44 | +24.24 |
| Eczematous dermatitis | 68.33 | 70.83 | +2.50 |
| Acute generalised exanthematous pustulosis | 0.00 | 0.00 | 0.00 |
| Acne | 36.36 | 59.09 | +22.73 |
| Acne conglobata | 21.18 | 47.06 | +25.88 |
| Severe inflammatory acne | 10.87 | 50.00 | +39.13 |
| Seborrheic keratosis | 92.73 | 94.55 | +1.82 |
| Seborrheic dermatitis | 69.81 | 88.68 | +18.87 |
| Palmoplantar pustulosis | 32.61 | 80.43 | +47.82 |
| Plaque psoriasis | 88.89 | 96.30 | +7.41 |
| Pemphigus vulgaris | 22.64 | 43.40 | +20.76 |
| Impetigo | 50.00 | 71.30 | +21.30 |
| Hidradenitis suppurativa | 82.02 | 92.22 | +10.20 |
| Subcorneal pustular dermatosis | 0.00 | 0.00 | 0.00 |
| Tinea corporis | 29.23 | 59.38 | +30.15 |
Dermatologists (exploratory, under-powered — n = 4)
The per-pathology figures below are reported descriptively. Because the dermatologist stratum comprises only 4 readers, per-pathology estimates are highly sensitive to single-reader outcomes; no per-pathology confirmatory claim is made for the dermatologist stratum.
| Condition | Unaided accuracy (%) | Aided accuracy (%) | Absolute difference (pp) |
|---|---|---|---|
| Generalised pustular psoriasis | 33.33 | 52.78 | +19.45 |
| Eczematous dermatitis | 79.55 | 79.55 | 0.00 |
| Acute generalised exanthematous pustulosis | 18.75 | 18.75 | 0.00 |
| Acne | 40.00 | 45.00 | +5.00 |
| Acne conglobata | 12.50 | 17.50 | +5.00 |
| Severe inflammatory acne | 10.00 | 30.00 | +20.00 |
| Seborrheic keratosis | 100.00 | 100.00 | 0.00 |
| Seborrheic dermatitis | 90.00 | 95.00 | +5.00 |
| Palmoplantar pustulosis | 77.78 | 77.78 | 0.00 |
| Plaque psoriasis | 100.00 | 100.00 | 0.00 |
| Pemphigus vulgaris | 45.00 | 90.00 | +45.00 |
| Impetigo | 77.50 | 87.50 | +10.00 |
| Hidradenitis suppurativa | 94.29 | 97.14 | +2.85 |
| Subcorneal pustular dermatosis | 10.00 | 10.00 | 0.00 |
| Tinea corporis | 54.17 | 70.83 | +16.66 |
Target-pathology exploratory description
For the three target pathologies in the primary-care stratum:
- Generalised pustular psoriasis (exploratory p = 0.00015). Reinforced unaided decision in 12.12% of cases; improved unaided decision in 32.32%; no change in 47.47%; switched correct to incorrect in 8.08%.
- Hidradenitis suppurativa (exploratory p = 0.00391). Reinforced unaided decision in 82.02% of cases; improved unaided decision in 10.11%; no change in 7.87%; no switch-to-incorrect observed.
- Palmoplantar pustulosis (exploratory p < 0.001). Reinforced unaided decision in 32.61% of cases; improved unaided decision in 47.83%; no change in 19.57%; no switch-to-incorrect observed.
The 8.08% GPP primary-care "switch to incorrect" proportion is traced to R-TF-013-002 as a known automation-bias direction for rare-disease presentations with very low baseline accuracy, and is covered by the existing risk controls (IFU warnings, Top-5 prioritised differential rather than single binding answer, the clinician remains the decision-maker). This is recorded as a PMCF-monitored item under R-TF-007-002.
Adverse Events and Adverse Reactions to the Product
Because the investigation recruited no patients and performed no intervention on patients, patient-level adverse events are not applicable. At the reader-participant level, no adverse events related to the investigational product were observed during the conduct of the investigation. The detection mechanism combined: (i) platform-side error logging and session monitoring, and (ii) a per-session free-text feedback field through which reader-participants could report any issue. This reporting path feeds the post-market surveillance procedure (GP-009 Post-Market Surveillance).
Product Deficiencies
No product deficiencies were logged during the conduct of the investigation. The detection mechanism combined platform-side error logging, session monitoring and the reader-participant feedback field described above. This reporting path feeds the non-conforming product control procedure and, as applicable, the post-market surveillance procedure (GP-009).
Paediatric age-band subgroup — exploratory, non-evaluable for confirmatory claims
A paediatric age-band subgroup description is reported below for completeness. Paediatric representation in the image set comprises 3 infant-band cases (1 month – 2 years) and 14 child-band cases (2 – 12 years); no newborn (birth – 1 month) and no adolescent-band cases were included. The paediatric subgroup was not pre-specified in the CIP §Statistical analysis (which pre-specified only rare-disease pathology subgroups as secondary) and is therefore post-hoc and exploratory. All paediatric results are flagged as hypothesis-generating only; no confirmatory paediatric performance claim is made on the basis of this investigation, and paediatric coverage is addressed by dedicated evidence at Clinical Evaluation level (R-TF-015-003) and by the PMCF Plan (R-TF-007-002).
Three paediatric pathologies are represented in the child band (2 – 12 years): impetigo, tinea corporis and AGEP.
Overall accuracy — child subgroup (exploratory)
| HCP group | Unaided accuracy (%) | Aided accuracy (%) | Absolute difference (pp) |
|---|---|---|---|
| All HCPs | 42.70 | 53.97 | +11.27 |
| Primary care | 36.80 | 50.22 | +13.42 |
| Dermatologists | 58.93 | 64.29 | +5.36 |
Per-pathology sensitivity and specificity — child subgroup (exploratory)
| Condition | HCP group | Sens. before (%) | Sens. after (%) | Δ Sens. (pp) | Spec. before (%) | Spec. after (%) | Δ Spec. (pp) |
|---|---|---|---|---|---|---|---|
| Impetigo | All HCPs | 58.15 | 78.52 | +20.37 | 100.00 | 100.00 | 0.00 |
| Impetigo | Primary care | 51.01 | 74.75 | +23.74 | 100.00 | 100.00 | 0.00 |
| Impetigo | Dermatologists | 77.78 | 88.89 | +11.11 | 100.00 | 100.00 | 0.00 |
| Tinea corporis | All HCPs | 53.33 | 26.67 | −26.67 | 43.33 | 26.67 | −16.67 |
| Tinea corporis | Primary care | 54.55 | 27.27 | −27.27 | 45.45 | 27.27 | −18.18 |
| Tinea corporis | Dermatologists | 50.00 | 25.00 | −25.00 | 37.50 | 25.00 | −12.50 |
| AGEP | All HCPs | 5.00 | 5.00 | 0.00 | 6.67 | 6.67 | 0.00 |
| AGEP | Primary care | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| AGEP | Dermatologists | 18.75 | 18.75 | 0.00 | 25.00 | 25.00 | 0.00 |
Impetigo in the child subgroup showed a directional sensitivity improvement (+20.37 pp across all HCPs), with specificity unchanged at 100%. AGEP in the child subgroup showed no change (baseline sensitivity and specificity are very low).
Tinea corporis in the child subgroup — automation-bias direction flagged for risk management. The child-band tinea corporis result shows sensitivity declining by approximately 25–27 pp and specificity declining by 13–18 pp after device use, across all HCP groups. The child-band tinea corporis cell contains fewer than five cases per reader, so individual estimates are highly unstable (single-case flips produce large metric swings). Irrespective of the sample-size instability, the directional change is in the automation-bias direction and is therefore logged against the risk management record R-TF-013-002 under the same two pre-existing risk rows that carry the automation-bias direction — R-DAG ("Incorrect diagnosis or follow up — the medical device outputs a wrong result and the HCP, unaware of the malfunction, relies on the device output") and R-HAX ("Incorrect interpretation of device outputs — the HCP validates the wrong skin condition") — with the paediatric tinea corporis signal recorded as a PMCF-monitored observation under R-TF-007-002. The post-observation residual-risk re-evaluation confirms that the signal does not require a new risk control because the existing controls — IFU warnings on the non-binding nature of the device output, requirement that the clinician remains the decision-maker, Top-5 prioritised differential rather than a single binding answer — already cover the hazard scenario; the residual-risk determination is revisited if the paediatric tinea corporis signal is confirmed in a pre-specified prospective or PMCF dataset. The investigation does not support any paediatric performance claim for tinea corporis; paediatric tinea corporis coverage is deferred to the Clinical Evaluation and to the PMCF Plan.
Discussion and Overall Conclusions
Regulatory positioning of this evidence
Per MDCG 2020-6 Appendix III this investigation produces Rank 11 evidence (simulated-use reader study on retrospective images); per MDCG 2020-1 §4.4 it contributes Pillar 3 Clinical Performance supporting evidence — measuring the clinician's diagnostic decision-making when using the device's Top-5 prioritised differential view. Pillar 2 (algorithm-level analytical performance) is evidenced independently at Clinical Evaluation level and is not the subject of this investigation. Pillar 1 (Valid Clinical Association literature anchoring diagnostic accuracy to patient outcomes) is documented in R-TF-015-011 State of the Art.
Clinical Performance, Efficacy, and Safety
Acceptance-criteria pass/fail summary
The table below renders each pre-specified acceptance criterion from R-TF-015-011 (as configured in the performance-claims data source for BI_2024) against the observed value, with pass/fail called explicitly:
Conclusions
On the pre-specified primary endpoint, use of the device's Top-5 prioritised differential view increased pooled top-1 diagnostic accuracy among healthcare professionals from 47.94% (95% CI 45.4–50.5%) to 63.06% (95% CI 60.5–65.5%) on the curated anonymised image set; absolute paired improvement +15.12 pp, McNemar p < 0.001. The primary endpoint was met.
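On the pre-specified per-specialty secondary endpoints, the primary-care stratum met its threshold (+17.00 pp; threshold ≥ 10 pp). The dermatologist-stratum point estimate exceeded its threshold (+8.39 pp; threshold ≥ 5 pp) but, with n = 4 readers, the stratum is under-powered and is treated as a directional, supportive observation rather than an independent confirmatory pass. The rare-disease pooled secondary endpoint showed a +32.32 pp absolute improvement across all healthcare professionals.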
Diagnostic accuracy is a surrogate endpoint. The patient-benefit chain (earlier correct treatment, reduced disease progression, more efficient referral pathways) is indirect and is anchored by the Valid Clinical Association literature summarised in R-TF-015-011 State of the Art (Pillar 1); it is not demonstrated by this investigation in isolation and is assessed at Clinical Evaluation level in R-TF-015-003. Published literature documents the burden of diagnostic delays in rare dermatological conditions such as generalised pustular psoriasis and hidradenitis suppurativa (Strober et al. 2021; Costanzo et al. 2022; Kokolakis et al. 2019; Willmen et al. 2024), and documents the relationship between AI-assisted primary-care diagnosis and downstream care patterns (Escalé-Besa et al. 2023; Giavina-Bianchi et al. 2020); these anchors describe the downstream leg of the surrogate-to-outcome chain that this investigation supports, not outcomes demonstrated by this investigation itself.
Per-pathology analyses were exploratory and under-powered. Per-pathology paired change did not reach statistical significance for acute generalised exanthematous pustulosis (Δ = 0.00 pp, exploratory p = 1.000), subcorneal pustular dermatosis (Δ = 0.00 pp, exploratory p = 1.000), seborrheic keratosis (Δ = +1.33 pp, exploratory p = 1.000), eczematous dermatitis (Δ = +1.83 pp, exploratory p = 0.629) or plaque psoriasis (Δ = +5.41 pp, exploratory p = 0.125). No per-pathology confirmatory benefit claim is made on the basis of this investigation; these pathologies are addressed at Clinical Evaluation level using the broader body of clinical evidence (including the prospective real-patient investigations at Ranks 2–4), and the exploratory trends are recorded as PMCF topics in R-TF-007-002.
The paediatric age-band subgroup was post-hoc and exploratory, and is non-evaluable for confirmatory purposes on sample-size grounds. The child-band tinea corporis result, although attributable to the instability of per-reader estimates at very low case counts, points in the automation-bias direction and is accordingly logged against the risk management record R-TF-013-002 (rows R-DAG and R-HAX) and monitored through the PMCF Plan; the post-observation residual-risk re-evaluation does not trigger a new risk control at this time.
The 1.77% overall and 8.08% GPP-specific primary-care "switch to incorrect" proportions in the exploratory decomposition are traced to R-TF-013-002 rows R-DAG and R-HAX as known automation-bias directions and are covered by the existing risk controls (IFU warnings on the non-binding nature of the device output, requirement that the clinician remains the decision-maker, Top-5 prioritised differential rather than a single binding answer). The post-observation residual-risk re-evaluation confirms no new risk control is required at this time; the signals remain PMCF-monitored items under R-TF-007-002.
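A decomposition of this kind can be sketched as follows. Each paired read falls into one of four unaided-to-aided transitions, and the "switch to incorrect" proportion is the share of pairs that were correct unaided and incorrect with the device. The function name, the category labels and the toy data are hypothetical and are not taken from the study's analysis code or results.

```python
from collections import Counter

# Hypothetical category labels for the four unaided -> aided transitions.
TRANSITIONS = {
    (1, 1): "stayed_correct",
    (1, 0): "switched_to_incorrect",
    (0, 1): "switched_to_correct",
    (0, 0): "stayed_incorrect",
}


def transition_proportions(unaided, aided):
    """Return the percentage of paired reads in each transition category.

    unaided, aided: equal-length sequences of 0/1 values (1 = correct top-1).
    """
    if len(unaided) != len(aided):
        raise ValueError("paired sequences must have equal length")
    if len(unaided) == 0:
        raise ValueError("at least one paired observation is required")
    counts = Counter(TRANSITIONS[(u, a)] for u, a in zip(unaided, aided))
    n = len(unaided)
    return {name: 100.0 * counts[name] / n for name in TRANSITIONS.values()}


# Hypothetical toy data (NOT study data): 10 paired reads.
toy_unaided = [1, 0, 0, 1, 0, 0, 1, 0, 0, 0]
toy_aided = [1, 1, 1, 1, 0, 1, 0, 1, 1, 0]
shares = transition_proportions(toy_unaided, toy_aided)
net_change_pp = shares["switched_to_correct"] - shares["switched_to_incorrect"]
print(shares, net_change_pp)
```

The net paired change in accuracy equals the "switched to correct" share minus the "switched to incorrect" share, provided both are computed over the same set of paired reads. This is why the decomposition is a useful companion to the headline difference when monitoring automation-bias signals.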
Potential clinical-system implications (inferential, anchored to external literature)
The exploratory per-specialty and per-pathology trends observed in this investigation, notably the magnitudes of unaided-to-aided top-1 accuracy change within the rare-disease pooled subgroup, are reported here for their bearing on the Pillar 1 Valid Clinical Association chain documented in R-TF-015-011. They are not outcomes that this investigation demonstrates in real-world practice. The published literature characterises (i) the delayed-diagnosis burden in rare pustular dermatoses and hidradenitis suppurativa (Strober et al. 2021; Costanzo et al. 2022; Kokolakis et al. 2020; Willmen et al. 2024), and (ii) the system-level effects of AI-assisted primary-care diagnosis and teledermatology on specialist access, referral precision and waiting times (Escalé-Besa et al. 2023; Giavina-Bianchi et al. 2020). Within that external context, the reader-level aided-vs-unaided accuracy patterns observed here, particularly in the rare-disease subgroup, sit on the surrogate leg of the surrogate-to-outcome chain characterised by the Pillar 1 literature; the outcome leg itself is neither measured in nor inferred from this investigation. Demonstration of real-world patient, survival, quality-of-life and healthcare-economic outcomes is not a claim of this investigation; it is addressed at Clinical Evaluation level and by the PMCF Plan.
References
- Strober B, Kotowsky N, Medeiros R, et al. Unmet Medical Needs in the Treatment and Management of Generalized Pustular Psoriasis Flares: Evidence from a Survey of Corrona Registry Dermatologists. Dermatol Ther (Heidelb). 2021 Apr;11(2):529-541.
- Costanzo A, Bardazzi F, De Simone C, et al. Pustular psoriasis with a focus on generalized pustular psoriasis: classification and diagnostic criteria. An Italian expert consensus. Ital J Dermatol Venerol. 2022 Dec;157(6):489-496.
- Kokolakis G, Wolk K, Schneider-Burrus S, et al. Delayed Diagnosis of Hidradenitis Suppurativa and Its Effect on Patients and Healthcare System. Dermatology. 2020;236(5):421-430.
- Willmen L, Völkel L, Willmen T, et al. The economic burden of diagnostic uncertainty on rare disease patients. BMC Health Serv Res. 2024 Nov 12;24(1):1388.
- Escalé-Besa A, Yélamos O, Vidal-Alaball J, et al. Exploring the potential of artificial intelligence in improving skin lesion diagnosis in primary care. Scientific Reports. 2023 Mar 15;13(1):4293.
- Giavina-Bianchi M, Santos AP, Cordioli E. Teledermatology reduces dermatology referrals and improves access to specialists. EClinicalMedicine. 2020 Nov 21;29-30:100641.
Implications for Future Research
The findings of this investigation are hypothesis-generating with respect to the real-world Pillar 3 Clinical Performance claims that require Rank 2–4 prospective real-patient evidence. Specifically, questions that are explicitly outside the scope of this Rank-11 simulated-use investigation and are addressed by dedicated evidence at Clinical Evaluation level (R-TF-015-003) and by the PMCF Plan (R-TF-007-002) include:
- Real-world top-1 accuracy in consulting primary-care and dermatology populations (outside the curated image set).
- Performance on Fitzpatrick phototypes V and VI (under-represented in this image set).
- Confirmatory paediatric performance, including the child-band tinea corporis automation-bias signal (non-evaluable for confirmatory purposes in this investigation).
- Per-pathology confirmatory performance for AGEP, subcorneal pustular dermatosis, seborrheic keratosis, eczematous dermatitis and plaque psoriasis (exploratory results were not statistically significant in this investigation).
- Real-world referral-pathway and teleconsultation-workflow outcomes (this investigation measures reader diagnostic decisions under simulated use, not pathway outcomes).
- Long-term post-deployment diagnostic performance, including drift and continued-use effects.
Continued refinement of the device's algorithms and UI integration remains an engineering question addressed under the change-control framework of the QMS; it is not an open question for this investigation.
Limitations of Clinical Research
The following structural limitations apply to this investigation and constrain the claims that it can support:
- Rank 11 simulated-use evidence. Under MDCG 2020-6 Appendix III this investigation produces Rank 11 evidence (simulated-use reader study on retrospective images) and is distinct from clinical data on real patients within the meaning of MDR Article 2(48). It contributes Pillar 3 §4.4 supporting evidence only; primary Pillar 3 Clinical Performance evidence is supplied by the prospective real-patient investigations at Ranks 2–4.
- Curated image set vs real-world imaging. The image set combines public dermatological atlases and a de-identified sponsor image library with confirmed reference diagnoses; atlas-grade image quality over-represents optimal acquisition conditions relative to typical real-world captures, and this is expected to bias observed accuracy upward relative to real-world deployment.
- Fitzpatrick phototype coverage. The image set contains no Fitzpatrick VI cases and only 6 Fitzpatrick V cases; this investigation does not support Fitzpatrick V/VI performance claims on its own, and phototype coverage is addressed by dedicated evidence at Clinical Evaluation level.
- Paediatric subgroup scope. The paediatric subgroup was post-hoc (not pre-specified in the CIP) and is non-evaluable for confirmatory purposes; the child-band tinea corporis automation-bias signal is logged against R-TF-013-002 and monitored under the PMCF Plan.
- Partial completion by 6 of 15 readers. Six readers completed 68–99 of the 100 images, for a total of 1,449 paired observations (96.6% of the planned 1,500). The pre-specified primary analysis was complete-case at the per-observation level; a sensitivity analysis restricted to the 9 full-completer readers (900 paired observations) confirmed the direction and magnitude of the primary estimate.
- Recall bias in a within-subject design. The unaided and device-aided reads evaluate the same image set sequentially without a washout period; the observed difference reflects device impact plus any recall-assisted improvement across reads, with expected net bias upward on accuracy.
- Per-pathology statistical power. With 2–10 images per pathology category, per-pathology analyses are exploratory and under-powered; no per-pathology confirmatory benefit claim is made.
- Per-pathology non-significance. Per-pathology p-values did not reach statistical significance for AGEP, subcorneal pustular dermatosis, seborrheic keratosis, eczematous dermatitis or plaque psoriasis; no confirmatory benefit claim is made for these conditions.
- Per-specialty imbalance. The dermatologist stratum (n = 4) is under-powered for per-pathology confirmatory claims; per-pathology dermatologist figures are reported descriptively only.
- Device-aided "switch to incorrect" proportion. A 1.77% overall (and 8.08% GPP-specific primary-care) proportion of device-aided reads switched a correct unaided decision to an incorrect one; this is traced to R-TF-013-002 rows R-DAG and R-HAX, covered by existing risk controls (IFU warnings, Top-5 prioritised differential, clinician remains the decision-maker) and monitored through the PMCF Plan.
- No real-world validation within this investigation. Real-world patient-outcome, triage, teledermatology and healthcare-economics outcomes are not measured here; they are addressed at Clinical Evaluation level and by the PMCF Plan. Readers are also aware that they are being observed; this may modulate their behaviour relative to routine practice.
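The observation counts quoted in the partial-completion limitation above can be cross-checked with a few lines of arithmetic. The sketch below uses only figures stated in this report; the variable names are illustrative, and the mean per partial reader is a derived consistency check rather than a reported result.

```python
# Figures stated in the report.
readers = 15
images_per_reader = 100
observed_pairs = 1449
full_completers = 9
partial_min, partial_max = 68, 99  # images completed by each partial reader

# Planned paired observations and completion rate.
planned_pairs = readers * images_per_reader
completion_pct = round(100 * observed_pairs / planned_pairs, 1)

# Paired observations available to the full-completer sensitivity analysis.
sensitivity_pairs = full_completers * images_per_reader

# Derived check: pairs contributed by the partial completers must be
# achievable with every partial reader inside the stated 68-99 range.
partial_readers = readers - full_completers
partial_pairs = observed_pairs - sensitivity_pairs
mean_per_partial_reader = partial_pairs / partial_readers
consistent = (
    partial_readers * partial_min <= partial_pairs <= partial_readers * partial_max
)

print(planned_pairs, completion_pct, sensitivity_pairs)
print(partial_readers, partial_pairs, mean_per_partial_reader, consistent)
```

The stated totals are internally consistent: the pairs not attributable to the nine full completers fall within what six readers completing 68 to 99 images each could contribute.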
Ethical Aspects of Clinical Research
The investigation is conducted in accordance with the ethical principles of the Declaration of Helsinki to the extent applicable to retrospective studies of anonymised material, and in compliance with Regulation (EU) 2016/679 (GDPR) and Spanish Ley Orgánica 3/2018 on the protection of personal data. No data allowing the personal identification of any individual is included, and all information is managed under appropriate technical and organisational security measures.
The sponsor's determination of non-applicability of biomedical-research law (Ley 14/2007) and of ethics-committee review for this investigation is documented in Annex E (R-TF-015-010) under "Ethics Committee Non-Applicability Determination". The investigation does not recruit patients, does not interfere with any clinical care, and uses only completely anonymised images from public dermatological atlases and from a de-identified sponsor image library; it therefore does not fall within the material scope of the biomedical-research framework that would require ethics-committee authorisation. Reader-participants receive comprehensive written and oral information about the investigation and sign a participation agreement with the sponsor; formal informed-consent forms are not required because the investigation is observational and non-interventional with respect to their clinical practice.
The Principal Investigator and the sponsor act jointly as Data Controller for the reader-participant data processed under this investigation, as documented in the participation agreement. The manufacturer provides the validated reader platform as Data Processor under a data-processing agreement and does not process data beyond the scope of that agreement. Storage and handling of reader-participant data are aligned with GDPR and Ley Orgánica 3/2018; at the conclusion of the investigation, reader-session data retained beyond the analytic record are deleted according to the sponsor's data-retention schedule.
Investigators and Administrative Structure of Clinical Research
Brief Description
The investigation was conducted by the participating medical staff in conjunction with the manufacturer and the sponsor (Boehringer Ingelheim).
Investigators
Principal investigator
- Dr. Antonio Martorell Calatayud
Collaborating Investigators (Clinical Staff)
- Dr. Mari Carmen Galindo
- Dr. Paco García Tolosa
- Dr. Laura Yuste Hidalgo
- Dr. Nuria Comabella
- Dr. Marta Vázquez
- Dr. David Palacios
- Dr. Norma Alejandra Doria Carlin
- Dr. Francisco José Esteban González
- Dr. Alfonso José Valcarce Leonisio
- Dr. José Antonio Arjona Sevilla
- Dr. Javier Melgosa
- Dr. Manuel Ballesteros Redondo
- Dr. Esmeralda Silva
- Dr. Ana Llull Ramos
- Dr. Angela Patricia Guzmán
Technical Support (Manufacturer)
- Mr. Alfonso Medela — Chief Technology Officer
- Dr. Alberto Sabater — Algorithm Lead
- Mrs. Alba Rodríguez — Clinical Operations
Investigator Qualifications
All healthcare professional investigators are board-certified physicians with a minimum of 5 years of clinical experience in their respective specialties (primary care or dermatology). The research team was trained on the investigation protocol, the device, and the applicable compliance requirements. Training was delivered as a presentation covering protocol procedures, device functionality, and data-entry requirements. Training-attendance records and presentation materials are retained as essential documents under the custody of the Principal Investigator and are available for audit or inspection.
External Organisation
No additional organisations, beyond those listed above, contributed to the investigation. The investigation was conducted with the collaboration and resources of the specified entities.
Sponsor and Monitor
Sponsor: Boehringer Ingelheim. Monitoring was performed as described in the CIP §Monitoring plan.
Report Annexes
- Instructions For Use (IFU) v1.1.0.0 was provided to all reader-participants at investigation onboarding and served as the investigational-device orientation document; the IFU is cross-referenced from the ISO 14155 applicability checklist in Annex E (R-TF-015-010).
Signature meaning
The signatures for the approval process of this document can be found in the verified commits in the QMS repository. For reference, the team members who are expected to participate in this document and their roles in the approval process, as defined in Annex I Responsibility Matrix of GP-001, are:
- Author: Team members involved
- Reviewer: JD-018 Clinical Research Coordinator
- Approver: JD-022 Medical Manager
Publication Status
The manufacturer intends to submit these results for peer-reviewed publication. The full citation (journal name, DOI, volume, issue and page numbers) will be appended once the publication record is available. Any pre-publication framing of the results will be aligned with the Rank 11 / Pillar 3 §4.4 positioning adopted in this report.