
Why Performance Claims Are Not Condition-Specific: Clinical Evidence Rationale

Purpose of This Document

This document explains why the device's performance claims do not -- and should not -- enumerate specific conditions (e.g., "the device diagnoses psoriasis with X% sensitivity"). Instead, the claims address the improvement in healthcare professional (HCP) diagnostic performance when using the device, which always outputs a probability distribution across all ICD-11 categories. This is not a gap; it is a direct consequence of the device's principle of operation and intended purpose.


The Fundamental Distinction: Output vs. Sample

The device always outputs all ICD-11 categories

Every time the device processes an image, regardless of what condition the image depicts, the output is a normalised probability distribution vector across all validated ICD-11 categories:

p = [p_1, p_2, ..., p_346]   where   sum(p_i) = 1.0

This means:

  • If the image depicts psoriasis, the output includes probabilities for all 346 ICD-11 categories, not just psoriasis.
  • If the image depicts melanoma, the output still includes probabilities for all 346 ICD-11 categories.
  • If the image depicts a rare condition like Generalized Pustular Psoriasis (GPP), the output includes probabilities for all 346 ICD-11 categories.

The device does not "switch on" for certain conditions and "switch off" for others. There is no condition-specific mode. The distribution is always complete.
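The two properties described above (the vector always has one entry per validated category, and the entries sum to 1) can be written as a small check. This is a minimal sketch, not the device's implementation: `COMPLETE_OUTPUT_SIZE`, `Icd11Distribution`, `isCompleteDistribution` and `normalise` are hypothetical names introduced here, and the simple divide-by-sum normalisation is an assumption chosen only to produce a valid example vector.

```typescript
// Hypothetical constant: the number of validated ICD-11 categories stated in this document.
const COMPLETE_OUTPUT_SIZE = 346;

/** One device output: a probability for every validated ICD-11 category. */
type Icd11Distribution = readonly number[];

/**
 * Checks the invariants described above: the vector covers all categories,
 * every entry is a finite non-negative number, and the entries sum to 1
 * (within a floating-point tolerance).
 */
function isCompleteDistribution(p: Icd11Distribution, tolerance = 1e-6): boolean {
  if (p.length !== COMPLETE_OUTPUT_SIZE) return false;
  let total = 0;
  for (const value of p) {
    if (!Number.isFinite(value) || value < 0) return false;
    total += value;
  }
  return Math.abs(total - 1) <= tolerance;
}

/** Illustrative normalisation (an assumption): divide raw non-negative scores by their sum. */
function normalise(scores: readonly number[]): number[] {
  const total = scores.reduce((acc, s) => acc + s, 0);
  if (total <= 0) throw new Error("scores must have a positive sum");
  return scores.map((s) => s / total);
}

// Usage: whatever the image depicts, the output has the same shape.
const exampleOutput = normalise(
  Array.from({ length: COMPLETE_OUTPUT_SIZE }, (_, i) => i + 1),
);
const exampleIsValid = isCompleteDistribution(exampleOutput);
```

A vector that covers only a subset of categories, such as `[0.5, 0.5]`, fails the check even though it sums to 1, which mirrors the point that there is no condition-specific mode.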

What varies across studies is the image sample, not the output

In our clinical investigations, the image samples are composed of specific conditions -- this is a practical necessity of study design. But the device's output for each of those images is never limited to the conditions in the sample. For every image, the HCP receives the full ICD-11 probability distribution and uses it to inform their clinical decision.

Therefore, when we measure "diagnostic accuracy improvement," we are measuring whether the full distribution (presented to the HCP) helps them arrive at a better clinical assessment -- not whether the device correctly identifies condition X in isolation.


How This Is Reflected in Our Clinical Studies

Our clinical evidence, as documented in clinicalStudiesData.ts, includes studies across diverse clinical settings:

| Study | Setting | Image Sample Focus | Device Output |
|---|---|---|---|
| COVIDX_EVCDAO_2022 | Hospital (Torrejon) | Multiple dermatological conditions | Full ICD-11 distribution (all 346 categories) |
| DAO_Derivacion_PH_2022 | Primary Care (Pozuelo/Majadahonda) | Multiple dermatological conditions | Full ICD-11 distribution (all 346 categories) |
| IDEI_2023 | Dermatology Institute | Multiple dermatological conditions | Full ICD-11 distribution (all 346 categories) |
| SAN_2024 | Remote (Sanitas) | Multiple dermatological conditions | Full ICD-11 distribution (all 346 categories) |
| PH_2024 | Remote (Puerta de Hierro) | Multiple dermatological conditions | Full ICD-11 distribution (all 346 categories) |
| BI_2024 | Remote (Boehringer Ingelheim) | GPP, HS, and confusable conditions | Full ICD-11 distribution (all 346 categories) |
| DAO_Derivation_O_2022 | Primary Care (Osakidetza) | Multiple dermatological conditions | Full ICD-11 distribution (all 346 categories) |

In every single study, the device output is the same: a full probability distribution across all ICD-11 categories. The claim being tested is always: "Does the information provided by the device increase the diagnostic performance of the HCP?" -- never "Does the device diagnose condition X?"
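The same point can be expressed as data. The sketch below is illustrative only: the `StudySummary` shape and its field names are hypothetical and are not taken from clinicalStudiesData.ts, whose actual structure may differ. The study identifiers, settings and sample foci are the ones listed in this section.

```typescript
// Hypothetical record shape; the real clinicalStudiesData.ts may be organised differently.
interface StudySummary {
  id: string;
  setting: string;
  imageSampleFocus: string;
  outputCategoryCount: number;
}

const FULL_OUTPUT_SIZE = 346;
const MULTIPLE = "Multiple dermatological conditions";

const studies: StudySummary[] = [
  { id: "COVIDX_EVCDAO_2022", setting: "Hospital (Torrejon)", imageSampleFocus: MULTIPLE, outputCategoryCount: FULL_OUTPUT_SIZE },
  { id: "DAO_Derivacion_PH_2022", setting: "Primary Care (Pozuelo/Majadahonda)", imageSampleFocus: MULTIPLE, outputCategoryCount: FULL_OUTPUT_SIZE },
  { id: "IDEI_2023", setting: "Dermatology Institute", imageSampleFocus: MULTIPLE, outputCategoryCount: FULL_OUTPUT_SIZE },
  { id: "SAN_2024", setting: "Remote (Sanitas)", imageSampleFocus: MULTIPLE, outputCategoryCount: FULL_OUTPUT_SIZE },
  { id: "PH_2024", setting: "Remote (Puerta de Hierro)", imageSampleFocus: MULTIPLE, outputCategoryCount: FULL_OUTPUT_SIZE },
  { id: "BI_2024", setting: "Remote (Boehringer Ingelheim)", imageSampleFocus: "GPP, HS, and confusable conditions", outputCategoryCount: FULL_OUTPUT_SIZE },
  { id: "DAO_Derivation_O_2022", setting: "Primary Care (Osakidetza)", imageSampleFocus: MULTIPLE, outputCategoryCount: FULL_OUTPUT_SIZE },
];

// The image sample focus differs between studies...
const distinctSampleFoci = new Set(studies.map((s) => s.imageSampleFocus)).size;

// ...while the size of the device output is the same in every study.
const allOutputsFull = studies.every((s) => s.outputCategoryCount === FULL_OUTPUT_SIZE);
```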


The Rare Disease Case (Clinical Benefit 9VW)

What benefit 9VW says

Clinical benefit 9VW states:

"The device improves accuracy of HCPs during the diagnosis of rare diseases. This has a positive impact on patient management and outcomes related to diagnosis and monitoring of patients, especially those suffering from rare diseases."

This benefit is supported by performance claims from the BI_2024 study (Boehringer Ingelheim), which demonstrated that the device increased HCP diagnostic accuracy for rare diseases by approximately 27% overall -- and by 32% for primary care practitioners specifically.

Why this does not make the device "disease-focused"

The existence of benefit 9VW does not mean the device has been redesigned or reconfigured to target rare diseases. The device operates identically for rare diseases as it does for common ones: it outputs the full ICD-11 probability distribution.

What differs is the clinical context, not the device behaviour:

  1. HCPs have lower baseline knowledge of rare conditions. By definition, rare diseases (GPP, Acne Conglobata, Palmoplantar Pustulosis, Subcorneal Pustular Dermatosis, acute generalised exanthematous pustulosis (AGEP), Pemphigus Vulgaris) are encountered infrequently. HCPs -- especially primary care practitioners -- have less clinical experience with them, leading to longer diagnostic delays and higher misdiagnosis rates (Kokolakis et al. 2019; Willmen et al. 2024).

  2. The ICD-11 distribution is therefore more impactful. When an HCP lacks experience with a rare condition, the device's probability distribution provides proportionally more useful information than when the HCP already has strong clinical intuition about a common condition. The device does not do anything differently; it is the HCP's baseline that is lower.

  3. The BI_2024 study confirmed this. Even though the image sample was composed of GPP, hidradenitis suppurativa (HS), and conditions confusable with these, the device output for each image included all 346 ICD-11 categories. The study measured whether having this full distribution improved HCP performance -- and it did, particularly for rare conditions where baseline HCP accuracy was lowest.

The image sample included rare conditions; the output included all conditions

In the BI_2024 study:

  • Image sample: 100 images per HCP, including GPP, HS, and confusable pathologies.
  • Device output per image: Full probability distribution across all 346 ICD-11 categories.
  • What was measured: Whether the HCP's diagnostic accuracy increased after receiving the device output.
  • Subgroup analysis: A specific subgroup analysis was performed for diseases categorised as rare (GPP, Acne Conglobata, Palmoplantar Pustulosis, Subcorneal Pustular Dermatosis, AGEP, Pemphigus Vulgaris).

The subgroup analysis does not mean the device was "targeting" rare diseases. It means the study design intentionally verified that the device's benefit extends to a class of conditions where HCP baseline performance is expected to be lower -- a pragmatic distinction based on clinical reality, not a change in the device's principle of operation.
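A subgroup analysis of this kind only partitions the same readings by reference condition; nothing about the device output changes. The sketch below shows the idea under stated assumptions: the `Reading` record, the `accuracyBySubgroup` function and the toy readings are hypothetical and are not study data. The list of rare conditions is the one given above.

```typescript
// Hypothetical record: one HCP assessing one image, before and after seeing the device output.
interface Reading {
  trueCondition: string;
  unaidedAnswer: string;
  aidedAnswer: string;
}

// Conditions categorised as rare in the BI_2024 subgroup analysis (from the text above).
const RARE_CONDITIONS = new Set([
  "GPP",
  "Acne Conglobata",
  "Palmoplantar Pustulosis",
  "Subcorneal Pustular Dermatosis",
  "AGEP",
  "Pemphigus Vulgaris",
]);

interface SubgroupAccuracy {
  n: number;
  unaided: number;
  aided: number;
}

interface Tally {
  n: number;
  unaidedCorrect: number;
  aidedCorrect: number;
}

/** Splits readings into rare / other by reference condition and computes accuracy for each. */
function accuracyBySubgroup(
  readings: readonly Reading[],
): { rare: SubgroupAccuracy; other: SubgroupAccuracy } {
  const rare: Tally = { n: 0, unaidedCorrect: 0, aidedCorrect: 0 };
  const other: Tally = { n: 0, unaidedCorrect: 0, aidedCorrect: 0 };
  for (const r of readings) {
    const bucket = RARE_CONDITIONS.has(r.trueCondition) ? rare : other;
    bucket.n += 1;
    if (r.unaidedAnswer === r.trueCondition) bucket.unaidedCorrect += 1;
    if (r.aidedAnswer === r.trueCondition) bucket.aidedCorrect += 1;
  }
  // An empty subgroup yields NaN rather than a misleading 0.
  const toRates = (t: Tally): SubgroupAccuracy => ({
    n: t.n,
    unaided: t.n === 0 ? NaN : t.unaidedCorrect / t.n,
    aided: t.n === 0 ? NaN : t.aidedCorrect / t.n,
  });
  return { rare: toRates(rare), other: toRates(other) };
}

// Made-up readings for illustration only; these are not results from any study.
const toyReadings: Reading[] = [
  { trueCondition: "GPP", unaidedAnswer: "Psoriasis", aidedAnswer: "GPP" },
  { trueCondition: "GPP", unaidedAnswer: "GPP", aidedAnswer: "GPP" },
  { trueCondition: "HS", unaidedAnswer: "HS", aidedAnswer: "HS" },
  { trueCondition: "Psoriasis", unaidedAnswer: "Psoriasis", aidedAnswer: "Psoriasis" },
];
const toySubgroups = accuracyBySubgroup(toyReadings);
```

In the toy data the rare subgroup moves from 0.5 to 1.0 accuracy while the other subgroup stays at 1.0. The partition is applied to the readings after the fact; the device output per image is untouched.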


Regulatory Justification

MDR 2017/745 -- Intended Purpose Drives Performance Evaluation

Under the Medical Devices Regulation (EU) 2017/745, performance evidence must be aligned with the device's intended purpose, as defined in Article 2(12). The intended purpose of this device is:

"to support healthcare providers in the assessment of skin structures, enhancing efficiency and accuracy of care delivery, by providing an interpretative distribution representation of possible ICD categories"

The intended purpose is defined in terms of the distribution output, not in terms of diagnosing specific conditions. Therefore, performance evidence should demonstrate that this distribution achieves its stated purpose: supporting HCPs in their clinical assessment across all dermatological conditions.

MDR Article 61 and Annex XIV -- Clinical Evidence Requirements

Article 61(1) requires manufacturers to specify and justify the level of clinical evidence needed to demonstrate conformity with the relevant General Safety and Performance Requirements (GSPRs); that level must be "appropriate in view of the characteristics of the device and its intended purpose." Annex XIV Part A further specifies that the clinical evaluation must be based on the device's intended purpose.

Since the device's intended purpose is to provide a distributional output across all ICD-11 categories, the clinical evidence must demonstrate that this distribution achieves its purpose -- not that the device diagnoses any individual condition.

MDCG 2020-1 -- Clinical Evidence for Medical Device Software (MDSW)

The MDCG 2020-1 guidance on clinical evaluation of medical device software recognises three components of clinical evidence:

  1. Valid clinical association -- Is the device's output clinically relevant?
  2. Technical performance -- Does the device produce accurate outputs?
  3. Clinical performance -- Does the device achieve its intended clinical benefit in practice?

For this device:

  1. Valid clinical association: The ICD-11 classification system is the internationally recognised standard for disease categorisation. The device's output (a probability distribution across ICD-11 categories) is inherently clinically relevant because it maps to the established clinical taxonomy.

  2. Technical performance: Demonstrated through top-k accuracy metrics across the full distribution (top-1 >= 50%, top-3 >= 60%, top-5 >= 70%). These are distributional metrics, not condition-specific sensitivity/specificity.

  3. Clinical performance: Demonstrated through clinical investigations measuring HCP diagnostic accuracy improvement when using the device. The claim is that the device improves HCP performance -- not that the device itself diagnoses.
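For reference, top-k accuracy counts a case as a hit when the reference category is among the k categories with the highest probability. The following is a minimal sketch of that metric, assuming the output is an array of probabilities and the reference label is an index into it; `topKIndices`, `topKAccuracy` and the toy data (4 categories instead of 346) are hypothetical and for illustration only.

```typescript
/** Indices of the k largest probabilities, highest first. */
function topKIndices(p: readonly number[], k: number): number[] {
  return p
    .map((prob, index) => ({ prob, index }))
    .sort((a, b) => b.prob - a.prob)
    .slice(0, k)
    .map((entry) => entry.index);
}

/** Fraction of cases whose reference category is among the k most probable categories. */
function topKAccuracy(
  outputs: readonly (readonly number[])[],
  referenceIndices: readonly number[],
  k: number,
): number {
  if (outputs.length === 0 || outputs.length !== referenceIndices.length) {
    throw new Error("outputs and referenceIndices must be non-empty and of equal length");
  }
  let hits = 0;
  outputs.forEach((p, i) => {
    const reference = referenceIndices[i];
    if (reference !== undefined && topKIndices(p, k).indexOf(reference) !== -1) {
      hits += 1;
    }
  });
  return hits / outputs.length;
}

// Toy example with 4 categories, made up for illustration.
const toyOutputs = [
  [0.6, 0.2, 0.1, 0.1], // reference 0: ranked first
  [0.1, 0.5, 0.3, 0.1], // reference 2: ranked second
  [0.4, 0.3, 0.2, 0.1], // reference 3: ranked fourth
];
const toyReferences = [0, 2, 3];

const toyTop1 = topKAccuracy(toyOutputs, toyReferences, 1); // 1 of the 3 toy cases
const toyTop3 = topKAccuracy(toyOutputs, toyReferences, 3); // 2 of the 3 toy cases
```

The metric is computed over the ranking of the whole vector, so it never requires reducing the output to a yes/no answer for a single condition.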

MDCG 2020-6 -- Level of Clinical Evidence

The MDCG 2020-6 guidance sets out a suggested hierarchy of clinical evidence (Appendix III). Our clinical investigations sit at the top of that hierarchy (manufacturer's own clinical investigations), and they directly measure the intended clinical benefit: improvement in HCP diagnostic performance.

Why condition-specific claims would be inappropriate

A condition-specific performance claim (e.g., "the device detects melanoma with 95% sensitivity") would be appropriate for a diagnostic test -- a device that provides a binary positive/negative result for a specific condition (like a COVID-19 antigen test or an HIV screening test).

This device is not a diagnostic test. It does not provide a binary result. It does not confirm or rule out any condition. It provides a probability distribution across all ICD-11 categories, and the clinical decision remains with the HCP.

Imposing condition-specific performance requirements on this device would be analogous to requiring a weather forecast to issue a binary "rain / no rain" verdict for a specific city, when the forecast actually assigns a probability to each possible outcome. A forecast that assigns 30% probability to rain in London and 70% to dry weather is not "wrong about London" if it then rains -- it is providing information in a fundamentally different format from a binary alarm.

The MDR recognises this distinction. Article 2(12) defines intended purpose broadly, and Article 61(1) requires a level of clinical evidence "appropriate in view of the characteristics of the device and its intended purpose." The characteristics of this device are distributional, not binary, and the clinical evidence reflects this.


The Correct Framework for Evaluating This Device

What we claim

We claim that the device improves the diagnostic performance of HCPs by providing them with a probability distribution across all ICD-11 categories. This is a decision support claim, not a diagnostic claim.

What we measure

  • Top-k accuracy: Whether the correct ICD-11 category appears in the device's top-1, top-3, or top-5 predictions. This is a distributional metric that respects the device's output format.
  • HCP accuracy improvement: Whether HCPs make more accurate diagnoses when they have access to the device's output versus when they do not. This directly measures the intended clinical benefit.
  • Sensitivity/specificity of the HCP: In some studies, we measure the HCP's sensitivity and specificity -- but this is the HCP's performance, not the device's. The device does not have sensitivity or specificity because it does not produce binary results.
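The HCP-level metrics above can be sketched as follows. This is an illustration under stated assumptions, not the analysis code used in the studies: the `HcpDecision` record, the helper names and the toy decisions are hypothetical. Because a percentage improvement can be read as either an absolute or a relative change, the `improvement` helper returns both.

```typescript
// Hypothetical record: one HCP's binary call on one case, without and with the device output.
interface HcpDecision {
  referencePositive: boolean; // reference standard for the case
  unaidedPositive: boolean; // HCP call without the device output
  aidedPositive: boolean; // HCP call after seeing the device output
}

interface SensSpec {
  sensitivity: number;
  specificity: number;
}

/**
 * Sensitivity and specificity of the HCP's calls. `pick` selects which call to
 * score (unaided or aided). Yields NaN when a class has no cases.
 */
function hcpSensSpec(
  decisions: readonly HcpDecision[],
  pick: (d: HcpDecision) => boolean,
): SensSpec {
  let tp = 0;
  let fn = 0;
  let tn = 0;
  let fp = 0;
  for (const d of decisions) {
    const called = pick(d);
    if (d.referencePositive) {
      if (called) tp += 1;
      else fn += 1;
    } else {
      if (called) fp += 1;
      else tn += 1;
    }
  }
  return { sensitivity: tp / (tp + fn), specificity: tn / (tn + fp) };
}

/** Absolute change (difference of rates) and relative change (fraction of the baseline). */
function improvement(baseline: number, aided: number): { absolute: number; relative: number } {
  return { absolute: aided - baseline, relative: (aided - baseline) / baseline };
}

// Made-up decisions for illustration only; these are not study results.
const toyDecisions: HcpDecision[] = [
  { referencePositive: true, unaidedPositive: true, aidedPositive: true },
  { referencePositive: true, unaidedPositive: true, aidedPositive: true },
  { referencePositive: true, unaidedPositive: false, aidedPositive: true },
  { referencePositive: true, unaidedPositive: false, aidedPositive: false },
  { referencePositive: false, unaidedPositive: false, aidedPositive: false },
  { referencePositive: false, unaidedPositive: false, aidedPositive: false },
  { referencePositive: false, unaidedPositive: true, aidedPositive: false },
  { referencePositive: false, unaidedPositive: false, aidedPositive: false },
];

const unaidedPerf = hcpSensSpec(toyDecisions, (d) => d.unaidedPositive);
const aidedPerf = hcpSensSpec(toyDecisions, (d) => d.aidedPositive);
const sensitivityGain = improvement(unaidedPerf.sensitivity, aidedPerf.sensitivity);
```

Every quantity here is computed from the HCP's calls; the device output appears only as the condition under which the aided call was made.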

What we do not claim

  • We do not claim that the device diagnoses any specific condition.
  • We do not claim condition-specific sensitivity or specificity for the device.
  • We do not claim that the device replaces clinical judgment.
  • We do not claim that the device provides binary diagnostic results.

Why specific conditions appear in study designs

Specific conditions appear in study designs because:

  1. Practical necessity: Clinical investigation images must depict real conditions. You cannot run a study with abstract images.
  2. Representative sampling: The image samples are designed to represent the range of conditions the device will encounter in clinical practice.
  3. Subgroup analysis: Analysing performance across subgroups (rare diseases, malignant conditions, common conditions) provides richer evidence about where the device's distributional output has the greatest clinical impact -- without implying the device operates differently for each subgroup.

The fact that study images include conditions A, B, and C does not mean the performance claim is about conditions A, B, and C. The performance claim is about the distribution across all 346 ICD-11 categories that the device outputs for each image, regardless of what the image depicts.


Summary

| Aspect | Diagnostic Test (e.g., COVID-19) | This Device |
|---|---|---|
| Output format | Binary (positive / negative) | Probability distribution across all 346 ICD-11 categories |
| Performance metric | Sensitivity / specificity for condition X | Top-k accuracy across all categories; HCP accuracy improvement |
| Performance claim | "Detects condition X with Y% sensitivity" | "Improves HCP diagnostic performance by providing ICD-11 distribution" |
| Condition-specific? | Yes -- each claim is about one condition | No -- each output covers all conditions simultaneously |
| Who diagnoses? | The device provides a result; HCP interprets | The device provides information; HCP decides |
| Study sample composition | Affects the claim directly | Affects only which images were used, not the device output |

The device's performance claims are not condition-specific because the device itself is not condition-specific. It always outputs all ICD-11 categories, and the clinical evidence measures whether this output helps HCPs -- not whether the device diagnoses any particular disease. The inclusion of rare diseases in our evidence base (benefit 9VW, study BI_2024) reflects a pragmatic recognition that HCPs benefit disproportionately from the device's output when dealing with conditions they encounter infrequently -- not a shift toward disease-focused operation.

All the information contained in this QMS is confidential. The recipient agrees not to transmit or reproduce the information, neither by himself nor by third parties, through whichever means, without obtaining the prior written permission of Legit.Health (AI Labs Group S.L.)