Why Performance Claims Are Not Condition-Specific: Clinical Evidence Rationale

Purpose of This Document

This document explains why the device's performance claims do not -- and should not -- enumerate specific conditions (e.g., "the device diagnoses psoriasis with X% sensitivity"). Instead, the claims address the improvement in healthcare professional (HCP) diagnostic performance when using the device, which always outputs a probability distribution across all 346 validated ICD-11 categories. This is not a gap in the evidence; it is a direct consequence of the device's principle of operation and intended purpose.

The Fundamental Distinction: Output vs. Sample

The device always outputs all ICD-11 categories

Every time the device processes an image, regardless of what condition the image depicts, the output is a normalised probability distribution vector across all validated ICD-11 categories:

p = [p_1, p_2, ..., p_346]   where   sum(p_i) = 1.0

This means:

If the image depicts psoriasis, the output includes probabilities for all 346 ICD-11 categories, not just psoriasis.
If the image depicts melanoma, the output still includes probabilities for all 346 ICD-11 categories.
If the image depicts a rare condition such as generalised pustular psoriasis (GPP), the output still includes probabilities for all 346 ICD-11 categories.

The device does not "switch on" for certain conditions and "switch off" for others. There is no condition-specific mode. The distribution is always complete.
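The property described above (a fixed-length vector that always sums to 1) can be illustrated with a minimal sketch. The softmax normalisation, the function name to_distribution, and the randomly generated scores are assumptions made for illustration only; this document does not specify how the device normalises its output.

```python
import math
import random

NUM_CATEGORIES = 346  # number of validated ICD-11 categories stated in this document


def to_distribution(scores):
    """Normalise raw scores into probabilities that sum to 1.

    Softmax is an assumption here; the actual normalisation is not documented.
    """
    if len(scores) != NUM_CATEGORIES:
        raise ValueError(f"expected {NUM_CATEGORIES} scores, got {len(scores)}")
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Two hypothetical images with very different raw scores.
rng = random.Random(0)
image_a = [rng.gauss(0.0, 1.0) for _ in range(NUM_CATEGORIES)]
image_b = [rng.gauss(0.0, 3.0) for _ in range(NUM_CATEGORIES)]

for scores in (image_a, image_b):
    p = to_distribution(scores)
    # Whatever the image depicts, the output has the same shape and total mass.
    assert len(p) == NUM_CATEGORIES
    assert math.isclose(sum(p), 1.0, rel_tol=1e-9)
```

Only the values inside the vector change from image to image; its length and its total do not.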

What varies across studies is the image sample, not the output

In our clinical investigations, the image samples are composed of specific conditions -- this is a practical necessity of study design. But the device's output for each of those images is never limited to the conditions in the sample. For every image, the HCP receives the full ICD-11 probability distribution and uses it to inform their clinical decision.

Therefore, when we measure "diagnostic accuracy improvement," we are measuring whether the full distribution (presented to the HCP) helps them arrive at a better clinical assessment -- not whether the device correctly identifies condition X in isolation.

How This Is Reflected in Our Clinical Studies

Our clinical evidence, as documented in clinicalStudiesData.ts, includes studies across diverse clinical settings:

Study	Setting	Image Sample Focus	Device Output
COVIDX_EVCDAO_2022	Hospital (Torrejon)	Multiple dermatological conditions	Full ICD-11 distribution (all 346 categories)
DAO_Derivacion_PH_2022	Primary Care (Pozuelo/Majadahonda)	Multiple dermatological conditions	Full ICD-11 distribution (all 346 categories)
IDEI_2023	Dermatology Institute	Multiple dermatological conditions	Full ICD-11 distribution (all 346 categories)
SAN_2024	Remote (Sanitas)	Multiple dermatological conditions	Full ICD-11 distribution (all 346 categories)
PH_2024	Remote (Puerta de Hierro)	Multiple dermatological conditions	Full ICD-11 distribution (all 346 categories)
BI_2024	Remote (Boehringer Ingelheim)	GPP, HS, and confusable conditions	Full ICD-11 distribution (all 346 categories)
DAO_Derivation_O_2022	Primary Care (Osakidetza)	Multiple dermatological conditions	Full ICD-11 distribution (all 346 categories)

In every single study, the device output is the same: a full probability distribution across all ICD-11 categories. The claim being tested is always: "Does the information provided by the device increase the diagnostic performance of the HCP?" -- never "Does the device diagnose condition X?"

The Rare Disease Case (Sub-criterion (b) of Benefit 7GH)

What this sub-criterion says

The rare disease sub-criterion of benefit 7GH states:

"The device improves accuracy of HCPs during the diagnosis of rare diseases. This has a positive impact on patient management and outcomes related to diagnosis and monitoring of patients, especially those suffering from rare diseases."

This sub-criterion is supported by performance claims from the BI_2024 study (Boehringer Ingelheim), which demonstrated that the device increased HCP diagnostic accuracy for rare diseases by approximately 27% overall -- and by 32% for primary care practitioners specifically.

Why this does not make the device "disease-focused"

The existence of this sub-criterion does not mean the device has been redesigned or reconfigured to target rare diseases. The device operates identically for rare and common diseases: in both cases it outputs the full ICD-11 probability distribution.

What differs is the clinical context, not the device behaviour:

HCPs have lower baseline knowledge of rare conditions. By definition, rare diseases (GPP, Acne Conglobata, Palmoplantar Pustulosis, Subcorneal Pustular Dermatosis, acute generalised exanthematous pustulosis (AGEP), Pemphigus Vulgaris) are encountered infrequently. HCPs -- especially primary care practitioners -- have less clinical experience with them, leading to longer diagnostic delays and higher misdiagnosis rates (Kokolakis et al. 2019; Willmen et al. 2024).
The ICD-11 distribution is therefore more impactful. When an HCP lacks experience with a rare condition, the device's probability distribution provides proportionally more useful information than when the HCP already has strong clinical intuition about a common condition. The device does not do anything differently; it is the HCP's baseline that is lower.
The BI_2024 study confirmed this. Even though the image sample was composed of GPP, hidradenitis suppurativa (HS), and conditions confusable with these, the device output for each image included all 346 ICD-11 categories. The study measured whether having this full distribution improved HCP performance -- and it did, particularly for rare conditions where baseline HCP accuracy was lowest.

The image sample included rare conditions; the output included all conditions

In the BI_2024 study:

Image sample: 100 images per HCP, including GPP, HS, and confusable pathologies.
Device output per image: Full probability distribution across all 346 ICD-11 categories.
What was measured: Whether the HCP's diagnostic accuracy increased after receiving the device output.
Subgroup analysis: A specific subgroup analysis was performed for diseases categorised as rare (GPP, Acne Conglobata, Palmoplantar Pustulosis, Subcorneal Pustular Dermatosis, AGEP, Pemphigus Vulgaris). These diseases are classified as rare according to the European Union definition (Decision No 1295/1999/EC): conditions affecting no more than 5 per 10,000 inhabitants in the European Union.

The subgroup analysis does not mean the device was "targeting" rare diseases. It means the study design intentionally verified that the device's benefit extends to a class of conditions where HCP baseline performance is expected to be lower -- a pragmatic distinction based on clinical reality, not a change in the device's principle of operation.

Regulatory Justification

MDR 2017/745 -- Intended Purpose Drives Performance Evaluation

Under the Medical Devices Regulation (EU) 2017/745, performance evidence must be aligned with the device's intended purpose (Article 2(12)). The intended purpose of this device is:

"to support healthcare providers in the assessment of skin structures, enhancing efficiency and accuracy of care delivery, by providing an interpretative distribution representation of possible ICD categories"

The intended purpose is defined in terms of the distribution output, not in terms of diagnosing specific conditions. Therefore, performance evidence should demonstrate that this distribution achieves its stated purpose: supporting HCPs in their clinical assessment across all dermatological conditions.

MDR Article 61 and Annex XIV -- Clinical Evidence Requirements

Article 61(1) requires manufacturers to demonstrate conformity with the relevant General Safety and Performance Requirements (GSPRs) through a level of clinical evidence that is "appropriate in view of the characteristics of the device and its intended purpose." Annex XIV Part A further specifies that the clinical evaluation must be based on the device's intended purpose.

Since the device's intended purpose is to provide a distributional output across all ICD-11 categories, the clinical evidence must demonstrate that this distribution achieves its purpose -- not that the device diagnoses any individual condition.

MDCG 2020-1 -- Clinical Evidence for Medical Device Software (MDSW)

The MDCG 2020-1 guidance on clinical evaluation of medical device software recognises three components of clinical evidence:

Valid clinical association -- Is the device's output clinically relevant?
Technical performance -- Does the device produce accurate outputs?
Clinical performance -- Does the device achieve its intended clinical benefit in practice?

For this device:

Valid clinical association: The ICD-11 classification system is the internationally recognised standard for disease categorisation. The device's output (a probability distribution across ICD-11 categories) is inherently clinically relevant because it maps to the established clinical taxonomy.
Technical performance: Demonstrated through top-k accuracy metrics across the full distribution (top-1 >= 50%, top-3 >= 60%, top-5 >= 70%). These are distributional metrics, not condition-specific sensitivity/specificity.
Clinical performance: Demonstrated through clinical investigations measuring HCP diagnostic accuracy improvement when using the device. The claim is that the device improves HCP performance -- not that the device itself diagnoses.
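The top-k metric named under technical performance can be sketched as follows. The function names, the six-category toy vectors, and the reference labels are hypothetical and chosen only to keep the example readable; the thresholds are the ones quoted in this section.

```python
def top_k_hit(probabilities, true_index, k):
    """Return True if the reference category is among the k most probable ones."""
    ranked = sorted(range(len(probabilities)),
                    key=lambda i: probabilities[i], reverse=True)
    return true_index in ranked[:k]


def top_k_accuracy(cases, k):
    """Fraction of (distribution, reference index) pairs that are top-k hits."""
    hits = sum(top_k_hit(p, t, k) for p, t in cases)
    return hits / len(cases)


# Hypothetical outputs over 6 categories (the device itself uses 346).
cases = [
    ([0.55, 0.20, 0.10, 0.05, 0.05, 0.05], 0),  # reference ranked 1st
    ([0.10, 0.45, 0.30, 0.05, 0.05, 0.05], 2),  # reference ranked 2nd
    ([0.30, 0.25, 0.20, 0.15, 0.06, 0.04], 4),  # reference ranked 5th
    ([0.05, 0.05, 0.10, 0.60, 0.10, 0.10], 3),  # reference ranked 1st
]

# Thresholds as quoted in this section: top-1 >= 50%, top-3 >= 60%, top-5 >= 70%.
thresholds = {1: 0.50, 3: 0.60, 5: 0.70}

for k, minimum in thresholds.items():
    acc = top_k_accuracy(cases, k)
    print(f"top-{k}: {acc:.2f} (threshold {minimum:.2f}, met: {acc >= minimum})")
```

The metric is computed over the ranking of the whole vector, which is why it applies to every image regardless of the condition depicted.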

MDCG 2020-6 -- Level of Clinical Evidence

The MDCG 2020-6 guidance provides a hierarchy of clinical evidence. Our clinical investigations constitute the strongest form of clinical evidence (manufacturer's own clinical investigations), and they directly measure the intended clinical benefit: improvement in HCP diagnostic performance.

Why condition-specific claims would be inappropriate

A condition-specific performance claim (e.g., "the device detects melanoma with 95% sensitivity") would be appropriate for a diagnostic test -- a device that provides a binary positive/negative result for a specific condition (like a COVID-19 antigen test or an HIV screening test).

This device is not a diagnostic test. It does not provide a binary result. It does not confirm or rule out any condition. It provides a probability distribution across all ICD-11 categories, and the clinical decision remains with the HCP.

Imposing condition-specific performance requirements on this device would be analogous to requiring a weather forecast to provide a binary "rain / no rain" prediction for a specific city, when in fact the forecast provides a continuous probability distribution of precipitation across all regions. The forecast is not "wrong about London" if it assigns 30% probability to rain in London and 70% to dry weather -- it is providing information in a fundamentally different format from a binary alarm.

The MDR recognises this distinction. Article 2(12) defines intended purpose broadly, and Article 61 requires a level of clinical evidence "appropriate in view of the characteristics of the device." The characteristics of this device are distributional, not binary, and the clinical evidence reflects this.

The Correct Framework for Evaluating This Device

What we claim

We claim that the device improves the diagnostic performance of HCPs by providing them with a probability distribution across all ICD-11 categories. This is a decision support claim, not a diagnostic claim.

What we measure

Top-k accuracy: Whether the correct ICD-11 category appears in the device's top-1, top-3, or top-5 predictions. This is a distributional metric that respects the device's output format.
HCP accuracy improvement: Whether HCPs make more accurate diagnoses when they have access to the device's output versus when they do not. This directly measures the intended clinical benefit.
Sensitivity/specificity of the HCP: In some studies, we measure the HCP's sensitivity and specificity -- but this is the HCP's performance, not the device's. The device does not have sensitivity or specificity because it does not produce binary results.
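A paired comparison of HCP accuracy without and with the device output might be computed along these lines. The Reading structure, its field names, and the five example readings are hypothetical; they are not data from any of the studies, and the actual statistical analysis used in the investigations is not described in this document.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reading:
    """One HCP assessment of one image, before and after seeing the device output."""
    reference_dx: str  # reference-standard diagnosis for the image
    unaided_dx: str    # HCP diagnosis without the device output
    aided_dx: str      # HCP diagnosis after viewing the full distribution


def accuracy(readings, field):
    """Fraction of readings whose chosen diagnosis matches the reference."""
    correct = sum(getattr(r, field) == r.reference_dx for r in readings)
    return correct / len(readings)


def improvement(readings):
    """Summarise unaided vs aided HCP accuracy for the same set of readings."""
    unaided = accuracy(readings, "unaided_dx")
    aided = accuracy(readings, "aided_dx")
    return {
        "unaided": unaided,
        "aided": aided,
        # Absolute gain is a difference of proportions (x100 for percentage points).
        "absolute_gain": aided - unaided,
        # Relative gain is undefined when unaided accuracy is zero.
        "relative_gain": (aided - unaided) / unaided if unaided else float("nan"),
    }


# Hypothetical readings, for illustration only.
readings = [
    Reading("psoriasis", "psoriasis", "psoriasis"),
    Reading("GPP", "psoriasis", "GPP"),
    Reading("HS", "acne", "HS"),
    Reading("melanoma", "melanoma", "melanoma"),
    Reading("GPP", "eczema", "eczema"),
]

print(improvement(readings))
```

Every quantity here describes the HCP's answers; the device contributes only the information shown between the unaided and aided readings. Reporting absolute and relative gain separately avoids ambiguity about which kind of percentage is being quoted.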

What we do not claim

We do not claim that the device diagnoses any specific condition.
We do not claim condition-specific sensitivity or specificity for the device.
We do not claim that the device replaces clinical judgment.
We do not claim that the device provides binary diagnostic results.

Why specific conditions appear in study designs

Specific conditions appear in study designs because:

Practical necessity: Clinical investigation images must depict real conditions. You cannot run a study with abstract images.
Representative sampling: The image samples are designed to represent the range of conditions the device will encounter in clinical practice.
Subgroup analysis: Analysing performance across subgroups (rare diseases, malignant conditions, common conditions) provides richer evidence about where the device's distributional output has the greatest clinical impact -- without implying the device operates differently for each subgroup.

The fact that study images include conditions A, B, and C does not mean the performance claim is about conditions A, B, and C. The performance claim is about the distribution across all 346 ICD-11 categories that the device outputs for each image, regardless of what the image depicts.
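A subgroup analysis of this kind can be sketched as a simple grouping step applied to the same per-reading results. The subgroup labels and the outcome values below are hypothetical and are not taken from any study; the point is only that one computation is applied to every subgroup.

```python
from collections import defaultdict

# Each row: (subgroup label, HCP correct unaided?, HCP correct aided?).
# Hypothetical values for illustration only.
rows = [
    ("common", True, True),
    ("common", True, True),
    ("common", False, True),
    ("common", True, True),
    ("rare", False, True),
    ("rare", False, True),
    ("rare", True, True),
    ("rare", False, False),
]


def subgroup_gains(rows):
    """Compute unaided accuracy, aided accuracy and absolute gain per subgroup."""
    counts = defaultdict(lambda: {"n": 0, "unaided": 0, "aided": 0})
    for subgroup, unaided_ok, aided_ok in rows:
        c = counts[subgroup]
        c["n"] += 1
        c["unaided"] += unaided_ok  # True counts as 1, False as 0
        c["aided"] += aided_ok
    return {
        name: {
            "unaided": c["unaided"] / c["n"],
            "aided": c["aided"] / c["n"],
            "gain": (c["aided"] - c["unaided"]) / c["n"],
        }
        for name, c in counts.items()
    }


for name, stats in subgroup_gains(rows).items():
    print(name, stats)
```

In this toy data the larger gain in the "rare" subgroup comes from a lower unaided baseline, not from any subgroup-specific branch in the computation.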

Summary

Aspect	Diagnostic Test (e.g., COVID-19 antigen test)	This Device
Output format	Binary (positive / negative)	Probability distribution across all 346 ICD-11 categories
Performance metric	Sensitivity / specificity for condition X	Top-k accuracy across all categories; HCP accuracy improvement
Performance claim	"Detects condition X with Y% sensitivity"	"Improves HCP diagnostic performance by providing ICD-11 distribution"
Condition-specific?	Yes -- each claim is about one condition	No -- each output covers all conditions simultaneously
Who diagnoses?	The device provides a result; HCP interprets	The device provides information; HCP decides
Study sample composition	Affects the claim directly	Affects only which images were used, not the device output

The device's performance claims are not condition-specific because the device itself is not condition-specific. It always outputs all ICD-11 categories, and the clinical evidence measures whether this output helps HCPs -- not whether the device diagnoses any particular disease. The inclusion of rare diseases in our evidence base (sub-criterion (b) of benefit 7GH, study BI_2024) reflects a pragmatic recognition that HCPs benefit disproportionately from the device's output when dealing with conditions they encounter infrequently -- not a shift toward disease-focused operation.
