X-3: Disease categorisation decision

Internal working document

This page documents the team's decision on how to structure clinical evidence by disease category. It is the resolution of the tension identified during the BSI clarification meeting (2026-03-25) and the internal debriefs. This decision is a prerequisite for Items 2b and 3a.

The tension​

The device outputs a probability distribution over all visible ICD-11 classes. It never produces a binary positive/negative for any specific condition. This is the factual description of the device architecture — and it is the intended use that we claim.

However, MDR clinical evaluation requirements demand that evidence demonstrate performance for the device's clinical indications. BSI's Nick stated explicitly: "the rules do not change" regardless of how the device frames its output. The evaluation must show that the probability distribution is clinically reliable per condition where clinical risk demands it.

This creates a tension: structuring evidence by disease category implicitly frames the device as "for diagnosing diseases," which conflicts with the ICD-11 probability distribution architecture. The question is whether and how to reconcile these two framings.

The resolution​

The two framings operate at different levels and are not in conflict.

| Level | Framing | Where it lives |
| --- | --- | --- |
| Device description (what the device does) | ICD-11 probability distribution: the device ranks likelihoods across all visible dermatological ICD-11 categories. The physician sees a prioritised short-list, not a binary diagnosis. | CER § Device description, intended purpose |
| Clinical evaluation (how we assess the device) | Disease-category evidence: performance is assessed by clinical risk tier, with individual analysis for high-risk conditions and justified pooling for lower-risk categories. | CER § Analysis of clinical data, acceptance criteria, sufficiency justification |

The device's architecture and intended use remain as described: a general classifier that outputs probability distributions. The clinical evaluation demonstrates that this output is reliable by assessing performance where it matters most — using a risk-proportionate, tiered evidence structure.
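The device-description level can be illustrated with a minimal sketch of how a probability distribution becomes a prioritised short-list. The function name `shortlist`, the condition labels, and the probability values are all hypothetical and chosen for illustration only; this is not the device's implementation.

```python
from typing import Dict, List, Tuple


def shortlist(distribution: Dict[str, float], k: int = 3) -> List[Tuple[str, float]]:
    """Return the k most likely classes, highest probability first."""
    total = sum(distribution.values())
    if total <= 0:
        raise ValueError("distribution must carry positive probability mass")
    # Renormalise defensively so the short-list always reports proper probabilities.
    normalised = {label: p / total for label, p in distribution.items()}
    ranked = sorted(normalised.items(), key=lambda item: item[1], reverse=True)
    return ranked[:k]


# Hypothetical model output for a single image (illustrative values only).
example = {
    "melanoma": 0.42,
    "melanocytic naevus": 0.31,
    "seborrhoeic keratosis": 0.17,
    "basal cell carcinoma": 0.10,
}

top3 = shortlist(example, k=3)
top3_labels = [label for label, _ in top3]
print(top3_labels)
```

Note that nothing in the sketch thresholds a single class into a positive or negative call: the output is a ranking, which is why the evaluation level has to ask separately how reliable that ranking is for each risk tier.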

Regulatory basis for this approach​

The following regulatory requirements and guidance documents support — and in some cases mandate — a condition-level or category-level evidence assessment:

| Requirement | Source | What it demands |
| --- | --- | --- |
| Sensitivity/specificity for major clinical indications | MEDDEV 2.7.1 Rev 4, Annex A7.3 | Diagnostic devices must report performance metrics "for major clinical indications" specifically, not only as a pooled aggregate. |
| Separate VCA per claimed output | MDCG 2020-1, § Valid Clinical Association | "Each specific claimed output (diagnosis, severity grading, disease monitoring) requires separate VCA establishment." The device claims diagnostic support across ICD-11 categories; VCA must be established for representative categories, not only in aggregate. |
| Risk-based justification for pooling | MDCG 2020-6, Appendix III (evidence hierarchy) | Data pooled across conditions must have a risk-based justification for why pooling is appropriate. Without justification, pooled data is not acceptable for high-risk indications. |
| Narrow intended purpose if evidence insufficient | MDCG 2020-6, § 6.5(e) | If evidence is insufficient for an indication, the intended purpose must be narrowed, or the gap declared acceptable with PMCF. |
| Individual breakdown for high-risk conditions | BSI meeting, Nick (2026-03-25) | "For higher-risk conditions (melanoma, malignancies): an individual breakdown of acceptance criteria and evidence. BSI will specifically audit these." |
| Consider removing under-supported indications | BSI meeting, Nick (2026-03-25) | "Consider whether all indications are equally well-supported by data — and whether some should be removed from the claims if data is insufficient." |
| CEAR Section E: performance evaluation per indication | MDCG 2020-13, § E.2 | The notified body assesses whether "the clinical performance endpoints are appropriate for each indication", implying per-indication scrutiny. |

The three-tier evidence structure​

Evidence is assessed at three tiers based on the clinical risk of misclassification:

Tier 1 — Malignant conditions (individual analysis)​

Clinical risk: Highest. Misclassification can delay cancer diagnosis, leading to disease progression and mortality.

Approach: Individual acceptance criteria per condition (melanoma) or per condition group (multiple malignant conditions). Dedicated studies provide primary evidence.

Evidence:

| Condition/group | Primary study | Sample | Key metric |
| --- | --- | --- | --- |
| Melanoma | MC_EVCDAO_2019 (Cruces + Basurto) | 105 patients, 36 melanoma | AUC 0.8482, Top-3 sensitivity 0.9032 |
| Multiple malignant conditions | MC_EVCDAO_2019 + IDEI_2023 + DAO_O + DAO_PH + PH_2024 + SAN_2024 | Melanoma, BCC, SCC, actinic keratosis across 6 studies | AUC 0.8983 (MC_EVCDAO), AUC 0.9669 (PH prospective) |

Regulatory alignment: This tier satisfies Nick's explicit requirement for individual breakdown of high-risk conditions, MEDDEV A7.3's requirement for per-indication sensitivity/specificity, and MDCG 2020-6's prohibition on unsupported pooling for high-risk indications.
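Because the device ranks conditions rather than issuing a binary call, a per-condition metric such as Top-3 sensitivity has to be computed from the ranking. The sketch below shows one plausible way to do that, assuming each case is stored as a ground-truth label plus a probability dictionary; the data layout, the function name `top_k_sensitivity`, and the five cases are hypothetical and are not taken from any study in the portfolio.

```python
from typing import Dict, List, TypedDict


class Case(TypedDict):
    truth: str               # confirmed diagnosis (reference standard)
    probs: Dict[str, float]  # device output for the image


def top_k_sensitivity(cases: List[Case], target: str, k: int = 3) -> float:
    """Fraction of true `target` cases where `target` appears in the top-k ranking."""
    positives = [c for c in cases if c["truth"] == target]
    if not positives:
        raise ValueError(f"no cases with ground truth {target!r}")
    hits = 0
    for case in positives:
        ranked = sorted(case["probs"], key=case["probs"].get, reverse=True)
        if target in ranked[:k]:
            hits += 1
    return hits / len(positives)


# Hypothetical cases, for illustration only.
cases: List[Case] = [
    {"truth": "melanoma", "probs": {"melanoma": 0.50, "naevus": 0.30, "bcc": 0.12, "sk": 0.08}},
    {"truth": "melanoma", "probs": {"naevus": 0.40, "melanoma": 0.35, "sk": 0.15, "bcc": 0.10}},
    {"truth": "melanoma", "probs": {"naevus": 0.50, "sk": 0.25, "bcc": 0.15, "melanoma": 0.10}},
    {"truth": "melanoma", "probs": {"sk": 0.40, "naevus": 0.30, "melanoma": 0.20, "bcc": 0.10}},
    {"truth": "naevus", "probs": {"naevus": 0.70, "sk": 0.20, "melanoma": 0.06, "bcc": 0.04}},
]

melanoma_top3 = top_k_sensitivity(cases, target="melanoma", k=3)
print(melanoma_top3)
```

Computing the metric per target condition, rather than once over the pooled case set, is what makes the Tier 1 breakdown individually auditable.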

Tier 2 — Rare diseases (grouped analysis)​

Clinical risk: Moderate-high. Rare diseases are frequently misdiagnosed; delayed diagnosis leads to prolonged suffering and inappropriate treatment. These conditions require specialist expertise that primary care physicians often lack.

Approach: Grouped analysis with dedicated acceptance criteria. The rare diseases subgroup is explicitly defined in the BI_2024 study protocol.

Evidence:

| Conditions in subgroup | Study | Sample | Key metric |
| --- | --- | --- | --- |
| GPP, acne conglobata, palmoplantar pustulosis, subcorneal pustular dermatosis, AGEP, pemphigus vulgaris | BI_2024 (Boehringer Ingelheim) | 15 HCPs × 100 images = 1,449 evaluations | Rare disease accuracy: +26.77% improvement (25.56% → 57.88%) |
| Pustular psoriasis, HS | PH_2024 (Puerta de Hierro) | 9 PCPs × 30 images | Pustular psoriasis: +299.64% relative improvement; HS: +24.14% |

Regulatory alignment: MDCG 2020-6 § 6.5(e) requires either sufficient evidence per indication or narrowing the intended purpose. This subgroup analysis demonstrates that the device significantly improves rare disease diagnosis — a distinct clinical benefit (9VW) with its own acceptance criteria (absolute accuracy >= 54%).
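The Tier 2 figures mix an absolute acceptance threshold with improvement figures, some of which are stated as relative. A minimal sketch of the arithmetic behind each kind of figure is given here, since the two are easy to confuse in review. The function names are hypothetical, and the 40% to 50% example values are illustrative and not study results; only the 54% threshold and the 57.88% assisted accuracy come from this page.

```python
def absolute_gain(baseline_pct: float, assisted_pct: float) -> float:
    """Improvement in percentage points (assisted minus baseline)."""
    return assisted_pct - baseline_pct


def relative_gain(baseline_pct: float, assisted_pct: float) -> float:
    """Improvement as a percentage of the baseline value."""
    if baseline_pct <= 0:
        raise ValueError("baseline must be positive for a relative gain")
    return (assisted_pct - baseline_pct) / baseline_pct * 100.0


def meets_criterion(assisted_pct: float, threshold_pct: float = 54.0) -> bool:
    """Acceptance check against an absolute accuracy threshold (>= 54% for Tier 2)."""
    return assisted_pct >= threshold_pct


# Illustrative values only: accuracy rising from 40% to 50%.
print(absolute_gain(40.0, 50.0))   # gain in percentage points
print(relative_gain(40.0, 50.0))   # gain relative to baseline, in percent

# Assisted rare-disease accuracy reported for BI_2024 on this page.
print(meets_criterion(57.88))
```

Stating explicitly in the CER whether each reported improvement is in percentage points or relative to baseline keeps the acceptance check unambiguous.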

Tier 3 — General conditions (pooled with risk-based justification)​

Clinical risk: Lower. Misranking within these non-malignant categories leads to delayed or modified treatment, not mortality. The physician always makes the final clinical decision — the device is a decision-support tool, not a stand-alone diagnostic.

Approach: Pooled analysis across conditions with explicit risk-based justification. The pooled studies cover a representative sample of the epidemiological landscape, documented using a 7-category framework.

Risk-based justification for pooling:

  1. Comparable clinical consequence of misclassification. Within non-malignant, non-rare categories, the clinical consequence of an incorrect ranking is comparable: delayed or modified treatment. The physician always makes the final decision. This is fundamentally different from malignant conditions, where a missed diagnosis can be fatal.

  2. Device architecture supports pooling. The device outputs a probability distribution over all ICD-11 categories simultaneously. It does not make independent per-condition predictions — it ranks likelihoods across the full ICD-11 space. Assessing how well this ranking performs across the general dermatological spectrum is therefore a natural and valid evaluation approach.

  3. Representative sampling across epidemiological categories. The pooled studies include conditions from all 7 major epidemiological categories of dermatological disease, ensuring that the pooled metric is not biased toward any single category.

  4. Consistent architecture guarantees consistent capability. The uniform algorithm architecture (Vision Transformer) ensures that the device's technical capability to extract clinically relevant features from images is consistent across conditions. The same feature extraction serves inflammatory and infectious conditions alike, so validating across the breadth of categories provides high confidence in the general capability.
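The three tiers described above amount to a simple routing rule from condition to analysis type. The following is a minimal sketch of that rule, assuming the tier membership lists named on this page (the Tier 1 malignancies and the BI_2024 rare-disease subgroup); the names `Tier`, `assign_tier`, and the set constants are hypothetical, and the authoritative membership lists are those in the study protocols.

```python
from enum import Enum


class Tier(Enum):
    INDIVIDUAL = 1  # Tier 1: malignant conditions, analysed individually
    GROUPED = 2     # Tier 2: rare diseases, analysed as a defined subgroup
    POOLED = 3      # Tier 3: general conditions, pooled with risk-based justification


# Membership as listed on this page; treat as illustrative, not exhaustive.
MALIGNANT = {"melanoma", "bcc", "scc", "actinic keratosis"}
RARE = {
    "gpp",
    "acne conglobata",
    "palmoplantar pustulosis",
    "subcorneal pustular dermatosis",
    "agep",
    "pemphigus vulgaris",
}


def assign_tier(condition: str) -> Tier:
    """Route a condition to its evidence tier; malignancy takes precedence."""
    key = condition.strip().lower()
    if key in MALIGNANT:
        return Tier.INDIVIDUAL
    if key in RARE:
        return Tier.GROUPED
    return Tier.POOLED


for name in ("Melanoma", "GPP", "Acne"):
    print(name, assign_tier(name).name)
```

Checking malignancy first reflects the risk ordering: a condition that could plausibly sit in more than one list is always assessed at the stricter tier.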

Epidemiological framework: 7 categories of dermatological disease​

To demonstrate that the clinical evidence portfolio representatively covers the full spectrum of dermatological conditions, we adopt the following epidemiological categorisation, based on the Global Burden of Disease Study (Karimkhani et al., 2017) and related prevalence literature:

| Category | Approximate prevalence | Description |
| --- | --- | --- |
| Infectious diseases | 57% | Fungal (34%), bacterial (23%), viral infections |
| Other conditions | 19% | Acne, alopecia, urticaria, and other common conditions |
| Inflammatory diseases | 15% | Psoriasis, atopic dermatitis, hidradenitis suppurativa, eczema |
| Malignant diseases | 5% | Melanoma, BCC, SCC, actinic keratosis |
| Autoimmune diseases | 3% | Lupus erythematosus, dermatomyositis, bullous diseases |
| Genodermatoses | 1% | Epidermolysis bullosa, ichthyosis |
| Vascular diseases | 1% | Haemangiomas, vascular malformations |

Evidence coverage matrix​

The following matrix shows which disease categories are represented in each clinical investigation:

| Study | Infectious | Other | Inflammatory | Malignant | Autoimmune | Genodermatoses | Vascular |
| --- | --- | --- | --- | --- | --- | --- | --- |
| BI_2024 | Impetigo, Tinea corporis | Acne (×3 variants) | GPP, dermatitis, psoriasis, HS, AGEP +4 | - | Pemphigus vulgaris | - | - |
| PH_2024 | - | - | Psoriasis (×2), HS, urticaria | Melanoma, BCC, actinic keratosis | - | - | - |
| SAN_2024 | Herpes, tinea, onychomycosis | Acne, alopecia, urticaria | Dermatitis, psoriasis | Melanoma | - | - | - |
| IDEI_2023 | - | Androgenetic alopecia (96 pts) | - | Melanoma, BCC, SCC | - | - | - |
| MC_EVCDAO_2019 | - | - | - | Melanoma (36), BCC (13), actinic K. | - | - | Angioma (5), haemangioma, angiokeratoma |
| AIHS4_2025 | - | - | HS (severity) | - | - | - | - |
| COVIDX_2022 | Folliculitis, herpes, tinea | Acne (67 pts), alopecia | Psoriasis, AD, HS, eczema, lichen planus, rosacea | Melanoma, BCC, SCC, actinic K. | - | Keratosis palmaris (3)? | Haemangioma (14) |
| DAO_O_2022 | - | Alopecia | Psoriasis (×3), eczema (×3), AD | Melanoma (×4), BCC (×9), actinic K. (27) | Bullous pemphigoid (5) | - | Spider telangiectasis, pyogenic granuloma |
| DAO_PH_2022 | Warts, molluscum, herpes | - | Psoriasis, AD, urticaria, HS, lichen planus | BCC, SCC, melanoma | - | - | Angiomas |
| Coverage | 4 studies | 5 studies | 8 studies | 7 studies | 2 studies | ~1 study | 4 studies |

Coverage assessment​

| Category | Coverage strength | Assessment |
| --- | --- | --- |
| Infectious (57%) | Moderate | Present in 4 studies with bacterial (impetigo), fungal (tinea, onychomycosis), and viral (herpes, warts, molluscum) conditions. Per-condition sample sizes are small (2–10 images in MRMC studies), but COVIDX includes folliculitis in a real-world setting. Coverage is representative though not deep. |
| Other (19%) | Moderate-strong | Acne is well-represented (67 patients in COVIDX alone, multiple MRMC studies). Androgenetic alopecia has dedicated evidence (IDEI with 96 patients, AIHS4 for severity). Urticaria represented in PH_2024 and SAN_2024. |
| Inflammatory (15%) | Strong | Represented in 8 of 9 studies. Psoriasis (multiple subtypes), AD, HS, eczema, lichen planus, rosacea, and AGEP all covered. GPP has dedicated acceptance criteria (BI_2024 primary objective). HS has dedicated severity scoring (AIHS4_2025). |
| Malignant (5%) | Strong | Dedicated study (MC_EVCDAO: 105 patients, 36 melanoma). Melanoma, BCC, SCC, and actinic keratosis across 7 studies. Individual acceptance criteria established. Tier 1 analysis. |
| Autoimmune (3%) | Weak | Only pemphigus vulgaris (BI_2024, 5 images) and bullous pemphigoid (DAO_O, 5 cases). Declared acceptable gap; see below. |
| Genodermatoses (1%) | Very weak | Only debatable keratosis palmaris in COVIDX. Declared acceptable gap; see below. |
| Vascular (1%) | Thin but present | Angiomas and haemangiomas across 4 studies. Haemangioma well-represented in COVIDX (14 patients). Sufficient for a 1%-prevalence category. |
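The assessment above can be reduced to a small lookup that separates categories argued as sufficient from those treated as coverage gaps. This is a minimal sketch: the ratings are copied from the coverage assessment, but the rule that "Weak" and "Very weak" ratings are the ones flagged as gaps is an assumption made for illustration, as are the names `COVERAGE` and `GAP_RATINGS`.

```python
# Coverage strength per category, as rated in the coverage assessment.
COVERAGE = {
    "Infectious": "Moderate",
    "Other": "Moderate-strong",
    "Inflammatory": "Strong",
    "Malignant": "Strong",
    "Autoimmune": "Weak",
    "Genodermatoses": "Very weak",
    "Vascular": "Thin but present",
}

# Assumption: these two ratings are the ones that trigger a declared gap.
GAP_RATINGS = {"Weak", "Very weak"}

gaps = [category for category, rating in COVERAGE.items() if rating in GAP_RATINGS]
supported = [category for category in COVERAGE if category not in gaps]

print("Gaps:", gaps)
print("Supported:", len(supported), "of", len(COVERAGE))
```

Keeping the ratings in one structure makes it straightforward to confirm that every category rated below the chosen bar has a matching gap declaration and PMCF activity.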

Declared acceptable gaps​

Per MDCG 2020-6 § 6.5(e), when evidence is insufficient for an indication, the manufacturer must either narrow the intended purpose or declare the gap acceptable with justification and address it via PMCF.

We declare the following gaps as acceptable and do not narrow the intended purpose:

Gap A — Autoimmune diseases (3% prevalence)​

Gap: Evidence is limited to pemphigus vulgaris (5 images in BI_2024) and bullous pemphigoid (5 cases in DAO_O_2022). No dedicated study addresses autoimmune conditions as a group.

Why acceptable:

  • Autoimmune skin conditions represent only 3% of dermatological presentations.
  • The device's intended use is as a decision-support tool; the physician always makes the final diagnosis. For autoimmune conditions, which typically require serological confirmation beyond visual assessment, the device's role is triage and differential ranking, not definitive diagnosis.
  • The uniform Vision Transformer architecture means the model's feature extraction capability is not condition-specific. Performance demonstrated on inflammatory and other conditions (which share visual features with autoimmune presentations) provides supporting confidence.
  • No safety concern: misranking an autoimmune condition does not carry acute mortality risk comparable to malignancy.

PMCF activity: Prospective data collection on autoimmune conditions in real-world deployment, with per-condition accuracy tracking.

Gap B — Genodermatoses (1% prevalence)​

Gap: No study in the portfolio specifically addresses genetic skin disorders (epidermolysis bullosa, ichthyosis, etc.).

Why acceptable:

  • Genodermatoses represent approximately 1% of dermatological presentations.
  • These conditions are typically diagnosed through genetic testing and clinical history, not primarily through image-based assessment. The device's role is supportive (triage, differential ranking), not definitive.
  • The extreme rarity of these conditions makes prospective study recruitment impractical for pre-market evidence.
  • Post-market monitoring will capture any genodermatoses cases encountered in real-world use.

PMCF activity: Passive surveillance of genodermatoses cases through PMS/PMCF data collection. Active recruitment is not feasible given the 1% prevalence.

Where this decision affects the CER​

The following CER sections must be updated to reflect this disease categorisation framework:

1. Data Pooling Methodology (current CER § "Data Pooling Methodology")​

Current state: Generic statement that pooling is justified by "clinical comparability and homogeneity" with no risk-based reasoning.

Required update: Add the risk-based justification for pooling (the 4 points from the Tier 3 rationale above). Reference the 7-category epidemiological framework to demonstrate representative sampling. Replace the vague "homogeneity" claim with the explicit argument: comparable clinical consequence of misclassification within non-malignant categories + device architecture supports pooling + representative coverage demonstrated.

2. Clarification on "Multiple conditions" (current CER § "Clarification on Multiple conditions")​

Current state: States that "Multiple conditions" reflects "broad, representative inclusion aligned with diverse ICD-11 categories." This is too vague for BSI.

Required update: Replace with the 7-category framework. Show which categories are represented in which studies (the coverage matrix). Explain that "multiple conditions" encompasses the full epidemiological spectrum of visible dermatological disease, with evidence sampling from all 7 categories. This transforms an assertion into demonstrated coverage.

3. Indication Coverage (current CER § "Justification of Sufficiency of Clinical Evidence", bullet 4)​

Current state: References "anchor conditions" — malignancy detection, chronic inflammatory diseases, and rare dermatological conditions. This is the current 3-tier language.

Required update: Expand to reference the 7-category framework and the coverage matrix. Add the declared acceptable gaps (autoimmune, genodermatoses) with justification. Show that the 3 tiers (malignant → individual, rare → grouped, general → pooled) are a deliberate risk-proportionate evidence assessment strategy, not an omission.

4. Acceptance Criteria Derivation from State of the Art (current CER § "Acceptance Criteria Derivation from State of the Art")​

Current state: Acceptance criteria are presented by clinical domain (melanoma detection, diagnostic accuracy improvement, etc.) — which is partially aligned with the tiered approach but not explicitly linked to the disease categorisation rationale.

Required update: Add introductory text explaining that acceptance criteria follow the 3-tier structure. Tier 1 (malignant) has condition-specific thresholds derived from SotA. Tier 2 (rare) has grouped thresholds justified by the distinct clinical benefit (9VW). Tier 3 (general) uses pooled thresholds justified by the risk-based pooling rationale. This makes the link between categorisation and acceptance criteria explicit and auditable.

5. Need for more clinical evidence / Gaps (current CER § "Need for more clinical evidence")​

Current state: Declares 3 gaps (triage/prioritization, severity assessment, algorithmic stability). These are operational/performance gaps — not coverage gaps.

Required update: Add Gap A (autoimmune) and Gap B (genodermatoses) as declared acceptable evidence coverage gaps, with the justifications documented above. Link each to a specific PMCF activity. This satisfies MDCG 2020-6 § 6.5(e) and BSI's Item 6 requirement that PMCF activities be linked to identified gaps.

6. PMCF Plan​

Impact: Two new PMCF activities must be added to address Gap A and Gap B. These feed directly into Item 6 (PMCF plan), which requires each activity to be linked to a specific gap.
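The requirement that each PMCF activity be linked to a specific gap, and each gap to an activity, is a traceability check that can be automated. Below is a minimal sketch under that reading; the `Gap` and `PmcfActivity` record types and the helper names are hypothetical, while the gap and activity descriptions paraphrase those given on this page.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Gap:
    gap_id: str
    description: str


@dataclass(frozen=True)
class PmcfActivity:
    name: str
    gap_id: str  # the declared gap this activity addresses


def unlinked_gaps(gaps: List[Gap], activities: List[PmcfActivity]) -> List[str]:
    """Gaps that no PMCF activity addresses."""
    covered = {a.gap_id for a in activities}
    return [g.gap_id for g in gaps if g.gap_id not in covered]


def orphan_activities(gaps: List[Gap], activities: List[PmcfActivity]) -> List[str]:
    """Activities that point at a gap which has not been declared."""
    declared = {g.gap_id for g in gaps}
    return [a.name for a in activities if a.gap_id not in declared]


gaps = [
    Gap("A", "Autoimmune diseases: limited evidence"),
    Gap("B", "Genodermatoses: no dedicated study"),
]
activities = [
    PmcfActivity("Prospective autoimmune data collection with per-condition accuracy tracking", "A"),
    PmcfActivity("Passive surveillance of genodermatoses cases via PMS/PMCF", "B"),
]

missing = unlinked_gaps(gaps, activities)
orphans = orphan_activities(gaps, activities)
print(missing, orphans)
```

Both lists being empty is the condition the PMCF plan update needs to satisfy; running the check in both directions also catches activities that have drifted away from any declared gap.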

Relationship to formal BSI items​

| BSI item | How X-3 feeds into it |
| --- | --- |
| Item 2a (device description) | The ICD-11 probability distribution framing remains the device description. No change to how the device is described, only to how the evidence is structured. |
| Item 2b (clinical benefits, SotA, acceptance criteria) | Acceptance criteria now follow the 3-tier structure. High-risk conditions get individual criteria. Pooled criteria have explicit risk-based justification. The 7-category framework demonstrates representative coverage. |
| Item 3a (clinical data analysis) | Clinical data analysis adopts the 3-tier structure. Per-study analyses reference which disease categories are covered. The coverage matrix becomes part of the sufficiency argument. |
| Item 3b (data sufficiency) | The declared gaps (autoimmune, genodermatoses) are formally documented with justification. Sufficiency is argued positively for 5 of 7 categories and declared acceptable with PMCF for 2. |
| Item 6 (PMCF plan) | Two new PMCF activities linked to the two declared gaps. This satisfies BSI's requirement that each PMCF activity be linked to a specific identified gap. |

Decision status​

| Decision | Status | Owner |
| --- | --- | --- |
| Adopt 3-tier evidence structure (malignant → individual, rare → grouped, general → pooled) | Decided | Team (2026-03-28) |
| Use 7-category epidemiological framework as pooling justification | Decided | Team (2026-03-28) |
| Declare autoimmune and genodermatoses as acceptable gaps | Decided | Team (2026-03-28) |
| Update CER § Data Pooling Methodology | To do | Jordi |
| Update CER § Clarification on "Multiple conditions" | To do | Jordi |
| Update CER § Indication Coverage | To do | Jordi |
| Update CER § Acceptance Criteria Derivation | To do | Jordi |
| Update CER § Need for more clinical evidence (add gaps A & B) | To do | Jordi |
| Add PMCF activities for gaps A & B | To do | Jordi |
All the information contained in this QMS is confidential. The recipient agrees not to transmit or reproduce the information, neither by himself nor by third parties, through whichever means, without obtaining the prior written permission of Legit.Health (AI Labs Group S.L.)