
Clinical Data Gap Analysis — Literature Research & Actions

Internal working document

Living working document created 2026-04-09. Records the gap analysis of the CER (R-TF-015-003), SotA (R-TF-015-011), and CEP (R-TF-015-001), and tracks the execution of every remediation action. Update this document as work progresses. Not included in the BSI response.

Task tracker​

| ID | Action | Type | Priority | Status |
|----|--------|------|----------|--------|
| T1 | Fix melanoma criterion inconsistency in CER (line 818 vs. derivation table) | CER edit | P1 | ⬜ Not started |
| T2 | Formally declare Fitzpatrick V–VI as acceptable gap per §6.5(e) in CER | CER edit | P1 | ⬜ Not started |
| T3 | Strengthen alopecia dermatologist sub-criteria justification in CER | CER edit | P1 | ⬜ Not started |
| T4 | Literature search A1: BCC/cSCC AI in non-specialist settings | Search | P2 | ⬜ Not started |
| T5 | Literature search A2: IHS4 AI independent validation | Search | P2 | ⬜ Not started |
| T6 | Literature search A3: Teledermatology utility scale benchmarks | Search | P2 | ⬜ Not started |
| T7 | Re-read existing SotA high-weight articles for underused data | Literature review | P2 | ⬜ Not started |
| T8 | Literature search B1: Fitzpatrick V–VI AI dermatology | Search | P3 | ⬜ Not started |
| T9 | Literature search B2: Pediatric AI dermatology | Search | P3 | ⬜ Not started |
| T10 | Literature search B3: Severity Pillar 3 real-world clinical studies | Search | P3 | ⬜ Not started |
| T11 | Literature search C1: Autoimmune skin disease AI detection | Search | P4 | ⬜ Not started |
| T12 | Literature search C2: UAS inter-rater benchmarks | Search | P4 | ⬜ Not started |

T1 — Fix melanoma criterion inconsistency​

Problem: CER line 818 says "Met: AUC >= 0.80 for melanoma detection achieved". The acceptance criteria derivation table (line 2008) states AUC >= 0.85, Sensitivity >= 0.93, Specificity >= 0.80. MC_EVCDAO_2019 achieved AUC 0.8482 (95% CI: 0.7629–0.9222) and sensitivity 73.79%. BSI will flag this inconsistency.

Resolution path:

The binding 7GH sub-criterion (c) is the aggregate malignancy AUC ≥ 0.90 (line 2048), which is met at 0.97. The derivation table criterion (AUC ≥ 0.85) is a condition-level benchmark. The fix is:

  1. Update line 818: remove the "≥ 0.80" inconsistency; state the individual melanoma AUC benchmark (≥ 0.85) alongside the achieved value of 0.8482 (95% CI: 0.7629–0.9222), and state that the binding 7GH sub-criterion (c) — aggregate malignancy AUC ≥ 0.90 — is met at 0.97 across all malignancy studies.
  2. Verify the derivation table at line 2008 is consistent with this framing (AUC ≥ 0.85 as individual threshold, not the aggregate 7GH criterion).

Files to edit:

  • R-TF-015-003-Clinical-Evaluation-Report.mdx line 818

Done when: Line 818 no longer says "≥ 0.80"; the text correctly states the individual melanoma threshold and its achieved value, and cross-references the 7GH aggregate criterion.


T2 — Formally declare Fitzpatrick V–VI as acceptable gap per §6.5(e)​

Problem: The CER currently mentions phototype V–VI underrepresentation only as a "PMCF monitoring priority." This is weaker than the §6.5(e) treatment given to autoimmune diseases and genodermatoses. BSI may ask why phototype V–VI is not declared as an acceptable gap with the same rigour.

Resolution path:

Option A (preferred if literature search T8 finds supporting data): cite published evidence showing that AI dermatology tools perform adequately in Fitzpatrick V–VI in external studies.

Option B (if no literature found): Add a formal §6.5(e) acceptable gap declaration for Fitzpatrick V–VI in the CER, structured identically to the autoimmune and genodermatoses gap declarations, with these justification elements:

  • (a) The primary deployment context (Spain) has low phototype V–VI prevalence, producing inherent under-recruitment.
  • (b) The ASCORAD_2022 study explicitly tested the device on Fitzpatrick IV–VI images (112 images), demonstrating that the algorithm architecture handles pigmentation variation.
  • (c) The Vision Transformer architecture assesses relative lesion intensity, not absolute pixel values, reducing sensitivity to skin tone compared to pixel-classification approaches.
  • (d) PMCF activity to monitor performance across phototypes is already planned.

Files to edit:

  • R-TF-015-003-Clinical-Evaluation-Report.mdx — "Declared acceptable gaps in indication coverage" section (around line 1951) and "Need for more clinical evidence" section

Done when: Fitzpatrick V–VI underrepresentation is formally declared as an acceptable gap with §6.5(e) citation, justification, and PMCF linkage, either supported by literature or on standalone grounds.


T3 — Strengthen alopecia dermatologist sub-criteria justification​

Problem: CER lines 1833–1834 show two sub-criteria marked ❌ for the dermatologist cohort subset:

| Sub-criterion | Threshold | Result |
|---------------|-----------|--------|
| Correlation [Dermatologists] | ≥ 0.5 | 0.47 |
| Kappa [Dermatologists] | ≥ 0.6 | 0.3297 |

The existing note attributes this to a skewed severity distribution in the dermatologist subset. BSI will scrutinize this.

Resolution path:

  1. Clarify in the CER that the pre-specified primary endpoint is the all-HCP pooled analysis (correlation 0.77, Kappa 0.74 — both met). The per-HCP-tier sub-analysis was exploratory.
  2. Cite the IDEI_2023 CIR documentation of the severity distribution skew in the dermatologist subset.
  3. Note that dermatologists in IDEI_2023 were assessing a private clinic population enriched for moderate-to-severe conditions; the limited range of severity scores in that subset reduces the reliability of agreement coefficients (a known methodological artifact for range-restricted data, consistent with Cohen 1960 and Landis & Koch 1977).
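The range-restriction effect cited in point 3 can be demonstrated numerically. The following minimal sketch (illustrative toy data only, not the IDEI_2023 ratings) shows two rater pairs with identical raw agreement (80%) yielding very different kappa values once the category distribution is skewed, because chance agreement p_e rises with imbalanced marginals:

```python
from collections import Counter

def cohens_kappa(r1, r2):
    """Cohen's kappa for two raters over the same items (nominal categories)."""
    n = len(r1)
    p_o = sum(a == b for a, b in zip(r1, r2)) / n  # observed agreement
    c1, c2 = Counter(r1), Counter(r2)
    # chance agreement from the raters' marginal distributions
    p_e = sum(c1[c] * c2[c] for c in set(r1) | set(r2)) / n ** 2
    return (p_o - p_e) / (1 - p_e)

# Balanced severity distribution: 8/10 raw agreement
balanced_a = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
balanced_b = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]
# Skewed distribution (9/10 items in one category): same 8/10 raw agreement
skewed_a = [1, 1, 1, 1, 1, 1, 1, 1, 1, 0]
skewed_b = [1, 1, 1, 1, 1, 1, 1, 1, 0, 1]

print(cohens_kappa(balanced_a, balanced_b))  # 0.6
print(cohens_kappa(skewed_a, skewed_b))      # ≈ -0.11: same p_o, much higher p_e
```

This is the same mechanism the CER prose should invoke: in a range-restricted subset, p_e inflates and kappa collapses even when raters agree on most items.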

Files to edit:

  • R-TF-015-003-Clinical-Evaluation-Report.mdx — around lines 1833–1834

Done when: The ❌ sub-criteria have a clear, well-justified prose explanation that a BSI auditor would find credible.


T4 — Literature search A1: BCC/cSCC AI in non-specialist settings​

Purpose: Provide SotA context for NMSC_2025's specialist-setting result (80% malignancy prevalence, H&N surgery clinic). Potentially derive individual acceptance criteria for BCC and cSCC in general practice settings.

PubMed search string:

("basal cell carcinoma" OR "squamous cell carcinoma" OR "non-melanoma skin cancer" OR "NMSC")
AND ("artificial intelligence" OR "deep learning" OR "machine learning" OR "convolutional neural network")
AND ("primary care" OR "general practice" OR "general dermatology" OR "smartphone" OR "mobile")
AND ("diagnostic accuracy" OR "sensitivity" OR "specificity" OR "AUC" OR "area under the curve")

Period: 2018–2025. Species: Humans. Language: English. Article types: prospective studies, RCTs, systematic reviews, meta-analyses.
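For reproducibility, a search like this can be executed against the NCBI E-utilities `esearch` endpoint. A minimal sketch that assembles the request URL (the query is the one above; the species/language filters use standard PubMed field tags, and article-type restrictions would still need to be applied during screening):

```python
from urllib.parse import urlencode

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

query = (
    '("basal cell carcinoma" OR "squamous cell carcinoma" OR '
    '"non-melanoma skin cancer" OR "NMSC") '
    'AND ("artificial intelligence" OR "deep learning" OR "machine learning" '
    'OR "convolutional neural network") '
    'AND ("primary care" OR "general practice" OR "general dermatology" '
    'OR "smartphone" OR "mobile") '
    'AND ("diagnostic accuracy" OR "sensitivity" OR "specificity" '
    'OR "AUC" OR "area under the curve") '
    'AND humans[MeSH Terms] AND english[Language]'
)

params = {
    "db": "pubmed",
    "term": query,
    "mindate": "2018", "maxdate": "2025", "datetype": "pdat",  # 2018-2025 window
    "retmode": "json",
    "retmax": 200,
}
url = ESEARCH + "?" + urlencode(params)
print(url)  # fetch this URL to retrieve matching PMIDs as JSON
```

The same builder works for the other search strings in this document by swapping the `term` and date window.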

Qualifying criteria: Must report AUC or sensitivity/specificity for BCC or cSCC detection in a non-specialist (primary care or general dermatology) setting, with histological confirmation as reference standard.

What to do with qualifying papers:

  • Score with CRIT1-7 (for SotA corpus) — include if score ≥ 4
  • If score ≥ 4: add to SotA malignancy NMSC section; add to CER acceptance criteria derivation table (BCC/SCC individual benchmarks); add to NMSC_2025 appraisal prose as contextualising evidence; update response item 5

Search findings​

Record results here after search execution.

| Reference | Setting | BCC result | cSCC result | Eligible? | Notes |
|-----------|---------|------------|-------------|-----------|-------|

T5 — Literature search A2: IHS4 AI independent validation​

Purpose: Corroborate the barely-met ICC criterion (0.727 vs ≥ 0.70) with external independent evidence.

PubMed search string:

("hidradenitis suppurativa" OR "acne inversa")
AND ("IHS4" OR "International Hidradenitis Suppurativa Severity Score" OR "severity score")
AND ("artificial intelligence" OR "deep learning" OR "machine learning" OR "automatic" OR "automated" OR "computer vision")

Period: 2022–2025 (post-AIHS4_2023).

Qualifying criteria: Peer-reviewed validation of AI or automated IHS4 scoring; uses clinical or atlas images; reports ICC or equivalent agreement metric.
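When screening candidates, confirm which ICC form each paper reports, since one-way and two-way forms are not directly comparable to our 0.727. A minimal pure-Python sketch of ICC(2,1) (two-way random effects, absolute agreement, single measure) for reference during appraisal — illustrative only, not the AIHS4 analysis code:

```python
def icc_2_1(data):
    """ICC(2,1): two-way random effects, absolute agreement, single measure.
    `data` is a list of subjects, each a list of k ratings (one per rater)."""
    n, k = len(data), len(data[0])
    grand = sum(sum(row) for row in data) / (n * k)
    row_means = [sum(row) / k for row in data]
    col_means = [sum(row[j] for row in data) / n for j in range(k)]
    ss_total = sum((x - grand) ** 2 for row in data for x in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)  # subject variance
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)  # rater variance
    ss_err = ss_total - ss_rows - ss_cols
    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_err / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)

# Two raters scoring five subjects with close agreement
ratings = [[2, 3], [4, 4], [6, 5], [8, 9], [10, 10]]
print(icc_2_1(ratings))  # ≈ 0.97
```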

What to do with qualifying papers:

  • Add to SotA severity section; add row to CER "Analysis of published severity validation studies" summary table; update the 5RB evidence sufficiency prose; update response item 7 (QUADAS-2 or MINORS depending on study design)

Search findings​

Record results here after search execution.

| Reference | Sample | ICC or metric | Eligible? | Notes |
|-----------|--------|---------------|-----------|-------|

T6 — Literature search A3: Teledermatology utility scale benchmarks​

Purpose: Anchor the COVIDX_EVCDAO_2022 acceptance criterion (Clinical Utility Score ≥ 8) with published literature establishing this threshold.

PubMed search string:

("teledermatology" OR "store-and-forward dermatology" OR "remote dermatology" OR "digital dermatology")
AND ("clinical utility" OR "utility score" OR "usability" OR "clinician satisfaction" OR "acceptance")
AND ("scale" OR "score" OR "questionnaire" OR "threshold" OR "benchmark")

Period: 2005–2025.

Qualifying criteria: Papers reporting a clinical utility or acceptance scale for teledermatology tools with published score thresholds or interpretation ranges.

What to do with qualifying papers:

  • Add to CER acceptance criteria derivation rationale for 3KX remote care criterion; add to COVIDX_EVCDAO_2022 prose (contextualising the 7.66 vs ≥ 8 result)

Search findings​

Record results here after search execution.

| Reference | Scale used | Threshold | Eligible? | Notes |
|-----------|------------|-----------|-----------|-------|

T7 — Re-examine existing high-weight SotA articles​

Purpose: Extract underused data from already-appraised articles that may address BCC/SCC gaps, condition-level accuracy for gap areas, or dark skin/pediatric subgroups.

For each article below, read the full text and answer:

  1. Does it report BCC or cSCC individual accuracy in non-specialist settings?
  2. Does it include Fitzpatrick V–VI subgroup data?
  3. Does it include pediatric subgroup data?
  4. Does it contain autoimmune disease classification data?
  5. Does it contain clinical (non-atlas) severity assessment data?

| Article | Weight | Review target | Findings |
|---------|--------|---------------|----------|
| Chen et al. 2024 (JAMA Dermatology) | 10/10 | BCC/SCC primary care; PPV by malignancy type | |
| Krakowski et al. 2024 | 10/10 | Condition-level accuracy; any NMSC or severity subgroup | |
| Gregoor et al. 2023 NPJ | 9.5/10 | Multi-condition scope; autoimmune, NMSC, rare diseases | |
| Goldfarb et al. 2021 | 9.5/10 | IHS4 real-world clinical data (already used for ICC baseline) | |
| Ferris et al. 2025 | 9/10 | Condition-level accuracy; any gap area | |
| Marsden et al. 2024 | 9/10 | Multi-condition; dark skin or pediatric subgroups | |
| Sangers et al. 2022 | 9/10 | Real-world NMSC; dark skin | |
| Tepedino et al. 2024 | 8/10 | BCC/SCC non-specialist setting | |
| Jaklitsch et al. 2025 | 8/10 | Competitor device BCC/SCC general settings | |
| Barata et al. 2023 | 7.5/10 | BCC/SCC individual metrics | |
| Jaklitsch et al. 2023 | 7.5/10 | Competitor device BCC/SCC general settings | |

T8 — Literature search B1: AI dermatology in Fitzpatrick V–VI​

PubMed search string:

("dermatology" OR "skin disease" OR "skin lesion")
AND ("artificial intelligence" OR "deep learning" OR "machine learning")
AND ("dark skin" OR "Fitzpatrick" OR "phototype V" OR "phototype VI" OR "skin of color" OR "Black skin" OR "sub-Saharan" OR "African")
AND ("diagnostic accuracy" OR "sensitivity" OR "performance" OR "validation")

Period: 2018–2025.

What to do with qualifying papers: Add to SotA; add to CER phototype representativeness section; use to either strengthen or formally declare §6.5(e) acceptable gap for Fitzpatrick V–VI (T2).

Search findings​

Record results here after search execution.


T9 — Literature search B2: Pediatric AI dermatology​

PubMed search string:

("pediatric" OR "paediatric" OR "children" OR "child" OR "infant" OR "neonate")
AND ("dermatology" OR "skin disease" OR "skin lesion")
AND ("artificial intelligence" OR "deep learning" OR "machine learning")
AND ("diagnostic accuracy" OR "clinical performance" OR "validation")

Period: 2015–2025.

What to do with qualifying papers: Add to SotA; add to CER "Representativeness" demographic section to contextualise the 6.3% pediatric proportion.

Search findings​

Record results here after search execution.


T10 — Literature search B3: Severity assessment Pillar 3 (real-world clinical performance)​

Purpose: Identify any published study using AI severity assessment in real clinical encounters (not atlas images) to partially bridge Gap 2 before PMCF results are available.

PubMed search string:

("psoriasis" OR "atopic dermatitis" OR "hidradenitis suppurativa" OR "urticaria" OR "eczema")
AND ("PASI" OR "SCORAD" OR "UAS" OR "IHS4" OR "severity score")
AND ("artificial intelligence" OR "smartphone" OR "automated" OR "deep learning")
AND ("clinical" OR "prospective" OR "real-world")
AND ("ICC" OR "intraclass" OR "Kappa" OR "agreement" OR "concordance")

Period: 2018–2025. Target: prospective clinical (non-atlas) studies with HCP-captured images.

What to do with qualifying papers: Add to CER "Appraisal of published severity validation literature"; update Gap 2 declaration to cite supporting Pillar 3 literature where found.

Search findings​

Record results here after search execution.


T11 — Literature search C1: Autoimmune skin disease AI detection​

PubMed search string:

("pemphigus" OR "bullous pemphigoid" OR "lupus erythematosus" OR "dermatomyositis" OR "scleroderma")
AND ("artificial intelligence" OR "deep learning" OR "image classification" OR "machine learning")
AND ("diagnostic accuracy" OR "sensitivity" OR "classification")

Period: 2015–2025.

What to do with qualifying papers: Add to SotA; strengthen Gap 4 acceptable gap justification by showing even the SotA lacks strong AI evidence for autoimmune visual diagnosis.

Search findings​

Record results here after search execution.


T12 — Literature search C2: UAS inter-rater benchmarks​

PubMed search string:

("urticaria activity score" OR "UAS" OR "urticaria severity")
AND ("inter-rater" OR "inter-observer" OR "agreement" OR "reliability" OR "Krippendorff" OR "Kappa")

Period: 2010–2025.

What to do with qualifying papers: Cite in CER to contextualise the barely-met Krippendorff α = 0.603 for UAS severity.
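When comparing α values across papers, note which metric form is used. A minimal sketch of Krippendorff's α for nominal data with two fully-crossed raters and no missing values (illustrative only; published UAS studies may use the ordinal or interval form, which weights disagreements differently and yields different values):

```python
from collections import Counter

def krippendorff_alpha_nominal(r1, r2):
    """Krippendorff's alpha, nominal metric, two raters, complete data."""
    units = list(zip(r1, r2))
    n_values = 2 * len(units)                       # total pairable values
    disagreements = sum(a != b for a, b in units)
    d_o = 2 * disagreements / n_values              # observed disagreement
    counts = Counter(r1) + Counter(r2)              # pooled value frequencies
    pairs_total = n_values * (n_values - 1)
    same_pairs = sum(c * (c - 1) for c in counts.values())
    d_e = (pairs_total - same_pairs) / pairs_total  # expected disagreement
    return 1.0 - d_o / d_e

print(krippendorff_alpha_nominal([1, 1, 2, 2], [1, 1, 2, 2]))  # 1.0
print(krippendorff_alpha_nominal([1, 1, 2, 2], [1, 1, 2, 1]))  # ≈ 0.53
```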

Search findings​

Record results here after search execution.


Background: gap analysis rationale​

Declared gaps — severity ratings​

These five gaps are already documented in the CER ("Need for more clinical evidence"). The severity ratings reflect BSI Round 2 exposure:

| Gap | Description | BSI exposure |
|-----|-------------|--------------|
| Gap 1 | Triage operational impact — post-market | Low |
| Gap 2 | Severity assessment — all 4 conditions are Pillar 2 only; AIHS4_2025 = 2 patients | High |
| Gap 3 | AI performance stability — post-market monitoring | Low |
| Gap 4 | Autoimmune diseases (3%) — bullous pemphigoid 5 cases | Low |
| Gap 5 | Genodermatoses (1%) — no evidence | Low |

Undeclared weaknesses summary​

| Code | Issue | Addressable by |
|------|-------|----------------|
| W1 | Melanoma criterion inconsistency (≥ 0.80 vs ≥ 0.85) | CER edit (T1) |
| W2 | Alopecia dermatologist sub-criteria unmet (correlation 0.47, Kappa 0.33) | CER edit (T3) |
| W3 | IHS4 ICC 0.727 on 2 patients | Literature (T5) |
| W4 | UAS severity α = 0.603 — barely above threshold | Literature (T12) |
| W5 | NMSC_2025 — specialist setting only; no SotA benchmark for general-setting NMSC | Literature (T4) |
| W6 | COVIDX utility ≥ 8 threshold — no SotA anchor | Literature (T6) |
| W7 | Fitzpatrick V–VI — not declared as §6.5(e) acceptable gap | CER edit (T2) ± literature (T8) |

What only PMCF can fix​

  • Gap 2 (severity Pillar 3): Requires prospective clinical studies with device-captured images, delivered by PMCF B.1–B.5 (ALADIN, AVASI, and the condition-specific studies). Published literature on other devices cannot substitute for this.
  • Gap 1 (triage operational impact): Requires real-world deployment data after CE marking.
  • AIHS4_2025 sample size: Must grow to n ≥ 100 (PMCF B.1).
All the information contained in this QMS is confidential. The recipient agrees not to transmit or reproduce the information, neither by himself nor by third parties, through whichever means, without obtaining the prior written permission of Legit.Health (AI Labs Group S.L.)