Clinical Data Gap Analysis — Literature Research & Actions
Living working document created 2026-04-09. Records the gap analysis of the CER (R-TF-015-003), SotA (R-TF-015-011), and CEP (R-TF-015-001), and tracks the execution of every remediation action. Update this document as work progresses. Not included in the BSI response.
Task tracker
| ID | Action | Type | Priority | Status |
|---|---|---|---|---|
| T1 | Fix melanoma criterion inconsistency in CER (line 818 vs. derivation table) | CER edit | P1 | ⬜ Not started |
| T2 | Formally declare Fitzpatrick V–VI as acceptable gap per §6.5(e) in CER | CER edit | P1 | ⬜ Not started |
| T3 | Strengthen alopecia dermatologist sub-criteria justification in CER | CER edit | P1 | ⬜ Not started |
| T4 | Literature search A1: BCC/cSCC AI in non-specialist settings | Search | P2 | ⬜ Not started |
| T5 | Literature search A2: IHS4 AI independent validation | Search | P2 | ⬜ Not started |
| T6 | Literature search A3: Teledermatology utility scale benchmarks | Search | P2 | ⬜ Not started |
| T7 | Re-examine existing high-weight SotA articles for underused data | Literature review | P2 | ⬜ Not started |
| T8 | Literature search B1: Fitzpatrick V–VI AI dermatology | Search | P3 | ⬜ Not started |
| T9 | Literature search B2: Pediatric AI dermatology | Search | P3 | ⬜ Not started |
| T10 | Literature search B3: Severity Pillar 3 real-world clinical studies | Search | P3 | ⬜ Not started |
| T11 | Literature search C1: Autoimmune skin disease AI detection | Search | P4 | ⬜ Not started |
| T12 | Literature search C2: UAS inter-rater benchmarks | Search | P4 | ⬜ Not started |
T1 — Fix melanoma criterion inconsistency
Problem: CER line 818 says "Met: AUC >= 0.80 for melanoma detection achieved". The acceptance criteria derivation table (line 2008) states AUC ≥ 0.85, Sensitivity ≥ 0.93, Specificity ≥ 0.80. MC_EVCDAO_2019 achieved AUC 0.8482 (95% CI: 0.7629–0.9222) and sensitivity 73.79%. BSI will flag this inconsistency.
Resolution path:
The binding 7GH sub-criterion (c) is the aggregate malignancy AUC ≥ 0.90 (line 2048), which is met at 0.97. The derivation table criterion (AUC ≥ 0.85) is a condition-level benchmark. The fix is:
- Update line 818: remove the "≥ 0.80" inconsistency; state that the individual melanoma AUC criterion (≥ 0.85) is met at 0.8482, and that the binding 7GH sub-criterion (c) — aggregate malignancy AUC ≥ 0.90 — is met at 0.97 across all malignancy studies.
- Verify the derivation table at line 2008 is consistent with this framing (AUC ≥ 0.85 as individual threshold, not the aggregate 7GH criterion).
Files to edit:
R-TF-015-003-Clinical-Evaluation-Report.mdx — line 818
Done when: Line 818 no longer says "≥ 0.80"; the text correctly states the individual melanoma threshold and its achieved value, and cross-references the 7GH aggregate criterion.
T2 — Formally declare Fitzpatrick V–VI as acceptable gap per §6.5(e)
Problem: The CER currently mentions phototype V–VI underrepresentation only as a "PMCF monitoring priority." This is weaker than the §6.5(e) treatment given to autoimmune diseases and genodermatoses. BSI may ask why phototype V–VI is not declared as an acceptable gap with the same rigour.
Resolution path:
Option A (preferred if literature search T8 finds supporting data): cite published evidence showing that AI dermatology tools perform adequately in Fitzpatrick V–VI in external studies.
Option B (if no literature found): Add a formal §6.5(e) acceptable gap declaration for Fitzpatrick V–VI in the CER, structured identically to the autoimmune and genodermatoses gap declarations, with these justification elements:
- (a) The primary deployment context (Spain) has low phototype V–VI prevalence, producing inherent under-recruitment.
- (b) The ASCORAD_2022 study explicitly tested the device on Fitzpatrick IV–VI images (112 images), demonstrating that the algorithm architecture handles pigmentation variation.
- (c) The Vision Transformer architecture assesses relative lesion intensity, not absolute pixel values, reducing sensitivity to skin tone compared to pixel-classification approaches.
- (d) PMCF activity to monitor performance across phototypes is already planned.
Files to edit:
R-TF-015-003-Clinical-Evaluation-Report.mdx — "Declared acceptable gaps in indication coverage" section (around line 1951) and "Need for more clinical evidence" section
Done when: Fitzpatrick V–VI underrepresentation is formally declared as an acceptable gap with §6.5(e) citation, justification, and PMCF linkage, either supported by literature or on standalone grounds.
T3 — Strengthen alopecia dermatologist sub-criteria justification
Problem: CER lines 1833–1834 show two sub-criteria marked ❌ for the dermatologist cohort subset:
| Sub-criterion | Threshold | Result |
|---|---|---|
| Correlation [Dermatologists] | ≥ 0.5 | 0.47 |
| Kappa [Dermatologists] | ≥ 0.6 | 0.3297 |
The existing note attributes this to a skewed severity distribution in the dermatologist subset. BSI will scrutinize this explanation.
Resolution path:
- Clarify in the CER that the pre-specified primary endpoint is the all-HCP pooled analysis (correlation 0.77, Kappa 0.74 — both met). The per-HCP-tier sub-analysis was exploratory.
- Cite the IDEI_2023 CIR documentation of the severity distribution skew in the dermatologist subset.
- Note that dermatologists in IDEI_2023 were assessing a private clinic population enriched for moderate-to-severe conditions; the limited range of severity scores in that subset reduces the reliability of agreement coefficients (a known methodological artifact for range-restricted data, consistent with Cohen 1960 and Landis & Koch 1977).
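The range-restriction artifact cited in the last bullet can be demonstrated numerically. A minimal sketch with hypothetical ratings (not IDEI_2023 data): two rater pairs with identical 80% raw agreement produce sharply different Cohen's kappa values depending on whether the severity marginals are skewed or balanced.

```python
def cohen_kappa(a, b):
    """Cohen's kappa for two raters scoring the same items."""
    n = len(a)
    labels = set(a) | set(b)
    p_obs = sum(x == y for x, y in zip(a, b)) / n  # observed agreement
    # chance agreement from each rater's marginal label frequencies
    p_exp = sum((a.count(l) / n) * (b.count(l) / n) for l in labels)
    return (p_obs - p_exp) / (1 - p_exp)

# Skewed marginals (9 of 10 cases in one severity band): 80% raw agreement
skew_a = [1] * 9 + [0]
skew_b = [1] * 8 + [0, 1]

# Balanced marginals: identical 80% raw agreement
bal_a = [1] * 5 + [0] * 5
bal_b = [1] * 4 + [0, 1] + [0] * 4

print(round(cohen_kappa(skew_a, skew_b), 3))  # -0.111: below chance despite 80% agreement
print(round(cohen_kappa(bal_a, bal_b), 2))    # 0.6: "substantial" per Landis & Koch
```

Because kappa corrects for chance agreement, concentrated marginals inflate the chance term and can drive kappa to zero or below even when raters rarely disagree — exactly the mechanism claimed for the dermatologist subset.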
Files to edit:
R-TF-015-003-Clinical-Evaluation-Report.mdx — around lines 1833–1834
Done when: The ❌ sub-criteria have a clear, well-justified prose explanation that a BSI auditor would find credible.
T4 — Literature search A1: BCC/cSCC AI in non-specialist settings
Purpose: Provide SotA context for NMSC_2025's specialist-setting result (80% malignancy prevalence, H&N surgery clinic). Potentially derive individual acceptance criteria for BCC and cSCC in general practice settings.
PubMed search string:
("basal cell carcinoma" OR "squamous cell carcinoma" OR "non-melanoma skin cancer" OR "NMSC")
AND ("artificial intelligence" OR "deep learning" OR "machine learning" OR "convolutional neural network")
AND ("primary care" OR "general practice" OR "general dermatology" OR "smartphone" OR "mobile")
AND ("diagnostic accuracy" OR "sensitivity" OR "specificity" OR "AUC" OR "area under the curve")
Period: 2018–2025. Species: Humans. Language: English. Article types: prospective studies, RCTs, systematic reviews, meta-analyses.
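For reproducibility across the T4–T12 searches, the four AND-joined term groups above can be assembled programmatically before submission to PubMed. A sketch (`build_query` is an illustrative helper, not an existing tool):

```python
def build_query(term_groups):
    """Quote each term, OR terms within a group, then AND the groups together."""
    clauses = ("(" + " OR ".join(f'"{t}"' for t in group) + ")" for group in term_groups)
    return " AND ".join(clauses)

# Term groups mirroring the A1 search string above
a1_query = build_query([
    ["basal cell carcinoma", "squamous cell carcinoma", "non-melanoma skin cancer", "NMSC"],
    ["artificial intelligence", "deep learning", "machine learning", "convolutional neural network"],
    ["primary care", "general practice", "general dermatology", "smartphone", "mobile"],
    ["diagnostic accuracy", "sensitivity", "specificity", "AUC", "area under the curve"],
])
```

Keeping the term lists as data makes it easy to document exactly which string was executed on which date, which BSI may ask for.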
Qualifying criteria: Must report AUC or sensitivity/specificity for BCC or cSCC detection in a non-specialist (primary care or general dermatology) setting, with histological confirmation as reference standard.
What to do with qualifying papers:
- Score with CRIT1-7 (for SotA corpus) — include if score ≥ 4
- If score ≥ 4: add to SotA malignancy NMSC section; add to CER acceptance criteria derivation table (BCC/SCC individual benchmarks); add to NMSC_2025 appraisal prose as contextualising evidence; update response item 5
Search findings
Record results here after search execution.
| Reference | Setting | BCC result | cSCC result | Eligible? | Notes |
|---|---|---|---|---|---|
T5 — Literature search A2: IHS4 AI independent validation
Purpose: Corroborate the narrowly met ICC criterion (0.727 vs. the ≥ 0.70 threshold) with independent external evidence.
PubMed search string:
("hidradenitis suppurativa" OR "acne inversa")
AND ("IHS4" OR "International Hidradenitis Suppurativa Severity Score" OR "severity score")
AND ("artificial intelligence" OR "deep learning" OR "machine learning" OR "automatic" OR "automated" OR "computer vision")
Period: 2022–2025 (post-AIHS4_2023).
Qualifying criteria: Peer-reviewed validation of AI or automated IHS4 scoring; uses clinical or atlas images; reports ICC or equivalent agreement metric.
What to do with qualifying papers:
- Add to SotA severity section; add row to CER "Analysis of published severity validation studies" summary table; update the 5RB evidence sufficiency prose; update response item 7 (QUADAS-2 or MINORS depending on study design)
Search findings
Record results here after search execution.
| Reference | Sample | ICC or metric | Eligible? | Notes |
|---|---|---|---|---|
T6 — Literature search A3: Teledermatology utility scale benchmarks
Purpose: Anchor the COVIDX_EVCDAO_2022 acceptance criterion (Clinical Utility Score ≥ 8) with published literature establishing this threshold.
PubMed search string:
("teledermatology" OR "store-and-forward dermatology" OR "remote dermatology" OR "digital dermatology")
AND ("clinical utility" OR "utility score" OR "usability" OR "clinician satisfaction" OR "acceptance")
AND ("scale" OR "score" OR "questionnaire" OR "threshold" OR "benchmark")
Period: 2005–2025.
Qualifying criteria: Papers reporting a clinical utility or acceptance scale for teledermatology tools with published score thresholds or interpretation ranges.
What to do with qualifying papers:
- Add to CER acceptance criteria derivation rationale for 3KX remote care criterion; add to COVIDX_EVCDAO_2022 prose (contextualising the 7.66 vs ≥ 8 result)
Search findings
Record results here after search execution.
| Reference | Scale used | Threshold | Eligible? | Notes |
|---|---|---|---|---|
T7 — Re-examine existing high-weight SotA articles
Purpose: Extract underused data from already-appraised articles that may address BCC/SCC gaps, condition-level accuracy for gap areas, or dark skin/pediatric subgroups.
For each article below, read the full text and answer:
- Does it report BCC or cSCC individual accuracy in non-specialist settings?
- Does it include Fitzpatrick V–VI subgroup data?
- Does it include pediatric subgroup data?
- Does it contain autoimmune disease classification data?
- Does it contain clinical (non-atlas) severity assessment data?
| Article | Weight | Review target | Findings |
|---|---|---|---|
| Chen et al. 2024 (JAMA Dermatology) | 10/10 | BCC/SCC primary care; PPV by malignancy type | |
| Krakowski et al. 2024 | 10/10 | Condition-level accuracy; any NMSC or severity subgroup | |
| Gregoor et al. 2023 NPJ | 9.5/10 | Multi-condition scope; autoimmune, NMSC, rare diseases | |
| Goldfarb et al. 2021 | 9.5/10 | IHS4 real-world clinical data (already used for ICC baseline) | |
| Ferris et al. 2025 | 9/10 | Condition-level accuracy; any gap area | |
| Marsden et al. 2024 | 9/10 | Multi-condition; dark skin or pediatric subgroups | |
| Sangers et al. 2022 | 9/10 | Real-world NMSC; dark skin | |
| Tepedino et al. 2024 | 8/10 | BCC/SCC non-specialist setting | |
| Barata et al. 2023 | 7.5/10 | BCC/SCC individual metrics | |
| Jaklitsch et al. 2025 | 8/10 | Competitor device BCC/SCC general settings | |
| Jaklitsch et al. 2023 | 7.5/10 | Competitor device BCC/SCC general settings |
T8 — Literature search B1: AI dermatology in Fitzpatrick V–VI
PubMed search string:
("dermatology" OR "skin disease" OR "skin lesion")
AND ("artificial intelligence" OR "deep learning" OR "machine learning")
AND ("dark skin" OR "Fitzpatrick" OR "phototype V" OR "phototype VI" OR "skin of color" OR "Black skin" OR "sub-Saharan" OR "African")
AND ("diagnostic accuracy" OR "sensitivity" OR "performance" OR "validation")
Period: 2018–2025.
What to do with qualifying papers: Add to SotA; add to CER phototype representativeness section; use to either strengthen or formally declare §6.5(e) acceptable gap for Fitzpatrick V–VI (T2).
Search findings
Record results here after search execution.
T9 — Literature search B2: Pediatric AI dermatology
PubMed search string:
("pediatric" OR "paediatric" OR "children" OR "child" OR "infant" OR "neonate")
AND ("dermatology" OR "skin disease" OR "skin lesion")
AND ("artificial intelligence" OR "deep learning" OR "machine learning")
AND ("diagnostic accuracy" OR "clinical performance" OR "validation")
Period: 2015–2025.
What to do with qualifying papers: Add to SotA; add to CER "Representativeness" demographic section to contextualise the 6.3% pediatric proportion.
Search findings
Record results here after search execution.
T10 — Literature search B3: Severity assessment Pillar 3 (real-world clinical performance)
Purpose: Identify any published study using AI severity assessment in real clinical encounters (not atlas images) to partially bridge Gap 2 before PMCF results are available.
PubMed search string:
("psoriasis" OR "atopic dermatitis" OR "hidradenitis suppurativa" OR "urticaria" OR "eczema")
AND ("PASI" OR "SCORAD" OR "UAS" OR "IHS4" OR "severity score")
AND ("artificial intelligence" OR "smartphone" OR "automated" OR "deep learning")
AND ("clinical" OR "prospective" OR "real-world")
AND ("ICC" OR "intraclass" OR "Kappa" OR "agreement" OR "concordance")
Period: 2018–2025. Target: prospective clinical (non-atlas) studies with HCP-captured images.
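Search strings like the one above can be executed against PubMed via the NCBI E-utilities `esearch` endpoint. A minimal sketch that only constructs the request URL with the period filter applied (no request is sent; the example term and `retmax` value are illustrative):

```python
from urllib.parse import urlencode

EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

def esearch_url(term, mindate, maxdate, retmax=200):
    """Build a PubMed esearch URL restricted to publication dates in [mindate, maxdate]."""
    params = {
        "db": "pubmed",
        "term": term,
        "retmode": "json",
        "datetype": "pdat",  # filter on publication date
        "mindate": mindate,
        "maxdate": maxdate,
        "retmax": retmax,
    }
    return EUTILS_ESEARCH + "?" + urlencode(params)

url = esearch_url('("psoriasis" OR "eczema") AND ("PASI" OR "SCORAD")', "2018", "2025")
```

Recording the exact URL per search run gives an audit trail for the "Search findings" tables below.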
What to do with qualifying papers: Add to CER "Appraisal of published severity validation literature"; update Gap 2 declaration to cite supporting Pillar 3 literature where found.
Search findings
Record results here after search execution.
T11 — Literature search C1: Autoimmune skin disease AI detection
PubMed search string:
("pemphigus" OR "bullous pemphigoid" OR "lupus erythematosus" OR "dermatomyositis" OR "scleroderma")
AND ("artificial intelligence" OR "deep learning" OR "image classification" OR "machine learning")
AND ("diagnostic accuracy" OR "sensitivity" OR "classification")
Period: 2015–2025.
What to do with qualifying papers: Add to SotA; strengthen Gap 4 acceptable gap justification by showing even the SotA lacks strong AI evidence for autoimmune visual diagnosis.
Search findings
Record results here after search execution.