Freeman 2020 — Algorithm-based smartphone apps for skin cancer risk: BMJ systematic review [BALANCING]
Citation
Freeman K, Dinnes J, Chuchu N, Takwoingi Y, Bayliss SE, Matin RN, et al. Algorithm based smartphone apps to assess risk of skin cancer in adults: systematic review of diagnostic accuracy studies. BMJ. 2020 Feb 10;368:m127. DOI: 10.1136/bmj.m127. PMID 32041693.
Study design and population
Cochrane-style systematic review (PROSPERO CRD42016033595); 9 studies evaluating 6 commercial algorithm-based smartphone apps for adult skin-cancer risk triage; QUADAS-2 risk-of-bias assessment.
Reported metrics
- SkinVision (2 studies; n = 252 lesions; 61 (pre)malignant): pooled sensitivity 80 % (95 % CI 63–92); specificity 78 % (95 % CI 67–87)
- Revised SkinVision (pigmented-only dataset): sensitivity 88 % (95 % CI 70–98); specificity 79 % (95 % CI 70–86)
- SkinScan: 0 % sensitivity for melanoma (n = 5 melanomas, single study)
- High risk of bias across primary studies; high unevaluable-image rates
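The SkinScan figure rests on very few cases, so it helps to see how wide the uncertainty around a 0 % estimate is. The following is a minimal sketch, not a calculation taken from the review: it assumes that "n = 5" is the number of melanomas in the SkinScan evaluation and uses the closed form of the exact (Clopper-Pearson) interval for zero detected cases. The function name `zero_event_upper_bound` is hypothetical.

```python
def zero_event_upper_bound(n: int, alpha: float = 0.05) -> float:
    """Upper limit of the two-sided exact (Clopper-Pearson) interval
    for a proportion when 0 of n cases are detected."""
    if n <= 0:
        raise ValueError("n must be a positive integer")
    # With zero detections the lower limit is 0 and the upper limit p
    # solves (1 - p) ** n = alpha / 2.
    return 1.0 - (alpha / 2.0) ** (1.0 / n)


# Assumption: "n = 5" in the SkinScan bullet counts melanomas.
n_melanomas = 5
upper = zero_event_upper_bound(n_melanomas)
print(f"0/{n_melanomas} detected: 95% CI 0.0% to {upper:.1%}")
```

Under these assumptions the upper limit is roughly 52 %, so the point estimate signals a safety concern but is too imprecise to serve as a performance figure for that app.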
Surrogate-to-outcome linkage
BALANCING reference. Quantifies the limits of current algorithm-based triage apps: heterogeneous performance, missed melanomas in some CE-marked products, and high unevaluable-image rates. Provides direct evidence that diagnostic-accuracy claims cannot be assumed to generalise across AI products; each device requires its own clinical-data package. Directly relevant to the MDR benefit–risk argument.
CRIT1–7 appraisal
| Criterion | Score | Justification |
|---|---|---|
| CRIT1 Relevance | 3 | Direct — systematic review of AI / app-based skin-cancer diagnostic accuracy. |
| CRIT2 Methodology | 3 | PROSPERO-registered; PRISMA; QUADAS-2 risk-of-bias assessment. |
| CRIT3 Reporting | 3 | Pooled sens/spec with 95 % CIs; per-app stratification. |
| CRIT4 Applicability | 3 | Directly addresses the intended-use concerns (community deployment, CE-marked products). |
| CRIT5 Evidence weight | 3 | Systematic review — highest tier. |
| CRIT6 Risk of bias | 2 | Constituent primary studies are heterogeneous; recruitment was by clinicians rather than by app users; high unevaluable-image rates. |
| CRIT7 Contribution | 3 | MANDATORY balancing reference — establishes the device-specific clinical-data requirement that the EU MDR and MDCG 2020-1 Pillar 2/3 enforce. |
Aggregate: very strong (as balancing reference).
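The aggregate can be cross-checked against the table with a simple tally. This is a sketch under stated assumptions: the record does not give the scoring scale or the aggregation rule, so a maximum of 3 per criterion and an unweighted sum are assumed here, and no threshold for "very strong" is implied.

```python
# Scores transcribed from the CRIT1-7 appraisal table.
scores = {
    "CRIT1 Relevance": 3,
    "CRIT2 Methodology": 3,
    "CRIT3 Reporting": 3,
    "CRIT4 Applicability": 3,
    "CRIT5 Evidence weight": 3,
    "CRIT6 Risk of bias": 2,
    "CRIT7 Contribution": 3,
}

MAX_PER_CRITERION = 3  # assumption: the scale is not stated in the record

total = sum(scores.values())
maximum = MAX_PER_CRITERION * len(scores)
# Criteria scoring below the assumed maximum are the ones to justify in the narrative.
lowest = [name for name, value in scores.items() if value < MAX_PER_CRITERION]

print(f"Unweighted total: {total}/{maximum}")
print("Below maximum:", ", ".join(lowest))
```

On this reading the only deduction is for risk of bias in the constituent primary studies, which matches the limitations recorded for this reference.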
Limitations and notes
Primary studies were largely enriched-prevalence and clinician-recruited (not user-recruited), so generalisation to consumer-deployment scenarios is limited; not all CE-marked products were evaluated.
Strength as anchor
Mandatory balancing reference. BSI Erin has historically flagged selective citation as a risk; inclusion of Freeman 2020 demonstrates transparent acknowledgement that "AI dermatology" is not homogeneous and that each device must be evaluated on its own performance data. Supports the justification for relying on our own CIP/CIR evidence stack rather than on surrogate literature.