Freeman 2020 — Algorithm-based smartphone apps for skin cancer risk: BMJ systematic review [BALANCING]

Citation

Freeman K, Dinnes J, Chuchu N, Takwoingi Y, Bayliss SE, Matin RN, et al. Algorithm based smartphone apps to assess risk of skin cancer in adults: systematic review of diagnostic accuracy studies. BMJ. 2020 Feb 10;368:m127. DOI: 10.1136/bmj.m127. PMID 32041693.

Study design and population

Cochrane-style systematic review (PROSPERO CRD42016033595); 9 studies evaluating 6 commercial algorithm-based smartphone apps for adult skin-cancer risk triage; QUADAS-2 risk-of-bias assessment.

Reported metrics

SkinVision (3 studies; n = 267 lesions; 66 (pre)malignant): pooled sensitivity 80 % (95 % CI 63–92); specificity 78 % (95 % CI 67–87)
Revised SkinVision (pigmented-only dataset): sensitivity 88 % (95 % CI 70–98); specificity 79 % (95 % CI 70–86)
SkinScan: 0 % sensitivity for melanoma (0 of 5 melanomas detected; n = 5)
High risk of bias across primary studies; high unevaluable-image rates
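
How little a zero-detection result on five cases can establish is easier to see with an interval around it. The sketch below is illustrative only and is not taken from the review: it assumes that n = 5 in the SkinScan line is the number of melanomas, and it computes the exact (Clopper-Pearson) upper limit of a two-sided 95 % CI for sensitivity when none of n cases is detected. The function name is hypothetical.

```python
def zero_event_upper_bound(n: int, confidence: float = 0.95) -> float:
    """Exact (Clopper-Pearson) upper limit of a two-sided CI
    for a proportion when 0 of n cases are detected."""
    if n <= 0:
        raise ValueError("n must be a positive integer")
    alpha = 1.0 - confidence
    # With zero events the lower limit is 0 and the upper limit
    # has the closed form 1 - (alpha / 2) ** (1 / n).
    return 1.0 - (alpha / 2.0) ** (1.0 / n)


# Assumption: 0 of 5 melanomas were flagged by the app.
upper = zero_event_upper_bound(5)
print(f"0/5 detected: 95% CI for sensitivity = 0% to {upper:.1%}")
```

Under that reading, the interval runs from 0 % to roughly 52 %, so the estimate is too imprecise to say more than that the app missed every melanoma it was shown in this sample.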

Surrogate-to-outcome linkage

BALANCING reference. Quantifies the limits of current algorithm-based triage apps: heterogeneous performance, missed melanomas in some CE-marked products, and high unevaluable-image rates. Provides direct evidence that diagnostic-accuracy claims cannot be assumed to generalise across AI products; each device requires its own clinical-data package. Directly relevant to the MDR benefit–risk argument.

CRIT1–7 appraisal

Criterion	Score	Justification
CRIT1 Relevance	3	Direct — systematic review of AI / app-based skin-cancer diagnostic accuracy.
CRIT2 Methodology	3	PROSPERO-registered; PRISMA; QUADAS-2 risk-of-bias assessment.
CRIT3 Reporting	3	Pooled sens/spec with 95 % CIs; per-app stratification.
CRIT4 Applicability	3	Directly addresses the intended-use concerns (community deployment, CE-marked products).
CRIT5 Evidence weight	3	Systematic review — highest tier.
CRIT6 Risk of bias	2	Constituent primary studies heterogeneous; clinician-recruited rather than user-recruited; high unevaluable rates.
CRIT7 Contribution	3	MANDATORY balancing reference — establishes the device-specific clinical-data requirement that the EU MDR and MDCG 2020-1 Pillar 2/3 enforce.

Aggregate: very strong (as balancing reference).
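
For traceability, the scores in the table can be tallied as below. This is a minimal sketch: the record does not state the scoring scale or how a total maps to the "very strong" label, so the maximum of 3 per criterion is an assumption and no mapping is attempted.

```python
# Scores copied from the CRIT1-7 appraisal table above.
scores = {
    "CRIT1 Relevance": 3,
    "CRIT2 Methodology": 3,
    "CRIT3 Reporting": 3,
    "CRIT4 Applicability": 3,
    "CRIT5 Evidence weight": 3,
    "CRIT6 Risk of bias": 2,
    "CRIT7 Contribution": 3,
}

MAX_PER_CRITERION = 3  # assumption: 3 is the top of the scale

total = sum(scores.values())
maximum = MAX_PER_CRITERION * len(scores)
weakest = min(scores, key=scores.get)  # criterion with the lowest score

print(f"Total: {total}/{maximum}; lowest-scoring criterion: {weakest}")
```

The only criterion below the assumed maximum is CRIT6, which matches the risk-of-bias concerns recorded for the primary studies.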

Limitations and notes

Primary studies largely enriched-prevalence, clinician-recruited (not user-recruited); generalisation to consumer-deployment scenarios limited; not all CE-marked products evaluated.
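
The effect of enriched prevalence on real-world usefulness can be illustrated with Bayes' rule. The sketch below applies the pooled SkinVision sensitivity and specificity reported above to two prevalence values: the study prevalence implied by 66 (pre)malignant lesions out of 267, and a purely hypothetical 1 % prevalence standing in for a consumer setting. The 1 % figure and the function name are assumptions for illustration, not values from the review.

```python
def predictive_values(sensitivity: float, specificity: float, prevalence: float):
    """Return (PPV, NPV) for a test applied at the given prevalence."""
    tp = sensitivity * prevalence                # true-positive fraction
    fn = (1.0 - sensitivity) * prevalence        # false-negative fraction
    fp = (1.0 - specificity) * (1.0 - prevalence)  # false-positive fraction
    tn = specificity * (1.0 - prevalence)        # true-negative fraction
    return tp / (tp + fp), tn / (tn + fn)


SENS, SPEC = 0.80, 0.78  # pooled SkinVision estimates reported above

scenarios = {
    "study (66/267 lesions)": 66 / 267,
    "hypothetical consumer use (assumed 1 %)": 0.01,  # assumption
}

for label, prevalence in scenarios.items():
    ppv, npv = predictive_values(SENS, SPEC, prevalence)
    print(f"{label}: prevalence {prevalence:.1%}, PPV {ppv:.1%}, NPV {npv:.1%}")
```

With sensitivity and specificity held fixed, the positive predictive value falls sharply as prevalence drops, which is why accuracy figures from clinician-recruited, enriched samples do not transfer directly to user-recruited deployment.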

Strength as anchor

Mandatory balancing reference. BSI Erin has historically flagged selective citation as a risk; inclusion of Freeman 2020 demonstrates transparent acknowledgement that "AI dermatology" is not homogeneous — each device must be evaluated on its own performance data. Supports the justification for our own CIP/CIR evidence stack over reliance on surrogate literature.
