
R-TF-015-011 State of the Art Legit.Health Plus

Objectives and Scope​

Scope​

This state-of-the-art document is established in the framework of the clinical evaluation of the Legit.Health Plus medical device (hereinafter, "the device"). Therefore, it aims to specify the clinical background and current knowledge and to establish the state of the art for the current clinical practice and medical devices used in dermatology.

Objectives​

The device is a computational software-only medical device leveraging computer vision algorithms to process images of the epidermis, the dermis and its appendages, among other skin structures, enhancing the efficiency and accuracy of care delivery by providing:

  • an interpretative distribution representation of possible International Classification of Diseases (ICD) categories that might be represented in the pixel content of the image,
  • quantifiable data on the intensity, count and extent of clinical signs such as erythema, desquamation, and induration, among others.

Therefore, the following needs to be discussed:

  • The basics of clinical workflow in dermatology (medical care in primary care, referral to dermatology or monitoring in primary care, consultation in dermatology).
  • Use of AI-powered medical devices for diagnostic support in dermatological clinical practice.
  • Analysis of similar devices.
  • Expected use, safety, performance, and benefits of such software.

Applicable standards and guidelines​

The clinical evaluation of the device will be performed according to the relevant legal framework and following the applicable and established standards described in the following table.

| Identification of the Standard | Domain | Compliance information | Description of deviations | Evidence |
| --- | --- | --- | --- | --- |
| ISO 13485:2016 | Medical devices - Quality management systems. Requirements for regulatory purposes | Full application | | BSI Certification ISO 13485 |
| IEC 62304:2006/A1:2015 | Medical device software - Software life cycle processes | Full application | | R-TF-001-005 List of applicable standards and regulations |
| IEC 82304-1:2016 | Health software - Part 1: General requirements for product safety | Full application | | R-TF-001-005 List of applicable standards and regulations |
| ISO 14155:2020 | Clinical investigation of medical devices for human subjects - Good clinical practice | Full application | | R-TF-001-005 List of applicable standards and regulations |
| ISO 14971:2019 | Medical devices - Application of risk management to medical devices | Full application | | R-TF-001-005 List of applicable standards and regulations |
| ISO 15223-1:2021 | Medical devices - Symbols to be used with medical device labels, labelling and information to be supplied | Full application | | R-TF-001-005 List of applicable standards and regulations |
| ISO/TR 24971:2020 | Medical devices - Guidance on the application of ISO 14971 | Full application | | R-TF-001-005 List of applicable standards and regulations |
| IEC 62366-1:2015/A1:2020 | Medical devices - Part 1: Application of usability engineering to medical devices | Full application | | R-TF-001-005 List of applicable standards and regulations |
| IEC 81001-5-1:2021 | Health software and health IT systems safety, effectiveness and security - Part 5-1: Security - Activities in the product life cycle | Full application | | R-TF-001-005 List of applicable standards and regulations |
| ISO 27001:2022 | Information security, cybersecurity and privacy protection - Information security management systems - Requirements | Partial application | We comply only with the applicable part of the standard | R-TF-001-005 List of applicable standards and regulations |
| ISO 27002:2022 | Information security, cybersecurity and privacy protection - Information security controls | Partial application | We comply only with the applicable part of the standard | R-TF-001-005 List of applicable standards and regulations |
| FDA GMLP 2021 | Good machine learning practice for MD development: guiding principles | Full application | | R-TF-001-005 List of applicable standards and regulations |
| FDA AI/ML Framework 2019 | Proposed regulatory framework for modifications to AI/ML-based SaMD | Full application | | R-TF-001-005 List of applicable standards and regulations |

A literature search for guidelines will be performed in Google and PubMed using the search terms "ICD-11 disease of skin guideline", in order to find medical guidelines related to the ICD-11 classification of dermatological diseases.

Literature Search​

Literature Search Plan​

Literature Search Strategy​

The bibliographic search for the state of the art was performed according to the requirements of Regulation (EU) 2017/745 and following the MEDDEV 2.7/1 rev. 4 (June 2016) guidelines. The search for relevant publications started with the definition of criteria regarding the patient population, the clinical indication, the specificities of the product, and the measurable outcomes. These criteria were written in natural language, distinguishing inclusion from exclusion criteria. The objectives of the literature search are presented in the sections below.

All searches performed are described below. These include searches of literature databases and vigilance databases, and a review of the national registries available for the concerned medical field. The keywords used to query the databases were selected taking into account the criteria previously defined. This report provides, for each search, the queries formulated for each database, the number of matching records for each query, and the date on which the query was run.

Evaluator in charge of the searches​

The Evaluator who performed the searches on 15th July 2025 is:

  • Mr. Jordi Barrachina, PhD - Clinical Research Coordinator (Legit.Health) (CV available in Annexes).

Sources​

To characterize the current state of the art in the corresponding medical field, the following aspects and information will be reviewed:

  • Applicable standards and guidance documents.
  • Information relating to the current situation in the medical field in which the device is used.
  • Benchmark devices and other devices available on the market.

The CER shall contain a thorough state-of-the-art review to analyze and assess the benefit-risk profile of currently available methods for the various indications and for the device's intended purpose. An objective, comprehensive literature review will be performed to identify, select, and collect the relevant literature to determine whether the device offers safe and effective performance for its intended purpose. The review will focus on: data relevant to the device under evaluation; data on the current situation in standard clinical practice; data relevant to the intended purpose of similar devices; and claimed performance and safety data (including incidents and contraindications).

Identification of relevant medical conditions/medical fields concerned​

The device is intended to support healthcare providers in the assessment of skin structures, enhancing the efficiency and accuracy of care delivery, by providing: quantification of the intensity, count, and extent of visible clinical signs; and an interpretative distribution representation of possible International Classification of Diseases (ICD) classes.

Therefore, the medical conditions identified are all skin diseases listed and described in the ICD-11 (code 14).

Systematic Literature search for SOTA description​

Following section A5 of the MEDDEV 2.7/1 rev. 4 guide, the literature search will be conducted to complete the state of the art of the device, using the PICO methodology (Patient characteristics, type of Intervention, Control, and relevant Outcomes).

Data search question using PICO methodology​

As part of the literature search strategy, the PICO method was used to establish the algorithms subsequently. The PICO method is a format used for the development of appropriate clinical questions, consisting of answering the following questions to establish the search keywords:

  • P (Problem/Patient/Population): Who are the users, patients or affected population?
  • I (Intervention/indicator): What is the management strategy for the identified population?
  • C (Comparator): What is the alternative to the proposed intervention?
  • O (Outcomes): What are the relevant outcomes to be measured?

The choice of keywords for implementing the PICO methodology is based on the intended purpose and medical condition of the device. In this way, the selection of relevant articles from references identified in the databases is based on the research objective described in the table below.

P (Problem/Patient/Population)
  • Inclusion: Patients with visible skin structure abnormalities; skin diseases listed in ICD-11 code 14; across all age groups, skin types, and demographics. Users: Healthcare Professionals (HCPs) such as dermatologists, General Practitioners (GPs) and IT professionals.
  • Exclusion (wrong type of population):
    - Animals.
    - Studies focused on non-dermatological pathologies.

I (Intervention/indicator)
  • Inclusion: Use of a computational software-only medical device (SaMD) that processes images of skin structures to provide clinical data for aiding practitioners in skin assessments. Data related to standard clinical practice in dermatology and traditional diagnostic methods without technological assistance.
  • Exclusion: Interventions not related to the device's intended use or medical indication.

C (Comparator and type of studies)
  • Inclusion: Other smartphone applications: SkinVision, Molescope, Huvy and DERM. Traditional methods of clinical skin examination without software assistance, and non-software-based skin assessments by healthcare professionals (i.e., Standard of Care). Types of studies:
    - Meta-analyses
    - Literature reviews and systematic reviews
    - Case series and cohort studies
    - Clinical studies (randomised or not, multicentric or not, prospective or retrospective)
    - Clinical guidelines or guidelines elaborated by scientific societies
  • Exclusion (wrong comparator and studies):
    - Non-clinical comparators (e.g., comparison against another algorithm only).
    - Purely in silico or in vitro validation studies without clinical practice data.
    - Case reports that do not provide new information on risks or performance.
    - Non-peer-reviewed literature (e.g., opinion articles, blog posts).
    - Studies providing no clinical results (e.g., protocols).

O (Outcomes)
  • Inclusion: Improved efficiency and accuracy in clinical decision-making for skin disease assessment or malignancy detection; support in diagnosis through interpretative data and quantification. Optimisation of the clinical workflow through reduction of unnecessary referrals from primary care to dermatology; reduction of cumulative waiting time to see the dermatologist face-to-face. Safety data (e.g., incorrect performance, failure of interoperability, inputs without sufficient quality).
  • Exclusion (wrong objectives):
    - Non-clinical outcomes (e.g., technical algorithm testing).
    - Datasets not discussing the use, safety, performance, or benefits of the device.
    - Data focused only on drugs.
    - Topics too specific (i.e., datasets dealing with a particular subject and deemed irrelevant for the description of the state of the art).

Generation of keywords and algorithms for bibliographic search​

Based on the terms described using the PICO methodology, the following search terms or keywords have been defined.

P (Problem/Patient/Population)
  • Description: Patients with visible skin structure abnormalities; skin diseases listed in ICD-11 code 14; across all age groups, skin types, and demographics. Users: Healthcare Professionals (HCPs) such as dermatologists, Primary Care Practitioners (PCPs) and IT professionals.
  • Keywords/terms: "skin cancer", "epidermis", "chronic skin conditions", "skin conditions", "inflammatory skin diseases", "malignant skin lesions", "melanoma", "acne", "psoriasis", "dermatofibroma", "dermatosis".
  • Algorithm: ("skin cancer" OR "epidermis" OR "chronic skin conditions" OR "skin conditions" OR "inflammatory skin diseases" OR "malignant skin lesions" OR "melanoma" OR "acne" OR "psoriasis" OR "dermatofibroma" OR "dermatosis")

I (Intervention/indicator)
  • Description: Use of a computational software-only medical device (SaMD) that processes images of skin structures to provide clinical data for aiding practitioners in skin assessments. Data related to standard clinical practice in dermatology and traditional diagnostic methods without technological assistance.
  • Keywords/terms: "AI-powered dermatology tools", "computer vision in dermatology", "smartphone", "dermatology software", "skin image analysis", "dermatology diagnostic support", "digital dermatology tools".
  • Algorithm: ("AI-powered dermatology tools" OR "computer vision in dermatology" OR "smartphone" OR "dermatology software" OR "skin image analysis" OR "dermatology diagnostic support" OR "digital dermatology tools")

C (Comparator and type of studies)
  • Description: Other smartphone applications: SkinVision, Molescope, Huvy and DERM. Traditional methods of clinical skin examination without software assistance, and non-software-based skin assessments by healthcare professionals (i.e., Standard of Care). Types of studies: meta-analyses; literature reviews and systematic reviews; case series and cohort studies; clinical studies (randomised or not, multicentric or not, prospective or retrospective); clinical guidelines or guidelines elaborated by scientific societies.
  • Keywords/terms: "standard of care", "traditional dermatology assessment", "clinical skin examination", "dermatology guidelines", "clinical studies in dermatology", "SkinVision", "Huvy", "DERM", "artificial intelligence", "machine learning", "deep learning", "computer vision", "deep neural networks", "metaoptima", "clinical exam", "visual inspection", "manual assessment".
  • Algorithm: ("standard of care" OR "traditional dermatology assessment" OR "clinical skin examination" OR "dermatology guidelines" OR "clinical studies in dermatology" OR "SkinVision" OR "Huvy" OR "DERM" OR "artificial intelligence" OR "machine learning" OR "deep learning" OR "computer vision" OR "deep neural networks" OR "metaoptima" OR "clinical exam" OR "visual inspection" OR "manual assessment")

O (Outcomes)
  • Description: Improved efficiency and accuracy in clinical decision-making for skin disease assessment or malignancy detection; support in diagnosis through interpretative data and quantification. Optimisation of the clinical workflow through reduction of unnecessary referrals from primary care to dermatology; reduction of cumulative waiting time to see the dermatologist face-to-face. Safety data (e.g., incorrect performance, failure of interoperability, inputs without sufficient quality).
  • Keywords/terms: "diagnostic accuracy", "clinical decision support", "efficiency in dermatology", "referral reduction", "waiting time reduction", "safety of dermatology software", "performance of AI in dermatology".
  • Algorithm: ("diagnostic accuracy" OR "clinical decision support" OR "efficiency in dermatology" OR "referral reduction" OR "waiting time reduction" OR "safety of dermatology software" OR "performance of AI in dermatology")

By combining the four elements of the PICO method, the final search algorithm was obtained:

("skin cancer" OR "epidermis" OR "chronic skin conditions" OR "skin conditions" OR "inflammatory skin diseases" OR "malignant skin lesions" OR "melanoma" OR "acne" OR "psoriasis" OR "dermatofibroma" OR "dermatosis") AND ("AI-powered dermatology tools" OR "computer vision in dermatology" OR "smartphone" OR "dermatology software" OR "skin image analysis" OR "dermatology diagnostic support" OR "digital dermatology tools") AND ("standard of care" OR "traditional dermatology assessment" OR "clinical skin examination" OR "dermatology guidelines" OR "clinical studies in dermatology" OR "SkinVision" OR “Huvy” OR “DERM” OR "artificial intelligence" OR "machine learning" OR "deep learning" OR "computer vision" OR "deep neural networks" OR "metaoptima" OR "clinical exam", "visual inspection" OR "manual assessment") AND ("diagnostic accuracy" OR "clinical decision support" OR "efficiency in dermatology" OR "referral reduction" OR "waiting time reduction" OR "safety of dermatology software" OR "performance of AI in dermatology")

Bibliographic search strategy for determining the state of the art​

Guidelines and recommendations​

The following databases have been reviewed in order to find relevant guidelines or recommendations concerning the application of AI in dermatology or standard clinical practice:

  • MEDLINE PubMed: https://www.ncbi.nlm.nih.gov/pubmed/
  • U.S. Food and Drug Administration (FDA): https://www.fda.gov/regulatory-information/
  • Sociedad Española de Dermatología y Venereología: https://aedv.es/guias-para-pacientes-2/
  • European Academy of Dermatology and Venereology: https://eadv.org/publications/clinical-guidelines/
  • American Academy of Dermatology: https://www.aad.org/member/clinical-quality/guidelines

All the following searches have been conducted by Mr. Jordi Barrachina (Legit.Health) as described below and without deviation.

| Database | Keywords / terms | Filters / limitations | Records |
| --- | --- | --- | --- |
| MEDLINE PubMed | ("skin cancer" OR "epidermis" OR "chronic skin conditions" OR "skin conditions" OR "inflammatory skin diseases" OR "malignant skin lesions" OR "melanoma" OR "acne" OR "psoriasis" OR "dermatofibroma" OR "dermatosis") AND ("software" OR "digital imag*" OR "smartphone" OR "web application") AND ("artificial intelligence" OR "machine learning" OR "deep learning" OR "computer vision" OR "deep neural networks" OR "metaoptima") | Period of search: the last 10 years (2015/07/15 to 2025/07/15); Species: Humans; Language: English; Text availability: full text available; Article type: "Guidelines", "Practice Guidelines" | 0 |
| FDA | "dermatology guidelines", "AI in dermatology", "machine learning dermatology", "digital health dermatology" | Topic: Clinical-Medical | 0 |
| Sociedad Española de Dermatología y Venereología | No specific keywords used | No specific limitations used | 0 |
| European Academy of Dermatology and Venereology | No specific keywords used | No specific limitations used | 1 |
| American Academy of Dermatology | No specific keywords used | No specific limitations used | 2 |

In addition, guidelines can be added manually if they are deemed relevant and consistent with the research objectives presented in the previous sections. Such guidelines may result from systematic research carried out in the past, be identified within the selected articles, or simply be published by scientific societies.

Clinical Papers​

To perform the search, sources of information from scientific literature databases such as PubMed and Cochrane Library will be consulted, along with ClinicalTrials.gov.

PubMed: a free search engine for the MEDLINE database of references and abstracts on life sciences and biomedical topics, widely considered the most comprehensive and well-organized such database. The US National Library of Medicine (NLM) at the National Institutes of Health maintains the database as part of its information retrieval system. MEDLINE indexes about 5,200 journals published in the United States and in more than 70 other countries, from 1966 to the present. Each PubMed record is assigned a unique identifier, the PMID (PubMed Identifier).

The search filters to be applied in PubMed are as follows (an illustrative query sketch follows the list):

  • Text availability: "abstract", "full-text".
  • Species: humans
  • Publication date: 10 years (15-07-2015 to 15-07-2025)
  • Article Language: English
  • The full search algorithm is: ("skin cancer" OR "epidermis" OR "chronic skin conditions" OR "skin conditions" OR "inflammatory skin diseases" OR "malignant skin lesions" OR "melanoma" OR "acne" OR "psoriasis" OR "dermatofibroma" OR "dermatosis") AND ("AI-powered dermatology tools" OR "computer vision in dermatology" OR "smartphone" OR "dermatology software" OR "skin image analysis" OR "dermatology diagnostic support" OR "digital dermatology tools") AND ("standard of care" OR "traditional dermatology assessment" OR "clinical skin examination" OR "dermatology guidelines" OR "clinical studies in dermatology" OR "SkinVision" OR "Huvy" OR "DERM" OR "artificial intelligence" OR "machine learning" OR "deep learning" OR "computer vision" OR "deep neural networks" OR "metaoptima" OR "clinical exam" OR "visual inspection" OR "manual assessment") AND ("diagnostic accuracy" OR "clinical decision support" OR "efficiency in dermatology" OR "referral reduction" OR "waiting time reduction" OR "safety of dermatology software" OR "performance of AI in dermatology").
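
For reproducibility, a query of this kind can also be submitted programmatically through the NCBI E-utilities, for example via Biopython's Entrez module. The following is a minimal sketch, not the procedure actually used for this report; the e-mail address and the result cap are placeholders, the query is abbreviated, and the bracketed filter tags only approximate the PubMed sidebar filters.

```python
from Bio import Entrez  # Biopython

Entrez.email = "evaluator@example.com"  # placeholder; NCBI requires a contact address

# Abbreviated query; the full algorithm is given above.
query = (
    '("skin cancer" OR "melanoma" OR "psoriasis") '
    'AND ("smartphone" OR "dermatology software" OR "skin image analysis") '
    'AND ("artificial intelligence" OR "deep learning") '
    'AND ("diagnostic accuracy" OR "clinical decision support") '
    'AND english[lang] AND humans[mh]'  # language and species filters
)

handle = Entrez.esearch(
    db="pubmed",
    term=query,
    datetype="pdat",        # filter on publication date
    mindate="2015/07/15",
    maxdate="2025/07/15",
    retmax=500,             # placeholder cap on returned PMIDs
)
result = Entrez.read(handle)
handle.close()

print("Matching records:", result["Count"])
print("First PMIDs:", result["IdList"][:10])
```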

Similar devices​

The selection of similar devices for the purpose of this clinical evaluation is based on a rigorous assessment of equivalence in accordance with the requirements of Regulation (EU) 2017/745 (MDR) and the principles outlined in guidance document MDCG 2020-5.

A device is considered equivalent to our device only if sufficient similarity is demonstrated across the following three characteristics:

  • Technical: The device must have a similar design, underlying technology (e.g., AI algorithms), performance specifications, and deployment method.
  • Biological: As a Software as a Medical Device (SaMD) with no physical patient contact, this characteristic is confirmed by the absence of patient-contacting materials and is therefore not applicable.
  • Clinical: The device must be used for the same medical purpose and clinical condition, in a similar patient population, by a similar user profile, and demonstrate a comparable safety and clinical performance profile.

Only devices that meet these criteria for technical and clinical equivalence are considered 'similar devices', and their data is leveraged in this clinical evaluation. In this way, the following medical devices similar to our device have been identified.

| Device name | Manufacturer name | Targeted medical conditions | CE Marking |
| --- | --- | --- | --- |
| SkinVision | SkinVision B.V. | Skin cancer detection (melanoma, basal cell carcinoma, squamous cell carcinoma) | Yes |
| Molescope | MetaOptima Technology Inc. | Mole imaging, other skin conditions like acne, eczema, psoriasis | Yes (MDD) |
| MoleMapper | Oregon Health & Science University Apps | Melanoma detection | Not found |
| Huvy | Huvy SAS | Melanoma detection | Yes |
| DERM | Skin Analytics | Skin cancer detection (melanoma, basal cell carcinoma, squamous cell carcinoma) | Yes |
| Dermalyser | AI Medical Technology | Melanoma detection | Yes |
| FotoFinder | FotoFinder Systems GmbH | Skin cancer detection, other skin conditions | Yes |
| ModelDerm | Iderma Inc | Skin lesion recognition | No |

Results from initial queries​

All the following searches have been conducted by Mr. Jordi Barrachina (Legit.Health) on July 15, 2025, as described below and without deviation.

| # | Database | Data related to | Keywords/terms | Filters / limitations | Records |
| --- | --- | --- | --- | --- | --- |
| 01 | MEDLINE PubMed | Medical field | ("skin cancer" OR "epidermis" OR "chronic skin conditions" OR "skin conditions" OR "inflammatory skin diseases" OR "malignant skin lesions" OR "melanoma" OR "acne" OR "psoriasis" OR "dermatofibroma" OR "dermatosis") AND ("AI-powered dermatology tools" OR "computer vision in dermatology" OR "smartphone" OR "dermatology software" OR "skin image analysis" OR "dermatology diagnostic support" OR "digital dermatology tools") AND ("standard of care" OR "traditional dermatology assessment" OR "clinical skin examination" OR "dermatology guidelines" OR "clinical studies in dermatology" OR "SkinVision" OR "Huvy" OR "DERM" OR "Molescope" OR "Dermalyser" OR "FotoFinder" OR "MoleMapper" OR "artificial intelligence" OR "machine learning" OR "deep learning" OR "computer vision" OR "deep neural networks" OR "metaoptima" OR "clinical exam" OR "visual inspection" OR "manual assessment") AND ("diagnostic accuracy" OR "clinical decision support" OR "efficiency in dermatology" OR "referral reduction" OR "waiting time reduction" OR "safety of dermatology software" OR "performance of AI in dermatology") | Period of search: the last 10 years (2015/07/15 to 2025/07/15); Species: Humans; Language: English; Text availability: full text available; Article type: Reviews, Systematic reviews, Meta-analyses, Case series and cohort studies, Clinical studies (randomised or not, multicentric or not, prospective or retrospective) | 227 |
| 02 | Cochrane Library | Medical field | Same as above | No filter available (no results) | 0 |

Exclusion criteria

In addition to the exclusion criteria mentioned in the section Generation of keywords and algorithms for bibliographic search, the following criteria linked to the limitations of the search have been used when needed: "wrong language" (publications not available in English) and "not available data". If the search retains a large number of publications on the same subject, results published more than 5 years ago may be excluded (reason for exclusion: "repetitive publications").

Duplicates will be identified using the unique references of the article (PMID, Cochrane IDU, DOI, and ClinicalTrials.gov Identifier). For publications that have no unique identifier, duplicates will be identified mainly using the title, the authors, and the source of the document. Articles can also be added manually if they are deemed relevant and consistent with the research objectives as presented in section 2.2.2. These publications can be the result of systematic research carried out in the past or simply identified within the selected articles.
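
As an illustration of this deduplication rule (prefer unique identifiers; fall back to title, authors, and source), a minimal sketch follows. The record structure and function names are hypothetical, not part of the documented procedure.

```python
# Hypothetical deduplication sketch following the rule described above.

def dedup_key(record: dict) -> tuple:
    """Prefer a unique identifier; fall back to title + authors + source."""
    for id_field in ("pmid", "doi", "cochrane_id", "nct_id"):
        if record.get(id_field):
            return (id_field, record[id_field])
    return (
        "meta",
        record.get("title", "").strip().lower(),
        tuple(a.lower() for a in record.get("authors", [])),
        record.get("source", "").lower(),
    )

def deduplicate(records: list[dict]) -> list[dict]:
    seen, unique = set(), []
    for rec in records:
        key = dedup_key(rec)
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique
```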

Vigilance databases​

The following vigilance databases have been identified and searched for reports concerning the similar devices:

  • MAUDE FDA (USA): https://www.accessdata.fda.gov/scripts/cdrh/cfdocs/cfmaude/search.cfm
  • Medical Device Recalls FDA (USA): https://www.accessdata.fda.gov/scripts/cdrh/cfdocs/cfRES/res.cfm
  • EUDAMED (Europe): https://ec.europa.eu/tools/eudamed/#/screen/search-device

All the following searches have been conducted by Mr. BARRACHINA Jordi (Legit.Health) as described below and without deviation on July 15, 2025.

| ID | Database | Keywords/terms | Filters / limitations | Records |
| --- | --- | --- | --- | --- |
| 01 | MAUDE | "Manufacturer: SkinVision" | Period of search: last 10 years (2015/07/15 to 2025/07/15) | 0 |
| 02 | MAUDE | "Manufacturer: MetaOptima Technology Inc." | Period of search: last 10 years (2015/07/15 to 2025/07/15) | 0 |
| 03 | MAUDE | "Oregon Health & Science University Apps" | Period of search: last 10 years (2015/07/15 to 2025/07/15) | 0 |
| 04 | MAUDE | "Manufacturer: Huvy SAS" | Period of search: last 10 years (2015/07/15 to 2025/07/15) | 0 |
| 05 | MAUDE | "Manufacturer: Skin Analytics" | Period of search: last 10 years (2015/07/15 to 2025/07/15) | 0 |
| 06 | MAUDE | "Manufacturer: AI Medical Technology" | Period of search: last 10 years (2015/07/15 to 2025/07/15) | 0 |
| 07 | MAUDE | "Manufacturer: FotoFinder Systems GmbH" | Period of search: last 10 years (2015/07/15 to 2025/07/15) | 0 |
| 08 | Medical Device Recalls | "Product name: SkinVision" | Period of search: last 10 years (2015/07/15 to 2025/07/15) | 0 |
| 09 | Medical Device Recalls | "Product name: MoleScope" | Period of search: last 10 years (2015/07/15 to 2025/07/15) | 0 |
| 10 | Medical Device Recalls | "Product name: MoleMapper" | Period of search: last 10 years (2015/07/15 to 2025/07/15) | 0 |
| 11 | Medical Device Recalls | "Product name: Huvy" | Period of search: last 10 years (2015/07/15 to 2025/07/15) | 0 |
| 12 | Medical Device Recalls | "Product name: DERM" | Period of search: last 10 years (2015/07/15 to 2025/07/15) | 0 |
| 13 | Medical Device Recalls | "Product name: Dermalyser" | Period of search: last 10 years (2015/07/15 to 2025/07/15) | 0 |
| 14 | Medical Device Recalls | "Product name: FotoFinder" | Period of search: last 10 years (2015/07/15 to 2025/07/15) | 0 |
| 15 | EUDAMED | "Product name: SkinVision" | Period of search: no limitation | 2 |
| 16 | EUDAMED | "Product name: MoleScope" | Period of search: no limitation | 0 |
| 17 | EUDAMED | "Product name: MoleMapper" | Period of search: no limitation | 0 |
| 18 | EUDAMED | "Product name: Huvy" | Period of search: no limitation | 0 |
| 19 | EUDAMED | "Product name: DERM" | Period of search: no limitation | 0 |
| 20 | EUDAMED | "Product name: Dermalyser" | Period of search: no limitation | 0 |
| 21 | EUDAMED | "Product name: FotoFinder" | Period of search: no limitation | 0 |

Exclusion criteria

In addition to the exclusion criteria mentioned in previous sections, the following criteria have been used: "duplicate" (the same event reported for two devices) and "No info" (data with no clear or exploitable information).

Duplicates will be identified using the unique references of the vigilance report.

Registries

Identification of registries

To our knowledge, there is no registry database available for this field; therefore, a search was performed on the Google search engine.

Search description​

The query was: "dermatology" AND "skin conditions" AND "AI medical devices" AND "dermatology diagnostic support" AND ("registry" OR "registries" OR "register" OR "registers").

Inclusion/exclusion criteria​

In addition to the exclusion criteria mentioned in previous sections, the following criteria have been used: “wrong language” (publications not available in English) and “not available data” (registry with no available report).

Applicable standards​

The manufacturer already identified the applicable standards for the device under evaluation. No additional search has been conducted.

Selection of references for the review of the state of the art​

Methodology used for selection​

The selection of publications is carried out in four steps: a first selection based on the title of the article, a second based on the abstract, a third based on the materials and methods, and a fourth based on the results of the article. At each selection step, articles are retained or excluded based on the inclusion and exclusion criteria presented in the PICO table above and the possible additional exclusion criteria presented in the search description. A schematic sketch of this funnel follows.
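
Purely as an illustration of the four-step funnel just described (not part of the documented procedure), the screening logic can be sketched as follows; the predicate functions and record structure are hypothetical.

```python
# Hypothetical sketch of the four-step screening funnel described above.
# Each predicate in `screens` applies the inclusion/exclusion criteria to one
# part of the article and returns True to retain it.

from typing import Callable

SCREENING_STEPS = ("title", "abstract", "materials_and_methods", "results")

def select(articles: list[dict], screens: dict[str, Callable[[dict], bool]]) -> list[dict]:
    retained = list(articles)
    for step in SCREENING_STEPS:
        retained = [a for a in retained if screens[step](a)]
        print(f"After {step} screening: {len(retained)} articles retained")
    return retained
```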

Results of the selection​

The results of all searches are summarized in the diagram below.

Appraisal of clinical data for the review of the state-of-the-art​

Appraisal plan​

| ID | Criterion | Description | Grading system | Grading criteria | Score |
| --- | --- | --- | --- | --- | --- |
| CRIT1 | Study focus | Do the data relate to a relevant clinical alternative? | Direct relevance | Data on a similar device (e.g., devices tagged as similar) OR on standard clinical practice (e.g., accuracy of HCPs, visual inspection in Primary Care). | 2 |
| CRIT1 | Study focus | Do the data relate to a relevant clinical alternative? | Contextual relevance | Contextual data (e.g., disease epidemiology, general clinical guidelines) but not on the performance of a specific alternative, OR clinical data including a similar device but which are not specific. | 1 |
| CRIT1 | Study focus | Do the data relate to a relevant clinical alternative? | No relevance | Data not related to any clinical alternative in dermatology. | 0 |
| CRIT2 | Clinical setting or intended use | Does the study's setting and intended use match the device under evaluation? | Full match | Data focused on devices designed to support healthcare practitioners in the assessment of skin structures, OR same setting (e.g., Primary Care and/or dermatology clinic). | 2 |
| CRIT2 | Clinical setting or intended use | Does the study's setting and intended use match the device under evaluation? | Partial match | Data focused on devices with an intended use not claimed by the manufacturer but compliant with the intended use of the device group, OR same setting but for a different intended use (e.g., melanoma detection only). | 1 |
| CRIT2 | Clinical setting or intended use | Does the study's setting and intended use match the device under evaluation? | No match | Data focused on devices with an intended use not related to the device under evaluation, OR different clinical setting (e.g., specialities other than dermatology). | 0 |
| CRIT3 | Population of patients | Is the study population representative? | Applicable | Target population as per the device's intended use (e.g., patients attending a dermatological consultation across all age groups, skin types, and demographics). | 2 |
| CRIT3 | Population of patients | Is the study population representative? | Partially applicable | Specific sub-population of the target population (e.g., only high-risk patients, only a specific skin phototype, only one pathology). | 1 |
| CRIT3 | Population of patients | Is the study population representative? | Not applicable | Population not related to the target population (e.g., healthy volunteers), or non-relevant or contraindicated population. | 0 |
| CRIT4 | Type of dataset | Appropriate study design/type of document and sufficient data | Yes | Studies with a level of evidence greater than or equal to 4 (as per the Level of Evidence scale). | 1 |
| CRIT4 | Type of dataset | Appropriate study design/type of document and sufficient data | No | Studies with a level of evidence lower than 4 (e.g., expert opinions, small case series), OR insufficient data to extract relevant clinical performance or safety information. | 0 |
| CRIT5 | Outcome measurement (performance/safety) | Does the study measure objective outcomes related to performance (e.g., diagnostic accuracy) and/or safety (e.g., false negative rate)? | Yes | Provides quantitative performance data (e.g., sensitivity, specificity, PPV) and/or safety data (e.g., rate of unnecessary biopsies, false negatives). | 1 |
| CRIT5 | Outcome measurement (performance/safety) | Does the study measure objective outcomes related to performance (e.g., diagnostic accuracy) and/or safety (e.g., false negative rate)? | No | Does not provide performance or safety data (e.g., descriptive only). | 0 |
| CRIT6 | Clinical significance | Does the study evaluate whether the performance results in a tangible clinical benefit (e.g., reduction in unnecessary biopsies, improved early detection)? | Yes | Provides clinical benefit data (e.g., impact on referral pathways, reduction of benign biopsies) or workflow benefits. | 1 |
| CRIT6 | Clinical significance | Does the study evaluate whether the performance results in a tangible clinical benefit (e.g., impact on patient management, health outcomes)? | No | Does not provide clinical benefit data (reports pure performance metrics only, or descriptive). | 0 |
| CRIT7 | Statistical analysis | Is there a statistical analysis? | Yes | Statistical comparisons are made (e.g., between groups, p-values, confidence intervals). | 1 |
| CRIT7 | Statistical analysis | Is there a statistical analysis? | No | No statistical comparison (descriptive data only). | 0 |

All included datasets are appraised for their methodological quality and scientific validity (scored from 0 to 4) and their clinical relevance (scored from 0 to 6). The weight of each dataset is the score calculated from the sum of the two (from 0 to 10). If the score of a dataset is < 4, a justification for the use of the dataset is included.
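
For illustration, the scoring arithmetic just described can be written down compactly. This is a hypothetical sketch (the dataclass and field names are ours, not part of the appraisal plan); the example values are taken from the Chen et al. 2024 row of the appraisal table below.

```python
# Illustrative computation of the appraisal weights described above.
# Scores follow the CRIT1-CRIT7 scales in the appraisal plan.

from dataclasses import dataclass

@dataclass
class Appraisal:
    crit1: float  # study focus, 0-2
    crit2: float  # clinical setting / intended use, 0-2
    crit3: float  # population of patients, 0-2
    crit4: float  # type of dataset, 0-1
    crit5: float  # outcome measurement, 0-1
    crit6: float  # clinical significance, 0-1
    crit7: float  # statistical analysis, 0-1

    @property
    def relevance(self) -> float:  # clinical relevance, 0-6
        return self.crit1 + self.crit2 + self.crit3

    @property
    def quality(self) -> float:  # methodological quality, 0-4
        return self.crit4 + self.crit5 + self.crit6 + self.crit7

    @property
    def weight(self) -> float:  # overall weight, 0-10
        return self.relevance + self.quality

    @property
    def needs_justification(self) -> bool:  # weight < 4 requires justification
        return self.weight < 4

# Example: Chen et al. 2024 scores from the appraisal table below.
chen_2024 = Appraisal(2, 2, 2, 1, 1, 1, 1)
assert chen_2024.weight == 10 and not chen_2024.needs_justification
```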

Level of evidence​

Besides evaluation and weighting, the level of clinical evidence of all included datasets is assessed using the criteria set out in the following table:

| Level of evidence | Type of dataset | Score |
| --- | --- | --- |
| Critical appraisal | Meta-analysis | 10 |
| Critical appraisal | Systematic reviews | 9 |
| Critical appraisal | Critically Appraised Literature / Evidence-Based Practice Guidelines | 8 |
| Experimental studies | Randomized controlled / comparative studies | 7 |
| Experimental studies | Non-randomized controlled / comparative studies | 6 |
| Observational studies | Prospective non-comparative studies | 5 |
| Observational studies | Retrospective non-comparative studies / Case series | 4 |
| Observational studies | Individual case reports | 3 |
| Observational studies | Expert opinion / Bench research / non-EBM guidelines | 2 |
| Other | Other | 1 |
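
The same scale can be captured as a simple lookup; the sketch below is purely illustrative, and the names are ours rather than part of the methodology.

```python
# Illustrative lookup for the level-of-evidence scale above.
LEVEL_OF_EVIDENCE = {
    "meta-analysis": 10,
    "systematic review": 9,
    "critically appraised literature / evidence-based practice guideline": 8,
    "randomized controlled / comparative study": 7,
    "non-randomized controlled / comparative study": 6,
    "prospective non-comparative study": 5,
    "retrospective non-comparative study / case series": 4,
    "individual case report": 3,
    "expert opinion / bench research / non-EBM guideline": 2,
    "other": 1,
}

def evidence_score(dataset_type: str) -> int:
    """Return the level-of-evidence score, defaulting to 'other'."""
    return LEVEL_OF_EVIDENCE.get(dataset_type.lower(), 1)
```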

Results of data appraisal​

The datasets identified in section Selection of references for the review of the state-of-the-art were evaluated and weighted according to the appraisal criteria detailed in the previous section. The results of this data appraisal, including the assessed level of evidence, are presented in the following table.

It should be noted that additional articles and scientific guidelines have been included to contextualize the state-of-the-art presentation for the medical field (i.e., the State of the Art presentation sections). These were incorporated either because no comparable scientific publications were found using our search algorithm, or to allow for performance comparison of the device and to complement the state of the art.

According to the established Criteria (defined in the previous section), all selected articles obtained a score equal to or greater than 4 and were therefore included in the clinical evaluation.

  • The mean relevance score was 4.40/6.
  • The mean quality score was 2.47/4.
  • The mean weight was 6.88/10.
  • The mean level of clinical evidence was 6.3/10.

GRADE-like certainty assessment​

Overall certainty (GRADE-like): Moderate.

Rationale: the body of evidence shows reasonable applicability and overall weight (mean weight 6.88/10 and mean relevance 4.40/6), but there are consistent methodological limitations and some indirectness. In short:

  • Risk of bias: average methodological quality was moderate (mean quality 2.47/4), with several observational/reader studies and variable blinding, which supports a concern for risk of bias (downgrade one level).
  • Inconsistency: results are directionally consistent (AI generally matches or improves clinician performance) but effect sizes and settings vary; no additional downgrade applied.
  • Indirectness: some datasets use similar devices or differ from the exact intended use (partial indirectness noted), which contributes to moderate certainty.
  • Imprecision: smaller studies have wide confidence intervals but larger trials and systematic reviews are available; net effect is not a further downgrade.
  • Publication bias: no clear signal identified, but cannot be excluded.

Net judgement: after considering the domains above, the evidence is best graded as Moderate. This judgement is linked to the aggregated appraisal metrics reported above and should be revisited if new high-quality randomized or registry data become available.

The detailed results of the data appraisal are presented in the table below.

Manuscript Appraisal Scores​

| Manuscript/Study | CRIT1 | CRIT2 | CRIT3 | Relevance (Total /6) | CRIT4 | CRIT5 | CRIT6 | CRIT7 | Quality (Total /4) | Weight (Total /10) | Level of clinical evidence (Score) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Ahadi et al. 2021 | 2 | 0 | 1 | 3 | 0.5 | 0 | 1 | 1 | 2.5 | 5.5 | 4 |
| Ba et al. 2022 | 1 | 1 | 1 | 3 | 0.5 | 0 | 1 | 1 | 2.5 | 5.5 | 6 |
| Baker et al. 2022 | 1 | 2 | 1 | 4 | 0.5 | 0.5 | 0 | 0.5 | 1.5 | 5.5 | 5 |
| Barata et al. 2023 | 1 | 1 | 2 | 4 | 1 | 0.5 | 1 | 1 | 3.5 | 7.5 | 7 |
| Brinker et al. 2019 (1) | 1 | 1 | 1 | 3 | 0.5 | 0 | 1 | 0.5 | 2 | 5 | 6 |
| Brinker et al. 2019 (2) | 1 | 1 | 1 | 3 | 1 | 0 | 1 | 1 | 3 | 6 | 7 |
| Brinker et al. 2019 (3) | 2 | 2 | 1 | 5 | 1 | 1 | 1 | 1 | 4 | 9 | 7 |
| Burton et al. 1998 | 2 | 2 | 1 | 5 | 0.5 | 1 | 1 | 1 | 3.5 | 8.5 | 6 |
| Chen et al. 2024 | 2 | 2 | 2 | 6 | 1 | 1 | 1 | 1 | 4 | 10 | 9 |
| Cho et al. 2019 | 0 | 1 | 1 | 2 | 0.5 | 0 | 1 | 0.5 | 2 | 4 | 6 |
| Eminovic et al. 2009 | 2 | 2 | 2 | 6 | 1 | 1 | 0 | 1 | 3 | 9 | 7 |
| Escalé-Besa et al. 2023 | 2 | 2 | 1 | 5 | 0.5 | 0.5 | 1 | 0.5 | 2.5 | 7.5 | 5 |
| Ferris et al. 2025 | 2 | 2 | 2 | 6 | 1 | 0 | 1 | 1 | 3 | 9 | 7 |
| Gerbert et al. 1996 | 2 | 0 | 1 | 3 | 1 | 0 | 1 | 1 | 3 | 6 | 3 |
| Giavina-Bianchi et al. 2020 | 1 | 2 | 1 | 4 | 0.5 | 0 | 0 | 0.5 | 1 | 5 | 5 |
| Giavina-Bianchi et al. 2020 | 2 | 2 | 1 | 5 | 0.5 | 0 | 0 | 1 | 1.5 | 6.5 | 5 |
| Goldfarb et al. 2021 | 2 | 2 | 2 | 6 | 1 | 0 | 1 | 0.5 | 3.5 | 9.5 | 7 |
| Gregoor et al. 2023 (Clinical medicine) | 2 | 2 | 1 | 5 | 0.5 | 1 | 1 | 0.5 | 3 | 8 | 5 |
| Gregoor et al. 2023 (NPJ) | 2 | 2 | 2 | 6 | 1 | 1 | 0.5 | 1 | 3.5 | 9.5 | 8 |
| Haenssle et al. 2018 | 1 | 1 | 1 | 3 | 1 | 0 | 1 | 1 | 3 | 6 | 7 |
| Han et al. 2018 | 1 | 0 | 1 | 2 | 0.5 | 0 | 1 | 0.5 | 2 | 4 | 4 |
| Han et al. 2020 | 2 | 0 | 1 | 3 | 0.5 | 0 | 1 | 1 | 2.5 | 5.5 | 4 |
| Han et al. 2020 | 1 | 0 | 1 | 2 | 0.5 | 0 | 1 | 1 | 2.5 | 4.5 | 4 |
| Han et al. 2020 (Plos Medicine) | 1 | 0 | 1 | 2 | 0.5 | 0 | 1 | 1 | 2.5 | 4.5 | 4 |
| Han et al. 2022 | 2 | 2 | 2 | 6 | 1 | 1 | 1 | 1 | 4 | 10 | 7 |
| Han et al. 2022 | 2 | 1 | 1 | 4 | 0.5 | 0 | 1 | 1 | 2.5 | 6.5 | 4 |
| Hsiao et al. 2008 | 2 | 2 | 2 | 6 | 0.5 | 1 | 0 | 0.5 | 2 | 8 | 6 |
| Jahn et al. 2022 | 1 | 1 | 1 | 3 | 0.5 | 0.5 | 1 | 0.5 | 2.5 | 5.5 | 5 |
| Jain et al. 2021 | 2 | 1 | 2 | 5 | 1 | 0 | 1 | 1 | 3 | 8 | 7 |
| Kheterpal et al. 2023 | 0 | 2 | 0 | 2 | 0.5 | 0.5 | 0 | 0.5 | 1.5 | 3.5 | 2 |
| Kim et al. 2022 | 2 | 2 | 2 | 6 | 0.5 | 1 | 1 | 1 | 3.5 | 9.5 | 7 |
| Knol et al. 2006 | 2 | 2 | 1 | 5 | 0.5 | 1 | 0 | 1 | 2.5 | 7.5 | 5 |
| Krakowski et al. 2024 | 2 | 2 | 2 | 6 | 1 | 1 | 1 | 1 | 4 | 10 | 9 |
| Lee et al. 2020 | 1 | 1 | 1 | 3 | 0.5 | 0 | 1 | 0.5 | 2 | 5 | 6 |
| Liu et al. 2020 | 2 | 1 | 2 | 5 | 0.5 | 0 | 1 | 1 | 2.5 | 7.5 | 4 |
| Maier et al. 2014 | 1 | 1 | 0 | 2 | 1 | 1 | 1 | 0 | 3 | 5 | 5 |
| Marchetti et al. 2019 | 1 | 1 | 1 | 3 | 1 | 0 | 1 | 0.5 | 2.5 | 5.5 | 7 |
| Maron et al. 2019 | 1 | 1 | 1 | 3 | 0.5 | 0 | 1 | 0.5 | 2 | 5 | 6 |
| Maron et al. 2020 | 1 | 1 | 1 | 3 | 1 | 0 | 1 | 0.5 | 2.5 | 5.5 | 7 |
| Marsden et al. 2024 | 1 | 2 | 2 | 5 | 1 | 1 | 1 | 1 | 4 | 9 | 7 |
| Morton et al. 2010 | 2 | 2 | 0 | 4 | 0.5 | 0 | 0 | 0 | 0.5 | 4.5 | 5 |
| Muñoz-López et al. 2020 | 2 | 2 | 1 | 5 | 0.5 | 0 | 1 | 0.5 | 2 | 7 | 5 |
| Navarrete-Dechent et al. 2018 | 2 | 0 | 1 | 3 | 0.5 | 0 | 1 | 0.5 | 2 | 5 | 4 |
| Navarrete-Dechent et al. 2020 | 2 | 0 | 1 | 3 | 0.5 | 0 | 1 | 0.5 | 2 | 5 | 4 |
| Orekoya et al. 2021 | 2 | 2 | 0 | 4 | 0.5 | 0 | 0 | 0.5 | 1 | 5 | 5 |
| Papachristou et al. 2024 | 2 | 2 | 2 | 6 | 0.5 | 0 | 1 | 1 | 2.5 | 8.5 | 5 |
| Phillips et al. 2019 | 1 | 2 | 1 | 4 | 0.5 | 0 | 1 | 1 | 2.5 | 6.5 | 4 |
| Sangers et al. 2022 | 2 | 2 | 2 | 6 | 1 | 0 | 1 | 1 | 3 | 9 | 5 |
| Thomas et al. 2023 | 0 | 2 | 0 | 2 | 0.5 | 1 | 0 | 0.5 | 2 | 4 | 5 |
| Thissen et al. 2017 | 2 | 2 | 2 | 6 | 0.5 | 0 | 1 | 1 | 2.5 | 8.5 | 4 |
| Thorlacius et al. 2019 | 2 | 2 | 2 | 6 | 1 | 0 | 1 | 1 | 4.0 | 10.0 | 7 |
| Tschandl et al. 2019 | 1 | 1 | 1 | 3 | 1 | 0 | 1 | 1 | 3 | 6 | 7 |
| Tschandl et al. 2020 | 1 | 1 | 1 | 3 | 1 | 0 | 1 | 1 | 3 | 6 | 7 |
| Udrea et al. 2019 | 2 | 2 | 2 | 6 | 0.5 | 0 | 1 | 1 | 2.5 | 8.5 | 4 |
| Whited et al. 2015 | 2 | 2 | 0 | 4 | 0 | 0 | 0 | 0 | 0 | 4 | 2 |
| Zanchetta et al. 2025 | 2 | 1 | 2 | 5 | 0.5 | 0 | 1 | 1 | 2.5 | 7.5 | 4 |

In this state-of-the-art review, 57 articles were included and appraised based on the criteria outlined above. To provide a comprehensive overview of current clinical practices and technologies in dermatology, additional relevant articles and guidelines were incorporated, bringing the total to 68 records analyzed.

Specifically, two clinical manuscripts were added to establish a baseline for the sensitivity and specificity of PCPs in detecting necessary referrals. Additionally, scientific guidelines for interpreting performance metrics related to the severity of female androgenetic alopecia were included to complement the state-of-the-art. Two more guidelines were added to provide a wider perspective on the evidence regarding expert consensus. Finally, three governmental reports were included to document the current situation of waiting times in Spain and other European countries, which provides a benchmark for comparison with the device's performance.

Results of the literature search​

Summary of articles retained from the state-of-the-art review in standard clinical practice​

Due to the complexity of the device under evaluation and its multiple performance claims, the results of the state-of-the-art review are presented in several sections according to the different clinical applications of the device. Each section includes a structured summary of the articles retained from the state-of-the-art review that are relevant to that specific clinical application, covering key information such as study design, population, outcomes measured, and main findings.

Clinical data collected on malignancy detection​

In this section, we present the clinical data collected on the performance of healthcare practitioners (HCPs, which include dermatologists and primary care practitioners) in malignancy detection, and specifically in melanoma detection. The entries below summarize the key studies included in this section, highlighting their design, population, outcomes, and main conclusions. In this way, the state-of-the-art analysis prioritizes current clinical practice as the primary performance baseline to be improved, while the performance of other commercial devices is considered a secondary benchmark to establish competitiveness.

Maron et al. 2019 (PMID: 31419752)
  • Study type: Comparative Study / Reader Survey. Weighting from appraisal: 5.
  • Baseline population: 112 dermatologists recruited from 13 university hospitals.
  • Standard clinical practice or device(s): Dermatologists assessing standard clinical images of lesions suspected of malignancy; Convolutional Neural Networks (CNN).
  • Objective(s): To compare the diagnostic accuracy and performance of a CNN against 112 dermatologists in multiclass skin cancer image classification. The primary end-point was the correct classification of the different lesions into benign and malignant (malignancy detection). The secondary end-point was the correct classification of the images into one of five diagnostic categories (among them melanoma).
  • Safety outcomes: None reported.
  • Performance outcomes: Sensitivity and specificity of dermatologists for the primary end-point (malignancy detection) were 74.4% (95% confidence interval [CI]: 67.0-81.8%) and 59.8% (95% CI: 49.8-69.8%), respectively. At equal sensitivity, the algorithm achieved a specificity of 91.3% (95% CI: 85.5-97.1%). For the secondary end-point, specifically melanoma detection, the sensitivity was 63.5% (95% CI: 50.4-76.5%) and the specificity 80.2% (95% CI: 72.5-86.5%). At equal sensitivity, the algorithm achieved a specificity of 98.8%.
  • Main conclusion: The automated binary classification can be extended to a multiclass classification problem, which better reflects clinical differential diagnoses, while still outperforming dermatologists at a significant level (p≤0.001).

Haenssle et al. 2018 (PMID: 29846502)
  • Study type: Reader Study / Deep Learning CNN Comparison. Weighting from appraisal: 6.
  • Baseline population: 58 dermatologists participated in a reader study.
  • Standard clinical practice or device(s): Dermatologists assessing standard clinical images of lesions suspected of malignancy; deep learning Convolutional Neural Network (CNN).
  • Objective(s): To compare the diagnostic performance of a deep learning CNN for dermoscopic melanoma recognition against 58 dermatologists.
  • Safety outcomes: None reported.
  • Performance outcomes: Sensitivity and specificity of dermatologists for melanoma detection were 86.6% (95% CI: 77.3-95.9%) and 71.3% (95% CI: 60.1-82.85%). The AUC for dermatologists was 0.79 (95% CI: 0.73-0.85). The CNN showed an AUC of 0.86, with a sensitivity of 86.6% and a specificity of 82.5%.
  • Main conclusion: The deep learning CNN performs favorably compared to participating dermatologists in dermoscopic melanoma recognition.

Barata et al. 2023 (PMID: 37955139)
  • Study type: Brief Communication / Reader Study. Weighting from appraisal: 7.5.
  • Baseline population: Reader study: 89 dermatologists. Test set: 1,511 images (7 disease categories).
  • Standard clinical practice or device(s): Dermatologists (human readers); Supervised Learning (SL) AI model; Reinforcement Learning (RL) AI model (using expert-generated rewards/penalties).
  • Objective(s): To investigate if human preferences, applied via a Reinforcement Learning (RL) model, could improve AI-based decision support for skin cancer diagnosis compared to a standard Supervised Learning (SL) model. To test the utility of the RL model in a human-in-the-loop scenario.
  • Safety outcomes: None reported.
  • Performance outcomes:
    - AI (standalone): The RL model improved melanoma sensitivity to 79.5% (from 61.4% for SL) and BCC sensitivity to 87.1% (from 79.4% for SL).
    - Human-in-the-loop: AI support with the RL model increased the rate of correct diagnoses by dermatologists by 12.0% (from 68.0% to 79.9%) and improved the rate of optimal management decisions from 57.4% to 65.3%.
    - Dermatologists alone: The dermatologists showed a sensitivity of 61.4% (95% CI: 56.3-68.6%).
  • Main conclusion: Incorporating human preferences via reinforcement learning (RL) significantly improved the AI's sensitivity for melanoma and BCC (vs. SL) and improved dermatologists' diagnostic accuracy and management decisions when used as a decision support tool.

Chen et al. 2024 (PMID: 39535860)
  • Study type: Systematic Review & Meta-Analysis. Weighting from appraisal: 10.
  • Baseline population: 100 studies included, analyzing experienced dermatologists, inexperienced dermatologists, and primary care physicians (PCPs).
  • Standard clinical practice or device(s): Standard clinical practice: 1. clinical examination / clinical images (unmagnified); 2. dermoscopy / dermoscopic images (magnified).
  • Objective(s): To assess and quantify the diagnostic accuracy of skin cancer diagnosis, stratified by lesion type (keratinocytic vs. melanocytic), physician specialty/experience, and examination method.
  • Safety outcomes: Not applicable (systematic review).
  • Performance outcomes:
    - Melanoma (clinical exam/images): Experienced dermatologists: Sens 76.9% (95% CI: 69.3-83.1%), Spec 89.1% (95% CI: 76.9-95.3%). Inexperienced dermatologists: Sens 78.3% (95% CI: 54.9-91.4%), Spec 66.2% (95% CI: 55.9-75.1%). PCPs: Sens 37.5% (95% CI: 21.1-56.3%), Spec 84.6% (95% CI: 80.0-88.5%).
    - Melanoma (dermoscopy/images): Experienced dermatologists: Sens 85.7% (95% CI: 82.5-88.3%), Spec 81.3% (95% CI: 76.3-85.4%). Inexperienced dermatologists: Sens 78.0% (95% CI: 69.3-84.7%), Spec 69.5% (95% CI: 52.9-82.2%). PCPs: Sens 49.5% (95% CI: 40.4-58.6%), Spec 91.3% (95% CI: 78.0-96.9%).
    - Globally: Sensitivity 83.6% (95% CI: 73.2-93.1%), Specificity 82.3% (95% CI: 74.3-90.0%), and AUC of 74% (95% CI: 72-77%).
  • Main conclusion: Diagnostic accuracy varies significantly by physician specialty, experience, and method. Dermoscopy substantially improved diagnostic accuracy for melanoma (5.7-fold higher odds for experienced dermatologists) and keratinocytic cancer (2.5-fold higher odds). Experienced dermatologists had 13.3-fold higher odds of accurately diagnosing melanoma than PCPs using dermoscopic images.

Maron et al. 2020 (PMID: 32915161)
  • Study type: Web-Based Survey Study. Weighting from appraisal: 5.5.
  • Baseline population: 12 board-certified dermatologists; 1200 unique dermoscopic images (50% melanomas, 50% nevi).
  • Standard clinical practice or device(s): Dermatologists assessing dermoscopic images; Convolutional Neural Network (CNN) used as AI support.
  • Objective(s): To investigate whether live AI support improves the accuracy, sensitivity, and specificity of dermatologists in the dichotomous image-based discrimination involved in melanoma detection.
  • Safety outcomes: None reported.
  • Performance outcomes:
    - Dermatologists without AI: Sensitivity 59.4% (95% CI: 53.3-65.5%), Specificity 70.6% (95% CI: 62.3-78.9%), Accuracy 65.0% (95% CI: 62.3-67.6%).
    - Dermatologists with AI: Sensitivity 74.6% (95% CI: 69.9-79.3%), Specificity 72.4% (95% CI: 66.2-78.6%), Accuracy 73.6% (95% CI: 70.9-76.3%).
    - CNN (standalone): Sensitivity 84.7% (95% CI: 81.9-87.6%), Specificity 79.1% (95% CI: 74.8-83.4%), Accuracy 81.9% (95% CI: 79.7-84.2%).
  • Main conclusion: AI support can significantly improve the overall accuracy and sensitivity of dermatologists for the image-based discrimination of melanoma and nevus. This supports the use of AI tools to aid clinicians.

Brinker et al. 2019 (PMID: 31078438)
  • Study type: Comparative Study. Weighting from appraisal: 5.
  • Baseline population: 144 completed questionnaires from dermatologists (52 board-certified, 92 junior); 804 biopsy-proven dermoscopic images (1:1 melanoma:nevi).
  • Standard clinical practice or device(s): Dermatologists (board-certified and junior) assessing dermoscopic images; Convolutional Neural Network (CNN).
  • Objective(s): To compare the diagnostic performance (sensitivity, specificity, overall correctness) of a CNN (trained exclusively on biopsy-verified images) with that of dermatologists.
  • Safety outcomes: None reported.
  • Performance outcomes:
    - All dermatologists (n=144): Sensitivity 67.2% (95% CI: 62.6-71.7%), Specificity 62.2% (95% CI: 57.6-66.9%).
    - Board-certified: Sens 63.2% (95% CI: 58.7-68.1%), Spec 65.2% (95% CI: 60.5-69.8%).
    - Junior physicians: Sens 68.9% (95% CI: 64.4-73.4%), Spec 58.0% (95% CI: 53.1-62.8%).
    - CNN: Sensitivity 82.3% (95% CI: 78.3-85.7%), Specificity 77.9% (95% CI: 73.8-81.8%).
  • Main conclusion: For the first time, automated dermoscopic melanoma image classification (by CNN) was shown to be significantly superior to both junior and board-certified dermatologists.

Han et al. 2020 (PMID: 32243883)
  • Study type: Original Article. Weighting from appraisal: 5.5.
  • Baseline population: Reader study: 47 clinicians (21 dermatologists, 26 residents) and 23 non-medical professionals. Validation sets: SNU (2,201 images; 134 disorders) and Edinburgh (1,300 images; 10 disorders).
  • Standard clinical practice or device(s): Medical professionals (dermatologists, residents); Deep Neural Network (DNN) algorithm.
  • Objective(s): To train and validate a DNN for malignancy prediction, suggesting treatment options, and multi-class classification (134 disorders). To assess if the algorithm can improve the performance of medical professionals ("Augmented Intelligence").
  • Safety outcomes: None reported.
  • Performance outcomes:
    - Malignancy (algorithm standalone): AUC 0.937 (SNU dataset) and 0.928 (Edinburgh dataset).
    - Human-in-the-loop (malignancy): With AI assistance, sensitivity of the 47 clinicians improved from 77.4% to 86.8% and specificity from 92.9% to 93.9%.
    - Human-in-the-loop (multiclass): Top-1 accuracy of 4 doctors (for 134 diseases) increased by 3.3% and Top-3 accuracy by 6.7% with AI assistance.
  • Main conclusion: The algorithm performed comparably to experts and, when used as an ancillary tool ("Augmented Intelligence"), significantly improved the diagnostic performance of medical professionals for both malignancy prediction and multiclass classification.

Marchetti et al. 2019 (PMID: 31306724)
  • Study type: Cross-sectional / Reader Study. Weighting from appraisal: 5.5.
  • Baseline population: 17 readers (8 dermatologists and 9 dermatology residents); 150 dermoscopy images (50 melanoma, 50 nevi, 50 seborrheic keratoses).
  • Standard clinical practice or device(s): Dermatologists and residents assessing dermoscopy images; top-ranked computer algorithm (from the ISIC 2017 challenge).
  • Objective(s): To determine if computer algorithms from the ISIC 2017 challenge could improve dermatologist diagnostic accuracy for melanoma. To explore imputing algorithm decisions for low-confidence human classifications.
  • Safety outcomes: None reported.
  • Performance outcomes:
    - ROC area (melanoma classification): Top algorithm: 0.87 (95% CI: 0.82-0.92); dermatologists: 0.74 (95% CI: 0.72-0.77); residents: 0.66 (95% CI: 0.60-0.69). (Algorithm > humans.)
    - Imputation (for low confidence): Imputing algorithm results for low-confidence dermatologist evaluations increased their sensitivity from 76.0% (95% CI: 71.5-80.1%) to 80.8% (95% CI: 76.3-85.3%), specificity from 72.6% (95% CI: 69.4-75.7%) to 72.8% (95% CI: 69.6-75.9%), and overall correct classifications from 73.8% to 75.4%.
  • Main conclusion: The top-ranked algorithm exceeded the diagnostic accuracy of dermatologists and residents. Judiciously applying algorithm predictions (e.g., in low-confidence cases) shows potential to improve human diagnostic performance.

Ahadi et al. 2021 (PMID: 33912165)
  • Study type: Retrospective Study. Weighting from appraisal: 5.5.
  • Baseline population: 4,123 pathology specimens from 4,123 patients over 3 years at a university hospital.
  • Standard clinical practice or device(s): Standard clinical practice: clinical diagnosis (assumed naked eye) compared to histopathology (gold standard).
  • Objective(s): To evaluate the accuracy (sensitivity, specificity, PPV, NPV) of clinical diagnosis for malignant skin lesions by comparing it to the histological gold standard.
  • Safety outcomes: Not applicable (retrospective analysis).
  • Performance outcomes:
    - Overall malignancy (clinical diagnosis): Sensitivity 90.48% (95% CI: 87.24-93.72%); Specificity 82.85% (95% CI: 81.66-84.04%); Positive Predictive Value (PPV) 30.38%; Negative Predictive Value (NPV) 99.06%.
    - Melanoma (N=5): Sens 80.0%, Spec 97.45%.
  • Main conclusion: Pathological assessment remains the cornerstone of skin cancer diagnosis. The high NPV (99.06%) and low PPV (30.38%) indicate that standard clinical diagnosis is more efficient at ruling out malignancies than at diagnosing them.

Brinker et al. 2019 (PMID: 30981091)
  • Study type: Comparative Study. Weighting from appraisal: 6.
  • Baseline population: 157 dermatologists (all experience levels) from 12 German university hospitals; 100 dermoscopic images (20 melanoma, 80 nevi).
  • Standard clinical practice or device(s): Dermatologists assessing dermoscopic images; Convolutional Neural Network (CNN) trained exclusively on open-source images.
  • Objective(s): To compare the performance of a CNN (trained only on open-source images) to a large, multi-experience-level group of 157 dermatologists for dermoscopic melanoma image classification.
  • Safety outcomes: None reported.
  • Performance outcomes:
    - All dermatologists (n=157): Mean Sensitivity 74.1% (95% CI: 40-100%), Mean Specificity 60.0% (95% CI: 21.3-91.3%); ROC AUC 0.67.
    - Chief physicians (n=3): Mean Sensitivity 73.3%, Mean Specificity 69.2%.
    - CNN (at dermatologist sensitivity 74.1%): Mean Specificity 86.5% (95% CI: 70.8-91.3%).
    - CNN (at chief physician specificity 69.2%): Mean Sensitivity 84.5% (95% CI: 80-95%).
  • Main conclusion: A CNN trained exclusively on open-source images outperformed 136 of 157 dermatologists and all experience levels (junior to chief physicians) in terms of average specificity and sensitivity.

Tschandl et al. 2019 (PMID: 31201137)
  • Study type: Web-based diagnostic study. Weighting from appraisal: 6.
  • Baseline population: 511 human readers (incl. 283 board-certified dermatologists, 118 residents); test set of 1,511 images (7 disease categories).
  • Standard clinical practice or device(s): Human readers (all experience levels); 139 machine-learning algorithms.
  • Objective(s): To compare the diagnostic accuracy of state-of-the-art machine-learning algorithms with human readers for all clinically relevant types of benign and malignant pigmented skin lesions.
  • Safety outcomes: None reported.
  • Performance outcomes:
    - Overall correct diagnoses (out of 30): Human experts (n=27): 18.78 (SD 3.15); top 3 algorithms: 25.43 (SD 1.95); mean difference 6.65.
    - Melanoma-specific: All readers: Sens 73.1% (95% CI: 65.8-79.1%), Spec 92.8% (95% CI: 91.3-94.2%); top 3 algorithms: Sens 81.9% (95% CI: 75.4-87.3%), Spec 96.2% (95% CI: 95.1-97.2%).
    - Malignancy detection: Sens 76.4% (95% CI: 73.2-79.6%), Spec 93.1% (95% CI: 91.2-95.3%).
  • Main conclusion: State-of-the-art machine-learning classifiers outperformed human experts in the diagnosis of pigmented skin lesions and should have a more important role in clinical practice.
Clinical data collected on the improvement in the accuracy of HCPs in the diagnosis of dermatological conditions​

In this section, we present the clinical data collected on the performance of healthcare practitioners (HCPs, which include dermatologists and primary care practitioners) in the diagnosis of various dermatological conditions beyond malignancy detection, and the improvement in their sensitivity, specificity, and accuracy achieved with the use of other AI-guided medical devices. The entries below summarize the key studies included in this section, highlighting their design, population, outcomes, and main conclusions.

Study | Baseline Population | Standard clinical practice or device(s)? | Objective(s) | Safety outcomes | Performance outcomes | Main conclusion
Ba et al. 2022
PMID: 35569202
Multireader Multicase (MRMC) Study
Weighting from appraisal: 5.5
18 board-certified dermatologists.
400 clinical images (10 categories).
Dermatologists (unassisted) vs. Dermatologists with CNN assistance.
To evaluate the potential impact of CNN assistance on dermatologists for clinical image interpretation.
None reported.
Multiclass (10 types): Accuracy 62.78% (unassisted) vs. 76.60% (assisted), an increase of 13.82%.
Binary (Malignant/Benign): Sensitivity 83.21% (unassisted) vs. 89.56% (assisted), an increase of 6.35%. Specificity 80.92% (unassisted) vs. 87.90% (assisted), an increase of 6.98%.
CNN assistance improved dermatologist accuracy in interpreting cutaneous tumours. Dermatologists with less experience benefited more from CNN assistance.
Ferris et al. 2025
PMID: 39981881
MRMC Clinical Utility Study
Weighting from appraisal: 9
108 Primary Care Physicians (PCPs).
100 skin lesion cases (from DERM-SUCCESS study).
1. PCPs (unaided visual assessment).
2. PCPs aided by an AI-enabled Elastic Scattering Spectroscopy (ESS) handheld device (DermaSensor).
To assess and compare the diagnostic and management performance of PCPs with and without the ESS device in detecting skin cancer.
None reported.
(Aided PCPs incorrectly referred 11.8% more benign lesions but correctly referred 9.4% more malignant lesions).
Diagnostic Sensitivity: 71.1% (unaided) vs. 81.7% (aided), a difference of 10.6%. (P=.0085).
Diagnostic Specificity: 60.9% (unaided) vs. 54.7% (aided), a difference of -6.2% (P=.1896).
Management (Referral) Sensitivity: 82.0% (unaided) vs. 91.4% (aided), a difference of 9.4% (P=.0027); management specificity decreased by 9.6%.
Use of the ESS device output by PCPs significantly improved their diagnostic and management sensitivities and overall management performance (AUC), suggesting the device can improve PCP skin cancer detection and confidence. Despite that, the diagnostic specificity decreased with the use of the device.
Han et al. 2020
PMID: 32243883
Original Article
Weighting from appraisal: 5.5
Reader study: 47 clinicians (21 dermatologists, 26 residents) & 23 non-medical professionals. Validation sets: SNU (2,201 images; 134 disorders) & Edinburgh (1,300 images; 10 disorders).
Medical professionals (dermatologists, residents).
Deep Neural Network (DNN) algorithm.
To train and validate a DNN for malignancy prediction, suggesting treatment options, and multi-class classification (134 disorders).
To assess if the algorithm can improve the performance of medical professionals ("Augmented Intelligence").
None reported.
Malignancy (Algorithm standalone): AUC 0.937 (SNU dataset) and 0.928 (Edinburgh dataset).
Human-in-the-loop (Malignancy): With AI assistance, sensitivity of 47 clinicians improved from 77.4% to 86.8%, an increase of 9.4% and specificity from 92.9% to 93.9%, an increase of 1.0%.
Human-in-the-loop (Multiclass): Top-1 accuracy of 4 doctors (for 134 diseases) increased by 7.0% with AI assistance.
The algorithm performed comparably to experts and, when used as an ancillary tool ("Augmented Intelligence"), improved the diagnostic performance of medical professionals for both malignancy prediction and multiclass classification.
Han et al. 2022
PMID: 35662137
Randomized Controlled Trial
Weighting from appraisal: 10
576 consecutive cases (patients) with suspicious lesions.
8 trainees (4 dermatology residents, 4 non-dermatology trainees).
1. Trainees (unaided group, n=281).
2. Trainees (AI-assisted group, n=295) using "Model Dermatology" algorithm.
To validate whether a multiclass AI algorithm could augment the accuracy of non-expert physicians in a real-world setting.
A 12.2% drop in Top-1 accuracy was observed in cases where all Top-3 predictions from the algorithm were incorrect. Four cases of malignancy were ruled out by trainees after incorrect AI assistance.
Overall Top-1 Accuracy (Trainees): 53.9% (AI-assisted) vs. 43.8% (unaided), an increase of 10.1%.
Non-Derm Trainees Top-1: 54.7% (AI-assisted) vs. 29.7% (unaided), an increase of 25.0%.
Derm Residents Top-1: 53.1% (AI-assisted) vs. 57.3% (unaided) a reduction in accuracy of 4.2%.
The multiclass AI algorithm augmented the diagnostic accuracy of non-expert physicians in dermatology, especially the least experienced (non-dermatology trainees), although it reduced the diagnostic accuracy of dermatology residents.
Jain et al. 2021
PMID: 33909051
MRMC Diagnostic Study
Weighting from appraisal: 8
40 clinicians (20 PCPs, 20 NPs).
1048 retrospective teledermatology cases (120 skin conditions).
1. PCPs and NPs (unassisted).
2. PCPs and NPs with an AI-based assistive tool.
To evaluate an AI-based tool that assists PCPs and NPs with diagnoses of dermatologic conditions.
None reported.
(Rates for desired biopsies and referrals decreased slightly with AI assistance).
Top-1 Agreement (vs. Derm. Panel):
• PCPs: 48% (unassisted) vs. 58% (assisted), an increase of 10%.
• NPs: 46% (unassisted) vs. 58% (assisted), an increase of 12%.
Agreement (vs. Biopsy):
• PCPs: +3% (64% to 67%).
• NPs: +8% (60% to 68%).
AI assistance was associated with improved diagnoses by PCPs and NPs for 1 in every 8 to 10 cases, indicating potential for improving the quality of dermatologic care.
Kim et al. 2022
PMID: 35061691
Prospective Controlled Study
Weighting from appraisal: 9.5
285 cases (patients) with suspected skin neoplasms.
18 trainee doctors (11 dermatology, 7 intern).
1. Trainees (Control group, n=141): Routine exam + photo review.
2. Trainees (AI group, n=144): Routine exam + photo review + AI assistance (Model Dermatology).
To evaluate whether an AI algorithm improves the accuracy of nondermatologists in diagnosing skin neoplasms in a real-world setting.
None reported.
AI Group (Before vs. After AI): Top-1 Accuracy increased from 46.5% to 58.3%, an increase of 11.8%.
Control Group (Before vs. After Photo Review): Top-1 Accuracy 46.1% vs. 51.8%, an increase of 5.7%.
The number of differential diagnoses also increased significantly in the AI group.
In real-world settings, AI augmented the diagnostic accuracy of trainee doctors. The number of differential diagnoses also increased.
Krakowski et al. 2024
PMID: 38594247
Systematic Review & Meta-Analysis
Weighting from appraisal: 10
10 studies eligible for meta-analysis. Participants included dermatologists, residents, and non-dermatologists.
1. Clinicians (unassisted).
2. Clinicians assisted by deep learning-based AI.
To study the effect of AI assistance on the accuracy of skin cancer diagnosis by clinicians.
Notes that clinicians can perform worse when the AI tool provides incorrect recommendations.
Clinicians (unassisted): Pooled Sens 74.8% (95% CI 68.6-80.1), Pooled Spec 81.5% (95% CI 73.9-87.3).
Clinicians (AI-assisted): Pooled Sens 81.1% (95% CI 74.4-86.5), Pooled Spec 86.1% (95% CI 79.2-90.9), an increase of 6.3% and 4.6% respectively. Dermatologists showed an increase of 6.3% and 4.6% in sensitivity and specificity respectively. PCPs showed an increase of 13% and 10.8% in diagnostic sensitivity and specificity respectively (an illustrative pooling sketch follows this table).
AI in the hands of clinicians has the potential to improve diagnostic accuracy. The largest improvement was among non-dermatologists.
Maron et al. 2020
PMID: 32915161
Web-Based Survey Study
Weighting from appraisal: 5.5
12 board-certified dermatologists.
1200 unique dermoscopic images (50% melanomas, 50% nevi).
Dermatologists assessing dermoscopic images.
Convolutional Neural Network (CNN) used as AI support.
To investigate whether live AI support improves the accuracy, sensitivity, and specificity of dermatologists in the dichotomous image-based discrimination between melanoma and nevus.
None reported.
When dermatologists were correct and AI was incorrect (10% of cases), dermatologists wrongly changed their answer 39% of the time.
Dermatologist without AI: Mean Sens 59.4%, Mean Spec 70.6%, Mean Accuracy 65.0%.
Dermatologist with AI: Mean Sens 74.6% (P=.003), Mean Spec 72.4% (P=.54), Mean Accuracy 73.6% (P=.002).
An increase of 15.2%, 1.8% and 8.6% respectively.
AI support can significantly improve the overall accuracy and sensitivity of dermatologists for the image-based discrimination of melanoma and nevus. This supports the use of AI tools to aid clinicians.
Tschandl et al. 2020
PMID: 32572267
Web-based diagnostic study
Weighting from appraisal: 6
302 raters (169 dermatologists, 77 residents, 38 GPs) from 41 countries.
1,412 dermoscopic images (7 disease categories).
1. Human raters (unassisted).
2. Human raters + AI-based multiclass probabilities.
3. Human raters + AI-based malignancy probability.
4. Human raters + AI-based CBIR.
To address the effects of varied representations of AI-based support across different levels of clinical expertise and multiple clinical workflows.
None reported.
AI Multiclass Support: Accuracy improved from 63.6% (unassisted) to 77.0% (a 13.3% increase).
Other AI Support: No improvement was observed for AI-based malignancy probability or CBIR.
Experience: The least experienced clinicians gained the most from AI-based support.
Good quality AI-based support (specifically multiclass probabilities) improves diagnostic accuracy over that of either AI or physicians alone. The least experienced clinicians gain the most.
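For readers unfamiliar with how pooled sensitivities and specificities in meta-analyses such as Krakowski et al. 2024 are obtained, the sketch below illustrates inverse-variance pooling on the logit scale with a DerSimonian-Laird random-effects adjustment. The per-study proportions and sample sizes are hypothetical placeholders, not data extracted from the cited review.

```python
import numpy as np

def pool_proportions(p, n):
    """DerSimonian-Laird random-effects pooling of proportions on the logit scale.

    p: per-study proportions (e.g., sensitivities); n: per-study denominators.
    Returns the pooled proportion with an approximate 95% confidence interval.
    """
    p, n = np.asarray(p, float), np.asarray(n, float)
    y = np.log(p / (1 - p))                       # logit transform
    v = 1 / (n * p) + 1 / (n * (1 - p))           # approx. variance of each logit
    w = 1 / v                                     # fixed-effect (inverse-variance) weights
    y_fixed = (w * y).sum() / w.sum()
    q = (w * (y - y_fixed) ** 2).sum()            # Cochran's Q heterogeneity statistic
    tau2 = max(0.0, (q - (len(p) - 1)) / (w.sum() - (w ** 2).sum() / w.sum()))
    w_re = 1 / (v + tau2)                         # random-effects weights
    y_re = (w_re * y).sum() / w_re.sum()
    se = np.sqrt(1 / w_re.sum())
    expit = lambda x: 1 / (1 + np.exp(-x))        # back-transform to a proportion
    return expit(y_re), expit(y_re - 1.96 * se), expit(y_re + 1.96 * se)

# Hypothetical per-study sensitivities of unassisted clinicians:
pooled, lo, hi = pool_proportions(p=[0.72, 0.78, 0.70, 0.80], n=[120, 95, 200, 60])
print(f"Pooled sensitivity: {pooled:.1%} (95% CI {lo:.1%}-{hi:.1%})")
```

The random-effects adjustment widens the confidence interval when the studies disagree more than sampling error alone would explain, which is the usual situation in reader studies with heterogeneous populations.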
Clinical data collected on the performance of HCPs in the diagnostic accuracy of dermatological conditions​

In this section, we present the clinical data collected on the diagnostic accuracy of healthcare practitioners (HCPs, including dermatologists and primary care practitioners) in diagnosing various dermatological conditions. The following table summarizes the key studies included in this section, highlighting their design, population, outcomes, and main conclusions. As in previous sections, we focus on studies that provide insights into the diagnostic accuracy of both PCPs and dermatologists, who represent standard clinical practice and therefore constitute the state of the art.

Study | Baseline Population | Standard clinical practice or device(s)? | Objective(s) | Safety outcomes | Performance outcomes | Main conclusion
Escalé-Besa et al. 2023
PMID: 36922556
Prospective Diagnostic Study
Weighting from appraisal: 7.5
100 consecutive patients visiting a General Practitioner (GP) in a primary care setting in central Catalonia, Spain.
1. General Practitioners (GPs) (face-to-face).
2. Teledermatology (TD) dermatologists.
3. Autoderm ML model (AI).
To perform a prospective validation of an image analysis ML model (Autoderm) as a diagnostic decision support tool, comparing its accuracy to GPs and teledermatology dermatologists in a real-life setting.
None reported.
Overall (100 cases):
• Top-1 Accuracy: AI 39% vs. GP 64% vs. TD 72%.
In-Distribution (82 cases):
• Top-3 Accuracy: AI 75% vs. GP 76%.
• Top-5 Accuracy: AI 89% vs. TD (Top-3) 90%.
The ML model's overall diagnostic accuracy (Top-1) in real-life conditions is lower than that of both GPs and dermatologists. However, the model shows capability as a support tool for GPs, particularly in differential diagnosis (Top-5 accuracy of 89% for trained diagnoses).
Han et al. 2020
PMID: 32243883
Original Article
Weighting from appraisal: 5.5
Reader study: 47 clinicians (21 dermatologists, 26 residents) & 23 non-medical professionals. Validation sets: SNU (2,201 images; 134 disorders) & Edinburgh (1,300 images; 10 disorders).
Medical professionals (dermatologists, residents).
Deep Neural Network (DNN) algorithm.
To train and validate a DNN for malignancy prediction, suggesting treatment options, and multi-class classification (134 disorders).
To assess if the algorithm can improve the performance of medical professionals ("Augmented Intelligence").
None reported.
Malignancy (Algorithm standalone): AUC 0.937 (SNU dataset) and 0.928 (Edinburgh dataset).
Human-in-the-loop (Malignancy): With AI assistance, sensitivity of 47 clinicians improved from 77.4% to 86.8% and specificity from 92.9% to 93.9%.
The mean Top-1 and Top-3 accuracies of dermatologists were 49.9% and 67.2%, respectively.
The algorithm performed comparably to experts and, when used as an ancillary tool ("Augmented Intelligence"), significantly improved the diagnostic performance of medical professionals for both malignancy prediction and multiclass classification.
Han et al. 2022
PMID: 35662137
Randomized Controlled Trial
Weighting from appraisal: 10
576 consecutive cases (patients) with suspicious lesions.
8 trainees (4 dermatology residents, 4 non-dermatology trainees).
1. Trainees (unaided group, n=281).
2. Trainees (AI-assisted group, n=295) using "Model Dermatology" algorithm.
To validate whether a multiclass AI algorithm could augment the accuracy of non-expert physicians in a real-world setting, including diverse out-of-distribution conditions.
None reported.
Overall Top-1 Accuracy (Trainees): 53.9% (AI-assisted) vs. 43.8% (unaided) (P=0.019).
Non-Derm Trainees Top-1: 54.7% (AI-assisted) vs. 29.7% (unaided).
Derm Residents Top-1: 53.1% (AI-assisted) vs. 57.3% (unaided) (P=0.55).
The multiclass AI algorithm augmented the diagnostic accuracy of non-expert physicians in dermatology, especially for the least experienced (non-dermatology trainees).
Jain et al. 2021
PMID: 33909051
MRMC Diagnostic Study
Weighting from appraisal: 8
40 clinicians (20 PCPs, 20 NPs).
1048 retrospective teledermatology cases (120 skin conditions).
1. PCPs and NPs (unassisted).
2. PCPs and NPs with an AI-based assistive tool.
To evaluate an AI-based tool that assists PCPs and NPs with diagnoses of dermatologic conditions.
None reported.
Top-1 Agreement (vs. Derm. Panel):
• PCPs: 48% (unassisted) vs. 58% (assisted), an increase of 10%. Top-3 Agreement (PCPs): 57%.
• NPs: 46% (unassisted) vs. 58% (assisted), an increase of 12%.
Agreement (vs. Biopsy):
• PCPs: +3% (64% to 67%).
• NPs: +8% (60% to 68%).
AI assistance was associated with improved diagnoses by PCPs and NPs for 1 in every 8 to 10 cases, indicating potential for improving the quality of dermatologic care.
Kim et al. 2022
PMID: 35061691
Prospective Controlled Study
Weighting from appraisal: 9.5
285 cases (patients) with suspected skin neoplasms.
18 trainee doctors (11 dermatology, 7 intern).
1. Trainees (Control group, n=141): Routine exam + photo review.
2. Trainees (AI group, n=144): Routine exam + photo review + AI assistance (Model Dermatology).
To evaluate whether an AI algorithm (http://b2019.modelderm.com) improves the accuracy of nondermatologists in diagnosing skin neoplasms in a real-world setting.
None reported.
AI Group (Before vs. After AI): Top-1 Accuracy increased from 46.5% to 58.3% (P=.008).
Top-3 Accuracy increased from 54.9% to 71.5%.
Dermatologists: Top-1 Accuracy 61.8% and Top-3 Accuracy 71.5%.
The number of differential diagnoses also increased significantly in the AI group.
In real-world settings, AI augmented the diagnostic accuracy of trainee doctors. The number of differential diagnoses also increased.
Liu Y et al. 2020
PMID: 32424212
DLS Development & Validation
Weighting from appraisal: 7.5
Development set: 16,114 de-identified cases.
Validation set B: 963 cases.
Reader group: 6 dermatologists, 6 PCPs, 6 NPs.
1. Dermatologists, PCPs, NPs (unassisted).
2. Deep Learning System (DLS).
To develop and validate a DLS to provide a differential diagnosis of 26 common skin conditions (and 419 total) using images and clinical data. To compare DLS accuracy to dermatologists, PCPs, and NPs.
Not applicable (retrospective development).
Top-1 Accuracy (on 963 cases): DLS 66% vs. Dermatologists 63% vs. PCPs 44% vs. NPs 40%. DLS was non-inferior to dermatologists.
Top-3 Accuracy: DLS 90% vs. Dermatologists 75% vs. PCPs 60% vs. NPs 55%.
The DLS can distinguish between 26 common skin conditions at a level non-inferior to dermatologists and more accurate than PCPs and NPs, highlighting its potential to assist general practitioners.
Muñoz-López et al. 2021
PMID: 33037709
Prospective Diagnostic Study
Weighting from appraisal: 7
340 cases from 281 consecutive patients in a teledermatology clinic.
Reader study: 9 providers (3 dermatologists, 3 residents, 3 GPs).
1. Teledermatologists (real-time).
2. AI algorithm (Model Dermatology).
3. Reader study (Dermatologists, Residents, GPs).
To assess the diagnostic performance and potential clinical utility of an AI algorithm (Model Dermatology) in a real-life telemedicine setting.
None reported.
Overall Top-1 Accuracy: AI 41.2% vs. GPs 49.3% vs. Residents 57.8% vs. Dermatologists 60.1%.
'In-Distribution' Balanced Top-1 Accuracy: AI 47.6% vs. GPs 39.7% vs. Residents 47.7% vs. Dermatologists 49.7%.
In this prospective real-life study, the AI algorithm's accuracy is inferior to dermatologists. However, when analysis was limited to 'in-distribution' diagnoses, the AI's balanced accuracy was comparable to dermatologists/residents and superior to GPs.
Clinical data collected on the referral accuracy of PCPs in dermatological conditions​

In this section, we present the clinical data collected on the referral accuracy of primary care practitioners (PCPs) in dermatological conditions. The following table summarizes the key studies included in this section, highlighting their design, population, outcomes, and main conclusions. As in previous sections, we focus on studies that provide insights into the referral accuracy of PCPs and into the reduction of unnecessary referrals achieved by either AI-guided medical devices or teledermatology.

Study | Baseline Population | Standard clinical practice or device(s)? | Objective(s) | Safety outcomes | Performance outcomes | Main conclusion
Baker et al. 2022
(Abstract)
Pilot prospective study
Weighting from appraisal: 5.5
Patients with urgent skin cancer referrals at a UK hospital trust (500-600 cases/month).
1. Standard 2-week wait (2WW) referral pathway.
2. New AI teledermatology software (UKCA marked Class IIa) used at community hubs for triage.
To test the use of AI teledermatology software to triage urgent skin cancer referrals and manage increased demand.
None reported.
The AI service led to a 62% reduction in the number of patients requiring an urgent face-to-face appointment.
• The rate of unnecessary referrals returned to the GP was 34%.
• 38% of patients still required an urgent face-to-face appointment.
The introduction of the AI teledermatology service significantly reduced the number of urgent face-to-face appointments needed and helped the trust meet its 2-week wait targets.
Eminović et al. 2009
PMID: 19433694
Cluster Randomized Controlled Trial (RCT)
Weighting from appraisal: 9
631 patients referred by 85 General Practitioners (GPs) in the Netherlands.
1. Control group (n=39 GPs): Standard referral to a dermatologist.
2. Intervention group (n=46 GPs): Teledermatologic (store-and-forward) consultation first.
To determine whether teledermatologic consultations can reduce in-person referrals to a dermatologist by GPs.
None reported.
The proportion of office visits considered "preventable" by the dermatologist was 39.0% in the teledermatology group vs. 18.3% in the control group.
• This was an absolute reduction of unnecessary referrals 20.7% (95% CI, 8.5%-32.9%).
Teledermatologic consultation offers the promise of reducing referrals to a dermatologist by 20.7%.
Jain et al. 2021
PMID: 33909051
MRMC Diagnostic Study
Weighting from appraisal: 8
40 clinicians (20 PCPs, 20 NPs).
1048 retrospective teledermatology cases (120 skin conditions).
1. PCPs and NPs (unassisted).
2. PCPs and NPs with an AI-based assistive tool.
To evaluate an AI-based tool that assists PCPs and NPs with diagnoses of dermatologic conditions.
None reported.
(Rates for desired biopsies and referrals decreased slightly with AI assistance).
With AI assistance, PCPs reduced their referral rate from 45% to 42%, a reduction of 3 percentage points.
AI assistance was associated with improved diagnoses by PCPs and NPs for 1 in every 8 to 10 cases, indicating potential for improving the quality of dermatologic care.
Knol et al. 2006
PMID: 16539753
Prospective Study
Weighting from appraisal: 7.5
505 teledermatology consultations for 503 patients from 29 participating GPs in the Netherlands.
1. GPs' stated intention to refer (hypothetical).
2. Store-and-forward teledermatology consultation (digital photos + clinical info sent to dermatologist).
To investigate the reduction in dermatological referrals following primary-care teledermatology consultation.
None reported.
Referral Reduction: Of the 306 patients the GPs intended to refer, teledermatology prevented the referral for 163 (53%). Adjusted for missing data, the reduction was 51% (95% CI 47-58%).
New Referrals: Of 144 patients GPs did not intend to refer, 17% were referred after the tele-consult.
Consultation using digital store-and-forward teledermatology by the GP can halve (51-53%) the number of referrals to a dermatologist for selected patients.
Clinical data on the impact of AI-Guided Medical Devices on Dermatology Waiting Times and the Current Healthcare Landscape in Spain and the EU​

In this section, we present clinical data on the impact of AI-guided medical devices on dermatology waiting times and the current healthcare landscape in Spain and the EU. The following table summarizes key studies and reports that provide insights into how AI technologies are influencing dermatology services, particularly in terms of reducing waiting times and improving access to care. In addition to peer-reviewed studies, we also include relevant reports from governmental bodies to provide a comprehensive overview of the current state of dermatology services in the Basque Country, Spain, and the EU.

Study | Baseline Population | Standard clinical practice or device(s)? | Objective(s) | Safety outcomes | Performance outcomes | Main conclusion
Giavina-Bianchi et al. 2020
PMID: 33437950
Cross-sectional Retrospective
Weighting from appraisal: 6.5
30,976 individuals (55,624 skin lesions) from the São Paulo public health system waiting list.
Store-and-forward teledermatology (SF-TD) triage project.
To evaluate the proportion of individuals who could be assessed in primary care using teledermatology, and how this affected the waiting time for an in-person dermatologist appointment.
None reported.
53% of patients were managed in primary care. 43% were referred to in-person dermatologists. 4% were referred directly to biopsy.
• This led to a 78% reduction in the mean waiting time for in-person appointments (from 6.7 months to 1.5 months).
The use of teledermatology as a triage tool significantly reduced the waiting time for in-person visits, improving health care access and utilizing public resources wisely.
Giavina-Bianchi et al. 2020
PMID: 32314966
Retrospective Cohort
Weighting from appraisal: 5
6633 individuals aged 60+ (12,770 skin lesions) from the São Paulo teledermatology project.
Store-and-forward teledermatology (SF-TD) triage project.
To evaluate the proportion of lesions in individuals aged 60+ that could be managed by teledermatology in primary care.
None reported.
66.66% of dermatoses (8408/12,614) were managed in primary care. 27.10% were referred to an in-person dermatologist. 6.24% were referred directly to biopsy.
• Project reduced mean waiting time from 6.7 months to 1.5 months (a 78% reduction).
Teledermatology helped to treat 67% of the dermatoses of older individuals without an in-presence visit, thus optimizing dermatological appointments for the most severe, surgical, or complex diseases.
Morton et al. 2010
PMID: 21198539
Observational Study
Weighting from appraisal: 5
Patients referred for 'urgent suspected cancer' (289 photo-triage, 188 conventional) in Forth Valley, Scotland.
1. Conventional letter referral (all booked to consultant clinic).
2. Community-based photo-triage (close-up + dermoscopic images).
To compare the outcomes and costs of conventional and photo-triage referral pathways for suspected skin cancers.
None reported.
Photo-triage allowed 91% (263/289) of patients to get definitive care at the first visit, vs. 63% (117/186) conventionally. It reduced the number requiring a consultant clinic by 72%.
• Mean wait time for MM treatment was 36 days (photo) vs. 39 (conventional), a reduction of 7.7% in waiting time.
Community photo-triage improved referral management of suspected skin cancer, increased service capacity, was marginally cheaper (£1.70 per patient), and reduced hospital visits.
Hsiao & Oh 2008
PMID: 18485493
Retrospective Chart Review
Weighting from appraisal: 8
169 skin cancer patients (from 3 remote VA primary care clinics) treated in dermatology surgery clinics.
1. Conventional text-based electronic consult request.
2. Store-and-forward (S/F) teledermatology consult (images + text).
To examine the time intervals in which skin cancer patients (referred conventionally or by S/F teledermatology) were evaluated, diagnosed, and treated.
None reported.
Mean Time from Referral:
• Initial Consult: 4 days (TD) vs. 48 days (Conv.).
• Biopsy: 38 days (TD) vs. 57 days (Conv.) (p=.034).
• Surgery: 104 days (TD) vs. 125 days (Conv.) (p=.006). A reduction of 17% in cumulative waiting time.
Clinical outcomes in skin cancer management via teledermatology, as measured by times to diagnosis and surgical treatment, can be comparable to, or better than, conventional referrals for remote patients.
Spanish SNS Report June 2025
(SISLE-SNS Data June 2025)
Patients on the Spanish National Health System (SNS) waiting list.
National Health System waiting list registry (surgical and consults).
To report the status of the waiting lists (number of patients, wait times, % > 6 months) for surgical procedures and specialist consultations in the SNS, focusing here on the waiting time for a dermatology consultation.
Not applicable (Registry report).
Surgical - Dermatology: 19,569 patients waiting. Mean wait: 69 days. 7.4% wait > 6 months.
Consults - Dermatology: 8.00 patients/1,000 inhabitants. Mean wait: 121 days. 70.3% wait > 60 days.
Basque Country:
• Consults - Dermatology: 3.59 patients/1,000 inhabitants. Mean wait: 43 days. 53.9% wait > 60 days.
As of June 2025, the mean wait for a dermatology consultation (121 days) is the longest of all specialties, while the wait for surgery (69 days) is one of the shortest. By contrast, the mean wait for a dermatology consultation in the Basque Country is 43 days, shorter than the mean for Spain.
DREES Report 2018
(France)
40,000 people from the Constances cohort in France.
Standard appointment booking with French medical professionals.
To survey and report on the waiting times for access to care for GPs and various specialists in France (2016-2017 data).
None reported (Report).
Median Wait Time (all reasons):
• General Practitioner: 2 days.
• Dermatologist: 50 days.
Mean Wait Time (all reasons):
• General Practitioner: 6 days.
• Dermatologist: 61 days.
Half of GP appointments are obtained in less than 2 days. For specialists like dermatology, median wait times are longer (50-52 days), though they are much shorter if the reason is new or worsening symptoms.
DERMAsurvey 2013
(EUMS Report)
42 delegates (EUMS dermatology section) from 33 European countries.
National healthcare systems in 33 European countries.
To evaluate variations in healthcare systems, access to care, and national approaches to diagnostics and treatment for skin diseases in 33 European countries.
Not applicable (Survey of systems).
Waiting Times (Regular Visit): Mean 35.7 days. Ranged from less than 1 day (Greece) to 96 (UK), 112 (Slovenia), and 133 (Ireland) days.
Waiting Times (Emergency): Mean 1.9 days.
Waiting Times (Skin Tumour Surgery): Mean 18.4 days.
There are extensive variations in dermatology health care across Europe. Waiting times for regular visits average 35.7 days but exceed 3 months in countries like the UK and Ireland.
Clinical data on the impact of AI-Guided Devices on Remote Patient Management Rates in Dermatological consultations​

In this section, we present clinical data on the impact of AI-guided devices on remote patient management rates in dermatological consultations. The following table summarizes key studies that provide insights into how AI technologies are influencing remote patient management, particularly in terms of reducing the need for in-person visits and improving access to care.

Study | Baseline Population | Standard clinical practice or device(s)? | Objective(s) | Safety outcomes | Performance outcomes | Main conclusion
Giavina-Bianchi et al. 2020
PMID: 33437950
Cross-sectional Retrospective
Weighting from appraisal: 6.5
30,976 individuals (55,624 skin lesions) from the São Paulo public health system waiting list.
Store-and-forward teledermatology (SF-TD) triage project.
To evaluate the proportion of individuals who could be assessed in primary care using teledermatology, and how this affected the waiting time for an in-person dermatologist appointment.
None reported.
53% of patients were managed remotely in primary care. 43% were referred to in-person dermatologists. 4% were referred directly to biopsy.
The use of teledermatology as a triage tool significantly reduced the waiting time for in-person visits, improving health care access and utilizing public resources wisely.
Giavina-Bianchi et al. 2020
PMID: 32314966
Retrospective Cohort
Weighting from appraisal: 5
6633 individuals aged 60+ (12,770 skin lesions) from the São Paulo teledermatology project.
Store-and-forward teledermatology (SF-TD) triage project.
To evaluate the proportion of lesions in individuals aged 60+ that could be managed by teledermatology in primary care.
None reported.
66.66% of dermatoses (8408/12,614) were managed remotely in primary care. 27.10% were referred to an in-person dermatologist. 6.24% were referred directly to biopsy.
• Project reduced mean waiting time from 6.7 months to 1.5 months (a 78% reduction).
Teledermatology helped to treat 67% of the dermatoses of older individuals without an in-presence visit, thus optimizing dermatological appointments for the most severe, surgical, or complex diseases.
Orekoya et al. 2021
(Abstract)
Retrospective Review
Weighting from appraisal: 5
988 patients referred to a 2-week-wait (2WW) skin cancer clinic in September 2020.
1. Referral after face-to-face (F2F) GP consultation.
2. Referral after remote GP consultation (mostly telephone + photos).
To assess whether the mode of consultation (F2F or remote) in primary care affected the outcomes of consultations in 2WW skin cancer clinics.
None reported.
A higher proportion of patients who had remote consultations were discharged (43.4%) from the 2WW clinic than patients who had F2F consultations (36.2%).
• A significantly higher proportion of benign lesions were referred following a remote consultation (70%) vs. a F2F consultation (59%) (P=0.004).
This study highlights the value of F2F consultations for the initial assessment of lesions in primary care, in order to reduce the number of unnecessary referrals and hospital visits.
Kheterpal et al. 2023
PMID: 37891695
Implementation Evaluation
Weighting from appraisal: 5
218 TD referrals from 4 Duke primary care (DPC) pilot sites.
Hybrid TD program: PCPs send e-consults (clinical + dermoscopic images) to dermatology, followed by a video visit with a dermatologist/resident.
To evaluate the implementation (barriers, facilitators, outcomes) of a hybrid TD virtual clinic at four primary care practices.
None reported (focus on implementation barriers).
Access: Mean time from e-consult to video visit was 7.5 days (vs. > 6 months for in-person).
Adoption: Varied widely; one clinic used TD for 22% of all derm referrals, another for only 2%. 35% of patients could be managed remotely.
PCP Barriers: Time burdens, poor clinic flow, discomfort with image taking.
The hybrid TD virtual clinic effectively reduced patient wait times for dermatology from > 6 months to ~ 1 week, but adoption was variable. Addressing PCP barriers is key to increasing uptake.
Whited 2015
PMID: 26433206
Review Article
Weighting from appraisal: 4.5
Patients and providers using teledermatology (review of multiple studies).
Store-and-forward (S/F) and Real-time (RT) teledermatology vs. conventional care.
To review the evidence for teledermatology, focusing on diagnostic reliability, diagnostic accuracy, clinical outcomes, and user satisfaction.
None reported.
In-person dermatology visits decrease by an average of 45.5% (S/F) to 61.5% (RT).
• Clinical outcomes are comparable to conventional care.
• Diagnostic reliability (agreement) is high and comparable to in-person agreement.
• 53.5% of patients were able to be managed remotely.
Teledermatology is a diagnostically reliable means of diagnosing skin conditions with comparable clinical outcomes and high patient satisfaction. It reduces in-person visits.
Clinical data on PCP Referral Accuracy for Dermatological Conditions​

In this section, we present the clinical data collected on the referral accuracy of primary care practitioners (PCPs) in dermatological conditions. The following table summarizes the key studies included in this section, highlighting their design, population, outcomes, and main conclusions. As in previous sections, we focus on studies that provide insights into the referral accuracy of PCPs, specifically in metrics such as sensitivity and specificity. These articles describing the current standard of clinical practice (e.g., the sensitivity and specificity of PCPs in detecting necessary referrals) were used to define the context for this state-of-the-art review. While not part of the formal literature data extraction (as they do not meet the PICO-based inclusion criteria), they provide the benchmark against which the device's performance is compared.

Study | Baseline Population | Standard clinical practice or device(s)? | Objective(s) | Safety outcomes | Performance outcomes | Main conclusion
Burton et al. 1998
J Med Screen
Screening Study
Weighting from appraisal: 8.5
109 volunteers (mean age 61) screened by 63 GPs (31 trained, 32 untrained) and 4 skin cancer specialists.
1. Untrained General Practitioners (GPs).
2. GPs trained in melanoma diagnosis.
3. Skin cancer specialists (as reference).
To measure the screening performance (sensitivity, specificity, PPV) of trained and untrained GPs in screening men and women aged 50+ for melanomas in the process of referral.
None reported.
Screening (Detecting subjects with melanoma):
• Trained GPs: Sens 0.98, Spec 0.52, PPV 0.22.
• Untrained GPs: Sens 0.95, Spec 0.49, PPV 0.20.
Referral sensitivity: 70% (95% CI: 67-73%)
Referral specificity: 52% (95% CI: 43-61%).
GPs achieved high sensitivity in screening for melanoma subjects (95-98%) but at the cost of very low specificity (49-52%). On the other hand, GPs showed a 70% sensitivity and a 52% specificity in the detection of patients that need referral to dermatology. Training in melanoma diagnosis significantly improved a GP's ability to diagnose a melanoma correctly but did not significantly improve their overall screening statistics (sensitivity/specificity).
Gerbert et al. 1996
Arch Dermatol
Prospective Study
Weighting from appraisal: 6
71 primary care residents, 15 dermatologists and dermatology residents.
1. Primary Care Physicians (residents).
2. Dermatologists (and residents).
To determine PCPs' readiness to triage lesions suspicious for skin cancer; to compare their abilities to dermatologists; to assess if accuracy on slides transfers to patients.
None reported.
Dermatologists' scores were almost double those of primary care residents.
• Primary care residents failed 50% of the time to correctly diagnose nonmelanoma skin cancer. PCPs showed a sensitivity of 79% (95% CI: 72-86%) and a specificity of 73% (95% CI: 66-80%) in identifying patients who needed referral to dermatology.
Dermatologists' diagnostic scores were almost double those of primary care residents. Performance was positively associated with previous dermatology experience.
Clinical data collected on Inter-Observer Reliability in HS Severity Assessment using IHS4 scoring system​

In this section, we present the clinical data collected on inter-observer reliability in Hidradenitis Suppurativa (HS) severity assessment using the International Hidradenitis Suppurativa Severity Score System (IHS4). The following table summarizes key studies that provide insights into the consistency of IHS4 scoring among different observers, highlighting their design, population, outcomes, and main conclusions.

Study | Baseline Population | Standard clinical practice or device(s)? | Objective(s) | Safety outcomes | Performance outcomes | Main conclusion
Goldfarb et al. 2021
Br J Dermatol
Psychometric Assessment
Weighting from appraisal: 9.5
Raters (dermatologists) assessing photographs of HS patients.
Existing HS outcome tools (lesion counts, Hurley, Sartorius, IHS4).
To assess the reliability and validity of the Hidradenitis Suppurativa Area and Severity Index Revised (HASI-R) tool.
Not applicable (Psychometric assessment).
Inter-rater reliability (ICC): 0.88 (95% CI 0.77–0.94).
Intra-rater reliability (ICC): 0.94 (95% CI 0.88–0.97).
IHS4 inter-rater reliability: 0.47 (95% CI: 0.33-0.66).
The HASI-R is a valid and reliable outcome measurement instrument for HS that incorporates both inflammation and body surface area, addressing the time-consuming and unreliable nature of existing lesion-count tools.
Thorlacius et al. 2019
Br J Dermatol
Reliability Study
Weighting from appraisal: 10
10 dermatologists rating 30 patients with HS (all Hurley stages) from photographs.
HS outcome instruments: Hurley staging, modified Sartorius score (MSS), HS-PGA, HSS, and lesion counts (abscesses, nodules, fistulas).
To determine the inter-rater agreement and reliability of the most commonly used outcome instruments and staging systems in hidradenitis suppurativa (HS).
Not applicable (Psychometric assessment).
Inter-rater reliability (ICC(2,1)):
• Substantial: Hurley (0.80), Modified Sartorius (0.80).
• Moderate: HS-PGA (0.72), HSS (0.64), Fistula count (0.62), Abscess count (0.59), Nodule count (0.54). Overall IHS4 inter-rater reliability: 0.47 (95% CI: 0.32-0.65).
Hurley staging and the modified Sartorius score demonstrated substantial inter-rater reliability. Lesion counts and the HSS showed only moderate reliability, suggesting they are less suitable as standalone outcome measures in multicenter trials.
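The ICC values reported above can in principle be reproduced from a subjects-by-raters score matrix. The sketch below implements ICC(2,1) (two-way random effects, absolute agreement, single rater) following Shrout & Fleiss (1979); the ratings used are hypothetical and purely illustrative, not data from the cited studies.

```python
import numpy as np

def icc_2_1(scores):
    """ICC(2,1): two-way random effects, absolute agreement, single rater
    (Shrout & Fleiss, 1979).

    scores: (n subjects x k raters) array, e.g., one IHS4 total per patient per rater.
    """
    x = np.asarray(scores, float)
    n, k = x.shape
    grand = x.mean()
    ss_rows = k * ((x.mean(axis=1) - grand) ** 2).sum()    # between-subject SS
    ss_cols = n * ((x.mean(axis=0) - grand) ** 2).sum()    # between-rater SS
    ss_err = ((x - grand) ** 2).sum() - ss_rows - ss_cols  # residual SS
    msr = ss_rows / (n - 1)             # mean square, subjects
    msc = ss_cols / (k - 1)             # mean square, raters
    mse = ss_err / ((n - 1) * (k - 1))  # mean square, error
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)

# Hypothetical example: 5 patients scored by 3 raters.
ratings = [[4, 5, 4],
           [10, 9, 12],
           [2, 2, 3],
           [7, 8, 6],
           [15, 13, 14]]
print(f"ICC(2,1) = {icc_2_1(ratings):.2f}")  # ~0.96 for these consistent raters
```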
Clinical data on the Variability in FAGA Severity Grading Using the Ludwig Scale​

In this section, we present scientific guidance on how to interpret the metrics used to assess agreement between observers, in this case for the severity grading of Female Androgenetic Alopecia (FAGA) using the Ludwig Scale. We take this approach because limited clinical data specifically address the variability in FAGA severity grading using this scale. We therefore provide a general overview of the metrics commonly used to assess inter-observer agreement in clinical settings, which can be applied to the context of FAGA severity grading.

Metric | Description | Interpretation Guidelines | Guidelines Reference
Cohen's Kappa (κ)
A measure of agreement between two raters on an ordinal scale; its weighted form accounts for the degree of disagreement rather than just whether the raters agree or disagree.
- κ < 0: Agreement worse than chance
- κ = 0.01-0.20: Slight agreement
- κ = 0.21-0.40: Fair agreement
- κ = 0.41-0.60: Moderate agreement
- κ = 0.61-0.80: Substantial agreement
- κ = 0.81-1.00: Almost perfect agreement
Cohen, J. (1968). Weighted kappa: Nominal scale agreement with provision for scaled disagreement or partial credit; Landis, J.R. & Koch, G.G. (1977).
Pearson's Correlation Coefficient (r)
A measure of the linear correlation between two raters' scores on a continuous scale.
- r = 1: Perfect positive correlation
- r = 0.70-0.99: Strong positive correlation
- r = 0.40-0.69: Moderate positive correlation
- r = 0.10-0.39: Weak positive correlation
- r = 0: No correlation
- r < 0: Negative correlation
Cohen, J. (1988). Statistical Power Analysis for the Behavioral Sciences; Evans, J.D. (1996). Straightforward Statistics for the Behavioral Sciences.
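As an illustration of how these agreement metrics are computed in practice, the following sketch uses scikit-learn's cohen_kappa_score and SciPy's pearsonr on hypothetical Ludwig grades from two raters; the grades are invented for illustration only.

```python
from sklearn.metrics import cohen_kappa_score
from scipy.stats import pearsonr

# Hypothetical Ludwig grades (I=1, II=2, III=3) assigned by two raters
# to the same 10 patients; the data are illustrative only.
rater_a = [1, 1, 2, 2, 2, 3, 1, 2, 3, 3]
rater_b = [1, 2, 2, 2, 3, 3, 1, 2, 3, 2]

# Unweighted kappa treats any disagreement equally; linearly weighted kappa
# gives partial credit when raters differ by only one grade (Cohen, 1968).
print("kappa (unweighted):", cohen_kappa_score(rater_a, rater_b))
print("kappa (linear weights):", cohen_kappa_score(rater_a, rater_b, weights="linear"))

# Pearson's r measures linear association of the scores, not agreement:
# two raters offset by a constant grade can still have r = 1.
r, p_value = pearsonr(rater_a, rater_b)
print(f"Pearson r = {r:.2f} (p = {p_value:.3f})")
```

This distinction is why kappa-type statistics, rather than correlation alone, are preferred when the question is whether two observers assign the same severity grade.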
Assessment of Expert Consensus on the Perceived Utility of the device​

In this section, we describe the methodology used to define and assess the expert consensus on the perceived utility of the device along with the guidelines followed, including the pre-defined threshold for minimum acceptable agreement.

Methodological Term | Description | Key Points | References
Expert Consensus
A structured method to quantify the collective opinion of an expert panel on a specific topic (in this case, the perceived utility of a device).
Consensus is determined by comparing survey results to a pre-defined agreement threshold.

- Methodological literature does not set a single universal threshold, but an agreement of ≥75% is frequently considered a substantial or optimal majority consensus.
Diamond, I. R., et al. (2014). Defining consensus: a systematic review recommends guideline-specific definitions. Journal of Clinical Epidemiology.

Fitch, K., et al. (2001). The RAND/UCLA Appropriateness Method User's Manual. RAND Corporation.
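As a minimal illustration of how such a threshold is applied, the sketch below counts agreeing panel responses and compares the agreement proportion to the ≥75% criterion. The responses, and the choice of ratings 4-5 on a 5-point Likert scale as "agreement", are assumptions for illustration only.

```python
# Minimal sketch of a consensus check against a pre-defined >=75% threshold.
# Panel responses are hypothetical 5-point Likert ratings of perceived utility,
# where a rating of 4 or 5 is counted as agreement (an assumption).
THRESHOLD = 0.75

responses = [5, 4, 4, 5, 3, 4, 5, 4, 2, 5, 4, 4]  # one rating per expert
agreeing = sum(1 for r in responses if r >= 4)
agreement = agreeing / len(responses)

print(f"Agreement: {agreement:.0%} ({agreeing}/{len(responses)})")
print("Consensus reached" if agreement >= THRESHOLD else "No consensus")
```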

Summary of articles retained for the description of similar devices​

Clinical data collected on SkinVision​
Study | Baseline Population | Standard clinical practice or device(s)? | Objective(s) | Safety outcomes | Performance outcomes | Main conclusion
Udrea et al. 2019
PMID: 31494983
Retrospective algorithm performance study
Weighting from appraisal: 8.5
Sensitivity set: 285 histopathologically validated skin cancer cases (138 MM, 147 KC/precursors) from two clinical studies (195 cases) and the app's user database (90 cases).
Specificity set: 6000 clinically validated benign cases from the app's user database.
A smartphone application (SkinVision) using a machine learning algorithm.
To evaluate the accuracy (sensitivity and specificity) of the newest version of the smartphone app for risk assessment of skin lesions.
No technical problems reported. 14 out of 285 (pre)malignant cases were classified as low risk (false negatives).
This included 10 out of 138 malignant melanomas (MMs).
Overall (pre)malignancy:
• Sensitivity: 95.1% (95% CI: 91.9-97.3%).
Malignant Melanoma:
• Sensitivity: 92.8% (95% CI: 87.8-96.5%).
Specificity (on 6000 benign cases):
• Specificity: 78.3% (95% CI: 77.2-79.3%).
This smartphone app provides a high sensitivity to detect skin cancer; however, there is still room for improvement in terms of specificity.
Sangers et al. 2022
PMID: 35124665
Prospective multicenter diagnostic accuracy study
Weighting from appraisal: 9
372 patients (785 total lesions) at two Dutch dermatology outpatient clinics.
Lesions included 418 suspicious lesions and 367 benign control lesions.
A CE-marked mHealth app (SkinVision, version RD-174) using a Convolutional Neural Network (CNN). Tested on iOS (iPhone XR) and Android (Galaxy S9) devices.
To identify the diagnostic accuracy (sensitivity and specificity) of the app for detecting premalignant and malignant skin lesions.
None reported.
False negatives included 1 invasive melanoma, 2 in situ melanomas, 2 squamous cell carcinomas, and 13 basal cell carcinomas.
Overall (pre)malignancy:
• Sensitivity: 86.9% (95% CI: 82.3-90.7).
• Specificity: 70.4% (95% CI: 66.2-74.3).
Performance by device:
• iOS: Sensitivity 91.0%.
• Android: Sensitivity 83.0%.
The diagnostic accuracy of the mHealth app is "far from perfect," but it is potentially promising to empower patients to self-assess skin lesions. Additional validation is warranted, particularly for suspicious pigmented skin lesions.
Gregoor et al. 2023
PMID: 37261324
Pilot feasibility study (mixed-methods)
Weighting from appraisal: 8
50 patients recruited from 3 primary care (GP) practices in the Netherlands.
1. AI-based mHealth app (SkinVision) used by patients before GP consultation.
2. GPs' unassisted (blinded) diagnosis.
3. GPs' unblinded diagnosis (to assess app's impact).
To investigate the conditions and feasibility of a larger study on implementing the AI app in primary care (both in patient hands and as a GP tool).
None reported.
(Exploratory, n=45):
• AI App: Sensitivity 90.9% (95% CI: 55.5-99.8%) (9/10), Specificity 80.0% (95% CI: 63.0-91.6%) (28/35).
• GP (Blinded): Sensitivity 80.0% (95% CI: 44.4-97.5%) (8/10), Specificity 80.0% (95% CI: 63.1-91.6%) (28/35).
Studying the implementation of the AI app in primary care appears feasible. 54% of patients with a benign skin lesion and a low-risk app rating indicated they would be reassured and cancel their GP visit.
Thissen et al. 2017
PMID: 28562195
Algorithm calibration & evaluation study
Weighting from appraisal: 8.5
341 lesions from 256 consecutive patients at a dermatology department in the Netherlands.
A subset of 108 lesions was used for the final evaluation.
A smartphone app (SkinVision) using a recalibrated rule-based (fractal and classical) image analysis algorithm.
To assess the sensitivity and specificity of the recalibrated algorithm in diagnosing melanoma, nonmelanoma skin cancer, and premalignant lesions.
7 out of 35 (pre)malignant lesions were missed (rated low/medium risk).
This included one basal cell carcinoma (BCC) rated as low risk. All melanomas (n=3) were rated high risk.
(On n=108 test set):
• Overall (pre)malignancy: Sensitivity 80% (95% CI: 62-90%), Specificity 78% (95% CI: 66-86%).
• Performance dropped without the patient questionnaire (Sensitivity 71%, Specificity 56%).
The mHealth app may offer support to professionals less familiar with differentiating skin lesions, although it is less accurate than a dermatologist's clinical eye. It adds value by analyzing both pigmented and non-pigmented lesions.
Maier et al. 2014
PMID: 25087492
Prospective diagnostic study
Weighting from appraisal: 5
195 melanocytic lesions from consecutive patients at a German dermatology department.
144 lesions were included in the final statistical evaluation.
1. A smartphone app (SkinVision) using fractal image analysis.
2. Clinical and dermoscopic diagnosis by two dermatologists.
To prospectively evaluate the app's sensitivity and specificity for diagnosing malignant melanoma, compared to clinical diagnosis and histopathology.
No technical problems reported. The app missed 7 out of 26 melanomas (false negatives).
• 2 were rated low risk (both melanoma in situ).
• 5 were rated medium risk.
Dermatologists missed 2 out of 26 melanomas.
(On n=144 test set):
• AI App (Melanoma): Sensitivity: 73% (95% CI: 52-88%), Specificity: 83% (95% CI: 75-89%).
• Dermatologists (Melanoma): Sensitivity: 88% (95% CI: 69-98%), Specificity: 97% (95% CI: 92-99%).
The smartphone application might be a promising tool for pre-evaluation by laypersons, but it is "inferior to the diagnostic evaluation by a dermatologist".
Gregoor et al. 2023
PMID: 37210466
Retrospective population-based pragmatic study
Weighting from appraisal: 9.5
18,960 mHealth app users (from 2.2 million insured adults offered free access) matched 1:3 to 56,880 non-user controls.
1. mHealth app (SkinVision) with AI (CNN) assessment plus teledermatologist review.
2. Standard of care (controls who did not use the app).
To evaluate the impact of the mHealth app on dermatological healthcare consumption in a real-world, population-based setting.
None reported.
(Healthcare Claims Analysis):
• mHealth users had more claims for (pre)malignant skin lesions than controls (6.0% vs 4.6%; OR 1.3).
• mHealth users also had a much higher risk of claims for benign skin tumors and nevi (5.9% vs 1.7%; OR 3.7).
• The cost per additional (pre)malignancy detected was €2567.
The app appears to have a positive impact by detecting more (pre)malignancies, but this must be balanced against the "stronger increase in care consumption for benign skin tumors and nevi".
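The odds ratios in the claims analysis above can be illustrated with a simple 2x2 computation. In the sketch below, the cell counts are reconstructed from the reported percentages (6.0% of 18,960 users vs. 4.6% of 56,880 controls) and are therefore approximations, not the study's exact counts.

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio with a 95% Wald confidence interval from a 2x2 table:
    a/b = exposed with/without outcome, c/d = unexposed with/without outcome."""
    or_ = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lo = math.exp(math.log(or_) - z * se_log)
    hi = math.exp(math.log(or_) + z * se_log)
    return or_, lo, hi

# Approximate counts reconstructed from the reported percentages: app users vs.
# matched controls with at least one claim for a (pre)malignant skin lesion.
users_with, users_without = 1138, 17822        # ~6.0% of 18,960
controls_with, controls_without = 2616, 54264  # ~4.6% of 56,880

or_, lo, hi = odds_ratio_ci(users_with, users_without, controls_with, controls_without)
print(f"OR = {or_:.2f} (95% CI {lo:.2f}-{hi:.2f})")  # ~1.3, matching the reported OR
```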
Clinical data collected on Huvy​
Study | Baseline Population | Standard clinical practice or device(s)? | Objective(s) | Safety outcomes | Performance outcomes | Main conclusion
Zanchetta et al. 2025
JEADV Clinical Practice
Retrospective Algorithm Performance Study
Weighting from appraisal: 7.5
Test Datasets:
2966 images total, from:
1. GLOMEL (Public database): 2672 dermoscopic images.
2. Dermatologists (Private): 157 dermoscopic images.
3. TeleExp (Private): 137 real-life tele-dermatology images (68 usable).
1. AI-DSS HUVY with traditional binary classification (melanoma vs. non-melanoma).
2. AI-DSS HUVY with innovative ternary classification (melanoma vs. non-melanoma vs. 'doubtful').
To assess a deep learning algorithm's performance in classifying melanoma across diverse datasets (public, dermatologist-collected, and real-life tele-dermatology images), and to evaluate a novel 'doubtful' category.
None reported.
Binary (Sensitivity/Specificity):
• TeleExp: 92.3% / 58.5%
Ternary (Sensitivity/Specificity):
• TeleExp: 100% / 67.6% (18.5% 'doubtful' rate)

Introducing the 'doubtful' category significantly increased specificity (e.g., +15.6% on TeleExp, +19% on GLOMEL) while maintaining or improving sensitivity.
Introducing a 'doubtful' category significantly improves the AI's performance, especially specificity, compared to a simple binary classification. This three-level approach (high-risk, low-risk, doubtful) can help primary care providers make more informed referrals.
Clinical data collected on DERM​
Study | Baseline Population | Standard clinical practice or device(s)? | Objective(s) | Safety outcomes | Performance outcomes | Main conclusion
Thomas et al. 2023
PMID: 38020164
Prospective real-world post-deployment study
Weighting from appraisal: 5
10,925 patients (14,500 cases) referred to the urgent 2-week-wait (2WW) skin cancer pathway at two UK NHS hospitals.
Analysis based on 8,571 lesions with confirmed outcomes.
1. AI-DSS (DERM), versions A and B, used as a triage tool.
2. A "second-read review" by a consultant dermatologist for all cases DERM marked for discharge.
To report the prospective, real-world performance of the DERM AI tool after deployment in two NHS skin cancer pathways.
None reported.
AI-DSS (Melanoma or not):
• DERM-vA: Sensitivity 95.0% (95% CI: 90-97.6%) – 97.0% (95% CI: 84.7-99.5%), Specificity 58.8% (95% CI: 57.4-60.2%) – 63.2% (95% CI: 59.5-66.7%).
• DERM-vB: Sensitivity 100.0% (95% CI: 93.9-100% / 82.4-100%), Specificity 80.4% (95% CI: 77.2-83.4%) – 80.9% (95% CI: 79.3-82.4%).
AI-DSS (Malignant or not):
• DERM-vA: Sensitivity 96.0% (95% CI: 94.4-97.2%) – 99.3% (95% CI: 96.3-99.9%), Specificity 33.1% (95% CI: 29.3-71.1%) – 45.0% (95% CI: 43.4-46.6%).
• DERM-vB: Sensitivity 98.9% (95% CI: 96-99.7%) – 100.0% (95% CI: 94.7-100%), Specificity 60.6% (95% CI: 56.6-64.5%) – 64.8% (95% CI: 62.9-66.7%).
DERM's real-world performance met sensitivity targets. The newer version (DERM-vB) showed improved specificity and correctly referred all skin cancers. The performance supports removing the human second-read review to maximize system benefits.
Phillips et al. 2019
PMID: 31617929
Prospective, multicenter, masked diagnostic trial
Weighting from appraisal: 6.5
514 patients with at least one suspicious lesion scheduled for biopsy, from 7 UK hospitals.
Analysis included 1550 images (551 biopsied, 999 control).
1. AI-DSS (Deep Ensemble for Recognition of Malignancy - DERM).
2. Specialist clinician assessment.
3. Images taken with 3 cameras (iPhone 6s, Galaxy S6, DSLR).
To determine the accuracy of the AI algorithm (DERM) in identifying melanoma from dermoscopic images, compared to specialist assessment.
None reported.
(All Lesions, at 100% Sensitivity):
• AI (iPhone 6s): Specificity 64.8%.
• Specialists: Specificity 69.9%.
(AUROC - All Lesions):
• AI (iPhone 6s): 95.8% (95% CI: 94.1-97.6%).
• Specialists: 90.8% (95% CI: 87.5-96.1%).
The AI algorithm can detect melanoma from dermoscopic images with a similar level of accuracy as specialists (a sketch of this fixed-sensitivity operating-point analysis follows this table).
Marsden et al. 2024
PMID: 38585154
Prospective, single-centre, masked, non-inferiority trial
Weighting from appraisal: 9
700 patient attendances (867 lesions) referred to a UK teledermatology cancer pathway.
Per-protocol (PP) population: 622 patients (789 lesions).
1. Standard of Care (SoC): Teledermatology review by consultant dermatologists (using DSLR images).
2. AI-DSS (DERM): Independently assessed smartphone (iPhone XR) images.
Primary: To show the AI had a higher rate of correctly classifying non-malignant lesions (as not needing urgent referral) compared to SoC, while maintaining non-inferior sensitivity.
None reported.
Primary Outcome: The AI had a significantly higher rate of correctly identifying non-malignant lesions as not needing urgent referral vs. SoC (p < 0.0246).
(Malignancy Sens/Spec, PP pop.):
• SoC: 97.0% (95% CI: 88-99.5%) / 71.9% (95% CI: 68.4-75.1%).
• AI (Real-world): 94.0% (95% CI: 84.7-98.1%) / 73.3% (95% CI: 69.9-76.4%).
The AI as a medical device (AIaMD) identified significantly more lesions that did not need urgent referral compared to teledermatologists, demonstrating potential to reduce unnecessary referrals and specialist burden.
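Several of the DERM studies report specificity at a fixed operating point (e.g., Phillips et al. 2019 at 100% sensitivity). The sketch below shows how such an operating point is read off a ROC curve; the labels and risk scores are simulated for illustration, not data from the cited trials.

```python
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

# Simulated data: 50 melanomas and 500 benign lesions with model risk scores.
rng = np.random.default_rng(0)
y_true = np.concatenate([np.ones(50), np.zeros(500)])
y_score = np.concatenate([rng.normal(0.8, 0.12, 50),     # melanoma scores
                          rng.normal(0.35, 0.15, 500)]).clip(0, 1)

fpr, tpr, thresholds = roc_curve(y_true, y_score)
print("AUROC:", roc_auc_score(y_true, y_score))

# First point on the curve where every melanoma is flagged (sensitivity = 1.0);
# the specificity at that threshold is whatever remains achievable.
idx = np.argmax(tpr >= 1.0)
print(f"At 100% sensitivity: specificity = {1 - fpr[idx]:.1%}, "
      f"threshold = {thresholds[idx]:.3f}")
```

Fixing sensitivity first and then comparing specificities makes triage tools directly comparable on the clinically critical constraint of not missing melanomas.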
Clinical data collected on Dermalyser​
Study | Baseline Population | Standard clinical practice or device(s)? | Objective(s) | Safety outcomes | Performance outcomes | Main conclusion
Papachristou et al. 2024
PMID: 38234043
Prospective real-life clinical trial
Weighting from appraisal: 8.5
228 patients (presenting 253 lesions) seen by 138 trained Primary Care Physicians (PCPs) at 36 primary care centres in Sweden.
1. PCPs' unassisted clinical suspicion (recorded as 'high' or 'low').
2. An AI-based decision support system (smartphone app Dermalyser®).
To determine the diagnostic performance of an AI-based smartphone app for melanoma detection when used prospectively by PCPs on lesions of concern.
None reported.
AI-DSS (standalone, predefined cutoff):
• Sensitivity: 95.2%
• Specificity: 60.3%
• NPV: 99.3%
• AUROC: 0.960 (95% CI: 0.93-0.98)
PCPs (unassisted suspicion):
• Sensitivity: 57.1% (12/21)
• Specificity: 83.2% (193/232)
• NPV: 95.5%
The AI-based tool showed high diagnostic accuracy. Its high Negative Predictive Value (NPV) suggests it could help PCPs safely identify benign lesions, potentially reducing unnecessary excisions and referrals without increasing the risk of missing melanomas.
Clinical data collected on ModelDerm​
Study | Baseline Population | Standard clinical practice or device(s)? | Objective(s) | Safety outcomes | Performance outcomes | Main conclusion
Navarrete-Dechent et al. 2021
PMID: 33049269
External validation study
Weighting from appraisal: 5
A public dataset of 100 clinical images of biopsied skin cancers (37 melanomas, 40 BCCs, 23 SCCs) from Caucasian patients in the US.
1. Han et al. (2020b) 174-disease algorithm (modelderm.com), tested with 4 upload methods.
2. Han et al. (2020a) 178-disease region-based algorithm (rcnn.modelderm.com).
To evaluate the external validity and reliability of the Han et al. (2020b) and Han et al. (2020a) algorithms on a public dataset of skin cancers.
None reported.
174-disease algorithm (intended use):
• Overall Top-1 accuracy: 39%.
• Overall Top-3 accuracy: 63%.
• Performance was sensitive to upload condition (x1 magnification was worst).
178-disease algorithm:
• Overall 'Top-any' accuracy: 52%.
The 174-disease algorithm showed modest improvement over a previous 12-disease version, but limited transportability to an external dataset remained. The 178-disease algorithm also had low sensitivity. Performance was sensitive to image magnification.
Kim et al. 2022
PMID: 35061692
Prospective controlled before-and-after study
Weighting from appraisal: 9.5
285 cases with skin neoplasms suspected of malignancy from two tertiary care centers in South Korea (Asian patients).
1. AI group (n=144): Trainee doctors (interns/residents) diagnosed, then were assisted by an AI algorithm (http://b2019.modelderm.com) and could modify their diagnosis.
2. Control group (n=141): Trainee doctors diagnosed, then reviewed photos (no AI).
To evaluate whether an AI algorithm (Model Dermatology, build 2019) improves the accuracy of nondermatologists (trainee doctors) in diagnosing skin neoplasms in a real-world setting.
None reported.
AI Group (Trainees):
• Top-1 accuracy (exact diagnosis) increased from 46.5% to 58.3% (P=.008), an increase of 11.8%.
Control Group (Trainees):
• Top-1 accuracy did not change significantly (46.1% vs. 51.8%).
In a real-world setting, AI augmented the diagnostic accuracy (for exact diagnosis) of trainee doctors (limitation: tested only in Asian patients).
Navarrete-Dechent et al. 2018
PMID: 29864435
External validation study
Weighting from appraisal: 5
100 clinical images of biopsied skin cancers (37 melanomas, 40 BCCs, 23 SCCs) from Caucasian patients in the US (ISIC Archive).
The Han et al. (2018) 12-disease classifier, tested via a public web application.
To explore the generalizability (external validity) of the Han et al. (2018) algorithm on a public dataset of skin cancers.
None reported.
Overall:
• Top-1 accuracy was 29% (29 of 100).
• Top-5 accuracy was 58% (58 of 100).
The results suggest that the sensitivity of the Han et al. algorithm, especially for melanoma, is "considerably lower" when applied to a different patient population (external dataset).
Han et al. 2020
PMID: 32243882
Retrospective validation & reader study
Weighting from appraisal: 5.5
Validation: Edinburgh dataset (1,300 images; 10 disorders) and SNU dataset (2,201 images; 134 disorders).
Reader Study: 240 SNU images tested on 47 clinicians (21 dermatologists, 26 residents) & 23 non-medical professionals.
1. AI-DSS (trained on 220,680 images of 174 disorders).
2. Clinicians (dermatologists, residents) unassisted.
3. Clinicians assisted by the AI-DSS.
To validate an algorithm for multi-class classification (134 disorders), malignancy prediction, and treatment suggestion, and to assess its ability to improve clinician performance.None reportedAI-DSS (standalone, SNU):
* Top-1 accuracy (134 classes): 44.8%.
Clinicians (AI-assisted):
* Top-1 accuracy (134 classes, 4 doctors) improved by 7.0%.
* Non-medical professionals' malignancy sensitivity improved from 47.6% to 87.5%.
The algorithm may serve as "Augmented Intelligence" that can empower medical professionals in diagnostic dermatology by improving their sensitivity and accuracy.
Muñoz-López et al. 2021
PMID: 33037709
Prospective diagnostic accuracy study
Weighting from appraisal: 7
340 consecutive cases (from 281 patients) who submitted images to a teledermatology clinic in Chile. (87 unique diagnoses, mostly inflammatory).1. AI-DSS (Han et al. 174-disease algorithm; modelderm.com) used by teledermatologist during the visit.
2. Reader study (9 providers: 3 dermatologists, 3 residents, 3 GPs) assessing images only.
To assess the diagnostic performance and potential clinical utility of the AI algorithm in a real-life telemedicine setting using patient-submitted photos.None reportedOverall Top-1 Accuracy:
* AI (41.2%) was lower than Dermatologists (60.1%), Residents (57.8%), and GPs (49.3%).
'In-distribution' Balanced Top-1 Accuracy:
* AI (47.6%) was comparable to Dermatologists (49.7%) and Residents (47.7%), and superior to GPs (39.7%).
The AI algorithm's accuracy is inferior to dermatologists for patient-submitted teledermatology images, but it shows promise as a tool for triage or as support for GPs, especially for "in-distribution" diseases.
Han et al. 2020
PMID: 33237903
Retrospective validation study
Weighting from appraisal: 5.5
10,426 biopsied cases (43 disorders; 1,222 malignant, 9,204 benign) from Severance Hospital, Korea (2008-2019). Reader test used a subset (1,320 cases).1. AI-DSS (rcnn.modelderm.com) analyzing unprocessed images.
2. Attending physicians (65) in real-world practice (with full clinical info).
3. Reader test dermatologists (44) using images only.
To compare the performance of a CNN algorithm against dermatologists in both real-world practice (with clinical info) and experimental settings (images only) for diagnosing skin neoplasms.None reportedReal-world (AI vs. Physicians with clinical info):
* AI was inferior: AUC 0.863 (95% CI: 0.852-0.875), sensitivity 62.7% (95% CI: 59.9-65.1%) and specificity 90.0% (95% CI: 89.4-90.6%), vs. physicians' sensitivity/specificity of 70.2%/95.6%.
Reader Test (AI vs. Physicians with images only):
* AI was comparable: AI sensitivity/specificity of 66.9% (95% CI: 57.7-76.0%) / 87.4% (95% CI: 82.5-92.2%) vs. readers' 65.8% (95% CI: 55.7-75.9%) / 85.7% (95% CI: 82.4-88.9%).
The algorithm diagnosed skin tumors with nearly the same accuracy as dermatologists when using only photographs (experimental setting), but its performance was inferior to physicians in real-world practice, highlighting the value of clinical information.
Han et al. 2022
PMID: 36171272
Retrospective algorithm performance study
Weighting from appraisal: 10
1. RD dataset: 1,282 images from Reddit (r/melanoma).
2. Hospital datasets: (Edinburgh, SNU, TeleDerm) for comparison.
1. AI-DSS (Model Dermatology, Build2021; 184 classes).
2. Reader study (6 general physicians, 32 laypersons) on RD dataset.
To investigate whether the algorithm (ModelDerm) can classify images from an Internet community (out-of-distribution) and compare its performance to hospital datasets (in-distribution).None reportedOn Hospital Datasets (SNU/Edinburgh):
* AI performance was equivalent to dermatologists.
On RD Dataset (Top-1 Accuracy):
* AI (39.2%) was equivalent to GPs (36.8%) and superior to laypersons (19.2%).
* AI performance degraded on inadequate-quality images (Top-1: 43.2% vs. 32.9%).
The algorithm's performance, while equivalent to dermatologists on curated clinical datasets, "deteriorated" in real-world (RD and TeleDerm) datasets due to poor image quality and out-of-distribution disorders.
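
For reference, the diagnostic metrics reported throughout the study tables above follow the standard confusion-matrix definitions, where TP, TN, FP and FN denote true/false positives and negatives, y_i is the confirmed diagnosis of case i, and Ŷ_i^(k) denotes the algorithm's k highest-ranked predictions:

```latex
\text{Sensitivity} = \frac{TP}{TP+FN}, \qquad
\text{Specificity} = \frac{TN}{TN+FP}, \qquad
\text{NPV} = \frac{TN}{TN+FN}, \qquad
\text{Top-}k\ \text{accuracy} = \frac{1}{N}\sum_{i=1}^{N} \mathbf{1}\!\left[\, y_i \in \hat{Y}_i^{(k)} \right]
```

As a worked example, in the prospective PCP study above (21 melanomas, 232 benign lesions), a sensitivity of 95.2% and a specificity of 60.3% imply TP = 20, FN = 1, TN = 140 and FP = 92, hence NPV = 140/(140+1) ≈ 99.3%, matching the reported value and illustrating why a high NPV supports safe rule-out at low melanoma prevalence.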

Results: data from registries and databases​

No registry reports were found in this literature search.

Results of the vigilance databases analysis​

The only two records found in the vigilance database searches correspond to the two medical devices registered in EUDAMED by SkinVision; these are device registrations rather than vigilance reports. Thus, no vigilance data records were identified in this vigilance search after screening.

Applicable standards​

As previously mentioned, the manufacturer already identified the applicable standards for the device under evaluation. No additional search has been conducted. The list of applicable standards is available in the "Applicable standards" section of this document.

State of the Art presentation​

Introduction to Dermatology and Clinical Challenges​

Dermatological conditions represent a significant health problem globally. The reliance of dermatology on visual diagnosis has made it a key area for the application of telehealth methods, particularly store-and-forward (SF) teledermatology (Giavina-Bianchi et al. 2020). The current landscape faces several critical challenges that affect patient access and clinical efficiency:

  1. Extended Wait Times and Access Issues: The overall challenge is minimizing the time patients wait for a dermatological appointment (Giavina-Bianchi et al. 2020). SF teledermatology has been shown to improve access to specialized care and reduce time to treatment, resulting in high patient satisfaction (Giavina-Bianchi et al. 2020; Eminovic et al. 2009).

  2. Diagnostic Accuracy and Consistency: A major objective in clinical practice is reducing unnecessary referrals while maintaining high sensitivity for malignancy detection (Giavina-Bianchi et al. 2020). Diagnostic accuracy for skin cancer remains higher for face-to-face dermatologist assessment (67% to 85%), whereas teledermatology accuracy ranges from 51% to 85% (Chen et al. 2024). Studies suggest current data are insufficient to conclude on the superiority of dermatologists or the adequacy of Primary Care Providers (PCPs) for melanoma care (Chen et al. 2001).

  3. Variability in severity assessment: Objective scoring of disease severity is crucial for longitudinal monitoring. Existing measures for conditions like Hidradenitis Suppurativa (HS) often exhibit low inter-rater reliability (Thorlacius et al. 2019). For instance, inter-rater reliability for lesion counts in HS ranged from poor for abscesses (ICC=0.07) to fair for inflammatory nodules (ICC=0.40) (Goldfarb et al. 2021); a common formulation of the intraclass correlation coefficient (ICC) is sketched after this list.
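
For context, the ICC cited above quantifies how much of the total variation in scores is attributable to real differences between patients rather than to disagreement between raters. The cited studies do not state which ICC variant they used; one common one-way random-effects formulation is:

```latex
\mathrm{ICC}(1,1) = \frac{MS_B - MS_W}{MS_B + (k-1)\,MS_W}
```

where MS_B and MS_W are the between-subject and within-subject mean squares from a one-way ANOVA and k is the number of raters per subject. Values near 0 (e.g., the 0.07 reported for abscess counts) indicate that rater disagreement dominates the measurement.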

Application of Artificial Intelligence in Dermatology​

Artificial intelligence (AI) is a rapidly emerging field in dermatology, leveraging deep learning (DL) and convolutional neural networks (CNNs) for image analysis (Baker et al. 2022). AI-guided medical devices are primarily designed to address the aforementioned challenges by serving as diagnostic decision support tools (Escalé-Besa et al. 2023; Han et al. 2020).

  • Triage and Caseload Reduction: Successful implementation has been shown to reduce the caseload for hospital specialists (Marsden et al. 2024; Han et al. 2022). A pilot study using an AI teledermatology service demonstrated a 62% reduction in the number of patients requiring an urgent face-to-face appointment with a dermatologist (Baker et al. 2022; Orekoya et al. 2021; Thomas et al. 2023). In one pathway, 19% of cases identified as benign by the AI were discharged immediately back to the General Practitioner (GP) (Baker et al. 2022).

  • Augmentation, Not Substitution: CNNs alone will not replace the contextual knowledge of dermatologists; rather, the combination of CNN and human dermatologists has the potential to improve the diagnostic accuracy of cutaneous tumors (Han et al. 2020; Ba et al. 2022). AI assistance significantly improved the sensitivity of 47 clinicians for malignancy prediction by 12.1% (Han et al. 2020). For non-medical professionals, sensitivity improved by 83.8% (Han et al. 2020).

  • Need for Real-World Validation: While diagnostic yields are high in silico, prospective studies conducted under real-life conditions utilizing non-standardized imaging are imperative for validating these tools before they are adopted into primary care (Escalé-Besa et al. 2023). A systematic review found that AI in the hands of clinicians has the potential to improve diagnostic accuracy, but noted that most studies were conducted in experimental settings, highlighting the need for future investigation in real-life settings (Krakowski et al. 2024).

Similar devices​

DERM (Deep Ensemble for Recognition of Malignancy)​

DERM is an AI-based decision support system designed to assist in the detection of skin cancer, particularly melanoma. It utilizes deep learning algorithms to analyze dermoscopic images and classify lesions according to their malignancy risk. The system has been evaluated in several clinical studies, demonstrating its potential to improve diagnostic accuracy and reduce unnecessary referrals in dermatology practice. Its key component is the AI as a medical device (AIaMD) algorithm (Marsden et al. 2024). It currently holds UK Conformity Assessed (UKCA) Class IIa approval, granted in April 2022, and CE marking as a Class III medical device under European Medical Device Regulation (MDR) 2017/745.

In a comparison study, DERM achieved a sensitivity of 91.0% for skin cancer detection, lower than the standard of care (SoC) sensitivity of 97.0%. However, DERM demonstrated a higher specificity of 80.4% compared with the SoC specificity of 71.9% (Marsden et al. 2024). This indicates that while DERM may miss some malignant cases, it is more effective at correctly identifying benign lesions, potentially reducing unnecessary biopsies and referrals, with a reduction in requested biopsies (3 for AI vs. 4.2 for SoC) suggesting improved efficiency in resource utilization in dermatology clinics (Marsden et al. 2024).

In addition, a real-world post-deployment study of DERM-vB at the UHB site confirmed a high sensitivity for melanoma detection (100.0%, 58/58 lesions) and a Negative Predictive Value (NPV) for melanoma of 100.0% (2045/2045) (Thomas et al. 2023). It also showed that the service integrating DERM achieved an overall 62% reduction in the number of patients requiring an urgent face-to-face appointment with a dermatologist (Thomas et al. 2023; Baker et al. 2022).

Several limitations are described in the study of Marsden et al. 2024. The real-world evaluation of AI lacks standardized methods. Differential verification bias is a concern in trials since ethical concerns prevent biopsy of all patients with low likelihood of cancer (Marsden et al. 2024). Additionally, the performance of DERM may vary based on the population and clinical setting and it has not been validated in phototypes V and VI, necessitating further validation across diverse cohorts to ensure generalizability (Marsden et al. 2024).

Huvy (SLC.AI)​

Huvy is an AI-powered dermatology platform developed by SLC.AI that aims to enhance skin cancer detection and diagnosis. It employs advanced machine learning algorithms to analyze skin lesion images and provide risk assessments for malignancy. Huvy is designed to assist dermatologists and primary care providers in the adjunctive assessment of cutaneous pigmented lesions (Zanchetta et al. 2025). It has received CE marking as a Class IIb medical device under European Medical Device Regulation (MDR) 2017/745.

In the study published by Zanchetta et al. (2025), the authors developed a deep learning algorithm for three-level melanoma detection (high risk, doubtful, and benign) across different dermoscopic and tele-dermatology datasets. The research included real-life pictures taken by primary care practitioners for teledermatology, aligning the study with use in non-specialist settings (Zanchetta et al. 2025).

Some limitations are described in the study of Zanchetta et al. (2025). Rigorous testing of Huvy was limited to pigmented melanomas and explicitly excluded mucosal or large lesions, tattoos, and Fitzpatrick skin types IV-VI (Zanchetta et al. 2025). The device also requires images captured by approved dermoscopic hardware systems.

SkinVision​

SkinVision is an AI-supported mobile application designed to assess skin lesions for potential malignancy. The app utilizes machine learning algorithms to analyze images of skin lesions taken by users and provides a risk assessment (Maier et al. 2014). It achieved CE marking as a Class IIa medical device under European Medical Device Regulation (MDR) 2017/745 on 5 August 2025.

Early versions of the algorithm achieved an accuracy of 81%, with a sensitivity of 73% and a specificity of 83% for melanoma detection, though dermatologists' evaluation was superior (Maier et al. 2014). Nevertheless, a newer version has demonstrated a high sensitivity of 95% for detecting skin cancer, suggesting it may be a valuable tool for early detection (Udrea et al. 2019).

One large prospective study found the app had a sensitivity of 86.6% and a specificity of 70.8% (Udrea et al. 2019). This study demonstrated performance variability by device type: the app achieved significantly higher sensitivity on iOS devices (91.0%) than on Android devices (83.0%) (p=0.02). Specificity did not differ significantly between device types (71.5% vs. 69.0%).

Sensitivity was also found to be higher for lesions in skin fold areas (92.9%) than in non-skin fold areas (84.2%) (p=0.03) (Sangers et al. 2022), as was specificity (72.0% vs. 56.5%; p=0.04) (Sangers et al. 2022). The standard approach for comparing two such proportions is sketched below.
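
The subgroup comparisons above report p-values for differences between two sensitivities or specificities. The publications do not always state the exact procedure used; a conventional choice is a two-proportion z-test, sketched below with hypothetical counts (illustrative only, not the studies' actual data):

```python
import numpy as np
from statsmodels.stats.proportion import proportions_ztest

# Hypothetical counts for illustration only -- not the actual study data.
detected = np.array([91, 83])    # malignant lesions correctly flagged (group A, group B)
assessed = np.array([100, 100])  # malignant lesions assessed per group

# Two-sided test of equal detection proportions between the two groups.
z_stat, p_value = proportions_ztest(count=detected, nobs=assessed)
print(f"z = {z_stat:.2f}, p = {p_value:.3f}")
```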

Regarding the perception of HCPs, qualitative studies have captured the real-time experiences of GPs and doctors' assistants using the app during consultations, with overall positive findings (Gregoor et al. 2023).

It is important to highlight that future research is needed to study the app's performance in diverse populations, including different skin types. Performance optimization depends on the continual availability of additional data to train the risk classification algorithm (Udrea et al. 2019).

Dermalyser​

Dermalyser is image analysis software utilizing clinically validated AI as a decision-support system for medical professionals when assessing suspected lesions for skin cancer. It is used in conjunction with a smartphone-compatible dermatoscope. It has received CE marking as a Class IIa medical device under European Medical Device Regulation (MDR) 2017/745.

When tested in a real-life primary care setting, the underlying model showed a Top-3 accuracy (75%) comparable to that of GPs (76%) for known diseases on which the algorithm had been trained (Papachristou et al. 2024). Furthermore, 92% of GPs considered it a useful diagnostic support tool for differential diagnosis.

Several limitations are described in the study of Papachristou et al. 2024. The study was conducted in Sweden, where PCPs have undergone specific training in dermatology, which may limit the generalizability of the findings to other healthcare settings (Papachristou et al. 2024). Additionally, the study did not include a control group of PCPs not using the AI tool, making it difficult to isolate the effect of the AI on diagnostic performance (Papachristou et al. 2024). The study emphasized the critical need for external testing in real-life conditions for data validation and regulation before such AI diagnostic models can be widely used in primary care (Papachristou et al. 2024).

ModelDerm​

ModelDerm (Model Dermatology) is a neural network designed to function as augmented intelligence, classifying numerous skin disorders (up to 184) and often providing multiclass classification, malignancy prediction, and treatment suggestions (Han et al. 2020). It has received CE marking as a Class I medical device under European Medical Device Regulation (MDR) 2017/745.

In the studies carried out with the device, the standalone algorithm achieved an Area Under the Curve (AUC) for malignancy detection of 0.937 on the Asian SNU dataset and 0.928 on the Caucasian Edinburgh dataset (Han et al. 2020; Krakowski et al. 2024). For multi-class classification of 134 disorders, the algorithm achieved a Top-1 accuracy of 44.8% on the SNU dataset and a Top-5 accuracy of 78.1%.
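
Since Top-1/Top-3/Top-5 accuracy is the headline metric across these multi-class studies, the following minimal sketch shows how such figures are typically computed from an algorithm's per-class scores. The function and the random example data are illustrative and do not reproduce the cited evaluations:

```python
import numpy as np

def top_k_accuracy(scores: np.ndarray, labels: np.ndarray, k: int) -> float:
    """Fraction of cases whose true class is among the k highest-scored classes.

    scores: (n_cases, n_classes) array of per-class scores or probabilities.
    labels: (n_cases,) array of integer ground-truth class indices.
    """
    # Indices of the k highest-scored classes per case; their internal
    # order does not matter for this metric.
    top_k = np.argsort(scores, axis=1)[:, -k:]
    hits = (top_k == labels[:, None]).any(axis=1)
    return float(hits.mean())

# Illustrative use with random scores over 134 classes:
rng = np.random.default_rng(0)
scores = rng.random((500, 134))
labels = rng.integers(0, 134, size=500)
print(f"Top-1: {top_k_accuracy(scores, labels, 1):.3f}")
print(f"Top-5: {top_k_accuracy(scores, labels, 5):.3f}")
```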

When assisting clinicians, the AI significantly improved their diagnostic performance. For instance, the Top-1 accuracy of clinicians improved by 7.0% when assisted by the AI, and non-medical professionals saw an improvement in malignancy sensitivity from 47.6% to 87.5% with AI assistance (Han et al. 2020; Krakowski et al. 2024).

A randomized controlled trial involving 576 cases confirmed that the AI-assisted group had a significantly higher Top-1 accuracy (53.9%) than the unaided group (43.8%, P=0.019) (Han et al. 2022). The augmentation was most pronounced for non-dermatology trainees, whose accuracy improved by 25.0%, whereas the augmentation for more experienced dermatology residents was generally non-significant (Han et al. 2022). Furthermore, for a subset of biopsied cases, the accuracy of AI-augmented trainees was comparable to that of attending dermatologists (Han et al. 2022). The system can also predict primary treatment options (e.g., steroids, antibiotics, antivirals, antifungals) with AUCs ranging from 0.828 to 0.918 (Han et al. 2020).

Finally, in an environment simulating real-world teledermatology, the algorithm was able to triage Internet community-acquired images with the same level of accuracy as general physicians (Han et al. 2022).

Several limitations have been described in the studies carried out with ModelDerm. Performance generally degrades when the algorithm is applied to real-world, diverse image types (Han et al. 2022). When tested retrospectively for external validity, its Top-1 accuracy was sometimes low (e.g., 29.7% for melanoma) (Navarrete-Dechent et al. 2020). Furthermore, performance may drop significantly when the AI's top predictions are incorrect (Han et al. 2022; Krakowski et al. 2024). Additionally, the majority of training and validation images were of Asian patients (Fitzpatrick types III/IV), necessitating further testing across various races and ethnicities (Han et al. 2020; Han et al. 2022).

Expected benefits of AI-guided medical devices in dermatology​

The expected benefits of deploying AI-guided systems in dermatology directly address the clinical challenges identified:

  • Triage and Efficiency: AI systems act as an automated clinical management tool, enabling screening and triage, thereby reducing unnecessary referrals and significantly lowering the hospital specialist caseload (Marsden et al. 2024; Baker et al. 2022). This aids in resolving the increasing burden of non-urgent referrals (Escalé-Besa et al. 2023).

  • Diagnostic Accuracy improvement: AI enhances the diagnostic performance of healthcare professionals, particularly less experienced users (non-dermatology trainees or PCPs) (Han et al. 2022). These tools expand the range of differential diagnoses considered by clinicians, providing a Top-5 list that can help broaden their diagnostic and therapeutic approaches (Escalé-Besa et al. 2023).

  • Objective severity assessment: AI provides the ability to quantify visible clinical signs (such as the intensity, count, and extent of features like erythema, scaling, and induration). This precise, objective measurement aids severity assessment and is specifically designed to facilitate the longitudinal monitoring of skin conditions, as demonstrated in the clinical validations carried out with the device.

  • Standardization and Transparency: AI facilitates the standardization of image acquisition and interpretation processes. It can provide decision support by suggesting appropriate ICD classes, assisting in the initial stages of diagnosis and treatment planning.

Hazards due to AI-Guided Medical Devices that Could be Relevant to the Device under Evaluation​

While AI-guided medical devices do not typically introduce physical hazards associated with invasive procedures, the primary risks relate to diagnostic error and system integrity. No safety data regarding hazardous events or harm to the patient/user were identified in the literature for similar AI-guided systems in dermatology. However, potential hazards for clinical decision support systems include:

  1. Misdiagnosis: A primary risk is the AI providing incorrect clinical information, resulting in a false negative (a malignant lesion classified as benign) (Krakowski et al. 2024). Faulty AI can mislead the entire spectrum of clinicians, including experts (Han et al. 2022). Hence, it is crucial that manufacturers acknowledge this risk and address it in their risk management processes, ensuring that users are aware of the AI's limitations and of the continued necessity of clinical judgment.

  2. Poor Image Quality or Artifacts: The AI relies heavily on input image quality. Suboptimal image quality, artifacts, or poor lighting can compromise device performance (Navarrete-Dechent et al. 2020). This risk is mitigated by devices providing warnings and guidance on proper image capture, and certain devices (like DERM) assess the performance of the AI-integrated service at both the lesion and case level (Thomas et al. 2023). In the case of our device, this is addressed by the image quality assessment module, which ensures that only images meeting specific quality criteria are processed by the AI algorithm (an illustrative sketch of such a quality gate follows this list).

  3. Out-of-Distribution Cases: AI models may perform poorly when encountering cases that differ significantly from the training data, such as rare conditions, images from diverse populations, real-world images, or internet community-acquired images (Han et al. 2022). This can lead to misclassification and diagnostic errors. Manufacturers should ensure that their AI systems are trained on diverse datasets and include mechanisms to identify and flag out-of-distribution cases.

  4. Equity and Bias: Continued surveillance is needed to ensure equitable access, particularly for patients with darkly pigmented skin. The exclusion of certain Fitzpatrick skin types (e.g., V and VI) in validation studies remains a persistent limitation in the field (Papachristou et al. 2024; Jain et al. 2021).
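
The following is a minimal, hypothetical sketch of the kind of image-quality gate referred to in hazard 2. The checks (a Laplacian-variance sharpness proxy and an exposure bound) and all thresholds are illustrative assumptions for this sketch; they do not describe the actual acceptance criteria of the device's image quality assessment module:

```python
import numpy as np

# Illustrative thresholds -- assumptions for this sketch, not device criteria.
MIN_SHARPNESS = 50.0          # minimum variance of the Laplacian response
MIN_MEAN, MAX_MEAN = 40, 220  # acceptable mean brightness on an 8-bit scale

def laplacian_variance(gray: np.ndarray) -> float:
    """Sharpness proxy: variance of a 3x3 Laplacian response (low = blurry)."""
    kernel = np.array([[0, 1, 0], [1, -4, 1], [0, 1, 0]], dtype=float)
    h, w = gray.shape
    response = np.zeros((h - 2, w - 2))
    for i in range(3):  # 'valid' 3x3 convolution via shifted views
        for j in range(3):
            response += kernel[i, j] * gray[i:i + h - 2, j:j + w - 2]
    return float(response.var())

def passes_quality_gate(image: np.ndarray) -> bool:
    """Accept an 8-bit grayscale image only if it is sharp and well exposed."""
    sharp = laplacian_variance(image.astype(float)) >= MIN_SHARPNESS
    well_exposed = MIN_MEAN <= image.mean() <= MAX_MEAN
    return sharp and well_exposed
```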

Benefit-Risk Profiles of Alternative AI-Guided Medical Devices​

To evaluate the state-of-the-art landscape, we examined the benefit-risk profiles of the AI-guided medical devices for diagnostic support in dermatology previously described. The following table summarizes the key benefits and risks associated with each device:

DevicePrimary benefitKey study resultsPrimary risk (and mitigation)
DERMTriage and caseload reduction62% reduction in urgent face-to-face appointments; sensitivity 91.0% and specificity 80.4% for skin cancer detection (Baker et al. 2022; Thomas et al. 2023). Achieved an NPV of 100.0% for melanoma in real-world post-deployment. A reduction in requested biopsies (3 for AI vs. 4.2 for SoC) (Marsden et al. 2024)Misdiagnosis and risk of false negatives (FN) (mitigated by clinical judgment, training, and rigorous Post-Market Surveillance (PMS))
HuvyProvides diagnostic support for melanoma through a three-level classification designed to improve referral accuracy from primary care (Zanchetta et al. 2025)High accuracy in classifying pigmented lesions; however, limited to specific lesion types and skin types (Zanchetta et al. 2025)Limited intended use (pigmented lesions only; exclusion of Fitzpatrick IV-VI). Requires specific dermoscopic hardware systems (mitigated by clear usage guidelines and clinical oversight)
SkinVisionEarly detection of skin cancer through user-friendly mobile app. High sensitivity for screening purposes and adaptability to consumer devices.Achieved 95% sensitivity for skin cancer detection (Udrea et al. 2019; Sangers et al. 2022)Performance variability based on device and use environment. Early studies showed physician diagnosis was superior to the app alone (mitigated by user education and continuous algorithm updates)
DermalyserTargeted approach to improving melanoma detection in Primary Care settings (Papachristou et al. 2024)Top-3 accuracy of 75%, comparable to GPs (76%) for known diseases; 92% of GPs found it useful (Papachristou et al. 2024)Exclusion of all non-melanoma skin cancers (BCC, SCC) and exclusion of melanin-rich skin types (V-VI) (mitigated by further validation studies in diverse settings)
ModelDermSignificant augmentation of diagnostic accuracy, especially for non-expert clinicians, across a wide variety of diseases (up to 134 disorders) (Han et al. 2020)Achieved an AUC of 0.937 for malignancy detection; significantly improved clinician diagnostic accuracy when assisted by AI (25% increase) (Han et al. 2020; Han et al. 2022)Misdiagnosis, especially in out-of-distribution cases and diverse populations. Risk of reliance on faulty predictions; an incorrect AI prediction can lead to a 12.2% drop in accuracy for trainees (mitigated by diverse training datasets and user awareness of limitations)

Discussion​

Based on the clinical data provided by the literature, the state-of-the-art demonstrates that AI-guided medical devices have successfully transitioned from in silico performance studies to impactful real-world clinical integration, significantly enhancing triage and reducing specialist caseloads (Baker et al. 2022; Thomas et al. 2023). By offering diagnostic support and objective severity assessment, these tools directly combat the structural problems of long wait times and inconsistent diagnostic accuracy between care levels (Han et al. 2022). Studies consistently show that the least experienced clinicians gain the most from AI-based support, making these tools highly valuable for augmenting Primary Care Practitioners (PCPs) (Han et al. 2022; Jahn et al. 2022).

However, the effectiveness of AI remains intrinsically linked to the operational environment. Performance generally degrades when applied to real-world, diverse image types (Han et al. 2022). This reality underscores the necessity of implementing AI as an adjunctive tool that augments, rather than replaces, human intelligence. The combination of AI and clinician expertise has been shown to yield the highest diagnostic accuracy, particularly when clinicians are aware of the AI's limitations and maintain critical oversight (Han et al. 2020; Han et al. 2022). Rigorous adherence to regulatory standards is crucial, including implementing robust Post-Market Surveillance (PMS) plans, documenting Root Cause Analysis (RCA) for possible problems detected, and providing transparency regarding algorithm characteristics to users (Thomas et al. 2023). The growing body of real-world evidence confirms that when integrated correctly and used under human clinical supervision, AI systems offer a favorable benefit-risk profile, improving access and supporting clinical decision-making across the spectrum of skin conditions.

Synthesis​

The following table provides a concise synthesis of the state-of-the-art analysis and the implications for safe clinical adoption of AI-guided dermatology tools in primary and specialist care.

AspectDetails
1. Methodological Referential for Bibliographic Search- MedDev 2.7/1 Rev.4 (applicable guidance for clinical evaluation)
- PRISMA Statement for Reporting Systematic Reviews and Meta-Analyses
2. Type of searchSystematic (documented search strategy, screening, eligibility and selection steps; audit trail available in methods section).
3. Results (bibliographic search)Source search yielded N = 228 candidate records. After de-duplication and multi-stage screening, n = 57 clinical articles were included and appraised for methodological quality and relevance. An additional n = 10 items (primarily two manuscripts, 9 guidelines and contextual documents) were referenced to inform clinical context; total material considered = 68. Breakdown used for appraisal: 58 clinical studies; 8 clinical guidelines; 0 unpublished trial reports; 0 registry reports.
4. Referential for data appraisal and weighting- IMDRF MDCE WG/N56FINAL:2019 (risk-based clinical evaluation principles)
- Internal appraisal templates informed by Yale and Johns Hopkins academic resources (see Methods)
5. Results (appraisal summary / mean weight)Appraisal summary for clinical datasets (n = 53): mean weight = 6.88 / 10.
Additional metrics: mean relevance = 4.40 / 6; mean quality = 2.47 / 4; mean level of clinical evidence = 6.3 / 10.
Note: datasets with weight < 4 require justification in the clinical evaluation file; none of the included datasets used in the main analysis had weight < 4 without documented rationale.
6. UseIntended use statement: AI-guided medical devices are intended as adjunctive clinical decision support tools to assist clinicians (primary care practitioners and dermatologists) during dermatology consultation workflows for the triage and diagnostic evaluation of skin conditions. They are not intended to replace clinician judgment. Target population: patients presenting with skin lesions or dermatological complaints across adult age groups. User training, labeling, and intended-use constraints consistent with similar devices in the literature are required.
7. Expected complicationsObserved/anticipated hazards: no direct patient harm events attributable to similar devices were identified in the reviewed clinical evidence. Principal risks to be managed: (1) reduced accuracy on heterogeneous, real-world images (dataset shift); (2) inappropriate clinician reliance on AI outputs when used without verification (automation bias); (3) false-negative results leading to missed malignancy or delayed referral; (4) false-positive results increasing unnecessary referrals/biopsies. Recommended risk controls: human-in-the-loop workflow, explicit user instructions and limitations, mandatory training, robust PMS and RCA procedures, and monitoring of real-world performance metrics.
8. Expected benefits and performancesAccess to specialist dermatology services is constrained in many health systems, with variable wait times and heterogeneous diagnostic performance between primary care practitioners (PCPs) and dermatologists. The reviewed literature confirms consistent performance gaps (PCPs show lower sensitivity than dermatologists on clinical image assessments), and that dermoscopy and specialist assessment improve diagnostic accuracy. AI tools have been studied primarily as adjuncts to clinician assessment and as standalone classifiers on curated image sets; real-world performance is commonly lower than reported in controlled datasets, underscoring the need for robust external validation and post-market surveillance.
- Clinical performance observed in reviewed literature: on curated dermoscopic test sets, standalone AI classifiers typically reported sensitivity in the approximate range 80-86% and specificity in the range 77-83%. High-quality meta-analytic evidence (systematic reviews) reports pooled sensitivity and specificity consistent with these ranges for melanoma detection using dermoscopic images; performance on clinical (unmagnified) images is lower and more variable. Comparative reader studies demonstrate that AI, when used as a diagnostic adjunct, improves clinician sensitivity and overall accuracy (for example, Maron et al. 2020 reported a clinician sensitivity increase from ~59% to ~75% with AI assistance; other reader and trial studies show improvements of similar magnitude in sensitivity and modest improvements in specificity or overall accuracy).
- Expected clinical benefits: improved detection sensitivity for malignancy (reducing missed cancers), standardization of preliminary triage decisions, support for prioritization of referrals to secondary care, potential reduction in unnecessary specialist referrals and benign biopsies when AI is combined with clinical assessment, and increased efficiency in workflows (fewer repeat assessments, faster triage). Benefits are contingent on correct deployment: appropriate external validation, integration into clinician workflows with human oversight, and active PMS to detect performance drift.
Conclusion: the evidence supports adoption as a clinician-support tool under controlled conditions and with documented risk controls; standalone use without clinician oversight is not supported by the available clinical evidence and is not recommended in the intended use statement.

References​

Abu Baker, K. et al. (2022). Using artificial intelligence to triage skin cancer referrals: outcomes from a pilot study. British Journal of Dermatology, 188(Supplement 4), ljad113.372.

Ahadi, M. S. et al. (2021). [Open access article on a specialized topic]. Journal of Otorhinolaryngology, Head and Neck Surgery.

Augustin, M. and Reusch, M. (2013). European Dermatology Health Care Survey 2013. Short Report. Hamburg: CVderm, German Center for Health Services Research in Dermatology.

Ba, W. et al. (2022). [Convolutional neural networks for cutaneous tumour classification]. European Journal of Cancer. DOI: 10.1016/j.ejca.2022.04.015.

Barata, C. et al. (2023). [A reinforcement learning model for AI based decision support in skin cancer]. Nature Medicine. DOI: 10.1038/s41591-023-02475-5.

Brinker, T. J. et al. (2019a). [Skin cancer classification using convolutional neural networks]. European Journal of Cancer. DOI: 10.1016/j.ejca.2019.04.001.

Brinker, T. J. et al. (2019b). [Superior skin cancer classification by the combination of human and artificial intelligence]. European Journal of Cancer, 119, 11–17. DOI: 10.1016/j.ejca.2019.05.023.

Burton, R. C. et al. (1998). General practitioner screening for melanoma: sensitivity, specificity, and effect of training. J Med Screen, 5, 156-161.

Chen, S. C. et al. (2001). Diagnosing and managing cutaneous pigmented lesions: primary care physicians versus dermatologists. Arch Dermatol, 137(12), 1627–1634.

Chen, S. et al. (2024). [Systematic Review of Skin Cancer Diagnosis by Clinicians]. JAMA Dermatology, 161(2). DOI: 10.1001/jamadermatol.2024.4382.

Cho, S. I. et al. (2019). [Deep learning for lip cancer diagnosis]. British Journal of Dermatology. DOI: 10.1111/bjd.18459.

Cohen, J. (1968). Weighted kappa: nominal scale agreement with provision for scaled disagreement or partial credit. Psychological Bulletin, 70(4), 213-220. DOI: 10.1037/h0026256.

Cohen, J. (1988). Statistical Power Analysis for the Behavioral Sciences (2nd ed.). Hillsdale, NJ: Lawrence Erlbaum Associates, Publishers.

Diamond, I. R., Grant, R. C., Feldman, B. M., et al. (2014). Defining consensus: a systematic review recommends methodologic criteria for reporting of Delphi studies. Journal of Clinical Epidemiology, 67(4), 401-409. DOI: 10.1016/j.jclinepi.2013.12.002.

Eminović, N. et al. (2009). Effect of patient-assisted teledermatology on outpatient referral rates. Archives of Dermatology, 145(5), 557-563.

Escalé-Besa, A. et al. (2023). Evaluation of an AI model for skin conditions in a real-life primary care setting. Scientific Reports, 13(4293). DOI: 10.1038/s41598-023-31340-1.

Evans, J. D. (1996). Straightforward statistics for the behavioral sciences. Thomson Brooks/Cole Publishing Co.

Ferris, L. K. et al. (2025). DERM-SUCCESS FDA Pivotal Study: A Multi-Reader Multi-Case Evaluation of Primary Care Physicians' Skin Cancer Detection Using AI-Enabled Elastic Scattering Spectroscopy. Journal of Primary Care & Community Health, 16. DOI: 10.1177/21501319251342106.

Fitch, K. et al. (2001). The RAND/UCLA Appropriateness Method User's Manual. Santa Monica, CA: RAND Corporation. https://www.rand.org/pubs/monograph_reports/MR1269.html.

Gerbert, B. et al. (1996). Primary care physicians as gatekeepers in managed care: primary care physicians' and dermatologists' skills at secondary prevention of skin cancer. Arch Dermatol, 132, 1030-1038.

Giavina-Bianchi, M. et al. (2020a). Benefits of Teledermatology for Geriatric Patients: Population-Based Cross-Sectional Study. Journal of Medical Internet Research, 22(4), e16700. DOI: 10.2196/16700.

Giavina-Bianchi, M. et al. (2020b). [Teletriage project from July 2017 to July 2018 in São Paulo, Brazil]. EClinicalMedicine, 29-30, 100641.

Goldfarb, N. et al. (2021). Hidradenitis Suppurativa Area and Severity Index Revised (HASI-R): psychometric property assessment. British Journal of Dermatology, 184(5), 905-912. DOI: 10.1111/bjd.19565.

Gregoor, A. S. et al. (2023). The impact of an artificial intelligence-based app on healthcare consumption: results of the SPOT cluster randomized controlled trial. eClinicalMedicine, 60, 102019. DOI: 10.1016/j.eclinm.2023.102019.

Haenssle, H. A. et al. (2018). Man against machine: Diagnostic performance of a deep learning convolutional neural network for dermoscopic melanoma recognition in comparison to 58 dermatologists. Annals of Oncology, 29(8), 1836-1842. DOI: 10.1093/annonc/mdy166.

Han, S. S. et al. (2018). Classification of the clinical images for benign and malignant cutaneous tumors using a deep learning algorithm. Journal of Investigative Dermatology, 138(7), 1529–1538.

Han, S. S. et al. (2020). Augmented Intelligence Dermatology in Classifying 134 Skin Disorders. Journal of Investigative Dermatology, 140(8), 1756-1762. DOI: 10.1016/j.jid.2020.01.019.

Han, S. S. et al. (2020). Assessment of deep neural networks for the diagnosis of benign and malignant skin neoplasms in comparison with dermatologists: A retrospective validation study. PLOS Medicine, 17(11), e1003381. DOI: 10.1371/journal.pmed.1003381.

Han, S. S. et al. (2022). Evaluation of Artificial Intelligence-Assisted Diagnosis of Skin Neoplasms: A Single-Center, Paralleled, Unmasked, Randomized Controlled Trial. Journal of Investigative Dermatology, 142(9), 2353–2362. DOI: 10.1016/j.jid.2022.02.003.

Han, S. S. et al. (2022). Clinical utility of an artificial intelligence-based decision support system for skin cancer in non-dermatologist reader tests using real-world data. Scientific Reports, 12(16260). DOI: 10.1038/s41598-022-20632-7.

Hsiao, J. L. et al. (2008). Impact of teledermatology on outpatient care and referrals. Journal of the American Academy of Dermatology, 59(3), 448-453.

Jahn, A. S. et al. (2022). Melanoma Detection by a Deep Learning Convolutional Neural Network on Clinical Images: An Analysis of Potential Clinical Use. Cancers, 14(15), 3829.

Jain, A. et al. (2021). Development and assessment of an artificial intelligence-based tool for skin condition diagnosis by primary care physicians and nurse practitioners in teledermatology practices. JAMA Network Open, 4(4), e217249. DOI: 10.1001/jamanetworkopen.2021.7249.

Kheterpal, M. et al. (2023). Teledermatology (TD) is an evidence-based practice that may increase access to dermatologic care. [Manuscript on implementation of hybrid TD program]. (Preprint). DOI: 10.21203/rs.3.rs-2558425/v1.

Kim, Y. J. et al. (2022). Augmenting the accuracy of trainee doctors in diagnosing skin lesions suspected of skin neoplasms in a real-world setting: A prospective controlled before-and-after study. PLOS ONE, 17(1), e0260895. DOI: 10.1371/journal.pone.0260895.

Knol, A. et al. (2006). The value of teledermatology for the decision to refer to a dermatologist: a randomized controlled trial. Journal of Telemedicine and Telecare, 12(2), 74-79.

Krakowski, I. et al. (2024). The diagnostic accuracy of artificial intelligence-assisted skin cancer detection: a systematic review and meta-analysis. npj Digital Medicine, 7(78). DOI: 10.1038/s41746-024-01031-w.

Landis, J. R. and Koch, G. G. (1977). The measurement of observer agreement for categorical data. Biometrics, 33(1), 159-174.

Lee, S. P. et al. (2020). Augmented decision-making improves the diagnostic performance of clinicians. [Manuscript on CNNs for skin lesions].

Liu, Y. et al. (2020). A deep learning system for differential diagnosis of skin diseases. Nature Medicine, 26(6), 900-908. DOI: 10.1038/s41591-020-0842-3.

Maier, T. et al. (2015). Accuracy of a smartphone application using fractal image analysis of pigmented moles compared to clinical diagnosis and histological result. Journal of the European Academy of Dermatology and Venereology, 29(4), 663-667. DOI: 10.1111/jdv.12648.

Marchetti, M. A. et al. (2019). Computer Algorithms Show Potential for Improving Dermatologists' Accuracy to Diagnose Cutaneous Melanoma; Results of ISIC 2017. Journal of the American Academy of Dermatology, 82(2), 270-277. DOI: 10.1016/j.jaad.2019.07.016.

Maron, R. C. et al. (2019). Evaluation of an artificial intelligence-based decision support system for the detection of melanoma in daily clinical practice. European Journal of Cancer, 119, 57-65. DOI: 10.1016/j.ejca.2019.06.028.

Maron, R. C. et al. (2020). Human-Artificial Intelligence Collaboration in the Diagnostic Process of Pigmented Skin Lesions: Impact on Confidence and Management. Journal of Medical Internet Research, 22(9), e18091. DOI: 10.2196/18091.

Marsden, H. et al. (2024). Accuracy of an artificial intelligence as a medical device as part of a UK-based skin cancer teledermatology service. Frontiers in Medicine, 11, 1302363. DOI: 10.3389/fmed.2024.1302363.

Millien, C., Chaput, H. and Cavillon, M. (2018). La moitié des rendez-vous sont obtenus en 2 jours chez le généraliste, en 52 jours chez l'ophtalmologiste. Études & Résultats, No. 1085. Paris: DREES (Direction de la Recherche, des Études, de l'Évaluation et des Statistiques).

Ministerio de Sanidad (2025). Sistema de Información sobre Listas de Espera en el Sistema Nacional de Salud (SISLE-SNS): Situación a 30 de junio de 2025. Madrid: Gobierno de España.

Morton, C. A. et al. (2011). Community photo-triage for skin cancer referrals: an aid to service delivery. Clinical and Experimental Dermatology, 36(3), 248-254. DOI: 10.1111/j.1365-2230.2010.03960.x.

Muñoz-López, C. et al. (2021). Performance of a deep neural network in teledermatology: a single-centre prospective diagnostic study. Journal of the European Academy of Dermatology and Venereology, 35(2), 546-553. DOI: 10.1111/jdv.16855.

Navarrete-Dechent, C. et al. (2018). Automated Dermatological Diagnosis: Hype or Reality? Journal of Investigative Dermatology, 138(10), 2277-2279.

Navarrete-Dechent, C. et al. (2020b). ModelDerm algorithm performance in a telemedicine setting. Journal of the European Academy of Dermatology and Venereology.

Navarrete-Dechent, C. et al. (2020c). Multiclass Artificial Intelligence in Dermatology: Progress but Still Room for Improvement. Journal of Investigative Dermatology, 141(5), 1325-1328. DOI: 10.1016/j.jid.2020.06.040.

Orekoya, O. et al. (2021). 'To see or not to see?' That is the question: teleconsultations in primary care and the impact on 2-week-wait referrals and outcomes. British Journal of Dermatology, 185(Supplement 1), 179.

Papachristou, P. et al. (2024). Evaluation of an artificial intelligence-based decision support for the detection of cutaneous melanoma in primary care: a prospective real-life clinical trial. British Journal of Dermatology, 191(1), 125-133. DOI: 10.1093/bjd/ljae021.

Phillips, M. et al. (2019). Assessment of accuracy of an artificial intelligence algorithm to detect melanoma in images of skin lesions. JAMA Network Open, 2(10), e1913436. DOI: 10.1001/jamanetworkopen.2019.13436.

Sangers, T. E. et al. (2022). Validation of a Smartphone Application for Risk Assessment of Pigmented Skin Lesions in a Population-Based Setting. Dermatology, 238(4), 649-656. DOI: 10.1159/000520474.

Smak Gregoor, A. M. et al. (2024). The value of an AI-based smartphone application on health care resource utilisation: a case-control study. npj Digital Medicine, 7(90). DOI: 10.1038/s41746-023-00831-w.

Thomas, L. et al. (2023). Real-world post-deployment performance of a novel machine learning-based digital health technology for skin lesion assessment and suggestions for post-market surveillance. Frontiers in Medicine, 10, 1264846. DOI: 10.3389/fmed.2023.1264846.

Thissen, M. et al. (2017). mHealth app for risk assessment of pigmented and nonpigmented skin lesions - a study on sensitivity and specificity in detecting malignancy. Telemedicine and e-Health, 23(12), 948-954.

Thorlacius, L. et al. (2019). Inter-rater agreement and reliability of outcome measurement instruments and staging systems used in hidradenitis suppurativa. British Journal of Dermatology, 181(3), 483-491.

Tschandl, P. et al. (2019). Comparison of the accuracy of human readers versus machine-learning algorithms for pigmented skin lesion classification: an open, web-based, international, diagnostic study. The Lancet Oncology, 20(7), 938-947. DOI: 10.1016/S1470-2045(19)30333-X.

Tschandl, P. et al. (2020). Human–computer collaboration for skin cancer recognition. Nature Medicine, 26(8), 1229-1234. DOI: 10.1038/s41591-020-0942-0.

Udrea, A. et al. (2020). Accuracy of a smartphone application for triage of skin lesions based on machine learning algorithms. Journal of the European Academy of Dermatology and Venereology, 34(3), 648–655. DOI: 10.1111/jdv.15933.

Whited, J. D. (2015). Teledermatology. Medical Clinics of North America, 99(6), 1365-1379. DOI: 10.1016/j.mcna.2015.07.005.

Zanchetta, M. et al. (2025). Performance of a Deep Learning Algorithm for Melanoma Classification Across Diverse Dermoscopic and Tele-Dermatology Datasets. JEADV Clinical Practice. DOI: 10.1002/jvc2.70191.

Literature search and publications​

Literature search performed for the state-of-the-art review​

Search traceability​

A complete audit trail of the literature search is provided in the document "SOTA_Literature search.xlsx". This file documents the complete traceability of all queries, the selection process, and the specific reasons for exclusions. The document contains the following tables:

  • Results: This comprehensive sheet details the screening process for every item retrieved. Each entry (identified by a unique DOI or PMID) includes the following information:

    • The query number that retrieved the article.
    • Bibliographic data: Title, authors, journal, publication year and the abstract.
    • A duplicate column, marked "Yes" or "No".
    • The outcome of the selection process at each stage (title, abstract, and full-text review), indicating whether the article was "selected" or "excluded".
    • For excluded articles, the specific reason for exclusion is provided, cross-referencing the selection criteria from previous sections.
    • The appraisal against each inclusion criterion for selected manuscripts.
  • Additional records: This sheet lists records (manuscripts or guidelines) that were added manually. These publications were included because they were deemed highly relevant and consistent with the research objectives outlined in the section Objectives of the literature search.

Search traceability (vigilance data)​

All queries are presented in the section Vigilance databases, and the searches for vigilance data were performed according to them.

Retained clinical data​

All PDF files of the retained clinical data are available in the document “Clinical data SotA Legit.Health Plus”.

To facilitate data identification, each PDF file has been named using a consistent nomenclature: the surname of the first author, "et al.", and the year of publication.

Signature meaning

The signatures for the approval process of this document can be found in the verified commits at the repository for the QMS. As a reference, the team members who are expected to participate in this document and their roles in the approval process, as defined in Annex I Responsibility Matrix of the GP-001, are:

  • Author: Team members involved
  • Reviewer: JD-003, JD-004
  • Approver: JD-005