
R-TF-015-011 State of the Art Legit.Health Plus

Objectives and Scope​

Scope​

This state-of-the-art document is established in the framework of the clinical evaluation of the Legit.Health Plus medical device (hereinafter, "the device"). Therefore, it aims to specify the clinical background and current knowledge and to establish the state of the art for the current clinical practice and medical devices used in dermatology.

Objectives​

The device is a computational software-only medical device leveraging computer vision algorithms to process images of the epidermis, the dermis and its appendages, among other skin structures, enhancing the efficiency and accuracy of care delivery by providing:

  • an interpretative distribution representation of possible International Classification of Diseases (ICD) categories that might be represented in the pixel content of the image,
  • quantifiable data on the intensity, count and extent of clinical signs such as erythema, desquamation, and induration, among others.

Therefore, the following needs to be discussed:

  • The basics of clinical workflow in dermatology (medical care in primary care, referral to dermatology or monitoring in primary care, consultation in dermatology).
  • Use of AI-powered medical devices for diagnostic support in dermatological clinical practice.
  • Analysis of similar devices.
  • Expected use, safety, performance, and benefits of such software.

Applicable standards and guidelines​

The clinical evaluation of the device will be performed according to the relevant legal framework and following the applicable and established standards described in the following table.

| Identification of the Standard | Domain | Compliance information | Description of deviations | Evidence |
| --- | --- | --- | --- | --- |
| ISO 13485:2016 | Medical devices - Quality management systems. Requirements for regulatory purposes | Full application | | BSI Certification ISO 13485 |
| IEC 62304:2006/A1:2015 | Medical device software - Software life cycle processes | Full application | | R-TF-001-005 List of applicable standards and regulations |
| IEC 82304-1:2016 | Health software - Part 1: General requirements for product safety | Full application | | R-TF-001-005 List of applicable standards and regulations |
| ISO 14155:2020 | Clinical investigation of medical devices for human subjects - Good clinical practice | Full application | | R-TF-001-005 List of applicable standards and regulations |
| ISO 14971:2019 | Medical devices - Application of risk management to medical devices | Full application | | R-TF-001-005 List of applicable standards and regulations |
| ISO 15223-1:2021 | Medical devices - Symbols to be used with medical device labels, labelling and information to be supplied | Full application | | R-TF-001-005 List of applicable standards and regulations |
| ISO/TR 24971:2020 | Medical devices - Guidance on the application of ISO 14971 | Full application | | R-TF-001-005 List of applicable standards and regulations |
| IEC 62366-1:2015/A1:2020 | Medical devices - Part 1: Application of usability engineering to medical devices | Full application | | R-TF-001-005 List of applicable standards and regulations |
| IEC 81001-5-1:2021 | Health software and health IT systems safety, effectiveness and security - Part 5-1: Security - Activities in the product life cycle | Full application | | R-TF-001-005 List of applicable standards and regulations |
| ISO 27001:2022 | Information security, cybersecurity and privacy protection - Information security management systems - Requirements | Partial application | We comply only with the applicable part of the standard | R-TF-001-005 List of applicable standards and regulations |
| ISO 27002:2022 | Information security, cybersecurity and privacy protection - Information security controls | Partial application | We comply only with the applicable part of the standard | R-TF-001-005 List of applicable standards and regulations |
| FDA GMLP 2021 | Good machine learning practice for MD development: guiding principles | Full application | | R-TF-001-005 List of applicable standards and regulations |
| FDA AI/ML Framework 2019 | Proposed regulatory framework for modifications to AI/ML-based SaMD | Full application | | R-TF-001-005 List of applicable standards and regulations |

A literature search for guidelines will be performed in Google and PubMed using the search terms "ICD-11 disease of skin guideline", in order to find medical guidelines related to the ICD-11 classification of dermatological diseases.

Literature Search​

Literature Search Plan​

Literature Search Strategy​

The bibliographic search for the state of the art was performed according to the requirements of Regulation (EU) 2017/745 and following the MEDDEV 2.7/1 rev. 4 (June 2016) guidelines. The search for relevant publications started with the definition of criteria regarding the patient population, the clinical indication, the specificities of the product, and the measurable outcomes. These criteria were written in natural language, distinguishing inclusion from exclusion criteria. The objectives of the literature search are presented in the sections below.

All searches performed are described below. These include searches of literature databases and vigilance databases, and a review of the national registries available for the concerned medical field. The keywords used to query the databases were selected taking into account the criteria previously defined. This report provides, for each search, the queries formulated for each database, the number of matching records for each query, and the date on which the query was run.

Evaluator in charge of the searches​

The Evaluator who performed the searches on 15th July 2025 is:

  • Mr. Jordi Barrachina, PhD - Clinical Research Coordinator (Legit.Health) (CV available in Annexes).

Sources​

To characterize the current state of the art in the corresponding medical field, the following aspects and information will be reviewed:

  • Applicable standards and guidance documents.
  • Information relating to the current situation in the medical field in which the device is used.
  • Benchmark devices and other devices available on the market.

The CER shall contain a thorough state-of-the-art review to analyze and assess the benefit-risk profile of currently available methods for the various indications and for the device's intended purpose. An objective, comprehensive literature review will be performed to identify, select, and collect the relevant literature to determine whether the device offers safe and effective performance for its intended purpose. The review will focus on: data relevant to the device under evaluation; data on the current situation in standard clinical practice; data relevant to the intended purpose of similar devices; and claimed performance and safety data (including incidents and contraindications).

Identification of relevant medical conditions/medical fields concerned​

The device is intended to support healthcare providers in the assessment of skin structures, enhancing the efficiency and accuracy of care delivery, by providing: quantification of the intensity, count, and extent of visible clinical signs; and an interpretative distribution representation of possible International Classification of Diseases (ICD) classes.

Therefore, the medical conditions identified are all skin diseases listed and described in the ICD-11 (code 14).

Systematic Literature search for SOTA description​

Following section A5 of the MEDDEV 2.7/1 rev. 4 guide, the literature search will be conducted to complete the state of the art of the device, using the PICO methodology (Patient characteristics, type of Intervention, Control, and relevant Outcomes).

Data search question using PICO methodology​

As part of the literature search strategy, the PICO method was used to establish the algorithms subsequently. The PICO method is a format used for the development of appropriate clinical questions, consisting of answering the following questions to establish the search keywords:

  • P (Problem/Patient/Population): Who are the users, patients or affected population?
  • I (Intervention/indicator): What is the management strategy for the identified population?
  • C (Comparator): What is the alternative to the proposed intervention?
  • O (Outcomes): What are the relevant outcomes to be measured?

The choice of keywords for implementing the PICO methodology is based on the intended purpose and medical condition of the device. In this way, the selection of relevant articles from references identified in the databases is based on the research objective described in the table below.

P (Problem/Patient/Population)
  • Inclusion: Patients with visible skin structure abnormalities; skin diseases listed in ICD-11 code 14; across all age groups, skin types, and demographics. Users: Healthcare Professionals (HCPs) such as dermatologists, General Practitioners (GPs) and IT professionals.
  • Exclusion (wrong type of population):
    - Animals.
    - Studies focused on non-dermatological pathologies.

I (Intervention/indicator)
  • Inclusion: Use of a computational software-only medical device (SaMD) that processes images of skin structures to provide clinical data for aiding practitioners in skin assessments. Data related to standard clinical practice in dermatology and traditional diagnostic methods without technological assistance.
  • Exclusion: Interventions not related to the device's intended use or medical indication.

C (Comparator and type of studies)
  • Inclusion: Other smartphone applications: SkinVision, Molescope, Huvy and DERM. Traditional methods of clinical skin examination without software assistance, and non-software-based skin assessments by healthcare professionals (i.e., Standard of Care). Types of studies:
    - Meta-analyses
    - Literature reviews and systematic reviews
    - Case series and cohort studies
    - Clinical studies (randomised or not, multicentric or not, prospective or retrospective)
    - Clinical guidelines or guidelines elaborated by scientific societies
  • Exclusion (wrong comparator and studies):
    - Non-clinical comparators (e.g., comparison against another algorithm only).
    - Purely in silico or in vitro validation studies without clinical practice data.
    - Case reports that do not provide new information on risks or performance.
    - Non-peer-reviewed literature (e.g., opinion articles, blog posts).
    - Studies providing no clinical results (e.g., protocols).

O (Outcomes)
  • Inclusion: Improved efficiency and accuracy in clinical decision-making for skin disease assessment or malignancy detection; support in diagnosis through interpretative data and quantification. Optimisation of the clinical workflow through reduction of unnecessary referrals from primary care to dermatology; reduction of cumulative waiting time to see the dermatologist face-to-face. Safety data (e.g., incorrect performance, failure of interoperability, inputs without sufficient quality).
  • Exclusion (wrong objectives):
    - Non-clinical outcomes (e.g., technical algorithm testing).
    - Datasets not discussing the use, safety, performance, or benefits of the device.
    - Data focused only on drugs.
    - Topics too specific (i.e., datasets dealing with a particular subject and deemed irrelevant for the description of the state of the art).

Generation of keywords and algorithms for bibliographic search​

Based on the terms described using the PICO methodology, the following search terms or keywords have been defined.

P (Problem/Patient/Population)
  • Description: Patients with visible skin structure abnormalities; skin diseases listed in ICD-11 code 14; across all age groups, skin types, and demographics. Users: Healthcare Professionals (HCPs) such as dermatologists, Primary Care Practitioners (PCPs) and IT professionals.
  • Keywords/terms: "skin cancer", "epidermis", "chronic skin conditions", "skin conditions", "inflammatory skin diseases", "malignant skin lesions", "melanoma", "acne", "psoriasis", "dermatofibroma", "dermatosis".
  • Algorithm: ("skin cancer" OR "epidermis" OR "chronic skin conditions" OR "skin conditions" OR "inflammatory skin diseases" OR "malignant skin lesions" OR "melanoma" OR "acne" OR "psoriasis" OR "dermatofibroma" OR "dermatosis")

I (Intervention/indicator)
  • Description: Use of a computational software-only medical device (SaMD) that processes images of skin structures to provide clinical data for aiding practitioners in skin assessments. Data related to standard clinical practice in dermatology and traditional diagnostic methods without technological assistance.
  • Keywords/terms: "AI-powered dermatology tools", "computer vision in dermatology", "smartphone", "dermatology software", "skin image analysis", "dermatology diagnostic support", "digital dermatology tools".
  • Algorithm: ("AI-powered dermatology tools" OR "computer vision in dermatology" OR "smartphone" OR "dermatology software" OR "skin image analysis" OR "dermatology diagnostic support" OR "digital dermatology tools")

C (Comparator and type of studies)
  • Description: Other smartphone applications: SkinVision, Molescope, Huvy and DERM. Traditional methods of clinical skin examination without software assistance, and non-software-based skin assessments by healthcare professionals (i.e., Standard of Care). Types of studies: meta-analyses; literature reviews and systematic reviews; case series and cohort studies; clinical studies (randomised or not, multicentric or not, prospective or retrospective); clinical guidelines or guidelines elaborated by scientific societies.
  • Keywords/terms: "standard of care", "traditional dermatology assessment", "clinical skin examination", "dermatology guidelines", "clinical studies in dermatology", "SkinVision", "Huvy", "DERM", "artificial intelligence", "machine learning", "deep learning", "computer vision", "deep neural networks", "metaoptima", "clinical exam", "visual inspection", "manual assessment".
  • Algorithm: ("standard of care" OR "traditional dermatology assessment" OR "clinical skin examination" OR "dermatology guidelines" OR "clinical studies in dermatology" OR "SkinVision" OR "Huvy" OR "DERM" OR "artificial intelligence" OR "machine learning" OR "deep learning" OR "computer vision" OR "deep neural networks" OR "metaoptima" OR "clinical exam" OR "visual inspection" OR "manual assessment")

O (Outcomes)
  • Description: Improved efficiency and accuracy in clinical decision-making for skin disease assessment or malignancy detection; support in diagnosis through interpretative data and quantification. Optimisation of the clinical workflow through reduction of unnecessary referrals from primary care to dermatology; reduction of cumulative waiting time to see the dermatologist face-to-face. Safety data (e.g., incorrect performance, failure of interoperability, inputs without sufficient quality).
  • Keywords/terms: "diagnostic accuracy", "clinical decision support", "efficiency in dermatology", "referral reduction", "waiting time reduction", "safety of dermatology software", "performance of AI in dermatology".
  • Algorithm: ("diagnostic accuracy" OR "clinical decision support" OR "efficiency in dermatology" OR "referral reduction" OR "waiting time reduction" OR "safety of dermatology software" OR "performance of AI in dermatology")

By combining the four elements of the PICO method, the final search algorithm was obtained:

("skin cancer" OR "epidermis" OR "chronic skin conditions" OR "skin conditions" OR "inflammatory skin diseases" OR "malignant skin lesions" OR "melanoma" OR "acne" OR "psoriasis" OR "dermatofibroma" OR "dermatosis") AND ("AI-powered dermatology tools" OR "computer vision in dermatology" OR "smartphone" OR "dermatology software" OR "skin image analysis" OR "dermatology diagnostic support" OR "digital dermatology tools") AND ("standard of care" OR "traditional dermatology assessment" OR "clinical skin examination" OR "dermatology guidelines" OR "clinical studies in dermatology" OR "SkinVision" OR “Huvy” OR “DERM” OR "artificial intelligence" OR "machine learning" OR "deep learning" OR "computer vision" OR "deep neural networks" OR "metaoptima" OR "clinical exam", "visual inspection" OR "manual assessment") AND ("diagnostic accuracy" OR "clinical decision support" OR "efficiency in dermatology" OR "referral reduction" OR "waiting time reduction" OR "safety of dermatology software" OR "performance of AI in dermatology")

Bibliographic search strategy for determining the state of the art​

Guidelines and recommendations​

The following databases have been reviewed in order to find relevant guidelines or recommendations concerning the application of AI in dermatology or standard clinical practice:

  • MEDLINE PubMed: https://www.ncbi.nlm.nih.gov/pubmed/
  • U.S. Food and Drug Administration (FDA): https://www.fda.gov/regulatory-information/
  • Sociedad Española de Dermatología y Venereología: https://aedv.es/guias-para-pacientes-2/
  • European Academy of Dermatology and Venereology: https://eadv.org/publications/clinical-guidelines/
  • American Academy of Dermatology: https://www.aad.org/member/clinical-quality/guidelines

All the following searches have been conducted by Mr. Jordi Barrachina (Legit.Health) as described below and without deviation.

| Database | Keywords / terms | Filters / limitations | Records |
| --- | --- | --- | --- |
| MEDLINE PubMed | ("skin cancer" OR "epidermis" OR "chronic skin conditions" OR "skin conditions" OR "inflammatory skin diseases" OR "malignant skin lesions" OR "melanoma" OR "acne" OR "psoriasis" OR "dermatofibroma" OR "dermatosis") AND ("software" OR "digital imag*" OR "smartphone" OR "web application") AND ("artificial intelligence" OR "machine learning" OR "deep learning" OR "computer vision" OR "deep neural networks" OR "metaoptima") | Period of search: the last 10 years (2015/07/15 to 2025/07/15); Species: Humans; Language: English; Text availability: full text available; Article type: "Guidelines", "Practice Guidelines" | 0 |
| FDA | "dermatology guidelines", "AI in dermatology", "machine learning dermatology", "digital health dermatology" | Topic: Clinical-Medical | 0 |
| Sociedad Española de Dermatología y Venereología | No specific keywords used | No specific limitations used | 0 |
| European Academy of Dermatology and Venereology | No specific keywords used | No specific limitations used | 1 |
| American Academy of Dermatology | No specific keywords used | No specific limitations used | 2 |

In addition, guidelines can be added manually if they are deemed relevant and consistent with the research objectives presented in the previous sections. Such guidelines may result from systematic research carried out in the past, be identified within the selected articles, or simply be published by scientific societies.

Clinical Papers​

To perform the search, sources of information from scientific literature databases such as PubMed and Cochrane Library will be consulted, along with ClinicalTrials.gov.

PubMed: a free search engine for the MEDLINE database of references and abstracts on life sciences and biomedical topics, widely considered the most comprehensive and well-organized such database. The US National Library of Medicine (NLM) at the National Institutes of Health maintains the database as part of its information retrieval system. MEDLINE indexes about 5,200 journals published in the United States and in more than 70 other countries, from 1966 to the present. Each PubMed record is assigned a unique identifier, the PMID (PubMed Identifier).

The search filters to be applied in PubMed are as follows (an illustrative query sketch follows the list):

  • Text availability: "abstract", "full-text".
  • Species: humans
  • Publication date: 10 years (15-07-2015 to 15-07-2025)
  • Article Language: English
  • The full search algorithm is: ("skin cancer" OR "epidermis" OR "chronic skin conditions" OR "skin conditions" OR "inflammatory skin diseases" OR "malignant skin lesions" OR "melanoma" OR "acne" OR "psoriasis" OR "dermatofibroma" OR "dermatosis") AND ("AI-powered dermatology tools" OR "computer vision in dermatology" OR "smartphone" OR "dermatology software" OR "skin image analysis" OR "dermatology diagnostic support" OR "digital dermatology tools") AND ("standard of care" OR "traditional dermatology assessment" OR "clinical skin examination" OR "dermatology guidelines" OR "clinical studies in dermatology" OR "SkinVision" OR "Huvy" OR "DERM" OR "artificial intelligence" OR "machine learning" OR "deep learning" OR "computer vision" OR "deep neural networks" OR "metaoptima" OR "clinical exam" OR "visual inspection" OR "manual assessment") AND ("diagnostic accuracy" OR "clinical decision support" OR "efficiency in dermatology" OR "referral reduction" OR "waiting time reduction" OR "safety of dermatology software" OR "performance of AI in dermatology").
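
For reproducibility, a query of this kind can also be submitted programmatically through the NCBI E-utilities, for example via Biopython's Entrez module. The following is a minimal sketch, not the procedure actually used for this report; the e-mail address and the result cap are placeholders, the query is abbreviated, and the bracketed filter tags only approximate the PubMed sidebar filters.

```python
from Bio import Entrez  # Biopython

Entrez.email = "evaluator@example.com"  # placeholder; NCBI requires a contact address

# Abbreviated query; the full algorithm is given above.
query = (
    '("skin cancer" OR "melanoma" OR "psoriasis") '
    'AND ("smartphone" OR "dermatology software" OR "skin image analysis") '
    'AND ("artificial intelligence" OR "deep learning") '
    'AND ("diagnostic accuracy" OR "clinical decision support") '
    'AND english[lang] AND humans[mh]'  # language and species filters
)

handle = Entrez.esearch(
    db="pubmed",
    term=query,
    datetype="pdat",        # filter on publication date
    mindate="2015/07/15",
    maxdate="2025/07/15",
    retmax=500,             # placeholder cap on returned PMIDs
)
result = Entrez.read(handle)
handle.close()

print("Matching records:", result["Count"])
print("First PMIDs:", result["IdList"][:10])
```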

Similar devices​

The selection of similar devices for the purpose of this clinical evaluation is based on a rigorous assessment of equivalence in accordance with the requirements of Regulation (EU) 2017/745 (MDR) and the principles outlined in guidance document MDCG 2020-5.

A device is considered equivalent to our device only if sufficient similarity is demonstrated across the following three characteristics:

  • Technical: The device must have a similar design, underlying technology (e.g., AI algorithms), performance specifications, and deployment method.
  • Biological: As a Software as a Medical Device (SaMD) with no physical patient contact, this characteristic is confirmed by the absence of patient-contacting materials and is therefore not applicable.
  • Clinical: The device must be used for the same medical purpose and clinical condition, in a similar patient population, by a similar user profile, and demonstrate a comparable safety and clinical performance profile.

Only devices that meet these criteria for technical and clinical equivalence are considered 'similar devices', and their data is leveraged in this clinical evaluation. In this way, the following medical devices similar to our device have been identified.

| Device name | Manufacturer name | Targeted medical conditions | CE Marking |
| --- | --- | --- | --- |
| SkinVision | SkinVision B.V. | Skin cancer detection (melanoma, basal cell carcinoma, squamous cell carcinoma) | Yes |
| Molescope | MetaOptima Technology Inc. | Mole imaging, other skin conditions like acne, eczema, psoriasis | Yes (MDD) |
| MoleMapper | Oregon Health & Science University Apps | Melanoma detection | Not found |
| Huvy | Huvy SAS | Melanoma detection | Yes |
| DERM | Skin Analytics | Skin cancer detection (melanoma, basal cell carcinoma, squamous cell carcinoma) | Yes |
| Dermalyser | AI Medical Technology | Melanoma detection | Yes |
| FotoFinder | FotoFinder Systems GmbH | Skin cancer detection, other skin conditions | Yes |
| ModelDerm | Iderma Inc | Skin lesion recognition | No |

Results from initial queries​

All the following searches have been conducted by Mr. Jordi Barrachina (Legit.Health) on July 15, 2025, as described below and without deviation.

| # | Database | Data related to | Keywords/terms | Filters / limitations | Records |
| --- | --- | --- | --- | --- | --- |
| 01 | MEDLINE PubMed | Medical field | ("skin cancer" OR "epidermis" OR "chronic skin conditions" OR "skin conditions" OR "inflammatory skin diseases" OR "malignant skin lesions" OR "melanoma" OR "acne" OR "psoriasis" OR "dermatofibroma" OR "dermatosis") AND ("AI-powered dermatology tools" OR "computer vision in dermatology" OR "smartphone" OR "dermatology software" OR "skin image analysis" OR "dermatology diagnostic support" OR "digital dermatology tools") AND ("standard of care" OR "traditional dermatology assessment" OR "clinical skin examination" OR "dermatology guidelines" OR "clinical studies in dermatology" OR "SkinVision" OR "Huvy" OR "DERM" OR "Molescope" OR "Dermalyser" OR "FotoFinder" OR "MoleMapper" OR "artificial intelligence" OR "machine learning" OR "deep learning" OR "computer vision" OR "deep neural networks" OR "metaoptima" OR "clinical exam" OR "visual inspection" OR "manual assessment") AND ("diagnostic accuracy" OR "clinical decision support" OR "efficiency in dermatology" OR "referral reduction" OR "waiting time reduction" OR "safety of dermatology software" OR "performance of AI in dermatology") | Period of search: the last 10 years (2015/07/15 to 2025/07/15); Species: Humans; Language: English; Text availability: full text available; Article type: Reviews, Systematic reviews, Meta-analyses, Case series and cohort studies, Clinical studies (randomised or not, multicentric or not, prospective or retrospective) | 227 |
| 02 | Cochrane Library | Medical field | Same as above | No filter available (no results) | 0 |

Exclusion criteria

In addition to the exclusion criteria mentioned in the section Generation of keywords and algorithms for bibliographic search, the following criteria linked to the limitations of the search have been used when needed: "wrong language" (publications not available in English) and "not available data". If the search retains a large number of publications on the same subject, results published more than 5 years ago may be excluded (reason for exclusion: "repetitive publications").

Duplicates will be identified using the unique references of the article (PMID, Cochrane IDU, DOI, and ClinicalTrials.gov Identifier). For publications that have no unique identifier, duplicates will be identified mainly using the title, the authors, and the source of the document. Articles can also be added manually if they are deemed relevant and consistent with the research objectives as presented in section 2.2.2. These publications can be the result of systematic research carried out in the past or simply identified within the selected articles.
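
As an illustration of this deduplication rule (prefer unique identifiers; fall back to title, authors, and source), a minimal sketch follows. The record structure and function names are hypothetical, not part of the documented procedure.

```python
# Hypothetical deduplication sketch following the rule described above.

def dedup_key(record: dict) -> tuple:
    """Prefer a unique identifier; fall back to title + authors + source."""
    for id_field in ("pmid", "doi", "cochrane_id", "nct_id"):
        if record.get(id_field):
            return (id_field, record[id_field])
    return (
        "meta",
        record.get("title", "").strip().lower(),
        tuple(a.lower() for a in record.get("authors", [])),
        record.get("source", "").lower(),
    )

def deduplicate(records: list[dict]) -> list[dict]:
    seen, unique = set(), []
    for rec in records:
        key = dedup_key(rec)
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique
```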

Vigilance databases​

The following vigilance databases have been identified and searched for reports concerning the similar devices:

  • MAUDE FDA (USA): https://www.accessdata.fda.gov/scripts/cdrh/cfdocs/cfmaude/search.cfm
  • Medical Device Recalls FDA (USA): https://www.accessdata.fda.gov/scripts/cdrh/cfdocs/cfRES/res.cfm
  • EUDAMED (Europe): https://ec.europa.eu/tools/eudamed/#/screen/search-device

All the following searches have been conducted by Mr. BARRACHINA Jordi (Legit.Health) as described below and without deviation on July 15, 2025.

| ID | Database | Keywords/terms | Filters / limitations | Records |
| --- | --- | --- | --- | --- |
| 01 | MAUDE | "Manufacturer: SkinVision" | Period of search: last 10 years (2015/07/15 to 2025/07/15) | 0 |
| 02 | MAUDE | "Manufacturer: MetaOptima Technology Inc." | Period of search: last 10 years (2015/07/15 to 2025/07/15) | 0 |
| 03 | MAUDE | "Oregon Health & Science University Apps" | Period of search: last 10 years (2015/07/15 to 2025/07/15) | 0 |
| 04 | MAUDE | "Manufacturer: Huvy SAS" | Period of search: last 10 years (2015/07/15 to 2025/07/15) | 0 |
| 05 | MAUDE | "Manufacturer: Skin Analytics" | Period of search: last 10 years (2015/07/15 to 2025/07/15) | 0 |
| 06 | MAUDE | "Manufacturer: AI Medical Technology" | Period of search: last 10 years (2015/07/15 to 2025/07/15) | 0 |
| 07 | MAUDE | "Manufacturer: FotoFinder Systems GmbH" | Period of search: last 10 years (2015/07/15 to 2025/07/15) | 0 |
| 08 | Medical Device Recalls | "Product name: SkinVision" | Period of search: last 10 years (2015/07/15 to 2025/07/15) | 0 |
| 09 | Medical Device Recalls | "Product name: MoleScope" | Period of search: last 10 years (2015/07/15 to 2025/07/15) | 0 |
| 10 | Medical Device Recalls | "Product name: MoleMapper" | Period of search: last 10 years (2015/07/15 to 2025/07/15) | 0 |
| 11 | Medical Device Recalls | "Product name: Huvy" | Period of search: last 10 years (2015/07/15 to 2025/07/15) | 0 |
| 12 | Medical Device Recalls | "Product name: DERM" | Period of search: last 10 years (2015/07/15 to 2025/07/15) | 0 |
| 13 | Medical Device Recalls | "Product name: Dermalyser" | Period of search: last 10 years (2015/07/15 to 2025/07/15) | 0 |
| 14 | Medical Device Recalls | "Product name: FotoFinder" | Period of search: last 10 years (2015/07/15 to 2025/07/15) | 0 |
| 15 | EUDAMED | "Product name: SkinVision" | Period of search: no limitation | 2 |
| 16 | EUDAMED | "Product name: MoleScope" | Period of search: no limitation | 0 |
| 17 | EUDAMED | "Product name: MoleMapper" | Period of search: no limitation | 0 |
| 18 | EUDAMED | "Product name: Huvy" | Period of search: no limitation | 0 |
| 19 | EUDAMED | "Product name: DERM" | Period of search: no limitation | 0 |
| 20 | EUDAMED | "Product name: Dermalyser" | Period of search: no limitation | 0 |
| 21 | EUDAMED | "Product name: FotoFinder" | Period of search: no limitation | 0 |

Exclusion criteria

In addition to the exclusion criteria mentioned in previous sections, the following criteria have been used: "duplicate" (the same event reported for two devices) and "No info" (data with no clear or exploitable information).

Duplicates will be identified using the unique references of the vigilance report.

Registries

Identification of registries

To our knowledge, there is no registry database available for this field; therefore, a search was performed on the Google search engine.

Search description​

The query was: "dermatology" AND "skin conditions" AND "AI medical devices" AND "dermatology diagnostic support" AND ("registry" OR "registries" OR "register" OR "registers").

Inclusion/exclusion criteria​

In addition to the exclusion criteria mentioned in previous sections, the following criteria have been used: “wrong language” (publications not available in English) and “not available data” (registry with no available report).

Applicable standards​

The manufacturer already identified the applicable standards for the device under evaluation. No additional search has been conducted.

Selection of references for the review of the state of the art​

Methodology used for selection​

The selection of publications is carried out in four steps: a first selection based on the title of the article, a second based on the abstract, a third based on the materials and methods, and a fourth based on the results of the article. At each selection step, articles are retained or excluded based on the inclusion and exclusion criteria presented in the PICO table above and the possible additional exclusion criteria presented in the search description. A schematic sketch of this funnel follows.
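
Purely as an illustration of the four-step funnel just described (not part of the documented procedure), the screening logic can be sketched as follows; the predicate functions and record structure are hypothetical.

```python
# Hypothetical sketch of the four-step screening funnel described above.
# Each predicate in `screens` applies the inclusion/exclusion criteria to one
# part of the article and returns True to retain it.

from typing import Callable

SCREENING_STEPS = ("title", "abstract", "materials_and_methods", "results")

def select(articles: list[dict], screens: dict[str, Callable[[dict], bool]]) -> list[dict]:
    retained = list(articles)
    for step in SCREENING_STEPS:
        retained = [a for a in retained if screens[step](a)]
        print(f"After {step} screening: {len(retained)} articles retained")
    return retained
```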

Results of the selection​

The results of all searches are summarized in the diagram below.

Appraisal of clinical data for the review of the state-of-the-art​

Appraisal plan​

| ID | Criterion | Description | Grading system | Grading criteria | Score |
| --- | --- | --- | --- | --- | --- |
| CRIT1 | Study focus | Do the data relate to a relevant clinical alternative? | Direct relevance | Data on a similar device (e.g., devices tagged as similar) OR on standard clinical practice (e.g., accuracy of HCPs, visual inspection in Primary Care). | 2 |
| CRIT1 | Study focus | Do the data relate to a relevant clinical alternative? | Contextual relevance | Contextual data (e.g., disease epidemiology, general clinical guidelines) but not on the performance of a specific alternative, OR clinical data including a similar device but which are not specific. | 1 |
| CRIT1 | Study focus | Do the data relate to a relevant clinical alternative? | No relevance | Data not related to any clinical alternative in dermatology. | 0 |
| CRIT2 | Clinical setting or intended use | Does the study's setting and intended use match the device under evaluation? | Full match | Data focused on devices designed to support healthcare practitioners in the assessment of skin structures, OR same setting (e.g., Primary Care and/or dermatology clinic). | 2 |
| CRIT2 | Clinical setting or intended use | Does the study's setting and intended use match the device under evaluation? | Partial match | Data focused on devices with an intended use not claimed by the manufacturer but compliant with the intended use of the device group, OR same setting but for a different intended use (e.g., melanoma detection only). | 1 |
| CRIT2 | Clinical setting or intended use | Does the study's setting and intended use match the device under evaluation? | No match | Data focused on devices with an intended use not related to the device under evaluation, OR different clinical setting (e.g., specialities other than dermatology). | 0 |
| CRIT3 | Population of patients | Is the study population representative? | Applicable | Target population as per the device's intended use (e.g., patients attending a dermatological consultation across all age groups, skin types, and demographics). | 2 |
| CRIT3 | Population of patients | Is the study population representative? | Partially applicable | Specific sub-population of the target population (e.g., only high-risk patients, only a specific skin phototype, only one pathology). | 1 |
| CRIT3 | Population of patients | Is the study population representative? | Not applicable | Population not related to the target population (e.g., healthy volunteers), or non-relevant or contraindicated population. | 0 |
| CRIT4 | Type of dataset | Appropriate study design/type of document and sufficient data | Yes | Studies with a level of evidence greater than or equal to 4 (as per the Level of Evidence scale). | 1 |
| CRIT4 | Type of dataset | Appropriate study design/type of document and sufficient data | No | Studies with a level of evidence lower than 4 (e.g., expert opinions, small case series), OR insufficient data to extract relevant clinical performance or safety information. | 0 |
| CRIT5 | Outcome measurement (performance/safety) | Does the study measure objective outcomes related to performance (e.g., diagnostic accuracy) and/or safety (e.g., false negative rate)? | Yes | Provides quantitative performance data (e.g., sensitivity, specificity, PPV) and/or safety data (e.g., rate of unnecessary biopsies, false negatives). | 1 |
| CRIT5 | Outcome measurement (performance/safety) | Does the study measure objective outcomes related to performance (e.g., diagnostic accuracy) and/or safety (e.g., false negative rate)? | No | Does not provide performance or safety data (e.g., descriptive only). | 0 |
| CRIT6 | Clinical significance | Does the study evaluate whether the performance results in a tangible clinical benefit (e.g., reduction in unnecessary biopsies, improved early detection)? | Yes | Provides clinical benefit data (e.g., impact on referral pathways, reduction of benign biopsies) or workflow benefits. | 1 |
| CRIT6 | Clinical significance | Does the study evaluate whether the performance results in a tangible clinical benefit (e.g., impact on patient management, health outcomes)? | No | Does not provide clinical benefit data (reports pure performance metrics only, or descriptive). | 0 |
| CRIT7 | Statistical analysis | Is there a statistical analysis? | Yes | Statistical comparisons are made (e.g., between groups, p-values, confidence intervals). | 1 |
| CRIT7 | Statistical analysis | Is there a statistical analysis? | No | No statistical comparison (descriptive data only). | 0 |

All included datasets are appraised for their methodological quality and scientific validity (scored from 0 to 4) and their clinical relevance (scored from 0 to 6). The weight of each dataset is the score calculated from the sum of the two (from 0 to 10). If the score of a dataset is < 4, a justification for the use of the dataset is included.
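
For illustration, the scoring arithmetic just described can be written down compactly. This is a hypothetical sketch (the dataclass and field names are ours, not part of the appraisal plan); the example values are taken from the Chen et al. 2024 row of the appraisal table below.

```python
# Illustrative computation of the appraisal weights described above.
# Scores follow the CRIT1-CRIT7 scales in the appraisal plan.

from dataclasses import dataclass

@dataclass
class Appraisal:
    crit1: float  # study focus, 0-2
    crit2: float  # clinical setting / intended use, 0-2
    crit3: float  # population of patients, 0-2
    crit4: float  # type of dataset, 0-1
    crit5: float  # outcome measurement, 0-1
    crit6: float  # clinical significance, 0-1
    crit7: float  # statistical analysis, 0-1

    @property
    def relevance(self) -> float:  # clinical relevance, 0-6
        return self.crit1 + self.crit2 + self.crit3

    @property
    def quality(self) -> float:  # methodological quality, 0-4
        return self.crit4 + self.crit5 + self.crit6 + self.crit7

    @property
    def weight(self) -> float:  # overall weight, 0-10
        return self.relevance + self.quality

    @property
    def needs_justification(self) -> bool:  # weight < 4 requires justification
        return self.weight < 4

# Example: Chen et al. 2024 scores from the appraisal table below.
chen_2024 = Appraisal(2, 2, 2, 1, 1, 1, 1)
assert chen_2024.weight == 10 and not chen_2024.needs_justification
```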

Level of evidence​

Besides evaluation and weighting, the level of clinical evidence of all included datasets is assessed using the criteria set out in the following table:

| Level of evidence | Type of dataset | Score |
| --- | --- | --- |
| Critical appraisal | Meta-analysis | 10 |
| Critical appraisal | Systematic reviews | 9 |
| Critical appraisal | Critically Appraised Literature / Evidence-Based Practice Guidelines | 8 |
| Experimental studies | Randomized controlled / comparative studies | 7 |
| Experimental studies | Non-randomized controlled / comparative studies | 6 |
| Observational studies | Prospective non-comparative studies | 5 |
| Observational studies | Retrospective non-comparative studies / Case series | 4 |
| Observational studies | Individual case reports | 3 |
| Observational studies | Expert opinion / Bench research / non-EBM guidelines | 2 |
| Other | Other | 1 |
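
The same scale can be captured as a simple lookup; the sketch below is purely illustrative, and the names are ours rather than part of the methodology.

```python
# Illustrative lookup for the level-of-evidence scale above.
LEVEL_OF_EVIDENCE = {
    "meta-analysis": 10,
    "systematic review": 9,
    "critically appraised literature / evidence-based practice guideline": 8,
    "randomized controlled / comparative study": 7,
    "non-randomized controlled / comparative study": 6,
    "prospective non-comparative study": 5,
    "retrospective non-comparative study / case series": 4,
    "individual case report": 3,
    "expert opinion / bench research / non-EBM guideline": 2,
    "other": 1,
}

def evidence_score(dataset_type: str) -> int:
    """Return the level-of-evidence score, defaulting to 'other'."""
    return LEVEL_OF_EVIDENCE.get(dataset_type.lower(), 1)
```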

Results of data appraisal​

The datasets identified in section Selection of references for the review of the state-of-the-art were evaluated and weighted according to the appraisal criteria detailed in the previous section. The results of this data appraisal, including the assessed level of evidence, are presented in the following table.

It should be noted that additional articles and scientific guidelines have been included to contextualize the state-of-the-art presentation for the medical field (i.e., the State of the Art presentation sections). These were incorporated either because no comparable scientific publications were found using our search algorithm, or to allow for performance comparison of the device and to complement the state of the art.

According to the established Criteria (defined in the previous section), all selected articles obtained a score equal to or greater than 4 and were therefore included in the clinical evaluation.

  • The mean relevance score was 4.40/6.
  • The mean quality score was 2.47/4.
  • The mean weight was 6.88/10.
  • The mean level of clinical evidence was 6.3/10.

GRADE-like certainty assessment​

Overall certainty (GRADE-like): Moderate.

Rationale: the body of evidence shows reasonable applicability and overall weight (mean weight 6.88/10 and mean relevance 4.40/6), but there are consistent methodological limitations and some indirectness. In short:

  • Risk of bias: average methodological quality was moderate (mean quality 2.47/4), with several observational/reader studies and variable blinding, which supports a concern for risk of bias (downgrade one level).
  • Inconsistency: results are directionally consistent (AI generally matches or improves clinician performance) but effect sizes and settings vary; no additional downgrade applied.
  • Indirectness: some datasets use similar devices or differ from the exact intended use (partial indirectness noted), which contributes to moderate certainty.
  • Imprecision: smaller studies have wide confidence intervals but larger trials and systematic reviews are available; net effect is not a further downgrade.
  • Publication bias: no clear signal identified, but cannot be excluded.

Net judgement: after considering the domains above, the evidence is best graded as Moderate. This judgement is linked to the aggregated appraisal metrics reported above and should be revisited if new high-quality randomized or registry data become available.

The detailed results of the data appraisal are presented in the table below.

Manuscript Appraisal Scores​

| Manuscript/Study | CRIT1 | CRIT2 | CRIT3 | Relevance (Total /6) | CRIT4 | CRIT5 | CRIT6 | CRIT7 | Quality (Total /4) | Weight (Total /10) | Level of clinical evidence (Score) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Ahadi et al. 2021 | 2 | 0 | 1 | 3 | 0.5 | 0 | 1 | 1 | 2.5 | 5.5 | 4 |
| Ba et al. 2022 | 1 | 1 | 1 | 3 | 0.5 | 0 | 1 | 1 | 2.5 | 5.5 | 6 |
| Baker et al. 2022 | 1 | 2 | 1 | 4 | 0.5 | 0.5 | 0 | 0.5 | 1.5 | 5.5 | 5 |
| Barata et al. 2023 | 1 | 1 | 2 | 4 | 1 | 0.5 | 1 | 1 | 3.5 | 7.5 | 7 |
| Brinker et al. 2019 (1) | 1 | 1 | 1 | 3 | 0.5 | 0 | 1 | 0.5 | 2 | 5 | 6 |
| Brinker et al. 2019 (2) | 1 | 1 | 1 | 3 | 1 | 0 | 1 | 1 | 3 | 6 | 7 |
| Brinker et al. 2019 (3) | 2 | 2 | 1 | 5 | 1 | 1 | 1 | 1 | 4 | 9 | 7 |
| Burton et al. 1998 | 2 | 2 | 1 | 5 | 0.5 | 1 | 1 | 1 | 3.5 | 8.5 | 6 |
| Chen et al. 2024 | 2 | 2 | 2 | 6 | 1 | 1 | 1 | 1 | 4 | 10 | 9 |
| Cho et al. 2019 | 0 | 1 | 1 | 2 | 0.5 | 0 | 1 | 0.5 | 2 | 4 | 6 |
| Eminovic et al. 2009 | 2 | 2 | 2 | 6 | 1 | 1 | 0 | 1 | 3 | 9 | 7 |
| Escalé-Besa et al. 2023 | 2 | 2 | 1 | 5 | 0.5 | 0.5 | 1 | 0.5 | 2.5 | 7.5 | 5 |
| Ferris et al. 2025 | 2 | 2 | 2 | 6 | 1 | 0 | 1 | 1 | 3 | 9 | 7 |
| Gerbert et al. 1996 | 2 | 0 | 1 | 3 | 1 | 0 | 1 | 1 | 3 | 6 | 3 |
| Giavina-Bianchi et al. 2020 | 1 | 2 | 1 | 4 | 0.5 | 0 | 0 | 0.5 | 1 | 5 | 5 |
| Giavina-Bianchi et al. 2020 | 2 | 2 | 1 | 5 | 0.5 | 0 | 0 | 1 | 1.5 | 6.5 | 5 |
| Goldfarb et al. 2021 | 2 | 2 | 2 | 6 | 1 | 0 | 1 | 0.5 | 3.5 | 9.5 | 7 |
| Gregoor et al. 2023 (Clinical medicine) | 2 | 2 | 1 | 5 | 0.5 | 1 | 1 | 0.5 | 3 | 8 | 5 |
| Gregoor et al. 2023 (NPJ) | 2 | 2 | 2 | 6 | 1 | 1 | 0.5 | 1 | 3.5 | 9.5 | 8 |
| Haenssle et al. 2018 | 1 | 1 | 1 | 3 | 1 | 0 | 1 | 1 | 3 | 6 | 7 |
| Han et al. 2018 | 1 | 0 | 1 | 2 | 0.5 | 0 | 1 | 0.5 | 2 | 4 | 4 |
| Han et al. 2020 | 2 | 0 | 1 | 3 | 0.5 | 0 | 1 | 1 | 2.5 | 5.5 | 4 |
| Han et al. 2020 | 1 | 0 | 1 | 2 | 0.5 | 0 | 1 | 1 | 2.5 | 4.5 | 4 |
| Han et al. 2020 (Plos Medicine) | 1 | 0 | 1 | 2 | 0.5 | 0 | 1 | 1 | 2.5 | 4.5 | 4 |
| Han et al. 2022 | 2 | 2 | 2 | 6 | 1 | 1 | 1 | 1 | 4 | 10 | 7 |
| Han et al. 2022 | 2 | 1 | 1 | 4 | 0.5 | 0 | 1 | 1 | 2.5 | 6.5 | 4 |
| Hsiao et al. 2008 | 2 | 2 | 2 | 6 | 0.5 | 1 | 0 | 0.5 | 2 | 8 | 6 |
| Jahn et al. 2022 | 1 | 1 | 1 | 3 | 0.5 | 0.5 | 1 | 0.5 | 2.5 | 5.5 | 5 |
| Jain et al. 2021 | 2 | 1 | 2 | 5 | 1 | 0 | 1 | 1 | 3 | 8 | 7 |
| Kheterpal et al. 2023 | 0 | 2 | 0 | 2 | 0.5 | 0.5 | 0 | 0.5 | 1.5 | 3.5 | 2 |
| Kim et al. 2022 | 2 | 2 | 2 | 6 | 0.5 | 1 | 1 | 1 | 3.5 | 9.5 | 7 |
| Knol et al. 2006 | 2 | 2 | 1 | 5 | 0.5 | 1 | 0 | 1 | 2.5 | 7.5 | 5 |
| Krakowski et al. 2024 | 2 | 2 | 2 | 6 | 1 | 1 | 1 | 1 | 4 | 10 | 9 |
| Lee et al. 2020 | 1 | 1 | 1 | 3 | 0.5 | 0 | 1 | 0.5 | 2 | 5 | 6 |
| Liu et al. 2020 | 2 | 1 | 2 | 5 | 0.5 | 0 | 1 | 1 | 2.5 | 7.5 | 4 |
| Maier et al. 2014 | 1 | 1 | 0 | 2 | 1 | 1 | 1 | 0 | 3 | 5 | 5 |
| Marchetti et al. 2019 | 1 | 1 | 1 | 3 | 1 | 0 | 1 | 0.5 | 2.5 | 5.5 | 7 |
| Maron et al. 2019 | 1 | 1 | 1 | 3 | 0.5 | 0 | 1 | 0.5 | 2 | 5 | 6 |
| Maron et al. 2020 | 1 | 1 | 1 | 3 | 1 | 0 | 1 | 0.5 | 2.5 | 5.5 | 7 |
| Marsden et al. 2024 | 1 | 2 | 2 | 5 | 1 | 1 | 1 | 1 | 4 | 9 | 7 |
| Morton et al. 2010 | 2 | 2 | 0 | 4 | 0.5 | 0 | 0 | 0 | 0.5 | 4.5 | 5 |
| Muñoz-López et al. 2020 | 2 | 2 | 1 | 5 | 0.5 | 0 | 1 | 0.5 | 2 | 7 | 5 |
| Navarrete-Dechent et al. 2018 | 2 | 0 | 1 | 3 | 0.5 | 0 | 1 | 0.5 | 2 | 5 | 4 |
| Navarrete-Dechent et al. 2020 | 2 | 0 | 1 | 3 | 0.5 | 0 | 1 | 0.5 | 2 | 5 | 4 |
| Orekoya et al. 2021 | 2 | 2 | 0 | 4 | 0.5 | 0 | 0 | 0.5 | 1 | 5 | 5 |
| Papachristou et al. 2024 | 2 | 2 | 2 | 6 | 0.5 | 0 | 1 | 1 | 2.5 | 8.5 | 5 |
| Phillips et al. 2019 | 1 | 2 | 1 | 4 | 0.5 | 0 | 1 | 1 | 2.5 | 6.5 | 4 |
| Sangers et al. 2022 | 2 | 2 | 2 | 6 | 1 | 0 | 1 | 1 | 3 | 9 | 5 |
| Thomas et al. 2023 | 0 | 2 | 0 | 2 | 0.5 | 1 | 0 | 0.5 | 2 | 4 | 5 |
| Thissen et al. 2017 | 2 | 2 | 2 | 6 | 0.5 | 0 | 1 | 1 | 2.5 | 8.5 | 4 |
| Thorlacius et al. 2019 | 2 | 2 | 2 | 6 | 1 | 0 | 1 | 1 | 4.0 | 10.0 | 7 |
| Tschandl et al. 2019 | 1 | 1 | 1 | 3 | 1 | 0 | 1 | 1 | 3 | 6 | 7 |
| Tschandl et al. 2020 | 1 | 1 | 1 | 3 | 1 | 0 | 1 | 1 | 3 | 6 | 7 |
| Udrea et al. 2019 | 2 | 2 | 2 | 6 | 0.5 | 0 | 1 | 1 | 2.5 | 8.5 | 4 |
| Whited et al. 2015 | 2 | 2 | 0 | 4 | 0 | 0 | 0 | 0 | 0 | 4 | 2 |
| Zanchetta et al. 2025 | 2 | 1 | 2 | 5 | 0.5 | 0 | 1 | 1 | 2.5 | 7.5 | 4 |

In this state-of-the-art review, 57 articles were included and appraised based on the criteria outlined above. To provide a comprehensive overview of current clinical practices and technologies in dermatology, additional relevant articles and guidelines were incorporated, bringing the total to 68 records analyzed.

Specifically, two clinical manuscripts were added to establish a baseline for the sensitivity and specificity of PCPs in detecting necessary referrals. Additionally, scientific guidelines for interpreting performance metrics related to the severity of female androgenetic alopecia were included to complement the state-of-the-art. Two more guidelines were added to provide a wider perspective on the evidence regarding expert consensus. Finally, three governmental reports were included to document the current situation of waiting times in Spain and other European countries, which provides a benchmark for comparison with the device's performance.

Results of the literature search​

Summary of articles retained from the state-of-the-art review in standard clinical practice​

Due to the complexity of the device under evaluation and its multiple performance claims, the results of the state-of-the-art review are presented in several sections according to the different clinical applications of the device. Each section includes a structured summary of the articles retained from the state-of-the-art review that are relevant to that specific clinical application, covering key information such as study design, population, outcomes measured, and main findings.

Clinical data collected on malignancy detection​

In this section, we present the clinical data collected on the performance of healthcare practitioners (HCPs, which include dermatologists and primary care practitioners) in malignancy detection, and specifically in melanoma detection. The entries below summarize the key studies included in this section, highlighting their design, population, outcomes, and main conclusions. In this way, the state-of-the-art analysis prioritizes current clinical practice as the primary performance baseline to be improved, while the performance of other commercial devices is considered a secondary benchmark to establish competitiveness.

Maron et al. 2019 (PMID: 31419752)
  • Study type: Comparative Study / Reader Survey. Weighting from appraisal: 5.
  • Baseline population: 112 dermatologists recruited from 13 university hospitals.
  • Standard clinical practice or device(s): Dermatologists assessing standard clinical images of lesions suspected of malignancy; Convolutional Neural Networks (CNN).
  • Objective(s): To compare the diagnostic accuracy and performance of a CNN against 112 dermatologists in multiclass skin cancer image classification. The primary end-point was the correct classification of the different lesions into benign and malignant (malignancy detection). The secondary end-point was the correct classification of the images into one of five diagnostic categories (among them melanoma).
  • Safety outcomes: None reported.
  • Performance outcomes: Sensitivity and specificity of dermatologists for the primary end-point (malignancy detection) were 74.4% (95% confidence interval [CI]: 67.0-81.8%) and 59.8% (95% CI: 49.8-69.8%), respectively. At equal sensitivity, the algorithm achieved a specificity of 91.3% (95% CI: 85.5-97.1%). For the secondary end-point, specifically melanoma detection, the sensitivity was 63.5% (95% CI: 50.4-76.5%) and the specificity 80.2% (95% CI: 72.5-86.5%). At equal sensitivity, the algorithm achieved a specificity of 98.8%.
  • Main conclusion: The automated binary classification can be extended to a multiclass classification problem, which better reflects clinical differential diagnoses, while still outperforming dermatologists at a significant level (p≤0.001).

Haenssle et al. 2018 (PMID: 29846502)
  • Study type: Reader Study / Deep Learning CNN Comparison. Weighting from appraisal: 6.
  • Baseline population: 58 dermatologists participated in a reader study.
  • Standard clinical practice or device(s): Dermatologists assessing standard clinical images of lesions suspected of malignancy; deep learning Convolutional Neural Network (CNN).
  • Objective(s): To compare the diagnostic performance of a deep learning CNN for dermoscopic melanoma recognition against 58 dermatologists.
  • Safety outcomes: None reported.
  • Performance outcomes: Sensitivity and specificity of dermatologists for melanoma detection were 86.6% (95% CI: 77.3-95.9%) and 71.3% (95% CI: 60.1-82.85%). The AUC for dermatologists was 0.79 (95% CI: 0.73-0.85). The CNN showed an AUC of 0.86, with a sensitivity of 86.6% and a specificity of 82.5%.
  • Main conclusion: The deep learning CNN performs favorably compared to participating dermatologists in dermoscopic melanoma recognition.

Barata et al. 2023 (PMID: 37955139)
  • Study type: Brief Communication / Reader Study. Weighting from appraisal: 7.5.
  • Baseline population: Reader study: 89 dermatologists. Test set: 1,511 images (7 disease categories).
  • Standard clinical practice or device(s): Dermatologists (human readers); Supervised Learning (SL) AI model; Reinforcement Learning (RL) AI model (using expert-generated rewards/penalties).
  • Objective(s): To investigate if human preferences, applied via a Reinforcement Learning (RL) model, could improve AI-based decision support for skin cancer diagnosis compared to a standard Supervised Learning (SL) model. To test the utility of the RL model in a human-in-the-loop scenario.
  • Safety outcomes: None reported.
  • Performance outcomes:
    - AI (standalone): The RL model improved melanoma sensitivity to 79.5% (from 61.4% for SL) and BCC sensitivity to 87.1% (from 79.4% for SL).
    - Human-in-the-loop: AI support with the RL model increased the rate of correct diagnoses by dermatologists by 12.0% (from 68.0% to 79.9%) and improved the rate of optimal management decisions from 57.4% to 65.3%.
    - Dermatologists alone: The dermatologists showed a sensitivity of 61.4% (95% CI: 56.3-68.6%).
  • Main conclusion: Incorporating human preferences via reinforcement learning (RL) significantly improved the AI's sensitivity for melanoma and BCC (vs. SL) and improved dermatologists' diagnostic accuracy and management decisions when used as a decision support tool.

Chen et al. 2024 (PMID: 39535860)
  • Study type: Systematic Review & Meta-Analysis. Weighting from appraisal: 10.
  • Baseline population: 100 studies included, analyzing experienced dermatologists, inexperienced dermatologists, and primary care physicians (PCPs).
  • Standard clinical practice or device(s): Standard clinical practice: 1. clinical examination / clinical images (unmagnified); 2. dermoscopy / dermoscopic images (magnified).
  • Objective(s): To assess and quantify the diagnostic accuracy of skin cancer diagnosis, stratified by lesion type (keratinocytic vs. melanocytic), physician specialty/experience, and examination method.
  • Safety outcomes: Not applicable (systematic review).
  • Performance outcomes:
    - Melanoma (clinical exam/images): Experienced dermatologists: Sens 76.9% (95% CI: 69.3-83.1%), Spec 89.1% (95% CI: 76.9-95.3%). Inexperienced dermatologists: Sens 78.3% (95% CI: 54.9-91.4%), Spec 66.2% (95% CI: 55.9-75.1%). PCPs: Sens 37.5% (95% CI: 21.1-56.3%), Spec 84.6% (95% CI: 80.0-88.5%).
    - Melanoma (dermoscopy/images): Experienced dermatologists: Sens 85.7% (95% CI: 82.5-88.3%), Spec 81.3% (95% CI: 76.3-85.4%). Inexperienced dermatologists: Sens 78.0% (95% CI: 69.3-84.7%), Spec 69.5% (95% CI: 52.9-82.2%). PCPs: Sens 49.5% (95% CI: 40.4-58.6%), Spec 91.3% (95% CI: 78.0-96.9%).
    - Globally: Sensitivity 83.6% (95% CI: 73.2-93.1%), Specificity 82.3% (95% CI: 74.3-90.0%), and AUC of 74% (95% CI: 72-77%).
  • Main conclusion: Diagnostic accuracy varies significantly by physician specialty, experience, and method. Dermoscopy substantially improved diagnostic accuracy for melanoma (5.7-fold higher odds for experienced dermatologists) and keratinocytic cancer (2.5-fold higher odds). Experienced dermatologists had 13.3-fold higher odds of accurately diagnosing melanoma than PCPs using dermoscopic images.

Maron et al. 2020 (PMID: 32915161)
  • Study type: Web-Based Survey Study. Weighting from appraisal: 5.5.
  • Baseline population: 12 board-certified dermatologists; 1200 unique dermoscopic images (50% melanomas, 50% nevi).
  • Standard clinical practice or device(s): Dermatologists assessing dermoscopic images; Convolutional Neural Network (CNN) used as AI support.
  • Objective(s): To investigate whether live AI support improves the accuracy, sensitivity, and specificity of dermatologists in the dichotomous image-based discrimination involved in melanoma detection.
  • Safety outcomes: None reported.
  • Performance outcomes:
    - Dermatologists without AI: Sensitivity 59.4% (95% CI: 53.3-65.5%), Specificity 70.6% (95% CI: 62.3-78.9%), Accuracy 65.0% (95% CI: 62.3-67.6%).
    - Dermatologists with AI: Sensitivity 74.6% (95% CI: 69.9-79.3%), Specificity 72.4% (95% CI: 66.2-78.6%), Accuracy 73.6% (95% CI: 70.9-76.3%).
    - CNN (standalone): Sensitivity 84.7% (95% CI: 81.9-87.6%), Specificity 79.1% (95% CI: 74.8-83.4%), Accuracy 81.9% (95% CI: 79.7-84.2%).
  • Main conclusion: AI support can significantly improve the overall accuracy and sensitivity of dermatologists for the image-based discrimination of melanoma and nevus. This supports the use of AI tools to aid clinicians.

Brinker et al. 2019 (PMID: 31078438)
  • Study type: Comparative Study. Weighting from appraisal: 5.
  • Baseline population: 144 completed questionnaires from dermatologists (52 board-certified, 92 junior); 804 biopsy-proven dermoscopic images (1:1 melanoma:nevi).
  • Standard clinical practice or device(s): Dermatologists (board-certified and junior) assessing dermoscopic images; Convolutional Neural Network (CNN).
  • Objective(s): To compare the diagnostic performance (sensitivity, specificity, overall correctness) of a CNN (trained exclusively on biopsy-verified images) with that of dermatologists.
  • Safety outcomes: None reported.
  • Performance outcomes:
    - All dermatologists (n=144): Sensitivity 67.2% (95% CI: 62.6-71.7%), Specificity 62.2% (95% CI: 57.6-66.9%).
    - Board-certified: Sens 63.2% (95% CI: 58.7-68.1%), Spec 65.2% (95% CI: 60.5-69.8%).
    - Junior physicians: Sens 68.9% (95% CI: 64.4-73.4%), Spec 58.0% (95% CI: 53.1-62.8%).
    - CNN: Sensitivity 82.3% (95% CI: 78.3-85.7%), Specificity 77.9% (95% CI: 73.8-81.8%).
  • Main conclusion: For the first time, automated dermoscopic melanoma image classification (by CNN) was shown to be significantly superior to both junior and board-certified dermatologists.

Han et al. 2020 (PMID: 32243883)
  • Study type: Original Article. Weighting from appraisal: 5.5.
  • Baseline population: Reader study: 47 clinicians (21 dermatologists, 26 residents) and 23 non-medical professionals. Validation sets: SNU (2,201 images; 134 disorders) and Edinburgh (1,300 images; 10 disorders).
  • Standard clinical practice or device(s): Medical professionals (dermatologists, residents); Deep Neural Network (DNN) algorithm.
  • Objective(s): To train and validate a DNN for malignancy prediction, suggesting treatment options, and multi-class classification (134 disorders). To assess if the algorithm can improve the performance of medical professionals ("Augmented Intelligence").
  • Safety outcomes: None reported.
  • Performance outcomes:
    - Malignancy (algorithm standalone): AUC 0.937 (SNU dataset) and 0.928 (Edinburgh dataset).
    - Human-in-the-loop (malignancy): With AI assistance, sensitivity of the 47 clinicians improved from 77.4% to 86.8% and specificity from 92.9% to 93.9%.
    - Human-in-the-loop (multiclass): Top-1 accuracy of 4 doctors (for 134 diseases) increased by 3.3% and Top-3 accuracy by 6.7% with AI assistance.
  • Main conclusion: The algorithm performed comparably to experts and, when used as an ancillary tool ("Augmented Intelligence"), significantly improved the diagnostic performance of medical professionals for both malignancy prediction and multiclass classification.

Marchetti et al. 2019 (PMID: 31306724)
  • Study type: Cross-sectional / Reader Study. Weighting from appraisal: 5.5.
  • Baseline population: 17 readers (8 dermatologists and 9 dermatology residents); 150 dermoscopy images (50 melanoma, 50 nevi, 50 seborrheic keratoses).
  • Standard clinical practice or device(s): Dermatologists and residents assessing dermoscopy images; top-ranked computer algorithm (from the ISIC 2017 challenge).
  • Objective(s): To determine if computer algorithms from the ISIC 2017 challenge could improve dermatologist diagnostic accuracy for melanoma. To explore imputing algorithm decisions for low-confidence human classifications.
  • Safety outcomes: None reported.
  • Performance outcomes:
    - ROC area (melanoma classification): Top algorithm: 0.87 (95% CI: 0.82-0.92); dermatologists: 0.74 (95% CI: 0.72-0.77); residents: 0.66 (95% CI: 0.60-0.69). (Algorithm > humans.)
    - Imputation (for low confidence): Imputing algorithm results for low-confidence dermatologist evaluations increased their sensitivity from 76.0% (95% CI: 71.5-80.1%) to 80.8% (95% CI: 76.3-85.3%), specificity from 72.6% (95% CI: 69.4-75.7%) to 72.8% (95% CI: 69.6-75.9%), and overall correct classifications from 73.8% to 75.4%.
  • Main conclusion: The top-ranked algorithm exceeded the diagnostic accuracy of dermatologists and residents. Judiciously applying algorithm predictions (e.g., in low-confidence cases) shows potential to improve human diagnostic performance.

Ahadi et al. 2021 (PMID: 33912165)
  • Study type: Retrospective Study. Weighting from appraisal: 5.5.
  • Baseline population: 4,123 pathology specimens from 4,123 patients over 3 years at a university hospital.
  • Standard clinical practice or device(s): Standard clinical practice: clinical diagnosis (assumed naked eye) compared to histopathology (gold standard).
  • Objective(s): To evaluate the accuracy (sensitivity, specificity, PPV, NPV) of clinical diagnosis for malignant skin lesions by comparing it to the histological gold standard.
  • Safety outcomes: Not applicable (retrospective analysis).
  • Performance outcomes:
    - Overall malignancy (clinical diagnosis): Sensitivity 90.48% (95% CI: 87.24-93.72%); Specificity 82.85% (95% CI: 81.66-84.04%); Positive Predictive Value (PPV) 30.38%; Negative Predictive Value (NPV) 99.06%.
    - Melanoma (N=5): Sens 80.0%, Spec 97.45%.
  • Main conclusion: Pathological assessment remains the cornerstone of skin cancer diagnosis. The high NPV (99.06%) and low PPV (30.38%) indicate that standard clinical diagnosis is more efficient at ruling out malignancies than at diagnosing them.

Brinker et al. 2019 (PMID: 30981091)
  • Study type: Comparative Study. Weighting from appraisal: 6.
  • Baseline population: 157 dermatologists (all experience levels) from 12 German university hospitals; 100 dermoscopic images (20 melanoma, 80 nevi).
  • Standard clinical practice or device(s): Dermatologists assessing dermoscopic images; Convolutional Neural Network (CNN) trained exclusively on open-source images.
  • Objective(s): To compare the performance of a CNN (trained only on open-source images) to a large, multi-experience-level group of 157 dermatologists for dermoscopic melanoma image classification.
  • Safety outcomes: None reported.
  • Performance outcomes:
    - All dermatologists (n=157): Mean Sensitivity 74.1% (95% CI: 40-100%), Mean Specificity 60.0% (95% CI: 21.3-91.3%); ROC AUC 0.67.
    - Chief physicians (n=3): Mean Sensitivity 73.3%, Mean Specificity 69.2%.
    - CNN (at dermatologist sensitivity 74.1%): Mean Specificity 86.5% (95% CI: 70.8-91.3%).
    - CNN (at chief physician specificity 69.2%): Mean Sensitivity 84.5% (95% CI: 80-95%).
  • Main conclusion: A CNN trained exclusively on open-source images outperformed 136 of 157 dermatologists and all experience levels (junior to chief physicians) in terms of average specificity and sensitivity.

Tschandl et al. 2019 (PMID: 31201137)
  • Study type: Web-based diagnostic study. Weighting from appraisal: 6.
  • Baseline population: 511 human readers (incl. 283 board-certified dermatologists, 118 residents); test set of 1,511 images (7 disease categories).
  • Standard clinical practice or device(s): Human readers (all experience levels); 139 machine-learning algorithms.
  • Objective(s): To compare the diagnostic accuracy of state-of-the-art machine-learning algorithms with human readers for all clinically relevant types of benign and malignant pigmented skin lesions.
  • Safety outcomes: None reported.
  • Performance outcomes:
    - Overall correct diagnoses (out of 30): Human experts (n=27): 18.78 (SD 3.15); top 3 algorithms: 25.43 (SD 1.95); mean difference 6.65.
    - Melanoma-specific: All readers: Sens 73.1% (95% CI: 65.8-79.1%), Spec 92.8% (95% CI: 91.3-94.2%); top 3 algorithms: Sens 81.9% (95% CI: 75.4-87.3%), Spec 96.2% (95% CI: 95.1-97.2%).
    - Malignancy detection: Sens 76.4% (95% CI: 73.2-79.6%), Spec 93.1% (95% CI: 91.2-95.3%).
  • Main conclusion: State-of-the-art machine-learning classifiers outperformed human experts in the diagnosis of pigmented skin lesions and should have a more important role in clinical practice.
Clinical data collected on the improvement in the accuracy of HCPs in the diagnosis of dermatological conditions​

In this section, we present the clinical data collected on the performance of healthcare practitioners (HCPs, which include dermatologists and primary care practitioners) in the diagnosis of various dermatological conditions beyond malignancy detection, and the improvement in their sensitivity, specificity, and accuracy achieved with the use of other AI-guided medical devices. The entries below summarize the key studies included in this section, highlighting their design, population, outcomes, and main conclusions.

Study | Baseline Population | Standard clinical practice or device(s)? | Objective(s) | Safety outcomes | Performance outcomes | Main conclusion
Ba et al. 2022
PMID: 35569202
Multireader Multicase (MRMC) Study
Weighting from appraisal: 5.5
18 board-certified dermatologists.
400 clinical images (10 categories).
Dermatologists (unassisted) vs. Dermatologists with CNN assistance.
To evaluate the potential impact of CNN assistance on dermatologists for clinical image interpretation.
None reported.
Multiclass (10 types): Accuracy 62.78% (unassisted) vs. 76.60% (assisted), an increase of 13.82%.
Binary (Malignant/Benign): Sensitivity 83.21% (unassisted) vs. 89.56% (assisted), an increase of 6.35%. Specificity 80.92% (unassisted) vs. 87.90% (assisted), an increase of 6.98%.
CNN assistance improved dermatologist accuracy in interpreting cutaneous tumours. Dermatologists with less experience benefited more from CNN assistance.
Ferris et al. 2025
PMID: 39981881
MRMC Clinical Utility Study
Weighting from appraisal: 9
108 Primary Care Physicians (PCPs).
100 skin lesion cases (from DERM-SUCCESS study).
1. PCPs (unaided visual assessment).
2. PCPs aided by an AI-enabled Elastic Scattering Spectroscopy (ESS) handheld device (DermaSensor).
To assess and compare the diagnostic and management performance of PCPs with and without the ESS device in detecting skin cancer.
None reported.
(Aided PCPs incorrectly referred 11.8% more benign lesions but correctly referred 9.4% more malignant lesions).
Diagnostic Sensitivity: 71.1% (unaided) vs. 81.7% (aided), a difference of 10.6%. (P=.0085).
Diagnostic Specificity: 60.9% (unaided) vs. 54.7% (aided), a difference of -6.2% (P=.1896).
Management (Referral) Sensitivity: 82.0% (unaided) vs. 91.4% (aided), a difference of 9.4% (P=.0027); management specificity decreased by 9.6%.
Use of the ESS device output by PCPs significantly improved their diagnostic and management sensitivities and overall management performance (AUC), suggesting the device can improve PCP skin cancer detection and confidence. Despite that, the diagnostic specificity decreased with the use of the device.
Han et al. 2020
PMID: 32243883
Original Article
Weighting from appraisal: 5.5
Reader study: 47 clinicians (21 dermatologists, 26 residents) & 23 non-medical professionals. Validation sets: SNU (2,201 images; 134 disorders) & Edinburgh (1,300 images; 10 disorders).
Medical professionals (dermatologists, residents).
Deep Neural Network (DNN) algorithm.
To train and validate a DNN for malignancy prediction, suggesting treatment options, and multi-class classification (134 disorders).
To assess if the algorithm can improve the performance of medical professionals ("Augmented Intelligence").
None reported.
Malignancy (Algorithm standalone): AUC 0.937 (SNU dataset) and 0.928 (Edinburgh dataset).
Human-in-the-loop (Malignancy): With AI assistance, sensitivity of 47 clinicians improved from 77.4% to 86.8%, an increase of 9.4% and specificity from 92.9% to 93.9%, an increase of 1.0%.
Human-in-the-loop (Multiclass): Top-1 accuracy of 4 doctors (for 134 diseases) increased by 7.0% with AI assistance.
The algorithm performed comparably to experts and, when used as an ancillary tool ("Augmented Intelligence"), improved the diagnostic performance of medical professionals for both malignancy prediction and multiclass classification.
Han et al. 2022
PMID: 35662137
Randomized Controlled Trial
Weighting from appraisal: 10
576 consecutive cases (patients) with suspicious lesions.
8 trainees (4 dermatology residents, 4 non-dermatology trainees).
1. Trainees (unaided group, n=281).
2. Trainees (AI-assisted group, n=295) using "Model Dermatology" algorithm.
To validate whether a multiclass AI algorithm could augment the accuracy of non-expert physicians in a real-world setting.
A 12.2% drop in Top-1 accuracy was observed in cases where all Top-3 predictions from the algorithm were incorrect. Four cases of malignancy were ruled out by trainees after incorrect AI assistance.
Overall Top-1 Accuracy (Trainees): 53.9% (AI-assisted) vs. 43.8% (unaided), an increase of 10.1%.
Non-Derm Trainees Top-1: 54.7% (AI-assisted) vs. 29.7% (unaided), an increase of 25.0%.
Derm Residents Top-1: 53.1% (AI-assisted) vs. 57.3% (unaided) a reduction in accuracy of 4.2%.
The multiclass AI algorithm augmented the diagnostic accuracy of non-expert physicians in dermatology, especially the least experienced (non-dermatology trainees), although it reduced the diagnostic accuracy of dermatology residents.
Jain et al. 2021
PMID: 33909051
MRMC Diagnostic Study
Weighting from appraisal: 8
40 clinicians (20 PCPs, 20 NPs).
1048 retrospective teledermatology cases (120 skin conditions).
1. PCPs and NPs (unassisted).
2. PCPs and NPs with an AI-based assistive tool.
To evaluate an AI-based tool that assists PCPs and NPs with diagnoses of dermatologic conditions.
None reported.
(Rates for desired biopsies and referrals decreased slightly with AI assistance).
Top-1 Agreement (vs. Derm. Panel):
• PCPs: 48% (unassisted) vs. 58% (assisted), an increase of 10%.
• NPs: 46% (unassisted) vs. 58% (assisted), an increase of 12%.
Agreement (vs. Biopsy):
• PCPs: +3% (64% to 67%).
• NPs: +8% (60% to 68%).
AI assistance was associated with improved diagnoses by PCPs and NPs for 1 in every 8 to 10 cases, indicating potential for improving the quality of dermatologic care.
Kim et al. 2022
PMID: 35061691
Prospective Controlled Study
Weighting from appraisal: 9.5
285 cases (patients) with suspected skin neoplasms.
18 trainee doctors (11 dermatology, 7 intern).
1. Trainees (Control group, n=141): Routine exam + photo review.
2. Trainees (AI group, n=144): Routine exam + photo review + AI assistance (Model Dermatology).
To evaluate whether an AI algorithm improves the accuracy of nondermatologists in diagnosing skin neoplasms in a real-world setting.
None reported.
AI Group (Before vs. After AI): Top-1 Accuracy increased from 46.5% to 58.3%, an increase of 11.8%.
Control Group (Before vs. After Photo Review): Top-1 Accuracy 46.1% vs. 51.8%, an increase of 5.7%.
The number of differential diagnoses also increased significantly in the AI group.
In real-world settings, AI augmented the diagnostic accuracy of trainee doctors. The number of differential diagnoses also increased.
Krakowski et al. 2024
PMID: 38594247
Systematic Review & Meta-Analysis
Weighting from appraisal: 10
10 studies eligible for meta-analysis. Participants included dermatologists, residents, and non-dermatologists.
1. Clinicians (unassisted).
2. Clinicians assisted by deep learning-based AI.
To study the effect of AI assistance on the accuracy of skin cancer diagnosis by clinicians.
Notes that clinicians can perform worse when the AI tool provides incorrect recommendations.
Clinicians (unassisted): Pooled Sens 74.8% (95% CI 68.6-80.1), Pooled Spec 81.5% (95% CI 73.9-87.3).
Clinicians (AI-assisted): Pooled Sens 81.1% (95% CI 74.4-86.5), Pooled Spec 86.1% (95% CI 79.2-90.9), an increase of 6.3% and 4.6% respectively. Dermatologists showed an increase of 6.3% and 4.6% in sensitivity and specificity respectively. PCPs showed an increase of 13% and 10.8% in diagnostic sensitivity and specificity respectively (an illustrative pooling sketch follows this table).
AI in the hands of clinicians has the potential to improve diagnostic accuracy. The largest improvement was among non-dermatologists.
Maron et al. 2020
PMID: 32915161
Web-Based Survey Study
Weighting from appraisal: 5.5
12 board-certified dermatologists.
1200 unique dermoscopic images (50% melanomas, 50% nevi).
Dermatologists assessing dermoscopic images.
Convolutional Neural Network (CNN) used as AI support.
To investigate whether live AI support improves the accuracy, sensitivity, and specificity of dermatologists in the dichotomous image-based discrimination between melanoma and nevus.
None reported.
When dermatologists were correct and AI was incorrect (10% of cases), dermatologists wrongly changed their answer 39% of the time.
Dermatologist without AI: Mean Sens 59.4%, Mean Spec 70.6%, Mean Accuracy 65.0%.
Dermatologist with AI: Mean Sens 74.6% (P=.003), Mean Spec 72.4% (P=.54), Mean Accuracy 73.6% (P=.002).
An increase of 15.2%, 1.8% and 8.6% respectively.
AI support can significantly improve the overall accuracy and sensitivity of dermatologists for the image-based discrimination of melanoma and nevus. This supports the use of AI tools to aid clinicians.
Tschandl et al. 2020
PMID: 32572267
Web-based diagnostic study
Weighting from appraisal: 6
302 raters (169 dermatologists, 77 residents, 38 GPs) from 41 countries.
1,412 dermoscopic images (7 disease categories).
1. Human raters (unassisted).
2. Human raters + AI-based multiclass probabilities.
3. Human raters + AI-based malignancy probability.
4. Human raters + AI-based CBIR.
To address the effects of varied representations of AI-based support across different levels of clinical expertise and multiple clinical workflows.
None reported.
AI Multiclass Support: Accuracy improved from 63.6% (unassisted) to 77.0% (a 13.3% increase).
Other AI Support: No improvement was observed for AI-based malignancy probability or CBIR.
Experience: The least experienced clinicians gained the most from AI-based support.
Good quality AI-based support (specifically multiclass probabilities) improves diagnostic accuracy over that of either AI or physicians alone. The least experienced clinicians gain the most.
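For readers unfamiliar with how pooled sensitivities and specificities in meta-analyses such as Krakowski et al. 2024 are obtained, the sketch below illustrates inverse-variance pooling on the logit scale with a DerSimonian-Laird random-effects adjustment. The per-study proportions and sample sizes are hypothetical placeholders, not data extracted from the cited review.

```python
import numpy as np

def pool_proportions(p, n):
    """DerSimonian-Laird random-effects pooling of proportions on the logit scale.

    p: per-study proportions (e.g., sensitivities); n: per-study denominators.
    Returns the pooled proportion with an approximate 95% confidence interval.
    """
    p, n = np.asarray(p, float), np.asarray(n, float)
    y = np.log(p / (1 - p))                       # logit transform
    v = 1 / (n * p) + 1 / (n * (1 - p))           # approx. variance of each logit
    w = 1 / v                                     # fixed-effect (inverse-variance) weights
    y_fixed = (w * y).sum() / w.sum()
    q = (w * (y - y_fixed) ** 2).sum()            # Cochran's Q heterogeneity statistic
    tau2 = max(0.0, (q - (len(p) - 1)) / (w.sum() - (w ** 2).sum() / w.sum()))
    w_re = 1 / (v + tau2)                         # random-effects weights
    y_re = (w_re * y).sum() / w_re.sum()
    se = np.sqrt(1 / w_re.sum())
    expit = lambda x: 1 / (1 + np.exp(-x))        # back-transform to a proportion
    return expit(y_re), expit(y_re - 1.96 * se), expit(y_re + 1.96 * se)

# Hypothetical per-study sensitivities of unassisted clinicians:
pooled, lo, hi = pool_proportions(p=[0.72, 0.78, 0.70, 0.80], n=[120, 95, 200, 60])
print(f"Pooled sensitivity: {pooled:.1%} (95% CI {lo:.1%}-{hi:.1%})")
```

The random-effects adjustment widens the confidence interval when the studies disagree more than sampling error alone would explain, which is the usual situation in reader studies with heterogeneous populations.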
Clinical data collected on the performance of HCPs in the diagnostic accuracy of dermatological conditions​

In this section, we present the clinical data collected on the diagnostic accuracy of healthcare practitioners (HCPs, including dermatologists and primary care practitioners) in diagnosing various dermatological conditions. The following table summarizes the key studies included in this section, highlighting their design, population, outcomes, and main conclusions. As in previous sections, we focus on studies that provide insights into the diagnostic accuracy of both PCPs and dermatologists, who represent standard clinical practice and therefore constitute the state of the art.

Study | Baseline Population | Standard clinical practice or device(s)? | Objective(s) | Safety outcomes | Performance outcomes | Main conclusion
Escalé-Besa et al. 2023
PMID: 36922556
Prospective Diagnostic Study
Weighting from appraisal: 7.5
100 consecutive patients visiting a General Practitioner (GP) in a primary care setting in central Catalonia, Spain.
1. General Practitioners (GPs) (face-to-face).
2. Teledermatology (TD) dermatologists.
3. Autoderm ML model (AI).
To perform a prospective validation of an image analysis ML model (Autoderm) as a diagnostic decision support tool, comparing its accuracy to GPs and teledermatology dermatologists in a real-life setting.
None reported.
Overall (100 cases):
• Top-1 Accuracy: AI 39% vs. GP 64% vs. TD 72%.
In-Distribution (82 cases):
• Top-3 Accuracy: AI 75% vs. GP 76%.
• Top-5 Accuracy: AI 89% vs. TD (Top-3) 90%.
The ML model's overall diagnostic accuracy (Top-1) in real-life conditions is lower than that of both GPs and dermatologists. However, the model shows capability as a support tool for GPs, particularly in differential diagnosis (Top-5 accuracy of 89% for trained diagnoses).
Han et al. 2020
PMID: 32243883
Original Article
Weighting from appraisal: 5.5
Reader study: 47 clinicians (21 dermatologists, 26 residents) & 23 non-medical professionals. Validation sets: SNU (2,201 images; 134 disorders) & Edinburgh (1,300 images; 10 disorders).
Medical professionals (dermatologists, residents).
Deep Neural Network (DNN) algorithm.
To train and validate a DNN for malignancy prediction, suggesting treatment options, and multi-class classification (134 disorders).
To assess if the algorithm can improve the performance of medical professionals ("Augmented Intelligence").
None reported.
Malignancy (Algorithm standalone): AUC 0.937 (SNU dataset) and 0.928 (Edinburgh dataset).
Human-in-the-loop (Malignancy): With AI assistance, sensitivity of 47 clinicians improved from 77.4% to 86.8% and specificity from 92.9% to 93.9%.
The mean Top-1 and Top-3 accuracies of dermatologists were 49.9% and 67.2%, respectively.
The algorithm performed comparably to experts and, when used as an ancillary tool ("Augmented Intelligence"), significantly improved the diagnostic performance of medical professionals for both malignancy prediction and multiclass classification.
Han et al. 2022
PMID: 35662137
Randomized Controlled Trial
Weighting from appraisal: 10
576 consecutive cases (patients) with suspicious lesions.
8 trainees (4 dermatology residents, 4 non-dermatology trainees).
1. Trainees (unaided group, n=281).
2. Trainees (AI-assisted group, n=295) using "Model Dermatology" algorithm.
To validate whether a multiclass AI algorithm could augment the accuracy of non-expert physicians in a real-world setting, including diverse out-of-distribution conditions.
None reported.
Overall Top-1 Accuracy (Trainees): 53.9% (AI-assisted) vs. 43.8% (unaided) (P=0.019).
Non-Derm Trainees Top-1: 54.7% (AI-assisted) vs. 29.7% (unaided).
Derm Residents Top-1: 53.1% (AI-assisted) vs. 57.3% (unaided) (P=0.55).
The multiclass AI algorithm augmented the diagnostic accuracy of non-expert physicians in dermatology, especially for the least experienced (non-dermatology trainees).
Jain et al. 2021
PMID: 33909051
MRMC Diagnostic Study
Weighting from appraisal: 8
40 clinicians (20 PCPs, 20 NPs).
1048 retrospective teledermatology cases (120 skin conditions).
1. PCPs and NPs (unassisted).
2. PCPs and NPs with an AI-based assistive tool.
To evaluate an AI-based tool that assists PCPs and NPs with diagnoses of dermatologic conditions.
None reported.
Top-1 Agreement (vs. Derm. Panel):
• PCPs: 48% (unassisted) vs. 58% (assisted), an increase of 10%. Top-3 Agreement (PCPs): 57%.
• NPs: 46% (unassisted) vs. 58% (assisted), an increase of 12%.
Agreement (vs. Biopsy):
• PCPs: +3% (64% to 67%).
• NPs: +8% (60% to 68%).
AI assistance was associated with improved diagnoses by PCPs and NPs for 1 in every 8 to 10 cases, indicating potential for improving the quality of dermatologic care.
Kim et al. 2022
PMID: 35061691
Prospective Controlled Study
Weighting from appraisal: 9.5
285 cases (patients) with suspected skin neoplasms.
18 trainee doctors (11 dermatology, 7 intern).
1. Trainees (Control group, n=141): Routine exam + photo review.
2. Trainees (AI group, n=144): Routine exam + photo review + AI assistance (Model Dermatology).
To evaluate whether an AI algorithm (http://b2019.modelderm.com) improves the accuracy of nondermatologists in diagnosing skin neoplasms in a real-world setting.
None reported.
AI Group (Before vs. After AI): Top-1 Accuracy increased from 46.5% to 58.3% (P=.008).
Top-3 Accuracy increased from 54.9% to 71.5%.
Dermatologists: Top-1 Accuracy 61.8% and Top-3 Accuracy 71.5%.
The number of differential diagnoses also increased significantly in the AI group.
In real-world settings, AI augmented the diagnostic accuracy of trainee doctors. The number of differential diagnoses also increased.
Liu Y et al. 2020
PMID: 32424212
DLS Development & Validation
Weighting from appraisal: 7.5
Development set: 16,114 de-identified cases.
Validation set B: 963 cases.
Reader group: 6 dermatologists, 6 PCPs, 6 NPs.
1. Dermatologists, PCPs, NPs (unassisted).
2. Deep Learning System (DLS).
To develop and validate a DLS to provide a differential diagnosis of 26 common skin conditions (and 419 total) using images and clinical data. To compare DLS accuracy to dermatologists, PCPs, and NPs.
Not applicable (retrospective development).
Top-1 Accuracy (on 963 cases): DLS 66% vs. Dermatologists 63% vs. PCPs 44% vs. NPs 40%. DLS was non-inferior to dermatologists.
Top-3 Accuracy: DLS 90% vs. Dermatologists 75% vs. PCPs 60% vs. NPs 55%.
The DLS can distinguish between 26 common skin conditions at a level non-inferior to dermatologists and more accurate than PCPs and NPs, highlighting its potential to assist general practitioners.
Muñoz-López et al. 2021
PMID: 33037709
Prospective Diagnostic Study
Weighting from appraisal: 7
340 cases from 281 consecutive patients in a teledermatology clinic.
Reader study: 9 providers (3 dermatologists, 3 residents, 3 GPs).
1. Teledermatologists (real-time).
2. AI algorithm (Model Dermatology).
3. Reader study (Dermatologists, Residents, GPs).
To assess the diagnostic performance and potential clinical utility of an AI algorithm (Model Dermatology) in a real-life telemedicine setting.
None reported.
Overall Top-1 Accuracy: AI 41.2% vs. GPs 49.3% vs. Residents 57.8% vs. Dermatologists 60.1%.
'In-Distribution' Balanced Top-1 Accuracy: AI 47.6% vs. GPs 39.7% vs. Residents 47.7% vs. Dermatologists 49.7%.
In this prospective real-life study, the AI algorithm's accuracy is inferior to dermatologists. However, when analysis was limited to 'in-distribution' diagnoses, the AI's balanced accuracy was comparable to dermatologists/residents and superior to GPs.
Clinical data collected on the referral accuracy of PCPs in dermatological conditions​

In this section, we present the clinical data collected on the referral accuracy of primary care practitioners (PCPs) in dermatological conditions. The following table summarizes the key studies included in this section, highlighting their design, population, outcomes, and main conclusions. As in previous sections, we focus on studies that provide insights into the referral accuracy of PCPs and into the reduction of unnecessary referrals achieved by either AI-guided medical devices or teledermatology.

Study | Baseline Population | Standard clinical practice or device(s)? | Objective(s) | Safety outcomes | Performance outcomes | Main conclusion
Baker et al. 2022
(Abstract)
Pilot prospective study
Weighting from appraisal: 5.5
Patients with urgent skin cancer referrals at a UK hospital trust (500-600 cases/month).
1. Standard 2-week wait (2WW) referral pathway.
2. New AI teledermatology software (UKCA marked Class IIa) used at community hubs for triage.
To test the use of AI teledermatology software to triage urgent skin cancer referrals and manage increased demand.
None reported.
The AI service led to a 62% reduction in the number of patients requiring an urgent face-to-face appointment.
• The rate of unnecessary referrals returned to the GP was 34%.
• 38% of patients still required an urgent face-to-face appointment.
The introduction of the AI teledermatology service significantly reduced the number of urgent face-to-face appointments needed and helped the trust meet its 2-week wait targets.
Eminović et al. 2009
PMID: 19433694
Cluster Randomized Controlled Trial (RCT)
Weighting from appraisal: 9
631 patients referred by 85 General Practitioners (GPs) in the Netherlands.
1. Control group (n=39 GPs): Standard referral to a dermatologist.
2. Intervention group (n=46 GPs): Teledermatologic (store-and-forward) consultation first.
To determine whether teledermatologic consultations can reduce in-person referrals to a dermatologist by GPs.
None reported.
The proportion of office visits considered "preventable" by the dermatologist was 39.0% in the teledermatology group vs. 18.3% in the control group.
• This was an absolute reduction of unnecessary referrals 20.7% (95% CI, 8.5%-32.9%).
Teledermatologic consultation offers the promise of reducing referrals to a dermatologist by 20.7%.
Jain et al. 2021
PMID: 33909051
MRMC Diagnostic Study
Weighting from appraisal: 8
40 clinicians (20 PCPs, 20 NPs).
1048 retrospective teledermatology cases (120 skin conditions).
1. PCPs and NPs (unassisted).
2. PCPs and NPs with an AI-based assistive tool.
To evaluate an AI-based tool that assists PCPs and NPs with diagnoses of dermatologic conditions.
None reported.
(Rates for desired biopsies and referrals decreased slightly with AI assistance).
With AI assistance, PCPs reduced their referral rate from 45% to 42%, a reduction of 3 percentage points.
AI assistance was associated with improved diagnoses by PCPs and NPs for 1 in every 8 to 10 cases, indicating potential for improving the quality of dermatologic care.
Knol et al. 2006
PMID: 16539753
Prospective Study
Weighting from appraisal: 7.5
505 teledermatology consultations for 503 patients from 29 participating GPs in the Netherlands.
1. GPs' stated intention to refer (hypothetical).
2. Store-and-forward teledermatology consultation (digital photos + clinical info sent to dermatologist).
To investigate the reduction in dermatological referrals following primary-care teledermatology consultation.
None reported.
Referral Reduction: Of the 306 patients the GPs intended to refer, teledermatology prevented the referral for 163 (53%). Adjusted for missing data, the reduction was 51% (95% CI 47-58%).
New Referrals: Of 144 patients GPs did not intend to refer, 17% were referred after the tele-consult.
Consultation using digital store-and-forward teledermatology by the GP can halve (51-53%) the number of referrals to a dermatologist for selected patients.
Clinical data on the impact of AI-Guided Medical Devices on Dermatology Waiting Times and the Current Healthcare Landscape in Spain and the EU​

In this section, we present clinical data on the impact of AI-guided medical devices on dermatology waiting times and the current healthcare landscape in Spain and the EU. The following table summarizes key studies and reports that provide insights into how AI technologies are influencing dermatology services, particularly in terms of reducing waiting times and improving access to care. In addition to peer-reviewed studies, we also include relevant reports from governmental bodies to provide a comprehensive overview of the current state of dermatology services in the Basque Country, Spain, and the EU.

Study | Baseline Population | Standard clinical practice or device(s)? | Objective(s) | Safety outcomes | Performance outcomes | Main conclusion
Giavina-Bianchi et al. 2020
PMID: 33437950
Cross-sectional Retrospective
Weighting from appraisal: 6.5
30,976 individuals (55,624 skin lesions) from the São Paulo public health system waiting list.
Store-and-forward teledermatology (SF-TD) triage project.
To evaluate the proportion of individuals who could be assessed in primary care using teledermatology, and how this affected the waiting time for an in-person dermatologist appointment.
None reported.
53% of patients were managed in primary care. 43% were referred to in-person dermatologists. 4% were referred directly to biopsy.
• This led to a 78% reduction in the mean waiting time for in-person appointments (from 6.7 months to 1.5 months).
The use of teledermatology as a triage tool significantly reduced the waiting time for in-person visits, improving health care access and utilizing public resources wisely.
Giavina-Bianchi et al. 2020
PMID: 32314966
Retrospective Cohort
Weighting from appraisal: 5
6633 individuals aged 60+ (12,770 skin lesions) from the São Paulo teledermatology project.
Store-and-forward teledermatology (SF-TD) triage project.
To evaluate the proportion of lesions in individuals aged 60+ that could be managed by teledermatology in primary care.
None reported.
66.66% of dermatoses (8408/12,614) were managed in primary care. 27.10% were referred to an in-person dermatologist. 6.24% were referred directly to biopsy.
• Project reduced mean waiting time from 6.7 months to 1.5 months (a 78% reduction).
Teledermatology helped to treat 67% of the dermatoses of older individuals without an in-presence visit, thus optimizing dermatological appointments for the most severe, surgical, or complex diseases.
Morton et al. 2010
PMID: 21198539
Observational Study
Weighting from appraisal: 5
Patients referred for 'urgent suspected cancer' (289 photo-triage, 188 conventional) in Forth Valley, Scotland.
1. Conventional letter referral (all booked to consultant clinic).
2. Community-based photo-triage (close-up + dermoscopic images).
To compare the outcomes and costs of conventional and photo-triage referral pathways for suspected skin cancers.
None reported.
Photo-triage allowed 91% (263/289) of patients to get definitive care at the first visit, vs. 63% (117/186) conventionally. It reduced the number requiring a consultant clinic by 72%.
• Mean wait time for MM treatment was 36 days (photo) vs. 39 (conventional), a reduction of 7.7% in waiting time.
Community photo-triage improved referral management of suspected skin cancer, increased service capacity, was marginally cheaper (£1.70 per patient), and reduced hospital visits.
Hsiao & Oh 2008
PMID: 18485493
Retrospective Chart Review
Weighting from appraisal: 8
169 skin cancer patients (from 3 remote VA primary care clinics) treated in dermatology surgery clinics.
1. Conventional text-based electronic consult request.
2. Store-and-forward (S/F) teledermatology consult (images + text).
To examine the time intervals in which skin cancer patients (referred conventionally or by S/F teledermatology) were evaluated, diagnosed, and treated.
None reported.
Mean Time from Referral:
• Initial Consult: 4 days (TD) vs. 48 days (Conv.).
• Biopsy: 38 days (TD) vs. 57 days (Conv.) (p=.034).
• Surgery: 104 days (TD) vs. 125 days (Conv.) (p=.006). A reduction of 17% in cumulative waiting time.
Clinical outcomes in skin cancer management via teledermatology, as measured by times to diagnosis and surgical treatment, can be comparable to, or better than, conventional referrals for remote patients.
Spanish SNS Report June 2025
(SISLE-SNS Data June 2025)
Patients on the Spanish National Health System (SNS) waiting list.
National Health System waiting list registry (surgical and consults).
To report the status of the waiting lists (number of patients, wait times, % > 6 months) for surgical procedures and specialist consultations in the SNS, focusing here on the waiting time for a dermatology consultation.
Not applicable (Registry report).
Surgical - Dermatology: 19,569 patients waiting. Mean wait: 69 days. 7.4% wait > 6 months.
Consults - Dermatology: 8.00 patients/1,000 inhabitants. Mean wait: 121 days. 70.3% wait > 60 days.
Basque Country:
• Consults - Dermatology: 3.59 patients/1,000 inhabitants. Mean wait: 43 days. 53.9% wait > 60 days.
As of June 2025, the mean wait for a dermatology consultation (121 days) is the longest of all specialties, while the wait for surgery (69 days) is one of the shortest. By contrast, the mean wait for a dermatology consultation in the Basque Country is 43 days, shorter than the mean for Spain.
DREES Report 2018
(France)
40,000 people from the Constances cohort in France.
Standard appointment booking with French medical professionals.
To survey and report on the waiting times for access to care for GPs and various specialists in France (2016-2017 data).
None reported (Report).
Median Wait Time (all reasons):
• General Practitioner: 2 days.
• Dermatologist: 50 days.
Mean Wait Time (all reasons):
• General Practitioner: 6 days.
• Dermatologist: 61 days.
Half of GP appointments are obtained in less than 2 days. For specialists like dermatology, median wait times are longer (50-52 days), though they are much shorter if the reason is new or worsening symptoms.
DERMAsurvey 2013
(EUMS Report)
42 delegates (EUMS dermatology section) from 33 European countries.
National healthcare systems in 33 European countries.
To evaluate variations in healthcare systems, access to care, and national approaches to diagnostics and treatment for skin diseases in 33 European countries.
Not applicable (Survey of systems).
Waiting Times (Regular Visit): Mean 35.7 days. Ranged from less than 1 day (Greece) to 96 (UK), 112 (Slovenia), and 133 (Ireland) days.
Waiting Times (Emergency): Mean 1.9 days.
Waiting Times (Skin Tumour Surgery): Mean 18.4 days.
There are extensive variations in dermatology health care across Europe. Waiting times for regular visits average 35.7 days but exceed 3 months in countries like the UK and Ireland.
Clinical data on the impact of AI-Guided Devices on Remote Patient Management Rates in Dermatological consultations​

In this section, we present clinical data on the impact of AI-guided devices on remote patient management rates in dermatological consultations. The following table summarizes key studies that provide insights into how AI technologies are influencing remote patient management, particularly in terms of reducing the need for in-person visits and improving access to care.

Study | Baseline Population | Standard clinical practice or device(s)? | Objective(s) | Safety outcomes | Performance outcomes | Main conclusion
Giavina-Bianchi et al. 2020
PMID: 33437950
Cross-sectional Retrospective
Weighting from appraisal: 6.5
30,976 individuals (55,624 skin lesions) from the São Paulo public health system waiting list.
Store-and-forward teledermatology (SF-TD) triage project.
To evaluate the proportion of individuals who could be assessed in primary care using teledermatology, and how this affected the waiting time for an in-person dermatologist appointment.
None reported.
53% of patients were managed remotely in primary care. 43% were referred to in-person dermatologists. 4% were referred directly to biopsy.
The use of teledermatology as a triage tool significantly reduced the waiting time for in-person visits, improving health care access and utilizing public resources wisely.
Giavina-Bianchi et al. 2020
PMID: 32314966
Retrospective Cohort
Weighting from appraisal: 5
6633 individuals aged 60+ (12,770 skin lesions) from the São Paulo teledermatology project.
Store-and-forward teledermatology (SF-TD) triage project.
To evaluate the proportion of lesions in individuals aged 60+ that could be managed by teledermatology in primary care.
None reported.
66.66% of dermatoses (8408/12,614) were managed remotely in primary care. 27.10% were referred to an in-person dermatologist. 6.24% were referred directly to biopsy.
• Project reduced mean waiting time from 6.7 months to 1.5 months (a 78% reduction).
Teledermatology helped to treat 67% of the dermatoses of older individuals without an in-presence visit, thus optimizing dermatological appointments for the most severe, surgical, or complex diseases.
Orekoya et al. 2021
(Abstract)
Retrospective Review
Weighting from appraisal: 5
988 patients referred to a 2-week-wait (2WW) skin cancer clinic in September 2020.
1. Referral after face-to-face (F2F) GP consultation.
2. Referral after remote GP consultation (mostly telephone + photos).
To assess whether the mode of consultation (F2F or remote) in primary care affected the outcomes of consultations in 2WW skin cancer clinics.
None reported.
A higher proportion of patients who had remote consultations were discharged (43.4%) from the 2WW clinic than patients who had F2F consultations (36.2%).
• A significantly higher proportion of benign lesions were referred following a remote consultation (70%) vs. a F2F consultation (59%) (P=0.004).
This study highlights the value of F2F consultations for the initial assessment of lesions in primary care, in order to reduce the number of unnecessary referrals and hospital visits.
Kheterpal et al. 2023
PMID: 37891695
Implementation Evaluation
Weighting from appraisal: 5
218 TD referrals from 4 Duke primary care (DPC) pilot sites.
Hybrid TD program: PCPs send e-consults (clinical + dermoscopic images) to dermatology, followed by a video visit with a dermatologist/resident.
To evaluate the implementation (barriers, facilitators, outcomes) of a hybrid TD virtual clinic at four primary care practices.
None reported (focus on implementation barriers).
Access: Mean time from e-consult to video visit was 7.5 days (vs. > 6 months for in-person).
Adoption: Varied widely; one clinic used TD for 22% of all derm referrals, another for only 2%. 35% of patients could be managed remotely.
PCP Barriers: Time burdens, poor clinic flow, discomfort with image taking.
The hybrid TD virtual clinic effectively reduced patient wait times for dermatology from > 6 months to ~ 1 week, but adoption was variable. Addressing PCP barriers is key to increasing uptake.
Whited 2015
PMID: 26433206
Review Article
Weighting from appraisal: 4.5
Patients and providers using teledermatology (review of multiple studies).
Store-and-forward (S/F) and Real-time (RT) teledermatology vs. conventional care.
To review the evidence for teledermatology, focusing on diagnostic reliability, diagnostic accuracy, clinical outcomes, and user satisfaction.
None reported.
In-person dermatology visits decrease by an average of 45.5% (S/F) to 61.5% (RT).
• Clinical outcomes are comparable to conventional care.
• Diagnostic reliability (agreement) is high and comparable to in-person agreement.
• 53.5% of patients were able to be managed remotely.
Teledermatology is a diagnostically reliable means of diagnosing skin conditions with comparable clinical outcomes and high patient satisfaction. It reduces in-person visits.
Clinical data on PCP Referral Accuracy for Dermatological Conditions​

In this section, we present the clinical data collected on the referral accuracy of primary care practitioners (PCPs) in dermatological conditions. The following table summarizes the key studies included in this section, highlighting their design, population, outcomes, and main conclusions. As in previous sections, we focus on studies that provide insights into the referral accuracy of PCPs, specifically in metrics such as sensitivity and specificity. These articles describing the current standard of clinical practice (e.g., the sensitivity and specificity of PCPs in detecting necessary referrals) were used to define the context for this state-of-the-art review. While not part of the formal literature data extraction (as they do not meet the PICO-based inclusion criteria), they provide the benchmark against which the device's performance is compared.

Study | Baseline Population | Standard clinical practice or device(s)? | Objective(s) | Safety outcomes | Performance outcomes | Main conclusion
Burton et al. 1998
J Med Screen
Screening Study
Weighting from appraisal: 8.5
109 volunteers (mean age 61) screened by 63 GPs (31 trained, 32 untrained) and 4 skin cancer specialists.
1. Untrained General Practitioners (GPs).
2. GPs trained in melanoma diagnosis.
3. Skin cancer specialists (as reference).
To measure the screening performance (sensitivity, specificity, PPV) of trained and untrained GPs in screening men and women aged 50+ for melanomas in the process of referral.
None reported.
Screening (Detecting subjects with melanoma):
• Trained GPs: Sens 0.98, Spec 0.52, PPV 0.22.
• Untrained GPs: Sens 0.95, Spec 0.49, PPV 0.20.
Referral sensitivity: 70% (95% CI: 67-73%)
Referral specificity: 52% (95% CI: 43-61%).
GPs achieved high sensitivity in screening for melanoma subjects (95-98%) but at the cost of very low specificity (49-52%). On the other hand, GPs showed a 70% sensitivity and a 52% specificity in the detection of patients that need referral to dermatology. Training in melanoma diagnosis significantly improved a GP's ability to diagnose a melanoma correctly but did not significantly improve their overall screening statistics (sensitivity/specificity).
Gerbert et al. 1996
Arch Dermatol
Prospective Study
Weighting from appraisal: 6
71 primary care residents, 15 dermatologists and dermatology residents.
1. Primary Care Physicians (residents).
2. Dermatologists (and residents).
To determine PCPs' readiness to triage lesions suspicious for skin cancer; to compare their abilities to dermatologists; to assess if accuracy on slides transfers to patients.
None reported.
Dermatologists' scores were almost double those of primary care residents.
• Primary care residents failed 50% of the time to correctly diagnose nonmelanoma skin cancer. PCPs showed a sensitivity of 79% (95% CI: 72-86%) and a specificity of 73% (95% CI: 66-80%) in identifying patients who needed referral to dermatology.
Dermatologists' diagnostic scores were almost double those of primary care residents. Performance was positively associated with previous dermatology experience.
Clinical data collected on Inter-Observer Reliability in HS Severity Assessment using IHS4 scoring system​

In this section, we present the clinical data collected on inter-observer reliability in Hidradenitis Suppurativa (HS) severity assessment using the International Hidradenitis Suppurativa Severity Score System (IHS4). The following table summarizes key studies that provide insights into the consistency of IHS4 scoring among different observers, highlighting their design, population, outcomes, and main conclusions.

Study | Baseline Population | Standard clinical practice or device(s)? | Objective(s) | Safety outcomes | Performance outcomes | Main conclusion
Goldfarb et al. 2021
Br J Dermatol
Psychometric Assessment
Weighting from appraisal: 9.5
Raters (dermatologists) assessing photographs of HS patients.
Existing HS outcome tools (lesion counts, Hurley, Sartorius, IHS4).
To assess the reliability and validity of the Hidradenitis Suppurativa Area and Severity Index Revised (HASI-R) tool.
Not applicable (Psychometric assessment).
Inter-rater reliability (ICC): 0.88 (95% CI 0.77–0.94).
Intra-rater reliability (ICC): 0.94 (95% CI 0.88–0.97).
IHS4 inter-rater reliability: 0.47 (95% CI: 0.33-0.66).
The HASI-R is a valid and reliable outcome measurement instrument for HS that incorporates both inflammation and body surface area, addressing the time-consuming and unreliable nature of existing lesion-count tools.
Thorlacius et al. 2019
Br J Dermatol
Reliability Study
Weighting from appraisal: 10
10 dermatologists rating 30 patients with HS (all Hurley stages) from photographs.
HS outcome instruments: Hurley staging, modified Sartorius score (MSS), HS-PGA, HSS, and lesion counts (abscesses, nodules, fistulas).
To determine the inter-rater agreement and reliability of the most commonly used outcome instruments and staging systems in hidradenitis suppurativa (HS).
Not applicable (Psychometric assessment).
Inter-rater reliability (ICC(2,1)):
• Substantial: Hurley (0.80), Modified Sartorius (0.80).
• Moderate: HS-PGA (0.72), HSS (0.64), Fistula count (0.62), Abscess count (0.59), Nodule count (0.54). Overall IHS4 inter-rater reliability: 0.47 (95% CI: 0.32-0.65).
Hurley staging and the modified Sartorius score demonstrated substantial inter-rater reliability. Lesion counts and the HSS showed only moderate reliability, suggesting they are less suitable as standalone outcome measures in multicenter trials.
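The ICC values reported above can in principle be reproduced from a subjects-by-raters score matrix. The sketch below implements ICC(2,1) (two-way random effects, absolute agreement, single rater) following Shrout & Fleiss (1979); the ratings used are hypothetical and purely illustrative, not data from the cited studies.

```python
import numpy as np

def icc_2_1(scores):
    """ICC(2,1): two-way random effects, absolute agreement, single rater
    (Shrout & Fleiss, 1979).

    scores: (n subjects x k raters) array, e.g., one IHS4 total per patient per rater.
    """
    x = np.asarray(scores, float)
    n, k = x.shape
    grand = x.mean()
    ss_rows = k * ((x.mean(axis=1) - grand) ** 2).sum()    # between-subject SS
    ss_cols = n * ((x.mean(axis=0) - grand) ** 2).sum()    # between-rater SS
    ss_err = ((x - grand) ** 2).sum() - ss_rows - ss_cols  # residual SS
    msr = ss_rows / (n - 1)             # mean square, subjects
    msc = ss_cols / (k - 1)             # mean square, raters
    mse = ss_err / ((n - 1) * (k - 1))  # mean square, error
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)

# Hypothetical example: 5 patients scored by 3 raters.
ratings = [[4, 5, 4],
           [10, 9, 12],
           [2, 2, 3],
           [7, 8, 6],
           [15, 13, 14]]
print(f"ICC(2,1) = {icc_2_1(ratings):.2f}")  # ~0.96 for these consistent raters
```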
Clinical data on the Variability in FAGA Severity Grading Using the Ludwig Scale​

In this section, we present scientific guidance on how to interpret the metrics used to assess agreement between observers, in this case for the severity grading of Female Androgenetic Alopecia (FAGA) using the Ludwig Scale. We take this approach because limited clinical data specifically address the variability in FAGA severity grading using this scale. We therefore provide a general overview of the metrics commonly used to assess inter-observer agreement in clinical settings, which can be applied to the context of FAGA severity grading.

Metric | Description | Interpretation Guidelines | Guidelines Reference
Cohen's Kappa (κ)
A measure of agreement between two raters on an ordinal scale; its weighted form accounts for the degree of disagreement rather than just whether the raters agree or disagree.
- κ < 0: Agreement worse than chance
- κ = 0.01-0.20: Slight agreement
- κ = 0.21-0.40: Fair agreement
- κ = 0.41-0.60: Moderate agreement
- κ = 0.61-0.80: Substantial agreement
- κ = 0.81-1.00: Almost perfect agreement
Cohen, J. (1968). Weighted kappa: Nominal scale agreement with provision for scaled disagreement or partial credit; Landis, J.R. & Koch, G.G. (1977).
Pearson's Correlation Coefficient (r)
A measure of the linear correlation between two raters' scores on a continuous scale.
- r = 1: Perfect positive correlation
- r = 0.70-0.99: Strong positive correlation
- r = 0.40-0.69: Moderate positive correlation
- r = 0.10-0.39: Weak positive correlation
- r = 0: No correlation
- r < 0: Negative correlation
Cohen, J. (1988). Statistical Power Analysis for the Behavioral Sciences; Evans, J.D. (1996). Straightforward Statistics for the Behavioral Sciences.
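As an illustration of how these agreement metrics are computed in practice, the following sketch uses scikit-learn's cohen_kappa_score and SciPy's pearsonr on hypothetical Ludwig grades from two raters; the grades are invented for illustration only.

```python
from sklearn.metrics import cohen_kappa_score
from scipy.stats import pearsonr

# Hypothetical Ludwig grades (I=1, II=2, III=3) assigned by two raters
# to the same 10 patients; the data are illustrative only.
rater_a = [1, 1, 2, 2, 2, 3, 1, 2, 3, 3]
rater_b = [1, 2, 2, 2, 3, 3, 1, 2, 3, 2]

# Unweighted kappa treats any disagreement equally; linearly weighted kappa
# gives partial credit when raters differ by only one grade (Cohen, 1968).
print("kappa (unweighted):", cohen_kappa_score(rater_a, rater_b))
print("kappa (linear weights):", cohen_kappa_score(rater_a, rater_b, weights="linear"))

# Pearson's r measures linear association of the scores, not agreement:
# two raters offset by a constant grade can still have r = 1.
r, p_value = pearsonr(rater_a, rater_b)
print(f"Pearson r = {r:.2f} (p = {p_value:.3f})")
```

This distinction is why kappa-type statistics, rather than correlation alone, are preferred when the question is whether two observers assign the same severity grade.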
Assessment of Expert Consensus on the Perceived Utility of the device​

In this section, we describe the methodology used to define and assess the expert consensus on the perceived utility of the device along with the guidelines followed, including the pre-defined threshold for minimum acceptable agreement.

Methodological Term | Description | Key Points | References
Expert Consensus
A structured method to quantify the collective opinion of an expert panel on a specific topic (in this case, the perceived utility of a device).
Consensus is determined by comparing survey results to a pre-defined agreement threshold.

- Methodological literature does not set a single universal threshold, but an agreement of ≥75% is frequently considered a substantial or optimal majority consensus.
Diamond, I. R., et al. (2014). Defining consensus: a systematic review recommends guideline-specific definitions. Journal of Clinical Epidemiology.

Fitch, K., et al. (2001). The RAND/UCLA Appropriateness Method User's Manual. RAND Corporation.
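As a minimal illustration of how such a threshold is applied, the sketch below counts agreeing panel responses and compares the agreement proportion to the ≥75% criterion. The responses, and the choice of ratings 4-5 on a 5-point Likert scale as "agreement", are assumptions for illustration only.

```python
# Minimal sketch of a consensus check against a pre-defined >=75% threshold.
# Panel responses are hypothetical 5-point Likert ratings of perceived utility,
# where a rating of 4 or 5 is counted as agreement (an assumption).
THRESHOLD = 0.75

responses = [5, 4, 4, 5, 3, 4, 5, 4, 2, 5, 4, 4]  # one rating per expert
agreeing = sum(1 for r in responses if r >= 4)
agreement = agreeing / len(responses)

print(f"Agreement: {agreement:.0%} ({agreeing}/{len(responses)})")
print("Consensus reached" if agreement >= THRESHOLD else "No consensus")
```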

Summary of articles retained for the description of similar devices​

Clinical data collected on SkinVision​
Study | Baseline Population | Standard clinical practice or device(s)? | Objective(s) | Safety outcomes | Performance outcomes | Main conclusion
Udrea et al. 2019
PMID: 31494983
Retrospective algorithm performance study
Weighting from appraisal: 8.5
Sensitivity set: 285 histopathologically validated skin cancer cases (138 MM, 147 KC/precursors) from two clinical studies (195 cases) and the app's user database (90 cases).
Specificity set: 6000 clinically validated benign cases from the app's user database.
A smartphone application (SkinVision) using a machine learning algorithm.
To evaluate the accuracy (sensitivity and specificity) of the newest version of the smartphone app for risk assessment of skin lesions.
No technical problems reported. 14 out of 285 (pre)malignant cases were classified as low risk (false negatives).
This included 10 out of 138 malignant melanomas (MMs).
Overall (pre)malignancy:
• Sensitivity: 95.1% (95% CI: 91.9-97.3%).
Malignant Melanoma:
• Sensitivity: 92.8% (95% CI: 87.8-96.5%).
Specificity (on 6000 benign cases):
• Specificity: 78.3% (95% CI: 77.2-79.3%).
This smartphone app provides a high sensitivity to detect skin cancer; however, there is still room for improvement in terms of specificity.
Sangers et al. 2022
PMID: 35124665
Prospective multicenter diagnostic accuracy study
Weighting from appraisal: 9
372 patients (785 total lesions) at two Dutch dermatology outpatient clinics.
Lesions included 418 suspicious lesions and 367 benign control lesions.
A CE-marked mHealth app (SkinVision, version RD-174) using a Convolutional Neural Network (CNN). Tested on iOS (iPhone XR) and Android (Galaxy S9) devices.
To identify the diagnostic accuracy (sensitivity and specificity) of the app for detecting premalignant and malignant skin lesions.
None reported.
False negatives included 1 invasive melanoma, 2 in situ melanomas, 2 squamous cell carcinomas, and 13 basal cell carcinomas.
Overall (pre)malignancy:
• Sensitivity: 86.9% (95% CI: 82.3-90.7).
• Specificity: 70.4% (95% CI: 66.2-74.3).
Performance by device:
• iOS: Sensitivity 91.0%.
• Android: Sensitivity 83.0%.
The diagnostic accuracy of the mHealth app is "far from perfect," but it is potentially promising to empower patients to self-assess skin lesions. Additional validation is warranted, particularly for suspicious pigmented skin lesions.
Gregoor et al. 2023
PMID: 37261324
Pilot feasibility study (mixed-methods)
Weighting from appraisal: 8
50 patients recruited from 3 primary care (GP) practices in the Netherlands.
1. AI-based mHealth app (SkinVision) used by patients before GP consultation.
2. GPs' unassisted (blinded) diagnosis.
3. GPs' unblinded diagnosis (to assess app's impact).
To investigate the conditions and feasibility of a larger study on implementing the AI app in primary care (both in patient hands and as a GP tool).
None reported.
(Exploratory, n=45):
• AI App: Sensitivity 90.9% (95% CI: 55.5-99.8%) (9/10), Specificity 80.0% (95% CI: 63.0-91.6%) (28/35).
• GP (Blinded): Sensitivity 80.0% (95% CI: 44.4-97.5%) (8/10), Specificity 80.0% (95% CI: 63.1-91.6%) (28/35).
Studying the implementation of the AI app in primary care appears feasible. 54% of patients with a benign skin lesion and a low-risk app rating indicated they would be reassured and cancel their GP visit.
Thissen et al. 2017
PMID: 28562195
Algorithm calibration & evaluation study
Weighting from appraisal: 8.5
341 lesions from 256 consecutive patients at a dermatology department in the Netherlands.
A subset of 108 lesions was used for the final evaluation.
A smartphone app (SkinVision) using a recalibrated rule-based (fractal and classical) image analysis algorithm.
To assess the sensitivity and specificity of the recalibrated algorithm in diagnosing melanoma, nonmelanoma skin cancer, and premalignant lesions.
7 out of 35 (pre)malignant lesions were missed (rated low/medium risk).
This included one basal cell carcinoma (BCC) rated as low risk. All melanomas (n=3) were rated high risk.
(On n=108 test set):
• Overall (pre)malignancy: Sensitivity 80% (95% CI: 62-90%), Specificity 78% (95% CI: 66-86%).
• Performance dropped without the patient questionnaire (Sensitivity 71%, Specificity 56%).
The mHealth app may offer support to professionals less familiar with differentiating skin lesions, although it is less accurate than a dermatologist's clinical eye. It adds value by analyzing both pigmented and non-pigmented lesions.
Maier et al. 2014
PMID: 25087492
Prospective diagnostic study
Weighting from appraisal: 5
195 melanocytic lesions from consecutive patients at a German dermatology department.
144 lesions were included in the final statistical evaluation.
1. A smartphone app (SkinVision) using fractal image analysis.
2. Clinical and dermoscopic diagnosis by two dermatologists.
To prospectively evaluate the app's sensitivity and specificity for diagnosing malignant melanoma, compared to clinical diagnosis and histopathology.
No technical problems reported. The app missed 7 out of 26 melanomas (false negatives).
• 2 were rated low risk (both melanoma in situ).
• 5 were rated medium risk.
Dermatologists missed 2 out of 26 melanomas.
(On n=144 test set):
• AI App (Melanoma): Sensitivity: 73% (95% CI: 52-88%), Specificity: 83% (95% CI: 75-89%).
• Dermatologists (Melanoma): Sensitivity: 88% (95% CI: 69-98%), Specificity: 97% (95% CI: 92-99%).
The smartphone application might be a promising tool for pre-evaluation by laypersons, but it is "inferior to the diagnostic evaluation by a dermatologist".
Gregoor et al. 2023
PMID: 37210466
Retrospective population-based pragmatic study
Weighting from appraisal: 9.5
18,960 mHealth app users (from 2.2 million insured adults offered free access) matched 1:3 to 56,880 non-user controls.
1. mHealth app (SkinVision) with AI (CNN) assessment plus teledermatologist review.
2. Standard of care (controls who did not use the app).
To evaluate the impact of the mHealth app on dermatological healthcare consumption in a real-world, population-based setting.
None reported.
(Healthcare Claims Analysis):
• mHealth users had more claims for (pre)malignant skin lesions than controls (6.0% vs 4.6%; OR 1.3).
• mHealth users also had a much higher risk of claims for benign skin tumors and nevi (5.9% vs 1.7%; OR 3.7).
• The cost per additional (pre)malignancy detected was €2567.
The app appears to have a positive impact by detecting more (pre)malignancies, but this must be balanced against the "stronger increase in care consumption for benign skin tumors and nevi".
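The odds ratios in the claims analysis above can be illustrated with a simple 2x2 computation. In the sketch below, the cell counts are reconstructed from the reported percentages (6.0% of 18,960 users vs. 4.6% of 56,880 controls) and are therefore approximations, not the study's exact counts.

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio with a 95% Wald confidence interval from a 2x2 table:
    a/b = exposed with/without outcome, c/d = unexposed with/without outcome."""
    or_ = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lo = math.exp(math.log(or_) - z * se_log)
    hi = math.exp(math.log(or_) + z * se_log)
    return or_, lo, hi

# Approximate counts reconstructed from the reported percentages: app users vs.
# matched controls with at least one claim for a (pre)malignant skin lesion.
users_with, users_without = 1138, 17822        # ~6.0% of 18,960
controls_with, controls_without = 2616, 54264  # ~4.6% of 56,880

or_, lo, hi = odds_ratio_ci(users_with, users_without, controls_with, controls_without)
print(f"OR = {or_:.2f} (95% CI {lo:.2f}-{hi:.2f})")  # ~1.3, matching the reported OR
```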
Clinical data collected on Huvy​
Study | Baseline Population | Standard clinical practice or device(s)? | Objective(s) | Safety outcomes | Performance outcomes | Main conclusion
Zanchetta et al. 2025
JEADV Clinical Practice
Retrospective Algorithm Performance Study
Weighting from appraisal: 7.5
Test Datasets:
2966 images total, from:
1. GLOMEL (Public database): 2672 dermoscopic images.
2. Dermatologists (Private): 157 dermoscopic images.
3. TeleExp (Private): 137 real-life tele-dermatology images (68 usable).
1. AI-DSS HUVY with traditional binary classification (melanoma vs. non-melanoma).
2. AI-DSS HUVY with innovative ternary classification (melanoma vs. non-melanoma vs. 'doubtful').
To assess a deep learning algorithm's performance in classifying melanoma across diverse datasets (public, dermatologist-collected, and real-life tele-dermatology images), and to evaluate a novel 'doubtful' category.
None reported.
Binary (Sensitivity/Specificity):
• TeleExp: 92.3% / 58.5%
Ternary (Sensitivity/Specificity):
• TeleExp: 100% / 67.6% (18.5% 'doubtful' rate)

Introducing the 'doubtful' category significantly increased specificity (e.g., +15.6% on TeleExp, +19% on GLOMEL) while maintaining or improving sensitivity.
Introducing a 'doubtful' category significantly improves the AI's performance, especially specificity, compared to a simple binary classification. This three-level approach (high-risk, low-risk, doubtful) can help primary care providers make more informed referrals.
Clinical data collected on DERM​
Study | Baseline Population | Standard clinical practice or device(s)? | Objective(s) | Safety outcomes | Performance outcomes | Main conclusion
Thomas et al. 2023
PMID: 38020164
Prospective real-world post-deployment study
Weighting from appraisal: 5
10,925 patients (14,500 cases) referred to the urgent 2-week-wait (2WW) skin cancer pathway at two UK NHS hospitals.
Analysis based on 8,571 lesions with confirmed outcomes.
1. AI-DSS (DERM), versions A and B, used as a triage tool.
2. A "second-read review" by a consultant dermatologist for all cases DERM marked for discharge.
To report the prospective, real-world performance of the DERM AI tool after deployment in two NHS skin cancer pathways.
None reported.
AI-DSS (Melanoma or not):
• DERM-vA: Sensitivity 95.0% (95% CI: 90-97.6%) – 97.0% (95% CI: 84.7-99.5%), Specificity 58.8% (95% CI: 57.4-60.2%) – 63.2% (95% CI: 59.5-66.7%).
• DERM-vB: Sensitivity 100.0% (95% CI: 93.9-100% / 82.4-100%), Specificity 80.4% (95% CI: 77.2-83.4%) – 80.9% (95% CI: 79.3-82.4%).
AI-DSS (Malignant or not):
• DERM-vA: Sensitivity 96.0% (95% CI: 94.4-97.2%) – 99.3% (95% CI: 96.3-99.9%), Specificity 33.1% (95% CI: 29.3-71.1%) – 45.0% (95% CI: 43.4-46.6%).
• DERM-vB: Sensitivity 98.9% (95% CI: 96-99.7%) – 100.0% (95% CI: 94.7-100%), Specificity 60.6% (95% CI: 56.6-64.5%) – 64.8% (95% CI: 62.9-66.7%).
DERM's real-world performance met sensitivity targets. The newer version (DERM-vB) showed improved specificity and correctly referred all skin cancers. The performance supports removing the human second-read review to maximize system benefits.
Phillips et al. 2019
PMID: 31617929
Prospective, multicenter, masked diagnostic trial
Weighting from appraisal: 6.5
514 patients with at least one suspicious lesion scheduled for biopsy, from 7 UK hospitals.
Analysis included 1550 images (551 biopsied, 999 control).
1. AI-DSS (Deep Ensemble for Recognition of Malignancy - DERM).
2. Specialist clinician assessment.
3. Images taken with 3 cameras (iPhone 6s, Galaxy S6, DSLR).
To determine the accuracy of the AI algorithm (DERM) in identifying melanoma from dermoscopic images, compared to specialist assessment.
None reported.
(All Lesions, at 100% Sensitivity):
• AI (iPhone 6s): Specificity 64.8%.
• Specialists: Specificity 69.9%.
(AUROC - All Lesions):
• AI (iPhone 6s): 95.8% (95% CI: 94.1-97.6%).
• Specialists: 90.8% (95% CI: 87.5-96.1%).
The AI algorithm can detect melanoma from dermoscopic images with a similar level of accuracy as specialists (a sketch of this fixed-sensitivity operating-point analysis follows this table).
Marsden et al. 2024
PMID: 38585154
Prospective, single-centre, masked, non-inferiority trial
Weighting from appraisal: 9
700 patient attendances (867 lesions) referred to a UK teledermatology cancer pathway.
Per-protocol (PP) population: 622 patients (789 lesions).
1. Standard of Care (SoC): Teledermatology review by consultant dermatologists (using DSLR images).
2. AI-DSS (DERM): Independently assessed smartphone (iPhone XR) images.
Primary: To show the AI had a higher rate of correctly classifying non-malignant lesions (as not needing urgent referral) compared to SoC, while maintaining non-inferior sensitivity.
None reported.
Primary Outcome: The AI had a significantly higher rate of correctly identifying non-malignant lesions as not needing urgent referral vs. SoC (p < 0.0246).
(Malignancy Sens/Spec, PP pop.):
• SoC: 97.0% (95% CI: 88-99.5%) / 71.9% (95% CI: 68.4-75.1%).
• AI (Real-world): 94.0% (95% CI: 84.7-98.1%) / 73.3% (95% CI: 69.9-76.4%).
The AI as a medical device (AIaMD) identified significantly more lesions that did not need urgent referral compared to teledermatologists, demonstrating potential to reduce unnecessary referrals and specialist burden.
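Several of the DERM studies report specificity at a fixed operating point (e.g., Phillips et al. 2019 at 100% sensitivity). The sketch below shows how such an operating point is read off a ROC curve; the labels and risk scores are simulated for illustration, not data from the cited trials.

```python
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

# Simulated data: 50 melanomas and 500 benign lesions with model risk scores.
rng = np.random.default_rng(0)
y_true = np.concatenate([np.ones(50), np.zeros(500)])
y_score = np.concatenate([rng.normal(0.8, 0.12, 50),     # melanoma scores
                          rng.normal(0.35, 0.15, 500)]).clip(0, 1)

fpr, tpr, thresholds = roc_curve(y_true, y_score)
print("AUROC:", roc_auc_score(y_true, y_score))

# First point on the curve where every melanoma is flagged (sensitivity = 1.0);
# the specificity at that threshold is whatever remains achievable.
idx = np.argmax(tpr >= 1.0)
print(f"At 100% sensitivity: specificity = {1 - fpr[idx]:.1%}, "
      f"threshold = {thresholds[idx]:.3f}")
```

Fixing sensitivity first and then comparing specificities makes triage tools directly comparable on the clinically critical constraint of not missing melanomas.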
Clinical data collected on Dermalyser​
Study | Baseline Population | Standard clinical practice or device(s)? | Objective(s) | Safety outcomes | Performance outcomes | Main conclusion
Papachristou et al. 2024
PMID: 38234043
Prospective real-life clinical trial
Weighting from appraisal: 8.5
228 patients (presenting 253 lesions) seen by 138 trained Primary Care Physicians (PCPs) at 36 primary care centres in Sweden.
1. PCPs' unassisted clinical suspicion (recorded as 'high' or 'low').
2. An AI-based decision support system (smartphone app Dermalyser®).
To determine the diagnostic performance of an AI-based smartphone app for melanoma detection when used prospectively by PCPs on lesions of concern.
None reported.
AI-DSS (standalone, predefined cutoff):
• Sensitivity: 95.2%
• Specificity: 60.3%
• NPV: 99.3%
• AUROC: 0.960 (95% CI: 0.93-0.98)
PCPs (unassisted suspicion):
• Sensitivity: 57.1% (12/21)
• Specificity: 83.2% (193/232)
• NPV: 95.5%
The AI-based tool showed high diagnostic accuracy. Its high Negative Predictive Value (NPV) suggests it could help PCPs safely identify benign lesions, potentially reducing unnecessary excisions and referrals without increasing the risk of missing melanomas.
Clinical data collected on ModelDerm​
Study | Baseline Population | Standard clinical practice or device(s)? | Objective(s) | Safety outcomes | Performance outcomes | Main conclusion
Navarrete-Dechent et al. 2021
PMID: 33049269
External validation study
Weighting from appraisal: 5
A public dataset of 100 clinical images of biopsied skin cancers (37 melanomas, 40 BCCs, 23 SCCs) from Caucasian patients in the US.
1. Han et al. (2020b) 174-disease algorithm (modelderm.com), tested with 4 upload methods.
2. Han et al. (2020a) 178-disease region-based algorithm (rcnn.modelderm.com).
To evaluate the external validity and reliability of the Han et al. (2020b) and Han et al. (2020a) algorithms on a public dataset of skin cancers.
None reported.
174-disease algorithm (intended use):
• Overall Top-1 accuracy: 39%.
• Overall Top-3 accuracy: 63%.
• Performance was sensitive to upload condition (x1 magnification was worst).
178-disease algorithm:
• Overall 'Top-any' accuracy: 52%.
The 174-disease algorithm showed modest improvement over a previous 12-disease version, but limited transportability to an external dataset remained. The 178-disease algorithm also had low sensitivity. Performance was sensitive to image magnification.
Kim et al. 2022
PMID: 35061692
Prospective controlled before-and-after study
Weighting from appraisal: 9.5
285 cases with skin neoplasms suspected of malignancy from two tertiary care centers in South Korea (Asian patients).
1. AI group (n=144): Trainee doctors (interns/residents) diagnosed, then were assisted by an AI algorithm (http://b2019.modelderm.com) and could modify their diagnosis.
2. Control group (n=141): Trainee doctors diagnosed, then reviewed photos (no AI).
To evaluate whether an AI algorithm (Model Dermatology, build 2019) improves the accuracy of nondermatologists (trainee doctors) in diagnosing skin neoplasms in a real-world setting.
None reported.
AI Group (Trainees):
• Top-1 accuracy (exact diagnosis) increased from 46.5% to 58.3% (P=.008), an increase of 11.8%.
Control Group (Trainees):
• Top-1 accuracy did not change significantly (46.1% vs. 51.8%).
In a real-world setting, AI augmented the diagnostic accuracy (for exact diagnosis) of trainee doctors (limitation: tested only in Asian patients).
Navarrete-Dechent et al. 2018
PMID: 29864435
External validation study
Weighting from appraisal: 5
100 clinical images of biopsied skin cancers (37 melanomas, 40 BCCs, 23 SCCs) from Caucasian patients in the US (ISIC Archive).
The Han et al. (2018) 12-disease classifier, tested via a public web application.
To explore the generalizability (external validity) of the Han et al. (2018) algorithm on a public dataset of skin cancers.
None reported.
Overall:
• Top-1 accuracy was 29% (29 of 100).
• Top-5 accuracy was 58% (58 of 100).
The results suggest that the sensitivity of the Han et al. algorithm, especially for melanoma, is "considerably lower" when applied to a different patient population (external dataset).
Han et al. 2020
PMID: 32243882
Retrospective validation & reader study
Weighting from appraisal: 5.5
Validation: Edinburgh dataset (1,300 images; 10 disorders) and SNU dataset (2,201 images; 134 disorders).
Reader Study: 240 SNU images tested on 47 clinicians (21 dermatologists, 26 residents) & 23 non-medical professionals.
1. AI-DSS (trained on 220,680 images of 174 disorders).
2. Clinicians (dermatologists, residents) unassisted.
3. Clinicians assisted by the AI-DSS.
To validate an algorithm for multi-class classification (134 disorders), malignancy prediction, and treatment suggestion, and to assess its ability to improve clinician performance.None reportedAI-DSS (standalone, SNU):
* Top-1 accuracy (134 classes): 44.8%.
Clinicians (AI-assisted):
* Top-1 accuracy (134 classes, 4 doctors) improved by 7.0%.
* Non-medical professionals' malignancy sensitivity improved from 47.6% to 87.5%.
The algorithm may serve as "Augmented Intelligence" that can empower medical professionals in diagnostic dermatology by improving their sensitivity and accuracy.
Muñoz-López et al. 2021
PMID: 33037709
Prospective diagnostic accuracy study
Weighting from appraisal: 7
340 consecutive cases (from 281 patients) who submitted images to a teledermatology clinic in Chile. (87 unique diagnoses, mostly inflammatory).1. AI-DSS (Han et al. 174-disease algorithm; modelderm.com) used by teledermatologist during the visit.
2. Reader study (9 providers: 3 dermatologists, 3 residents, 3 GPs) assessing images only.
To assess the diagnostic performance and potential clinical utility of the AI algorithm in a real-life telemedicine setting using patient-submitted photos.None reportedOverall Top-1 Accuracy:
* AI (41.2%) was lower than Dermatologists (60.1%), Residents (57.8%), and GPs (49.3%).
'In-distribution' Balanced Top-1 Accuracy:
* AI (47.6%) was comparable to Dermatologists (49.7%) and Residents (47.7%), and superior to GPs (39.7%).
The AI algorithm's accuracy is inferior to dermatologists for patient-submitted teledermatology images, but it shows promise as a tool for triage or as support for GPs, especially for "in-distribution" diseases.
Han et al. 2020
PMID: 33237903
Retrospective validation study
Weighting from appraisal: 5.5
10,426 biopsied cases (43 disorders; 1,222 malignant, 9,204 benign) from Severance Hospital, Korea (2008-2019). Reader test used a subset (1,320 cases).1. AI-DSS (rcnn.modelderm.com) analyzing unprocessed images.
2. Attending physicians (65) in real-world practice (with full clinical info).
3. Reader test dermatologists (44) using images only.
To compare the performance of a CNN algorithm against dermatologists in both real-world practice (with clinical info) and experimental settings (images only) for diagnosing skin neoplasms.None reportedReal-world (AI vs. Physicians with clinical info):
* AI was inferior: AUC 0.863 (95% CI: 0.852-0.875), sensitivity 62.7% (95% CI: 59.9-65.1%) and specificity 90.0% (95% CI: 89.4-90.6%), vs. physicians' sensitivity/specificity of 70.2%/95.6%.
Reader Test (AI vs. Physicians with images only):
* AI was comparable: AI sensitivity/specificity of 66.9% (95% CI: 57.7-76.0%) / 87.4% (95% CI: 82.5-92.2%) vs. readers' 65.8% (95% CI: 55.7-75.9%) / 85.7% (95% CI: 82.4-88.9%).
The algorithm diagnosed skin tumors with nearly the same accuracy as dermatologists when using only photographs (experimental setting), but its performance was inferior to physicians in real-world practice, highlighting the value of clinical information.
Han et al. 2022
PMID: 36171272
Retrospective algorithm performance study
Weighting from appraisal: 10
1. RD dataset: 1,282 images from Reddit (r/melanoma).
2. Hospital datasets: (Edinburgh, SNU, TeleDerm) for comparison.
1. AI-DSS (Model Dermatology, Build2021; 184 classes).
2. Reader study (6 general physicians, 32 laypersons) on RD dataset.
To investigate whether the algorithm (ModelDerm) can classify images from an Internet community (out-of-distribution) and compare its performance to hospital datasets (in-distribution).None reportedOn Hospital Datasets (SNU/Edinburgh):
* AI performance was equivalent to dermatologists.
On RD Dataset (Top-1 Accuracy):
* AI (39.2%) was equivalent to GPs (36.8%) and superior to laypersons (19.2%).
* AI performance degraded on inadequate-quality images (Top-1: 43.2% vs. 32.9%).
The algorithm's performance, while equivalent to dermatologists on curated clinical datasets, "deteriorated" in real-world (RD and TeleDerm) datasets due to poor image quality and out-of-distribution disorders.
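
For reference, the diagnostic metrics reported throughout the study tables above follow the standard confusion-matrix definitions, where TP, TN, FP and FN denote true/false positives and negatives, y_i is the confirmed diagnosis of case i, and Ŷ_i^(k) denotes the algorithm's k highest-ranked predictions:

```latex
\text{Sensitivity} = \frac{TP}{TP+FN}, \qquad
\text{Specificity} = \frac{TN}{TN+FP}, \qquad
\text{NPV} = \frac{TN}{TN+FN}, \qquad
\text{Top-}k\ \text{accuracy} = \frac{1}{N}\sum_{i=1}^{N} \mathbf{1}\!\left[\, y_i \in \hat{Y}_i^{(k)} \right]
```

As a worked example, in the prospective PCP study above (21 melanomas, 232 benign lesions), a sensitivity of 95.2% and a specificity of 60.3% imply TP = 20, FN = 1, TN = 140 and FP = 92, hence NPV = 140/(140+1) ≈ 99.3%, matching the reported value and illustrating why a high NPV supports safe rule-out at low melanoma prevalence.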

Results: data from registries and databases​

No registry reports were found in this literature search.

Results of the vigilance databases analysis​

The only two records found in the vigilance database searches correspond to the two medical devices registered in EUDAMED by SkinVision; these are device registrations rather than vigilance reports. Thus, no vigilance data records were identified in this vigilance search after screening.

Applicable standards​

As previously mentioned, the manufacturer already identified the applicable standards for the device under evaluation. No additional search has been conducted. The list of applicable standards is available in the "Applicable standards" section of this document.

State of the Art presentation​

Introduction to Dermatology and Clinical Challenges​

Dermatological conditions represent a significant health problem globally. The reliance of dermatology on visual diagnosis has made it a key area for the application of telehealth methods, particularly store-and-forward (SF) teledermatology (Giavina-Bianchi et al. 2020). The current landscape faces several critical challenges that affect patient access and clinical efficiency:

  1. Extended Wait Times and Access Issues: The overall challenge is minimizing the time patients wait for a dermatological appointment (Giavina-Bianchi et al. 2020). SF teledermatology has been shown to improve access to specialized care and reduce time to treatment, resulting in high patient satisfaction (Giavina-Bianchi et al. 2020; Eminovic et al. 2009).

  2. Diagnostic Accuracy and Consistency: A major objective in clinical practice is reducing unnecessary referrals while maintaining high sensitivity for malignancy detection (Giavina-Bianchi et al. 2020). Diagnostic accuracy for skin cancer remains higher for face-to-face dermatologist assessment (67% to 85%), whereas teledermatology accuracy ranges from 51% to 85% (Chen et al. 2024). Studies suggest current data are insufficient to conclude on the superiority of dermatologists or the adequacy of Primary Care Providers (PCPs) for melanoma care (Chen et al. 2001).

  3. Variability in severity assessment: Objective scoring of disease severity is crucial for longitudinal monitoring. Existing measures for conditions like Hidradenitis Suppurativa (HS) often exhibit low inter-rater reliability (Thorlacius et al. 2019). For instance, inter-rater reliability for lesion counts in HS ranged from poor for abscesses (ICC=0.07) to fair for inflammatory nodules (ICC=0.40) (Goldfarb et al. 2021); a common formulation of the intraclass correlation coefficient (ICC) is sketched after this list.
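
For context, the ICC cited above quantifies how much of the total variation in scores is attributable to real differences between patients rather than to disagreement between raters. The cited studies do not state which ICC variant they used; one common one-way random-effects formulation is:

```latex
\mathrm{ICC}(1,1) = \frac{MS_B - MS_W}{MS_B + (k-1)\,MS_W}
```

where MS_B and MS_W are the between-subject and within-subject mean squares from a one-way ANOVA and k is the number of raters per subject. Values near 0 (e.g., the 0.07 reported for abscess counts) indicate that rater disagreement dominates the measurement.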

Application of Artificial Intelligence in Dermatology​

Artificial intelligence (AI) is a rapidly emerging field in dermatology, leveraging deep learning (DL) and convolutional neural networks (CNNs) for image analysis (Baker et al. 2022). AI-guided medical devices are primarily designed to address the aforementioned challenges by serving as diagnostic decision support tools (Escalé-Besa et al. 2023; Han et al. 2020).

  • Triage and Caseload Reduction: Successful implementation has been shown to reduce the caseload for hospital specialists (Marsden et al. 2024; Han et al. 2022). A pilot study using an AI teledermatology service demonstrated a 62% reduction in the number of patients requiring an urgent face-to-face appointment with a dermatologist (Baker et al. 2022; Orekoya et al. 2021; Thomas et al. 2023). In one pathway, 19% of cases identified as benign by the AI were discharged immediately back to the General Practitioner (GP) (Baker et al. 2022).

  • Augmentation, Not Substitution: CNNs alone will not replace the contextual knowledge of dermatologists; rather, the combination of CNN and human dermatologists has the potential to improve the diagnostic accuracy of cutaneous tumors (Han et al. 2020; Ba et al. 2022). AI assistance significantly improved the sensitivity of 47 clinicians for malignancy prediction by 12.1% (Han et al. 2020). For non-medical professionals, sensitivity improved by 83.8% (Han et al. 2020).

  • Need for Real-World Validation: While diagnostic yields are high in silico, prospective studies conducted under real-life conditions utilizing non-standardized imaging are imperative for validating these tools before they are adopted into primary care (Escalé-Besa et al. 2023). A systematic review found that AI in the hands of clinicians has the potential to improve diagnostic accuracy, but noted that most studies were conducted in experimental settings, highlighting the need for future investigation in real-life settings (Krakowski et al. 2024).

Similar devices​

DERM (Deep Ensemble for Recognition of Malignancy)​

DERM is an AI-based decision support system designed to assist in the detection of skin cancer, particularly melanoma. It utilizes deep learning algorithms to analyze dermoscopic images and classify lesions according to their malignancy risk. The system has been evaluated in several clinical studies, demonstrating its potential to improve diagnostic accuracy and reduce unnecessary referrals in dermatology practice. Its key component is the AI as a medical device (AIaMD) algorithm (Marsden et al. 2024). It currently holds UK Conformity Assessed (UKCA) Class IIa approval, granted in April 2022, and CE marking as a Class III medical device under European Medical Device Regulation (MDR) 2017/745.

In a comparison study, DERM achieved a sensitivity of 91.0% for skin cancer detection, lower than the standard of care (SoC) sensitivity of 97.0%. However, DERM demonstrated a higher specificity of 80.4% compared with the SoC specificity of 71.9% (Marsden et al. 2024). This indicates that while DERM may miss some malignant cases, it is more effective at correctly identifying benign lesions, potentially reducing unnecessary biopsies and referrals, with a reduction in requested biopsies (3 for AI vs. 4.2 for SoC) suggesting improved efficiency in resource utilization in dermatology clinics (Marsden et al. 2024).

In addition, a real-world post-deployment study of DERM-vB at the UHB site confirmed a high sensitivity for melanoma detection (100.0%, 58/58 lesions) and a Negative Predictive Value (NPV) for melanoma of 100.0% (2045/2045) (Thomas et al. 2023). It also showed that the service integrating DERM achieved an overall 62% reduction in the number of patients requiring an urgent face-to-face appointment with a dermatologist (Thomas et al. 2023; Baker et al. 2022).

Several limitations are described in the study of Marsden et al. 2024. The real-world evaluation of AI lacks standardized methods. Differential verification bias is a concern in trials since ethical concerns prevent biopsy of all patients with low likelihood of cancer (Marsden et al. 2024). Additionally, the performance of DERM may vary based on the population and clinical setting and it has not been validated in phototypes V and VI, necessitating further validation across diverse cohorts to ensure generalizability (Marsden et al. 2024).

Huvy (SLC.AI)​

Huvy is an AI-powered dermatology platform developed by SLC.AI that aims to enhance skin cancer detection and diagnosis. It employs advanced machine learning algorithms to analyze skin lesion images and provide risk assessments for malignancy. Huvy is designed to assist dermatologists and primary care providers in the adjunctive assessment of cutaneous pigmented lesions (Zanchetta et al. 2025). It has received CE marking as a Class IIb medical device under European Medical Device Regulation (MDR) 2017/745.

In the study published by Zanchetta et al. (2025), the authors developed a deep learning algorithm for three-level melanoma detection (high risk, doubtful, and benign) across different dermoscopic and tele-dermatology datasets. The research included real-life pictures taken by primary care practitioners for teledermatology, aligning the study with use in non-specialist settings (Zanchetta et al. 2025).

Some limitations are described in the study of Zanchetta et al. (2025). Rigorous testing of Huvy was limited to pigmented melanomas and explicitly excluded mucosal or large lesions, tattoos, and Fitzpatrick skin types IV-VI (Zanchetta et al. 2025). The device also requires images captured by approved dermoscopic hardware systems.

SkinVision​

SkinVision is an AI-supported mobile application designed to assess skin lesions for potential malignancy. The app utilizes machine learning algorithms to analyze images of skin lesions taken by users and provides a risk assessment (Maier et al. 2014). It achieved CE marking as a Class IIa medical device under European Medical Device Regulation (MDR) 2017/745 on 5 August 2025.

Early versions of the algorithm achieved an accuracy of 81%, with a sensitivity of 73% and a specificity of 83% for melanoma detection, though dermatologists' evaluation was superior (Maier et al. 2014). Nevertheless, a newer version has demonstrated a high sensitivity of 95% for detecting skin cancer, suggesting it may be a valuable tool for early detection (Udrea et al. 2019).

One large prospective study found the app had a sensitivity of 86.6% and a specificity of 70.8% (Udrea et al. 2019). This study demonstrated performance variability by device type: the app achieved significantly higher sensitivity on iOS devices (91.0%) than on Android devices (83.0%) (p=0.02). Specificity did not differ significantly between device types (71.5% vs. 69.0%).

Sensitivity was also found to be higher for lesions in skin fold areas (92.9%) than in non-skin fold areas (84.2%) (p=0.03) (Sangers et al. 2022), as was specificity (72.0% vs. 56.5%; p=0.04) (Sangers et al. 2022). The standard approach for comparing two such proportions is sketched below.
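
The subgroup comparisons above report p-values for differences between two sensitivities or specificities. The publications do not always state the exact procedure used; a conventional choice is a two-proportion z-test, sketched below with hypothetical counts (illustrative only, not the studies' actual data):

```python
import numpy as np
from statsmodels.stats.proportion import proportions_ztest

# Hypothetical counts for illustration only -- not the actual study data.
detected = np.array([91, 83])    # malignant lesions correctly flagged (group A, group B)
assessed = np.array([100, 100])  # malignant lesions assessed per group

# Two-sided test of equal detection proportions between the two groups.
z_stat, p_value = proportions_ztest(count=detected, nobs=assessed)
print(f"z = {z_stat:.2f}, p = {p_value:.3f}")
```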

Regarding the perception of HCPs, qualitative studies have captured the real-time experiences of GPs and doctors' assistants using the app during consultations, with overall positive findings (Gregoor et al. 2023).

It is important to highlight that future research is needed to study the app's performance in diverse populations, including different skin types. Performance optimization depends on the continual availability of additional data to train the risk classification algorithm (Udrea et al. 2019).

Dermalyser​

Dermalyser is image analysis software utilizing clinically validated AI as a decision-support system for medical professionals when assessing suspected lesions for skin cancer. It is used in conjunction with a smartphone-compatible dermatoscope. It has received CE marking as a Class IIa medical device under European Medical Device Regulation (MDR) 2017/745.

When tested in a real-life primary care setting, the underlying model showed a Top-3 accuracy (75%) comparable to that of GPs (76%) for known diseases on which the algorithm had been trained (Papachristou et al. 2024). Furthermore, 92% of GPs considered it a useful diagnostic support tool for differential diagnosis.

Several limitations are described in the study of Papachristou et al. 2024. The study was conducted in Sweden, where PCPs have undergone specific training in dermatology, which may limit the generalizability of the findings to other healthcare settings (Papachristou et al. 2024). Additionally, the study did not include a control group of PCPs not using the AI tool, making it difficult to isolate the effect of the AI on diagnostic performance (Papachristou et al. 2024). The study emphasized the critical need for external testing in real-life conditions for data validation and regulation before such AI diagnostic models can be widely used in primary care (Papachristou et al. 2024).

ModelDerm​

ModelDerm (Model Dermatology) is a neural network designed to function as augmented intelligence, classifying numerous skin disorders (up to 184) and often providing multiclass classification, malignancy prediction, and treatment suggestions (Han et al. 2020). It has received CE marking as a Class I medical device under European Medical Device Regulation (MDR) 2017/745.

In the studies carried out with the device, the standalone algorithm achieved an Area Under the Curve (AUC) for malignancy detection of 0.937 on the Asian SNU dataset and 0.928 on the Caucasian Edinburgh dataset (Han et al. 2020; Krakowski et al. 2024). For multi-class classification of 134 disorders, the algorithm achieved a Top-1 accuracy of 44.8% on the SNU dataset and a Top-5 accuracy of 78.1%.
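
Since Top-1/Top-3/Top-5 accuracy is the headline metric across these multi-class studies, the following minimal sketch shows how such figures are typically computed from an algorithm's per-class scores. The function and the random example data are illustrative and do not reproduce the cited evaluations:

```python
import numpy as np

def top_k_accuracy(scores: np.ndarray, labels: np.ndarray, k: int) -> float:
    """Fraction of cases whose true class is among the k highest-scored classes.

    scores: (n_cases, n_classes) array of per-class scores or probabilities.
    labels: (n_cases,) array of integer ground-truth class indices.
    """
    # Indices of the k highest-scored classes per case; their internal
    # order does not matter for this metric.
    top_k = np.argsort(scores, axis=1)[:, -k:]
    hits = (top_k == labels[:, None]).any(axis=1)
    return float(hits.mean())

# Illustrative use with random scores over 134 classes:
rng = np.random.default_rng(0)
scores = rng.random((500, 134))
labels = rng.integers(0, 134, size=500)
print(f"Top-1: {top_k_accuracy(scores, labels, 1):.3f}")
print(f"Top-5: {top_k_accuracy(scores, labels, 5):.3f}")
```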

When assisting clinicians, the AI significantly improved their diagnostic performance. For instance, the Top-1 accuracy of clinicians improved by 7.0% when assisted by the AI, and non-medical professionals saw an improvement in malignancy sensitivity from 47.6% to 87.5% with AI assistance (Han et al. 2020; Krakowski et al. 2024).

A randomized controlled trial involving 576 cases confirmed that the AI-assisted group had a significantly higher Top-1 accuracy (53.9%) than the unaided group (43.8%, P=0.019) (Han et al. 2022). The augmentation was most pronounced for non-dermatology trainees, whose accuracy improved by 25.0%, whereas the augmentation for more experienced dermatology residents was generally non-significant (Han et al. 2022). Furthermore, for a subset of biopsied cases, the accuracy of AI-augmented trainees was comparable to that of attending dermatologists (Han et al. 2022). The system can also predict primary treatment options (e.g., steroids, antibiotics, antivirals, antifungals) with AUCs ranging from 0.828 to 0.918 (Han et al. 2020).

Finally, in an environment simulating real-world teledermatology, the algorithm was able to triage Internet community-acquired images with the same level of accuracy as general physicians (Han et al. 2022).

Several limitations have been described in the studies carried out with ModelDerm. Performance generally degrades when the algorithm is applied to real-world, diverse image types (Han et al. 2022). When tested retrospectively for external validity, its Top-1 accuracy was sometimes low (e.g., 29.7% for melanoma) (Navarrete-Dechent et al. 2020). Furthermore, performance may drop significantly when the AI's top predictions are incorrect (Han et al. 2022; Krakowski et al. 2024). Additionally, the majority of training and validation images were of Asian patients (Fitzpatrick types III/IV), necessitating further testing across various races and ethnicities (Han et al. 2020; Han et al. 2022).

Expected benefits of AI-guided medical devices in dermatology​

The expected benefits of deploying AI-guided systems in dermatology directly address the clinical challenges identified:

  • Triage and Efficiency: AI systems act as an automated clinical management tool, enabling screening and triage, thereby reducing unnecessary referrals and significantly lowering the hospital specialist caseload (Marsden et al. 2024; Baker et al. 2022). This aids in resolving the increasing burden of non-urgent referrals (Escalé-Besa et al. 2023).

  • Diagnostic Accuracy improvement: AI enhances the diagnostic performance of healthcare professionals, particularly less experienced users (non-dermatology trainees or PCPs) (Han et al. 2022). These tools expand the range of differential diagnoses considered by clinicians, providing a Top-5 list that can help broaden their diagnostic and therapeutic approaches (Escalé-Besa et al. 2023).

  • Objective severity assessment: AI provides the ability to quantify visible clinical signs (such as the intensity, count, and extent of features like erythema, scaling, and induration). This precise, objective measurement aids severity assessment and is specifically designed to facilitate the longitudinal monitoring of skin conditions, as demonstrated in the clinical validations carried out with the device.

  • Standardization and Transparency: AI facilitates the standardization of image acquisition and interpretation processes. It can provide decision support by suggesting appropriate ICD classes, assisting in the initial stages of diagnosis and treatment planning.

Hazards due to AI-Guided Medical Devices that Could be Relevant to the Device under Evaluation​

While AI-guided medical devices do not typically introduce physical hazards associated with invasive procedures, the primary risks relate to diagnostic error and system integrity. No safety data regarding hazardous events or harm to the patient/user were identified in the literature for similar AI-guided systems in dermatology. However, potential hazards for clinical decision support systems include:

  1. Misdiagnosis: A primary risk is the AI providing incorrect clinical information, resulting in a false negative (a malignant lesion classified as benign) (Krakowski et al. 2024). Faulty AI can mislead the entire spectrum of clinicians, including experts (Han et al. 2022). Hence, it is crucial that manufacturers acknowledge this risk and address it in their risk management processes, ensuring that users are aware of the AI's limitations and of the continued necessity of clinical judgment.

  2. Poor Image Quality or Artifacts: The AI relies heavily on input image quality. Suboptimal image quality, artifacts, or poor lighting can compromise device performance (Navarrete-Dechent et al. 2020). This risk is mitigated by devices providing warnings and guidance on proper image capture, and certain devices (like DERM) assess the performance of the AI-integrated service at both the lesion and case level (Thomas et al. 2023). In the case of our device, this is addressed by the image quality assessment module, which ensures that only images meeting specific quality criteria are processed by the AI algorithm (an illustrative sketch of such a quality gate follows this list).

  3. Out-of-Distribution Cases: AI models may perform poorly when encountering cases that differ significantly from the training data, such as rare conditions, images from diverse populations, real-world images, or internet community-acquired images (Han et al. 2022). This can lead to misclassification and diagnostic errors. Manufacturers should ensure that their AI systems are trained on diverse datasets and include mechanisms to identify and flag out-of-distribution cases.

  4. Equity and Bias: Continued surveillance is needed to ensure equitable access, particularly for patients with darkly pigmented skin. The exclusion of certain Fitzpatrick skin types (e.g., V and VI) in validation studies remains a persistent limitation in the field (Papachristou et al. 2024; Jain et al. 2021).
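
The following is a minimal, hypothetical sketch of the kind of image-quality gate referred to in hazard 2. The checks (a Laplacian-variance sharpness proxy and an exposure bound) and all thresholds are illustrative assumptions for this sketch; they do not describe the actual acceptance criteria of the device's image quality assessment module:

```python
import numpy as np

# Illustrative thresholds -- assumptions for this sketch, not device criteria.
MIN_SHARPNESS = 50.0          # minimum variance of the Laplacian response
MIN_MEAN, MAX_MEAN = 40, 220  # acceptable mean brightness on an 8-bit scale

def laplacian_variance(gray: np.ndarray) -> float:
    """Sharpness proxy: variance of a 3x3 Laplacian response (low = blurry)."""
    kernel = np.array([[0, 1, 0], [1, -4, 1], [0, 1, 0]], dtype=float)
    h, w = gray.shape
    response = np.zeros((h - 2, w - 2))
    for i in range(3):  # 'valid' 3x3 convolution via shifted views
        for j in range(3):
            response += kernel[i, j] * gray[i:i + h - 2, j:j + w - 2]
    return float(response.var())

def passes_quality_gate(image: np.ndarray) -> bool:
    """Accept an 8-bit grayscale image only if it is sharp and well exposed."""
    sharp = laplacian_variance(image.astype(float)) >= MIN_SHARPNESS
    well_exposed = MIN_MEAN <= image.mean() <= MAX_MEAN
    return sharp and well_exposed
```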

Benefit-Risk Profiles of Alternative AI-Guided Medical Devices​

To evaluate the state-of-the-art landscape, we examined the benefit-risk profiles of the AI-guided medical devices for diagnostic support in dermatology previously described. The following table summarizes the key benefits and risks associated with each device:

DevicePrimary benefitKey study resultsPrimary risk (and mitigation)
DERMTriage and caseload reduction62% reduction in urgent face-to-face appointments; sensitivity 91.0% and specificity 80.4% for skin cancer detection (Baker et al. 2022; Thomas et al. 2023). Achieved an NPV of 100.0% for melanoma in real-world post-deployment. A reduction in requested biopsies (3 for AI vs. 4.2 for SoC) (Marsden et al. 2024)Misdiagnosis and risk of false negatives (FN) (mitigated by clinical judgment, training, and rigorous Post-Market Surveillance (PMS))
HuvyProvides diagnostic support for melanoma through a three-level classification designed to improve referral accuracy from primary care (Zanchetta et al. 2025)High accuracy in classifying pigmented lesions; however, limited to specific lesion types and skin types (Zanchetta et al. 2025)Limited intended use (pigmented lesions only; exclusion of Fitzpatrick IV-VI). Requires specific dermoscopic hardware systems (mitigated by clear usage guidelines and clinical oversight)
SkinVisionEarly detection of skin cancer through user-friendly mobile app. High sensitivity for screening purposes and adaptability to consumer devices.Achieved 95% sensitivity for skin cancer detection (Udrea et al. 2019; Sangers et al. 2022)Performance variability based on device and use environment. Early studies showed physician diagnosis was superior to the app alone (mitigated by user education and continuous algorithm updates)
DermalyserTargeted approach to improving melanoma detection in Primary Care settings (Papachristou et al. 2024)Top-3 accuracy of 75%, comparable to GPs (76%) for known diseases; 92% of GPs found it useful (Papachristou et al. 2024)Exclusion of all non-melanoma skin cancers (BCC, SCC) and exclusion of melanin-rich skin types (V-VI) (mitigated by further validation studies in diverse settings)
ModelDermSignificant augmentation of diagnostic accuracy, especially for non-expert clinicians, across a wide variety of diseases (up to 134 disorders) (Han et al. 2020)Achieved an AUC of 0.937 for malignancy detection; significantly improved clinician diagnostic accuracy when assisted by AI (25% increase) (Han et al. 2020; Han et al. 2022)Misdiagnosis, especially in out-of-distribution cases and diverse populations. Risk of reliance on faulty predictions; an incorrect AI prediction can lead to a 12.2% drop in accuracy for trainees (mitigated by diverse training datasets and user awareness of limitations)

Discussion​

Based on the clinical data provided by the literature, the state-of-the-art demonstrates that AI-guided medical devices have successfully transitioned from in silico performance studies to impactful real-world clinical integration, significantly enhancing triage and reducing specialist caseloads (Baker et al. 2022; Thomas et al. 2023). By offering diagnostic support and objective severity assessment, these tools directly combat the structural problems of long wait times and inconsistent diagnostic accuracy between care levels (Han et al. 2022). Studies consistently show that the least experienced clinicians gain the most from AI-based support, making these tools highly valuable for augmenting Primary Care Practitioners (PCPs) (Han et al. 2022; Jahn et al. 2022).

However, the effectiveness of AI remains intrinsically linked to the operational environment. Performance generally degrades when applied to real-world, diverse image types (Han et al. 2022). This reality underscores the necessity of implementing AI as an adjunctive tool that augments, rather than replaces, human intelligence. The combination of AI and clinician expertise has been shown to yield the highest diagnostic accuracy, particularly when clinicians are aware of the AI's limitations and maintain critical oversight (Han et al. 2020; Han et al. 2022). Rigorous adherence to regulatory standards is crucial, including implementing robust Post-Market Surveillance (PMS) plans, documenting Root Cause Analysis (RCA) for possible problems detected, and providing transparency regarding algorithm characteristics to users (Thomas et al. 2023). The growing body of real-world evidence confirms that when integrated correctly and used under human clinical supervision, AI systems offer a favorable benefit-risk profile, improving access and supporting clinical decision-making across the spectrum of skin conditions.

Synthesis​

The following table provides a concise synthesis of the state-of-the-art analysis and the implications for safe clinical adoption of AI-guided dermatology tools in primary and specialist care.

AspectDetails
1. Methodological Referential for Bibliographic Search- MedDev 2.7/1 Rev.4 (applicable guidance for clinical evaluation)
- PRISMA Statement for Reporting Systematic Reviews and Meta-Analyses
2. Type of searchSystematic (documented search strategy, screening, eligibility and selection steps; audit trail available in methods section).
3. Results (bibliographic search)Source search yielded N = 228 candidate records. After de-duplication and multi-stage screening, n = 57 clinical articles were included and appraised for methodological quality and relevance. An additional n = 10 items (primarily two manuscripts, 9 guidelines and contextual documents) were referenced to inform clinical context; total material considered = 68. Breakdown used for appraisal: 58 clinical studies; 8 clinical guidelines; 0 unpublished trial reports; 0 registry reports.
4. Referential for data appraisal and weighting- IMDRF MDCE WG/N56FINAL:2019 (risk-based clinical evaluation principles)
- Internal appraisal templates informed by Yale and Johns Hopkins academic resources (see Methods)
5. Results (appraisal summary / mean weight)Appraisal summary for clinical datasets (n = 53): mean weight = 6.88 / 10.
Additional metrics: mean relevance = 4.40 / 6; mean quality = 2.47 / 4; mean level of clinical evidence = 6.3 / 10.
Note: datasets with weight < 4 require justification in the clinical evaluation file; none of the included datasets used in the main analysis had weight < 4 without documented rationale.
6. UseIntended use statement: AI-guided medical devices are intended as adjunctive clinical decision support tools to assist clinicians (primary care practitioners and dermatologists) during dermatology consultation workflows for the triage and diagnostic evaluation of skin conditions. They are not intended to replace clinician judgment. Target population: patients presenting with skin lesions or dermatological complaints across adult age groups. User training, labeling, and intended-use constraints consistent with similar devices in the literature are required.
7. Expected complicationsObserved/anticipated hazards: no direct patient harm events attributable to similar devices were identified in the reviewed clinical evidence. Principal risks to be managed: (1) reduced accuracy on heterogeneous, real-world images (dataset shift); (2) inappropriate clinician reliance on AI outputs when used without verification (automation bias); (3) false-negative results leading to missed malignancy or delayed referral; (4) false-positive results increasing unnecessary referrals/biopsies. Recommended risk controls: human-in-the-loop workflow, explicit user instructions and limitations, mandatory training, robust PMS and RCA procedures, and monitoring of real-world performance metrics.
8. Expected benefits and performancesAccess to specialist dermatology services is constrained in many health systems, with variable wait times and heterogeneous diagnostic performance between primary care practitioners (PCPs) and dermatologists. The reviewed literature confirms consistent performance gaps (PCPs show lower sensitivity than dermatologists on clinical image assessments), and that dermoscopy and specialist assessment improve diagnostic accuracy. AI tools have been studied primarily as adjuncts to clinician assessment and as standalone classifiers on curated image sets; real-world performance is commonly lower than reported in controlled datasets, underscoring the need for robust external validation and post-market surveillance.
- Clinical performance observed in reviewed literature: on curated dermoscopic test sets, standalone AI classifiers typically reported sensitivity in the approximate range 80-86% and specificity in the range 77-83%. High-quality meta-analytic evidence (systematic reviews) reports pooled sensitivity and specificity consistent with these ranges for melanoma detection using dermoscopic images; performance on clinical (unmagnified) images is lower and more variable. Comparative reader studies demonstrate that AI, when used as a diagnostic adjunct, improves clinician sensitivity and overall accuracy (for example, Maron et al. 2020 reported a clinician sensitivity increase from ~59% to ~75% with AI assistance; other reader and trial studies show improvements of similar magnitude in sensitivity and modest improvements in specificity or overall accuracy).
- Expected clinical benefits: improved detection sensitivity for malignancy (reducing missed cancers), standardization of preliminary triage decisions, support for prioritization of referrals to secondary care, potential reduction in unnecessary specialist referrals and benign biopsies when AI is combined with clinical assessment, and increased efficiency in workflows (fewer repeat assessments, faster triage). Benefits are contingent on correct deployment: appropriate external validation, integration into clinician workflows with human oversight, and active PMS to detect performance drift.
Conclusion: the evidence supports adoption as a clinician-support tool under controlled conditions and with documented risk controls; standalone use without clinician oversight is not supported by the available clinical evidence and is not recommended in the intended use statement.

References​

Abu Baker, K. et al. (2022). Using artificial intelligence to triage skin cancer referrals: outcomes from a pilot study. British Journal of Dermatology, 188(Supplement 4), ljad113.372.

Ahadi, M. S. et al. (2021). [Open access article on a specialized topic]. Journal of Otorhinolaryngology, Head and Neck Surgery.

Augustin, M. and Reusch, M. (2013). European Dermatology Health Care Survey 2013. Short Report. Hamburg: CVderm, German Center for Health Services Research in Dermatology.

Ba, W. et al. (2022). [Convolutional neural networks for cutaneous tumour classification]. European Journal of Cancer. DOI: 10.1016/j.ejca.2022.04.015.

Barata, C. et al. (2023). [A reinforcement learning model for AI based decision support in skin cancer]. Nature Medicine. DOI: 10.1038/s41591-023-02475-5.

Brinker, T. J. et al. (2019a). [Skin cancer classification using convolutional neural networks]. European Journal of Cancer. DOI: 10.1016/j.ejca.2019.04.001.

Brinker, T. J. et al. (2019b). [Superior skin cancer classification by the combination of human and artificial intelligence]. European Journal of Cancer, 119, 11–17. DOI: 10.1016/j.ejca.2019.05.023.

Burton, R. C. et al. (1998). General practitioner screening for melanoma: sensitivity, specificity, and effect of training. J Med Screen, 5, 156-161.

Chen, S. C. et al. (2001). Diagnosing and managing cutaneous pigmented lesions: primary care physicians versus dermatologists. Arch Dermatol, 137(12), 1627–1634.

Chen, S. et al. (2024). [Systematic Review of Skin Cancer Diagnosis by Clinicians]. JAMA Dermatology, 161(2). DOI: 10.1001/jamadermatol.2024.4382.

Cho, S. I. et al. (2019). [Deep learning for lip cancer diagnosis]. British Journal of Dermatology. DOI: 10.1111/bjd.18459.

Cohen, J. (1968). Weighted kappa: nominal scale agreement with provision for scaled disagreement or partial credit. Psychological Bulletin, 70(4), 213-220. DOI: 10.1037/h0026256.

Cohen, J. (1988). Statistical Power Analysis for the Behavioral Sciences (2nd ed.). Hillsdale, NJ: Lawrence Erlbaum Associates, Publishers.

Diamond, I. R., Grant, R. C., Feldman, B. M., et al. (2014). Defining consensus: a systematic review recommends methodologic criteria for reporting of Delphi studies. Journal of Clinical Epidemiology, 67(4), 401-409. DOI: 10.1016/j.jclinepi.2013.12.002.

Eminović, N. et al. (2009). Effect of patient-assisted teledermatology on outpatient referral rates. Archives of Dermatology, 145(5), 557-563.

Escalé-Besa, A. et al. (2023). Evaluation of an AI model for skin conditions in a real-life primary care setting. Scientific Reports, 13(4293). DOI: 10.1038/s41598-023-31340-1.

Evans, J. D. (1996). Straightforward statistics for the behavioral sciences. Thomson Brooks/Cole Publishing Co.

Ferris, L. K. et al. (2025). DERM-SUCCESS FDA Pivotal Study: A Multi-Reader Multi-Case Evaluation of Primary Care Physicians' Skin Cancer Detection Using AI-Enabled Elastic Scattering Spectroscopy. Journal of Primary Care & Community Health, 16. DOI: 10.1177/21501319251342106.

Fitch, K. et al. (2001). The RAND/UCLA Appropriateness Method User's Manual. Santa Monica, CA: RAND Corporation. https://www.rand.org/pubs/monograph_reports/MR1269.html.

Gerbert, B. et al. (1996). Primary care physicians as gatekeepers in managed care: primary care physicians' and dermatologists' skills at secondary prevention of skin cancer. Arch Dermatol, 132, 1030-1038.

Giavina-Bianchi, M. et al. (2020a). Benefits of Teledermatology for Geriatric Patients: Population-Based Cross-Sectional Study. Journal of Medical Internet Research, 22(4), e16700. DOI: 10.2196/16700.

Giavina-Bianchi, M. et al. (2020b). [Teletriage project from July 2017 to July 2018 in São Paulo, Brazil]. EClinicalMedicine, 29-30, 100641.

Goldfarb, N. et al. (2021). Hidradenitis Suppurativa Area and Severity Index Revised (HASI-R): psychometric property assessment. British Journal of Dermatology, 184(5), 905-912. DOI: 10.1111/bjd.19565.

Gregoor, A. S. et al. (2023). The impact of an artificial intelligence-based app on healthcare consumption: results of the SPOT cluster randomized controlled trial. eClinicalMedicine, 60, 102019. DOI: 10.1016/j.eclinm.2023.102019.

Haenssle, H. A. et al. (2018). Man against machine: Diagnostic performance of a deep learning convolutional neural network for dermoscopic melanoma recognition in comparison to 58 dermatologists. Annals of Oncology, 29(8), 1836-1842. DOI: 10.1093/annonc/mdy166.

Han, S. S. et al. (2018). Classification of the clinical images for benign and malignant cutaneous tumors using a deep learning algorithm. Journal of Investigative Dermatology, 138(7), 1529–1538.

Han, S. S. et al. (2020). Augmented Intelligence Dermatology in Classifying 134 Skin Disorders. Journal of Investigative Dermatology, 140(8), 1756-1762. DOI: 10.1016/j.jid.2020.01.019.

Han, S. S. et al. (2020). Assessment of deep neural networks for the diagnosis of benign and malignant skin neoplasms in comparison with dermatologists: A retrospective validation study. PLOS Medicine, 17(11), e1003381. DOI: 10.1371/journal.pmed.1003381.

Han, S. S. et al. (2022). Evaluation of Artificial Intelligence-Assisted Diagnosis of Skin Neoplasms: A Single-Center, Paralleled, Unmasked, Randomized Controlled Trial. Journal of Investigative Dermatology, 142(9), 2353–2362. DOI: 10.1016/j.jid.2022.02.003.

Han, S. S. et al. (2022). Clinical utility of an artificial intelligence-based decision support system for skin cancer in non-dermatologist reader tests using real-world data. Scientific Reports, 12(16260). DOI: 10.1038/s41598-022-20632-7.

Hsiao, J. L. et al. (2008). Impact of teledermatology on outpatient care and referrals. Journal of the American Academy of Dermatology, 59(3), 448-453.

Jahn, A. S. et al. (2022). Melanoma Detection by a Deep Learning Convolutional Neural Network on Clinical Images: An Analysis of Potential Clinical Use. Cancers, 14(15), 3829.

Jain, A. et al. (2021). Development and assessment of an artificial intelligence-based tool for skin condition diagnosis by primary care physicians and nurse practitioners in teledermatology practices. JAMA Network Open, 4(4), e217249. DOI: 10.1001/jamanetworkopen.2021.7249.

Kheterpal, M. et al. (2023). Teledermatology (TD) is an evidence-based practice that may increase access to dermatologic care. [Manuscript on implementation of hybrid TD program]. (Preprint). DOI: 10.21203/rs.3.rs-2558425/v1.

Kim, Y. J. et al. (2022). Augmenting the accuracy of trainee doctors in diagnosing skin lesions suspected of skin neoplasms in a real-world setting: A prospective controlled before-and-after study. PLOS ONE, 17(1), e0260895. DOI: 10.1371/journal.pone.0260895.

Knol, A. et al. (2006). The value of teledermatology for the decision to refer to a dermatologist: a randomized controlled trial. Journal of Telemedicine and Telecare, 12(2), 74-79.

Krakowski, I. et al. (2024). The diagnostic accuracy of artificial intelligence-assisted skin cancer detection: a systematic review and meta-analysis. npj Digital Medicine, 7(78). DOI: 10.1038/s41746-024-01031-w.

Landis, J. R. and Koch, G. G. (1977). The measurement of observer agreement for categorical data. Biometrics, 33(1), 159-174.

Lee, S. P. et al. (2020). Augmented decision-making improves the diagnostic performance of clinicians. [Manuscript on CNNs for skin lesions].

Liu, Y. et al. (2020). A deep learning system for differential diagnosis of skin diseases. Nature Medicine, 26(6), 900-908. DOI: 10.1038/s41591-020-0842-3.

Maier, T. et al. (2015). Accuracy of a smartphone application using fractal image analysis of pigmented moles compared to clinical diagnosis and histological result. Journal of the European Academy of Dermatology and Venereology, 29(4), 663-667. DOI: 10.1111/jdv.12648.

Marchetti, M. A. et al. (2019). Computer Algorithms Show Potential for Improving Dermatologists' Accuracy to Diagnose Cutaneous Melanoma; Results of ISIC 2017. Journal of the American Academy of Dermatology, 82(2), 270-277. DOI: 10.1016/j.jaad.2019.07.016.

Maron, R. C. et al. (2019). Evaluation of an artificial intelligence-based decision support system for the detection of melanoma in daily clinical practice. European Journal of Cancer, 119, 57-65. DOI: 10.1016/j.ejca.2019.06.028.

Maron, R. C. et al. (2020). Human-Artificial Intelligence Collaboration in the Diagnostic Process of Pigmented Skin Lesions: Impact on Confidence and Management. Journal of Medical Internet Research, 22(9), e18091. DOI: 10.2196/18091.

Marsden, H. et al. (2024). Accuracy of an artificial intelligence as a medical device as part of a UK-based skin cancer teledermatology service. Frontiers in Medicine, 11, 1302363. DOI: 10.3389/fmed.2024.1302363.

Millien, C., Chaput, H. and Cavillon, M. (2018). La moitié des rendez-vous sont obtenus en 2 jours chez le généraliste, en 52 jours chez l'ophtalmologiste. Études & Résultats, No. 1085. Paris: DREES (Direction de la Recherche, des Études, de l'Évaluation et des Statistiques).

Ministerio de Sanidad (2025). Sistema de Información sobre Listas de Espera en el Sistema Nacional de Salud (SISLE-SNS): Situación a 30 de junio de 2025. Madrid: Gobierno de España.

Morton, C. A. et al. (2011). Community photo-triage for skin cancer referrals: an aid to service delivery. Clinical and Experimental Dermatology, 36(3), 248-254. DOI: 10.1111/j.1365-2230.2010.03960.x.

Muñoz-López, C. et al. (2021). Performance of a deep neural network in teledermatology: a single-centre prospective diagnostic study. Journal of the European Academy of Dermatology and Venereology, 35(2), 546-553. DOI: 10.1111/jdv.16855.

Navarrete-Dechent, C. et al. (2018). Automated Dermatological Diagnosis: Hype or Reality? Journal of Investigative Dermatology, 138(10), 2277-2279.

Navarrete-Dechent, C. et al. (2020b). ModelDerm algorithm performance in a telemedicine setting. Journal of the European Academy of Dermatology and Venereology.

Navarrete-Dechent, C. et al. (2020c). Multiclass Artificial Intelligence in Dermatology: Progress but Still Room for Improvement. Journal of Investigative Dermatology, 141(5), 1325-1328. DOI: 10.1016/j.jid.2020.06.040.

Orekoya, O. et al. (2021). 'To see or not to see?' That is the question: teleconsultations in primary care and the impact on 2-week-wait referrals and outcomes. British Journal of Dermatology, 185(Supplement 1), 179.

Papachristou, P. et al. (2024). Evaluation of an artificial intelligence-based decision support for the detection of cutaneous melanoma in primary care: a prospective real-life clinical trial. British Journal of Dermatology, 191(1), 125-133. DOI: 10.1093/bjd/ljae021.

Phillips, M. et al. (2019). Assessment of accuracy of an artificial intelligence algorithm to detect melanoma in images of skin lesions. JAMA Network Open, 2(10), e1913436. DOI: 10.1001/jamanetworkopen.2019.13436.

Sangers, T. E. et al. (2022). Validation of a Smartphone Application for Risk Assessment of Pigmented Skin Lesions in a Population-Based Setting. Dermatology, 238(4), 649-656. DOI: 10.1159/000520474.

Smak Gregoor, A. M. et al. (2024). The value of an AI-based smartphone application on health care resource utilisation: a case-control study. npj Digital Medicine, 7(90). DOI: 10.1038/s41746-023-00831-w.

Thomas, L. et al. (2023). Real-world post-deployment performance of a novel machine learning-based digital health technology for skin lesion assessment and suggestions for post-market surveillance. Frontiers in Medicine, 10, 1264846. DOI: 10.3389/fmed.2023.1264846.

Thissen, M. et al. (2017). mHealth app for risk assessment of pigmented and nonpigmented skin lesions - a study on sensitivity and specificity in detecting malignancy. Telemedicine and e-Health, 23(12), 948-954.

Thorlacius, L. et al. (2019). Inter-rater agreement and reliability of outcome measurement instruments and staging systems used in hidradenitis suppurativa. British Journal of Dermatology, 181(3), 483-491.

Tschandl, P. et al. (2019). Comparison of the accuracy of human readers versus machine-learning algorithms for pigmented skin lesion classification: an open, web-based, international, diagnostic study. The Lancet Oncology, 20(7), 938-947. DOI: 10.1016/S1470-2045(19)30333-X.

Tschandl, P. et al. (2020). Human–computer collaboration for skin cancer recognition. Nature Medicine, 26(8), 1229-1234. DOI: 10.1038/s41591-020-0942-0.

Udrea, A. et al. (2020). Accuracy of a smartphone application for triage of skin lesions based on machine learning algorithms. Journal of the European Academy of Dermatology and Venereology, 34(3), 648–655. DOI: 10.1111/jdv.15933.

Whited, J. D. (2015). Teledermatology. Medical Clinics of North America, 99(6), 1365-1379. DOI: 10.1016/j.mcna.2015.07.005.

Zanchetta, M. et al. (2025). Performance of a Deep Learning Algorithm for Melanoma Classification Across Diverse Dermoscopic and Tele-Dermatology Datasets. JEADV Clinical Practice. DOI: 10.1002/jvc2.70191.

Literature search and publications​

Literature search performed for the state-of-the-art review​

Search traceability​

A complete audit trail of the literature search is provided in the document "SOTA_Literature search.xlsx". This file documents the complete traceability of all queries, the selection process, and the specific reasons for exclusions. The document contains the following tables:

  • Results: This comprehensive sheet details the screening process for every item retrieved. Each entry (identified by a unique DOI or PMID) includes the following information:

    • The query number that retrieved the article.
    • Bibliographic data: Title, authors, journal, publication year and the abstract.
    • A duplicate column, marked "Yes" or "No".
    • The outcome of the selection process at each stage (title, abstract, and full-text review), indicating whether the article was "selected" or "excluded".
    • For excluded articles, the specific reason for exclusion is provided, cross-referencing the selection criteria from previous sections.
    • The appraisal against each inclusion criterion for selected manuscripts.
  • Additional records: This sheet lists records (manuscripts or guidelines) that were added manually. These publications were included because they were deemed highly relevant and consistent with the research objectives outlined in the section Objectives of the literature search.

Search traceability (vigilance data)​

All queries are presented in the section Vigilance databases, and the searches for vigilance data were performed according to them.

Retained clinical data​

All PDF files of the retained clinical data are available in the document “Clinical data SotA Legit.Health Plus”.

To facilitate data identification, each PDF file has been named using a consistent nomenclature: the surname of the first author, "et al.", and the year of publication.

Signature meaning

The signatures for the approval process of this document can be found in the verified commits at the repository for the QMS. As a reference, the team members who are expected to participate in this document and their roles in the approval process, as defined in Annex I Responsibility Matrix of the GP-001, are:

  • Author: Team members involved
  • Reviewer: JD-003, JD-004
  • Approver: JD-005