R-TF-007-005 Post-Market Clinical Follow-up (PMCF) evaluation report_2023_001
Details
- Corresponding PMCF plan number and version: 2023_001 dated 20230110.
- PMCF report number: 2023_001
- PMCF report date and version: registered in the git repository
Manufacturer contact details
Manufacturer data | |
---|---|
Legal manufacturer name | AI Labs Group S.L. |
Address | Street Gran Vía 1, BAT Tower, 48001, Bilbao, Bizkaia (Spain) |
SRN | ES-MF-000025345 |
Person responsible for regulatory compliance | Alfonso Medela, María Diez, Giulia Foglia |
E-mail | office@legit.health |
Phone | +34 638127476 |
Trademark | Legit.Health |
Medical device characterization
Information | |
---|---|
Device name | Legit.Health Plus (hereinafter, the device) |
Model and type | NA |
Version | 1.0.0.0 |
Basic UDI-DI | 8437025550LegitCADx6X |
Certificate number (if available) | MDR 792790 |
EMDN code(s) | Z12040192 (General medicine diagnosis and monitoring instruments - Medical device software) |
GMDN code | 65975 |
Class | Class IIb |
Classification rule | Rule 11 |
Novel product (True/False) | FALSE |
Novel related clinical procedure (True/False) | FALSE |
SRN | ES-MF-000025345 |
Intended use or purpose
Intended use
The device is a computational software-only medical device intended to support health care providers in the assessment of skin structures, enhancing efficiency and accuracy of care delivery, by providing:
- quantification of intensity, count, extent of visible clinical signs
- interpretative distribution representation of possible International Classification of Diseases (ICD) classes.
Quantification of intensity, count and extent of visible clinical signs
The device provides quantifiable data on the intensity, count and extent of clinical signs, including, but not limited to:
- erythema,
- desquamation,
- induration,
- crusting,
- dryness,
- oedema,
- oozing,
- excoriation,
- swelling,
- lichenification,
- exudation,
- depth,
- edges,
- undermining,
- pustulation,
- hair loss,
- type of necrotic tissue,
- amount of necrotic tissue,
- type of exudate,
- peripheral tissue edema,
- peripheral tissue induration,
- granulation tissue,
- epithelialization,
- nodule count,
- papule count,
- pustule count,
- cyst count,
- comedone count,
- abscess count,
- draining tunnel count,
- lesion count
Image-based recognition of visible ICD classes
The device is intended to provide an interpretative distribution representation of possible International Classification of Diseases (ICD) classes that might be represented in the pixel content of the image.
Device description
The device is a computational software-only medical device leveraging computer vision algorithms to process images of the epidermis, the dermis and its appendages, among other skin structures. Its principal function is to provide a wide range of clinical data from the analyzed images to assist healthcare practitioners in their clinical evaluations and allow healthcare provider organisations to gather data and improve their workflows.
The generated data is intended to aid healthcare practitioners and organizations in their clinical decision-making process, thus enhancing the efficiency and accuracy of care delivery.
The device should never be used to confirm a clinical diagnosis. On the contrary, its result is one element of the overall clinical assessment. Indeed, the device is designed to be used when a healthcare practitioner chooses to obtain additional information to consider a decision.
Intended medical indication
The device is indicated for use on images of visible skin structure abnormalities to support the assessment of all diseases of the skin incorporating conditions affecting the epidermis, its appendages (hair, hair follicle, sebaceous glands, apocrine sweat gland apparatus, eccrine sweat gland apparatus and nails) and associated mucous membranes (conjunctival, oral and genital), the dermis, the cutaneous vasculature and the subcutaneous tissue (subcutis).
Intended patient population
The device is intended for use on images of skin from patients presenting visible skin structure abnormalities, across all age groups, skin types, and demographics.
Intended user
The medical device is intended for use by healthcare providers to aid in the assessment of skin structures.
User qualification and competencies
This section specifies the qualifications and competencies that users need in order to use the device properly, provided that they already belong to their professional category. In other words, when describing the qualifications of healthcare professionals (HCPs), it is assumed that they already have the qualifications and competencies native to their profession.
Healthcare professionals
No official qualifications are needed, but it is advisable that HCPs have the following competencies:
- Knowledge on how to take images with smartphones.
IT professionals
IT professionals are responsible for the integration of the medical device into the healthcare organisation's system.
No specific official qualifications are needed, but it is advisable that IT professionals using the device have the following competencies:
- Basic knowledge of FHIR
- Understanding of the output of the device.
Use environment
The device is intended to be used in the setting of healthcare organisations and their IT departments, which commonly are situated inside hospitals or other clinical facilities.
The device is intended to be integrated into the healthcare organisation's system by IT professionals.
Operating principle
The device is a computational medical tool leveraging computer vision algorithms to process images of the epidermis, the dermis and its appendages, among other skin structures.
Body structures
The device is intended for use on the epidermis, its appendages (hair, hair follicle, sebaceous glands, apocrine sweat gland apparatus, eccrine sweat gland apparatus and nails) and associated mucous membranes (conjunctival, oral and genital), the dermis, the cutaneous vasculature and the subcutaneous tissue (subcutis).
In fact, the device is intended for use on visible skin structures. As such, it can only quantify clinical signs that are visible, and distribute the probabilities across ICD classes that are visible.
Variants and models
The device does not have any variants.
Expected lifetime
Its expected lifetime is considered unlimited because the device will be updated with each improvement opportunity extracted from the information and analysis of the data provided by the continuous and systematic post-market data follow-up.
List of any accessories covered by this report
The device does not have any accessories.
The device's functionality can be leveraged through its API, which is compatible with any device that has an internet connection, from personal computers and servers to mobile phones, regardless of operating systems and makes.
Explanation of any novel features
The device represents a significant enhancement of existing technology, developed to advance the current state-of-the-art in analyzing photographs of skin structures. It employs sophisticated artificial intelligence algorithms to process these images and extract data with clinical significance.
We detail the novel features of the device within the Description and specifications document.
Post-Market Clinical Follow-up (PMCF) activities
Activity 1: Clinical literature review
This activity compiles and evaluates state-of-the-art publications on image-based diagnosis and severity measurement methods. The overall objective of this strategy is to identify, select and collect the relevant literature to determine whether the device is safe for its intended use and whether there is any emergent risk that we must consider.
The literature review was performed according to the R-TF-015-001 Clinical evaluation plan and the results were collected in the R-TF-015-002 Preclinical and clinical evaluation record.
Skin structures analysis
In the literature review for skin analysis methods, 188 articles were compiled, of which 81 articles passed the Data Screening and 107 were discarded. These 81 articles that passed the Data Screening were reviewed thoroughly to assess their Quality, Relevance, and Contribution. As a result:
- 31 articles have been evaluated as highly relevant.
- 25 articles have been evaluated as relevant.
- 25 articles have been evaluated as non-relevant.
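The screening figures above can be cross-checked with simple arithmetic. The following Python sketch is only an illustration of that check; the variable names are ours and are not part of the referenced records.

```python
# Counts reported in the literature review (taken from the text above).
compiled = 188
passed_screening = 81
discarded = 107

# Appraisal outcome of the articles that passed the Data Screening.
appraisal = {"highly relevant": 31, "relevant": 25, "non-relevant": 25}

# Every compiled article is either screened in or discarded.
assert passed_screening + discarded == compiled
# Every screened-in article receives exactly one appraisal category.
assert sum(appraisal.values()) == passed_screening

# Share of each category among the screened-in articles, in percent.
shares = {
    name: round(100 * count / passed_screening, 1)
    for name, count in appraisal.items()
}
print(shares)
```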
In this review, we identified different kinds of skin analysis methods that are intended for different use cases:
- Most commonly, these solutions are designed for skin-related image diagnosis (IDs 019, 020, 030, 034, 042, 044, 045, 047, 053, 058, 061, 063, 067, 068, 072, 078, 079, 080, 083, 085, 092, 096, 119, 122, 127, 128, 129, 132, 166), usually being able to detect 2 to 10 different skin diseases but, in particular cases, up to 174 different lesion categories. Of special relevance, certain solutions include novel techniques to improve diagnostic performance. IDs 092 and 128 jointly process image and metadata information, ID 020 combines image and lesion segmentation information, ID 080 attaches a special device to a smartphone camera to improve the image quality and ease its processing, and IDs 042 and 132 apply techniques such as knowledge distillation, cost-sensitive learning, soft targets, cumulative learning or specific data augmentations to ease learning with small and unbalanced datasets.
- In contrast, other methods are designed for pixel-precise lesion segmentation or for the severity assessment of different pathologies. In the case of lesion segmentation (IDs 016, 025, 027, 031, 037, 064, 076, 122), we find interesting novelties such as ID 016, which uses special pre- and post-processing steps, and ID 064, which proposes a new architecture designed with fewer parameters to perform more efficiently. As for lesion severity assessment, we find some methods designed for psoriasis (ID 009) or acne (IDs 021, 035, 093) evaluation.
- Besides the design and development of these image analysis methods, we also find some other articles that include information related to their deployment and integration in phone apps and/or the cloud (IDs 019, 021, 035, 061, 087). Of special relevance, ID 021 proposes the pruning and feature-based knowledge distillation of the trained models to improve their efficiency in end-devices.
- Interestingly, we also find review articles that summarize the state-of-the-art skin lesion analysis methods (IDs 001, 007, 009, 014, 015, 019, 113, 116, 120, 121, 146, 187, 188), including important tips for their design, deployment, and analysis of the challenges they face.
- Finally, other research articles also provide information related to the evaluation of these methods in clinical trials (IDs 019, 021, 030, 053, 081, 085, 140, 143, 184), providing meaningful information related to the performance of these methods in real environments and the benefits they offer to health professionals.
ID | Name | Authors | Publication | Q (max. 40) | R (max. 30) | C (max. 30) | Weighted Value |
---|---|---|---|---|---|---|---|
001 | Explainable artificial intelligence in skin cancer recognition: A systematic review | Hauser, K., Kurz, A., Haggenmueller, S., Maron, R. C., von Kalle, C., Utikal, J. S., ... & Brinker, T. J. | European Journal of Cancer | 40 | 20 | 10 | 70 |
004 | DermIA: Machine Learning to Improve Skin Cancer Screening | Shoen, Ezra | Journal of digital imaging | 0 | 20 | 5 | 25 |
007 | Non-Melanoma Skin Cancer Detection in the Age of Advanced Technology: A Review | Stafford, Haleigh; Buell, Jane; Yaniv, Dan | Cancers | 30 | 25 | 15 | 70 |
009 | Telemedicine and e-Health in the Management of Psoriasis: Improving Patient Outcomes A Narrative Review | Havelin, Alison; Hampton, Philip | Psoriasis (Auckland, N.Z.) | 25 | 20 | 5 | 50 |
013 | Artificial intelligence for the automated single-shot assessment of psoriasis severity | Okamoto, T; Kawai, M; Kawamura, T | Journal of the European Academy of Dermatology and Venereology : JEADV | 10 | 5 | 5 | 20 |
014 | A survey, review, and future trends of skin lesion segmentation and classification | Hasan, Md Kamrul; Ahamad, Md Asif; Yang, Guang | Computers in biology and medicine | 38 | 27 | 10 | 75 |
015 | Digital skin imaging applications, part I: Assessment of image acquisition technique features | Sun, Mary D; Kentley, Jonathan; Halpern, Allan C | Skin Research and Technology | 35 | 25 | 25 | 85 |
016 | Melanoma segmentation using deep learning with test-time augmentations and conditional random fields | Ashraf, Hassan; Waris, Asim; Niazi, Imran Khan | Scientific reports | 35 | 25 | 15 | 75 |
018 | DenseNet-II: an improved deep convolutional neural network for melanoma cancer detection | Girdhar, Nancy; Sinha, Aparna; Gupta, Shivang | Soft computing | 5 | 20 | 0 | 25 |
019 | Automatic skin disease diagnosis using deep learning from clinical image and patient information | Muhaba, K A; Dese, K; Simegn, G L | Skin health and disease | 35 | 30 | 10 | 75 |
020 | An interpretable CNN-based CAD system for skin lesion diagnosis | López-Labraca, Javier; González-Díaz, Iván; Fueyo-Casado, Alejandro | Artificial intelligence in medicine | 40 | 25 | 25 | 90 |
021 | A cell phone app for facial acne severity assessment | Wang, Jiaoju; Luo, Yan; Zhang, Jianglin | Applied intelligence (Dordrecht, Netherlands) | 40 | 28 | 28 | 96 |
025 | Monitoring of Pigmented Skin Lesions Using 3D Whole Body Imaging | Ahmedt-Aristizabal, David; Nguyen, Chuong; Wang, Dadong | Computer methods and programs in biomedicine | 20 | 5 | 10 | 35 |
027 | Machine learning based skin lesion segmentation method with novel borders and hair removal techniques | Rehman, Mohibur; Ali, Mushtaq; Mustafa Hilal, Anwer | PloS one | 20 | 20 | 10 | 50 |
030 | Convolutional neural network assistance significantly improves dermatologists' diagnosis of cutaneous tumours using clinical images | Ba, Wei; Wu, Huan; Li, Cheng X | European journal of cancer (Oxford, England : 1990) | 35 | 30 | 8 | 73 |
031 | Skin Lesion Segmentation Using an Ensemble of Different Image Processing Methods | Tamoor, Maria; Naseer, Asma; Zafar, Kashif | Diagnostics | 15 | 15 | 5 | 35 |
032 | Attention Cost-Sensitive Deep Learning-Based Approach for Skin Cancer Detection and Classification | Ravi, Vinayakumar | Cancers | 15 | 10 | 0 | 25 |
034 | ExAID: A multimodal explanation framework for computer-aided diagnosis of skin lesions | Lucieri, Adriano; Bajwa, Muhammad Naseer; Ahmed, Sheraz | Computer methods and programs in biomedicine | 30 | 25 | 20 | 75 |
035 | Automatic Acne Object Detection and Acne Severity Grading Using Smartphone Images and Artificial Intelligence | Huynh, Quan Thanh; Nguyen, Phuc Hoang; Ngo, Hoan Thanh | Diagnostics | 15 | 25 | 10 | 50 |
037 | An IoMT-Based Melanoma Lesion Segmentation Using Conditional Generative Adversarial Networks | Ali, Zeeshan; Naz, Sheneela; Kim, Yongsung | Sensors | 15 | 20 | 10 | 45 |
039 | Human Monkeypox Classification from Skin Lesion Images with Deep Pre-trained Network using Mobile Application | Sahin, Veysel Harun; Oztel, Ismail; Yolcu Oztel, Gozde | Journal of medical systems | 5 | 15 | 0 | 20 |
042 | Melanoma classification from dermatoscopy images using knowledge distillation for highly imbalanced data | Adepu, Anil Kumar; Sahayam, Subin; Arramraju, Rashmika | Computers in biology and medicine | 35 | 25 | 25 | 85 |
044 | A rotation meanout network with invariance for dermoscopy image classification and retrieval | Zhang, Yilan; Xie, Fengying; Liu, Jie | Computers in biology and medicine | 25 | 20 | 15 | 60 |
045 | Implementation of artificial intelligence algorithms for melanoma screening in a primary care setting | Giavina-Bianchi, Mara; de Sousa, Raquel Machado; Machado, Birajara Soares | PloS one | 35 | 25 | 15 | 75 |
047 | A Deep CNN Transformer Hybrid Model for Skin Lesion Classification of Dermoscopic Images Using Focal Loss | Nie, Yali; Sommella, Paolo; Lundgren, Jan | Diagnostics | 10 | 20 | 5 | 35 |
051 | A novel approach toward skin cancer classification through fused deep features and neutrosophic environment | Abdelhafeez, Ahmed; Mohamed, Hoda K; Khalil, Nariman A | Frontiers in public health | 5 | 10 | 5 | 20 |
053 | A machine learning-based, decision support, mobile phone application for diagnosis of common dermatological diseases | Pangti, R; Mathur, J; Gupta, S | Journal of the European Academy of Dermatology and Venereology : JEADV | 35 | 30 | 10 | 75 |
055 | Development and Clinical Evaluation of an Artificial Intelligence Support Tool for Improving Telemedicine Photo Quality | Vodrahalli, Kailas; Ko, Justin; Daneshjou, Roxana | JAMA dermatology | 30 | 30 | 15 | 75 |
056 | The Role of Machine Learning and Deep Learning Approaches for the Detection of Skin Cancer | Mazhar, Tehseen; Haq, Inayatul; Goh, Lucky Poh Wah | Healthcare (Basel, Switzerland) | 15 | 10 | 0 | 25 |
058 | Automatic identification of benign pigmented skin lesions from clinical images using deep convolutional neural network | Ding, Hui; Zhang, Eejia; Lin, Tong | BMC biotechnology | 20 | 25 | 5 | 50 |
061 | Accuracy of a Smartphone-Based Artificial Intelligence Application for Classification of Melanomas, Melanocytic Nevi, and Seborrheic Keratoses | Liutkus, Jokubas; Kriukas, Arturas; Valiukeviciene, Skaidra | Diagnostics | 25 | 20 | 5 | 50 |
063 | Lesion identification and malignancy prediction from clinical dermatological images | Xia, Meng; Kheterpal, Meenal K; Henao, Ricardo | Scientific reports | 25 | 20 | 10 | 55 |
064 | Ms RED: A novel multi-scale residual encoding and decoding network for skin lesion segmentation | Dai, Duwei; Dong, Caixia; Luo, Nana | Medical image analysis | 35 | 25 | 20 | 80 |
067 | Ensemble Method of Convolutional Neural Networks with Directed Acyclic Graph Using Dermoscopic Images: Melanoma Detection Application | Foahom Gouabou, Arthur Cartel; Damoiseaux, Jean-Luc; Merad, Djamal | Sensors | 10 | 10 | 10 | 30 |
068 | Incorporating clinical knowledge with constrained classifier chain into a multimodal deep network for melanoma detection | Wang, Yuheng; Cai, Jiayue; Lee, Tim K | Computers in biology and medicine | 15 | 10 | 5 | 30 |
070 | A Novel Approach for the Shape Characterisation of Non-Melanoma Skin Lesions Using Elliptic Fourier Analyses and Clinical Images | Courtenay, Lloyd A; Barbero-García, Inés; Román-Curto, Concepción | Journal of clinical medicine | 10 | 5 | 5 | 20 |
071 | Multiclass skin lesion localization and classification using deep learning based features fusion and selection framework for smart healthcare | Maqsood, Sarmad; Damaševičius, Robertas | Neural networks : the official journal of the International Neural Network Society | 5 | 5 | 5 | 15 |
072 | Light-Dermo: A Lightweight Pretrained Convolution Neural Network for the Diagnosis of Multiclass Skin Lesions | Baig, Abdul Rauf; Abbas, Qaisar; Ahmed, Alaa E S | Diagnostics | 15 | 15 | 10 | 40 |
076 | A Multi-Feature Fusion Framework for Automatic Skin Cancer Diagnostics | Bakheet, Samy; Alsubai, Shtwai; Alqahtani, Abdullah | Diagnostics | 25 | 15 | 10 | 50 |
078 | Computer Aided Diagnosis of Melanoma Using Deep Neural Networks and Game Theory: Application on Dermoscopic Images of Skin Lesions | Foahom Gouabou, Arthur Cartel; Collenne, Jules; Merad, Djamal | International journal of molecular sciences | 25 | 20 | 20 | 65 |
079 | Deeply Supervised Skin Lesions Diagnosis with Stage and Branch Attention | Dai, Wei; Liu, Rui; Liu, Jun | IEEE journal of biomedical and health informatics | 30 | 25 | 20 | 75 |
080 | The Role in Teledermoscopy of an Inexpensive and Easy-to-Use Smartphone Device for the Classification of Three Types of Skin Lesions Using Convolutional Neural Networks | Veronese, Federica; Branciforti, Francesco; Savoia, Paola | Diagnostics | 30 | 25 | 20 | 75 |
081 | Validation of a Market-Approved Artificial Intelligence Mobile Health App for Skin Cancer Screening: A Prospective Multicenter Diagnostic Accuracy Study | Sangers, Tobias; Reeder, Suzan; Wakkee, Marlies | Dermatology (Basel, Switzerland) | 35 | 30 | 15 | 80 |
082 | Non-melanoma skin cancer diagnosis: a comparison between dermoscopic and smartphone images by unified visual and sonification deep learning algorithms | Dascalu, A; Walker, B N; David, E O | Journal of cancer research and clinical oncology | 30 | 10 | 5 | 45 |
083 | The Classification of Six Common Skin Diseases Based on Xiangya-Derm: Development of a Chinese Database for Artificial Intelligence | Huang, Kai; Jiang, Zixi; Zhao, Shuang | Journal of medical Internet research | 30 | 30 | 20 | 80 |
085 | Performance of a deep neural network in teledermatology: a single-centre prospective diagnostic study | Muñoz-López, C; Ramírez-Cornejo, C; Navarrete-Dechent, C | Journal of the European Academy of Dermatology and Venereology : JEADV | 35 | 30 | 10 | 75 |
087 | Imtidad: A Reference Architecture and a Case Study on Developing Distributed AI Services for Skin Disease Diagnosis over Cloud, Fog and Edge | Janbi, Nourah; Mehmood, Rashid; Yigitcanlar, Tan | Sensors | 40 | 20 | 20 | 80 |
088 | Over-Detection of Melanoma-Suspect Lesions by a CE-Certified Smartphone App: Performance in Comparison to Dermatologists, 2D and 3D Convolutional Neural Networks in a Prospective Data Set of 1204 Pigmented Skin Lesions Involving Patients' Perception | Jahn, Anna Sophie; Navarini, Alexander Andreas; Maul, Lara Valeska | Cancers | 15 | 5 | 0 | 20 |
089 | Design and validation of a new machine-learning-based diagnostic tool for the differentiation of dermatoscopic skin cancer images | Tajerian, Amin; Kazemian, Mohsen; Akhavan Malayeri, Ava | PloS one | 5 | 5 | 5 | 15 |
092 | Reimagining leprosy elimination with AI analysis of a combination of skin lesion images with demographic and clinical data | Barbieri, Raquel R; Xu, Yixi; Moraes, Milton O | Lancet regional health. Americas | 35 | 25 | 15 | 75 |
093 | A smart LED therapy device with an automatic facial acne vulgaris diagnosis based on deep learning and internet of things application | Phan, Duc Tri; Ta, Quoc Bao; Oh, Junghwan | Computers in biology and medicine | 30 | 10 | 20 | 60 |
096 | Dermatoscopy of combined blue nevi: a multicentre study of the International Dermoscopy Society | Stojkovic-Filipovic, J; Tiodorovic, D; Kittler, H | Journal of the European Academy of Dermatology and Venereology : JEADV | 30 | 20 | 10 | 60 |
113 | Skin cancer detection: a review using deep learning techniques | Dildar, M., Akram, S., Irfan, M., Khan, H. U., Ramzan, M., Mahmood, A. R., ... & Mahnashi, M. H. | International journal of environmental research and public health | 40 | 20 | 0 | 60 |
116 | Skin cancer classification via convolutional neural networks: systematic review of studies involving human experts | Haggenmüller, S., Maron, R. C., Hekler, A., Utikal, J. S., Barata, C., Barnhill, R. L., ... & Brinker, T. J. | European Journal of Cancer | 20 | 20 | 5 | 45 |
117 | An enhanced technique of skin cancer classification using deep convolutional neural network with transfer learning models | Ali, M. S., Miah, M. S., Haque, J., Rahman, M. M., & Islam, M. K. | Machine Learning with Applications | 5 | 10 | 0 | 15 |
118 | Multiclass skin cancer classification using EfficientNets-a first step towards preventing skin cancer | Ali, K., Shaikh, Z. A., Khan, A. A., & Laghari, A. A. | Neuroscience Informatics | 10 | 10 | 0 | 20 |
119 | Intelligence Skin Cancer Detection using IoT with a Fuzzy Expert System | Al-Dmour, N. A., Salahat, M., Nair, H. K., Kanwal, N., Saleem, M., & Aziz, N. | In 2022 International Conference on Cyber Resilience (ICCR) | 30 | 15 | 15 | 60 |
120 | Machine learning and deep learning methods for skin lesion classification and diagnosis: a systematic review | Kassem, M. A., Hosny, K. M., Damaševičius, R., & Eltoukhy, M. M. | Diagnostics | 35 | 20 | 5 | 60 |
121 | Skin lesion classification based on deep convolutional neural networks architectures | Saeed, J., & Zeebaree, S. | Journal of Applied Science and Technology Trends | 10 | 20 | 5 | 35 |
122 | Multi-class skin lesion detection and classification via teledermatology | Khan, M. A., Muhammad, K., Sharif, M., Akram, T., & de Albuquerque, V. H. C. | IEEE journal of biomedical and health informatics | 20 | 20 | 20 | 60 |
124 | Intelligent skin cancer detection applying autoencoder, MobileNetV2 and spiking neural networks | Toğaçar, M., Cömert, Z., & Ergen, B. | Chaos, Solitons & Fractals | 10 | 10 | 5 | 25 |
125 | Skin lesion analyser: an efficient seven-way multi-class skin cancer classification using MobileNet | Chaturvedi, S. S., Gupta, K., & Prasad, P. S. | Advanced Machine Learning Technologies and Applications: Proceedings of AMLTA 2020 | 5 | 10 | 0 | 15 |
126 | Multiclass skin lesion classification using hybrid deep features selection and extreme learning machine | Afza, F., Sharif, M., Khan, M. A., Tariq, U., Yong, H. S., & Cha, J. | Sensors | 10 | 10 | 5 | 25 |
127 | Soft attention improves skin cancer classification performance | Datta, S. K., Shaikh, M. A., Srihari, S. N., & Gao, M. | In Interpretability of Machine Intelligence in Medical Image Computing | 25 | 25 | 25 | 75 |
128 | An attention-based mechanism to combine images and metadata in deep learning models applied to skin cancer classification | Pacheco, A. G., & Krohling, R. A. | IEEE journal of biomedical and health informatics | 25 | 25 | 25 | 75 |
129 | Uncertainty quantification in skin cancer classification using three-way decision-based Bayesian deep learning | Abdar, M., Samami, M., Mahmoodabad, S. D., Doan, T., Mazoure, B., Hashemifesharaki, R., ... & Nahavandi, S. | Computers in biology and medicine | 35 | 25 | 20 | 80 |
132 | Single model deep learning on imbalanced small datasets for skin lesion classification | Yao, P., Shen, S., Xu, M., Liu, P., Zhang, F., Xing, J., ... & Xu, R. X. | IEEE transactions on medical imaging | 35 | 30 | 30 | 95 |
135 | Skin cancer diagnosis using convolutional neural networks for smartphone images: A comparative study | Medhat, S., Abdel-Galil, H., Aboutabl, A. E., & Saleh, H. | Journal of Radiation Research and Applied Sciences | 5 | 5 | 0 | 10 |
138 | A smartphone-based application for an early skin disease prognosis: Towards a lean healthcare system via computer-based vision | Shahin, M., Chen, F. F., Hosseinzadeh, A., Koodiani, H. K., Shahin, A., & Nafi, O. A. | Advanced Engineering Informatics | 10 | 5 | 5 | 20 |
140 | New AI-algorithms on smartphones to detect skin cancer in a clinical setting—A validation study | Kränke, T., Tripolt-Droschl, K., Röd, L., Hofmann-Wellenhof, R., Koppitz, M., & Tripolt, M. | Plos one | 35 | 30 | 10 | 75 |
141 | Artificial intelligence algorithm with SVM classification using dermascopic images for melanoma diagnosis | Balasubramaniam, V. | Journal of Artificial Intelligence and Capsule Networks | 1 | 5 | 5 | 11 |
143 | Artificial intelligence and machine learning algorithms for early detection of skin cancer in community and primary care settings: a systematic review | Jones, O. T., Matin, R. N., van der Schaar, M., Bhayankaram, K. P., Ranmuthu, C. K. I., Islam, M. S., ... & Walter, F. M. | The Lancet Digital Health | 35 | 20 | 20 | 75 |
146 | Artificial intelligence in the detection of skin cancer | Beltrami, E. J., Brown, A. C., Salmon, P. J., Leffell, D. J., Ko, J. M., & Grant-Kels, J. M. | Journal of the American Academy of Dermatology. | 35 | 20 | 0 | 55 |
149 | Smartphone-based Skin Cancer Detection using Image Processing and Convolutional Neural Network | Rahman, S., Raihan, M., & Mithila, S. K. | International Conference on Computing Communication and Networking Technologies (ICCCNT) | 10 | 10 | 0 | 20 |
153 | DSCC_Net: Multi-Classification Deep Learning Models for Diagnosing of Skin Cancer Using Dermoscopic Images | Tahir, M., Naeem, A., Malik, H., Tanveer, J., Naqvi, R. A., & Lee, S. W. | Cancers | 10 | 10 | 5 | 25 |
155 | Deep Learning-Based Classification of Dermoscopic Images for Skin Lesions | SÖNMEZ, A. F., ÇAKAR, S., CEREZCİ, F., KOTAN, M., DELİBAŞOĞLU, İ., & Gülüzar, Ç. İ. T. | Sakarya University Journal of Computer and Information Sciences | 10 | 10 | 5 | 25 |
158 | Detection of melanoma with hybrid learning method by removing hair from dermoscopic images using image processing techniques and wavelet transform | Suiçmez, Ç., Kahraman, H. T., Suiçmez, A., Yılmaz, C., & Balcı, F. | Biomedical Signal Processing and Control | 10 | 10 | 5 | 25 |
166 | DermoCC-GAN: A new approach for standardizing dermatological images using generative adversarial networks | Salvi, M., Branciforti, F., Veronese, F., Zavattaro, E., Tarantino, V., Savoia, P., & Meiburger, K. M. | Computer Methods and Programs in Biomedicine | 30 | 20 | 15 | 65 |
184 | Development and assessment of an artificial intelligence-based tool for skin condition diagnosis by primary care physicians and nurse practitioners in teledermatology practices | Jain, A., Way, D., Gupta, V., Gao, Y., de Oliveira Marinho, G., Hartford, J., ... & Liu, Y. | JAMA network open | 35 | 30 | 10 | 75 |
187 | A survey on deep learning for skin lesion segmentation | Mirikharaji, Z., Abhishek, K., Bissoto, A., Barata, C., Avila, S., Valle, E., ... & Hamarneh, G. | Medical Image Analysis | 40 | 30 | 20 | 90 |
188 | Characteristics of publicly available skin cancer image datasets: a systematic review | Wen, D., Khan, S. M., Xu, A. J., Ibrahim, H., Smith, L., Caballero, J., ... & Matin, R. N. | The Lancet Digital Health | 40 | 30 | 10 | 80 |
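In every row of the table, the Weighted Value equals the sum of the Quality (Q), Relevance (R) and Contribution (C) scores. The Python sketch below illustrates that calculation with three rows copied from the table. The function name and the range checks (maxima of 40, 30 and 30, inferred from the highest scores that appear in the table) are illustrative assumptions, not part of the documented appraisal procedure.

```python
def weighted_value(q: int, r: int, c: int) -> int:
    """Return the combined appraisal score as Q + R + C."""
    # Assumed ranges (hypothetical check): Q in 0..40, R in 0..30, C in 0..30.
    if not (0 <= q <= 40 and 0 <= r <= 30 and 0 <= c <= 30):
        raise ValueError("score outside the assumed range")
    return q + r + c


# (ID, Q, R, C, reported Weighted Value) for three rows of the table.
rows = [
    ("001", 40, 20, 10, 70),
    ("021", 40, 28, 28, 96),
    ("141", 1, 5, 5, 11),
]

for article_id, q, r, c, reported in rows:
    # The recomputed sum must match the value printed in the table.
    assert weighted_value(q, r, c) == reported, article_id
```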
Activity 2: PMCF studies
Introduction
This activity compiles the clinical validations performed to assess the safety and performance of the medical device in a real-world environment. Currently, 5 clinical investigations have been completed, 1 clinical investigation is ongoing and recruiting patients, and 1 investigation has concluded its first part, with the second part due to start in 2025. The overall objective of this strategy is to support the safety and effectiveness of the device in specific patient populations and under different clinical conditions.
These studies were performed according to the R-TF-015-001 Clinical evaluation plan, the R-TF-007-002 Post-Market Clinical Follow-up (PMCF) Plan and the protocols collected in the Clinical Investigation Plans. The results were collected in the R-TF-015-002 Preclinical and clinical evaluation record_2023_001 and also in the corresponding Clinical Investigation Report of each study.
Ongoing studies
Pilot study for the clinical validation of an artificial intelligence algorithm to optimize the appropriateness of dermatology referrals.
- Main objective: To validate that Legit.Health artificial intelligence algorithms are a valid tool for optimizing the appropriateness of dermatology referrals.
- Secondary objectives:
- To validate that the device reduces costs in secondary care.
- To validate that the device reduces dermatology waiting lists.
- To validate that the device optimizes clinical flow in Osakidetza.
- Design: Prospective observational and analytical study of a longitudinal clinical case series.
- Sample size: This study will include 400 patients. Currently, 79 patients have been recruited.
- Duration: 4 months. An extension of the study has been requested to continue patient recruitment.
- Current status: Recruiting patients to achieve the expected sample size.
- Results:
- In the detection of malignancy, the device showed a sensitivity of 100% and a specificity of 76%. In contrast, the primary care practitioners did not properly identify malignant cases: they showed a sensitivity of 25%, with a high specificity of 96%.
- Dermatologists opt for in-person consultations for 71% of the patients, while they address and resolve 29% of teledermatology cases directly.
- General practitioners believed that 86% of cases should be referred to a dermatologist, though dermatologists did not schedule in-person visits for 29% of teleconsultations. An estimated 15% of referrals could be avoided using tools like Legit.Health, with 30% of referred cases ultimately not requiring in-person visits. The algorithm flagged 57% of cases marked for referral as low malignancy, correctly identifying 43% of non-attended cases as low-risk. For cases without referral, all had low malignancy values, confirming no malignancy, except for one case later diagnosed as psoriasis.
- Regarding cost reduction, around 30% of the consultations initially deemed necessary for referral did not result in an in-person dermatologist consultation. Approximately 56% of the consultations that appeared to require an in-person visit turned out to be benign or cases that could have been managed in primary care.
- The average waiting time for dermatologist consultations was 10 days.
- Conclusions: It is too early to draw conclusions because this study is still recruiting patients. However, these initial results suggest that the use of the medical device could optimize costs, reduce waiting times, and expedite urgent cases, assuming appointments were delayed due to waiting lists.
- More information: For more project information, see the study protocol at R-TF-015-006 Clinical investigation plan LEGIT.HEALTH_DAO_Derivación_O_2022.
Optimization of clinical flow in patients with dermatological conditions using Artificial Intelligence.
- Main objective: To validate that the device optimizes the clinical flow and patient care process, decreasing the time and cost of care per patient, through greater accuracy in medical diagnosis and determination of the degree of malignancy or severity.
- Secondary objectives:
- To demonstrate that the device improves the ability of healthcare professionals in detecting malignant or suspected malignant pigmented lesions.
- To demonstrate that the device improves the ability and accuracy of healthcare professionals in measuring the degree of involvement of patients with female androgenic alopecia.
- To demonstrate that the device improves the ability and accuracy of healthcare professionals in measuring the degree of involvement of patients with acne.
- To automate the initial triage/assessment process in patients consulting for pigmented lesions.
- To evaluate the reduction in the use of healthcare resources by the center by reducing the number of triage consultations and direct referral of the patient to the appropriate consultation (aesthetic or dermatological).
- To evaluate the degree of usability of the device by the patient.
- To demonstrate that the device increases specialist satisfaction.
- To evaluate the reduction in the use of healthcare resources by reducing the number of triage consultations and directing the patient directly to the appropriate consultation, whether in the aesthetic or dermatological field.
- Design: This is a prospective observational study with both longitudinal and retrospective case series.
- Sample size: Prospectively, a minimum of 60 cases will be included: 30 with pigmented lesions, 15 with androgenic alopecia and 15 with inflammatory acne. Retrospectively, 60 patients with pigmented lesions, 15 with androgenic alopecia and 15 with inflammatory acne will be included.
- Duration: The first part of the study took 5 months to be completed.
- Current status: The first part of the study, which focused on the analysis of pigmented lesions and androgenic alopecia, finished in August 2024. The second part of the study will start in 2025.
- Results:
- For retrospective images, the medical device exhibited an AUC of 0.76 in detecting lesion malignancy, while the dermatologists achieved an AUC of 0.79.
- The medical device achieved a top-5 accuracy of 0.47 regarding the diagnosis assessment, while the dermatologists achieved a 0.45 top-3 accuracy. When not accounting for the specific kind of nevus in the diagnosis, the medical device achieves a superior top-5 accuracy of 0.78 and the dermatologists achieve a top-3 accuracy of 0.70.
- In the analysis of prospective images, dermatologists using the medical device achieved an AUC of 0.94. In terms of diagnostic performance, the dermatologists aided by the legacy medical device achieved a top-1 accuracy of 0.30.
- For androgenetic alopecia, 49 retrospective images in addition to 13 previously obtained were collected. The overall accuracy of the model was 47%, while the accuracy of the latest model optimized for FAA was 53%, based on the investigator's scores.
- Conclusions: It is too early to draw conclusions, since the second part of the study has yet to begin. However, in this first part of the study the device's diagnostic capability in distinguishing malignancy was on par with expert dermatologists, not only in teledermatology but also in in-person consultations. This supports its reliability as a screening tool for malignant ICD-11 categories, helping to prioritize patients based on urgency and direct them to the appropriate specialist or consultation. Additionally, we observed a strong correlation in Ludwig scores, despite a decline in the prospective trial, which may be attributed to inconsistencies in criteria alignment.
- More information: For more project information, see the study protocol at R-TF-015-006 Clinical investigation plan Legit.Health_IDEI_2023.
Clinical validation study of a Computer-Aided Diagnosis (CADx) system with artificial intelligence algorithms for early non-invasive detection of in vivo cutaneous melanoma.
- Main objective: To validate that the artificial intelligence algorithm developed by AI LABS GROUP S.L. for the identification of cutaneous melanoma in images of lesions taken with a dermoscopic camera achieves the following values:
- AUC greater than 0.8.
- Sensitivity of 80% or higher.
- Specificity of 70% or higher.
- Secondary objectives: Validate the usefulness and feasibility of the artificial intelligence algorithm developed by the manufacturer in adverse environments with severe technical limitations, such as lack of instrumentation or lack of internet connection.
- Design: Analytical observational and cross-sectional case series study for the performance of a diagnostic test study.
- Sample size: The proposed number for this study was 200. By the end of the study, 105 patients were recruited. Despite being a smaller sample than the goal (200 subjects), we managed to increase the ratio of cutaneous melanoma cases originally planned (from 20% to 34%).
- Duration: 5 years.
- Results:
- For melanoma detection, the medical device achieved a top-1 precision of 0.80, a top-3 sensitivity of 0.90 and a top-1 specificity of 0.80.
- For melanoma detection, the medical device showed an AUC of 0.84, which is considered excellent.
- In the analysis of skin recognition, the medical device achieved a top-1 accuracy of 0.54, a top-3 accuracy of 0.75 and a top-5 accuracy of 0.84.
- Finally, the medical device showed an AUC of 0.88 in the detection of malignancy.
- Conclusions: The device demonstrates great malignancy prediction and compelling image recognition capacity for melanoma and other pigmented skin lesions such as carcinoma, keratoses or nevus, with results similar to internal validation tests. Regarding the detection of melanoma, the data collected in this study limits the power of the analysis due to class imbalance, difficult diagnoses, and inconsistent image quality, but the results obtained are compelling even under such challenging conditions.
- More information: For more project information, see the study protocol at R-TF-015-006 Clinical investigation plan LEGIT_MC_EVCDAO_2019.
Clinical Validation of a Computer-Aided Diagnosis (CAD) System Utilizing Artificial Intelligence Algorithms for Continuous and Remote Monitoring of Patient Condition Severity in an Objective and Stable Manner.
- Main objective: The primary aim of this study is to ascertain the validity of the device, leveraging artificial intelligence and developed by AI LABS GROUP S.L., in objectively and reliably tracking the progression of chronic dermatological conditions. This validation is deemed successful if the tool achieves a score of 8 or higher in the Clinical Utility Questionnaire (CUS).
- Secondary objectives:
- Confirming that the utilization of the device elicits a high level of patient satisfaction, particularly in its remote application.
- Demonstrating that the implementation of the device leads to a reduction in face-to-face consultations, thereby optimizing healthcare resources and patient convenience.
- Validating the device's ability to consistently generate reliable condition monitoring, thereby establishing its trustworthiness as a monitoring system.
- Design: Prospective observational analytical study of a longitudinal clinical case series.
- Sample size: 160 patients.
- Duration: 19 months.
- Results:
- A total of 400 patients were initially considered for inclusion in this study. However, after screening based on the predefined study criteria, 240 individuals were excluded. Consequently, the final cohort comprised 160 patients who met the specified eligibility criteria.
- In the Clinical Utility Questionnaire, the overall score, calculated by averaging scores across specialists and questions normalized from 0 to 100, was 71.39. For questions 2, 6, and 10, a score of 100 indicates "yes" and a score of 0 indicates "no." For question 5, the response "I have not reduced time" scores 0 and any other response scores 100.
- In the Data Utility Questionnaire, the aggregate score, obtained by averaging responses across specialists and questions, was 87 ± 16.
- In the System Usability Questionnaire, the overall score, obtained by averaging responses across specialists and questions, was 87.00.
- In the Patient Satisfaction Questionnaire, the overall score, calculated by averaging scores across patients and questions, was 70.77.
- Conclusions: The device proves highly effective, safe, and user-friendly for managing chronic dermatologic conditions. Positive feedback from specialists and patients underscores its potential as a valuable clinical tool. The device exhibits significant clinical relevance in dermatology, offering objective follow-up in skin evaluation. The device streamlines the diagnostic process, reducing clinician workload. While providing substantial benefits, it is emphasized that the tool should complement, not replace, clinical judgment. Overall, the device holds promise as a valuable clinical decision support tool for dermatologists.
- More information: For more project information, see the study protocol at R-TF-015-006 Clinical investigation plan LEGIT_COVIDX_EVCDAO_2022.
Project to enhance Dermatology E-Consultations in Primary Care centres using Artificial Intelligence Tools.
- Main objective: To validate that the information provided by the device increases the true accuracy of healthcare professionals (HCPs) in the diagnosis of multiple dermatological conditions.
- Secondary objectives:
- To validate what percentage of cases should be referred according to the HCPs with the information provided by the device.
- To validate what percentage of cases could be handled remotely with the information provided by the device.
- Design: Prospective observational analytical study of a longitudinal clinical case series.
- Sample size: 100 patients.
- Duration: 5 months.
- Results:
- Primary care doctors demonstrated an accuracy of 72.96%, which notably increased to 82.22% with the integration of Legit.Health.
- In assessing the impact of Legit.Health on referrals, our findings revealed that 48.89% of cases did not necessitate a referral.
- Furthermore, we examined the feasibility of handling cases remotely through teledermatology. The results show that 60.74% of the cases can be handled remotely.
- A strong association exists between referrals and remote consultations: 36.67% of the cases do not require referral and can be followed up remotely, 12.22% do not require referral but require an in-person appointment, 24.07% require referral and remote consultation, and 27.04% require a referral in addition to an in-person appointment.
- Conclusions: The medical device Legit.Health showed an improvement in the diagnostic accuracy in primary care, especially in conditions such as hidradenitis suppurativa or actinic keratosis. The implementation of these technologies can help improve remote patient management and reduce healthcare pressure.
- More information: For more project information, see the study protocol at R-TF-015-006 Clinical investigation plan LEGIT.HEALTH_PH_2024.
Multicenter pilot study of an artificial intelligence medical device for diagnostic support and severity assessment in primary care and dermatology.
- Main objective: To validate that the information provided by the device increases the true accuracy of healthcare professionals (HCPs) in the diagnosis of generalized pustular psoriasis (GPP).
- Secondary objectives: To validate that the information provided by the device increases the true accuracy of healthcare professionals (HCPs) in the diagnosis of other dermatological skin conditions, such as hidradenitis suppurativa.
- Design: Prospective cross-sectional study.
- Sample size: 15 healthcare professionals. Each participant was presented with a total of 100 cases or images to review.
- Duration: 4 months.
- Results:
- On average, diagnostic accuracy increased from 47.91% to 62.81%, a relative increase of 31%. For primary care doctors, the improvement was even more pronounced, with a 40% relative increase in correct diagnoses.
- Regarding generalized pustular psoriasis (GPP), there was an increase of 24.44% with the use of the medical device. This effect was even larger in primary care, with an increase of 120% in GPP diagnoses.
- For other conditions like hidradenitis suppurativa and palmoplantar pustulosis, primary care doctors correctly diagnosed 12.43% more cases, while dermatologists showed similar improvements with the device. For palmoplantar pustulosis, primary care doctors demonstrated an outstanding 146% increase, with dermatologists maintaining their performance.
- Conclusions: In conclusion, while the dermatologist's data were not statistically significant at the pathology level due to the small sample size (only four dermatologists compared to eleven in primary care), this was not a design flaw, as the focus of the study was on primary care. The high level of expertise among the dermatologists, particularly in hidradenitis suppurativa (HS), combined with the average complexity of HS cases, may explain why the tool was less impactful for them. However, this does not diminish the tool's potential usefulness for other dermatologists. Overall, Legit.Health had a substantial impact, particularly in primary care and for rare conditions like generalized pustular psoriasis (GPP), significantly improving diagnostic accuracy. This improvement may enhance referral appropriateness and reduce healthcare pressure by managing more cases effectively at the primary care level.
- More information: For more project information, see the study protocol at R-TF-015-006 Clinical investigation plan LEGIT.HEALTH_BI_2024.
Pilot study to evaluate the Performance of a Diagnostic Support Medical Device with Artificial Intelligence.
- Main objective: To validate that the information provided by the device increases the true accuracy of healthcare professionals (HCPs) in the diagnosis of multiple dermatological conditions.
- Secondary objectives:
- To validate what percentage of cases should be referred according to the HCPs with the information provided by the device.
- To validate what percentage of cases could be handled remotely with the information provided by the device.
- Design: Prospective cross-sectional study.
- Sample size: In this study, a total of 16 healthcare professionals (HCPs) participated, comprising 10 primary care doctors and 6 dermatologists. Among them, 12 completed the entire process, while the remaining 4 reviewed a partial number of images, specifically 28, 15, 9, and 1 respectively.
- Duration: 4 months.
- Results:
- The use of the medical device improved diagnostic accuracy from 68.08% to 88.78%, with primary care physicians seeing an increase from 62.90% to 89.92% and dermatologists from 76.47% to 86.93%. Significant improvements were observed in conditions like tinea, granuloma annulare, and seborrheic keratosis.
- In assessing the impact of Legit.Health on referrals, our findings revealed that 58.1% of cases did not necessitate a referral. However, this percentage varied slightly to 60.89% for primary care doctors and 53.59% for dermatologists.
- Experts agreed on remote management for acne, herpes, and tinea, while melanoma and nevus required in-person care.
- Additionally, 87% of healthcare professionals found the tool efficient, reducing consultation time to under 10 minutes.
- Conclusions: In conclusion, Legit.Health proved to be a valuable tool in enhancing diagnostic accuracy for both primary care physicians and dermatologists. The system was particularly effective in improving the management of conditions such as tinea, granuloma annulare, and seborrheic keratosis, and facilitated more accurate diagnoses across a wide range of skin conditions. Its use reduced the need for referrals, allowing a significant portion of cases to be managed remotely, which helped alleviate pressure on specialist care. Feedback from healthcare professionals highlighted its utility and efficiency, especially in supporting remote consultations and streamlining patient management. These findings suggest that Legit.Health can play a crucial role in improving diagnostic workflows and optimizing healthcare resources in dermatology.
- More information: For more project information, see the study protocol at R-TF-015-006 Clinical investigation plan LEGIT.HEALTH_SAN_2024.
Activity 3: Image recognition processor success metrics
Introduction
This activity is an analysis of the performance of the previous generation of the device, which is identical in terms of software components. All the reports generated by the device during its time in the market were reviewed and analysed.
The goal of this analysis is to understand the performance of the device and its evolution throughout different iterations of the device, as well as to detect and inspect any possible case of unsuccessful performance, confirm the safety and performance, identify possible product misuse and monitor emergent risks.
Methodology
Source data
To conduct this analysis, a total of 4,857 reports were available to review, out of which 3,708 came from images taken by patients and 1,149 by health care practitioners.
Each report contains a variety of data, but we focused on the following items:
- Input image
- List of predicted classes
- Confirmed diagnosis (the class selected by the healthcare practitioner)
- Malignancy and premalignancy scores
- Name of the healthcare organisation
- Name of the health care practitioner
- Body site depicted in the image
Metrics
The performance of the device was evaluated using the following metrics:
- Top-1, Top-3, and Top-5 accuracy: Top-1, Top-3, and Top-5 accuracy metrics collectively provide a nuanced understanding of the device's performance, revealing not just its precision in making the correct prediction outright, but also its capability to list the correct outcome among its most confident suggestions. This ensures a comprehensive assessment of the device's reliability and effectiveness in aiding medical practitioners in the diagnostic process.
- Top-5 error rate: to measure how often the top-5 outputs of the device do not include the expected class.
- Taxonomic Discrepancy Rate (TDR): describes the rate at which there are differences between the device's output and the practitioner's diagnosis due to taxonomic discrepancies.
- Area Under the Curve (AUC): to measure the malignancy suspicion performance. The AUC is derived from the Receiver Operating Characteristic (ROC) curve, which is a graphical representation of a diagnostic test's true positive rate (sensitivity) against its false positive rate (1 - specificity) across various threshold settings. An AUC value ranges from 0 to 1, where an AUC of 0.5 indicates no discriminative ability (akin to random guessing), and an AUC of 1.0 indicates perfect discriminative ability.
These metrics can be easily computed from the data available in the reports based on the classes predicted by the image recognition processor used at the time of the report and the corresponding confirmed class. We refer to these metrics as the accumulated metrics, since they don't come from a single processor but from the continuous iterations of the image recognition processor.
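
As an illustration of how the top-K metrics and the TDR listed above could be derived from the reports, the following is a minimal sketch. The `Report` structure, its field names and the example data are hypothetical, and the TDR is operationalised here as the share of confirmed classes that fall outside the set of classes the processor can output, which is an assumption about the definition given above rather than the exact procedure used for this report.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class Report:
    """Hypothetical minimal view of a report (field names are assumptions)."""
    predicted: List[str]  # predicted classes, most confident first
    confirmed: str        # class selected by the healthcare practitioner


def top_k_accuracy(reports: List[Report], k: int) -> float:
    """Share of reports whose confirmed class is among the first k predictions."""
    if not reports:
        raise ValueError("at least one confirmed report is required")
    hits = sum(1 for r in reports if r.confirmed in r.predicted[:k])
    return hits / len(reports)


def top_k_error_rate(reports: List[Report], k: int) -> float:
    """Complement of the top-k accuracy."""
    return 1.0 - top_k_accuracy(reports, k)


def taxonomic_discrepancy_rate(reports: List[Report], device_classes: Set[str]) -> float:
    """Assumed operationalisation: share of reports whose confirmed class
    is not one of the classes the processor could output."""
    if not reports:
        raise ValueError("at least one confirmed report is required")
    outside = sum(1 for r in reports if r.confirmed not in device_classes)
    return outside / len(reports)


# Synthetic example data, not taken from the reports analysed in this activity.
DEVICE_CLASSES = {
    "Acne", "Rosacea", "Psoriasis", "Eczema",
    "Nevus", "Melanoma", "Seborrheic keratosis", "Folliculitis",
}
EXAMPLE_REPORTS = [
    Report(["Acne", "Rosacea", "Psoriasis"], "Acne"),
    Report(["Psoriasis", "Eczema", "Acne"], "Acne"),
    Report(["Nevus", "Melanoma", "Seborrheic keratosis"], "Melanoma"),
    Report(["Acne", "Rosacea", "Folliculitis"], "Acne vulgaris"),
]

print(top_k_accuracy(EXAMPLE_REPORTS, 1))
print(top_k_accuracy(EXAMPLE_REPORTS, 3))
print(top_k_error_rate(EXAMPLE_REPORTS, 3))
print(taxonomic_discrepancy_rate(EXAMPLE_REPORTS, DEVICE_CLASSES))
```

In this synthetic example the fourth report can never count as a hit, because "Acne vulgaris" is not a class the processor outputs. This is the kind of mismatch the TDR is meant to capture.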
In order to compare the accumulated performance to that of the latest image recognition processor, all the images were processed to obtain the latest ICD class probability distributions and the malignancy and premalignancy probabilities. We refer to these as the latest metrics.
- Accumulated Metrics: Derived from past reports, these metrics aggregate the results of continuous iterations of the image recognition processor over time, using predicted and confirmed classes.
- Latest Metrics: These metrics result from reprocessing all images with the most current image recognition processor to obtain up-to-date probability distributions and assess its performance.
Known limitations
Before delving into the results, it's important to understand four limitations of this analysis:
1. We only used the reports that had confirmation
The percentage of reports with a confirmation from an HCP was 81.20% (3,944). We discarded the remaining 913 reports to conduct this analysis.
2. The 'confirmation' cannot be considered a gold standard
The data we are using as 'confirmed' has been determined by a wide range of practitioners in real-world settings. These HCPs range from primary care physicians to nurses, who are not experts in the disease, as well as dermatologists. Furthermore, we have no way of knowing how thorough their assessment was. This means that, when the device does not match the confirmation, there is a significant chance that the practitioner is wrong and the device is correct.
3. The Taxonomic Discrepancy Rate (TDR) is 42%
From the 3,944 reports that had confirmation from an HCP, it is of utmost importance to mention that there was a TDR of 0.42, which is very high.
TDR describes the rate at which there are differences between the device's output and the practitioner's diagnosis due to taxonomic discrepancies. For example, this can happen if the device outputs Acne, but the practitioner confirms the class Acne vulgaris or Steroid acne, which are not actually part of the device.
After inspecting the reports where the confirmed class was not in the list of predicted classes, we discovered that most of them were mismatches due to the taxonomy of the ICD classes of the image recognition processor at the time of the report.
We found reports of acne, rosacea, atopic dermatitis and psoriasis that present this behavior: these may all contribute to the low top-K metrics.
4. Image quality and image content-related issues
After inspecting the images of the incorrect reports we observed that a high percentage of images were not properly taken. For example, in many images the object of interest was too far away and not properly selected, thus reducing the usability of the image to generate a valid preliminary report. However, the overall visual quality of the images of the incorrect reports was acceptable. To assess visual quality, we used the device's integrated Dermatology Image Quality Assessment (DIQA) algorithm.
DIQA was developed in 2022, which means that all reports before that year could not be used to create the chart as they did not contain a DIQA score.
Unsurprisingly, there are no images with a DIQA score below 50%. This is because all the reports included in this analysis are the ones sent through a user interface that rejects images with low quality and prompts the user to re-take the image.
However, as further explained later on, there are still issues with images. Even if their quality is good or acceptable, the region of interest may be too small or occluded in the image, potentially reducing the accuracy of the device.
These images are examples of pigmented lesions that are too small compared to the total image size. When the images are resized to be analyzed by the image recognition processor, the relevant semantic content of the images becomes almost residual.
To correct this, we cropped the pigmented lesions from the images. This resulted in a total of 467 images to be analyzed. The analysis was done using the prediction of the latest version of the image recognition processor.
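
To make the size problem concrete, the sketch below computes how much of an image a lesion occupies and derives a crop box around it. It is only an illustration: the bounding box is assumed to come from a manual annotation, the 20% margin is an arbitrary choice, and the function names are hypothetical. The source does not state how the crops for this analysis were actually produced.

```python
from typing import Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels
Size = Tuple[int, int]           # (width, height) in pixels


def area_fraction(box: Box, image_size: Size) -> float:
    """Fraction of the image area covered by the lesion bounding box."""
    left, top, right, bottom = box
    width, height = image_size
    return ((right - left) * (bottom - top)) / (width * height)


def crop_box_with_margin(box: Box, image_size: Size, margin: float = 0.2) -> Box:
    """Expand the lesion box by a relative margin, clamped to the image borders."""
    left, top, right, bottom = box
    width, height = image_size
    pad_x = round((right - left) * margin)
    pad_y = round((bottom - top) * margin)
    return (
        max(0, left - pad_x),
        max(0, top - pad_y),
        min(width, right + pad_x),
        min(height, bottom + pad_y),
    )


# Synthetic example: a 200 x 200 px lesion in a 4000 x 3000 px photograph.
IMAGE_SIZE = (4000, 3000)
LESION_BOX = (1900, 1400, 2100, 1600)

print(area_fraction(LESION_BOX, IMAGE_SIZE))         # about 0.33% of the image
print(crop_box_with_margin(LESION_BOX, IMAGE_SIZE))  # box to crop before resizing
```

In this example the lesion covers roughly a third of one percent of the picture, so after downscaling the whole photograph very few pixels would describe the lesion. Cropping first keeps the lesion as the dominant content of the resized image.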
Summary of results
After making efforts to correct for taxonomic discrepancy and issues of quality and incorrect images, the analysis of the performance of the device yields the following results:
Metric name | Value |
---|---|
Top-1 accuracy | 0.5161 |
Top-3 accuracy | 0.6959 |
Top-5 accuracy | 0.7730 |
Top-5 error rate | 0.2270 |
These results are somewhat lower than the results garnered during the initial clinical evaluation of the device, but are still high enough to fulfill its intended purpose effectively.
Malignancy suspicion
Even considering the 42% Taxonomic Discrepancy Rate, the AUC metric for the malignancy suspicion is outstanding, as the following table shows.
Metric name | Value |
---|---|
Malignancy AUC | 0.9501 |
Taxonomic Discrepancy Rate | 0.1895 |
This is possibly due to the fact that the malignancy suspicion is an index, or an aggregated metric that sums the likelihood of ICD classes that are considered malignant, which makes it less susceptible to the impact of the high TDR.
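
The aggregation described above can be sketched as follows. This is a minimal illustration, assuming the index is the plain sum of the probabilities of the classes flagged as malignant; the class names, the probability values and the set of malignant classes are made up for the example.

```python
from typing import Dict, Set


def malignancy_suspicion(probabilities: Dict[str, float], malignant_classes: Set[str]) -> float:
    """Sum the probability mass assigned to classes considered malignant."""
    return sum(p for name, p in probabilities.items() if name in malignant_classes)


# Hypothetical set of malignant classes and a synthetic probability distribution.
MALIGNANT = {"Melanoma", "Basal cell carcinoma", "Squamous cell carcinoma"}
EXAMPLE_DISTRIBUTION = {
    "Nevus": 0.40,
    "Melanoma": 0.30,
    "Basal cell carcinoma": 0.15,
    "Seborrheic keratosis": 0.15,
}

print(malignancy_suspicion(EXAMPLE_DISTRIBUTION, MALIGNANT))  # close to 0.45
```

In this example the top-1 class is benign, yet the index still reflects the probability spread over two malignant classes. Because the sum does not depend on which specific malignant class is named, a disagreement about the exact class label has less effect on the index than on the top-K metrics.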
Evolution of malignancy suspicion
In order to assess the evolution of the image recognition processor, we grouped the reports by year and computed the metrics in each year frame:
Year | 2021 | 2022 | 2023 |
---|---|---|---|
Taxonomic Discrepancy Rate | 0.47 | 0.56 | 0.19 |
Malignancy AUC | 0.47 | 0.58 | 0.95 |
Data shows that there is an inverse correlation between the Taxonomic Discrepancy Rate and the AUC of the malignancy suspicion index, which is to be expected. Still, there is an impressive jump in accuracy in 2023.
The results before 2023 are not directly comparable with those of 2023. Prior to 2023, the malignancy probability included both malignant and premalignant ICD classes, whereas since 2023 malignancy scores only account for malignant ICD classes.
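
A per-year AUC of the kind shown in the table above could be computed as in the sketch below. It uses the rank interpretation of the AUC (the probability that a randomly chosen malignant case receives a higher score than a randomly chosen non-malignant one, with ties counted as one half). The tuple layout and the example values are synthetic assumptions, not data from the reports.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def auc(scores: List[float], labels: List[int]) -> float:
    """AUC as the share of (positive, negative) pairs ranked correctly."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


def auc_by_year(reports: List[Tuple[int, float, int]]) -> Dict[int, float]:
    """Group (year, malignancy score, confirmed malignant 0/1) tuples by year."""
    grouped = defaultdict(lambda: ([], []))
    for year, score, label in reports:
        grouped[year][0].append(score)
        grouped[year][1].append(label)
    return {year: auc(s, y) for year, (s, y) in sorted(grouped.items())}


# Synthetic example data.
EXAMPLE = [
    (2022, 0.9, 1), (2022, 0.2, 0), (2022, 0.4, 1), (2022, 0.6, 0),
    (2023, 0.8, 1), (2023, 0.1, 0), (2023, 0.7, 1), (2023, 0.3, 0),
]

print(auc_by_year(EXAMPLE))
```

The pairwise comparison is quadratic in the number of reports, which is acceptable for a few thousand reports. A sort-based implementation would be preferable for much larger datasets.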
Classification of pigmented lesions
We explored the accumulated performance of the image recognition processor on pigmented lesions by computing the metrics exclusively with the reports of pigmented lesions. The results suggest that the current image recognition processor is particularly suitable and reliable for specific ICD classes.
This can be attributed to the current heterogeneity of the image dataset used to train the image recognition processor, which includes a higher percentage of images of pigmented lesions than other ICD classes.
Metric name | Value (pigmented lesions) |
---|---|
Top-1 accuracy | 0.5846 |
Top-3 accuracy | 0.7752 |
Top-5 accuracy | 0.8094 |
Top-5 error rate | 0.1906 |
Taxonomic Discrepancy Rate | 0.4161 |
Metrics grouped by managing organisation
It is interesting to analyse performance metrics by grouping reports by the managing organisation they belong to.
To this end, some of the metrics were also computed per managing organisation and then averaged. We present both the average and the weighted average (using the number of reports of each managing organisation as weights) of each metric and the corresponding standard deviation, if applicable.
Method | Top-1 accuracy | Top-3 accuracy | Top-5 accuracy |
---|---|---|---|
Average | 0.4488 ± 0.3274 | 0.6283 ± 0.3574 | 0.6941 ± 0.3436 |
Weighted average | 0.2018 | 0.3534 | 0.4133 |
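
The difference between the two rows of the table above can be illustrated with a short sketch. The organisation names, accuracies and report counts are synthetic, and the use of the sample standard deviation is an assumption, since the source does not state which estimator was used.

```python
from statistics import mean, stdev
from typing import Dict, Tuple

# organisation -> (top-1 accuracy, number of reports); all values are synthetic
PerOrganisation = Dict[str, Tuple[float, int]]


def plain_average(per_org: PerOrganisation) -> Tuple[float, float]:
    """Unweighted mean and sample standard deviation across organisations."""
    values = [accuracy for accuracy, _ in per_org.values()]
    return mean(values), stdev(values)


def weighted_average(per_org: PerOrganisation) -> float:
    """Mean accuracy weighted by the number of reports of each organisation."""
    total = sum(count for _, count in per_org.values())
    if total == 0:
        raise ValueError("at least one report is required")
    return sum(accuracy * count for accuracy, count in per_org.values()) / total


EXAMPLE: PerOrganisation = {
    "org_a": (0.80, 10),
    "org_b": (0.60, 30),
    "org_c": (0.10, 160),
}

print(plain_average(EXAMPLE))     # every organisation counts equally
print(weighted_average(EXAMPLE))  # dominated by the largest organisation
```

In this example the unweighted mean is about 0.50 while the weighted mean is about 0.21, because the organisation with the most reports has the lowest accuracy. A similar mechanism is one possible reading of the gap between the two rows of the table.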
Metrics grouped by ICD class
For each ICD class, we reframed the multi-class recognition as a binary classification and computed the same classification metrics. The high variability (high standard deviation) suggests that the performance of the processor heavily depends on the ICD class captured in the input image.
Note that sensitivity is omitted because recall and sensitivity refer to the same metric.
Metric name | Value (mean and standard deviation) |
---|---|
Top-1 precision | 0.2117 ± 0.3152 |
Top-1 recall | 0.2452 ± 0.3446 |
Top-1 specificity | 0.9948 ± 0.0327 |
Top-3 precision | 0.1129 ± 0.1877 |
Top-3 recall | 0.3889 ± 0.1877 |
Top-3 specificity | 0.9865 ± 0.0452 |
Top-5 precision | 0.0884 ± 0.1422 |
Top-5 recall | 0.4996 ± 0.4305 |
Top-5 specificity | 0.9789 ± 0.0528 |
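
The one-vs-rest reframing used for the table above can be sketched as follows. The sketch assumes that, for a given ICD class and a given K, a report counts as predicted positive when the class appears among the first K predictions and as actually positive when it is the confirmed class. The `Report` structure and the data are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Report:
    """Hypothetical minimal view of a report (field names are assumptions)."""
    predicted: List[str]  # predicted classes, most confident first
    confirmed: str        # class selected by the healthcare practitioner


def one_vs_rest_metrics(reports: List[Report], target: str, k: int) -> Dict[str, Optional[float]]:
    """Binary top-k precision, recall and specificity for a single class."""
    tp = fp = fn = tn = 0
    for r in reports:
        predicted_positive = target in r.predicted[:k]
        actual_positive = r.confirmed == target
        if predicted_positive and actual_positive:
            tp += 1
        elif predicted_positive:
            fp += 1
        elif actual_positive:
            fn += 1
        else:
            tn += 1

    def ratio(numerator: int, denominator: int) -> Optional[float]:
        # None marks an undefined ratio (empty denominator).
        return numerator / denominator if denominator else None

    return {
        "precision": ratio(tp, tp + fp),
        "recall": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
    }


# Synthetic example data.
EXAMPLE_REPORTS = [
    Report(["Acne", "Rosacea", "Psoriasis"], "Acne"),
    Report(["Psoriasis", "Eczema", "Acne"], "Acne"),
    Report(["Nevus", "Melanoma", "Acne"], "Melanoma"),
    Report(["Rosacea", "Acne", "Eczema"], "Rosacea"),
]

print(one_vs_rest_metrics(EXAMPLE_REPORTS, "Acne", 1))
print(one_vs_rest_metrics(EXAMPLE_REPORTS, "Acne", 3))
```

Under this definition, increasing K can only add predicted positives, so recall cannot decrease while precision and specificity tend to fall. This matches the direction of the values in the table.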
Activity 4: Similar devices comparison
We searched the safety alert databases listed in the table below for incidents or alerts involving devices or methodologies similar to ours. In these databases, we searched for the following:
- The similar devices described: Dermengine, Fotofinder handyscope pro app, Skinscreener, Skinvision and Triage.
- The following keywords: artificial intelligence, dermatology, deep learning, medical imaging, computer vision, as specified in the R-TF-007-001 PMS plan.
Source of information | Link | Results | Analysis |
---|---|---|---|
FDA website MAUDE - Manufacturer and User Facility Device Experience Searchable database | http://www.accessdata.fda.gov/scripts/cdrh/cfdocs/cfMAUDE/search.CFM | 0 | |
FDA website Medical Device Recalls | http://www.accessdata.fda.gov/scripts/cdrh/cfdocs/cfRes/textsearch.cfm | 0 | |
German Federal Institute for Drugs and Medical Devices (Bundesinstitut für Arzneimittel und Medizinprodukte, BfArM) | https://www.bfarm.de/EN/Medical-devices/Tasks/Risk-assessment-and-research/Field-corrective-actions/_node.html | 25 | Results obtained using the keywords "medical imaging". The incidences found have no relation to our device |
Swissmedic Swiss Competent Authority | https://www.swissmedic.ch/swissmedic/en/home/medical-devices/fsca.html | 71 | Results obtained using the keywords "medical imaging". The incidences found have no relation to our device, they are mainly related to X-ray and IVD devices |
AEMPS Vigilancia de productos sanitarios | https://alertasps.aemps.es/alertasps/alertas | 0 | |
MHRA Adverse events reporting | https://www.gov.uk/drug-device-alerts | 15 | Results obtained using the keywords "artificial intelligence" and "deep learning". Although one result is related to a PACS, this one and the other incidences are not related to our device |
Ministero della Salute - Avvisi di sicurezza sui dispositivi medici | https://www.salute.gov.it/portale/news/p3_2_1.jsp?lingua=italiano&menu=notizie | 1 | Result obtained using the keyword "dermatology", but the incidence was related to cosmetics. |
Regarding the similar devices, only the Triage search yielded results, which are not included in the table because they are related to cardiac devices.
Evaluation of clinical data relating to similar devices
The table shows the results obtained from the literature search in relation to similar devices used in the skin structures field. Regarding the clinical data found in relation to facial palsy, they use real patient data to train some model, but there are no specific metrics on whether they improve outcomes over doctors because in this case the tasks they do are different.
Product name of equivalent /similar device | Results discussed | References used to get the results |
---|---|---|
Dermengine, Fotofinder handyscope pro app, Skinscreener, Skinvision and Triage | There were no issues related to the similar devices in any of these databases. This indicates that the equivalent devices are safe and perform as specified by the different manufacturers for the specific intended use. | Results from the Activity 4 of this PMCF report |
Fotofinder | Although their Moleanalyzer Pro tool just focused on moles, their findings suggest that dermatologists may improve their performance when they cooperate with the CNN (Convolutional Neural Network) and that a broader application of this human with machine approach could be beneficial for dermatologists and patients | https://jamanetwork.com/journals/jamadermatology/fullarticle/2804568 |
7-class skin disease recognition | This AI has been trained on skin-related data collected in hospitals from the Southwest of Ethiopia, Eastern Amhara, and Afar region. Final device works with an accuracy, precision, and sensitivity of above 97%, showing high safety and performance to be used as an assistive tool. | ID 019 - https://onlinelibrary.wiley.com/doi/10.1002/ski2.81 |
Acne severity assessment app | The software shows strong performance in acne severity assessment and is able to count and classify the different acne lesions with high precision. Its performance surpasses that of general practitioners and approaches that of more experienced dermatologists. However, since the app could be biased towards the Chinese-like population, data from other regions should be included in the learning system. | ID 021 - https://link.springer.com/article/10.1007/s10489-022-03774-z |
10-class cutaneous tumor recognition | This study shows that this AI can help dermatologists improve their lesion analysis performance. The improvement is larger for dermatologists with less experience. | ID 030 - https://link.springer.com/article/10.1007/s10489-022-03774-z |
mHealth app (CE-marked) | AI for the recognition of 40 skin diseases, trained on images of skin of colour from India. The app achieved a top-1 accuracy of 75% in clinical trials, a top-3 accuracy of 89%, and 0.90 AU, showing its viability as a clinical decision support tool. A subsequent independent study found that the app's sensitivity and specificity surpass those of general practitioners and approach those of dermatologists. The performance also decreases depending on the phone used, so image quality should be taken into account. | ID 053 - https://onlinelibrary.wiley.com/doi/10.1111/jdv.16967 ID 081 - https://karger.com/drm/article/238/4/649/828305/Validation-of-a-Market-Approved-Artificial |
174-class skin disease recognition | This study reveals that the AI model achieves a top-1 accuracy (47.6%) comparable to that of dermatologists (49.7%) and residents (47.7%) and superior to that of general practitioners (39.7%), showing promising capabilities as an aid for skin lesion recognition. | ID 085 - https://onlinelibrary.wiley.com/doi/10.1111/jdv.16979 |
Two applications able to recognize 47 lesion categories, which fulfill the CE criteria and are registered as medical products at the Austrian Federal Office for Safety in Health Care | A clinical trial in Austria reveals that these two apps present high sensitivity and specificity (94-96%), proving their performance and suitability for skin lesion analysis assistance. | ID 140 - https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0280670 |
Review of 272 clinical studies that include different AI solutions to facilitate the early diagnosis of skin cancers, especially in primary and community care settings. | This study reveals a high average accuracy for the recognition of melanoma (89%), squamous cell carcinoma (85%), basal cell carcinoma (87%), and malignancy estimation (88%). Although these numbers show the potential benefits of AI for skin lesion analysis (especially in primary care), some studies raise concerns related to the size, variability, and source of the studied population. | ID 143 - https://qmro.qmul.ac.uk/xmlui/handle/123456789/78876 |
Study of the benefits when using AI assistance for skin image analysis | The study involves 20 primary care physicians and 20 nurse practitioners with different levels of experience. When assisted by the AI, these practitioners increased their diagnosis agreement, demanded fewer biopsies and referrals, and increased their confidence. | ID 184 - https://jamanetwork.com/journals/jamanetworkopen/fullarticle/2779250 |
Activity 5: Feedback and complaint analysis
Activity 5 is an extensive examination of user feedback and reported issues related to the device. This activity is structured into two main parts: survey results analysis and complaints handling.
The comprehensive analysis of feedback and complaints reveals a strong positive trend in user satisfaction and the effectiveness of the device, with particular improvement noted from 2022 to 2023. Users from different healthcare specialties have expressed high levels of satisfaction, indicating the device's broad applicability and effectiveness. The resolution of customer complaints through targeted corrective and preventive actions demonstrates a commitment to continuous improvement and user support. However, the recurring platform access issues highlight a need for improved user education or system enhancements to facilitate smoother operation and enhance overall user experience. These insights are crucial for future device enhancements and ensuring sustained user satisfaction.
Survey results
2022
We garnered feedback exclusively from professionals within secondary healthcare.
- Performance: In 2022, users in the secondary healthcare specialty rated the application's performance highly, with an average score of 9, indicating that it met or exceeded their expectations for its intended use.
- Ease of use: Respondents found the application easy to use, with an average score of 8.
- Usefulness of information: Users rated the information provided by the application favorably, with an average score of 7, indicating that it was considered valuable for clinical decision-making.
- Remote consultations: About 5 users reported that the application had reduced the time of in-person consultations, suggesting that it had a positive impact on the efficiency of patient care.
- Time optimization: Users agreed on the device's capacity for time optimization, indicating that they found it helpful in managing their time in alignment with patient needs.
- Patient triage: Users found the tool useful for managing patients with different degrees of urgency or priority, with an average rating of 7, highlighting its role in patient prioritization.
- Speed: In terms of speed, the application received an average rating of 8, suggesting that users found it efficient in generating algorithm results.
- Diagnostic support: The tool was highly regarded for its diagnostic support, with an average score of 9, indicating that it was efficient in supporting clinical diagnoses.
- Patient state information: Users found that the application contributed to obtaining more information about the patient's state, which emphasizes its role in enhancing patient monitoring.
- Objectivity: Users reported an average score of 8, indicating that the application increased the objectivity in patient follow-up, potentially reducing subjectivity in clinical assessments.
- General satisfaction: Users were generally satisfied with the application, with an average rating of 9, indicating a high level of overall user satisfaction.
- Recommendation: Users were highly likely to recommend the service to other professionals, with an average rating of 9, suggesting strong confidence in the application's value.
2023
We received responses from health care professionals across both primary and secondary healthcare specialties.
- Performance: Users from both specialties provided positive ratings for the application's performance. The positive trend from 2022 continued in 2023, indicating consistent performance satisfaction.
- Ease of use: Users in 2023 found the application equally easy to use as in 2022, maintaining a high level of usability.
- Usefulness of information: Users in 2023 found the information provided by the application to be valuable, suggesting that the application continued to deliver relevant clinical data.
- Remote consultations: Users in 2023 reported that the application continued to reduce the time of in-person consultations, emphasizing its ongoing role in enhancing efficiency.
- Time optimization: Users from both specialties in 2023 felt that the application helped optimize their time, indicating that it was equally valuable for both groups of professionals.
- Patient triage: Users in 2023 found the tool useful for patient triage, confirming its ongoing role in patient prioritization.
- Speed: The application received positive feedback for speed in 2023, maintaining an efficient performance.
- Diagnostic support: Users in 2023 found the tool efficient as diagnostic support, reinforcing its role in clinical decision-making.
- Patient state information: The application continued to contribute to obtaining more information about the patient's state in 2023, suggesting ongoing enhancements in patient monitoring capabilities.
- Objectivity: In 2023, the application continued to increase objectivity in patient follow-up, highlighting its role in reducing subjectivity.
- General satisfaction: Users were consistently satisfied with the application in 2023, with satisfaction levels matching those from 2022.
- Recommendation: Users in 2023 were equally likely to recommend the service to other professionals, reinforcing the application's value and trustworthiness.
In summary, the survey results in 2023 show a consistent positive trend in user satisfaction and perceived benefits of the device, with users in both primary and secondary health care specialties reporting similar positive experiences. The application appears to have addressed some of the concerns raised in 2022, such as image quality and diagnostic capabilities, resulting in improved ratings and user recommendations. This suggests that ongoing development and enhancements have been successful in meeting user needs and expectations.