R-TF-028-003 Data Collection Instructions: Custom Gathered Data
Purpose and Scope
This document defines the systematic protocol for the collection of dermatological images and associated clinical metadata from custom data gathering activities conducted by or for AI Labs Group S.L. This protocol forms part of the data acquisition strategy for the development, validation, and continuous improvement of the AI algorithms integrated into Legit.Health Plus.
Custom gathered data serves as a critical complement to retrospective archive data, providing controlled, high-quality datasets from known clinical contexts with standardized acquisition protocols. These datasets are sourced from clinical validation studies of the medical device and from prospective data acquisition studies specifically designed for algorithm training and testing purposes.
Context and Rationale
The development of clinically safe, effective, and generalizable AI algorithms requires not only large-scale retrospective data but also carefully curated prospective data collected under controlled conditions [1-3]. Custom gathered data offers several key advantages:
- Acquisition Control: Standardized imaging protocols ensure consistent image quality, resolution, and metadata completeness.
- Reference Standard Quality: Direct access to clinical workflows enables the collection of robust clinical labels, including differential diagnoses and clinical confidence levels.
- Real-World Clinical Context: Data collected during actual clinical practice or under realistic clinical scenarios ensures ecological validity and reflects the intended use environment [4, 5].
- Regulatory Compliance: Custom data collection enables full control over informed consent, data protection, and ethical oversight, ensuring compliance with Regulation (EU) 2017/745 (MDR) and GDPR requirements.
- Performance Validation: Data from clinical validation studies provides an independent assessment of device performance in real-world conditions.
This approach ensures the creation of high-quality, well-characterized datasets that enhance model robustness, support regulatory requirements, and enable continuous device improvement.
Objectives
The primary objectives of this custom data collection protocol are:
- Controlled Data Acquisition: To collect dermatological images and metadata under standardized protocols that ensure high quality, completeness, and consistency.
- Clinical Validation Support: To gather data during clinical validation studies, enabling assessment of device performance in real-world clinical settings.
- Algorithm Enhancement: To acquire targeted datasets for specific clinical categories, patient demographics, or imaging conditions to address performance gaps or expand device capabilities.
- Reference Standard Establishment: To establish a robust clinical reference standard through expert clinical assessment, differential diagnoses, and, where applicable, histopathological confirmation [6].
- Regulatory Compliance: To execute all data collection activities in full compliance with ethical requirements (informed consent, IRB/CEIm approval), data protection regulations (GDPR), and quality management system procedures.
Data Sources and Study Types
Custom gathered data is collected through two primary mechanisms, with distinct intended uses for algorithm development:
Clinical Validation Studies
Data collected during clinical validation studies has a strictly limited use:
- Performance Validation: The sole purpose is to assess the clinical performance of the device in real-world settings by comparing device outputs to reference standard diagnoses.
- Independent Test/Validation Sets Only: Data from clinical validation studies is used exclusively for independent testing and validation of algorithm performance.
- Regulatory Compliance: Keeping this data out of algorithm training ensures unbiased performance assessment and meets regulatory expectations for independent validation datasets.
Clinical validation studies are conducted in accordance with the device's Clinical Evaluation Plan and comply with all applicable regulatory and ethical requirements.
Dedicated Prospective Data Acquisition Studies
Dedicated prospective data acquisition studies are observational studies designed specifically for data collection and conducted without intervention by the medical device. These studies:
- Are designed to capture specific types of data needed for algorithm development (e.g., rare diagnoses, specific skin types, particular imaging conditions).
- Yield data that may be used for algorithm training, validation, and testing as needed for algorithm development and improvement (flexible dataset use).
- Follow standardized acquisition protocols to ensure data quality and consistency.
- Require IRB/CEIm approval and informed consent from all participants.
- Do not involve any intervention, treatment, or modification of the patient's standard care pathway.
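The separation between the two study types can be enforced mechanically when datasets are assembled. The following is a minimal Python sketch under stated assumptions, not the actual pipeline of AI Labs Group S.L.: the names StudyType, DatasetSplit, ALLOWED_SPLITS, and check_split are hypothetical, and treating "validation" as a held-out split is an interpretation of the rules above.

```python
from enum import Enum


class StudyType(Enum):
    CLINICAL_VALIDATION = "clinical_validation"
    PROSPECTIVE_ACQUISITION = "prospective_acquisition"


class DatasetSplit(Enum):
    TRAIN = "train"
    VALIDATION = "validation"
    TEST = "test"


# Assumption: clinical validation study data may only feed held-out sets,
# while prospective acquisition data may feed any split.
ALLOWED_SPLITS = {
    StudyType.CLINICAL_VALIDATION: frozenset(
        {DatasetSplit.VALIDATION, DatasetSplit.TEST}
    ),
    StudyType.PROSPECTIVE_ACQUISITION: frozenset(DatasetSplit),
}


def check_split(study_type: StudyType, split: DatasetSplit) -> DatasetSplit:
    """Return the split if it is permitted for the study type, else raise."""
    if split not in ALLOWED_SPLITS[study_type]:
        raise ValueError(
            f"{study_type.value} data must not be used for the {split.value} split"
        )
    return split
```

Calling check_split(StudyType.CLINICAL_VALIDATION, DatasetSplit.TRAIN) raises ValueError, so a dataset builder that routes every case through this check cannot silently place clinical validation data in a training set.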
Data Population Characteristics
Recruitment Strategy
Participants are recruited from dermatology clinics and healthcare institutions where AI Labs Group S.L. conducts clinical validation or prospective data acquisition studies. Recruitment strategies vary by study type:
- Clinical Validation Studies: Consecutive enrollment ("all-comers") strategy to ensure representative patient populations and minimize selection bias [7].
- Prospective Data Acquisition Studies: May employ stratified or purposive sampling to ensure adequate representation of specific diagnostic categories, demographics, or clinical presentations.
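For stratified or purposive sampling, a study team needs to know which strata are still under-represented as enrollment proceeds. The sketch below illustrates one simple way to track this; the class StratumQuotas, the choice of Fitzpatrick phototype groups as strata, and the target counts are all hypothetical and would in practice come from the study-specific protocol.

```python
from collections import Counter
from typing import Dict


class StratumQuotas:
    """Track enrollment against per-stratum targets (illustrative only)."""

    def __init__(self, targets: Dict[str, int]) -> None:
        self.targets = dict(targets)
        self.enrolled: Counter = Counter()

    def record_enrollment(self, stratum: str) -> None:
        # Reject strata that the protocol did not define.
        if stratum not in self.targets:
            raise KeyError(f"unknown stratum: {stratum!r}")
        self.enrolled[stratum] += 1

    def shortfall(self) -> Dict[str, int]:
        """Return the number of cases still needed for each unmet stratum."""
        return {
            stratum: target - self.enrolled[stratum]
            for stratum, target in self.targets.items()
            if self.enrolled[stratum] < target
        }


# Hypothetical targets, far smaller than any real study would use.
quotas = StratumQuotas({"I-II": 2, "III-IV": 2, "V-VI": 2})
for phototype_group in ["I-II", "I-II", "III-IV"]:
    quotas.record_enrollment(phototype_group)
remaining = quotas.shortfall()  # {"III-IV": 1, "V-VI": 2}
```

This tracker only reports shortfalls; it does not turn participants away, which keeps it compatible with consecutive enrollment in clinical validation studies.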
Participating Institutions
Data collection is conducted at qualified healthcare institutions, including:
- Academic medical centers with dermatology departments.
- Community dermatology clinics with experienced dermatologists.
- Specialized skin disease clinics.
All participating institutions are selected based on:
- Availability of qualified dermatologists.
- Appropriate clinical infrastructure for image acquisition.
- Institutional capacity to comply with ethical and data protection requirements.
- Established collaboration agreements or data collection contracts with AI Labs Group S.L.
Ethical and Legal Considerations
All custom data collection activities are conducted in full compliance with ethical and legal requirements:
Ethical Approval
- All data collection protocols require prior approval from an Institutional Review Board (IRB) or Comité de Ética de la Investigación (CEIm) at the participating institution [8].
- The ethics application includes the study protocol, informed consent forms, data protection measures, and any amendments.
- Ethics approval documentation is maintained in the technical file for each data collection study.
Informed Consent
- All participants provide written informed consent prior to any data collection [9].
- The consent process ensures participants are fully informed about:
  - The purpose of the study (clinical validation or data collection).
  - The types of data being collected (images, clinical metadata).
  - How their de-identified data will be used (algorithm development, training, validation).
  - Their rights, including the right to withdraw consent.
- Consent forms are available in the participant's native language.
- Signed consent forms are securely stored and maintained according to retention requirements.
Data Protection and Privacy (GDPR Compliance)
- All data collection, processing, and storage activities comply with the EU General Data Protection Regulation (GDPR) [10].
- Data is de-identified at the point of collection:
  - Unique anonymized participant identifiers are assigned.
  - No personally identifiable information (PII) such as names, dates of birth, medical record numbers, or addresses is collected.
  - Image metadata (EXIF data) containing potential identifiers is stripped.
- Data transfer between institutions and AI Labs Group S.L. uses secure, encrypted channels.
- Access to identifiable data (during the active study phase) is restricted to authorized personnel under confidentiality agreements.
- Re-identification keys (linking anonymized IDs to participants) are maintained securely and separately from the research dataset, accessible only to the principal investigator at the collecting institution, and are not shared with AI Labs Group S.L.
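Two of the de-identification steps above, assigning participant identifiers and stripping EXIF metadata, can be sketched in Python using only the standard library. This is an illustration under stated assumptions rather than the procedure actually used: the function names, the "P-" identifier format, and the site_key_table structure are hypothetical, and the JPEG handling covers only the common case of length-prefixed header segments before the start-of-scan marker. A production pipeline would more likely rely on a vetted imaging library and verify the output.

```python
import secrets
import struct
from typing import Dict


def register_participant(site_key_table: Dict[str, str], local_reference: str) -> str:
    """Assign a random identifier and record the link in the site-held key table.

    The key table stays with the principal investigator at the collecting
    institution; only the returned identifier travels with the dataset.
    """
    while True:
        participant_id = "P-" + secrets.token_hex(8)  # not derived from identity
        if participant_id not in site_key_table:
            site_key_table[participant_id] = local_reference
            return participant_id


def strip_app1_segments(jpeg: bytes) -> bytes:
    """Return the JPEG stream with APP1 segments (EXIF, XMP) removed."""
    if jpeg[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG stream (missing SOI marker)")
    out = bytearray(b"\xff\xd8")
    pos = 2
    while pos + 4 <= len(jpeg):
        if jpeg[pos] != 0xFF:
            raise ValueError("corrupt JPEG: expected a segment marker")
        marker = jpeg[pos + 1]
        if marker == 0xDA:  # start of scan: the rest is compressed image data
            out += jpeg[pos:]
            return bytes(out)
        # The 2-byte big-endian length counts itself but not the marker.
        (length,) = struct.unpack(">H", jpeg[pos + 2 : pos + 4])
        if marker != 0xE1:  # keep everything except APP1
            out += jpeg[pos : pos + 2 + length]
        pos += 2 + length
    raise ValueError("corrupt JPEG: no start-of-scan marker found")
```

Because the identifier comes from a random source rather than from a hash of patient details, it cannot be recomputed from identifying information by anyone who lacks the key table.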
Data Processing Agreements
Where applicable, formal Data Processing Agreements (DPAs) are established between AI Labs Group S.L. and participating institutions, clearly defining roles, responsibilities, and data handling procedures.
Inclusion Criteria
Participants and data will be included if they meet all of the following criteria:
- Eligibility: Patients attending a dermatology clinic for assessment of a skin presentation.
- Age: Patients aged 18 years or older (or as specified in the study-specific protocol).
- Consent: Patients who are able and willing to provide written informed consent.
- Clinical Scope: Clinical presentations involving the epidermis, dermis, and associated cutaneous structures, consistent with the intended use of Legit.Health Plus.
- Image Quality: Images captured must meet minimum quality standards (resolution, focus, lighting) as defined in the acquisition protocol.
- Clinical Information: A confirmed diagnosis or clinical assessment by a qualified dermatologist must be available for each case.
Exclusion Criteria
Participants and data will be excluded if they meet any of the following criteria:
- Inability to Consent: Patients unable to provide informed consent due to cognitive impairment, language barriers where no interpretation is available, or other reasons.
- Refusal to Participate: Patients who decline to participate or withdraw consent.
- Poor Image Quality: Cases with extremely low visual quality due to patient factors (e.g., inability to remain still or to cooperate), anatomical factors, or technical issues.
- Out-of-Scope Presentations: Presentations outside the intended use of Legit.Health Plus (e.g., purely mucosal lesions, unless within device scope).
- Incomplete Data: Cases with missing critical metadata (clinical label, imaging modality, lesion location) that cannot be recovered.
- Missing Mandatory Demographics: Cases where age, sex, or Fitzpatrick skin phototype cannot be recorded after reasonable recovery efforts (see GP-028, Mandatory Demographic Metadata Policy).
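The inclusion and exclusion criteria above can be applied as a single screening step that lists every reason a case fails. The sketch below is a hedged illustration: the CaseRecord fields, the reason strings, and the example values are hypothetical, and judgments such as image quality and intended-use scope are assumed to have been made upstream and recorded as booleans.

```python
from dataclasses import dataclass, replace
from typing import List, Optional


@dataclass
class CaseRecord:
    has_written_consent: bool
    consent_withdrawn: bool
    in_intended_use_scope: bool
    image_meets_quality_standard: bool
    age_years: Optional[int] = None
    sex: Optional[str] = None
    fitzpatrick_phototype: Optional[str] = None
    clinical_label: Optional[str] = None
    imaging_modality: Optional[str] = None
    lesion_location: Optional[str] = None


def exclusion_reasons(case: CaseRecord, min_age_years: int = 18) -> List[str]:
    """Return all exclusion reasons; an empty list means the case is eligible."""
    reasons: List[str] = []
    if not case.has_written_consent or case.consent_withdrawn:
        reasons.append("no valid consent")
    if not case.in_intended_use_scope:
        reasons.append("out-of-scope presentation")
    if not case.image_meets_quality_standard:
        reasons.append("poor image quality")
    if None in (case.clinical_label, case.imaging_modality, case.lesion_location):
        reasons.append("incomplete data")
    if None in (case.age_years, case.sex, case.fitzpatrick_phototype):
        reasons.append("missing mandatory demographics")
    elif case.age_years < min_age_years:
        # The minimum age may be overridden by the study-specific protocol.
        reasons.append("below minimum age")
    return reasons


# Illustrative values only.
eligible = CaseRecord(True, False, True, True, 54, "F", "IV",
                      "psoriasis", "clinical", "forearm")
no_phototype = replace(eligible, fitzpatrick_phototype=None)
```

Returning every reason, rather than stopping at the first, makes it easier to see which exclusions might be reversed by the recovery efforts that the criteria call for.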