R-031-001-001 Acne Detection Datasets

Datasets Under Evaluation

Kaggle Acne Computer Vision

  • Platform: Kaggle
  • URL: kaggle.com/datasets/imtkaggleteam/acne-computer-vision
  • Stated license: Attribution 4.0 International (CC BY 4.0)
  • Original source: Same (primary upload)

Verified: License confirmed as CC-BY-4.0 on the Kaggle dataset page:

[Figure: Kaggle acne dataset page showing CC-BY-4.0 license (kaggle-acne-dataset.png)]

RoboFlow Acne Datasets

Multiple datasets hosted on RoboFlow, all stated as CC-BY-4.0 on the platform:

  1. universe.roboflow.com/chulalongkorn-university-vjyly/acne-jvorn
  2. universe.roboflow.com/finalyearproject-qe4w9/acne-hh7ag/
  3. universe.roboflow.com/mingeon-cha/acne-data-xgnmi/
  4. universe.roboflow.com/kritsakorn/acne-kbm0q
  5. universe.roboflow.com/testdg2/fullset-original-1
  6. universe.roboflow.com/sophia-bq8e6/skin-condition-5
  7. universe.roboflow.com/skripsi-t886i/jerawat-gksjj
  8. universe.roboflow.com/ellie-zscqn/dataset-cjpri (contains AcneSCU data)

License conflict identified: These RoboFlow datasets state CC-BY-4.0, but some contain images from the AcneSCU dataset, whose original GitHub repository (github.com/pingguokiller/acnedetection) states in the README that commercial use is not permitted. Per GP-031, the governing license is the most restrictive one in the provenance chain.
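The most-restrictive-license rule can be sketched as follows. This is a minimal illustration, not the GP-031 procedure itself: the `RESTRICTIVENESS` ranking and the `governing_license` helper are hypothetical names, and the ranking values are assumptions chosen only to order the license types mentioned in this record.

```python
# Hypothetical ranking: a higher value means more restrictive.
# The values are illustrative assumptions, not taken from GP-031.
RESTRICTIVENESS = {
    "Apache-2.0": 1,
    "CC-BY-4.0": 1,
    "non-commercial": 2,
    "unknown": 3,
}


def governing_license(provenance_chain):
    """Return the most restrictive license found in a provenance chain.

    Licenses missing from the ranking are treated as "unknown", which this
    sketch assumes to be the most restrictive case.
    """
    if not provenance_chain:
        return "unknown"
    normalized = [
        lic if lic in RESTRICTIVENESS else "unknown" for lic in provenance_chain
    ]
    return max(normalized, key=RESTRICTIVENESS.__getitem__)


# The RoboFlow upload states CC-BY-4.0, but the AcneSCU README restricts
# commercial use, so the non-commercial term governs.
chain = ["CC-BY-4.0", "non-commercial"]
print(governing_license(chain))
```

Treating an unrecognized license as the most restrictive case is a deliberately conservative design choice in this sketch.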

Step 1: License Check

Kaggle Acne Computer Vision

The Kaggle page confirms the license as Attribution 4.0 International (CC BY 4.0):

[Figure: Kaggle acne dataset page showing CC-BY-4.0 license (kaggle-acne-dataset.png)]

Governing license: CC-BY-4.0 (commercial use permitted).

Action: Provide attribution. No further copyright analysis needed.

RoboFlow Acne Datasets

The RoboFlow uploads state CC-BY-4.0 (commercial use permitted). However, tracing the provenance reveals:

  • Some datasets contain images from ACNE04, a published academic dataset.
  • Some contain images from AcneSCU (github.com/pingguokiller/acnedetection), which states in the README: "Please note that AcneSCU can only be used for non-commercial use. Commercial use is prohibited without the authors' permission."
  • Some contain images from DermNet and other dermatology atlases.
  • Some contain images of unknown origin (likely obtained with web scrapers).

Additional finding: The AcneSCU GitHub repository contains an Apache-2.0 LICENSE file, which permits commercial use.

[Figure: AcneSCU GitHub repo showing Apache-2.0 LICENSE and non-commercial README notice (acnescu-github-repo.png)]

The Apache-2.0 LICENSE file contradicts the non-commercial restriction in the README. The Apache-2.0 license is the formal legal instrument; the README statement is an informal, non-binding restriction that cannot override the license file. Nevertheless, as a conservative measure, we apply the TDM analysis in Step 2 as if the restriction were valid.

Governing license: Treated conservatively as non-commercial for datasets containing AcneSCU content, despite the Apache-2.0 LICENSE file.

Action: Proceed to Step 2 (TDM Opt-Out Check).

Step 2: TDM Opt-Out Check

Required for the RoboFlow datasets whose original source may restrict commercial use.

AcneSCU (Original Source: github.com/pingguokiller/acnedetection)

The text and data mining (TDM) opt-out check is performed against the original data source (GitHub), not the intermediary platform (RoboFlow). Under Art. 4(3) of Directive (EU) 2019/790 (the DSM Directive), the relevant opt-out must come from the rightsholder, not from a platform hosting re-uploaded content.

Check | Result | Evidence
robots.txt on github.com | No TDM/AI-specific blocks. Standard blocks for /*/raw/, /*/archive/ only. | See below
TDMRep headers | None found | Checked 2026-03-05
HTML meta tags | None signaling TDM reservation | Checked 2026-03-05
Terms of service | GitHub ToS do not constitute a machine-readable TDM opt-out | Checked 2026-03-05
Repository README | States "no commercial use" in human-readable text only | See below
Repository LICENSE file | Apache-2.0 (permits commercial use; contradicts README) | See below

[Figure: GitHub robots.txt showing no TDM-specific blocks (github-robots-txt.png)]

[Figure: AcneSCU GitHub repo showing Apache-2.0 LICENSE and non-commercial README notice (acnescu-github-repo.png)]

Conclusion: No machine-readable TDM opt-out was found on the original source. The human-readable "no commercial use" statement in the README is insufficient to constitute a valid TDM reservation under Art. 4(3) of the DSM Directive (confirmed by OLG Hamburg, 5 U 104/24, December 2025).

Legal basis: The TDM exception in Art. 4 of the DSM Directive permits mining this content for commercial purposes because (a) we had lawful access, and (b) no machine-readable TDM opt-out exists.
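The header and meta-tag checks in the table above can be sketched as follows. This is an illustration under stated assumptions, not the tooling actually used for this record: it assumes the `tdm-reservation` field name and the value `1` for "rights reserved" as described by the W3C TDM Reservation Protocol (TDMRep), and the helper names are hypothetical. It works on an already-captured response (headers and HTML body) and performs no network access.

```python
from html.parser import HTMLParser


class _MetaCollector(HTMLParser):
    """Collect <meta name=... content=...> pairs from an HTML document."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {key.lower(): (value or "") for key, value in attrs}
        name = attr.get("name", "").lower()
        if name:
            self.meta[name] = attr.get("content", "").strip()


def tdm_reservation_signals(headers, html):
    """Report whether a captured response carries a TDMRep reservation.

    Assumes the TDMRep convention that "tdm-reservation" set to "1" means
    TDM rights are reserved.
    """
    lowered = {key.lower(): value.strip() for key, value in headers.items()}
    collector = _MetaCollector()
    collector.feed(html)
    return {
        "http_header": lowered.get("tdm-reservation") == "1",
        "html_meta": collector.meta.get("tdm-reservation") == "1",
    }


# Hypothetical captured response with no reservation signals.
print(tdm_reservation_signals({"Content-Type": "text/html"}, "<html><head></head></html>"))
```

A result with both values `False` corresponds to the "None found" rows in the table; it does not replace archiving the raw response as evidence.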

RoboFlow Platform (Intermediary)

Check | Result | Evidence
robots.txt on universe.roboflow.com | Blocks GPTBot, ClaudeBot, Bytespider, TikTokSpider, Amazonbot, Ai2Bot-Dolma, meta-externalagent | See below
Image paths | Disallow: /*/*/images/ and Disallow: /*/*/dataset/*/download for all user agents | See below

[Figure: RoboFlow robots.txt blocking AI crawlers (roboflow-robots-txt.png)]

Analysis: RoboFlow blocks AI crawlers and restricts image/download paths for all bots. However, RoboFlow is the platform host, not the rightsholder of the underlying dataset content. Under Art. 4(3), the TDM reservation must come from the rightsholder. RoboFlow's robots.txt represents RoboFlow's platform policy, not the dataset creators' TDM opt-out. Furthermore, manual download of datasets via the platform's download interface (as a human user with lawful access) is distinct from automated crawling governed by robots.txt.

Conclusion: The TDM analysis for the RoboFlow datasets focuses on the original data sources (GitHub for AcneSCU), where no machine-readable opt-out exists. The RoboFlow platform restrictions do not constitute a rightsholder TDM opt-out for the underlying content.
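As a rough illustration of how per-crawler blocks in a robots.txt file can be read programmatically, the sketch below uses Python's standard-library `urllib.robotparser`. The robots.txt text, the URL, and the `ExampleResearchBot` agent are hypothetical and are not a copy of the archived RoboFlow file; `blocked_agents` is an illustrative helper name.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt excerpt for illustration only; it is NOT a copy of
# the archived RoboFlow file.
SAMPLE_ROBOTS_TXT = """\
User-agent: GPTBot
User-agent: ClaudeBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def blocked_agents(robots_txt, agents, url):
    """Return the user agents that the given robots.txt forbids from fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [agent for agent in agents if not parser.can_fetch(agent, url)]


url = "https://example.org/some-workspace/some-dataset"
print(blocked_agents(SAMPLE_ROBOTS_TXT, ["GPTBot", "ClaudeBot", "ExampleResearchBot"], url))
```

To our knowledge the standard-library parser applies simple prefix matching, so wildcard rules such as `Disallow: /*/*/images/` would need a different parser. Such a check only describes the platform's crawling policy; it does not change the rightsholder analysis above.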

Kaggle Platform

Check | Result | Evidence
robots.txt on kaggle.com | Returns 404 (no robots.txt) | See below

[Figure: Kaggle robots.txt returning 404 (kaggle-robots-txt-404.png)]

Conclusion: No robots.txt-based TDM opt-out was found on Kaggle.

Other Original Sources (ACNE04, DermNet, etc.)

Each original source must be checked individually for TDM opt-outs when the specific datasets are downloaded. The same methodology applies: check robots.txt, TDMRep headers, HTML meta tags, and archive all evidence with timestamps. The AI team is responsible for performing and documenting these checks at time of data collection.
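Archiving evidence with timestamps could look like the following minimal sketch. The manifest fields and the `evidence_entry` helper are hypothetical; this record does not prescribe a manifest format. A SHA-256 digest is added on the assumption that it helps show later that an archived file has not changed.

```python
import hashlib
import json
from datetime import datetime, timezone


def evidence_entry(filename, content, source_url, captured_at=None):
    """Build one manifest entry for an archived evidence file.

    captured_at defaults to the current UTC time when not supplied.
    """
    captured_at = captured_at or datetime.now(timezone.utc)
    return {
        "file": filename,
        "source_url": source_url,
        "captured_at": captured_at.isoformat(),
        "sha256": hashlib.sha256(content).hexdigest(),
    }


# Placeholder bytes stand in for the real screenshot content.
entry = evidence_entry(
    "github-robots-txt.png",
    b"<screenshot bytes>",
    "https://github.com/robots.txt",
    datetime(2026, 3, 5, tzinfo=timezone.utc),
)
print(json.dumps(entry, indent=2))
```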

Step 3: GDPR Assessment

  • Identifiable features (face, tattoos)? Acne datasets typically show facial or body skin. Facial images are directly identifying. Body-only images may be identifiable depending on context.
  • Embedded metadata? Must be stripped. EXIF data must be checked on download.
  • Anonymization status: Public dataset images without faces or distinguishing marks may be considered sufficiently anonymized. Facial images are personal data.
  • Legal basis for processing: For anonymized images: GDPR does not apply. For identifiable images: Art. 6(1)(f) legitimate interest + Art. 9(2)(j) scientific research, with appropriate safeguards under Art. 89(1).
  • DPIA required? If identifiable facial images are present, yes (refer to GP-052).

Additional GDPR finding for the Kaggle dataset: The file names in this dataset contain what appear to be real names and personal details of the data subjects (e.g., "Naveen-Kumar-is-a-Fruit-seller-from-Lajwana", "Nazreen-Khan-is-a-Homemaker-from-Bally-Census"). This constitutes personal data under GDPR Art. 4(1). These file names must be stripped or anonymized before any processing. This is a significant privacy concern that reinforces the need for EXIF stripping and file name anonymization.

Action items for the AI team:

  1. Strip all EXIF metadata from downloaded images.
  2. Rename all files to remove personal names and identifying information from file names (critical for the Kaggle dataset).
  3. Flag images containing faces or other identifying features.
  4. If identifiable images are present, initiate a DPIA under GP-052.
  5. Apply data minimization: exclude images with identifiable features where possible.
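Action item 2 (file name anonymization) can be sketched as follows. This is one possible approach, not necessarily the one the AI team used: each file is renamed to a name derived from a hash of its content, so nothing from the original name survives. The `anonymized_name` helper, the `img_` prefix, and the placeholder file name are hypothetical. EXIF stripping is not covered here and would need an image library.

```python
import hashlib
from pathlib import PurePosixPath


def anonymized_name(original_name, content):
    """Derive a neutral file name from the file content.

    Only the (lowercased) extension of the original name is kept; the rest
    of the new name comes from a SHA-256 digest of the content.
    """
    suffix = PurePosixPath(original_name).suffix.lower()
    digest = hashlib.sha256(content).hexdigest()[:16]
    return f"img_{digest}{suffix}"


# Placeholder name in the same shape as the problematic Kaggle file names.
original = "Firstname-Lastname-is-a-Occupation-from-Place.JPG"
print(anonymized_name(original, b"<image bytes>"))
```

If a table mapping old names to new names is retained, the result is pseudonymization rather than anonymization, so this sketch deliberately produces no such mapping.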

Step 4: MDR / AI Act Documentation

  • Source identification: Documented in this record (see Datasets Under Evaluation)
  • Legal basis: CC-BY-4.0 (Kaggle); TDM exception under Art. 4 of the DSM Directive (RoboFlow datasets with restrictive originals)
  • Population representativeness: To be assessed by the AI team during data preparation (GP-028)
  • Labelling methodology: To be documented during annotation (GP-028, R-TF-028-004)
  • Bias assessment: To be conducted during model design (GP-028)
  • Version control: To be maintained by the AI team
  • Retention policy: Copies retained as long as necessary for TDM purposes; secure storage required

Decision

Kaggle Acne Computer Vision

APPROVED for use in AI training, subject to:

  • Attribution to the original creator(s).
  • EXIF metadata stripping.
  • File name anonymization (file names contain personal data).
  • GDPR assessment of any identifiable images.

RoboFlow Acne Datasets

APPROVED for use in AI training under the EU TDM exception (Art. 4, DSM Directive), subject to:

  • The TDM opt-out checks documented above (no machine-readable opt-out found on the original sources).
  • Evidence of lawful access and absence of opt-out must be archived at the time of data collection for each original source not yet checked (ACNE04, DermNet, etc.).
  • Copies retained only as long as necessary for TDM purposes, stored securely.
  • Attribution provided where required by the underlying license.
  • EXIF metadata stripping.
  • File name anonymization where applicable.
  • GDPR assessment of any identifiable images.
  • If identifiable facial images are present, a DPIA under GP-052 must be completed before processing.

Verification Checklist

This section tracks the fulfillment of the conditions and obligations from this evaluation. Each item must be verified before the dataset is used in any model that will be deployed in the device.

# | Obligation | Responsible | Status | Evidence / Notes
1 | Evidence archived (screenshots, robots.txt) | JD-003 | Done | 5 evidence files in evidence/ folder: github-robots-txt.png, acnescu-github-repo.png, kaggle-acne-dataset.png, kaggle-robots-txt-404.png, roboflow-robots-txt.png
2 | EXIF metadata stripped from all images | JD-009 | Done | AI team confirmed stripping was performed
3 | File names anonymized (remove personal data from Kaggle filenames) | JD-009 | Done | AI team confirmed renaming was performed
4 | Images with faces flagged and assessed for GDPR | JD-009 | Done | AI team provided count of facial images and GDPR disposition
5 | DPIA initiated under GP-052 (if facial images are present) | JD-003 | Done | Only required if identifiable facial images are retained
6 | Dataset provenance recorded in technical file | JD-009 | Done | This record serves as the provenance documentation; linked from the technical file
7 | Population representativeness assessed | JD-009 | Done | Documented under GP-028
8 | Bias assessment conducted | JD-009 | Done | Documented under GP-028
9 | TDM checks for remaining sources (ACNE04, DermNet, etc.) | JD-009 | Done | TDM opt-out checks performed and documented for each original source following the methodology in this record
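The rule that every item must be verified before the dataset is used can be expressed as a simple gate. The sketch below is illustrative only: the function names and the dictionary layout are hypothetical, and the "Pending" row is invented to show the failing case (all rows in this record are marked Done).

```python
def unmet_obligations(checklist):
    """Return the obligations whose status is anything other than "Done"."""
    return [
        row["obligation"]
        for row in checklist
        if row["status"].strip().lower() != "done"
    ]


def cleared_for_use(checklist):
    """A dataset is cleared only if the checklist is non-empty and fully Done."""
    return len(checklist) > 0 and not unmet_obligations(checklist)


# Hypothetical excerpt; the "Pending" status is invented for illustration.
example = [
    {"obligation": "EXIF metadata stripped from all images", "status": "Done"},
    {"obligation": "Bias assessment conducted", "status": "Pending"},
]
print(unmet_obligations(example))
print(cleared_for_use(example))
```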

Evidence Archive

All evidence was captured on 2026-03-05. Files are stored in the evidence/ subdirectory of this record:

  • github-robots-txt.png: GitHub's robots.txt: no TDM-specific blocks for AI crawlers
  • acnescu-github-repo.png: AcneSCU repo showing Apache-2.0 LICENSE and non-commercial README notice
  • kaggle-acne-dataset.png: Kaggle dataset page confirming CC-BY-4.0 license
  • kaggle-robots-txt-404.png: Kaggle robots.txt returns 404 (no robots.txt exists)
  • roboflow-robots-txt.png: RoboFlow robots.txt blocking AI crawlers (platform policy, not rightsholder opt-out)

Signature meaning

The signatures for the approval process of this document can be found in the verified commits at the repository for the QMS. As a reference, the team members who are expected to participate in this document and their roles in the approval process, as defined in Annex I Responsibility Matrix of the GP-001, are:

  • Author: JD-003 Design & Development Manager
  • Reviewer: JD-004 Quality Manager & PRRC
  • Approver: JD-001 General Manager
All the information contained in this QMS is confidential. The recipient agrees not to transmit or reproduce the information, neither by himself nor by third parties, through whichever means, without obtaining the prior written permission of Legit.Health (AI Labs Group S.L.)