R-031-001-001 Acne Detection Datasets
Datasets Under Evaluation
Kaggle Acne Computer Vision
- Platform: Kaggle
- URL: kaggle.com/datasets/imtkaggleteam/acne-computer-vision
- Stated license: Attribution 4.0 International (CC BY 4.0)
- Original source: Same (primary upload)
Verified: License confirmed as CC-BY-4.0 on the Kaggle dataset page (evidence: kaggle-acne-dataset.png).

RoboFlow Acne Datasets
Multiple datasets hosted on RoboFlow, all stated as CC-BY-4.0 on the platform:
- universe.roboflow.com/chulalongkorn-university-vjyly/acne-jvorn
- universe.roboflow.com/finalyearproject-qe4w9/acne-hh7ag/
- universe.roboflow.com/mingeon-cha/acne-data-xgnmi/
- universe.roboflow.com/kritsakorn/acne-kbm0q
- universe.roboflow.com/testdg2/fullset-original-1
- universe.roboflow.com/sophia-bq8e6/skin-condition-5
- universe.roboflow.com/skripsi-t886i/jerawat-gksjj
- universe.roboflow.com/ellie-zscqn/dataset-cjpri (contains AcneSCU data)
License conflict identified: These RoboFlow datasets state CC-BY-4.0, but some contain images from the AcneSCU dataset, whose original GitHub repository (github.com/pingguokiller/acnedetection) states in the README that commercial use is not permitted. Per GP-031, the governing license is the most restrictive one in the provenance chain.
Step 1: License Check
Kaggle Acne Computer Vision
The Kaggle page confirms the license as CC BY 4.0 (Attribution 4.0 International); see kaggle-acne-dataset.png.

Governing license: CC-BY-4.0 (commercial use permitted).
Action: Provide attribution. No further copyright analysis needed.
RoboFlow Acne Datasets
The RoboFlow uploads state CC-BY-4.0 (commercial use permitted). However, tracing the provenance reveals:
- Some datasets contain images from ACNE04, a published academic dataset.
- Some contain images from AcneSCU (github.com/pingguokiller/acnedetection), which states in the README: "Please note that AcneSCU can only be used for non-commercial use. Commercial use is prohibited without the authors' permission."
- Some contain images from DermNet and other dermatology atlases.
- Some contain images of unknown origin (likely obtained with web scrapers).
Additional finding: The AcneSCU GitHub repository contains an Apache-2.0 LICENSE file, which permits commercial use.
This contradicts the non-commercial restriction in the README. The Apache-2.0 license is the formal legal instrument; the README statement is an informal, non-binding restriction that cannot override the license file. Nevertheless, as a conservative measure, we apply the TDM analysis in Step 2 as if the restriction were valid.
Governing license: Treated conservatively as non-commercial for datasets containing AcneSCU content, despite the Apache-2.0 LICENSE file.
Action: Proceed to Step 2 (TDM Opt-Out Check).
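The GP-031 rule (the most restrictive license in the provenance chain governs) and the conservative reading applied to AcneSCU can be expressed as a small ordering problem. The sketch below is illustrative only: the three-level `Restriction` scale and the `governing_restriction` helper are hypothetical simplifications, not categories defined in GP-031, and mapping each real license text onto the scale remains a manual legal judgment.

```python
from enum import IntEnum
from typing import Iterable, List, Tuple


class Restriction(IntEnum):
    """Hypothetical ordinal scale: a higher value means more restrictive terms."""
    COMMERCIAL_OK = 0   # e.g. CC-BY-4.0, Apache-2.0
    NON_COMMERCIAL = 1  # e.g. a "non-commercial use only" notice
    NO_USE = 2          # e.g. no license and no permission


def governing_restriction(
    chain: Iterable[Tuple[str, Restriction]],
) -> Tuple[str, Restriction]:
    """Return the (source, restriction) entry with the most restrictive terms.

    max() keeps the first of several equally restrictive entries.
    """
    entries: List[Tuple[str, Restriction]] = list(chain)
    if not entries:
        raise ValueError("provenance chain is empty")
    return max(entries, key=lambda entry: entry[1])


# Provenance chain for a RoboFlow upload containing AcneSCU images,
# with the README notice read conservatively (as in Step 1).
chain = [
    ("RoboFlow upload (stated CC-BY-4.0)", Restriction.COMMERCIAL_OK),
    ("AcneSCU LICENSE file (Apache-2.0)", Restriction.COMMERCIAL_OK),
    ("AcneSCU README notice", Restriction.NON_COMMERCIAL),
]
source, level = governing_restriction(chain)
print(source, level.name)  # AcneSCU README notice NON_COMMERCIAL
```

Reading the README as permissive instead (moving it to `COMMERCIAL_OK`) would make the stated CC-BY-4.0 entry govern, which is why the conservative reading is recorded explicitly.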
Step 2: TDM Opt-Out Check
Required for the RoboFlow datasets whose original source may restrict commercial use.
AcneSCU (Original Source: github.com/pingguokiller/acnedetection)
The TDM opt-out check is performed against the original data source (GitHub), not the intermediary platform (RoboFlow). Under Art. 4(3) of the DSM Directive, the relevant opt-out must come from the rightsholder, not a platform hosting re-uploaded content.
| Check | Result | Evidence |
|---|---|---|
| robots.txt on github.com | No TDM/AI-specific blocks. Standard blocks for /*/raw/, /*/archive/ only. | See below |
| TDMRep headers | None found | Checked 2026-03-05 |
| HTML meta tags | None signaling TDM reservation | Checked 2026-03-05 |
| Terms of service | GitHub ToS do not constitute a machine-readable TDM opt-out | Checked 2026-03-05 |
| Repository README | States "no commercial use" in human-readable text only | See below |
| Repository LICENSE file | Apache-2.0 (permits commercial use; contradicts README) | See below |


Conclusion: No machine-readable TDM opt-out was found on the original source. The human-readable "no commercial use" statement in the README is insufficient to constitute a valid TDM reservation under Art. 4(3) of the DSM Directive (confirmed by OLG Hamburg, 5 U 104/24, December 2025).
Legal basis: The Art. 4 TDM exception (DSM Directive) permits mining this content for commercial purposes because (a) we have lawful access and (b) no machine-readable TDM opt-out exists.
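The TDMRep and meta-tag rows of the table above can be re-evaluated from the archived responses with a short script. This is a minimal sketch, assuming the signal names from the W3C Community Group TDM Reservation Protocol (TDMRep): a `tdm-reservation: 1` HTTP header, or a `<meta name="tdm-reservation" content="1">` tag in the HTML, optionally accompanied by a `tdm-policy` pointer. The helper names and the return shape are hypothetical, and the sketch does not fetch the `/.well-known/tdmrep.json` file that TDMRep also defines. A human-readable notice such as the AcneSCU README is, by design, invisible to it.

```python
from html.parser import HTMLParser
from typing import Dict, Optional

TDM_NAMES = ("tdm-reservation", "tdm-policy")


class TdmMetaParser(HTMLParser):
    """Collects the content of <meta name="tdm-reservation"|"tdm-policy"> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.found: Dict[str, str] = {}

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names; values may be None.
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        meta_name = attr.get("name", "").lower()
        if meta_name in TDM_NAMES:
            self.found[meta_name] = attr.get("content", "").strip()


def tdm_signals(headers: Dict[str, str], html: Optional[str] = None) -> Dict[str, object]:
    """Summarize TDMRep opt-out signals in captured response headers and HTML."""
    # HTTP header names are case-insensitive, so normalize before looking up.
    lowered = {name.lower(): value.strip() for name, value in headers.items()}
    meta: Dict[str, str] = {}
    if html is not None:
        parser = TdmMetaParser()
        parser.feed(html)
        meta = parser.found
    return {
        "reserved": lowered.get("tdm-reservation") == "1"
        or meta.get("tdm-reservation") == "1",
        "header": lowered.get("tdm-reservation"),
        "meta": meta.get("tdm-reservation"),
        "policy": lowered.get("tdm-policy") or meta.get("tdm-policy"),
    }


headers = {"Content-Type": "text/html; charset=utf-8"}
page = '<html><head><meta name="description" content="repo"></head></html>'
print(tdm_signals(headers, page))
# {'reserved': False, 'header': None, 'meta': None, 'policy': None}
```

Working on captured headers and HTML rather than live requests means the result can be reproduced from the same files that are archived as evidence.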
RoboFlow Platform (Intermediary)
| Check | Result | Evidence |
|---|---|---|
| robots.txt on universe.roboflow.com | Blocks GPTBot, ClaudeBot, Bytespider, TikTokSpider, Amazonbot, Ai2Bot-Dolma, meta-externalagent | See below |
| Image paths | Disallow: /*/*/images/ and Disallow: /*/*/dataset/*/download for all user agents | See below |

Analysis: RoboFlow blocks AI crawlers and restricts image/download paths for all bots. However, RoboFlow is the platform host, not the rightsholder of the underlying dataset content. Under Art. 4(3), the TDM reservation must come from the rightsholder. RoboFlow's robots.txt represents RoboFlow's platform policy, not the dataset creators' TDM opt-out. Furthermore, manual download of datasets via the platform's download interface (as a human user with lawful access) is distinct from automated crawling governed by robots.txt.
Conclusion: The TDM analysis for the RoboFlow datasets focuses on the original data sources (GitHub for AcneSCU), where no machine-readable opt-out exists. The RoboFlow platform restrictions do not constitute a rightsholder TDM opt-out for the underlying content.
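Which crawlers a robots.txt blocks can be checked with Python's standard `urllib.robotparser`, as sketched below. The robots.txt text is an illustrative sample modelled on the rules summarized in the RoboFlow table, not a verbatim copy of RoboFlow's file, and `blocked_agents` is a hypothetical helper. One limitation matters here: `urllib.robotparser` matches paths by plain prefix and does not expand `*` wildcards inside paths as RFC 9309 crawlers do, so path rules such as `Disallow: /*/*/images/` would need a different parser; the sketch only covers agent-level blocks.

```python
from typing import Iterable, List
from urllib.robotparser import RobotFileParser


def blocked_agents(robots_txt: str, agents: Iterable[str], url: str) -> List[str]:
    """Return the user agents that robots.txt does not allow to fetch `url`."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines and marks it as read,
    # so can_fetch() works without any network access.
    parser.parse(robots_txt.splitlines())
    return [agent for agent in agents if not parser.can_fetch(agent, url)]


# Illustrative robots.txt modelled on the rules summarized above;
# this is NOT a verbatim copy of RoboFlow's file.
SAMPLE_ROBOTS = """\
User-agent: GPTBot
User-agent: ClaudeBot
Disallow: /

User-agent: *
Disallow: /private/
"""

agents = ["GPTBot", "ClaudeBot", "Mozilla/5.0"]
url = "https://universe.roboflow.com/example-workspace/example-project"  # hypothetical path
print(blocked_agents(SAMPLE_ROBOTS, agents, url))  # ['GPTBot', 'ClaudeBot']
```

Passing an empty string models a site without a robots.txt (the Kaggle case): no agent is blocked.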
Kaggle Platform
| Check | Result | Evidence |
|---|---|---|
| robots.txt on kaggle.com | Returns 404 (no robots.txt) | See below |

Conclusion: No TDM opt-out on Kaggle.
Other Original Sources (ACNE04, DermNet, etc.)
Each original source must be checked individually for TDM opt-outs when the specific datasets are downloaded. The same methodology applies: check robots.txt, TDMRep headers, HTML meta tags, and archive all evidence with timestamps. The AI team is responsible for performing and documenting these checks at time of data collection.
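Archiving each check with a timestamp can be made tamper-evident by recording a UTC capture time together with a SHA-256 digest of the captured bytes (for example, a saved robots.txt or one of the PNG screenshots). The record layout and the `evidence_record` helper below are assumptions for illustration, not a format prescribed by this record; the sample bytes and the time of day are placeholders.

```python
import hashlib
import json
from datetime import datetime, timezone
from typing import Dict, Optional


def evidence_record(source_url: str, check: str, content: bytes,
                    captured_at: Optional[datetime] = None) -> Dict[str, str]:
    """Build a JSON-serializable record for one captured piece of evidence.

    `captured_at` should be timezone-aware; it defaults to the current UTC time.
    The SHA-256 digest lets a later reviewer confirm that the archived bytes
    (e.g. a saved robots.txt or a screenshot) are unchanged since capture.
    """
    when = captured_at or datetime.now(timezone.utc)
    return {
        "source_url": source_url,
        "check": check,
        "captured_at_utc": when.astimezone(timezone.utc).isoformat(timespec="seconds"),
        "sha256": hashlib.sha256(content).hexdigest(),
    }


record = evidence_record(
    "https://github.com/robots.txt",
    "robots.txt",
    b"User-agent: *\n",  # placeholder bytes, not GitHub's actual file
    captured_at=datetime(2026, 3, 5, tzinfo=timezone.utc),  # time of day is a placeholder
)
print(json.dumps(record, indent=2))
```

Storing one such record next to each file in the evidence/ folder would let a reviewer re-hash the file and confirm it matches what was captured.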
Step 3: GDPR Assessment
- Identifiable features (face, tattoos)? Acne datasets typically show facial or body skin. Facial images are directly identifying. Body-only images may be identifiable depending on context.
- Embedded metadata? Must be stripped. EXIF data must be checked on download.
- Anonymization status: Public dataset images without faces or distinguishing marks may be considered sufficiently anonymized. Facial images are personal data.
- Legal basis for processing: For anonymized images: GDPR does not apply. For identifiable images: Art. 6(1)(f) legitimate interest + Art. 9(2)(j) scientific research, with appropriate safeguards under Art. 89(1).
- DPIA required? If identifiable facial images are present, yes (refer to GP-052).
Additional GDPR finding for the Kaggle dataset: The file names in this dataset contain what appear to be real names and personal details of the data subjects (e.g., "Naveen-Kumar-is-a-Fruit-seller-from-Lajwana", "Nazreen-Khan-is-a-Homemaker-from-Bally-Census"). This constitutes personal data under GDPR Art. 4(1). These file names must be stripped or anonymized before any processing: like EXIF metadata, they travel with every copy of the image, so both must be removed.
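One way to anonymize the file names is to derive each new name only from the image content, so nothing from the original name survives. The sketch below assumes a flat source directory and uses a truncated SHA-256 of the file bytes; the `img_` prefix, the 16-character truncation and both helper names are hypothetical choices, and the example file names are invented rather than taken from the dataset.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Dict


def anonymized_name(original: Path, content: bytes) -> str:
    """New file name derived only from the image bytes, never from the old name."""
    digest = hashlib.sha256(content).hexdigest()[:16]
    return f"img_{digest}{original.suffix.lower()}"


def anonymize_directory(src: Path, dst: Path) -> Dict[str, int]:
    """Copy each file in `src` to `dst` under a content-derived name.

    No old-to-new name mapping is written, so the personal data in the
    original file names is not carried forward.
    """
    dst.mkdir(parents=True, exist_ok=True)
    copied = duplicates = 0
    for path in sorted(p for p in src.iterdir() if p.is_file()):
        data = path.read_bytes()
        target = dst / anonymized_name(path, data)
        if target.exists():
            duplicates += 1  # byte-identical file already copied
            continue
        target.write_bytes(data)
        copied += 1
    return {"copied": copied, "duplicates": duplicates}


# Invented file names in the same style as the ones found in the dataset.
with tempfile.TemporaryDirectory() as tmp:
    src, dst = Path(tmp, "raw"), Path(tmp, "clean")
    src.mkdir()
    (src / "Jane-Doe-is-a-Teacher-from-Springfield.JPG").write_bytes(b"image-a")
    (src / "John-Roe-is-a-Driver-from-Shelbyville.jpg").write_bytes(b"image-b")
    (src / "copy-of-Jane-Doe.jpg").write_bytes(b"image-a")
    counts = anonymize_directory(src, dst)
    new_names = sorted(p.name for p in dst.iterdir())
print(counts)  # {'copied': 2, 'duplicates': 1}
```

Hashing the content rather than the old name avoids names that could be reversed by guessing, and byte-identical duplicates collapse to one file. If traceability back to the original download is needed, an old-to-new mapping would itself contain the personal data in the old names and would need the same safeguards, since pseudonymised data remains personal data under the GDPR.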
Action items for the AI team:
- Strip all EXIF metadata from downloaded images.
- Rename all files to remove personal names and identifying information from file names (critical for the Kaggle dataset).
- Flag images containing faces or other identifying features.
- If identifiable images are present, initiate a DPIA under GP-052.
- Apply data minimization: exclude images with identifiable features where possible.
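For JPEG files, EXIF and similar metadata live in dedicated marker segments, so they can be removed without re-encoding the image. The sketch below is a hand-rolled, standard-library illustration under stated assumptions: it drops APP1 (Exif, XMP), APP13 (Photoshop/IPTC) and COM segments, keeps everything else (including APP0/JFIF and any APP2 ICC colour profile), and stops parsing at the first start-of-scan marker, so metadata placed after it (rare) is not removed. It does not handle PNG, HEIC or other formats, and in production a maintained imaging library is the safer choice. Removing EXIF also removes the Orientation tag, so images that relied on it may display rotated.

```python
import struct

# Segments that commonly carry metadata: APP1 (Exif, XMP),
# APP13 (Photoshop/IPTC) and COM (free-text comment).
METADATA_MARKERS = {0xE1, 0xED, 0xFE}
# Markers that stand alone without a length field.
STANDALONE_MARKERS = {0x01, *range(0xD0, 0xD8)}


def strip_jpeg_metadata(data: bytes) -> bytes:
    """Return a copy of a JPEG with APP1, APP13 and COM segments removed."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG (missing SOI marker)")
    out = bytearray(data[:2])
    pos = 2
    while pos + 1 < len(data):
        if data[pos] != 0xFF:
            raise ValueError(f"expected a marker at offset {pos}")
        marker = data[pos + 1]
        if marker == 0xFF:  # fill byte before the real marker
            pos += 1
            continue
        if marker in (0xDA, 0xD9):
            # Start of scan (or end of image): the rest is image data,
            # copied verbatim without further parsing.
            out += data[pos:]
            return bytes(out)
        if marker in STANDALONE_MARKERS:
            out += data[pos:pos + 2]
            pos += 2
            continue
        # The two-byte big-endian length includes itself but not the marker.
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        end = pos + 2 + length
        if marker not in METADATA_MARKERS:
            out += data[pos:end]
        pos = end
    raise ValueError("truncated JPEG (no start-of-scan marker found)")


# Minimal synthetic JPEG structure (valid segment layout, not a decodable image).
app0 = b"\xff\xe0" + struct.pack(">H", 16) + b"JFIF\x00" + bytes(9)
app1 = b"\xff\xe1" + struct.pack(">H", 8) + b"Exif\x00\x00"
scan = b"\xff\xda" + struct.pack(">H", 2) + b"\x12\x34" + b"\xff\xd9"
sample = b"\xff\xd8" + app0 + app1 + scan
cleaned = strip_jpeg_metadata(sample)
print(len(sample), len(cleaned))  # 38 28
```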
Step 4: MDR / AI Act Documentation
- Source identification: Documented in this record (see Datasets Under Evaluation)
- Legal basis: CC-BY-4.0 (Kaggle); Art. 4 TDM (RoboFlow datasets with restrictive originals)
- Population representativeness: To be assessed by the AI team during data preparation (GP-028)
- Labelling methodology: To be documented during annotation (GP-028, R-TF-028-004)
- Bias assessment: To be conducted during model design (GP-028)
- Version control: To be maintained by the AI team
- Retention policy: Copies retained as long as necessary for TDM purposes; secure storage required
Decision
Kaggle Acne Computer Vision
APPROVED for use in AI training, subject to:
- Attribution to the original creator(s).
- EXIF metadata stripping.
- File name anonymization (file names contain personal data).
- GDPR assessment of any identifiable images.
RoboFlow Acne Datasets
APPROVED for use in AI training under the EU TDM exception (Art. 4, DSM Directive), subject to:
- The TDM opt-out checks documented above (no machine-readable opt-out found on the original sources).
- Evidence of lawful access and absence of opt-out must be archived at the time of data collection for each original source not yet checked (ACNE04, DermNet, etc.).
- Copies retained only as long as necessary for TDM purposes, stored securely.
- Attribution provided where required by the underlying license.
- EXIF metadata stripping.
- File name anonymization where applicable.
- GDPR assessment of any identifiable images.
- If identifiable facial images are present, a DPIA under GP-052 must be completed before processing.
Verification Checklist
This section tracks the fulfillment of the conditions and obligations from this evaluation. Each item must be verified before the dataset is used in any model that will be deployed in the device.
| # | Obligation | Responsible | Status | Evidence / Notes |
|---|---|---|---|---|
| 1 | Evidence archived (screenshots, robots.txt) | JD-003 | Done | 5 evidence files in evidence/ folder: github-robots-txt.png, acnescu-github-repo.png, kaggle-acne-dataset.png, kaggle-robots-txt-404.png, roboflow-robots-txt.png |
| 2 | EXIF metadata stripped from all images | JD-009 | Done | AI team confirmed stripping was performed |
| 3 | File names anonymized (remove personal data from Kaggle filenames) | JD-009 | Done | AI team confirmed renaming was performed |
| 4 | Images with faces flagged and assessed for GDPR | JD-009 | Done | AI team provided count of facial images and GDPR disposition |
| 5 | DPIA initiated under GP-052 (if facial images are present) | JD-003 | Done | Only required if identifiable facial images are retained |
| 6 | Dataset provenance recorded in technical file | JD-009 | Done | This record serves as the provenance documentation; linked from the technical file |
| 7 | Population representativeness assessed | JD-009 | Done | Documented under GP-028 |
| 8 | Bias assessment conducted | JD-009 | Done | Documented under GP-028 |
| 9 | TDM checks for remaining sources (ACNE04, DermNet, etc.) | JD-009 | Done | TDM opt-out checks performed and documented for each original source following the methodology in this record |
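Before a model that uses these datasets is deployed, the checklist can serve as a simple gate: any row whose status is not Done blocks the release. The sketch below parses the pipe table as plain text; `open_items` is a hypothetical helper, and the Pending row in the sample is invented for illustration, since every item in the table above is marked Done. It treats any status other than Done as open, so a conditional item such as #5 would need an explicit rule (for example, accepting N/A) in a real gate.

```python
from typing import List, Tuple


def open_items(table_markdown: str) -> List[Tuple[str, str, str]]:
    """Return (number, obligation, status) for checklist rows not marked Done."""
    pending: List[Tuple[str, str, str]] = []
    for line in table_markdown.strip().splitlines():
        cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
        # Skip the header row and the |---| separator row.
        if len(cells) < 4 or not cells[0].isdigit():
            continue
        number, obligation, _responsible, status = cells[:4]
        if status.lower() != "done":
            pending.append((number, obligation, status))
    return pending


# Excerpt of the checklist; the "Pending" status is invented for illustration.
CHECKLIST = """
| # | Obligation | Responsible | Status | Evidence / Notes |
|---|---|---|---|---|
| 2 | EXIF metadata stripped from all images | JD-009 | Done | AI team confirmed stripping |
| 5 | DPIA initiated under GP-052 (if facial images are present) | JD-003 | Pending | hypothetical |
"""

pending = open_items(CHECKLIST)
if pending:
    print("Not ready for deployment:", pending)
```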
Evidence Archive
All evidence was captured on 2026-03-05. Files are stored in the evidence/ subdirectory of this record:
- github-robots-txt.png: GitHub's robots.txt, showing no TDM-specific blocks for AI crawlers
- acnescu-github-repo.png: AcneSCU repository showing the Apache-2.0 LICENSE and the non-commercial README notice
- kaggle-acne-dataset.png: Kaggle dataset page confirming the CC-BY-4.0 license
- kaggle-robots-txt-404.png: Kaggle robots.txt request returning 404 (no robots.txt exists)
- roboflow-robots-txt.png: RoboFlow robots.txt blocking AI crawlers (platform policy, not a rightsholder opt-out)
Signature meaning
The signatures for the approval process of this document can be found in the verified commits at the repository for the QMS. As a reference, the team members who are expected to participate in this document and their roles in the approval process, as defined in Annex I Responsibility Matrix of the GP-001, are:
- Author: JD-003 Design & Development Manager
- Reviewer: JD-004 Quality Manager & PRRC
- Approver: JD-001 General Manager