SP-009-010 Customer KPI Definition and Alignment Helpers
Scope and Purpose
This procedure establishes the guidelines for defining Key Performance Indicators (KPIs) and evaluation protocols when integrating the device into a customer's clinical workflow. It provides the rationale for steering customers away from redundant clinical accuracy validations (often proposed as informal "pseudo-studies") and towards measuring real-world workflow impact and adoption.
The primary objective during implementation is not to re-validate the technology, but to measure how effectively the solution integrates into the clinical environment, supports decision-making, and enhances efficiency for healthcare professionals.
Realistic and Appropriate KPIs
A customer has realistic and appropriate KPIs when they:
- Understand their objectives: They have a clear view of what they want to achieve from a business perspective.
- Understand what to measure and how to measure it: They select the right metrics and apply suitable methodologies to track performance effectively.
Measuring What Matters: Real-World Impact Metrics
Instead of setting up complex studies to re-validate accuracy, the focus must be on how the solution modifies behavior, optimizes the clinical workflow, and provides value to the user in their daily operations. These metrics do not require special study designs or extra steps for the physician; they are gathered from standard usage.
Primary KPIs
These metrics demonstrate the fundamental value of the integration:
- Adoption and Usage Volume: Tracking the number of cases where professionals actively consult the analysis over time. Sustained use is the strongest indicator of clinical value and acceptance.
- Image Quality Improvement: Rejection rate by Legit.Health's Dermatology Image Quality Assessment (DIQA). This measures how the device helps primary care physicians capture better images, significantly reducing the burden of ungradable cases reaching specialists.
- Clinician Satisfaction: Measuring professionals' perceived clinical utility and satisfaction via the Clinical Utility Questionnaire (CUS) and the Customer Satisfaction Survey (CSAT).
Secondary KPIs
These metrics measure downstream systemic impact:
- Impact on Referral Volumes: Evaluating aggregated historical referral data versus current data to see if overall unnecessary referrals have decreased, without needing case-by-case tracking.
- Time Efficiency: Measuring the average time spent per case review by specialists before and after device integration.
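Both secondary KPIs reduce to simple before/after comparisons on aggregated data, with no case-by-case tracking. A minimal Python sketch (all figures are hypothetical illustrations, not reference values):

```python
from statistics import mean

def pct_change(before: float, after: float) -> float:
    """Relative change from the pre-integration baseline, in percent."""
    return (after - before) / before * 100.0

# Hypothetical aggregated figures: average minutes per specialist case
# review, and average monthly referral counts, before vs. after integration.
review_minutes_before = [14.2, 13.8, 15.1]
review_minutes_after = [9.7, 10.4, 9.9]

referrals_before = 420  # monthly average, historical
referrals_after = 355   # monthly average, current

time_delta = pct_change(mean(review_minutes_before), mean(review_minutes_after))
referral_delta = pct_change(referrals_before, referrals_after)

print(f"Avg. review time change: {time_delta:+.1f}%")
print(f"Referral volume change: {referral_delta:+.1f}%")
```

A negative percentage indicates the desired direction for both KPIs: less time per case, fewer referrals.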
The following table outlines the standard impact metrics that should be measured to evaluate integration and efficiency:
| Area | Objective | KPI | Metric / Target | Responsible |
|---|---|---|---|---|
| General KPIs | Professional Adoption | Number of primary care physicians using the tool | Total number | Customer |
| General KPIs | Professional Adoption | Number of specialists using the tool | Total number | Customer |
| General KPIs | Clinical Utility | Clinical Utility Questionnaire (CUS) | Target 70% | Customer & Legit.Health |
| General KPIs | Customer Satisfaction | Customer Satisfaction Survey (CSAT) | Target 75% | Customer & Legit.Health |
| General KPIs | Solution Usage | Number of images uploaded | Total number | Legit.Health |
| Quality Control | Ensure image quality | Acceptable quality image rate | % of images accepted | Legit.Health |
| Processing | Optimize technical times | Image processing time | Average seconds | Customer |
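Several of the metrics in the table can be derived directly from standard usage logs, with no extra steps for the physician. A minimal Python sketch (event records and field names are hypothetical):

```python
# Each dict stands for one processed-image event from standard usage logs
# (field names are illustrative, not the device's actual log schema).
events = [
    {"accepted": True, "processing_seconds": 2.1},
    {"accepted": True, "processing_seconds": 1.8},
    {"accepted": False, "processing_seconds": 2.4},
    {"accepted": True, "processing_seconds": 1.9},
]

total_images = len(events)                          # Solution Usage KPI
accepted = sum(1 for e in events if e["accepted"])
acceptance_rate = 100.0 * accepted / total_images   # Quality Control KPI
avg_processing = sum(e["processing_seconds"] for e in events) / total_images

print(f"Images uploaded: {total_images}")
print(f"Acceptable quality image rate: {acceptance_rate:.1f}%")
print(f"Avg. processing time: {avg_processing:.2f} s")
```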
Division of Responsibilities
When measuring these impact metrics, responsibilities must be clearly delineated between Legit.Health and the customer.
What Legit.Health provides:
- Aggregate Data and Metrics: Anonymized, aggregated data derived from the device outputs, including DIQA rejection rates, overall image quality scores, total processed images, and the general distribution of malignancy suspicions.
- Standardized Evaluation Tools: The necessary framework and support for the Clinical Utility Questionnaire (CUS) and Customer Satisfaction Survey (CSAT) to accurately measure user satisfaction.
- Methodological Support: Expert guidance to ensure the pilot is structured in a way that generates comparable, standardized, and useful data for business decisions.
What Legit.Health does not provide:
- Workflow Data Collection & Analysis: Legit.Health does not build customized forms within the customer's Hospital Information System (HIS) to collect internal data, nor does it perform the final statistical analysis comparing historical customer data. The execution of internal data collection and workflow analysis must be led by the customer.
Standard Customer KPI Dashboard
This is the reference dashboard we offer to every customer as part of standard onboarding. It is generated automatically from the endpoints the device already exposes — no custom study design, no custom data-collection form, and no change to the clinician's workflow is required. The dashboard below is an interactive mockup populated with a deterministic synthetic dataset calibrated to match the real reference deployment, so customers can see what their KPIs will look like before go-live.
The mockup supports the two integration modes the customer can choose between (iframe or JSON only). The descriptive paragraphs below are intended for direct reuse in commercial contracts.
Iframe integration — contract description
Dashboard functionality (iframe integration): the dashboard is served embedded in the customer's system and is fed by the enriched diagnostic report (one record per report, with the severity assessment merged in when applicable). It includes the following tabs with the following functionality:
- Read me: official glossary and technical definitions of every metric used in the dashboard (DIQA, C1-C5, sensitivity, specificity, suspicion-of-malignancy band and the thresholds in effect at any given moment), to ensure correct interpretation of the data.
- Top 1 category: detailed table of individual diagnostic reports, including the report identifier, the primary diagnostic category (C1) with its probability (P1), the image quality score (DIQA) and the body site.
- Top 5 categories: breakdown of the 5 most likely diagnostic categories for each report (from C1 to C5), allowing the differential of secondary diagnoses identified by the system to be inspected.
- Body site: pie chart and table analysing the distribution of reports by body region analysed (forehead, back, extremities, etc.), highlighting the areas with the highest image volume.
- Suspicion of malignancy: classifies reports into three risk levels (Green, Orange, Red) based on numeric thresholds defined over the suspicion-of-malignancy metric. The thresholds are customer-configurable from the panel itself and every metric, chart and table recomputes automatically when they change.
- Top 1 category analysis: aggregated view of the most frequent primary diagnoses (such as eczematous dermatitis or acne) presented as a pie chart together with mean-probability metrics.
- Time series: line chart showing the time evolution of the number of diagnostic reports generated, making it possible to identify activity peaks and to compare against the daily mean.
- Images rejected by DIQA: failure rate of image quality and a listing of the specific sessions in which images were rejected for not meeting the DIQA technical standards.
- Data transfer: full table of every report with all available columns, a visible-columns picker, and download of the raw payload as JSON for external analysis (including severity scores when the report carries them).
JSON-only integration — contract description
Dashboard functionality (JSON-only integration): the dashboard is fed directly by the API endpoint responses the customer is already consuming, with no visual embedding required. It exposes an endpoint selector (/diagnosis-support, /severity-assessment/automatic/local, /severity-assessment/manual) which determines which tabs are available at any given moment. The tabs are:
- Read me: official glossary and technical definitions of every metric used in the dashboard, aligned with the endpoint the customer is consulting at any given moment.
- Top 1 category (available for /diagnosis-support): detailed table of /diagnosis-support calls, including the call identifier, primary diagnostic category (C1), probability (P1), DIQA and body site.
- Top 5 categories (available for /diagnosis-support): breakdown of the 5 most likely diagnostic categories (from C1 to C5) for each /diagnosis-support call.
- Body site (available for all three endpoints): pie chart and table analysing the distribution of calls by body region analysed, also flagging how many calls did not specify a body site when the endpoint allows it to be omitted.
- Suspicion of malignancy (available for /diagnosis-support): classifies calls into three risk levels (Green, Orange, Red) based on customer-configurable numeric thresholds over the suspicion-of-malignancy metric.
- Top 1 category analysis (available for /diagnosis-support): aggregated view of the most frequent primary diagnoses presented as a pie chart together with mean-probability metrics.
- Time series (available for all three endpoints): line chart of the time evolution of the number of calls to the selected endpoint, making it possible to identify activity peaks and to compare against the daily mean.
- Images rejected by DIQA (available for /diagnosis-support and /severity-assessment/automatic/local): failure rate of image quality and a listing of the sessions in which DIQA rejected images for not meeting the technical standards.
- Calls overview (available for the severity endpoints): aggregated view of call volume on the selected severity endpoint, broken down by scoring system, severity distribution (Mild/Moderate/Severe) and body region covered.
- Data transfer (available for all three endpoints): full table of calls to the selected endpoint, a visible-columns picker, and download of the endpoint's raw payload as JSON for external analysis.
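The "Time series" tab's peak-versus-daily-mean comparison reduces to counting calls per day from the timestamps the customer is already receiving. A minimal Python sketch (dates are hypothetical sample data):

```python
from collections import Counter
from datetime import date

# Dates of calls to the selected endpoint (hypothetical sample data).
calls = [
    date(2024, 5, 1), date(2024, 5, 1), date(2024, 5, 2),
    date(2024, 5, 2), date(2024, 5, 2), date(2024, 5, 3),
]

per_day = Counter(calls)                         # daily call counts
daily_mean = sum(per_day.values()) / len(per_day)

for day, count in sorted(per_day.items()):
    flag = "peak" if count > daily_mean else ""
    print(day.isoformat(), count, flag)
```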
Customer KPI dashboard
Interactive mockup of the dashboard shared with every customer. All figures are derived from the endpoints the device already exposes — no custom study design required.
Reference definitions for every metric and field shown in the dashboard. Use this section to confirm the exact meaning of each column before reporting on the data.
- Diagnostic report: the resource defined in HL7 FHIR. It is the basic unit of device output.
- # Diagnostic reports: number of diagnostic reports generated.
- % Diagnostic reports: diagnostic reports as a percentage of the total.
- Report id: the unique identifier of the diagnostic report.
- Timestamp (CET): date and time the report was generated, expressed in Central European Time.
- DIQA: Dermatology Image Quality Assessment. Measures the visual quality of the image used for analysis.
- # Images: number of images analysed in the report (i.e. images that passed DIQA).
- % Images: images analysed as a percentage of the total.
- C1, C2, C3, C4, C5: the top-5 ICD-11 categories identified by the system, ordered by probability.
- C1: the most likely category.
- C2–C5: the following ICD-11 categories in descending order of probability.
- P1, P2, P3, P4, P5: the probabilities for the top-5 ICD-11 categories (0–100).
- Avg. probability: arithmetic mean of P1 across all diagnostic reports where a given ICD-11 category appears as the top prediction (C1).
- Body site: anatomical location shown in the image (e.g. HEAD_FRONT, TRUNK_BACK).
- Suspicion of malignancy: sum of probabilities assigned to ICD-11 categories considered malignant in the literature (0–100). The Green / Orange / Red bands are configurable per customer; the values currently in effect are:
- [0, 15): Green
- [15, 30): Orange
- [30, 100]: Red
- Sensitivity: internal estimated sensitivity of the model (its ability to correctly identify positive cases), in percent.
- Specificity: internal estimated specificity of the model (its ability to correctly identify negative cases), in percent.
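To illustrate how the configurable bands and the "Avg. probability" column follow from the raw report fields, here is a minimal Python sketch. The field names are hypothetical, and the 15/30 thresholds are simply the defaults listed above:

```python
from collections import defaultdict

def malignancy_band(score: float, green_max: float = 15.0,
                    orange_max: float = 30.0) -> str:
    """Map a suspicion-of-malignancy score (0-100) onto the configurable bands:
    [0, green_max) -> Green, [green_max, orange_max) -> Orange, else Red."""
    if not 0.0 <= score <= 100.0:
        raise ValueError("score must be in [0, 100]")
    if score < green_max:
        return "Green"
    if score < orange_max:
        return "Orange"
    return "Red"

# Avg. probability: mean of P1 over reports sharing the same top prediction (C1).
reports = [
    {"C1": "acne", "P1": 81.0},
    {"C1": "acne", "P1": 73.0},
    {"C1": "eczematous dermatitis", "P1": 64.0},
]
p1_by_category = defaultdict(list)
for report in reports:
    p1_by_category[report["C1"]].append(report["P1"])
avg_probability = {cat: sum(ps) / len(ps) for cat, ps in p1_by_category.items()}

print(malignancy_band(7.2))   # Green
print(malignancy_band(15.0))  # Orange: 15 falls in [15, 30)
print(avg_probability["acne"])
```

Note the half-open intervals: a score exactly at a boundary falls into the higher-risk band, matching the [0, 15), [15, 30), [30, 100] definitions above.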
Understand the Limitations (Optional)
The Fallacy of "Pseudo-Studies" for Clinical Accuracy
When implementing the solution, customers often propose conducting a local "study" or "evaluation" to measure the device's diagnostic accuracy (for example, by measuring concordance between the device's output and the opinion of their local clinicians). Legit.Health must discourage this approach for several critical reasons:
- The Device is Already Clinically Validated: The technology is already fully validated as a medical device. Its clinical performance metrics (sensitivity, specificity, AUROC) have been proven through rigorous clinical investigations and are continuously monitored as part of Legit.Health's regulatory obligations. For published validation data and further details, see https://legit.health/validation.
- Flawed Methodology (Lack of a Gold Standard): Concordance with a single clinician is not a valid measure of diagnostic accuracy. Assuming that the clinician's judgment is always correct is misleading. A true clinical study requires histological confirmation (biopsy) or, at minimum, a consensus of at least three expert specialists to establish a true "Gold Standard".
- Information Asymmetry: The device and clinicians are not directly comparable. The device analyzes only the provided image and basic patient data. The clinician makes their decision based on a holistic view: the patient's full clinical history, family risk factors, physical examination, and the patient's concern. This asymmetry means differences are expected and do not necessarily indicate an error by either party.
- Complexity and Cost of Rigorous Protocols: If a customer genuinely wishes to measure clinical performance scientifically, they cannot do it informally. It requires setting up a proper clinical investigation protocol, controlling for biases, obtaining ethical committee (IRB) approvals, and ensuring histological ground truth. This level of rigor far exceeds the scope, budget, and timeline of a standard software integration project.
The Complexity of a Proper Pre/Post Workflow Study
Should a customer still insist on measuring the impact on their diagnostic decisions and referrals despite Legit.Health's recommendations, they must understand the methodological requirements. To measure this scientifically without introducing clinical biases, they cannot just compare their diagnosis with the device's; they must use a "Pre/Post" study design, collecting data before and after the professional interacts with the device's output.
The workflow would require a two-step data collection process for every single case:
- Step 1 (Pre-intervention): The professional registers their initial triage decision (e.g., No referral, Routine, Urgent) based solely on their clinical judgment before seeing the AI output.
- Step 2 (Post-intervention): The professional registers their final decision using the exact same scale after reviewing the device's results.
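The two-step collection above can be sketched as a simple record structure enforcing that both decisions use the exact same scale (record and field names are illustrative, not a prescribed schema):

```python
from dataclasses import dataclass

TRIAGE_SCALE = ("No referral", "Routine", "Urgent")  # identical scale for both steps

@dataclass(frozen=True)
class TriageRecord:
    case_id: str
    pre: str   # Step 1: decision before seeing the device output
    post: str  # Step 2: final decision after reviewing the output

    def __post_init__(self):
        if self.pre not in TRIAGE_SCALE or self.post not in TRIAGE_SCALE:
            raise ValueError("decisions must use the agreed triage scale")

def decision_change_rate(records: list[TriageRecord]) -> float:
    """Share of cases where the final decision differs from the initial one."""
    changed = sum(1 for r in records if r.pre != r.post)
    return 100.0 * changed / len(records)

sample = [
    TriageRecord("c1", "Urgent", "Routine"),
    TriageRecord("c2", "Routine", "Routine"),
    TriageRecord("c3", "No referral", "No referral"),
    TriageRecord("c4", "Routine", "Urgent"),
]
print(f"Decisions changed after device review: {decision_change_rate(sample):.0f}%")
```

Even this minimal structure makes the burden visible: every single case requires two registrations by the professional, which is precisely why this design is discouraged during standard integrations.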
Due to the burden this places on the clinical workflow and the inherent biases, Legit.Health strongly recommends against attempting these complex studies during standard integrations.