Research and planning
This page is for internal planning only. It will not be included in the final response to BSI.
What BSI is asking
The response.json for T377 (ICD Category Distribution, case C537) does not match master.csv. Specifically:
- master.csv expects icd_distribution and top_5_predictions keys, but response.json does not have them
- The entropy and probability values are similar but not identical
Root cause analysis — INVESTIGATION REQUIRED
The root cause has not been confirmed. The S3 evidence must be inspected before a response can be drafted. This is a blocker for writing the final response to BSI.
From master.csv (ai-models-integration-tests.csv, line 279), the expected result for T377 is:
{
"icd_distribution": {
"entropy": 0.39412604460588385,
"top_5_predictions": [
{"icd_code": "2C30", "name": "Cutaneous melanoma", "probability": 70.807356},
{"icd_code": "2F20.1", "name": "Atypical melanocytic nevus", "probability": 0.308577},
...
],
"full_distribution": [...]
}
}
BSI says the actual response.json in the evidence folder doesn't match. Critically, BSI states that icd_distribution and top_5_predictions keys are missing from response.json. This is a structural mismatch, not a numerical precision issue. The floating-point differences BSI also mentions ("entropy and probability are similar, but not the same") are secondary to the structural problem.
There are two plausible explanations (ordered by likelihood):
- Post-processing layer mismatch (most likely): The Software Architecture (R-TF-012-029) describes the condition classifier response in a study_aggregate.findings.hypotheses structure. The icd_distribution wrapper may be added at a different pipeline stage (e.g., the API gateway's report builder). If the response.json was captured at the model inference layer rather than the API response layer, the JSON structure would differ: it would contain raw model outputs without the icd_distribution/top_5_predictions wrapper keys. This would also explain the minor numerical differences, since the API layer may apply rounding or formatting not present in raw model output.
- Evidence compilation error: The response.json provided to BSI came from a different test run or a different version of the model. The evidence in S3 may have been overwritten or incorrectly compiled into the submission package.
The investigation must determine:
- What JSON structure does the actual response.json at s3://legit-health-plus/integration-verification/condition-classifier/case-001/evidence/ contain?
- Does it have the icd_distribution wrapper? If not, what top-level keys does it have?
- Can the numerical values be reconciled with the expected values within the acceptance criteria (≤ 1e-5)?
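The first two questions can be answered mechanically once a local copy of response.json is available. The following is a minimal sketch only: the function names and the local file path are hypothetical, and the inner key names are taken from the expected result quoted above, not from the file BSI received.

```python
import json

# Key names taken from the expected T377 result in master.csv.
EXPECTED_WRAPPER = "icd_distribution"
EXPECTED_INNER_KEYS = ("entropy", "top_5_predictions", "full_distribution")


def describe_structure(response: dict) -> dict:
    """Summarise which of the expected keys a parsed response.json contains."""
    wrapper = response.get(EXPECTED_WRAPPER)
    has_wrapper = isinstance(wrapper, dict)
    # If the wrapper is absent, every inner key counts as missing.
    missing_inner = [
        key for key in EXPECTED_INNER_KEYS
        if not has_wrapper or key not in wrapper
    ]
    return {
        "top_level_keys": sorted(response.keys()),
        "has_wrapper": has_wrapper,
        "missing_inner_keys": missing_inner,
    }


def describe_file(path: str) -> dict:
    """Load a local copy of the evidence file (hypothetical path) and summarise it."""
    with open(path, encoding="utf-8") as handle:
        return describe_structure(json.load(handle))
```

Calling describe_file on the downloaded evidence file and pasting the resulting summary into the investigation notes would document the structure found. If has_wrapper is false, top_level_keys shows what the file contains instead.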
Relevant QMS documents
| Document | Path | Relevance |
|---|---|---|
| Integration tests CSV | ai-models-integration-tests.csv line 279 (T377) | Expected ICD Category Distribution output |
| Software Architecture | R-TF-012-029-Software-Architecture-Description.mdx | Condition Classifier response structure and pipeline stages |
| models.json | models.json lines 4-25 | ICD Category Distribution model specification |
| AI/ML Release Report | r-tf-028-006-aiml-release-report.mdx | Model integration verification package |
Gap analysis
- Already had: The expected results are well-defined in the CSV. The model produces the correct output structure.
- BSI couldn't find: A matching response.json in the evidence folder.
- Needs updating: (a) Investigate the actual response.json in S3 to determine the structural mismatch; (b) if the evidence was captured at the wrong pipeline layer, re-capture at the API level; (c) if numerical differences exist within tolerance, explain the acceptance criteria and why small differences are expected; (d) provide corrected evidence.
Response strategy
Regulatory mapping for this response:
| Requirement | How our response addresses it |
|---|---|
| Annex II 6.2(f) | V&V evidence for the ICD Category Distribution model must match the expected output specification. Corrected evidence provided. |
| EN 62304 §5.5 | Integration verification evidence must demonstrate correct integration of the model within the software system |
Action required (investigation-dependent — cannot write final response until step 1 is complete):
1. BLOCKER: Investigate the actual response.json in S3 at s3://legit-health-plus/integration-verification/condition-classifier/case-001/evidence/ to determine the exact content BSI received. Document the JSON structure and values found.
2. Based on investigation results, one of two responses:

   If structural mismatch (no icd_distribution wrapper, most likely):
   - Explain that the evidence was captured at the model inference layer rather than the API response layer, per the architecture described in R-TF-012-029
   - Explain that the integration test specification in master.csv defines the expected API-level response, and the verification compares at this level
   - Provide the correct API-level response as evidence in the supplementary PDF
   - Describe corrective action: evidence collection now captures at the API response layer to match the expected output specification

   If same structure but different values:
   - State the acceptance criteria for classification models (≤ 1e-5 per element, from R-TF-028-006)
   - Provide a numerical comparison showing each differing value falls within tolerance
   - If values exceed tolerance, explain the cause (e.g., non-deterministic TTA) and provide re-captured evidence
3. Provide the correct, matching evidence in the supplementary PDF.
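For the "same structure but different values" branch, the numerical comparison could be produced along the following lines. This is a sketch under assumptions: the function name is hypothetical, the tolerance is treated as an absolute per-element difference, and both inputs are assumed to be the contents of the icd_distribution object. Whether the 1e-5 criterion in R-TF-028-006 is absolute or relative, and whether it applies to probabilities on the percentage scale shown in master.csv, must be confirmed before the table is used in the response.

```python
TOLERANCE = 1e-5  # per-element criterion cited from R-TF-028-006; assumed absolute here


def compare_icd_distribution(expected: dict, actual: dict, tol: float = TOLERANCE) -> list:
    """Return (field, expected, actual, abs_diff, within_tol) rows.

    Covers entropy and the probability of each expected top-5 prediction,
    matched by icd_code rather than by list position.
    """
    rows = []

    def add_row(field, exp_value, act_value):
        diff = abs(exp_value - act_value)
        rows.append((field, exp_value, act_value, diff, diff <= tol))

    add_row("entropy", expected["entropy"], actual["entropy"])

    actual_by_code = {
        pred["icd_code"]: pred["probability"]
        for pred in actual["top_5_predictions"]
    }
    for pred in expected["top_5_predictions"]:
        code = pred["icd_code"]
        field = f"probability[{code}]"
        if code not in actual_by_code:
            # A missing code is reported as a failure, not silently skipped.
            rows.append((field, pred["probability"], None, None, False))
        else:
            add_row(field, pred["probability"], actual_by_code[code])
    return rows
```

Each returned row maps directly to one line of the comparison table for the supplementary PDF. Any row whose last element is false needs the cause explained and the evidence re-captured, as described in step 2.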
Response tone (to be finalised after investigation): "The expected results in the integration test specification (master.csv, T377) define the API-level response structure per R-TF-012-029, including icd_distribution with entropy, top_5_predictions, and full_distribution. [Investigation-dependent explanation]. Per Annex II 6.2(f), we provide corrected evidence demonstrating the model produces the expected output. Corrective action: [investigation-dependent corrective action]."
Action items:
| # | Action | Owner | Document affected | Priority |
|---|---|---|---|---|
| 10 | BLOCKER: Investigate T377 response.json in S3 | Gerardo | — | Critical |
| 11 | Provide corrected T377 evidence | Gerardo | Supplementary evidence PDF | High |