PyNVML
General Information
| Field | Value |
|---|---|
| Package Name | nvidia-ml-py (pynvml) |
| Manufacturer / Vendor | NVIDIA Corporation |
| Software Category | Library |
| Primary Documentation | NVML Documentation, PyPI, pyNVML Docs |
| Programming Language(s) | Python, C |
| License | BSD-3-Clause |
| Deployed Version(s) | >=12.560.30 (version-locked at 13.590.44 across expert microservices) |
| Most Recent Available Version | 13.590.48 |
| Last Review Date | 2026-01-27 |
Overview
nvidia-ml-py provides Python bindings for the NVIDIA Management Library (NVML), a C-based programmatic interface for monitoring and managing NVIDIA GPUs. The package wraps NVML functions as Python methods using ctypes, converting NVML error codes into Python exceptions for clean error handling. NVML is the underlying library powering NVIDIA's nvidia-smi command-line tool and is designed as a platform for building third-party GPU management applications. The package is officially published and maintained by NVIDIA Corporation.
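For example, the bindings raise typed `NVMLError` subclasses instead of returning C status codes. The following is a minimal sketch (the pre-initialization call is made purely to provoke an error, not a recommended usage pattern):

```python
import pynvml

try:
    # Calling into NVML before nvmlInit() fails with NVML_ERROR_UNINITIALIZED in C;
    # the bindings surface it as a typed Python exception instead of a status code.
    pynvml.nvmlDeviceGetCount()
except pynvml.NVMLError as err:
    print(type(err).__name__, err)  # e.g. NVMLError_Uninitialized
```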
Within the medical device software, nvidia-ml-py serves as the GPU resource monitoring and detection layer within the distributed AI inference infrastructure. It is integrated into the legithp-expert framework, which provides the foundation for all 50+ clinical expert microservices. Specifically, nvidia-ml-py is used for:
- GPU device detection: The `NVMLGPUProvider` adapter uses NVML to enumerate available CUDA GPUs, retrieve device handles, and query device counts during microservice initialization
- Static device information: Retrieves immutable GPU properties including device name/model, total memory capacity, and CUDA compute capability for infrastructure logging and resource planning
- Runtime metrics collection: Queries dynamic GPU metrics including current memory usage, GPU utilization percentage, and temperature for operational monitoring
- Resource management: The `SystemInfoService` aggregates GPU metrics alongside CPU, memory, and disk usage to provide comprehensive resource visibility for the inference platform
- Fallback architecture: Part of a provider chain where `FallbackGPUProvider` attempts PyTorch GPU detection first, falling back to direct NVML queries when PyTorch detection is insufficient (see the sketch after this list)
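The fallback chain can be illustrated with a minimal sketch. The names `FallbackGPUProvider` and `NVMLGPUProvider` and the exact fallback criteria are internal to the device software and are not reproduced here; the snippet below only shows, under those assumptions, how a PyTorch-first detection path might fall back to direct pynvml calls when PyTorch reports no usable devices.

```python
import pynvml

def _gpus_via_torch():
    """Attempt GPU detection through PyTorch's CUDA interface (may be unavailable)."""
    try:
        import torch
        if not torch.cuda.is_available():
            return []
        return [torch.cuda.get_device_name(i) for i in range(torch.cuda.device_count())]
    except Exception:
        return []

def _gpus_via_nvml():
    """Fall back to direct NVML enumeration via nvidia-ml-py."""
    try:
        pynvml.nvmlInit()
    except pynvml.NVMLError:
        return []  # driver or library missing: report no GPUs rather than fail
    try:
        names = []
        for i in range(pynvml.nvmlDeviceGetCount()):
            handle = pynvml.nvmlDeviceGetHandleByIndex(i)
            name = pynvml.nvmlDeviceGetName(handle)
            # Older pynvml releases return bytes, newer ones return str
            names.append(name.decode() if isinstance(name, bytes) else name)
        return names
    finally:
        pynvml.nvmlShutdown()

def detect_gpus():
    """PyTorch first, NVML second, mirroring the fallback order described above."""
    return _gpus_via_torch() or _gpus_via_nvml()

if __name__ == "__main__":
    print(detect_gpus())
```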
nvidia-ml-py was selected over alternatives due to:
- Official support and maintenance by NVIDIA Corporation with regular updates aligned to driver releases
- Direct access to low-level NVML functionality not exposed through PyTorch's CUDA interface
- Permissive BSD-3-Clause license compatible with commercial medical device software
- Graceful degradation when NVIDIA drivers are not installed or GPUs are not present
- Comprehensive GPU metrics (utilization, temperature) beyond what PyTorch exposes
- Clean Python exception handling for NVML error codes
Functional Requirements
The following functional capabilities of this SOUP are relied upon by the medical device software.
| Requirement ID | Description | Source / Reference |
|---|---|---|
| FR-001 | Initialize the NVML library for subsequent API calls | pynvml.nvmlInit() function |
| FR-002 | Clean shutdown of NVML library resources | pynvml.nvmlShutdown() function |
| FR-003 | Query the total number of NVIDIA GPUs available on the system | pynvml.nvmlDeviceGetCount() function |
| FR-004 | Obtain a device handle for a specific GPU by index | pynvml.nvmlDeviceGetHandleByIndex() function |
| FR-005 | Retrieve the name/model of a GPU device | pynvml.nvmlDeviceGetName() function |
| FR-006 | Query GPU memory information (total and used bytes) | pynvml.nvmlDeviceGetMemoryInfo() function |
| FR-007 | Retrieve CUDA compute capability version (major, minor) | pynvml.nvmlDeviceGetCudaComputeCapability() function |
| FR-008 | Query GPU utilization percentage | pynvml.nvmlDeviceGetUtilizationRates() function |
| FR-009 | Query GPU temperature in Celsius | pynvml.nvmlDeviceGetTemperature() with NVML_TEMPERATURE_GPU |
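For reference, the functions in the table above can be exercised with a short pynvml script. This is a hedged sketch rather than device code: the function names and struct fields (`used`, `total`, `gpu`) are from the published nvidia-ml-py API, while the loop and output format are illustrative.

```python
import pynvml

pynvml.nvmlInit()                                            # FR-001: initialize NVML
try:
    count = pynvml.nvmlDeviceGetCount()                      # FR-003: number of GPUs
    for index in range(count):
        handle = pynvml.nvmlDeviceGetHandleByIndex(index)    # FR-004: device handle
        name = pynvml.nvmlDeviceGetName(handle)              # FR-005: device name/model
        if isinstance(name, bytes):                          # bytes on older releases
            name = name.decode()
        mem = pynvml.nvmlDeviceGetMemoryInfo(handle)         # FR-006: total/used bytes
        major, minor = pynvml.nvmlDeviceGetCudaComputeCapability(handle)  # FR-007
        util = pynvml.nvmlDeviceGetUtilizationRates(handle)  # FR-008: utilization %
        temp = pynvml.nvmlDeviceGetTemperature(              # FR-009: temperature (C)
            handle, pynvml.NVML_TEMPERATURE_GPU)
        print(f"GPU {index}: {name}, cc {major}.{minor}, "
              f"{mem.used}/{mem.total} B, {util.gpu}% util, {temp} C")
finally:
    pynvml.nvmlShutdown()                                    # FR-002: clean shutdown
```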
Performance Requirements
The following performance expectations are relevant to the medical device software.
| Requirement ID | Description | Acceptance Criteria |
|---|---|---|
| PR-001 | NVML initialization shall complete within acceptable startup time | Library initialization does not dominate service startup latency |
| PR-002 | GPU metric queries shall not introduce significant overhead | Metric queries complete in < 10ms under normal conditions |
| PR-003 | Library shall not cause memory leaks during continuous operation | Stable memory footprint with repeated metric polling |
| PR-004 | Shutdown shall release all NVML resources cleanly | No resource leaks on process termination |
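A simple polling loop gives a rough spot-check of PR-002 and PR-003. The iteration count and the assertion below are illustrative test parameters, not values from the device's verification protocol; the 10 ms threshold mirrors PR-002.

```python
import time
import pynvml

pynvml.nvmlInit()
try:
    handle = pynvml.nvmlDeviceGetHandleByIndex(0)
    iterations = 1000
    start = time.perf_counter()
    for _ in range(iterations):
        # The metric queries that would be polled repeatedly in operation
        pynvml.nvmlDeviceGetMemoryInfo(handle)
        pynvml.nvmlDeviceGetUtilizationRates(handle)
    per_poll_ms = (time.perf_counter() - start) / iterations * 1000.0
    print(f"average poll latency: {per_poll_ms:.3f} ms")
    assert per_poll_ms < 10.0, "metric polling exceeds the 10 ms expectation (PR-002)"
finally:
    pynvml.nvmlShutdown()
```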
Hardware Requirements
The following hardware dependencies or constraints are imposed by this SOUP component.
| Requirement ID | Description | Notes / Limitations |
|---|---|---|
| HR-001 | NVIDIA GPU hardware | Required for meaningful operation; library gracefully reports 0 GPUs if absent |
| HR-002 | NVIDIA GPU drivers installed on the host system | NVML is provided as part of the NVIDIA driver package |
| HR-003 | x86-64 or ARM64 processor architecture | Pre-built wheels available for common platforms |
Software Requirements
The following software dependencies and environmental assumptions are required by this SOUP component.
| Requirement ID | Description | Dependency / Version Constraints |
|---|---|---|
| SR-001 | Python runtime environment | Python >=3.6 (ctypes module required) |
| SR-002 | NVIDIA GPU drivers with NVML library | Driver version compatible with deployed NVML version |
| SR-003 | libnvidia-ml shared library | Provided by NVIDIA driver installation |
Known Anomalies Assessment
This section evaluates publicly reported issues, defects, or security vulnerabilities associated with this SOUP component and their relevance to the medical device software.
A comprehensive search of security vulnerability databases was conducted for the nvidia-ml-py Python package. No CVEs or security advisories have been reported specifically targeting nvidia-ml-py as of the review date.
While no vulnerabilities affect the Python bindings directly, the following related NVIDIA vulnerabilities were assessed for potential applicability to the device's GPU monitoring infrastructure:
| Anomaly Reference | Status | Applicable | Rationale | Reviewed At |
|---|---|---|---|---|
| CVE-2025-23266 (NVIDIA Container Toolkit) | Fixed | No | Critical (CVSS 9.0) container escape vulnerability in NVIDIA Container Toolkit. Not applicable: this CVE affects the container toolkit, not the NVML library or Python bindings. The device uses standard driver installations, not container toolkit | 2026-01-27 |
| CVE-2024-0126 (GPU Display Drivers) | Fixed | No | Code execution vulnerability in GPU display drivers. Not applicable: the device deploys with driver versions that include fixes; nvidia-ml-py is a query-only interface that does not execute arbitrary code on the GPU | 2026-01-27 |
The package provides Python bindings to NVML, which is included in the NVIDIA driver package. Security issues affecting NVML itself would be addressed through driver updates rather than Python package updates, as the Python bindings are thin wrappers around the driver-provided shared library.
The device's usage pattern minimizes attack surface exposure:
- Read-only operations: The device uses nvidia-ml-py exclusively for querying GPU information (device count, memory, utilization, temperature); no write operations or GPU configuration changes are performed
- Internal monitoring only: GPU metrics are collected for internal resource monitoring and logging; no GPU information is exposed to external users or APIs
- Graceful degradation: The `NVMLGPUProvider` implementation handles NVML initialization failures gracefully, logging warnings and reporting 0 GPUs rather than crashing
- Process isolation: Each expert microservice runs in an isolated container with the GPU provider instantiated per-process
- Version locking: Requirements lock files pin nvidia-ml-py to version 13.590.44 across all expert microservices
- Lifecycle management: NVML shutdown is registered via `atexit` to ensure clean resource release on process termination (a sketch of this pattern follows this list)
- Driver compatibility: The locked nvidia-ml-py version (13.590.x) is aligned with deployed NVIDIA driver versions
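The graceful-degradation and `atexit` lifecycle points above can be summarized in a short sketch. The class name and logging behavior are placeholders rather than the device's `NVMLGPUProvider` implementation; only the pynvml calls and the `atexit` registration follow the package's documented API.

```python
import atexit
import logging
import pynvml

logger = logging.getLogger(__name__)

class GPUMonitor:
    """Illustrative read-only GPU monitor with graceful degradation."""

    def __init__(self):
        self.available = False
        try:
            pynvml.nvmlInit()
        except pynvml.NVMLError as exc:
            # No driver / no GPU: log a warning and report 0 GPUs instead of crashing
            logger.warning("NVML unavailable, GPU monitoring disabled: %s", exc)
            return
        self.available = True
        # Ensure NVML resources are released on process termination
        atexit.register(pynvml.nvmlShutdown)

    def device_count(self):
        if not self.available:
            return 0
        try:
            return pynvml.nvmlDeviceGetCount()
        except pynvml.NVMLError as exc:
            logger.warning("GPU count query failed: %s", exc)
            return 0
```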
Risk Control Measures
The following risk control measures are implemented to mitigate potential security and operational risks associated with this SOUP component:
- Version locking via requirements_lock.txt ensures reproducible, auditable deployments
- Read-only usage pattern prevents any GPU configuration changes
- Graceful handling of missing NVIDIA drivers or GPUs
- Exception handling prevents crashes from individual GPU query failures
- Container isolation limits potential impact of any exploitation
- GPU metrics are used internally only; not exposed to external interfaces
Assessment Methodology
The following methodology was used to identify and assess known anomalies:
- Sources consulted:
  - National Vulnerability Database (NVD) search for "nvidia-ml-py" and "pynvml"
  - Snyk vulnerability database for nvidia-ml-py
  - NVIDIA Product Security page
  - NVIDIA Archived Security Bulletins
  - PyPI package security reports
  - GitHub repository issues for related projects (nvidia-ml-py3, pynvml)
- Criteria for determining applicability:
  - Vulnerability must affect deployed versions (nvidia-ml-py 13.590.44)
  - Vulnerability must be exploitable through the device's operational context (read-only GPU monitoring)
  - Attack vector must be reachable through the device's interfaces (internal monitoring only)
  - Graceful degradation, process isolation, and read-only usage must not already mitigate the vulnerability
Signature meaning
The signatures for the approval process of this document can be found in the verified commits at the repository for the QMS. As a reference, the team members who are expected to participate in this document and their roles in the approval process, as defined in Annex I Responsibility Matrix of the GP-001, are:
- Author: Team members involved
- Reviewer: JD-003, JD-004
- Approver: JD-001