PyNVML
General Information
| Field | Value |
|---|---|
| Package Name | nvidia-ml-py (pynvml) |
| Manufacturer / Vendor | NVIDIA Corporation |
| Software Category | Library |
| Primary Documentation | NVML Documentation, PyPI, pyNVML Docs |
| Programming Language(s) | Python, C |
| License | BSD-3-Clause |
| Deployed Version(s) | >=12.560.30 (version-locked at 13.590.44 across expert microservices) |
| Most Recent Available Version | 13.590.48 |
| Last Review Date | 2026-01-27 |
Overview
nvidia-ml-py provides Python bindings for the NVIDIA Management Library (NVML), a C-based programmatic interface for monitoring and managing NVIDIA GPUs. The package wraps NVML functions as Python methods using ctypes, converting NVML error codes into Python exceptions for clean error handling. NVML is the underlying library powering NVIDIA's nvidia-smi command-line tool and is designed as a platform for building third-party GPU management applications. The package is officially published and maintained by NVIDIA Corporation.
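For example, the bindings raise typed `NVMLError` subclasses instead of returning C status codes. The following is a minimal sketch (the pre-initialization call is made purely to provoke an error, not a recommended usage pattern):

```python
import pynvml

try:
    # Calling into NVML before nvmlInit() fails with NVML_ERROR_UNINITIALIZED in C;
    # the bindings surface it as a typed Python exception instead of a status code.
    pynvml.nvmlDeviceGetCount()
except pynvml.NVMLError as err:
    print(type(err).__name__, err)  # e.g. NVMLError_Uninitialized
```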
Within the medical device software, nvidia-ml-py serves as the GPU resource monitoring and detection layer within the distributed AI inference infrastructure. It is integrated into the legithp-expert framework, which provides the foundation for all 50+ clinical expert microservices. Specifically, nvidia-ml-py is used for:
- GPU device detection: The `NVMLGPUProvider` adapter uses NVML to enumerate available CUDA GPUs, retrieve device handles, and query device counts during microservice initialization
- Static device information: Retrieves immutable GPU properties including device name/model, total memory capacity, and CUDA compute capability for infrastructure logging and resource planning
- Runtime metrics collection: Queries dynamic GPU metrics including current memory usage, GPU utilization percentage, and temperature for operational monitoring
- Resource management: The `SystemInfoService` aggregates GPU metrics alongside CPU, memory, and disk usage to provide comprehensive resource visibility for the inference platform
- Fallback architecture: Part of a provider chain where `FallbackGPUProvider` attempts PyTorch GPU detection first, falling back to direct NVML queries when PyTorch detection is insufficient (see the sketch after this list)
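The fallback chain can be illustrated with a minimal sketch. The names `FallbackGPUProvider` and `NVMLGPUProvider` and the exact fallback criteria are internal to the device software and are not reproduced here; the snippet below only shows, under those assumptions, how a PyTorch-first detection path might fall back to direct pynvml calls when PyTorch reports no usable devices.

```python
import pynvml

def _gpus_via_torch():
    """Attempt GPU detection through PyTorch's CUDA interface (may be unavailable)."""
    try:
        import torch
        if not torch.cuda.is_available():
            return []
        return [torch.cuda.get_device_name(i) for i in range(torch.cuda.device_count())]
    except Exception:
        return []

def _gpus_via_nvml():
    """Fall back to direct NVML enumeration via nvidia-ml-py."""
    try:
        pynvml.nvmlInit()
    except pynvml.NVMLError:
        return []  # driver or library missing: report no GPUs rather than fail
    try:
        names = []
        for i in range(pynvml.nvmlDeviceGetCount()):
            handle = pynvml.nvmlDeviceGetHandleByIndex(i)
            name = pynvml.nvmlDeviceGetName(handle)
            # Older pynvml releases return bytes, newer ones return str
            names.append(name.decode() if isinstance(name, bytes) else name)
        return names
    finally:
        pynvml.nvmlShutdown()

def detect_gpus():
    """PyTorch first, NVML second, mirroring the fallback order described above."""
    return _gpus_via_torch() or _gpus_via_nvml()

if __name__ == "__main__":
    print(detect_gpus())
```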
nvidia-ml-py was selected over alternatives due to:
- Official support and maintenance by NVIDIA Corporation with regular updates aligned to driver releases
- Direct access to low-level NVML functionality not exposed through PyTorch's CUDA interface
- Permissive BSD-3-Clause license compatible with commercial medical device software
- Graceful degradation when NVIDIA drivers are not installed or GPUs are not present
- Comprehensive GPU metrics (utilization, temperature) beyond what PyTorch exposes
- Clean Python exception handling for NVML error codes
Functional Requirements
The following functional capabilities of this SOUP are relied upon by the medical device software.
| Requirement ID | Description | Source / Reference |
|---|---|---|
| FR-001 | Initialize the NVML library for subsequent API calls | pynvml.nvmlInit() function |
| FR-002 | Clean shutdown of NVML library resources | pynvml.nvmlShutdown() function |
| FR-003 | Query the total number of NVIDIA GPUs available on the system | pynvml.nvmlDeviceGetCount() function |
| FR-004 | Obtain a device handle for a specific GPU by index | pynvml.nvmlDeviceGetHandleByIndex() function |
| FR-005 | Retrieve the name/model of a GPU device | pynvml.nvmlDeviceGetName() function |
| FR-006 | Query GPU memory information (total and used bytes) | pynvml.nvmlDeviceGetMemoryInfo() function |
| FR-007 | Retrieve CUDA compute capability version (major, minor) | pynvml.nvmlDeviceGetCudaComputeCapability() function |
| FR-008 | Query GPU utilization percentage | pynvml.nvmlDeviceGetUtilizationRates() function |
| FR-009 | Query GPU temperature in Celsius | pynvml.nvmlDeviceGetTemperature() with NVML_TEMPERATURE_GPU |
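For reference, the functions in the table above can be exercised with a short pynvml script. This is a hedged sketch rather than device code: the function names and struct fields (`used`, `total`, `gpu`) are from the published nvidia-ml-py API, while the loop and output format are illustrative.

```python
import pynvml

pynvml.nvmlInit()                                            # FR-001: initialize NVML
try:
    count = pynvml.nvmlDeviceGetCount()                      # FR-003: number of GPUs
    for index in range(count):
        handle = pynvml.nvmlDeviceGetHandleByIndex(index)    # FR-004: device handle
        name = pynvml.nvmlDeviceGetName(handle)              # FR-005: device name/model
        if isinstance(name, bytes):                          # bytes on older releases
            name = name.decode()
        mem = pynvml.nvmlDeviceGetMemoryInfo(handle)         # FR-006: total/used bytes
        major, minor = pynvml.nvmlDeviceGetCudaComputeCapability(handle)  # FR-007
        util = pynvml.nvmlDeviceGetUtilizationRates(handle)  # FR-008: utilization %
        temp = pynvml.nvmlDeviceGetTemperature(              # FR-009: temperature (C)
            handle, pynvml.NVML_TEMPERATURE_GPU)
        print(f"GPU {index}: {name}, cc {major}.{minor}, "
              f"{mem.used}/{mem.total} B, {util.gpu}% util, {temp} C")
finally:
    pynvml.nvmlShutdown()                                    # FR-002: clean shutdown
```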
Performance Requirements
The following performance expectations are relevant to the medical device software.
| Requirement ID | Description | Acceptance Criteria |
|---|---|---|
| PR-001 | NVML initialization shall complete within acceptable startup time | Library initialization does not dominate service startup latency |
| PR-002 | GPU metric queries shall not introduce significant overhead | Metric queries complete in < 10ms under normal conditions |
| PR-003 | Library shall not cause memory leaks during continuous operation | Stable memory footprint with repeated metric polling |
| PR-004 | Shutdown shall release all NVML resources cleanly | No resource leaks on process termination |
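A simple polling loop gives a rough spot-check of PR-002 and PR-003. The iteration count and the assertion below are illustrative test parameters, not values from the device's verification protocol; the 10 ms threshold mirrors PR-002.

```python
import time
import pynvml

pynvml.nvmlInit()
try:
    handle = pynvml.nvmlDeviceGetHandleByIndex(0)
    iterations = 1000
    start = time.perf_counter()
    for _ in range(iterations):
        # The metric queries that would be polled repeatedly in operation
        pynvml.nvmlDeviceGetMemoryInfo(handle)
        pynvml.nvmlDeviceGetUtilizationRates(handle)
    per_poll_ms = (time.perf_counter() - start) / iterations * 1000.0
    print(f"average poll latency: {per_poll_ms:.3f} ms")
    assert per_poll_ms < 10.0, "metric polling exceeds the 10 ms expectation (PR-002)"
finally:
    pynvml.nvmlShutdown()
```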
Hardware Requirements
The following hardware dependencies or constraints are imposed by this SOUP component.
| Requirement ID | Description | Notes / Limitations |
|---|---|---|
| HR-001 | NVIDIA GPU hardware | Required for meaningful operation; library gracefully reports 0 GPUs if absent |
| HR-002 | NVIDIA GPU drivers installed on the host system | NVML is provided as part of the NVIDIA driver package |
| HR-003 | x86-64 or ARM64 processor architecture | Pre-built wheels available for common platforms |
Software Requirements
The following software dependencies and environmental assumptions are required by this SOUP component.
| Requirement ID | Description | Dependency / Version Constraints |
|---|---|---|
| SR-001 | Python runtime environment | Python >=3.6 (ctypes module required) |
| SR-002 | NVIDIA GPU drivers with NVML library | Driver version compatible with deployed NVML version |
| SR-003 | libnvidia-ml shared library | Provided by NVIDIA driver installation |
Known Anomalies Assessment
This section evaluates publicly reported issues, defects, or security vulnerabilities associated with this SOUP component and their relevance to the medical device software.
A comprehensive search of security vulnerability databases was conducted for the nvidia-ml-py Python package. No CVEs or security advisories have been reported specifically targeting nvidia-ml-py as of the review date.
While no vulnerabilities affect the Python bindings directly, the following related NVIDIA vulnerabilities were assessed for potential applicability to the device's GPU monitoring infrastructure:
| Anomaly Reference | Status | Applicable | Rationale | Reviewed At |
|---|---|---|---|---|
| CVE-2025-23266 (NVIDIA Container Toolkit) | Fixed | No | Critical (CVSS 9.0) container escape vulnerability in NVIDIA Container Toolkit. Not applicable: this CVE affects the container toolkit, not the NVML library or Python bindings. The device uses standard driver installations, not container toolkit | 2026-01-27 |
| CVE-2024-0126 (GPU Display Drivers) | Fixed | No | Code execution vulnerability in GPU display drivers. Not applicable: the device deploys with driver versions that include fixes; nvidia-ml-py is a query-only interface that does not execute arbitrary code on the GPU | 2026-01-27 |
The package provides Python bindings to NVML, which is included in the NVIDIA driver package. Security issues affecting NVML itself would be addressed through driver updates rather than Python package updates, as the Python bindings are thin wrappers around the driver-provided shared library.
The device's usage pattern minimizes attack surface exposure:
- Read-only operations: The device uses nvidia-ml-py exclusively for querying GPU information (device count, memory, utilization, temperature); no write operations or GPU configuration changes are performed
- Internal monitoring only: GPU metrics are collected for internal resource monitoring and logging; no GPU information is exposed to external users or APIs
- Graceful degradation: The `NVMLGPUProvider` implementation handles NVML initialization failures gracefully, logging warnings and reporting 0 GPUs rather than crashing
- Process isolation: Each expert microservice runs in an isolated container with the GPU provider instantiated per-process
- Version locking: Requirements lock files pin nvidia-ml-py to version 13.590.44 across all expert microservices
- Lifecycle management: NVML shutdown is registered via `atexit` to ensure clean resource release on process termination (a sketch of this pattern follows this list)
- Driver compatibility: The locked nvidia-ml-py version (13.590.x) is aligned with deployed NVIDIA driver versions
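The graceful-degradation and `atexit` lifecycle points above can be summarized in a short sketch. The class name and logging behavior are placeholders rather than the device's `NVMLGPUProvider` implementation; only the pynvml calls and the `atexit` registration follow the package's documented API.

```python
import atexit
import logging
import pynvml

logger = logging.getLogger(__name__)

class GPUMonitor:
    """Illustrative read-only GPU monitor with graceful degradation."""

    def __init__(self):
        self.available = False
        try:
            pynvml.nvmlInit()
        except pynvml.NVMLError as exc:
            # No driver / no GPU: log a warning and report 0 GPUs instead of crashing
            logger.warning("NVML unavailable, GPU monitoring disabled: %s", exc)
            return
        self.available = True
        # Ensure NVML resources are released on process termination
        atexit.register(pynvml.nvmlShutdown)

    def device_count(self):
        if not self.available:
            return 0
        try:
            return pynvml.nvmlDeviceGetCount()
        except pynvml.NVMLError as exc:
            logger.warning("GPU count query failed: %s", exc)
            return 0
```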
Risk Control Measures
The following risk control measures are implemented to mitigate potential security and operational risks associated with this SOUP component:
- Version locking via requirements_lock.txt ensures reproducible, auditable deployments
- Read-only usage pattern prevents any GPU configuration changes
- Graceful handling of missing NVIDIA drivers or GPUs
- Exception handling prevents crashes from individual GPU query failures
- Container isolation limits potential impact of any exploitation
- GPU metrics are used internally only; not exposed to external interfaces
Assessment Methodology
The following methodology was used to identify and assess known anomalies:
- Sources consulted:
  - National Vulnerability Database (NVD) search for "nvidia-ml-py" and "pynvml"
  - Snyk vulnerability database for nvidia-ml-py
  - NVIDIA Product Security page
  - NVIDIA Archived Security Bulletins
  - PyPI package security reports
  - GitHub repository issues for related projects (nvidia-ml-py3, pynvml)
- Criteria for determining applicability:
  - Vulnerability must affect deployed versions (nvidia-ml-py 13.590.44)
  - Vulnerability must be exploitable through the device's operational context (read-only GPU monitoring)
  - Attack vector must be reachable through the device's interfaces (internal monitoring only)
  - Graceful degradation, process isolation, and read-only usage must not already mitigate the vulnerability
Signature meaning
The signatures for the approval process of this document can be found in the verified commits at the repository for the QMS. As a reference, the team members who are expected to participate in this document and their roles in the approval process, as defined in Annex I Responsibility Matrix of the GP-001, are:
- Author: Team members involved
- Reviewer: JD-003, JD-004
- Approver: JD-001