Pandas
Description
Pandas is a Python library for data manipulation and analysis. It offers data structures, such as DataFrames and Series, and operations for manipulating numerical tables and time series. These structures handle a wide range of data types and come with tools for reading and writing data, handling missing data, filtering data, and cleaning messy data. These features make it a convenient tool for data science, finance, and many areas of data analysis and machine learning.
General details
- Developer(s): Wes McKinney started Pandas development in 2008, with significant contributions from many other developers over the years.
- Open source: Yes
- Language(s): Primarily written in Python, with some parts in Cython for performance.
- Repository: https://github.com/pandas-dev/pandas
- License: BSD-3-Clause
- Operating system(s): Compatible with Linux, Windows, and macOS.
- Actively maintained: Yes (within the past week)
Intended use on the device
The SOUP is used in the medical device for the following specific purposes only:
- Perform processing, filtering, sorting and categorising operations on clinical and non-clinical data that naturally fits into a table.
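The kind of operation described above can be sketched as follows. This is a minimal illustration, not the device's actual code: the column names, values, and severity bands are hypothetical and chosen only to show filtering, sorting and categorising on a small table.

```python
import pandas as pd

# Hypothetical tabular records; column names and values are illustrative only.
records = pd.DataFrame(
    {
        "patient_id": ["p1", "p2", "p3", "p4"],
        "age": [34, 71, 52, 16],
        "severity_score": [2.5, 7.0, 4.5, 9.0],
    }
)

# Filtering: keep only rows where age is at least 18.
adults = records[records["age"] >= 18]

# Sorting: order the remaining rows by score, highest first.
ranked = adults.sort_values("severity_score", ascending=False).reset_index(drop=True)

# Categorising: bin the numeric score into labelled bands.
# The bin edges below are assumptions made for this example.
ranked["severity_band"] = pd.cut(
    ranked["severity_score"],
    bins=[0, 3, 6, 10],
    labels=["mild", "moderate", "severe"],
)

print(ranked)
```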
Requirements
For the integration and safe usage of this SOUP within a software system, it's important to outline both functional and performance requirements. These requirements help mitigate risks and ensure compatibility and performance standards are met.
Functional
- Data structures: Support for high-level data structures like DataFrames and Series, capable of handling large datasets efficiently.
- Data manipulation: Capabilities to perform operations such as merging, reshaping, selecting, pivoting, and data cleaning.
- Missing data handling: Features to easily detect, remove, or fill missing data.
- Time series analysis: Tools for working with dates and times, including date range generation, moving window functions, and shifting and lagging of data.
- File I/O: Ability to read and write data in various formats, including CSV, JSON, Parquet, SQL databases, and HDF5.
- Data aggregation and grouping: Support for grouping data and performing aggregate functions for summarisation, transformation, and filtration.
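Several of the functional requirements listed above can be exercised with a few standard Pandas calls. The sketch below is illustrative only: the table contents, the `site` lookup table, and the choice to fill missing values with `0.0` are assumptions made for the example.

```python
import pandas as pd

# Hypothetical measurements; names and values are illustrative only.
measurements = pd.DataFrame(
    {
        "site": ["A", "A", "B", "B"],
        "day": pd.date_range("2024-01-01", periods=4, freq="D"),
        "value": [1.0, None, 3.0, 5.0],
    }
)

# Missing data handling: count missing entries, then fill them with a placeholder.
n_missing = int(measurements["value"].isna().sum())
filled = measurements.fillna({"value": 0.0})

# Data aggregation and grouping: mean value per site.
per_site = filled.groupby("site")["value"].mean()

# Data manipulation (merging): attach a hypothetical site-name lookup table.
sites = pd.DataFrame({"site": ["A", "B"], "site_name": ["North", "South"]})
merged = filled.merge(sites, on="site", how="left")

# Time series analysis: moving average over a window of two consecutive rows.
rolling = filled.set_index("day")["value"].rolling(window=2).mean()

print(n_missing, per_site.to_dict())
print(merged)
print(rolling)
```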
Performance
- Speed: Optimised for performance through internal use of C code and efficient algorithms, including vectorised operations that accelerate data manipulation tasks.
- Resource utilisation: Efficient memory management, particularly when handling large datasets or when performing operations that typically require significant memory overhead.
- Scalability: Ability to handle larger-than-memory data sets efficiently, leveraging chunk processing or integration with other tools like Dask for distributed computing.
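Chunk processing, mentioned in the scalability requirement, can be sketched with the `chunksize` parameter of `pd.read_csv`, which yields the input as a sequence of smaller DataFrames. The in-memory CSV and the chunk size of 2 are stand-ins chosen so the example is self-contained; a real workload would read from a file and use a much larger chunk size.

```python
import io

import pandas as pd

# Stand-in for a large CSV file; in practice this would be a file path.
csv_source = io.StringIO("value\n1\n2\n3\n4\n5\n")

total = 0
rows = 0

# With chunksize set, read_csv returns an iterator of DataFrames,
# so only one chunk needs to be held in memory at a time.
for chunk in pd.read_csv(csv_source, chunksize=2):
    total += int(chunk["value"].sum())
    rows += len(chunk)

print(total, rows)
```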
System requirements
Establishing minimum software and hardware requirements is important to mitigate risks, such as security vulnerabilities, performance issues, or compatibility problems, and to ensure that the SOUP functions effectively within the intended environment.
Software
After evaluation, we find that there are no specific software requirements for this SOUP. It works properly on standard computing devices, which includes our environment.
Hardware
After evaluation, we find that there are no specific hardware requirements for this SOUP. It works properly on standard computing devices, which includes our environment.
Documentation
The official SOUP documentation can be found at https://pandas.pydata.org/docs/
Additionally, a criterion for validating the SOUP is that all the items of the following checklist are satisfied:
- The vendor maintains clear and comprehensive documentation of the SOUP describing its functional capabilities, user guidelines, and tutorials, which facilitates learning and rapid adoption.
- The documentation for the SOUP is regularly updated and clearly outlines every feature utilised by the medical device, doing so for all integrated versions of the SOUP.
Related software items
We catalog the interconnections between the microservices within our software architecture and the specific versions of the SOUP they utilise. This mapping ensures clarity and traceability, facilitating both the understanding of the system's dependencies and the management of SOUP components.
Although the title of this section mentions software items, SOUP versions are mapped to microservices (which are also considered software items) because each microservice runs in its own Docker container and therefore has its own isolated runtime environment.
SOUP version | Software item(s) |
---|---|
2.2.1 | REPORT BUILDER ICD MULTICLASS CLASSIFIER |
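One way a container could confirm at startup that its installed SOUP version matches the version recorded in this table is sketched below. This is an assumption about how such a check might look, not a description of the device's implementation; the function name `installed_version_matches` and the constant `RECORDED_PANDAS_VERSION` are hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version

# Version recorded in the table above.
RECORDED_PANDAS_VERSION = "2.2.1"


def installed_version_matches(package: str, expected: str) -> bool:
    """Return True only if `package` is installed at exactly `expected`."""
    try:
        return version(package) == expected
    except PackageNotFoundError:
        # A missing package can never match the recorded version.
        return False


if __name__ == "__main__":
    ok = installed_version_matches("pandas", RECORDED_PANDAS_VERSION)
    print("pandas version matches record:", ok)
```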
Related risks
The following are risks applicable to this SOUP from the table found in document R-TF-013-002 Risk management record_2023_001:
- 58. SOUP presents an anomaly that makes it incompatible with other SOUPs or with software elements of the device.
- 59. SOUP is not being maintained nor regularly patched.
- 60. SOUP presents cybersecurity vulnerabilities.
Lists of published anomalies
The incidents, anomalies, known issues or changes between versions for this SOUP can be found at:
History of evaluation of SOUP anomalies
29 Feb 2024
- Reviewer of the anomalies: Alejandro Carmena Magro
- Version(s) of the SOUP reviewed: 2.2.1
No anomalies have been found.
Record signature meaning
- Author: JD-004
- Reviewer: JD-003
- Approver: JD-005