TEST_012 The user can send requests and get back the output of the device as a response in a secure, efficient and versatile manner
This test run has been replaced by TEST_014, TEST_015 and TEST_018.
Test type | SYSTEM USER ACCEPTANCE |
Linked activities | MDS-252 |
Tester | Alfonso Medela, Technical Manager |
Supervisor | María Diez, Quality Manager |
Start date | 26 Oct 2023 |
End date | 31 Dec 2023 |
Result (Passed or failed) | - PASSED |
Description
Tests carried out at the REST API receiver to verify that this component allows users to send requests and receive the device output as a response in a secure, efficient, and versatile manner.
This validation encompasses assessing the API's ability to facilitate secure data transmission, ensuring that sensitive medical information remains confidential. Furthermore, it evaluates the efficiency of data retrieval and processing, guaranteeing timely access to critical user data. Additionally, the test examines the API's versatility in accommodating various data formats and response options, thereby enhancing its usability across a wide range of scenarios.
Particular attention was given to user authentication and authorization. We have adopted the OAuth2 protocol, a widely recognized standard, to manage user access. Access tokens, integral to our authentication process, are generated using the JSON Web Token (JWT) standard.
User information and credentials are stored within a NoSQL document-type database. This database design allows us to efficiently manage user data and adapt to the dynamic nature of modern applications.
One crucial security measure we implement is the hashing of user passwords during registration instead of storing them as plaintext. This means we can never identify a user's actual password. Instead, we securely store a cryptographic hash, making it practically impossible for anyone to reverse-engineer or misuse this sensitive information.
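The hashing step described above can be sketched as follows. This is a minimal illustration only: the source does not state which hashing algorithm or parameters are used, so PBKDF2-HMAC-SHA256, the iteration count, and the function names `hash_password` and `verify_password` are all assumptions.

```python
import hashlib
import hmac
import os

# Assumed parameters; the actual algorithm and settings are not given in this record.
ITERATIONS = 600_000
SALT_BYTES = 16


def hash_password(password: str) -> tuple[bytes, bytes]:
    """Return (salt, digest) for storage; the plaintext itself is never stored."""
    salt = os.urandom(SALT_BYTES)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify_password(password: str, salt: bytes, digest: bytes) -> bool:
    """Re-derive the digest from the candidate password and compare in constant time."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return hmac.compare_digest(candidate, digest)
```

Only the salt and the digest would be written to the user document, so a login attempt is checked by re-deriving the digest rather than by reading back a password.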
By combining OAuth2, JWT, and password hashing, with user records held in a NoSQL database, our REST API authenticates and authorizes users while protecting their credentials and privacy.
Test checklist
The following checklist verifies the completion of the goals and metrics specified in the requirement:
Requirement verification
- User can interact with the device
- User cannot interact with the device without an access key
- Data encryption coverage
- Uptime percentage
- Time to recovery
- Response time to incidents
- Detection rate
- System resource utilization
- Cyber attackers cannot compromise device security
- User experience is not impacted when sending multiple images
Evidence
At this time, the only way for a user to access the device is through the API. To let a user know if they can connect to the device, we have provided an API endpoint that serves as a health check. Specifically, the endpoint is available in the root path of the API URL, i.e. after the domain name. If the device is healthy and reachable, then the response the user will receive is as follows:
{"status":"API is up and running","components":"All services are operative"}
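A client could interpret this health-check response along the following lines. This is a sketch under assumptions: the function name `is_healthy` is hypothetical, and the check relies only on the `status` value shown above.

```python
import json

# Status text taken from the documented healthy response.
EXPECTED_STATUS = "API is up and running"


def is_healthy(body: str) -> bool:
    """Return True when the root-endpoint body reports a healthy API."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return False
    return isinstance(payload, dict) and payload.get("status") == EXPECTED_STATUS


sample = '{"status":"API is up and running","components":"All services are operative"}'
print(is_healthy(sample))  # True
```

In practice the body would come from a GET request to the root path of the API; it is passed in as a string here so the check can be exercised without network access.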
New users allowed to use the device are registered in the database through a simple terminal interface hosted only on our servers. We chose this approach over an additional developer-only registration endpoint because it adds another layer of security to the device.
Immediately after registering a user in the database, an automatic email containing the generated username and password is sent to the user. These are the credentials the user should use to acquire a JWT token at the login endpoint provided for this purpose. This is how this authentication endpoint is presented in the /docs path of the API:
If the response from the login endpoint is successful, the user will receive an access token that can be used to access authorized endpoints. The small padlock at the top right of the image means that an API endpoint requires authentication before you can communicate with it.
Finally, it is important to know that a token has an expiration date. This date is determined by the API development team, and is typically no more than one day after the token was created. After this time, the token will no longer be valid and the user will have to authenticate again.
This completes the user authentication process based on the OAuth2 protocol.
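To illustrate how an expiring access token behaves, the sketch below builds and checks a JWT-shaped token using only the Python standard library. It is not the device's implementation: the HS256 algorithm, the `SECRET` key, the one-day `TOKEN_LIFETIME`, and the function names are assumptions, and a production service would normally rely on a maintained JWT library.

```python
import base64
import hashlib
import hmac
import json

SECRET = b"demo-secret"        # hypothetical signing key
TOKEN_LIFETIME = 24 * 60 * 60  # assumed lifetime: one day, in seconds


def _b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _sign(signing_input: str) -> bytes:
    return hmac.new(SECRET, signing_input.encode("ascii"), hashlib.sha256).digest()


def issue_token(username: str, now: float) -> str:
    """Create a signed token whose `exp` claim is TOKEN_LIFETIME seconds after `now`."""
    header = {"alg": "HS256", "typ": "JWT"}
    claims = {"sub": username, "exp": int(now) + TOKEN_LIFETIME}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, claims)
    )
    return f"{signing_input}.{_b64url(_sign(signing_input))}"


def is_valid(token: str, now: float) -> bool:
    """Accept the token only if the signature matches and it has not expired."""
    try:
        header_b64, claims_b64, signature_b64 = token.split(".")
        expected = _sign(f"{header_b64}.{claims_b64}")
        if not hmac.compare_digest(expected, _b64url_decode(signature_b64)):
            return False
        claims = json.loads(_b64url_decode(claims_b64))
    except (ValueError, TypeError):
        return False
    return isinstance(claims, dict) and now < claims.get("exp", 0)
```

Once `now` passes the `exp` claim, `is_valid` returns False and the client has to log in again to obtain a fresh token.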
Additionally, we have to verify the requirements concerning our cloud provider, i.e. AWS. Let's break them down and show evidence that we adhere to each requirement:
Data encryption coverage
To ensure that data in transit is encrypted, our API and all endpoints are only available over HTTPS. That is, on the client side, HTTP communication is encrypted using Transport Layer Security (TLS). We support HTTPS through a reverse proxy server named Traefik, which we configured to generate automatic certificates using an ACME (Automated Certificate Management Environment) provider. In our case, this ACME provider is Let's Encrypt.
Therefore, we guarantee the security of electronic data transmitted by users through the implementation of HTTPS protocol encryption, effectively safeguarding it against interception and manipulation by malicious entities.
On the other hand, we store the data at rest in AWS S3. By default, objects stored in S3 Buckets are automatically encrypted. This ensures that the data is also protected on the server side.
Uptime percentage
Our strategy for measuring the uptime of the medical device API has been to periodically send GET requests to the root endpoint and store in a database whether the request was successful or not, along with other useful information such as the request timestamp or the HTTP response code.
We have calculated the availability rate using the following formula:
Availability = 100 x (Total number of requests - Number of failed requests) / (Total number of requests)
This is the Python script we have prepared to automatically collect the metrics on a periodic basis:
import datetime
import time

import daemon
import requests
from apscheduler.schedulers.background import BackgroundScheduler
from pymongo import MongoClient

# Configuration
API_URL = "https://medical-device.legit.health/"
MONGO_URI = "mongodb://internal-mongodb-uri"
CHECK_INTERVAL = 60  # in seconds


def check_api(db):
    """Send one GET request to the root endpoint and log the outcome."""
    status_code = None
    try:
        response = requests.get(url=API_URL, timeout=10)
        status_code = response.status_code  # Keep the code even for 4xx/5xx responses
        response.raise_for_status()  # Raises an error for 4xx/5xx responses
        success = True
    except requests.RequestException:
        success = False

    # Log to MongoDB
    db.requests.insert_one({
        "timestamp": datetime.datetime.now(datetime.timezone.utc),
        "success": success,
        "http_code": status_code,
    })


def main():
    # Connect to MongoDB after daemonizing, since DaemonContext closes open file descriptors
    client = MongoClient(MONGO_URI)
    db = client.api_monitoring

    scheduler = BackgroundScheduler()
    scheduler.add_job(func=check_api, args=[db], trigger="interval", seconds=CHECK_INTERVAL)
    scheduler.start()
    try:
        # Keep the script running without spinning the CPU
        while True:
            time.sleep(1)
    except (KeyboardInterrupt, SystemExit):
        scheduler.shutdown()


if __name__ == "__main__":
    with daemon.DaemonContext():
        main()
Over 5 months of monitoring the availability of the API service, we obtained an availability of 96.7%. This value is above the target defined for this success metric (>95%).
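The availability formula given in this section can be expressed as a small helper. The request counts in the example are illustrative values chosen to reproduce a 96.7% rate, not the actual counts recorded during monitoring, and the function name is hypothetical.

```python
def availability(total_requests: int, failed_requests: int) -> float:
    """Availability (%) = 100 x (total - failed) / total."""
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    return 100 * (total_requests - failed_requests) / total_requests


# Illustrative counts only.
print(availability(1000, 33))  # 96.7
```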
- Time to recovery
In the event of an outage or service interruption due to a failure of the AWS infrastructure, it is the responsibility of our cloud provider to recover and get the service back up and running as soon as possible. We have found over the years using their services that this recovery time in the vast majority of cases is less than one hour, complying with the objective of this requirement.
However, we have adopted a recovery strategy in case the responsibility for the service interruption lies on our side. In case we need to restart the EC2 instance for any reason, the Docker Compose file we use as part of our device deployment pipeline has a restart option that is set to always. This means that when the Docker daemon is restarted it will take care of bringing back up all the microservices that comprise the medical device (including the API).
Since this process does not require any human intervention, the response time to a failure is very low and we have never exceeded the maximum time limit established for the recovery of the system.
- Response time to incidents and detection rate
In order to quantify the incident response time and incident detection rate, we are compiling several metrics of interest in a table that will later allow us to carry out various kinds of data analysis. Some of these metrics come from CloudWatch, the AWS monitoring service, but most of them have been filled in manually, as they are custom application-level metrics.
Below is a sample of the database table where the indicated metrics are stored:
Issue ID | Issue Description | Reported Timestamp | Severity Level | Component Affected | Reporter | Resolution Started Timestamp | Detected | Resolved Timestamp | Resolution Status |
---|---|---|---|---|---|---|---|---|---|
1 | Timeout errors in user authentication | 2024-01-15 09:16 AM | High | Authentication | Automated Monitoring | 2024-01-15 09:40 AM | Yes | 2024-01-15 09:55 AM | Resolved |
18 | Database connection failure | 2024-01-16 11:20 AM | High | Database | User | 2024-01-16 11:34 AM | Yes | 2024-01-16 12:06 PM | Resolved |
91 | Slow API response for data retrieval | 2024-01-17 02:12 PM | Medium | API Endpoint | User | 2024-01-17 02:22 PM | Yes | 2024-01-17 02:41 PM | Resolved |
111 | Memory leak in server | 2024-01-18 10:02 AM | High | Virtual Machine | Automated Monitoring | 2024-01-18 10:30 AM | Yes | - | In progress |
125 | Inconsistent data output | 2024-01-19 01:52 PM | Low | Data Processing | User | 2024-01-19 02:07 PM | No | 2024-01-19 03:12 PM | Resolved |
176 | API gateway timeout | 2024-01-20 08:45 AM | Medium | Gateway | Automated Monitoring | 2024-01-20 08:59 AM | Yes | 2024-01-20 09:23 AM | Resolved |
189 | Incorrect user permissions set | 2024-01-21 03:33 PM | Low | User Management | User | 2024-01-21 04:14 PM | Yes | - | Pending |
201 | … | … | … | … | … | … | … | … | … |
The description of the metrics/columns shown in the table is detailed as follows:
- Issue ID: Code identifier of the incident.
- Issue Description: Brief description of the incident.
- Reported Timestamp: When the issue was first identified.
- Severity Level: To indicate the urgency or impact of the issue (Low, Medium, High, etc.).
- Component Affected: Specifies which part of the system is affected.
- Reporter: Who identified the issue (User, Automated Monitoring, etc.)
- Resolution Started Timestamp: When the resolution process began.
- Detected: Whether the incident was automatically detected or not.
- Resolved Timestamp: When the issue was resolved or marked as closed.
- Resolution Status: Indicates whether the issue has been resolved, is in progress, or is pending.
After performing an analysis with the metrics collected over a period of 5 months, we have found that the average response time to an incident is around 26.5 minutes, while the incident detection rate is 95.2%.
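The two figures could be derived from such records roughly as sketched below. The source does not define response time precisely, so this sketch assumes it is the interval between the Reported Timestamp and the Resolution Started Timestamp. The records, field names, and function names are illustrative, not the real dataset.

```python
from datetime import datetime
from statistics import mean

# Illustrative records in 24-hour time; not the real dataset.
incidents = [
    {"reported": "2024-01-15 09:16", "started": "2024-01-15 09:40", "detected": True},
    {"reported": "2024-01-17 14:12", "started": "2024-01-17 14:22", "detected": True},
    {"reported": "2024-01-19 13:52", "started": "2024-01-19 14:07", "detected": False},
]

TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M"


def response_minutes(incident: dict) -> float:
    """Minutes between the report and the start of the resolution (assumed definition)."""
    reported = datetime.strptime(incident["reported"], TIMESTAMP_FORMAT)
    started = datetime.strptime(incident["started"], TIMESTAMP_FORMAT)
    return (started - reported).total_seconds() / 60


def summarize(records: list[dict]) -> tuple[float, float]:
    """Return (average response time in minutes, detection rate in percent)."""
    average = mean(response_minutes(record) for record in records)
    detection_rate = 100 * sum(record["detected"] for record in records) / len(records)
    return average, detection_rate


average, detection_rate = summarize(incidents)
print(f"{average:.1f} min, {detection_rate:.1f}% detected")
```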
- System resource utilization
Our medical device runs on an EC2 instance with at least one GPU. AWS CloudWatch, by default, monitors CPU and network usage metrics, providing data like CPU utilization and network traffic. However, for RAM and GPU usage monitoring, it is required to install the CloudWatch Agent in the EC2 instance, which extends CloudWatch's capabilities to include detailed memory and GPU performance metrics.
To facilitate visualization and decision making, we have created a dashboard in CloudWatch where these metrics are represented along with other metrics such as storage disk I/O operations. The dashboard looks like this:
This dashboard is checked at least once every three days and, except for some very exceptional peak usage, the average resource consumption on the machine where the medical device is deployed remains within optimal ranges. Under normal operating conditions, it does not exceed 15-20% of the available CPU, and 40-50% in the case of RAM.
- User experience is not impacted when sending multiple images
This test has been conducted for the diagnosis_support endpoint of the Web API receiver microservice, as currently it is the only service that accepts multiple images.
The experiment setup was as follows:
- Max. number of images: 5
- Image format: JPEG
- Image size: 600 x 600
- Number of executions: 10
- Negative impact threshold: >30 secs
The procedure consisted of sending several requests with different batches of images (1, 2, 3, 4 and 5), and measuring the API response time for each batch. Moreover, the request with the same payload is replayed as many times as executions have been set for the experiment. On the other hand, a maximum of 5 images has been chosen because it is the upper limit that the web API allows the user to send, so that the system is not overwhelmed.
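The procedure above can be sketched as a small timing harness. The actual automation script is not reproduced in this record, so this is only an outline: `send` is a hypothetical callable that would post a batch of the given size to the endpoint, and `measure_batches` is an assumed name.

```python
import time
from statistics import mean
from typing import Callable


def measure_batches(
    send: Callable[[int], None],
    max_images: int = 5,
    executions: int = 10,
) -> dict[int, float]:
    """Return the mean response time in seconds for each batch size from 1 to max_images."""
    results: dict[int, float] = {}
    for n_images in range(1, max_images + 1):
        timings = []
        for _ in range(executions):
            start = time.perf_counter()
            send(n_images)  # hypothetical: sends one request with n_images images
            timings.append(time.perf_counter() - start)
        results[n_images] = mean(timings)
    return results
```

Passing the sender in as a parameter keeps the timing logic independent of the HTTP client, so the harness can be exercised with a stub before it is pointed at the real endpoint.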
After setting up the experiment, the script that automates it has been run and the following graph has been obtained:
This graph shows that the success metric is met: the user experience is not negatively affected when sending multiple images. We consider a negative experience to be not only a response time longer than 5 seconds but also an exponential (or nearly exponential) increase of the response time with the number of images. Instead, we observe that the response time grows in an apparently linear fashion, at a rate of just over 0.5 seconds per additional image.
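A growth rate of this kind can be estimated with an ordinary least-squares line through the mean response times. The sketch below uses illustrative timings, not the measured values from the experiment, and the function name `linear_fit` is hypothetical.

```python
def linear_fit(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Illustrative mean response times (seconds) for batches of 1 to 5 images.
images = [1, 2, 3, 4, 5]
seconds = [1.1, 1.6, 2.2, 2.7, 3.3]
slope, intercept = linear_fit(images, seconds)
print(f"about {slope:.2f} s per additional image")
```

A roughly constant slope across batch sizes is what distinguishes linear growth from the exponential pattern that would count as a negative experience.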
Finally, we have verified the requirements concerning the cybersecurity of our device.
- Penetration testing
We have conducted a cybersecurity evaluation through rigorous penetration testing to assess the resilience of our device against potential cyber threats. This penetration test simulated a range of sophisticated cyber-attack scenarios, meticulously probing for vulnerabilities in our system.
Detailed information on the penetration test execution and its configuration is given below:
- Device version: 2.0.0
- Date: January 24th, 2024
- Tester: The penetration testing was executed by Intruder.io (http://Intruder.io), a well-regarded cybersecurity company (ISO 27001 and SOC 2 compliant) specializing in comprehensive penetration tests and security assessments that follow OWASP guidelines.
- Environment: The testing was conducted in a staging environment that closely mirrors our production setup. This includes identical network configurations, similar hardware, and software stacks, ensuring realistic testing conditions without affecting live operations.
- Target(s): https://medical-device.legit.health
- Scan time: This scan ran from 2024-01-24 00:00:54 UTC to 2024-01-24 00:33:58 UTC
The image below comes from the report issued by the API penetration testing service. It provides detailed insights on the simulated scenarios, the types of checks that have been performed, and the issues discovered.
The penetration test has underscored the strength of our system's defenses, demonstrating notable robustness and resilience against a variety of simulated cyber-attacks. Our proactive approach to security and continuous improvement efforts have ensured that our infrastructure and applications are safeguarded against numerous threats. However, in the spirit of continuous improvement and vigilance, the test did identify two areas for enhancement:
- MongoDB Database Exposed to the Internet: A MongoDB database, which is usually intended only to be accessible on local networks (i.e. not exposed to the public) was discovered to be accessible over the internet. Databases are designed to be repositories for business information and should never be directly exposed to the internet. Exposing this database to the internet increases our risk as an organization in three ways, as an attacker could:
- Attempt to use default, common or stolen credentials to login.
- Use privately owned/publicly unknown vulnerabilities to compromise the database.
- Exploit newly released vulnerabilities before we have time to patch the system.
We took immediate corrective measures to restrict external access and implemented stringent security controls, including a firewall policy to only permit access to allowed IP addresses, and advanced authentication protocols to secure the database.
- Strict Transport Security HTTP Header Not Set: The absence of the HTTP Strict Transport Security (HSTS) header was noted in the server response. The HTTP Strict Transport Security policy defines a timeframe within which a browser must connect to the server via HTTPS. The header adds additional protection against MitM (Man-in-the-Middle) attacks by instructing the user's web browser not to connect to the server unless it is done so over HTTPS with a valid certificate. This helps prevent an attacker in a MitM position from tricking the user into connecting to an attacker controlled server which is impersonating the targeted site.
Prompt action was taken to configure our server to include the HSTS header, thus strengthening the security of communications between our clients and servers.
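Whether a response carries the HSTS header can be verified with a check like the one below. This is a sketch: the function name `hsts_max_age` is hypothetical, and the header value in the example is a common illustrative setting, not necessarily the one configured on our server.

```python
import re
from typing import Optional


def hsts_max_age(headers: dict[str, str]) -> Optional[int]:
    """Return the HSTS max-age in seconds, or None if the header is absent or malformed."""
    for name, value in headers.items():
        # Header names are case-insensitive.
        if name.lower() == "strict-transport-security":
            match = re.search(r'max-age\s*=\s*"?(\d+)', value, flags=re.IGNORECASE)
            return int(match.group(1)) if match else None
    return None


# Illustrative response headers.
example = {"Strict-Transport-Security": "max-age=31536000; includeSubDomains"}
print(hsts_max_age(example))  # 31536000
```

The `max-age` value is the timeframe, in seconds, during which the browser must use HTTPS for the host.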
Signature meaning
The signatures for the approval process of this document can be found in the verified commits at the repository for the QMS. As a reference, the team members who are expected to participate in this document and their roles in the approval process, as defined in Annex I Responsibility Matrix of the GP-001, are:
- Tester: JD-017, JD-009, JD-004
- Approver: JD-005