OP.PL.4 Capacity Sizing and Management

Reference documents

  • ISO/IEC 27000
    • 27002:2013
      • 12.1.3 - Capacity management
  • NIST SP 800-53 rev.4
    • [SA-2] Allocation of Resources
    • [AU-4] Audit Storage Capacity

Definitions

  • EAL. Evaluation Assurance Level: the level of confidence assigned to an evaluation.
  • TOE. Target of Evaluation: the product or system under evaluation.

Implementation guidance

  1. It is worth stressing that this security measure is not merely technical: it has budgetary implications and must therefore be managed well in advance so that the needs are properly reflected in the budgets. Improvisation should be avoided in every security measure, and in this one all the more so.

  2. Note that in flexible environments, such as those that use cloud resources, the effective sizing of the system can be dynamic, adapting to the needs of the service.

Implementation in Legit Health Plus

1. Capacity Sizing and Management Framework

1.1 Capacity Management Strategy

Legit Health Plus implements a proactive approach to capacity management that covers:

  • Capacity planning based on growth projections
  • Continuous monitoring of resources and performance
  • Automatic scaling on cloud infrastructure
  • Cost optimization through rightsizing
  • Handling of seasonal or exceptional demand peaks
  • Medium- and long-term budget planning

1.2 Multi-level Capacity Model

Capacity management is structured in multiple levels.

2. Sizing by Component

2.1 Application Infrastructure

Core Services - Current Sizing:

| Component | Current Configuration | Maximum Capacity | Target Utilization |
| --- | --- | --- | --- |
| API Gateway | 4x ECS Tasks (2 vCPU, 4GB) | 1000 req/sec | 70% CPU |
| ML Inference | 2x GPU instances (g4dn.xlarge) | 100 concurrent | 80% GPU |
| Image Processing | Auto-scaling 2-10 tasks | 500 images/min | 75% CPU |
| Auth Service | 2x ECS Tasks (1 vCPU, 2GB) | 200 auth/sec | 60% CPU |
| Background Jobs | 3x ECS Tasks (1 vCPU, 2GB) | 1000 jobs/hour | 70% CPU |

2.2 Data Infrastructure

Storage - Planned Capacity:

| System | Configuration | Current Capacity | 12-Month Projection | Annual Growth |
| --- | --- | --- | --- | --- |
| DocumentDB | 3-node cluster (r6g.large) | 1TB | 5TB | 400% |
| S3 Storage | Standard + IA tiers | 50TB | 200TB | 300% |
| Redis Cache | 2x r6g.large cluster | 32GB | 128GB | 300% |
| CloudWatch Logs | 30-day retention | 500GB/month | 2TB/month | 300% |
| Backup Storage | S3 Glacier Deep Archive | 100TB | 500TB | 400% |

2.3 Network and Connectivity

Planned Bandwidth:

Network Capacity Planning:
  Internet Gateway:
    current: 10 Gbps
    projected: 50 Gbps
    bottlenecks: Image upload bursts

  Inter-AZ Traffic:
    current: 5 Gbps
    projected: 20 Gbps
    pattern: DB replication, cross-AZ failover

  VPN Connections:
    current: 2x 1 Gbps
    projected: 4x 10 Gbps
    usage: Healthcare provider integrations

  CDN (CloudFront):
    current: Unlimited
    cost_optimization: Regional caching strategy

3. Demand Models and Projections

3.1 Identified Usage Patterns

Historical Demand Analysis:

| Metric | Q1 2024 | Q2 2024 | Q3 2024 | Q4 2024 | Q1 2025 Projection |
| --- | --- | --- | --- | --- | --- |
| Active Users | 5,000 | 8,000 | 12,000 | 18,000 | 27,000 |
| Images/Day | 10,000 | 16,000 | 24,000 | 36,000 | 54,000 |
| API Calls/Day | 100K | 160K | 240K | 360K | 540K |
| Storage (TB) | 15 | 25 | 38 | 55 | 85 |
| Concurrent Users | 200 | 320 | 480 | 720 | 1,080 |
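The quarter-over-quarter growth implied by this table can be checked with a short calculation. The sketch below is illustrative only: the function and variable names are assumptions, and the values are the active-user figures copied from the table.

```python
# Active users per quarter (Q1 2024 .. Q1 2025 projection), copied from the table.
active_users = [5_000, 8_000, 12_000, 18_000, 27_000]


def quarterly_growth(series):
    """Return the growth rate between each pair of consecutive quarters."""
    return [later / earlier - 1 for earlier, later in zip(series, series[1:])]


def equivalent_monthly_rate(quarterly_rate):
    """Convert a quarterly growth rate into the compounded monthly rate."""
    return (1 + quarterly_rate) ** (1 / 3) - 1


rates = quarterly_growth(active_users)
for rate in rates:
    print(f"quarterly {rate:.0%}, monthly {equivalent_monthly_rate(rate):.1%}")
```

With these figures, growth settles at 50% per quarter after the first step, which corresponds to roughly 14.5% compounded monthly.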

3.2 Growth Factors

Demand Drivers:

  • Geographic expansion: +200% users per new region
  • New medical specialties: +50% per specialty
  • HIS/EHR integrations: +300% API calls per integration
  • Algorithm improvements: +25% accuracy → +40% retention
  • Screening programs: seasonal peaks of +500%

3.3 Predictive Modeling

Forecasting Models:

# Capacity forecasting model
capacity_model = {
    'base_growth': 0.15,  # 15% monthly growth
    'seasonal_factor': {
        'Q1': 1.2,  # Peak screening season
        'Q2': 0.9,  # Low season
        'Q3': 1.0,  # Normal
        'Q4': 1.1,  # Conference season
    },
    'expansion_multipliers': {
        'new_region': 2.0,
        'new_specialty': 1.5,
        'enterprise_client': 3.0,
    },
    'confidence_intervals': {
        'p50': 'base_forecast',
        'p80': 'base_forecast * 1.4',
        'p95': 'base_forecast * 1.8',
    },
}
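As a minimal sketch of how such a model could be evaluated, the function below compounds the monthly base growth and then applies the seasonal factor and a percentile multiplier. The source does not state how the factors combine, so multiplying them is an assumption, and the names `forecast`, `SEASONAL_FACTOR` and `PERCENTILE_MULTIPLIER` are hypothetical.

```python
# Subset of the capacity model; only the fields used here are repeated.
BASE_GROWTH = 0.15  # monthly growth
SEASONAL_FACTOR = {"Q1": 1.2, "Q2": 0.9, "Q3": 1.0, "Q4": 1.1}
PERCENTILE_MULTIPLIER = {"p50": 1.0, "p80": 1.4, "p95": 1.8}


def forecast(current, months_ahead, quarter, percentile="p50"):
    """Compound the base growth, then apply seasonal and percentile factors.

    Assumption: the factors are combined multiplicatively.
    """
    base = current * (1 + BASE_GROWTH) ** months_ahead
    return base * SEASONAL_FACTOR[quarter] * PERCENTILE_MULTIPLIER[percentile]


# Example: images per day three months ahead, landing in Q1.
for level in ("p50", "p80", "p95"):
    print(level, round(forecast(36_000, 3, "Q1", level)))
```

Sizing against the p80 or p95 value rather than p50 trades higher cost for more headroom during demand peaks.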

4. Auto-scaling and Elasticity

4.1 Auto-scaling Policies

ECS Auto-scaling Configuration:

api_service_scaling:
  target_tracking:
    cpu_utilization: 70%
    memory_utilization: 75%
  step_scaling:
    scale_out:
      - metric: RequestCount > 500/min
        scaling: +2 tasks
      - metric: RequestCount > 1000/min
        scaling: +4 tasks
    scale_in:
      - metric: RequestCount < 200/min
        scaling: -1 task (min 2)

ml_inference_scaling:
  scheduled_scaling:
    business_hours:
      min_capacity: 4 instances
      max_capacity: 20 instances
    off_hours:
      min_capacity: 2 instances
      max_capacity: 10 instances

  predictive_scaling:
    enable: true
    forecast_period: 7 days
    buffer: 20%
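The step-scaling rules for `api_service_scaling` can be read as a simple decision function. The sketch below assumes that the highest matching threshold wins and that no upper task limit applies (the source does not give one); `step_adjustment` is a hypothetical name, not part of any AWS API.

```python
def step_adjustment(requests_per_min, current_tasks, min_tasks=2):
    """Return the new task count under the api_service step-scaling rules."""
    if requests_per_min > 1000:
        return current_tasks + 4
    if requests_per_min > 500:
        return current_tasks + 2
    if requests_per_min < 200:
        # Scale in by one task, but never below the configured minimum.
        return max(min_tasks, current_tasks - 1)
    return current_tasks


print(step_adjustment(1200, 4))
print(step_adjustment(150, 2))
```

Between 200 and 500 requests per minute no step rule fires, so the task count in that band is governed only by the target-tracking policies.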

4.2 Database Auto-scaling

DocumentDB Scaling Strategy:

| Metric | Threshold | Action |
| --- | --- | --- |
| CPU > 80% | 5 min sustained | Add read replica |
| Connections > 90% | 2 min sustained | Scale up instance class |
| Storage > 85% | Alert only | Manual review required |
| Network I/O > 80% | 10 min sustained | Consider sharding |
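The "sustained" condition in these thresholds means that a single spike should not trigger an action. A minimal sketch of that check, assuming one metric sample per minute and a hypothetical helper name `sustained_breach`:

```python
def sustained_breach(samples, threshold, sustain_minutes, interval_minutes=1):
    """Return True if every sample in the most recent window exceeds threshold.

    `samples` is ordered oldest to newest, one value per `interval_minutes`.
    """
    needed = max(1, sustain_minutes // interval_minutes)
    recent = samples[-needed:]
    # Require a full window so that a short history never counts as sustained.
    return len(recent) == needed and all(value > threshold for value in recent)


cpu = [70, 82, 85, 90, 88, 91]
print(sustained_breach(cpu, threshold=80, sustain_minutes=5))
```

Here the last five samples are all above 80%, so the "Add read replica" rule would apply under these assumptions.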

4.3 Automatic Storage Tiering

S3 Lifecycle Policies:

storage_lifecycle:
  medical_images:
    - transition: Standard → IA (30 days)
    - transition: IA → Glacier (90 days)
    - transition: Glacier → Deep Archive (365 days)
    - expiration: Never (regulatory retention)

  application_logs:
    - transition: Standard → IA (7 days)
    - transition: IA → Glacier (30 days)
    - expiration: 2555 days (7 years regulatory)

  backup_data:
    - immediate: Glacier Deep Archive
    - expiration: 2920 days (8 years)
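To illustrate the `medical_images` policy, the sketch below maps an object's age to the storage class it would occupy. It assumes the day counts are measured from object creation; the tier table and the function name are hypothetical.

```python
# (minimum age in days, storage class), following the medical_images rules.
MEDICAL_IMAGE_TIERS = [
    (365, "Deep Archive"),
    (90, "Glacier"),
    (30, "IA"),
    (0, "Standard"),
]


def storage_class_for_age(age_days, tiers=MEDICAL_IMAGE_TIERS):
    """Return the storage class for an object of the given age in days."""
    for min_age, storage_class in tiers:
        if age_days >= min_age:
            return storage_class
    raise ValueError("age_days must be non-negative")


for age in (10, 45, 120, 400):
    print(age, storage_class_for_age(age))
```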

5. Monitoring and Alerting

5.1 Critical Capacity Metrics

Main Dashboard - Capacity KPIs:

| Metric | SLA | Warning | Critical | Frequency |
| --- | --- | --- | --- | --- |
| API Response Time | < 500ms p95 | > 400ms | > 800ms | 1 min |
| Image Processing Time | < 30s p95 | > 25s | > 45s | 1 min |
| Database CPU | < 70% avg | > 60% | > 85% | 5 min |
| Storage Usage | < 80% | > 70% | > 90% | 15 min |
| Concurrent Users | N/A | > 80% capacity | > 95% capacity | 1 min |
| Error Rate | < 0.1% | > 0.05% | > 0.2% | 1 min |
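Each row defines a warning and a critical threshold for a metric where higher values are worse. A minimal sketch of the classification, with a hypothetical `classify` helper and three of the rows copied into a lookup table:

```python
def classify(value, warning, critical):
    """Classify a 'higher is worse' metric against its two thresholds."""
    if value > critical:
        return "critical"
    if value > warning:
        return "warning"
    return "ok"


# (warning, critical) pairs from the KPI table: response time in ms, the rest in %.
THRESHOLDS = {
    "api_response_time_ms": (400, 800),
    "database_cpu_pct": (60, 85),
    "storage_usage_pct": (70, 90),
}

print(classify(450, *THRESHOLDS["api_response_time_ms"]))
print(classify(92, *THRESHOLDS["storage_usage_pct"]))
```

A p95 response time of 450 ms is still inside the 500 ms SLA but already raises a warning, which is the intended early signal.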

5.2 Automatic Alerting

Alert Routing Matrix:

alerts:
  capacity_warning:
    recipients: [devops, sre-team]
    escalation_time: 30min
    channels: [slack, email]

  capacity_critical:
    recipients: [oncall, cto, devops]
    escalation_time: 5min
    channels: [pagerduty, phone, slack]
    auto_actions: [trigger_scaling, create_incident]

  budget_alerts:
    recipients: [finops, cto]
    thresholds: [50%, 80%, 95%, 100%]
    frequency: daily
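For the `budget_alerts` entry, the daily check reduces to finding which thresholds the month-to-date spend has reached. The sketch below is an illustration only; the function name and the example amounts are assumptions.

```python
BUDGET_THRESHOLDS = [0.50, 0.80, 0.95, 1.00]


def crossed_thresholds(spend, budget, thresholds=BUDGET_THRESHOLDS):
    """Return the budget thresholds that the current spend has reached."""
    ratio = spend / budget
    return [t for t in thresholds if ratio >= t]


# Example: 36,000 spent against a 43,000 monthly budget.
print(crossed_thresholds(36_000, 43_000))
```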

5.3 Advanced Observability

Monitoring Stack:

| Tool | Function | Key Metrics |
| --- | --- | --- |
| CloudWatch | AWS metrics, logs | Infrastructure, application metrics |
| Datadog | APM, synthetics | End-to-end latency, user experience |
| Grafana | Dashboards | Business metrics, capacity trends |
| Prometheus | Custom metrics | Application-specific KPIs |
| ELK Stack | Log analysis | Error patterns, usage analytics |

6. Cost Optimization

6.1 FinOps - Cloud Financial Management

Current Cost Structure (Monthly):

| Category | Cost | % of Total | Identified Optimization |
| --- | --- | --- | --- |
| Compute (ECS/EC2) | $15,000 | 35% | Reserved Instances: -25% |
| Storage (S3/EBS) | $8,000 | 18% | Lifecycle policies: -30% |
| Database | $12,000 | 28% | Rightsizing: -20% |
| Network/CDN | $5,000 | 12% | Caching optimization: -15% |
| Monitoring/Logs | $3,000 | 7% | Retention policies: -25% |
| Total | $43,000 | 100% | Potential savings: -23% |
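The overall saving in the last row is the cost-weighted average of the per-category reductions. The short calculation below reproduces it from the table's figures; the dictionary layout and variable names are illustrative.

```python
# category: (monthly cost in USD, identified saving as a fraction), from the table.
COSTS = {
    "compute": (15_000, 0.25),
    "storage": (8_000, 0.30),
    "database": (12_000, 0.20),
    "network_cdn": (5_000, 0.15),
    "monitoring_logs": (3_000, 0.25),
}

total_cost = sum(cost for cost, _ in COSTS.values())
total_saving = sum(cost * rate for cost, rate in COSTS.values())

print(f"total cost: ${total_cost:,}")
print(f"potential saving: ${total_saving:,.0f} ({total_saving / total_cost:.0%})")
```

The result is about $10,050 per month, or 23% of the current total.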

6.2 Optimization Strategies

Cost Optimization Roadmap:

Q1_2025:
  - implement: Reserved Instance strategy
    savings: $3,750/month
  - implement: S3 lifecycle policies
    savings: $2,400/month

Q2_2025:
  - implement: Database rightsizing
    savings: $2,400/month
  - implement: Log retention optimization
    savings: $750/month

Q3_2025:
  - implement: Spot instances for batch workloads
    savings: $1,500/month
  - implement: CDN optimization
    savings: $750/month

Annual_Savings: $138,600  # 12 x $11,550/month, once all measures are in place
ROI_on_optimization: 340%

7. Capacity Planning - Budgets

7.1 Budget Projections

Budget Planning FY2025:

| Quarter | Projected Users | Infrastructure Cost | Growth |
| --- | --- | --- | --- |
| Q1 | 30,000 | $52,000/month | +21% |
| Q2 | 40,000 | $68,000/month | +31% |
| Q3 | 55,000 | $89,000/month | +31% |
| Q4 | 75,000 | $118,000/month | +33% |
| Total FY | - | $981,000/year | +174% |
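The per-quarter growth column can be reproduced from the monthly costs, taking the current $43,000/month from section 6.1 as the baseline. The same loop also gives the cost per projected user, which is not in the table and is shown here only as a derived illustration; all names are hypothetical.

```python
BASELINE_MONTHLY_COST = 43_000  # current monthly total from section 6.1

# quarter: (projected users, monthly infrastructure cost in USD)
QUARTERS = {
    "Q1": (30_000, 52_000),
    "Q2": (40_000, 68_000),
    "Q3": (55_000, 89_000),
    "Q4": (75_000, 118_000),
}

growth = {}
cost_per_user = {}
previous = BASELINE_MONTHLY_COST
for quarter, (users, cost) in QUARTERS.items():
    growth[quarter] = cost / previous - 1
    cost_per_user[quarter] = cost / users
    previous = cost

for quarter in QUARTERS:
    print(f"{quarter}: growth {growth[quarter]:+.0%}, "
          f"cost per user ${cost_per_user[quarter]:.2f}/month")
```

Under these projections the monthly cost per user falls from about $1.73 in Q1 to about $1.57 in Q4, so cost grows more slowly than the user base.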

7.2 Contingency Planning

Capacity Scenarios:

| Scenario | Probability | Cost Impact | Contingency |
| --- | --- | --- | --- |
| Base growth | 60% | Budget baseline | Planned scaling |
| Accelerated growth | 25% | +40% budget | Emergency scaling fund |
| Pandemic/Mass screening | 10% | +200% capacity | Spot instances + CDN boost |
| Recession/Slow growth | 5% | -30% demand | Scale-down automation |

8. Specialized Resource Management

8.1 GPU Computing for ML

GPU Cluster Configuration:

ml_gpu_capacity:
  training_cluster:
    instances: 4x p3.8xlarge (V100)
    usage_pattern: Batch training jobs
    cost_optimization: Spot instances (70% savings)

  inference_cluster:
    instances: 6x g4dn.xlarge (T4)
    usage_pattern: Real-time inference
    scaling: Auto-scale 2-20 instances

  development:
    instances: 2x g4dn.large
    usage_pattern: Model development
    scheduling: Shared resource pool

8.2 Specialized Storage Requirements

Medical Imaging Storage:

| Type | Performance | Capacity | Cost/TB/month | Use Case |
| --- | --- | --- | --- | --- |
| EFS (General) | 100 MB/s | 1TB | $300 | Shared model artifacts |
| S3 Standard | N/A | 50TB | $23 | Active image processing |
| S3 IA | N/A | 150TB | $12.50 | Recent images (30-90 days) |
| S3 Glacier | Minutes | 500TB | $4 | Archive (90+ days) |
| S3 Deep Archive | Hours | 2PB | $1 | Long-term regulatory |
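Multiplying each tier's capacity by its unit price gives the monthly cost at full planned capacity. The sketch below assumes 1 PB = 1,000 TB and ignores request, retrieval and transfer charges; the names are illustrative.

```python
# tier: (capacity in TB, cost in USD per TB per month), from the table.
# Assumption: 2 PB is taken as 2,000 TB.
TIERS = {
    "EFS (General)": (1, 300),
    "S3 Standard": (50, 23),
    "S3 IA": (150, 12.50),
    "S3 Glacier": (500, 4),
    "S3 Deep Archive": (2_000, 1),
}

monthly_cost = {name: capacity * price for name, (capacity, price) in TIERS.items()}
total_monthly_cost = sum(monthly_cost.values())

for name, cost in monthly_cost.items():
    print(f"{name}: ${cost:,.0f}/month")
print(f"total: ${total_monthly_cost:,.0f}/month")
```

At these prices Deep Archive holds about three quarters of the capacity for roughly the same monthly cost as the 150 TB kept in S3 IA.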

9. Business Continuity and Disaster Recovery

9.1 RTO/RPO Requirements

Recovery Objectives by Service:

| Service | RTO | RPO | DR Strategy |
| --- | --- | --- | --- |
| API Core | 15 min | 1 min | Multi-AZ + auto-failover |
| ML Inference | 30 min | 5 min | Multi-region model deployment |
| Database | 1 hour | 15 min | Cross-region read replica |
| File Storage | 4 hours | 1 hour | Cross-region replication |
| Monitoring | 5 min | Real-time | Multi-region deployment |

9.2 Capacity for DR

DR Infrastructure Sizing:

disaster_recovery:
  primary_region: eu-west-1 (100% capacity)
  dr_region: eu-central-1 (warm standby)

  dr_capacity_allocation:
    compute: 50% of primary (scale on demand)
    storage: 100% (continuous replication)
    network: 100% (redundant connections)

  failover_scenarios:
    planned_maintenance: Zero downtime
    availability_zone_failure: < 15 min RTO
    region_failure: < 4 hour RTO

  cost_impact: +35% infrastructure cost

10. Compliance and Audit

10.1 Capacity Management Audit Trail

Audit of Capacity Decisions:

audit_requirements:
  capacity_changes:
    approval_required: "> 25% capacity change"
    documentation: Business justification + technical assessment
    retention: 7 years

  budget_variances:
    threshold: +/- 15% monthly budget
    escalation: CFO + CTO notification
    review_cycle: Monthly

  performance_slas:
    monitoring: Continuous
    reporting: Monthly SLA reports
    compliance: 99%+ SLA adherence required
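The two numeric rules in this audit policy can be expressed as simple checks. The sketch below assumes that a capacity change is measured relative to current capacity and that both increases and decreases count; the function names are hypothetical.

```python
def requires_approval(current_capacity, proposed_capacity, threshold=0.25):
    """True if the relative capacity change exceeds the approval threshold."""
    change = abs(proposed_capacity - current_capacity) / current_capacity
    return change > threshold


def budget_variance_escalates(actual, budget, threshold=0.15):
    """True if spend deviates from the monthly budget by more than the threshold."""
    return abs(actual - budget) / budget > threshold


print(requires_approval(4, 6))                    # 4 -> 6 tasks is a 50% change
print(budget_variance_escalates(45_000, 43_000))  # about 4.7% over budget
```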

10.2 Regulatory Compliance for Healthcare

Medical Device Capacity Requirements:

| Regulation | Requirement | Implementation |
| --- | --- | --- |
| FDA QSR | Change control for capacity | Formal approval process |
| EU MDR | Performance monitoring | Continuous metrics collection |
| HIPAA | Availability requirements | 99.9% uptime SLA |
| GDPR | Data processing capacity | Privacy by design scaling |

11. Automation and Tooling

11.1 Infrastructure as Code

Capacity Automation Stack:

automation_tools:
  infrastructure:
    - terraform: Infrastructure provisioning
    - ansible: Configuration management
    - helm: Kubernetes deployments

  monitoring:
    - prometheus: Metrics collection
    - grafana: Visualization
    - alertmanager: Alert routing

  optimization:
    - aws_cost_explorer: Cost analysis
    - rightsizing_recommendations: AWS Compute Optimizer
    - custom_scripts: Cleanup automation

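11.2 Self-Healing Infrastructure

Automated Remediation:

# Auto-remediation example. The helper functions called here
# (can_scale_horizontally, trigger_auto_scaling, and so on) are placeholders
# for platform-specific implementations.
def handle_capacity_alert(alert_type, metrics):
    """Dispatch an automated remediation for the given capacity alert type."""
    if alert_type == "high_cpu":
        if can_scale_horizontally():
            trigger_auto_scaling()
        else:
            create_incident("Manual intervention required")

    elif alert_type == "storage_full":
        if is_log_storage():
            cleanup_old_logs()
        else:
            expand_storage_tier()

    elif alert_type == "memory_pressure":
        # Try the cheap fix first; resize only if the restart did not help.
        restart_memory_intensive_services()
        if not improved():
            scale_up_instance_size()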
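Because the handler depends on infrastructure helpers, its branching is easier to verify in isolation. The sketch below is a hypothetical, side-effect-free variant: `FakeEnvironment` stands in for the real infrastructure state, and `plan_remediation` returns the names of the actions the handler would take instead of executing them.

```python
from dataclasses import dataclass


@dataclass
class FakeEnvironment:
    """Hypothetical stand-in for the real infrastructure state."""
    can_scale_horizontally: bool = True
    is_log_storage: bool = False
    improves_after_restart: bool = True


def plan_remediation(alert_type, env):
    """Return the ordered action names for an alert, without side effects."""
    if alert_type == "high_cpu":
        if env.can_scale_horizontally:
            return ["trigger_auto_scaling"]
        return ["create_incident"]
    if alert_type == "storage_full":
        if env.is_log_storage:
            return ["cleanup_old_logs"]
        return ["expand_storage_tier"]
    if alert_type == "memory_pressure":
        actions = ["restart_memory_intensive_services"]
        if not env.improves_after_restart:
            actions.append("scale_up_instance_size")
        return actions
    return []  # unknown alert types produce no action


print(plan_remediation("memory_pressure", FakeEnvironment(improves_after_restart=False)))
```

Separating the decision from its execution lets the remediation logic be unit-tested without touching any cloud resources.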

12. Metrics and KPIs

12.1 Operational Excellence KPIs

| Metric | Target | Q4 2024 | Trend |
| --- | --- | --- | --- |
| Capacity Utilization | 70-80% | 74% | ✅ |
| Cost per Transaction | < $0.02 | $0.018 | ↓ |
| Auto-scaling Events | < 50/day | 32/day | ✅ |
| Capacity Planning Accuracy | ±10% | ±8% | ✅ |
| Time to Scale | < 5 min | 3.2 min | ✅ |
| Budget Variance | ±5% | +3% | ✅ |

12.2 Business Impact Metrics

| Business KPI | Target | Current | Impact |
| --- | --- | --- | --- |
| Revenue per GB | $0.15 | $0.18 | ↑ 20% |
| User Satisfaction | > 4.5/5 | 4.7/5 | ↑ |
| Time to Market | < 2 weeks | 10 days | ↑ 30% |
| Cost of Downtime | < $1K/hour | $0/hour | ✅ |

13. Evolution Roadmap

13.1 Capacity Management Maturity

Maturity Levels:

  1. Reactive (Current): ✅ Monitoring, alerting, manual scaling
  2. Proactive (Q1 2025): 🔄 Predictive scaling, cost optimization
  3. Predictive (Q3 2025): 📋 ML-based forecasting, automated optimization
  4. Autonomous (2026): 📋 Self-managing infrastructure, AI-driven decisions

13.2 Technology Evolution

Emerging Technologies:

technology_roadmap:
  2025:
    - serverless_ml: AWS Lambda + SageMaker inference
    - spot_fleet: 80% cost savings on batch workloads
    - graviton_processors: 20% better price/performance

  2026:
    - kubernetes_adoption: EKS for better resource utilization
    - service_mesh: Istio for traffic management
    - chaos_engineering: Automated resilience testing

  2027:
    - quantum_ready: Quantum-safe cryptography planning
    - edge_computing: Regional inference deployment
    - ai_ops: Fully automated operations

Annex A: Capacity Sizing Calculator

class CapacityCalculator:
    """Project capacity needs and estimate the associated cost."""

    def __init__(self, growth_rate=0.15, seasonal_factor=1.0):
        self.growth_rate = growth_rate          # compounded monthly growth
        self.seasonal_factor = seasonal_factor  # multiplier for the target period

    def project_capacity(self, current_usage, months_ahead):
        """Compound monthly growth, then apply the seasonal factor."""
        base_projection = current_usage * (1 + self.growth_rate) ** months_ahead
        return base_projection * self.seasonal_factor

    def calculate_cost(self, capacity_requirements):
        """Return the total cost for the given usage, using fixed unit prices."""
        cost_model = {
            'compute': capacity_requirements['cpu_hours'] * 0.05,
            'storage': capacity_requirements['gb_storage'] * 0.023,
            'network': capacity_requirements['gb_transfer'] * 0.09,
            'ml_inference': capacity_requirements['ml_requests'] * 0.001,
        }
        return sum(cost_model.values())
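The same compound-growth formula can be inverted to estimate how long current capacity will last. Solving `usage * (1 + g) ** n >= limit` for `n` gives `n >= ln(limit / usage) / ln(1 + g)`. The helper below is a sketch under that model; its name and the example figures are illustrative, and seasonal factors are ignored.

```python
import math


def months_until_limit(current_usage, limit, monthly_growth=0.15):
    """Smallest whole number of months after which usage reaches the limit."""
    if current_usage >= limit:
        return 0
    months = math.log(limit / current_usage) / math.log(1 + monthly_growth)
    return math.ceil(months)


# Example: 55 TB in use today, 200 TB provisioned, 15% monthly growth.
print(months_until_limit(55, 200))
```

With these example values the limit is reached in the tenth month, which gives the lead time available for budgeting and procurement.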

Annex B: Emergency Scaling Runbook

Emergency Scaling Procedure:

  1. Detection: automatic or manual alert
  2. Assessment: determine scope and urgency (< 5 min)
  3. Authorization: auto-approved if < 200% capacity
  4. Execution: Terraform apply + monitoring
  5. Validation: verify metrics within 15 min
  6. Communication: stakeholder notification
  7. Post-mortem: root cause analysis within 48h

Signature meaning

The signatures for the approval process of this document can be found in the verified commits at the repository for the QMS. As a reference, the team members who are expected to participate in this document and their roles in the approval process, as defined in Annex I Responsibility Matrix of the GP-001, are:

  • Author: Team members involved
  • Reviewer: JD-003, JD-004
  • Approver: JD-001
All the information contained in this QMS is confidential. The recipient agrees not to transmit or reproduce the information, neither by himself nor by third parties, through whichever means, without obtaining the prior written permission of Legit.Health (AI LABS GROUP S.L.)