    Kubernetes
    DevOps
    Production
    Best Practices

    Kubernetes Best Practices for Production Deployments

    August 6, 2025
    15 min read

    Running applications in production on Kubernetes is challenging, but following proven best practices can make the difference between a reliable system and a maintenance nightmare. This guide covers the essential strategies that successful teams use to deploy and maintain production workloads.

    What You'll Learn

    By the end of this guide, you'll understand:

    • How to design resilient deployments that handle failures gracefully
    • Security practices that protect your applications from common threats
    • Monitoring strategies that give you visibility into your system's health
    • Scaling patterns that keep your applications performant under load
    • Backup and recovery procedures that protect your data

    1. Resource Management: The Foundation of Stability

    Why Resource Management Matters

    Resource management is the cornerstone of stable Kubernetes deployments. Without proper resource allocation, your applications can experience:

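    • Resource starvation when a pod without limits consumes all of a node's CPU or memory
    • OOM kills when a container's memory limit is set too low for its workload
    • Poor performance due to CPU throttling when CPU limits are too tight
    • Unpredictable scheduling and autoscaling, because the scheduler places pods by their requests and HPA utilization targets are measured against those requests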

    Best Practice: Always Set Resource Limits

    resources.yaml
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: production-app
    spec:
      replicas: 3
      selector:
        matchLabels:
          app: production-app
      template:
        metadata:
          labels:
            app: production-app
        spec:
          containers:
          - name: app
            image: production-app:v1.2.0
            resources:
              requests:
                memory: "512Mi"
                cpu: "250m"
              limits:
                memory: "1Gi"
                cpu: "500m"
            livenessProbe:
              httpGet:
                path: /health
                port: 8080
              initialDelaySeconds: 30
              periodSeconds: 10
              timeoutSeconds: 5
              failureThreshold: 3
            readinessProbe:
              httpGet:
                path: /ready
                port: 8080
              initialDelaySeconds: 5
              periodSeconds: 5
              timeoutSeconds: 3
              failureThreshold: 3

    Key Points:

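    • Requests: The CPU and memory the scheduler reserves for each container. A pod is placed only on a node with that much unreserved capacity.
    • Limits: The ceiling each container may use. CPU above the limit is throttled, while memory above the limit gets the container OOM-killed.
    • Health Checks: The liveness probe restarts a container that stops responding. The readiness probe keeps a pod out of Service endpoints until it can serve traffic.
    • Realistic Values: Base requests and limits on observed usage patterns, not guesses.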
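
    Requests and limits also set each pod's Quality of Service class. The class matters when a node runs short of memory:
    • BestEffort pods set no requests or limits. They are the most likely to be evicted.
    • Burstable pods set requests lower than limits, as in resources.yaml.
    • Guaranteed pods are the least likely to be evicted.
    Below is a minimal sketch of a container resources stanza that would make production-app Guaranteed. It assumes you are willing to reserve the full limit for every replica; the file name is illustrative.

    guaranteed-qos.yaml
    # Container-level fragment for spec.template.spec.containers[*]
    resources:
      requests:
        memory: "1Gi"
        cpu: "500m"
      limits:
        memory: "1Gi"   # requests equal limits for both CPU and memory
        cpu: "500m"     # in every container, so the pod is classed Guaranteed

    The trade-off is density. Reserving the full limit means each node fits fewer replicas than with the Burstable values above.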

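    Pro Tip: Use Resource Quotas

    A ResourceQuota caps the combined requests, limits, and pod count of everything in its namespace, so one runaway workload cannot claim the whole cluster. Size it against your autoscaling ceiling. At the per-pod values in resources.yaml (250m CPU and 512Mi requested, 500m and 1Gi limits), the quota below admits at most 16 production-app pods if nothing else runs in the namespace. That is fewer than the maxReplicas: 20 used in the HPA example later in this guide.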

    resource-quota.yaml
    apiVersion: v1
    kind: ResourceQuota
    metadata:
      name: production-quota
    spec:
      hard:
        requests.cpu: "4"
        requests.memory: 8Gi
        limits.cpu: "8"
        limits.memory: 16Gi
        pods: "20"
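
    Once a ResourceQuota tracks requests and limits for CPU and memory, the API server rejects pods in that namespace that do not declare those values. A LimitRange fills in defaults for containers that omit them.

    Below is a minimal sketch, assuming the production namespace used elsewhere in this guide. The object name is hypothetical, and the default values simply mirror resources.yaml.

    limit-range.yaml
    apiVersion: v1
    kind: LimitRange
    metadata:
      name: production-defaults   # hypothetical name
      namespace: production
    spec:
      limits:
      - type: Container
        defaultRequest:           # applied when a container sets no requests
          cpu: "250m"
          memory: "512Mi"
        default:                  # applied when a container sets no limits
          cpu: "500m"
          memory: "1Gi"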

    2. Security: Protect Your Applications

    The Security-First Approach

    Security in Kubernetes is layered, and the principle of least privilege should guide every decision. Give each workload only the user, capabilities, filesystem access, and network paths it actually needs.

    Best Practice: Run as Non-Root

    run-as-non-root.yaml
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: secure-app
    spec:
      selector:                    # required for apps/v1 Deployments
        matchLabels:
          app: secure-app
      template:
        metadata:
          labels:
            app: secure-app        # must match spec.selector
        spec:
          securityContext:
            runAsNonRoot: true
            runAsUser: 1000
            fsGroup: 2000
          containers:
          - name: app
            image: secure-app:latest   # pin a specific version tag in production instead of :latest
            securityContext:
              allowPrivilegeEscalation: false
              readOnlyRootFilesystem: true   # only the emptyDir mounts below stay writable
              capabilities:
                drop:
                - ALL
            volumeMounts:
            - name: tmp
              mountPath: /tmp
            - name: varlog
              mountPath: /var/log
            - name: app-config
              mountPath: /app/config
              readOnly: true
          volumes:
          - name: tmp
            emptyDir: {}
          - name: varlog
            emptyDir: {}
          - name: app-config
            configMap:
              name: app-config

    Security Benefits:

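    • Non-root execution limits what an attacker can do after compromising the container. allowPrivilegeEscalation: false also blocks setuid-style privilege gains.
    • A read-only root filesystem prevents tampering with the image's files. Only the mounted emptyDir volumes (/tmp, /var/log) remain writable.
    • Dropping all capabilities removes kernel privileges the app does not need.
    • Mounting the ConfigMap read-only keeps configuration separate from the image and out of the app's reach. ConfigMaps provide no secrecy, so keep credentials in Secrets.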
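
    The settings above protect only the workloads that remember to set them. Kubernetes' built-in Pod Security Admission (stable since v1.25) checks every pod in a namespace against a Pod Security Standard, selected with namespace labels.

    Below is a sketch, assuming the workloads run in the production namespace. Note that the restricted standard also requires a seccomp profile (for example seccompProfile.type: RuntimeDefault), which run-as-non-root.yaml does not set. Start with warn and audit, and add enforce only once existing pods comply.

    pod-security.yaml
    apiVersion: v1
    kind: Namespace
    metadata:
      name: production
      labels:
        # Warn kubectl users and annotate audit events when a pod would violate "restricted"
        pod-security.kubernetes.io/warn: restricted
        pod-security.kubernetes.io/audit: restricted
        # Uncomment to reject non-compliant pods outright
        # pod-security.kubernetes.io/enforce: restricted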

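    Pro Tip: Use Network Policies

    By default, every pod can reach every other pod in the cluster. A NetworkPolicy narrows this to the traffic you list. The policy below admits traffic to production-app only from the frontend namespace on port 8080, and limits its outbound connections to the listed destinations, such as the database namespace on port 5432. Policies are enforced only if the cluster's network plugin supports them (Calico and Cilium do, for example).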

    network-policy.yaml
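    apiVersion: networking.k8s.io/v1
    kind: NetworkPolicy
    metadata:
      name: app-network-policy
    spec:
      podSelector:
        matchLabels:
          app: production-app
      policyTypes:
      - Ingress
      - Egress
      ingress:
      - from:
        - namespaceSelector:
            matchLabels:
              # Set automatically on every namespace in current Kubernetes versions
              kubernetes.io/metadata.name: frontend
        ports:
        - protocol: TCP
          port: 8080
      egress:
      - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: database
        ports:
        - protocol: TCP
          port: 5432
      # Allow DNS lookups; without this rule the pods cannot resolve Service names
      - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
          podSelector:
            matchLabels:
              k8s-app: kube-dns   # label used by CoreDNS in most distributions; verify in yours
        ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53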
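
    NetworkPolicies are additive allow-lists. A pod is isolated only in the directions listed in the policyTypes of some policy that selects it. A common baseline is a default-deny policy per namespace, so that any workload without its own policy receives and sends nothing.

    Below is a sketch, assuming the production namespace; the policy name is hypothetical. It also blocks DNS for pods that have no other egress policy.

    default-deny.yaml
    apiVersion: networking.k8s.io/v1
    kind: NetworkPolicy
    metadata:
      name: default-deny-all   # hypothetical name
      namespace: production
    spec:
      podSelector: {}          # empty selector: every pod in the namespace
      policyTypes:
      - Ingress
      - Egress
      # No ingress or egress rules are listed, so no traffic is allowed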

    3. Monitoring and Observability: Know Your System

    The Three Pillars of Observability

    1. Metrics: Quantitative data about your system's performance
    2. Logs: Detailed records of events and errors
    3. Traces: Request flow through your distributed system

    Best Practice: Comprehensive Monitoring Setup

    prometheus-config.yaml
    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: prometheus-config
    data:
      prometheus.yml: |
        global:
          scrape_interval: 15s
          evaluation_interval: 15s

        rule_files:
          - "alert_rules.yml"

        scrape_configs:
        - job_name: 'kubernetes-pods'
          kubernetes_sd_configs:
          - role: pod
          relabel_configs:
          - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
            action: keep
            regex: true
          - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
            action: replace
            target_label: __metrics_path__
            regex: (.+)
          - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
            action: replace
            regex: ([^:]+)(?::\d+)?;(\d+)
            replacement: $1:$2
            target_label: __address__
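
    The relabeling rules above keep only pods that opt in through annotations:
    • prometheus.io/scrape must be "true".
    • prometheus.io/path overrides the metrics path.
    • prometheus.io/port overrides the metrics port.
    Prometheus exposes these annotations as the __meta_kubernetes_pod_annotation_prometheus_io_* labels the config matches on.

    Below is a sketch of the pod template fragment for production-app. It assumes the app serves metrics at /metrics on port 8080; adjust both to your application.

    metrics-annotations.yaml
    # Fragment of the Deployment's spec.template
    template:
      metadata:
        labels:
          app: production-app
        annotations:
          prometheus.io/scrape: "true"   # annotation values are strings, so quote it
          prometheus.io/path: "/metrics"
          prometheus.io/port: "8080"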

    Essential Metrics to Monitor

    alert-rules.yaml
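    # Example alerting rules (loaded through rule_files in prometheus.yml)
    groups:
    - name: kubernetes.rules
      rules:
      - alert: HighCPUUsage
        # container_cpu_usage_seconds_total is a cumulative counter, so alert on its
        # per-second rate: the result is CPU cores in use, averaged over 5 minutes.
        # The threshold is in cores; keep it below your containers' CPU limit.
        expr: rate(container_cpu_usage_seconds_total{container!=""}[5m]) > 0.8
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High CPU usage detected"
          description: "Container {{ $labels.container }} is using {{ $value | humanize }} CPU cores"

      - alert: HighMemoryUsage
        # Working set excludes reclaimable page cache, so it tracks OOM risk more closely;
        # the "> 0" filter skips containers without a memory limit (which would divide by zero)
        expr: container_memory_working_set_bytes{container!=""} / (container_spec_memory_limit_bytes{container!=""} > 0) > 0.85
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High memory usage detected"
          description: "Container {{ $labels.container }} is using {{ $value | humanizePercentage }} of its memory limit"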
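
    CPU and memory alerts can miss crash-looping containers that restart before they ever get busy. Suppose kube-state-metrics is deployed and scraped by Prometheus; that is an assumption, since the config above does not cover it. Its restart counter can then back an extra rule group, either in the same rules file or in a new file listed under rule_files. The group name and the threshold of 3 restarts in 15 minutes are illustrative.

    restart-alert-rules.yaml
    groups:
    - name: kubernetes.restart-rules   # hypothetical group name
      rules:
      - alert: FrequentContainerRestarts
        # kube_pod_container_status_restarts_total is exported by kube-state-metrics
        expr: increase(kube_pod_container_status_restarts_total[15m]) > 3
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Container restarting frequently"
          description: "Container {{ $labels.container }} in pod {{ $labels.pod }} restarted about {{ $value | humanize }} times in 15 minutes"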

    4. Scaling Strategies: Handle Traffic Spikes

    Horizontal vs Vertical Scaling

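    Horizontal scaling (adding more pods) is generally preferred in Kubernetes. Extra replicas add redundancy as well as capacity, and they can be added without restarting the pods already serving traffic. Vertical scaling (raising a pod's requests and limits) has traditionally required recreating the pod.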

    Best Practice: Implement HPA with Multiple Metrics

    hpa-config.yaml
    apiVersion: autoscaling/v2
    kind: HorizontalPodAutoscaler
    metadata:
      name: production-app-hpa
    spec:
      scaleTargetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: production-app
      minReplicas: 3
      maxReplicas: 20
      metrics:
      - type: Resource
        resource:
          name: cpu
          target:
            type: Utilization
            averageUtilization: 70
      - type: Resource
        resource:
          name: memory
          target:
            type: Utilization
            averageUtilization: 80
      # Custom metric: must be served by a custom metrics adapter (e.g. Prometheus Adapter)
      - type: Object
        object:
          metric:
            name: requests-per-second
          describedObject:
            apiVersion: networking.k8s.io/v1
            kind: Ingress
            name: production-app-ingress
          target:
            type: Value
            value: 1000
      behavior:
        scaleDown:
          stabilizationWindowSeconds: 300
          policies:
          - type: Percent
            value: 10
            periodSeconds: 60
        scaleUp:
          stabilizationWindowSeconds: 60
          policies:
          - type: Percent
            value: 100
            periodSeconds: 15

    Key Features:

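    • Multiple metrics: The HPA computes a replica count for each metric and scales to the highest. CPU and memory utilization come from the resource metrics API (typically metrics-server) and are measured against the containers' requests. The requests-per-second metric needs a custom metrics adapter.
    • Stabilization windows: Scale-down acts on the highest recommendation from the last 300 seconds, which prevents rapid scaling oscillations.
    • Conservative scale-down: At most 10% of the current replicas are removed per minute.
    • Aggressive scale-up: Up to 100% of the current replicas can be added every 15 seconds, so the Deployment can double quickly during a traffic spike.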
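
    The HPA's core calculation, as described in the Kubernetes documentation, is proportional. For each metric, the desired replica count is the current count scaled by the ratio of the observed value to the target, rounded up. Here is a worked example for the CPU target above (70% utilization), assuming 3 replicas currently averaging 140% of their CPU request:

    $$
    \text{desiredReplicas} = \left\lceil \text{currentReplicas} \times \frac{\text{currentMetricValue}}{\text{desiredMetricValue}} \right\rceil = \left\lceil 3 \times \frac{140}{70} \right\rceil = 6
    $$

    The result is then clamped to minReplicas and maxReplicas (3 to 20) and limited by the behavior policies. The 100%-per-15-seconds scale-up policy allows adding 3 pods to the current 3, so 6 is reachable in one step. Ratios within the default 10% tolerance of 1.0 trigger no change, which keeps the HPA from flapping around its target.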

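    Pro Tip: Use VPA for Vertical Scaling

    The Vertical Pod Autoscaler is an add-on installed separately from core Kubernetes. It observes actual usage and recommends requests. In "Off" mode it only publishes those recommendations, which is a low-risk way to replace guessed values in resources.yaml with measured ones.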

    vpa-config.yaml
    apiVersion: autoscaling.k8s.io/v1
    kind: VerticalPodAutoscaler
    metadata:
      name: production-app-vpa
    spec:
      targetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: production-app
      updatePolicy:
        # "Off" only publishes recommendations; "Auto" evicts pods to apply new requests.
        # Avoid "Auto" while an HPA also scales this Deployment on CPU or memory.
        updateMode: "Off"
      resourcePolicy:
        containerPolicies:
        - containerName: '*'
          minAllowed:
            cpu: 100m
            memory: 50Mi
          maxAllowed:
            cpu: 1
            memory: 1Gi   # keep the cap at or above the current 512Mi request
          controlledValues: RequestsAndLimits

    5. Backup and Disaster Recovery: Protect Your Data

    The 3-2-1 Backup Rule

    • 3 copies of your data
    • 2 different storage types
    • 1 off-site backup

    Best Practice: Automated Backup Strategy

    velero-backup.yaml
    apiVersion: velero.io/v1
    kind: Schedule
    metadata:
      name: production-daily-backup
      namespace: velero   # Velero resources must live in Velero's install namespace
    spec:
      schedule: "0 2 * * *"  # Daily at 2 AM
      template:
        includedNamespaces:
        - production
        includedResources:
        - deployments
        - services
        - configmaps
        - secrets
        - persistentvolumeclaims
        - persistentvolumes
        storageLocation: production-backup
        volumeSnapshotLocations:
        - production-snapshot
        ttl: 720h  # Keep backups for 30 days
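
    The schedule writes to a backup storage location named production-backup. To cover the off-site copy in the 3-2-1 rule, that location should point at object storage outside the cluster, ideally in another region or account.

    Below is a sketch, assuming Velero's AWS object-store plugin. The bucket, prefix, and region values are hypothetical placeholders.

    backup-storage-location.yaml
    apiVersion: velero.io/v1
    kind: BackupStorageLocation
    metadata:
      name: production-backup
      namespace: velero
    spec:
      provider: aws                      # requires the Velero plugin for AWS
      objectStorage:
        bucket: example-velero-backups   # hypothetical bucket name
        prefix: production               # hypothetical prefix within the bucket
      config:
        region: us-west-2                # hypothetical: choose a region other than the cluster's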

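    Backup Verification Strategy

    A backup you have never restored is unverified. The schedule below takes a weekly backup of a dedicated backup-test namespace and keeps it for 24 hours. This gives you a small, disposable backup to rehearse restores against. The verification itself is restoring it and confirming the workloads come back healthy.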

    backup-verification.yaml
    apiVersion: velero.io/v1
    kind: Schedule
    metadata:
      name: backup-verification
      namespace: velero   # Velero resources must live in Velero's install namespace
    spec:
      schedule: "0 4 * * 0"  # Weekly on Sunday at 4 AM
      template:
        includedNamespaces:
        - backup-test
        includedResources:
        - deployments
        - services
        - configmaps
        - secrets
        - persistentvolumeclaims
        storageLocation: production-backup
        volumeSnapshotLocations:
        - production-snapshot
        ttl: 24h
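
    Velero Schedules create backups, not restores. A restore drill therefore means creating a Restore object, by hand or from a CI job, and checking that the workloads come up.

    Below is a sketch that restores the most recent successful production-daily-backup into a scratch namespace instead of overwriting production. The Restore name and the target namespace are hypothetical. Each drill needs a new, unique Restore name.

    restore-drill.yaml
    apiVersion: velero.io/v1
    kind: Restore
    metadata:
      name: production-restore-drill-1          # hypothetical; must be unique per restore
      namespace: velero
    spec:
      # With no backupName set, Velero uses the latest successful backup from this schedule
      scheduleName: production-daily-backup
      includedNamespaces:
      - production
      namespaceMapping:
        production: production-restore-drill    # hypothetical scratch namespace
      restorePVs: true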

    6. Deployment Strategies: Zero-Downtime Updates

    Rolling Updates with Health Checks

    rolling-update.yaml
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: production-app
    spec:
      replicas: 5
      selector:
        matchLabels:
          app: production-app
      strategy:
        type: RollingUpdate
        rollingUpdate:
          maxSurge: 1         # at most one pod above the desired count during a rollout
          maxUnavailable: 0   # never remove a ready pod before its replacement is ready
      template:
        metadata:
          labels:
            app: production-app   # must match spec.selector
        spec:
          containers:
          - name: app
            image: production-app:v1.2.0
            readinessProbe:
              httpGet:
                path: /ready
                port: 8080
              initialDelaySeconds: 5
              periodSeconds: 5
              timeoutSeconds: 3
              failureThreshold: 3
            livenessProbe:
              httpGet:
                path: /health
                port: 8080
              initialDelaySeconds: 30
              periodSeconds: 10
              timeoutSeconds: 5
              failureThreshold: 3

    Benefits:

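    • Zero downtime: With maxSurge: 1 and maxUnavailable: 0, Kubernetes starts one new pod at a time and terminates an old pod only after the new one passes its readiness probe, so ready capacity never drops below 5 replicas.
    • Rollback capability: kubectl rollout undo deployment/production-app reverts to the previous revision.
    • Health verification: Only pods that pass their readiness probe receive Service traffic.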
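
    Readiness gating covers pod startup, but old pods also need to shut down cleanly. When a pod is deleted, its removal from Service endpoints propagates asynchronously. The pod can therefore still receive requests for a moment after shutdown begins. A common pattern is a short preStop delay, after which the kubelet sends SIGTERM and the app drains in-flight requests.

    Below is a sketch of the pod template fragment. It makes three assumptions: the image contains a sleep binary (a distroless image may not), the app handles SIGTERM gracefully, and the 5 and 30 second values are illustrative.

    graceful-shutdown.yaml
    # Pod template fragment for spec.template of production-app
    spec:
      terminationGracePeriodSeconds: 30   # covers the preStop hook plus app shutdown, then SIGKILL
      containers:
      - name: app
        image: production-app:v1.2.0
        lifecycle:
          preStop:
            exec:
              # Keep serving briefly while the pod is removed from Service endpoints;
              # the container receives SIGTERM only after this hook finishes
              command: ["sleep", "5"]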

    Key Takeaways

    Immediate Actions You Can Take

    1. Set resource requests and limits on all your containers today
    2. Implement health checks for every application
    3. Run containers as non-root users
    4. Set up basic monitoring with Prometheus
    5. Configure HPA for your critical applications
    6. Implement automated backups with Velero

    Long-term Strategy

    1. Gradually implement security policies
    2. Build comprehensive monitoring dashboards
    3. Test disaster recovery procedures regularly
    4. Optimize resource usage based on monitoring data
    5. Automate everything possible

    Most importantly, keep it simple! Every extra component is something you have to operate, debug, and upgrade, and overcomplicated infrastructure turns into painful growing pains as you scale.

    Remember: Production Kubernetes is a journey, not a destination. Start with these fundamentals and continuously improve based on your specific needs and challenges.


    Need help implementing these best practices? Join us on Slack
