Deployment & Infrastructure - 1/2

From Container Mastery to Production Infrastructure Reality

You’ve mastered advanced containerization: production-grade orchestration that handles automatic scaling and zero-downtime deployments, security hardening that prevents container escapes and vulnerability exploitation, image optimization that cuts deployment time and resource consumption, and enterprise registry management with authentication, cleanup policies, and operational monitoring. Your containerized applications now operate as production-grade infrastructure that scales, performs, and stays secure in enterprise environments. But here’s the infrastructure reality that separates hobby deployments from enterprise-grade systems: perfect containerization means nothing if your deployment infrastructure can’t handle real-world traffic, lacks cloud-native scalability, has no disaster recovery plan, and operates without the monitoring and automation needed to detect and resolve issues before customers notice them.

The production infrastructure nightmare that destroys scalable businesses:

# Your production infrastructure horror story
# CTO: "We need to handle 10x traffic for the product launch tomorrow"

# Attempt 1: Manual server scaling
$ ssh production-server-1
production$ htop
# CPU: 98%, Memory: 95%, Load: 15.2
# Single server melting under load

$ curl -I https://api.company.com/health
curl: (28) Operation timed out after 30001 milliseconds
# API completely unresponsive

# Attempt 2: Emergency server provisioning
$ aws ec2 run-instances --image-id ami-12345 --instance-type t3.large --count 5
# 20 minutes later...
$ ssh new-server-1
new-server$ sudo apt update && sudo apt install docker.io
# Another 15 minutes of manual setup per server
# No configuration management, everything installed by hand

# Attempt 3: Manual load balancer configuration
$ ssh load-balancer
lb$ sudo nano /etc/nginx/nginx.conf
# Frantically typing server IPs while customers can't access the site
# No health checking, traffic routing to failed servers

# Attempt 4: Database disaster
$ ssh database-server
db$ df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/xvda1       20G   19G  100M  99% /
# Database disk full, transactions failing

# The cascading infrastructure disasters:
# - No auto-scaling, manual server provisioning takes hours
# - No infrastructure as code, every server manually configured
# - No monitoring, problems discovered by angry customers
# - No disaster recovery, single points of failure everywhere
# - No CI/CD pipelines, deployments via SSH and pray
# - No load balancing health checks, traffic to dead servers
# - No CDN, static assets served from overloaded origin servers
# - No backup strategy, data loss risk on every failure

# Launch day result: Complete system failure
# 8-hour outage during peak launch traffic
# 50,000 potential customers lost to competitors
# $5M funding round canceled due to "technical concerns"
# Engineering team blamed for "not being ready for scale"
# The painful truth: Perfect containers can't save amateur infrastructure

The uncomfortable production truth: Advanced containerization and orchestration can’t save you from infrastructure disasters when your deployment strategy lacks cloud-native architecture, automated scaling, proper monitoring, and disaster recovery planning. Professional infrastructure requires thinking beyond containers to the entire deployment ecosystem.

Real-world infrastructure failure consequences:

// What happens when infrastructure practices are amateur:
const infrastructureFailureImpact = {
  scalingDisasters: {
    problem: "Traffic spikes overwhelm manually managed infrastructure",
    cause: "No auto-scaling, manual provisioning, single-server dependencies",
    impact: "Website crashes during viral marketing campaign, customers lost",
    cost: "$2M in lost revenue during peak shopping season",
  },

  securityBreaches: {
    problem: "Compromised server leads to full infrastructure takeover",
    cause: "No infrastructure as code, inconsistent security policies",
    impact: "Attacker pivots through entire network, customer data stolen",
    consequences: "Class action lawsuit, regulatory fines, business closure",
  },

  operationalChaos: {
    problem: "Critical system failure at 3 AM with no monitoring alerts",
    cause: "No proper monitoring, alerting, or on-call procedures",
    impact: "6-hour outage discovered by customer complaints, not systems",
    reality: "Competitors with proper infrastructure capture market share",
  },

  disasterRecoveryFailure: {
    problem: "Data center failure causes complete data loss",
    cause: "No backup strategy, single-region deployment, no DR planning",
    impact: "Company closes permanently, all customer data lost forever",
    prevention: "Professional DR would cost 0.1% of revenue to implement",
  },

  // Perfect containerization is worthless when infrastructure
  // lacks scalability, monitoring, disaster recovery, and automation
};

Production infrastructure mastery requires understanding:

  • Deployment strategies that handle traffic gracefully with rolling updates, blue-green deployments, and canary releases
  • Cloud platform architecture that leverages AWS, GCP, and Azure for scalable, resilient infrastructure
  • Server management and provisioning with Infrastructure as Code that eliminates manual configuration and ensures consistency
  • Load balancers and reverse proxies that distribute traffic intelligently and handle failures transparently
  • CDN and static asset delivery that optimizes performance globally and reduces origin server load

This article transforms your infrastructure from manual, error-prone processes into automated, scalable systems that handle real-world production demands with confidence.


Deployment Strategies: Beyond “SSH and Pray”

The Evolution from Manual Deployments to Professional Strategies

Understanding why manual deployments are career-limiting:

// Manual deployment vs Professional deployment strategies
const deploymentEvolution = {
  manualDeployment: {
    process: "SSH into servers and run commands manually",
    risk: "Human error, inconsistent deployments, downtime",
    scalability: "Doesn't scale beyond 2-3 servers",
    rollback: "Pray you have backups, usually doesn't work",
    testing: "Test in production, debug in front of customers",
    timeline: "Hours of downtime for simple changes",
    stressLevel: "Emergency room levels of blood pressure",
  },

  professionalDeployment: {
    process: "Automated, repeatable, tested deployment pipelines",
    risk: "Minimal risk with automated testing and rollbacks",
    scalability: "Handles thousands of servers automatically",
    rollback: "One-click rollback to any previous version",
    testing: "Comprehensive testing before production",
    timeline: "Zero-downtime deployments in minutes",
    stressLevel: "Relaxed coffee sipping while systems deploy",
  },

  theDeploymentGap: [
    "Manual processes don't scale beyond tiny teams",
    "Human error causes 80% of production incidents",
    "No standardization leads to configuration drift",
    "Rollback procedures often fail when needed most",
    "Testing in production destroys customer experience",
    "Manual deployments become bottlenecks for releases",
  ],
};

Rolling Deployment: The Foundation of Zero-Downtime Updates

#!/bin/bash
# rolling-deployment.sh - Professional rolling deployment implementation

set -euo pipefail

# Configuration
APP_NAME="${APP_NAME:-myapp}"
NEW_VERSION="${1:-latest}"
INSTANCES="${INSTANCES:-5}"
BATCH_SIZE="${BATCH_SIZE:-1}"
HEALTH_CHECK_URL="${HEALTH_CHECK_URL:-/health}"
ROLLBACK_ON_FAILURE="${ROLLBACK_ON_FAILURE:-true}"

rolling_deployment() {
    local new_version="$1"
    local total_instances="$2"
    local batch_size="$3"

    echo "πŸ”„ Starting rolling deployment: $APP_NAME to $new_version"
    echo "πŸ“Š Configuration: $total_instances instances, batch size $batch_size"

    # Pre-deployment validation
    validate_deployment_prerequisites "$new_version"

    # Record current deployment state for rollback
    record_deployment_state

    local current_batch=1
    local total_batches=$(( (total_instances + batch_size - 1) / batch_size ))

    for (( i=1; i<=total_instances; i+=batch_size )); do
        local end_instance=$((i + batch_size - 1))
        if [ $end_instance -gt $total_instances ]; then
            end_instance=$total_instances
        fi

        echo "πŸ“¦ Batch $current_batch/$total_batches: Updating instances $i-$end_instance"

        # Update batch of instances
        update_instance_batch "$i" "$end_instance" "$new_version"

        # Wait for instances to be healthy
        wait_for_batch_health "$i" "$end_instance"

        # Verify deployment quality
        if ! verify_deployment_quality "$i" "$end_instance"; then
            echo "❌ Deployment quality check failed, initiating rollback"
            rollback_deployment
            exit 1
        fi

        # Traffic validation - ensure no degradation
        if ! validate_traffic_health; then
            echo "❌ Traffic health degraded, initiating rollback"
            rollback_deployment
            exit 1
        fi

        # Pause between batches for monitoring
        if [ $end_instance -lt $total_instances ]; then
            echo "⏸️  Monitoring deployment health for 60 seconds..."
            sleep 60
        fi

        ((current_batch++))
    done

    # Final validation
    perform_final_deployment_validation "$new_version"

    echo "βœ… Rolling deployment completed successfully"
    echo "πŸŽ‰ All $total_instances instances updated to $new_version"
}

validate_deployment_prerequisites() {
    local version="$1"

    echo "πŸ” Validating deployment prerequisites..."

    # Check if image exists and is healthy
    if ! docker pull "$APP_NAME:$version" &>/dev/null; then
        echo "❌ Cannot pull image $APP_NAME:$version"
        exit 1
    fi

    # Verify image passes security scan
    if ! security_scan_image "$APP_NAME:$version"; then
        echo "❌ Security scan failed for $APP_NAME:$version"
        exit 1
    fi

    # Check if infrastructure can handle the deployment
    if ! check_infrastructure_capacity; then
        echo "❌ Insufficient infrastructure capacity"
        exit 1
    fi

    # Validate database migrations if needed
    if ! validate_database_migrations "$version"; then
        echo "❌ Database migration validation failed"
        exit 1
    fi

    echo "βœ… Prerequisites validation passed"
}

update_instance_batch() {
    local start_instance="$1"
    local end_instance="$2"
    local version="$3"

    for (( i=start_instance; i<=end_instance; i++ )); do
        echo "πŸ”„ Updating instance $APP_NAME-$i to $version..."

        # Gracefully stop the current instance (30s for connections to drain)
        docker stop --time 30 "$APP_NAME-$i" || true
        docker rm "$APP_NAME-$i" || true

        # Start new instance with updated version
        docker run -d \
            --name "$APP_NAME-$i" \
            --network production \
            --restart unless-stopped \
            --health-cmd "curl -f http://localhost:3000$HEALTH_CHECK_URL || exit 1" \
            --health-interval 10s \
            --health-retries 5 \
            --health-start-period 60s \
            --label "version=$version" \
            --label "deployment-batch=$(date +%Y%m%d-%H%M%S)" \
            -e NODE_ENV=production \
            -e INSTANCE_ID="$i" \
            "$APP_NAME:$version"
    done
}

wait_for_batch_health() {
    local start_instance="$1"
    local end_instance="$2"

    echo "πŸ₯ Waiting for batch instances to become healthy..."

    for (( i=start_instance; i<=end_instance; i++ )); do
        local timeout=300  # 5 minutes
        local counter=0

        while [ $counter -lt $timeout ]; do
            local health_status=$(docker inspect "$APP_NAME-$i" --format='{{.State.Health.Status}}' 2>/dev/null || echo "unhealthy")

            if [ "$health_status" = "healthy" ]; then
                echo "βœ… Instance $APP_NAME-$i is healthy"
                break
            fi

            if [ "$health_status" = "unhealthy" ]; then
                echo "❌ Instance $APP_NAME-$i failed health check"
                show_instance_logs "$APP_NAME-$i"
                return 1
            fi

            sleep 10
            ((counter += 10))
            echo -n "."
        done

        if [ $counter -ge $timeout ]; then
            echo "❌ Instance $APP_NAME-$i failed to become healthy within $timeout seconds"
            return 1
        fi
    done

    echo ""
    echo "βœ… All instances in batch are healthy"
}
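
# Hedged sketch: show_instance_logs is referenced above but not defined in
# this listing; a minimal version using the Docker CLI:
show_instance_logs() {
    local container="$1"
    echo "📋 Last 50 log lines from $container:"
    docker logs --tail 50 "$container" 2>&1 || true
}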

verify_deployment_quality() {
    local start_instance="$1"
    local end_instance="$2"

    echo "πŸ§ͺ Verifying deployment quality..."

    # Test each instance individually
    for (( i=start_instance; i<=end_instance; i++ )); do
        local instance_ip=$(docker inspect "$APP_NAME-$i" --format='{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}')

        # Smoke tests
        if ! run_smoke_tests "http://$instance_ip:3000"; then
            echo "❌ Smoke tests failed for instance $APP_NAME-$i"
            return 1
        fi

        # Performance baseline test
        if ! check_performance_baseline "$instance_ip"; then
            echo "❌ Performance baseline failed for instance $APP_NAME-$i"
            return 1
        fi
    done

    echo "βœ… Deployment quality verification passed"
    return 0
}

validate_traffic_health() {
    echo "πŸ“Š Validating overall traffic health..."

    # Check error rate
    local error_rate=$(get_current_error_rate)
    if [ "$error_rate" -gt 5 ]; then
        echo "❌ Error rate too high: $error_rate%"
        return 1
    fi

    # Check response time
    local avg_response_time=$(get_average_response_time)
    if [ "$avg_response_time" -gt 2000 ]; then
        echo "❌ Response time too high: ${avg_response_time}ms"
        return 1
    fi

    # Check throughput
    local throughput=$(get_current_throughput)
    local expected_throughput=$(get_baseline_throughput)
    local throughput_ratio=$((throughput * 100 / expected_throughput))

    if [ $throughput_ratio -lt 80 ]; then
        echo "❌ Throughput too low: $throughput_ratio% of baseline"
        return 1
    fi

    echo "βœ… Traffic health validation passed"
    return 0
}
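
# Hedged sketch: the metric helpers used by validate_traffic_health are not
# defined in this listing. A minimal version, assuming a Prometheus server at
# $PROMETHEUS_URL (hypothetical) scraping the app; the metric names below
# (http_requests_total, http_request_duration_seconds_*) are illustrative:
prom_query() {
    curl -s -G "${PROMETHEUS_URL:-http://prometheus:9090}/api/v1/query" \
        --data-urlencode "query=$1" | jq -r '.data.result[0].value[1] // "0"'
}

get_current_error_rate() {
    # Integer percentage of 5xx responses over the last 5 minutes
    prom_query 'round(100 * sum(rate(http_requests_total{status=~"5.."}[5m])) / sum(rate(http_requests_total[5m])))'
}

get_average_response_time() {
    # Mean response time in milliseconds over the last 5 minutes
    prom_query 'round(1000 * sum(rate(http_request_duration_seconds_sum[5m])) / sum(rate(http_request_duration_seconds_count[5m])))'
}

get_current_throughput() {
    prom_query 'round(sum(rate(http_requests_total[5m])))'
}

get_baseline_throughput() {
    # Throughput at the same time yesterday as a crude baseline
    prom_query 'round(sum(rate(http_requests_total[5m] offset 1d)))'
}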

# ========================================
# Blue-Green Deployment Strategy
# ========================================

blue_green_deployment() {
    local new_version="$1"
    local current_env="${2:-blue}"
    local target_env="green"

    if [ "$current_env" = "green" ]; then
        target_env="blue"
    fi

    echo "πŸ”΅πŸŸ’ Starting blue-green deployment: $current_env β†’ $target_env"

    # Stage 1: Deploy to inactive environment
    deploy_to_environment "$target_env" "$new_version"

    # Stage 2: Comprehensive testing of new environment
    run_comprehensive_tests "$target_env"

    # Stage 3: Gradual traffic switching with monitoring
    gradual_traffic_switch "$current_env" "$target_env"

    # Stage 4: Monitor and validate
    monitor_post_switch "$target_env"

    # Stage 5: Cleanup old environment
    cleanup_old_environment "$current_env"

    echo "βœ… Blue-green deployment completed successfully"
    echo "🎯 Active environment: $target_env"
}

deploy_to_environment() {
    local environment="$1"
    local version="$2"

    echo "πŸš€ Deploying $version to $environment environment..."

    # Update environment-specific configuration
    update_environment_config "$environment" "$version"

    # Deploy all services to environment
    docker-compose -f "docker-compose.yml" -f "docker-compose.$environment.yml" \
        pull --quiet

    docker-compose -f "docker-compose.yml" -f "docker-compose.$environment.yml" \
        up -d --scale app=3

    # Wait for environment to be fully operational
    wait_for_environment_ready "$environment"

    echo "βœ… Deployment to $environment completed"
}

gradual_traffic_switch() {
    local old_env="$1"
    local new_env="$2"

    echo "πŸ”€ Performing gradual traffic switch..."

    # Start with 10% traffic to new environment
    update_load_balancer_weights "$old_env:90" "$new_env:10"
    monitor_traffic_split 180  # Monitor for 3 minutes

    # Increase to 50% if healthy
    update_load_balancer_weights "$old_env:50" "$new_env:50"
    monitor_traffic_split 300  # Monitor for 5 minutes

    # Full switch if still healthy
    update_load_balancer_weights "$old_env:0" "$new_env:100"
    monitor_traffic_split 600  # Monitor for 10 minutes

    echo "βœ… Traffic switch completed successfully"
}
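
# Hedged sketch: update_load_balancer_weights and monitor_traffic_split are
# referenced above but not defined here. One possible implementation, assuming
# an nginx load balancer whose upstream block is rendered to a config file
# (the path and app-*.internal hostnames are hypothetical):
update_load_balancer_weights() {
    local old_spec="$1"   # e.g. "blue:90"
    local new_spec="$2"   # e.g. "green:10"
    local conf="/etc/nginx/conf.d/upstream.conf"
    local spec env weight

    {
        echo "upstream app_backend {"
        for spec in "$old_spec" "$new_spec"; do
            env="${spec%%:*}"
            weight="${spec##*:}"
            if [ "$weight" -eq 0 ]; then
                # nginx requires weight >= 1, so drained servers are marked down
                echo "    server app-$env.internal:3000 down;"
            else
                echo "    server app-$env.internal:3000 weight=$weight;"
            fi
        done
        echo "}"
    } > "$conf"

    # Validate and hot-reload nginx without dropping connections
    nginx -t && nginx -s reload
}

monitor_traffic_split() {
    local seconds="$1"
    local end=$(( $(date +%s) + seconds ))

    echo "👀 Monitoring traffic split for $seconds seconds..."
    while [ "$(date +%s)" -lt "$end" ]; do
        if ! validate_traffic_health; then
            echo "❌ Traffic health degraded during switch"
            return 1
        fi
        sleep 30
    done
}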

# ========================================
# Canary Deployment Strategy
# ========================================

canary_deployment() {
    local new_version="$1"
    local canary_percentage="${2:-10}"
    local canary_duration="${3:-1800}"  # 30 minutes

    echo "🐀 Starting canary deployment: $canary_percentage% traffic to $new_version"

    # Deploy canary instances
    deploy_canary_instances "$new_version" "$canary_percentage"

    # Configure intelligent traffic routing
    setup_canary_routing "$canary_percentage"

    # Monitor canary metrics with automated decision making
    if monitor_canary_deployment "$canary_duration"; then
        echo "βœ… Canary deployment successful, promoting to full deployment"
        promote_canary_to_production "$new_version"
    else
        echo "❌ Canary deployment failed, automatic rollback initiated"
        rollback_canary_deployment
        exit 1
    fi
}

deploy_canary_instances() {
    local version="$1"
    local percentage="$2"

    local total_instances=$(docker ps --filter "label=app=$APP_NAME" --format "{{.Names}}" | wc -l)
    local canary_instances=$(( (total_instances * percentage) / 100 ))

    if [ $canary_instances -eq 0 ]; then
        canary_instances=1
    fi

    echo "πŸš€ Deploying $canary_instances canary instances (${percentage}% of $total_instances)"

    for (( i=1; i<=canary_instances; i++ )); do
        docker run -d \
            --name "$APP_NAME-canary-$i" \
            --network production \
            --restart unless-stopped \
            --label "app=$APP_NAME" \
            --label "deployment-type=canary" \
            --label "version=$version" \
            --label "canary-group=$(date +%Y%m%d-%H%M%S)" \
            -e NODE_ENV=production \
            -e DEPLOYMENT_TYPE=canary \
            "$APP_NAME:$version"
    done

    # Wait for canary instances to be healthy
    wait_for_canary_health
}

monitor_canary_deployment() {
    local duration="$1"
    local start_time=$(date +%s)
    local end_time=$((start_time + duration))

    echo "πŸ“Š Monitoring canary deployment for $duration seconds..."

    while [ $(date +%s) -lt $end_time ]; do
        # Collect canary metrics
        local canary_metrics=$(collect_canary_metrics)

        # Analyze metrics for anomalies
        if ! analyze_canary_health "$canary_metrics"; then
            echo "❌ Canary health degraded, failing deployment"
            return 1
        fi

        # Progressive analysis - stricter thresholds over time
        local elapsed=$(($(date +%s) - start_time))
        local progress=$((elapsed * 100 / duration))

        if [ $progress -gt 50 ] && ! deep_canary_analysis "$canary_metrics"; then
            echo "❌ Deep canary analysis failed, deployment unhealthy"
            return 1
        fi

        echo "βœ… Canary health check passed ($progress% complete)"
        sleep 60
    done

    echo "βœ… Canary monitoring period completed successfully"
    return 0
}
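
# Hedged sketch: collect_canary_metrics, analyze_canary_health, and
# deep_canary_analysis are referenced above but left undefined. A minimal
# version reusing the prom_query helper, assuming canary traffic is labeled
# deployment_type="canary" (the label and thresholds are illustrative):
collect_canary_metrics() {
    # Canary 5xx percentage over the last 5 minutes
    prom_query 'round(100 * sum(rate(http_requests_total{deployment_type="canary",status=~"5.."}[5m])) / sum(rate(http_requests_total{deployment_type="canary"}[5m])))'
}

analyze_canary_health() {
    # Healthy while the canary error rate stays at or below 5%
    [ "${1:-0}" -le 5 ]
}

deep_canary_analysis() {
    # Stricter late-stage check: error rate at or below 2%
    [ "${1:-0}" -le 2 ]
}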

# ========================================
# Deployment Utilities and Monitoring
# ========================================

run_smoke_tests() {
    local endpoint="$1"

    echo "πŸ§ͺ Running smoke tests against $endpoint..."

    # Test 1: Health endpoint
    if ! curl -f -m 10 "$endpoint/health" &>/dev/null; then
        echo "❌ Health check failed"
        return 1
    fi

    # Test 2: Authentication flow
    local auth_token=$(curl -s -X POST "$endpoint/auth/login" \
        -H "Content-Type: application/json" \
        -d '{"username":"test","password":"test"}' | \
        jq -r '.token' 2>/dev/null)

    if [ -z "$auth_token" ] || [ "$auth_token" = "null" ]; then
        echo "❌ Authentication test failed"
        return 1
    fi

    # Test 3: Core API functionality
    if ! curl -f -H "Authorization: Bearer $auth_token" \
        "$endpoint/api/status" &>/dev/null; then
        echo "❌ Core API test failed"
        return 1
    fi

    # Test 4: Database connectivity
    if ! curl -f "$endpoint/api/health/database" &>/dev/null; then
        echo "❌ Database connectivity test failed"
        return 1
    fi

    echo "βœ… All smoke tests passed"
    return 0
}

rollback_deployment() {
    echo "πŸ”™ Initiating deployment rollback..."

    local previous_version=$(get_previous_deployment_version)

    if [ -z "$previous_version" ]; then
        echo "❌ Cannot determine previous version for rollback"
        exit 1
    fi

    echo "βͺ Rolling back to version: $previous_version"

    # Execute rollback using the same strategy as deployment
    case "${DEPLOYMENT_STRATEGY:-rolling}" in
        "rolling")
            rolling_deployment "$previous_version" "$INSTANCES" "$BATCH_SIZE"
            ;;
        "blue-green")
            # Switch back to previous environment
            switch_to_previous_environment
            ;;
        "canary")
            # Remove canary instances and restore full production
            remove_canary_instances
            ;;
    esac

    # Verify rollback success
    if validate_rollback_success "$previous_version"; then
        echo "βœ… Rollback completed successfully"
        notify_rollback_success "$previous_version"
    else
        echo "❌ Rollback failed - manual intervention required"
        alert_operations_team
        exit 1
    fi
}
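
# Hedged sketch: record_deployment_state and get_previous_deployment_version
# are referenced above but not defined here. A minimal file-based version
# (the state file path is hypothetical):
DEPLOYMENT_STATE_FILE="${DEPLOYMENT_STATE_FILE:-/var/lib/deployments/$APP_NAME.history}"

record_deployment_state() {
    mkdir -p "$(dirname "$DEPLOYMENT_STATE_FILE")"

    # Read the version label from any currently running instance
    local current_version
    current_version=$(docker ps --filter "name=$APP_NAME-" \
        --format '{{.Label "version"}}' | head -1)
    if [ -n "$current_version" ]; then
        echo "$current_version" >> "$DEPLOYMENT_STATE_FILE"
    fi
}

get_previous_deployment_version() {
    [ -f "$DEPLOYMENT_STATE_FILE" ] && tail -1 "$DEPLOYMENT_STATE_FILE"
}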

# Command routing for deployment strategies
case "${1:-help}" in
    "rolling")
        rolling_deployment "${2:-latest}" "${3:-5}" "${4:-1}"
        ;;
    "blue-green")
        blue_green_deployment "${2:-latest}" "${3:-blue}"
        ;;
    "canary")
        canary_deployment "${2:-latest}" "${3:-10}" "${4:-1800}"
        ;;
    "rollback")
        rollback_deployment
        ;;
    "smoke-test")
        run_smoke_tests "${2:-http://localhost:3000}"
        ;;
    "help"|*)
        cat << EOF
Professional Deployment Strategies

Usage: $0 <strategy> [options]

Strategies:
    rolling [version] [instances] [batch]  Rolling deployment (zero downtime)
    blue-green [version] [current-env]    Blue-green deployment (instant switch)
    canary [version] [percentage] [duration] Canary deployment (gradual rollout)
    rollback                              Rollback to previous version
    smoke-test [endpoint]                 Run smoke tests against endpoint

Examples:
    $0 rolling v2.1.0 10 2               # Rolling update, 10 instances, 2 at a time
    $0 blue-green v2.1.0 blue            # Blue-green from blue to green
    $0 canary v2.1.0 15 3600             # Canary with 15% traffic for 1 hour
    $0 rollback                          # Emergency rollback
EOF
        ;;
esac

Cloud Platform Architecture: AWS, GCP, and Azure Mastery

Choosing Your Cloud Foundation Wisely

Understanding cloud platform strengths and use cases:

// Cloud platform comparison for backend infrastructure
const cloudPlatformDecision = {
  aws: {
    strengths: [
      "Largest market share and ecosystem",
      "Most comprehensive service catalog",
      "Best enterprise and compliance support",
      "Mature networking and security services",
      "Extensive third-party integrations",
    ],

    idealFor: [
      "Enterprise applications requiring compliance",
      "Complex multi-service architectures",
      "Organizations with existing AWS investments",
      "Applications needing global edge presence",
      "Teams with strong DevOps/Infrastructure expertise",
    ],

    pricing: "Pay-as-you-go, complex but optimizable",
    learningCurve: "Steep but extensive documentation",

    keyServices: {
      compute: ["EC2", "ECS", "EKS", "Lambda", "Fargate"],
      storage: ["S3", "EBS", "EFS"],
      database: ["RDS", "DynamoDB", "ElastiCache", "DocumentDB"],
      networking: ["VPC", "ALB/NLB", "CloudFront", "Route53"],
      monitoring: ["CloudWatch", "X-Ray", "Systems Manager"],
    },
  },

  gcp: {
    strengths: [
      "Superior machine learning and AI services",
      "Excellent container orchestration (GKE)",
      "Competitive pricing and sustained use discounts",
      "Strong data analytics and BigData tools",
      "Google-scale global network infrastructure",
    ],

    idealFor: [
      "Data-heavy applications and analytics",
      "Machine learning and AI workloads",
      "Kubernetes-native applications",
      "Startups looking for cost-effective scaling",
      "Organizations leveraging Google Workspace",
    ],

    pricing: "Generally more cost-effective, simpler structure",
    learningCurve: "Moderate, good developer experience",

    keyServices: {
      compute: ["Compute Engine", "GKE", "Cloud Run", "Cloud Functions"],
      storage: ["Cloud Storage", "Persistent Disk"],
      database: ["Cloud SQL", "Firestore", "BigQuery", "Memorystore"],
      networking: ["VPC", "Load Balancing", "Cloud CDN", "Cloud DNS"],
      monitoring: ["Stackdriver", "Cloud Logging", "Cloud Monitoring"],
    },
  },

  azure: {
    strengths: [
      "Seamless Microsoft ecosystem integration",
      "Strong hybrid cloud capabilities",
      "Excellent enterprise Active Directory integration",
      "Competitive pricing for Microsoft shops",
      "Growing market presence and feature parity",
    ],

    idealFor: [
      "Organizations heavily using Microsoft stack",
      "Hybrid cloud deployments",
      "Enterprise applications requiring AD integration",
      "Windows-based application workloads",
      ".NET and C# development teams",
    ],

    pricing: "Competitive, especially with Microsoft licensing bundles",
    learningCurve: "Moderate, familiar for Windows administrators",

    keyServices: {
      compute: ["Virtual Machines", "AKS", "Container Instances", "Functions"],
      storage: ["Blob Storage", "Disk Storage", "Files"],
      database: ["SQL Database", "Cosmos DB", "PostgreSQL", "Redis Cache"],
      networking: ["Virtual Network", "Load Balancer", "CDN", "DNS"],
      monitoring: ["Monitor", "Log Analytics", "Application Insights"],
    },
  },
};

// Decision matrix for cloud platform selection
function selectCloudPlatform(requirements) {
  const {
    team_experience,
    existing_stack,
    compliance_needs,
    ai_ml_requirements,
    budget_constraints,
    geographical_presence,
    scaling_requirements,
  } = requirements;

  if (
    existing_stack.includes("microsoft") &&
    team_experience.includes("windows")
  ) {
    return "azure";
  }

  if (ai_ml_requirements === "high" || budget_constraints === "tight") {
    return "gcp";
  }

  if (compliance_needs === "enterprise" || geographical_presence === "global") {
    return "aws";
  }

  // Default recommendation for general use cases
  return team_experience.includes("aws") ? "aws" : "gcp";
}

AWS Infrastructure Implementation:

#!/bin/bash
# aws-infrastructure.sh - Professional AWS infrastructure setup

set -euo pipefail

# Configuration
AWS_REGION="${AWS_REGION:-us-west-2}"
PROJECT_NAME="${PROJECT_NAME:-myapp}"
ENVIRONMENT="${ENVIRONMENT:-production}"

setup_aws_infrastructure() {
    echo "πŸ—οΈ  Setting up AWS infrastructure for $PROJECT_NAME..."

    # Create VPC and networking
    setup_vpc_networking

    # Set up compute resources
    setup_compute_infrastructure

    # Configure databases and storage
    setup_data_layer

    # Set up load balancing and CDN
    setup_networking_layer

    # Configure monitoring and logging
    setup_observability

    echo "βœ… AWS infrastructure setup completed"
}

setup_vpc_networking() {
    echo "🌐 Setting up VPC networking..."

    # Create VPC
    local vpc_id=$(aws ec2 create-vpc \
        --cidr-block 10.0.0.0/16 \
        --tag-specifications "ResourceType=vpc,Tags=[{Key=Name,Value=$PROJECT_NAME-vpc},{Key=Environment,Value=$ENVIRONMENT}]" \
        --query 'Vpc.VpcId' \
        --output text)

    echo "Created VPC: $vpc_id"

    # Create Internet Gateway
    local igw_id=$(aws ec2 create-internet-gateway \
        --tag-specifications "ResourceType=internet-gateway,Tags=[{Key=Name,Value=$PROJECT_NAME-igw}]" \
        --query 'InternetGateway.InternetGatewayId' \
        --output text)

    aws ec2 attach-internet-gateway \
        --vpc-id "$vpc_id" \
        --internet-gateway-id "$igw_id"

    # Create public subnets (for load balancers)
    local public_subnet_1=$(aws ec2 create-subnet \
        --vpc-id "$vpc_id" \
        --cidr-block 10.0.1.0/24 \
        --availability-zone "${AWS_REGION}a" \
        --tag-specifications "ResourceType=subnet,Tags=[{Key=Name,Value=$PROJECT_NAME-public-1},{Key=Type,Value=public}]" \
        --query 'Subnet.SubnetId' \
        --output text)

    local public_subnet_2=$(aws ec2 create-subnet \
        --vpc-id "$vpc_id" \
        --cidr-block 10.0.2.0/24 \
        --availability-zone "${AWS_REGION}b" \
        --tag-specifications "ResourceType=subnet,Tags=[{Key=Name,Value=$PROJECT_NAME-public-2},{Key=Type,Value=public}]" \
        --query 'Subnet.SubnetId' \
        --output text)

    # Create private subnets (for application servers)
    local private_subnet_1=$(aws ec2 create-subnet \
        --vpc-id "$vpc_id" \
        --cidr-block 10.0.3.0/24 \
        --availability-zone "${AWS_REGION}a" \
        --tag-specifications "ResourceType=subnet,Tags=[{Key=Name,Value=$PROJECT_NAME-private-1},{Key=Type,Value=private}]" \
        --query 'Subnet.SubnetId' \
        --output text)

    local private_subnet_2=$(aws ec2 create-subnet \
        --vpc-id "$vpc_id" \
        --cidr-block 10.0.4.0/24 \
        --availability-zone "${AWS_REGION}b" \
        --tag-specifications "ResourceType=subnet,Tags=[{Key=Name,Value=$PROJECT_NAME-private-2},{Key=Type,Value=private}]" \
        --query 'Subnet.SubnetId' \
        --output text)

    # Create database subnets (isolated)
    local db_subnet_1=$(aws ec2 create-subnet \
        --vpc-id "$vpc_id" \
        --cidr-block 10.0.5.0/24 \
        --availability-zone "${AWS_REGION}a" \
        --tag-specifications "ResourceType=subnet,Tags=[{Key=Name,Value=$PROJECT_NAME-db-1},{Key=Type,Value=database}]" \
        --query 'Subnet.SubnetId' \
        --output text)

    local db_subnet_2=$(aws ec2 create-subnet \
        --vpc-id "$vpc_id" \
        --cidr-block 10.0.6.0/24 \
        --availability-zone "${AWS_REGION}b" \
        --tag-specifications "ResourceType=subnet,Tags=[{Key=Name,Value=$PROJECT_NAME-db-2},{Key=Type,Value=database}]" \
        --query 'Subnet.SubnetId' \
        --output text)

    # Set up routing
    setup_vpc_routing "$vpc_id" "$igw_id" "$public_subnet_1" "$public_subnet_2" \
                     "$private_subnet_1" "$private_subnet_2"

    # Configure security groups
    setup_security_groups "$vpc_id"

    echo "βœ… VPC networking configured"
}

setup_compute_infrastructure() {
    echo "πŸ’» Setting up compute infrastructure..."

    # Create ECS cluster with Fargate capacity providers (EC2 capacity
    # requires a separately created Auto Scaling group capacity provider)
    aws ecs create-cluster \
        --cluster-name "$PROJECT_NAME-cluster" \
        --capacity-providers FARGATE FARGATE_SPOT \
        --default-capacity-provider-strategy \
            capacityProvider=FARGATE,weight=1,base=2 \
            capacityProvider=FARGATE_SPOT,weight=4

    # Create application task definition
    create_ecs_task_definition

    # Set up Auto Scaling Group for EC2 capacity
    setup_auto_scaling_group

    # Create ECS service with deployment configuration
    create_ecs_service

    echo "βœ… Compute infrastructure configured"
}

create_ecs_task_definition() {
    cat > task-definition.json << EOF
{
    "family": "$PROJECT_NAME-app",
    "networkMode": "awsvpc",
    "requiresCompatibilities": ["FARGATE"],
    "cpu": "1024",
    "memory": "2048",
    "executionRoleArn": "arn:aws:iam::$(aws sts get-caller-identity --query Account --output text):role/ecsTaskExecutionRole",
    "taskRoleArn": "arn:aws:iam::$(aws sts get-caller-identity --query Account --output text):role/ecsTaskRole",
    "containerDefinitions": [
        {
            "name": "$PROJECT_NAME-container",
            "image": "$PROJECT_NAME:latest",
            "essential": true,
            "portMappings": [
                {
                    "containerPort": 3000,
                    "protocol": "tcp"
                }
            ],
            "environment": [
                {"name": "NODE_ENV", "value": "production"},
                {"name": "PORT", "value": "3000"}
            ],
            "secrets": [
                {
                    "name": "DATABASE_URL",
                    "valueFrom": "arn:aws:secretsmanager:$AWS_REGION:$(aws sts get-caller-identity --query Account --output text):secret:$PROJECT_NAME/database-url"
                }
            ],
            "logConfiguration": {
                "logDriver": "awslogs",
                "options": {
                    "awslogs-group": "/ecs/$PROJECT_NAME",
                    "awslogs-region": "$AWS_REGION",
                    "awslogs-stream-prefix": "ecs"
                }
            },
            "healthCheck": {
                "command": ["CMD-SHELL", "curl -f http://localhost:3000/health || exit 1"],
                "interval": 30,
                "timeout": 5,
                "retries": 3,
                "startPeriod": 60
            }
        }
    ]
}
EOF

    aws ecs register-task-definition \
        --cli-input-json file://task-definition.json

    rm task-definition.json
}

setup_data_layer() {
    echo "πŸ—„οΈ  Setting up data layer..."

    # Create RDS subnet group
    aws rds create-db-subnet-group \
        --db-subnet-group-name "$PROJECT_NAME-db-subnet-group" \
        --db-subnet-group-description "Subnet group for $PROJECT_NAME database" \
        --subnet-ids subnet-xxx subnet-yyy \
        --tags Key=Name,Value="$PROJECT_NAME-db-subnet-group"

    # Generate the master password once and persist it in Secrets Manager
    # so it isn't silently discarded
    local db_password
    db_password=$(aws secretsmanager get-random-password --password-length 32 \
        --exclude-characters '"@/\' --query RandomPassword --output text)
    aws secretsmanager create-secret --name "$PROJECT_NAME/database-password" \
        --secret-string "$db_password"

    # Create RDS instance with Multi-AZ deployment
    aws rds create-db-instance \
        --db-instance-identifier "$PROJECT_NAME-postgres" \
        --db-instance-class db.t3.medium \
        --engine postgres \
        --engine-version 15.4 \
        --master-username postgres \
        --master-user-password "$db_password" \
        --allocated-storage 100 \
        --storage-type gp2 \
        --storage-encrypted \
        --vpc-security-group-ids sg-xxx \
        --db-subnet-group-name "$PROJECT_NAME-db-subnet-group" \
        --backup-retention-period 30 \
        --multi-az \
        --deletion-protection \
        --enable-performance-insights \
        --performance-insights-retention-period 7 \
        --tags Key=Name,Value="$PROJECT_NAME-postgres" Key=Environment,Value="$ENVIRONMENT"

    # Create ElastiCache Redis cluster
    aws elasticache create-cache-cluster \
        --cache-cluster-id "$PROJECT_NAME-redis" \
        --cache-node-type cache.t3.micro \
        --engine redis \
        --num-cache-nodes 1 \
        --cache-subnet-group-name "$PROJECT_NAME-cache-subnet-group" \
        --security-group-ids sg-yyy \
        --tags Key=Name,Value="$PROJECT_NAME-redis"

    # Create S3 bucket for static assets
    aws s3 mb "s3://$PROJECT_NAME-assets-$(date +%Y%m%d)" \
        --region "$AWS_REGION"

    echo "βœ… Data layer configured"
}

setup_networking_layer() {
    echo "🌐 Setting up load balancing and CDN..."

    # Create Application Load Balancer
    local alb_arn=$(aws elbv2 create-load-balancer \
        --name "$PROJECT_NAME-alb" \
        --subnets subnet-xxx subnet-yyy \
        --security-groups sg-zzz \
        --scheme internet-facing \
        --type application \
        --ip-address-type ipv4 \
        --tags Key=Name,Value="$PROJECT_NAME-alb" \
        --query 'LoadBalancers[0].LoadBalancerArn' \
        --output text)

    # Create target group
    local tg_arn=$(aws elbv2 create-target-group \
        --name "$PROJECT_NAME-tg" \
        --protocol HTTP \
        --port 3000 \
        --vpc-id vpc-xxx \
        --target-type ip \
        --health-check-protocol HTTP \
        --health-check-path /health \
        --health-check-interval-seconds 30 \
        --health-check-timeout-seconds 5 \
        --healthy-threshold-count 2 \
        --unhealthy-threshold-count 3 \
        --query 'TargetGroups[0].TargetGroupArn' \
        --output text)

    # Create ALB listener
    aws elbv2 create-listener \
        --load-balancer-arn "$alb_arn" \
        --protocol HTTP \
        --port 80 \
        --default-actions Type=forward,TargetGroupArn="$tg_arn"

    # Set up CloudFront distribution
    setup_cloudfront_distribution "$alb_arn"

    echo "βœ… Networking layer configured"
}

setup_observability() {
    echo "πŸ“Š Setting up monitoring and logging..."

    # Create CloudWatch log group
    aws logs create-log-group \
        --log-group-name "/ecs/$PROJECT_NAME" \
        --retention-in-days 30

    # Create CloudWatch alarms
    create_cloudwatch_alarms

    # Create an X-Ray group for filtering traces (the service map itself
    # is built automatically from instrumented application traffic)
    aws xray create-group \
        --group-name "$PROJECT_NAME" \
        --filter-expression "service(\"$PROJECT_NAME-service\")"

    echo "βœ… Observability configured"
}
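
The create_cloudwatch_alarms helper referenced above is left undefined. A minimal sketch using aws cloudwatch put-metric-alarm (the SNS topic ARN, account ID, and thresholds are illustrative):

create_cloudwatch_alarms() {
    # Alert when average cluster CPU stays above 80% for 10 minutes
    aws cloudwatch put-metric-alarm \
        --alarm-name "$PROJECT_NAME-high-cpu" \
        --namespace AWS/ECS \
        --metric-name CPUUtilization \
        --dimensions Name=ClusterName,Value="$PROJECT_NAME-cluster" \
        --statistic Average \
        --period 300 \
        --evaluation-periods 2 \
        --threshold 80 \
        --comparison-operator GreaterThanThreshold \
        --alarm-actions "arn:aws:sns:$AWS_REGION:123456789012:ops-alerts"
}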

GCP Infrastructure Implementation:

# gcp-infrastructure.yaml - Professional GCP infrastructure with Deployment Manager
resources:
  # VPC Network
  - name: myapp-vpc
    type: compute.v1.network
    properties:
      autoCreateSubnetworks: false

  # Subnets
  - name: myapp-subnet-web
    type: compute.v1.subnetwork
    properties:
      network: $(ref.myapp-vpc.selfLink)
      ipCidrRange: 10.0.1.0/24
      region: us-west1

  - name: myapp-subnet-app
    type: compute.v1.subnetwork
    properties:
      network: $(ref.myapp-vpc.selfLink)
      ipCidrRange: 10.0.2.0/24
      region: us-west1
      privateIpGoogleAccess: true

  # GKE Cluster
  - name: myapp-gke-cluster
    type: container.v1.cluster
    properties:
      zone: us-west1-a
      network: $(ref.myapp-vpc.selfLink)
      subnetwork: $(ref.myapp-subnet-app.selfLink)
      initialClusterVersion: "1.27"
      nodePools:
        - name: default-pool
          initialNodeCount: 3
          config:
            machineType: e2-standard-2
            diskType: pd-ssd
            diskSizeGb: 100
            preemptible: false
            serviceAccount: default
            oauthScopes:
              - https://www.googleapis.com/auth/cloud-platform
          autoscaling:
            enabled: true
            minNodeCount: 1
            maxNodeCount: 10
          management:
            autoUpgrade: true
            autoRepair: true

  # Cloud SQL Instance
  - name: myapp-postgres
    type: sqladmin.v1beta4.instance
    properties:
      backendType: SECOND_GEN
      instanceType: CLOUD_SQL_INSTANCE
      databaseVersion: POSTGRES_15
      region: us-west1
      settings:
        tier: db-g1-small
        storageType: PD_SSD
        storageSize: 100
        storageAutoResize: true
        availabilityType: REGIONAL
        backupConfiguration:
          enabled: true
          startTime: "03:00"
          retainedBackups: 30
        ipConfiguration:
          privateNetwork: $(ref.myapp-vpc.selfLink)
          requireSsl: true
        maintenanceWindow:
          hour: 3
          day: 7

  # Redis Instance
  - name: myapp-redis
    type: redis.v1.instance
    properties:
      tier: STANDARD_HA
      memorySizeGb: 1
      region: us-west1
      authorizedNetwork: $(ref.myapp-vpc.selfLink)
      redisVersion: REDIS_7_0

  # Load balancer forwarding rule (the referenced HTTP proxy and its
  # backend service are assumed to be defined elsewhere)
  - name: myapp-lb
    type: compute.v1.globalForwardingRule
    properties:
      IPProtocol: TCP
      portRange: 80-80
      target: $(ref.myapp-http-proxy.selfLink)

  # Cloud Storage Bucket
  - name: myapp-assets
    type: storage.v1.bucket
    properties:
      location: US
      storageClass: STANDARD
      versioning:
        enabled: true
      lifecycle:
        rule:
          - action:
              type: Delete
            condition:
              age: 365
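
Deploying and updating this template is a single gcloud call; a minimal example, assuming the file above is saved as gcp-infrastructure.yaml:

# Create the deployment from the template
gcloud deployment-manager deployments create myapp-infrastructure \
    --config gcp-infrastructure.yaml

# Preview subsequent changes before applying them
gcloud deployment-manager deployments update myapp-infrastructure \
    --config gcp-infrastructure.yaml --preview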

Azure Infrastructure with ARM Templates:

{
  "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentTemplate.json#",
  "contentVersion": "1.0.0.0",
  "parameters": {
    "projectName": {
      "type": "string",
      "defaultValue": "myapp"
    },
    "environment": {
      "type": "string",
      "defaultValue": "production"
    }
  },
  "resources": [
    {
      "type": "Microsoft.Network/virtualNetworks",
      "apiVersion": "2021-05-01",
      "name": "[concat(parameters('projectName'), '-vnet')]",
      "location": "[resourceGroup().location]",
      "properties": {
        "addressSpace": {
          "addressPrefixes": ["10.0.0.0/16"]
        },
        "subnets": [
          {
            "name": "web-subnet",
            "properties": {
              "addressPrefix": "10.0.1.0/24"
            }
          },
          {
            "name": "app-subnet",
            "properties": {
              "addressPrefix": "10.0.2.0/24"
            }
          },
          {
            "name": "data-subnet",
            "properties": {
              "addressPrefix": "10.0.3.0/24",
              "serviceEndpoints": [
                {
                  "service": "Microsoft.Sql"
                }
              ]
            }
          }
        ]
      }
    },
    {
      "type": "Microsoft.ContainerService/managedClusters",
      "apiVersion": "2023-07-01",
      "name": "[concat(parameters('projectName'), '-aks')]",
      "location": "[resourceGroup().location]",
      "properties": {
        "kubernetesVersion": "1.27.3",
        "dnsPrefix": "[parameters('projectName')]",
        "agentPoolProfiles": [
          {
            "name": "nodepool1",
            "count": 3,
            "vmSize": "Standard_D2s_v3",
            "osType": "Linux",
            "mode": "System",
            "enableAutoScaling": true,
            "minCount": 1,
            "maxCount": 10
          }
        ],
        "servicePrincipalProfile": {
          "clientId": "msi"
        },
        "addonProfiles": {
          "azureKeyvaultSecretsProvider": {
            "enabled": true
          },
          "azurepolicy": {
            "enabled": true
          }
        },
        "networkProfile": {
          "networkPlugin": "azure",
          "serviceCidr": "172.16.0.0/16",
          "dnsServiceIP": "172.16.0.10"
        }
      },
      "identity": {
        "type": "SystemAssigned"
      }
    },
    {
      "type": "Microsoft.Sql/servers",
      "apiVersion": "2021-11-01",
      "name": "[concat(parameters('projectName'), '-sql')]",
      "location": "[resourceGroup().location]",
      "properties": {
        "administratorLogin": "sqladmin",
        "administratorLoginPassword": "[concat(toUpper(uniqueString(resourceGroup().id)), uniqueString(resourceGroup().id), '!')]"
      },
      "resources": [
        {
          "type": "databases",
          "apiVersion": "2021-11-01",
          "name": "[parameters('projectName')]",
          "dependsOn": [
            "[resourceId('Microsoft.Sql/servers', concat(parameters('projectName'), '-sql'))]"
          ],
          "properties": {
            "sku": {
              "name": "S1",
              "tier": "Standard"
            },
            "maxSizeBytes": 107374182400
          }
        }
      ]
    }
  ]
}
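
This template deploys with the Azure CLI; a minimal example, assuming the JSON above is saved as azure-infrastructure.json:

# Create a resource group, then deploy the template into it
az group create --name myapp-rg --location westus2

az deployment group create \
    --resource-group myapp-rg \
    --template-file azure-infrastructure.json \
    --parameters projectName=myapp environment=production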

Server Management and Provisioning: Infrastructure as Code

The Evolution from Snowflake Servers to Cattle Infrastructure

Understanding the infrastructure management revolution:

// Server management evolution: Pets vs Cattle vs Code
const infrastructureEvolution = {
  petsModel: {
    approach: "Hand-crafted, named servers treated like pets",
    characteristics: [
      "Manually configured and maintained",
      "Irreplaceable and unique",
      "SSH access for troubleshooting",
      "Configuration drift over time",
      "Difficult to replicate",
    ],
    problems: [
      "Scaling requires manual work",
      "Inconsistent environments",
      "Single points of failure",
      "Knowledge trapped in individuals",
      "Disaster recovery is painful",
    ],
    reality: "Doesn't scale beyond small teams or simple applications",
  },

  cattleModel: {
    approach: "Disposable, identical servers treated like cattle",
    characteristics: [
      "Automated provisioning",
      "Replaceable and identical",
      "No SSH access needed",
      "Immutable infrastructure",
      "Auto-scaling capable",
    ],
    benefits: [
      "Consistent deployments",
      "Easy disaster recovery",
      "Horizontal scaling",
      "Reduced operational overhead",
      "Better security posture",
    ],
    limitations: "Still requires infrastructure management tooling",
  },

  infrastructureAsCode: {
    approach: "Infrastructure defined, versioned, and managed as code",
    characteristics: [
      "Declarative configuration",
      "Version controlled infrastructure",
      "Automated provisioning and updates",
      "Peer review for infrastructure changes",
      "Reproducible across environments",
    ],
    advantages: [
      "Infrastructure becomes predictable",
      "Changes are tracked and auditable",
      "Environment consistency guaranteed",
      "Collaboration through code review",
      "Disaster recovery through code deployment",
    ],
    outcome: "Infrastructure becomes as manageable as application code",
  },
};

Terraform Infrastructure as Code Implementation:

# terraform/main.tf - Professional multi-cloud infrastructure

terraform {
  required_version = ">= 1.0"
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
    kubernetes = {
      source  = "hashicorp/kubernetes"
      version = "~> 2.23"
    }
    helm = {
      source  = "hashicorp/helm"
      version = "~> 2.11"
    }
  }

  backend "s3" {
    bucket         = "myapp-terraform-state"
    key            = "infrastructure/terraform.tfstate"
    region         = "us-west-2"
    encrypt        = true
    dynamodb_table = "terraform-locks"
  }
}

# Variables
variable "project_name" {
  description = "Name of the project"
  type        = string
  default     = "myapp"
}

variable "environment" {
  description = "Environment name"
  type        = string
  default     = "production"
}

variable "aws_region" {
  description = "AWS region"
  type        = string
  default     = "us-west-2"
}

variable "kubernetes_version" {
  description = "Kubernetes version for EKS"
  type        = string
  default     = "1.28"
}

# Data sources
data "aws_availability_zones" "available" {
  state = "available"
}

data "aws_caller_identity" "current" {}

# Local values
locals {
  cluster_name = "${var.project_name}-${var.environment}"

  common_tags = {
    Project     = var.project_name
    Environment = var.environment
    ManagedBy   = "terraform"
  }

  azs = slice(data.aws_availability_zones.available.names, 0, 3)
}

# VPC Configuration
module "vpc" {
  source = "terraform-aws-modules/vpc/aws"
  version = "~> 5.0"

  name = "${local.cluster_name}-vpc"
  cidr = "10.0.0.0/16"

  azs             = local.azs
  private_subnets = ["10.0.1.0/24", "10.0.2.0/24", "10.0.3.0/24"]
  public_subnets  = ["10.0.4.0/24", "10.0.5.0/24", "10.0.6.0/24"]
  database_subnets = ["10.0.7.0/24", "10.0.8.0/24", "10.0.9.0/24"]

  enable_nat_gateway     = true
  single_nat_gateway     = false
  enable_vpn_gateway     = false
  enable_dns_hostnames   = true
  enable_dns_support     = true

  # VPC Flow Logs
  enable_flow_log                      = true
  create_flow_log_cloudwatch_log_group = true
  create_flow_log_cloudwatch_iam_role  = true
  flow_log_max_aggregation_interval    = 60

  # Subnet tagging for Load Balancers
  public_subnet_tags = {
    "kubernetes.io/role/elb" = "1"
  }

  private_subnet_tags = {
    "kubernetes.io/role/internal-elb" = "1"
  }

  tags = local.common_tags
}

# Security Groups
resource "aws_security_group" "eks_cluster" {
  name_prefix = "${local.cluster_name}-cluster-"
  vpc_id      = module.vpc.vpc_id

  ingress {
    description = "HTTPS"
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    cidr_blocks = [module.vpc.vpc_cidr_block]
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }

  tags = merge(local.common_tags, {
    Name = "${local.cluster_name}-cluster-sg"
  })
}

# EKS Cluster
module "eks" {
  source = "terraform-aws-modules/eks/aws"
  version = "~> 19.0"

  cluster_name    = local.cluster_name
  cluster_version = var.kubernetes_version

  vpc_id                         = module.vpc.vpc_id
  subnet_ids                     = module.vpc.private_subnets
  cluster_endpoint_public_access = true
  cluster_endpoint_private_access = true

  # Cluster security group
  cluster_security_group_additional_rules = {
    ingress_nodes_ephemeral_ports_tcp = {
      description                = "Nodes on ephemeral ports"
      protocol                   = "tcp"
      from_port                  = 1025
      to_port                    = 65535
      type                       = "ingress"
      source_node_security_group = true
    }
  }

  # Node groups
  eks_managed_node_groups = {
    main = {
      name = "main-nodegroup"

      instance_types = ["t3.medium", "t3.large"]
      capacity_type  = "SPOT"

      min_size     = 1
      max_size     = 10
      desired_size = 3

      # Launch template configuration
      launch_template_name        = "${local.cluster_name}-main"
      launch_template_description = "Launch template for main node group"
      launch_template_version     = "$Latest"

      # Runs before the EKS AMI's own bootstrap, which the managed node
      # group invokes automatically (no manual bootstrap.sh call needed)
      pre_bootstrap_user_data = <<-EOT
        #!/bin/bash
        yum install -y amazon-cloudwatch-agent
      EOT

      # Disk configuration
      block_device_mappings = {
        xvda = {
          device_name = "/dev/xvda"
          ebs = {
            volume_size           = 100
            volume_type           = "gp3"
            iops                 = 3000
            throughput           = 150
            encrypted            = true
            delete_on_termination = true
          }
        }
      }

      # The main pool stays untainted so default workloads can schedule
      # here; only the system pool below carries a taint

      labels = {
        Environment = var.environment
        NodeGroup   = "main"
      }

      tags = local.common_tags
    }

    # Additional node group for system workloads
    system = {
      name = "system-nodegroup"

      instance_types = ["t3.small"]
      capacity_type  = "ON_DEMAND"

      min_size     = 2
      max_size     = 4
      desired_size = 2

      labels = {
        Environment = var.environment
        NodeGroup   = "system"
        WorkloadType = "system"
      }

      taints = {
        system = {
          key    = "system"
          value  = "true"
          effect = "NO_SCHEDULE"
        }
      }

      tags = local.common_tags
    }
  }

  # Cluster add-ons
  cluster_addons = {
    coredns = {
      most_recent = true
    }
    kube-proxy = {
      most_recent = true
    }
    vpc-cni = {
      most_recent = true
    }
    aws-ebs-csi-driver = {
      most_recent = true
    }
  }

  tags = local.common_tags
}

# RDS Database
module "rds" {
  source = "terraform-aws-modules/rds/aws"
  version = "~> 6.0"

  identifier = "${local.cluster_name}-postgres"

  # Database configuration
  engine               = "postgres"
  engine_version       = "15.4"
  family              = "postgres15"
  major_engine_version = "15"
  instance_class       = "db.t3.medium"

  allocated_storage     = 100
  max_allocated_storage = 1000
  storage_type          = "gp2"
  storage_encrypted     = true

  # Database settings
  db_name  = replace(var.project_name, "-", "_")
  username = "postgres"
  manage_master_user_password = true
  port     = 5432

  # Network configuration
  multi_az               = true
  db_subnet_group_name   = module.vpc.database_subnet_group
  vpc_security_group_ids = [aws_security_group.rds.id]

  # Backup configuration
  backup_retention_period = 30
  backup_window          = "03:00-04:00"
  maintenance_window     = "Sun:04:00-Sun:05:00"

  # Monitoring
  enabled_cloudwatch_logs_exports = ["postgresql", "upgrade"]
  create_cloudwatch_log_group     = true
  performance_insights_enabled    = true
  performance_insights_retention_period = 7

  # Security
  deletion_protection = true
  skip_final_snapshot = false
  final_snapshot_identifier = "${local.cluster_name}-postgres-final-snapshot-${formatdate("YYYY-MM-DD-hhmm", timestamp())}"

  tags = local.common_tags
}

# RDS Security Group
resource "aws_security_group" "rds" {
  name_prefix = "${local.cluster_name}-rds-"
  vpc_id      = module.vpc.vpc_id

  ingress {
    description = "PostgreSQL"
    from_port   = 5432
    to_port     = 5432
    protocol    = "tcp"
    cidr_blocks = [module.vpc.vpc_cidr_block]
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }

  tags = merge(local.common_tags, {
    Name = "${local.cluster_name}-rds-sg"
  })
}

# ElastiCache Redis
resource "aws_elasticache_subnet_group" "redis" {
  name       = "${local.cluster_name}-redis-subnet"
  subnet_ids = module.vpc.database_subnets

  tags = local.common_tags
}

resource "aws_elasticache_replication_group" "redis" {
  replication_group_id       = "${local.cluster_name}-redis"
  description                = "Redis cluster for ${local.cluster_name}"

  node_type               = "cache.t3.micro"
  port                    = 6379
  parameter_group_name    = "default.redis7"

  num_cache_clusters      = 2
  automatic_failover_enabled = true
  multi_az_enabled        = true

  subnet_group_name       = aws_elasticache_subnet_group.redis.name
  security_group_ids      = [aws_security_group.redis.id]

  at_rest_encryption_enabled = true
  transit_encryption_enabled = true
  auth_token                 = random_password.redis_auth.result

  # Backup configuration
  snapshot_retention_limit = 7
  snapshot_window         = "03:00-05:00"

  # Maintenance
  maintenance_window = "sun:05:00-sun:07:00"

  # Logging
  log_delivery_configuration {
    destination      = aws_cloudwatch_log_group.redis.name
    destination_type = "cloudwatch-logs"
    log_format      = "text"
    log_type        = "slow-log"
  }

  tags = local.common_tags
}

# Redis auth token (ElastiCache forbids '@', '"', '/', and spaces, so
# restrict the special characters random_password may emit)
resource "random_password" "redis_auth" {
  length           = 32
  special          = true
  override_special = "!&#$^<>-"
}

# Redis Security Group
resource "aws_security_group" "redis" {
  name_prefix = "${local.cluster_name}-redis-"
  vpc_id      = module.vpc.vpc_id

  ingress {
    description = "Redis"
    from_port   = 6379
    to_port     = 6379
    protocol    = "tcp"
    cidr_blocks = [module.vpc.vpc_cidr_block]
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }

  tags = merge(local.common_tags, {
    Name = "${local.cluster_name}-redis-sg"
  })
}

# CloudWatch Log Group for Redis
resource "aws_cloudwatch_log_group" "redis" {
  name              = "/elasticache/${local.cluster_name}-redis"
  retention_in_days = 30

  tags = local.common_tags
}

# S3 Bucket for static assets
module "s3_bucket" {
  source = "terraform-aws-modules/s3-bucket/aws"
  version = "~> 3.0"

  bucket = "${local.cluster_name}-assets-${random_string.bucket_suffix.result}"

  # Security
  block_public_acls       = true
  block_public_policy     = true
  ignore_public_acls      = true
  restrict_public_buckets = true

  # Versioning
  versioning = {
    enabled = true
  }

  # Server-side encryption
  server_side_encryption_configuration = {
    rule = {
      apply_server_side_encryption_by_default = {
        sse_algorithm = "AES256"
      }
    }
  }

  # Lifecycle configuration (the v3 module takes a lifecycle_rule list)
  lifecycle_rule = [
    {
      id      = "delete_old_versions"
      enabled = true

      noncurrent_version_expiration = {
        days = 90
      }
    }
  ]

  tags = local.common_tags
}

resource "random_string" "bucket_suffix" {
  length  = 8
  special = false
  upper   = false
}

# Application Load Balancer
module "alb" {
  source = "terraform-aws-modules/alb/aws"
  version = "~> 8.0"

  name = "${local.cluster_name}-alb"

  load_balancer_type = "application"

  vpc_id          = module.vpc.vpc_id
  subnets         = module.vpc.public_subnets
  security_groups = [aws_security_group.alb.id]

  # Target groups (will be managed by Kubernetes ingress)
  target_groups = [
    {
      name     = "${local.cluster_name}-tg"
      backend_protocol     = "HTTP"
      backend_port         = 80
      target_type         = "ip"
      deregistration_delay = 10

      health_check = {
        enabled             = true
        healthy_threshold   = 2
        interval            = 30
        matcher            = "200"
        path               = "/health"
        port               = "traffic-port"
        protocol           = "HTTP"
        timeout            = 5
        unhealthy_threshold = 2
      }

      stickiness = {
        enabled = false
        type    = "lb_cookie"
      }
    }
  ]

  # Listeners
  http_tcp_listeners = [
    {
      port               = 80
      protocol           = "HTTP"
      target_group_index = 0
    }
  ]

  tags = local.common_tags
}

# ALB Security Group
resource "aws_security_group" "alb" {
  name_prefix = "${local.cluster_name}-alb-"
  vpc_id      = module.vpc.vpc_id

  ingress {
    description = "HTTP"
    from_port   = 80
    to_port     = 80
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  ingress {
    description = "HTTPS"
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }

  tags = merge(local.common_tags, {
    Name = "${local.cluster_name}-alb-sg"
  })
}

# Outputs
output "cluster_endpoint" {
  description = "Endpoint for EKS control plane"
  value       = module.eks.cluster_endpoint
}

output "cluster_name" {
  description = "Kubernetes Cluster Name"
  value       = module.eks.cluster_name
}

output "rds_endpoint" {
  description = "RDS instance endpoint"
  value       = module.rds.db_instance_endpoint
}

output "redis_endpoint" {
  description = "ElastiCache Redis endpoint"
  value       = aws_elasticache_replication_group.redis.configuration_endpoint_address
}

output "s3_bucket_name" {
  description = "Name of the S3 bucket"
  value       = module.s3_bucket.s3_bucket_id
}

output "alb_dns_name" {
  description = "The DNS name of the load balancer"
  value       = module.alb.lb_dns_name
}
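
Once terraform apply completes, the outputs above plug straight into cluster access and application configuration. A minimal sketch, assuming the AWS CLI is authenticated against the same account and region as the Terraform state (every value comes from the outputs, nothing is hard-coded):

# Point kubectl at the new EKS cluster
$ aws eks update-kubeconfig --name "$(terraform output -raw cluster_name)"
$ kubectl get nodes

# Feed data-layer endpoints into application configuration
$ export DATABASE_HOST="$(terraform output -raw rds_endpoint)"
$ export REDIS_HOST="$(terraform output -raw redis_endpoint)"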

Load Balancers and Reverse Proxies: Traffic Distribution Excellence

Professional Load Balancing Architecture

Load balancing strategies that actually scale in production:

# nginx.conf - Professional reverse proxy and load balancer configuration

# Main context - global configuration
user nginx;
worker_processes auto;
error_log /var/log/nginx/error.log warn;
pid /var/run/nginx.pid;

# Optimize for high performance
worker_rlimit_nofile 65535;

events {
    worker_connections 4096;
    use epoll;
    multi_accept on;
}

http {
    # Basic settings
    include /etc/nginx/mime.types;
    default_type application/octet-stream;

    # Performance optimizations
    sendfile on;
    tcp_nopush on;
    tcp_nodelay on;
    keepalive_timeout 65;
    keepalive_requests 100;
    types_hash_max_size 2048;
    server_tokens off;

    # Buffer optimizations
    client_body_buffer_size 128k;
    client_max_body_size 100m;
    client_header_buffer_size 1k;
    large_client_header_buffers 4 4k;
    output_buffers 1 32k;
    postpone_output 1460;

    # Gzip compression
    gzip on;
    gzip_vary on;
    gzip_min_length 1000;
    gzip_proxied any;
    gzip_comp_level 6;
    gzip_types
        text/plain
        text/css
        text/xml
        text/javascript
        application/json
        application/javascript
        application/xml+rss
        application/atom+xml
        image/svg+xml;

    # Security headers
    add_header X-Frame-Options "SAMEORIGIN" always;
    add_header X-Content-Type-Options "nosniff" always;
    add_header X-XSS-Protection "1; mode=block" always;
    add_header Referrer-Policy "no-referrer-when-downgrade" always;
    add_header Content-Security-Policy "default-src 'self' http: https: data: blob: 'unsafe-inline'" always;

    # Rate limiting
    limit_req_zone $binary_remote_addr zone=api:10m rate=10r/s;
    limit_req_zone $binary_remote_addr zone=login:10m rate=1r/s;
    limit_conn_zone $binary_remote_addr zone=connections:10m;

    # Upstream servers with advanced load balancing
    upstream backend_api {
        # Load balancing method: least_conn, ip_hash, hash, random
        least_conn;

        # Backend servers with weights and health checks
        server api-1.internal:3000 weight=3 max_fails=2 fail_timeout=30s;
        server api-2.internal:3000 weight=3 max_fails=2 fail_timeout=30s;
        server api-3.internal:3000 weight=2 max_fails=2 fail_timeout=30s;
        server api-4.internal:3000 weight=1 max_fails=2 fail_timeout=30s backup;

        # Connection pooling
        keepalive 32;
        keepalive_requests 100;
        keepalive_timeout 60s;
    }

    upstream backend_websocket {
        # Use IP hash for WebSocket sticky sessions
        ip_hash;

        server ws-1.internal:3001 max_fails=1 fail_timeout=10s;
        server ws-2.internal:3001 max_fails=1 fail_timeout=10s;
        server ws-3.internal:3001 max_fails=1 fail_timeout=10s;

        keepalive 16;
    }

    upstream backend_static {
        # Round robin for static content
        server static-1.internal:8080 weight=1;
        server static-2.internal:8080 weight=1;

        keepalive 8;
    }

    # Health check endpoint
    server {
        listen 8080;
        server_name _;

        location /nginx-health {
            access_log off;
            return 200 "healthy\n";
            add_header Content-Type text/plain;
        }

        # Nginx status for monitoring
        location /nginx-status {
            stub_status on;
            access_log off;
            allow 10.0.0.0/8;
            allow 172.16.0.0/12;
            allow 192.168.0.0/16;
            deny all;
        }
    }

    # Main application server
    server {
        listen 80;
        server_name myapp.com www.myapp.com;

        # Redirect to HTTPS, preserving the hostname the client requested
        return 301 https://$host$request_uri;
    }

    server {
        listen 443 ssl http2;
        server_name myapp.com www.myapp.com;

        # SSL configuration
        ssl_certificate /etc/nginx/ssl/myapp.com.crt;
        ssl_certificate_key /etc/nginx/ssl/myapp.com.key;

        # Modern SSL configuration
        ssl_protocols TLSv1.2 TLSv1.3;
        ssl_ciphers ECDHE-RSA-AES256-GCM-SHA512:DHE-RSA-AES256-GCM-SHA512:ECDHE-RSA-AES256-GCM-SHA384:DHE-RSA-AES256-GCM-SHA384;
        ssl_prefer_server_ciphers off;
        ssl_session_cache shared:SSL:10m;
        ssl_session_timeout 10m;
        ssl_session_tickets off;
        ssl_stapling on;
        ssl_stapling_verify on;

        # HSTS -- and re-declare the http-level security headers, because nginx
        # drops inherited add_header directives once a block defines its own
        add_header Strict-Transport-Security "max-age=31536000; includeSubDomains" always;
        add_header X-Frame-Options "SAMEORIGIN" always;
        add_header X-Content-Type-Options "nosniff" always;

        # Logging
        access_log /var/log/nginx/myapp_access.log combined;
        error_log /var/log/nginx/myapp_error.log;

        # Global rate limiting
        limit_req zone=api burst=20 nodelay;
        limit_conn connections 50;

        # Static content with aggressive caching
        location ~* \.(jpg|jpeg|png|gif|ico|css|js|svg|woff|woff2|ttf|eot)$ {
            expires 1y;
            add_header Cache-Control "public, immutable";
            add_header Vary Accept-Encoding;

            # Serve from static backend
            proxy_pass http://backend_static;
            proxy_cache static_cache;
            proxy_cache_valid 200 1d;
            proxy_cache_use_stale error timeout updating http_500 http_502 http_503 http_504;
            proxy_cache_lock on;
            proxy_cache_lock_timeout 5s;
        }

        # API endpoints with specific rate limiting
        location /api/ {
            # Stricter rate limiting for API
            limit_req zone=api burst=10 nodelay;

            # Proxy to backend API
            proxy_pass http://backend_api;
            proxy_http_version 1.1;
            # Send an empty Connection header so upstream keepalive stays usable;
            # the Upgrade handshake belongs on the WebSocket location, not here
            proxy_set_header Connection "";
            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
            proxy_set_header X-Forwarded-Proto $scheme;
            proxy_set_header X-Request-ID $request_id;

            # Timeouts
            proxy_connect_timeout 5s;
            proxy_send_timeout 60s;
            proxy_read_timeout 60s;

            # Buffering
            proxy_buffering on;
            proxy_buffer_size 4k;
            proxy_buffers 8 4k;
            proxy_busy_buffers_size 8k;

            # Cache API responses selectively
            proxy_cache api_cache;
            proxy_cache_valid 200 5m;
            proxy_cache_valid 404 1m;
            proxy_cache_use_stale error timeout updating http_500 http_502 http_503 http_504;
            proxy_cache_bypass $http_cache_control;

            # Health checks
            proxy_next_upstream error timeout http_500 http_502 http_503 http_504;
            proxy_next_upstream_tries 3;
            proxy_next_upstream_timeout 10s;
        }

        # Authentication endpoints with strict rate limiting
        location /auth/ {
            limit_req zone=login burst=5 nodelay;

            proxy_pass http://backend_api;
            proxy_http_version 1.1;
            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
            proxy_set_header X-Forwarded-Proto $scheme;

            # No caching for auth endpoints
            proxy_no_cache 1;
            proxy_cache_bypass 1;
        }

        # WebSocket endpoints
        location /ws/ {
            proxy_pass http://backend_websocket;
            proxy_http_version 1.1;
            proxy_set_header Upgrade $http_upgrade;
            proxy_set_header Connection "upgrade";
            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
            proxy_set_header X-Forwarded-Proto $scheme;

            # WebSocket specific timeouts
            proxy_connect_timeout 7d;
            proxy_send_timeout 7d;
            proxy_read_timeout 7d;
        }

        # Health check endpoint (no auth required)
        location /health {
            access_log off;
            proxy_pass http://backend_api;
            proxy_connect_timeout 1s;
            proxy_send_timeout 2s;
            proxy_read_timeout 2s;
        }

        # Root location
        location / {
            proxy_pass http://backend_api;
            proxy_http_version 1.1;
            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
            proxy_set_header X-Forwarded-Proto $scheme;
            proxy_set_header X-Request-ID $request_id;

            # Caching for dynamic content
            proxy_cache dynamic_cache;
            proxy_cache_valid 200 1m;
            proxy_cache_use_stale error timeout updating http_500 http_502 http_503 http_504;
            proxy_cache_bypass $cookie_session;
        }

        # Custom error pages
        error_page 404 /404.html;
        error_page 500 502 503 504 /50x.html;

        location = /404.html {
            internal;
            root /var/www/html;
        }

        location = /50x.html {
            internal;
            root /var/www/html;
        }
    }

    # Cache definitions
    proxy_cache_path /var/cache/nginx/static levels=1:2 keys_zone=static_cache:10m max_size=1g inactive=60m use_temp_path=off;
    proxy_cache_path /var/cache/nginx/api levels=1:2 keys_zone=api_cache:10m max_size=100m inactive=10m use_temp_path=off;
    proxy_cache_path /var/cache/nginx/dynamic levels=1:2 keys_zone=dynamic_cache:10m max_size=100m inactive=5m use_temp_path=off;

    # Log format with detailed information
    log_format detailed '$remote_addr - $remote_user [$time_local] "$request" '
                       '$status $body_bytes_sent "$http_referer" '
                       '"$http_user_agent" "$http_x_forwarded_for" '
                       'rt=$request_time uct="$upstream_connect_time" '
                       'uht="$upstream_header_time" urt="$upstream_response_time" '
                       'rid=$request_id';
}
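
A configuration this large should never be reloaded blind. nginx only swaps in a new configuration when it parses cleanly, so test first and reload gracefully:

# Validate the configuration, then reload without dropping connections
$ nginx -t
nginx: the configuration file /etc/nginx/nginx.conf syntax is ok
nginx: configuration file /etc/nginx/nginx.conf test is successful

$ nginx -s reload    # or: systemctl reload nginx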

Advanced Load Balancer Configuration with HAProxy:

# haproxy.cfg - Enterprise-grade load balancer configuration

global
    # Process management
    daemon
    user haproxy
    group haproxy
    pidfile /var/run/haproxy.pid

    # Performance tuning
    maxconn 40000
    ulimit-n 81000

    # SSL configuration
    ssl-default-bind-ciphers ECDHE-RSA-AES256-GCM-SHA384:ECDHE-RSA-CHACHA20-POLY1305:ECDHE-RSA-AES128-GCM-SHA256
    ssl-default-bind-options ssl-min-ver TLSv1.2 no-tls-tickets

    # Certificate store
    crt-base /etc/ssl/certs
    ca-base /etc/ssl/certs

    # Logging
    log stdout local0 info

    # Statistics
    stats socket /var/run/haproxy.sock mode 600 level admin
    stats timeout 2m

defaults
    mode http
    option httplog
    option dontlognull
    option log-health-checks
    option forwardfor
    option http-server-close

    # Timeouts
    timeout connect 5s
    timeout client 50s
    timeout server 50s
    timeout http-request 15s
    timeout http-keep-alive 15s
    timeout check 10s

    # Error handling
    errorfile 400 /etc/haproxy/errors/400.http
    errorfile 403 /etc/haproxy/errors/403.http
    errorfile 408 /etc/haproxy/errors/408.http
    errorfile 500 /etc/haproxy/errors/500.http
    errorfile 502 /etc/haproxy/errors/502.http
    errorfile 503 /etc/haproxy/errors/503.http
    errorfile 504 /etc/haproxy/errors/504.http

# Frontend configuration
frontend web_frontend
    bind *:80
    bind *:443 ssl crt /etc/ssl/certs/myapp.com.pem

    # Security headers
    http-response set-header Strict-Transport-Security "max-age=31536000; includeSubDomains"
    http-response set-header X-Frame-Options "SAMEORIGIN"
    http-response set-header X-Content-Type-Options "nosniff"
    http-response set-header X-XSS-Protection "1; mode=block"

    # Rate limiting using stick tables
    stick-table type ip size 100k expire 30s store http_req_rate(10s)
    http-request track-sc0 src
    http-request reject if { sc_http_req_rate(0) gt 20 }

    # Redirect HTTP to HTTPS
    redirect scheme https if !{ ssl_fc }

    # Request routing based on path
    use_backend api_backend if { path_beg /api/ }
    use_backend ws_backend if { path_beg /ws/ } { hdr(upgrade) -i websocket }
    use_backend static_backend if { path_beg /static/ }
    use_backend auth_backend if { path_beg /auth/ }

    default_backend web_backend

    # Capture headers for logging
    capture request header Host len 64
    capture request header User-Agent len 64
    capture request header X-Forwarded-For len 64

# Backend configurations
backend web_backend
    balance roundrobin
    option httpchk GET /health

    # Backend servers
    server web1 10.0.1.10:3000 check weight 100 maxconn 1000
    server web2 10.0.1.11:3000 check weight 100 maxconn 1000
    server web3 10.0.1.12:3000 check weight 80 maxconn 800
    server web4 10.0.1.13:3000 check weight 50 maxconn 500 backup

    # Health check configuration
    http-check connect port 3000
    http-check send meth GET uri /health ver HTTP/1.1 hdr host myapp.com
    http-check expect status 200

    # Session persistence
    cookie SERVERID insert indirect nocache

    # Connection reuse
    http-reuse safe

backend api_backend
    balance leastconn
    option httpchk GET /api/health

    # Stricter rate limiting for API
    stick-table type ip size 10k expire 60s store http_req_rate(60s)
    http-request track-sc1 src
    http-request reject if { sc_http_req_rate(1) gt 100 }

    server api1 10.0.2.10:3000 check weight 100 maxconn 500
    server api2 10.0.2.11:3000 check weight 100 maxconn 500
    server api3 10.0.2.12:3000 check weight 100 maxconn 500
    server api4 10.0.2.13:3000 check weight 50 maxconn 250 backup

    # Advanced health checks
    http-check connect port 3000
    http-check send meth GET uri /api/health ver HTTP/1.1 hdr host api.myapp.com
    http-check expect status 200
    http-check expect string "healthy"

backend ws_backend
    balance source
    option httpchk GET /ws/health

    # WebSocket specific settings
    timeout server 7d
    timeout tunnel 7d

    server ws1 10.0.3.10:3001 check weight 100
    server ws2 10.0.3.11:3001 check weight 100
    server ws3 10.0.3.12:3001 check weight 100

backend static_backend
    balance roundrobin
    option httpchk HEAD /static/health.txt

    # Optimized for static content
    http-response set-header Cache-Control "public, max-age=31536000"

    server static1 10.0.4.10:8080 check weight 100
    server static2 10.0.4.11:8080 check weight 100

backend auth_backend
    balance roundrobin
    option httpchk GET /auth/health

    # Very strict rate limiting for auth
    stick-table type ip size 10k expire 300s store http_req_rate(60s)
    http-request track-sc2 src
    http-request reject if { sc_http_req_rate(2) gt 10 }

    server auth1 10.0.5.10:3000 check weight 100 maxconn 200
    server auth2 10.0.5.11:3000 check weight 100 maxconn 200

# Statistics and monitoring
listen stats
    bind *:8080
    stats enable
    stats uri /haproxy-stats
    stats realm "HAProxy Statistics"
    stats auth admin:secure_password_here
    stats refresh 30s
    stats show-legends
    stats show-desc "MyApp Load Balancer"

    # Admin interface
    stats admin if TRUE

    # Detailed backend information
    stats show-node

# Health check service
listen health_check
    bind *:8081
    monitor-uri /health
    monitor fail if { nbsrv(web_backend) lt 2 }
    monitor fail if { nbsrv(api_backend) lt 2 }
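
The same validate-then-reload discipline applies to HAProxy, and the admin socket declared in the global section gives live visibility into backend state. A quick operational check (assumes socat is installed):

# Validate the configuration before touching the running process
$ haproxy -c -f /etc/haproxy/haproxy.cfg

# Reload without dropping established connections
$ systemctl reload haproxy

# Inspect live server state through the admin socket
$ echo "show servers state" | socat stdio /var/run/haproxy.sock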

CDN and Static Asset Delivery: Global Performance Optimization

Professional CDN Architecture and Implementation

CDN strategy that actually improves global performance:

#!/bin/bash
# cdn-setup.sh - Professional CDN configuration and optimization

set -euo pipefail

# Configuration
CDN_PROVIDER="${CDN_PROVIDER:-cloudflare}"
DOMAIN="${DOMAIN:-myapp.com}"
ORIGIN_SERVER="${ORIGIN_SERVER:-origin.myapp.com}"
ASSETS_BUCKET="${ASSETS_BUCKET:-myapp-assets}"

setup_cloudflare_cdn() {
    echo "☁️  Setting up Cloudflare CDN configuration..."

    # Cloudflare API configuration
    local zone_id=$(get_cloudflare_zone_id "$DOMAIN")

    # Configure caching rules
    configure_cloudflare_caching "$zone_id"

    # Set up page rules for optimization
    setup_cloudflare_page_rules "$zone_id"

    # Configure security settings
    setup_cloudflare_security "$zone_id"

    # Set up Workers for edge computing
    deploy_cloudflare_workers "$zone_id"

    echo "βœ… Cloudflare CDN configured"
}
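
# Helper: resolve the Cloudflare zone ID for a domain.
# Minimal sketch -- assumes CLOUDFLARE_API_TOKEN is exported and jq is
# installed; setup_cloudflare_page_rules, setup_cloudflare_security, and
# the other helpers called above follow the same API pattern.
get_cloudflare_zone_id() {
    local domain="$1"

    curl -s "https://api.cloudflare.com/client/v4/zones?name=$domain&status=active" \
        -H "Authorization: Bearer $CLOUDFLARE_API_TOKEN" \
        -H "Content-Type: application/json" | jq -r '.result[0].id'
}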

configure_cloudflare_caching() {
    local zone_id="$1"

    echo "πŸ“¦ Configuring Cloudflare caching policies..."

    # Static assets - aggressive caching
    curl -X POST "https://api.cloudflare.com/client/v4/zones/$zone_id/pagerules" \
        -H "Authorization: Bearer $CLOUDFLARE_API_TOKEN" \
        -H "Content-Type: application/json" \
        --data '{
            "targets": [{
                "target": "url",
                "constraint": {
                    "operator": "matches",
                    "value": "'$DOMAIN'/static/*"
                }
            }],
            "actions": [{
                "id": "cache_level",
                "value": "cache_everything"
            }, {
                "id": "edge_cache_ttl",
                "value": 2592000
            }, {
                "id": "browser_cache_ttl",
                "value": 31536000
            }],
            "priority": 1,
            "status": "active"
        }'

    # API responses - selective caching
    curl -X POST "https://api.cloudflare.com/client/v4/zones/$zone_id/pagerules" \
        -H "Authorization: Bearer $CLOUDFLARE_API_TOKEN" \
        -H "Content-Type: application/json" \
        --data '{
            "targets": [{
                "target": "url",
                "constraint": {
                    "operator": "matches",
                    "value": "'$DOMAIN'/api/public/*"
                }
            }],
            "actions": [{
                "id": "cache_level",
                "value": "cache_everything"
            }, {
                "id": "edge_cache_ttl",
                "value": 300
            }],
            "priority": 2,
            "status": "active"
        }'

    # Dynamic content - bypass cache
    curl -X POST "https://api.cloudflare.com/client/v4/zones/$zone_id/pagerules" \
        -H "Authorization: Bearer $CLOUDFLARE_API_TOKEN" \
        -H "Content-Type: application/json" \
        --data '{
            "targets": [{
                "target": "url",
                "constraint": {
                    "operator": "matches",
                    "value": "'$DOMAIN'/api/user/*"
                }
            }],
            "actions": [{
                "id": "cache_level",
                "value": "bypass"
            }],
            "priority": 3,
            "status": "active"
        }'
}

deploy_cloudflare_workers() {
    local zone_id="$1"

    echo "⚑ Deploying Cloudflare Workers for edge processing..."

    # Create advanced image optimization worker
    cat > image-optimization-worker.js << 'EOF'
addEventListener('fetch', event => {
    event.respondWith(handleRequest(event.request))
})

async function handleRequest(request) {
    const url = new URL(request.url)

    // Only process image requests
    if (!url.pathname.match(/\.(jpg|jpeg|png|webp|gif)$/i)) {
        return fetch(request)
    }

    // Get client information
    const accept = request.headers.get('Accept') || ''
    const userAgent = request.headers.get('User-Agent') || ''

    // Determine the optimal format, preferring AVIF (better compression)
    // over WebP when the client advertises support for both
    let format = 'auto'
    if (accept.includes('image/avif')) {
        format = 'avif'
    } else if (accept.includes('image/webp')) {
        format = 'webp'
    }

    // Determine device type for quality optimization
    let quality = 85
    if (userAgent.includes('Mobile')) {
        quality = 75
    }

    // Build Cloudflare Images URL
    const imageUrl = new URL(url)
    imageUrl.searchParams.set('format', format)
    imageUrl.searchParams.set('quality', quality.toString())

    // Add responsive sizing based on viewport
    const viewport = request.headers.get('Viewport-Width')
    if (viewport) {
        const width = Math.min(parseInt(viewport), 2048)
        imageUrl.searchParams.set('width', width.toString())
    }

    // Fetch optimized image
    const response = await fetch(imageUrl.toString())

    // Add performance headers
    const newResponse = new Response(response.body, response)
    newResponse.headers.set('Cache-Control', 'public, max-age=31536000, immutable')
    newResponse.headers.set('X-Image-Optimized', 'cloudflare-worker')

    return newResponse
}
EOF

    # Deploy the worker
    wrangler publish image-optimization-worker.js --name image-optimizer

    # Create security and performance worker
    cat > security-performance-worker.js << 'EOF'
addEventListener('fetch', event => {
    event.respondWith(handleRequest(event.request))
})

async function handleRequest(request) {
    const url = new URL(request.url)

    // Security: Block suspicious requests
    const userAgent = request.headers.get('User-Agent') || ''
    const suspiciousPatterns = [
        /bot/i, /crawler/i, /scraper/i, /spider/i
    ]

    if (suspiciousPatterns.some(pattern => pattern.test(userAgent))) {
        // Rate limit bots (RATE_LIMITER is a Workers KV namespace binding;
        // KV stores strings, so parse the counter before comparing)
        const botKey = `bot:${request.headers.get('CF-Connecting-IP')}`
        const botCount = parseInt(await RATE_LIMITER.get(botKey) || '0', 10)

        if (botCount > 10) {
            return new Response('Rate limited', { status: 429 })
        }

        await RATE_LIMITER.put(botKey, String(botCount + 1), { expirationTtl: 3600 })
    }

    // Performance: Add security headers
    const response = await fetch(request)
    const newResponse = new Response(response.body, response)

    // Security headers
    newResponse.headers.set('X-Frame-Options', 'DENY')
    newResponse.headers.set('X-Content-Type-Options', 'nosniff')
    newResponse.headers.set('X-XSS-Protection', '1; mode=block')
    newResponse.headers.set('Referrer-Policy', 'strict-origin-when-cross-origin')
    newResponse.headers.set('Content-Security-Policy', "default-src 'self'; script-src 'self' 'unsafe-inline' https://cdn.jsdelivr.net")

    // Performance headers
    if (url.pathname.match(/\.(css|js|png|jpg|jpeg|gif|webp|svg|woff|woff2)$/)) {
        newResponse.headers.set('Cache-Control', 'public, max-age=31536000, immutable')
    }

    return newResponse
}
EOF

    wrangler publish security-performance-worker.js --name security-performance
}

setup_aws_cloudfront() {
    echo "🌐 Setting up AWS CloudFront distribution..."

    # Create CloudFront distribution configuration
    cat > cloudfront-distribution.json << EOF
{
    "CallerReference": "$(date +%s)",
    "Comment": "Production CDN for $DOMAIN",
    "DefaultRootObject": "index.html",
    "Origins": {
        "Quantity": 2,
        "Items": [
            {
                "Id": "origin-server",
                "DomainName": "$ORIGIN_SERVER",
                "CustomOriginConfig": {
                    "HTTPPort": 80,
                    "HTTPSPort": 443,
                    "OriginProtocolPolicy": "https-only",
                    "OriginSslProtocols": {
                        "Quantity": 1,
                        "Items": ["TLSv1.2"]
                    }
                }
            },
            {
                "Id": "s3-assets",
                "DomainName": "$ASSETS_BUCKET.s3.amazonaws.com",
                "S3OriginConfig": {
                    "OriginAccessIdentity": ""
                }
            }
        ]
    },
    "DefaultCacheBehavior": {
        "TargetOriginId": "origin-server",
        "ViewerProtocolPolicy": "redirect-to-https",
        "MinTTL": 0,
        "ForwardedValues": {
            "QueryString": true,
            "Cookies": {
                "Forward": "whitelist",
                "WhitelistedNames": {
                    "Quantity": 2,
                    "Items": ["session_id", "auth_token"]
                }
            },
            "Headers": {
                "Quantity": 3,
                "Items": ["Host", "Authorization", "CloudFront-Viewer-Country"]
            }
        },
        "TrustedSigners": {
            "Enabled": false,
            "Quantity": 0
        },
        "Compress": true
    },
    "CacheBehaviors": {
        "Quantity": 3,
        "Items": [
            {
                "PathPattern": "/static/*",
                "TargetOriginId": "s3-assets",
                "ViewerProtocolPolicy": "redirect-to-https",
                "MinTTL": 31536000,
                "DefaultTTL": 31536000,
                "MaxTTL": 31536000,
                "ForwardedValues": {
                    "QueryString": false,
                    "Cookies": {
                        "Forward": "none"
                    }
                },
                "Compress": true
            },
            {
                "PathPattern": "/api/*",
                "TargetOriginId": "origin-server",
                "ViewerProtocolPolicy": "redirect-to-https",
                "MinTTL": 0,
                "DefaultTTL": 300,
                "MaxTTL": 3600,
                "ForwardedValues": {
                    "QueryString": true,
                    "Cookies": {
                        "Forward": "all"
                    },
                    "Headers": {
                        "Quantity": 4,
                        "Items": ["Authorization", "Content-Type", "Accept", "User-Agent"]
                    }
                }
            },
            {
                "PathPattern": "/auth/*",
                "TargetOriginId": "origin-server",
                "ViewerProtocolPolicy": "redirect-to-https",
                "MinTTL": 0,
                "DefaultTTL": 0,
                "MaxTTL": 0,
                "ForwardedValues": {
                    "QueryString": true,
                    "Cookies": {
                        "Forward": "all"
                    },
                    "Headers": {
                        "Quantity": 1,
                        "Items": ["*"]
                    }
                }
            }
        ]
    },
    "Enabled": true,
    "PriceClass": "PriceClass_All",
    "Aliases": {
        "Quantity": 2,
        "Items": ["$DOMAIN", "www.$DOMAIN"]
    },
    "ViewerCertificate": {
        "ACMCertificateArn": "arn:aws:acm:us-east-1:123456789012:certificate/certificate-id",
        "SSLSupportMethod": "sni-only",
        "MinimumProtocolVersion": "TLSv1.2_2021"
    },
    "HttpVersion": "http2",
    "IsIPV6Enabled": true,
    "Logging": {
        "Enabled": true,
        "IncludeCookies": false,
        "Bucket": "$DOMAIN-cloudfront-logs.s3.amazonaws.com",
        "Prefix": "access-logs/"
    }
}
EOF

    # Create the distribution
    aws cloudfront create-distribution \
        --distribution-config file://cloudfront-distribution.json \
        --region us-east-1

    rm cloudfront-distribution.json

    echo "βœ… CloudFront distribution created"
}

optimize_static_assets() {
    echo "🎨 Optimizing static assets for CDN delivery..."

    # Create asset optimization pipeline
    cat > optimize-assets.js << 'EOF'
const fs = require('fs');
const path = require('path');
const sharp = require('sharp');
const { minify } = require('terser');
const CleanCSS = require('clean-css');
const { gzipSync, brotliCompressSync, constants } = require('zlib');

class AssetOptimizer {
    constructor(srcDir, distDir) {
        this.srcDir = srcDir;
        this.distDir = distDir;
        this.stats = {
            processed: 0,
            originalSize: 0,
            optimizedSize: 0
        };
    }

    async optimizeImages(inputDir, outputDir) {
        console.log('🖼️  Optimizing images...');

        const imageFiles = this.getFiles(inputDir, /\.(jpg|jpeg|png|webp|svg)$/i);

        for (const file of imageFiles) {
            const inputPath = path.join(inputDir, file);
            const outputPath = path.join(outputDir, file);
            // Ensure the destination directory exists (redeclaring outputDir here
            // with const would shadow the parameter and throw a SyntaxError)
            fs.mkdirSync(path.dirname(outputPath), { recursive: true });

            const inputStats = fs.statSync(inputPath);
            this.stats.originalSize += inputStats.size;

            if (file.endsWith('.svg')) {
                // Copy SVG files as-is (could add SVGO optimization)
                fs.copyFileSync(inputPath, outputPath);
            } else {
                // Optimize raster images, preserving each file's original format
                // (chaining .jpeg().png().webp() would force every output to WebP)
                const pipeline = sharp(inputPath)
                    .resize(2048, 2048, {
                        fit: 'inside',
                        withoutEnlargement: true
                    });

                const ext = path.extname(file).toLowerCase();
                if (ext === '.png') {
                    pipeline.png({ quality: 85, progressive: true });
                } else if (ext === '.webp') {
                    pipeline.webp({ quality: 85 });
                } else {
                    pipeline.jpeg({ quality: 85, progressive: true });
                }

                await pipeline.toFile(outputPath);

                // Generate additional formats
                const baseName = path.parse(file).name;
                const baseDir = path.dirname(outputPath);

                // Generate WebP version
                await sharp(inputPath)
                    .webp({ quality: 85 })
                    .toFile(path.join(baseDir, `${baseName}.webp`));

                // Generate AVIF version for modern browsers
                try {
                    await sharp(inputPath)
                        .avif({ quality: 75 })
                        .toFile(path.join(baseDir, `${baseName}.avif`));
                } catch (e) {
                    // AVIF not supported in all Sharp versions
                    console.log(`AVIF optimization skipped for ${file}`);
                }
            }

            const outputStats = fs.statSync(outputPath);
            this.stats.optimizedSize += outputStats.size;
            this.stats.processed++;

            console.log(`  Optimized ${file}: ${this.formatBytes(inputStats.size)} → ${this.formatBytes(outputStats.size)}`);
        }
    }

    async optimizeJavaScript(inputDir, outputDir) {
        console.log('📜 Optimizing JavaScript...');

        const jsFiles = this.getFiles(inputDir, /\.js$/i);

        for (const file of jsFiles) {
            const inputPath = path.join(inputDir, file);
            const outputPath = path.join(outputDir, file);
            // Create the destination directory without shadowing the outputDir parameter
            fs.mkdirSync(path.dirname(outputPath), { recursive: true });

            const code = fs.readFileSync(inputPath, 'utf8');
            const inputSize = Buffer.byteLength(code, 'utf8');
            this.stats.originalSize += inputSize;

            // Minify JavaScript
            const result = await minify(code, {
                compress: {
                    dead_code: true,
                    drop_console: true,
                    drop_debugger: true,
                    keep_fargs: false,
                    passes: 2
                },
                mangle: {
                    toplevel: true
                },
                format: {
                    comments: false
                }
            });

            const minified = result.code;
            const outputSize = Buffer.byteLength(minified, 'utf8');
            this.stats.optimizedSize += outputSize;

            // Write minified version
            fs.writeFileSync(outputPath, minified);

            // Create compressed versions
            this.createCompressedVersions(outputPath, minified);

            console.log(`  Optimized ${file}: ${this.formatBytes(inputSize)} → ${this.formatBytes(outputSize)}`);
        }
    }

    async optimizeCSS(inputDir, outputDir) {
        console.log('🎨 Optimizing CSS...');

        const cssFiles = this.getFiles(inputDir, /\.css$/i);

        for (const file of cssFiles) {
            const inputPath = path.join(inputDir, file);
            const outputPath = path.join(outputDir, file);
            // Create the destination directory without shadowing the outputDir parameter
            fs.mkdirSync(path.dirname(outputPath), { recursive: true });

            const css = fs.readFileSync(inputPath, 'utf8');
            const inputSize = Buffer.byteLength(css, 'utf8');
            this.stats.originalSize += inputSize;

            // Minify CSS
            const result = new CleanCSS({
                level: 2,
                inline: ['all'],
                rebase: false
            }).minify(css);

            if (result.errors.length > 0) {
                console.error(`CSS optimization errors in ${file}:`, result.errors);
                continue;
            }

            const minified = result.styles;
            const outputSize = Buffer.byteLength(minified, 'utf8');
            this.stats.optimizedSize += outputSize;

            // Write minified version
            fs.writeFileSync(outputPath, minified);

            // Create compressed versions
            this.createCompressedVersions(outputPath, minified);

            console.log(`  Optimized ${file}: ${this.formatBytes(inputSize)} → ${this.formatBytes(outputSize)}`);
        }
    }

    createCompressedVersions(filePath, content) {
        // Create gzipped version
        const gzipped = gzipSync(content, { level: 9 });
        fs.writeFileSync(`${filePath}.gz`, gzipped);

        // Create Brotli compressed version
        const brotli = brotliCompressSync(content, {
            params: {
                [constants.BROTLI_PARAM_QUALITY]: 11
            }
        });
        fs.writeFileSync(`${filePath}.br`, brotli);
    }

    getFiles(dir, pattern) {
        const files = [];

        function walk(currentDir) {
            const entries = fs.readdirSync(currentDir, { withFileTypes: true });

            for (const entry of entries) {
                const fullPath = path.join(currentDir, entry.name);

                if (entry.isDirectory()) {
                    walk(fullPath);
                } else if (pattern.test(entry.name)) {
                    files.push(path.relative(dir, fullPath));
                }
            }
        }

        walk(dir);
        return files;
    }

    formatBytes(bytes) {
        if (bytes === 0) return '0 B';
        const k = 1024;
        const sizes = ['B', 'KB', 'MB', 'GB'];
        const i = Math.floor(Math.log(bytes) / Math.log(k));
        return `${parseFloat((bytes / Math.pow(k, i)).toFixed(1))} ${sizes[i]}`;
    }

    printStats() {
        const saved = this.stats.originalSize - this.stats.optimizedSize;
        const percent = ((saved / this.stats.originalSize) * 100).toFixed(1);

        console.log('\n📊 Optimization Summary:');
        console.log(`   Files processed: ${this.stats.processed}`);
        console.log(`   Original size: ${this.formatBytes(this.stats.originalSize)}`);
        console.log(`   Optimized size: ${this.formatBytes(this.stats.optimizedSize)}`);
        console.log(`   Space saved: ${this.formatBytes(saved)} (${percent}%)`);
    }
}

// Usage
const optimizer = new AssetOptimizer('./src/assets', './dist/assets');

async function main() {
    console.log('🚀 Starting asset optimization pipeline...');

    await optimizer.optimizeImages('./src/assets/images', './dist/assets/images');
    await optimizer.optimizeJavaScript('./src/assets/js', './dist/assets/js');
    await optimizer.optimizeCSS('./src/assets/css', './dist/assets/css');

    optimizer.printStats();
    console.log('✅ Asset optimization completed');
}

main().catch(console.error);
EOF

    # Run the optimization
    node optimize-assets.js

    echo "βœ… Static assets optimized"
}

# Command routing
case "${1:-help}" in
    "cloudflare")
        setup_cloudflare_cdn
        ;;
    "aws")
        setup_aws_cloudfront
        ;;
    "optimize")
        optimize_static_assets
        ;;
    "all")
        optimize_static_assets
        setup_cloudflare_cdn
        ;;
    "help"|*)
        cat << EOF
CDN Setup and Optimization

Usage: $0 <command>

Commands:
    cloudflare      Set up Cloudflare CDN configuration
    aws            Set up AWS CloudFront distribution
    optimize       Optimize static assets for CDN delivery
    all            Run complete CDN setup and optimization

Examples:
    $0 cloudflare   # Configure Cloudflare CDN
    $0 aws          # Set up CloudFront distribution
    $0 optimize     # Optimize assets for delivery
    $0 all          # Complete CDN setup
EOF
        ;;
esac
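
Once a CDN fronts the origin, verify that the caching rules actually engage; the edge reports its decision in a response header. A quick check, assuming Cloudflare and the page rules above (the paths are illustrative; CloudFront exposes the equivalent via an X-Cache header):

# Static assets should be HITs once the first request warms the edge
$ curl -sI https://myapp.com/static/app.css | grep -i cf-cache-status
cf-cache-status: HIT

# Per-user API responses should bypass the edge cache entirely
$ curl -sI https://myapp.com/api/user/profile | grep -i cf-cache-status
cf-cache-status: BYPASS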

Key Takeaways

Professional deployment and infrastructure management transforms amateur manual processes into automated, scalable systems that handle real-world production demands. Modern infrastructure requires thinking beyond single servers to orchestrated, cloud-native architectures with proper load balancing, CDN optimization, and Infrastructure as Code practices.

The deployment and infrastructure mastery mindset:

  • Deployment strategies eliminate downtime: Rolling, blue-green, and canary deployments ensure zero-downtime updates with automatic rollback capabilities (see the sketch after this list)
  • Cloud platforms provide scalable foundation: AWS, GCP, and Azure offer the building blocks for resilient, globally distributed infrastructure
  • Infrastructure as Code ensures consistency: Terraform and similar tools make infrastructure reproducible, versionable, and auditable
  • Load balancing enables scale: Professional load balancers distribute traffic intelligently while handling failures gracefully
  • CDNs optimize global performance: Content delivery networks reduce latency and server load through intelligent edge caching
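
On Kubernetes, that rollback story reduces to a few commands. A minimal sketch, assuming a Deployment named myapp with a container called api (both names are illustrative):

# Rolling update: pods are replaced incrementally, gated by readiness probes
$ kubectl set image deployment/myapp api=registry.myapp.com/api:v2.1.0
$ kubectl rollout status deployment/myapp --timeout=120s

# If the rollout stalls or new pods fail their health checks, revert in one command
$ kubectl rollout undo deployment/myapp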

What distinguishes professional deployment infrastructure:

  • Automated deployment pipelines that handle complexity without human intervention
  • Multi-cloud infrastructure provisioning that eliminates vendor lock-in risks
  • Intelligent load balancing with health checking and automatic failover
  • Global CDN optimization that serves content from the edge closest to users
  • Comprehensive monitoring and alerting that detects issues before customers notice

What’s Next

This article covered deployment strategies, cloud platform architecture, Infrastructure as Code, load balancing, and CDN optimization. The next article completes the deployment infrastructure with CI/CD pipeline automation, comprehensive monitoring and alerting systems, centralized log aggregation and analysis, disaster recovery planning, and backup strategies that ensure business continuity.

You’re no longer manually deploying applications and hoping they work; you’re operating professional infrastructure that scales globally, handles failures gracefully, and delivers optimal performance to users worldwide. The deployment foundation is solid. Now we build the operational excellence around it.