Deployment & Infrastructure - 1/2
From Container Mastery to Production Infrastructure Reality
You’ve mastered advanced containerization with production-grade orchestration that handles automatic scaling and zero-downtime deployments, implemented comprehensive security hardening that prevents container escapes and vulnerability exploitation, optimized images with techniques that reduce deployment time and resource consumption, and established enterprise registry management with authentication, cleanup policies, and operational monitoring. Your containerized applications now run on production-grade foundations that scale, perform, and stay secure in enterprise environments. But here’s the infrastructure reality that separates hobby deployments from enterprise-grade systems: perfect containerization means nothing if your deployment infrastructure can’t handle real-world traffic, lacks cloud-native scalability, has no disaster recovery plan, and operates without the monitoring and automation needed to detect and resolve issues before customers notice them.
The production infrastructure nightmare that destroys scalable businesses:
# Your production infrastructure horror story
# CTO: "We need to handle 10x traffic for the product launch tomorrow"
# Attempt 1: Manual server scaling
$ ssh production-server-1
production$ htop
# CPU: 98%, Memory: 95%, Load: 15.2
# Single server melting under load
$ curl -I https://api.company.com/health
curl: (28) Operation timed out after 30001 milliseconds
# API completely unresponsive
# Attempt 2: Emergency server provisioning
$ aws ec2 run-instances --image-id ami-12345 --instance-type t3.large --count 5
# 20 minutes later...
$ ssh new-server-1
new-server$ sudo apt update && sudo apt install docker.io
# Another 15 minutes of manual setup per server
# No configuration management, everything installed by hand
# Attempt 3: Manual load balancer configuration
$ ssh load-balancer
lb$ sudo nano /etc/nginx/nginx.conf
# Frantically typing server IPs while customers can't access the site
# No health checking, traffic routing to failed servers
# Attempt 4: Database disaster
$ ssh database-server
db$ df -h
Filesystem Size Used Avail Use% Mounted on
/dev/xvda1 20G 19G 100M 99% /
# Database disk full, transactions failing
# The cascading infrastructure disasters:
# - No auto-scaling, manual server provisioning takes hours
# - No infrastructure as code, every server manually configured
# - No monitoring, problems discovered by angry customers
# - No disaster recovery, single points of failure everywhere
# - No CI/CD pipelines, deployments via SSH and pray
# - No load balancing health checks, traffic to dead servers
# - No CDN, static assets served from overloaded origin servers
# - No backup strategy, data loss risk on every failure
# Launch day result: Complete system failure
# 8-hour outage during peak launch traffic
# 50,000 potential customers lost to competitors
# $5M funding round canceled due to "technical concerns"
# Engineering team blamed for "not being ready for scale"
# The painful truth: Perfect containers can't save amateur infrastructure
The uncomfortable production truth: Advanced containerization and orchestration can’t save you from infrastructure disasters when your deployment strategy lacks cloud-native architecture, automated scaling, proper monitoring, and disaster recovery planning. Professional infrastructure requires thinking beyond containers to the entire deployment ecosystem.
Real-world infrastructure failure consequences:
// What happens when infrastructure practices are amateur:
const infrastructureFailureImpact = {
scalingDisasters: {
problem: "Traffic spikes overwhelm manually managed infrastructure",
cause: "No auto-scaling, manual provisioning, single-server dependencies",
impact: "Website crashes during viral marketing campaign, customers lost",
cost: "$2M in lost revenue during peak shopping season",
},
securityBreaches: {
problem: "Compromised server leads to full infrastructure takeover",
cause: "No infrastructure as code, inconsistent security policies",
impact: "Attacker pivots through entire network, customer data stolen",
consequences: "Class action lawsuit, regulatory fines, business closure",
},
operationalChaos: {
problem: "Critical system failure at 3 AM with no monitoring alerts",
cause: "No proper monitoring, alerting, or on-call procedures",
impact: "6-hour outage discovered by customer complaints, not systems",
reality: "Competitors with proper infrastructure capture market share",
},
disasterRecoveryFailure: {
problem: "Data center failure causes complete data loss",
cause: "No backup strategy, single-region deployment, no DR planning",
impact: "Company closes permanently, all customer data lost forever",
prevention: "Professional DR would cost 0.1% of revenue to implement",
},
// Perfect containerization is worthless when infrastructure
// lacks scalability, monitoring, disaster recovery, and automation
};
Production infrastructure mastery requires understanding:
- Deployment strategies that handle traffic gracefully with rolling updates, blue-green deployments, and canary releases
- Cloud platform architecture that leverages AWS, GCP, and Azure for scalable, resilient infrastructure
- Server management and provisioning with Infrastructure as Code that eliminates manual configuration and ensures consistency
- Load balancers and reverse proxies that distribute traffic intelligently and handle failures transparently
- CDN and static asset delivery that optimizes performance globally and reduces origin server load
This article transforms your infrastructure from manual, error-prone processes into automated, scalable systems that handle real-world production demands with confidence.
Deployment Strategies: Beyond “SSH and Pray”
The Evolution from Manual Deployments to Professional Strategies
Understanding why manual deployments are career-limiting:
// Manual deployment vs Professional deployment strategies
const deploymentEvolution = {
manualDeployment: {
process: "SSH into servers and run commands manually",
risk: "Human error, inconsistent deployments, downtime",
scalability: "Doesn't scale beyond 2-3 servers",
rollback: "Pray you have backups, usually doesn't work",
testing: "Test in production, debug in front of customers",
timeline: "Hours of downtime for simple changes",
stressLevel: "Emergency room levels of blood pressure",
},
professionalDeployment: {
process: "Automated, repeatable, tested deployment pipelines",
risk: "Minimal risk with automated testing and rollbacks",
scalability: "Handles thousands of servers automatically",
rollback: "One-click rollback to any previous version",
testing: "Comprehensive testing before production",
timeline: "Zero-downtime deployments in minutes",
stressLevel: "Relaxed coffee sipping while systems deploy",
},
theDeploymentGap: [
"Manual processes don't scale beyond tiny teams",
"Human error causes 80% of production incidents",
"No standardization leads to configuration drift",
"Rollback procedures often fail when needed most",
"Testing in production destroys customer experience",
"Manual deployments become bottlenecks for releases",
],
};
Rolling Deployment: The Foundation of Zero-Downtime Updates
#!/bin/bash
# rolling-deployment.sh - Professional rolling deployment implementation
set -euo pipefail
# Configuration
APP_NAME="${APP_NAME:-myapp}"
NEW_VERSION="${1:-latest}"
INSTANCES="${INSTANCES:-5}"
BATCH_SIZE="${BATCH_SIZE:-1}"
HEALTH_CHECK_URL="${HEALTH_CHECK_URL:-/health}"
ROLLBACK_ON_FAILURE="${ROLLBACK_ON_FAILURE:-true}"
rolling_deployment() {
local new_version="$1"
local total_instances="$2"
local batch_size="$3"
echo "π Starting rolling deployment: $APP_NAME to $new_version"
echo "π Configuration: $total_instances instances, batch size $batch_size"
# Pre-deployment validation
validate_deployment_prerequisites "$new_version"
# Record current deployment state for rollback
record_deployment_state
local current_batch=1
local total_batches=$(( (total_instances + batch_size - 1) / batch_size ))
for (( i=1; i<=total_instances; i+=batch_size )); do
local end_instance=$((i + batch_size - 1))
if [ $end_instance -gt $total_instances ]; then
end_instance=$total_instances
fi
echo "π¦ Batch $current_batch/$total_batches: Updating instances $i-$end_instance"
# Update batch of instances
update_instance_batch "$i" "$end_instance" "$new_version"
# Wait for instances to be healthy
wait_for_batch_health "$i" "$end_instance"
# Verify deployment quality
if ! verify_deployment_quality "$i" "$end_instance"; then
echo "β Deployment quality check failed, initiating rollback"
rollback_deployment
exit 1
fi
# Traffic validation - ensure no degradation
if ! validate_traffic_health; then
echo "β Traffic health degraded, initiating rollback"
rollback_deployment
exit 1
fi
# Pause between batches for monitoring
if [ $end_instance -lt $total_instances ]; then
echo "βΈοΈ Monitoring deployment health for 60 seconds..."
sleep 60
fi
((current_batch++))
done
# Final validation
perform_final_deployment_validation "$new_version"
echo "β
Rolling deployment completed successfully"
echo "π All $total_instances instances updated to $new_version"
}
validate_deployment_prerequisites() {
local version="$1"
echo "π Validating deployment prerequisites..."
# Check if image exists and is healthy
if ! docker pull "$APP_NAME:$version" &>/dev/null; then
echo "β Cannot pull image $APP_NAME:$version"
exit 1
fi
# Verify image passes security scan
if ! security_scan_image "$APP_NAME:$version"; then
echo "β Security scan failed for $APP_NAME:$version"
exit 1
fi
# Check if infrastructure can handle the deployment
if ! check_infrastructure_capacity; then
echo "β Insufficient infrastructure capacity"
exit 1
fi
# Validate database migrations if needed
if ! validate_database_migrations "$version"; then
echo "β Database migration validation failed"
exit 1
fi
echo "β
Prerequisites validation passed"
}
update_instance_batch() {
local start_instance="$1"
local end_instance="$2"
local version="$3"
for (( i=start_instance; i<=end_instance; i++ )); do
echo "π Updating instance $APP_NAME-$i to $version..."
# Gracefully stop current instance
docker stop "$APP_NAME-$i" --time=30 || true
docker rm "$APP_NAME-$i" || true
# Start new instance with updated version
docker run -d \
--name "$APP_NAME-$i" \
--network production \
--restart unless-stopped \
--health-cmd "curl -f http://localhost:3000$HEALTH_CHECK_URL || exit 1" \
--health-interval 10s \
--health-retries 5 \
--health-start-period 60s \
--label "version=$version" \
--label "deployment-batch=$(date +%Y%m%d-%H%M%S)" \
-e NODE_ENV=production \
-e INSTANCE_ID="$i" \
"$APP_NAME:$version"
done
}
wait_for_batch_health() {
local start_instance="$1"
local end_instance="$2"
echo "π₯ Waiting for batch instances to become healthy..."
for (( i=start_instance; i<=end_instance; i++ )); do
local timeout=300 # 5 minutes
local counter=0
while [ $counter -lt $timeout ]; do
local health_status=$(docker inspect "$APP_NAME-$i" --format='{{.State.Health.Status}}' 2>/dev/null || echo "unhealthy")
if [ "$health_status" = "healthy" ]; then
echo "β
Instance $APP_NAME-$i is healthy"
break
fi
if [ "$health_status" = "unhealthy" ]; then
echo "β Instance $APP_NAME-$i failed health check"
show_instance_logs "$APP_NAME-$i"
return 1
fi
sleep 10
((counter += 10))
echo -n "."
done
if [ $counter -ge $timeout ]; then
echo "β Instance $APP_NAME-$i failed to become healthy within $timeout seconds"
return 1
fi
done
echo ""
echo "β
All instances in batch are healthy"
}
verify_deployment_quality() {
local start_instance="$1"
local end_instance="$2"
echo "π§ͺ Verifying deployment quality..."
# Test each instance individually
for (( i=start_instance; i<=end_instance; i++ )); do
local instance_ip=$(docker inspect "$APP_NAME-$i" --format='{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}')
# Smoke tests
if ! run_smoke_tests "http://$instance_ip:3000"; then
echo "β Smoke tests failed for instance $APP_NAME-$i"
return 1
fi
# Performance baseline test
if ! check_performance_baseline "$instance_ip"; then
echo "β Performance baseline failed for instance $APP_NAME-$i"
return 1
fi
done
echo "β
Deployment quality verification passed"
return 0
}
validate_traffic_health() {
echo "π Validating overall traffic health..."
# Check error rate
local error_rate=$(get_current_error_rate)
if [ "$error_rate" -gt 5 ]; then
echo "β Error rate too high: $error_rate%"
return 1
fi
# Check response time
local avg_response_time=$(get_average_response_time)
if [ "$avg_response_time" -gt 2000 ]; then
echo "β Response time too high: ${avg_response_time}ms"
return 1
fi
# Check throughput
local throughput=$(get_current_throughput)
local expected_throughput=$(get_baseline_throughput)
local throughput_ratio=$((throughput * 100 / expected_throughput))
if [ $throughput_ratio -lt 80 ]; then
echo "β Throughput too low: $throughput_ratio% of baseline"
return 1
fi
echo "β
Traffic health validation passed"
return 0
}
# ========================================
# Blue-Green Deployment Strategy
# ========================================
blue_green_deployment() {
local new_version="$1"
local current_env="${2:-blue}"
local target_env="green"
if [ "$current_env" = "green" ]; then
target_env="blue"
fi
echo "π΅π’ Starting blue-green deployment: $current_env β $target_env"
# Stage 1: Deploy to inactive environment
deploy_to_environment "$target_env" "$new_version"
# Stage 2: Comprehensive testing of new environment
run_comprehensive_tests "$target_env"
# Stage 3: Gradual traffic switching with monitoring
gradual_traffic_switch "$current_env" "$target_env"
# Stage 4: Monitor and validate
monitor_post_switch "$target_env"
# Stage 5: Cleanup old environment
cleanup_old_environment "$current_env"
echo "β
Blue-green deployment completed successfully"
echo "π― Active environment: $target_env"
}
deploy_to_environment() {
local environment="$1"
local version="$2"
echo "π Deploying $version to $environment environment..."
# Update environment-specific configuration
update_environment_config "$environment" "$version"
# Deploy all services to environment
docker-compose -f "docker-compose.yml" -f "docker-compose.$environment.yml" \
pull --quiet
docker-compose -f "docker-compose.yml" -f "docker-compose.$environment.yml" \
up -d --scale app=3
# Wait for environment to be fully operational
wait_for_environment_ready "$environment"
echo "β
Deployment to $environment completed"
}
gradual_traffic_switch() {
local old_env="$1"
local new_env="$2"
echo "π Performing gradual traffic switch..."
# Start with 10% traffic to new environment
update_load_balancer_weights "$old_env:90" "$new_env:10"
monitor_traffic_split 180 # Monitor for 3 minutes
# Increase to 50% if healthy
update_load_balancer_weights "$old_env:50" "$new_env:50"
monitor_traffic_split 300 # Monitor for 5 minutes
# Full switch if still healthy
update_load_balancer_weights "$old_env:0" "$new_env:100"
monitor_traffic_split 600 # Monitor for 10 minutes
echo "β
Traffic switch completed successfully"
}
# ========================================
# Canary Deployment Strategy
# ========================================
canary_deployment() {
local new_version="$1"
local canary_percentage="${2:-10}"
local canary_duration="${3:-1800}" # 30 minutes
echo "π€ Starting canary deployment: $canary_percentage% traffic to $new_version"
# Deploy canary instances
deploy_canary_instances "$new_version" "$canary_percentage"
# Configure intelligent traffic routing
setup_canary_routing "$canary_percentage"
# Monitor canary metrics with automated decision making
if monitor_canary_deployment "$canary_duration"; then
echo "β
Canary deployment successful, promoting to full deployment"
promote_canary_to_production "$new_version"
else
echo "β Canary deployment failed, automatic rollback initiated"
rollback_canary_deployment
exit 1
fi
}
deploy_canary_instances() {
local version="$1"
local percentage="$2"
local total_instances=$(docker ps --filter "label=app=$APP_NAME" --format "{{.Names}}" | wc -l)
local canary_instances=$(( (total_instances * percentage) / 100 ))
if [ $canary_instances -eq 0 ]; then
canary_instances=1
fi
echo "π Deploying $canary_instances canary instances (${percentage}% of $total_instances)"
for (( i=1; i<=canary_instances; i++ )); do
docker run -d \
--name "$APP_NAME-canary-$i" \
--network production \
--restart unless-stopped \
--label "app=$APP_NAME" \
--label "deployment-type=canary" \
--label "version=$version" \
--label "canary-group=$(date +%Y%m%d-%H%M%S)" \
-e NODE_ENV=production \
-e DEPLOYMENT_TYPE=canary \
"$APP_NAME:$version"
done
# Wait for canary instances to be healthy
wait_for_canary_health
}
monitor_canary_deployment() {
local duration="$1"
local start_time=$(date +%s)
local end_time=$((start_time + duration))
echo "π Monitoring canary deployment for $duration seconds..."
while [ $(date +%s) -lt $end_time ]; do
# Collect canary metrics
local canary_metrics=$(collect_canary_metrics)
# Analyze metrics for anomalies
if ! analyze_canary_health "$canary_metrics"; then
echo "β Canary health degraded, failing deployment"
return 1
fi
# Progressive analysis - stricter thresholds over time
local elapsed=$(($(date +%s) - start_time))
local progress=$((elapsed * 100 / duration))
if [ $progress -gt 50 ] && ! deep_canary_analysis "$canary_metrics"; then
echo "β Deep canary analysis failed, deployment unhealthy"
return 1
fi
echo "β
Canary health check passed ($progress% complete)"
sleep 60
done
echo "β
Canary monitoring period completed successfully"
return 0
}
# ========================================
# Deployment Utilities and Monitoring
# ========================================
run_smoke_tests() {
local endpoint="$1"
echo "π§ͺ Running smoke tests against $endpoint..."
# Test 1: Health endpoint
if ! curl -f -m 10 "$endpoint/health" &>/dev/null; then
echo "β Health check failed"
return 1
fi
# Test 2: Authentication flow
local auth_token=$(curl -s -X POST "$endpoint/auth/login" \
-H "Content-Type: application/json" \
-d '{"username":"test","password":"test"}' | \
jq -r '.token' 2>/dev/null)
if [ -z "$auth_token" ] || [ "$auth_token" = "null" ]; then
echo "β Authentication test failed"
return 1
fi
# Test 3: Core API functionality
if ! curl -f -H "Authorization: Bearer $auth_token" \
"$endpoint/api/status" &>/dev/null; then
echo "β Core API test failed"
return 1
fi
# Test 4: Database connectivity
if ! curl -f "$endpoint/api/health/database" &>/dev/null; then
echo "β Database connectivity test failed"
return 1
fi
echo "β
All smoke tests passed"
return 0
}
rollback_deployment() {
echo "π Initiating deployment rollback..."
local previous_version=$(get_previous_deployment_version)
if [ -z "$previous_version" ]; then
echo "β Cannot determine previous version for rollback"
exit 1
fi
echo "βͺ Rolling back to version: $previous_version"
# Execute rollback using the same strategy as deployment
case "${DEPLOYMENT_STRATEGY:-rolling}" in
"rolling")
rolling_deployment "$previous_version" "$INSTANCES" "$BATCH_SIZE"
;;
"blue-green")
# Switch back to previous environment
switch_to_previous_environment
;;
"canary")
# Remove canary instances and restore full production
remove_canary_instances
;;
esac
# Verify rollback success
if validate_rollback_success "$previous_version"; then
echo "β
Rollback completed successfully"
notify_rollback_success "$previous_version"
else
echo "β Rollback failed - manual intervention required"
alert_operations_team
exit 1
fi
}
# Command routing for deployment strategies
case "${1:-help}" in
"rolling")
rolling_deployment "${2:-latest}" "${3:-5}" "${4:-1}"
;;
"blue-green")
blue_green_deployment "${2:-latest}" "${3:-blue}"
;;
"canary")
canary_deployment "${2:-latest}" "${3:-10}" "${4:-1800}"
;;
"rollback")
rollback_deployment
;;
"smoke-test")
run_smoke_tests "${2:-http://localhost:3000}"
;;
"help"|*)
cat << EOF
Professional Deployment Strategies
Usage: $0 <strategy> [options]
Strategies:
rolling [version] [instances] [batch] Rolling deployment (zero downtime)
blue-green [version] [current-env] Blue-green deployment (instant switch)
canary [version] [percentage] [duration] Canary deployment (gradual rollout)
rollback Rollback to previous version
smoke-test [endpoint] Run smoke tests against endpoint
Examples:
$0 rolling v2.1.0 10 2 # Rolling update, 10 instances, 2 at a time
$0 blue-green v2.1.0 blue # Blue-green from blue to green
$0 canary v2.1.0 15 3600 # Canary with 15% traffic for 1 hour
$0 rollback # Emergency rollback
EOF
;;
esac
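The script above leans on monitoring helpers (get_current_error_rate, get_average_response_time, get_current_throughput) that it never defines. Here is a minimal sketch of what they might look like, assuming a Prometheus server reachable at $PROMETHEUS_URL and conventional http_requests_total / http_request_duration_seconds metrics; the metric names and endpoint are assumptions, not part of the script above:
#!/bin/bash
# monitoring-helpers.sh - illustrative metric helpers for rolling-deployment.sh
# Assumes a Prometheus server and standard HTTP metrics; adjust the metric
# names to whatever your exporters actually expose.
PROMETHEUS_URL="${PROMETHEUS_URL:-http://prometheus.internal:9090}"
# Run an instant PromQL query and print the scalar result ("0" if no data)
prom_query() {
    curl -s --get "$PROMETHEUS_URL/api/v1/query" \
        --data-urlencode "query=$1" |
        jq -r '.data.result[0].value[1] // "0"'
}
get_current_error_rate() {
    # 5xx responses as an integer percentage of all requests over 5 minutes
    prom_query '100 * sum(rate(http_requests_total{status=~"5.."}[5m])) / sum(rate(http_requests_total[5m]))' | cut -d. -f1
}
get_average_response_time() {
    # Mean latency in milliseconds over 5 minutes
    prom_query '1000 * sum(rate(http_request_duration_seconds_sum[5m])) / sum(rate(http_request_duration_seconds_count[5m]))' | cut -d. -f1
}
get_current_throughput() {
    # Requests per second, truncated to an integer
    prom_query 'sum(rate(http_requests_total[1m]))' | cut -d. -f1
}
The truncation to integers matters because the deployment script compares these values with shell -gt/-lt tests, which only accept whole numbers.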
Cloud Platform Architecture: AWS, GCP, and Azure Mastery
Choosing Your Cloud Foundation Wisely
Understanding cloud platform strengths and use cases:
// Cloud platform comparison for backend infrastructure
const cloudPlatformDecision = {
aws: {
strengths: [
"Largest market share and ecosystem",
"Most comprehensive service catalog",
"Best enterprise and compliance support",
"Mature networking and security services",
"Extensive third-party integrations",
],
idealFor: [
"Enterprise applications requiring compliance",
"Complex multi-service architectures",
"Organizations with existing AWS investments",
"Applications needing global edge presence",
"Teams with strong DevOps/Infrastructure expertise",
],
pricing: "Pay-as-you-go, complex but optimizable",
learningCurve: "Steep but extensive documentation",
keyServices: {
compute: ["EC2", "ECS", "EKS", "Lambda", "Fargate"],
storage: ["S3", "EBS", "EFS"],
database: ["RDS", "DynamoDB", "ElastiCache", "DocumentDB"],
networking: ["VPC", "ALB/NLB", "CloudFront", "Route53"],
monitoring: ["CloudWatch", "X-Ray", "Systems Manager"],
},
},
gcp: {
strengths: [
"Superior machine learning and AI services",
"Excellent container orchestration (GKE)",
"Competitive pricing and sustained use discounts",
"Strong data analytics and BigData tools",
"Google-scale global network infrastructure",
],
idealFor: [
"Data-heavy applications and analytics",
"Machine learning and AI workloads",
"Kubernetes-native applications",
"Startups looking for cost-effective scaling",
"Organizations leveraging Google Workspace",
],
pricing: "Generally more cost-effective, simpler structure",
learningCurve: "Moderate, good developer experience",
keyServices: {
compute: ["Compute Engine", "GKE", "Cloud Run", "Cloud Functions"],
storage: ["Cloud Storage", "Persistent Disk"],
database: ["Cloud SQL", "Firestore", "BigQuery", "Memorystore"],
networking: ["VPC", "Load Balancing", "Cloud CDN", "Cloud DNS"],
monitoring: ["Stackdriver", "Cloud Logging", "Cloud Monitoring"],
},
},
azure: {
strengths: [
"Seamless Microsoft ecosystem integration",
"Strong hybrid cloud capabilities",
"Excellent enterprise Active Directory integration",
"Competitive pricing for Microsoft shops",
"Growing market presence and feature parity",
],
idealFor: [
"Organizations heavily using Microsoft stack",
"Hybrid cloud deployments",
"Enterprise applications requiring AD integration",
"Windows-based application workloads",
".NET and C# development teams",
],
pricing: "Competitive, especially with Microsoft licensing bundles",
learningCurve: "Moderate, familiar for Windows administrators",
keyServices: {
compute: ["Virtual Machines", "AKS", "Container Instances", "Functions"],
storage: ["Blob Storage", "Disk Storage", "Files"],
database: ["SQL Database", "Cosmos DB", "PostgreSQL", "Redis Cache"],
networking: ["Virtual Network", "Load Balancer", "CDN", "DNS"],
monitoring: ["Monitor", "Log Analytics", "Application Insights"],
},
},
};
// Decision matrix for cloud platform selection
function selectCloudPlatform(requirements) {
const {
team_experience,
existing_stack,
compliance_needs,
ai_ml_requirements,
budget_constraints,
geographical_presence,
scaling_requirements,
} = requirements;
if (
existing_stack.includes("microsoft") &&
team_experience.includes("windows")
) {
return "azure";
}
if (ai_ml_requirements === "high" || budget_constraints === "tight") {
return "gcp";
}
if (compliance_needs === "enterprise" || geographical_presence === "global") {
return "aws";
}
// Default recommendation for general use cases
return team_experience.includes("aws") ? "aws" : "gcp";
}
AWS Infrastructure Implementation:
#!/bin/bash
# aws-infrastructure.sh - Professional AWS infrastructure setup
set -euo pipefail
# Configuration
AWS_REGION="${AWS_REGION:-us-west-2}"
PROJECT_NAME="${PROJECT_NAME:-myapp}"
ENVIRONMENT="${ENVIRONMENT:-production}"
setup_aws_infrastructure() {
echo "ποΈ Setting up AWS infrastructure for $PROJECT_NAME..."
# Create VPC and networking
setup_vpc_networking
# Set up compute resources
setup_compute_infrastructure
# Configure databases and storage
setup_data_layer
# Set up load balancing and CDN
setup_networking_layer
# Configure monitoring and logging
setup_observability
echo "β
AWS infrastructure setup completed"
}
setup_vpc_networking() {
echo "π Setting up VPC networking..."
# Create VPC
local vpc_id=$(aws ec2 create-vpc \
--cidr-block 10.0.0.0/16 \
--tag-specifications "ResourceType=vpc,Tags=[{Key=Name,Value=$PROJECT_NAME-vpc},{Key=Environment,Value=$ENVIRONMENT}]" \
--query 'Vpc.VpcId' \
--output text)
echo "Created VPC: $vpc_id"
# Create Internet Gateway
local igw_id=$(aws ec2 create-internet-gateway \
--tag-specifications "ResourceType=internet-gateway,Tags=[{Key=Name,Value=$PROJECT_NAME-igw}]" \
--query 'InternetGateway.InternetGatewayId' \
--output text)
aws ec2 attach-internet-gateway \
--vpc-id "$vpc_id" \
--internet-gateway-id "$igw_id"
# Create public subnets (for load balancers)
local public_subnet_1=$(aws ec2 create-subnet \
--vpc-id "$vpc_id" \
--cidr-block 10.0.1.0/24 \
--availability-zone "${AWS_REGION}a" \
--tag-specifications "ResourceType=subnet,Tags=[{Key=Name,Value=$PROJECT_NAME-public-1},{Key=Type,Value=public}]" \
--query 'Subnet.SubnetId' \
--output text)
local public_subnet_2=$(aws ec2 create-subnet \
--vpc-id "$vpc_id" \
--cidr-block 10.0.2.0/24 \
--availability-zone "${AWS_REGION}b" \
--tag-specifications "ResourceType=subnet,Tags=[{Key=Name,Value=$PROJECT_NAME-public-2},{Key=Type,Value=public}]" \
--query 'Subnet.SubnetId' \
--output text)
# Create private subnets (for application servers)
local private_subnet_1=$(aws ec2 create-subnet \
--vpc-id "$vpc_id" \
--cidr-block 10.0.3.0/24 \
--availability-zone "${AWS_REGION}a" \
--tag-specifications "ResourceType=subnet,Tags=[{Key=Name,Value=$PROJECT_NAME-private-1},{Key=Type,Value=private}]" \
--query 'Subnet.SubnetId' \
--output text)
local private_subnet_2=$(aws ec2 create-subnet \
--vpc-id "$vpc_id" \
--cidr-block 10.0.4.0/24 \
--availability-zone "${AWS_REGION}b" \
--tag-specifications "ResourceType=subnet,Tags=[{Key=Name,Value=$PROJECT_NAME-private-2},{Key=Type,Value=private}]" \
--query 'Subnet.SubnetId' \
--output text)
# Create database subnets (isolated)
local db_subnet_1=$(aws ec2 create-subnet \
--vpc-id "$vpc_id" \
--cidr-block 10.0.5.0/24 \
--availability-zone "${AWS_REGION}a" \
--tag-specifications "ResourceType=subnet,Tags=[{Key=Name,Value=$PROJECT_NAME-db-1},{Key=Type,Value=database}]" \
--query 'Subnet.SubnetId' \
--output text)
local db_subnet_2=$(aws ec2 create-subnet \
--vpc-id "$vpc_id" \
--cidr-block 10.0.6.0/24 \
--availability-zone "${AWS_REGION}b" \
--tag-specifications "ResourceType=subnet,Tags=[{Key=Name,Value=$PROJECT_NAME-db-2},{Key=Type,Value=database}]" \
--query 'Subnet.SubnetId' \
--output text)
# Set up routing
setup_vpc_routing "$vpc_id" "$igw_id" "$public_subnet_1" "$public_subnet_2" \
"$private_subnet_1" "$private_subnet_2"
# Configure security groups
setup_security_groups "$vpc_id"
echo "β
VPC networking configured"
}
setup_compute_infrastructure() {
echo "π» Setting up compute infrastructure..."
# Create ECS cluster
# Only FARGATE and FARGATE_SPOT are built-in capacity providers; EC2 capacity
# requires a separate Auto Scaling group-backed capacity provider
aws ecs create-cluster \
--cluster-name "$PROJECT_NAME-cluster" \
--capacity-providers FARGATE FARGATE_SPOT \
--default-capacity-provider-strategy \
capacityProvider=FARGATE,weight=1,base=2 \
capacityProvider=FARGATE_SPOT,weight=4
# Create application task definition
create_ecs_task_definition
# Set up Auto Scaling Group for EC2 capacity
setup_auto_scaling_group
# Create ECS service with deployment configuration
create_ecs_service
echo "β
Compute infrastructure configured"
}
create_ecs_task_definition() {
cat > task-definition.json << EOF
{
"family": "$PROJECT_NAME-app",
"networkMode": "awsvpc",
"requiresCompatibilities": ["FARGATE"],
"cpu": "1024",
"memory": "2048",
"executionRoleArn": "arn:aws:iam::$(aws sts get-caller-identity --query Account --output text):role/ecsTaskExecutionRole",
"taskRoleArn": "arn:aws:iam::$(aws sts get-caller-identity --query Account --output text):role/ecsTaskRole",
"containerDefinitions": [
{
"name": "$PROJECT_NAME-container",
"image": "$PROJECT_NAME:latest",
"essential": true,
"portMappings": [
{
"containerPort": 3000,
"protocol": "tcp"
}
],
"environment": [
{"name": "NODE_ENV", "value": "production"},
{"name": "PORT", "value": "3000"}
],
"secrets": [
{
"name": "DATABASE_URL",
"valueFrom": "arn:aws:secretsmanager:$AWS_REGION:$(aws sts get-caller-identity --query Account --output text):secret:$PROJECT_NAME/database-url"
}
],
"logConfiguration": {
"logDriver": "awslogs",
"options": {
"awslogs-group": "/ecs/$PROJECT_NAME",
"awslogs-region": "$AWS_REGION",
"awslogs-stream-prefix": "ecs"
}
},
"healthCheck": {
"command": ["CMD-SHELL", "curl -f http://localhost:3000/health || exit 1"],
"interval": 30,
"timeout": 5,
"retries": 3,
"startPeriod": 60
}
}
]
}
EOF
aws ecs register-task-definition \
--cli-input-json file://task-definition.json
rm task-definition.json
}
setup_data_layer() {
echo "ποΈ Setting up data layer..."
# Create RDS subnet group
aws rds create-db-subnet-group \
--db-subnet-group-name "$PROJECT_NAME-db-subnet-group" \
--db-subnet-group-description "Subnet group for $PROJECT_NAME database" \
--subnet-ids subnet-xxx subnet-yyy \
--tags Key=Name,Value="$PROJECT_NAME-db-subnet-group"
# Create RDS instance with Multi-AZ deployment
aws rds create-db-instance \
--db-instance-identifier "$PROJECT_NAME-postgres" \
--db-instance-class db.t3.medium \
--engine postgres \
--engine-version 15.4 \
--master-username postgres \
--master-user-password "$(aws secretsmanager get-random-password --password-length 32 --exclude-characters '"@/\' --query RandomPassword --output text)" \
--allocated-storage 100 \
--storage-type gp2 \
--storage-encrypted \
--vpc-security-group-ids sg-xxx \
--db-subnet-group-name "$PROJECT_NAME-db-subnet-group" \
--backup-retention-period 30 \
--multi-az \
--deletion-protection \
--enable-performance-insights \
--performance-insights-retention-period 7 \
--tags Key=Name,Value="$PROJECT_NAME-postgres" Key=Environment,Value="$ENVIRONMENT"
# Create ElastiCache subnet group and Redis cluster
aws elasticache create-cache-subnet-group \
--cache-subnet-group-name "$PROJECT_NAME-cache-subnet-group" \
--cache-subnet-group-description "Cache subnets for $PROJECT_NAME" \
--subnet-ids subnet-xxx subnet-yyy
aws elasticache create-cache-cluster \
--cache-cluster-id "$PROJECT_NAME-redis" \
--cache-node-type cache.t3.micro \
--engine redis \
--num-cache-nodes 1 \
--cache-subnet-group-name "$PROJECT_NAME-cache-subnet-group" \
--security-group-ids sg-yyy \
--tags Key=Name,Value="$PROJECT_NAME-redis"
# Create S3 bucket for static assets
aws s3 mb "s3://$PROJECT_NAME-assets-$(date +%Y%m%d)" \
--region "$AWS_REGION"
echo "β
Data layer configured"
}
setup_networking_layer() {
echo "π Setting up load balancing and CDN..."
# Create Application Load Balancer
local alb_arn=$(aws elbv2 create-load-balancer \
--name "$PROJECT_NAME-alb" \
--subnets subnet-xxx subnet-yyy \
--security-groups sg-zzz \
--scheme internet-facing \
--type application \
--ip-address-type ipv4 \
--tags Key=Name,Value="$PROJECT_NAME-alb" \
--query 'LoadBalancers[0].LoadBalancerArn' \
--output text)
# Create target group
local tg_arn=$(aws elbv2 create-target-group \
--name "$PROJECT_NAME-tg" \
--protocol HTTP \
--port 3000 \
--vpc-id vpc-xxx \
--target-type ip \
--health-check-protocol HTTP \
--health-check-path /health \
--health-check-interval-seconds 30 \
--health-check-timeout-seconds 5 \
--healthy-threshold-count 2 \
--unhealthy-threshold-count 3 \
--query 'TargetGroups[0].TargetGroupArn' \
--output text)
# Create ALB listener
aws elbv2 create-listener \
--load-balancer-arn "$alb_arn" \
--protocol HTTP \
--port 80 \
--default-actions Type=forward,TargetGroupArn="$tg_arn"
# Set up CloudFront distribution
setup_cloudfront_distribution "$alb_arn"
echo "β
Networking layer configured"
}
setup_observability() {
echo "π Setting up monitoring and logging..."
# Create CloudWatch log group
aws logs create-log-group \
--log-group-name "/ecs/$PROJECT_NAME" \
--retention-in-days 30
# Create CloudWatch alarms
create_cloudwatch_alarms
# Create an X-Ray group for the service (tracing itself is enabled by the
# X-Ray daemon/SDK configured in the task definition)
aws xray create-group \
--group-name "$PROJECT_NAME-service" \
--filter-expression "service(\"$PROJECT_NAME-service\")"
echo "[OK] Observability configured"
}
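As excerpted, the script only defines functions, so you have to invoke the entry point yourself. One way to run it against a staging account (the variable values here are examples):
# Append the entry-point call, then run with explicit configuration
echo 'setup_aws_infrastructure' >> aws-infrastructure.sh
chmod +x aws-infrastructure.sh
AWS_REGION=us-east-1 PROJECT_NAME=myapp ENVIRONMENT=staging ./aws-infrastructure.sh
Note that several arguments above (subnet-xxx, sg-xxx, and friends) are placeholders: you must substitute the real IDs captured from the networking steps before the data and load-balancing layers will create successfully.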
GCP Infrastructure Implementation:
# gcp-infrastructure.yaml - Professional GCP infrastructure with Deployment Manager
resources:
# VPC Network
- name: myapp-vpc
type: compute.v1.network
properties:
autoCreateSubnetworks: false
# Subnets
- name: myapp-subnet-web
type: compute.v1.subnetwork
properties:
network: $(ref.myapp-vpc.selfLink)
ipCidrRange: 10.0.1.0/24
region: us-west1
- name: myapp-subnet-app
type: compute.v1.subnetwork
properties:
network: $(ref.myapp-vpc.selfLink)
ipCidrRange: 10.0.2.0/24
region: us-west1
privateIpGoogleAccess: true
# GKE Cluster
- name: myapp-gke-cluster
type: container.v1.cluster
properties:
zone: us-west1-a
network: $(ref.myapp-vpc.selfLink)
subnetwork: $(ref.myapp-subnet-app.selfLink)
initialClusterVersion: "1.27"
nodePools:
- name: default-pool
initialNodeCount: 3
config:
machineType: e2-standard-2
diskType: pd-ssd
diskSizeGb: 100
preemptible: false
serviceAccount: default
oauthScopes:
- https://www.googleapis.com/auth/cloud-platform
autoscaling:
enabled: true
minNodeCount: 1
maxNodeCount: 10
management:
autoUpgrade: true
autoRepair: true
# Cloud SQL Instance
- name: myapp-postgres
type: sqladmin.v1beta4.instance
properties:
backendType: SECOND_GEN
instanceType: CLOUD_SQL_INSTANCE
databaseVersion: POSTGRES_15
region: us-west1
settings:
tier: db-g1-small
storageType: PD_SSD
storageSize: 100
storageAutoResize: true
availabilityType: REGIONAL
backupConfiguration:
enabled: true
startTime: "03:00"
backupRetentionSettings:
  retainedBackups: 30
ipConfiguration:
privateNetwork: $(ref.myapp-vpc.selfLink)
requireSsl: true
maintenanceWindow:
hour: 3
day: 7
# Redis Instance
- name: myapp-redis
type: redis.v1.instance
properties:
tier: STANDARD_HA
memorySizeGb: 1
region: us-west1
authorizedNetwork: $(ref.myapp-vpc.selfLink)
redisVersion: REDIS_7_0
# Load Balancer
- name: myapp-lb
type: compute.v1.globalForwardingRule
properties:
IPProtocol: TCP
portRange: 80-80
target: $(ref.myapp-http-proxy.selfLink) # target HTTP proxy and URL map resources omitted for brevity
# Cloud Storage Bucket
- name: myapp-assets
type: storage.v1.bucket
properties:
location: US
storageClass: STANDARD
versioning:
enabled: true
lifecycle:
rule:
- action:
type: Delete
condition:
age: 365
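Deployment Manager configs like this are applied with gcloud. A short sketch, assuming the file above is saved as gcp-infrastructure.yaml and a project is already selected (Google now steers new projects toward Terraform, covered later in this article):
# Preview the change set first, then commit it
gcloud deployment-manager deployments create myapp-infra \
    --config gcp-infrastructure.yaml --preview
gcloud deployment-manager deployments update myapp-infra
# Later changes: edit the YAML and repeat the update/preview cycle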
Azure Infrastructure with ARM Templates:
{
"$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentTemplate.json#",
"contentVersion": "1.0.0.0",
"parameters": {
"projectName": {
"type": "string",
"defaultValue": "myapp"
},
"environment": {
"type": "string",
"defaultValue": "production"
}
},
"resources": [
{
"type": "Microsoft.Network/virtualNetworks",
"apiVersion": "2021-05-01",
"name": "[concat(parameters('projectName'), '-vnet')]",
"location": "[resourceGroup().location]",
"properties": {
"addressSpace": {
"addressPrefixes": ["10.0.0.0/16"]
},
"subnets": [
{
"name": "web-subnet",
"properties": {
"addressPrefix": "10.0.1.0/24"
}
},
{
"name": "app-subnet",
"properties": {
"addressPrefix": "10.0.2.0/24"
}
},
{
"name": "data-subnet",
"properties": {
"addressPrefix": "10.0.3.0/24",
"serviceEndpoints": [
{
"service": "Microsoft.Sql"
}
]
}
}
]
}
},
{
"type": "Microsoft.ContainerService/managedClusters",
"apiVersion": "2023-07-01",
"name": "[concat(parameters('projectName'), '-aks')]",
"location": "[resourceGroup().location]",
"properties": {
"kubernetesVersion": "1.27.3",
"dnsPrefix": "[parameters('projectName')]",
"agentPoolProfiles": [
{
"name": "nodepool1",
"count": 3,
"vmSize": "Standard_D2s_v3",
"osType": "Linux",
"mode": "System",
"enableAutoScaling": true,
"minCount": 1,
"maxCount": 10
}
],
"servicePrincipalProfile": {
"clientId": "msi"
},
"addonProfiles": {
"azureKeyvaultSecretsProvider": {
"enabled": true
},
"azurepolicy": {
"enabled": true
}
},
"networkProfile": {
"networkPlugin": "azure",
"serviceCidr": "172.16.0.0/16",
"dnsServiceIP": "172.16.0.10"
}
},
"identity": {
"type": "SystemAssigned"
}
},
{
"type": "Microsoft.Sql/servers",
"apiVersion": "2021-11-01",
"name": "[concat(parameters('projectName'), '-sql')]",
"location": "[resourceGroup().location]",
"properties": {
"administratorLogin": "sqladmin",
"administratorLoginPassword": "[concat(toUpper(uniqueString(resourceGroup().id)), uniqueString(resourceGroup().id), '!')]"
},
"resources": [
{
"type": "databases",
"apiVersion": "2021-11-01",
"name": "[parameters('projectName')]",
"dependsOn": [
"[resourceId('Microsoft.Sql/servers', concat(parameters('projectName'), '-sql'))]"
],
"properties": {
"sku": {
"name": "S1",
"tier": "Standard"
},
"maxSizeBytes": 107374182400
}
}
]
}
]
}
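ARM templates deploy at resource-group scope through the Azure CLI. A minimal sketch, assuming the template above is saved as azuredeploy.json:
# Create the resource group, validate the template, then deploy it
az group create --name myapp-rg --location westus2
az deployment group validate \
    --resource-group myapp-rg \
    --template-file azuredeploy.json \
    --parameters projectName=myapp environment=production
az deployment group create \
    --resource-group myapp-rg \
    --template-file azuredeploy.json \
    --parameters projectName=myapp environment=production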
Server Management and Provisioning: Infrastructure as Code
The Evolution from Snowflake Servers to Cattle Infrastructure
Understanding the infrastructure management revolution:
// Server management evolution: Pets vs Cattle vs Code
const infrastructureEvolution = {
petsModel: {
approach: "Hand-crafted, named servers treated like pets",
characteristics: [
"Manually configured and maintained",
"Irreplaceable and unique",
"SSH access for troubleshooting",
"Configuration drift over time",
"Difficult to replicate",
],
problems: [
"Scaling requires manual work",
"Inconsistent environments",
"Single points of failure",
"Knowledge trapped in individuals",
"Disaster recovery is painful",
],
reality: "Doesn't scale beyond small teams or simple applications",
},
cattleModel: {
approach: "Disposable, identical servers treated like cattle",
characteristics: [
"Automated provisioning",
"Replaceable and identical",
"No SSH access needed",
"Immutable infrastructure",
"Auto-scaling capable",
],
benefits: [
"Consistent deployments",
"Easy disaster recovery",
"Horizontal scaling",
"Reduced operational overhead",
"Better security posture",
],
limitations: "Still requires infrastructure management tooling",
},
infrastructureAsCode: {
approach: "Infrastructure defined, versioned, and managed as code",
characteristics: [
"Declarative configuration",
"Version controlled infrastructure",
"Automated provisioning and updates",
"Peer review for infrastructure changes",
"Reproducible across environments",
],
advantages: [
"Infrastructure becomes predictable",
"Changes are tracked and auditable",
"Environment consistency guaranteed",
"Collaboration through code review",
"Disaster recovery through code deployment",
],
outcome: "Infrastructure becomes as manageable as application code",
},
};
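What makes the third model stick is that infrastructure changes ride the same branch, review, and apply loop as application code. A minimal sketch of that loop with Terraform (branch and plan file names are placeholders):
# Typical IaC change workflow: branch, format, validate, plan, review, apply
git checkout -b infra/add-redis-cluster
terraform fmt -recursive   # normalize style before review
terraform init             # fetch providers and connect to remote state
terraform validate         # catch syntax and type errors early
terraform plan -out=tfplan # produce a reviewable, reproducible change set
# The plan goes through code review / CI before anyone runs:
terraform apply tfplan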
Terraform Infrastructure as Code Implementation:
# terraform/main.tf - Professional multi-cloud infrastructure
terraform {
required_version = ">= 1.0"
required_providers {
aws = {
source = "hashicorp/aws"
version = "~> 5.0"
}
kubernetes = {
source = "hashicorp/kubernetes"
version = "~> 2.23"
}
helm = {
source = "hashicorp/helm"
version = "~> 2.11"
}
}
backend "s3" {
bucket = "myapp-terraform-state"
key = "infrastructure/terraform.tfstate"
region = "us-west-2"
encrypt = true
dynamodb_table = "terraform-locks"
}
}
# Variables
variable "project_name" {
description = "Name of the project"
type = string
default = "myapp"
}
variable "environment" {
description = "Environment name"
type = string
default = "production"
}
variable "aws_region" {
description = "AWS region"
type = string
default = "us-west-2"
}
variable "kubernetes_version" {
description = "Kubernetes version for EKS"
type = string
default = "1.28"
}
# Data sources
data "aws_availability_zones" "available" {
state = "available"
}
data "aws_caller_identity" "current" {}
# Local values
locals {
cluster_name = "${var.project_name}-${var.environment}"
common_tags = {
Project = var.project_name
Environment = var.environment
ManagedBy = "terraform"
}
azs = slice(data.aws_availability_zones.available.names, 0, 3)
}
# VPC Configuration
module "vpc" {
source = "terraform-aws-modules/vpc/aws"
version = "~> 5.0"
name = "${local.cluster_name}-vpc"
cidr = "10.0.0.0/16"
azs = local.azs
private_subnets = ["10.0.1.0/24", "10.0.2.0/24", "10.0.3.0/24"]
public_subnets = ["10.0.4.0/24", "10.0.5.0/24", "10.0.6.0/24"]
database_subnets = ["10.0.7.0/24", "10.0.8.0/24", "10.0.9.0/24"]
enable_nat_gateway = true
single_nat_gateway = false
enable_vpn_gateway = false
enable_dns_hostnames = true
enable_dns_support = true
# VPC Flow Logs
enable_flow_log = true
create_flow_log_cloudwatch_log_group = true
create_flow_log_cloudwatch_iam_role = true
flow_log_max_aggregation_interval = 60
# Subnet tagging for Load Balancers
public_subnet_tags = {
"kubernetes.io/role/elb" = "1"
}
private_subnet_tags = {
"kubernetes.io/role/internal-elb" = "1"
}
tags = local.common_tags
}
# Security Groups
resource "aws_security_group" "eks_cluster" {
name_prefix = "${local.cluster_name}-cluster-"
vpc_id = module.vpc.vpc_id
ingress {
description = "HTTPS"
from_port = 443
to_port = 443
protocol = "tcp"
cidr_blocks = [module.vpc.vpc_cidr_block]
}
egress {
from_port = 0
to_port = 0
protocol = "-1"
cidr_blocks = ["0.0.0.0/0"]
}
tags = merge(local.common_tags, {
Name = "${local.cluster_name}-cluster-sg"
})
}
# EKS Cluster
module "eks" {
source = "terraform-aws-modules/eks/aws"
version = "~> 19.0"
cluster_name = local.cluster_name
cluster_version = var.kubernetes_version
vpc_id = module.vpc.vpc_id
subnet_ids = module.vpc.private_subnets
cluster_endpoint_public_access = true
cluster_endpoint_private_access = true
# Cluster security group
cluster_security_group_additional_rules = {
ingress_nodes_ephemeral_ports_tcp = {
description = "Nodes on ephemeral ports"
protocol = "tcp"
from_port = 1025
to_port = 65535
type = "ingress"
source_node_security_group = true
}
}
# Node groups
eks_managed_node_groups = {
main = {
name = "main-nodegroup"
instance_types = ["t3.medium", "t3.large"]
capacity_type = "SPOT"
min_size = 1
max_size = 10
desired_size = 3
# Launch template configuration
launch_template_name = "${local.cluster_name}-main"
launch_template_description = "Launch template for main node group"
launch_template_version = "$Latest"
pre_bootstrap_user_data = <<-EOT
#!/bin/bash
# Managed node groups run the EKS bootstrap automatically; only extra setup belongs here
yum install -y amazon-cloudwatch-agent
EOT
# Disk configuration
block_device_mappings = {
xvda = {
device_name = "/dev/xvda"
ebs = {
volume_size = 100
volume_type = "gp3"
iops = 3000
throughput = 150
encrypted = true
delete_on_termination = true
}
}
}
# Taints and labels
taints = {
dedicated = {
key = "dedicated"
value = "main"
effect = "NO_SCHEDULE"
}
}
labels = {
Environment = var.environment
NodeGroup = "main"
}
tags = local.common_tags
}
# Additional node group for system workloads
system = {
name = "system-nodegroup"
instance_types = ["t3.small"]
capacity_type = "ON_DEMAND"
min_size = 2
max_size = 4
desired_size = 2
labels = {
Environment = var.environment
NodeGroup = "system"
WorkloadType = "system"
}
taints = {
system = {
key = "system"
value = "true"
effect = "NO_SCHEDULE"
}
}
tags = local.common_tags
}
}
# Cluster add-ons
cluster_addons = {
coredns = {
most_recent = true
}
kube-proxy = {
most_recent = true
}
vpc-cni = {
most_recent = true
}
aws-ebs-csi-driver = {
most_recent = true
}
}
tags = local.common_tags
}
# RDS Database
module "rds" {
source = "terraform-aws-modules/rds/aws"
version = "~> 6.0"
identifier = "${local.cluster_name}-postgres"
# Database configuration
engine = "postgres"
engine_version = "15.4"
family = "postgres15"
major_engine_version = "15"
instance_class = "db.t3.medium"
allocated_storage = 100
max_allocated_storage = 1000
storage_type = "gp2"
storage_encrypted = true
# Database settings
db_name = replace(var.project_name, "-", "_")
username = "postgres"
manage_master_user_password = true
port = 5432
# Network configuration
multi_az = true
db_subnet_group_name = module.vpc.database_subnet_group
vpc_security_group_ids = [aws_security_group.rds.id]
# Backup configuration
backup_retention_period = 30
backup_window = "03:00-04:00"
maintenance_window = "Sun:04:00-Sun:05:00"
# Monitoring
enabled_cloudwatch_logs_exports = ["postgresql", "upgrade"]
create_cloudwatch_log_group = true
performance_insights_enabled = true
performance_insights_retention_period = 7
# Security
deletion_protection = true
skip_final_snapshot = false
final_snapshot_identifier = "${local.cluster_name}-postgres-final-snapshot-${formatdate("YYYY-MM-DD-hhmm", timestamp())}"
tags = local.common_tags
}
# RDS Security Group
resource "aws_security_group" "rds" {
name_prefix = "${local.cluster_name}-rds-"
vpc_id = module.vpc.vpc_id
ingress {
description = "PostgreSQL"
from_port = 5432
to_port = 5432
protocol = "tcp"
cidr_blocks = [module.vpc.vpc_cidr_block]
}
egress {
from_port = 0
to_port = 0
protocol = "-1"
cidr_blocks = ["0.0.0.0/0"]
}
tags = merge(local.common_tags, {
Name = "${local.cluster_name}-rds-sg"
})
}
# ElastiCache Redis
resource "aws_elasticache_subnet_group" "redis" {
name = "${local.cluster_name}-redis-subnet"
subnet_ids = module.vpc.database_subnets
tags = local.common_tags
}
resource "aws_elasticache_replication_group" "redis" {
replication_group_id = "${local.cluster_name}-redis"
description = "Redis cluster for ${local.cluster_name}"
node_type = "cache.t3.micro"
port = 6379
parameter_group_name = "default.redis7"
num_cache_clusters = 2
automatic_failover_enabled = true
multi_az_enabled = true
subnet_group_name = aws_elasticache_subnet_group.redis.name
security_group_ids = [aws_security_group.redis.id]
at_rest_encryption_enabled = true
transit_encryption_enabled = true
auth_token = random_password.redis_auth.result
# Backup configuration
snapshot_retention_limit = 7
snapshot_window = "03:00-05:00"
# Maintenance
maintenance_window = "sun:05:00-sun:07:00"
# Logging
log_delivery_configuration {
destination = aws_cloudwatch_log_group.redis.name
destination_type = "cloudwatch-logs"
log_format = "text"
log_type = "slow-log"
}
tags = local.common_tags
}
# Redis password
resource "random_password" "redis_auth" {
length = 32
special = true
}
# Redis Security Group
resource "aws_security_group" "redis" {
name_prefix = "${local.cluster_name}-redis-"
vpc_id = module.vpc.vpc_id
ingress {
description = "Redis"
from_port = 6379
to_port = 6379
protocol = "tcp"
cidr_blocks = [module.vpc.vpc_cidr_block]
}
egress {
from_port = 0
to_port = 0
protocol = "-1"
cidr_blocks = ["0.0.0.0/0"]
}
tags = merge(local.common_tags, {
Name = "${local.cluster_name}-redis-sg"
})
}
# CloudWatch Log Group for Redis
resource "aws_cloudwatch_log_group" "redis" {
name = "/elasticache/${local.cluster_name}-redis"
retention_in_days = 30
tags = local.common_tags
}
# S3 Bucket for static assets
module "s3_bucket" {
source = "terraform-aws-modules/s3-bucket/aws"
version = "~> 3.0"
bucket = "${local.cluster_name}-assets-${random_string.bucket_suffix.result}"
# Security
block_public_acls = true
block_public_policy = true
ignore_public_acls = true
restrict_public_buckets = true
# Versioning
versioning = {
enabled = true
}
# Server-side encryption
server_side_encryption_configuration = {
rule = {
apply_server_side_encryption_by_default = {
sse_algorithm = "AES256"
}
}
}
# Lifecycle configuration
lifecycle_configuration = {
rule = {
id = "delete_old_versions"
status = "Enabled"
noncurrent_version_expiration = {
noncurrent_days = 90
}
}
}
tags = local.common_tags
}
resource "random_string" "bucket_suffix" {
length = 8
special = false
upper = false
}
# Application Load Balancer
module "alb" {
source = "terraform-aws-modules/alb/aws"
version = "~> 8.0"
name = "${local.cluster_name}-alb"
load_balancer_type = "application"
vpc_id = module.vpc.vpc_id
subnets = module.vpc.public_subnets
security_groups = [aws_security_group.alb.id]
# Target groups (will be managed by Kubernetes ingress)
target_groups = [
{
name = "${local.cluster_name}-tg"
backend_protocol = "HTTP"
backend_port = 80
target_type = "ip"
deregistration_delay = 10
health_check = {
enabled = true
healthy_threshold = 2
interval = 30
matcher = "200"
path = "/health"
port = "traffic-port"
protocol = "HTTP"
timeout = 5
unhealthy_threshold = 2
}
stickiness = {
enabled = false
type = "lb_cookie"
}
}
]
# Listeners
http_tcp_listeners = [
{
port = 80
protocol = "HTTP"
target_group_index = 0
}
]
tags = local.common_tags
}
# ALB Security Group
resource "aws_security_group" "alb" {
name_prefix = "${local.cluster_name}-alb-"
vpc_id = module.vpc.vpc_id
ingress {
description = "HTTP"
from_port = 80
to_port = 80
protocol = "tcp"
cidr_blocks = ["0.0.0.0/0"]
}
ingress {
description = "HTTPS"
from_port = 443
to_port = 443
protocol = "tcp"
cidr_blocks = ["0.0.0.0/0"]
}
egress {
from_port = 0
to_port = 0
protocol = "-1"
cidr_blocks = ["0.0.0.0/0"]
}
tags = merge(local.common_tags, {
Name = "${local.cluster_name}-alb-sg"
})
}
# Outputs
output "cluster_endpoint" {
description = "Endpoint for EKS control plane"
value = module.eks.cluster_endpoint
}
output "cluster_name" {
description = "Kubernetes Cluster Name"
value = module.eks.cluster_name
}
output "rds_endpoint" {
description = "RDS instance endpoint"
value = module.rds.db_instance_endpoint
}
output "redis_endpoint" {
description = "ElastiCache Redis endpoint"
value = aws_elasticache_replication_group.redis.primary_endpoint_address
}
output "s3_bucket_name" {
description = "Name of the S3 bucket"
value = module.s3_bucket.s3_bucket_id
}
output "alb_dns_name" {
description = "The DNS name of the load balancer"
value = module.alb.lb_dns_name
}
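Once applied, those outputs become the glue between Terraform and the rest of your tooling. A sketch of a deploy script consuming them, assuming terraform apply has already run in this directory:
# Point kubectl at the new cluster and read connection details from state
aws eks update-kubeconfig \
    --region us-west-2 \
    --name "$(terraform output -raw cluster_name)"
DATABASE_HOST="$(terraform output -raw rds_endpoint)"
REDIS_HOST="$(terraform output -raw redis_endpoint)"
ASSETS_BUCKET="$(terraform output -raw s3_bucket_name)"
echo "Deploying against $DATABASE_HOST and $REDIS_HOST (assets: $ASSETS_BUCKET)"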
Load Balancers and Reverse Proxies: Traffic Distribution Excellence
Professional Load Balancing Architecture
Load balancing strategies that actually scale in production:
# nginx.conf - Professional reverse proxy and load balancer configuration
# Main context - global configuration
user nginx;
worker_processes auto;
error_log /var/log/nginx/error.log warn;
pid /var/run/nginx.pid;
# Optimize for high performance
worker_rlimit_nofile 65535;
events {
worker_connections 4096;
use epoll;
multi_accept on;
}
http {
# Basic settings
include /etc/nginx/mime.types;
default_type application/octet-stream;
# Performance optimizations
sendfile on;
tcp_nopush on;
tcp_nodelay on;
keepalive_timeout 65;
keepalive_requests 100;
types_hash_max_size 2048;
server_tokens off;
# Buffer optimizations
client_body_buffer_size 128k;
client_max_body_size 100m;
client_header_buffer_size 1k;
large_client_header_buffers 4 4k;
output_buffers 1 32k;
postpone_output 1460;
# Gzip compression
gzip on;
gzip_vary on;
gzip_min_length 1000;
gzip_proxied any;
gzip_comp_level 6;
gzip_types
text/plain
text/css
text/xml
text/javascript
application/json
application/javascript
application/xml+rss
application/atom+xml
image/svg+xml;
# Security headers
add_header X-Frame-Options "SAMEORIGIN" always;
add_header X-Content-Type-Options "nosniff" always;
add_header X-XSS-Protection "1; mode=block" always;
add_header Referrer-Policy "no-referrer-when-downgrade" always;
add_header Content-Security-Policy "default-src 'self' http: https: data: blob: 'unsafe-inline'" always;
# Rate limiting
limit_req_zone $binary_remote_addr zone=api:10m rate=10r/s;
limit_req_zone $binary_remote_addr zone=login:10m rate=1r/s;
limit_conn_zone $binary_remote_addr zone=connections:10m;
# Upstream servers with advanced load balancing
upstream backend_api {
# Load balancing method: least_conn, ip_hash, hash, random
least_conn;
# Backend servers with weights and health checks
server api-1.internal:3000 weight=3 max_fails=2 fail_timeout=30s;
server api-2.internal:3000 weight=3 max_fails=2 fail_timeout=30s;
server api-3.internal:3000 weight=2 max_fails=2 fail_timeout=30s;
server api-4.internal:3000 weight=1 max_fails=2 fail_timeout=30s backup;
# Connection pooling
keepalive 32;
keepalive_requests 100;
keepalive_timeout 60s;
}
upstream backend_websocket {
# Use IP hash for WebSocket sticky sessions
ip_hash;
server ws-1.internal:3001 max_fails=1 fail_timeout=10s;
server ws-2.internal:3001 max_fails=1 fail_timeout=10s;
server ws-3.internal:3001 max_fails=1 fail_timeout=10s;
keepalive 16;
}
upstream backend_static {
# Round robin for static content
server static-1.internal:8080 weight=1;
server static-2.internal:8080 weight=1;
keepalive 8;
}
# Health check endpoint
server {
listen 8080;
server_name _;
location /nginx-health {
access_log off;
return 200 "healthy\n";
add_header Content-Type text/plain;
}
# Nginx status for monitoring
location /nginx-status {
stub_status on;
access_log off;
allow 10.0.0.0/8;
allow 172.16.0.0/12;
allow 192.168.0.0/16;
deny all;
}
}
# Main application server
server {
listen 80;
server_name myapp.com www.myapp.com;
# Redirect to HTTPS
return 301 https://$server_name$request_uri;
}
server {
listen 443 ssl http2;
server_name myapp.com www.myapp.com;
# SSL configuration
ssl_certificate /etc/nginx/ssl/myapp.com.crt;
ssl_certificate_key /etc/nginx/ssl/myapp.com.key;
# Modern SSL configuration
ssl_protocols TLSv1.2 TLSv1.3;
ssl_ciphers ECDHE-RSA-AES256-GCM-SHA512:DHE-RSA-AES256-GCM-SHA512:ECDHE-RSA-AES256-GCM-SHA384:DHE-RSA-AES256-GCM-SHA384;
ssl_prefer_server_ciphers off;
ssl_session_cache shared:SSL:10m;
ssl_session_timeout 10m;
ssl_session_tickets off;
ssl_stapling on;
ssl_stapling_verify on;
# HSTS
add_header Strict-Transport-Security "max-age=31536000; includeSubDomains" always;
# Logging
access_log /var/log/nginx/myapp_access.log combined;
error_log /var/log/nginx/myapp_error.log;
# Global rate limiting
limit_req zone=api burst=20 nodelay;
limit_conn connections 50;
# Static content with aggressive caching
location ~* \.(jpg|jpeg|png|gif|ico|css|js|svg|woff|woff2|ttf|eot)$ {
expires 1y;
add_header Cache-Control "public, immutable";
add_header Vary Accept-Encoding;
# Serve from static backend
proxy_pass http://backend_static;
proxy_cache static_cache;
proxy_cache_valid 200 1d;
proxy_cache_use_stale error timeout updating http_500 http_502 http_503 http_504;
proxy_cache_lock on;
proxy_cache_lock_timeout 5s;
}
# API endpoints with specific rate limiting
location /api/ {
# Stricter rate limiting for API
limit_req zone=api burst=10 nodelay;
# Proxy to backend API
proxy_pass http://backend_api;
proxy_http_version 1.1;
# An empty Connection header keeps upstream keepalive working; WebSocket upgrades are handled in /ws/
proxy_set_header Connection "";
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Forwarded-Proto $scheme;
proxy_set_header X-Request-ID $request_id;
# Timeouts
proxy_connect_timeout 5s;
proxy_send_timeout 60s;
proxy_read_timeout 60s;
# Buffering
proxy_buffering on;
proxy_buffer_size 4k;
proxy_buffers 8 4k;
proxy_busy_buffers_size 8k;
# Cache API responses selectively
proxy_cache api_cache;
proxy_cache_valid 200 5m;
proxy_cache_valid 404 1m;
proxy_cache_use_stale error timeout updating http_500 http_502 http_503 http_504;
proxy_cache_bypass $http_cache_control;
# Health checks
proxy_next_upstream error timeout http_500 http_502 http_503 http_504;
proxy_next_upstream_tries 3;
proxy_next_upstream_timeout 10s;
}
# Authentication endpoints with strict rate limiting
location /auth/ {
limit_req zone=login burst=5 nodelay;
proxy_pass http://backend_api;
proxy_http_version 1.1;
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Forwarded-Proto $scheme;
# No caching for auth endpoints
proxy_no_cache 1;
proxy_cache_bypass 1;
}
# WebSocket endpoints
location /ws/ {
proxy_pass http://backend_websocket;
proxy_http_version 1.1;
proxy_set_header Upgrade $http_upgrade;
proxy_set_header Connection "upgrade";
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Forwarded-Proto $scheme;
# WebSocket specific timeouts
proxy_connect_timeout 7d;
proxy_send_timeout 7d;
proxy_read_timeout 7d;
}
# Health check endpoint (no auth required)
location /health {
access_log off;
proxy_pass http://backend_api;
proxy_connect_timeout 1s;
proxy_send_timeout 2s;
proxy_read_timeout 2s;
}
# Root location
location / {
proxy_pass http://backend_api;
proxy_http_version 1.1;
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Forwarded-Proto $scheme;
proxy_set_header X-Request-ID $request_id;
# Caching for dynamic content
proxy_cache dynamic_cache;
proxy_cache_valid 200 1m;
proxy_cache_use_stale error timeout updating http_500 http_502 http_503 http_504;
proxy_cache_bypass $cookie_session;
}
# Custom error pages
error_page 404 /404.html;
error_page 500 502 503 504 /50x.html;
location = /404.html {
internal;
root /var/www/html;
}
location = /50x.html {
internal;
root /var/www/html;
}
}
# Cache definitions
proxy_cache_path /var/cache/nginx/static levels=1:2 keys_zone=static_cache:10m max_size=1g inactive=60m use_temp_path=off;
proxy_cache_path /var/cache/nginx/api levels=1:2 keys_zone=api_cache:10m max_size=100m inactive=10m use_temp_path=off;
proxy_cache_path /var/cache/nginx/dynamic levels=1:2 keys_zone=dynamic_cache:10m max_size=100m inactive=5m use_temp_path=off;
# Log format with detailed information
log_format detailed '$remote_addr - $remote_user [$time_local] "$request" '
'$status $body_bytes_sent "$http_referer" '
'"$http_user_agent" "$http_x_forwarded_for" '
'rt=$request_time uct="$upstream_connect_time" '
'uht="$upstream_header_time" urt="$upstream_response_time" '
'rid=$request_id';
}
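A configuration this dense is one typo away from an outage, so validate before every reload. A minimal sketch of a safe reload cycle, assuming nginx runs under systemd with the default configuration path:
#!/bin/bash
# nginx-reload.sh - Validate the configuration, then reload gracefully
set -euo pipefail
# Syntax-check first; a failed reload on a live load balancer is an outage
if ! nginx -t; then
    echo "Configuration test failed, refusing to reload" >&2
    exit 1
fi
# Graceful reload: old workers drain in-flight requests while new workers
# start with the updated configuration, so no connections are dropped
systemctl reload nginx
echo "nginx reloaded"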
Advanced Load Balancer Configuration with HAProxy:
# haproxy.cfg - Enterprise-grade load balancer configuration
global
# Process management
daemon
user haproxy
group haproxy
pidfile /var/run/haproxy.pid
# Performance tuning
maxconn 40000
ulimit-n 81000
# SSL configuration
ssl-default-bind-ciphers ECDHE-RSA-AES256-GCM-SHA384:ECDHE-RSA-CHACHA20-POLY1305:ECDHE-RSA-AES128-GCM-SHA256
ssl-default-bind-options ssl-min-ver TLSv1.2 no-tls-tickets
# Certificate store
crt-base /etc/ssl/certs
ca-base /etc/ssl/certs
# Logging
log stdout local0 info
# Statistics
stats socket /var/run/haproxy.sock mode 600 level admin
stats timeout 2m
defaults
mode http
option httplog
option dontlognull
option log-health-checks
option forwardfor
option http-server-close
# Timeouts
timeout connect 5s
timeout client 50s
timeout server 50s
timeout http-request 15s
timeout http-keep-alive 15s
timeout check 10s
# Error handling
errorfile 400 /etc/haproxy/errors/400.http
errorfile 403 /etc/haproxy/errors/403.http
errorfile 408 /etc/haproxy/errors/408.http
errorfile 500 /etc/haproxy/errors/500.http
errorfile 502 /etc/haproxy/errors/502.http
errorfile 503 /etc/haproxy/errors/503.http
errorfile 504 /etc/haproxy/errors/504.http
# Frontend configuration
frontend web_frontend
bind *:80
bind *:443 ssl crt /etc/ssl/certs/myapp.com.pem
# Security headers
http-response set-header Strict-Transport-Security "max-age=31536000; includeSubDomains"
http-response set-header X-Frame-Options "SAMEORIGIN"
http-response set-header X-Content-Type-Options "nosniff"
http-response set-header X-XSS-Protection "1; mode=block"
# Rate limiting using stick tables
stick-table type ip size 100k expire 30s store http_req_rate(10s)
http-request track-sc0 src
http-request reject if { sc_http_req_rate(0) gt 20 }
# Redirect HTTP to HTTPS
redirect scheme https if !{ ssl_fc }
# Request routing based on path
use_backend api_backend if { path_beg /api/ }
use_backend ws_backend if { path_beg /ws/ } { hdr(upgrade) -i websocket }
use_backend static_backend if { path_beg /static/ }
use_backend auth_backend if { path_beg /auth/ }
default_backend web_backend
# Capture headers for logging
capture request header Host len 64
capture request header User-Agent len 64
capture request header X-Forwarded-For len 64
# Backend configurations
backend web_backend
balance roundrobin
option httpchk GET /health
# Backend servers
server web1 10.0.1.10:3000 check weight 100 maxconn 1000
server web2 10.0.1.11:3000 check weight 100 maxconn 1000
server web3 10.0.1.12:3000 check weight 80 maxconn 800
server web4 10.0.1.13:3000 check weight 50 maxconn 500 backup
# Health check configuration
http-check connect port 3000
http-check send meth GET uri /health ver HTTP/1.1 hdr host myapp.com
http-check expect status 200
# Session persistence
cookie SERVERID insert indirect nocache
# Connection reuse
http-reuse safe
backend api_backend
balance leastconn
option httpchk GET /api/health
# Stricter rate limiting for API
stick-table type ip size 10k expire 60s store http_req_rate(60s)
http-request track-sc1 src
http-request reject if { sc_http_req_rate(1) gt 100 }
server api1 10.0.2.10:3000 check weight 100 maxconn 500
server api2 10.0.2.11:3000 check weight 100 maxconn 500
server api3 10.0.2.12:3000 check weight 100 maxconn 500
server api4 10.0.2.13:3000 check weight 50 maxconn 250 backup
# Advanced health checks
http-check connect port 3000
http-check send meth GET uri /api/health ver HTTP/1.1 hdr host api.myapp.com
http-check expect status 200
http-check expect string "healthy"
backend ws_backend
balance source
option httpchk GET /ws/health
# WebSocket specific settings
timeout server 7d
timeout tunnel 7d
server ws1 10.0.3.10:3001 check weight 100
server ws2 10.0.3.11:3001 check weight 100
server ws3 10.0.3.12:3001 check weight 100
backend static_backend
balance roundrobin
option httpchk HEAD /static/health.txt
# Optimized for static content
http-response set-header Cache-Control "public, max-age=31536000"
server static1 10.0.4.10:8080 check weight 100
server static2 10.0.4.11:8080 check weight 100
backend auth_backend
balance roundrobin
option httpchk GET /auth/health
# Very strict rate limiting for auth
stick-table type ip size 10k expire 300s store http_req_rate(60s)
http-request track-sc2 src
http-request reject if { sc_http_req_rate(2) gt 10 }
server auth1 10.0.5.10:3000 check weight 100 maxconn 200
server auth2 10.0.5.11:3000 check weight 100 maxconn 200
# Statistics and monitoring
listen stats
bind *:8080
stats enable
stats uri /haproxy-stats
stats realm "HAProxy Statistics"
stats auth admin:secure_password_here
stats refresh 30s
stats show-legends
stats show-desc "MyApp Load Balancer"
# Admin interface
stats admin if TRUE
# Detailed backend information
stats show-node
stats show-legends
# Health check service
listen health_check
bind *:8081
monitor-uri /health
monitor fail if { nbsrv(web_backend) lt 2 }
monitor fail if { nbsrv(api_backend) lt 2 }
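The stats socket declared in the global section doubles as a runtime API, which lets you validate changes and drain servers for maintenance without editing the config under load. A short sketch, assuming socat is installed and the socket path from the configuration above:
#!/bin/bash
# haproxy-ops.sh - Config validation and runtime API basics
set -euo pipefail
# Validate the configuration file before any reload
haproxy -c -f /etc/haproxy/haproxy.cfg
# Dump live backend and server statistics from the admin socket
echo "show stat" | socat stdio /var/run/haproxy.sock
# Gracefully drain a server: existing sessions finish, no new ones arrive
echo "set server web_backend/web1 state drain" | socat stdio /var/run/haproxy.sock
# Return the server to rotation after maintenance
echo "set server web_backend/web1 state ready" | socat stdio /var/run/haproxy.sock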
CDN and Static Asset Delivery: Global Performance Optimization
Professional CDN Architecture and Implementation
CDN strategy that actually improves global performance:
#!/bin/bash
# cdn-setup.sh - Professional CDN configuration and optimization
set -euo pipefail
# Configuration
CDN_PROVIDER="${CDN_PROVIDER:-cloudflare}"
DOMAIN="${DOMAIN:-myapp.com}"
ORIGIN_SERVER="${ORIGIN_SERVER:-origin.myapp.com}"
ASSETS_BUCKET="${ASSETS_BUCKET:-myapp-assets}"
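# Helper assumed by the functions below: the script calls
# get_cloudflare_zone_id without defining it, so this is a minimal sketch
# against the Cloudflare v4 API. Assumes CLOUDFLARE_API_TOKEN is exported
# and jq is installed. (setup_cloudflare_page_rules and
# setup_cloudflare_security are likewise assumed to be defined elsewhere.)
get_cloudflare_zone_id() {
    local domain="$1"
    curl -s "https://api.cloudflare.com/client/v4/zones?name=$domain" \
        -H "Authorization: Bearer $CLOUDFLARE_API_TOKEN" \
        -H "Content-Type: application/json" | jq -r '.result[0].id'
}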
setup_cloudflare_cdn() {
echo "βοΈ Setting up Cloudflare CDN configuration..."
# Cloudflare API configuration
local zone_id=$(get_cloudflare_zone_id "$DOMAIN")
# Configure caching rules
configure_cloudflare_caching "$zone_id"
# Set up page rules for optimization
setup_cloudflare_page_rules "$zone_id"
# Configure security settings
setup_cloudflare_security "$zone_id"
# Set up Workers for edge computing
deploy_cloudflare_workers "$zone_id"
echo "β
Cloudflare CDN configured"
}
configure_cloudflare_caching() {
local zone_id="$1"
echo "π¦ Configuring Cloudflare caching policies..."
# Static assets - aggressive caching
curl -X POST "https://api.cloudflare.com/client/v4/zones/$zone_id/pagerules" \
-H "Authorization: Bearer $CLOUDFLARE_API_TOKEN" \
-H "Content-Type: application/json" \
--data '{
"targets": [{
"target": "url",
"constraint": {
"operator": "matches",
"value": "'$DOMAIN'/static/*"
}
}],
"actions": [{
"id": "cache_level",
"value": "cache_everything"
}, {
"id": "edge_cache_ttl",
"value": 2592000
}, {
"id": "browser_cache_ttl",
"value": 31536000
}],
"priority": 1,
"status": "active"
}'
# API responses - selective caching
curl -X POST "https://api.cloudflare.com/client/v4/zones/$zone_id/pagerules" \
-H "Authorization: Bearer $CLOUDFLARE_API_TOKEN" \
-H "Content-Type: application/json" \
--data '{
"targets": [{
"target": "url",
"constraint": {
"operator": "matches",
"value": "'$DOMAIN'/api/public/*"
}
}],
"actions": [{
"id": "cache_level",
"value": "cache_everything"
}, {
"id": "edge_cache_ttl",
"value": 300
}],
"priority": 2,
"status": "active"
}'
# Dynamic content - bypass cache
curl -X POST "https://api.cloudflare.com/client/v4/zones/$zone_id/pagerules" \
-H "Authorization: Bearer $CLOUDFLARE_API_TOKEN" \
-H "Content-Type: application/json" \
--data '{
"targets": [{
"target": "url",
"constraint": {
"operator": "matches",
"value": "'$DOMAIN'/api/user/*"
}
}],
"actions": [{
"id": "cache_level",
"value": "bypass"
}],
"priority": 3,
"status": "active"
}'
}
deploy_cloudflare_workers() {
local zone_id="$1"
echo "β‘ Deploying Cloudflare Workers for edge processing..."
# Create advanced image optimization worker
cat > image-optimization-worker.js << 'EOF'
addEventListener('fetch', event => {
event.respondWith(handleRequest(event.request))
})
async function handleRequest(request) {
const url = new URL(request.url)
// Only process image requests
if (!url.pathname.match(/\.(jpg|jpeg|png|webp|gif)$/i)) {
return fetch(request)
}
// Get client information
const accept = request.headers.get('Accept') || ''
const userAgent = request.headers.get('User-Agent') || ''
// Determine optimal format; check AVIF first, since browsers that
// accept AVIF also accept WebP and AVIF compresses better
let format = 'auto'
if (accept.includes('image/avif')) {
format = 'avif'
} else if (accept.includes('image/webp')) {
format = 'webp'
}
// Determine device type for quality optimization
let quality = 85
if (userAgent.includes('Mobile')) {
quality = 75
}
// Build Cloudflare Images URL
const imageUrl = new URL(url)
imageUrl.searchParams.set('format', format)
imageUrl.searchParams.set('quality', quality.toString())
// Add responsive sizing based on viewport
const viewport = parseInt(request.headers.get('Viewport-Width') || '', 10)
if (viewport > 0) {
const width = Math.min(viewport, 2048)
imageUrl.searchParams.set('width', width.toString())
}
// Fetch optimized image
const response = await fetch(imageUrl.toString())
// Add performance headers
const newResponse = new Response(response.body, response)
newResponse.headers.set('Cache-Control', 'public, max-age=31536000, immutable')
newResponse.headers.set('X-Image-Optimized', 'cloudflare-worker')
return newResponse
}
EOF
# Deploy the worker
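# NOTE: "wrangler publish" is wrangler v1/v2 syntax; wrangler v3+ renamed
# this command to "wrangler deploy"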
wrangler publish image-optimization-worker.js --name image-optimizer
# Create security and performance worker
cat > security-performance-worker.js << 'EOF'
addEventListener('fetch', event => {
event.respondWith(handleRequest(event.request))
})
async function handleRequest(request) {
const url = new URL(request.url)
// Security: Block suspicious requests
const userAgent = request.headers.get('User-Agent') || ''
const suspiciousPatterns = [
/bot/i, /crawler/i, /scraper/i, /spider/i
]
if (suspiciousPatterns.some(pattern => pattern.test(userAgent))) {
// Rate limit bots
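// NOTE: RATE_LIMITER is assumed to be a Workers KV namespace bound to
// this worker (declared under kv_namespaces in wrangler.toml)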
const botKey = `bot:${request.headers.get('CF-Connecting-IP')}`
// KV stores string values, so parse the counter before doing arithmetic
const botCount = parseInt(await RATE_LIMITER.get(botKey), 10) || 0
if (botCount > 10) {
return new Response('Rate limited', { status: 429 })
}
await RATE_LIMITER.put(botKey, String(botCount + 1), { expirationTtl: 3600 })
}
// Performance: Add security headers
const response = await fetch(request)
const newResponse = new Response(response.body, response)
// Security headers
newResponse.headers.set('X-Frame-Options', 'DENY')
newResponse.headers.set('X-Content-Type-Options', 'nosniff')
newResponse.headers.set('X-XSS-Protection', '1; mode=block')
newResponse.headers.set('Referrer-Policy', 'strict-origin-when-cross-origin')
newResponse.headers.set('Content-Security-Policy', "default-src 'self'; script-src 'self' 'unsafe-inline' https://cdn.jsdelivr.net")
// Performance headers
if (url.pathname.match(/\.(css|js|png|jpg|jpeg|gif|webp|svg|woff|woff2)$/)) {
newResponse.headers.set('Cache-Control', 'public, max-age=31536000, immutable')
}
return newResponse
}
EOF
wrangler publish security-performance-worker.js --name security-performance
}
setup_aws_cloudfront() {
echo "π Setting up AWS CloudFront distribution..."
# Create CloudFront distribution configuration
cat > cloudfront-distribution.json << EOF
{
"CallerReference": "$(date +%s)",
"Comment": "Production CDN for $DOMAIN",
"DefaultRootObject": "index.html",
"Origins": {
"Quantity": 2,
"Items": [
{
"Id": "origin-server",
"DomainName": "$ORIGIN_SERVER",
"CustomOriginConfig": {
"HTTPPort": 80,
"HTTPSPort": 443,
"OriginProtocolPolicy": "https-only",
"OriginSslProtocols": {
"Quantity": 1,
"Items": ["TLSv1.2"]
}
}
},
{
"Id": "s3-assets",
"DomainName": "$ASSETS_BUCKET.s3.amazonaws.com",
"S3OriginConfig": {
"OriginAccessIdentity": ""
}
}
]
},
"DefaultCacheBehavior": {
"TargetOriginId": "origin-server",
"ViewerProtocolPolicy": "redirect-to-https",
"MinTTL": 0,
"ForwardedValues": {
"QueryString": true,
"Cookies": {
"Forward": "whitelist",
"WhitelistedNames": {
"Quantity": 2,
"Items": ["session_id", "auth_token"]
}
},
"Headers": {
"Quantity": 3,
"Items": ["Host", "Authorization", "CloudFront-Viewer-Country"]
}
},
"TrustedSigners": {
"Enabled": false,
"Quantity": 0
},
"Compress": true
},
"CacheBehaviors": {
"Quantity": 3,
"Items": [
{
"PathPattern": "/static/*",
"TargetOriginId": "s3-assets",
"ViewerProtocolPolicy": "redirect-to-https",
"MinTTL": 31536000,
"DefaultTTL": 31536000,
"MaxTTL": 31536000,
"ForwardedValues": {
"QueryString": false,
"Cookies": {
"Forward": "none"
}
},
"Compress": true
},
{
"PathPattern": "/api/*",
"TargetOriginId": "origin-server",
"ViewerProtocolPolicy": "redirect-to-https",
"MinTTL": 0,
"DefaultTTL": 300,
"MaxTTL": 3600,
"ForwardedValues": {
"QueryString": true,
"Cookies": {
"Forward": "all"
},
"Headers": {
"Quantity": 4,
"Items": ["Authorization", "Content-Type", "Accept", "User-Agent"]
}
}
},
{
"PathPattern": "/auth/*",
"TargetOriginId": "origin-server",
"ViewerProtocolPolicy": "redirect-to-https",
"MinTTL": 0,
"DefaultTTL": 0,
"MaxTTL": 0,
"ForwardedValues": {
"QueryString": true,
"Cookies": {
"Forward": "all"
},
"Headers": {
"Quantity": 1,
"Items": ["*"]
}
}
}
]
},
"Enabled": true,
"PriceClass": "PriceClass_All",
"Aliases": {
"Quantity": 2,
"Items": ["$DOMAIN", "www.$DOMAIN"]
},
"ViewerCertificate": {
"ACMCertificateArn": "arn:aws:acm:us-east-1:123456789012:certificate/certificate-id",
"SSLSupportMethod": "sni-only",
"MinimumProtocolVersion": "TLSv1.2_2021"
},
"HttpVersion": "http2",
"IsIPV6Enabled": true,
"Logging": {
"Enabled": true,
"IncludeCookies": false,
"Bucket": "$DOMAIN-cloudfront-logs.s3.amazonaws.com",
"Prefix": "access-logs/"
}
}
EOF
# Create the distribution
aws cloudfront create-distribution \
--distribution-config file://cloudfront-distribution.json \
--region us-east-1
rm cloudfront-distribution.json
echo "β
CloudFront distribution created"
}
optimize_static_assets() {
echo "π¨ Optimizing static assets for CDN delivery..."
# Create asset optimization pipeline
cat > optimize-assets.js << 'EOF'
const fs = require('fs');
const path = require('path');
const sharp = require('sharp');
const { minify } = require('terser');
const CleanCSS = require('clean-css');
const { gzipSync, brotliCompressSync } = require('zlib');
class AssetOptimizer {
constructor(srcDir, distDir) {
this.srcDir = srcDir;
this.distDir = distDir;
this.stats = {
processed: 0,
originalSize: 0,
optimizedSize: 0
};
}
async optimizeImages(inputDir, outputDir) {
console.log('Optimizing images...');
const imageFiles = this.getFiles(inputDir, /\.(jpg|jpeg|png|webp|svg)$/i);
for (const file of imageFiles) {
const inputPath = path.join(inputDir, file);
const outputPath = path.join(outputDir, file);
// Ensure the destination directory exists (files may sit in subdirectories)
fs.mkdirSync(path.dirname(outputPath), { recursive: true });
const inputStats = fs.statSync(inputPath);
this.stats.originalSize += inputStats.size;
if (file.endsWith('.svg')) {
// Copy SVG files as-is (could add SVGO optimization)
fs.copyFileSync(inputPath, outputPath);
} else {
// Optimize raster images; apply the format options that match the
// file's extension, since sharp only honors the last format call in a chain
const pipeline = sharp(inputPath)
.resize(2048, 2048, {
fit: 'inside',
withoutEnlargement: true
});
const ext = path.extname(file).toLowerCase();
if (ext === '.jpg' || ext === '.jpeg') {
await pipeline.jpeg({ quality: 85, progressive: true }).toFile(outputPath);
} else if (ext === '.png') {
await pipeline.png({ compressionLevel: 9, progressive: true }).toFile(outputPath);
} else {
await pipeline.webp({ quality: 85 }).toFile(outputPath);
}
// Generate additional formats
const baseName = path.parse(file).name;
const baseDir = path.dirname(outputPath);
// Generate WebP version
await sharp(inputPath)
.webp({ quality: 85 })
.toFile(path.join(baseDir, `${baseName}.webp`));
// Generate AVIF version for modern browsers
try {
await sharp(inputPath)
.avif({ quality: 75 })
.toFile(path.join(baseDir, `${baseName}.avif`));
} catch (e) {
// AVIF not supported in all Sharp versions
console.log(`AVIF optimization skipped for ${file}`);
}
}
const outputStats = fs.statSync(outputPath);
this.stats.optimizedSize += outputStats.size;
this.stats.processed++;
console.log(`  Optimized ${file}: ${this.formatBytes(inputStats.size)} → ${this.formatBytes(outputStats.size)}`);
}
}
async optimizeJavaScript(inputDir, outputDir) {
console.log('Optimizing JavaScript...');
const jsFiles = this.getFiles(inputDir, /\.js$/i);
for (const file of jsFiles) {
const inputPath = path.join(inputDir, file);
const outputPath = path.join(outputDir, file);
// Ensure the destination directory exists before writing
fs.mkdirSync(path.dirname(outputPath), { recursive: true });
const code = fs.readFileSync(inputPath, 'utf8');
const inputSize = Buffer.byteLength(code, 'utf8');
this.stats.originalSize += inputSize;
// Minify JavaScript
const result = await minify(code, {
compress: {
dead_code: true,
drop_console: true,
drop_debugger: true,
keep_fargs: false,
passes: 2
},
mangle: {
toplevel: true
},
format: {
comments: false
}
});
const minified = result.code;
const outputSize = Buffer.byteLength(minified, 'utf8');
this.stats.optimizedSize += outputSize;
// Write minified version
fs.writeFileSync(outputPath, minified);
// Create compressed versions
this.createCompressedVersions(outputPath, minified);
console.log(`  Optimized ${file}: ${this.formatBytes(inputSize)} → ${this.formatBytes(outputSize)}`);
}
}
async optimizeCSS(inputDir, outputDir) {
console.log('Optimizing CSS...');
const cssFiles = this.getFiles(inputDir, /\.css$/i);
for (const file of cssFiles) {
const inputPath = path.join(inputDir, file);
const outputPath = path.join(outputDir, file);
// Ensure the destination directory exists before writing
fs.mkdirSync(path.dirname(outputPath), { recursive: true });
const css = fs.readFileSync(inputPath, 'utf8');
const inputSize = Buffer.byteLength(css, 'utf8');
this.stats.originalSize += inputSize;
// Minify CSS
const result = new CleanCSS({
level: 2,
inline: ['all'],
rebase: false
}).minify(css);
if (result.errors.length > 0) {
console.error(`CSS optimization errors in ${file}:`, result.errors);
continue;
}
const minified = result.styles;
const outputSize = Buffer.byteLength(minified, 'utf8');
this.stats.optimizedSize += outputSize;
// Write minified version
fs.writeFileSync(outputPath, minified);
// Create compressed versions
this.createCompressedVersions(outputPath, minified);
console.log(`  Optimized ${file}: ${this.formatBytes(inputSize)} → ${this.formatBytes(outputSize)}`);
}
}
createCompressedVersions(filePath, content) {
// Create gzipped version
const gzipped = gzipSync(content, { level: 9 });
fs.writeFileSync(`${filePath}.gz`, gzipped);
// Create Brotli compressed version
const brotli = brotliCompressSync(content, {
params: {
[require('zlib').constants.BROTLI_PARAM_QUALITY]: 11
}
});
fs.writeFileSync(`${filePath}.br`, brotli);
}
getFiles(dir, pattern) {
const files = [];
function walk(currentDir) {
const entries = fs.readdirSync(currentDir, { withFileTypes: true });
for (const entry of entries) {
const fullPath = path.join(currentDir, entry.name);
if (entry.isDirectory()) {
walk(fullPath);
} else if (pattern.test(entry.name)) {
files.push(path.relative(dir, fullPath));
}
}
}
walk(dir);
return files;
}
formatBytes(bytes) {
if (bytes === 0) return '0 B';
const k = 1024;
const sizes = ['B', 'KB', 'MB', 'GB'];
const i = Math.floor(Math.log(bytes) / Math.log(k));
return `${parseFloat((bytes / Math.pow(k, i)).toFixed(1))} ${sizes[i]}`;
}
printStats() {
const saved = this.stats.originalSize - this.stats.optimizedSize;
const percent = ((saved / this.stats.originalSize) * 100).toFixed(1);
console.log('\nOptimization Summary:');
console.log(` Files processed: ${this.stats.processed}`);
console.log(` Original size: ${this.formatBytes(this.stats.originalSize)}`);
console.log(` Optimized size: ${this.formatBytes(this.stats.optimizedSize)}`);
console.log(` Space saved: ${this.formatBytes(saved)} (${percent}%)`);
}
}
// Usage
const optimizer = new AssetOptimizer('./src/assets', './dist/assets');
async function main() {
console.log('Starting asset optimization pipeline...');
await optimizer.optimizeImages('./src/assets/images', './dist/assets/images');
await optimizer.optimizeJavaScript('./src/assets/js', './dist/assets/js');
await optimizer.optimizeCSS('./src/assets/css', './dist/assets/css');
optimizer.printStats();
console.log('Asset optimization completed');
}
main().catch(console.error);
EOF
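# NOTE: the pipeline above assumes its npm dependencies are installed:
# npm install sharp terser clean-css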
# Run the optimization
node optimize-assets.js
echo "β
Static assets optimized"
}
# Command routing
case "${1:-help}" in
"cloudflare")
setup_cloudflare_cdn
;;
"aws")
setup_aws_cloudfront
;;
"optimize")
optimize_static_assets
;;
"all")
optimize_static_assets
setup_cloudflare_cdn
;;
"help"|*)
cat << EOF
CDN Setup and Optimization
Usage: $0 <command>
Commands:
cloudflare Set up Cloudflare CDN configuration
aws Set up AWS CloudFront distribution
optimize Optimize static assets for CDN delivery
all Run complete CDN setup and optimization
Examples:
$0 cloudflare # Configure Cloudflare CDN
$0 aws # Set up CloudFront distribution
$0 optimize # Optimize assets for delivery
$0 all # Complete CDN setup
EOF
;;
esac
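Once the CDN is in front of the origin, verify that edge caching actually engages rather than trusting the provider dashboard. A quick check; header names vary by provider (Cloudflare sets CF-Cache-Status, CloudFront sets X-Cache), and the asset URL here is hypothetical:
#!/bin/bash
# verify-cdn.sh - Confirm static assets are served from the edge cache
set -euo pipefail
URL="https://myapp.com/static/css/main.css"
# The first request warms the edge cache; the second should be a HIT
curl -s -o /dev/null -D - "$URL" | grep -iE 'cf-cache-status|x-cache'
curl -s -o /dev/null -D - "$URL" | grep -iE 'cf-cache-status|x-cache'
# Expect MISS (or EXPIRED) on the first response and HIT on the second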
Key Takeaways
Professional deployment and infrastructure management transforms amateur manual processes into automated, scalable systems that handle real-world production demands. Modern infrastructure requires thinking beyond single servers to orchestrated, cloud-native architectures with proper load balancing, CDN optimization, and Infrastructure as Code practices.
The deployment and infrastructure mastery mindset:
- Deployment strategies eliminate downtime: Rolling, blue-green, and canary deployments ensure zero-downtime updates with automatic rollback capabilities
- Cloud platforms provide scalable foundation: AWS, GCP, and Azure offer the building blocks for resilient, globally distributed infrastructure
- Infrastructure as Code ensures consistency: Terraform and similar tools make infrastructure reproducible, versionable, and auditable
- Load balancing enables scale: Professional load balancers distribute traffic intelligently while handling failures gracefully
- CDNs optimize global performance: Content delivery networks reduce latency and server load through intelligent edge caching
What distinguishes professional deployment infrastructure:
- Automated deployment pipelines that handle complexity without human intervention
- Multi-cloud infrastructure provisioning that eliminates vendor lock-in risks
- Intelligent load balancing with health checking and automatic failover
- Global CDN optimization that serves content from the edge closest to users
- Comprehensive monitoring and alerting that detects issues before customers notice
What’s Next
This article covered deployment strategies, cloud platform architecture, Infrastructure as Code, load balancing, and CDN optimization. The next article completes the deployment infrastructure with CI/CD pipeline automation, comprehensive monitoring and alerting systems, centralized log aggregation and analysis, disaster recovery planning, and backup strategies that ensure business continuity.
You’re no longer manually deploying applications and hoping they work; you’re operating professional infrastructure that scales globally, handles failures gracefully, and delivers optimal performance to users worldwide. The deployment foundation is solid. Now we build the operational excellence around it.