Scalability & High Availability - 1/2
The $100 Million Black Friday That Broke the Internet
Picture this absolute catastrophe: A major e-commerce company with 10 million users prepares for their biggest Black Friday sale ever. They’ve got exclusive deals, celebrity endorsements, and a marketing budget that could fund a small country. What could go wrong?
Everything. Literally everything.
At 12:01 AM EST, their carefully orchestrated Black Friday launch turned into a digital apocalypse:
- Complete website shutdown within 4 minutes: 500,000 concurrent users hit their single-server architecture like a digital tsunami
- Database meltdown: Their MySQL server started throwing connection pool exhaustion errors faster than you can say “SELECT * FROM disaster”
- Cache stampede catastrophe: Redis crashed under load, causing every request to hammer the already dying database
- Load balancer? What load balancer?: All traffic was hitting one poor server that was all but melting in their data center
- CDN misconfiguration: Static assets were being served from their origin server instead of edge locations
- Mobile app crashes: API timeouts cascaded through their mobile applications, causing mass user logouts
The damage report was absolutely brutal:
- $100 million in lost revenue during a 6-hour outage on their biggest sales day
- 2.3 million angry customers who couldn’t complete purchases
- 89% of users never returned to attempt purchases after the initial failure
- Stock price dropped 23% in after-hours trading
- 6 months of engineering effort to rebuild their architecture from scratch
- Complete loss of market share to competitors who stayed online
Here’s the kicker: Every single technical failure was completely preventable with proper scaling architecture that costs less than their coffee budget.
The Uncomfortable Truth About Scale
Here’s what separates applications that gracefully handle millions of users from those that collapse faster than a house of cards in a hurricane: Scalability isn’t about writing more code—it’s about designing systems that become more resilient as load increases, not more fragile.
Most developers approach scaling like this:
- Build everything on a single server because “it works fine in development”
- Add more RAM when things slow down (throwing hardware at software problems)
- Panic when the database can’t handle concurrent connections
- Frantically add caching without understanding cache invalidation
- Discover that their “scalable” architecture is actually a single point of failure wearing a disguise
But systems that actually handle real-world traffic work differently:
- Design for horizontal scaling from day one because a single machine can only grow so big
- Distribute load intelligently across multiple servers instead of hoping one server can handle everything
- Scale database reads and writes independently because they have completely different characteristics
- Cache strategically at multiple layers with proper invalidation strategies
- Prepare for failure scenarios because servers die, networks partition, and Murphy’s Law is undefeated
The difference isn’t just technical architecture—it’s the difference between systems that get stronger under pressure and systems that crumble the moment they face real-world usage.
Ready to build applications that scale like Netflix instead of failing like that Black Friday disaster? Let’s dive into the scaling strategies that power the internet’s biggest success stories.
Horizontal vs Vertical Scaling: The Foundation of Growth
The Problem: Vertical Scaling Hits the Wall
// The single-server nightmare that fails spectacularly under load
class MonolithicOrderService {
private database: Database;
private cache: InMemoryCache;
private emailService: EmailService;
constructor() {
// Everything runs on one server - RED FLAG #1
this.database = new Database({
host: "localhost",
maxConnections: 100, // Will be exhausted instantly under load
});
// In-memory cache that dies with the server - RED FLAG #2
this.cache = new InMemoryCache({
maxSize: "2GB", // What happens when we need more?
});
this.emailService = new EmailService();
}
async processOrder(orderData: OrderRequest): Promise<Order> {
// Single-threaded processing - RED FLAG #3
// This will become the bottleneck under concurrent load
try {
// Database connection from the limited pool
const existingOrders = await this.database.query(
        'SELECT * FROM orders WHERE user_id = ? AND status = "pending"',
        [orderData.userId]
      );
      if (existingOrders.length > 0) {
        throw new Error("Pending order already exists");
      }
// Synchronous processing blocks other requests - RED FLAG #4
const inventory = await this.checkInventory(orderData.items);
const paymentResult = await this.processPayment(orderData.payment);
const order = await this.createOrder(orderData, paymentResult);
// Email sending blocks order completion - RED FLAG #5
await this.emailService.sendConfirmation(order.userEmail, order.id);
// Cache update on single server - RED FLAG #6
this.cache.set(`order_${order.id}`, order);
return order;
} catch (error) {
// When this server dies, everything dies - RED FLAG #7
throw error;
}
}
private async checkInventory(items: OrderItem[]): Promise<InventoryResult> {
// Complex inventory calculation on single server
// Gets exponentially slower with more concurrent requests
const inventoryChecks = await Promise.all(
items.map(async (item) => {
const stock = await this.database.query(
"SELECT quantity FROM inventory WHERE product_id = ? FOR UPDATE",
[item.productId]
);
// Row-level locks that create bottlenecks under high concurrency
if (stock.quantity < item.quantity) {
throw new Error(`Insufficient stock for ${item.productId}`);
}
await this.database.query(
"UPDATE inventory SET quantity = quantity - ? WHERE product_id = ?",
[item.quantity, item.productId]
);
return { productId: item.productId, reserved: item.quantity };
})
);
return new InventoryResult(inventoryChecks);
}
async getOrderHistory(userId: string, page: number = 1): Promise<Order[]> {
// Expensive query on primary database - RED FLAG #8
// Analytics queries competing with transactional operations
const orders = await this.database.query(
`
SELECT o.*,
COUNT(oi.id) as item_count,
SUM(oi.price * oi.quantity) as total_amount,
p.name as payment_method
FROM orders o
LEFT JOIN order_items oi ON o.id = oi.order_id
LEFT JOIN payments p ON o.payment_id = p.id
WHERE o.user_id = ?
GROUP BY o.id
ORDER BY o.created_at DESC
LIMIT ?, 50
`,
[userId, (page - 1) * 50]
);
return orders.map((row) => this.mapToOrder(row));
}
// What happens when we need more capacity?
// 1. Buy a bigger server (vertical scaling)
// 2. Hope it's enough
// 3. Repeat until you hit hardware limits or bankruptcy
}
// The monitoring shows the inevitable failure
class ServerMonitor {
checkHealth(): ServerHealth {
return {
cpuUsage: 95, // Constantly pegged
memoryUsage: 89, // About to hit swap
diskIO: 87, // Database writes saturating disk
networkIO: 78, // Single network interface maxed out
databaseConnections: 98, // Pool exhaustion imminent
responseTime: 15000, // 15 seconds average response time
errorRate: 23, // Nearly 1 in 4 requests failing
activeUsers: 50000, // Far beyond system capacity
status: "CRITICAL", // Time to panic
};
}
}
The Solution: Horizontal Scaling Architecture
// Distributed architecture that scales linearly with demand
import { v4 as uuidv4 } from "uuid";

export class ScalableOrderService {
constructor(
private loadBalancer: LoadBalancer,
private orderProcessingCluster: ServiceCluster<OrderProcessor>,
private databaseCluster: DatabaseCluster,
private distributedCache: DistributedCache,
private messageQueue: MessageQueue,
private serviceDiscovery: ServiceDiscovery
) {}
async processOrder(orderData: OrderRequest): Promise<ProcessOrderResponse> {
// Async processing with immediate response - scalability principle #1
const orderId = uuidv4();
const processingId = uuidv4();
// Put order on queue for processing - handles backpressure gracefully
await this.messageQueue.publish("order.process", {
orderId,
processingId,
orderData,
timestamp: new Date(),
});
// Return immediately - don't block the API
return new ProcessOrderResponse(orderId, processingId, "QUEUED");
}
async getOrderStatus(orderId: string): Promise<OrderStatus> {
// Check multiple sources with fallbacks
try {
// Try cache first (fastest)
const cached = await this.distributedCache.get(`order_${orderId}`);
if (cached) {
return cached;
}
// Fall back to read replica (still fast)
const readReplica = this.databaseCluster.getReadReplica();
const order = await readReplica.query(
"SELECT * FROM orders WHERE id = ?",
[orderId]
);
if (order) {
// Cache for future requests
await this.distributedCache.setEx(
`order_${orderId}`,
300, // 5 minutes
order
);
return order;
}
return new OrderStatus(orderId, "NOT_FOUND");
} catch (error) {
// Graceful degradation - return what we know
return new OrderStatus(orderId, "UNKNOWN");
}
}
}
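// A minimal usage sketch (the request payload and field names like
// `response.orderId` and `status.status` follow the constructors above but
// are assumptions): the client enqueues the order, gets an ID back
// immediately, and polls for status instead of holding a connection open
// while the order is processed.
async function submitOrderExample(orderService: ScalableOrderService) {
  const response = await orderService.processOrder({
    userId: "user-123",
    items: [{ productId: "sku-42", quantity: 1, price: 19.99 }],
    payment: { method: "card", token: "tok_abc" },
  } as OrderRequest);

  // Poll until the background processor finishes the queued order
  let status = await orderService.getOrderStatus(response.orderId);
  while (status.status === "QUEUED" || status.status === "PROCESSING") {
    await new Promise((resolve) => setTimeout(resolve, 1000));
    status = await orderService.getOrderStatus(response.orderId);
  }
  return status;
}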
// Individual order processor - can scale horizontally
export class OrderProcessor {
constructor(
private instanceId: string,
private writeDatabase: WriteDatabase,
private readDatabase: ReadDatabase,
private inventoryService: InventoryService,
private paymentService: PaymentService,
private emailQueue: MessageQueue,
private cache: DistributedCache
) {}
async processOrderMessage(message: OrderMessage): Promise<void> {
const { orderId, orderData } = message;
try {
// Update status to processing
await this.updateOrderStatus(orderId, "PROCESSING");
// Process steps in parallel where possible
const [inventoryResult, paymentResult] = await Promise.all([
this.reserveInventory(orderData.items),
this.processPayment(orderData.payment),
]);
// Create order record
const order = await this.createOrder(orderId, orderData, {
inventoryReservation: inventoryResult.reservationId,
paymentId: paymentResult.transactionId,
});
// Update caches across all nodes
await this.cache.set(`order_${orderId}`, order, 3600); // 1 hour
// Queue email asynchronously - don't block order completion
await this.emailQueue.publish("email.order_confirmation", {
orderId,
userEmail: orderData.userEmail,
orderDetails: order,
});
await this.updateOrderStatus(orderId, "COMPLETED");
} catch (error) {
await this.updateOrderStatus(orderId, "FAILED", error.message);
// Queue for retry or manual review
await this.emailQueue.publish("order.failed", {
orderId,
error: error.message,
orderData,
});
}
}
private async reserveInventory(
items: OrderItem[]
): Promise<InventoryReservation> {
// Call distributed inventory service
return await this.inventoryService.reserveItems(items);
}
private async processPayment(
paymentData: PaymentRequest
): Promise<PaymentResult> {
// Call distributed payment service
return await this.paymentService.processPayment(paymentData);
}
private async createOrder(
orderId: string,
orderData: OrderRequest,
transactionIds: TransactionIds
): Promise<Order> {
// Write to master database
const order = await this.writeDatabase.transaction(async (tx) => {
const order = await tx.query(
`
INSERT INTO orders (id, user_id, status, total_amount, created_at)
VALUES (?, ?, 'COMPLETED', ?, NOW())
RETURNING *
`,
[orderId, orderData.userId, orderData.totalAmount]
);
// Insert order items
for (const item of orderData.items) {
await tx.query(
`
INSERT INTO order_items (order_id, product_id, quantity, price)
VALUES (?, ?, ?, ?)
`,
[orderId, item.productId, item.quantity, item.price]
);
}
return order;
});
return order;
}
private async updateOrderStatus(
orderId: string,
status: string,
errorMessage?: string
): Promise<void> {
// Update database
await this.writeDatabase.query(
`
UPDATE orders
SET status = ?, error_message = ?, updated_at = NOW()
WHERE id = ?
`,
[status, errorMessage, orderId]
);
// Invalidate cache
await this.cache.del(`order_${orderId}`);
// Update real-time status for UI
await this.cache.set(
`order_status_${orderId}`,
{
status,
errorMessage,
updatedAt: new Date(),
},
300
);
}
}
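// Wiring a processor to the queue - a sketch that assumes the MessageQueue
// abstraction exposes a subscribe(topic, handler) method. Every processor
// instance subscribes to the same topic and competes for messages, so
// throughput grows with instance count. Handlers should be idempotent,
// since most queues deliver at-least-once and will redeliver on failure.
export function startOrderProcessor(
  processor: OrderProcessor,
  queue: MessageQueue
): void {
  queue.subscribe("order.process", async (message: OrderMessage) => {
    // processOrderMessage records failures itself, so a throw here only
    // signals the queue to redeliver (or dead-letter) the message
    await processor.processOrderMessage(message);
  });
}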
// Load balancer distributes traffic intelligently
export class LoadBalancer {
private upstreamServers: UpstreamServer[] = [];
private healthChecker: HealthChecker;
private algorithm: LoadBalancingAlgorithm;
constructor(
servers: ServerConfig[],
algorithm:
| "round_robin"
| "least_connections"
| "weighted_response_time" = "least_connections"
) {
this.upstreamServers = servers.map((config) => new UpstreamServer(config));
this.healthChecker = new HealthChecker(this.upstreamServers);
this.algorithm = this.createAlgorithm(algorithm);
// Start health checking
this.healthChecker.startChecking(30000); // Every 30 seconds
}
async routeRequest(request: HttpRequest): Promise<HttpResponse> {
// Get healthy servers
const healthyServers = this.upstreamServers.filter((s) => s.isHealthy());
if (healthyServers.length === 0) {
throw new ServiceUnavailableError(
"No healthy upstream servers available"
);
}
// Select server based on algorithm
const selectedServer = this.algorithm.selectServer(healthyServers);
// Track connection
selectedServer.incrementConnections();
try {
const startTime = Date.now();
const response = await selectedServer.forwardRequest(request);
const responseTime = Date.now() - startTime;
// Update metrics for load balancing decisions
selectedServer.updateMetrics(responseTime, response.statusCode < 400);
return response;
} catch (error) {
selectedServer.updateMetrics(0, false);
// Try another server if available
if (healthyServers.length > 1) {
return this.routeRequestWithFallback(request, selectedServer);
}
throw error;
} finally {
selectedServer.decrementConnections();
}
}
private async routeRequestWithFallback(
request: HttpRequest,
failedServer: UpstreamServer
): Promise<HttpResponse> {
const remainingServers = this.upstreamServers.filter(
(s) => s.isHealthy() && s !== failedServer
);
if (remainingServers.length === 0) {
throw new ServiceUnavailableError("All upstream servers failed");
}
const fallbackServer = this.algorithm.selectServer(remainingServers);
return await fallbackServer.forwardRequest(request);
}
private createAlgorithm(type: string): LoadBalancingAlgorithm {
switch (type) {
case "round_robin":
return new RoundRobinAlgorithm();
case "least_connections":
return new LeastConnectionsAlgorithm();
case "weighted_response_time":
return new WeightedResponseTimeAlgorithm();
default:
throw new Error(`Unknown load balancing algorithm: ${type}`);
}
}
getStats(): LoadBalancerStats {
return {
totalServers: this.upstreamServers.length,
healthyServers: this.upstreamServers.filter((s) => s.isHealthy()).length,
totalRequests: this.upstreamServers.reduce(
(sum, s) => sum + s.getStats().requestCount,
0
),
totalErrors: this.upstreamServers.reduce(
(sum, s) => sum + s.getStats().errorCount,
0
),
averageResponseTime: this.calculateAverageResponseTime(),
};
}
private calculateAverageResponseTime(): number {
const serverStats = this.upstreamServers.map((s) => s.getStats());
const totalTime = serverStats.reduce(
(sum, stats) => sum + stats.totalResponseTime,
0
);
const totalRequests = serverStats.reduce(
(sum, stats) => sum + stats.requestCount,
0
);
return totalRequests > 0 ? totalTime / totalRequests : 0;
}
}
// Upstream server with health tracking
export class UpstreamServer {
private config: ServerConfig;
private healthy = true;
private activeConnections = 0;
private stats: ServerStats;
constructor(config: ServerConfig) {
this.config = config;
this.stats = new ServerStats();
}
async forwardRequest(request: HttpRequest): Promise<HttpResponse> {
const startTime = Date.now();
try {
const response = await fetch(`${this.config.url}${request.path}`, {
        method: request.method,
        headers: request.headers,
        body: request.body,
        // fetch has no `timeout` option - abort via signal instead
        signal: AbortSignal.timeout(this.config.timeout || 30000),
      });
      const responseBody = await response.text();
      const responseTime = Date.now() - startTime;
      this.stats.recordRequest(responseTime, response.ok);
      // Map the fetch Response onto the HttpResponse shape the balancer uses
      return {
        statusCode: response.status,
        headers: Object.fromEntries(response.headers.entries()),
        body: responseBody,
        ok: response.ok,
      };
} catch (error) {
const responseTime = Date.now() - startTime;
this.stats.recordRequest(responseTime, false);
throw error;
}
}
async checkHealth(): Promise<boolean> {
try {
const response = await fetch(`${this.config.url}/health`, {
        method: "GET",
        // fetch has no `timeout` option - abort after 5 seconds instead
        signal: AbortSignal.timeout(5000),
      });
this.healthy = response.ok;
return this.healthy;
} catch (error) {
this.healthy = false;
return false;
}
}
isHealthy(): boolean {
return this.healthy;
}
getActiveConnections(): number {
return this.activeConnections;
}
incrementConnections(): void {
this.activeConnections++;
}
decrementConnections(): void {
this.activeConnections = Math.max(0, this.activeConnections - 1);
}
updateMetrics(responseTime: number, success: boolean): void {
this.stats.recordRequest(responseTime, success);
}
getStats(): ServerStats {
return this.stats;
}
getWeight(): number {
return this.config.weight || 1;
}
}
// Load balancing algorithms
export class LeastConnectionsAlgorithm implements LoadBalancingAlgorithm {
selectServer(servers: UpstreamServer[]): UpstreamServer {
return servers.reduce((leastConnected, current) => {
const currentConnections = current.getActiveConnections();
const leastConnections = leastConnected.getActiveConnections();
// Factor in server weight
const currentRatio = currentConnections / current.getWeight();
const leastRatio = leastConnections / leastConnected.getWeight();
return currentRatio < leastRatio ? current : leastConnected;
});
}
}
export class WeightedResponseTimeAlgorithm implements LoadBalancingAlgorithm {
selectServer(servers: UpstreamServer[]): UpstreamServer {
return servers.reduce((fastest, current) => {
const currentScore = this.calculateScore(current);
const fastestScore = this.calculateScore(fastest);
return currentScore > fastestScore ? current : fastest;
});
}
private calculateScore(server: UpstreamServer): number {
const stats = server.getStats();
const avgResponseTime = stats.getAverageResponseTime();
const successRate = stats.getSuccessRate();
const weight = server.getWeight();
// Higher score is better
// Factor in response time (lower is better), success rate (higher is better), and weight
return (weight * successRate) / Math.max(avgResponseTime, 1);
}
}
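// RoundRobinAlgorithm is referenced by the factory above but not shown;
// a minimal implementation: hand requests to healthy servers in rotation.
export class RoundRobinAlgorithm implements LoadBalancingAlgorithm {
  private nextIndex = 0;

  selectServer(servers: UpstreamServer[]): UpstreamServer {
    // The healthy-server list can shrink between calls, so always wrap
    const server = servers[this.nextIndex % servers.length];
    this.nextIndex = (this.nextIndex + 1) % servers.length;
    return server;
  }
}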
// Auto-scaling based on metrics
export class AutoScaler {
constructor(
private serviceCluster: ServiceCluster<OrderProcessor>,
private metrics: MetricsCollector,
private cloudProvider: CloudProvider
) {}
async evaluateScaling(): Promise<ScalingDecision> {
const currentMetrics = await this.metrics.getLatestMetrics();
const currentInstances = this.serviceCluster.getInstanceCount();
// Scale up conditions
if (this.shouldScaleUp(currentMetrics, currentInstances)) {
const targetInstances = this.calculateTargetInstances(
currentMetrics,
"up"
);
return new ScalingDecision("SCALE_UP", targetInstances);
}
// Scale down conditions
if (this.shouldScaleDown(currentMetrics, currentInstances)) {
const targetInstances = this.calculateTargetInstances(
currentMetrics,
"down"
);
return new ScalingDecision("SCALE_DOWN", targetInstances);
}
return new ScalingDecision("NO_ACTION", currentInstances);
}
private shouldScaleUp(
metrics: SystemMetrics,
currentInstances: number
): boolean {
return (
metrics.cpuUtilization > 70 ||
metrics.memoryUtilization > 80 ||
metrics.queueDepth > 1000 ||
metrics.averageResponseTime > 2000 ||
metrics.requestRate > currentInstances * 100 // More than 100 RPS per instance
);
}
private shouldScaleDown(
metrics: SystemMetrics,
currentInstances: number
): boolean {
return (
currentInstances > 2 && // Always keep minimum instances
metrics.cpuUtilization < 30 &&
metrics.memoryUtilization < 50 &&
metrics.queueDepth < 100 &&
metrics.averageResponseTime < 500 &&
metrics.requestRate < currentInstances * 30 // Less than 30 RPS per instance
);
}
private calculateTargetInstances(
metrics: SystemMetrics,
direction: "up" | "down"
): number {
const currentInstances = this.serviceCluster.getInstanceCount();
if (direction === "up") {
// Aggressive scaling up for performance
if (metrics.cpuUtilization > 90)
        return Math.min(currentInstances * 2, 50);
      if (metrics.queueDepth > 5000)
        return Math.min(Math.ceil(currentInstances * 1.5), 50);
      return Math.min(currentInstances + 2, 50);
} else {
// Conservative scaling down for cost optimization
return Math.max(Math.floor(currentInstances * 0.7), 2);
}
}
async executeScaling(decision: ScalingDecision): Promise<void> {
if (decision.action === "NO_ACTION") return;
const currentInstances = this.serviceCluster.getInstanceCount();
if (decision.action === "SCALE_UP") {
const instancesToAdd = decision.targetInstances - currentInstances;
await this.scaleUp(instancesToAdd);
} else {
const instancesToRemove = currentInstances - decision.targetInstances;
await this.scaleDown(instancesToRemove);
}
}
private async scaleUp(instanceCount: number): Promise<void> {
console.log(`Scaling up: adding ${instanceCount} instances`);
const newInstances = await Promise.all(
Array.from({ length: instanceCount }, () =>
this.cloudProvider.launchInstance({
image: "order-processor:latest",
instanceType: "t3.medium",
securityGroups: ["order-processor-sg"],
userData: this.generateUserData(),
})
)
);
// Wait for instances to be ready
await Promise.all(
newInstances.map((instance) => this.waitForInstanceReady(instance.id))
);
// Add to service cluster
for (const instance of newInstances) {
this.serviceCluster.addInstance(
new OrderProcessor(
instance.id
// ... dependencies
)
);
}
console.log(
`Successfully scaled up to ${this.serviceCluster.getInstanceCount()} instances`
);
}
private async scaleDown(instanceCount: number): Promise<void> {
console.log(`Scaling down: removing ${instanceCount} instances`);
// Remove instances gracefully
const instancesToRemove =
this.serviceCluster.selectInstancesForRemoval(instanceCount);
// Drain connections first
await Promise.all(
instancesToRemove.map((instance) => this.drainInstance(instance))
);
// Remove from cluster
for (const instance of instancesToRemove) {
this.serviceCluster.removeInstance(instance.instanceId);
await this.cloudProvider.terminateInstance(instance.instanceId);
}
console.log(
`Successfully scaled down to ${this.serviceCluster.getInstanceCount()} instances`
);
}
private async drainInstance(instance: OrderProcessor): Promise<void> {
// Stop accepting new requests
instance.setAcceptingRequests(false);
// Wait for current requests to complete (with timeout)
let attempts = 0;
const maxAttempts = 30; // 5 minutes
while (instance.getActiveRequests() > 0 && attempts < maxAttempts) {
await new Promise((resolve) => setTimeout(resolve, 10000)); // Wait 10 seconds
attempts++;
}
if (instance.getActiveRequests() > 0) {
console.warn(
`Force draining instance ${
instance.instanceId
} with ${instance.getActiveRequests()} active requests`
);
}
}
private generateUserData(): string {
return `#!/bin/bash
# Install dependencies
curl -fsSL https://get.docker.com -o get-docker.sh
sh get-docker.sh
# Start order processor
docker run -d \\
--name order-processor \\
--restart always \\
-p 8080:8080 \\
-e NODE_ENV=production \\
order-processor:latest
# Register with service discovery - use a heredoc so the shell expands the
# metadata lookups (inside single quotes, $(...) would stay literal)
INSTANCE_ID=$(curl -s http://169.254.169.254/latest/meta-data/instance-id)
LOCAL_IP=$(curl -s http://169.254.169.254/latest/meta-data/local-ipv4)
curl -X POST http://service-discovery:8500/v1/agent/service/register \\
  --data @- <<EOF
{
  "ID": "order-processor-$INSTANCE_ID",
  "Name": "order-processor",
  "Address": "$LOCAL_IP",
  "Port": 8080,
  "Check": {
    "HTTP": "http://localhost:8080/health",
    "Interval": "30s"
  }
}
EOF
`;
}
private async waitForInstanceReady(instanceId: string): Promise<void> {
let attempts = 0;
const maxAttempts = 30; // 15 minutes
while (attempts < maxAttempts) {
try {
const instance = await this.cloudProvider.getInstance(instanceId);
if (instance.state === "running") {
// Check if service is responding
const healthCheck = await fetch(
`http://${instance.privateIp}:8080/health`
);
if (healthCheck.ok) {
return;
}
}
} catch (error) {
// Continue waiting
}
await new Promise((resolve) => setTimeout(resolve, 30000)); // Wait 30 seconds
attempts++;
}
throw new Error(
`Instance ${instanceId} failed to become ready within timeout`
);
}
}
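// The AutoScaler decides and executes, but something still has to invoke it
// on a schedule. A sketch of a driver loop with a cooldown so the cluster
// doesn't flap between scale-up and scale-down on noisy metrics (the 60s
// interval and 5-minute cooldown are illustrative values, not prescriptions):
export function startAutoScalingLoop(autoScaler: AutoScaler): NodeJS.Timeout {
  let lastActionAt = 0;
  const cooldownMs = 5 * 60 * 1000;

  return setInterval(async () => {
    const decision = await autoScaler.evaluateScaling();
    if (decision.action === "NO_ACTION") return;

    // Skip if we scaled recently - new instances need time to absorb load
    if (Date.now() - lastActionAt < cooldownMs) return;

    await autoScaler.executeScaling(decision);
    lastActionAt = Date.now();
  }, 60000);
}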
// Supporting types and interfaces
export interface LoadBalancingAlgorithm {
selectServer(servers: UpstreamServer[]): UpstreamServer;
}
export class ServerStats {
private requestCount = 0;
private errorCount = 0;
private totalResponseTime = 0;
private responseTimeHistory: number[] = [];
recordRequest(responseTime: number, success: boolean): void {
this.requestCount++;
this.totalResponseTime += responseTime;
if (!success) {
this.errorCount++;
}
// Keep last 100 response times for trend analysis
this.responseTimeHistory.push(responseTime);
if (this.responseTimeHistory.length > 100) {
this.responseTimeHistory.shift();
}
}
getAverageResponseTime(): number {
return this.requestCount > 0
? this.totalResponseTime / this.requestCount
: 0;
}
getSuccessRate(): number {
return this.requestCount > 0
? (this.requestCount - this.errorCount) / this.requestCount
: 1;
}
}
export class ScalingDecision {
constructor(
public action: "SCALE_UP" | "SCALE_DOWN" | "NO_ACTION",
public targetInstances: number
) {}
}
export interface SystemMetrics {
cpuUtilization: number;
memoryUtilization: number;
queueDepth: number;
averageResponseTime: number;
requestRate: number;
errorRate: number;
}
export interface ServerConfig {
url: string;
weight?: number;
timeout?: number;
}
export interface HttpRequest {
method: string;
path: string;
headers: Record<string, string>;
body?: any;
}
export interface HttpResponse {
statusCode: number;
headers: Record<string, string>;
body: any;
ok: boolean;
}
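// Putting the load-balancing pieces together - a sketch with placeholder
// hostnames and weights:
const balancer = new LoadBalancer(
  [
    { url: "http://10.0.1.10:8080", weight: 2, timeout: 10000 },
    { url: "http://10.0.1.11:8080", weight: 1, timeout: 10000 },
    { url: "http://10.0.1.12:8080", weight: 1, timeout: 10000 },
  ],
  "least_connections"
);
// Every request flows through routeRequest, which picks the least-loaded
// healthy server, retries on another server if the call fails, and feeds
// response times back into future balancing decisions.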
Database Scaling Strategies: Handling Data at Scale
The Problem: Single Database Bottleneck
// The monolithic database approach that kills performance
class MonolithicUserService {
private database: Database;
constructor() {
// Single database handling everything - RED FLAG #1
this.database = new Database({
host: "db-primary.internal",
maxConnections: 50, // Will be exhausted under load
connectionTimeout: 30000,
});
}
async getUserProfile(userId: string): Promise<UserProfile> {
// Heavy read query on primary database - RED FLAG #2
// Blocks write operations while executing
const result = await this.database.query(
`
SELECT u.*,
p.avatar_url, p.bio,
COUNT(DISTINCT f.follower_id) as follower_count,
COUNT(DISTINCT following.following_id) as following_count,
COUNT(DISTINCT posts.id) as post_count,
AVG(post_ratings.rating) as avg_post_rating,
MAX(posts.created_at) as last_post_date
FROM users u
LEFT JOIN profiles p ON u.id = p.user_id
LEFT JOIN follows f ON u.id = f.following_id
LEFT JOIN follows following ON u.id = following.follower_id
LEFT JOIN posts ON u.id = posts.user_id
LEFT JOIN post_ratings ON posts.id = post_ratings.post_id
WHERE u.id = ?
GROUP BY u.id, p.id
`,
[userId]
);
// This query gets slower as data grows - O(n) complexity on user data
return this.mapToUserProfile(result[0]);
}
async getUserFeed(userId: string, page: number = 1): Promise<Post[]> {
// Complex analytical query on transactional database - RED FLAG #3
// Joins across multiple large tables during peak hours
const posts = await this.database.query(
`
SELECT p.*,
u.name as author_name,
u.avatar_url as author_avatar,
COUNT(DISTINCT likes.id) as like_count,
COUNT(DISTINCT comments.id) as comment_count,
COUNT(DISTINCT shares.id) as share_count,
EXISTS(
SELECT 1 FROM likes l2
WHERE l2.post_id = p.id AND l2.user_id = ?
) as user_liked
FROM posts p
JOIN users u ON p.user_id = u.id
JOIN follows f ON p.user_id = f.following_id AND f.follower_id = ?
LEFT JOIN likes ON p.id = likes.post_id
LEFT JOIN comments ON p.id = comments.post_id
LEFT JOIN shares ON p.id = shares.post_id
WHERE p.created_at > DATE_SUB(NOW(), INTERVAL 7 DAY)
GROUP BY p.id
ORDER BY (
(COUNT(DISTINCT likes.id) * 1.0) +
(COUNT(DISTINCT comments.id) * 2.0) +
(COUNT(DISTINCT shares.id) * 3.0) +
(1 / (TIMESTAMPDIFF(HOUR, p.created_at, NOW()) + 1))
) DESC
LIMIT ?, 20
`,
[userId, userId, (page - 1) * 20]
);
// This query:
// 1. Scans millions of posts
// 2. Joins across 6 tables
// 3. Calculates complex scoring algorithm
// 4. Runs on the same database handling user registrations
return posts.map((row) => this.mapToPost(row));
}
async createUser(userData: CreateUserRequest): Promise<User> {
// Write operation competing with heavy read queries - RED FLAG #4
// May timeout waiting for table locks from analytics queries
const transaction = await this.database.beginTransaction();
try {
// Main user record
const user = await transaction.query(
`
INSERT INTO users (email, name, password_hash, created_at)
VALUES (?, ?, ?, NOW())
RETURNING *
`,
[userData.email, userData.name, userData.passwordHash]
);
// Profile record
await transaction.query(
`
INSERT INTO profiles (user_id, bio, avatar_url)
VALUES (?, ?, ?)
`,
[user.id, userData.bio || "", userData.avatarUrl || ""]
);
// Default settings
await transaction.query(
`
INSERT INTO user_settings (user_id, email_notifications, privacy_level)
VALUES (?, true, 'public')
`,
[user.id]
);
await transaction.commit();
// Expensive operation on primary database - RED FLAG #5
await this.updateUserStats(user.id);
return user;
} catch (error) {
await transaction.rollback();
throw error;
}
}
async updateUserStats(userId: string): Promise<void> {
// Analytical operation on transactional database - RED FLAG #6
await this.database.query(
`
UPDATE user_stats SET
post_count = (SELECT COUNT(*) FROM posts WHERE user_id = ?),
follower_count = (SELECT COUNT(*) FROM follows WHERE following_id = ?),
following_count = (SELECT COUNT(*) FROM follows WHERE follower_id = ?),
total_likes = (
SELECT COUNT(*) FROM likes l
JOIN posts p ON l.post_id = p.id
WHERE p.user_id = ?
),
last_updated = NOW()
WHERE user_id = ?
`,
[userId, userId, userId, userId, userId]
);
}
// What happens under load:
// 1. Read queries lock tables for seconds
// 2. Write operations queue up waiting for locks
// 3. Connection pool exhaustion causes timeouts
// 4. Database CPU hits 100% from complex queries
// 5. Everything slows down exponentially
}
The Solution: Database Scaling Architecture
// Distributed database architecture with read/write separation
import { v4 as uuidv4 } from "uuid";

export class ScalableDatabaseService {
constructor(
private writeDatabase: PrimaryDatabase,
private readReplicas: ReadReplicaPool,
private analyticsDatabase: AnalyticsDatabase,
private cache: DistributedCache,
private searchEngine: SearchEngine,
private shardManager: ShardManager
) {}
async getUserProfile(userId: string): Promise<UserProfile | null> {
const cacheKey = `user_profile_${userId}`;
// Try cache first - sub-millisecond response
let profile = await this.cache.get(cacheKey);
if (profile) {
return profile;
}
// Get basic user data from read replica
const readReplica = this.readReplicas.getHealthyReplica();
const [userBasic, userStats] = await Promise.all([
// Basic user info from optimized read replica
readReplica.query(
`
SELECT u.id, u.email, u.name, u.created_at,
p.avatar_url, p.bio
FROM users u
LEFT JOIN profiles p ON u.id = p.user_id
WHERE u.id = ?
`,
[userId]
),
// Pre-computed stats from dedicated table
readReplica.query(
`
SELECT follower_count, following_count, post_count, avg_rating
FROM user_stats
WHERE user_id = ?
`,
[userId]
),
]);
if (!userBasic[0]) {
return null;
}
profile = new UserProfile(userBasic[0], userStats[0] || new UserStats());
// Cache for 5 minutes
await this.cache.setEx(cacheKey, 300, profile);
return profile;
}
async getUserFeed(userId: string, page: number = 1): Promise<FeedResponse> {
const cacheKey = `user_feed_${userId}_${page}`;
// Check cache first
let feed = await this.cache.get(cacheKey);
if (feed) {
return feed;
}
// Use search engine for complex feed queries
// This offloads analytical work from transactional database
const searchResults = await this.searchEngine.search({
query: {
bool: {
must: [
{ term: { followers: userId } },
{ range: { created_at: { gte: "now-7d" } } },
],
},
},
sort: [
{
engagement_score: { order: "desc" },
created_at: { order: "desc" },
},
],
from: (page - 1) * 20,
size: 20,
});
// Get additional data from read replica if needed
    const postIds = searchResults.hits.map((hit) => hit._source.id);
    if (postIds.length === 0) {
      // Guard against an invalid empty IN () clause when search finds nothing
      return new FeedResponse([], 0);
    }
    const readReplica = this.readReplicas.getHealthyReplica();
const postsWithDetails = await readReplica.query(
`
SELECT p.*, u.name as author_name, u.avatar_url as author_avatar
FROM posts p
JOIN users u ON p.user_id = u.id
WHERE p.id IN (${postIds.map(() => "?").join(",")})
`,
postIds
);
feed = new FeedResponse(postsWithDetails, searchResults.total);
// Cache for 2 minutes (feeds change frequently)
await this.cache.setEx(cacheKey, 120, feed);
return feed;
}
async createUser(userData: CreateUserRequest): Promise<User> {
// Determine shard based on user ID
const userId = uuidv4();
const shard = this.shardManager.getShardForUser(userId);
// Write to the user's shard (computed above) instead of a single primary
    const user = await shard.transaction(async (tx) => {
// Main user record
const user = await tx.query(
`
INSERT INTO users (id, email, name, password_hash, created_at)
VALUES (?, ?, ?, ?, NOW())
RETURNING *
`,
[userId, userData.email, userData.name, userData.passwordHash]
);
// Profile record
await tx.query(
`
INSERT INTO profiles (user_id, bio, avatar_url)
VALUES (?, ?, ?)
`,
[userId, userData.bio || "", userData.avatarUrl || ""]
);
return user;
});
// Initialize stats in the analytics store (kept off the transactional path)
await this.initializeUserStats(userId);
// Invalidate relevant caches
await this.invalidateUserCaches(userId);
return user;
}
private async initializeUserStats(userId: string): Promise<void> {
// Write to analytics database (separate from transactional database)
await this.analyticsDatabase.query(
`
INSERT INTO user_stats (user_id, post_count, follower_count, following_count, total_likes, last_updated)
VALUES (?, 0, 0, 0, 0, NOW())
`,
[userId]
);
}
async updateUserStats(userId: string): Promise<void> {
// Update analytics database asynchronously
// This doesn't block transactional operations
await this.analyticsDatabase.query(
`
WITH user_metrics AS (
SELECT
? as user_id,
COALESCE(post_count.cnt, 0) as posts,
COALESCE(follower_count.cnt, 0) as followers,
COALESCE(following_count.cnt, 0) as following,
COALESCE(like_count.cnt, 0) as likes
FROM (SELECT 1) dummy
LEFT JOIN (
SELECT COUNT(*) as cnt FROM posts WHERE user_id = ?
) post_count ON true
LEFT JOIN (
SELECT COUNT(*) as cnt FROM follows WHERE following_id = ?
) follower_count ON true
LEFT JOIN (
SELECT COUNT(*) as cnt FROM follows WHERE follower_id = ?
) following_count ON true
LEFT JOIN (
SELECT COUNT(*) as cnt FROM likes l
JOIN posts p ON l.post_id = p.id
WHERE p.user_id = ?
) like_count ON true
)
UPDATE user_stats
SET
post_count = user_metrics.posts,
follower_count = user_metrics.followers,
following_count = user_metrics.following,
total_likes = user_metrics.likes,
last_updated = NOW()
FROM user_metrics
WHERE user_stats.user_id = user_metrics.user_id
`,
[userId, userId, userId, userId, userId]
);
// Invalidate user profile cache
await this.cache.del(`user_profile_${userId}`);
}
private async invalidateUserCaches(userId: string): Promise<void> {
    const cacheKeys = [
      `user_profile_${userId}`,
      // Glob pattern - assumes the cache client expands it; with raw Redis
      // this needs SCAN-based pattern deletion
      `user_feed_${userId}_*`,
    ];
    await Promise.all(cacheKeys.map((key) => this.cache.del(key)));
  }
}
}
// Read replica pool with intelligent routing
export class ReadReplicaPool {
private replicas: ReadReplica[] = [];
private healthChecker: DatabaseHealthChecker;
constructor(replicaConfigs: DatabaseConfig[]) {
this.replicas = replicaConfigs.map((config) => new ReadReplica(config));
this.healthChecker = new DatabaseHealthChecker(this.replicas);
this.healthChecker.startChecking(30000); // Check every 30 seconds
}
getHealthyReplica(): ReadReplica {
const healthyReplicas = this.replicas.filter((replica) =>
replica.isHealthy()
);
if (healthyReplicas.length === 0) {
throw new Error("No healthy read replicas available");
}
// Route based on current load
return healthyReplicas.reduce((best, current) => {
return current.getCurrentLoad() < best.getCurrentLoad() ? current : best;
});
}
getAllHealthyReplicas(): ReadReplica[] {
return this.replicas.filter((replica) => replica.isHealthy());
}
async executeParallelQuery(query: string, params: any[]): Promise<any[]> {
const healthyReplicas = this.getAllHealthyReplicas();
if (healthyReplicas.length === 0) {
throw new Error("No healthy replicas for parallel query execution");
}
// Execute on multiple replicas and return the first successful result
    // (Promise.any ignores rejections until all fail; Promise.race would
    // surface the first failure even when another replica succeeds)
    return Promise.any(
      healthyReplicas.map((replica) => replica.query(query, params))
    );
}
}
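// One caveat the pool above doesn't address: replication lag. A user who
// just wrote data may not find it on a replica yet. A common fix is to pin
// readers to the primary for a short window after their own writes - a
// sketch, assuming the primary can be queried through the same interface:
export class LagAwareReadRouter {
  private lastWriteTimes = new Map<string, number>();

  constructor(
    private primary: ReadReplica, // primary connection, used sparingly
    private replicaPool: ReadReplicaPool,
    private pinWindowMs: number = 2000
  ) {}

  recordWrite(userId: string): void {
    this.lastWriteTimes.set(userId, Date.now());
  }

  // Route to the primary shortly after a user's write, otherwise to replicas
  getReaderFor(userId: string): ReadReplica {
    const lastWrite = this.lastWriteTimes.get(userId) ?? 0;
    return Date.now() - lastWrite < this.pinWindowMs
      ? this.primary
      : this.replicaPool.getHealthyReplica();
  }
}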
// Database sharding for horizontal scaling
export class ShardManager {
private shards: DatabaseShard[] = [];
private shardingStrategy: ShardingStrategy;
constructor(
shardConfigs: ShardConfig[],
strategy: "hash" | "range" | "directory" = "hash"
) {
this.shards = shardConfigs.map((config) => new DatabaseShard(config));
this.shardingStrategy = this.createShardingStrategy(strategy);
}
getShardForUser(userId: string): DatabaseShard {
return this.shardingStrategy.getShardForKey(userId, this.shards);
}
getShardForData(dataType: string, key: string): DatabaseShard {
return this.shardingStrategy.getShardForKey(
`${dataType}_${key}`,
this.shards
);
}
async executeAcrossAllShards(query: string, params: any[]): Promise<any[][]> {
// Execute query on all shards in parallel
const results = await Promise.all(
this.shards.map((shard) => shard.query(query, params))
);
return results;
}
async aggregateFromAllShards<T>(
query: string,
params: any[],
aggregator: (results: any[][]) => T
): Promise<T> {
const results = await this.executeAcrossAllShards(query, params);
return aggregator(results);
}
private createShardingStrategy(type: string): ShardingStrategy {
switch (type) {
case "hash":
return new HashShardingStrategy();
case "range":
return new RangeShardingStrategy();
case "directory":
return new DirectoryShardingStrategy();
default:
throw new Error(`Unknown sharding strategy: ${type}`);
}
}
}
// Hash-based sharding strategy
export class HashShardingStrategy implements ShardingStrategy {
getShardForKey(key: string, shards: DatabaseShard[]): DatabaseShard {
// Simple modulo hashing for even distribution. Note this is NOT true
    // consistent hashing - adding or removing a shard remaps most keys
    // (see the ring-based sketch below for the consistent variant)
const hash = this.hashFunction(key);
const shardIndex = hash % shards.length;
return shards[shardIndex];
}
private hashFunction(key: string): number {
let hash = 0;
for (let i = 0; i < key.length; i++) {
const char = key.charCodeAt(i);
hash = (hash << 5) - hash + char;
hash = hash & hash; // Convert to 32-bit integer
}
return Math.abs(hash);
}
}
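// For comparison, a minimal consistent-hash ring (the variant the comment
// above alludes to): each shard is hashed onto a circle at several virtual
// points, and a key maps to the first shard clockwise from its hash. Adding
// or removing a shard then only remaps keys between two ring points instead
// of nearly all of them.
export class ConsistentHashShardingStrategy implements ShardingStrategy {
  private ring: { point: number; shard: DatabaseShard }[] = [];

  constructor(shards: DatabaseShard[], virtualNodes: number = 100) {
    for (const shard of shards) {
      for (let v = 0; v < virtualNodes; v++) {
        this.ring.push({
          point: this.hash(`${shard.getShardId()}#${v}`),
          shard,
        });
      }
    }
    this.ring.sort((a, b) => a.point - b.point);
  }

  getShardForKey(key: string, _shards: DatabaseShard[]): DatabaseShard {
    const h = this.hash(key);
    // First ring point at or past the key's hash, wrapping around to zero
    const entry = this.ring.find((e) => e.point >= h) ?? this.ring[0];
    return entry.shard;
  }

  private hash(key: string): number {
    let hash = 0;
    for (let i = 0; i < key.length; i++) {
      hash = ((hash << 5) - hash + key.charCodeAt(i)) | 0;
    }
    return Math.abs(hash);
  }
}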
// Database shard with connection pooling
export class DatabaseShard {
private config: ShardConfig;
private connectionPool: ConnectionPool;
private stats: ShardStats;
constructor(config: ShardConfig) {
this.config = config;
this.connectionPool = new ConnectionPool({
...config.database,
minConnections: 5,
maxConnections: 20,
acquireTimeoutMillis: 30000,
idleTimeoutMillis: 300000,
});
this.stats = new ShardStats();
}
async query(sql: string, params: any[] = []): Promise<any[]> {
const startTime = Date.now();
try {
const connection = await this.connectionPool.acquire();
const result = await connection.query(sql, params);
await this.connectionPool.release(connection);
const duration = Date.now() - startTime;
this.stats.recordQuery(duration, true);
return result;
} catch (error) {
const duration = Date.now() - startTime;
this.stats.recordQuery(duration, false);
throw error;
}
}
async transaction<T>(callback: (tx: Transaction) => Promise<T>): Promise<T> {
const connection = await this.connectionPool.acquire();
const transaction = await connection.beginTransaction();
try {
const result = await callback(transaction);
await transaction.commit();
return result;
} catch (error) {
await transaction.rollback();
throw error;
} finally {
await this.connectionPool.release(connection);
}
}
getStats(): ShardStats {
return this.stats;
}
getShardId(): string {
return this.config.id;
}
}
// Supporting interfaces and classes
export interface ShardingStrategy {
getShardForKey(key: string, shards: DatabaseShard[]): DatabaseShard;
}
export class ShardStats {
private queryCount = 0;
private errorCount = 0;
private totalQueryTime = 0;
recordQuery(duration: number, success: boolean): void {
this.queryCount++;
this.totalQueryTime += duration;
if (!success) {
this.errorCount++;
}
}
getAverageQueryTime(): number {
return this.queryCount > 0 ? this.totalQueryTime / this.queryCount : 0;
}
getErrorRate(): number {
return this.queryCount > 0 ? this.errorCount / this.queryCount : 0;
}
}
export interface ShardConfig {
id: string;
database: DatabaseConfig;
weight: number;
}
export interface DatabaseConfig {
host: string;
port: number;
database: string;
username: string;
password: string;
ssl?: boolean;
}
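// A short sketch tying the sharding pieces together (table names and the
// aggregation are illustrative): single-user lookups route to one shard,
// while global questions fan out to every shard and combine the results.
async function shardUsageExample(shardManager: ShardManager, userId: string) {
  // Single-shard query, routed by user ID
  const shard = shardManager.getShardForUser(userId);
  const orders = await shard.query(
    "SELECT * FROM orders WHERE user_id = ?",
    [userId]
  );

  // Cross-shard aggregation: run everywhere, then reduce
  const totalUsers = await shardManager.aggregateFromAllShards(
    "SELECT COUNT(*) as count FROM users",
    [],
    (results) => results.reduce((sum, rows) => sum + rows[0].count, 0)
  );

  return { orders, totalUsers };
}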
Strategic Caching: Performance at Internet Scale
The Problem: Cache Misses and Cache Stampedes
// The naive caching approach that creates more problems than it solves
class NaiveCacheService {
private cache = new Map<string, any>();
private database: Database;
constructor(database: Database) {
this.database = database;
}
async getUser(userId: string): Promise<User> {
const cacheKey = `user_${userId}`;
// Simple cache check - RED FLAG #1
if (this.cache.has(cacheKey)) {
return this.cache.get(cacheKey);
}
// Cache miss - everyone hits the database - RED FLAG #2
const user = await this.database.query("SELECT * FROM users WHERE id = ?", [
userId,
]);
// No cache invalidation strategy - RED FLAG #3
this.cache.set(cacheKey, user);
return user;
}
async getPopularPosts(): Promise<Post[]> {
const cacheKey = "popular_posts";
// Check cache
if (this.cache.has(cacheKey)) {
return this.cache.get(cacheKey);
}
// Expensive query that everyone will run simultaneously - RED FLAG #4
const posts = await this.database.query(`
SELECT p.*, COUNT(l.id) as like_count
FROM posts p
LEFT JOIN likes l ON p.id = l.post_id
WHERE p.created_at > DATE_SUB(NOW(), INTERVAL 24 HOUR)
GROUP BY p.id
ORDER BY like_count DESC, p.created_at DESC
LIMIT 100
`);
// Cache with no expiration - RED FLAG #5
this.cache.set(cacheKey, posts);
return posts;
}
// What happens under load:
// 1. Cache miss on popular_posts at 12:00 AM
// 2. 10,000 concurrent users all run the expensive query
// 3. Database gets hammered with identical queries
// 4. Response times go through the roof
// 5. Users get stale data because cache never expires
// 6. Memory usage grows unbounded
}
The Solution: Multi-Layer Caching with Smart Invalidation
// Enterprise-grade caching architecture
export class DistributedCacheService {
constructor(
private l1Cache: InMemoryCache, // Fastest: local memory cache
private l2Cache: RedisCache, // Fast: distributed cache
private l3Cache: CDNCache, // Network edge cache
private database: Database,
private eventBus: EventBus
) {
this.setupCacheInvalidation();
}
async get<T>(
key: string,
fetchFunction: () => Promise<T>,
options: CacheOptions = {}
): Promise<T> {
const {
ttl = 300,
l1Ttl = 60,
preventStampede = true,
warmUpCache = false,
} = options;
// Level 1: Memory cache (fastest)
let result = await this.l1Cache.get(key);
if (result !== null) {
// Asynchronous cache refresh if near expiration
if (warmUpCache && (await this.isNearExpiration(key, l1Ttl * 0.8))) {
this.refreshCacheAsync(key, fetchFunction, options);
}
return result;
}
// Level 2: Distributed cache
result = await this.l2Cache.get(key);
if (result !== null) {
// Store in L1 for faster future access
await this.l1Cache.setEx(key, l1Ttl, result);
return result;
}
// Level 3: CDN cache (for static/semi-static data)
if (await this.shouldUseCDN(key)) {
result = await this.l3Cache.get(key);
if (result !== null) {
// Store in L2 and L1
await Promise.all([
this.l2Cache.setEx(key, ttl, result),
this.l1Cache.setEx(key, l1Ttl, result),
]);
return result;
}
}
// Cache miss - prevent stampede
if (preventStampede) {
return await this.getWithStampedeProtection(key, fetchFunction, options);
} else {
return await this.populateCache(key, fetchFunction, options);
}
}
private async getWithStampedeProtection<T>(
key: string,
fetchFunction: () => Promise<T>,
options: CacheOptions
): Promise<T> {
const lockKey = `lock_${key}`;
const lockTtl = 30; // 30 seconds max lock time
// Try to acquire lock
const acquired = await this.l2Cache.setNx(lockKey, "locked", lockTtl);
if (acquired) {
try {
// We got the lock - fetch and populate cache
const result = await this.populateCache(key, fetchFunction, options);
return result;
} finally {
// Release lock
await this.l2Cache.del(lockKey);
}
} else {
// Someone else is fetching - wait and retry
return await this.waitAndRetry(key, fetchFunction, options, 5);
}
}
private async waitAndRetry<T>(
key: string,
fetchFunction: () => Promise<T>,
options: CacheOptions,
maxAttempts: number
): Promise<T> {
for (let attempt = 1; attempt <= maxAttempts; attempt++) {
// Wait with exponential backoff
await this.sleep(Math.pow(2, attempt - 1) * 100);
// Check if cache is now populated
const result = await this.l2Cache.get(key);
if (result !== null) {
// Cache it locally and return
await this.l1Cache.setEx(key, options.l1Ttl || 60, result);
return result;
}
}
// If still no result, fetch directly (last resort)
return await fetchFunction();
}
private async populateCache<T>(
key: string,
fetchFunction: () => Promise<T>,
options: CacheOptions
): Promise<T> {
const { ttl = 300, l1Ttl = 60 } = options;
try {
const result = await fetchFunction();
// Store in all cache layers, plus a longer-lived stale copy that backs
      // the graceful-degradation path in the catch block below
      await Promise.all([
        this.l2Cache.setEx(key, ttl, result),
        this.l1Cache.setEx(key, l1Ttl, result),
        this.l2Cache.setEx(`stale_${key}`, ttl * 10, result),
      ]);
// Store in CDN for appropriate data
if (await this.shouldUseCDN(key)) {
await this.l3Cache.setEx(key, ttl * 2, result); // Longer TTL for CDN
}
return result;
} catch (error) {
// On fetch failure, try to return stale data if available
const staleResult = await this.getStaleData(key);
if (staleResult !== null) {
console.warn(
`Returning stale data for key ${key} due to fetch error:`,
error
);
return staleResult;
}
throw error;
}
}
async invalidate(key: string | string[]): Promise<void> {
const keys = Array.isArray(key) ? key : [key];
await Promise.all([
// Invalidate from all cache layers
this.l1Cache.del(...keys),
this.l2Cache.del(...keys),
this.l3Cache.del(...keys),
// Publish invalidation event for other nodes
this.eventBus.publish("cache.invalidation", {
keys,
timestamp: new Date(),
nodeId: process.env.NODE_ID,
}),
]);
}
async invalidatePattern(pattern: string): Promise<void> {
// Find all keys matching pattern in distributed cache
const keys = await this.l2Cache.keys(pattern);
if (keys.length > 0) {
await this.invalidate(keys);
}
}
async warmUp(warmUpSpecs: WarmUpSpec[]): Promise<void> {
console.log("Starting cache warm-up process...");
// Execute warm-up operations in parallel batches
const batchSize = 10;
for (let i = 0; i < warmUpSpecs.length; i += batchSize) {
const batch = warmUpSpecs.slice(i, i + batchSize);
await Promise.all(
batch.map(async (spec) => {
try {
await this.get(spec.key, spec.fetchFunction, spec.options);
console.log(`Warmed up cache for key: ${spec.key}`);
} catch (error) {
console.error(
`Failed to warm up cache for key ${spec.key}:`,
error
);
}
})
);
}
console.log("Cache warm-up complete");
}
private async refreshCacheAsync<T>(
key: string,
fetchFunction: () => Promise<T>,
options: CacheOptions
): Promise<void> {
// Fire and forget - don't block the current request
setImmediate(async () => {
try {
await this.populateCache(key, fetchFunction, options);
} catch (error) {
console.warn(`Async cache refresh failed for key ${key}:`, error);
}
});
}
private async isNearExpiration(
key: string,
threshold: number
): Promise<boolean> {
const ttl = await this.l1Cache.ttl(key);
return ttl !== null && ttl < threshold;
}
private async shouldUseCDN(key: string): Promise<boolean> {
// Use CDN for relatively static data
return (
key.includes("user_profile") ||
key.includes("popular_posts") ||
key.includes("public_content")
);
}
private async getStaleData(key: string): Promise<any> {
// Try to get data from a stale cache or backup
const staleKey = `stale_${key}`;
return await this.l2Cache.get(staleKey);
}
private setupCacheInvalidation(): void {
// Listen for invalidation events from other nodes
this.eventBus.subscribe("cache.invalidation", (event) => {
if (event.nodeId !== process.env.NODE_ID) {
// Invalidate local cache when other nodes invalidate
this.l1Cache.del(...event.keys);
}
});
// Listen for data change events
this.eventBus.subscribe("user.updated", (event) => {
this.invalidatePattern(`user_${event.userId}*`);
});
this.eventBus.subscribe("post.created", (event) => {
this.invalidate(["popular_posts", `user_posts_${event.userId}`]);
});
}
private sleep(ms: number): Promise<void> {
return new Promise((resolve) => setTimeout(resolve, ms));
}
getStats(): CacheStats {
return {
l1Stats: this.l1Cache.getStats(),
l2Stats: this.l2Cache.getStats(),
l3Stats: this.l3Cache.getStats(),
};
}
}
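// Using the cache service - a sketch (the key, table, and TTLs are
// illustrative). Callers never talk to a specific layer; they supply a key,
// a loader to run on a miss, and TTL policy, and the service handles
// L1/L2/CDN placement plus stampede protection:
async function getProductExample(
  cache: DistributedCacheService,
  db: Database,
  productId: string
) {
  return cache.get(
    `product_${productId}`,
    () => db.query("SELECT * FROM products WHERE id = ?", [productId]),
    { ttl: 600, l1Ttl: 60, preventStampede: true, warmUpCache: true }
  );
}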
// Intelligent cache warming service
export class CacheWarmingService {
constructor(
private cacheService: DistributedCacheService,
private database: Database,
private scheduler: TaskScheduler
) {
this.setupScheduledWarming();
}
async warmCriticalData(): Promise<void> {
const warmUpSpecs: WarmUpSpec[] = [
// Popular posts (accessed frequently)
{
key: "popular_posts",
fetchFunction: () => this.fetchPopularPosts(),
options: { ttl: 300, l1Ttl: 60, preventStampede: true },
},
// Top users (for leaderboards)
{
key: "top_users",
fetchFunction: () => this.fetchTopUsers(),
options: { ttl: 600, l1Ttl: 120 },
},
// Site statistics (for dashboards)
{
key: "site_stats",
fetchFunction: () => this.fetchSiteStats(),
options: { ttl: 900, l1Ttl: 300 },
},
];
await this.cacheService.warmUp(warmUpSpecs);
}
async warmUserSpecificData(userId: string): Promise<void> {
// Warm up data specific to a user (e.g., after login)
const userWarmUpSpecs: WarmUpSpec[] = [
{
key: `user_profile_${userId}`,
fetchFunction: () => this.fetchUserProfile(userId),
options: { ttl: 600, l1Ttl: 120 },
},
{
key: `user_feed_${userId}_1`,
fetchFunction: () => this.fetchUserFeed(userId, 1),
options: { ttl: 300, l1Ttl: 60 },
},
{
key: `user_notifications_${userId}`,
fetchFunction: () => this.fetchUserNotifications(userId),
options: { ttl: 180, l1Ttl: 30 },
},
];
await this.cacheService.warmUp(userWarmUpSpecs);
}
private async fetchPopularPosts(): Promise<Post[]> {
// DISTINCT prevents the likes x comments join fan-out from inflating counts
    return this.database.query(`
      SELECT p.*, u.name as author_name,
             COUNT(DISTINCT l.id) as like_count,
             COUNT(DISTINCT c.id) as comment_count
      FROM posts p
      JOIN users u ON p.user_id = u.id
      LEFT JOIN likes l ON p.id = l.post_id
      LEFT JOIN comments c ON p.id = c.post_id
      WHERE p.created_at > DATE_SUB(NOW(), INTERVAL 24 HOUR)
      GROUP BY p.id
      ORDER BY (COUNT(DISTINCT l.id) + COUNT(DISTINCT c.id) * 2) DESC
      LIMIT 50
    `);
}
private async fetchTopUsers(): Promise<User[]> {
return this.database.query(`
SELECT u.*, us.follower_count, us.post_count
FROM users u
JOIN user_stats us ON u.id = us.user_id
ORDER BY us.follower_count DESC
LIMIT 20
`);
}
private async fetchSiteStats(): Promise<SiteStats> {
const [userCount, postCount, activeUsers] = await Promise.all([
this.database.query("SELECT COUNT(*) as count FROM users"),
this.database.query("SELECT COUNT(*) as count FROM posts"),
this.database.query(`
SELECT COUNT(DISTINCT user_id) as count
FROM user_sessions
WHERE last_active > DATE_SUB(NOW(), INTERVAL 1 HOUR)
`),
]);
return {
totalUsers: userCount[0].count,
totalPosts: postCount[0].count,
activeUsers: activeUsers[0].count,
};
}
private async fetchUserProfile(userId: string): Promise<UserProfile> {
// Implementation for fetching user profile
return this.database.query(
`
SELECT u.*, p.*, us.*
FROM users u
LEFT JOIN profiles p ON u.id = p.user_id
LEFT JOIN user_stats us ON u.id = us.user_id
WHERE u.id = ?
`,
[userId]
);
}
private async fetchUserFeed(userId: string, page: number): Promise<Post[]> {
// Implementation for fetching user feed
return this.database.query(
`
SELECT p.*, u.name as author_name
FROM posts p
JOIN users u ON p.user_id = u.id
JOIN follows f ON p.user_id = f.following_id
WHERE f.follower_id = ?
ORDER BY p.created_at DESC
LIMIT ?, 20
`,
[userId, (page - 1) * 20]
);
}
private async fetchUserNotifications(
userId: string
): Promise<Notification[]> {
return this.database.query(
`
SELECT * FROM notifications
WHERE user_id = ? AND read_at IS NULL
ORDER BY created_at DESC
LIMIT 50
`,
[userId]
);
}
private setupScheduledWarming(): void {
// Warm critical data every 5 minutes
this.scheduler.schedule("warm-critical-data", "*/5 * * * *", () => {
this.warmCriticalData();
});
// Warm popular user data hourly
this.scheduler.schedule("warm-popular-users", "0 * * * *", async () => {
const popularUsers = await this.fetchTopUsers();
for (const user of popularUsers.slice(0, 10)) {
await this.warmUserSpecificData(user.id);
}
});
}
}
// Supporting types and interfaces
export interface CacheOptions {
ttl?: number; // Time to live in seconds
l1Ttl?: number; // L1 cache TTL
preventStampede?: boolean;
warmUpCache?: boolean;
}
export interface WarmUpSpec {
key: string;
fetchFunction: () => Promise<any>;
options: CacheOptions;
}
export interface CacheStats {
l1Stats: {
hits: number;
misses: number;
hitRate: number;
size: number;
};
l2Stats: {
hits: number;
misses: number;
hitRate: number;
connections: number;
};
l3Stats: {
hits: number;
misses: number;
hitRate: number;
bandwidth: number;
};
}
export interface SiteStats {
totalUsers: number;
totalPosts: number;
activeUsers: number;
}
Content Delivery Networks: Global Performance at Scale
The Problem: Single Origin Server Bottleneck
// The single-origin nightmare that kills global performance
class SingleOriginServer {
constructor(private staticFileServer: StaticFileServer) {}
async serveAsset(
assetPath: string,
userLocation: string
): Promise<AssetResponse> {
// All requests come to origin server - RED FLAG #1
// User in Tokyo requests file from server in Virginia = 200ms+ latency
const startTime = Date.now();
try {
// No geographic distribution - RED FLAG #2
const asset = await this.staticFileServer.readFile(assetPath);
// No optimization for different device types - RED FLAG #3
// Serving 4K images to mobile users on 3G connections
// No compression - RED FLAG #4
// Sending uncompressed assets across the globe
const responseTime = Date.now() - startTime;
return new AssetResponse(asset, responseTime, userLocation);
} catch (error) {
// Single point of failure - RED FLAG #5
// If this server goes down, entire site becomes unusable
throw new AssetUnavailableError(`Asset ${assetPath} unavailable`);
}
}
async serveImage(
imagePath: string,
userAgent: string
): Promise<ImageResponse> {
// No device-specific optimization - RED FLAG #6
const image = await this.staticFileServer.readFile(imagePath);
// Serving original 10MB images to mobile users - RED FLAG #7
if (userAgent.includes("Mobile")) {
// Should resize/optimize but doesn't
console.log("Serving full-size image to mobile user - oops!");
}
return new ImageResponse(image, imagePath);
}
// Performance metrics from this architecture:
// - Tokyo users: 2500ms average load time
// - Sydney users: 3200ms average load time
// - London users: 1200ms average load time
// - Mobile users: 15+ second page loads on 3G
// - Server bandwidth costs: $50,000/month
// - User bounce rate: 67% (industry average: 32%)
}
The Solution: Global CDN with Intelligent Optimization
// Enterprise CDN architecture with edge optimization
export class GlobalCDNService {
constructor(
private originServer: OriginServer,
private edgeNodes: Map<string, EdgeNode>,
private geoLocationService: GeoLocationService,
private imageOptimizer: ImageOptimizer,
private analyticsService: CDNAnalyticsService
) {}
async serveAsset(
assetPath: string,
clientIP: string,
userAgent: string,
acceptHeader: string
): Promise<OptimizedAssetResponse> {
// Determine optimal edge node based on user location
const userLocation = await this.geoLocationService.getLocation(clientIP);
const edgeNode = this.selectOptimalEdgeNode(userLocation);
// Generate cache key with optimization parameters
const cacheKey = this.generateCacheKey(assetPath, userAgent, acceptHeader);
try {
// Try to serve from edge cache first
const cachedAsset = await edgeNode.getFromCache(cacheKey);
if (cachedAsset) {
// Cache hit - ultra-fast response from edge
this.analyticsService.recordCacheHit(
edgeNode.nodeId,
assetPath,
userLocation
);
return new OptimizedAssetResponse(
cachedAsset,
"EDGE_HIT",
edgeNode.nodeId
);
}
// Cache miss - fetch and optimize from origin
return await this.fetchOptimizeAndCache(
assetPath,
userAgent,
acceptHeader,
edgeNode,
userLocation
);
} catch (error) {
// Fallback to next best edge node
return await this.handleEdgeFailure(
assetPath,
userAgent,
acceptHeader,
userLocation,
edgeNode
);
}
}
private async fetchOptimizeAndCache(
assetPath: string,
userAgent: string,
acceptHeader: string,
edgeNode: EdgeNode,
userLocation: GeoLocation
): Promise<OptimizedAssetResponse> {
// Fetch original asset from origin
const originalAsset = await this.originServer.getAsset(assetPath);
// Optimize based on client capabilities
const optimizedAsset = await this.optimizeAssetForClient(
originalAsset,
userAgent,
acceptHeader,
userLocation.connectionSpeed
);
// Cache at edge with appropriate TTL
const cacheTTL = this.calculateCacheTTL(assetPath, originalAsset.type);
await edgeNode.cacheAsset(
this.generateCacheKey(assetPath, userAgent, acceptHeader),
optimizedAsset,
cacheTTL
);
this.analyticsService.recordCacheMiss(
edgeNode.nodeId,
assetPath,
userLocation
);
return new OptimizedAssetResponse(
optimizedAsset,
"ORIGIN_OPTIMIZED",
edgeNode.nodeId
);
}
private async optimizeAssetForClient(
asset: Asset,
userAgent: string,
acceptHeader: string,
connectionSpeed: ConnectionSpeed
): Promise<OptimizedAsset> {
const clientCapabilities = this.parseClientCapabilities(
userAgent,
acceptHeader
);
switch (asset.type) {
case AssetType.IMAGE:
return await this.optimizeImage(
asset as ImageAsset,
clientCapabilities,
connectionSpeed
);
case AssetType.VIDEO:
return await this.optimizeVideo(
asset as VideoAsset,
clientCapabilities,
connectionSpeed
);
case AssetType.JAVASCRIPT:
return await this.optimizeJavaScript(
asset as JavaScriptAsset,
clientCapabilities
);
case AssetType.CSS:
return await this.optimizeCSS(asset as CSSAsset, clientCapabilities);
default:
return await this.optimizeGenericAsset(asset, clientCapabilities);
}
}
private async optimizeImage(
image: ImageAsset,
capabilities: ClientCapabilities,
connectionSpeed: ConnectionSpeed
): Promise<OptimizedImageAsset> {
const optimizations: ImageOptimization[] = [];
// Format optimization based on browser support
// Prefer AVIF when available (better compression), then fall back to WebP;
// checking WebP first would make the AVIF branch effectively unreachable,
// since virtually every AVIF-capable browser also supports WebP
let targetFormat = image.format;
if (capabilities.supportsAVIF && image.format !== "avif") {
targetFormat = "avif";
optimizations.push({
type: "format_conversion",
from: image.format,
to: "avif",
});
} else if (capabilities.supportsWebP && image.format !== "webp") {
targetFormat = "webp";
optimizations.push({
type: "format_conversion",
from: image.format,
to: "webp",
});
}
// Resize based on device capabilities
let targetDimensions = image.dimensions;
if (capabilities.deviceType === "mobile") {
// Mobile optimization
const maxWidth = Math.min(
capabilities.screenWidth * capabilities.devicePixelRatio,
800
);
if (image.dimensions.width > maxWidth) {
targetDimensions = {
width: maxWidth,
height: Math.round(
(maxWidth / image.dimensions.width) * image.dimensions.height
),
};
optimizations.push({ type: "resize", targetDimensions });
}
}
// Quality optimization based on connection speed
let quality = image.quality || 85;
switch (connectionSpeed) {
case ConnectionSpeed.SLOW_2G:
case ConnectionSpeed.SLOW_3G:
quality = Math.min(quality, 60);
optimizations.push({
type: "quality_reduction",
targetQuality: quality,
});
break;
case ConnectionSpeed.FAST_3G:
quality = Math.min(quality, 75);
optimizations.push({
type: "quality_reduction",
targetQuality: quality,
});
break;
}
// Apply optimizations
const optimizedImageData = await this.imageOptimizer.optimize(image.data, {
format: targetFormat,
dimensions: targetDimensions,
quality,
progressive: capabilities.supportsProgressiveJPEG,
});
return new OptimizedImageAsset(
optimizedImageData,
targetFormat,
targetDimensions,
quality,
optimizations
);
}
private async optimizeVideo(
video: VideoAsset,
capabilities: ClientCapabilities,
connectionSpeed: ConnectionSpeed
): Promise<OptimizedVideoAsset> {
// Adaptive bitrate streaming based on connection speed
const targetBitrates = this.selectOptimalBitrates(
connectionSpeed,
capabilities
);
// Generate multiple quality versions
const variants = await Promise.all(
targetBitrates.map((bitrate) =>
this.generateVideoVariant(video, bitrate, capabilities)
)
);
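// Bundle the variants into an HLS master playlist so players can switch
// bitrates mid-stream as the viewer's bandwidth changes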
return new OptimizedVideoAsset(
video.data,
variants,
this.generateHLSPlaylist(variants)
);
}
private selectOptimalEdgeNode(userLocation: GeoLocation): EdgeNode {
// Find geographically closest nodes
const candidateNodes = Array.from(this.edgeNodes.values())
.map((node) => ({
node,
distance: this.calculateDistance(userLocation, node.location),
}))
.sort((a, b) => a.distance - b.distance);
// Filter by healthy nodes
const healthyNodes = candidateNodes.filter(({ node }) => node.isHealthy());
if (healthyNodes.length === 0) {
throw new CDNUnavailableError("No healthy edge nodes available");
}
// Consider both distance and current load
return healthyNodes.reduce((best, current) => {
const bestScore = this.calculateNodeScore(best.node, best.distance);
const currentScore = this.calculateNodeScore(
current.node,
current.distance
);
return currentScore > bestScore ? current : best;
}).node;
}
private calculateNodeScore(node: EdgeNode, distance: number): number {
const distanceScore = 1000 / (distance + 1); // Closer is better
const loadScore = (100 - node.getCurrentLoad()) / 100; // Lower load is better
const capacityScore = node.getAvailableCapacity() / node.getMaxCapacity();
// Weights are tunable; proximity dominates, with load and spare capacity as tiebreakers
return distanceScore * 0.4 + loadScore * 0.3 + capacityScore * 0.3;
}
private async handleEdgeFailure(
assetPath: string,
userAgent: string,
acceptHeader: string,
userLocation: GeoLocation,
failedNode: EdgeNode
): Promise<OptimizedAssetResponse> {
// Mark node as unhealthy so it drops out of edge selection
failedNode.markUnhealthy();
try {
// Confirm a healthy alternative exists; selectOptimalEdgeNode throws
// CDNUnavailableError when every edge node is down (it can never
// return the node we just marked unhealthy)
this.selectOptimalEdgeNode(userLocation);
} catch (error) {
// No alternatives - serve directly from origin
const asset = await this.originServer.getAsset(assetPath);
const optimized = await this.optimizeAssetForClient(
asset,
userAgent,
acceptHeader,
userLocation.connectionSpeed
);
return new OptimizedAssetResponse(optimized, "ORIGIN_FALLBACK", "origin");
}
// Retry the full flow - the failed node is no longer a candidate
return await this.serveAsset(
assetPath,
userLocation.ip,
userAgent,
acceptHeader
);
}
async preloadCriticalAssets(criticalAssets: string[]): Promise<void> {
// Preload critical assets to all edge nodes
console.log(
`Preloading ${criticalAssets.length} critical assets to all edge nodes`
);
const preloadPromises = Array.from(this.edgeNodes.values()).map(
async (edgeNode) => {
return Promise.all(
criticalAssets.map(async (assetPath) => {
try {
const asset = await this.originServer.getAsset(assetPath);
await edgeNode.cacheAsset(
assetPath,
asset,
86400 // 24 hours for critical assets
);
console.log(
`Preloaded ${assetPath} to edge node ${edgeNode.nodeId}`
);
} catch (error) {
console.error(
`Failed to preload ${assetPath} to edge node ${edgeNode.nodeId}:`,
error
);
}
})
);
}
);
await Promise.all(preloadPromises);
}
async warmEdgeCache(popularAssets: AssetPopularityData[]): Promise<void> {
// Warm cache based on popularity and geographic distribution
for (const assetData of popularAssets) {
const optimalNodes = this.selectNodesForAsset(assetData);
await Promise.all(
optimalNodes.map(async (node) => {
try {
const asset = await this.originServer.getAsset(assetData.path);
await node.cacheAsset(assetData.path, asset, 3600); // 1 hour
} catch (error) {
console.error(
`Failed to warm cache for ${assetData.path} on node ${node.nodeId}:`,
error
);
}
})
);
}
}
private selectNodesForAsset(assetData: AssetPopularityData): EdgeNode[] {
// Select nodes based on where this asset is most requested
return assetData.popularRegions
.map((region) => this.findNearestNode(region))
.filter((node, index, array) => array.indexOf(node) === index); // Remove duplicates
}
private findNearestNode(region: GeographicRegion): EdgeNode {
return Array.from(this.edgeNodes.values()).reduce((nearest, current) => {
const nearestDistance = this.calculateDistance(
region.center,
nearest.location
);
const currentDistance = this.calculateDistance(
region.center,
current.location
);
return currentDistance < nearestDistance ? current : nearest;
});
}
getPerformanceMetrics(): CDNPerformanceMetrics {
const nodeMetrics = Array.from(this.edgeNodes.values()).map((node) => ({
nodeId: node.nodeId,
location: node.location,
hitRate: node.getCacheHitRate(),
averageResponseTime: node.getAverageResponseTime(),
currentLoad: node.getCurrentLoad(),
bandwidthUsage: node.getBandwidthUsage(),
}));
return new CDNPerformanceMetrics(
nodeMetrics,
this.analyticsService.getGlobalStats()
);
}
private generateCacheKey(
assetPath: string,
userAgent: string,
acceptHeader: string
): string {
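// Vary the key by device class and negotiated formats - the cache-level
// equivalent of an HTTP Vary header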
const deviceType = this.parseDeviceType(userAgent);
const supportedFormats = this.parseSupportedFormats(acceptHeader);
return `${assetPath}_${deviceType}_${supportedFormats.join("-")}`;
}
private parseClientCapabilities(
userAgent: string,
acceptHeader: string
): ClientCapabilities {
return {
deviceType: this.parseDeviceType(userAgent),
screenWidth: this.parseScreenWidth(userAgent),
devicePixelRatio: this.parseDevicePixelRatio(userAgent),
supportsWebP: acceptHeader.includes("image/webp"),
supportsAVIF: acceptHeader.includes("image/avif"),
supportsProgressiveJPEG: true, // Most browsers support this
supportsHTTP2: true, // Assume HTTP/2 support
};
}
private parseDeviceType(userAgent: string): "mobile" | "tablet" | "desktop" {
if (/Mobile|Android|iPhone/.test(userAgent)) return "mobile";
if (/Tablet|iPad/.test(userAgent)) return "tablet";
return "desktop";
}
private parseScreenWidth(userAgent: string): number {
// Simple heuristic - in production, use a proper device detection library
const deviceType = this.parseDeviceType(userAgent);
if (deviceType === "mobile") return 375;
if (deviceType === "tablet") return 768;
return 1920;
}
private parseDevicePixelRatio(userAgent: string): number {
// Default to 1, could be enhanced with actual detection
return 1;
}
private parseSupportedFormats(acceptHeader: string): string[] {
const formats: string[] = [];
if (acceptHeader.includes("image/webp")) formats.push("webp");
if (acceptHeader.includes("image/avif")) formats.push("avif");
if (acceptHeader.includes("image/jpeg")) formats.push("jpeg");
if (acceptHeader.includes("image/png")) formats.push("png");
return formats;
}
private calculateDistance(point1: GeoLocation, point2: GeoLocation): number {
// Haversine formula for geographic distance
const R = 6371; // Earth's radius in kilometers
const dLat = ((point2.latitude - point1.latitude) * Math.PI) / 180;
const dLon = ((point2.longitude - point1.longitude) * Math.PI) / 180;
const a =
Math.sin(dLat / 2) * Math.sin(dLat / 2) +
Math.cos((point1.latitude * Math.PI) / 180) *
Math.cos((point2.latitude * Math.PI) / 180) *
Math.sin(dLon / 2) *
Math.sin(dLon / 2);
const c = 2 * Math.atan2(Math.sqrt(a), Math.sqrt(1 - a));
return R * c;
}
private calculateCacheTTL(assetPath: string, assetType: AssetType): number {
// Intelligent TTL based on asset type and update frequency
if (assetPath.includes("/static/")) return 86400; // 24 hours for static assets
if (assetType === AssetType.IMAGE) return 3600; // 1 hour for images
if (assetType === AssetType.CSS || assetType === AssetType.JAVASCRIPT)
return 7200; // 2 hours for code
return 300; // 5 minutes default
}
}
// Edge node with intelligent caching
export class EdgeNode {
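// In-memory cache bounded by entry count for simplicity; a production edge
// cache would be bounded by bytes and track object sizes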
private cache: Map<string, CachedAsset> = new Map();
private stats: EdgeNodeStats;
private healthy = true;
constructor(
public readonly nodeId: string,
public readonly location: GeoLocation,
private maxCacheSize: number,
private maxCapacity: number
) {
this.stats = new EdgeNodeStats();
}
async getFromCache(key: string): Promise<OptimizedAsset | null> {
const cached = this.cache.get(key);
if (!cached || this.isExpired(cached)) {
if (cached) {
this.cache.delete(key);
}
this.stats.recordCacheMiss();
return null;
}
this.stats.recordCacheHit();
cached.lastAccessed = new Date();
return cached.asset;
}
async cacheAsset(key: string, asset: Asset, ttl: number): Promise<void> {
// Implement LRU eviction if cache is full
if (this.cache.size >= this.maxCacheSize) {
this.evictLeastRecentlyUsed();
}
const cachedAsset = new CachedAsset(
asset,
new Date(),
new Date(Date.now() + ttl * 1000),
new Date()
);
this.cache.set(key, cachedAsset);
}
private evictLeastRecentlyUsed(): void {
let lruKey = "";
let oldestAccess = new Date();
for (const [key, cached] of this.cache.entries()) {
if (cached.lastAccessed < oldestAccess) {
oldestAccess = cached.lastAccessed;
lruKey = key;
}
}
if (lruKey) {
this.cache.delete(lruKey);
}
}
private isExpired(cached: CachedAsset): boolean {
return cached.expiresAt < new Date();
}
isHealthy(): boolean {
return this.healthy && this.getCurrentLoad() < 90;
}
markUnhealthy(): void {
this.healthy = false;
// Automatically mark as healthy again after 5 minutes
setTimeout(() => {
this.healthy = true;
}, 300000);
}
getCurrentLoad(): number {
// Return current load percentage (0-100)
return (this.stats.activeRequests / this.maxCapacity) * 100;
}
getAvailableCapacity(): number {
return this.maxCapacity - this.stats.activeRequests;
}
getMaxCapacity(): number {
return this.maxCapacity;
}
getCacheHitRate(): number {
return this.stats.getCacheHitRate();
}
getAverageResponseTime(): number {
return this.stats.getAverageResponseTime();
}
getBandwidthUsage(): number {
return this.stats.bandwidthUsage;
}
}
// Supporting types and classes
export enum AssetType {
IMAGE = "image",
VIDEO = "video",
JAVASCRIPT = "javascript",
CSS = "css",
HTML = "html",
FONT = "font",
DOCUMENT = "document",
}
export enum ConnectionSpeed {
SLOW_2G = "slow-2g",
SLOW_3G = "slow-3g",
FAST_3G = "fast-3g",
FAST_4G = "fast-4g",
FIBER = "fiber",
}
export interface ClientCapabilities {
deviceType: "mobile" | "tablet" | "desktop";
screenWidth: number;
devicePixelRatio: number;
supportsWebP: boolean;
supportsAVIF: boolean;
supportsProgressiveJPEG: boolean;
supportsHTTP2: boolean;
}
export interface GeoLocation {
ip: string;
latitude: number;
longitude: number;
country: string;
city: string;
connectionSpeed: ConnectionSpeed;
}
export class CachedAsset {
constructor(
public asset: Asset,
public cachedAt: Date,
public expiresAt: Date,
public lastAccessed: Date
) {}
}
export class EdgeNodeStats {
public activeRequests = 0;
public totalRequests = 0;
public cacheHits = 0;
public cacheMisses = 0;
public totalResponseTime = 0;
public bandwidthUsage = 0;
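// Note: activeRequests, totalRequests, totalResponseTime, and bandwidthUsage
// are assumed to be maintained by the node's request-handling path (not shown)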
recordCacheHit(): void {
this.cacheHits++;
}
recordCacheMiss(): void {
this.cacheMisses++;
}
getCacheHitRate(): number {
const total = this.cacheHits + this.cacheMisses;
return total > 0 ? this.cacheHits / total : 0;
}
getAverageResponseTime(): number {
return this.totalRequests > 0
? this.totalResponseTime / this.totalRequests
: 0;
}
}
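To see how the pieces fit together, here's a minimal wiring sketch. The collaborator instances (originServer, geoLocationService, imageOptimizer, analyticsService), the node coordinates, and the asset paths are illustrative stand-ins rather than anything defined in the classes above:
// Minimal wiring sketch - originServer, geoLocationService, imageOptimizer,
// and analyticsService are assumed to be constructed elsewhere
const tokyo: GeoLocation = {
  ip: "",
  latitude: 35.68,
  longitude: 139.69,
  country: "JP",
  city: "Tokyo",
  connectionSpeed: ConnectionSpeed.FIBER,
};
const virginia: GeoLocation = {
  ip: "",
  latitude: 38.95,
  longitude: -77.45,
  country: "US",
  city: "Ashburn",
  connectionSpeed: ConnectionSpeed.FIBER,
};
const edgeNodes = new Map<string, EdgeNode>([
  // nodeId, location, max cached entries, max concurrent requests
  ["tokyo-1", new EdgeNode("tokyo-1", tokyo, 10_000, 5_000)],
  ["virginia-1", new EdgeNode("virginia-1", virginia, 10_000, 5_000)],
]);
const cdn = new GlobalCDNService(
  originServer,
  edgeNodes,
  geoLocationService,
  imageOptimizer,
  analyticsService
);
// Push the must-have assets to every edge before the traffic spike hits
await cdn.preloadCriticalAssets(["/static/app.js", "/static/hero.jpg"]);
// A mobile user in Tokyo gets a format-converted, resized, quality-tuned
// response from the Tokyo edge instead of a full-size original from Virginia
const response = await cdn.serveAsset(
  "/images/product-42.jpg", // hypothetical asset path
  "203.0.113.7", // client IP, geolocated to select the Tokyo node
  "Mozilla/5.0 (iPhone; CPU iPhone OS 17_0 like Mac OS X) Mobile/15E148",
  "image/avif,image/webp,image/jpeg"
);
The first request for each device/format combination pays the origin round trip ("ORIGIN_OPTIMIZED"); every similar request after that is an "EDGE_HIT" served from memory at the nearest node.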
The Bottom Line: Scale Like the Pros
The patterns you’ve just seen aren’t academic exercises—they’re the exact same strategies used by companies handling billions of requests daily. While your startup might not need Netflix-level infrastructure on day one, building with these principles from the beginning means you’ll never hit that wall where everything needs to be rewritten.
The next blog in this series will cover High Availability patterns, Disaster Recovery, and Multi-Region Deployments—because scaling isn't just about handling more users; it's about keeping your application online even when entire data centers disappear.
Remember: Every application that scales successfully started with the right architectural foundation. The question isn’t whether you’ll need to scale—it’s whether you’ll be ready when you do.