Database Scaling & Performance

From Database Performance to Infinite Scale

You’ve mastered database fundamentals, implemented intelligent caching, built advanced data management systems with search and synchronization, and created event-driven architectures that process millions of background jobs. Your applications handle complex data orchestration with ease. But here’s the scaling reality that every successful application eventually faces: no matter how well you optimize individual queries or implement caching strategies, there comes a point where a single database server simply cannot handle your application’s growth.

The single database bottleneck that kills scaling dreams:

// Your perfectly optimized application works beautifully
const getUserFeed = async (userId, page = 1) => {
  // Cached, optimized, indexed - everything done right
  const feedKey = `user:${userId}:feed:${page}`;
  let feed = await cache.get(feedKey);

  if (!feed) {
    // Load the user's follow/interest lists, then run the optimized, indexed query
    const user = await db.users.findOne({ _id: userId });

    feed = await db.posts
      .find({
        $or: [
          { authorId: { $in: user.following } },
          { tags: { $in: user.interests } },
        ],
      })
      .sort({ createdAt: -1, engagement: -1 })
      .skip((page - 1) * 20)
      .limit(20);

    await cache.set(feedKey, feed, 300);
  }

  return feed;
};

// This works perfectly... until it doesn't:
// - 100 users: 50ms average response time
// - 10,000 users: 200ms average response time
// - 100,000 users: 2 second response time
// - 1,000,000 users: Database crashes under load

// No amount of optimization can save you from the fundamental limits:
// - Single database server has finite CPU, memory, and I/O
// - Concurrent connections max out (even with pooling)
// - Storage performance degrades with dataset size
// - Backup and maintenance windows become unacceptable

The scaling wall every growing application hits:

-- Your database performance metrics start looking like this:

-- Query performance degrades with table size
SELECT * FROM posts WHERE author_id IN (...)
ORDER BY created_at DESC LIMIT 20;
-- 1 million posts: 10ms
-- 10 million posts: 100ms
-- 100 million posts: 2000ms
-- 500 million posts: Query timeout

-- Connection pool exhaustion
SHOW PROCESSLIST;
-- 200 connections running simultaneously
-- New connections getting "Too many connections" error
-- Application threads blocked waiting for database connections

-- Lock contention becomes severe
SHOW ENGINE INNODB STATUS;
-- Deadlocks detected: 45 per minute
-- Lock wait timeouts: 120 per minute
-- Users experiencing random "Database error" messages
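
If you're on PostgreSQL (as in the rest of this article) rather than MySQL, a minimal sketch of the same saturation checks with node-postgres might look like this; the 80% threshold is an assumption to tune against your own connection limits:

const { Pool } = require("pg");
const pool = new Pool(); // reads connection settings from PG* environment variables

// Poll connection and lock saturation so you see the wall approaching
async function checkSaturation() {
  const { rows } = await pool.query(`
    SELECT
      (SELECT count(*) FROM pg_stat_activity WHERE state = 'active') AS active_connections,
      (SELECT setting::int FROM pg_settings WHERE name = 'max_connections') AS max_connections,
      (SELECT count(*) FROM pg_locks WHERE NOT granted) AS waiting_locks
  `);

  const { active_connections, max_connections, waiting_locks } = rows[0];
  if (active_connections / max_connections > 0.8 || waiting_locks > 0) {
    console.warn("Approaching database saturation:", rows[0]);
  }
}

setInterval(checkSaturation, 30000);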

The uncomfortable scaling truth: Even perfectly architected applications with intelligent caching, optimized queries, and proper indexing will eventually exhaust the capabilities of a single database server. Modern applications serving millions of users require distributed database architectures that can scale horizontally across multiple servers while maintaining consistency, availability, and performance.

Database scaling mastery requires understanding:

  • Horizontal scaling strategies that distribute data across multiple database servers
  • Read replica architectures that handle read-heavy workloads by distributing queries
  • Sharding and partitioning patterns that split large datasets intelligently across nodes
  • Database clustering techniques that provide high availability and automatic failover
  • Performance monitoring systems that identify bottlenecks before they impact users

This article completes your database education by transforming you from a single-database architect into a distributed systems engineer. You’ll learn to design database infrastructures that scale to handle billions of records and millions of concurrent users while maintaining the performance characteristics of much smaller systems.


Read Replicas: Distributing Read Traffic

Master-Slave Architecture for Read Scaling

The read-heavy application challenge:

// Typical web application read/write ratio: 90% reads, 10% writes
const appMetrics = {
  totalQueries: 1000000,
  readQueries: 900000, // User feeds, search, profiles, content
  writeQueries: 100000, // Posts, likes, comments, updates
};

// Single database handling both reads and writes
const handleUserRequest = async (req, res) => {
  // Every request hits the same database server
  // Reads compete with writes for resources
  // Database CPU: 95%
  // Database Memory: 90%
  // Connection Pool: 180/200 connections used
  // Users experiencing slow response times
};

// Even with perfect caching, cache misses and real-time data still overwhelm the database

Professional read replica architecture:

// ✅ Multi-database architecture for massive read scaling
const { Pool } = require("pg");

class DatabaseCluster {
  constructor() {
    // Master database for writes
    this.master = new Pool({
      host: process.env.MASTER_DB_HOST,
      port: process.env.MASTER_DB_PORT,
      database: process.env.DB_NAME,
      user: process.env.DB_USER,
      password: process.env.DB_PASSWORD,
      max: 20, // Smaller pool for writes
      ssl: { rejectUnauthorized: false },
    });

    // Read replicas for distributed read traffic
    this.readReplicas = [
      new Pool({
        host: process.env.READ_REPLICA_1_HOST,
        port: process.env.READ_REPLICA_1_PORT,
        database: process.env.DB_NAME,
        user: process.env.DB_USER,
        password: process.env.DB_PASSWORD,
        max: 50, // Larger pools for reads
        ssl: { rejectUnauthorized: false },
      }),
      new Pool({
        host: process.env.READ_REPLICA_2_HOST,
        port: process.env.READ_REPLICA_2_PORT,
        database: process.env.DB_NAME,
        user: process.env.DB_USER,
        password: process.env.DB_PASSWORD,
        max: 50,
        ssl: { rejectUnauthorized: false },
      }),
      new Pool({
        host: process.env.READ_REPLICA_3_HOST,
        port: process.env.READ_REPLICA_3_PORT,
        database: process.env.DB_NAME,
        user: process.env.DB_USER,
        password: process.env.DB_PASSWORD,
        max: 50,
        ssl: { rejectUnauthorized: false },
      }),
    ];

    this.currentReplicaIndex = 0;
    this.replicaHealthStatus = new Map();

    this.setupHealthChecks();
    // Failover handling is built into executeRead(): failing replicas are marked
    // unhealthy and reads fall back to the master automatically
  }

  // Route writes to master
  async executeWrite(query, params = []) {
    const startTime = Date.now();

    try {
      const result = await this.master.query(query, params);

      this.logQueryPerformance("write", query, Date.now() - startTime);
      return result;
    } catch (error) {
      console.error("Write query failed:", error);
      throw error;
    }
  }

  // Route reads to replicas with load balancing
  async executeRead(query, params = [], options = {}) {
    const startTime = Date.now();
    const maxRetries = options.maxRetries || 3;

    let lastError;

    for (let attempt = 0; attempt < maxRetries; attempt++) {
      const replica = this.selectHealthyReplica(options.preferredReplica);

      try {
        const result = await replica.query(query, params);

        this.logQueryPerformance(
          "read",
          query,
          Date.now() - startTime,
          replica
        );
        return result;
      } catch (error) {
        console.warn(`Read replica failed (attempt ${attempt + 1}):`, error);
        lastError = error;

        // Mark replica as potentially unhealthy
        this.markReplicaUnhealthy(replica);
      }
    }

    // All replicas failed - fallback to master
    console.warn("All read replicas failed, falling back to master");

    try {
      const result = await this.master.query(query, params);

      this.logQueryPerformance("read_fallback", query, Date.now() - startTime);
      return result;
    } catch (error) {
      console.error("Master fallback also failed:", error);
      throw lastError || error;
    }
  }

  selectHealthyReplica(preferredIndex = null) {
    // Use preferred replica if specified and healthy
    if (
      preferredIndex !== null &&
      preferredIndex < this.readReplicas.length &&
      this.isReplicaHealthy(this.readReplicas[preferredIndex])
    ) {
      return this.readReplicas[preferredIndex];
    }

    // Find healthy replicas
    const healthyReplicas = this.readReplicas.filter((replica) =>
      this.isReplicaHealthy(replica)
    );

    if (healthyReplicas.length === 0) {
      throw new Error("No healthy read replicas available");
    }

    // Round-robin load balancing among healthy replicas
    const replica =
      healthyReplicas[this.currentReplicaIndex % healthyReplicas.length];
    this.currentReplicaIndex++;

    return replica;
  }

  isReplicaHealthy(replica) {
    const replicaId = this.getReplicaId(replica);
    const healthStatus = this.replicaHealthStatus.get(replicaId);

    if (!healthStatus) return true; // Assume healthy if no status recorded

    // Check if unhealthy period has expired
    if (
      healthStatus.unhealthyUntil &&
      Date.now() > healthStatus.unhealthyUntil
    ) {
      this.replicaHealthStatus.delete(replicaId);
      return true;
    }

    return healthStatus.isHealthy !== false;
  }

  markReplicaUnhealthy(replica) {
    const replicaId = this.getReplicaId(replica);

    this.replicaHealthStatus.set(replicaId, {
      isHealthy: false,
      unhealthyUntil: Date.now() + 30 * 1000, // Retry after 30 seconds
    });
  }

  getReplicaId(replica) {
    return `${replica.options.host}:${replica.options.port}`;
  }

  setupHealthChecks() {
    // Regular health checks for all replicas
    setInterval(async () => {
      const healthPromises = this.readReplicas.map(async (replica, index) => {
        try {
          const start = Date.now();
          await replica.query("SELECT 1 as health_check");
          const latency = Date.now() - start;

          const replicaId = this.getReplicaId(replica);
          this.replicaHealthStatus.set(replicaId, {
            isHealthy: true,
            latency,
            lastChecked: Date.now(),
          });

          return { index, status: "healthy", latency };
        } catch (error) {
          console.error(`Health check failed for replica ${index}:`, error);
          this.markReplicaUnhealthy(replica);

          return { index, status: "unhealthy", error: error.message };
        }
      });

      const results = await Promise.all(healthPromises);

      // Log health status
      const healthySummary = results.filter((r) => r.status === "healthy");
      console.log(
        `Replica health check: ${healthySummary.length}/${results.length} healthy`
      );

      // Alert if too many replicas are unhealthy
      if (healthySummary.length < this.readReplicas.length * 0.5) {
        console.error("ALERT: More than 50% of read replicas are unhealthy!");
        this.sendHealthAlert(results);
      }
    }, 30000); // Check every 30 seconds
  }

  // Transaction support (must use master)
  async withTransaction(callback) {
    const client = await this.master.connect();

    try {
      await client.query("BEGIN");
      const result = await callback(client);
      await client.query("COMMIT");
      return result;
    } catch (error) {
      await client.query("ROLLBACK");
      throw error;
    } finally {
      client.release();
    }
  }

  // Connection management
  async closeAll() {
    await this.master.end();
    await Promise.all(this.readReplicas.map((replica) => replica.end()));
  }

  logQueryPerformance(type, query, duration, replica = null) {
    const sanitizedQuery = query.substring(0, 100);

    if (duration > 1000) {
      // Log slow queries
      console.warn(`Slow ${type} query (${duration}ms):`, {
        query: sanitizedQuery,
        replica: replica ? this.getReplicaId(replica) : "master",
      });
    }

    // Send to monitoring system
    this.sendMetric(`database.query.${type}.duration`, duration);
    this.sendMetric(`database.query.${type}.count`, 1);
  }

  // Integration hooks - wire these into your metrics and alerting backends
  sendMetric(name, value) {
    // e.g. StatsD, a Prometheus push gateway, or CloudWatch custom metrics
  }

  sendHealthAlert(healthResults) {
    // e.g. PagerDuty or a Slack webhook with details of the failing replicas
  }
}

// Usage in application
const db = new DatabaseCluster();

// Data access layer that automatically routes queries
class UserService {
  async getUserProfile(userId) {
    // Read operation - goes to replica
    const result = await db.executeRead(
      "SELECT id, username, email, profile_data, created_at FROM users WHERE id = $1",
      [userId]
    );

    return result.rows[0];
  }

  async updateUserProfile(userId, updates) {
    // Write operation - goes to master
    const result = await db.executeWrite(
      `UPDATE users 
       SET profile_data = profile_data || $1, updated_at = NOW()
       WHERE id = $2 
       RETURNING id, username, profile_data`,
      [JSON.stringify(updates), userId]
    );

    return result.rows[0];
  }

  async getUserFeed(userId, page = 1) {
    // Complex read - goes to replica
    const result = await db.executeRead(
      `SELECT p.id, p.title, p.content_preview, p.created_at,
              u.username, u.avatar_url
       FROM posts p
       JOIN users u ON p.author_id = u.id
       WHERE p.author_id IN (
         SELECT followed_id FROM user_follows WHERE follower_id = $1
       )
       ORDER BY p.created_at DESC
       LIMIT 20 OFFSET $2`,
      [userId, (page - 1) * 20]
    );

    return result.rows;
  }

  async createPost(userId, postData) {
    // Write operation with transaction - uses master
    return await db.withTransaction(async (client) => {
      // Insert post
      const postResult = await client.query(
        `INSERT INTO posts (author_id, title, content, created_at)
         VALUES ($1, $2, $3, NOW())
         RETURNING id, title, created_at`,
        [userId, postData.title, postData.content]
      );

      // Update user stats
      await client.query(
        "UPDATE users SET post_count = post_count + 1 WHERE id = $1",
        [userId]
      );

      return postResult.rows[0];
    });
  }
}
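
Wiring this into the API layer is mostly transparent to route handlers. A minimal sketch, assuming Express and the db/UserService instances defined above:

const express = require("express");
const app = express();
app.use(express.json());

const users = new UserService();

// Reads are served by replicas via executeRead() inside the service
app.get("/users/:id", async (req, res) => {
  const profile = await users.getUserProfile(req.params.id);
  res.json(profile);
});

// Writes (including the transactional post creation) go to the master
app.post("/users/:id/posts", async (req, res) => {
  const post = await users.createPost(req.params.id, req.body);
  res.status(201).json(post);
});

app.listen(3000);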

Read Replica Lag Management

Handling eventual consistency:

// Managing read replica lag for time-sensitive operations
class ConsistencyManager {
  constructor(dbCluster) {
    this.db = dbCluster;
    this.lagThreshold = 5000; // 5 seconds max acceptable lag
    this.replicaLagStatus = new Map();

    this.monitorReplicationLag();
  }

  async monitorReplicationLag() {
    setInterval(async () => {
      const lagPromises = this.db.readReplicas.map(async (replica, index) => {
        try {
          // Check replication lag (PostgreSQL specific)
          const result = await replica.query(`
            SELECT EXTRACT(EPOCH FROM (now() - pg_last_xact_replay_timestamp())) as lag_seconds
          `);

          // lag_seconds is NULL on a primary node - treat that as zero lag
          const lagMs = (parseFloat(result.rows[0].lag_seconds) || 0) * 1000;
          const replicaId = this.db.getReplicaId(replica);

          this.replicaLagStatus.set(replicaId, {
            lag: lagMs,
            timestamp: Date.now(),
            acceptable: lagMs < this.lagThreshold,
          });

          return { index, lag: lagMs };
        } catch (error) {
          console.error(
            `Failed to check replica lag for replica ${index}:`,
            error
          );
          return { index, lag: Infinity, error: error.message };
        }
      });

      const lagResults = await Promise.all(lagPromises);

      // Alert on high lag
      const highLagReplicas = lagResults.filter(
        (r) => r.lag > this.lagThreshold
      );
      if (highLagReplicas.length > 0) {
        console.warn(`High replication lag detected:`, highLagReplicas);
      }
    }, 10000); // Check every 10 seconds
  }

  // Force read from master for operations requiring strong consistency
  async executeStronglyConsistentRead(query, params = []) {
    return await this.db.executeWrite(query, params); // Use master connection
  }

  // Read from replica but verify freshness
  async executeEventuallyConsistentRead(
    query,
    params = [],
    maxLagMs = this.lagThreshold
  ) {
    // Find replica with acceptable lag
    const acceptableReplica = Array.from(this.replicaLagStatus.entries()).find(
      ([replicaId, status]) => status.lag <= maxLagMs
    );

    if (acceptableReplica) {
      const replicaIndex = this.db.readReplicas.findIndex(
        (replica) => this.db.getReplicaId(replica) === acceptableReplica[0]
      );

      return await this.db.executeRead(query, params, {
        preferredReplica: replicaIndex,
      });
    }

    // No acceptable replica - use master
    console.warn("No replicas with acceptable lag, using master for read");
    return await this.executeStronglyConsistentRead(query, params);
  }
}

// Usage in services
class PostService {
  constructor(dbCluster) {
    this.db = dbCluster;
    this.consistencyManager = new ConsistencyManager(dbCluster);
  }

  async createPost(userId, postData) {
    const post = await this.db.withTransaction(async (client) => {
      const result = await client.query(
        "INSERT INTO posts (author_id, title, content) VALUES ($1, $2, $3) RETURNING *",
        [userId, postData.title, postData.content]
      );
      return result.rows[0];
    });

    return post;
  }

  async getPostImmediatelyAfterCreation(postId) {
    // User just created this post and wants to see it immediately
    // Must read from master to avoid "post not found" due to replication lag
    return await this.consistencyManager.executeStronglyConsistentRead(
      "SELECT * FROM posts WHERE id = $1",
      [postId]
    );
  }

  async getPublicPostFeed() {
    // Feed can tolerate some lag for better performance
    return await this.consistencyManager.executeEventuallyConsistentRead(
      "SELECT * FROM posts WHERE published = true ORDER BY created_at DESC LIMIT 20",
      [],
      10000 // 10 seconds acceptable lag
    );
  }
}
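
A related pattern worth keeping in your toolbox is read-your-writes sessions: instead of deciding consistency per query, pin a user's reads to the master for a short window after that user writes. The sketch below is not part of the classes above; it assumes the DatabaseCluster from earlier and an in-memory timestamp map (you'd use Redis or a signed cookie once you run multiple app servers):

class SessionConsistencyRouter {
  constructor(dbCluster, pinWindowMs = 5000) {
    this.db = dbCluster;
    this.pinWindowMs = pinWindowMs;
    this.lastWriteAt = new Map(); // userId -> timestamp of last write
  }

  async write(userId, query, params = []) {
    const result = await this.db.executeWrite(query, params);
    this.lastWriteAt.set(userId, Date.now());
    return result;
  }

  async read(userId, query, params = []) {
    const lastWrite = this.lastWriteAt.get(userId) || 0;

    if (Date.now() - lastWrite < this.pinWindowMs) {
      // User wrote recently - replicas may not have caught up yet,
      // so serve this read from the master
      return this.db.executeWrite(query, params);
    }

    return this.db.executeRead(query, params);
  }
}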

Database Sharding: Horizontal Data Distribution

Sharding Strategies and Implementation

The single-database size limit:

// Your application growth trajectory
const dataGrowth = {
  month1: { users: 1000, posts: 5000, dbSize: "1GB" },
  month6: { users: 50000, posts: 500000, dbSize: "25GB" },
  year1: { users: 500000, posts: 10000000, dbSize: "500GB" },
  year2: { users: 2000000, posts: 50000000, dbSize: "2TB" },
  year3: { users: 5000000, posts: 200000000, dbSize: "10TB" },
  // Single database performance degrades exponentially with size
  // Backup times: 1GB (1 min) → 10TB (8+ hours)
  // Index rebuild times: 25GB (5 mins) → 10TB (12+ hours)
  // Query performance tanks as indexes no longer fit in memory
};

Professional horizontal sharding architecture:

// ✅ Distributed sharding system for infinite horizontal scale
class ShardManager {
  constructor() {
    this.shards = new Map();
    this.shardingStrategy = "hash"; // hash, range, directory
    this.virtualShards = 1024; // Virtual shards for easier rebalancing
    this.setupShards();
  }

  setupShards() {
    // Define physical database shards
    const shardConfigs = [
      {
        id: "shard_0",
        host: "shard-0.db.cluster.local",
        port: 5432,
        ranges: [0, 255], // Virtual shard ranges
      },
      {
        id: "shard_1",
        host: "shard-1.db.cluster.local",
        port: 5432,
        ranges: [256, 511],
      },
      {
        id: "shard_2",
        host: "shard-2.db.cluster.local",
        port: 5432,
        ranges: [512, 767],
      },
      {
        id: "shard_3",
        host: "shard-3.db.cluster.local",
        port: 5432,
        ranges: [768, 1023],
      },
    ];

    // Initialize connection pools for each shard
    shardConfigs.forEach((config) => {
      this.shards.set(config.id, {
        pool: new Pool({
          host: config.host,
          port: config.port,
          database: process.env.DB_NAME,
          user: process.env.DB_USER,
          password: process.env.DB_PASSWORD,
          max: 25, // Connections per shard
        }),
        ranges: config.ranges,
        isHealthy: true,
        lastHealthCheck: Date.now(),
      });
    });

    this.startHealthMonitoring();
  }

  // Hash-based sharding for user data
  getShardForUser(userId) {
    // Consistent hashing to distribute users evenly
    const hash = this.hashUserId(userId);
    const virtualShard = hash % this.virtualShards;

    // Find physical shard that contains this virtual shard
    for (const [shardId, shard] of this.shards) {
      if (virtualShard >= shard.ranges[0] && virtualShard <= shard.ranges[1]) {
        return { shardId, shard: shard.pool };
      }
    }

    throw new Error(`No shard found for user ${userId}`);
  }

  hashUserId(userId) {
    // Simple hash function (use crypto.createHash in production)
    let hash = 0;
    const str = userId.toString();
    for (let i = 0; i < str.length; i++) {
      const char = str.charCodeAt(i);
      hash = (hash << 5) - hash + char;
      hash = hash & hash; // Convert to 32-bit integer
    }
    return Math.abs(hash);
  }

  // Execute query on specific shard
  async executeOnShard(shardId, query, params = []) {
    const shard = this.shards.get(shardId);
    if (!shard) {
      throw new Error(`Shard ${shardId} not found`);
    }

    if (!shard.isHealthy) {
      throw new Error(`Shard ${shardId} is unhealthy`);
    }

    return await shard.pool.query(query, params);
  }

  // Execute query for specific user (automatically routes to correct shard)
  async executeForUser(userId, query, params = []) {
    const { shardId, shard } = this.getShardForUser(userId);
    return await shard.query(query, params);
  }

  // Cross-shard queries (expensive - avoid when possible)
  async executeAcrossAllShards(query, params = []) {
    const promises = [];

    for (const [shardId, shard] of this.shards) {
      if (shard.isHealthy) {
        promises.push(
          shard.pool
            .query(query, params)
            .then((result) => ({
              shardId,
              result,
            }))
            .catch((error) => ({
              shardId,
              error,
            }))
        );
      }
    }

    const results = await Promise.all(promises);

    // Separate successful results from errors
    const successful = results.filter((r) => !r.error);
    const failed = results.filter((r) => r.error);

    if (failed.length > 0) {
      console.warn("Some shards failed during cross-shard query:", failed);
    }

    return successful;
  }

  // Aggregate results from multiple shards
  async aggregateFromShards(query, params = [], aggregateFunction) {
    const shardResults = await this.executeAcrossAllShards(query, params);

    const allRows = shardResults.flatMap((result) => result.result.rows);

    if (aggregateFunction) {
      return aggregateFunction(allRows);
    }

    return allRows;
  }

  startHealthMonitoring() {
    setInterval(async () => {
      for (const [shardId, shard] of this.shards) {
        try {
          await shard.pool.query("SELECT 1");
          shard.isHealthy = true;
          shard.lastHealthCheck = Date.now();
        } catch (error) {
          console.error(`Shard ${shardId} health check failed:`, error);
          shard.isHealthy = false;

          // Alert on shard failure
          this.alertShardFailure(shardId, error);
        }
      }
    }, 30000);
  }

  alertShardFailure(shardId, error) {
    console.error(`CRITICAL: Shard ${shardId} is down!`, error);
    // Implement alerting logic here
  }
}

// Sharded data access layer
class ShardedUserService {
  constructor() {
    this.shardManager = new ShardManager();
  }

  async createUser(userData) {
    const userId = generateId(); // assume a UUID helper, e.g. crypto.randomUUID()
    const userWithId = { ...userData, id: userId, created_at: new Date() };

    // Insert into appropriate shard
    const result = await this.shardManager.executeForUser(
      userId,
      `INSERT INTO users (id, username, email, profile_data, created_at) 
       VALUES ($1, $2, $3, $4, $5) 
       RETURNING id, username, email, created_at`,
      [
        userId,
        userData.username,
        userData.email,
        userData.profileData,
        userWithId.created_at,
      ]
    );

    return result.rows[0];
  }

  async getUserById(userId) {
    // Automatically routes to correct shard
    const result = await this.shardManager.executeForUser(
      userId,
      "SELECT id, username, email, profile_data, created_at FROM users WHERE id = $1",
      [userId]
    );

    return result.rows[0];
  }

  async updateUser(userId, updates) {
    const result = await this.shardManager.executeForUser(
      userId,
      `UPDATE users 
       SET profile_data = profile_data || $1, updated_at = NOW()
       WHERE id = $2
       RETURNING id, username, email, profile_data`,
      [JSON.stringify(updates), userId]
    );

    return result.rows[0];
  }

  // Cross-shard operations (expensive)
  async searchUsersByUsername(username) {
    const searchPattern = `%${username}%`;

    const results = await this.shardManager.aggregateFromShards(
      "SELECT id, username, email FROM users WHERE username ILIKE $1 LIMIT 10",
      [searchPattern],
      (allRows) => {
        // Merge and sort results from all shards
        return allRows
          .sort((a, b) => a.username.localeCompare(b.username))
          .slice(0, 10); // Final limit across all shards
      }
    );

    return results;
  }

  async getUserStats() {
    // Aggregate statistics across all shards
    const results = await this.shardManager.aggregateFromShards(
      `SELECT 
         COUNT(*) as total_users,
         COUNT(CASE WHEN created_at > NOW() - INTERVAL '30 days' THEN 1 END) as new_users_30d,
         COUNT(CASE WHEN last_login > NOW() - INTERVAL '7 days' THEN 1 END) as active_users_7d
       FROM users`,
      [],
      (allRows) => {
        // Aggregate counts from all shards
        return allRows.reduce(
          (totals, row) => ({
            total_users: totals.total_users + parseInt(row.total_users),
            new_users_30d: totals.new_users_30d + parseInt(row.new_users_30d),
            active_users_7d:
              totals.active_users_7d + parseInt(row.active_users_7d),
          }),
          { total_users: 0, new_users_30d: 0, active_users_7d: 0 }
        );
      }
    );

    return results;
  }
}

// Sharded posts service with relationship handling
class ShardedPostService {
  constructor() {
    this.shardManager = new ShardManager();
    this.userService = new ShardedUserService();
  }

  async createPost(userId, postData) {
    const postId = generateId();

    // Posts are sharded by author_id to keep user's posts together
    const result = await this.shardManager.executeForUser(
      userId,
      `INSERT INTO posts (id, author_id, title, content, created_at)
       VALUES ($1, $2, $3, $4, NOW())
       RETURNING id, title, content, created_at`,
      [postId, userId, postData.title, postData.content]
    );

    return result.rows[0];
  }

  async getUserPosts(userId, page = 1, limit = 20) {
    // All user's posts are on the same shard
    const result = await this.shardManager.executeForUser(
      userId,
      `SELECT id, title, content, created_at, view_count, like_count
       FROM posts 
       WHERE author_id = $1
       ORDER BY created_at DESC
       LIMIT $2 OFFSET $3`,
      [userId, limit, (page - 1) * limit]
    );

    return result.rows;
  }

  // Complex cross-shard operation: building user feed
  async getUserFeed(userId, page = 1, limit = 20) {
    // 1. Get user's following list (from user's shard)
    const followingResult = await this.shardManager.executeForUser(
      userId,
      "SELECT followed_id FROM user_follows WHERE follower_id = $1",
      [userId]
    );

    const followedUsers = followingResult.rows.map((row) => row.followed_id);

    if (followedUsers.length === 0) {
      return [];
    }

    // 2. Group followed users by shard
    const usersByShards = new Map();
    followedUsers.forEach((followedUserId) => {
      const { shardId } = this.shardManager.getShardForUser(followedUserId);
      if (!usersByShards.has(shardId)) {
        usersByShards.set(shardId, []);
      }
      usersByShards.get(shardId).push(followedUserId);
    });

    // 3. Query posts from each relevant shard
    const feedPromises = [];
    for (const [shardId, userIds] of usersByShards) {
      feedPromises.push(
        this.shardManager
          .executeOnShard(
            shardId,
            `SELECT p.id, p.title, p.content, p.created_at, p.author_id,
                  u.username, u.avatar_url
           FROM posts p
           JOIN users u ON p.author_id = u.id  
           WHERE p.author_id = ANY($1)
           ORDER BY p.created_at DESC
           LIMIT $2`,
            [userIds, limit * 2] // Get extra to account for merging
          )
          .then((result) => result.rows)
      );
    }

    const shardResults = await Promise.all(feedPromises);

    // 4. Merge and sort results from all shards
    const allPosts = shardResults
      .flat()
      .sort((a, b) => new Date(b.created_at) - new Date(a.created_at))
      .slice((page - 1) * limit, page * limit);

    return allPosts;
  }
}
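
One caveat with the feed query above: OFFSET-based paging gets more expensive across shards with every page, because each shard re-reads rows the merge step has already seen. A common fix is keyset (cursor) pagination, where the client passes the created_at of the last post it saw. The helper below is a sketch against the same ShardManager; the caller supplies the already-fetched list of followed user IDs:

async function getUserFeedAfter(shardManager, followedUserIds, beforeTimestamp, limit = 20) {
  // Group followed users by shard, then ask each shard only for posts older
  // than the cursor - no OFFSET, so deep pages stay as cheap as the first one
  const usersByShard = new Map();
  for (const id of followedUserIds) {
    const { shardId } = shardManager.getShardForUser(id);
    if (!usersByShard.has(shardId)) usersByShard.set(shardId, []);
    usersByShard.get(shardId).push(id);
  }

  const shardResults = await Promise.all(
    Array.from(usersByShard.entries()).map(([shardId, ids]) =>
      shardManager.executeOnShard(
        shardId,
        `SELECT id, title, created_at, author_id
         FROM posts
         WHERE author_id = ANY($1) AND created_at < $2
         ORDER BY created_at DESC
         LIMIT $3`,
        [ids, beforeTimestamp, limit]
      )
    )
  );

  // Merge the per-shard results and keep only the newest `limit` posts overall
  return shardResults
    .flatMap((result) => result.rows)
    .sort((a, b) => new Date(b.created_at) - new Date(a.created_at))
    .slice(0, limit);
}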

Auto-Sharding and Rebalancing

Dynamic shard management:

// Advanced shard management with automatic rebalancing
class AutoShardManager extends ShardManager {
  constructor() {
    super();
    this.rebalanceThreshold = 0.8; // Rebalance when any shard reaches 80% capacity
    this.monitoringInterval = 300000; // 5 minutes
    this.isRebalancing = false;

    this.startCapacityMonitoring();
  }

  async startCapacityMonitoring() {
    setInterval(async () => {
      if (this.isRebalancing) return;

      const capacityStats = await this.getShardCapacityStats();
      const needsRebalancing = capacityStats.some(
        (stat) => stat.utilizationRatio > this.rebalanceThreshold
      );

      if (needsRebalancing) {
        console.log(
          "Shard capacity threshold exceeded - initiating rebalancing"
        );
        await this.initiateRebalancing(capacityStats);
      }
    }, this.monitoringInterval);
  }

  async getShardCapacityStats() {
    const stats = [];

    for (const [shardId, shard] of this.shards) {
      if (!shard.isHealthy) continue;

      try {
        const result = await shard.pool.query(`
          SELECT 
            pg_database_size(current_database()) as db_size_bytes,
            (SELECT count(*) FROM users) as user_count,
            (SELECT count(*) FROM posts) as post_count
        `);

        const row = result.rows[0];
        const sizeGB = parseInt(row.db_size_bytes) / (1024 * 1024 * 1024);
        const maxSizeGB = 100; // 100GB per shard limit

        stats.push({
          shardId,
          sizeGB,
          maxSizeGB,
          utilizationRatio: sizeGB / maxSizeGB,
          userCount: parseInt(row.user_count),
          postCount: parseInt(row.post_count),
        });
      } catch (error) {
        console.error(`Failed to get capacity stats for ${shardId}:`, error);
      }
    }

    return stats;
  }

  async initiateRebalancing(capacityStats) {
    this.isRebalancing = true;

    try {
      console.log("Starting shard rebalancing process");

      // 1. Add new shard if needed
      const overloadedShards = capacityStats.filter(
        (s) => s.utilizationRatio > this.rebalanceThreshold
      );

      if (overloadedShards.length > 0) {
        const newShardId = await this.addNewShard();
        console.log(`Added new shard: ${newShardId}`);

        // 2. Rebalance virtual shards
        await this.rebalanceVirtualShards(newShardId);

        // 3. Migrate data to achieve better balance
        await this.migrateDataForBalance(
          overloadedShards[0].shardId,
          newShardId
        );
      }

      console.log("Shard rebalancing completed successfully");
    } catch (error) {
      console.error("Shard rebalancing failed:", error);
      // Alert operations team
    } finally {
      this.isRebalancing = false;
    }
  }

  async addNewShard() {
    const newShardId = `shard_${this.shards.size}`;
    const newShardHost = await this.provisionNewShardInstance(); // Cloud API call

    // Add to shard map
    this.shards.set(newShardId, {
      pool: new Pool({
        host: newShardHost,
        port: 5432,
        database: process.env.DB_NAME,
        user: process.env.DB_USER,
        password: process.env.DB_PASSWORD,
        max: 25,
      }),
      ranges: [0, 0], // Will be set during rebalancing
      isHealthy: true,
      lastHealthCheck: Date.now(),
    });

    // Initialize schema on new shard
    await this.initializeShardSchema(newShardId);

    return newShardId;
  }

  async rebalanceVirtualShards(newShardId) {
    const totalShards = this.shards.size;
    const virtualShardsPerPhysicalShard = Math.floor(
      this.virtualShards / totalShards
    );

    let currentVirtualShard = 0;

    for (const [shardId, shard] of this.shards) {
      const rangeStart = currentVirtualShard;
      const rangeEnd = Math.min(
        currentVirtualShard + virtualShardsPerPhysicalShard - 1,
        this.virtualShards - 1
      );

      shard.ranges = [rangeStart, rangeEnd];
      currentVirtualShard = rangeEnd + 1;

      console.log(
        `Shard ${shardId} assigned virtual shard range: ${rangeStart}-${rangeEnd}`
      );
    }
  }

  async migrateDataForBalance(sourceShardId, targetShardId) {
    console.log(`Migrating data from ${sourceShardId} to ${targetShardId}`);

    // Get users to migrate (move users whose virtual shard now belongs to target)
    const sourcePool = this.shards.get(sourceShardId).pool;
    const targetPool = this.shards.get(targetShardId).pool;
    const targetRanges = this.shards.get(targetShardId).ranges;

    // Find users that should be on target shard based on new virtual shard mapping
    const usersToMigrate = await sourcePool.query(
      "SELECT id, username, email, profile_data, created_at FROM users"
    );

    const migratePromises = [];

    for (const user of usersToMigrate.rows) {
      const virtualShard = this.hashUserId(user.id) % this.virtualShards;

      if (virtualShard >= targetRanges[0] && virtualShard <= targetRanges[1]) {
        migratePromises.push(this.migrateUser(user, sourcePool, targetPool));
      }
    }

    await Promise.all(migratePromises);
    console.log(
      `Migrated ${migratePromises.length} users from ${sourceShardId} to ${targetShardId}`
    );
  }

  async migrateUser(user, sourcePool, targetPool) {
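    // NOTE: the copy-then-delete below runs as two separate transactions on two
    // different pools, so it is not atomic across shards. If the source COMMIT
    // fails after the target COMMIT succeeds, the user briefly exists on both
    // shards. Production migrations typically verify the copy before deleting
    // (see the verification sketch after this class) and keep routing pointed
    // at the source shard until cutover.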
    try {
      // Start transaction on target shard
      const targetClient = await targetPool.connect();
      const sourceClient = await sourcePool.connect();

      try {
        await targetClient.query("BEGIN");
        await sourceClient.query("BEGIN");

        // Copy user to target shard
        await targetClient.query(
          `INSERT INTO users (id, username, email, profile_data, created_at)
           VALUES ($1, $2, $3, $4, $5)`,
          [
            user.id,
            user.username,
            user.email,
            user.profile_data,
            user.created_at,
          ]
        );

        // Copy user's posts
        const posts = await sourceClient.query(
          "SELECT * FROM posts WHERE author_id = $1",
          [user.id]
        );

        for (const post of posts.rows) {
          await targetClient.query(
            `INSERT INTO posts (id, author_id, title, content, created_at, view_count, like_count)
             VALUES ($1, $2, $3, $4, $5, $6, $7)`,
            [
              post.id,
              post.author_id,
              post.title,
              post.content,
              post.created_at,
              post.view_count,
              post.like_count,
            ]
          );
        }

        // Copy user's relationships
        const follows = await sourceClient.query(
          "SELECT * FROM user_follows WHERE follower_id = $1 OR followed_id = $1",
          [user.id]
        );

        for (const follow of follows.rows) {
          await targetClient.query(
            "INSERT INTO user_follows (follower_id, followed_id, created_at) VALUES ($1, $2, $3)",
            [follow.follower_id, follow.followed_id, follow.created_at]
          );
        }

        // Delete from source shard
        await sourceClient.query(
          "DELETE FROM user_follows WHERE follower_id = $1 OR followed_id = $1",
          [user.id]
        );
        await sourceClient.query("DELETE FROM posts WHERE author_id = $1", [
          user.id,
        ]);
        await sourceClient.query("DELETE FROM users WHERE id = $1", [user.id]);

        await targetClient.query("COMMIT");
        await sourceClient.query("COMMIT");
      } catch (error) {
        await targetClient.query("ROLLBACK");
        await sourceClient.query("ROLLBACK");
        throw error;
      } finally {
        targetClient.release();
        sourceClient.release();
      }
    } catch (error) {
      console.error(`Failed to migrate user ${user.id}:`, error);
      throw error;
    }
  }

  // Environment-specific hooks: provision a new database instance through your
  // cloud provider's API and apply the schema migrations before taking traffic
  async provisionNewShardInstance() {
    throw new Error("provisionNewShardInstance is environment-specific");
  }

  async initializeShardSchema(shardId) {
    throw new Error("initializeShardSchema is environment-specific");
  }
}
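
Before the source-side DELETEs in migrateUser are allowed to run, it's worth verifying that the copy actually landed. A minimal sketch of such a check (verifyUserMigration is illustrative, not part of AutoShardManager above) compares row counts between the two shards:

async function verifyUserMigration(userId, sourcePool, targetPool) {
  const countQuery = `
    SELECT
      (SELECT count(*) FROM users WHERE id = $1) AS user_rows,
      (SELECT count(*) FROM posts WHERE author_id = $1) AS post_rows`;

  const [source, target] = await Promise.all([
    sourcePool.query(countQuery, [userId]),
    targetPool.query(countQuery, [userId]),
  ]);

  const matches =
    source.rows[0].user_rows === target.rows[0].user_rows &&
    source.rows[0].post_rows === target.rows[0].post_rows;

  if (!matches) {
    throw new Error(`Migration verification failed for user ${userId}`);
  }

  return true;
}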

Database Clustering: High Availability and Failover

Master-Standby Clustering with Automatic Failover

Production-grade cluster management:

// High availability cluster with automatic failover
class HighAvailabilityCluster {
  constructor() {
    this.nodes = new Map();
    this.currentMaster = null;
    this.clusterState = "initializing";
    this.failoverInProgress = false;
    this.healthCheckInterval = 5000; // 5 seconds
    this.failoverTimeout = 30000; // 30 seconds

    this.setupClusterNodes();
    this.startHealthMonitoring();
    // Failover is triggered from the health-monitoring loop via checkFailoverConditions()
  }

  setupClusterNodes() {
    const nodeConfigs = [
      {
        id: "node_1",
        host: "db-node-1.cluster.local",
        port: 5432,
        role: "master",
        priority: 100, // Higher priority = preferred master
      },
      {
        id: "node_2",
        host: "db-node-2.cluster.local",
        port: 5432,
        role: "standby",
        priority: 90,
      },
      {
        id: "node_3",
        host: "db-node-3.cluster.local",
        port: 5432,
        role: "standby",
        priority: 80,
      },
    ];

    nodeConfigs.forEach((config) => {
      this.nodes.set(config.id, {
        id: config.id,
        host: config.host,
        port: config.port,
        role: config.role,
        priority: config.priority,
        pool: new Pool({
          host: config.host,
          port: config.port,
          database: process.env.DB_NAME,
          user: process.env.DB_USER,
          password: process.env.DB_PASSWORD,
          max: 20,
          connectionTimeoutMillis: 5000,
          idleTimeoutMillis: 30000,
        }),
        isHealthy: true,
        lastHealthCheck: null,
        lagMs: 0,
        consecutiveFailures: 0,
      });
    });

    // Set initial master
    this.currentMaster = this.findPreferredMaster();
    this.clusterState = "healthy";
  }

  findPreferredMaster() {
    const healthyNodes = Array.from(this.nodes.values())
      .filter((node) => node.isHealthy)
      .sort((a, b) => b.priority - a.priority);

    return healthyNodes.length > 0 ? healthyNodes[0].id : null;
  }

  async startHealthMonitoring() {
    setInterval(async () => {
      await this.performHealthChecks();

      if (!this.failoverInProgress) {
        await this.checkFailoverConditions();
      }
    }, this.healthCheckInterval);
  }

  async performHealthChecks() {
    const healthPromises = Array.from(this.nodes.entries()).map(
      async ([nodeId, node]) => {
        try {
          const startTime = Date.now();

          // Check basic connectivity
          const result = await node.pool.query(
            "SELECT pg_is_in_recovery(), pg_last_wal_receive_lsn(), pg_last_wal_replay_lsn(), now()"
          );

          const row = result.rows[0];
          const responseTime = Date.now() - startTime;

          // Update node status
          node.isHealthy = true;
          node.lastHealthCheck = Date.now();
          node.consecutiveFailures = 0;
          node.responseTime = responseTime;
          node.isStandby = row.pg_is_in_recovery;

          // Calculate replication lag for standby nodes
          if (node.isStandby && this.currentMaster) {
            node.lagMs = await this.calculateReplicationLag(nodeId);
          }

          return { nodeId, status: "healthy", responseTime, lag: node.lagMs };
        } catch (error) {
          console.error(`Health check failed for ${nodeId}:`, error.message);

          node.isHealthy = false;
          node.consecutiveFailures++;

          return { nodeId, status: "unhealthy", error: error.message };
        }
      }
    );

    const results = await Promise.all(healthPromises);

    // Log cluster health summary
    const healthyCount = results.filter((r) => r.status === "healthy").length;
    console.log(
      `Cluster health: ${healthyCount}/${results.length} nodes healthy`
    );
  }

  async calculateReplicationLag(standbyNodeId) {
    try {
      const masterNode = this.nodes.get(this.currentMaster);
      const standbyNode = this.nodes.get(standbyNodeId);

      if (!masterNode || !standbyNode) return 0;

      const [masterResult, standbyResult] = await Promise.all([
        masterNode.pool.query("SELECT pg_current_wal_lsn()"),
        standbyNode.pool.query("SELECT pg_last_wal_replay_lsn()"),
      ]);

      const masterLsn = masterResult.rows[0].pg_current_wal_lsn;
      const standbyLsn = standbyResult.rows[0].pg_last_wal_replay_lsn;

      // Calculate LSN difference (simplified)
      const lagBytes = this.calculateLsnDifference(masterLsn, standbyLsn);

      // Estimate lag in milliseconds (rough approximation)
      const estimatedLagMs = lagBytes / 1024; // 1KB = ~1ms lag

      return Math.max(0, estimatedLagMs);
    } catch (error) {
      console.warn(
        `Failed to calculate replication lag for ${standbyNodeId}:`,
        error
      );
      return 0;
    }
  }

  calculateLsnDifference(masterLsn, standbyLsn) {
    // A PostgreSQL LSN is 'XXXXXXXX/YYYYYYYY' - two hex numbers forming an
    // absolute WAL byte position. Convert both and subtract to get lag in bytes.
    const toBytes = (lsn) => {
      const [hi, lo] = lsn.split("/");
      return BigInt(`0x${hi}`) * 4294967296n + BigInt(`0x${lo}`);
    };
    const diff = toBytes(masterLsn) - toBytes(standbyLsn);
    return diff > 0n ? Number(diff) : 0;
  }

  async checkFailoverConditions() {
    const masterNode = this.nodes.get(this.currentMaster);

    if (!masterNode || !masterNode.isHealthy) {
      console.warn(
        `Master node ${this.currentMaster} is unhealthy - initiating failover`
      );
      await this.initiateFailover("master_unhealthy");
      return;
    }

    // Check for high replication lag
    const standbyNodes = Array.from(this.nodes.values()).filter(
      (node) => node.id !== this.currentMaster && node.isHealthy
    );

    const highLagThreshold = 10000; // 10 seconds
    const nodesWithHighLag = standbyNodes.filter(
      (node) => node.lagMs > highLagThreshold
    );

    if (nodesWithHighLag.length > 0) {
      console.warn(
        `High replication lag detected on ${nodesWithHighLag.length} standby nodes`
      );
      // Don't failover for lag alone, but alert
      this.alertHighReplicationLag(nodesWithHighLag);
    }
  }

  async initiateFailover(reason) {
    if (this.failoverInProgress) {
      console.log("Failover already in progress");
      return;
    }

    this.failoverInProgress = true;
    this.clusterState = "failover";

    try {
      console.log(`Starting failover process. Reason: ${reason}`);

      // 1. Find best candidate for new master
      const newMaster = this.selectFailoverCandidate();
      if (!newMaster) {
        throw new Error("No healthy standby nodes available for failover");
      }

      console.log(`Selected ${newMaster.id} as new master`);

      // 2. Promote standby to master
      await this.promoteStandbyToMaster(newMaster.id);

      // 3. Update cluster configuration
      this.currentMaster = newMaster.id;
      newMaster.role = "master";

      // 4. Reconfigure remaining nodes
      await this.reconfigureStandbyNodes(newMaster.id);

      // 5. Update application connection strings
      await this.updateApplicationConfiguration(newMaster);

      console.log(
        `Failover completed successfully. New master: ${newMaster.id}`
      );
      this.clusterState = "healthy";

      // 6. Alert operations team
      this.alertFailoverComplete(newMaster.id, reason);
    } catch (error) {
      console.error("Failover failed:", error);
      this.clusterState = "failed";
      this.alertFailoverFailed(error);
    } finally {
      this.failoverInProgress = false;
    }
  }

  selectFailoverCandidate() {
    const healthyStandbys = Array.from(this.nodes.values())
      .filter(
        (node) =>
          node.id !== this.currentMaster &&
          node.isHealthy &&
          node.consecutiveFailures < 3
      )
      .sort((a, b) => {
        // Prefer nodes with lower lag and higher priority
        if (a.lagMs !== b.lagMs) {
          return a.lagMs - b.lagMs;
        }
        return b.priority - a.priority;
      });

    return healthyStandbys[0] || null;
  }

  async promoteStandbyToMaster(nodeId) {
    const node = this.nodes.get(nodeId);

    try {
      // Execute promotion command (PostgreSQL specific)
      await node.pool.query("SELECT pg_promote()");

      // Wait for promotion to complete
      let attempts = 0;
      const maxAttempts = 30; // 30 seconds timeout

      while (attempts < maxAttempts) {
        try {
          const result = await node.pool.query("SELECT pg_is_in_recovery()");

          if (!result.rows[0].pg_is_in_recovery) {
            console.log(`Node ${nodeId} successfully promoted to master`);
            return;
          }
        } catch (error) {
          // Node might be restarting during promotion
          console.log(
            `Waiting for promotion to complete (attempt ${attempts + 1})`
          );
        }

        await new Promise((resolve) => setTimeout(resolve, 1000));
        attempts++;
      }

      throw new Error(
        `Promotion timeout - node ${nodeId} failed to become master`
      );
    } catch (error) {
      console.error(`Failed to promote ${nodeId} to master:`, error);
      throw error;
    }
  }

  async reconfigureStandbyNodes(newMasterId) {
    const newMaster = this.nodes.get(newMasterId);

    const standbyPromises = Array.from(this.nodes.values())
      .filter((node) => node.id !== newMasterId && node.isHealthy)
      .map(async (node) => {
        try {
          // Reconfigure standby to follow new master
          await this.reconfigureStandbyReplication(node.id, newMaster);
          node.role = "standby";

          console.log(`Reconfigured ${node.id} to replicate from new master`);
        } catch (error) {
          console.error(`Failed to reconfigure standby ${node.id}:`, error);
        }
      });

    await Promise.all(standbyPromises);
  }

  async reconfigureStandbyReplication(standbyId, newMaster) {
    // Implementation depends on PostgreSQL setup (streaming replication, etc.)
    // This would typically involve updating recovery configuration
    // and restarting replication connections
  }

  // Application interface - automatically routes to current master
  async executeWrite(query, params = []) {
    const masterNode = this.nodes.get(this.currentMaster);

    if (!masterNode || !masterNode.isHealthy) {
      throw new Error("No healthy master node available");
    }

    return await masterNode.pool.query(query, params);
  }

  async executeRead(query, params = []) {
    // Try to use a healthy standby first to reduce master load
    const healthyStandbys = Array.from(this.nodes.values())
      .filter(
        (node) =>
          node.id !== this.currentMaster && node.isHealthy && node.lagMs < 5000 // Less than 5 second lag
      )
      .sort((a, b) => a.lagMs - b.lagMs);

    if (healthyStandbys.length > 0) {
      const selectedStandby = healthyStandbys[0];
      return await selectedStandby.pool.query(query, params);
    }

    // Fallback to master if no suitable standby
    return await this.executeWrite(query, params);
  }

  getClusterStatus() {
    const nodeStatuses = Array.from(this.nodes.entries()).map(
      ([nodeId, node]) => ({
        id: nodeId,
        role: node.role,
        isHealthy: node.isHealthy,
        isCurrent: nodeId === this.currentMaster,
        responseTime: node.responseTime,
        lagMs: node.lagMs,
        consecutiveFailures: node.consecutiveFailures,
      })
    );

    return {
      state: this.clusterState,
      currentMaster: this.currentMaster,
      failoverInProgress: this.failoverInProgress,
      nodes: nodeStatuses,
    };
  }

  // Integration hooks - wire these into your configuration store and paging/alerting backend
  async updateApplicationConfiguration(newMaster) {
    // e.g. update a service-discovery entry or rotate the connection-string secret
  }

  alertHighReplicationLag(laggingNodes) {}

  alertFailoverComplete(newMasterId, reason) {}

  alertFailoverFailed(error) {}
}
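
Because getClusterStatus() already summarizes node health, roles, and lag, exposing it over HTTP is a cheap way to let load balancers and dashboards watch failovers happen. A minimal sketch, assuming Express and the clustering class above:

const express = require("express");
const app = express();

const cluster = new HighAvailabilityCluster();

app.get("/health/db", (req, res) => {
  const status = cluster.getClusterStatus();
  const ok = status.state === "healthy" && !status.failoverInProgress;

  // Return 503 during failover so upstream health checks can shed traffic gracefully
  res.status(ok ? 200 : 503).json(status);
});

app.listen(8080);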

Performance Monitoring and Optimization at Scale

Comprehensive Performance Monitoring System

Real-time performance dashboard:

// Advanced performance monitoring for distributed databases
class DatabasePerformanceMonitor {
  constructor(databaseCluster) {
    this.cluster = databaseCluster;
    this.metrics = new Map();
    this.alertRules = new Map();
    this.performanceHistory = [];

    this.setupMetricsCollection();
    this.setupAlertingRules();
    this.startMonitoring();
  }

  setupMetricsCollection() {
    // Key performance indicators to track
    this.kpis = {
      // Query performance
      avgQueryTime: { threshold: 100, unit: "ms" },
      slowQueryCount: { threshold: 10, unit: "per_minute" },
      errorRate: { threshold: 0.01, unit: "percentage" }, // matches aggregate.errorRate below

      // Connection metrics
      activeConnections: { threshold: 180, unit: "count" },
      connectionWaitTime: { threshold: 50, unit: "ms" },
      connectionErrors: { threshold: 5, unit: "per_minute" },

      // Resource utilization
      cpuUsage: { threshold: 80, unit: "percentage" },
      memoryUsage: { threshold: 85, unit: "percentage" },
      diskUsage: { threshold: 90, unit: "percentage" },
      diskIO: { threshold: 1000, unit: "iops" },

      // Replication metrics
      replicationLagMs: { threshold: 5000, unit: "ms" },
      replicationErrors: { threshold: 1, unit: "per_minute" },

      // Business metrics
      userQueriesPerSecond: { threshold: 1000, unit: "qps" },
      dataGrowthRate: { threshold: 10, unit: "gb_per_day" },
    };
  }

  async startMonitoring() {
    // Collect metrics every 30 seconds
    setInterval(async () => {
      await this.collectMetrics();
    }, 30000);

    // Generate performance reports every hour
    setInterval(async () => {
      await this.generatePerformanceReport();
    }, 3600000);

    // Check for optimization opportunities daily
    setInterval(async () => {
      await this.analyzeOptimizationOpportunities();
    }, 24 * 3600000);
  }

  async collectMetrics() {
    const timestamp = Date.now();
    const clusterMetrics = {
      timestamp,
      nodes: new Map(),
      aggregate: {},
    };

    // Collect metrics from each node
    for (const [nodeId, node] of this.cluster.nodes) {
      if (!node.isHealthy) continue;

      try {
        const nodeMetrics = await this.collectNodeMetrics(nodeId, node);
        clusterMetrics.nodes.set(nodeId, nodeMetrics);
      } catch (error) {
        console.error(`Failed to collect metrics for ${nodeId}:`, error);
      }
    }

    // Calculate aggregate metrics
    clusterMetrics.aggregate = this.calculateAggregateMetrics(
      clusterMetrics.nodes
    );

    // Store in history
    this.performanceHistory.push(clusterMetrics);

    // Keep only last 24 hours of data
    const cutoff = timestamp - 24 * 60 * 60 * 1000;
    this.performanceHistory = this.performanceHistory.filter(
      (m) => m.timestamp > cutoff
    );

    // Check alert conditions
    await this.checkAlertConditions(clusterMetrics.aggregate);

    // Send to external monitoring system
    await this.sendToMonitoringSystem(clusterMetrics);
  }

  async collectNodeMetrics(nodeId, node) {
    const metrics = {};

    // Database-specific metrics
    const dbStatsResult = await node.pool.query(`
      SELECT 
        -- Connection stats
        (SELECT count(*) FROM pg_stat_activity WHERE state = 'active') as active_connections,
        (SELECT count(*) FROM pg_stat_activity WHERE state = 'idle in transaction') as idle_in_transaction,
        
        -- Query stats  
        (SELECT avg(mean_exec_time) FROM pg_stat_statements WHERE calls > 10) as avg_query_time,
        (SELECT count(*) FROM pg_stat_statements WHERE mean_exec_time > 1000) as slow_query_count,
        
        -- Database size
        pg_database_size(current_database()) as db_size_bytes,
        
        -- Cache hit ratio
        (SELECT sum(heap_blks_hit) / nullif(sum(heap_blks_hit + heap_blks_read), 0) FROM pg_statio_user_tables) as cache_hit_ratio,
        
        -- Transaction stats
        (SELECT sum(xact_commit) FROM pg_stat_database WHERE datname = current_database()) as commits,
        (SELECT sum(xact_rollback) FROM pg_stat_database WHERE datname = current_database()) as rollbacks,
        
        -- Lock stats
        (SELECT count(*) FROM pg_locks WHERE NOT granted) as waiting_locks,
        
        -- Replication lag (if standby)
        CASE WHEN pg_is_in_recovery() THEN 
          EXTRACT(EPOCH FROM (now() - pg_last_xact_replay_timestamp())) * 1000
        ELSE 0 END as replication_lag_ms
    `);

    const row = dbStatsResult.rows[0];

    metrics.database = {
      activeConnections: parseInt(row.active_connections) || 0,
      idleInTransaction: parseInt(row.idle_in_transaction) || 0,
      avgQueryTime: parseFloat(row.avg_query_time) || 0,
      slowQueryCount: parseInt(row.slow_query_count) || 0,
      dbSizeGB: parseFloat(row.db_size_bytes) / (1024 * 1024 * 1024),
      cacheHitRatio: parseFloat(row.cache_hit_ratio) || 0,
      commits: parseInt(row.commits) || 0,
      rollbacks: parseInt(row.rollbacks) || 0,
      waitingLocks: parseInt(row.waiting_locks) || 0,
      replicationLagMs: parseFloat(row.replication_lag_ms) || 0,
    };

    // System metrics (would typically come from system monitoring agent)
    metrics.system = await this.collectSystemMetrics(nodeId);

    // Application metrics
    metrics.application = await this.collectApplicationMetrics(nodeId);

    return metrics;
  }

  async collectSystemMetrics(nodeId) {
    // In production, this would integrate with system monitoring tools
    // like Prometheus, DataDog agents, or CloudWatch
    return {
      cpuUsage: Math.random() * 100, // Placeholder
      memoryUsage: Math.random() * 100,
      diskUsage: Math.random() * 100,
      diskIOPS: Math.random() * 1000,
      networkBytesPerSec: Math.random() * 1024 * 1024 * 100, // 100MB/s max
    };
  }

  async collectApplicationMetrics(nodeId) {
    // Application-level metrics from your monitoring system
    return {
      requestsPerSecond: Math.random() * 1000,
      errorRate: Math.random() * 0.05, // 5% max
      responseTime: Math.random() * 200, // 200ms max
    };
  }

  calculateAggregateMetrics(nodeMetrics) {
    const values = Array.from(nodeMetrics.values());

    if (values.length === 0) return {};

    const aggregate = {};

    // Sum metrics
    const sumFields = [
      "activeConnections",
      "slowQueryCount",
      "commits",
      "rollbacks",
    ];
    sumFields.forEach((field) => {
      aggregate[field] = values.reduce(
        (sum, node) => sum + (node.database[field] || 0),
        0
      );
    });

    // Average metrics
    const avgFields = [
      "avgQueryTime",
      "cacheHitRatio",
      "cpuUsage",
      "memoryUsage",
      "diskUsage",
    ];
    avgFields.forEach((field) => {
      const sourceObj = field.includes("Usage") ? "system" : "database";
      const validValues = values
        .map((node) => node[sourceObj][field])
        .filter((val) => val != null && !isNaN(val));

      if (validValues.length > 0) {
        aggregate[field] =
          validValues.reduce((sum, val) => sum + val, 0) / validValues.length;
      } else {
        aggregate[field] = 0;
      }
    });

    // Max metrics
    const maxFields = ["replicationLagMs", "waitingLocks"];
    maxFields.forEach((field) => {
      aggregate[field] = Math.max(
        ...values.map((node) => node.database[field] || 0)
      );
    });

    // Calculate derived metrics
    aggregate.errorRate =
      aggregate.rollbacks /
      Math.max(aggregate.commits + aggregate.rollbacks, 1);
    aggregate.connectionUtilization =
      aggregate.activeConnections / (values.length * 200); // Assume 200 max per node

    return aggregate;
  }

  async checkAlertConditions(metrics) {
    const alerts = [];

    // Check each KPI against thresholds
    Object.entries(this.kpis).forEach(([metricName, config]) => {
      const currentValue = metrics[metricName];

      if (currentValue != null && currentValue > config.threshold) {
        alerts.push({
          metric: metricName,
          currentValue,
          threshold: config.threshold,
          unit: config.unit,
          severity: this.calculateAlertSeverity(currentValue, config.threshold),
          timestamp: Date.now(),
        });
      }
    });

    // Send alerts if any
    if (alerts.length > 0) {
      await this.sendAlerts(alerts);
    }

    return alerts;
  }

  calculateAlertSeverity(value, threshold) {
    const ratio = value / threshold;

    if (ratio > 2) return "critical";
    if (ratio > 1.5) return "warning";
    return "info";
  }

  async generatePerformanceReport() {
    const now = Date.now();
    const oneHourAgo = now - 60 * 60 * 1000;

    // Get metrics from last hour
    const hourlyMetrics = this.performanceHistory.filter(
      (m) => m.timestamp >= oneHourAgo
    );

    if (hourlyMetrics.length < 2) return; // Not enough data

    const report = {
      period: { start: oneHourAgo, end: now },
      summary: this.calculatePeriodSummary(hourlyMetrics),
      trends: this.calculateTrends(hourlyMetrics),
      topQueries: await this.getTopQueries(),
      recommendations: await this.generateRecommendations(hourlyMetrics),
    };

    console.log("=== Hourly Performance Report ===");
    console.log(
      `Period: ${new Date(oneHourAgo).toISOString()} - ${new Date(
        now
      ).toISOString()}`
    );
    console.log(
      `Average Query Time: ${report.summary.avgQueryTime.toFixed(2)}ms`
    );
    console.log(`Slow Queries: ${report.summary.slowQueryCount}`);
    console.log(
      `Connection Utilization: ${(
        report.summary.connectionUtilization * 100
      ).toFixed(1)}%`
    );
    console.log(
      `Cache Hit Ratio: ${(report.summary.cacheHitRatio * 100).toFixed(1)}%`
    );

    if (report.recommendations.length > 0) {
      console.log("\n=== Recommendations ===");
      report.recommendations.forEach((rec, index) => {
        console.log(`${index + 1}. ${rec.title}: ${rec.description}`);
      });
    }

    // Store report for later analysis
    await this.storePerformanceReport(report);
  }

  async analyzeOptimizationOpportunities() {
    console.log("Analyzing optimization opportunities...");

    const opportunities = [];

    // Check for missing indexes
    const missingIndexes = await this.findMissingIndexes();
    opportunities.push(...missingIndexes);

    // Check for unused indexes
    const unusedIndexes = await this.findUnusedIndexes();
    opportunities.push(...unusedIndexes);

    // Check for table bloat
    const bloatedTables = await this.findBloatedTables();
    opportunities.push(...bloatedTables);

    // Check for inefficient queries
    const inefficientQueries = await this.findInefficientQueries();
    opportunities.push(...inefficientQueries);

    if (opportunities.length > 0) {
      console.log(`Found ${opportunities.length} optimization opportunities:`);
      opportunities.forEach((opp, index) => {
        console.log(
          `${index + 1}. ${opp.type}: ${opp.description} (Impact: ${
            opp.impact
          })`
        );
      });

      // Generate optimization script
      await this.generateOptimizationScript(opportunities);
    } else {
      console.log("No immediate optimization opportunities found.");
    }
  }

  async findMissingIndexes() {
    const opportunities = [];

    // This would analyze query patterns and suggest indexes
    // Implementation would examine pg_stat_statements and identify
    // frequently executed queries with high cost sequential scans

    return opportunities;
  }
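
  // A minimal, hedged heuristic sketch (an assumed helper, not from the original):
  // tables that are sequentially scanned very often, reading many rows per scan,
  // are strong candidates for new indexes; pg_stat_statements can then reveal
  // which WHERE clauses to index.
  async findSequentialScanHotspots() {
    const result = await this.cluster.executeRead(`
      SELECT relname, seq_scan, seq_tup_read
      FROM pg_stat_user_tables
      WHERE seq_scan > 1000
        AND seq_tup_read / GREATEST(seq_scan, 1) > 10000
      ORDER BY seq_tup_read DESC
      LIMIT 10
    `);

    return result.rows.map((row) => ({
      type: "missing_index",
      description: `Table ${row.relname} averages ${Math.round(
        row.seq_tup_read / row.seq_scan
      )} rows per sequential scan (${row.seq_scan} scans)`,
      impact: "high",
      action: `Review the most frequent filters on ${row.relname} and add a matching index`,
    }));
  }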

  async findUnusedIndexes() {
    const opportunities = [];

    try {
      const result = await this.cluster.executeRead(`
        SELECT 
          schemaname, 
          relname AS tablename, 
          indexrelname AS indexname,
          idx_scan,
          pg_size_pretty(pg_relation_size(indexrelid)) as size
        FROM pg_stat_user_indexes
        WHERE idx_scan < 10 
          AND pg_relation_size(indexrelid) > 1024 * 1024 -- Larger than 1MB
        ORDER BY pg_relation_size(indexrelid) DESC
      `);

      result.rows.forEach((row) => {
        opportunities.push({
          type: "unused_index",
          description: `Index ${row.indexname} on ${row.tablename} is rarely used (${row.idx_scan} scans) but occupies ${row.size}`,
          impact: "medium",
          action: `Consider dropping it: DROP INDEX CONCURRENTLY ${row.indexname};`,
        });
      });
    } catch (error) {
      console.error("Failed to find unused indexes:", error);
    }

    return opportunities;
  }

  async findBloatedTables() {
    const opportunities = [];

    try {
      const result = await this.cluster.executeRead(`
        SELECT 
          schemaname,
          relname AS tablename,
          n_dead_tup,
          n_live_tup,
          CASE WHEN n_live_tup > 0 
            THEN n_dead_tup::float / n_live_tup::float 
            ELSE 0 
          END as bloat_ratio
        FROM pg_stat_user_tables
        WHERE n_dead_tup > 10000
          AND n_dead_tup::float / GREATEST(n_live_tup::float, 1) > 0.1
        ORDER BY bloat_ratio DESC
      `);

      result.rows.forEach((row) => {
        opportunities.push({
          type: "table_bloat",
          description: `Table ${row.tablename} has high bloat ratio (${(
            row.bloat_ratio * 100
          ).toFixed(1)}%) with ${row.n_dead_tup} dead tuples`,
          impact: "high",
          action: `Schedule VACUUM FULL ${row.tablename}; or REINDEX TABLE ${row.tablename}; during a maintenance window (both take heavy locks), or tune autovacuum so dead tuples are reclaimed continuously`,
        });
      });
    } catch (error) {
      console.error("Failed to find bloated tables:", error);
    }

    return opportunities;
  }
}

// Usage
const performanceMonitor = new DatabasePerformanceMonitor(databaseCluster);

// API endpoints for performance dashboard
app.get("/admin/performance", async (req, res) => {
  try {
    const status = performanceMonitor.cluster.getClusterStatus();
    const recentMetrics = performanceMonitor.performanceHistory.slice(-10);

    res.json({
      clusterStatus: status,
      recentMetrics,
      kpis: performanceMonitor.kpis,
    });
  } catch (error) {
    res.status(500).json({ error: "Failed to get performance data" });
  }
});
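
A companion endpoint can make the optimization analysis available on demand. This is a hedged sketch; the route path is an assumption, and it simply reuses the monitor created above:

// Hypothetical admin route: trigger the optimization analysis manually
app.post("/admin/performance/analyze", async (req, res) => {
  try {
    await performanceMonitor.analyzeOptimizationOpportunities();
    res.json({ status: "analysis complete" });
  } catch (error) {
    res.status(500).json({ error: "Optimization analysis failed" });
  }
});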

Key Takeaways

Database scaling is what separates applications that merely grow from applications that keep performing as they grow. Read replicas distribute query load across multiple servers, sharding distributes data horizontally across nodes, clustering provides high availability, and comprehensive monitoring surfaces bottlenecks before performance degrades at scale.

The database scaling mindset you need:

  • Horizontal beats vertical: Adding more servers scales better than adding more power to one server
  • Read scaling comes first: Most applications are read-heavy, so read replicas provide immediate scaling benefits (see the routing sketch after this list)
  • Sharding is inevitable: Eventually, your data will outgrow any single database server’s capabilities
  • High availability requires redundancy: Single points of failure will cause outages, so design for automatic failover
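
To make the read-scaling point concrete, here is a minimal read/write routing sketch using node-postgres; the hostnames and the round-robin replica selection are illustrative assumptions, not the cluster implementation shown earlier:

const { Pool } = require("pg");

// Hypothetical pools: all writes go to the primary, reads rotate across replicas
const primary = new Pool({ host: "db-primary.internal" });
const replicas = [
  new Pool({ host: "db-replica-1.internal" }),
  new Pool({ host: "db-replica-2.internal" }),
];
let nextReplica = 0;

const executeWrite = (sql, params) => primary.query(sql, params);
const executeRead = (sql, params) =>
  replicas[nextReplica++ % replicas.length].query(sql, params);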

What distinguishes scalable database architectures:

  • Read replica strategies that handle 10x-100x query load increases without degrading performance
  • Sharding systems that distribute data intelligently while maintaining query performance
  • Clustering configurations that provide automatic failover with minimal downtime
  • Monitoring systems that identify performance bottlenecks before they impact users

Series Conclusion

We’ve completed our comprehensive journey through database fundamentals, advanced data management, and scaling strategies. You now possess the complete toolkit for building data architectures that scale from thousands to millions of users while maintaining performance, consistency, and reliability.

The complete database architect’s mastery:

  • Relational expertise with complex queries, transactions, and data integrity
  • NoSQL proficiency across document, key-value, column-family, and graph paradigms
  • Advanced data orchestration through caching, search, synchronization, and event processing
  • Scaling architectures with read replicas, sharding, clustering, and performance monitoring
  • Operational excellence in connection management, migrations, backups, and optimization

What’s Next

With database mastery complete, we move into the security and authentication phase of backend development. You’ll learn to protect applications and data with proper authentication systems, authorization patterns, secure coding practices, and threat mitigation strategies.

Data is the foundation. Security is the protection. Together, they enable you to build backend systems that not only perform flawlessly at scale but also protect user data and maintain trust in an increasingly connected world.

You’re no longer just storing and querying data; you’re architecting information systems that scale horizontally, perform consistently, and keep operating when individual nodes fail. The data layer is complete. Now we secure what we’ve built.