Database Scaling & Performance
From Database Performance to Infinite Scale
You’ve mastered database fundamentals, implemented intelligent caching, built advanced data management systems with search and synchronization, and created event-driven architectures that process millions of background jobs. Your applications handle complex data orchestration with ease. But here’s the scaling reality that every successful application eventually faces: no matter how well you optimize individual queries or implement caching strategies, there comes a point where a single database server simply cannot handle your application’s growth.
The single database bottleneck that kills scaling dreams:
// Your perfectly optimized application works beautifully
const getUserFeed = async (userId, page = 1) => {
// Cached, optimized, indexed - everything done right
const feedKey = `user:${userId}:feed:${page}`;
let feed = await cache.get(feedKey);
if (!feed) {
// Optimized query with proper indexes
const user = await db.users.findOne({ _id: userId });
feed = await db.posts
.find({
$or: [
{ authorId: { $in: user.following } },
{ tags: { $in: user.interests } },
],
})
.sort({ createdAt: -1, engagement: -1 })
.skip((page - 1) * 20)
.limit(20);
await cache.set(feedKey, feed, 300);
}
return feed;
};
// This works perfectly... until it doesn't:
// - 100 users: 50ms average response time
// - 10,000 users: 200ms average response time
// - 100,000 users: 2 second response time
// - 1,000,000 users: Database crashes under load
// No amount of optimization can save you from the fundamental limits:
// - Single database server has finite CPU, memory, and I/O
// - Concurrent connections max out (even with pooling)
// - Storage performance degrades with dataset size
// - Backup and maintenance windows become unacceptable
The scaling wall every growing application hits:
-- Your database performance metrics start looking like this:
-- Query performance degrades with table size
SELECT * FROM posts WHERE author_id IN (...)
ORDER BY created_at DESC LIMIT 20;
-- 1 million posts: 10ms
-- 10 million posts: 100ms
-- 100 million posts: 2000ms
-- 500 million posts: Query timeout
-- Connection pool exhaustion
SHOW PROCESSLIST;
-- 200 connections running simultaneously
-- New connections getting "Too many connections" error
-- Application threads blocked waiting for database connections
-- Lock contention becomes severe
SHOW ENGINE INNODB STATUS;
-- Deadlocks detected: 45 per minute
-- Lock wait timeouts: 120 per minute
-- Users experiencing random "Database error" messages
The uncomfortable scaling truth: Even perfectly architected applications with intelligent caching, optimized queries, and proper indexing will eventually exhaust the capabilities of a single database server. Modern applications serving millions of users require distributed database architectures that can scale horizontally across multiple servers while maintaining consistency, availability, and performance.
Database scaling mastery requires understanding:
- Horizontal scaling strategies that distribute data across multiple database servers
- Read replica architectures that handle read-heavy workloads by distributing queries
- Sharding and partitioning patterns that split large datasets intelligently across nodes
- Database clustering techniques that provide high availability and automatic failover
- Performance monitoring systems that identify bottlenecks before they impact users
This article completes your database education by transforming you from a single-database architect into a distributed systems engineer. You’ll learn to design database infrastructures that scale to handle billions of records and millions of concurrent users while maintaining the performance characteristics of much smaller systems.
Read Replicas: Distributing Read Traffic
Master-Slave Architecture for Read Scaling
The read-heavy application challenge:
// Typical web application read/write ratio: 90% reads, 10% writes
const appMetrics = {
totalQueries: 1000000,
readQueries: 900000, // User feeds, search, profiles, content
writeQueries: 100000, // Posts, likes, comments, updates
};
// Single database handling both reads and writes
const handleUserRequest = async (req, res) => {
// Every request hits the same database server
// Reads compete with writes for resources
// Database CPU: 95%
// Database Memory: 90%
// Connection Pool: 180/200 connections used
// Users experiencing slow response times
};
// Even with perfect caching, cache misses and real-time data still overwhelm the database
Professional read replica architecture:
// ✅ Multi-database architecture for massive read scaling
const { Pool } = require("pg");
class DatabaseCluster {
constructor() {
// Master database for writes
this.master = new Pool({
host: process.env.MASTER_DB_HOST,
port: process.env.MASTER_DB_PORT,
database: process.env.DB_NAME,
user: process.env.DB_USER,
password: process.env.DB_PASSWORD,
max: 20, // Smaller pool for writes
ssl: { rejectUnauthorized: false },
});
// Read replicas for distributed read traffic
this.readReplicas = [
new Pool({
host: process.env.READ_REPLICA_1_HOST,
port: process.env.READ_REPLICA_1_PORT,
database: process.env.DB_NAME,
user: process.env.DB_USER,
password: process.env.DB_PASSWORD,
max: 50, // Larger pools for reads
ssl: { rejectUnauthorized: false },
}),
new Pool({
host: process.env.READ_REPLICA_2_HOST,
port: process.env.READ_REPLICA_2_PORT,
database: process.env.DB_NAME,
user: process.env.DB_USER,
password: process.env.DB_PASSWORD,
max: 50,
ssl: { rejectUnauthorized: false },
}),
new Pool({
host: process.env.READ_REPLICA_3_HOST,
port: process.env.READ_REPLICA_3_PORT,
database: process.env.DB_NAME,
user: process.env.DB_USER,
password: process.env.DB_PASSWORD,
max: 50,
ssl: { rejectUnauthorized: false },
}),
];
this.currentReplicaIndex = 0;
this.replicaHealthStatus = new Map();
this.setupHealthChecks(); // read fallback to the master is handled inside executeRead
}
// Route writes to master
async executeWrite(query, params = []) {
const startTime = Date.now();
try {
const result = await this.master.query(query, params);
this.logQueryPerformance("write", query, Date.now() - startTime);
return result;
} catch (error) {
console.error("Write query failed:", error);
throw error;
}
}
// Route reads to replicas with load balancing
async executeRead(query, params = [], options = {}) {
const startTime = Date.now();
const maxRetries = options.maxRetries || 3;
let lastError;
for (let attempt = 0; attempt < maxRetries; attempt++) {
const replica = this.selectHealthyReplica(options.preferredReplica);
try {
const result = await replica.query(query, params);
this.logQueryPerformance(
"read",
query,
Date.now() - startTime,
replica
);
return result;
} catch (error) {
console.warn(`Read replica failed (attempt ${attempt + 1}):`, error);
lastError = error;
// Mark replica as potentially unhealthy
this.markReplicaUnhealthy(replica);
}
}
// All replicas failed - fallback to master
console.warn("All read replicas failed, falling back to master");
try {
const result = await this.master.query(query, params);
this.logQueryPerformance("read_fallback", query, Date.now() - startTime);
return result;
} catch (error) {
console.error("Master fallback also failed:", error);
throw lastError || error;
}
}
selectHealthyReplica(preferredIndex = null) {
// Use preferred replica if specified and healthy
if (
preferredIndex !== null &&
preferredIndex < this.readReplicas.length &&
this.isReplicaHealthy(this.readReplicas[preferredIndex])
) {
return this.readReplicas[preferredIndex];
}
// Find healthy replicas
const healthyReplicas = this.readReplicas.filter((replica) =>
this.isReplicaHealthy(replica)
);
if (healthyReplicas.length === 0) {
throw new Error("No healthy read replicas available");
}
// Round-robin load balancing among healthy replicas
const replica =
healthyReplicas[this.currentReplicaIndex % healthyReplicas.length];
this.currentReplicaIndex++;
return replica;
}
isReplicaHealthy(replica) {
const replicaId = this.getReplicaId(replica);
const healthStatus = this.replicaHealthStatus.get(replicaId);
if (!healthStatus) return true; // Assume healthy if no status recorded
// Check if unhealthy period has expired
if (
healthStatus.unhealthyUntil &&
Date.now() > healthStatus.unhealthyUntil
) {
this.replicaHealthStatus.delete(replicaId);
return true;
}
return healthStatus.isHealthy !== false;
}
markReplicaUnhealthy(replica) {
const replicaId = this.getReplicaId(replica);
this.replicaHealthStatus.set(replicaId, {
isHealthy: false,
unhealthyUntil: Date.now() + 30 * 1000, // Retry after 30 seconds
});
}
getReplicaId(replica) {
return `${replica.options.host}:${replica.options.port}`;
}
setupHealthChecks() {
// Regular health checks for all replicas
setInterval(async () => {
const healthPromises = this.readReplicas.map(async (replica, index) => {
try {
const start = Date.now();
await replica.query("SELECT 1 as health_check");
const latency = Date.now() - start;
const replicaId = this.getReplicaId(replica);
this.replicaHealthStatus.set(replicaId, {
isHealthy: true,
latency,
lastChecked: Date.now(),
});
return { index, status: "healthy", latency };
} catch (error) {
console.error(`Health check failed for replica ${index}:`, error);
this.markReplicaUnhealthy(replica);
return { index, status: "unhealthy", error: error.message };
}
});
const results = await Promise.all(healthPromises);
// Log health status
const healthySummary = results.filter((r) => r.status === "healthy");
console.log(
`Replica health check: ${healthySummary.length}/${results.length} healthy`
);
// Alert if too many replicas are unhealthy
if (healthySummary.length < this.readReplicas.length * 0.5) {
console.error("ALERT: More than 50% of read replicas are unhealthy!");
this.sendHealthAlert(results);
}
}, 30000); // Check every 30 seconds
}
// Transaction support (must use master)
async withTransaction(callback) {
const client = await this.master.connect();
try {
await client.query("BEGIN");
const result = await callback(client);
await client.query("COMMIT");
return result;
} catch (error) {
await client.query("ROLLBACK");
throw error;
} finally {
client.release();
}
}
// Connection management
async closeAll() {
await this.master.end();
await Promise.all(this.readReplicas.map((replica) => replica.end()));
}
logQueryPerformance(type, query, duration, replica = null) {
const sanitizedQuery = query.substring(0, 100);
if (duration > 1000) {
// Log slow queries
console.warn(`Slow ${type} query (${duration}ms):`, {
query: sanitizedQuery,
replica: replica ? this.getReplicaId(replica) : "master",
});
}
// Send to monitoring system
this.sendMetric(`database.query.${type}.duration`, duration);
this.sendMetric(`database.query.${type}.count`, 1);
}
sendMetric(metricName, value) {
// Placeholder hook - forward to your metrics backend (StatsD, Prometheus, CloudWatch, etc.)
}
sendHealthAlert(healthResults) {
// Placeholder hook - page the on-call team via your alerting integration
}
}
// Usage in application
const db = new DatabaseCluster();
// Data access layer that automatically routes queries
class UserService {
async getUserProfile(userId) {
// Read operation - goes to replica
const result = await db.executeRead(
"SELECT id, username, email, profile_data, created_at FROM users WHERE id = $1",
[userId]
);
return result.rows[0];
}
async updateUserProfile(userId, updates) {
// Write operation - goes to master
const result = await db.executeWrite(
`UPDATE users
SET profile_data = profile_data || $1, updated_at = NOW()
WHERE id = $2
RETURNING id, username, profile_data`,
[JSON.stringify(updates), userId]
);
return result.rows[0];
}
async getUserFeed(userId, page = 1) {
// Complex read - goes to replica
const result = await db.executeRead(
`SELECT p.id, p.title, p.content_preview, p.created_at,
u.username, u.avatar_url
FROM posts p
JOIN users u ON p.author_id = u.id
WHERE p.author_id IN (
SELECT followed_id FROM user_follows WHERE follower_id = $1
)
ORDER BY p.created_at DESC
LIMIT 20 OFFSET $2`,
[userId, (page - 1) * 20]
);
return result.rows;
}
async createPost(userId, postData) {
// Write operation with transaction - uses master
return await db.withTransaction(async (client) => {
// Insert post
const postResult = await client.query(
`INSERT INTO posts (author_id, title, content, created_at)
VALUES ($1, $2, $3, NOW())
RETURNING id, title, created_at`,
[userId, postData.title, postData.content]
);
// Update user stats
await client.query(
"UPDATE users SET post_count = post_count + 1 WHERE id = $1",
[userId]
);
return postResult.rows[0];
});
}
}
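Wiring this data access layer into HTTP handlers is straightforward because the routing decisions live inside the cluster class. A minimal usage sketch, assuming a hypothetical Express app alongside the UserService above:
const express = require("express");
const app = express();
app.use(express.json());

const userService = new UserService();

// Reads are served by replicas behind the scenes
app.get("/users/:id", async (req, res) => {
  try {
    const profile = await userService.getUserProfile(req.params.id);
    if (!profile) return res.status(404).json({ error: "User not found" });
    res.json(profile);
  } catch (error) {
    res.status(500).json({ error: "Failed to load profile" });
  }
});

// Writes (and their transactions) are routed to the master
app.post("/users/:id/posts", async (req, res) => {
  try {
    const post = await userService.createPost(req.params.id, req.body);
    res.status(201).json(post);
  } catch (error) {
    res.status(500).json({ error: "Failed to create post" });
  }
});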
Read Replica Lag Management
Handling eventual consistency:
// Managing read replica lag for time-sensitive operations
class ConsistencyManager {
constructor(dbCluster) {
this.db = dbCluster;
this.lagThreshold = 5000; // 5 seconds max acceptable lag
this.replicaLagStatus = new Map();
this.monitorReplicationLag();
}
async monitorReplicationLag() {
setInterval(async () => {
const lagPromises = this.db.readReplicas.map(async (replica, index) => {
try {
// Check replication lag (PostgreSQL specific)
const result = await replica.query(`
SELECT EXTRACT(EPOCH FROM (now() - pg_last_xact_replay_timestamp())) as lag_seconds
`);
const lagMs = parseFloat(result.rows[0].lag_seconds) * 1000; // Convert seconds to ms
const replicaId = this.db.getReplicaId(replica);
this.replicaLagStatus.set(replicaId, {
lag: lagMs,
timestamp: Date.now(),
acceptable: lagMs < this.lagThreshold,
});
return { index, lag: lagMs };
} catch (error) {
console.error(
`Failed to check replica lag for replica ${index}:`,
error
);
return { index, lag: Infinity, error: error.message };
}
});
const lagResults = await Promise.all(lagPromises);
// Alert on high lag
const highLagReplicas = lagResults.filter(
(r) => r.lag > this.lagThreshold
);
if (highLagReplicas.length > 0) {
console.warn(`High replication lag detected:`, highLagReplicas);
}
}, 10000); // Check every 10 seconds
}
// Force read from master for operations requiring strong consistency
async executeStronglyConsistentRead(query, params = []) {
return await this.db.executeWrite(query, params); // Use master connection
}
// Read from replica but verify freshness
async executeEventuallyConsistentRead(
query,
params = [],
maxLagMs = this.lagThreshold
) {
// Find replica with acceptable lag
const acceptableReplica = Array.from(this.replicaLagStatus.entries()).find(
([replicaId, status]) => status.lag <= maxLagMs
);
if (acceptableReplica) {
const replicaIndex = this.db.readReplicas.findIndex(
(replica) => this.db.getReplicaId(replica) === acceptableReplica[0]
);
return await this.db.executeRead(query, params, {
preferredReplica: replicaIndex,
});
}
// No acceptable replica - use master
console.warn("No replicas with acceptable lag, using master for read");
return await this.executeStronglyConsistentRead(query, params);
}
}
// Usage in services
class PostService {
constructor(dbCluster) {
this.db = dbCluster;
this.consistencyManager = new ConsistencyManager(dbCluster);
}
async createPost(userId, postData) {
const post = await this.db.withTransaction(async (client) => {
const result = await client.query(
"INSERT INTO posts (author_id, title, content) VALUES ($1, $2, $3) RETURNING *",
[userId, postData.title, postData.content]
);
return result.rows[0];
});
return post;
}
async getPostImmediatelyAfterCreation(postId) {
// User just created this post and wants to see it immediately
// Must read from master to avoid "post not found" due to replication lag
return await this.consistencyManager.executeStronglyConsistentRead(
"SELECT * FROM posts WHERE id = $1",
[postId]
);
}
async getPublicPostFeed() {
// Feed can tolerate some lag for better performance
return await this.consistencyManager.executeEventuallyConsistentRead(
"SELECT * FROM posts WHERE published = true ORDER BY created_at DESC LIMIT 20",
[],
10000 // 10 seconds acceptable lag
);
}
}
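Another widely used way to handle the same problem is short-lived read-your-own-writes pinning: after a user writes, route that user's reads to the master for a few seconds instead of deciding per query. A minimal sketch, assuming a single application instance (a shared store such as Redis would be needed across instances); the class and parameter names are illustrative:
class ReadYourWritesRouter {
  constructor(dbCluster, pinMs = 5000) {
    this.db = dbCluster;
    this.pinMs = pinMs;
    this.lastWriteAt = new Map(); // userId -> timestamp of the user's last write
  }

  async write(userId, query, params = []) {
    const result = await this.db.executeWrite(query, params);
    this.lastWriteAt.set(userId, Date.now());
    return result;
  }

  async read(userId, query, params = []) {
    const last = this.lastWriteAt.get(userId) || 0;
    if (Date.now() - last < this.pinMs) {
      // Recent write: read from the master so the user always sees their own changes
      return this.db.executeWrite(query, params);
    }
    return this.db.executeRead(query, params);
  }
}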
Database Sharding: Horizontal Data Distribution
Sharding Strategies and Implementation
The single-database size limit:
// Your application growth trajectory
const dataGrowth = {
month1: { users: 1000, posts: 5000, dbSize: "1GB" },
month6: { users: 50000, posts: 500000, dbSize: "25GB" },
year1: { users: 500000, posts: 10000000, dbSize: "500GB" },
year2: { users: 2000000, posts: 50000000, dbSize: "2TB" },
year3: { users: 5000000, posts: 200000000, dbSize: "10TB" },
// Single database performance degrades exponentially with size
// Backup times: 1GB (1 min) → 10TB (8+ hours)
// Index rebuild times: 25GB (5 mins) → 10TB (12+ hours)
// Query performance tanks as indexes no longer fit in memory
};
Professional horizontal sharding architecture:
// ✅ Distributed sharding system for infinite horizontal scale
class ShardManager {
constructor() {
this.shards = new Map();
this.shardingStrategy = "hash"; // hash, range, directory
this.virtualShards = 1024; // Virtual shards for easier rebalancing
this.setupShards();
}
setupShards() {
// Define physical database shards
const shardConfigs = [
{
id: "shard_0",
host: "shard-0.db.cluster.local",
port: 5432,
ranges: [0, 255], // Virtual shard ranges
},
{
id: "shard_1",
host: "shard-1.db.cluster.local",
port: 5432,
ranges: [256, 511],
},
{
id: "shard_2",
host: "shard-2.db.cluster.local",
port: 5432,
ranges: [512, 767],
},
{
id: "shard_3",
host: "shard-3.db.cluster.local",
port: 5432,
ranges: [768, 1023],
},
];
// Initialize connection pools for each shard
shardConfigs.forEach((config) => {
this.shards.set(config.id, {
pool: new Pool({
host: config.host,
port: config.port,
database: process.env.DB_NAME,
user: process.env.DB_USER,
password: process.env.DB_PASSWORD,
max: 25, // Connections per shard
}),
ranges: config.ranges,
isHealthy: true,
lastHealthCheck: Date.now(),
});
});
this.startHealthMonitoring();
}
// Hash-based sharding for user data
getShardForUser(userId) {
// Consistent hashing to distribute users evenly
const hash = this.hashUserId(userId);
const virtualShard = hash % this.virtualShards;
// Find physical shard that contains this virtual shard
for (const [shardId, shard] of this.shards) {
if (virtualShard >= shard.ranges[0] && virtualShard <= shard.ranges[1]) {
return { shardId, shard: shard.pool };
}
}
throw new Error(`No shard found for user ${userId}`);
}
hashUserId(userId) {
// Simple hash function (use crypto.createHash in production)
let hash = 0;
const str = userId.toString();
for (let i = 0; i < str.length; i++) {
const char = str.charCodeAt(i);
hash = (hash << 5) - hash + char;
hash = hash & hash; // Convert to 32-bit integer
}
return Math.abs(hash);
}
// Execute query on specific shard
async executeOnShard(shardId, query, params = []) {
const shard = this.shards.get(shardId);
if (!shard) {
throw new Error(`Shard ${shardId} not found`);
}
if (!shard.isHealthy) {
throw new Error(`Shard ${shardId} is unhealthy`);
}
return await shard.pool.query(query, params);
}
// Execute query for specific user (automatically routes to correct shard)
async executeForUser(userId, query, params = []) {
const { shardId, shard } = this.getShardForUser(userId);
return await shard.query(query, params);
}
// Cross-shard queries (expensive - avoid when possible)
async executeAcrossAllShards(query, params = []) {
const promises = [];
for (const [shardId, shard] of this.shards) {
if (shard.isHealthy) {
promises.push(
shard.pool
.query(query, params)
.then((result) => ({
shardId,
result,
}))
.catch((error) => ({
shardId,
error,
}))
);
}
}
const results = await Promise.all(promises);
// Separate successful results from errors
const successful = results.filter((r) => !r.error);
const failed = results.filter((r) => r.error);
if (failed.length > 0) {
console.warn("Some shards failed during cross-shard query:", failed);
}
return successful;
}
// Aggregate results from multiple shards
async aggregateFromShards(query, params = [], aggregateFunction) {
const shardResults = await this.executeAcrossAllShards(query, params);
const allRows = shardResults.flatMap((result) => result.result.rows);
if (aggregateFunction) {
return aggregateFunction(allRows);
}
return allRows;
}
startHealthMonitoring() {
setInterval(async () => {
for (const [shardId, shard] of this.shards) {
try {
await shard.pool.query("SELECT 1");
shard.isHealthy = true;
shard.lastHealthCheck = Date.now();
} catch (error) {
console.error(`Shard ${shardId} health check failed:`, error);
shard.isHealthy = false;
// Alert on shard failure
this.alertShardFailure(shardId, error);
}
}
}, 30000);
}
alertShardFailure(shardId, error) {
console.error(`CRITICAL: Shard ${shardId} is down!`, error);
// Implement alerting logic here
}
}
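// Hedged alternative to the simple hashUserId above: Node's built-in crypto module
// gives a more uniform key distribution across the virtual shard space.
const crypto = require("crypto");

function hashUserIdCrypto(userId, virtualShards = 1024) {
  const digest = crypto.createHash("md5").update(String(userId)).digest();
  // Interpret the first 4 bytes of the digest as an unsigned 32-bit integer
  return digest.readUInt32BE(0) % virtualShards;
}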
// Sharded data access layer
class ShardedUserService {
constructor() {
this.shardManager = new ShardManager();
}
async createUser(userData) {
const userId = generateId(); // assumes an application-level unique ID helper (e.g., UUID v4)
const userWithId = { ...userData, id: userId, created_at: new Date() };
// Insert into appropriate shard
const result = await this.shardManager.executeForUser(
userId,
`INSERT INTO users (id, username, email, profile_data, created_at)
VALUES ($1, $2, $3, $4, $5)
RETURNING id, username, email, created_at`,
[
userId,
userData.username,
userData.email,
userData.profileData,
userWithId.created_at,
]
);
return result.rows[0];
}
async getUserById(userId) {
// Automatically routes to correct shard
const result = await this.shardManager.executeForUser(
userId,
"SELECT id, username, email, profile_data, created_at FROM users WHERE id = $1",
[userId]
);
return result.rows[0];
}
async updateUser(userId, updates) {
const result = await this.shardManager.executeForUser(
userId,
`UPDATE users
SET profile_data = profile_data || $1, updated_at = NOW()
WHERE id = $2
RETURNING id, username, email, profile_data`,
[JSON.stringify(updates), userId]
);
return result.rows[0];
}
// Cross-shard operations (expensive)
async searchUsersByUsername(username) {
const searchPattern = `%${username}%`;
const results = await this.shardManager.aggregateFromShards(
"SELECT id, username, email FROM users WHERE username ILIKE $1 LIMIT 10",
[searchPattern],
(allRows) => {
// Merge and sort results from all shards
return allRows
.sort((a, b) => a.username.localeCompare(b.username))
.slice(0, 10); // Final limit across all shards
}
);
return results;
}
async getUserStats() {
// Aggregate statistics across all shards
const results = await this.shardManager.aggregateFromShards(
`SELECT
COUNT(*) as total_users,
COUNT(CASE WHEN created_at > NOW() - INTERVAL '30 days' THEN 1 END) as new_users_30d,
COUNT(CASE WHEN last_login > NOW() - INTERVAL '7 days' THEN 1 END) as active_users_7d
FROM users`,
[],
(allRows) => {
// Aggregate counts from all shards
return allRows.reduce(
(totals, row) => ({
total_users: totals.total_users + parseInt(row.total_users),
new_users_30d: totals.new_users_30d + parseInt(row.new_users_30d),
active_users_7d:
totals.active_users_7d + parseInt(row.active_users_7d),
}),
{ total_users: 0, new_users_30d: 0, active_users_7d: 0 }
);
}
);
return results;
}
}
// Sharded posts service with relationship handling
class ShardedPostService {
constructor() {
this.shardManager = new ShardManager();
this.userService = new ShardedUserService();
}
async createPost(userId, postData) {
const postId = generateId();
// Posts are sharded by author_id to keep user's posts together
const result = await this.shardManager.executeForUser(
userId,
`INSERT INTO posts (id, author_id, title, content, created_at)
VALUES ($1, $2, $3, $4, NOW())
RETURNING id, title, content, created_at`,
[postId, userId, postData.title, postData.content]
);
return result.rows[0];
}
async getUserPosts(userId, page = 1, limit = 20) {
// All user's posts are on the same shard
const result = await this.shardManager.executeForUser(
userId,
`SELECT id, title, content, created_at, view_count, like_count
FROM posts
WHERE author_id = $1
ORDER BY created_at DESC
LIMIT $2 OFFSET $3`,
[userId, limit, (page - 1) * limit]
);
return result.rows;
}
// Complex cross-shard operation: building user feed
async getUserFeed(userId, page = 1, limit = 20) {
// 1. Get user's following list (from user's shard)
const followingResult = await this.shardManager.executeForUser(
userId,
"SELECT followed_id FROM user_follows WHERE follower_id = $1",
[userId]
);
const followedUsers = followingResult.rows.map((row) => row.followed_id);
if (followedUsers.length === 0) {
return [];
}
// 2. Group followed users by shard
const usersByShards = new Map();
followedUsers.forEach((followedUserId) => {
const { shardId } = this.shardManager.getShardForUser(followedUserId);
if (!usersByShards.has(shardId)) {
usersByShards.set(shardId, []);
}
usersByShards.get(shardId).push(followedUserId);
});
// 3. Query posts from each relevant shard
const feedPromises = [];
for (const [shardId, userIds] of usersByShards) {
feedPromises.push(
this.shardManager
.executeOnShard(
shardId,
`SELECT p.id, p.title, p.content, p.created_at, p.author_id,
u.username, u.avatar_url
FROM posts p
JOIN users u ON p.author_id = u.id
WHERE p.author_id = ANY($1)
ORDER BY p.created_at DESC
LIMIT $2`,
[userIds, limit * 2] // Get extra to account for merging
)
.then((result) => result.rows)
);
}
const shardResults = await Promise.all(feedPromises);
// 4. Merge and sort results from all shards
const allPosts = shardResults
.flat()
.sort((a, b) => new Date(b.created_at) - new Date(a.created_at))
.slice((page - 1) * limit, page * limit);
return allPosts;
}
}
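The ShardManager above implements the hash strategy; its shardingStrategy field also names range and directory sharding. As a point of comparison, here is a minimal, hedged sketch of range-based routing, which suits naturally ordered data such as time-series events (the shard names and date windows are illustrative assumptions):
class RangeShardRouter {
  constructor() {
    // Each shard owns a contiguous window of creation dates
    this.ranges = [
      { shardId: "shard_2023", from: new Date("2023-01-01"), to: new Date("2024-01-01") },
      { shardId: "shard_2024", from: new Date("2024-01-01"), to: new Date("2025-01-01") },
      { shardId: "shard_current", from: new Date("2025-01-01"), to: new Date("9999-01-01") },
    ];
  }

  getShardForTimestamp(createdAt) {
    const ts = new Date(createdAt);
    const match = this.ranges.find((r) => ts >= r.from && ts < r.to);
    if (!match) throw new Error(`No shard range covers ${createdAt}`);
    return match.shardId;
  }
}
Range sharding keeps related rows together and makes time-window queries single-shard, at the cost of potential hot spots on the newest shard.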
Auto-Sharding and Rebalancing
Dynamic shard management:
// Advanced shard management with automatic rebalancing
class AutoShardManager extends ShardManager {
constructor() {
super();
this.rebalanceThreshold = 0.8; // Rebalance when any shard reaches 80% capacity
this.monitoringInterval = 300000; // 5 minutes
this.isRebalancing = false;
this.startCapacityMonitoring();
}
async startCapacityMonitoring() {
setInterval(async () => {
if (this.isRebalancing) return;
const capacityStats = await this.getShardCapacityStats();
const needsRebalancing = capacityStats.some(
(stat) => stat.utilizationRatio > this.rebalanceThreshold
);
if (needsRebalancing) {
console.log(
"Shard capacity threshold exceeded - initiating rebalancing"
);
await this.initiateRebalancing(capacityStats);
}
}, this.monitoringInterval);
}
async getShardCapacityStats() {
const stats = [];
for (const [shardId, shard] of this.shards) {
if (!shard.isHealthy) continue;
try {
const result = await shard.pool.query(`
SELECT
pg_database_size(current_database()) as db_size_bytes,
(SELECT count(*) FROM users) as user_count,
(SELECT count(*) FROM posts) as post_count
`);
const row = result.rows[0];
const sizeGB = parseInt(row.db_size_bytes) / (1024 * 1024 * 1024);
const maxSizeGB = 100; // 100GB per shard limit
stats.push({
shardId,
sizeGB,
maxSizeGB,
utilizationRatio: sizeGB / maxSizeGB,
userCount: parseInt(row.user_count),
postCount: parseInt(row.post_count),
});
} catch (error) {
console.error(`Failed to get capacity stats for ${shardId}:`, error);
}
}
return stats;
}
async initiateRebalancing(capacityStats) {
this.isRebalancing = true;
try {
console.log("Starting shard rebalancing process");
// 1. Add new shard if needed
const overloadedShards = capacityStats.filter(
(s) => s.utilizationRatio > this.rebalanceThreshold
);
if (overloadedShards.length > 0) {
const newShardId = await this.addNewShard();
console.log(`Added new shard: ${newShardId}`);
// 2. Rebalance virtual shards
await this.rebalanceVirtualShards(newShardId);
// 3. Migrate data to achieve better balance
await this.migrateDataForBalance(
overloadedShards[0].shardId,
newShardId
);
}
console.log("Shard rebalancing completed successfully");
} catch (error) {
console.error("Shard rebalancing failed:", error);
// Alert operations team
} finally {
this.isRebalancing = false;
}
}
async addNewShard() {
const newShardId = `shard_${this.shards.size}`;
const newShardHost = await this.provisionNewShardInstance(); // Cloud API call
// Add to shard map
this.shards.set(newShardId, {
pool: new Pool({
host: newShardHost,
port: 5432,
database: process.env.DB_NAME,
user: process.env.DB_USER,
password: process.env.DB_PASSWORD,
max: 25,
}),
ranges: [0, 0], // Will be set during rebalancing
isHealthy: true,
lastHealthCheck: Date.now(),
});
// Initialize schema on new shard
await this.initializeShardSchema(newShardId);
return newShardId;
}
async rebalanceVirtualShards(newShardId) {
const totalShards = this.shards.size;
const virtualShardsPerPhysicalShard = Math.floor(
this.virtualShards / totalShards
);
let currentVirtualShard = 0;
for (const [shardId, shard] of this.shards) {
const rangeStart = currentVirtualShard;
const rangeEnd = Math.min(
currentVirtualShard + virtualShardsPerPhysicalShard - 1,
this.virtualShards - 1
);
shard.ranges = [rangeStart, rangeEnd];
currentVirtualShard = rangeEnd + 1;
console.log(
`Shard ${shardId} assigned virtual shard range: ${rangeStart}-${rangeEnd}`
);
}
}
async migrateDataForBalance(sourceShardId, targetShardId) {
console.log(`Migrating data from ${sourceShardId} to ${targetShardId}`);
// Get users to migrate (move users whose virtual shard now belongs to target)
// Simplified: a full rebalance would repeat this for every shard whose virtual range changed
const sourcePool = this.shards.get(sourceShardId).pool;
const targetPool = this.shards.get(targetShardId).pool;
const targetRanges = this.shards.get(targetShardId).ranges;
// Find users that should be on target shard based on new virtual shard mapping
const usersToMigrate = await sourcePool.query(
"SELECT id, username, email, profile_data, created_at FROM users"
);
const migratePromises = [];
for (const user of usersToMigrate.rows) {
const virtualShard = this.hashUserId(user.id) % this.virtualShards;
if (virtualShard >= targetRanges[0] && virtualShard <= targetRanges[1]) {
migratePromises.push(this.migrateUser(user, sourcePool, targetPool));
}
}
await Promise.all(migratePromises);
console.log(
`Migrated ${migratePromises.length} users from ${sourceShardId} to ${targetShardId}`
);
}
async migrateUser(user, sourcePool, targetPool) {
try {
// Open transactions on both shards - note this is not a true distributed transaction;
// production migrations typically rely on dual writes or logical replication with a cutover
const targetClient = await targetPool.connect();
const sourceClient = await sourcePool.connect();
try {
await targetClient.query("BEGIN");
await sourceClient.query("BEGIN");
// Copy user to target shard
await targetClient.query(
`INSERT INTO users (id, username, email, profile_data, created_at)
VALUES ($1, $2, $3, $4, $5)`,
[
user.id,
user.username,
user.email,
user.profile_data,
user.created_at,
]
);
// Copy user's posts
const posts = await sourceClient.query(
"SELECT * FROM posts WHERE author_id = $1",
[user.id]
);
for (const post of posts.rows) {
await targetClient.query(
`INSERT INTO posts (id, author_id, title, content, created_at, view_count, like_count)
VALUES ($1, $2, $3, $4, $5, $6, $7)`,
[
post.id,
post.author_id,
post.title,
post.content,
post.created_at,
post.view_count,
post.like_count,
]
);
}
// Copy user's relationships
const follows = await sourceClient.query(
"SELECT * FROM user_follows WHERE follower_id = $1 OR followed_id = $1",
[user.id]
);
for (const follow of follows.rows) {
await targetClient.query(
"INSERT INTO user_follows (follower_id, followed_id, created_at) VALUES ($1, $2, $3)",
[follow.follower_id, follow.followed_id, follow.created_at]
);
}
// Delete from source shard
await sourceClient.query(
"DELETE FROM user_follows WHERE follower_id = $1 OR followed_id = $1",
[user.id]
);
await sourceClient.query("DELETE FROM posts WHERE author_id = $1", [
user.id,
]);
await sourceClient.query("DELETE FROM users WHERE id = $1", [user.id]);
await targetClient.query("COMMIT");
await sourceClient.query("COMMIT");
} catch (error) {
await targetClient.query("ROLLBACK");
await sourceClient.query("ROLLBACK");
throw error;
} finally {
targetClient.release();
sourceClient.release();
}
} catch (error) {
console.error(`Failed to migrate user ${user.id}:`, error);
throw error;
}
}
}
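A brief usage sketch, assuming the same hypothetical Express app used elsewhere in this article: the capacity monitor runs in the background, and exposing its stats gives operators visibility into when a rebalance is approaching:
const autoShardManager = new AutoShardManager();

app.get("/admin/shards/capacity", async (req, res) => {
  try {
    const stats = await autoShardManager.getShardCapacityStats();
    res.json({ rebalancing: autoShardManager.isRebalancing, shards: stats });
  } catch (error) {
    res.status(500).json({ error: "Failed to collect shard capacity stats" });
  }
});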
Database Clustering: High Availability and Failover
Master-Master Clustering with Automatic Failover
Production-grade cluster management:
// High availability cluster with automatic failover
class HighAvailabilityCluster {
constructor() {
this.nodes = new Map();
this.currentMaster = null;
this.clusterState = "initializing";
this.failoverInProgress = false;
this.healthCheckInterval = 5000; // 5 seconds
this.failoverTimeout = 30000; // 30 seconds
this.setupClusterNodes();
this.startHealthMonitoring(); // failover conditions are evaluated inside the health check loop
}
setupClusterNodes() {
const nodeConfigs = [
{
id: "node_1",
host: "db-node-1.cluster.local",
port: 5432,
role: "master",
priority: 100, // Higher priority = preferred master
},
{
id: "node_2",
host: "db-node-2.cluster.local",
port: 5432,
role: "standby",
priority: 90,
},
{
id: "node_3",
host: "db-node-3.cluster.local",
port: 5432,
role: "standby",
priority: 80,
},
];
nodeConfigs.forEach((config) => {
this.nodes.set(config.id, {
id: config.id,
host: config.host,
port: config.port,
role: config.role,
priority: config.priority,
pool: new Pool({
host: config.host,
port: config.port,
database: process.env.DB_NAME,
user: process.env.DB_USER,
password: process.env.DB_PASSWORD,
max: 20,
connectionTimeoutMillis: 5000,
idleTimeoutMillis: 30000,
}),
isHealthy: true,
lastHealthCheck: null,
lagMs: 0,
consecutiveFailures: 0,
});
});
// Set initial master
this.currentMaster = this.findPreferredMaster();
this.clusterState = "healthy";
}
findPreferredMaster() {
const healthyNodes = Array.from(this.nodes.values())
.filter((node) => node.isHealthy)
.sort((a, b) => b.priority - a.priority);
return healthyNodes.length > 0 ? healthyNodes[0].id : null;
}
async startHealthMonitoring() {
setInterval(async () => {
await this.performHealthChecks();
if (!this.failoverInProgress) {
await this.checkFailoverConditions();
}
}, this.healthCheckInterval);
}
async performHealthChecks() {
const healthPromises = Array.from(this.nodes.entries()).map(
async ([nodeId, node]) => {
try {
const startTime = Date.now();
// Check basic connectivity
const result = await node.pool.query(
"SELECT pg_is_in_recovery(), pg_last_wal_receive_lsn(), pg_last_wal_replay_lsn(), now()"
);
const row = result.rows[0];
const responseTime = Date.now() - startTime;
// Update node status
node.isHealthy = true;
node.lastHealthCheck = Date.now();
node.consecutiveFailures = 0;
node.responseTime = responseTime;
node.isStandby = row.pg_is_in_recovery;
// Calculate replication lag for standby nodes
if (node.isStandby && this.currentMaster) {
node.lagMs = await this.calculateReplicationLag(nodeId);
}
return { nodeId, status: "healthy", responseTime, lag: node.lagMs };
} catch (error) {
console.error(`Health check failed for ${nodeId}:`, error.message);
node.isHealthy = false;
node.consecutiveFailures++;
return { nodeId, status: "unhealthy", error: error.message };
}
}
);
const results = await Promise.all(healthPromises);
// Log cluster health summary
const healthyCount = results.filter((r) => r.status === "healthy").length;
console.log(
`Cluster health: ${healthyCount}/${results.length} nodes healthy`
);
}
async calculateReplicationLag(standbyNodeId) {
try {
const masterNode = this.nodes.get(this.currentMaster);
const standbyNode = this.nodes.get(standbyNodeId);
if (!masterNode || !standbyNode) return 0;
const [masterResult, standbyResult] = await Promise.all([
masterNode.pool.query("SELECT pg_current_wal_lsn()"),
standbyNode.pool.query("SELECT pg_last_wal_replay_lsn()"),
]);
const masterLsn = masterResult.rows[0].pg_current_wal_lsn;
const standbyLsn = standbyResult.rows[0].pg_last_wal_replay_lsn;
// Calculate LSN difference (simplified)
const lagBytes = this.calculateLsnDifference(masterLsn, standbyLsn);
// Estimate lag in milliseconds (rough approximation)
const estimatedLagMs = lagBytes / 1024; // 1KB = ~1ms lag
return Math.max(0, estimatedLagMs);
} catch (error) {
console.warn(
`Failed to calculate replication lag for ${standbyNodeId}:`,
error
);
return 0;
}
}
calculateLsnDifference(masterLsn, standbyLsn) {
// PostgreSQL LSNs look like "16/B374D848": two hex halves (high and low 32 bits)
const lsnToBytes = (lsn) => {
const [hi, lo] = lsn.split("/");
return BigInt(`0x${hi}`) * 4294967296n + BigInt(`0x${lo}`);
};
return Number(lsnToBytes(masterLsn) - lsnToBytes(standbyLsn));
}
async checkFailoverConditions() {
const masterNode = this.nodes.get(this.currentMaster);
if (!masterNode || !masterNode.isHealthy) {
console.warn(
`Master node ${this.currentMaster} is unhealthy - initiating failover`
);
await this.initiateFailover("master_unhealthy");
return;
}
// Check for high replication lag
const standbyNodes = Array.from(this.nodes.values()).filter(
(node) => node.id !== this.currentMaster && node.isHealthy
);
const highLagThreshold = 10000; // 10 seconds
const nodesWithHighLag = standbyNodes.filter(
(node) => node.lagMs > highLagThreshold
);
if (nodesWithHighLag.length > 0) {
console.warn(
`High replication lag detected on ${nodesWithHighLag.length} standby nodes`
);
// Don't failover for lag alone, but alert
this.alertHighReplicationLag(nodesWithHighLag);
}
}
async initiateFailover(reason) {
if (this.failoverInProgress) {
console.log("Failover already in progress");
return;
}
this.failoverInProgress = true;
this.clusterState = "failover";
try {
console.log(`Starting failover process. Reason: ${reason}`);
// 1. Find best candidate for new master
const newMaster = this.selectFailoverCandidate();
if (!newMaster) {
throw new Error("No healthy standby nodes available for failover");
}
console.log(`Selected ${newMaster.id} as new master`);
// 2. Promote standby to master
await this.promoteStandbyToMaster(newMaster.id);
// 3. Update cluster configuration
this.currentMaster = newMaster.id;
newMaster.role = "master";
// 4. Reconfigure remaining nodes
await this.reconfigureStandbyNodes(newMaster.id);
// 5. Update application connection strings
await this.updateApplicationConfiguration(newMaster);
console.log(
`Failover completed successfully. New master: ${newMaster.id}`
);
this.clusterState = "healthy";
// 6. Alert operations team
this.alertFailoverComplete(newMaster.id, reason);
} catch (error) {
console.error("Failover failed:", error);
this.clusterState = "failed";
this.alertFailoverFailed(error);
} finally {
this.failoverInProgress = false;
}
}
selectFailoverCandidate() {
const healthyStandbys = Array.from(this.nodes.values())
.filter(
(node) =>
node.id !== this.currentMaster &&
node.isHealthy &&
node.consecutiveFailures < 3
)
.sort((a, b) => {
// Prefer nodes with lower lag and higher priority
if (a.lagMs !== b.lagMs) {
return a.lagMs - b.lagMs;
}
return b.priority - a.priority;
});
return healthyStandbys[0] || null;
}
async promoteStandbyToMaster(nodeId) {
const node = this.nodes.get(nodeId);
try {
// Execute promotion command (PostgreSQL specific)
await node.pool.query("SELECT pg_promote()");
// Wait for promotion to complete
let attempts = 0;
const maxAttempts = 30; // 30 seconds timeout
while (attempts < maxAttempts) {
try {
const result = await node.pool.query("SELECT pg_is_in_recovery()");
if (!result.rows[0].pg_is_in_recovery) {
console.log(`Node ${nodeId} successfully promoted to master`);
return;
}
} catch (error) {
// Node might be restarting during promotion
console.log(
`Waiting for promotion to complete (attempt ${attempts + 1})`
);
}
await new Promise((resolve) => setTimeout(resolve, 1000));
attempts++;
}
throw new Error(
`Promotion timeout - node ${nodeId} failed to become master`
);
} catch (error) {
console.error(`Failed to promote ${nodeId} to master:`, error);
throw error;
}
}
async reconfigureStandbyNodes(newMasterId) {
const newMaster = this.nodes.get(newMasterId);
const standbyPromises = Array.from(this.nodes.values())
.filter((node) => node.id !== newMasterId && node.isHealthy)
.map(async (node) => {
try {
// Reconfigure standby to follow new master
await this.reconfigureStandbyReplication(node.id, newMaster);
node.role = "standby";
console.log(`Reconfigured ${node.id} to replicate from new master`);
} catch (error) {
console.error(`Failed to reconfigure standby ${node.id}:`, error);
}
});
await Promise.all(standbyPromises);
}
async reconfigureStandbyReplication(standbyId, newMaster) {
// Implementation depends on PostgreSQL setup (streaming replication, etc.)
// This would typically involve updating recovery configuration
// and restarting replication connections
}
// Application interface - automatically routes to current master
async executeWrite(query, params = []) {
const masterNode = this.nodes.get(this.currentMaster);
if (!masterNode || !masterNode.isHealthy) {
throw new Error("No healthy master node available");
}
return await masterNode.pool.query(query, params);
}
async executeRead(query, params = []) {
// Try to use a healthy standby first to reduce master load
const healthyStandbys = Array.from(this.nodes.values())
.filter(
(node) =>
node.id !== this.currentMaster && node.isHealthy && node.lagMs < 5000 // Less than 5 second lag
)
.sort((a, b) => a.lagMs - b.lagMs);
if (healthyStandbys.length > 0) {
const selectedStandby = healthyStandbys[0];
return await selectedStandby.pool.query(query, params);
}
// Fallback to master if no suitable standby
return await this.executeWrite(query, params);
}
getClusterStatus() {
const nodeStatuses = Array.from(this.nodes.entries()).map(
([nodeId, node]) => ({
id: nodeId,
role: node.role,
isHealthy: node.isHealthy,
isCurrent: nodeId === this.currentMaster,
responseTime: node.responseTime,
lagMs: node.lagMs,
consecutiveFailures: node.consecutiveFailures,
})
);
return {
state: this.clusterState,
currentMaster: this.currentMaster,
failoverInProgress: this.failoverInProgress,
nodes: nodeStatuses,
};
}
// Placeholder hooks - wire these into your service discovery and alerting integrations
async updateApplicationConfiguration(newMaster) {}
alertHighReplicationLag(nodesWithHighLag) {}
alertFailoverComplete(newMasterId, reason) {}
alertFailoverFailed(error) {}
}
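During a failover window, writes against the failed master will error until a standby is promoted. A small retry wrapper smooths that gap for callers; this is a hedged sketch, and the attempt count and delay are illustrative:
async function writeWithFailoverRetry(cluster, query, params = [], attempts = 5, delayMs = 2000) {
  let lastError;
  for (let attempt = 1; attempt <= attempts; attempt++) {
    try {
      return await cluster.executeWrite(query, params);
    } catch (error) {
      lastError = error;
      console.warn(`Write failed (attempt ${attempt}/${attempts}), retrying in ${delayMs}ms`);
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
  throw lastError;
}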
Performance Monitoring and Optimization at Scale
Comprehensive Performance Monitoring System
Real-time performance dashboard:
// Advanced performance monitoring for distributed databases
class DatabasePerformanceMonitor {
constructor(databaseCluster) {
this.cluster = databaseCluster;
this.metrics = new Map();
this.alertRules = new Map();
this.performanceHistory = [];
this.setupMetricsCollection(); // defines the KPI thresholds used for alerting
this.startMonitoring();
}
setupMetricsCollection() {
// Key performance indicators to track
this.kpis = {
// Query performance
avgQueryTime: { threshold: 100, unit: "ms" },
slowQueryCount: { threshold: 10, unit: "per_minute" },
queryErrorRate: { threshold: 0.01, unit: "percentage" },
// Connection metrics
activeConnections: { threshold: 180, unit: "count" },
connectionWaitTime: { threshold: 50, unit: "ms" },
connectionErrors: { threshold: 5, unit: "per_minute" },
// Resource utilization
cpuUsage: { threshold: 80, unit: "percentage" },
memoryUsage: { threshold: 85, unit: "percentage" },
diskUsage: { threshold: 90, unit: "percentage" },
diskIO: { threshold: 1000, unit: "iops" },
// Replication metrics
replicationLag: { threshold: 5000, unit: "ms" },
replicationErrors: { threshold: 1, unit: "per_minute" },
// Business metrics
userQueriesPerSecond: { threshold: 1000, unit: "qps" },
dataGrowthRate: { threshold: 10, unit: "gb_per_day" },
};
}
async startMonitoring() {
// Collect metrics every 30 seconds
setInterval(async () => {
await this.collectMetrics();
}, 30000);
// Generate performance reports every hour
setInterval(async () => {
await this.generatePerformanceReport();
}, 3600000);
// Check for optimization opportunities daily
setInterval(async () => {
await this.analyzeOptimizationOpportunities();
}, 24 * 3600000);
}
async collectMetrics() {
const timestamp = Date.now();
const clusterMetrics = {
timestamp,
nodes: new Map(),
aggregate: {},
};
// Collect metrics from each node
for (const [nodeId, node] of this.cluster.nodes) {
if (!node.isHealthy) continue;
try {
const nodeMetrics = await this.collectNodeMetrics(nodeId, node);
clusterMetrics.nodes.set(nodeId, nodeMetrics);
} catch (error) {
console.error(`Failed to collect metrics for ${nodeId}:`, error);
}
}
// Calculate aggregate metrics
clusterMetrics.aggregate = this.calculateAggregateMetrics(
clusterMetrics.nodes
);
// Store in history
this.performanceHistory.push(clusterMetrics);
// Keep only last 24 hours of data
const cutoff = timestamp - 24 * 60 * 60 * 1000;
this.performanceHistory = this.performanceHistory.filter(
(m) => m.timestamp > cutoff
);
// Check alert conditions
await this.checkAlertConditions(clusterMetrics.aggregate);
// Send to external monitoring system
await this.sendToMonitoringSystem(clusterMetrics);
}
async collectNodeMetrics(nodeId, node) {
const metrics = {};
// Database-specific metrics
const dbStatsResult = await node.pool.query(`
SELECT
-- Connection stats
(SELECT count(*) FROM pg_stat_activity WHERE state = 'active') as active_connections,
(SELECT count(*) FROM pg_stat_activity WHERE state = 'idle in transaction') as idle_in_transaction,
-- Query stats
(SELECT avg(mean_exec_time) FROM pg_stat_statements WHERE calls > 10) as avg_query_time,
(SELECT count(*) FROM pg_stat_statements WHERE mean_exec_time > 1000) as slow_query_count,
-- Database size
pg_database_size(current_database()) as db_size_bytes,
-- Cache hit ratio
(SELECT sum(heap_blks_hit) / nullif(sum(heap_blks_hit + heap_blks_read), 0) FROM pg_statio_user_tables) as cache_hit_ratio,
-- Transaction stats
(SELECT sum(xact_commit) FROM pg_stat_database WHERE datname = current_database()) as commits,
(SELECT sum(xact_rollback) FROM pg_stat_database WHERE datname = current_database()) as rollbacks,
-- Lock stats
(SELECT count(*) FROM pg_locks WHERE NOT granted) as waiting_locks,
-- Replication lag (if standby)
CASE WHEN pg_is_in_recovery() THEN
EXTRACT(EPOCH FROM (now() - pg_last_xact_replay_timestamp())) * 1000
ELSE 0 END as replication_lag_ms
`);
const row = dbStatsResult.rows[0];
metrics.database = {
activeConnections: parseInt(row.active_connections) || 0,
idleInTransaction: parseInt(row.idle_in_transaction) || 0,
avgQueryTime: parseFloat(row.avg_query_time) || 0,
slowQueryCount: parseInt(row.slow_query_count) || 0,
dbSizeGB: parseFloat(row.db_size_bytes) / (1024 * 1024 * 1024),
cacheHitRatio: parseFloat(row.cache_hit_ratio) || 0,
commits: parseInt(row.commits) || 0,
rollbacks: parseInt(row.rollbacks) || 0,
waitingLocks: parseInt(row.waiting_locks) || 0,
replicationLagMs: parseFloat(row.replication_lag_ms) || 0,
};
// System metrics (would typically come from system monitoring agent)
metrics.system = await this.collectSystemMetrics(nodeId);
// Application metrics
metrics.application = await this.collectApplicationMetrics(nodeId);
return metrics;
}
async collectSystemMetrics(nodeId) {
// In production, this would integrate with system monitoring tools
// like Prometheus, DataDog agents, or CloudWatch
return {
cpuUsage: Math.random() * 100, // Placeholder
memoryUsage: Math.random() * 100,
diskUsage: Math.random() * 100,
diskIOPS: Math.random() * 1000,
networkBytesPerSec: Math.random() * 1024 * 1024 * 100, // 100MB/s max
};
}
async collectApplicationMetrics(nodeId) {
// Application-level metrics from your monitoring system
return {
requestsPerSecond: Math.random() * 1000,
errorRate: Math.random() * 0.05, // 5% max
responseTime: Math.random() * 200, // 200ms max
};
}
calculateAggregateMetrics(nodeMetrics) {
const values = Array.from(nodeMetrics.values());
if (values.length === 0) return {};
const aggregate = {};
// Sum metrics
const sumFields = [
"activeConnections",
"slowQueryCount",
"commits",
"rollbacks",
];
sumFields.forEach((field) => {
aggregate[field] = values.reduce(
(sum, node) => sum + (node.database[field] || 0),
0
);
});
// Average metrics
const avgFields = [
"avgQueryTime",
"cacheHitRatio",
"cpuUsage",
"memoryUsage",
"diskUsage",
];
avgFields.forEach((field) => {
const sourceObj = field.includes("Usage") ? "system" : "database";
const validValues = values
.map((node) => node[sourceObj][field])
.filter((val) => val != null && !isNaN(val));
if (validValues.length > 0) {
aggregate[field] =
validValues.reduce((sum, val) => sum + val, 0) / validValues.length;
} else {
aggregate[field] = 0;
}
});
// Max metrics
const maxFields = ["replicationLagMs", "waitingLocks"];
maxFields.forEach((field) => {
aggregate[field] = Math.max(
...values.map((node) => node.database[field] || 0)
);
});
// Calculate derived metrics
aggregate.errorRate =
aggregate.rollbacks /
Math.max(aggregate.commits + aggregate.rollbacks, 1);
aggregate.connectionUtilization =
aggregate.activeConnections / (values.length * 200); // Assume 200 max per node
return aggregate;
}
async checkAlertConditions(metrics) {
const alerts = [];
// Check each KPI against thresholds
Object.entries(this.kpis).forEach(([metricName, config]) => {
const currentValue = metrics[metricName];
if (currentValue != null && currentValue > config.threshold) {
alerts.push({
metric: metricName,
currentValue,
threshold: config.threshold,
unit: config.unit,
severity: this.calculateAlertSeverity(currentValue, config.threshold),
timestamp: Date.now(),
});
}
});
// Send alerts if any
if (alerts.length > 0) {
await this.sendAlerts(alerts);
}
return alerts;
}
calculateAlertSeverity(value, threshold) {
const ratio = value / threshold;
if (ratio > 2) return "critical";
if (ratio > 1.5) return "warning";
return "info";
}
async generatePerformanceReport() {
const now = Date.now();
const oneHourAgo = now - 60 * 60 * 1000;
// Get metrics from last hour
const hourlyMetrics = this.performanceHistory.filter(
(m) => m.timestamp >= oneHourAgo
);
if (hourlyMetrics.length < 2) return; // Not enough data
const report = {
period: { start: oneHourAgo, end: now },
summary: this.calculatePeriodSummary(hourlyMetrics),
trends: this.calculateTrends(hourlyMetrics),
topQueries: await this.getTopQueries(),
recommendations: await this.generateRecommendations(hourlyMetrics),
};
console.log("=== Hourly Performance Report ===");
console.log(
`Period: ${new Date(oneHourAgo).toISOString()} - ${new Date(
now
).toISOString()}`
);
console.log(
`Average Query Time: ${report.summary.avgQueryTime.toFixed(2)}ms`
);
console.log(`Slow Queries: ${report.summary.slowQueryCount}`);
console.log(
`Connection Utilization: ${(
report.summary.connectionUtilization * 100
).toFixed(1)}%`
);
console.log(
`Cache Hit Ratio: ${(report.summary.cacheHitRatio * 100).toFixed(1)}%`
);
if (report.recommendations.length > 0) {
console.log("\n=== Recommendations ===");
report.recommendations.forEach((rec, index) => {
console.log(`${index + 1}. ${rec.title}: ${rec.description}`);
});
}
// Store report for later analysis
await this.storePerformanceReport(report);
}
async analyzeOptimizationOpportunities() {
console.log("Analyzing optimization opportunities...");
const opportunities = [];
// Check for missing indexes
const missingIndexes = await this.findMissingIndexes();
opportunities.push(...missingIndexes);
// Check for unused indexes
const unusedIndexes = await this.findUnusedIndexes();
opportunities.push(...unusedIndexes);
// Check for table bloat
const bloatedTables = await this.findBloatedTables();
opportunities.push(...bloatedTables);
// Check for inefficient queries
const inefficientQueries = await this.findInefficientQueries();
opportunities.push(...inefficientQueries);
if (opportunities.length > 0) {
console.log(`Found ${opportunities.length} optimization opportunities:`);
opportunities.forEach((opp, index) => {
console.log(
`${index + 1}. ${opp.type}: ${opp.description} (Impact: ${
opp.impact
})`
);
});
// Generate optimization script
await this.generateOptimizationScript(opportunities);
} else {
console.log("No immediate optimization opportunities found.");
}
}
async findMissingIndexes() {
const opportunities = [];
// This would analyze query patterns and suggest indexes
// Implementation would examine pg_stat_statements and identify
// frequently executed queries with high cost sequential scans
return opportunities;
}
async findUnusedIndexes() {
const opportunities = [];
try {
const result = await this.cluster.executeRead(`
SELECT
schemaname,
tablename,
indexname,
idx_scan,
pg_size_pretty(pg_relation_size(indexrelid)) as size
FROM pg_stat_user_indexes
WHERE idx_scan < 10
AND pg_relation_size(indexrelid) > 1024 * 1024 -- Larger than 1MB
ORDER BY pg_relation_size(indexrelid) DESC
`);
result.rows.forEach((row) => {
opportunities.push({
type: "unused_index",
description: `Index ${row.indexname} on ${row.tablename} is rarely used (${row.idx_scan} scans) but takes ${row.size}`,
impact: "medium",
action: `Consider dropping index: DROP INDEX ${row.indexname};`,
});
});
} catch (error) {
console.error("Failed to find unused indexes:", error);
}
return opportunities;
}
async findBloatedTables() {
const opportunities = [];
try {
const result = await this.cluster.executeRead(`
SELECT
schemaname,
tablename,
n_dead_tup,
n_live_tup,
CASE WHEN n_live_tup > 0
THEN n_dead_tup::float / n_live_tup::float
ELSE 0
END as bloat_ratio
FROM pg_stat_user_tables
WHERE n_dead_tup > 10000
AND n_dead_tup::float / GREATEST(n_live_tup::float, 1) > 0.1
ORDER BY bloat_ratio DESC
`);
result.rows.forEach((row) => {
opportunities.push({
type: "table_bloat",
description: `Table ${row.tablename} has high bloat ratio (${(
row.bloat_ratio * 100
).toFixed(1)}%) with ${row.n_dead_tup} dead tuples`,
impact: "high",
action: `Consider running: VACUUM FULL ${row.tablename}; or REINDEX TABLE ${row.tablename};`,
});
});
} catch (error) {
console.error("Failed to find bloated tables:", error);
}
return opportunities;
}
// Placeholder hooks - integrate with your own metrics pipeline, alerting, and report storage
async sendToMonitoringSystem(clusterMetrics) {}
async sendAlerts(alerts) {}
async storePerformanceReport(report) {}
calculatePeriodSummary(hourlyMetrics) {
return hourlyMetrics[hourlyMetrics.length - 1].aggregate; // most recent aggregate snapshot
}
calculateTrends(hourlyMetrics) {
return {}; // e.g., compare the first and last snapshots for each KPI
}
async getTopQueries() {
return []; // e.g., query pg_stat_statements ordered by total_exec_time
}
async generateRecommendations(hourlyMetrics) {
return [];
}
async findInefficientQueries() {
return [];
}
async generateOptimizationScript(opportunities) {}
}
// Usage (assumes the Express app and the HighAvailabilityCluster from the previous sections)
const haCluster = new HighAvailabilityCluster();
const performanceMonitor = new DatabasePerformanceMonitor(haCluster);
// API endpoints for performance dashboard
app.get("/admin/performance", async (req, res) => {
try {
const status = performanceMonitor.cluster.getClusterStatus();
const recentMetrics = performanceMonitor.performanceHistory.slice(-10);
res.json({
clusterStatus: status,
recentMetrics,
kpis: performanceMonitor.kpis,
});
} catch (error) {
res.status(500).json({ error: "Failed to get performance data" });
}
});
Key Takeaways
Database scaling separates applications that grow from those that scale. Read replicas distribute query load across multiple servers, sharding enables horizontal data distribution, clustering provides high availability, and comprehensive monitoring ensures performance remains optimal as scale increases.
The database scaling mindset you need:
- Horizontal beats vertical: Adding more servers scales better than adding more power to one server
- Read scaling comes first: Most applications are read-heavy, so read replicas provide immediate scaling benefits
- Sharding is inevitable: Eventually, your data will outgrow any single database server’s capabilities
- High availability requires redundancy: Single points of failure will cause outages - design for automatic failover
What distinguishes scalable database architectures:
- Read replica strategies that handle 10x-100x query load increases without degrading performance
- Sharding systems that distribute data intelligently while maintaining query performance
- Clustering configurations that provide automatic failover with minimal downtime
- Monitoring systems that identify performance bottlenecks before they impact users
Series Conclusion
We’ve completed our comprehensive journey through database fundamentals, advanced data management, and scaling strategies. You now possess the complete toolkit for building data architectures that scale from thousands to millions of users while maintaining performance, consistency, and reliability.
The complete database architect’s mastery:
- Relational expertise with complex queries, transactions, and data integrity
- NoSQL proficiency across document, key-value, column-family, and graph paradigms
- Advanced data orchestration through caching, search, synchronization, and event processing
- Scaling architectures with read replicas, sharding, clustering, and performance monitoring
- Operational excellence in connection management, migrations, backups, and optimization
What’s Next
With database mastery complete, we move into the security and authentication phase of backend development. You’ll learn to protect applications and data with proper authentication systems, authorization patterns, secure coding practices, and threat mitigation strategies.
Data is the foundation. Security is the protection. Together, they enable you to build backend systems that not only perform flawlessly at scale but also protect user data and maintain trust in an increasingly connected world.
You’re no longer just storing and querying data—you’re architecting information systems that scale infinitely, perform consistently, and operate reliably under any conditions. The data layer is complete. Now we secure what we’ve built.