Advanced Database Concepts
The $8.3 Million Database Performance Disaster That Killed Black Friday
Picture this database catastrophe: A major e-commerce platform with 50 million products and 200 million users approaches Black Friday with their “bulletproof” database architecture. Their database team, fresh from a MongoDB conference, decides to completely redesign their product catalog using the latest NoSQL patterns and “advanced indexing strategies” without understanding the fundamentals.
Black Friday morning arrives like a digital apocalypse, and the symptoms are financially devastating:
- Database response times increased more than 7,000-fold: Simple product lookups that took 2ms were now taking 15+ seconds
- Revenue loss hit $8.3 million in 6 hours: Customer abandonment spiked to 94% due to page load failures
- Database servers consumed 2.4TB of RAM: Poorly designed indexes were eating memory faster than they could provision it
- Query execution costs reached $180,000/hour: Complex aggregation queries were scanning entire collections repeatedly
- Storage costs exploded to 15x normal: Redundant indexes and poor schema design inflated data size by 1,400%
- Customer service received 847,000 complaints: Users couldn’t browse products, complete purchases, or access their accounts
Here’s what their expensive database audit revealed:
- Index explosion without strategy: They created 847 indexes across 23 collections, most of which were never used and slowed down writes by 89%
- Complete misunderstanding of storage engines: They used memory-mapped storage for time-series data and row-based storage for analytics
- Trigger cascade nightmares: A single product update was triggering 73 database triggers, creating write amplification disasters
- Wrong database for the job: They used graph databases for simple key-value lookups and relational databases for graph traversals
- No query plan analysis: Complex aggregation queries were performing full table scans on 50 million record collections
- Storage engine mismatch: They used InnoDB for append-only logs and MyISAM for high-concurrency transactional data
The final damage:
- $8.3 million lost revenue in a single Black Friday, destroying quarterly projections
- 47% customer trust rating drop as users experienced consistent 30-second page loads
- Database infrastructure costs increased 890% trying to scale their way out of performance problems
- 6 months of engineering time spent rebuilding the entire data architecture from scratch
- Complete C-level restructure as the database decisions became a board-level crisis
The brutal truth? Every part of this disaster could have been prevented with a proper understanding of database internals, storage engines, and indexing strategies.
The Uncomfortable Truth About Advanced Database Concepts
Here’s what separates database architectures that scale gracefully from those that collapse under their own complexity: Advanced database features are powerful tools that become expensive liabilities when you don’t understand the underlying storage engines, indexing mechanisms, and query execution patterns. The more advanced features you use, the deeper your understanding of database internals needs to be.
Most developers approach advanced database concepts like this:
- Add more indexes thinking it will always improve performance
- Use triggers and stored procedures without understanding execution context
- Choose databases based on hype instead of workload characteristics
- Ignore query execution plans and storage engine behavior
- Assume that NoSQL automatically means better performance
But developers who build truly performant database systems work differently:
- Understand storage engine characteristics and match them to specific workload patterns
- Design indexes strategically based on query patterns and write/read ratios
- Choose the right database type based on data access patterns, not technology trends
- Monitor query execution plans continuously and optimize based on actual performance data
- Understand the performance implications of every advanced feature before implementing it
The difference isn’t just query response times—it’s the difference between systems that handle growth gracefully and systems that require complete rebuilds every time traffic increases.
Ready to build database architectures that actually leverage advanced concepts for performance gains instead of creating expensive bottlenecks? Let’s dive into database internals that work in production.
Database Storage Engines: Choosing the Right Foundation
The Problem: Storage Engine Mismatch That Destroys Performance
-- The storage engine nightmare that makes everything slow
-- Using InnoDB for append-only time-series data - RED FLAG #1
CREATE TABLE sensor_readings (
id BIGINT AUTO_INCREMENT PRIMARY KEY,
sensor_id VARCHAR(50),
timestamp BIGINT,
temperature DECIMAL(5,2),
humidity DECIMAL(5,2),
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
-- InnoDB overhead for data that's never updated - RED FLAG #2
INDEX idx_sensor_timestamp (sensor_id, timestamp),
INDEX idx_created_at (created_at),
INDEX idx_temperature (temperature),
INDEX idx_humidity (humidity)
) ENGINE=InnoDB;
-- Using MyISAM for high-concurrency transactional data - RED FLAG #3
CREATE TABLE user_sessions (
session_id VARCHAR(128) PRIMARY KEY,
user_id BIGINT,
session_data TEXT,
last_activity TIMESTAMP,
-- No transaction support for critical user data - RED FLAG #4
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
) ENGINE=MyISAM;
-- Using memory engine for persistent data - RED FLAG #5
CREATE TABLE application_config (
config_key VARCHAR(100) PRIMARY KEY,
config_value TEXT,
updated_at TIMESTAMP
) ENGINE=MEMORY;
// MongoDB collection without considering access patterns - RED FLAG #6
db.products.createIndex({
"name": "text", // Text search on every field - expensive
"description": "text",
"category": "text",
"tags": "text"
});
// Wrong index strategy for time-series queries - RED FLAG #7
db.analytics.createIndex({
"userId": 1, // Leading with high-cardinality field
"eventType": 1,
"timestamp": -1
});
-- Problems this creates:
-- - Time-series inserts become 10x slower due to InnoDB transaction overhead
-- - Session updates cause table locks with MyISAM during high traffic
-- - Configuration data disappears on server restart with MEMORY engine
-- - Text indexes consume massive amounts of memory and slow down writes
-- - Analytics queries scan millions of records due to poor index design
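Before rewriting anything, it helps to confirm which engine each table actually uses and how much space its indexes consume. Here is a minimal sketch using mysql2 and the standard information_schema.TABLES view; the connection settings are assumed to come from environment variables, as elsewhere in this guide.

// engine-audit.ts - minimal sketch: list each table's engine and index footprint
import { createConnection } from "mysql2/promise";

async function auditStorageEngines(): Promise<void> {
  const connection = await createConnection({
    host: process.env.MYSQL_HOST,
    user: process.env.MYSQL_USER,
    password: process.env.MYSQL_PASSWORD,
    database: process.env.MYSQL_DATABASE,
  });
  // ENGINE shows the storage engine; INDEX_LENGTH exposes index bloat per table
  const [rows] = await connection.execute(`
    SELECT TABLE_NAME, ENGINE,
           ROUND(DATA_LENGTH / 1024 / 1024, 1)  AS data_mb,
           ROUND(INDEX_LENGTH / 1024 / 1024, 1) AS index_mb
    FROM information_schema.TABLES
    WHERE TABLE_SCHEMA = DATABASE()
    ORDER BY INDEX_LENGTH DESC
  `);
  console.table(rows);
  await connection.end();
}

auditStorageEngines().catch(console.error);

Tables whose index size dwarfs their data size, or whose engine does not match their workload, are the first candidates for the redesign that follows.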
The Solution: Strategic Storage Engine Selection with Proper Configuration
// Advanced database architecture with optimal storage engine selection
import { createConnection, Connection } from "mysql2/promise";
import { MongoClient, Db } from "mongodb";
import { createClient, RedisClientType } from "redis";
// Storage engine strategy for different data patterns
export class DatabaseArchitect {
private mysqlConnection: Connection;
private mongoDb: Db;
private redisClient: RedisClientType;
private timeseries: any; // TimescaleDB or InfluxDB client
constructor() {
this.initializeConnections();
}
private async initializeConnections() {
// MySQL with optimized engine configurations
this.mysqlConnection = await createConnection({
host: process.env.MYSQL_HOST,
user: process.env.MYSQL_USER,
password: process.env.MYSQL_PASSWORD,
database: process.env.MYSQL_DATABASE,
// Pool options such as connectionLimit belong on createPool(); for a single
// connection, just bound how long the initial connect may take
connectTimeout: 60000,
});
// MongoDB with proper read/write concerns
const mongoClient = new MongoClient(process.env.MONGODB_URI!, {
maxPoolSize: 50,
minPoolSize: 5,
maxIdleTimeMS: 30000,
serverSelectionTimeoutMS: 5000,
// Write concern for performance vs durability balance
writeConcern: { w: 1, j: true, wtimeout: 5000 },
readConcern: { level: "local" },
});
await mongoClient.connect();
this.mongoDb = mongoClient.db("ecommerce");
// Redis for high-performance caching and real-time data
this.redisClient = createClient({
url: process.env.REDIS_URL,
socket: {
connectTimeout: 5000,
reconnectStrategy: (retries) => Math.min(retries * 50, 2000),
},
});
}
// ✅ Optimal: InnoDB for transactional data with proper indexing
async setupTransactionalTables(): Promise<void> {
// User accounts - ACID properties essential
await this.mysqlConnection.execute(`
CREATE TABLE IF NOT EXISTS users (
id BIGINT AUTO_INCREMENT PRIMARY KEY,
email VARCHAR(255) UNIQUE NOT NULL,
password_hash VARCHAR(255) NOT NULL,
profile_data JSON,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
-- Strategic indexing for common queries (email is already covered by the UNIQUE constraint)
INDEX idx_created_at (created_at),
-- JSON indexing for profile queries
INDEX idx_profile_name ((JSON_UNQUOTE(JSON_EXTRACT(profile_data, '$.name')))),
INDEX idx_profile_location ((JSON_UNQUOTE(JSON_EXTRACT(profile_data, '$.location'))))
) ENGINE=InnoDB
ROW_FORMAT=COMPRESSED
KEY_BLOCK_SIZE=8
COMMENT='User accounts requiring ACID properties';
`);
// Orders - Complex transactions requiring consistency
await this.mysqlConnection.execute(`
CREATE TABLE IF NOT EXISTS orders (
id BIGINT AUTO_INCREMENT PRIMARY KEY,
user_id BIGINT,
order_status ENUM('pending', 'confirmed', 'processing', 'shipped', 'delivered', 'cancelled'),
total_amount DECIMAL(10,2),
payment_data JSON,
shipping_address JSON,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
-- Compound indexes for common query patterns
INDEX idx_user_status_date (user_id, order_status, created_at),
INDEX idx_status_date (order_status, created_at),
INDEX idx_amount_date (total_amount, created_at),
FOREIGN KEY (user_id) REFERENCES users(id) ON DELETE CASCADE
) ENGINE=InnoDB
ROW_FORMAT=COMPRESSED
COMMENT='Order transactions requiring strong consistency';
`);
}
// ✅ Optimal: Partitioned, append-optimized storage for analytics and time-series data
async setupAnalyticsSchema(): Promise<void> {
// Time-series data optimized for append-heavy workloads
await this.mysqlConnection.execute(`
CREATE TABLE IF NOT EXISTS user_events (
id BIGINT AUTO_INCREMENT,
user_id BIGINT,
event_type VARCHAR(50),
event_data JSON,
session_id VARCHAR(128),
ip_address VARBINARY(16), -- store INET6_ATON() output; MySQL has no native INET type
user_agent TEXT,
timestamp BIGINT,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
-- Time-series optimized indexing
PRIMARY KEY (id, timestamp),
INDEX idx_user_time_type (user_id, timestamp, event_type),
INDEX idx_session_time (session_id, timestamp),
INDEX idx_event_type_time (event_type, timestamp)
) ENGINE=InnoDB
PARTITION BY RANGE (timestamp) (
PARTITION p202401 VALUES LESS THAN (UNIX_TIMESTAMP('2024-02-01')),
PARTITION p202402 VALUES LESS THAN (UNIX_TIMESTAMP('2024-03-01')),
PARTITION p202403 VALUES LESS THAN (UNIX_TIMESTAMP('2024-04-01')),
PARTITION p_future VALUES LESS THAN MAXVALUE
)
COMMENT='Partitioned time-series data for analytics';
`);
// Product analytics using a dedicated time-series store.
// Note: this DDL is PostgreSQL/TimescaleDB syntax (TIMESTAMPTZ, create_hypertable)
// and is shown for reference only - it is not executed on the MySQL connection above.
const timeseriesSchema = `
CREATE TABLE IF NOT EXISTS product_metrics (
time TIMESTAMPTZ NOT NULL,
product_id BIGINT,
views INTEGER DEFAULT 0,
purchases INTEGER DEFAULT 0,
revenue DECIMAL(10,2) DEFAULT 0,
conversion_rate DECIMAL(5,4),
-- Composite key keeps rows ordered by time for range scans
PRIMARY KEY (time, product_id)
);
-- Convert to a hypertable with daily chunks when running on TimescaleDB
SELECT create_hypertable('product_metrics', 'time',
chunk_time_interval => INTERVAL '1 day',
if_not_exists => TRUE
);
`;
}
// ✅ Optimal: MongoDB for document-oriented data with proper indexing
async setupDocumentCollections(): Promise<void> {
// Product catalog - Complex nested data structures
const productsCollection = this.mongoDb.collection("products");
// Strategic compound indexes based on query patterns
await Promise.all([
// Primary search patterns
productsCollection.createIndex(
{
category: 1,
price: 1,
"inventory.quantity": 1,
},
{
name: "category_price_inventory",
background: true,
}
),
// Text search with proper language support
productsCollection.createIndex(
{
name: "text",
"description.short": "text",
},
{
name: "product_text_search",
weights: { name: 10, "description.short": 1 },
default_language: "english",
background: true,
}
),
// Geographic queries for local inventory
productsCollection.createIndex(
{ "inventory.locations": "2dsphere" },
{ name: "location_search", background: true }
),
// Sparse index for sales data (only products on sale)
productsCollection.createIndex(
{ "pricing.salePrice": 1, "pricing.saleEndDate": 1 },
{
name: "active_sales",
sparse: true,
partialFilterExpression: { "pricing.salePrice": { $exists: true } },
background: true,
}
),
]);
// User profiles - Semi-structured data with flexible schema
const userProfilesCollection = this.mongoDb.collection("user_profiles");
await Promise.all([
// Primary user lookup
userProfilesCollection.createIndex(
{ userId: 1 },
{ name: "user_id_lookup", unique: true, background: true }
),
// Preference-based recommendations
userProfilesCollection.createIndex(
{ "preferences.categories": 1, "behavior.purchaseHistory": 1 },
{ name: "recommendation_engine", background: true }
),
// Geographic personalization
userProfilesCollection.createIndex(
{ "location.coordinates": "2dsphere" },
{ name: "location_personalization", background: true }
),
]);
}
// ✅ Optimal: Redis for high-performance caching and real-time data
async setupRedisStructures(): Promise<void> {
await this.redisClient.connect();
// Session management with TTL
const sessionKey = "session:user:123";
await this.redisClient.hSet(sessionKey, {
userId: "123",
loginTime: Date.now().toString(),
lastActivity: Date.now().toString(),
permissions: JSON.stringify(["read", "write"]),
});
await this.redisClient.expire(sessionKey, 3600); // 1 hour TTL
// Real-time leaderboards using sorted sets
await this.redisClient.zAdd("user_scores", [
{ score: 1500, value: "user:123" },
{ score: 1450, value: "user:456" },
{ score: 1400, value: "user:789" },
]);
// Product inventory counters with atomic operations
await this.redisClient.hMSet("inventory:product:456", {
quantity: "100",
reserved: "5",
available: "95",
});
// Rate limiting using a sliding window: trim old entries, add the new request, count what's left
const rateLimitKey = "rate_limit:user:123";
const windowMs = 60 * 1000;
const now = Date.now();
await this.redisClient.zRemRangeByScore(rateLimitKey, 0, now - windowMs);
await this.redisClient.zAdd(rateLimitKey, [{ score: now, value: now.toString() }]);
const requestsInWindow = await this.redisClient.zCard(rateLimitKey);
console.log(`Requests in the current window: ${requestsInWindow}`);
await this.redisClient.expire(rateLimitKey, 3600);
}
// Advanced query optimization with storage engine awareness
async executeOptimizedQuery(queryType: string, params: any): Promise<any> {
switch (queryType) {
case "user_orders_with_products":
// Use InnoDB's transaction isolation for consistency
await this.mysqlConnection.beginTransaction();
try {
const [orders] = await this.mysqlConnection.execute(
`
SELECT
o.id, o.total_amount, o.order_status, o.created_at,
JSON_OBJECT(
'email', u.email,
'profile', u.profile_data
) as user_data
FROM orders o
JOIN users u ON o.user_id = u.id
WHERE o.user_id = ?
AND o.created_at >= DATE_SUB(NOW(), INTERVAL ? DAY)
ORDER BY o.created_at DESC
LIMIT ?
`,
[params.userId, params.dayRange || 30, params.limit || 20]
);
await this.mysqlConnection.commit();
return orders;
} catch (error) {
await this.mysqlConnection.rollback();
throw error;
}
case "product_analytics_aggregation":
// Use MongoDB's aggregation pipeline for complex analytics
return await this.mongoDb
.collection("products")
.aggregate([
// Match stage with indexed fields first
{
$match: {
category: params.category,
"inventory.quantity": { $gt: 0 },
},
},
// Lookup related data efficiently
{
$lookup: {
from: "product_metrics",
let: { productId: "$_id" },
pipeline: [
{
$match: {
$expr: { $eq: ["$product_id", "$$productId"] },
time: {
$gte: new Date(params.startDate),
$lte: new Date(params.endDate),
},
},
},
{
$group: {
_id: null,
totalViews: { $sum: "$views" },
totalRevenue: { $sum: "$revenue" },
avgConversion: { $avg: "$conversion_rate" },
},
},
],
as: "metrics",
},
},
// Project with computed fields
{
$project: {
name: 1,
price: "$pricing.basePrice",
inventory: "$inventory.quantity",
metrics: { $arrayElemAt: ["$metrics", 0] },
},
},
// Sort by performance metrics
{
$sort: { "metrics.totalRevenue": -1 },
},
{ $limit: params.limit || 50 },
])
.toArray();
case "real_time_inventory_check":
// Use Redis for immediate response
const pipeline = this.redisClient.multi();
for (const productId of params.productIds) {
pipeline.hGetAll(`inventory:product:${productId}`);
}
const results = await pipeline.exec();
return results.map((result, index) => ({
productId: params.productIds[index],
inventory: result,
}));
default:
throw new Error(`Unknown query type: ${queryType}`);
}
}
// Database health monitoring and performance metrics
async getDatabaseMetrics(): Promise<DatabaseMetrics> {
// MySQL performance metrics
const [mysqlStats] = await this.mysqlConnection.execute(`
SHOW GLOBAL STATUS WHERE
Variable_name IN (
'Threads_connected', 'Threads_running', 'Queries',
'Slow_queries', 'Innodb_buffer_pool_read_requests',
'Innodb_buffer_pool_reads', 'Innodb_rows_read', 'Innodb_rows_inserted'
)
`);
// MongoDB metrics
const mongoStats = await this.mongoDb.admin().serverStatus();
// Redis metrics
const redisInfo = await this.redisClient.info("memory");
return {
mysql: {
connections: this.extractStatValue(mysqlStats, "Threads_connected"),
activeQueries: this.extractStatValue(mysqlStats, "Threads_running"),
slowQueries: this.extractStatValue(mysqlStats, "Slow_queries"),
// Hit rate is derived from read counters; there is no Innodb_buffer_pool_hit_rate status variable
bufferPoolHitRate:
1 -
this.extractStatValue(mysqlStats, "Innodb_buffer_pool_reads") /
Math.max(
this.extractStatValue(mysqlStats, "Innodb_buffer_pool_read_requests"),
1
),
},
mongodb: {
connections: mongoStats.connections.current,
operations: mongoStats.opcounters,
memory: mongoStats.mem,
locks: mongoStats.locks,
},
redis: {
memoryUsed: this.parseRedisInfo(redisInfo, "used_memory"),
memoryPeak: this.parseRedisInfo(redisInfo, "used_memory_peak"),
connectedClients: this.parseRedisInfo(
await this.redisClient.info("clients"),
"connected_clients"
),
},
};
}
private extractStatValue(stats: any[], variableName: string): number {
const stat = stats.find((s: any) => s.Variable_name === variableName);
return stat ? parseInt(stat.Value) : 0;
}
private parseRedisInfo(info: string, key: string): number {
const match = info.match(new RegExp(`${key}:([0-9]+)`));
return match ? parseInt(match[1]) : 0;
}
}
// Storage engine decision framework
export class StorageEngineOptimizer {
static analyzeWorkload(
characteristics: WorkloadCharacteristics
): StorageRecommendation {
const recommendations: StorageRecommendation = {
primaryStorage: "InnoDB",
cachingLayer: "Redis",
analyticsStorage: "ColumnStore",
searchEngine: "Elasticsearch",
reasoning: [],
};
// Transactional requirements
if (characteristics.requiresACID) {
recommendations.primaryStorage = "InnoDB";
recommendations.reasoning.push(
"InnoDB selected for ACID transaction support"
);
}
// Read-heavy workloads
if (characteristics.readWriteRatio > 10) {
recommendations.cachingLayer = "Redis";
recommendations.reasoning.push(
"Redis caching for high read-to-write ratios"
);
if (!characteristics.requiresACID && characteristics.dataSizeGB > 100) {
recommendations.primaryStorage = "MyISAM";
recommendations.reasoning.push(
"MyISAM for large read-heavy datasets without transactions"
);
}
}
// Time-series data
if (characteristics.isTimeSeries) {
recommendations.analyticsStorage = "TimescaleDB";
recommendations.reasoning.push(
"TimescaleDB for time-series data optimization"
);
}
// Complex document structures
if (characteristics.hasComplexDocuments) {
recommendations.primaryStorage = "MongoDB";
recommendations.reasoning.push("MongoDB for complex document structures");
}
// Graph relationships
if (characteristics.hasGraphRelationships) {
recommendations.primaryStorage = "Neo4j";
recommendations.reasoning.push("Neo4j for graph relationship queries");
}
// Full-text search requirements
if (characteristics.requiresFullTextSearch) {
recommendations.searchEngine = "Elasticsearch";
recommendations.reasoning.push(
"Elasticsearch for advanced full-text search"
);
}
return recommendations;
}
}
// Supporting interfaces
interface WorkloadCharacteristics {
requiresACID: boolean;
readWriteRatio: number;
dataSizeGB: number;
isTimeSeries: boolean;
hasComplexDocuments: boolean;
hasGraphRelationships: boolean;
requiresFullTextSearch: boolean;
concurrencyLevel: "low" | "medium" | "high";
consistencyRequirements: "eventual" | "strong";
}
interface StorageRecommendation {
primaryStorage: string;
cachingLayer: string;
analyticsStorage: string;
searchEngine: string;
reasoning: string[];
}
interface DatabaseMetrics {
mysql: {
connections: number;
activeQueries: number;
slowQueries: number;
bufferPoolHitRate: number;
};
mongodb: {
connections: number;
operations: any;
memory: any;
locks: any;
};
redis: {
memoryUsed: number;
memoryPeak: number;
connectedClients: number;
};
}
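To make the decision framework concrete, here is a quick usage sketch. The workload numbers are illustrative, not measurements:

// Usage sketch: describe the workload, let the framework explain its picks
const workload: WorkloadCharacteristics = {
  requiresACID: true,
  readWriteRatio: 20,
  dataSizeGB: 250,
  isTimeSeries: false,
  hasComplexDocuments: false,
  hasGraphRelationships: false,
  requiresFullTextSearch: true,
  concurrencyLevel: "high",
  consistencyRequirements: "strong",
};

const recommendation = StorageEngineOptimizer.analyzeWorkload(workload);
console.log(recommendation.primaryStorage); // "InnoDB" - the ACID requirement wins
console.log(recommendation.searchEngine); // "Elasticsearch"
console.log(recommendation.reasoning.join("\n"));

The value of a framework like this is less the specific engine names and more that every choice comes with a recorded reason you can revisit when the workload changes.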
Advanced Indexing Strategies: Beyond Basic Performance
The Problem: Index Strategies That Make Everything Slower
-- The indexing nightmare that destroys write performance
-- Over-indexing every possible column - RED FLAG #1
CREATE TABLE products (
id BIGINT PRIMARY KEY,
name VARCHAR(255),
description TEXT,
price DECIMAL(10,2),
category_id BIGINT,
brand_id BIGINT,
sku VARCHAR(100),
weight DECIMAL(8,2),
dimensions JSON,
created_at TIMESTAMP,
updated_at TIMESTAMP,
-- Redundant and overlapping indexes - RED FLAG #2
INDEX idx_name (name),
INDEX idx_name_category (name, category_id),
INDEX idx_name_brand (name, brand_id),
INDEX idx_category (category_id),
INDEX idx_category_brand (category_id, brand_id),
INDEX idx_brand (brand_id),
INDEX idx_price (price),
INDEX idx_price_category (price, category_id),
INDEX idx_sku (sku),
INDEX idx_weight (weight),
INDEX idx_created (created_at),
INDEX idx_updated (updated_at),
INDEX idx_created_updated (created_at, updated_at),
-- Expensive functional indexes - RED FLAG #3
INDEX idx_json_width ((JSON_EXTRACT(dimensions, '$.width'))),
INDEX idx_json_height ((JSON_EXTRACT(dimensions, '$.height'))),
INDEX idx_json_depth ((JSON_EXTRACT(dimensions, '$.depth'))),
-- Full-text index on large text - RED FLAG #4
FULLTEXT INDEX idx_description_fulltext (description),
FULLTEXT INDEX idx_name_desc_fulltext (name, description)
);
// MongoDB compound index anti-patterns - RED FLAG #5
db.products.createIndex({
"name": 1, // High cardinality first
"category": 1, // Lower selectivity
"price": 1,
"brand": 1,
"createdAt": -1
});
// Wrong index order for query patterns - RED FLAG #6
db.analytics.createIndex({
"timestamp": 1, // Time range queries should be last
"userId": 1, // Should be first for user-specific queries
"eventType": 1
});
-- Problems this creates:
-- - Insert/Update operations become 10x slower
-- - Storage requirements increase by 300-400%
-- - Query optimizer confusion from too many index options
-- - Memory consumption increases dramatically
-- - Maintenance operations (backups, rebuilds) take hours instead of minutes
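A useful first step before touching any index is to ask the server which ones are actually earning their keep. This sketch queries the sys schema views that ship with MySQL 5.7+ (schema_unused_indexes and schema_redundant_indexes); it assumes a mysql2 connection like the one configured in the previous section.

// index-audit.ts - sketch: find indexes that are never read or duplicate another index
import { Connection } from "mysql2/promise";

async function auditIndexes(connection: Connection): Promise<void> {
  // sys schema views track index usage since the last server restart
  const [unusedIndexes] = await connection.execute(
    `SELECT object_schema, object_name, index_name
     FROM sys.schema_unused_indexes`
  );
  const [redundantIndexes] = await connection.execute(
    `SELECT table_name, redundant_index_name, dominant_index_name
     FROM sys.schema_redundant_indexes`
  );
  console.log("Never-used indexes (drop candidates):", unusedIndexes);
  console.log("Redundant indexes (covered by another index):", redundantIndexes);
}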
The Solution: Strategic Index Design with Performance Monitoring
// Advanced indexing strategies with performance optimization
import { performance } from "perf_hooks";
import { createHash } from "crypto";
export class AdvancedIndexOptimizer {
private db: any;
private indexMetrics: Map<string, IndexPerformanceMetrics>;
private queryPatterns: Map<string, QueryPattern>;
constructor(database: any) {
this.db = database;
this.indexMetrics = new Map();
this.queryPatterns = new Map();
this.startIndexMonitoring();
}
// Strategic compound index creation based on query patterns
async createOptimizedIndexes(): Promise<void> {
// E-commerce product search - ESR (Equality, Sort, Range) principle
await this.createStrategicIndex(
"products",
"product_search_optimal",
{
// Equality filters first (highest selectivity)
category_id: 1,
brand_id: 1,
availability_status: 1,
// Sort fields next
popularity_score: -1,
// Range queries last
price: 1,
created_at: -1,
},
{
partialFilterExpression: {
availability_status: { $in: ["in_stock", "low_stock"] },
},
collation: { locale: "en", strength: 2 }, // Case insensitive
}
);
// User analytics - Time-series optimization
await this.createStrategicIndex(
"user_events",
"analytics_time_series",
{
// Partition key first
user_id: 1,
// Event type for filtering
event_type: 1,
// Time range last for efficient range scans
timestamp: -1,
},
{
partialFilterExpression: {
// Only index recent events (the cutoff is evaluated once, at index creation time)
timestamp: { $gte: Date.now() - 30 * 24 * 60 * 60 * 1000 },
},
}
);
// Geographic search with spatial indexing
await this.createSpatialIndex(
"stores",
"location_proximity",
{
// 2dsphere for geographic queries
location: "2dsphere",
},
{
// Optimize for local searches
"2dsphereIndexVersion": 3,
}
);
// Search functionality with text indexing
await this.createTextIndex(
"products",
"product_text_search",
{
name: "text",
"description.short": "text",
"specifications.features": "text",
},
{
weights: {
name: 10, // Product name most important
"description.short": 5, // Short description medium
"specifications.features": 1, // Features least important
},
default_language: "english",
language_override: "lang",
}
);
}
// Advanced index creation with monitoring
private async createStrategicIndex(
collection: string,
indexName: string,
indexSpec: any,
options: any = {}
): Promise<void> {
const startTime = performance.now();
try {
// Check if index already exists
const existingIndexes = await this.db
.collection(collection)
.listIndexes()
.toArray();
const indexExists = existingIndexes.some((idx) => idx.name === indexName);
if (indexExists) {
console.log(`Index ${indexName} already exists on ${collection}`);
return;
}
// Create index in background to avoid blocking operations
const finalOptions = {
...options,
name: indexName,
background: true,
};
await this.db.collection(collection).createIndex(indexSpec, finalOptions);
const creationTime = performance.now() - startTime;
// Record index creation metrics
this.indexMetrics.set(indexName, {
collection,
indexSpec,
creationTime,
lastUsed: Date.now(),
usageCount: 0,
avgQueryTime: 0,
totalDocsExamined: 0,
totalDocsReturned: 0,
});
console.log(
`Created optimized index ${indexName} on ${collection} in ${creationTime.toFixed(
2
)}ms`
);
} catch (error) {
console.error(
`Failed to create index ${indexName} on ${collection}:`,
error
);
throw error;
}
}
// Spatial index for geographic data
private async createSpatialIndex(
collection: string,
indexName: string,
indexSpec: any,
options: any = {}
): Promise<void> {
await this.createStrategicIndex(collection, indexName, indexSpec, {
...options,
sparse: true, // Only index documents with location data
});
}
// Text index for full-text search
private async createTextIndex(
collection: string,
indexName: string,
indexSpec: any,
options: any = {}
): Promise<void> {
await this.createStrategicIndex(collection, indexName, indexSpec, {
...options,
// Text index specific options
textIndexVersion: 3,
background: true,
});
}
// Query execution with index performance tracking
async executeOptimizedQuery(
collection: string,
query: any,
options: any = {}
): Promise<QueryResult> {
const queryId = this.generateQueryId(collection, query);
const startTime = performance.now();
try {
// Run explain on its own cursor, then execute the real query
const explainResult = await this.db
.collection(collection)
.find(query, options)
.explain("executionStats");
const results = await this.db
.collection(collection)
.find(query, options)
.toArray();
const executionTime = performance.now() - startTime;
// Extract execution statistics (the winning plan lives under queryPlanner)
const stats = explainResult.executionStats;
const winningPlan = explainResult.queryPlanner.winningPlan;
const indexUsed = winningPlan.inputStage?.indexName ?? winningPlan.indexName;
// Update query pattern tracking
this.updateQueryPatterns(queryId, {
collection,
query,
indexUsed,
executionTime,
docsExamined: stats.totalDocsExamined,
docsReturned: stats.totalDocsReturned,
executionTimeMillis: stats.executionTimeMillis,
});
// Update index usage metrics
if (indexUsed) {
this.updateIndexMetrics(indexUsed, {
usageCount: 1,
queryTime: executionTime,
docsExamined: stats.totalDocsExamined,
docsReturned: stats.totalDocsReturned,
});
}
return {
results,
executionStats: {
executionTime,
indexUsed,
docsExamined: stats.totalDocsExamined,
docsReturned: stats.totalDocsReturned,
isEfficient: this.isQueryEfficient(stats, winningPlan),
},
};
} catch (error) {
console.error(`Query execution failed for ${collection}:`, error);
throw error;
}
}
// Analyze query efficiency
private isQueryEfficient(stats: any, winningPlan: any): boolean {
const examineRatio =
stats.totalDocsReturned / Math.max(stats.totalDocsExamined, 1);
const executionTime = stats.executionTimeMillis;
// Query is efficient if:
// - Examine ratio > 0.1 (not scanning too many unnecessary docs)
// - Execution time < 100ms for most queries
// - Uses an index (not doing a collection scan)
return (
examineRatio > 0.1 &&
executionTime < 100 &&
winningPlan.stage !== "COLLSCAN"
);
}
// Index usage analysis and optimization recommendations
async analyzeIndexPerformance(): Promise<IndexAnalysisReport> {
const indexStats = await this.getIndexStatistics();
const recommendations: IndexRecommendation[] = [];
for (const [indexName, metrics] of this.indexMetrics.entries()) {
const dbStats = indexStats.find((stat) => stat.name === indexName);
// Identify unused indexes
if (
metrics.usageCount === 0 &&
Date.now() - metrics.lastUsed > 7 * 24 * 60 * 60 * 1000
) {
recommendations.push({
type: "DROP_UNUSED",
indexName,
collection: metrics.collection,
reason: "Index has not been used in the last 7 days",
impact: "Will improve write performance and reduce storage overhead",
});
}
// Identify inefficient indexes (low selectivity)
if (metrics.avgQueryTime > 1000) {
// Queries taking > 1 second
recommendations.push({
type: "OPTIMIZE_SLOW",
indexName,
collection: metrics.collection,
reason: `Average query time is ${metrics.avgQueryTime.toFixed(2)}ms`,
impact:
"Consider reordering index fields or adding additional filters",
});
}
// Identify redundant indexes
const potentialDuplicates = this.findRedundantIndexes(indexName, metrics);
if (potentialDuplicates.length > 0) {
recommendations.push({
type: "MERGE_REDUNDANT",
indexName,
collection: metrics.collection,
reason: `Potentially redundant with indexes: ${potentialDuplicates.join(
", "
)}`,
impact: "Consolidating indexes will improve write performance",
});
}
}
return {
totalIndexes: this.indexMetrics.size,
recommendations,
performanceMetrics: Array.from(this.indexMetrics.entries()).map(
([name, metrics]) => ({
indexName: name,
collection: metrics.collection,
usageCount: metrics.usageCount,
avgQueryTime: metrics.avgQueryTime,
efficiency:
metrics.totalDocsReturned / Math.max(metrics.totalDocsExamined, 1),
})
),
};
}
// Get database-level index statistics
private async getIndexStatistics(): Promise<IndexStats[]> {
try {
// MongoDB-specific index stats
const collections = await this.db.listCollections().toArray();
const allIndexStats: IndexStats[] = [];
for (const collection of collections) {
const indexStats = await this.db
.collection(collection.name)
.aggregate([{ $indexStats: {} }])
.toArray();
allIndexStats.push(
...indexStats.map((stat: any) => ({
name: stat.name,
collection: collection.name,
accesses: stat.accesses,
ops: stat.accesses.ops,
since: stat.accesses.since,
}))
);
}
return allIndexStats;
} catch (error) {
console.error("Failed to get index statistics:", error);
return [];
}
}
// Index monitoring setup
private startIndexMonitoring(): void {
// Monitor index usage patterns every hour
setInterval(async () => {
try {
const analysis = await this.analyzeIndexPerformance();
// Log recommendations for review
if (analysis.recommendations.length > 0) {
console.log(
"Index optimization recommendations:",
JSON.stringify(analysis.recommendations, null, 2)
);
}
// Alert on critically slow queries
const slowQueries = analysis.performanceMetrics.filter(
(metric) => metric.avgQueryTime > 5000
);
if (slowQueries.length > 0) {
console.warn("Critical: Slow queries detected:", slowQueries);
}
} catch (error) {
console.error("Index monitoring failed:", error);
}
}, 60 * 60 * 1000); // Every hour
}
// Helper methods
private generateQueryId(collection: string, query: any): string {
const queryHash = createHash("md5")
.update(JSON.stringify({ collection, query }))
.digest("hex");
return `${collection}_${queryHash.substring(0, 8)}`;
}
private updateQueryPatterns(queryId: string, data: any): void {
const existing = this.queryPatterns.get(queryId);
if (existing) {
existing.count++;
existing.totalExecutionTime += data.executionTime;
existing.avgExecutionTime = existing.totalExecutionTime / existing.count;
} else {
this.queryPatterns.set(queryId, {
...data,
count: 1,
totalExecutionTime: data.executionTime,
avgExecutionTime: data.executionTime,
});
}
}
private updateIndexMetrics(indexName: string, usage: any): void {
const metrics = this.indexMetrics.get(indexName);
if (metrics) {
metrics.usageCount += usage.usageCount;
metrics.totalDocsExamined += usage.docsExamined;
metrics.totalDocsReturned += usage.docsReturned;
metrics.lastUsed = Date.now();
// Update rolling average query time
metrics.avgQueryTime = (metrics.avgQueryTime + usage.queryTime) / 2;
}
}
private findRedundantIndexes(indexName: string, metrics: any): string[] {
// Simplified redundancy detection - would need more sophisticated logic
return Array.from(this.indexMetrics.entries())
.filter(
([name, m]) =>
name !== indexName &&
m.collection === metrics.collection &&
this.hasOverlapWithIndex(metrics.indexSpec, m.indexSpec)
)
.map(([name]) => name);
}
private hasOverlapWithIndex(spec1: any, spec2: any): boolean {
const keys1 = Object.keys(spec1);
const keys2 = Object.keys(spec2);
// Check if one index is a prefix of another
const minLength = Math.min(keys1.length, keys2.length);
for (let i = 0; i < minLength; i++) {
if (keys1[i] !== keys2[i]) return false;
}
return true; // One is a prefix of the other
}
}
// Supporting interfaces
interface IndexPerformanceMetrics {
collection: string;
indexSpec: any;
creationTime: number;
lastUsed: number;
usageCount: number;
avgQueryTime: number;
totalDocsExamined: number;
totalDocsReturned: number;
}
interface QueryPattern {
collection: string;
query: any;
indexUsed?: string;
executionTime: number;
docsExamined: number;
docsReturned: number;
count: number;
totalExecutionTime: number;
avgExecutionTime: number;
}
interface QueryResult {
results: any[];
executionStats: {
executionTime: number;
indexUsed?: string;
docsExamined: number;
docsReturned: number;
isEfficient: boolean;
};
}
interface IndexRecommendation {
type: "DROP_UNUSED" | "OPTIMIZE_SLOW" | "MERGE_REDUNDANT" | "CREATE_MISSING";
indexName: string;
collection: string;
reason: string;
impact: string;
}
interface IndexAnalysisReport {
totalIndexes: number;
recommendations: IndexRecommendation[];
performanceMetrics: Array<{
indexName: string;
collection: string;
usageCount: number;
avgQueryTime: number;
efficiency: number;
}>;
}
interface IndexStats {
name: string;
collection: string;
accesses: any;
ops: number;
since: Date;
}
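Wiring the optimizer into an application is mostly a matter of handing it a database handle and letting the monitoring loop run. A minimal usage sketch, assuming the MongoDB connection string used earlier; the query fields are illustrative:

// Usage sketch for AdvancedIndexOptimizer against the ecommerce database
import { MongoClient } from "mongodb";

async function runIndexOptimizer(): Promise<void> {
  const client = await MongoClient.connect(process.env.MONGODB_URI!);
  const optimizer = new AdvancedIndexOptimizer(client.db("ecommerce"));

  await optimizer.createOptimizedIndexes();

  // Every query routed through the optimizer feeds the usage metrics
  const { results, executionStats } = await optimizer.executeOptimizedQuery(
    "products",
    { category_id: 42, availability_status: "in_stock" },
    { limit: 20 }
  );
  console.log(
    `${results.length} docs via ${executionStats.indexUsed ?? "collection scan"}`
  );

  const report = await optimizer.analyzeIndexPerformance();
  console.log(`${report.recommendations.length} index recommendations`);
}

runIndexOptimizer().catch(console.error);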
Specialized Database Systems: Time-Series and Graph Databases
The Problem: Using the Wrong Database for Specialized Data Patterns
-- The specialized database nightmare - using SQL for everything
-- Time-series data forced into relational structure - RED FLAG #1
CREATE TABLE sensor_data (
id BIGINT AUTO_INCREMENT PRIMARY KEY,
sensor_id VARCHAR(50),
measurement_type VARCHAR(50),
value DECIMAL(15,6),
unit VARCHAR(20),
timestamp TIMESTAMP,
location_id BIGINT,
-- Trying to normalize time-series data - RED FLAG #2
INDEX idx_sensor_time (sensor_id, timestamp),
INDEX idx_location_time (location_id, timestamp),
INDEX idx_type_time (measurement_type, timestamp)
);
-- Social network relationships in SQL - RED FLAG #3
CREATE TABLE user_relationships (
id BIGINT AUTO_INCREMENT PRIMARY KEY,
user_id BIGINT,
related_user_id BIGINT,
relationship_type ENUM('friend', 'follows', 'blocks'),
created_at TIMESTAMP,
-- No efficient way to traverse relationship graphs - RED FLAG #4
INDEX idx_user_relationships (user_id, relationship_type),
INDEX idx_related_user (related_user_id, relationship_type)
);
-- Complex graph queries in SQL - RED FLAG #5
-- Finding friends of friends (2-degree connections)
SELECT DISTINCT u3.id, u3.username
FROM users u1
JOIN user_relationships r1 ON u1.id = r1.user_id
JOIN user_relationships r2 ON r1.related_user_id = r2.user_id
JOIN users u3 ON r2.related_user_id = u3.id
WHERE u1.id = ?
AND r1.relationship_type = 'friend'
AND r2.relationship_type = 'friend'
AND u3.id != ?;
-- Time-series aggregation that scans millions of rows - RED FLAG #6
SELECT
DATE(timestamp) as day,
sensor_id,
AVG(value) as avg_value,
MIN(value) as min_value,
MAX(value) as max_value
FROM sensor_data
WHERE timestamp >= DATE_SUB(NOW(), INTERVAL 30 DAY)
AND sensor_id IN ('sensor1', 'sensor2', 'sensor3')
GROUP BY DATE(timestamp), sensor_id
ORDER BY day DESC;
-- Problems this creates:
-- - Time-series queries scan entire tables instead of using time-based partitioning
-- - Graph traversals require multiple JOINs and become exponentially expensive
-- - No built-in functions for time-series analysis (downsampling, interpolation)
-- - Relationship queries don't scale beyond 2-3 degrees of separation
-- - Storage overhead from relational normalization for time-series data
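The same friends-of-friends question becomes a single traversal once the relationships live in a graph database. A small sketch using neo4j-driver, assuming the User/FRIENDS_WITH model that the rest of this section builds out:

// friends-of-friends as a 2-hop traversal instead of chained JOINs
import neo4j from "neo4j-driver";

const driver = neo4j.driver(
  process.env.NEO4J_URI!,
  neo4j.auth.basic(process.env.NEO4J_USER!, process.env.NEO4J_PASSWORD!)
);

async function friendsOfFriends(userId: string): Promise<string[]> {
  const session = driver.session();
  try {
    const result = await session.run(
      `MATCH (me:User {id: $userId})-[:FRIENDS_WITH*2..2]-(fof:User)
       WHERE NOT (me)-[:FRIENDS_WITH]-(fof) AND me <> fof
       RETURN DISTINCT fof.id AS userId`,
      { userId }
    );
    return result.records.map((record) => record.get("userId"));
  } finally {
    await session.close();
  }
}

Extending the traversal to three or four hops is a one-character change to the pattern, whereas the SQL version needs another JOIN per hop.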
The Solution: Purpose-Built Database Systems for Specialized Workloads
// Advanced time-series and graph database implementation
import { InfluxDB, Point } from "@influxdata/influxdb-client";
import { BucketsAPI } from "@influxdata/influxdb-client-apis";
import neo4j, { Driver, Session } from "neo4j-driver";
import { Client as ElasticsearchClient } from "@elastic/elasticsearch";
export class SpecializedDatabaseArchitecture {
private influxDB: InfluxDB;
private neo4jDriver: Driver;
private elasticsearch: ElasticsearchClient;
private timeseries: TimeSeriesManager;
private graph: GraphManager;
constructor() {
this.initializeSpecializedDatabases();
}
private async initializeSpecializedDatabases() {
// InfluxDB for time-series data
this.influxDB = new InfluxDB({
url: process.env.INFLUXDB_URL!,
token: process.env.INFLUXDB_TOKEN!,
});
// Neo4j for graph relationships
this.neo4jDriver = neo4j.driver(
process.env.NEO4J_URI!,
neo4j.auth.basic(process.env.NEO4J_USER!, process.env.NEO4J_PASSWORD!),
{
maxConnectionLifetime: 30000,
maxConnectionPoolSize: 50,
connectionAcquisitionTimeout: 10000,
}
);
// Elasticsearch for full-text search and complex queries
this.elasticsearch = new ElasticsearchClient({
node: process.env.ELASTICSEARCH_URL!,
auth: {
username: process.env.ELASTICSEARCH_USER!,
password: process.env.ELASTICSEARCH_PASSWORD!,
},
maxRetries: 3,
requestTimeout: 30000,
});
this.timeseries = new TimeSeriesManager(this.influxDB);
this.graph = new GraphManager(this.neo4jDriver);
}
// Time-series data management with proper optimization
async setupTimeSeriesSchema(): Promise<void> {
const org = process.env.INFLUXDB_ORG!;
// Create buckets with appropriate retention policies
const buckets = [
{
name: "sensor_data_realtime",
retention: 24 * 60 * 60, // 24 hours in seconds
description: "Real-time sensor data for immediate monitoring",
},
{
name: "sensor_data_hourly",
retention: 30 * 24 * 60 * 60, // 30 days
description: "Hourly aggregated sensor data",
},
{
name: "sensor_data_daily",
retention: 365 * 24 * 60 * 60, // 1 year
description: "Daily aggregated sensor data for long-term analysis",
},
{
name: "application_metrics",
retention: 7 * 24 * 60 * 60, // 7 days
description: "Application performance metrics",
},
];
const bucketsAPI = new BucketsAPI(this.influxDB);
for (const bucketConfig of buckets) {
try {
const existingBuckets = await bucketsAPI.getBuckets({
org,
name: bucketConfig.name,
});
if (!existingBuckets.buckets || existingBuckets.buckets.length === 0) {
await bucketsAPI.postBuckets({
body: {
orgID: org,
name: bucketConfig.name,
retentionRules: [{ everySeconds: bucketConfig.retention }],
description: bucketConfig.description,
},
});
console.log(`Created InfluxDB bucket: ${bucketConfig.name}`);
}
} catch (error) {
console.error(`Failed to create bucket ${bucketConfig.name}:`, error);
}
}
}
// Graph database schema setup with proper constraints and indexes
async setupGraphSchema(): Promise<void> {
const session = this.neo4jDriver.session();
try {
// Create constraints for uniqueness and performance
await session.run(`
CREATE CONSTRAINT user_email_unique IF NOT EXISTS
FOR (u:User) REQUIRE u.email IS UNIQUE
`);
await session.run(`
CREATE CONSTRAINT user_id_unique IF NOT EXISTS
FOR (u:User) REQUIRE u.id IS UNIQUE
`);
await session.run(`
CREATE CONSTRAINT product_id_unique IF NOT EXISTS
FOR (p:Product) REQUIRE p.id IS UNIQUE
`);
// Create indexes for common query patterns
await session.run(`
CREATE INDEX user_created_at IF NOT EXISTS
FOR (u:User) ON (u.createdAt)
`);
// Relationship property indexes must target a single relationship type each
await session.run(`
CREATE INDEX friends_with_created_at IF NOT EXISTS
FOR ()-[r:FRIENDS_WITH]-() ON (r.createdAt)
`);
await session.run(`
CREATE INDEX purchased_created_at IF NOT EXISTS
FOR ()-[r:PURCHASED]-() ON (r.createdAt)
`);
await session.run(`
CREATE INDEX product_category_price IF NOT EXISTS
FOR (p:Product) ON (p.category, p.price)
`);
// Create composite indexes for complex queries
await session.run(`
CREATE INDEX user_location_interests IF NOT EXISTS
FOR (u:User) ON (u.location, u.interests)
`);
console.log("Graph database schema setup completed");
} catch (error) {
console.error("Failed to setup graph schema:", error);
throw error;
} finally {
await session.close();
}
}
}
// Advanced time-series data management
export class TimeSeriesManager {
private writeAPI: any;
private queryAPI: any;
constructor(private influxDB: InfluxDB) {
const org = process.env.INFLUXDB_ORG!;
this.writeAPI = influxDB.getWriteApi(org, "sensor_data_realtime");
this.queryAPI = influxDB.getQueryApi(org);
// Tag every point with the deployment environment by default
this.writeAPI.useDefaultTags({ environment: process.env.NODE_ENV });
}
// Optimized time-series data ingestion with batching
async ingestSensorData(sensorData: SensorReading[]): Promise<void> {
try {
// Create points with proper tags and fields structure
const points = sensorData.map((reading) => {
const point = new Point("sensor_reading")
// Tags (indexed, for grouping and filtering)
.tag("sensor_id", reading.sensorId)
.tag("sensor_type", reading.sensorType)
.tag("location", reading.location)
.tag("unit", reading.unit)
// Fields (not indexed, for measurements)
.floatField("value", reading.value)
.floatField("accuracy", reading.accuracy || 1.0)
.stringField("status", reading.status || "normal")
// Timestamp
.timestamp(new Date(reading.timestamp));
// Add optional metadata as fields
if (reading.metadata) {
Object.entries(reading.metadata).forEach(([key, value]) => {
if (typeof value === "number") {
point.floatField(`meta_${key}`, value);
} else {
point.stringField(`meta_${key}`, String(value));
}
});
}
return point;
});
// Batch write for performance
this.writeAPI.writePoints(points);
await this.writeAPI.flush();
console.log(`Ingested ${sensorData.length} sensor readings`);
} catch (error) {
console.error("Failed to ingest sensor data:", error);
throw error;
}
}
// Advanced time-series queries with aggregation
async queryTimeSeriesData(
params: TimeSeriesQueryParams
): Promise<TimeSeriesResult> {
const {
sensorIds,
startTime,
endTime,
aggregation = "mean",
window = "1h",
fill = "null",
} = params;
// Build Flux query with proper time ranges and aggregation
const query = `
from(bucket: "sensor_data_realtime")
|> range(start: ${startTime.toISOString()}, stop: ${endTime.toISOString()})
|> filter(fn: (r) => r["_measurement"] == "sensor_reading")
|> filter(fn: (r) => contains(value: r["sensor_id"], set: [${sensorIds
.map((id) => `"${id}"`)
.join(", ")}]))
|> filter(fn: (r) => r["_field"] == "value")
|> aggregateWindow(every: ${window}, fn: ${aggregation}, createEmpty: false)
${fill === "previous" ? '|> fill(column: "_value", usePrevious: true)' : ""}
|> pivot(rowKey:["_time"], columnKey: ["sensor_id"], valueColumn: "_value")
|> yield(name: "aggregated_data")
`;
try {
const results: any[] = [];
const observer = {
next: (row: any, tableMeta: any) => {
const result = tableMeta.toObject(row);
results.push(result);
},
error: (error: Error) => {
console.error("Query execution error:", error);
throw error;
},
complete: () => {
console.log("Query completed successfully");
},
};
await this.queryAPI.queryRows(query, observer);
return {
data: results,
metadata: {
query,
sensorCount: sensorIds.length,
timeRange: { start: startTime, end: endTime },
aggregation,
window,
},
};
} catch (error) {
console.error("Time-series query failed:", error);
throw error;
}
}
// Continuous aggregation for data downsampling
async setupContinuousAggregation(): Promise<void> {
// Create tasks for automatic data downsampling
const hourlyAggregationTask = `
import "influxdata/influxdb/tasks"
option task = {
name: "hourly-sensor-aggregation",
every: 1h,
offset: 5m
}
from(bucket: "sensor_data_realtime")
|> range(start: -2h, stop: -1h)
|> filter(fn: (r) => r["_measurement"] == "sensor_reading")
|> filter(fn: (r) => r["_field"] == "value")
|> aggregateWindow(every: 1h, fn: mean, createEmpty: false)
|> set(key: "_measurement", value: "sensor_reading_hourly")
|> to(bucket: "sensor_data_hourly")
`;
const dailyAggregationTask = `
option task = {
name: "daily-sensor-aggregation",
every: 1d,
offset: 10m
}
from(bucket: "sensor_data_hourly")
|> range(start: -25h, stop: -1h)
|> filter(fn: (r) => r["_measurement"] == "sensor_reading_hourly")
|> aggregateWindow(every: 1d, fn: mean, createEmpty: false)
|> set(key: "_measurement", value: "sensor_reading_daily")
|> to(bucket: "sensor_data_daily")
`;
// Would need to use InfluxDB API to create these tasks
console.log("Continuous aggregation tasks configured");
}
// Time-series anomaly detection
async detectAnomalies(
params: AnomalyDetectionParams
): Promise<AnomalyResult[]> {
const { sensorId, timeRange, threshold = 2.0 } = params;
const anomalyQuery = `
import "math"
from(bucket: "sensor_data_realtime")
|> range(start: ${timeRange.start.toISOString()}, stop: ${timeRange.end.toISOString()})
|> filter(fn: (r) => r["_measurement"] == "sensor_reading")
|> filter(fn: (r) => r["sensor_id"] == "${sensorId}")
|> filter(fn: (r) => r["_field"] == "value")
|> aggregateWindow(every: 5m, fn: mean, createEmpty: false)
|> duplicate(column: "_value", as: "original_value")
|> movingAverage(n: 12) // 1-hour moving average (5min * 12)
|> duplicate(column: "_value", as: "moving_avg")
|> map(fn: (r) => ({
r with
deviation: math.abs(x: r.original_value - r.moving_avg),
threshold: ${threshold}
}))
|> filter(fn: (r) => r.deviation > r.threshold * r.moving_avg)
`;
try {
const anomalies: any[] = [];
const observer = {
next: (row: any, tableMeta: any) => {
anomalies.push(tableMeta.toObject(row));
},
error: (error: Error) => {
throw error;
},
complete: () => {},
};
await this.queryAPI.queryRows(anomalyQuery, observer);
return anomalies.map((anomaly) => ({
timestamp: new Date(anomaly._time),
sensorId: anomaly.sensor_id,
value: anomaly.original_value,
expectedValue: anomaly.moving_avg,
deviation: anomaly.deviation,
severity: anomaly.deviation > threshold * 2 ? "high" : "medium",
}));
} catch (error) {
console.error("Anomaly detection failed:", error);
throw error;
}
}
}
// Advanced graph database management
export class GraphManager {
constructor(private driver: Driver) {}
// Optimized social graph operations
async createUserRelationship(
userId1: string,
userId2: string,
relationshipType: string,
metadata: any = {}
): Promise<void> {
const session = this.driver.session();
try {
// Cypher cannot parameterize relationship types; validate relationshipType against an allow-list before interpolating it
await session.run(
`
MATCH (u1:User {id: $userId1}), (u2:User {id: $userId2})
MERGE (u1)-[r:${relationshipType.toUpperCase()}]->(u2)
ON CREATE SET r.createdAt = datetime(), r.metadata = $metadata
ON MATCH SET r.updatedAt = datetime()
RETURN r
`,
{ userId1, userId2, metadata }
);
console.log(
`Created ${relationshipType} relationship: ${userId1} -> ${userId2}`
);
} catch (error) {
console.error("Failed to create relationship:", error);
throw error;
} finally {
await session.close();
}
}
// Efficient friend recommendation using graph algorithms
async getFriendRecommendations(
userId: string,
limit: number = 10
): Promise<UserRecommendation[]> {
const session = this.driver.session();
try {
// Use graph algorithms for friend-of-friend recommendations
const result = await session.run(
`
MATCH (user:User {id: $userId})-[:FRIENDS_WITH]-(friend:User)
-[:FRIENDS_WITH]-(recommendation:User)
WHERE NOT (user)-[:FRIENDS_WITH]-(recommendation)
AND user <> recommendation
WITH user, recommendation,
count(*) as mutualFriends,
collect(DISTINCT friend.name) as mutualFriendNames
OPTIONAL MATCH (recommendation)-[:INTERESTED_IN]->(interest:Interest)
<-[:INTERESTED_IN]-(user)
WITH user, recommendation, mutualFriends, mutualFriendNames,
count(DISTINCT interest) as commonInterests
OPTIONAL MATCH (recommendation)-[:LIVES_IN]->(location:Location)
<-[:LIVES_IN]-(user)
WITH recommendation, mutualFriends, mutualFriendNames, commonInterests,
CASE WHEN location IS NOT NULL THEN 1 ELSE 0 END as sameLocation
RETURN recommendation.id as userId,
recommendation.name as name,
recommendation.profilePicture as profilePicture,
mutualFriends,
mutualFriendNames[0..3] as topMutualFriends,
commonInterests,
sameLocation,
(mutualFriends * 2 + commonInterests + sameLocation) as score
ORDER BY score DESC, mutualFriends DESC
LIMIT $limit
`,
{ userId, limit: neo4j.int(limit) } // LIMIT requires a Cypher integer, not a JS float
);
return result.records.map((record) => ({
userId: record.get("userId"),
name: record.get("name"),
profilePicture: record.get("profilePicture"),
mutualFriends: record.get("mutualFriends").toNumber(),
topMutualFriends: record.get("topMutualFriends"),
commonInterests: record.get("commonInterests").toNumber(),
sameLocation: record.get("sameLocation").toNumber() === 1,
recommendationScore: record.get("score").toNumber(),
}));
} catch (error) {
console.error("Friend recommendation query failed:", error);
throw error;
} finally {
await session.close();
}
}
// Product recommendation using collaborative filtering
async getProductRecommendations(
userId: string,
limit: number = 20
): Promise<ProductRecommendation[]> {
const session = this.driver.session();
try {
// Collaborative filtering based on similar users' purchases
const result = await session.run(
`
MATCH (user:User {id: $userId})-[:PURCHASED]->(product:Product)
<-[:PURCHASED]-(similarUser:User)
WHERE similarUser <> user
WITH user, similarUser,
count(DISTINCT product) as commonPurchases,
collect(DISTINCT product.category) as sharedCategories
ORDER BY commonPurchases DESC
LIMIT 50
MATCH (similarUser)-[:PURCHASED]->(recommendedProduct:Product)
WHERE NOT (user)-[:PURCHASED]->(recommendedProduct)
AND recommendedProduct.category IN sharedCategories
WITH recommendedProduct,
sum(commonPurchases) as totalScore,
count(DISTINCT similarUser) as recommendingUsers,
avg(recommendedProduct.rating) as avgRating
WHERE totalScore >= 2 AND recommendingUsers >= 2
RETURN recommendedProduct.id as productId,
recommendedProduct.name as name,
recommendedProduct.price as price,
recommendedProduct.category as category,
recommendedProduct.imageUrl as imageUrl,
avgRating,
totalScore,
recommendingUsers
ORDER BY totalScore DESC, avgRating DESC
LIMIT $limit
`,
{ userId, limit: neo4j.int(limit) }
);
return result.records.map((record) => ({
productId: record.get("productId"),
name: record.get("name"),
price: record.get("price").toNumber(),
category: record.get("category"),
imageUrl: record.get("imageUrl"),
avgRating: record.get("avgRating").toNumber(),
recommendationStrength: record.get("totalScore").toNumber(),
basedOnUsers: record.get("recommendingUsers").toNumber(),
}));
} catch (error) {
console.error("Product recommendation query failed:", error);
throw error;
} finally {
await session.close();
}
}
// Graph analytics for insights
async getGraphAnalytics(): Promise<GraphAnalytics> {
const session = this.driver.session();
try {
// Parallel execution of multiple analytics queries
const [userStatsResult, relationshipStatsResult, communityStatsResult] =
await Promise.all([
// User statistics
session.run(`
MATCH (u:User)
OPTIONAL MATCH (u)-[r:FRIENDS_WITH]->()
WITH u, count(r) as friendCount
RETURN count(u) as totalUsers,
avg(friendCount) as avgFriends,
max(friendCount) as maxFriends,
percentileDisc(friendCount, 0.5) as medianFriends
`),
// Relationship statistics
session.run(`
MATCH ()-[r]->()
WITH type(r) as relType, count(*) as relCount
RETURN collect({type: relType, count: relCount}) as relationships
`),
// Community detection (simplified; assumes the GDS plugin and a projected graph named 'user-network')
session.run(`
CALL gds.louvain.stream('user-network')
YIELD nodeId, communityId
WITH communityId, count(*) as communitySize
RETURN count(DISTINCT communityId) as totalCommunities,
avg(communitySize) as avgCommunitySize,
max(communitySize) as largestCommunity
`),
]);
const userStats = userStatsResult.records[0];
const relationshipStats = relationshipStatsResult.records[0];
const communityStats = communityStatsResult.records[0];
return {
users: {
total: userStats.get("totalUsers").toNumber(),
avgFriends: userStats.get("avgFriends").toNumber(),
maxFriends: userStats.get("maxFriends").toNumber(),
medianFriends: userStats.get("medianFriends").toNumber(),
},
relationships: relationshipStats.get("relationships"),
communities: {
total: communityStats.get("totalCommunities").toNumber(),
avgSize: communityStats.get("avgCommunitySize").toNumber(),
largest: communityStats.get("largestCommunity").toNumber(),
},
};
} catch (error) {
console.error("Graph analytics failed:", error);
throw error;
} finally {
await session.close();
}
}
}
// Supporting interfaces
interface SensorReading {
sensorId: string;
sensorType: string;
location: string;
value: number;
unit: string;
timestamp: number;
accuracy?: number;
status?: string;
metadata?: Record<string, any>;
}
interface TimeSeriesQueryParams {
sensorIds: string[];
startTime: Date;
endTime: Date;
aggregation?: string;
window?: string;
fill?: string;
}
interface TimeSeriesResult {
data: any[];
metadata: {
query: string;
sensorCount: number;
timeRange: { start: Date; end: Date };
aggregation: string;
window: string;
};
}
interface AnomalyDetectionParams {
sensorId: string;
timeRange: { start: Date; end: Date };
threshold?: number;
}
interface AnomalyResult {
timestamp: Date;
sensorId: string;
value: number;
expectedValue: number;
deviation: number;
severity: "low" | "medium" | "high";
}
interface UserRecommendation {
userId: string;
name: string;
profilePicture?: string;
mutualFriends: number;
topMutualFriends: string[];
commonInterests: number;
sameLocation: boolean;
recommendationScore: number;
}
interface ProductRecommendation {
productId: string;
name: string;
price: number;
category: string;
imageUrl?: string;
avgRating: number;
recommendationStrength: number;
basedOnUsers: number;
}
interface GraphAnalytics {
users: {
total: number;
avgFriends: number;
maxFriends: number;
medianFriends: number;
};
relationships: Array<{
type: string;
count: number;
}>;
communities: {
total: number;
avgSize: number;
largest: number;
};
}
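As a closing usage sketch, pulling hourly averages for a handful of sensors over the last day looks like this; influxDB is assumed to be the client configured in SpecializedDatabaseArchitecture, and the sensor IDs are illustrative:

// Usage sketch: hourly means for three sensors over the last 24 hours
async function printDailySensorReport(influxDB: InfluxDB): Promise<void> {
  const timeseries = new TimeSeriesManager(influxDB);
  const report = await timeseries.queryTimeSeriesData({
    sensorIds: ["sensor1", "sensor2", "sensor3"],
    startTime: new Date(Date.now() - 24 * 60 * 60 * 1000),
    endTime: new Date(),
    aggregation: "mean",
    window: "1h",
  });
  console.log(
    `${report.data.length} aggregated rows across ${report.metadata.sensorCount} sensors`
  );
}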
This comprehensive advanced database concepts guide gives you:
- Strategic storage engine selection that matches database engines to specific workload characteristics
- Advanced indexing strategies with performance monitoring and optimization recommendations
- Specialized database implementations using time-series and graph databases for appropriate use cases
- Performance optimization techniques that go beyond basic SQL tuning
- Production-ready monitoring and analytics for database health and query performance
The difference between database architectures that scale gracefully and those that require constant rebuilds isn’t just following best practices—it’s understanding the fundamental characteristics of different data patterns and choosing the right specialized tools for each job.