Advanced Database Concepts

The $8.3 Million Database Performance Disaster That Killed Black Friday

Picture this database catastrophe: A major e-commerce platform with 50 million products and 200 million users approaches Black Friday with their “bulletproof” database architecture. Their database team, fresh from a MongoDB conference, decides to completely redesign their product catalog using the latest NoSQL patterns and “advanced indexing strategies” without understanding the fundamentals.

Black Friday morning arrives like a digital apocalypse:

The symptoms were financially devastating:

  • Database response times increased 7,500x: Simple product lookups that took 2ms were now taking 15+ seconds
  • Revenue loss hit $8.3 million in 6 hours: Customer abandonment spiked to 94% due to page load failures
  • Database servers consumed 2.4TB of RAM: Poorly designed indexes were eating memory faster than they could provision it
  • Query execution costs reached $180,000/hour: Complex aggregation queries were scanning entire collections repeatedly
  • Storage costs exploded to 15x normal: Redundant indexes and poor schema design inflated data size by 1,400%
  • Customer service received 847,000 complaints: Users couldn’t browse products, complete purchases, or access their accounts

Here’s what their expensive database audit revealed:

  • Index explosion without strategy: They created 847 indexes across 23 collections, most of which were never used and slowed down writes by 89%
  • Complete misunderstanding of storage engines: They used memory-mapped storage for time-series data and row-based storage for analytics
  • Trigger cascade nightmares: A single product update was triggering 73 database triggers, creating write amplification disasters
  • Wrong database for the job: They used graph databases for simple key-value lookups and relational databases for graph traversals
  • No query plan analysis: Complex aggregation queries were performing full table scans on 50 million record collections
  • Storage engine mismatch: They used InnoDB for append-only logs and MyISAM for high-concurrency transactional data

The final damage:

  • $8.3 million lost revenue in a single Black Friday, destroying quarterly projections
  • 47% customer trust rating drop as users experienced consistent 30-second page loads
  • Database infrastructure costs increased 890% trying to scale their way out of performance problems
  • 6 months of engineering time spent rebuilding the entire data architecture from scratch
  • Complete C-level restructure as the database decisions became a board-level crisis

The brutal truth? Every single database performance disaster could have been prevented with proper understanding of database internals, storage engines, and indexing strategies.

The Uncomfortable Truth About Advanced Database Concepts

Here’s what separates database architectures that scale gracefully from those that collapse under their own complexity: Advanced database features are powerful tools that become expensive liabilities when you don’t understand the underlying storage engines, indexing mechanisms, and query execution patterns. The more advanced features you use, the deeper your understanding of database internals needs to be.

Most developers approach advanced database concepts like this:

  1. Add more indexes thinking it will always improve performance
  2. Use triggers and stored procedures without understanding execution context
  3. Choose databases based on hype instead of workload characteristics
  4. Ignore query execution plans and storage engine behavior
  5. Assume that NoSQL automatically means better performance

But developers who build truly performant database systems work differently:

  1. Understand storage engine characteristics and match them to specific workload patterns
  2. Design indexes strategically based on query patterns and write/read ratios
  3. Choose the right database type based on data access patterns, not technology trends
  4. Monitor query execution plans continuously and optimize based on actual performance data (see the sketch below)
  5. Understand the performance implications of every advanced feature before implementing it

The difference isn’t just query response times—it’s the difference between systems that handle growth gracefully and systems that require complete rebuilds every time traffic increases.
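
Before going further, here is the habit that underpins everything in this chapter: read the execution plan before you trust a query. A minimal sketch using mysql2 (the MYSQL_URL variable, the orders table, and its columns are illustrative, and the JSON plan layout varies slightly across MySQL versions):

// Check a query's execution plan before shipping it
import { createConnection } from "mysql2/promise";

async function checkQueryPlan(): Promise<void> {
  const conn = await createConnection(process.env.MYSQL_URL!);

  // EXPLAIN FORMAT=JSON exposes the optimizer's chosen access path
  const [rows] = await conn.execute(
    `EXPLAIN FORMAT=JSON
     SELECT * FROM orders
     WHERE user_id = ?
     ORDER BY created_at DESC
     LIMIT 20`,
    [42]
  );

  const plan = String((rows as any[])[0].EXPLAIN);

  // access_type "ALL" anywhere in the plan means a full table scan
  if (/"access_type":\s*"ALL"/.test(plan)) {
    console.warn("Full table scan - consider an index on (user_id, created_at)");
  } else {
    console.log("Index-backed plan:\n", plan);
  }

  await conn.end();
}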

Ready to build database architectures that actually leverage advanced concepts for performance gains instead of creating expensive bottlenecks? Let’s dive into database internals that work in production.


Database Storage Engines: Choosing the Right Foundation

The Problem: Storage Engine Mismatch That Destroys Performance

-- The storage engine nightmare that makes everything slow
-- Using InnoDB for append-only time-series data - RED FLAG #1
CREATE TABLE sensor_readings (
    id BIGINT AUTO_INCREMENT PRIMARY KEY,
    sensor_id VARCHAR(50),
    timestamp BIGINT,
    temperature DECIMAL(5,2),
    humidity DECIMAL(5,2),
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    -- InnoDB overhead for data that's never updated - RED FLAG #2
    INDEX idx_sensor_timestamp (sensor_id, timestamp),
    INDEX idx_created_at (created_at),
    INDEX idx_temperature (temperature),
    INDEX idx_humidity (humidity)
) ENGINE=InnoDB;

-- Using MyISAM for high-concurrency transactional data - RED FLAG #3
CREATE TABLE user_sessions (
    session_id VARCHAR(128) PRIMARY KEY,
    user_id BIGINT,
    session_data TEXT,
    last_activity TIMESTAMP,
    -- No transaction support for critical user data - RED FLAG #4
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
) ENGINE=MyISAM;

-- Using memory engine for persistent data - RED FLAG #5
CREATE TABLE application_config (
    config_key VARCHAR(100) PRIMARY KEY,
    config_value TEXT,
    updated_at TIMESTAMP
) ENGINE=MEMORY;

// MongoDB collection without considering access patterns - RED FLAG #6
db.products.createIndex({
    "name": "text",           // Text search on every field - expensive
    "description": "text",
    "category": "text",
    "tags": "text"
});

// Wrong index strategy for time-series queries - RED FLAG #7
db.analytics.createIndex({
    "userId": 1,              // Leading with high-cardinality field
    "eventType": 1,
    "timestamp": -1
});

// Problems this creates:
// - Time-series inserts become 10x slower due to InnoDB transaction overhead
// - Session updates cause table locks with MyISAM during high traffic
// - Configuration data disappears on server restart with MEMORY engine
// - Text indexes consume massive amounts of memory and slow down writes
// - Analytics queries scan millions of records due to poor index design

The Solution: Strategic Storage Engine Selection with Proper Configuration

// Advanced database architecture with optimal storage engine selection
import { createConnection, Connection } from "mysql2/promise";
import { MongoClient, Db } from "mongodb";
import { createClient, RedisClientType } from "redis";

// Storage engine strategy for different data patterns
export class DatabaseArchitect {
  private mysqlConnection!: Connection;
  private mongoDb!: Db;
  private redisClient!: RedisClientType;
  private timeseries: any; // TimescaleDB or InfluxDB client

  constructor() {
    this.initializeConnections();
  }

  private async initializeConnections() {
    // MySQL with optimized engine configurations
    this.mysqlConnection = await createConnection({
      host: process.env.MYSQL_HOST,
      user: process.env.MYSQL_USER,
      password: process.env.MYSQL_PASSWORD,
      database: process.env.MYSQL_DATABASE,
      // Single-connection setup; pool-only options such as connectionLimit
      // and acquireTimeout belong on createPool(), not createConnection()
      connectTimeout: 60000,
    });

    // MongoDB with proper read/write concerns
    const mongoClient = new MongoClient(process.env.MONGODB_URI!, {
      maxPoolSize: 50,
      minPoolSize: 5,
      maxIdleTimeMS: 30000,
      serverSelectionTimeoutMS: 5000,
      // Write concern for performance vs durability balance
      writeConcern: { w: 1, j: true, wtimeout: 5000 },
      readConcern: { level: "local" },
    });

    await mongoClient.connect();
    this.mongoDb = mongoClient.db("ecommerce");

    // Redis for high-performance caching and real-time data
    this.redisClient = createClient({
      url: process.env.REDIS_URL,
      socket: {
        connectTimeout: 5000,
      },
    });
  }

  // ✅ Optimal: InnoDB for transactional data with proper indexing
  async setupTransactionalTables(): Promise<void> {
    // User accounts - ACID properties essential
    await this.mysqlConnection.execute(`
      CREATE TABLE IF NOT EXISTS users (
        id BIGINT AUTO_INCREMENT PRIMARY KEY,
        email VARCHAR(255) UNIQUE NOT NULL,
        password_hash VARCHAR(255) NOT NULL,
        profile_data JSON,
        created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
        updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
        
        -- Strategic indexing for common queries
        -- (email needs no extra index: the UNIQUE constraint already creates one)
        INDEX idx_created_at (created_at),
        
        -- JSON indexing for profile queries; functional key parts need a
        -- typed CAST because bare JSON_EXTRACT yields unindexable LONGTEXT
        INDEX idx_profile_name ((CAST(profile_data->>'$.name' AS CHAR(64)) COLLATE utf8mb4_bin)),
        INDEX idx_profile_location ((CAST(profile_data->>'$.location' AS CHAR(64)) COLLATE utf8mb4_bin))
      ) ENGINE=InnoDB 
        ROW_FORMAT=COMPRESSED 
        KEY_BLOCK_SIZE=8
        COMMENT='User accounts requiring ACID properties';
    `);

    // Orders - Complex transactions requiring consistency
    await this.mysqlConnection.execute(`
      CREATE TABLE IF NOT EXISTS orders (
        id BIGINT AUTO_INCREMENT PRIMARY KEY,
        user_id BIGINT,
        order_status ENUM('pending', 'confirmed', 'processing', 'shipped', 'delivered', 'cancelled'),
        total_amount DECIMAL(10,2),
        payment_data JSON,
        shipping_address JSON,
        created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
        updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
        
        -- Compound indexes for common query patterns
        INDEX idx_user_status_date (user_id, order_status, created_at),
        INDEX idx_status_date (order_status, created_at),
        INDEX idx_amount_date (total_amount, created_at),
        
        FOREIGN KEY (user_id) REFERENCES users(id) ON DELETE CASCADE
      ) ENGINE=InnoDB 
        ROW_FORMAT=COMPRESSED
        COMMENT='Order transactions requiring strong consistency';
    `);
  }

  // ✅ Optimal: Column store for analytics and time-series data
  async setupAnalyticsSchema(): Promise<void> {
    // Time-series data optimized for append-heavy workloads
    await this.mysqlConnection.execute(`
      CREATE TABLE IF NOT EXISTS user_events (
        id BIGINT AUTO_INCREMENT,
        user_id BIGINT,
        event_type VARCHAR(50),
        event_data JSON,
        session_id VARCHAR(128),
        ip_address VARCHAR(45), -- MySQL has no INET type (that's PostgreSQL); VARBINARY(16) also works
        user_agent TEXT,
        timestamp BIGINT,
        created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
        
        -- Time-series optimized indexing
        PRIMARY KEY (id, timestamp),
        INDEX idx_user_time_type (user_id, timestamp, event_type),
        INDEX idx_session_time (session_id, timestamp),
        INDEX idx_event_type_time (event_type, timestamp)
      ) ENGINE=InnoDB
        PARTITION BY RANGE (timestamp) (
          PARTITION p202401 VALUES LESS THAN (UNIX_TIMESTAMP('2024-02-01')),
          PARTITION p202402 VALUES LESS THAN (UNIX_TIMESTAMP('2024-03-01')),
          PARTITION p202403 VALUES LESS THAN (UNIX_TIMESTAMP('2024-04-01')),
          PARTITION p_future VALUES LESS THAN MAXVALUE
        )
        COMMENT='Partitioned time-series data for analytics';
    `);

    // Product analytics using a dedicated time-series store.
    // This is PostgreSQL/TimescaleDB DDL - it runs against a separate
    // Postgres connection (e.g. this.timeseries), not the MySQL connection.
    const timeseriesSchema = `
      CREATE TABLE IF NOT EXISTS product_metrics (
        time TIMESTAMPTZ NOT NULL,
        product_id BIGINT,
        views INTEGER DEFAULT 0,
        purchases INTEGER DEFAULT 0,
        revenue DECIMAL(10,2) DEFAULT 0,
        conversion_rate DECIMAL(5,4),

        -- Time leads the primary key so chunk/partition pruning works
        PRIMARY KEY (time, product_id)
      );

      -- Hypertable for time-series if using TimescaleDB
      SELECT create_hypertable('product_metrics', 'time', 
        chunk_time_interval => INTERVAL '1 day',
        if_not_exists => TRUE
      );
    `;
  }

  // ✅ Optimal: MongoDB for document-oriented data with proper indexing
  async setupDocumentCollections(): Promise<void> {
    // Product catalog - Complex nested data structures
    const productsCollection = this.mongoDb.collection("products");

    // Strategic compound indexes based on query patterns
    await Promise.all([
      // Primary search patterns
      productsCollection.createIndex(
        {
          category: 1,
          price: 1,
          "inventory.quantity": 1,
        },
        {
          name: "category_price_inventory",
          background: true,
        }
      ),

      // Text search with proper language support
      productsCollection.createIndex(
        {
          name: "text",
          "description.short": "text",
        },
        {
          name: "product_text_search",
          weights: { name: 10, "description.short": 1 },
          default_language: "english",
          background: true,
        }
      ),

      // Geographic queries for local inventory
      productsCollection.createIndex(
        { "inventory.locations": "2dsphere" },
        { name: "location_search", background: true }
      ),

      // Partial index for sales data (only products with a sale price;
      // sparse and partialFilterExpression cannot be combined on one index)
      productsCollection.createIndex(
        { "pricing.salePrice": 1, "pricing.saleEndDate": 1 },
        {
          name: "active_sales",
          partialFilterExpression: { "pricing.salePrice": { $exists: true } },
          background: true,
        }
      ),
    ]);

    // User profiles - Semi-structured data with flexible schema
    const userProfilesCollection = this.mongoDb.collection("user_profiles");

    await Promise.all([
      // Primary user lookup
      userProfilesCollection.createIndex(
        { userId: 1 },
        { name: "user_id_lookup", unique: true, background: true }
      ),

      // Preference-based recommendations
      userProfilesCollection.createIndex(
        { "preferences.categories": 1, "behavior.purchaseHistory": 1 },
        { name: "recommendation_engine", background: true }
      ),

      // Geographic personalization
      userProfilesCollection.createIndex(
        { "location.coordinates": "2dsphere" },
        { name: "location_personalization", background: true }
      ),
    ]);
  }

  // ✅ Optimal: Redis for high-performance caching and real-time data
  async setupRedisStructures(): Promise<void> {
    await this.redisClient.connect();

    // Session management with TTL
    const sessionKey = "session:user:123";
    await this.redisClient.hSet(sessionKey, {
      userId: "123",
      loginTime: Date.now().toString(),
      lastActivity: Date.now().toString(),
      permissions: JSON.stringify(["read", "write"]),
    });
    await this.redisClient.expire(sessionKey, 3600); // 1 hour TTL

    // Real-time leaderboards using sorted sets
    await this.redisClient.zAdd("user_scores", [
      { score: 1500, value: "user:123" },
      { score: 1450, value: "user:456" },
      { score: 1400, value: "user:789" },
    ]);

    // Product inventory counters with atomic operations
    await this.redisClient.hMSet("inventory:product:456", {
      quantity: "100",
      reserved: "5",
      available: "95",
    });

    // Rate limiting using sliding window
    const rateLimitKey = "rate_limit:user:123";
    await this.redisClient.zAdd(rateLimitKey, [
      { score: Date.now(), value: Date.now().toString() },
    ]);
    await this.redisClient.expire(rateLimitKey, 3600);
  }

  // Advanced query optimization with storage engine awareness
  async executeOptimizedQuery(queryType: string, params: any): Promise<any> {
    switch (queryType) {
      case "user_orders_with_products":
        // Use InnoDB's transaction isolation for consistency
        await this.mysqlConnection.beginTransaction();

        try {
          const [orders] = await this.mysqlConnection.execute(
            `
            SELECT 
              o.id, o.total_amount, o.order_status, o.created_at,
              JSON_OBJECT(
                'email', u.email,
                'profile', u.profile_data
              ) as user_data
            FROM orders o
            JOIN users u ON o.user_id = u.id
            WHERE o.user_id = ? 
              AND o.created_at >= DATE_SUB(NOW(), INTERVAL ? DAY)
            ORDER BY o.created_at DESC
            LIMIT ?
          `,
            [params.userId, params.dayRange || 30, params.limit || 20]
          );

          await this.mysqlConnection.commit();
          return orders;
        } catch (error) {
          await this.mysqlConnection.rollback();
          throw error;
        }

      case "product_analytics_aggregation":
        // Use MongoDB's aggregation pipeline for complex analytics
        return await this.mongoDb
          .collection("products")
          .aggregate([
            // Match stage with indexed fields first
            {
              $match: {
                category: params.category,
                "inventory.quantity": { $gt: 0 },
              },
            },

            // Lookup related data efficiently
            {
              $lookup: {
                from: "product_metrics",
                let: { productId: "$_id" },
                pipeline: [
                  {
                    $match: {
                      $expr: { $eq: ["$product_id", "$$productId"] },
                      time: {
                        $gte: new Date(params.startDate),
                        $lte: new Date(params.endDate),
                      },
                    },
                  },
                  {
                    $group: {
                      _id: null,
                      totalViews: { $sum: "$views" },
                      totalRevenue: { $sum: "$revenue" },
                      avgConversion: { $avg: "$conversion_rate" },
                    },
                  },
                ],
                as: "metrics",
              },
            },

            // Project with computed fields
            {
              $project: {
                name: 1,
                price: "$pricing.basePrice",
                inventory: "$inventory.quantity",
                metrics: { $arrayElemAt: ["$metrics", 0] },
              },
            },

            // Sort by performance metrics
            {
              $sort: { "metrics.totalRevenue": -1 },
            },

            { $limit: params.limit || 50 },
          ])
          .toArray();

      case "real_time_inventory_check":
        // Use Redis for immediate response
        const pipeline = this.redisClient.multi();

        for (const productId of params.productIds) {
          pipeline.hGetAll(`inventory:product:${productId}`);
        }

        const results = await pipeline.exec();
        return results.map((result, index) => ({
          productId: params.productIds[index],
          inventory: result,
        }));

      default:
        throw new Error(`Unknown query type: ${queryType}`);
    }
  }

  // Database health monitoring and performance metrics
  async getDatabaseMetrics(): Promise<DatabaseMetrics> {
    // MySQL performance metrics
    const [mysqlStats] = await this.mysqlConnection.execute(`
      SHOW GLOBAL STATUS WHERE 
        Variable_name IN (
          'Threads_connected', 'Threads_running', 'Queries', 
          'Slow_queries', 'Innodb_buffer_pool_read_requests',
          'Innodb_buffer_pool_reads', 'Innodb_rows_read', 'Innodb_rows_inserted'
        )
    `);

    // MongoDB metrics
    const mongoStats = await this.mongoDb.admin().serverStatus();

    // Redis metrics
    const redisInfo = await this.redisClient.info("memory");

    return {
      mysql: {
        connections: this.extractStatValue(mysqlStats, "Threads_connected"),
        activeQueries: this.extractStatValue(mysqlStats, "Threads_running"),
        slowQueries: this.extractStatValue(mysqlStats, "Slow_queries"),
        // MySQL exposes raw read counters, not a ready-made hit-rate ratio
        bufferPoolHitRate:
          1 -
          this.extractStatValue(mysqlStats, "Innodb_buffer_pool_reads") /
            Math.max(
              this.extractStatValue(
                mysqlStats,
                "Innodb_buffer_pool_read_requests"
              ),
              1
            ),
      },
      mongodb: {
        connections: mongoStats.connections.current,
        operations: mongoStats.opcounters,
        memory: mongoStats.mem,
        locks: mongoStats.locks,
      },
      redis: {
        memoryUsed: this.parseRedisInfo(redisInfo, "used_memory"),
        memoryPeak: this.parseRedisInfo(redisInfo, "used_memory_peak"),
        connectedClients: await this.redisClient
          .clientList()
          .then((list) => list.length),
      },
    };
  }

  private extractStatValue(stats: any[], variableName: string): number {
    const stat = stats.find((s: any) => s.Variable_name === variableName);
    return stat ? parseInt(stat.Value) : 0;
  }

  private parseRedisInfo(info: string, key: string): number {
    const match = info.match(new RegExp(`${key}:([0-9]+)`));
    return match ? parseInt(match[1]) : 0;
  }
}

// Storage engine decision framework
export class StorageEngineOptimizer {
  static analyzeWorkload(
    characteristics: WorkloadCharacteristics
  ): StorageRecommendation {
    const recommendations: StorageRecommendation = {
      primaryStorage: "InnoDB",
      cachingLayer: "Redis",
      analyticsStorage: "ColumnStore",
      searchEngine: "Elasticsearch",
      reasoning: [],
    };

    // Transactional requirements
    if (characteristics.requiresACID) {
      recommendations.primaryStorage = "InnoDB";
      recommendations.reasoning.push(
        "InnoDB selected for ACID transaction support"
      );
    }

    // Read-heavy workloads
    if (characteristics.readWriteRatio > 10) {
      recommendations.cachingLayer = "Redis";
      recommendations.reasoning.push(
        "Redis caching for high read-to-write ratios"
      );

      if (characteristics.dataSizeGB > 100) {
        recommendations.primaryStorage = "MyISAM";
        recommendations.reasoning.push(
          "MyISAM for large read-heavy datasets without transactions"
        );
      }
    }

    // Time-series data
    if (characteristics.isTimeSeries) {
      recommendations.analyticsStorage = "TimescaleDB";
      recommendations.reasoning.push(
        "TimescaleDB for time-series data optimization"
      );
    }

    // Complex document structures
    if (characteristics.hasComplexDocuments) {
      recommendations.primaryStorage = "MongoDB";
      recommendations.reasoning.push("MongoDB for complex document structures");
    }

    // Graph relationships
    if (characteristics.hasGraphRelationships) {
      recommendations.primaryStorage = "Neo4j";
      recommendations.reasoning.push("Neo4j for graph relationship queries");
    }

    // Full-text search requirements
    if (characteristics.requiresFullTextSearch) {
      recommendations.searchEngine = "Elasticsearch";
      recommendations.reasoning.push(
        "Elasticsearch for advanced full-text search"
      );
    }

    return recommendations;
  }
}

// Supporting interfaces
interface WorkloadCharacteristics {
  requiresACID: boolean;
  readWriteRatio: number;
  dataSizeGB: number;
  isTimeSeries: boolean;
  hasComplexDocuments: boolean;
  hasGraphRelationships: boolean;
  requiresFullTextSearch: boolean;
  concurrencyLevel: "low" | "medium" | "high";
  consistencyRequirements: "eventual" | "strong";
}

interface StorageRecommendation {
  primaryStorage: string;
  cachingLayer: string;
  analyticsStorage: string;
  searchEngine: string;
  reasoning: string[];
}

interface DatabaseMetrics {
  mysql: {
    connections: number;
    activeQueries: number;
    slowQueries: number;
    bufferPoolHitRate: number;
  };
  mongodb: {
    connections: number;
    operations: any;
    memory: any;
    locks: any;
  };
  redis: {
    memoryUsed: number;
    memoryPeak: number;
    connectedClients: number;
  };
}
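
To make the decision framework concrete, here is a usage sketch with an invented workload profile (all values are illustrative):

// Example: a read-heavy product catalog with rich documents and search needs
const catalogWorkload: WorkloadCharacteristics = {
  requiresACID: true,
  readWriteRatio: 25, // roughly 25 reads per write
  dataSizeGB: 40,
  isTimeSeries: false,
  hasComplexDocuments: true,
  hasGraphRelationships: false,
  requiresFullTextSearch: true,
  concurrencyLevel: "high",
  consistencyRequirements: "strong",
};

const rec = StorageEngineOptimizer.analyzeWorkload(catalogWorkload);

console.log(rec.primaryStorage); // "MongoDB" - the document rule fires after the ACID rule
console.log(rec.cachingLayer); // "Redis" - readWriteRatio > 10
console.log(rec.searchEngine); // "Elasticsearch"
rec.reasoning.forEach((r) => console.log(`- ${r}`));

Note that later rules overwrite primaryStorage, so rule order encodes priority; a production system would more likely split the ACID order flow and the document catalog across separate stores, exactly as the DatabaseArchitect class above does.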

Advanced Indexing Strategies: Beyond Basic Performance

The Problem: Index Strategies That Make Everything Slower

-- The indexing nightmare that destroys write performance
-- Over-indexing every possible column - RED FLAG #1
CREATE TABLE products (
    id BIGINT PRIMARY KEY,
    name VARCHAR(255),
    description TEXT,
    price DECIMAL(10,2),
    category_id BIGINT,
    brand_id BIGINT,
    sku VARCHAR(100),
    weight DECIMAL(8,2),
    dimensions JSON,
    created_at TIMESTAMP,
    updated_at TIMESTAMP,

    -- Redundant and overlapping indexes - RED FLAG #2
    INDEX idx_name (name),
    INDEX idx_name_category (name, category_id),
    INDEX idx_name_brand (name, brand_id),
    INDEX idx_category (category_id),
    INDEX idx_category_brand (category_id, brand_id),
    INDEX idx_brand (brand_id),
    INDEX idx_price (price),
    INDEX idx_price_category (price, category_id),
    INDEX idx_sku (sku),
    INDEX idx_weight (weight),
    INDEX idx_created (created_at),
    INDEX idx_updated (updated_at),
    INDEX idx_created_updated (created_at, updated_at),

    -- Expensive functional indexes - RED FLAG #3
    INDEX idx_json_width ((JSON_EXTRACT(dimensions, '$.width'))),
    INDEX idx_json_height ((JSON_EXTRACT(dimensions, '$.height'))),
    INDEX idx_json_depth ((JSON_EXTRACT(dimensions, '$.depth'))),

    -- Full-text index on large text - RED FLAG #4
    FULLTEXT INDEX idx_description_fulltext (description),
    FULLTEXT INDEX idx_name_desc_fulltext (name, description)
);

// MongoDB compound index anti-patterns - RED FLAG #5
db.products.createIndex({
    "name": 1,           // High cardinality first
    "category": 1,       // Lower selectivity
    "price": 1,
    "brand": 1,
    "createdAt": -1
});

// Wrong index order for query patterns - RED FLAG #6
db.analytics.createIndex({
    "timestamp": 1,      // Time range queries should be last
    "userId": 1,         // Should be first for user-specific queries
    "eventType": 1
});

// Problems this creates:
// - Insert/Update operations become 10x slower
// - Storage requirements increase by 300-400%
// - Query optimizer confusion from too many index options
// - Memory consumption increases dramatically
// - Maintenance operations (backups, rebuilds) take hours instead of minutes
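
Before redesigning an over-indexed schema, measure the damage. On MySQL 5.7+ the bundled sys schema exposes redundant and unused indexes as queryable views; a minimal sketch with mysql2 (MYSQL_URL is illustrative, and the unused-index view relies on performance_schema counters that reset on restart):

import { createConnection } from "mysql2/promise";

async function auditIndexes(): Promise<void> {
  const conn = await createConnection(process.env.MYSQL_URL!);

  // Indexes whose column list is covered by another (dominant) index
  const [redundant] = await conn.execute(`
    SELECT table_name, redundant_index_name, dominant_index_name
    FROM sys.schema_redundant_indexes
    WHERE table_schema = DATABASE()
  `);

  // Indexes with zero recorded reads since server start
  const [unused] = await conn.execute(`
    SELECT object_name, index_name
    FROM sys.schema_unused_indexes
    WHERE object_schema = DATABASE()
  `);

  console.table(redundant as any[]);
  console.table(unused as any[]);
  await conn.end();
}

Only trust the unused-index view after the server has handled representative traffic: an index that serves a monthly reporting job will look unused for weeks.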

The Solution: Strategic Index Design with Performance Monitoring

// Advanced indexing strategies with performance optimization
import { performance } from "perf_hooks";
import { createHash } from "crypto";

export class AdvancedIndexOptimizer {
  private db: any;
  private indexMetrics: Map<string, IndexPerformanceMetrics>;
  private queryPatterns: Map<string, QueryPattern>;

  constructor(database: any) {
    this.db = database;
    this.indexMetrics = new Map();
    this.queryPatterns = new Map();
    this.startIndexMonitoring();
  }

  // Strategic compound index creation based on query patterns
  async createOptimizedIndexes(): Promise<void> {
    // E-commerce product search - ESR (Equality, Sort, Range) principle
    await this.createStrategicIndex(
      "products",
      "product_search_optimal",
      {
        // Equality filters first (highest selectivity)
        category_id: 1,
        brand_id: 1,
        availability_status: 1,

        // Sort fields next
        popularity_score: -1,

        // Range queries last
        price: 1,
        created_at: -1,
      },
      {
        partialFilterExpression: {
          availability_status: { $in: ["in_stock", "low_stock"] },
        },
        collation: { locale: "en", strength: 2 }, // Case insensitive
      }
    );

    // User analytics - Time-series optimization
    await this.createStrategicIndex(
      "user_events",
      "analytics_time_series",
      {
        // Partition key first
        user_id: 1,

        // Event type for filtering
        event_type: 1,

        // Time range last for efficient range scans
        timestamp: -1,
      },
      {
        partialFilterExpression: {
          // Only index recent events; note the cutoff is computed once, at
          // index creation time, so the index needs periodic rebuilding
          timestamp: { $gte: Date.now() - 30 * 24 * 60 * 60 * 1000 },
        },
      }
    );

    // Geographic search with spatial indexing
    await this.createSpatialIndex(
      "stores",
      "location_proximity",
      {
        // 2dsphere for geographic queries
        location: "2dsphere",
      },
      {
        // Optimize for local searches
        "2dsphereIndexVersion": 3,
      }
    );

    // Search functionality with text indexing
    await this.createTextIndex(
      "products",
      "product_text_search",
      {
        name: "text",
        "description.short": "text",
        "specifications.features": "text",
      },
      {
        weights: {
          name: 10, // Product name most important
          "description.short": 5, // Short description medium
          "specifications.features": 1, // Features least important
        },
        default_language: "english",
        language_override: "lang",
      }
    );
  }

  // Advanced index creation with monitoring
  private async createStrategicIndex(
    collection: string,
    indexName: string,
    indexSpec: any,
    options: any = {}
  ): Promise<void> {
    const startTime = performance.now();

    try {
      // Check if index already exists
      const existingIndexes = await this.db
        .collection(collection)
        .listIndexes()
        .toArray();
      const indexExists = existingIndexes.some((idx) => idx.name === indexName);

      if (indexExists) {
        console.log(`Index ${indexName} already exists on ${collection}`);
        return;
      }

      // Create index in background to avoid blocking operations
      const finalOptions = {
        ...options,
        name: indexName,
        background: true,
      };

      await this.db.collection(collection).createIndex(indexSpec, finalOptions);

      const creationTime = performance.now() - startTime;

      // Record index creation metrics
      this.indexMetrics.set(indexName, {
        collection,
        indexSpec,
        creationTime,
        lastUsed: Date.now(),
        usageCount: 0,
        avgQueryTime: 0,
        totalDocsExamined: 0,
        totalDocsReturned: 0,
      });

      console.log(
        `Created optimized index ${indexName} on ${collection} in ${creationTime.toFixed(
          2
        )}ms`
      );
    } catch (error) {
      console.error(
        `Failed to create index ${indexName} on ${collection}:`,
        error
      );
      throw error;
    }
  }

  // Spatial index for geographic data
  private async createSpatialIndex(
    collection: string,
    indexName: string,
    indexSpec: any,
    options: any = {}
  ): Promise<void> {
    await this.createStrategicIndex(collection, indexName, indexSpec, {
      ...options,
      sparse: true, // Only index documents with location data
    });
  }

  // Text index for full-text search
  private async createTextIndex(
    collection: string,
    indexName: string,
    indexSpec: any,
    options: any = {}
  ): Promise<void> {
    await this.createStrategicIndex(collection, indexName, indexSpec, {
      ...options,
      // Text index specific options
      textIndexVersion: 3,
      background: true,
    });
  }

  // Query execution with index performance tracking
  async executeOptimizedQuery(
    collection: string,
    query: any,
    options: any = {}
  ): Promise<QueryResult> {
    const queryId = this.generateQueryId(collection, query);
    const startTime = performance.now();

    try {
      // Explain with one cursor, fetch with a fresh one: a cursor that
      // has been explained cannot also be iterated for its results
      const explainResult = await this.db
        .collection(collection)
        .find(query, options)
        .explain("executionStats");
      const results = await this.db
        .collection(collection)
        .find(query, options)
        .toArray();

      const executionTime = performance.now() - startTime;

      // Extract execution statistics
      const stats = explainResult.executionStats;
      const indexUsed =
        explainResult.queryPlanner?.winningPlan?.inputStage?.indexName;

      // Update query pattern tracking
      this.updateQueryPatterns(queryId, {
        collection,
        query,
        indexUsed,
        executionTime,
        docsExamined: stats.totalDocsExamined,
        docsReturned: stats.nReturned,
        executionTimeMillis: stats.executionTimeMillis,
      });

      // Update index usage metrics
      if (indexUsed) {
        this.updateIndexMetrics(indexUsed, {
          usageCount: 1,
          queryTime: executionTime,
          docsExamined: stats.totalDocsExamined,
          docsReturned: stats.nReturned,
        });
      }

      return {
        results,
        executionStats: {
          executionTime,
          indexUsed,
          docsExamined: stats.totalDocsExamined,
          docsReturned: stats.nReturned,
          isEfficient: this.isQueryEfficient(stats),
        },
      };
    } catch (error) {
      console.error(`Query execution failed for ${collection}:`, error);
      throw error;
    }
  }

  // Analyze query efficiency
  private isQueryEfficient(stats: any): boolean {
    const examineRatio = stats.nReturned / Math.max(stats.totalDocsExamined, 1);
    const executionTime = stats.executionTimeMillis;

    // Query is efficient if:
    // - Examine ratio > 0.1 (not scanning too many unnecessary docs)
    // - Execution time < 100ms for most queries
    // - Uses an index (not doing collection scan)
    return (
      examineRatio > 0.1 &&
      executionTime < 100 &&
      stats.winningPlan.stage !== "COLLSCAN"
    );
  }

  // Index usage analysis and optimization recommendations
  async analyzeIndexPerformance(): Promise<IndexAnalysisReport> {
    const indexStats = await this.getIndexStatistics();
    const recommendations: IndexRecommendation[] = [];

    for (const [indexName, metrics] of this.indexMetrics.entries()) {
      const dbStats = indexStats.find((stat) => stat.name === indexName);

      // Identify unused indexes
      if (
        metrics.usageCount === 0 &&
        Date.now() - metrics.lastUsed > 7 * 24 * 60 * 60 * 1000
      ) {
        recommendations.push({
          type: "DROP_UNUSED",
          indexName,
          collection: metrics.collection,
          reason: "Index has not been used in the last 7 days",
          impact: "Will improve write performance and reduce storage overhead",
        });
      }

      // Identify inefficient indexes (low selectivity)
      if (metrics.avgQueryTime > 1000) {
        // Queries taking > 1 second
        recommendations.push({
          type: "OPTIMIZE_SLOW",
          indexName,
          collection: metrics.collection,
          reason: `Average query time is ${metrics.avgQueryTime.toFixed(2)}ms`,
          impact:
            "Consider reordering index fields or adding additional filters",
        });
      }

      // Identify redundant indexes
      const potentialDuplicates = this.findRedundantIndexes(indexName, metrics);
      if (potentialDuplicates.length > 0) {
        recommendations.push({
          type: "MERGE_REDUNDANT",
          indexName,
          collection: metrics.collection,
          reason: `Potentially redundant with indexes: ${potentialDuplicates.join(
            ", "
          )}`,
          impact: "Consolidating indexes will improve write performance",
        });
      }
    }

    return {
      totalIndexes: this.indexMetrics.size,
      recommendations,
      performanceMetrics: Array.from(this.indexMetrics.entries()).map(
        ([name, metrics]) => ({
          indexName: name,
          collection: metrics.collection,
          usageCount: metrics.usageCount,
          avgQueryTime: metrics.avgQueryTime,
          efficiency:
            metrics.totalDocsReturned / Math.max(metrics.totalDocsExamined, 1),
        })
      ),
    };
  }

  // Get database-level index statistics
  private async getIndexStatistics(): Promise<IndexStats[]> {
    try {
      // MongoDB-specific index stats
      const collections = await this.db.listCollections().toArray();
      const allIndexStats: IndexStats[] = [];

      for (const collection of collections) {
        const indexStats = await this.db
          .collection(collection.name)
          .aggregate([{ $indexStats: {} }])
          .toArray();

        allIndexStats.push(
          ...indexStats.map((stat: any) => ({
            name: stat.name,
            collection: collection.name,
            accesses: stat.accesses,
            ops: stat.accesses.ops,
            since: stat.accesses.since,
          }))
        );
      }

      return allIndexStats;
    } catch (error) {
      console.error("Failed to get index statistics:", error);
      return [];
    }
  }

  // Index monitoring setup
  private startIndexMonitoring(): void {
    // Monitor index usage patterns every hour
    setInterval(async () => {
      try {
        const analysis = await this.analyzeIndexPerformance();

        // Log recommendations for review
        if (analysis.recommendations.length > 0) {
          console.log(
            "Index optimization recommendations:",
            JSON.stringify(analysis.recommendations, null, 2)
          );
        }

        // Alert on critically slow queries
        const slowQueries = analysis.performanceMetrics.filter(
          (metric) => metric.avgQueryTime > 5000
        );

        if (slowQueries.length > 0) {
          console.warn("Critical: Slow queries detected:", slowQueries);
        }
      } catch (error) {
        console.error("Index monitoring failed:", error);
      }
    }, 60 * 60 * 1000); // Every hour
  }

  // Helper methods
  private generateQueryId(collection: string, query: any): string {
    const queryHash = require("crypto")
      .createHash("md5")
      .update(JSON.stringify({ collection, query }))
      .digest("hex");
    return `${collection}_${queryHash.substring(0, 8)}`;
  }

  private updateQueryPatterns(queryId: string, data: any): void {
    const existing = this.queryPatterns.get(queryId);
    if (existing) {
      existing.count++;
      existing.totalExecutionTime += data.executionTime;
      existing.avgExecutionTime = existing.totalExecutionTime / existing.count;
    } else {
      this.queryPatterns.set(queryId, {
        ...data,
        count: 1,
        totalExecutionTime: data.executionTime,
        avgExecutionTime: data.executionTime,
      });
    }
  }

  private updateIndexMetrics(indexName: string, usage: any): void {
    const metrics = this.indexMetrics.get(indexName);
    if (metrics) {
      metrics.usageCount += usage.usageCount;
      metrics.totalDocsExamined += usage.docsExamined;
      metrics.totalDocsReturned += usage.docsReturned;
      metrics.lastUsed = Date.now();

      // Incremental mean over all recorded uses of this index
      metrics.avgQueryTime +=
        (usage.queryTime - metrics.avgQueryTime) / metrics.usageCount;
    }
  }

  private findRedundantIndexes(indexName: string, metrics: any): string[] {
    // Simplified redundancy detection - would need more sophisticated logic
    return Array.from(this.indexMetrics.entries())
      .filter(
        ([name, m]) =>
          name !== indexName &&
          m.collection === metrics.collection &&
          this.hasOverlapWithIndex(metrics.indexSpec, m.indexSpec)
      )
      .map(([name]) => name);
  }

  private hasOverlapWithIndex(spec1: any, spec2: any): boolean {
    const keys1 = Object.keys(spec1);
    const keys2 = Object.keys(spec2);

    // Check if one index is a prefix of another
    const minLength = Math.min(keys1.length, keys2.length);
    for (let i = 0; i < minLength; i++) {
      if (keys1[i] !== keys2[i]) return false;
    }

    return true; // One is a prefix of the other
  }
}

// Supporting interfaces
interface IndexPerformanceMetrics {
  collection: string;
  indexSpec: any;
  creationTime: number;
  lastUsed: number;
  usageCount: number;
  avgQueryTime: number;
  totalDocsExamined: number;
  totalDocsReturned: number;
}

interface QueryPattern {
  collection: string;
  query: any;
  indexUsed?: string;
  executionTime: number;
  docsExamined: number;
  docsReturned: number;
  count: number;
  totalExecutionTime: number;
  avgExecutionTime: number;
}

interface QueryResult {
  results: any[];
  executionStats: {
    executionTime: number;
    indexUsed?: string;
    docsExamined: number;
    docsReturned: number;
    isEfficient: boolean;
  };
}

interface IndexRecommendation {
  type: "DROP_UNUSED" | "OPTIMIZE_SLOW" | "MERGE_REDUNDANT" | "CREATE_MISSING";
  indexName: string;
  collection: string;
  reason: string;
  impact: string;
}

interface IndexAnalysisReport {
  totalIndexes: number;
  recommendations: IndexRecommendation[];
  performanceMetrics: Array<{
    indexName: string;
    collection: string;
    usageCount: number;
    avgQueryTime: number;
    efficiency: number;
  }>;
}

interface IndexStats {
  name: string;
  collection: string;
  accesses: any;
  ops: number;
  since: Date;
}
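
Before moving on, here is how the optimizer above might be wired into an application (a sketch; the environment variables and query shape are illustrative):

import { MongoClient } from "mongodb";

async function runIndexAudit(): Promise<void> {
  const client = await MongoClient.connect(process.env.MONGODB_URI!);
  const optimizer = new AdvancedIndexOptimizer(client.db("ecommerce"));

  await optimizer.createOptimizedIndexes();

  // Routing reads through the optimizer feeds its usage metrics
  const { executionStats } = await optimizer.executeOptimizedQuery(
    "products",
    { category_id: 12, availability_status: "in_stock", price: { $lte: 100 } },
    { sort: { popularity_score: -1 }, limit: 20 }
  );

  if (!executionStats.isEfficient) {
    console.warn(
      `Examined ${executionStats.docsExamined} docs to return ` +
        `${executionStats.docsReturned} (index: ${executionStats.indexUsed ?? "none"})`
    );
  }

  // The periodic report flags unused, slow, and redundant index candidates
  const report = await optimizer.analyzeIndexPerformance();
  console.log(`${report.recommendations.length} recommendation(s) pending review`);
}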

Specialized Database Systems: Time-Series and Graph Databases

The Problem: Using the Wrong Database for Specialized Data Patterns

-- The specialized database nightmare - using SQL for everything
-- Time-series data forced into relational structure - RED FLAG #1
CREATE TABLE sensor_data (
    id BIGINT AUTO_INCREMENT PRIMARY KEY,
    sensor_id VARCHAR(50),
    measurement_type VARCHAR(50),
    value DECIMAL(15,6),
    unit VARCHAR(20),
    timestamp TIMESTAMP,
    location_id BIGINT,
    -- Trying to normalize time-series data - RED FLAG #2
    INDEX idx_sensor_time (sensor_id, timestamp),
    INDEX idx_location_time (location_id, timestamp),
    INDEX idx_type_time (measurement_type, timestamp)
);

-- Social network relationships in SQL - RED FLAG #3
CREATE TABLE user_relationships (
    id BIGINT AUTO_INCREMENT PRIMARY KEY,
    user_id BIGINT,
    related_user_id BIGINT,
    relationship_type ENUM('friend', 'follows', 'blocks'),
    created_at TIMESTAMP,
    -- No efficient way to traverse relationship graphs - RED FLAG #4
    INDEX idx_user_relationships (user_id, relationship_type),
    INDEX idx_related_user (related_user_id, relationship_type)
);

-- Complex graph queries in SQL - RED FLAG #5
-- Finding friends of friends (2-degree connections)
SELECT DISTINCT u3.id, u3.username
FROM users u1
JOIN user_relationships r1 ON u1.id = r1.user_id
JOIN user_relationships r2 ON r1.related_user_id = r2.user_id
JOIN users u3 ON r2.related_user_id = u3.id
WHERE u1.id = ?
  AND r1.relationship_type = 'friend'
  AND r2.relationship_type = 'friend'
  AND u3.id != ?;

-- Time-series aggregation that scans millions of rows - RED FLAG #6
SELECT
    DATE(timestamp) as day,
    sensor_id,
    AVG(value) as avg_value,
    MIN(value) as min_value,
    MAX(value) as max_value
FROM sensor_data
WHERE timestamp >= DATE_SUB(NOW(), INTERVAL 30 DAY)
  AND sensor_id IN ('sensor1', 'sensor2', 'sensor3')
GROUP BY DATE(timestamp), sensor_id
ORDER BY day DESC;

-- Problems this creates:
-- - Time-series queries scan entire tables instead of using time-based partitioning
-- - Graph traversals require multiple JOINs and become exponentially expensive
-- - No built-in functions for time-series analysis (downsampling, interpolation)
-- - Relationship queries don't scale beyond 2-3 degrees of separation
-- - Storage overhead from relational normalization for time-series data

The Solution: Purpose-Built Database Systems for Specialized Workloads

// Advanced time-series and graph database implementation
import { InfluxDB, Point } from "@influxdata/influxdb-client";
import { BucketsAPI } from "@influxdata/influxdb-client-apis";
import neo4j, { Driver, Session } from "neo4j-driver";
import { Client as ElasticsearchClient } from "@elastic/elasticsearch";

export class SpecializedDatabaseArchitecture {
  private influxDB!: InfluxDB;
  private neo4jDriver!: Driver;
  private elasticsearch!: ElasticsearchClient;
  private timeseries!: TimeSeriesManager;
  private graph!: GraphManager;

  constructor() {
    this.initializeSpecializedDatabases();
  }

  private async initializeSpecializedDatabases() {
    // InfluxDB for time-series data
    this.influxDB = new InfluxDB({
      url: process.env.INFLUXDB_URL!,
      token: process.env.INFLUXDB_TOKEN!,
    });

    // Neo4j for graph relationships
    this.neo4jDriver = neo4j.driver(
      process.env.NEO4J_URI!,
      neo4j.auth.basic(process.env.NEO4J_USER!, process.env.NEO4J_PASSWORD!),
      {
        maxConnectionLifetime: 30000,
        maxConnectionPoolSize: 50,
        connectionAcquisitionTimeout: 10000,
      }
    );

    // Elasticsearch for full-text search and complex queries
    this.elasticsearch = new ElasticsearchClient({
      node: process.env.ELASTICSEARCH_URL!,
      auth: {
        username: process.env.ELASTICSEARCH_USER!,
        password: process.env.ELASTICSEARCH_PASSWORD!,
      },
      maxRetries: 3,
      requestTimeout: 30000,
    });

    this.timeseries = new TimeSeriesManager(this.influxDB);
    this.graph = new GraphManager(this.neo4jDriver);
  }

  // Time-series data management with proper optimization
  async setupTimeSeriesSchema(): Promise<void> {
    const org = process.env.INFLUXDB_ORG!;

    // Create buckets with appropriate retention policies
    const buckets = [
      {
        name: "sensor_data_realtime",
        retention: 24 * 60 * 60, // 24 hours in seconds
        description: "Real-time sensor data for immediate monitoring",
      },
      {
        name: "sensor_data_hourly",
        retention: 30 * 24 * 60 * 60, // 30 days
        description: "Hourly aggregated sensor data",
      },
      {
        name: "sensor_data_daily",
        retention: 365 * 24 * 60 * 60, // 1 year
        description: "Daily aggregated sensor data for long-term analysis",
      },
      {
        name: "application_metrics",
        retention: 7 * 24 * 60 * 60, // 7 days
        description: "Application performance metrics",
      },
    ];

    const bucketsAPI = new BucketsAPI(this.influxDB);

    for (const bucketConfig of buckets) {
      try {
        const existingBuckets = await bucketsAPI.getBuckets({
          org,
          name: bucketConfig.name,
        });

        if (existingBuckets.buckets?.length === 0) {
          await bucketsAPI.postBuckets({
            body: {
              // orgID expects the organization ID, not the org name
              orgID: org,
              name: bucketConfig.name,
              retentionRules: [
                { type: "expire", everySeconds: bucketConfig.retention },
              ],
              description: bucketConfig.description,
            },
          });

          console.log(`Created InfluxDB bucket: ${bucketConfig.name}`);
        }
      } catch (error) {
        console.error(`Failed to create bucket ${bucketConfig.name}:`, error);
      }
    }
  }

  // Graph database schema setup with proper constraints and indexes
  async setupGraphSchema(): Promise<void> {
    const session = this.neo4jDriver.session();

    try {
      // Create constraints for uniqueness and performance
      await session.run(`
        CREATE CONSTRAINT user_email_unique IF NOT EXISTS 
        FOR (u:User) REQUIRE u.email IS UNIQUE
      `);

      await session.run(`
        CREATE CONSTRAINT user_id_unique IF NOT EXISTS 
        FOR (u:User) REQUIRE u.id IS UNIQUE
      `);

      await session.run(`
        CREATE CONSTRAINT product_id_unique IF NOT EXISTS 
        FOR (p:Product) REQUIRE p.id IS UNIQUE
      `);

      // Create indexes for common query patterns
      await session.run(`
        CREATE INDEX user_created_at IF NOT EXISTS 
        FOR (u:User) ON (u.createdAt)
      `);

      // Neo4j relationship indexes cover one relationship type each
      for (const relType of ["FOLLOWS", "FRIENDS_WITH", "PURCHASED"]) {
        await session.run(`
          CREATE INDEX ${relType.toLowerCase()}_created_at IF NOT EXISTS 
          FOR ()-[r:${relType}]-() ON (r.createdAt)
        `);
      }

      await session.run(`
        CREATE INDEX product_category_price IF NOT EXISTS 
        FOR (p:Product) ON (p.category, p.price)
      `);

      // Create composite indexes for complex queries
      await session.run(`
        CREATE INDEX user_location_interests IF NOT EXISTS 
        FOR (u:User) ON (u.location, u.interests)
      `);

      console.log("Graph database schema setup completed");
    } catch (error) {
      console.error("Failed to setup graph schema:", error);
      throw error;
    } finally {
      await session.close();
    }
  }
}

// Advanced time-series data management
export class TimeSeriesManager {
  private writeAPI: any;
  private queryAPI: any;

  constructor(private influxDB: InfluxDB) {
    const org = process.env.INFLUXDB_ORG!;
    this.writeAPI = influxDB.getWriteApi(org, "sensor_data_realtime");
    this.queryAPI = influxDB.getQueryApi(org);

    // Configure write options for performance
    this.writeAPI.useDefaultTags({ environment: process.env.NODE_ENV });
  }

  // Optimized time-series data ingestion with batching
  async ingestSensorData(sensorData: SensorReading[]): Promise<void> {
    try {
      // Create points with proper tags and fields structure
      const points = sensorData.map((reading) => {
        const point = new Point("sensor_reading")
          // Tags (indexed, for grouping and filtering)
          .tag("sensor_id", reading.sensorId)
          .tag("sensor_type", reading.sensorType)
          .tag("location", reading.location)
          .tag("unit", reading.unit)

          // Fields (not indexed, for measurements)
          .floatField("value", reading.value)
          .floatField("accuracy", reading.accuracy || 1.0)
          .stringField("status", reading.status || "normal")

          // Timestamp
          .timestamp(new Date(reading.timestamp));

        // Add optional metadata as fields
        if (reading.metadata) {
          Object.entries(reading.metadata).forEach(([key, value]) => {
            if (typeof value === "number") {
              point.floatField(`meta_${key}`, value);
            } else {
              point.stringField(`meta_${key}`, String(value));
            }
          });
        }

        return point;
      });

      // Batch write for performance
      this.writeAPI.writePoints(points);
      await this.writeAPI.flush();

      console.log(`Ingested ${sensorData.length} sensor readings`);
    } catch (error) {
      console.error("Failed to ingest sensor data:", error);
      throw error;
    }
  }

  // Advanced time-series queries with aggregation
  async queryTimeSeriesData(
    params: TimeSeriesQueryParams
  ): Promise<TimeSeriesResult> {
    const {
      sensorIds,
      startTime,
      endTime,
      aggregation = "mean",
      window = "1h",
      fill = "null",
    } = params;

    // Build Flux query with proper time ranges and aggregation
    const query = `
      from(bucket: "sensor_data_realtime")
        |> range(start: ${startTime.toISOString()}, stop: ${endTime.toISOString()})
        |> filter(fn: (r) => r["_measurement"] == "sensor_reading")
        |> filter(fn: (r) => contains(value: r["sensor_id"], set: [${sensorIds
          .map((id) => `"${id}"`)
          .join(", ")}]))
        |> filter(fn: (r) => r["_field"] == "value")
        |> aggregateWindow(every: ${window}, fn: ${aggregation}, createEmpty: false)
        |> fill(column: "_value", usePrevious: ${fill === "previous"})
        |> pivot(rowKey:["_time"], columnKey: ["sensor_id"], valueColumn: "_value")
        |> yield(name: "aggregated_data")
    `;

    try {
      const results: any[] = [];
      const observer = {
        next: (row: any, tableMeta: any) => {
          const result = tableMeta.toObject(row);
          results.push(result);
        },
        error: (error: Error) => {
          console.error("Query execution error:", error);
          throw error;
        },
        complete: () => {
          console.log("Query completed successfully");
        },
      };

      await this.queryAPI.queryRows(query, observer);

      return {
        data: results,
        metadata: {
          query,
          sensorCount: sensorIds.length,
          timeRange: { start: startTime, end: endTime },
          aggregation,
          window,
        },
      };
    } catch (error) {
      console.error("Time-series query failed:", error);
      throw error;
    }
  }

  // Continuous aggregation for data downsampling
  async setupContinuousAggregation(): Promise<void> {
    // Create tasks for automatic data downsampling
    const hourlyAggregationTask = `
      import "influxdata/influxdb/tasks"
      
      option task = {
        name: "hourly-sensor-aggregation",
        every: 1h,
        offset: 5m
      }
      
      from(bucket: "sensor_data_realtime")
        |> range(start: -2h, stop: -1h)
        |> filter(fn: (r) => r["_measurement"] == "sensor_reading")
        |> filter(fn: (r) => r["_field"] == "value")
        |> aggregateWindow(every: 1h, fn: mean, createEmpty: false)
        |> set(key: "_measurement", value: "sensor_reading_hourly")
        |> to(bucket: "sensor_data_hourly")
    `;

    const dailyAggregationTask = `
      option task = {
        name: "daily-sensor-aggregation", 
        every: 1d,
        offset: 10m
      }
      
      from(bucket: "sensor_data_hourly")
        |> range(start: -25h, stop: -1h)
        |> filter(fn: (r) => r["_measurement"] == "sensor_reading_hourly")
        |> aggregateWindow(every: 1d, fn: mean, createEmpty: false)
        |> set(key: "_measurement", value: "sensor_reading_daily")
        |> to(bucket: "sensor_data_daily")
    `;

    // Would need to use InfluxDB API to create these tasks
    console.log("Continuous aggregation tasks configured");
  }

  // Time-series anomaly detection
  async detectAnomalies(
    params: AnomalyDetectionParams
  ): Promise<AnomalyResult[]> {
    const { sensorId, timeRange, threshold = 2.0 } = params;

    const anomalyQuery = `
      import "math"

      from(bucket: "sensor_data_realtime")
        |> range(start: ${timeRange.start.toISOString()}, stop: ${timeRange.end.toISOString()})
        |> filter(fn: (r) => r["_measurement"] == "sensor_reading")
        |> filter(fn: (r) => r["sensor_id"] == "${sensorId}")
        |> filter(fn: (r) => r["_field"] == "value")
        |> aggregateWindow(every: 5m, fn: mean, createEmpty: false)
        |> duplicate(column: "_value", as: "original_value")
        |> movingAverage(n: 12) // 1-hour moving average (5min * 12)
        |> duplicate(column: "_value", as: "moving_avg")
        |> map(fn: (r) => ({
            r with 
            deviation: math.abs(x: r.original_value - r.moving_avg),
            threshold: ${threshold}
          }))
        |> filter(fn: (r) => r.deviation > r.threshold * r.moving_avg)
    `;

    try {
      const anomalies: any[] = [];
      const observer = {
        next: (row: any, tableMeta: any) => {
          anomalies.push(tableMeta.toObject(row));
        },
        error: (error: Error) => {
          throw error;
        },
        complete: () => {},
      };

      await this.queryAPI.queryRows(anomalyQuery, observer);

      return anomalies.map((anomaly) => ({
        timestamp: new Date(anomaly._time),
        sensorId: anomaly.sensor_id,
        value: anomaly.original_value,
        expectedValue: anomaly.moving_avg,
        deviation: anomaly.deviation,
        severity: anomaly.deviation > threshold * 2 ? "high" : "medium",
      }));
    } catch (error) {
      console.error("Anomaly detection failed:", error);
      throw error;
    }
  }
}

// Advanced graph database management
export class GraphManager {
  constructor(private driver: Driver) {}

  // Optimized social graph operations
  async createUserRelationship(
    userId1: string,
    userId2: string,
    relationshipType: string,
    metadata: any = {}
  ): Promise<void> {
    const session = this.driver.session();

    try {
      // Relationship types cannot be parameterized in Cypher, so validate
      // relationshipType against an allowlist before interpolating it
      await session.run(
        `
        MATCH (u1:User {id: $userId1}), (u2:User {id: $userId2})
        MERGE (u1)-[r:${relationshipType.toUpperCase()}]->(u2)
        ON CREATE SET r.createdAt = datetime(), r += $metadata
        ON MATCH SET r.updatedAt = datetime()
        RETURN r
      `,
        { userId1, userId2, metadata }
      );

      console.log(
        `Created ${relationshipType} relationship: ${userId1} -> ${userId2}`
      );
    } catch (error) {
      console.error("Failed to create relationship:", error);
      throw error;
    } finally {
      await session.close();
    }
  }

  // Efficient friend recommendation using graph algorithms
  async getFriendRecommendations(
    userId: string,
    limit: number = 10
  ): Promise<UserRecommendation[]> {
    const session = this.driver.session();

    try {
      // Use graph algorithms for friend-of-friend recommendations
      const result = await session.run(
        `
        MATCH (user:User {id: $userId})-[:FRIENDS_WITH]-(friend:User)
        -[:FRIENDS_WITH]-(recommendation:User)
        WHERE NOT (user)-[:FRIENDS_WITH]-(recommendation)
          AND user <> recommendation
        WITH recommendation, 
             count(*) as mutualFriends,
             collect(DISTINCT friend.name) as mutualFriendNames
        OPTIONAL MATCH (recommendation)-[:INTERESTED_IN]->(interest:Interest)
          <-[:INTERESTED_IN]-(user)
        WITH recommendation, mutualFriends, mutualFriendNames,
             count(DISTINCT interest) as commonInterests
        OPTIONAL MATCH (recommendation)-[:LIVES_IN]->(location:Location)
          <-[:LIVES_IN]-(user)
        WITH recommendation, mutualFriends, mutualFriendNames, commonInterests,
             CASE WHEN location IS NOT NULL THEN 1 ELSE 0 END as sameLocation
        RETURN recommendation.id as userId,
               recommendation.name as name,
               recommendation.profilePicture as profilePicture,
               mutualFriends,
               mutualFriendNames[0..3] as topMutualFriends,
               commonInterests,
               sameLocation,
               (mutualFriends * 2 + commonInterests + sameLocation) as score
        ORDER BY score DESC, mutualFriends DESC
        LIMIT $limit
      `,
        // neo4j.int() keeps LIMIT an integer: plain JS numbers are sent as
        // floats, which Cypher rejects in LIMIT (assumes `neo4j` is imported
        // from "neo4j-driver")
        { userId, limit: neo4j.int(limit) }
      );

      return result.records.map((record) => ({
        userId: record.get("userId"),
        name: record.get("name"),
        profilePicture: record.get("profilePicture"),
        mutualFriends: record.get("mutualFriends").toNumber(),
        topMutualFriends: record.get("topMutualFriends"),
        commonInterests: record.get("commonInterests").toNumber(),
        sameLocation: record.get("sameLocation").toNumber() === 1,
        recommendationScore: record.get("score").toNumber(),
      }));
    } catch (error) {
      console.error("Friend recommendation query failed:", error);
      throw error;
    } finally {
      await session.close();
    }
  }

  // Product recommendation using collaborative filtering
  async getProductRecommendations(
    userId: string,
    limit: number = 20
  ): Promise<ProductRecommendation[]> {
    const session = this.driver.session();

    try {
      // Collaborative filtering based on similar users' purchases
      const result = await session.run(
        `
        MATCH (user:User {id: $userId})-[:PURCHASED]->(product:Product)
          <-[:PURCHASED]-(similarUser:User)
        WHERE similarUser <> user
        WITH user, similarUser,
             count(DISTINCT product) as commonPurchases,
             collect(DISTINCT product.category) as sharedCategories
        ORDER BY commonPurchases DESC
        LIMIT 50

        // carrying `user` through the WITH keeps it in scope for the
        // NOT (user)-[:PURCHASED]-> filter below
        MATCH (similarUser)-[:PURCHASED]->(recommendedProduct:Product)
        WHERE NOT (user)-[:PURCHASED]->(recommendedProduct)
          AND recommendedProduct.category IN sharedCategories

        WITH recommendedProduct,
             sum(commonPurchases) as totalScore,
             count(DISTINCT similarUser) as recommendingUsers,
             avg(recommendedProduct.rating) as avgRating
        
        WHERE totalScore >= 2 AND recommendingUsers >= 2
        
        RETURN recommendedProduct.id as productId,
               recommendedProduct.name as name,
               recommendedProduct.price as price,
               recommendedProduct.category as category,
               recommendedProduct.imageUrl as imageUrl,
               avgRating,
               totalScore,
               recommendingUsers
        ORDER BY totalScore DESC, avgRating DESC
        LIMIT $limit
      `,
        { userId, limit: neo4j.int(limit) } // see neo4j.int() note above
      );

      return result.records.map((record) => ({
        productId: record.get("productId"),
        name: record.get("name"),
        // price and avgRating may come back as floats (plain JS numbers),
        // so route them through the Integer-aware helper
        price: this.toNumber(record.get("price")),
        category: record.get("category"),
        imageUrl: record.get("imageUrl"),
        avgRating: this.toNumber(record.get("avgRating")),
        recommendationStrength: record.get("totalScore").toNumber(),
        basedOnUsers: record.get("recommendingUsers").toNumber(),
      }));
    } catch (error) {
      console.error("Product recommendation query failed:", error);
      throw error;
    } finally {
      await session.close();
    }
  }

  // Graph analytics for insights
  async getGraphAnalytics(): Promise<GraphAnalytics> {
    // Neo4j sessions execute queries sequentially; to actually run the
    // analytics queries in parallel, give each one its own short-lived session
    const runQuery = async (query: string) => {
      const session = this.driver.session();
      try {
        return await session.run(query);
      } finally {
        await session.close();
      }
    };

    try {
      // Parallel execution of multiple analytics queries
      const [userStatsResult, relationshipStatsResult, communityStatsResult] =
        await Promise.all([
          // User statistics
          runQuery(`
          MATCH (u:User)
          OPTIONAL MATCH (u)-[r:FRIENDS_WITH]->()
          WITH u, count(r) as friendCount
          RETURN count(u) as totalUsers,
                 avg(friendCount) as avgFriends,
                 max(friendCount) as maxFriends,
                 percentileDisc(friendCount, 0.5) as medianFriends
        `),

          // Relationship statistics
          runQuery(`
          MATCH ()-[r]->()
          WITH type(r) as relType, count(*) as relCount
          RETURN collect({type: relType, count: relCount}) as relationships
        `),

          // Community detection via GDS Louvain; assumes a graph projection
          // named 'user-network' already exists (see the sketch after this class)
          runQuery(`
          CALL gds.louvain.stream('user-network')
          YIELD nodeId, communityId
          WITH communityId, count(*) as communitySize
          RETURN count(DISTINCT communityId) as totalCommunities,
                 avg(communitySize) as avgCommunitySize,
                 max(communitySize) as largestCommunity
        `),
        ]);

      const userStats = userStatsResult.records[0];
      const relationshipStats = relationshipStatsResult.records[0];
      const communityStats = communityStatsResult.records[0];

      return {
        users: {
          total: userStats.get("totalUsers").toNumber(),
          // avg() returns a float, which the driver maps to a plain JS number
          avgFriends: this.toNumber(userStats.get("avgFriends")),
          maxFriends: userStats.get("maxFriends").toNumber(),
          medianFriends: userStats.get("medianFriends").toNumber(),
        },
        // counts inside collected maps are neo4j Integers; normalize them
        relationships: relationshipStats.get("relationships").map((r: any) => ({
          type: r.type,
          count: this.toNumber(r.count),
        })),
        communities: {
          total: communityStats.get("totalCommunities").toNumber(),
          avgSize: this.toNumber(communityStats.get("avgCommunitySize")),
          largest: communityStats.get("largestCommunity").toNumber(),
        },
      };
    } catch (error) {
      console.error("Graph analytics failed:", error);
      throw error;
    }
  }
}
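
One important assumption in getGraphAnalytics: the Louvain call only works if a GDS in-memory projection named 'user-network' already exists. A minimal sketch of creating that projection, assuming the Neo4j Graph Data Science plugin is installed (the helper function name is a placeholder):

// Minimal sketch: project the social graph into GDS memory so that
// gds.louvain.stream('user-network') has something to run against.
// Assumes the Graph Data Science plugin is installed; the projection
// name and labels mirror the analytics query above.
import { Driver } from "neo4j-driver";

async function projectUserNetwork(driver: Driver): Promise<void> {
  const session = driver.session();
  try {
    await session.run(`
      CALL gds.graph.project(
        'user-network',   // projection name used by getGraphAnalytics
        'User',           // node label to include
        { FRIENDS_WITH: { orientation: 'UNDIRECTED' } }
      )
    `);
  } finally {
    await session.close();
  }
}

Projecting FRIENDS_WITH as UNDIRECTED matters here: friendships are mutual, and community detection over a directed projection would split communities along arbitrary edge directions.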

// Supporting interfaces
interface SensorReading {
  sensorId: string;
  sensorType: string;
  location: string;
  value: number;
  unit: string;
  timestamp: number;
  accuracy?: number;
  status?: string;
  metadata?: Record<string, any>;
}

interface TimeSeriesQueryParams {
  sensorIds: string[];
  startTime: Date;
  endTime: Date;
  aggregation?: string;
  window?: string;
  fill?: string;
}

interface TimeSeriesResult {
  data: any[];
  metadata: {
    query: string;
    sensorCount: number;
    timeRange: { start: Date; end: Date };
    aggregation: string;
    window: string;
  };
}

interface AnomalyDetectionParams {
  sensorId: string;
  timeRange: { start: Date; end: Date };
  threshold?: number;
}

interface AnomalyResult {
  timestamp: Date;
  sensorId: string;
  value: number;
  expectedValue: number;
  deviation: number;
  severity: "low" | "medium" | "high";
}

interface UserRecommendation {
  userId: string;
  name: string;
  profilePicture?: string;
  mutualFriends: number;
  topMutualFriends: string[];
  commonInterests: number;
  sameLocation: boolean;
  recommendationScore: number;
}

interface ProductRecommendation {
  productId: string;
  name: string;
  price: number;
  category: string;
  imageUrl?: string;
  avgRating: number;
  recommendationStrength: number;
  basedOnUsers: number;
}

interface GraphAnalytics {
  users: {
    total: number;
    avgFriends: number;
    maxFriends: number;
    medianFriends: number;
  };
  relationships: Array<{
    type: string;
    count: number;
  }>;
  communities: {
    total: number;
    avgSize: number;
    largest: number;
  };
}
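
To tie the graph pieces together, here is a hedged usage sketch; the connection URI, credentials, and user ID are placeholders:

// Usage sketch with placeholder connection details
import neo4j from "neo4j-driver";

async function main(): Promise<void> {
  const driver = neo4j.driver(
    "bolt://localhost:7687", // placeholder URI
    neo4j.auth.basic("neo4j", "password") // placeholder credentials
  );

  try {
    const graph = new GraphManager(driver);

    // Friend suggestions ranked by mutual friends, interests, and location
    const friends = await graph.getFriendRecommendations("user-123", 5);
    console.log(friends.map((f) => `${f.name} (score ${f.recommendationScore})`));

    // Collaborative-filtering product suggestions for the same user
    const products = await graph.getProductRecommendations("user-123", 10);
    console.log(products.map((p) => `${p.name} ($${p.price})`));
  } finally {
    await driver.close();
  }
}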

This comprehensive guide to advanced database concepts gives you:

  1. Strategic storage engine selection that matches database engines to specific workload characteristics
  2. Advanced indexing strategies with performance monitoring and optimization recommendations
  3. Specialized database implementations using time-series and graph databases for appropriate use cases
  4. Performance optimization techniques that go beyond basic SQL tuning
  5. Production-ready monitoring and analytics for database health and query performance

The difference between database architectures that scale gracefully and those that require constant rebuilds isn’t just following best practices; it’s understanding the fundamental characteristics of different data patterns and choosing the right specialized tool for each job.