Third-party Integration - 2/2

The $18 Million Integration Tsunami That Crushed Cyber Monday

Picture this catastrophe: November 27th, 11:30 PM EST. One of the largest retail platforms is preparing for Cyber Monday—the biggest online shopping day of the year. They’ve successfully handled Black Friday at 300% of normal traffic. Their integrations with payment processors, inventory systems, and shipping providers have been rock solid. Everything is optimized and ready.

Then midnight hits, and the perfect storm begins.

Within the first 30 minutes, what started as a trickle becomes a tsunami:

  • Stripe starts sending 50,000 webhook events per minute instead of the usual 500
  • PayPal webhooks begin arriving 4 hours late, causing payment status confusion
  • The inventory management system API starts rate limiting at 100 requests/minute instead of 1,000
  • Shipping carrier APIs return 500 errors for 40% of tracking requests
  • Email service webhooks flood the servers with 200,000 delivery status updates
  • Social media sharing APIs hit their daily quota limits by 1 AM
  • The recommendation engine API starts timing out after 2 seconds instead of responding in 200ms

But here’s where it gets catastrophic—their webhook processing system, designed to handle the “normal” flow of external events, begins to collapse under the load:

  • Webhook queue overflow: 2 million unprocessed webhook events backed up
  • Duplicate processing: Failed webhooks retrying without deduplication, creating data corruption
  • Resource exhaustion: Webhook processors consuming 100% CPU trying to handle the flood
  • Database deadlocks: Concurrent webhook processing causing payment status conflicts
  • Memory leaks: Webhook handlers not releasing resources, causing server crashes
  • Cascading failures: Slow webhook processing causing HTTP timeouts, triggering even more retries

By 3 AM, the damage was staggering:

  • $18 million in revenue lost due to failed payment confirmations
  • 1.2 million customers with incorrect order statuses
  • 800,000 shipping notifications never sent
  • 3 million email delivery events never processed
  • Complete breakdown of real-time inventory tracking
  • Customer service receiving 100,000 tickets about “missing” orders that actually went through

The brutal truth? They had built integrations that worked beautifully under normal load but had never planned for the exponential chaos that external services create during peak traffic.

The Uncomfortable Truth About Integration Resilience

Here’s what separates applications that gracefully handle third-party chaos from those that crumble when external services misbehave: Integration resilience isn’t just about handling individual API failures—it’s about building systems that can adapt to unpredictable patterns, survive webhook floods, and maintain consistency when external services behave in ways you never anticipated.

Most developers approach integration resilience like this:

  1. Add retry logic and assume that solves reliability
  2. Process webhooks as they arrive and hope the load stays manageable
  3. Set up basic monitoring for uptime and response times
  4. Discover during peak traffic that external services scale differently than your application
  5. Panic when webhook floods overwhelm your servers and corrupt your data

But systems that handle real-world integration complexity work differently:

  1. Design webhook processing to handle unpredictable loads and patterns
  2. Implement intelligent rate limiting that adapts to external service behavior
  3. Build comprehensive monitoring that tracks integration health, not just uptime
  4. Plan for service degradation and implement smart fallback strategies
  5. Treat external service behavior as a distributed systems problem requiring sophisticated coordination

The difference isn’t just reliability—it’s the difference between systems that become more robust as they integrate with more services and systems where each new integration becomes a potential single point of failure.

Ready to build integrations that work like Shopify’s payment processing during Black Friday instead of that webhook handler that falls over when someone sneezes at the third-party data center? Let’s dive into the patterns that power bulletproof integration resilience.


Webhook Handling and Processing: Taming the Event Storm

Scalable Webhook Processing Architecture

// Advanced webhook processing system that handles floods and ensures reliability
class WebhookProcessingEngine {
  private processors: Map<string, WebhookProcessor> = new Map();
  private incomingQueue: WebhookQueue;
  private processingQueue: WebhookQueue;
  private deadLetterQueue: WebhookQueue;
  private deduplicationCache: DeduplicationCache;
  private rateLimiter: WebhookRateLimiter;
  private retryManager: WebhookRetryManager;
  private metricsCollector: WebhookMetrics;

  constructor(config: WebhookEngineConfig) {
    this.incomingQueue = new WebhookQueue("incoming", config.queueConfig);
    this.processingQueue = new WebhookQueue("processing", config.queueConfig);
    this.deadLetterQueue = new WebhookQueue("dead-letter", config.queueConfig);
    this.deduplicationCache = new DeduplicationCache(config.cacheConfig);
    this.rateLimiter = new WebhookRateLimiter(config.rateLimitConfig);
    this.retryManager = new WebhookRetryManager(config.retryConfig);
    this.metricsCollector = new WebhookMetrics();

    this.initializeProcessors(config.processors);
    this.startProcessingWorkers(config.workerCount || 10);
    this.setupHealthMonitoring();
  }

  // Accept incoming webhook with immediate response
  async receiveWebhook(
    provider: string,
    signature: string,
    payload: any,
    headers: Record<string, string>
  ): Promise<WebhookReceiptResult> {
    const webhookId = this.generateWebhookId();
    const receivedAt = Date.now();

    try {
      // Immediate signature validation (fast path)
      const processor = this.processors.get(provider);
      if (!processor) {
        throw new WebhookError(
          `No processor found for provider: ${provider}`,
          "UNKNOWN_PROVIDER"
        );
      }

      // Basic signature validation
      const isValidSignature = await processor.validateSignature(
        signature,
        payload,
        headers
      );

      if (!isValidSignature) {
        await this.metricsCollector.recordRejected(
          provider,
          "INVALID_SIGNATURE"
        );
        throw new WebhookError(
          "Invalid webhook signature",
          "INVALID_SIGNATURE"
        );
      }

      // Apply rate limiting per provider
      const rateLimitResult = await this.rateLimiter.checkLimit(provider);
      if (!rateLimitResult.allowed) {
        await this.metricsCollector.recordRejected(provider, "RATE_LIMITED");
        throw new WebhookError("Rate limit exceeded", "RATE_LIMITED", 429, {
          retryAfter: rateLimitResult.retryAfter,
        });
      }

      // Create webhook record
      const webhook: IncomingWebhook = {
        id: webhookId,
        provider,
        signature,
        payload,
        headers,
        receivedAt,
        status: "received",
        attempts: 0,
        maxAttempts: this.getMaxAttempts(provider),
        nextRetryAt: null,
      };

      // Queue for processing (fast operation)
      await this.incomingQueue.enqueue(webhook);

      // Record metrics
      await this.metricsCollector.recordReceived(provider);

      console.log(`Webhook received: ${webhookId} from ${provider}`);

      return {
        webhookId,
        status: "accepted",
        message: "Webhook queued for processing",
      };
    } catch (error) {
      console.error(`Webhook reception failed: ${webhookId}`, error);

      return {
        webhookId,
        status: "rejected",
        message: error.message,
        error: error.code,
      };
    }
  }

  // Process webhooks with deduplication and retry logic
  private async processWebhook(webhook: IncomingWebhook): Promise<void> {
    const startTime = Date.now();
    let processingResult: WebhookProcessingResult;

    try {
      // Check for duplicate processing
      const deduplicationKey = this.generateDeduplicationKey(webhook);
      const isDuplicate = await this.deduplicationCache.isDuplicate(
        deduplicationKey
      );

      if (isDuplicate) {
        console.log(`Duplicate webhook detected: ${webhook.id}`);
        await this.metricsCollector.recordDuplicate(webhook.provider);
        return;
      }

      // Mark as processing to prevent duplicates
      await this.deduplicationCache.markProcessing(
        deduplicationKey,
        webhook.id
      );

      // Get processor for this provider
      const processor = this.processors.get(webhook.provider);
      if (!processor) {
        throw new WebhookError(
          `No processor found for provider: ${webhook.provider}`,
          "NO_PROCESSOR"
        );
      }

      // Parse webhook events
      const events = await processor.parseEvents(
        webhook.payload,
        webhook.headers
      );

      if (events.length === 0) {
        console.log(`No events found in webhook: ${webhook.id}`);
        await this.metricsCollector.recordProcessed(
          webhook.provider,
          0,
          Date.now() - startTime
        );
        return;
      }

      // Process each event in the webhook
      processingResult = await this.processWebhookEvents(webhook, events);

      // Record successful processing
      await this.metricsCollector.recordProcessed(
        webhook.provider,
        events.length,
        Date.now() - startTime
      );

      // Mark processing complete in cache
      await this.deduplicationCache.markCompleted(
        deduplicationKey,
        processingResult
      );

      console.log(
        `Webhook processed successfully: ${webhook.id} (${events.length} events)`
      );
    } catch (error) {
      const duration = Date.now() - startTime;

      console.error(`Webhook processing failed: ${webhook.id}`, error);

      // Determine if error is retryable
      const shouldRetry = this.shouldRetryWebhook(error, webhook);

      if (shouldRetry && webhook.attempts < webhook.maxAttempts) {
        // Calculate retry delay with exponential backoff
        const retryDelay = this.calculateRetryDelay(webhook.attempts);
        webhook.nextRetryAt = Date.now() + retryDelay;
        webhook.attempts++;
        webhook.lastError = error.message;

        // Requeue for retry
        await this.processingQueue.enqueueDelayed(webhook, retryDelay);

        await this.metricsCollector.recordRetry(
          webhook.provider,
          webhook.attempts
        );

        console.log(
          `Webhook queued for retry: ${webhook.id} (attempt ${webhook.attempts}/${webhook.maxAttempts})`
        );
      } else {
        // Max retries exceeded or non-retryable error
        webhook.status = "failed";
        webhook.lastError = error.message;

        await this.deadLetterQueue.enqueue(webhook);
        await this.metricsCollector.recordFailed(webhook.provider, duration);

        console.error(
          `Webhook moved to dead letter queue: ${webhook.id}`,
          error
        );
      }
    }
  }

  private async processWebhookEvents(
    webhook: IncomingWebhook,
    events: WebhookEvent[]
  ): Promise<WebhookProcessingResult> {
    const results: EventProcessingResult[] = [];
    const processor = this.processors.get(webhook.provider)!;

    for (const event of events) {
      try {
        // Process individual event with timeout
        const eventResult = await Promise.race([
          processor.processEvent(event, webhook),
          new Promise<never>((_, reject) =>
            setTimeout(
              () => reject(new Error("Event processing timeout")),
              30000 // 30 second timeout per event
            )
          ),
        ]);

        results.push({
          eventId: event.id,
          eventType: event.type,
          success: true,
          result: eventResult,
        });
      } catch (error) {
        console.error(
          `Event processing failed: ${webhook.id}/${event.id}`,
          error
        );

        results.push({
          eventId: event.id,
          eventType: event.type,
          success: false,
          error: error.message,
        });

        // If it's a critical event type, fail the entire webhook
        if (this.isCriticalEventType(event.type)) {
          throw new WebhookError(
            `Critical event processing failed: ${event.type}`,
            "CRITICAL_EVENT_FAILED",
            500,
            { eventId: event.id, originalError: error }
          );
        }
      }
    }

    const successfulEvents = results.filter((r) => r.success).length;
    const failedEvents = results.filter((r) => !r.success).length;

    return {
      webhookId: webhook.id,
      totalEvents: events.length,
      successfulEvents,
      failedEvents,
      results,
      processedAt: Date.now(),
    };
  }

  // Start processing workers
  private startProcessingWorkers(workerCount: number): void {
    for (let i = 0; i < workerCount; i++) {
      this.startProcessingWorker(`worker-${i}`);
    }
    console.log(`Started ${workerCount} webhook processing workers`);
  }

  private async startProcessingWorker(workerId: string): Promise<void> {
    console.log(`Starting webhook processing worker: ${workerId}`);

    // Process incoming webhooks
    const processIncoming = async () => {
      while (true) {
        try {
          const webhook = await this.incomingQueue.dequeue(5000); // 5 second timeout
          if (webhook) {
            webhook.status = "processing";
            await this.processWebhook(webhook);
          }
        } catch (error) {
          console.error(`Worker ${workerId} error processing incoming:`, error);
          await new Promise((resolve) => setTimeout(resolve, 1000)); // Brief pause on error
        }
      }
    };

    // Process retry webhooks
    const processRetries = async () => {
      while (true) {
        try {
          const webhook = await this.processingQueue.dequeueReady(5000);
          if (webhook) {
            webhook.status = "retrying";
            await this.processWebhook(webhook);
          }
        } catch (error) {
          console.error(`Worker ${workerId} error processing retries:`, error);
          await new Promise((resolve) => setTimeout(resolve, 1000));
        }
      }
    };

    // Run both processing loops concurrently (fire-and-forget; each loop handles its own errors)
    void Promise.all([processIncoming(), processRetries()]).catch((error) =>
      console.error(`Worker ${workerId} processing loop terminated:`, error)
    );
  }

  // Webhook replay for failed processing
  async replayWebhook(webhookId: string): Promise<WebhookReplayResult> {
    try {
      // Find webhook in dead letter queue
      const webhook = await this.deadLetterQueue.findById(webhookId);
      if (!webhook) {
        throw new WebhookError("Webhook not found", "WEBHOOK_NOT_FOUND");
      }

      // Reset webhook state for replay
      webhook.status = "received";
      webhook.attempts = 0;
      webhook.nextRetryAt = null;
      webhook.lastError = null;

      // Remove from dead letter queue
      await this.deadLetterQueue.remove(webhookId);

      // Requeue for processing
      await this.incomingQueue.enqueue(webhook);

      console.log(`Webhook queued for replay: ${webhookId}`);

      return {
        webhookId,
        status: "replaying",
        message: "Webhook queued for replay processing",
      };
    } catch (error) {
      return {
        webhookId,
        status: "replay_failed",
        message: error.message,
        error: error.code,
      };
    }
  }

  // Batch webhook operations
  async replayWebhookBatch(
    criteria: WebhookReplayCriteria
  ): Promise<BatchReplayResult> {
    const webhooks = await this.deadLetterQueue.findByCriteria(criteria);
    const results: WebhookReplayResult[] = [];

    for (const webhook of webhooks) {
      const result = await this.replayWebhook(webhook.id);
      results.push(result);
    }

    return {
      total: webhooks.length,
      successful: results.filter((r) => r.status === "replaying").length,
      failed: results.filter((r) => r.status === "replay_failed").length,
      results,
    };
  }

  // Webhook analytics and monitoring
  async getWebhookStatistics(
    timeRange: TimeRange,
    provider?: string
  ): Promise<WebhookStatistics> {
    return await this.metricsCollector.getStatistics(timeRange, provider);
  }

  async getWebhookHealth(): Promise<WebhookHealthStatus> {
    const queueStats = {
      incoming: await this.incomingQueue.getSize(),
      processing: await this.processingQueue.getSize(),
      deadLetter: await this.deadLetterQueue.getSize(),
    };

    const processingRates = await this.metricsCollector.getProcessingRates();
    const errorRates = await this.metricsCollector.getErrorRates();

    return {
      queues: queueStats,
      processingRates,
      errorRates,
      healthScore: this.calculateHealthScore(queueStats, errorRates),
      lastUpdated: Date.now(),
    };
  }

  // Initialize webhook processors for different providers
  private initializeProcessors(
    processorConfigs: WebhookProcessorConfig[]
  ): void {
    for (const config of processorConfigs) {
      let processor: WebhookProcessor;

      switch (config.provider) {
        case "stripe":
          processor = new StripeWebhookProcessor(config);
          break;
        case "paypal":
          processor = new PayPalWebhookProcessor(config);
          break;
        case "github":
          processor = new GitHubWebhookProcessor(config);
          break;
        case "sendgrid":
          processor = new SendGridWebhookProcessor(config);
          break;
        default:
          throw new Error(`Unsupported webhook provider: ${config.provider}`);
      }

      this.processors.set(config.provider, processor);
      console.log(`Initialized webhook processor: ${config.provider}`);
    }
  }

  // Utility methods
  private generateWebhookId(): string {
    return `wh_${Date.now()}_${Math.random().toString(36).slice(2, 11)}`;
  }

  private generateDeduplicationKey(webhook: IncomingWebhook): string {
    // Create unique key based on provider, signature, and payload hash
    const payloadHash = this.hashPayload(webhook.payload);
    return `${webhook.provider}:${webhook.signature}:${payloadHash}`;
  }

  private hashPayload(payload: any): string {
    const crypto = require("crypto");
    return crypto
      .createHash("sha256")
      .update(JSON.stringify(payload))
      .digest("hex");
  }

  private shouldRetryWebhook(error: any, webhook: IncomingWebhook): boolean {
    // Determine if webhook should be retried based on error type
    const retryableErrors = [
      "TIMEOUT",
      "CONNECTION_ERROR",
      "SERVICE_UNAVAILABLE",
      "DATABASE_ERROR",
    ];

    const nonRetryableErrors = [
      "INVALID_SIGNATURE",
      "NO_PROCESSOR",
      "MALFORMED_PAYLOAD",
    ];

    if (nonRetryableErrors.includes(error.code)) {
      return false;
    }

    if (retryableErrors.includes(error.code) || error.status >= 500) {
      return true;
    }

    return false;
  }

  private calculateRetryDelay(attemptNumber: number): number {
    // Exponential backoff with jitter
    const baseDelay = 1000; // 1 second
    const maxDelay = 300000; // 5 minutes
    const exponentialDelay = baseDelay * Math.pow(2, attemptNumber);
    const jitter = Math.random() * 1000; // Up to 1 second jitter

    return Math.min(maxDelay, exponentialDelay + jitter);
  }

  private getMaxAttempts(provider: string): number {
    const maxAttemptsByProvider: Record<string, number> = {
      stripe: 5,
      paypal: 3,
      github: 7,
      sendgrid: 5,
    };

    return maxAttemptsByProvider[provider] || 5;
  }

  private isCriticalEventType(eventType: string): boolean {
    const criticalEvents = [
      "payment.completed",
      "payment.failed",
      "subscription.cancelled",
      "account.deleted",
    ];

    return criticalEvents.includes(eventType);
  }

  private calculateHealthScore(queueStats: any, errorRates: any): number {
    let score = 100;

    // Deduct points for queue backlog
    if (queueStats.incoming > 1000) score -= 20;
    if (queueStats.processing > 500) score -= 15;
    if (queueStats.deadLetter > 100) score -= 25;

    // Deduct points for high error rates (tiered, highest bracket only)
    if (errorRates.overall > 0.1) score -= 30; // >10% error rate
    else if (errorRates.overall > 0.05) score -= 15; // >5% error rate

    return Math.max(0, Math.min(100, score));
  }

  private setupHealthMonitoring(): void {
    setInterval(async () => {
      try {
        const health = await this.getWebhookHealth();

        if (health.healthScore < 70) {
          console.warn("Webhook system health degraded:", health);
        }

        // Auto-scale workers if queues are backing up
        if (health.queues.incoming > 5000) {
          console.log("High webhook queue detected, consider scaling workers");
        }
      } catch (error) {
        console.error("Health monitoring error:", error);
      }
    }, 30000); // Every 30 seconds
  }
}
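
The receiveWebhook path above deliberately does only fast work (signature check, rate limit check, enqueue) so the HTTP response can return immediately while the workers drain the queues. Below is a minimal sketch of how the engine might be exposed over HTTP, assuming Express; the route, port, header names, and the engineConfig object are illustrative and not part of the engine itself. The body is captured raw because providers such as Stripe sign the exact request bytes, so the payload must reach validateSignature without being re-serialized.

// Hypothetical HTTP wiring for WebhookProcessingEngine (Express assumed)
import express from "express";

const app = express();
const engine = new WebhookProcessingEngine(engineConfig); // engineConfig assembled elsewhere

app.post(
  "/webhooks/:provider",
  // Preserve the raw body so signature verification sees the original bytes
  express.raw({ type: "application/json" }),
  async (req, res) => {
    const provider = req.params.provider;
    const signature =
      (req.headers["stripe-signature"] as string) ||
      (req.headers["x-webhook-signature"] as string) ||
      "";

    const result = await engine.receiveWebhook(
      provider,
      signature,
      req.body, // Buffer containing the raw payload
      req.headers as Record<string, string>
    );

    // Respond quickly; heavy processing happens asynchronously in the workers
    res.status(result.status === "accepted" ? 202 : 400).json(result);
  }
);

app.listen(3000, () => console.log("Webhook receiver listening on :3000"));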

// Example Stripe webhook processor implementation
class StripeWebhookProcessor implements WebhookProcessor {
  constructor(private config: WebhookProcessorConfig) {}

  async validateSignature(
    signature: string,
    payload: any,
    headers: Record<string, string>
  ): Promise<boolean> {
    try {
      const stripe = require("stripe")(this.config.secretKey);
      const endpointSecret = this.config.webhookSecret;

      // Use Stripe's signature validation
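      // Note: constructEvent must receive the raw request body (string or Buffer)
      // exactly as delivered; a re-serialized JSON object will fail verification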
      stripe.webhooks.constructEvent(payload, signature, endpointSecret);
      return true;
    } catch (error) {
      console.error("Stripe signature validation failed:", error.message);
      return false;
    }
  }

  async parseEvents(
    payload: any,
    headers: Record<string, string>
  ): Promise<WebhookEvent[]> {
    const stripeEvent = payload;

    return [
      {
        id: stripeEvent.id,
        type: stripeEvent.type,
        data: stripeEvent.data,
        createdAt: stripeEvent.created * 1000, // Convert to milliseconds
        metadata: {
          apiVersion: stripeEvent.api_version,
          livemode: stripeEvent.livemode,
        },
      },
    ];
  }

  async processEvent(
    event: WebhookEvent,
    webhook: IncomingWebhook
  ): Promise<EventProcessingResult> {
    console.log(`Processing Stripe event: ${event.type} (${event.id})`);

    try {
      switch (event.type) {
        case "payment_intent.succeeded":
          return await this.handlePaymentSuccess(event);

        case "payment_intent.payment_failed":
          return await this.handlePaymentFailed(event);

        case "customer.subscription.created":
          return await this.handleSubscriptionCreated(event);

        case "customer.subscription.deleted":
          return await this.handleSubscriptionCancelled(event);

        case "invoice.payment_succeeded":
          return await this.handleInvoicePaymentSuccess(event);

        case "invoice.payment_failed":
          return await this.handleInvoicePaymentFailed(event);

        default:
          console.log(`Unhandled Stripe event type: ${event.type}`);
          return {
            success: true,
            message: "Event type not handled, but acknowledged",
          };
      }
    } catch (error) {
      return {
        success: false,
        error: error.message,
        retryable: this.isRetryableError(error),
      };
    }
  }

  private async handlePaymentSuccess(
    event: WebhookEvent
  ): Promise<EventProcessingResult> {
    const paymentIntent = event.data.object;

    // Update payment status in database
    await this.updatePaymentStatus(
      paymentIntent.metadata?.orderId || paymentIntent.id,
      "completed",
      {
        stripePaymentIntentId: paymentIntent.id,
        amount: paymentIntent.amount,
        currency: paymentIntent.currency,
      }
    );

    // Send confirmation email
    await this.sendPaymentConfirmation(paymentIntent);

    // Update inventory
    await this.updateInventory(paymentIntent.metadata?.orderId);

    return {
      success: true,
      message: `Payment confirmed: ${paymentIntent.id}`,
      data: { paymentIntentId: paymentIntent.id },
    };
  }

  private async handlePaymentFailed(
    event: WebhookEvent
  ): Promise<EventProcessingResult> {
    const paymentIntent = event.data.object;

    // Update payment status in database
    await this.updatePaymentStatus(
      paymentIntent.metadata?.orderId || paymentIntent.id,
      "failed",
      {
        stripePaymentIntentId: paymentIntent.id,
        failureReason: paymentIntent.last_payment_error?.message,
      }
    );

    // Send failure notification
    await this.sendPaymentFailureNotification(paymentIntent);

    // Release inventory hold
    await this.releaseInventoryHold(paymentIntent.metadata?.orderId);

    return {
      success: true,
      message: `Payment failure processed: ${paymentIntent.id}`,
      data: { paymentIntentId: paymentIntent.id },
    };
  }

  private async handleSubscriptionCreated(
    event: WebhookEvent
  ): Promise<EventProcessingResult> {
    const subscription = event.data.object;

    // Create subscription record
    await this.createSubscription({
      stripeSubscriptionId: subscription.id,
      customerId: subscription.customer,
      status: subscription.status,
      currentPeriodStart: subscription.current_period_start * 1000,
      currentPeriodEnd: subscription.current_period_end * 1000,
      items: subscription.items.data,
    });

    // Send welcome email
    await this.sendSubscriptionWelcome(subscription);

    return {
      success: true,
      message: `Subscription created: ${subscription.id}`,
      data: { subscriptionId: subscription.id },
    };
  }

  // Placeholder methods for actual business logic
  private async updatePaymentStatus(
    orderId: string,
    status: string,
    metadata: any
  ): Promise<void> {
    // Implementation would update your database
    console.log(`Updating payment status: ${orderId} -> ${status}`);
  }

  private async sendPaymentConfirmation(paymentIntent: any): Promise<void> {
    // Implementation would send email via your email service
    console.log(`Sending payment confirmation for: ${paymentIntent.id}`);
  }

  private async updateInventory(orderId: string): Promise<void> {
    // Implementation would update inventory levels
    console.log(`Updating inventory for order: ${orderId}`);
  }

  private async sendPaymentFailureNotification(
    paymentIntent: any
  ): Promise<void> {
    // Implementation would notify customer of payment failure
    console.log(`Sending failure notification for: ${paymentIntent.id}`);
  }

  private async releaseInventoryHold(orderId: string): Promise<void> {
    // Implementation would release inventory hold
    console.log(`Releasing inventory hold for order: ${orderId}`);
  }

  private async createSubscription(subscriptionData: any): Promise<void> {
    // Implementation would create subscription record
    console.log(
      `Creating subscription: ${subscriptionData.stripeSubscriptionId}`
    );
  }

  private async sendSubscriptionWelcome(subscription: any): Promise<void> {
    // Implementation would send welcome email
    console.log(`Sending subscription welcome: ${subscription.id}`);
  }

  private isRetryableError(error: any): boolean {
    const retryableErrors = [
      "DATABASE_CONNECTION_ERROR",
      "EMAIL_SERVICE_ERROR",
      "INVENTORY_SERVICE_ERROR",
    ];

    return retryableErrors.includes(error.code) || error.status >= 500;
  }
}

// Supporting classes and interfaces
interface WebhookEngineConfig {
  queueConfig: any;
  cacheConfig: any;
  rateLimitConfig: any;
  retryConfig: any;
  processors: WebhookProcessorConfig[];
  workerCount?: number;
}

interface WebhookProcessorConfig {
  provider: string;
  secretKey: string;
  webhookSecret: string;
  maxEvents?: number;
}

interface WebhookProcessor {
  validateSignature(
    signature: string,
    payload: any,
    headers: Record<string, string>
  ): Promise<boolean>;
  parseEvents(
    payload: any,
    headers: Record<string, string>
  ): Promise<WebhookEvent[]>;
  processEvent(
    event: WebhookEvent,
    webhook: IncomingWebhook
  ): Promise<EventProcessingResult>;
}

interface IncomingWebhook {
  id: string;
  provider: string;
  signature: string;
  payload: any;
  headers: Record<string, string>;
  receivedAt: number;
  status: "received" | "processing" | "retrying" | "completed" | "failed";
  attempts: number;
  maxAttempts: number;
  nextRetryAt: number | null;
  lastError?: string;
}

interface WebhookEvent {
  id: string;
  type: string;
  data: any;
  createdAt: number;
  metadata?: any;
}

interface EventProcessingResult {
  success: boolean;
  message?: string;
  data?: any;
  error?: string;
  retryable?: boolean;
}

interface WebhookProcessingResult {
  webhookId: string;
  totalEvents: number;
  successfulEvents: number;
  failedEvents: number;
  results: EventProcessingResult[];
  processedAt: number;
}

interface WebhookReceiptResult {
  webhookId: string;
  status: "accepted" | "rejected";
  message: string;
  error?: string;
}

interface WebhookReplayResult {
  webhookId: string;
  status: "replaying" | "replay_failed";
  message: string;
  error?: string;
}

interface BatchReplayResult {
  total: number;
  successful: number;
  failed: number;
  results: WebhookReplayResult[];
}

interface WebhookReplayCriteria {
  provider?: string;
  dateRange?: { start: Date; end: Date };
  errorType?: string;
  limit?: number;
}

interface WebhookStatistics {
  totalReceived: number;
  totalProcessed: number;
  totalFailed: number;
  successRate: number;
  averageProcessingTime: number;
  byProvider: Record<string, any>;
}

interface WebhookHealthStatus {
  queues: {
    incoming: number;
    processing: number;
    deadLetter: number;
  };
  processingRates: any;
  errorRates: any;
  healthScore: number;
  lastUpdated: number;
}

interface TimeRange {
  start: Date;
  end: Date;
}

class WebhookError extends Error {
  constructor(
    message: string,
    public code: string,
    public status?: number,
    public metadata?: any
  ) {
    super(message);
    this.name = "WebhookError";
  }
}

// Placeholder classes that would need full implementation
class WebhookQueue {
  constructor(private name: string, private config: any) {}
  async enqueue(webhook: IncomingWebhook): Promise<void> {}
  async enqueueDelayed(
    webhook: IncomingWebhook,
    delayMs: number
  ): Promise<void> {}
  async dequeue(timeoutMs?: number): Promise<IncomingWebhook | null> {
    return null;
  }
  async dequeueReady(timeoutMs?: number): Promise<IncomingWebhook | null> {
    return null;
  }
  async findById(id: string): Promise<IncomingWebhook | null> {
    return null;
  }
  async findByCriteria(
    criteria: WebhookReplayCriteria
  ): Promise<IncomingWebhook[]> {
    return [];
  }
  async remove(id: string): Promise<void> {}
  async getSize(): Promise<number> {
    return 0;
  }
}
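
The queue above is a stub. One possible concrete backing is Redis; the sketch below assumes the ioredis client, and the key naming, JSON serialization, and poll-based delayed dequeue are illustrative choices rather than requirements. The zrangebyscore/zrem pair is not atomic, so a production version would claim items atomically (for example with a small Lua script).

// Illustrative Redis-backed variant of WebhookQueue (ioredis assumed)
import Redis from "ioredis";

class RedisWebhookQueue {
  private redis = new Redis(process.env.REDIS_URL ?? "redis://localhost:6379");

  constructor(private name: string) {}

  // FIFO enqueue onto a Redis list
  async enqueue(webhook: IncomingWebhook): Promise<void> {
    await this.redis.lpush(`wh:${this.name}`, JSON.stringify(webhook));
  }

  // Blocking pop with a timeout; resolves to null when nothing arrives in time
  async dequeue(timeoutMs = 5000): Promise<IncomingWebhook | null> {
    const result = await this.redis.brpop(
      `wh:${this.name}`,
      Math.ceil(timeoutMs / 1000)
    );
    return result ? JSON.parse(result[1]) : null;
  }

  // Delayed items live in a sorted set scored by their ready-at timestamp
  async enqueueDelayed(webhook: IncomingWebhook, delayMs: number): Promise<void> {
    await this.redis.zadd(
      `wh:${this.name}:delayed`,
      Date.now() + delayMs,
      JSON.stringify(webhook)
    );
  }

  // Pop one item whose ready-at time has passed (simple poll, not atomic)
  async dequeueReady(): Promise<IncomingWebhook | null> {
    const due = await this.redis.zrangebyscore(
      `wh:${this.name}:delayed`,
      0,
      Date.now(),
      "LIMIT",
      0,
      1
    );
    if (due.length === 0) return null;
    await this.redis.zrem(`wh:${this.name}:delayed`, due[0]);
    return JSON.parse(due[0]);
  }

  async getSize(): Promise<number> {
    return this.redis.llen(`wh:${this.name}`);
  }
}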

class DeduplicationCache {
  constructor(private config: any) {}
  async isDuplicate(key: string): Promise<boolean> {
    return false;
  }
  async markProcessing(key: string, webhookId: string): Promise<void> {}
  async markCompleted(
    key: string,
    result: WebhookProcessingResult
  ): Promise<void> {}
}
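
The deduplication cache can likewise be a thin wrapper over Redis: SET with NX doubles as an atomic claim on a webhook, and a TTL keeps the key space bounded. A minimal sketch, again assuming ioredis; the key prefix and 24-hour TTL are arbitrary choices.

// Illustrative Redis-backed deduplication cache (ioredis assumed)
import Redis from "ioredis";

class RedisDeduplicationCache {
  constructor(private redis: Redis, private ttlSeconds = 86400) {}

  // True if this webhook has already been claimed or completed
  async isDuplicate(key: string): Promise<boolean> {
    return (await this.redis.exists(`dedup:${key}`)) === 1;
  }

  // SET ... NX only writes if the key is absent, so concurrent workers
  // cannot both claim the same webhook
  async markProcessing(key: string, webhookId: string): Promise<void> {
    await this.redis.set(`dedup:${key}`, webhookId, "EX", this.ttlSeconds, "NX");
  }

  // Overwrite the claim with a summary of the final result
  async markCompleted(key: string, result: WebhookProcessingResult): Promise<void> {
    await this.redis.set(
      `dedup:${key}`,
      JSON.stringify({ completed: true, webhookId: result.webhookId }),
      "EX",
      this.ttlSeconds
    );
  }
}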

class WebhookRateLimiter {
  constructor(private config: any) {}
  async checkLimit(
    provider: string
  ): Promise<{ allowed: boolean; retryAfter?: number }> {
    return { allowed: true };
  }
}
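
The rate limiter stub can be replaced by something as simple as a fixed-window counter per provider while keeping the same checkLimit contract. A minimal in-memory sketch; the default window and limit are placeholders that would normally come from rateLimitConfig.

// Illustrative fixed-window, per-provider rate limiter (in-memory)
class SimpleWebhookRateLimiter {
  private windows: Map<string, { windowStart: number; count: number }> = new Map();

  constructor(private limitPerWindow = 1000, private windowMs = 60_000) {}

  async checkLimit(
    provider: string
  ): Promise<{ allowed: boolean; retryAfter?: number }> {
    const now = Date.now();
    const window = this.windows.get(provider);

    // Start a fresh window if none exists or the current one has expired
    if (!window || now - window.windowStart >= this.windowMs) {
      this.windows.set(provider, { windowStart: now, count: 1 });
      return { allowed: true };
    }

    if (window.count < this.limitPerWindow) {
      window.count++;
      return { allowed: true };
    }

    // Over the limit: report how long (in seconds) until the window resets
    const retryAfter = Math.ceil((window.windowStart + this.windowMs - now) / 1000);
    return { allowed: false, retryAfter };
  }
}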

class WebhookRetryManager {
  constructor(private config: any) {}
}

class WebhookMetrics {
  async recordReceived(provider: string): Promise<void> {}
  async recordRejected(provider: string, reason: string): Promise<void> {}
  async recordProcessed(
    provider: string,
    eventCount: number,
    duration: number
  ): Promise<void> {}
  async recordRetry(provider: string, attempt: number): Promise<void> {}
  async recordFailed(provider: string, duration: number): Promise<void> {}
  async recordDuplicate(provider: string): Promise<void> {}
  async getStatistics(
    timeRange: TimeRange,
    provider?: string
  ): Promise<WebhookStatistics> {
    return {} as WebhookStatistics;
  }
  async getProcessingRates(): Promise<any> {
    return {};
  }
  async getErrorRates(): Promise<any> {
    return {};
  }
}
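
Tying the pieces together, the configuration passed to the engine constructor might look like the following. Every value is a placeholder: the sub-config shapes depend on whichever queue, cache, and rate-limiter implementations back the stubs above, and the secrets are read from environment variables purely for illustration.

// Illustrative WebhookEngineConfig (all values are placeholders)
const engineConfig: WebhookEngineConfig = {
  queueConfig: { redisUrl: process.env.REDIS_URL },
  cacheConfig: { ttlSeconds: 86400 },
  rateLimitConfig: { limitPerWindow: 1000, windowMs: 60_000 },
  retryConfig: { baseDelayMs: 1000, maxDelayMs: 300_000 },
  workerCount: 10,
  processors: [
    {
      provider: "stripe",
      secretKey: process.env.STRIPE_SECRET_KEY!,
      webhookSecret: process.env.STRIPE_WEBHOOK_SECRET!,
    },
    {
      provider: "sendgrid",
      secretKey: process.env.SENDGRID_API_KEY!,
      webhookSecret: process.env.SENDGRID_WEBHOOK_SECRET!,
    },
  ],
};

const webhookEngine = new WebhookProcessingEngine(engineConfig);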

API Rate Limiting and Retry Strategies: Intelligent Request Management

Adaptive Rate Limiting and Smart Retry System

// Advanced rate limiting and retry system that adapts to API behavior
class IntelligentAPIManager {
  private rateLimiters: Map<string, AdaptiveRateLimiter> = new Map();
  private retryStrategies: Map<string, RetryStrategy> = new Map();
  private circuitBreakers: Map<string, CircuitBreaker> = new Map();
  private apiMetrics: APIMetricsCollector;
  private healthMonitor: APIHealthMonitor;
  private backoffCalculator: BackoffCalculator;

  constructor(config: APIManagerConfig) {
    this.apiMetrics = new APIMetricsCollector();
    this.healthMonitor = new APIHealthMonitor();
    this.backoffCalculator = new BackoffCalculator();

    this.initializeAPIConfigs(config.apis);
    this.startHealthMonitoring();
    this.startRateLimitAdjustment();
  }

  private initializeAPIConfigs(apiConfigs: APIConfig[]): void {
    for (const config of apiConfigs) {
      // Initialize adaptive rate limiter
      const rateLimiter = new AdaptiveRateLimiter({
        name: config.name,
        initialLimit: config.rateLimit.initialLimit,
        windowMs: config.rateLimit.windowMs,
        maxLimit:
          config.rateLimit.maxLimit || config.rateLimit.initialLimit * 10,
        minLimit:
          config.rateLimit.minLimit ||
          Math.max(1, config.rateLimit.initialLimit / 10),
        adaptationFactor: config.rateLimit.adaptationFactor || 0.1,
      });

      // Initialize retry strategy
      const retryStrategy = new RetryStrategy({
        name: config.name,
        maxAttempts: config.retry.maxAttempts,
        baseDelayMs: config.retry.baseDelayMs,
        maxDelayMs: config.retry.maxDelayMs,
        backoffType: config.retry.backoffType,
        jitterFactor: config.retry.jitterFactor || 0.1,
        retryableStatusCodes: config.retry.retryableStatusCodes,
        retryableErrors: config.retry.retryableErrors,
      });

      // Initialize circuit breaker
      const circuitBreaker = new CircuitBreaker({
        name: config.name,
        failureThreshold: config.circuitBreaker.failureThreshold,
        recoveryTimeoutMs: config.circuitBreaker.recoveryTimeoutMs,
        halfOpenMaxCalls: config.circuitBreaker.halfOpenMaxCalls || 3,
        monitoringWindowMs: config.circuitBreaker.monitoringWindowMs || 60000,
      });

      this.rateLimiters.set(config.name, rateLimiter);
      this.retryStrategies.set(config.name, retryStrategy);
      this.circuitBreakers.set(config.name, circuitBreaker);

      console.log(`Initialized API management for: ${config.name}`);
    }
  }

  // Execute API request with intelligent rate limiting and retry
  async executeRequest<T>(
    apiName: string,
    requestConfig: APIRequestConfig
  ): Promise<APIExecutionResult<T>> {
    const startTime = Date.now();
    const requestId = this.generateRequestId();

    const rateLimiter = this.rateLimiters.get(apiName);
    const retryStrategy = this.retryStrategies.get(apiName);
    const circuitBreaker = this.circuitBreakers.get(apiName);

    if (!rateLimiter || !retryStrategy || !circuitBreaker) {
      throw new Error(`API configuration not found: ${apiName}`);
    }

    // Check circuit breaker state
    if (circuitBreaker.isOpen()) {
      const error = new APIError(
        `Circuit breaker open for API: ${apiName}`,
        "CIRCUIT_BREAKER_OPEN",
        503
      );

      await this.apiMetrics.recordRequest(apiName, requestId, {
        status: "circuit_breaker_open",
        duration: Date.now() - startTime,
        error: error.code,
      });

      throw error;
    }

    let attemptNumber = 0;
    let lastError: APIError;

    while (attemptNumber < retryStrategy.maxAttempts) {
      attemptNumber++;

      try {
        // Wait for rate limit availability
        await this.waitForRateLimit(rateLimiter, requestId);

        // Execute the actual request
        const result = await this.executeActualRequest<T>(
          apiName,
          requestConfig,
          requestId,
          attemptNumber
        );

        // Record success metrics
        await this.recordSuccessMetrics(
          apiName,
          requestId,
          startTime,
          attemptNumber
        );

        // Update circuit breaker
        circuitBreaker.recordSuccess();

        // Adjust rate limits based on success
        await this.adjustRateLimitsOnSuccess(rateLimiter, result);

        return {
          requestId,
          data: result.data,
          status: result.status,
          headers: result.headers,
          duration: Date.now() - startTime,
          attempts: attemptNumber,
          apiName,
        };
      } catch (error) {
        lastError = error as APIError;

        // Record error metrics
        await this.recordErrorMetrics(
          apiName,
          requestId,
          startTime,
          attemptNumber,
          lastError
        );

        // Update circuit breaker
        circuitBreaker.recordFailure();

        // Adjust rate limits based on error
        await this.adjustRateLimitsOnError(rateLimiter, lastError);

        // Check if error is retryable
        if (!this.shouldRetry(retryStrategy, lastError, attemptNumber)) {
          break;
        }

        // Calculate and wait for retry delay
        if (attemptNumber < retryStrategy.maxAttempts) {
          const retryDelay = await this.calculateRetryDelay(
            retryStrategy,
            lastError,
            attemptNumber
          );

          console.log(
            `Retrying API request: ${requestId} (attempt ${attemptNumber + 1}/${
              retryStrategy.maxAttempts
            }) after ${retryDelay}ms`
          );

          await new Promise((resolve) => setTimeout(resolve, retryDelay));
        }
      }
    }

    // All retries exhausted
    throw new APIError(
      `All retry attempts exhausted for API: ${apiName}`,
      "MAX_RETRIES_EXCEEDED",
      lastError?.status || 500,
      {
        originalError: lastError,
        attempts: attemptNumber,
        totalDuration: Date.now() - startTime,
      }
    );
  }

  private async waitForRateLimit(
    rateLimiter: AdaptiveRateLimiter,
    requestId: string
  ): Promise<void> {
    const waitTime = await rateLimiter.getWaitTime();

    if (waitTime > 0) {
      console.log(`Rate limit wait: ${requestId} waiting ${waitTime}ms`);
      await new Promise((resolve) => setTimeout(resolve, waitTime));
    }

    await rateLimiter.acquireToken();
  }

  private async executeActualRequest<T>(
    apiName: string,
    requestConfig: APIRequestConfig,
    requestId: string,
    attemptNumber: number
  ): Promise<RawAPIResponse<T>> {
    const requestStart = Date.now();

    try {
      // Add retry headers
      const headers = {
        ...requestConfig.headers,
        "X-Request-ID": requestId,
        "X-Retry-Attempt": attemptNumber.toString(),
      };

      // Execute request with timeout
      const timeoutPromise = new Promise<never>((_, reject) => {
        setTimeout(
          () => reject(new APIError("Request timeout", "TIMEOUT", 408)),
          requestConfig.timeoutMs || 30000
        );
      });

      const requestPromise = this.makeHTTPRequest<T>({
        ...requestConfig,
        headers,
      });

      const response = await Promise.race([requestPromise, timeoutPromise]);

      // Check for API-specific error conditions
      this.validateAPIResponse(response);

      return response;
    } catch (error) {
      const duration = Date.now() - requestStart;

      // Transform error to standardized format
      throw this.transformError(error, apiName, duration);
    }
  }

  private async calculateRetryDelay(
    strategy: RetryStrategy,
    error: APIError,
    attemptNumber: number
  ): Promise<number> {
    let delay: number;

    // Check if API provided Retry-After header
    if (error.retryAfter) {
      delay = error.retryAfter * 1000; // Convert seconds to milliseconds
    } else {
      // Use configured backoff strategy
      delay = this.backoffCalculator.calculateDelay(
        strategy.backoffType,
        strategy.baseDelayMs,
        attemptNumber,
        strategy.maxDelayMs
      );
    }

    // Add jitter to prevent thundering herd
    if (strategy.jitterFactor > 0) {
      const jitter = delay * strategy.jitterFactor * (Math.random() - 0.5) * 2;
      delay = Math.max(0, delay + jitter);
    }

    // Apply error-specific delay multipliers
    delay = this.applyErrorSpecificDelay(delay, error);

    return Math.min(delay, strategy.maxDelayMs);
  }

  private applyErrorSpecificDelay(baseDelay: number, error: APIError): number {
    const multipliers: Record<string, number> = {
      RATE_LIMITED: 2.0, // Wait longer for rate limit errors
      SERVER_ERROR: 1.5, // Wait a bit longer for server errors
      TIMEOUT: 1.2, // Slightly longer for timeouts
      CONNECTION_ERROR: 2.0, // Much longer for connection issues
    };

    const multiplier = multipliers[error.code] || 1.0;
    return baseDelay * multiplier;
  }

  private async adjustRateLimitsOnSuccess(
    rateLimiter: AdaptiveRateLimiter,
    response: RawAPIResponse<any>
  ): Promise<void> {
    // Check for rate limit headers
    const remaining = this.parseRateLimitHeader(response.headers, "remaining");
    const limit = this.parseRateLimitHeader(response.headers, "limit");
    const resetTime = this.parseRateLimitHeader(response.headers, "reset");

    if (remaining !== null && limit !== null) {
      const usagePercent = (limit - remaining) / limit;

      // If we're using less than 70% of available rate limit, we can increase
      if (usagePercent < 0.7) {
        await rateLimiter.adjustLimit("increase", {
          reason: "low_usage",
          currentUsage: usagePercent,
          responseTime: response.duration || 0,
        });
      }
    }

    // If response was fast, we might be able to increase rate limit
    const responseTime = response.duration || 0;
    if (responseTime < 1000) {
      // Less than 1 second
      await rateLimiter.adjustLimit("increase", {
        reason: "fast_response",
        responseTime,
      });
    }
  }

  private async adjustRateLimitsOnError(
    rateLimiter: AdaptiveRateLimiter,
    error: APIError
  ): Promise<void> {
    if (error.code === "RATE_LIMITED") {
      // Decrease rate limit aggressively on rate limit errors
      await rateLimiter.adjustLimit("decrease", {
        reason: "rate_limited",
        severity: "high",
        retryAfter: error.retryAfter,
      });
    } else if (error.status && error.status >= 500) {
      // Decrease rate limit moderately on server errors
      await rateLimiter.adjustLimit("decrease", {
        reason: "server_error",
        severity: "medium",
        status: error.status,
      });
    } else if (error.code === "TIMEOUT") {
      // Decrease rate limit on timeouts
      await rateLimiter.adjustLimit("decrease", {
        reason: "timeout",
        severity: "medium",
      });
    }
  }

  private parseRateLimitHeader(
    headers: Record<string, any>,
    type: "remaining" | "limit" | "reset"
  ): number | null {
    // Common rate limit header patterns
    const headerVariations: Record<string, string[]> = {
      remaining: [
        "x-ratelimit-remaining",
        "x-rate-limit-remaining",
        "ratelimit-remaining",
        "rate-limit-remaining",
      ],
      limit: [
        "x-ratelimit-limit",
        "x-rate-limit-limit",
        "ratelimit-limit",
        "rate-limit-limit",
      ],
      reset: [
        "x-ratelimit-reset",
        "x-rate-limit-reset",
        "ratelimit-reset",
        "rate-limit-reset",
      ],
    };

    for (const headerName of headerVariations[type]) {
      const value = headers[headerName] || headers[headerName.toLowerCase()];
      if (value !== undefined) {
        const parsed = parseInt(value, 10);
        return isNaN(parsed) ? null : parsed;
      }
    }

    return null;
  }

  // API health monitoring and adaptive adjustment
  private startHealthMonitoring(): void {
    setInterval(async () => {
      for (const [apiName, rateLimiter] of this.rateLimiters) {
        try {
          const metrics = await this.apiMetrics.getRecentMetrics(
            apiName,
            300000
          ); // Last 5 minutes
          const healthScore = this.calculateAPIHealthScore(metrics);

          await this.healthMonitor.recordHealth(apiName, healthScore, metrics);

          // Auto-adjust based on health
          if (healthScore < 0.7) {
            console.warn(`API health degraded for ${apiName}: ${healthScore}`);
            await rateLimiter.adjustLimit("decrease", {
              reason: "poor_health",
              healthScore,
            });
          } else if (healthScore > 0.9) {
            await rateLimiter.adjustLimit("increase", {
              reason: "good_health",
              healthScore,
            });
          }
        } catch (error) {
          console.error(`Health monitoring error for ${apiName}:`, error);
        }
      }
    }, 60000); // Every minute
  }

  private startRateLimitAdjustment(): void {
    setInterval(async () => {
      for (const [apiName, rateLimiter] of this.rateLimiters) {
        try {
          const stats = await rateLimiter.getStatistics();

          // Auto-adjust based on token utilization
          if (stats.utilizationRate < 0.5) {
            // Low utilization, can increase
            await rateLimiter.adjustLimit("increase", {
              reason: "low_utilization",
              utilizationRate: stats.utilizationRate,
            });
          } else if (stats.utilizationRate > 0.9) {
            // High utilization, might need to decrease
            const recentErrors = await this.apiMetrics.getErrorRate(
              apiName,
              60000
            );
            if (recentErrors > 0.1) {
              await rateLimiter.adjustLimit("decrease", {
                reason: "high_utilization_with_errors",
                utilizationRate: stats.utilizationRate,
                errorRate: recentErrors,
              });
            }
          }
        } catch (error) {
          console.error(`Rate limit adjustment error for ${apiName}:`, error);
        }
      }
    }, 30000); // Every 30 seconds
  }

  private calculateAPIHealthScore(metrics: APIMetrics): number {
    let score = 1.0;

    // Factor in error rate
    if (metrics.errorRate > 0.1) score *= 0.5; // 10% error rate
    else if (metrics.errorRate > 0.05) score *= 0.7; // 5% error rate
    else if (metrics.errorRate > 0.01) score *= 0.9; // 1% error rate

    // Factor in response time
    if (metrics.avgResponseTime > 10000) score *= 0.3; // 10+ seconds
    else if (metrics.avgResponseTime > 5000) score *= 0.6; // 5+ seconds
    else if (metrics.avgResponseTime > 2000) score *= 0.8; // 2+ seconds

    // Factor in timeout rate
    if (metrics.timeoutRate > 0.05) score *= 0.4; // 5% timeout rate
    else if (metrics.timeoutRate > 0.01) score *= 0.8; // 1% timeout rate

    return Math.max(0, Math.min(1, score));
  }

  // Batch request processing with intelligent concurrency
  async executeBatchRequests<T>(
    apiName: string,
    requests: APIRequestConfig[],
    options: BatchRequestOptions = {}
  ): Promise<BatchExecutionResult<T>> {
    const batchId = this.generateRequestId();
    const batchStart = Date.now();
    const concurrency = options.concurrency || 5;
    const failFast = options.failFast || false;

    console.log(
      `Executing batch requests: ${batchId} (${requests.length} requests, concurrency: ${concurrency})`
    );

    const results: APIExecutionResult<T>[] = [];
    const errors: BatchRequestError[] = [];
    const executing: Promise<any>[] = [];

    for (let i = 0; i < requests.length; i++) {
      const request = requests[i];

      const promise = this.executeRequest<T>(apiName, {
        ...request,
        headers: {
          ...request.headers,
          "X-Batch-ID": batchId,
          "X-Batch-Index": i.toString(),
        },
      }).then(
        (result) => {
          results.push(result);
          return result;
        },
        (error) => {
          const batchError: BatchRequestError = {
            index: i,
            request,
            error,
          };
          errors.push(batchError);

          if (failFast) {
            throw error;
          }

          return batchError;
        }
      );

      // Have each request remove itself from the executing set when it settles,
      // so concurrency control always tracks only in-flight requests
      const tracked = promise.finally(() => {
        const idx = executing.indexOf(tracked);
        if (idx !== -1) {
          executing.splice(idx, 1);
        }
      });
      executing.push(tracked);

      // Control concurrency: wait for a slot to free up before issuing the next request
      if (executing.length >= concurrency) {
        await Promise.race(executing);
      }
    }

    // Wait for all remaining requests
    if (failFast) {
      await Promise.all(executing);
    } else {
      await Promise.allSettled(executing);
    }

    return {
      batchId,
      totalRequests: requests.length,
      successfulRequests: results.length,
      failedRequests: errors.length,
      results,
      errors,
      duration: Date.now() - batchStart,
    };
  }

  // Get API statistics and health
  async getAPIStatistics(
    apiName: string,
    timeRange?: TimeRange
  ): Promise<APIStatistics> {
    const rateLimiter = this.rateLimiters.get(apiName);
    const circuitBreaker = this.circuitBreakers.get(apiName);

    if (!rateLimiter || !circuitBreaker) {
      throw new Error(`API configuration not found: ${apiName}`);
    }

    const metrics = await this.apiMetrics.getMetrics(apiName, timeRange);
    const rateLimitStats = await rateLimiter.getStatistics();
    const circuitBreakerState = circuitBreaker.getState();
    const healthScore = this.calculateAPIHealthScore(metrics);

    return {
      apiName,
      healthScore,
      metrics,
      rateLimiting: rateLimitStats,
      circuitBreaker: circuitBreakerState,
      timeRange: timeRange || {
        start: new Date(Date.now() - 3600000), // Last hour
        end: new Date(),
      },
    };
  }

  // Utility methods
  private shouldRetry(
    strategy: RetryStrategy,
    error: APIError,
    attemptNumber: number
  ): boolean {
    if (attemptNumber >= strategy.maxAttempts) {
      return false;
    }

    // Check non-retryable status codes
    if (
      error.status &&
      strategy.nonRetryableStatusCodes?.includes(error.status)
    ) {
      return false;
    }

    // Check retryable status codes
    if (error.status && strategy.retryableStatusCodes?.includes(error.status)) {
      return true;
    }

    // Check retryable error codes
    if (strategy.retryableErrors?.includes(error.code)) {
      return true;
    }

    // Default retryable conditions
    return error.status
      ? error.status >= 500
      : ["TIMEOUT", "CONNECTION_ERROR"].includes(error.code);
  }

  private validateAPIResponse(response: RawAPIResponse<any>): void {
    // Basic response validation
    if (!response.status || response.status < 200 || response.status >= 400) {
      throw new APIError(
        `HTTP error: ${response.status}`,
        "HTTP_ERROR",
        response.status
      );
    }
  }

  private transformError(
    error: any,
    apiName: string,
    duration: number
  ): APIError {
    if (error instanceof APIError) {
      return error;
    }

    // Transform common error types
    if (error.code === "ENOTFOUND" || error.code === "ECONNREFUSED") {
      return new APIError("Connection error", "CONNECTION_ERROR", 0, {
        originalError: error,
        apiName,
        duration,
      });
    }

    if (error.code === "ETIMEDOUT") {
      return new APIError("Request timeout", "TIMEOUT", 408, {
        originalError: error,
        apiName,
        duration,
      });
    }

    // Handle HTTP errors
    if (error.response) {
      const status = error.response.status || 500;
      const retryAfter = this.parseRetryAfterHeader(error.response.headers);

      return new APIError(
        error.response.statusText || error.message,
        status === 429 ? "RATE_LIMITED" : "HTTP_ERROR",
        status,
        {
          originalError: error,
          apiName,
          duration,
          retryAfter,
          responseData: error.response.data,
        }
      );
    }

    return new APIError(
      error.message || "Unknown API error",
      "UNKNOWN_ERROR",
      500,
      { originalError: error, apiName, duration }
    );
  }

  private parseRetryAfterHeader(headers: Record<string, any>): number | null {
    const retryAfter = headers["retry-after"] || headers["Retry-After"];
    if (!retryAfter) return null;

    const parsed = parseInt(retryAfter, 10);
    return isNaN(parsed) ? null : parsed;
  }

  private async makeHTTPRequest<T>(
    config: APIRequestConfig
  ): Promise<RawAPIResponse<T>> {
    // This would use your HTTP client (axios, fetch, etc.)
    // Placeholder implementation
    throw new Error("HTTP request implementation needed");
  }

  private async recordSuccessMetrics(
    apiName: string,
    requestId: string,
    startTime: number,
    attempts: number
  ): Promise<void> {
    await this.apiMetrics.recordRequest(apiName, requestId, {
      status: "success",
      duration: Date.now() - startTime,
      attempts,
    });
  }

  private async recordErrorMetrics(
    apiName: string,
    requestId: string,
    startTime: number,
    attempts: number,
    error: APIError
  ): Promise<void> {
    await this.apiMetrics.recordRequest(apiName, requestId, {
      status: "error",
      duration: Date.now() - startTime,
      attempts,
      error: error.code,
      httpStatus: error.status,
    });
  }

  private generateRequestId(): string {
    return `req_${Date.now()}_${Math.random().toString(36).slice(2, 11)}`;
  }
}
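
The makeHTTPRequest method above is intentionally a placeholder. Below is a minimal sketch using the fetch API built into Node 18+; the url, method, and body fields on APIRequestConfig, and the exact shape of RawAPIResponse, are assumptions here since those interfaces are not shown in this section.

// Illustrative fetch-based request helper (Node 18+); field names are assumed
async function makeHTTPRequest<T>(
  config: APIRequestConfig
): Promise<RawAPIResponse<T>> {
  const start = Date.now();

  const response = await fetch(config.url, {
    method: config.method || "GET",
    headers: config.headers,
    body: config.body !== undefined ? JSON.stringify(config.body) : undefined,
  });

  // Parse JSON when possible; fall back to null for empty or non-JSON bodies
  const data = (await response.json().catch(() => null)) as T;

  return {
    data,
    status: response.status,
    headers: Object.fromEntries(response.headers.entries()),
    duration: Date.now() - start,
  };
}

Calling the manager then looks something like the snippet below, where apiManager is an IntelligentAPIManager instance and "inventory" is assumed to be one of the APIs registered in APIManagerConfig.apis.

// Hypothetical call site; the "inventory" API name and URL are illustrative
const stockResult = await apiManager.executeRequest<{ stock: number }>("inventory", {
  url: "https://inventory.example.com/items/sku-123",
  method: "GET",
  headers: { Authorization: `Bearer ${process.env.INVENTORY_TOKEN}` },
  timeoutMs: 5000,
});

console.log(
  `Stock: ${stockResult.data.stock} (resolved in ${stockResult.attempts} attempt(s))`
);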

// Supporting classes for rate limiting and retry logic
class AdaptiveRateLimiter {
  private tokens: number;
  private lastRefillTime: number;
  private currentLimit: number;
  private statistics: RateLimitStatistics;

  constructor(private config: RateLimiterConfig) {
    this.tokens = config.initialLimit;
    this.currentLimit = config.initialLimit;
    this.lastRefillTime = Date.now();
    this.statistics = {
      tokensConsumed: 0,
      tokensRefilled: 0,
      waitTime: 0,
      adjustments: 0,
      utilizationRate: 0,
    };
  }

  async acquireToken(): Promise<void> {
    this.refillTokens();

    if (this.tokens <= 0) {
      throw new APIError("No tokens available", "RATE_LIMITED", 429);
    }

    this.tokens--;
    this.statistics.tokensConsumed++;
  }

  async getWaitTime(): Promise<number> {
    this.refillTokens();

    if (this.tokens > 0) {
      return 0;
    }

    // Calculate time until next token is available
    const timeSinceRefill = Date.now() - this.lastRefillTime;
    const timeUntilRefill =
      this.config.windowMs - (timeSinceRefill % this.config.windowMs);

    return timeUntilRefill;
  }

  async adjustLimit(
    direction: "increase" | "decrease",
    context: RateLimitAdjustmentContext
  ): Promise<void> {
    const factor = this.config.adaptationFactor;
    let adjustment: number;

    if (direction === "increase") {
      adjustment = Math.ceil(this.currentLimit * factor);
      this.currentLimit = Math.min(
        this.config.maxLimit,
        this.currentLimit + adjustment
      );
    } else {
      adjustment = Math.ceil(this.currentLimit * factor);
      this.currentLimit = Math.max(
        this.config.minLimit,
        this.currentLimit - adjustment
      );
    }

    console.log(
      `Rate limit adjusted for ${this.config.name}: ${this.currentLimit} (${direction} by ${adjustment}, reason: ${context.reason})`
    );

    this.statistics.adjustments++;
  }

  private refillTokens(): void {
    const now = Date.now();
    const timePassed = now - this.lastRefillTime;

    if (timePassed >= this.config.windowMs) {
      const periods = Math.floor(timePassed / this.config.windowMs);
      const tokensToAdd = periods * this.currentLimit;

      this.tokens = Math.min(this.currentLimit, this.tokens + tokensToAdd);
      this.lastRefillTime = now;
      this.statistics.tokensRefilled += tokensToAdd;
    }

    // Update utilization rate
    this.statistics.utilizationRate =
      (this.currentLimit - this.tokens) / this.currentLimit;
  }

  async getStatistics(): Promise<RateLimitStatistics> {
    return { ...this.statistics };
  }
}
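
// Illustrative caller pattern for AdaptiveRateLimiter (not part of the manager
// above): wait for the next refill window instead of failing immediately when
// the limiter is exhausted. The helper name is an assumption.
async function withRateLimit<T>(
  limiter: AdaptiveRateLimiter,
  fn: () => Promise<T>
): Promise<T> {
  const waitMs = await limiter.getWaitTime();
  if (waitMs > 0) {
    // Sleep until the current window rolls over and tokens are refilled
    await new Promise((resolve) => setTimeout(resolve, waitMs));
  }

  // Throws APIError with code "RATE_LIMITED" if the limiter is still exhausted
  await limiter.acquireToken();
  return fn();
}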

class RetryStrategy {
  constructor(public config: RetryStrategyConfig) {}

  get maxAttempts(): number {
    return this.config.maxAttempts;
  }
  get baseDelayMs(): number {
    return this.config.baseDelayMs;
  }
  get maxDelayMs(): number {
    return this.config.maxDelayMs;
  }
  get backoffType(): string {
    return this.config.backoffType;
  }
  get jitterFactor(): number {
    return this.config.jitterFactor;
  }
  get retryableStatusCodes(): number[] | undefined {
    return this.config.retryableStatusCodes;
  }
  get nonRetryableStatusCodes(): number[] | undefined {
    return this.config.nonRetryableStatusCodes;
  }
  get retryableErrors(): string[] | undefined {
    return this.config.retryableErrors;
  }
}

class CircuitBreaker {
  private failures = 0;
  private successes = 0;
  private lastFailureTime = 0;
  private state: "CLOSED" | "OPEN" | "HALF_OPEN" = "CLOSED";
  private halfOpenCalls = 0;

  constructor(private config: CircuitBreakerConfig) {}

  isOpen(): boolean {
    if (this.state === "OPEN") {
      if (Date.now() - this.lastFailureTime > this.config.recoveryTimeoutMs) {
        this.state = "HALF_OPEN";
        this.halfOpenCalls = 0;
        return false;
      }
      return true;
    }

    if (this.state === "HALF_OPEN") {
      return this.halfOpenCalls >= this.config.halfOpenMaxCalls;
    }

    return false;
  }

  recordSuccess(): void {
    this.successes++;

    if (this.state === "HALF_OPEN") {
      this.halfOpenCalls++;
      if (this.halfOpenCalls >= this.config.halfOpenMaxCalls) {
        this.state = "CLOSED";
        this.failures = 0;
      }
    } else if (this.state === "CLOSED") {
      // Reset failure count on success
      this.failures = 0;
    }
  }

  recordFailure(): void {
    this.failures++;
    this.lastFailureTime = Date.now();

    if (this.state === "HALF_OPEN") {
      this.state = "OPEN";
    } else if (this.shouldTrip()) {
      this.state = "OPEN";
    }
  }

  private shouldTrip(): boolean {
    const totalCalls = this.failures + this.successes;
    if (totalCalls === 0) return false;

    const failureRate = this.failures / totalCalls;
    return failureRate >= this.config.failureThreshold;
  }

  getState(): CircuitBreakerState {
    return {
      state: this.state,
      failures: this.failures,
      successes: this.successes,
      failureRate:
        this.failures + this.successes === 0
          ? 0
          : this.failures / (this.failures + this.successes),
      lastFailureTime: this.lastFailureTime,
    };
  }
}

class BackoffCalculator {
  calculateDelay(
    backoffType: string,
    baseDelayMs: number,
    attemptNumber: number,
    maxDelayMs: number
  ): number {
    let delay: number;

    switch (backoffType) {
      case "linear":
        delay = baseDelayMs * attemptNumber;
        break;

      case "exponential":
        delay = baseDelayMs * Math.pow(2, attemptNumber - 1);
        break;

      case "polynomial":
        delay = baseDelayMs * Math.pow(attemptNumber, 2);
        break;

      case "fibonacci":
        delay = baseDelayMs * this.fibonacci(attemptNumber);
        break;

      default:
        delay = baseDelayMs;
    }

    return Math.min(delay, maxDelayMs);
  }

  private fibonacci(n: number): number {
    if (n <= 1) return 1;
    let a = 1,
      b = 1;
    for (let i = 2; i < n; i++) {
      [a, b] = [b, a + b];
    }
    return b;
  }
}
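
// Illustrative sketch (not part of the manager above) showing how RetryStrategy,
// BackoffCalculator, and CircuitBreaker could be composed into a retry loop.
// The helper name, the generic `operation` callback, and the jitter formula are
// assumptions; the manager's own retry path may differ.
async function executeWithRetry<T>(
  strategy: RetryStrategy,
  breaker: CircuitBreaker,
  backoff: BackoffCalculator,
  operation: () => Promise<T>
): Promise<T> {
  let lastError: unknown;

  for (let attempt = 1; attempt <= strategy.maxAttempts; attempt++) {
    if (breaker.isOpen()) {
      throw new APIError("Circuit breaker open", "CIRCUIT_OPEN", 503);
    }

    try {
      const result = await operation();
      breaker.recordSuccess();
      return result;
    } catch (error) {
      breaker.recordFailure();
      lastError = error;

      if (attempt === strategy.maxAttempts) break;

      // Base delay from the configured backoff curve, capped at maxDelayMs
      const baseDelay = backoff.calculateDelay(
        strategy.backoffType,
        strategy.baseDelayMs,
        attempt,
        strategy.maxDelayMs
      );

      // Apply +/- jitterFactor of random jitter to avoid synchronized retries
      const jitter =
        baseDelay * strategy.jitterFactor * (Math.random() * 2 - 1);
      await new Promise((resolve) =>
        setTimeout(resolve, Math.max(0, Math.round(baseDelay + jitter)))
      );
    }
  }

  throw lastError;
}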

// Error class for API operations
class APIError extends Error {
  constructor(
    message: string,
    public code: string,
    public status?: number,
    public metadata?: any
  ) {
    super(message);
    this.name = "APIError";
  }

  get retryAfter(): number | null {
    return this.metadata?.retryAfter || null;
  }
}

// Supporting interfaces and types
interface APIManagerConfig {
  apis: APIConfig[];
}

interface APIConfig {
  name: string;
  baseUrl: string;
  rateLimit: {
    initialLimit: number;
    windowMs: number;
    maxLimit?: number;
    minLimit?: number;
    adaptationFactor?: number;
  };
  retry: {
    maxAttempts: number;
    baseDelayMs: number;
    maxDelayMs: number;
    backoffType: string;
    jitterFactor?: number;
    retryableStatusCodes?: number[];
    nonRetryableStatusCodes?: number[];
    retryableErrors?: string[];
  };
  circuitBreaker: {
    failureThreshold: number;
    recoveryTimeoutMs: number;
    halfOpenMaxCalls?: number;
    monitoringWindowMs?: number;
  };
}

interface RateLimiterConfig {
  name: string;
  initialLimit: number;
  windowMs: number;
  maxLimit: number;
  minLimit: number;
  adaptationFactor: number;
}

interface RetryStrategyConfig {
  name: string;
  maxAttempts: number;
  baseDelayMs: number;
  maxDelayMs: number;
  backoffType: string;
  jitterFactor: number;
  retryableStatusCodes?: number[];
  nonRetryableStatusCodes?: number[];
  retryableErrors?: string[];
}

interface CircuitBreakerConfig {
  name: string;
  failureThreshold: number;
  recoveryTimeoutMs: number;
  halfOpenMaxCalls: number;
  monitoringWindowMs: number;
}

interface APIRequestConfig {
  method: string;
  url: string;
  headers?: Record<string, string>;
  data?: any;
  params?: Record<string, any>;
  timeoutMs?: number;
}

interface APIExecutionResult<T> {
  requestId: string;
  data: T;
  status: number;
  headers: Record<string, any>;
  duration: number;
  attempts: number;
  apiName: string;
}

interface RawAPIResponse<T> {
  data: T;
  status: number;
  headers: Record<string, any>;
  duration?: number;
}

interface BatchRequestOptions {
  concurrency?: number;
  failFast?: boolean;
}

interface BatchExecutionResult<T> {
  batchId: string;
  totalRequests: number;
  successfulRequests: number;
  failedRequests: number;
  results: APIExecutionResult<T>[];
  errors: BatchRequestError[];
  duration: number;
}

interface BatchRequestError {
  index: number;
  request: APIRequestConfig;
  error: APIError;
}

interface RateLimitStatistics {
  tokensConsumed: number;
  tokensRefilled: number;
  waitTime: number;
  adjustments: number;
  utilizationRate: number;
}

interface RateLimitAdjustmentContext {
  reason: string;
  severity?: "low" | "medium" | "high";
  healthScore?: number;
  utilizationRate?: number;
  errorRate?: number;
  responseTime?: number;
  retryAfter?: number;
  status?: number;
  currentUsage?: number;
}

interface CircuitBreakerState {
  state: "CLOSED" | "OPEN" | "HALF_OPEN";
  failures: number;
  successes: number;
  failureRate: number;
  lastFailureTime: number;
}

interface APIMetrics {
  errorRate: number;
  avgResponseTime: number;
  timeoutRate: number;
  totalRequests: number;
  successfulRequests: number;
  failedRequests: number;
}

interface APIStatistics {
  apiName: string;
  healthScore: number;
  metrics: APIMetrics;
  rateLimiting: RateLimitStatistics;
  circuitBreaker: CircuitBreakerState;
  timeRange: TimeRange;
}

// Placeholder classes that would need full implementation
class APIMetricsCollector {
  async recordRequest(
    apiName: string,
    requestId: string,
    data: any
  ): Promise<void> {}
  async getRecentMetrics(
    apiName: string,
    timeWindowMs: number
  ): Promise<APIMetrics> {
    return {} as APIMetrics;
  }
  async getErrorRate(apiName: string, timeWindowMs: number): Promise<number> {
    return 0;
  }
  async getMetrics(
    apiName: string,
    timeRange?: TimeRange
  ): Promise<APIMetrics> {
    return {} as APIMetrics;
  }
}

class APIHealthMonitor {
  async recordHealth(
    apiName: string,
    healthScore: number,
    metrics: APIMetrics
  ): Promise<void> {}
}
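
For reference, here is what a configuration entry for one external API might look like under the interfaces above. The API name, base URL, and every numeric threshold are illustrative values, not recommendations:

// Illustrative APIConfig for a hypothetical payments API; real limits come
// from the provider's documentation.
const paymentsApiConfig: APIConfig = {
  name: "payments",
  baseUrl: "https://api.payments.example.com",
  rateLimit: {
    initialLimit: 100, // tokens per window
    windowMs: 60_000, // one-minute window
    maxLimit: 500,
    minLimit: 10,
    adaptationFactor: 0.2, // adjust the limit by 20% per adaptation step
  },
  retry: {
    maxAttempts: 4,
    baseDelayMs: 250,
    maxDelayMs: 10_000,
    backoffType: "exponential",
    jitterFactor: 0.3,
    retryableStatusCodes: [429, 500, 502, 503, 504],
    nonRetryableStatusCodes: [400, 401, 403, 404],
  },
  circuitBreaker: {
    failureThreshold: 0.5, // trip once half of recent calls fail
    recoveryTimeoutMs: 30_000,
    halfOpenMaxCalls: 3,
    monitoringWindowMs: 60_000,
  },
};

const apiManagerConfig: APIManagerConfig = { apis: [paymentsApiConfig] };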

Service Reliability and Fallbacks: Graceful Degradation Strategies

Building Resilient Service Integration Architecture

// Comprehensive service reliability system with intelligent fallbacks
class ServiceReliabilityManager {
  private services: Map<string, ServiceDefinition> = new Map();
  private fallbackStrategies: Map<string, FallbackStrategy> = new Map();
  private healthCheckers: Map<string, HealthChecker> = new Map();
  private serviceStates: Map<string, ServiceState> = new Map();
  private dependencyGraph: ServiceDependencyGraph;
  private reliabilityMetrics: ReliabilityMetrics;
  private alertManager: ServiceAlertManager;

  constructor(config: ServiceReliabilityConfig) {
    this.dependencyGraph = new ServiceDependencyGraph();
    this.reliabilityMetrics = new ReliabilityMetrics();
    this.alertManager = new ServiceAlertManager(config.alerting);

    this.initializeServices(config.services);
    this.startHealthMonitoring();
    this.startReliabilityAnalysis();
  }

  private initializeServices(serviceConfigs: ServiceConfig[]): void {
    for (const config of serviceConfigs) {
      // Create service definition
      const service: ServiceDefinition = {
        name: config.name,
        type: config.type,
        endpoint: config.endpoint,
        criticality: config.criticality || "medium",
        dependencies: config.dependencies || [],
        healthCheck: config.healthCheck,
        sla: config.sla,
        timeouts: {
          connection: config.timeouts?.connection || 5000,
          request: config.timeouts?.request || 30000,
          healthCheck: config.timeouts?.healthCheck || 10000,
        },
      };

      // Create fallback strategy
      const fallbackStrategy = this.createFallbackStrategy(config);

      // Create health checker
      const healthChecker = new HealthChecker(service, config.healthCheck);

      // Initialize service state
      const serviceState: ServiceState = {
        status: "unknown",
        health: 0,
        lastHealthCheck: 0,
        consecutiveFailures: 0,
        lastError: null,
        responseTime: 0,
        availability: 1.0,
        errorRate: 0,
        lastSuccessful: Date.now(),
      };

      this.services.set(config.name, service);
      this.fallbackStrategies.set(config.name, fallbackStrategy);
      this.healthCheckers.set(config.name, healthChecker);
      this.serviceStates.set(config.name, serviceState);

      // Build dependency graph
      this.dependencyGraph.addService(config.name, config.dependencies || []);

      console.log(`Initialized service reliability for: ${config.name}`);
    }
  }

  // Execute service call with automatic fallback
  async executeServiceCall<T>(
    serviceName: string,
    operation: ServiceOperation<T>,
    options: ServiceCallOptions = {}
  ): Promise<ServiceCallResult<T>> {
    const callId = this.generateCallId();
    const startTime = Date.now();

    try {
      // Get service configuration and state
      const service = this.services.get(serviceName);
      const serviceState = this.serviceStates.get(serviceName);
      const fallbackStrategy = this.fallbackStrategies.get(serviceName);

      if (!service || !serviceState || !fallbackStrategy) {
        throw new ServiceError(
          `Service not found: ${serviceName}`,
          "SERVICE_NOT_FOUND"
        );
      }

      // Check if service is available
      const availabilityCheck = await this.checkServiceAvailability(
        serviceName,
        service,
        serviceState
      );

      if (!availabilityCheck.available) {
        // Service unavailable, try fallback immediately
        console.warn(
          `Service ${serviceName} unavailable: ${availabilityCheck.reason}`
        );

        return await this.executeFallback(
          serviceName,
          operation,
          callId,
          availabilityCheck.reason
        );
      }

      // Execute primary service call
      try {
        const result = await this.executePrimaryCall(
          service,
          operation,
          callId,
          options
        );

        // Record successful call
        await this.recordSuccessfulCall(
          serviceName,
          serviceState,
          startTime,
          callId
        );

        return {
          callId,
          serviceName,
          data: result,
          source: "primary",
          duration: Date.now() - startTime,
          fallbackUsed: false,
          degradedMode: false,
        };
      } catch (primaryError) {
        console.warn(
          `Primary service call failed: ${serviceName}`,
          primaryError.message
        );

        // Record failure
        await this.recordFailedCall(
          serviceName,
          serviceState,
          startTime,
          primaryError,
          callId
        );

        // Determine if we should attempt fallback
        if (this.shouldAttemptFallback(primaryError, service, options)) {
          return await this.executeFallback(
            serviceName,
            operation,
            callId,
            primaryError.message,
            primaryError
          );
        }

        throw primaryError;
      }
    } catch (error) {
      // Log final failure
      await this.reliabilityMetrics.recordCall(serviceName, {
        callId,
        duration: Date.now() - startTime,
        success: false,
        error: error.message,
        fallbackUsed: false,
      });

      throw error;
    }
  }

  private async executePrimaryCall<T>(
    service: ServiceDefinition,
    operation: ServiceOperation<T>,
    callId: string,
    options: ServiceCallOptions
  ): Promise<T> {
    const timeoutMs = options.timeout || service.timeouts.request;

    // Create a timeout promise, keeping the timer handle so it can be
    // cleared once the race settles and does not leak
    let timeoutHandle: ReturnType<typeof setTimeout> | undefined;
    const timeoutPromise = new Promise<never>((_, reject) => {
      timeoutHandle = setTimeout(
        () => reject(new ServiceError("Service call timeout", "TIMEOUT")),
        timeoutMs
      );
    });

    // Execute operation with timeout
    const operationPromise = operation.execute(service, {
      callId,
      timeout: timeoutMs,
      retries: options.retries || 0,
    });

    try {
      return await Promise.race([operationPromise, timeoutPromise]);
    } finally {
      if (timeoutHandle !== undefined) {
        clearTimeout(timeoutHandle);
      }
    }
  }

  private async executeFallback<T>(
    serviceName: string,
    operation: ServiceOperation<T>,
    callId: string,
    reason: string,
    primaryError?: Error
  ): Promise<ServiceCallResult<T>> {
    const fallbackStrategy = this.fallbackStrategies.get(serviceName)!;
    const startTime = Date.now();

    try {
      console.log(`Executing fallback for ${serviceName}: ${reason}`);

      const fallbackResult = await fallbackStrategy.execute(
        operation,
        callId,
        reason,
        primaryError
      );

      // Record successful fallback
      await this.reliabilityMetrics.recordCall(serviceName, {
        callId,
        duration: Date.now() - startTime,
        success: true,
        fallbackUsed: true,
        fallbackType: fallbackStrategy.type,
      });

      return {
        callId,
        serviceName,
        data: fallbackResult.data,
        source: fallbackResult.source,
        duration: Date.now() - startTime,
        fallbackUsed: true,
        degradedMode: fallbackResult.degraded || false,
        fallbackReason: reason,
      };
    } catch (fallbackError) {
      console.error(
        `Fallback failed for ${serviceName}: ${fallbackError.message}`
      );

      // Record fallback failure
      await this.reliabilityMetrics.recordCall(serviceName, {
        callId,
        duration: Date.now() - startTime,
        success: false,
        fallbackUsed: true,
        fallbackType: fallbackStrategy.type,
        error: fallbackError.message,
      });

      // If fallback fails, throw the original error
      throw primaryError || fallbackError;
    }
  }

  private createFallbackStrategy(config: ServiceConfig): FallbackStrategy {
    const fallbackConfig = config.fallback;

    switch (fallbackConfig.type) {
      case "cache":
        return new CacheFallbackStrategy(fallbackConfig);

      case "default_value":
        return new DefaultValueFallbackStrategy(fallbackConfig);

      case "alternative_service":
        return new AlternativeServiceFallbackStrategy(fallbackConfig, this);

      case "degraded_response":
        return new DegradedResponseFallbackStrategy(fallbackConfig);

      case "composite":
        return new CompositeFallbackStrategy(fallbackConfig, this);

      default:
        return new NoOpFallbackStrategy();
    }
  }

  private async checkServiceAvailability(
    serviceName: string,
    service: ServiceDefinition,
    state: ServiceState
  ): Promise<AvailabilityCheckResult> {
    // Check if service is in maintenance mode
    if (state.status === "maintenance") {
      return {
        available: false,
        reason: "Service in maintenance mode",
        canFallback: true,
      };
    }

    // Check if service has too many consecutive failures
    if (state.consecutiveFailures >= 5) {
      return {
        available: false,
        reason: `Too many consecutive failures: ${state.consecutiveFailures}`,
        canFallback: true,
      };
    }

    // Check if service health is too low
    if (state.health < 0.3) {
      return {
        available: false,
        reason: `Service health too low: ${state.health}`,
        canFallback: true,
      };
    }

    // Check SLA compliance
    if (service.sla && state.availability < service.sla.minAvailability) {
      return {
        available: false,
        reason: `SLA violation - availability: ${state.availability}`,
        canFallback: true,
      };
    }

    // Check response time SLA
    if (service.sla && state.responseTime > service.sla.maxResponseTimeMs) {
      return {
        available: false,
        reason: `SLA violation - response time: ${state.responseTime}ms`,
        canFallback: true,
      };
    }

    return {
      available: true,
      reason: "Service is available",
      canFallback: false,
    };
  }

  private shouldAttemptFallback(
    error: Error,
    service: ServiceDefinition,
    options: ServiceCallOptions
  ): boolean {
    // Don't fallback if explicitly disabled
    if (options.disableFallback) {
      return false;
    }

    // Always fallback for critical services
    if (service.criticality === "critical") {
      return true;
    }

    // Fallback for specific error types
    const fallbackErrors = [
      "TIMEOUT",
      "SERVICE_UNAVAILABLE",
      "CONNECTION_ERROR",
      "RATE_LIMITED",
    ];

    if (error instanceof ServiceError && fallbackErrors.includes(error.code)) {
      return true;
    }

    // Don't fallback for client errors (4xx)
    if (error instanceof ServiceError && error.status && error.status < 500) {
      return false;
    }

    // Fallback for server errors (5xx)
    return true;
  }

  // Service dependency analysis
  async analyzeDependencyImpact(
    serviceName: string
  ): Promise<DependencyImpactAnalysis> {
    const dependents = this.dependencyGraph.getDependents(serviceName);
    const dependencies = this.dependencyGraph.getDependencies(serviceName);

    const impact: ServiceImpact[] = [];

    // Analyze impact on dependent services
    for (const dependent of dependents) {
      const dependentState = this.serviceStates.get(dependent);
      const dependentService = this.services.get(dependent);

      if (dependentState && dependentService) {
        const impactLevel = this.calculateImpactLevel(
          serviceName,
          dependent,
          dependentService
        );

        impact.push({
          serviceName: dependent,
          impactLevel,
          mitigationStrategies: await this.getMitigationStrategies(
            dependent,
            serviceName
          ),
        });
      }
    }

    return {
      serviceName,
      directDependents: dependents,
      transitiveDependents:
        this.dependencyGraph.getTransitiveDependents(serviceName),
      dependencies,
      impact,
      cascadeRisk: this.calculateCascadeRisk(impact),
    };
  }

  private calculateImpactLevel(
    failedService: string,
    dependentService: string,
    serviceDefinition: ServiceDefinition
  ): "low" | "medium" | "high" | "critical" {
    // If the failed service is not one of this service's declared
    // dependencies, the direct impact is minimal
    if (!serviceDefinition.dependencies.includes(failedService)) {
      return "low";
    }

    // Impact based on service criticality
    if (serviceDefinition.criticality === "critical") {
      return "critical";
    } else if (serviceDefinition.criticality === "high") {
      return "high";
    }

    // Check if fallback is available
    const fallbackStrategy = this.fallbackStrategies.get(dependentService);
    if (fallbackStrategy && fallbackStrategy.type !== "none") {
      return "medium"; // Impact reduced by fallback
    }

    return "high";
  }

  private calculateCascadeRisk(
    impact: ServiceImpact[]
  ): "low" | "medium" | "high" {
    const criticalImpacts = impact.filter(
      (i) => i.impactLevel === "critical"
    ).length;
    const highImpacts = impact.filter((i) => i.impactLevel === "high").length;

    if (criticalImpacts > 2 || (criticalImpacts > 0 && highImpacts > 3)) {
      return "high";
    } else if (criticalImpacts > 0 || highImpacts > 2) {
      return "medium";
    }

    return "low";
  }

  // Start health monitoring
  private startHealthMonitoring(): void {
    setInterval(async () => {
      const healthChecks = Array.from(this.services.keys()).map(
        async (serviceName) => {
          try {
            await this.performHealthCheck(serviceName);
          } catch (error) {
            console.error(`Health check error for ${serviceName}:`, error);
          }
        }
      );

      await Promise.allSettled(healthChecks);
    }, 30000); // Every 30 seconds
  }

  private async performHealthCheck(serviceName: string): Promise<void> {
    const service = this.services.get(serviceName)!;
    const healthChecker = this.healthCheckers.get(serviceName)!;
    const state = this.serviceStates.get(serviceName)!;

    const startTime = Date.now();

    try {
      const healthResult = await healthChecker.check();
      const duration = Date.now() - startTime;

      // Update service state
      state.status = healthResult.healthy ? "healthy" : "unhealthy";
      state.health = healthResult.score;
      state.lastHealthCheck = Date.now();
      state.responseTime = duration;

      if (healthResult.healthy) {
        state.consecutiveFailures = 0;
        state.lastSuccessful = Date.now();
      } else {
        state.consecutiveFailures++;
        state.lastError = healthResult.error || "Health check failed";
      }

      // Record health metrics
      await this.reliabilityMetrics.recordHealthCheck(serviceName, {
        healthy: healthResult.healthy,
        score: healthResult.score,
        duration,
        error: healthResult.error,
      });

      // Send alerts if needed
      await this.checkHealthAlerts(serviceName, service, state, healthResult);
    } catch (error) {
      const duration = Date.now() - startTime;

      state.status = "unhealthy";
      state.health = 0;
      state.lastHealthCheck = Date.now();
      state.consecutiveFailures++;
      state.lastError = error.message;

      console.error(`Health check failed for ${serviceName}:`, error);
    }
  }

  private async checkHealthAlerts(
    serviceName: string,
    service: ServiceDefinition,
    state: ServiceState,
    healthResult: HealthCheckResult
  ): Promise<void> {
    // Alert on service down
    if (!healthResult.healthy && state.consecutiveFailures === 1) {
      await this.alertManager.sendAlert({
        type: "service_down",
        serviceName,
        severity: service.criticality === "critical" ? "critical" : "warning",
        message: `Service ${serviceName} is down: ${healthResult.error}`,
        metadata: { state, healthResult },
      });
    }

    // Alert on prolonged degradation
    if (state.consecutiveFailures >= 5) {
      await this.alertManager.sendAlert({
        type: "service_degraded",
        serviceName,
        severity: "critical",
        message: `Service ${serviceName} has been unhealthy for ${state.consecutiveFailures} consecutive checks`,
        metadata: { state },
      });
    }

    // Alert on SLA violations
    if (service.sla) {
      if (state.availability < service.sla.minAvailability) {
        await this.alertManager.sendAlert({
          type: "sla_violation",
          serviceName,
          severity: "warning",
          message: `Service ${serviceName} availability (${state.availability}) below SLA (${service.sla.minAvailability})`,
          metadata: { state, sla: service.sla },
        });
      }

      if (state.responseTime > service.sla.maxResponseTimeMs) {
        await this.alertManager.sendAlert({
          type: "sla_violation",
          serviceName,
          severity: "warning",
          message: `Service ${serviceName} response time (${state.responseTime}ms) exceeds SLA (${service.sla.maxResponseTimeMs}ms)`,
          metadata: { state, sla: service.sla },
        });
      }
    }
  }

  // Start reliability analysis
  private startReliabilityAnalysis(): void {
    setInterval(async () => {
      try {
        await this.performReliabilityAnalysis();
      } catch (error) {
        console.error("Reliability analysis error:", error);
      }
    }, 300000); // Every 5 minutes
  }

  private async performReliabilityAnalysis(): Promise<void> {
    for (const serviceName of this.services.keys()) {
      const metrics = await this.reliabilityMetrics.getServiceMetrics(
        serviceName,
        300000 // Last 5 minutes
      );

      // Update availability calculation
      const state = this.serviceStates.get(serviceName)!;
      state.availability = metrics.successRate;
      state.errorRate = 1 - metrics.successRate;

      // Check for reliability issues
      if (metrics.successRate < 0.95) {
        console.warn(
          `Service ${serviceName} reliability degraded: ${(
            metrics.successRate * 100
          ).toFixed(2)}%`
        );
      }

      // Analyze fallback usage patterns
      if (metrics.fallbackUsageRate > 0.1) {
        console.warn(
          `Service ${serviceName} high fallback usage: ${(
            metrics.fallbackUsageRate * 100
          ).toFixed(2)}%`
        );
      }
    }
  }

  private async recordSuccessfulCall(
    serviceName: string,
    state: ServiceState,
    startTime: number,
    callId: string
  ): Promise<void> {
    const duration = Date.now() - startTime;

    state.consecutiveFailures = 0;
    state.lastSuccessful = Date.now();
    state.responseTime = duration;

    await this.reliabilityMetrics.recordCall(serviceName, {
      callId,
      duration,
      success: true,
      fallbackUsed: false,
    });
  }

  private async recordFailedCall(
    serviceName: string,
    state: ServiceState,
    startTime: number,
    error: Error,
    callId: string
  ): Promise<void> {
    const duration = Date.now() - startTime;

    state.consecutiveFailures++;
    state.lastError = error.message;

    await this.reliabilityMetrics.recordCall(serviceName, {
      callId,
      duration,
      success: false,
      error: error.message,
      fallbackUsed: false,
    });
  }

  // Get service reliability statistics
  async getServiceReliabilityReport(
    serviceName?: string,
    timeRange?: TimeRange
  ): Promise<ServiceReliabilityReport> {
    const services = serviceName
      ? [serviceName]
      : Array.from(this.services.keys());

    const serviceReports: ServiceReliabilityData[] = [];

    for (const name of services) {
      const service = this.services.get(name)!;
      const state = this.serviceStates.get(name)!;
      const metrics = await this.reliabilityMetrics.getServiceMetrics(
        name,
        timeRange
          ? timeRange.end.getTime() - timeRange.start.getTime()
          : 3600000
      );

      const dependencyImpact = await this.analyzeDependencyImpact(name);

      serviceReports.push({
        serviceName: name,
        criticality: service.criticality,
        currentState: state,
        metrics,
        dependencyImpact,
        slaCompliance: service.sla
          ? {
              availabilityMet:
                state.availability >= service.sla.minAvailability,
              responseTimeMet:
                state.responseTime <= service.sla.maxResponseTimeMs,
              currentAvailability: state.availability,
              targetAvailability: service.sla.minAvailability,
              currentResponseTime: state.responseTime,
              targetResponseTime: service.sla.maxResponseTimeMs,
            }
          : null,
      });
    }

    return {
      generatedAt: Date.now(),
      timeRange: timeRange || {
        start: new Date(Date.now() - 3600000),
        end: new Date(),
      },
      services: serviceReports,
      overallHealth: this.calculateOverallHealth(serviceReports),
    };
  }

  private calculateOverallHealth(services: ServiceReliabilityData[]): number {
    if (services.length === 0) return 1;

    let weightedHealth = 0;
    let totalWeight = 0;

    for (const service of services) {
      const weight = this.getCriticalityWeight(service.criticality);
      weightedHealth += service.currentState.health * weight;
      totalWeight += weight;
    }

    return totalWeight > 0 ? weightedHealth / totalWeight : 0;
  }

  private getCriticalityWeight(criticality: string): number {
    const weights = {
      critical: 4,
      high: 3,
      medium: 2,
      low: 1,
    };
    return weights[criticality] || 2;
  }

  private async getMitigationStrategies(
    dependentService: string,
    failedService: string
  ): Promise<string[]> {
    const fallbackStrategy = this.fallbackStrategies.get(dependentService);
    const strategies: string[] = [];

    if (fallbackStrategy) {
      strategies.push(`Fallback available: ${fallbackStrategy.type}`);
    }

    // Check for alternative dependencies
    const service = this.services.get(dependentService);
    if (service && service.dependencies.length > 1) {
      strategies.push("Multiple dependencies available");
    }

    // Add other mitigation strategies based on service configuration
    strategies.push("Manual intervention available");

    return strategies;
  }

  private generateCallId(): string {
    return `call_${Date.now()}_${Math.random().toString(36).slice(2, 11)}`;
  }
}

// Example fallback strategy implementations
class CacheFallbackStrategy implements FallbackStrategy {
  readonly type = "cache";

  constructor(private config: any) {}

  async execute<T>(
    operation: ServiceOperation<T>,
    callId: string,
    reason: string,
    primaryError?: Error
  ): Promise<FallbackResult<T>> {
    // Try to get cached result
    const cacheKey = this.generateCacheKey(operation, callId);
    const cachedResult = await this.getCachedResult(cacheKey);

    if (cachedResult) {
      return {
        data: cachedResult,
        source: "cache",
        degraded: true,
      };
    }

    throw new ServiceError(
      "No cached data available for fallback",
      "CACHE_MISS"
    );
  }

  private generateCacheKey(
    operation: ServiceOperation<any>,
    callId: string
  ): string {
    // Generate cache key based on operation parameters
    return `fallback:${operation.name}:${JSON.stringify(operation.parameters)}`;
  }

  private async getCachedResult(cacheKey: string): Promise<any> {
    // Implementation would check your cache (Redis, etc.)
    return null;
  }
}

class DefaultValueFallbackStrategy implements FallbackStrategy {
  readonly type = "default_value";

  constructor(private config: any) {}

  async execute<T>(
    operation: ServiceOperation<T>,
    callId: string,
    reason: string,
    primaryError?: Error
  ): Promise<FallbackResult<T>> {
    const defaultValue = this.config.defaultValues[operation.name];

    if (defaultValue !== undefined) {
      return {
        data: defaultValue,
        source: "default",
        degraded: true,
      };
    }

    throw new ServiceError(
      "No default value configured for fallback",
      "NO_DEFAULT_VALUE"
    );
  }
}

// Supporting interfaces and classes
interface ServiceReliabilityConfig {
  services: ServiceConfig[];
  alerting: AlertingConfig;
}

interface ServiceConfig {
  name: string;
  type: string;
  endpoint: string;
  criticality?: "low" | "medium" | "high" | "critical";
  dependencies?: string[];
  healthCheck: HealthCheckConfig;
  sla?: ServiceSLA;
  timeouts?: {
    connection?: number;
    request?: number;
    healthCheck?: number;
  };
  fallback: FallbackConfig;
}

interface ServiceDefinition {
  name: string;
  type: string;
  endpoint: string;
  criticality: string;
  dependencies: string[];
  healthCheck: HealthCheckConfig;
  sla?: ServiceSLA;
  timeouts: {
    connection: number;
    request: number;
    healthCheck: number;
  };
}

interface ServiceState {
  status: "unknown" | "healthy" | "unhealthy" | "maintenance";
  health: number;
  lastHealthCheck: number;
  consecutiveFailures: number;
  lastError: string | null;
  responseTime: number;
  availability: number;
  errorRate: number;
  lastSuccessful: number;
}

interface ServiceOperation<T> {
  name: string;
  parameters: any;
  execute(service: ServiceDefinition, context: any): Promise<T>;
}

interface ServiceCallOptions {
  timeout?: number;
  retries?: number;
  disableFallback?: boolean;
}

interface ServiceCallResult<T> {
  callId: string;
  serviceName: string;
  data: T;
  source: "primary" | "cache" | "default" | "alternative";
  duration: number;
  fallbackUsed: boolean;
  degradedMode: boolean;
  fallbackReason?: string;
}

interface FallbackStrategy {
  readonly type: string;
  execute<T>(
    operation: ServiceOperation<T>,
    callId: string,
    reason: string,
    primaryError?: Error
  ): Promise<FallbackResult<T>>;
}

interface FallbackResult<T> {
  data: T;
  source: string;
  degraded?: boolean;
}

interface FallbackConfig {
  type:
    | "cache"
    | "default_value"
    | "alternative_service"
    | "degraded_response"
    | "composite"
    | "none";
  [key: string]: any;
}

interface HealthCheckConfig {
  endpoint?: string;
  method?: string;
  expectedStatus?: number;
  timeout?: number;
  headers?: Record<string, string>;
}

interface ServiceSLA {
  minAvailability: number;
  maxResponseTimeMs: number;
}

interface AvailabilityCheckResult {
  available: boolean;
  reason: string;
  canFallback: boolean;
}

interface DependencyImpactAnalysis {
  serviceName: string;
  directDependents: string[];
  transitiveDependents: string[];
  dependencies: string[];
  impact: ServiceImpact[];
  cascadeRisk: "low" | "medium" | "high";
}

interface ServiceImpact {
  serviceName: string;
  impactLevel: "low" | "medium" | "high" | "critical";
  mitigationStrategies: string[];
}

interface ServiceReliabilityReport {
  generatedAt: number;
  timeRange: TimeRange;
  services: ServiceReliabilityData[];
  overallHealth: number;
}

interface ServiceReliabilityData {
  serviceName: string;
  criticality: string;
  currentState: ServiceState;
  metrics: any;
  dependencyImpact: DependencyImpactAnalysis;
  slaCompliance: {
    availabilityMet: boolean;
    responseTimeMet: boolean;
    currentAvailability: number;
    targetAvailability: number;
    currentResponseTime: number;
    targetResponseTime: number;
  } | null;
}

interface HealthCheckResult {
  healthy: boolean;
  score: number;
  error?: string;
  metadata?: any;
}

interface AlertingConfig {
  channels: string[];
  thresholds: any;
}

interface TimeRange {
  start: Date;
  end: Date;
}

class ServiceError extends Error {
  constructor(
    message: string,
    public code: string,
    public status?: number,
    public metadata?: any
  ) {
    super(message);
    this.name = "ServiceError";
  }
}

// Placeholder classes that would need full implementation
class ServiceDependencyGraph {
  addService(serviceName: string, dependencies: string[]): void {}
  getDependents(serviceName: string): string[] {
    return [];
  }
  getDependencies(serviceName: string): string[] {
    return [];
  }
  getTransitiveDependents(serviceName: string): string[] {
    return [];
  }
}

class ReliabilityMetrics {
  async recordCall(serviceName: string, data: any): Promise<void> {}
  async recordHealthCheck(serviceName: string, data: any): Promise<void> {}
  async getServiceMetrics(
    serviceName: string,
    timeWindowMs: number
  ): Promise<any> {
    return {
      successRate: 0.95,
      fallbackUsageRate: 0.05,
      avgResponseTime: 100,
    };
  }
}

class ServiceAlertManager {
  constructor(config: AlertingConfig) {}
  async sendAlert(alert: any): Promise<void> {
    console.log(`ALERT: ${alert.message}`);
  }
}

class HealthChecker {
  constructor(
    private service: ServiceDefinition,
    private config: HealthCheckConfig
  ) {}

  async check(): Promise<HealthCheckResult> {
    // Implementation would perform actual health check
    return {
      healthy: true,
      score: 1.0,
    };
  }
}

class AlternativeServiceFallbackStrategy implements FallbackStrategy {
  readonly type = "alternative_service";
  constructor(
    private config: any,
    private manager: ServiceReliabilityManager
  ) {}
  async execute<T>(
    operation: ServiceOperation<T>,
    callId: string,
    reason: string,
    primaryError?: Error
  ): Promise<FallbackResult<T>> {
    throw new Error("Alternative service fallback not implemented");
  }
}

class DegradedResponseFallbackStrategy implements FallbackStrategy {
  readonly type = "degraded_response";
  constructor(private config: any) {}
  async execute<T>(
    operation: ServiceOperation<T>,
    callId: string,
    reason: string,
    primaryError?: Error
  ): Promise<FallbackResult<T>> {
    throw new Error("Degraded response fallback not implemented");
  }
}

class CompositeFallbackStrategy implements FallbackStrategy {
  readonly type = "composite";
  constructor(
    private config: any,
    private manager: ServiceReliabilityManager
  ) {}
  async execute<T>(
    operation: ServiceOperation<T>,
    callId: string,
    reason: string,
    primaryError?: Error
  ): Promise<FallbackResult<T>> {
    throw new Error("Composite fallback not implemented");
  }
}

class NoOpFallbackStrategy implements FallbackStrategy {
  readonly type = "none";
  async execute<T>(
    operation: ServiceOperation<T>,
    callId: string,
    reason: string,
    primaryError?: Error
  ): Promise<FallbackResult<T>> {
    throw new ServiceError("No fallback strategy configured", "NO_FALLBACK");
  }
}
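
To show how these pieces fit together, here is a minimal sketch of configuring the manager and issuing a call that can fall back to cached data. The service name, endpoint, thresholds, and the ttlMs fallback option are illustrative assumptions, not prescribed values:

// Illustrative wiring of the ServiceReliabilityManager.
const reliabilityManager = new ServiceReliabilityManager({
  alerting: { channels: ["slack"], thresholds: {} },
  services: [
    {
      name: "recommendations",
      type: "http",
      endpoint: "https://recs.internal.example.com",
      criticality: "medium",
      dependencies: ["product-catalog"],
      healthCheck: { endpoint: "/health", expectedStatus: 200, timeout: 5000 },
      sla: { minAvailability: 0.99, maxResponseTimeMs: 500 },
      timeouts: { request: 2000 },
      fallback: { type: "cache", ttlMs: 300_000 }, // ttlMs is an assumed option
    },
  ],
});

// A ServiceOperation bundles the call's name, parameters, and execution logic.
const getRecommendations: ServiceOperation<string[]> = {
  name: "getRecommendations",
  parameters: { userId: "user_123" },
  async execute(service, context) {
    const response = await fetch(
      `${service.endpoint}/users/user_123/recommendations`,
      { headers: { "x-call-id": context.callId } }
    );
    if (!response.ok) {
      throw new ServiceError(
        `HTTP ${response.status}`,
        "SERVICE_UNAVAILABLE",
        response.status
      );
    }
    return (await response.json()) as string[];
  },
};

async function loadRecommendations(): Promise<void> {
  const result = await reliabilityManager.executeServiceCall(
    "recommendations",
    getRecommendations
  );

  if (result.fallbackUsed) {
    console.warn(
      `Recommendations served from ${result.source}: ${result.fallbackReason}`
    );
  }
}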

Key Takeaways

Third-party integration resilience isn’t just about handling individual API failures—it’s about building systems that can adapt to unpredictable patterns, survive webhook floods, and maintain consistency when external services behave in ways you never anticipated.

Essential resilience patterns:

  • Webhook processing with flood protection, deduplication, and intelligent retry logic (see the deduplication sketch after this list)
  • Adaptive rate limiting that adjusts based on API behavior and response patterns
  • Service reliability with comprehensive fallback strategies and dependency analysis
  • Health monitoring that tracks integration health, not just uptime
  • Graceful degradation that maintains core functionality when external services fail
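
To make the deduplication pattern concrete, here is a minimal sketch of idempotent webhook handling keyed on the provider's event ID. The in-memory Set and handler names are assumptions; a production system would back this with a shared, expiring store such as Redis:

// Minimal webhook deduplication sketch. processedEventIds and handleStripeEvent
// are illustrative names, and process memory stands in for a shared store.
const processedEventIds = new Set<string>();

async function handleWebhook(event: {
  id: string;
  type: string;
  data: unknown;
}): Promise<void> {
  // Providers retry deliveries, so the same event ID can arrive many times
  if (processedEventIds.has(event.id)) {
    console.log(`Skipping duplicate webhook event: ${event.id}`);
    return;
  }

  processedEventIds.add(event.id);

  try {
    await handleStripeEvent(event); // your business logic
  } catch (error) {
    // Forget the ID so a later retry can reprocess the failed event
    processedEventIds.delete(event.id);
    throw error;
  }
}

async function handleStripeEvent(event: {
  id: string;
  type: string;
  data: unknown;
}): Promise<void> {
  // Placeholder for real event handling
}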

The production-ready integration resilience framework:

  • Use event-driven architecture for webhook processing that can handle unpredictable loads
  • Implement adaptive rate limiting that learns from API behavior rather than using static limits
  • Build comprehensive fallback strategies including cache, default values, and alternative services
  • Design dependency-aware systems that understand the cascade effects of service failures
  • Plan intelligent degradation that provides reduced functionality rather than complete failure

Integration resilience best practices:

  • Design for failure from day one - external services will fail, plan accordingly
  • Monitor integration health as comprehensively as you monitor your own services
  • Implement circuit breakers to prevent cascading failures across your system
  • Use correlation IDs to trace requests through complex integration chains (see the propagation sketch after this list)
  • Build comprehensive alerting that provides actionable insights, not just noise
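
As a sketch of the correlation-ID practice, here is one way to propagate a single ID through outbound calls and logs. The x-correlation-id header name and the helper function are assumptions, not a standard:

// Illustrative correlation ID propagation; adapt the header name to whatever
// convention your stack already uses.
import { randomUUID } from "node:crypto";

async function callPaymentProvider(
  orderId: string,
  correlationId: string = randomUUID()
): Promise<number> {
  console.log(`[${correlationId}] Charging order ${orderId}`);

  const response = await fetch("https://api.payments.example.com/charges", {
    method: "POST",
    headers: {
      "content-type": "application/json",
      "x-correlation-id": correlationId, // carried through the whole chain
    },
    body: JSON.stringify({ orderId }),
  });

  console.log(`[${correlationId}] Provider responded with ${response.status}`);
  return response.status;
}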

The architecture decision framework:

  • Use webhook queues for high-volume event processing with flood protection
  • Use adaptive rate limiting for APIs with variable or unknown rate limits
  • Use circuit breakers for critical external dependencies
  • Use fallback strategies for any functionality that must remain available
  • Use health checks for proactive monitoring of external service availability

What’s Next?

In the upcoming blogs, we’ll continue with more advanced backend engineering, diving into specialized topics like serverless architectures, advanced database concepts, performance optimization, and monitoring and observability at scale.

We’ll explore the operational challenges of maintaining complex distributed systems—from optimizing database queries that handle millions of records to building monitoring systems that can detect problems before they impact users.

Because building integrations that work in development is the foundation. Making them bulletproof enough to handle the chaos of production traffic, unpredictable external service behavior, and the complexity of distributed systems at scale—that’s where backend engineering becomes both an art and a science.