Third-party Integration - 2/2
The $18 Million Integration Tsunami That Crushed Cyber Monday
Picture this catastrophe: November 27th, 11:30 PM EST. One of the largest retail platforms is preparing for Cyber Monday—the biggest online shopping day of the year. They’ve successfully handled Black Friday with 300% of normal traffic. Their integrations with payment processors, inventory systems, and shipping providers have been rock solid. Everything is optimized and ready.
Then midnight hits, and the perfect storm begins.
Within the first 30 minutes, what started as a trickle becomes a tsunami:
- Stripe starts sending 50,000 webhook events per minute instead of the usual 500
- PayPal webhooks begin arriving 4 hours late, causing payment status confusion
- The inventory management system API starts rate limiting at 100 requests/minute instead of 1,000
- Shipping carrier APIs return 500 errors for 40% of tracking requests
- Email service webhooks flood the servers with 200,000 delivery status updates
- Social media sharing APIs hit their daily quota limits by 1 AM
- The recommendation engine API starts timing out after 2 seconds instead of responding in 200ms
But here’s where it gets catastrophic—their webhook processing system, designed to handle the “normal” flow of external events, begins to collapse under the load:
- Webhook queue overflow: 2 million unprocessed webhook events backed up
- Duplicate processing: Failed webhooks retrying without deduplication, creating data corruption
- Resource exhaustion: Webhook processors consuming 100% CPU trying to handle the flood
- Database deadlocks: Concurrent webhook processing causing payment status conflicts
- Memory leaks: Webhook handlers not releasing resources, causing server crashes
- Cascading failures: Slow webhook processing causing HTTP timeouts, triggering even more retries
By 3 AM, the damage was staggering:
- $18 million in revenue lost due to failed payment confirmations
- 1.2 million customers with incorrect order statuses
- 800,000 shipping notifications never sent
- 3 million email delivery events never processed
- Complete breakdown of real-time inventory tracking
- Customer service receiving 100,000 tickets about “missing” orders that actually went through
The brutal truth? They had built integrations that worked beautifully under normal load but had never planned for the exponential chaos that external services create during peak traffic.
The Uncomfortable Truth About Integration Resilience
Here’s what separates applications that gracefully handle third-party chaos from those that crumble when external services misbehave: Integration resilience isn’t just about handling individual API failures—it’s about building systems that can adapt to unpredictable patterns, survive webhook floods, and maintain consistency when external services behave in ways you never anticipated.
Most developers approach integration resilience like this:
- Add retry logic and assume that solves reliability
- Process webhooks as they arrive and hope the load stays manageable
- Set up basic monitoring for uptime and response times
- Discover during peak traffic that external services scale differently than your application
- Panic when webhook floods overwhelm your servers and corrupt your data
But systems that handle real-world integration complexity work differently:
- Design webhook processing to handle unpredictable loads and patterns
- Implement intelligent rate limiting that adapts to external service behavior
- Build comprehensive monitoring that tracks integration health, not just uptime
- Plan for service degradation and implement smart fallback strategies
- Treat external service behavior as a distributed systems problem requiring sophisticated coordination
The difference isn’t just reliability—it’s the difference between systems that become more robust as they integrate with more services and systems where each new integration becomes a potential single point of failure.
Ready to build integrations that work like Shopify’s payment processing during Black Friday instead of that webhook handler that falls over when someone sneezes at the third-party data center? Let’s dive into the patterns that power bulletproof integration resilience.
Webhook Handling and Processing: Taming the Event Storm
Scalable Webhook Processing Architecture
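The sketch below is the core of such a system. The receive path stays deliberately fast: validate the signature, apply per-provider rate limits, enqueue, and acknowledge. Everything slow happens asynchronously in worker pools, with deduplication, per-event timeouts, exponential-backoff retries, and a dead letter queue that supports replay.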
// Advanced webhook processing system that handles floods and ensures reliability
class WebhookProcessingEngine {
private processors: Map<string, WebhookProcessor> = new Map();
private incomingQueue: WebhookQueue;
private processingQueue: WebhookQueue;
private deadLetterQueue: WebhookQueue;
private deduplicationCache: DeduplicationCache;
private rateLimiter: WebhookRateLimiter;
private retryManager: WebhookRetryManager;
private metricsCollector: WebhookMetrics;
constructor(config: WebhookEngineConfig) {
this.incomingQueue = new WebhookQueue("incoming", config.queueConfig);
this.processingQueue = new WebhookQueue("processing", config.queueConfig);
this.deadLetterQueue = new WebhookQueue("dead-letter", config.queueConfig);
this.deduplicationCache = new DeduplicationCache(config.cacheConfig);
this.rateLimiter = new WebhookRateLimiter(config.rateLimitConfig);
this.retryManager = new WebhookRetryManager(config.retryConfig);
this.metricsCollector = new WebhookMetrics();
this.initializeProcessors(config.processors);
this.startProcessingWorkers(config.workerCount || 10);
this.setupHealthMonitoring();
}
// Accept incoming webhook with immediate response
async receiveWebhook(
provider: string,
signature: string,
payload: any,
headers: Record<string, string>
): Promise<WebhookReceiptResult> {
const webhookId = this.generateWebhookId();
const receivedAt = Date.now();
try {
// Immediate signature validation (fast path)
const processor = this.processors.get(provider);
if (!processor) {
throw new WebhookError(
`No processor found for provider: ${provider}`,
"UNKNOWN_PROVIDER"
);
}
// Basic signature validation
const isValidSignature = await processor.validateSignature(
signature,
payload,
headers
);
if (!isValidSignature) {
await this.metricsCollector.recordRejected(
provider,
"INVALID_SIGNATURE"
);
throw new WebhookError(
"Invalid webhook signature",
"INVALID_SIGNATURE"
);
}
// Apply rate limiting per provider
const rateLimitResult = await this.rateLimiter.checkLimit(provider);
if (!rateLimitResult.allowed) {
await this.metricsCollector.recordRejected(provider, "RATE_LIMITED");
throw new WebhookError("Rate limit exceeded", "RATE_LIMITED", 429, {
retryAfter: rateLimitResult.retryAfter,
});
}
// Create webhook record
const webhook: IncomingWebhook = {
id: webhookId,
provider,
signature,
payload,
headers,
receivedAt,
status: "received",
attempts: 0,
maxAttempts: this.getMaxAttempts(provider),
nextRetryAt: null,
};
// Queue for processing (fast operation)
await this.incomingQueue.enqueue(webhook);
// Record metrics
await this.metricsCollector.recordReceived(provider);
console.log(`Webhook received: ${webhookId} from ${provider}`);
return {
webhookId,
status: "accepted",
message: "Webhook queued for processing",
};
} catch (error) {
console.error(`Webhook reception failed: ${webhookId}`, error);
return {
webhookId,
status: "rejected",
message: error.message,
error: error.code,
};
}
}
// Process webhooks with deduplication and retry logic
private async processWebhook(webhook: IncomingWebhook): Promise<void> {
const startTime = Date.now();
let processingResult: WebhookProcessingResult;
try {
// Check for duplicate processing (in production, the check-and-mark pair
// below should be atomic, e.g. a Redis SET NX, to avoid races between workers)
const deduplicationKey = this.generateDeduplicationKey(webhook);
const isDuplicate = await this.deduplicationCache.isDuplicate(
deduplicationKey
);
if (isDuplicate) {
console.log(`Duplicate webhook detected: ${webhook.id}`);
await this.metricsCollector.recordDuplicate(webhook.provider);
return;
}
// Mark as processing to prevent duplicates
await this.deduplicationCache.markProcessing(
deduplicationKey,
webhook.id
);
// Get processor for this provider
const processor = this.processors.get(webhook.provider);
if (!processor) {
throw new WebhookError(
`No processor found for provider: ${webhook.provider}`,
"NO_PROCESSOR"
);
}
// Parse webhook events
const events = await processor.parseEvents(
webhook.payload,
webhook.headers
);
if (events.length === 0) {
console.log(`No events found in webhook: ${webhook.id}`);
await this.metricsCollector.recordProcessed(
webhook.provider,
0,
Date.now() - startTime
);
return;
}
// Process each event in the webhook
processingResult = await this.processWebhookEvents(webhook, events);
// Record successful processing
await this.metricsCollector.recordProcessed(
webhook.provider,
events.length,
Date.now() - startTime
);
// Mark processing complete in cache
await this.deduplicationCache.markCompleted(
deduplicationKey,
processingResult
);
console.log(
`Webhook processed successfully: ${webhook.id} (${events.length} events)`
);
} catch (error) {
const duration = Date.now() - startTime;
console.error(`Webhook processing failed: ${webhook.id}`, error);
// Determine if error is retryable
const shouldRetry = this.shouldRetryWebhook(error, webhook);
if (shouldRetry && webhook.attempts < webhook.maxAttempts) {
// Calculate retry delay with exponential backoff
const retryDelay = this.calculateRetryDelay(webhook.attempts);
webhook.nextRetryAt = Date.now() + retryDelay;
webhook.attempts++;
webhook.lastError = error.message;
// Requeue for retry
await this.processingQueue.enqueueDelayed(webhook, retryDelay);
await this.metricsCollector.recordRetry(
webhook.provider,
webhook.attempts
);
console.log(
`Webhook queued for retry: ${webhook.id} (attempt ${webhook.attempts}/${webhook.maxAttempts})`
);
} else {
// Max retries exceeded or non-retryable error
webhook.status = "failed";
webhook.lastError = error.message;
await this.deadLetterQueue.enqueue(webhook);
await this.metricsCollector.recordFailed(webhook.provider, duration);
console.error(
`Webhook moved to dead letter queue: ${webhook.id}`,
error
);
}
}
}
private async processWebhookEvents(
webhook: IncomingWebhook,
events: WebhookEvent[]
): Promise<WebhookProcessingResult> {
const results: EventProcessingResult[] = [];
const processor = this.processors.get(webhook.provider)!;
for (const event of events) {
try {
// Process individual event with a timeout (note: the losing timer is not
// cleared here; a production version would use clearTimeout or AbortController)
const eventResult = await Promise.race([
processor.processEvent(event, webhook),
new Promise<never>((_, reject) =>
setTimeout(
() => reject(new Error("Event processing timeout")),
30000 // 30 second timeout per event
)
),
]);
results.push({
eventId: event.id,
eventType: event.type,
success: true,
result: eventResult,
});
} catch (error) {
console.error(
`Event processing failed: ${webhook.id}/${event.id}`,
error
);
results.push({
eventId: event.id,
eventType: event.type,
success: false,
error: error.message,
});
// If it's a critical event type, fail the entire webhook
if (this.isCriticalEventType(event.type)) {
throw new WebhookError(
`Critical event processing failed: ${event.type}`,
"CRITICAL_EVENT_FAILED",
500,
{ eventId: event.id, originalError: error }
);
}
}
}
const successfulEvents = results.filter((r) => r.success).length;
const failedEvents = results.filter((r) => !r.success).length;
return {
webhookId: webhook.id,
totalEvents: events.length,
successfulEvents,
failedEvents,
results,
processedAt: Date.now(),
};
}
// Start processing workers
private startProcessingWorkers(workerCount: number): void {
for (let i = 0; i < workerCount; i++) {
this.startProcessingWorker(`worker-${i}`);
}
console.log(`Started ${workerCount} webhook processing workers`);
}
private async startProcessingWorker(workerId: string): Promise<void> {
console.log(`Starting webhook processing worker: ${workerId}`);
// Process incoming webhooks
const processIncoming = async () => {
while (true) {
try {
const webhook = await this.incomingQueue.dequeue(5000); // 5 second timeout
if (webhook) {
webhook.status = "processing";
await this.processWebhook(webhook);
}
} catch (error) {
console.error(`Worker ${workerId} error processing incoming:`, error);
await new Promise((resolve) => setTimeout(resolve, 1000)); // Brief pause on error
}
}
};
// Process retry webhooks
const processRetries = async () => {
while (true) {
try {
const webhook = await this.processingQueue.dequeueReady(5000);
if (webhook) {
webhook.status = "retrying";
await this.processWebhook(webhook);
}
} catch (error) {
console.error(`Worker ${workerId} error processing retries:`, error);
await new Promise((resolve) => setTimeout(resolve, 1000));
}
}
};
// Run both processing loops concurrently for the lifetime of the worker;
// each loop handles its own errors, so this promise is intentionally not awaited
void Promise.all([processIncoming(), processRetries()]);
}
// Webhook replay for failed processing
async replayWebhook(webhookId: string): Promise<WebhookReplayResult> {
try {
// Find webhook in dead letter queue
const webhook = await this.deadLetterQueue.findById(webhookId);
if (!webhook) {
throw new WebhookError("Webhook not found", "WEBHOOK_NOT_FOUND");
}
// Reset webhook state for replay
webhook.status = "received";
webhook.attempts = 0;
webhook.nextRetryAt = null;
webhook.lastError = null;
// Remove from dead letter queue
await this.deadLetterQueue.remove(webhookId);
// Requeue for processing
await this.incomingQueue.enqueue(webhook);
console.log(`Webhook queued for replay: ${webhookId}`);
return {
webhookId,
status: "replaying",
message: "Webhook queued for replay processing",
};
} catch (error) {
return {
webhookId,
status: "replay_failed",
message: error.message,
error: error.code,
};
}
}
// Batch webhook operations
async replayWebhookBatch(
criteria: WebhookReplayCriteria
): Promise<BatchReplayResult> {
const webhooks = await this.deadLetterQueue.findByCriteria(criteria);
const results: WebhookReplayResult[] = [];
for (const webhook of webhooks) {
const result = await this.replayWebhook(webhook.id);
results.push(result);
}
return {
total: webhooks.length,
successful: results.filter((r) => r.status === "replaying").length,
failed: results.filter((r) => r.status === "replay_failed").length,
results,
};
}
// Webhook analytics and monitoring
async getWebhookStatistics(
timeRange: TimeRange,
provider?: string
): Promise<WebhookStatistics> {
return await this.metricsCollector.getStatistics(timeRange, provider);
}
async getWebhookHealth(): Promise<WebhookHealthStatus> {
const queueStats = {
incoming: await this.incomingQueue.getSize(),
processing: await this.processingQueue.getSize(),
deadLetter: await this.deadLetterQueue.getSize(),
};
const processingRates = await this.metricsCollector.getProcessingRates();
const errorRates = await this.metricsCollector.getErrorRates();
return {
queues: queueStats,
processingRates,
errorRates,
healthScore: this.calculateHealthScore(queueStats, errorRates),
lastUpdated: Date.now(),
};
}
// Initialize webhook processors for different providers
private initializeProcessors(
processorConfigs: WebhookProcessorConfig[]
): void {
for (const config of processorConfigs) {
let processor: WebhookProcessor;
switch (config.provider) {
case "stripe":
processor = new StripeWebhookProcessor(config);
break;
case "paypal":
processor = new PayPalWebhookProcessor(config);
break;
case "github":
processor = new GitHubWebhookProcessor(config);
break;
case "sendgrid":
processor = new SendGridWebhookProcessor(config);
break;
default:
throw new Error(`Unsupported webhook provider: ${config.provider}`);
}
this.processors.set(config.provider, processor);
console.log(`Initialized webhook processor: ${config.provider}`);
}
}
// Utility methods
private generateWebhookId(): string {
return `wh_${Date.now()}_${Math.random().toString(36).slice(2, 11)}`;
}
private generateDeduplicationKey(webhook: IncomingWebhook): string {
// Key on provider plus payload hash. The signature is deliberately excluded:
// providers such as Stripe include a timestamp in the signature, so retried
// deliveries of the same event carry different signatures. Prefer the
// provider's native event ID here when one is available.
const payloadHash = this.hashPayload(webhook.payload);
return `${webhook.provider}:${payloadHash}`;
}
private hashPayload(payload: any): string {
const crypto = require("crypto");
return crypto
.createHash("sha256")
.update(JSON.stringify(payload))
.digest("hex");
}
private shouldRetryWebhook(error: any, webhook: IncomingWebhook): boolean {
// Determine if webhook should be retried based on error type
const retryableErrors = [
"TIMEOUT",
"CONNECTION_ERROR",
"SERVICE_UNAVAILABLE",
"DATABASE_ERROR",
];
const nonRetryableErrors = [
"INVALID_SIGNATURE",
"NO_PROCESSOR",
"MALFORMED_PAYLOAD",
];
if (nonRetryableErrors.includes(error.code)) {
return false;
}
if (retryableErrors.includes(error.code) || error.status >= 500) {
return true;
}
return false;
}
private calculateRetryDelay(attemptNumber: number): number {
// Exponential backoff with jitter
const baseDelay = 1000; // 1 second
const maxDelay = 300000; // 5 minutes
const exponentialDelay = baseDelay * Math.pow(2, attemptNumber);
const jitter = Math.random() * 1000; // Up to 1 second jitter
return Math.min(maxDelay, exponentialDelay + jitter);
}
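// Illustrative delays before jitter: attempt 0 -> 1s, 1 -> 2s, 2 -> 4s,
// 3 -> 8s, 4 -> 16s, capped at the 5-minute maximum.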
private getMaxAttempts(provider: string): number {
const maxAttemptsByProvider: Record<string, number> = {
stripe: 5,
paypal: 3,
github: 7,
sendgrid: 5,
};
return maxAttemptsByProvider[provider] || 5;
}
private isCriticalEventType(eventType: string): boolean {
const criticalEvents = [
"payment.completed",
"payment.failed",
"subscription.cancelled",
"account.deleted",
];
return criticalEvents.includes(eventType);
}
private calculateHealthScore(queueStats: any, errorRates: any): number {
let score = 100;
// Deduct points for queue backlog
if (queueStats.incoming > 1000) score -= 20;
if (queueStats.processing > 500) score -= 15;
if (queueStats.deadLetter > 100) score -= 25;
// Deduct points for high error rates (tiered, not cumulative)
if (errorRates.overall > 0.1) score -= 30; // above 10% error rate
else if (errorRates.overall > 0.05) score -= 15; // above 5% error rate
return Math.max(0, Math.min(100, score));
}
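// Worked example (illustrative): incoming=1200 (-20), deadLetter=150 (-25),
// errorRate=0.07 (-15) yields a health score of 40.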
private setupHealthMonitoring(): void {
setInterval(async () => {
try {
const health = await this.getWebhookHealth();
if (health.healthScore < 70) {
console.warn("Webhook system health degraded:", health);
}
// Auto-scale workers if queues are backing up
if (health.queues.incoming > 5000) {
console.log("High webhook queue detected, consider scaling workers");
}
} catch (error) {
console.error("Health monitoring error:", error);
}
}, 30000); // Every 30 seconds
}
}
// Example Stripe webhook processor implementation
class StripeWebhookProcessor implements WebhookProcessor {
constructor(private config: WebhookProcessorConfig) {}
async validateSignature(
signature: string,
payload: any,
headers: Record<string, string>
): Promise<boolean> {
try {
const stripe = require("stripe")(this.config.secretKey);
const endpointSecret = this.config.webhookSecret;
// Use Stripe's signature validation
stripe.webhooks.constructEvent(payload, signature, endpointSecret);
return true;
} catch (error) {
console.error("Stripe signature validation failed:", error.message);
return false;
}
}
async parseEvents(
payload: any,
headers: Record<string, string>
): Promise<WebhookEvent[]> {
const stripeEvent = payload;
return [
{
id: stripeEvent.id,
type: stripeEvent.type,
data: stripeEvent.data,
createdAt: stripeEvent.created * 1000, // Convert to milliseconds
metadata: {
apiVersion: stripeEvent.api_version,
livemode: stripeEvent.livemode,
},
},
];
}
async processEvent(
event: WebhookEvent,
webhook: IncomingWebhook
): Promise<EventProcessingResult> {
console.log(`Processing Stripe event: ${event.type} (${event.id})`);
try {
switch (event.type) {
case "payment_intent.succeeded":
return await this.handlePaymentSuccess(event);
case "payment_intent.payment_failed":
return await this.handlePaymentFailed(event);
case "customer.subscription.created":
return await this.handleSubscriptionCreated(event);
case "customer.subscription.deleted":
return await this.handleSubscriptionCancelled(event);
case "invoice.payment_succeeded":
return await this.handleInvoicePaymentSuccess(event);
case "invoice.payment_failed":
return await this.handleInvoicePaymentFailed(event);
default:
console.log(`Unhandled Stripe event type: ${event.type}`);
return {
success: true,
message: "Event type not handled, but acknowledged",
};
}
} catch (error) {
return {
success: false,
error: error.message,
retryable: this.isRetryableError(error),
};
}
}
private async handlePaymentSuccess(
event: WebhookEvent
): Promise<EventProcessingResult> {
const paymentIntent = event.data.object;
// Update payment status in database
await this.updatePaymentStatus(
paymentIntent.metadata?.orderId || paymentIntent.id,
"completed",
{
stripePaymentIntentId: paymentIntent.id,
amount: paymentIntent.amount,
currency: paymentIntent.currency,
}
);
// Send confirmation email
await this.sendPaymentConfirmation(paymentIntent);
// Update inventory (only when the payment is linked to an order)
if (paymentIntent.metadata?.orderId) {
await this.updateInventory(paymentIntent.metadata.orderId);
}
return {
success: true,
message: `Payment confirmed: ${paymentIntent.id}`,
data: { paymentIntentId: paymentIntent.id },
};
}
private async handlePaymentFailed(
event: WebhookEvent
): Promise<EventProcessingResult> {
const paymentIntent = event.data.object;
// Update payment status in database
await this.updatePaymentStatus(
paymentIntent.metadata?.orderId || paymentIntent.id,
"failed",
{
stripePaymentIntentId: paymentIntent.id,
failureReason: paymentIntent.last_payment_error?.message,
}
);
// Send failure notification
await this.sendPaymentFailureNotification(paymentIntent);
// Release inventory hold (only when the payment is linked to an order)
if (paymentIntent.metadata?.orderId) {
await this.releaseInventoryHold(paymentIntent.metadata.orderId);
}
return {
success: true,
message: `Payment failure processed: ${paymentIntent.id}`,
data: { paymentIntentId: paymentIntent.id },
};
}
private async handleSubscriptionCreated(
event: WebhookEvent
): Promise<EventProcessingResult> {
const subscription = event.data.object;
// Create subscription record
await this.createSubscription({
stripeSubscriptionId: subscription.id,
customerId: subscription.customer,
status: subscription.status,
currentPeriodStart: subscription.current_period_start * 1000,
currentPeriodEnd: subscription.current_period_end * 1000,
items: subscription.items.data,
});
// Send welcome email
await this.sendSubscriptionWelcome(subscription);
return {
success: true,
message: `Subscription created: ${subscription.id}`,
data: { subscriptionId: subscription.id },
};
}
// Placeholder methods for actual business logic
private async updatePaymentStatus(
orderId: string,
status: string,
metadata: any
): Promise<void> {
// Implementation would update your database
console.log(`Updating payment status: ${orderId} -> ${status}`);
}
private async sendPaymentConfirmation(paymentIntent: any): Promise<void> {
// Implementation would send email via your email service
console.log(`Sending payment confirmation for: ${paymentIntent.id}`);
}
private async updateInventory(orderId: string): Promise<void> {
// Implementation would update inventory levels
console.log(`Updating inventory for order: ${orderId}`);
}
private async sendPaymentFailureNotification(
paymentIntent: any
): Promise<void> {
// Implementation would notify customer of payment failure
console.log(`Sending failure notification for: ${paymentIntent.id}`);
}
private async releaseInventoryHold(orderId: string): Promise<void> {
// Implementation would release inventory hold
console.log(`Releasing inventory hold for order: ${orderId}`);
}
private async createSubscription(subscriptionData: any): Promise<void> {
// Implementation would create subscription record
console.log(
`Creating subscription: ${subscriptionData.stripeSubscriptionId}`
);
}
private async sendSubscriptionWelcome(subscription: any): Promise<void> {
// Implementation would send welcome email
console.log(`Sending subscription welcome: ${subscription.id}`);
}
private isRetryableError(error: any): boolean {
const retryableErrors = [
"DATABASE_CONNECTION_ERROR",
"EMAIL_SERVICE_ERROR",
"INVENTORY_SERVICE_ERROR",
];
return retryableErrors.includes(error.code) || error.status >= 500;
}
}
// Supporting classes and interfaces
interface WebhookEngineConfig {
queueConfig: any;
cacheConfig: any;
rateLimitConfig: any;
retryConfig: any;
processors: WebhookProcessorConfig[];
workerCount?: number;
}
interface WebhookProcessorConfig {
provider: string;
secretKey: string;
webhookSecret: string;
maxEvents?: number;
}
interface WebhookProcessor {
validateSignature(
signature: string,
payload: any,
headers: Record<string, string>
): Promise<boolean>;
parseEvents(
payload: any,
headers: Record<string, string>
): Promise<WebhookEvent[]>;
processEvent(
event: WebhookEvent,
webhook: IncomingWebhook
): Promise<EventProcessingResult>;
}
interface IncomingWebhook {
id: string;
provider: string;
signature: string;
payload: any;
headers: Record<string, string>;
receivedAt: number;
status: "received" | "processing" | "retrying" | "completed" | "failed";
attempts: number;
maxAttempts: number;
nextRetryAt: number | null;
lastError?: string;
}
interface WebhookEvent {
id: string;
type: string;
data: any;
createdAt: number;
metadata?: any;
}
interface EventProcessingResult {
success: boolean;
message?: string;
data?: any;
error?: string;
retryable?: boolean;
}
interface WebhookProcessingResult {
webhookId: string;
totalEvents: number;
successfulEvents: number;
failedEvents: number;
results: EventProcessingResult[];
processedAt: number;
}
interface WebhookReceiptResult {
webhookId: string;
status: "accepted" | "rejected";
message: string;
error?: string;
}
interface WebhookReplayResult {
webhookId: string;
status: "replaying" | "replay_failed";
message: string;
error?: string;
}
interface BatchReplayResult {
total: number;
successful: number;
failed: number;
results: WebhookReplayResult[];
}
interface WebhookReplayCriteria {
provider?: string;
dateRange?: { start: Date; end: Date };
errorType?: string;
limit?: number;
}
interface WebhookStatistics {
totalReceived: number;
totalProcessed: number;
totalFailed: number;
successRate: number;
averageProcessingTime: number;
byProvider: Record<string, any>;
}
interface WebhookHealthStatus {
queues: {
incoming: number;
processing: number;
deadLetter: number;
};
processingRates: any;
errorRates: any;
healthScore: number;
lastUpdated: number;
}
interface TimeRange {
start: Date;
end: Date;
}
class WebhookError extends Error {
constructor(
message: string,
public code: string,
public status?: number,
public metadata?: any
) {
super(message);
this.name = "WebhookError";
}
}
// Placeholder classes that would need full implementation
class WebhookQueue {
constructor(private name: string, private config: any) {}
async enqueue(webhook: IncomingWebhook): Promise<void> {}
async enqueueDelayed(
webhook: IncomingWebhook,
delayMs: number
): Promise<void> {}
async dequeue(timeoutMs?: number): Promise<IncomingWebhook | null> {
return null;
}
async dequeueReady(timeoutMs?: number): Promise<IncomingWebhook | null> {
return null;
}
async findById(id: string): Promise<IncomingWebhook | null> {
return null;
}
async findByCriteria(
criteria: WebhookReplayCriteria
): Promise<IncomingWebhook[]> {
return [];
}
async remove(id: string): Promise<void> {}
async getSize(): Promise<number> {
return 0;
}
}
class DeduplicationCache {
constructor(private config: any) {}
async isDuplicate(key: string): Promise<boolean> {
return false;
}
async markProcessing(key: string, webhookId: string): Promise<void> {}
async markCompleted(
key: string,
result: WebhookProcessingResult
): Promise<void> {}
}
class WebhookRateLimiter {
constructor(private config: any) {}
async checkLimit(
provider: string
): Promise<{ allowed: boolean; retryAfter?: number }> {
return { allowed: true };
}
}
class WebhookRetryManager {
constructor(private config: any) {}
}
class WebhookMetrics {
async recordReceived(provider: string): Promise<void> {}
async recordRejected(provider: string, reason: string): Promise<void> {}
async recordProcessed(
provider: string,
eventCount: number,
duration: number
): Promise<void> {}
async recordRetry(provider: string, attempt: number): Promise<void> {}
async recordFailed(provider: string, duration: number): Promise<void> {}
async recordDuplicate(provider: string): Promise<void> {}
async getStatistics(
timeRange: TimeRange,
provider?: string
): Promise<WebhookStatistics> {
return {} as WebhookStatistics;
}
async getProcessingRates(): Promise<any> {
return {};
}
async getErrorRates(): Promise<any> {
return {};
}
}
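To make the receive path concrete, here is a minimal wiring sketch, assuming an Express app and environment-variable secret names that don't appear in the engine above. Two details matter: pass the raw request body through so signature validation sees the exact bytes the provider signed, and acknowledge immediately, since processing happens asynchronously in the workers.
// Hypothetical wiring: Express, the route, and the env var names are assumptions.
import express from "express";

const engine = new WebhookProcessingEngine({
  queueConfig: {},
  cacheConfig: {},
  rateLimitConfig: {},
  retryConfig: {},
  workerCount: 10,
  processors: [
    {
      provider: "stripe",
      secretKey: process.env.STRIPE_SECRET_KEY!,
      webhookSecret: process.env.STRIPE_WEBHOOK_SECRET!,
    },
  ],
});

const app = express();

// express.raw keeps the body as a Buffer so signature validation can verify
// the exact bytes the provider sent.
app.post(
  "/webhooks/:provider",
  express.raw({ type: "application/json" }),
  async (req, res) => {
    const result = await engine.receiveWebhook(
      req.params.provider,
      req.headers["stripe-signature"] as string, // header name varies by provider
      req.body,
      req.headers as Record<string, string>
    );
    // Acknowledge fast; the engine processes the webhook asynchronously.
    res.status(result.status === "accepted" ? 202 : 400).json(result);
  }
);

app.listen(3000);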
API Rate Limiting and Retry Strategies: Intelligent Request Management
Adaptive Rate Limiting and Smart Retry System
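The manager below layers three mechanisms around every outbound call: an adaptive token-bucket rate limiter that widens or narrows based on observed behavior, a retry strategy with configurable backoff and jitter that honors Retry-After headers, and a circuit breaker that stops hammering an API that is already down.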
// Advanced rate limiting and retry system that adapts to API behavior
class IntelligentAPIManager {
private rateLimiters: Map<string, AdaptiveRateLimiter> = new Map();
private retryStrategies: Map<string, RetryStrategy> = new Map();
private circuitBreakers: Map<string, CircuitBreaker> = new Map();
private apiMetrics: APIMetricsCollector;
private healthMonitor: APIHealthMonitor;
private backoffCalculator: BackoffCalculator;
constructor(config: APIManagerConfig) {
this.apiMetrics = new APIMetricsCollector();
this.healthMonitor = new APIHealthMonitor();
this.backoffCalculator = new BackoffCalculator();
this.initializeAPIConfigs(config.apis);
this.startHealthMonitoring();
this.startRateLimitAdjustment();
}
private initializeAPIConfigs(apiConfigs: APIConfig[]): void {
for (const config of apiConfigs) {
// Initialize adaptive rate limiter
const rateLimiter = new AdaptiveRateLimiter({
name: config.name,
initialLimit: config.rateLimit.initialLimit,
windowMs: config.rateLimit.windowMs,
maxLimit:
config.rateLimit.maxLimit || config.rateLimit.initialLimit * 10,
minLimit:
config.rateLimit.minLimit ||
Math.max(1, config.rateLimit.initialLimit / 10),
adaptationFactor: config.rateLimit.adaptationFactor || 0.1,
});
// Initialize retry strategy
const retryStrategy = new RetryStrategy({
name: config.name,
maxAttempts: config.retry.maxAttempts,
baseDelayMs: config.retry.baseDelayMs,
maxDelayMs: config.retry.maxDelayMs,
backoffType: config.retry.backoffType,
jitterFactor: config.retry.jitterFactor || 0.1,
retryableStatusCodes: config.retry.retryableStatusCodes,
retryableErrors: config.retry.retryableErrors,
});
// Initialize circuit breaker
const circuitBreaker = new CircuitBreaker({
name: config.name,
failureThreshold: config.circuitBreaker.failureThreshold,
recoveryTimeoutMs: config.circuitBreaker.recoveryTimeoutMs,
halfOpenMaxCalls: config.circuitBreaker.halfOpenMaxCalls || 3,
monitoringWindowMs: config.circuitBreaker.monitoringWindowMs || 60000,
});
this.rateLimiters.set(config.name, rateLimiter);
this.retryStrategies.set(config.name, retryStrategy);
this.circuitBreakers.set(config.name, circuitBreaker);
console.log(`Initialized API management for: ${config.name}`);
}
}
// Execute API request with intelligent rate limiting and retry
async executeRequest<T>(
apiName: string,
requestConfig: APIRequestConfig
): Promise<APIExecutionResult<T>> {
const startTime = Date.now();
const requestId = this.generateRequestId();
const rateLimiter = this.rateLimiters.get(apiName);
const retryStrategy = this.retryStrategies.get(apiName);
const circuitBreaker = this.circuitBreakers.get(apiName);
if (!rateLimiter || !retryStrategy || !circuitBreaker) {
throw new Error(`API configuration not found: ${apiName}`);
}
// Check circuit breaker state
if (circuitBreaker.isOpen()) {
const error = new APIError(
`Circuit breaker open for API: ${apiName}`,
"CIRCUIT_BREAKER_OPEN",
503
);
await this.apiMetrics.recordRequest(apiName, requestId, {
status: "circuit_breaker_open",
duration: Date.now() - startTime,
error: error.code,
});
throw error;
}
let attemptNumber = 0;
let lastError: APIError | undefined;
while (attemptNumber < retryStrategy.maxAttempts) {
attemptNumber++;
try {
// Wait for rate limit availability
await this.waitForRateLimit(rateLimiter, requestId);
// Execute the actual request
const result = await this.executeActualRequest<T>(
apiName,
requestConfig,
requestId,
attemptNumber
);
// Record success metrics
await this.recordSuccessMetrics(
apiName,
requestId,
startTime,
attemptNumber
);
// Update circuit breaker
circuitBreaker.recordSuccess();
// Adjust rate limits based on success
await this.adjustRateLimitsOnSuccess(rateLimiter, result);
return {
requestId,
data: result.data,
status: result.status,
headers: result.headers,
duration: Date.now() - startTime,
attempts: attemptNumber,
apiName,
};
} catch (error) {
lastError = error as APIError;
// Record error metrics
await this.recordErrorMetrics(
apiName,
requestId,
startTime,
attemptNumber,
lastError
);
// Update circuit breaker
circuitBreaker.recordFailure();
// Adjust rate limits based on error
await this.adjustRateLimitsOnError(rateLimiter, lastError);
// Non-retryable errors fail immediately instead of burning retry attempts
if (!this.shouldRetry(retryStrategy, lastError, attemptNumber)) {
throw lastError;
}
// Calculate and wait for retry delay
if (attemptNumber < retryStrategy.maxAttempts) {
const retryDelay = await this.calculateRetryDelay(
retryStrategy,
lastError,
attemptNumber
);
console.log(
`Retrying API request: ${requestId} (attempt ${attemptNumber + 1}/${
retryStrategy.maxAttempts
}) after ${retryDelay}ms`
);
await new Promise((resolve) => setTimeout(resolve, retryDelay));
}
}
}
// All retries exhausted
throw new APIError(
`All retry attempts exhausted for API: ${apiName}`,
"MAX_RETRIES_EXCEEDED",
lastError?.status || 500,
{
originalError: lastError,
attempts: attemptNumber,
totalDuration: Date.now() - startTime,
}
);
}
private async waitForRateLimit(
rateLimiter: AdaptiveRateLimiter,
requestId: string
): Promise<void> {
const waitTime = await rateLimiter.getWaitTime();
if (waitTime > 0) {
console.log(`Rate limit wait: ${requestId} waiting ${waitTime}ms`);
await new Promise((resolve) => setTimeout(resolve, waitTime));
}
await rateLimiter.acquireToken();
}
private async executeActualRequest<T>(
apiName: string,
requestConfig: APIRequestConfig,
requestId: string,
attemptNumber: number
): Promise<RawAPIResponse<T>> {
const requestStart = Date.now();
try {
// Add retry headers
const headers = {
...requestConfig.headers,
"X-Request-ID": requestId,
"X-Retry-Attempt": attemptNumber.toString(),
};
// Execute request with timeout
const timeoutPromise = new Promise<never>((_, reject) => {
setTimeout(
() => reject(new APIError("Request timeout", "TIMEOUT", 408)),
requestConfig.timeoutMs || 30000
);
});
const requestPromise = this.makeHTTPRequest<T>({
...requestConfig,
headers,
});
const response = await Promise.race([requestPromise, timeoutPromise]);
// Check for API-specific error conditions
this.validateAPIResponse(response);
return response;
} catch (error) {
const duration = Date.now() - requestStart;
// Transform error to standardized format
throw this.transformError(error, apiName, duration);
}
}
private async calculateRetryDelay(
strategy: RetryStrategy,
error: APIError,
attemptNumber: number
): Promise<number> {
let delay: number;
// Check if API provided Retry-After header
if (error.retryAfter) {
delay = error.retryAfter * 1000; // Convert seconds to milliseconds
} else {
// Use configured backoff strategy
delay = this.backoffCalculator.calculateDelay(
strategy.backoffType,
strategy.baseDelayMs,
attemptNumber,
strategy.maxDelayMs
);
}
// Add jitter to prevent thundering herd
if (strategy.jitterFactor > 0) {
const jitter = delay * strategy.jitterFactor * (Math.random() - 0.5) * 2;
delay = Math.max(0, delay + jitter);
}
// Apply error-specific delay multipliers
delay = this.applyErrorSpecificDelay(delay, error);
return Math.min(delay, strategy.maxDelayMs);
}
private applyErrorSpecificDelay(baseDelay: number, error: APIError): number {
const multipliers: Record<string, number> = {
RATE_LIMITED: 2.0, // Wait longer for rate limit errors
SERVER_ERROR: 1.5, // Wait a bit longer for server errors
TIMEOUT: 1.2, // Slightly longer for timeouts
CONNECTION_ERROR: 2.0, // Much longer for connection issues
};
const multiplier = multipliers[error.code] || 1.0;
return baseDelay * multiplier;
}
private async adjustRateLimitsOnSuccess(
rateLimiter: AdaptiveRateLimiter,
response: RawAPIResponse<any>
): Promise<void> {
// Check for rate limit headers
const remaining = this.parseRateLimitHeader(response.headers, "remaining");
const limit = this.parseRateLimitHeader(response.headers, "limit");
const resetTime = this.parseRateLimitHeader(response.headers, "reset");
if (remaining !== null && limit !== null) {
const usagePercent = (limit - remaining) / limit;
// If we're using less than 70% of available rate limit, we can increase
if (usagePercent < 0.7) {
await rateLimiter.adjustLimit("increase", {
reason: "low_usage",
currentUsage: usagePercent,
responseTime: response.duration || 0,
});
}
}
// If response was fast, we might be able to increase rate limit
const responseTime = response.duration || 0;
if (responseTime < 1000) {
// Less than 1 second
await rateLimiter.adjustLimit("increase", {
reason: "fast_response",
responseTime,
});
}
}
private async adjustRateLimitsOnError(
rateLimiter: AdaptiveRateLimiter,
error: APIError
): Promise<void> {
if (error.code === "RATE_LIMITED") {
// Decrease rate limit aggressively on rate limit errors
await rateLimiter.adjustLimit("decrease", {
reason: "rate_limited",
severity: "high",
retryAfter: error.retryAfter,
});
} else if (error.status && error.status >= 500) {
// Decrease rate limit moderately on server errors
await rateLimiter.adjustLimit("decrease", {
reason: "server_error",
severity: "medium",
status: error.status,
});
} else if (error.code === "TIMEOUT") {
// Decrease rate limit on timeouts
await rateLimiter.adjustLimit("decrease", {
reason: "timeout",
severity: "medium",
});
}
}
private parseRateLimitHeader(
headers: Record<string, any>,
type: "remaining" | "limit" | "reset"
): number | null {
// Common rate limit header patterns
const headerVariations: Record<string, string[]> = {
remaining: [
"x-ratelimit-remaining",
"x-rate-limit-remaining",
"ratelimit-remaining",
"rate-limit-remaining",
],
limit: [
"x-ratelimit-limit",
"x-rate-limit-limit",
"ratelimit-limit",
"rate-limit-limit",
],
reset: [
"x-ratelimit-reset",
"x-rate-limit-reset",
"ratelimit-reset",
"rate-limit-reset",
],
};
for (const headerName of headerVariations[type]) {
const value = headers[headerName] || headers[headerName.toLowerCase()];
if (value !== undefined) {
const parsed = parseInt(value, 10);
return isNaN(parsed) ? null : parsed;
}
}
return null;
}
// API health monitoring and adaptive adjustment
private startHealthMonitoring(): void {
setInterval(async () => {
for (const [apiName, rateLimiter] of this.rateLimiters) {
try {
const metrics = await this.apiMetrics.getRecentMetrics(
apiName,
300000
); // Last 5 minutes
const healthScore = this.calculateAPIHealthScore(metrics);
await this.healthMonitor.recordHealth(apiName, healthScore, metrics);
// Auto-adjust based on health
if (healthScore < 0.7) {
console.warn(`API health degraded for ${apiName}: ${healthScore}`);
await rateLimiter.adjustLimit("decrease", {
reason: "poor_health",
healthScore,
});
} else if (healthScore > 0.9) {
await rateLimiter.adjustLimit("increase", {
reason: "good_health",
healthScore,
});
}
} catch (error) {
console.error(`Health monitoring error for ${apiName}:`, error);
}
}
}, 60000); // Every minute
}
private startRateLimitAdjustment(): void {
setInterval(async () => {
for (const [apiName, rateLimiter] of this.rateLimiters) {
try {
const stats = await rateLimiter.getStatistics();
// Auto-adjust based on token utilization
if (stats.utilizationRate < 0.5) {
// Low utilization, can increase
await rateLimiter.adjustLimit("increase", {
reason: "low_utilization",
utilizationRate: stats.utilizationRate,
});
} else if (stats.utilizationRate > 0.9) {
// High utilization, might need to decrease
const recentErrors = await this.apiMetrics.getErrorRate(
apiName,
60000
);
if (recentErrors > 0.1) {
await rateLimiter.adjustLimit("decrease", {
reason: "high_utilization_with_errors",
utilizationRate: stats.utilizationRate,
errorRate: recentErrors,
});
}
}
} catch (error) {
console.error(`Rate limit adjustment error for ${apiName}:`, error);
}
}
}, 30000); // Every 30 seconds
}
private calculateAPIHealthScore(metrics: APIMetrics): number {
let score = 1.0;
// Factor in error rate
if (metrics.errorRate > 0.1) score *= 0.5; // 10% error rate
else if (metrics.errorRate > 0.05) score *= 0.7; // 5% error rate
else if (metrics.errorRate > 0.01) score *= 0.9; // 1% error rate
// Factor in response time
if (metrics.avgResponseTime > 10000) score *= 0.3; // 10+ seconds
else if (metrics.avgResponseTime > 5000) score *= 0.6; // 5+ seconds
else if (metrics.avgResponseTime > 2000) score *= 0.8; // 2+ seconds
// Factor in timeout rate
if (metrics.timeoutRate > 0.05) score *= 0.4; // 5% timeout rate
else if (metrics.timeoutRate > 0.01) score *= 0.8; // 1% timeout rate
return Math.max(0, Math.min(1, score));
}
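// Worked example (illustrative): errorRate=0.03 (x0.9), avgResponseTime=2500ms
// (x0.8), timeoutRate=0.02 (x0.8) yields a score of 1.0 * 0.9 * 0.8 * 0.8 = 0.576.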
// Batch request processing with intelligent concurrency
async executeBatchRequests<T>(
apiName: string,
requests: APIRequestConfig[],
options: BatchRequestOptions = {}
): Promise<BatchExecutionResult<T>> {
const batchId = this.generateRequestId();
const batchStart = Date.now();
const concurrency = options.concurrency || 5;
const failFast = options.failFast || false;
console.log(
`Executing batch requests: ${batchId} (${requests.length} requests, concurrency: ${concurrency})`
);
const results: APIExecutionResult<T>[] = [];
const errors: BatchRequestError[] = [];
const executing: Promise<any>[] = [];
for (let i = 0; i < requests.length; i++) {
const request = requests[i];
const promise = this.executeRequest<T>(apiName, {
...request,
headers: {
...request.headers,
"X-Batch-ID": batchId,
"X-Batch-Index": i.toString(),
},
}).then(
(result) => {
results.push(result);
return result;
},
(error) => {
const batchError: BatchRequestError = {
index: i,
request,
error,
};
errors.push(batchError);
if (failFast) {
throw error;
}
return batchError;
}
);
// Track the in-flight promise and remove it from the pool when it settles
// (removing the most recently added promise eagerly, as a naive version
// might, would break the concurrency control)
const tracked: Promise<any> = promise.finally(() => {
const idx = executing.indexOf(tracked);
if (idx !== -1) executing.splice(idx, 1);
});
executing.push(tracked);
// Control concurrency: once the pool is full, wait for any request to settle
if (executing.length >= concurrency) {
await Promise.race(executing);
}
}
// Wait for all remaining requests
if (failFast) {
await Promise.all(executing);
} else {
await Promise.allSettled(executing);
}
return {
batchId,
totalRequests: requests.length,
successfulRequests: results.length,
failedRequests: errors.length,
results,
errors,
duration: Date.now() - batchStart,
};
}
// Get API statistics and health
async getAPIStatistics(
apiName: string,
timeRange?: TimeRange
): Promise<APIStatistics> {
const rateLimiter = this.rateLimiters.get(apiName);
const circuitBreaker = this.circuitBreakers.get(apiName);
if (!rateLimiter || !circuitBreaker) {
throw new Error(`API configuration not found: ${apiName}`);
}
const metrics = await this.apiMetrics.getMetrics(apiName, timeRange);
const rateLimitStats = await rateLimiter.getStatistics();
const circuitBreakerState = circuitBreaker.getState();
const healthScore = this.calculateAPIHealthScore(metrics);
return {
apiName,
healthScore,
metrics,
rateLimiting: rateLimitStats,
circuitBreaker: circuitBreakerState,
timeRange: timeRange || {
start: new Date(Date.now() - 3600000), // Last hour
end: new Date(),
},
};
}
// Utility methods
private shouldRetry(
strategy: RetryStrategy,
error: APIError,
attemptNumber: number
): boolean {
if (attemptNumber >= strategy.maxAttempts) {
return false;
}
// Check non-retryable status codes
if (
error.status &&
strategy.nonRetryableStatusCodes?.includes(error.status)
) {
return false;
}
// Check retryable status codes
if (error.status && strategy.retryableStatusCodes?.includes(error.status)) {
return true;
}
// Check retryable error codes
if (strategy.retryableErrors?.includes(error.code)) {
return true;
}
// Default retryable conditions
return error.status
? error.status >= 500
: ["TIMEOUT", "CONNECTION_ERROR"].includes(error.code);
}
private validateAPIResponse(response: RawAPIResponse<any>): void {
// Basic response validation
if (!response.status || response.status < 200 || response.status >= 400) {
throw new APIError(
`HTTP error: ${response.status}`,
"HTTP_ERROR",
response.status
);
}
}
private transformError(
error: any,
apiName: string,
duration: number
): APIError {
if (error instanceof APIError) {
return error;
}
// Transform common error types
if (error.code === "ENOTFOUND" || error.code === "ECONNREFUSED") {
return new APIError("Connection error", "CONNECTION_ERROR", 0, {
originalError: error,
apiName,
duration,
});
}
if (error.code === "ETIMEDOUT") {
return new APIError("Request timeout", "TIMEOUT", 408, {
originalError: error,
apiName,
duration,
});
}
// Handle HTTP errors
if (error.response) {
const status = error.response.status || 500;
const retryAfter = this.parseRetryAfterHeader(error.response.headers);
return new APIError(
error.response.statusText || error.message,
status === 429 ? "RATE_LIMITED" : "HTTP_ERROR",
status,
{
originalError: error,
apiName,
duration,
retryAfter,
responseData: error.response.data,
}
);
}
return new APIError(
error.message || "Unknown API error",
"UNKNOWN_ERROR",
500,
{ originalError: error, apiName, duration }
);
}
private parseRetryAfterHeader(headers: Record<string, any>): number | null {
const retryAfter = headers["retry-after"] || headers["Retry-After"];
if (!retryAfter) return null;
// Retry-After may also be an HTTP-date; this sketch only handles the
// delta-seconds form and treats anything else as absent.
const parsed = parseInt(retryAfter, 10);
return isNaN(parsed) ? null : parsed;
}
private async makeHTTPRequest<T>(
config: APIRequestConfig
): Promise<RawAPIResponse<T>> {
// This would use your HTTP client (axios, fetch, etc.)
// Placeholder implementation
throw new Error("HTTP request implementation needed");
}
private async recordSuccessMetrics(
apiName: string,
requestId: string,
startTime: number,
attempts: number
): Promise<void> {
await this.apiMetrics.recordRequest(apiName, requestId, {
status: "success",
duration: Date.now() - startTime,
attempts,
});
}
private async recordErrorMetrics(
apiName: string,
requestId: string,
startTime: number,
attempts: number,
error: APIError
): Promise<void> {
await this.apiMetrics.recordRequest(apiName, requestId, {
status: "error",
duration: Date.now() - startTime,
attempts,
error: error.code,
httpStatus: error.status,
});
}
private generateRequestId(): string {
return `req_${Date.now()}_${Math.random().toString(36).slice(2, 11)}`;
}
}
// Supporting classes for rate limiting and retry logic
class AdaptiveRateLimiter {
private tokens: number;
private lastRefillTime: number;
private currentLimit: number;
private statistics: RateLimitStatistics;
constructor(private config: RateLimiterConfig) {
this.tokens = config.initialLimit;
this.currentLimit = config.initialLimit;
this.lastRefillTime = Date.now();
this.statistics = {
tokensConsumed: 0,
tokensRefilled: 0,
waitTime: 0,
adjustments: 0,
utilizationRate: 0,
};
}
async acquireToken(): Promise<void> {
this.refillTokens();
if (this.tokens <= 0) {
throw new APIError("No tokens available", "RATE_LIMITED", 429);
}
this.tokens--;
this.statistics.tokensConsumed++;
}
async getWaitTime(): Promise<number> {
this.refillTokens();
if (this.tokens > 0) {
return 0;
}
// Calculate time until next token is available
const timeSinceRefill = Date.now() - this.lastRefillTime;
const timeUntilRefill =
this.config.windowMs - (timeSinceRefill % this.config.windowMs);
return timeUntilRefill;
}
async adjustLimit(
direction: "increase" | "decrease",
context: RateLimitAdjustmentContext
): Promise<void> {
const factor = this.config.adaptationFactor;
let adjustment: number;
if (direction === "increase") {
adjustment = Math.ceil(this.currentLimit * factor);
this.currentLimit = Math.min(
this.config.maxLimit,
this.currentLimit + adjustment
);
} else {
adjustment = Math.ceil(this.currentLimit * factor);
this.currentLimit = Math.max(
this.config.minLimit,
this.currentLimit - adjustment
);
}
console.log(
`Rate limit adjusted for ${this.config.name}: ${this.currentLimit} (${direction} by ${adjustment}, reason: ${context.reason})`
);
this.statistics.adjustments++;
}
private refillTokens(): void {
const now = Date.now();
const timePassed = now - this.lastRefillTime;
if (timePassed >= this.config.windowMs) {
const periods = Math.floor(timePassed / this.config.windowMs);
const tokensToAdd = periods * this.currentLimit;
this.tokens = Math.min(this.currentLimit, this.tokens + tokensToAdd);
this.lastRefillTime = now;
this.statistics.tokensRefilled += tokensToAdd;
}
// Update utilization rate
this.statistics.utilizationRate =
(this.currentLimit - this.tokens) / this.currentLimit;
}
async getStatistics(): Promise<RateLimitStatistics> {
return { ...this.statistics };
}
}
class RetryStrategy {
constructor(public config: RetryStrategyConfig) {}
get maxAttempts(): number {
return this.config.maxAttempts;
}
get baseDelayMs(): number {
return this.config.baseDelayMs;
}
get maxDelayMs(): number {
return this.config.maxDelayMs;
}
get backoffType(): string {
return this.config.backoffType;
}
get jitterFactor(): number {
return this.config.jitterFactor;
}
get retryableStatusCodes(): number[] | undefined {
return this.config.retryableStatusCodes;
}
get nonRetryableStatusCodes(): number[] | undefined {
return this.config.nonRetryableStatusCodes;
}
get retryableErrors(): string[] | undefined {
return this.config.retryableErrors;
}
}
class CircuitBreaker {
private failures = 0;
private successes = 0;
private lastFailureTime = 0;
private state: "CLOSED" | "OPEN" | "HALF_OPEN" = "CLOSED";
private halfOpenCalls = 0;
constructor(private config: CircuitBreakerConfig) {}
isOpen(): boolean {
if (this.state === "OPEN") {
if (Date.now() - this.lastFailureTime > this.config.recoveryTimeoutMs) {
this.state = "HALF_OPEN";
this.halfOpenCalls = 0;
return false;
}
return true;
}
if (this.state === "HALF_OPEN") {
return this.halfOpenCalls >= this.config.halfOpenMaxCalls;
}
return false;
}
recordSuccess(): void {
this.successes++;
if (this.state === "HALF_OPEN") {
this.halfOpenCalls++;
if (this.halfOpenCalls >= this.config.halfOpenMaxCalls) {
this.state = "CLOSED";
this.failures = 0;
}
} else if (this.state === "CLOSED") {
// Reset failure count on success
this.failures = 0;
}
}
recordFailure(): void {
this.failures++;
this.lastFailureTime = Date.now();
if (this.state === "HALF_OPEN") {
this.state = "OPEN";
} else if (this.shouldTrip()) {
this.state = "OPEN";
}
}
private shouldTrip(): boolean {
// Sketch: failure rate over all recorded calls; a production breaker would
// only count calls within config.monitoringWindowMs
const totalCalls = this.failures + this.successes;
if (totalCalls === 0) return false;
const failureRate = this.failures / totalCalls;
return failureRate >= this.config.failureThreshold;
}
getState(): CircuitBreakerState {
return {
state: this.state,
failures: this.failures,
successes: this.successes,
failureRate:
this.failures + this.successes === 0
? 0
: this.failures / (this.failures + this.successes),
lastFailureTime: this.lastFailureTime,
};
}
}
class BackoffCalculator {
calculateDelay(
backoffType: string,
baseDelayMs: number,
attemptNumber: number,
maxDelayMs: number
): number {
let delay: number;
switch (backoffType) {
case "linear":
delay = baseDelayMs * attemptNumber;
break;
case "exponential":
delay = baseDelayMs * Math.pow(2, attemptNumber - 1);
break;
case "polynomial":
delay = baseDelayMs * Math.pow(attemptNumber, 2);
break;
case "fibonacci":
delay = baseDelayMs * this.fibonacci(attemptNumber);
break;
default:
delay = baseDelayMs;
}
return Math.min(delay, maxDelayMs);
}
private fibonacci(n: number): number {
if (n <= 1) return 1;
let a = 1,
b = 1;
for (let i = 2; i < n; i++) {
[a, b] = [b, a + b];
}
return b;
}
}
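// A quick sanity check of the delay curves produced above (illustrative
// values: base delay 1s, cap 30s):
const backoffDemo = new BackoffCalculator();
for (const curve of ["linear", "exponential", "fibonacci"]) {
const delays = [1, 2, 3, 4, 5].map((attempt) =>
backoffDemo.calculateDelay(curve, 1000, attempt, 30000)
);
console.log(curve, delays);
// linear:      [1000, 2000, 3000, 4000, 5000]
// exponential: [1000, 2000, 4000, 8000, 16000]
// fibonacci:   [1000, 1000, 2000, 3000, 5000]
}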
// Error class for API operations
class APIError extends Error {
constructor(
message: string,
public code: string,
public status?: number,
public metadata?: any
) {
super(message);
this.name = "APIError";
}
get retryAfter(): number | null {
return this.metadata?.retryAfter || null;
}
}
// Supporting interfaces and types
interface APIManagerConfig {
apis: APIConfig[];
}
interface APIConfig {
name: string;
baseUrl: string;
rateLimit: {
initialLimit: number;
windowMs: number;
maxLimit?: number;
minLimit?: number;
adaptationFactor?: number;
};
retry: {
maxAttempts: number;
baseDelayMs: number;
maxDelayMs: number;
backoffType: string;
jitterFactor?: number;
retryableStatusCodes?: number[];
nonRetryableStatusCodes?: number[];
retryableErrors?: string[];
};
circuitBreaker: {
failureThreshold: number;
recoveryTimeoutMs: number;
halfOpenMaxCalls?: number;
monitoringWindowMs?: number;
};
}
interface RateLimiterConfig {
name: string;
initialLimit: number;
windowMs: number;
maxLimit: number;
minLimit: number;
adaptationFactor: number;
}
interface RetryStrategyConfig {
name: string;
maxAttempts: number;
baseDelayMs: number;
maxDelayMs: number;
backoffType: string;
jitterFactor: number;
retryableStatusCodes?: number[];
nonRetryableStatusCodes?: number[];
retryableErrors?: string[];
}
interface CircuitBreakerConfig {
name: string;
failureThreshold: number;
recoveryTimeoutMs: number;
halfOpenMaxCalls: number;
monitoringWindowMs: number;
}
interface APIRequestConfig {
method: string;
url: string;
headers?: Record<string, string>;
data?: any;
params?: Record<string, any>;
timeoutMs?: number;
}
interface APIExecutionResult<T> {
requestId: string;
data: T;
status: number;
headers: Record<string, any>;
duration: number;
attempts: number;
apiName: string;
}
interface RawAPIResponse<T> {
data: T;
status: number;
headers: Record<string, any>;
duration?: number;
}
interface BatchRequestOptions {
concurrency?: number;
failFast?: boolean;
}
interface BatchExecutionResult<T> {
batchId: string;
totalRequests: number;
successfulRequests: number;
failedRequests: number;
results: APIExecutionResult<T>[];
errors: BatchRequestError[];
duration: number;
}
interface BatchRequestError {
index: number;
request: APIRequestConfig;
error: APIError;
}
interface RateLimitStatistics {
tokensConsumed: number;
tokensRefilled: number;
waitTime: number;
adjustments: number;
utilizationRate: number;
}
interface RateLimitAdjustmentContext {
reason: string;
severity?: "low" | "medium" | "high";
healthScore?: number;
utilizationRate?: number;
errorRate?: number;
responseTime?: number;
retryAfter?: number;
status?: number;
currentUsage?: number;
}
interface CircuitBreakerState {
state: "CLOSED" | "OPEN" | "HALF_OPEN";
failures: number;
successes: number;
failureRate: number;
lastFailureTime: number;
}
interface APIMetrics {
errorRate: number;
avgResponseTime: number;
timeoutRate: number;
totalRequests: number;
successfulRequests: number;
failedRequests: number;
}
interface APIStatistics {
apiName: string;
healthScore: number;
metrics: APIMetrics;
rateLimiting: RateLimitStatistics;
circuitBreaker: CircuitBreakerState;
timeRange: TimeRange;
}
// Placeholder classes that would need full implementation
class APIMetricsCollector {
async recordRequest(
apiName: string,
requestId: string,
data: any
): Promise<void> {}
async getRecentMetrics(
apiName: string,
timeWindowMs: number
): Promise<APIMetrics> {
return {} as APIMetrics;
}
async getErrorRate(apiName: string, timeWindowMs: number): Promise<number> {
return 0;
}
async getMetrics(
apiName: string,
timeRange?: TimeRange
): Promise<APIMetrics> {
return {} as APIMetrics;
}
}
class APIHealthMonitor {
async recordHealth(
apiName: string,
healthScore: number,
metrics: APIMetrics
): Promise<void> {}
}
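As a usage sketch, here is how the manager might be configured and called once makeHTTPRequest is wired to a real HTTP client. The API name, endpoint, limits, and thresholds below are illustrative assumptions, and the call is assumed to run inside an async function.
// Illustrative configuration for a single downstream API.
const apiManager = new IntelligentAPIManager({
  apis: [
    {
      name: "inventory",
      baseUrl: "https://inventory.example.com",
      rateLimit: { initialLimit: 100, windowMs: 60000 },
      retry: {
        maxAttempts: 4,
        baseDelayMs: 500,
        maxDelayMs: 30000,
        backoffType: "exponential",
        retryableStatusCodes: [429, 502, 503, 504],
        nonRetryableStatusCodes: [400, 401, 403, 404],
      },
      circuitBreaker: { failureThreshold: 0.5, recoveryTimeoutMs: 30000 },
    },
  ],
});

// Every call flows through the rate limiter, circuit breaker, and retry logic.
const result = await apiManager.executeRequest<{ sku: string; stock: number }>(
  "inventory",
  { method: "GET", url: "/v1/stock/ABC-123", timeoutMs: 5000 }
);
console.log(`Fetched stock in ${result.attempts} attempt(s):`, result.data.stock);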
Service Reliability and Fallbacks: Graceful Degradation Strategies
Building Resilient Service Integration Architecture
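The final piece treats fallbacks as first-class citizens: every service declares its criticality, dependencies, timeouts, and a fallback strategy, and every call is routed through an availability check so that a degraded dependency fails over before it fails loudly.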
// Comprehensive service reliability system with intelligent fallbacks
class ServiceReliabilityManager {
private services: Map<string, ServiceDefinition> = new Map();
private fallbackStrategies: Map<string, FallbackStrategy> = new Map();
private healthCheckers: Map<string, HealthChecker> = new Map();
private serviceStates: Map<string, ServiceState> = new Map();
private dependencyGraph: ServiceDependencyGraph;
private reliabilityMetrics: ReliabilityMetrics;
private alertManager: ServiceAlertManager;
constructor(config: ServiceReliabilityConfig) {
this.dependencyGraph = new ServiceDependencyGraph();
this.reliabilityMetrics = new ReliabilityMetrics();
this.alertManager = new ServiceAlertManager(config.alerting);
this.initializeServices(config.services);
this.startHealthMonitoring();
this.startReliabilityAnalysis();
}
private initializeServices(serviceConfigs: ServiceConfig[]): void {
for (const config of serviceConfigs) {
// Create service definition
const service: ServiceDefinition = {
name: config.name,
type: config.type,
endpoint: config.endpoint,
criticality: config.criticality || "medium",
dependencies: config.dependencies || [],
healthCheck: config.healthCheck,
sla: config.sla,
timeouts: {
connection: config.timeouts?.connection || 5000,
request: config.timeouts?.request || 30000,
healthCheck: config.timeouts?.healthCheck || 10000,
},
};
// Create fallback strategy
const fallbackStrategy = this.createFallbackStrategy(config);
// Create health checker
const healthChecker = new HealthChecker(service, config.healthCheck);
// Initialize service state
const serviceState: ServiceState = {
status: "unknown",
health: 0,
lastHealthCheck: 0,
consecutiveFailures: 0,
lastError: null,
responseTime: 0,
availability: 1.0,
errorRate: 0,
lastSuccessful: Date.now(),
};
this.services.set(config.name, service);
this.fallbackStrategies.set(config.name, fallbackStrategy);
this.healthCheckers.set(config.name, healthChecker);
this.serviceStates.set(config.name, serviceState);
// Build dependency graph
this.dependencyGraph.addService(config.name, config.dependencies || []);
console.log(`Initialized service reliability for: ${config.name}`);
}
}
// Execute service call with automatic fallback
async executeServiceCall<T>(
serviceName: string,
operation: ServiceOperation<T>,
options: ServiceCallOptions = {}
): Promise<ServiceCallResult<T>> {
const callId = this.generateCallId();
const startTime = Date.now();
try {
// Get service configuration and state
const service = this.services.get(serviceName);
const serviceState = this.serviceStates.get(serviceName);
const fallbackStrategy = this.fallbackStrategies.get(serviceName);
if (!service || !serviceState || !fallbackStrategy) {
throw new ServiceError(
`Service not found: ${serviceName}`,
"SERVICE_NOT_FOUND"
);
}
// Check if service is available
const availabilityCheck = await this.checkServiceAvailability(
serviceName,
service,
serviceState
);
if (!availabilityCheck.available) {
// Service unavailable, try fallback immediately
console.warn(
`Service ${serviceName} unavailable: ${availabilityCheck.reason}`
);
return await this.executeFallback(
serviceName,
operation,
callId,
availabilityCheck.reason
);
}
// Execute primary service call
try {
const result = await this.executePrimaryCall(
service,
operation,
callId,
options
);
// Record successful call
await this.recordSuccessfulCall(serviceName, serviceState, startTime, callId);
return {
callId,
serviceName,
data: result,
source: "primary",
duration: Date.now() - startTime,
fallbackUsed: false,
degradedMode: false,
};
} catch (primaryError) {
console.warn(
`Primary service call failed: ${serviceName}`,
primaryError.message
);
// Record failure
await this.recordFailedCall(
serviceName,
serviceState,
startTime,
primaryError,
callId
);
// Determine if we should attempt fallback
if (this.shouldAttemptFallback(primaryError, service, options)) {
return await this.executeFallback(
serviceName,
operation,
callId,
primaryError.message,
primaryError
);
}
throw primaryError;
}
} catch (error) {
// Log final failure
await this.reliabilityMetrics.recordCall(serviceName, {
callId,
duration: Date.now() - startTime,
success: false,
error: error.message,
fallbackUsed: false,
});
throw error;
}
}
private async executePrimaryCall<T>(
service: ServiceDefinition,
operation: ServiceOperation<T>,
callId: string,
options: ServiceCallOptions
): Promise<T> {
const timeoutMs = options.timeout || service.timeouts.request;
// Create timeout promise; keep the handle so the timer can be cleared once the race settles
let timeoutHandle: ReturnType<typeof setTimeout> | undefined;
const timeoutPromise = new Promise<never>((_, reject) => {
timeoutHandle = setTimeout(
() => reject(new ServiceError("Service call timeout", "TIMEOUT")),
timeoutMs
);
});
// Execute operation with timeout
const operationPromise = operation.execute(service, {
callId,
timeout: timeoutMs,
retries: options.retries || 0,
});
// Always clear the timer so it can't leak or keep the process alive
try {
return await Promise.race([operationPromise, timeoutPromise]);
} finally {
clearTimeout(timeoutHandle);
}
}
private async executeFallback<T>(
serviceName: string,
operation: ServiceOperation<T>,
callId: string,
reason: string,
primaryError?: Error
): Promise<ServiceCallResult<T>> {
const fallbackStrategy = this.fallbackStrategies.get(serviceName)!;
const startTime = Date.now();
try {
console.log(`Executing fallback for ${serviceName}: ${reason}`);
const fallbackResult = await fallbackStrategy.execute(
operation,
callId,
reason,
primaryError
);
// Record successful fallback
await this.reliabilityMetrics.recordCall(serviceName, {
callId,
duration: Date.now() - startTime,
success: true,
fallbackUsed: true,
fallbackType: fallbackStrategy.type,
});
return {
callId,
serviceName,
data: fallbackResult.data,
source: fallbackResult.source,
duration: Date.now() - startTime,
fallbackUsed: true,
degradedMode: fallbackResult.degraded || false,
fallbackReason: reason,
};
} catch (fallbackError) {
console.error(
`Fallback failed for ${serviceName}: ${fallbackError.message}`
);
// Record fallback failure
await this.reliabilityMetrics.recordCall(serviceName, {
callId,
duration: Date.now() - startTime,
success: false,
fallbackUsed: true,
fallbackType: fallbackStrategy.type,
error: fallbackError.message,
});
// If fallback fails, throw the original error
throw primaryError || fallbackError;
}
}
private createFallbackStrategy(config: ServiceConfig): FallbackStrategy {
const fallbackConfig = config.fallback;
switch (fallbackConfig.type) {
case "cache":
return new CacheFallbackStrategy(fallbackConfig);
case "default_value":
return new DefaultValueFallbackStrategy(fallbackConfig);
case "alternative_service":
return new AlternativeServiceFallbackStrategy(fallbackConfig, this);
case "degraded_response":
return new DegradedResponseFallbackStrategy(fallbackConfig);
case "composite":
return new CompositeFallbackStrategy(fallbackConfig, this);
default:
return new NoOpFallbackStrategy();
}
}
private async checkServiceAvailability(
serviceName: string,
service: ServiceDefinition,
state: ServiceState
): Promise<AvailabilityCheckResult> {
// Check if service is in maintenance mode
if (state.status === "maintenance") {
return {
available: false,
reason: "Service in maintenance mode",
canFallback: true,
};
}
// Check if service has too many consecutive failures
if (state.consecutiveFailures >= 5) {
return {
available: false,
reason: `Too many consecutive failures: ${state.consecutiveFailures}`,
canFallback: true,
};
}
// Check if service health is too low
if (state.health < 0.3) {
return {
available: false,
reason: `Service health too low: ${state.health}`,
canFallback: true,
};
}
// Check SLA compliance
if (service.sla && state.availability < service.sla.minAvailability) {
return {
available: false,
reason: `SLA violation - availability: ${state.availability}`,
canFallback: true,
};
}
// Check response time SLA
if (service.sla && state.responseTime > service.sla.maxResponseTimeMs) {
return {
available: false,
reason: `SLA violation - response time: ${state.responseTime}ms`,
canFallback: true,
};
}
return {
available: true,
reason: "Service is available",
canFallback: false,
};
}
private shouldAttemptFallback(
error: Error,
service: ServiceDefinition,
options: ServiceCallOptions
): boolean {
// Don't fallback if explicitly disabled
if (options.disableFallback) {
return false;
}
// Always fallback for critical services
if (service.criticality === "critical") {
return true;
}
// Fallback for specific error types
const fallbackErrors = [
"TIMEOUT",
"SERVICE_UNAVAILABLE",
"CONNECTION_ERROR",
"RATE_LIMITED",
];
if (error instanceof ServiceError && fallbackErrors.includes(error.code)) {
return true;
}
// Don't fallback for client errors (4xx)
if (
error instanceof ServiceError &&
error.status &&
error.status >= 400 &&
error.status < 500
) {
return false;
}
// Fallback for server errors (5xx)
return true;
}
// Service dependency analysis
async analyzeDependencyImpact(
serviceName: string
): Promise<DependencyImpactAnalysis> {
const dependents = this.dependencyGraph.getDependents(serviceName);
const dependencies = this.dependencyGraph.getDependencies(serviceName);
const impact: ServiceImpact[] = [];
// Analyze impact on dependent services
for (const dependent of dependents) {
const dependentState = this.serviceStates.get(dependent);
const dependentService = this.services.get(dependent);
if (dependentState && dependentService) {
const impactLevel = this.calculateImpactLevel(
serviceName,
dependent,
dependentService
);
impact.push({
serviceName: dependent,
impactLevel,
mitigationStrategies: await this.getMitigationStrategies(
dependent,
serviceName
),
});
}
}
return {
serviceName,
directDependents: dependents,
transitiveDependents:
this.dependencyGraph.getTransitiveDependents(serviceName),
dependencies,
impact,
cascadeRisk: this.calculateCascadeRisk(impact),
};
}
private calculateImpactLevel(
failedService: string,
dependentService: string,
serviceDefinition: ServiceDefinition
): "low" | "medium" | "high" | "critical" {
// If the failed service isn't actually a dependency, the impact is low
if (!serviceDefinition.dependencies.includes(failedService)) {
return "low";
}
// Impact based on service criticality
if (serviceDefinition.criticality === "critical") {
return "critical";
} else if (serviceDefinition.criticality === "high") {
return "high";
}
// Check if fallback is available
const fallbackStrategy = this.fallbackStrategies.get(dependentService);
if (fallbackStrategy && fallbackStrategy.type !== "none") {
return "medium"; // Impact reduced by fallback
}
return "high";
}
private calculateCascadeRisk(
impact: ServiceImpact[]
): "low" | "medium" | "high" {
const criticalImpacts = impact.filter(
(i) => i.impactLevel === "critical"
).length;
const highImpacts = impact.filter((i) => i.impactLevel === "high").length;
if (criticalImpacts > 2 || (criticalImpacts > 0 && highImpacts > 3)) {
return "high";
} else if (criticalImpacts > 0 || highImpacts > 2) {
return "medium";
}
return "low";
}
// Start health monitoring
private startHealthMonitoring(): void {
setInterval(async () => {
const healthChecks = Array.from(this.services.keys()).map(
async (serviceName) => {
try {
await this.performHealthCheck(serviceName);
} catch (error) {
console.error(`Health check error for ${serviceName}:`, error);
}
}
);
await Promise.allSettled(healthChecks);
}, 30000); // Every 30 seconds
}
private async performHealthCheck(serviceName: string): Promise<void> {
const service = this.services.get(serviceName)!;
const healthChecker = this.healthCheckers.get(serviceName)!;
const state = this.serviceStates.get(serviceName)!;
const startTime = Date.now();
try {
const healthResult = await healthChecker.check();
const duration = Date.now() - startTime;
// Update service state
state.status = healthResult.healthy ? "healthy" : "unhealthy";
state.health = healthResult.score;
state.lastHealthCheck = Date.now();
state.responseTime = duration;
if (healthResult.healthy) {
state.consecutiveFailures = 0;
state.lastSuccessful = Date.now();
} else {
state.consecutiveFailures++;
state.lastError = healthResult.error || "Health check failed";
}
// Record health metrics
await this.reliabilityMetrics.recordHealthCheck(serviceName, {
healthy: healthResult.healthy,
score: healthResult.score,
duration,
error: healthResult.error,
});
// Send alerts if needed
await this.checkHealthAlerts(serviceName, service, state, healthResult);
} catch (error) {
const duration = Date.now() - startTime;
state.status = "unhealthy";
state.health = 0;
state.lastHealthCheck = Date.now();
state.consecutiveFailures++;
state.lastError = error.message;
console.error(`Health check failed for ${serviceName}:`, error);
}
}
private async checkHealthAlerts(
serviceName: string,
service: ServiceDefinition,
state: ServiceState,
healthResult: HealthCheckResult
): Promise<void> {
// Alert on service down
if (!healthResult.healthy && state.consecutiveFailures === 1) {
await this.alertManager.sendAlert({
type: "service_down",
serviceName,
severity: service.criticality === "critical" ? "critical" : "warning",
message: `Service ${serviceName} is down: ${healthResult.error}`,
metadata: { state, healthResult },
});
}
// Alert on prolonged degradation
if (state.consecutiveFailures >= 5) {
await this.alertManager.sendAlert({
type: "service_degraded",
serviceName,
severity: "critical",
message: `Service ${serviceName} has been unhealthy for ${state.consecutiveFailures} consecutive checks`,
metadata: { state },
});
}
// Alert on SLA violations
if (service.sla) {
if (state.availability < service.sla.minAvailability) {
await this.alertManager.sendAlert({
type: "sla_violation",
serviceName,
severity: "warning",
message: `Service ${serviceName} availability (${state.availability}) below SLA (${service.sla.minAvailability})`,
metadata: { state, sla: service.sla },
});
}
if (state.responseTime > service.sla.maxResponseTimeMs) {
await this.alertManager.sendAlert({
type: "sla_violation",
serviceName,
severity: "warning",
message: `Service ${serviceName} response time (${state.responseTime}ms) exceeds SLA (${service.sla.maxResponseTimeMs}ms)`,
metadata: { state, sla: service.sla },
});
}
}
}
// Start reliability analysis
private startReliabilityAnalysis(): void {
setInterval(async () => {
try {
await this.performReliabilityAnalysis();
} catch (error) {
console.error("Reliability analysis error:", error);
}
}, 300000); // Every 5 minutes
}
private async performReliabilityAnalysis(): Promise<void> {
for (const serviceName of this.services.keys()) {
const metrics = await this.reliabilityMetrics.getServiceMetrics(
serviceName,
300000 // Last 5 minutes
);
// Update availability calculation
const state = this.serviceStates.get(serviceName)!;
state.availability = metrics.successRate;
state.errorRate = 1 - metrics.successRate;
// Check for reliability issues
if (metrics.successRate < 0.95) {
console.warn(
`Service ${serviceName} reliability degraded: ${(
metrics.successRate * 100
).toFixed(2)}%`
);
}
// Analyze fallback usage patterns
if (metrics.fallbackUsageRate > 0.1) {
console.warn(
`Service ${serviceName} high fallback usage: ${(
metrics.fallbackUsageRate * 100
).toFixed(2)}%`
);
}
}
}
private async recordSuccessfulCall(
serviceName: string,
state: ServiceState,
startTime: number,
callId: string
): Promise<void> {
const duration = Date.now() - startTime;
state.consecutiveFailures = 0;
state.lastSuccessful = Date.now();
state.responseTime = duration;
// Reuse the caller's id so metrics correlate with the originating request
await this.reliabilityMetrics.recordCall(serviceName, {
callId,
duration,
success: true,
fallbackUsed: false,
});
}
private async recordFailedCall(
serviceName: string,
state: ServiceState,
startTime: number,
error: Error,
callId: string
): Promise<void> {
const duration = Date.now() - startTime;
state.consecutiveFailures++;
state.lastError = error.message;
await this.reliabilityMetrics.recordCall(serviceName, {
callId,
duration,
success: false,
error: error.message,
fallbackUsed: false,
});
}
// Get service reliability statistics
async getServiceReliabilityReport(
serviceName?: string,
timeRange?: TimeRange
): Promise<ServiceReliabilityReport> {
const services = serviceName
? [serviceName]
: Array.from(this.services.keys());
const serviceReports: ServiceReliabilityData[] = [];
for (const name of services) {
const service = this.services.get(name)!;
const state = this.serviceStates.get(name)!;
const metrics = await this.reliabilityMetrics.getServiceMetrics(
name,
timeRange
? timeRange.end.getTime() - timeRange.start.getTime()
: 3600000
);
const dependencyImpact = await this.analyzeDependencyImpact(name);
serviceReports.push({
serviceName: name,
criticality: service.criticality,
currentState: state,
metrics,
dependencyImpact,
slaCompliance: service.sla
? {
availabilityMet:
state.availability >= service.sla.minAvailability,
responseTimeMet:
state.responseTime <= service.sla.maxResponseTimeMs,
currentAvailability: state.availability,
targetAvailability: service.sla.minAvailability,
currentResponseTime: state.responseTime,
targetResponseTime: service.sla.maxResponseTimeMs,
}
: null,
});
}
return {
generatedAt: Date.now(),
timeRange: timeRange || {
start: new Date(Date.now() - 3600000),
end: new Date(),
},
services: serviceReports,
overallHealth: this.calculateOverallHealth(serviceReports),
};
}
private calculateOverallHealth(services: ServiceReliabilityData[]): number {
if (services.length === 0) return 1;
let weightedHealth = 0;
let totalWeight = 0;
for (const service of services) {
const weight = this.getCriticalityWeight(service.criticality);
weightedHealth += service.currentState.health * weight;
totalWeight += weight;
}
return totalWeight > 0 ? weightedHealth / totalWeight : 0;
}
private getCriticalityWeight(criticality: string): number {
const weights: Record<string, number> = {
critical: 4,
high: 3,
medium: 2,
low: 1,
};
return weights[criticality] || 2;
}
private async getMitigationStrategies(
dependentService: string,
failedService: string
): Promise<string[]> {
const fallbackStrategy = this.fallbackStrategies.get(dependentService);
const strategies: string[] = [];
if (fallbackStrategy) {
strategies.push(`Fallback available: ${fallbackStrategy.type}`);
}
// Check for alternative dependencies
const service = this.services.get(dependentService);
if (service && service.dependencies.length > 1) {
strategies.push("Multiple dependencies available");
}
// Add other mitigation strategies based on service configuration
strategies.push("Manual intervention available");
return strategies;
}
private generateCallId(): string {
return `call_${Date.now()}_${Math.random().toString(36).slice(2, 11)}`;
}
}
// Example fallback strategy implementations
class CacheFallbackStrategy implements FallbackStrategy {
readonly type = "cache";
constructor(private config: any) {}
async execute<T>(
operation: ServiceOperation<T>,
callId: string,
reason: string,
primaryError?: Error
): Promise<FallbackResult<T>> {
// Try to get cached result
const cacheKey = this.generateCacheKey(operation, callId);
const cachedResult = await this.getCachedResult(cacheKey);
if (cachedResult) {
return {
data: cachedResult,
source: "cache",
degraded: true,
};
}
throw new ServiceError(
"No cached data available for fallback",
"CACHE_MISS"
);
}
private generateCacheKey(
operation: ServiceOperation<any>,
callId: string
): string {
// Generate cache key based on operation parameters
return `fallback:${operation.name}:${JSON.stringify(operation.parameters)}`;
}
private async getCachedResult(cacheKey: string): Promise<any> {
// Implementation would check your cache (Redis, etc.)
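// A minimal sketch with an ioredis client (the `redis` instance is an
// assumed, hypothetical dependency):
//   const raw = await redis.get(cacheKey);
//   return raw ? JSON.parse(raw) : null;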
return null;
}
}
class DefaultValueFallbackStrategy implements FallbackStrategy {
readonly type = "default_value";
constructor(private config: any) {}
async execute<T>(
operation: ServiceOperation<T>,
callId: string,
reason: string,
primaryError?: Error
): Promise<FallbackResult<T>> {
const defaultValue = this.config.defaultValues[operation.name];
if (defaultValue !== undefined) {
return {
data: defaultValue,
source: "default",
degraded: true,
};
}
throw new ServiceError(
"No default value configured for fallback",
"NO_DEFAULT_VALUE"
);
}
}
// Supporting interfaces and classes
interface ServiceReliabilityConfig {
services: ServiceConfig[];
alerting: AlertingConfig;
}
interface ServiceConfig {
name: string;
type: string;
endpoint: string;
criticality?: "low" | "medium" | "high" | "critical";
dependencies?: string[];
healthCheck: HealthCheckConfig;
sla?: ServiceSLA;
timeouts?: {
connection?: number;
request?: number;
healthCheck?: number;
};
fallback: FallbackConfig;
}
interface ServiceDefinition {
name: string;
type: string;
endpoint: string;
criticality: string;
dependencies: string[];
healthCheck: HealthCheckConfig;
sla?: ServiceSLA;
timeouts: {
connection: number;
request: number;
healthCheck: number;
};
}
interface ServiceState {
status: "unknown" | "healthy" | "unhealthy" | "maintenance";
health: number;
lastHealthCheck: number;
consecutiveFailures: number;
lastError: string | null;
responseTime: number;
availability: number;
errorRate: number;
lastSuccessful: number;
}
interface ServiceOperation<T> {
name: string;
parameters: any;
execute(service: ServiceDefinition, context: any): Promise<T>;
}
interface ServiceCallOptions {
timeout?: number;
retries?: number;
disableFallback?: boolean;
}
interface ServiceCallResult<T> {
callId: string;
serviceName: string;
data: T;
source: "primary" | "cache" | "default" | "alternative";
duration: number;
fallbackUsed: boolean;
degradedMode: boolean;
fallbackReason?: string;
}
interface FallbackStrategy {
readonly type: string;
execute<T>(
operation: ServiceOperation<T>,
callId: string,
reason: string,
primaryError?: Error
): Promise<FallbackResult<T>>;
}
interface FallbackResult<T> {
data: T;
source: string;
degraded?: boolean;
}
interface FallbackConfig {
type:
| "cache"
| "default_value"
| "alternative_service"
| "degraded_response"
| "composite"
| "none";
[key: string]: any;
}
interface HealthCheckConfig {
endpoint?: string;
method?: string;
expectedStatus?: number;
timeout?: number;
headers?: Record<string, string>;
}
interface ServiceSLA {
minAvailability: number;
maxResponseTimeMs: number;
}
interface AvailabilityCheckResult {
available: boolean;
reason: string;
canFallback: boolean;
}
interface DependencyImpactAnalysis {
serviceName: string;
directDependents: string[];
transitiveDependents: string[];
dependencies: string[];
impact: ServiceImpact[];
cascadeRisk: "low" | "medium" | "high";
}
interface ServiceImpact {
serviceName: string;
impactLevel: "low" | "medium" | "high" | "critical";
mitigationStrategies: string[];
}
interface ServiceReliabilityReport {
generatedAt: number;
timeRange: TimeRange;
services: ServiceReliabilityData[];
overallHealth: number;
}
interface ServiceReliabilityData {
serviceName: string;
criticality: string;
currentState: ServiceState;
metrics: any;
dependencyImpact: DependencyImpactAnalysis;
slaCompliance: {
availabilityMet: boolean;
responseTimeMet: boolean;
currentAvailability: number;
targetAvailability: number;
currentResponseTime: number;
targetResponseTime: number;
} | null;
}
interface HealthCheckResult {
healthy: boolean;
score: number;
error?: string;
metadata?: any;
}
interface AlertingConfig {
channels: string[];
thresholds: any;
}
interface TimeRange {
start: Date;
end: Date;
}
class ServiceError extends Error {
constructor(
message: string,
public code: string,
public status?: number,
public metadata?: any
) {
super(message);
this.name = "ServiceError";
}
}
// Placeholder classes that would need full implementation
class ServiceDependencyGraph {
addService(serviceName: string, dependencies: string[]): void {}
getDependents(serviceName: string): string[] {
return [];
}
getDependencies(serviceName: string): string[] {
return [];
}
getTransitiveDependents(serviceName: string): string[] {
return [];
}
}
class ReliabilityMetrics {
async recordCall(serviceName: string, data: any): Promise<void> {}
async recordHealthCheck(serviceName: string, data: any): Promise<void> {}
async getServiceMetrics(
serviceName: string,
timeWindowMs: number
): Promise<any> {
return {
successRate: 0.95,
fallbackUsageRate: 0.05,
avgResponseTime: 100,
};
}
}
class ServiceAlertManager {
constructor(config: AlertingConfig) {}
async sendAlert(alert: any): Promise<void> {
console.log(`ALERT: ${alert.message}`);
}
}
class HealthChecker {
constructor(
private service: ServiceDefinition,
private config: HealthCheckConfig
) {}
async check(): Promise<HealthCheckResult> {
// Implementation would perform actual health check
return {
healthy: true,
score: 1.0,
};
}
}
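// A minimal sketch of an HTTP probe that check() could delegate to. Assumes
// Node 18+ global fetch and a status-only health endpoint; the latency-based
// scoring heuristic is illustrative, not prescriptive.
async function probeHttpHealth(
  url: string,
  timeoutMs: number,
  expectedStatus = 200
): Promise<HealthCheckResult> {
  const startedAt = Date.now();
  try {
    const res = await fetch(url, { signal: AbortSignal.timeout(timeoutMs) });
    const latency = Date.now() - startedAt;
    const healthy = res.status === expectedStatus;
    // Score decays toward 0 as latency approaches the timeout budget
    const score = healthy ? Math.max(0, 1 - latency / timeoutMs) : 0;
    return { healthy, score, metadata: { status: res.status, latency } };
  } catch (error) {
    return { healthy: false, score: 0, error: (error as Error).message };
  }
}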
class AlternativeServiceFallbackStrategy implements FallbackStrategy {
readonly type = "alternative_service";
constructor(
private config: any,
private manager: ServiceReliabilityManager
) {}
async execute<T>(
operation: ServiceOperation<T>,
callId: string,
reason: string,
primaryError?: Error
): Promise<FallbackResult<T>> {
// A minimal sketch: route the same operation to a configured backup service.
// Assumes config.alternativeService names a registered service; fallback is
// disabled on the nested call to avoid infinite fallback loops.
const result = await this.manager.executeServiceCall<T>(
this.config.alternativeService,
operation,
{ disableFallback: true }
);
return {
data: result.data,
source: "alternative",
degraded: true,
};
}
}
class DegradedResponseFallbackStrategy implements FallbackStrategy {
readonly type = "degraded_response";
constructor(private config: any) {}
async execute<T>(
operation: ServiceOperation<T>,
callId: string,
reason: string,
primaryError?: Error
): Promise<FallbackResult<T>> {
throw new Error("Degraded response fallback not implemented");
}
}
class CompositeFallbackStrategy implements FallbackStrategy {
readonly type = "composite";
constructor(
private config: any,
private manager: ServiceReliabilityManager
) {}
async execute<T>(
operation: ServiceOperation<T>,
callId: string,
reason: string,
primaryError?: Error
): Promise<FallbackResult<T>> {
// A minimal sketch: try each configured sub-strategy in order; the first
// success wins. Assumes config.strategies is an ordered FallbackStrategy[].
const strategies: FallbackStrategy[] = this.config.strategies || [];
let lastError = primaryError;
for (const strategy of strategies) {
try {
return await strategy.execute(operation, callId, reason, lastError);
} catch (error) {
lastError = error as Error;
}
}
throw new ServiceError(
"All composite fallback strategies failed",
"COMPOSITE_FALLBACK_FAILED"
);
}
}
class NoOpFallbackStrategy implements FallbackStrategy {
readonly type = "none";
async execute<T>(
operation: ServiceOperation<T>,
callId: string,
reason: string,
primaryError?: Error
): Promise<FallbackResult<T>> {
throw new ServiceError("No fallback strategy configured", "NO_FALLBACK");
}
}
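To see how the pieces fit together, here's a minimal, hypothetical wiring of the manager. The service name, endpoint, and getRecommendations operation are illustrative assumptions, not part of the framework:
// Hypothetical configuration: one medium-criticality service with a
// default-value fallback (an empty list) for when the API is down
const manager = new ServiceReliabilityManager({
  alerting: { channels: ["slack"], thresholds: {} },
  services: [
    {
      name: "recommendations",
      type: "http",
      endpoint: "https://recs.internal.example.com",
      criticality: "medium",
      healthCheck: { endpoint: "/health", expectedStatus: 200 },
      sla: { minAvailability: 0.99, maxResponseTimeMs: 500 },
      fallback: {
        type: "default_value",
        defaultValues: { getRecommendations: [] },
      },
    },
  ],
});

async function getRecommendationsFor(userId: string): Promise<string[]> {
  const result = await manager.executeServiceCall<string[]>("recommendations", {
    name: "getRecommendations",
    parameters: { userId },
    async execute(service) {
      const res = await fetch(
        `${service.endpoint}/recommendations?user=${userId}`
      );
      if (!res.ok) {
        throw new ServiceError("Upstream error", "SERVICE_UNAVAILABLE", res.status);
      }
      return (await res.json()) as string[];
    },
  });
  if (result.fallbackUsed) {
    console.log(`Served degraded recommendations (${result.fallbackReason})`);
  }
  return result.data;
}
Because the operation throws a ServiceError with a retryable code, a failed primary call flows into the configured default-value fallback instead of surfacing an error to the caller.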
Key Takeaways
Third-party integration resilience isn’t just about handling individual API failures—it’s about building systems that can adapt to unpredictable patterns, survive webhook floods, and maintain consistency when external services behave in ways you never anticipated.
Essential resilience patterns:
- Webhook processing with flood protection, deduplication, and intelligent retry logic (a minimal deduplication sketch follows this list)
- Adaptive rate limiting that adjusts based on API behavior and response patterns
- Service reliability with comprehensive fallback strategies and dependency analysis
- Health monitoring that tracks integration health, not just uptime
- Graceful degradation that maintains core functionality when external services fail
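As a concrete taste of the deduplication pattern, here's a minimal sketch. It assumes an ioredis client; the key prefix and 24-hour TTL are illustrative choices, not requirements.
import Redis from "ioredis";

const redis = new Redis();

// Claim each webhook event id atomically so retries and duplicate
// deliveries are processed exactly once
async function handleWebhook(
  eventId: string,
  process: () => Promise<void>
): Promise<void> {
  const claimed = await redis.set(`webhook:seen:${eventId}`, "1", "EX", 86400, "NX");
  if (claimed !== "OK") {
    return; // duplicate delivery: already processed or in flight
  }
  try {
    await process();
  } catch (error) {
    // Release the claim so the provider's retry can reprocess the event
    await redis.del(`webhook:seen:${eventId}`);
    throw error;
  }
}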
The production-ready integration resilience framework:
- Use event-driven architecture for webhook processing that can handle unpredictable loads
- Implement adaptive rate limiting that learns from API behavior rather than using static limits
- Build comprehensive fallback strategies including cache, default values, and alternative services
- Design dependency-aware systems that understand the cascade effects of service failures
- Plan intelligent degradation that provides reduced functionality rather than complete failure
Integration resilience best practices:
- Design for failure from day one: external services will fail, so plan accordingly
- Monitor integration health as comprehensively as you monitor your own services
- Implement circuit breakers to prevent cascading failures across your system
- Use correlation IDs to trace requests through complex integration chains (sketched after this list)
- Build comprehensive alerting that provides actionable insights, not just noise
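For correlation IDs specifically, here's a minimal propagation sketch; the header name and the AsyncLocalStorage wiring are illustrative assumptions:
import { AsyncLocalStorage } from "node:async_hooks";
import { randomUUID } from "node:crypto";

const correlationStore = new AsyncLocalStorage<string>();

// Run a unit of work (e.g., one webhook or request) under a correlation id
function withCorrelationId<T>(fn: () => Promise<T>, id?: string): Promise<T> {
  return correlationStore.run(id ?? randomUUID(), fn);
}

// Propagate the id to downstream integrations so logs can be stitched together
async function callExternalApi(url: string): Promise<Response> {
  return fetch(url, {
    headers: { "X-Correlation-ID": correlationStore.getStore() ?? randomUUID() },
  });
}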
The architecture decision framework:
- Use webhook queues for high-volume event processing with flood protection
- Use adaptive rate limiting for APIs with variable or unknown rate limits
- Use circuit breakers for critical external dependencies
- Use fallback strategies for any functionality that must remain available
- Use health checks for proactive monitoring of external service availability
What’s Next?
In the upcoming blogs, we'll continue with more advanced backend engineering, diving into specialized topics like serverless architectures, advanced database concepts, performance optimization, and monitoring and observability at scale.
We’ll explore the operational challenges of maintaining complex distributed systems—from optimizing database queries that handle millions of records to building monitoring systems that can detect problems before they impact users.
Because building integrations that work in development is the foundation. Making them bulletproof enough to handle the chaos of production traffic, unpredictable external service behavior, and the complexity of distributed systems at scale—that’s where backend engineering becomes both an art and a science.