File Handling & Media Processing - 2/2
The $40 Million Storage Meltdown That Killed a Unicorn
Picture this catastrophe: March 15th, 2:47 AM GMT. One of the hottest social media startups, valued at $2 billion and growing at 300% year-over-year, is about to experience the kind of data disaster that becomes a business school case study. Their platform handles 50TB of user-generated content daily—photos, videos, live streams, and documents from 15 million active users worldwide.
Then a “simple” storage migration begins, and digital hell breaks loose.
Within the first hour, what started as routine maintenance becomes an extinction-level event:
- A 2PB file migration accidentally deletes 40% of user content because someone forgot to test the backup restoration process
- The remaining files become inaccessible because metadata indexes weren’t properly synchronized during the migration
- Large video files (4K uploads) start corrupting mid-stream because chunk reassembly logic had a race condition
- The file security system locks out 60% of users because permission metadata got scrambled during the move
- Backup restoration fails because the archive format changed between backup creation and restoration, making 18 months of backups unusable
- The media transcoding service crashes trying to process a 12GB 8K video file that someone uploaded as a “profile picture”
But here’s where it gets apocalyptic—their file handling infrastructure, built by developers who thought “it’s just file storage,” completely collapses under real-world complexity:
- Data corruption cascade: Inconsistent metadata causes 2.3 million files to become permanently inaccessible
- Streaming failure: Large file downloads timeout and corrupt because of improper chunk handling
- Security breach: File permissions reset during migration exposes 800,000 private documents
- Archive catastrophe: Backup restoration takes 14 days instead of 4 hours due to poor compression choices
- Transcoding bottleneck: Video processing queue backs up 72 hours, making the platform unusable
- Cost explosion: Emergency data recovery services cost $8 million, emergency storage costs $12 million
By day three, the damage was terminal:
- $40 million in emergency recovery, legal fees, and infrastructure costs
- 4.2 million users permanently lost their content and data
- 60% user churn within 30 days due to lost trust
- Complete platform rebuild required, taking 8 months
- Acquisition talks fell through due to “unreliable data infrastructure”
- Company valuation dropped 85% and they sold for parts 6 months later
The brutal truth? They had built file handling that worked fine for small images but had never been designed to handle the scale, security, and reliability requirements of production file systems managing petabytes of irreplaceable user data.
The Uncomfortable Truth About Production File Systems
Here’s what separates file systems that gracefully handle enterprise-scale data from those that crumble when someone uploads their first large video: Production file handling isn’t just about storing bytes—it’s about building distributed systems that can stream massive files, maintain data integrity, enforce security at scale, and recover from disasters while preserving every bit of user data.
Most developers approach large-scale file handling like this:
- Store files in the cloud and assume everything will scale automatically
- Handle large files by loading them entirely into memory and hope for the best
- Back up files occasionally and cross fingers that restoration will work when needed
- Set basic permissions and assume security is someone else’s problem
- Discover during a disaster that file systems are distributed databases requiring sophisticated architecture
But systems that handle real-world file complexity at scale work differently:
- Design streaming architectures that can handle terabyte files with bounded memory instead of buffering them whole (a minimal sketch follows below)
- Build comprehensive metadata systems that maintain consistency across distributed storage
- Implement multi-layered security with encryption, access control, and audit logging
- Plan disaster recovery with automated backups, tested restoration, and geographic redundancy
- Treat large-scale file handling as a critical infrastructure component requiring 99.99% availability
The difference isn’t just reliability—it’s the difference between file systems that become more robust as they scale and file systems where each new terabyte becomes a potential single point of failure that can bring down your entire platform.
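To make the streaming point concrete, here is a minimal, illustrative sketch (the function name, path, and chunk size are invented for this example, not part of any real platform): it hashes a file of any size while holding only one bounded read buffer in memory, because Node.js async iteration preserves the stream's back-pressure.
// Minimal sketch: hash an arbitrarily large file with bounded memory.
import { createReadStream } from "node:fs";
import { createHash } from "node:crypto";
async function hashLargeFile(path: string): Promise<string> {
  const hash = createHash("sha256");
  // At most one ~4MB read buffer is in memory at a time, even for terabyte
  // files, because async iteration only pulls the next chunk after this one
  // has been consumed.
  const stream = createReadStream(path, { highWaterMark: 4 * 1024 * 1024 });
  for await (const chunk of stream) {
    hash.update(chunk as Buffer);
  }
  return hash.digest("hex");
}
// Example call (path is illustrative): await hashLargeFile("/data/uploads/raw-footage.mov");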
Ready to build file handling that works like Netflix’s video delivery system instead of that file upload form that crashes when someone tries to share their 4K vacation video? Let’s dive into the patterns that power bulletproof enterprise file systems.
Large File Handling and Streaming: Taming the Terabyte Beast
Production-Grade Streaming File System
// Advanced large file handling system with streaming, chunking, and resumable uploads
class LargeFileStreamingManager {
private streamingEngine: StreamingEngine;
private chunkManager: ChunkManager;
private resumeManager: ResumableUploadManager;
private integrityChecker: FileIntegrityChecker;
private compressionEngine: CompressionEngine;
private deduplicationSystem: DeduplicationSystem;
private rateLimiter: StreamingRateLimiter;
constructor(config: LargeFileConfig) {
this.streamingEngine = new StreamingEngine(config.streaming);
this.chunkManager = new ChunkManager(config.chunking);
this.resumeManager = new ResumableUploadManager(config.resume);
this.integrityChecker = new FileIntegrityChecker();
this.compressionEngine = new CompressionEngine(config.compression);
this.deduplicationSystem = new DeduplicationSystem(config.deduplication);
this.rateLimiter = new StreamingRateLimiter(config.rateLimit);
this.setupStreamingWorkers();
this.setupIntegrityMonitoring();
}
// Handle large file upload with streaming and resumability
async uploadLargeFile(
uploadRequest: LargeFileUploadRequest
): Promise<LargeFileUploadResult> {
const uploadId = this.generateUploadId();
const startTime = Date.now();
try {
console.log(
`Starting large file upload: ${uploadId} (${uploadRequest.totalSize} bytes)`
);
// Check if file already exists (deduplication)
const existingFile = await this.deduplicationSystem.findExisting(
uploadRequest.fileHash
);
if (existingFile) {
console.log(`File already exists, skipping upload: ${uploadId}`);
return this.createDeduplicatedResult(existingFile, uploadId);
}
// Initialize resumable upload session
const uploadSession = await this.resumeManager.createSession({
uploadId,
filename: uploadRequest.filename,
totalSize: uploadRequest.totalSize,
chunkSize:
uploadRequest.chunkSize ||
this.getOptimalChunkSize(uploadRequest.totalSize),
fileHash: uploadRequest.fileHash,
mimeType: uploadRequest.mimeType,
userId: uploadRequest.userId,
});
// Stream and process chunks
const processingResult = await this.processLargeFileStream(
uploadRequest,
uploadSession
);
// Finalize upload and verify integrity
const finalResult = await this.finalizeUpload(
uploadSession,
processingResult,
uploadRequest
);
console.log(`Large file upload completed: ${uploadId}`);
return finalResult;
} catch (error) {
console.error(`Large file upload failed: ${uploadId}`, error);
await this.cleanupFailedUpload(uploadId);
throw new LargeFileError(
"Large file upload failed",
"UPLOAD_FAILED",
500,
{ uploadId, originalError: error }
);
}
}
private async processLargeFileStream(
request: LargeFileUploadRequest,
session: UploadSession
): Promise<StreamProcessingResult> {
const totalChunks = Math.ceil(request.totalSize / session.chunkSize);
const processedChunks: ProcessedChunk[] = [];
let uploadedBytes = 0;
console.log(`Processing ${totalChunks} chunks for ${session.uploadId}`);
// Create readable stream from request
const fileStream = this.createFileStream(request);
// Apply rate limiting
const rateLimitedStream = this.rateLimiter.createLimitedStream(
fileStream,
request.userId
);
// Apply compression if beneficial
const compressionResult = await this.compressionEngine.shouldCompress(
request.mimeType,
request.totalSize
);
let processedStream = rateLimitedStream;
if (compressionResult.shouldCompress) {
processedStream = this.compressionEngine.createCompressionStream(
processedStream,
compressionResult.algorithm
);
}
    // Process the stream with async iteration so back-pressure is preserved
    // and chunks are handled strictly in order. Awaiting inside a "data"
    // handler does not pause the stream, which lets chunk processing
    // interleave and reassemble out of order (the exact race condition the
    // opening story describes).
    let chunkIndex = 0;
    let buffer = Buffer.alloc(0);
    const chunkSource = processedStream as unknown as AsyncIterable<Buffer>;
    for await (const chunk of chunkSource) {
      buffer = Buffer.concat([buffer, chunk]);
      // Drain every complete chunk currently sitting in the buffer
      while (buffer.length >= session.chunkSize) {
        const chunkData = buffer.subarray(0, session.chunkSize);
        buffer = buffer.subarray(session.chunkSize);
        const processedChunk = await this.processChunk(
          chunkData,
          chunkIndex,
          session
        );
        processedChunks.push(processedChunk);
        uploadedBytes += chunkData.length;
        chunkIndex++;
        await this.reportChunkProgress(
          session,
          request,
          chunkIndex,
          totalChunks,
          uploadedBytes
        );
      }
    }
    // Flush the final partial chunk, if any
    if (buffer.length > 0) {
      const processedChunk = await this.processChunk(buffer, chunkIndex, session);
      processedChunks.push(processedChunk);
      uploadedBytes += buffer.length;
      chunkIndex++;
      await this.reportChunkProgress(
        session,
        request,
        chunkIndex,
        totalChunks,
        uploadedBytes
      );
    }
    return {
      chunks: processedChunks,
      totalSize: uploadedBytes,
      compressionUsed: compressionResult.shouldCompress,
      compressionRatio: compressionResult.shouldCompress
        ? uploadedBytes / request.totalSize
        : 1,
    };
  }
  // Note: totalChunks is estimated from the uncompressed size, so treat the
  // chunk counts as approximate when compression is applied upstream.
  private async reportChunkProgress(
    session: UploadSession,
    request: LargeFileUploadRequest,
    chunksCompleted: number,
    totalChunks: number,
    bytesUploaded: number
  ): Promise<void> {
    await this.resumeManager.updateProgress(session.uploadId, {
      chunksCompleted,
      totalChunks,
      bytesUploaded,
    });
    if (request.onProgress) {
      request.onProgress({
        uploadId: session.uploadId,
        bytesUploaded,
        totalBytes: request.totalSize,
        chunksCompleted,
        totalChunks,
        percentage: (bytesUploaded / request.totalSize) * 100,
      });
    }
}
private async processChunk(
chunkData: Buffer,
chunkIndex: number,
session: UploadSession
): Promise<ProcessedChunk> {
console.log(`Processing chunk ${chunkIndex} for ${session.uploadId}`);
// Calculate chunk hash for integrity verification
const chunkHash = this.integrityChecker.calculateHash(chunkData);
// Check if this chunk was already uploaded (resumable uploads)
const existingChunk = await this.resumeManager.getChunk(
session.uploadId,
chunkIndex
);
if (existingChunk && existingChunk.hash === chunkHash) {
console.log(`Chunk ${chunkIndex} already uploaded, skipping`);
return existingChunk;
}
// Upload chunk to storage
const storageResult = await this.chunkManager.uploadChunk({
uploadId: session.uploadId,
chunkIndex,
data: chunkData,
hash: chunkHash,
size: chunkData.length,
});
// Store chunk metadata
const processedChunk: ProcessedChunk = {
index: chunkIndex,
hash: chunkHash,
size: chunkData.length,
storageKey: storageResult.key,
etag: storageResult.etag,
uploadedAt: Date.now(),
};
await this.resumeManager.saveChunk(session.uploadId, processedChunk);
return processedChunk;
}
// Resume interrupted upload
async resumeUpload(uploadId: string): Promise<ResumeResult> {
try {
const session = await this.resumeManager.getSession(uploadId);
if (!session) {
throw new LargeFileError(
"Upload session not found",
"SESSION_NOT_FOUND",
404
);
}
// Check session expiration
if (this.resumeManager.isSessionExpired(session)) {
await this.resumeManager.cleanupSession(uploadId);
throw new LargeFileError(
"Upload session expired",
"SESSION_EXPIRED",
410
);
}
// Get completed chunks
const completedChunks = await this.resumeManager.getCompletedChunks(
uploadId
);
const totalChunks = Math.ceil(session.totalSize / session.chunkSize);
return {
uploadId,
completedChunks: completedChunks.map((c) => c.index),
totalChunks,
bytesCompleted: completedChunks.reduce((sum, c) => sum + c.size, 0),
totalBytes: session.totalSize,
nextChunkIndex:
completedChunks.length > 0
? Math.max(...completedChunks.map((c) => c.index)) + 1
: 0,
};
} catch (error) {
console.error(`Resume upload failed: ${uploadId}`, error);
throw error;
}
}
// Stream large file for download with range support
async streamFileDownload(
request: FileDownloadRequest
): Promise<FileDownloadStream> {
try {
// Get file metadata
const fileMetadata = await this.getFileMetadata(request.fileId);
// Check permissions
await this.verifyDownloadPermissions(request.userId, request.fileId);
// Parse range header if present
const range = this.parseRangeHeader(request.range, fileMetadata.size);
// Create download stream
const downloadStream = await this.createDownloadStream(
fileMetadata,
range
);
// Apply rate limiting for downloads
const rateLimitedStream = this.rateLimiter.createDownloadStream(
downloadStream,
request.userId
);
// Decompress if needed
let finalStream = rateLimitedStream;
if (fileMetadata.compression) {
finalStream = this.compressionEngine.createDecompressionStream(
rateLimitedStream,
fileMetadata.compression.algorithm
);
}
return {
stream: finalStream,
size: range ? range.end - range.start + 1 : fileMetadata.size,
contentType: fileMetadata.mimeType,
range: range,
headers: this.generateDownloadHeaders(fileMetadata, range),
};
} catch (error) {
console.error(`File download failed: ${request.fileId}`, error);
throw new LargeFileError("File download failed", "DOWNLOAD_FAILED", 500, {
fileId: request.fileId,
originalError: error,
});
}
}
private async createDownloadStream(
fileMetadata: FileMetadata,
range?: ByteRange
): Promise<NodeJS.ReadableStream> {
if (fileMetadata.storage.type === "chunked") {
// For chunked files, create stream from chunks
return this.createChunkedDownloadStream(fileMetadata, range);
} else {
// For single files, use regular streaming
return this.streamingEngine.createStream(
fileMetadata.storage.location,
range
);
}
}
private async createChunkedDownloadStream(
fileMetadata: FileMetadata,
range?: ByteRange
): Promise<NodeJS.ReadableStream> {
const chunks = await this.chunkManager.getFileChunks(fileMetadata.fileId);
// Calculate which chunks we need based on range
const requiredChunks = range
? this.calculateChunksForRange(chunks, range)
: chunks;
return this.chunkManager.createChunkedStream(requiredChunks, range);
}
private calculateChunksForRange(
chunks: ChunkInfo[],
range: ByteRange
): ChunkInfo[] {
const requiredChunks: ChunkInfo[] = [];
let currentOffset = 0;
for (const chunk of chunks.sort((a, b) => a.index - b.index)) {
const chunkEnd = currentOffset + chunk.size - 1;
// Check if this chunk overlaps with the requested range
if (chunkEnd >= range.start && currentOffset <= range.end) {
requiredChunks.push({
...chunk,
rangeStart: Math.max(0, range.start - currentOffset),
rangeEnd: Math.min(chunk.size - 1, range.end - currentOffset),
});
}
currentOffset += chunk.size;
// Stop if we've passed the requested range
if (currentOffset > range.end) {
break;
}
}
return requiredChunks;
}
// Parallel download with multiple connections
async downloadLargeFileParallel(
request: ParallelDownloadRequest
): Promise<ParallelDownloadResult> {
    const startTime = Date.now();
    const fileMetadata = await this.getFileMetadata(request.fileId);
const connections =
request.connections || this.getOptimalConnectionCount(fileMetadata.size);
const chunkSize = Math.ceil(fileMetadata.size / connections);
console.log(
`Starting parallel download: ${request.fileId} (${connections} connections)`
);
const downloadPromises: Promise<DownloadChunk>[] = [];
for (let i = 0; i < connections; i++) {
const start = i * chunkSize;
const end = Math.min(start + chunkSize - 1, fileMetadata.size - 1);
if (start <= end) {
downloadPromises.push(
this.downloadChunk(request.fileId, start, end, i, request.userId)
);
}
}
try {
const chunks = await Promise.all(downloadPromises);
// Combine chunks in order
const combinedBuffer = this.combineDownloadChunks(chunks);
// Verify integrity if hash is available
if (fileMetadata.hash) {
const downloadHash =
this.integrityChecker.calculateHash(combinedBuffer);
if (downloadHash !== fileMetadata.hash) {
throw new LargeFileError(
"Download integrity check failed",
"INTEGRITY_CHECK_FAILED",
500
);
}
}
return {
fileId: request.fileId,
data: combinedBuffer,
size: combinedBuffer.length,
        downloadTime: Date.now() - startTime,
connectionsUsed: connections,
};
} catch (error) {
console.error(`Parallel download failed: ${request.fileId}`, error);
throw error;
}
}
private async downloadChunk(
fileId: string,
start: number,
end: number,
chunkIndex: number,
userId: string
): Promise<DownloadChunk> {
const range = `bytes=${start}-${end}`;
const downloadRequest: FileDownloadRequest = {
fileId,
userId,
range,
};
const downloadStream = await this.streamFileDownload(downloadRequest);
return new Promise((resolve, reject) => {
const chunks: Buffer[] = [];
downloadStream.stream.on("data", (chunk: Buffer) => {
chunks.push(chunk);
});
downloadStream.stream.on("end", () => {
resolve({
index: chunkIndex,
data: Buffer.concat(chunks),
start,
end,
});
});
downloadStream.stream.on("error", reject);
});
}
private combineDownloadChunks(chunks: DownloadChunk[]): Buffer {
// Sort chunks by index to ensure correct order
const sortedChunks = chunks.sort((a, b) => a.index - b.index);
return Buffer.concat(sortedChunks.map((chunk) => chunk.data));
}
// Utility methods
private getOptimalChunkSize(fileSize: number): number {
// Optimize chunk size based on file size
if (fileSize < 100 * 1024 * 1024) {
// < 100MB
return 5 * 1024 * 1024; // 5MB chunks
} else if (fileSize < 1024 * 1024 * 1024) {
// < 1GB
return 10 * 1024 * 1024; // 10MB chunks
} else if (fileSize < 10 * 1024 * 1024 * 1024) {
// < 10GB
return 50 * 1024 * 1024; // 50MB chunks
} else {
return 100 * 1024 * 1024; // 100MB chunks
}
}
private getOptimalConnectionCount(fileSize: number): number {
// Optimize connection count based on file size
if (fileSize < 100 * 1024 * 1024) {
// < 100MB
return 2;
} else if (fileSize < 1024 * 1024 * 1024) {
// < 1GB
return 4;
} else {
return 8;
}
}
private parseRangeHeader(
rangeHeader: string | undefined,
fileSize: number
): ByteRange | null {
    if (!rangeHeader) return null;
    const match = rangeHeader.match(/^bytes=(\d+)-(\d*)$/);
    if (!match) return null;
    const start = parseInt(match[1], 10);
    // Clamp open-ended ranges to the last byte; ignore unsatisfiable ranges
    const end = match[2]
      ? Math.min(parseInt(match[2], 10), fileSize - 1)
      : fileSize - 1;
    if (start > end || start >= fileSize) return null;
    return { start, end };
}
private generateDownloadHeaders(
metadata: FileMetadata,
range?: ByteRange
): Record<string, string> {
const headers: Record<string, string> = {
"Content-Type": metadata.mimeType,
"Content-Length": range
? (range.end - range.start + 1).toString()
: metadata.size.toString(),
"Accept-Ranges": "bytes",
"Cache-Control": "public, max-age=3600",
ETag: metadata.etag || metadata.hash,
};
    if (range) {
      headers["Content-Range"] = `bytes ${range.start}-${range.end}/${metadata.size}`;
      // Note: 206 Partial Content is a response status code, not a header;
      // the HTTP layer serving this range should set it on the response.
    }
return headers;
}
private createFileStream(
request: LargeFileUploadRequest
): NodeJS.ReadableStream {
// Implementation would create stream from request data source
throw new Error("createFileStream implementation needed");
}
private async getFileMetadata(fileId: string): Promise<FileMetadata> {
// Implementation would retrieve file metadata
throw new Error("getFileMetadata implementation needed");
}
private async verifyDownloadPermissions(
userId: string,
fileId: string
): Promise<void> {
// Implementation would check download permissions
}
private generateUploadId(): string {
return `large_${Date.now()}_${Math.random().toString(36).substr(2, 9)}`;
}
private async finalizeUpload(
session: UploadSession,
processing: StreamProcessingResult,
request: LargeFileUploadRequest
): Promise<LargeFileUploadResult> {
// Implementation would finalize upload and create file record
return {
uploadId: session.uploadId,
fileId: this.generateFileId(),
filename: request.filename,
size: processing.totalSize,
url: `https://files.example.com/${session.uploadId}`,
uploadedAt: Date.now(),
};
}
private generateFileId(): string {
return `file_${Date.now()}_${Math.random().toString(36).substr(2, 9)}`;
}
private async createDeduplicatedResult(
existingFile: any,
uploadId: string
): Promise<LargeFileUploadResult> {
return {
uploadId,
fileId: existingFile.fileId,
filename: existingFile.filename,
size: existingFile.size,
url: existingFile.url,
uploadedAt: Date.now(),
deduplicated: true,
};
}
private async cleanupFailedUpload(uploadId: string): Promise<void> {
// Implementation would cleanup failed upload resources
}
private setupStreamingWorkers(): void {
// Implementation would setup background workers
}
private setupIntegrityMonitoring(): void {
// Implementation would setup integrity monitoring
}
}
// Supporting classes and interfaces
interface LargeFileUploadRequest {
filename: string;
totalSize: number;
chunkSize?: number;
fileHash: string;
mimeType: string;
userId: string;
onProgress?: (progress: UploadProgress) => void;
}
interface UploadSession {
uploadId: string;
filename: string;
totalSize: number;
chunkSize: number;
fileHash: string;
mimeType: string;
userId: string;
createdAt: number;
expiresAt: number;
}
interface ProcessedChunk {
index: number;
hash: string;
size: number;
storageKey: string;
etag: string;
uploadedAt: number;
}
interface StreamProcessingResult {
chunks: ProcessedChunk[];
totalSize: number;
compressionUsed: boolean;
compressionRatio: number;
}
interface UploadProgress {
uploadId: string;
bytesUploaded: number;
totalBytes: number;
chunksCompleted: number;
totalChunks: number;
percentage: number;
}
interface FileDownloadRequest {
fileId: string;
userId: string;
range?: string;
}
interface ByteRange {
start: number;
end: number;
}
interface FileDownloadStream {
stream: NodeJS.ReadableStream;
size: number;
contentType: string;
range?: ByteRange;
headers: Record<string, string>;
}
interface ParallelDownloadRequest {
fileId: string;
userId: string;
connections?: number;
}
interface DownloadChunk {
index: number;
data: Buffer;
start: number;
end: number;
}
interface ParallelDownloadResult {
fileId: string;
data: Buffer;
size: number;
downloadTime: number;
connectionsUsed: number;
}
interface ChunkInfo {
index: number;
size: number;
hash: string;
storageKey: string;
rangeStart?: number;
rangeEnd?: number;
}
interface FileMetadata {
fileId: string;
filename: string;
size: number;
mimeType: string;
hash: string;
etag?: string;
storage: {
type: "single" | "chunked";
location: string;
};
compression?: {
algorithm: string;
originalSize: number;
};
}
interface LargeFileUploadResult {
uploadId: string;
fileId: string;
filename: string;
size: number;
url: string;
uploadedAt: number;
deduplicated?: boolean;
}
interface ResumeResult {
uploadId: string;
completedChunks: number[];
totalChunks: number;
bytesCompleted: number;
totalBytes: number;
nextChunkIndex: number;
}
class LargeFileError extends Error {
constructor(
message: string,
public code: string,
public status: number,
public metadata?: any
) {
super(message);
this.name = "LargeFileError";
}
}
// Placeholder classes that would need full implementation
interface LargeFileConfig {
streaming: any;
chunking: any;
resume: any;
compression: any;
deduplication: any;
rateLimit: any;
}
class StreamingEngine {
constructor(config: any) {}
async createStream(
location: string,
range?: ByteRange
): Promise<NodeJS.ReadableStream> {
// Implementation would create streaming interface
throw new Error("createStream implementation needed");
}
}
class ChunkManager {
constructor(config: any) {}
async uploadChunk(chunk: any): Promise<any> {
return { key: "chunk-key", etag: "etag" };
}
async getFileChunks(fileId: string): Promise<ChunkInfo[]> {
return [];
}
async createChunkedStream(
chunks: ChunkInfo[],
range?: ByteRange
): Promise<NodeJS.ReadableStream> {
throw new Error("createChunkedStream implementation needed");
}
}
class ResumableUploadManager {
constructor(config: any) {}
async createSession(options: any): Promise<UploadSession> {
return options as UploadSession;
}
async getSession(uploadId: string): Promise<UploadSession | null> {
return null;
}
async updateProgress(uploadId: string, progress: any): Promise<void> {}
async getChunk(
uploadId: string,
chunkIndex: number
): Promise<ProcessedChunk | null> {
return null;
}
async saveChunk(uploadId: string, chunk: ProcessedChunk): Promise<void> {}
async getCompletedChunks(uploadId: string): Promise<ProcessedChunk[]> {
return [];
}
isSessionExpired(session: UploadSession): boolean {
return Date.now() > session.expiresAt;
}
async cleanupSession(uploadId: string): Promise<void> {}
}
import { createHash } from "node:crypto";
class FileIntegrityChecker {
  calculateHash(data: Buffer): string {
    // SHA-256 digest used for chunk and whole-file integrity verification
    return createHash("sha256").update(data).digest("hex");
  }
}
class CompressionEngine {
constructor(config: any) {}
async shouldCompress(
mimeType: string,
size: number
): Promise<{ shouldCompress: boolean; algorithm: string }> {
return { shouldCompress: false, algorithm: "none" };
}
createCompressionStream(
stream: NodeJS.ReadableStream,
algorithm: string
): NodeJS.ReadableStream {
return stream;
}
createDecompressionStream(
stream: NodeJS.ReadableStream,
algorithm: string
): NodeJS.ReadableStream {
return stream;
}
}
class DeduplicationSystem {
constructor(config: any) {}
async findExisting(hash: string): Promise<any> {
return null;
}
}
class StreamingRateLimiter {
constructor(config: any) {}
createLimitedStream(
stream: NodeJS.ReadableStream,
userId: string
): NodeJS.ReadableStream {
return stream;
}
createDownloadStream(
stream: NodeJS.ReadableStream,
userId: string
): NodeJS.ReadableStream {
return stream;
}
}
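Before moving on to metadata, here is a rough usage sketch of the manager above. It is illustrative only: the empty configuration objects stand in for whatever the placeholder classes ultimately need, the file details are invented, and it assumes the caller kept the upload ID of an earlier, interrupted attempt.
// Usage sketch (illustrative values; assumes real implementations behind the placeholder classes).
const largeFiles = new LargeFileStreamingManager({
  streaming: {},
  chunking: {},
  resume: {},
  compression: {},
  deduplication: {},
  rateLimit: {},
});
async function uploadWithResume(previousUploadId?: string) {
  if (previousUploadId) {
    // Ask where the interrupted upload stopped before re-sending anything.
    const resume = await largeFiles.resumeUpload(previousUploadId);
    console.log(
      `Resuming at chunk ${resume.nextChunkIndex}/${resume.totalChunks}, ` +
        `${resume.bytesCompleted} of ${resume.totalBytes} bytes already stored`
    );
  }
  return largeFiles.uploadLargeFile({
    filename: "vacation-4k.mp4", // invented example file
    totalSize: 12 * 1024 * 1024 * 1024, // 12GB
    fileHash: "sha256-of-source-file", // client-computed content hash (illustrative)
    mimeType: "video/mp4",
    userId: "user-123",
    onProgress: (p) =>
      console.log(`${p.uploadId}: ${p.percentage.toFixed(1)}% uploaded`),
  });
}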
Metadata Extraction and Management: The File Intelligence Layer
Comprehensive Metadata Management System
// Advanced metadata extraction and management system
class MetadataManagementEngine {
private extractors: Map<string, MetadataExtractor>;
private metadataStore: MetadataStore;
private indexer: MetadataIndexer;
private searchEngine: MetadataSearchEngine;
private versioning: MetadataVersioning;
private auditLogger: MetadataAuditLogger;
private scheduler: MetadataScheduler;
constructor(config: MetadataConfig) {
this.metadataStore = new MetadataStore(config.storage);
this.indexer = new MetadataIndexer(config.indexing);
this.searchEngine = new MetadataSearchEngine(config.search);
this.versioning = new MetadataVersioning();
this.auditLogger = new MetadataAuditLogger();
this.scheduler = new MetadataScheduler();
this.initializeExtractors();
this.setupBackgroundProcessing();
}
// Extract comprehensive metadata from file
async extractMetadata(
fileId: string,
filePath: string,
mimeType: string,
options: ExtractionOptions = {}
): Promise<ExtractedMetadata> {
const extractionId = this.generateExtractionId();
const startTime = Date.now();
try {
console.log(
`Starting metadata extraction: ${extractionId} for ${fileId}`
);
// Get appropriate extractors for this file type
const extractors = this.getExtractorsForMimeType(mimeType);
// Extract metadata from all applicable extractors
const extractionResults: ExtractionResult[] = [];
for (const extractor of extractors) {
try {
console.log(
`Running extractor: ${extractor.getName()} for ${fileId}`
);
const result = await extractor.extract(filePath, {
...options,
fileId,
mimeType,
});
extractionResults.push({
extractorName: extractor.getName(),
success: true,
metadata: result,
extractionTime: result.extractionTime || 0,
});
} catch (error) {
console.error(
`Extractor ${extractor.getName()} failed for ${fileId}:`,
error
);
extractionResults.push({
extractorName: extractor.getName(),
success: false,
error: error.message,
extractionTime: 0,
});
}
}
// Combine and normalize metadata
const combinedMetadata = this.combineExtractionResults(extractionResults);
// Add system metadata
const systemMetadata = await this.generateSystemMetadata(
fileId,
filePath
);
// Create final metadata object
const finalMetadata: ExtractedMetadata = {
fileId,
extractionId,
extractedAt: Date.now(),
extractionTime: Date.now() - startTime,
// Core metadata
technical: combinedMetadata.technical,
descriptive: combinedMetadata.descriptive,
administrative: systemMetadata,
// Format-specific metadata
image: combinedMetadata.image,
video: combinedMetadata.video,
audio: combinedMetadata.audio,
document: combinedMetadata.document,
// Quality metrics
extractionResults,
qualityScore: this.calculateQualityScore(extractionResults),
};
// Store metadata with versioning
await this.storeMetadata(finalMetadata);
// Index for search
await this.indexMetadata(finalMetadata);
// Log extraction for audit
await this.auditLogger.logExtraction(extractionId, finalMetadata);
console.log(`Metadata extraction completed: ${extractionId}`);
return finalMetadata;
} catch (error) {
console.error(`Metadata extraction failed: ${extractionId}`, error);
throw new MetadataError(
"Metadata extraction failed",
"EXTRACTION_FAILED",
500,
{ extractionId, fileId, originalError: error }
);
}
}
private combineExtractionResults(
results: ExtractionResult[]
): CombinedMetadata {
const combined: CombinedMetadata = {
technical: {},
descriptive: {},
image: {},
video: {},
audio: {},
document: {},
};
for (const result of results) {
if (result.success && result.metadata) {
// Merge metadata categories
Object.keys(result.metadata).forEach((category) => {
if (combined[category as keyof CombinedMetadata]) {
Object.assign(
combined[category as keyof CombinedMetadata],
result.metadata[category]
);
}
});
}
}
return combined;
}
private async generateSystemMetadata(
fileId: string,
filePath: string
): Promise<AdministrativeMetadata> {
const stats = await this.getFileStats(filePath);
return {
fileId,
createdAt: Date.now(),
modifiedAt: stats.mtime.getTime(),
size: stats.size,
checksum: await this.calculateChecksum(filePath),
storage: {
location: filePath,
provider: "local", // Would be determined by storage configuration
redundancy: "single",
},
processing: {
extractedAt: Date.now(),
version: "1.0",
},
};
}
// Advanced metadata search with faceted filtering
async searchMetadata(
query: MetadataSearchQuery
): Promise<MetadataSearchResult> {
    const startTime = Date.now();
    try {
console.log(`Executing metadata search:`, query);
// Build search parameters
const searchParams = this.buildSearchParameters(query);
// Execute search with faceting
const searchResults = await this.searchEngine.search(searchParams);
// Apply additional filters
const filteredResults = await this.applyAdvancedFilters(
searchResults,
query.filters || {}
);
// Calculate facets for navigation
const facets = await this.calculateSearchFacets(filteredResults, query);
return {
query,
results: filteredResults,
facets,
totalResults: filteredResults.length,
        executionTime: Date.now() - startTime,
suggestions: await this.generateSearchSuggestions(query),
};
} catch (error) {
console.error("Metadata search failed:", error);
throw new MetadataError("Metadata search failed", "SEARCH_FAILED", 500, {
query,
originalError: error,
});
}
}
private buildSearchParameters(query: MetadataSearchQuery): SearchParameters {
return {
text: query.text,
fields: query.fields || [
"descriptive.title",
"descriptive.description",
"descriptive.keywords",
],
filters: this.convertFiltersToSearchFilters(query.filters || {}),
sort: query.sort || [{ field: "createdAt", direction: "desc" }],
limit: query.limit || 20,
offset: query.offset || 0,
facets: query.includeFacets ? this.getAvailableFacets() : [],
};
}
private async applyAdvancedFilters(
results: any[],
filters: MetadataFilters
): Promise<MetadataSearchItem[]> {
let filteredResults = results;
// Date range filters
if (filters.dateRange) {
filteredResults = filteredResults.filter((item) => {
const createdAt = new Date(item.administrative.createdAt);
return (
createdAt >= filters.dateRange!.start &&
createdAt <= filters.dateRange!.end
);
});
}
// Size filters
if (filters.sizeRange) {
filteredResults = filteredResults.filter((item) => {
const size = item.administrative.size;
return size >= filters.sizeRange!.min && size <= filters.sizeRange!.max;
});
}
// Dimension filters for images/videos
if (filters.dimensions) {
filteredResults = filteredResults.filter((item) => {
if (item.image) {
return (
item.image.width >= filters.dimensions!.minWidth &&
item.image.height >= filters.dimensions!.minHeight
);
}
if (item.video) {
return (
item.video.width >= filters.dimensions!.minWidth &&
item.video.height >= filters.dimensions!.minHeight
);
}
return true;
});
}
// Duration filters for audio/video
if (filters.durationRange) {
filteredResults = filteredResults.filter((item) => {
const duration = item.audio?.duration || item.video?.duration;
return (
duration &&
duration >= filters.durationRange!.min &&
duration <= filters.durationRange!.max
);
});
}
return filteredResults.map(this.transformToSearchItem);
}
private transformToSearchItem(item: any): MetadataSearchItem {
return {
fileId: item.fileId,
filename: item.descriptive.filename,
mimeType: item.technical.mimeType,
size: item.administrative.size,
createdAt: item.administrative.createdAt,
thumbnail: item.image?.thumbnail || item.video?.thumbnail,
score: item._score || 1,
highlights: item._highlights || {},
};
}
// Batch metadata processing
async processBatchMetadata(
files: BatchMetadataRequest[]
): Promise<BatchMetadataResult> {
const batchId = this.generateBatchId();
const startTime = Date.now();
console.log(
`Starting batch metadata processing: ${batchId} (${files.length} files)`
);
const results: MetadataProcessingResult[] = [];
const errors: BatchMetadataError[] = [];
// Process in chunks to avoid overwhelming the system
const chunkSize = 10;
for (let i = 0; i < files.length; i += chunkSize) {
const chunk = files.slice(i, i + chunkSize);
const chunkPromises = chunk.map(async (file, index) => {
try {
const metadata = await this.extractMetadata(
file.fileId,
file.filePath,
file.mimeType,
file.options
);
results.push({
fileId: file.fileId,
success: true,
metadata,
processingTime: metadata.extractionTime,
});
} catch (error) {
errors.push({
index: i + index,
fileId: file.fileId,
error: error.message,
code: error.code,
});
}
});
await Promise.all(chunkPromises);
// Add delay between chunks to prevent overwhelming
if (i + chunkSize < files.length) {
await new Promise((resolve) => setTimeout(resolve, 100));
}
}
return {
batchId,
totalFiles: files.length,
successfulProcessing: results.length,
failedProcessing: errors.length,
results,
errors,
processingTime: Date.now() - startTime,
completedAt: Date.now(),
};
}
// Metadata analytics and insights
async generateMetadataAnalytics(
criteria: AnalyticsCriteria
): Promise<MetadataAnalytics> {
try {
const analytics: MetadataAnalytics = {
criteria,
generatedAt: Date.now(),
// File type distribution
fileTypeDistribution: await this.calculateFileTypeDistribution(
criteria
),
// Size analytics
sizeAnalytics: await this.calculateSizeAnalytics(criteria),
// Temporal analytics
temporalAnalytics: await this.calculateTemporalAnalytics(criteria),
// Quality metrics
qualityMetrics: await this.calculateQualityMetrics(criteria),
// Format-specific analytics
imageAnalytics: await this.calculateImageAnalytics(criteria),
videoAnalytics: await this.calculateVideoAnalytics(criteria),
audioAnalytics: await this.calculateAudioAnalytics(criteria),
documentAnalytics: await this.calculateDocumentAnalytics(criteria),
};
return analytics;
} catch (error) {
console.error("Metadata analytics generation failed:", error);
throw new MetadataError(
"Analytics generation failed",
"ANALYTICS_FAILED",
500,
{ criteria, originalError: error }
);
}
}
// Metadata maintenance and optimization
async optimizeMetadataStorage(): Promise<OptimizationResult> {
const startTime = Date.now();
console.log("Starting metadata storage optimization");
const optimizationTasks = [
this.compactMetadataStore(),
      this.rebuildSearchIndexes(),
this.cleanupObsoleteVersions(),
this.optimizeMetadataQueries(),
];
const results = await Promise.allSettled(optimizationTasks);
const optimization: OptimizationResult = {
startTime,
endTime: Date.now(),
duration: Date.now() - startTime,
tasks: [
{ name: "compact_storage", success: results[0].status === "fulfilled" },
{ name: "rebuild_indexes", success: results[1].status === "fulfilled" },
{
name: "cleanup_versions",
success: results[2].status === "fulfilled",
},
{
name: "optimize_queries",
success: results[3].status === "fulfilled",
},
],
spaceRecovered: await this.calculateSpaceRecovered(),
performanceImprovement: await this.measurePerformanceImprovement(),
};
console.log("Metadata optimization completed:", optimization);
return optimization;
}
// Initialize extractors for different file types
private initializeExtractors(): void {
this.extractors = new Map();
// Image extractors
this.extractors.set("image", new ImageMetadataExtractor());
this.extractors.set("exif", new ExifExtractor());
// Video extractors
this.extractors.set("video", new VideoMetadataExtractor());
this.extractors.set("ffprobe", new FFProbeExtractor());
// Audio extractors
this.extractors.set("audio", new AudioMetadataExtractor());
this.extractors.set("id3", new ID3Extractor());
// Document extractors
this.extractors.set("pdf", new PDFMetadataExtractor());
this.extractors.set("office", new OfficeMetadataExtractor());
// General extractors
this.extractors.set("mime", new MimeTypeExtractor());
this.extractors.set("file", new FileSystemExtractor());
console.log(`Initialized ${this.extractors.size} metadata extractors`);
}
private getExtractorsForMimeType(mimeType: string): MetadataExtractor[] {
const extractors: MetadataExtractor[] = [];
// Always include general extractors
extractors.push(this.extractors.get("mime")!);
extractors.push(this.extractors.get("file")!);
// Add format-specific extractors
if (mimeType.startsWith("image/")) {
extractors.push(this.extractors.get("image")!);
extractors.push(this.extractors.get("exif")!);
} else if (mimeType.startsWith("video/")) {
extractors.push(this.extractors.get("video")!);
extractors.push(this.extractors.get("ffprobe")!);
} else if (mimeType.startsWith("audio/")) {
extractors.push(this.extractors.get("audio")!);
extractors.push(this.extractors.get("id3")!);
} else if (mimeType === "application/pdf") {
extractors.push(this.extractors.get("pdf")!);
} else if (this.isOfficeDocument(mimeType)) {
extractors.push(this.extractors.get("office")!);
}
return extractors.filter((e) => e != null);
}
private calculateQualityScore(results: ExtractionResult[]): number {
const totalExtractors = results.length;
const successfulExtractors = results.filter((r) => r.success).length;
if (totalExtractors === 0) return 0;
return (successfulExtractors / totalExtractors) * 100;
}
// Utility methods
private async storeMetadata(metadata: ExtractedMetadata): Promise<void> {
await this.metadataStore.store(metadata);
await this.versioning.createVersion(metadata);
}
private async indexMetadata(metadata: ExtractedMetadata): Promise<void> {
await this.indexer.index(metadata);
}
private isOfficeDocument(mimeType: string): boolean {
const officeMimes = [
"application/msword",
"application/vnd.openxmlformats-officedocument.wordprocessingml.document",
"application/vnd.ms-excel",
"application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
"application/vnd.ms-powerpoint",
"application/vnd.openxmlformats-officedocument.presentationml.presentation",
];
return officeMimes.includes(mimeType);
}
private generateExtractionId(): string {
return `ext_${Date.now()}_${Math.random().toString(36).substr(2, 9)}`;
}
private generateBatchId(): string {
return `batch_meta_${Date.now()}_${Math.random()
.toString(36)
.substr(2, 9)}`;
}
// Placeholder implementations
private async getFileStats(filePath: string): Promise<any> {
return { mtime: new Date(), size: 0 };
}
private async calculateChecksum(filePath: string): Promise<string> {
return "checksum-placeholder";
}
private convertFiltersToSearchFilters(filters: MetadataFilters): any[] {
return [];
}
private getAvailableFacets(): string[] {
return ["mimeType", "size", "createdAt", "dimensions"];
}
private async calculateSearchFacets(
results: any[],
query: MetadataSearchQuery
): Promise<any> {
return {};
}
private async generateSearchSuggestions(
query: MetadataSearchQuery
): Promise<string[]> {
return [];
}
private setupBackgroundProcessing(): void {
// Setup background tasks for metadata processing
}
private async calculateFileTypeDistribution(
criteria: AnalyticsCriteria
): Promise<any> {
return {};
}
private async calculateSizeAnalytics(
criteria: AnalyticsCriteria
): Promise<any> {
return {};
}
private async calculateTemporalAnalytics(
criteria: AnalyticsCriteria
): Promise<any> {
return {};
}
private async calculateQualityMetrics(
criteria: AnalyticsCriteria
): Promise<any> {
return {};
}
private async calculateImageAnalytics(
criteria: AnalyticsCriteria
): Promise<any> {
return {};
}
private async calculateVideoAnalytics(
criteria: AnalyticsCriteria
): Promise<any> {
return {};
}
private async calculateAudioAnalytics(
criteria: AnalyticsCriteria
): Promise<any> {
return {};
}
private async calculateDocumentAnalytics(
criteria: AnalyticsCriteria
): Promise<any> {
return {};
}
private async compactMetadataStore(): Promise<void> {}
  private async rebuildSearchIndexes(): Promise<void> {}
private async cleanupObsoleteVersions(): Promise<void> {}
private async optimizeMetadataQueries(): Promise<void> {}
private async calculateSpaceRecovered(): Promise<number> {
return 0;
}
private async measurePerformanceImprovement(): Promise<number> {
return 0;
}
}
// Example metadata extractor implementation
class ImageMetadataExtractor implements MetadataExtractor {
getName(): string {
return "image-metadata";
}
async extract(filePath: string, options: any): Promise<any> {
// Would use libraries like sharp, image-size, etc.
return {
image: {
width: 1920,
height: 1080,
colorSpace: "sRGB",
channels: 3,
density: 72,
format: "jpeg",
quality: 85,
compression: "baseline",
},
technical: {
bitDepth: 8,
pixelDensity: 72,
orientation: 1,
},
extractionTime: 50,
};
}
}
// Supporting interfaces and types
interface MetadataConfig {
storage: any;
indexing: any;
search: any;
}
interface ExtractionOptions {
fileId?: string;
mimeType?: string;
extractThumbnail?: boolean;
deepAnalysis?: boolean;
}
interface ExtractionResult {
extractorName: string;
success: boolean;
metadata?: any;
error?: string;
extractionTime: number;
}
interface CombinedMetadata {
technical: any;
descriptive: any;
image?: any;
video?: any;
audio?: any;
document?: any;
}
interface AdministrativeMetadata {
fileId: string;
createdAt: number;
modifiedAt: number;
size: number;
checksum: string;
storage: {
location: string;
provider: string;
redundancy: string;
};
processing: {
extractedAt: number;
version: string;
};
}
interface ExtractedMetadata {
fileId: string;
extractionId: string;
extractedAt: number;
extractionTime: number;
technical: any;
descriptive: any;
administrative: AdministrativeMetadata;
image?: any;
video?: any;
audio?: any;
document?: any;
extractionResults: ExtractionResult[];
qualityScore: number;
}
interface MetadataSearchQuery {
text?: string;
fields?: string[];
filters?: MetadataFilters;
sort?: SortOption[];
limit?: number;
offset?: number;
includeFacets?: boolean;
}
interface MetadataFilters {
mimeTypes?: string[];
dateRange?: { start: Date; end: Date };
sizeRange?: { min: number; max: number };
dimensions?: { minWidth: number; minHeight: number };
durationRange?: { min: number; max: number };
tags?: string[];
userId?: string;
}
interface SortOption {
field: string;
direction: "asc" | "desc";
}
interface SearchParameters {
text?: string;
fields: string[];
filters: any[];
sort: SortOption[];
limit: number;
offset: number;
facets: string[];
}
interface MetadataSearchResult {
query: MetadataSearchQuery;
results: MetadataSearchItem[];
facets: any;
totalResults: number;
executionTime: number;
suggestions: string[];
}
interface MetadataSearchItem {
fileId: string;
filename: string;
mimeType: string;
size: number;
createdAt: number;
thumbnail?: string;
score: number;
highlights: Record<string, string[]>;
}
interface BatchMetadataRequest {
fileId: string;
filePath: string;
mimeType: string;
options?: ExtractionOptions;
}
interface MetadataProcessingResult {
fileId: string;
success: boolean;
metadata?: ExtractedMetadata;
processingTime: number;
}
interface BatchMetadataError {
index: number;
fileId: string;
error: string;
code: string;
}
interface BatchMetadataResult {
batchId: string;
totalFiles: number;
successfulProcessing: number;
failedProcessing: number;
results: MetadataProcessingResult[];
errors: BatchMetadataError[];
processingTime: number;
completedAt: number;
}
interface AnalyticsCriteria {
dateRange?: { start: Date; end: Date };
fileTypes?: string[];
users?: string[];
tags?: string[];
}
interface MetadataAnalytics {
criteria: AnalyticsCriteria;
generatedAt: number;
fileTypeDistribution: any;
sizeAnalytics: any;
temporalAnalytics: any;
qualityMetrics: any;
imageAnalytics?: any;
videoAnalytics?: any;
audioAnalytics?: any;
documentAnalytics?: any;
}
interface OptimizationResult {
startTime: number;
endTime: number;
duration: number;
tasks: { name: string; success: boolean }[];
spaceRecovered: number;
performanceImprovement: number;
}
interface MetadataExtractor {
getName(): string;
extract(filePath: string, options: any): Promise<any>;
}
class MetadataError extends Error {
constructor(
message: string,
public code: string,
public status: number,
public metadata?: any
) {
super(message);
this.name = "MetadataError";
}
}
// Placeholder classes
class MetadataStore {
constructor(config: any) {}
async store(metadata: ExtractedMetadata): Promise<void> {}
}
class MetadataIndexer {
constructor(config: any) {}
async index(metadata: ExtractedMetadata): Promise<void> {}
}
class MetadataSearchEngine {
constructor(config: any) {}
async search(params: SearchParameters): Promise<any[]> {
return [];
}
}
class MetadataVersioning {
async createVersion(metadata: ExtractedMetadata): Promise<void> {}
}
class MetadataAuditLogger {
async logExtraction(
extractionId: string,
metadata: ExtractedMetadata
): Promise<void> {}
}
class MetadataScheduler {
constructor() {}
}
class ExifExtractor implements MetadataExtractor {
getName(): string {
return "exif";
}
async extract(filePath: string, options: any): Promise<any> {
return {};
}
}
class VideoMetadataExtractor implements MetadataExtractor {
getName(): string {
return "video";
}
async extract(filePath: string, options: any): Promise<any> {
return {};
}
}
class FFProbeExtractor implements MetadataExtractor {
getName(): string {
return "ffprobe";
}
async extract(filePath: string, options: any): Promise<any> {
return {};
}
}
class AudioMetadataExtractor implements MetadataExtractor {
getName(): string {
return "audio";
}
async extract(filePath: string, options: any): Promise<any> {
return {};
}
}
class ID3Extractor implements MetadataExtractor {
getName(): string {
return "id3";
}
async extract(filePath: string, options: any): Promise<any> {
return {};
}
}
class PDFMetadataExtractor implements MetadataExtractor {
getName(): string {
return "pdf";
}
async extract(filePath: string, options: any): Promise<any> {
return {};
}
}
class OfficeMetadataExtractor implements MetadataExtractor {
getName(): string {
return "office";
}
async extract(filePath: string, options: any): Promise<any> {
return {};
}
}
class MimeTypeExtractor implements MetadataExtractor {
getName(): string {
return "mime";
}
async extract(filePath: string, options: any): Promise<any> {
return {};
}
}
class FileSystemExtractor implements MetadataExtractor {
getName(): string {
return "file";
}
async extract(filePath: string, options: any): Promise<any> {
return {};
}
}
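As a rough illustration of how the search side is meant to be called (the engine configuration is a placeholder and the query values are invented; field names follow the interfaces above):
// Usage sketch: find recent 4K videos for a user (illustrative values).
const metadataEngine = new MetadataManagementEngine({
  storage: {},
  indexing: {},
  search: {},
});
async function findRecent4kVideos(userId: string) {
  const result = await metadataEngine.searchMetadata({
    text: "vacation",
    filters: {
      mimeTypes: ["video/mp4", "video/quicktime"],
      dateRange: { start: new Date("2024-01-01"), end: new Date() },
      dimensions: { minWidth: 3840, minHeight: 2160 },
      userId,
    },
    sort: [{ field: "createdAt", direction: "desc" }],
    limit: 20,
    includeFacets: true,
  });
  for (const item of result.results) {
    console.log(`${item.filename} (${item.mimeType}, ${item.size} bytes)`);
  }
  return result;
}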
File Security and Access Control: Protecting Your Digital Assets
Enterprise-Grade File Security System
// Comprehensive file security and access control system
class FileSecurityManager {
private accessController: AccessController;
private encryptionService: FileEncryptionService;
private auditLogger: SecurityAuditLogger;
private permissionManager: PermissionManager;
private quarantineSystem: QuarantineSystem;
private integrityMonitor: FileIntegrityMonitor;
private virusScanner: VirusScanner;
constructor(config: FileSecurityConfig) {
this.accessController = new AccessController(config.access);
this.encryptionService = new FileEncryptionService(config.encryption);
this.auditLogger = new SecurityAuditLogger(config.audit);
this.permissionManager = new PermissionManager(config.permissions);
this.quarantineSystem = new QuarantineSystem(config.quarantine);
this.integrityMonitor = new FileIntegrityMonitor(config.integrity);
this.virusScanner = new VirusScanner(config.virusScanning);
this.setupSecurityMonitoring();
this.setupPeriodicScans();
}
// Secure file upload with comprehensive security checks
async secureFileUpload(
uploadRequest: SecureUploadRequest
): Promise<SecureUploadResult> {
const securityId = this.generateSecurityId();
const startTime = Date.now();
try {
console.log(`Starting secure file upload: ${securityId}`);
// Phase 1: Pre-upload security validation
await this.validateUploadSecurity(uploadRequest, securityId);
// Phase 2: Real-time threat scanning
const scanResult = await this.performThreatScanning(
uploadRequest.filePath,
securityId
);
if (scanResult.threatsDetected) {
await this.quarantineFile(
uploadRequest.filePath,
scanResult,
securityId
);
throw new SecurityError(
"File contains security threats",
"THREATS_DETECTED",
400,
{ threats: scanResult.threats }
);
}
// Phase 3: Encryption and secure storage
const encryptedFile = await this.encryptFile(
uploadRequest.filePath,
uploadRequest.encryptionOptions || {}
);
// Phase 4: Generate secure access credentials
const accessCredentials = await this.generateAccessCredentials(
uploadRequest,
encryptedFile
);
// Phase 5: Set up access controls
await this.configureAccessControls(
encryptedFile.fileId,
uploadRequest.accessPolicy
);
// Phase 6: Create integrity baseline
await this.createIntegrityBaseline(encryptedFile);
// Log security event
await this.auditLogger.logSecureUpload(securityId, {
fileId: encryptedFile.fileId,
userId: uploadRequest.userId,
encryptionUsed: true,
scanResult,
});
const result: SecureUploadResult = {
securityId,
fileId: encryptedFile.fileId,
accessToken: accessCredentials.accessToken,
encryptionKey: accessCredentials.encryptionKey,
securityLevel: this.calculateSecurityLevel(uploadRequest, scanResult),
uploadedAt: Date.now(),
expiresAt: accessCredentials.expiresAt,
};
console.log(`Secure file upload completed: ${securityId}`);
return result;
} catch (error) {
await this.auditLogger.logSecurityFailure(securityId, error);
console.error(`Secure file upload failed: ${securityId}`, error);
throw error;
}
}
private async validateUploadSecurity(
request: SecureUploadRequest,
securityId: string
): Promise<void> {
// Validate user permissions
const hasUploadPermission = await this.accessController.checkPermission(
request.userId,
"file.upload",
request.context
);
if (!hasUploadPermission) {
throw new SecurityError(
"User does not have upload permission",
"PERMISSION_DENIED",
403
);
}
// Check file type restrictions
if (
request.restrictedTypes &&
request.restrictedTypes.includes(request.mimeType)
) {
throw new SecurityError(
"File type not allowed",
"FILE_TYPE_RESTRICTED",
400
);
}
// Validate file size limits based on security level
const maxSize = await this.getSecurityBasedSizeLimit(
request.userId,
request.securityLevel || "standard"
);
if (request.fileSize > maxSize) {
throw new SecurityError(
"File size exceeds security limits",
"FILE_SIZE_EXCEEDED",
413,
{ maxSize, actualSize: request.fileSize }
);
}
// Check rate limiting
const rateLimitResult = await this.accessController.checkRateLimit(
request.userId,
"secure_upload"
);
if (!rateLimitResult.allowed) {
throw new SecurityError(
"Upload rate limit exceeded",
"RATE_LIMITED",
429,
{ retryAfter: rateLimitResult.retryAfter }
);
}
}
private async performThreatScanning(
filePath: string,
securityId: string
): Promise<ThreatScanResult> {
console.log(`Performing threat scanning: ${securityId}`);
const scanResults: ScanResult[] = [];
// Virus scanning
const virusScan = await this.virusScanner.scanFile(filePath);
scanResults.push({
scanner: "virus",
clean: !virusScan.infected,
threats: virusScan.threats || [],
severity: virusScan.infected ? "high" : "none",
});
// Malware detection
const malwareScan = await this.scanForMalware(filePath);
scanResults.push({
scanner: "malware",
clean: !malwareScan.detected,
threats: malwareScan.threats || [],
severity: malwareScan.detected ? "high" : "none",
});
// Content analysis
const contentScan = await this.analyzeFileContent(filePath);
scanResults.push({
scanner: "content",
clean: contentScan.safe,
threats: contentScan.issues || [],
severity: contentScan.safe ? "none" : contentScan.severity || "medium",
});
// Metadata scanning
const metadataScan = await this.scanMetadata(filePath);
scanResults.push({
scanner: "metadata",
clean: metadataScan.clean,
threats: metadataScan.threats || [],
severity: metadataScan.clean ? "none" : "low",
});
const allThreats = scanResults.flatMap((r) => r.threats);
const threatsDetected = allThreats.length > 0;
const maxSeverity = this.getMaxSeverity(scanResults.map((r) => r.severity));
return {
scanId: this.generateScanId(),
threatsDetected,
threats: allThreats,
severity: maxSeverity,
scanResults,
scannedAt: Date.now(),
};
}
private async encryptFile(
filePath: string,
options: EncryptionOptions
): Promise<EncryptedFile> {
console.log(`Encrypting file: ${filePath}`);
const algorithm = options.algorithm || "aes-256-gcm";
const keyDerivation = options.keyDerivation || "pbkdf2";
// Generate encryption key
const encryptionKey = await this.encryptionService.generateKey(
algorithm,
keyDerivation
);
// Encrypt file
const encryptedData = await this.encryptionService.encryptFile(
filePath,
encryptionKey,
{
algorithm,
chunkSize: options.chunkSize || 64 * 1024, // 64KB chunks
compressionBefore: options.compress || false,
}
);
// Generate secure file ID
const fileId = this.generateSecureFileId();
// Store encrypted file
const storageResult = await this.storeEncryptedFile(fileId, encryptedData);
return {
fileId,
originalPath: filePath,
encryptedPath: storageResult.path,
encryptionKey: encryptionKey.toString("base64"),
algorithm,
keyDerivation,
checksum: storageResult.checksum,
size: encryptedData.length,
encryptedAt: Date.now(),
};
}
// Secure file access with authentication and authorization
async secureFileAccess(
accessRequest: FileAccessRequest
): Promise<SecureFileAccess> {
const accessId = this.generateAccessId();
try {
console.log(`Processing secure file access: ${accessId}`);
// Validate access token
const tokenValidation = await this.validateAccessToken(
accessRequest.accessToken
);
if (!tokenValidation.valid) {
throw new SecurityError(
"Invalid or expired access token",
"INVALID_TOKEN",
401
);
}
// Check file permissions
const hasAccess = await this.accessController.checkFileAccess(
tokenValidation.userId,
accessRequest.fileId,
accessRequest.operation
);
if (!hasAccess) {
throw new SecurityError("Access denied to file", "ACCESS_DENIED", 403);
}
// Get file metadata and encryption info
const fileInfo = await this.getSecureFileInfo(accessRequest.fileId);
// Check integrity
await this.verifyFileIntegrity(fileInfo);
// Decrypt file for access
const decryptedStream = await this.createDecryptedStream(
fileInfo,
accessRequest
);
// Log access event
await this.auditLogger.logFileAccess(accessId, {
userId: tokenValidation.userId,
fileId: accessRequest.fileId,
operation: accessRequest.operation,
ipAddress: accessRequest.ipAddress,
userAgent: accessRequest.userAgent,
});
return {
accessId,
stream: decryptedStream,
metadata: this.sanitizeFileMetadata(fileInfo),
accessExpiresAt: Date.now() + (accessRequest.accessDuration || 3600000), // 1 hour
};
} catch (error) {
await this.auditLogger.logAccessFailure(accessId, error);
throw error;
}
}
// Advanced permission management
async manageFilePermissions(
request: PermissionManagementRequest
): Promise<PermissionResult> {
try {
// Validate admin permissions
const hasPermissionManagementAccess =
await this.accessController.checkPermission(
request.adminUserId,
"permissions.manage",
{ fileId: request.fileId }
);
if (!hasPermissionManagementAccess) {
throw new SecurityError(
"Permission management access denied",
"ADMIN_ACCESS_DENIED",
403
);
}
switch (request.action) {
case "grant":
return await this.grantFilePermissions(request);
case "revoke":
return await this.revokeFilePermissions(request);
case "update":
return await this.updateFilePermissions(request);
case "list":
return await this.listFilePermissions(request);
default:
throw new SecurityError(
"Invalid permission action",
"INVALID_ACTION",
400
);
}
} catch (error) {
await this.auditLogger.logPermissionManagementFailure(request, error);
throw error;
}
}
private async grantFilePermissions(
request: PermissionManagementRequest
): Promise<PermissionResult> {
if (!request.permissions?.length) {
throw new SecurityError("No permissions provided for grant", "INVALID_REQUEST", 400);
}
const permissions = request.permissions;
for (const permission of permissions) {
await this.permissionManager.grantPermission({
fileId: request.fileId,
userId: permission.userId,
operations: permission.operations,
expiresAt: permission.expiresAt,
constraints: permission.constraints,
});
}
await this.auditLogger.logPermissionGrant(request.fileId, permissions);
return {
action: "grant",
fileId: request.fileId,
permissions,
success: true,
appliedAt: Date.now(),
};
}
private async revokeFilePermissions(
request: PermissionManagementRequest
): Promise<PermissionResult> {
if (!request.userIds?.length) {
throw new SecurityError("No user IDs provided for revocation", "INVALID_REQUEST", 400);
}
const userIds = request.userIds;
for (const userId of userIds) {
await this.permissionManager.revokeAllPermissions(request.fileId, userId);
}
await this.auditLogger.logPermissionRevocation(request.fileId, userIds);
return {
action: "revoke",
fileId: request.fileId,
revokedUsers: userIds,
success: true,
appliedAt: Date.now(),
};
}
// File integrity monitoring
async monitorFileIntegrity(): Promise<IntegrityReport> {
console.log("Starting file integrity monitoring");
const report: IntegrityReport = {
scanStarted: Date.now(),
totalFilesScanned: 0,
integrityViolations: [],
corruptedFiles: [],
modifiedFiles: [],
recommendations: [],
};
// Get all files that need integrity checking
const filesToCheck = await this.integrityMonitor.getFilesForCheck();
report.totalFilesScanned = filesToCheck.length;
for (const file of filesToCheck) {
try {
const integrityResult = await this.checkFileIntegrity(file);
if (!integrityResult.valid) {
const violation: IntegrityViolation = {
fileId: file.fileId,
violationType: integrityResult.violation,
currentHash: integrityResult.currentHash,
expectedHash: integrityResult.expectedHash,
detectedAt: Date.now(),
severity: this.calculateIntegrityViolationSeverity(integrityResult),
};
report.integrityViolations.push(violation);
if (integrityResult.violation === "corruption") {
report.corruptedFiles.push(file.fileId);
} else if (
integrityResult.violation === "unauthorized_modification"
) {
report.modifiedFiles.push(file.fileId);
}
// Take protective action
await this.handleIntegrityViolation(file, violation);
}
} catch (error) {
console.error(`Integrity check failed for ${file.fileId}:`, error);
}
}
// Generate recommendations
report.recommendations = this.generateIntegrityRecommendations(report);
report.scanCompleted = Date.now();
await this.auditLogger.logIntegrityReport(report);
console.log(
`Integrity monitoring completed: ${report.integrityViolations.length} violations found`
);
return report;
}
private async checkFileIntegrity(
file: FileInfo
): Promise<IntegrityCheckResult> {
// Calculate current file hash
const currentHash = await this.calculateFileHash(file.path);
// Get expected hash from integrity baseline
const expectedHash = await this.integrityMonitor.getExpectedHash(
file.fileId
);
if (currentHash !== expectedHash) {
// Determine if it's corruption or unauthorized modification
const violationType = await this.determineViolationType(
file,
currentHash,
expectedHash
);
return {
valid: false,
violation: violationType,
currentHash,
expectedHash,
fileId: file.fileId,
};
}
return {
valid: true,
currentHash,
expectedHash,
fileId: file.fileId,
};
}
// Security analytics and reporting
async generateSecurityReport(
criteria: SecurityReportCriteria
): Promise<SecurityReport> {
const report: SecurityReport = {
criteria,
generatedAt: Date.now(),
// Threat analysis
threatAnalysis: await this.analyzeThreatPatterns(criteria),
// Access patterns
accessAnalysis: await this.analyzeAccessPatterns(criteria),
// Permission audit
permissionAudit: await this.auditPermissions(criteria),
// Integrity status
integrityStatus: await this.getIntegrityStatus(criteria),
// Security metrics
securityMetrics: await this.calculateSecurityMetrics(criteria),
// Recommendations
recommendations: [],
};
// Generate security recommendations
report.recommendations = this.generateSecurityRecommendations(report);
return report;
}
// Utility and helper methods
private calculateSecurityLevel(
request: SecureUploadRequest,
scanResult: ThreatScanResult
): SecurityLevel {
let baseLevel = request.securityLevel || "standard";
if (scanResult.severity === "high") {
return "maximum";
} else if (scanResult.severity === "medium") {
return baseLevel === "minimum" ? "standard" : baseLevel;
}
return baseLevel;
}
private getMaxSeverity(severities: string[]): string {
const severityOrder = ["none", "low", "medium", "high"];
return severities.reduce((max, current) => {
return severityOrder.indexOf(current) > severityOrder.indexOf(max)
? current
: max;
}, "none");
}
private generateSecurityId(): string {
return `sec_${Date.now()}_${Math.random().toString(36).slice(2, 11)}`;
}
private generateScanId(): string {
return `scan_${Date.now()}_${Math.random().toString(36).slice(2, 11)}`;
}
private generateAccessId(): string {
return `access_${Date.now()}_${Math.random().toString(36).slice(2, 11)}`;
}
private generateSecureFileId(): string {
// NOTE: Math.random() is predictable; use crypto.randomBytes() or crypto.randomUUID() when the ID protects real files
return `secure_${Date.now()}_${Math.random().toString(36).slice(2)}`;
}
// Placeholder implementations
private async getSecurityBasedSizeLimit(
userId: string,
securityLevel: string
): Promise<number> {
const limits: Record<string, number> = {
minimum: 100 * 1024 * 1024, // 100MB
standard: 1024 * 1024 * 1024, // 1GB
high: 5 * 1024 * 1024 * 1024, // 5GB
maximum: 10 * 1024 * 1024 * 1024, // 10GB
};
return limits[securityLevel] || limits.standard;
}
private async scanForMalware(filePath: string): Promise<any> {
return { detected: false, threats: [] };
}
private async analyzeFileContent(filePath: string): Promise<any> {
return { safe: true, issues: [], severity: "none" };
}
private async scanMetadata(filePath: string): Promise<any> {
return { clean: true, threats: [] };
}
private async quarantineFile(
filePath: string,
scanResult: ThreatScanResult,
securityId: string
): Promise<void> {
await this.quarantineSystem.quarantine(filePath, scanResult, securityId);
}
private async generateAccessCredentials(
request: SecureUploadRequest,
file: EncryptedFile
): Promise<any> {
return {
accessToken: "secure-token",
encryptionKey: file.encryptionKey,
expiresAt: Date.now() + 86400000, // 24 hours
};
}
private async configureAccessControls(
fileId: string,
policy: any
): Promise<void> {}
private async createIntegrityBaseline(file: EncryptedFile): Promise<void> {}
private async storeEncryptedFile(fileId: string, data: Buffer): Promise<any> {
return { path: `/secure/${fileId}`, checksum: "checksum" };
}
private async validateAccessToken(token: string): Promise<any> {
return { valid: true, userId: "user123" };
}
private async getSecureFileInfo(fileId: string): Promise<any> {
return { fileId, encryptionKey: "key", algorithm: "aes-256-gcm" };
}
private async verifyFileIntegrity(fileInfo: any): Promise<void> {}
private async createDecryptedStream(
fileInfo: any,
request: FileAccessRequest
): Promise<NodeJS.ReadableStream> {
throw new Error("createDecryptedStream implementation needed");
}
private sanitizeFileMetadata(fileInfo: any): any {
return { fileId: fileInfo.fileId, size: fileInfo.size };
}
private setupSecurityMonitoring(): void {}
private setupPeriodicScans(): void {}
private async listFilePermissions(
request: PermissionManagementRequest
): Promise<PermissionResult> {
return {
action: "list",
fileId: request.fileId,
success: true,
appliedAt: Date.now(),
};
}
private async updateFilePermissions(
request: PermissionManagementRequest
): Promise<PermissionResult> {
return {
action: "update",
fileId: request.fileId,
success: true,
appliedAt: Date.now(),
};
}
private calculateIntegrityViolationSeverity(
result: IntegrityCheckResult
): string {
return result.violation === "corruption" ? "high" : "medium";
}
private async handleIntegrityViolation(
file: FileInfo,
violation: IntegrityViolation
): Promise<void> {}
private generateIntegrityRecommendations(report: IntegrityReport): string[] {
return [];
}
private async determineViolationType(
file: FileInfo,
currentHash: string,
expectedHash: string
): Promise<string> {
return "unauthorized_modification";
}
private async calculateFileHash(path: string): Promise<string> {
return "hash-placeholder";
}
private async analyzeThreatPatterns(
criteria: SecurityReportCriteria
): Promise<any> {
return {};
}
private async analyzeAccessPatterns(
criteria: SecurityReportCriteria
): Promise<any> {
return {};
}
private async auditPermissions(
criteria: SecurityReportCriteria
): Promise<any> {
return {};
}
private async getIntegrityStatus(
criteria: SecurityReportCriteria
): Promise<any> {
return {};
}
private async calculateSecurityMetrics(
criteria: SecurityReportCriteria
): Promise<any> {
return {};
}
private generateSecurityRecommendations(report: SecurityReport): string[] {
return [];
}
}
// Supporting interfaces and classes
interface FileSecurityConfig {
access: any;
encryption: any;
audit: any;
permissions: any;
quarantine: any;
integrity: any;
virusScanning: any;
}
interface SecureUploadRequest {
filePath: string;
fileSize: number;
mimeType: string;
userId: string;
encryptionOptions?: EncryptionOptions;
accessPolicy: any;
securityLevel?: SecurityLevel;
restrictedTypes?: string[];
context?: any;
}
interface EncryptionOptions {
algorithm?: string;
keyDerivation?: string;
chunkSize?: number;
compress?: boolean;
}
type SecurityLevel = "minimum" | "standard" | "high" | "maximum";
interface ThreatScanResult {
scanId: string;
threatsDetected: boolean;
threats: string[];
severity: string;
scanResults: ScanResult[];
scannedAt: number;
}
interface ScanResult {
scanner: string;
clean: boolean;
threats: string[];
severity: string;
}
interface EncryptedFile {
fileId: string;
originalPath: string;
encryptedPath: string;
encryptionKey: string;
algorithm: string;
keyDerivation: string;
checksum: string;
size: number;
encryptedAt: number;
}
interface SecureUploadResult {
securityId: string;
fileId: string;
accessToken: string;
encryptionKey: string;
securityLevel: SecurityLevel;
uploadedAt: number;
expiresAt: number;
}
interface FileAccessRequest {
fileId: string;
accessToken: string;
operation: "read" | "write" | "delete";
accessDuration?: number;
ipAddress?: string;
userAgent?: string;
}
interface SecureFileAccess {
accessId: string;
stream: NodeJS.ReadableStream;
metadata: any;
accessExpiresAt: number;
}
interface PermissionManagementRequest {
fileId: string;
adminUserId: string;
action: "grant" | "revoke" | "update" | "list";
permissions?: FilePermission[];
userIds?: string[];
}
interface FilePermission {
userId: string;
operations: string[];
expiresAt?: number;
constraints?: any;
}
interface PermissionResult {
action: string;
fileId: string;
success: boolean;
appliedAt: number;
permissions?: FilePermission[];
revokedUsers?: string[];
}
interface IntegrityReport {
scanStarted: number;
scanCompleted?: number;
totalFilesScanned: number;
integrityViolations: IntegrityViolation[];
corruptedFiles: string[];
modifiedFiles: string[];
recommendations: string[];
}
interface IntegrityViolation {
fileId: string;
violationType: string;
currentHash: string;
expectedHash: string;
detectedAt: number;
severity: string;
}
interface IntegrityCheckResult {
valid: boolean;
violation?: string;
currentHash: string;
expectedHash: string;
fileId: string;
}
interface FileInfo {
fileId: string;
path: string;
size: number;
mimeType: string;
}
interface SecurityReportCriteria {
dateRange?: { start: Date; end: Date };
fileIds?: string[];
userIds?: string[];
securityLevels?: SecurityLevel[];
}
interface SecurityReport {
criteria: SecurityReportCriteria;
generatedAt: number;
threatAnalysis: any;
accessAnalysis: any;
permissionAudit: any;
integrityStatus: any;
securityMetrics: any;
recommendations: string[];
}
class SecurityError extends Error {
constructor(
message: string,
public code: string,
public status: number,
public metadata?: any
) {
super(message);
this.name = "SecurityError";
}
}
// Placeholder classes
class AccessController {
constructor(config: any) {}
async checkPermission(
userId: string,
permission: string,
context?: any
): Promise<boolean> {
return true;
}
async checkRateLimit(userId: string, operation: string): Promise<any> {
return { allowed: true };
}
async checkFileAccess(
userId: string,
fileId: string,
operation: string
): Promise<boolean> {
return true;
}
}
class FileEncryptionService {
constructor(config: any) {}
async generateKey(algorithm: string, derivation: string): Promise<Buffer> {
return Buffer.from("key");
}
async encryptFile(path: string, key: Buffer, options: any): Promise<Buffer> {
return Buffer.from("encrypted");
}
}
class SecurityAuditLogger {
constructor(config: any) {}
async logSecureUpload(id: string, info: any): Promise<void> {}
async logSecurityFailure(id: string, error: any): Promise<void> {}
async logFileAccess(id: string, info: any): Promise<void> {}
async logAccessFailure(id: string, error: any): Promise<void> {}
async logPermissionManagementFailure(
request: any,
error: any
): Promise<void> {}
async logPermissionGrant(fileId: string, permissions: any): Promise<void> {}
async logPermissionRevocation(
fileId: string,
userIds: string[]
): Promise<void> {}
async logIntegrityReport(report: IntegrityReport): Promise<void> {}
}
class PermissionManager {
constructor(config: any) {}
async grantPermission(permission: any): Promise<void> {}
async revokeAllPermissions(fileId: string, userId: string): Promise<void> {}
}
class QuarantineSystem {
constructor(config: any) {}
async quarantine(
path: string,
scanResult: ThreatScanResult,
securityId: string
): Promise<void> {}
}
class FileIntegrityMonitor {
constructor(config: any) {}
async getFilesForCheck(): Promise<FileInfo[]> {
return [];
}
async getExpectedHash(fileId: string): Promise<string> {
return "hash";
}
}
class VirusScanner {
constructor(config: any) {}
async scanFile(path: string): Promise<any> {
return { infected: false, threats: [] };
}
}
Key Takeaways
Large-scale file handling and media processing isn’t just about storing and serving files—it’s about building distributed systems that can stream massive files, maintain data integrity, enforce security at scale, and recover from disasters while preserving every bit of user data.
Essential production file handling patterns:
- Streaming architecture with chunked uploads, resumable transfers, and parallel downloads (sketched in code after this list)
- Comprehensive metadata systems that extract, index, and search across millions of files
- Enterprise security with encryption, access control, threat scanning, and integrity monitoring
- Intelligent backup strategies with automated archiving and tested disaster recovery
- Format-agnostic processing that handles everything from documents to 8K video
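To make the first pattern concrete, here is a minimal sketch of resumable, chunked uploads. Everything here is illustrative: uploadChunk stands in for whatever multipart API your storage provider exposes, and the completed set represents chunk indexes the server has already acknowledged.
import { createReadStream } from "node:fs";
import { stat } from "node:fs/promises";

const CHUNK_SIZE = 8 * 1024 * 1024; // 8MB keeps memory flat and makes retries cheap

async function uploadFileInChunks(
  path: string,
  uploadChunk: (index: number, data: Buffer) => Promise<void>, // hypothetical provider call
  completed: Set<number> = new Set() // chunk indexes already acknowledged (enables resume)
): Promise<void> {
  const { size } = await stat(path);
  const totalChunks = Math.ceil(size / CHUNK_SIZE);
  for (let index = 0; index < totalChunks; index++) {
    if (completed.has(index)) continue; // resume: skip chunks the server already has
    const start = index * CHUNK_SIZE;
    const end = Math.min(start + CHUNK_SIZE, size) - 1; // createReadStream's end is inclusive
    const pieces: Buffer[] = [];
    for await (const piece of createReadStream(path, { start, end })) {
      pieces.push(piece as Buffer);
    }
    await uploadChunk(index, Buffer.concat(pieces));
    completed.add(index);
  }
}
Because each chunk is independent, retries and parallel workers fall out naturally; persist the completed set and a crashed upload resumes instead of restarting from byte zero.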
The enterprise-grade file system framework:
- Use streaming processing to handle terabyte files without memory exhaustion (see the pipeline sketch after this list)
- Implement metadata extraction pipelines that scale to millions of files
- Build layered security with encryption, access control, and continuous monitoring
- Design backup and recovery systems that can restore petabytes in hours, not days
- Plan format conversion pipelines that adapt content for any device or platform
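The streaming item in that framework is less exotic than it sounds. A minimal sketch using Node's built-in streams: pipeline() moves data through each stage in small chunks with backpressure, so a 2TB archive job uses roughly the same memory as a 2MB one.
import { createReadStream, createWriteStream } from "node:fs";
import { createGzip } from "node:zlib";
import { pipeline } from "node:stream/promises";

// Compress a file of any size with constant memory: each stage handles one chunk at a time,
// and backpressure pauses the reader whenever the writer falls behind.
async function compressForArchive(sourcePath: string, targetPath: string): Promise<void> {
  await pipeline(
    createReadStream(sourcePath),
    createGzip({ level: 6 }), // moderate compression; tune per archive tier
    createWriteStream(targetPath)
  );
}
Swap the gzip stage for an encryption or hashing transform and the same shape covers most of the framework above.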
File system best practices:
- Never load large files entirely into memory; always use streaming
- Extract and index metadata asynchronously to avoid blocking uploads
- Encrypt sensitive data at rest and in transit with proper key management
- Test disaster recovery regularly with actual data restoration scenarios
- Monitor file integrity continuously to detect corruption and tampering
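The last two practices, tested recovery and continuous integrity monitoring, both hinge on one primitive: a streaming hash you can recompute and compare against a baseline recorded at upload time. A minimal sketch, with illustrative function names:
import { createReadStream } from "node:fs";
import { createHash, timingSafeEqual } from "node:crypto";

// Hash a file of any size without loading it into memory
async function hashFile(path: string): Promise<string> {
  const hash = createHash("sha256");
  for await (const chunk of createReadStream(path)) hash.update(chunk as Buffer);
  return hash.digest("hex");
}

// Compare the current hash against the stored integrity baseline
async function matchesBaseline(path: string, expectedHex: string): Promise<boolean> {
  const actual = Buffer.from(await hashFile(path), "hex");
  const expected = Buffer.from(expectedHex, "hex");
  return actual.length === expected.length && timingSafeEqual(actual, expected);
}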
The scalability decision framework:
- Use chunked storage for files larger than 100MB to enable parallel operations (see the decision sketch after this list)
- Use metadata databases for fast search and filtering across file collections
- Use distributed encryption for security without sacrificing performance
- Use geographic replication for disaster recovery and global content delivery
- Use automated archiving for cost-effective long-term storage management
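As a rough illustration of the first rule, a storage-strategy helper can make the chunking decision explicit. The thresholds below are illustrative assumptions, not prescriptions; tune them to your provider's multipart limits and your traffic profile.
type StorageStrategy = { mode: "single" | "chunked"; chunkSize?: number };

// Files under the threshold stay as single objects; larger files are chunked so
// uploads, downloads, and repairs can run in parallel.
function chooseStorageStrategy(fileSize: number): StorageStrategy {
  const CHUNK_THRESHOLD = 100 * 1024 * 1024; // 100MB, per the framework above
  if (fileSize < CHUNK_THRESHOLD) return { mode: "single" };
  // Bigger chunks for very large files keep the chunk count (and metadata overhead) bounded
  const chunkSize =
    fileSize > 10 * 1024 * 1024 * 1024 ? 64 * 1024 * 1024 : 16 * 1024 * 1024;
  return { mode: "chunked", chunkSize };
}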
What’s Next?
We’ve completed Phase 8 of our backend development journey, mastering the art of integrating with external services and handling files at enterprise scale. Next, we’ll dive into Phase 9: Advanced Architecture & Patterns, where we’ll explore Advanced Design Patterns, CQRS and Event Sourcing, Scalability & High Availability, and Performance & Optimization.
We’ll tackle the architectural challenges that separate good developers from great ones—from implementing CQRS patterns that handle millions of commands per second to building high-availability systems that maintain 99.99% uptime even when entire data centers go offline.
Because handling files and integrations is just the foundation. Building architectures that scale to millions of users while maintaining performance, consistency, and reliability—that’s where backend development becomes true systems engineering.