Advanced Data Management - 1/2
From Basic CRUD to Data Management Mastery
You’ve mastered databases, can query efficiently across SQL and NoSQL systems, and understand when to use each paradigm. You can design schemas, handle complex relationships, and even manage database operations in production. Your data fundamentals are rock solid. But here’s the reality check that hits every developer building real applications: knowing how to store and retrieve data is just the beginning. Managing data effectively means handling caching, sessions, file uploads, media processing, and serialization at scale.
The data management gap that trips up developers:
// Your basic CRUD operations work great in development
const getUser = async (id) => {
const user = await db.users.findOne({ id });
return user;
};
const getPosts = async (userId) => {
const posts = await db.posts.find({ userId }).sort({ createdAt: -1 });
return posts;
};
// But in production with 10,000+ users:
// 1. Database gets hammered with repeated queries for the same data
// 2. User sessions are scattered across server memory, breaking load balancing
// 3. File uploads crash your server when users upload 100MB videos
// 4. Profile images load slowly and consume massive bandwidth
// 5. Data serialization becomes a performance bottleneck
The uncomfortable scalability truth:
// Development reality: Everything works beautifully
const userDashboard = async (userId) => {
const user = await getUser(userId);
const posts = await getPosts(userId);
const followers = await getFollowers(userId);
const analytics = await getAnalytics(userId);
return { user, posts, followers, analytics }; // Fast and simple!
};
// Production reality: Same code, 1000x slower
// - 4 separate database queries per dashboard load
// - Popular users' data queried hundreds of times per second
// - Database connection pool exhausted
// - Response times measured in seconds, not milliseconds
// - Users abandoning your app for faster alternatives
The hard truth: Modern applications require sophisticated data management beyond basic CRUD operations. You need caching systems that eliminate redundant queries, session management that scales across servers, file handling that processes media efficiently, and serialization strategies that maintain performance as data complexity grows.
Production-ready data management requires mastery of:
- Intelligent caching strategies that dramatically reduce database load and response times
- Scalable session management that works across distributed server deployments
- Professional file upload handling that processes images, videos, and documents safely
- Efficient media processing that optimizes images and handles large files gracefully
- Smart data serialization that maintains performance with complex data structures
This article transforms you from a database user into a data management architect. You’ll learn caching patterns that cut response times to milliseconds, session strategies that scale horizontally across servers, file handling that stands up to hostile input, and serialization techniques that maintain speed as complexity grows.
Caching Strategies: Eliminate Database Load
The Performance Multiplier
Caching is the difference between good and great applications:
// ❌ Without caching: Every request hits the database
const getUserProfile = async (userId) => {
// Database query every single time
const user = await db.users.findOne({ id: userId });
const posts = await db.posts.find({ userId }).limit(10);
const followers = await db.follows.count({ followedId: userId });
return { user, posts, followerCount: followers };
// Problems:
// - Popular profiles queried hundreds of times per minute
// - Database becomes the bottleneck
// - Response times increase with user count
// - Server costs scale linearly with traffic
};
// ✅ With intelligent caching: Database rarely touched
const getUserProfileCached = async (userId) => {
const cacheKey = `user:${userId}:profile`;
// Check cache first
let profile = await redis.get(cacheKey);
if (profile) {
return JSON.parse(profile); // Instant response!
}
// Cache miss: Query database
const user = await db.users.findOne({ id: userId });
const posts = await db.posts.find({ userId }).limit(10);
const followers = await db.follows.count({ followedId: userId });
profile = { user, posts, followerCount: followers };
// Store in cache for 5 minutes
await redis.setex(cacheKey, 300, JSON.stringify(profile));
return profile;
// Benefits:
// - 95%+ requests served from cache (millisecond response)
// - Database load reduced by 20x
// - Handles traffic spikes gracefully
// - Server costs stay flat as traffic grows
};
Multi-Level Caching Architecture
Professional applications use layered caching:
// Complete caching strategy with multiple levels
class CacheManager {
constructor() {
this.memoryCache = new Map(); // L1: In-memory cache (fastest)
this.redisClient = redis.createClient(); // L2: Redis cache (shared)
this.maxMemoryItems = 1000;
this.memoryTTL = 60 * 1000; // 1 minute
this.redisTTL = 300; // 5 minutes
}
async get(key) {
// L1: Check memory cache first
const memoryCached = this.memoryCache.get(key);
if (memoryCached && Date.now() < memoryCached.expires) {
console.log(`Cache HIT (memory): ${key}`);
return memoryCached.data;
}
// L2: Check Redis cache
const redisCached = await this.redisClient.get(key);
if (redisCached) {
console.log(`Cache HIT (redis): ${key}`);
const data = JSON.parse(redisCached);
// Populate L1 cache for next time
this.setMemory(key, data);
return data;
}
console.log(`Cache MISS: ${key}`);
return null;
}
async set(key, data, ttl = this.redisTTL) {
// Store in both levels
this.setMemory(key, data);
await this.redisClient.setex(key, ttl, JSON.stringify(data));
}
setMemory(key, data) {
// Evict the oldest entry when full (a simple FIFO approximation of LRU)
if (this.memoryCache.size >= this.maxMemoryItems) {
const firstKey = this.memoryCache.keys().next().value;
this.memoryCache.delete(firstKey);
}
this.memoryCache.set(key, {
data,
expires: Date.now() + this.memoryTTL,
});
}
async invalidate(pattern) {
// Clear memory cache entries matching pattern
for (const [key] of this.memoryCache) {
if (key.includes(pattern)) {
this.memoryCache.delete(key);
}
}
// Clear Redis cache entries
const keys = await this.redisClient.keys(`*${pattern}*`);
if (keys.length > 0) {
await this.redisClient.del(keys);
}
}
}
const cache = new CacheManager();
// Smart cache patterns for different data types
const DataCache = {
// User profiles: Cache aggressively, invalidate on updates
async getUserProfile(userId) {
const key = `user:${userId}:profile`;
let profile = await cache.get(key);
if (!profile) {
profile = await db.users.findOne({ id: userId });
await cache.set(key, profile, 1800); // 30 minutes
}
return profile;
},
// Posts feed: Short cache, frequently updated
async getUserFeed(userId, page = 0) {
const key = `feed:${userId}:${page}`;
let feed = await cache.get(key);
if (!feed) {
feed = await db.posts
.find({ userId })
.sort({ createdAt: -1 })
.skip(page * 20)
.limit(20);
await cache.set(key, feed, 300); // 5 minutes
}
return feed;
},
// Popular content: Cache with warming
async getPopularPosts() {
const key = "posts:popular";
let posts = await cache.get(key);
if (!posts) {
posts = await db.posts
.find({ published: true })
.sort({ views: -1, likes: -1 })
.limit(50);
await cache.set(key, posts, 600); // 10 minutes
// Background refresh before expiry
setTimeout(() => {
this.warmPopularPosts();
}, 480000); // Refresh at 8 minutes
}
return posts;
},
async warmPopularPosts() {
const posts = await db.posts
.find({ published: true })
.sort({ views: -1, likes: -1 })
.limit(50);
await cache.set("posts:popular", posts, 600);
},
// Cache invalidation on data changes
async invalidateUserCache(userId) {
await cache.invalidate(`user:${userId}`);
await cache.invalidate(`feed:${userId}`);
},
async invalidatePostCache(postId, userId) {
await cache.invalidate(`post:${postId}`);
await cache.invalidate(`feed:${userId}`);
await cache.invalidate("posts:popular");
},
};
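To see where invalidation hooks into a write path, here’s a minimal sketch of a post-creation endpoint that calls the helpers above after the insert. The Express app, `db` handle, auth middleware populating `req.user`, and the `generateId()` helper are assumed to exist as in the surrounding snippets.
// Hypothetical Express route: create a post, then invalidate caches that include it
app.post("/posts", async (req, res) => {
  try {
    const userId = req.user.id; // assumes auth middleware populated req.user
    const post = {
      id: generateId(),
      userId,
      title: req.body.title,
      body: req.body.body,
      published: true,
      createdAt: new Date(),
    };
    await db.posts.insertOne(post);
    // Drop stale cached copies of this user's feed and the popular list
    await DataCache.invalidatePostCache(post.id, userId);
    res.status(201).json(post);
  } catch (error) {
    console.error("Post creation failed:", error);
    res.status(500).json({ error: "Failed to create post" });
  }
});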
Write-through and write-back caching patterns:
// Write-through: Update cache and database together
const updateUserProfile = async (userId, updates) => {
try {
// Update database first
const updatedUser = await db.users.findOneAndUpdate(
{ id: userId },
{ $set: updates },
{ returnDocument: "after" }
);
// Update cache immediately (write-through)
const cacheKey = `user:${userId}:profile`;
await cache.set(cacheKey, updatedUser, 1800);
// Invalidate related caches
await DataCache.invalidateUserCache(userId);
return updatedUser;
} catch (error) {
// If cache update fails, invalidate to prevent stale data
await cache.invalidate(`user:${userId}`);
throw error;
}
};
// Write-back: Update cache immediately, database asynchronously
const updateUserActivity = async (userId, activity) => {
const cacheKey = `user:${userId}:activity`;
// Get current activity from cache
let currentActivity = (await cache.get(cacheKey)) || {
lastSeen: null,
actionsToday: 0,
streak: 0,
};
// Update in memory
currentActivity = {
...currentActivity,
lastSeen: new Date(),
actionsToday: currentActivity.actionsToday + 1,
...activity,
};
// Update cache immediately
await cache.set(cacheKey, currentActivity, 300);
// Schedule database update (write-back)
setImmediate(async () => {
try {
await db.users.updateOne(
{ id: userId },
{ $set: { lastActivity: currentActivity } }
);
} catch (error) {
console.error("Background activity update failed:", error);
// Cache will expire, forcing fresh database read
}
});
return currentActivity;
};
CDN and Asset Caching
Content Delivery Network integration:
// CDN-backed asset management
const AWS = require("aws-sdk"); // AWS SDK v2
class AssetManager {
constructor() {
this.cdnBase = "https://cdn.yourdomain.com";
this.s3Client = new AWS.S3();
this.cloudfront = new AWS.CloudFront();
}
async uploadAsset(file, category) {
const timestamp = Date.now();
const fileName = `${category}/${timestamp}-${file.originalname}`;
// Upload to S3
const uploadParams = {
Bucket: "your-assets-bucket",
Key: fileName,
Body: file.buffer,
ContentType: file.mimetype,
CacheControl: "public, max-age=31536000", // 1 year cache
Metadata: {
uploadedAt: timestamp.toString(),
originalName: file.originalname,
},
};
const uploadResult = await this.s3Client.upload(uploadParams).promise();
// Generate CDN URLs for different sizes
const urls = {
original: `${this.cdnBase}/${fileName}`,
thumbnail: `${this.cdnBase}/${fileName}?w=150&h=150&fit=crop`,
medium: `${this.cdnBase}/${fileName}?w=800&h=600&fit=inside`,
large: `${this.cdnBase}/${fileName}?w=1200&h=900&fit=inside`,
};
// Cache asset metadata
const assetData = {
id: generateId(),
fileName,
originalName: file.originalname,
size: file.size,
mimetype: file.mimetype,
urls,
uploadedAt: new Date(timestamp),
};
await cache.set(`asset:${assetData.id}`, assetData, 86400); // 24 hours
return assetData;
}
async getAsset(assetId) {
// Check cache first
let asset = await cache.get(`asset:${assetId}`);
if (!asset) {
// Fallback to database
asset = await db.assets.findOne({ id: assetId });
if (asset) {
await cache.set(`asset:${assetId}`, asset, 86400);
}
}
return asset;
}
async invalidateAsset(fileName) {
// Create CloudFront invalidation
const invalidationParams = {
DistributionId: "YOUR_CLOUDFRONT_DISTRIBUTION_ID",
InvalidationBatch: {
CallerReference: Date.now().toString(),
Paths: {
Quantity: 1,
Items: [`/${fileName}*`], // Invalidate all sizes
},
},
};
await this.cloudfront.createInvalidation(invalidationParams).promise();
// Clear local cache
const keys = await cache.redisClient.keys(`asset:*${fileName}*`);
if (keys.length > 0) {
await cache.redisClient.del(keys);
}
}
}
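A quick usage sketch for AssetManager, assuming multer with memory storage and an authenticated request; the route path and `category` field are illustrative, not part of the class above:
// Hypothetical route wiring AssetManager into Express
const multer = require("multer");
const assetManager = new AssetManager();
const upload = multer({ storage: multer.memoryStorage() });

app.post("/assets", upload.single("file"), async (req, res) => {
  try {
    if (!req.file) {
      return res.status(400).json({ error: "No file provided" });
    }
    // Category could come from the client or be derived server-side
    const asset = await assetManager.uploadAsset(req.file, req.body.category || "misc");
    res.status(201).json(asset);
  } catch (error) {
    console.error("Asset upload failed:", error);
    res.status(500).json({ error: "Upload failed" });
  }
});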
Session Management: Scalable User State
Beyond Server Memory Sessions
The session scalability problem:
// ❌ In-memory sessions: Don't scale beyond one server
const express = require("express");
const session = require("express-session");
const app = express();
app.use(
session({
secret: "your-secret",
resave: false,
saveUninitialized: false,
cookie: { maxAge: 24 * 60 * 60 * 1000 }, // 24 hours
})
);
// Problems with memory sessions:
// 1. Sessions lost when server restarts
// 2. Load balancing breaks (sticky sessions required)
// 3. Can't scale horizontally
// 4. Memory usage grows with concurrent users
// 5. No session sharing between services
Professional session management with Redis:
// ✅ Redis-backed sessions: Scale infinitely
const RedisStore = require("connect-redis")(session); // connect-redis v6-and-earlier API (v7+ exports the store class directly)
const redis = require("redis");
const redisClient = redis.createClient({
host: process.env.REDIS_HOST,
port: process.env.REDIS_PORT,
password: process.env.REDIS_PASSWORD,
db: 1, // Separate database for sessions
retry_strategy: (options) => {
if (options.error && options.error.code === "ECONNREFUSED") {
console.error("Redis server connection refused");
}
if (options.total_retry_time > 1000 * 60 * 60) {
return new Error("Redis retry time exhausted");
}
if (options.attempt > 10) {
return undefined; // Stop retrying
}
return Math.min(options.attempt * 100, 3000);
},
});
app.use(
session({
store: new RedisStore({
client: redisClient,
prefix: "sess:", // Namespace sessions
ttl: 86400, // 24 hours in seconds
}),
secret: process.env.SESSION_SECRET,
resave: false,
saveUninitialized: false,
rolling: true, // Reset expiry on activity
cookie: {
secure: process.env.NODE_ENV === "production", // HTTPS only in prod
httpOnly: true, // Prevent XSS
maxAge: 24 * 60 * 60 * 1000,
sameSite: "strict", // CSRF protection
},
})
);
// Benefits of Redis sessions:
// ✅ Survive server restarts
// ✅ Work with any load balancer
// ✅ Scale to millions of users
// ✅ Share sessions across microservices
// ✅ Persistent and reliable
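Once the Redis store is wired in, handlers read and write `req.session` exactly as before, on any server behind the load balancer. A minimal sketch, assuming login stores a `userId` field on the session and that an `authenticate()` helper exists:
// Hypothetical guard: any server behind the load balancer can validate the session
const requireLogin = (req, res, next) => {
  if (!req.session || !req.session.userId) {
    return res.status(401).json({ error: "Not authenticated" });
  }
  next();
};

app.post("/login", async (req, res) => {
  const user = await authenticate(req.body.email, req.body.password); // assumed helper
  if (!user) return res.status(401).json({ error: "Invalid credentials" });
  req.session.userId = user.id; // persisted to Redis by connect-redis
  res.json({ success: true });
});

app.get("/me", requireLogin, async (req, res) => {
  const user = await db.users.findOne({ id: req.session.userId });
  res.json(user);
});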
Advanced Session Patterns
JWT vs Session Store comparison:
// Session-based authentication
const SessionAuth = {
async createSession(userId, userAgent, ipAddress) {
const sessionId = generateSecureId();
const sessionData = {
userId,
createdAt: new Date(),
lastActivity: new Date(),
userAgent,
ipAddress,
isActive: true,
permissions: await this.getUserPermissions(userId),
};
// Store in Redis with expiration
await redisClient.setex(
`session:${sessionId}`,
86400, // 24 hours
JSON.stringify(sessionData)
);
// Track user's active sessions
await redisClient.sadd(`user:${userId}:sessions`, sessionId);
await redisClient.expire(`user:${userId}:sessions`, 86400);
return sessionId;
},
async getSession(sessionId) {
const sessionData = await redisClient.get(`session:${sessionId}`);
if (!sessionData) return null;
const session = JSON.parse(sessionData);
// Update last activity
session.lastActivity = new Date();
await redisClient.setex(
`session:${sessionId}`,
86400,
JSON.stringify(session)
);
return session;
},
async invalidateSession(sessionId) {
const session = await this.getSession(sessionId);
if (session) {
// Remove from user's active sessions
await redisClient.srem(`user:${session.userId}:sessions`, sessionId);
}
await redisClient.del(`session:${sessionId}`);
},
async invalidateAllUserSessions(userId) {
const sessionIds = await redisClient.smembers(`user:${userId}:sessions`);
if (sessionIds.length > 0) {
// Batch the deletes in one round trip (node-redis exposes this as multi())
const multi = redisClient.multi();
sessionIds.forEach((sessionId) => {
multi.del(`session:${sessionId}`);
});
multi.del(`user:${userId}:sessions`);
await multi.exec();
}
},
};
// JWT-based authentication (stateless)
const JWTAuth = {
generateTokens(userId, permissions) {
const accessToken = jwt.sign(
{
userId,
permissions,
type: "access",
iat: Math.floor(Date.now() / 1000),
},
process.env.JWT_SECRET,
{ expiresIn: "15m" }
);
const refreshToken = jwt.sign(
{
userId,
type: "refresh",
iat: Math.floor(Date.now() / 1000),
},
process.env.JWT_REFRESH_SECRET,
{ expiresIn: "7d" }
);
return { accessToken, refreshToken };
},
async verifyAccessToken(token) {
try {
const decoded = jwt.verify(token, process.env.JWT_SECRET);
// Check if token is blacklisted (for logout)
const isBlacklisted = await redisClient.get(`blacklist:${token}`);
if (isBlacklisted) {
throw new Error("Token is blacklisted");
}
return decoded;
} catch (error) {
return null;
}
},
async refreshTokens(refreshToken) {
try {
const decoded = jwt.verify(refreshToken, process.env.JWT_REFRESH_SECRET);
// Verify user still exists and is active
const user = await db.users.findOne({
id: decoded.userId,
isActive: true,
});
if (!user) {
throw new Error("User not found or inactive");
}
const permissions = await this.getUserPermissions(decoded.userId);
return this.generateTokens(decoded.userId, permissions);
} catch (error) {
throw new Error("Invalid refresh token");
}
},
async blacklistToken(token) {
const decoded = jwt.decode(token);
if (decoded && decoded.exp) {
const ttl = decoded.exp - Math.floor(Date.now() / 1000);
if (ttl > 0) {
await redisClient.setex(`blacklist:${token}`, ttl, "true");
}
}
},
};
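Here’s a hedged sketch of how JWTAuth could guard routes: a middleware extracts the Bearer token from the Authorization header and attaches the decoded payload. The `req.auth` property and the example route are assumptions for illustration:
// Hypothetical middleware built on JWTAuth.verifyAccessToken
const requireAccessToken = async (req, res, next) => {
  const header = req.get("Authorization") || "";
  const token = header.startsWith("Bearer ") ? header.slice(7) : null;
  if (!token) {
    return res.status(401).json({ error: "Missing access token" });
  }
  const decoded = await JWTAuth.verifyAccessToken(token);
  if (!decoded) {
    return res.status(401).json({ error: "Invalid or expired token" });
  }
  req.auth = decoded; // { userId, permissions, ... }
  next();
};

app.get("/api/projects", requireAccessToken, async (req, res) => {
  const projects = await db.projects.find({ ownerId: req.auth.userId }); // assumed collection
  res.json(projects);
});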
Session security and monitoring:
// Advanced session security
class SecureSessionManager {
constructor() {
this.maxSessionsPerUser = 5;
this.sessionTimeout = 30 * 60; // 30 minutes of inactivity
this.suspiciousActivityThreshold = 10;
}
async createSecureSession(userId, request) {
const fingerprint = this.generateFingerprint(request);
const sessionId = generateSecureId();
// Check for suspicious activity
await this.checkSuspiciousActivity(userId, request);
// Limit concurrent sessions
await this.enforceSessionLimit(userId);
const sessionData = {
userId,
sessionId,
fingerprint,
createdAt: Date.now(),
lastActivity: Date.now(),
ipAddress: request.ip,
userAgent: request.get("User-Agent"),
loginMethod: "password", // or 'oauth', 'magic-link', etc.
securityLevel: "standard",
isActive: true,
};
await redisClient.setex(
`session:${sessionId}`,
this.sessionTimeout,
JSON.stringify(sessionData)
);
// Track in user's sessions list
await redisClient.sadd(`user:${userId}:sessions`, sessionId);
await redisClient.expire(`user:${userId}:sessions`, 86400);
// Log session creation
await this.logSessionEvent(userId, sessionId, "created", request);
return sessionId;
}
async validateSession(sessionId, request) {
const sessionData = await redisClient.get(`session:${sessionId}`);
if (!sessionData) {
return { valid: false, reason: "session_not_found" };
}
const session = JSON.parse(sessionData);
const currentFingerprint = this.generateFingerprint(request);
// Check fingerprint consistency
if (session.fingerprint !== currentFingerprint) {
await this.logSessionEvent(
session.userId,
sessionId,
"fingerprint_mismatch",
request
);
// Don't immediately invalidate - could be legitimate browser update
session.securityLevel = "elevated";
}
// Check for IP address changes
if (session.ipAddress !== request.ip) {
await this.logSessionEvent(
session.userId,
sessionId,
"ip_change",
request
);
session.securityLevel = "elevated";
session.ipAddress = request.ip; // Update to new IP
}
// Update activity and extend session
session.lastActivity = Date.now();
await redisClient.setex(
`session:${sessionId}`,
this.sessionTimeout,
JSON.stringify(session)
);
return { valid: true, session };
}
generateFingerprint(request) {
const components = [
request.get("User-Agent") || "",
request.get("Accept-Language") || "",
request.get("Accept-Encoding") || "",
request.get("Accept") || "",
];
return crypto
.createHash("sha256")
.update(components.join("|"))
.digest("hex");
}
async checkSuspiciousActivity(userId, request) {
const key = `activity:${userId}:${request.ip}`;
const attempts = await redisClient.incr(key);
if (attempts === 1) {
await redisClient.expire(key, 300); // 5 minutes window
}
if (attempts > this.suspiciousActivityThreshold) {
await this.logSessionEvent(userId, null, "suspicious_activity", request);
// Implement rate limiting
throw new Error("Too many login attempts. Please try again later.");
}
}
async enforceSessionLimit(userId) {
const sessions = await redisClient.smembers(`user:${userId}:sessions`);
if (sessions.length >= this.maxSessionsPerUser) {
// Remove oldest session
const sessionDetails = await Promise.all(
sessions.map(async (sessionId) => {
const data = await redisClient.get(`session:${sessionId}`);
return data ? { sessionId, ...JSON.parse(data) } : null;
})
);
const validSessions = sessionDetails
.filter(Boolean)
.sort((a, b) => a.lastActivity - b.lastActivity);
if (validSessions.length >= this.maxSessionsPerUser) {
const oldestSession = validSessions[0];
await this.invalidateSession(oldestSession.sessionId);
}
}
}
async logSessionEvent(userId, sessionId, event, request) {
const eventData = {
userId,
sessionId,
event,
timestamp: Date.now(),
ip: request.ip,
userAgent: request.get("User-Agent"),
endpoint: request.path,
};
// Store in time-series for analysis
await redisClient.zadd(
`events:${userId}`,
Date.now(),
JSON.stringify(eventData)
);
// Keep only last 1000 events per user
await redisClient.zremrangebyrank(`events:${userId}`, 0, -1001);
}
}
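Wiring SecureSessionManager into request handling might look like the sketch below, assuming the session id travels in a signed `sid` cookie and cookie-parser is installed:
// Hypothetical wiring: validate the session id carried in a signed cookie on every request
const cookieParser = require("cookie-parser");
app.use(cookieParser(process.env.COOKIE_SECRET)); // secret assumed to be configured

const sessionManager = new SecureSessionManager();

const requireSecureSession = async (req, res, next) => {
  const sessionId = req.signedCookies && req.signedCookies.sid;
  if (!sessionId) {
    return res.status(401).json({ error: "No session" });
  }
  const result = await sessionManager.validateSession(sessionId, req);
  if (!result.valid) {
    return res.status(401).json({ error: "Session invalid", reason: result.reason });
  }
  // An elevated security level could trigger re-authentication for sensitive routes
  req.sessionInfo = result.session;
  next();
};

app.get("/account/settings", requireSecureSession, (req, res) => {
  res.json({
    userId: req.sessionInfo.userId,
    securityLevel: req.sessionInfo.securityLevel,
  });
});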
File Upload Handling: Safe and Scalable Media Processing
Secure File Upload Implementation
The file upload security nightmare:
// ❌ Dangerous file upload (don't do this)
app.post("/upload", upload.single("file"), (req, res) => {
// Accepting any file type - SECURITY RISK!
// No file size validation - RESOURCE EXHAUSTION!
// No virus scanning - MALWARE RISK!
// Storing with original filename - PATH TRAVERSAL!
const file = req.file;
fs.writeFileSync(`./uploads/${file.originalname}`, file.buffer);
res.json({ url: `/uploads/${file.originalname}` });
});
// This code is a hacker's dream:
// - Upload ../../../etc/passwd to access system files
// - Upload 10GB files to crash the server
// - Upload executable files to inject malware
// - Upload files with malicious headers
Professional secure file upload system:
// ✅ Secure, production-ready file upload
const multer = require("multer");
const path = require("path");
const crypto = require("crypto");
const sharp = require("sharp"); // Image processing
const fileType = require("file-type"); // Real file type detection
const fs = require("fs"); // File system access (promise API via fs.promises)
class SecureFileUploader {
constructor() {
this.allowedMimeTypes = {
"image/jpeg": { ext: "jpg", maxSize: 10 * 1024 * 1024 }, // 10MB
"image/png": { ext: "png", maxSize: 10 * 1024 * 1024 },
"image/gif": { ext: "gif", maxSize: 5 * 1024 * 1024 }, // 5MB
"image/webp": { ext: "webp", maxSize: 10 * 1024 * 1024 },
"application/pdf": { ext: "pdf", maxSize: 25 * 1024 * 1024 }, // 25MB
"text/plain": { ext: "txt", maxSize: 1 * 1024 * 1024 }, // 1MB
"application/json": { ext: "json", maxSize: 1 * 1024 * 1024 },
};
this.uploadDir = process.env.UPLOAD_DIR || "./secure-uploads";
this.maxFiles = 10; // Maximum files per request
this.maxTotalSize = 100 * 1024 * 1024; // 100MB total per request
this.setupMulter();
}
setupMulter() {
const storage = multer.memoryStorage(); // Keep files in memory for processing
this.upload = multer({
storage,
limits: {
fileSize: Math.max(
...Object.values(this.allowedMimeTypes).map((t) => t.maxSize)
),
files: this.maxFiles,
fieldSize: 1024 * 1024, // 1MB field size
fieldNameSize: 100,
fields: 50,
},
fileFilter: (req, file, callback) => {
this.validateFile(file, callback);
},
});
}
validateFile(file, callback) {
// Check MIME type against whitelist
if (!this.allowedMimeTypes[file.mimetype]) {
return callback(new Error(`File type ${file.mimetype} not allowed`));
}
// Validate filename
const filename = file.originalname;
if (!/^[a-zA-Z0-9._-]+$/.test(filename)) {
return callback(new Error("Invalid filename characters"));
}
if (filename.length > 255) {
return callback(new Error("Filename too long"));
}
callback(null, true);
}
async processUpload(files, userId) {
const processedFiles = [];
let totalSize = 0;
for (const file of files) {
// Verify total size limit
totalSize += file.size;
if (totalSize > this.maxTotalSize) {
throw new Error("Total file size exceeds limit");
}
// Detect real file type from buffer
const detectedType = await fileType.fromBuffer(file.buffer);
if (!detectedType || detectedType.mime !== file.mimetype) {
throw new Error(`File type mismatch for ${file.originalname}`);
}
// Verify against size limit for this file type
const typeConfig = this.allowedMimeTypes[file.mimetype];
if (file.size > typeConfig.maxSize) {
throw new Error(`File ${file.originalname} exceeds size limit`);
}
// Generate secure filename
const fileId = crypto.randomBytes(16).toString("hex");
const secureFilename = `${fileId}.${typeConfig.ext}`;
const filePath = path.join(this.uploadDir, secureFilename);
// Virus scanning (integrate with ClamAV or similar)
await this.scanForViruses(file.buffer);
// Process based on file type
let processedBuffer = file.buffer;
let metadata = {};
if (file.mimetype.startsWith("image/")) {
const result = await this.processImage(file.buffer, file.mimetype);
processedBuffer = result.buffer;
metadata = result.metadata;
}
// Save to secure location
await fs.promises.writeFile(filePath, processedBuffer);
// Store file metadata in database
const fileRecord = {
id: fileId,
originalName: file.originalname,
filename: secureFilename,
mimetype: file.mimetype,
size: processedBuffer.length,
uploadedBy: userId,
uploadedAt: new Date(),
metadata,
status: "active",
};
await db.files.insertOne(fileRecord);
processedFiles.push({
id: fileId,
url: `/files/${fileId}`,
originalName: file.originalname,
size: fileRecord.size,
type: file.mimetype,
});
}
return processedFiles;
}
async processImage(buffer, mimetype) {
const image = sharp(buffer);
const imageMetadata = await image.metadata();
// Security: EXIF (which can contain GPS and device info) is stripped on output because .withMetadata() is never called
let processedImage = image.rotate(); // Auto-rotate based on EXIF orientation first
// Validate image dimensions (prevent decompression bombs)
if (imageMetadata.width > 10000 || imageMetadata.height > 10000) {
throw new Error("Image dimensions too large");
}
// Convert to safe format if needed
let outputBuffer;
switch (mimetype) {
case "image/jpeg":
outputBuffer = await processedImage
.jpeg({ quality: 85, mozjpeg: true })
.toBuffer();
break;
case "image/png":
outputBuffer = await processedImage
.png({ compressionLevel: 6 })
.toBuffer();
break;
case "image/webp":
outputBuffer = await processedImage.webp({ quality: 85 }).toBuffer();
break;
default:
outputBuffer = buffer;
}
return {
buffer: outputBuffer,
metadata: {
width: imageMetadata.width,
height: imageMetadata.height,
format: imageMetadata.format,
hasAlpha: imageMetadata.hasAlpha,
colorSpace: imageMetadata.space,
},
};
}
async scanForViruses(buffer) {
// Integrate with ClamAV or similar antivirus
// This is a placeholder - implement actual virus scanning
try {
// const scanResult = await clamAV.scanBuffer(buffer);
// if (scanResult.isInfected) {
// throw new Error('File contains malware');
// }
// Basic content inspection for now
const content = buffer.toString(
"ascii",
0,
Math.min(buffer.length, 1024)
);
// Check for suspicious patterns
const suspiciousPatterns = [
/\x00\x00\x00\x00.*shell/i,
/eval\s*\(/i,
/<script/i,
/javascript:/i,
];
for (const pattern of suspiciousPatterns) {
if (pattern.test(content)) {
throw new Error("Suspicious content detected");
}
}
} catch (error) {
throw new Error(`Security scan failed: ${error.message}`);
}
}
// File serving with access control
async serveFile(fileId, userId, req, res) {
// Get file record
const fileRecord = await db.files.findOne({ id: fileId, status: "active" });
if (!fileRecord) {
return res.status(404).json({ error: "File not found" });
}
// Check access permissions
if (!(await this.checkFileAccess(fileRecord, userId))) {
return res.status(403).json({ error: "Access denied" });
}
const filePath = path.join(this.uploadDir, fileRecord.filename);
// Check if file exists on disk
const fileExists = await fs.promises
.access(filePath)
.then(() => true)
.catch(() => false);
if (!fileExists) {
return res.status(404).json({ error: "File not found on disk" });
}
// Set appropriate headers
res.setHeader("Content-Type", fileRecord.mimetype);
res.setHeader("Content-Length", fileRecord.size);
res.setHeader(
"Content-Disposition",
`inline; filename="${fileRecord.originalName}"`
);
// Security headers
res.setHeader("X-Content-Type-Options", "nosniff");
res.setHeader("Content-Security-Policy", "default-src 'none'");
// Serve file
const fileStream = fs.createReadStream(filePath);
fileStream.pipe(res);
// Log file access
await this.logFileAccess(fileId, userId, req);
}
async checkFileAccess(fileRecord, userId) {
// Owner can always access
if (fileRecord.uploadedBy === userId) {
return true;
}
// Check if file is shared publicly
const sharing = await db.fileSharing.findOne({
fileId: fileRecord.id,
isActive: true,
});
if (sharing) {
if (sharing.shareType === "public") {
return true;
}
if (
sharing.shareType === "users" &&
sharing.allowedUsers.includes(userId)
) {
return true;
}
}
return false;
}
}
// Usage
const fileUploader = new SecureFileUploader();
app.post(
"/upload",
fileUploader.upload.array("files", 10),
async (req, res) => {
try {
const userId = req.user.id;
const files = req.files;
if (!files || files.length === 0) {
return res.status(400).json({ error: "No files provided" });
}
const processedFiles = await fileUploader.processUpload(files, userId);
res.json({
success: true,
files: processedFiles,
});
} catch (error) {
console.error("File upload error:", error);
res.status(400).json({
error: error.message || "Upload failed",
});
}
}
);
app.get("/files/:fileId", async (req, res) => {
try {
const { fileId } = req.params;
const userId = req.user?.id;
await fileUploader.serveFile(fileId, userId, req, res);
} catch (error) {
console.error("File serve error:", error);
res.status(500).json({ error: "Failed to serve file" });
}
});
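checkFileAccess reads from a `db.fileSharing` collection that isn’t created anywhere above. Here is a sketch of the endpoint that would write those records, using the same field names the check expects; everything else is an assumption:
// Hypothetical sharing endpoint matching the shape checkFileAccess reads
app.post("/files/:fileId/share", async (req, res) => {
  try {
    const { fileId } = req.params;
    const userId = req.user.id;

    const fileRecord = await db.files.findOne({ id: fileId, status: "active" });
    if (!fileRecord || fileRecord.uploadedBy !== userId) {
      return res.status(403).json({ error: "Only the owner can share a file" });
    }

    const sharing = {
      fileId,
      shareType: req.body.shareType === "public" ? "public" : "users", // 'public' or 'users'
      allowedUsers: Array.isArray(req.body.allowedUsers) ? req.body.allowedUsers : [],
      createdBy: userId,
      createdAt: new Date(),
      isActive: true,
    };
    await db.fileSharing.insertOne(sharing);

    res.status(201).json({ success: true });
  } catch (error) {
    console.error("File sharing error:", error);
    res.status(500).json({ error: "Failed to share file" });
  }
});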
Image and Media Processing: Optimization at Scale
Automated Image Processing Pipeline
Real-time image optimization:
// Professional image processing service
const sharp = require("sharp");
const ffmpeg = require("fluent-ffmpeg");
const AWS = require("aws-sdk");
const path = require("path"); // Used when generating thumbnail paths
const fs = require("fs").promises; // Promise-based file I/O for temp files
class MediaProcessor {
constructor() {
this.s3 = new AWS.S3();
this.bucket = process.env.MEDIA_BUCKET;
this.cdnBase = process.env.CDN_BASE_URL;
this.imageFormats = {
thumbnail: { width: 150, height: 150, quality: 80 },
small: { width: 400, height: 300, quality: 85 },
medium: { width: 800, height: 600, quality: 85 },
large: { width: 1200, height: 900, quality: 90 },
original: { quality: 95 },
};
this.videoFormats = {
preview: { width: 320, height: 240, format: "mp4" },
sd: { width: 640, height: 480, format: "mp4" },
hd: { width: 1280, height: 720, format: "mp4" },
fullhd: { width: 1920, height: 1080, format: "mp4" },
};
}
async processImage(buffer, originalName, userId) {
const fileId = generateId();
const tasks = [];
const results = { id: fileId, variants: {} };
// Analyze original image
const image = sharp(buffer);
const metadata = await image.metadata();
// Validate image
if (metadata.width > 15000 || metadata.height > 15000) {
throw new Error("Image dimensions too large");
}
// Generate all size variants
for (const [sizeName, config] of Object.entries(this.imageFormats)) {
tasks.push(
this.generateImageVariant(buffer, fileId, sizeName, config, metadata)
);
}
// Process all variants in parallel
const variants = await Promise.all(tasks);
// Upload to S3 and CDN
const uploadTasks = variants.map((variant) =>
this.uploadImageVariant(variant, fileId, variant.size)
);
await Promise.all(uploadTasks);
// Store metadata
const imageRecord = {
id: fileId,
type: "image",
originalName,
uploadedBy: userId,
metadata: {
width: metadata.width,
height: metadata.height,
format: metadata.format,
size: buffer.length,
hasAlpha: metadata.hasAlpha,
},
variants: variants.reduce((acc, variant) => {
acc[variant.size] = {
url: `${this.cdnBase}/${fileId}/${variant.size}.${variant.format}`,
width: variant.width,
height: variant.height,
fileSize: variant.buffer.length,
};
return acc;
}, {}),
processedAt: new Date(),
status: "ready",
};
await db.media.insertOne(imageRecord);
return imageRecord;
}
async generateImageVariant(
buffer,
fileId,
sizeName,
config,
originalMetadata
) {
let image = sharp(buffer);
// Progressive JPEG loading
if (originalMetadata.format === "jpeg") {
image = image.jpeg({
quality: config.quality,
progressive: true,
mozjpeg: true,
});
}
// Resize if dimensions specified
if (config.width || config.height) {
image = image.resize(config.width, config.height, {
fit: sizeName === "thumbnail" ? "cover" : "inside",
withoutEnlargement: true,
});
}
// Apply optimizations (sharp strips EXIF by default unless .withMetadata() is called)
image = image
.rotate() // Auto-rotate based on EXIF orientation
.sharpen({ sigma: 0.5 }); // Light sharpening after resize
const processedBuffer = await image.toBuffer({ resolveWithObject: true });
return {
size: sizeName,
buffer: processedBuffer.data,
width: processedBuffer.info.width,
height: processedBuffer.info.height,
format: processedBuffer.info.format,
fileSize: processedBuffer.data.length,
};
}
async uploadImageVariant(variant, fileId, sizeName) {
const key = `images/${fileId}/${sizeName}.${variant.format}`;
const uploadParams = {
Bucket: this.bucket,
Key: key,
Body: variant.buffer,
ContentType: `image/${variant.format}`,
CacheControl: "public, max-age=31536000", // 1 year
Metadata: {
width: variant.width.toString(),
height: variant.height.toString(),
size: sizeName,
processedAt: new Date().toISOString(),
},
};
await this.s3.upload(uploadParams).promise();
return key;
}
async processVideo(filePath, originalName, userId) {
const fileId = generateId();
const results = { id: fileId, variants: {}, thumbnails: [] };
// Generate video metadata
const metadata = await this.getVideoMetadata(filePath);
// Generate thumbnail from video
const thumbnailPath = await this.generateVideoThumbnail(filePath, fileId);
const thumbnailBuffer = await fs.readFile(thumbnailPath);
// Process thumbnail like regular image
const thumbnail = await this.processImage(
thumbnailBuffer,
"thumbnail.jpg",
userId
);
results.thumbnail = thumbnail;
// Generate video variants for different qualities
const videoTasks = [];
for (const [quality, config] of Object.entries(this.videoFormats)) {
if (metadata.width >= config.width || quality === "preview") {
videoTasks.push(
this.generateVideoVariant(filePath, fileId, quality, config)
);
}
}
const videoVariants = await Promise.all(videoTasks);
// Upload video files
const uploadTasks = videoVariants.map((variant) =>
this.uploadVideoVariant(variant, fileId)
);
await Promise.all(uploadTasks);
// Store video record
const videoRecord = {
id: fileId,
type: "video",
originalName,
uploadedBy: userId,
metadata: {
duration: metadata.duration,
width: metadata.width,
height: metadata.height,
format: metadata.format,
size: metadata.size,
bitrate: metadata.bitrate,
fps: metadata.fps,
},
thumbnail: thumbnail.id,
variants: videoVariants.reduce((acc, variant) => {
acc[variant.quality] = {
url: `${this.cdnBase}/${fileId}/${variant.quality}.${variant.format}`,
width: variant.width,
height: variant.height,
fileSize: variant.size,
bitrate: variant.bitrate,
};
return acc;
}, {}),
processedAt: new Date(),
status: "ready",
};
await db.media.insertOne(videoRecord);
// Cleanup temporary files
await fs.unlink(filePath).catch(() => {});
await fs.unlink(thumbnailPath).catch(() => {});
return videoRecord;
}
async generateVideoVariant(inputPath, fileId, quality, config) {
const outputPath = `/tmp/${fileId}_${quality}.${config.format}`;
return new Promise((resolve, reject) => {
ffmpeg(inputPath)
.size(`${config.width}x${config.height}`)
.videoBitrate("1000k")
.audioBitrate("128k")
.format(config.format)
.videoCodec("libx264")
.audioCodec("aac")
.addOptions([
"-preset fast",
"-crf 23",
"-movflags +faststart", // Enable progressive download
"-profile:v baseline", // Better compatibility
"-level 3.0",
])
.on("end", async () => {
const stats = await fs.stat(outputPath);
resolve({
quality,
format: config.format,
width: config.width,
height: config.height,
path: outputPath,
size: stats.size,
bitrate: "1000k",
});
})
.on("error", reject)
.save(outputPath);
});
}
async generateVideoThumbnail(inputPath, fileId) {
const thumbnailPath = `/tmp/${fileId}_thumb.jpg`;
return new Promise((resolve, reject) => {
ffmpeg(inputPath)
.screenshots({
timestamps: ["10%"], // Capture at 10% through video
filename: path.basename(thumbnailPath),
folder: path.dirname(thumbnailPath),
size: "800x600",
})
.on("end", () => resolve(thumbnailPath))
.on("error", reject);
});
}
}
// Background processing with queues
const Queue = require("bull");
const mediaQueue = new Queue("media processing", {
redis: {
host: process.env.REDIS_HOST,
port: process.env.REDIS_PORT,
},
});
mediaQueue.process("process-image", async (job) => {
const { originalName, userId } = job.data;
// Bull serializes job data to JSON, so the Buffer arrives as { type: "Buffer", data: [...] }
const buffer = Buffer.from(job.data.buffer.data || job.data.buffer);
const processor = new MediaProcessor();
try {
const result = await processor.processImage(buffer, originalName, userId);
return result;
} catch (error) {
console.error("Image processing failed:", error);
throw error;
}
});
mediaQueue.process("process-video", async (job) => {
const { filePath, originalName, userId } = job.data;
const processor = new MediaProcessor();
try {
const result = await processor.processVideo(filePath, originalName, userId);
return result;
} catch (error) {
console.error("Video processing failed:", error);
throw error;
}
});
// Usage in upload endpoint
app.post("/upload-media", upload.single("media"), async (req, res) => {
try {
const file = req.file;
const userId = req.user.id;
if (!file) {
return res.status(400).json({ error: "No file provided" });
}
if (file.mimetype.startsWith("image/")) {
// Queue image processing
const job = await mediaQueue.add("process-image", {
buffer: file.buffer,
originalName: file.originalname,
userId,
});
res.json({
success: true,
jobId: job.id,
message: "Image processing started",
});
} else if (file.mimetype.startsWith("video/")) {
// Save to temp file for video processing
const tempPath = `/tmp/${Date.now()}-${file.originalname}`;
await fs.writeFile(tempPath, file.buffer);
const job = await mediaQueue.add("process-video", {
filePath: tempPath,
originalName: file.originalname,
userId,
});
res.json({
success: true,
jobId: job.id,
message: "Video processing started",
});
} else {
return res.status(400).json({ error: "Unsupported media type" });
}
} catch (error) {
console.error("Media upload error:", error);
res.status(500).json({ error: "Upload failed" });
}
});
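Both endpoints respond before processing finishes, so clients need a way to poll for completion. A minimal sketch using Bull’s getJob and getState; the route path and response shape are assumptions:
// Hypothetical polling endpoint for media processing jobs
app.get("/media-jobs/:jobId", async (req, res) => {
  try {
    const job = await mediaQueue.getJob(req.params.jobId);
    if (!job) {
      return res.status(404).json({ error: "Job not found" });
    }
    const state = await job.getState(); // 'waiting' | 'active' | 'completed' | 'failed' | ...
    const response = { jobId: job.id, state };
    if (state === "completed") {
      response.result = job.returnvalue; // the media record returned by the processor
    }
    if (state === "failed") {
      response.error = job.failedReason;
    }
    res.json(response);
  } catch (error) {
    console.error("Job status error:", error);
    res.status(500).json({ error: "Failed to fetch job status" });
  }
});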
Key Takeaways
Advanced data management separates basic web applications from professional, scalable systems. Intelligent caching eliminates database bottlenecks, proper session management enables horizontal scaling, secure file handling protects against attacks, and efficient media processing delivers great user experiences.
The data management mindset you need:
- Cache strategically, not universally: Use multi-level caching with appropriate TTLs and invalidation strategies
- Sessions must scale beyond memory: Redis-backed sessions enable load balancing and service distribution
- File uploads are security battlegrounds: Validate everything, process safely, store securely
- Media optimization is user experience: Responsive images and optimized videos keep users engaged
What distinguishes professional data management:
- Caching architectures that reduce database load by 90%+ while maintaining data consistency
- Session systems that handle millions of users across distributed infrastructure
- File upload pipelines that prevent security breaches while processing media efficiently
- Background processing that handles expensive operations without blocking user interactions
What’s Next
We’ve covered the foundational aspects of data management—caching, sessions, and file handling. In the next article, we’ll dive deeper into advanced data patterns: search implementations, data synchronization, background job processing, message queues, and streaming data architectures.
The data layer is more than storage—it’s the performance foundation that determines whether your application scales gracefully or crumbles under load. Master these patterns, and you can build systems that handle millions of users while maintaining the responsiveness of a local application.
You’re no longer just moving data around—you’re architecting data flows that optimize for performance, security, and scalability simultaneously. The foundation is solid. Time to build advanced data orchestration systems.