The Only Guide You'd Ever Need for Load Balancers - 2
The Simplest Possible Load Balancer - DNS Round Robin
If you’re here after reading part 1, welcome back g. Quick recap: by now we’ve figured out that throwing more money at a single server doesn’t scale forever, and that multiple servers are the way to go. We also realized we have a pretty massive problem… how do clients know which server to connect to?
In this part, I’m going to explore the most basic approach to load balancing. It’s pretty far from perfect, but understanding why it doesn’t work will teach us exactly what we need from a “real” load balancer. DNS RR is almost never used as the only load balancing strategy in production, but IMO it’s pretty important for building up the intuition for the real thing. So if you want to skip this, feel free to, although I don’t recommend doing that.
DNS
Before I jump into DNS based load balancing, let’s quickly refresh how DNS actually works. When you go to www.forwingmen.com, your computer has no idea what that means. Computers don’t speak domain names, they speak IP addresses. So there needs to be a translation layer, and that’s basically what DNS (Domain Name System) is.
Here’s What Happens When You Visit a Website
You type www.forwingmen.com
1. Browser -> "wtf's the IP address for it?"
2. Your computer -> checks its local cache
3. Not in cache? Ask (ISP's) DNS server
4. ISP's DNS server -> asks the root, TLD, and authoritative servers
5. The authoritative server returns the actual IP, say 192.168.1.10
6. Browser -> connects to 192.168.1.10
7. Your computer -> caches this

Look at step 5. That response with the IP address is called an A record (Address record). It’s literally just a mapping:
www.forwingmen.com → 192.168.1.10
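If you want to watch this resolution happen yourself, here’s a tiny Python sketch using the standard library’s resolver functions. The domain is our made-up one, so swap in a real hostname to actually see it work:

import socket

# Ask the OS resolver for the address records of a hostname.
# gethostbyname_ex returns (canonical_name, alias_list, ip_address_list).
hostname = "www.forwingmen.com"  # hypothetical domain from this post

try:
    name, aliases, ips = socket.gethostbyname_ex(hostname)
    print(f"{name} resolves to: {ips}")
except socket.gaierror as err:
    print(f"Resolution failed: {err}")
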
What if…we returned multiple IP addresses?
DNS Round Robin
Remember our three servers from Part 1?
- Server 1: 192.168.1.10
- Server 2: 192.168.1.11
- Server 3: 192.168.1.12
With DNS RR, instead of having one A record, you configure your DNS server to have multiple A records for the same domain:
www.forwingmen.com → 192.168.1.10
www.forwingmen.com → 192.168.1.11
www.forwingmen.com → 192.168.1.12

When a DNS query comes in for www.forwingmen.com, the DNS server returns all three IP addresses, but it rotates the order each time. That’s the “round robin” part.
Rotates the Order?
Here’s what I mean by “rotates the order”
Query 1 from User A:
Response: [192.168.1.10, 192.168.1.11, 192.168.1.12]
Query 2 from User B:
Response: [192.168.1.11, 192.168.1.12, 192.168.1.10]
Query 3 from User C:
Response: [192.168.1.12, 192.168.1.10, 192.168.1.11]
Query 4 from User D:
Response: [192.168.1.10, 192.168.1.11, 192.168.1.12]
... and so on

The DNS server is literally just changing which IP comes first in the list. But the thing is, most clients will just use the first IP in the list.
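If seeing the rotation as code helps, here’s a minimal sketch of what the DNS server is conceptually doing. This is just the idea, not how real DNS software actually implements it:

from collections import deque

# The three A records for www.forwingmen.com, in their current order.
records = deque(["192.168.1.10", "192.168.1.11", "192.168.1.12"])

def answer_query():
    """Return the records in their current order, then rotate for the next query."""
    response = list(records)
    records.rotate(-1)  # move the first IP to the back of the line
    return response

for user in ["User A", "User B", "User C", "User D"]:
    print(user, "->", answer_query())

# User A -> ['192.168.1.10', '192.168.1.11', '192.168.1.12']
# User B -> ['192.168.1.11', '192.168.1.12', '192.168.1.10']
# User C -> ['192.168.1.12', '192.168.1.10', '192.168.1.11']
# User D -> ['192.168.1.10', '192.168.1.11', '192.168.1.12']
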
So in theory, this distributes traffic evenly across your three servers. User A goes to Server 1, User B goes to Server 2, User C goes to Server 3, User D goes back to Server 1. Perfect, right?
Wrong.
But before we destroy this approach (and we will), let’s actually see wtf is going on here.
Setting Up DNS Round Robin
Let’s say you control the DNS records for forwingmen.com. Here’s how you’d configure DNS RR (I’ll use a common DNS zone file syntax):
; DNS Zone file for forwingmen.com
$TTL 300
www IN A 192.168.1.10
www IN A 192.168.1.11
www IN A 192.168.1.12
That’s it. Three A records. The $TTL 300 means “cache this for 300 seconds (5 minutes)”. We’ll come back to why that matters.
With most DNS providers, you’d just add three A records with the same hostname:
Type: A
Name: www
Value: 192.168.1.10
Type: A
Name: www
Value: 192.168.1.11
Type: A
Name: www
Value: 192.168.1.12
Done. You now have “load balancing”. Kinda.
Walking Through Real Requests
Let’s see what actually happens when users start visiting your site.
First User: Sydney
10:00:00 AM - Sydney opens her browser
10:00:01 AM - Types www.forwingmen.com
10:00:02 AM - Browser asks DNS for the IP of www.forwingmen.com
10:00:03 AM - DNS responds -> [192.168.1.10, 192.168.1.11, 192.168.1.12]
10:00:04 AM - Browser picks the first one: 192.168.1.10
10:00:05 AM - Connects to server 1
Sydney is now talking to server 1. Her browser also caches this IP address for 5 minutes (remember that TTL we set?)
Second User: Sweeney
10:00:10 AM - Sweeney opens her browser
10:00:11 AM - Types www.forwingmen.com
10:00:12 AM - Browser asks DNS for the IP of www.forwingmen.com
10:00:13 AM - DNS responds -> [192.168.1.11, 192.168.1.12, 192.168.1.10] (different order!!)
10:00:14 AM - Browser picks the first one: 192.168.1.11
10:00:15 AM - Connects to server 2
Sweeney is now talking to server 2. Lesgo, the traffic’s being distributed.
Third User: Sanchit
10:00:20 AM - Sanchit opens his browser
10:00:21 AM - Types www.forwingmen.com
10:00:22 AM - Browser asks DNS for the IP of www.forwingmen.com
10:00:23 AM - DNS responds: [192.168.1.12, 192.168.1.10, 192.168.1.11] (again, a different order)
10:00:24 AM - Browser picks the first one: 192.168.1.12
10:00:25 AM - Connects to server 3
Sanchit is now talking to server 3. Simple distribution.
Sydney Again (4 Minutes Later)
10:05:00 AM - Sydney clicks another link on your site
10:05:01 AM - Browser checks cache for IP
10:05:02 AM - Cache still has 192.168.1.10
10:05:03 AM - Connects to server 1 (no DNS query needed)
Sydney is still on server 1. The browser didn’t even ask DNS again because it cached the result.
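Here’s roughly what that client-side caching looks like as code. It’s a simplified model; real browsers and operating systems have their own (messier) cache layers:

import time

TTL_SECONDS = 300        # matches the $TTL 300 in our zone file
dns_cache = {}           # hostname -> (ip, expiry timestamp)

def resolve(hostname, do_dns_query):
    """Return a cached IP if it hasn't expired, otherwise do a fresh DNS lookup."""
    now = time.time()
    cached = dns_cache.get(hostname)
    if cached and cached[1] > now:
        return cached[0]                        # Sydney at 10:05 - no DNS query at all
    ip = do_dns_query(hostname)                 # first IP from the DNS response
    dns_cache[hostname] = (ip, now + TTL_SECONDS)
    return ip

# Stand-in query function; a real one would actually hit DNS.
print(resolve("www.forwingmen.com", lambda h: "192.168.1.10"))  # fresh lookup -> .10
print(resolve("www.forwingmen.com", lambda h: "192.168.1.11"))  # cache hit -> still .10
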
This is where things start to get interesting…and problematic.
The Problems Start to Show
Let’s simulate what happens over the course of an hour with 1,000 users. You’d think it would be roughly:
Server 1: ~333 users
Server 2: ~333 users
Server 3: ~334 users
But that’s not what happens.
Problem #1: DNS Caching Ruins Everything
Remember that 5 min TTL? That means every user’s DNS resolution gets cached for 5 minutes. But it’s actually worse than that:
Multiple levels of caching:
- Browser cache (respects TTL, mostly)
- Operating system cache (might ignore TTL)
- Home router cache (often ignores TTL)
- ISP DNS cache (sometimes ignores TTL)

Your TTL = 5 minutes. The actual cached time? Usually much longer.
You get the point. So even though you set TTL to 5 minutes, users might be stuck with the same IP for 30 minutes, an hour, or even longer.
The distribution nightmare:
Sydney's company has 100 employees.
They all share the same corporate DNS server.
At 9:00 AM, the first employee queries your site.
Corporate DNS caches the result: 192.168.1.10
For the next hour, all 100 employees connect to server 1.
Sweeney's ISP has 10,000 customers in her area.
They share an ISP DNS server.
It happens to cache 192.168.1.11.
Now thousands of users are hitting Server 2.
Server distribution after an hour:
Server 1: 3,000 requests
Server 2: 5,000 requests
Server 3: 200 requests

So much for “even distribution”
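You can convince yourself of this with a toy simulation: a handful of shared resolvers, each one caching whatever answer it got first, and everyone behind it inheriting that answer. The resolver sizes below are made up, which is kind of the point — you don’t control them:

import random
from collections import Counter

servers = ["192.168.1.10", "192.168.1.11", "192.168.1.12"]

# 1,000 users, but only 7 shared caches (corporate DNS, ISP DNS, home routers...).
# The sizes are wildly uneven, which is the realistic part.
resolver_sizes = [400, 250, 200, 100, 30, 15, 5]

hits = Counter()
for size in resolver_sizes:
    # Each resolver caches one answer and hands it to everyone behind it.
    cached_ip = random.choice(servers)
    hits[cached_ip] += size

for ip in servers:
    print(ip, "->", hits[ip], "users")

Run it a few times and watch one server get hammered while another sits nearly idle.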
Problem #2: What Happens When a Server Dies?
Let’s say Server 2 crashes at 10:30 AM. Here’s what happens:
10:30:00 AM - server 2 dies (192.168.1.11)
10:30:02 AM - you try to remove it from DNS
10:30:05 AM - DNS updated...only two A records now
10:30:05 AM - Sweeney tries to access the site
10:30:06 AM - Her DNS cache still says: "192.168.1.11"
10:30:07 AM - Browser tries to connect to 192.168.1.11
10:30:37 AM - Connection timeout (30 seconds later)
10:30:38 AM - Browser tries next IP: 192.168.1.12
10:30:39 AM - Finally connects to server 3

Sweeney just waited 30+ seconds to load your page.
DNS caching means Sweeney’s PC (and thousands of others) might keep trying the dead server for the next 30 minutes, hour, or longer, depending on the cache.
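What the browser is doing under the hood is roughly this (a simplified sketch with raw Python sockets; real browsers have their own retry logic and smarter timeouts):

import socket

def connect_with_fallback(ips, port=80, timeout=30):
    """Try each IP in order; fall through to the next one if the connection fails."""
    for ip in ips:
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        sock.settimeout(timeout)
        try:
            sock.connect((ip, port))   # can hang for `timeout` seconds on a dead server
            return sock, ip
        except OSError:
            sock.close()               # dead or unreachable, try the next IP in the answer
    raise ConnectionError("no server reachable")

# Sweeney's stale cached answer still lists the dead server first:
# sock, ip = connect_with_fallback(["192.168.1.11", "192.168.1.12", "192.168.1.10"])
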
Problem #3: No Session Awareness
Remember our wingman dating app from Part 1? Let’s say Sydney logs in:
10:00:00 AM - Sydney visits www.forwingmen.com
10:00:01 AM - DNS gives her 192.168.1.10
10:00:02 AM - She logs in on server 1
10:00:03 AM - server 1 creates a session and stores: "Sydney is logged in"
Now Sydney clicks “View Matches”:
10:00:10 AM - Sydney clicks "View Matches"
10:00:11 AM - DNS cache expired (short TTL for this example)
10:00:12 AM - new DNS query returns 192.168.1.11 (server 2)
10:00:13 AM - request goes to server 2
10:00:14 AM - server 2 checks if Sydney is logged in
10:00:15 AM - server 2 doesn't know who Sydney is
10:00:16 AM - Sydney sees "Please log in again"

Sydney just got logged out for no apparent reason. She’s now questioning if your app is trash (it might be, but not for this reason).
This happens because each server stores its own session data. Server 1 knows Sydney is logged in, but server 2 has no idea. DNS RR has zero awareness of sessions or state.
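In code, the root of the problem is just this: each server keeps its sessions in its own memory. A minimal sketch (the handler names are made up for illustration, this isn’t any particular framework):

# Every server process has its own private session store.
sessions = {}   # session_id -> user data, lives only in THIS server's memory

def handle_login(session_id, username):
    sessions[session_id] = {"user": username, "logged_in": True}
    return f"Welcome, {username}"

def handle_view_matches(session_id):
    session = sessions.get(session_id)
    if session is None:
        # Server 2 never saw Sydney's login - her session lives in server 1's dict.
        return "Please log in again"
    return f"Here are your matches, {session['user']}"

Server 1 runs handle_login, server 2 runs handle_view_matches, and since sessions is a plain in-process dict, server 2 comes up empty. Sharing that state (or keeping Sydney pinned to one server) is exactly the kind of thing a real setup has to handle.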
Problem #4: Uneven Request Distribution
Even if caching weren’t an issue, DNS RR distributes DNS queries, not requests. This is a huge difference:
User A makes 1 DNS query, then makes like 50 requests (browsing your site)
User B makes 1 DNS query, then makes 2 requests (leaves immediately, hated ur website)
User C makes 1 DNS query, then makes 500 requests (ur avg. power user)
DNS RR thinks: "neat, 1 query to server 1, 1 to server 2, 1 to server 3"
Reality:
Server 1: 50 requests (User A)
Server 2: 2 requests (User B)
Server 3: 500 requests (User C)

Problem #5: No Control Over the Algorithm
With this, you’re at the mercy of:
- How the DNS server rotates IPs (you can’t control this)
- How clients pick from multiple IPs (browsers do whatever tf they want)
- How caching layers behave (completely out of your control)
- How long things stay cached (lol good luck)
You can’t say “Server 1 is more powerful, send more traffic there” (pretty good intuition for weighted RR right here). You can’t say “This mf should stick to server 2.” You can’t do anything, really. You just cross your fingers and hope for the best.
When DNS Load Balancing Actually Makes Sense
Now, after this roasting session of DNS RR, let me just say that it IS terrible as a general-purpose load balancer, but in very, very specific cases it isn’t. Give this blog a read.
1. Geographic Distribution
If you have servers in different geographic regions and you want to route users to the closest one, DNS can work great:
Users in US → 192.168.1.10 (US server)
Users in EU → 192.168.2.20 (EU server)
Users in Asia → 192.168.3.30 (Asia server)
This is called GeoDNS, and certain services do this automatically. The long DNS cache time actually helps here since you want users to stick to their regional server.
2. CDN/Edge Distribution
CDNs often use DNS based routing to direct users to the nearest edge server. This actually works decently well for serving static stuff.
What We Actually Need
After all this, we’ve learned what we really need from a load balancer:
Health Checking
- Don’t send traffic to dead servers
- Automatically detect failures
- Remove failing servers from rotation immediately
- No waiting for DNS caches to expire
Session Awareness
- Keep users connected to the same server (when needed)
- Or better yet, share session state across all servers
Real Time Control
- Add/remove servers instantly
- Adjust traffic distribution on the fly
- Respond to changing conditions in real time
Even Load Distribution
- Distribute actual load, not just DNS queries
- Account for different server capacities
- Consider active connections, not just user count
Intelligent Routing
- Choose the best server based on current conditions
- Consider server response times
- Route based on request type or content
What we have: DNS based round robin, handing out servers blindly (effectively at random).

What we want: A dedicated load balancer, selecting servers intelligently.

The Realization
DNS RR is “technically” load balancing the same way you are “technically” a developer :p (mb). Sure, it distributes traffic across multiple servers, but it does it in the dumbest way possible with almost no control or intelligence.
What we really need is something that sits between clients and servers. Something that:
- Clients always connect to (one stable address)
- Actively monitors server health
- Makes intelligent routing decisions in real-time
- Can handle session persistence
- Gives us full control over distribution algorithms
This is what a dedicated load balancer does.
The Architecture We’re Building Toward
Instead of this (DNS RR):
Client → DNS Server → [randomly picks] → Server 1/2/3
(no intelligence, no control, no health checking)

We want this:
Client → Load Balancer → [intelligently picks] → Server 1/2/3
(health checking, session persistence, control)

The load balancer becomes a reverse proxy, a single entry point that clients connect to, which then forwards requests to the appropriate backend server based on [insert our smart routing logic here, will talk more in depth about it later].
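Just to make that concrete before we get there, here’s a tiny sketch of the core decision a load balancer makes on every request: look at which backends are currently healthy, then pick one. This is a preview under made-up names, not the implementation we’ll actually build later:

import itertools

class LoadBalancer:
    def __init__(self, backends):
        self.backends = backends
        self.healthy = set(backends)          # updated by periodic health checks
        self._rr = itertools.cycle(backends)  # round robin as the simplest policy

    def mark_down(self, backend):
        self.healthy.discard(backend)         # dead servers leave rotation immediately

    def mark_up(self, backend):
        self.healthy.add(backend)

    def pick(self):
        # Skip unhealthy backends - no waiting for DNS caches to expire.
        for _ in range(len(self.backends)):
            candidate = next(self._rr)
            if candidate in self.healthy:
                return candidate
        raise RuntimeError("no healthy backends")

lb = LoadBalancer(["192.168.1.10", "192.168.1.11", "192.168.1.12"])
lb.mark_down("192.168.1.11")                  # server 2 just crashed
print(lb.pick())  # 192.168.1.10
print(lb.pick())  # 192.168.1.12 - skips the dead server instantly
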
So…?
We’ve arrived at actually needing a real load balancer, and we got there intuitively. I love this type of learning (or teaching), it makes things really stick IMO. Hopefully it was good enough for you :)
In part 1, we realised the need for something that can direct users to different servers. So we implemented just that in this part, but now we realise that we don’t just need something that can redirect users, we need something *intelligent* that can redirect users. The main series begins in the next part. I could’ve just started from there, but there’s no point in doing that; hundreds of blogs out there already do it that way and I really hate it. No intuition, no meaning, nothin.
Feel free to let me know if something could’ve been done better, I’d love to hear your feedback. You can DM me on X / Twitter and let me know right there.
See you :)