The Only Guide You'd Ever Need for Load Balancers - 6
Load Balancing Algorithms - Beyond Round Robin
Welcome back. If you’re coming from part 5, our load balancer now has health checking. Dead servers get kicked out, recovering servers get added back. Cool.
But here’s the thing…we’re still using Round Robin. And it’s kind of dumb. It treats every server exactly the same and every request exactly the same. In the real world, that’s rarely true.
In this part, we’re going to explore a whole zoo of load balancing algorithms.
Why Round Robin Isn’t Always Optimal
Let me show you two scenarios where Round Robin falls apart.
Problem 1: Server Heterogeneity
You have three servers, but they’re not equal:
┌─────────────────────────────────────────────────────────────────┐
│ │
│ Server 1: Beefy boy │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ CPU: 32 cores │ │
│ │ RAM: 128 GB │ │
│ │ Can handle: ~3000 requests/second │ │
│ └─────────────────────────────────────────────────────────┘ │
│ │
│ Server 2: Medium machine │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ CPU: 8 cores │ │
│ │ RAM: 32 GB │ │
│ │ Can handle: ~1000 requests/second │ │
│ └─────────────────────────────────────────────────────────┘ │
│ │
│ Server 3: Potato │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ CPU: 2 cores │ │
│ │ RAM: 4 GB │ │
│ │ Can handle: ~200 requests/second │ │
│ └─────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────┘
With Round Robin:
Total capacity: 3000 + 1000 + 200 = 4200 req/s
Round Robin distribution:
Server 1: 33.3% of traffic = 1400 req/s (yawning, at 46% capacity)
Server 2: 33.3% of traffic = 1400 req/s (OVERLOADED, at 140% capacity)
Server 3: 33.3% of traffic = 1400 req/s (DEAD, at 700% capacity)
Result: We're wasting Server 1's capacity while killing Server 3
Round Robin assumes all servers are equal. When they’re not, you get unbalanced load.
Problem 2: Request Heterogeneity
Even if your servers are identical, requests aren’t:
┌─────────────────────────────────────────────────────────────────┐
│ │
│ Request types hitting your wingman dating app: │
│ │
│ GET /health │
│ └── Processing time: 1ms │
│ └── Just returns "OK" │
│ │
│ GET /profile/123 │
│ └── Processing time: 50ms │
│ └── Database query, cache check │
│ │
│ POST /match/calculate │
│ └── Processing time: 2000ms │
│ └── ML model inference, heavy computation │
│ │
│ GET /feed/generate │
│ └── Processing time: 500ms │
│ └── Multiple DB queries, sorting, filtering │
│ │
└─────────────────────────────────────────────────────────────────┘
With Round Robin:
Request 1: GET /health → Server 1 (1ms, done instantly)
Request 2: POST /match/calculate → Server 2 (2000ms, still processing...)
Request 3: GET /health → Server 3 (1ms, done instantly)
Request 4: GET /feed/generate → Server 1 (500ms, processing...)
Request 5: POST /match/calculate → Server 2 (still busy with request 2!)
Request 6: GET /profile/123 → Server 3 (50ms, done)
Server 2 is now backed up with heavy requests
while Server 3 is twiddling its thumbs
RR counts requests, not actual load. A server handling one heavy request looks the same as a server handling one lightweight request.
Weighted Round Robin
The first evolution of Round Robin. Simple idea: give each server a weight based on its capacity.
The Concept
Server 1: Weight 5 (can handle 5x the load)
Server 2: Weight 2 (can handle 2x the load)
Server 3: Weight 1 (baseline)
Distribution over 8 requests:
Request 1 → Server 1
Request 2 → Server 1
Request 3 → Server 1
Request 4 → Server 1
Request 5 → Server 1
Request 6 → Server 2
Request 7 → Server 2
Request 8 → Server 3
(then repeat)
Server 1 gets 5/8 = 62.5% of traffic
Server 2 gets 2/8 = 25% of traffic
Server 3 gets 1/8 = 12.5% of traffic
Much better. Now our beefy server gets more traffic.
The Naive Implementation
The simplest approach: repeat servers in the list based on their weight.
Original: [Server1(w=5), Server2(w=2), Server3(w=1)]
Expanded: [S1, S1, S1, S1, S1, S2, S2, S3]
Then just do regular Round Robin on the expanded list.
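For reference, here's what the naive approach could look like in code. It's a minimal sketch (reusing the Backend type from the earlier parts, no locking or health checks) and not something you'd ship:

```go
// naive weighted round robin: repeat each backend weight-many times, then
// rotate over the expanded slice (reuses the Backend type from earlier parts;
// no locking or health checks here, it's just to illustrate the idea)
type naiveWRR struct {
	expanded []*Backend
	current  int
}

func (n *naiveWRR) AddBackend(b *Backend, weight int) {
	for i := 0; i < weight; i++ {
		n.expanded = append(n.expanded, b)
	}
}

func (n *naiveWRR) GetNextBackend() *Backend {
	if len(n.expanded) == 0 {
		return nil
	}
	b := n.expanded[n.current%len(n.expanded)]
	n.current++
	return b
}
```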
But this has a problem. The distribution is “bursty”:
Requests over time:
S1 → S1 → S1 → S1 → S1 → S2 → S2 → S3 → S1 → S1 → ...
Server 1 gets 5 requests in a row, then nothing.
Server 3 gets 1 request, then waits for 7 requests.
This is uneven and can cause micro-bursts.
Smooth Weighted Round Robin (Nginx’s Approach)
Nginx uses a clever algorithm to distribute requests more evenly. Let me explain.
Each server has:
- weight: the configured weight (static)
- currentWeight: changes with each selection (dynamic)
- effectiveWeight: usually equals weight, but can decrease on failures
The algorithm:
- Add weight to each server's currentWeight
- Pick the server with the highest currentWeight
- Subtract totalWeight from the selected server's currentWeight
Let me walk through an example:
Servers: A(w=5), B(w=2), C(w=1)
Total weight = 8
Round 1:
Before: A.current=0, B.current=0, C.current=0
Add weights: A.current=5, B.current=2, C.current=1
Highest: A (5)
Select A, subtract total: A.current = 5 - 8 = -3
After: A.current=-3, B.current=2, C.current=1
Round 2:
Add weights: A.current=2, B.current=4, C.current=2
Highest: B (4)
Select B, subtract total: B.current = 4 - 8 = -4
After: A.current=2, B.current=-4, C.current=2
Round 3:
Add weights: A.current=7, B.current=-2, C.current=3
Highest: A (7)
Select A, subtract total: A.current = 7 - 8 = -1
After: A.current=-1, B.current=-2, C.current=3
Round 4:
Add weights: A.current=4, B.current=0, C.current=4
Highest: A or C (tie at 4, pick A)
Select A, subtract total: A.current = 4 - 8 = -4
After: A.current=-4, B.current=0, C.current=4
Round 5:
Add weights: A.current=1, B.current=2, C.current=5
Highest: C (5)
Select C, subtract total: C.current = 5 - 8 = -3
After: A.current=1, B.current=2, C.current=-3
...and so on
The sequence over 8 requests: A, B, A, A, C, A, B, A
Naive approach: A → A → A → A → A → B → B → C
Smooth approach: A → B → A → A → C → A → B → A
Much more evenly distributed now :)
Implementation
type WeightedBackend struct {
Backend *Backend
Weight int
CurrentWeight int
}
type WeightedRoundRobin struct {
backends []*WeightedBackend
totalWeight int
mux sync.Mutex
}
func NewWeightedRoundRobin() *WeightedRoundRobin {
return &WeightedRoundRobin{
backends: make([]*WeightedBackend, 0),
}
}
func (wrr *WeightedRoundRobin) AddBackend(backend *Backend, weight int) {
wrr.mux.Lock()
defer wrr.mux.Unlock()
wb := &WeightedBackend{
Backend: backend,
Weight: weight,
CurrentWeight: 0,
}
wrr.backends = append(wrr.backends, wb)
wrr.totalWeight += weight
}
func (wrr *WeightedRoundRobin) GetNextBackend() *Backend {
wrr.mux.Lock()
defer wrr.mux.Unlock()
if len(wrr.backends) == 0 {
return nil
}
var best *WeightedBackend
for _, wb := range wrr.backends {
if !wb.Backend.IsAlive() {
continue
}
// add weight to current weight
wb.CurrentWeight += wb.Weight
// track the best (highest current weight)
if best == nil || wb.CurrentWeight > best.CurrentWeight {
best = wb
}
}
if best == nil {
return nil
}
best.CurrentWeight -= wrr.totalWeight
return best.Backend
}
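To sanity-check it, you can register the three servers with weights 5:2:1 and print the first eight picks. A quick fragment (it assumes the NewBackend constructor from the complete implementation at the end of this post):

```go
// should print the backends in the order A, B, A, A, C, A, B, A
wrr := NewWeightedRoundRobin()
wrr.AddBackend(NewBackend("127.0.0.1", 8081), 5) // A
wrr.AddBackend(NewBackend("127.0.0.1", 8082), 2) // B
wrr.AddBackend(NewBackend("127.0.0.1", 8083), 1) // C

for i := 0; i < 8; i++ {
	fmt.Println(wrr.GetNextBackend().Address())
}
// 8081, 8082, 8081, 8081, 8083, 8081, 8082, 8081
```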
Least Connections
Now let’s tackle the request heterogeneity problem. Instead of blindly rotating, let’s send traffic to the server with the fewest active connections.
The Concept

This naturally handles request heterogeneity:
- Servers handling slow requests accumulate connections
- Servers handling fast requests finish quickly and free up connections
- New requests go to less loaded servers
Tracking Active Connections
We need to track when connections start and end:
Request comes in:
1. Pick server with least connections
2. Increment that server's connection count
3. Forward request
Request completes:
4. Decrement that server's connection count
Timeline:
0ms: Request 1 → Server 1 (now has 1 conn)
Request 2 → Server 2 (now has 1 conn)
Request 3 → Server 3 (now has 1 conn)
10ms: Request 4 arrives
Server 1: 1 conn, Server 2: 1 conn, Server 3: 1 conn
(tie, pick any) → Server 1 (now has 2 conns)
20ms: Request 2 completes
Server 2: 0 conns now
30ms: Request 5 arrives
Server 1: 2 conns, Server 2: 0 conns, Server 3: 1 conn
Pick Server 2! (has 0 conns)
Implementation
First, update the Backend struct to track connections:
type Backend struct {
Host string
Port int
Alive bool
Connections int64
mux sync.RWMutex
}
func (b *Backend) IncrementConnections() {
atomic.AddInt64(&b.Connections, 1)
}
func (b *Backend) DecrementConnections() {
atomic.AddInt64(&b.Connections, -1)
}
func (b *Backend) GetConnections() int64 {
return atomic.LoadInt64(&b.Connections)
}
I’m using atomic operations here because connection counts change frequently and from multiple goroutines. For a simple counter like this, an atomic add is generally cheaper than taking a mutex.
Now the algorithm:
type LeastConnections struct {
backends []*Backend
mux sync.RWMutex
}
func NewLeastConnections() *LeastConnections {
return &LeastConnections{
backends: make([]*Backend, 0),
}
}
func (lc *LeastConnections) AddBackend(backend *Backend) {
lc.mux.Lock()
defer lc.mux.Unlock()
lc.backends = append(lc.backends, backend)
}
func (lc *LeastConnections) GetNextBackend() *Backend {
lc.mux.RLock()
defer lc.mux.RUnlock()
var best *Backend
var minConns int64 = -1
for _, b := range lc.backends {
if !b.IsAlive() {
continue
}
conns := b.GetConnections()
if best == nil || conns < minConns {
best = b
minConns = conns
}
}
return best
}
And update the connection handling in the load balancer:
func (lb *LoadBalancer) handleConnection(clientConn net.Conn) {
defer clientConn.Close()
backend := lb.algorithm.GetNextBackend()
if backend == nil {
clientConn.Write([]byte("HTTP/1.1 503 Service Unavailable\r\n\r\n"))
return
}
// INCREMENT connection count
backend.IncrementConnections()
defer backend.DecrementConnections() // DECREMENT when done
backendConn, err := net.Dial("tcp", backend.Address())
if err != nil {
clientConn.Write([]byte("HTTP/1.1 502 Bad Gateway\r\n\r\n"))
return
}
defer backendConn.Close()
lb.forwardTraffic(clientConn, backendConn)
}
defer backend.DecrementConnections() ensures we decrement even if something goes wrong.
Weighted Least Connections
What if you have both problems? Different server capacities AND request heterogeneity?
Combine Weighted Round Robin and Least Connections!
The Concept
Instead of picking the server with the fewest connections, pick the server with the fewest connections relative to its weight.
Formula: score = connections / weight
Lower score = better choice
Example:
Server 1: 10 connections, weight 5 → score = 10/5 = 2.0
Server 2: 4 connections, weight 2 → score = 4/2 = 2.0
Server 3: 3 connections, weight 1 → score = 3/1 = 3.0
Server 1 and 2 are tied at 2.0, both are good choices.
Server 3 has a higher score, meaning it's relatively more loaded.

Implementation
type WeightedLeastConnections struct {
backends []*WeightedBackend
mux sync.RWMutex
}
func NewWeightedLeastConnections() *WeightedLeastConnections {
return &WeightedLeastConnections{
backends: make([]*WeightedBackend, 0),
}
}
func (wlc *WeightedLeastConnections) AddBackend(backend *Backend, weight int) {
wlc.mux.Lock()
defer wlc.mux.Unlock()
wb := &WeightedBackend{
Backend: backend,
Weight: weight,
}
wlc.backends = append(wlc.backends, wb)
}
func (wlc *WeightedLeastConnections) GetNextBackend() *Backend {
wlc.mux.RLock()
defer wlc.mux.RUnlock()
var best *WeightedBackend
var bestScore float64 = -1
for _, wb := range wlc.backends {
if !wb.Backend.IsAlive() {
continue
}
// calculate score: connections / weight
// lower is better
conns := float64(wb.Backend.GetConnections())
score := conns / float64(wb.Weight)
if best == nil || score < bestScore {
best = wb
bestScore = score
}
}
if best == nil {
return nil
}
return best.Backend
}
Random Selection
Yes, random. I know it sounds dumb. But bear with me.
Why Random Actually Works
With a large number of servers (say, 100+), random selection gives you pretty even distribution due to the law of large numbers.
100 servers, 1,000,000 requests, random selection:
Expected: each server gets ~10,000 requests
Actual: each server gets 9,800 - 10,200 requests
Standard deviation is small relative to the mean.
Good enough for many use cases!
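Don't just take my word for it. Here's a tiny standalone simulation (separate from the load balancer code) you can run to see the spread for yourself:

```go
package main

import (
	"fmt"
	"math/rand"
)

// Spread 1,000,000 requests uniformly at random over 100 servers and report
// the least- and most-loaded server.
func main() {
	const servers = 100
	const requests = 1_000_000

	counts := make([]int, servers)
	for i := 0; i < requests; i++ {
		counts[rand.Intn(servers)]++
	}

	min, max := counts[0], counts[0]
	for _, c := range counts {
		if c < min {
			min = c
		}
		if c > max {
			max = c
		}
	}
	fmt.Printf("expected ~%d per server, got min=%d max=%d\n",
		requests/servers, min, max)
}
```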
Pros:
- Dead simple to implement
- O(1) selection time (no scanning through servers)
- No shared state (no mutex needed for selection)
- Works well with many servers
Cons:
- Not great with few servers (variance is high)
- Doesn’t consider server capacity or current load
Implementation
type RandomSelection struct {
backends []*Backend
mux sync.RWMutex
}
func NewRandomSelection() *RandomSelection {
return &RandomSelection{
backends: make([]*Backend, 0),
}
}
func (rs *RandomSelection) AddBackend(backend *Backend) {
rs.mux.Lock()
defer rs.mux.Unlock()
rs.backends = append(rs.backends, backend)
}
func (rs *RandomSelection) GetNextBackend() *Backend {
rs.mux.RLock()
defer rs.mux.RUnlock()
var healthy []*Backend
for _, b := range rs.backends {
if b.IsAlive() {
healthy = append(healthy, b)
}
}
if len(healthy) == 0 {
return nil
}
return healthy[rand.Intn(len(healthy))]
}
Power of Two Choices
Now here’s where it gets interesting. Power of Two Choices (P2C) is a genius algorithm that gives you most of the benefits of Least Connections with much less overhead.
The Problem with Least Connections
To pick the least-loaded server, you need to check ALL servers:
10 servers: Check 10 connection counts
100 servers: Check 100 connection counts
1000 servers: Check 1000 connection counts
O(n) per request. With high traffic, this adds up.
The Power of Two Choices Insight
Research showed that instead of checking ALL servers, you can:
- Pick TWO servers at random
- Choose the one with fewer connections
This gives you most of the benefit of Least Connections with O(1) complexity!

Why Does This Work?
Intuition: with pure random selection, you might pick a heavily loaded server. But with TWO random choices, you’re unlikely to pick TWO heavily loaded servers. At least one will probably be reasonably loaded.
If you’re curious about why this works, look up “The Power of Two Choices in Randomized Load Balancing”.
Expected max load comparison:
Pure Random: O(log n / log log n)
Power of Two: O(log log n)
That's an exponential improvement, so mf neat. I love this.
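If you'd rather see it than trust the asymptotics, here's a standalone balls-into-bins simulation (separate from the load balancer code; exact numbers vary run to run):

```go
package main

import (
	"fmt"
	"math/rand"
)

// Throw n "requests" at n "servers" and compare the most-loaded server under
// pure random placement vs power-of-two-choices placement.
func main() {
	const n = 10000

	random := make([]int, n)
	p2c := make([]int, n)

	for i := 0; i < n; i++ {
		// pure random: one choice
		random[rand.Intn(n)]++

		// power of two choices: pick two bins, load the emptier one
		a, b := rand.Intn(n), rand.Intn(n)
		if p2c[a] <= p2c[b] {
			p2c[a]++
		} else {
			p2c[b]++
		}
	}

	fmt.Printf("max load, pure random: %d\n", maxOf(random))
	fmt.Printf("max load, two choices: %d\n", maxOf(p2c))
}

func maxOf(xs []int) int {
	m := xs[0]
	for _, x := range xs {
		if x > m {
			m = x
		}
	}
	return m
}
```

Run it a few times; the pure-random max load should come out noticeably higher than the two-choices one.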
Implementation
type PowerOfTwoChoices struct {
backends []*Backend
mux sync.RWMutex
}
func NewPowerOfTwoChoices() *PowerOfTwoChoices {
return &PowerOfTwoChoices{
backends: make([]*Backend, 0),
}
}
func (p2c *PowerOfTwoChoices) AddBackend(backend *Backend) {
p2c.mux.Lock()
defer p2c.mux.Unlock()
p2c.backends = append(p2c.backends, backend)
}
func (p2c *PowerOfTwoChoices) GetNextBackend() *Backend {
p2c.mux.RLock()
defer p2c.mux.RUnlock()
var healthy []*Backend
for _, b := range p2c.backends {
if b.IsAlive() {
healthy = append(healthy, b)
}
}
n := len(healthy)
if n == 0 {
return nil
}
if n == 1 {
return healthy[0]
}
i := rand.Intn(n)
j := rand.Intn(n)
for j == i {
j = rand.Intn(n)
}
// return the one with fewer connections
if healthy[i].GetConnections() <= healthy[j].GetConnections() {
return healthy[i]
}
return healthy[j]
}
IP Hash
All the algorithms we’ve discussed so far have one thing in common: they don’t guarantee the same client goes to the same server. For stateless applications, that’s fine. For stateful applications (sessions, shopping carts), it’s a problem.
IP Hash solves this by hashing the client’s IP address to determine which server to use.
The Concept
Client IP: 192.168.1.100
Hash: hash("192.168.1.100") = 2847583
Server index: 2847583 % 3 = 1
→ Always goes to Server 1
Client IP: 10.0.0.55
Hash: hash("10.0.0.55") = 9384756
Server index: 9384756 % 3 = 0
→ Always goes to Server 0
The same IP always hashes to the same value, so it always goes to the same server. This is called session affinity or sticky sessions.
┌─────────────────────────────────────────────────────────────────┐
│ │
│ Sydney's requests (IP: 192.168.1.100): │
│ │
│ Login → hash → Server 1 (creates session) │
│ Browse → hash → Server 1 (session exists) │
│ Checkout → hash → Server 1 (session still there) │
│ Payment → hash → Server 1 (session valid) │
│ │
│ All requests go to the same server. Session works :) │
│ │
├─────────────────────────────────────────────────────────────────┤
│ │
│ Without IP Hash (Round Robin): │
│ │
│ Login → Server 1 (creates session) │
│ Browse → Server 2 (no session! "Please log in") │
│ Checkout → Server 3 (no session! "Please log in") │
│ Sydney: *loses her shit* │
│ │
└─────────────────────────────────────────────────────────────────┘
The Problem with Simple IP Hash
What happens when a server dies?
Before:
hash(IP) % 3 = 1 → Server 1
After Server 2 dies (now only 2 servers):
hash(IP) % 2 = 0 → Server 0 (DIFFERENT SERVER!)
Sydney's session is on Server 1, but now she's going to Server 0.
Session lost!
In fact, when a server is added or removed, most clients get remapped to different servers (and with larger pools it's nearly all of them). This is catastrophic for sessions.
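Here's a quick standalone sketch that shows the scale of the problem when you shrink from 3 to 2 backends with plain modulo hashing. It only compares slot indices, so it's an approximation, but it gets the point across:

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// For 100,000 simulated client IPs, compare the slot index hash%3 (three
// healthy servers) against hash%2 (after one dies) and count who moves.
func main() {
	const clients = 100000
	moved := 0
	for i := 0; i < clients; i++ {
		ip := fmt.Sprintf("10.%d.%d.%d", (i>>16)&0xff, (i>>8)&0xff, i&0xff)
		h := fnv.New32a()
		h.Write([]byte(ip))
		v := h.Sum32()
		if v%3 != v%2 {
			moved++
		}
	}
	fmt.Printf("%.1f%% of clients land on a different slot\n",
		100*float64(moved)/clients)
	// expect roughly two thirds here; with bigger pools the remapped
	// fraction creeps toward 100%
}
```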
The solution? Consistent Hashing. But that’s a whole topic on its own, and I’ll cover it in depth later (probably in the next part). For now, let’s implement basic IP Hash.
Implementation
type IPHash struct {
backends []*Backend
mux sync.RWMutex
}
func NewIPHash() *IPHash {
return &IPHash{
backends: make([]*Backend, 0),
}
}
func (ih *IPHash) AddBackend(backend *Backend) {
ih.mux.Lock()
defer ih.mux.Unlock()
ih.backends = append(ih.backends, backend)
}
func (ih *IPHash) GetNextBackendForIP(clientIP string) *Backend {
ih.mux.RLock()
defer ih.mux.RUnlock()
var healthy []*Backend
for _, b := range ih.backends {
if b.IsAlive() {
healthy = append(healthy, b)
}
}
if len(healthy) == 0 {
return nil
}
// hash the client IP
hash := fnv.New32a()
hash.Write([]byte(clientIP))
hashValue := hash.Sum32()
index := int(hashValue % uint32(len(healthy))) // mod before converting, so the index can't go negative on 32-bit ints
return healthy[index]
}
Note: This has the server removal problem I mentioned. We’ll fix it with consistent hashing.
Least Response Time
Instead of just counting connections, let’s consider how FAST each server is responding.
The Concept
Server 1: 5 connections, avg response time 50ms
Server 2: 3 connections, avg response time 200ms
Server 3: 8 connections, avg response time 30ms
Least Connections picks: Server 2 (3 connections)
But Server 2 is slow! Each request takes 200ms.
Least Response Time considers both:
Server 1: 5 conns × 50ms = 250 (score)
Server 2: 3 conns × 200ms = 600 (score)
Server 3: 8 conns × 30ms = 240 (score)
Picks: Server 3 (lowest score)
Even though it has the most connections, it's so fast that it's still the best choice!
Tracking Response Times
We need to measure how long each request takes:
type Backend struct {
Host string
Port int
Alive bool
Connections int64
ResponseTime int64
mux sync.RWMutex
}
For the average, we’ll use an Exponential Moving Average (EMA) instead of a simple average. EMA gives more weight to recent measurements, so the algorithm adapts to changing conditions.
Simple Average:
avg = (t1 + t2 + t3 + ... + tn) / n
Problem: Old measurements have the same weight as new ones.
If server slowed down recently, takes a long time to reflect.
Exponential Moving Average:
newAvg = α × newValue + (1 - α) × oldAvg
Where α (alpha) is typically 0.1 to 0.3
Recent measurements have more influence.
Adapts quickly to changes.
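Here's a tiny standalone sketch showing how the two averages react when a backend suddenly slows down from 50ms to 500ms:

```go
package main

import "fmt"

// Compare a simple running average with an EMA (alpha = 0.2) when a backend's
// response time jumps from 50ms to 500ms.
func main() {
	var samples []float64
	for i := 0; i < 5; i++ {
		samples = append(samples, 50)
	}
	for i := 0; i < 15; i++ {
		samples = append(samples, 500)
	}

	var sum, ema float64
	for i, s := range samples {
		sum += s
		simple := sum / float64(i+1)
		if i == 0 {
			ema = s
		} else {
			ema = 0.2*s + 0.8*ema
		}
		fmt.Printf("sample %2d: simple avg = %5.0f ms   EMA = %5.0f ms\n", i+1, simple, ema)
	}
	// by the last sample the EMA sits near the new ~500ms level, while the
	// simple average is still pulled down by the old 50ms measurements
}
```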
Implementation
type Backend struct {
Host string
Port int
Alive bool
Connections int64
ResponseTime int64
mux sync.RWMutex
}
const emaAlpha = 0.2
func (b *Backend) UpdateResponseTime(duration time.Duration) {
b.mux.Lock()
defer b.mux.Unlock()
newTime := duration.Microseconds()
oldTime := b.ResponseTime
if oldTime == 0 {
b.ResponseTime = newTime
} else {
b.ResponseTime = int64(float64(newTime)*emaAlpha + float64(oldTime)*(1-emaAlpha))
}
}
func (b *Backend) GetResponseTime() int64 {
b.mux.RLock()
defer b.mux.RUnlock()
return b.ResponseTime
}
type LeastResponseTime struct {
backends []*Backend
mux sync.RWMutex
}
func NewLeastResponseTime() *LeastResponseTime {
return &LeastResponseTime{
backends: make([]*Backend, 0),
}
}
func (lrt *LeastResponseTime) AddBackend(backend *Backend) {
lrt.mux.Lock()
defer lrt.mux.Unlock()
lrt.backends = append(lrt.backends, backend)
}
func (lrt *LeastResponseTime) GetNextBackend() *Backend {
lrt.mux.RLock()
defer lrt.mux.RUnlock()
var best *Backend
var bestScore int64 = -1
for _, b := range lrt.backends {
if !b.IsAlive() {
continue
}
// score = connections * responseTime
// lower is better
conns := b.GetConnections()
respTime := b.GetResponseTime()
if respTime == 0 {
respTime = 1
}
score := conns * respTime
if best == nil || score < bestScore {
best = b
bestScore = score
}
}
return best
}
And update the connection handler to measure response time (since we're proxying raw TCP, this times the whole connection rather than a single request, but it's still a decent signal of how responsive a backend is):
func (lb *LoadBalancer) handleConnection(clientConn net.Conn) {
defer clientConn.Close()
backend := lb.algorithm.GetNextBackend()
if backend == nil {
clientConn.Write([]byte("HTTP/1.1 503 Service Unavailable\r\n\r\n"))
return
}
backend.IncrementConnections()
defer backend.DecrementConnections()
// START timing
startTime := time.Now()
backendConn, err := net.Dial("tcp", backend.Address())
if err != nil {
clientConn.Write([]byte("HTTP/1.1 502 Bad Gateway\r\n\r\n"))
return
}
defer backendConn.Close()
lb.forwardTraffic(clientConn, backendConn)
// END timing, update response time
duration := time.Since(startTime)
backend.UpdateResponseTime(duration)
}
Putting It All Together - The Algorithm Interface
Let’s create a clean interface so we can easily swap algorithms:
type Algorithm interface {
GetNextBackend() *Backend
}
type WeightedAlgorithm interface {
Algorithm
AddBackend(backend *Backend, weight int)
}
type AffinityAlgorithm interface {
GetNextBackendForIP(clientIP string) *Backend
AddBackend(backend *Backend)
}
Notice that AddBackend isn’t part of the base Algorithm interface: its signature differs between algorithms (the weighted ones take a weight), so we register backends on the concrete types up front and only hand the finished algorithm to the load balancer, which just calls GetNextBackend().
Now our LoadBalancer can work with any algorithm:
type LoadBalancer struct {
host string
port int
algorithm Algorithm
serverPool *ServerPool
}
func NewLoadBalancer(host string, port int, algo Algorithm, pool *ServerPool) *LoadBalancer {
return &LoadBalancer{
host: host,
port: port,
algorithm: algo,
serverPool: pool,
}
}
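One wrinkle with the affinity interface: our load balancer speaks raw TCP, so it has to pull the client IP off the connection itself. A small sketch of how that could look (the backendForConn helper name is mine, not from earlier parts):

```go
// pick a backend for this connection using an affinity algorithm;
// returns nil if the remote address can't be parsed
func backendForConn(conn net.Conn, algo AffinityAlgorithm) *Backend {
	host, _, err := net.SplitHostPort(conn.RemoteAddr().String())
	if err != nil {
		return nil
	}
	return algo.GetNextBackendForIP(host)
}
```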
Decision Flowchart
START
│
▼
┌───────────────────┐
│ Need session │
│ affinity? │
└───────────────────┘
│ │
Yes No
│ │
▼ ▼
┌───────────┐ ┌───────────────────┐
│ IP Hash │ │ Servers have │
│ or │ │ different │
│ Consistent│ │ capacities? │
│ Hashing │ └───────────────────┘
└───────────┘ │ │
Yes No
│ │
▼ ▼
┌──────────────┐ ┌───────────────────┐
│ Request │ │ Request │
│ complexity │ │ complexity │
│ varies? │ │ varies? │
└──────────────┘ └───────────────────┘
│ │ │ │
Yes No Yes No
│ │ │ │
▼ ▼ ▼ ▼
┌──────────┐ ┌──────┐ ┌──────┐ ┌──────────┐
│ Weighted │ │Weight│ │Least │ │ Round │
│ Least │ │ RR │ │Conns │ │ Robin │
│ Conns │ │ │ │or P2C│ │ │
└──────────┘ └──────┘ └──────┘ └──────────┘
Testing the Algorithms
Let’s create a test that shows how different algorithms behave under load.
Create main.go with algorithm selection:
package main
import (
"flag"
"fmt"
"hash/fnv"
"io"
"log"
"math/rand"
"net"
"sync"
"sync/atomic"
"time"
)
// include all the Backend, ServerPool, Algorithm implementations from above!!!
func main() {
algoName := flag.String("algo", "roundrobin", "Algorithm: roundrobin, weighted, leastconn, p2c, random")
flag.Parse()
pool := NewServerPool()
b1 := pool.AddBackend("127.0.0.1", 8081)
b2 := pool.AddBackend("127.0.0.1", 8082)
b3 := pool.AddBackend("127.0.0.1", 8083)
var algo Algorithm
switch *algoName {
case "roundrobin":
rr := NewRoundRobin()
rr.AddBackend(b1)
rr.AddBackend(b2)
rr.AddBackend(b3)
algo = rr
log.Println("[LB] Using Round Robin algorithm")
case "weighted":
wrr := NewWeightedRoundRobin()
wrr.AddBackend(b1, 5)
wrr.AddBackend(b2, 2)
wrr.AddBackend(b3, 1)
algo = wrr
log.Println("[LB] Using Weighted Round Robin algorithm (5:2:1)")
case "leastconn":
lc := NewLeastConnections()
lc.AddBackend(b1)
lc.AddBackend(b2)
lc.AddBackend(b3)
algo = lc
log.Println("[LB] Using Least Connections algorithm")
case "p2c":
p2c := NewPowerOfTwoChoices()
p2c.AddBackend(b1)
p2c.AddBackend(b2)
p2c.AddBackend(b3)
algo = p2c
log.Println("[LB] Using Power of Two Choices algorithm")
case "random":
rs := NewRandomSelection()
rs.AddBackend(b1)
rs.AddBackend(b2)
rs.AddBackend(b3)
algo = rs
log.Println("[LB] Using Random algorithm")
default:
log.Fatalf("Unknown algorithm: %s", *algoName)
}
healthChecker := NewHealthChecker(pool)
go healthChecker.Start()
lb := NewLoadBalancer("0.0.0.0", 8080, algo, pool)
if err := lb.Start(); err != nil {
log.Fatalf("Load balancer failed: %v", err)
}
}
Testing Weighted Round Robin
# terminals 1-3: start backend servers
go run backend/server.go 8081
go run backend/server.go 8082
go run backend/server.go 8083
# terminal 4: start load balancer with weighted RR
go run main.go -algo weighted
# terminal 5: send requests and count distribution
for i in {1..80}; do curl -s http://localhost:8080 | grep "Backend Server"; done | sort | uniq -c
Expected output (approx):
50 <h1>Backend Server 8081</h1>
20 <h1>Backend Server 8082</h1>
10 <h1>Backend Server 8083</h1>
Roughly 5:2:1 ratio.
Testing Least Connections
To see Least Connections in action, we need requests with different processing times. Update the backend server to simulate slow requests:
http.HandleFunc("/slow", func(w http.ResponseWriter, r *http.Request) {
// simulate slow processing
time.Sleep(2 * time.Second)
fmt.Fprintf(w, "Server %d: slow response\n", port)
})
http.HandleFunc("/fast", func(w http.ResponseWriter, r *http.Request) {
fmt.Fprintf(w, "Server %d: fast response\n", port)
})
Then test:
# start with least connections
go run main.go -algo leastconn
watch -n 0.5 'curl -s http://localhost:8080/status'
# send some slow requests...they'll pile up on servers
for i in {1..10}; do curl http://localhost:8080/slow & done
# send fast requests...they should avoid the busy servers
for i in {1..20}; do curl -s http://localhost:8080/fast; done
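By the way, if you don't have backend/server.go from the earlier parts handy, a minimal stand-in with /health, /slow, and /fast might look like this (a sketch; your version from the earlier parts probably differs):

```go
// backend/server.go — minimal stand-in backend for testing.
// Usage: go run backend/server.go 8081
package main

import (
	"fmt"
	"log"
	"net/http"
	"os"
	"time"
)

func main() {
	if len(os.Args) < 2 {
		log.Fatal("usage: go run backend/server.go <port>")
	}
	port := os.Args[1]

	http.HandleFunc("/health", func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, "OK")
	})
	http.HandleFunc("/slow", func(w http.ResponseWriter, r *http.Request) {
		time.Sleep(2 * time.Second) // simulate heavy work
		fmt.Fprintf(w, "Server %s: slow response\n", port)
	})
	http.HandleFunc("/fast", func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintf(w, "Server %s: fast response\n", port)
	})
	http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintf(w, "<h1>Backend Server %s</h1>", port)
	})

	log.Printf("Backend listening on :%s", port)
	log.Fatal(http.ListenAndServe(":"+port, nil))
}
```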
Performance Considerations
Different algorithms have different overhead:
┌─────────────────┬────────────────┬─────────────────────────────┐
│ Algorithm │ Time │ Notes │
│ │ Complexity │ │
├─────────────────┼────────────────┼─────────────────────────────┤
│ Round Robin │ O(1)* │ Just increment index │
│ │ │ *O(n) if skipping unhealthy │
├─────────────────┼────────────────┼─────────────────────────────┤
│ Weighted RR │ O(n) │ Check all for highest weight│
├─────────────────┼────────────────┼─────────────────────────────┤
│ Random │ O(n)* │ O(n) to filter healthy, │
│ │ │ O(1) to pick │
├─────────────────┼────────────────┼─────────────────────────────┤
│ Least Conns │ O(n) │ Check all connection counts │
├─────────────────┼────────────────┼─────────────────────────────┤
│ Power of 2 │ O(n)* │ O(n) to filter healthy, │
│ │ │ O(1) to pick 2 and compare │
├─────────────────┼────────────────┼─────────────────────────────┤
│ IP Hash │ O(n)* │ O(1) hash, O(n) filter │
├─────────────────┼────────────────┼─────────────────────────────┤
│ Least Resp Time │ O(n) │ Check all response times │
└─────────────────┴────────────────┴─────────────────────────────┘
* Can be optimized with maintained healthy list
For most applications with <100 servers, these differences don’t matter. If you have thousands of servers, consider:
- Maintaining a separate list of healthy backends so you don't filter on every request (sketched after this list)
- Using Power of Two Choices instead of Least Connections
- Sharding the backend pool
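Here's roughly what that first point could look like: the health checker keeps a pre-filtered slice that selection just reads, instead of scanning for IsAlive() on every request. A sketch only; the HealthySet name and methods are mine, not from earlier parts, and it assumes the sync/atomic import from the complete implementation below.

```go
// HealthySet keeps a pre-filtered slice of healthy backends that the health
// checker refreshes and selection algorithms read without locking or filtering.
type HealthySet struct {
	healthy atomic.Value // holds a []*Backend snapshot
}

// Update rebuilds the snapshot; call it from the health checker after each round.
func (hs *HealthySet) Update(all []*Backend) {
	alive := make([]*Backend, 0, len(all))
	for _, b := range all {
		if b.IsAlive() {
			alive = append(alive, b)
		}
	}
	hs.healthy.Store(alive)
}

// Snapshot returns the current healthy backends.
func (hs *HealthySet) Snapshot() []*Backend {
	v, _ := hs.healthy.Load().([]*Backend)
	return v
}
```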
Complete Implementation
Here’s a complete, runnable example with all algorithms:
package main
import (
"flag"
"fmt"
"hash/fnv"
"io"
"log"
"math/rand"
"net"
"net/http"
"sync"
"sync/atomic"
"time"
)
// ============== backend ==============
type Backend struct {
Host string
Port int
alive int32
connections int64
responseTime int64
}
func NewBackend(host string, port int) *Backend {
return &Backend{
Host: host,
Port: port,
alive: 1,
}
}
func (b *Backend) Address() string {
return fmt.Sprintf("%s:%d", b.Host, b.Port)
}
func (b *Backend) IsAlive() bool {
return atomic.LoadInt32(&b.alive) == 1
}
func (b *Backend) SetAlive(alive bool) {
var v int32 = 0
if alive {
v = 1
}
atomic.StoreInt32(&b.alive, v)
}
func (b *Backend) GetConnections() int64 {
return atomic.LoadInt64(&b.connections)
}
func (b *Backend) IncrementConnections() {
atomic.AddInt64(&b.connections, 1)
}
func (b *Backend) DecrementConnections() {
atomic.AddInt64(&b.connections, -1)
}
func (b *Backend) GetResponseTime() int64 {
return atomic.LoadInt64(&b.responseTime)
}
func (b *Backend) UpdateResponseTime(d time.Duration) {
newTime := d.Microseconds()
oldTime := atomic.LoadInt64(&b.responseTime)
if oldTime == 0 {
atomic.StoreInt64(&b.responseTime, newTime)
} else {
// EMA with alpha = 0.2
ema := int64(float64(newTime)*0.2 + float64(oldTime)*0.8)
atomic.StoreInt64(&b.responseTime, ema)
}
}
// ============== server pool ==============
type ServerPool struct {
backends []*Backend
mux sync.RWMutex
}
func NewServerPool() *ServerPool {
return &ServerPool{
backends: make([]*Backend, 0),
}
}
func (p *ServerPool) AddBackend(host string, port int) *Backend {
p.mux.Lock()
defer p.mux.Unlock()
backend := NewBackend(host, port)
p.backends = append(p.backends, backend)
log.Printf("[POOL] Added server: %s", backend.Address())
return backend
}
func (p *ServerPool) GetAllBackends() []*Backend {
p.mux.RLock()
defer p.mux.RUnlock()
result := make([]*Backend, len(p.backends))
copy(result, p.backends)
return result
}
func (p *ServerPool) GetHealthyBackends() []*Backend {
p.mux.RLock()
defer p.mux.RUnlock()
var healthy []*Backend
for _, b := range p.backends {
if b.IsAlive() {
healthy = append(healthy, b)
}
}
return healthy
}
type Algorithm interface {
Next() *Backend
}
// ============== RR ==============
type RoundRobin struct {
backends []*Backend
current uint64
}
func NewRoundRobin(backends []*Backend) *RoundRobin {
return &RoundRobin{backends: backends}
}
func (rr *RoundRobin) Next() *Backend {
if len(rr.backends) == 0 {
return nil
}
n := len(rr.backends)
for i := 0; i < n; i++ {
idx := atomic.AddUint64(&rr.current, 1) - 1
backend := rr.backends[idx%uint64(n)]
if backend.IsAlive() {
return backend
}
}
return nil
}
// ============== WEIGHTED RR (WRR) ==============
type WeightedBackend struct {
Backend *Backend
Weight int
currentWeight int
}
type WeightedRoundRobin struct {
backends []*WeightedBackend
totalWeight int
mux sync.Mutex
}
func NewWeightedRoundRobin() *WeightedRoundRobin {
return &WeightedRoundRobin{
backends: make([]*WeightedBackend, 0),
}
}
func (wrr *WeightedRoundRobin) AddBackend(backend *Backend, weight int) {
wrr.backends = append(wrr.backends, &WeightedBackend{
Backend: backend,
Weight: weight,
})
wrr.totalWeight += weight
}
func (wrr *WeightedRoundRobin) Next() *Backend {
wrr.mux.Lock()
defer wrr.mux.Unlock()
if len(wrr.backends) == 0 {
return nil
}
var best *WeightedBackend
for _, wb := range wrr.backends {
if !wb.Backend.IsAlive() {
continue
}
wb.currentWeight += wb.Weight
if best == nil || wb.currentWeight > best.currentWeight {
best = wb
}
}
if best == nil {
return nil
}
best.currentWeight -= wrr.totalWeight
return best.Backend
}
// ============== LEAST CONNECTIONS ==============
type LeastConnections struct {
backends []*Backend
}
func NewLeastConnections(backends []*Backend) *LeastConnections {
return &LeastConnections{backends: backends}
}
func (lc *LeastConnections) Next() *Backend {
var best *Backend
var minConns int64 = -1
for _, b := range lc.backends {
if !b.IsAlive() {
continue
}
conns := b.GetConnections()
if best == nil || conns < minConns {
best = b
minConns = conns
}
}
return best
}
// ============== POWER OF 2 CHOICES ==============
type PowerOfTwoChoices struct {
backends []*Backend
}
func NewPowerOfTwoChoices(backends []*Backend) *PowerOfTwoChoices {
return &PowerOfTwoChoices{backends: backends}
}
func (p2c *PowerOfTwoChoices) Next() *Backend {
var healthy []*Backend
for _, b := range p2c.backends {
if b.IsAlive() {
healthy = append(healthy, b)
}
}
n := len(healthy)
if n == 0 {
return nil
}
if n == 1 {
return healthy[0]
}
i := rand.Intn(n)
j := rand.Intn(n)
for j == i {
j = rand.Intn(n)
}
if healthy[i].GetConnections() <= healthy[j].GetConnections() {
return healthy[i]
}
return healthy[j]
}
// ============== RANDOM ==============
type RandomSelection struct {
backends []*Backend
}
func NewRandomSelection(backends []*Backend) *RandomSelection {
return &RandomSelection{backends: backends}
}
func (rs *RandomSelection) Next() *Backend {
var healthy []*Backend
for _, b := range rs.backends {
if b.IsAlive() {
healthy = append(healthy, b)
}
}
if len(healthy) == 0 {
return nil
}
return healthy[rand.Intn(len(healthy))]
}
// ============== IP HASH ==============
type IPHash struct {
backends []*Backend
}
func NewIPHash(backends []*Backend) *IPHash {
return &IPHash{backends: backends}
}
func (ih *IPHash) NextForIP(ip string) *Backend {
var healthy []*Backend
for _, b := range ih.backends {
if b.IsAlive() {
healthy = append(healthy, b)
}
}
if len(healthy) == 0 {
return nil
}
hash := fnv.New32a()
hash.Write([]byte(ip))
idx := int(hash.Sum32() % uint32(len(healthy)))
return healthy[idx]
}
// ============== LEAST RESPONSE TIME ==============
type LeastResponseTime struct {
backends []*Backend
}
func NewLeastResponseTime(backends []*Backend) *LeastResponseTime {
return &LeastResponseTime{backends: backends}
}
func (lrt *LeastResponseTime) Next() *Backend {
var best *Backend
var bestScore int64 = -1
for _, b := range lrt.backends {
if !b.IsAlive() {
continue
}
conns := b.GetConnections()
respTime := b.GetResponseTime()
if respTime == 0 {
respTime = 1
}
score := conns * respTime
if best == nil || score < bestScore {
best = b
bestScore = score
}
}
return best
}
// ============== HEALTH CHECKER ==============
type HealthChecker struct {
pool *ServerPool
interval time.Duration
timeout time.Duration
failThreshold int
riseThreshold int
failCounts map[string]int
successCounts map[string]int
mux sync.Mutex
}
func NewHealthChecker(pool *ServerPool) *HealthChecker {
return &HealthChecker{
pool: pool,
interval: 5 * time.Second,
timeout: 3 * time.Second,
failThreshold: 3,
riseThreshold: 2,
failCounts: make(map[string]int),
successCounts: make(map[string]int),
}
}
func (hc *HealthChecker) check(b *Backend) bool {
client := &http.Client{Timeout: hc.timeout}
resp, err := client.Get(fmt.Sprintf("http://%s/health", b.Address()))
if err != nil {
return false
}
defer resp.Body.Close()
return resp.StatusCode >= 200 && resp.StatusCode < 300
}
func (hc *HealthChecker) process(b *Backend, healthy bool) {
hc.mux.Lock()
defer hc.mux.Unlock()
key := b.Address()
wasAlive := b.IsAlive()
if healthy {
hc.failCounts[key] = 0
hc.successCounts[key]++
if !wasAlive && hc.successCounts[key] >= hc.riseThreshold {
b.SetAlive(true)
log.Printf("[HEALTH] %s is now HEALTHY", key)
hc.successCounts[key] = 0
}
} else {
hc.successCounts[key] = 0
hc.failCounts[key]++
if wasAlive && hc.failCounts[key] >= hc.failThreshold {
b.SetAlive(false)
log.Printf("[HEALTH] %s is now UNHEALTHY", key)
hc.failCounts[key] = 0
}
}
}
func (hc *HealthChecker) Start() {
log.Printf("[HEALTH] Starting health checker")
ticker := time.NewTicker(hc.interval)
hc.checkAll()
for range ticker.C {
hc.checkAll()
}
}
func (hc *HealthChecker) checkAll() {
var wg sync.WaitGroup
for _, b := range hc.pool.GetAllBackends() {
wg.Add(1)
go func(backend *Backend) {
defer wg.Done()
hc.process(backend, hc.check(backend))
}(b)
}
wg.Wait()
}
// ============== LOAD BALANCER ==============
type LoadBalancer struct {
host string
port int
algorithm Algorithm
}
func NewLoadBalancer(host string, port int, algo Algorithm) *LoadBalancer {
return &LoadBalancer{
host: host,
port: port,
algorithm: algo,
}
}
func (lb *LoadBalancer) Start() error {
addr := fmt.Sprintf("%s:%d", lb.host, lb.port)
listener, err := net.Listen("tcp", addr)
if err != nil {
return err
}
defer listener.Close()
log.Printf("[LB] Started on %s", addr)
for {
conn, err := listener.Accept()
if err != nil {
log.Printf("[LB] Accept error: %v", err)
continue
}
go lb.handle(conn)
}
}
func (lb *LoadBalancer) handle(client net.Conn) {
defer client.Close()
backend := lb.algorithm.Next()
if backend == nil {
client.Write([]byte("HTTP/1.1 503 Service Unavailable\r\n\r\n"))
return
}
backend.IncrementConnections()
defer backend.DecrementConnections()
start := time.Now()
server, err := net.Dial("tcp", backend.Address())
if err != nil {
client.Write([]byte("HTTP/1.1 502 Bad Gateway\r\n\r\n"))
return
}
defer server.Close()
var wg sync.WaitGroup
wg.Add(2)
go func() {
defer wg.Done()
io.Copy(server, client)
}()
go func() {
defer wg.Done()
io.Copy(client, server)
}()
wg.Wait()
backend.UpdateResponseTime(time.Since(start))
}
func main() {
algo := flag.String("algo", "roundrobin", "roundrobin|weighted|leastconn|p2c|random|lrt")
flag.Parse()
rand.Seed(time.Now().UnixNano())
pool := NewServerPool()
b1 := pool.AddBackend("127.0.0.1", 8081)
b2 := pool.AddBackend("127.0.0.1", 8082)
b3 := pool.AddBackend("127.0.0.1", 8083)
backends := []*Backend{b1, b2, b3}
var algorithm Algorithm
switch *algo {
case "roundrobin":
algorithm = NewRoundRobin(backends)
log.Println("[LB] Algorithm: Round Robin")
case "weighted":
wrr := NewWeightedRoundRobin()
wrr.AddBackend(b1, 5)
wrr.AddBackend(b2, 2)
wrr.AddBackend(b3, 1)
algorithm = wrr
log.Println("[LB] Algorithm: Weighted Round Robin (5:2:1)")
case "leastconn":
algorithm = NewLeastConnections(backends)
log.Println("[LB] Algorithm: Least Connections")
case "p2c":
algorithm = NewPowerOfTwoChoices(backends)
log.Println("[LB] Algorithm: Power of Two Choices")
case "random":
algorithm = NewRandomSelection(backends)
log.Println("[LB] Algorithm: Random")
case "lrt":
algorithm = NewLeastResponseTime(backends)
log.Println("[LB] Algorithm: Least Response Time")
default:
log.Fatalf("Unknown algorithm: %s", *algo)
}
hc := NewHealthChecker(pool)
go hc.Start()
lb := NewLoadBalancer("0.0.0.0", 8080, algorithm)
if err := lb.Start(); err != nil {
log.Fatal(err)
}
}
Recap
We covered A LOOTTT in this part. Let’s summarize:
The Problem:
- Round Robin treats all servers and requests equally
- Real world: servers have different capacities, requests have different costs
The Solutions:
| Algorithm | Solves | Complexity |
|---|---|---|
| Weighted RR | Different server capacities | O(n) |
| Least Connections | Varying request duration | O(n) |
| Weighted LC | Both problems above | O(n) |
| Random | Need simplicity at scale | O(1)* |
| Power of 2 Choices | Scale + connection awareness | O(1)* |
| IP Hash | Session affinity | O(1)* |
| Least Response Time | Performance-sensitive apps | O(n) |
*After filtering healthy backends
When to Use What:
- Start with Round Robin (simple, predictable)
- Add weights if server capacities differ
- Switch to Least Connections if request durations vary
- Use Power of Two Choices at scale
- Use IP Hash when you need sticky sessions
What Now?
We’ve got a pretty decent load balancer now. Multiple algos, health checking, connection tracking. But we still have a problem with IP Hash: when a server goes down, ALL sessions get messed up.
In the next part, I’m diving deep into Session Persistence.
As always, hit me up on X / Twitter if you have questions, found bugs, or want to discuss load balancing algorithms. I love talking about this stuff.
See you in part 7 <3