
The Only Guide You'd Ever Need for Load Balancers - 6

Load Balancing Algorithms - Beyond Round Robin

Welcome back. If you’re coming from part 5, our load balancer now has health checking. Dead servers get kicked out, recovering servers get added back. Cool.

But here’s the thing…we’re still using Round Robin. And it’s kind of dumb. It treats every server exactly the same and every request exactly the same. In the real world, that’s rarely true.

In this part, we’re going to explore a whole zoo of load balancing algorithms.


Why Round Robin Isn’t Always Optimal

Let me show you two scenarios where Round Robin falls apart.

Problem 1: Server Heterogeneity

You have three servers, but they’re not equal:

┌─────────────────────────────────────────────────────────────────┐
│                                                                 │
│   Server 1: Beefy boy                                           │
│   ┌─────────────────────────────────────────────────────────┐   │
│   │  CPU: 32 cores                                          │   │
│   │  RAM: 128 GB                                            │   │
│   │  Can handle: ~3000 requests/second                      │   │
│   └─────────────────────────────────────────────────────────┘   │
│                                                                 │
│   Server 2: Medium machine                                      │
│   ┌─────────────────────────────────────────────────────────┐   │
│   │  CPU: 8 cores                                           │   │
│   │  RAM: 32 GB                                             │   │
│   │  Can handle: ~1000 requests/second                      │   │
│   └─────────────────────────────────────────────────────────┘   │
│                                                                 │
│   Server 3: Potato                                              │
│   ┌─────────────────────────────────────────────────────────┐   │
│   │  CPU: 2 cores                                           │   │
│   │  RAM: 4 GB                                              │   │
│   │  Can handle: ~200 requests/second                       │   │
│   └─────────────────────────────────────────────────────────┘   │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

With Round Robin:

Total capacity: 3000 + 1000 + 200 = 4200 req/s

Round Robin distribution:
  Server 1: 33.3% of traffic = 1400 req/s (yawning, at 46% capacity)
  Server 2: 33.3% of traffic = 1400 req/s (OVERLOADED, at 140% capacity)
  Server 3: 33.3% of traffic = 1400 req/s (DEAD, at 700% capacity)

Result: We're wasting Server 1's capacity while killing Server 3

Round Robin assumes all servers are equal. When they’re not, you get unbalanced load.

Problem 2: Request Heterogeneity

Even if your servers are identical, requests aren’t:

┌─────────────────────────────────────────────────────────────────┐
│                                                                 │
│   Request types hitting your wingman dating app:                │
│                                                                 │
│   GET /health                                                   │
│   └── Processing time: 1ms                                      │
│   └── Just returns "OK"                                         │
│                                                                 │
│   GET /profile/123                                              │
│   └── Processing time: 50ms                                     │
│   └── Database query, cache check                               │
│                                                                 │
│   POST /match/calculate                                         │
│   └── Processing time: 2000ms                                   │
│   └── ML model inference, heavy computation                     │
│                                                                 │
│   GET /feed/generate                                            │
│   └── Processing time: 500ms                                    │
│   └── Multiple DB queries, sorting, filtering                   │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

With Round Robin:

Request 1: GET /health → Server 1 (1ms, done instantly)
Request 2: POST /match/calculate → Server 2 (2000ms, still processing...)
Request 3: GET /health → Server 3 (1ms, done instantly)
Request 4: GET /feed/generate → Server 1 (500ms, processing...)
Request 5: POST /match/calculate → Server 2 (still busy with request 2!)
Request 6: GET /profile/123 → Server 3 (50ms, done)

Server 2 is now backed up with heavy requests
while Server 3 is twiddling its thumbs

RR counts requests, not actual load. A server handling one heavy request looks the same as a server handling one lightweight request.


Weighted Round Robin

The first evolution of Round Robin. Simple idea: give each server a weight based on its capacity.

The Concept

Server 1: Weight 5 (can handle 5x the load)
Server 2: Weight 2 (can handle 2x the load)
Server 3: Weight 1 (baseline)

Distribution over 8 requests:
Request 1 → Server 1
Request 2 → Server 1
Request 3 → Server 1
Request 4 → Server 1
Request 5 → Server 1
Request 6 → Server 2
Request 7 → Server 2
Request 8 → Server 3
(then repeat)

Server 1 gets 5/8 = 62.5% of traffic
Server 2 gets 2/8 = 25% of traffic
Server 3 gets 1/8 = 12.5% of traffic

Much better. Now our beefy server gets more traffic.

The Naive Implementation

The simplest approach: repeat servers in the list based on their weight.

Original: [Server1(w=5), Server2(w=2), Server3(w=1)]

Expanded: [S1, S1, S1, S1, S1, S2, S2, S3]

Then just do regular Round Robin on the expanded list.
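
Concretely, a minimal sketch of the naive version (reusing the Backend type from earlier parts and skipping health checks for brevity; NaiveWeightedRR is just a name for this throwaway variant):

type NaiveWeightedRR struct {
    expanded []*Backend
    current  uint64
}

func (n *NaiveWeightedRR) AddBackend(b *Backend, weight int) {
    // repeat the backend once per unit of weight
    for i := 0; i < weight; i++ {
        n.expanded = append(n.expanded, b)
    }
}

func (n *NaiveWeightedRR) GetNextBackend() *Backend {
    if len(n.expanded) == 0 {
        return nil
    }
    // plain round robin over the expanded list (needs sync/atomic)
    idx := atomic.AddUint64(&n.current, 1) - 1
    return n.expanded[idx%uint64(len(n.expanded))]
}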

But this has a problem. The distribution is “bursty”:

Requests over time:
S1 → S1 → S1 → S1 → S1 → S2 → S2 → S3 → S1 → S1 → ...

Server 1 gets 5 requests in a row, then nothing.
Server 3 gets 1 request, then waits for 7 requests.
This is uneven and can cause micro-bursts.

Smooth Weighted Round Robin (Nginx’s Approach)

Nginx uses a clever algorithm to distribute requests more evenly. Let me explain.

Each server has:

  • weight: the configured weight (static)
  • currentWeight: changes with each selection (dynamic)
  • effectiveWeight: usually equals weight, but can decrease on failures (we’ll keep things simple and just use weight here)

The algorithm:

  1. Add weight to each server’s currentWeight
  2. Pick the server with the highest currentWeight
  3. Subtract totalWeight from the selected server’s currentWeight

Let me walk through an example:

Servers: A(w=5), B(w=2), C(w=1)
Total weight = 8

Round 1:
  Before: A.current=0, B.current=0, C.current=0
  Add weights: A.current=5, B.current=2, C.current=1
  Highest: A (5)
  Select A, subtract total: A.current = 5 - 8 = -3
  After: A.current=-3, B.current=2, C.current=1

Round 2:
  Add weights: A.current=2, B.current=4, C.current=2
  Highest: B (4)
  Select B, subtract total: B.current = 4 - 8 = -4
  After: A.current=2, B.current=-4, C.current=2

Round 3:
  Add weights: A.current=7, B.current=-2, C.current=3
  Highest: A (7)
  Select A, subtract total: A.current = 7 - 8 = -1
  After: A.current=-1, B.current=-2, C.current=3

Round 4:
  Add weights: A.current=4, B.current=0, C.current=4
  Highest: A or C (tie at 4, pick A)
  Select A, subtract total: A.current = 4 - 8 = -4
  After: A.current=-4, B.current=0, C.current=4

Round 5:
  Add weights: A.current=1, B.current=2, C.current=5
  Highest: C (5)
  Select C, subtract total: C.current = 5 - 8 = -3
  After: A.current=1, B.current=2, C.current=-3

...and so on

The sequence over 8 requests: A, B, A, A, C, A, B, A

Naive approach:     A → A → A → A → A → B → B → C
Smooth approach:    A → B → A → A → C → A → B → A

Much more evenly spread now :)

Implementation

type WeightedBackend struct {
    Backend       *Backend
    Weight        int
    CurrentWeight int
}

type WeightedRoundRobin struct {
    backends    []*WeightedBackend
    totalWeight int
    mux         sync.Mutex
}

func NewWeightedRoundRobin() *WeightedRoundRobin {
    return &WeightedRoundRobin{
        backends: make([]*WeightedBackend, 0),
    }
}

func (wrr *WeightedRoundRobin) AddBackend(backend *Backend, weight int) {
    wrr.mux.Lock()
    defer wrr.mux.Unlock()

    wb := &WeightedBackend{
        Backend:       backend,
        Weight:        weight,
        CurrentWeight: 0,
    }
    wrr.backends = append(wrr.backends, wb)
    wrr.totalWeight += weight
}

func (wrr *WeightedRoundRobin) GetNextBackend() *Backend {
    wrr.mux.Lock()
    defer wrr.mux.Unlock()

    if len(wrr.backends) == 0 {
        return nil
    }

    var best *WeightedBackend

    for _, wb := range wrr.backends {
        if !wb.Backend.IsAlive() {
            continue
        }

        // add weight to current weight
        wb.CurrentWeight += wb.Weight

        // track the best (highest current weight)
        if best == nil || wb.CurrentWeight > best.CurrentWeight {
            best = wb
        }
    }

    if best == nil {
        return nil
    }

    best.CurrentWeight -= wrr.totalWeight

    return best.Backend
}

Least Connections

Now let’s tackle the request heterogeneity problem. Instead of blindly rotating, let’s send traffic to the server with the fewest active connections.

The Concept

[Figure: concept of least connections]

This naturally handles request heterogeneity:

  • Servers handling slow requests accumulate connections
  • Servers handling fast requests finish quickly and free up connections
  • New requests go to less loaded servers

Tracking Active Connections

We need to track when connections start and end:

Request comes in:
  1. Pick server with least connections
  2. Increment that server's connection count
  3. Forward request

Request completes:
  4. Decrement that server's connection count

Timeline:

0ms:   Request 1 → Server 1 (now has 1 conn)
       Request 2 → Server 2 (now has 1 conn)
       Request 3 → Server 3 (now has 1 conn)

10ms:  Request 4 arrives
       Server 1: 1 conn, Server 2: 1 conn, Server 3: 1 conn
       (tie, pick any) → Server 1 (now has 2 conns)

20ms:  Request 2 completes
       Server 2: 0 conns now

30ms:  Request 5 arrives
       Server 1: 2 conns, Server 2: 0 conns, Server 3: 1 conn
       Pick Server 2! (has 0 conns)

Implementation

First, update the Backend struct to track connections:

type Backend struct {
    Host        string
    Port        int
    Alive       bool
    Connections int64
    mux         sync.RWMutex
}

func (b *Backend) IncrementConnections() {
    atomic.AddInt64(&b.Connections, 1)
}

func (b *Backend) DecrementConnections() {
    atomic.AddInt64(&b.Connections, -1)
}

func (b *Backend) GetConnections() int64 {
    return atomic.LoadInt64(&b.Connections)
}

I’m using atomic operations here because connection counts change frequently and from multiple goroutines. Atomic operations are faster than mutex locks for simple counters.

Now the algorithm:

type LeastConnections struct {
    backends []*Backend
    mux      sync.RWMutex
}

func NewLeastConnections() *LeastConnections {
    return &LeastConnections{
        backends: make([]*Backend, 0),
    }
}

func (lc *LeastConnections) AddBackend(backend *Backend) {
    lc.mux.Lock()
    defer lc.mux.Unlock()
    lc.backends = append(lc.backends, backend)
}

func (lc *LeastConnections) GetNextBackend() *Backend {
    lc.mux.RLock()
    defer lc.mux.RUnlock()

    var best *Backend
    var minConns int64 = -1

    for _, b := range lc.backends {
        if !b.IsAlive() {
            continue
        }

        conns := b.GetConnections()

        if best == nil || conns < minConns {
            best = b
            minConns = conns
        }
    }

    return best
}

And update the connection handling in the load balancer:

func (lb *LoadBalancer) handleConnection(clientConn net.Conn) {
    defer clientConn.Close()

    backend := lb.algorithm.GetNextBackend()
    if backend == nil {
        clientConn.Write([]byte("HTTP/1.1 503 Service Unavailable\r\n\r\n"))
        return
    }

    // INCREMENT connection count
    backend.IncrementConnections()
    defer backend.DecrementConnections() // DECREMENT when done

    backendConn, err := net.Dial("tcp", backend.Address())
    if err != nil {
        clientConn.Write([]byte("HTTP/1.1 502 Bad Gateway\r\n\r\n"))
        return
    }
    defer backendConn.Close()

    lb.forwardTraffic(clientConn, backendConn)
}

defer backend.DecrementConnections() ensures we decrement even if something goes wrong.


Weighted Least Connections

What if you have both problems? Different server capacities AND request heterogeneity?

Combine Weighted Round Robin and Least Connections!

The Concept

Instead of picking the server with the fewest connections, pick the server with the fewest connections relative to its weight.

Formula: score = connections / weight

Lower score = better choice

Example:
  Server 1: 10 connections, weight 5 → score = 10/5 = 2.0
  Server 2: 4 connections, weight 2  → score = 4/2 = 2.0
  Server 3: 3 connections, weight 1  → score = 3/1 = 3.0

Server 1 and 2 are tied at 2.0, both are good choices.
Server 3 has a higher score, meaning it's relatively more loaded.

[Figure: concept of least connections]

Implementation

type WeightedLeastConnections struct {
    backends []*WeightedBackend
    mux      sync.RWMutex
}

func NewWeightedLeastConnections() *WeightedLeastConnections {
    return &WeightedLeastConnections{
        backends: make([]*WeightedBackend, 0),
    }
}

func (wlc *WeightedLeastConnections) AddBackend(backend *Backend, weight int) {
    wlc.mux.Lock()
    defer wlc.mux.Unlock()

    wb := &WeightedBackend{
        Backend: backend,
        Weight:  weight,
    }
    wlc.backends = append(wlc.backends, wb)
}

func (wlc *WeightedLeastConnections) GetNextBackend() *Backend {
    wlc.mux.RLock()
    defer wlc.mux.RUnlock()

    var best *WeightedBackend
    var bestScore float64 = -1

    for _, wb := range wlc.backends {
        if !wb.Backend.IsAlive() {
            continue
        }

        // calculate score: connections / weight
        // lower is better
        conns := float64(wb.Backend.GetConnections())
        score := conns / float64(wb.Weight)

        if best == nil || score < bestScore {
            best = wb
            bestScore = score
        }
    }

    if best == nil {
        return nil
    }

    return best.Backend
}

Random Selection

Yes, random. I know it sounds dumb. But bear with me.

Why Random Actually Works

With a large number of servers (say, 100+), random selection gives you pretty even distribution due to the law of large numbers.

100 servers, 1,000,000 requests, random selection:

Expected: each server gets ~10,000 requests
Actual: each server gets 9,800 - 10,200 requests

Standard deviation is small relative to the mean.
Good enough for many use cases!
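
If you want to check this yourself, here’s a tiny standalone simulation (not part of the load balancer): throw a million requests at 100 fake servers and print the spread.

package main

import (
    "fmt"
    "math/rand"
)

func main() {
    counts := make([]int, 100) // 100 "servers"
    for i := 0; i < 1000000; i++ {
        counts[rand.Intn(len(counts))]++ // pick one uniformly at random
    }

    min, max := counts[0], counts[0]
    for _, c := range counts {
        if c < min {
            min = c
        }
        if c > max {
            max = c
        }
    }
    fmt.Printf("min=%d max=%d (expected ~10000 each)\n", min, max)
}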

Pros:

  • Dead simple to implement
  • O(1) selection time (no scanning through servers)
  • No shared counter to update during selection (unlike Round Robin’s index)
  • Works well with many servers

Cons:

  • Not great with few servers (variance is high)
  • Doesn’t consider server capacity or current load

Implementation

type RandomSelection struct {
    backends []*Backend
    mux      sync.RWMutex
}

func NewRandomSelection() *RandomSelection {
    return &RandomSelection{
        backends: make([]*Backend, 0),
    }
}

func (rs *RandomSelection) AddBackend(backend *Backend) {
    rs.mux.Lock()
    defer rs.mux.Unlock()
    rs.backends = append(rs.backends, backend)
}

func (rs *RandomSelection) GetNextBackend() *Backend {
    rs.mux.RLock()
    defer rs.mux.RUnlock()

    var healthy []*Backend
    for _, b := range rs.backends {
        if b.IsAlive() {
            healthy = append(healthy, b)
        }
    }

    if len(healthy) == 0 {
        return nil
    }

    return healthy[rand.Intn(len(healthy))]
}

Power of Two Choices

Now here’s where it gets interesting. Power of Two Choices (P2C) is a genius algorithm that gives you most of the benefits of Least Connections with much less overhead.

The Problem with Least Connections

To pick the least-loaded server, you need to check ALL servers:

10 servers: Check 10 connection counts
100 servers: Check 100 connection counts
1000 servers: Check 1000 connection counts

O(n) per request. With high traffic, this adds up.

The Power of Two Choices Insight

Research showed that instead of checking ALL servers, you can:

  1. Pick TWO servers at random
  2. Choose the one with fewer connections

This gives you most of the benefit of Least Connections with O(1) complexity!

[Figure: the power of two choices]

Why Does This Work?

Intuition: with pure random selection, you might pick a heavily loaded server. But with TWO random choices, you’re unlikely to pick TWO heavily loaded servers. At least one will probably be reasonably loaded.

If you’re curious about why this works, look up “The Power of Two Choices in Randomized Load Balancing”.

Expected max load comparison:

Pure Random:        O(log n / log log n)
Power of Two:       O(log log n)

That's an exponential improvement, so mf neat. I love this.
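
If you’d rather see it than trust the asymptotics, here’s a throwaway balls-into-bins simulation (standalone, not part of the load balancer): n requests onto n servers, comparing the most-loaded server under one random choice vs two.

package main

import (
    "fmt"
    "math/rand"
)

func maxLoad(counts []int) int {
    max := 0
    for _, c := range counts {
        if c > max {
            max = c
        }
    }
    return max
}

func main() {
    n := 10000 // n requests onto n servers

    oneChoice := make([]int, n)
    twoChoices := make([]int, n)

    for i := 0; i < n; i++ {
        // pure random: one pick
        oneChoice[rand.Intn(n)]++

        // power of two choices: pick two, send to the less loaded one
        a, b := rand.Intn(n), rand.Intn(n)
        if twoChoices[a] <= twoChoices[b] {
            twoChoices[a]++
        } else {
            twoChoices[b]++
        }
    }

    fmt.Println("max load, pure random:", maxLoad(oneChoice))
    fmt.Println("max load, two choices:", maxLoad(twoChoices))
}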

Implementation

type PowerOfTwoChoices struct {
    backends []*Backend
    mux      sync.RWMutex
}

func NewPowerOfTwoChoices() *PowerOfTwoChoices {
    return &PowerOfTwoChoices{
        backends: make([]*Backend, 0),
    }
}

func (p2c *PowerOfTwoChoices) AddBackend(backend *Backend) {
    p2c.mux.Lock()
    defer p2c.mux.Unlock()
    p2c.backends = append(p2c.backends, backend)
}

func (p2c *PowerOfTwoChoices) GetNextBackend() *Backend {
    p2c.mux.RLock()
    defer p2c.mux.RUnlock()

    var healthy []*Backend
    for _, b := range p2c.backends {
        if b.IsAlive() {
            healthy = append(healthy, b)
        }
    }

    n := len(healthy)
    if n == 0 {
        return nil
    }
    if n == 1 {
        return healthy[0]
    }

    i := rand.Intn(n)
    j := rand.Intn(n)

    for j == i {
        j = rand.Intn(n)
    }

    // return the one with fewer connections
    if healthy[i].GetConnections() <= healthy[j].GetConnections() {
        return healthy[i]
    }
    return healthy[j]
}

IP Hash

All the algorithms we’ve discussed so far have one thing in common: they don’t guarantee the same client goes to the same server. For stateless applications, that’s fine. For stateful applications (sessions, shopping carts), it’s a problem.

IP Hash solves this by hashing the client’s IP address to determine which server to use.

The Concept

Client IP: 192.168.1.100
Hash: hash("192.168.1.100") = 2847583
Server index: 2847583 % 3 = 1
→ Always goes to Server 1

Client IP: 10.0.0.55
Hash: hash("10.0.0.55") = 9384756
Server index: 9384756 % 3 = 0
→ Always goes to Server 0

The same IP always hashes to the same value, so it always goes to the same server. This is called session affinity or sticky sessions.

┌─────────────────────────────────────────────────────────────────┐
│                                                                 │
│   Sydney's requests (IP: 192.168.1.100):                        │
│                                                                 │
│   Login → hash → Server 1 (creates session)                     │
│   Browse → hash → Server 1 (session exists)                     │
│   Checkout → hash → Server 1 (session still there)              │
│   Payment → hash → Server 1 (session valid)                     │
│                                                                 │
│   All requests go to the same server. Session works :)          │
│                                                                 │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│   Without IP Hash (Round Robin):                                │
│                                                                 │
│   Login → Server 1 (creates session)                            │
│   Browse → Server 2 (no session! "Please log in")               │
│   Checkout → Server 3 (no session! "Please log in")             │
│   Sydney: *loses her shit*                                      │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

The Problem with Simple IP Hash

What happens when a server dies?

Before:
  hash(IP) % 3 = 1 → Server 1

After Server 2 dies (now only 2 servers):
  hash(IP) % 2 = 0 → Server 0 (DIFFERENT SERVER!)

Sydney's session is on Server 1, but now she's going to Server 0.
Session lost!

In fact, with this modulo approach, adding or removing a single server remaps most clients to different servers: about two-thirds in our three-server example, and the fraction only grows as the pool gets bigger. This is catastrophic for sessions.
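
Here’s a quick standalone sanity check of that claim, hashing a bunch of made-up client IPs with the same FNV-1a scheme we’re about to implement, first with 3 servers and then with 2:

package main

import (
    "fmt"
    "hash/fnv"
)

// same hashing scheme as the IP Hash implementation below: FNV-1a, then modulo
func serverIndex(ip string, numServers int) int {
    h := fnv.New32a()
    h.Write([]byte(ip))
    return int(h.Sum32() % uint32(numServers))
}

func main() {
    moved, total := 0, 10000
    for i := 0; i < total; i++ {
        ip := fmt.Sprintf("10.0.%d.%d", i/256, i%256) // made-up client IPs
        if serverIndex(ip, 3) != serverIndex(ip, 2) { // 3 servers -> 2 servers
            moved++
        }
    }
    fmt.Printf("%d of %d clients remapped (%.0f%%)\n",
        moved, total, float64(moved)/float64(total)*100)
}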

The solution? Consistent Hashing. But that’s a whole topic on its own, and I’ll cover it in depth later (probably in the next part). For now, let’s implement basic IP Hash.

Implementation

type IPHash struct {
    backends []*Backend
    mux      sync.RWMutex
}

func NewIPHash() *IPHash {
    return &IPHash{
        backends: make([]*Backend, 0),
    }
}

func (ih *IPHash) AddBackend(backend *Backend) {
    ih.mux.Lock()
    defer ih.mux.Unlock()
    ih.backends = append(ih.backends, backend)
}

func (ih *IPHash) GetNextBackendForIP(clientIP string) *Backend {
    ih.mux.RLock()
    defer ih.mux.RUnlock()

    var healthy []*Backend
    for _, b := range ih.backends {
        if b.IsAlive() {
            healthy = append(healthy, b)
        }
    }

    if len(healthy) == 0 {
        return nil
    }

    // hash the client IP
    hash := fnv.New32a()
    hash.Write([]byte(clientIP))
    hashValue := hash.Sum32()

    // take the modulo while still unsigned, so the index can't go negative on 32-bit platforms
    index := int(hashValue % uint32(len(healthy)))

    return healthy[index]
}

Note: This has the server removal problem I mentioned. We’ll fix it with consistent hashing.


Least Response Time

Instead of just counting connections, let’s consider how FAST each server is responding.

The Concept

Server 1: 5 connections, avg response time 50ms
Server 2: 3 connections, avg response time 200ms
Server 3: 8 connections, avg response time 30ms

Least Connections picks: Server 2 (3 connections)
But Server 2 is slow! Each request takes 200ms.

Least Response Time considers both:
  Server 1: 5 conns × 50ms = 250 (score)
  Server 2: 3 conns × 200ms = 600 (score)
  Server 3: 8 conns × 30ms = 240 (score)

Picks: Server 3 (lowest score)
Even though it has the most connections, it's so fast that it's still the best choice!

Tracking Response Times

We need to measure how long each request takes:

type Backend struct {
    Host         string
    Port         int
    Alive        bool
    Connections  int64
    ResponseTime int64
    mux          sync.RWMutex
}

For the average, we’ll use an Exponential Moving Average (EMA) instead of a simple average. EMA gives more weight to recent measurements, so the algorithm adapts to changing conditions.

Simple Average:
  avg = (t1 + t2 + t3 + ... + tn) / n
  Problem: Old measurements have the same weight as new ones.
           If server slowed down recently, takes a long time to reflect.

Exponential Moving Average:
  newAvg = α × newValue + (1 - α) × oldAvg
  Where α (alpha) is typically 0.1 to 0.3

  Recent measurements have more influence.
  Adapts quickly to changes.
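
Quick worked example with α = 0.2: old average 100ms, new sample 300ms → new average = 0.2 × 300 + 0.8 × 100 = 140ms. The spike nudges the average up, but a single slow outlier can’t hijack it.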

Implementation

type Backend struct {
    Host         string
    Port         int
    Alive        bool
    Connections  int64
    ResponseTime int64
    mux          sync.RWMutex
}

const emaAlpha = 0.2

func (b *Backend) UpdateResponseTime(duration time.Duration) {
    b.mux.Lock()
    defer b.mux.Unlock()

    newTime := duration.Microseconds()
    oldTime := b.ResponseTime

    if oldTime == 0 {
        b.ResponseTime = newTime
    } else {
        b.ResponseTime = int64(float64(newTime)*emaAlpha + float64(oldTime)*(1-emaAlpha))
    }
}

func (b *Backend) GetResponseTime() int64 {
    b.mux.RLock()
    defer b.mux.RUnlock()
    return b.ResponseTime
}

type LeastResponseTime struct {
    backends []*Backend
    mux      sync.RWMutex
}

func NewLeastResponseTime() *LeastResponseTime {
    return &LeastResponseTime{
        backends: make([]*Backend, 0),
    }
}

func (lrt *LeastResponseTime) AddBackend(backend *Backend) {
    lrt.mux.Lock()
    defer lrt.mux.Unlock()
    lrt.backends = append(lrt.backends, backend)
}

func (lrt *LeastResponseTime) GetNextBackend() *Backend {
    lrt.mux.RLock()
    defer lrt.mux.RUnlock()

    var best *Backend
    var bestScore int64 = -1

    for _, b := range lrt.backends {
        if !b.IsAlive() {
            continue
        }

        // score = connections * responseTime
        // lower is better
        conns := b.GetConnections()
        respTime := b.GetResponseTime()

        if respTime == 0 {
            respTime = 1
        }

        score := conns * respTime

        if best == nil || score < bestScore {
            best = b
            bestScore = score
        }
    }

    return best
}

And update the connection handler to measure response time:

func (lb *LoadBalancer) handleConnection(clientConn net.Conn) {
    defer clientConn.Close()

    backend := lb.algorithm.GetNextBackend()
    if backend == nil {
        clientConn.Write([]byte("HTTP/1.1 503 Service Unavailable\r\n\r\n"))
        return
    }

    backend.IncrementConnections()
    defer backend.DecrementConnections()

    // START timing
    startTime := time.Now()

    backendConn, err := net.Dial("tcp", backend.Address())
    if err != nil {
        clientConn.Write([]byte("HTTP/1.1 502 Bad Gateway\r\n\r\n"))
        return
    }
    defer backendConn.Close()

    lb.forwardTraffic(clientConn, backendConn)

    // END timing, update response time
    duration := time.Since(startTime)
    backend.UpdateResponseTime(duration)
}

Putting It All Together - The Algorithm Interface

Let’s create a clean interface so we can easily swap algorithms:

type Algorithm interface {
    GetNextBackend() *Backend
}

type AffinityAlgorithm interface {
    GetNextBackendForIP(clientIP string) *Backend
}

The interface only covers what the load balancer calls per request. Registering backends stays on the concrete types, since the signatures differ (the weighted algorithms take a weight, the others don’t).

Now our LoadBalancer can work with any algorithm:

type LoadBalancer struct {
    host       string
    port       int
    algorithm  Algorithm
    serverPool *ServerPool
}

func NewLoadBalancer(host string, port int, algo Algorithm, pool *ServerPool) *LoadBalancer {
    return &LoadBalancer{
        host:       host,
        port:       port,
        algorithm:  algo,
        serverPool: pool,
    }
}

Decision Flowchart

                     START
              ┌───────────────────┐
              │ Need session      │
              │ affinity?         │
              └───────────────────┘
                 │             │
                Yes            No
                 │             │
                 ▼             ▼
           ┌───────────┐  ┌───────────────────┐
           │ IP Hash   │  │ Servers have      │
           │ or        │  │ different         │
           │ Consistent│  │ capacities?       │
           │ Hashing   │  └───────────────────┘
           └───────────┘     │           │
                            Yes          No
                             │           │
                             ▼           ▼
                   ┌──────────────┐ ┌───────────────────┐
                   │ Request      │ │ Request           │
                   │ complexity   │ │ complexity        │
                   │ varies?      │ │ varies?           │
                   └──────────────┘ └───────────────────┘
                     │         │       │           │
                    Yes        No     Yes          No
                     │         │       │           │
                     ▼         ▼       ▼           ▼
              ┌──────────┐ ┌──────┐ ┌──────┐ ┌──────────┐
              │ Weighted │ │Weight│ │Least │ │ Round    │
              │ Least    │ │ RR   │ │Conns │ │ Robin    │
              │ Conns    │ │      │ │or P2C│ │          │
              └──────────┘ └──────┘ └──────┘ └──────────┘

Testing the Algorithms

Let’s create a test that shows how different algorithms behave under load.

Create main.go with algorithm selection:

package main

import (
    "flag"
    "fmt"
    "hash/fnv"
    "io"
    "log"
    "math/rand"
    "net"
    "sync"
    "sync/atomic"
    "time"
)

// include all the Backend, ServerPool, Algorithm implementations from above!!!

func main() {
    algoName := flag.String("algo", "roundrobin", "Algorithm: roundrobin, weighted, leastconn, p2c, random")
    flag.Parse()

    pool := NewServerPool()

    b1 := pool.AddBackend("127.0.0.1", 8081)
    b2 := pool.AddBackend("127.0.0.1", 8082)
    b3 := pool.AddBackend("127.0.0.1", 8083)

    var algo Algorithm

    switch *algoName {
    case "roundrobin":
        rr := NewRoundRobin()
        rr.AddBackend(b1)
        rr.AddBackend(b2)
        rr.AddBackend(b3)
        algo = rr
        log.Println("[LB] Using Round Robin algorithm")

    case "weighted":
        wrr := NewWeightedRoundRobin()
        wrr.AddBackend(b1, 5)
        wrr.AddBackend(b2, 2)
        wrr.AddBackend(b3, 1)
        algo = wrr
        log.Println("[LB] Using Weighted Round Robin algorithm (5:2:1)")

    case "leastconn":
        lc := NewLeastConnections()
        lc.AddBackend(b1)
        lc.AddBackend(b2)
        lc.AddBackend(b3)
        algo = lc
        log.Println("[LB] Using Least Connections algorithm")

    case "p2c":
        p2c := NewPowerOfTwoChoices()
        p2c.AddBackend(b1)
        p2c.AddBackend(b2)
        p2c.AddBackend(b3)
        algo = p2c
        log.Println("[LB] Using Power of Two Choices algorithm")

    case "random":
        rs := NewRandomSelection()
        rs.AddBackend(b1)
        rs.AddBackend(b2)
        rs.AddBackend(b3)
        algo = rs
        log.Println("[LB] Using Random algorithm")

    default:
        log.Fatalf("Unknown algorithm: %s", *algoName)
    }

    healthChecker := NewHealthChecker(pool)
    go healthChecker.Start()

    lb := NewLoadBalancer("0.0.0.0", 8080, algo, pool)
    if err := lb.Start(); err != nil {
        log.Fatalf("Load balancer failed: %v", err)
    }
}

Testing Weighted Round Robin

# terminals 1-3: start backend servers
go run backend/server.go 8081
go run backend/server.go 8082
go run backend/server.go 8083

# terminal 4: start load balancer with weighted RR
go run main.go -algo weighted

# terminal 5: send requests and count distribution
for i in {1..80}; do curl -s http://localhost:8080 | grep "Backend Server"; done | sort | uniq -c

Expected output (approx):

     50     <h1>Backend Server 8081</h1>
     20     <h1>Backend Server 8082</h1>
     10     <h1>Backend Server 8083</h1>

Roughly 5:2:1 ratio.

Testing Least Connections

To see Least Connections in action, we need requests with different processing times. Update the backend server to simulate slow requests:

http.HandleFunc("/slow", func(w http.ResponseWriter, r *http.Request) {
    // simulate slow processing
    time.Sleep(2 * time.Second)
    fmt.Fprintf(w, "Server %d: slow response\n", port)
})

http.HandleFunc("/fast", func(w http.ResponseWriter, r *http.Request) {
    fmt.Fprintf(w, "Server %d: fast response\n", port)
})

Then test:

# start with least connections
go run main.go -algo leastconn

# in another terminal, keep an eye on the backends
watch -n 0.5 'curl -s http://localhost:8080/status'

# send some slow requests...they'll pile up on a few servers
for i in {1..10}; do curl http://localhost:8080/slow & done

# send fast requests...they should avoid the busy servers
for i in {1..20}; do curl -s http://localhost:8080/fast; done

Performance Considerations

Different algorithms have different overhead:

┌─────────────────┬────────────────┬─────────────────────────────┐
│ Algorithm       │ Time           │ Notes                       │
│                 │ Complexity     │                             │
├─────────────────┼────────────────┼─────────────────────────────┤
│ Round Robin     │ O(1)*          │ Just increment index        │
│                 │                │ *O(n) if skipping unhealthy │
├─────────────────┼────────────────┼─────────────────────────────┤
│ Weighted RR     │ O(n)           │ Check all for highest weight│
├─────────────────┼────────────────┼─────────────────────────────┤
│ Random          │ O(n)*          │ O(n) to filter healthy,     │
│                 │                │ O(1) to pick                │
├─────────────────┼────────────────┼─────────────────────────────┤
│ Least Conns     │ O(n)           │ Check all connection counts │
├─────────────────┼────────────────┼─────────────────────────────┤
│ Power of 2      │ O(n)*          │ O(n) to filter healthy,     │
│                 │                │ O(1) to pick 2 and compare  │
├─────────────────┼────────────────┼─────────────────────────────┤
│ IP Hash         │ O(n)*          │ O(1) hash, O(n) filter      │
├─────────────────┼────────────────┼─────────────────────────────┤
│ Least Resp Time │ O(n)           │ Check all response times    │
└─────────────────┴────────────────┴─────────────────────────────┘

* Can be optimized with maintained healthy list

For most applications with <100 servers, these differences don’t matter. If you have thousands of servers, consider:

  • Maintaining a separate list of healthy backends so you don’t filter on every request (see the sketch after this list)
  • Using Power of Two Choices instead of Least Connections
  • Sharding the backend pool
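
To make the first point concrete, here’s a minimal sketch (CachedPool is a made-up name; it reuses the Backend type from this part): keep a prebuilt healthy slice in an atomic.Value that the health checker refreshes whenever a server flips state, so the per-request path is one load instead of a filter loop.

type CachedPool struct {
    healthy atomic.Value // holds a []*Backend snapshot
}

// called by the health checker whenever a backend changes state
func (p *CachedPool) Rebuild(all []*Backend) {
    snapshot := make([]*Backend, 0, len(all))
    for _, b := range all {
        if b.IsAlive() {
            snapshot = append(snapshot, b)
        }
    }
    p.healthy.Store(snapshot)
}

// called on every request: no filtering, no locking, just a load
func (p *CachedPool) Healthy() []*Backend {
    v := p.healthy.Load()
    if v == nil {
        return nil
    }
    return v.([]*Backend)
}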

Complete Implementation

Here’s a complete, runnable example with all algorithms:

package main

import (
    "flag"
    "fmt"
    "hash/fnv"
    "io"
    "log"
    "math/rand"
    "net"
    "net/http"
    "sync"
    "sync/atomic"
    "time"
)

// ============== backend ==============

type Backend struct {
    Host         string
    Port         int
    alive        int32
    connections  int64
    responseTime int64
}

func NewBackend(host string, port int) *Backend {
    return &Backend{
        Host:  host,
        Port:  port,
        alive: 1,
    }
}

func (b *Backend) Address() string {
    return fmt.Sprintf("%s:%d", b.Host, b.Port)
}

func (b *Backend) IsAlive() bool {
    return atomic.LoadInt32(&b.alive) == 1
}

func (b *Backend) SetAlive(alive bool) {
    var v int32 = 0
    if alive {
        v = 1
    }
    atomic.StoreInt32(&b.alive, v)
}

func (b *Backend) GetConnections() int64 {
    return atomic.LoadInt64(&b.connections)
}

func (b *Backend) IncrementConnections() {
    atomic.AddInt64(&b.connections, 1)
}

func (b *Backend) DecrementConnections() {
    atomic.AddInt64(&b.connections, -1)
}

func (b *Backend) GetResponseTime() int64 {
    return atomic.LoadInt64(&b.responseTime)
}

func (b *Backend) UpdateResponseTime(d time.Duration) {
    newTime := d.Microseconds()
    oldTime := atomic.LoadInt64(&b.responseTime)

    if oldTime == 0 {
        atomic.StoreInt64(&b.responseTime, newTime)
    } else {
        // EMA with alpha = 0.2
        ema := int64(float64(newTime)*0.2 + float64(oldTime)*0.8)
        atomic.StoreInt64(&b.responseTime, ema)
    }
}

// ============== server pool ==============

type ServerPool struct {
    backends []*Backend
    mux      sync.RWMutex
}

func NewServerPool() *ServerPool {
    return &ServerPool{
        backends: make([]*Backend, 0),
    }
}

func (p *ServerPool) AddBackend(host string, port int) *Backend {
    p.mux.Lock()
    defer p.mux.Unlock()

    backend := NewBackend(host, port)
    p.backends = append(p.backends, backend)
    log.Printf("[POOL] Added server: %s", backend.Address())
    return backend
}

func (p *ServerPool) GetAllBackends() []*Backend {
    p.mux.RLock()
    defer p.mux.RUnlock()
    result := make([]*Backend, len(p.backends))
    copy(result, p.backends)
    return result
}

func (p *ServerPool) GetHealthyBackends() []*Backend {
    p.mux.RLock()
    defer p.mux.RUnlock()

    var healthy []*Backend
    for _, b := range p.backends {
        if b.IsAlive() {
            healthy = append(healthy, b)
        }
    }
    return healthy
}

type Algorithm interface {
    Next() *Backend
}

// ============== RR ==============

type RoundRobin struct {
    backends []*Backend
    current  uint64
}

func NewRoundRobin(backends []*Backend) *RoundRobin {
    return &RoundRobin{backends: backends}
}

func (rr *RoundRobin) Next() *Backend {
    if len(rr.backends) == 0 {
        return nil
    }

    n := len(rr.backends)
    for i := 0; i < n; i++ {
        idx := atomic.AddUint64(&rr.current, 1) - 1
        backend := rr.backends[idx%uint64(n)]
        if backend.IsAlive() {
            return backend
        }
    }
    return nil
}

// ============== WEIGHTED RR (WRR) ==============

type WeightedBackend struct {
    Backend       *Backend
    Weight        int
    currentWeight int
}

type WeightedRoundRobin struct {
    backends    []*WeightedBackend
    totalWeight int
    mux         sync.Mutex
}

func NewWeightedRoundRobin() *WeightedRoundRobin {
    return &WeightedRoundRobin{
        backends: make([]*WeightedBackend, 0),
    }
}

func (wrr *WeightedRoundRobin) AddBackend(backend *Backend, weight int) {
    wrr.backends = append(wrr.backends, &WeightedBackend{
        Backend: backend,
        Weight:  weight,
    })
    wrr.totalWeight += weight
}

func (wrr *WeightedRoundRobin) Next() *Backend {
    wrr.mux.Lock()
    defer wrr.mux.Unlock()

    if len(wrr.backends) == 0 {
        return nil
    }

    var best *WeightedBackend

    for _, wb := range wrr.backends {
        if !wb.Backend.IsAlive() {
            continue
        }

        wb.currentWeight += wb.Weight

        if best == nil || wb.currentWeight > best.currentWeight {
            best = wb
        }
    }

    if best == nil {
        return nil
    }

    best.currentWeight -= wrr.totalWeight
    return best.Backend
}

// ============== LEAST CONNECTIONS ==============

type LeastConnections struct {
    backends []*Backend
}

func NewLeastConnections(backends []*Backend) *LeastConnections {
    return &LeastConnections{backends: backends}
}

func (lc *LeastConnections) Next() *Backend {
    var best *Backend
    var minConns int64 = -1

    for _, b := range lc.backends {
        if !b.IsAlive() {
            continue
        }

        conns := b.GetConnections()
        if best == nil || conns < minConns {
            best = b
            minConns = conns
        }
    }

    return best
}

// ============== POWER OF 2 CHOICES ==============

type PowerOfTwoChoices struct {
    backends []*Backend
}

func NewPowerOfTwoChoices(backends []*Backend) *PowerOfTwoChoices {
    return &PowerOfTwoChoices{backends: backends}
}

func (p2c *PowerOfTwoChoices) Next() *Backend {
    var healthy []*Backend
    for _, b := range p2c.backends {
        if b.IsAlive() {
            healthy = append(healthy, b)
        }
    }

    n := len(healthy)
    if n == 0 {
        return nil
    }
    if n == 1 {
        return healthy[0]
    }

    i := rand.Intn(n)
    j := rand.Intn(n)
    for j == i {
        j = rand.Intn(n)
    }

    if healthy[i].GetConnections() <= healthy[j].GetConnections() {
        return healthy[i]
    }
    return healthy[j]
}

// ============== RANDOM ==============

type RandomSelection struct {
    backends []*Backend
}

func NewRandomSelection(backends []*Backend) *RandomSelection {
    return &RandomSelection{backends: backends}
}

func (rs *RandomSelection) Next() *Backend {
    var healthy []*Backend
    for _, b := range rs.backends {
        if b.IsAlive() {
            healthy = append(healthy, b)
        }
    }

    if len(healthy) == 0 {
        return nil
    }

    return healthy[rand.Intn(len(healthy))]
}

// ============== IP HASH ==============

type IPHash struct {
    backends []*Backend
}

func NewIPHash(backends []*Backend) *IPHash {
    return &IPHash{backends: backends}
}

func (ih *IPHash) NextForIP(ip string) *Backend {
    var healthy []*Backend
    for _, b := range ih.backends {
        if b.IsAlive() {
            healthy = append(healthy, b)
        }
    }

    if len(healthy) == 0 {
        return nil
    }

    hash := fnv.New32a()
    hash.Write([]byte(ip))
    // mod while still unsigned so the index can't go negative on 32-bit platforms
    idx := int(hash.Sum32() % uint32(len(healthy)))

    return healthy[idx]
}

// ============== LEAST RESPONSE TIME ==============

type LeastResponseTime struct {
    backends []*Backend
}

func NewLeastResponseTime(backends []*Backend) *LeastResponseTime {
    return &LeastResponseTime{backends: backends}
}

func (lrt *LeastResponseTime) Next() *Backend {
    var best *Backend
    var bestScore int64 = -1

    for _, b := range lrt.backends {
        if !b.IsAlive() {
            continue
        }

        conns := b.GetConnections()
        respTime := b.GetResponseTime()
        if respTime == 0 {
            respTime = 1
        }

        score := conns * respTime

        if best == nil || score < bestScore {
            best = b
            bestScore = score
        }
    }

    return best
}

// ============== HEALTH CHECKER ==============

type HealthChecker struct {
    pool          *ServerPool
    interval      time.Duration
    timeout       time.Duration
    failThreshold int
    riseThreshold int
    failCounts    map[string]int
    successCounts map[string]int
    mux           sync.Mutex
}

func NewHealthChecker(pool *ServerPool) *HealthChecker {
    return &HealthChecker{
        pool:          pool,
        interval:      5 * time.Second,
        timeout:       3 * time.Second,
        failThreshold: 3,
        riseThreshold: 2,
        failCounts:    make(map[string]int),
        successCounts: make(map[string]int),
    }
}

func (hc *HealthChecker) check(b *Backend) bool {
    client := &http.Client{Timeout: hc.timeout}
    resp, err := client.Get(fmt.Sprintf("http://%s/health", b.Address()))
    if err != nil {
        return false
    }
    defer resp.Body.Close()
    return resp.StatusCode >= 200 && resp.StatusCode < 300
}

func (hc *HealthChecker) process(b *Backend, healthy bool) {
    hc.mux.Lock()
    defer hc.mux.Unlock()

    key := b.Address()
    wasAlive := b.IsAlive()

    if healthy {
        hc.failCounts[key] = 0
        hc.successCounts[key]++
        if !wasAlive && hc.successCounts[key] >= hc.riseThreshold {
            b.SetAlive(true)
            log.Printf("[HEALTH] %s is now HEALTHY", key)
            hc.successCounts[key] = 0
        }
    } else {
        hc.successCounts[key] = 0
        hc.failCounts[key]++
        if wasAlive && hc.failCounts[key] >= hc.failThreshold {
            b.SetAlive(false)
            log.Printf("[HEALTH] %s is now UNHEALTHY", key)
            hc.failCounts[key] = 0
        }
    }
}

func (hc *HealthChecker) Start() {
    log.Printf("[HEALTH] Starting health checker")
    ticker := time.NewTicker(hc.interval)

    hc.checkAll()

    for range ticker.C {
        hc.checkAll()
    }
}

func (hc *HealthChecker) checkAll() {
    var wg sync.WaitGroup
    for _, b := range hc.pool.GetAllBackends() {
        wg.Add(1)
        go func(backend *Backend) {
            defer wg.Done()
            hc.process(backend, hc.check(backend))
        }(b)
    }
    wg.Wait()
}

// ============== LOAD BALANCER ==============

type LoadBalancer struct {
    host      string
    port      int
    algorithm Algorithm
}

func NewLoadBalancer(host string, port int, algo Algorithm) *LoadBalancer {
    return &LoadBalancer{
        host:      host,
        port:      port,
        algorithm: algo,
    }
}

func (lb *LoadBalancer) Start() error {
    addr := fmt.Sprintf("%s:%d", lb.host, lb.port)
    listener, err := net.Listen("tcp", addr)
    if err != nil {
        return err
    }
    defer listener.Close()

    log.Printf("[LB] Started on %s", addr)

    for {
        conn, err := listener.Accept()
        if err != nil {
            log.Printf("[LB] Accept error: %v", err)
            continue
        }
        go lb.handle(conn)
    }
}

func (lb *LoadBalancer) handle(client net.Conn) {
    defer client.Close()

    backend := lb.algorithm.Next()
    if backend == nil {
        client.Write([]byte("HTTP/1.1 503 Service Unavailable\r\n\r\n"))
        return
    }

    backend.IncrementConnections()
    defer backend.DecrementConnections()

    start := time.Now()

    server, err := net.Dial("tcp", backend.Address())
    if err != nil {
        client.Write([]byte("HTTP/1.1 502 Bad Gateway\r\n\r\n"))
        return
    }
    defer server.Close()

    var wg sync.WaitGroup
    wg.Add(2)

    go func() {
        defer wg.Done()
        io.Copy(server, client)
    }()

    go func() {
        defer wg.Done()
        io.Copy(client, server)
    }()

    wg.Wait()

    backend.UpdateResponseTime(time.Since(start))
}

func main() {
    algo := flag.String("algo", "roundrobin", "roundrobin|weighted|leastconn|p2c|random|lrt")
    flag.Parse()

    rand.Seed(time.Now().UnixNano())

    pool := NewServerPool()
    b1 := pool.AddBackend("127.0.0.1", 8081)
    b2 := pool.AddBackend("127.0.0.1", 8082)
    b3 := pool.AddBackend("127.0.0.1", 8083)

    backends := []*Backend{b1, b2, b3}

    var algorithm Algorithm

    switch *algo {
    case "roundrobin":
        algorithm = NewRoundRobin(backends)
        log.Println("[LB] Algorithm: Round Robin")

    case "weighted":
        wrr := NewWeightedRoundRobin()
        wrr.AddBackend(b1, 5)
        wrr.AddBackend(b2, 2)
        wrr.AddBackend(b3, 1)
        algorithm = wrr
        log.Println("[LB] Algorithm: Weighted Round Robin (5:2:1)")

    case "leastconn":
        algorithm = NewLeastConnections(backends)
        log.Println("[LB] Algorithm: Least Connections")

    case "p2c":
        algorithm = NewPowerOfTwoChoices(backends)
        log.Println("[LB] Algorithm: Power of Two Choices")

    case "random":
        algorithm = NewRandomSelection(backends)
        log.Println("[LB] Algorithm: Random")

    case "lrt":
        algorithm = NewLeastResponseTime(backends)
        log.Println("[LB] Algorithm: Least Response Time")

    default:
        log.Fatalf("Unknown algorithm: %s", *algo)
    }

    hc := NewHealthChecker(pool)
    go hc.Start()

    lb := NewLoadBalancer("0.0.0.0", 8080, algorithm)
    if err := lb.Start(); err != nil {
        log.Fatal(err)
    }
}

Recap

We covered A LOOTTT in this part. Let’s summarize:

The Problem:

  • Round Robin treats all servers and requests equally
  • Real world: servers have different capacities, requests have different costs

The Solutions:

┌─────────────────────┬──────────────────────────────┬────────────┐
│ Algorithm           │ Solves                       │ Complexity │
├─────────────────────┼──────────────────────────────┼────────────┤
│ Weighted RR         │ Different server capacities  │ O(n)       │
│ Least Connections   │ Varying request duration     │ O(n)       │
│ Weighted LC         │ Both problems above          │ O(n)       │
│ Random              │ Need simplicity at scale     │ O(1)*      │
│ Power of 2 Choices  │ Scale + connection awareness │ O(1)*      │
│ IP Hash             │ Session affinity             │ O(1)*      │
│ Least Response Time │ Performance-sensitive apps   │ O(n)       │
└─────────────────────┴──────────────────────────────┴────────────┘

*After filtering healthy backends

When to Use What:

  • Start with Round Robin (simple, predictable)
  • Add weights if server capacities differ
  • Switch to Least Connections if request durations vary
  • Use Power of Two Choices at scale
  • Use IP Hash when you need sticky sessions

What Now?

We’ve got a pretty decent load balancer now. Multiple algos, health checking, connection tracking. But we still have a problem with IP Hash: when a server goes down, ALL sessions get messed up.

In the next part, I’m diving deep into Session Persistence.


As always, hit me up on X / Twitter if you have questions, found bugs, or want to discuss load balancing algorithms. I love talking about this stuff.

See you in part 7 <3