
The Only Guide You'd Ever Need for Load Balancers - 7

Sticky Sessions

Welcome back. If you’re coming from part 6, you now have a load balancer with multiple algorithms: Round Robin (RR), Weighted RR, Least Connections, Power of Two Choices, IP Hash.

But remember I mentioned IP Hash has a flaw? When a server dies, EVERYONE gets rerouted to different servers. Their sessions? Their progress on that big 67-step form? All gone.

In this part, we’re going to solve the session persistence problem properly. We’ll explore different strategies, and then I’ll show you Consistent Hashing.


The Stateful Application Problem

Most apps are still stateful. They remember things about you.

What is Session State?

When Sydney logs into our wingman dating app, the server creates a “session” for her:

┌─────────────────────────────────────────────────────────────────┐
│                                                                 │
│   Sydney's Session (stored on Server 1):                        │
│                                                                 │
│   {                                                             │
│     "session_id": "abc123xyz",                                  │
│     "user_id": 42,                                              │
│     "username": "sydney_looking_for_love",                      │
│     "logged_in_at": "2025-12-16T10:30:00Z",                     │
│     "shopping_cart": [                                          │
│       {"item": "Premium Membership", "price": 29.99},           │
│       {"item": "Profile Boost", "price": 9.99}                  │
│     ],                                                          │
│     "preferences": {                                            │
│       "age_range": "25-35",                                     │
│       "location": "New York"                                    │
│     }                                                           │
│   }                                                             │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

This session lives on Server 1. Server 2 and Server 3 have no idea Sydney exists.
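
To make "lives on Server 1" concrete, here's a minimal sketch of a backend that keeps sessions in process memory. The names (sessionStore, profileHandler) are illustrative, not part of the series code:

package main

import (
    "net/http"
    "sync"
)

// Session holds per-user state that exists only in this server's memory.
type Session struct {
    UserID   int
    Username string
}

// sessionStore lives in RAM. Only THIS process knows about these sessions;
// a sibling server has its own, completely separate (and empty) map.
var (
    sessionStore = map[string]*Session{}
    mu           sync.RWMutex
)

func profileHandler(w http.ResponseWriter, r *http.Request) {
    cookie, err := r.Cookie("session_id")
    if err != nil {
        http.Error(w, "Please log in", http.StatusUnauthorized)
        return
    }

    mu.RLock()
    sess, ok := sessionStore[cookie.Value]
    mu.RUnlock()

    if !ok {
        // This is exactly what Server 2 says when it gets Sydney's request
        // while Server 1 is the one holding her session.
        http.Error(w, "Please log in", http.StatusUnauthorized)
        return
    }

    w.Write([]byte("Welcome back, " + sess.Username))
}

func main() {
    http.HandleFunc("/profile", profileHandler)
    http.ListenAndServe(":8081", nil)
}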

Why Random Distribution Breaks Sessions

With our RR or LC algorithms, Sydney’s requests go to different servers each time:

Sydney's journey through our load balancer:

Request 1: Login
├── Load Balancer picks Server 1
├── Server 1: "Welcome Sydney! Session created: abc123xyz"
├── Sydney: "cool, I'm in"
└── Session stored on Server 1

Request 2: Add item to cart
├── Load Balancer picks Server 2 (round robin, baby)
├── Server 2: "Who tf is Sydney? I don't have session abc123xyz"
├── Server 2: "Please log in"
├── Sydney: "But I just logged in??"
└── Session? What session?

Request 3: Try again
├── Load Balancer picks Server 3
├── Server 3: "I also don't know you"
├── Sydney: *quits*
└── We lost a customer

This is the classic stateful application problem. The session is on ONE server, but requests can go to ANY server.

Stateful problem with load balancers


Session Persistence Strategies

There are multiple ways to solve this. Let’s go through them from simplest to most sophisticated.


Strategy 1: Source IP Affinity (IP Hash)

We covered this in Part 6. The idea: hash the client’s IP address to always route them to the same server.

hash("192.168.1.100") % 3 = 1 → Always Server 1
hash("10.0.0.55") % 3 = 0    → Always Server 0

The NAT Problem

But there’s a huge problem we didn’t discuss. What happens when multiple users share the same public IP?

The problem with IP Hash caused by NAT

With IP Hash:

hash("203.0.113.50") % 3 = 1

ALL 500 employees go to Server 1.
Server 2 and Server 3 are bored.
Server 1 is on fire.

This is called the “NAT problem” and it makes IP-based affinity unreliable for session persistence.

Other IP Affinity Problems

  1. Mobile users: Their IP changes as they move between networks (WiFi → cellular → different WiFi)
  2. Proxy users: Corporate proxies, VPNs, and CDNs change the apparent client IP
  3. IPv6 privacy extensions: Some clients rotate their IPv6 address frequently

IP Hash is simple but brittle. Let’s look at something better.


Strategy 2: Cookie-Based Sticky Sessions

Instead of relying on IP addresses, let’s use cookies. The load balancer inserts a cookie that identifies which backend server the client should stick to.

How It Works

┌─────────────────────────────────────────────────────────────────┐
│                                                                 │
│   First Request (no cookie):                                    │
│                                                                 │
│   1. Sydney → Load Balancer: "GET /login"                       │
│                                                                 │
│   2. Load Balancer: "No sticky cookie, pick server normally"    │
│      Round Robin says Server 2                                  │
│                                                                 │
│   3. Load Balancer → Server 2: "GET /login"                     │
│                                                                 │
│   4. Server 2 → Load Balancer: "200 OK, here's the page"        │
│                                                                 │
│   5. Load Balancer INSERTS cookie:                              │
│      Set-Cookie: SERVERID=server2; Path=/                       │
│                                                                 │
│   6. Load Balancer → Sydney: Response + sticky cookie           │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────────┐
│                                                                 │
│   Second Request (has cookie):                                  │
│                                                                 │
│   1. Sydney → Load Balancer: "GET /profile"                     │
│      Cookie: SERVERID=server2                                   │
│                                                                 │
│   2. Load Balancer: "sticky cookie points to server2"           │
│      Ignore normal algorithm, use Server 2                      │
│                                                                 │
│   3. Load Balancer → Server 2: "GET /profile"                   │
│                                                                 │
│   4. Server 2: "problem fixed"                                  │
│      Returns personalized response                              │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

Implementation

This requires Layer 7 (HTTP) awareness. Our current TCP proxy just forwards bytes and doesn’t understand HTTP. Let’s add HTTP parsing:

type CookieBasedPersistence struct {
    cookieName  string
    backends    map[string]*Backend
    backendList []*Backend
    algorithm   Algorithm
    mux         sync.RWMutex
}

func NewCookieBasedPersistence(cookieName string, algo Algorithm) *CookieBasedPersistence {
    return &CookieBasedPersistence{
        cookieName:  cookieName,
        backends:    make(map[string]*Backend),
        backendList: make([]*Backend, 0),
        algorithm:   algo,
    }
}

func (cbp *CookieBasedPersistence) AddBackend(name string, backend *Backend) {
    cbp.mux.Lock()
    defer cbp.mux.Unlock()
    cbp.backends[name] = backend
    cbp.backendList = append(cbp.backendList, backend)
}

func (cbp *CookieBasedPersistence) GetBackendFromCookie(cookieHeader string) *Backend {
    cbp.mux.RLock()
    defer cbp.mux.RUnlock()

    cookies := strings.Split(cookieHeader, ";")
    for _, cookie := range cookies {
        cookie = strings.TrimSpace(cookie)
        parts := strings.SplitN(cookie, "=", 2)
        if len(parts) == 2 && parts[0] == cbp.cookieName {
            serverName := parts[1]
            if backend, ok := cbp.backends[serverName]; ok {
                if backend.IsAlive() {
                    return backend
                }
            }
        }
    }
    return nil
}

func (cbp *CookieBasedPersistence) GetBackendName(backend *Backend) string {
    cbp.mux.RLock()
    defer cbp.mux.RUnlock()

    for name, b := range cbp.backends {
        if b == backend {
            return name
        }
    }
    return ""
}

Now we need an HTTP-aware connection handler:

func (lb *LoadBalancer) handleHTTPConnection(clientConn net.Conn) {
    defer clientConn.Close()

    reader := bufio.NewReader(clientConn)
    request, err := http.ReadRequest(reader)
    if err != nil {
        return
    }

    var backend *Backend
    cookieHeader := request.Header.Get("Cookie")

    if cbp, ok := lb.persistence.(*CookieBasedPersistence); ok && cookieHeader != "" {
        backend = cbp.GetBackendFromCookie(cookieHeader)
    }

    if backend == nil {
        backend = lb.algorithm.Next()
    }

    if backend == nil {
        clientConn.Write([]byte("HTTP/1.1 503 Service Unavailable\r\n\r\n"))
        return
    }

    backend.IncrementConnections()
    defer backend.DecrementConnections()

    backendConn, err := net.Dial("tcp", backend.Address())
    if err != nil {
        clientConn.Write([]byte("HTTP/1.1 502 Bad Gateway\r\n\r\n"))
        return
    }
    defer backendConn.Close()

    request.Write(backendConn)

    backendReader := bufio.NewReader(backendConn)
    response, err := http.ReadResponse(backendReader, request)
    if err != nil {
        return
    }

    if cbp, ok := lb.persistence.(*CookieBasedPersistence); ok {
        if request.Header.Get("Cookie") == "" || cbp.GetBackendFromCookie(cookieHeader) == nil {
            serverName := cbp.GetBackendName(backend)
            if serverName != "" {
                cookie := fmt.Sprintf("%s=%s; Path=/", cbp.cookieName, serverName)
                response.Header.Add("Set-Cookie", cookie)
            }
        }
    }

    response.Write(clientConn)
}

In production, you don’t want to expose real server names in cookies. Anyone can read SERVERID=server2 and start mapping out your infrastructure.

Better approaches:

Option 1: Encode the server identifier
  Instead of: SERVERID=server2
  Use: SERVERID=c2VydmVyMg== (base64-encoded "server2")
  Or: SERVERID=5f4dcc3b5aa765d61d8327deb882cf99 (hash)

Option 2: Use signed cookies
  SERVERID=server2.signature
  Where signature = HMAC(server2, secret_key)
  Load balancer verifies the signature before trusting it (sketch below)

Option 3: Encrypted cookies
  SERVERID=encrypted_blob
  Only load balancer can decrypt
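
Here's a minimal sketch of Option 2, assuming an HMAC-SHA256 signature and a secret key that only the load balancer knows. The key and function names below are illustrative:

package main

import (
    "crypto/hmac"
    "crypto/sha256"
    "encoding/hex"
    "fmt"
    "strings"
)

// Assumption: in a real deployment this comes from config, not a literal.
var secretKey = []byte("change-me")

// signServerID turns "server2" into "server2.<hmac>" so clients can't
// forge or tamper with the sticky cookie value.
func signServerID(serverID string) string {
    mac := hmac.New(sha256.New, secretKey)
    mac.Write([]byte(serverID))
    return serverID + "." + hex.EncodeToString(mac.Sum(nil))
}

// verifyServerID re-computes the signature and compares in constant time.
// It returns the server ID only if the cookie value checks out.
func verifyServerID(cookieValue string) (string, bool) {
    parts := strings.SplitN(cookieValue, ".", 2)
    if len(parts) != 2 {
        return "", false
    }
    if !hmac.Equal([]byte(signServerID(parts[0])), []byte(cookieValue)) {
        return "", false
    }
    return parts[0], true
}

func main() {
    signed := signServerID("server2")
    fmt.Println("Set-Cookie: SERVERID=" + signed)

    if id, ok := verifyServerID(signed); ok {
        fmt.Println("route to:", id)
    }
}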

Strategy 3: Application-Level Session IDs

Instead of the load balancer managing persistence, let the application handle it. The application includes a session ID in requests, and the load balancer routes based on that.

┌─────────────────────────────────────────────────────────────────┐
│                                                                 │
│   Request with session ID in header:                            │
│                                                                 │
│   GET /profile HTTP/1.1                                         │
│   Host: www.wingmandating.com                                   │
│   X-Session-ID: sess_abc123xyz                                  │
│                                                                 │
│   Load Balancer: hash("sess_abc123xyz") → Server 2              │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────────┐
│                                                                 │
│   Request with session ID in URL:                               │
│                                                                 │
│   GET /profile?session=sess_abc123xyz HTTP/1.1                  │
│   Host: www.wingmandating.com                                   │
│                                                                 │
│   Load Balancer: parse URL, extract session, hash → Server 2    │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

This approach requires cooperation between the application and the load balancer. It’s more work, but it gives you more control.
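
Here's a small sketch of what that extraction might look like on the load balancer side. The X-Session-ID header and the session query parameter are just the conventions from the diagrams above, and crc32 stands in for whatever hash the balancer uses:

package main

import (
    "fmt"
    "hash/crc32"
    "net/http"
)

// sessionKey pulls the routing key the application agreed to send:
// the X-Session-ID header first, then a ?session= query parameter,
// and finally the client IP as a fallback.
func sessionKey(req *http.Request, clientIP string) string {
    if sid := req.Header.Get("X-Session-ID"); sid != "" {
        return sid
    }
    if sid := req.URL.Query().Get("session"); sid != "" {
        return sid
    }
    return clientIP
}

func main() {
    req, _ := http.NewRequest("GET",
        "http://www.wingmandating.com/profile?session=sess_abc123xyz", nil)

    key := sessionKey(req, "192.168.1.100")

    // With 3 backends, a plain hash-and-modulo pick looks like this
    // (later in this part we swap the modulo for a consistent-hash lookup).
    idx := crc32.ChecksumIEEE([]byte(key)) % 3
    fmt.Printf("key=%s → backend index %d\n", key, idx)
}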


The REAL Problem

Here’s the thing. ALL of the above approaches (IP Hash, cookies, session IDs) share a fundamental problem:

What happens when a server is added or removed?

Let’s say you’re using IP Hash (or hashing session IDs, same principle):

BEFORE: 3 servers
hash(client) % 3 = server_index

Client A: hash → 7, 7 % 3 = 1 → Server 1
Client B: hash → 12, 12 % 3 = 0 → Server 0
Client C: hash → 5, 5 % 3 = 2 → Server 2
Client D: hash → 9, 9 % 3 = 0 → Server 0
Client E: hash → 4, 4 % 3 = 1 → Server 1

Now Server 2 dies:

AFTER: 2 servers
hash(client) % 2 = server_index

Client A: hash → 7, 7 % 2 = 1 → Server 1 (same!)
Client B: hash → 12, 12 % 2 = 0 → Server 0 (same!)
Client C: hash → 5, 5 % 2 = 1 → Server 1 (WAS Server 2, moved!)
Client D: hash → 9, 9 % 2 = 1 → Server 1 (WAS Server 0, moved!)
Client E: hash → 4, 4 % 2 = 0 → Server 0 (WAS Server 1, moved!)

3 out of 5 clients got remapped to different servers!

In general, with simple modulo hashing, when you change from N to N-1 servers, approximately (N-1)/N of all clients get remapped. For 100 servers, that’s 99% of clients losing their sessions!
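
You don't have to take the (N-1)/N figure on faith. Here's a quick standalone sketch (not part of the balancer) that counts how many of 100,000 clients land on a different server when you go from 100 servers to 99 with plain modulo hashing:

package main

import (
    "fmt"
    "hash/crc32"
)

func main() {
    const clients = 100000
    before, after := uint32(100), uint32(99) // one of 100 servers goes away

    remapped := 0
    for i := 0; i < clients; i++ {
        key := crc32.ChecksumIEEE([]byte(fmt.Sprintf("client-%d", i)))
        if key%before != key%after {
            remapped++
        }
    }

    // Expect roughly (N-1)/N ≈ 99% of clients to move.
    fmt.Printf("remapped: %d / %d (%.1f%%)\n",
        remapped, clients, float64(remapped)/clients*100)
}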

This is horrible. We need something better.


Consistent Hashing

Consistent hashing is one of those algorithms that makes you say “why didn’t I think of that?” It minimizes the number of keys that need to be remapped when servers change.

The Core Concept

Instead of hash % N, imagine arranging all possible hash values on a circle (a “ring”):

Hash ring & its possible values

Now, here’s the key idea:

  1. Hash the servers and place them on the ring
  2. Hash the clients and place them on the ring
  3. Each client goes to the first server clockwise from its position

Our example’s server ring

┌─────────────────────────────────────────────────────────────────┐
│   Servers: S1, S2, S3 (placed by hashing their names)           │
│   Clients: A, B, C, D, E (placed by hashing their IPs/IDs)      │
│                                                                 │
│   Routing (go clockwise to find server):                        │
│   - Client A → S2 (next server clockwise)                       │
│   - Client B → S2                                               │
│   - Client C → S3                                               │
│   - Client D → S3                                               │
│   - Client E → S1                                               │
└─────────────────────────────────────────────────────────────────┘

What Happens When a Server Dies?

Let’s say S2 dies:

Our example’s server ring

┌─────────────────────────────────────────────────────────────────┐
│   After S2 dies:                                                │
│   - Client A → S3 (was S2, MOVED)                               │
│   - Client B → S3 (was S2, MOVED)                               │
│   - Client C → S3 (same!)                                       │
│   - Client D → S3 (same!)                                       │
│   - Client E → S1 (same!)                                       │
│                                                                 │
│   Only clients that were on S2 moved                            │
│   Clients on S1 and S3 are unaffected                           │
└─────────────────────────────────────────────────────────────────┘

Only the clients that were assigned to S2 need to move. Everyone else stays put. This is HUGE.

With N servers and K keys:

  • Simple modulo: ~K keys remapped when a server changes
  • Consistent hashing: ~K/N keys remapped when a server changes

For 100 servers and 1 million sessions, that’s the difference between remapping ~1 million sessions vs ~10,000 sessions.

The Uneven Distribution Problem

But wait, there’s still a problem. What if the servers aren’t evenly distributed on the ring? If all three servers happen to land clustered together, one of them (say, S3) ends up handling almost all the traffic while S1 and S2 sit nearly idle.

Virtual Nodes to the Rescue

Instead of placing each server once on the ring, place it multiple times at different positions. These are called virtual nodes.

Our example’s server ring with virtual nodes

┌─────────────────────────────────────────────────────────────────┐
│                                                                 │
│   Each server gets multiple positions (virtual nodes):          │
│                                                                 │
│   Server 1: hash("S1-0"), hash("S1-1"), hash("S1-2"), ...       │
│   Server 2: hash("S2-0"), hash("S2-1"), hash("S2-2"), ...       │
│   Server 3: hash("S3-0"), hash("S3-1"), hash("S3-2"), ...       │
│                                                                 │
│   Now load is evenly distributed                                │
│   Each server has multiple "claims" on the ring                 │
└─────────────────────────────────────────────────────────────────┘

With enough virtual nodes (typically 100-200 per server), the distribution becomes very even.
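
If you want to see that effect in numbers rather than take it on faith, here's a throwaway sketch (independent of the implementation we're about to write) that builds a ring with 1, 10, and 150 virtual nodes per server and counts how many of 10,000 keys each server receives:

package main

import (
    "fmt"
    "hash/crc32"
    "sort"
)

// buildRing places each server on the ring `vnodes` times and returns the
// sorted hash positions plus a position→server map.
func buildRing(servers []string, vnodes int) ([]uint32, map[uint32]string) {
    ring := map[uint32]string{}
    hashes := []uint32{}
    for _, s := range servers {
        for i := 0; i < vnodes; i++ {
            h := crc32.ChecksumIEEE([]byte(fmt.Sprintf("%s-%d", s, i)))
            ring[h] = s
            hashes = append(hashes, h)
        }
    }
    sort.Slice(hashes, func(i, j int) bool { return hashes[i] < hashes[j] })
    return hashes, ring
}

func main() {
    servers := []string{"S1", "S2", "S3"}

    for _, vnodes := range []int{1, 10, 150} {
        hashes, ring := buildRing(servers, vnodes)

        counts := map[string]int{}
        for i := 0; i < 10000; i++ {
            h := crc32.ChecksumIEEE([]byte(fmt.Sprintf("client-%d", i)))
            idx := sort.Search(len(hashes), func(j int) bool { return hashes[j] >= h })
            if idx == len(hashes) {
                idx = 0 // wrap around the ring
            }
            counts[ring[hashes[idx]]]++
        }
        fmt.Printf("vnodes=%3d → %v\n", vnodes, counts)
    }
}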


Implementing Consistent Hashing

Let’s write the code:

package main

import (
    "fmt"
    "hash/crc32"
    "sort"
    "sync"
)

type ConsistentHash struct {
    circle       map[uint32]*Backend
    sortedHashes []uint32
    virtualNodes int
    backends     []*Backend
    mux          sync.RWMutex
}

func NewConsistentHash(virtualNodes int) *ConsistentHash {
    return &ConsistentHash{
        circle:       make(map[uint32]*Backend),
        sortedHashes: make([]uint32, 0),
        virtualNodes: virtualNodes,
        backends:     make([]*Backend, 0),
    }
}

func (ch *ConsistentHash) hashKey(key string) uint32 {
    return crc32.ChecksumIEEE([]byte(key))
}

func (ch *ConsistentHash) AddBackend(backend *Backend) {
    ch.mux.Lock()
    defer ch.mux.Unlock()

    ch.backends = append(ch.backends, backend)

    for i := 0; i < ch.virtualNodes; i++ {
        virtualKey := fmt.Sprintf("%s:%d:%d", backend.Host, backend.Port, i)
        hash := ch.hashKey(virtualKey)

        ch.circle[hash] = backend
        ch.sortedHashes = append(ch.sortedHashes, hash)
    }

    sort.Slice(ch.sortedHashes, func(i, j int) bool {
        return ch.sortedHashes[i] < ch.sortedHashes[j]
    })
}

func (ch *ConsistentHash) RemoveBackend(backend *Backend) {
    ch.mux.Lock()
    defer ch.mux.Unlock()

    for i, b := range ch.backends {
        if b == backend {
            ch.backends = append(ch.backends[:i], ch.backends[i+1:]...)
            break
        }
    }

    for i := 0; i < ch.virtualNodes; i++ {
        virtualKey := fmt.Sprintf("%s:%d:%d", backend.Host, backend.Port, i)
        hash := ch.hashKey(virtualKey)

        delete(ch.circle, hash)
    }

    ch.sortedHashes = make([]uint32, 0, len(ch.circle))
    for hash := range ch.circle {
        ch.sortedHashes = append(ch.sortedHashes, hash)
    }
    sort.Slice(ch.sortedHashes, func(i, j int) bool {
        return ch.sortedHashes[i] < ch.sortedHashes[j]
    })
}

func (ch *ConsistentHash) GetBackend(key string) *Backend {
    ch.mux.RLock()
    defer ch.mux.RUnlock()

    if len(ch.sortedHashes) == 0 {
        return nil
    }

    hash := ch.hashKey(key)

    // Binary search: find the first virtual node clockwise from the key's hash.
    idx := sort.Search(len(ch.sortedHashes), func(i int) bool {
        return ch.sortedHashes[i] >= hash
    })

    // Past the last position on the ring? Wrap around to the start.
    if idx >= len(ch.sortedHashes) {
        idx = 0
    }

    // Keep walking clockwise until we find a live backend,
    // or give up once we've gone all the way around.
    startIdx := idx
    for {
        backend := ch.circle[ch.sortedHashes[idx]]
        if backend.IsAlive() {
            return backend
        }

        idx = (idx + 1) % len(ch.sortedHashes)
        if idx == startIdx {
            return nil
        }
    }
}

func (ch *ConsistentHash) GetBackendForIP(clientIP string) *Backend {
    return ch.GetBackend(clientIP)
}

func (ch *ConsistentHash) GetBackendForSession(sessionID string) *Backend {
    return ch.GetBackend(sessionID)
}

Testing Consistent Hashing

Let’s verify it actually minimizes remapping:

func TestConsistentHashRemapping() {
    ch := NewConsistentHash(150)

    s1 := NewBackend("server1", 8081)
    s2 := NewBackend("server2", 8082)
    s3 := NewBackend("server3", 8083)

    ch.AddBackend(s1)
    ch.AddBackend(s2)
    ch.AddBackend(s3)

    clients := make([]string, 10000)
    for i := 0; i < 10000; i++ {
        clients[i] = fmt.Sprintf("client-%d", i)
    }

    initialMapping := make(map[string]*Backend)
    for _, client := range clients {
        initialMapping[client] = ch.GetBackend(client)
    }

    countBefore := make(map[*Backend]int)
    for _, backend := range initialMapping {
        countBefore[backend]++
    }
    fmt.Println("Before removal:")
    fmt.Printf("  Server 1: %d clients\n", countBefore[s1])
    fmt.Printf("  Server 2: %d clients\n", countBefore[s2])
    fmt.Printf("  Server 3: %d clients\n", countBefore[s3])

    ch.RemoveBackend(s2)

    remapped := 0
    for _, client := range clients {
        newBackend := ch.GetBackend(client)
        if initialMapping[client] != newBackend {
            remapped++
        }
    }

    fmt.Printf("\nAfter removing Server 2:\n")
    fmt.Printf("  Clients remapped: %d / %d (%.1f%%)\n",
        remapped, len(clients), float64(remapped)/float64(len(clients))*100)
}

Output:

Before removal:
  Server 1: 3342 clients
  Server 2: 3315 clients
  Server 3: 3343 clients

After removing Server 2:
  Clients remapped: 3315 / 10000 (33.2%)

Only the clients that were on Server 2 got remapped. The math checks out.

The Full Load Balancer with Consistent Hashing

Let’s integrate it:

type LoadBalancer struct {
    host            string
    port            int
    consistentHash  *ConsistentHash
    pool            *ServerPool
}

func NewLoadBalancer(host string, port int, virtualNodes int) *LoadBalancer {
    return &LoadBalancer{
        host:           host,
        port:           port,
        consistentHash: NewConsistentHash(virtualNodes),
        pool:           NewServerPool(),
    }
}

func (lb *LoadBalancer) AddBackend(host string, port int) *Backend {
    backend := lb.pool.AddBackend(host, port)
    lb.consistentHash.AddBackend(backend)
    return backend
}

func (lb *LoadBalancer) handleConnection(clientConn net.Conn) {
    defer clientConn.Close()

    clientAddr := clientConn.RemoteAddr().String()
    clientIP, _, _ := net.SplitHostPort(clientAddr)

    backend := lb.consistentHash.GetBackendForIP(clientIP)
    if backend == nil {
        clientConn.Write([]byte("HTTP/1.1 503 Service Unavailable\r\n\r\n"))
        return
    }

    log.Printf("[LB] %s → %s (consistent hash)", clientIP, backend.Address())

    backend.IncrementConnections()
    defer backend.DecrementConnections()

    backendConn, err := net.Dial("tcp", backend.Address())
    if err != nil {
        clientConn.Write([]byte("HTTP/1.1 502 Bad Gateway\r\n\r\n"))
        return
    }
    defer backendConn.Close()

    lb.forwardTraffic(clientConn, backendConn)
}

Session Draining

What happens when you need to take a server offline for maintenance? You don’t want to just kill everyone’s sessions.

Session draining is the process of gracefully removing a server:

  1. Stop sending NEW requests to the server
  2. Let existing connections finish
  3. After a timeout (or all connections close), remove the server

┌─────────────────────────────────────────────────────────────────┐
│                                                                 │
│   Session Draining Timeline:                                    │
│                                                                 │
│   T=0: Admin initiates drain on Server 2                        │
│        ├── Server 2 marked as "draining"                        │
│        ├── New requests go to other servers                     │
│        └── Existing connections continue                        │
│                                                                 │
│   T=0 to T=30s: Drain period                                    │
│        ├── Active connections on Server 2: 150 → 89 → 34 → 12   │
│        └── Users finish their tasks naturally                   │
│                                                                 │
│   T=30s: Drain timeout                                          │
│        ├── 12 connections still active                          │
│        ├── Option A: Force close remaining connections          │
│        └── Option B: Extend timeout                             │
│                                                                 │
│   T=35s: Server 2 removed completely                            │
│        └── Now safe to shut down                                │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

Implementation

type Backend struct {
    Host        string
    Port        int
    alive       int32
    draining    int32  // new field!
    connections int64
}

func (b *Backend) IsDraining() bool {
    return atomic.LoadInt32(&b.draining) == 1
}

func (b *Backend) SetDraining(draining bool) {
    var v int32 = 0
    if draining {
        v = 1
    }
    atomic.StoreInt32(&b.draining, v)
}

func (b *Backend) IsAvailable() bool {
    return b.IsAlive() && !b.IsDraining()
}

func (lb *LoadBalancer) DrainBackend(backend *Backend, timeout time.Duration) {
    log.Printf("[DRAIN] Starting drain for %s (timeout: %v)", backend.Address(), timeout)

    backend.SetDraining(true)

    deadline := time.Now().Add(timeout)
    ticker := time.NewTicker(1 * time.Second)
    defer ticker.Stop()

    for {
        conns := backend.GetConnections()
        if conns == 0 {
            log.Printf("[DRAIN] %s drained successfully (0 connections)", backend.Address())
            break
        }

        if time.Now().After(deadline) {
            log.Printf("[DRAIN] %s drain timeout, %d connections remaining",
                backend.Address(), conns)
            break
        }

        log.Printf("[DRAIN] %s: %d connections remaining", backend.Address(), conns)
        <-ticker.C
    }

    lb.consistentHash.RemoveBackend(backend)
    log.Printf("[DRAIN] %s removed from pool", backend.Address())
}

Update GetBackend to skip draining servers for new connections:

func (ch *ConsistentHash) GetBackend(key string) *Backend {
    ch.mux.RLock()
    defer ch.mux.RUnlock()

    if len(ch.sortedHashes) == 0 {
        return nil
    }

    hash := ch.hashKey(key)

    idx := sort.Search(len(ch.sortedHashes), func(i int) bool {
        return ch.sortedHashes[i] >= hash
    })

    if idx >= len(ch.sortedHashes) {
        idx = 0
    }

    startIdx := idx
    for {
        backend := ch.circle[ch.sortedHashes[idx]]
        if backend.IsAvailable() {
            return backend
        }

        idx = (idx + 1) % len(ch.sortedHashes)
        if idx == startIdx {
            return nil
        }
    }
}

When NOT to Use Sticky Sessions

Here’s the truth that might hurt: sticky sessions are often a band-aid for poor architecture.

In a perfect world, your application would be stateless. Session state would live in a shared store, not on individual servers.

Stateless Architecture

┌─────────────────────────────────────────────────────────────────┐
│                                                                 │
│   STATEFUL (what we've been building for):                      │
│                                                                 │
│         ┌─────────────────────────────────────────┐             │
│         │           Load Balancer                 │             │
│         │    (sticky sessions required)           │             │
│         └─────────────────────────────────────────┘             │
│                           │                                     │
│             ┌─────────────┼─────────────┐                       │
│             │             │             │                       │
│             ▼             ▼             ▼                       │
│        ┌──────────┐  ┌──────────┐  ┌──────────┐                 │
│        │ Server 1 │  │ Server 2 │  │ Server 3 │                 │
│        │          │  │          │  │          │                 │
│        │ Sydney's │  │ Mike's   │  │ Sara's   │                 │
│        │ Session  │  │ Session  │  │ Session  │                 │
│        └──────────┘  └──────────┘  └──────────┘                 │
│                                                                 │
│   Problem: Sessions are local. Server dies = sessions die.      │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────────┐
│                                                                 │
│   STATELESS (the better way):                                   │
│                                                                 │
│             ┌───────────────────────────┐                       │
│             │      Load Balancer        │                       │
│             └───────────────────────────┘                       │
│                         │                                       │
│           ┌─────────────┼─────────────┐                         │
│           │             │             │                         │
│           ▼             ▼             ▼                         │
│     ┌──────────┐  ┌──────────┐  ┌──────────┐                    │
│     │ Server 1 │  │ Server 2 │  │ Server 3 │                    │
│     │          │  │          │  │          │                    │
│     │   (no    │  │   (no    │  │   (no    │                    │
│     │  state)  │  │  state)  │  │  state)  │                    │
│     └──────────┘  └──────────┘  └──────────┘                    │
│           │             │             │                         │
│           └─────────────┼─────────────┘                         │
│                         │                                       │
│                         ▼                                       │
│              ┌────────────────────┐                             │
│              │   Shared Session   │                             │
│              │   Store (Redis)    │                             │
│              │                    │                             │
│              │ All sessions here  │                             │
│              └────────────────────┘                             │
│                                                                 │
│   Any server can handle any request.                            │
│   Server dies = no problem, sessions are safe in Redis.         │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

Benefits of Stateless

  1. Any server can handle any request - No routing constraints
  2. Server failure doesn’t lose sessions - Data is in shared store
  3. Easy horizontal scaling - Just add more servers
  4. Better algorithms - Can use Least Connections, Power of Two Choices, etc.
  5. Simpler load balancer - No need for sticky logic
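
To make the shared-store idea concrete, here's a minimal sketch of what the application side could look like. The SessionStore interface and the in-memory stand-in are illustrative; in production the same interface would be backed by Redis or Memcached:

package main

import (
    "errors"
    "fmt"
    "sync"
    "time"
)

// SessionStore is the only thing the app servers talk to for session state.
// Because no server keeps sessions locally, any server can handle any request.
type SessionStore interface {
    Get(sessionID string) (map[string]string, error)
    Set(sessionID string, data map[string]string, ttl time.Duration) error
}

// memoryStore is a stand-in for a real shared store (Redis, Memcached).
type memoryStore struct {
    mu   sync.RWMutex
    data map[string]map[string]string
}

func newMemoryStore() *memoryStore {
    return &memoryStore{data: map[string]map[string]string{}}
}

func (m *memoryStore) Get(id string) (map[string]string, error) {
    m.mu.RLock()
    defer m.mu.RUnlock()
    s, ok := m.data[id]
    if !ok {
        return nil, errors.New("session not found")
    }
    return s, nil
}

func (m *memoryStore) Set(id string, data map[string]string, _ time.Duration) error {
    m.mu.Lock()
    defer m.mu.Unlock()
    m.data[id] = data
    return nil
}

func main() {
    var store SessionStore = newMemoryStore()

    // Server 1 writes the session...
    store.Set("abc123xyz", map[string]string{"username": "sydney_looking_for_love"}, time.Hour)

    // ...and Server 2 (or 3) can read it back without any sticky routing.
    sess, _ := store.Get("abc123xyz")
    fmt.Println("hello,", sess["username"])
}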

When You Still Need Sticky Sessions

Sometimes stateless isn’t practical:

  • Legacy applications - Rewriting for stateless is too expensive
  • WebSocket connections - Long-lived connections need to stay with one server
  • In-memory caches - Local caches warm up over time, better to keep users on same server
  • Large session data - Too expensive to store/retrieve from shared store every request

Complete Implementation

Here’s our load balancer with all session persistence strategies:

package main

import (
    "bufio"
    "fmt"
    "hash/crc32"
    "io"
    "log"
    "net"
    "net/http"
    "sort"
    "strings"
    "sync"
    "sync/atomic"
    "time"
)

type Backend struct {
    Host        string
    Port        int
    alive       int32
    draining    int32
    connections int64
}

func NewBackend(host string, port int) *Backend {
    return &Backend{
        Host:  host,
        Port:  port,
        alive: 1,
    }
}

func (b *Backend) Address() string {
    return fmt.Sprintf("%s:%d", b.Host, b.Port)
}

func (b *Backend) IsAlive() bool {
    return atomic.LoadInt32(&b.alive) == 1
}

func (b *Backend) SetAlive(alive bool) {
    var v int32 = 0
    if alive {
        v = 1
    }
    atomic.StoreInt32(&b.alive, v)
}

func (b *Backend) IsDraining() bool {
    return atomic.LoadInt32(&b.draining) == 1
}

func (b *Backend) SetDraining(draining bool) {
    var v int32 = 0
    if draining {
        v = 1
    }
    atomic.StoreInt32(&b.draining, v)
}

func (b *Backend) IsAvailable() bool {
    return b.IsAlive() && !b.IsDraining()
}

func (b *Backend) GetConnections() int64 {
    return atomic.LoadInt64(&b.connections)
}

func (b *Backend) IncrementConnections() {
    atomic.AddInt64(&b.connections, 1)
}

func (b *Backend) DecrementConnections() {
    atomic.AddInt64(&b.connections, -1)
}


type ConsistentHash struct {
    circle       map[uint32]*Backend
    sortedHashes []uint32
    virtualNodes int
    backends     []*Backend
    backendNames map[*Backend]string
    mux          sync.RWMutex
}

func NewConsistentHash(virtualNodes int) *ConsistentHash {
    return &ConsistentHash{
        circle:       make(map[uint32]*Backend),
        sortedHashes: make([]uint32, 0),
        virtualNodes: virtualNodes,
        backends:     make([]*Backend, 0),
        backendNames: make(map[*Backend]string),
    }
}

func (ch *ConsistentHash) hashKey(key string) uint32 {
    return crc32.ChecksumIEEE([]byte(key))
}

func (ch *ConsistentHash) AddBackend(name string, backend *Backend) {
    ch.mux.Lock()
    defer ch.mux.Unlock()

    ch.backends = append(ch.backends, backend)
    ch.backendNames[backend] = name

    for i := 0; i < ch.virtualNodes; i++ {
        virtualKey := fmt.Sprintf("%s-%d", name, i)
        hash := ch.hashKey(virtualKey)
        ch.circle[hash] = backend
        ch.sortedHashes = append(ch.sortedHashes, hash)
    }

    sort.Slice(ch.sortedHashes, func(i, j int) bool {
        return ch.sortedHashes[i] < ch.sortedHashes[j]
    })

    log.Printf("[HASH] Added %s with %d virtual nodes", name, ch.virtualNodes)
}

func (ch *ConsistentHash) RemoveBackend(backend *Backend) {
    ch.mux.Lock()
    defer ch.mux.Unlock()

    name := ch.backendNames[backend]

    for i, b := range ch.backends {
        if b == backend {
            ch.backends = append(ch.backends[:i], ch.backends[i+1:]...)
            break
        }
    }

    for i := 0; i < ch.virtualNodes; i++ {
        virtualKey := fmt.Sprintf("%s-%d", name, i)
        hash := ch.hashKey(virtualKey)
        delete(ch.circle, hash)
    }

    delete(ch.backendNames, backend)

    ch.sortedHashes = make([]uint32, 0, len(ch.circle))
    for hash := range ch.circle {
        ch.sortedHashes = append(ch.sortedHashes, hash)
    }
    sort.Slice(ch.sortedHashes, func(i, j int) bool {
        return ch.sortedHashes[i] < ch.sortedHashes[j]
    })

    log.Printf("[HASH] Removed %s", name)
}

func (ch *ConsistentHash) GetBackend(key string) *Backend {
    ch.mux.RLock()
    defer ch.mux.RUnlock()

    if len(ch.sortedHashes) == 0 {
        return nil
    }

    hash := ch.hashKey(key)

    idx := sort.Search(len(ch.sortedHashes), func(i int) bool {
        return ch.sortedHashes[i] >= hash
    })

    if idx >= len(ch.sortedHashes) {
        idx = 0
    }

    startIdx := idx
    for {
        backend := ch.circle[ch.sortedHashes[idx]]
        if backend.IsAvailable() {
            return backend
        }

        idx = (idx + 1) % len(ch.sortedHashes)
        if idx == startIdx {
            return nil
        }
    }
}

func (ch *ConsistentHash) GetBackendName(backend *Backend) string {
    ch.mux.RLock()
    defer ch.mux.RUnlock()
    return ch.backendNames[backend]
}


type CookiePersistence struct {
    cookieName     string
    consistentHash *ConsistentHash
}

func NewCookiePersistence(cookieName string, ch *ConsistentHash) *CookiePersistence {
    return &CookiePersistence{
        cookieName:     cookieName,
        consistentHash: ch,
    }
}

func (cp *CookiePersistence) GetBackendFromCookie(cookieHeader string) *Backend {
    cookies := strings.Split(cookieHeader, ";")
    for _, cookie := range cookies {
        cookie = strings.TrimSpace(cookie)
        parts := strings.SplitN(cookie, "=", 2)
        if len(parts) == 2 && parts[0] == cp.cookieName {
            return cp.consistentHash.GetBackend(parts[1])
        }
    }
    return nil
}


type HealthChecker struct {
    backends      []*Backend
    interval      time.Duration
    timeout       time.Duration
    failThreshold int
    riseThreshold int
    failCounts    map[*Backend]int
    successCounts map[*Backend]int
    mux           sync.Mutex
}

func NewHealthChecker(backends []*Backend) *HealthChecker {
    return &HealthChecker{
        backends:      backends,
        interval:      5 * time.Second,
        timeout:       3 * time.Second,
        failThreshold: 3,
        riseThreshold: 2,
        failCounts:    make(map[*Backend]int),
        successCounts: make(map[*Backend]int),
    }
}

func (hc *HealthChecker) check(b *Backend) bool {
    client := &http.Client{Timeout: hc.timeout}
    resp, err := client.Get(fmt.Sprintf("http://%s/health", b.Address()))
    if err != nil {
        return false
    }
    defer resp.Body.Close()
    return resp.StatusCode >= 200 && resp.StatusCode < 300
}

func (hc *HealthChecker) process(b *Backend, healthy bool) {
    hc.mux.Lock()
    defer hc.mux.Unlock()

    wasAlive := b.IsAlive()

    if healthy {
        hc.failCounts[b] = 0
        hc.successCounts[b]++
        if !wasAlive && hc.successCounts[b] >= hc.riseThreshold {
            b.SetAlive(true)
            log.Printf("[HEALTH] %s is now HEALTHY", b.Address())
            hc.successCounts[b] = 0
        }
    } else {
        hc.successCounts[b] = 0
        hc.failCounts[b]++
        if wasAlive && hc.failCounts[b] >= hc.failThreshold {
            b.SetAlive(false)
            log.Printf("[HEALTH] %s is now UNHEALTHY", b.Address())
            hc.failCounts[b] = 0
        }
    }
}

func (hc *HealthChecker) Start() {
    log.Printf("[HEALTH] Starting health checker")
    ticker := time.NewTicker(hc.interval)

    hc.checkAll()

    for range ticker.C {
        hc.checkAll()
    }
}

func (hc *HealthChecker) checkAll() {
    var wg sync.WaitGroup
    for _, b := range hc.backends {
        wg.Add(1)
        go func(backend *Backend) {
            defer wg.Done()
            hc.process(backend, hc.check(backend))
        }(b)
    }
    wg.Wait()
}

type LoadBalancer struct {
    host              string
    port              int
    consistentHash    *ConsistentHash
    cookiePersistence *CookiePersistence
    backends          []*Backend
}

func NewLoadBalancer(host string, port int, virtualNodes int) *LoadBalancer {
    ch := NewConsistentHash(virtualNodes)
    return &LoadBalancer{
        host:              host,
        port:              port,
        consistentHash:    ch,
        cookiePersistence: NewCookiePersistence("SERVERID", ch),
        backends:          make([]*Backend, 0),
    }
}

func (lb *LoadBalancer) AddBackend(name, host string, port int) *Backend {
    backend := NewBackend(host, port)
    lb.backends = append(lb.backends, backend)
    lb.consistentHash.AddBackend(name, backend)
    return backend
}

func (lb *LoadBalancer) DrainBackend(backend *Backend, timeout time.Duration) {
    log.Printf("[DRAIN] Starting drain for %s", backend.Address())

    backend.SetDraining(true)

    deadline := time.Now().Add(timeout)
    ticker := time.NewTicker(1 * time.Second)
    defer ticker.Stop()

    for {
        conns := backend.GetConnections()
        if conns == 0 {
            log.Printf("[DRAIN] %s drained successfully", backend.Address())
            break
        }

        if time.Now().After(deadline) {
            log.Printf("[DRAIN] %s timeout, %d connections remaining", backend.Address(), conns)
            break
        }

        log.Printf("[DRAIN] %s: %d connections remaining", backend.Address(), conns)
        <-ticker.C
    }

    lb.consistentHash.RemoveBackend(backend)
}

func (lb *LoadBalancer) Start() error {
    addr := fmt.Sprintf("%s:%d", lb.host, lb.port)
    listener, err := net.Listen("tcp", addr)
    if err != nil {
        return err
    }
    defer listener.Close()

    log.Printf("[LB] Started on %s with consistent hashing", addr)

    for {
        conn, err := listener.Accept()
        if err != nil {
            log.Printf("[LB] Accept error: %v", err)
            continue
        }
        go lb.handleHTTPConnection(conn)
    }
}

func (lb *LoadBalancer) handleHTTPConnection(clientConn net.Conn) {
    defer clientConn.Close()

    reader := bufio.NewReader(clientConn)
    request, err := http.ReadRequest(reader)
    if err != nil {
        return
    }

    clientAddr := clientConn.RemoteAddr().String()
    clientIP, _, _ := net.SplitHostPort(clientAddr)

    var backend *Backend

    cookieHeader := request.Header.Get("Cookie")
    if cookieHeader != "" {
        backend = lb.cookiePersistence.GetBackendFromCookie(cookieHeader)
        if backend != nil {
            log.Printf("[LB] %s → %s (cookie)", clientIP, backend.Address())
        }
    }

    if backend == nil {
        backend = lb.consistentHash.GetBackend(clientIP)
        if backend != nil {
            log.Printf("[LB] %s → %s (consistent hash)", clientIP, backend.Address())
        }
    }

    if backend == nil {
        clientConn.Write([]byte("HTTP/1.1 503 Service Unavailable\r\n\r\n"))
        return
    }

    backend.IncrementConnections()
    defer backend.DecrementConnections()

    backendConn, err := net.Dial("tcp", backend.Address())
    if err != nil {
        clientConn.Write([]byte("HTTP/1.1 502 Bad Gateway\r\n\r\n"))
        return
    }
    defer backendConn.Close()

    request.Write(backendConn)

    backendReader := bufio.NewReader(backendConn)
    response, err := http.ReadResponse(backendReader, request)
    if err != nil {
        return
    }

    serverName := lb.consistentHash.GetBackendName(backend)
    if serverName != "" {
        cookie := fmt.Sprintf("%s=%s; Path=/; HttpOnly", lb.cookiePersistence.cookieName, serverName)
        response.Header.Add("Set-Cookie", cookie)
    }

    response.Write(clientConn)
}

func main() {
    lb := NewLoadBalancer("0.0.0.0", 8080, 150)

    lb.AddBackend("server1", "127.0.0.1", 8081)
    lb.AddBackend("server2", "127.0.0.1", 8082)
    lb.AddBackend("server3", "127.0.0.1", 8083)

    hc := NewHealthChecker(lb.backends)
    go hc.Start()

    // example: drain server2 after 30 seconds (for testing)
    // go func() {
    //     time.Sleep(30 * time.Second)
    //     lb.DrainBackend(lb.backends[1], 60*time.Second)
    // }()

    if err := lb.Start(); err != nil {
        log.Fatal(err)
    }
}

Testing Session Persistence

Test 1: Consistent Hashing

Start the backends and load balancer, then make repeated requests:

for i in {1..10}; do
    curl -s http://localhost:8080 | grep "Backend Server"
done

All requests should go to the same server (consistent hashing on IP).

Test 2: Cookie Persistence

curl -c cookies.txt -s http://localhost:8080 | grep "Backend Server"

for i in {1..5}; do
    curl -b cookies.txt -s http://localhost:8080 | grep "Backend Server"
done

All requests should go to the same server (cookie persistence).

Test 3: Server Failure

while true; do
    curl -b cookies.txt -s http://localhost:8080 | grep "Backend Server"
    sleep 1
done

# in another terminal, kill the server you're connected to
# watch the requests automatically move to another server

With consistent hashing, only clients on the dead server get remapped.


Recap

We covered a lot:

The Problem:

  • Stateful applications store sessions on individual servers
  • Random load balancing breaks sessions

The Solutions:

Strategy              Pros                 Cons
IP Hash               Simple               NAT problem, IP changes
Cookies               Reliable             Layer 7 required, cookie management
Session ID routing    Most control         Requires app cooperation
Consistent Hashing    Minimal remapping    More complex to implement

Key Concepts:

  • Consistent Hashing minimizes session disruption when servers change
  • Virtual Nodes ensure even distribution on the hash ring
  • Session Draining gracefully removes servers without killing sessions
  • Stateless Architecture is better when possible

What Now?

Our load balancer is getting pretty decent now. We’ve got:

  • Multiple algorithms (RR, Weighted, LC, etc.)
  • Health checking
  • Session persistence with consistent hashing
  • Cookie based sticky sessions
  • Graceful draining

In the next part, we’re going to dive into Layer 4 vs Layer 7 Load Balancing. We’ve been doing a mix of both, but there are important distinctions and trade-offs to understand. We’ll explore when to use TCP-level load balancing vs HTTP-level load balancing.


As always, hit me up on X / Twitter if you have questions, found bugs, or want to discuss anything tech or design related!!

See you in part 8 :)