
Designing a Load Balancer

Load balancing is fundamentally about deciding where to send the next request. There are countless ways to do it, and in practice, systems often mix approaches. Here are a few common strategies, with their pros, cons, and level of statefulness:


1. Weighted Round Robin

  • How it works: Each server gets a weight. Requests are distributed proportionally (e.g., a server with weight 2 handles twice as many requests as one with weight 1). A sketch follows this list.

  • Pros:

    • Simple and predictable.
    • Supports heterogeneous server capacity.
  • Cons:

    • Doesn’t adapt to changing backend load.
    • Long-lived connections can skew distribution.
  • Statefulness: Stateless.
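
To make the rotation concrete, here is a minimal Python sketch; the server names and weights are illustrative, and real implementations (e.g., nginx's "smooth" weighted round robin) interleave heavy and light servers rather than bursting:

  from itertools import cycle

  servers = {"s1": 2, "s2": 1}  # weight 2 -> twice the share of requests

  # Expand each server by its weight, then rotate through the flat list.
  rotation = cycle([name for name, w in servers.items() for _ in range(w)])

  def next_backend():
      return next(rotation)

  print([next_backend() for _ in range(6)])  # ['s1', 's1', 's2', 's1', 's1', 's2']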


2. Lowest Response Time

  • How it works: The load balancer tracks backend response times and sends traffic to the fastest server. A sketch follows this list.

  • Pros:

    • Dynamic — adapts to load and performance differences in real time.
    • Great for systems with variable workloads.
  • Cons:

    • Requires active monitoring/metrics.
    • Can cause “flapping” (rapid shifting of traffic).
  • Statefulness: Semi-stateful (requires health/latency data, but not per-client mappings).
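
A minimal sketch of the selection logic, assuming the balancer smooths each backend's latency with an exponentially weighted moving average (EWMA) to dampen the flapping noted above; the names, latencies, and smoothing factor are illustrative:

  # Hypothetical per-backend latency estimates in milliseconds.
  ewma_latency = {"s1": 42.0, "s2": 87.0, "s3": 15.0}
  ALPHA = 0.2  # smoothing factor: higher reacts faster but flaps more

  def record_response(server: str, latency_ms: float) -> None:
      # EWMA: blend the new sample into the running estimate.
      prev = ewma_latency[server]
      ewma_latency[server] = ALPHA * latency_ms + (1 - ALPHA) * prev

  def pick_backend() -> str:
      # Route to the backend with the lowest smoothed latency.
      return min(ewma_latency, key=ewma_latency.get)

  print(pick_backend())         # 's3'
  record_response("s3", 200.0)  # a slow response nudges s3's estimate upward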


3. Hash-Based Routing (Consistent Hashing)

Hash-based routing ensures that the same client or request always maps to the same backend. This avoids “session breakage” and enables sticky routing without explicit state.
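
A minimal consistent-hash ring sketch in Python; the class name, virtual-node count, and use of MD5 are illustrative choices. The "consistent" part is that adding or removing a backend only remaps the keys that pointed at it, instead of reshuffling everything:

  import bisect
  import hashlib

  def _hash(key: str) -> int:
      return int(hashlib.md5(key.encode()).hexdigest(), 16)

  class HashRing:
      def __init__(self, servers, vnodes=100):
          # Each server appears at many points ("virtual nodes") on the
          # ring, which smooths out the distribution.
          self._ring = sorted((_hash(f"{s}#{i}"), s)
                              for s in servers for i in range(vnodes))
          self._keys = [h for h, _ in self._ring]

      def lookup(self, key: str) -> str:
          # First ring position clockwise of the key's hash, wrapping around.
          idx = bisect.bisect(self._keys, _hash(key)) % len(self._ring)
          return self._ring[idx][1]

  ring = HashRing(["s1", "s2", "s3"])
  print(ring.lookup("session-abc123"))  # same key -> same backend, every time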


Layer 4 (Transport)

  • Inputs: tuple of {src_ip, dst_ip, src_port, dst_port}. A sketch follows this list.

  • Advantages:

    • Very fast (uses only packet headers).
    • Stateless and protocol-agnostic (works with TCP, UDP, etc.).
  • Drawbacks:

    • Clients behind the same NAT share an IP → schemes keyed on the source address (common when per-client stickiness is the goal) hash them all to the same backend, creating load imbalance.
    • No awareness of user/session identity.
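
A sketch of the L4 decision, assuming the balancer hashes the full transport tuple; the backend names and addresses are made up:

  import hashlib

  BACKENDS = ["s1", "s2", "s3"]

  def l4_pick(src_ip: str, src_port: int, dst_ip: str, dst_port: int) -> str:
      key = f"{src_ip}:{src_port}->{dst_ip}:{dst_port}"
      digest = hashlib.sha1(key.encode()).digest()
      return BACKENDS[int.from_bytes(digest[:4], "big") % len(BACKENDS)]

  # Variants that want per-client stickiness hash only src_ip; that is
  # exactly where the NAT clumping described above comes from.
  print(l4_pick("203.0.113.7", 53122, "198.51.100.10", 443))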

Layer 7 (Application)

  • Inputs: application-level fields such as the following (sketched in code after this list):

    • Cookies (session ID).
    • HTTP headers (e.g., Authorization).
    • URL paths or hostnames (tenant-based routing).
  • Advantages:

    • True session affinity — user stickiness works even behind NAT.
    • Can route by tenant, path, or custom logic.
    • More even distribution in real-world traffic.
  • Drawbacks:

    • Added complexity compared to L4, which only reads transport headers. L7 routing costs more CPU and adds slight latency because of:

      • Protocol parsing — the load balancer must understand the application protocol well enough to extract the chosen hash input (cookie, header, tenant ID, etc.).
      • Payload parsing/serialization — the balancer may have to buffer and re-emit application data rather than simply forwarding packets.
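
A sketch of the L7 decision, assuming the hash input is a session cookie with a client-address fallback; the header and cookie names are illustrative:

  import hashlib
  from http.cookies import SimpleCookie

  BACKENDS = ["s1", "s2", "s3"]

  def l7_pick(headers: dict) -> str:
      cookie = SimpleCookie(headers.get("Cookie", ""))
      session = cookie["session_id"].value if "session_id" in cookie else ""
      # No session yet? Fall back to some other identity, e.g., client IP.
      key = session or headers.get("X-Forwarded-For", "anonymous")
      digest = hashlib.sha1(key.encode()).digest()
      return BACKENDS[int.from_bytes(digest[:4], "big") % len(BACKENDS)]

  # Two users behind the same NAT still hash independently: cookies differ.
  print(l7_pick({"Cookie": "session_id=alice-123"}))
  print(l7_pick({"Cookie": "session_id=bob-456"}))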

Example:

  • With L4 hashing, two users behind a corporate NAT land on the same backend, potentially overloading it.
  • With L7 hashing, each user’s session ID is used as input to the hash, giving balanced and deterministic routing.

Point: L4 can clump behind NAT; L7 uses app identity for true stickiness.

Callouts to annotate:

  • “L4 hashes on transport tuple → NAT makes many users look the same.”
  • “L7 hashes on session/tenant → even spread & stickiness behind NAT.”

Design Considerations

Regardless of strategy, a real load balancer has to address a few core concerns:

  • Health Checks — Detect failing backends and stop routing to them (sketched in code after this list).

  • Failover — Redistribute traffic gracefully when a backend disappears.

  • Scalability — Scale horizontally (multiple LBs in front) without creating bottlenecks.

  • Statefulness — Decide how much state you’re willing to keep:

    • Stateless (round robin, hashing) → easier to scale, no synchronization needed.
    • Stateful (per-connection/session mapping) → more control, but harder to scale.
  • Fairness vs. Stickiness — Some strategies optimize for even load (round robin, least response time), others for client affinity (hashing). Many production systems combine them.
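
As a sketch of the health-check loop mentioned above (the /healthz endpoint, addresses, and timeout are assumptions, not prescriptions):

  import urllib.request

  BACKENDS = ["http://10.0.0.1:8080", "http://10.0.0.2:8080"]
  healthy = set(BACKENDS)

  def probe(backend: str, timeout: float = 1.0) -> bool:
      try:
          with urllib.request.urlopen(f"{backend}/healthz", timeout=timeout) as r:
              return r.status == 200
      except OSError:  # connection refused, timeout, HTTP error, ...
          return False

  def run_health_checks() -> None:
      # Pull failing backends out of rotation; re-add recovered ones.
      for b in BACKENDS:
          if probe(b):
              healthy.add(b)
          else:
              healthy.discard(b)

  # A real LB runs this periodically and only routes to members of `healthy`.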


Fault Tolerance: Active–Active vs. Active–Passive

Load balancers themselves can't be single points of failure; otherwise, you've just moved the bottleneck upstream. That means running multiple load balancers. But what do we do when one of them goes down?

High availability setups usually come in two flavors:

Active–Active

  • How it works: Multiple load balancers run simultaneously. Clients (or DNS / anycast routing) spread requests across them.

  • Pros:

    • Scales horizontally.
    • If one LB fails, traffic naturally shifts to the remaining ones.
  • Cons:

    • Requires coordination to keep configs, health states, and metrics consistent.
    • More complex to operate.

Active–Passive

  • How it works: One load balancer is active, while another runs in standby. If the active fails, the passive takes over (via health probes or a virtual IP).

  • Pros:

    • Simpler to operate.
    • Lower overhead than active–active.
  • Cons:

    • Only one LB handles traffic at a time → no horizontal scaling.
    • Failover can cause a brief blip while the passive takes over.

🔗 How This Interacts with Hashing (L4 vs. L7)

  • Stateless strategies (Round Robin, Hashing): Active–active is straightforward — each LB can independently compute the backend choice.

    • L4 Hashing: trivial, since every LB sees the same transport tuple.
    • L7 Hashing: also works, as long as each LB can parse the application protocol consistently.
  • Stateful strategies (session tables, connection pinning): Active–active is much harder, since LB state must be replicated across nodes. If state isn’t shared, a client could hash to different servers depending on which LB it hits.

  • Design takeaway:

    • If you want active–active at scale, prefer stateless approaches (hashing is perfect here).
    • If you must maintain state (e.g., sticky tables), active–passive is usually simpler.

1) Active–Active, Stateless Hashing (ideal with L4/L7 hashing)

Point: Both LBs independently hash → same backend, no shared state needed (sketched in code below).

Callouts to annotate:

  • “Deterministic mapping (e.g., hash(cookie)) → S2.”
  • “No session tables, no LB-LB sync.”
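
A minimal sketch of why this works: two LB instances that share only code and configuration, never runtime state, still agree on every routing decision (the key and backend list are illustrative):

  import hashlib

  def pick(key: str, backends: list) -> str:
      digest = hashlib.sha1(key.encode()).digest()
      return backends[int.from_bytes(digest[:4], "big") % len(backends)]

  backends = ["s1", "s2", "s3"]
  lb_a = pick("session_id=alice-123", backends)  # decision made on LB A
  lb_b = pick("session_id=alice-123", backends)  # same decision on LB B
  assert lb_a == lb_b  # no session table, no LB-to-LB sync required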

2) Active–Passive with VIP Failover

Point: Simpler ops, but only one LB at a time; brief blip on failover.

Callouts to annotate:

  • “Promotion via VRRP/keepalived/health probes.”
  • “Short interruption during VIP move.”
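
A sketch of the standby's promotion loop, in the spirit of what VRRP/keepalived automates; claim_virtual_ip is a hypothetical stand-in for actually moving the VIP (e.g., re-binding the address and announcing it):

  import socket
  import time

  ACTIVE_LB = ("10.0.0.1", 8080)   # hypothetical address of the active LB
  PROBE_INTERVAL = 1.0             # seconds between liveness probes
  FAILURES_BEFORE_TAKEOVER = 3     # don't fail over on a single blip

  def active_is_alive() -> bool:
      try:
          with socket.create_connection(ACTIVE_LB, timeout=1.0):
              return True
      except OSError:
          return False

  def standby_loop(claim_virtual_ip) -> None:
      failures = 0
      while True:
          failures = 0 if active_is_alive() else failures + 1
          if failures >= FAILURES_BEFORE_TAKEOVER:
              claim_virtual_ip()  # promotion: the brief "blip" happens here
              return
          time.sleep(PROBE_INTERVAL)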