
“Wait… Why Are We Modding by the Number of Nodes?”

A Quick Developer’s Journey to Consistent Hashing


1. The Naive Starting Point

You’re building a distributed cache or key-value store. You need to decide: which node holds which key?

First thought (and it sounds perfectly reasonable):

“We just need relatively balanced buckets… each bucket is one node… so we’ll hash the key and mod by the number of nodes.”

# Naive hashing
def get_node(key: str, nodes: list[str]) -> str:
    return nodes[hash(key) % len(nodes)]

Boom, done, right? Keys are evenly spread across all nodes. No central lookup table. No coordination. It feels elegant.
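
A quick way to convince yourself (a throwaway check, not part of the design; the key and node names are made up): hash a pile of synthetic keys and count where they land.

# Sanity check: distribute 10,000 keys across 5 nodes and count per-node load.
from collections import Counter

nodes = ["node-a", "node-b", "node-c", "node-d", "node-e"]
counts = Counter(get_node(f"key-{i}", nodes) for i in range(10_000))
print(counts)  # each node ends up with roughly 2,000 keys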


2. The First Pain Point

Everything is fine until one day: a node dies (or you add a new one). Suddenly:

node = hash(key) % (node_count ± 1)

And you realize… Almost every key just moved to a new node.

  • Node A didn’t just get a few new keys.
  • Node B didn’t just lose a small slice.
  • The entire dataset just shuffled like a deck of cards.

Statistically:

  • Adding/removing a node moves ~(N-1)/N of all keys.
  • That’s basically all of them for big clusters.
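
You can measure this directly with a throwaway simulation (using the naive get_node from section 1 and synthetic keys; the numbers are illustrative):

# How many keys change owners when a 6th node joins a 5-node cluster?
nodes_before = ["n1", "n2", "n3", "n4", "n5"]
nodes_after = nodes_before + ["n6"]

keys = [f"key-{i}" for i in range(100_000)]
moved = sum(1 for k in keys if get_node(k, nodes_before) != get_node(k, nodes_after))
print(f"{moved / len(keys):.0%} of keys moved")  # ~83%: almost everything reshuffles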

3. The First Hacky Fix: Buckets

Your inner monologue kicks in:

“Wait, what if each node didn’t equal one bucket? What if we had lots of buckets and just mapped those to nodes?”

Example:

  • Use 500 buckets (a constant).
  • Assign them to your 5 nodes (100 buckets each).

# Fixed number of slots
SLOT_COUNT = 500
slot_to_node = {}

# Assign buckets to nodes (e.g., round-robin)
for slot in range(SLOT_COUNT):
    slot_to_node[slot] = nodes[slot % len(nodes)]

def get_node(key: str) -> str:
    slot = hash(key) % SLOT_COUNT
    return slot_to_node[slot]

Now, when a node fails:

  • You only reassign its buckets.
  • The other buckets (and their keys) stay put.
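
One way that handoff could look (a sketch; remove_node and the round-robin reassignment to survivors are illustrative, not a prescribed policy):

def remove_node(failed: str, nodes: list[str], slot_to_node: dict[int, str]) -> None:
    """Reassign only the failed node's slots; every other slot keeps its owner."""
    survivors = [n for n in nodes if n != failed]
    orphaned = [slot for slot, node in slot_to_node.items() if node == failed]
    for i, slot in enumerate(orphaned):
        slot_to_node[slot] = survivors[i % len(survivors)]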

4a. Buckets Need to Be Scattered, Not Clumped

But wait: there’s a subtlety here. If all of Node 1’s buckets are adjacent (say 0–20, 21–40, 41–60, etc.), then when Node 1 fails:

  • All those adjacent buckets move together → effectively one big chunk.
  • You haven’t actually solved the load-balancing problem.

4b. Buckets Must Be Scattered and Permuted

If you “scatter” by copying the same banding pattern (e.g., 0–20, 100–120, 200–220) and keep the same A→B→C order in every band, then when Node A dies, Node B still absorbs everything in those bands. You avoided clumps but kept successor alignment.


The Fix: Scatter Pieces Randomly Around One Shared Range

  • Give each machine several numbers (say 64–256) spread across a big fixed range (like all 64-bit integers).

  • For machine N and index i, compute one of these numbers by hashing "{N}:{i}".

  • Collect all the numbers from all machines, then sort them.

  • To place a piece of data:

    • Hash the data into the same number space.
    • Find the next largest number in the sorted list.
    • The machine that owns that number gets the data.
    • If the hash is bigger than all numbers, wrap around to the very first number in the list.

Because every machine’s numbers are generated independently, their positions get mixed together. That means if machine A disappears, the “next largest” might belong to B in some places, C in others, etc. The work gets shared out roughly in proportion to how many numbers each machine has.

Why this is good

  • Diverse neighbors: Each machine hands its work off to many different others, instead of just one unlucky neighbor.
  • Small, predictable changes: Adding or removing a machine only shifts about 1/N of the work.
  • Smoother distribution: More numbers per machine means the load evens out better (64–256 is usually plenty).

Tiny pseudocode (unchanged idea, correct construction)

import bisect

def build_ring(nodes, K, hash_fn):  # nodes: [(node_id, weight>=1), ...]
    ring = []
    for node_id, w in nodes:
        for i in range(K * w):
            pos = hash_fn(f"{node_id}:{i}") % (1 << 64)
            ring.append((pos, node_id))
    ring.sort(key=lambda x: x[0])  # one global sort → mixed order
    return ring

def owner(key, ring, hash_fn):
    p = hash_fn(key) % (1 << 64)
    # binary search: first ring entry with position >= p (wrap to 0 if past the end)
    idx = bisect.bisect_left(ring, (p,))
    if idx == len(ring):
        idx = 0
    return ring[idx][1]
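
To see the ~1/N behavior in action, here's an illustrative test (hash_fn built on SHA-256; node names, weights, and K are made up):

import hashlib

def hash_fn(s: str) -> int:
    # Stable, uniform 64-bit value derived from SHA-256.
    return int.from_bytes(hashlib.sha256(s.encode()).digest()[:8], "big")

nodes = [("node-a", 1), ("node-b", 1), ("node-c", 1), ("node-d", 1)]
ring4 = build_ring(nodes, K=128, hash_fn=hash_fn)
ring5 = build_ring(nodes + [("node-e", 1)], K=128, hash_fn=hash_fn)

keys = [f"key-{i}" for i in range(50_000)]
moved = sum(1 for k in keys if owner(k, ring4, hash_fn) != owner(k, ring5, hash_fn))
print(f"{moved / len(keys):.0%} of keys moved")  # roughly 1/5: only the new node's share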

Notes

  • Use a stable, uniform hash (xxHash/Murmur/SHA-256).
  • Weights: give bigger machines a higher weight, i.e. more numbers on the ring.
  • If you want the same guarantees without a ring, consider Rendezvous/HRW hashing—it naturally avoids aligned successors too.
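
For comparison, Rendezvous (HRW) hashing fits in a couple of lines (a minimal sketch reusing the same illustrative hash_fn idea: every key scores every node, the highest score wins, so a departing node's keys spread across all survivors):

def hrw_owner(key: str, node_ids: list[str], hash_fn) -> str:
    # Highest-random-weight: the node with the largest hash of (node, key) owns the key.
    return max(node_ids, key=lambda n: hash_fn(f"{n}:{key}"))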

5. The “Ohhh…” Moment: Consistent Hashing

Congratulations, you’ve basically discovered:

  • Consistent Hashing → deterministic key-to-range mapping.
  • Virtual Nodes → fancy term for “a node owns multiple scattered ranges.”

The “hash ring” people draw is just a prettier visualization of this:

  • Nodes (or virtual nodes) are points on the ring.
  • Keys land somewhere on the ring.
  • Each key goes to the next node clockwise.

6. Picking the Constant

One subtle detail:

What do we mod by?

  • In naive hashing, it was node_count (bad).
  • In bucketized hashing, it’s a fixed constant (like 500 or 10,000).
  • Nodes can change, but your hash space doesn’t.

That’s why consistent hashing is called consistent:

  • Adding/removing nodes only shifts keys in those affected ranges.
  • Everything else stays consistent.

7. Bonus: Jump Consistent Hash

If you’re feeling fancy, check out Jump Consistent Hash (from Google):

  • No big slot table.
  • O(ln n) expected time to compute which node a key belongs to, using O(1) memory.
  • Still only remaps ~1/N keys when node count changes.
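
The whole algorithm is only a few lines. Below is a straightforward Python port of the published construction (you still hash your key to a 64-bit integer first, and map the returned bucket index to an actual node yourself):

def jump_consistent_hash(key: int, num_buckets: int) -> int:
    """Map a 64-bit key to a bucket in [0, num_buckets)."""
    b, j = -1, 0
    while j < num_buckets:
        b = j
        key = (key * 2862933555777941757 + 1) & 0xFFFFFFFFFFFFFFFF  # 64-bit LCG step
        j = int((b + 1) * ((1 << 31) / ((key >> 33) + 1)))
    return b

The trade-off: buckets are just the integers 0..n-1, so it shines when you only grow or shrink at the end of the range, not when arbitrary nodes can disappear.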

Takeaway

The whole point of consistent hashing is to break free from the naive thought:

“We’ll just mod by the number of nodes.”

Instead:

  • Use a fixed hash space (think “lots of tiny buckets”).
  • Each machine claims multiple buckets, scattered pseudo-randomly.
  • Buckets are implicit — we just store the end-points (integers). To route a key, you hash it, then look up the next largest integer; the machine that owns that integer is the winner.

That’s how you get scalable, fault-tolerant, load-balanced key distribution.