Common Use Cases of Consistent Hashing

Consistent hashing is a technique for distributing items across a dynamic set of nodes with minimal reshuffling when nodes join or leave. Because the mapping is deterministic, the same key always routes to the same node as long as the node set is unchanged, and a membership change moves only a small fraction of keys. That gives both scalability and stability.

Here are the most common real-world use cases:

1. Distributed Caches

Problem: In a cache cluster, you need to know which node stores a given key.
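Naïve approach: hash(key) mod N. But if N changes (a node is added or removed), almost all keys remap, and the cluster sees a flood of cache misses.
Consistent hashing: Only the keys in the affected range remap (on average about 1/N of them when one of N nodes changes). The rest stay put.
Examples: Memcached clients that use ketama-style hashing. Redis Cluster and Couchbase solve the same problem with a related scheme: keys hash into a fixed number of slots (16384 hash slots in Redis Cluster, typically 1024 vBuckets in Couchbase), and whole slots are reassigned between nodes.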
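
The difference between the two approaches can be checked with a small experiment. The sketch below is a minimal illustration, not any particular cache client's implementation: the node names, the 100 virtual nodes per server, and the choice of SHA-256 are all assumptions made for the example.

```python
import bisect
import hashlib


def h(value: str) -> int:
    """Stable 64-bit hash; Python's built-in hash() is salted per process."""
    return int.from_bytes(hashlib.sha256(value.encode()).digest()[:8], "big")


def mod_owner(key: str, nodes: list[str]) -> str:
    """Naive placement: hash(key) mod N."""
    return nodes[h(key) % len(nodes)]


class Ring:
    """Hash ring with virtual nodes (vnodes=100 is an illustrative choice)."""

    def __init__(self, nodes: list[str], vnodes: int = 100) -> None:
        points = sorted((h(f"{n}#{i}"), n) for n in nodes for i in range(vnodes))
        self.positions = [p for p, _ in points]
        self.owners = [n for _, n in points]

    def owner(self, key: str) -> str:
        """First ring point clockwise from the key's position, wrapping around."""
        idx = bisect.bisect_right(self.positions, h(key)) % len(self.positions)
        return self.owners[idx]


keys = [f"key-{i}" for i in range(10_000)]
before = ["cache-a", "cache-b", "cache-c", "cache-d"]  # hypothetical node names
after = before + ["cache-e"]

# Count keys whose owner changes when the fifth node joins.
moved_mod = sum(mod_owner(k, before) != mod_owner(k, after) for k in keys)
old_ring, new_ring = Ring(before), Ring(after)
moved_ring = [k for k in keys if old_ring.owner(k) != new_ring.owner(k)]

print(f"mod N:           {moved_mod / len(keys):.0%} of keys moved")
print(f"consistent hash: {len(moved_ring) / len(keys):.0%} of keys moved")
```

Going from four nodes to five, the modulo scheme is expected to move about four fifths of the keys, while the ring should move roughly one fifth, and every key that moves on the ring goes to the new node.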

2. Distributed Key-Value Stores & Databases

Databases like Dynamo, Cassandra, and Riak rely on consistent hashing for partitioning.
Keys (or partitions) are placed on a logical “ring”; each node owns a range of the ring.
When a node fails or is added, only adjacent ranges shift, keeping the system stable.
Replication is layered on top: a key maps to multiple nodes (e.g. primary + replicas), typically the range owner plus the next distinct nodes clockwise on the ring.
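
Replica placement on a ring can be sketched as a clockwise walk that collects distinct physical nodes. This is a simplified illustration in the spirit of a Dynamo-style preference list, not the code of any of the databases named above; the class name ReplicatedRing, the node names, and the 64 virtual nodes per server are assumptions.

```python
import bisect
import hashlib


def h(value: str) -> int:
    """Stable 64-bit hash derived from SHA-256."""
    return int.from_bytes(hashlib.sha256(value.encode()).digest()[:8], "big")


class ReplicatedRing:
    def __init__(self, nodes: list[str], vnodes: int = 64) -> None:
        points = sorted((h(f"{n}#{i}"), n) for n in nodes for i in range(vnodes))
        self.positions = [p for p, _ in points]
        self.owners = [n for _, n in points]

    def preference_list(self, key: str, replicas: int = 3) -> list[str]:
        """Walk clockwise from the key, keeping the first `replicas` distinct nodes."""
        start = bisect.bisect_right(self.positions, h(key))
        chosen: list[str] = []
        for step in range(len(self.owners)):
            node = self.owners[(start + step) % len(self.owners)]
            if node not in chosen:  # skip further virtual nodes of a chosen server
                chosen.append(node)
                if len(chosen) == replicas:
                    break
        return chosen


nodes = [f"node-{i}" for i in range(1, 6)]  # hypothetical five-node cluster
ring = ReplicatedRing(nodes)
replica_set = ring.preference_list("user:42")
print(replica_set)  # three distinct nodes; the first acts as the primary

# Removing a node that holds no replica of this key leaves its list unchanged.
bystander = next(n for n in nodes if n not in replica_set)
smaller = ReplicatedRing([n for n in nodes if n != bystander])
print(smaller.preference_list("user:42") == replica_set)
```

Skipping repeated physical nodes matters once virtual nodes are used: without that check, two replicas of the same key could land on the same server.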

3. Load Balancing (L4 vs. L7 Hashing)

Load balancers often use consistent hashing to ensure that a given client or request always routes to the same backend. This avoids “session breakage” and enables sticky routing without shared state. There are pros and cons to the layer you choose to hash on:

Layer 4 (Transport)

Inputs: tuple of {src_ip, dst_ip, src_port, dst_port}.
Advantages:
- Very fast (works only on packet headers).
- Stateless and protocol-agnostic (works with TCP, UDP, etc.).
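Drawbacks:
- When only the source IP is hashed (a common choice for client affinity), clients behind the same NAT share an IP → they all hash to the same backend, creating load imbalance. Hashing the full tuple spreads them out, but then a single client's connections can land on different backends.
- No awareness of user/session identity.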

Layer 7 (Application)

Inputs: application-level fields such as:
- Cookies (session ID).
- HTTP headers (e.g. Authorization).
- URL paths or hostnames (tenant-based routing).
Advantages:
- True session affinity — user stickiness works even behind NAT.
- Can route by tenant, path, or custom logic.
- More even distribution in real-world traffic.
Drawbacks:
- Added complexity compared to L4, which just uses transport headers. More CPU is consumed and there is a slight latency overhead due to:
  - Protocol parsing: the load balancer must understand the application protocol well enough to extract the chosen hash input (cookie, header, tenant ID, etc.).
  - Application payload parsing/serializing: the request may have to be decoded (and re-encoded when forwarded) before a backend can be chosen, for example after TLS termination.

Example:

With L4 hashing on the source IP, all users behind a corporate NAT land on the same backend, potentially overloading it.
With L7 hashing, each user's session ID is used as input to the hash, so users spread across backends while each individual session still routes deterministically.
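
The NAT scenario can be simulated in a few lines. This is a sketch under stated assumptions: the L4 case hashes only the source IP, the backend names and session IDs are made up, and 203.0.113.7 is an address from a range reserved for documentation.

```python
import bisect
import hashlib
from collections import Counter


def h(value: str) -> int:
    """Stable 64-bit hash derived from SHA-256."""
    return int.from_bytes(hashlib.sha256(value.encode()).digest()[:8], "big")


class Ring:
    def __init__(self, nodes: list[str], vnodes: int = 100) -> None:
        points = sorted((h(f"{n}#{i}"), n) for n in nodes for i in range(vnodes))
        self.positions = [p for p, _ in points]
        self.owners = [n for _, n in points]

    def owner(self, key: str) -> str:
        idx = bisect.bisect_right(self.positions, h(key)) % len(self.positions)
        return self.owners[idx]


ring = Ring([f"backend-{i}" for i in range(4)])

# 1,000 users behind one corporate NAT: same public IP, distinct session cookies.
nat_ip = "203.0.113.7"
sessions = [f"session-{i}" for i in range(1000)]

l4_load = Counter(ring.owner(nat_ip) for _ in sessions)  # hash input: source IP
l7_load = Counter(ring.owner(sid) for sid in sessions)   # hash input: session ID

print("L4 (source IP): ", dict(l4_load))
print("L7 (session ID):", dict(l7_load))
```

The ring itself is identical in both cases. Only the hash input changes, which is why the choice of layer is mostly a question of which identifier the load balancer can see cheaply.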

4. Sharding

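When a dataset is split across shards, consistent hashing ensures each key is deterministically assigned.
Compared to range-based sharding, it avoids hotspots caused by adjacent keys (for example monotonically increasing IDs or timestamps) piling into one shard. A single very popular key still lands on one shard.
Used in custom SQL partitioners and homegrown distributed systems. MongoDB's hashed sharding is a related approach: it hashes the shard key and splits the hash space into chunks that are balanced across shards.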
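
To make the hotspot argument concrete, the sketch below routes the newest 1,000 sequential order IDs with both schemes. The shard names, the range split points, and the "order:" key prefix are hypothetical values chosen for the illustration.

```python
import bisect
import hashlib
from collections import Counter

SHARDS = ["shard-0", "shard-1", "shard-2", "shard-3"]
RANGE_UPPER_BOUNDS = [250_000, 500_000, 750_000]  # hypothetical split points


def h(value: str) -> int:
    """Stable 64-bit hash derived from SHA-256."""
    return int.from_bytes(hashlib.sha256(value.encode()).digest()[:8], "big")


def range_shard(order_id: int) -> str:
    """Range-based: IDs below the first bound go to shard-0, and so on."""
    return SHARDS[bisect.bisect_right(RANGE_UPPER_BOUNDS, order_id)]


# Hash ring with 100 virtual nodes per shard (illustrative choice).
_points = sorted((h(f"{s}#{i}"), s) for s in SHARDS for i in range(100))
_positions = [p for p, _ in _points]
_owners = [s for _, s in _points]


def hash_shard(order_id: int) -> str:
    """Consistent hashing: first ring point clockwise from the key's hash."""
    idx = bisect.bisect_right(_positions, h(f"order:{order_id}")) % len(_positions)
    return _owners[idx]


# The newest orders have consecutive, monotonically increasing IDs.
recent = range(900_000, 901_000)
range_load = Counter(range_shard(i) for i in recent)
hash_load = Counter(hash_shard(i) for i in recent)

print("range-based:", dict(range_load))  # every recent order hits the last shard
print("hash-based: ", dict(hash_load))
```

The price of the even spread is that neighbouring keys no longer live together, so a query over a range of IDs has to consult several shards.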

5. Content Delivery Networks (CDNs)

CDNs hash on content ID or URL to route requests to cache nodes.
Ensures the same content consistently maps to the same edge server, maximizing cache hits.
Handles edge nodes joining/leaving without invalidating the whole cache.
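
The effect of an edge node leaving can be sketched as follows. The edge names and URL pattern are invented for the example, and a real CDN would combine this with health checks and other routing signals that are out of scope here.

```python
import bisect
import hashlib


def h(value: str) -> int:
    """Stable 64-bit hash derived from SHA-256."""
    return int.from_bytes(hashlib.sha256(value.encode()).digest()[:8], "big")


class Ring:
    def __init__(self, nodes: list[str], vnodes: int = 100) -> None:
        points = sorted((h(f"{n}#{i}"), n) for n in nodes for i in range(vnodes))
        self.positions = [p for p, _ in points]
        self.owners = [n for _, n in points]

    def owner(self, key: str) -> str:
        idx = bisect.bisect_right(self.positions, h(key)) % len(self.positions)
        return self.owners[idx]


edges = [f"edge-{i}" for i in range(6)]              # hypothetical edge servers
urls = [f"/assets/img-{i}.jpg" for i in range(5_000)]

full = Ring(edges)
placement = {u: full.owner(u) for u in urls}

# edge-3 goes offline: rebuild the ring without it.
degraded = Ring([e for e in edges if e != "edge-3"])
remapped = [u for u in urls if degraded.owner(u) != placement[u]]

lost = sum(1 for u in urls if placement[u] == "edge-3")
print(f"{len(remapped)} of {len(urls)} URLs changed edge; edge-3 held {lost}")
```

Only the URLs that were cached on the departed node need to be refetched from origin; content on the surviving edges keeps its placement and stays warm.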

6. Peer-to-Peer Networks

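P2P overlays like Chord, Pastry, and Kademlia build their distributed hash tables (DHTs) on consistent hashing or closely related schemes.
Files or blocks are hashed into the same identifier space as the peers. In Chord that space is a ring, and each peer owns the range of keys between its predecessor's ID and its own; Kademlia instead assigns a key to the peers whose IDs are closest by XOR distance.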
Enables decentralized lookup and routing without a central directory.
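
Decentralized lookup can be sketched with peers that know nothing but their immediate successor on the ring. This is a deliberately simplified, Chord-like illustration: the peer names and the file name are made up, SHA-1 is just one possible identifier hash, and the link() helper stands in for a real join protocol. Chord itself adds finger tables so that a lookup takes a logarithmic rather than linear number of hops.

```python
import hashlib


def ident(name: str) -> int:
    """160-bit identifier derived from SHA-1 (an assumption for this sketch)."""
    return int.from_bytes(hashlib.sha1(name.encode()).digest(), "big")


def in_half_open(x: int, a: int, b: int) -> bool:
    """True if x lies in the ring interval (a, b], which may wrap past zero."""
    if a < b:
        return a < x <= b
    return x > a or x <= b  # wrapped interval; a == b covers the whole ring


class Peer:
    def __init__(self, name: str) -> None:
        self.name = name
        self.id = ident(name)
        self.successor = self  # a lone peer is its own successor

    def lookup(self, key_id: int, hops: int = 0):
        """Forward the query clockwise until the peer owning key_id is found."""
        if in_half_open(key_id, self.id, self.successor.id):
            return self.successor, hops
        return self.successor.lookup(key_id, hops + 1)


def link(peers: list[Peer]) -> list[Peer]:
    """Wire each peer to the next one in ID order and return them sorted."""
    ordered = sorted(peers, key=lambda p: p.id)
    for i, peer in enumerate(ordered):
        peer.successor = ordered[(i + 1) % len(ordered)]
    return ordered


peers = link([Peer(f"peer-{i}") for i in range(8)])
key_id = ident("ubuntu-24.04.iso")

# Whichever peer the query starts from, it resolves to the same owner.
owners = {peer.lookup(key_id)[0].name for peer in peers}
print(owners)
```

No peer holds a global directory here: each one only compares the key against its own ID and its successor's, which is what lets the overlay work without a central index.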

7. Job Scheduling / Task Queues

Jobs or messages are assigned to workers by hashing on job ID, customer ID, or partition key.
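Keeps related jobs on the same worker while membership is stable; when the pool scales, only a fraction of keys move to a different worker.
Related ideas appear in Kafka and stream processing frameworks: in Kafka, keyed messages hash to a fixed set of partitions, and the partitions (rather than individual keys) are reassigned among consumers as the group changes.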
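
A worker pool that routes by partition key and scales elastically might look like the sketch below. WorkerPool, its method names, and the worker and customer identifiers are hypothetical, and the example ignores in-flight jobs, which a real queue would have to drain or hand over when ownership changes.

```python
import bisect
import hashlib


def h(value: str) -> int:
    """Stable 64-bit hash derived from SHA-256."""
    return int.from_bytes(hashlib.sha256(value.encode()).digest()[:8], "big")


class WorkerPool:
    def __init__(self, workers: list[str], vnodes: int = 100) -> None:
        self.vnodes = vnodes
        self.points: list[tuple] = []  # sorted (ring position, worker) pairs
        for worker in workers:
            self.add(worker)

    def add(self, worker: str) -> None:
        """Insert the worker's virtual nodes, keeping the list sorted."""
        for i in range(self.vnodes):
            bisect.insort(self.points, (h(f"{worker}#{i}"), worker))

    def remove(self, worker: str) -> None:
        self.points = [p for p in self.points if p[1] != worker]

    def route(self, partition_key: str) -> str:
        """Worker owning the first ring point at or after the key's hash."""
        # A 1-tuple sorts before any (position, worker) pair with that position.
        idx = bisect.bisect_left(self.points, (h(partition_key),))
        return self.points[idx % len(self.points)][1]


pool = WorkerPool(["worker-1", "worker-2", "worker-3"])
customers = [f"customer-{i}" for i in range(50)]

before = {c: pool.route(c) for c in customers}
pool.add("worker-4")  # scale out
after = {c: pool.route(c) for c in customers}

moved = [c for c in customers if before[c] != after[c]]
print(f"{len(moved)} of {len(customers)} customers moved, all to worker-4")
```

Because routing depends only on the partition key and the current membership, any dispatcher instance with the same worker list makes the same decision without coordinating with the others.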

✨ Why It Matters

Consistent hashing looks simple but powers the backbone of modern distributed systems: caches, databases, CDNs, P2P overlays, load balancers, and job queues. Anywhere you want deterministic mapping + graceful handling of node churn, consistent hashing is the tool of choice.
