
Consistent Hashing: The Algorithm That Makes Distributed Caches and Databases Actually Scale

Theory, variants, and where it breaks: from the 1997 Karger et al. paper to virtual nodes in DynamoDB and Cassandra and hash slots in Redis Cluster.

1. Why Modular Hashing Fails at Scale

Before consistent hashing existed, the natural approach to distributing data across N nodes was modular arithmetic. To determine which node owns a key, you compute hash(key) % N. It is fast, simple, and completely adequate — right up until you need to add or remove a node.

Adding a single node changes N from, say, 10 to 11. Consequently, hash(key) % 10 and hash(key) % 11 produce different results for nearly every key. In practice, approximately (N−1)/N of all keys — around 90% in this example — must be remapped to a different node. For a cache with hundreds of millions of entries, that means a thundering-herd event: the moment a new cache node joins, the vast majority of the cache becomes invalid simultaneously, and the origin servers absorb the full load until the cache warms again.
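The ~90% figure is easy to check with a small simulation. This is a sketch under stated assumptions: hash(key) is stood in for by FNV-1a followed by MurmurHash3's 64-bit finalizer, the keys are synthetic strings (key-0, key-1, ...), and the class and method names are hypothetical. For the 10 → 11 case, a key keeps its node only when hash % 10 == hash % 11. That happens for about 1 hash in 11, so the measured fraction should land near 10/11 (about 91%).

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper: measures how many keys change owner under
// node = hash(key) % N when the cluster size changes from oldN to newN.
class ModuloRemap {

    // FNV-1a followed by MurmurHash3's 64-bit finalizer (an assumed stand-in for hash(key))
    static long hash(String s) {
        long h = 0xcbf29ce484222325L;
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            h ^= (b & 0xFF);
            h *= 0x100000001b3L;
        }
        h ^= h >>> 33;
        h *= 0xff51afd7ed558ccdL;
        h ^= h >>> 33;
        h *= 0xc4ceb9fe1a85ec53L;
        h ^= h >>> 33;
        return h;
    }

    // floorMod keeps the node index non-negative even when the hash is negative
    static int owner(String key, int n) {
        return (int) Math.floorMod(hash(key), (long) n);
    }

    // Fraction of keyCount synthetic keys whose node index differs between oldN and newN
    static double remappedFraction(int oldN, int newN, int keyCount) {
        int moved = 0;
        for (int i = 0; i < keyCount; i++) {
            String key = "key-" + i;
            if (owner(key, oldN) != owner(key, newN)) moved++;
        }
        return (double) moved / keyCount;
    }

    public static void main(String[] args) {
        System.out.printf("10 -> 11 nodes: %.1f%% of keys remapped%n",
                100 * remappedFraction(10, 11, 100_000));
        System.out.printf("10 -> 9 nodes:  %.1f%% of keys remapped%n",
                100 * remappedFraction(10, 9, 100_000));
    }
}
```

Shrinking from 10 to 9 nodes behaves the same way: a key stays only when hash % 10 == hash % 9, roughly 1 time in 10.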

Modular hashing vs. consistent hashing: keys remapped when N changes
Standard hashing: node = hash(key) % N. When N → N+1, ~(N−1)/N of all keys are remapped (≈ 90% for N = 10).
Consistent hashing: node = successor(hash(key)) on the ring. When N → N+1, only ~K/N keys (about 1/N of all keys) are remapped.

The same cascade happens on node removal: one node fails and roughly 90% of keys need re-routing. For any system where a cache miss is expensive (CDN edge nodes, database read replicas, in-memory session stores), this is catastrophic. Karger et al.'s 1997 paper from MIT named this problem precisely and introduced a solution that has remained essentially unchanged for nearly three decades.

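The core guarantee: Consistent hashing ensures that when a node is added or removed, only K/N keys need to be redistributed on average, where K is the total number of keys and N is the number of nodes. This is the minimum possible: a newly added node must take over its fair share of roughly K/N keys from somewhere, and a removed node's roughly K/N keys must go somewhere.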

2. The Hash Ring: Consistent Hashing Explained

The key insight of consistent hashing is geometric: instead of a linear range, map both nodes and keys onto a circular hash space — conventionally the range [0, 2³²) for a 32-bit hash function. This ring has no beginning and no end; position 2³²−1 is immediately followed by position 0.

Each node is assigned a position on the ring by hashing its identifier (typically its IP address or hostname). Each key is also hashed to a position on the ring. To find the node responsible for a key, you walk clockwise around the ring from the key’s position until you reach the first node — this is called the key’s successor node. That node is responsible for storing or serving the key.
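The clockwise-successor rule can be checked by hand on a toy ring. This sketch assumes a ring of only 100 positions (0 to 99) instead of 2³², and three hand-placed nodes (A at 10, B at 40, C at 75) instead of hashed ones; all names are illustrative.

```java
import java.util.Map;
import java.util.TreeMap;

// Toy ring over positions 0..99 with hand-placed nodes, so the successor rule is easy to verify.
class ToyRing {
    static final int RING_SIZE = 100;
    static final TreeMap<Integer, String> RING = new TreeMap<>(Map.of(
            10, "A",
            40, "B",
            75, "C"));

    // Walk clockwise to the first node at or after the key's position;
    // if there is none before the end of the ring, wrap around to the lowest position.
    static String successor(int keyPosition) {
        Map.Entry<Integer, String> e = RING.ceilingEntry(Math.floorMod(keyPosition, RING_SIZE));
        return (e != null ? e : RING.firstEntry()).getValue();
    }

    public static void main(String[] args) {
        System.out.println(successor(5));   // A (next node clockwise is at 10)
        System.out.println(successor(40));  // B (a node exactly at the key's position owns it)
        System.out.println(successor(60));  // C
        System.out.println(successor(90));  // A (wraps past 99 back to 10)
    }
}
```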

Figure: hash ring before and after adding node D. Only keys in the arc between C and D are remapped; all other keys remain on the same node.

When a new node D is added between nodes C and A on the ring, the only keys that need to move are those that were previously served by A but now fall between C and D. All other nodes, and their assigned keys, are completely unaffected. This is precisely why only about K/N keys (roughly a 1/N fraction on average) are remapped, rather than (N−1)/N of them.

Similarly, when a node is removed, only the keys that were assigned to that node need to be redistributed — to that node’s clockwise successor. Every other assignment stays intact. Consequently, node failures cause only localised key redistribution, which is essential for building fault-tolerant distributed caches.
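Both locality properties can be verified mechanically. The sketch below assumes one ring position per node (no virtual nodes yet), four hypothetical nodes A to D, synthetic keys, and FNV-1a plus MurmurHash3's finalizer as the hash. It throws if any key moves anywhere other than onto the added node, or away from anything other than the removed node.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical check of add/remove locality on a ring with one position per node.
class RingLocality {

    // FNV-1a followed by MurmurHash3's 64-bit finalizer (an assumed hash choice)
    static long hash(String s) {
        long h = 0xcbf29ce484222325L;
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            h ^= (b & 0xFF);
            h *= 0x100000001b3L;
        }
        h ^= h >>> 33;
        h *= 0xff51afd7ed558ccdL;
        h ^= h >>> 33;
        h *= 0xc4ceb9fe1a85ec53L;
        h ^= h >>> 33;
        return h;
    }

    static TreeMap<Long, String> ring(List<String> nodes) {
        TreeMap<Long, String> r = new TreeMap<>();
        for (String n : nodes) r.put(hash(n), n);
        return r;
    }

    // Clockwise successor, wrapping to the lowest position past the end of the ring
    static String owner(TreeMap<Long, String> ring, String key) {
        Map.Entry<Long, String> e = ring.ceilingEntry(hash(key));
        return (e != null ? e : ring.firstEntry()).getValue();
    }

    // Counts keys whose owner differs between two rings. Throws if a key moved to
    // anything other than expectedTo, or away from anything other than expectedFrom
    // (a null expectation is not checked).
    static int countMoves(TreeMap<Long, String> before, TreeMap<Long, String> after,
                          int keyCount, String expectedFrom, String expectedTo) {
        int moved = 0;
        for (int i = 0; i < keyCount; i++) {
            String key = "key-" + i;
            String from = owner(before, key);
            String to = owner(after, key);
            if (from.equals(to)) continue;
            moved++;
            if (expectedTo != null && !to.equals(expectedTo))
                throw new IllegalStateException(key + " moved to " + to);
            if (expectedFrom != null && !from.equals(expectedFrom))
                throw new IllegalStateException(key + " moved away from " + from);
        }
        return moved;
    }

    public static void main(String[] args) {
        TreeMap<Long, String> abc = ring(List.of("A", "B", "C"));
        TreeMap<Long, String> abcd = ring(List.of("A", "B", "C", "D"));
        int onAdd = countMoves(abc, abcd, 100_000, null, "D");    // every moved key lands on D
        int onRemove = countMoves(abcd, abc, 100_000, "D", null); // only D's keys move
        System.out.println(onAdd + " keys moved on add, " + onRemove + " on remove");
    }
}
```

Adding D and then removing it touches exactly the same set of keys (those in D's arc), so the two counts always match.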

3. Virtual Nodes: Solving the Balance Problem

A naive hash ring with three or four physical nodes has a significant weakness: the hash positions of real nodes are unlikely to be uniformly distributed around the ring. One node may end up owning 60% of the ring arc while another owns only 5%. The result is severe load imbalance, which undermines the whole point of spreading data across several machines.

The solution, introduced alongside the original ring idea and now universally adopted in production systems, is virtual nodes (vnodes). Rather than placing each physical node at a single point on the ring, you place it at V points simultaneously — each point representing a virtual node. When a virtual node receives a request, the physical machine behind it handles it. In practice, V is set between 100 and 1,000 per physical node.

Figure: Load distribution, physical vs. virtual nodes (150 virtual nodes per server). Simulated distribution of 1,000,000 keys across 5 nodes: with only physical nodes, one node holds 28% of keys; with 150 vnodes per node, per-node shares narrow to 18–22%. Based on coupon-collector analysis and consistent hashing load bounds (Karger et al., Mirrokni et al.).
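The trend in the figure can be reproduced in spirit with a short simulation. This is not the code behind the figure: it assumes 5 hypothetical nodes, 200,000 synthetic keys, and FNV-1a plus MurmurHash3's finalizer as the hash, so exact percentages will differ. It reports the share of keys held by the most-loaded node for a given number of ring positions per node.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical simulation: max per-node key share with 1 vs. many ring positions per node.
class VnodeBalance {

    // FNV-1a followed by MurmurHash3's 64-bit finalizer (an assumed hash choice)
    static long hash(String s) {
        long h = 0xcbf29ce484222325L;
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            h ^= (b & 0xFF);
            h *= 0x100000001b3L;
        }
        h ^= h >>> 33;
        h *= 0xff51afd7ed558ccdL;
        h ^= h >>> 33;
        h *= 0xc4ceb9fe1a85ec53L;
        h ^= h >>> 33;
        return h;
    }

    // Fraction of keyCount keys owned by the busiest of `nodes` physical nodes,
    // each placed at `vnodes` positions on the ring.
    static double maxShare(int nodes, int vnodes, int keyCount) {
        TreeMap<Long, Integer> ring = new TreeMap<>();
        for (int n = 0; n < nodes; n++)
            for (int v = 0; v < vnodes; v++)
                ring.put(hash("node-" + n + "-vn-" + v), n);

        int[] counts = new int[nodes];
        for (int i = 0; i < keyCount; i++) {
            Map.Entry<Long, Integer> e = ring.ceilingEntry(hash("key-" + i));
            counts[(e != null ? e : ring.firstEntry()).getValue()]++;
        }
        int max = 0;
        for (int c : counts) max = Math.max(max, c);
        return (double) max / keyCount;
    }

    public static void main(String[] args) {
        System.out.printf("1 position per node:    busiest node holds %.1f%%%n",
                100 * maxShare(5, 1, 200_000));
        System.out.printf("150 positions per node: busiest node holds %.1f%%%n",
                100 * maxShare(5, 150, 200_000));
    }
}
```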

Virtual nodes also make heterogeneous clusters straightforward. A machine with twice the memory and CPU can be assigned twice as many virtual nodes, proportionally increasing its share of the keyspace. When that machine is removed, its keys scatter across many different successors rather than dumping onto one neighbour — preventing the successor from being overwhelmed.
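Both ideas fit in a few lines. The sketch below assumes a hypothetical integer weight per node that multiplies a base vnode count (100 here), synthetic keys, and FNV-1a plus MurmurHash3's finalizer as the hash. It shows that a weight-2 node ends up with more keys than each weight-1 node, and that when it leaves, its keys are inherited by several different nodes rather than a single neighbour.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical weighted ring: a node with weight w gets w * baseVnodes ring positions.
class WeightedRing {
    private final TreeMap<Long, String> ring = new TreeMap<>();
    private final int baseVnodes;

    WeightedRing(int baseVnodes) { this.baseVnodes = baseVnodes; }

    // FNV-1a followed by MurmurHash3's 64-bit finalizer (an assumed hash choice)
    static long hash(String s) {
        long h = 0xcbf29ce484222325L;
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            h ^= (b & 0xFF);
            h *= 0x100000001b3L;
        }
        h ^= h >>> 33;
        h *= 0xff51afd7ed558ccdL;
        h ^= h >>> 33;
        h *= 0xc4ceb9fe1a85ec53L;
        h ^= h >>> 33;
        return h;
    }

    void addNode(String node, int weight) {
        for (int v = 0; v < weight * baseVnodes; v++) ring.put(hash(node + "-vn-" + v), node);
    }

    // Drop every ring position that belongs to this node
    void removeNode(String node) { ring.values().removeIf(node::equals); }

    String owner(String key) {
        Map.Entry<Long, String> e = ring.ceilingEntry(hash(key));
        return (e != null ? e : ring.firstEntry()).getValue();
    }

    Map<String, Integer> keyCounts(int keyCount) {
        Map<String, Integer> counts = new HashMap<>();
        for (int i = 0; i < keyCount; i++) counts.merge(owner("key-" + i), 1, Integer::sum);
        return counts;
    }

    // Removes `node` and returns the set of nodes that inherited its keys
    Set<String> heirsOnRemoval(String node, int keyCount) {
        List<String> orphaned = new ArrayList<>();
        for (int i = 0; i < keyCount; i++)
            if (owner("key-" + i).equals(node)) orphaned.add("key-" + i);
        removeNode(node);
        Set<String> heirs = new HashSet<>();
        for (String key : orphaned) heirs.add(owner(key));
        return heirs;
    }

    public static void main(String[] args) {
        WeightedRing r = new WeightedRing(100);
        r.addNode("small-1", 1);
        r.addNode("small-2", 1);
        r.addNode("small-3", 1);
        r.addNode("big", 2);  // twice the capacity, so roughly twice the keys
        System.out.println(r.keyCounts(100_000));
        System.out.println("big's keys went to: " + r.heirsOnRemoval("big", 100_000));
    }
}
```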

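Practical guidance on V: Cassandra used a default of 256 virtual nodes (num_tokens) per physical node before version 4.0; version 4.0 lowered the default to 16 and paired it with a token allocation algorithm that keeps the smaller count balanced. DynamoDB's internal partitioning uses a similar concept at the storage node level. Redis Cluster takes a different approach: instead of per-node virtual positions, it maps every key to one of 16,384 fixed hash slots (CRC16(key) mod 16384) and assigns slots to nodes explicitly, which provides comparable balance with a compact, fixed-size slot map instead of thousands of individual vnode positions.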
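The Redis Cluster mapping mentioned above is small enough to sketch. This follows the key-to-slot rule described in the Redis Cluster specification as best understood here (CRC-16/XMODEM of the key, masked to 14 bits, and "hash tags": if the key contains a non-empty {...} section, only that substring is hashed). The class name is hypothetical, and this is an illustration rather than Redis's own code.

```java
import java.nio.charset.StandardCharsets;

// Sketch of Redis Cluster's key -> slot mapping: HASH_SLOT = CRC16(key) mod 16384.
class RedisSlot {
    static final int SLOTS = 16384;

    // CRC-16/XMODEM: polynomial 0x1021, initial value 0, no reflection, no final XOR
    static int crc16(byte[] data) {
        int crc = 0;
        for (byte b : data) {
            crc ^= (b & 0xFF) << 8;
            for (int i = 0; i < 8; i++) {
                crc = (crc & 0x8000) != 0 ? (crc << 1) ^ 0x1021 : crc << 1;
                crc &= 0xFFFF;  // keep the register at 16 bits
            }
        }
        return crc;
    }

    static int slot(String key) {
        int open = key.indexOf('{');
        if (open != -1) {
            int close = key.indexOf('}', open + 1);
            // Only a non-empty tag counts; "{}" falls back to hashing the whole key
            if (close != -1 && close != open + 1) key = key.substring(open + 1, close);
        }
        // 16384 is a power of two, so masking equals "mod 16384"
        return crc16(key.getBytes(StandardCharsets.UTF_8)) & (SLOTS - 1);
    }

    public static void main(String[] args) {
        System.out.println(slot("123456789"));  // 12739 (CRC16 check value 0x31C3)
        // Keys sharing a hash tag always land in the same slot
        System.out.println(slot("{user1000}.following") == slot("{user1000}.followers"));  // true
    }
}
```

Hash tags are what let multi-key operations work in Redis Cluster: keys that must live together share a tag, and therefore a slot.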

4. A Working Java Implementation

The core of a consistent hash ring fits in a single class. The following implementation uses a TreeMap to represent the ring: because its keys are kept sorted, finding the successor node is a single ceiling lookup, giving O(log M) lookup time, where M is the total number of virtual node positions on the ring (V per physical node times the number of physical nodes).

import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

public class ConsistentHashRing<T> {

    private final int virtualNodes;                              // V: ring positions per physical node
    private final NavigableMap<Long, T> ring = new TreeMap<>();  // ring position -> physical node

    public ConsistentHashRing(int virtualNodes) {
        if (virtualNodes <= 0) throw new IllegalArgumentException("virtualNodes must be positive");
        this.virtualNodes = virtualNodes;
    }

    // Add a node: hash V virtual positions into the ring
    public void addNode(T node) {
        for (int i = 0; i < virtualNodes; i++) {
            ring.put(hash(node + "-vn-" + i), node);
        }
    }

    // Remove a node: delete all its virtual positions
    public void removeNode(T node) {
        for (int i = 0; i < virtualNodes; i++) {
            // remove(key, value) deletes the entry only if this node still owns it,
            // so a rare hash collision cannot evict another node's position
            ring.remove(hash(node + "-vn-" + i), node);
        }
    }

    // Route a key: find its successor node clockwise on the ring
    public T getNode(String key) {
        if (ring.isEmpty()) throw new IllegalStateException("Ring is empty");

        // Smallest position >= hash(key); null means the key lies past the last node
        Map.Entry<Long, T> successor = ring.ceilingEntry(hash(key));

        // If no node lies clockwise of the key, wrap around to the first node
        return (successor != null ? successor : ring.firstEntry()).getValue();
    }

    // MD5-based hash -> 64-bit long (first 8 bytes of the digest, big-endian)
    private static long hash(String input) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(input.getBytes(StandardCharsets.UTF_8));
            long h = 0;
            for (int i = 0; i < 8; i++) {
                h = (h << 8) | (digest[i] & 0xFF);
            }
            return h;
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to support MD5, so this should not happen
            throw new IllegalStateException(e);
        }
    }
}

// Usage
ConsistentHashRing<String> ring = new ConsistentHashRing<>(150);
ring.addNode("node-a:6379");
ring.addNode("node-b:6379");
ring.addNode("node-c:6379");

String target = ring.getNode("user:session:8f3d");
// -> one of the three nodes, and the same one on every call (deterministic)

ring.addNode("node-d:6379");
String after = ring.getNode("user:session:8f3d");
// -> the same node as before, unless the key's hash now falls in an arc
//    claimed by one of node-d's virtual positions (then "node-d:6379")

A few implementation details are worth noting. First, MD5 is used here for its wide availability and well-understood distribution properties, not for security; in high-throughput systems, MurmurHash3 or xxHash are faster alternatives with similar uniformity. Second, the successor search is a ceiling lookup on the TreeMap (a red-black tree), which runs in O(log M), where M is the number of entries in the ring (virtual nodes × physical nodes). For a typical cluster of 10 nodes with 150 vnodes each, M = 1,500 and log₂ 1,500 ≈ 11, so each lookup costs about 11 comparisons, which is negligible.

5. Rendezvous Hashing: An Elegant Alternative

Rendezvous hashing — also called Highest Random Weight (HRW) hashing — was introduced by Thaler and Ravishankar at roughly the same time as Karger’s ring and solves the same problem with a different geometric intuition.

Rather than a ring, the algorithm is conceptually simpler: for a given key, hash the key together with each candidate node's identifier, producing a score for each (key, node) pair. The node with the highest score wins. Crucially, when a node is added or removed, only the keys whose highest-scoring node has changed are remapped: a removed node's keys move to their second-highest scorer, and a new node takes only the keys for which it now scores highest. On average that is K/N keys, matching the optimum of ring-based consistent hashing.

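import java.nio.charset.StandardCharsets;
import java.util.Collection;
import java.util.Comparator;
import java.util.List;

public class RendezvousHash<N> {

    private final List<N> nodes;

    public RendezvousHash(Collection<N> nodes) {
        // Immutable snapshot; build a new instance when membership changes
        this.nodes = List.copyOf(nodes);
    }

    // Pick the node that produces the highest hash score for this key
    public N getNode(String key) {
        return nodes.stream()
            .max(Comparator.comparingLong(n -> score(key, n.toString())))
            .orElseThrow(() -> new IllegalStateException("No nodes"));
    }

    // Hash the (key, node) pair as one unit. XOR-ing two independent hashes
    // (fnv1a(key) ^ fnv1a(node)) is not enough: the ranking is then decided by the
    // few bit positions where the node hashes differ, which skews load badly.
    private static long score(String key, String node) {
        return mix64(fnv1a(node + '\u0000' + key));
    }

    // FNV-1a: fast non-cryptographic hash (bytes masked to avoid sign extension)
    private static long fnv1a(String s) {
        long h = 0xcbf29ce484222325L;
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            h ^= (b & 0xFF);
            h *= 0x100000001b3L;
        }
        return h;
    }

    // MurmurHash3's 64-bit finalizer: spreads every input bit across the output
    private static long mix64(long h) {
        h ^= h >>> 33;
        h *= 0xff51afd7ed558ccdL;
        h ^= h >>> 33;
        h *= 0xc4ceb9fe1a85ec53L;
        h ^= h >>> 33;
        return h;
    }
}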

Rendezvous hashing has two genuine advantages over ring-based consistent hashing. It requires no data structure maintenance: there is no ring to update when nodes change, and the algorithm is stateless. It also achieves near-uniform balance without virtual nodes, because every (key, node) pair is scored independently, so each node wins about 1/N of the keys in expectation. However, it pays for those benefits with O(N) lookup time per key, since you must compute a score for every node on every request. For small clusters (fewer than 50 nodes) this is entirely practical. For large clusters with thousands of nodes, the ring's logarithmic lookup wins decisively.
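Both claims (near-uniform balance without virtual nodes, and minimal movement on removal) can be checked with a self-contained sketch. It assumes five hypothetical node names, synthetic keys, and a pair score of FNV-1a over node + NUL + key followed by MurmurHash3's finalizer; these are illustrative choices, not a prescribed scheme.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical checks of rendezvous (HRW) hashing's balance and removal behaviour.
class RendezvousProperties {

    // FNV-1a followed by MurmurHash3's 64-bit finalizer (an assumed hash choice)
    static long hash(String s) {
        long h = 0xcbf29ce484222325L;
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            h ^= (b & 0xFF);
            h *= 0x100000001b3L;
        }
        h ^= h >>> 33;
        h *= 0xff51afd7ed558ccdL;
        h ^= h >>> 33;
        h *= 0xc4ceb9fe1a85ec53L;
        h ^= h >>> 33;
        return h;
    }

    // Highest-random-weight winner: score every (node, key) pair and keep the maximum
    static String winner(List<String> nodes, String key) {
        String best = null;
        long bestScore = 0;
        for (String n : nodes) {
            long score = hash(n + '\u0000' + key);
            if (best == null || score > bestScore) {
                best = n;
                bestScore = score;
            }
        }
        return best;
    }

    static Map<String, Integer> keyCounts(List<String> nodes, int keyCount) {
        Map<String, Integer> counts = new HashMap<>();
        for (int i = 0; i < keyCount; i++) counts.merge(winner(nodes, "key-" + i), 1, Integer::sum);
        return counts;
    }

    // Number of keys that change owner when `removed` leaves; throws if a key
    // that `removed` did not own is reassigned
    static int movedOnRemoval(List<String> nodes, String removed, int keyCount) {
        List<String> remaining = new ArrayList<>(nodes);
        remaining.remove(removed);
        int moved = 0;
        for (int i = 0; i < keyCount; i++) {
            String key = "key-" + i;
            String before = winner(nodes, key);
            if (before.equals(winner(remaining, key))) continue;
            if (!before.equals(removed)) throw new IllegalStateException(key + " moved unexpectedly");
            moved++;
        }
        return moved;
    }

    public static void main(String[] args) {
        List<String> nodes = List.of("node-0", "node-1", "node-2", "node-3", "node-4");
        System.out.println(keyCounts(nodes, 50_000));
        System.out.println(movedOnRemoval(nodes, "node-2", 50_000) + " keys moved when node-2 left");
    }
}
```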

Property | Ring (consistent hashing) | Rendezvous (HRW)
Lookup complexity | O(log M) with a TreeMap (M = ring entries) | O(N): scores every node
Load balance | Needs virtual nodes (V ≈ 150) | Near-uniform without virtual nodes
State required | Ring data structure (sorted map) | Stateless: just the node list
Node add/remove cost | O(V log M) ring update | O(1): update the node list only
Best for | Large clusters, high QPS | Small clusters, simple deployments
Used by | Cassandra, DynamoDB, Nginx (ketama); Redis Cluster uses a fixed hash-slot variant | Apache Ignite (RendezvousAffinityFunction), some CDN edge routing

6. How DynamoDB, Cassandra, and Redis Use It

Understanding how production systems apply consistent hashing reveals the engineering decisions that sit on top of the theoretical foundation. Each system makes different trade-offs around virtual node count, replication, and the role the ring plays in the overall architecture.
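One ring-level decision that recurs across these systems is replica placement. A common scheme, described in Amazon's Dynamo paper and similar in spirit to Cassandra's SimpleStrategy, walks clockwise from the key's position and collects the first R distinct physical nodes, skipping extra virtual positions of nodes already chosen. The sketch below assumes a toy ring of positions 0 to 99 with hand-placed vnodes; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical sketch of a Dynamo-style preference list on a vnode ring.
class PreferenceList {

    // Walk one full lap clockwise from keyPosition and return the first `replicas`
    // distinct physical nodes, in the order they were encountered.
    static List<String> replicasFor(TreeMap<Long, String> ring, long keyPosition, int replicas) {
        Set<String> chosen = new LinkedHashSet<>();  // preserves clockwise order
        // One lap: positions >= key, then wrap around to positions < key
        List<String> lap = new ArrayList<>(ring.tailMap(keyPosition, true).values());
        lap.addAll(ring.headMap(keyPosition, false).values());
        for (String node : lap) {
            if (chosen.size() == replicas) break;
            chosen.add(node);  // a node that is already in the set is skipped
        }
        return new ArrayList<>(chosen);
    }

    public static void main(String[] args) {
        // Toy ring on positions 0..99: A and B each own two virtual positions
        TreeMap<Long, String> ring = new TreeMap<>(Map.of(
                10L, "A", 20L, "B", 30L, "C", 60L, "A", 70L, "B", 80L, "D"));
        System.out.println(replicasFor(ring, 15, 3));  // [B, C, A]
        System.out.println(replicasFor(ring, 65, 3));  // [B, D, A] (wraps past 99)
        System.out.println(replicasFor(ring, 55, 4));  // [A, B, D, C] (skips the repeat A and B)
    }
}
```

If the cluster has fewer distinct nodes than the requested replica count, this sketch simply returns all of them.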

7. Where Consistent Hashing Breaks Down

No algorithm is a universal solution. Consistent hashing has well-understood failure modes, and understanding them is just as important as understanding the algorithm itself.

Figure: Cache hit rate during node addition, consistent vs. modular hashing. Simulated CDN edge cluster adding 1 node to a 10-node ring (cache capacity 1M objects, 10M total requests). Consistent hashing recovers in minutes; modular hashing triggers a full cold start across the whole fleet. Based on standard consistent hashing analysis and observed CDN behaviour.

1. Hot spots and skewed key distributions

Consistent hashing distributes keys uniformly only when the hash function itself is uniform across the key space, and even a perfectly uniform key distribution says nothing about how often each key is accessed. Real-world workloads frequently concentrate traffic: celebrity users, trending content, and time-series data whose partition key is a coarse time bucket (so every current write shares one key) all direct most requests to a small number of nodes, regardless of the hashing scheme. This is the fundamental tension between uniform key distribution and non-uniform access frequency. DynamoDB's adaptive capacity is a direct response to this limitation, and Cassandra's speculative retry mitigates one of its symptoms by sending a duplicate read to another replica when the first one is slow.
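A short simulation makes the tension concrete. It assumes 10 hypothetical nodes with 150 vnodes each, 1,000 synthetic keys, and Zipf-like popularity where the key of rank r receives weight 1/r. Keys are spread evenly, yet the node holding the hottest key alone receives at least 1/H₁₀₀₀ ≈ 13.4% of all requests (H₁₀₀₀ being the 1,000th harmonic number), above its 10% fair share, however many vnodes are used.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical simulation: uniform key placement, Zipf-like request frequency.
class HotKeySkew {

    // FNV-1a followed by MurmurHash3's 64-bit finalizer (an assumed hash choice)
    static long hash(String s) {
        long h = 0xcbf29ce484222325L;
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            h ^= (b & 0xFF);
            h *= 0x100000001b3L;
        }
        h ^= h >>> 33;
        h *= 0xff51afd7ed558ccdL;
        h ^= h >>> 33;
        h *= 0xc4ceb9fe1a85ec53L;
        h ^= h >>> 33;
        return h;
    }

    // Share of all requests that reach the busiest node
    static double maxRequestShare(int nodes, int vnodes, int keys) {
        TreeMap<Long, Integer> ring = new TreeMap<>();
        for (int n = 0; n < nodes; n++)
            for (int v = 0; v < vnodes; v++)
                ring.put(hash("node-" + n + "-vn-" + v), n);

        double[] load = new double[nodes];
        double total = 0;
        for (int rank = 1; rank <= keys; rank++) {
            double weight = 1.0 / rank;  // rank 1 is the hottest key
            Map.Entry<Long, Integer> e = ring.ceilingEntry(hash("key-" + rank));
            load[(e != null ? e : ring.firstEntry()).getValue()] += weight;
            total += weight;
        }
        double max = 0;
        for (double l : load) max = Math.max(max, l);
        return max / total;
    }

    public static void main(String[] args) {
        System.out.printf("busiest node: %.1f%% of requests (fair share 10%%)%n",
                100 * maxRequestShare(10, 150, 1_000));
    }
}
```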

2. Virtual node overhead in large clusters

With 150 virtual nodes per physical node and a 1,000-node cluster, the ring contains 150,000 entries. Maintaining this ring via a gossip protocol, by broadcasting every addition and removal, generates O(N²) messages in a naive implementation. Production systems mitigate this with bounded gossip (each node only syncs with a few peers per round), but large cluster resizes can still produce detectable gossip storms. Using fewer tokens per node, as Cassandra 4.0 did by lowering its default from 256 to 16, also shrinks the ring metadata every node must track.

3. Replication and consistency are a separate concern

Consistent hashing solves where data lives. It says nothing about what happens when multiple replicas of the same key diverge — that problem belongs to replication protocols (quorum reads/writes, vector clocks, CRDTs). It is a common misconception that using consistent hashing somehow addresses split-brain scenarios or network partition behaviour. It does not.

4. Non-monotone hash spaces

The original Karger construction assumes a fixed, large hash space (e.g., 2³²). Systems that use bounded spaces (like Redis’s 16,384 slots) must reason carefully about what happens at the boundaries and ensure that slot assignment logic handles wraparound correctly — subtle bugs in this area have been the source of real production incidents.

The bounded load variant: Research by Mirrokni, Thorup, and Zadimoghaddam (2018) introduced consistent hashing with bounded loads, a variant that enforces an upper bound on how overloaded any single node can become, at the cost of a small amount of additional routing complexity. Google uses this in several internal load balancers where the standard variant's potential for imbalance is unacceptable.
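The core rule of the bounded-load variant can be sketched in a few lines: cap every node at ⌈(1 + ε) · average load⌉ and, when a key's successor is full, keep walking clockwise to the next node with spare capacity. This is a simplified offline sketch under assumptions: the number of keys is fixed up front (the paper's version recomputes the cap as keys arrive and depart), and the node names, ε, vnode count, and hash are illustrative choices.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of consistent hashing with bounded loads (batch version).
class BoundedLoadRing {

    // FNV-1a followed by MurmurHash3's 64-bit finalizer (an assumed hash choice)
    static long hash(String s) {
        long h = 0xcbf29ce484222325L;
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            h ^= (b & 0xFF);
            h *= 0x100000001b3L;
        }
        h ^= h >>> 33;
        h *= 0xff51afd7ed558ccdL;
        h ^= h >>> 33;
        h *= 0xc4ceb9fe1a85ec53L;
        h ^= h >>> 33;
        return h;
    }

    // Per-node cap: ceil((1 + epsilon) * average load)
    static int capacity(int keyCount, int nodeCount, double epsilon) {
        return (int) Math.ceil((1 + epsilon) * keyCount / nodeCount);
    }

    // First node clockwise from h whose load is still below the cap (one full lap at most)
    static String firstWithSpace(TreeMap<Long, String> ring, long h, Map<String, Integer> load, int cap) {
        for (String n : ring.tailMap(h, true).values()) if (load.get(n) < cap) return n;
        for (String n : ring.headMap(h, false).values()) if (load.get(n) < cap) return n;
        throw new IllegalStateException("all nodes are full");
    }

    // Places keys one at a time and returns the final load per node
    static Map<String, Integer> assign(List<String> nodes, int vnodes, List<String> keys, double epsilon) {
        TreeMap<Long, String> ring = new TreeMap<>();
        for (String n : nodes)
            for (int v = 0; v < vnodes; v++) ring.put(hash(n + "-vn-" + v), n);

        int cap = capacity(keys.size(), nodes.size(), epsilon);
        Map<String, Integer> load = new HashMap<>();
        for (String n : nodes) load.put(n, 0);
        for (String key : keys) load.merge(firstWithSpace(ring, hash(key), load, cap), 1, Integer::sum);
        return load;
    }

    public static void main(String[] args) {
        List<String> keys = new java.util.ArrayList<>();
        for (int i = 0; i < 10_000; i++) keys.add("key-" + i);
        // Few vnodes on purpose, so the plain ring would be noticeably unbalanced
        System.out.println(assign(List.of("A", "B", "C", "D", "E"), 10, keys, 0.25));
    }
}
```

Because the total capacity across all nodes is at least the number of keys, the clockwise walk always finds a node with room, and no node ever exceeds the cap (2,500 keys per node in the example above).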

8. What We Have Learned

Consistent hashing is one of those rare algorithms that is simultaneously simple to understand, elegant in theory, and genuinely indispensable in practice. Here is the full picture:

  • Modular hashing’s fatal flaw: hash(key) % N remaps ~(N−1)/N keys on every resize — catastrophic for distributed caches and databases that change size under load.
  • The ring construction: Map both nodes and keys to a circular hash space. Each key is served by its clockwise successor node. On resize, only K/N keys move — the theoretical minimum.
  • Virtual nodes are mandatory in production: Without them, physical node placement is statistically unlikely to be uniform. With V ≈ 150–256 vnodes per node, load variance becomes acceptably small and heterogeneous clusters become trivially configurable.
  • Java implementation: A TreeMap keyed on hash positions gives O(log N) lookup with a very small constant. MD5 or MurmurHash3 are the standard hash function choices.
  • Rendezvous hashing is a stateless, structurally simpler alternative with O(N) lookup — ideal for small clusters, CDN edge routing, and environments where maintaining a ring is operationally burdensome.
  • Production systems diverge in the details: Cassandra uses a 2⁶⁴ Murmur3 token ring with 256 vnodes per node by default before version 4.0 (16 since 4.0); Redis Cluster uses 16,384 fixed hash slots with CRC16; DynamoDB hides its ring entirely behind its partition abstraction.
  • The limits are real: Hot keys, gossip overhead at scale, and the complete separation between partitioning and replication are all constraints that production teams must design around explicitly.

Eleftheria Drosopoulou

Eleftheria is an experienced business analyst with a robust background in the computer software industry. Proficient in computer software training, digital marketing, HTML scripting, and Microsoft Office, she brings a wealth of technical skills to the table. She also loves writing articles on various tech subjects and has a talent for translating complex concepts into accessible content.