CRDTs: Mergeable Data Structures for Distributed Systems
✅ CRDTs Require These
To merge state safely and deterministically across nodes, operations must be:
- Associative: So you don’t care about how things are grouped in time
- Commutative: So replicas can apply things in different orders
- Idempotent: So retries or duplicate messages don’t corrupt the result
🔧 Real-World Examples
Operation | Associative | Commutative | Idempotent | Notes |
---|---|---|---|---|
+ (integers) | ✅ | ✅ | ❌ | Retrying +1 causes drift |
max(a, b) | ✅ | ✅ | ✅ | CRDT-friendly (e.g. LWW strategy) |
set.add(x) | ✅ | ✅ | ✅ | G-Set in CRDTs |
set.remove(x) | ✅ | ✅ | ✅ (in some CRDTs) | Must union (and potentially add unique ids) |
assign = x | ❌ | ❌ | ❌ | Dangerous in distributed settings |
State Based CRDTs / (also called convergent replicated data types, or CvRDTs)
Operation-based CRDTs (also called commutative replicated data types, or CmRDTs)
Let's clarify the exact definition boundary and why:
✅ CmRDT ≠ union ✅ CvRDT = union-based merge
And what that means for commutativity vs convergence.
🎯 Quick Definitions (as per Shapiro et al.)
Term | Means You... | Convergence Guarantee Comes From |
---|---|---|
CmRDT | Replicate operations between nodes | Commutative operations |
CvRDT | Replicate full state between nodes | Mergeable state (⊔ operator) |
✅ The Cm in CmRDT literally stands for Commutative ✅ The Cv in CvRDT stands for Convergent (via a merge function)
❓ “If I just do a union of sets, isn't that commutative?”
Yes — but that’s a merge of state, not an application of operations.
This is the heart of the distinction:
Operation-based (CmRDT):
-
You store:
op1 = add("cheese")
op2 = remove("cheese")
-
You apply ops in causally correct order
-
You require delivery guarantees
-
Your application logic must handle idempotency + conflicts
apply(op1); apply(op2) === apply(op2); apply(op1)
⇨ only true if ops are commutative
State-based (CvRDT):
-
You never apply ops directly — instead:
- You just merge state
- Merge is something like
set1 ∪ set2
ormax(counter1, counter2)
-
Commutativity is not required of the original operations — only of the merge function
❌ Why “Union” Doesn’t Fit CmRDT
Union is a merge of two full sets. CmRDTs don't merge sets — they replicate and apply operations like:
replica.apply("add cheese")
replica.apply("remove cheese")
You’d have to implement the remove as an operation — e.g., with a tombstone or tag — and ensure that:
apply(add(x)); apply(remove(x)) == apply(remove(x)); apply(add(x))
Which is only true with careful operation design — hence, the commutativity constraint.
✅ So What Does This Mean?
Question | Answer |
---|---|
Is union commutative? | Yes, but that’s for state merging |
Is union used in CmRDTs? | No — CmRDTs replicate operations, not state |
Is union used in CvRDTs? | Yes — union is a classic CvRDT merge strategy (e.g., G-Set) |
Does CmRDT require causal delivery? | Yes — because operations like remove(x) depend on prior add(x) |
What makes CvRDT convergent? | The merge function (⊔) being associative, commutative, and idempotent |
🧠 Mnemonic
CmRDT → Commutative ops CvRDT → Convergent merge