CRDTs: Mergeable Data Structures for Distributed Systems
Start With This Excellent Explanation
Then Skim The Wikipedia Article
My Notes:
Classes of CRDTs:
- Operation Based CRDTs - Transmit only the Updates
- State Based CRDTs - The entire state of every CRDT must be transmitted eventually to every other replica
🤩 Some Important Prerequisite Info
It wasnt just for Algebra 2...
| Property | Definition | Example | Why It Matters |
|---|---|---|---|
| Associative | Grouping doesn’t matter:(a ⋆ b) ⋆ c = a ⋆ (b ⋆ c) | (1 + 2) + 3 = 1 + (2 + 3) | You can reorder merge steps without affecting the result |
| Commutative | Order doesn’t matter:a ⋆ b = b ⋆ a | 2 + 5 = 5 + 2 | You can replicate and apply ops in any order |
| Idempotent | Reapplying doesn’t change the result:a ⋆ a = a | max(3, 3) = 3 | You can safely reapply updates (e.g. after network retries) |
✅ CRDTs Require These
To merge state safely and deterministically across nodes, operations must be:
- Associative: So you don’t care about how things are grouped in time
- Commutative: So replicas can apply things in different orders
- Idempotent: So retries or duplicate messages don’t corrupt the result
🔧 Real-World Examples
| Operation | Associative | Commutative | Idempotent | Notes |
|---|---|---|---|---|
+ (integers) | ✅ | ✅ | ❌ | Retrying +1 causes drift |
max(a, b) | ✅ | ✅ | ✅ | CRDT-friendly (e.g. LWW strategy) |
set.add(x) | ✅ | ✅ | ✅ | G-Set in CRDTs |
set.remove(x) | ✅ | ✅ | ✅ (in some CRDTs) | Must union (and potentially add unique ids) |
assign = x | ❌ | ❌ | ❌ | Dangerous in distributed settings |
State Based CRDTs / (also called convergent replicated data types, or CvRDTs)
Operation-based CRDTs (also called commutative replicated data types, or CmRDTs)
Let's clarify the exact definition boundary and why:
✅ CmRDT ≠ union ✅ CvRDT = union-based merge
And what that means for commutativity vs convergence.
🎯 Quick Definitions (as per Shapiro et al.)
| Term | Means You... | Convergence Guarantee Comes From |
|---|---|---|
| CmRDT | Replicate operations between nodes | Commutative operations |
| CvRDT | Replicate full state between nodes | Mergeable state (⊔ operator) |
✅ The Cm in CmRDT literally stands for Commutative ✅ The Cv in CvRDT stands for Convergent (via a merge function)
❓ “If I just do a union of sets, isn't that commutative?”
Yes — but that’s a merge of state, not an application of operations.
This is the heart of the distinction:
Operation-based (CmRDT):
-
You store:
op1 = add("cheese")op2 = remove("cheese")
-
You apply ops in causally correct order
-
You require delivery guarantees
-
Your application logic must handle idempotency + conflicts
apply(op1); apply(op2) === apply(op2); apply(op1)
⇨ only true if ops are commutative
State-based (CvRDT):
-
You never apply ops directly — instead:
- You just merge state
- Merge is something like
set1 ∪ set2ormax(counter1, counter2)
-
Commutativity is not required of the original operations — only of the merge function
❌ Why “Union” Doesn’t Fit CmRDT
Union is a merge of two full sets. CmRDTs don't merge sets — they replicate and apply operations like:
replica.apply("add cheese")
replica.apply("remove cheese")
You’d have to implement the remove as an operation — e.g., with a tombstone or tag — and ensure that:
apply(add(x)); apply(remove(x)) == apply(remove(x)); apply(add(x))
Which is only true with careful operation design — hence, the commutativity constraint.
✅ So What Does This Mean?
| Question | Answer |
|---|---|
| Is union commutative? | Yes, but that’s for state merging |
| Is union used in CmRDTs? | No — CmRDTs replicate operations, not state |
| Is union used in CvRDTs? | Yes — union is a classic CvRDT merge strategy (e.g., G-Set) |
| Does CmRDT require causal delivery? | Yes — because operations like remove(x) depend on prior add(x) |
| What makes CvRDT convergent? | The merge function (⊔) being associative, commutative, and idempotent |
🧠 Mnemonic
CmRDT → Commutative ops CvRDT → Convergent merge