CRDTs: Mergeable Data Structures for Distributed Systems
Start With This Excellent Explanation
Then Skim The Wikipedia Article
My Notes:
Classes of CRDTs:β
- Operation Based CRDTs - Transmit only the Updates
- State Based CRDTs - The entire state of every CRDT must be transmitted eventually to every other replica
π€© Some Important Prerequisite Infoβ
It wasnt just for Algebra 2...
Property | Definition | Example | Why It Matters |
---|---|---|---|
Associative | Grouping doesnβt matter:(a β b) β c = a β (b β c) | (1 + 2) + 3 = 1 + (2 + 3) | You can reorder merge steps without affecting the result |
Commutative | Order doesnβt matter:a β b = b β a | 2 + 5 = 5 + 2 | You can replicate and apply ops in any order |
Idempotent | Reapplying doesnβt change the result:a β a = a | max(3, 3) = 3 | You can safely reapply updates (e.g. after network retries) |
β CRDTs Require Theseβ
To merge state safely and deterministically across nodes, operations must be:
- Associative: So you donβt care about how things are grouped in time
- Commutative: So replicas can apply things in different orders
- Idempotent: So retries or duplicate messages donβt corrupt the result
π§ Real-World Examplesβ
Operation | Associative | Commutative | Idempotent | Notes |
---|---|---|---|---|
+ (integers) | β | β | β | Retrying +1 causes drift |
max(a, b) | β | β | β | CRDT-friendly (e.g. LWW strategy) |
set.add(x) | β | β | β | G-Set in CRDTs |
set.remove(x) | β | β | β (in some CRDTs) | Must union (and potentially add unique ids) |
assign = x | β | β | β | Dangerous in distributed settings |
State Based CRDTs / (also called convergent replicated data types, or CvRDTs)β
Operation-based CRDTs (also called commutative replicated data types, or CmRDTs)β
Let's clarify the exact definition boundary and why:
β CmRDT β union β CvRDT = union-based merge
And what that means for commutativity vs convergence.
π― Quick Definitions (as per Shapiro et al.)β
Term | Means You... | Convergence Guarantee Comes From |
---|---|---|
CmRDT | Replicate operations between nodes | Commutative operations |
CvRDT | Replicate full state between nodes | Mergeable state (β operator) |
β The Cm in CmRDT literally stands for Commutative β The Cv in CvRDT stands for Convergent (via a merge function)
β βIf I just do a union of sets, isn't that commutative?ββ
Yes β but thatβs a merge of state, not an application of operations.
This is the heart of the distinction:
Operation-based (CmRDT):β
-
You store:
op1 = add("cheese")
op2 = remove("cheese")
-
You apply ops in causally correct order
-
You require delivery guarantees
-
Your application logic must handle idempotency + conflicts
apply(op1); apply(op2) === apply(op2); apply(op1)
β¨ only true if ops are commutative
State-based (CvRDT):β
-
You never apply ops directly β instead:
- You just merge state
- Merge is something like
set1 βͺ set2
ormax(counter1, counter2)
-
Commutativity is not required of the original operations β only of the merge function
β Why βUnionβ Doesnβt Fit CmRDTβ
Union is a merge of two full sets. CmRDTs don't merge sets β they replicate and apply operations like:
replica.apply("add cheese")
replica.apply("remove cheese")
Youβd have to implement the remove as an operation β e.g., with a tombstone or tag β and ensure that:
apply(add(x)); apply(remove(x)) == apply(remove(x)); apply(add(x))
Which is only true with careful operation design β hence, the commutativity constraint.
β So What Does This Mean?β
Question | Answer |
---|---|
Is union commutative? | Yes, but thatβs for state merging |
Is union used in CmRDTs? | No β CmRDTs replicate operations, not state |
Is union used in CvRDTs? | Yes β union is a classic CvRDT merge strategy (e.g., G-Set) |
Does CmRDT require causal delivery? | Yes β because operations like remove(x) depend on prior add(x) |
What makes CvRDT convergent? | The merge function (β) being associative, commutative, and idempotent |
π§ Mnemonicβ
CmRDT β Commutative ops CvRDT β Convergent merge