CRDTs: Mergeable Data Structures for Distributed Systems

My Notes:

Classes of CRDTs:

Operation Based CRDTs - Transmit only the Updates
State Based CRDTs - The entire state of every CRDT must be transmitted eventually to every other replica

🤩 Some Important Prerequisite Info

It wasnt just for Algebra 2...

Property	Definition	Example	Why It Matters
Associative	Grouping doesn’t matter:`(a ⋆ b) ⋆ c = a ⋆ (b ⋆ c)`	`(1 + 2) + 3 = 1 + (2 + 3)`	You can reorder merge steps without affecting the result
Commutative	Order doesn’t matter:`a ⋆ b = b ⋆ a`	`2 + 5 = 5 + 2`	You can replicate and apply ops in any order
Idempotent	Reapplying doesn’t change the result:`a ⋆ a = a`	`max(3, 3) = 3`	You can safely reapply updates (e.g. after network retries)

✅ CRDTs Require These

To merge state safely and deterministically across nodes, operations must be:

Associative: So you don’t care about how things are grouped in time
Commutative: So replicas can apply things in different orders
Idempotent: So retries or duplicate messages don’t corrupt the result

🔧 Real-World Examples

Operation	Associative	Commutative	Idempotent	Notes
`+` (integers)	✅	✅	❌	Retrying `+1` causes drift
`max(a, b)`	✅	✅	✅	CRDT-friendly (e.g. LWW strategy)
`set.add(x)`	✅	✅	✅	G-Set in CRDTs
`set.remove(x)`	✅	✅	✅ (in some CRDTs)	Must union (and potentially add unique ids)
`assign = x`	❌	❌	❌	Dangerous in distributed settings

State Based CRDTs / (also called convergent replicated data types, or CvRDTs)

Operation-based CRDTs (also called commutative replicated data types, or CmRDTs)

Let's clarify the exact definition boundary and why:

✅ CmRDT ≠ union ✅ CvRDT = union-based merge

And what that means for commutativity vs convergence.

🎯 Quick Definitions (as per Shapiro et al.)

Term	Means You...	Convergence Guarantee Comes From
CmRDT	Replicate operations between nodes	Commutative operations
CvRDT	Replicate full state between nodes	Mergeable state (⊔ operator)

✅ The Cm in CmRDT literally stands for Commutative ✅ The Cv in CvRDT stands for Convergent (via a merge function)

❓ “If I just do a union of sets, isn't that commutative?”

Yes — but that’s a merge of state, not an application of operations.

This is the heart of the distinction:

Operation-based (CmRDT):

You store:
- op1 = add("cheese")
- op2 = remove("cheese")
You apply ops in causally correct order
You require delivery guarantees
Your application logic must handle idempotency + conflicts

apply(op1); apply(op2) === apply(op2); apply(op1)
⇨ only true if ops are commutative

State-based (CvRDT):

You never apply ops directly — instead:
- You just merge state
- Merge is something like set1 ∪ set2 or max(counter1, counter2)
Commutativity is not required of the original operations — only of the merge function

❌ Why “Union” Doesn’t Fit CmRDT

Union is a merge of two full sets. CmRDTs don't merge sets — they replicate and apply operations like:

replica.apply("add cheese")
replica.apply("remove cheese")

You’d have to implement the remove as an operation — e.g., with a tombstone or tag — and ensure that:

apply(add(x)); apply(remove(x)) == apply(remove(x)); apply(add(x))

Which is only true with careful operation design — hence, the commutativity constraint.

✅ So What Does This Mean?

Question	Answer
Is union commutative?	Yes, but that’s for state merging
Is union used in CmRDTs?	No — CmRDTs replicate operations, not state
Is union used in CvRDTs?	Yes — union is a classic CvRDT merge strategy (e.g., G-Set)
Does CmRDT require causal delivery?	Yes — because operations like `remove(x)` depend on prior `add(x)`
What makes CvRDT convergent?	The merge function (⊔) being associative, commutative, and idempotent

🧠 Mnemonic

CmRDT → Commutative ops CvRDT → Convergent merge

Classes of CRDTs:​

🤩 Some Important Prerequisite Info​

✅ CRDTs Require These​

🔧 Real-World Examples​

State Based CRDTs / (also called convergent replicated data types, or CvRDTs)​

Operation-based CRDTs (also called commutative replicated data types, or CmRDTs)​

🎯 Quick Definitions (as per Shapiro et al.)​

❓ “If I just do a union of sets, isn't that commutative?”​

Operation-based (CmRDT):​

State-based (CvRDT):​

❌ Why “Union” Doesn’t Fit CmRDT​

✅ So What Does This Mean?​

🧠 Mnemonic​