Skip to main content

CRDTs: Mergeable Data Structures for Distributed Systems

✅ CRDTs Require These

To merge state safely and deterministically across nodes, operations must be:

  • Associative: So you don’t care about how things are grouped in time
  • Commutative: So replicas can apply things in different orders
  • Idempotent: So retries or duplicate messages don’t corrupt the result

🔧 Real-World Examples

OperationAssociativeCommutativeIdempotentNotes
+ (integers)Retrying +1 causes drift
max(a, b)CRDT-friendly (e.g. LWW strategy)
set.add(x)G-Set in CRDTs
set.remove(x)✅ (in some CRDTs)Must union (and potentially add unique ids)
assign = xDangerous in distributed settings

State Based CRDTs / (also called convergent replicated data types, or CvRDTs)

Operation-based CRDTs (also called commutative replicated data types, or CmRDTs)

Let's clarify the exact definition boundary and why:

CmRDT ≠ unionCvRDT = union-based merge

And what that means for commutativity vs convergence.


🎯 Quick Definitions (as per Shapiro et al.)

TermMeans You...Convergence Guarantee Comes From
CmRDTReplicate operations between nodesCommutative operations
CvRDTReplicate full state between nodesMergeable state (⊔ operator)

✅ The Cm in CmRDT literally stands for Commutative ✅ The Cv in CvRDT stands for Convergent (via a merge function)


❓ “If I just do a union of sets, isn't that commutative?”

Yes — but that’s a merge of state, not an application of operations.

This is the heart of the distinction:

Operation-based (CmRDT):

  • You store:

    • op1 = add("cheese")
    • op2 = remove("cheese")
  • You apply ops in causally correct order

  • You require delivery guarantees

  • Your application logic must handle idempotency + conflicts

apply(op1); apply(op2) === apply(op2); apply(op1)
⇨ only true if ops are commutative

State-based (CvRDT):

  • You never apply ops directly — instead:

    • You just merge state
    • Merge is something like set1 ∪ set2 or max(counter1, counter2)
  • Commutativity is not required of the original operations — only of the merge function


❌ Why “Union” Doesn’t Fit CmRDT

Union is a merge of two full sets. CmRDTs don't merge sets — they replicate and apply operations like:

replica.apply("add cheese")
replica.apply("remove cheese")

You’d have to implement the remove as an operation — e.g., with a tombstone or tag — and ensure that:

apply(add(x)); apply(remove(x)) == apply(remove(x)); apply(add(x))

Which is only true with careful operation design — hence, the commutativity constraint.


✅ So What Does This Mean?

QuestionAnswer
Is union commutative?Yes, but that’s for state merging
Is union used in CmRDTs?No — CmRDTs replicate operations, not state
Is union used in CvRDTs?Yes — union is a classic CvRDT merge strategy (e.g., G-Set)
Does CmRDT require causal delivery?Yes — because operations like remove(x) depend on prior add(x)
What makes CvRDT convergent?The merge function (⊔) being associative, commutative, and idempotent

🧠 Mnemonic

CmRDT → Commutative ops CvRDT → Convergent merge