Skip to main content

CRDTs: Mergeable Data Structures for Distributed Systems

Start With This Excellent Explanation

Then Skim The Wikipedia Article

My Notes:

Classes of CRDTs:​

  • Operation Based CRDTs - Transmit only the Updates
  • State Based CRDTs - The entire state of every CRDT must be transmitted eventually to every other replica

🀩 Some Important Prerequisite Info​

It wasnt just for Algebra 2...

PropertyDefinitionExampleWhy It Matters
AssociativeGrouping doesn’t matter:(a ⋆ b) ⋆ c = a ⋆ (b ⋆ c)(1 + 2) + 3 = 1 + (2 + 3)You can reorder merge steps without affecting the result
CommutativeOrder doesn’t matter:a ⋆ b = b ⋆ a2 + 5 = 5 + 2You can replicate and apply ops in any order
IdempotentReapplying doesn’t change the result:a ⋆ a = amax(3, 3) = 3You can safely reapply updates (e.g. after network retries)

βœ… CRDTs Require These​

To merge state safely and deterministically across nodes, operations must be:

  • Associative: So you don’t care about how things are grouped in time
  • Commutative: So replicas can apply things in different orders
  • Idempotent: So retries or duplicate messages don’t corrupt the result

πŸ”§ Real-World Examples​

OperationAssociativeCommutativeIdempotentNotes
+ (integers)βœ…βœ…βŒRetrying +1 causes drift
max(a, b)βœ…βœ…βœ…CRDT-friendly (e.g. LWW strategy)
set.add(x)βœ…βœ…βœ…G-Set in CRDTs
set.remove(x)βœ…βœ…βœ… (in some CRDTs)Must union (and potentially add unique ids)
assign = x❌❌❌Dangerous in distributed settings

State Based CRDTs / (also called convergent replicated data types, or CvRDTs)​

Operation-based CRDTs (also called commutative replicated data types, or CmRDTs)​

Let's clarify the exact definition boundary and why:

βœ… CmRDT β‰  union βœ… CvRDT = union-based merge

And what that means for commutativity vs convergence.


🎯 Quick Definitions (as per Shapiro et al.)​

TermMeans You...Convergence Guarantee Comes From
CmRDTReplicate operations between nodesCommutative operations
CvRDTReplicate full state between nodesMergeable state (βŠ” operator)

βœ… The Cm in CmRDT literally stands for Commutative βœ… The Cv in CvRDT stands for Convergent (via a merge function)


❓ β€œIf I just do a union of sets, isn't that commutative?”​

Yes β€” but that’s a merge of state, not an application of operations.

This is the heart of the distinction:

Operation-based (CmRDT):​

  • You store:

    • op1 = add("cheese")
    • op2 = remove("cheese")
  • You apply ops in causally correct order

  • You require delivery guarantees

  • Your application logic must handle idempotency + conflicts

apply(op1); apply(op2) === apply(op2); apply(op1)
⇨ only true if ops are commutative

State-based (CvRDT):​

  • You never apply ops directly β€” instead:

    • You just merge state
    • Merge is something like set1 βˆͺ set2 or max(counter1, counter2)
  • Commutativity is not required of the original operations β€” only of the merge function


❌ Why β€œUnion” Doesn’t Fit CmRDT​

Union is a merge of two full sets. CmRDTs don't merge sets β€” they replicate and apply operations like:

replica.apply("add cheese")
replica.apply("remove cheese")

You’d have to implement the remove as an operation β€” e.g., with a tombstone or tag β€” and ensure that:

apply(add(x)); apply(remove(x)) == apply(remove(x)); apply(add(x))

Which is only true with careful operation design β€” hence, the commutativity constraint.


βœ… So What Does This Mean?​

QuestionAnswer
Is union commutative?Yes, but that’s for state merging
Is union used in CmRDTs?No β€” CmRDTs replicate operations, not state
Is union used in CvRDTs?Yes β€” union is a classic CvRDT merge strategy (e.g., G-Set)
Does CmRDT require causal delivery?Yes β€” because operations like remove(x) depend on prior add(x)
What makes CvRDT convergent?The merge function (βŠ”) being associative, commutative, and idempotent

🧠 Mnemonic​

CmRDT β†’ Commutative ops CvRDT β†’ Convergent merge