Distributed Systems Interview Gotchas Cheat Sheet (Updated)
1. Leader-based vs Leaderless Replication
Common Trap: "Only one node can accept writes." → true only in leader-based systems.
Fix:
- Leader-based (Raft, Paxos, Postgres streaming replication)
  - One leader handles writes.
  - Followers replicate from the leader.
  - Failover triggers a leader election.
- Leaderless (Dynamo, Cassandra, Riak)
  - Any node can accept writes.
  - The receiving node acts as a temporary coordinator.
  - Conflicts are resolved via vector clocks, LWW, or CRDTs.
  - No cluster-wide election on node failure.
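To make the contrast concrete, here is a minimal Python sketch of the two write paths (the log format, replica stores, and W are all illustrative, not any real system's API):

```python
# Leader-based: exactly one node orders and accepts writes;
# followers copy the leader's log.
def leader_write(leader_log: list, followers: list, entry) -> None:
    leader_log.append(entry)
    for follower_log in followers:
        follower_log.append(entry)  # replication flows from the single leader

# Leaderless: whichever replica receives the request coordinates it;
# the write succeeds once W replicas acknowledge.
def leaderless_write(replicas: list, key, value, w: int) -> None:
    acks = 0
    for store in replicas:
        store[key] = value
        acks += 1
        if acks >= w:
            return  # quorum reached; remaining replicas catch up later
    raise RuntimeError("fewer than W replicas acknowledged the write")

log, f1, f2 = [], [], []
leader_write(log, [f1, f2], "SET x=1")        # one ordered log, copied twice
leaderless_write([{}, {}, {}], "x", 1, w=2)   # any node could have coordinated
```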
2. CAP Theorem
Common Trap: "You can only pick 2 out of 3 always."
- Actually: During a partition, you must choose either Consistency or Availability (Partition Tolerance is non-negotiable in distributed systems).
Fix (interview phrasing):
*“In a network partition, you must choose:
- CP: Consistency + Partition Tolerance (block or fail requests)
- AP: Availability + Partition Tolerance (allow temporary inconsistencies)”*
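A toy sketch of that choice on the read path (replica stores, the quorum size, and the CP/AP "mode" flag are all invented for illustration):

```python
# During a partition, a read either fails (CP) or answers from whatever
# replicas are reachable, possibly returning stale data (AP).
def read(key, reachable, quorum, mode):
    if len(reachable) >= quorum:
        # Healthy case: a quorum read returns the latest version seen.
        return max(store[key] for store in reachable)
    if mode == "CP":
        # Consistency: refuse rather than risk a stale answer.
        raise TimeoutError("partitioned: cannot reach a quorum")
    # Availability: answer anyway from what is reachable.
    return max(store[key] for store in reachable)

replicas = [{"k": 2}, {"k": 2}, {"k": 1}]            # versions of key "k"
print(read("k", replicas, quorum=2, mode="CP"))      # 2: quorum reachable
print(read("k", replicas[2:], quorum=2, mode="AP"))  # 1: stale but available
# read("k", replicas[2:], quorum=2, mode="CP") would raise instead of answering.
```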
3. ACID vs CAP
Common Trap: "ACID and CAP are similar because they both have A and C." → leads to confusion about scope.
Fix:
- ACID = think database transactions:
  - What does a DB engine guarantee for one transaction?
  - Atomicity (all or nothing)
  - Consistency (valid state after the transaction)
  - Isolation (transactions don’t step on each other)
  - Durability (persists after a crash)
  - A raw DB instance (like Postgres) doesn’t handle availability; that’s outside ACID’s scope.
- CAP = think distributed system behavior when network communication is unreliable:
  - It answers “who can do what, and when?” when nodes might not see each other.
  - CAP can apply even within one physical system (multiple processes, sandboxing, or IPC boundaries).
- Mental Anchor:
  - ACID = properties of a transaction inside one node.
  - CAP = properties of operations across nodes (or processes) under unreliable communication.
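A quick way to see the "one transaction, one node" scope of ACID is atomicity in a local database. A minimal sketch with Python's built-in sqlite3 (the accounts table and amounts are invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO accounts VALUES ('alice', 100), ('bob', 0)")
conn.commit()

try:
    with conn:  # opens a transaction; commits on success, rolls back on exception
        conn.execute("UPDATE accounts SET balance = balance - 100 WHERE name = 'alice'")
        raise RuntimeError("crash between the two legs of the transfer")
        conn.execute("UPDATE accounts SET balance = balance + 100 WHERE name = 'bob'")
except RuntimeError:
    pass

# Atomicity: the partial debit was rolled back, so balances are unchanged.
print(dict(conn.execute("SELECT name, balance FROM accounts")))
# {'alice': 100, 'bob': 0}
```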
4. Quorum Math (N, W, R)
Common Trap: "W + R > N always means strong consistency."
- Actually: Only in the absence of partitions (and assuming strict, non-sloppy quorums). During a partition, you still pick between stale reads (AP) or blocking (CP).
Fix (phrasing):
“When W + R > N, at least one replica in any read has the latest write, but CAP still applies during partitions.”
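A brute-force check of the overlap claim (pure Python, illustrative): when W + R > N, every possible write set shares at least one replica with every possible read set.

```python
from itertools import combinations

def quorums_overlap(n: int, w: int, r: int) -> bool:
    """Check that every W-subset of N replicas intersects every R-subset."""
    replicas = range(n)
    return all(set(ws) & set(rs)
               for ws in combinations(replicas, w)
               for rs in combinations(replicas, r))

print(quorums_overlap(n=3, w=2, r=2))  # True: 2 + 2 > 3
print(quorums_overlap(n=3, w=1, r=1))  # False: 1 + 1 <= 3, a read can miss the write
```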
5. Vector Clocks vs LWW
Common Trap:
- Assuming Last Write Wins solves conflicts perfectly.
- Reality: LWW silently drops all but one of the concurrent writes.
Fix:
- Vector Clocks detect concurrency explicitly.
- CRDTs merge changes meaningfully without coordination.
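A minimal sketch of the comparison vector clocks enable (each clock is a dict mapping node id to a counter; node names are illustrative):

```python
# Two versions are concurrent if neither clock dominates the other.
def compare(vc_a: dict, vc_b: dict) -> str:
    nodes = vc_a.keys() | vc_b.keys()
    a_ge = all(vc_a.get(n, 0) >= vc_b.get(n, 0) for n in nodes)
    b_ge = all(vc_b.get(n, 0) >= vc_a.get(n, 0) for n in nodes)
    if a_ge and b_ge:
        return "equal"
    if a_ge:
        return "a happened after b"
    if b_ge:
        return "b happened after a"
    return "concurrent"  # a true conflict; LWW would silently discard one side

print(compare({"n1": 2, "n2": 1}, {"n1": 1, "n2": 1}))  # a happened after b
print(compare({"n1": 2, "n2": 0}, {"n1": 1, "n2": 3}))  # concurrent
```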
6. Consensus vs Replication
Common Trap:
- Treating consensus (Raft, Paxos) as the same as replication (leaderless, multi-master).
Fix:
- Consensus: Agreement on one sequence of operations (used for leader election, consistent logs).
- Replication: Copying state across nodes (may be eventually consistent or strongly consistent depending on design).
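The difference shows up in the commit rule. A heavily simplified, Raft-flavored sketch (the match_index map and cluster size are invented): consensus only commits an entry once a majority has stored it, which is what produces one agreed sequence.

```python
def committed_index(match_index: dict, cluster_size: int) -> int:
    """Highest log index replicated on a majority of nodes."""
    majority = cluster_size // 2 + 1
    indexes = sorted(match_index.values(), reverse=True)
    return indexes[majority - 1]  # the majority-th highest replicated index

# 5-node cluster: each follower has replicated the log up to these indexes.
print(committed_index({"n1": 7, "n2": 7, "n3": 5, "n4": 4, "n5": 2}, 5))  # 5
```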
7. Client-Side vs Server-Side Read Repair
Common Trap:
- Assuming read repair always happens on the server.
Fix:
- Dynamo-style stores can do client-side repair (the reading client/coordinator writes back the newest version); Cassandra does repair server-side on the coordinator node.
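Wherever it runs, the mechanics are the same: read several replicas, take the newest version, and write it back to any stale ones. A minimal sketch (the Replica class and versioned (version, data) values are invented):

```python
def read_with_repair(replicas: list, key: str):
    responses = [(r, r.get(key)) for r in replicas]  # value = (version, data)
    newest = max(v for _, v in responses)            # highest version wins
    for replica, value in responses:
        if value < newest:
            replica.put(key, newest)                 # repair the stale replica
    return newest

class Replica(dict):
    def get(self, key): return super().get(key, (0, None))
    def put(self, key, value): self[key] = value

a, b = Replica(), Replica()
a.put("k", (2, "new")); b.put("k", (1, "old"))
print(read_with_repair([a, b], "k"))  # (2, 'new')
print(b["k"])                         # (2, 'new'): b was repaired
```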
8. Sharding vs Partitioning
Common Trap:
- Thinking they’re synonyms.
Fix:
- Sharding = splitting data across machines by key for horizontal scaling.
- Partitioning = the broader term for splitting data; depending on context it can mean logical partitions inside one database (e.g., Postgres table partitions) or separation across fault domains (e.g., availability zones).
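A minimal key-to-shard routing sketch (the shard count and key format are illustrative). Note Python's built-in hash() is salted per process, so a stable hash keeps routing consistent across runs:

```python
import hashlib

def shard_for(key: str, num_shards: int) -> int:
    digest = hashlib.sha256(key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_shards

print(shard_for("user:42", 4))  # always the same shard for the same key
print(shard_for("user:43", 4))
# Caveat: naive modulo resharding moves most keys when num_shards changes;
# real systems often use consistent hashing or fixed partition counts instead.
```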
9. Clocks & Time
Common Trap:
- Relying on wall-clock timestamps for ordering.
- NTP drift breaks causality.
Fix:
- Use logical clocks (Lamport, vector clocks) when ordering matters.
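A minimal Lamport clock sketch: logical counters that guarantee L(A) < L(B) whenever A happened-before B, with no wall clock involved (the two-node setup is illustrative):

```python
class LamportClock:
    def __init__(self):
        self.time = 0

    def tick(self) -> int:            # local event
        self.time += 1
        return self.time

    def send(self) -> int:            # stamp an outgoing message
        return self.tick()

    def receive(self, msg_time: int) -> int:  # merge on message receipt
        self.time = max(self.time, msg_time) + 1
        return self.time

a, b = LamportClock(), LamportClock()
t = a.send()         # a: 1
print(b.receive(t))  # b: 2, so the receive is ordered after the send
print(b.tick())      # b: 3
```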