Why fsync Matters for Durability

Most developers know the D in ACID stands for Durability: the guarantee that once a transaction is committed, it will survive power loss or crashes.
What’s less obvious is that this durability often comes down to a single system call: fsync.


What fsync Does

Normally when an application writes to a file, the data goes to the OS page cache in RAM.
The kernel decides when to flush it to disk, which can be seconds later; on Linux, dirty pages can sit in the cache for roughly 30 seconds with default settings.

Calling fsync(fd) tells the OS:

“Don’t just buffer this; push it all the way to stable storage now.”

Without it, a machine crash could wipe out “committed” data that only lived in memory.
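
To make the path from application to disk concrete, here is a minimal Python sketch. The helper name durable_write is hypothetical and exists only for illustration; os.fsync is the standard-library wrapper around the system call.

```python
import os

def durable_write(path: str, data: bytes) -> None:
    """Write data to path and block until the OS reports it is on stable storage."""
    with open(path, "wb") as f:
        f.write(data)          # lands in Python's user-space buffer
        f.flush()              # moves the bytes into the OS page cache
        os.fsync(f.fileno())   # asks the kernel to push the cached bytes to the device
```

Skipping the last call leaves the data in the page cache, which is exactly the window in which a crash can lose it. On some file systems a newly created file also needs an fsync on its parent directory before the file itself is guaranteed to survive; the sketch leaves that out.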


Why Databases Rely on It

Databases use fsync to ensure Write-Ahead Logs (WALs) or redo logs are safely persisted:

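  • Postgres/MySQL → write changes to the WAL (Postgres) or redo log (MySQL/InnoDB), then call fsync. With default settings, only then do they acknowledge COMMIT.
  • Redis → configurable when append-only file (AOF) persistence is enabled:
    • appendfsync always → fsync on every write; strongest durability, higher latency.
    • appendfsync everysec → the default; fsync about once per second, so roughly 1s of writes can be lost.
    • appendfsync no → Redis never calls fsync itself and leaves flushing to the OS; fastest, weakest durability.
  • Cassandra/RocksDB → rely on appending to commit logs / SSTables with periodic rather than per-write fsync by default; durability is mostly ensured by replication.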
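
The "write to the log, fsync, then acknowledge" ordering can be sketched in a few lines of Python. This is an illustration only, not how any of the databases above implement their logs; the class TinyWAL and its commit method are hypothetical names.

```python
import os

class TinyWAL:
    """Hypothetical append-only log: commit() returns only after fsync succeeds."""

    def __init__(self, path: str) -> None:
        # O_APPEND makes every write land at the current end of the file.
        self._fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)

    def commit(self, record: bytes) -> bool:
        view = memoryview(record + b"\n")
        while view:                      # os.write may write fewer bytes than asked
            written = os.write(self._fd, view)
            view = view[written:]
        os.fsync(self._fd)               # block until the record is on stable storage
        return True                      # only now is it safe to acknowledge the commit

    def close(self) -> None:
        os.close(self._fd)
```

The point of the sketch is the ordering: the caller never sees a successful commit for a record that exists only in the page cache.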

The Tradeoff: Speed vs Safety

  • Frequent fsyncs

    • Pros: strong durability (every committed txn is on disk).
    • Cons: higher latency (waiting on disk I/O).
  • Infrequent fsyncs (batched, async, configurable)

    • Pros: faster throughput.
    • Cons: risk of losing recent commits if power is lost.

This is why many DBs expose durability knobs: you can tune how often fsync happens depending on your SLA and risk tolerance.
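
A durability knob of this kind can be sketched as follows. The policy names are borrowed from Redis's appendfsync setting, but the class PolicyLog is hypothetical and simplified: it checks the clock only when a record is appended, whereas a real server would typically sync from a background thread.

```python
import os
import time

class PolicyLog:
    """Hypothetical append-only log with an fsync policy knob."""

    def __init__(self, path: str, policy: str = "everysec") -> None:
        if policy not in ("always", "everysec", "no"):
            raise ValueError(f"unknown policy: {policy}")
        self.policy = policy
        self.fsync_calls = 0                 # counter kept only for illustration
        self._last_sync = time.monotonic()
        self._f = open(path, "ab")

    def append(self, record: bytes) -> None:
        self._f.write(record + b"\n")
        self._f.flush()                      # hand the bytes to the OS page cache
        if self.policy == "always":
            self._sync()                     # pay the disk latency on every record
        elif self.policy == "everysec":
            if time.monotonic() - self._last_sync >= 1.0:
                self._sync()                 # at most about one sync per second
        # policy == "no": never fsync; the kernel flushes when it chooses

    def _sync(self) -> None:
        os.fsync(self._f.fileno())
        self.fsync_calls += 1
        self._last_sync = time.monotonic()

    def close(self) -> None:
        self._f.close()
```

With "always", every append waits on the disk; with "everysec", appends made between syncs are the ones at risk if power is lost; with "no", the size of the loss window is up to the kernel.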


Why It Matters

In a system design interview or in production, durability isn’t abstract; it’s about fsync policy:

  • Banking system? → fsync every commit.
  • Analytics pipeline? → batch fsyncs, accept potential data loss.
  • Cache/session store? → skip fsync altogether, optimize for speed.

Key Takeaways

  • Durability depends on when/if fsync is called.
  • Different DBs expose this as configuration knobs.
  • The tradeoff is always latency vs safety.

In short: if you care about your data surviving a crash, you should care about fsync.