To Disk and Back rm: A File’s Tale

“I will take the data to the disk, though I do not know the way.”

This is the journey of a humble string — "hello\n" — from the moment it leaves userland until it rests on disk, only to be swept away again by rm.

We’re still using our logical disk (/dev/loop7 formatted with ext4) from the previous article.


1. The Call to Adventure: write()

When a process calls:

write(fd, "hello\n", 6);

  • The syscall enters the VFS (Virtual Filesystem) layer.
  • The VFS resolves the file descriptor to an open file and its inode, then hands the request to ext4's write path.
  • Instead of writing straight to disk, the kernel copies the data into the page cache (RAM) and marks those pages dirty.
  • The system call returns as soon as that copy is done. The application thinks the write is “done,” but the data is only cached and would be lost in a power failure.
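
Because a successful write() only means the bytes reached the page cache, a program that needs durability has to ask for it explicitly. Here is a minimal sketch in C, assuming POSIX semantics; the helper name write_durably is hypothetical and not part of any library:

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: write len bytes to path, then wait until the kernel
 * reports that they have been flushed to the storage device.
 * Returns 0 on success, -1 on any error. */
int write_durably(const char *path, const void *buf, size_t len)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    const char *p = buf;
    size_t left = len;
    while (left > 0) {
        /* write() returns once the data is in the page cache, not on disk.
         * It may also accept fewer bytes than requested, hence the loop. */
        ssize_t n = write(fd, p, left);
        if (n < 0) {
            close(fd);
            return -1;
        }
        p += n;
        left -= (size_t)n;
    }

    /* fsync() blocks until the dirty pages of this file are written back. */
    if (fsync(fd) < 0) {
        close(fd);
        return -1;
    }
    return close(fd);
}
```

Calling write_durably("hello.txt", "hello\n", 6) returns 0 only after fsync() has reported success. The sketch does not retry on EINTR and does not fsync the parent directory, which a careful program would also do for a newly created file.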

The Fellowship has left the Shire, but the journey is only beginning.


2. Through the Page Cache

  • Data sits in RAM, waiting.
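  • Kernel writeback threads flush the dirty pages later: when a timer expires, when too much dirty memory accumulates, or when the application calls fsync().
  • ext4 must also decide which disk blocks will store this file. With delayed allocation (the default), it postpones that decision until writeback.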

This is where extents come in:

  • Instead of listing every block individually, ext4 allocates a range:

    [logical block 0 → physical block 1000, length=1]
  • The inode is updated to point to this extent.

  • Metadata changes (inode size, block bitmap) are bundled together into a single transaction.
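
To make the extent idea concrete, here is a simplified sketch in C of how a file-relative block number could be mapped to a disk block. The struct and function names are hypothetical; ext4's real on-disk struct ext4_extent records the same three facts (logical start, length, physical start) in a more compact layout.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified, hypothetical extent: one run of contiguous disk blocks. */
struct extent {
    uint32_t logical_start;   /* first file block covered by this extent */
    uint32_t length;          /* number of contiguous blocks in the run */
    uint64_t physical_start;  /* first disk block of the run */
};

/* Map a file-relative block number to a disk block.
 * Returns 1 and stores the result in *physical if some extent covers it,
 * or 0 if the block falls in a hole (no extent covers it). */
int extent_lookup(const struct extent *ext, size_t count,
                  uint32_t logical, uint64_t *physical)
{
    for (size_t i = 0; i < count; i++) {
        if (logical >= ext[i].logical_start &&
            logical - ext[i].logical_start < ext[i].length) {
            *physical = ext[i].physical_start +
                        (logical - ext[i].logical_start);
            return 1;
        }
    }
    return 0;
}
```

For our 6-byte file, a single entry {0, 1, 1000} is enough: looking up logical block 0 yields disk block 1000. A large contiguous file needs one entry per run rather than one entry per block, which is where the savings come from.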


3. The Journal of Gondor (err, ext4)

Before updating the real structures, ext4 writes to the journal (like a database's write-ahead log):

  • Transaction record: “inode X now points to block 1000, file size=6.”
  • Written sequentially to the journal area, followed by a commit record.
  • Only after the commit record is on disk may ext4 write the metadata to its final location (checkpointing).

Journaling modes:

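  • Ordered (data=ordered, default): Only metadata is journaled, but data blocks are written to disk before the metadata that references them is committed.
  • Writeback (data=writeback): Only metadata is journaled, with no ordering against data writes. Faster, riskier: after a crash a file can contain stale blocks.
  • Journal (data=journal): Data + metadata journaled. Slowest, safest.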

If a crash occurs:

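  • Before commit → the incomplete transaction is ignored during recovery, so nothing changes (as if Frodo never left).
  • After commit but before checkpoint → journal replay at the next mount reapplies the transaction and restores consistency.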
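
The commit rule can be illustrated with a toy in-memory log in C. This is only a sketch of the write-ahead idea, not ext4's actual journal format (that is handled by the kernel's JBD2 layer); every name below is hypothetical.

```c
#include <stddef.h>

enum rec_type { REC_UPDATE, REC_COMMIT };

/* Toy journal record. */
struct record {
    enum rec_type type;
    int txn;    /* transaction id */
    int slot;   /* metadata slot to change (REC_UPDATE only) */
    int value;  /* new value for that slot (REC_UPDATE only) */
};

/* Replay the log after a "crash": apply an update only if a commit record
 * for the same transaction appears later in the log.
 * Returns the number of updates applied to meta[]. */
int journal_replay(const struct record *log, size_t n,
                   int *meta, size_t meta_len)
{
    int applied = 0;
    for (size_t i = 0; i < n; i++) {
        if (log[i].type != REC_UPDATE)
            continue;

        /* Search forward for this transaction's commit record. */
        int committed = 0;
        for (size_t j = i + 1; j < n; j++) {
            if (log[j].type == REC_COMMIT && log[j].txn == log[i].txn) {
                committed = 1;
                break;
            }
        }
        if (!committed)
            continue;  /* crash happened before commit: discard */

        if (log[i].slot >= 0 && (size_t)log[i].slot < meta_len) {
            meta[log[i].slot] = log[i].value;
            applied++;
        }
    }
    return applied;
}
```

If transaction 1 logs "slot 0 = 1000" and "slot 1 = 6" followed by a commit, while transaction 2 logs an update and then the machine crashes, replay applies the two updates of transaction 1 and silently drops transaction 2.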


4. The Return Journey: rm hello.txt

When you rm hello.txt, no orc comes to overwrite the data immediately. Instead:

  1. Unlink call:

    • rm calls unlink() (GNU rm uses the equivalent unlinkat()).
    • Directory entry (name → inode) is removed.
  2. Link count:

    • Inode’s link count decrements.
    • If it hits zero (no other hard links) and no process still has the file open, the inode can be freed.
  3. Freeing blocks:

    • Blocks marked free in the block bitmap.
    • Inode marked free in the inode bitmap.
  4. But the data remains!

    • Until new files overwrite those blocks, the actual bytes still sit on disk.
    • Tools like strings (run against the raw device) or photorec can sometimes recover them.
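
The split between removing the name and freeing the inode can be observed from userland. The sketch below, in C with a hypothetical helper name, unlinks a file while it is still open: the link count drops to zero, yet the bytes stay readable through the descriptor, because the kernel defers freeing the inode until the last open descriptor is closed. It assumes a local Linux filesystem; some network filesystems behave differently.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical demo: create path containing "hello\n", unlink it while the
 * descriptor is still open, then read the contents back through that
 * descriptor. Stores the post-unlink link count in *nlink.
 * Returns the number of bytes read, or -1 on error. */
ssize_t read_after_unlink(const char *path, char *out, size_t cap, long *nlink)
{
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;

    if (write(fd, "hello\n", 6) != 6 || unlink(path) < 0) {
        close(fd);
        return -1;
    }

    /* The name is gone, but the inode lives on while fd is open. */
    struct stat st;
    if (fstat(fd, &st) < 0) {
        close(fd);
        return -1;
    }
    *nlink = (long)st.st_nlink;

    ssize_t n = pread(fd, out, cap, 0);  /* read from offset 0 */
    close(fd);                           /* last reference: inode is freed now */
    return n;
}
```

After the call, the path no longer exists, the reported link count is expected to be 0 on a local Linux filesystem, and the buffer still holds the six bytes that were written.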

The Fellowship dissolved, but traces of their passing remain in Middle-earth.


5. Key Takeaways

  • write() is buffered: the syscall returns once the data is in RAM, so call fsync() when you need it on disk.
  • Extents make block allocation efficient.
  • Journaling protects metadata consistency, not necessarily your file’s contents.
  • rm only unlinks — the data lingers until reused.