To Disk and Back rm: A File’s Tale
“I will take the data to the disk, though I do not know the way.”
This is the journey of a humble string, "hello\n", from the moment it leaves userland until it rests on disk, only to be swept away again by rm.
We’re still using our logical disk (/dev/loop7, formatted with ext4) from the previous article.
1. The Call to Adventure: write()
When a process calls:
write(fd, "hello\n", 6);
- The syscall enters the VFS (Virtual Filesystem).
- The kernel resolves the file descriptor to its open file object, which points at the file’s ext4 inode; VFS then hands the write to ext4.
- Instead of writing straight to disk, the kernel copies the data into the page cache (RAM) and marks those pages dirty.
- The system call returns as soon as that copy is done. The application thinks the write is “done,” but really it’s just cached.
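To make the “cached, not yet on disk” point concrete, here is a minimal sketch that performs the same 6-byte write and then calls fsync(), the POSIX call that blocks until the kernel reports the file’s data and metadata as written to stable storage. The helper name write_hello_durably is hypothetical, chosen for illustration only.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: write "hello\n" to path and force it to disk.
 * Returns the number of bytes write() accepted, or -1 on error. */
ssize_t write_hello_durably(const char *path)
{
    const char msg[] = "hello\n";
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    /* Returns once the bytes are copied into the page cache (RAM). */
    ssize_t n = write(fd, msg, strlen(msg));

    /* Blocks until this file's dirty pages and metadata are written back,
     * instead of waiting for the background flusher threads. */
    if (n < 0 || fsync(fd) < 0) {
        close(fd);
        return -1;
    }
    if (close(fd) < 0)
        return -1;
    return n;
}
```

Without the fsync() call the function would still return 6, which is exactly why a successful write() says nothing about durability.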
The Fellowship has left the Shire, but the journey is only beginning.
2. Through the Page Cache
- Data sits in RAM, waiting.
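- A background kernel thread (the writeback, or flusher, thread) eventually flushes it to disk.
- Meanwhile, ext4 must decide which disk blocks should store this file. With delayed allocation (the ext4 default), that decision is deferred until writeback time.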
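How long “eventually” is depends on kernel tunables. On Linux, vm.dirty_expire_centisecs sets how old dirty data must be before the flusher writes it out (commonly 3000, i.e. 30 seconds) and vm.dirty_writeback_centisecs sets how often the flusher wakes up (commonly 500). A minimal sketch for reading such a value; the helper name read_sysctl_long is hypothetical.

```c
#include <stdio.h>

/* Hypothetical helper: read a single integer from a procfs/sysctl file
 * such as /proc/sys/vm/dirty_expire_centisecs. Returns -1 on error. */
long read_sysctl_long(const char *path)
{
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    long value;
    int ok = fscanf(f, "%ld", &value);  /* sysctl files hold "N\n" */
    fclose(f);
    return ok == 1 ? value : -1;
}
```

On a typical untuned system, read_sysctl_long("/proc/sys/vm/dirty_expire_centisecs") would return 3000, but distributions and administrators may change it.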
This is where extents come in:
- Instead of listing every block individually, ext4 records a contiguous range (an extent) as a starting block plus a length:
[block 1000, length=1]
- The inode is updated to point to this extent.
- Metadata changes (inode size, free block bitmap) are bundled together into a single journal transaction.
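The extent idea above can be sketched as a lookup table. The struct below mirrors the field layout of ext4’s on-disk extent leaf (struct ext4_extent in the kernel’s fs/ext4/ext4_extents.h), but it is a simplified in-memory copy: it ignores little-endian byte order, and it scans a flat array linearly, whereas the kernel searches an extent tree. The names extent_len and logical_to_physical are hypothetical.

```c
#include <stdint.h>

/* Simplified mirror of ext4's on-disk extent entry (byte order ignored). */
struct extent {
    uint32_t ee_block;    /* first logical block the extent covers */
    uint16_t ee_len;      /* number of blocks covered */
    uint16_t ee_start_hi; /* high 16 bits of the first physical block */
    uint32_t ee_start_lo; /* low 32 bits of the first physical block */
};

/* Lengths above this value mark an uninitialized (preallocated) extent. */
#define EXT_INIT_MAX_LEN 32768u

static uint32_t extent_len(const struct extent *e)
{
    return e->ee_len <= EXT_INIT_MAX_LEN ? e->ee_len
                                         : e->ee_len - EXT_INIT_MAX_LEN;
}

/* Map a logical file block to a physical disk block. Returns 0 for an
 * unmapped block (a hole); 0 is only a sentinel in this sketch. */
uint64_t logical_to_physical(const struct extent *ext, int n, uint32_t lblk)
{
    for (int i = 0; i < n; i++) {
        uint32_t len = extent_len(&ext[i]);
        if (lblk >= ext[i].ee_block && lblk - ext[i].ee_block < len) {
            uint64_t start = ((uint64_t)ext[i].ee_start_hi << 32)
                             | ext[i].ee_start_lo;
            return start + (lblk - ext[i].ee_block);
        }
    }
    return 0;
}
```

For our file, a single entry {ee_block = 0, ee_len = 1, ee_start_lo = 1000} maps logical block 0 to disk block 1000; one such 12-byte entry can describe up to 32768 contiguous blocks, which is what makes extents compact.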
3. The Journal of Gondor (err, ext4)
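Before updating the real structures, ext4 writes to the journal (much like a database write-ahead log, or WAL):
- Transaction: “inode X now points to block 1000, file size=6.” (In practice, ext4’s journaling layer, jbd2, logs full copies of the modified metadata blocks rather than a logical description like this.)
- Written sequentially to the journal area, followed by a commit block.
- Only after the commit is on disk are the modified metadata blocks written to their home locations (checkpointing).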
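Journaling modes (chosen with the data= mount option):
- Ordered (default): file data is written to its final location before the metadata that references it is committed to the journal.
- Writeback: only metadata is journaled, with no ordering guarantee for data. Faster, but after a crash a file may contain stale data.
- Journal: Data + metadata journaled. Slowest, safest.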
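The mode is visible in the mount options, for example after mount -o data=journal /dev/loop7 /mnt. A minimal sketch that extracts it from a comma-separated option string such as the fourth field of a /proc/mounts line. The helper name ext4_data_mode is hypothetical, and treating a missing data= option as “ordered” is an assumption that holds only when the filesystem’s default mount options have not been changed.

```c
#include <string.h>

/* Hypothetical helper: return the data journaling mode named in an ext4
 * mount option string, assuming "ordered" when no data= option appears. */
const char *ext4_data_mode(const char *opts)
{
    static const char *const modes[] = { "journal", "ordered", "writeback" };
    const char *p = opts;
    while (p != NULL && *p != '\0') {
        const char *end = strchr(p, ',');             /* end of this option */
        size_t len = end ? (size_t)(end - p) : strlen(p);
        if (len > 5 && strncmp(p, "data=", 5) == 0) {
            for (size_t i = 0; i < sizeof modes / sizeof modes[0]; i++) {
                if (len - 5 == strlen(modes[i]) &&
                    strncmp(p + 5, modes[i], len - 5) == 0)
                    return modes[i];
            }
            return "unknown";
        }
        p = end ? end + 1 : NULL;                     /* next option */
    }
    return "ordered";
}
```

Note that options like data_err=abort are deliberately not matched, because they start with data_ rather than data=.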
If a crash occurs:
- Before commit → no change (as if Frodo never left).
- After commit but before checkpoint → journal replay at the next mount restores consistency.
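The two crash cases above follow from the commit rule, which a toy redo log can illustrate. This is a teaching sketch, not jbd2: the “disk” is an in-memory array, a journal record is a (block, value) pair rather than a full block image, and all names (toy_fs, tx_log, tx_commit, replay_or_checkpoint) are hypothetical.

```c
#include <stdbool.h>

#define NBLOCKS 8   /* metadata blocks in the toy "disk" */
#define JMAX    8   /* max records per transaction */

struct toy_fs {
    int home[NBLOCKS];                   /* metadata at its final location */
    struct { int blk, val; } rec[JMAX];  /* journaled new contents */
    int nrec;                            /* records in the open transaction */
    bool committed;                      /* commit record on "disk"? */
};

/* Step 1: log the new contents of a metadata block in the journal. */
void tx_log(struct toy_fs *fs, int blk, int val)
{
    if (fs->nrec < JMAX && blk >= 0 && blk < NBLOCKS) {
        fs->rec[fs->nrec].blk = blk;
        fs->rec[fs->nrec].val = val;
        fs->nrec++;
    }
}

/* Step 2: write the commit record; from here the transaction is durable
 * even though the home locations are still stale. */
void tx_commit(struct toy_fs *fs) { fs->committed = true; }

/* Checkpoint and crash replay do the same work: copy committed records
 * to their home locations. An uncommitted transaction is discarded. */
void replay_or_checkpoint(struct toy_fs *fs)
{
    if (fs->committed)
        for (int i = 0; i < fs->nrec; i++)
            fs->home[fs->rec[i].blk] = fs->rec[i].val;
    fs->nrec = 0;
    fs->committed = false;
}
```

Calling replay_or_checkpoint() before tx_commit() models a crash before commit (nothing changes); calling it after tx_commit() models replay after a crash before checkpoint (every logged change lands together).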
4. The Return Journey: rm hello.txt
When you rm hello.txt, no orc comes to overwrite the data immediately. Instead:
- Unlink call:
  - rm calls unlink() (modern GNU rm actually uses its sibling, unlinkat()).
  - The directory entry (name → inode) is removed.
- Link count:
  - The inode’s link count is decremented.
  - If it reaches zero (no other hard links) and no process still has the file open, the inode is freed.
- Freeing blocks:
  - Blocks are marked free in the block bitmap.
  - The inode is marked free in the inode bitmap.
- But the data remains!
  - Until new files overwrite those blocks, the actual bytes still sit on disk (unless the blocks are discarded via TRIM, for example on an SSD mounted with the discard option).
  - Tools like strings or photorec can sometimes recover them.
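The link-count rules in the steps above can be observed from userland with link(), unlink() and fstat(). A minimal sketch, assuming the paths passed in live on a filesystem that supports hard links; the function unlink_demo and struct unlink_report are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* What the experiment observes at each step. */
struct unlink_report {
    nlink_t links_with_two_names;      /* after link(a, b) */
    nlink_t links_after_first_unlink;  /* after unlink(a) */
    nlink_t links_while_orphaned;      /* after unlink(b), fd still open */
    ssize_t bytes_still_readable;      /* read back through the open fd */
};

/* Create a, hard-link it as b, then remove both names while holding the
 * file open. Returns 0 on success, -1 on any failure. */
int unlink_demo(const char *a, const char *b, struct unlink_report *r)
{
    char buf[16];
    struct stat st;
    int fd = open(a, O_RDWR | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;
    if (write(fd, "hello\n", 6) != 6 || link(a, b) < 0 || fstat(fd, &st) < 0) {
        close(fd);
        return -1;
    }
    r->links_with_two_names = st.st_nlink;

    /* Removing one directory entry only decrements the link count. */
    if (unlink(a) < 0 || fstat(fd, &st) < 0) { close(fd); return -1; }
    r->links_after_first_unlink = st.st_nlink;

    /* Last name gone: the count hits zero, but the inode survives. */
    if (unlink(b) < 0 || fstat(fd, &st) < 0) { close(fd); return -1; }
    r->links_while_orphaned = st.st_nlink;

    /* The data is still reachable through the open descriptor. */
    r->bytes_still_readable = pread(fd, buf, sizeof buf, 0);
    close(fd);  /* only now may the filesystem free the inode and blocks */
    return 0;
}
```

On Linux one would expect the report to read 2, 1, 0 and 6 bytes: the orphaned inode keeps its blocks until the final close().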
The Fellowship dissolved, but traces of their passing remain in Middle-earth.
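Recovery tools ultimately scan raw bytes. A minimal sketch of that idea, searching a file or device for "hello\n" in fixed-size chunks while carrying the tail of each chunk forward so a match straddling two chunks is not missed. It uses the glibc extension memmem(), hence _GNU_SOURCE; the name find_bytes is hypothetical, and reading a device such as /dev/loop7 normally requires root.

```c
#define _GNU_SOURCE  /* for memmem() */
#include <stdio.h>
#include <string.h>

#define CHUNK 4096

/* Hypothetical helper: return the byte offset of the first occurrence of
 * needle in path, or -1 if absent or on error. Needles longer than CHUNK
 * are not supported by this sketch. */
long long find_bytes(const char *path, const char *needle, size_t nlen)
{
    if (nlen == 0 || nlen > CHUNK)
        return -1;
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return -1;

    char buf[2 * CHUNK];   /* carried-over tail + one fresh chunk */
    size_t carry = 0;      /* bytes kept from the previous round */
    long long base = 0;    /* file offset of buf[0] */
    size_t got;
    while ((got = fread(buf + carry, 1, CHUNK, f)) > 0) {
        size_t avail = carry + got;
        char *hit = memmem(buf, avail, needle, nlen);
        if (hit != NULL) {
            fclose(f);
            return base + (hit - buf);
        }
        /* Keep the last nlen - 1 bytes: a match may continue next chunk. */
        size_t keep = avail < nlen - 1 ? avail : nlen - 1;
        memmove(buf, buf + avail - keep, keep);
        base += (long long)(avail - keep);
        carry = keep;
    }
    fclose(f);
    return -1;
}
```

A hit on the raw device after rm is plausible but not guaranteed: the blocks may already have been reused or discarded, and the file must have been written back before it was deleted for its bytes to reach the disk at all.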
5. Key Takeaways
- write() is buffered: the syscall returns once the data is in RAM; fsync() is what forces it to disk.
- Extents describe contiguous runs of blocks compactly, making block allocation efficient.
- Journaling protects metadata consistency, not necessarily your file’s contents.
- rm only unlinks; the data lingers until reused.