π LSM Tree vs B-Tree
When databases choose a storage/indexing structure, they usually pick between B-trees (classic choice for OLTP databases) and LSM trees (modern choice for write-heavy keyβvalue stores).
Both solve the same fundamental problem β organizing sorted keyβvalue pairs on disk β but they make different trade-offs based on whether the workload is read-heavy or write-heavy.
π Side-by-Side Comparisonβ
Aspect | LSM Tree | B-Tree (and B+Tree) |
---|---|---|
Write Pattern | Appends new data to memory (MemTable) + WAL, later flushed as immutable sorted SSTables β sequential writes. | Updates/insertions happen in place on disk β random writes. |
Read Pattern | May need to check MemTable + several SSTables. Uses Bloom filters + indexes to cut down search. Reads can be slower if many levels exist. | Generally faster point lookups (1β2 disk seeks), since data is in one structure with well-defined pages. |
Range Scans | Good (files are sorted). Might need to merge results across SSTables. | Very good β data locality in leaf pages makes range scans efficient. |
Compaction / Maintenance | Requires background compaction (merging/deleting across SSTables). This adds write amplification. | No compaction step β nodes are balanced as part of normal operations. |
Write Amplification | High (same data written multiple times at different levels). | Lower (modifies pages once, though splits can cascade). |
Read Amplification | Higher if many SSTables exist. Mitigated by Bloom filters and leveled compaction. | Lower β usually a single tree traversal. |
Memory Usage | Needs RAM for MemTable + Bloom filters + indexes. | Needs RAM for buffer cache to keep hot pages, but no Bloom filters. |
Workload Fit | Optimized for write-heavy workloads (e.g., logging, time series, streaming ingestion, event data). | Optimized for read-heavy and OLTP workloads (e.g., banking, ERP, indexes for relational databases). |
Examples | Cassandra, ScyllaDB, HBase, LevelDB, RocksDB. | MySQL (InnoDB), PostgreSQL, Oracle, SQL Server. |
π¦ In Practiceβ
- LSM tree: βTake writes now, sort/merge later.β Best for high-ingest, append-heavy workloads.
- B-tree: βKeep data balanced as we go.β Best for low-latency queries and frequent point lookups.
LSM Tree:
B-Tree: