Vivcre Learn learn it · write it · retain it


Database · #lsm #compaction #rum-conjecture #write-amplification #storage-engines

Compaction: The Triangle You Can't Escape

25 Jun 2026

An LSM tree keeps writes cheap by flushing them to immutable sorted files (SSTables) and merging those files later in the background. That merge step is compaction, and how you do it is the single most consequential design choice in an LSM engine. “Merges in the background” hides a real cost.
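At its core, a compaction is a k-way merge of sorted runs in which the newest version of each key wins. The sketch below shows that merge under stated assumptions: SSTables are modelled as in-memory lists of `(key, value)` pairs sorted by key, the tables are passed newest first, and `value=None` stands in for a tombstone. `compact` and `_tag` are hypothetical names, not any engine's API. Dropping tombstones is only safe when no older file outside the merge could still hold the key, for example at the bottom level.

```python
import heapq
from typing import Iterator, Optional

# Hypothetical in-memory stand-in for an SSTable: (key, value) pairs sorted by
# key, each key at most once per table. value=None marks a tombstone (a delete).
SSTable = list[tuple[str, Optional[str]]]


def _tag(table: SSTable, age: int) -> Iterator[tuple[str, int, Optional[str]]]:
    # Attach the table's age (0 = newest) so equal keys sort newest-first.
    for key, value in table:
        yield key, age, value


def compact(tables: list[SSTable], drop_tombstones: bool = True) -> SSTable:
    """Merge sorted SSTables (newest first) into one sorted output table."""
    streams = [_tag(table, age) for age, table in enumerate(tables)]
    out: SSTable = []
    prev_key: Optional[str] = None
    # heapq.merge streams the inputs in (key, age) order without loading all.
    for key, _age, value in heapq.merge(*streams, key=lambda e: (e[0], e[1])):
        if key == prev_key:
            continue  # an older version of a key we already resolved
        prev_key = key
        if value is None and drop_tombstones:
            continue  # the newest version is a delete: emit nothing
        out.append((key, value))
    return out


newest = [("a", "3"), ("c", None)]
older = [("a", "1"), ("b", "2"), ("c", "9")]
print(compact([newest, older]))  # [('a', '3'), ('b', '2')]
```

Every entry that survives is written again into the output, which is exactly where the rewrite cost discussed below comes from.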

Three amplifications, pick two

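Every time compaction merges files, it rewrites data: it reads the inputs and writes a fresh merged output. So a byte written once can be written to disk many times over its life. That ratio, bytes physically written ÷ bytes the application asked to write, is write amplification. It fights two siblings: read amplification (how many files, or bytes, a single lookup has to touch) and space amplification (bytes on disk ÷ bytes of live data, inflated by stale versions and deleted keys still waiting to be compacted away).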
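To make the ratios concrete, here is a minimal bookkeeping sketch. The `AmpCounters` class and its field names are hypothetical, not counters from any particular engine, and read amplification is simplified to "SSTables probed per point lookup" rather than bytes or blocks read. The numbers in the usage lines are illustrative, not measurements.

```python
from dataclasses import dataclass


@dataclass
class AmpCounters:
    """Hypothetical engine counters; field names are illustrative only."""
    user_bytes_written: int = 0  # bytes the application asked to write
    disk_bytes_written: int = 0  # flushes plus every compaction rewrite
    disk_bytes_stored: int = 0   # on-disk footprint, stale versions included
    live_bytes: int = 0          # latest version of every live key
    files_probed: int = 0        # SSTables touched by point lookups
    lookups: int = 0             # number of point lookups served

    def write_amp(self) -> float:
        return self.disk_bytes_written / self.user_bytes_written

    def space_amp(self) -> float:
        return self.disk_bytes_stored / self.live_bytes

    def read_amp(self) -> float:
        # Simplification: counts files probed, not bytes or blocks read.
        return self.files_probed / self.lookups


MB = 1 << 20
c = AmpCounters()
c.user_bytes_written += 10 * MB      # the application writes 10 MB once
c.disk_bytes_written += 10 * MB      # the memtable flush writes it to disk
c.disk_bytes_written += 3 * 10 * MB  # three later compactions each rewrite it
print(c.write_amp())  # 4.0
```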

The governing law, informally, is the RUM conjecture (Read, Update, Memory overhead): you get to minimize two of {read, write, space}, never all three. Every compaction strategy is just a different choice of which corner to sacrifice.

Size-tiered (STCS) — the write-heavy default

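Group SSTables by size; when about four files of roughly equal size pile up, merge them into one file about 4× bigger. Flushes drop 10 MB files; four of those become one 40 MB file; four of those become one 160 MB file; and so on. Each byte is rewritten only once per tier it climbs, so write amplification grows with the number of tiers, which is logarithmic in the data size: that is why it stays low. The price is paid elsewhere. A key can have versions in many files across (and within) tiers, so a read may have to check many files; stale versions linger until their tier next merges; and a merge needs room for its inputs and its output at the same time, which is where the transient ~2× space cost comes from.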
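The tiering arithmetic can be simulated in a few lines. This is a simplified sketch, assuming every flush produces a file of exactly `flush_mb`, merges fire at exactly `threshold` files per tier, and there are no overwrites or deletes (so a merged file is the sum of its inputs). `simulate_stcs` is a hypothetical helper, not Cassandra's or any other engine's implementation.

```python
def simulate_stcs(num_flushes: int, flush_mb: int = 10, threshold: int = 4) -> tuple[float, int]:
    """Return (write amplification, files on disk) after num_flushes flushes.

    Simplified size-tiered model: tier i holds files of flush_mb * threshold**i.
    """
    if num_flushes < 1:
        raise ValueError("need at least one flush")
    tiers = [0]  # tiers[i] = number of files currently in tier i
    user_mb = 0
    disk_mb = 0
    for _ in range(num_flushes):
        user_mb += flush_mb
        disk_mb += flush_mb  # the flush itself writes the data once
        tiers[0] += 1
        tier = 0
        # Cascade: `threshold` files in one tier merge into one file a tier up.
        while tier < len(tiers) and tiers[tier] >= threshold:
            tiers[tier] -= threshold
            disk_mb += flush_mb * threshold ** (tier + 1)  # merge rewrites all inputs
            if tier + 1 == len(tiers):
                tiers.append(0)
            tiers[tier + 1] += 1
            tier += 1
    return disk_mb / user_mb, sum(tiers)


print(simulate_stcs(64))  # (4.0, 1): one 640 MB file, each byte written 4 times
wa, files = simulate_stcs(63)
print(files)  # 9: three files in each of three tiers
```

With 64 flushes every byte is written once by the flush and once per tier it climbs (three merges), so write amplification is 4. One flush earlier, nothing has reached the top tier yet and a read may have to look in up to nine files.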

Leveled (LCS) — the read-heavy default

Impose a rule: within any one level, SSTables have non-overlapping key ranges. So a key appears at most once per level, and a read checks at most one file per level — worst case ≈ the number of levels (~5–7), not the number of files. Reads and space both win.
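The "at most one file per level" property falls out of the non-overlapping invariant: within a level, a binary search on each file's smallest key finds the only file whose range could contain the key. The sketch below assumes hypothetical `SSTableMeta` records (name plus key range) kept sorted by `min_key` in every level. It also treats every level as non-overlapping; engines such as RocksDB exempt level 0 from that rule, which this sketch ignores.

```python
import bisect
from dataclasses import dataclass


@dataclass(frozen=True)
class SSTableMeta:
    """Hypothetical file metadata: the key range an SSTable covers."""
    name: str
    min_key: str
    max_key: str


def candidate_files(levels: list[list[SSTableMeta]], key: str) -> list[str]:
    """Return, level by level, the single file in each level that could hold key.

    Assumes each level is sorted by min_key with non-overlapping ranges.
    """
    hits = []
    for level in levels:
        # A real engine keeps this index precomputed; rebuilt here for brevity.
        mins = [f.min_key for f in level]
        i = bisect.bisect_right(mins, key) - 1  # last file starting at or before key
        if i >= 0 and key <= level[i].max_key:
            hits.append(level[i].name)
    return hits


levels = [
    [SSTableMeta("L1-a", "a", "f"), SSTableMeta("L1-b", "g", "m")],
    [SSTableMeta("L2-a", "a", "c"), SSTableMeta("L2-b", "d", "h"),
     SSTableMeta("L2-c", "i", "z")],
]
print(candidate_files(levels, "e"))  # ['L1-a', 'L2-b']
print(candidate_files(levels, "n"))  # ['L2-c']
```

However many files a level holds, each lookup probes at most one of them, so the probe count is bounded by the number of levels.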

But keeping ranges non-overlapping is not cheap, and this is the part most people get backwards. When a small file is pushed down into a level, its key range straddles several existing files there; you must read all of them, merge, and write fresh non-overlapping files. Since each level holds ~10× the data of the one above, a small newcomer forces you to rewrite a slice of a level an order of magnitude bigger, and this happens again at every level the data passes through. A single key gets physically rewritten 10–40 times over its life. That is high write amplification: the corner leveled compaction pays.
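A back-of-envelope model shows where the 10–40× comes from. It is a rough estimate under loudly stated assumptions, not any engine's exact accounting: data is written once by the flush; each byte then crosses `num_levels - 1` level boundaries; and each crossing rewrites the byte itself plus roughly `fanout * overlap` bytes of the target level, because the target holds about `fanout` times more data over the same key range. `overlap=1.0` is the worst case; smaller values are a hypothetical knob for partial overlap.

```python
def leveled_write_amp(num_levels: int, fanout: int = 10, overlap: float = 1.0) -> float:
    """Rough bytes written per user byte under leveled compaction.

    Assumptions (illustrative only): one write for the flush, then each of
    the num_levels - 1 level boundaries rewrites the moved byte plus about
    fanout * overlap bytes of the overlapping slice in the next level.
    """
    if num_levels < 1:
        raise ValueError("need at least one level")
    per_boundary = 1 + fanout * overlap  # newcomer + overlapped slice of target
    return 1 + (num_levels - 1) * per_boundary


print(leveled_write_amp(4))               # 34.0
print(leveled_write_amp(4, overlap=0.5))  # 19.0
```

Under these assumptions the estimate lands in the tens, the same order as the figure quoted above, and it grows linearly with both the fanout and the number of levels.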

The whole topic in one shape

| Strategy | Write amp | Read amp | Space amp | Use when |
|---|---|---|---|---|
| Size-tiered | low | high | high (transient ~2×) | write-heavy, ingest-bound |
| Leveled | high (10–40×) | low (≤1 file/level) | low | read-heavy or space-constrained |

Two strategies, opposite corners of the same triangle. You can’t escape the triangle — you only choose which amplification you can afford. It’s the same shape as the B-tree vs LSM choice one level up: every storage engine is a variation on “which amplification can I afford to pay?”

