Benchmarks
What pg-xpatch's compression and speed look like in practice, drawn from benchmarks of the xpatch delta library and of pgit, a project built on pg-xpatch.
Numbers depend on your data and your hardware, so treat these as direction, not promises. The honest way to size pg-xpatch is to load representative data and read xpatch.stats(). With that said, two published benchmarks show the shape of what to expect.
Real-world storage: pgit
pgit imports real Git repositories into PostgreSQL and stores their history through pg-xpatch, which makes it a good proxy for "how well does this compress actual versioned data." Its benchmark covers 20 open-source repos totaling 7.3 GB of raw history, measured against Git's own aggressive pack compression.
| Repository | pgit | git aggressive pack |
|---|---|---|
| serde | 51.6 | 36.5 |
| fzf | 71.1 | 61.3 |
| git | 66.8 | 82.0 |
On most repositories pgit matches or beats Git's packfiles, which are a high bar for delta compression. On the Git repository itself it lands behind, the price of living in a general-purpose database rather than a bespoke format. Across all 20 repos, PostgreSQL overhead averaged about 22% (ranging from 10% to 40%).
Raw delta performance: the xpatch library
Underneath, the xpatch library computes and applies the deltas. Its synthetic benchmark pits it against other delta algorithms, and on compression it sits near the top:
| Algorithm | Saved % |
|---|---|
| qbsdiff | 84.4 |
| xpatch | 74.6 |
| gdelta_zstd | 74.6 |
| gdelta_lz4 | 70.6 |
| gdelta | 68.4 |
| vcdiff | 64.2 |
| zstd_dict | 55.0 |
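Decode is the operation pg-xpatch runs on every cache miss, so its speed sets the cost of reconstructing a version before the cache can serve it. xpatch decodes at roughly 2.3 GB/s, which puts it mid-pack: behind gdelta, gdelta_lz4, and vcdiff, and ahead of gdelta_zstd, qbsdiff, and zstd_dict.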
| Algorithm | Decode MB/s |
|---|---|
| gdelta | 4400 |
| gdelta_lz4 | 3900 |
| vcdiff | 2600 |
| xpatch | 2300 |
| gdelta_zstd | 2200 |
| qbsdiff | 111 |
| zstd_dict | 16 |
The synthetic benchmark ran with the whole dataset resident in CPU cache (L3), so the encode and decode figures reflect the algorithm alone, with no memory or disk pressure. In practice, a read also has to fetch the deltas and base versions from memory or disk, and these figures do not include that cost. Treat them as an upper bound, not what a cold read will see.
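To put the throughput table in per-read terms, the following is a minimal sketch that turns a decode rate into a best-case time for reconstructing one version. It assumes 1 MB means 10^6 bytes and that throughput is counted in bytes of reconstructed output; the benchmark does not state either, and the function name and the 1 MB example payload are illustrative only. Fetch cost from memory or disk is not modelled.

```python
# Decode throughput in MB/s, copied from the table above
# (synthetic benchmark, dataset resident in CPU cache).
DECODE_MB_PER_S = {
    "gdelta": 4400,
    "gdelta_lz4": 3900,
    "vcdiff": 2600,
    "xpatch": 2300,
    "gdelta_zstd": 2200,
    "qbsdiff": 111,
    "zstd_dict": 16,
}


def decode_time_ms(payload_bytes: int, mb_per_s: float) -> float:
    """Best-case decode time in milliseconds at the given throughput.

    Assumption: 1 MB = 10**6 bytes, and throughput refers to output bytes.
    """
    if mb_per_s <= 0:
        raise ValueError("throughput must be positive")
    bytes_per_s = mb_per_s * 1_000_000
    return payload_bytes / bytes_per_s * 1000.0


if __name__ == "__main__":
    payload = 1_000_000  # hypothetical 1 MB reconstructed version
    for name, rate in DECODE_MB_PER_S.items():
        print(f"{name:12s} {decode_time_ms(payload, rate):8.3f} ms")
```

At these rates the faster algorithms all decode a payload of this size in well under a millisecond, so the differences between them matter less than whether the read hits the cache at all.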
Encoding is middling (around 306 MB/s, behind the gdelta family), but it runs once per version at write time, where it is rarely the bottleneck. On real Git histories the compression climbs well past the synthetic figures: 97.5% saved on mdn/content and 97.9% on tokio, because consecutive versions barely differ.
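A "saved %" figure is easier to compare once it is converted to a compression ratio, because a few points near the top of the scale mean a large difference in stored size. The sketch below does that conversion. It assumes "saved %" means 100 × (1 − stored / raw), which the benchmark tables imply but do not define; the function name is illustrative.

```python
def saved_pct_to_ratio(saved_pct: float) -> float:
    """Convert a 'saved %' figure into a raw:stored compression ratio.

    Assumes saved % = 100 * (1 - stored / raw), so ratio = 100 / (100 - saved %).
    """
    if not 0 <= saved_pct < 100:
        raise ValueError("saved_pct must be in [0, 100)")
    return 100.0 / (100.0 - saved_pct)


if __name__ == "__main__":
    # Figures quoted above: synthetic xpatch, then mdn/content and tokio.
    for label, pct in [("synthetic", 74.6), ("mdn/content", 97.5), ("tokio", 97.9)]:
        print(f"{label:12s} {pct:5.1f}% saved -> {saved_pct_to_ratio(pct):5.1f}x")
```

Under that definition, 74.6% saved is a ratio of a little under 4x, while 97.5% saved is 40x.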
What this means for pg-xpatch
- Compression tracks similarity. Versioned text and code, the things pgit stores, hit the high end. Unrelated data does not. See when pg-xpatch pays off.
- Decode is cheap, so warm reads are cheap. The expensive step is the first reconstruction; after that the cache serves it.
- There is a database tax. A purpose-built format can still win on raw size. You trade some of that for living inside PostgreSQL, with SQL, indexes, and MVCC.
Measure your own
    SELECT total_rows, compression_ratio, raw_size_bytes, compressed_size_bytes
    FROM xpatch.stats('your_table');

The repo also ships a benchmark script under benchmark/ for an end-to-end storage and query comparison against a plain heap table.
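To line your own numbers up with the "Saved %" tables above, the two size columns can be turned into the same metric. This is a minimal sketch that assumes raw_size_bytes is the uncompressed size and compressed_size_bytes is what is actually stored; the column names come from the query above, but their exact semantics are an assumption here, and the helper name and sample values are hypothetical.

```python
def saved_pct(raw_size_bytes: int, compressed_size_bytes: int) -> float:
    """Percentage of raw bytes saved, from the two size columns.

    Assumption: raw_size_bytes is the uncompressed total and
    compressed_size_bytes is the stored total for the same rows.
    """
    if raw_size_bytes <= 0:
        raise ValueError("raw_size_bytes must be positive")
    return 100.0 * (1.0 - compressed_size_bytes / raw_size_bytes)


if __name__ == "__main__":
    # Hypothetical values standing in for one row returned by the query.
    print(f"{saved_pct(1_000_000, 254_000):.1f}% saved")
```

A negative result would mean the stored form is larger than the raw data, which is a sign that the rows are not similar enough for delta compression to help.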