Archival Compression Comparisons

December 22, 2025

I need to archive roughly 1 TB of data as a write-once, read-almost-never redundancy layer for files I care about. S3 Glacier Deep Archive is extremely cheap (~$0.002/GB/month), but at that scale it is still worth compressing the data as aggressively as possible beforehand.

The data consists almost entirely of large CSV files containing high-entropy numeric data. This post documents a few practical benchmarks to determine the most sensible zstd settings for this workload.
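
For reference, the commands in the tables below are plain zstd invocations. A minimal sketch of the kind of pipeline involved looks like this; the tar step and the paths are illustrative assumptions rather than the exact setup, but the zstd flags are the ones benchmarked:

    # Assumption: the corpus is bundled with tar and piped straight into zstd.
    # /data/corpus and archive.tar.zst are placeholder names.
    tar -cf - /data/corpus \
      | zstd -19 --long=31 -T8 -o archive.tar.zst
    # -19        compression level (the non-ultra setting tested below)
    # --long=31  long-distance matching with a 2 GiB window, helps across many similar files
    # -T8        use 8 worker threads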

Hardware

Dataset 1: Large, Multi-file Corpus

Compressor      Command                      Time      Compression   Final size
zstd            -19 --long=31 -T8            41m 40s   6.45×         2.70 GiB
zstd (ultra)    --ultra -22 --long=31 -T8    2h 48m    6.54×         2.67 GiB

The --ultra setting provides only a ~1.3% reduction in size at the cost of more than a 4× increase in runtime. This is a clear case of diminishing returns.
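
One operational caveat, not from the benchmark itself but standard zstd behaviour: frames written with --long=31 use a window larger than the decoder's default limit, so the flag has to be passed again when restoring, otherwise zstd refuses to decompress and asks for a larger window or memory limit.

    # archive.tar.zst is the placeholder name from the sketch above.
    zstd -d --long=31 archive.tar.zst -o corpus.tar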

Dataset 2: Single-file Benchmark

zstd level   Time (s)   Final size   Ratio
1            1.4        164 MiB      15.2%
2            1.4        170 MiB      15.9%
3            1.5        177 MiB      16.4%
4            1.7        177 MiB      16.5%
5            2.8        161 MiB      15.0%
6            3.3        150 MiB      13.9%
7            5.0        149 MiB      13.9%
8            5.2        135 MiB      12.6%
9            6.9        136 MiB      12.6%
10           12.0       135 MiB      12.6%
11           16.9       134 MiB      12.4%
12           19.8       134 MiB      12.5%
13           17.0       132 MiB      12.2%
14           20.1       130 MiB      12.1%
15           28.4       130 MiB      12.1%
16           30.6       110 MiB      10.2%
17           55.7       111 MiB      10.3%
18           76.1       108 MiB      10.0%
19           122.3      107 MiB      10.0%
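
The sweep above is easy to reproduce with a small shell loop. The sketch below is a hypothetical harness, not necessarily the one used for these numbers; it assumes GNU time and GNU stat, and input.csv stands in for the actual file:

    # Compress at each level, printing elapsed time and output size.
    for level in $(seq 1 19); do
        /usr/bin/time -f "level $level: %e s" \
            zstd -q -f -"$level" input.csv -o input.csv.zst
        stat -c "level $level: %s bytes" input.csv.zst
    done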

Conclusion

Strangely, levels 8 to 15 offer almost identical compression (12.6% down to 12.1%) but at vastly different speeds (roughly a 5× increase in runtime). The two highest non-ultra levels (18 and 19) do shave off a bit more, and given my requirements (S3 Deep Archive requires files to be kept for at least 6 months anyway), it could make sense to use those settings. But at a 10-20× increase in processing time... I'd rather not brutalise my CPU over the next few days to save a dollar a year at best.
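
To put "a dollar a year" in perspective, here is a back-of-the-envelope estimate using the ~$0.002/GB/month figure from the intro and assuming the single-file ratios carry over to the full 1 TB (a rough extrapolation, not a measured result):

    level 15: ~12.1% of 1 TB ≈ 121 GB  →  121 GB × $0.002 × 12 ≈ $2.90/year
    level 19: ~10.0% of 1 TB ≈ 100 GB  →  100 GB × $0.002 × 12 ≈ $2.40/year
    difference ≈ $0.50/year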

