I deal with a lot of big files at work, and storage capacity is not infinite. So it’s in my interest to keep file sizes as small as possible.
One way to achieve that is compression. Especially with log files or database archives, you can save a ton of space with the right compression tool.
But space saving is not the only consideration.
You also need to weigh other factors, such as:
- File type: different tools compress different types of files differently
- CPU multi-core capabilities
- Compression speed
- Compressed size
- Decompression time
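To make the last three factors concrete, here is a minimal sketch of how they could be measured. It is not the benchmark script itself: it uses the codecs bundled with Python (gzip, bz2 and lzma) rather than the command-line tools, and the `measure` helper and the sample payload are made up for illustration.

```python
import bz2
import gzip
import lzma
import time


def measure(name, compress, decompress, data):
    """Time one compress/decompress round trip and report the resulting size."""
    start = time.perf_counter()
    packed = compress(data)
    compress_s = time.perf_counter() - start

    start = time.perf_counter()
    restored = decompress(packed)
    decompress_s = time.perf_counter() - start

    # A benchmark is only meaningful if the round trip is lossless.
    if restored != data:
        raise ValueError(f"{name}: round trip did not reproduce the input")

    return {
        "codec": name,
        "compress_s": compress_s,
        "decompress_s": decompress_s,
        "size_bytes": len(packed),
        "ratio": len(packed) / len(data),
    }


# Hypothetical sample: repetitive log-like text, which compresses well.
sample = b"2020-01-01 12:00:00 INFO request served in 12 ms\n" * 20000

results = [
    measure("gzip", gzip.compress, gzip.decompress, sample),
    measure("bzip2", bz2.compress, bz2.decompress, sample),
    measure("xz", lzma.compress, lzma.decompress, sample),
]

for row in results:
    print(row)
```

The numbers depend heavily on the input, which is exactly why the file type is on the list above. These library codecs are also single-threaded, so the multi-core factor does not show up in this sketch.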
But there are so many great compression tools available on Unix / Linux that choosing one can be really confusing, even for a seasoned expert.
So I created X Compression Tool Benchmarker to help with this. Its features:
- Test any kind of file: just pass the file’s name as the parameter when calling the script, and it will be tested against all the specified compression tools.
- Add more compression tools easily: just edit the compressor_list & ext_file variables, and that’s it
- Fire and forget: just run the script and forget it. It will run without needing any intervention
- CSV output: ready to be opened with LibreOffice / Excel and made into graphs in seconds.
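The general shape of such a benchmarker can be sketched in a few lines. This is only an illustration under stated assumptions, not the actual script: `COMPRESSOR_LIST` and `EXT_FILE` are hypothetical stand-ins for the compressor_list and ext_file variables, the sample file is generated on the fly, and only tools that accept the common `-c` (write to stdout) option are listed. Tools that are not installed are skipped.

```python
import csv
import io
import os
import shutil
import subprocess
import tempfile
import time

# Hypothetical stand-ins for the script's compressor_list and ext_file
# variables; the real script may define them differently.
COMPRESSOR_LIST = ["gzip", "pigz", "bzip2", "xz", "zstd"]
EXT_FILE = {"gzip": "gz", "pigz": "gz", "bzip2": "bz2", "xz": "xz", "zstd": "zst"}
FIELDS = ["tool", "seconds", "original_bytes", "compressed_bytes"]


def benchmark(path, out_dir):
    """Compress `path` once with every installed tool; return one row per tool."""
    rows = []
    for tool in COMPRESSOR_LIST:
        if shutil.which(tool) is None:
            continue  # not installed: skip rather than abort the whole run
        target = os.path.join(out_dir, f"sample.{tool}.{EXT_FILE[tool]}")
        start = time.perf_counter()
        with open(target, "wb") as out:
            # -c sends the compressed stream to stdout and leaves the input alone.
            subprocess.run([tool, "-c", path], stdout=out,
                           stderr=subprocess.DEVNULL, check=True)
        rows.append({
            "tool": tool,
            "seconds": round(time.perf_counter() - start, 3),
            "original_bytes": os.path.getsize(path),
            "compressed_bytes": os.path.getsize(target),
        })
    return rows


def to_csv(rows):
    """Render the rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


with tempfile.TemporaryDirectory() as work_dir:
    # Hypothetical input file standing in for a real log or dump.
    sample_path = os.path.join(work_dir, "sample.log")
    with open(sample_path, "wb") as handle:
        handle.write(b"GET /index.html 200 1532\n" * 50000)
    rows = benchmark(sample_path, work_dir)
    report = to_csv(rows)

print(report)
```

Adding a tool in this sketch means adding one entry to each of the two variables, which mirrors the "edit two variables" idea above. Tools with a different command-line style would need their own invocation.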
Here’s a sample result for a database archive file (a MySQL dump):
The bar chart at the top of this article is based on this result.
As you can see, the script currently benchmarks the following compression tools automatically: pigz, gzip, bzip2, pbzip2, lrzip, rzip, zstd, pixz, plzip, xz
The results for different file types may surprise you.
For example, I was surprised to see rzip beat lrzip, because lrzip is supposed to be an enhancement of rzip.
Then I was even more surprised to find out that:
- I was testing Debian Buster’s version of rzip, which turned out to be pretty old; it does not even have multi-thread/multi-core capability
- But when I tested the latest version of rzip, which can use all 16 cores in my server, it turned out to be slower than the old rzip from Debian Buster!
- No, disk speed is not an issue: I made sure that all the benchmarks were run from an NVMe SSD
So I was grinning at how Debian Buster packaged a very old version of rzip instead of the new one, but it turned out the joke’s on me: the old rzip performs better than the new one, even without multi-core capability.
It was also amazing to see how really REALLY fast zstd is, while still giving a decent compressed size. When you absolutely need compression speed, this not-so-well-known compression tool turned out to be the clear winner.
And so on.
Yes, I had fun indeed.
I hope you will too. Enjoy!
UPDATE: My friend, Eko Juniarto, published his results here and has permitted me to publish them here as well. Thanks! Very interesting indeed.