[email protected] wrote:
do you already have numbers, opinions and maybe a comparison in
reliability, speed, compression ratio etc. against the new zstd?
I have used unzcrash to test the ability of the zstd decoder to detect
corruption by itself (without a checksum), and the results are not good.
As an example, here are the results of repeatedly decompressing the file
COPYING.zst (a copy of the GPLv3), inverting a different bit each time so
as to test all possible bit flips:
  11913 bytes tested
  95304 total decompressions
  56058 decompressions returned with zero status, of which
  56017 comparisons failed
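The figures above are internally consistent, as a quick sanity check shows
(a small Python sketch; the variable names are mine, not unzcrash's):

```python
# Sanity check of the quoted figures: 11913 bytes give
# 11913 * 8 = 95304 single-bit flips, one decompression each.
# "Zero status" means the decoder reported success; a failed
# comparison on top of that is a silent (undetected) corruption.
bytes_tested = 11913
total = bytes_tested * 8            # 95304 decompressions
zero_status = 56058                 # decoder reported success
failed_cmp = 56017                  # output differed from the original

detected = total - zero_status      # decoder returned an error
print(total)                        # 95304
print(detected / total)             # ~0.412: under half detected
print(failed_cmp / total)           # ~0.588: silent corruptions
```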
The zstd decoder detects the corruption less than half of the time.
Compare this with the lzip decoder, which detects about 99.99995% of the
bit flips even without the help of its 3-factor integrity checking.
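For readers unfamiliar with unzcrash, the test it performs can be sketched
as follows. This uses zlib as a stand-in decoder because zstd is not in
the Python standard library (and zlib streams carry an Adler-32 checksum,
so its detection rate will be far better than zstd's without a check);
the point is the harness, not the numbers:

```python
# Sketch of the exhaustive bit-flip test performed by unzcrash:
# for each bit of the compressed file, flip it, try to decompress,
# and if the decoder reports success, compare the output with the
# original data.  Success plus a failed comparison is an undetected
# corruption (false negative).
import zlib

def bit_flip_test(data: bytes):
    compressed = bytearray(zlib.compress(data))
    total = zero_status = failed_cmp = 0
    for i in range(len(compressed) * 8):
        compressed[i // 8] ^= 1 << (i % 8)     # flip one bit
        total += 1
        try:
            out = zlib.decompress(bytes(compressed))
            zero_status += 1                   # decoder saw nothing wrong
            if out != data:
                failed_cmp += 1                # silent corruption
        except zlib.error:
            pass                               # corruption detected
        compressed[i // 8] ^= 1 << (i % 8)     # restore the bit
    return total, zero_status, failed_cmp
```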
Using 'zstd --no-check' is significantly less safe than using 'xz
--check=none'.
Even with integrity checking enabled, my guess is that an undetected
corruption (false negative) is at least a million times more likely with
zstd than with lzip.
The zstd file format has many of the defects of the xz format[1]:
unprotected lengths, unprotected flags, unprotected dictionary IDs,
optional integrity checking, optional file concatenation, and it does
not seem to admit trailing data. Also the current version of the zstd
file format is 0.2.0, which may mean that changes in the format are
expected.
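To illustrate what "unprotected flags" and "unprotected dictionary IDs"
mean, here is a sketch of parsing a zstd frame header. It follows the
current format specification (RFC 8878; the 0.2.0 draft discussed above
may differ in details). Note that no checksum covers these header fields
themselves: a bit flip in the descriptor byte or the dictionary ID
changes how the rest of the frame is interpreted, with no way to detect
it from the header alone:

```python
# Minimal zstd frame header parser (RFC 8878 layout).  The magic
# number and the Frame_Header_Descriptor byte carry the flags that
# control parsing of the rest of the frame, and none of them is
# protected by any checksum of its own.
import struct

ZSTD_MAGIC = 0xFD2FB528                  # stored little-endian on disk

def parse_frame_header(buf: bytes):
    magic, = struct.unpack_from('<I', buf, 0)
    assert magic == ZSTD_MAGIC, 'not a zstd frame'
    desc = buf[4]                        # Frame_Header_Descriptor
    fcs_flag = desc >> 6                 # Frame_Content_Size field size code
    has_checksum = bool((desc >> 2) & 1) # Content_Checksum_flag (optional!)
    did_flag = desc & 3                  # Dictionary_ID field size code
    return {'fcs_flag': fcs_flag,
            'has_checksum': has_checksum,
            'dict_id_bytes': (0, 1, 2, 4)[did_flag]}
```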
Zstd is described as a "fast real-time compression algorithm". AFAIK,
its author does not recommend zstd for long-term archiving.
So my advice is that you should not use zstd for long-term archiving.
[1] http://www.nongnu.org/lzip/xz_inadequate.html
Juan Francisco Cantero Hurtado asked me if I knew why the tests of zstd
take so long to finish.
It seems that 'make test' takes a long time (17 min) because it is a
full regression test, not just a small test with a few files to verify
that compilation went well, as most programs provide. The theoretical
basis of zstd[2] seems more complicated than that of LZMA, and the author
probably wants to make sure that any possible bug is caught early.
[2] https://arxiv.org/abs/1311.2540 Asymmetric numeral systems: entropy
coding combining speed of Huffman coding with compression rate of
arithmetic coding.
Best regards,
Antonio.
_______________________________________________
Lzip-bug mailing list
[email protected]
https://lists.nongnu.org/mailman/listinfo/lzip-bug