[email protected] wrote:
do you already have numbers, opinions and maybe a comparison in
reliability, speed, compression ratio etc. against the new zstd?
I have used unzcrash to test the ability of the zstd decoder to detect
corruption by itself (without a checksum), and the results are not good.
As an example, here are the results of repeatedly decompressing the file
COPYING.zst (a copy of the GPLv3), inverting a different bit each time so
as to test all possible bit flips:
  11913 bytes tested
  95304 total decompressions
  56058 decompressions returned with zero status, of which
  56017 comparisons failed
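The figures above are internally consistent, as a quick sanity check shows
(a small Python sketch; the variable names are mine, not unzcrash's):

```python
# Sanity check of the quoted figures: 11913 bytes give
# 11913 * 8 = 95304 single-bit flips, one decompression each.
# "Zero status" means the decoder reported success; a failed
# comparison on top of that is a silent (undetected) corruption.
bytes_tested = 11913
total = bytes_tested * 8            # 95304 decompressions
zero_status = 56058                 # decoder reported success
failed_cmp = 56017                  # output differed from the original

detected = total - zero_status      # decoder returned an error
print(total)                        # 95304
print(detected / total)             # ~0.412: under half detected
print(failed_cmp / total)           # ~0.588: silent corruptions
```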
The zstd decoder detects the corruption less than half of the time.
Compare this with the lzip decoder, which detects about 99.99995% of the
bit flips even without the help of its 3-factor integrity checking.
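For readers unfamiliar with unzcrash, the test it performs can be sketched
as follows. This uses zlib as a stand-in decoder because zstd is not in
the Python standard library (and zlib streams carry an Adler-32 checksum,
so its detection rate will be far better than zstd's without a check);
the point is the harness, not the numbers:

```python
# Sketch of the exhaustive bit-flip test performed by unzcrash:
# for each bit of the compressed file, flip it, try to decompress,
# and if the decoder reports success, compare the output with the
# original data.  Success plus a failed comparison is an undetected
# corruption (false negative).
import zlib

def bit_flip_test(data: bytes):
    compressed = bytearray(zlib.compress(data))
    total = zero_status = failed_cmp = 0
    for i in range(len(compressed) * 8):
        compressed[i // 8] ^= 1 << (i % 8)     # flip one bit
        total += 1
        try:
            out = zlib.decompress(bytes(compressed))
            zero_status += 1                   # decoder saw nothing wrong
            if out != data:
                failed_cmp += 1                # silent corruption
        except zlib.error:
            pass                               # corruption detected
        compressed[i // 8] ^= 1 << (i % 8)     # restore the bit
    return total, zero_status, failed_cmp
```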
Using 'zstd --no-check' is significantly less safe than using 'xz
--check=none'.
Even with integrity checking enabled, my guess is that an undetected
corruption (false negative) is at least a million times more likely with
zstd than with lzip.
The zstd file format has many of the defects of the xz format[1]:
unprotected lengths, unprotected flags, unprotected dictionary IDs,
optional integrity checking, optional file concatenation, and it does
not seem to admit trailing data. Also the current version of the zstd
file format is 0.2.0, which may mean that changes in the format are
expected.
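To illustrate what "unprotected flags" and "unprotected dictionary IDs"
mean, here is a sketch of parsing a zstd frame header. It follows the
current format specification (RFC 8878; the 0.2.0 draft discussed above
may differ in details). Note that no checksum covers these header fields
themselves: a bit flip in the descriptor byte or the dictionary ID
changes how the rest of the frame is interpreted, with no way to detect
it from the header alone:

```python
# Minimal zstd frame header parser (RFC 8878 layout).  The magic
# number and the Frame_Header_Descriptor byte carry the flags that
# control parsing of the rest of the frame, and none of them is
# protected by any checksum of its own.
import struct

ZSTD_MAGIC = 0xFD2FB528                  # stored little-endian on disk

def parse_frame_header(buf: bytes):
    magic, = struct.unpack_from('<I', buf, 0)
    assert magic == ZSTD_MAGIC, 'not a zstd frame'
    desc = buf[4]                        # Frame_Header_Descriptor
    fcs_flag = desc >> 6                 # Frame_Content_Size field size code
    has_checksum = bool((desc >> 2) & 1) # Content_Checksum_flag (optional!)
    did_flag = desc & 3                  # Dictionary_ID field size code
    return {'fcs_flag': fcs_flag,
            'has_checksum': has_checksum,
            'dict_id_bytes': (0, 1, 2, 4)[did_flag]}
```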
Zstd is described as a "fast real-time compression algorithm". AFAIK,
its author does not recommend zstd for long-term archiving.
So my advice is that you should not use zstd for long-term archiving.
[1] http://www.nongnu.org/lzip/xz_inadequate.html
Juan Francisco Cantero Hurtado asked me if I knew why the tests of zstd
take so long to finish.
It seems that 'make test' takes a long time (17 min) because it is a
full regression test, not just a small test with a few files to verify
that compilation went well, as most programs provide. The theoretical
basis of zstd[2] seems more complicated than that of LZMA, and the author
probably wants to make sure that any possible bug is caught early.
[2] https://arxiv.org/abs/1311.2540 Asymmetric numeral systems: entropy
coding combining speed of Huffman coding with compression rate of
arithmetic coding.
Best regards,
Antonio.
_______________________________________________
Lzip-bug mailing list
[email protected]
https://lists.nongnu.org/mailman/listinfo/lzip-bug