Hello Antonio!


>
> > Some recent CPUs (x86_64 SSE4.2, PowerPC ISA 2.07, ARM v8.1) offer
> > hardware accelerated calculation of CRC32 with a different polynomial
> > (crc32c) than used in lzip (ethernet crc32).
>
> Maybe hardware accelerated calculation of ethernet CRC32 also exists.
> After all it is the same polynomial used by gzip and zlib.


Not in the CPUs I mentioned. And it won't be implemented in new hardware
because of the inferiority of the Ethernet polynomial.

>
>
> > So, picking crc32c poly instead has two benefits:
> > 1) hardware accelerated integrity checking
>
> Hardware acceleration of CRC calculation makes sense for storage devices
> because the data is just moved; there is no time spent in processing it.
> Calculating the CRC is the only calculation involved.
>
> But calculating the CRC is just a small part of the total decompression
> time. So, even if you accelerate it, the total speed gain is small.
> (Probably smaller than 5%). For compression the speed gain is even smaller.
>

I can cite your own lzip benchmark: when comparing decompression
performance of lunzip vs busybox unxz, enabling the CRC in unxz (by
compressing with xz using crc32) gives a performance penalty of 16.7%
(9.723 s vs 8.331 s). That's a more convincing number than 5%.


>
> > 2) better protection against undetected errors
>
> You will need to prove this one.
>
> CRC32C has a slightly larger Hamming distance than ethernet CRC32 for
> "small" packet sizes (see pages 3, 4 of [1]). But beyond some size perhaps
> not much larger than 128 KiB, both have the same HD of 2. For files
> larger than that (uncompressed) size, there is little difference between
> both CRCs.
>

That's not accurate. According to Koopman's CRC32 zoo, CRC32C keeps HD=4
up to 2 gigabits, while Ethernet CRC32 only up to 92 kilobits.
http://users.ece.cmu.edu/~koopman/crc/crc32.html

2 gigabits (268 MB) is well within typical lzip usage, while 92 kilobits
(about 11 KB) is nothing.


> Even more important, we are talking about the interaction between
> compression and integrity checking. The difference between a Hamming
> distance of 2 or 3 is probably immaterial here. Maybe you would like to
> read section 2.10 of [2]. I quote:
>

That's a valid point. There is no good error model describing the
corruption in the uncompressed file that results from typical errors in
the compressed file (BER, burst errors, NAND page errors, HDD sector
errors). Still, a better HD at typical sizes is preferable.


> > The downside is the compatibility problem, but changing version byte in
> > file header can help with that.
>
> This is a very large downside, most probably to gain almost nothing.
> IMO, one of the big problems of today's software development is that too
> many people are willing to complicate the code without the slightest
> proof that the proposed change is indeed an improvement.
>

That's an argument everyone uses against adopting lzip itself. But the
relatively narrow spread of lzip is actually a plus here, because it gives
much more flexibility.

A new decompressor can decompress both old files and new ones, and an old
decompressor can decompress new ones but can't check their integrity. A
downside, but not a very large one.


>
> Best regards,
> Antonio.
>
_______________________________________________
Lzip-bug mailing list
[email protected]
https://lists.nongnu.org/mailman/listinfo/lzip-bug
