On 2025-01-17 Simon Josefsson wrote:
> Lasse Collin <lasse.col...@tukaani.org> writes:
> 
> > [1] 
> > https://github.com/tukaani-project/xz/blob/master/src/liblzma/check/crc_x86_clmul.h
> >   
> 
> Oh.  Do you see any chance to merge your code into gnulib?  If there
> is any feature in your implementation that we don't have.  And then
> use it in xz?

My code hasn't been included in any xz release yet. I don't want to
backport it to 5.6.x and 5.7.1alpha isn't out yet. So the code hasn't
been tested by many people yet.

A CRC function tries to produce the correct result as fast as possible.
There isn't much else to compare feature wise. It is fine to me if the
code is adapted into Gnulib. However, a significant amount of effort has
been put into the version that is in Gnulib already, so I don't want
make any recommendations, especially since my version hasn't been widely
tested yet. (I was afraid that if I link to my code, it's easily seen as
if I'm advertising a replacement. Sorry.)

> It seems a bit silly to have a lot of different
> implementations around for this fairly low-level simple code, and as
> we saw in this thread, maintaining it does show up bugs...

Yes, it's a bit silly and duplicate effort. I suppose there are
multiple reasons why many implementations exists, for example, avoiding
extra dependencies, licensing questions, and just having fun. There has
to be a reason why a CRC32 function is wanted in Gnulib instead of
adding, for example, zlib as a dependency to packages that need CRC32.

Even GNU has more than one CLMUL CRC implementation. See Libgcrypt:

    
https://git.gnupg.org/cgi-bin/gitweb.cgi?p=libgcrypt.git;a=blob;f=cipher/crc-intel-pclmul.c

I tested Gnulib's lib/crc-x86_64-pclmul.c (v1.0-1387-g5dd298feb0) a
bit. I changed the target attribute to plain "pclmul" to make it work
on a processor that lacks AVX. With 0-3 byte inputs the result differs
from what I see from other implementations. I didn't investigate why it
happens. I attached a test program that tests alignment variations but
the problem can be seen without such a loop already.

About sanitizers: They can be annoying with SIMD code. If a function is
passed an unaligned buffer, it would be fine to round the address down
to an aligned value, do an aligned read, and ignore the out-of-bounds
bytes. One can do it in assembly because sanitizers don't see it. In
contrast to sanitizers, Valgrind is happy if the extra bytes are thrown
away.

-- 
Lasse Collin
/*
From Gnulib version, comment out #includes for <config.h>, "crc-x86_64.h",
and "crc.h", and add <stdint.h>. Possibly omit ",avx" from the attribute too.

gcc -O2 -Wall -Wextra crc32-test.c -DUSE_ZLIB -lz
gcc -O2 -Wall -Wextra crc32-test.c -DUSE_LIBLZMA -llzma
gcc -O2 -Wall -Wextra crc32-test.c crc-x86_64-pclmul-modified.c
*/

#include <inttypes.h>
#include <stdlib.h>
#include <stdio.h>

#if defined(USE_ZLIB)
#   include <zlib.h>
#   define CRC32(buf, size, crc) crc32(crc, buf, size)
#elif defined(USE_LIBLZMA)
#   include <lzma.h>
#   define CRC32(buf, size, crc) lzma_crc32(buf, size, crc)
#else
extern uint32_t crc32_update_no_xor_pclmul(
                uint32_t crc, const void *buf, size_t len);
#   define CRC32(buf, size, crc) ~crc32_update_no_xor_pclmul(~(crc), buf, size)
#endif


#define END_SIZE 2000

static uint8_t r[END_SIZE + 16 * 17]; // Yes, it's a bit oversized.


static uint32_t
test(size_t size)
{
        uint32_t crc = 0;

        for (size_t k = 0; k < 16 * 17; k += 17)
                crc = CRC32(r + k, size, crc);

        return crc;
}


extern int
main(void)
{
        uint32_t seed = 29;
        for (size_t i = 0; i < sizeof(r); ++i) {
                seed = seed * 1103515245 + 12345;
                r[i] = (uint8_t)(seed >> 22);
        }

        for (size_t size = 0; size < END_SIZE; ++size)
                printf("%-4zu  0x%08" PRIX32 "\n", size, test(size));

        return 0;
}

Reply via email to