On 2025-01-17 Simon Josefsson wrote: > Lasse Collin <lasse.col...@tukaani.org> writes: > > > [1] > > https://github.com/tukaani-project/xz/blob/master/src/liblzma/check/crc_x86_clmul.h > > > > Oh. Do you see any chance to merge your code into gnulib? If there > is any feature in your implementation that we don't have. And then > use it in xz?
My code hasn't been included in any xz release yet. I don't want to backport it to 5.6.x and 5.7.1alpha isn't out yet. So the code hasn't been tested by many people yet. A CRC function tries to produce the correct result as fast as possible. There isn't much else to compare feature wise. It is fine to me if the code is adapted into Gnulib. However, a significant amount of effort has been put into the version that is in Gnulib already, so I don't want make any recommendations, especially since my version hasn't been widely tested yet. (I was afraid that if I link to my code, it's easily seen as if I'm advertising a replacement. Sorry.) > It seems a bit silly to have a lot of different > implementations around for this fairly low-level simple code, and as > we saw in this thread, maintaining it does show up bugs... Yes, it's a bit silly and duplicate effort. I suppose there are multiple reasons why many implementations exists, for example, avoiding extra dependencies, licensing questions, and just having fun. There has to be a reason why a CRC32 function is wanted in Gnulib instead of adding, for example, zlib as a dependency to packages that need CRC32. Even GNU has more than one CLMUL CRC implementation. See Libgcrypt: https://git.gnupg.org/cgi-bin/gitweb.cgi?p=libgcrypt.git;a=blob;f=cipher/crc-intel-pclmul.c I tested Gnulib's lib/crc-x86_64-pclmul.c (v1.0-1387-g5dd298feb0) a bit. I changed the target attribute to plain "pclmul" to make it work on a processor that lacks AVX. With 0-3 byte inputs the result differs from what I see from other implementations. I didn't investigate why it happens. I attached a test program that tests alignment variations but the problem can be seen without such a loop already. About sanitizers: They can be annoying with SIMD code. If a function is passed an unaligned buffer, it would be fine to round the address down to an aligned value, do an aligned read, and ignore the out-of-bounds bytes. One can do it in assembly because sanitizers don't see it. In contrast to sanitizers, Valgrind is happy if the extra bytes are thrown away. -- Lasse Collin
/* From Gnulib version, comment out #includes for <config.h>, "crc-x86_64.h", and "crc.h", and add <stdint.h>. Possibly omit ",avx" from the attribute too. gcc -O2 -Wall -Wextra crc32-test.c -DUSE_ZLIB -lz gcc -O2 -Wall -Wextra crc32-test.c -DUSE_LIBLZMA -llzma gcc -O2 -Wall -Wextra crc32-test.c crc-x86_64-pclmul-modified.c */ #include <inttypes.h> #include <stdlib.h> #include <stdio.h> #if defined(USE_ZLIB) # include <zlib.h> # define CRC32(buf, size, crc) crc32(crc, buf, size) #elif defined(USE_LIBLZMA) # include <lzma.h> # define CRC32(buf, size, crc) lzma_crc32(buf, size, crc) #else extern uint32_t crc32_update_no_xor_pclmul( uint32_t crc, const void *buf, size_t len); # define CRC32(buf, size, crc) ~crc32_update_no_xor_pclmul(~(crc), buf, size) #endif #define END_SIZE 2000 static uint8_t r[END_SIZE + 16 * 17]; // Yes, it's a bit oversized. static uint32_t test(size_t size) { uint32_t crc = 0; for (size_t k = 0; k < 16 * 17; k += 17) crc = CRC32(r + k, size, crc); return crc; } extern int main(void) { uint32_t seed = 29; for (size_t i = 0; i < sizeof(r); ++i) { seed = seed * 1103515245 + 12345; r[i] = (uint8_t)(seed >> 22); } for (size_t size = 0; size < END_SIZE; ++size) printf("%-4zu 0x%08" PRIX32 "\n", size, test(size)); return 0; }