Simon Josefsson wrote:
> I think in this example, I think it makes
> sense for gnulib to provide a optimized CRC function that may contain
> architecture-specific optimizations. The reason seems to be that while
> there are numerous different optimized implementations around, few seems
> to be arranged in a re-usable fashion.
>
> The barrier for acceptance in gnulib may be higher than in some
> individual projects since gnulib is intended to be highly portable and
> flexible, and this requires extra care when doing the implementation.
> But there are examples of arch-specific assembler code in gnulib
> already.

I agree. In this case, it makes sense to have special optimizations for
particular CPUs because the CRC computation is such a bottleneck for gzip
and because these optimizations are significant:
  - pclmul (x86_64): twice as fast [1],
  - crc32[bhwx] (arm64) [2]: five times as fast [3], or pmull [4].

For maintainability, it does not seem wise to have sizeable portions of
assembly-language code for different CPUs in the same source file. I would
therefore suggest these file names:
  lib/crc.c for the generic C code,
  lib/crc-x86_64.h for the x86_64 code,
  lib/crc-arm64.h for the arm64 code.

But first, we need to have the slice-by-8 as portable code in lib/crc.c.

Bruno

[1] https://lists.gnu.org/archive/html/bug-gnulib/2024-10/msg00095.html
[2] https://developer.arm.com/documentation/dui0801/h/A64-General-Instructions/CRC32B--CRC32H--CRC32W--CRC32X
[3] https://patchwork.kernel.org/project/linux-crypto/patch/1416417577-27495-1-git-send-email-yazen.ghan...@linaro.org/
[4] https://lwn.net/Articles/994292/
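
For reference, the slice-by-8 technique mentioned above could look roughly
like the following. This is only a minimal sketch, not the code proposed
for lib/crc.c: the names crc_table, crc32_slice8_init and crc32_slice8 are
hypothetical, and it assumes the reflected CRC-32 polynomial 0xEDB88320
that gzip uses. It reads the input byte by byte, so it does not depend on
the host's endianness or alignment rules.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names; not gnulib's actual API.  */
static uint32_t crc_table[8][256];

/* Fill the eight lookup tables.  Must be called once before use.  */
static void
crc32_slice8_init (void)
{
  /* Table 0 is the classic byte-at-a-time table.  */
  for (uint32_t i = 0; i < 256; i++)
    {
      uint32_t c = i;
      for (int k = 0; k < 8; k++)
        c = (c & 1) ? (c >> 1) ^ 0xEDB88320u : c >> 1;
      crc_table[0][i] = c;
    }
  /* Table t advances the entry of table t-1 by one more zero byte.  */
  for (uint32_t i = 0; i < 256; i++)
    for (int t = 1; t < 8; t++)
      {
        uint32_t prev = crc_table[t - 1][i];
        crc_table[t][i] = (prev >> 8) ^ crc_table[0][prev & 0xFF];
      }
}

/* Update CRC with LEN bytes at BUF.  Pass 0 as the initial CRC.  */
static uint32_t
crc32_slice8 (uint32_t crc, const unsigned char *buf, size_t len)
{
  crc = ~crc;
  /* Main loop: consume 8 input bytes per iteration.  */
  while (len >= 8)
    {
      uint32_t lo = crc ^ ((uint32_t) buf[0]
                           | (uint32_t) buf[1] << 8
                           | (uint32_t) buf[2] << 16
                           | (uint32_t) buf[3] << 24);
      uint32_t hi = ((uint32_t) buf[4]
                     | (uint32_t) buf[5] << 8
                     | (uint32_t) buf[6] << 16
                     | (uint32_t) buf[7] << 24);
      crc = crc_table[7][lo & 0xFF]
            ^ crc_table[6][(lo >> 8) & 0xFF]
            ^ crc_table[5][(lo >> 16) & 0xFF]
            ^ crc_table[4][lo >> 24]
            ^ crc_table[3][hi & 0xFF]
            ^ crc_table[2][(hi >> 8) & 0xFF]
            ^ crc_table[1][(hi >> 16) & 0xFF]
            ^ crc_table[0][hi >> 24];
      buf += 8;
      len -= 8;
    }
  /* Tail: fewer than 8 bytes left, fall back to one byte at a time.  */
  while (len--)
    crc = (crc >> 8) ^ crc_table[0][(crc ^ *buf++) & 0xFF];
  return ~crc;
}
```

In this sketch, a caller would run crc32_slice8_init () once and then feed
data in chunks, passing the previous return value as the first argument.
The CPU-specific variants would then only need to replace the main loop.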