Re: fpurge: Improve configure test

2024-11-26 Thread Bruno Haible via Gnulib discussion list
Eli Schwartz wrote: > > [case "$host_os" in > > # Guess yes on musl systems. > > *-musl* | midipix*) gl_cv_func_fpurge_works="guessing yes" ;; > > # Otherwise obey --enable-cross-guesses. > > *)

Re: [PATCH] crc: Add PCLMUL implementation

2024-11-26 Thread Sam Russell
> It's your choice: 3 compilation units for x86_64, or 1 compilation unit > for x86_64, or no extra compilation unit (all code contained in .h files) — > as you prefer. Fine with me either way. Let's cross that bridge when we get to it :) I'm fairly relaxed which one we choose in the end. Final p

Re: [PATCH] crc: Add PCLMUL implementation

2024-11-26 Thread Bruno Haible via Gnulib discussion list
Sam Russell wrote: > It makes sense to keep them in the same module though, I agree. Thanks. > I'd prefer to keep them as separate files if you're okay with it. I did a > quick experiment and by wrapping each function in push_options and > pop_options pragmas it was pretty easy to get it all work

Re: [PATCH] crc: Add PCLMUL implementation

2024-11-26 Thread Sam Russell
> The use of _mm_loadu_si128 provides for unaligned byte arrays (that's > the 'u' in the 'loadu'), so you will be Ok there, too. Thanks Jeff, I wasn't going to push this with a "works for me" without knowing why. I'll remove the alignment code. > I believe the way to zero a __m128i is using _mm_
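The point about `_mm_loadu_si128` can be illustrated with a minimal sketch. It is not taken from the patch: the helper names and the `SKETCH_HAVE_SSE2` macro are hypothetical, and on targets without SSE2 the sketch falls back to `memcpy` so that it still compiles.

```c
#include <assert.h>
#include <string.h>

#if defined __x86_64__ || defined __SSE2__
# include <emmintrin.h>   /* SSE2: _mm_loadu_si128, _mm_storeu_si128 */
# define SKETCH_HAVE_SSE2 1
#else
# define SKETCH_HAVE_SSE2 0
#endif

/* Hypothetical helper: copy 16 bytes from P, which may have any
   alignment, into OUT, going through an __m128i when SSE2 is available.  */
static void
load16_any_alignment (const unsigned char *p, unsigned char *out)
{
  assert (p != NULL);
#if SKETCH_HAVE_SSE2
  /* The 'u' variants make no alignment assumption about the address.  */
  __m128i v = _mm_loadu_si128 ((const __m128i *) p);
  _mm_storeu_si128 ((__m128i *) out, v);
#else
  memcpy (out, p, 16);
#endif
}

/* Hypothetical check: load at every byte offset 0..15 of a buffer.
   At most one of these addresses can be 16-byte aligned.
   Returns 1 if every load reproduced the source bytes.  */
int
unaligned_loads_ok (void)
{
  unsigned char buf[32];
  unsigned char out[16];

  for (int i = 0; i < 32; i++)
    buf[i] = (unsigned char) (3 * i + 1);

  for (int off = 0; off < 16; off++)
    {
      load16_any_alignment (buf + off, out);
      if (memcmp (out, buf + off, 16) != 0)
        return 0;
    }
  return 1;
}
```

Calling `unaligned_loads_ok ()` exercises the load at fifteen or more misaligned addresses, which is the situation the alignment code was meant to guard against.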

Re: [PATCH] crc: Add PCLMUL implementation

2024-11-26 Thread Jeffrey Walton
On Tue, Nov 26, 2024 at 4:27 PM Sam Russell wrote: > > I've added an alignment check in lib/crc, it looks like the code works okay > without it for me but an __m128i is supposed to be 128-bit aligned so I'm happy > that I've added it. The __m128i's are naturally aligned. They will be ok: +

Re: [PATCH] crc: Add PCLMUL implementation

2024-11-26 Thread Sam Russell
I've added an alignment check in lib/crc. The code seems to work without it for me, but an __m128i is supposed to be 128-bit aligned, so I'm happy that I've added it. The attached patch renames the module to crc-x86_64 while keeping the source file crc-x86_64-pclmul.c, as well as the alignm

Re: [PATCH] crc: Add PCLMUL implementation

2024-11-26 Thread Sam Russell
> Cool. But it even gets better: one can use these target options on a per- > function basis, via __attribute__. See > https://gcc.gnu.org/onlinedocs/gcc-14.2.0/gcc/x86-Function-Attributes.html#index-target_0028_0022avx_0022_0029-function-attribute_002c-x86 > https://gcc.gnu.org/onlinedocs/gcc-14
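A minimal sketch of the per-function approach, assuming GCC or clang on x86_64: only the one function that uses the intrinsic is compiled with the `pclmul` target, so the file as a whole needs no `-mpclmul` flag, and a runtime check selects it. All function names and the `SKETCH_X86_64` macro are hypothetical and not from the patch; on other targets the sketch uses only the portable loop.

```c
#include <assert.h>
#include <stdint.h>

#if defined __x86_64__ && (defined __GNUC__ || defined __clang__)
# include <immintrin.h>   /* pulls in the PCLMUL intrinsic declarations */
# define SKETCH_X86_64 1
#else
# define SKETCH_X86_64 0
#endif

/* Portable reference: low 64 bits of the carry-less product of A and B.  */
static uint64_t
clmul_lo64_portable (uint64_t a, uint64_t b)
{
  uint64_t r = 0;
  for (int i = 0; i < 64; i++)
    if ((b >> i) & 1)
      r ^= a << i;
  return r;
}

#if SKETCH_X86_64
/* Only this function is compiled with PCLMUL enabled.  */
__attribute__ ((target ("pclmul")))
static uint64_t
clmul_lo64_pclmul (uint64_t a, uint64_t b)
{
  __m128i x = _mm_set_epi64x (0, (long long) a);
  __m128i y = _mm_set_epi64x (0, (long long) b);
  /* Selector 0x00: multiply the low 64-bit halves of X and Y.  */
  __m128i r = _mm_clmulepi64_si128 (x, y, 0x00);
  return (uint64_t) _mm_cvtsi128_si64 (r);
}
#endif

/* Runtime dispatch: use the PCLMUL version only if the CPU has it.  */
uint64_t
clmul_lo64 (uint64_t a, uint64_t b)
{
#if SKETCH_X86_64
  if (__builtin_cpu_supports ("pclmul"))
    return clmul_lo64_pclmul (a, b);
#endif
  return clmul_lo64_portable (a, b);
}

/* Hypothetical self-check: both paths must agree.  */
void
clmul_selfcheck (void)
{
  assert (clmul_lo64 (0x5, 0x3) == clmul_lo64_portable (0x5, 0x3));
}
```

For example, `clmul_lo64 (3, 3)` is 5, since (x + 1)^2 = x^2 + 1 over GF(2). Whether the attribute or the pragma form suits the module better is a matter for the thread; this only shows that the attribute form keeps everything in one compilation unit.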

Re: [PATCH] crc: Add PCLMUL implementation

2024-11-26 Thread Bruno Haible via Gnulib discussion list
Thanks for the updated patch. I'm fine with the 'crc-x86_64-pclmul' name. Sam Russell wrote: > > * Are the options -mpclmul -mavx understood by both gcc and clang? > > Or does clang use different options for the same thing? > > As per [1] it looks to be the case Thanks for having checked it.

Re: [PATCH] crc: Add PCLMUL implementation

2024-11-26 Thread Sam Russell
> * I would suggest to rename the main source file from crc-pclmul.c to > crc-x86_64.c. > Rationale: So that it is immediately clear that the code is specific to the > x86_64 CPUs. Not everyone is an assembly language hacker, and even some > assembly language hackers (like me) don't know ab

Re: [PATCH] crc: Add PCLMUL implementation

2024-11-26 Thread Bruno Haible via Gnulib discussion list
Hi Sam, Thanks for working on this! Sam Russell wrote: > 85% time reduction on AMD Ryzen 5 5600: > > $ ./gltests/bench-crc 100 > real 1.740296 > user 1.740 > sys 0.000 > > $ ../bench-crc-pclmul 100 > real 0.248324 > user 0.248 > sys 0.000 > > This translates to a 13% time

[PATCH] crc: Add PCLMUL implementation

2024-11-26 Thread Sam Russell
85% time reduction on AMD Ryzen 5 5600: $ ./gltests/bench-crc 100 real 1.740296 user 1.740 sys 0.000 $ ../bench-crc-pclmul 100 real 0.248324 user 0.248 sys 0.000 This translates to a 13% time reduction for gzip: $ time ./gzip_sliceby8 -k -d -c large_file.gz > /dev/null re
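The 85% figure follows from the two "real" times quoted above. As a small worked check (the function names are hypothetical, not part of bench-crc):

```c
#include <assert.h>

/* Percentage by which the run time dropped, given the time before
   and after the change, in the same unit.  */
double
time_reduction_percent (double before, double after)
{
  assert (before > 0.0);
  return 100.0 * (1.0 - after / before);
}

/* The same comparison expressed as a speedup factor.  */
double
speedup_factor (double before, double after)
{
  assert (after > 0.0);
  return before / after;
}
```

With the quoted numbers, `time_reduction_percent (1.740296, 0.248324)` is roughly 85.7 and `speedup_factor (1.740296, 0.248324)` is roughly 7.0, which matches the "85% time reduction" in the subject of the benchmark.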

fpurge: Improve configure test

2024-11-26 Thread Bruno Haible via Gnulib discussion list
Eli Schwartz reported that the 'fpurge' configure test, on musl libc, produces different results a) with CC="gcc" b) with CC="gcc -Werror=implicit-function-declaration" (which is used as an approximation for strict C23 compilers, such as recent clang releases with -std=gnu23).