> * I would suggest to rename the main source file from crc-pclmul.c to > crc-x86_64.c. > Rationale: So that immediately clear that the code is specific to the > x86_64 CPUs. Not everyone is an assembly language hacker, and even some > assembly language hackers (like me) don't know about the 'pclmul' > instruction and of which ISA it is part of. Whereas everyone knows what > x86_64 is.
I am fine with whatever structure, the thing to keep in mind is I have a
patch open for corelib extending their 128-bit implementation (pclmul is
the cpu flag we check) to 256-bit (avx2 + vpclmulqdq) and avx 512 (avx512 +
vpclmulqdq). x86_64 seems fine to have in the title but I think there's
value in specifying the CPU flag we're checking. The build flag is
especially important because the avx2 instruction vpclmulqdq can be
assembled with a VEX (avx2) or EVEX (avx512) prefix, and a CPU with avx2
support would SIGILL on the EVEX prefix despite supporting the VEX version
of the instruction, so in my coreutils implementation I actually build each
of the 3 pclmul/avx2/avx512 implementations as separate objects with
separate sets of flags.
> * Similarly, I suggest to rename the module 'crc-pclmul' to 'crc-x86_64'.
As above, I have no preference, I've renamed it to crc-x86_64-pclmul but am
happy to go with any suggestion, as long as it opens us to avx2 and avx512
in future implementations
> * In the patch, one of the diffs ends with
> \ No newline at end of file
> Could you please add a newline at the end of that source file?
Fixed.
> * In crc.c: Would it make sense to cache the value of pclmul_enabled
> in a static variable?
> You can probably guess it by looking at the assembly-language code
> that gcc -O2 produces for the two __builtin_cpu_supports invocations.
Done.
This sounds like a good optimisation if we're worried about people
incrementally hitting this function for small (>64) byte hashes, so I've
made it fall back to the older implementation for these as the
initialisation of all the xmm registers also burns a ton of cycles.
> * Are the options -mpclmul -mavx understood by both gcc and clang?
> Or does clang use different options for the same thing?
As per [1] it looks to be the case, these appear to match directly to the
output of $(lscpu)
> * In the module description, when you write
> CFLAGS += -mpclmul -mavx
> it has an effect on all compilations of the same Makefile (e.g. in the
> case of coreutils, on all the coreutils programs). Which means that
the
> resulting binaries will *NOT WORK* on CPUs that don't have AVX
extensions.
> So, these CFLAGS changes should be limited to the one compilation unit
> that needs it. Unfortunately, this is not easy to do with Automake (it
> requires an intermediate convenience library to be declared), and I
don't
> know whether gnulib-tool already supports this contortion.
> But before I look into it in detail: Is there an alternative to these
> command-line options? Maybe there are some #pragmas that have the same
> effect?
This looks to work with gcc:
#pragma GCC target("pclmul,avx")
And I believe clang honours this, can someone else check this for me or
give me a cmdline to make it build locally with clang so I can check?
> How about ((uintptr_t) ptr & 0xF) == 0
> or something like that?
Perfect, I'll experiment from there.
This patch implements all the above feedback except for making sure data is
correctly aligned, I will add this either tomorrow or later today.
[1] <https://clang.llvm.org/docs/ClangCommandLineReference.html>
0001-crc-Add-PCLMUL-implementation.patch
Description: Binary data
