> * I would suggest to rename the main source file from crc-pclmul.c to
>   crc-x86_64.c.
>     Rationale: So that immediately clear that the code is specific to the
>     x86_64 CPUs. Not everyone is an assembly language hacker, and even
some
>     assembly language hackers (like me) don't know about the 'pclmul'
>     instruction and of which ISA it is part of. Whereas everyone knows
what
>     x86_64 is.

I am fine with whatever structure, the thing to keep in mind is I have a
patch open for corelib extending their 128-bit implementation (pclmul is
the cpu flag we check) to 256-bit (avx2 + vpclmulqdq) and avx 512 (avx512 +
vpclmulqdq). x86_64 seems fine to have in the title but I think there's
value in specifying the CPU flag we're checking. The build flag is
especially important because the avx2 instruction vpclmulqdq can be
assembled with a VEX (avx2) or EVEX (avx512) prefix, and a CPU with avx2
support would SIGILL on the EVEX prefix despite supporting the VEX version
of the instruction, so in my coreutils implementation I actually build each
of the 3 pclmul/avx2/avx512 implementations as separate objects with
separate sets of flags.

> * Similarly, I suggest to rename the module 'crc-pclmul' to 'crc-x86_64'.

As above, I have no preference, I've renamed it to crc-x86_64-pclmul but am
happy to go with any suggestion, as long as it opens us to avx2 and avx512
in future implementations

> * In the patch, one of the diffs ends with
>   \ No newline at end of file
>   Could you please add a newline at the end of that source file?

Fixed.

> * In crc.c: Would it make sense to cache the value of pclmul_enabled
>     in a static variable?
>     You can probably guess it by looking at the assembly-language code
>     that gcc -O2 produces for the two __builtin_cpu_supports invocations.

Done.

This sounds like a good optimisation if we're worried about people
incrementally hitting this function for small (>64) byte hashes, so I've
made it fall back to the older implementation for these as the
initialisation of all the xmm registers also burns a ton of cycles.

> * Are the options -mpclmul -mavx understood by both gcc and clang?
>   Or does clang use different options for the same thing?

As per [1] it looks to be the case, these appear to match directly to the
output of $(lscpu)

> * In the module description, when you write
>       CFLAGS += -mpclmul -mavx
>     it has an effect on all compilations of the same Makefile (e.g. in the
>     case of coreutils, on all the coreutils programs). Which means that
the
>     resulting binaries will *NOT WORK* on CPUs that don't have AVX
extensions.

>     So, these CFLAGS changes should be limited to the one compilation unit
>     that needs it. Unfortunately, this is not easy to do with Automake (it
>     requires an intermediate convenience library to be declared), and I
don't
>     know whether gnulib-tool already supports this contortion.

>     But before I look into it in detail: Is there an alternative to these
>     command-line options? Maybe there are some #pragmas that have the same
>     effect?

This looks to work with gcc:

#pragma GCC target("pclmul,avx")

And I believe clang honours this, can someone else check this for me or
give me a cmdline to make it build locally with clang so I can check?

> How about  ((uintptr_t) ptr & 0xF) == 0
> or something like that?

Perfect, I'll experiment from there.

This patch implements all the above feedback except for making sure data is
correctly aligned, I will add this either tomorrow or later today.

[1] <https://clang.llvm.org/docs/ClangCommandLineReference.html>

Attachment: 0001-crc-Add-PCLMUL-implementation.patch
Description: Binary data

Reply via email to