> * I would suggest to rename the main source file from crc-pclmul.c to > crc-x86_64.c. > Rationale: So that immediately clear that the code is specific to the > x86_64 CPUs. Not everyone is an assembly language hacker, and even some > assembly language hackers (like me) don't know about the 'pclmul' > instruction and of which ISA it is part of. Whereas everyone knows what > x86_64 is.
I am fine with whatever structure, the thing to keep in mind is I have a patch open for corelib extending their 128-bit implementation (pclmul is the cpu flag we check) to 256-bit (avx2 + vpclmulqdq) and avx 512 (avx512 + vpclmulqdq). x86_64 seems fine to have in the title but I think there's value in specifying the CPU flag we're checking. The build flag is especially important because the avx2 instruction vpclmulqdq can be assembled with a VEX (avx2) or EVEX (avx512) prefix, and a CPU with avx2 support would SIGILL on the EVEX prefix despite supporting the VEX version of the instruction, so in my coreutils implementation I actually build each of the 3 pclmul/avx2/avx512 implementations as separate objects with separate sets of flags. > * Similarly, I suggest to rename the module 'crc-pclmul' to 'crc-x86_64'. As above, I have no preference, I've renamed it to crc-x86_64-pclmul but am happy to go with any suggestion, as long as it opens us to avx2 and avx512 in future implementations > * In the patch, one of the diffs ends with > \ No newline at end of file > Could you please add a newline at the end of that source file? Fixed. > * In crc.c: Would it make sense to cache the value of pclmul_enabled > in a static variable? > You can probably guess it by looking at the assembly-language code > that gcc -O2 produces for the two __builtin_cpu_supports invocations. Done. This sounds like a good optimisation if we're worried about people incrementally hitting this function for small (>64) byte hashes, so I've made it fall back to the older implementation for these as the initialisation of all the xmm registers also burns a ton of cycles. > * Are the options -mpclmul -mavx understood by both gcc and clang? > Or does clang use different options for the same thing? As per [1] it looks to be the case, these appear to match directly to the output of $(lscpu) > * In the module description, when you write > CFLAGS += -mpclmul -mavx > it has an effect on all compilations of the same Makefile (e.g. in the > case of coreutils, on all the coreutils programs). Which means that the > resulting binaries will *NOT WORK* on CPUs that don't have AVX extensions. > So, these CFLAGS changes should be limited to the one compilation unit > that needs it. Unfortunately, this is not easy to do with Automake (it > requires an intermediate convenience library to be declared), and I don't > know whether gnulib-tool already supports this contortion. > But before I look into it in detail: Is there an alternative to these > command-line options? Maybe there are some #pragmas that have the same > effect? This looks to work with gcc: #pragma GCC target("pclmul,avx") And I believe clang honours this, can someone else check this for me or give me a cmdline to make it build locally with clang so I can check? > How about ((uintptr_t) ptr & 0xF) == 0 > or something like that? Perfect, I'll experiment from there. This patch implements all the above feedback except for making sure data is correctly aligned, I will add this either tomorrow or later today. [1] <https://clang.llvm.org/docs/ClangCommandLineReference.html>
0001-crc-Add-PCLMUL-implementation.patch
Description: Binary data