On December 22, 2020 3:27:33 AM GMT+08:00, h...@zytor.com wrote: >On December 20, 2020 6:46:25 PM PST, tonywwang...@zhaoxin.com wrote: >>On December 16, 2020 1:56:45 AM GMT+08:00, Eric Biggers >><ebigg...@kernel.org> wrote: >>>On Tue, Dec 15, 2020 at 10:15:29AM +0800, Tony W Wang-oc wrote: >>>> >>>> On 15/12/2020 04:41, Eric Biggers wrote: >>>> > On Mon, Dec 14, 2020 at 10:28:19AM +0800, Tony W Wang-oc wrote: >>>> >> On 12/12/2020 01:43, Eric Biggers wrote: >>>> >>> On Fri, Dec 11, 2020 at 07:29:04PM +0800, Tony W Wang-oc wrote: >>>> >>>> The driver crc32c-intel match CPUs supporting >>>X86_FEATURE_XMM4_2. >>>> >>>> On platforms with Zhaoxin CPUs supporting this X86 feature, >>When >>>> >>>> crc32c-intel and crc32c-generic are both registered, system >>will >>>> >>>> use crc32c-intel because its .cra_priority is greater than >>>> >>>> crc32c-generic. This case expect to use crc32c-generic driver >>>for >>>> >>>> some Zhaoxin CPUs to get performance gain, So remove these >>>Zhaoxin >>>> >>>> CPUs support from crc32c-intel. >>>> >>>> >>>> >>>> Signed-off-by: Tony W Wang-oc <tonywwang...@zhaoxin.com> >>>> >>> >>>> >>> Does this mean that the performance of the crc32c instruction >on >>>those CPUs is >>>> >>> actually slower than a regular C implementation? That's very >>>weird. >>>> >>> >>>> >> >>>> >> From the lmbench3 Create and Delete file test on those chips, I >>>think yes. >>>> >> >>>> > >>>> > Did you try measuring the performance of the hashing itself, and >>>not some >>>> > higher-level filesystem operations? >>>> > >>>> >>>> Yes. Was testing on these Zhaoxin CPUs, the result is that with the >>>same >>>> input value the generic C implementation takes fewer time than the >>>> crc32c instruction implementation. >>>> >>> >>>And that is really "working as intended"? >> >>These CPU's crc32c instruction is not working as intended. >> >> Why do these CPUs even >>>declare that >>>they support the crc32c instruction, when it is so slow? >>> >> >>The presence of crc32c and some other instructions supports are >>enumerated by CPUID.01:ECX[SSE4.2] = 1, other instructions are ok >>except the crc32c instruction. >> >>>Are there any other instruction sets (AES-NI, PCLMUL, SSE, SSE2, AVX, >>>etc.) that >>>these CPUs similarly declare support for but they are uselessly slow? >> >>No. >> >>Sincerely >>Tonyw >> >>> >>>- Eric > >Then the right thing to do is to disable the CPUID bit in the >vendor-specific startup code.
This way makes these CPUs do not support all instruction sets enumerated by CPUID.01:ECX[SSE4.2]. While only crc32c instruction is slow, just expect the crc32c-intel driver do not match these CPUs. Sincerely Tonyw