On 12/12/2020 18:54, Ard Biesheuvel wrote:
> On Sat, 12 Dec 2020 at 10:36, Ard Biesheuvel <a...@kernel.org> wrote:
>>
>> On Fri, 11 Dec 2020 at 20:07, Eric Biggers <ebigg...@kernel.org> wrote:
>>>
>>> On Fri, Dec 11, 2020 at 07:29:04PM +0800, Tony W Wang-oc wrote:
>>>> The crc32c-intel driver matches CPUs supporting X86_FEATURE_XMM4_2.
>>>> On platforms with Zhaoxin CPUs supporting this x86 feature, when
>>>> crc32c-intel and crc32c-generic are both registered, the system will
>>>> use crc32c-intel because its .cra_priority is greater than that of
>>>> crc32c-generic. On some Zhaoxin CPUs the crc32c-generic driver is
>>>> expected to perform better, so remove support for these Zhaoxin
>>>> CPUs from crc32c-intel.
>>>>
>>>> Signed-off-by: Tony W Wang-oc <tonywwang...@zhaoxin.com>
>>>
>>> Does this mean that the performance of the crc32c instruction on those CPUs
>>> is actually slower than a regular C implementation? That's very weird.
>>>
>>
>> This driver does not use CRC instructions, but carryless
>> multiplication and aggregation. So I suppose the pclmulqdq instruction
>> triggers some pathological performance limitation here.
>>
>
> Just noticed it uses both crc instructions and pclmulqdq instructions.
> Sorry for the noise.
>
>> That means the crct10dif driver probably needs the same treatment.
>
> Tony, can you confirm that the problem is in the CRC instructions and
> not in the PCLMULQDQ code path that supersedes it when available?

The problem is in the CRC instructions.

Sincerely,
Tony
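
For readers following the thread, the selection rule described in the patch
description (among implementations of the same algorithm, the one with the
greater .cra_priority is used) can be sketched in plain user-space C. This is
only an illustration under stated assumptions, not the kernel's crypto API
code: struct alg_entry, pick_best() and demo_algs are hypothetical names, and
the priority numbers are placeholders chosen so that crc32c-intel ranks above
crc32c-generic, as the patch description says it does.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for a registered algorithm entry. */
struct alg_entry {
    const char *name;        /* generic algorithm name, e.g. "crc32c" */
    const char *driver_name; /* name of the specific implementation */
    int priority;            /* plays the role of .cra_priority */
};

/*
 * Return the highest-priority entry that implements `name`,
 * or NULL if no entry matches.
 */
static const struct alg_entry *pick_best(const struct alg_entry *algs,
                                         size_t n, const char *name)
{
    const struct alg_entry *best = NULL;

    assert(name != NULL);
    for (size_t i = 0; i < n; i++) {
        if (strcmp(algs[i].name, name) != 0)
            continue; /* different algorithm, ignore */
        if (best == NULL || algs[i].priority > best->priority)
            best = &algs[i];
    }
    return best;
}

/* Illustrative priorities (assumption): only their ordering matters here. */
static const struct alg_entry demo_algs[] = {
    { "crc32c", "crc32c-generic", 100 },
    { "crc32c", "crc32c-intel",   200 },
};
```

With both entries present, pick_best(demo_algs, 2, "crc32c") returns the
crc32c-intel entry. If only the first entry is considered, which models
crc32c-intel not registering on the affected CPUs, the same lookup returns
crc32c-generic. That is the effect the patch is after.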