I rebuilt my 4.18 kernel with CONFIG_CRYPTO_CRCT10DIF_PCLMUL=y as Martin 
recommended and got even better performance results vs the CRC Slice by 16 
changes.  Here's a summary of the results

FIO Sequential Write, 64K Block Size, Queue Depth 64
PCLMUL = y Kernel:        bw = 2237 MiB/s
Slice by 16 CRC Calc:      bw = 1964 MiB/s
Base Kernel:                     bw =   357 MiB/s

FIO Sequential Read, 64K Block Size, Queue Depth 64
PCLMUL = y Kernel:        bw = 3839 MiB/s
Slice by 16 CRC Calc:      bw = 2730  MiB/s
Base Kernel:                     bw =   797 MiB/s

So it seems the CONFIG_CRYPTO_CRCT10DIF_PCLMUL=y provides the best performance. 
 Are there any negative side effect to this config option?   If not, does it 
make sense to recommend all the major distro's change their config options to 
have CONFIG_CRYPTO_CRCT10DIF_PCLMUL=y as the default option?  


Jeff Lien


-----Original Message-----
From: Christoph Hellwig [mailto:h...@infradead.org] 
Sent: Wednesday, August 22, 2018 1:20 AM
To: Martin K. Petersen <martin.peter...@oracle.com>
Cc: Jeffrey Lien <jeff.l...@wdc.com>; linux-ker...@vger.kernel.org; 
linux-crypto@vger.kernel.org; linux-bl...@vger.kernel.org; 
linux-s...@vger.kernel.org; herb...@gondor.apana.org.au; 
tim.c.c...@linux.intel.com; David Darrington <david.darring...@wdc.com>; Jeff 
Furlong <jeff.furl...@wdc.com>
Subject: Re: [PATCH] Performance Improvement in CRC16 Calculations.

On Tue, Aug 21, 2018 at 09:40:34PM -0400, Martin K. Petersen wrote:
> When crc-t10dif is initialized, the crypto infrastructure will pick 
> the algorithm with the highest priority currently registered. Both 
> block and SCSI will cause crc-t10dif to be compiled as a built-in so 
> this selection happens very early.

Ouch.  This might actually happen in a lot of other users of the crypto 
functionality as well.

> However, it seems like a bit of a deficiency in crypto that there is 
> no way to upgrade existing transformations if higher priority 
> algorithms become available. btrfs and a few others work around this 
> issue by not using the generic lib/ CRC functions (which defeats the 
> purpose of having these in the first place). Instead they are 
> registering their own transformation at a later time where any 
> accelerator modules are more likely to be loaded.

If we can't fix this in crypto (which doesn't seem that easy), we should at 
least clearly document the issue somewhere, and fix this in the t10pi code by 
initializing crct10dif_tfm in a lazy fashion only once the fist block device 
starts using it.

Reply via email to