On 24 August 2018 at 16:32, Jeffrey Lien <jeff.l...@wdc.com> wrote:
> I rebuilt my 4.18 kernel with CONFIG_CRYPTO_CRCT10DIF_PCLMUL=y as Martin 
> recommended and got even better performance results vs the CRC Slice by 16 
> changes.  Here's a summary of the results
>
> FIO Sequential Write, 64K Block Size, Queue Depth 64
> PCLMUL = y Kernel:        bw = 2237 MiB/s
> Slice by 16 CRC Calc:      bw = 1964 MiB/s
> Base Kernel:                     bw =   357 MiB/s
>
> FIO Sequential Read, 64K Block Size, Queue Depth 64
> PCLMUL = y Kernel:        bw = 3839 MiB/s
> Slice by 16 CRC Calc:      bw = 2730  MiB/s
> Base Kernel:                     bw =   797 MiB/s
>
> So it seems the CONFIG_CRYPTO_CRCT10DIF_PCLMUL=y provides the best 
> performance.  Are there any negative side effect to this config option?   If 
> not, does it make sense to recommend all the major distro's change their 
> config options to have CONFIG_CRYPTO_CRCT10DIF_PCLMUL=y as the default option?
>

I think the way the library version of crc_t10dif() invokes the crypto
API should be revised.

Would it be possible to allocate the crypto transform upon first use
instead of from an initcall? If crc_t10dif() is mostly called from
non-process context, that would not really work, but otherwise, we
could simply defer it (and occasional calls from non-process context
that do occur would use the generic code until the point where another
call from process context allocates the transform)

> -----Original Message-----
> From: Christoph Hellwig [mailto:h...@infradead.org]
> Sent: Wednesday, August 22, 2018 1:20 AM
> To: Martin K. Petersen <martin.peter...@oracle.com>
> Cc: Jeffrey Lien <jeff.l...@wdc.com>; linux-ker...@vger.kernel.org; 
> linux-crypto@vger.kernel.org; linux-bl...@vger.kernel.org; 
> linux-s...@vger.kernel.org; herb...@gondor.apana.org.au; 
> tim.c.c...@linux.intel.com; David Darrington <david.darring...@wdc.com>; Jeff 
> Furlong <jeff.furl...@wdc.com>
> Subject: Re: [PATCH] Performance Improvement in CRC16 Calculations.
>
> On Tue, Aug 21, 2018 at 09:40:34PM -0400, Martin K. Petersen wrote:
>> When crc-t10dif is initialized, the crypto infrastructure will pick
>> the algorithm with the highest priority currently registered. Both
>> block and SCSI will cause crc-t10dif to be compiled as a built-in so
>> this selection happens very early.
>
> Ouch.  This might actually happen in a lot of other users of the crypto 
> functionality as well.
>
>> However, it seems like a bit of a deficiency in crypto that there is
>> no way to upgrade existing transformations if higher priority
>> algorithms become available. btrfs and a few others work around this
>> issue by not using the generic lib/ CRC functions (which defeats the
>> purpose of having these in the first place). Instead they are
>> registering their own transformation at a later time where any
>> accelerator modules are more likely to be loaded.
>
> If we can't fix this in crypto (which doesn't seem that easy), we should at 
> least clearly document the issue somewhere, and fix this in the t10pi code by 
> initializing crct10dif_tfm in a lazy fashion only once the fist block device 
> starts using it.

Reply via email to