Re: [PATCH] Performance Improvement in CRC16 Calculations.

2018-08-25 Thread Martin K. Petersen
Herbert,

> I don't think this is safe unless you do some kind of locking
> which would slow down the data path. The easiest fix would be
> to keep the old tfm around forever, or use RCU if RCU read locking
> is acceptable to your use-case.

You're right. There's a small race there. Patch serie

Re: [PATCH] Performance Improvement in CRC16 Calculations.

2018-08-24 Thread Herbert Xu
On Fri, Aug 24, 2018 at 05:46:15PM -0400, Martin K. Petersen wrote:
>
> +#ifdef CONFIG_MODULES
> +	struct module *mod = data;
> +
> +	if (val != MODULE_STATE_LIVE ||
> +	    strncmp(mod->name, "crct10dif", strlen("crct10dif")))
> +		return 0;
> +
> +	/* Fall back to libra

Re: [PATCH] Performance Improvement in CRC16 Calculations.

2018-08-24 Thread Martin K. Petersen
Ard,

> This looks like it should work, yes. It does rely on the module name
> to start with 'crct10dif' but I guess that is reasonable, and matches
> the current state on all architectures.

Yep, I verified the module names on ARM and Power. There really wasn't much I could key off of other than

Re: [PATCH] Performance Improvement in CRC16 Calculations.

2018-08-24 Thread Ard Biesheuvel
On 24 August 2018 at 22:46, Martin K. Petersen wrote:
>
> Ard,
>
>> I'd prefer to handle this without help from userland.
>>
>> It shouldn't be too difficult to register a module notifier that only
>> sets a flag (or the static key, even?), and to free and re-allocate
>> the crc_t10dif transform i

Re: [PATCH] Performance Improvement in CRC16 Calculations.

2018-08-24 Thread Martin K. Petersen
Ard,

> I'd prefer to handle this without help from userland.
>
> It shouldn't be too difficult to register a module notifier that only
> sets a flag (or the static key, even?), and to free and re-allocate
> the crc_t10dif transform if the flag is set.

Something like this proof of concept?

diff

Re: [PATCH] Performance Improvement in CRC16 Calculations.

2018-08-24 Thread Ard Biesheuvel
On 24 August 2018 at 17:29, Martin K. Petersen wrote:
>
> Ard,
>
>> Would it be possible to allocate the crypto transform upon first use
>> instead of from an initcall? If crc_t10dif() is mostly called from
>> non-process context, that would not really work, but otherwise, we
>> could simply defer

Re: [PATCH] Performance Improvement in CRC16 Calculations.

2018-08-24 Thread Martin K. Petersen
Jeffrey,

> So it seems the CONFIG_CRYPTO_CRCT10DIF_PCLMUL=y provides the best
> performance.

Thanks for confirming!

> Are there any negative side effect to this config option?

Other than kernel image size, not really.

--
Martin K. Petersen	Oracle Linux Engineering

Re: [PATCH] Performance Improvement in CRC16 Calculations.

2018-08-24 Thread Martin K. Petersen
Ard,

> Would it be possible to allocate the crypto transform upon first use
> instead of from an initcall? If crc_t10dif() is mostly called from
> non-process context, that would not really work, but otherwise, we
> could simply defer it (and occasional calls from non-process context
> that do o

Re: [PATCH] Performance Improvement in CRC16 Calculations.

2018-08-24 Thread Ard Biesheuvel
> On Tue, Aug 21, 2018 at 09:40:34PM -0400, Martin K. Petersen wrote:
>> W

RE: [PATCH] Performance Improvement in CRC16 Calculations.

2018-08-24 Thread Jeffrey Lien
On Tue, Aug 21, 2018 at 09:40:34PM -0400, Martin K. Petersen wrote:
> When crc-t10dif is initialized, the crypto infrastructure will pick
> the al

Re: [PATCH] Performance Improvement in CRC16 Calculations.

2018-08-21 Thread Christoph Hellwig
On Tue, Aug 21, 2018 at 09:40:34PM -0400, Martin K. Petersen wrote:
> When crc-t10dif is initialized, the crypto infrastructure will pick the
> algorithm with the highest priority currently registered. Both block and
> SCSI will cause crc-t10dif to be compiled as a built-in so this
> selection happ

Re: [PATCH] Performance Improvement in CRC16 Calculations.

2018-08-21 Thread Martin K. Petersen
> These days we obviously use the hardware-accelerated CRC calculation
> so the software table approach mostly serves as a reference
> implementation.

I was puzzled as to why WDC's tests did not seem to use the hardware-accelerated CRC calculation whereas tests on my end worked fine. Turns out

Re: [PATCH] Performance Improvement in CRC16 Calculations.

2018-08-16 Thread Martin K. Petersen
> With regard to your comment about slice (table ?) size, that is
> partially addressed by a kernel build time option shown in the above
> patch. That could be taken a bit further with a sysfs knob (where ?)
> to reduce the effective table size from that which the kernel is built
> with.

To incre

Re: [PATCH] Performance Improvement in CRC16 Calculations.

2018-08-16 Thread Douglas Gilbert
On Fri, Aug 10, 2018 at 02:12:11PM -0500, Jeff Lien wrote:
This p

Re: [PATCH] Performance Improvement in CRC16 Calculations.

2018-08-16 Thread Christophe LEROY
On Fri, Aug 10, 2018 at 02:12:11PM -0500, Jeff Lien wrote:
This patch provides a performance improvement for the CRC16 calculations done in read/write workloads using the T10 Type 1/2/3 guard field.

Re: [PATCH] Performance Improvement in CRC16 Calculations.

2018-08-16 Thread Christophe LEROY
On Fri, Aug 10, 2018 at 02:12:11PM -0500, Jeff Lien wrote:
This p

Re: [PATCH] Performance Improvement in CRC16 Calculations.

2018-08-16 Thread Douglas Gilbert
On Fri, Aug 10, 201

RE: [PATCH] Performance Improvement in CRC16 Calculations.

2018-08-16 Thread Jeffrey Lien
Cc: linux-ker...@vger.kernel.org; linux-crypto@vger.kernel.org; linux-bl...@vger.kernel.org; linux-s...@vger.kernel.org; herb...@gondor.apana.org.au; tim.c.c...@linux.intel.com; martin.peter...@oracle.com; David Darrington ; Jeff Furlong Subject: Re: [PATCH] Performance Improvement in CRC16 Calcul

Re: [PATCH] Performance Improvement in CRC16 Calculations.

2018-08-15 Thread Pavel Machek
Hi!

> This patch provides a performance improvement for the CRC16 calculations
> done in read/write workloads using the T10 Type 1/2/3 guard field. For
> example, today with sequential write workloads (one thread/CPU of IO)
> we consume 100% of the CPU because of the CRC16 computation
> bo

RE: [PATCH] Performance Improvement in CRC16 Calculations.

2018-08-15 Thread Jeffrey Lien
On 08/10/2018

Re: [PATCH] Performance Improvement in CRC16 Calculations.

2018-08-13 Thread Tim Chen
On 08/10/2018 12:12 PM, Jeff Lien wrote:
> This patch provides a performance improvement for the CRC16 calculations
> done in read/write workloads using the T10 Type 1/2/3 guard field. For
> example, today with sequential write workloads (one thread/CPU of IO)
> we consume 100% of the CPU beca

RE: [PATCH] Performance Improvement in CRC16 Calculations.

2018-08-13 Thread Jeffrey Lien
On Sat, 2018-08-11 at 02:04 -0700, Joe Perches wrote:
> On Fri, 2018-08-10 at 22:39 -0400, Douglas Gilbert wrote:
> > but below is a copy and paste of a table 27 from draft SBC-4
> > revision 15 in chapter

RE: [PATCH] Performance Improvement in CRC16 Calculations.

2018-08-13 Thread David Laight
> A more interesting version would be to generate the lookup table
> for a byte followed by 3 zero bytes.
> You could then run four separate register dependency chains using the
> same 256 entry lookup table.

Not sure that works with a table lookup :-(

	David

-
Registered Address Lakeside

RE: [PATCH] Performance Improvement in CRC16 Calculations.

2018-08-13 Thread David Laight
From: Jeff Lien
> Sent: 10 August 2018 20:12
>
> This patch provides a performance improvement for the CRC16 calculations
> done in read/write workloads using the T10 Type 1/2/3 guard field. For
> example, today with sequential write workloads (one thread/CPU of IO)
> we consume 100% of the C

Re: [PATCH] Performance Improvement in CRC16 Calculations.

2018-08-12 Thread Chaitanya Kulkarni
On 8/10/18, 12:13 PM, "linux-block-ow...@vger.kernel.org on behalf of Jeff Lien" wrote:
This patch provides a performance improvement for the CRC16 calculations done in read/write workloads using the T10 Type 1/2/3 guard field. For example, today with sequential write workloads

Re: [PATCH] Performance Improvement in CRC16 Calculations.

2018-08-12 Thread Joe Perches
On Sun, 2018-08-12 at 23:36 -0400, Douglas Gilbert wrote:
> On 2018-08-10 08:11 PM, Joe Perches wrote:
> > On Fri, 2018-08-10 at 16:02 -0400, Nicolas Pitre wrote:
> > > On Fri, 10 Aug 2018, Joe Perches wrote:
> > >
> > > > On Fri, 2018-08-10 at 14:12 -0500, Jeff Lien wrote:
> > > > > This patch pr

Re: [PATCH] Performance Improvement in CRC16 Calculations.

2018-08-12 Thread Douglas Gilbert
On 2018-08-10 08:11 PM, Joe Perches wrote:
On Fri, 2018-08-10 at 16:02 -0400, Nicolas Pitre wrote:
On Fri, 10 Aug 2018, Joe Perches wrote:
On Fri, 2018-08-10 at 14:12 -0500, Jeff Lien wrote:
This patch provides a performance improvement for the CRC16 calculations done in read/write workloads

Re: [PATCH] Performance Improvement in CRC16 Calculations.

2018-08-11 Thread Joe Perches
On Sat, 2018-08-11 at 11:36 -0400, Martin K. Petersen wrote:
> Jeff,
>
> > This patch provides a performance improvement for the CRC16
> > calculations done in read/write workloads using the T10 Type 1/2/3
> > guard field. For example, today with sequential write workloads (one
> > thread/CPU of

Re: [PATCH] Performance Improvement in CRC16 Calculations.

2018-08-11 Thread Martin K. Petersen
Jeff,

> This patch provides a performance improvement for the CRC16
> calculations done in read/write workloads using the T10 Type 1/2/3
> guard field. For example, today with sequential write workloads (one
> thread/CPU of IO) we consume 100% of the CPU because of the CRC16
> computation bottl

Re: [PATCH] Performance Improvement in CRC16 Calculations.

2018-08-11 Thread Joe Perches
On Sat, 2018-08-11 at 02:04 -0700, Joe Perches wrote:
> On Fri, 2018-08-10 at 22:39 -0400, Douglas Gilbert wrote:
> > but below is a copy and paste of a table 27 from draft SBC-4
> > revision 15 in chapter 4.22.4.4 on page 87.
>
> The posted code returns the proper crc for each
> CONFIG_CRYPTO_CRC

Re: [PATCH] Performance Improvement in CRC16 Calculations.

2018-08-11 Thread Joe Perches
On Fri, 2018-08-10 at 22:39 -0400, Douglas Gilbert wrote:
> but below is a copy and paste of a table 27 from draft SBC-4
> revision 15 in chapter 4.22.4.4 on page 87.

The posted code returns the proper crc for each CONFIG_CRYPTO_CRCT10DIF_TABLE_SIZE value from 1 to 5 for these arrays.
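
For readers following the thread without the patch at hand, the kind of cross-check described above can be sketched in userspace C. This is not the posted patch: it only compares a bit-at-a-time reference against a single 256-entry table for the T10-DIF polynomial 0x8BB7, and the names crc_t10dif_bitwise(), crc_t10dif_bytewise(), crc_t10dif_init_table() and crc_t10dif_selfcheck() are made up for this illustration. The SBC-4 test vectors themselves are not reproduced here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define T10DIF_POLY 0x8BB7u	/* T10-DIF generator polynomial, MSB first */

/* Reference: process one bit at a time (hypothetical helper name). */
uint16_t crc_t10dif_bitwise(uint16_t crc, const uint8_t *buf, size_t len)
{
	while (len--) {
		crc ^= (uint16_t)(*buf++ << 8);
		for (int bit = 0; bit < 8; bit++)
			crc = (crc & 0x8000u)
				? (uint16_t)((crc << 1) ^ T10DIF_POLY)
				: (uint16_t)(crc << 1);
	}
	return crc;
}

/* 256-entry table: entry i is the CRC of the single byte i. */
static uint16_t crc_table[256];

void crc_t10dif_init_table(void)
{
	for (unsigned int i = 0; i < 256; i++) {
		uint8_t b = (uint8_t)i;

		crc_table[i] = crc_t10dif_bitwise(0, &b, 1);
	}
}

/* Table-driven variant: one lookup per input byte. */
uint16_t crc_t10dif_bytewise(uint16_t crc, const uint8_t *buf, size_t len)
{
	while (len--)
		crc = (uint16_t)((crc << 8) ^
				 crc_table[(uint8_t)(crc >> 8) ^ *buf++]);
	return crc;
}

/* Cross-check both variants on a 32-byte incrementing pattern. */
void crc_t10dif_selfcheck(void)
{
	uint8_t buf[32];

	for (size_t i = 0; i < sizeof(buf); i++)
		buf[i] = (uint8_t)i;

	crc_t10dif_init_table();
	assert(crc_t10dif_bitwise(0, buf, sizeof(buf)) ==
	       crc_t10dif_bytewise(0, buf, sizeof(buf)));
}
```

Calling crc_t10dif_selfcheck() from a small main() is enough to exercise it. A wider slice-by-N variant would need the same comparison for each table size, which is presumably what the check against the SBC-4 arrays covers.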

Re: [PATCH] Performance Improvement in CRC16 Calculations.

2018-08-10 Thread Douglas Gilbert
On 2018-08-10 08:11 PM, Joe Perches wrote:
On Fri, 2018-08-10 at 16:02 -0400, Nicolas Pitre wrote:
On Fri, 10 Aug 2018, Joe Perches wrote:
On Fri, 2018-08-10 at 14:12 -0500, Jeff Lien wrote:
This patch provides a performance improvement for the CRC16 calculations done in read/write workloads

Re: [PATCH] Performance Improvement in CRC16 Calculations.

2018-08-10 Thread Nicolas Pitre
On Fri, 10 Aug 2018, Joe Perches wrote:
> On Fri, 2018-08-10 at 16:02 -0400, Nicolas Pitre wrote:
> > On Fri, 10 Aug 2018, Joe Perches wrote:
> >
> > > On Fri, 2018-08-10 at 14:12 -0500, Jeff Lien wrote:
> > > > This patch provides a performance improvement for the CRC16
> > > > calculations don

Re: [PATCH] Performance Improvement in CRC16 Calculations.

2018-08-10 Thread Joe Perches
On Fri, 2018-08-10 at 16:02 -0400, Nicolas Pitre wrote:
> On Fri, 10 Aug 2018, Joe Perches wrote:
>
> > On Fri, 2018-08-10 at 14:12 -0500, Jeff Lien wrote:
> > > This patch provides a performance improvement for the CRC16
> > > calculations done in read/write workloads using the T10 Type 1/

Re: [PATCH] Performance Improvement in CRC16 Calculations.

2018-08-10 Thread Douglas Gilbert
On 2018-08-10 03:12 PM, Jeff Lien wrote:
This patch provides a performance improvement for the CRC16 calculations done in read/write workloads using the T10 Type 1/2/3 guard field. For example, today with sequential write workloads (one thread/CPU of IO) we consume 100% of the CPU because of t

Re: [PATCH] Performance Improvement in CRC16 Calculations.

2018-08-10 Thread Eric Biggers
On Fri, Aug 10, 2018 at 02:12:11PM -0500, Jeff Lien wrote:
> This patch provides a performance improvement for the CRC16 calculations
> done in read/write workloads using the T10 Type 1/2/3 guard field. For
> example, today with sequential write workloads (one thread/CPU of IO)
> we consume 10

Re: [PATCH] Performance Improvement in CRC16 Calculations.

2018-08-10 Thread Nicolas Pitre
On Fri, 10 Aug 2018, Jeff Lien wrote:
> This patch provides a performance improvement for the CRC16 calculations
> done in read/write workloads using the T10 Type 1/2/3 guard field. For
> example, today with sequential write workloads (one thread/CPU of IO)
> we consume 100% of the CPU becaus

Re: [PATCH] Performance Improvement in CRC16 Calculations.

2018-08-10 Thread Nicolas Pitre
On Fri, 10 Aug 2018, Joe Perches wrote:
> On Fri, 2018-08-10 at 14:12 -0500, Jeff Lien wrote:
> > This patch provides a performance improvement for the CRC16
> > calculations done in read/write workloads using the T10 Type 1/2/3
> > guard field. For example, today with sequential write
> >

Re: [PATCH] Performance Improvement in CRC16 Calculations.

2018-08-10 Thread Joe Perches
On Fri, 2018-08-10 at 14:12 -0500, Jeff Lien wrote:
> This patch provides a performance improvement for the CRC16 calculations
> done in read/write workloads using the T10 Type 1/2/3 guard field. For
> example, today with sequential write workloads (one thread/CPU of IO)
> we consume 100% of t