Re: [RFC V1 0/7] Introduce AVX512 optimized crypto algorithms

2021-01-16 Dey, Megha
Hi Ard, On 1/16/2021 8:52 AM, Ard Biesheuvel wrote: On Mon, 28 Dec 2020 at 20:11, Dey, Megha wrote: Hi Eric, On 12/21/2020 3:20 PM, Eric Biggers wrote: On Fri, Dec 18, 2020 at 01:10:57PM -0800, Megha Dey wrote: Optimize crypto algorithms using VPCLMULQDQ and VAES AVX512 instructions (first

Re: [linux-next:master 952/3956] crypto/blake2b_generic.c:73:13: warning: stack frame size of 9776 bytes in function 'blake2b_compress_one_generic'

2021-01-16 Arnd Bergmann
On Sat, Jan 16, 2021 at 2:59 AM Eric Biggers wrote: > On Sat, Jan 16, 2021 at 08:59:50AM +0800, kernel test robot wrote > > Looks like the clang bug that causes large stack usage in this function > (https://bugs.llvm.org/show_bug.cgi?id=45803 which is still unfixed) got > triggered again. Note th

Re: [RFC V1 0/7] Introduce AVX512 optimized crypto algorithms

2021-01-16 Ard Biesheuvel
On Mon, 28 Dec 2020 at 20:11, Dey, Megha wrote: > > Hi Eric, > > On 12/21/2020 3:20 PM, Eric Biggers wrote: > > On Fri, Dec 18, 2020 at 01:10:57PM -0800, Megha Dey wrote: > >> Optimize crypto algorithms using VPCLMULQDQ and VAES AVX512 instructions > >> (first implemented on Intel's Icelake client

Re: [RFC V1 3/7] crypto: ghash - Optimized GHASH computations

2021-01-16 Ard Biesheuvel
On Sat, 16 Jan 2021 at 06:13, Dave Hansen wrote: > > On 1/15/21 6:04 PM, Eric Biggers wrote: > > On Fri, Jan 15, 2021 at 04:20:44PM -0800, Dave Hansen wrote: > >> On 1/15/21 4:14 PM, Dey, Megha wrote: > >>> Also, I do not know of any cores that implement PCLMULQDQ and not AES-NI. > >> That's true,

Re: [RFC V1 1/7] x86: Probe assembler capabilities for VAES and VPLCMULQDQ support

2021-01-16 Ard Biesheuvel
On Fri, 18 Dec 2020 at 22:07, Megha Dey wrote: > > This is a preparatory patch to introduce the optimized crypto algorithms > using AVX512 instructions which would require VAES and VPCLMULQDQ support. > > Check for VAES and VPCLMULQDQ assembler support using AVX512 registers. > > Cc: x...@kernel.o

Re: [RFC V1 7/7] crypto: aesni - AVX512 version of AESNI-GCM using VPCLMULQDQ

2021-01-16 Ard Biesheuvel
On Fri, 18 Dec 2020 at 22:08, Megha Dey wrote: > > Introduce the AVX512 implementation that optimizes the AESNI-GCM encode > and decode routines using VPCLMULQDQ. > > The glue code in AESNI module overrides the existing AVX2 GCM mode > encryption/decryption routines with the AVX512 AES GCM mode one

Re: [RFC V1 5/7] crypto: aesni - AES CTR x86_64 "by16" AVX512 optimization

2021-01-16 Ard Biesheuvel
On Fri, 18 Dec 2020 at 22:08, Megha Dey wrote: > > Introduce the "by16" implementation of the AES CTR mode using AVX512 > optimizations. "by16" means that 16 independent blocks (each block > being 128 bits) can be ciphered simultaneously as opposed to the > current 8 blocks. > > The glue code in A

[PATCH 2/2] crypto: aesni - release FPU during skcipher walk API calls

2021-01-16 Ard Biesheuvel
Taking ownership of the FPU in kernel mode disables preemption, and this may result in excessive scheduling blackouts if the size of the data being processed on the FPU is unbounded. Given that taking and releasing the FPU is cheap these days on x86, we can limit the impact of this issue easily fo

[PATCH 0/2] crypto: aesni - fix more FPU handling and indirect call issues

2021-01-16 Ard Biesheuvel
My recent patches to the AES-NI driver addressed all the instances of indirect calls occurring in the XTS and GCM drivers, and while at it, limited the scope of FPU enabled/preemption disabled regions not to cover the work that goes on inside the skcipher walk API. This gets rid of scheduling laten

[PATCH 1/2] crypto: aesni - replace CTR function pointer with static call

2021-01-16 Ard Biesheuvel
Indirect calls are very expensive on x86, so use a static call to set the system-wide AES-NI/CTR asm helper. Signed-off-by: Ard Biesheuvel --- arch/x86/crypto/aesni-intel_glue.c | 13 +++-- 1 file changed, 7 insertions(+), 6 deletions(-) diff --git a/arch/x86/crypto/aesni-intel_glue.c

Re: [RFC V1 2/7] crypto: crct10dif - Accelerated CRC T10 DIF with vectorized instruction

2021-01-16 Ard Biesheuvel
On Fri, 18 Dec 2020 at 22:07, Megha Dey wrote: > > From: Kyung Min Park > > Update the crc_pcl function that calculates T10 Data Integrity Field > CRC16 (CRC T10 DIF) using VPCLMULQDQ instruction. VPCLMULQDQ instruction > with AVX-512F adds EVEX encoded 512 bit version of PCLMULQDQ instruction. >