Re: [PATCH v4 6/6] crypto: lib/sha256 - Unroll LOAD and BLEND loops

2020-10-25 Thread Arvind Sankar
On Sun, Oct 25, 2020 at 11:23:52PM +, David Laight wrote: > From: Arvind Sankar > > Sent: 25 October 2020 20:18 > > > > On Sun, Oct 25, 2020 at 06:51:18PM +, David Laight wrote: > > > From: Arvind Sankar > > > > Sent: 25 October 2020 14:31

Re: [PATCH v4 6/6] crypto: lib/sha256 - Unroll LOAD and BLEND loops

2020-10-25 Thread Arvind Sankar
On Sun, Oct 25, 2020 at 06:51:18PM +, David Laight wrote: > From: Arvind Sankar > > Sent: 25 October 2020 14:31 > > > > Unrolling the LOAD and BLEND loops improves performance by ~8% on x86_64 > > (tested on Broadwell Xeon) while not increasing code size too

[PATCH v4 1/6] crypto: lib/sha256 - Use memzero_explicit() for clearing state

2020-10-25 Thread Arvind Sankar
. Signed-off-by: Arvind Sankar Reviewed-by: Eric Biggers --- lib/crypto/sha256.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/lib/crypto/sha256.c b/lib/crypto/sha256.c index 2321f6cb322f..d43bc39ab05e 100644 --- a/lib/crypto/sha256.c +++ b/lib/crypto/sha256.c @@ -265,7 +265,7

[PATCH v4 3/6] crypto: lib/sha256 - Don't clear temporary variables

2020-10-25 Thread Arvind Sankar
compiler-generated temporaries that are impossible to clear in any case. So drop the clearing of a through h and t1/t2. Signed-off-by: Arvind Sankar Reviewed-by: Eric Biggers --- lib/crypto/sha256.c | 1 - 1 file changed, 1 deletion(-) diff --git a/lib/crypto/sha256.c b/lib/crypto/sha256.c index

[PATCH v4 5/6] crypto: lib/sha256 - Unroll SHA256 loop 8 times instead of 64

2020-10-25 Thread Arvind Sankar
This reduces code size substantially (on x86_64 with gcc-10 the size of sha256_update() goes from 7593 bytes to 1952 bytes including the new SHA256_K array), and on x86 is slightly faster than the full unroll (tested on Broadwell Xeon). Signed-off-by: Arvind Sankar Reviewed-by: Eric Biggers
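The round structure this patch describes can be sketched in plain userspace C. This is a minimal sketch, not the patch itself: the round constants live in a 64-entry SHA256_K table, and the round loop runs eight rounds per iteration, rotating the roles of a..h through the macro arguments so no values are shuffled between variables. The helper names (ror32, Ch, Maj, e0, e1, s0, s1) follow kernel conventions but are re-created here as assumptions, sha256_digest() is a hypothetical one-shot wrapper added only so the sketch can be checked against standard test vectors, and unlike the kernel code this sketch does not clear W[].

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uint32_t u32;

static inline u32 ror32(u32 x, unsigned int n) { return (x >> n) | (x << (32 - n)); }

#define Ch(x, y, z)  ((z) ^ ((x) & ((y) ^ (z))))
#define Maj(x, y, z) (((x) & (y)) | ((z) & ((x) | (y))))
#define e0(x) (ror32(x, 2) ^ ror32(x, 13) ^ ror32(x, 22))
#define e1(x) (ror32(x, 6) ^ ror32(x, 11) ^ ror32(x, 25))
#define s0(x) (ror32(x, 7) ^ ror32(x, 18) ^ ((x) >> 3))
#define s1(x) (ror32(x, 17) ^ ror32(x, 19) ^ ((x) >> 10))

/* Round constants, hoisted into a table so the unrolled loop can index them. */
static const u32 SHA256_K[64] = {
	0x428a2f98, 0x71374491, 0xb5c0fbcf, 0xe9b5dba5, 0x3956c25b, 0x59f111f1, 0x923f82a4, 0xab1c5ed5,
	0xd807aa98, 0x12835b01, 0x243185be, 0x550c7dc3, 0x72be5d74, 0x80deb1fe, 0x9bdc06a7, 0xc19bf174,
	0xe49b69c1, 0xefbe4786, 0x0fc19dc6, 0x240ca1cc, 0x2de92c6f, 0x4a7484aa, 0x5cb0a9dc, 0x76f988da,
	0x983e5152, 0xa831c66d, 0xb00327c8, 0xbf597fc7, 0xc6e00bf3, 0xd5a79147, 0x06ca6351, 0x14292967,
	0x27b70a85, 0x2e1b2138, 0x4d2c6dfc, 0x53380d13, 0x650a7354, 0x766a0abb, 0x81c2c92e, 0x92722c85,
	0xa2bfe8a1, 0xa81a664b, 0xc24b8b70, 0xc76c51a3, 0xd192e819, 0xd6990624, 0xf40e3585, 0x106aa070,
	0x19a4c116, 0x1e376c08, 0x2748774c, 0x34b0bcb5, 0x391c0cb3, 0x4ed8aa4a, 0x5b9cca4f, 0x682e6ff3,
	0x748f82ee, 0x78a5636f, 0x84c87814, 0x8cc70208, 0x90befffa, 0xa4506ceb, 0xbef9a3f7, 0xc67178f2,
};

/* One round. Only d and h are written; the caller rotates the argument
 * order so the new a..h land in the right variables without moves. */
#define SHA256_ROUND(i, a, b, c, d, e, f, g, h) do {			\
	u32 t1 = (h) + e1(e) + Ch(e, f, g) + SHA256_K[i] + W[i];	\
	u32 t2 = e0(a) + Maj(a, b, c);					\
	(d) += t1;							\
	(h) = t1 + t2;							\
} while (0)

static void sha256_transform(u32 state[8], const uint8_t *in)
{
	u32 W[64], a, b, c, d, e, f, g, h;
	int i;

	for (i = 0; i < 16; i++)	/* big-endian message words */
		W[i] = (u32)in[4 * i] << 24 | (u32)in[4 * i + 1] << 16 |
		       (u32)in[4 * i + 2] << 8 | (u32)in[4 * i + 3];
	for (i = 16; i < 64; i++)
		W[i] = s1(W[i - 2]) + W[i - 7] + s0(W[i - 15]) + W[i - 16];

	a = state[0]; b = state[1]; c = state[2]; d = state[3];
	e = state[4]; f = state[5]; g = state[6]; h = state[7];

	/* Eight rounds per iteration: after eight rotations every variable is
	 * back in its starting role, so the same body can simply repeat. */
	for (i = 0; i < 64; i += 8) {
		SHA256_ROUND(i + 0, a, b, c, d, e, f, g, h);
		SHA256_ROUND(i + 1, h, a, b, c, d, e, f, g);
		SHA256_ROUND(i + 2, g, h, a, b, c, d, e, f);
		SHA256_ROUND(i + 3, f, g, h, a, b, c, d, e);
		SHA256_ROUND(i + 4, e, f, g, h, a, b, c, d);
		SHA256_ROUND(i + 5, d, e, f, g, h, a, b, c);
		SHA256_ROUND(i + 6, c, d, e, f, g, h, a, b);
		SHA256_ROUND(i + 7, b, c, d, e, f, g, h, a);
	}

	state[0] += a; state[1] += b; state[2] += c; state[3] += d;
	state[4] += e; state[5] += f; state[6] += g; state[7] += h;
	/* Unlike the kernel code, this sketch does not clear W[]. */
}

/* Hypothetical one-shot wrapper for testing: full blocks, then 0x80, zero
 * padding and the 64-bit big-endian bit length in one or two final blocks. */
static void sha256_digest(const uint8_t *data, size_t len, uint8_t out[32])
{
	u32 state[8] = { 0x6a09e667, 0xbb67ae85, 0x3c6ef372, 0xa54ff53a,
			 0x510e527f, 0x9b05688c, 0x1f83d9ab, 0x5be0cd19 };
	uint8_t block[128] = { 0 };
	size_t rem = len % 64, full = len - rem, i;
	size_t padlen = rem < 56 ? 64 : 128;
	uint64_t bits = (uint64_t)len * 8;

	for (i = 0; i < full; i += 64)
		sha256_transform(state, data + i);
	memcpy(block, data + full, rem);
	block[rem] = 0x80;
	for (i = 0; i < 8; i++)
		block[padlen - 1 - i] = (uint8_t)(bits >> (8 * i));
	for (i = 0; i < padlen; i += 64)
		sha256_transform(state, block + i);
	for (i = 0; i < 32; i++)
		out[i] = (uint8_t)(state[i / 4] >> (24 - 8 * (i % 4)));
}
```

Eight is the smallest unroll factor at which the argument rotation returns to its starting order, which is why the loop body can repeat unchanged; unrolling all 64 rounds adds nothing for correctness and, per the sizes quoted in the commit message, costs several kilobytes of code.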

[PATCH v4 4/6] crypto: lib/sha256 - Clear W[] in sha256_update() instead of sha256_transform()

2020-10-25 Thread Arvind Sankar
() implementation, and considerably more (~20%) with a bad one (eg the x86 purgatory currently uses a memset() coded in C). Signed-off-by: Arvind Sankar Reviewed-by: Eric Biggers --- lib/crypto/sha256.c | 11 +-- 1 file changed, 5 insertions(+), 6 deletions(-) diff --git a/lib/crypto
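The ownership change in this patch can be illustrated with a reduced sketch. Assumptions: wipe() stands in for memzero_explicit(), transform() is a hypothetical placeholder that only fills the caller-supplied W[] (it does not compute SHA-256), and the wipe counter exists purely for illustration. The point is that the update function owns one W[] for all blocks and clears it once, instead of the schedule being cleared after every block.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uint32_t u32;
typedef uint8_t u8;

static unsigned int wipe_calls;		/* counts wipes; illustration only */

/* Stand-in for memzero_explicit(): memset followed by an empty asm that
 * claims to read the buffer (GCC/Clang syntax), so the store is kept. */
static void wipe(void *p, size_t n)
{
	memset(p, 0, n);
	__asm__ __volatile__("" : : "r"(p) : "memory");
	wipe_calls++;
}

/* Hypothetical placeholder for sha256_transform(): it fills the caller's
 * W[] and folds it into state, but does NOT compute SHA-256. */
static void transform(u32 state[8], const u8 *in, u32 W[64])
{
	int i;

	for (i = 0; i < 16; i++) {
		W[i] = (u32)in[4 * i] << 24 | (u32)in[4 * i + 1] << 16 |
		       (u32)in[4 * i + 2] << 8 | (u32)in[4 * i + 3];
		state[i % 8] += W[i];
	}
}

/* Old shape (equivalent to the transform clearing its own W[]):
 * the schedule is wiped after every 64-byte block. */
static void update_wipe_per_block(u32 state[8], const u8 *data, size_t len)
{
	size_t done;

	for (done = 0; done + 64 <= len; done += 64) {
		u32 W[64];

		transform(state, data + done, W);
		wipe(W, sizeof(W));
	}
}

/* New shape: one W[] for the whole update, wiped once at the end. */
static void update_wipe_once(u32 state[8], const u8 *data, size_t len)
{
	u32 W[64];
	size_t done;

	for (done = 0; done + 64 <= len; done += 64)
		transform(state, data + done, W);
	wipe(W, sizeof(W));
}
```

With many blocks per update call, this turns one 256-byte clear per 64-byte block into a single clear per call, which is why the cost of the memset() implementation matters so much in the figures quoted above.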

[PATCH v4 2/6] crypto: Use memzero_explicit() for clearing state

2020-10-25 Thread Arvind Sankar
Without the barrier_data() inside memzero_explicit(), the compiler may optimize away the state-clearing if it can tell that the state is not used afterwards. Signed-off-by: Arvind Sankar --- arch/arm64/crypto/ghash-ce-glue.c | 2 +- arch/arm64/crypto/poly1305-glue.c | 2 +- arch/arm64/crypto
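A sketch of why the barrier matters, modeled on the kernel's memzero_explicit() and barrier_data(). The definitions below are userspace re-creations using GCC/Clang extended asm, and struct example_state and example_final() are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Modeled on barrier_data(): an empty asm that takes the pointer as input and
 * clobbers "memory", so the compiler must assume the bytes behind it are read. */
#define barrier_data(ptr) __asm__ __volatile__("" : : "r"(ptr) : "memory")

static inline void memzero_explicit(void *s, size_t count)
{
	memset(s, 0, count);
	barrier_data(s);
}

/* Hypothetical hash-like context holding secret state. */
struct example_state {
	uint32_t state[8];
	unsigned char buf[64];
};

static void example_final(struct example_state *sctx, uint32_t out[8])
{
	memcpy(out, sctx->state, sizeof(sctx->state));
	/* If this is inlined into a caller whose context is a stack variable
	 * that is never read again, a plain memset() here is a dead store the
	 * optimizer may delete. The barrier keeps the clearing in place. */
	memzero_explicit(sctx, sizeof(*sctx));
}
```

The asm template is empty, so the barrier emits no instructions itself; its only effect is on what the optimizer is allowed to assume about the cleared memory.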

[PATCH v4 6/6] crypto: lib/sha256 - Unroll LOAD and BLEND loops

2020-10-25 Thread Arvind Sankar
Unrolling the LOAD and BLEND loops improves performance by ~8% on x86_64 (tested on Broadwell Xeon) while not increasing code size too much. Signed-off-by: Arvind Sankar Reviewed-by: Eric Biggers --- lib/crypto/sha256.c | 24 1 file changed, 20 insertions(+), 4
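The LOAD and BLEND loops fill the message schedule W[]. Below is a sketch of the rolled and unrolled forms, assuming helpers modeled on the kernel's LOAD_OP()/BLEND_OP() (LOAD_OP is reimplemented without get_unaligned_be32() so it builds in userspace); it is an illustration of the technique, not the patch itself.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef uint32_t u32;
typedef uint8_t u8;

static inline u32 ror32(u32 x, unsigned int n) { return (x >> n) | (x << (32 - n)); }
static inline u32 s0(u32 x) { return ror32(x, 7) ^ ror32(x, 18) ^ (x >> 3); }
static inline u32 s1(u32 x) { return ror32(x, 17) ^ ror32(x, 19) ^ (x >> 10); }

/* Load message word I (big-endian) from the 64-byte input block. */
static inline void LOAD_OP(int I, u32 *W, const u8 *input)
{
	const u8 *p = input + 4 * I;

	W[I] = (u32)p[0] << 24 | (u32)p[1] << 16 | (u32)p[2] << 8 | (u32)p[3];
}

/* Expand schedule word I from four earlier words. */
static inline void BLEND_OP(int I, u32 *W)
{
	W[I] = s1(W[I - 2]) + W[I - 7] + s0(W[I - 15]) + W[I - 16];
}

/* Rolled form: one operation per loop iteration. */
static void schedule_rolled(u32 W[64], const u8 *input)
{
	int i;

	for (i = 0; i < 16; i++)
		LOAD_OP(i, W, input);
	for (i = 16; i < 64; i++)
		BLEND_OP(i, W);
}

/* Unrolled by 8: 16 and 48 are multiples of 8, so no remainder loop is
 * needed, and the per-iteration counter and branch overhead drops 8x. */
static void schedule_unrolled(u32 W[64], const u8 *input)
{
	int i;

	for (i = 0; i < 16; i += 8) {
		LOAD_OP(i + 0, W, input); LOAD_OP(i + 1, W, input);
		LOAD_OP(i + 2, W, input); LOAD_OP(i + 3, W, input);
		LOAD_OP(i + 4, W, input); LOAD_OP(i + 5, W, input);
		LOAD_OP(i + 6, W, input); LOAD_OP(i + 7, W, input);
	}
	for (i = 16; i < 64; i += 8) {
		BLEND_OP(i + 0, W); BLEND_OP(i + 1, W);
		BLEND_OP(i + 2, W); BLEND_OP(i + 3, W);
		BLEND_OP(i + 4, W); BLEND_OP(i + 5, W);
		BLEND_OP(i + 6, W); BLEND_OP(i + 7, W);
	}
}
```

Both forms must produce an identical schedule; the unrolled one simply gives the compiler straight-line code to schedule, at the cost of a modest increase in code size.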

[PATCH v4 0/6] crypto: lib/sha256 - cleanup/optimization

2020-10-25 Thread Arvind Sankar
1 - Reword commit message for patch 2 - Reformat SHA256_K array - Drop v2 patch combining K and W arrays v2: - Add patch to combine K and W arrays, suggested by David - Reformat SHA256_ROUND() macro a little Arvind Sankar (6): crypto: lib/sha256 - Use memzero_explicit() for clearing state

[PATCH v3 1/5] crypto: Use memzero_explicit() for clearing state

2020-10-23 Thread Arvind Sankar
. Signed-off-by: Arvind Sankar --- arch/arm64/crypto/ghash-ce-glue.c | 2 +- arch/arm64/crypto/poly1305-glue.c | 2 +- arch/arm64/crypto/sha3-ce-glue.c | 2 +- arch/x86/crypto/poly1305_glue.c | 2 +- include/crypto/sha1_base.h | 3 ++- include/crypto/sha256_base.h | 3 ++- include

[PATCH v3 5/5] crypto: lib/sha256 - Unroll LOAD and BLEND loops

2020-10-23 Thread Arvind Sankar
Unrolling the LOAD and BLEND loops improves performance by ~8% on x86_64 (tested on Broadwell Xeon) while not increasing code size too much. Signed-off-by: Arvind Sankar Reviewed-by: Eric Biggers --- lib/crypto/sha256.c | 24 1 file changed, 20 insertions(+), 4

[PATCH v3 0/5] crypto: lib/sha256 - cleanup/optimization

2020-10-23 Thread Arvind Sankar
little Arvind Sankar (5): crypto: Use memzero_explicit() for clearing state crypto: lib/sha256 - Don't clear temporary variables crypto: lib/sha256 - Clear W[] in sha256_update() instead of sha256_transform() crypto: lib/sha256 - Unroll SHA256 loop 8 times instead of 64 crypto

[PATCH v3 2/5] crypto: lib/sha256 - Don't clear temporary variables

2020-10-23 Thread Arvind Sankar
compiler-generated temporaries that are impossible to clear in any case. So drop the clearing of a through h and t1/t2. Signed-off-by: Arvind Sankar --- lib/crypto/sha256.c | 1 - 1 file changed, 1 deletion(-) diff --git a/lib/crypto/sha256.c b/lib/crypto/sha256.c index d43bc39ab05e..099cd11f83c1

[PATCH v3 4/5] crypto: lib/sha256 - Unroll SHA256 loop 8 times instead of 64

2020-10-23 Thread Arvind Sankar
This reduces code size substantially (on x86_64 with gcc-10 the size of sha256_update() goes from 7593 bytes to 1952 bytes including the new SHA256_K array), and on x86 is slightly faster than the full unroll (tested on Broadwell Xeon). Signed-off-by: Arvind Sankar --- lib/crypto/sha256.c | 174

[PATCH v3 3/5] crypto: lib/sha256 - Clear W[] in sha256_update() instead of sha256_transform()

2020-10-23 Thread Arvind Sankar
() implementation, and considerably more (~20%) with a bad one (eg the x86 purgatory currently uses a memset() coded in C). Signed-off-by: Arvind Sankar Reviewed-by: Eric Biggers --- lib/crypto/sha256.c | 11 +-- 1 file changed, 5 insertions(+), 6 deletions(-) diff --git a/lib/crypto

Re: [PATCH v2 1/6] crypto: Use memzero_explicit() for clearing state

2020-10-23 Thread Arvind Sankar
On Wed, Oct 21, 2020 at 09:36:33PM -0700, Eric Biggers wrote: > On Tue, Oct 20, 2020 at 04:39:52PM -0400, Arvind Sankar wrote: > > Without the barrier_data() inside memzero_explicit(), the compiler may > > optimize away the state-clearing if it can tell that the state is not >

Re: [PATCH v2 2/6] crypto: lib/sha256 - Don't clear temporary variables

2020-10-22 Thread Arvind Sankar
On Wed, Oct 21, 2020 at 09:58:50PM -0700, Eric Biggers wrote: > On Tue, Oct 20, 2020 at 04:39:53PM -0400, Arvind Sankar wrote: > > The assignments to clear a through h and t1/t2 are optimized out by the > > compiler because they are unused after the assignments. > > > >

Re: [PATCH v2 4/6] crypto: lib/sha256 - Unroll SHA256 loop 8 times instead of 64

2020-10-22 Thread Arvind Sankar
On Wed, Oct 21, 2020 at 10:02:19PM -0700, Eric Biggers wrote: > On Tue, Oct 20, 2020 at 04:39:55PM -0400, Arvind Sankar wrote: > > This reduces code size substantially (on x86_64 with gcc-10 the size of > > sha256_update() goes from 7593 bytes to 1952 bytes including the new >

Re: [PATCH v2 6/6] crypto: lib/sha - Combine round constants and message schedule

2020-10-21 Thread Arvind Sankar
On Tue, Oct 20, 2020 at 09:36:00PM +, David Laight wrote: > From: Arvind Sankar > > Sent: 20 October 2020 21:40 > > > > Putting the round constants and the message schedule arrays together in > > one structure saves one register, which can be a significant benefit

[PATCH v2 3/6] crypto: lib/sha256 - Clear W[] in sha256_update() instead of sha256_transform()

2020-10-20 Thread Arvind Sankar
() implementation, and considerably more (~20%) with a bad one (eg the x86 purgatory currently uses a memset() coded in C). Signed-off-by: Arvind Sankar --- lib/crypto/sha256.c | 11 +-- 1 file changed, 5 insertions(+), 6 deletions(-) diff --git a/lib/crypto/sha256.c b/lib/crypto/sha256

[PATCH v2 0/6] crypto: lib/sha256 - cleanup/optimization

2020-10-20 Thread Arvind Sankar
v2: - Add patch to combine K and W arrays, suggested by David - Reformat SHA256_ROUND() macro a little Arvind Sankar (6): crypto: Use memzero_explicit() for clearing state crypto: lib/sha256 - Don't clear temporary variables crypto: lib/sha256 - Clear W[] in sha256_update() instead of sh

[PATCH v2 6/6] crypto: lib/sha - Combine round constants and message schedule

2020-10-20 Thread Arvind Sankar
Putting the round constants and the message schedule arrays together in one structure saves one register, which can be a significant benefit on register-constrained architectures. On x86-32 (tested on Broadwell Xeon), this gives a 10% performance benefit. Signed-off-by: Arvind Sankar Suggested

[PATCH v2 2/6] crypto: lib/sha256 - Don't clear temporary variables

2020-10-20 Thread Arvind Sankar
equivalent to knowing one 64-byte block's SHA256 hash (with non-standard initial value) which, assuming SHA256 is secure, doesn't reveal any information about the input. Signed-off-by: Arvind Sankar --- lib/crypto/sha256.c | 1 - 1 file changed, 1 deletion(-) diff --git a/lib/crypto/s

[PATCH v2 5/6] crypto: lib/sha256 - Unroll LOAD and BLEND loops

2020-10-20 Thread Arvind Sankar
Unrolling the LOAD and BLEND loops improves performance by ~8% on x86_64 (tested on Broadwell Xeon) while not increasing code size too much. Signed-off-by: Arvind Sankar --- lib/crypto/sha256.c | 24 1 file changed, 20 insertions(+), 4 deletions(-) diff --git a/lib

[PATCH v2 4/6] crypto: lib/sha256 - Unroll SHA256 loop 8 times instead of 64

2020-10-20 Thread Arvind Sankar
This reduces code size substantially (on x86_64 with gcc-10 the size of sha256_update() goes from 7593 bytes to 1952 bytes including the new SHA256_K array), and on x86 is slightly faster than the full unroll (tested on Broadwell Xeon). Signed-off-by: Arvind Sankar --- lib/crypto/sha256.c | 166

[PATCH v2 1/6] crypto: Use memzero_explicit() for clearing state

2020-10-20 Thread Arvind Sankar
. Signed-off-by: Arvind Sankar --- include/crypto/sha1_base.h | 3 ++- include/crypto/sha256_base.h | 3 ++- include/crypto/sha512_base.h | 3 ++- include/crypto/sm3_base.h | 3 ++- lib/crypto/sha256.c | 2 +- 5 files changed, 9 insertions(+), 5 deletions(-) diff --git a/include/crypto

Re: [PATCH 4/5] crypto: lib/sha256 - Unroll SHA256 loop 8 times instead of 64

2020-10-20 Thread Arvind Sankar
On Tue, Oct 20, 2020 at 02:55:47PM +, David Laight wrote: > From: Arvind Sankar > > Sent: 20 October 2020 15:07 > > To: David Laight > > > > On Tue, Oct 20, 2020 at 07:41:33AM +, David Laight wrote: > > > From: Arvind Sankar > Sent: 19 October 2020

Re: [PATCH 4/5] crypto: lib/sha256 - Unroll SHA256 loop 8 times instead of 64

2020-10-20 Thread Arvind Sankar
On Tue, Oct 20, 2020 at 07:41:33AM +, David Laight wrote: > From: Arvind Sankar > Sent: 19 October 2020 16:30 > > To: Herbert Xu ; David S. Miller > > ; linux- > > cry...@vger.kernel.org > > Cc: linux-ker...@vger.kernel.org > > Subject: [PATCH 4/5] crypt

[PATCH 2/5] crypto: lib/sha256 - Don't clear temporary variables

2020-10-19 Thread Arvind Sankar
equivalent to knowing one 64-byte block's SHA256 hash (with non-standard initial value) which, assuming SHA256 is secure, doesn't reveal any information about the input. Signed-off-by: Arvind Sankar --- lib/crypto/sha256.c | 1 - 1 file changed, 1 deletion(-) diff --git a/lib/crypto/s

[PATCH 0/5] crypto: lib/sha256 - cleanup/optimization

2020-10-19 Thread Arvind Sankar
barriers. I don't think it's really necessary to clear them, but I'm not a cryptanalyst, so I would like comment on whether it's indeed safe not to, or we should instead add the required barriers to force clearing. The last three patches are optimizations for generic sha25

[PATCH 1/5] crypto: Use memzero_explicit() for clearing state

2020-10-19 Thread Arvind Sankar
. Signed-off-by: Arvind Sankar --- include/crypto/sha1_base.h | 3 ++- include/crypto/sha256_base.h | 3 ++- include/crypto/sha512_base.h | 3 ++- include/crypto/sm3_base.h | 3 ++- lib/crypto/sha256.c | 2 +- 5 files changed, 9 insertions(+), 5 deletions(-) diff --git a/include/crypto

[PATCH 5/5] crypto: lib/sha256 - Unroll LOAD and BLEND loops

2020-10-19 Thread Arvind Sankar
Unrolling the LOAD and BLEND loops improves performance by ~8% on x86 while not increasing code size too much. Signed-off-by: Arvind Sankar --- lib/crypto/sha256.c | 24 1 file changed, 20 insertions(+), 4 deletions(-) diff --git a/lib/crypto/sha256.c b/lib/crypto

[PATCH 3/5] crypto: lib/sha256 - Clear W[] in sha256_update() instead of sha256_transform()

2020-10-19 Thread Arvind Sankar
() implementation, and considerably more (~20%) with a bad one (eg the x86 purgatory currently uses a memset() coded in C). Signed-off-by: Arvind Sankar --- lib/crypto/sha256.c | 11 +-- 1 file changed, 5 insertions(+), 6 deletions(-) diff --git a/lib/crypto/sha256.c b/lib/crypto/sha256

[PATCH 4/5] crypto: lib/sha256 - Unroll SHA256 loop 8 times instead of 64

2020-10-19 Thread Arvind Sankar
This reduces code size substantially (on x86_64 with gcc-10 the size of sha256_update() goes from 7593 bytes to 1952 bytes including the new SHA256_K array), and on x86 is slightly faster than the full unroll. Signed-off-by: Arvind Sankar --- lib/crypto/sha256.c | 164

[tip: x86/urgent] lib/string: Make memzero_explicit() inline instead of external

2019-10-08 Thread tip-bot2 for Arvind Sankar
The following commit has been merged into the x86/urgent branch of tip: Commit-ID: bec500777089b3c96c53681fc0aa6fee59711d4a Gitweb: https://git.kernel.org/tip/bec500777089b3c96c53681fc0aa6fee59711d4a Author: Arvind Sankar AuthorDate: Mon, 07 Oct 2019 18:00:02 -04:00

[PATCH] lib/string: make memzero_explicit inline instead of external

2019-10-07 Thread Arvind Sankar
rypto: sha256 - Use get/put_unaligned_be32 to get input, memzero_explicit") Reviewed-by: Hans de Goede Tested-by: Hans de Goede Signed-off-by: Arvind Sankar --- include/linux/string.h | 21 - lib/string.c | 21 - 2 files changed, 20 inserti

Re: [PATCH v2 5.4 regression fix] x86/boot: Provide memzero_explicit

2019-10-07 Thread Arvind Sankar
On Mon, Oct 07, 2019 at 05:40:07PM +0200, Ingo Molnar wrote: > > * Arvind Sankar wrote: > > > With the barrier in there, is there any reason to *not* inline the > > function? barrier_data() is an asm statement that tells the compiler > > that the asm uses the memory

Re: [PATCH v2 5.4 regression fix] x86/boot: Provide memzero_explicit

2019-10-07 Thread Arvind Sankar
> > > > On 07-10-2019 16:00, Ingo Molnar wrote: > > > > > > > > > > * Hans de Goede wrote: > > > > > > > > > > > The purgatory code now uses the shared lib/crypto/sha256.c sha256 > > > > > > i

Re: [PATCH 5.4 regression fix] x86/boot: Provide memzero_explicit

2019-10-07 Thread Arvind Sankar
>> On 07-10-2019 10:59, Stephan Mueller wrote: > >>> On Monday, 7 October 2019, 10:55:01 CEST, Hans de Goede wrote: > >>> > >>> Hi Hans, > >>> > >>>> The purgatory code now uses the shared lib/crypto/sha256.c sha256 > >>

Re: [PATCH 5.4 regression fix] x86/boot: Provide memzero_explicit

2019-10-07 Thread Arvind Sankar
19, 10:55:01 CEST, Hans de Goede wrote: > > > > > > Hi Hans, > > > > > >> The purgatory code now uses the shared lib/crypto/sha256.c sha256 > > >> implementation. This needs memzero_explicit, implement this. > > >> > > >> Report