On Sun, Oct 25, 2020 at 11:23:52PM +, David Laight wrote:
> From: Arvind Sankar
> > Sent: 25 October 2020 20:18
> >
> > On Sun, Oct 25, 2020 at 06:51:18PM +, David Laight wrote:
> > > From: Arvind Sankar
> > > > Sent: 25 October 2020 14:31
On Sun, Oct 25, 2020 at 06:51:18PM +, David Laight wrote:
> From: Arvind Sankar
> > Sent: 25 October 2020 14:31
> >
> > Unrolling the LOAD and BLEND loops improves performance by ~8% on x86_64
(tested on Broadwell Xeon) while not increasing code size too much.
Signed-off-by: Arvind Sankar
Reviewed-by: Eric Biggers
---
lib/crypto/sha256.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/lib/crypto/sha256.c b/lib/crypto/sha256.c
index 2321f6cb322f..d43bc39ab05e 100644
--- a/lib/crypto/sha256.c
+++ b/lib/crypto/sha256.c
@@ -265,7 +265,7 @@
compiler-generated temporaries that are
impossible to clear in any case.
So drop the clearing of a through h and t1/t2.
Signed-off-by: Arvind Sankar
Reviewed-by: Eric Biggers
---
lib/crypto/sha256.c | 1 -
1 file changed, 1 deletion(-)
diff --git a/lib/crypto/sha256.c b/lib/crypto/sha256.c
index
This reduces code size substantially (on x86_64 with gcc-10 the size of
sha256_update() goes from 7593 bytes to 1952 bytes including the new
SHA256_K array), and on x86 is slightly faster than the full unroll
(tested on Broadwell Xeon).
Signed-off-by: Arvind Sankar
Reviewed-by: Eric Biggers
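The size/speed trade-off can be sketched in plain C. This is illustrative only: the actual patch uses a SHA256_ROUND() macro with rotated register arguments, whereas this sketch rotates the state array explicitly inside a helper (equivalent result, different codegen), and all function names here are invented for the example.

```c
#include <stdint.h>

static inline uint32_t ror32(uint32_t v, unsigned int n)
{
	return (v >> n) | (v << (32 - n));
}

/* One SHA-256 round; st[] holds a..h and is rotated explicitly. */
static void sha256_round(uint32_t st[8], uint32_t k, uint32_t w)
{
	uint32_t e0  = ror32(st[0], 2) ^ ror32(st[0], 13) ^ ror32(st[0], 22);
	uint32_t e1  = ror32(st[4], 6) ^ ror32(st[4], 11) ^ ror32(st[4], 25);
	uint32_t ch  = (st[4] & st[5]) ^ (~st[4] & st[6]);
	uint32_t maj = (st[0] & st[1]) ^ (st[0] & st[2]) ^ (st[1] & st[2]);
	uint32_t t1  = st[7] + e1 + ch + k + w;
	uint32_t t2  = e0 + maj;

	st[7] = st[6]; st[6] = st[5]; st[5] = st[4];
	st[4] = st[3] + t1;
	st[3] = st[2]; st[2] = st[1]; st[1] = st[0];
	st[0] = t1 + t2;
}

/* Fully rolled: smallest code, 64 loop iterations. */
static void rounds_rolled(uint32_t st[8], const uint32_t K[64],
			  const uint32_t W[64])
{
	int i;

	for (i = 0; i < 64; i++)
		sha256_round(st, K[i], W[i]);
}

/* Unrolled 8x: the middle ground between code size and loop overhead. */
static void rounds_x8(uint32_t st[8], const uint32_t K[64],
		      const uint32_t W[64])
{
	int i;

	for (i = 0; i < 64; i += 8) {
		sha256_round(st, K[i + 0], W[i + 0]);
		sha256_round(st, K[i + 1], W[i + 1]);
		sha256_round(st, K[i + 2], W[i + 2]);
		sha256_round(st, K[i + 3], W[i + 3]);
		sha256_round(st, K[i + 4], W[i + 4]);
		sha256_round(st, K[i + 5], W[i + 5]);
		sha256_round(st, K[i + 6], W[i + 6]);
		sha256_round(st, K[i + 7], W[i + 7]);
	}
}
```

Unrolling by 8 divides the loop-control overhead by eight while keeping the round body shared; unrolling all 64 rounds removes the loop entirely but inflates the function to several times this size, which is the regression the quoted numbers describe.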
() implementation, and considerably more (~20%) with a
bad one (eg the x86 purgatory currently uses a memset() coded in C).
Signed-off-by: Arvind Sankar
Reviewed-by: Eric Biggers
---
lib/crypto/sha256.c | 11 +--
1 file changed, 5 insertions(+), 6 deletions(-)
diff --git a/lib/crypto
Without the barrier_data() inside memzero_explicit(), the compiler may
optimize away the state-clearing if it can tell that the state is not
used afterwards.
Signed-off-by: Arvind Sankar
---
arch/arm64/crypto/ghash-ce-glue.c | 2 +-
arch/arm64/crypto/poly1305-glue.c | 2 +-
arch/arm64/crypto
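The mechanism can be sketched in userspace C. This is an approximation, not the kernel code: the kernel's memzero_explicit() builds the same effect out of its barrier_data() macro, and the name below is made up for the example.

```c
#include <stddef.h>
#include <string.h>

/*
 * The empty asm tells the compiler the cleared memory is "used"
 * afterwards, so dead-store elimination cannot drop the memset()
 * even when the buffer is provably never read again.
 */
static void memzero_explicit_sketch(void *s, size_t count)
{
	memset(s, 0, count);
	__asm__ __volatile__("" : : "r" (s) : "memory");
}
```

Without the asm, a compiler is entitled to delete a plain memset() of a local key buffer at the end of a function, which is exactly the state-leak the patch guards against.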
Unrolling the LOAD and BLEND loops improves performance by ~8% on x86_64
(tested on Broadwell Xeon) while not increasing code size too much.
Signed-off-by: Arvind Sankar
Reviewed-by: Eric Biggers
---
lib/crypto/sha256.c | 24
1 file changed, 20 insertions(+), 4 deletions(-)
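A sketch of what the unrolling does to the message-schedule ("BLEND") loop. The BLEND_OP recurrence matches the standard SHA-256 schedule; the function names are invented for illustration.

```c
#include <stdint.h>

static inline uint32_t ror32(uint32_t v, unsigned int n)
{
	return (v >> n) | (v << (32 - n));
}

/* Small sigmas of the SHA-256 message schedule. */
static inline uint32_t s0(uint32_t x) { return ror32(x, 7) ^ ror32(x, 18) ^ (x >> 3); }
static inline uint32_t s1(uint32_t x) { return ror32(x, 17) ^ ror32(x, 19) ^ (x >> 10); }

/* Standard schedule recurrence for W[16..63]. */
#define BLEND_OP(i, W) \
	((W)[i] = s1((W)[(i) - 2]) + (W)[(i) - 7] + s0((W)[(i) - 15]) + (W)[(i) - 16])

/* Rolled: one schedule word per iteration, maximum loop overhead. */
static void blend_rolled(uint32_t W[64])
{
	int i;

	for (i = 16; i < 64; i++)
		BLEND_OP(i, W);
}

/* Unrolled by 8: identical operations, one-eighth of the branches. */
static void blend_unrolled(uint32_t W[64])
{
	int i;

	for (i = 16; i < 64; i += 8) {
		BLEND_OP(i + 0, W);
		BLEND_OP(i + 1, W);
		BLEND_OP(i + 2, W);
		BLEND_OP(i + 3, W);
		BLEND_OP(i + 4, W);
		BLEND_OP(i + 5, W);
		BLEND_OP(i + 6, W);
		BLEND_OP(i + 7, W);
	}
}
```

Because each W[i] depends only on earlier entries, the 8-way unroll computes exactly the same schedule; only the loop bookkeeping changes.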
- Reword commit message for patch 2
- Reformat SHA256_K array
- Drop v2 patch combining K and W arrays
v2:
- Add patch to combine K and W arrays, suggested by David
- Reformat SHA256_ROUND() macro a little
Arvind Sankar (6):
crypto: lib/sha256 - Use memzero_explicit() for clearing state
Signed-off-by: Arvind Sankar
---
arch/arm64/crypto/ghash-ce-glue.c | 2 +-
arch/arm64/crypto/poly1305-glue.c | 2 +-
arch/arm64/crypto/sha3-ce-glue.c | 2 +-
arch/x86/crypto/poly1305_glue.c | 2 +-
include/crypto/sha1_base.h   | 3 ++-
include/crypto/sha256_base.h | 3 ++-
include
Arvind Sankar (5):
crypto: Use memzero_explicit() for clearing state
crypto: lib/sha256 - Don't clear temporary variables
crypto: lib/sha256 - Clear W[] in sha256_update() instead of
sha256_transform()
crypto: lib/sha256 - Unroll SHA256 loop 8 times instead of 64
crypto
compiler-generated temporaries that are
impossible to clear in any case.
So drop the clearing of a through h and t1/t2.
Signed-off-by: Arvind Sankar
---
lib/crypto/sha256.c | 1 -
1 file changed, 1 deletion(-)
diff --git a/lib/crypto/sha256.c b/lib/crypto/sha256.c
index d43bc39ab05e..099cd11f83c1
This reduces code size substantially (on x86_64 with gcc-10 the size of
sha256_update() goes from 7593 bytes to 1952 bytes including the new
SHA256_K array), and on x86 is slightly faster than the full unroll
(tested on Broadwell Xeon).
Signed-off-by: Arvind Sankar
---
lib/crypto/sha256.c | 174
On Wed, Oct 21, 2020 at 09:36:33PM -0700, Eric Biggers wrote:
> On Tue, Oct 20, 2020 at 04:39:52PM -0400, Arvind Sankar wrote:
> > Without the barrier_data() inside memzero_explicit(), the compiler may
> > optimize away the state-clearing if it can tell that the state is not
>
On Wed, Oct 21, 2020 at 09:58:50PM -0700, Eric Biggers wrote:
> On Tue, Oct 20, 2020 at 04:39:53PM -0400, Arvind Sankar wrote:
> > The assignments to clear a through h and t1/t2 are optimized out by the
> > compiler because they are unused after the assignments.
> >
> >
On Wed, Oct 21, 2020 at 10:02:19PM -0700, Eric Biggers wrote:
> On Tue, Oct 20, 2020 at 04:39:55PM -0400, Arvind Sankar wrote:
> > This reduces code size substantially (on x86_64 with gcc-10 the size of
> > sha256_update() goes from 7593 bytes to 1952 bytes including the new
>
On Tue, Oct 20, 2020 at 09:36:00PM +, David Laight wrote:
> From: Arvind Sankar
> > Sent: 20 October 2020 21:40
> >
> > Putting the round constants and the message schedule arrays together in
> > one structure saves one register, which can be a significant benefit
() implementation, and considerably more (~20%) with a
bad one (eg the x86 purgatory currently uses a memset() coded in C).
Signed-off-by: Arvind Sankar
---
lib/crypto/sha256.c | 11 +--
1 file changed, 5 insertions(+), 6 deletions(-)
diff --git a/lib/crypto/sha256.c b/lib/crypto/sha256
v2:
- Add patch to combine K and W arrays, suggested by David
- Reformat SHA256_ROUND() macro a little
Arvind Sankar (6):
crypto: Use memzero_explicit() for clearing state
crypto: lib/sha256 - Don't clear temporary variables
crypto: lib/sha256 - Clear W[] in sha256_update() instead of
sh
Putting the round constants and the message schedule arrays together in
one structure saves one register, which can be a significant benefit on
register-constrained architectures. On x86-32 (tested on Broadwell
Xeon), this gives a 10% performance benefit.
Signed-off-by: Arvind Sankar
Suggested
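The combined-arrays idea can be sketched as follows; the struct and function names are hypothetical, not the kernel's.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * With the round constants and the message schedule in one struct, a
 * single base register reaches both arrays via constant offsets,
 * freeing a register on register-starved targets such as x86-32.
 */
struct sha256_kw {
	uint32_t K[64];		/* round constants */
	uint32_t W[64];		/* message schedule */
};

/* Both round operands fetched through the same pointer. */
static uint32_t kw_term(const struct sha256_kw *kw, int i)
{
	return kw->K[i] + kw->W[i];
}
```

Since both members are uint32_t arrays, W sits at a fixed 256-byte offset from K, so the compiler can address either with one register plus an immediate.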
equivalent to knowing one 64-byte block's SHA256
hash (with non-standard initial value) which, assuming SHA256 is secure,
doesn't reveal any information about the input.
Signed-off-by: Arvind Sankar
---
lib/crypto/sha256.c | 1 -
1 file changed, 1 deletion(-)
diff --git a/lib/crypto/s
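The restructuring this argument justifies can be sketched as follows; signatures and names are simplified stand-ins, with the transform body stubbed out.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified shapes -- not the kernel's real signatures. */
static void sha256_transform(uint32_t *state, const uint8_t *input, uint32_t *W)
{
	/*
	 * ...expand input into W[64] and run the 64 rounds (stubbed here)...
	 * The point: the per-block transform no longer clears W itself.
	 */
	(void)state; (void)input; (void)W;
}

static void sha256_update_sketch(uint32_t *state, const uint8_t *data,
				 size_t nblocks)
{
	uint32_t W[64];

	while (nblocks--) {
		sha256_transform(state, data, W);
		data += 64;
	}
	/* One clearing per update call instead of one per 64-byte block. */
	memset(W, 0, sizeof(W));
	__asm__ __volatile__("" : : "r" (W) : "memory");	/* keep the stores */
}
```

Clearing once per update instead of once per block removes 64-word memset traffic from the hot loop, and per the argument above, a W[] left over from an intermediate block reveals nothing useful anyway.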
Unrolling the LOAD and BLEND loops improves performance by ~8% on x86_64
(tested on Broadwell Xeon) while not increasing code size too much.
Signed-off-by: Arvind Sankar
---
lib/crypto/sha256.c | 24
1 file changed, 20 insertions(+), 4 deletions(-)
diff --git a/lib
This reduces code size substantially (on x86_64 with gcc-10 the size of
sha256_update() goes from 7593 bytes to 1952 bytes including the new
SHA256_K array), and on x86 is slightly faster than the full unroll
(tested on Broadwell Xeon).
Signed-off-by: Arvind Sankar
---
lib/crypto/sha256.c | 166
Signed-off-by: Arvind Sankar
---
include/crypto/sha1_base.h | 3 ++-
include/crypto/sha256_base.h | 3 ++-
include/crypto/sha512_base.h | 3 ++-
include/crypto/sm3_base.h    | 3 ++-
lib/crypto/sha256.c | 2 +-
5 files changed, 9 insertions(+), 5 deletions(-)
diff --git a/include/crypto
On Tue, Oct 20, 2020 at 02:55:47PM +, David Laight wrote:
> From: Arvind Sankar
> > Sent: 20 October 2020 15:07
> > To: David Laight
> >
> > On Tue, Oct 20, 2020 at 07:41:33AM +, David Laight wrote:
> > > From: Arvind Sankar
> > > > Sent: 19 October 2020
On Tue, Oct 20, 2020 at 07:41:33AM +, David Laight wrote:
> From: Arvind Sankar
> > Sent: 19 October 2020 16:30
> > To: Herbert Xu ; David S. Miller
> > ; linux-
> > cry...@vger.kernel.org
> > Cc: linux-ker...@vger.kernel.org
> > Subject: [PATCH 4/5] crypt
barriers. I don't
think it's really necessary to clear them, but I'm not a cryptanalyst,
so I would like comment on whether it's indeed safe not to, or we should
instead add the required barriers to force clearing.
The last three patches are optimizations for generic sha25
Unrolling the LOAD and BLEND loops improves performance by ~8% on x86
while not increasing code size too much.
Signed-off-by: Arvind Sankar
---
lib/crypto/sha256.c | 24
1 file changed, 20 insertions(+), 4 deletions(-)
diff --git a/lib/crypto/sha256.c b/lib/crypto
This reduces code size substantially (on x86_64 with gcc-10 the size of
sha256_update() goes from 7593 bytes to 1952 bytes including the new
SHA256_K array), and on x86 is slightly faster than the full unroll.
Signed-off-by: Arvind Sankar
---
lib/crypto/sha256.c | 164
The following commit has been merged into the x86/urgent branch of tip:
Commit-ID: bec500777089b3c96c53681fc0aa6fee59711d4a
Gitweb:
https://git.kernel.org/tip/bec500777089b3c96c53681fc0aa6fee59711d4a
Author: Arvind Sankar
AuthorDate: Mon, 07 Oct 2019 18:00:02 -04:00
crypto: sha256 - Use get/put_unaligned_be32 to get input,
memzero_explicit")
Reviewed-by: Hans de Goede
Tested-by: Hans de Goede
Signed-off-by: Arvind Sankar
---
include/linux/string.h | 21 -
lib/string.c | 21 -
2 files changed, 20 inserti
On Mon, Oct 07, 2019 at 05:40:07PM +0200, Ingo Molnar wrote:
>
> * Arvind Sankar wrote:
>
> > With the barrier in there, is there any reason to *not* inline the
> > function? barrier_data() is an asm statement that tells the compiler
> > that the asm uses the memory
> > > > On 07-10-2019 16:00, Ingo Molnar wrote:
> > > > >
> > > > > * Hans de Goede wrote:
> > > > >
> > > > > > The purgatory code now uses the shared lib/crypto/sha256.c sha256
> > > > > > i
> >> On 07-10-2019 10:59, Stephan Mueller wrote:
> >>> On Monday, 7 October 2019 at 10:55:01 CEST, Hans de Goede wrote:
> >>>
> >>> Hi Hans,
> >>>
> >>>> The purgatory code now uses the shared lib/crypto/sha256.c sha256
> >>
19, 10:55:01 CEST, Hans de Goede wrote:
> > >
> > > Hi Hans,
> > >
> > >> The purgatory code now uses the shared lib/crypto/sha256.c sha256
> > >> implementation. This needs memzero_explicit, implement this.
> > >>
> > >> Report