Extends the x86_64 SSE2 Poly1305 authenticator by a function processing two
consecutive Poly1305 blocks in parallel using a derived key r^2. Loop
unrolling can be more effectively mapped to SSE instructions, further
increasing throughput.
For large messages, throughput increases by ~45-65% compare
Extends the x86_64 Poly1305 authenticator by a function processing four
consecutive Poly1305 blocks in parallel using AVX2 instructions.
For large messages, throughput increases by ~15-45% compared to two
block SSE2:
testing speed of poly1305 (poly1305-simd)
test 0 ( 96 byte blocks, 16 bytes
Extends the x86_64 ChaCha20 implementation by a function processing eight
ChaCha20 blocks in parallel using AVX2.
For large messages, throughput increases by ~55-70% compared to four block
SSSE3:
testing speed of chacha20 (chacha20-simd) encryption
test 0 (256 bit key, 16 byte blocks): 42249230 o
Extends the x86_64 SSSE3 ChaCha20 implementation by a function processing
four ChaCha20 blocks in parallel. This avoids the word shuffling needed
in the single block variant, further increasing throughput.
For large messages, throughput increases by ~110% compared to single block
SSSE3:
testing s
Implements an x86_64 assembler driver for the Poly1305 authenticator. This
single block variant holds the 130-bit integer in 5 32-bit words, but uses
SSE to do two multiplications/additions in parallel.
When calling updates with small blocks, the overhead for kernel_fpu_begin/
kernel_fpu_end() neg
As architecture specific drivers need a software fallback, export Poly1305
init/update/final functions together with some helpers in a header file.
Signed-off-by: Martin Willi
---
crypto/chacha20poly1305.c | 4 +--
crypto/poly1305_generic.c | 73 +++
As architecture specific drivers need a software fallback, export a
ChaCha20 en-/decryption function together with some helpers in a header
file.
Signed-off-by: Martin Willi
---
crypto/chacha20_generic.c | 28
crypto/chacha20poly1305.c | 3 +--
include/crypto/chacha
This patch series adds both ChaCha20 and Poly1305 specific ciphers for
x86_64 using SSE2/SSSE3 and AVX2 instructions. The idea is to have a drop-in
replacement for AESNI/CLMUL-accelerated AES-GCM providing at least somewhat
comparable performance, refer to RFC7539 for details. It is based on crypto
The AVX2 variant of ChaCha20 is used only for messages with >= 512 bytes
length. With the existing test vectors, the implementation could not be
tested. Due that lack of such a long official test vector, this one is
self-generated using chacha20-generic.
Signed-off-by: Martin Willi
---
crypto/te
Adds individual ChaCha20 and Poly1305 and a combined rfc7539esp AEAD speed
test using mode numbers 214, 321 and 213. For Poly1305 we add a specific
speed template, as it expects the key prepended to the input data.
Signed-off-by: Martin Willi
---
crypto/tcrypt.c | 15 +++
crypto/tcry
Implements an x86_64 assembler driver for the ChaCha20 stream cipher. This
single block variant works on a single state matrix using SSE instructions.
It requires SSSE3 due the use of pshufb for efficient 8/16-bit rotate
operations.
For large messages, throughput increases by ~65% compared to
chac
Herbert,
> This patch converts rfc7539 and rfc7539esp to the new AEAD interface.
> The test vectors for rfc7539esp have also been updated to include
> the IV.
Thanks for taking care of it, I haven't found the time yet to do it
myself. I can confirm that it works fine under IPsec load, so you may
12 matches
Mail list logo