> An SSSE3 implementation of single-block HChaCha20 is also added so
> that XChaCha20 can use it rather than the generic
> implementation.  This required refactoring the ChaCha permutation
> into its own function. 

> [...]

> +ENTRY(chacha20_block_xor_ssse3)
> +     # %rdi: Input state matrix, s
> +     # %rsi: up to 1 data block output, o
> +     # %rdx: up to 1 data block input, i
> +     # %rcx: input/output length in bytes
> +
> +     # x0..3 = s0..3
> +     movdqa          0x00(%rdi),%xmm0
> +     movdqa          0x10(%rdi),%xmm1
> +     movdqa          0x20(%rdi),%xmm2
> +     movdqa          0x30(%rdi),%xmm3
> +     movdqa          %xmm0,%xmm8
> +     movdqa          %xmm1,%xmm9
> +     movdqa          %xmm2,%xmm10
> +     movdqa          %xmm3,%xmm11
> +
> +     mov             %rcx,%rax
> +     call            chacha20_permute
> +
>       # o0 = i0 ^ (x0 + s0)
>       paddd           %xmm8,%xmm0
>       cmp             $0x10,%rax
> @@ -189,6 +198,23 @@ ENTRY(chacha20_block_xor_ssse3)
>  
>  ENDPROC(chacha20_block_xor_ssse3)
>  
> +ENTRY(hchacha20_block_ssse3)
> +     # %rdi: Input state matrix, s
> +     # %rsi: output (8 32-bit words)
> +
> +     movdqa          0x00(%rdi),%xmm0
> +     movdqa          0x10(%rdi),%xmm1
> +     movdqa          0x20(%rdi),%xmm2
> +     movdqa          0x30(%rdi),%xmm3
> +
> +     call            chacha20_permute

AFAIK, the general convention is to create proper stack frames using
FRAME_BEGIN/FRAME_END in non-leaf functions. Should the callers of
chacha20_permute() do so?
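
For illustration, here is an untested sketch of what I have in mind for
hchacha20_block_ssse3. It assumes <asm/frame.h> is included at the top of
the file. The stores after the call are my assumption about the part of
the function not quoted above (HChaCha20 outputs the first and last rows
of the permuted state), so adjust as needed:

```asm
ENTRY(hchacha20_block_ssse3)
	# %rdi: Input state matrix, s
	# %rsi: output (8 32-bit words)
	FRAME_BEGIN			# push %rbp; mov %rsp,%rbp if CONFIG_FRAME_POINTER

	movdqa		0x00(%rdi),%xmm0
	movdqa		0x10(%rdi),%xmm1
	movdqa		0x20(%rdi),%xmm2
	movdqa		0x30(%rdi),%xmm3

	call		chacha20_permute

	# Assumed tail, not taken from the quoted hunk:
	# write rows 0 and 3 of the permuted state to the output.
	movdqu		%xmm0,0x00(%rsi)
	movdqu		%xmm3,0x10(%rsi)

	FRAME_END			# pop %rbp if CONFIG_FRAME_POINTER
	ret
ENDPROC(hchacha20_block_ssse3)
```

The same pair would wrap the body of chacha20_block_xor_ssse3, with
FRAME_END placed before each ret. Without CONFIG_FRAME_POINTER both
macros expand to nothing, so there is no cost in that configuration.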

For the other parts:

Reviewed-by: Martin Willi <mar...@strongswan.org>

