On Thu, Aug 23, 2018 at 05:48:45PM +0100, Ard Biesheuvel wrote:
> Replace the literal load of the addend vector with a sequence that
> performs each add individually. This sequence is only 2 instructions
> longer than the original, and 2% faster on Cortex-A53.
> 
> This is an improvement by itself, but also works around a Clang issue,
> whose integrated assembler does not implement the GNU ARM asm syntax
> completely, and does not support the =literal notation for FP registers
> (more info at https://bugs.llvm.org/show_bug.cgi?id=38642)
> 
> Cc: Nick Desaulniers <ndesaulni...@google.com>
> Signed-off-by: Ard Biesheuvel <ard.biesheu...@linaro.org>
> ---
> v2: replace convoluted code involving a SIMD add to increment four BE counters
>     at once with individual add/rev/mov instructions
> 
>  arch/arm64/crypto/aes-modes.S | 16 +++++++++-------
>  1 file changed, 9 insertions(+), 7 deletions(-)

Patch applied.  Thanks.
-- 
Email: Herbert Xu <herb...@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

Reply via email to