On Mon, Aug 15, 2022 at 10:29 AM Roger Sayle <ro...@nextmovesoftware.com> wrote:
>
>
> Many thanks to Uros for reviewing/approving all of the previous pieces.
> This patch adds support for converting 128-bit TImode shifts and rotates
> to SSE equivalents using V1TImode during the TImode STV pass.
> Previously, only logical shifts by multiples of 8 were handled
> (from my patch earlier this month).
>
> As an example of the benefits, the following rotate by 32 bits:
>
> unsigned __int128 a, b;
> void rot32() { a = (b >> 32) | (b << 96); }
>
> when compiled on x86_64 with -O2 previously generated:
>
>         movq    b(%rip), %rax
>         movq    b+8(%rip), %rdx
>         movq    %rax, %rcx
>         shrdq   $32, %rdx, %rax
>         shrdq   $32, %rcx, %rdx
>         movq    %rax, a(%rip)
>         movq    %rdx, a+8(%rip)
>         ret
>
> with this patch, now generates:
>
>         movdqa  b(%rip), %xmm0
>         pshufd  $57, %xmm0, %xmm0
>         movaps  %xmm0, a(%rip)
>         ret
>
> [which uses a V4SI permutation for those that don't read SSE].
> This should help 128-bit cryptography code that interleaves XORs
> with rotations (but doesn't use additions or subtractions).
>
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check, both with and without --target_board=unix{-m32},
> with no new failures.  Ok for mainline?
>
>
> 2022-08-15  Roger Sayle  <ro...@nextmovesoftware.com>
>
> gcc/ChangeLog
>         * config/i386/i386-features.cc
>         (timode_scalar_chain::compute_convert_gain): Provide costs for
>         shifts and rotates.  Provide gains for comparisons against 0/-1.

Please split out the compare part; it doesn't fit under the "Support
shifts and rotates by integer constants in TImode STV." summary.

>         (timode_scalar_chain::convert_insn): Handle ASHIFTRT, ROTATERT
>         and ROTATE just like existing ASHIFT and LSHIFTRT cases.
>         (timode_scalar_to_vector_candidate_p): Handle all shifts and
>         rotates by integer constants between 0 and 127.
>
> gcc/testsuite/ChangeLog
>         * gcc.target/i386/sse4_1-stv-9.c: New test case.

OK for the patch without the COMPARE stuff; the separate COMPARE patch is
pre-approved.

Thanks,
Uros.
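
For readers wondering why "pshufd $57" implements the rotate in the quoted
example: the immediate 57 is 0b00111001, which selects source lanes 1, 2, 3, 0
for destination lanes 0..3, and that is a rotate right by one 32-bit lane. The
following is only an illustrative sketch in portable C, not code from the
patch. All function names (make_u128, rotr128_by_32, pshufd_model, and so on)
are hypothetical, and pshufd_model is a scalar model of the instruction rather
than a real intrinsic. It assumes a compiler that provides unsigned __int128
(GCC or Clang on a 64-bit target).

```c
#include <assert.h>
#include <stdint.h>

typedef unsigned __int128 u128;

/* Hypothetical helper: build a 128-bit value from two 64-bit halves. */
u128 make_u128(uint64_t hi, uint64_t lo)
{
    return ((u128)hi << 64) | lo;
}

/* The rotate from the rot32() example: right by 32 bits. */
u128 rotr128_by_32(u128 x)
{
    return (x >> 32) | (x << 96);
}

/* Scalar model of pshufd: destination lane i takes the source lane
   selected by bits [2*i+1 : 2*i] of the 8-bit immediate. */
void pshufd_model(uint32_t dst[4], const uint32_t src[4], unsigned imm)
{
    assert(imm < 256);
    for (int i = 0; i < 4; i++)
        dst[i] = src[(imm >> (2 * i)) & 3];
}

/* The same rotate expressed as a permutation of four 32-bit lanes. */
u128 rotr128_by_32_via_lanes(u128 x)
{
    uint32_t src[4], dst[4];
    u128 result = 0;

    /* Lane 0 is the least significant 32 bits. */
    for (int i = 0; i < 4; i++)
        src[i] = (uint32_t)(x >> (32 * i));

    pshufd_model(dst, src, 57);

    for (int i = 0; i < 4; i++)
        result |= (u128)dst[i] << (32 * i);
    return result;
}

/* Returns 1 when both formulations agree for x. */
int rot32_forms_agree(u128 x)
{
    return rotr128_by_32(x) == rotr128_by_32_via_lanes(x);
}
```

Calling rot32_forms_agree() on any value should return 1. By the same
reasoning, rotates by 64 and 96 bits would correspond to other lane
permutations.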
