On Mon, Aug 15, 2022 at 10:29 AM Roger Sayle <ro...@nextmovesoftware.com> wrote:
>
>
> Many thanks to Uros for reviewing/approving all of the previous pieces.
> This patch adds support for converting 128-bit TImode shifts and rotates
> to SSE equivalents using V1TImode during the TImode STV pass.
> Previously, only logical shifts by multiples of 8 were handled
> (from my patch earlier this month).
>
> As an example of the benefits, the following rotate by 32-bits:
>
> unsigned __int128 a, b;
> void rot32() { a = (b >> 32) | (b << 96); }
>
> when compiled on x86_64 with -O2 previously generated:
>
>         movq    b(%rip), %rax
>         movq    b+8(%rip), %rdx
>         movq    %rax, %rcx
>         shrdq   $32, %rdx, %rax
>         shrdq   $32, %rcx, %rdx
>         movq    %rax, a(%rip)
>         movq    %rdx, a+8(%rip)
>         ret
>
> with this patch, now generates:
>
>         movdqa  b(%rip), %xmm0
>         pshufd  $57, %xmm0, %xmm0
>         movaps  %xmm0, a(%rip)
>         ret
>
> [which uses a V4SI permutation for those that don't read SSE].
> This should help 128-bit cryptography codes, that interleave XORs
> with rotations (but that don't use additions or subtractions).
>
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check, both with and without --target_board=unix{-m32},
> with no new failures. Ok for mainline?
>
>
> 2022-08-15 Roger Sayle <ro...@nextmovesoftware.com>
>
> gcc/ChangeLog
>         * config/i386/i386-features.cc
>         (timode_scalar_chain::compute_convert_gain): Provide costs for
>         shifts and rotates. Provide gains for comparisons against 0/-1.

Please split out the compare part; it doesn't fit under the "Support shifts
and rotates by integer constants in TImode STV." summary.

>         (timode_scalar_chain::convert_insn): Handle ASHIFTRT, ROTATERT
>         and ROTATE just like existing ASHIFT and LSHIFTRT cases.
>         (timode_scalar_to_vector_candidate_p): Handle all shifts and
>         rotates by integer constants between 0 and 127.
>
> gcc/testsuite/ChangeLog
>         * gcc.target/i386/sse4_1-stv-9.c: New test case.

OK for the patch without the COMPARE stuff; the separate COMPARE patch is
pre-approved.

Thanks,
Uros.
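
For those that don't read SSE, the pshufd $57 in the quoted rot32 example
can be checked in plain C. What follows is only a minimal sketch, assuming
GCC or Clang on a 64-bit target (needed for unsigned __int128). The names
u128, rot32_scalar, rot32_lanes and rot32_self_check are made up for
illustration and are not part of the patch or of GCC.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: GCC/Clang extension, available on 64-bit targets. */
typedef unsigned __int128 u128;

/* Scalar form, as in the quoted rot32 example: rotate right by 32 bits. */
u128 rot32_scalar(u128 b)
{
    return (b >> 32) | (b << 96);
}

/* The same rotate written as a permutation of four 32-bit lanes
   (lane 0 is the least significant).  Each 2-bit field of the pshufd
   immediate selects the source lane for one destination lane;
   57 = 0b00111001 selects source lanes 1, 2, 3, 0 for lanes 0..3. */
u128 rot32_lanes(u128 b)
{
    const unsigned imm = 57;
    uint32_t src[4], dst[4];
    u128 r = 0;

    for (int i = 0; i < 4; i++)
        src[i] = (uint32_t)(b >> (32 * i));
    for (int i = 0; i < 4; i++)
        dst[i] = src[(imm >> (2 * i)) & 3];
    for (int i = 0; i < 4; i++)
        r |= (u128)dst[i] << (32 * i);
    return r;
}

/* Hypothetical self-check: both forms agree on one sample value. */
void rot32_self_check(void)
{
    u128 x = ((u128)0x0123456789abcdefULL << 64) | 0xfedcba9876543210ULL;
    assert(rot32_scalar(x) == rot32_lanes(x));
}
```

This models only the lane selection of the shuffle. It says nothing about
how the STV pass decides when the conversion is profitable.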