On Sun, Jun 26, 2022 at 5:54 PM Roger Sayle <ro...@nextmovesoftware.com> wrote:
>
>
> This patch was motivated by the investigation of Linus Torvalds'
> spill-heavy cryptography kernels in PR 105930.  The <any_rotate>di3 expander
> handles all rotations by an immediate constant for 1..63 bits with the
> exception of 32 bits, which FAILs and is then split by the middle-end.
> This patch makes these 32-bit doubleword rotations consistent with the
> other DImode rotations during reload, which results in reduced register
> pressure, fewer instructions and the use of x86's xchg instruction
> when appropriate.  In theory, xchg can be handled by register renaming,
> but even on micro-architectures where it's implemented by 3 uops (no
> worse than a three-instruction shuffle), avoiding the need to nominate a
> "temporary" register reduces user-visible register pressure (and
> has obvious code size benefits).
>
> The effects are best shown with the new testcase:
>
> unsigned long long bar();
> unsigned long long foo()
> {
>   unsigned long long x = bar();
>   return (x>>32) | (x<<32);
> }
>
> for which GCC with -m32 -O2 currently generates:
>
>         subl    $12, %esp
>         call    bar
>         addl    $12, %esp
>         movl    %eax, %ecx
>         movl    %edx, %eax
>         movl    %ecx, %edx
>         ret
>
> but with this patch now generates:
>
>         subl    $12, %esp
>         call    bar
>         addl    $12, %esp
>         xchgl   %edx, %eax
>         ret
>
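[Illustrative aside, not part of the patch: a minimal C sketch of why a
DImode rotate by 32 reduces to swapping the two SImode halves, which is
what makes a single xchgl sufficient on -m32.  The function names
rotate32 and swap_halves are hypothetical and chosen only for this
example; the register names in the comments assume the usual
%edx:%eax return pair for a 64-bit value on ia32.]

```c
#include <stdint.h>

/* Rotate a 64-bit value by 32 bits, written the same way as the testcase. */
uint64_t rotate32(uint64_t x)
{
    return (x >> 32) | (x << 32);
}

/* The same operation expressed on the two 32-bit halves that a -m32
   target keeps in separate registers: the halves simply trade places. */
uint64_t swap_halves(uint64_t x)
{
    uint32_t lo = (uint32_t)x;          /* low word  (e.g. %eax) */
    uint32_t hi = (uint32_t)(x >> 32);  /* high word (e.g. %edx) */
    return ((uint64_t)lo << 32) | hi;   /* old low becomes new high */
}
```

[Both functions return the same value for every input, so when source
and destination occupy the same register pair the whole rotation is a
register swap; with distinct registers or a memory source it is two
moves.]
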
> With this patch, the number of lines of assembly language generated
> for the blake2b kernel (from the attachment to PR105930) decreases
> from 5626 to 5404. Although there's an impressive reduction in
> instruction count, there's no change/reduction in stack frame size.
>
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check, both with and without --target_board=unix{-m32},
> with no new failures.  Ok for mainline?
>
>
> 2022-06-26  Roger Sayle  <ro...@nextmovesoftware.com>
>
> gcc/ChangeLog
>         * config/i386/i386.md (swap_mode): Rename from *swap<mode> to
>         provide gen_swapsi.
>         (<any_rotate>di3): Handle !TARGET_64BIT rotations via new
>         gen_ix86_<insn>32di2_doubleword below.
>         (ix86_<insn>32di2_doubleword): New define_insn_and_split
>         that splits after reload as either a pair of move instructions
>         or an xchgl (using gen_swapsi).
>
> gcc/testsuite/ChangeLog
>         * gcc.target/i386/xchg-3.c: New test case.

+(define_insn_and_split "ix86_<insn>32di2_doubleword"

We don't encode the target in the insn name - <insn>32di2_doubleword
should be OK.

+ [(set (match_operand:DI 0 "register_operand" "=r,r")
+       (any_rotate:DI (match_operand:DI 1 "nonimmediate_operand" "r,o")
+                      (const_int 32)))]

Please use "=r,r,r"/"0,r,o" constraints here.

Uros.

>
> Thanks in advance,
> Roger
> --
>
