On Sun, Jun 26, 2022 at 5:54 PM Roger Sayle <ro...@nextmovesoftware.com> wrote:
>
>
> This patch was motivated by the investigation of Linus Torvalds' spill
> heavy cryptography kernels in PR 105930. The <any_rotate>di3 expander
> handles all rotations by an immediate constant for 1..63 bits with the
> exception of 32 bits, which FAILs and is then split by the middle-end.
> This patch makes these 32-bit doubleword rotations consistent with the
> other DImode rotations during reload, which results in reduced register
> pressure, fewer instructions and the use of x86's xchg instruction
> when appropriate. In theory, xchg can be handled by register renaming,
> but even on micro-architectures where it's implemented by 3 uops (no
> worse than a three instruction shuffle), avoiding nominating a
> "temporary" register, reduces user-visible register pressure (and
> has obvious code size benefits).
>
> The effects are best shown with the new testcase:
>
> unsigned long long bar();
> unsigned long long foo()
> {
>   unsigned long long x = bar();
>   return (x>>32) | (x<<32);
> }
>
> for which GCC with -m32 -O2 currently generates:
>
>         subl    $12, %esp
>         call    bar
>         addl    $12, %esp
>         movl    %eax, %ecx
>         movl    %edx, %eax
>         movl    %ecx, %edx
>         ret
>
> but with this patch now generates:
>
>         subl    $12, %esp
>         call    bar
>         addl    $12, %esp
>         xchgl   %edx, %eax
>         ret
>
> With this patch, the number of lines of assembly language generated
> for the blake2b kernel (from the attachment to PR105930) decreases
> from 5626 to 5404. Although there's an impressive reduction in
> instruction count, there's no change/reduction in stack frame size.
>
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check, both with and without --target_board=unix{-m32},
> with no new failures. Ok for mainline?
>
>
> 2022-06-26  Roger Sayle  <ro...@nextmovesoftware.com>
>
> gcc/ChangeLog
>         * config/i386/i386.md (swap_mode): Rename from *swap<mode> to
>         provide gen_swapsi.
>         (<any_rotate>di3): Handle !TARGET_64BIT rotations via new
>         gen_ix86_<insn>32di2_doubleword below.
>         (ix86_<anyrotate>32di2_doubleword): New define_insn_and_split
>         that splits after reload as either a pair of move instructions
>         or an xchgl (using gen_swapsi).
>
> gcc/testsuite/ChangeLog
>         * gcc.target/i386/xchg-3.c: New test case.

+(define_insn_and_split "ix86_<insn>32di2_doubleword"

We don't encode the target in the insn name - <insn>32di2_doubleword
should be OK.

+ [(set (match_operand:DI 0 "register_operand" "=r,r")
+       (any_rotate:DI (match_operand:DI 1 "nonimmediate_operand" "r,o")
+                      (const_int 32)))]

Please use "=r,r,r"/"0,r,o" constraints here.

Uros.

>
> Thanks in advance,
> Roger
> --
>
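
For readers following the thread, the reason a 32-bit rotation of a DImode
value can be split into two moves or a single xchgl is that it simply
exchanges the two 32-bit halves. The sketch below illustrates that
equivalence in plain C. It is not part of the patch; the names
rot32_shifts, rot32_swap and check_rot32 are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate by 32 written with shifts, as in the testcase's foo(). */
static uint64_t rot32_shifts(uint64_t x)
{
    return (x >> 32) | (x << 32);
}

/* The same rotation written as an exchange of the two 32-bit halves.
   With -m32 a 64-bit return value lives in %edx:%eax, so this is the
   operation that a pair of moves or one xchgl has to perform. */
static uint64_t rot32_swap(uint64_t x)
{
    uint32_t lo = (uint32_t)x;          /* low half (%eax) */
    uint32_t hi = (uint32_t)(x >> 32);  /* high half (%edx) */
    return ((uint64_t)lo << 32) | hi;
}

/* Hypothetical helper: aborts if the two forms ever disagree. */
static void check_rot32(uint64_t x)
{
    assert(rot32_shifts(x) == rot32_swap(x));
}
```

Because the direction of the rotation does not matter at exactly 32 bits,
the same exchange serves both rotate-left and rotate-right. When source and
destination are the same register pair, the halves have to be swapped in
place, which is presumably the case the "0" alternative in the suggested
constraints is meant to cover.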