On Mon, Oct 30, 2023 at 6:27 PM Roger Sayle <ro...@nextmovesoftware.com> wrote: > > > This patch is a follow-up to my previous PR target/110551 patch, this > time to address the additional move after mulx, seen on TARGET_BMI2 > architectures (such as -march=haswell). The complication here is > that the flexible multiple-set mulx instruction is introduced into > RTL after reload, by split2, and therefore can't benefit from register > preferencing. This results in RTL like the following: > > (insn 32 31 17 2 (parallel [ > (set (reg:DI 4 si [orig:101 r ] [101]) > (mult:DI (reg:DI 1 dx [109]) > (reg:DI 5 di [109]))) > (set (reg:DI 5 di [ r+8 ]) > (umul_highpart:DI (reg:DI 1 dx [109]) > (reg:DI 5 di [109]))) > ]) "pr110551-2.c":8:17 -1 > (nil)) > > (insn 17 32 9 2 (set (reg:DI 0 ax [107]) > (reg:DI 5 di [ r+8 ])) "pr110551-2.c":9:40 90 {*movdi_internal} > (expr_list:REG_DEAD (reg:DI 5 di [ r+8 ]) > (nil))) > > Here insn 32, the mulx instruction, places its results in si and di, > and then immediately after decides to move di to ax, with di now dead. > This can be trivially cleaned up by a peephole2. I've added an > additional constraint that the two SET_DESTs can't be the same > register to avoid confusing the middle-end, but this has well-defined > behaviour on x86_64/BMI2, encoding a umul_highpart. > > For the new test case, compiled on x86_64 with -O2 -march=haswell: > > Before: > mulx64: movabsq $-7046029254386353131, %rdx > mulx %rdi, %rsi, %rdi > movq %rdi, %rax > xorq %rsi, %rax > ret > > After: > mulx64: movabsq $-7046029254386353131, %rdx > mulx %rdi, %rsi, %rax > xorq %rsi, %rax > ret > > This patch has been tested on x86_64-pc-linux-gnu with make bootstrap > and make -k check, both with and without --target_board=unix{-m32} > with no new failures. Ok for mainline?
It looks that your previous PR110551 patch regressed -march=cascadelake [1]. Let's fix these regressions first. [1] https://gcc.gnu.org/pipermail/gcc-patches/2023-October/634660.html Uros. > 2023-10-30 Roger Sayle <ro...@nextmovesoftware.com> > > gcc/ChangeLog > PR target/110551 > * config/i386/i386.md (*bmi2_umul<mode><dwi>3_1): Tidy condition > as operands[2] with predicate register_operand must be !MEM_P. > (peephole2): Optimize a mulx followed by a register-to-register > move, to place result in the correct destination if possible. > > gcc/testsuite/ChangeLog > PR target/110551 > * gcc.target/i386/pr110551-2.c: New test case. > > > Thanks in advance, > Roger > -- >