On Mon, Oct 30, 2023 at 6:27 PM Roger Sayle <ro...@nextmovesoftware.com> wrote:
>
>
> This patch is a follow-up to my previous PR target/110551 patch, this
> time to address the additional move after mulx, seen on TARGET_BMI2
> architectures (such as -march=haswell).  The complication here is
> that the flexible multiple-set mulx instruction is introduced into
> RTL after reload, by split2, and therefore can't benefit from register
> preferencing.  This results in RTL like the following:
>
> (insn 32 31 17 2 (parallel [
>             (set (reg:DI 4 si [orig:101 r ] [101])
>                 (mult:DI (reg:DI 1 dx [109])
>                     (reg:DI 5 di [109])))
>             (set (reg:DI 5 di [ r+8 ])
>                 (umul_highpart:DI (reg:DI 1 dx [109])
>                     (reg:DI 5 di [109])))
>         ]) "pr110551-2.c":8:17 -1
>      (nil))
>
> (insn 17 32 9 2 (set (reg:DI 0 ax [107])
>         (reg:DI 5 di [ r+8 ])) "pr110551-2.c":9:40 90 {*movdi_internal}
>      (expr_list:REG_DEAD (reg:DI 5 di [ r+8 ])
>         (nil)))
>
> Here insn 32, the mulx instruction, places its results in si and di,
> and then immediately after decides to move di to ax, with di now dead.
> This can be trivially cleaned up by a peephole2.  I've added an
> additional constraint that the two SET_DESTs can't be the same
> register to avoid confusing the middle-end, but this has well-defined
> behaviour on x86_64/BMI2, encoding a umul_highpart.
>
> For the new test case, compiled on x86_64 with -O2 -march=haswell:
>
> Before:
> mulx64: movabsq $-7046029254386353131, %rdx
>         mulx    %rdi, %rsi, %rdi
>         movq    %rdi, %rax
>         xorq    %rsi, %rax
>         ret
>
> After:
> mulx64: movabsq $-7046029254386353131, %rdx
>         mulx    %rdi, %rsi, %rax
>         xorq    %rsi, %rax
>         ret
>
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check, both with and without --target_board=unix{-m32}
> with no new failures.  Ok for mainline?

It looks that your previous PR110551 patch regressed
-march=cascadelake [1]. Let's fix these regressions first.

[1] https://gcc.gnu.org/pipermail/gcc-patches/2023-October/634660.html

Uros.

> 2023-10-30  Roger Sayle  <ro...@nextmovesoftware.com>
>
> gcc/ChangeLog
>         PR target/110551
>         * config/i386/i386.md (*bmi2_umul<mode><dwi>3_1): Tidy condition
>         as operands[2] with predicate register_operand must be !MEM_P.
>         (peephole2): Optimize a mulx followed by a register-to-register
>         move, to place result in the correct destination if possible.
>
> gcc/testsuite/ChangeLog
>         PR target/110551
>         * gcc.target/i386/pr110551-2.c: New test case.
>
>
> Thanks in advance,
> Roger
> --
>

Reply via email to