Re: [x86 PATCH] PR target/91681: zero_extendditi2 pattern for more optimizations.

Uros Bizjak via Gcc-patches Fri, 03 Jun 2022 03:08:37 -0700

On Fri, Jun 3, 2022 at 11:49 AM Roger Sayle <[email protected]> wrote:
>
>
> Technically, PR target/91681 has already been resolved; we now recognize the
> highpart multiplication at the tree-level, we no longer use the stack, and
> we currently generate the same number of instructions as LLVM.  However, it
> is still possible to do better, the current x86_64 code to generate a double
> word addition of a zero extended operand, looks like:
>
>         xorl    %r11d, %r11d
>         addq    %r10, %rax
>         adcq    %r11, %rdx
>
> when it's possible (as LLVM does) to use an immediate constant:
>
>         addq    %r10, %rax
>         adcq    $0, %rdx
>
> To do this, the backend required one or two simple changes, that
> then themselves required one or two more obscure tweaks.
>
> The simple starting point is to define a zero_extendditi2 pattern,
> for zero extension from DImode to TImode on TARGET_64BIT that is
> split after reload.  Double word (TImode) addition/subtraction is
> split after reload, so that constrains when things should happen.
>
> With zero extension now visible to combine, we add two new
> define_insn_and_split that add/subtract a zero extended operand
> in double word mode.  These apply to both 32-bit and 64-bit code
> generation, to produce adc $0 and sbb $0.
>
> The first strange tweak is that these new patterns interfere with
> the optimization that recognizes DW:DI = (HI:SI<<32)+LO:SI as a pair
> of register moves, or more accurately the combine splitter no longer
> triggers as we're now converting two instructions into two instructions
> (not three instructions into two instructions).  This is easily
> repaired (and extended to handle TImode) by changing from a pair
> of define_split (that handle operand commutativity) to a set of
> four define_insn_and_split (again to handle operand commutativity).
>
> The other/final strange tweak that the above splitters now interfere
> with AVX512's kunpckdq instruction which is defined as identical RTL,
> DW:DI = (HI:SI<<32)|zero_extend(LO:SI).  To distinguish this, and
> also avoid AVX512 mask registers being used by reload to perform
> SImode scalar shifts, I've added the explicit (unspec UNSPEC_MASKOP)
> to the unpack mask operations, which matches what sse.md does for
> the other mask specific (logic) operations.
>
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check, both with and without --target_board=unix{-m32},
> with no new failures.  Ok for mainline?
>
>
> 2022-06-03  Roger Sayle  <[email protected]>
>
> gcc/ChangeLog
>         PR target/91681
>         * config/i386/i386.md (zero_extendditi2): New define_insn_and_split.
>         (*add<dwi>3_doubleword_zext): New define_insn_and_split.
>         (*sub<dwi>3_doubleword_zext): New define_insn_and_split.
>         (*concat<mode><dwi>3_1): New define_insn_and_split replacing
>         previous define_split for implementing DST = (HI<<32)|LO as
>         pair of move instructions, setting lopart and hipart.
>         (*concat<mode><dwi>3_2): Likewise.
>         (*concat<mode><dwi>3_3): Likewise, where HI is zero_extended.
>         (*concat<mode><dwi>3_4): Likewise, where HI is zero_extended.
>         * config/i386/sse.md (kunpckhi): Add UNSPEC_MASKOP unspec.
>         (kunpcksi): Likewise, add UNSPEC_MASKOP unspec.
>         (kunpckdi): Likewise, add UNSPEC_MASKOP unspec.
>         (vec_pack_trunc_qi): Update to specify required UNSPEC_MASKOP
> unspec.
>         (vec_pack_trunc_<mode>): Likewise.
>
> gcc/testsuite/ChangeLog
>         PR target/91681
>         * g++.target/i386/pr91681.C: New test case (from the PR).
>         * gcc.target/i386/pr91681-1.c: New int128 test case.
>         * gcc.target/i386/pr91681-2.c: Likewise.
>         * gcc.target/i386/pr91681-3.c: Likewise, but for ia32.


+  "MEM_P (operands[0]) ? rtx_equal_p (operands[0], operands[1])
+ && !MEM_P (operands[2])
+       : !MEM_P (operands[1])"

Can we use ix86_binary_operator_ok (UNKNOWN, ...mode..., operands) here instead?

(UNKNOWN RTX code is used to prevent unwanted optimization with
commutative operands).

Uros.

Re: [x86 PATCH] PR target/91681: zero_extendditi2 pattern for more optimizations.

Reply via email to