On Mon, May 30, 2022 at 11:11 AM Roger Sayle <ro...@nextmovesoftware.com> wrote:
>
>
> Hi Uros,
> This is a ping of my patch from April, which as you've suggested should be
> submitted for review even if there remain two missed-optimization regressions
> on ia32 (to allow reviewers to better judge if those fixes are appropriate/the
> best solution).
> https://gcc.gnu.org/pipermail/gcc-patches/2022-April/593174.html
>
> The executive summary is that the core of this patch is a single pre-reload
> splitter:
>
> (define_insn_and_split "*cmp<dwi>_doubleword"
>   [(set (reg:CCZ FLAGS_REG)
>        (compare:CCZ (match_operand:<DWI> 0 "nonimmediate_operand")
>                     (match_operand:<DWI> 1 "x86_64_general_operand")))]
>   "ix86_pre_reload_split ()"
>   "#"
>   "&& 1"
>   [(parallel [(set (reg:CCZ FLAGS_REG)
>                   (compare:CCZ (ior:DWIH (match_dup 4) (match_dup 5))
>                                (const_int 0)))
>              (set (match_dup 4) (ior:DWIH (match_dup 4) (match_dup 5)))])]
>
> That allows the RTL optimizers to assume the target has a double word
> equality/inequality comparison during combine, but then split this into
> a CC-setting IOR of the lowpart and highpart just before reload.
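
As an illustration only (not code from the patch), the identity behind this
split can be sketched in C: two doubleword values are equal exactly when the
XOR of their lowparts, ORed with the XOR of their highparts, is zero. The
names dw_equal and dw_selftest are hypothetical, and the sketch assumes a
64-bit doubleword built from 32-bit words, as on ia32.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not part of GCC: tests a == b using only 32-bit
   operations, mirroring the xor/xor/or shape of the split sequence.  */
static int
dw_equal (uint64_t a, uint64_t b)
{
  uint32_t lo = (uint32_t) a ^ (uint32_t) b;                 /* XOR of lowparts.  */
  uint32_t hi = (uint32_t) (a >> 32) ^ (uint32_t) (b >> 32); /* XOR of highparts.  */
  /* The IOR is zero (the zero flag would be set) only if both halves match.  */
  return (lo | hi) == 0;
}

/* Hypothetical self-check: the helper must agree with the native ==.  */
static void
dw_selftest (void)
{
  static const uint64_t v[] = {
    0, 14348907, 1ULL << 32, (1ULL << 32) | 14348907, UINT64_MAX
  };
  const unsigned n = sizeof v / sizeof v[0];
  for (unsigned i = 0; i < n; i++)
    for (unsigned j = 0; j < n; j++)
      assert (dw_equal (v[i], v[j]) == (v[i] == v[j]));
}
```

When one operand is a constant whose highpart is zero, as with 14348907, the
highpart XOR is a no-op, so only one XOR and one IOR need to be emitted.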
>
> The intended effect is that for PR target/70321's test case:
>
> void foo (long long ixi)
> {
>   if (ixi != 14348907)
>     __builtin_abort ();
> }
>
> where with -m32 -O2 GCC previously produced:
>
>         movl    16(%esp), %eax
>         movl    20(%esp), %edx
>         xorl    $14348907, %eax
>         orl     %eax, %edx
>         jne     .L3
>
> we now produce the slightly improved:
>
>         movl    16(%esp), %eax
>         xorl    $14348907, %eax
>         orl     20(%esp), %eax
>         jne     .L3
>
> Similar improvements are seen with __int128 equality on TARGET_64BIT.
>
> The rest of the patch, in fact the bulk of it, is purely to adjust the other
> parts of the i386 backend that make the assumption that double word
> equality has been lowered during RTL expansion, including for example
> STV which turns DImode equality into SSE ptest, which previously
> explicitly looked for the IOR of lowpart/highpart.
>
> This patch has been retested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check, with no new failures.  However, when adding
> --target_board=unix{-m32} there are two new missed optimization FAILs
> both related to pandn.
> FAIL: gcc.target/i386/pr65105-5.c scan-assembler pandn
> FAIL: gcc.target/i386/pr78794.c scan-assembler pandn
>
> These become the requested test cases for the fix proposed here:
> https://gcc.gnu.org/pipermail/gcc-patches/2022-May/595390.html
>
> OK for mainline, now we're in stage 1?
>
>
> 2022-05-30  Roger Sayle  <ro...@nextmovesoftware.com>
>
> gcc/ChangeLog
>         PR target/70321
>         * config/i386/i386-expand.cc (ix86_expand_branch): Don't decompose
>         DI mode equality/inequality using XOR here.  Instead generate a
>         COMPARE for doubleword modes (DImode on !TARGET_64BIT or TImode).
>         * config/i386/i386-features.cc (gen_gpr_to_xmm_move_src): Use
>         gen_rtx_SUBREG when NUNITS is 1, i.e. for TImode to V1TImode.
>         (general_scalar_chain::convert_compare): New function to convert
>         scalar equality/inequality comparison into vector operations.
>         (general_scalar_chain::convert_insn) [COMPARE]: Refactor. Call
>         new convert_compare helper method.
>         (convertible_comparison_p): Update to match doubleword COMPARE
>         of two register, memory or integer constant operands.
>         * config/i386/i386-features.h
>         (general_scalar_chain::convert_compare): Prototype/declare member
>         function here.
>         * config/i386/i386.md (cstore<mode>4): Change mode to SDWIM, but
>         only allow new doubleword modes for EQ and NE operators.
>         (*cmp<dwi>_doubleword): New define_insn_and_split, to split a
>         doubleword comparison into a pair of XORs followed by an IOR to
>         set the (zero) flags register, optimizing the XORs if possible.
>         * config/i386/sse.md (V_AVX): Include V1TI and V2TI in mode iterator;
>         V_AVX is (currently) only used by ptest.
>         (sse4_1 mode attribute): Update to support V1TI and V2TI.
>
> gcc/testsuite/ChangeLog
>         PR target/70321
>         * gcc.target/i386/pr70321.c: New test case.
>         * gcc.target/i386/sse4_1-stv-1.c: New test case.

OK.

Thanks,
Uros.

>
>
> Many thanks in advance,
> Roger
> --
>
