On Mon, May 30, 2022 at 11:11 AM Roger Sayle <ro...@nextmovesoftware.com> wrote:
>
> Hi Uros,
> This is a ping of my patch from April, which as you've suggested should be
> submitted for review even if there remain two missed-optimization
> regressions on ia32 (to allow reviewers to better judge if those fixes are
> appropriate/the best solution).
> https://gcc.gnu.org/pipermail/gcc-patches/2022-April/593174.html
>
> The executive summary is that the core of this patch is a single pre-reload
> splitter:
>
> (define_insn_and_split "*cmp<dwi>_doubleword"
>   [(set (reg:CCZ FLAGS_REG)
>         (compare:CCZ (match_operand:<DWI> 0 "nonimmediate_operand")
>                      (match_operand:<DWI> 1 "x86_64_general_operand")))]
>   "ix86_pre_reload_split ()"
>   "#"
>   "&& 1"
>   [(parallel [(set (reg:CCZ FLAGS_REG)
>                    (compare:CCZ (ior:DWIH (match_dup 4) (match_dup 5))
>                                 (const_int 0)))
>               (set (match_dup 4) (ior:DWIH (match_dup 4) (match_dup 5)))])]
>
> That allows the RTL optimizers to assume the target has a double word
> equality/inequality comparison during combine, but then splits this into
> a CC-setting IOR of the lowpart and highpart just before reload.
>
> The intended effect is that for PR target/70321's test case:
>
> void foo (long long ixi)
> {
>   if (ixi != 14348907)
>     __builtin_abort ();
> }
>
> where with -m32 -O2 GCC previously produced:
>
>         movl    16(%esp), %eax
>         movl    20(%esp), %edx
>         xorl    $14348907, %eax
>         orl     %eax, %edx
>         jne     .L3
>
> we now produce the slightly improved:
>
>         movl    16(%esp), %eax
>         xorl    $14348907, %eax
>         orl     20(%esp), %eax
>         jne     .L3
>
> Similar improvements are seen with __int128 equality on TARGET_64BIT.
>
> The rest of the patch, in fact the bulk of it, is purely to adjust the other
> parts of the i386 backend that make the assumption that double word
> equality has been lowered during RTL expansion, including for example
> STV which turns DImode equality into SSE ptest, which previously
> explicitly looked for the IOR of lowpart/highpart.
>
> This patch has been retested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check, with no new failures.  However, when adding
> --target_board=unix{-m32} there are two new missed-optimization FAILs,
> both related to pandn.
> FAIL: gcc.target/i386/pr65105-5.c scan-assembler pandn
> FAIL: gcc.target/i386/pr78794.c scan-assembler pandn
>
> These become the requested test cases for the fix proposed here:
> https://gcc.gnu.org/pipermail/gcc-patches/2022-May/595390.html
>
> OK for mainline, now we're in stage 1?
>
>
> 2022-05-30  Roger Sayle  <ro...@nextmovesoftware.com>
>
> gcc/ChangeLog
>         PR target/70321
>         * config/i386/i386-expand.cc (ix86_expand_branch): Don't decompose
>         DI mode equality/inequality using XOR here.  Instead generate a
>         COMPARE for doubleword modes (DImode on !TARGET_64BIT or TImode).
>         * config/i386/i386-features.cc (gen_gpr_to_xmm_move_src): Use
>         gen_rtx_SUBREG when NUNITS is 1, i.e. for TImode to V1TImode.
>         (general_scalar_chain::convert_compare): New function to convert
>         scalar equality/inequality comparison into vector operations.
>         (general_scalar_chain::convert_insn) [COMPARE]: Refactor.  Call
>         new convert_compare helper method.
>         (convertible_comparison_p): Update to match doubleword COMPARE
>         of two register, memory or integer constant operands.
>         * config/i386/i386-features.h
>         (general_scalar_chain::convert_compare): Prototype/declare
>         member function here.
>         * config/i386/i386.md (cstore<mode>4): Change mode to SDWIM, but
>         only allow new doubleword modes for EQ and NE operators.
>         (*cmp<dwi>_doubleword): New define_insn_and_split, to split a
>         doubleword comparison into a pair of XORs followed by an IOR to
>         set the (zero) flags register, optimizing the XORs if possible.
>         * config/i386/sse.md (V_AVX): Include V1TI and V2TI in mode
>         iterator; V_AVX is (currently) only used by ptest.
>         (sse4_1 mode attribute): Update to support V1TI and V2TI.
>
> gcc/testsuite/ChangeLog
>         PR target/70321
>         * gcc.target/i386/pr70321.c: New test case.
>         * gcc.target/i386/sse4_1-stv-1.c: New test case.

OK.

Thanks,
Uros.

>
> Many thanks in advance,
> Roger
> --
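
For readers following the thread, the doubleword comparison lowering described in the quoted message (a pair of XORs followed by an IOR that sets the zero flag) can be sketched in plain C. This is only an illustration under stated assumptions, not code from the patch: dw_ne and check_dw_ne are hypothetical names, and a 64-bit value is modelled as two 32-bit halves, roughly as a -m32 target would see it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper (not part of GCC): doubleword "x != c" computed
   from 32-bit halves, mirroring the XOR/XOR/IOR shape of the split.  */
static bool dw_ne(uint64_t x, uint64_t c)
{
    /* XOR each half with the matching half of the other operand.
       When a half of c is zero, the XOR is a no-op and can be dropped,
       which is why the quoted -m32 code needs only one xorl.  */
    uint32_t lo = (uint32_t)x ^ (uint32_t)c;
    uint32_t hi = (uint32_t)(x >> 32) ^ (uint32_t)(c >> 32);

    /* The IOR of the two halves is zero exactly when x == c.  */
    return (lo | hi) != 0;
}

/* Small self-check against the built-in 64-bit comparison.  */
static void check_dw_ne(void)
{
    const uint64_t c = 14348907ULL;  /* constant from the PR test case */
    const uint64_t samples[] = { 0, 1, c, c + 1, (1ULL << 32) | c,
                                 UINT64_MAX };
    for (unsigned i = 0; i < sizeof samples / sizeof samples[0]; i++)
        assert(dw_ne(samples[i], c) == (samples[i] != c));
}
```

The sample with bit 32 set and a low half equal to the constant is the case that a lowpart-only comparison would get wrong, so it is worth keeping in any test of this kind.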