On Wed, Jun 28, 2023 at 3:32 AM Roger Sayle <ro...@nextmovesoftware.com> wrote:
>
> Doh! Wrong patch...
> Roger
> --
>
> From: Roger Sayle <ro...@nextmovesoftware.com>
> Sent: 27 June 2023 20:28
> To: 'gcc-patches@gcc.gnu.org' <gcc-patches@gcc.gnu.org>
> Cc: 'Uros Bizjak' <ubiz...@gmail.com>; 'Hongtao Liu' <crazy...@gmail.com>
> Subject: [x86 PATCH] Tweak ix86_expand_int_compare to use PTEST for vector
> equality.
>
> Hi Uros,
>
> Hopefully Hongtao will approve my patch to support SUBREG conversions
> in STV https://gcc.gnu.org/pipermail/gcc-patches/2023-June/622706.html
> but for some of the examples described in the above post (and its test
> case), I've also come up with an alternate/complementary/supplementary
> fix of generating the PTEST during RTL expansion, rather than rely on
> this being caught/optimized later during STV.
>
> You may notice in this patch, the tests for TARGET_SSE4_1 and TImode
> appear last. When I was writing this, I initially also added support
> for AVX VPTEST and OImode, before realizing that x86 doesn't (yet)
> support 256-bit OImode (which also explains why we don't have an OImode
> to V1OImode scalar-to-vector pass). Retaining this clause ordering
> should minimize the lines changed if things change in future.
>
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check, both with and without --target_board=unix{-m32}
> with no new failures. Ok for mainline?
>
> 2023-06-27 Roger Sayle <ro...@nextmovesoftware.com>
>
> gcc/ChangeLog
> * config/i386/i386-expand.cc (ix86_expand_int_compare): If
> testing a TImode SUBREG of a 128-bit vector register against
> zero, use a PTEST instruction instead of first moving it
> to scalar registers.
>
> Please let me know what you think.
> Roger
> --
>
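
For context, here is a minimal sketch of the kind of source that leads to a
TImode SUBREG of a 128-bit vector register being compared against zero. It is
an illustration only: the type name v2di and the function names are
hypothetical and are not taken from the patch or its test case. It relies on
the GCC vector extension and on __int128, so it assumes a 64-bit target.

```c
/* Hypothetical illustration; none of these names come from the patch.  */
typedef long long v2di __attribute__ ((vector_size (16)));

/* Reinterpret the 128-bit vector as a 128-bit integer and compare it
   against zero.  Returns 1 when every bit of x is clear.  */
int
vec_is_zero (v2di x)
{
  return (unsigned __int128) x == 0;
}

/* The inequality form of the same test.  Returns 1 when any bit is set.  */
int
vec_is_nonzero (v2di x)
{
  return (unsigned __int128) x != 0;
}
```

Whether a given compiler emits PTEST for these functions depends on its
version and on options such as -O2 -msse4.1; the sketch only shows the source
pattern, not the generated code.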
+  /* Attempt to use PTEST, if available, when testing vector modes for
+     equality/inequality against zero. */
+  if (op1 == const0_rtx
+      && SUBREG_P (op0)
+      && cmpmode == CCZmode
+      && SUBREG_BYTE (op0) == 0
+      && REG_P (SUBREG_REG (op0))

Just register_operand (op0, TImode),

+      && VECTOR_MODE_P (GET_MODE (SUBREG_REG (op0)))
+      && TARGET_SSE4_1
+      && GET_MODE (op0) == TImode
+      && GET_MODE_SIZE (GET_MODE (SUBREG_REG (op0))) == 16)
+    {
+      tmp = SUBREG_REG (op0);

and tmp = lowpart_subreg (V1TImode, force_reg (TImode, op0));?
I think RA can handle SUBREG correctly, no need for extra predicates.

+      tmp = gen_rtx_UNSPEC (CCZmode, gen_rtvec (2, tmp, tmp), UNSPEC_PTEST);
+    }
+  else
+    tmp = gen_rtx_COMPARE (cmpmode, op0, op1);

--
BR,
Hongtao