On Wed, Jun 28, 2023 at 3:32 AM Roger Sayle <ro...@nextmovesoftware.com> wrote:
>
>
> Doh! Wrong patch...
> Roger
> --
>
> From: Roger Sayle <ro...@nextmovesoftware.com>
> Sent: 27 June 2023 20:28
> To: 'gcc-patches@gcc.gnu.org' <gcc-patches@gcc.gnu.org>
> Cc: 'Uros Bizjak' <ubiz...@gmail.com>; 'Hongtao Liu' <crazy...@gmail.com>
> Subject: [x86 PATCH] Tweak ix86_expand_int_compare to use PTEST for vector
> equality.
>
>
> Hi Uros,
>
> Hopefully Hongtao will approve my patch to support SUBREG conversions
> in STV https://gcc.gnu.org/pipermail/gcc-patches/2023-June/622706.html
> but for some of the examples described in the above post (and its test
> case), I've also come up with an alternate/complementary/supplementary
> fix of generating the PTEST during RTL expansion, rather than rely on
> this being caught/optimized later during STV.
>
> You may notice in this patch, the tests for TARGET_SSE4_1 and TImode
> appear last.  When I was writing this, I initially also added support
> for AVX VPTEST and OImode, before realizing that x86 doesn't (yet)
> support 256-bit OImode (which also explains why we don't have an OImode
> to V1OImode scalar-to-vector pass).  Retaining this clause ordering
> should minimize the lines changed if things change in future.
>
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check, both with and without --target_board=unix{-m32}
> with no new failures.  Ok for mainline?
>
>
> 2023-06-27  Roger Sayle  <ro...@nextmovesoftware.com>
>
> gcc/ChangeLog
>         * config/i386/i386-expand.cc (ix86_expand_int_compare): If
>         testing a TImode SUBREG of a 128-bit vector register against
>         zero, use a PTEST instruction instead of first moving it to
>         scalar registers.
>
>
> Please let me know what you think.
> Roger
> --
>

+  /* Attempt to use PTEST, if available, when testing vector modes for
+     equality/inequality against zero.  */
+  if (op1 == const0_rtx
+      && SUBREG_P (op0)
+      && cmpmode == CCZmode
+      && SUBREG_BYTE (op0) == 0
+      && REG_P (SUBREG_REG (op0))
Just register_operand (op0, TImode),
+      && VECTOR_MODE_P (GET_MODE (SUBREG_REG (op0)))
+      && TARGET_SSE4_1
+      && GET_MODE (op0) == TImode
+      && GET_MODE_SIZE (GET_MODE (SUBREG_REG (op0))) == 16)
+    {
+      tmp = SUBREG_REG (op0);
and tmp = lowpart_subreg (V1TImode, force_reg (TImode, op0)); here?
I think the RA can handle the SUBREG correctly, so there is no need for the
extra predicates.
+      tmp = gen_rtx_UNSPEC (CCZmode, gen_rtvec (2, tmp, tmp), UNSPEC_PTEST);
+    }
+  else
+    tmp = gen_rtx_COMPARE (cmpmode, op0, op1);
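
For anyone following along, the reason the hunk passes tmp twice to the UNSPEC_PTEST is that PTEST sets ZF when the bitwise AND of its two operands is all zeros, so with identical operands ZF simply answers "is this 128-bit value zero?". Below is a minimal scalar sketch of that flag behaviour in plain C. It is only a model and not GCC code: v128_model, ptest_zf, is_zero and ptest_model_selfcheck are hypothetical names made up for illustration, and the CF result of PTEST is not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical scalar stand-in for a 128-bit vector register.  */
typedef struct { uint64_t lo, hi; } v128_model;

/* Model of the ZF result of "PTEST a, b": ZF is set when (a AND b)
   is all zeros.  CF is deliberately not modelled here.  */
static bool
ptest_zf (v128_model a, v128_model b)
{
  return ((a.lo & b.lo) | (a.hi & b.hi)) == 0;
}

/* The plain 128-bit "x == 0" test, done on two 64-bit halves.  */
static bool
is_zero (v128_model x)
{
  return (x.lo | x.hi) == 0;
}

/* With both operands the same, x AND x is x, so ZF matches "x == 0".  */
static void
ptest_model_selfcheck (void)
{
  v128_model samples[] = {
    { 0, 0 }, { 1, 0 }, { 0, 1 }, { UINT64_MAX, UINT64_MAX }
  };
  for (unsigned i = 0; i < sizeof samples / sizeof samples[0]; i++)
    assert (ptest_zf (samples[i], samples[i]) == is_zero (samples[i]));
}
```

Calling ptest_model_selfcheck () checks that equivalence on a few sample values, including ones where only the low or only the high half is nonzero.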



--
BR,
Hongtao
