Doh! Wrong patch...
Roger
--
From: Roger Sayle <[email protected]>
Sent: 27 June 2023 20:28
To: '[email protected]' <[email protected]>
Cc: 'Uros Bizjak' <[email protected]>; 'Hongtao Liu' <[email protected]>
Subject: [x86 PATCH] Tweak ix86_expand_int_compare to use PTEST for vector equality.
Hi Uros,
Hopefully Hongtao will approve my patch to support SUBREG conversions
in STV (https://gcc.gnu.org/pipermail/gcc-patches/2023-June/622706.html),
but for some of the examples described in that post (and its test
case), I've also come up with an alternative, complementary fix:
generating the PTEST during RTL expansion, rather than relying on it
being caught and optimized later by STV.
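For illustration, here is a hypothetical example in the spirit of those test cases (the type and function names are made up). It assumes GCC-style vector extensions and `unsigned __int128`, so it needs a 64-bit target. Viewing a 128-bit vector as a TImode integer and comparing it with zero should reach ix86_expand_int_compare as a TImode SUBREG of a vector register compared against const0_rtx. Whether a given GCC version actually emits PTEST for it depends on the compiler and on flags such as `-O2 -msse4.1`.

```c
#include <string.h>

/* 128-bit vector type via GCC's vector extension (same size as __m128i).  */
typedef long long v2di __attribute__ ((vector_size (16)));

/* Hypothetical helper: test all 128 bits of X for zero by reinterpreting
   the vector as an unsigned __int128 (TImode) value.  memcpy keeps the
   type pun well-defined; at -O2 it is normally folded away.  */
int
vec_is_zero (v2di x)
{
  unsigned __int128 t;
  memcpy (&t, &x, sizeof t);
  return t == 0;
}
```

One way to check the effect is to compile with `gcc -O2 -msse4.1 -S` and look for `ptest` in the output instead of moves into general-purpose registers.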
You may notice that in this patch the tests for TARGET_SSE4_1 and TImode
appear last. When writing this, I initially also added support
for AVX VPTEST and OImode, before realizing that x86 doesn't (yet)
support 256-bit OImode (which also explains why we don't have an OImode
to V1OImode scalar-to-vector pass). Keeping this clause ordering
should minimize the number of lines changed if that changes in the future.
This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check, both with and without --target_board=unix{-m32},
with no new failures. Ok for mainline?
2023-06-27 Roger Sayle <[email protected]>
gcc/ChangeLog
* config/i386/i386-expand.cc (ix86_expand_int_compare): If
testing a TImode SUBREG of a 128-bit vector register against
zero, use a PTEST instruction instead of first moving it to
scalar registers.
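Why PTEST of a register with itself is a valid replacement: according to the SSE4.1 definition, PTEST sets ZF when the bitwise AND of its two operands is zero. With both operands the same register, ZF is therefore set exactly when all 128 bits are zero, which is the EQ/NE (CCZmode) test against zero. The portable C model below illustrates that equivalence. The `u128` struct and the function names are hypothetical, and the model covers only the ZF result, not the instruction itself.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical portable stand-in for a 128-bit XMM register value.  */
typedef struct { uint64_t lo, hi; } u128;

/* Model of PTEST's ZF output: set iff (A AND B) == 0 across all 128 bits.  */
bool
ptest_zf (u128 a, u128 b)
{
  return ((a.lo & b.lo) | (a.hi & b.hi)) == 0;
}

/* The scalar test the patch avoids: the TImode value compared with zero.  */
bool
ti_eq_zero (u128 x)
{
  return (x.lo | x.hi) == 0;
}

/* ptest_zf (x, x) reduces to (x.lo | x.hi) == 0, i.e. ti_eq_zero (x).  */
```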
Please let me know what you think.
Roger
--
diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc
index 9a8d244..814d63b 100644
--- a/gcc/config/i386/i386-expand.cc
+++ b/gcc/config/i386/i386-expand.cc
@@ -2958,9 +2958,26 @@ ix86_expand_int_compare (enum rtx_code code, rtx op0, rtx op1)
   cmpmode = SELECT_CC_MODE (code, op0, op1);
   flags = gen_rtx_REG (cmpmode, FLAGS_REG);
 
+  /* Attempt to use PTEST, if available, when testing vector modes for
+     equality/inequality against zero.  */
+  if (op1 == const0_rtx
+      && SUBREG_P (op0)
+      && cmpmode == CCZmode
+      && SUBREG_BYTE (op0) == 0
+      && REG_P (SUBREG_REG (op0))
+      && VECTOR_MODE_P (GET_MODE (SUBREG_REG (op0)))
+      && TARGET_SSE4_1
+      && GET_MODE (op0) == TImode
+      && GET_MODE_SIZE (GET_MODE (SUBREG_REG (op0))) == 16)
+    {
+      tmp = SUBREG_REG (op0);
+      tmp = gen_rtx_UNSPEC (CCZmode, gen_rtvec (2, tmp, tmp), UNSPEC_PTEST);
+    }
+  else
+    tmp = gen_rtx_COMPARE (cmpmode, op0, op1);
+
   /* This is very simple, but making the interface the same as in the
      FP case makes the rest of the code easier.  */
-  tmp = gen_rtx_COMPARE (cmpmode, op0, op1);
   emit_insn (gen_rtx_SET (flags, tmp));
 
   /* Return the test that should be put into the flags user, i.e.