Richard Biener <[email protected]> writes:
> On December 12, 2019 12:56:01 AM GMT+01:00, Jakub Jelinek <[email protected]>
> wrote:
>>Hi!
>>
>>The AVX512{F,VL} vector comparisons that set %kN registers are
>>represented
>>in RTL as comparisons with vector mode operands and scalar integral
>>result,
>>where at runtime the scalar integer is filled with a bitmask.
>>Unfortunately, simplify_relational_operation would fold e.g.
>>(eq:SI (reg:V32HI x) (reg:V32HI x))
>>into (const_int 1) rather than (const_int -1) that is expected (all
>>elements
>>equal). simplify_const_relational_operation is documented to always
>>return
>>just const0_rtx or const_true_rtx and simplify_relational_operation is
>>expected to fix this up, for vector comparisons with vector result it
>>duplicates the 0 or -1 into all elements, etc., so this patch adds
>>handling
>>for this case there too.
>>
>>Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>
> So there's no whole vector equality RTX but we have to pun to integer modes
> for that? The eq:SImode would suggest that. Guess we should have used a
> BImode vector representation...
>
> Can you check whether we have any target with whole vector compare patterns
> that would break here?
We also have vector cbranch<mode>4 for vectors, which does a genuine
scalar comparison of vectors. It happens that the representation of
them on x86 (using CCmodes only) means that there's currently no
ambiguity with the elementwise vector-to-scalar comparisons above.
But I think further encoding this assumption is going to cause problems
eventually.
E.g. the scalar comparison { 0, 0 } == { 0, 1 } should be false when
used with cbranch<mode>4, but should be 2 when used with an elementwise
interpretation. If the cbranch<mode>4 interpretation is turned into
a cstore for (cbranch<mode>4 ? 1 : 0), we could easily end up wanting
(eq:SI ...) of those vectors to return false rather than 2. Even for
{ 0, 0 } == { 0, 0 }, a cstore (eq:SI ...) would need to be 1 rather
than 3.
I wonder how easy it would be to make the mask registers use
MODE_VECTOR_BOOL instead of scalar integers... :-)
For now I think the safest thing would be to punt, rather than assume
one thing or the other. simplify_const_relational_operation doesn't
handle many vector cases anyway, and most interesting optimisations
like this should happen on gimple, where we know whether the condition
result is a vector or a scalar.
Thanks,
Richard
>
> Richard.
>
>>2019-12-11 Jakub Jelinek <[email protected]>
>>
>> PR target/92908
>> * simplify-rtx.c (simplify_relational_operation): For vector cmp_mode
>> and scalar mode, if simplify_relational_operation returned
>> const_true_rtx, return a scalar bitmask of all ones.
>> (simplify_const_relational_operation): Change VOID_mode in function
>> comment to VOIDmode.
>>
>> * gcc.target/i386/avx512bw-pr92908.c: New test.
>>
>>--- gcc/simplify-rtx.c.jj 2019-11-19 22:27:02.000058742 +0100
>>+++ gcc/simplify-rtx.c 2019-12-11 13:31:57.197809704 +0100
>>@@ -5037,6 +5037,23 @@ simplify_relational_operation (enum rtx_
>> return NULL_RTX;
>> #endif
>> }
>>+ if (VECTOR_MODE_P (cmp_mode)
>>+ && SCALAR_INT_MODE_P (mode)
>>+ && tem == const_true_rtx)
>>+ {
>>+ /* Vector comparisons that expect a scalar integral
>>+ bitmask. For const0_rtx the result is already correct,
>>+ for const_true_rtx we need all bits set. */
>>+ int n_elts;
>>+ scalar_int_mode smode = as_a <scalar_int_mode> (mode);
>>+ gcc_assert (GET_MODE_NUNITS (cmp_mode).is_constant (&n_elts)
>>+ && GET_MODE_PRECISION (smode) <= n_elts);
>>+ if (GET_MODE_PRECISION (smode) == n_elts)
>>+ return constm1_rtx;
>>+ if (n_elts < HOST_BITS_PER_WIDE_INT)
>>+ return GEN_INT ((HOST_WIDE_INT_1U << n_elts) - 1);
>>+ return NULL_RTX;
>>+ }
>>
>> return tem;
>> }
>>@@ -5383,7 +5400,7 @@ comparison_result (enum rtx_code code, i
>> }
>>
>> /* Check if the given comparison (done in the given MODE) is actually
>>- a tautology or a contradiction. If the mode is VOID_mode, the
>>+ a tautology or a contradiction. If the mode is VOIDmode, the
>> comparison is done in "infinite precision". If no simplification
>> is possible, this function returns zero. Otherwise, it returns
>> either const_true_rtx or const0_rtx. */
>>--- gcc/testsuite/gcc.target/i386/avx512bw-pr92908.c.jj 2019-12-11
>>14:24:12.083418762 +0100
>>+++ gcc/testsuite/gcc.target/i386/avx512bw-pr92908.c 2019-12-11
>>14:23:56.071665326 +0100
>>@@ -0,0 +1,21 @@
>>+/* PR target/92908 */
>>+/* { dg-do run } */
>>+/* { dg-options "-Og -fno-tree-fre -mavx512bw" } */
>>+/* { dg-require-effective-target avx512bw } */
>>+
>>+#define AVX512BW
>>+#include "avx512f-helper.h"
>>+
>>+typedef unsigned short V __attribute__ ((__vector_size__ (64)));
>>+
>>+V v;
>>+
>>+void
>>+TEST (void)
>>+{
>>+ int i;
>>+ v = (V) v == v;
>>+ for (i = 0; i < 32; i++)
>>+ if (v[i] != 0xffff)
>>+ abort ();
>>+}
>>
>> Jakub