This patch fixes a wrong-code bug on amdgcn in which excess set bits in a
uniform vector mask enable extra lanes that were supposed to be unused,
and whose contents are therefore undefined.
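To illustrate the failure mode (a standalone sketch with a made-up lane
count, not code from the patch): a uniform "true" boolean element is
expanded by sign-extension, which sets every bit of the scalar mask
register, and on GCN the bits beyond the vector's real element count
enable live lanes. Clearing them is just an AND with (1 << nunits) - 1,
which is what the patch arranges during expansion:

    #include <stdint.h>
    #include <stdio.h>

    /* Hypothetical 4-lane boolean vector whose mask lives in a 64-bit
       scalar register.  */
    int
    main (void)
    {
      int elt = 1;                       /* uniform "true" element */
      uint64_t mask = -(uint64_t) elt;   /* sign-extend: all 64 bits set */
      unsigned nunits = 4;               /* the vector only has 4 lanes */
      uint64_t fixed = mask & ((UINT64_C (1) << nunits) - 1);

      printf ("sign-extended: %#llx\n", (unsigned long long) mask);
      printf ("masked:        %#llx\n", (unsigned long long) fixed);
      return 0;
    }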
Richi suggested an alternative approach involving narrower types and
then a zero-extend to the actual mask type. This solved the problem for
the specific test case that I had, but I'm not sure if it would work
with V2 and V4 modes (not that I've observed bad behaviour from them
anyway, but still). There were some other caveats involving "two-lane
division" that I don't fully understand, so I went with the simpler
implementation.
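For reference, my reading of that alternative, as a hypothetical C sketch
rather than anything resembling the GCC implementation: build the mask in
a type whose precision equals the lane count, so sign-extension cannot
produce excess bits, then zero-extend into the wide mask register:

    #include <stdint.h>

    /* Hypothetical 8-lane case, chosen so the narrow type's width
       matches the lane count exactly.  */
    uint64_t
    uniform_v8_mask (int elt)
    {
      uint8_t narrow = (uint8_t) -(int8_t) elt;  /* all 8 lane bits, or none */
      return (uint64_t) narrow;                  /* zero-extend: bits 8..63 stay clear */
    }

(The sketch cheats by picking a lane count that matches uint8_t exactly.)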
This patch does have the disadvantage of an additional "and" instruction
in the non-constant case even for machines that don't need it. I'm not
sure how to fix that without an additional target hook. (If GCC could
use the 64-lane vectors more effectively without the assistance of
artificially reduced sizes then this problem wouldn't exist.)
OK to commit?
Andrew
vect: Don't set excess bits in uniform masks
AVX ignores any excess bits in the mask, but AMD GCN magically uses a larger
vector than was intended (the smaller sizes are "fake"), leading to wrong code.
gcc/ChangeLog:
* expr.cc (store_constructor): Add "and" operation to uniform mask
generation.
diff --git a/gcc/expr.cc b/gcc/expr.cc
index 4220cbd9f8f..fb4609f616e 100644
--- a/gcc/expr.cc
+++ b/gcc/expr.cc
@@ -7440,7 +7440,7 @@ store_constructor (tree exp, rtx target, int cleared, poly_int64 size,
break;
}
/* Use sign-extension for uniform boolean vectors with
- integer modes. */
+ integer modes. Effectively "vec_duplicate" for bitmasks. */
if (!TREE_SIDE_EFFECTS (exp)
&& VECTOR_BOOLEAN_TYPE_P (type)
&& SCALAR_INT_MODE_P (mode)
@@ -7449,7 +7449,22 @@ store_constructor (tree exp, rtx target, int cleared, poly_int64 size,
{
rtx op0 = force_reg (TYPE_MODE (TREE_TYPE (elt)),
expand_normal (elt));
- convert_move (target, op0, 0);
+    rtx tmp = gen_reg_rtx (mode);
+    convert_move (tmp, op0, 0);
+
+    if (known_ne (TYPE_VECTOR_SUBPARTS (type),
+                  GET_MODE_PRECISION (mode)))
+      {
+        /* Ensure no excess bits are set.
+           GCN needs this, AVX does not.  */
+        tmp = expand_binop (mode, and_optab, tmp,
+                            GEN_INT ((HOST_WIDE_INT_1U
+                                      << TYPE_VECTOR_SUBPARTS (type)
+                                         .to_constant ()) - 1),
+                            target, true, OPTAB_DIRECT);
+      }
+    if (tmp != target)
+      emit_move_insn (target, tmp);
break;
}