https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86753

--- Comment #8 from rsandifo at gcc dot gnu.org <rsandifo at gcc dot gnu.org> ---
To give a few more details:

void
f1 (int *x, int *y, int *z)
{
  for (int i = 0; i < 100; ++i)
    x[i] = y[i] ? z[i] : 10;
}

produces:

        ptrue   p3.s, all
        ...
.L3:
        ld1w    z0.s, p2/z, [x1, x3, lsl 2]
        cmpne   p1.s, p3/z, z0.s, #0
        cmpne   p0.s, p2/z, z0.s, #0
        ld1w    z0.s, p0/z, [x2, x3, lsl 2]
        sel     z0.s, p1, z0.s, z1.s
        st1w    z0.s, p2, [x0, x3, lsl 2]
        incw    x3
        whilelo p2.s, w3, w4
        b.any   .L3

where the select (SEL) is fed by an effectively unpredicated comparison
(p1, governed by the all-true p3) while the masked load is (and in
general needs to be, although not in this case) fed by a comparison
predicated on the loop mask (p0, governed by p2).

The gimple output of the vectoriser is:

  mask__35.10_54 = vect__4.9_52 != vect_cst__53;
  vec_mask_and_57 = mask__35.10_54 & loop_mask_51;
  vect_iftmp.13_58 = .MASK_LOAD (vectp_z.11_55, 4B, vec_mask_and_57);
  vect_iftmp.14_61 = VEC_COND_EXPR <vect__4.9_52 != vect_cst__59, vect_iftmp.13_58, vect_cst__60>;

I think when vectorising VEC_COND_EXPR <C, T, E>, we should check
whether T is ultimately conditional on C, either via a masked load
or a conditional internal function.  If so, we should reuse the same
condition for the VEC_COND_EXPR too (which for SVE means using the
condition with the loop mask applied).
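A rough C sketch of the first approach, under loud assumptions: the
struct below is a hypothetical, heavily simplified stand-in for a
vectorised definition (not GCC's internal representation), conditions
are compared as scalar-condition strings, and only the direct
definition of T is inspected rather than following longer chains.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, heavily simplified model of a vectorised definition;
   not GCC's internal representation.  */
enum def_kind { DEF_OTHER, DEF_MASK_LOAD, DEF_COND_IFN };

struct vec_def
{
  enum def_kind kind;
  /* For DEF_MASK_LOAD and DEF_COND_IFN: the scalar condition C that the
     mask was built from, and the mask actually used (C & loop_mask).  */
  const char *scalar_cond;
  const char *vec_mask;
};

/* If THEN_DEF (the T of VEC_COND_EXPR <C, T, E>) is a masked load or
   conditional internal function guarded by COND, return the mask it
   used so that the VEC_COND_EXPR can reuse it.  Otherwise return NULL,
   and the caller keeps the unpredicated comparison.  Only the direct
   definition is checked; longer chains are not followed.  */
static const char *
reusable_mask (const struct vec_def *then_def, const char *cond)
{
  assert (then_def != NULL && cond != NULL);
  if ((then_def->kind == DEF_MASK_LOAD || then_def->kind == DEF_COND_IFN)
      && strcmp (then_def->scalar_cond, cond) == 0)
    return then_def->vec_mask;
  return NULL;
}
```

For f1, a masked-load definition recorded with the hypothetical scalar
condition "_4 != 0" and mask "vec_mask_and_57" would make
reusable_mask return "vec_mask_and_57" for that condition.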

Alternatively, we could just keep a hash of available vector conditions
that have been used in masked loads or conditional internal functions,
and try to reuse them when vectorising VEC_COND_EXPRs (without checking
T and E specifically).  That might be simpler.
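The hash-based alternative could look roughly like the following
sketch.  Everything here is hypothetical: a small fixed-size
open-addressing table (FNV-1a hash, linear probing) keyed by the scalar
condition as text, mapping to the name of the loop-masked condition.
Masked loads and conditional internal functions would call
cond_table_record; VEC_COND_EXPR vectorisation would call
cond_table_lookup and fall back to an unpredicated comparison on NULL.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define COND_TABLE_SIZE 64 /* hypothetical capacity; must be a power of two */

struct cond_entry
{
  const char *cond;  /* scalar condition, e.g. "_4 != 0"; NULL if empty */
  const char *mask;  /* loop-masked condition, e.g. "vec_mask_and_57" */
};

struct cond_table
{
  struct cond_entry slots[COND_TABLE_SIZE];
};

/* 32-bit FNV-1a hash of a NUL-terminated string.  */
static uint32_t
hash_cond (const char *s)
{
  uint32_t h = 2166136261u;
  for (; *s; ++s)
    {
      h ^= (unsigned char) *s;
      h *= 16777619u;
    }
  return h;
}

/* Record that MASK (COND & loop_mask) is available.  Returns 0 on
   success or -1 if the table is full.  */
static int
cond_table_record (struct cond_table *t, const char *cond, const char *mask)
{
  assert (cond != NULL && mask != NULL);
  uint32_t i = hash_cond (cond) & (COND_TABLE_SIZE - 1);
  for (unsigned n = 0; n < COND_TABLE_SIZE;
       ++n, i = (i + 1) & (COND_TABLE_SIZE - 1))
    if (t->slots[i].cond == NULL || strcmp (t->slots[i].cond, cond) == 0)
      {
        t->slots[i].cond = cond;
        t->slots[i].mask = mask;
        return 0;
      }
  return -1;
}

/* Return the recorded mask for COND, or NULL if none is available.  */
static const char *
cond_table_lookup (const struct cond_table *t, const char *cond)
{
  uint32_t i = hash_cond (cond) & (COND_TABLE_SIZE - 1);
  for (unsigned n = 0; n < COND_TABLE_SIZE;
       ++n, i = (i + 1) & (COND_TABLE_SIZE - 1))
    {
      if (t->slots[i].cond == NULL)
        return NULL;
      if (strcmp (t->slots[i].cond, cond) == 0)
        return t->slots[i].mask;
    }
  return NULL;
}
```

Keying on the scalar condition rather than on the vector SSA names
matters because, in the dump above, the load's mask compares against
vect_cst__53 while the VEC_COND_EXPR compares against vect_cst__59,
even though both are the same constant.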

Either way, the end goal would be to use vec_mask_and_57 in the
VEC_COND_EXPR.
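To see why using vec_mask_and_57 is safe, here is a scalar C emulation
of one iteration of the f1 vector loop.  VL (the number of 32-bit lanes)
is a hypothetical fixed value, and the loop mask is passed in
explicitly.  The select only matters on lanes that the st1w stores,
and on those lanes the loop mask is set, so C and C & loop_mask agree.

```c
#include <assert.h>
#include <stdbool.h>

#define VL 8 /* hypothetical number of 32-bit lanes per vector */

/* Emulate one iteration of the f1 loop body on VL lanes.  USE_AND_MASK
   selects whether the SEL uses the comparison alone (current code) or
   the comparison ANDed with the loop mask (vec_mask_and_57).  */
static void
f1_vector_step (int *x, const int *y, const int *z,
                const bool *loop_mask, bool use_and_mask)
{
  assert (x && y && z && loop_mask);
  int result[VL];
  for (int lane = 0; lane < VL; ++lane)
    {
      /* ld1w ... p2/z: inactive lanes of y read as zero.  */
      int y_val = loop_mask[lane] ? y[lane] : 0;
      bool cond = y_val != 0;                   /* comparison alone */
      bool cond_and = cond && loop_mask[lane];  /* vec_mask_and */
      /* .MASK_LOAD: z is only read where cond_and is set.  */
      int loaded = cond_and ? z[lane] : 0;
      bool sel_mask = use_and_mask ? cond_and : cond;
      result[lane] = sel_mask ? loaded : 10;
    }
  for (int lane = 0; lane < VL; ++lane)
    if (loop_mask[lane])                        /* st1w ... p2 */
      x[lane] = result[lane];
}
```

Calling it once with use_and_mask false and once with true on the same
inputs should leave identical contents in x.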

For:

  void
  f2 (int *x, int *y, int *z, int fallback)
  {
    for (int i = 0; i < 100; ++i)
      x[i] = y[i] ? z[i] : fallback;
  }

we instead put the masked load in the "else" of the VEC_COND_EXPR:

  mask__36.49_55 = vect__4.48_53 != vect_cst__54;
  vec_mask_and_58 = mask__36.49_55 & loop_mask_52;
  vect_iftmp.52_59 = .MASK_LOAD (vectp_z.50_56, 4B, vec_mask_and_58);
  vect_iftmp.53_62 = VEC_COND_EXPR <vect__4.48_53 == vect_cst__60, vect_cst__61, vect_iftmp.52_59>;

so we'd need to check that case too, and invert the VEC_COND_EXPR
condition where necessary.
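A minimal sketch of the inversion check, assuming integer comparisons
only (for floating point, inverting LT is not GE once NaNs are
involved) and assuming operands are compared by their scalar names
("_4", "0") rather than by vector SSA names.  The enum, struct and
function names are hypothetical.  When the VEC_COND_EXPR condition is
the inverse of the masked condition, the arms are swapped so that the
masked condition can be used directly.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical integer comparison codes.  */
enum cmp_code { CMP_EQ, CMP_NE, CMP_LT, CMP_GE, CMP_GT, CMP_LE };

struct cmp
{
  enum cmp_code code;
  const char *op0, *op1;  /* scalar operands, e.g. "_4" and "0" */
};

/* Logical inverse of an integer comparison.  */
static enum cmp_code
invert_cmp (enum cmp_code c)
{
  switch (c)
    {
    case CMP_EQ: return CMP_NE;
    case CMP_NE: return CMP_EQ;
    case CMP_LT: return CMP_GE;
    case CMP_GE: return CMP_LT;
    case CMP_GT: return CMP_LE;
    case CMP_LE: return CMP_GT;
    }
  assert (!"unknown comparison code");
  return c;
}

static bool
same_operands (const struct cmp *a, const struct cmp *b)
{
  return strcmp (a->op0, b->op0) == 0 && strcmp (a->op1, b->op1) == 0;
}

/* Return true if VEC_COND_EXPR <COND, *THEN_VAL, *ELSE_VAL> can use
   the mask built from MASKED_COND & loop_mask instead of COND.  If
   COND is the inverse of MASKED_COND, swap *THEN_VAL and *ELSE_VAL.  */
static bool
reuse_mask_for_vec_cond (const struct cmp *cond, const struct cmp *masked_cond,
                         const char **then_val, const char **else_val)
{
  if (!same_operands (cond, masked_cond))
    return false;
  if (cond->code == masked_cond->code)
    return true;
  if (invert_cmp (cond->code) == masked_cond->code)
    {
      const char *tmp = *then_val;
      *then_val = *else_val;
      *else_val = tmp;
      return true;
    }
  return false;
}
```

For f2, <_4 == 0, vect_cst__61, vect_iftmp.52_59> matched against the
load's _4 != 0 would become a select on vec_mask_and_58 with the arms
swapped: vect_iftmp.52_59 when true, vect_cst__61 when false.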
