[Bug tree-optimization/54013] Loop with control flow not vectorized

alalaw01 at gcc dot gnu.org Mon, 15 Jun 2015 09:43:14 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54013


alalaw01 at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2015-06-15
                 CC|                            |alalaw01 at gcc dot gnu.org
     Ever confirmed|0                           |1

--- Comment #1 from alalaw01 at gcc dot gnu.org ---
Indeed it does (confirmed).

So there are a few tricks here, but they are not Intel-specific, and don't even
seem to require new tree codes. The loop body can be vectorized by computing
the (x < tab[i]) predicate across the vector, and then using a reduction opcode
(a bitwise-OR reduction would be most natural, but others work) to convert the
result to a scalar, which then jumps out of the loop if *any* of the lanes in
the vector would exit:

int foo (float x, float *tab)
{
  int i;
  for (i = 2; i < 45; i += 4)
    {
      v4sf v_tab = ...load from tab...
      unsigned v4si v_exit_cond = vec_cond_expr({x,x,x,x} < v_tab, -1, 0);
      if (reduc_max_expr (v_exit_cond)) break;
    }
  ...
}

The epilogue must then work out the value of i at exit (possibly with separate
epilogues for the "break" exit and the normal loop exit). I see two schemes:

(1) use vec_pack_trunc_expr, or similar, to narrow v_exit_cond down to a
scalar, where we can find the first set bit, and use this as an index to add to
the value still in i.

(2) compute a vector of the value i would have had if each element had been the
one that exited:

v4si v_i_on_exit = vec_cond_expr (v_exit_cond,
    {i, i+1, i+2, i+3}, /* Maybe available as induction variable?  */
    {INT_MAX, INT_MAX, INT_MAX, INT_MAX});

and then take a reduc_min_expr to look for the *first* value of i that exits.

(There is one more issue: we need to speculate the read of tab[i+1...i+3], as
the vector load will probably read all the lanes before we know whether earlier
iterations should have exited. So we'd need some kind of check against that, or
to know that the reads are safe, e.g. because tab[] is a global with known
bounds. Similar (and possibly more complicated) conditions apply to everything
else in the loop, too!)
