https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54013
alalaw01 at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2015-06-15
                 CC|                            |alalaw01 at gcc dot gnu.org
     Ever confirmed|0                           |1

--- Comment #1 from alalaw01 at gcc dot gnu.org ---
Indeed it does (confirmed).

So there are a few tricks here, but they are not Intel-specific, and don't even
look to require new tree codes.

The loop body can be vectorized by computing the (x < tab[i]) predicate across
the vector, and then using a reduction opcode (a bitwise-or reduction would be
most natural, but others work) to convert it to a scalar, which then jumps out
of the loop, i.e. if *any* of the lanes in the vector would exit:

int foo (float x, float *tab)
{
  for (i = 2; i < 45; i += 4)
    {
      v4sf v_tab = ...load from tab...
      unsigned v4si v_exit_cond = vec_cond_expr ({x,x,x,x} < v_tab, -1, 0);
      if (reduc_max_expr (v_exit_cond))
        break;
    }
  ...
}

The epilogue must then work out the value of i at exit (possibly with a
separate epilogue for the "break" vs the other exit). I see two schemes:

(1) use vec_pack_trunc_expr, or similar, to narrow v_exit_cond down to a
scalar, where we can find the first set bit, and use this as an index to add to
the value still in i.

(2) compute a vector of the value i would have had if each element had been the
one that exited:

  v4si v_i_on_exit = vec_cond_expr (v_exit_cond,
      {i, i+1, i+2, i+3}, /* Maybe available as induction variable? */
      {MAX_INT, MAX_INT, MAX_INT, MAX_INT})

and then take a reduc_min_expr to look for the *first* value of i that exits.

(There is one more issue: we need to speculate the read of tab[i+1...i+3], as
the vector load will probably read all the lanes before we know whether earlier
iterations should have exited. So we'd need some kind of check against that,
or e.g. tab[] would have to be a global with known bounds. Similar/complicated
conditions apply to any/everything else in the loop, too!)
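
For illustration only, here is a hand-written sketch of scheme (2) using the GCC/Clang vector extensions at the source level rather than GIMPLE tree codes. Several things in it are assumptions and not taken from the bug report: the scalar loop shape in foo_scalar, the names foo_scalar, foo_vector, START, END and VF, the scalar tail loop for the last few elements, and the lane-by-lane loops standing in for reduc_max_expr and reduc_min_expr. It also simply assumes that tab[0..44] is readable, which sidesteps the speculative-load problem instead of solving it.

```c
#include <limits.h>
#include <string.h>

/* GCC/Clang vector extensions: four 32-bit lanes per vector. */
typedef float v4sf __attribute__ ((vector_size (16)));
typedef int   v4si __attribute__ ((vector_size (16)));

/* Hypothetical names for the loop bounds and vectorization factor. */
enum { START = 2, END = 45, VF = 4 };

/* Assumed scalar form of the loop: return the first i with x < tab[i],
   or END if there is none. */
static int
foo_scalar (float x, const float *tab)
{
  int i;
  for (i = START; i < END; i++)
    if (x < tab[i])
      break;
  return i;
}

/* Hand-vectorized sketch of scheme (2).
   Assumption: tab[0..END-1] is readable, so the speculative reads of
   lanes after the exiting one cannot fault. */
static int
foo_vector (float x, const float *tab)
{
  const v4sf v_x = { x, x, x, x };
  const v4si v_max = { INT_MAX, INT_MAX, INT_MAX, INT_MAX };
  int i;

  for (i = START; i + VF <= END; i += VF)
    {
      v4sf v_tab;
      memcpy (&v_tab, tab + i, sizeof v_tab);   /* unaligned vector load */

      /* Each lane is -1 where x < tab[i + lane], and 0 otherwise. */
      v4si v_exit_cond = v_x < v_tab;

      /* Bitwise-or reduction: nonzero if any lane wants to exit. */
      if (v_exit_cond[0] | v_exit_cond[1] | v_exit_cond[2] | v_exit_cond[3])
        {
          /* Value i would have had if each lane had been the one to exit;
             non-exiting lanes are replaced by INT_MAX. */
          v4si v_i = { i, i + 1, i + 2, i + 3 };
          v4si v_i_on_exit = (v_exit_cond & v_i) | (~v_exit_cond & v_max);

          /* Min reduction picks the first exiting iteration. */
          int first = v_i_on_exit[0];
          for (int lane = 1; lane < VF; lane++)
            if (v_i_on_exit[lane] < first)
              first = v_i_on_exit[lane];
          return first;
        }
    }

  /* Scalar tail for the iterations that do not fill a whole vector. */
  for (; i < END; i++)
    if (x < tab[i])
      break;
  return i;
}
```

Calling foo_vector and foo_scalar with the same x and the same 45-element table should give the same result. The select is written with bitwise AND/OR on the mask because the vector form of the ?: operator is not available in every GCC C front end.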