https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116352

--- Comment #19 from Richard Biener <rguenth at gcc dot gnu.org> ---
First of all we end up with weird

t.c:4:12: note: op: VEC_PERM_EXPR
t.c:4:12: note:         [l] stmt 0 center_x_317 = _316 * _stepX_14(D);
t.c:4:12: note:         [l] stmt 1 center_y_320 = _319 * _stepY_17(D);
t.c:4:12: note:         [l] stmt 2 center_x_317 = _316 * _stepX_14(D);
t.c:4:12: note:         [l] stmt 3 center_y_320 = _319 * _stepY_17(D);
t.c:4:12: note:         lane permutation { 0[0] 0[2] 0[0] 0[2] }
t.c:4:12: note:         children 0x4d3c8c0 0x4d3c8c0
t.c:4:12: note: node (external) 0x4d3c8c0 (max_nunits=1, refcnt=3) vector(4) float
t.c:4:12: note:         { center_x_317, _68, center_y_320, _69 }
t.c:4:12: note:   node 0x4d3c9e0 (max_nunits=1, refcnt=2) vector(4) float
t.c:4:12: note:   op: VEC_PERM_EXPR
t.c:4:12: note:         stmt 0 _68 = _boxWidth_20(D) * 5.0e-1;
t.c:4:12: note:         stmt 1 _69 = _boxHeight_21(D) * 5.0e-1;
t.c:4:12: note:         stmt 2 _68 = _boxWidth_20(D) * 5.0e-1; 
t.c:4:12: note:         stmt 3 _69 = _boxHeight_21(D) * 5.0e-1;
t.c:4:12: note:         lane permutation { 0[1] 0[3] 0[1] 0[3] }
t.c:4:12: note:         children 0x4d3c8c0 0x4d3c8c0

That's "weird" because permuting an external is suboptimal.  We do

t.c:4:12: note:   Replace two_operators operands:
t.c:4:12: note:   Operand 0:
t.c:4:12: note:         stmt 0 center_x_417 = _416 * _stepX_14(D);
t.c:4:12: note:         stmt 1 center_y_420 = _419 * _stepY_17(D);
t.c:4:12: note:         stmt 2 center_x_417 = _416 * _stepX_14(D);
t.c:4:12: note:         stmt 3 center_y_420 = _419 * _stepY_17(D);
t.c:4:12: note:   Operand 1: 
t.c:4:12: note:         stmt 0 _68 = _boxWidth_20(D) * 5.0e-1; 
t.c:4:12: note:         stmt 1 _69 = _boxHeight_21(D) * 5.0e-1;
t.c:4:12: note:         stmt 2 _68 = _boxWidth_20(D) * 5.0e-1;
t.c:4:12: note:         stmt 3 _69 = _boxHeight_21(D) * 5.0e-1;
t.c:4:12: note:   With a single operand:
t.c:4:12: note:         stmt 0 center_x_417 = _416 * _stepX_14(D);
t.c:4:12: note:         stmt 1 _68 = _boxWidth_20(D) * 5.0e-1;
t.c:4:12: note:         stmt 2 center_y_420 = _419 * _stepY_17(D);
t.c:4:12: note:         stmt 3 _69 = _boxHeight_21(D) * 5.0e-1;

It looks like we fail discovery here and fall back to externs but still
end up generating the permutes.  Looks like the has_two_operators_perm
code does not backtrack after all?

Anyway, we're using get_later_stmt all over the place, which assumes defs
are in the same basic block: both in vect_create_constant_vectors, for
the case where we have former internal defs, and when scheduling via
vect_find_first/last_scalar_stmt_in_slp.  If we allowed different BBs
we'd have to ensure we can order the defs, or at least the insert locations,
and we do not yet verify that we can schedule (at the moment we'd only
find that out during transform).
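
To illustrate the assumption, here is a minimal toy model of a uid-based
"later statement" helper.  ToyStmt, its fields, and toy_get_later are
hypothetical names for illustration only and are not GCC's gimple or
stmt_vec_info types; the sketch just shows that comparing uids says nothing
about ordering unless both statements sit in the same block.

```cpp
#include <cassert>

// Toy stand-in for a statement; NOT GCC's gimple or stmt_vec_info.
struct ToyStmt {
  int bb;            // hypothetical basic-block index
  unsigned uid;      // position of the statement, used for ordering
  const char *name;  // label for readability only
};

// Returns the later of two statements, with the same shape as get_later_stmt:
// the first argument wins only if its uid is strictly greater.
// Precondition (asserted): both statements are in the same block, because
// a uid comparison is only assumed to be meaningful within one block.
inline const ToyStmt &toy_get_later(const ToyStmt &a, const ToyStmt &b) {
  assert(a.bb == b.bb && "uid ordering is only meaningful within one block");
  return a.uid > b.uid ? a : b;
}
```

With two statements in the same block the helper picks the one with the
larger uid; with statements from different blocks the assertion fires instead
of silently returning an arbitrary answer.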

So the fix is to restrict both SLP build (and thereby extern promotion) and
the new "mixing" of two operands to defs from a single BB.
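
As a rough sketch of what such a restriction amounts to (this is not the
actual patch; ToyDef and toy_defs_in_single_bb are hypothetical names, and
the block index is a stand-in for a real basic-block pointer), the check is
that every def in a group comes from one block:

```cpp
#include <vector>

// Toy stand-in for a def; only records which block it lives in.
struct ToyDef {
  int bb;  // hypothetical basic-block index
};

// True when every def in the group lives in the same block.
// An empty group trivially satisfies the restriction.
inline bool toy_defs_in_single_bb(const std::vector<ToyDef> &defs) {
  for (const ToyDef &d : defs)
    if (d.bb != defs.front().bb)
      return false;
  return true;
}
```

A group that fails this check would simply not be promoted or mixed in this
toy model, which avoids having to order defs across blocks at all.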

I won't pursue a more complex solution unless the former causes regressions
(we'd possibly be fine for strictly orderable defs).

That said ...

diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
index 7f69a3f57b4..bca3e31d6eb 100644
--- a/gcc/tree-vectorizer.h
+++ b/gcc/tree-vectorizer.h
@@ -1854,8 +1854,10 @@ vect_orig_stmt (stmt_vec_info stmt_info)
 inline stmt_vec_info
 get_later_stmt (stmt_vec_info stmt1_info, stmt_vec_info stmt2_info)
 {
-  if (gimple_uid (vect_orig_stmt (stmt1_info)->stmt)
-      > gimple_uid (vect_orig_stmt (stmt2_info)->stmt))
+  gimple *stmt1 = vect_orig_stmt (stmt1_info)->stmt;
+  gimple *stmt2 = vect_orig_stmt (stmt2_info)->stmt;
+  gcc_assert (gimple_bb (stmt1) == gimple_bb (stmt2));
+  if (gimple_uid (stmt1) > gimple_uid (stmt2))
     return stmt1_info;
   else
     return stmt2_info;
