https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116352
--- Comment #19 from Richard Biener <rguenth at gcc dot gnu.org> ---
First of all we end up with weird

t.c:4:12: note: op: VEC_PERM_EXPR
t.c:4:12: note: [l] stmt 0 center_x_317 = _316 * _stepX_14(D);
t.c:4:12: note: [l] stmt 1 center_y_320 = _319 * _stepY_17(D);
t.c:4:12: note: [l] stmt 2 center_x_317 = _316 * _stepX_14(D);
t.c:4:12: note: [l] stmt 3 center_y_320 = _319 * _stepY_17(D);
t.c:4:12: note: lane permutation { 0[0] 0[2] 0[0] 0[2] }
t.c:4:12: note: children 0x4d3c8c0 0x4d3c8c0
t.c:4:12: note: node (external) 0x4d3c8c0 (max_nunits=1, refcnt=3) vector(4) float
t.c:4:12: note: { center_x_317, _68, center_y_320, _69 }
t.c:4:12: note: node 0x4d3c9e0 (max_nunits=1, refcnt=2) vector(4) float
t.c:4:12: note: op: VEC_PERM_EXPR
t.c:4:12: note: stmt 0 _68 = _boxWidth_20(D) * 5.0e-1;
t.c:4:12: note: stmt 1 _69 = _boxHeight_21(D) * 5.0e-1;
t.c:4:12: note: stmt 2 _68 = _boxWidth_20(D) * 5.0e-1;
t.c:4:12: note: stmt 3 _69 = _boxHeight_21(D) * 5.0e-1;
t.c:4:12: note: lane permutation { 0[1] 0[3] 0[1] 0[3] }
t.c:4:12: note: children 0x4d3c8c0 0x4d3c8c0

that's "weird" because permuting an external isn't very optimal.
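
To make the dump above easier to read, here is a minimal sketch of how a lane permutation entry of the form child[lane] selects lanes from the node's children. The names toy_apply_lane_permutation, Lanes and LanePerm are hypothetical and are not part of GCC; the sketch only models the notation, under the assumption that each entry picks one lane of one child.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical model: a node's lanes are just the names of the scalar defs.
using Lanes = std::vector<std::string>;
// Each entry is (child index, lane index), printed as child[lane] in the dump.
using LanePerm = std::vector<std::pair<unsigned, unsigned>>;

// Build the permuted lanes by picking, for every output lane, one lane
// of one child.  This is an illustration only, not vectorizer code.
inline Lanes
toy_apply_lane_permutation (const std::vector<Lanes> &children,
                            const LanePerm &perm)
{
  Lanes result;
  for (const auto &entry : perm)
    {
      assert (entry.first < children.size ());
      assert (entry.second < children[entry.first].size ());
      result.push_back (children[entry.first][entry.second]);
    }
  return result;
}
```

Applied to the external node { center_x_317, _68, center_y_320, _69 }, the permutation { 0[0] 0[2] 0[0] 0[2] } selects center_x, center_y, center_x, center_y and { 0[1] 0[3] 0[1] 0[3] } selects _68, _69, _68, _69, which matches the stmt lists of the two VEC_PERM_EXPR nodes.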

We do

t.c:4:12: note: Replace two_operators operands:
t.c:4:12: note: Operand 0:
t.c:4:12: note: stmt 0 center_x_417 = _416 * _stepX_14(D);
t.c:4:12: note: stmt 1 center_y_420 = _419 * _stepY_17(D);
t.c:4:12: note: stmt 2 center_x_417 = _416 * _stepX_14(D);
t.c:4:12: note: stmt 3 center_y_420 = _419 * _stepY_17(D);
t.c:4:12: note: Operand 1:
t.c:4:12: note: stmt 0 _68 = _boxWidth_20(D) * 5.0e-1;
t.c:4:12: note: stmt 1 _69 = _boxHeight_21(D) * 5.0e-1;
t.c:4:12: note: stmt 2 _68 = _boxWidth_20(D) * 5.0e-1;
t.c:4:12: note: stmt 3 _69 = _boxHeight_21(D) * 5.0e-1;
t.c:4:12: note: With a single operand:
t.c:4:12: note: stmt 0 center_x_417 = _416 * _stepX_14(D);
t.c:4:12: note: stmt 1 _68 = _boxWidth_20(D) * 5.0e-1;
t.c:4:12: note: stmt 2 center_y_420 = _419 * _stepY_17(D);
t.c:4:12: note: stmt 3 _69 = _boxHeight_21(D) * 5.0e-1;

it looks like we fail discovery here and fall back to externs but still
end up generating the permutes. Looks like the has_two_operators_perm
code will not backtrack after all?

Anyway, we're using get_later_stmt all over the place which assumes
defs are in the same basic-block. Both in vect_create_constant_vectors
for the case we have former internal-defs but also when scheduling via
vect_find_first/last_scalar_stmt_in_slp. When we allow different BBs
we'd have to ensure we can order defs or at least insert locations which
we do not yet verify (that we can schedule - we'd only figure during
transform at the moment).

So the fix is to restrict both SLP build (thereby extern promotion) and
the new "mixing" of two operands to honor a single BB def. I'll not
pursue a more complex solution unless the former causes regressions
(we're eventually fine for strictly orderable defs).

That said ...

diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
index 7f69a3f57b4..bca3e31d6eb 100644
--- a/gcc/tree-vectorizer.h
+++ b/gcc/tree-vectorizer.h
@@ -1854,8 +1854,10 @@ vect_orig_stmt (stmt_vec_info stmt_info)
 inline stmt_vec_info
 get_later_stmt (stmt_vec_info stmt1_info, stmt_vec_info stmt2_info)
 {
-  if (gimple_uid (vect_orig_stmt (stmt1_info)->stmt)
-      > gimple_uid (vect_orig_stmt (stmt2_info)->stmt))
+  gimple *stmt1 = vect_orig_stmt (stmt1_info)->stmt;
+  gimple *stmt2 = vect_orig_stmt (stmt2_info)->stmt;
+  gcc_assert (gimple_bb (stmt1) == gimple_bb (stmt2));
+  if (gimple_uid (stmt1) > gimple_uid (stmt2))
     return stmt1_info;
   else
     return stmt2_info;
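
For illustration only, the single-BB restriction could be modelled outside of GCC roughly as follows. ToyBlock, ToyStmt, toy_get_later_stmt and toy_defs_in_single_block are hypothetical stand-ins, not the real basic_block, gimple or vectorizer APIs. The sketch assumes, as the comment above does, that comparing UIDs is only meaningful for statements in the same basic block.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-ins for a basic block and a statement.
struct ToyBlock { int index; };

struct ToyStmt
{
  const ToyBlock *bb;  // containing block
  unsigned uid;        // assumed comparable only within one block
};

// Mirrors the shape of the patched get_later_stmt: refuse to order
// statements from different blocks, then compare UIDs.  On equal UIDs
// the second argument is returned, as in the original.
inline const ToyStmt *
toy_get_later_stmt (const ToyStmt *a, const ToyStmt *b)
{
  assert (a->bb == b->bb);
  return a->uid > b->uid ? a : b;
}

// The kind of check a builder could run up front so the assertion above
// never fires: accept a group of defs only if all share one block.
inline bool
toy_defs_in_single_block (const std::vector<const ToyStmt *> &defs)
{
  for (const ToyStmt *def : defs)
    if (def->bb != defs.front ()->bb)
      return false;
  return true;
}
```

The point of the second helper is that rejecting mixed-block groups during build is cheaper than discovering at transform time that no valid insert location exists.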