When we decide not to process an association chain of size two, and
that chain would also mismatch a different chain size on another lane,
we shouldn't hard-fail discovery at this point.  Instead, let regular
discovery figure out the matching lanes, so the parent can decide to
perform operand swapping, or so we can split groups at better points
rather than forcefully splitting away the first single lane.
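
As a rough illustration, the size-two branch before and after this change can be modelled in isolation as below.  This is only a sketch, not code from tree-vect-slp.cc: BranchState, initial_state, size_two_before and size_two_after are hypothetical names, and the starting state (hard_fail set, every lane still matching) is an assumption about what the branch sees on entry.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the two pieces of state the size-two branch
// touches: the hard_fail flag and the per-lane matches[] hints.
struct BranchState {
  bool hard_fail;
  std::vector<bool> matches;
};

// Assumed state on entry: hard_fail set, every lane still matching.
BranchState initial_state(std::size_t group_size) {
  return BranchState{true, std::vector<bool>(group_size, true)};
}

// Size-two branch as it read before the change.  chain_len is the chain
// length recorded from earlier lanes (0 if none was recorded yet).
// Requires lane < group_size.
BranchState size_two_before(unsigned chain_len, std::size_t lane,
                            std::size_t group_size) {
  BranchState s = initial_state(group_size);
  if (chain_len == 0)
    s.hard_fail = false;          // no mismatch yet: try regular discovery
  else {
    s.matches[lane] = false;      // length mismatch: hard failure stays set
    if (lane != group_size - 1)
      s.matches[0] = false;       // later lanes were never processed
  }
  return s;
}

// Size-two branch after the change: never a hard failure, and matches[]
// is left for regular discovery to compute.
BranchState size_two_after(std::size_t group_size) {
  BranchState s = initial_state(group_size);
  s.hard_fail = false;
  return s;
}
```

With a recorded chain length of 3 and a size-two chain on lane 1 of 4, the old branch keeps hard_fail set and clears matches[1] and matches[0]; the new branch clears only hard_fail.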

For example on gcc.dg/vect/vect-strided-u8-i8.c we now see two
groups of size 4 feeding the store instead of groups of size
1, three, two, one and one.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

Richard.

        * tree-vect-slp.cc (vect_build_slp_tree_2): On reassociation
        chain length mismatch do not fail discovery of the node
        but try without re-associating to compute a better
        matches[].  Provide a reassociation failure hint in the dump.
        (vect_slp_analyze_node_operations): Avoid stray failure
        dumping.
        (vectorizable_slp_permutation_1): Dump the address of the
        SLP node representing the permutation.
---
 gcc/tree-vect-slp.cc | 29 ++++++++++++++---------------
 1 file changed, 14 insertions(+), 15 deletions(-)

diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index 0fb17340bd3..2c296bc1926 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -2143,19 +2143,11 @@ vect_build_slp_tree_2 (vec_info *vinfo, slp_tree node,
               if (chain.length () == 2)
                 {
                   /* In a chain of just two elements resort to the regular
-                     operand swapping scheme.  If we run into a length
-                     mismatch still hard-FAIL.  */
-                  if (chain_len == 0)
-                    hard_fail = false;
-                  else
-                    {
-                      matches[lane] = false;
-                      /* ???  We might want to process the other lanes, but
-                         make sure to not give false matching hints to the
-                         caller for lanes we did not process.  */
-                      if (lane != group_size - 1)
-                        matches[0] = false;
-                    }
+                     operand swapping scheme.  Likewise if we run into a
+                     length mismatch process regularly as well as we did not
+                     process the other lanes we cannot report a good hint what
+                     lanes to try swapping in the parent.  */
+                  hard_fail = false;
                   break;
                 }
               else if (chain_len == 0)
@@ -2428,6 +2420,11 @@ vect_build_slp_tree_2 (vec_info *vinfo, slp_tree node,
           return node;
         }
 out:
+      if (dump_enabled_p ())
+        dump_printf_loc (MSG_NOTE, vect_location,
+                         "failed to line up SLP graph by re-associating "
+                         "operations in lanes%s\n",
+                         !hard_fail ? " trying regular discovery" : "");
       while (!children.is_empty ())
         vect_free_slp_tree (children.pop ());
       while (!chains.is_empty ())
@@ -7553,7 +7550,9 @@ vect_slp_analyze_node_operations (vec_info *vinfo, slp_tree node,
   /* We're having difficulties scheduling nodes with just constant
      operands and no scalar stmts since we then cannot compute a stmt
      insertion place.  */
-  if (!seen_non_constant_child && SLP_TREE_SCALAR_STMTS (node).is_empty ())
+  if (res
+      && !seen_non_constant_child
+      && SLP_TREE_SCALAR_STMTS (node).is_empty ())
     {
       if (dump_enabled_p ())
         dump_printf_loc (MSG_NOTE, vect_location,
@@ -10279,7 +10278,7 @@ vectorizable_slp_permutation_1 (vec_info *vinfo, gimple_stmt_iterator *gsi,
       if (dump_p)
         {
           dump_printf_loc (MSG_NOTE, vect_location,
-                           "vectorizing permutation");
+                           "vectorizing permutation %p", (void *)node);
           for (unsigned i = 0; i < perm.length (); ++i)
             dump_printf (MSG_NOTE, " op%u[%u]", perm[i].first, perm[i].second);
           if (repeating_p)
-- 
2.43.0