https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117714
--- Comment #3 from Richard Biener <rguenth at gcc dot gnu.org> ---

So it seems that vect_optimize_slp_pass::change_layout_cost, when trying to
transition the following load from layout 1 to layout 0:

slp-reduc-4.c:13:5: note:   node 0x36cd760 (max_nunits=2, refcnt=1) vector(2) unsigned int
slp-reduc-4.c:13:5: note:   op template: _9 = uc[_8];
slp-reduc-4.c:13:5: note:       stmt 0 _9 = uc[_8];
slp-reduc-4.c:13:5: note:       stmt 1 _11 = uc[_10];
slp-reduc-4.c:13:5: note:       stmt 2 _16 = uc[_15];
slp-reduc-4.c:13:5: note:       stmt 3 _14 = uc[_13];
slp-reduc-4.c:13:5: note:       stmt 4 _5 = uc[_4];
slp-reduc-4.c:13:5: note:       stmt 5 _3 = uc[_2];
slp-reduc-4.c:13:5: note:       stmt 6 _7 = uc[_6];
slp-reduc-4.c:13:5: note:       stmt 7 _12 = uc[_1];
slp-reduc-4.c:13:5: note:       load permutation { 7 6 5 4 3 2 1 0 }

asks whether the target can do a { 7 6 5 4 3 2 1 0 } permute (which sparc
cannot do). But it's missing the fact that the permute would be merged with
the load permutation, cancelling that out?

In fact, with the following (not entirely sure the vect_slp_permute should be
forward...), we correctly reject layout 0 (given the load permutation isn't
supported) but accept layout 1 (no permute needed). We then still fail
because MAX_EXPR isn't supported (and then re-try single-lane SLP, which will
dump and make the testcase FAIL).
diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index 9ad95104ec7..f870206b585 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -6003,7 +6003,15 @@ vect_optimize_slp_pass::change_layout_cost (slp_tree node,
   auto_vec<slp_tree, 1> children (1);
   children.quick_push (node);
   auto_lane_permutation_t perm (SLP_TREE_LANES (node));
-  if (from_layout_i > 0)
+  if (SLP_TREE_LOAD_PERMUTATION (node).exists () && from_layout_i > 0)
+    {
+      auto_load_permutation_t tmp_perm;
+      tmp_perm.safe_splice (SLP_TREE_LOAD_PERMUTATION (node));
+      vect_slp_permute (m_perms[from_layout_i], tmp_perm, false);
+      for (unsigned int i : tmp_perm)
+        perm.quick_push ({ 0, i });
+    }
+  else if (from_layout_i > 0)
     for (unsigned int i : m_perms[from_layout_i])
       perm.quick_push ({ 0, i });
   else
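
To illustrate the cancellation, here is a minimal standalone sketch. It is not
GCC code: apply_forward and is_identity are hypothetical helpers.
apply_forward models what the forward (reverse == false) case of
vect_slp_permute is assumed to do, namely vec[i] = saved[perm[i]]. The sketch
also assumes that layout 1 is the lane-reversing permutation
{ 7 6 5 4 3 2 1 0 }, which the dump above does not show.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model of a forward permute: out[i] = vec[perm[i]].
// Assumed to mirror vect_slp_permute (perm, vec, /*reverse=*/false).
std::vector<unsigned> apply_forward(const std::vector<unsigned>& perm,
                                    const std::vector<unsigned>& vec) {
  assert(perm.size() == vec.size());
  std::vector<unsigned> out(vec.size());
  for (std::size_t i = 0; i < vec.size(); ++i)
    out[i] = vec[perm[i]];
  return out;
}

// True if p maps every lane to itself, i.e. no permute would be needed.
bool is_identity(const std::vector<unsigned>& p) {
  for (std::size_t i = 0; i < p.size(); ++i)
    if (p[i] != i)
      return false;
  return true;
}
```

Under those assumptions, applying the layout { 7 6 5 4 3 2 1 0 } to the load
permutation { 7 6 5 4 3 2 1 0 } yields { 0 1 2 3 4 5 6 7 }, so is_identity
returns true for the merged result, whereas the load permutation on its own is
not the identity. That matches the "no permute needed" outcome for layout 1.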