https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117714

--- Comment #3 from Richard Biener <rguenth at gcc dot gnu.org> ---
So it seems that vect_optimize_slp_pass::change_layout_cost when trying to
transition the following load from layout 1 to layout 0:

slp-reduc-4.c:13:5: note: node 0x36cd760 (max_nunits=2, refcnt=1) vector(2) unsigned int
slp-reduc-4.c:13:5: note: op template: _9 = uc[_8];
slp-reduc-4.c:13:5: note:     stmt 0 _9 = uc[_8];
slp-reduc-4.c:13:5: note:     stmt 1 _11 = uc[_10];
slp-reduc-4.c:13:5: note:     stmt 2 _16 = uc[_15];
slp-reduc-4.c:13:5: note:     stmt 3 _14 = uc[_13];
slp-reduc-4.c:13:5: note:     stmt 4 _5 = uc[_4];
slp-reduc-4.c:13:5: note:     stmt 5 _3 = uc[_2];
slp-reduc-4.c:13:5: note:     stmt 6 _7 = uc[_6];
slp-reduc-4.c:13:5: note:     stmt 7 _12 = uc[_1];
slp-reduc-4.c:13:5: note:     load permutation { 7 6 5 4 3 2 1 0 }

asks whether the target can do a { 7 6 5 4 3 2 1 0 } permute (which sparc
cannot do).  But it misses that this permute would be merged with the load
permutation, cancelling it out?

In fact, with the following (not entirely sure the vect_slp_permute call
should apply the permutation forward...), we correctly reject layout 0 (given
the load permutation isn't supported) but accept layout 1 (no permute needed).
We then still fail because MAX_EXPR isn't supported (and then re-try
single-lane SLP, which will dump and make the testcase FAIL).

diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index 9ad95104ec7..f870206b585 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -6003,7 +6003,15 @@ vect_optimize_slp_pass::change_layout_cost (slp_tree node,
   auto_vec<slp_tree, 1> children (1);
   children.quick_push (node);
   auto_lane_permutation_t perm (SLP_TREE_LANES (node));
-  if (from_layout_i > 0)
+  if (SLP_TREE_LOAD_PERMUTATION (node).exists () && from_layout_i > 0)
+    {
+      auto_load_permutation_t tmp_perm;
+      tmp_perm.safe_splice (SLP_TREE_LOAD_PERMUTATION (node));
+      vect_slp_permute (m_perms[from_layout_i], tmp_perm, false);
+      for (unsigned int i : tmp_perm)
+       perm.quick_push ({ 0, i });
+    }
+  else if (from_layout_i > 0)
     for (unsigned int i : m_perms[from_layout_i])
       perm.quick_push ({ 0, i });
   else
