https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98516

--- Comment #5 from Richard Biener <rguenth at gcc dot gnu.org> ---
Meh, it's very hard to spot the actual problem :/

diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c
index d8a2ceb0fa1..dee360307d0 100644
--- a/gcc/tree-vect-slp.c
+++ b/gcc/tree-vect-slp.c
@@ -5058,8 +5059,7 @@ vect_slp_region (vec<basic_block> bbs, vec<data_reference_p> datarefs,
        bb_vinfo->shared->check_datarefs ();
       bb_vinfo->vector_mode = next_vector_mode;

-      if (vect_slp_analyze_bb_1 (bb_vinfo, n_stmts, fatal, dataref_groups)
-         && dbg_cnt (vect_slp))
+      if (vect_slp_analyze_bb_1 (bb_vinfo, n_stmts, fatal, dataref_groups))
        {
          if (dump_enabled_p ())
            {
@@ -5090,6 +5090,9 @@ vect_slp_region (vec<basic_block> bbs, vec<data_reference_p> datarefs,
                  continue;
                }

+             if (!dbg_cnt (vect_slp))
+               continue;
+
              if (!vectorized && dump_enabled_p ())
                dump_printf_loc (MSG_NOTE, vect_location,
                                 "Basic block will be vectorized "

helps to narrow down the bogus vectorization: -fdbg-cnt=vect_slp:2:2 triggers
it, but the SLP region is still quite big.

diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c
index d8a2ceb0fa1..dee360307d0 100644
--- a/gcc/tree-vect-slp.c
+++ b/gcc/tree-vect-slp.c
@@ -3310,6 +3310,7 @@ vect_optimize_slp (vec_info *vinfo)
   auto_vec<int> leafs;
   vect_slp_build_vertices (vinfo, vertices, leafs);

+#if 0
   struct graph *slpg = new_graph (vertices.length ());
   FOR_EACH_VEC_ELT (vertices, i, node)
     {
@@ -3619,7 +3620,7 @@ vect_optimize_slp (vec_info *vinfo)
   while (!perms.is_empty ())
     perms.pop ().release ();
   free_graph (slpg);
-
+#endif

   /* Now elide load permutations that are not necessary.  */
   for (i = 0; i < leafs.length (); ++i)

avoids the miscompilation.  The key transform we're doing is
eliding load permutations that swap real/imag parts and instead
adjusting the lane permutation of a blend created for plus/minus ops,
which is where the bug is, I think.  We're changing

t.C:80:7: note: node 0x4204018 (max_nunits=2, refcnt=1)
t.C:80:7: note: op: VEC_PERM_EXPR
t.C:80:7: note:         stmt 0 _37 = _35 - _36;
t.C:80:7: note:         stmt 1 _34 = _32 + _33;
t.C:80:7: note:         lane permutation { 0[0] 1[1] }
t.C:80:7: note:         children 0x42045f0 0x4204678

to

t.C:80:7: note: node 0x4207018 (max_nunits=2, refcnt=1)
t.C:80:7: note: op: VEC_PERM_EXPR
t.C:80:7: note:         stmt 0 _37 = _35 - _36;
t.C:80:7: note:         stmt 1 _34 = _32 + _33;
t.C:80:7: note:         lane permutation { 1[1] 0[0] }
t.C:80:7: note:         children 0x42075f0 0x4207678

but that's not what is necessary.  We have permuted the lanes of the
children, so also swapping the lanes of the blend will not materialize
properly.  Instead we need to generate { 0[1] 1[0] }, I think.
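
To make the lane bookkeeping concrete, here is a toy C++ model of the
two-lane blend.  It is only a sketch and not GCC code: Vec2, LanePerm,
blend and swap_lanes are hypothetical names, and it assumes that
"c[l]" in the dump means "take lane l of child c" and that the result
lanes must stay in their original order.

```cpp
#include <array>
#include <cassert>
#include <utility>

// Toy model only, not GCC internals: a two-lane vector of values.
using Vec2 = std::array<int, 2>;
// One (child, lane) selector per output lane, mirroring the dump's
// "child[lane]" notation.
using LanePerm = std::array<std::pair<int, int>, 2>;

// Build the blend result by picking, for each output lane, the
// requested lane of the requested child.
Vec2 blend(const Vec2 &child0, const Vec2 &child1, const LanePerm &perm)
{
  const Vec2 *children[2] = { &child0, &child1 };
  Vec2 out{};
  for (int lane = 0; lane < 2; ++lane)
    out[lane] = (*children[perm[lane].first])[perm[lane].second];
  return out;
}

// Models eliding a load permutation that swaps real/imag parts:
// the child's lanes end up in swapped order.
Vec2 swap_lanes(const Vec2 &v)
{
  return { v[1], v[0] };
}

void demo()
{
  const Vec2 minus = { 10, 11 };  // child 0: lanes of the '-' op
  const Vec2 plus = { 20, 21 };   // child 1: lanes of the '+' op

  // Original node: { 0[0] 1[1] } on the unpermuted children.
  const Vec2 expected = blend(minus, plus, LanePerm{{ {0, 0}, {1, 1} }});

  const Vec2 minus_p = swap_lanes(minus);
  const Vec2 plus_p = swap_lanes(plus);

  // { 1[1] 0[0] } on the permuted children selects different values.
  assert(blend(minus_p, plus_p, LanePerm{{ {1, 1}, {0, 0} }}) != expected);
  // { 0[1] 1[0] } on the permuted children reproduces the original.
  assert(blend(minus_p, plus_p, LanePerm{{ {0, 1}, {1, 0} }}) == expected);
}
```

Under those assumptions, only the lane indices within each child need
to change; which child feeds which output lane stays the same.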

I'm trying to create a simpler C testcase now.
