https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92596

--- Comment #4 from Richard Biener <rguenth at gcc dot gnu.org> ---
There's also a missed optimization showing - we analyze the group_size == 3
case successfully but fail to consider splitting it as it fails the unroll
check because

      /* We consider breaking the group only on VF boundaries from the existing
         start.  */
      for (i = 0; i < group_size; i++)
        if (!matches[i]) break;

      if (i >= const_nunits && i < group_size)
        {

i == group_size here.
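
To make the point above concrete, here is a minimal standalone sketch of that
check. It is not the vectorizer code: first_mismatch and considers_split are
hypothetical helper names, and plain unsigned values stand in for the real
types. It only shows that when every lane matches, the loop runs to the end,
i equals group_size, and the "i < group_size" half of the condition rejects
the split.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: index of the first lane that failed to match,
   or group_size when every lane matched.  Same loop as quoted above.  */
static unsigned
first_mismatch (const bool *matches, unsigned group_size)
{
  unsigned i;
  assert (matches != 0);
  for (i = 0; i < group_size; i++)
    if (!matches[i])
      break;
  return i;
}

/* Hypothetical helper mirroring the quoted condition: splitting is only
   considered when the first mismatch lies at or past const_nunits and
   strictly inside the group.  */
static bool
considers_split (const bool *matches, unsigned group_size,
                 unsigned const_nunits)
{
  unsigned i = first_mismatch (matches, group_size);
  return i >= const_nunits && i < group_size;
}
```

With group_size == 3 and const_nunits == 2, an all-true matches array gives
i == 3 and no split is considered, while a mismatch in the last lane gives
i == 2 and the split is considered.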

If you make fn1 static you see a different thing, namely

t2.c:10:1: note:   Build SLP for _10 = (long int) _2;
t2.c:10:1: note:   get vectype for scalar type (group size 3): long int
t2.c:10:1: note:   vectype: vector(1) long int
t2.c:10:1: note:   get vectype for smallest scalar type: _Bool
t2.c:10:1: note:   nunits vectype: vector(2) unsigned char
t2.c:10:1: note:   nunits = 2

V2QI?  Huh.

t2.c:10:1: note:   Build SLP for _2 = c.0_1 == 0;
t2.c:10:1: note:   get vectype for scalar type (group size 3): int
t2.c:10:1: note:   vectype: vector(2) int
t2.c:10:1: note:   nunits = 2

and V2SI.  But still V1DI.

I guess with all this it might be the case that the vect_update_max_nunits
call in vect_build_slp_tree for the case where there is a leader doesn't
work in case the local max_nunits is bigger?  But this isn't the case here.

So interestingly for

_2 = c.0_1 == 0;

we have vectype == boolean_type_node and nunits_vectype V2SI.

Uh.

@@ -1247,7 +1248,8 @@ vect_build_slp_tree (vec_info *vinfo,
       return *leader;
     }
   poly_uint64 this_max_nunits = 1;
-  slp_tree res = vect_build_slp_tree_2 (vinfo, stmts, group_size, max_nunits,
+  slp_tree res = vect_build_slp_tree_2 (vinfo, stmts, group_size,
+                                       &this_max_nunits,
                                        matches, npermutes, tree_size, bst_map);
   if (res)
     {
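
A rough model of what the hunk above changes, as a sketch under stated
assumptions: plain unsigned replaces poly_uint64, and update_max_nunits,
build_subtree, build_node and struct node are hypothetical stand-ins rather
than the real functions. The step that stores the local maximum in the node
and then folds it into the caller's value is an assumption about the code
following "if (res)", which the hunk does not show.

```c
#include <assert.h>

/* Hypothetical stand-in for a max-update helper: raise *max_nunits to
   nunits if nunits is larger.  */
static void
update_max_nunits (unsigned *max_nunits, unsigned nunits)
{
  assert (max_nunits != 0);
  if (nunits > *max_nunits)
    *max_nunits = nunits;
}

/* Hypothetical node that remembers the largest nunits of its own subtree.  */
struct node
{
  unsigned max_nunits;
};

/* Hypothetical subtree build: reports every nunits it sees.  */
static void
build_subtree (const unsigned *nunits, unsigned n, unsigned *max_nunits)
{
  for (unsigned i = 0; i < n; i++)
    update_max_nunits (max_nunits, nunits[i]);
}

/* Build with a local accumulator, as in the patched call, so the value
   recorded for this node does not depend on what the caller has already
   seen.  Merging into the caller afterwards is an assumption.  */
static struct node
build_node (const unsigned *nunits, unsigned n, unsigned *max_nunits)
{
  unsigned this_max_nunits = 1;
  build_subtree (nunits, n, &this_max_nunits);
  struct node res = { this_max_nunits };
  update_max_nunits (max_nunits, this_max_nunits);
  return res;
}
```

If the caller's maximum is already 4 and the subtree only sees 1, 2 and 2,
the node records 2 and the caller's value stays 4. Passing the caller's
pointer straight through would leave the subtree with no maximum of its own.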
