https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92596
--- Comment #4 from Richard Biener <rguenth at gcc dot gnu.org> ---
There's also a missed optimization showing: we analyze the group_size == 3
case successfully, but when it then fails the unroll check we never consider
splitting it, because of this check:
      /* We consider breaking the group only on VF boundaries from the existing
         start.  */
      for (i = 0; i < group_size; i++)
        if (!matches[i]) break;
      if (i >= const_nunits && i < group_size)
        {
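Here i == group_size: the build itself succeeded, so every matches[] entry is
true, the loop runs to completion and the i < group_size test fails.
If you make fn1 static, you see something different, namely: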
t2.c:10:1: note: Build SLP for _10 = (long int) _2;
t2.c:10:1: note: get vectype for scalar type (group size 3): long int
t2.c:10:1: note: vectype: vector(1) long int
t2.c:10:1: note: get vectype for smallest scalar type: _Bool
t2.c:10:1: note: nunits vectype: vector(2) unsigned char
t2.c:10:1: note: nunits = 2
V2QI? Huh.
t2.c:10:1: note: Build SLP for _2 = c.0_1 == 0;
t2.c:10:1: note: get vectype for scalar type (group size 3): int
t2.c:10:1: note: vectype: vector(2) int
t2.c:10:1: note: nunits = 2
and V2SI. But _10 still gets V1DI.
I guess with all this it might be that the vect_update_max_nunits call in
vect_build_slp_tree, in the case where there is a leader, doesn't work when
the local max_nunits is bigger? But that isn't the case here.
So interestingly for
_2 = c.0_1 == 0;
we have vectype == boolean_type_node and nunits_vectype V2SI.
Uh.
@@ -1247,7 +1248,8 @@ vect_build_slp_tree (vec_info *vinfo,
       return *leader;
     }
   poly_uint64 this_max_nunits = 1;
-  slp_tree res = vect_build_slp_tree_2 (vinfo, stmts, group_size, max_nunits,
+  slp_tree res = vect_build_slp_tree_2 (vinfo, stmts, group_size,
+                                        &this_max_nunits,
                                         matches, npermutes, tree_size,
                                         bst_map);
   if (res)
     {