https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92596
--- Comment #4 from Richard Biener <rguenth at gcc dot gnu.org> ---
There's also a missed optimization showing: we analyze the group_size == 3
case successfully but fail to consider splitting it, as it fails the unroll
check because

  /* We consider breaking the group only on VF boundaries from the existing
     start.  */
  for (i = 0; i < group_size; i++)
    if (!matches[i])
      break;

  if (i >= const_nunits && i < group_size)
    {

i == group_size here.

If you make fn1 static you see a different thing, namely

t2.c:10:1: note: Build SLP for _10 = (long int) _2;
t2.c:10:1: note: get vectype for scalar type (group size 3): long int
t2.c:10:1: note: vectype: vector(1) long int
t2.c:10:1: note: get vectype for smallest scalar type: _Bool
t2.c:10:1: note: nunits vectype: vector(2) unsigned char
t2.c:10:1: note: nunits = 2

V2QI? Huh.

t2.c:10:1: note: Build SLP for _2 = c.0_1 == 0;
t2.c:10:1: note: get vectype for scalar type (group size 3): int
t2.c:10:1: note: vectype: vector(2) int
t2.c:10:1: note: nunits = 2

and V2SI. But still V1DI.

I guess with all this it might be the case that the vect_update_max_nunits
call in vect_build_slp_tree for the case where there is a leader doesn't
work in case the local max_nunits is bigger? But this isn't the case here.

So interestingly for _2 = c.0_1 == 0; we have vectype == boolean_type_node
and nunits_vectype V2SI. Uh.

@@ -1247,7 +1248,8 @@ vect_build_slp_tree (vec_info *vinfo,
       return *leader;
     }
   poly_uint64 this_max_nunits = 1;
-  slp_tree res = vect_build_slp_tree_2 (vinfo, stmts, group_size, max_nunits,
+  slp_tree res = vect_build_slp_tree_2 (vinfo, stmts, group_size,
+                                        &this_max_nunits,
                                         matches, npermutes, tree_size, bst_map);
   if (res)
     {