https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116575
--- Comment #9 from Richard Biener <rguenth at gcc dot gnu.org> ---
I'll note the mask for SLP load-lanes is also off; we discover

mask_struct_load_2.c:39:1: note: node 0x5feccd0 (max_nunits=16, refcnt=2) vector([16,16]) <signed-boolean:1>
mask_struct_load_2.c:39:1: note: op template: _21 = _3 != 0;
mask_struct_load_2.c:39:1: note: stmt 0 _21 = _3 != 0;
mask_struct_load_2.c:39:1: note: stmt 1 _21 = _3 != 0;
mask_struct_load_2.c:39:1: note: stmt 2 _21 = _3 != 0;
mask_struct_load_2.c:39:1: note: children 0x5fece98 0x5fecf30
mask_struct_load_2.c:39:1: note: node (constant) 0x5fece98 (max_nunits=1, refcnt=1)
mask_struct_load_2.c:39:1: note: { 0, 0, 0 }
mask_struct_load_2.c:39:1: note: node 0x5fecf30 (max_nunits=16, refcnt=2) vector([16,16]) signed char
mask_struct_load_2.c:39:1: note: op: VEC_PERM_EXPR
mask_struct_load_2.c:39:1: note: stmt 0 _3 = *_2;
mask_struct_load_2.c:39:1: note: stmt 1 _3 = *_2;
mask_struct_load_2.c:39:1: note: stmt 2 _3 = *_2;
mask_struct_load_2.c:39:1: note: lane permutation { 0[0] 0[0] 0[0] }
mask_struct_load_2.c:39:1: note: children 0x5fec9d8

as the mask - but in the end the actual CPU instruction only needs a third
of the lanes, and the above specific permute (the "splat") isn't supported.

load/store-lanes are a difficult beast and the SLP representation we
currently use might be sub-optimal.  SLP pattern matching could be another
place to discover load-lanes.
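
To make the mismatch concrete, here is a minimal scalar sketch in C.  It
rests on assumptions: the loop shape (one condition guarding a group of
three adjacent loads) is only a guess at the kind of code a testcase like
mask_struct_load_2.c exercises, and the names masked_sum3, build_slp_mask
and build_lanes_mask are hypothetical, not taken from GCC or the testcase.
It models the two mask layouts, not the vectorizer itself.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical loop: a single condition guards a group of three loads,
   which is the shape a masked 3-vector load-lanes operation would cover.  */
void
masked_sum3 (int8_t *restrict dest, const int8_t *restrict src,
             const int8_t *restrict cond, size_t n)
{
  for (size_t i = 0; i < n; ++i)
    if (cond[i] != 0)
      dest[i] = (int8_t) (src[i * 3] + src[i * 3 + 1] + src[i * 3 + 2]);
}

/* Model of the SLP view: the mask has one lane per group member, so each
   condition is replicated three times (the "splat" permute).  */
void
build_slp_mask (unsigned char *mask, const int8_t *cond, size_t n)
{
  for (size_t i = 0; i < n; ++i)
    for (size_t lane = 0; lane < 3; ++lane)
      mask[i * 3 + lane] = (cond[i] != 0);
}

/* Model of the load-lanes view: one mask element per three-element
   structure, i.e. a third of the lanes of the SLP mask.  */
void
build_lanes_mask (unsigned char *mask, const int8_t *cond, size_t n)
{
  for (size_t i = 0; i < n; ++i)
    mask[i] = (cond[i] != 0);
}
```

In this model the SLP mask is three times as long as the load-lanes mask
and carries no extra information: element i * 3 + lane of the former always
equals element i of the latter.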