https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116575

--- Comment #9 from Richard Biener <rguenth at gcc dot gnu.org> ---
I'll note that the mask for SLP load-lanes is also off; we discover

mask_struct_load_2.c:39:1: note:   node 0x5feccd0 (max_nunits=16, refcnt=2) vector([16,16]) <signed-boolean:1>
mask_struct_load_2.c:39:1: note:   op template: _21 = _3 != 0;
mask_struct_load_2.c:39:1: note:        stmt 0 _21 = _3 != 0;
mask_struct_load_2.c:39:1: note:        stmt 1 _21 = _3 != 0;
mask_struct_load_2.c:39:1: note:        stmt 2 _21 = _3 != 0;
mask_struct_load_2.c:39:1: note:        children 0x5fece98 0x5fecf30
mask_struct_load_2.c:39:1: note:   node (constant) 0x5fece98 (max_nunits=1, refcnt=1)
mask_struct_load_2.c:39:1: note:        { 0, 0, 0 }
mask_struct_load_2.c:39:1: note:   node 0x5fecf30 (max_nunits=16, refcnt=2) vector([16,16]) signed char
mask_struct_load_2.c:39:1: note:   op: VEC_PERM_EXPR
mask_struct_load_2.c:39:1: note:        stmt 0 _3 = *_2;
mask_struct_load_2.c:39:1: note:        stmt 1 _3 = *_2;
mask_struct_load_2.c:39:1: note:        stmt 2 _3 = *_2;
mask_struct_load_2.c:39:1: note:        lane permutation { 0[0] 0[0] 0[0] }
mask_struct_load_2.c:39:1: note:        children 0x5fec9d8

as the mask, but in the end the actual CPU instruction needs only a third
of the lanes, and the specific permute above (the "splat") isn't supported.
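
To make the lane counts concrete, here is a rough scalar model in C. It is
only a sketch: the loop shape is assumed from the dump (one condition byte
guarding a group of three interleaved loads), and the names masked_sum3,
splat_mask and lanes_mask are hypothetical helpers, not vectorizer code.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed loop shape: one condition guards three interleaved loads.  */
void
masked_sum3 (int *restrict dest, const int *restrict src,
             const signed char *restrict cond, size_t n)
{
  for (size_t i = 0; i < n; ++i)
    if (cond[i] != 0)
      dest[i] = src[i * 3] + src[i * 3 + 1] + src[i * 3 + 2];
}

/* Model of the discovered SLP mask: the lane permutation
   { 0[0] 0[0] 0[0] } repeats each condition once per group member,
   so the mask has group * n lanes.  */
void
splat_mask (unsigned char *mask, const signed char *cond,
            size_t n, size_t group)
{
  assert (group > 0);
  for (size_t i = 0; i < n; ++i)
    for (size_t k = 0; k < group; ++k)
      mask[i * group + k] = (cond[i] != 0);
}

/* Model of what a load-lanes style instruction consumes: one mask
   lane per structure, i.e. every group-th lane of the splat.  */
void
lanes_mask (unsigned char *mask, const unsigned char *splat,
            size_t n, size_t group)
{
  assert (group > 0);
  for (size_t i = 0; i < n; ++i)
    mask[i] = splat[i * group];
}
```

In this model splat_mask produces three times as many lanes as lanes_mask
consumes, and lanes_mask simply recovers the unpermuted comparison, which
is the mismatch described above.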

load/store-lanes are a difficult beast and the SLP representation we
currently use might be sub-optimal.


SLP pattern matching could be another place to discover load-lanes.
