https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116575

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |tnfchris at gcc dot gnu.org

--- Comment #8 from Richard Biener <rguenth at gcc dot gnu.org> ---
FAIL: gcc.target/aarch64/sve/mask_struct_load_2.c

for example, fails because of this.  We now correctly do SLP discovery, arriving at

mask_struct_load_2.c:39:1: note:   node 0x4ed9940 (max_nunits=16, refcnt=2)
vector([16,16]) signed char
mask_struct_load_2.c:39:1: note:   op: VEC_PERM_EXPR
mask_struct_load_2.c:39:1: note:        stmt 0 _7 = .MASK_LOAD (_6, 8B, _21);
mask_struct_load_2.c:39:1: note:        lane permutation { 0[0] }
mask_struct_load_2.c:39:1: note:        children 0x4ed99d8
mask_struct_load_2.c:39:1: note:   node 0x4ed9e00 (max_nunits=16, refcnt=2)
vector([16,16]) signed char
mask_struct_load_2.c:39:1: note:   op: VEC_PERM_EXPR
mask_struct_load_2.c:39:1: note:        stmt 0 _11 = .MASK_LOAD (_10, 8B, _21);
mask_struct_load_2.c:39:1: note:        lane permutation { 0[1] }
mask_struct_load_2.c:39:1: note:        children 0x4ed99d8
mask_struct_load_2.c:39:1: note:   node 0x4ed9f30 (max_nunits=16, refcnt=2)
vector([16,16]) signed char
mask_struct_load_2.c:39:1: note:   op: VEC_PERM_EXPR
mask_struct_load_2.c:39:1: note:        stmt 0 _16 = .MASK_LOAD (_15, 8B, _21);
mask_struct_load_2.c:39:1: note:        lane permutation { 0[2] }
mask_struct_load_2.c:39:1: note:        children 0x4ed99d8
mask_struct_load_2.c:39:1: note:   node 0x4ed99d8 (max_nunits=16, refcnt=4)
vector([16,16]) signed char
mask_struct_load_2.c:39:1: note:   op template: _7 = .MASK_LOAD (_6, 8B, _21);
mask_struct_load_2.c:39:1: note:        stmt 0 _7 = .MASK_LOAD (_6, 8B, _21);
mask_struct_load_2.c:39:1: note:        stmt 1 _11 = .MASK_LOAD (_10, 8B, _21);
mask_struct_load_2.c:39:1: note:        stmt 2 _16 = .MASK_LOAD (_15, 8B, _21);
mask_struct_load_2.c:39:1: note:        children 0x4ed9a70 
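
The scalar code behind this looks roughly like the following - an illustrative
loop only, not the exact testcase - a conditional access to a three-field
interleaved array, which gives three .MASK_LOADs under one mask that end up
as a single three-lane SLP load node, with each individual use represented as
a single-lane VEC_PERM_EXPR as in the dump above:

#include <stdint.h>

void
f (int8_t *restrict dest, int8_t *restrict src, int8_t *restrict cond, int n)
{
  for (int i = 0; i < n; ++i)
    if (cond[i])
      dest[i] = src[i * 3] + src[i * 3 + 1] + src[i * 3 + 2];
}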

The representation above, however, is not marked as ->ldst_p - it doesn't
require further lowering (there is no load permutation on the actual load
node), and that lowering is what currently sets the want-to-use-load-lanes
flag, so the flag never gets set for this shape.

For masked load lanes we need some other place to set this - it could
be as late as during permute optimization (where we conveniently have
backward edges for the SLP graph).  I do not want to set the flag during
SLP discovery (which now splits nodes as seen above).
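
To illustrate the idea (with made-up types and names - slp_node, perm_lane,
load_lanes_candidate - not GCC's actual slp_tree representation): during
permute optimization, with the backward edges available, a masked load node
could be flagged as a load-lanes candidate when all of its users are
single-lane VEC_PERM_EXPRs that together select each of its lanes exactly
once.  Roughly:

/* Deliberately simplified model of an SLP node; only meant to sketch the
   check, not to mirror the real data structures.  */
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

struct slp_node
{
  const char *op;             /* "MASK_LOAD", "VEC_PERM_EXPR", ...  */
  struct slp_node *child;     /* a single child is enough for the sketch */
  int perm_lane;              /* lane selected by a single-lane permute */
  unsigned nlanes;            /* number of scalar stmts in the node */
  bool load_lanes_candidate;
};

/* True if USER is a single-lane VEC_PERM_EXPR extracting a lane from LOAD;
   the selected lane is returned in *LANE.  */
static bool
single_lane_extract_p (const struct slp_node *user,
                       const struct slp_node *load, int *lane)
{
  if (strcmp (user->op, "VEC_PERM_EXPR") != 0 || user->child != load)
    return false;
  *lane = user->perm_lane;
  return true;
}

/* Called during permute optimization with the backward edges (USERS) of
   LOAD available: mark LOAD as wanting ld2/ld3/ld4 when its users extract
   each of its lanes exactly once.  */
static void
maybe_mark_masked_load_lanes (struct slp_node *load,
                              struct slp_node **users, size_t nusers)
{
  bool seen[8] = { false };
  if (load->nlanes > 8 || nusers != load->nlanes)
    return;
  for (size_t i = 0; i < nusers; ++i)
    {
      int lane;
      if (!single_lane_extract_p (users[i], load, &lane)
          || lane < 0 || (unsigned) lane >= load->nlanes || seen[lane])
        return;
      seen[lane] = true;
    }
  load->load_lanes_candidate = true;
}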

FAIL: gcc.target/aarch64/sve/mask_struct_load_1.c

fails the same way, though IMO it is questionable whether ld2 is really
profitable here.
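
For reference (again just an illustrative loop, not the exact testcase), the
two-field shape at issue is roughly:

#include <stdint.h>

void
g (int8_t *restrict dest, int8_t *restrict src, int8_t *restrict cond, int n)
{
  for (int i = 0; i < n; ++i)
    if (cond[i])
      dest[i] = src[i * 2] + src[i * 2 + 1];
}

where the trade-off is a masked ld2 against two contiguous masked loads plus
permutes.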
