[Bug tree-optimization/119181] Missed vectorization due to imperfect SLP discovery for 2 grouped load with same base pointer (taken as 1 interleaved load)

rguenth at gcc dot gnu.org via Gcc-bugs Mon, 10 Mar 2025 16:23:48 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119181


Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|                            |2025-03-10
            Summary|Missed vectorization due to |Missed vectorization due to
                   |imperfect SLP discovery for |imperfect SLP discovery for
                   |2 grouped load with same    |2 grouped load with same
                   |base pointer(taken as 1     |base pointer (taken as 1
                   |interleaved load)           |interleaved load)
             Status|UNCONFIRMED                 |NEW
     Ever confirmed|0                           |1

--- Comment #7 from Richard Biener <rguenth at gcc dot gnu.org> ---
The issue is we detect this as a single interleaving group:

t.c:12:1: note:   Detected interleaving load of size 264
t.c:12:1: note:         _1 = *a_26(D);
t.c:12:1: note:         _5 = MEM[(double *)a_26(D) + 8B]; 
t.c:12:1: note:         _7 = MEM[(double *)a_26(D) + 16B];
t.c:12:1: note:         _11 = MEM[(double *)a_26(D) + 24B];
t.c:12:1: note:         _14 = MEM[(double *)a_26(D) + 32B];
t.c:12:1: note:         _17 = MEM[(double *)a_26(D) + 40B];
t.c:12:1: note:         _19 = MEM[(double *)a_26(D) + 48B];
t.c:12:1: note:         _22 = MEM[(double *)a_26(D) + 56B];
t.c:12:1: note:         <gap of 248 elements>
t.c:12:1: note:         _2 = MEM[(double *)a_26(D) + 2048B];
t.c:12:1: note:         _4 = MEM[(double *)a_26(D) + 2056B];
t.c:12:1: note:         _8 = MEM[(double *)a_26(D) + 2064B];
t.c:12:1: note:         _10 = MEM[(double *)a_26(D) + 2072B];
t.c:12:1: note:         _13 = MEM[(double *)a_26(D) + 2080B];
t.c:12:1: note:         _16 = MEM[(double *)a_26(D) + 2088B];
t.c:12:1: note:         _20 = MEM[(double *)a_26(D) + 2096B];
t.c:12:1: note:         _23 = MEM[(double *)a_26(D) + 2104B];

so the heuristic that swaps operands to get a single group in leaves doesn't
work here.  Instead you get offsetting costs, which exist to avoid runaway
with very large gaps:

*a_26(D) 132 times unaligned_load (misalign -1) costs 1584 in body

and that makes it unprofitable.
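
The numbers in the two dumps above can be cross-checked with a little
arithmetic.  The following is only a sketch: the helper names are made up for
illustration and are not part of GCC, and the 8-byte element size is taken
from the `double *` accesses in the dump.  The reading of "132 times" as two
elements per load is an inference from 264 / 132, consistent with a two-lane
double vector, but the dump does not state the vector type.

```c
/* Illustrative helpers only (hypothetical names, not GCC internals). */

/* Number of elements between the end of the first run of loads and the
   start of the second run, both addressed off the same base pointer.  */
int gap_elems(int first_run_len, int second_start_bytes, int elem_bytes)
{
    return second_start_bytes / elem_bytes - first_run_len;
}

/* Total number of elements the combined group spans, gap included.  */
int group_span(int second_start_bytes, int second_run_len, int elem_bytes)
{
    return second_start_bytes / elem_bytes + second_run_len;
}

/* Elements covered by each costed load, assuming the loads tile the
   whole group evenly.  */
int elems_per_load(int group_size, int n_loads)
{
    return group_size / n_loads;
}

/* Cost attributed to each individual load.  */
int cost_per_load(int total_cost, int n_loads)
{
    return total_cost / n_loads;
}
```

With the values from the dump, `gap_elems(8, 2048, 8)` gives 248 and
`group_span(2048, 8, 8)` gives 264, matching the reported gap and group size.
`cost_per_load(1584, 132)` gives 12, so the 16 loads that are actually used
are charged as if the whole 264-element span were loaded.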

A better heuristic for where to split groups is indeed needed; gaps bigger
than the biggest vector size might be a good candidate.  Note that when two
different interleaving groups are used in the same SLP leaf, we fail, as we
don't support that yet.
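
As a rough illustration of the splitting idea, here is a minimal standalone
sketch.  It is not GCC code: `split_at_large_gaps` and its parameters are
hypothetical, and the real vectorizer works on its own data-reference
structures rather than a plain offset array.  It assumes the byte offsets are
sorted ascending and all accesses have the same size.

```c
#include <stddef.h>

/* Hypothetical helper, not a GCC function.  Given n byte offsets (sorted
   ascending) of equally sized accesses off one base pointer, record in
   starts[] the index at which each subgroup begins and return the number
   of subgroups.  A new subgroup starts whenever the hole between the end
   of one access and the start of the next is larger than
   max_vector_bytes.  starts[] must have room for n entries.  */
size_t split_at_large_gaps(const long *offsets, size_t n, long elem_bytes,
                           long max_vector_bytes, size_t *starts)
{
    size_t count = 0;

    if (n == 0)
        return 0;

    starts[count++] = 0;
    for (size_t i = 1; i < n; i++) {
        /* Unused bytes between access i-1 and access i.  */
        long hole = offsets[i] - (offsets[i - 1] + elem_bytes);
        if (hole > max_vector_bytes)
            starts[count++] = i;
    }
    return count;
}
```

Fed the 16 offsets from the dump (0 to 56 and 2048 to 2104, 8-byte elements)
with an assumed largest vector size of 64 bytes, the 1984-byte hole exceeds
the threshold and the sketch yields two subgroups of 8 loads each.  Whether
the resulting two groups can then be used together in one SLP leaf is a
separate question, as noted above.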
