[Bug tree-optimization/115340] Loop/SLP vectorization possible inefficiency

2025-02-12 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115340
--- Comment #8 from Robin Dapp ---
I went with your approach and performed some local testing. What I did is add another "unrolling type" in cunrolli, UL_FOR_GAPS, and split it off as a third cunrolli invocation. Right now it analyses the loop f

[Bug tree-optimization/115340] Loop/SLP vectorization possible inefficiency

2025-01-08 Thread rguenther at suse dot de via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115340
--- Comment #7 from rguenther at suse dot de ---
On Wed, 8 Jan 2025, rdapp.gcc at gmail dot com wrote:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115340
>
> --- Comment #6 from rdapp.gcc at gmail dot com ---
> >> Another thought I had as w

[Bug tree-optimization/115340] Loop/SLP vectorization possible inefficiency

2025-01-08 Thread rdapp.gcc at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115340
--- Comment #6 from rdapp.gcc at gmail dot com ---
>> Another thought I had as we already know that SLP handles this more
>> gracefully:
>> Would it make sense to "just" defer to BB vectorization and have loop
>> vectorization not do anything, p

[Bug tree-optimization/115340] Loop/SLP vectorization possible inefficiency

2025-01-08 Thread rguenther at suse dot de via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115340
--- Comment #5 from rguenther at suse dot de ---
On Wed, 8 Jan 2025, rdapp.gcc at gmail dot com wrote:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115340
>
> --- Comment #4 from rdapp.gcc at gmail dot com ---
> > That said - if DR analysis

[Bug tree-optimization/115340] Loop/SLP vectorization possible inefficiency

2025-01-08 Thread rdapp.gcc at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115340
--- Comment #4 from rdapp.gcc at gmail dot com ---
> That said - if DR analysis could, say, "force" a particular VF where it
> knows that gaps are closed we might "virtually" unroll this and thus
> detect it as a group of contiguous 16 stores.
N

[Bug tree-optimization/115340] Loop/SLP vectorization possible inefficiency

2025-01-08 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115340
--- Comment #3 from Richard Biener ---
The issue is that when we treat this as a group the same group in the next iteration will overlap - this isn't something we support (we'd have to alter dependence analysis to consider overlap with gaps as n

[Bug tree-optimization/115340] Loop/SLP vectorization possible inefficiency

2025-01-08 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115340
--- Comment #2 from Robin Dapp ---
> The stores are not considered "grouped" because they have gaps.
> To do better we'd have to improve the store dataref analysis to see
> that a vectorization factor of four would "close" the gaps, or more
> g
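
The point quoted above, that a vectorization factor of four would "close" the gaps, can be sketched in plain C. This is only an illustration under assumptions: the access pattern (four stores per iteration, four elements apart, index advancing by one) is inferred from the numbers mentioned in the thread and is not taken from the bug's test case. The helper gaps_closed and the constants GROUP_SIZE, GAP and MAX_ELEMS are hypothetical names, not part of GCC.

```c
#include <assert.h>
#include <string.h>

#define GROUP_SIZE 4    /* stores per scalar iteration (assumed) */
#define GAP        4    /* element distance between those stores (assumed) */
#define MAX_ELEMS  256  /* size of the scratch map used by the sketch */

/* Hypothetical helper: returns 1 if `unroll` consecutive iterations of a
   loop that stores to x[i], x[i + GAP], ..., x[i + (GROUP_SIZE - 1) * GAP]
   together write a hole-free range of elements, and 0 otherwise.  */
int gaps_closed(int unroll)
{
    unsigned char touched[MAX_ELEMS];
    /* Highest element index written by the unrolled iterations.  */
    int last = (unroll - 1) + (GROUP_SIZE - 1) * GAP;

    assert(unroll >= 1 && last < MAX_ELEMS);
    memset(touched, 0, sizeof touched);

    /* Mark every element written by iterations 0 .. unroll - 1.  */
    for (int i = 0; i < unroll; i++)
        for (int k = 0; k < GROUP_SIZE; k++)
            touched[i + k * GAP] = 1;

    /* Any unmarked element up to `last` is a remaining gap.  */
    for (int j = 0; j <= last; j++)
        if (!touched[j])
            return 0;
    return 1;
}
```

With these assumed constants, a single iteration leaves holes, so gaps_closed(1) returns 0, while four iterations taken together cover elements 0 through 15, so gaps_closed(4) returns 1. That is consistent with the "group of contiguous 16 stores" mentioned in comment #4.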

[Bug tree-optimization/115340] Loop/SLP vectorization possible inefficiency

2024-06-04 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115340
Richard Biener changed:
           What    |Removed                     |Added
   Last reconfirmed|                            |2024-06-04
             Blocks|