https://gcc.gnu.org/bugzilla/show_bug.cgi?id=44794
Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                       |Added
----------------------------------------------------------------------------
           Assignee|unassigned at gcc dot gnu.org |rguenth at gcc dot gnu.org
   Last reconfirmed|                              |2023-03-09
             Status|UNCONFIRMED                   |ASSIGNED
     Ever confirmed|0                             |1

--- Comment #5 from Richard Biener <rguenth at gcc dot gnu.org> ---
On trunk we end up with a vectorized loop with unrolled vectorized epilogue and
scalar iteration and with a scalar loop copy used for n == 1 and the case when
'a' and 'b' alias, so this loop covers all possible 'n'.  Prefetching then
(rightfully) prefetches both of these which makes four out of them.

What can be improved is the upper bound on the iteration for the epilog
loop as created by tree_unroll_loop.  That results in the RTL unroller seeing

-   upper bound: 2147483646
-   likely upper bound: 2147483646
+   upper bound: 14
+   likely upper bound: 14
    realistic bound: -1
 ;; Unable to prove that the loop iterates constant times
+;; Not unrolling loop, doesn't roll
...
-   upper bound: 536870910
-   likely upper bound: 536870910
+   upper bound: 2
+   likely upper bound: 2
    realistic bound: -1
 ;; Unable to prove that the loop iterates constant times
+;; Not unrolling loop, doesn't roll

and thus only unrolling the main loops again.  The prefetching unrolling
should have been enough of course - I suppose the RTL unroller could
detect prefetch instructions and refrain from unrolling loops with
prefetches on the basis they are already tuned well (that could be
also implemented in the unroll control target hook).

I am testing a patch to improve the situation.
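
For readers unfamiliar with why the epilog loop has a small iteration bound, here is a minimal hand-written sketch in C.  It is not the testcase from this PR and not what GCC emits; the function name add_unrolled, the unroll factor UNROLL and the returned counter are assumptions made purely for illustration.  It shows that once the main loop is unrolled by a factor U, the loop handling the leftover elements can run at most U - 1 times, which is the kind of bound an unroller can use to decide that the loop "doesn't roll" enough to be worth unrolling again.

```c
#include <stddef.h>

/* Hypothetical unroll factor, chosen only for illustration. */
enum { UNROLL = 4 };

/* Computes a[i] += b[i] for i in [0, n) with a manually unrolled main
   loop followed by an epilog loop.  Returns the number of iterations
   the epilog loop executed, so the bound can be observed.  */
size_t add_unrolled(int *a, const int *b, size_t n)
{
    size_t i = 0;

    /* Main loop: handles UNROLL elements per iteration.  */
    for (; i + UNROLL <= n; i += UNROLL) {
        a[i]     += b[i];
        a[i + 1] += b[i + 1];
        a[i + 2] += b[i + 2];
        a[i + 3] += b[i + 3];
    }

    /* Epilog loop: fewer than UNROLL elements remain here, so it
       iterates at most UNROLL - 1 times regardless of n.  */
    size_t epilog_iters = 0;
    for (; i < n; i++) {
        a[i] += b[i];
        epilog_iters++;
    }
    return epilog_iters;
}
```

With n = 11 and UNROLL = 4 the main loop covers elements 0 to 7 and the epilog covers the remaining three.  Without that bound recorded on the epilog loop, a later pass only sees a bound derived from the type of the counter and may unroll the epilog as well.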