https://gcc.gnu.org/bugzilla/show_bug.cgi?id=44794

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Assignee|unassigned at gcc dot gnu.org      |rguenth at gcc dot gnu.org
   Last reconfirmed|                            |2023-03-09
             Status|UNCONFIRMED                 |ASSIGNED
     Ever confirmed|0                           |1

--- Comment #5 from Richard Biener <rguenth at gcc dot gnu.org> ---
On trunk we end up with a vectorized loop with an unrolled vectorized epilogue
and a scalar iteration, plus a scalar loop copy used for n == 1 and for the case
when 'a' and 'b' alias, so this loop covers all possible 'n'.  Prefetching then
(rightfully) prefetches both of these, which makes four out of them.
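
As a rough illustration of the loop shape described above, here is a
hand-written C sketch.  It is not the testcase from this bug and not what the
vectorizer emits: the loop body (a[i] += b[i]), the vector factor VF, the
overlap check and all function names are assumptions made for the example.
The real epilogue is itself vectorized and unrolled, which is simplified to a
plain scalar tail here.

```c
#include <stddef.h>
#include <stdint.h>

enum { VF = 4 };  /* assumed vectorization factor, for illustration only */

/* Hypothetical runtime alias check: do a[0..n) and b[0..n) overlap? */
static int ranges_overlap(const int *a, const int *b, size_t n)
{
    uintptr_t pa = (uintptr_t)a, pb = (uintptr_t)b;
    uintptr_t bytes = n * sizeof(int);
    return pa < pb + bytes && pb < pa + bytes;
}

/* Hypothetical loop body a[i] += b[i], versioned by hand. */
void add_versioned(int *a, const int *b, size_t n)
{
    size_t i = 0;

    if (n > 1 && !ranges_overlap(a, b, n)) {
        /* Main loop: stands in for the vectorized loop, VF elements per trip. */
        for (; i + VF <= n; i += VF)
            for (size_t k = 0; k < VF; k++)
                a[i + k] += b[i + k];

        /* Epilogue: runs at most VF - 1 times, so its bound is tiny. */
        for (; i < n; i++)
            a[i] += b[i];
        return;
    }

    /* Scalar loop copy: taken for n == 1 and when 'a' and 'b' alias. */
    for (; i < n; i++)
        a[i] += b[i];
}
```

The point of the sketch is only that the epilogue has a small, statically
known iteration bound, while the main loop and the scalar copy do not.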

What can be improved is the upper bound on the iteration count for the epilogue
loop as created by tree_unroll_loop.  That results in the RTL unroller seeing

-  upper bound: 2147483646
-  likely upper bound: 2147483646
+  upper bound: 14
+  likely upper bound: 14
   realistic bound: -1
 ;; Unable to prove that the loop iterates constant times
+;; Not unrolling loop, doesn't roll

...

-  upper bound: 536870910
-  likely upper bound: 536870910
+  upper bound: 2
+  likely upper bound: 2
   realistic bound: -1
 ;; Unable to prove that the loop iterates constant times
+;; Not unrolling loop, doesn't roll

and thus only unrolling the main loops again.  The prefetching unrolling
should have been enough, of course.  I suppose the RTL unroller could detect
prefetch instructions and refrain from unrolling loops with prefetches
on the basis that they are already tuned well (that could also be implemented
in the unroll control target hook).
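
To make the "doesn't roll" decision in the dumps above concrete, here is a
small stand-alone C model.  It is a simplification and not code from GCC: as
far as I understand it, the RTL unroller gives up when the likely upper bound
is smaller than twice the chosen unroll factor.  The function name and the
unroll factor of 8 used below are assumptions for illustration.

```c
#include <stdbool.h>

/* Simplified model (assumption, not GCC source): a loop is considered
   worth unrolling only if it may run at least 2 * nunroll iterations. */
static bool loop_rolls(unsigned long long likely_upper_bound,
                       unsigned nunroll)
{
    return likely_upper_bound >= 2ULL * nunroll;
}

/* Apply the model to a likely upper bound, with an assumed factor of 8. */
static bool would_unroll(unsigned long long likely_upper_bound)
{
    const unsigned assumed_nunroll = 8;  /* hypothetical unroll factor */
    return loop_rolls(likely_upper_bound, assumed_nunroll);
}
```

Under this model the old bounds 2147483646 and 536870910 pass the check,
while the tightened bounds 14 and 2 do not, which matches the "Not unrolling
loop, doesn't roll" lines in the dump.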

I am testing a patch to improve the situation.
