void foo(int *a, int *b, int n)
{
  int i;
  for (i = 0; i < n; i++)
    a[i] = a[i] + b[i];
}
For this simple loop, the vectorizer does its job and peels the last few
iterations into a post-loop that is not vectorized. But the RTL loop unroller
does not know that this post-loop runs only a few iterations (at most 3 in
this case), and unrolls it anyway.
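To make the shape concrete, here is a hand-written sketch of roughly what the
loop looks like after vectorization. It assumes a vectorization factor of 4
(consistent with "at most 3" above); foo_peeled, post_loop_trips and
foo_peeled_matches are hypothetical names used only for illustration, and the
four scalar adds merely stand in for one vector add. This is not the code GCC
actually emits.

```c
#include <assert.h>

/* Scalar reference: the loop exactly as written in the report. */
void foo(int *a, int *b, int n)
{
  int i;
  for (i = 0; i < n; i++)
    a[i] = a[i] + b[i];
}

/* Hand-written approximation of the vectorized shape.
   Assumption: vectorization factor 4, n >= 0. */
void foo_peeled(int *a, int *b, int n)
{
  int i;
  int main_end = n - (n % 4);

  /* Main loop: each trip stands in for one 4-wide vector add. */
  for (i = 0; i < main_end; i += 4)
    {
      a[i]     += b[i];
      a[i + 1] += b[i + 1];
      a[i + 2] += b[i + 2];
      a[i + 3] += b[i + 3];
    }

  /* Post-loop: runs n % 4 times, i.e. at most 3 iterations. */
  for (; i < n; i++)
    a[i] += b[i];
}

/* Trip count of the post-loop for a given n (n >= 0). */
int post_loop_trips(int n)
{
  return n % 4;
}

/* Self-check helper: returns 1 if foo_peeled agrees with foo for
   the first n elements of a 16-element test array. */
int foo_peeled_matches(int n)
{
  int a1[16], a2[16], b[16];
  int i;

  assert(n >= 0 && n <= 16);
  for (i = 0; i < 16; i++)
    {
      a1[i] = a2[i] = i;
      b[i] = 100 * i;
    }
  foo(a1, b, n);
  foo_peeled(a2, b, n);
  for (i = 0; i < 16; i++)
    if (a1[i] != a2[i])
      return 0;
  return 1;
}
```

The bound on the post-loop is known at the point where the vectorizer creates
it, which is why unrolling it later cannot pay off.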
What is worse, if you compile it with:
gcc -O3 -fprefetch-loop-arrays -funroll-loops
You may find that the prefetch pass also unrolls the post-loop and generates
a new post-loop (a post-post-loop) for it. Again, the RTL loop unroller does
not recognize this post-post-loop and unrolls it, generating yet another
post-loop (a post-post-post-loop) for the post-post-loop :-)
This repeated unrolling increases compilation time and code size dramatically
without any performance benefit.
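As an illustration of one step of this cascade, the sketch below unrolls a
remainder loop by hand. The unroll factor of 2 and the names
unrolled_post_loop and check_unrolled_post_loop are assumptions made for the
example; the factors GCC picks and the code it generates will differ. The
point is only that a loop with at most 3 iterations turns into two loops.

```c
#include <assert.h>

/* Hand-unrolled remainder loop over a[start..n-1].
   Assumption: unroll factor 2, chosen only for illustration.
   Returns how many iterations the post-post-loop executed. */
int unrolled_post_loop(int *a, int *b, int start, int n)
{
  int i = start;
  int leftover = 0;

  /* Unrolled post-loop: two elements per trip. */
  for (; i + 1 < n; i += 2)
    {
      a[i]     += b[i];
      a[i + 1] += b[i + 1];
    }

  /* Post-post-loop: picks up the element the unrolled loop left. */
  for (; i < n; i++)
    {
      a[i] += b[i];
      leftover++;
    }

  return leftover;
}

/* Self-check helper: 3 remaining elements give one unrolled trip
   plus one post-post-loop iteration. */
void check_unrolled_post_loop(void)
{
  int a[7] = { 1, 2, 3, 4, 5, 6, 7 };
  int b[7] = { 10, 20, 30, 40, 50, 60, 70 };

  assert(unrolled_post_loop(a, b, 4, 7) == 1);
  assert(a[3] == 4 && a[4] == 55 && a[5] == 66 && a[6] == 77);
}
```

With at most 3 iterations to begin with, the unrolled body runs at most once,
so the extra loop and its control code are pure overhead.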
--
Summary: pre- and post-loops should not be unrolled.
Product: gcc
Version: lno
Status: UNCONFIRMED
Severity: major
Priority: P3
Component: tree-optimization
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: changpeng dot fang at amd dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44794