void foo(int *a, int *b, int n)
{
  int i;
  for (i = 0; i < n; i++)
    a[i] = a[i] + b[i];
}

For this simple loop, the vectorizer does its job and peels the last few
iterations into a post-loop that is not vectorized. But the RTL loop unroller
does not know that this post-loop has only a few iterations (at most 3 in this
case), and unrolls it anyway.

What is worse, if you compile it with:

  gcc -O3 -fprefetch-loop-arrays -funroll-loops

you may find that the prefetch pass also unrolls the post-loop and generates a
new post-loop (a post-post-loop) for it. Again, the RTL loop unroller cannot
recognize this post-post-loop and unrolls it. (The RTL loop unroller then
generates yet another post-loop (a post-post-post-loop) for the
post-post-loop :-))

This causes compilation time and code size to increase dramatically without
any performance benefit.

-- 
           Summary: pre- and post-loops should not be unrolled.
           Product: gcc
           Version: lno
            Status: UNCONFIRMED
          Severity: major
          Priority: P3
         Component: tree-optimization
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: changpeng dot fang at amd dot com

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44794
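
To make the "at most 3" bound concrete, here is a rough hand-written sketch of
the loop shape the vectorizer produces. It is not actual GCC output. It assumes
a vectorization factor of 4 (four ints per vector, which matches the bound
above), and the names foo_peeled, VF and post_loop_iterations are made up for
illustration only.

```c
/* Assumed vectorization factor: four ints per vector. */
#define VF 4

/* Illustration-only counter recording how often the post-loop body ran. */
static int post_loop_iterations;

/* Hand-written stand-in for foo() after vectorization (hypothetical). */
void foo_peeled(int *a, int *b, int n)
{
  int i = 0;
  int k;

  /* Main loop: stands in for the vectorized loop. Each pass handles
     VF elements; the inner loop models one vector add. */
  for (; i + VF <= n; i += VF)
    for (k = 0; k < VF; k++)
      a[i + k] = a[i + k] + b[i + k];

  /* Post-loop: the scalar remainder. It runs n % VF times for n >= 0,
     so never more than VF - 1 = 3 times. */
  post_loop_iterations = 0;
  for (; i < n; i++)
    {
      a[i] = a[i] + b[i];
      post_loop_iterations++;
    }
}
```

Calling foo_peeled with n = 7 runs the main loop once and the post-loop three
times; with n = 4 the post-loop does not run at all. With a trip count this
small, unrolling the post-loop only adds code.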