http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55731
--- Comment #5 from Richard Biener <rguenth at gcc dot gnu.org> 2012-12-19 10:27:22 UTC ---
(In reply to comment #4)
> (In reply to comment #3)
> > The reason is that unrolling early can be harmful to for example
> > vectorization and thus cunrolli restricts itself to "obviously"
> > profitable cases.
> >
> > In this case the loop is not an "inner" loop - it doesn't have a containing
> > loop and so growth is not allowed even with -O3 (we otherwise will fail
> > to vectorize if the unrolled body ends up as part of other basic-blocks).
> >
> Richard,
>
> It looks that you did not see attached testcases.

I did - I even compiled them as you did and looked at the dump file and the
unroller source.

> I can't agree with your statement since
> 1. Loop in problem (t.c) has only 3 iterations and in any case it should not
> be considered as candidate for vectorization.

That's target-dependent knowledge the unroller does not have (with
two-element vectors you can produce one vectorized and one scalar iteration).

> 2. Loop contains calls of functions that do not have vectorizable
> counterparts.

The unroller does not have this detailed knowledge of the vectorizer's
capabilities - it simply considers all loops vectorizable.

> 3. Loop contains comparisons with loop control variable as
>       if (i == 0) etc.
> and cunrolli phase determines it:
>
>  BB: 7, after_exit: 1
>   size:   2 if (i_1 == 1)
>    Constant conditional.
>  BB: 5, after_exit: 1
>   size:   2 foo4 (k_15(D));
>   size:   2 if (i_1 == 0)
>    Constant conditional.
>
> It means that these tests will be completely eliminated by loop unroller and
> some bb will become unreachable.

So?  Fact is:

  FOR_EACH_LOOP (li, loop, LI_FROM_INNERMOST)
    {
      struct loop *loop_father = loop_outer (loop);

      if (may_increase_size && optimize_loop_nest_for_speed_p (loop)
          /* Unroll outermost loops only if asked to do so or they do
             not cause code growth.  */
          && (unroll_outer || loop_outer (loop_father)))
        ul = UL_ALL;
      else
        ul = UL_NO_GROWTH;

will end up with ul == UL_NO_GROWTH for t.c, because loop_outer (loop_father)
is NULL (and unroll_outer is false).  I stated the reason for this
"heuristic" (-> this loop may no longer be a loop after unrolling and thus
not vectorizable).

> I also added another testcase (t2.c) for which cunrolli does correct size
> estimation and completely unroll it (it has only 2 iterations).

  size: 14-5, last_iteration: 2-0
    Loop size: 14
    Estimated size after unrolling: 13

doesn't grow, thus it is ok to unroll.

> So I assume that size estimation algorithm in unroller is not perfect and must
> be re-written.

Haha ;)  Of course - it can't be "perfect" - you cannot reasonably
pre-compute the outcome of all subsequent optimizations correctly without
ever pessimizing in one or another way (either estimate a too small or a
too large size).  But you are of course free to propose a patch!

> And at last if customer provides gcc with "-funroll-loop" option we should not
> consider "possible size growth" as reason of unroll rejection.

As I said above, cunrolli is supposed to only unroll inner loops.  Your loop
isn't an inner (nested) loop.  This restriction is relaxed if unrolling does
not increase size.

> > It's a known issue that after cunroll there is no strong value-numbering
> > pass that handles memory (there is DOM which only has weak memory handling).
> >
> > So, it's a trade-off we make, mostly for the sake of loop optimizations
> > that do not handle unrolled loops well.
>
> Best regards.
> Yuri.
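
For readers following along without the GCC tree at hand, the decision
discussed above can be modelled in a few lines of standalone C.  This is only
a sketch, not the unroller's code: struct toy_loop, choose_unroll_level,
estimated_unrolled_size_sketch and unrolling_allowed are hypothetical names,
and the size formula (per-iteration size minus what peeling eliminates, times
the unroll count, plus the last iteration, scaled by 2/3) is an assumption
about how the unroller of that era estimated sizes.  It is included because it
reproduces the "13" in the t2.c dump for an assumed unroll count of 2.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for GCC's struct loop: only the link to the enclosing
   loop.  The outermost entry models the function body, which sits at
   the root of the loop tree.  */
struct toy_loop
{
  struct toy_loop *outer;
};

enum unroll_level
{
  UL_NO_GROWTH,  /* Unroll only if the code does not get bigger.  */
  UL_ALL         /* Growth is acceptable (real GCC still applies
                    size limits here; this sketch ignores them).  */
};

/* Mirrors the condition quoted in the comment above.  */
enum unroll_level
choose_unroll_level (const struct toy_loop *loop, bool may_increase_size,
                     bool optimize_for_speed, bool unroll_outer)
{
  const struct toy_loop *loop_father = loop->outer;
  bool has_containing_loop
    = loop_father != NULL && loop_father->outer != NULL;

  if (may_increase_size && optimize_for_speed
      && (unroll_outer || has_containing_loop))
    return UL_ALL;
  return UL_NO_GROWTH;
}

/* ASSUMED size estimate: the 2/3 factor stands for code that later
   passes are expected to remove.  */
long
estimated_unrolled_size_sketch (long overall, long eliminated_by_peeling,
                                long last_iteration,
                                long last_iteration_eliminated,
                                long nunroll)
{
  long unr_insns = nunroll * (overall - eliminated_by_peeling);
  unr_insns += last_iteration - last_iteration_eliminated;
  unr_insns = unr_insns * 2 / 3;
  return unr_insns <= 0 ? 1 : unr_insns;
}

/* Under UL_NO_GROWTH, unrolling goes ahead only if the estimate does
   not exceed the original loop size.  */
bool
unrolling_allowed (enum unroll_level ul, long unrolled_size, long loop_size)
{
  return ul == UL_ALL || unrolled_size <= loop_size;
}
```

With the t2.c numbers (size 14-5, last_iteration 2-0, unroll count assumed to
be 2) the estimate is (9 * 2 + 2) * 2 / 3 = 13, which is not larger than the
loop size of 14, so a top-level loop is still unrolled under UL_NO_GROWTH.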