http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55731
--- Comment #5 from Richard Biener <rguenth at gcc dot gnu.org> 2012-12-19
10:27:22 UTC ---
(In reply to comment #4)
> (In reply to comment #3)
> > The reason is that unrolling early can be harmful to for example
> > vectorization
> > and thus cunrolli restricts itself to "obviously" profitable cases.
> >
> > In this case the loop is not an "inner" loop - it doesn't have a containing
> > loop and so growth is not allowed even with -O3 (we otherwise will fail
> > to vectorize if the unrolled body ends up as part of other basic-blocks).
> >
> Richard,
>
> It looks that you did not see attached testcases.
I did - I even compiled them as you did and looked at the dump file and
the unroller source.
> I can't agree with your statement since
> 1. Loop in problem (t.c) has only 3 iterations and in any case it should not
> be
> considered as candidate for vectorization.
That's target dependend knowledge the unroller does not have (with
two element vectors you can produce one vectorized and one scalar iteration).
> 2. Loop contains calls of functions that do not have vectorizable
> counterparts.
The unroller does not have this detailed knowledge of the vectorizers
capabilities - it simply considers all loops vectorizable.
> 3. Loop contains comparisons with loop control variable as
> if (i == 0) etc.
> and cunrolli phase determines it:
>
> BB: 7, after_exit: 1
> size: 2 if (i_1 == 1)
> Constant conditional.
> BB: 5, after_exit: 1
> size: 2 foo4 (k_15(D));
> size: 2 if (i_1 == 0)
> Constant conditional.
>
> It means that these tests will be completely eliminated by loop unroller and
> some bb will become unreachable.
So? Fact is:
FOR_EACH_LOOP (li, loop, LI_FROM_INNERMOST)
{
struct loop *loop_father = loop_outer (loop);
if (may_increase_size && optimize_loop_nest_for_speed_p (loop)
/* Unroll outermost loops only if asked to do so or they do
not cause code growth. */
&& (unroll_outer || loop_outer (loop_father)))
ul = UL_ALL;
else
ul = UL_NO_GROWTH;
will end up with ul == UL_NO_GROWTH for t.c. Because loop_outer (loop_father)
is NULL (and unroll_outer is false).
I stated the reason for this "heuristic" (-> this loop may no longer be
a loop after unrolling and thus not vectorizable).
> I also added another testcase (t2.c) for which cunrolli does correct size
> estimation and completely unroll it (it has only 2 iterations).
size: 14-5, last_iteration: 2-0
Loop size: 14
Estimated size after unrolling: 13
doesn't grow thus is ok to unroll.
> So I assume that size estimation algorithm in unroller is not perfect and must
> be re-written.
Haha ;) Of course - it can't be "perfect" - you cannot reasonably pre-compute
the outcome of all subsequent optimizations correctly without ever pessimizing
in one or another way (either estimate a too small or a too large size).
But you are of course free to propose a patch!
> And at last if customer provides gcc with "-funroll-loop" option we should not
> consider "possible size growth" as reason of unroll rejection.
As I said above, cunrolli is supposed to only unroll inner loops. Your
loop isn't an inner (nested loop). This restriction is relaxed if unrolling
does not increase size.
> > It's a know issue that after cunroll there is no strong value-numbering
> > pass that handles memory (there is DOM which only has weak memory handling).
> >
> > So, it's a trade-off we make, mostly for the sake of loop optimizations
> > that do not handle unrolled loops well.
>
> Best regards.
> Yuri.