http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55731



--- Comment #5 from Richard Biener <rguenth at gcc dot gnu.org> 2012-12-19 10:27:22 UTC ---

(In reply to comment #4)

> (In reply to comment #3)

> > The reason is that unrolling early can be harmful to, for example, vectorization

> > and thus cunrolli restricts itself to "obviously" profitable cases.

> > 

> > In this case the loop is not an "inner" loop - it doesn't have a containing

> > loop and so growth is not allowed even with -O3 (we otherwise will fail

> > to vectorize if the unrolled body ends up as part of other basic-blocks).

> > 

> Richard,

> 

> It looks that you did not see attached testcases.



I did - I even compiled them as you did and looked at the dump file and

the unroller source.



> I can't agree with your statement since

> 1. Loop in problem (t.c) has only 3 iterations and in any case it should not be

> considered as candidate for vectorization.



That's target-dependent knowledge the unroller does not have (with

two element vectors you can produce one vectorized and one scalar iteration).
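To make the two-element-vector case concrete, here is a minimal sketch of what a vectorized 3-iteration loop could look like by hand: one 2-wide vector iteration followed by one scalar iteration. The function name add3 and the element-wise addition are hypothetical (the body of t.c is not quoted here); the vector_size attribute is the GCC vector extension.

```c
#include <string.h>

/* Two-element integer vector (GCC vector extension).  */
typedef int v2si __attribute__ ((vector_size (2 * sizeof (int))));

/* Hypothetical 3-iteration loop "dst[i] = a[i] + b[i]" split by hand
   into one vector iteration (i = 0, 1) and one scalar iteration (i = 2).  */
void
add3 (int *dst, const int *a, const int *b)
{
  v2si va, vb, vr;

  /* memcpy avoids any alignment assumption on the inputs.  */
  memcpy (&va, a, sizeof va);
  memcpy (&vb, b, sizeof vb);
  vr = va + vb;                 /* one vectorized iteration */
  memcpy (dst, &vr, sizeof vr);

  dst[2] = a[2] + b[2];         /* one scalar iteration */
}
```

Whether such a split pays off depends on the target's vector width, which is the knowledge the unroller lacks at this point.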



> 2. Loop contains calls of functions that do not have vectorizable counterparts.



The unroller does not have this detailed knowledge of the vectorizer's

capabilities - it simply considers all loops vectorizable.



> 3. Loop contains comparisons with loop control variable as

>     if (i == 0) etc.

> and cunrolli phase determines it:

> 

>  BB: 7, after_exit: 1

>   size:   2 if (i_1 == 1)

>    Constant conditional.

>  BB: 5, after_exit: 1

>   size:   2 foo4 (k_15(D));

>   size:   2 if (i_1 == 0)

>    Constant conditional.

> 

> It means that these tests will be completely eliminated by loop unroller and

> some bb will become unreachable.



So?  Fact is:



      FOR_EACH_LOOP (li, loop, LI_FROM_INNERMOST)
        {
          struct loop *loop_father = loop_outer (loop);

          if (may_increase_size && optimize_loop_nest_for_speed_p (loop)
              /* Unroll outermost loops only if asked to do so or they do
                 not cause code growth.  */
              && (unroll_outer || loop_outer (loop_father)))
            ul = UL_ALL;
          else
            ul = UL_NO_GROWTH;



will end up with ul == UL_NO_GROWTH for t.c, because loop_outer (loop_father)

is NULL (and unroll_outer is false).
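The decision above can be modeled in isolation. The following is a toy sketch, not GCC code: struct toy_loop, toy_loop_outer and choose_unroll_level are hypothetical names, and the sketch assumes that every top-level loop has the function's root pseudo-loop as its parent and that the root itself has no parent.

```c
#include <stdbool.h>
#include <stddef.h>

enum unroll_level { UL_NO_GROWTH, UL_ALL };

/* Toy stand-in for GCC's struct loop: only the parent link is modeled.  */
struct toy_loop
{
  struct toy_loop *outer;   /* NULL only for the root pseudo-loop */
};

static const struct toy_loop *
toy_loop_outer (const struct toy_loop *loop)
{
  return loop->outer;
}

/* Mirrors the condition quoted above for a single loop.  */
static enum unroll_level
choose_unroll_level (const struct toy_loop *loop, bool may_increase_size,
                     bool optimize_for_speed, bool unroll_outer)
{
  const struct toy_loop *loop_father = toy_loop_outer (loop);

  /* The root pseudo-loop itself is never a candidate.  */
  if (loop_father == NULL)
    return UL_NO_GROWTH;

  if (may_increase_size && optimize_for_speed
      && (unroll_outer || toy_loop_outer (loop_father) != NULL))
    return UL_ALL;
  return UL_NO_GROWTH;
}
```

With a root, a top-level loop whose parent is the root, and a loop nested inside that one, only the nested loop gets UL_ALL unless unroll_outer is set. The top-level loop corresponds to the situation in t.c.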



I stated the reason for this "heuristic" (-> this loop may no longer be

a loop after unrolling and thus not vectorizable).



> I also added another testcase (t2.c) for which cunrolli does correct size

> estimation and completely unroll it (it has only 2 iterations).



size: 14-5, last_iteration: 2-0

  Loop size: 14

  Estimated size after unrolling: 13



The size doesn't grow, thus it is OK to unroll.
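For readers who want to see how 14 can turn into 13, here is a rough sketch of one estimate that reproduces the numbers in the dump. Everything in it is an assumption rather than a quote from the unroller: the struct and function names are hypothetical, the reading of "14-5" and "2-0" as "size minus the part eliminated by peeling" is a guess from the dump format, the 2/3 scaling for later cleanups is an assumed factor, and the unroll count of 2 is taken from the statement that t2.c has only 2 iterations.

```c
/* Hypothetical mirror of the four numbers printed in the dump line
   "size: 14-5, last_iteration: 2-0".  */
struct size_estimate
{
  int overall;                               /* 14 */
  int eliminated_by_peeling;                 /* 5 */
  int last_iteration;                        /* 2 */
  int last_iteration_eliminated_by_peeling;  /* 0 */
};

/* Assumed estimate: NUNROLL copies of what survives peeling, plus what
   survives of the last iteration, scaled by 2/3 to account for later
   optimizations, and never less than 1.  */
static long
estimate_unrolled_size (const struct size_estimate *s, long nunroll)
{
  long insns = nunroll * (long) (s->overall - s->eliminated_by_peeling);

  insns += s->last_iteration - s->last_iteration_eliminated_by_peeling;
  insns = insns * 2 / 3;
  return insns > 0 ? insns : 1;
}

/* Under UL_NO_GROWTH the unrolled estimate must not exceed the loop size.  */
static int
unroll_without_growth_p (const struct size_estimate *s, long nunroll)
{
  return estimate_unrolled_size (s, nunroll) <= s->overall;
}
```

Under these assumptions the t2.c numbers give (14 - 5) * 2 + (2 - 0) = 20, and 20 * 2 / 3 = 13 in integer arithmetic, which is below the loop size of 14.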



> So I assume that size estimation algorithm in unroller is not perfect and must

> be re-written.



Haha ;)  Of course - it can't be "perfect" - you cannot reasonably pre-compute

the outcome of all subsequent optimizations correctly without ever pessimizing

in one or another way (either estimate a too small or a too large size).



But you are of course free to propose a patch!



> And at last if customer provides gcc with "-funroll-loops" option we should not

> consider "possible size growth" as reason of unroll rejection. 



As I said above, cunrolli is supposed to only unroll inner loops.  Your

loop isn't an inner (nested) loop.  This restriction is relaxed if unrolling

does not increase size.



> > It's a known issue that after cunroll there is no strong value-numbering

> > pass that handles memory (there is DOM which only has weak memory handling).

> > 

> > So, it's a trade-off we make, mostly for the sake of loop optimizations

> > that do not handle unrolled loops well.

> 

> Best regards.

> Yuri.
