[Bug middle-end/29256] [4.2 regression] loop performance regression

rakdver at gcc dot gnu dot org Thu, 28 Sep 2006 07:40:44 -0700


------- Comment #14 from rakdver at gcc dot gnu dot org  2006-09-28 14:40 -------
> > > for this loop instead of just one.
> > > Actually unrolling is not needed to produce the bad code:
> > > .L2:
> > >         lwz 0,0(9)
> > >         stwx 0,11,9
> > >         addi 9,9,4
> > >         bdnz .L2
> > > I bet a beer that loop.c actually fixed this crap up before.
> > 
> > I am bad at reading ppc assembler; could you please explain what exactly is
> > wrong with the code you present?
> 
> One, there are two adds still there (just one is implicit)
> so why not do the loop as:


There is only one add, as far as I can see.

>  .L2:
>          lwz r0,0(r9)
>          stw r0,0(r11)
>          addi r9,r9,4
>          addi r11,r11,4
>          bdnz .L2

OTOH, this seems worse to me (one more add).

> Or:
>  .L2:
>          lwzx r0,r9,r12
>          stwx r0,r11,r12
>          addi r12,r12,4
>          bdnz .L2

Yes, this would be about the same.  Still, ivopts chose one of the best
possible ways, so I do not see what you are complaining about so much.
The unrolled case is something different -- of course we should use offset
addressing modes there.
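
For readers who, like me, do not read ppc assembler easily, here is a rough
C-level sketch of the three loop shapes discussed above. It is only an
illustration: the function names are hypothetical, the original test case from
the bug report is not reproduced, and an optimizing compiler is free to turn
any of these forms into any other. The first form relies on integer arithmetic
on converted pointers (uintptr_t), which is implementation-defined and assumes
a flat address space.

```c
#include <stddef.h>
#include <stdint.h>

/* Form 1 (roughly the lwz/stwx/addi loop): one pointer induction
   variable plus a loop-invariant delta = dst - src, so the store
   address is formed by the indexed store and only one explicit add
   remains in the loop body.  The down-counting i stands in for the
   count register used by bdnz. */
void copy_one_pointer(int *dst, const int *src, size_t n)
{
    uintptr_t delta = (uintptr_t)dst - (uintptr_t)src;
    const int *p = src;
    for (size_t i = n; i != 0; i--) {
        *(int *)((uintptr_t)p + delta) = *p;
        p++;                      /* the single explicit add */
    }
}

/* Form 2 (lwz/stw with two addi): two separate pointer induction
   variables, hence two explicit adds per iteration. */
void copy_two_pointers(int *dst, const int *src, size_t n)
{
    for (size_t i = n; i != 0; i--) {
        *dst = *src;
        src++;                    /* first add */
        dst++;                    /* second add */
    }
}

/* Form 3 (lwzx/stwx with one addi): both bases stay fixed and a
   shared byte offset is the only induction variable. */
void copy_shared_index(int *dst, const int *src, size_t n)
{
    size_t off = 0;
    for (size_t i = n; i != 0; i--) {
        *(int *)((char *)dst + off) = *(const int *)((const char *)src + off);
        off += sizeof(int);       /* the single explicit add */
    }
}
```

Forms 1 and 3 each need one explicit add per iteration and form 2 needs two,
which is the comparison being made in the replies above.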


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29256
