[4.0/4.1 Regression] gcc 4 produces worse x87 code on all platforms than gcc 3

paolo dot bonzini at lu dot unisi dot ch Mon, 07 Aug 2006 09:59:00 -0700


------- Comment #40 from paolo dot bonzini at lu dot unisi dot ch  2006-08-07 
16:58 -------
Subject: Re:  [4.0/4.1 Regression] gcc 4 produces worse
 x87 code on all platforms than gcc 3



>> I don't see how the last fmul[sl] can be removed without increasing code 
>> size.
>>     
> However, I can see that the
> peephole phase might not be able to change the register usage.
Actually, the peephole phase may not change the register usage, but it 
could use a scratch register if one is available.  But it would be much 
more controversial (even if backed by your hard numbers on ATLAS) to 
state that splitting fmul[sl] into fld[sl]+fmul is always beneficial, 
unless some manual tells us exactly that.  It would be a different story 
if the split gave more scheduling freedom (stuff like VectorPath vs. 
DirectPath on Athlons), and if we could figure out on which platforms it 
improves performance.
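
To make the two forms concrete, here is a minimal sketch, assuming a plain 
dot-product kernel of the kind ATLAS exercises.  The function name `dot` is 
hypothetical, and the assembly in the comment is hand-written for 
illustration, not actual output from any gcc version.

```c
#include <stddef.h>

/* Hypothetical dot-product kernel (not taken from ATLAS or from the PR).
 * Each iteration multiplies a loaded value by a value from memory, which
 * is where an fmul[sl] with a memory operand can show up.
 *
 * Form A: memory operand folded into the multiply (illustrative only):
 *     fldl   (%eax)         # push x[i]
 *     fmull  (%edx)         # st(0) *= y[i]
 *     faddp  %st, %st(1)    # acc += st(0), pop
 *
 * Form B: load split out; one instruction longer and needs one more
 * free x87 stack register as scratch (illustrative only):
 *     fldl   (%eax)         # push x[i]
 *     fldl   (%edx)         # push y[i]
 *     fmulp  %st, %st(1)    # st(1) *= st(0), pop
 *     faddp  %st, %st(1)    # acc += st(0), pop
 */
double dot(const double *x, const double *y, size_t n)
{
    double acc = 0.0;
    for (size_t i = 0; i < n; i++)
        acc += x[i] * y[i];
    return acc;
}
```

Form B is the fld[sl]+fmul split under discussion: it costs an extra 
instruction and a scratch register, so whether it wins would have to be 
measured per platform.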
> On this front, is there some reason you cannot post
> the patch(es) as attachments, just to rule out copy problems, as I've asked in
> last several messages?  Note there's no need if I can grab your stuff from 
> SVN,
> as below . . .
>   
You already found out about this :-P

Unfortunately I mistyped the PR number when I committed the patch; I 
meant the commit to appear in the audit trail, so that you would have 
seen that I had committed it.
>> because my tests were run on a similar Prescott (P4e)
>>     
> You didn't post the gcc 3 performance numbers.  What were those like?  If
> you beat/tied gcc 3, then the remaining fmul[l,s] are probably not a big
> deal.  If gcc 3 is still winning, on the other hand . . .
>   
I don't have GCC 3 on that machine.

Paolo


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27827

[Bug target/27827] [4.0/4.1 Regression] gcc 4 produces worse x87 code on all platforms than gcc 3
