------- Comment #40 from paolo dot bonzini at lu dot unisi dot ch 2006-08-07 16:58 -------
Subject: Re: [4.0/4.1 Regression] gcc 4 produces worse x87 code on all platforms than gcc 3
>> I don't see how the last fmul[sl] can be removed without increasing code
>> size.
>
> However, I can see that the
> peephole phase might not be able to change the register usage.

Actually, the peephole phase may not change the register usage, but it could use a scratch register if one is available. But it would be much more controversial (even if backed by your hard numbers on ATLAS) to state that splitting fmul[sl] into fld[sl]+fmul is always beneficial, unless there is some manual telling us exactly that. For example, it would be a different story if it could give higher scheduling freedom (stuff like VectorPath vs. DirectPath on Athlons), and if we could figure out on which platforms it improves performance.

> On this front, is there some reason you cannot post
> the patch(es) as attachments, just to rule out copy problems, as I've asked in
> last several messages?  Note there's no need if I can grab your stuff from
> SVN, as below . . .

You already found out about this :-P Unfortunately I mistyped the PR number when I committed the patch; I meant the commit to appear in the audit trail, so that you'd have seen that I had committed it.

>> because my tests were run on a similar Prescott (P4e)
>
> You didn't post the gcc 3 performance numbers.  What were those like?  If
> you beat/tied gcc 3, then the remaining fmul[l,s] are probably not a big
> deal.  If gcc 3 is still winning, on the other hand . . .

I don't have GCC 3 on that machine.

Paolo

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27827