Re: 100x performance regression between gcc 3.4.5 and gcc 4.X

Andrew Pinski Sun, 12 Mar 2006 16:18:59 -0800

> 
> On 3/12/06, Steven Bosscher <[EMAIL PROTECTED]> wrote:
> > > Yes, why is the benchmark not valid?
> >
> > It is valid.  We should understand why this behavior has changed so 
> > drastically.
> This benchmark may be useless, but it still exposes a weakness of gcc4.
> At least it's not news to me:
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=21195
> 
> So that PR was closed when the gcc devs marked all those intrinsics
> as force_inline. That's also the kludge I use with my code. The real
> problem is that once you start marking some functions as force_inline,
> you upset the inlining heuristic even more, creating even more silly
> inlining misses; rinse, repeat.
> At the end of the day, everything is marked either force_inline or
> noinline and you'd be better off without a heuristic at all.


Actually, the best way to improve the inline heuristics is to get
a real testcase (and not some benchmark) where the inline heuristics
are messed up.  Now, SSE intrinsics are special in that they should
always be inlined, and that fact should be hidden from the user.  Maybe
they should be rewritten to be like the altivec intrinsics, which are
just plain #defines: nothing special for the user and no worrying
about the inlining heuristics.  I should note that always_inline was
added for the altivec intrinsics in the first place, and they have
since been rewritten.  Also, the kernel uses always_inline, but I and
others feel that is a mistake.
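
To make the two styles concrete, here is a minimal sketch, not the
actual GCC header code.  The names vec4, builtin_add4, add4_fn and
add4_macro are all hypothetical stand-ins: vec4 plays the role of a
real vector type, and builtin_add4 plays the role of a compiler
builtin.  The only real GCC feature used is the always_inline
function attribute.

```c
#include <assert.h>

/* Hypothetical 4-float vector type standing in for a real SIMD type. */
typedef struct { float v[4]; } vec4;

/* Hypothetical stand-in for a compiler builtin that adds two vectors. */
static vec4 builtin_add4(vec4 a, vec4 b)
{
    vec4 r;
    for (int i = 0; i < 4; i++)
        r.v[i] = a.v[i] + b.v[i];
    return r;
}

/* Style 1: a wrapper function that asks the compiler to always inline
   it, so the call does not go through the inlining heuristics. */
static inline __attribute__((always_inline)) vec4 add4_fn(vec4 a, vec4 b)
{
    return builtin_add4(a, b);
}

/* Style 2: a plain macro.  There is no wrapper function at all, so
   the inliner never has to make a decision about it. */
#define add4_macro(a, b) builtin_add4((a), (b))

/* Returns 1 if both styles produce the same result, 0 otherwise. */
int add4_styles_agree(void)
{
    vec4 a = { { 1.0f, 2.0f, 3.0f, 4.0f } };
    vec4 b = { { 10.0f, 20.0f, 30.0f, 40.0f } };
    vec4 x = add4_fn(a, b);
    vec4 y = add4_macro(a, b);

    for (int i = 0; i < 4; i++) {
        assert(x.v[i] == a.v[i] + b.v[i]);
        if (x.v[i] != y.v[i])
            return 0;
    }
    return 1;
}
```

Both forms give the same result to the caller.  The difference is
that the macro form gives up the type checking of a real function
prototype in exchange for not involving the inliner at all.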

-- Pinski
