> Jan Hubicka <hubi...@ucw.cz> writes:
> > Concerning x86 specifics, there is a cost for allocating the stack frame.
> > So if the function has nothing in its stack frame, push/pop becomes a
> > slightly better candidate than a spill.  The hook you added does not seem
> > to be able to test this, since it does not take the frame size as a
> > parameter.  I wonder if there is an easy way to get it in?
> 
> The main frame size is available globally as get_frame_size ().
> There's also the question of whether a frame needs to be created
> for other reasons, such as an alloca call, but I suppose setting
> up a frame for just alloca would also use push on x86?

Usually the frame is first created by push/pop instructions (which save
callee-saved registers and possibly the frame pointer) and the remaining
space is allocated using add/sub on ESP. If these can be avoided we save
about 8 bytes of code. Performance-wise, the stack engine will likely
completely hide the overhead of the extra add/sub.
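
As a rough check of the "about 8 bytes" figure, here is a minimal C sketch
that counts the encoded size of the sub/add pair.  It assumes 64-bit code
(REX.W prefix on the stack-pointer adjustment); the function names are
hypothetical, and it ignores alternatives such as leave or lea in the
epilogue.

```c
/* Encoded size in bytes of "sub $imm, %rsp" (or the matching add) in
   64-bit mode: REX.W + opcode + ModRM = 3 bytes, plus a sign-extended
   1-byte immediate when it fits, otherwise a 4-byte immediate.  */
static int
stack_adjust_insn_size (long imm)
{
  return 3 + ((imm >= -128 && imm <= 127) ? 1 : 4);
}

/* Bytes of code saved when neither the prologue sub nor the epilogue
   add is needed for a frame of FRAME_BYTES.  A zero-sized frame has no
   adjustment to begin with, so nothing is saved.  */
static int
frame_adjust_bytes_saved (long frame_bytes)
{
  if (frame_bytes == 0)
    return 0;
  return 2 * stack_adjust_insn_size (frame_bytes);
}
```

For a small frame such as 16 bytes this gives 4 + 4 = 8; once the
adjustment no longer fits in a signed byte the pair grows to 14 bytes.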

We need add/sub for caller saves, spilling and on-stack variables.
We may be able to hide it in the red zone, but only for leaf functions.
I think get_frame_size only tells me about the on-stack variables at the
time ira-color is performed.

This is something that would be nice to model better, but also is likely
not critical.  So I only mentioned it in case you or Vladimir can come
up with a nice way to fit this in.
> 
> > Also for old CPUs with no stack prediction engine we split either one or
> > two push instructions into an adjustment+move pair.  I do not see how to
> > bring that into play, since the cost of 1 or 2 registers then differs from
> > 3 or more, but I also think we do not need to care about this, since all
> > reasonably current CPUs have stack prediction.
> 
> Yeah.  The hook does allow you to test how many registers have been pushed,
> and how many will be pushed after the change that is being costed.
> But giving a higher cost for the first two registers would probably
> tend to penalise using callee-saved registers for the first few allocnos
> that we colour, which are also likely to be the most important allocnos.
> Trying to cost the difference might therefore be counter-productive.

Actually my memory got this backwards. While I experimented with avoiding
some push/pop instructions on CPUs without a stack engine (those
produced before 2003), that is not in mainline.  All we do is the opposite
conversion: sometimes we turn sub/add of ESP into a shorter but more
expensive push or pop.  This may be accounted for in the frame allocation
cost, but again, it only matters for very old CPUs.

Honza
> 
> > I am benchmarking updated patch and will send once it is done.
> 
> Thanks!
> 
> Richard
