> Jan Hubicka <hubi...@ucw.cz> writes:
> > Concerning x86 specifics, there is a cost for allocating the stack frame. So
> > if the function has nothing on the stack frame, push/pop becomes a bit better
> > candidate than a spill. The hook you added does not seem to be able to
> > test this, since it does not have the frame size as a parameter. I wonder
> > if there is an easy way to get it in?
>
> The main frame size is available globally as get_frame_size ().
> There's also the question of whether a frame needs to be created
> for other reasons, such as an alloca call, but I suppose setting
> up a frame for just alloca would also use push on x86?
Usually the frame is first created by push/pop instructions (which are
callee saves and possibly the frame pointer) and the remaining capacity is
allocated using add/sub of the ESP pointer. If these can be avoided we save
about 8 bytes of code. Performance-wise, the stack engine will likely
completely hide the overhead of the extra add/sub.

We need add/sub for caller saves, spilling and on-stack variables. We may
be able to hide it in the red zone, but only for leaf functions.
get_frame_size, I think, only tells me about the on-stack variables at the
time ira-color is performed.

This is something that would be nice to model better, but it is also likely
not critical. So I only mentioned it in case you or Vladimir can come up
with a nice way to fit this in.
>
> > Also for old CPUs with no stack prediction engine we split either one or
> > two push instructions into an adjustment+move pair. I do not see how to
> > put that into the game, since the cost of 1 or 2 registers then differs from
> > 3 or more, but also I think we do not need to care about this, since all
> > reasonably current CPUs have stack prediction.
>
> Yeah. The hook does allow you to test how many registers have been pushed,
> and how many will be pushed after the change that is being costed.
> But giving a higher cost for the first two registers would probably
> tend to penalise using callee-saved registers for the first few allocnos
> that we colour, which are also likely to be the most important allocnos.
> Trying to cost the difference might therefore be counter-productive.

Actually my memory got this backwards. While I experimented with avoiding
only some push/pop instructions on CPUs without a stack engine (those were
produced before 2003), it is not in mainline. All we do is the opposite
conversion: sometimes we turn sub/add of ESP into a shorter but more
expensive push or pop. This may be accounted for in the frame allocation
cost, but again, it only matters for very old CPUs.

Honza
>
> > I am benchmarking the updated patch and will send it once it is done.
>
> Thanks!
>
> Richard
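
To make the frame-allocation point above concrete, here is a minimal sketch in C of when a spill would force the extra sub/add pair while a push/pop would not. It is not GCC code: every name in it (frame_model, needs_sp_adjust, spill_frame_bytes, push_frame_bytes) is hypothetical, and it assumes x86-64 with a 128-byte red zone, the "about 8 bytes" figure quoted above for the sub/add pair, one-byte push and pop encodings, and 8-byte spill slots. It ignores caller saves, stack alignment, and the size of the spill loads and stores themselves.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified model of x86-64 frame setup cost.
   None of these names exist in GCC.  */

enum {
  RED_ZONE_BYTES = 128,     /* red zone below the stack pointer (x86-64 SysV) */
  SUB_ADD_PAIR_BYTES = 8,   /* assumed size of the sub/add pair (figure from the thread) */
  PUSH_POP_PAIR_BYTES = 2,  /* assumed: one-byte push plus one-byte pop */
  SPILL_SLOT_BYTES = 8      /* assumed size of one spill slot */
};

struct frame_model {
  int local_bytes;  /* spill slots and on-stack variables */
  bool is_leaf;     /* true if the function makes no calls */
};

/* True if the prologue needs an explicit stack pointer adjustment.  */
static bool needs_sp_adjust (const struct frame_model *f)
{
  assert (f->local_bytes >= 0);
  if (f->local_bytes == 0)
    return false;  /* push/pop alone creates the frame */
  if (f->is_leaf && f->local_bytes <= RED_ZONE_BYTES)
    return false;  /* locals fit in the red zone */
  return true;
}

/* Extra prologue/epilogue bytes caused by adding one spill slot.  */
static int spill_frame_bytes (const struct frame_model *f)
{
  struct frame_model after = *f;
  after.local_bytes += SPILL_SLOT_BYTES;
  int before_cost = needs_sp_adjust (f) ? SUB_ADD_PAIR_BYTES : 0;
  int after_cost = needs_sp_adjust (&after) ? SUB_ADD_PAIR_BYTES : 0;
  return after_cost - before_cost;
}

/* Extra prologue/epilogue bytes caused by saving one more
   callee-saved register.  */
static int push_frame_bytes (const struct frame_model *f)
{
  (void) f;  /* in this model a push never forces a sub/add */
  return PUSH_POP_PAIR_BYTES;
}
```

Under these assumptions the first spill in a non-leaf function with an otherwise empty frame costs the whole sub/add pair, whereas later spills add nothing to the prologue. That is the asymmetry a hook without access to the frame size cannot see.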