Hello,
I looked into updating the hook:
> -/* Implement TARGET_IRA_CALLEE_SAVED_REGISTER_COST_SCALE.  */
> +/* Implement TARGET_CALLEE_SAVE_COST.  */
>
>  static int
> -ix86_ira_callee_saved_register_cost_scale (int)
> +ix86_callee_save_cost (spill_cost_type, unsigned int, machine_mode,
> +		       unsigned int, int mem_cost, const HARD_REG_SET &, bool)
>  {
> -  return 1;
> +  /* Account for the fact that push and pop are shorter and do their
> +     own allocation and deallocation.  */
> +  return mem_cost - 2;
>  }
I think this is fine for the usual performance metrics of push/pop. For size we now end up with a cost of 0, which is likely not right, so I added a special case that returns 1. Size costs do not quite correspond to mov-mov sizes, so I will try to fix that and see if it results in better code size.

I also added a test that the regno in question is an integer register. While we do not callee-save XMM registers for the default ABI, the Microsoft version does. I am not sure how the push2 and pushp extensions come into play, but we can handle that once we have hardware to test on.

Concerning x86 specifics, there is a cost for allocating the stack frame. So if the function has nothing on the stack frame, push/pop becomes a somewhat better candidate than a spill. The hook you added does not seem to be able to test this, since it does not have the frame size as a parameter. I wonder if there is an easy way to get it in?

Also, for old CPUs with no stack prediction engine we split either one or two push instructions into an adjustment+move pair. I do not see how to account for that, since the cost of 1 or 2 registers then differs from 3 or more, but I also think we do not need to care about this, since all reasonably current CPUs have stack prediction.

I am benchmarking the updated patch and will send it once that is done.

Honza
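
For reference, the two special cases described above (a floor of 1 when optimizing for size, and applying the push/pop discount only to integer registers) could be sketched roughly as follows. This is not the actual patch: `reg_kind`, `cost_query` and `callee_save_cost_sketch` are hypothetical stand-ins rather than GCC declarations, and returning the unmodified `mem_cost` for non-integer registers is an assumption.

```cpp
#include <cassert>

// Hypothetical stand-ins for illustration; not GCC's real types.
enum class reg_kind { integer, sse };

struct cost_query {
  reg_kind kind;        // class of the hard register being considered
  int mem_cost;         // cost of the store/load pair, as passed to the hook
  bool optimize_size;   // true when costs are measured in size
};

// Sketch: discount push/pop for integer registers only, and never
// return 0 when optimizing for size.
int callee_save_cost_sketch (const cost_query &q)
{
  assert (q.mem_cost >= 0);

  // Assumption: registers saved with moves (e.g. XMM under the
  // Microsoft ABI) keep the plain memory cost.
  if (q.kind != reg_kind::integer)
    return q.mem_cost;

  // Special case for size, where mem_cost - 2 would end up as 0.
  if (q.optimize_size)
    return 1;

  // push and pop are shorter and do their own allocation and
  // deallocation.
  return q.mem_cost - 2;
}
```

The frame-size question is deliberately left out of the sketch, since the hook has no parameter that carries it.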