Hello,
I looked into updating the hook:
> -/* Implement TARGET_IRA_CALLEE_SAVED_REGISTER_COST_SCALE.  */
> +/* Implement TARGET_CALLEE_SAVE_COST.  */
>  
>  static int
> -ix86_ira_callee_saved_register_cost_scale (int)
> +ix86_callee_save_cost (spill_cost_type, unsigned int, machine_mode,
> +                    unsigned int, int mem_cost, const HARD_REG_SET &, bool)
>  {
> -  return 1;
> +  /* Account for the fact that push and pop are shorter and do their
> +     own allocation and deallocation.  */
> +  return mem_cost - 2;
>  }

I think this is fine for the usual performance metrics of push/pop.  For
size, however, we now end up with a cost of 0, which is likely not right,
so I added a special case that returns 1.  Size costs do not quite
correspond to mov-mov sizes, so I will try to fix that and see if it
results in better code size.

I also added a test that the regno in question is an integer register.
While we do not callee-save XMM registers for the default ABI, the
Microsoft version does.  I am not sure how the push2 and pushp extensions
come into play, but we can handle that once we have hardware to test on.
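
To illustrate the two checks described above, here is a minimal
standalone sketch.  It is a model, not the actual patch: spill_kind,
model_is_int_regno, the 0..15 register numbering and the optimize_size
flag are all stand-ins invented for the example, not GCC declarations.

```cpp
// Standalone model; none of these names are GCC's real declarations.
enum class spill_kind { save, restore };  // stand-in for spill_cost_type

// Stand-in for the x86 integer register test.  Treating registers 0..15
// as general-purpose is an assumption of this sketch only.
static bool model_is_int_regno (unsigned int regno)
{
  return regno < 16;
}

// mem_cost: cost of the store (save) or load (restore) to the stack.
// optimize_size: stand-in for the "optimizing for size" query.
static int model_callee_save_cost (spill_kind, unsigned int regno,
                                   int mem_cost, bool optimize_size)
{
  // Non-integer registers (e.g. XMM under the Microsoft ABI) are not
  // saved with push/pop, so the plain memory cost is kept.
  if (!model_is_int_regno (regno))
    return mem_cost;

  // For size, mem_cost - 2 can end up as 0; return 1 instead.
  if (optimize_size)
    return 1;

  // push and pop are shorter and do their own allocation/deallocation.
  return mem_cost - 2;
}
```

With these stand-ins, an integer register with mem_cost 6 costs 4 when
optimizing for speed and 1 when optimizing for size, while a non-integer
register keeps the full mem_cost.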

Concerning x86 specifics, there is a cost for allocating the stack frame.
So if the function has nothing in its stack frame, push/pop becomes a
slightly better candidate than a spill.  The hook you added does not seem
to be able to test this, since it does not take the frame size as a
parameter.  I wonder if there is an easy way to get it in?
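
A rough sketch of what a frame-size-aware comparison could look like,
assuming the frame size were available.  Everything here is
hypothetical: the frame_size parameter, the function names and the
allocation cost of 1 are made up to show the shape of the idea, and the
current hook has no such input.

```cpp
// Hypothetical cost of the stack-pointer adjustment needed to allocate
// and free a frame; the value 1 is an assumption of this sketch.
constexpr int model_frame_alloc_cost = 1;

// Cost of keeping a value in a spill slot: the memory cost plus, when
// the frame is otherwise empty, the allocation this spill would force.
static int model_spill_cost (int mem_cost, long frame_size)
{
  return mem_cost + (frame_size == 0 ? model_frame_alloc_cost : 0);
}

// True when saving a callee-saved register with push/pop looks cheaper
// than spilling, given the (hypothetical) known frame size.
static bool model_prefer_callee_save (int save_cost, int mem_cost,
                                      long frame_size)
{
  return save_cost < model_spill_cost (mem_cost, frame_size);
}
```

In this model, equal save and spill costs tip towards push/pop only when
the frame is empty, which is the effect described above.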

Also, for old CPUs with no stack prediction engine we split either one or
two push instructions into an adjustment+move pair.  I do not see how to
account for that, since the cost of 1 or 2 registers then differs from
that of 3 or more, but I also think we do not need to care about this,
since all reasonably current CPUs have stack prediction.

I am benchmarking the updated patch and will send it once that is done.

Honza
