RE: Optimized Allocation of Argument registers

Ajit Kumar Agarwal Mon, 24 Nov 2014 03:48:07 -0800

All:

Reducing the saves and restores of callee-saved and caller-saved registers 
has received attention as a way to improve benchmark performance. A 
callee-saved register is saved at procedure entry and restored at procedure 
exit if the register is used inside the procedure, whereas a caller-saved 
register is saved at the call site before the call and restored after it if 
the variable it holds is live across the call.

GCC ports already do some optimization here: when a call-used register is 
live inside the procedure and its bit is set to 1, it is not saved and 
restored. This is based on data flow analysis.

Callee-saved registers are useful when there are multiple calls in the call 
graph, whereas caller-saved registers are useful when the callee is a leaf 
procedure; in that case saving before the call and restoring after the call 
pays off and improves performance.

By traversing the call graph depth-first in bottom-up order, we can propagate 
the save and restore at procedure entry and exit to the upper regions of the 
call graph, which removes the saves and restores from all the lower regions 
across the various lower calls. These decisions can be made based on the 
frequency of the calls in the call graph, as proposed by Fred Chow.
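
To make the bottom-up propagation concrete, here is a minimal Python sketch. 
It is only an illustration under my own assumptions, not GCC code: the call 
graph is closed and acyclic, each procedure has a known execution frequency, 
and the names postorder, propagate_saves and dynamic_cost are hypothetical. 
A callee's saves are hoisted into its callers only when every caller runs 
less often than the callee.

```python
def postorder(call_graph, root):
    """Return procedures so that every callee comes before its callers."""
    seen, order = set(), []

    def visit(proc):
        if proc in seen:
            return
        seen.add(proc)
        for callee in call_graph.get(proc, ()):
            visit(callee)
        order.append(proc)

    visit(root)
    return order


def propagate_saves(call_graph, regs_used, freq, root):
    """Map each procedure to the registers saved at its entry and exit.

    Assumption: the call graph is closed and acyclic, so every caller of a
    procedure is known and can take over that procedure's saves.
    """
    callers = {}
    for proc, callees in call_graph.items():
        for callee in callees:
            callers.setdefault(callee, set()).add(proc)

    saves = {proc: set(regs) for proc, regs in regs_used.items()}
    for proc in postorder(call_graph, root):
        above = callers.get(proc, ())
        # Hoist only if it is cheaper in every caller than in the callee.
        if above and all(freq[c] < freq[proc] for c in above):
            for caller in above:
                saves[caller] |= saves[proc]
            saves[proc] = set()
    return saves


def dynamic_cost(saves, freq):
    """Number of save/restore pairs executed, weighted by frequency."""
    return sum(freq[proc] * len(regs) for proc, regs in saves.items())


# Hypothetical example: main -> f -> g, and main -> h.
call_graph = {"main": ["f", "h"], "f": ["g"], "g": [], "h": []}
regs_used = {"main": set(), "f": {"r5"}, "g": {"r4"}, "h": {"r6"}}
freq = {"main": 1, "f": 10, "g": 100, "h": 1}

before = {proc: set(regs) for proc, regs in regs_used.items()}
after = propagate_saves(call_graph, regs_used, freq, "main")
print(dynamic_cost(before, freq), dynamic_cost(after, freq))
```

In this example the saves of r4 and r5 move from g and f up to main, while h 
keeps its own save of r6 because main does not run less often than h.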

Another approach to reducing the save and restore at procedure entry and exit 
is to move them from the procedure entry and exit to the active regions where 
the variable is live inside the procedure, based on data flow analysis. This 
improves the performance of many benchmarks, as proposed by Fred Chow.
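
The placement of save and restore around the active region can be sketched 
as a small data flow problem. The Python below is a simplified illustration 
under my own assumptions, not GCC code: a single-entry, single-exit, acyclic 
CFG, one register, and a hypothetical function name place_save_restore. It 
uses anticipability (a use is reached on every path from the block) and 
availability (a use has been passed on every path to the block); loops and 
edge splitting are ignored.

```python
def place_save_restore(succs, entry, exit_, uses):
    """Return (save_blocks, restore_blocks) for one callee-saved register.

    succs maps each basic block to its successors; uses is the set of
    blocks that use the register.
    """
    blocks = list(succs)
    preds = {b: [] for b in blocks}
    for b, targets in succs.items():
        for t in targets:
            preds[t].append(b)

    ant_in = {b: True for b in blocks}   # use anticipated at block entry
    av_out = {b: True for b in blocks}   # use available at block exit

    def ant_out(b):
        return b != exit_ and all(ant_in[s] for s in succs[b])

    def av_in(b):
        return b != entry and all(av_out[p] for p in preds[b])

    # Iterate down from the optimistic solution to a fixed point.
    changed = True
    while changed:
        changed = False
        for b in blocks:
            new_ant = b in uses or ant_out(b)
            new_av = b in uses or av_in(b)
            if (new_ant, new_av) != (ant_in[b], av_out[b]):
                ant_in[b], av_out[b] = new_ant, new_av
                changed = True

    # Save at the earliest block where a use becomes anticipated.
    save = {b for b in blocks
            if ant_in[b] and not av_in(b)
            and not any(ant_in[p] for p in preds[b])}
    # Restore at the latest block after which no use remains.
    restore = {b for b in blocks
               if av_out[b] and not ant_out(b)
               and not any(av_out[s] for s in succs[b])}
    return save, restore


# Hypothetical diamond CFG: only the path through A uses the register.
cfg = {"entry": ["A", "B"], "A": ["exit"], "B": ["exit"], "exit": []}
print(place_save_restore(cfg, "entry", "exit", {"A"}))
```

When only block A uses the register, both the save and the restore land in A, 
so the path through B pays nothing. When the entry block itself uses the 
register, the sketch falls back to the usual save at entry and restore at 
exit.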

Neither the propagation of save and restore from the lower regions of the 
call graph to the upper regions, nor the movement of save and restore from 
procedure entry and exit to the active region of liveness inside the 
procedure, is implemented in the GCC framework. Can these be proposed and 
implemented in the GCC framework?

Open procedures, that is, those involved in indirect calls, recursive calls 
or calls with external linkage, cannot be optimized this way, and in the open 
case the save and restore at procedure entry and exit is applied. But if all 
the lower calls of an open procedure are closed and resolved through the call 
graph, the save and restore can be propagated from those closed lower regions 
up to the open procedure. This can also improve the performance of many 
benchmarks. Can this be proposed and implemented in the GCC framework?
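
The open/closed distinction can be sketched as follows. This is a minimal 
Python illustration under my own assumptions, not GCC code: the Proc record, 
its external and address_taken flags, and the functions open_procedures and 
saves_at_open are all hypothetical. A procedure is treated as open if it has 
external linkage, can be the target of an indirect call, or is recursive; 
each open procedure then collects the registers used by the closed 
procedures below it.

```python
from dataclasses import dataclass


@dataclass
class Proc:
    callees: tuple = ()
    external: bool = False        # visible outside the compilation unit
    address_taken: bool = False   # may be the target of an indirect call


def open_procedures(procs):
    """Return the names of procedures whose callers are not all known."""
    def is_recursive(start):
        stack, seen = list(procs[start].callees), set()
        while stack:
            name = stack.pop()
            if name == start:
                return True
            if name not in seen:
                seen.add(name)
                stack.extend(procs[name].callees)
        return False

    return {name for name, p in procs.items()
            if p.external or p.address_taken or is_recursive(name)}


def saves_at_open(procs, regs_used):
    """Map each open procedure to the registers saved at its entry/exit."""
    opened = open_procedures(procs)
    result = {}
    for name in opened:
        regs = set(regs_used[name])
        stack, seen = list(procs[name].callees), set()
        while stack:
            callee = stack.pop()
            if callee in seen or callee in opened:
                continue  # open callees keep their own entry/exit saves
            seen.add(callee)
            regs |= regs_used[callee]
            stack.extend(procs[callee].callees)
        result[name] = regs
    return result


# Hypothetical program: main is external, cb is called indirectly.
procs = {
    "main": Proc(callees=("helper", "cb"), external=True),
    "helper": Proc(callees=("leaf",)),
    "leaf": Proc(),
    "cb": Proc(address_taken=True),
}
regs_used = {"main": set(), "helper": {"r4"}, "leaf": {"r5"}, "cb": {"r6"}}
print(saves_at_open(procs, regs_used))
```

Here helper and leaf are closed, so their registers r4 and r5 are saved once 
in main, while cb stays open and keeps its own save of r6.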

Let me know what you think.

Thanks & Regards
Ajit 

-----Original Message-----
From: Ajit Kumar Agarwal 
Sent: Tuesday, November 18, 2014 7:01 PM
To: 'Vladimir Makarov'; gcc Mailing List
Cc: Vinod Kathail; Shail Aditya Gupta; Vidhumouli Hunsigida; Nagaraju Mekala
Subject: RE: Optimized Allocation of Argument registers

-----Original Message-----
From: Vladimir Makarov [mailto:vmaka...@redhat.com]
Sent: Tuesday, November 18, 2014 1:57 AM
To: Ajit Kumar Agarwal; gcc Mailing List
Cc: Vinod Kathail; Shail Aditya Gupta; Vidhumouli Hunsigida; Nagaraju Mekala
Subject: Re: Optimized Allocation of Argument registers

On 2014-11-17 8:13 AM, Ajit Kumar Agarwal wrote:
> Hello All:
>
> I was looking at the optimized usage and allocation of argument registers. 
> There are two aspects to it, as follows.
>
> 1. We need to specify the argument registers, as defined by the ABI, in the 
> target-specific code. Function arguments are passed in the argument 
> registers defined in the target-dependent code. Suppose the number of 
> argument registers defined by the architecture is large, say 6 or 8. 
> Most of the time the benchmarks don't pass that many arguments, and the 
> number of arguments passed is much smaller. Since we reserve the function 
> argument registers as specified in the target-specific code for the given 
> architecture, this leads to unoptimized usage, as those argument registers 
> will not be used in the function. 
> Thus we need to steal some of the argument registers and use them in the 
> function, depending on how many function argument registers are supported. 
> Stealing function argument registers makes more registers available for 
> use in the function, leading to fewer spills and fetches.
>

>>The argument registers should not be reserved.  They should be present in 
>>RTL and the RA will figure out by itself when it can use them. 
>>That is how other ports work.

Thanks, Vladimir, for the clarifications.

> 2. The other aspect of the function argument registers is to avoid spilling 
> and fetching them, as they are live across the function call. But the 
> liveness is limited to a certain point in the called function; after that 
> point the function argument registers are not live and can be used inside 
> the called function. Another aspect: if there is a shortage of registers, 
> should the function argument registers be used as spill candidates? Will 
> this lead to optimized code?
>
>

>>You can remove unnecessary code to save/restore arg registers around calls if 
>>you can figure out that they are not used in called functions. 
>>There is already code for this written by Tom de Vries.  So you can use it.

Is the code written by Tom de Vries already part of trunk?  Could you please 
give a pointer to this part of the code?

Thanks & Regards
Ajit
