Ajit,

Please check out the -fshrink-wrap option.
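
As a rough illustration of what that option targets, here is a minimal sketch
of a function with an early-exit path that needs no callee-saved registers.
The names process and mix are made up for this example and are not taken from
GCC; the noinline attribute is only there to keep the call in place.

```c
#include <stddef.h>

/* Hypothetical helper: kept out of line so that the loop below really
   contains a call, which keeps values live across it. */
__attribute__((noinline)) long mix(long acc, long x)
{
    return acc * 31 + x;
}

/* Hypothetical example function. */
long process(const long *p, long n)
{
    /* Early exit: no call is made here, so this path has no need for
       callee-saved registers or their save/restore. */
    if (p == NULL || n <= 0)
        return -1;

    /* Slow path: acc, p, i and n are live across the call to mix(), so
       the compiler will typically keep them in callee-saved registers.
       With shrink-wrapping, the saves and restores may be placed around
       this region only, instead of at function entry and exit. */
    long acc = 0;
    for (long i = 0; i < n; i++)
        acc = mix(acc, p[i]);
    return acc;
}
```

To see whether it makes a difference on a given target, compare the assembly
from "gcc -O2 -S" with and without -fno-shrink-wrap. As far as I know the
option is already enabled at -O and above, so the result depends on the
target and GCC version.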


~Umesh

On Mon, Nov 24, 2014 at 5:17 PM, Ajit Kumar Agarwal
<ajit.kumar.agar...@xilinx.com> wrote:
> All:
>
> Reducing the save and restore of callee-saved and caller-saved registers
> has received attention as a way to improve benchmark performance. A
> callee-saved register is saved at procedure entry and restored at
> procedure exit if it is reused inside the procedure, whereas a
> caller-saved register is saved before a call and restored after it, at the
> call site, if the variable it holds is live across the call.
>
> The GCC port already does some optimization here: if a call-used register
> is live inside the procedure and its bit is set to 1, it is not saved and
> restored. This is based on data-flow analysis.
>
> Callee-saved registers are useful when there are multiple calls in the
> call graph, whereas caller-saved registers are useful when the called
> procedure is a leaf: saving before the call and restoring after it then
> pays off and improves performance.
>
> By traversing the call graph depth-first and bottom-up, we can propagate
> the save and restore at procedure entry and exit to the upper regions of
> the call graph, which reduces the saves and restores in the lower regions
> across the various lower calls. These decisions can be made based on the
> frequency of the calls in the call graph, as proposed by Fred Chow.
>
> Another approach to reducing the save and restore at procedure entry and
> exit is to move them from the entry and exit to the active regions where
> the variable is live inside the procedure, based on data-flow analysis.
> As proposed by Fred Chow, this improves the performance of many
> benchmarks.
>
> Neither the propagation of save and restore from the lower regions of the
> call graph to the upper regions, nor the movement of save and restore from
> the entry and exit to the active region of liveness inside the procedure,
> is implemented in the GCC framework. Can this be proposed and implemented
> in the GCC framework?
>
> Open procedures, that is, those involving indirect calls, recursive calls
> or calls with external linkage, cannot be optimized this way, and in the
> open case the save and restore is applied at the entry and exit of the
> procedure. But if all the lower calls of an open procedure are closed and
> resolved through the call graph, the save and restore can be propagated
> from the lower region of closed, resolved calls up to the open procedure.
> This can also improve the performance of many benchmarks. Can this be
> proposed and implemented in the GCC framework?
>
> Let me know what you think.
>
> Thanks & Regards
> Ajit
>
> -----Original Message-----
> From: Ajit Kumar Agarwal
> Sent: Tuesday, November 18, 2014 7:01 PM
> To: 'Vladimir Makarov'; gcc Mailing List
> Cc: Vinod Kathail; Shail Aditya Gupta; Vidhumouli Hunsigida; Nagaraju Mekala
> Subject: RE: Optimized Allocation of Argument registers
>
>
>
> -----Original Message-----
> From: Vladimir Makarov [mailto:vmaka...@redhat.com]
> Sent: Tuesday, November 18, 2014 1:57 AM
> To: Ajit Kumar Agarwal; gcc Mailing List
> Cc: Vinod Kathail; Shail Aditya Gupta; Vidhumouli Hunsigida; Nagaraju Mekala
> Subject: Re: Optimized Allocation of Argument registers
>
> On 2014-11-17 8:13 AM, Ajit Kumar Agarwal wrote:
>> Hello All:
>>
>> I was looking at the optimized usage and allocation of argument
>> registers. There are two aspects to it, as follows.
>>
>> 1. We need to specify the argument registers in the target-specific code,
>> as required by the ABI. Function arguments are passed in the argument
>> registers defined in that target-dependent code. Suppose the architecture
>> defines a large number of function argument registers, say 6 or 8. Most
>> of the time the benchmarks don't pass that many arguments; the number of
>> arguments passed is much smaller. Since we reserve the function argument
>> registers specified in the target-specific code for the given
>> architecture, this leads to unoptimized usage, as the remaining argument
>> registers are not used in the function. Thus we need to steal some of the
>> argument registers and use them in the function, depending on how many
>> function argument registers are supported. Stealing function argument
>> registers makes more registers available for use in the function and
>> leads to fewer spills and fetches.
>>
>>
>
>>>The argument registers should not be reserved.  They should be present in
>>>RTL and the register allocator will figure out by itself when it can use
>>>them.  That is how other ports work.
>
> Thanks, Vladimir, for the clarification.
>
>> 2. The other aspect of the function argument registers is avoiding the
>> spill and fetch of argument registers that are live across a function
>> call. Their liveness is limited to a certain point in the called
>> function; after that point the argument registers are no longer live
>> and can be used inside the called function. A further question: if there
>> is a shortage of registers, can the function argument registers be used
>> as spill candidates? Will this lead to optimized code?
>>
>>
>
>>>You can remove unnecessary code to save/restore arg registers around calls 
>>>if you can figure out that they are not used in called functions.
>>>There is already code for this written by Tom de Vries.  So you can use
>>>it.
>
> Is the code written by Tom de Vries already part of trunk?  Could you
> please give a pointer to this part of the code?
>
> Thanks & Regards
> Ajit
