Ajit,

Please check out the -fshrink-wrap option.
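To make that concrete, here is a minimal sketch of the kind of function where shrink-wrapping can help. The function names (slow_sum, sum_twice) are made up for illustration, and whether GCC actually moves the prologue here depends on the target and compiler version, so treat it as something to experiment with rather than a guaranteed result.

```c
#include <stddef.h>

/* Hypothetical slow path. noinline keeps the call visible in the
   generated assembly so the save/restore code is easy to spot. */
__attribute__((noinline))
long slow_sum(const long *v, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += v[i];
    return total;
}

/* The early return makes no calls and keeps nothing live across a
   call, so it needs no callee-saved registers. Only the path with the
   two calls does, which makes this a candidate for emitting the
   prologue and epilogue around that path instead of at function entry. */
long sum_twice(const long *v, size_t n)
{
    if (n == 0)
        return 0;                      /* fast path */

    long first = slow_sum(v, n);       /* v and n are live across this call */
    long second = slow_sum(v, n / 2);  /* first is live across this call */
    return first + second;
}
```

Comparing the assembly from "gcc -O2 -fshrink-wrap -S" with "gcc -O2 -fno-shrink-wrap -S" should show whether the register saves moved off the fast path on your target.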
~Umesh

On Mon, Nov 24, 2014 at 5:17 PM, Ajit Kumar Agarwal <ajit.kumar.agar...@xilinx.com> wrote:
> All:
>
> Reducing the saves and restores of callee-saved and caller-saved
> registers has been a focus for improving benchmark performance.
> A callee-saved register is saved at the entry and restored at the exit
> of a procedure if the register is reused inside the procedure, whereas
> a caller-saved register is saved before a call and restored after it,
> at the call site, if the variable it holds is live across the call.
>
> The GCC port already does some optimization here: if the call-used
> registers are live inside the procedure and their bit has been set to 1,
> they are not saved and restored. This is based on data flow analysis.
>
> Callee-saved registers are useful when there are multiple calls in the
> call graph, whereas caller-saved registers are useful when the call is
> to a leaf procedure; in that case saving before the call and restoring
> after the call is useful and increases performance.
>
> By traversing the call graph in depth-first order, bottom-up, we can
> propagate the saves and restores at procedure entry and exit to the
> upper regions of the call graph, which reduces the saves and restores
> in all the lower regions across the various lower calls. These decisions
> can be made based on the frequency of the calls in the call graph, as
> proposed by Fred Chow.
>
> Another approach to reducing the saves and restores at procedure entry
> and exit is to move them from the procedure entry to the active regions
> where the variable is live inside the procedure, based on data flow
> analysis, and thus improve the performance of many benchmarks, as
> proposed by Fred Chow.
>
> The propagation of saves and restores from the lower regions of the
> call graph to the upper regions is not implemented in the GCC framework,
> and moving the saves and restores from the entry and exit to the active
> region of liveness inside the procedure is not implemented in the GCC
> framework either. Can this be proposed and implemented in the GCC
> framework?
>
> Open procedures, that is, those with indirect calls, recursive calls and
> calls with external linkage, cannot be optimized, and in the open case
> the saves and restores at the entry and exit of the procedure are
> applied. But for an open procedure, if all the lower calls in the call
> graph are closed and resolved through the call graph, the saves and
> restores can be propagated to the upper region in the open procedure
> from the lower region of the calls that are closed and resolved. This
> can also improve the performance of many benchmarks. Can this be
> proposed and implemented in the GCC framework?
>
> Let me know what you think.
>
> Thanks & Regards
> Ajit
>
> -----Original Message-----
> From: Ajit Kumar Agarwal
> Sent: Tuesday, November 18, 2014 7:01 PM
> To: 'Vladimir Makarov'; gcc Mailing List
> Cc: Vinod Kathail; Shail Aditya Gupta; Vidhumouli Hunsigida; Nagaraju Mekala
> Subject: RE: Optimized Allocation of Argument registers
>
>
>
> -----Original Message-----
> From: Vladimir Makarov [mailto:vmaka...@redhat.com]
> Sent: Tuesday, November 18, 2014 1:57 AM
> To: Ajit Kumar Agarwal; gcc Mailing List
> Cc: Vinod Kathail; Shail Aditya Gupta; Vidhumouli Hunsigida; Nagaraju Mekala
> Subject: Re: Optimized Allocation of Argument registers
>
> On 2014-11-17 8:13 AM, Ajit Kumar Agarwal wrote:
>> Hello All:
>>
>> I was looking at the optimized usage and allocation of argument registers.
>> There are two aspects to it, as follows.
>>
>> 1. We need to specify the argument registers, as required by the ABI, in the
>> target-specific code.
>> Based on the function argument registers defined in the target-dependent
>> code, the function arguments are passed. Suppose the number of argument
>> registers defined by the architecture is large, say 6 or 8. Most of the
>> time in the benchmarks we don't pass that many arguments, and the number
>> of arguments passed is much smaller. Since we reserve the function
>> argument registers as specified in the target-specific code for the given
>> architecture, this leads to unoptimized usage, as these function argument
>> registers will not be used in the function. Thus we need to steal some of
>> the argument registers and use them in the function, depending on how
>> many function argument registers are supported. Stealing function
>> argument registers makes more registers available for use in the
>> function, leading to less spill and fetch.
>>
>
>>>The argument registers should be not reserved. They should be present in
>>>RTL and RA allocator will figure out itself when it can use them.
>>>That is how other ports work.
>
> Thanks Vladimir for the clarifications.
>
>> 2. The other aspect of the function argument registers is not to spill
>> and fetch them, as they are live across the function call. But their
>> liveness is limited to a certain point in the called function; after
>> that point the function argument registers are not live and can be used
>> inside the called function. Another aspect: if there is a shortage of
>> registers, should the function argument registers be used as spill
>> candidates? Will this lead to optimized code?
>>
>>
>
>>>You can remove unnecessary code to save/restore arg registers around calls
>>>if you can figure out that they are not used in called functions.
>
>>>There is already code for this written by Tom de Vries. So you can use it.
>
> Is the code written by Tom de Vries already a part of trunk?
> Could you please give me a pointer to this part of the code?
>
> Thanks & Regards
> Ajit
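
On the call-graph part of the quoted proposal, the bottom-up idea can be illustrated with a much simpler scheme than Fred Chow's: summarise, callee first, which registers each closed procedure may clobber, and let a caller save only the live registers that appear in that summary. This is only a sketch under stated assumptions. It is not GCC's implementation and not Chow's algorithm; every name below (struct func, summarize, regs_to_save) is hypothetical, registers are modelled as bits in a 32-bit mask, and recursive procedures are assumed to be marked open so the remaining graph is acyclic. GCC documents an -fipa-ra option that works in a similar spirit within one compilation unit.

```c
#include <stdint.h>

#define MAX_CALLEES 4
#define ALL_REGS UINT32_MAX

/* Hypothetical call-graph node; none of these names come from GCC. */
struct func {
    uint32_t own_clobbers;             /* registers written by the body itself */
    int is_open;                       /* indirect, recursive or external calls */
    int ncallees;
    struct func *callees[MAX_CALLEES];
    int visited;
    uint32_t summary;                  /* registers clobbered by body and callees */
};

/* Post-order depth-first walk: callees are summarised before the caller.
   An open procedure is conservatively assumed to clobber every register. */
uint32_t summarize(struct func *f)
{
    if (f->visited)
        return f->summary;
    f->visited = 1;

    if (f->is_open) {
        f->summary = ALL_REGS;
        return f->summary;
    }

    f->summary = f->own_clobbers;
    for (int i = 0; i < f->ncallees; i++)
        f->summary |= summarize(f->callees[i]);
    return f->summary;
}

/* Of the registers live across a call, only those the callee may
   clobber need a save before the call and a restore after it. */
uint32_t regs_to_save(uint32_t live_across_call, struct func *callee)
{
    return live_across_call & summarize(callee);
}
```

A frequency-based decision of the kind the proposal mentions would sit on top of such a summary; it is left out here.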