All:

Reducing the saves and restores of callee-saved and caller-saved registers has received attention as a way of improving benchmark performance. A callee-saved register is saved at the entry of a procedure and restored at its exit if the register is reused inside the procedure. A caller-saved register is saved at the call site before the call and restored after it if the variable it holds is live across the call.
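As a rough illustration of the convention described above, the following sketch counts the save/restore pairs one procedure would need under each register class. The register names, the `Proc` structure and the cost model (one pair per saved register, each call site executed once) are assumptions made for illustration only; none of this is GCC code.

```python
from dataclasses import dataclass, field

# Hypothetical register classes; a real ABI defines its own sets.
CALLEE_SAVED = {"r19", "r20", "r21"}
CALLER_SAVED = {"r3", "r4", "r5"}

@dataclass
class Proc:
    name: str
    used: set  # registers written anywhere in the body
    # One set per call site: registers holding values live across that call.
    live_across_call: list = field(default_factory=list)

def callee_saves(proc):
    """Callee-saved registers the procedure saves at entry and restores at exit."""
    return proc.used & CALLEE_SAVED

def caller_saves(proc):
    """Per call site, the caller-saved registers saved before and restored after the call."""
    return [live & CALLER_SAVED for live in proc.live_across_call]

def save_restore_pairs(proc):
    """Save/restore pairs for one invocation, assuming each call site runs once."""
    return len(callee_saves(proc)) + sum(len(s) for s in caller_saves(proc))

# A procedure that writes r3, r19 and r20 and contains two call sites.
p = Proc("f", used={"r3", "r19", "r20"},
         live_across_call=[{"r3", "r19"}, {"r3"}])
pairs = save_restore_pairs(p)
```

In this example r19 and r20 are saved once at entry, while r3 is saved around both calls because it is live across each of them.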
The GCC port already does some of this optimization: when a call-used register is live inside the procedure and its bit is set to 1, it is not saved and restored. This is based on data flow analysis.

Callee-saved registers are useful when there are multiple calls in the call graph, whereas caller-saved registers are useful when the called procedure is a leaf, in which case saving before the call and restoring after it pays off and improves performance. By traversing the call graph bottom-up in depth-first order, we can propagate the save and restore from procedure entry and exit to the upper regions of the call graph, which removes the saves and restores from the lower regions across the various lower-level calls. These decisions can be made based on the frequency of the calls in the call graph, as proposed by Fred Chow.

Another approach to reducing the save and restore at procedure entry and exit is to move them from the entry and exit to the active regions inside the procedure where the variable is live, based on data flow analysis. This improves the performance of many benchmarks, as proposed by Fred Chow.

Neither the propagation of save and restore from the lower regions of the call graph to the upper regions, nor the moving of save and restore from procedure entry and exit to the active region of liveness inside the procedure, is implemented in the GCC framework. Can this be proposed and implemented in the GCC framework?

Open procedures, that is, those reached through indirect calls, recursive calls or calls with external linkage, cannot be optimized this way; in the open case the save and restore is applied at the entry and exit of the procedure.
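The bottom-up traversal described above could be sketched roughly as follows. This is a minimal model, not GCC code and not Chow's exact algorithm: the function name `propagate_saves`, the dictionary-based call graph and the register names are all hypothetical. It assumes the call graph is acyclic, and it assumes that a closed procedure performs no saves itself and instead reports the callee-saved registers it clobbers to its callers, while an open procedure keeps the save and restore at its own entry and exit.

```python
def propagate_saves(calls, used, is_open):
    """Bottom-up (post-order) propagation of callee-saved register saves.

    calls:   procedure -> list of callees (assumed acyclic)
    used:    procedure -> set of callee-saved registers it writes
    is_open: procedure -> True if its callers are not all known
    Returns (saves, summary): the registers each procedure saves at its
    entry/exit, and the registers each procedure reports as clobbered.
    """
    saves, summary = {}, {}

    def visit(p):
        if p in summary:            # already processed via another caller
            return summary[p]
        clobbered = set(used[p])
        for callee in calls.get(p, []):
            clobbered |= visit(callee)   # callees are handled first
        if is_open[p]:
            # Unknown callers expect the ABI: save everything here.
            saves[p] = clobbered
            summary[p] = set()
        else:
            # Closed: defer the saves to the callers.
            saves[p] = set()
            summary[p] = clobbered
        return summary[p]

    for p in used:
        visit(p)
    return saves, summary

# Hypothetical call graph: main -> f -> {g, h}; only main is open.
calls = {"main": ["f"], "f": ["g", "h"], "g": [], "h": []}
used = {"main": {"r19"}, "f": {"r20"}, "g": {"r20", "r21"}, "h": {"r22"}}
is_open = {"main": True, "f": False, "g": False, "h": False}
saves, summary = propagate_saves(calls, used, is_open)
```

Here f, g and h no longer save anything, and r20 is saved once in main rather than in both f and g. A fuller version would weigh call frequencies before deciding whether moving a save upward is worthwhile.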
But for an open procedure, if all the lower calls in the call graph are closed and resolved through the call graph, the save and restore can be propagated from those closed and resolved lower calls up to the open procedure. This can also improve the performance of many benchmarks. Can this be proposed and implemented in the GCC framework?

Let me know what you think.

Thanks & Regards
Ajit

-----Original Message-----
From: Ajit Kumar Agarwal
Sent: Tuesday, November 18, 2014 7:01 PM
To: 'Vladimir Makarov'; gcc Mailing List
Cc: Vinod Kathail; Shail Aditya Gupta; Vidhumouli Hunsigida; Nagaraju Mekala
Subject: RE: Optimized Allocation of Argument registers

-----Original Message-----
From: Vladimir Makarov [mailto:vmaka...@redhat.com]
Sent: Tuesday, November 18, 2014 1:57 AM
To: Ajit Kumar Agarwal; gcc Mailing List
Cc: Vinod Kathail; Shail Aditya Gupta; Vidhumouli Hunsigida; Nagaraju Mekala
Subject: Re: Optimized Allocation of Argument registers

On 2014-11-17 8:13 AM, Ajit Kumar Agarwal wrote:
> Hello All:
>
> I was looking at the optimized usage and allocation to argument registers.
> There are two aspects to it as follows.
>
> 1. We need to specify the argument registers as followed by ABI in the target
> specific code. Based on the function
> argument registers defined in the target dependent code the function
> argument registers are passed. If the
> number of argument registers defined in the Architecture is large say 6/8
> function argument registers.
> Most of the time in the benchmarks we don't pass so many arguments and the
> number of arguments passed
> is quite less. Since we reserve the function arguments as specified
> in the target specific code for the given architecture, leads to unoptimized
> usage as this function argument registers will not be used in the function.
> Thus we need to steal some of the arguments registers and have the
> usage of those in the function depending on the support of the number of
> function argument registers. The stealing of function argument registers will
> lead more number of registers available that are to be used in the function
> and leading to less spill and fetch.
>

>>The argument registers should be not reserved. They should be present in RTL
>>and RA allocator will figure out itself when it can use them.
>>That is how other ports work.

Thanks, Vladimir, for the clarifications.

> 2. The other aspect of the function argument registers is not spill
> and fetch the argument registers as they are live across the function
> call. But the liveness is limited to certain point of the called
> function after that point the function argument registers are not live
> and can be used inside the called function. Other aspect is if there is a
> shortage of registers than can the function argument registers should be used
> as spill candidate? Will this lead to the optimized code.
>
>

>>You can remove unnecessary code to save/restore arg registers around calls if
>>you can figure out that they are not used in called functions.
>> There is already code for this written by Tom de Vries. So you can use it.

Is the code written by Tom de Vries already part of trunk? Could you please give a pointer to this part of the code?

Thanks & Regards
Ajit