Crossposting to gcc.

On Fri, Feb 01, 2013 at 08:52:48AM -0800, Richard Henderson wrote:
> On 01/31/2013 04:37 AM, Ondřej Bílka wrote:
> >To also use my implementation of strlen in runtime linker
> >use following patch.
> >
> >It uses fact that xmm are call clobbered and only xmm0-xmm7 can be
> >used to pass arguments so xmm8-xmm15 are available.
>
> FYI, on the gcc list, in the context of Cilk+, Intel have been talking
> about a new calling convention for "vector" functions that would in
> fact use all 16 sse registers for argument passing.
>
> So, please no.
>
And did they provide any example where it would lead to simpler code
and improve performance?
It would help only when a function passes 9 or more floats/vectors,
and functions that need this are not performance critical.

A calling convention that would help is one that keeps arguments
passed in xmm0-7 but makes xmm2-7 caller save. This could be
specified by a fastcall attribute.

This would help quite often: there is an optimization rule not to call
any function in the middle of vector float calculations, because
pushing/popping the vectors on the stack easily increases the cost of
a call by 20 cycles.

>
> r~
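
To illustrate the cost mentioned above, here is a minimal sketch, not
taken from the patch. It assumes the x86-64 SysV ABI, where every xmm
register is call clobbered. The names scale() and sum_after_call() are
hypothetical. The three vectors a, b and c are live across the call,
so unless the compiler can move the loads past the call it typically
has to spill them to the stack before the call and reload them after.

```c
#include <xmmintrin.h>

/* Hypothetical helper; noinline so that a real call is emitted. */
__attribute__((noinline)) float scale(float x)
{
    return x * 0.5f;
}

/* Sums 12 floats from p and multiplies the total by scale(x).
 * a, b and c are live across the call to scale(); with all xmm
 * registers call clobbered they may need to be saved and restored
 * around it. */
float sum_after_call(const float *p, float x)
{
    __m128 a = _mm_loadu_ps(p);
    __m128 b = _mm_loadu_ps(p + 4);
    __m128 c = _mm_loadu_ps(p + 8);

    float s = scale(x);   /* the call in the middle of vector work */

    __m128 r = _mm_add_ps(_mm_add_ps(a, b), c);
    r = _mm_mul_ps(r, _mm_set1_ps(s));

    float out[4];
    _mm_storeu_ps(out, r);
    return out[0] + out[1] + out[2] + out[3];
}
```

Compiling this with gcc -O2 -S and looking for stores to and loads
from the stack around the call to scale() shows whether the spill
happens for a given compiler version; I have not measured the cycle
cost for this particular sketch.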