On Fri, 2007-10-26 at 20:26 -0400, Robert Dewar wrote: > skaller wrote: > > > So I am guessing the Felix version is lucky there are > > no gratuitous temporaries to be saved when this happens, > > and the C code is unlucky and there are. > > > > Maybe someone who knows how the optimiser works can comment? > > One problem with departing from the ABI even on a local level > like this is that it wipes out lots of tools that depend on > ABI compliance for the entire call chain. I suspect the overall > gain is too small to be worth this hit.
By 'tools' I presume you mean functions in the gcc compiler itself? Or at least tools that examine intermediate language outputs? I suspect the opposite for two reasons: recursion is common for FPLs such as Mlton, GHC and Felix which target C. Also, 'wrapper functions' where the outer function is ABI compliant and the inner one is not, have a use in loops and indeed any code and probably make more difference to optimisation than any low level considerations such as whether a variable is aliased in a register (these days, caches do that automatically anyhow). Without wrappers you're going to have gratuitous loads and stores of caller save registers, whereas with a wrapper you can avoid this by calling the inner function knowing it doesn't actually use that register. in turn the inner function can be tuned to the needs of the call site(s). This is both important and easiest for a simple self-recursion, since the only call site is in the inner function itself, and with recursion any temporary persisting across a call is an instant massive performance hit. In fact, ABI compliance for any ABI with 'caller save' registers disables tail-rec optimisation, so there is not really any choice at all but to do this optimisation. A trivial example is a C++ list destructor: struct Node { int x; Node *next; ~Node() { delete next; } }; That's a tail recursion. If the destructor caller save registers are pushed before the recursive delete, and popped afterwards, the function will kill your program by pushing lots of registers (including the return address) onto the stack, for every element in the list. Make a list with a few million elements (quite common), and you can interpret 'kill' literally: the program will segfault because it is out of stack. No stack at all is required to implement this function if you use an inner wrapper and so make a tail call possible. -- John Skaller <skaller at users dot sf dot net> Felix, successor to C++: http://felix.sf.net