On Fri, 2007-10-26 at 20:26 -0400, Robert Dewar wrote:
> skaller wrote:
> 
> > So I am guessing the Felix version is lucky there are
> > no gratuitous temporaries to be saved when this happens,
> > and the C code is unlucky and there are.
> > 
> > Maybe someone who knows how the optimiser works can comment?
> 
> One problem with departing from the ABI even on a local level
> like this is that it wipes out lots of tools that depend on
> ABI compliance for the entire call chain. I suspect the overall
> gain is too small to be worth this hit.

By 'tools' I presume you mean functions in the gcc compiler
itself? Or at least tools that examine intermediate language
outputs?

I suspect the opposite for two reasons: recursion is common
for FPLs such as Mlton, GHC and Felix which target C.

Also, 'wrapper functions' where the outer function is
ABI compliant and the inner one is not, have a use in
loops and indeed any code and probably make more difference
to optimisation than any low level considerations such as
whether a variable is aliased in a register (these days,
caches do that automatically anyhow).

Without wrappers you're going to have gratuitous loads
and stores of caller save registers, whereas with a wrapper
you can avoid this by calling the inner function knowing
it doesn't actually use that register. in turn the inner function 
can be tuned to the needs of the call site(s).

This is both important and easiest for a simple self-recursion,
since the only call site is in the inner function itself,
and with recursion any temporary persisting across a call
is an instant massive performance hit.

In fact, ABI compliance for any ABI with 'caller save' registers
disables tail-rec optimisation, so there is not really any choice 
at all but to do this optimisation. A trivial example is a C++
list destructor:

        struct Node { int x; Node *next;
                ~Node() { delete next; }
        };

That's a tail recursion. If the destructor caller save registers
are pushed before the recursive delete, and popped afterwards,
the function will kill your program by pushing
lots of registers (including the return address) onto the stack,
for every element in the list. Make a list with a few million
elements (quite common), and you can interpret 'kill' literally:
the program will segfault because it is out of stack.

No stack at all is required to implement this function if you use
an inner wrapper and so make a tail call possible.



-- 
John Skaller <skaller at users dot sf dot net>
Felix, successor to C++: http://felix.sf.net

Reply via email to