"musttail" statement attribute for GCC?

2021-04-23 Thread Josh Haberman via Gcc
Would it be feasible to implement a "musttail" statement attribute in
GCC to get a guarantee that tail call optimization will be performed?

Such an attribute has just landed in the trunk of Clang
(https://reviews.llvm.org/D99517). It makes it possible to write
algorithms that use arbitrarily long chains of tail calls without risk
of blowing the stack. I would love to see something like this land in
GCC also (ultimately I'd love to see it standardized).

I wrote more about some motivation for guaranteed tail calls here:
https://blog.reverberate.org/2021/04/21/musttail-efficient-interpreters.html

GCC successfully optimizes tail calls in many cases already. What
would it take to provide an actual guarantee, and make it apply to
non-optimized builds also?

The Clang implementation enforces several rules that must hold for the
attribute to be correct, including:

- It must be on a function call that is tail position.
- Caller and callee must have compatible function signatures
(including the implicit "this", if any), differing only in cv
qualifiers.
- Caller and callee must use the same calling convention.
- Caller and callee may not be constructors or destructors.
- All arguments, the return type, and any temporaries created must be
trivially destructible.
- All variables currently in scope must be trivially destructible.
- The caller cannot be in a try block, an Objective-C block, or a
statement expression.

Some of these restrictions may be overly strict for some calling
conventions, but they are useful as the "least common denominator"
that should be safe in all cases. When implementing this in Clang, we
found that we were able to reuse some of the checks around goto and
asm goto, since a tail call is sort of like a goto out of the
function's scope.

Thanks,
Josh


Re: "musttail" statement attribute for GCC?

2021-04-23 Thread Josh Haberman via Gcc
David Malcolm via Gcc  wrote:
> FWIW I implemented something like this in GCC's middle-end here:
> https://gcc.gnu.org/git/?p=gcc.git;a=commitdiff;h=9a385c2d3d74ffed78f2ed9ad47b844d2f294105
> exposing it in API form for libgccjit:
> https://gcc.gnu.org/git/?p=gcc.git;a=commitdiff;h=15c671a79ca66df5b1de70dd1a0b78414fe003ef
> https://gcc.gnu.org/onlinedocs/jit/topics/expressions.html#c.gcc_jit_rvalue_set_bool_require_tail_call
> but IIRC it's not yet exposed to the regular GCC frontends.

That's great to hear that there is some existing infrastructure around this.

Your code includes some checks that we should consider adding to Clang
(especially the check against "returns twice"). Why the prohibition
against noreturn though? I've found that tail calls to noreturn
functions are useful for longjmp()-based error handling, and musttail
could be a useful workaround for
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=10837#c2.

On Fri, Apr 23, 2021 at 1:10 PM Iain Sandoe  wrote:
> I did try to use it this ^ for GCC coroutines (where such a guarantee is
> pretty important)
>
> However, the issue there is that not all targets support indirect tailcalls.

I'm not familiar with this limitation. What targets do not support
indirect tail calls?

> What about tailcalls between DSOs?

What issues are there around DSOs? Clang/LLVM don't treat these
specially AFAIK, and it seems that tail calls through a PLT should
work okay?

Thanks,
Josh


Re: "musttail" statement attribute for GCC?

2021-04-23 Thread Josh Haberman via Gcc
On Fri, Apr 23, 2021 at 4:34 PM Andrew Pinski  wrote:
>
> On Fri, Apr 23, 2021 at 2:45 PM Josh Haberman via Gcc  wrote:
> >
> > I wrote more about some motivation for guaranteed tail calls here:
> > https://blog.reverberate.org/2021/04/21/musttail-efficient-interpreters.html
>
> So I was looking into the use case presented in that web page, it is
> the same reason why GCC has the computed gotos extension already.
> Is there a reason why you can't just use that instead?

Computed gotos solve some of the problems associated with interpreter
main loops, but not others.

Computed gotos help avoid a bounds check on the switch(). They can
sometimes help ensure that the dispatch jmp is replicated at the end
of each case, thus giving each case a unique branch prediction
address, though this effect is somewhat fragile since the compiler
will sometimes decide to merge these paths into one:
https://godbolt.org/z/T9a6173WP

Computed gotos cannot help with the two main issues mentioned in the article:
1. The larger a function is, and the more complex and connected its
control flow, the harder it is for the compiler’s register allocator
to keep the most important data in registers.
2. When fast paths and slow paths are intermixed in the same function,
the presence of the slow paths compromises the code quality of the
fast paths.

Tail calls can help solve both of these problems:
1. If we break down the big interpreter function into many small
functions, one per operation, we get control over register allocation
at each function boundary. As long as each function is simple enough
not to spill, this data will be kept in registers continuously through
all the fast paths.
2. If we put fast paths and slow paths in separate functions entirely,
then the slow paths cannot negatively affect the code generation of
the fast paths.

We spent a lot of time trying to work within the framework of computed
gotos but were unable to get the results we wanted. Tail calls
generated substantially better code than anything else we tried.

There are also some practical reasons why tail calls are better for
us, particularly that they make it easier to reuse the code across
different "instantiations" of the interpreter. For the case of
protobuf parsing, each protobuf message has a distinct set of fields
with different field numbers and types. With tail calls we can create
a distinct dispatch table for each message type, but the function
pointers in these tables can point to a set of functions that are
shared across all message types.

Thanks,
Josh