"musttail" statement attribute for GCC?
Would it be feasible to implement a "musttail" statement attribute in GCC to get a guarantee that tail call optimization will be performed? Such an attribute has just landed in the trunk of Clang (https://reviews.llvm.org/D99517). It makes it possible to write algorithms that use arbitrarily long chains of tail calls without risk of blowing the stack. I would love to see something like this land in GCC also (ultimately I'd love to see it standardized). I wrote more about some motivation for guaranteed tail calls here: https://blog.reverberate.org/2021/04/21/musttail-efficient-interpreters.html GCC successfully optimizes tail calls in many cases already. What would it take to provide an actual guarantee, and make it apply to non-optimized builds also? The Clang implementation enforces several rules that must hold for the attribute to be correct, including: - It must be on a function call that is tail position. - Caller and callee must have compatible function signatures (including the implicit "this", if any), differing only in cv qualifiers. - Caller and callee must use the same calling convention. - Caller and callee may not be constructors or destructors. - All arguments, the return type, and any temporaries created must be trivially destructible. - All variables currently in scope must be trivially destructible. - The caller cannot be in a try block, an Objective-C block, or a statement expression. Some of these restrictions may be overly strict for some calling conventions, but they are useful as the "least common denominator" that should be safe in all cases. When implementing this in Clang, we found that we were able to reuse some of the checks around goto and asm goto, since a tail call is sort of like a goto out of the function's scope. Thanks, Josh
Re: "musttail" statement attribute for GCC?
David Malcolm via Gcc wrote: > FWIW I implemented something like this in GCC's middle-end here: > https://gcc.gnu.org/git/?p=gcc.git;a=commitdiff;h=9a385c2d3d74ffed78f2ed9ad47b844d2f294105 > exposing it in API form for libgccjit: > https://gcc.gnu.org/git/?p=gcc.git;a=commitdiff;h=15c671a79ca66df5b1de70dd1a0b78414fe003ef > https://gcc.gnu.org/onlinedocs/jit/topics/expressions.html#c.gcc_jit_rvalue_set_bool_require_tail_call > but IIRC it's not yet exposed to the regular GCC frontends. That's great to hear that there is some existing infrastructure around this. Your code includes some checks that we should consider adding to Clang (especially the check against "returns twice"). Why the prohibition against noreturn though? I've found that tail calls to noreturn functions are useful for longjmp()-based error handling, and musttail could be a useful workaround for https://gcc.gnu.org/bugzilla/show_bug.cgi?id=10837#c2. On Fri, Apr 23, 2021 at 1:10 PM Iain Sandoe wrote: > I did try to use it this ^ for GCC coroutines (where such a guarantee is > pretty important) > > However, the issue there is that not all targets support indirect tailcalls. I'm not familiar with this limitation. What targets do not support indirect tail calls? > What about tailcalls between DSOs? What issues are there around DSOs? Clang/LLVM don't treat these specially AFAIK, and it seems that tail calls through a PLT should work okay? Thanks, Josh
Re: "musttail" statement attribute for GCC?
On Fri, Apr 23, 2021 at 4:34 PM Andrew Pinski wrote: > > On Fri, Apr 23, 2021 at 2:45 PM Josh Haberman via Gcc wrote: > > > > I wrote more about some motivation for guaranteed tail calls here: > > https://blog.reverberate.org/2021/04/21/musttail-efficient-interpreters.html > > So I was looking into the use case presented in that web page, it is > the same reason why GCC has the computed gotos extension already. > Is there a reason why you can't just use that instead? Computed gotos solve some of the problems associated with interpreter main loops, but not others. Computed gotos help avoid a bounds check on the switch(). They can sometimes help ensure that the dispatch jmp is replicated at the end of each case, thus giving each case a unique branch prediction address, though this effect is somewhat fragile since the compiler will sometimes decide to merge these paths into one: https://godbolt.org/z/T9a6173WP Computed gotos cannot help with the two main issues mentioned in the article: 1. The larger a function is, and the more complex and connected its control flow, the harder it is for the compiler’s register allocator to keep the most important data in registers. 2. When fast paths and slow paths are intermixed in the same function, the presence of the slow paths compromises the code quality of the fast paths. Tail calls can help solve both of these problems: 1. If we break down the big interpreter function into many small functions, one per operation, we get control over register allocation at each function boundary. As long as each function is simple enough not to spill, this data will be kept in registers continuously through all the fast paths. 2. If we put fast paths and slow paths in separate functions entirely, then the slow paths cannot negatively affect the code generation of the fast paths. We spent a lot of time trying to work within the framework of computed gotos but were unable to get the results we wanted. Tail calls generated substantially better code than anything else we tried. There are also some practical reasons why tail calls are better for us, particularly that they make it easier to reuse the code across different "instantiations" of the interpreter. For the case of protobuf parsing, each protobuf message has a distinct set of fields with different field numbers and types. With tail calls we can create a distinct dispatch table for each message type, but the function pointers in these tables can point to a set of functions that are shared across all message types. Thanks, Josh