Hi Richard, > On 14 Sep 2023, at 11:18, Richard Biener <richard.guent...@gmail.com> wrote: > > On Wed, Sep 6, 2023 at 5:44 PM FX Coudert <fxcoud...@gmail.com> wrote: >>
>> ping**2 on the revised patch, for Richard or another global reviewer. So far >> all review feedback is that it’s a step forward, and it’s been widely used >> for both aarch64-darwin and x86_64-darwin distributions for almost three >> years now. >> >> OK to commit? > > I just noticed that ftrampoline-impl isn't Optimize, thus it's not > streamed with LTO. I think this is fine, the nested pass runs before LTO streaming and lowers to the relevant built-ins for the chosen impl. The builtins are distinct and can co-exist in the linked exe, > How does mixing different -ftrampoline-impl for different LTO TUs behave? Assuming that a target can support multiple implementations, then each is applied local to a single TU. The nested functions are scoped within their parent and thus should not be candidates for merging by LTO. For a target that cannot support both, then one or more of the TUs should be rejected before we even get to LTO. > How does mis-specifying -ftrampoline-impl at LTO link time compared to > compile-time behave? The flag should be a NOP at LTO link time (but I do not think we want to reject it, that would probably create other issues?) > Is the state fully reflected during pre-IPA compilation and the flag not > needed after that? yes, that is my understanding, nested runs very early. > It appears so, but did you check? I actually checked on x86_64-darwin (which does support both) and we see… here with two tus with nested fns and a third with the main(). $ nm -mapv ./nn.ltrans0.ltrans.o as expected, two instances of the nested “bar”. 00000000000001a8 (__TEXT,__cstring) non-external lC0 000000000000001f (__TEXT,__text) non-external _bar.0.lto_priv.0 00000000000001d0 (__TEXT,__cstring) non-external lC1 00000000000000ec (__TEXT,__text) non-external _bar.0.lto_priv.1 000000000000007c (__TEXT,__text) external _foo_1 0000000000000149 (__TEXT,__text) external _foo_2 0000000000000000 (__TEXT,__text) external _main >>> these for heap-based: (undefined) external ___builtin_nested_func_ptr_created (undefined) external ___builtin_nested_func_ptr_deleted >>> this for stack-based. (undefined) external ___enable_execute_stack (and the code executes as expected). > OK if that's a non-issue. thanks, we'll wait a day or two in case of any follow-on comments, Iain P.S. I was investigating some unrelated unwinder issues a couple of weeks ago, but that did highlight that we have a possibility to avoid the leaks from longjump if we hang on the forced_unwind() machinery [TODO, tho, not part of this initial patch] > > Thanks, > Richard. > >> FX >> >> >> >>> Le 5 août 2023 à 16:20, FX Coudert <fxcoud...@gmail.com> a écrit : >>> >>> Hi Richard, >>> >>> Thanks for your feedback. Here is an amended version of the patch, taking >>> into consideration your requests and the following discussion. There is no >>> configure option for the libgcc part, and the documentation is amended. The >>> patch is split into three commits for core, target and libgcc. >>> >>> Currently regtesting on x86_64 linux and darwin (it was fine before I split >>> up into three commits, so I’m re-testing to make sure I didn’t screw >>> anything up). >>> >>> OK to commit? >>> FX >>