Hi Richard,

> On 14 Sep 2023, at 11:18, Richard Biener <richard.guent...@gmail.com> wrote:
> 
> On Wed, Sep 6, 2023 at 5:44 PM FX Coudert <fxcoud...@gmail.com> wrote:
>> 

>> ping**2 on the revised patch, for Richard or another global reviewer. So far 
>> all review feedback is that it’s a step forward, and it’s been widely used 
>> for both aarch64-darwin and x86_64-darwin distributions for almost three 
>> years now.
>> 
>> OK to commit?
> 
> I just noticed that ftrampoline-impl isn't Optimize, thus it's not
> streamed with LTO.

I think this is fine, the nested pass runs before LTO streaming and lowers to 
the relevant built-ins for the chosen impl.
The builtins are distinct and can co-exist in the linked exe,

>  How does mixing different -ftrampoline-impl for different LTO TUs behave?

Assuming that a target can support multiple implementations, then each is 
applied local to a single TU.  The nested functions are scoped within their 
parent and thus should not be candidates for merging by LTO.

For a target that cannot support both, then one or more of the TUs should be 
rejected before we even get to LTO.

>  How does mis-specifying -ftrampoline-impl at LTO link time compared to 
> compile-time behave?

The flag should be a  NOP at LTO link time (but I do not think we want to 
reject it, that would probably create other issues?)

>  Is the state fully reflected during pre-IPA compilation and the flag not 
> needed after that?  

yes, that is my understanding, nested runs very early.

> It appears so, but did you check?

I actually checked on x86_64-darwin (which does support both) and we see…
here with two tus with nested fns and a third with the main().

$ nm -mapv ./nn.ltrans0.ltrans.o

as expected, two instances of the nested “bar”.

00000000000001a8 (__TEXT,__cstring) non-external lC0
000000000000001f (__TEXT,__text) non-external _bar.0.lto_priv.0 
00000000000001d0 (__TEXT,__cstring) non-external lC1
00000000000000ec (__TEXT,__text) non-external _bar.0.lto_priv.1
000000000000007c (__TEXT,__text) external _foo_1
0000000000000149 (__TEXT,__text) external _foo_2
0000000000000000 (__TEXT,__text) external _main

>>> these for heap-based:
                 (undefined) external ___builtin_nested_func_ptr_created 
                 (undefined) external ___builtin_nested_func_ptr_deleted

>>> this for stack-based.
                 (undefined) external ___enable_execute_stack

(and the code executes as expected).

> OK if that's a non-issue.

thanks, we'll wait a day or two in case of any follow-on comments,
Iain

P.S. I was investigating some unrelated unwinder issues a couple of weeks ago, 
but that did highlight that we have a possibility to avoid the leaks from 
longjump if we hang on the forced_unwind() machinery [TODO, tho, not part of 
this initial patch]


> 
> Thanks,
> Richard.
> 
>> FX
>> 
>> 
>> 
>>> Le 5 août 2023 à 16:20, FX Coudert <fxcoud...@gmail.com> a écrit :
>>> 
>>> Hi Richard,
>>> 
>>> Thanks for your feedback. Here is an amended version of the patch, taking 
>>> into consideration your requests and the following discussion. There is no 
>>> configure option for the libgcc part, and the documentation is amended. The 
>>> patch is split into three commits for core, target and libgcc.
>>> 
>>> Currently regtesting on x86_64 linux and darwin (it was fine before I split 
>>> up into three commits, so I’m re-testing to make sure I didn’t screw 
>>> anything up).
>>> 
>>> OK to commit?
>>> FX
>> 

Reply via email to