>>>> a) If a user provides a builtin implementation to LTO, it is discarded,
>>>> since by design LTO prefers builtins to user-provided versions of them. In
>>>> LTO, builtins are their own prevailing decl. There is an enhancement
>>>> request PR here:
>>>>
>>>> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51997
>>>
>>> It definitely should be the other way around and builtins should get their
>>> proper entry in the now existent symbol table.
>>
>> Well, the way we stream builtin decls as special cases is indeed weird. I
>> recall I once tried to remove that code and it led to some regressions, but
>> in general it should not be necessary.
>
> Yes, we seem to special-case builtins all over the place. I have a
> kludge disabling this, just to work on (b) below.
If you can turn the kludge into a patch, I think it would be great to have it
in mainline. I do not see builtins as inherently special here. I have been
bitten by this code several times.

>>
>> This is the main problem however. As Richi pointed out, even in C this
>> won't work. We decide inlining at WPA time and since then no inlining is
>> possible and all unreachable functions are removed. So when you invent new
>> calls to builtins on the way you can't expect them to be reasonably inlined.
>
> Yes, I have been playing with marking any such provided builtins with
> cgraph_mark_force_output_node() in the IPA-tm pass. I assume that
> anyone linking with implementations of the TM builtins must either want
> them inlined, or want them in the final link. But your idea of a new
> inline attribute is cleaner and far more generic.

This way you ensure that the function is not removed at the beginning of the
IPA queue (and that it gets streamed from LTRANS to WPA). You will not,
however, arrange for it to land in every partition, so we really need special
functionality here if the inlining decisions have to be deferred to the
ltrans stage. Moreover, if your function is static inline, it will be output
as an offline copy in every compilation unit, which is not what we want.

>
>>> Also you would not have the TM builtin bodies available in your ltrans unit
>>> because nothing calls them. So anything that requires LTO (to see the
>>> bodies in the first place) but does not expose the calls before LTO bytecode
>>> output is not going to work.
>
> Marking with cgraph_mark_force_output_node() in the IPA-tm pass fixes this.
>
>> Well, only way I see here is to
>>
>> a) have special purpose local inlining pass to handle these newly born
>> builtins.
>> Basically you can re-purpose early inliner for this and run it after your
>> pass (and we can generalize the machinery for other kinds of beasts if
>> needed).
>> The early inliner fits better for this than late inliner.
>
> Yes, this is what I've been doing, but I paused for yall's input when I
> had to either rematerialize the gimple bodies, or keep the gimple
> optimizations from removing them as each function got compiled.

There is a mechanism to save function bodies for recursive inlining. You only
need to make save_inline_function_body return true for your functions.

>
>>
>> b) introduce new kind of functions that are those builtins. You need
>> sort of a combination of always_inline, extern and used attributes, but
>> not quite.
>> The new kind of function must
>> 1) make the partitioner ship the functions into every partition,
>> 2) make unreachable function removal not remove them even if they
>> seem useless,
>> 3) make code generation never produce offline copies of them even if
>> they are not removed by the unreachable function pass,
>> 4) make the final check happy that this type of function may be kept in
>> memory till end of compilation.
>>
>> If this seems necessary I can implement this for you, but I am always
>> hesitant to add a new type of function into the machinery - we already
>> face the complexity of having quite a few of them.
>
> I would be delighted if you could work on this, if you think a more
> general solution to just forcing the node to be outputted is necessary.
> But first let's get rth's input, because I'm still unsure whether the
> payoff for inlining so late is sufficient to merit all this work.

Well, as I said earlier, this is a bit of a slippery concept, so I am not
completely wed to it. Shipping the functions into every ltrans unit "just in
case" is somewhat expensive if the feature gets widespread use, and there is
the problem of what to do with functions used only by those special
functions: cgraph will need to know that these also stay until the end of
compilation but need not be output unless the functions needing them are.
At the moment all decisions about which functions to output are made prior to
the final compilation stage, and this makes the whole thing more intertwined.

On the other hand, if we want to support LTO, we need to solve this problem
for e.g. libgcc and the basic runtime, too. So perhaps we could have a
function attribute "runtime" with this effect. I wonder how other compilers
are solving this problem?

If we do not get better alternatives, I will get the cgraph bits done for you.

Honza
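
To make requirements 1) to 4) from (b) a bit more concrete, here is a toy
model in C. It is only a sketch under loud assumptions: struct toy_node, the
toy_* functions, the is_runtime flag and the function names
"tm_read_barrier" and "tm_log_helper" are all made up for illustration and do
not correspond to the real cgraph code or to any existing attribute. The
model keeps "runtime" functions (and whatever they call) in memory even when
nothing reaches them, never emits an offline copy of a "runtime" function,
ships it into every partition, and emits its helpers only once some ordinary
function actually needs them.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define TOY_MAX_NODES   16
#define TOY_MAX_CALLEES 4

/* Indices of the nodes in the example graph (all names hypothetical). */
enum { TOY_MAIN, TOY_TM_READ, TOY_HELPER, TOY_DEAD, TOY_COUNT };

/* Simplified stand-in for a call graph node; NOT GCC's cgraph_node. */
struct toy_node {
    const char *name;
    bool is_root;       /* externally visible entry point */
    bool is_runtime;    /* hypothetical "runtime" attribute */
    int  partition;     /* partition the node was assigned to */
    int  callees[TOY_MAX_CALLEES];
    int  n_callees;
    bool kept;          /* body stays in memory till end of compilation */
    bool emitted;       /* an offline copy is output */
};

static void toy_add_call(struct toy_node *g, int caller, int callee)
{
    assert(g[caller].n_callees < TOY_MAX_CALLEES);
    g[caller].callees[g[caller].n_callees++] = callee;
}

/* Depth-first walk that marks node i and everything it calls. */
static void toy_walk(const struct toy_node *g, int i, bool *seen)
{
    if (seen[i])
        return;
    seen[i] = true;
    for (int k = 0; k < g[i].n_callees; k++)
        toy_walk(g, g[i].callees[k], seen);
}

/* Recompute the kept/emitted flags for all n nodes. */
static void toy_decide(struct toy_node *g, int n)
{
    bool reach[TOY_MAX_NODES] = { false };  /* reachable from a root */
    bool keep[TOY_MAX_NODES] = { false };   /* must stay in memory */

    assert(n <= TOY_MAX_NODES);
    for (int i = 0; i < n; i++) {
        if (g[i].is_root)
            toy_walk(g, i, reach);
        /* 2) and 4): runtime functions and their callees are never removed. */
        if (g[i].is_root || g[i].is_runtime)
            toy_walk(g, i, keep);
    }
    for (int i = 0; i < n; i++) {
        g[i].kept = keep[i];
        /* 3): a runtime function never gets an offline copy. */
        g[i].emitted = reach[i] && !g[i].is_runtime;
    }
}

/* 1): runtime functions are shipped into every partition. */
static bool toy_in_partition(const struct toy_node *node, int part)
{
    return node->is_runtime || node->partition == part;
}

/* Build the example graph: tm_read_barrier calls tm_log_helper, and
   initially nothing calls tm_read_barrier. */
static void toy_build(struct toy_node *g)
{
    memset(g, 0, TOY_COUNT * sizeof *g);
    g[TOY_MAIN].name = "main";
    g[TOY_MAIN].is_root = true;
    g[TOY_TM_READ].name = "tm_read_barrier";
    g[TOY_TM_READ].is_runtime = true;
    g[TOY_HELPER].name = "tm_log_helper";
    g[TOY_DEAD].name = "unused_static";
    toy_add_call(g, TOY_MAIN + 1, TOY_HELPER);
}
```

After toy_build() and toy_decide(), tm_read_barrier and tm_log_helper are
kept but not emitted, and unused_static is dropped. If a late pass then adds a
call with toy_add_call(g, TOY_MAIN, TOY_TM_READ) and toy_decide() is run
again, tm_log_helper becomes emitted while tm_read_barrier still gets no
offline copy. This is the "need not be output unless the functions needing
them are" case, and it shows why the output decision would have to be
revisited after the late inlining.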