Re: LTO inlining of transactional builtins

Richard Guenther Fri, 22 Jun 2012 06:09:00 -0700

On Fri, Jun 22, 2012 at 2:47 PM, Aldy Hernandez <al...@redhat.com> wrote:
> Hi gentlemen.
>
> I am looking again at LTO + TM.  The goal is to be able to link with the
> implemented _ITM_* functions in libitm.a, and have them inlined into the
> transaction code when profitable.
>
> To refresh everyone's memory, the original problem was two-fold:
>
> a) If a user provides a builtin implementation to LTO, it is discarded,
> since by design LTO prefers builtins to user-provided versions of them.  In
> LTO, builtins are their own prevailing decl.  There is an enhancement
> request PR here:
>
>        http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51997


It definitely should be the other way around, and builtins should get their
proper entry in the now-existing symbol table.

> b) LTO streaming happens before TMMARK.  Since the TMMARK pass is the one
> that instruments memory operations into __builtin_ITM_* calls, even if (a)
> was fixed, LTRANS would have nothing to inline.

Which also means that this has nothing to do with LTO per se, just that
you'd need LTO to see the bodies of the "builtins".  Use a small C testcase
where you provide the implementation of one of the builtins (well, the one
you end up using), and you will face the same issue.

Do I understand correctly that inlining the builtin at expansion time is not
good because the implementation detail may depend on how libitm was
configured?

> Unfortunately, the tmmark pass can't be moved earlier, because the point is
> to delay its work so memory and loop optimizations can do their thing before
> memory operations are irreconcilably transformed into function calls.
>
> FYI, the original thread was here:
>
>        http://gcc.gnu.org/ml/gcc-patches/2012-01/msg01258.html
>
> Now on to my current woes... (I'm concentrating on problem (b) here).
>
> My original thought of moving the LTO streaming point after tmmark under
> certain circumstances is a no-go.  If at compile time we were to run tmmark
> and then stream LTO out, at link (LTO) time we would run: inline, ipa-tm,
> optimizations, tmmark.  Unfortunately, IPA-TM will add new TM clones with
> __builtin_ITM_* calls.  This would mean that the new clones don't get their
> TM calls inlined, while the lexical __atomic blocks do.
>
> rth and I have been talking about re-running inlining after tmmark
> specifically for the TM builtins.
>
> As you can imagine, the pass manager isn't designed to run IPA passes after
> the regular optimization passes run, and I don't see a generic need for this
> apart from the TM problem, although I could be wrong.
>
> I tried playing with forcing another run of the early inliner after tmmark,
> since it is designed as a GIMPLE_PASS, but by the time pass_all_optimizations
> runs, we have removed the cgraph and the GIMPLE bodies.  Seeing the amount of
> setup I have to do to re-run early inlining after the GIMPLE optimizations
> have begun, perhaps I should steer my effort toward running proper IPA
> inlining some time after tmmark.

Also you would not have the TM builtin bodies available in your ltrans unit
because nothing calls them.  So anything that requires LTO (to see the
bodies in the first place) but does not expose the calls before LTO bytecode
output is not going to work.
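The ordering argument above can be illustrated with a toy model.  This is a
minimal sketch under stated assumptions, not GCC code: the instruction
representation, the functions `tmmark`, `stream_out`, `inline_calls` and
`simulate`, and the ordering names are all hypothetical, and `_ITM_RU4` is
used only as a label for a libitm read barrier.  The model assumes that a
library body reaches the LTRANS unit only if something calls it when the
bytecode is written out, and it counts how many barrier calls survive under
three pass orderings.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy model of the pass-ordering problem; NOT GCC's IR or pass manager. */
enum op { LOAD, CALL, INLINED };
struct insn { enum op op; const char *callee; };

/* Hypothetical pass orderings. */
enum order {
  CURRENT,                /* stream out, inline, tmmark */
  REINLINE_AFTER_TMMARK,  /* stream out, inline, tmmark, inline again */
  TMMARK_BEFORE_STREAMING /* tmmark, stream out, inline */
};

#define N 3 /* three memory loads inside one transaction */
static const char *const BARRIER = "_ITM_RU4"; /* used only as a label */

/* Toy "tmmark": turn every memory load into a call to a TM barrier. */
static void tmmark(struct insn *b) {
  for (int i = 0; i < N; i++)
    if (b[i].op == LOAD) { b[i].op = CALL; b[i].callee = BARRIER; }
}

/* Toy "LTO streaming" (assumption): the barrier's library body is only
   made available to the LTRANS unit if a call to it already exists. */
static const char *stream_out(const struct insn *b) {
  for (int i = 0; i < N; i++)
    if (b[i].op == CALL && strcmp(b[i].callee, BARRIER) == 0) return BARRIER;
  return NULL;
}

/* Toy "inliner": replace calls whose callee body is available. */
static void inline_calls(struct insn *b, const char *avail) {
  for (int i = 0; i < N; i++)
    if (avail != NULL && b[i].op == CALL && strcmp(b[i].callee, avail) == 0)
      b[i].op = INLINED;
}

/* Runs one pass ordering and returns how many barrier calls remain. */
int simulate(enum order o) {
  assert(o == CURRENT || o == REINLINE_AFTER_TMMARK ||
         o == TMMARK_BEFORE_STREAMING);
  struct insn b[N];
  for (int i = 0; i < N; i++) { b[i].op = LOAD; b[i].callee = NULL; }

  if (o == TMMARK_BEFORE_STREAMING) tmmark(b);
  const char *avail = stream_out(b);        /* compile-time bytecode output */
  inline_calls(b, avail);                   /* link-time inlining */
  if (o != TMMARK_BEFORE_STREAMING) tmmark(b);
  if (o == REINLINE_AFTER_TMMARK) inline_calls(b, avail);

  int left = 0;
  for (int i = 0; i < N; i++) left += b[i].op == CALL;
  return left;
}
```

In this model, `simulate(CURRENT)` and `simulate(REINLINE_AFTER_TMMARK)` both
return 3, while `simulate(TMMARK_BEFORE_STREAMING)` returns 0: re-running the
inliner after tmmark does not help, because the barrier's body was never
streamed into the LTRANS unit in the first place.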

> Before I embark on more surgery, I would like your input.  I am entertaining
> the following two options:
>
> a) Have tmmark set up IPA infrastructure for ipa_inline() to
> run, and run it directly at the end of the pass (instead of through the pass
> manager).  Ugly, but localized.

See above - you'd have nothing to inline.

> b) Modify execute_pass_list() so subpasses can be IPA passes.  Set up
> appropriate infrastructure as in (a), and run this specialized IPA inline
> (or whatever subpasses we may add in the future).  This is the converse of
> ipa_*summaries*(), where we run subpasses that are local passes.  Generic,
> but I question whether anyone else will ever need this.

See above - you'd have nothing to inline.

> What do you think?  Am I nuts to even consider this?  Other ideas?

The only way is to run tmmark before LTO bytecode output.  Really.
Or expand the builtins not to calls but inline code at RTL expansion time.
Or give up on the idea ...

Thanks,
Richard.

> BTW, I still question whether inlining will gain us much, since after
> tmmark there are few optimizations left to run (except RTL optimizations).
> So I would guess that any gain from TM builtin inlining will be the speed
> of avoiding the calls, plus whatever benefits the RTL optimizations can give
> us.  Still, I'm willing to play along a bit longer...
>
> Thanks.
> Aldy
