Re: Where did my function go?
> On Wed, Oct 21, 2020 at 5:21 AM Gary Oblock wrote:
> > > IPA transforms happen when get_body is called. With LTO this also
> > > triggers reading the body from disk. So if you want to see all bodies
> > > and work on them, you can simply call get_body on everything, but it
> > > will result in increased memory use, since everything will be loaded
> > > from disk and expanded (by inlining) at once instead of doing it on a
> > > per-function basis.
> >
> > Jan,
> >
> > Doing
> >
> >   FOR_EACH_FUNCTION_WITH_GIMPLE_BODY (node) node->get_body ();
> >
> > instead of
> >
> >   FOR_EACH_FUNCTION_WITH_GIMPLE_BODY (node) node->get_untransformed_body ();
> >
> > instantaneously breaks everything...
>
> I think during WPA you cannot do ->get_body (), only
> ->get_untransformed_body (). But we don't know yet where in the IPA
> process you're experiencing the issue.

Originally get_body was designed to work in WPA as well: the info about
what transforms are to be applied is kept in a vector with per-function
granularity. But there may be some issues, as this path is untested, and
e.g. ipa-sra/ipa-prop do quite difficult transformations these days.

What happens?

Honza

> Richard.
>
> > Am I missing something?
> >
> > Gary
> >
> > From: Jan Hubicka
> > Sent: Tuesday, October 20, 2020 4:34 AM
> > To: Richard Biener
> > Cc: GCC Development; Gary Oblock
> > Subject: Re: Where did my function go?
> >
> > > > On Tue, Oct 20, 2020 at 1:02 PM Martin Jambor wrote:
> > > > >
> > > > > Hi,
> > > > >
> > > > > On Tue, Oct 20 2020, Richard Biener wrote:
> > > > > > On Mon, Oct 19, 2020 at 7:52 PM Gary Oblock wrote:
> > > > > >>
> > > > > >> Richard,
> > > > > >>
> > > > > >> I guess that will work for me.
> > > > > >> However, since it
> > > > > >> was decided to remove an identical function,
> > > > > >> why weren't the calls to it adjusted to reflect it?
> > > > > >> If the call wasn't transformed, that means it will
> > > > > >> be mapped at some later time. Is that mapping
> > > > > >> available to look at? Because using that would
> > > > > >> also be a potential solution (assuming call
> > > > > >> graph information exists for the deleted function.)
> > > > > >
> > > > > > I'm not sure what the transitional cgraph looks like
> > > > > > during WPA analysis (which is what we're talking about?),
> > > > > > but the IL is definitely unmodified in that state.
> > > > > >
> > > > > > Maybe Martin has an idea.
> > > > >
> > > > > Exactly. The cgraph edges are where the correct call information is
> > > > > stored until the inlining transformation phase calls
> > > > > cgraph_edge::redirect_call_stmt_to_callee on them - inlining is
> > > > > a special pass in this regard that performs this IPA-infrastructure
> > > > > function in addition to actual inlining.
> > > > >
> > > > > In the cgraph this means the callee itself, but also the information
> > > > > in e->callee->clone.param_adjustments, which might be interesting for
> > > > > any struct-reorg-like optimizations (...and in future possibly other
> > > > > transformation summaries).
> > > > >
> > > > > The late IPA passes are in a very unfortunate spot here, since they
> > > > > run before the real-IPA transformation phases but after unreachable
> > > > > node removal and after clone materialization, and so can see some but
> > > > > not all of the changes performed by real IPA passes. The reason for
> > > > > that is good cache locality when late IPA passes are either not run
> > > > > at all or only look at a small portion of the compilation unit. In
> > > > > such a case the IPA transformations of a function are followed by all
> > > > > the late passes working on the same function.
> > > > > Late IPA passes are unfortunately second-class citizens, and I would
> > > > > strongly recommend not using them, since they do not fit into our
> > > > > otherwise robust IPA framework very well. We could probably provide a
> > > > > mechanism that would allow late IPA passes to run all normal IPA
> > > > > transformations on a function, so they could clearly see what they
> > > > > are looking at, but extensive use would slow compilation down, so
> > > > > its use would be frowned upon at the very least.
> > > >
> > > > So IPA PTA does get_body () on the nodes it wants to analyze, and I
> > > > thought that triggers any pending IPA transforms?
> > >
> > > Yes, it does (and get_untransformed_body does not).
> >
> > And to correct Martin's explanation a bit: the late IPA passes are
> > intended to work, though I was mostly planning them for prototyping true
> > IPA passes and also possibly for implementing passes that inspect only a
> > few functions.
> >
> > IPA transforms happen when get_body is called. With LTO this also
> > triggers reading the body from disk.
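For readers following along, the get_body / get_untransformed_body distinction discussed above can be illustrated with a toy model. This is a hypothetical sketch in Python, not GCC code: the method names mirror the cgraph API mentioned in the thread, but the "disk" read, the body strings, and the transform callables are invented for illustration. The point it models is that get_untransformed_body only reads the streamed-in body, while get_body additionally applies the per-function vector of pending IPA transforms, which is why calling it on every node up front materializes (and grows, via inlining) all bodies at once.

```python
# Toy model of lazy LTO body handling; purely illustrative, not GCC code.

class Node:
    """A stand-in for a cgraph node with a per-function transform vector."""

    def __init__(self, name, pending_transforms):
        self.name = name
        self.body = None                         # not yet read from "disk"
        self.pending = list(pending_transforms)  # per-function transform vector

    def get_untransformed_body(self):
        """Read the body from "disk", but apply no IPA transforms."""
        if self.body is None:
            self.body = f"gimple-of-{self.name}"
        return self.body

    def get_body(self):
        """Read the body and apply all pending IPA transforms to it."""
        body = self.get_untransformed_body()
        while self.pending:
            transform = self.pending.pop(0)
            body = transform(body)               # e.g. inlining grows the body
        self.body = body
        return self.body


# Example: one pending "inline" transform recorded for this function.
inline_transform = lambda b: b + "+inlined-callees"
n = Node("orthonl", [inline_transform])

assert n.get_untransformed_body() == "gimple-of-orthonl"
assert n.get_body() == "gimple-of-orthonl+inlined-callees"
assert n.get_body() == n.get_body()  # transforms are applied only once
```

In this model, looping get_body over every node keeps every expanded body live at once, whereas per-function processing can release each body before loading the next, which is the memory tradeoff Honza describes.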
Re: LTO slows down calculix by more than 10% on aarch64
On Thu, 24 Sep 2020 at 16:44, Richard Biener wrote:
> On Thu, Sep 24, 2020 at 12:36 PM Prathamesh Kulkarni wrote:
> > On Wed, 23 Sep 2020 at 16:40, Richard Biener wrote:
> > > On Wed, Sep 23, 2020 at 12:11 PM Prathamesh Kulkarni wrote:
> > > > On Wed, 23 Sep 2020 at 13:22, Richard Biener wrote:
> > > > > On Tue, Sep 22, 2020 at 6:25 PM Prathamesh Kulkarni wrote:
> > > > > > On Tue, 22 Sep 2020 at 16:36, Richard Biener wrote:
> > > > > > > On Tue, Sep 22, 2020 at 11:37 AM Prathamesh Kulkarni wrote:
> > > > > > > > On Tue, 22 Sep 2020 at 12:56, Richard Biener wrote:
> > > > > > > > > On Tue, Sep 22, 2020 at 7:08 AM Prathamesh Kulkarni wrote:
> > > > > > > > > > On Mon, 21 Sep 2020 at 18:14, Prathamesh Kulkarni wrote:
> > > > > > > > > > > On Mon, 21 Sep 2020 at 15:19, Prathamesh Kulkarni wrote:
> > > > > > > > > > > > On Fri, 4 Sep 2020 at 17:08, Alexander Monakov wrote:
> > > > > > > > > > > > > > I obtained perf stat results for following benchmark runs:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > -O2:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >   7856832.692380  task-clock (msec)  #  1.000 CPUs utilized
> > > > > > > > > > > > > >             3758  context-switches   #  0.000 K/sec
> > > > > > > > > > > > > >               40  cpu-migrations     #  0.000 K/sec
> > > > > > > > > > > > > >            40847  page-faults        #  0.005 K/sec
> > > > > > > > > > > > > >    7856782413676  cycles             #  1.000 GHz
> > > > > > > > > > > > > >    6034510093417  instructions       #  0.77 insn per cycle
> > > > > > > > > > > > > >     363937274287  branches           # 46.321 M/sec
> > > > > > > > > > > > > >      48557110132  branch-misses      # 13.34% of all branches
> > > > > > > > > > > > >
> > > > > > > > > > > > > (ouch, 2+ hours per run is a lot, collecting a profile over a
> > > > > > > > > > > > > minute should be enough for this kind of code)
> > > > > > > > > > > > >
> > > > > > > > > > > > > > -O2 with orthonl inlined:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >   8319643.114380  task-clock (msec)  #  1.000 CPUs utilized
> > > > > > > > > > > > > >             4285  context-switches   #  0.001 K/sec
> > > > > > > > > > > > > >               28  cpu-migrations     #  0.000 K/sec
> > > > > > > > > > > > > >            40843  page-faults        #  0.005 K/sec
> > > > > > > > > > > > > >    8319591038295  cycles             #  1.000 GHz
> > > > > > > > > > > > > >    6276338800377  instructions       #  0.75 insn per cycle
> > > > > > > > > > > > > >     467400726106  branches           # 56.180 M/sec
> > > > > > > > > > > > > >      45986364011  branch-misses      #  9.84% of all branches
> > > > > > > > > > > > >
> > > > > > > > > > > > > So +100e9 branches, but +240e9 instructions and +480e9 cycles,
> > > > > > > > > > > > > probably implying that extra instructions are appearing in this
> > > > > > > > > > > > > loop nest, but not in the innermost loop. As a reminder for
> > > > > > > > > > > > > others, the innermost loop has only 3 iterations.
> > > > > > > > > > > > >
> > > > > > > > > > > > > > -O2 with orthonl inlined and PRE disabled (this removes the
> > > > > > > > > > > > > > extra branches):
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >   8207331.088040  task-clock (msec)  #  1.000 CPUs utilized
> > > > > > > > > > > > > >             2266  context-switches   #  0.000 K/sec
> > > > > > > > > > > > > >               32  cpu-migrations     #  0.000 K/sec
> > > > > > > > > > > > > >            40846  page-faults
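The rounded deltas Alexander quotes (+100e9 branches, +240e9 instructions, roughly +480e9 cycles) can be re-derived from the two perf stat runs above. The following sketch is plain arithmetic over the counters quoted in the thread; the `deltas` helper is invented for illustration and is not a perf tool.

```python
# Counter values copied verbatim from the two perf stat runs in the thread.
O2 = {"cycles": 7856782413676, "instructions": 6034510093417,
      "branches": 363937274287, "branch-misses": 48557110132}
O2_INLINED = {"cycles": 8319591038295, "instructions": 6276338800377,
              "branches": 467400726106, "branch-misses": 45986364011}

def deltas(before, after):
    """Per-event difference (after - before), in units of 1e9 events."""
    return {event: (after[event] - before[event]) / 1e9 for event in before}

for event, billions in deltas(O2, O2_INLINED).items():
    print(f"{event:>15}: {billions:+9.1f}e9")
```

Note that branch misses actually go down slightly with inlining (about -2.6e9) even though total branches go up, consistent with the drop from 13.34% to 9.84% in the miss rate.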
Re: LTO slows down calculix by more than 10% on aarch64
On Wed, Oct 21, 2020 at 12:04 PM Prathamesh Kulkarni wrote:
> [quoted text snipped; this reply quotes the Sep 24 thread above in full]
The Next GCC/LLVM/RISC-V meetup in China: Hangzhou, Oct 24, 2020
Hi all,

The next OSDT (aka HelloLLVM/HelloGCC) meetup in China will happen on Oct 24,
2020, in Hangzhou. Everyone interested in GCC/LLVM toolchain related projects
and/or RISC-V is invited to join. Event details are at:

Chinese version: https://github.com/hellogcc/osdt-weekly/blob/master/events/2020-10-24-hangzhou-meetup.md
English version: https://github.com/hellogcc/osdt-weekly/blob/master/events/2020-10-24-hangzhou-meetup.en.md

Presentations are welcome :-)

Current topics:
- Wei Wu - Recent Progress in RISC-V International
- Ningning Shi - Intro to the ART OptimizingCompiler
- Weiwei Li - Learning QEMU/RISU
- Free discussion

Looking forward to meeting you!

--
Best wishes,
Wei Wu (吴伟)