On Thu, 2021-01-21 at 20:09 +0100, Jan Hubicka wrote:
> > On Thu, 2021-01-14 at 15:00 +0100, Jan Hubicka wrote:
> > > > On Wed, Jan 13, 2021 at 11:04 PM David Malcolm via Gcc-patches
> > > > <gcc-patches@gcc.gnu.org> wrote:
> > > > > gimple.h has this comment for gimple's uid field:
> > > > > 
> > > > >   /* UID of this statement.  This is used by passes that want
> > > > > to
> > > > >      assign IDs to statements.  It must be assigned and used
> > > > > by
> > > > > each
> > > > >      pass.  By default it should be assumed to contain
> > > > > garbage.  */
> > > > >   unsigned uid;
> > > > > 
> > > > > and gimple_set_uid has:
> > > > > 
> > > > >    Please note that this UID property is supposed to be
> > > > > undefined
> > > > > at
> > > > >    pass boundaries.  This means that a given pass should not
> > > > > assume it
> > > > >    contains any useful value when the pass starts and thus
> > > > > can
> > > > > set it
> > > > >    to any value it sees fit.
> > > > > 
> > > > > which suggests that any pass can use the uid field as an
> > > > > arbitrary
> > > > > scratch space.
> > > > > 
> > > > > PR analyzer/98599 reports a case where this error occurs in
> > > > > LTO
> > > > > mode:
> > > > >   fatal error: Cgraph edge statement index out of range
> > > > > on certain inputs with -fanalyzer.
> > > > > 
> > > > > The error occurs in the LTRANS phase after -fanalyzer runs in
> > > > > the
> > > > > WPA phase.  The analyzer pass writes to the uid fields of all
> > > > > stmts.
> > > > > 
> > > > > The error occurs when LTRANS is streaming callgraph edges
> > > > > back
> > > > > in.
> > > > > If I'm reading things correctly, the LTO format uses stmt
> > > > > uids to
> > > > > associate call stmts with callgraph edges between WPA and
> > > > > LTRANS.
> > > > > For example, in lto-cgraph.c, lto_output_edge writes out the
> > > > > gimple_uid, and input_edge reads it back in.
> > > > > 
> > > > > Hence IPA passes that touch the uids in WPA need to restore
> > > > > them,
> > > > > or the stream-in at LTRANS will fail.
> > > > > 
> > > > > Is it intended that the LTO machinery relies on the value of
> > > > > the
> > > > > uid
> > > > > field being preserved during WPA (or, at least, needs to be
> > > > > saved
> > > > > and
> > > > > restored by passes that touch it)?
> > > > 
> > > > I belive this is solely at the cgraph stream out to stream in
> > > > boundary but
> > > > this may be a blurred area since while we materialize the whole
> > > > cgraph
> > > > at once the function bodies are streamed in on demand.
> > > > 
> > > > Honza can probably clarify things.
> > > 
> > > Well, the uids are used to associate cgraph edges with
> > > statements.  At
> > > WPA stage you do not have function bodies and thus uids serves
> > > role
> > > of
> > > pointers to the statement.  If you load the body in (via
> > > get_body)
> > > the
> > > uids are replaced by pointers and when you stream out uids are
> > > recomputed again.
> > > 
> > > When do you touch the uids? At WPA time or from small IPA pass in
> > > ltrans?
> > 
> > The analyzer is here in passes.def:
> >   INSERT_PASSES_AFTER (all_regular_ipa_passes)
> >   NEXT_PASS (pass_analyzer);
> > 
> > and so in LTO runs as the first regular IPA pass at WPA time,
> > when do_whole_program_analysis calls:
> >   execute_ipa_pass_list (g->get_passes ()->all_regular_ipa_passes);
> > 
> > FWIW I hope to eventually have a way to summarize function bodies
> > in
> > the analyzer, but I don't yet, so I'm currently brute forcing
> > things by
> > loading all function bodies at the start of the analyzer (when
> > -fanalyzer is enabled).
> > 
> > I wonder if that's messing things up somehow?
> 
> Actually I think it should work.  If you do get_body or
> get_untransformed_body (that will be equal at that time) you ought to
> get ids in symtab datastructure rewritten to pointers and at stream
> out
> time we should assign new ids...
> > Does the stream-out from WPA make any assumptions about the stmt
> > uids?
> > For example, 
> >   #define STMT_UID_NOT_IN_RANGE(uid) \
> >     (gimple_stmt_max_uid (fn) < uid || uid == 0)
> > seems to assume that the UIDs are per-function ranges from
> >   [0-gimple_stmt_max_uid (fn)]
> > which isn't the case for the uids set by the analyzer.  Maybe
> > that's
> > the issue here?
> > 
> > Sorry for not being more familiar with the IPA/LTO code
> 
> There is lto_prepare_function_for_streaming which assigns uids to be
> incremental.   So I guess problem is that it is not called at WPA
> time
> if function is in memory (since at moment we do not really modify
> bodies
> at WPA time, however we do stream them in sometimes to icf compare
> them
> or to update profile).
> 
> So i guess fix would be to arrange lto_prepare_function_for_streaming
> to
> be called on all functions with body defined before WPA stream-out?

Thanks; I'll have a go at implementing that.
Dave

> Honza
> > Dave
> > 
> > 
> > > hozna
> > > > Note LTO uses this exactly because of this comment to avoid
> > > > allocating
> > > > extra memory for an 'index' but it could of course leave
> > > > gimple_uid
> > > > alone
> > > > at some extra expense (eventually paid for in generic cgraph
> > > > data
> > > > structures
> > > > and thus for not only the streaming time).
> > > > 
> > > > > On the assumption that this is the case, this patch updates
> > > > > the
> > > > > comments
> > > > > in gimple.h referring to passes being able to set uid to any
> > > > > value to
> > > > > note the caveat for IPA passes, and it updates the analyzer
> > > > > to
> > > > > save
> > > > > and restore the UIDs, fixing the error.
> > > > > 
> > > > > Successfully bootstrapped & regrtested on x86_64-pc-linux-
> > > > > gnu.
> > > > > OK for master?
> > > > 
> > > > The analyzer bits are OK, let's see how Honza can clarify the
> > > > situation.
> > > > 
> > > > Thanks,
> > > > Richard.

Reply via email to