On Thu, 2021-01-14 at 15:00 +0100, Jan Hubicka wrote: > > On Wed, Jan 13, 2021 at 11:04 PM David Malcolm via Gcc-patches > > <gcc-patches@gcc.gnu.org> wrote: > > > gimple.h has this comment for gimple's uid field: > > > > > > /* UID of this statement. This is used by passes that want to > > > assign IDs to statements. It must be assigned and used by > > > each > > > pass. By default it should be assumed to contain > > > garbage. */ > > > unsigned uid; > > > > > > and gimple_set_uid has: > > > > > > Please note that this UID property is supposed to be undefined > > > at > > > pass boundaries. This means that a given pass should not > > > assume it > > > contains any useful value when the pass starts and thus can > > > set it > > > to any value it sees fit. > > > > > > which suggests that any pass can use the uid field as an > > > arbitrary > > > scratch space. > > > > > > PR analyzer/98599 reports a case where this error occurs in LTO > > > mode: > > > fatal error: Cgraph edge statement index out of range > > > on certain inputs with -fanalyzer. > > > > > > The error occurs in the LTRANS phase after -fanalyzer runs in the > > > WPA phase. The analyzer pass writes to the uid fields of all > > > stmts. > > > > > > The error occurs when LTRANS is streaming callgraph edges back > > > in. > > > If I'm reading things correctly, the LTO format uses stmt uids to > > > associate call stmts with callgraph edges between WPA and LTRANS. > > > For example, in lto-cgraph.c, lto_output_edge writes out the > > > gimple_uid, and input_edge reads it back in. > > > > > > Hence IPA passes that touch the uids in WPA need to restore them, > > > or the stream-in at LTRANS will fail. > > > > > > Is it intended that the LTO machinery relies on the value of the > > > uid > > > field being preserved during WPA (or, at least, needs to be saved > > > and > > > restored by passes that touch it)? > > > > I belive this is solely at the cgraph stream out to stream in > > boundary but > > this may be a blurred area since while we materialize the whole > > cgraph > > at once the function bodies are streamed in on demand. > > > > Honza can probably clarify things. > > Well, the uids are used to associate cgraph edges with > statements. At > WPA stage you do not have function bodies and thus uids serves role > of > pointers to the statement. If you load the body in (via get_body) > the > uids are replaced by pointers and when you stream out uids are > recomputed again. > > When do you touch the uids? At WPA time or from small IPA pass in > ltrans?
The analyzer is here in passes.def: INSERT_PASSES_AFTER (all_regular_ipa_passes) NEXT_PASS (pass_analyzer); and so in LTO runs as the first regular IPA pass at WPA time, when do_whole_program_analysis calls: execute_ipa_pass_list (g->get_passes ()->all_regular_ipa_passes); FWIW I hope to eventually have a way to summarize function bodies in the analyzer, but I don't yet, so I'm currently brute forcing things by loading all function bodies at the start of the analyzer (when -fanalyzer is enabled). I wonder if that's messing things up somehow? Does the stream-out from WPA make any assumptions about the stmt uids? For example, #define STMT_UID_NOT_IN_RANGE(uid) \ (gimple_stmt_max_uid (fn) < uid || uid == 0) seems to assume that the UIDs are per-function ranges from [0-gimple_stmt_max_uid (fn)] which isn't the case for the uids set by the analyzer. Maybe that's the issue here? Sorry for not being more familiar with the IPA/LTO code Dave > hozna > > Note LTO uses this exactly because of this comment to avoid > > allocating > > extra memory for an 'index' but it could of course leave gimple_uid > > alone > > at some extra expense (eventually paid for in generic cgraph data > > structures > > and thus for not only the streaming time). > > > > > On the assumption that this is the case, this patch updates the > > > comments > > > in gimple.h referring to passes being able to set uid to any > > > value to > > > note the caveat for IPA passes, and it updates the analyzer to > > > save > > > and restore the UIDs, fixing the error. > > > > > > Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu. > > > OK for master? > > > > The analyzer bits are OK, let's see how Honza can clarify the > > situation. > > > > Thanks, > > Richard. > > > > > gcc/analyzer/ChangeLog: > > > PR analyzer/98599 > > > * supergraph.cc (saved_uids::make_uid_unique): New. > > > (saved_uids::restore_uids): New. > > > (supergraph::supergraph): Replace assignments to stmt- > > > >uid with > > > calls to m_stmt_uids.make_uid_unique. > > > (supergraph::~supergraph): New. > > > * supergraph.h (class saved_uids): New. > > > (supergraph::~supergraph): New decl. > > > (supergraph::m_stmt_uids): New field. > > > > > > gcc/ChangeLog: > > > PR analyzer/98599 > > > * doc/gimple.texi: Document that UIDs must not change > > > during IPA > > > passes. > > > * gimple.h (gimple::uid): Likewise. > > > (gimple_set_uid): Likewise. > > > > > > gcc/testsuite/ChangeLog: > > > PR analyzer/98599 > > > * gcc.dg/analyzer/pr98599-a.c: New test. > > > * gcc.dg/analyzer/pr98599-b.c: New test. > > > --- > > > gcc/analyzer/supergraph.cc | 53 > > > +++++++++++++++++++++-- > > > gcc/analyzer/supergraph.h | 15 +++++++ > > > gcc/doc/gimple.texi | 6 +++ > > > gcc/gimple.h | 13 +++++- > > > gcc/testsuite/gcc.dg/analyzer/pr98599-a.c | 8 ++++ > > > gcc/testsuite/gcc.dg/analyzer/pr98599-b.c | 1 + > > > 6 files changed, 90 insertions(+), 6 deletions(-) > > > create mode 100644 gcc/testsuite/gcc.dg/analyzer/pr98599-a.c > > > create mode 100644 gcc/testsuite/gcc.dg/analyzer/pr98599-b.c > > > > > > diff --git a/gcc/analyzer/supergraph.cc > > > b/gcc/analyzer/supergraph.cc > > > index 419f6424f76..40acfbd16a8 100644 > > > --- a/gcc/analyzer/supergraph.cc > > > +++ b/gcc/analyzer/supergraph.cc > > > @@ -87,6 +87,46 @@ supergraph_call_edge (function *fun, gimple > > > *stmt) > > > return edge; > > > } > > > > > > +/* class saved_uids. > > > + > > > + In order to ensure consistent results without relying on the > > > ordering > > > + of pointer values we assign a uid to each gimple stmt, > > > globally unique > > > + across all functions. > > > + > > > + Normally, the stmt uids are a scratch space that each pass > > > can freely > > > + assign its own values to. However, in the case of LTO, the > > > uids are > > > + used to associate call stmts with callgraph edges between the > > > WPA phase > > > + (where the analyzer runs in LTO mode) and the LTRANS phase; > > > if the > > > + analyzer changes them in the WPA phase, it leads to errors > > > when > > > + streaming the code back in at LTRANS. > > > + > > > + Hence this class has responsibility for tracking the original > > > uids > > > + and restoring them once the pass is complete (in the > > > supergraph dtor). */ > > > + > > > +/* Give STMT a globally unique uid, storing its original uid so > > > it can > > > + later be restored. */ > > > + > > > +void > > > +saved_uids::make_uid_unique (gimple *stmt) > > > +{ > > > + unsigned next_uid = m_old_stmt_uids.length (); > > > + unsigned old_stmt_uid = stmt->uid; > > > + stmt->uid = next_uid; > > > + m_old_stmt_uids.safe_push > > > + (std::pair<gimple *, unsigned> (stmt, old_stmt_uid)); > > > +} > > > + > > > +/* Restore the saved uids of all stmts. */ > > > + > > > +void > > > +saved_uids::restore_uids () const > > > +{ > > > + unsigned i; > > > + std::pair<gimple *, unsigned> *pair; > > > + FOR_EACH_VEC_ELT (m_old_stmt_uids, i, pair) > > > + pair->first->uid = pair->second; > > > +} > > > + > > > /* supergraph's ctor. Walk the callgraph, building supernodes > > > for each > > > CFG basic block, splitting the basic blocks at > > > callsites. Join > > > together the supernodes with interprocedural and > > > intraprocedural > > > @@ -101,8 +141,6 @@ supergraph::supergraph (logger *logger) > > > > > > /* First pass: make supernodes (and assign UIDs to the gimple > > > stmts). */ > > > { > > > - unsigned next_uid = 0; > > > - > > > /* Sort the cgraph_nodes? */ > > > cgraph_node *node; > > > FOR_EACH_FUNCTION_WITH_GIMPLE_BODY (node) > > > @@ -127,7 +165,7 @@ supergraph::supergraph (logger *logger) > > > { > > > gimple *stmt = gsi_stmt (gpi); > > > m_stmt_to_node_t.put (stmt, node_for_stmts); > > > - stmt->uid = next_uid++; > > > + m_stmt_uids.make_uid_unique (stmt); > > > } > > > > > > /* Append statements from BB to the current supernode, > > > splitting > > > @@ -139,7 +177,7 @@ supergraph::supergraph (logger *logger) > > > gimple *stmt = gsi_stmt (gsi); > > > node_for_stmts->m_stmts.safe_push (stmt); > > > m_stmt_to_node_t.put (stmt, node_for_stmts); > > > - stmt->uid = next_uid++; > > > + m_stmt_uids.make_uid_unique (stmt); > > > if (cgraph_edge *edge = supergraph_call_edge (fun, > > > stmt)) > > > { > > > m_cgraph_edge_to_caller_prev_node.put(edge, > > > node_for_stmts); > > > @@ -257,6 +295,13 @@ supergraph::supergraph (logger *logger) > > > } > > > } > > > > > > +/* supergraph's dtor. Reset stmt uids. */ > > > + > > > +supergraph::~supergraph () > > > +{ > > > + m_stmt_uids.restore_uids (); > > > +} > > > + > > > /* Dump this graph in .dot format to PP, using DUMP_ARGS. > > > Cluster the supernodes by function, then by BB from original > > > CFG. */ > > > > > > diff --git a/gcc/analyzer/supergraph.h > > > b/gcc/analyzer/supergraph.h > > > index fc4a753c5a4..b2125a35410 100644 > > > --- a/gcc/analyzer/supergraph.h > > > +++ b/gcc/analyzer/supergraph.h > > > @@ -79,6 +79,18 @@ struct supergraph_traits > > > typedef supercluster cluster_t; > > > }; > > > > > > +/* A class to manage the setting and restoring of statement > > > uids. */ > > > + > > > +class saved_uids > > > +{ > > > +public: > > > + void make_uid_unique (gimple *stmt); > > > + void restore_uids () const; > > > + > > > +private: > > > + auto_vec<std::pair<gimple *, unsigned> > m_old_stmt_uids; > > > +}; > > > + > > > /* A "supergraph" is a directed graph formed by joining together > > > all CFGs, > > > linking them via interprocedural call and return edges. > > > > > > @@ -90,6 +102,7 @@ class supergraph : public > > > digraph<supergraph_traits> > > > { > > > public: > > > supergraph (logger *logger); > > > + ~supergraph (); > > > > > > supernode *get_node_for_function_entry (function *fun) const > > > { > > > @@ -205,6 +218,8 @@ private: > > > > > > typedef hash_map<const function *, unsigned> > > > function_to_num_snodes_t; > > > function_to_num_snodes_t m_function_to_num_snodes; > > > + > > > + saved_uids m_stmt_uids; > > > }; > > > > > > /* A node within a supergraph. */ > > > diff --git a/gcc/doc/gimple.texi b/gcc/doc/gimple.texi > > > index 4b3d7d7452e..417f37169ff 100644 > > > --- a/gcc/doc/gimple.texi > > > +++ b/gcc/doc/gimple.texi > > > @@ -173,6 +173,12 @@ This is an unsigned integer used by passes > > > that want to assign > > > IDs to every statement. These IDs must be assigned and used by > > > each pass. > > > > > > +Normally the uid field is a scratch space for use by passes that > > > +need one. However, in LTO mode the UIDs are used to associate > > > call > > > +statements with callgraph edges between the WPA phase and the > > > LTRANS > > > +phase; hence if they are changed by an IPA pass they should be > > > restored > > > +at the end of that pass. > > > + > > > @item @code{location} > > > This is a @code{location_t} identifier to specify source code > > > location for this statement. It is inherited from the front > > > diff --git a/gcc/gimple.h b/gcc/gimple.h > > > index 3ec86f5f082..7d01bd4dd6a 100644 > > > --- a/gcc/gimple.h > > > +++ b/gcc/gimple.h > > > @@ -260,7 +260,11 @@ struct GTY((desc > > > ("gimple_statement_structure (&%h)"), tag ("GSS_BASE"), > > > > > > /* UID of this statement. This is used by passes that want to > > > assign IDs to statements. It must be assigned and used by > > > each > > > - pass. By default it should be assumed to contain > > > garbage. */ > > > + pass. By default it should be assumed to contain garbage. > > > + However, in LTO mode the UIDs are used to associate call > > > stmts with > > > + callgraph edges between the WPA phase and the LTRANS phase; > > > hence > > > + if they are changed by an IPA pass they should be restored > > > at the > > > + end of that pass. */ > > > unsigned uid; > > > > > > /* [ WORD 2 ] > > > @@ -2043,7 +2047,12 @@ gimple_plf (gimple *stmt, enum plf_mask > > > plf) > > > Please note that this UID property is supposed to be > > > undefined at > > > pass boundaries. This means that a given pass should not > > > assume it > > > contains any useful value when the pass starts and thus can > > > set it > > > - to any value it sees fit. */ > > > + to any value it sees fit. > > > + > > > + That said, in LTO mode the UIDs are used to associate call > > > stmts with > > > + callgraph edges between the WPA phase and the LTRANS phase; > > > hence > > > + if they are changed by an IPA pass they should be restored at > > > the > > > + end of that pass. */ > > > > > > static inline void > > > gimple_set_uid (gimple *g, unsigned uid) > > > diff --git a/gcc/testsuite/gcc.dg/analyzer/pr98599-a.c > > > b/gcc/testsuite/gcc.dg/analyzer/pr98599-a.c > > > new file mode 100644 > > > index 00000000000..2bbf37b0e6e > > > --- /dev/null > > > +++ b/gcc/testsuite/gcc.dg/analyzer/pr98599-a.c > > > @@ -0,0 +1,8 @@ > > > +/* { dg-do link } */ > > > +/* { dg-require-effective-target lto } */ > > > +/* { dg-additional-options "-Os -flto" } */ > > > +/* { dg-additional-sources pr98599-b.c } */ > > > + > > > +int b(int x); > > > +int a() { b(5); } > > > +int main() { a(); } > > > diff --git a/gcc/testsuite/gcc.dg/analyzer/pr98599-b.c > > > b/gcc/testsuite/gcc.dg/analyzer/pr98599-b.c > > > new file mode 100644 > > > index 00000000000..cfdeb3bf134 > > > --- /dev/null > > > +++ b/gcc/testsuite/gcc.dg/analyzer/pr98599-b.c > > > @@ -0,0 +1 @@ > > > +int b(int x) { return x; } > > > -- > > > 2.26.2 > > >