>
> Areas that are confusing and need clean up (IMO) include:
> 1) handling of aliases and clones

I am slowly cleaning up the alias code; it had a major reorg in 4.7 and
further cleanups in 4.8. Do you have more specific suggestions?

> 2) reachability, needed, analyzed bits. The needed bit is not in sync
> with the call to check if a function is needed. Bits and attributes
> used in analysis should be isolated out from the core data structure.

This is mostly done. The reachable and needed flags are removed. We
still have the analyzed flag, but these days it really means "is
definition", except during the early cgraph construction phase, so I
intend to rename it (and make cgraph construction use its own bitmap).

> 3) ipa references -- should they be modeled as special edges?

They are special edges, though they are represented differently in
memory from cgraph edges. I estimated how much memory would be needed
to represent them the cgraph_edge way, and it was quite steep for
Mozilla LTO (over a gig, compared to the roughly 200MB we need now).
We could also rework cgraph edges into a vector representation instead
of doubly linked lists to save some memory; I am not sure how fruitful
that would be. Even then, the two in-memory representations have
different sizes, so separate vectors are desirable here.

>
> >> >
> >> >> 2) Introduce a global symbol table containing a function table
> >> >> and a global variable table. The function table should replace
> >> >> the current cgraph node link list, and the variable table replaces
> >> >> the varpool. The symbol table should provide basic interfaces to
> >> >> do named based lookup, traversal, alias handling etc. I noticed
> >> >> trunk already has some of that -- but it seems more abstraction
> >> >> is needed.
> >> >
> >> > Do you mean moving away from a pointer-based approach?
> >>
> >> See above. I mean it is important to have well defined symtab
> >> interfaces that hide implementation as much as possible. It will make
> >> the interfaces more stable.
> >> It is currently quite difficult for
> >> cgraph/varpool related changes in gcc branches to keep up with trunk
> >> without stable core APIs.
> >
> > Well, the APIs did not change much over the years (LTO brought in some
> > more busy changes, but many were just hacks that needed redesign anyway),
> > in 4.8 we changed quite a lot because of introduction of the symbol
> > table. It would be really nice to get as much of the C++ conversion into
> > 4.8, too, since we broke most of existing patches anyway, to get more
> > stability from 4.9 up.
> >
> >> >> 3) it seems natural to drop 'node' in the naming scheme
> >> >>
> >> >> symbol (symtab_entry) --> base class;
> >> >> function --> derived from symbol; (It conflicts with the existing
> >> >> struct function though);
> >> >> variable --> derived from symbol;
> >> >>
> >> >> typedef node<function> cnode;
> >> >
> >> > A node<function> is not a derived class of node<symbol> even when
> >> > function is derived from symbol. That property is helpful in
> >> > ensuring usable type safety.
> >>
> >>
> >> We don't need node<symbol> -- only node<function> is needed, and it is
> >> derived from function and function is derived from symbol.
> >
> > Yes, this is where I expected to land. I expect to see more complex
> > inheritance tree. Currently we have:
> >
> > symbol
> >  |
> >  +-function (cgraph)
> >  |
> >  +-variable (varpool)
> >
> > While function can have five types - just external declarations, actual
> > definitions with body (analyzed flag is set), thunk, alias and virtual
> > clone
> >
> > Similarly variables are declarations, ones with body (analyzed flag is
> > set), and aliases.
> >
> > We do not represent labels and const_decl references. We ought: those
> > are also symbols and they are important for partitioning.
> >
> > We probably want richer structure here in longer term:
> >
> > symbol
> >  |
> >  +-function declaration
> >  |  |
> >  |  +-function definition
> >  |  |
> >  |  +-function alternative entry point (thunk)
> >  |  |
> >  |  +-function alias
> >  |  |
> >  |  +-virtual clone
> >  |
> >  +-variable declaration
> >  |  |
> >  |  +-variable definition (possibly constant pool references via
> >  |  |  const_decl here, too)
> >  |  |
> >  |  +-variable alias
> >  |
> >  +-function label definition
> >
> > In particular we do not want to store all the stuff we store for function
> > definitions for declarations.
>
> True, but that is at the cost of too complex class hierarchy. Is it
> enough to have 'has_body()' to differentiate def vs declarations (the
> meaning of 'is_external' is overloaded)? The additional information
> can be accessed via indirection (to additional data structure).

We can, but I am not sure how effective those sparse tables will be in
practice. Some of this stuff is quite critical during the WPA stage.

>
> For aliases, why can't those nodes be merged into one node (mapped
> from all assembler names in the symtab)? aliases will just be
> attribute of the symbol).

I attempted to merge aliases into one node for cgraph/varpool, but this
did not really work. The problem is that different symbols have
different visibilities (one is overwritable, another is not).
Consequently, you need to track which specific alias a call or
reference goes to. So we ended up doing this only for same-body
aliases, and eventually I got convinced to reorganize the code the
other way (in 4.7) so that the aliases are all explicitly referenced.
Except for the need to walk the aliases to the real node and work out
the visibility, there is not much to worry about. Sadly, I do not think
aliases can be fully hidden in the abstraction: you need to actually
think about them when doing many of the IPA transforms.

>
> Thunks are only referenced from vtables. Is there a need to create
> cgraph nodes for them?

This is not really true.
Thunks may get into code via constant folding or LTO unit merging. We
used to disallow direct calls to thunks in 4.6 and earlier but decided
to lift this restriction in 4.7. In general, thunks are just special
cases of the alternative entry points we will want to support in the
very long run.

>
> The clone handling code is also quite complicated. Is it possible to
> simplify it?

I do not really have very explicit plans here, except for making inline
clones a bit more regular as part of my reorg of WPA partitioning. I
also plan to clean up some of the older code in ipa.c that is now a bit
convoluted wrt the clones because of how it developed over the years.
Suggestions are welcome.

Honza
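
P.S. To make the alias point concrete, here is a minimal C++ sketch of
walking an alias chain to the real node while working out the
visibility. Every name in it (Symbol, Function, Variable, Resolved,
resolve_alias, the field names) is hypothetical and chosen only for
illustration; none of it is the actual cgraph/varpool code, and it
assumes alias chains are acyclic.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-ins for illustration; not the real GCC types.
struct Symbol {
  std::string asm_name;
  bool is_definition = false;      // roughly what the "analyzed" flag means
  bool overwritable = false;       // may be replaced by another definition
  Symbol *alias_target = nullptr;  // non-null only when this is an alias
};

struct Function : Symbol {};  // cgraph side
struct Variable : Symbol {};  // varpool side

struct Resolved {
  Symbol *node;             // the real node at the end of the alias chain
  bool may_be_overwritten;  // true if any symbol on the way is overwritable
};

// Walk from the referenced symbol to the real node, remembering whether
// any step on the way could be overwritten. Assumes the chain is acyclic.
inline Resolved resolve_alias(Symbol *sym) {
  bool weak = sym->overwritable;
  while (sym->alias_target != nullptr) {
    sym = sym->alias_target;
    weak = weak || sym->overwritable;
  }
  return Resolved{sym, weak};
}
```

Under these assumptions, two aliases of the same body can give
different answers: a call through an overwritable alias cannot be
assumed to reach the body, while a call through a non-overwritable one
can. Merging both aliases into one node would lose exactly that
distinction.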