On Thu, Jun 12, 2014 at 12:29 PM, Richard Biener <richard.guent...@gmail.com> wrote: > On Thu, Jun 12, 2014 at 10:47 AM, Jan Hubicka <hubi...@ucw.cz> wrote: >> Richard, >> as briefly discussed before, I would like to teach LTO type merging to not >> merge >> types that was declared in anonymous namespaces and use C++ ODR type names >> (stored in DECL_ASSEMBLER_NAME of the TYPE_DECL) to break down canonical >> types >> by their names. >> >> First thing I need to arrange IMO is to not merge two anonymous types from >> two different units. While looking into it I noticed that the current code >> in unify_scc that refuses to merge local decls produces conflicts and seems >> useless excercise to do. >> >> This patch introduces special hash code 1 that specify that given SCC is >> known >> to be local and should bypass the merging logic. This is propagated down and >> seems to quite noticeably reduce size of SCC hash: >> >> [WPA] read 10190717 SCCs of average size 1.980409 >> [WPA] 20181785 tree bodies read in total >> [WPA] tree SCC table: size 4194301, 1882700 elements, collision ratio: >> 0.815497 >> [WPA] tree SCC max chain length 140 (size 1) >> [WPA] Compared 3392363 SCCs, 2718822 collisions (0.801454) >> [WPA] Merged 3314075 SCCs >> [WPA] Merged 9693632 tree bodies >> [WPA] Merged 2467704 types >> [WPA] 1783262 types prevailed (4491218 associated trees) >> [WPA] GIMPLE canonical type table: size 131071, 94867 elements, 1783347 >> searches, 737056 collisions (ratio: 0.413299) >> [WPA] GIMPLE canonical type pointer-map: 94867 elements, 3973875 searches >> [WPA] Compression: 282828785 input bytes, 831186147 uncompressed bytes >> (ratio: 2.938832) >> [WPA] Size of mmap'd section decls: 282828785 bytes >> >> to: >> >> [WPA] read 10172291 SCCs of average size 1.982162 >> [WPA] 20163124 tree bodies read in total >> [WPA] tree SCC table: size 2097143, 988764 elements, collision ratio: >> 0.684967 >> [WPA] tree SCC max chain length 140 (size 1) >> [WPA] Compared 3060932 SCCs, 2405009 collisions (0.785711) >> [WPA] Merged 3040565 SCCs >> [WPA] Merged 9246482 tree bodies >> [WPA] Merged 2382312 types >> [WPA] 1868611 types prevailed (4728465 associated trees) >> [WPA] GIMPLE canonical type table: size 131071, 94910 elements, 1868696 >> searches, 790939 collisions (ratio: 0.423257) >> [WPA] GIMPLE canonical type pointer-map: 94910 elements, 4216423 searches >> [WPA] Compression: 273322455 input bytes, 824178095 uncompressed bytes >> (ratio: 3.015406) >> >> We merge less, but not by much and I think we was not right not merge in >> that cases. > > If we merge things we may not merge then the fix is to compare_tree_sccs_1, > not introducing special cases like you propose. > > That is, if we are not allowed to merge anonymous namespaces then > make sure we don't. We already should not merge types with > TYPE_CONTEXT == such namespace by means of > > /* ??? Global types from different TUs have non-matching > TRANSLATION_UNIT_DECLs. Still merge them if they are otherwise > equal. */ > if (TYPE_FILE_SCOPE_P (t1) && TYPE_FILE_SCOPE_P (t2)) > ; > else > compare_tree_edges (TYPE_CONTEXT (t1), TYPE_CONTEXT (t2)); > > but we possibly merge a subset of decl kinds from "different" namespaces : > > /* ??? Global decls from different TUs have non-matching > TRANSLATION_UNIT_DECLs. Only consider a small set of > decls equivalent, we should not end up merging others. */ > if ((code == TYPE_DECL > || code == NAMESPACE_DECL > || code == IMPORTED_DECL > || code == CONST_DECL > || (VAR_OR_FUNCTION_DECL_P (t1) > && (TREE_PUBLIC (t1) || DECL_EXTERNAL (t1)))) > && DECL_FILE_SCOPE_P (t1) && DECL_FILE_SCOPE_P (t2)) > ; > else > compare_tree_edges (DECL_CONTEXT (t1), DECL_CONTEXT (t2)); > > Not sure what we end up doing for NAMESPACE_DECL itself (and what > fields we stream for it). It would be interesting to check that. > > Thus, make sure we don't merge namespace {} and namespace {} from > two different units. > > But effectively you say we have two classes of "global" trees, first > those that are mergeable across TUs and second those that are not. > This IMHO means we want to separate those to two different LTO > sections and simply skip all the merging code for the second (instead > of adding hacks to the merging code).
As that also restricts the "pointers" we can have. Mergeable stuff may not refer to non-mergeable stuff. Breaks down for initializers: static int x; int *p = &x; though you could say that as p is initialized (thus not DECL_COMMON) this instance cannot be merged with anything else - other entities are 'extern int *p' (tree merging is different from symtab merging). Thus int *p = &x; is also non-mergeable (everything that has tree pointers refer to sth not mergeable is not mergeable). We have similar "issues" with tree_is_indexable and pointers violating constraints (like those variadic return types). Similar as I had the idea to compute "indexability" during the SCC walk we can compute "mergability" during it as well. Richard. > Richard. > >> >> Would something like this make sense? (I am not saying my definition of >> unit_local_tree_p >> is most polished one :) >> >> I think next step could be to make anonymous types to bypass the canonical >> type >> merging (i.e. simply save the chains as they comde from frontends forthose) >> and >> then look into computing the type names in free lang data, using odr name >> hash instaed >> of canonical type hash for those named types + link them to canonical type >> hash >> entries and if we end up with unnamed type in canonical type hash, then make >> its >> alias class to conflict with all the named types. >> >> Honza >> >> Index: lto-streamer-out.c >> =================================================================== >> --- lto-streamer-out.c (revision 211488) >> +++ lto-streamer-out.c (working copy) >> @@ -54,6 +54,47 @@ along with GCC; see the file COPYING3. >> #include "cfgloop.h" >> #include "builtins.h" >> >> +/* Return if T can never be shared across units. */ >> +static bool >> +unit_local_tree_p (tree t) >> +{ >> + switch (TREE_CODE (t)) >> + { >> + case VAR_DECL: >> + /* Automatic variables are always unit local. */ >> + if (!TREE_STATIC (t) && !DECL_EXTERNAL (t) >> + && !DECL_HARD_REGISTER (t)) >> + return true; >> + /* ... fall through ... */ >> + >> + case FUNCTION_DECL: >> + /* Non-public declarations are alwyas local. */ >> + if (!TREE_PUBLIC (t)) >> + return true; >> + >> + /* Public definitions that would cause linker error if >> + appeared in other unit. */ >> + if (TREE_PUBLIC (t) >> + && !DECL_EXTERNAL (t) >> + && !DECL_WEAK (t)) >> + return true; >> + return false; >> + case NAMESPACE_DECL: >> + return !TREE_PUBLIC (t); >> + case TRANSLATION_UNIT_DECL: >> + return true; >> + case PARM_DECL: >> + case RESULT_DECL: >> + case LABEL_DECL: >> + case SSA_NAME: >> + return true; >> + default: >> + if (TYPE_P (t) >> + && type_in_anonymous_namespace_p (t)) >> + return true; >> + return false; >> + } >> +} >> >> static void lto_write_tree (struct output_block*, tree, bool); >> >> @@ -686,7 +727,9 @@ DFS_write_tree_body (struct output_block >> #undef DFS_follow_tree_edge >> } >> >> -/* Return a hash value for the tree T. */ >> +/* Return a hash value for the tree T. >> + If T is local to unit or refers anything local to unit, return 1. >> + Otherwise return non-1. */ >> >> static hashval_t >> hash_tree (struct streamer_tree_cache_d *cache, tree t) >> @@ -694,10 +737,19 @@ hash_tree (struct streamer_tree_cache_d >> #define visit(SIBLING) \ >> do { \ >> unsigned ix; \ >> + hashval_t h; \ >> if (SIBLING && streamer_tree_cache_lookup (cache, SIBLING, &ix)) \ >> - v = iterative_hash_hashval_t (streamer_tree_cache_get_hash (cache, >> ix), v); \ >> + { \ >> + h = streamer_tree_cache_get_hash (cache, ix); \ >> + if (h == STREAMER_TREE_CACHE_HASH_LOCAL) \ >> + return STREAMER_TREE_CACHE_HASH_LOCAL; \ >> + v = iterative_hash_hashval_t (h, v); \ >> + } \ >> } while (0) >> >> + if (unit_local_tree_p (t)) >> + return STREAMER_TREE_CACHE_HASH_LOCAL; >> + >> /* Hash TS_BASE. */ >> enum tree_code code = TREE_CODE (t); >> hashval_t v = iterative_hash_host_wide_int (code, 0); >> @@ -911,7 +963,11 @@ hash_tree (struct streamer_tree_cache_d >> hashval_t x; >> unsigned ix; >> if (streamer_tree_cache_lookup (cache, TREE_TYPE (t), &ix)) >> - x = streamer_tree_cache_get_hash (cache, ix); >> + { >> + x = streamer_tree_cache_get_hash (cache, ix); >> + if (x == STREAMER_TREE_CACHE_HASH_LOCAL) >> + return STREAMER_TREE_CACHE_HASH_LOCAL; >> + } >> else >> x = hash_tree (cache, TREE_TYPE (t)); >> v = iterative_hash_hashval_t (x, v); >> @@ -1101,7 +1157,7 @@ hash_tree (struct streamer_tree_cache_d >> visit (OMP_CLAUSE_CHAIN (t)); >> } >> >> - return v; >> + return v + (v == STREAMER_TREE_CACHE_HASH_LOCAL); >> >> #undef visit >> } >> @@ -1121,14 +1177,25 @@ scc_entry_compare (const void *p1_, cons >> } >> >> /* Return a hash value for the SCC on the SCC stack from FIRST with >> - size SIZE. */ >> + size SIZE. >> + If STREAMER_TREE_CACHE_HASH_LOCAL is returned, we know the SCC will never >> + get merged with other unit. */ >> >> static hashval_t >> hash_scc (struct streamer_tree_cache_d *cache, unsigned first, unsigned >> size) >> { >> /* Compute hash values for the SCC members. */ >> for (unsigned i = 0; i < size; ++i) >> - sccstack[first+i].hash = hash_tree (cache, sccstack[first+i].t); >> + { >> + sccstack[first+i].hash = hash_tree (cache, sccstack[first+i].t); >> + /* If any member of SCC is local, whole SCC is. */ >> + if (sccstack[first+i].hash == STREAMER_TREE_CACHE_HASH_LOCAL) >> + { >> + for (unsigned i = 0; i < size; ++i) >> + sccstack[first+i].hash = STREAMER_TREE_CACHE_HASH_LOCAL; >> + return STREAMER_TREE_CACHE_HASH_LOCAL; >> + } >> + } >> >> if (size == 1) >> return sccstack[first].hash; >> @@ -1161,7 +1228,7 @@ hash_scc (struct streamer_tree_cache_d * >> sccstack[first+i].hash = tem[i]; >> scc_hash = iterative_hash_hashval_t (tem[i], scc_hash); >> } >> - return scc_hash; >> + return scc_hash + (scc_hash == STREAMER_TREE_CACHE_HASH_LOCAL); >> } >> >> /* DFS walk EXPR and stream SCCs of tree bodies if they are not >> @@ -1242,26 +1309,29 @@ DFS_write_tree (struct output_block *ob, >> scc_hash = hash_scc (ob->writer_cache, first, size); >> >> /* Put the entries with the least number of collisions first. >> */ >> - unsigned entry_start = 0; >> - scc_entry_len = size + 1; >> - for (unsigned i = 0; i < size;) >> + if (scc_hash != STREAMER_TREE_CACHE_HASH_LOCAL) >> { >> - unsigned from = i; >> - for (i = i + 1; i < size >> - && (sccstack[first + i].hash >> - == sccstack[first + from].hash); ++i) >> - ; >> - if (i - from < scc_entry_len) >> + unsigned entry_start = 0; >> + scc_entry_len = size + 1; >> + for (unsigned i = 0; i < size;) >> { >> - scc_entry_len = i - from; >> - entry_start = from; >> + unsigned from = i; >> + for (i = i + 1; i < size >> + && (sccstack[first + i].hash >> + == sccstack[first + from].hash); ++i) >> + ; >> + if (i - from < scc_entry_len) >> + { >> + scc_entry_len = i - from; >> + entry_start = from; >> + } >> + } >> + for (unsigned i = 0; i < scc_entry_len; ++i) >> + { >> + scc_entry tem = sccstack[first + i]; >> + sccstack[first + i] = sccstack[first + entry_start + >> i]; >> + sccstack[first + entry_start + i] = tem; >> } >> - } >> - for (unsigned i = 0; i < scc_entry_len; ++i) >> - { >> - scc_entry tem = sccstack[first + i]; >> - sccstack[first + i] = sccstack[first + entry_start + i]; >> - sccstack[first + entry_start + i] = tem; >> } >> } >> >> @@ -1271,7 +1341,7 @@ DFS_write_tree (struct output_block *ob, >> streamer_write_uhwi (ob, scc_hash); >> >> /* Write size-1 SCCs without wrapping them inside SCC bundles. >> - All INTEGER_CSTs need to be handled this way as we need >> + All INTEGER_CSTs need thiso be handled this way as we need >> their type to materialize them. Also builtins are handled >> this way. >> ??? We still wrap these in LTO_tree_scc so at the >> @@ -1291,6 +1361,8 @@ DFS_write_tree (struct output_block *ob, >> tree t = sccstack[first+i].t; >> bool exists_p = streamer_tree_cache_insert >> (ob->writer_cache, >> t, hash, &ix); >> + gcc_assert (scc_hash != STREAMER_TREE_CACHE_HASH_LOCAL >> + || hash == STREAMER_TREE_CACHE_HASH_LOCAL); >> gcc_assert (!exists_p); >> >> if (!lto_is_streamable (t)) >> Index: lto/lto.c >> =================================================================== >> --- lto/lto.c (revision 211488) >> +++ lto/lto.c (working copy) >> @@ -1896,7 +1897,7 @@ lto_read_decls (struct lto_file_decl_dat >> continue; >> >> /* Try to unify the SCC with already existing ones. */ >> - if (!flag_ltrans >> + if (!flag_ltrans && scc_hash != STREAMER_TREE_CACHE_HASH_LOCAL >> && unify_scc (data_in->reader_cache, from, >> len, scc_entry_len, scc_hash)) >> continue;