Hi, this is a proof-of-concept patch for type merging during LTO streaming. It does two things 1) replace type variant by first compatible one in TYPE_NEXT_VARIANT list This is useful at compilation time because frontends produce more variants than needed. The effect of this is about 2% of decl stream 2) replace ODR types by their prevailing variant if ODR violation was not detected. This is useful at WPA time and saves about 50% of global decl stream. For Firefox it reduces the ltrans.o files size from 2.6GB to 2.0GB.
Here are stats of number of nodes hitting global stream during WPA->ltrans streaming of firefox: Before after 7354 7280 namespace_decl 46593 14084 union_type 18496 15405 translation_unit_decl 22816 21548 integer_type 107047 31681 enumeral_type 67072 46390 array_type 541220 99572 pointer_plus_expr 542360 100712 addr_expr 167657 154019 var_decl 864769 182691 tree_binfo 240200 206783 reference_type 410403 316877 function_type 1862985 522524 type_decl 1954864 652664 record_type 1070333 880209 method_type 1582055 1014270 pointer_type 1495367 1406670 tree_list 7926385 1483545 field_decl 1715133 1725612 function_decl 3384810 3191406 identifier_node So largest savings are in field_decls (to 18% and they are not even merged fully by the patch), pointer tpes (to 64%), record types (to 30%) and type decls (to 28%), binfos addr_expr/pointer_plus_expr (the later are used to reffer to vtables to 21%). Patch bootstraps, regtestes and seems to work reliably. The merging is currently done during streaming by simply looking up prevailing type and populating the cache (so we do not get too many repeated lookups). Similar tricks can be added to checksum calculation. I am not sure this is best possible place, especially for 2) since it provides really quite major reduction in the IL size. I am quite sure we do not want to do that at stream in time in parallel to SCC merging because it affects the SCC components and also we do not want to merge this way types containing ODR violations as QOI issue. (merging two completely different types would lead to ICEs but merging two TBAA incompatible types is probably equally bad). So we need to do following a) read in the the types & do tree mering b) populate odr type hash, do ODR violation checks and decide on what types are consistent c) do the ODR based merging sometime before the trees hit ltrans stream. I was thinking that perhaps c) can also be added to mentions_vars_p machinery which would make it somewhat more costy but we could free those duplicates and save some more memory during WPA. I am also not usre if longer term we would not want to have mode that bypasses WPA->ltrans stream completely and just compile out of WPA process in threads or forks. Drawback is that building vector recording every single type pointer is going to also consume quite some memory. Patch is also not complete because it does not merge field decls. Analogous tricks can be added to streamer, but I just tought I would like to discuss alternative implementations first. WPA compile time improves by about 11%, so merging benefits overcome the extra complexity in sreamer. I hope that with this patch I will be able to increase default number of partitions since the global stream is getting less percentage of the ltrans files. 33% is still quite a lot, but far better than what we had previously. Hopefully we could still dramatically cut this down in followup. Also obvious drawback is that this helps to C++ only. I looked into stats from C programs and also Toon's stats on Fortran and I tend to believe that C++ is really much worse than those two. Honza Index: ipa-devirt.c =================================================================== --- ipa-devirt.c (revision 264689) +++ ipa-devirt.c (working copy) @@ -2111,6 +2111,29 @@ get_odr_type (tree type, bool insert) return val; } +/* Return the main variant of the odr type. This is used for straming out + to reduce number of type duplicates hitting the WPA->LTRANS streams. + Do not do so when ODR violation was detected since the type may be + structurally different then. */ + +tree +prevailing_odr_type (tree t) +{ + t = TYPE_MAIN_VARIANT (t); + /* In need_assembler_name_p we also mangle assembler names of INTEGER_TYPE. + We can not merge these because this does not honnor precision and + signedness. */ + if (!type_with_linkage_p (t) + || type_in_anonymous_namespace_p (t) + || TREE_CODE (t) == INTEGER_TYPE + || !COMPLETE_TYPE_P (t)) + return t; + odr_type ot = get_odr_type (t, true); + if (!ot || !ot->odr_violated) + return ot->type; + return t; +} + /* Add TYPE od ODR type hash. */ void Index: ipa-utils.h =================================================================== --- ipa-utils.h (revision 264689) +++ ipa-utils.h (working copy) @@ -90,6 +90,7 @@ void warn_types_mismatch (tree t1, tree location_t loc2 = UNKNOWN_LOCATION); bool odr_or_derived_type_p (const_tree t); bool odr_types_equivalent_p (tree type1, tree type2); +tree prevailing_odr_type (tree t); /* Return vector containing possible targets of polymorphic call E. If COMPLETEP is non-NULL, store true if the list is complete. Index: lto/lto.c =================================================================== --- lto/lto.c (revision 264689) +++ lto/lto.c (working copy) @@ -485,6 +485,8 @@ gimple_register_canonical_type_1 (tree t static void gimple_register_canonical_type (tree t) { + if (flag_checking) + verify_type (t); if (TYPE_CANONICAL (t) || !type_with_alias_set_p (t) || !canonical_type_used_p (t)) return; Index: lto-streamer-out.c =================================================================== --- lto-streamer-out.c (revision 264689) +++ lto-streamer-out.c (working copy) @@ -43,6 +43,8 @@ along with GCC; see the file COPYING3. #include "debug.h" #include "omp-offload.h" #include "print-tree.h" +#include "attribs.h" +#include "ipa-utils.h" static void lto_write_tree (struct output_block*, tree, bool); @@ -109,14 +111,188 @@ destroy_output_block (struct output_bloc free (ob); } +/* Do same comparsion as check_qualified_type skipping lang part of type + and be more permissive about type names: we only care that names are + same (for diagnostics) and that ODR names are the same. */ -/* Look up NODE in the type table and write the index for it to OB. */ +static bool +types_equal_p (tree t, tree v) +{ + if (TYPE_QUALS (t) != TYPE_QUALS (v)) + return false; + + if (TYPE_NAME (t) != TYPE_NAME (v) + && (!TYPE_NAME (t) || !TYPE_NAME (v) + || TREE_CODE (TYPE_NAME (t)) != TYPE_DECL + || TREE_CODE (TYPE_NAME (v)) != TYPE_DECL + || DECL_ASSEMBLER_NAME_RAW (TYPE_NAME (t)) + != DECL_ASSEMBLER_NAME_RAW (TYPE_NAME (v)) + || DECL_NAME (TYPE_NAME (t)) != DECL_NAME (TYPE_NAME (v)))) + return false; + + if (TYPE_ALIGN (t) != TYPE_ALIGN (v)) + return false; + + if (!attribute_list_equal (TYPE_ATTRIBUTES (t), + TYPE_ATTRIBUTES (v))) + return false; + + /* Do not replace complete type by incomplete. */ + if (COMPLETE_TYPE_P (t) != COMPLETE_TYPE_P (v)) + return false; + + if (TYPE_ADDR_SPACE (t) != TYPE_ADDR_SPACE (v)) + return false; + + gcc_assert (TREE_CODE (t) == TREE_CODE (v)); + + /* For pointer types and array types we also care about the type they + reffer to. */ + if (TREE_TYPE (t)) + return types_equal_p (TREE_TYPE (t), TREE_TYPE (v)); + + return true; +} + +/* Search list of type variants and look up first one that looks same. + At compile time, this removes duplicates of types created by front-ends. + At WPA time we also merge duplicated ODR types. */ + +static tree +first_compatible_type_variant (tree t) +{ + tree first = NULL; + if (POINTER_TYPE_P (t)) + { + tree t2 = first_compatible_type_variant (TREE_TYPE (t)); + if (t2 != TREE_TYPE (t)) + { + if (TREE_CODE (t) == POINTER_TYPE) + first = build_pointer_type_for_mode (t2, TYPE_MODE (t), + TYPE_REF_CAN_ALIAS_ALL (t)); + else + first = build_reference_type_for_mode (t2, TYPE_MODE (t), + TYPE_REF_CAN_ALIAS_ALL (t)); + } + else + first = TYPE_MAIN_VARIANT (t); + gcc_assert (TYPE_MAIN_VARIANT (first) == first); + } + else + first = prevailing_odr_type (t); + + gcc_checking_assert (gimple_canonical_types_compatible_p (t, first, false)); + + for (tree v = first; v; v = TYPE_NEXT_VARIANT (v)) + if (types_equal_p (t, v)) + { + gcc_checking_assert (gimple_canonical_types_compatible_p (t, v, false)); + gcc_checking_assert (!in_lto_p || TYPE_CANONICAL (t) == TYPE_CANONICAL (v)); + gcc_checking_assert (get_alias_set (t) == get_alias_set (v)); + return v; + } + + /* Most of the time we will find v itself in the list. With ODR type merging + we however merge across TYPE_MAIN_VARIANTs and in that case we may not + have corresponding type in the list. */ + tree v = build_variant_type_copy (first); + TYPE_READONLY (v) = TYPE_READONLY (t); + TYPE_VOLATILE (v) = TYPE_VOLATILE (t); + TYPE_ATOMIC (v) = TYPE_ATOMIC (t); + TYPE_RESTRICT (v) = TYPE_RESTRICT (t); + TYPE_ADDR_SPACE (v) = TYPE_ADDR_SPACE (t); + SET_TYPE_ALIGN (v, TYPE_ALIGN (t)); + TYPE_NAME (v) = TYPE_NAME (t); + TYPE_ATTRIBUTES (v) = TYPE_ATTRIBUTES (t); + TREE_TYPE (v) = TREE_TYPE (t); + gcc_checking_assert (types_equal_p (t, v)); + gcc_checking_assert (gimple_canonical_types_compatible_p (t, v, false)); + gcc_checking_assert (!in_lto_p || TYPE_CANONICAL (t) == TYPE_CANONICAL (v)); + gcc_checking_assert (get_alias_set (t) == get_alias_set (v)); + return v; +} + + +/* Look up NODE in the type table and write the index for it to OB. + Eliminate unnecesary type variants. */ static void output_type_ref (struct output_block *ob, tree node) { + bool existed_p; + unsigned int &index + = ob->decl_state->streams[LTO_DECL_STREAM_TYPE].tree_hash_table-> + get_or_insert (node, &existed_p); streamer_write_record_start (ob, LTO_type_ref); - lto_output_type_ref_index (ob->decl_state, ob->main_stream, node); + + if (existed_p) + { + streamer_write_uhwi_stream (ob->main_stream, index); + return; + } + + tree node2 = first_compatible_type_variant (node); + + /* If NODE is first variant, just add it into encoder. */ + if (node2 == node) + { + index = ob->decl_state->streams[LTO_DECL_STREAM_TYPE].trees.length (); + if (streamer_dump_file) + { + print_node_brief (streamer_dump_file, + " Encoding indexable ", node, 4); + fprintf (streamer_dump_file, " as %i\n", index); + } + ob->decl_state->streams[LTO_DECL_STREAM_TYPE].trees.safe_push (node); + streamer_write_uhwi_stream (ob->main_stream, index); + return; + } + + index = 0xdead; + + if (streamer_dump_file) + { + print_node_brief (streamer_dump_file, " Type ", node, 4); + fprintf (streamer_dump_file, " prevailed by "); + print_node_brief (streamer_dump_file, "", node2, 4); + fprintf (streamer_dump_file, "\n"); + } + + /* See if we already assgined one to the first variant. If that is the case + only duplicate the entry in the hashtable so we do not need to call + first_compatible_type_variant redundantly. */ + unsigned int &index2 + = ob->decl_state->streams[LTO_DECL_STREAM_TYPE].tree_hash_table-> + get_or_insert (node2, &existed_p); + /* The reference may be changed by hash table expansion. */ + unsigned int i = index2; + + if (existed_p) + { + ob->decl_state->streams[LTO_DECL_STREAM_TYPE]. + tree_hash_table->get_or_insert (node, &existed_p) = i; + streamer_write_uhwi_stream (ob->main_stream, i); + return; + } + else + { + index2 = ob->decl_state->streams[LTO_DECL_STREAM_TYPE].trees.length (); + i = index2; + if (streamer_dump_file) + { + print_node_brief (streamer_dump_file, + " Encoding indexable ", node2, 4); + fprintf (streamer_dump_file, " as %i \n", i); + } + ob->decl_state->streams[LTO_DECL_STREAM_TYPE].trees.safe_push (node2); + /* We need to search again to watch hash table resize. */ + ob->decl_state->streams[LTO_DECL_STREAM_TYPE]. + tree_hash_table->get_or_insert (node, &existed_p) = i; + gcc_assert (existed_p); + streamer_write_uhwi_stream (ob->main_stream, i); + } + return; } @@ -978,12 +1171,15 @@ hash_tree (struct streamer_tree_cache_d #define visit(SIBLING) \ do { \ unsigned ix; \ - if (!SIBLING) \ + tree t2 = SIBLING; \ + if (t2 && TYPE_P (t2)) \ + t2 = first_compatible_type_variant (t2); \ + if (!t2) \ hstate.add_int (0); \ - else if (streamer_tree_cache_lookup (cache, SIBLING, &ix)) \ + else if (streamer_tree_cache_lookup (cache, t2, &ix)) \ hstate.add_int (streamer_tree_cache_get_hash (cache, ix)); \ else if (map) \ - hstate.add_int (*map->get (SIBLING)); \ + hstate.add_int (*map->get (t2)); \ else \ hstate.add_int (1); \ } while (0) @@ -1554,6 +1750,10 @@ DFS::DFS_write_tree (struct output_block if (this_ref_p && tree_is_indexable (expr)) return; + /* Replace type by its prevailing variant. */ + if (TYPE_P (expr)) + expr = first_compatible_type_variant (expr); + /* Check if we already streamed EXPR. */ if (streamer_tree_cache_lookup (ob->writer_cache, expr, NULL)) return; @@ -1591,6 +1791,11 @@ lto_output_tree (struct output_block *ob return; } + if (TYPE_P (expr)) + expr = first_compatible_type_variant (expr); + + gcc_checking_assert (!this_ref_p || !tree_is_indexable (expr)); + existed_p = streamer_tree_cache_lookup (ob->writer_cache, expr, &ix); if (existed_p) { @@ -2334,7 +2539,18 @@ copy_function_or_variable (struct symtab gcc_assert (lto_tree_ref_encoder_size (encoder) == 0); encoder->trees.reserve_exact (n); for (j = 0; j < n; j++) - encoder->trees.safe_push ((*trees)[j]); + { + tree t = (*trees)[j]; + if (TYPE_P (t)) + t = first_compatible_type_variant (t); + if (streamer_dump_file) + { + print_node_brief (streamer_dump_file, + " Adding reference to ", t, 4); + fprintf (streamer_dump_file, " as %i \n", (int)j); + } + encoder->trees.safe_push (t); + } } lto_free_raw_section_data (file_data, LTO_section_function_body, name, @@ -2507,6 +2723,8 @@ write_global_stream (struct output_block print_node_brief (streamer_dump_file, "", t, 4); fprintf (streamer_dump_file, "\n"); } + gcc_checking_assert (!TYPE_P (t) + || t == first_compatible_type_variant (t)); if (!streamer_tree_cache_lookup (ob->writer_cache, t, NULL)) stream_write_tree (ob, t, false); } @@ -2537,6 +2755,8 @@ write_global_references (struct output_b t = lto_tree_ref_encoder_get_tree (encoder, index); streamer_tree_cache_lookup (ob->writer_cache, t, &slot_num); gcc_assert (slot_num != (unsigned)-1); + gcc_checking_assert (!TYPE_P (t) + || t == first_compatible_type_variant (t)); data[index + 1] = slot_num; } Index: tree-ssa-alias.c =================================================================== --- tree-ssa-alias.c (revision 264689) +++ tree-ssa-alias.c (working copy) @@ -38,6 +38,7 @@ along with GCC; see the file COPYING3. #include "tree-dfa.h" #include "ipa-reference.h" #include "varasm.h" +#include "ipa-utils.h" /* Broad overview of how alias analysis on gimple works: @@ -955,6 +956,12 @@ nonoverlapping_component_refs_of_decl_p ??? Bitfields can overlap at RTL level so punt on them. */ if (DECL_BIT_FIELD (field1) && DECL_BIT_FIELD (field2)) return false; + /* In LTO we may end up merging different variants of ODR types + but keep corresponding field decls unmerged. */ + if (in_lto_p + && type_with_linkage_p (type1) + && DECL_NAME (field1) == DECL_NAME (field2)) + return false; return true; } } @@ -1049,7 +1056,9 @@ nonoverlapping_component_refs_p (const_t { /* We're left with accessing different fields of a structure, no possible overlap. */ - if (fieldx != fieldy) + if (fieldx != fieldy + && (in_lto_p || !type_with_linkage_p (typex) + || DECL_NAME (fieldx) == DECL_NAME (fieldy))) { /* A field and its representative need to be considered the same. */ Index: tree.c =================================================================== --- tree.c (revision 264689) +++ tree.c (working copy) @@ -5845,7 +5861,12 @@ free_lang_data_in_cgraph (void) /* Traverse every type found freeing its language data. */ FOR_EACH_VEC_ELT (fld.types, i, t) - free_lang_data_in_type (t); + { + free_lang_data_in_type (t); + while (TYPE_NEXT_VARIANT (t) + && !fld.pset.contains (TYPE_NEXT_VARIANT (t))) + TYPE_NEXT_VARIANT (t) = TYPE_NEXT_VARIANT (TYPE_NEXT_VARIANT (t)); + } if (flag_checking) { FOR_EACH_VEC_ELT (fld.types, i, t)