Hi,
this is a proof-of-concept patch for type merging during LTO streaming. It
does two things
1) replace type variant by first compatible one in TYPE_NEXT_VARIANT list
   This is useful at compilation time because frontends produce more variants
   than needed. The effect of this is about 2% of decl stream
2) replace ODR types by their prevailing variant if ODR violation was not
   detected.  This is useful at WPA time and saves about 50% of global decl
   stream. For Firefox it reduces the ltrans.o files size from 2.6GB to 2.0GB.

Here are stats of number of nodes hitting global stream during WPA->ltrans
streaming of firefox:

Before  after
   7354    7280 namespace_decl
  46593   14084 union_type
  18496   15405 translation_unit_decl
  22816   21548 integer_type
 107047   31681 enumeral_type
  67072   46390 array_type
 541220   99572 pointer_plus_expr
 542360  100712 addr_expr
 167657  154019 var_decl
 864769  182691 tree_binfo
 240200  206783 reference_type
 410403  316877 function_type
1862985  522524 type_decl
1954864  652664 record_type
1070333  880209 method_type
1582055 1014270 pointer_type
1495367 1406670 tree_list
7926385 1483545 field_decl
1715133 1725612 function_decl
3384810 3191406 identifier_node

So largest savings are in field_decls (to 18% and they are not even merged
fully by the patch), pointer tpes (to 64%), record types (to 30%) and type
decls (to 28%), binfos addr_expr/pointer_plus_expr (the later are used to
reffer to vtables to 21%). Patch bootstraps, regtestes and seems to
work reliably.

The merging is currently done during streaming by simply looking up prevailing
type and populating the cache (so we do not get too many repeated lookups).
Similar tricks can be added to checksum calculation.

I am not sure this is best possible place, especially for 2) since it provides
really quite major reduction in the IL size.  I am quite sure we do not want
to do that at stream in time in parallel to SCC merging because it affects the
SCC components and also we do not want to merge this way types containing ODR
violations as QOI issue. (merging two completely different types would lead
to ICEs but merging two TBAA incompatible types is probably equally bad).

So we need to do following
 a) read in the the types & do tree mering
 b) populate odr type hash, do ODR violation checks and decide on what types
    are consistent
 c) do the ODR based merging sometime before the trees hit ltrans stream.

I was thinking that perhaps c) can also be added to mentions_vars_p machinery
which would make it somewhat more costy but we could free those duplicates and
save some more memory during WPA.  I am also not usre if longer term we
would not want to have mode that bypasses WPA->ltrans stream completely and
just compile out of WPA process in threads or forks.
Drawback is that building vector recording every single type pointer is going
to also consume quite some memory.

Patch is also not complete because it does not merge field decls. Analogous
tricks can be added to streamer, but I just tought I would like to discuss
alternative implementations first.

WPA compile time improves by about 11%, so merging benefits overcome the extra
complexity in sreamer.

I hope that with this patch I will be able to increase default number of 
partitions
since the global stream is getting less percentage of the ltrans files. 33%
is still quite a lot, but far better than what we had previously. Hopefully
we could still dramatically cut this down in followup.

Also obvious drawback is that this helps to C++ only. I looked into stats from
C programs and also Toon's stats on Fortran and I tend to believe that C++
is really much worse than those two.

Honza

Index: ipa-devirt.c
===================================================================
--- ipa-devirt.c        (revision 264689)
+++ ipa-devirt.c        (working copy)
@@ -2111,6 +2111,29 @@ get_odr_type (tree type, bool insert)
   return val;
 }
 
+/* Return the main variant of the odr type.  This is used for straming out
+   to reduce number of type duplicates hitting the WPA->LTRANS streams.
+   Do not do so when ODR violation was detected since the type may be
+   structurally different then.  */
+
+tree
+prevailing_odr_type (tree t)
+{
+  t = TYPE_MAIN_VARIANT (t);
+  /* In need_assembler_name_p we also mangle assembler names of INTEGER_TYPE.
+     We can not merge these because this does not honnor precision and
+     signedness.  */
+  if (!type_with_linkage_p (t)
+      || type_in_anonymous_namespace_p (t)
+      || TREE_CODE (t) == INTEGER_TYPE
+      || !COMPLETE_TYPE_P (t))
+    return t;
+  odr_type ot = get_odr_type (t, true);
+  if (!ot || !ot->odr_violated)
+    return ot->type;
+  return t;
+}
+
 /* Add TYPE od ODR type hash.  */
 
 void
Index: ipa-utils.h
===================================================================
--- ipa-utils.h (revision 264689)
+++ ipa-utils.h (working copy)
@@ -90,6 +90,7 @@ void warn_types_mismatch (tree t1, tree
                          location_t loc2 = UNKNOWN_LOCATION);
 bool odr_or_derived_type_p (const_tree t);
 bool odr_types_equivalent_p (tree type1, tree type2);
+tree prevailing_odr_type (tree t);
 
 /* Return vector containing possible targets of polymorphic call E.
    If COMPLETEP is non-NULL, store true if the list is complete. 
Index: lto/lto.c
===================================================================
--- lto/lto.c   (revision 264689)
+++ lto/lto.c   (working copy)
@@ -485,6 +485,8 @@ gimple_register_canonical_type_1 (tree t
 static void
 gimple_register_canonical_type (tree t)
 {
+  if (flag_checking)
+    verify_type (t);
   if (TYPE_CANONICAL (t) || !type_with_alias_set_p (t)
       || !canonical_type_used_p (t))
     return;
Index: lto-streamer-out.c
===================================================================
--- lto-streamer-out.c  (revision 264689)
+++ lto-streamer-out.c  (working copy)
@@ -43,6 +43,8 @@ along with GCC; see the file COPYING3.
 #include "debug.h"
 #include "omp-offload.h"
 #include "print-tree.h"
+#include "attribs.h"
+#include "ipa-utils.h"
 
 
 static void lto_write_tree (struct output_block*, tree, bool);
@@ -109,14 +111,188 @@ destroy_output_block (struct output_bloc
   free (ob);
 }
 
+/* Do same comparsion as check_qualified_type skipping lang part of type
+   and be more permissive about type names: we only care that names are
+   same (for diagnostics) and that ODR names are the same.  */
 
-/* Look up NODE in the type table and write the index for it to OB.  */
+static bool
+types_equal_p (tree t, tree v)
+{
+  if (TYPE_QUALS (t) != TYPE_QUALS (v))
+    return false;
+
+  if (TYPE_NAME (t) != TYPE_NAME (v)
+      && (!TYPE_NAME (t) || !TYPE_NAME (v)
+         || TREE_CODE (TYPE_NAME (t)) != TYPE_DECL
+         || TREE_CODE (TYPE_NAME (v)) != TYPE_DECL
+         || DECL_ASSEMBLER_NAME_RAW (TYPE_NAME (t))
+            != DECL_ASSEMBLER_NAME_RAW (TYPE_NAME (v))
+         || DECL_NAME (TYPE_NAME (t)) != DECL_NAME (TYPE_NAME (v))))
+     return false;
+
+  if (TYPE_ALIGN (t) != TYPE_ALIGN (v))
+    return false;
+
+  if (!attribute_list_equal (TYPE_ATTRIBUTES (t),
+                            TYPE_ATTRIBUTES (v)))
+     return false;
+
+  /* Do not replace complete type by incomplete.  */
+  if (COMPLETE_TYPE_P (t) != COMPLETE_TYPE_P (v))
+    return false;
+
+  if (TYPE_ADDR_SPACE (t) != TYPE_ADDR_SPACE (v))
+    return false;
+
+  gcc_assert (TREE_CODE (t) == TREE_CODE (v));
+
+  /* For pointer types and array types we also care about the type they
+     reffer to.  */
+  if (TREE_TYPE (t))
+    return types_equal_p (TREE_TYPE (t), TREE_TYPE (v));
+
+  return true;
+}
+
+/* Search list of type variants and look up first one that looks same.
+   At compile time, this removes duplicates of types created by front-ends.
+   At WPA time we also merge duplicated ODR types.  */
+
+static tree
+first_compatible_type_variant (tree t)
+{
+  tree first = NULL;
+  if (POINTER_TYPE_P (t))
+    {
+      tree t2 = first_compatible_type_variant (TREE_TYPE (t));
+      if (t2 != TREE_TYPE (t))
+       {
+         if (TREE_CODE (t) == POINTER_TYPE)
+           first = build_pointer_type_for_mode (t2, TYPE_MODE (t),
+                                               TYPE_REF_CAN_ALIAS_ALL (t));
+         else
+           first = build_reference_type_for_mode (t2, TYPE_MODE (t),
+                                               TYPE_REF_CAN_ALIAS_ALL (t));
+       }
+      else
+       first = TYPE_MAIN_VARIANT (t);
+      gcc_assert (TYPE_MAIN_VARIANT (first) == first);
+    }
+  else
+    first = prevailing_odr_type (t);
+
+  gcc_checking_assert (gimple_canonical_types_compatible_p (t, first, false));
+
+  for (tree v = first; v; v = TYPE_NEXT_VARIANT (v))
+    if (types_equal_p (t, v))
+      {
+       gcc_checking_assert (gimple_canonical_types_compatible_p (t, v, false));
+       gcc_checking_assert (!in_lto_p || TYPE_CANONICAL (t) == TYPE_CANONICAL 
(v));
+       gcc_checking_assert (get_alias_set (t) == get_alias_set (v));
+        return v;
+      }
+
+  /* Most of the time we will find v itself in the list.  With ODR type merging
+     we however merge across TYPE_MAIN_VARIANTs and in that case we may not
+     have corresponding type in the list.  */
+  tree v = build_variant_type_copy (first);
+  TYPE_READONLY (v) = TYPE_READONLY (t);
+  TYPE_VOLATILE (v) = TYPE_VOLATILE (t);
+  TYPE_ATOMIC (v) = TYPE_ATOMIC (t);
+  TYPE_RESTRICT (v) = TYPE_RESTRICT (t);
+  TYPE_ADDR_SPACE (v) = TYPE_ADDR_SPACE (t);
+  SET_TYPE_ALIGN (v, TYPE_ALIGN (t));
+  TYPE_NAME (v) = TYPE_NAME (t);
+  TYPE_ATTRIBUTES (v) = TYPE_ATTRIBUTES (t);
+  TREE_TYPE (v) = TREE_TYPE (t);
+  gcc_checking_assert (types_equal_p (t, v));
+  gcc_checking_assert (gimple_canonical_types_compatible_p (t, v, false));
+  gcc_checking_assert (!in_lto_p || TYPE_CANONICAL (t) == TYPE_CANONICAL (v));
+  gcc_checking_assert (get_alias_set (t) == get_alias_set (v));
+  return v;
+}
+
+
+/* Look up NODE in the type table and write the index for it to OB.
+   Eliminate unnecesary type variants.  */
 
 static void
 output_type_ref (struct output_block *ob, tree node)
 {
+  bool existed_p;
+  unsigned int &index
+    = ob->decl_state->streams[LTO_DECL_STREAM_TYPE].tree_hash_table->
+       get_or_insert (node, &existed_p);
   streamer_write_record_start (ob, LTO_type_ref);
-  lto_output_type_ref_index (ob->decl_state, ob->main_stream, node);
+
+  if (existed_p)
+    {
+      streamer_write_uhwi_stream (ob->main_stream, index);
+      return;
+    }
+
+  tree node2 = first_compatible_type_variant (node);
+
+  /* If NODE is first variant, just add it into encoder.  */
+  if (node2 == node)
+    {
+      index = ob->decl_state->streams[LTO_DECL_STREAM_TYPE].trees.length ();
+      if (streamer_dump_file)
+       {
+         print_node_brief (streamer_dump_file,
+                           "    Encoding indexable ", node, 4);
+         fprintf (streamer_dump_file, "  as %i\n", index);
+       }
+      ob->decl_state->streams[LTO_DECL_STREAM_TYPE].trees.safe_push (node);
+      streamer_write_uhwi_stream (ob->main_stream, index);
+      return;
+    }
+
+  index = 0xdead;
+
+  if (streamer_dump_file)
+    {
+      print_node_brief (streamer_dump_file, "    Type ", node, 4);
+      fprintf (streamer_dump_file, "  prevailed by ");
+      print_node_brief (streamer_dump_file, "", node2, 4);
+      fprintf (streamer_dump_file, "\n");
+    }
+
+  /* See if we already assgined one to the first variant.  If that is the case
+     only duplicate the entry in the hashtable so we do not need to call
+     first_compatible_type_variant redundantly.  */
+  unsigned int &index2
+    = ob->decl_state->streams[LTO_DECL_STREAM_TYPE].tree_hash_table->
+       get_or_insert (node2, &existed_p);
+  /* The reference may be changed by hash table expansion.  */
+  unsigned int i = index2;
+
+  if (existed_p)
+    {
+      ob->decl_state->streams[LTO_DECL_STREAM_TYPE].
+       tree_hash_table->get_or_insert (node, &existed_p) = i;
+      streamer_write_uhwi_stream (ob->main_stream, i);
+      return;
+    }
+  else
+    {
+      index2 = ob->decl_state->streams[LTO_DECL_STREAM_TYPE].trees.length ();
+      i = index2;
+      if (streamer_dump_file)
+       {
+         print_node_brief (streamer_dump_file,
+                           "    Encoding indexable ", node2, 4);
+         fprintf (streamer_dump_file, "  as %i \n", i);
+       }
+      ob->decl_state->streams[LTO_DECL_STREAM_TYPE].trees.safe_push (node2);
+      /* We need to search again to watch hash table resize.  */
+      ob->decl_state->streams[LTO_DECL_STREAM_TYPE].
+       tree_hash_table->get_or_insert (node, &existed_p) = i;
+      gcc_assert (existed_p);
+      streamer_write_uhwi_stream (ob->main_stream, i);
+    }
+  return;
 }
 
 
@@ -978,12 +1171,15 @@ hash_tree (struct streamer_tree_cache_d
 #define visit(SIBLING) \
   do { \
     unsigned ix; \
-    if (!SIBLING) \
+    tree t2 = SIBLING; \
+    if (t2 && TYPE_P (t2)) \
+      t2 = first_compatible_type_variant (t2); \
+    if (!t2) \
       hstate.add_int (0); \
-    else if (streamer_tree_cache_lookup (cache, SIBLING, &ix)) \
+    else if (streamer_tree_cache_lookup (cache, t2, &ix)) \
       hstate.add_int (streamer_tree_cache_get_hash (cache, ix)); \
     else if (map) \
-      hstate.add_int (*map->get (SIBLING)); \
+      hstate.add_int (*map->get (t2)); \
     else \
       hstate.add_int (1); \
   } while (0)
@@ -1554,6 +1750,10 @@ DFS::DFS_write_tree (struct output_block
   if (this_ref_p && tree_is_indexable (expr))
     return;
 
+  /* Replace type by its prevailing variant.  */
+  if (TYPE_P (expr))
+    expr = first_compatible_type_variant (expr);
+
   /* Check if we already streamed EXPR.  */
   if (streamer_tree_cache_lookup (ob->writer_cache, expr, NULL))
     return;
@@ -1591,6 +1791,11 @@ lto_output_tree (struct output_block *ob
       return;
     }
 
+  if (TYPE_P (expr))
+    expr = first_compatible_type_variant (expr);
+
+  gcc_checking_assert (!this_ref_p || !tree_is_indexable (expr));
+
   existed_p = streamer_tree_cache_lookup (ob->writer_cache, expr, &ix);
   if (existed_p)
     {
@@ -2334,7 +2539,18 @@ copy_function_or_variable (struct symtab
       gcc_assert (lto_tree_ref_encoder_size (encoder) == 0);
       encoder->trees.reserve_exact (n);
       for (j = 0; j < n; j++)
-       encoder->trees.safe_push ((*trees)[j]);
+       {
+         tree t = (*trees)[j];
+         if (TYPE_P (t))
+           t = first_compatible_type_variant (t);
+         if (streamer_dump_file)
+           {
+             print_node_brief (streamer_dump_file,
+                               "  Adding reference to ", t, 4);
+             fprintf (streamer_dump_file, "  as %i \n", (int)j);
+           }
+         encoder->trees.safe_push (t);
+       }
     }
 
   lto_free_raw_section_data (file_data, LTO_section_function_body, name,
@@ -2507,6 +2723,8 @@ write_global_stream (struct output_block
          print_node_brief (streamer_dump_file, "", t, 4);
           fprintf (streamer_dump_file, "\n");
        }
+      gcc_checking_assert (!TYPE_P (t)
+                          || t == first_compatible_type_variant (t));
       if (!streamer_tree_cache_lookup (ob->writer_cache, t, NULL))
        stream_write_tree (ob, t, false);
     }
@@ -2537,6 +2755,8 @@ write_global_references (struct output_b
       t = lto_tree_ref_encoder_get_tree (encoder, index);
       streamer_tree_cache_lookup (ob->writer_cache, t, &slot_num);
       gcc_assert (slot_num != (unsigned)-1);
+      gcc_checking_assert (!TYPE_P (t)
+                          || t == first_compatible_type_variant (t));
       data[index + 1] = slot_num;
     }
 
Index: tree-ssa-alias.c
===================================================================
--- tree-ssa-alias.c    (revision 264689)
+++ tree-ssa-alias.c    (working copy)
@@ -38,6 +38,7 @@ along with GCC; see the file COPYING3.
 #include "tree-dfa.h"
 #include "ipa-reference.h"
 #include "varasm.h"
+#include "ipa-utils.h"
 
 /* Broad overview of how alias analysis on gimple works:
 
@@ -955,6 +956,12 @@ nonoverlapping_component_refs_of_decl_p
             ??? Bitfields can overlap at RTL level so punt on them.  */
          if (DECL_BIT_FIELD (field1) && DECL_BIT_FIELD (field2))
            return false;
+         /* In LTO we may end up merging different variants of ODR types
+            but keep corresponding field decls unmerged.  */
+         if (in_lto_p
+             && type_with_linkage_p (type1)
+             && DECL_NAME (field1) == DECL_NAME (field2))
+           return false;
          return true;
        }
     }
@@ -1049,7 +1056,9 @@ nonoverlapping_component_refs_p (const_t
        {
          /* We're left with accessing different fields of a structure,
             no possible overlap.  */
-         if (fieldx != fieldy)
+         if (fieldx != fieldy
+             && (in_lto_p || !type_with_linkage_p (typex)
+                 || DECL_NAME (fieldx) == DECL_NAME (fieldy)))
            {
              /* A field and its representative need to be considered the
                 same.  */
Index: tree.c
===================================================================
--- tree.c      (revision 264689)
+++ tree.c      (working copy)
@@ -5845,7 +5861,12 @@ free_lang_data_in_cgraph (void)
 
   /* Traverse every type found freeing its language data.  */
   FOR_EACH_VEC_ELT (fld.types, i, t)
-    free_lang_data_in_type (t);
+    {
+      free_lang_data_in_type (t);
+      while (TYPE_NEXT_VARIANT (t)
+            && !fld.pset.contains (TYPE_NEXT_VARIANT (t)))
+       TYPE_NEXT_VARIANT (t) = TYPE_NEXT_VARIANT (TYPE_NEXT_VARIANT (t));
+    }
   if (flag_checking)
     {
       FOR_EACH_VEC_ELT (fld.types, i, t)

Reply via email to