On Wed, Nov 2, 2011 at 12:12 PM, Martin Jambor <mjam...@suse.cz> wrote:
> Hi,
>
> On Wed, Nov 02, 2011 at 11:02:48AM +0100, Richard Guenther wrote:
>> On Tue, Nov 1, 2011 at 11:06 PM, Martin Jambor <mjam...@suse.cz> wrote:
>> > Hi,
>> >
>> > the patch below is the second (and last) revived type-based
>> > devirtualization patch that did not make it to 4.6.  It deals with
>> > virtual calls from the function in which the there is also the object
>> > declaration:
>> >
>> > void foo()
>> > {
>> >  A a;
>> >
>> >  a.foo ();
>> > }
>> >
>> > Normally, the front-end would do the devirtualization on its own,
>> > however, early-inlining can create these situations too.  Since there
>> > is nothing interprocedural going on, the current inlining and IPA-CP
>> > devirtualization bits are of no help.  We do not do type-based
>> > devirtualization in OBJ_TYPE_REF folding either, because the necessary
>> > type-changing checks might make it quite expensive.
>>
>> But in the above case - a is not address-taken - we do not need
>> type-based devirtualization.  So, can you explain why we do not
>> forward the proper vtable from the vtable store that would lead to
>> the known-type info?
>>
>
> We can only do that when the object constructor is early-inlined.  If
> it is only slightly bigger, or in a different compilation unit, the
> store ends up in a different function (or unit) and we cannot do it.
> LTO is not necessary with this approach and we still can be quite
> helpful if foo is in the same TU and we can inline it.

So we are using that fishy calls-cannot-change-object-types logic here?
Even if the actual call is the constructor?

Well, the cases seem to be pretty obscure - do we really want to add
another walk over the entire IL for that?  Can't you piggy-back on
some other transform for that?  Like for example FREs eliminate() walk,
which already does some of the devirt work?

Thanks,
Richard.

> Martin
>
>
>> Thanks,
>> Richard.
>>
>> >  Thus, this patch
>> > introduces a new pass to do that.
>> >
>> > The patch basically piggy-tails on the intraprocedural
>> > devirtualization mechanism, trying to construct a known-type jump
>> > function for all objects in OBJ_TYPE_REF calls and then immediately
>> > using it like we do in IPA-CP.
>> >
>> > The original patch was doing this as a part of
>> > pass_rebuild_cgraph_edges.  Honza did not like this idea so I made it
>> > a separate pass.  First, I scheduled it after
>> > pass_rebuild_cgraph_edges and was only traversing indirect edges,
>> > avoiding a sweep over all of the IL.  Unfortunately, this does not
>> > work in one scenario.  If the newly-known destination of a virtual
>> > call is known not to throw, we may have to purge some EH CFG edges and
>> > potentially basic blocks.  If these basic blocks contain calls
>> > (typically calls to object destructors), we may end up having stale
>> > call edges in the call graph... and our current approach to that
>> > problem is to call pass_rebuild_cgraph_edges.  I think that I was not
>> > running into this problem when the mechanism was a part of that pass
>> > just because of pure luck.  Anyway, this is why I eventually opted for
>> > a sweep over the statements.
>> >
>> > My best guess is that it is probably worth it, but only because the
>> > overhead should be still fairly low.  The pass triggers quite a number
>> > of times when building libstdc++ and it can speed up an artificial
>> > testcase which I will attach from over 20 seconds to 7s on my older
>> > desktop - it is very similar to the one I posted with the previous
>> > patch but this time the object constructors must not get early inlined
>> > but the function multiply_matrices has to.  Currently I have problems
>> > compiling Firefox even without LTO so I don't have any numbers from it
>> > either.  IIRC, Honza did not see this patch trigger there when he
>> > tried the ancient version almost a year go.  On the other hand, Maxim
>> > claimed that the impact can be noticeable on some code base he is
>> > concerned about.
>> >
>> > I have successfully bootstrapped and tested the patch on x86_64-linux.
>> > What do you think, should we include this in trunk?
>> >
>> > Thanks,
>> >
>> > Martin
>> >
>> >
>> > 2011-10-31  Martin Jambor  <mjam...@suse.cz>
>> >
>> >        * ipa-cp.c (ipa_value_from_known_type_jfunc): Moved to...
>> >        * ipa-prop.c (ipa_binfo_from_known_type_jfunc): ...here, exported 
>> > and
>> >        updated all callers.
>> >        (intraprocedural_devirtualization): New function.
>> >        (gate_intra_devirtualization): Likewise.
>> >        (pass_intra_devirt): New pass.
>> >        * ipa-prop.h (ipa_binfo_from_known_type_jfunc): Declared.
>> >        * passes.c (init_optimization_passes): Schedule pass_intra_devirt.
>> >        * tree-pass.h (pass_intra_devirt): Declare.
>> >
>> >        * testsuite/g++.dg/ipa/imm-devirt-1.C: New test.
>> >        * testsuite/g++.dg/ipa/imm-devirt-2.C: Likewise.
>> >
>> >
>> > Index: src/gcc/testsuite/g++.dg/ipa/imm-devirt-1.C
>> > ===================================================================
>> > --- /dev/null
>> > +++ src/gcc/testsuite/g++.dg/ipa/imm-devirt-1.C
>> > @@ -0,0 +1,62 @@
>> > +/* Verify that virtual calls are folded even when a typecast to an
>> > +   ancestor is involved along the way.  */
>> > +/* { dg-do run } */
>> > +/* { dg-options "-O2 -fdump-tree-devirt"  } */
>> > +
>> > +extern "C" void abort (void);
>> > +
>> > +class A
>> > +{
>> > +public:
>> > +  int data;
>> > +  virtual int foo (int i);
>> > +};
>> > +
>> > +
>> > +class B : public A
>> > +{
>> > +public:
>> > +  __attribute__ ((noinline)) B();
>> > +  virtual int foo (int i);
>> > +};
>> > +
>> > +int __attribute__ ((noinline)) A::foo (int i)
>> > +{
>> > +  return i + 1;
>> > +}
>> > +
>> > +int __attribute__ ((noinline)) B::foo (int i)
>> > +{
>> > +  return i + 2;
>> > +}
>> > +
>> > +int __attribute__ ((noinline,noclone)) get_input(void)
>> > +{
>> > +  return 1;
>> > +}
>> > +
>> > +__attribute__ ((noinline)) B::B()
>> > +{
>> > +}
>> > +
>> > +static inline int middleman_1 (class A *obj, int i)
>> > +{
>> > +  return obj->foo (i);
>> > +}
>> > +
>> > +static inline int middleman_2 (class B *obj, int i)
>> > +{
>> > +  return middleman_1 (obj, i);
>> > +}
>> > +
>> > +int main (int argc, char *argv[])
>> > +{
>> > +  class B b;
>> > +
>> > +  if (middleman_2 (&b, get_input ()) != 3)
>> > +    abort ();
>> > +  return 0;
>> > +}
>> > +
>> > +/* { dg-final { scan-tree-dump "Immediately devirtualizing 
>> > call.*into.*B::foo"  "devirt"  } } */
>> > +/* { dg-final { cleanup-tree-dump "devirt" } } */
>> > Index: src/gcc/testsuite/g++.dg/ipa/imm-devirt-2.C
>> > ===================================================================
>> > --- /dev/null
>> > +++ src/gcc/testsuite/g++.dg/ipa/imm-devirt-2.C
>> > @@ -0,0 +1,95 @@
>> > +/* Verify that virtual calls are folded even when a typecast to an
>> > +   ancestor is involved along the way.  */
>> > +/* { dg-do run } */
>> > +/* { dg-options "-O2 -fdump-tree-devirt-slim"  } */
>> > +
>> > +extern "C" void abort (void);
>> > +
>> > +class Distraction
>> > +{
>> > +public:
>> > +  float f;
>> > +  double d;
>> > +  Distraction ()
>> > +  {
>> > +    f = 8.3;
>> > +    d = 10.2;
>> > +  }
>> > +  virtual float bar (float z);
>> > +};
>> > +
>> > +class A
>> > +{
>> > +public:
>> > +  int data;
>> > +  virtual int foo (int i);
>> > +};
>> > +
>> > +class B : public A
>> > +{
>> > +public:
>> > +  int data_2;
>> > +  virtual int foo (int i);
>> > +  virtual int baz (int i);
>> > +};
>> > +
>> > +
>> > +class C : public Distraction, public B
>> > +{
>> > +public:
>> > +  __attribute__ ((noinline)) C();
>> > +  virtual int foo (int i);
>> > +};
>> > +
>> > +float __attribute__ ((noinline)) Distraction::bar (float z)
>> > +{
>> > +  f += z;
>> > +  return f/2;
>> > +}
>> > +
>> > +int __attribute__ ((noinline)) A::foo (int i)
>> > +{
>> > +  return i + 1;
>> > +}
>> > +
>> > +int __attribute__ ((noinline)) B::foo (int i)
>> > +{
>> > +  return i + 2;
>> > +}
>> > +
>> > +int __attribute__ ((noinline)) B::baz (int i)
>> > +{
>> > +  return i * 15;
>> > +}
>> > +
>> > +int __attribute__ ((noinline)) C::foo (int i)
>> > +{
>> > +  return i + 3;
>> > +}
>> > +
>> > +int __attribute__ ((noinline,noclone)) get_input(void)
>> > +{
>> > +  return 1;
>> > +}
>> > +
>> > +static inline int middleman (class A *obj, int i)
>> > +{
>> > +  return obj->foo (i);
>> > +}
>> > +
>> > +__attribute__ ((noinline)) C::C()
>> > +{
>> > +}
>> > +
>> > +int main (int argc, char *argv[])
>> > +{
>> > +  class C c;
>> > +
>> > +  if (middleman (&c, get_input ()) != 4)
>> > +    abort ();
>> > +
>> > +  return 0;
>> > +}
>> > +
>> > +/* { dg-final { scan-tree-dump "Immediately devirtualizing 
>> > call.*into.*C::.*foo"  "devirt"  } } */
>> > +/* { dg-final { cleanup-tree-dump "devirt" } } */
>> > Index: src/gcc/ipa-cp.c
>> > ===================================================================
>> > --- src.orig/gcc/ipa-cp.c
>> > +++ src/gcc/ipa-cp.c
>> > @@ -674,20 +674,6 @@ ipa_get_jf_ancestor_result (struct ipa_j
>> >     return NULL_TREE;
>> >  }
>> >
>> > -/* Extract the acual BINFO being described by JFUNC which must be a known 
>> > type
>> > -   jump function.  */
>> > -
>> > -static tree
>> > -ipa_value_from_known_type_jfunc (struct ipa_jump_func *jfunc)
>> > -{
>> > -  tree base_binfo = TYPE_BINFO (jfunc->value.known_type.base_type);
>> > -  if (!base_binfo)
>> > -    return NULL_TREE;
>> > -  return get_binfo_at_offset (base_binfo,
>> > -                             jfunc->value.known_type.offset,
>> > -                             jfunc->value.known_type.component_type);
>> > -}
>> > -
>> >  /* Determine whether JFUNC evaluates to a known value (that is either a
>> >    constant or a binfo) and if so, return it.  Otherwise return NULL. INFO
>> >    describes the caller node so that pass-through jump functions can be
>> > @@ -699,7 +685,7 @@ ipa_value_from_jfunc (struct ipa_node_pa
>> >   if (jfunc->type == IPA_JF_CONST)
>> >     return jfunc->value.constant;
>> >   else if (jfunc->type == IPA_JF_KNOWN_TYPE)
>> > -    return ipa_value_from_known_type_jfunc (jfunc);
>> > +    return ipa_binfo_from_known_type_jfunc (jfunc);
>> >   else if (jfunc->type == IPA_JF_PASS_THROUGH
>> >           || jfunc->type == IPA_JF_ANCESTOR)
>> >     {
>> > @@ -1006,7 +992,7 @@ propagate_accross_jump_function (struct
>> >
>> >       if (jfunc->type == IPA_JF_KNOWN_TYPE)
>> >        {
>> > -         val = ipa_value_from_known_type_jfunc (jfunc);
>> > +         val = ipa_binfo_from_known_type_jfunc (jfunc);
>> >          if (!val)
>> >            return set_lattice_contains_variable (dest_lat);
>> >        }
>> > Index: src/gcc/ipa-prop.c
>> > ===================================================================
>> > --- src.orig/gcc/ipa-prop.c
>> > +++ src/gcc/ipa-prop.c
>> > @@ -3069,3 +3069,107 @@ ipa_update_after_lto_read (void)
>> >     if (node->analyzed)
>> >       ipa_initialize_node_params (node);
>> >  }
>> > +
>> > +/* Extract the acual BINFO being described by JFUNC which must be a known 
>> > type
>> > +   jump function.  */
>> > +
>> > +tree
>> > +ipa_binfo_from_known_type_jfunc (struct ipa_jump_func *jfunc)
>> > +{
>> > +  tree base_binfo = TYPE_BINFO (jfunc->value.known_type.base_type);
>> > +  if (!base_binfo)
>> > +    return NULL_TREE;
>> > +  return get_binfo_at_offset (base_binfo,
>> > +                             jfunc->value.known_type.offset,
>> > +                             jfunc->value.known_type.component_type);
>> > +}
>> > +
>> > +/* Intraprocedural (early) type-based devirtualization pass.  Looks at 
>> > all call
>> > +   statements and attempts type-base devirtualization on those calling an
>> > +   OBJ_TYPE_REF.  */
>> > +
>> > +static unsigned int
>> > +intraprocedural_devirtualization (void)
>> > +{
>> > +  basic_block bb;
>> > +  gimple_stmt_iterator gsi;
>> > +  bool cfg_changed = false;
>> > +
>> > +  FOR_EACH_BB (bb)
>> > +    for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
>> > +      if (is_gimple_call (gsi_stmt (gsi)))
>> > +       {
>> > +         tree binfo, token, fndecl;
>> > +         gimple call = gsi_stmt (gsi);
>> > +         tree target = gimple_call_fn (call);
>> > +         struct ipa_jump_func jfunc;
>> > +
>> > +         if (TREE_CODE (target) != OBJ_TYPE_REF)
>> > +           continue;
>> > +
>> > +         jfunc.type = IPA_JF_UNKNOWN;
>> > +         compute_known_type_jump_func (OBJ_TYPE_REF_OBJECT (target), 
>> > &jfunc,
>> > +                                       call);
>> > +         if (jfunc.type != IPA_JF_KNOWN_TYPE)
>> > +           continue;
>> > +         binfo = ipa_binfo_from_known_type_jfunc (&jfunc);
>> > +         if (!binfo)
>> > +           continue;
>> > +         token = OBJ_TYPE_REF_TOKEN (target);
>> > +         fndecl = gimple_get_virt_method_for_binfo (tree_low_cst (token, 
>> > 1),
>> > +                                                    binfo);
>> > +         if (!fndecl)
>> > +           continue;
>> > +
>> > +         if (dump_file)
>> > +           {
>> > +             fprintf (dump_file, "ipa-prop: Immediately devirtualizing 
>> > call ");
>> > +             print_gimple_expr (dump_file, call, 0, 0);
>> > +           }
>> > +
>> > +         gimple_call_set_fndecl (call, fndecl);
>> > +         update_stmt (call);
>> > +         if (maybe_clean_eh_stmt (call)
>> > +             && gimple_purge_dead_eh_edges (bb))
>> > +           cfg_changed = true;
>> > +
>> > +         if (dump_file)
>> > +           {
>> > +             fprintf (dump_file, " into ");
>> > +             print_gimple_expr (dump_file, call, 0, 0);
>> > +             fprintf (dump_file, "\n");
>> > +           }
>> > +       }
>> > +
>> > +
>> > + if (cfg_changed)
>> > +    return TODO_cleanup_cfg;
>> > +  else
>> > +    return 0;
>> > +}
>> > +
>> > +static bool
>> > +gate_intra_devirtualization (void)
>> > +{
>> > +  return flag_devirtualize != 0;
>> > +}
>> > +
>> > +
>> > +struct gimple_opt_pass pass_intra_devirt =
>> > +  {
>> > +    {
>> > +      GIMPLE_PASS,
>> > +      "devirt",                                        /* name */
>> > +      gate_intra_devirtualization,             /* gate */
>> > +      intraprocedural_devirtualization,        /* execute */
>> > +      NULL,                                    /* sub */
>> > +      NULL,                                    /* next */
>> > +      0,                                       /* static_pass_number */
>> > +      TV_CGRAPH,                               /* tv_id */
>> > +      PROP_cfg,                                        /* 
>> > properties_required */
>> > +      0,                                       /* properties_provided */
>> > +      0,                                       /* properties_destroyed */
>> > +      0,                                       /* todo_flags_start */
>> > +      0,                                       /* todo_flags_finish */
>> > +    }
>> > +  };
>> > Index: src/gcc/ipa-prop.h
>> > ===================================================================
>> > --- src.orig/gcc/ipa-prop.h
>> > +++ src/gcc/ipa-prop.h
>> > @@ -347,6 +347,7 @@ bool ipa_propagate_indirect_call_infos (
>> >
>> >  /* Indirect edge and binfo processing.  */
>> >  struct cgraph_edge *ipa_make_edge_direct_to_target (struct cgraph_edge *, 
>> > tree);
>> > +tree ipa_binfo_from_known_type_jfunc (struct ipa_jump_func *);
>> >
>> >  /* Functions related to both.  */
>> >  void ipa_analyze_node (struct cgraph_node *);
>> > Index: src/gcc/passes.c
>> > ===================================================================
>> > --- src.orig/gcc/passes.c
>> > +++ src/gcc/passes.c
>> > @@ -1225,6 +1225,7 @@ init_optimization_passes (void)
>> >           NEXT_PASS (pass_cleanup_eh);
>> >           NEXT_PASS (pass_profile);
>> >           NEXT_PASS (pass_local_pure_const);
>> > +         NEXT_PASS (pass_intra_devirt);
>> >          /* Split functions creates parts that are not run through
>> >             early optimizations again.  It is thus good idea to do this
>> >             late.  */
>> > Index: src/gcc/tree-pass.h
>> > ===================================================================
>> > --- src.orig/gcc/tree-pass.h
>> > +++ src/gcc/tree-pass.h
>> > @@ -449,6 +449,7 @@ extern struct gimple_opt_pass pass_trace
>> >  extern struct gimple_opt_pass pass_warn_unused_result;
>> >  extern struct gimple_opt_pass pass_split_functions;
>> >  extern struct gimple_opt_pass pass_feedback_split_functions;
>> > +extern struct gimple_opt_pass pass_intra_devirt;
>> >
>> >  /* IPA Passes */
>> >  extern struct simple_ipa_opt_pass pass_ipa_lower_emutls;
>> >
>

Reply via email to