On Wed, Nov 2, 2011 at 12:12 PM, Martin Jambor <mjam...@suse.cz> wrote: > Hi, > > On Wed, Nov 02, 2011 at 11:02:48AM +0100, Richard Guenther wrote: >> On Tue, Nov 1, 2011 at 11:06 PM, Martin Jambor <mjam...@suse.cz> wrote: >> > Hi, >> > >> > the patch below is the second (and last) revived type-based >> > devirtualization patch that did not make it to 4.6. It deals with >> > virtual calls from the function in which the there is also the object >> > declaration: >> > >> > void foo() >> > { >> > A a; >> > >> > a.foo (); >> > } >> > >> > Normally, the front-end would do the devirtualization on its own, >> > however, early-inlining can create these situations too. Since there >> > is nothing interprocedural going on, the current inlining and IPA-CP >> > devirtualization bits are of no help. We do not do type-based >> > devirtualization in OBJ_TYPE_REF folding either, because the necessary >> > type-changing checks might make it quite expensive. >> >> But in the above case - a is not address-taken - we do not need >> type-based devirtualization. So, can you explain why we do not >> forward the proper vtable from the vtable store that would lead to >> the known-type info? >> > > We can only do that when the object constructor is early-inlined. If > it is only slightly bigger, or in a different compilation unit, the > store ends up in a different function (or unit) and we cannot do it. > LTO is not necessary with this approach and we still can be quite > helpful if foo is in the same TU and we can inline it.
So we are using that fishy calls-cannot-change-object-types logic here? Even if the actual call is the constructor? Well, the cases seem to be pretty obscure - do we really want to add another walk over the entire IL for that? Can't you piggy-back on some other transform for that? Like for example FREs eliminate() walk, which already does some of the devirt work? Thanks, Richard. > Martin > > >> Thanks, >> Richard. >> >> > Thus, this patch >> > introduces a new pass to do that. >> > >> > The patch basically piggy-tails on the intraprocedural >> > devirtualization mechanism, trying to construct a known-type jump >> > function for all objects in OBJ_TYPE_REF calls and then immediately >> > using it like we do in IPA-CP. >> > >> > The original patch was doing this as a part of >> > pass_rebuild_cgraph_edges. Honza did not like this idea so I made it >> > a separate pass. First, I scheduled it after >> > pass_rebuild_cgraph_edges and was only traversing indirect edges, >> > avoiding a sweep over all of the IL. Unfortunately, this does not >> > work in one scenario. If the newly-known destination of a virtual >> > call is known not to throw, we may have to purge some EH CFG edges and >> > potentially basic blocks. If these basic blocks contain calls >> > (typically calls to object destructors), we may end up having stale >> > call edges in the call graph... and our current approach to that >> > problem is to call pass_rebuild_cgraph_edges. I think that I was not >> > running into this problem when the mechanism was a part of that pass >> > just because of pure luck. Anyway, this is why I eventually opted for >> > a sweep over the statements. >> > >> > My best guess is that it is probably worth it, but only because the >> > overhead should be still fairly low. The pass triggers quite a number >> > of times when building libstdc++ and it can speed up an artificial >> > testcase which I will attach from over 20 seconds to 7s on my older >> > desktop - it is very similar to the one I posted with the previous >> > patch but this time the object constructors must not get early inlined >> > but the function multiply_matrices has to. Currently I have problems >> > compiling Firefox even without LTO so I don't have any numbers from it >> > either. IIRC, Honza did not see this patch trigger there when he >> > tried the ancient version almost a year go. On the other hand, Maxim >> > claimed that the impact can be noticeable on some code base he is >> > concerned about. >> > >> > I have successfully bootstrapped and tested the patch on x86_64-linux. >> > What do you think, should we include this in trunk? >> > >> > Thanks, >> > >> > Martin >> > >> > >> > 2011-10-31 Martin Jambor <mjam...@suse.cz> >> > >> > * ipa-cp.c (ipa_value_from_known_type_jfunc): Moved to... >> > * ipa-prop.c (ipa_binfo_from_known_type_jfunc): ...here, exported >> > and >> > updated all callers. >> > (intraprocedural_devirtualization): New function. >> > (gate_intra_devirtualization): Likewise. >> > (pass_intra_devirt): New pass. >> > * ipa-prop.h (ipa_binfo_from_known_type_jfunc): Declared. >> > * passes.c (init_optimization_passes): Schedule pass_intra_devirt. >> > * tree-pass.h (pass_intra_devirt): Declare. >> > >> > * testsuite/g++.dg/ipa/imm-devirt-1.C: New test. >> > * testsuite/g++.dg/ipa/imm-devirt-2.C: Likewise. >> > >> > >> > Index: src/gcc/testsuite/g++.dg/ipa/imm-devirt-1.C >> > =================================================================== >> > --- /dev/null >> > +++ src/gcc/testsuite/g++.dg/ipa/imm-devirt-1.C >> > @@ -0,0 +1,62 @@ >> > +/* Verify that virtual calls are folded even when a typecast to an >> > + ancestor is involved along the way. */ >> > +/* { dg-do run } */ >> > +/* { dg-options "-O2 -fdump-tree-devirt" } */ >> > + >> > +extern "C" void abort (void); >> > + >> > +class A >> > +{ >> > +public: >> > + int data; >> > + virtual int foo (int i); >> > +}; >> > + >> > + >> > +class B : public A >> > +{ >> > +public: >> > + __attribute__ ((noinline)) B(); >> > + virtual int foo (int i); >> > +}; >> > + >> > +int __attribute__ ((noinline)) A::foo (int i) >> > +{ >> > + return i + 1; >> > +} >> > + >> > +int __attribute__ ((noinline)) B::foo (int i) >> > +{ >> > + return i + 2; >> > +} >> > + >> > +int __attribute__ ((noinline,noclone)) get_input(void) >> > +{ >> > + return 1; >> > +} >> > + >> > +__attribute__ ((noinline)) B::B() >> > +{ >> > +} >> > + >> > +static inline int middleman_1 (class A *obj, int i) >> > +{ >> > + return obj->foo (i); >> > +} >> > + >> > +static inline int middleman_2 (class B *obj, int i) >> > +{ >> > + return middleman_1 (obj, i); >> > +} >> > + >> > +int main (int argc, char *argv[]) >> > +{ >> > + class B b; >> > + >> > + if (middleman_2 (&b, get_input ()) != 3) >> > + abort (); >> > + return 0; >> > +} >> > + >> > +/* { dg-final { scan-tree-dump "Immediately devirtualizing >> > call.*into.*B::foo" "devirt" } } */ >> > +/* { dg-final { cleanup-tree-dump "devirt" } } */ >> > Index: src/gcc/testsuite/g++.dg/ipa/imm-devirt-2.C >> > =================================================================== >> > --- /dev/null >> > +++ src/gcc/testsuite/g++.dg/ipa/imm-devirt-2.C >> > @@ -0,0 +1,95 @@ >> > +/* Verify that virtual calls are folded even when a typecast to an >> > + ancestor is involved along the way. */ >> > +/* { dg-do run } */ >> > +/* { dg-options "-O2 -fdump-tree-devirt-slim" } */ >> > + >> > +extern "C" void abort (void); >> > + >> > +class Distraction >> > +{ >> > +public: >> > + float f; >> > + double d; >> > + Distraction () >> > + { >> > + f = 8.3; >> > + d = 10.2; >> > + } >> > + virtual float bar (float z); >> > +}; >> > + >> > +class A >> > +{ >> > +public: >> > + int data; >> > + virtual int foo (int i); >> > +}; >> > + >> > +class B : public A >> > +{ >> > +public: >> > + int data_2; >> > + virtual int foo (int i); >> > + virtual int baz (int i); >> > +}; >> > + >> > + >> > +class C : public Distraction, public B >> > +{ >> > +public: >> > + __attribute__ ((noinline)) C(); >> > + virtual int foo (int i); >> > +}; >> > + >> > +float __attribute__ ((noinline)) Distraction::bar (float z) >> > +{ >> > + f += z; >> > + return f/2; >> > +} >> > + >> > +int __attribute__ ((noinline)) A::foo (int i) >> > +{ >> > + return i + 1; >> > +} >> > + >> > +int __attribute__ ((noinline)) B::foo (int i) >> > +{ >> > + return i + 2; >> > +} >> > + >> > +int __attribute__ ((noinline)) B::baz (int i) >> > +{ >> > + return i * 15; >> > +} >> > + >> > +int __attribute__ ((noinline)) C::foo (int i) >> > +{ >> > + return i + 3; >> > +} >> > + >> > +int __attribute__ ((noinline,noclone)) get_input(void) >> > +{ >> > + return 1; >> > +} >> > + >> > +static inline int middleman (class A *obj, int i) >> > +{ >> > + return obj->foo (i); >> > +} >> > + >> > +__attribute__ ((noinline)) C::C() >> > +{ >> > +} >> > + >> > +int main (int argc, char *argv[]) >> > +{ >> > + class C c; >> > + >> > + if (middleman (&c, get_input ()) != 4) >> > + abort (); >> > + >> > + return 0; >> > +} >> > + >> > +/* { dg-final { scan-tree-dump "Immediately devirtualizing >> > call.*into.*C::.*foo" "devirt" } } */ >> > +/* { dg-final { cleanup-tree-dump "devirt" } } */ >> > Index: src/gcc/ipa-cp.c >> > =================================================================== >> > --- src.orig/gcc/ipa-cp.c >> > +++ src/gcc/ipa-cp.c >> > @@ -674,20 +674,6 @@ ipa_get_jf_ancestor_result (struct ipa_j >> > return NULL_TREE; >> > } >> > >> > -/* Extract the acual BINFO being described by JFUNC which must be a known >> > type >> > - jump function. */ >> > - >> > -static tree >> > -ipa_value_from_known_type_jfunc (struct ipa_jump_func *jfunc) >> > -{ >> > - tree base_binfo = TYPE_BINFO (jfunc->value.known_type.base_type); >> > - if (!base_binfo) >> > - return NULL_TREE; >> > - return get_binfo_at_offset (base_binfo, >> > - jfunc->value.known_type.offset, >> > - jfunc->value.known_type.component_type); >> > -} >> > - >> > /* Determine whether JFUNC evaluates to a known value (that is either a >> > constant or a binfo) and if so, return it. Otherwise return NULL. INFO >> > describes the caller node so that pass-through jump functions can be >> > @@ -699,7 +685,7 @@ ipa_value_from_jfunc (struct ipa_node_pa >> > if (jfunc->type == IPA_JF_CONST) >> > return jfunc->value.constant; >> > else if (jfunc->type == IPA_JF_KNOWN_TYPE) >> > - return ipa_value_from_known_type_jfunc (jfunc); >> > + return ipa_binfo_from_known_type_jfunc (jfunc); >> > else if (jfunc->type == IPA_JF_PASS_THROUGH >> > || jfunc->type == IPA_JF_ANCESTOR) >> > { >> > @@ -1006,7 +992,7 @@ propagate_accross_jump_function (struct >> > >> > if (jfunc->type == IPA_JF_KNOWN_TYPE) >> > { >> > - val = ipa_value_from_known_type_jfunc (jfunc); >> > + val = ipa_binfo_from_known_type_jfunc (jfunc); >> > if (!val) >> > return set_lattice_contains_variable (dest_lat); >> > } >> > Index: src/gcc/ipa-prop.c >> > =================================================================== >> > --- src.orig/gcc/ipa-prop.c >> > +++ src/gcc/ipa-prop.c >> > @@ -3069,3 +3069,107 @@ ipa_update_after_lto_read (void) >> > if (node->analyzed) >> > ipa_initialize_node_params (node); >> > } >> > + >> > +/* Extract the acual BINFO being described by JFUNC which must be a known >> > type >> > + jump function. */ >> > + >> > +tree >> > +ipa_binfo_from_known_type_jfunc (struct ipa_jump_func *jfunc) >> > +{ >> > + tree base_binfo = TYPE_BINFO (jfunc->value.known_type.base_type); >> > + if (!base_binfo) >> > + return NULL_TREE; >> > + return get_binfo_at_offset (base_binfo, >> > + jfunc->value.known_type.offset, >> > + jfunc->value.known_type.component_type); >> > +} >> > + >> > +/* Intraprocedural (early) type-based devirtualization pass. Looks at >> > all call >> > + statements and attempts type-base devirtualization on those calling an >> > + OBJ_TYPE_REF. */ >> > + >> > +static unsigned int >> > +intraprocedural_devirtualization (void) >> > +{ >> > + basic_block bb; >> > + gimple_stmt_iterator gsi; >> > + bool cfg_changed = false; >> > + >> > + FOR_EACH_BB (bb) >> > + for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi)) >> > + if (is_gimple_call (gsi_stmt (gsi))) >> > + { >> > + tree binfo, token, fndecl; >> > + gimple call = gsi_stmt (gsi); >> > + tree target = gimple_call_fn (call); >> > + struct ipa_jump_func jfunc; >> > + >> > + if (TREE_CODE (target) != OBJ_TYPE_REF) >> > + continue; >> > + >> > + jfunc.type = IPA_JF_UNKNOWN; >> > + compute_known_type_jump_func (OBJ_TYPE_REF_OBJECT (target), >> > &jfunc, >> > + call); >> > + if (jfunc.type != IPA_JF_KNOWN_TYPE) >> > + continue; >> > + binfo = ipa_binfo_from_known_type_jfunc (&jfunc); >> > + if (!binfo) >> > + continue; >> > + token = OBJ_TYPE_REF_TOKEN (target); >> > + fndecl = gimple_get_virt_method_for_binfo (tree_low_cst (token, >> > 1), >> > + binfo); >> > + if (!fndecl) >> > + continue; >> > + >> > + if (dump_file) >> > + { >> > + fprintf (dump_file, "ipa-prop: Immediately devirtualizing >> > call "); >> > + print_gimple_expr (dump_file, call, 0, 0); >> > + } >> > + >> > + gimple_call_set_fndecl (call, fndecl); >> > + update_stmt (call); >> > + if (maybe_clean_eh_stmt (call) >> > + && gimple_purge_dead_eh_edges (bb)) >> > + cfg_changed = true; >> > + >> > + if (dump_file) >> > + { >> > + fprintf (dump_file, " into "); >> > + print_gimple_expr (dump_file, call, 0, 0); >> > + fprintf (dump_file, "\n"); >> > + } >> > + } >> > + >> > + >> > + if (cfg_changed) >> > + return TODO_cleanup_cfg; >> > + else >> > + return 0; >> > +} >> > + >> > +static bool >> > +gate_intra_devirtualization (void) >> > +{ >> > + return flag_devirtualize != 0; >> > +} >> > + >> > + >> > +struct gimple_opt_pass pass_intra_devirt = >> > + { >> > + { >> > + GIMPLE_PASS, >> > + "devirt", /* name */ >> > + gate_intra_devirtualization, /* gate */ >> > + intraprocedural_devirtualization, /* execute */ >> > + NULL, /* sub */ >> > + NULL, /* next */ >> > + 0, /* static_pass_number */ >> > + TV_CGRAPH, /* tv_id */ >> > + PROP_cfg, /* >> > properties_required */ >> > + 0, /* properties_provided */ >> > + 0, /* properties_destroyed */ >> > + 0, /* todo_flags_start */ >> > + 0, /* todo_flags_finish */ >> > + } >> > + }; >> > Index: src/gcc/ipa-prop.h >> > =================================================================== >> > --- src.orig/gcc/ipa-prop.h >> > +++ src/gcc/ipa-prop.h >> > @@ -347,6 +347,7 @@ bool ipa_propagate_indirect_call_infos ( >> > >> > /* Indirect edge and binfo processing. */ >> > struct cgraph_edge *ipa_make_edge_direct_to_target (struct cgraph_edge *, >> > tree); >> > +tree ipa_binfo_from_known_type_jfunc (struct ipa_jump_func *); >> > >> > /* Functions related to both. */ >> > void ipa_analyze_node (struct cgraph_node *); >> > Index: src/gcc/passes.c >> > =================================================================== >> > --- src.orig/gcc/passes.c >> > +++ src/gcc/passes.c >> > @@ -1225,6 +1225,7 @@ init_optimization_passes (void) >> > NEXT_PASS (pass_cleanup_eh); >> > NEXT_PASS (pass_profile); >> > NEXT_PASS (pass_local_pure_const); >> > + NEXT_PASS (pass_intra_devirt); >> > /* Split functions creates parts that are not run through >> > early optimizations again. It is thus good idea to do this >> > late. */ >> > Index: src/gcc/tree-pass.h >> > =================================================================== >> > --- src.orig/gcc/tree-pass.h >> > +++ src/gcc/tree-pass.h >> > @@ -449,6 +449,7 @@ extern struct gimple_opt_pass pass_trace >> > extern struct gimple_opt_pass pass_warn_unused_result; >> > extern struct gimple_opt_pass pass_split_functions; >> > extern struct gimple_opt_pass pass_feedback_split_functions; >> > +extern struct gimple_opt_pass pass_intra_devirt; >> > >> > /* IPA Passes */ >> > extern struct simple_ipa_opt_pass pass_ipa_lower_emutls; >> > >