[Bug tree-optimization/27882] [4.2 regression] segfault in ipa-inline.c, if (e->callee->local.disregard_inline_limits
--- Comment #14 from hubicka at ucw dot cz 2006-06-07 12:18 --- Subject: Re: [4.2 regression] segfault in ipa-inline.c, if (e->callee->local.disregard_inline_limits
> --- Comment #12 from pinskia at gcc dot gnu dot org 2006-06-07 06:00 ---
> Wait in tree-inline.c, we do:
>   /* Update callgraph if needed. */
>   cgraph_remove_node (cg_edge->callee);
>
> Isn't that wrong as we could inline the callee a couple of times?

It should be OK - if we inline multiple times, we create multiple nodes. I will look into this.

Honza -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27882
[Bug middle-end/38074] [4.4 Regression] missed inlining on Core2 Duo due to apparent wrong branch prediction/profile
--- Comment #9 from hubicka at ucw dot cz 2008-12-05 12:59 --- Subject: Re: [4.4 Regression] missed inlining on Core2 Duo due to apparent wrong branch prediction/profile
> Honza, can you have a look here? I suspect the fortran decl issue prevent

Will do. However, we don't distribute the profile over the cgraph... This looks more like one of the misguess cases.

Honza -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38074
[Bug middle-end/38074] [4.4 Regression] missed inlining on Core2 Duo due to apparent wrong branch prediction/profile
--- Comment #11 from hubicka at ucw dot cz 2008-12-05 17:15 --- Subject: Re: [4.4 Regression] missed inlining on Core2 Duo due to apparent wrong branch prediction/profile
OK, so the problem is that all the paths lead to noreturn, so the conditional deciding which noreturn path will be taken is predicted by the same heuristic in both directions. Our first match heuristic blindly picks the first one in the row, predicting the "invalid command line parameters" path to be very likely and the main body to be very unlikely.

This patch fixes it in both ways: when all paths lead to noreturn, we disable the heuristic, and when a heuristic is taken into consideration for first match, we first check that it was not confused and did not predict the edge in both ways.

I also fixed the nonsense in compute_call_stmt_bb_frequency noticed by Jakub. To make frequencies at least a little bit sane, I simply add 1 to both values, so calls with a frequency higher than 0 are still predicted as more frequent.

Honza

Jan Hubicka <[EMAIL PROTECTED]>
Jakub Jelinek <[EMAIL PROTECTED]>

* cgraphbuild.c (compute_call_stmt_bb_frequency): Fix handling of 0 entry frequency.
* predict.c (combine_predictions_for_bb): Ignore predictor predicting in both directions for first match heuristics.
(tree_bb_level_predictions): Disable noreturn heuristic when there is no returning path.

Index: cgraphbuild.c
===
*** cgraphbuild.c (revision 141929)
--- cgraphbuild.c (working copy)
*** int
*** 109,121
  compute_call_stmt_bb_frequency (basic_block bb)
  {
    int entry_freq = ENTRY_BLOCK_PTR->frequency;
!   int freq;
    if (!entry_freq)
!     entry_freq = 1;
!   freq = (!bb->frequency && !entry_freq ? CGRAPH_FREQ_BASE
!           : bb->frequency * CGRAPH_FREQ_BASE / entry_freq);
    if (freq > CGRAPH_FREQ_MAX)
      freq = CGRAPH_FREQ_MAX;
--- 109,120
  compute_call_stmt_bb_frequency (basic_block bb)
  {
    int entry_freq = ENTRY_BLOCK_PTR->frequency;
!   int freq = bb->frequency;
    if (!entry_freq)
!     entry_freq = 1, freq++;
!   freq = freq * CGRAPH_FREQ_BASE / entry_freq;
    if (freq > CGRAPH_FREQ_MAX)
      freq = CGRAPH_FREQ_MAX;
Index: predict.c
===
*** predict.c (revision 141929)
--- predict.c (working copy)
*** combine_predictions_for_bb (basic_block
*** 820,827
    probability = REG_BR_PROB_BASE - probability;
    found = true;
    if (best_predictor > predictor)
!     best_probability = probability, best_predictor = predictor;
    d = (combined_probability * probability
         + (REG_BR_PROB_BASE - combined_probability)
--- 820,852
    probability = REG_BR_PROB_BASE - probability;
    found = true;
+   /* First match heuristics would be widly confused if we predicted
+      both directions. */
    if (best_predictor > predictor)
!     {
!       struct edge_prediction *pred2;
!       int prob = probability;
!
!       for (pred2 = (struct edge_prediction *) *preds; pred2; pred2 = pred2->ep_next)
!         if (pred2 != pred && pred2->ep_predictor == pred->ep_predictor)
!           {
!             int probability2 = pred->ep_probability;
!
!             if (pred2->ep_edge != first)
!               probability2 = REG_BR_PROB_BASE - probability2;
!
!             if ((probability < REG_BR_PROB_BASE / 2) !=
!                 (probability2 < REG_BR_PROB_BASE / 2))
!               break;
!
!             /* If the same predictor later gave better result, go for it! */
!             if ((probability >= REG_BR_PROB_BASE / 2 && (probability2 > probability))
!                 || (probability <= REG_BR_PROB_BASE / 2 && (probability2 < probability)))
!               prob = probability2;
!           }
!       if (!pred2)
!         best_probability = prob, best_predictor = predictor;
!     }
    d = (combined_probability * probability
         + (REG_BR_PROB_BASE - combined_probability)
*** static void
*** 1521,1526
--- 1546,1561
  tree_bb_level_predictions (void)
  {
    basic_block bb;
+   bool has_return_edges = false;
+   edge e;
+   edge_iterator ei;
+
+   FOR_EACH_EDGE (e, ei, EXIT_BLOCK_PTR->preds)
+     if (!(e->flags & (EDGE_ABNORMAL | EDGE_FAKE | EDGE_EH)))
+       {
+         has_return_edges = true;
+         break;
+       }
    apply_return_prediction ();
*** tree_bb_level_predictions (void)
*** 1535,1541
    if (is_gimple_call (stmt))
      {
!       if (gimple_call_flags (stmt) &
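The zero-entry-frequency handling described in the comment above can be tried out in isolation. The following is only a sketch: `TOY_FREQ_BASE`, `TOY_FREQ_MAX` and `toy_call_frequency` are hypothetical stand-ins for the GCC macros and function named in the patch, and the constant values are assumptions picked for illustration.

```c
#include <assert.h>

/* Assumed stand-ins for CGRAPH_FREQ_BASE and CGRAPH_FREQ_MAX; the
   values are illustrative, not taken from the patch.  */
#define TOY_FREQ_BASE 1000
#define TOY_FREQ_MAX  100000

/* Scale a block frequency relative to the entry block frequency.
   When the entry frequency is 0, add 1 to both values so that a block
   with a nonzero frequency still scales higher than a block with 0.  */
int
toy_call_frequency (int bb_freq, int entry_freq)
{
  int freq = bb_freq;

  assert (bb_freq >= 0 && entry_freq >= 0);
  if (!entry_freq)
    {
      entry_freq = 1;
      freq++;
    }
  freq = freq * TOY_FREQ_BASE / entry_freq;
  if (freq > TOY_FREQ_MAX)
    freq = TOY_FREQ_MAX;
  return freq;
}
```

With these assumed constants, a zero-frequency block under a zero entry frequency scales to the base value, while a block with frequency 5 scales six times higher, so the ordering between calls is preserved.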
[Bug target/38824] [4.4 regression] performance regression of sse code from 4.2/4.3
--- Comment #7 from hubicka at ucw dot cz 2009-01-15 01:49 --- Subject: Re: [4.4 regression] performance regression of sse code from 4.2/4.3 I guess the main difference here is that a load + addps pair generates 2 uops, while mov + loading addps generates 3, since the move has to go through the queue. I will try to change the testcase to fit in cache to see if the AMD machine reproduces it too. Honza -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38824
[Bug c++/39862] [4.5 Regression] verify_eh_tree failed with -O2
--- Comment #5 from hubicka at ucw dot cz 2009-05-08 19:44 --- Subject: Re: [4.5 Regression] verify_eh_tree failed with -O2 This is a very sick corner case of updating the prev_try pointer in duplicate_eh_edges. I think it is clear that maintaining the prev_try pointer just to slightly speed up the lookup in foreach_reachable_handler is highly impractical. Once the other bugfix to the EH code is in the tree, I will test a patch removing prev_try and replacing its uses with find_prev_try. Honza -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39862
[Bug bootstrap/40082] [4.5 Regression] Power bootstrap is broken in building libstdc++
--- Comment #3 from hubicka at ucw dot cz 2009-05-09 14:44 --- Subject: Re: [4.5 Regression] Power bootstrap is broken in building libstdc++
Hi, I am testing the attached patch. It makes the testcase compile; does it solve the bootstrap issues too?

Index: ipa.c
===
--- ipa.c (revision 147317)
+++ ipa.c (working copy)
@@ -92,6 +92,21 @@ cgraph_postorder (struct cgraph_node **o
   return order_pos;
 }
+/* Look for all functions inlined to NODE and update their inlined_to pointers
+   to INLINED_TO. */
+
+static void
+update_inlined_to_pointer (struct cgraph_node *node, struct cgraph_node *inlined_to)
+{
+  struct cgraph_edge *e;
+  for (e = node->callees; e; e = e->next_callee)
+    if (e->callee->global.inlined_to)
+      {
+        e->callee->global.inlined_to = inlined_to;
+        update_inlined_to_pointer (e->callee, inlined_to);
+      }
+}
+
 /* Perform reachability analysis and reclaim all unreachable nodes.
    If BEFORE_INLINING_P is true this function is called before inlining
    decisions has been made. If BEFORE_INLINING_P is false this function also
@@ -214,7 +229,8 @@ cgraph_remove_unreachable_nodes (bool be
               && !node->callers)
             {
               gcc_assert (node->clones);
-              node->global.inlined_to = false;
+              node->global.inlined_to = NULL;
+              update_inlined_to_pointer (node, node);
             }
           node->aux = NULL;
         }
-- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40082
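The recursive walk in update_inlined_to_pointer can be exercised outside GCC with a toy callgraph. This is a sketch only: `toy_node`, `toy_edge` and `toy_update_inlined_to` are hypothetical names that mirror just the fields the patch touches, not the real cgraph structures.

```c
#include <assert.h>
#include <stddef.h>

struct toy_node;

/* One outgoing call; edges of a caller are chained via next_callee.  */
struct toy_edge
{
  struct toy_node *callee;
  struct toy_edge *next_callee;
};

/* A function body; inlined_to is non-NULL only for inline copies.  */
struct toy_node
{
  struct toy_node *inlined_to;
  struct toy_edge *callees;
};

/* Re-point every inline copy reachable from NODE at INLINED_TO.
   Callees that are not inline copies are left alone and are not
   descended into, as in the patch.  */
void
toy_update_inlined_to (struct toy_node *node, struct toy_node *inlined_to)
{
  struct toy_edge *e;

  assert (node != NULL);
  for (e = node->callees; e; e = e->next_callee)
    if (e->callee->inlined_to)
      {
        e->callee->inlined_to = inlined_to;
        toy_update_inlined_to (e->callee, inlined_to);
      }
}
```

Calling `toy_update_inlined_to (&root, &root)` on a node that has just become a standalone function corresponds to the `update_inlined_to_pointer (node, node)` call in the second hunk.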
[Bug middle-end/39886] [4.5 Regression] ICE in purge_dead_edges, at cfgrtl.c:2274
--- Comment #2 from hubicka at ucw dot cz 2009-05-09 18:29 --- Subject: Re: [4.5 Regression] ICE in purge_dead_edges, at cfgrtl.c:2274 The problem here is that combine constructs (set (pc) (pc)) as a no-op move, expecting it to be removed immediately. It is however misinterpreted as a jump that is going to be removed at the end of the BB, and update_cfg_for_uncondjump adds the FALLTHRU flag on the edge, which fails in verification. Interestingly enough, modifying update_cfg_for_uncondjump to ignore insns that are not last in the BB seems to cause miscompilations in the testsuite. I am looking into the cgraph failures now, but this should be easy to fix. Honza -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39886
[Bug middle-end/40080] [4.5 Regression] error: missing callgraph edge for call stmt
--- Comment #3 from hubicka at ucw dot cz 2009-05-09 18:52 --- Subject: Re: [4.5 Regression] error: missing callgraph edge for call stmt
Hi, I am testing the following:

Index: cgraphunit.c
===
--- cgraphunit.c (revision 147319)
+++ cgraphunit.c (working copy)
@@ -1762,7 +1762,12 @@ cgraph_materialize_all_clones (void)
           for (e = node->callees; e; e = e->next_callee)
             {
               tree decl = gimple_call_fndecl (e->call_stmt);
-              if (decl != e->callee->decl)
+              /* When function gets inlined, indirect inlining might've invented
+                 new edge for orginally indirect stmt. Since we are not
+                 preserving clones in the original form, we must not update here
+                 since other inline clones don't need to contain call to the same
+                 call. Inliner will do the substitution for us later. */
+              if (decl && decl != e->callee->decl)
                 {
                   gimple new_stmt;
                   gimple_stmt_iterator gsi;
@@ -1808,6 +1813,9 @@ cgraph_materialize_all_clones (void)
           verify_cgraph_node (node);
 #endif
         }
+#ifdef ENABLE_CHECKING
+  verify_cgraph ();
+#endif
   cgraph_remove_unreachable_nodes (false, cgraph_dump_file);
 }
-- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40080
[Bug middle-end/40084] [4.5 Regression] Revision 147294 failed 483.xalancbmk in SPEC CPU 2006 at -O3
--- Comment #4 from hubicka at ucw dot cz 2009-05-09 21:06 --- Subject: Re: [4.5 Regression] Revision 147294 failed 483.xalancbmk in SPEC CPU 2006 at -O3
Hi, I am testing the following patch that solves the ICE. The problem here was that we created an ipa-cp clone and the cloning process allowed devirtualization, which however creates a new direct call, and the callgraph is not properly updated. We should handle devirtualization when inlining or const propagating; this seems to be a common case. We seem to miss a number of inlining opportunities here.

Honza

Index: tree-inline.c
===
--- tree-inline.c (revision 147320)
+++ tree-inline.c (working copy)
@@ -1522,7 +1522,8 @@ copy_bb (copy_body_data *id, basic_block
                   gcc_assert (dest->needed || !dest->analyzed);
                   if (id->transform_call_graph_edges == CB_CGE_MOVE_CLONES)
                     cgraph_create_edge_including_clones (id->dst_node, dest, stmt,
-                                                         bb->count, CGRAPH_FREQ_BASE,
+                                                         bb->count,
+                                                         compute_call_stmt_bb_frequency (id->dst_node->decl, bb),
                                                          bb->loop_depth,
                                                          CIF_ORIGINALLY_INDIRECT_CALL);
                   else
@@ -3535,8 +3536,9 @@ fold_marked_statements (int first, struc
   if (BASIC_BLOCK (first))
     {
       gimple_stmt_iterator gsi;
+      basic_block bb = BASIC_BLOCK (first);
-      for (gsi = gsi_start_bb (BASIC_BLOCK (first));
+      for (gsi = gsi_start_bb (bb);
            !gsi_end_p (gsi); gsi_next (&gsi))
         if (pointer_set_contains (statements, gsi_stmt (gsi)))
@@ -3545,14 +3547,26 @@ fold_marked_statements (int first, struc
             if (fold_stmt (&gsi))
               {
+                tree decl;
+
                 /* Re-read the statement from GSI as fold_stmt() may have changed it.
                    */
                 gimple new_stmt = gsi_stmt (gsi);
                 update_stmt (new_stmt);
-                if (is_gimple_call (old_stmt))
-                  cgraph_update_edges_for_call_stmt (old_stmt, new_stmt);
+                if (is_gimple_call (new_stmt)
+                    && (decl = gimple_call_fndecl (new_stmt)))
+                  {
+                    struct cgraph_node *node = cgraph_node (current_function_decl);
+                    if (cgraph_edge (node, old_stmt))
+                      cgraph_update_edges_for_call_stmt (old_stmt, new_stmt);
+                    else
+                      cgraph_create_edge_including_clones
+                        (node, cgraph_node (decl), new_stmt, bb->count,
+                         compute_call_stmt_bb_frequency (current_function_decl, bb),
+                         bb->loop_depth, CIF_ORIGINALLY_INDIRECT_CALL);
+                  }
                 if (maybe_clean_or_replace_eh_stmt (old_stmt, new_stmt))
                   gimple_purge_dead_eh_edges (BASIC_BLOCK (first));
               }
-- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40084
[Bug tree-optimization/40436] [4.5 regression] 0.5% code size regression caused by r147852
--- Comment #7 from hubicka at ucw dot cz 2009-06-30 12:46 --- Subject: Re: [4.5 regression] 0.5% code size regression caused by r147852
> The problem is that early inliner allows to increase code size estimate by
> inlining single function by up to 12 instructions. This is higher than on
> pretty-ipa branch still, since we are not that good on early optimizing yet and
> some C++ benchmarks (tramp/botan/boost) degrade when reduced to 7 as used by
> tramp3d. In tramp3d it is mostly caused by dead loops in constructors, and I
  ^^ pretty-ipa :)
> hope that merging IPA-SRA and CD-DCE improvements will care this on all three
> benchmarks. At -O2 early inliner needs to be somewhat speculative since it
> don't know the function profiles yet. It however seems stupid to allow code
> size growth at -Os in general.
-- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40436
[Bug debug/40573] [4.4/4.5 Regression] DWARF for inlined subroutines refers to the outlined copy
--- Comment #3 from hubicka at ucw dot cz 2009-06-30 14:07 --- Subject: Re: [4.4/4.5 Regression] DWARF for inlined subroutines refers to the outlined copy Hmm, I thought GCC had been doing the same thing for years. So we need an abstract function for each inline? Honza -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40573
[Bug tree-optimization/40436] [4.5 regression] 0.5% code size regression caused by r147852
--- Comment #11 from hubicka at ucw dot cz 2009-06-30 23:36 --- Subject: Re: [4.5 regression] 0.5% code size regression caused by r147852
> I see no effect whatsoever of the patch for for CSiBE on arm-elf-unknown.

At -Os?

Honza -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40436
[Bug tree-optimization/40585] [4.3/4.4/4.5 Regression] tracer duplicates blocks w/o adjusting EH tree
--- Comment #4 from hubicka at ucw dot cz 2009-07-01 10:47 --- Subject: Re: [4.3/4.4/4.5 Regression] tracer duplicates blocks w/o adjusting EH tree
Hi, the following patch should prevent tracer from copying resx. It is remarkably ugly that I need to unconstify all the copy_bb_p predicates, so perhaps I will look into fixing the RTL expanders to allow multiple RESX blocks instead.

Index: cfghooks.c
===
--- cfghooks.c (revision 149102)
+++ cfghooks.c (working copy)
@@ -874,7 +874,7 @@ tidy_fallthru_edges (void)
 /* Returns true if we can duplicate basic block BB. */
 bool
-can_duplicate_block_p (const_basic_block bb)
+can_duplicate_block_p (basic_block bb)
 {
   if (!cfg_hooks->can_duplicate_block_p)
     internal_error ("%s does not support can_duplicate_block_p",
Index: cfghooks.h
===
--- cfghooks.h (revision 149102)
+++ cfghooks.h (working copy)
@@ -75,7 +75,7 @@ struct cfg_hooks
   bool (*predicted_by_p) (const_basic_block bb, enum br_predictor predictor);
   /* Return true when block A can be duplicated. */
-  bool (*can_duplicate_block_p) (const_basic_block a);
+  bool (*can_duplicate_block_p) (basic_block a);
   /* Duplicate block A. */
   basic_block (*duplicate_block) (basic_block a);
@@ -160,7 +160,7 @@ extern void tidy_fallthru_edge (edge);
 extern void tidy_fallthru_edges (void);
 extern void predict_edge (edge e, enum br_predictor predictor, int probability);
 extern bool predicted_by_p (const_basic_block bb, enum br_predictor predictor);
-extern bool can_duplicate_block_p (const_basic_block);
+extern bool can_duplicate_block_p (basic_block);
 extern basic_block duplicate_block (basic_block, edge, basic_block);
 extern bool block_ends_with_call_p (basic_block bb);
 extern bool block_ends_with_condjump_p (const_basic_block bb);
Index: bb-reorder.c
===
--- bb-reorder.c (revision 149102)
+++ bb-reorder.c (working copy)
@@ -176,7 +176,7 @@ static basic_block copy_bb (basic_block,
 static fibheapkey_t bb_to_key (basic_block);
 static bool better_edge_p (const_basic_block, const_edge, int, int, int, int, const_edge);
 static void connect_traces (int, struct trace *);
-static bool copy_bb_p (const_basic_block, int);
+static bool copy_bb_p (basic_block, int);
 static int get_uncond_jump_length (void);
 static bool push_to_next_round_p (const_basic_block, int, int, int, gcov_type);
 static void find_rarely_executed_basic_blocks_and_crossing_edges (edge **,
@@ -1157,7 +1157,7 @@ connect_traces (int n_traces, struct tra
    when code size is allowed to grow by duplication. */
 static bool
-copy_bb_p (const_basic_block bb, int code_may_grow)
+copy_bb_p (basic_block bb, int code_may_grow)
 {
   int size = 0;
   int max_size = uncond_jump_length;
Index: tree-cfg.c
===
--- tree-cfg.c (revision 149102)
+++ tree-cfg.c (working copy)
@@ -5131,8 +5131,17 @@ gimple_move_block_after (basic_block bb,
 /* Return true if basic_block can be duplicated. */
 static bool
-gimple_can_duplicate_bb_p (const_basic_block bb ATTRIBUTE_UNUSED)
+gimple_can_duplicate_bb_p (basic_block bb)
 {
+  gimple_stmt_iterator gsi = gsi_last_bb (bb);
+
+  /* RTL expander has quite artificial limitation to at most one RESX instruction
+     per region. It can be fixed by turning 1-1 map to 1-many map, but since the
+     code needs to be rewritten to gimple level lowering and there is little reason
+     for duplicating RESX instructions in order to optimize code performance, we
+     just disallow it for the moment. */
+  if (!gsi_end_p (gsi) && gimple_code (gsi_stmt (gsi)) == GIMPLE_RESX)
+    return false;
   return true;
 }
Index: cfgrtl.c
===
--- cfgrtl.c (revision 149102)
+++ cfgrtl.c (working copy)
@@ -3161,7 +3161,7 @@ struct cfg_hooks rtl_cfg_hooks = {
    should only be used through the cfghooks interface, and we do not want to
    move them here since it would require also moving quite a lot of related
    code. They are in cfglayout.c. */
-extern bool cfg_layout_can_duplicate_bb_p (const_basic_block);
+extern bool cfg_layout_can_duplicate_bb_p (basic_block);
 extern basic_block cfg_layout_duplicate_bb (basic_block);
 struct cfg_hooks cfg_layout_rtl_cfg_hooks = {
-- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40585
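The new check in gimple_can_duplicate_bb_p boils down to "look at the last statement of the block and refuse duplication if it is a RESX". A toy version, assuming a block is just an array of statement codes (`toy_code`, `toy_bb` and `toy_can_duplicate_bb_p` are hypothetical names, not GCC API), might look like this:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical statement codes; only TOY_RESX matters here.  */
enum toy_code { TOY_ASSIGN, TOY_CALL, TOY_RESX };

/* A toy basic block: a flat array of statement codes.  */
struct toy_bb
{
  const enum toy_code *stmts;
  size_t n_stmts;
};

/* Return nonzero if BB may be duplicated.  Every block qualifies
   unless its last statement is a RESX; an empty block qualifies.  */
int
toy_can_duplicate_bb_p (const struct toy_bb *bb)
{
  assert (bb != NULL);
  if (bb->n_stmts > 0 && bb->stmts[bb->n_stmts - 1] == TOY_RESX)
    return 0;
  return 1;
}
```

The emptiness test plays the role of `!gsi_end_p (gsi)` in the patch: without it the predicate would read past the start of an empty block.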
[Bug middle-end/39886] [4.5 Regression] ICE in purge_dead_edges, at cfgrtl.c:2274
--- Comment #4 from hubicka at ucw dot cz 2009-07-02 10:05 --- Subject: Re: [4.5 Regression] ICE in purge_dead_edges, at cfgrtl.c:2274
> Would you mind seeing if your patch was the same?

I wanted to prevent the (set pc pc) trick, but this seems like an easier fix for the problem :)

Honza -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39886
[Bug tree-optimization/40436] [4.5 regression] 0.5% code size regression caused by r147852
--- Comment #13 from hubicka at ucw dot cz 2009-07-02 10:10 --- Subject: Re: [4.5 regression] 0.5% code size regression caused by r147852 OK, on i386 it has some effect: according to our nightly tester it is 3524421->3510754. The size used to be as low as 3431090, so it is just a small improvement. I guess I will commit the patch anyway, as it is quite an obvious fix. The other problem might be the "likely_eliminated_by_inlining_p" predicate, which is very optimistic. This predicate makes the inliner believe that all indirect reads and writes to/from pointers passed to the function or function parameters will be optimized out. This is important to allow inlining of methods, SRAing out objects in C++ and devirtualizing calls, but for C code it is a bit too optimistic. Partly this can be cured by IPA-SRA, and Martin has a WIP patch for cloning that contains a more fine-grained analysis of function body size specialized for given parameters. I however doubt they will catch all the cases we need for C++. Perhaps simply disabling the predicate for -Os, or making it just a weak hint (removing some percentage of the estimated cost), is the best way to go. I am just re-testing it on vangelis with size estimates ignoring it. Honza -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40436
[Bug middle-end/40388] [4.5 Regression] another null pointer in remove_unreachable_regions
--- Comment #7 from hubicka at ucw dot cz 2009-07-10 22:36 --- Subject: Re: [4.5 Regression] another null pointer in remove_unreachable_regions
> 569 EXECUTE_IF_SET_IN_BITMAP (i->aka, 0, n, bi)
> (gdb) p i->aka
> $1 = (bitmap) 0x0

Oops, I forgot about this issue. I am testing an obvious patch that checks for i->aka being NULL.
-- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40388
[Bug tree-optimization/40676] [4.5 Regression] internal compiler error: verify_ssa error: definition in block 5 does not dominate use in block 7
--- Comment #5 from hubicka at ucw dot cz 2009-07-11 22:45 --- Subject: Re: [4.5 Regression] internal compiler error: verify_ssa error: definition in block 5 does not dominate use in block 7
Thinking about this more, we change the dominance relation here in a not-so-obvious way. It is not really a textbook case, given the presence of both abnormal edges, which might prevent consistently forwarding everything across the empty BBs, and virtual operands, which may remain in BBs that are otherwise empty. I think we need to:
1) forward the edges in tree-ssa-dce itself (i.e. don't do the edge forwarding only when a control flow stmt becomes dead, but for every non-abnormal edge leading to a dead BB);
2) for empty BBs that remain in the program (the only reason would be that they are the destination of an abnormal edge), send all virtual PHIs for updating, since we cannot be sure dominance relations are preserved.
Sounds sane? If so, I will give it a try tomorrow.

Honza -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40676
[Bug tree-optimization/40676] [4.5 Regression] internal compiler error: verify_ssa error: definition in block 5 does not dominate use in block 7
--- Comment #7 from hubicka at ucw dot cz 2009-07-12 16:18 --- Subject: Re: [4.5 Regression] internal compiler error: verify_ssa error: definition in block 5 does not dominate use in block 7
Hi, there is an interesting difficulty with this plan. When we have something like

BB1: if (test) goto BB2 else BB3;
BB2:
BB3: A=PHI (0 from BB1, 1 from BB2)

we end up forwarding edge BB1->BB2 to BB3, resulting in a wrong-code problem. This is because of how control dependency is formulated. When visiting a BB we first mark live its control dependent BBs (those containing the conditionals deciding whether the BB will be executed at all), and when visiting a PHI we mark the control dependency BBs of the source BBs of edges leading to the PHI. In this case the BB that BB2 is control dependent on is BB1, so we correctly mark the test as necessary, but we never mark BB2 as necessary in any way.

I checked the original Cytron formulation of CD-DCE and it is not forwarding edges of all branches, only of branches being removed, just as current mainline does. I saw the forwarding of all branches on some slides presenting CD-DCE, but I am not sure if this can be cheaply done correctly (one would need the control dependence relation not only for BBs but also for edges, or implicit split-edge BBs for every edge that leads to a PHI).

The following patch fixes the ICE by implementing #2 from my previous comment. Without #1 we end up with some unnecessary virtuals being sent for renaming (those virtuals that exist in otherwise empty BBs), but I doubt it is that big a deal. I am regtesting and bootstrapping this fix.

Honza

Index: tree-ssa-dce.c
===
--- tree-ssa-dce.c (revision 149499)
+++ tree-ssa-dce.c (working copy)
@@ -1137,7 +1162,7 @@ eliminate_unnecessary_stmts (void)
       for (bb = ENTRY_BLOCK_PTR->next_bb; bb != EXIT_BLOCK_PTR; bb = next_bb)
         {
           next_bb = bb->next_bb;
-          if (!(bb->flags & BB_REACHABLE))
+          if (!TEST_BIT (bb_contains_live_stmts, bb->index))
             {
               for (gsi = gsi_start_phis (bb); !gsi_end_p (gsi); gsi_next (&gsi))
                 if (!is_gimple_reg (gimple_phi_result (gsi_stmt (gsi
@@ -1159,8 +1184,11 @@ eliminate_unnecessary_stmts (void)
                     if (found)
                       mark_virtual_phi_result_for_renaming (gsi_stmt (gsi));
                   }
-              delete_basic_block (bb);
+              if (!(bb->flags & BB_REACHABLE))
+                delete_basic_block (bb);
             }
+          else
+            gcc_assert (bb->flags & BB_REACHABLE);
         }
     }
   FOR_EACH_BB (bb)
-- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40676
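The wrong-code scenario with BB1, BB2 and BB3 can be modelled in a few lines. This is a deliberately simplified sketch, assuming a PHI selects its value purely by the predecessor block control arrived from; the `toy_*` names are hypothetical and nothing here is GCC code.

```c
#include <assert.h>

enum toy_bb { TOY_BB1, TOY_BB2, TOY_BB3 };

/* A = PHI (0 from BB1, 1 from BB2): the value depends on which
   predecessor control came from.  */
int
toy_phi_at_bb3 (enum toy_bb pred)
{
  assert (pred == TOY_BB1 || pred == TOY_BB2);
  return pred == TOY_BB2 ? 1 : 0;
}

/* Original CFG: the true branch passes through the empty BB2.  */
int
toy_run_original (int test)
{
  enum toy_bb pred = test ? TOY_BB2 : TOY_BB1;
  return toy_phi_at_bb3 (pred);
}

/* After naively forwarding BB1->BB2 to BB3, both branches reach BB3
   directly from BB1, so the PHI can no longer tell them apart.  */
int
toy_run_forwarded (int test)
{
  (void) test;
  return toy_phi_at_bb3 (TOY_BB1);
}
```

In this model `toy_run_original` and `toy_run_forwarded` disagree whenever `test` is nonzero, which is the reason the empty BB2 has to stay even though it contains no statements.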
[Bug middle-end/40388] [4.5 Regression] another null pointer in remove_unreachable_regions
--- Comment #12 from hubicka at ucw dot cz 2009-07-12 22:44 --- Subject: Re: [4.5 Regression] another null pointer in remove_unreachable_regions
> The testsuite failure was due to a double paste into the testcase; fixing that
> maxes it work.

Uh, double application of the patch. Thanks for fixing it!
-- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40388
[Bug tree-optimization/40759] [4.5 Regression] segfault in useless_type_conversion_p
--- Comment #3 from hubicka at ucw dot cz 2009-07-15 11:29 --- Subject: Re: [4.5 Regression] segfault in useless_type_conversion_p I hope that the patch for PR40676 will cure those problems. I am just on the way to Prague, but I will try to look into it tomorrow. Honza -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40759
[Bug tree-optimization/40759] [4.5 Regression] segfault in useless_type_conversion_p
--- Comment #4 from hubicka at ucw dot cz 2009-07-23 16:15 --- Subject: Re: [4.5 Regression] segfault in useless_type_conversion_p
Hi, the problem here is in removing a virtual PHI. We replace uses of the virtual PHI result by the corresponding VAR_DECL and send the symbol for renaming. However, the replacement is done only for live statements, and we send for renaming only if any live statements are found. The problem here is that the virtual PHI defines a vop used by a dead statement. The dead statement, however, defines a vop used by a live statement. At the time we are removing the dead statement, the live statement gets the former PHI result, now dead, in its vuse.

The following patch solves it by simply updating all uses, dead or alive. It would be possible to keep this check and add a check into the code deleting dead statements to update when the result of a dead PHI is propagated through. I am bootstrapping/regtesting this version.

Index: tree-ssa-dce.c
===
--- tree-ssa-dce.c (revision 150009)
+++ tree-ssa-dce.c (working copy)
@@ -828,9 +828,6 @@ mark_virtual_phi_result_for_renaming (gi
     }
   FOR_EACH_IMM_USE_STMT (stmt, iter, gimple_phi_result (phi))
     {
-      if (gimple_code (stmt) != GIMPLE_PHI
-          && !gimple_plf (stmt, STMT_NECESSARY))
-        continue;
       FOR_EACH_IMM_USE_ON_STMT (use_p, iter)
         SET_USE (use_p, SSA_NAME_VAR (gimple_phi_result (phi)));
       update_stmt (stmt);
-- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40759
[Bug tree-optimization/40874] Function object abstraction penalty with inline functions.
--- Comment #11 from hubicka at ucw dot cz 2009-07-29 08:08 --- Subject: Re: Function object abstraction penalty with inline functions.
> I'll take this for now.

My preferred way of fixing this would be to include an FRE pass. Unfortunately, my last benchmarks adding FRE early weren't showing much of a win on our benchmark suite... Still, it seems the right thing to do.

Honza -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40874
[Bug tree-optimization/24653] [4.1 regression] EON regressed seriously on x86-64
--- Comment #9 from hubicka at ucw dot cz 2005-11-21 14:44 --- Subject: Re: [4.1 regression] EON regressed seriously on x86-64
> --- Comment #8 from pinskia at gcc dot gnu dot org 2005-11-21 13:30 ---
> Fixed at least on the mainline for 4.2.0.

I am going to fix it on the 4.1 branch too once testing converges. However, I would still like to see DCE after DOM, or DCE and DOM reordered. Even if the CCP patch fixes the EON regression one way, this problem seems pretty common in C++ code (see the tramp3d results I posted).

Honza -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=24653
[Bug c++/27369] [4.1/4.2 Regression] tree check ICE when attribute externally_visible used
--- Comment #13 from hubicka at ucw dot cz 2006-07-21 15:13 --- Subject: Re: [4.1/4.2 Regression] tree check ICE when attribute externally_visible used
> --- Comment #12 from mmitchel at gcc dot gnu dot org 2006-07-21 08:38 ---
> I think that Comment #10 shows that handle_externally_visible should not be
> registering things with cgraph, as we shouldn't ever have anything pointing at
> a re-declaration.

Hi, in January I made a patch for a similar problem that simply defers handling of the externally_visible and used attributes until after the compilation unit is finalized. I am going to update it for the current tree and re-test. Does it look safe?

Hi, at present externally_visible on:

extern const char *mystr; /* normally in a header */
const char *mystr __attribute__ ((externally_visible));
int main (int argc, char **argv)
{
  mystr = argv[0];
  return (0);
}

is cowardly ignored. This is because handle_externally_visible_attribute is called before decl merging and it produces a new cgraph node with the externally_visible flag that is never merged into the real decl node.

externally_visible is in many ways symmetric to the used attribute, so I looked at how that one is implemented, but found it somewhat slippery to copy. The used attribute is processed by the cgraph_finalize_* machinery first, but since it might be added retrospectively, it is also processed by the C frontend when merging decls, and finally it sets the TREE_SYMBOL_REFERENCED flag, which survives decl merging, to get the early used attributes right.

This is way too crazy and easy to break, so I moved both attributes to a new place - at the end of compilation the declarations are traversed and we check whether the attributes are present.

Since this is a miscompilation bug, I would like to see it solved in 4.1 too. I wonder whether this scheme seems safe to others, or whether I should take a less intrusive approach (perhaps moving externally_visible only)?

Bootstrapped/regtested i686-pc-gnu-linux, OK for mainline and possibly 4.1?

2006-01-19 Jan Hubicka <[EMAIL PROTECTED]>

* cgraph.c (cgraph_varpool_nodes): Export.
(decide_is_variable_needed): Do not worry about "used" attribute.
* cgraph.h (cgraph_varpool_nodes): Declare.
* cgraphunit.c (decide_is_function_needed): Do not worry about "used" attribute
(process_function_and_variable_attributes): New function.
(cgraph_finalize_compilation_unit): Call it.
* c-decl.c (finish_decl): Do not worry about used attribute.
* c-common.c (handle_externally_visible_attribute): Only validate.

Index: cgraph.c
===
*** cgraph.c (revision 109820)
--- cgraph.c (working copy)
*** static GTY((param_is (struct cgraph_varp
*** 132,138
  struct cgraph_varpool_node *cgraph_varpool_nodes_queue, *cgraph_varpool_first_unanalyzed_node;
  /* The linked list of cgraph varpool nodes. */
! static GTY(()) struct cgraph_varpool_node *cgraph_varpool_nodes;
  /* End of the varpool queue. Needs to be QTYed to work with PCH. */
  static GTY(()) struct cgraph_varpool_node *cgraph_varpool_last_needed_node;
--- 132,138
  struct cgraph_varpool_node *cgraph_varpool_nodes_queue, *cgraph_varpool_first_unanalyzed_node;
  /* The linked list of cgraph varpool nodes. */
! struct cgraph_varpool_node *cgraph_varpool_nodes;
  /* End of the varpool queue. Needs to be QTYed to work with PCH. */
  static GTY(()) struct cgraph_varpool_node *cgraph_varpool_last_needed_node;
*** bool
*** 838,845
  decide_is_variable_needed (struct cgraph_varpool_node *node, tree decl)
  {
    /* If the user told us it is used, then it must be so. */
!   if (node->externally_visible
!       || lookup_attribute ("used", DECL_ATTRIBUTES (decl)))
      return true;
    /* ??? If the assembler name is set by hand, it is possible to assemble
--- 838,844
  decide_is_variable_needed (struct cgraph_varpool_node *node, tree decl)
  {
    /* If the user told us it is used, then it must be so. */
!   if (node->externally_visible)
      return true;
    /* ??? If the assembler name is set by hand, it is possible to assemble
Index: cgraph.h
===
*** cgraph.h (revision 109820)
--- cgraph.h (working copy)
*** extern GTY(()) struct cgraph_node *cgrap
*** 242,247
--- 242,248
  extern GTY(()) struct cgraph_varpool_node *cgraph_varpool_first_unanalyzed_node;
  extern GTY(()) struct cgraph_varpool_node *cgraph_varpool_nodes_queue;
+ extern GTY(()) struct cgraph_varpool_node *cgraph_varpool_nodes;
  extern GTY(()) struct cgraph_asm_node *cgraph_asm_nodes;
  extern GTY(()) int cgraph_order;
Index: cgraphunit.c
===
*** cgraphunit.c (revision 109820)
--- cgraphunit.c (working copy)
[Bug rtl-optimization/28071] [4.1/4.2 regression] A file that can not be compiled in reasonable time/space
--- Comment #10 from hubicka at ucw dot cz 2006-07-22 13:47 --- Subject: Re: [4.1/4.2 regression] A file that can not be compiled in reasonable time/space Hi, this patch makes the -O2 case work pretty well on the tree side. The inliner expands the code from 8MB to 40MB of GGC memory, which seems under control. Aliasing peaks at 85MB, which also doesn't seem completely unreasonable. I will need to give it more testing. I believe the inliner is always GGC safe, but it is easy to be mistaken here. The patch also speeds up the inline heuristic by pruning out the impossible edges early, making the priority queue smaller. Also I am quite curious how the inliner manages to produce 800MB of garbage... Honza Index: ipa-inline.c === *** ipa-inline.c(revision 115645) --- ipa-inline.c(working copy) *** update_caller_keys (fibheap_t heap, stru *** 413,418 --- 413,419 bitmap updated_nodes) { struct cgraph_edge *edge; + const char *failed_reason; if (!node->local.inlinable || node->local.disregard_inline_limits || node->global.inlined_to) *** update_caller_keys (fibheap_t heap, stru *** 421,426 --- 422,441 return; bitmap_set_bit (updated_nodes, node->uid); node->global.estimated_growth = INT_MIN; + + if (!node->local.inlinable) + return; + /* Prune out edges we won't inline into anymore. */ + if (!cgraph_default_inline_p (node, &failed_reason)) + { + for (edge = node->callers; edge; edge = edge->next_caller) + if (edge->aux) + { + fibheap_delete_node (heap, edge->aux); + edge->aux = NULL; + } + return; + } for (edge = node->callers; edge; edge = edge->next_caller) if (edge->inline_failed) Index: tree-inline.c === *** tree-inline.c (revision 115645) --- tree-inline.c (working copy) *** expand_call_inline (basic_block bb, tree *** 2163,2172 /* Update callgraph if needed. */ cgraph_remove_node (cg_edge->callee); - /* Declare the 'auto' variables added with this inlined body.
*/ - record_vars (BLOCK_VARS (id->block)); id->block = NULL_TREE; successfully_inlined = TRUE; egress: input_location = saved_location; --- 2163,2171 /* Update callgraph if needed. */ cgraph_remove_node (cg_edge->callee); id->block = NULL_TREE; successfully_inlined = TRUE; + ggc_collect (); egress: input_location = saved_location; *** declare_inline_vars (tree block, tree va *** 2556,2562 { tree t; for (t = vars; t; t = TREE_CHAIN (t)) ! DECL_SEEN_IN_BIND_EXPR_P (t) = 1; if (block) BLOCK_VARS (block) = chainon (BLOCK_VARS (block), vars); --- 2555,2567 { tree t; for (t = vars; t; t = TREE_CHAIN (t)) ! { ! DECL_SEEN_IN_BIND_EXPR_P (t) = 1; ! gcc_assert (!TREE_STATIC (t) && !TREE_ASM_WRITTEN (t)); ! cfun->unexpanded_var_list = ! tree_cons (NULL_TREE, t, ! cfun->unexpanded_var_list); ! } if (block) BLOCK_VARS (block) = chainon (BLOCK_VARS (block), vars); -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28071
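The pruning step in the ipa-inline.c hunk above can be pictured with a much smaller model. The following is only a sketch under stated assumptions: `struct queue`, `queued`, `size_limit`, `still_inlinable` and `prune_callers` are hypothetical stand-ins (for the fibheap, `edge->aux`, the inline limits and `cgraph_default_inline_p`), not GCC's real interfaces.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a call edge: QUEUED plays the role of a non-NULL
   edge->aux, i.e. the edge currently has an entry in the candidate queue.  */
struct edge {
  struct edge *next_caller;  /* next edge calling the same callee */
  bool queued;
};

/* Hypothetical model of a callee with an estimated body size.  */
struct node {
  struct edge *callers;
  int size;
};

/* Stand-in for the priority queue; only its entry count is modelled.  */
struct queue {
  int n_entries;
};

/* Assumed inlining test: the callee must fit under SIZE_LIMIT.  */
static bool still_inlinable(const struct node *n, int size_limit)
{
  return n->size <= size_limit;
}

/* If CALLEE can no longer be inlined, drop all of its caller edges from
   the queue so later passes over the queue never see them.  Returns the
   number of entries removed.  */
static int prune_callers(struct queue *q, struct node *callee, int size_limit)
{
  struct edge *e;
  int removed = 0;

  if (still_inlinable(callee, size_limit))
    return 0;

  for (e = callee->callers; e; e = e->next_caller)
    if (e->queued) {
      e->queued = false;
      q->n_entries--;
      removed++;
    }
  return removed;
}
```

The point of the design is that a callee that has grown past the limit is removed once, instead of being re-examined every time one of its queue entries is popped.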
[Bug rtl-optimization/28071] [4.1/4.2 regression] A file that can not be compiled in reasonable time/space
--- Comment #11 from hubicka at ucw dot cz 2006-07-22 17:12 --- Subject: Re: [4.1/4.2 regression] A file that can not be compiled in reasonable time/space Hi, this keeps the inliner from producing quadratically many STMT list nodes, so inlining is now reasonably fast. The next offenders are alias info, PRE, regmove, global alloc and the schedulers. Index: tree-cfg.c === *** tree-cfg.c (revision 115645) --- tree-cfg.c (working copy) *** tree_redirect_edge_and_branch_force (edg *** 4158,4164 static basic_block tree_split_block (basic_block bb, void *stmt) { ! block_stmt_iterator bsi, bsi_tgt; tree act; basic_block new_bb; edge e; --- 4158,4165 static basic_block tree_split_block (basic_block bb, void *stmt) { ! block_stmt_iterator bsi; ! tree_stmt_iterator tsi_tgt; tree act; basic_block new_bb; edge e; *** tree_split_block (basic_block bb, void * *** 4192,4204 } } ! bsi_tgt = bsi_start (new_bb); ! while (!bsi_end_p (bsi)) ! { ! act = bsi_stmt (bsi); ! bsi_remove (&bsi, false); ! bsi_insert_after (&bsi_tgt, act, BSI_NEW_STMT); ! } return new_bb; } --- 4193,4209 } } ! if (bsi_end_p (bsi)) ! return new_bb; ! ! /* Split the statement list - avoid re-creating new containers as this ! brings ugly quadratic memory consumption in the inliner. ! (We are still quadratic since we need to update stmt BB pointers, ! sadly) */ ! new_bb->stmt_list = tsi_split_statement_list_before (&bsi.tsi); ! for (tsi_tgt = tsi_start (new_bb->stmt_list); ! !tsi_end_p (tsi_tgt); tsi_next (&tsi_tgt)) ! set_bb_for_stmt (tsi_stmt (tsi_tgt), new_bb); return new_bb; } -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28071
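The idea behind the tree_split_block change above (relink the tail of the list instead of removing and re-inserting every statement) can be sketched on a plain doubly linked list. This is a minimal illustration, not GCC code: `struct stmt`, `struct block`, `block_append` and `block_split_before` are hypothetical names standing in for statement list nodes, basic blocks and `tsi_split_statement_list_before`.

```c
#include <assert.h>
#include <stddef.h>

struct block;

/* Hypothetical statement node; OWNER models the stmt -> basic block link.  */
struct stmt {
  struct stmt *prev, *next;
  struct block *owner;
  int id;
};

/* Hypothetical basic block holding a doubly linked statement list.  */
struct block {
  struct stmt *head, *tail;
};

/* Append S to the end of BB's statement list.  */
static void block_append(struct block *bb, struct stmt *s)
{
  s->prev = bb->tail;
  s->next = NULL;
  s->owner = bb;
  if (bb->tail)
    bb->tail->next = s;
  else
    bb->head = s;
  bb->tail = s;
}

/* Move FIRST and everything after it from BB into the empty block NEW_BB.
   FIRST must belong to BB.  Relinking takes constant time and allocates
   no list nodes; only the owner update walks the moved statements.  */
static void block_split_before(struct block *bb, struct stmt *first,
                               struct block *new_bb)
{
  struct stmt *s;

  new_bb->head = first;
  new_bb->tail = bb->tail;

  bb->tail = first->prev;
  if (first->prev)
    first->prev->next = NULL;
  else
    bb->head = NULL;
  first->prev = NULL;

  for (s = new_bb->head; s; s = s->next)
    s->owner = new_bb;
}
```

As the comment in the patch notes, the walk that updates the owner pointers remains, so the split is still linear in the number of moved statements; what goes away is the allocation of a fresh container node per statement.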
[Bug rtl-optimization/28071] [4.1/4.2 regression] A file that can not be compiled in reasonable time/space
--- Comment #12 from hubicka at ucw dot cz 2006-07-22 18:09 --- Subject: Re: [4.1/4.2 regression] A file that can not be compiled in reasonable time/space Hi, I am attaching the .optimized dump of this testcase. It is quite good demonstration on how SRA and TER tends to increase register pressure in code like: ;; Function add (add) Analyzing Edge Insertions. add (x, y) { double r$min; : r$min = x.min + y.min; .max = x.max + y.max; .min = r$min; return ; } ;; Function mul (mul) Analyzing Edge Insertions. mul (x, y) { double y$min; double y$max; double x$min; double x$max; double d; double c; double b; double a; : x$max = x.max; x$min = x.min; y$max = y.max; y$min = y.min; a = y$min * x$min; b = y$max * x$min; c = y$min * x$max; d = y$max * x$max; .max = max (max (a, b), max (c, d)); .min = min (min (a, b), min (c, d)); return ; } ;; Function fz (fz) fz (x, y, z) { : tmp3 = pow (z, 3.7e+1); tmp7 = pow (y, 2.0e+0); tmp9 = pow (z, 3.6e+1); tmp14 = pow (y, 3.0e+0); tmp16 = pow (z, 3.5e+1); ... 
tmp3922 = pow (x, 3.8e+1); D.17848 = pow (x, 3.9e+1); D.17965 = pow (y, 3.9e+1); D.17968 = pow (z, 3.9e+1); return tmp3 * x * 2.04629333124046830505449179327115416526794433594e+1 * y + tmp9 * tmp7 * x * 1.63737898728226838329646852798759937286376953125e+2 + tmp16 * tmp14 * x * 3.102825991153964650948182679712772369384765625e+2 + tmp23 * tmp21 * x * -1.38580890184729059910750947892665863037109375e+3 + tmp30 * tmp28 * x * -4.39080063708386560961116629187017679214477539062e+1 + tmp37 * tmp35 * x * 1.737348223038549986085854470729827880859375e+4 + tmp44 * tmp42 * x * -1.069806869373114386689849197864532470703125e+4 + tmp51 * tmp49 * x * -3.542086638969252817332744598388671875e+4 + tmp58 * tmp56 * x * -3.091774346229622824466787278652191162109375e+4 + tmp65 * tmp63 * x * 1.568088658621288946889400482177734375e+5 + tmp72 * tmp70 * x * 4.19376520881160162389278411865234375e+5 + tmp79 * tmp77 * x * 2.0111082929561330820433795452117919921875e+5 + tmp86 * tmp84 * x * -4.337742627231603837572038173675537109375e+5 + tmp93 * tmp91 * x * -4.829501801337040960788726806640625e+5 + tmp100 * tmp98 * x * 5.32241994551055715419352054595947265625e+5 + tmp107 * tmp105 * x * 1.8250994926701225340366363525390625e+6 + tmp114 * tmp112 * x * 1.6382205795514374040067195892333984375e+6 + tmp121 * tmp119 * x * 1.1912621023960295133292675018310546875e+5 + tmp128 * tmp126 * x * 8.811503159726611338555812835693359375e+5 + tmp135 * tmp133 * x * 2.690164492243868880905210971832275390625e+5 + tmp142 * tmp140 * x * 2.271892026609037420712411403656005859375e+5 + tmp149 * tmp147 * x * 1.795814638975697453133761882781982421875e+5 + tmp156 * tmp154 * x * -3.94381184819339658133685588836669921875e+5 + tmp163 * tmp161 * x * 7.64450454622797551564872264862060546875e+5 + tmp170 * tmp168 * x * 6.9298171586054741055704653263092041015625e+4 + tmp177 * tmp175 * x * -3.129066099043917492963373661041259765625e+5 + tmp184 * tmp182 * x * -4.0792914801556640304625034332275390625e+5 + tmp191 * tmp189 * x * 
7.3512920753349564620293676853179931640625e+4 + tmp198 * tmp196 * x * 3.5470695311840399881475605070590972900390625e+3 + tmp205 * tmp203 * x * -8.8733450804951236932538449764251708984375e+4 + tmp212 * tmp210 * x * -1.3805889644669676272314973175525665283203125e+4 + tmp219 * tmp217 * x * -7.54301319902873729006387293338775634765625e+3 + tmp226 * tmp224 * x * 2.23731170493404579246998764574527740478515625e+3 + tmp233 * tmp231 * x * -3.903765115338947599581779539585113525390625e+2 + tmp240 * tmp238 * x * 4.743319333283892547115101478993892669677734375e+2 + tmp247 * tmp245 * x * -6.32641294603530113249689748045057058334350585938e+1 + tmp252 * x * -6.76527508139541300380415123072452843189239501953e+0 * z + tmp258 * x * -4.51436297228304250772623618104262277483940124512e-1 + tmp263 * x * 2.89405090268957065902100111998151987791061401367e+0 + tmp9 * tmp268 * -3.7483157190701700756108039058744907379150390625e+2 * y + tmp16 * tmp7 * tmp268 * 9.276025613194925654170219786465167999267578125e+2 + tmp23 * tmp14 * tmp268 * 1.35840047018872951412049587815847412109375e+2 + tmp30 * tmp21 * tmp268 * -3.2681330410168111484381370246410369873046875e+3 + tmp37 * tmp28 * tmp268 * 2.77737094612259534187614917755126953125e+3 + tmp44 * tmp35 * tmp268 * 2.2773056570869275674340315163135528564453125e+3 + tmp51 * tmp42 * tmp268 * 9.2295963366692260024137794971466064453125e+4 + tmp58 * tmp49 * tmp268 * -3.049601738325569895096123218536376953125e+5 + tmp65 * tmp56 * tmp268 * -2.69300746038850047625601291656494140625e+5 + tmp72 * tmp63 * tmp268 * 3.92479526798162725754082202911376953125e+5 + tmp79 * tmp70 * tmp268 * -1.4348648827185891568660736083984375e+6 + tmp86 * tmp77 * tmp268 * 1.2925352909364881925284862518310546875e+6 + tmp93 * tmp84 * tmp268 * 3.44742843619707785546779632568359375e+6 + tmp100 * tmp91 * tmp268 * 2.2975221813043109141290187835693359375e+6 + tmp107 * tmp98 * tmp268
[Bug rtl-optimization/28071] [4.1/4.2 regression] A file that can not be compiled in reasonable time/space
--- Comment #14 from hubicka at ucw dot cz 2006-07-22 19:30 --- Subject: Re: [4.1/4.2 regression] A file that can not be compiled in reasonable time/space Hi, with the attached patch I can cure the regmove quadratic behaviour and the time report is not so unreasonable now: gnu_dev_major gnu_dev_minor gnu_dev_makedev max min f fx fy fz add addl addr sub subl subr mul mull mulr divl ipow fi Analyzing compilation unitPerforming intraprocedural optimizations Assembling functions: max min add addl addr sub subl subr mul mull mulr divl ipow fz fy fx f fi {GC 126177k -> 85112k} {GC 327625k -> 39474k} Execution times (seconds) garbage collection: 0.83 ( 0%) usr 0.00 ( 0%) sys 0.82 ( 0%) wall 0 kB ( 0%) ggc callgraph construction: 0.16 ( 0%) usr 0.02 ( 1%) sys 0.16 ( 0%) wall 1147 kB ( 0%) ggc callgraph optimization: 0.02 ( 0%) usr 0.00 ( 0%) sys 0.02 ( 0%) wall 533 kB ( 0%) ggc ipa reference : 0.05 ( 0%) usr 0.00 ( 0%) sys 0.05 ( 0%) wall 0 kB ( 0%) ggc ipa pure const: 0.01 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall 0 kB ( 0%) ggc ipa type escape : 0.04 ( 0%) usr 0.00 ( 0%) sys 0.05 ( 0%) wall 0 kB ( 0%) ggc trivially dead code : 0.45 ( 0%) usr 0.00 ( 0%) sys 0.42 ( 0%) wall 0 kB ( 0%) ggc life analysis : 21.38 ( 3%) usr 0.02 ( 1%) sys 21.39 ( 3%) wall 1120 kB ( 0%) ggc life info update : 0.54 ( 0%) usr 0.00 ( 0%) sys 0.61 ( 0%) wall 0 kB ( 0%) ggc alias analysis: 0.87 ( 0%) usr 0.00 ( 0%) sys 0.89 ( 0%) wall 4266 kB ( 1%) ggc register scan : 0.42 ( 0%) usr 0.00 ( 0%) sys 0.40 ( 0%) wall 150 kB ( 0%) ggc rebuild jump labels : 0.11 ( 0%) usr 0.00 ( 0%) sys 0.11 ( 0%) wall 0 kB ( 0%) ggc preprocessing : 0.27 ( 0%) usr 0.06 ( 2%) sys 0.36 ( 0%) wall 471 kB ( 0%) ggc lexical analysis : 0.04 ( 0%) usr 0.05 ( 2%) sys 0.08 ( 0%) wall 0 kB ( 0%) ggc parser: 0.12 ( 0%) usr 0.03 ( 1%) sys 0.17 ( 0%) wall 3207 kB ( 1%) ggc inline heuristics : 15.14 ( 2%) usr 0.01 ( 0%) sys 15.26 ( 2%) wall 1486 kB ( 0%) ggc integration : 21.35 ( 3%) usr 0.12 ( 4%) sys 21.71 ( 3%) wall 33445 kB ( 8%) ggc
tree gimplify : 0.18 ( 0%) usr 0.01 ( 0%) sys 0.19 ( 0%) wall 3341 kB ( 1%) ggc tree eh : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall 0 kB ( 0%) ggc tree CFG construction : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall 1338 kB ( 0%) ggc tree CFG cleanup : 0.07 ( 0%) usr 0.00 ( 0%) sys 0.08 ( 0%) wall 20 kB ( 0%) ggc tree VRP : 0.38 ( 0%) usr 0.01 ( 0%) sys 0.42 ( 0%) wall 11 kB ( 0%) ggc tree copy propagation : 0.23 ( 0%) usr 0.01 ( 0%) sys 0.28 ( 0%) wall 222 kB ( 0%) ggc tree store copy prop : 0.11 ( 0%) usr 0.01 ( 0%) sys 0.14 ( 0%) wall 4 kB ( 0%) ggc tree find ref. vars : 0.10 ( 0%) usr 0.01 ( 0%) sys 0.11 ( 0%) wall 8137 kB ( 2%) ggc tree PTA : 1.29 ( 0%) usr 0.04 ( 1%) sys 1.36 ( 0%) wall 57 kB ( 0%) ggc tree alias analysis : 1.89 ( 0%) usr 0.20 ( 7%) sys 2.10 ( 0%) wall 0 kB ( 0%) ggc tree PHI insertion: 1.68 ( 0%) usr 0.01 ( 0%) sys 1.70 ( 0%) wall 18 kB ( 0%) ggc tree SSA rewrite : 0.62 ( 0%) usr 0.04 ( 1%) sys 0.65 ( 0%) wall 17084 kB ( 4%) ggc tree SSA other: 0.48 ( 0%) usr 0.08 ( 3%) sys 0.56 ( 0%) wall 0 kB ( 0%) ggc tree SSA incremental : 1.20 ( 0%) usr 0.00 ( 0%) sys 1.24 ( 0%) wall 0 kB ( 0%) ggc tree operand scan : 1.48 ( 0%) usr 0.34 (11%) sys 1.93 ( 0%) wall 15634 kB ( 4%) ggc dominator optimization: 1.05 ( 0%) usr 0.05 ( 2%) sys 1.05 ( 0%) wall 2698 kB ( 1%) ggc tree SRA : 1.05 ( 0%) usr 0.09 ( 3%) sys 1.15 ( 0%) wall 24835 kB ( 6%) ggc tree STORE-CCP: 0.09 ( 0%) usr 0.01 ( 0%) sys 0.11 ( 0%) wall 4 kB ( 0%) ggc tree CCP : 0.51 ( 0%) usr 0.02 ( 1%) sys 0.56 ( 0%) wall 154 kB ( 0%) ggc tree reassociation: 0.11 ( 0%) usr 0.00 ( 0%) sys 0.11 ( 0%) wall 0 kB ( 0%) ggc tree PRE : 296.46 (45%) usr 0.49 (16%) sys 298.81 (45%) wall 19481 kB ( 5%) ggc tree FRE : 0.96 ( 0%) usr 0.05 ( 2%) sys 1.00 ( 0%) wall 7991 kB ( 2%) ggc tree forward propagate: 0.04 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall 0 kB ( 0%) ggc tree conservative DCE : 0.54 ( 0%) usr 0.00 ( 0%) sys 0.54 ( 0%) wall 0 kB ( 0%) ggc tree aggressive DCE : 0.08 ( 0%) usr 0.00 ( 0%) sys 0.08 ( 
0%) wall 0 kB ( 0%) ggc tree DSE : 0.04 ( 0%) usr 0.00 ( 0%) sys 0.05 ( 0%) wall 8 kB ( 0%) ggc tree SSA uncprop : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall 0 kB ( 0%) ggc tree SSA to normal: 27.19 ( 4
[Bug rtl-optimization/28071] [4.1/4.2 regression] A file that can not be compiled in reasonable time/space
--- Comment #16 from hubicka at ucw dot cz 2006-07-22 20:51 --- Subject: Re: [4.1/4.2 regression] A file that can not be compiled in reasonable time/space Hi, with the attached patch, which saves roughly 10 minutes in the tree-into-ssa pass, I can compile with -O3 -fno-tree-fre -fno-tree-pre, but only without checking enabled, since we do incredibly deep dominator walks that run out of stack space, which can be considered a bug too. TER still manages to enforce a few thousand temporaries with overlapping live ranges. The out-of-ssa pass spends most of its time in calculate_live_on_exit and calculate_live_on_entry, which look rather symmetric to the problem cured by the attached patch, but I don't see directly how to avoid the quadratic behaviour there. Honza garbage collection: 1.22 ( 0%) usr 0.10 ( 1%) sys 8.40 ( 1%) wall 0 kB ( 0%) ggc callgraph construction: 0.14 ( 0%) usr 0.03 ( 0%) sys 0.18 ( 0%) wall 1147 kB ( 0%) ggc callgraph optimization: 0.07 ( 0%) usr 0.01 ( 0%) sys 0.45 ( 0%) wall 533 kB ( 0%) ggc ipa reference : 0.05 ( 0%) usr 0.00 ( 0%) sys 0.06 ( 0%) wall 0 kB ( 0%) ggc ipa pure const: 0.01 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall 0 kB ( 0%) ggc ipa type escape : 0.04 ( 0%) usr 0.00 ( 0%) sys 0.04 ( 0%) wall 0 kB ( 0%) ggc cfg cleanup : 3.89 ( 1%) usr 0.01 ( 0%) sys 4.11 ( 0%) wall 1576 kB ( 1%) ggc trivially dead code : 0.46 ( 0%) usr 0.00 ( 0%) sys 0.53 ( 0%) wall 0 kB ( 0%) ggc life analysis : 51.34 ( 9%) usr 2.65 (21%) sys 73.91 ( 5%) wall 2653 kB ( 1%) ggc life info update : 48.97 ( 9%) usr 0.14 ( 1%) sys 50.68 ( 4%) wall 641 kB ( 0%) ggc alias analysis: 0.69 ( 0%) usr 0.00 ( 0%) sys 1.05 ( 0%) wall 4139 kB ( 1%) ggc register scan : 0.41 ( 0%) usr 0.00 ( 0%) sys 0.40 ( 0%) wall 0 kB ( 0%) ggc rebuild jump labels : 0.14 ( 0%) usr 0.00 ( 0%) sys 0.12 ( 0%) wall 0 kB ( 0%) ggc preprocessing : 0.37 ( 0%) usr 0.06 ( 0%) sys 0.34 ( 0%) wall 471 kB ( 0%) ggc lexical analysis : 0.01 ( 0%) usr 0.05 ( 0%) sys 0.07 ( 0%) wall 0 kB ( 0%) ggc parser: 0.09 ( 0%) usr 0.02 ( 0%) sys
0.18 ( 0%) wall 3207 kB ( 1%) ggc inline heuristics : 14.79 ( 3%) usr 0.02 ( 0%) sys 14.86 ( 1%) wall 1118 kB ( 0%) ggc integration : 17.07 ( 3%) usr 0.22 ( 2%) sys 17.36 ( 1%) wall 79483 kB (27%) ggc tree gimplify : 0.15 ( 0%) usr 0.01 ( 0%) sys 0.17 ( 0%) wall 3341 kB ( 1%) ggc tree eh : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall 0 kB ( 0%) ggc tree CFG construction : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.00 ( 0%) wall 1338 kB ( 0%) ggc tree CFG cleanup : 4.27 ( 1%) usr 0.00 ( 0%) sys 4.27 ( 0%) wall 20 kB ( 0%) ggc tree VRP : 1.26 ( 0%) usr 0.03 ( 0%) sys 1.33 ( 0%) wall 14 kB ( 0%) ggc tree copy propagation : 0.85 ( 0%) usr 0.05 ( 0%) sys 0.94 ( 0%) wall 313 kB ( 0%) ggc tree store copy prop : 0.27 ( 0%) usr 0.01 ( 0%) sys 0.28 ( 0%) wall 5 kB ( 0%) ggc tree find ref. vars : 0.16 ( 0%) usr 0.03 ( 0%) sys 0.18 ( 0%) wall 12044 kB ( 4%) ggc tree PTA : 1.55 ( 0%) usr 0.06 ( 0%) sys 1.63 ( 0%) wall 57 kB ( 0%) ggc tree alias analysis : 2.81 ( 0%) usr 0.29 ( 2%) sys 3.10 ( 0%) wall 0 kB ( 0%) ggc tree PHI insertion: 0.57 ( 0%) usr 0.92 ( 7%) sys 1.52 ( 0%) wall 3137 kB ( 1%) ggc tree SSA rewrite : 2.33 ( 0%) usr 0.06 ( 0%) sys 5.02 ( 0%) wall 21592 kB ( 7%) ggc tree SSA other: 0.41 ( 0%) usr 0.16 ( 1%) sys 0.65 ( 0%) wall 0 kB ( 0%) ggc tree SSA incremental : 4.18 ( 1%) usr 0.45 ( 4%) sys 4.72 ( 0%) wall 520 kB ( 0%) ggc tree operand scan : 1.79 ( 0%) usr 0.69 ( 5%) sys 39.97 ( 3%) wall 18374 kB ( 6%) ggc dominator optimization: 2.91 ( 1%) usr 0.05 ( 0%) sys 2.99 ( 0%) wall 11155 kB ( 4%) ggc tree SRA : 4.24 ( 1%) usr 0.15 ( 1%) sys 4.51 ( 0%) wall 25568 kB ( 9%) ggc tree STORE-CCP: 0.29 ( 0%) usr 0.01 ( 0%) sys 0.31 ( 0%) wall 18 kB ( 0%) ggc tree CCP : 0.87 ( 0%) usr 0.01 ( 0%) sys 2.39 ( 0%) wall 154 kB ( 0%) ggc tree split crit edges : 0.11 ( 0%) usr 0.02 ( 0%) sys 0.14 ( 0%) wall 9284 kB ( 3%) ggc tree reassociation: 0.34 ( 0%) usr 0.00 ( 0%) sys 0.33 ( 0%) wall 0 kB ( 0%) ggc tree code sinking : 0.32 ( 0%) usr 0.00 ( 0%) sys 0.32 ( 0%) wall 0 kB ( 0%) ggc tree 
linearize phis : 0.12 ( 0%) usr 0.00 ( 0%) sys 0.12 ( 0%) wall 0 kB ( 0%) ggc tree forward propagate: 0.10 ( 0%) usr 0.00 ( 0%) sys 0.09 ( 0%) wall 0 kB ( 0%) ggc tree conservative DCE : 1.13 ( 0%) usr 0.00 ( 0%) sys 1.11 ( 0%) wall 0 kB ( 0%) ggc tree aggressiv
[Bug gcov/profile/28480] [4.2 Regression] inliner-1.c:31: ICE: in set_bb_for_stmt, at tree-cfg.c:2775
--- Comment #5 from hubicka at ucw dot cz 2006-07-27 16:06 --- Subject: Re: [Bug gcov/profile/28480] [4.2 Regression] inliner-1.c:31: ICE: in set_bb_for_stmt, at tree-cfg.c:2775 Hi, it is hitting a sanity check in set_bb_for_stmt that is a bit insane in this context. I am testing a patch to inline the necessary parts of set_bb_for_stmt (because of the quadratic time issue it is time critical too, and we need very little of the function itself). Index: tree-cfg.c === *** tree-cfg.c (revision 115775) --- tree-cfg.c (working copy) *** tree_split_block (basic_block bb, void * *** 4203,4209 new_bb->stmt_list = tsi_split_statement_list_before (&bsi.tsi); for (tsi_tgt = tsi_start (new_bb->stmt_list); !tsi_end_p (tsi_tgt); tsi_next (&tsi_tgt)) ! set_bb_for_stmt (tsi_stmt (tsi_tgt), new_bb); return new_bb; } --- 4205,4218 new_bb->stmt_list = tsi_split_statement_list_before (&bsi.tsi); for (tsi_tgt = tsi_start (new_bb->stmt_list); !tsi_end_p (tsi_tgt); tsi_next (&tsi_tgt)) ! { ! tree stmt = tsi_stmt (tsi_tgt); ! ! get_stmt_ann (stmt)->bb = new_bb; ! if (TREE_CODE (stmt) == LABEL_EXPR) ! VEC_replace (basic_block, label_to_block_map, ! LABEL_DECL_UID (LABEL_EXPR_LABEL (stmt)), bb); ! } return new_bb; } -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28480
[Bug rtl-optimization/28071] [4.1/4.2 regression] A file that can not be compiled in reasonable time/space
--- Comment #32 from hubicka at ucw dot cz 2006-07-28 09:41 --- Subject: Re: [4.1/4.2 regression] A file that can not be compiled in reasonable time/space Hi, I've added this testcase to our memory regression tester (see the gcc-regression mailing list), so hopefully the quadratic memory consumption issues will be tracked now. It would be nice to have a runtime benchmark variant of the test so that we can track both the runtime and the compilation time. It seems to uncover quite interesting behaviours across the compiler. Honza -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28071
[Bug target/27827] [4.0/4.1 Regression] gcc 4 produces worse x87 code on all platforms than gcc 3
--- Comment #47 from hubicka at ucw dot cz 2006-08-08 06:28 --- Subject: Re: [4.0/4.1 Regression] gcc 4 produces worse x87 code on all platforms than gcc 3 > In x86/x86-64 world one can be almost sure that the load+execute instruction > pair will execute (marginally to noticeably) faster than move+load-and-execute > instruction pair as the more complex instructions are harder for on-chip > scheduling (they retire later). ^^^ retirement filling up the scheduler easily. > Perhaps we can move such a transformation somewhere more generically perhaps > to > post-reload copyprop? > > Honza -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27827
[Bug middle-end/28071] [4.1/4.2 regression] A file that can not be compiled in reasonable time/space
--- Comment #37 from hubicka at ucw dot cz 2006-08-18 23:10 --- Subject: Re: [4.1/4.2 regression] A file that can not be compiled in reasonable time/space
Hi, to summarize the current progress, the memory consumption seems to be under control now:

comparing PR rtl-optimization/28071 testcase compilation at -O0 level:
Overall memory allocated via mmap and sbrk decreased from 146456k to 134136k, overall -9.18%
Peak amount of GGC memory allocated before garbage collecting run decreased from 95412k to 81628k, overall -16.89%
Amount of produced GGC garbage decreased from 163295k to 143524k, overall -13.77%
Overall memory needed: 146456k -> 134136k
Peak memory use before GGC: 95412k -> 81628k
Peak memory use after GGC: 58507k
Maximum of released memory in single GGC run: 45493k
Garbage: 163295k -> 143524k
Leak: 7142k
Overhead: 29023k -> 25103k
GGC runs: 87

comparing PR rtl-optimization/28071 testcase compilation at -O1 level:
Overall memory needed: 430308k -> 424700k
Peak memory use before GGC: 201177k
Peak memory use after GGC: 196173k
Maximum of released memory in single GGC run: 100203k -> 95156k
Garbage: 279198k -> 271636k
Leak: 47195k
Overhead: 31459k -> 29952k
GGC runs: 105

comparing PR rtl-optimization/28071 testcase compilation at -O2 level:
Overall memory needed: 350424k -> 344820k
Peak memory use before GGC: 208293k
Peak memory use after GGC: 196536k
Maximum of released memory in single GGC run: 101565k -> 96536k
Garbage: 394891k -> 387353k
Leak: 47778k
Overhead: 49054k -> 47552k
GGC runs: 111

comparing PR rtl-optimization/28071 testcase compilation at -O3 -fno-tree-pre -fno-tree-fre level:
Overall memory needed: 535696k -> 536260k
Peak memory use before GGC: 314602k
Peak memory use after GGC: 292946k
Maximum of released memory in single GGC run: 163430k
Garbage: 494953k -> 486928k
Leak: 65110k
Overhead: 60330k -> 58798k
GGC runs: 100

I will post a short summary of the remaining bottlenecks at each optimization level. Honza -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28071
[Bug middle-end/28071] [4.1/4.2 regression] A file that can not be compiled in reasonable time/space
--- Comment #38 from hubicka at ucw dot cz 2006-08-19 00:19 --- Subject: Re: [4.1/4.2 regression] A file that can not be compiled in reasonable time/space
At -O0 we get these time sinks:
life analysis : 0.75 (10%) usr 0.01 ( 3%) sys 0.78 ( 9%) wall 2714 kB ( 4%) ggc
expand: 1.46 (15%) usr 0.04 (11%) sys 1.66 (15%) wall 37656 kB (58%) ggc
local alloc : 1.40 (14%) usr 0.04 (11%) sys 1.45 (13%) wall 1293 kB ( 2%) ggc
global alloc : 3.55 (36%) usr 0.05 (14%) sys 3.67 (34%) wall 7509 kB (12%) ggc
final : 0.96 (10%) usr 0.04 (11%) sys 1.00 ( 9%) wall 1157 kB ( 2%) ggc
TOTAL : 9.95 0.35 10.77 64543 kB

Expand seems reasonable given that almost everything is a call, which has a long representation. Global alloc is copying a significant portion of the insn stream because of:

  /* If we aren't replacing things permanently and we changed something,
     make another copy to ensure that all the RTL is new. Otherwise
     things can go wrong if find_reload swaps commutative operands
     and one is inside RTL that has been copied while the other is not. */
  new_body = old_body;
  if (! replace)
    {
      new_body = copy_insn (old_body);
      if (REG_NOTES (insn))
        REG_NOTES (insn) = copy_insn_1 (REG_NOTES (insn));
    }

and a few other occurrences of copy_insn in reload1.c. They seem to copy quite a lot of unnecessary RTL "just to be sure". Also, virtual register elimination produces a lot of duplicated RTL; perhaps it can be cached? Global alloc also spends 50% of its time clearing out reg_has_output_reload. I am testing a patch that fixes that:
global alloc : 1.51 (19%) usr 0.07 (20%) sys 1.60 (18%) wall 7509 kB (12%) ggc
Final is spending all its time in shorten branches, which are not needed at all. Honza -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28071
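The comment does not say how the patch avoids clearing reg_has_output_reload, so the following is not a description of it. It is only a sketch of one common way to make such a reset cheap: remember which entries were set, and clear just those. All names here (`struct flag_set` and its functions) are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical per-register flag array with a record of which entries
   were set, so that a reset touches only those entries.  */
struct flag_set {
  unsigned char *flag;  /* flag[i] != 0 means entry i is set */
  int *touched;         /* indices set since the last reset */
  int n_touched;
  int size;
};

/* Allocate a set with SIZE entries (SIZE must be positive), all clear.
   Returns 0 on success, -1 on allocation failure.  */
static int flag_set_init(struct flag_set *fs, int size)
{
  fs->flag = calloc((size_t)size, sizeof *fs->flag);
  fs->touched = malloc((size_t)size * sizeof *fs->touched);
  fs->n_touched = 0;
  fs->size = size;
  if (!fs->flag || !fs->touched) {
    free(fs->flag);
    free(fs->touched);
    return -1;
  }
  return 0;
}

/* Set entry I; each index is recorded at most once between resets,
   so TOUCHED can never overflow.  */
static void flag_set_mark(struct flag_set *fs, int i)
{
  if (!fs->flag[i]) {
    fs->flag[i] = 1;
    fs->touched[fs->n_touched++] = i;
  }
}

static int flag_set_test(const struct flag_set *fs, int i)
{
  return fs->flag[i];
}

/* Clear everything in time proportional to the number of marked
   entries, not to SIZE.  */
static void flag_set_reset(struct flag_set *fs)
{
  while (fs->n_touched > 0)
    fs->flag[fs->touched[--fs->n_touched]] = 0;
}

static void flag_set_free(struct flag_set *fs)
{
  free(fs->flag);
  free(fs->touched);
}
```

When the array has one entry per pseudo register and only a handful are set per instruction, a per-instruction full clear dominates; the touched list trades a little memory for a reset that scales with actual use.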
[Bug middle-end/28071] [4.1/4.2 regression] A file that can not be compiled in reasonable time/space
--- Comment #39 from hubicka at ucw dot cz 2006-08-19 01:51 --- Subject: Re: [4.1/4.2 regression] A file that can not be compiled in reasonable time/space
The -O1 time sinks:
life analysis : 25.44 (19%) usr 0.00 ( 0%) sys 25.49 (17%) wall 2565 kB ( 2%) ggc
inline heuristics : 14.92 (11%) usr 0.00 ( 0%) sys 14.95 (10%) wall 1486 kB ( 1%) ggc
integration : 20.73 (15%) usr 0.10 ( 4%) sys 22.72 (15%) wall 33445 kB (20%) ggc
tree SSA to normal: 27.97 (20%) usr 0.04 ( 2%) sys 28.13 (19%) wall 17 kB ( 0%) ggc
expand: 2.56 ( 2%) usr 0.04 ( 2%) sys 2.67 ( 2%) wall 24100 kB (14%) ggc
local alloc : 7.21 ( 5%) usr 0.03 ( 1%) sys 7.18 ( 5%) wall 1855 kB ( 1%) ggc
global alloc : 11.76 ( 9%) usr 0.99 (39%) sys 17.71 (12%) wall 11029 kB ( 6%) ggc
reload CSE regs : 7.91 ( 6%) usr 0.02 ( 1%) sys 7.97 ( 5%) wall 2393 kB ( 1%) ggc
TOTAL : 136.62 2.56 148.01 170448 kB

tree SSA to normal spends most of its time in find_value_in_list because TER is shuffling around singly linked lists in a quadratic way. I quickly got lost in the logic there. Andrew, can you take a look, please?

integration runs into the quadratic behaviour of cgraph_edge. Implementing a hashtable for large cgraphs is easy, I will do so. The quadratic behaviour of tree_split_block also hits us here.

reload CSE regs has a hard time tracking all the stack slot memory locations. It is working harder than needed because a lot of memories are believed to be aliasing even if theoretically almost everything is SRAed and has no address taken, so it should have unique alias sets.

Life analysis spends most of its time in the dead store removal code. Again, lowering --param might help. I am also testing a little patch to cut it to 13 seconds by speeding up reg_overlap_mentioned_p. It would be interesting to see how the dataflow branch scores here.

inline heuristics spends most of its time checking the inline_function_growth limit; I will need to think about it a bit. Honza -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28071
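The hashtable mentioned for cgraph_edge can be sketched as follows. This is only an illustration of replacing a linear walk over a caller's edge list with a hashed lookup keyed by the call statement; the types, the fixed `TABLE_SIZE`, and the function names are assumptions made for the example, not the implementation that later went into GCC.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Power of two; the table must always keep at least one empty slot,
   so it has to be larger than the number of edges inserted.  */
#define TABLE_SIZE 1024

/* Hypothetical call edge keyed by the address of its call statement.  */
struct edge {
  const void *call_stmt;
  struct edge *next_callee;
};

/* Open-addressing table with linear probing and no deletion.  */
struct edge_table {
  struct edge *slot[TABLE_SIZE];
};

static size_t hash_ptr(const void *p)
{
  uintptr_t v = (uintptr_t)p;
  return (size_t)((v >> 4) ^ (v >> 12)) & (TABLE_SIZE - 1);
}

static void edge_table_insert(struct edge_table *t, struct edge *e)
{
  size_t i = hash_ptr(e->call_stmt);
  while (t->slot[i] != NULL)
    i = (i + 1) & (TABLE_SIZE - 1);
  t->slot[i] = e;
}

/* Expected constant-time lookup of the edge for CALL_STMT, or NULL.  */
static struct edge *edge_table_find(const struct edge_table *t,
                                    const void *call_stmt)
{
  size_t i = hash_ptr(call_stmt);
  while (t->slot[i] != NULL) {
    if (t->slot[i]->call_stmt == call_stmt)
      return t->slot[i];
    i = (i + 1) & (TABLE_SIZE - 1);
  }
  return NULL;
}

/* The linear walk the table replaces: O(number of callees) per query,
   hence quadratic when every call statement is looked up once.  */
static struct edge *edge_list_find(struct edge *callees,
                                   const void *call_stmt)
{
  struct edge *e;
  for (e = callees; e; e = e->next_callee)
    if (e->call_stmt == call_stmt)
      return e;
  return NULL;
}
```

A real implementation would presumably build the table only for callers with many edges, since for small functions the list walk is cheaper than maintaining a hash.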
[Bug middle-end/28071] [4.1/4.2 regression] A file that can not be compiled in reasonable time/space
--- Comment #41 from hubicka at ucw dot cz 2006-08-20 00:58 --- Subject: Re: [4.1/4.2 regression] A file that can not be compiled in reasonable time/space Thank you for consideration, Live on entry/exit code shows up high on -O3 compilation time too (something like 30% of time without PRE/FRE I believe). So if it is self contained change, perhaps pushing it to mainline as PR fix would make sense. Honza -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28071
[Bug c/28744] externally_visible attribute not effective with prior declaration of symbol.
--- Comment #11 from hubicka at ucw dot cz 2006-08-20 12:42 --- Subject: Re: externally_visible attribute not effective with prior declaration of symbol. > Is there any reason why process_function_and_variable_attributes is called > at the end of each TU rather than when all TUs were already parsed? The reason is that we do unreachable function removal after each unit (to conserve memory) and for that we do need to process USED attributes (not really externally_visible, as those are used only in cgraph_optimize). Keeping the handling of USED attributes on a per-TU basis, while moving externally_visible to a global basis, would not completely work, since USED attributes in whole-program mode can be used for public variables too, pretty much as externally_visible is used in the testcase. I guess the only solution is to process all TU-local objects at the end of each TU and all global objects at the cgraph_optimize stage. I will post a patch for this. Thank you for looking into those dead ends! Honza -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28744
[Bug debug/26881] [4.1/4.2 Regression] internal compiler error in dwarf2out_finish
--- Comment #18 from hubicka at ucw dot cz 2006-08-20 12:47 --- Subject: Re: [4.1/4.2 Regression] internal compiler error in dwarf2out_finish > (In reply to comment #14) > Any news on the patch? Sadly we are seeing just the tip of the iceberg here. The patch to defer output of debug symbols until later sort of works, but I noticed there are other PRs related to the problem where an optimized-out static variable is still referred to by debug info, so I attempted to move the debug output code to the cgraph domain and failed to do so. The problem is that we are quite inconsistent in the way we handle optimized-out variables. In some cases we emit debug output for them, in others we don't, and in yet others we ICE, depending on the case and frontend. I guess I will back out and implement the deferring itself without touching the whole issue for a start. Then we probably ought to teach the debug info output machinery to query cgraph about whether the particular variable was output or not, output either the location or optimized-out info accordingly, and finally move the debug output to cgraph (for both local and external stuff, so we will need a new data structure in cgraph holding all declarations in the program somehow, as this is for now maintained only by frontends). Honza -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=26881
[Bug c/28744] externally_visible attribute not effective with prior declaration of symbol.
--- Comment #13 from hubicka at ucw dot cz 2006-08-20 12:59 --- Subject: Re: externally_visible attribute not effective with prior declaration of symbol. > > If this is really true, then there are several bugs (in the FEs?) because > there > are numerous occurrences where referenced_vars_insert() is called with > TREE_USED(to) == 0 > > Should there be an assertion that only TREE_USED() > 0 are valid targets for > insertion in/after dfa? I am not quite convinced there is necessarily a problem, since from the frontend point of view all public variables are automatically used, so the whole thing matters only for the cgraph code, where we start to differentiate -fwhole-program mode from non-whole-program... Honza -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28744
[Bug c/28744] externally_visible attribute not effective with prior declaration of symbol.
--- Comment #14 from hubicka at ucw dot cz 2006-08-20 13:12 --- Subject: Re: externally_visible attribute not effective with prior declaration of symbol. Hi, this is the patch I am testing now. Can you think of a way to break it (again? :)) The whole thing is a lot more slippery than I would like it to be... Honza Index: cgraphunit.c === *** cgraphunit.c(revision 116257) --- cgraphunit.c(working copy) *** cgraph_analyze_function (struct cgraph_n *** 965,975 is valid. So, we walk the nodes at the end of the translation unit, applying the ! attributes at that point. */ static void process_function_and_variable_attributes (struct cgraph_node *first, ! struct cgraph_varpool_node *first_var) { struct cgraph_node *node; struct cgraph_varpool_node *vnode; --- 985,1002 is valid. So, we walk the nodes at the end of the translation unit, applying the ! attributes at that point. ! ! The local variables needs to be walked on the end of each compilation unit ! (to allow dead function/variable removal), while the global variables needs ! to be handled on the end of compilation to allow flags to be declared only ! in one of units. The GLOBAL is used to specify whether local or global ! variables shall be processed. */ static void process_function_and_variable_attributes (struct cgraph_node *first, ! struct cgraph_varpool_node *first_var, !
bool global) { struct cgraph_node *node; struct cgraph_varpool_node *vnode; *** process_function_and_variable_attributes *** 977,982 --- 1004,1012 for (node = cgraph_nodes; node != first; node = node->next) { tree decl = node->decl; + if (global != (DECL_COMDAT (decl) +|| (TREE_PUBLIC (decl) && !DECL_EXTERNAL (decl + continue; if (lookup_attribute ("used", DECL_ATTRIBUTES (decl))) { mark_decl_referenced (decl); *** process_function_and_variable_attributes *** 1000,1005 --- 1030,1037 for (vnode = cgraph_varpool_nodes; vnode != first_var; vnode = vnode->next) { tree decl = vnode->decl; + if (global != (DECL_COMDAT (decl) || TREE_PUBLIC (decl))) + continue; if (lookup_attribute ("used", DECL_ATTRIBUTES (decl))) { mark_decl_referenced (decl); *** cgraph_finalize_compilation_unit (void) *** 1052,1058 } timevar_push (TV_CGRAPH); ! process_function_and_variable_attributes (first_analyzed, first_analyzed_var); cgraph_varpool_analyze_pending_decls (); if (cgraph_dump_file) { --- 1085,1092 } timevar_push (TV_CGRAPH); ! process_function_and_variable_attributes (first_analyzed, first_analyzed_var, ! false); cgraph_varpool_analyze_pending_decls (); if (cgraph_dump_file) { *** cgraph_optimize (void) *** 1505,1512 timevar_push (TV_CGRAPHOPT); if (!quiet_flag) ! fprintf (stderr, "Performing intraprocedural optimizations\n"); cgraph_function_and_variable_visibility (); if (cgraph_dump_file) { --- 1540,1548 timevar_push (TV_CGRAPHOPT); if (!quiet_flag) ! fprintf (stderr, "Performing interprocedural optimizations\n"); + process_function_and_variable_attributes (NULL, NULL, true); cgraph_function_and_variable_visibility (); if (cgraph_dump_file) { -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28744
[Bug middle-end/28071] [4.1/4.2 regression] A file that can not be compiled in reasonable time/space
--- Comment #44 from hubicka at ucw dot cz 2006-08-21 02:59 --- Subject: Re: [4.1/4.2 regression] A file that can not be compiled in reasonable time/space Hi, update at -O1 a few patches later (different machine with "only" 500MB RAM, so some swapping occurs, but we almost fit now):
life analysis : 23.50 (20%) usr 0.00 ( 0%) sys 23.51 (17%) wall 2565 kB ( 2%) ggc
inline heuristics : 0.60 ( 1%) usr 0.00 ( 0%) sys 0.60 ( 0%) wall 1561 kB ( 1%) ggc
integration : 5.75 ( 5%) usr 0.04 ( 2%) sys 5.79 ( 4%) wall 33701 kB (20%) ggc
tree SSA rewrite : 0.51 ( 0%) usr 0.01 ( 1%) sys 0.53 ( 0%) wall 17087 kB (10%) ggc
tree SRA : 0.98 ( 1%) usr 0.08 ( 4%) sys 1.10 ( 1%) wall 24835 kB (15%) ggc
tree SSA to normal: 45.11 (39%) usr 0.02 ( 1%) sys 45.14 (33%) wall 17 kB ( 0%) ggc
local alloc : 5.82 ( 5%) usr 0.01 ( 1%) sys 5.85 ( 4%) wall 1855 kB ( 1%) ggc
global alloc : 9.83 ( 8%) usr 0.76 (39%) sys 23.49 (17%) wall 11029 kB ( 6%) ggc
reload CSE regs : 7.30 ( 6%) usr 0.03 ( 2%) sys 10.16 ( 7%) wall 2393 kB ( 1%) ggc
TOTAL : 116.65 1.96 136.52 170783 kB
Life analysis is almost completely the code tracking dead stores after reload (we have many stack slots). Tree SSA to normal is the SRA problem discussed, integration is split_block, global alloc allocates a very huge conflict matrix, and reload CSE regs has a similar problem tracking memories. No idea about local alloc. Honza -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28071
[Bug middle-end/28071] [4.1/4.2 regression] A file that can not be compiled in reasonable time/space
--- Comment #45 from hubicka at ucw dot cz 2006-08-21 12:56 --- Subject: Re: [4.1/4.2 regression] A file that can not be compiled in reasonable time/space Hi, -O2 times: Execution times (seconds)
life analysis : 18.08 ( 3%) usr 0.04 ( 1%) sys 19.42 ( 3%) wall 1120 kB ( 0%) ggc
integration : 5.97 ( 1%) usr 0.07 ( 2%) sys 6.13 ( 1%) wall 33701 kB ( 8%) ggc
tree PRE : 233.01 (43%) usr 0.46 (13%) sys 241.22 (37%) wall 19480 kB ( 5%) ggc
tree SSA to normal: 51.26 ( 9%) usr 0.07 ( 2%) sys 52.62 ( 8%) wall 22 kB ( 0%) ggc
expand: 2.62 ( 0%) usr 0.07 ( 2%) sys 9.45 ( 1%) wall 24095 kB ( 6%) ggc
PRE : 20.39 ( 4%) usr 0.05 ( 1%) sys 21.70 ( 3%) wall 200 kB ( 0%) ggc
regmove : 97.32 (18%) usr 0.17 ( 5%) sys 107.36 (16%) wall 2 kB ( 0%) ggc
local alloc : 6.28 ( 1%) usr 0.00 ( 0%) sys 6.29 ( 1%) wall 1480 kB ( 0%) ggc
global alloc : 13.12 ( 2%) usr 0.71 (21%) sys 62.79 (10%) wall 13764 kB ( 3%) ggc
reload CSE regs : 16.16 ( 3%) usr 0.02 ( 1%) sys 19.21 ( 3%) wall 4783 kB ( 1%) ggc
scheduling 2 : 60.85 (11%) usr 0.57 (17%) sys 67.94 (10%) wall 206199 kB (51%) ggc
TOTAL : 547.14 3.41 651.49 404467 kB
Danny has a fix for PRE scheduled for 4.2. Regmove again hits the same predicate function, since we now produce big basic blocks. This can be fixed rather easily, either by limiting the walk in that predicate or by assigning INSNs consecutive indexes. Scheduling has a problem moving around linked lists of dependencies; fixing it seems to require moving away from log links, and thus it is a 4.2 issue too. One detail that just came to mind... All memory consumed in scheduling is log links. Producing 206MB of them for a 24MB function is rather dense. Can't we prune them somewhat, perhaps by accounting for transitivity (at least in special cases)? The instructions are really mostly independent, but we apparently lose track of that fact somewhere and produce an almost complete tournament. Honza -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28071
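The transitivity idea raised in comment #45 can be illustrated outside of GCC. What follows is only a minimal sketch, assuming a small boolean dependency matrix instead of the scheduler's real log links; `N_INSNS` and `prune_transitive_deps` are made-up names. It drops every direct link that is already implied by a chain through a third instruction, and assumes the dependencies form a DAG (no instruction depends on itself).

```c
#include <assert.h>
#include <stdbool.h>

#define N_INSNS 4   /* hypothetical, tiny on purpose */

/* dep[i][k] means "instruction k depends on instruction i".
   Remove each link i->k that is implied by i->...->j->...->k.  */
void
prune_transitive_deps (bool dep[N_INSNS][N_INSNS])
{
  bool reach[N_INSNS][N_INSNS];
  int i, j, k;

  /* Start from the direct links.  */
  for (i = 0; i < N_INSNS; i++)
    for (k = 0; k < N_INSNS; k++)
      reach[i][k] = dep[i][k];

  /* Transitive closure; the intermediate node J is the outer loop.  */
  for (j = 0; j < N_INSNS; j++)
    for (i = 0; i < N_INSNS; i++)
      for (k = 0; k < N_INSNS; k++)
        if (reach[i][j] && reach[j][k])
          reach[i][k] = true;

  /* A direct link is redundant if some J lies on a path between its ends.  */
  for (i = 0; i < N_INSNS; i++)
    for (k = 0; k < N_INSNS; k++)
      if (dep[i][k])
        for (j = 0; j < N_INSNS; j++)
          if (reach[i][j] && reach[j][k])
            {
              dep[i][k] = false;
              break;
            }
}
```

The cubic closure step would itself be far too slow for a function of the size discussed here, so this only shows which links carry no extra ordering information, not how a scheduler should find them cheaply.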
[Bug middle-end/28071] [4.1/4.2 regression] A file that can not be compiled in reasonable time/space
--- Comment #46 from hubicka at ucw dot cz 2006-08-21 17:11 --- Subject: Re: [4.1/4.2 regression] A file that can not be compiled in reasonable time/space Hi, for completeness the -O3 -fno-tree-pre -fno-tree-fre results (tree-pre/fre needs a little over 2GB of RAM to converge) Execution times (seconds)
garbage collection: 1.11 ( 1%) usr 0.07 ( 2%) sys 8.57 ( 5%) wall 0 kB ( 0%) ggc
life analysis : 5.47 ( 4%) usr 0.12 ( 3%) sys 5.63 ( 3%) wall 2701 kB ( 1%) ggc
life info update : 2.05 ( 2%) usr 0.00 ( 0%) sys 2.10 ( 1%) wall 643 kB ( 0%) ggc
integration : 8.36 ( 7%) usr 0.18 ( 5%) sys 8.61 ( 5%) wall 79611 kB (27%) ggc
tree CFG cleanup : 3.69 ( 3%) usr 0.00 ( 0%) sys 3.77 ( 2%) wall 20 kB ( 0%) ggc
tree alias analysis : 2.64 ( 2%) usr 0.25 ( 6%) sys 3.01 ( 2%) wall 0 kB ( 0%) ggc
tree SSA rewrite : 2.17 ( 2%) usr 0.02 ( 1%) sys 2.22 ( 1%) wall 21589 kB ( 7%) ggc
tree SSA incremental : 4.04 ( 3%) usr 0.01 ( 0%) sys 4.10 ( 2%) wall 1061 kB ( 0%) ggc
tree operand scan : 1.54 ( 1%) usr 0.54 (14%) sys 1.95 ( 1%) wall 18382 kB ( 6%) ggc
dominator optimization: 2.49 ( 2%) usr 0.06 ( 2%) sys 2.61 ( 1%) wall 11262 kB ( 4%) ggc
tree SRA : 3.04 ( 2%) usr 0.08 ( 2%) sys 3.12 ( 2%) wall 25600 kB ( 9%) ggc
tree SSA to normal: 38.17 (31%) usr 0.09 ( 2%) sys 38.56 (21%) wall 11214 kB ( 4%) ggc
dominance computation : 2.40 ( 2%) usr 0.05 ( 1%) sys 2.52 ( 1%) wall 0 kB ( 0%) ggc
expand: 4.22 ( 3%) usr 0.20 ( 5%) sys 11.38 ( 6%) wall 35690 kB (12%) ggc
global alloc : 13.43 (11%) usr 1.28 (32%) sys 54.13 (29%) wall 5873 kB ( 2%) ggc
flow 2: 0.37 ( 0%) usr 0.01 ( 0%) sys 0.78 ( 0%) wall 5092 kB ( 2%) ggc
TOTAL : 123.25 3.98 183.52 291674 kB
Note that the testcase is very different at -O3, because the min/max functions are inlined, breaking the gigantic basic blocks into a number of small BBs, so many of the bottlenecks visible at -O2 go away. I don't know what happens in global alloc; tree SSA to normal is the live_on_entry/live_on_exit problem discussed. We also have problems with very deep recursion levels, as the dominator tree is deep. I am thinking about implementing iterators for walking in dom order, as the current full-blown domtree walker is a bit awkward in some cases. With FRE/PRE enabled, GGC also runs out of stack, because some of the temporary values in annotations leak and instruct GGC to recurse insanely. Honza -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28071
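The iterator idea from comment #46 (walking the dominator tree without recursing once per level) can be sketched as follows. This is an illustration under assumptions, not GCC's dominator-tree API: `struct dom_node` and `dom_next_preorder` are hypothetical, and the node is assumed to carry parent, first-child and next-sibling links.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical node layout; GCC's real dominator tree differs.  */
struct dom_node
{
  int id;
  struct dom_node *parent;
  struct dom_node *first_child;
  struct dom_node *next_sibling;
};

/* Return the node that follows N in a preorder walk of the tree rooted
   at ROOT, or NULL when the walk is finished.  Only parent links are
   followed on the way back up, so stack use does not grow with the
   depth of the tree.  */
struct dom_node *
dom_next_preorder (struct dom_node *root, struct dom_node *n)
{
  if (n->first_child)
    return n->first_child;
  while (n != root)
    {
      if (n->next_sibling)
        return n->next_sibling;
      n = n->parent;
    }
  return NULL;
}
```

A caller would loop with `for (p = root; p; p = dom_next_preorder (root, p))`. The trade-off is one extra parent pointer per node in exchange for constant stack depth.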
[Bug target/29401] [4.0/4.1/4.2 Regression] missed-optimization (in unneeded code elimination)
--- Comment #4 from hubicka at ucw dot cz 2006-10-15 22:20 --- Subject: Re: [4.0/4.1/4.2 Regression] missed-optimization (in unneeded code elimination) > (insn:TI 38 37 26 2 (parallel [ > (set (reg:SI 1 dx [+4 ]) > (ashiftrt:SI (reg:SI 1 dx [+4 ]) > (const_int 15 [0xf]))) > (clobber (reg:CC 17 flags)) > ]) 443 {*ashrsi3_1} (nil) > (expr_list:REG_UNUSED (reg:CC 17 flags) > (expr_list:REG_UNUSED (reg:SI 1 dx [+4 ]) > (nil > > note the problematic partially dead DI ax:dx which flow does not handle, > so the redundant instruction does not get deleted. A peephole might be > able to fix this until new dataflow maybe handles this case(?). It seems to me that the instruction is completely dead from post-reload dataflow point of view (ie return value is just eax and both sets are correctly marked as unused). Perhaps we just somehow managed to drop post-read DCE? Honza -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29401
[Bug middle-end/29299] [4.2 Regression] gcc "used" attribute has no effect on local-scope static variables
--- Comment #9 from hubicka at ucw dot cz 2006-10-15 22:33 --- Subject: Re: [4.2 Regression] gcc "used" attribute has no effect on local-scope static variables > Reopening because it is not fixed for non unit at a time mode (-O0 for C). -O0 gets it right; only -O1 -fno-unit-at-a-time fails, and I am already testing a patch for this. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29299
[Bug target/29512] compile time hog / deadloop.
--- Comment #6 from hubicka at ucw dot cz 2006-10-19 19:36 --- Subject: Re: compile time hog / deadloop. > > > --- Comment #5 from rguenth at gcc dot gnu dot org 2006-10-19 15:15 > --- > It _looks_ like we're needlessly recursing into the BINFOs in the i386 > backend. What that code is shooting for is to interpret the class hierarchy as a union: each base class is walked and classified. This is not exactly my area, but does the testcase really have such deep nesting of base classes that this shows up in profiles? Honza -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29512
[Bug target/29512] compile time hog / deadloop.
--- Comment #9 from hubicka at ucw dot cz 2006-10-19 23:32 --- Subject: Re: compile time hog / deadloop. Just for the record, we discussed this a bit on IRC. I originally wrote that loop by copying the logic from alias.c, just to be sure that all the fields from base classes are merged into the result. It seems that TYPE_FIELDS should already contain all of them, and if this is true, I think it is safe to drop the first loop as Richard suggests, since we need not worry about other properties of the base class the way aliasing does. The merging does slightly more than just dumb merging of all fields into the classes array. For instance, it might be convinced that something is misaligned and dump the whole thing to memory, so I would prefer that the patch is tested by comparing the assembly of some non-trivial C++ code. But looking at the function, I can't come up with a scenario where this would change ABI behaviour. Honza PS: Thanks for looking into this, obviously my failure! -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29512
[Bug gcov-profile/30650] [4.3 Regression] ICE with -fprofile-use
--- Comment #6 from hubicka at ucw dot cz 2007-02-03 21:55 --- Subject: Re: [4.3 Regression] ICE with -fprofile-use > size = ((histogram->hvalue.counters[0] > + histogram->hvalue.counters[0] / 2) > - / histogram->hvalue.counters[0]); > + / histogram->hvalue.counters[1]); > > micha suggested you meant > > size = ((histogram->hvalue.counters[0] > + histogram->hvalue.counters[1] / 2) >/ histogram->hvalue.counters[1]); > > (upward rounding) Ah, yes, thanks! I probably should've scheduled updating this patch for mainline after the trip as I didn't do particularly good work on it just before leaving :( -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=30650
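The corrected expression in comment #6 is the usual round-to-nearest integer division: add half the divisor before dividing. A minimal sketch follows, assuming (as the quoted code suggests but does not state) that `counters[0]` holds a sum and `counters[1]` a count; `rounded_average` is a made-up helper, not a GCC function.

```c
#include <assert.h>

/* Hypothetical helper: divide SUM by COUNT, rounding to the nearest
   integer instead of truncating.  COUNT must be positive.  */
long long
rounded_average (long long sum, long long count)
{
  assert (count > 0);
  return (sum + count / 2) / count;
}
```

For comparison, the first variant quoted above divides `counters[0] + counters[0] / 2` by `counters[0]`, which is 1 for any positive counter, so it could never produce a useful size.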
[Bug tree-optimization/31191] [4.3 Regression] 1000% Runtime regression for FreeFEM navier-stokes example
--- Comment #2 from hubicka at ucw dot cz 2007-03-15 22:01 --- Subject: Re: New: [4.3 Regression] 1000% Runtime regression for FreeFEM navier-stokes example > The regression was introduced between r120825 (10s runtime) and r120846 (110s > runtime). The obvious candidate is: > > +2007-01-16 Jan Hubicka <[EMAIL PROTECTED]> > + > + * cgraph.h (cgraph_decide_inlining_incrementally): Kill. > + * tree-pass.h: Reorder to make IPA passes appear toegher. > + (pass_early_inline, pass_inline_parameters, pass_apply_inline): > Declare. > + * cgraphunit.c (cgraph_finalize_function): Do not compute > inling > + parameters, do not call early inliner. This patch enabled more inlining on many C++ testcases, with an overall win on the C++ benchmark suite. I know this regression goes away with http://gcc.gnu.org/ml/gcc-patches/2007-02/msg00318.html (see Feb 4 results). From a brief analysis I did some time ago, a number of functions in freefem hit the aliasing grouping limits that they did not hit before the January patch. I believe simple DSE solves this just because, by eliminating these references early, we never build a function with so many memory references that we lose track of them in the first place. This seems to be quite common on other C++ testcases too, so I hope we will find a way to do this in 4.3. Hopefully Diego will get time to implement the AA before IPA so we can get this without the extra pass. Honza -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31191
[Bug target/27869] "-O -fregmove" handles SSE scalar instructions incorrectly
--- Comment #7 from hubicka at ucw dot cz 2007-04-06 16:07 --- Subject: Re: "-O -fregmove" handles SSE scalar instructions incorrectly > Investigating... The attached patch to remove '%' seems correct to me. The merge operation wrapping the (commutative) plus/mult/min/max is not commutative, so '%' is wrong. Or am I missing something? Honza -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27869
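Why the merge makes the whole operation non-commutative can be seen in a toy model. The sketch below is not the real intrinsics or the i386 machine description; `struct vec4` and `scalar_add` are invented names that model a scalar SSE add as "lane 0 is the sum, the remaining lanes are copied from the first operand".

```c
#include <assert.h>

/* Toy model of a 4-lane SSE register; not the real intrinsics.  */
struct vec4
{
  float lane[4];
};

/* Models a scalar add: lane 0 is a.lane[0] + b.lane[0], while
   lanes 1..3 are merged in unchanged from the first operand A.  */
struct vec4
scalar_add (struct vec4 a, struct vec4 b)
{
  struct vec4 r = a;
  r.lane[0] = a.lane[0] + b.lane[0];
  return r;
}
```

Swapping the operands leaves lane 0 unchanged but replaces the upper lanes, which is why marking the operands as commutative is only safe if nothing reads those lanes.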
[Bug target/27869] "-O -fregmove" handles SSE scalar instructions incorrectly
--- Comment #9 from hubicka at ucw dot cz 2007-04-06 17:01 --- Subject: Re: "-O -fregmove" handles SSE scalar instructions incorrectly > > > --- Comment #8 from stevenb dot gcc at gmail dot com 2007-04-06 16:43 > --- > Subject: Re: "-O -fregmove" handles SSE scalar instructions incorrectly > > > The attached patch to remove '%' seems correct to me. Merge operating > > wrapping the (commutative) plus/mult/min/max is not commutative, so '%' > > is wrong. Or am I missing something? > > The commutative alternative asm output should also be removed. I don't think there are alternative asm outputs, just intel variants, unless I missed something. The min/max commutative variant should be removed however, I am testing the attached patch. Honza --- Comment #10 from hubicka at ucw dot cz 2007-04-06 17:01 --- Created an attachment (id=13334) --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=13334&action=view) -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27869
[Bug middle-end/28071] [4.1 regression] A file that can not be compiled in reasonable time/space
--- Comment #66 from hubicka at ucw dot cz 2007-04-17 19:38 --- Subject: Re: [4.1 regression] A file that can not be compiled in reasonable time/space Just to add some explanation to the numbers, df_scan_ref_pool is 50MB, the bitmaps quoted are 8MB each. Given the nature of the testcase, I think we are doing a satisfactory job at -O2. At -O3 there are still problems (at -O2 the testcase has one huge BB, at -O3 we have many BBs). PRE explodes completely and we need over 1.2GB for -O3 -fno-tree-pre -fno-tree-fre. What is also killing us at -O3 are the bitmaps.
385MB: df-problems.c:2951 (df_chain_create_bb) 40198 386574160 385195560 385195560 462958
200MB df-problems.c:984 (df_rd_alloc) 40198 385290320 208450840 0 0
110MB df-problems.c:985 (df_rd_alloc) 40198 201714640 110324160 0 0
tree-ssa-live.c:540 (new_tree_live_info) 31939 114031520 113098360 0 84523
tree-ssa-live.c:536 (new_tree_live_info) 31939 113096920 113092320 0 80895
Honza -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28071
[Bug tree-optimization/37315] [4.4 Regression]: gcc.c-torture/execute/931018-1.c int-compare.c ieee/inf-2.c mzero6.c
--- Comment #5 from hubicka at ucw dot cz 2008-09-02 10:14 --- Subject: Re: [4.4 Regression]: gcc.c-torture/execute/931018-1.c int-compare.c ieee/inf-2.c mzero6.c > Honza, why is tree-inline.c:initialize_cfun not calling > allocate_struct_function and *then* change whatever elements need changing? > There's no comment to reveal the reason. Now, you're just allocating a > cleared > area and doing a shallow-copy, which causes the clone to have e.g. the same > cfun->machine. Badness results. Well, the code is not mine, but it was written at a time when struct_function held a lot of extra stuff. I will take a look. Why do we allocate the MDEP parts of cfun so early? I will try to defer it to a later stage of compilation. Honza -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37315
[Bug tree-optimization/37315] [4.4 Regression]: gcc.c-torture/execute/931018-1.c int-compare.c ieee/inf-2.c mzero6.c
--- Comment #7 from hubicka at ucw dot cz 2008-09-02 20:29 --- Subject: Re: [4.4 Regression]: gcc.c-torture/execute/931018-1.c int-compare.c ieee/inf-2.c mzero6.c > > > --- Comment #6 from hp at gcc dot gnu dot org 2008-09-02 10:41 --- > (In reply to comment #5) > > > Well, the code is not mine, but it was wirtten at a time struct_function > > did hold a lot of extra stuff. > > SVN blamed you for that code in tree-inline.c and the revision range is yours. I've moved that code around from verioning function and enabled it by default. I am attaching patch I am testing now. It makes versioning to copy only fields needed and I also noticed that we leak memory in gimple_body pointer. This pointer is pointing to sequence holding first basicblock after build_cfg and when this block is removed it is pointing to bogus sequence then. Setting this pointer to 0 confuse some places that use gimple_body to check presence of functionbody. I added predicate for this and replaced checks by cgraph's analyzed flag that will be needed for WHOPR anyway. Index: cgraph.c === *** cgraph.c(revision 139886) --- cgraph.c(working copy) *** cgraph_create_edge (struct cgraph_node * *** 631,637 gcc_assert (is_gimple_call (call_stmt)); ! if (!gimple_body (callee->decl)) edge->inline_failed = N_("function body not available"); else if (callee->local.redefined_extern_inline) edge->inline_failed = N_("redefined extern inline functions are not " --- 631,637 gcc_assert (is_gimple_call (call_stmt)); ! if (!callee->analyzed) edge->inline_failed = N_("function body not available"); else if (callee->local.redefined_extern_inline) edge->inline_failed = N_("redefined extern inline functions are not " *** dump_cgraph_node (FILE *f, struct cgraph *** 1059,1065 fprintf (f, " needed"); else if (node->reachable) fprintf (f, " reachable"); ! 
if (gimple_body (node->decl)) fprintf (f, " body"); if (node->output) fprintf (f, " output"); --- 1059,1065 fprintf (f, " needed"); else if (node->reachable) fprintf (f, " reachable"); ! if (gimple_has_body_p (node->decl)) fprintf (f, " body"); if (node->output) fprintf (f, " output"); Index: cgraphunit.c === *** cgraphunit.c(revision 139886) --- cgraphunit.c(working copy) *** verify_cgraph_node (struct cgraph_node * *** 639,645 } if (node->analyzed - && gimple_body (node->decl) && !TREE_ASM_WRITTEN (node->decl) && (!DECL_EXTERNAL (node->decl) || node->global.inlined_to)) { --- 639,644 *** cgraph_analyze_functions (void) *** 860,866 { fprintf (cgraph_dump_file, "Initial entry points:"); for (node = cgraph_nodes; node != first_analyzed; node = node->next) ! if (node->needed && gimple_body (node->decl)) fprintf (cgraph_dump_file, " %s", cgraph_node_name (node)); fprintf (cgraph_dump_file, "\n"); } --- 859,865 { fprintf (cgraph_dump_file, "Initial entry points:"); for (node = cgraph_nodes; node != first_analyzed; node = node->next) ! if (node->needed) fprintf (cgraph_dump_file, " %s", cgraph_node_name (node)); fprintf (cgraph_dump_file, "\n"); } *** cgraph_analyze_functions (void) *** 912,918 { fprintf (cgraph_dump_file, "Unit entry points:"); for (node = cgraph_nodes; node != first_analyzed; node = node->next) ! if (node->needed && gimple_body (node->decl)) fprintf (cgraph_dump_file, " %s", cgraph_node_name (node)); fprintf (cgraph_dump_file, "\n\nInitial "); dump_cgraph (cgraph_dump_file); --- 911,917 { fprintf (cgraph_dump_file, "Unit entry points:"); for (node = cgraph_nodes; node != first_analyzed; node = node->next) ! if (node->needed) fprintf (cgraph_dump_file, " %s", cgraph_node_name (node)); fprintf (cgraph_dump_file, "\n\nInitial "); dump_cgraph (cgraph_dump_file); *** cgraph_analyze_functions (void) *** 926,935 tree decl = node->decl; next = node->next; ! if (node->local.finalized && !gimple_body (decl)) cgraph_reset_node (node); ! 
if (!node->reachable && gimple_body (de
[Bug middle-end/37343] [4.4 Regression] ICE in expand_expr_real_1, at expr.c:7290
--- Comment #4 from hubicka at ucw dot cz 2008-09-03 14:30 --- Subject: Re: [4.4 Regression] ICE in expand_expr_real_1, at expr.c:7290 Hi, this is a switch conversion bug. It attempts to convert the switch and constructs a static array with &function_parameter in the initializer, which is naturally wrong. I am testing the following fix: Index: tree-switch-conversion.c === --- tree-switch-conversion.c(revision 139938) +++ tree-switch-conversion.c(working copy) @@ -298,7 +298,7 @@ check_final_bb (void) if ((bb == info.switch_bb || (single_pred_p (bb) && single_pred (bb) == info.switch_bb)) - && !is_gimple_min_invariant (gimple_phi_arg_def (phi, i))) + && !is_gimple_ip_invariant (gimple_phi_arg_def (phi, i))) { info.reason = " Non-invariant value from a case\n"; return false; /* Non-invariant argument. */ -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37343
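The distinction behind the fix in comment #4 can be shown in plain C. This is a hand-written sketch with invented names, not output of the switch conversion pass: the address of a global is the same on every call and can sit in a static table, while the address of a parameter changes per invocation and cannot.

```c
#include <assert.h>

int red = 1, green = 2, blue = 3;

/* Every case yields the address of a global, which is constant for the
   whole program, so a static lookup table can hold these values.  */
int *
pick_global (int i)
{
  switch (i)
    {
    case 0: return &red;
    case 1: return &green;
    default: return &blue;
    }
}

/* Hand-written equivalent of the table form of pick_global.  */
int *
pick_global_table (int i)
{
  static int *const table[] = { &red, &green };
  return (i == 0 || i == 1) ? table[i] : &blue;
}

/* Here one case yields &param.  That address is only fixed within a
   single call, so no static initializer can contain it and the switch
   has to stay a switch.  */
int
read_param_or_global (int i, int param)
{
  int *p;
  switch (i)
    {
    case 0: p = &param; break;
    case 1: p = &green; break;
    default: p = &blue; break;
    }
  return *p;
}
```

Writing `static int *const table[] = { &param, &green };` inside `read_param_or_global` is rejected by a C compiler for the same reason the converted form was invalid.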
[Bug tree-optimization/37345] [4.4 Regression] Segfault in decl_function_context (TYPE_MAIN_VARIANT)
--- Comment #5 from hubicka at ucw dot cz 2008-09-03 18:33 --- Subject: Re: [4.4 Regression] Segfault in decl_function_context (TYPE_MAIN_VARIANT) Testing: * tree.c (build_function_type_skip_args): Build distinct type copy; set TYPE_CONTEXT. (build_function_decl_skip_args): Set type of new decl not orig decl; clear DECL_VINDEX for methods turned into functions. Index: tree.c === --- tree.c (revision 139938) +++ tree.c (working copy) @@ -5925,7 +5925,12 @@ build_function_type_skip_args (tree orig TYPE_ARG_TYPES (new_type) = new_reversed; } else -new_type = build_function_type (TREE_TYPE (orig_type), new_reversed); +{ + new_type += build_distinct_type_copy (build_function_type (TREE_TYPE (orig_type), +new_reversed)); + TYPE_CONTEXT (new_type) = TYPE_CONTEXT (orig_type); +} /* This is a new type, not a copy of an old type. Need to reassociate variants. We can handle everything except the main variant lazily. */ @@ -5959,7 +5964,12 @@ build_function_decl_skip_args (tree orig new_type = TREE_TYPE (orig_decl); if (prototype_p (new_type)) new_type = build_function_type_skip_args (new_type, args_to_skip); - TREE_TYPE (orig_decl) = new_type; + TREE_TYPE (new_decl) = new_type; + + /* For declarations setting DECL_VINDEX (i.e. methods) + we expect first argument to be THIS pointer. */ + if (bitmap_bit_p (args_to_skip, 0)) +DECL_VINDEX (new_decl) = NULL_TREE; return new_decl; } -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37345
[Bug c++/37057] 7 Internal Compiler Errors when compiling OpenFOAM-1.5
--- Comment #12 from hubicka at ucw dot cz 2008-09-14 09:18 --- Subject: Re: 7 Internal Compiler Errors when compiling OpenFOAM-1.5 > Honza, > > I may not be analyzing this correctly, but it looks like > cgraph_remove_unreachable_nodes() may be removing something that is not dead. > Is cgraph handling constructors and destructors on non-ELF systems correctly? It ought to be. I.e. as far as I remember, the constructors either appear local but have DECL_STATIC_CONSTRUCTOR on ELF, or they are externally visible functions with specially mangled names. Perhaps there is yet another way to handle it? They should be recognized by the decide_is_function_needed predicate in cgraphunit.c. Honza -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37057
[Bug c++/37057] 7 Internal Compiler Errors when compiling OpenFOAM-1.5
--- Comment #13 from hubicka at ucw dot cz 2008-09-14 09:24 --- Subject: Re: 7 Internal Compiler Errors when compiling OpenFOAM-1.5 Looking at the log, it seems to be another leak where multiple declarations point to a single struct function. This is of course quite an evil bug with various side effects (surprisingly often the sharing just works, but it is always a memory leak and tends to break various targets); we already had instances of it in IPCP versioning and template instantiation. This is why I added an explicit ggc_free in the cgraph code now. I am just leaving for a US trip, so I am not sure how soon I will be able to look, but debugging is quite easy. You figure out the shared decls (i.e. one is in the backtrace where the garbage collector crashes, the other is the one for which we call ggc_free on the struct function when removing it). Then set a breakpoint at the end of ggc_page with a condition on the result being either of those addresses to see who builds them; the second one is the wrong copy. Honza -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37057
[Bug middle-end/34852] [4.3 Regression] Revision 131576 miscompiled 178.galgel
--- Comment #12 from hubicka at ucw dot cz 2008-01-21 09:54 --- Subject: Re: [4.3 Regression] Revision 131576 miscompiled 178.galgel And the haydn tester, using -O3 -funroll-loops -fpeel-loops -ffast-math -march=native -mtune=native -mfpmath=sse, also started failing on the 17th, so this should rule out the extra precision theory :( Honza -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34852
[Bug middle-end/34852] [4.3 Regression] Revision 131576 miscompiled 178.galgel
--- Comment #11 from hubicka at ucw dot cz 2008-01-21 09:48 --- Subject: Re: [4.3 Regression] Revision 131576 miscompiled 178.galgel Hi, one extra data point: the britten tester, which uses -O3 -fomit-frame-pointer -ftree-loop-linear -funroll-all-loops -fprefetch-loop-arrays -march=k8 -mfpmath=sse -ffast-math, is not getting the failure in 32-bit runs... Honza -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34852
[Bug middle-end/34852] [4.3 Regression] Revision 131576 miscompiled 178.galgel
--- Comment #10 from hubicka at ucw dot cz 2008-01-21 09:44 --- Subject: Re: [4.3 Regression] Revision 131576 miscompiled 178.galgel > > > --- Comment #8 from hjl dot tools at gmail dot com 2008-01-21 03:09 > --- > Add -ffloat-store seems to fix the problem. I will verify it. I also found that, but since -ffloat-store changes almost all register allocation, it is difficult to tell whether it just hid the bug or not :( -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34852
[Bug middle-end/34852] [4.3 Regression] Revision 131576 miscompiled 178.galgel
--- Comment #9 from hubicka at ucw dot cz 2008-01-21 09:43 --- Subject: Re: [4.3 Regression] Revision 131576 miscompiled 178.galgel > > > --- Comment #7 from hjl dot tools at gmail dot com 2008-01-20 16:43 > --- > Oops. This one Yes, it does make sense. I must've missed it, since I was updating similar cases all around the file. It should not be a code correctness issue, just code quality: we still rely on REG_N_CALLS when asking if a register crosses a call, and only use frequency to drive the decision on the profitability of using a caller-save register. Thanks! I will also look into the two regressions this afternoon unless you beat me to it. Honza -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34852
[Bug middle-end/34852] [4.3 Regression] Revision 131576 miscompiled 178.galgel
--- Comment #15 from hubicka at ucw dot cz 2008-01-21 16:42 --- Subject: Re: [4.3 Regression] Revision 131576 miscompiled 178.galgel > I tried -mpc64. It also works. I would declare this proof that it is an extra precision issue and simply update the testers to use -mpc64. It is what most other compilers do anyway, and thus we would get more comparable scores. Thanks a lot for testing it (I had scheduled the same test for tonight, but you beat me to it. I will still check whether it works for the -mfpmath=sse case too) Honza -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34852
[Bug middle-end/34852] [4.3 Regression] Revision 131576 miscompiled 178.galgel
--- Comment #18 from hubicka at ucw dot cz 2008-01-22 20:12 --- Subject: Re: [4.3 Regression] Revision 131576 miscompiled 178.galgel > So, we indeed think this issue is invalid, right? I am convinced so. -mpc64 does not change code generation. I messed up the change of the britten flags to include -mpc64 by default, so I am waiting for tonight's runs to see whether the failure is consistently gone. Honza -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34852
[Bug target/34982] [4.3 regression] calling a function with undefined parameters causes segmentation fault at -O1 or higher
--- Comment #5 from hubicka at ucw dot cz 2008-01-26 20:19 --- Subject: Re: [4.3 regression] calling a function with undefined parameters causes segmentation fault at -O1 or higher > and if it is just not available (i == NULL) might give inconsistent > answers. I will look into this. cgraph_local_info used to trap when asked for unavailable local info, looks like someone fixed the bug by removing the assert. Honza -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34982
[Bug target/34982] [4.3 regression] calling a function with undefined parameters causes segmentation fault at -O1 or higher
--- Comment #6 from hubicka at ucw dot cz 2008-01-27 13:54 --- Subject: Re: [4.3 regression] calling a function with undefined parameters causes segmentation fault at -O1 or higher cgraph_local_info still behaves as expected returning NULL when info is not computed yet. Unfortunately check to simply ignore it when not available has been added to ix86_function_regparm that makes this bug lead to wrong code. (revision 123146) There are two occurences where we can ix86_function_regparm. First one is for compatibility checking, I would just declare it invalid - we don't want the type comatiblity to depend on backend decision and I think it is perfectly sane to reject any types specifying different REGPARM values or where one specify and other doesn't. I am testing attached patch and will commit it if passes. Other case is from gimplifier, I am looking into it. This definitly has to go or we need to drop the feature :( Honza Index: config/i386/i386.c === --- config/i386/i386.c (revision 131882) +++ config/i386/i386.c (working copy) @@ -3148,6 +3148,7 @@ ix86_comp_type_attributes (const_tree ty { /* Check for mismatch of non-default calling convention. */ const char *const rtdstr = TARGET_RTD ? "cdecl" : "stdcall"; + tree attr1, attr2; if (TREE_CODE (type1) != FUNCTION_TYPE && TREE_CODE (type1) != METHOD_TYPE) @@ -3155,11 +3156,27 @@ ix86_comp_type_attributes (const_tree ty /* Check for mismatched fastcall/regparm types. */ if ((!lookup_attribute ("fastcall", TYPE_ATTRIBUTES (type1)) - != !lookup_attribute ("fastcall", TYPE_ATTRIBUTES (type2))) - || (ix86_function_regparm (type1, NULL) - != ix86_function_regparm (type2, NULL))) + != !lookup_attribute ("fastcall", TYPE_ATTRIBUTES (type2 return 0; + /* We don't want to use ix86_function_regparm here: it's decision depends + on middle end information, like localness of functions. Here we only want + to know if types are declared compatible. 
*/ + attr1 = lookup_attribute ("regparm", TYPE_ATTRIBUTES (type1)); + attr2 = lookup_attribute ("regparm", TYPE_ATTRIBUTES (type2)); + + if ((attr1 != NULL_TREE) != (attr2 != NULL_TREE)) +return 0; + + if (attr1) +{ + int val1 = TREE_INT_CST_LOW (TREE_VALUE (TREE_VALUE (attr1))); + int val2 = TREE_INT_CST_LOW (TREE_VALUE (TREE_VALUE (attr2))); + + if (val1 != val2) + return 0; +} + /* Check for mismatched sseregparm types. */ if (!lookup_attribute ("sseregparm", TYPE_ATTRIBUTES (type1)) != !lookup_attribute ("sseregparm", TYPE_ATTRIBUTES (type2))) -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34982
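The compatibility rule that the patch in comment #6 introduces (compare only what was declared, not what the backend later decides) can be restated as a small standalone model. The names below are hypothetical and the struct merely stands in for the result of looking up the "regparm" attribute on a function type.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a function type's declared regparm.  */
struct regparm_attr
{
  bool present;   /* was regparm written on the type?  */
  int value;      /* its argument; meaningful only if PRESENT  */
};

/* Two types are compatible only if both omit regparm, or both
   specify it with the same value.  */
bool
regparm_compatible (struct regparm_attr a, struct regparm_attr b)
{
  if (a.present != b.present)
    return false;
  if (a.present && a.value != b.value)
    return false;
  return true;
}
```

Note that under this rule an explicit `regparm` is never treated as equal to an omitted one, even if the backend would end up choosing the same number of registers for both.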
[Bug target/34982] [4.3 regression] calling a function with undefined parameters causes segmentation fault at -O1 or higher
--- Comment #8 from hubicka at ucw dot cz 2008-01-27 18:10 --- Subject: Re: [4.3 regression] calling a function with undefined parameters causes segmentation fault at -O1 or higher > One more reason to gimplify unit-at-a-time... Yep, on the other hand there is probably not much need to get that amount of architectural detail so easy. I am looking into what makes the compilation to diverge. Honza -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34982
[Bug target/34982] [4.3 regression] calling a function with undefined parameters causes segmentation fault at -O1 or higher
--- Comment #9 from hubicka at ucw dot cz 2008-01-27 19:24 --- Subject: Re: [4.3 regression] calling a function with undefined parameters causes segmentation fault at -O1 or higher However, the failure here is not the early call of cgraph_local_info (it is ugly, but harmless; we are just looking for target promotion rules that we don't change). The problem is the good old broken type system scenario: the forward declaration has no prototype and thus might be vararg, so it is not regparmized, whereas the definition is correct. When expanding the call we use the type of the call, so the wrong type. I am testing the attached patch. My type merging code fixes this too, and obviously we should work harder on the maybe_vaarg rule for local functions; this should make a lot of difference on K&R code (I wonder if any is still around in the usual distro) Honza Index: config/i386/i386.c === *** config/i386/i386.c (revision 131882) --- config/i386/i386.c (working copy) *** init_cumulative_args (CUMULATIVE_ARGS *c *** 3432,3437 --- 3449,3455 rtx libname, /* SYMBOL_REF of library name or 0 */ tree fndecl) { + struct cgraph_local_info *i = fndecl ? cgraph_local_info (fndecl) : NULL; memset (cum, 0, sizeof (*cum)); /* Set up the number of registers to use for passing arguments. */ *** init_cumulative_args (CUMULATIVE_ARGS *c *** 3442,3447 --- 3460,3474 cum->mmx_nregs = MMX_REGPARM_MAX; cum->warn_sse = true; cum->warn_mmx = true; + + /* Because type might mismatch in between caller and callee, we need to + use actual type of function for local calls. + FIXME: cgraph_analyze can be told to actually record if function uses + va_start so for local functions maybe_vaarg can be made aggressive + helping K&R code. + FIXME: once typesytem is fixed, we won't need this code anymore. */ + if (i && i->local) + fntype = TREE_TYPE (fndecl); cum->maybe_vaarg = (fntype ? (!prototype_p (fntype) || stdarg_p (fntype)) : !libname); -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34982
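A minimal standalone model of the maybe_vaarg decision in the patch above, assuming simplified hypothetical types (fn_type, fn_decl) in place of GCC's trees and cgraph info. It only mirrors the logic visible in the diff: for a local function, prefer the type of the definition over the type seen at the call.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for GCC's function type and decl. */
struct fn_type {
  bool has_prototype;  /* models prototype_p (fntype) */
  bool is_stdarg;      /* models stdarg_p (fntype) */
};

struct fn_decl {
  bool is_local;               /* models cgraph_local_info (fndecl)->local */
  const struct fn_type *type;  /* type of the definition */
};

/* Decide whether the callee might take variable arguments.
   call_type is the type recorded at the call site (may be NULL),
   decl is the called function if known (may be NULL). */
static bool maybe_vaarg(const struct fn_type *call_type,
                        const struct fn_decl *decl,
                        bool is_libcall)
{
  const struct fn_type *t = call_type;

  /* For local functions, trust the definition rather than an
     unprototyped forward declaration seen at the call site. */
  if (decl != NULL && decl->is_local)
    t = decl->type;

  if (t == NULL)
    return !is_libcall;

  return !t->has_prototype || t->is_stdarg;
}
```

With an unprototyped call-site type and a prototyped local definition, the model answers "not vararg", so the caller and callee agree on register passing; for a non-local function the conservative call-site answer is kept.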
[Bug middle-end/34969] [4.3 regression] ICE with -fipa-cp -ffast-math
--- Comment #6 from hubicka at ucw dot cz 2008-01-28 20:51 --- Subject: Re: [4.3 regression] ICE with -fipa-cp -ffast-math > No, I mean providing something like cgraph_update_edges_for_call_stmt (tree > old, tree new); or alternatively cgraph_remove_edge_from_call_stmt () and > cgraph_add_edge_from_call_stmt () and call those two unconditionally. My strategy so far was to rebuild cgraph edges from scratch when needed (that is, when something possibly changed). We can probably handle that via a function-local TODO flag here too. Updating the edges across multiple passes is kind of slippery, since we would need to tie cgraph a lot more closely to gimple, pretty much as we do for the CFG. This seems too much of a pie-in-the-sky project with the current organization of trees and folders; I hope that with tuples we will have a lot closer correspondence between actual statements and calls here. Since we need to have edges up to date across the inliner, I guess the patch is fine (as would be adding the TODO flag). Thanks for looking into it! Honza -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34969
[Bug target/34982] [4.3 regression] calling a function with undefined parameters causes segmentation fault at -O1 or higher
--- Comment #11 from hubicka at ucw dot cz 2008-01-29 17:51 --- Subject: Re: [4.3 regression] calling a function with undefined parameters causes segmentation fault at -O1 or higher Hi, the patch seems to pass my local testing, but on Zdenek's tester I get curious results on i686: Tests that now fail, but worked before: libmudflap.cth/pass37-frag.c (-O2) (rerun 14) execution test libmudflap.cth/pass37-frag.c (-O2) (rerun 18) execution test libmudflap.cth/pass37-frag.c (-O2) (rerun 18) output pattern test libmudflap.cth/pass37-frag.c (-O3) (rerun 2) execution test libmudflap.cth/pass37-frag.c (-O3) (rerun 2) output pattern test libmudflap.cth/pass37-frag.c (-O3) (rerun 3) execution test libmudflap.cth/pass37-frag.c (-O3) (rerun 3) output pattern test libmudflap.cth/pass37-frag.c (-static -DSTATIC) (rerun 10) execution test libmudflap.cth/pass37-frag.c (-static -DSTATIC) (rerun 16) execution test libmudflap.cth/pass37-frag.c (-static -DSTATIC) (rerun 16) output pattern test libmudflap.cth/pass37-frag.c (rerun 10) execution test libmudflap.cth/pass37-frag.c (rerun 10) output pattern test libmudflap.cth/pass37-frag.c (rerun 12) execution test libmudflap.cth/pass37-frag.c (rerun 12) output pattern test libmudflap.cth/pass37-frag.c (rerun 13) execution test libmudflap.cth/pass37-frag.c (rerun 14) execution test libmudflap.cth/pass37-frag.c (rerun 14) output pattern test libmudflap.cth/pass37-frag.c (rerun 15) execution test libmudflap.cth/pass37-frag.c (rerun 17) execution test libmudflap.cth/pass37-frag.c (rerun 17) output pattern test libmudflap.cth/pass37-frag.c (rerun 2) execution test libmudflap.cth/pass37-frag.c (rerun 2) output pattern test libmudflap.cth/pass37-frag.c (rerun 4) execution test libmudflap.cth/pass37-frag.c (rerun 4) output pattern test libmudflap.cth/pass39-frag.c (-O2) (rerun 11) execution test libmudflap.cth/pass39-frag.c (-O2) (rerun 4) execution test libmudflap.cth/pass39-frag.c (-O3) (rerun 13) execution test libmudflap.cth/pass39-frag.c (-O3) 
(rerun 13) output pattern test libmudflap.cth/pass39-frag.c (-static -DSTATIC) (rerun 10) execution test libmudflap.cth/pass39-frag.c (-static -DSTATIC) (rerun 10) output pattern test libmudflap.cth/pass39-frag.c (-static -DSTATIC) (rerun 14) execution test libmudflap.cth/pass39-frag.c (-static -DSTATIC) (rerun 14) output pattern test libmudflap.cth/pass39-frag.c (-static -DSTATIC) (rerun 16) execution test libmudflap.cth/pass39-frag.c (-static -DSTATIC) (rerun 16) output pattern test libmudflap.cth/pass39-frag.c (-static -DSTATIC) (rerun 4) execution test libmudflap.cth/pass39-frag.c (-static -DSTATIC) (rerun 4) output pattern test libmudflap.cth/pass39-frag.c (-static -DSTATIC) (rerun 5) execution test libmudflap.cth/pass39-frag.c (-static -DSTATIC) (rerun 5) output pattern test libmudflap.cth/pass39-frag.c (-static -DSTATIC) (rerun 7) execution test libmudflap.cth/pass39-frag.c (-static -DSTATIC) (rerun 7) output pattern test libmudflap.cth/pass39-frag.c (-static -DSTATIC) (rerun 9) execution test libmudflap.cth/pass39-frag.c (-static -DSTATIC) (rerun 9) output pattern test libmudflap.cth/pass39-frag.c (rerun 1) execution test libmudflap.cth/pass39-frag.c (rerun 1) output pattern test libmudflap.cth/pass39-frag.c (rerun 15) execution test libmudflap.cth/pass39-frag.c (rerun 18) execution test libmudflap.cth/pass39-frag.c (rerun 18) output pattern test libmudflap.cth/pass39-frag.c (rerun 19) execution test libmudflap.cth/pass39-frag.c (rerun 9) execution test libmudflap.cth/pass39-frag.c (rerun 9) output pattern test libmudflap.cth/pass39-frag.c execution test libmudflap.cth/pass39-frag.c output pattern test libmudflap.cth/pass40-frag.c (-O2) execution test libmudflap.cth/pass40-frag.c (-O2) output pattern test libmudflap.cth/pass40-frag.c (-static -DSTATIC) execution test libmudflap.cth/pass40-frag.c (-static -DSTATIC) output pattern test libmudflap.cth/pass40-f
[Bug target/34982] [4.3 regression] calling a function with undefined parameters causes segmentation fault at -O1 or higher
--- Comment #17 from hubicka at ucw dot cz 2008-01-30 15:56 --- Subject: Re: [4.3 regression] calling a function with undefined parameters causes segmentation fault at -O1 or higher > These tests time out from time to time when the testing box is busy, that's > quite > normal. The problem is in the use of sched_yield (), which puts the calling > thread to the end of the runqueue. If there are many processes in the > runqueue, > one or more of the 10 threads might miss the 10 sec timeout in one or more of > the 20 repetitions in 100 sched_yield calls. > So just ignore this. Thanks for the explanation. It has happened to me a few times in the past that I incorrectly ignored mudflap failures, claiming random noise. Now at least I know how to look for a test that is supposed to have this problem. Honza -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34982
[Bug tree-optimization/17863] [4.0/4.1/4.2/4.3 Regression] performance loss (not inlining as much??)
--- Comment #38 from hubicka at ucw dot cz 2008-01-30 23:19 --- Subject: Re: [4.0/4.1/4.2/4.3 Regression] performance loss (not inlining as much??) > AFAICT, they are exactly in the form that some targets like it (e.g. > auto-inc/dec and SMALL_REGISTER_CLASS targets). Yep, but all the pointer arithmetic keeps us from realizing that we are doing quite simple manipulations with an array and from propagating loads/stores through. CSE undoes this later in the game, so we end up with normal offsetted addressing. Doing it earlier should make load/store elimination happier. Honza -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17863
[Bug target/34982] [4.3 regression] calling a function with undefined parameters causes segmentation fault at -O1 or higher
--- Comment #23 from hubicka at ucw dot cz 2008-01-30 23:20 --- Subject: Re: [4.3 regression] calling a function with undefined parameters causes segmentation fault at -O1 or higher > (In reply to comment #21) > > but why does this happen only with -O1? > > Random value in eax register so we could put 0 in some cases but not others. Oops, I am going to commit obvious fix for that. Looks like my tester got lucky. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34982
[Bug tree-optimization/17863] [4.0/4.1/4.2/4.3 Regression] performance loss (TER register pressure and inlining limits problems)
--- Comment #43 from hubicka at ucw dot cz 2008-02-01 22:45 --- Subject: Re: [4.0/4.1/4.2/4.3 Regression] performance loss (not inlining as much??) > TER will not replace any load into an expression if there is more than one use > of the load. Your sample shows multiple uses of each load. If it did this > substitution, it could be introducing worse code, it doesn't know. (TER is > also strictly a single block replacement as well). I noticed that now too. The code after reordering by TER simply needs even more registers alive, because it changes how the temporaries overlap. There is probably no simple heuristic to control this... Honza -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17863
[Bug c++/35182] [4.2/4.3 Regression] ICE in coalesce_abnormal_edges
--- Comment #8 from hubicka at ucw dot cz 2008-02-15 11:29 --- Subject: Re: [4.2/4.3 Regression] ICE in coalesce_abnormal_edges The split happens in loop_optimizer_init during the profile construction pass, not in the inliner. #0 make_ssa_name (var=0xb7da1228, stmt=0xb7daf000) at ../../gcc/tree-ssanames.c:150 #1 0x0850c03c in tree_make_forwarder_block (fallthru=0xb7ce8e60) at ../../gcc/tree-cfg.c:4727 #2 0x082905d4 in make_forwarder_block (bb=0xb7da2ca8, redirect_edge_p=0x82952e0 , new_bb_cbk=0) at ../../gcc/cfghooks.c:782 #3 0x08295544 in merge_latch_edges (loop=0xb7daa8dc) at ../../gcc/cfgloop.c:697 #4 0x0829571c in disambiguate_loops_with_multiple_latches () at ../../gcc/cfgloop.c:762 #5 0x08428448 in loop_optimizer_init (flags=0) at ../../gcc/loop-init.c:65 #6 0x0846d892 in tree_estimate_probability () at ../../gcc/predict.c:1354 Don't we have a separate PR for this? Honza -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=35182
[Bug c++/35262] [4.4 Regression]: FAIL: abi_check
--- Comment #3 from hubicka at ucw dot cz 2008-02-20 21:05 --- Subject: Re: [4.4 Regression]: FAIL: abi_check > (In reply to comment #1) > > Jan, is this related to your patch? > > And if it is, then there is another bug somewhere else anyways as it could > only > change inlining decisions. It might be the negative inlining cost assert. I will look into that. Honza -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=35262
[Bug c++/35262] [4.4 Regression]: FAIL: abi_check
--- Comment #5 from hubicka at ucw dot cz 2008-02-20 23:39 --- Subject: Re: [4.4 Regression]: FAIL: abi_check OK, if it really is just inlining decision difference then we are fine. I guess we can either update symbol list or mark always_inline > because of a changing inlining decision. My concern, however, is whether it's > ok not to inline such a tiny function (fyi, the function is defined in > basic_ios.h all the uses are in fstream.tcc). First blush, I don't think it's > ok... I can look into the reason why it is not getting inlined. It would help to have preprocessed testcase as I am travelling now :) Honza -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=35262
[Bug c++/35262] [4.4 Regression]: FAIL: abi_check
--- Comment #10 from hubicka at ucw dot cz 2008-03-03 00:50 --- Subject: Re: [4.4 Regression]: FAIL: abi_check > Confirmed, of course. Honza, any news on the inlining issue? Sorry, I looked into it, got confused, and then got distracted by another problem and forgot to come back. At second look, the function is estimated to make the code grow slightly after being inlined. The function is still getting inlined by default; however, there are code paths that are marked by __builtin_expect as unlikely. The call sites on these paths are considered cold, and thus the function is not inlined there, to optimize for size. This seems very sane behaviour at first sight. However, the catch is that the function is a bit bigger than the call sequence, but still, after being fully inlined, the overall code size is estimated to shrink because the offline body is eliminated. So it is definitely better to inline and have more options for optimizing. Quite a corner case, but easy enough to handle. I am testing a patch to take this fact into account; let's see if it solves the failure too. It gets the function in question inlined, but makes another one not inlined this time ;) Honza -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=35262
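The size argument in the comment above can be illustrated with a small arithmetic model. This is a deliberately simplified sketch, not GCC's actual cost metric; the function name, parameters, and the unit of "size" are all assumptions made for the example.

```c
#include <assert.h>
#include <stdbool.h>

/* Estimated change in overall code size if a function is inlined into
   every caller.  Hypothetical model:
     - each call site grows by (body_size - call_cost),
     - if no offline copy is needed afterwards, body_size is saved once. */
static int growth_if_inlined_everywhere(int body_size, int call_cost,
                                        int n_callers,
                                        bool offline_copy_removable)
{
  assert(body_size >= 0 && call_cost >= 0 && n_callers >= 0);

  int growth = n_callers * (body_size - call_cost);
  if (offline_copy_removable)
    growth -= body_size;
  return growth;
}
```

For instance, with a body of 12 units, a call sequence of 10 units and 3 callers, each site grows by 2, yet the total is 3 * 2 - 12 = -6 once the offline body is dropped. That is the corner case described: a per-call-site loss that is still an overall win.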
[Bug c++/35262] [4.4 Regression]: FAIL: abi_check
--- Comment #14 from hubicka at ucw dot cz 2008-03-03 23:46 --- Subject: Re: [4.4 Regression]: FAIL: abi_check > Honza, I'm sorry, can you please double-check the fix? On my x86_64-linux > machines I'm not seeing any progress :( Hi, this is what I get from our tester: Differences: Tests that now work, but didn't before: abi_check so it ought to make a difference for i686-linux. It is quite possible that things differ on 64-bit hosts; we are standing on quite fragile ground here because in the current cost metric the benefits of inlining are very close to the costs. Given its nature, the estimate that a call of the wrapped function is a bit cheaper than a call of the wrapper is quite correct. The decision on whether a function should be inlined or not depends on many things, like overall size, ABI details, etc. I see it is quite irritating for ABI checking. I am sending it for testing on x86-64 now. Perhaps we can deal with this by checking the ABI with -Os, which is a bit less dependent on fine-grained performance decisions like the one we are seeing here? Honza -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=35262
[Bug c++/35262] [4.4 Regression]: FAIL: abi_check
--- Comment #16 from hubicka at ucw dot cz 2008-03-04 07:03 --- Subject: Re: [4.4 Regression]: FAIL: abi_check > Note however, that the patch also didn't help Geoff's i686-linux tester, just > have a look to gcc-testresults. Sorry, I had two versions of the patch and managed to commit the wrong copy. I sent the correct one to the ML. It should be fixed now. > > > I think we should not mix the two issues, here. The first issue is that, IMO, > the function we are discussing should be inlined, it's very small and we > always > inlined it until recently. The point I wanted to make is that the inliner, when it knows it is inlining a cold call (because it was hinted so by __builtin_expect), is correctly a lot more selective. Basically anything that expands to a function call and some extra code around it is a loss for code-size inlining. Honza -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=35262
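For readers unfamiliar with the hint discussed above, here is a minimal sketch of a call site marked cold with __builtin_expect (a GCC/Clang builtin). The function names are hypothetical; whether the cold call is actually inlined depends on the compiler version and options.

```c
#include <stddef.h>

/* Hypothetical error handler; the call below sits on the unlikely path. */
static int report_bad_input(void)
{
  return -1;
}

/* Return the first byte of a buffer, or -1 for an empty/NULL buffer. */
static int first_byte(const unsigned char *buf, size_t len)
{
  /* The second argument (0) tells the compiler the condition is expected
     to be false, so the call to report_bad_input is treated as cold. */
  if (__builtin_expect(buf == NULL || len == 0, 0))
    return report_bad_input();

  return buf[0];
}
```

The hint changes only the compiler's estimate of how often the branch is taken; the program's result is the same with or without it.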
[Bug tree-optimization/35629] [4.4 Regression] gcc.dg/tree-ssa/loop-25.c scan-tree-dump-times profile fails
--- Comment #6 from hubicka at ucw dot cz 2008-03-19 23:06 --- Subject: Re: [4.4 Regression] gcc.dg/tree-ssa/loop-25.c scan-tree-dump-times profile fails This seems to affect about every target, but it is essentially harmless. I am discussing with Zdenek the proper fix. Honza -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=35629
[Bug testsuite/35647] FAIL: gcc.dg/cpp/cmdlne-d(I|M)-M.c scan-file (^|\\n)cmdlne-d(I|M)-M[^\\n]*:[^\\n]*cmdlne-d(I|M)-M.c
--- Comment #2 from hubicka at ucw dot cz 2008-03-20 10:32 --- Subject: Re: FAIL: gcc.dg/cpp/cmdlne-d(I|M)-M.c scan-file (^|\\n)cmdlne-d(I|M)-M[^\\n]*:[^\\n]*cmdlne-d(I|M)-M.c > It fails everywhere, due to commit 133342. Author is informed and CC:ed. Sorry for all the breakage. There used to be an xfail, but since I've removed at least one bug in the implementation (-dM was interpreted as both a GCC debugging option and a preprocessor directive), I've removed the xfail, assuming that the problem is fixed. It seems to pass for me: Executing on host: /root/trunk-an/build/gcc/xgcc -B/root/trunk-an/build/gcc/ /root/trunk-an/gcc/testsuite/gcc.dg/cpp/cmdlne-dI-M.c -dI -M -fno-show-column -E -o cmdlne-dI-M.i(timeout = 300) PASS: gcc.dg/cpp/cmdlne-dI-M.c (test for excess errors) PASS: gcc.dg/cpp/cmdlne-dI-M.c scan-file-not (^|\\n)#define foo bar($|\\n) PASS: gcc.dg/cpp/cmdlne-dI-M.c scan-file-not variable PASS: gcc.dg/cpp/cmdlne-dI-M.c scan-file (^|\\n)cmdlne-dI-M.*:[^\\n]*cmdlne-dI-M.c Executing on host: /root/trunk-an/build/gcc/xgcc -B/root/trunk-an/build/gcc/ /root/trunk-an/gcc/testsuite/gcc.dg/cpp/cmdlne-dM-M.c -dM -M -fno-show-column -E -o cmdlne-dM-M.i(timeout = 300) PASS: gcc.dg/cpp/cmdlne-dM-M.c (test for excess errors) PASS: gcc.dg/cpp/cmdlne-dM-M.c scan-file (^|\\n)#define foo bar($|\\n) PASS: gcc.dg/cpp/cmdlne-dM-M.c scan-file-not variable PASS: gcc.dg/cpp/cmdlne-dM-M.c scan-file (^|\\n)cmdlne-dM-M[^\\n]*:[^\\n]*cmdlne-dM-M.c Executing I didn't check; perhaps it was xpassing for me previously too, but why does the testcase work for me and fail everywhere else? Reverting the xfail removal is probably the proper fix here. Honza -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=35647
[Bug middle-end/35781] [4.4 Regression]: Revision 133759 breaks ia64
--- Comment #2 from hubicka at ucw dot cz 2008-04-01 00:15 --- Subject: Re: [4.4 Regression]: Revision 133759 breaks ia64 > > > --- Comment #1 from wilson at tuliptree dot org 2008-03-31 22:42 --- > Subject: Re: New: [4.4 Regression]: Revision 133759 > breaks ia64 > > hjl dot tools at gmail dot com wrote: > > On Linux/ia64, I got > > /net/gnu-13/export/gnu/src/gcc/gcc/gcc/emit-rtl.c: In function `init_emit': > > /net/gnu-13/export/gnu/src/gcc/gcc/gcc/emit-rtl.c:5035: error: structure > > has no > > member named `emit' > > That isn't the only one broken. Just grepping for cfun->emit, I see > that m32c and sparc are broken also. There may also be others that are > broken, I haven't fully studied the patch yet. > > It looks like > cfun->emit->regno_pointer_align > is now > rtl.emit.regno_pointer_align > And since rtl is apparently now static allocated, the old code > if (cfun && cfun->emit->regno_pointer_align)\ > becomes > if (rtl.emit.regno_pointer_align) \ Yes, you are right, there are three backends using emit directly (sparc, m32c and ia64). I've posted the patch for it to http://gcc.gnu.org/ml/gcc-patches/2008-03/msg02043.html If someone could confirm that it solves the ia-64 problem, I will commit it. I am currently out of reach of ia-64 boxes. Honza -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=35781
[Bug tree-optimization/35795] [4.4 Regression] Revision 133787 breaks ia64
--- Comment #3 from hubicka at ucw dot cz 2008-04-02 09:35 --- Subject: Re: New: [4.4 Regression] Revision 133787 breaks ia64 Hi, I've added the assert to check that we don't initialize RTL world twice without freeing it first (and thus that we don't leak memory). This seems to be the case. Naively, something like this should fix it. I am building a cross and will try to reproduce it. Index: config/ia64/ia64.c === *** config/ia64/ia64.c (revision 133785) --- config/ia64/ia64.c (working copy) *** ia64_output_mi_thunk (FILE *file, tree t *** 9694,9699 --- 9694,9700 final_start_function (insn, file, 1); final (insn, file, 1); final_end_function (); + free_after_compilation (cfun); reload_completed = 0; epilogue_completed = 0; -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=35795
[Bug target/35795] [4.4 Regression] Revision 133787 breaks ia64
--- Comment #7 from hubicka at ucw dot cz 2008-04-03 06:12 --- Subject: Re: [4.4 Regression] Revision 133787 breaks ia64 > The patch is OK. > > But won't all targets that have similar code need the same fix? If I cd > to the config dir, and try "grep final_end_function */*" it looks like > alpha, ia64, m68k, mips, rs6000, score (both score3 and score7), sh, and > sparc all have the same problem. The rs6000 port has already been fixed > via PR 35801. We have an ia64 patch here. We also need patches for the > rest. Thanks, I've just noticed that too and sent a patch for all the backends. Looks like we have been leaking memory here for ages. It is probably not that big a deal since thunks are pretty small functions, but still, keeping all the RTL register tables around seems a bit expensive. Honza -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=35795
[Bug target/35795] [4.4 Regression] Revision 133787 breaks ia64
--- Comment #10 from hubicka at ucw dot cz 2008-04-03 15:44 --- Subject: Re: [4.4 Regression] Revision 133787 breaks ia64 > I am pretty sure I saw this one targeting sparc-rtems. Building an updated > tree now to confirm it is the fix. > > ../../../../gcc/libstdc++-v3/src/iostream-inst.cc: In member function 'void > std::basic_iostream >::_ZThn8_NSdD1Ev()': > ../../../../gcc/libstdc++-v3/src/iostream-inst.cc:51: internal compiler error: > in prepare_function_start, at function.c:3940 sparc, alpha, m68k, score and mips contained same problem, so hopefully it is fixed now. Honza -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=35795
[Bug tree-optimization/37709] [4.4 Regression] inlining causes explosion in debug info
--- Comment #10 from hubicka at ucw dot cz 2009-02-12 10:28 --- Subject: Re: [4.4 Regression] inlining causes explosion in debug info > sizeof (tree_block) is 52 bytes on 32-bit hosts. Of these, 8 are unused (ann > and type), 8 are frequently unused (block_fragment stuff -- always write-only > at debug level 0). Moving fragments into an annotation and reusing type for > something would already save 20% of the memory. I was looking into the bug last month but then got distracted by other more urgent things. I guess it is time to get back ;) Even if our block representation is not the most efficient around, I think the main problem is that we keep too many blocks around that never get into debug info or never serve a useful purpose. The testcase compiled at -g3 needs about 100MB of DWARF section, and before inlining we unfortunately need to keep debug info at -g3 verbosity. I plan to look into which blocks get ignored by dwarf2out, but it would also be great to figure out if we really need them in debug info in the first place. (For example, for every inlined function we create a container block and then a block containing the arguments. The current tree-ssa-live code to prune out blocks preserves both.) Honza -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37709
[Bug tree-optimization/39203] LTO and -fwhole-program do not play along well
--- Comment #2 from hubicka at ucw dot cz 2009-02-17 17:43 --- Subject: Re: LTO and -fwhole-program do not play along well Hi, functions are brought local in function_and_variable_visibility, which needs to be scheduled after LTO is read in. The pass computes externally_visible flags that should be up to date for early IPA passes before LTO is written out, so I guess we need an early function_and_variable_visibility pass and a late one, where the first one does not bring functions local at -fwhole-program -flto Honza -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39203
[Bug tree-optimization/39203] LTO and -fwhole-program do not play along well
--- Comment #4 from hubicka at ucw dot cz 2009-02-17 18:01 --- Subject: Re: LTO and -fwhole-program do not play along well > > Hi, > > functions are brought local in function_and_variable_visibility that > > needs to be scheduled after LTO is read in. > > The pass computes externally_visible flags that should be up-to-date for > > early IPA passes before LTO is written out, so I guess we need early > > function_and_variable_visibility pass and late one where the first one > > is not bringing functions local at -fwhole-program -flto > > OK, but I think there is a bigger issue here. Even if -flto is *not* > used, we get link errors. Just by compiling each file with > -fwhole-program is enough to reproduce the failure: > > $ gcc -fwhole-program -c f1.c > $ gcc -fwhole-program -c f2.c > $ gcc -fwhole-program -o f f1.o f2.o > > This is just a natural side-effect of using -fwhole-program. It was > not intended to be used like this. This is intended behaviour. -fwhole-program essentially hides everything except for main and functions/variables explicitly marked via attribute, so this is expected. You need to use --combine and/or -flto to compile programs consisting of multiple compilation units with -fwhole-program. Honza -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39203
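As an illustration of "explicitly marked via attribute" in the comment above: GCC documents an externally_visible function attribute for this purpose. The message itself does not name the attribute, so treating it as the one meant here is an assumption, and the function names below are hypothetical.

```c
/* Marked explicitly, so -fwhole-program should keep this symbol visible
   to other object files (GCC-specific attribute; other compilers may
   ignore it with a warning). */
__attribute__((externally_visible))
int api_entry(int x)
{
  return x + 1;
}

/* Not marked: under -fwhole-program the compiler may treat this as
   local to the compilation unit, so another object file that refers
   to it can fail to link. */
int helper(int x)
{
  return api_entry(x) * 2;
}
```

Compiled without -fwhole-program both functions behave like ordinary external functions; the attribute only matters once the whole-program assumption is in effect.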
[Bug tree-optimization/39203] LTO and -fwhole-program do not play along well
--- Comment #8 from hubicka at ucw dot cz 2009-02-17 18:34 --- Subject: Re: LTO and -fwhole-program do not play along well > > -fwhole-program essentially hides everything except for main and > > functions/variables explicitly marked via attribute, so this is > > expected. You need to use --combine and/or -flto to compile programs > > consisting of multiple compilation units with -fwhole-program. > > Yes. As I just replied upthread, I think we could just turn off > flag_whole_program when we know we are just generating IL out of the > front ends. Essentially yes, but since we are restarting the pass queue from a later point, won't we miss the visibility pass at link time? This is why I think we need two passes (one very early and one scheduled after LTO read). Honza -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39203
[Bug tree-optimization/39203] LTO and -fwhole-program do not play along well
--- Comment #12 from hubicka at ucw dot cz 2009-02-18 01:42 --- Subject: Re: LTO and -fwhole-program do not play along well > I believe we should be set already. During LTO read, we execute > ipa_passes, which runs all_small_ipa_passes. So, > pass_ipa_function_and_variable_visibility is run both while writing > the IL and right after we read it in. > > Is that what you were referring to? Well, you need to localize stuff at WHOPR stage, so small IPA passes are unlikely going to be executable there except for those not needing function bodies (like the visibility pass, but unlike inlining and other stuff included). Are those still skipped via the lto flag gate? Honza -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39203
[Bug c++/39242] [4.4 Regression] Inconsistent reject / accept of code
--- Comment #7 from hubicka at ucw dot cz 2009-02-19 14:59 --- Subject: Re: [4.4 Regression] Inconsistent reject / accept of code > Index: cp/pt.c > === > --- cp/pt.c (revision 144292) > +++ cp/pt.c (working copy) > @@ -15285,7 +15285,7 @@ instantiate_decl (tree d, int defer_ok, >/* ... but we instantiate inline functions so that we can inline > them and ... */ >&& ! (TREE_CODE (d) == FUNCTION_DECL > - && possibly_inlined_p (d)) > + && DECL_DECLARED_INLINE_P (d)) This way we will lose inlining opportunities at -O3, since we will never instantiate methods not declared inline. Is the standard really making a difference between inline and !inline here? Or are we just supposed to suppress the errors and instantiate only stuff that is complete, or are we supposed to give an error at -O0 too, or is it up to our choice? Honza -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39242
[Bug c++/39242] [4.4 Regression] Inconsistent reject / accept of code
--- Comment #9 from hubicka at ucw dot cz 2009-02-19 15:40 --- Subject: Re: [4.4 Regression] Inconsistent reject / accept of code > /* Check to see whether we know that this template will be > instantiated in some other file, as with "extern template" > extension. */ > external_p = (DECL_INTERFACE_KNOWN (d) && DECL_REALLY_EXTERN (d)); > /* In general, we do not instantiate such templates... */ > if (external_p > /* ... but we instantiate inline functions so that we can inline > them and ... */ > && ! (TREE_CODE (d) == FUNCTION_DECL > && DECL_DECLARED_INLINE_P (d)) Hmm, in general it is beneficial to instantiate stuff for the sake of IPA optimizers. It would be interesting to know how this affects code quality... Honza -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39242
[Bug c++/39242] [4.4 Regression] Inconsistent reject / accept of code
--- Comment #13 from hubicka at ucw dot cz 2009-02-20 14:39 --- Subject: Re: [4.4 Regression] Inconsistent reject / accept of code > > What that means is that we *must not* implicitly instantiate things > > declared "extern template" unless they are DECL_DECLARED_INLINE_P. As a > > consequence, at -O3, we cannot implicitly instantiate non-inline "extern > > template" functions. > > I'm not entirely sure that's what we want it to say, but it does seem like a > reasonable expectation for users to have. > > Beyond this issue, what is the compile speed impact of the earlier change to > use possibly_inlined_p? It seems like it might be making us speculatively > instantiate a lot more functions for potential inlining even at -O1, which I > would expect to cause memory bloat and slower compilation. There were no slowdowns visible on our C++ testers, so I hope it is not too bad. We now get rid of unreachable functions quite early in the queue, so they are not consuming too much compilation time. Honza -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39242
[Bug c++/14179] [4.2/4.3/4.4 Regression] out of memory while parsing array with many initializers
--- Comment #51 from hubicka at ucw dot cz 2009-02-22 14:47 --- Subject: Re: [4.2/4.3/4.4 Regression] out of memory while parsing array with many initializers > Honza, you realize that the numbers are completely unreadable in bugzilla, > right? They need some care to read; the columns are still intact, just interleaved... I wonder why bugzilla insists on the linebreaks? Honza -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=14179
[Bug c++/14179] [4.2/4.3/4.4 Regression] out of memory while parsing array with many initializers
--- Comment #54 from hubicka at ucw dot cz 2009-02-23 16:51 --- Subject: Re: [4.2/4.3/4.4 Regression] out of memory while parsing array with many initializers > > Perhaps explicitly freeing would be good idea? > > I certainly have no objection to explicitly freeing storage if we know > we don't need it anymore. Problem is that I don't know enough of C++ parser to be sure where we can safely free this vector? Honza -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=14179
[Bug debug/39355] [4.4 Regression] ICE at dwarf2out.c:10353 in loc_descriptor_from_tree_1
--- Comment #9 from hubicka at ucw dot cz 2009-03-05 11:18 --- Subject: Re: [4.4 Regression] ICE at dwarf2out.c:10353 in loc_descriptor_from_tree_1 > --- Comment #8 from jakub at gcc dot gnu dot org 2009-03-05 08:07 --- > Can you reproduce it with stage1 cc1plus (or non-bootstrapped cc1plus) built > by > some older gcc, or only with stage2/stage3 cc1plus? Wonder if cc1plus hasn't > been miscompiled. In any case, as this can't be reproduced with a > cross-compiler, somebody with hppa-linux access needs to debug it. I've just run across an interesting memory corruption in Ada: by duplicating the DECL node for a nested function we ended up with a pointer to a ggc_free'd DECL_STRUCT_FUNCTION. I have a patch to always make functions nonlocal that might solve this problem as well (as we get methods for C++ local classes). I am just on the way to Prague, but I will break it out of my tree then. Honza -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39355
[Bug middle-end/39360] [4.4 Regression] ICE in referenced_var_lookup, at tree-dfa.c:563
--- Comment #9 from hubicka at ucw dot cz 2009-03-06 17:30 --- Subject: Re: [4.4 Regression] ICE in referenced_var_lookup, at tree-dfa.c:563 > part. So, either tree-inline.c needs to do the same, or it can't use > referenced_vars bit as a test whether it has been queued already onto > local_decls or not. Honza? Ah, I see. Actually, the original version of the patch added a function to tree-dfa to check if a variable is referenced and used this predicate, followed by add_referenced_var if it was not, but then I noticed that referenced_var_check_and_insert is available, so I decided to use it. OK, I will prepare this version of the patch tomorrow. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39360
[Bug middle-end/39659] [4.5 Regression] ICE building libstdc++v3 functexcept.cc
--- Comment #3 from hubicka at ucw dot cz 2009-04-06 09:28 --- Subject: Re: [4.5 Regression] ICE building libstdc++v3 functexcept.cc Hi, this does not reproduce on my setup, but the following patch should fix it. Honza Index: except.c === --- except.c(revision 145584) +++ except.c(working copy) @@ -853,6 +853,7 @@ remove_unreachable_regions (sbitmap reac r->region_number, first_must_not_throw->region_number); remove_eh_handler_and_replace (r, first_must_not_throw); + first_must_not_throw->may_contain_throw |= r->may_contain_throw; } else bring_to_root (r); -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39659
[Bug tree-optimization/39672] Local statics not promoted to const
--- Comment #1 from hubicka at ucw dot cz 2009-04-07 11:44 --- Subject: Re: New: Local statics not promoted to const ipa-reference definitely is supposed to do this transform. I will debug why it does not in this testcase. Honza -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39672