
[Bug tree-optimization/27882] [4.2 regression] segfault in ipa-inline.c, if (e->callee->local.disregard_inline_limits

2006-06-07 Thread hubicka at ucw dot cz



--- Comment #14 from hubicka at ucw dot cz  2006-06-07 12:18 ---
Subject: Re:  [4.2 regression] segfault in ipa-inline.c, if
(e->callee->local.disregard_inline_limits

> 
> 
> --- Comment #12 from pinskia at gcc dot gnu dot org  2006-06-07 06:00 
> ---
> Wait in tree-inline.c, we do:
>   /* Update callgraph if needed.  */
>   cgraph_remove_node (cg_edge->callee);
> 
> Isn't that wrong as we could inline the callee a couple of times?

It should be OK - if we inline multiple times, we create multiple nodes.
I will look into this.

Honza


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27882

[Bug middle-end/38074] [4.4 Regression] missed inlining on Core2 Duo due to apparent wrong branch prediction/profile

2008-12-05 Thread hubicka at ucw dot cz



--- Comment #9 from hubicka at ucw dot cz  2008-12-05 12:59 ---
Subject: Re:  [4.4 Regression] missed inlining on Core2 Duo due  to apparent
wrong branch prediction/profile

> Honza, can you have a look here?  I suspect the fortran decl issue prevent
Will do.  However, we don't distribute the profile over the cgraph...
This looks more like one of the misguess cases.

Honza


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38074

[Bug middle-end/38074] [4.4 Regression] missed inlining on Core2 Duo due to apparent wrong branch prediction/profile

2008-12-05 Thread hubicka at ucw dot cz



--- Comment #11 from hubicka at ucw dot cz  2008-12-05 17:15 ---
Subject: Re:  [4.4 Regression] missed inlining on Core2 Duo due  to apparent
wrong branch prediction/profile

OK,
so the problem is that all the paths lead to noreturn, so the
conditional deciding which noreturn path will be taken is predicted by
the same heuristic in both directions.
Our first match heuristic blindly picks the first one in the row,
predicting the "invalid command line parameters" path to be very likely
and the main body to be very unlikely.

This patch fixes it in both ways: when all paths lead to noreturn, we
disable the heuristic, and when a heuristic is taken into consideration
for first match, we first check that it was not confused and did not
predict the edge in both ways.
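
The second half of the fix can be sketched outside of GCC roughly as
follows.  This is only an illustration under stated assumptions, not the
code in predict.c: struct prediction, predicts_both_directions and
first_match_probability are hypothetical names, PROB_BASE stands in for
REG_BR_PROB_BASE, and the refinement where a later, stronger prediction
from the same predictor wins is left out.

```c
#include <assert.h>
#include <stddef.h>

#define PROB_BASE 10000   /* assumed stand-in for REG_BR_PROB_BASE */

/* Hypothetical, simplified prediction record: PREDICTOR is a priority
   (lower wins) and PROBABILITY is the chance that the first outgoing
   edge is taken, already normalized to that edge.  */
struct prediction
{
  int predictor;
  int probability;
};

/* Return nonzero if some other entry from the same predictor as
   PREDS[I] predicts the opposite direction.  */
static int
predicts_both_directions (const struct prediction *preds, size_t n, size_t i)
{
  size_t j;

  for (j = 0; j < n; j++)
    if (j != i
        && preds[j].predictor == preds[i].predictor
        && ((preds[j].probability < PROB_BASE / 2)
            != (preds[i].probability < PROB_BASE / 2)))
      return 1;
  return 0;
}

/* First match: take the highest-priority predictor that does not
   contradict itself.  Returns -1 when no usable prediction exists.  */
static int
first_match_probability (const struct prediction *preds, size_t n)
{
  int best_predictor = -1;
  int best_probability = -1;
  size_t i;

  assert (preds != NULL || n == 0);

  for (i = 0; i < n; i++)
    {
      /* Skip entries that cannot beat the current best.  */
      if (best_predictor != -1 && preds[i].predictor >= best_predictor)
        continue;
      /* Skip a predictor that was confused about the direction.  */
      if (predicts_both_directions (preds, n, i))
        continue;
      best_predictor = preds[i].predictor;
      best_probability = preds[i].probability;
    }
  return best_probability;
}
```

With entries {1, 9900}, {1, 100} and {5, 7000}, predictor 1 disagrees
with itself, so the sketch falls through to predictor 5.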

I also fixed the nonsense in compute_call_stmt_bb_frequency noticed by
Jakub. To keep frequencies at least a little bit sane, I simply add 1 to
both values, so calls with a frequency higher than 0 are still predicted
as more frequent.
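
In isolation the adjusted scaling looks roughly like this.  It is a
minimal sketch rather than the GCC source: scale_call_frequency is a
hypothetical name, and the values of FREQ_BASE and FREQ_MAX are assumed
stand-ins for CGRAPH_FREQ_BASE and CGRAPH_FREQ_MAX.

```c
#include <assert.h>

/* Assumed stand-ins for CGRAPH_FREQ_BASE and CGRAPH_FREQ_MAX; the real
   values live in cgraph.h and may differ.  */
#define FREQ_BASE 1000
#define FREQ_MAX  100000

/* Hypothetical helper mirroring the patched logic: scale a block
   frequency relative to the entry block frequency.  */
static int
scale_call_frequency (int bb_freq, int entry_freq)
{
  int freq = bb_freq;

  assert (bb_freq >= 0 && entry_freq >= 0);

  /* With a zero entry frequency, add 1 to both values so that the
     division is defined and hotter blocks still rank higher.  */
  if (!entry_freq)
    {
      entry_freq = 1;
      freq++;
    }

  freq = freq * FREQ_BASE / entry_freq;
  if (freq > FREQ_MAX)
    freq = FREQ_MAX;
  return freq;
}
```

Under these assumptions a block with frequency 3 in a function whose
entry frequency is 0 still scales higher than a block with frequency 0.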

Honza

Jan Hubicka  <[EMAIL PROTECTED]>
Jakub Jelinek <[EMAIL PROTECTED]>
* cgraphbuild.c (compute_call_stmt_bb_frequency): Fix handling of 0
entry frequency.
* predict.c (combine_predictions_for_bb): Ignore predictor predicting
in both directions for first match heuristics.
(tree_bb_level_predictions): Disable noreturn heuristic when there
is no returning path.
Index: cgraphbuild.c
===
*** cgraphbuild.c   (revision 141929)
--- cgraphbuild.c   (working copy)
*** int
*** 109,121 
  compute_call_stmt_bb_frequency (basic_block bb)
  {
int entry_freq = ENTRY_BLOCK_PTR->frequency;
!   int freq;

if (!entry_freq)
! entry_freq = 1;

!   freq = (!bb->frequency && !entry_freq ? CGRAPH_FREQ_BASE
! : bb->frequency * CGRAPH_FREQ_BASE / entry_freq);
if (freq > CGRAPH_FREQ_MAX)
  freq = CGRAPH_FREQ_MAX;

--- 109,120 
  compute_call_stmt_bb_frequency (basic_block bb)
  {
int entry_freq = ENTRY_BLOCK_PTR->frequency;
!   int freq = bb->frequency;

if (!entry_freq)
! entry_freq = 1, freq++;

!   freq = freq * CGRAPH_FREQ_BASE / entry_freq;
if (freq > CGRAPH_FREQ_MAX)
  freq = CGRAPH_FREQ_MAX;

Index: predict.c
===
*** predict.c   (revision 141929)
--- predict.c   (working copy)
*** combine_predictions_for_bb (basic_block 
*** 820,827 
probability = REG_BR_PROB_BASE - probability;

  found = true;
  if (best_predictor > predictor)
!   best_probability = probability, best_predictor = predictor;

  d = (combined_probability * probability
   + (REG_BR_PROB_BASE - combined_probability)
--- 820,852 
probability = REG_BR_PROB_BASE - probability;

  found = true;
+ /* First match heuristics would be widly confused if we predicted
+both directions.  */
  if (best_predictor > predictor)
!   {
!   struct edge_prediction *pred2;
! int prob = probability;
! 
!   for (pred2 = (struct edge_prediction *) *preds; pred2; pred2 =
pred2->ep_next)
!  if (pred2 != pred && pred2->ep_predictor == pred->ep_predictor)
!{
!  int probability2 = pred->ep_probability;
! 
!  if (pred2->ep_edge != first)
!probability2 = REG_BR_PROB_BASE - probability2;
! 
!  if ((probability < REG_BR_PROB_BASE / 2) != 
!  (probability2 < REG_BR_PROB_BASE / 2))
!break;
! 
!  /* If the same predictor later gave better result, go for
it! */
!  if ((probability >= REG_BR_PROB_BASE / 2 && (probability2 >
probability))
!  || (probability <= REG_BR_PROB_BASE / 2 && (probability2
< probability)))
!prob = probability2;
!}
! if (!pred2)
!   best_probability = prob, best_predictor = predictor;
!   }

  d = (combined_probability * probability
   + (REG_BR_PROB_BASE - combined_probability)
*** static void
*** 1521,1526 
--- 1546,1561 
  tree_bb_level_predictions (void)
  {
basic_block bb;
+   bool has_return_edges = false;
+   edge e;
+   edge_iterator ei;
+ 
+   FOR_EACH_EDGE (e, ei, EXIT_BLOCK_PTR->preds)
+ if (!(e->flags & (EDGE_ABNORMAL | EDGE_FAKE | EDGE_EH)))
+   {
+ has_return_edges = true;
+   break;
+   }

apply_return_prediction ();

*** tree_bb_level_predictions (void)
*** 1535,1541 

  if (is_gimple_call (stmt))
{
! if (gimple_call_flags (stmt) &

[Bug target/38824] [4.4 regression] performance regression of sse code from 4.2/4.3

2009-01-14 Thread hubicka at ucw dot cz



--- Comment #7 from hubicka at ucw dot cz  2009-01-15 01:49 ---
Subject: Re:  [4.4 regression] performance regression of sse code from 4.2/4.3

I guess the main difference here is that a load + addps pair generates 2
uops, while a mov + loading addps generates 3, since the move has to go
through the queue.  I will try to change the testcase to fit in cache to
see if an AMD machine reproduces it too.

Honza


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38824

[Bug c++/39862] [4.5 Regression] verify_eh_tree failed with -O2

2009-05-08 Thread hubicka at ucw dot cz



--- Comment #5 from hubicka at ucw dot cz  2009-05-08 19:44 ---
Subject: Re:  [4.5 Regression] verify_eh_tree failed with -O2

This is a very sick corner case of updating the prev_try pointer in
duplicate_eh_edges.  I think it is clear that maintaining the prev_try
pointer just to slightly speed up the lookup in
foreach_reachable_handler is highly impractical.  Once the other bugfix
to the EH code is in the tree, I will test a patch removing prev_try and
replacing its uses with find_prev_try.

Honza


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39862

[Bug bootstrap/40082] [4.5 Regression] Power bootstrap is broken in building libstdc++

2009-05-09 Thread hubicka at ucw dot cz



--- Comment #3 from hubicka at ucw dot cz  2009-05-09 14:44 ---
Subject: Re:  [4.5 Regression] Power bootstrap is broken in building libstdc++

Hi,
I am testing the attached patch.  It makes the testcase compile; does it
solve the bootstrap issues too?

Index: ipa.c
===
--- ipa.c   (revision 147317)
+++ ipa.c   (working copy)
@@ -92,6 +92,21 @@ cgraph_postorder (struct cgraph_node **o
   return order_pos;
 }

+/* Look for all functions inlined to NODE and update their inlined_to pointers
+   to INLINED_TO.  */
+
+static void
+update_inlined_to_pointer (struct cgraph_node *node, struct cgraph_node
*inlined_to)
+{
+  struct cgraph_edge *e;
+  for (e = node->callees; e; e = e->next_callee)
+if (e->callee->global.inlined_to)
+  {
+e->callee->global.inlined_to = inlined_to;
+   update_inlined_to_pointer (e->callee, inlined_to);
+  }
+}
+
 /* Perform reachability analysis and reclaim all unreachable nodes.
If BEFORE_INLINING_P is true this function is called before inlining
decisions has been made.  If BEFORE_INLINING_P is false this function also 
@@ -214,7 +229,8 @@ cgraph_remove_unreachable_nodes (bool be
  && !node->callers)
{
  gcc_assert (node->clones);
- node->global.inlined_to = false;
+ node->global.inlined_to = NULL;
+ update_inlined_to_pointer (node, node);
}
   node->aux = NULL;
 }


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40082

[Bug middle-end/39886] [4.5 Regression] ICE in purge_dead_edges, at cfgrtl.c:2274

2009-05-09 Thread hubicka at ucw dot cz



--- Comment #2 from hubicka at ucw dot cz  2009-05-09 18:29 ---
Subject: Re:  [4.5 Regression] ICE in purge_dead_edges, at cfgrtl.c:2274

The problem here is that combine constructs (set (pc) (pc)) as a no-op
move, expecting it to be removed immediately.  It is however
misinterpreted as a jump that is going to be removed at the end of the
BB, and update_cfg_for_uncondjump adds the FALLTHRU flag on the edge,
which then fails verification.

Interestingly enough, modifying update_cfg_for_uncondjump to ignore insns
that are not last in the BB seems to cause miscompilations in the
testsuite.  I am looking into the cgraph failures now, but this should be
easy to fix.

Honza


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39886

[Bug middle-end/40080] [4.5 Regression] error: missing callgraph edge for call stmt

2009-05-09 Thread hubicka at ucw dot cz



--- Comment #3 from hubicka at ucw dot cz  2009-05-09 18:52 ---
Subject: Re:  [4.5 Regression] error: missing callgraph edge for call stmt

Hi,
I am testing the following:

Index: cgraphunit.c
===
--- cgraphunit.c(revision 147319)
+++ cgraphunit.c(working copy)
@@ -1762,7 +1762,12 @@ cgraph_materialize_all_clones (void)
for (e = node->callees; e; e = e->next_callee)
  {
tree decl = gimple_call_fndecl (e->call_stmt);
-   if (decl != e->callee->decl)
+   /* When function gets inlined, indirect inlining might've invented
+  new edge for orginally indirect stmt.  Since we are not
+  preserving clones in the original form, we must not update here
+  since other inline clones don't need to contain call to the same
+  call.  Inliner will do the substitution for us later.  */
+   if (decl && decl != e->callee->decl)
  {
gimple new_stmt;
gimple_stmt_iterator gsi;
@@ -1808,6 +1813,9 @@ cgraph_materialize_all_clones (void)
 verify_cgraph_node (node);
 #endif
   }
+#ifdef ENABLE_CHECKING
+  verify_cgraph ();
+#endif
   cgraph_remove_unreachable_nodes (false, cgraph_dump_file);
 }



-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40080

[Bug middle-end/40084] [4.5 Regression] Revision 147294 failed 483.xalancbmk in SPEC CPU 2006 at -O3

2009-05-09 Thread hubicka at ucw dot cz



--- Comment #4 from hubicka at ucw dot cz  2009-05-09 21:06 ---
Subject: Re:  [4.5 Regression]  Revision 147294 failed 483.xalancbmk in SPEC
CPU 2006 at -O3

Hi,
I am testing the following patch that solves the ICE.
The problem here was that we created an ipa-cp clone and the cloning
process allowed devirtualization, which however creates a new direct
call, and the callgraph is not properly updated.

We should handle devirtualization when inlining or const propagating;
this seems to be a common case. We seem to miss a number of inlining
opportunities here.

Honza

Index: tree-inline.c
===
--- tree-inline.c   (revision 147320)
+++ tree-inline.c   (working copy)
@@ -1522,7 +1522,8 @@ copy_bb (copy_body_data *id, basic_block
gcc_assert (dest->needed || !dest->analyzed);
if (id->transform_call_graph_edges == CB_CGE_MOVE_CLONES)
  cgraph_create_edge_including_clones (id->dst_node, dest,
stmt,
-  bb->count,
CGRAPH_FREQ_BASE,
+  bb->count,
+ 
compute_call_stmt_bb_frequency (id->dst_node->decl, bb),
   bb->loop_depth,
  
CIF_ORIGINALLY_INDIRECT_CALL);
else
@@ -3535,8 +3536,9 @@ fold_marked_statements (int first, struc
 if (BASIC_BLOCK (first))
   {
 gimple_stmt_iterator gsi;
+   basic_block bb = BASIC_BLOCK (first);

-   for (gsi = gsi_start_bb (BASIC_BLOCK (first));
+   for (gsi = gsi_start_bb (bb);
 !gsi_end_p (gsi);
 gsi_next (&gsi))
  if (pointer_set_contains (statements, gsi_stmt (gsi)))
@@ -3545,14 +3547,26 @@ fold_marked_statements (int first, struc

  if (fold_stmt (&gsi))
{
+ tree decl;
+
  /* Re-read the statement from GSI as fold_stmt() may
 have changed it.  */
  gimple new_stmt = gsi_stmt (gsi);
  update_stmt (new_stmt);

- if (is_gimple_call (old_stmt))
-   cgraph_update_edges_for_call_stmt (old_stmt, new_stmt);
+ if (is_gimple_call (new_stmt)
+ && (decl = gimple_call_fndecl (new_stmt)))
+   {
+ struct cgraph_node *node = cgraph_node
(current_function_decl);

+ if (cgraph_edge (node, old_stmt))
+   cgraph_update_edges_for_call_stmt (old_stmt, new_stmt);
+ else
+   cgraph_create_edge_including_clones
+  (node, cgraph_node (decl), new_stmt, bb->count,
+   compute_call_stmt_bb_frequency
(current_function_decl, bb),
+   bb->loop_depth, CIF_ORIGINALLY_INDIRECT_CALL);
+}
  if (maybe_clean_or_replace_eh_stmt (old_stmt, new_stmt))
gimple_purge_dead_eh_edges (BASIC_BLOCK (first));
}


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40084

[Bug tree-optimization/40436] [4.5 regression] 0.5% code size regression caused by r147852

2009-06-30 Thread hubicka at ucw dot cz



--- Comment #7 from hubicka at ucw dot cz  2009-06-30 12:46 ---
Subject: Re:  [4.5 regression] 0.5% code size regression caused by r147852

> The problem is that early inliner allows to increase code size estimate by
> inlining single function by up to 12 instructions.  This is higher than on
> pretty-ipa branch still, since we are not that good on early optimizing yet 
> and
> some C++ benchmarks (tramp/botan/boost) degrade when reduced to 7 as used by
> tramp3d. In tramp3d it is mostly caused by dead loops in constructors, and I
  ^^ pretty-ipa :)
> hope that merging IPA-SRA and CD-DCE improvements will care this on all three
> benchmarks. At -O2 early inliner needs to be somewhat speculative since it
> don't know the function profiles yet. It however seems stupid to allow code
> size growth at -Os in general.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40436

[Bug debug/40573] [4.4/4.5 Regression] DWARF for inlined subroutines refers to the outlined copy

2009-06-30 Thread hubicka at ucw dot cz



--- Comment #3 from hubicka at ucw dot cz  2009-06-30 14:07 ---
Subject: Re:  [4.4/4.5 Regression] DWARF for inlined subroutines refers to the
outlined copy

Hmm, I thought GCC had been doing the same thing for years.  So we need
an abstract function for each inline?

Honza


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40573

[Bug tree-optimization/40436] [4.5 regression] 0.5% code size regression caused by r147852

2009-06-30 Thread hubicka at ucw dot cz



--- Comment #11 from hubicka at ucw dot cz  2009-06-30 23:36 ---
Subject: Re:  [4.5 regression] 0.5% code size regression caused by r147852

> I see no effect whatsoever of the patch for for CSiBE on arm-elf-unknown.
At -Os?

Honza


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40436

[Bug tree-optimization/40585] [4.3/4.4/4.5 Regression] tracer duplicates blocks w/o adjusting EH tree

2009-07-01 Thread hubicka at ucw dot cz



--- Comment #4 from hubicka at ucw dot cz  2009-07-01 10:47 ---
Subject: Re:  [4.3/4.4/4.5 Regression] tracer duplicates blocks w/o adjusting
EH tree

Hi,
the following patch should prevent tracer from copying resx.  It is
remarkably ugly that I need to unconstify all the copy_bb_p predicates,
so perhaps I will look into fixing the RTL expanders to allow multiple
RESX blocks instead.

Index: cfghooks.c
===
--- cfghooks.c  (revision 149102)
+++ cfghooks.c  (working copy)
@@ -874,7 +874,7 @@ tidy_fallthru_edges (void)
 /* Returns true if we can duplicate basic block BB.  */

 bool
-can_duplicate_block_p (const_basic_block bb)
+can_duplicate_block_p (basic_block bb)
 {
   if (!cfg_hooks->can_duplicate_block_p)
 internal_error ("%s does not support can_duplicate_block_p",
Index: cfghooks.h
===
--- cfghooks.h  (revision 149102)
+++ cfghooks.h  (working copy)
@@ -75,7 +75,7 @@ struct cfg_hooks
   bool (*predicted_by_p) (const_basic_block bb, enum br_predictor predictor);

   /* Return true when block A can be duplicated.  */
-  bool (*can_duplicate_block_p) (const_basic_block a);
+  bool (*can_duplicate_block_p) (basic_block a);

   /* Duplicate block A.  */
   basic_block (*duplicate_block) (basic_block a);
@@ -160,7 +160,7 @@ extern void tidy_fallthru_edge (edge);
 extern void tidy_fallthru_edges (void);
 extern void predict_edge (edge e, enum br_predictor predictor, int
probability);
 extern bool predicted_by_p (const_basic_block bb, enum br_predictor
predictor);
-extern bool can_duplicate_block_p (const_basic_block);
+extern bool can_duplicate_block_p (basic_block);
 extern basic_block duplicate_block (basic_block, edge, basic_block);
 extern bool block_ends_with_call_p (basic_block bb);
 extern bool block_ends_with_condjump_p (const_basic_block bb);
Index: bb-reorder.c
===
--- bb-reorder.c(revision 149102)
+++ bb-reorder.c(working copy)
@@ -176,7 +176,7 @@ static basic_block copy_bb (basic_block,
 static fibheapkey_t bb_to_key (basic_block);
 static bool better_edge_p (const_basic_block, const_edge, int, int, int, int,
const_edge);
 static void connect_traces (int, struct trace *);
-static bool copy_bb_p (const_basic_block, int);
+static bool copy_bb_p (basic_block, int);
 static int get_uncond_jump_length (void);
 static bool push_to_next_round_p (const_basic_block, int, int, int,
gcov_type);
 static void find_rarely_executed_basic_blocks_and_crossing_edges (edge **,
@@ -1157,7 +1157,7 @@ connect_traces (int n_traces, struct tra
when code size is allowed to grow by duplication.  */

 static bool
-copy_bb_p (const_basic_block bb, int code_may_grow)
+copy_bb_p (basic_block bb, int code_may_grow)
 {
   int size = 0;
   int max_size = uncond_jump_length;
Index: tree-cfg.c
===
--- tree-cfg.c  (revision 149102)
+++ tree-cfg.c  (working copy)
@@ -5131,8 +5131,17 @@ gimple_move_block_after (basic_block bb,
 /* Return true if basic_block can be duplicated.  */

 static bool
-gimple_can_duplicate_bb_p (const_basic_block bb ATTRIBUTE_UNUSED)
+gimple_can_duplicate_bb_p (basic_block bb)
 {
+  gimple_stmt_iterator gsi = gsi_last_bb (bb);
+
+  /* RTL expander has quite artificial limitation to at most one RESX
instruction
+ per region.  It can be fixed by turning 1-1 map to 1-many map, but since
the
+ code needs to be rewritten to gimple level lowering and there is little
reason
+ for duplicating RESX instructions in order to optimize code performance,
we
+ just disallow it for the moment.  */
+  if (!gsi_end_p (gsi) && gimple_code (gsi_stmt (gsi)) == GIMPLE_RESX)
+return false;
   return true;
 }

Index: cfgrtl.c
===
--- cfgrtl.c(revision 149102)
+++ cfgrtl.c(working copy)
@@ -3161,7 +3161,7 @@ struct cfg_hooks rtl_cfg_hooks = {
should only be used through the cfghooks interface, and we do not want to
move them here since it would require also moving quite a lot of related
code.  They are in cfglayout.c.  */
-extern bool cfg_layout_can_duplicate_bb_p (const_basic_block);
+extern bool cfg_layout_can_duplicate_bb_p (basic_block);
 extern basic_block cfg_layout_duplicate_bb (basic_block);

 struct cfg_hooks cfg_layout_rtl_cfg_hooks = {


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40585

[Bug middle-end/39886] [4.5 Regression] ICE in purge_dead_edges, at cfgrtl.c:2274

2009-07-02 Thread hubicka at ucw dot cz



--- Comment #4 from hubicka at ucw dot cz  2009-07-02 10:05 ---
Subject: Re:  [4.5 Regression] ICE in purge_dead_edges, at cfgrtl.c:2274

> Would you mind seeing if your patch was the same?
I wanted to prevent the (set pc pc) trick, but this seems like an easier
fix for the problem :)

Honza


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39886

[Bug tree-optimization/40436] [4.5 regression] 0.5% code size regression caused by r147852

2009-07-02 Thread hubicka at ucw dot cz



--- Comment #13 from hubicka at ucw dot cz  2009-07-02 10:10 ---
Subject: Re:  [4.5 regression] 0.5% code size regression caused by r147852

OK, on i386 it has some effect: according to our nightly tester
it is 3524421->3510754.  The size used to be as low as 3431090,
so it is just a small improvement.  I guess I will commit the patch
anyway, as it is a quite obvious fix.

The other problem might be the "likely_eliminated_by_inlining_p"
predicate, which is very optimistic.

This predicate makes the inliner believe that all indirect reads and
writes to/from pointers passed to the function or function parameters
will be optimized out.  This is important to allow inlining of methods,
SRAing out objects in C++ and devirtualizing calls, but for C code it is
a bit too optimistic.

Partly this can be cured by IPA-SRA, and Martin has a WIP patch for
cloning that contains a more fine-grained analysis of function body size
specialized for given parameters. I however doubt they will catch all
the cases we need for C++.  Perhaps simply disabling the predicate for
-Os, or making it just a weak hint (removing some percentage of the
estimated cost), is the best way to go. I am just re-testing it on
vangelis with size estimates ignoring it.
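
The "weak hint" variant could look something like the sketch below.  It
is an illustration only: struct stmt_cost, estimate_body_size and the
discount_percent parameter are hypothetical and do not correspond to the
inliner's actual data structures.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-statement record for a size estimate.  */
struct stmt_cost
{
  int size;               /* estimated size of the statement */
  int likely_eliminated;  /* nonzero if the predicate fired for it */
};

/* Sum statement sizes.  DISCOUNT_PERCENT is the share of a
   likely-eliminated statement's size that is removed: 100 models
   trusting the predicate fully, 0 models ignoring it.  */
static int
estimate_body_size (const struct stmt_cost *stmts, size_t n,
                    int discount_percent)
{
  int total = 0;
  size_t i;

  assert (discount_percent >= 0 && discount_percent <= 100);

  for (i = 0; i < n; i++)
    {
      int size = stmts[i].size;

      if (stmts[i].likely_eliminated)
        size -= size * discount_percent / 100;
      total += size;
    }
  return total;
}
```

Passing 0 corresponds to the size estimates ignoring the predicate, and
anything in between treats it as a hint rather than a certainty.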

Honza


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40436

[Bug middle-end/40388] [4.5 Regression] another null pointer in remove_unreachable_regions

2009-07-10 Thread hubicka at ucw dot cz



--- Comment #7 from hubicka at ucw dot cz  2009-07-10 22:36 ---
Subject: Re:  [4.5 Regression] another null pointer in
remove_unreachable_regions

> 569 EXECUTE_IF_SET_IN_BITMAP (i->aka, 0, n, bi)
> (gdb) p i->aka
> $1 = (bitmap) 0x0
Oops, I forgot about this issue.  I am testing the obvious patch that
checks for i->aka being NULL.
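
The shape of such a guard, sketched with made-up types (struct
tiny_bitmap, struct region and count_aliases are hypothetical and not
the GCC bitmap API):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins: a one-word bitmap and a region that carries
   an optional alias set.  */
struct tiny_bitmap
{
  unsigned long bits;
};

struct region
{
  struct tiny_bitmap *aka;   /* may be NULL when there are no aliases */
};

/* Count the aliases of region R, treating a NULL aka as an empty set
   instead of dereferencing it.  */
static int
count_aliases (const struct region *r)
{
  int count = 0;
  unsigned long bits;

  assert (r != NULL);

  if (!r->aka)
    return 0;

  /* Clear the lowest set bit on each iteration.  */
  for (bits = r->aka->bits; bits; bits &= bits - 1)
    count++;
  return count;
}
```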


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40388

[Bug tree-optimization/40676] [4.5 Regression] internal compiler error: verify_ssa error: definition in block 5 does not dominate use in block 7

2009-07-11 Thread hubicka at ucw dot cz



--- Comment #5 from hubicka at ucw dot cz  2009-07-11 22:45 ---
Subject: Re:  [4.5 Regression] internal compiler error: verify_ssa error:
definition in block 5 does not dominate use in block 7

Thinking about this more, we change the dominance relation here in a
not-so-obvious way.  It is not really a textbook case, given the presence
of both abnormal edges, which might prevent consistently forwarding
everything across the empty BBs, and virtual operands, which may remain
in otherwise empty BBs.

I think we need to
1) forward the edges in tree-ssa-dce itself (i.e. do the edge
forwarding not only when a control flow stmt becomes dead, but for every
non-abnormal edge leading to a dead BB)
2) for empty BBs that remain in the program (the only reason would be
that they are the destination of an abnormal edge), send all virtual PHIs
for updating, since we cannot be sure dominance relations are preserved.

Sounds sane?  If so, I will give it a try tomorrow.

Honza


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40676

[Bug tree-optimization/40676] [4.5 Regression] internal compiler error: verify_ssa error: definition in block 5 does not dominate use in block 7

2009-07-12 Thread hubicka at ucw dot cz



--- Comment #7 from hubicka at ucw dot cz  2009-07-12 16:18 ---
Subject: Re:  [4.5 Regression] internal compiler error: verify_ssa error:
definition in block 5 does not dominate use in block 7

Hi,
there is an interesting difficulty with this plan.

When we have something like

BB1: if (test) goto BB2 else BB3;
BB2:
BB3: A=PHI (0 from BB1, 1 from BB2)

we end up forwarding edge BB1->BB2 to BB3, resulting in a wrong-code
problem.  This is because of how control dependency is formulated.
When visiting a BB we first mark live its control dependent BBs (those
containing the conditionals deciding whether the BB will be executed at
all), and when visiting a PHI we mark the control dependency BBs of the
source BBs of edges leading to the PHI.

In this case the control dependent BB of BB2 is BB1, so we correctly mark
the test as necessary, but we never mark BB2 as necessary in any way.
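
To make the wrong-code effect concrete, here is a toy model of the
three-block example.  It is not GCC code; phi_value, run_original and
run_forwarded are hypothetical names, and the PHI is modelled simply as a
function of the predecessor through which BB3 is entered.

```c
#include <assert.h>

enum block { BB1, BB2, BB3 };

/* A = PHI (0 from BB1, 1 from BB2): the value is selected by the
   predecessor through which BB3 was entered.  */
static int
phi_value (enum block pred)
{
  assert (pred == BB1 || pred == BB2);
  return pred == BB2 ? 1 : 0;
}

/* Original CFG: the taken branch goes through the empty BB2.  */
static int
run_original (int test)
{
  enum block pred = test ? BB2 : BB1;

  return phi_value (pred);
}

/* After forwarding BB1->BB2 to BB3, both edges enter BB3 from BB1,
   so the PHI can no longer tell them apart.  */
static int
run_forwarded (int test)
{
  (void) test;
  return phi_value (BB1);
}
```

In this model run_original (1) yields 1 while run_forwarded (1) yields 0,
which is the miscompilation: the empty BB2 was still needed to keep the
two PHI arguments distinct.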

I checked the original Cytron formulation of CD-DCE and it does not
forward edges of all branches, only of branches being removed, just as
current mainline does.  I saw the forwarding of all branches on some
slides presenting CD-DCE, but I am not sure whether this can be cheaply
done correctly (one would need the control dependence relation not only
for BBs but also for edges, or implicit split-edge BBs for every edge
that leads to a PHI).

The following patch fixes the ICE by implementing #2 from my previous
comment.  Without #1 we end up with some unnecessary virtuals being sent
for renaming (those virtuals that exist in otherwise empty BBs), but I
doubt it is that big a deal.

I am regtesting and bootstrapping this fix.

Honza

Index: tree-ssa-dce.c
===
--- tree-ssa-dce.c  (revision 149499)
+++ tree-ssa-dce.c  (working copy)
@@ -1137,7 +1162,7 @@ eliminate_unnecessary_stmts (void)
   for (bb = ENTRY_BLOCK_PTR->next_bb; bb != EXIT_BLOCK_PTR; bb = next_bb)
{
  next_bb = bb->next_bb;
- if (!(bb->flags & BB_REACHABLE))
+ if (!TEST_BIT (bb_contains_live_stmts, bb->index))
{
  for (gsi = gsi_start_phis (bb); !gsi_end_p (gsi); gsi_next
(&gsi))
if (!is_gimple_reg (gimple_phi_result (gsi_stmt (gsi
@@ -1159,8 +1184,11 @@ eliminate_unnecessary_stmts (void)
if (found)
  mark_virtual_phi_result_for_renaming (gsi_stmt (gsi));
  }
- delete_basic_block (bb);
+ if (!(bb->flags & BB_REACHABLE))
+   delete_basic_block (bb);
}
+ else
+   gcc_assert (bb->flags & BB_REACHABLE);
}
 }
   FOR_EACH_BB (bb)


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40676

[Bug middle-end/40388] [4.5 Regression] another null pointer in remove_unreachable_regions

2009-07-12 Thread hubicka at ucw dot cz



--- Comment #12 from hubicka at ucw dot cz  2009-07-12 22:44 ---
Subject: Re:  [4.5 Regression] another null pointer in
remove_unreachable_regions

> The testsuite failure was due to a double paste into the testcase; fixing that
> maxes it work.

Uh, double application of patch..
Thanks for fixing it!


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40388

[Bug tree-optimization/40759] [4.5 Regression] segfault in useless_type_conversion_p

2009-07-15 Thread hubicka at ucw dot cz



--- Comment #3 from hubicka at ucw dot cz  2009-07-15 11:29 ---
Subject: Re:  [4.5 Regression] segfault in useless_type_conversion_p

I hope that the patch for PR40676 cures those problems.  I am just on
the way to Prague, but I will try to look into it tomorrow.

Honza


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40759

[Bug tree-optimization/40759] [4.5 Regression] segfault in useless_type_conversion_p

2009-07-23 Thread hubicka at ucw dot cz



--- Comment #4 from hubicka at ucw dot cz  2009-07-23 16:15 ---
Subject: Re:  [4.5 Regression] segfault in useless_type_conversion_p

Hi,
the problem here is in removing a virtual PHI.  We replace uses of the
virtual PHI result by the corresponding VAR_DECL and send the symbol for
renaming.  However, the replacement is done only for live statements, and
we send for renaming only if any live statements are found.

The problem here is that the virtual PHI defines a vop used by a dead
statement.  The dead statement however defines a vop used by a live
statement.  At the time we are removing the dead statement, the live
statement gets the former PHI result, now dead, in its vuse.

The following patch solves it by simply updating all uses, dead or
alive.  It would be possible to keep this check and add a check into the
code deleting dead statements to update when the result of a dead PHI is
propagated through.

I am bootstrapping/regtesting this version.

Index: tree-ssa-dce.c
===
--- tree-ssa-dce.c  (revision 150009)
+++ tree-ssa-dce.c  (working copy)
@@ -828,9 +828,6 @@ mark_virtual_phi_result_for_renaming (gi
 }
   FOR_EACH_IMM_USE_STMT (stmt, iter, gimple_phi_result (phi))
 {
-  if (gimple_code (stmt) != GIMPLE_PHI
- && !gimple_plf (stmt, STMT_NECESSARY))
-continue;
   FOR_EACH_IMM_USE_ON_STMT (use_p, iter)
 SET_USE (use_p, SSA_NAME_VAR (gimple_phi_result (phi)));
   update_stmt (stmt);


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40759

[Bug tree-optimization/40874] Function object abstraction penalty with inline functions.

2009-07-29 Thread hubicka at ucw dot cz



--- Comment #11 from hubicka at ucw dot cz  2009-07-29 08:08 ---
Subject: Re:  Function object abstraction penalty with inline functions.

> I'll take this for now.
My preferred way of fixing this would be to include an FRE pass.
Unfortunately my last benchmarks adding FRE early weren't showing much of
a win on our benchmark suite...  Still, it seems the right thing to do.

Honza


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40874

[Bug tree-optimization/24653] [4.1 regression] EON regressed seriously on x86-64

2005-11-21 Thread hubicka at ucw dot cz



--- Comment #9 from hubicka at ucw dot cz  2005-11-21 14:44 ---
Subject: Re:  [4.1 regression] EON regressed seriously on x86-64

> 
> 
> --- Comment #8 from pinskia at gcc dot gnu dot org  2005-11-21 13:30 
> ---
> Fixed at least on the mainline for 4.2.0.

I am going to fix it on the 4.1 branch too once testing converges.
However, I would still like to see DCE after DOM, or DCE and DOM
reordered.  Even if the CCP patch fixes the EON regression one way, this
problem seems pretty common in C++ code (see the tramp3d results I
posted).

Honza


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=24653

[Bug c++/27369] [4.1/4.2 Regression] tree check ICE when attribute externally_visible used

2006-07-21 Thread hubicka at ucw dot cz



--- Comment #13 from hubicka at ucw dot cz  2006-07-21 15:13 ---
Subject: Re:  [4.1/4.2 Regression] tree check ICE when attribute
externally_visible used

> 
> 
> --- Comment #12 from mmitchel at gcc dot gnu dot org  2006-07-21 08:38 
> ---
> I think that Comment #10 shows that handle_externally_visible should not be
> registering things with cgraph, as we shouldn't ever have anything pointing at
> a re-declaration.

Hi,
in January I made a patch for a similar problem that simply defers
handling of the externally_visible and used attributes until after the
compilation unit is finalized.  I am going to update it for the current
tree and re-test.  Does it look safe?

Hi,
at the present externally_visible on:
extern const char *mystr;   /* normally in a header */
const char *mystr __attribute__ ((externally_visible));
int
main (int argc, char **argv)
{
  mystr = argv[0];
  return (0);
}
is cowardly ignored.  This is because handle_externally_visible_attribute is
called before decl merging, and it produces a new cgraph node with the
externally_visible flag that is never merged into the real decl node.

externally_visible is in many ways symmetric to the used attribute, so I
looked at how it is implemented, but found it somewhat slippery to copy.
The used attribute is processed in the cgraph_finalize_* machinery first,
but since it might be added retrospectively, it is also processed by the
C frontend when merging decls, and finally it sets the
TREE_SYMBOL_REFERENCED flag, which survives decl merging, to get the early
used attributes right.  This is way too crazy and easy to break, so I
moved both attributes to a new place: at the end of compilation the
declarations are traversed and we check whether the attributes are
present.
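
The idea of a single late walk can be sketched as below.  It is a
simplified model, not the cgraph code: struct decl, its fields and
process_attributes are hypothetical, and the real patch works on cgraph
and varpool nodes instead.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified declaration record.  The attribute flags
   describe the final, merged declaration.  */
struct decl
{
  const char *name;
  int has_externally_visible;
  int has_used;
  int needed;            /* set by the final walk */
  struct decl *next;
};

/* Walk all declarations once, after decl merging is done, and mark
   those carrying either attribute as needed.  Returns how many were
   marked.  */
static int
process_attributes (struct decl *decls)
{
  int marked = 0;
  struct decl *d;

  for (d = decls; d; d = d->next)
    {
      /* The walk is meant to run once per unit, so nothing should be
         marked yet.  */
      assert (!d->needed);

      if (d->has_externally_visible || d->has_used)
        {
          d->needed = 1;
          marked++;
        }
    }
  return marked;
}
```

Because the walk looks at the merged declarations, it does not matter
whether the attribute appeared on the first declaration or on a later
redeclaration.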

Since this is a miscompilation bug, I would like to see it solved in 4.1 too.
I wonder whether this scheme seems safe to others or whether I should take a
less intrusive approach (perhaps moving externally_visible only)?

Bootstrapped/regtested i686-pc-gnu-linux, OK for mainline and possibly 4.1?

2006-01-19  Jan Hubicka  <[EMAIL PROTECTED]>
* cgraph.c (cgraph_varpool_nodes): Export.
(decide_is_variable_needed): Do not worry about "used" attribute.
* cgraph.h (cgraph_varpool_nodes): Declare.
* cgraphunit.c (decide_is_function_needed): Do not worry about "used"
attribute
(process_function_and_variable_attributes): New function.
(cgraph_finalize_compilation_unit): Call it.
* c-decl.c (finish_decl): Do not worry about used attribute.
* c-common.c (handle_externally_visible_attribute): Only validate.

Index: cgraph.c
===
*** cgraph.c(revision 109820)
--- cgraph.c(working copy)
*** static GTY((param_is (struct cgraph_varp
*** 132,138 
  struct cgraph_varpool_node *cgraph_varpool_nodes_queue,
*cgraph_varpool_first_unanalyzed_node;

  /* The linked list of cgraph varpool nodes.  */
! static GTY(()) struct cgraph_varpool_node *cgraph_varpool_nodes;

  /* End of the varpool queue.  Needs to be QTYed to work with PCH.  */
  static GTY(()) struct cgraph_varpool_node *cgraph_varpool_last_needed_node;
--- 132,138 
  struct cgraph_varpool_node *cgraph_varpool_nodes_queue,
*cgraph_varpool_first_unanalyzed_node;

  /* The linked list of cgraph varpool nodes.  */
! struct cgraph_varpool_node *cgraph_varpool_nodes;

  /* End of the varpool queue.  Needs to be QTYed to work with PCH.  */
  static GTY(()) struct cgraph_varpool_node *cgraph_varpool_last_needed_node;
*** bool
*** 838,845 
  decide_is_variable_needed (struct cgraph_varpool_node *node, tree decl)
  {
/* If the user told us it is used, then it must be so.  */
!   if (node->externally_visible
!   || lookup_attribute ("used", DECL_ATTRIBUTES (decl)))
  return true;

/* ??? If the assembler name is set by hand, it is possible to assemble
--- 838,844 
  decide_is_variable_needed (struct cgraph_varpool_node *node, tree decl)
  {
/* If the user told us it is used, then it must be so.  */
!   if (node->externally_visible)
  return true;

/* ??? If the assembler name is set by hand, it is possible to assemble
Index: cgraph.h
===
*** cgraph.h(revision 109820)
--- cgraph.h(working copy)
*** extern GTY(()) struct cgraph_node *cgrap
*** 242,247 
--- 242,248 

  extern GTY(()) struct cgraph_varpool_node
*cgraph_varpool_first_unanalyzed_node;
  extern GTY(()) struct cgraph_varpool_node *cgraph_varpool_nodes_queue;
+ extern GTY(()) struct cgraph_varpool_node *cgraph_varpool_nodes;
  extern GTY(()) struct cgraph_asm_node *cgraph_asm_nodes;
  extern GTY(()) int cgraph_order;

Index: cgraphunit.c
===
*** cgraphunit.c(revision 109820)
--- cgraphunit.c(working copy)

[Bug rtl-optimization/28071] [4.1/4.2 regression] A file that can not be compiled in reasonable time/space

2006-07-22 Thread hubicka at ucw dot cz



--- Comment #10 from hubicka at ucw dot cz  2006-07-22 13:47 ---
Subject: Re:  [4.1/4.2 regression] A file that can not be compiled in
reasonable time/space

Hi,
this patch makes the -O2 case work pretty well on the tree side.  The inliner
expands code from 8MB to 40MB of GGC memory, which seems under control.
Aliasing peaks at 85MB, which also doesn't seem completely unreasonable.
I will need to give it more testing.  I believe the inliner is always GGC
safe, but it is easy to be mistaken here.
The patch also speeds up the inline heuristic by pruning out the
impossible edges early, making the priority queue smaller.
Also I am quite curious how the inliner manages to produce 800MB of
garbage...

Honza

Index: ipa-inline.c
===
*** ipa-inline.c(revision 115645)
--- ipa-inline.c(working copy)
*** update_caller_keys (fibheap_t heap, stru
*** 413,418 
--- 413,419 
bitmap updated_nodes)
  {
struct cgraph_edge *edge;
+   const char *failed_reason;

if (!node->local.inlinable || node->local.disregard_inline_limits
|| node->global.inlined_to)
*** update_caller_keys (fibheap_t heap, stru
*** 421,426 
--- 422,441 
  return;
bitmap_set_bit (updated_nodes, node->uid);
node->global.estimated_growth = INT_MIN;
+ 
+   if (!node->local.inlinable)
+ return;
+   /* Prune out edges we won't inline into anymore.  */
+   if (!cgraph_default_inline_p (node, &failed_reason))
+ {
+   for (edge = node->callers; edge; edge = edge->next_caller)
+   if (edge->aux)
+ {
+   fibheap_delete_node (heap, edge->aux);
+   edge->aux = NULL;
+ }
+   return;
+ }

for (edge = node->callers; edge; edge = edge->next_caller)
  if (edge->inline_failed)
Index: tree-inline.c
===
*** tree-inline.c   (revision 115645)
--- tree-inline.c   (working copy)
*** expand_call_inline (basic_block bb, tree
*** 2163,2172 
/* Update callgraph if needed.  */
cgraph_remove_node (cg_edge->callee);

-   /* Declare the 'auto' variables added with this inlined body.  */
-   record_vars (BLOCK_VARS (id->block));
id->block = NULL_TREE;
successfully_inlined = TRUE;

   egress:
input_location = saved_location;
--- 2163,2171 
/* Update callgraph if needed.  */
cgraph_remove_node (cg_edge->callee);

id->block = NULL_TREE;
successfully_inlined = TRUE;
+   ggc_collect ();

   egress:
input_location = saved_location;
*** declare_inline_vars (tree block, tree va
*** 2556,2562 
  {
tree t;
for (t = vars; t; t = TREE_CHAIN (t))
! DECL_SEEN_IN_BIND_EXPR_P (t) = 1;

if (block)
  BLOCK_VARS (block) = chainon (BLOCK_VARS (block), vars);
--- 2555,2567 
  {
tree t;
for (t = vars; t; t = TREE_CHAIN (t))
! {
!   DECL_SEEN_IN_BIND_EXPR_P (t) = 1;
!   gcc_assert (!TREE_STATIC (t) && !TREE_ASM_WRITTEN (t));
!   cfun->unexpanded_var_list =
!   tree_cons (NULL_TREE, t,
!  cfun->unexpanded_var_list);
! }

if (block)
  BLOCK_VARS (block) = chainon (BLOCK_VARS (block), vars);


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28071

[Bug rtl-optimization/28071] [4.1/4.2 regression] A file that can not be compiled in reasonable time/space

2006-07-22 Thread hubicka at ucw dot cz



--- Comment #11 from hubicka at ucw dot cz  2006-07-22 17:12 ---
Subject: Re:  [4.1/4.2 regression] A file that can not be compiled in
reasonable time/space

Hi,
this keeps the inliner from producing quadratically many STMT list nodes, so
inlining is now reasonably fast.  The next offenders are alias info, PRE,
regmove, global alloc and the schedulers.

Index: tree-cfg.c
===
*** tree-cfg.c  (revision 115645)
--- tree-cfg.c  (working copy)
*** tree_redirect_edge_and_branch_force (edg
*** 4158,4164 
  static basic_block
  tree_split_block (basic_block bb, void *stmt)
  {
!   block_stmt_iterator bsi, bsi_tgt;
tree act;
basic_block new_bb;
edge e;
--- 4158,4165 
  static basic_block
  tree_split_block (basic_block bb, void *stmt)
  {
!   block_stmt_iterator bsi;
!   tree_stmt_iterator tsi_tgt;
tree act;
basic_block new_bb;
edge e;
*** tree_split_block (basic_block bb, void *
*** 4192,4204 
}
  }

!   bsi_tgt = bsi_start (new_bb);
!   while (!bsi_end_p (bsi))
! {
!   act = bsi_stmt (bsi);
!   bsi_remove (&bsi, false);
!   bsi_insert_after (&bsi_tgt, act, BSI_NEW_STMT);
! }

return new_bb;
  }
--- 4193,4209 
}
  }

!   if (bsi_end_p (bsi))
! return new_bb;
! 
!   /* Split the statement list - avoid re-creating new containers as this
!  brings ugly quadratic memory consumption in the inliner.  
!  (We are still quadratic since we need to update stmt BB pointers,
!  sadly) */
!   new_bb->stmt_list = tsi_split_statement_list_before (&bsi.tsi);
!   for (tsi_tgt = tsi_start (new_bb->stmt_list);
!!tsi_end_p (tsi_tgt); tsi_next (&tsi_tgt))
! set_bb_for_stmt (tsi_stmt (tsi_tgt), new_bb);

return new_bb;
  }
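
The idea behind the hunk above, splitting the list in place instead of
removing and re-inserting every statement, can be shown on a plain doubly
linked list.  This is only a sketch under stated assumptions: struct stmt,
struct block and split_block_before are hypothetical and do not mirror the
real statement-list layout.

```c
#include <assert.h>
#include <stddef.h>

struct block;

/* Hypothetical statement node; not GCC's real statement list layout.  */
struct stmt
{
  struct stmt *prev, *next;
  struct block *owner;
};

struct block
{
  struct stmt *head, *tail;
};

/* Move AT and everything after it from OLD_BB to the empty NEW_BB.
   The relinking is O(1) and allocates nothing; only the owner update
   still walks the moved tail.  Returns the number of statements moved.  */
static int
split_block_before (struct block *old_bb, struct block *new_bb,
                    struct stmt *at)
{
  int moved = 0;

  assert (new_bb->head == NULL && new_bb->tail == NULL);
  if (at == NULL)
    return 0;

  new_bb->head = at;
  new_bb->tail = old_bb->tail;
  old_bb->tail = at->prev;
  if (at->prev != NULL)
    at->prev->next = NULL;
  else
    old_bb->head = NULL;
  at->prev = NULL;

  for (struct stmt *s = new_bb->head; s != NULL; s = s->next)
    {
      s->owner = new_bb;
      moved++;
    }
  return moved;
}
```

No list node is created or freed by the split, which is the point of the
change; the remaining walk corresponds to the "still quadratic" remark in
the comment of the patch.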


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28071

[Bug rtl-optimization/28071] [4.1/4.2 regression] A file that can not be compiled in reasonable time/space

2006-07-22 Thread hubicka at ucw dot cz



--- Comment #12 from hubicka at ucw dot cz  2006-07-22 18:09 ---
Subject: Re:  [4.1/4.2 regression] A file that can not be compiled in
reasonable time/space

Hi,
I am attaching the .optimized dump of this testcase.  It is quite a good
demonstration of how SRA and TER tend to increase register pressure in
code like:


;; Function add (add)

Analyzing Edge Insertions.
add (x, y)
{
  double r$min;

:
  r$min = x.min + y.min;
  .max = x.max + y.max;
  .min = r$min;
  return ;

}

;; Function mul (mul)

Analyzing Edge Insertions.
mul (x, y)
{
  double y$min;
  double y$max;
  double x$min;
  double x$max;
  double d;
  double c;
  double b;
  double a;

:
  x$max = x.max;
  x$min = x.min;
  y$max = y.max;
  y$min = y.min;
  a = y$min * x$min;
  b = y$max * x$min;
  c = y$min * x$max;
  d = y$max * x$max;
  .max = max (max (a, b), max (c, d));
  .min = min (min (a, b), min (c, d));
  return ;

}



;; Function fz (fz)

fz (x, y, z)
{

:
  tmp3 = pow (z, 3.7e+1);
  tmp7 = pow (y, 2.0e+0);
  tmp9 = pow (z, 3.6e+1);
  tmp14 = pow (y, 3.0e+0);
  tmp16 = pow (z, 3.5e+1);
...
  tmp3922 = pow (x, 3.8e+1);
  D.17848 = pow (x, 3.9e+1);
  D.17965 = pow (y, 3.9e+1);
  D.17968 = pow (z, 3.9e+1);
  return tmp3 * x * 2.04629333124046830505449179327115416526794433594e+1 * y +
tmp9 * tmp7 * x * 1.63737898728226838329646852798759937286376953125e+2 + tmp16
* tmp14 * x * 3.102825991153964650948182679712772369384765625e+2 + tmp23 *
tmp21 * x * -1.38580890184729059910750947892665863037109375e+3 + tmp30 * tmp28
* x * -4.39080063708386560961116629187017679214477539062e+1 + tmp37 * tmp35 * x
* 1.737348223038549986085854470729827880859375e+4 + tmp44 * tmp42 * x *
-1.069806869373114386689849197864532470703125e+4 + tmp51 * tmp49 * x *
-3.542086638969252817332744598388671875e+4 + tmp58 * tmp56 * x *
-3.091774346229622824466787278652191162109375e+4 + tmp65 * tmp63 * x *
1.568088658621288946889400482177734375e+5 + tmp72 * tmp70 * x *
4.19376520881160162389278411865234375e+5 + tmp79 * tmp77 * x *
2.0111082929561330820433795452117919921875e+5 + tmp86 * tmp84 * x *
-4.337742627231603837572038173675537109375e+5 + tmp93 * tmp91 * x *
-4.829501801337040960788726806640625e+5 + tmp100 * tmp98 * x *
5.32241994551055715419352054595947265625e+5 + tmp107 * tmp105 * x *
1.8250994926701225340366363525390625e+6 + tmp114 * tmp112 * x *
1.6382205795514374040067195892333984375e+6 + tmp121 * tmp119 * x *
1.1912621023960295133292675018310546875e+5 + tmp128 * tmp126 * x *
8.811503159726611338555812835693359375e+5 + tmp135 * tmp133 * x *
2.690164492243868880905210971832275390625e+5 + tmp142 * tmp140 * x *
2.271892026609037420712411403656005859375e+5 + tmp149 * tmp147 * x *
1.795814638975697453133761882781982421875e+5 + tmp156 * tmp154 * x *
-3.94381184819339658133685588836669921875e+5 + tmp163 * tmp161 * x *
7.64450454622797551564872264862060546875e+5 + tmp170 * tmp168 * x *
6.9298171586054741055704653263092041015625e+4 + tmp177 * tmp175 * x *
-3.129066099043917492963373661041259765625e+5 + tmp184 * tmp182 * x *
-4.0792914801556640304625034332275390625e+5 + tmp191 * tmp189 * x *
7.3512920753349564620293676853179931640625e+4 + tmp198 * tmp196 * x *
3.5470695311840399881475605070590972900390625e+3 + tmp205 * tmp203 * x *
-8.8733450804951236932538449764251708984375e+4 + tmp212 * tmp210 * x *
-1.3805889644669676272314973175525665283203125e+4 + tmp219 * tmp217 * x *
-7.54301319902873729006387293338775634765625e+3 + tmp226 * tmp224 * x *
2.23731170493404579246998764574527740478515625e+3 + tmp233 * tmp231 * x *
-3.903765115338947599581779539585113525390625e+2 + tmp240 * tmp238 * x *
4.743319333283892547115101478993892669677734375e+2 + tmp247 * tmp245 * x *
-6.32641294603530113249689748045057058334350585938e+1 + tmp252 * x *
-6.76527508139541300380415123072452843189239501953e+0 * z + tmp258 * x *
-4.51436297228304250772623618104262277483940124512e-1 + tmp263 * x *
2.89405090268957065902100111998151987791061401367e+0 + tmp9 * tmp268 *
-3.7483157190701700756108039058744907379150390625e+2 * y + tmp16 * tmp7 *
tmp268 * 9.276025613194925654170219786465167999267578125e+2 + tmp23 * tmp14 *
tmp268 * 1.35840047018872951412049587815847412109375e+2 + tmp30 * tmp21 *
tmp268 * -3.2681330410168111484381370246410369873046875e+3 + tmp37 * tmp28 *
tmp268 * 2.77737094612259534187614917755126953125e+3 + tmp44 * tmp35 * tmp268 *
2.2773056570869275674340315163135528564453125e+3 + tmp51 * tmp42 * tmp268 *
9.2295963366692260024137794971466064453125e+4 + tmp58 * tmp49 * tmp268 *
-3.049601738325569895096123218536376953125e+5 + tmp65 * tmp56 * tmp268 *
-2.69300746038850047625601291656494140625e+5 + tmp72 * tmp63 * tmp268 *
3.92479526798162725754082202911376953125e+5 + tmp79 * tmp70 * tmp268 *
-1.4348648827185891568660736083984375e+6 + tmp86 * tmp77 * tmp268 *
1.2925352909364881925284862518310546875e+6 + tmp93 * tmp84 * tmp268 *
3.44742843619707785546779632568359375e+6 + tmp100 * tmp91 * tmp268 *
2.2975221813043109141290187835693359375e+6 + tmp107 * tmp98 * tmp268

[Bug rtl-optimization/28071] [4.1/4.2 regression] A file that can not be compiled in reasonable time/space

2006-07-22 Thread hubicka at ucw dot cz



--- Comment #14 from hubicka at ucw dot cz  2006-07-22 19:30 ---
Subject: Re:  [4.1/4.2 regression] A file that can not be compiled in
reasonable time/space

Hi,
with the attached patch I can cure the regmove quadratic behaviour and
the time report is not so unreasonable now:

 gnu_dev_major gnu_dev_minor gnu_dev_makedev max min f fx fy fz add addl addr
sub subl subr mul mull mulr divl ipow fi
Analyzing compilation unitPerforming intraprocedural optimizations
Assembling functions:
 max min add addl addr sub subl subr mul mull mulr divl ipow fz fy fx f fi {GC
126177k -> 85112k} {GC 327625k -> 39474k}
Execution times (seconds)
 garbage collection:   0.83 ( 0%) usr   0.00 ( 0%) sys   0.82 ( 0%) wall   
   0 kB ( 0%) ggc
 callgraph construction:   0.16 ( 0%) usr   0.02 ( 1%) sys   0.16 ( 0%) wall   
1147 kB ( 0%) ggc
 callgraph optimization:   0.02 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall   
 533 kB ( 0%) ggc
 ipa reference :   0.05 ( 0%) usr   0.00 ( 0%) sys   0.05 ( 0%) wall   
   0 kB ( 0%) ggc
 ipa pure const:   0.01 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall   
   0 kB ( 0%) ggc
 ipa type escape   :   0.04 ( 0%) usr   0.00 ( 0%) sys   0.05 ( 0%) wall   
   0 kB ( 0%) ggc
 trivially dead code   :   0.45 ( 0%) usr   0.00 ( 0%) sys   0.42 ( 0%) wall   
   0 kB ( 0%) ggc
 life analysis :  21.38 ( 3%) usr   0.02 ( 1%) sys  21.39 ( 3%) wall   
1120 kB ( 0%) ggc
 life info update  :   0.54 ( 0%) usr   0.00 ( 0%) sys   0.61 ( 0%) wall   
   0 kB ( 0%) ggc
 alias analysis:   0.87 ( 0%) usr   0.00 ( 0%) sys   0.89 ( 0%) wall   
4266 kB ( 1%) ggc
 register scan :   0.42 ( 0%) usr   0.00 ( 0%) sys   0.40 ( 0%) wall   
 150 kB ( 0%) ggc
 rebuild jump labels   :   0.11 ( 0%) usr   0.00 ( 0%) sys   0.11 ( 0%) wall   
   0 kB ( 0%) ggc
 preprocessing :   0.27 ( 0%) usr   0.06 ( 2%) sys   0.36 ( 0%) wall   
 471 kB ( 0%) ggc
 lexical analysis  :   0.04 ( 0%) usr   0.05 ( 2%) sys   0.08 ( 0%) wall   
   0 kB ( 0%) ggc
 parser:   0.12 ( 0%) usr   0.03 ( 1%) sys   0.17 ( 0%) wall   
3207 kB ( 1%) ggc
 inline heuristics :  15.14 ( 2%) usr   0.01 ( 0%) sys  15.26 ( 2%) wall   
1486 kB ( 0%) ggc
 integration   :  21.35 ( 3%) usr   0.12 ( 4%) sys  21.71 ( 3%) wall  
33445 kB ( 8%) ggc
 tree gimplify :   0.18 ( 0%) usr   0.01 ( 0%) sys   0.19 ( 0%) wall   
3341 kB ( 1%) ggc
 tree eh   :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall   
   0 kB ( 0%) ggc
 tree CFG construction :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall   
1338 kB ( 0%) ggc
 tree CFG cleanup  :   0.07 ( 0%) usr   0.00 ( 0%) sys   0.08 ( 0%) wall   
  20 kB ( 0%) ggc
 tree VRP  :   0.38 ( 0%) usr   0.01 ( 0%) sys   0.42 ( 0%) wall   
  11 kB ( 0%) ggc
 tree copy propagation :   0.23 ( 0%) usr   0.01 ( 0%) sys   0.28 ( 0%) wall   
 222 kB ( 0%) ggc
 tree store copy prop  :   0.11 ( 0%) usr   0.01 ( 0%) sys   0.14 ( 0%) wall   
   4 kB ( 0%) ggc
 tree find ref. vars   :   0.10 ( 0%) usr   0.01 ( 0%) sys   0.11 ( 0%) wall   
8137 kB ( 2%) ggc
 tree PTA  :   1.29 ( 0%) usr   0.04 ( 1%) sys   1.36 ( 0%) wall   
  57 kB ( 0%) ggc
 tree alias analysis   :   1.89 ( 0%) usr   0.20 ( 7%) sys   2.10 ( 0%) wall   
   0 kB ( 0%) ggc
 tree PHI insertion:   1.68 ( 0%) usr   0.01 ( 0%) sys   1.70 ( 0%) wall   
  18 kB ( 0%) ggc
 tree SSA rewrite  :   0.62 ( 0%) usr   0.04 ( 1%) sys   0.65 ( 0%) wall  
17084 kB ( 4%) ggc
 tree SSA other:   0.48 ( 0%) usr   0.08 ( 3%) sys   0.56 ( 0%) wall   
   0 kB ( 0%) ggc
 tree SSA incremental  :   1.20 ( 0%) usr   0.00 ( 0%) sys   1.24 ( 0%) wall   
   0 kB ( 0%) ggc
 tree operand scan :   1.48 ( 0%) usr   0.34 (11%) sys   1.93 ( 0%) wall  
15634 kB ( 4%) ggc
 dominator optimization:   1.05 ( 0%) usr   0.05 ( 2%) sys   1.05 ( 0%) wall   
2698 kB ( 1%) ggc
 tree SRA  :   1.05 ( 0%) usr   0.09 ( 3%) sys   1.15 ( 0%) wall  
24835 kB ( 6%) ggc
 tree STORE-CCP:   0.09 ( 0%) usr   0.01 ( 0%) sys   0.11 ( 0%) wall   
   4 kB ( 0%) ggc
 tree CCP  :   0.51 ( 0%) usr   0.02 ( 1%) sys   0.56 ( 0%) wall   
 154 kB ( 0%) ggc
 tree reassociation:   0.11 ( 0%) usr   0.00 ( 0%) sys   0.11 ( 0%) wall   
   0 kB ( 0%) ggc
 tree PRE  : 296.46 (45%) usr   0.49 (16%) sys 298.81 (45%) wall  
19481 kB ( 5%) ggc
 tree FRE  :   0.96 ( 0%) usr   0.05 ( 2%) sys   1.00 ( 0%) wall   
7991 kB ( 2%) ggc
 tree forward propagate:   0.04 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall   
   0 kB ( 0%) ggc
 tree conservative DCE :   0.54 ( 0%) usr   0.00 ( 0%) sys   0.54 ( 0%) wall   
   0 kB ( 0%) ggc
 tree aggressive DCE   :   0.08 ( 0%) usr   0.00 ( 0%) sys   0.08 ( 0%) wall   
   0 kB ( 0%) ggc
 tree DSE  :   0.04 ( 0%) usr   0.00 ( 0%) sys   0.05 ( 0%) wall   
   8 kB ( 0%) ggc
 tree SSA uncprop  :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall   
   0 kB ( 0%) ggc
 tree SSA to normal:  27.19 ( 4

[Bug rtl-optimization/28071] [4.1/4.2 regression] A file that can not be compiled in reasonable time/space

2006-07-22 Thread hubicka at ucw dot cz



--- Comment #16 from hubicka at ucw dot cz  2006-07-22 20:51 ---
Subject: Re:  [4.1/4.2 regression] A file that can not be compiled in
reasonable time/space

Hi,
with the attached patch, which saves roughly 10 minutes in the tree-into-ssa
pass, I can compile with -O3 -fno-tree-fre -fno-tree-pre.  This works only
without checking enabled, since we do incredibly deep dominator walks that run
out of stack space, which can be considered a bug too.
TER still manages to enforce a few thousand temporaries with overlapping
live ranges.

The out-of-ssa pass spends most of its time in calculate_live_on_exit
and calculate_live_on_entry, which look rather symmetric to the problem cured
by the attached patch, but I don't see directly how to avoid the
quadratic behaviour there.

Honza

 garbage collection:   1.22 ( 0%) usr   0.10 ( 1%) sys   8.40 ( 1%) wall   
   0 kB ( 0%) ggc
 callgraph construction:   0.14 ( 0%) usr   0.03 ( 0%) sys   0.18 ( 0%) wall   
1147 kB ( 0%) ggc
 callgraph optimization:   0.07 ( 0%) usr   0.01 ( 0%) sys   0.45 ( 0%) wall   
 533 kB ( 0%) ggc
 ipa reference :   0.05 ( 0%) usr   0.00 ( 0%) sys   0.06 ( 0%) wall   
   0 kB ( 0%) ggc
 ipa pure const:   0.01 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall   
   0 kB ( 0%) ggc
 ipa type escape   :   0.04 ( 0%) usr   0.00 ( 0%) sys   0.04 ( 0%) wall   
   0 kB ( 0%) ggc
 cfg cleanup   :   3.89 ( 1%) usr   0.01 ( 0%) sys   4.11 ( 0%) wall   
1576 kB ( 1%) ggc
 trivially dead code   :   0.46 ( 0%) usr   0.00 ( 0%) sys   0.53 ( 0%) wall   
   0 kB ( 0%) ggc
 life analysis :  51.34 ( 9%) usr   2.65 (21%) sys  73.91 ( 5%) wall   
2653 kB ( 1%) ggc
 life info update  :  48.97 ( 9%) usr   0.14 ( 1%) sys  50.68 ( 4%) wall   
 641 kB ( 0%) ggc
 alias analysis:   0.69 ( 0%) usr   0.00 ( 0%) sys   1.05 ( 0%) wall   
4139 kB ( 1%) ggc
 register scan :   0.41 ( 0%) usr   0.00 ( 0%) sys   0.40 ( 0%) wall   
   0 kB ( 0%) ggc
 rebuild jump labels   :   0.14 ( 0%) usr   0.00 ( 0%) sys   0.12 ( 0%) wall   
   0 kB ( 0%) ggc
 preprocessing :   0.37 ( 0%) usr   0.06 ( 0%) sys   0.34 ( 0%) wall   
 471 kB ( 0%) ggc
 lexical analysis  :   0.01 ( 0%) usr   0.05 ( 0%) sys   0.07 ( 0%) wall   
   0 kB ( 0%) ggc
 parser:   0.09 ( 0%) usr   0.02 ( 0%) sys   0.18 ( 0%) wall   
3207 kB ( 1%) ggc
 inline heuristics :  14.79 ( 3%) usr   0.02 ( 0%) sys  14.86 ( 1%) wall   
1118 kB ( 0%) ggc
 integration   :  17.07 ( 3%) usr   0.22 ( 2%) sys  17.36 ( 1%) wall  
79483 kB (27%) ggc
 tree gimplify :   0.15 ( 0%) usr   0.01 ( 0%) sys   0.17 ( 0%) wall   
3341 kB ( 1%) ggc
 tree eh   :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall   
   0 kB ( 0%) ggc
 tree CFG construction :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall   
1338 kB ( 0%) ggc
 tree CFG cleanup  :   4.27 ( 1%) usr   0.00 ( 0%) sys   4.27 ( 0%) wall   
  20 kB ( 0%) ggc
 tree VRP  :   1.26 ( 0%) usr   0.03 ( 0%) sys   1.33 ( 0%) wall   
  14 kB ( 0%) ggc
 tree copy propagation :   0.85 ( 0%) usr   0.05 ( 0%) sys   0.94 ( 0%) wall   
 313 kB ( 0%) ggc
 tree store copy prop  :   0.27 ( 0%) usr   0.01 ( 0%) sys   0.28 ( 0%) wall   
   5 kB ( 0%) ggc
 tree find ref. vars   :   0.16 ( 0%) usr   0.03 ( 0%) sys   0.18 ( 0%) wall  
12044 kB ( 4%) ggc
 tree PTA  :   1.55 ( 0%) usr   0.06 ( 0%) sys   1.63 ( 0%) wall   
  57 kB ( 0%) ggc
 tree alias analysis   :   2.81 ( 0%) usr   0.29 ( 2%) sys   3.10 ( 0%) wall   
   0 kB ( 0%) ggc
 tree PHI insertion:   0.57 ( 0%) usr   0.92 ( 7%) sys   1.52 ( 0%) wall   
3137 kB ( 1%) ggc
 tree SSA rewrite  :   2.33 ( 0%) usr   0.06 ( 0%) sys   5.02 ( 0%) wall  
21592 kB ( 7%) ggc
 tree SSA other:   0.41 ( 0%) usr   0.16 ( 1%) sys   0.65 ( 0%) wall   
   0 kB ( 0%) ggc
 tree SSA incremental  :   4.18 ( 1%) usr   0.45 ( 4%) sys   4.72 ( 0%) wall   
 520 kB ( 0%) ggc
 tree operand scan :   1.79 ( 0%) usr   0.69 ( 5%) sys  39.97 ( 3%) wall  
18374 kB ( 6%) ggc
 dominator optimization:   2.91 ( 1%) usr   0.05 ( 0%) sys   2.99 ( 0%) wall  
11155 kB ( 4%) ggc
 tree SRA  :   4.24 ( 1%) usr   0.15 ( 1%) sys   4.51 ( 0%) wall  
25568 kB ( 9%) ggc
 tree STORE-CCP:   0.29 ( 0%) usr   0.01 ( 0%) sys   0.31 ( 0%) wall   
  18 kB ( 0%) ggc
 tree CCP  :   0.87 ( 0%) usr   0.01 ( 0%) sys   2.39 ( 0%) wall   
 154 kB ( 0%) ggc
 tree split crit edges :   0.11 ( 0%) usr   0.02 ( 0%) sys   0.14 ( 0%) wall   
9284 kB ( 3%) ggc
 tree reassociation:   0.34 ( 0%) usr   0.00 ( 0%) sys   0.33 ( 0%) wall   
   0 kB ( 0%) ggc
 tree code sinking :   0.32 ( 0%) usr   0.00 ( 0%) sys   0.32 ( 0%) wall   
   0 kB ( 0%) ggc
 tree linearize phis   :   0.12 ( 0%) usr   0.00 ( 0%) sys   0.12 ( 0%) wall   
   0 kB ( 0%) ggc
 tree forward propagate:   0.10 ( 0%) usr   0.00 ( 0%) sys   0.09 ( 0%) wall   
   0 kB ( 0%) ggc
 tree conservative DCE :   1.13 ( 0%) usr   0.00 ( 0%) sys   1.11 ( 0%) wall   
   0 kB ( 0%) ggc
 tree aggressiv

[Bug gcov/profile/28480] [4.2 Regression] inliner-1.c:31: ICE: in set_bb_for_stmt, at tree-cfg.c:2775

2006-07-27 Thread hubicka at ucw dot cz



--- Comment #5 from hubicka at ucw dot cz  2006-07-27 16:06 ---
Subject: Re: [Bug gcov/profile/28480] [4.2 Regression] inliner-1.c:31: ICE: in
set_bb_for_stmt, at tree-cfg.c:2775

Hi,
it is hitting a sanity check in set_bb_for_stmt that is a bit insane in this
context.  I am testing a patch to inline the necessary parts of
set_bb_for_stmt (because of the quadratic time issue it is time critical too,
and we need very little of the function itself).

Index: tree-cfg.c
===
*** tree-cfg.c  (revision 115775)
--- tree-cfg.c  (working copy)
*** tree_split_block (basic_block bb, void *
*** 4203,4209 
new_bb->stmt_list = tsi_split_statement_list_before (&bsi.tsi);
for (tsi_tgt = tsi_start (new_bb->stmt_list);
 !tsi_end_p (tsi_tgt); tsi_next (&tsi_tgt))
! set_bb_for_stmt (tsi_stmt (tsi_tgt), new_bb);

return new_bb;
  }
--- 4205,4218 
new_bb->stmt_list = tsi_split_statement_list_before (&bsi.tsi);
for (tsi_tgt = tsi_start (new_bb->stmt_list);
 !tsi_end_p (tsi_tgt); tsi_next (&tsi_tgt))
! {
!   tree stmt = tsi_stmt (tsi_tgt);
! 
!   get_stmt_ann (stmt)->bb = new_bb;
!   if (TREE_CODE (stmt) == LABEL_EXPR)
! VEC_replace (basic_block, label_to_block_map,
!LABEL_DECL_UID (LABEL_EXPR_LABEL (stmt)), bb);
! }

return new_bb;
  }


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28480

[Bug rtl-optimization/28071] [4.1/4.2 regression] A file that can not be compiled in reasonable time/space

2006-07-28 Thread hubicka at ucw dot cz



--- Comment #32 from hubicka at ucw dot cz  2006-07-28 09:41 ---
Subject: Re:  [4.1/4.2 regression] A file that can not be compiled in
reasonable time/space

Hi,
I've added this testcase to our memory regression tester (see the
gcc-regression mailing list), so hopefully the quadratic memory
consumption issues will be tracked now.  It would be nice to have a
runtime benchmark variant of the test so we can track the runtime and
compilation time.  It seems to uncover quite interesting behaviours
across the compiler.

Honza


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28071

[Bug target/27827] [4.0/4.1 Regression] gcc 4 produces worse x87 code on all platforms than gcc 3

2006-08-07 Thread hubicka at ucw dot cz



--- Comment #47 from hubicka at ucw dot cz  2006-08-08 06:28 ---
Subject: Re:  [4.0/4.1 Regression] gcc 4 produces worse x87 code on all
platforms than gcc 3

> In x86/x86-64 world one can be almost sure that the load+execute instruction
> pair will execute (marginaly to noticeably) faster than move+load-and-execute
> instruction pair as the more complex instructions are harder for on-chip
> scheduling (they retire later).
   ^^^ retirement filling up the scheduler
   easily.
> Perhaps we can move such a transformation somewhere more generically perhaps 
> to
> post-reload copyprop?
> 
> Honza


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27827

[Bug middle-end/28071] [4.1/4.2 regression] A file that can not be compiled in reasonable time/space

2006-08-18 Thread hubicka at ucw dot cz



--- Comment #37 from hubicka at ucw dot cz  2006-08-18 23:10 ---
Subject: Re:  [4.1/4.2 regression] A file that can not be compiled in
reasonable time/space

Hi,
to summarize current progress, the memory consumption seems to be under
control now:

comparing PR rtl-optimization/28071 testcase compilation at -O0 level:
  Overall memory allocated via mmap and sbrk decreased from 146456k to 134136k,
overall -9.18%
  Peak amount of GGC memory allocated before garbage collecting run decreased
from 95412k to 81628k, overall -16.89%
  Amount of produced GGC garbage decreased from 163295k to 143524k, overall
-13.77%
Overall memory needed: 146456k -> 134136k
Peak memory use before GGC: 95412k -> 81628k
Peak memory use after GGC: 58507k
Maximum of released memory in single GGC run: 45493k
Garbage: 163295k -> 143524k
Leak: 7142k
Overhead: 29023k -> 25103k
GGC runs: 87

comparing PR rtl-optimization/28071 testcase compilation at -O1 level:
Overall memory needed: 430308k -> 424700k
Peak memory use before GGC: 201177k
Peak memory use after GGC: 196173k
Maximum of released memory in single GGC run: 100203k -> 95156k
Garbage: 279198k -> 271636k
Leak: 47195k
Overhead: 31459k -> 29952k
GGC runs: 105

comparing PR rtl-optimization/28071 testcase compilation at -O2 level:
Overall memory needed: 350424k -> 344820k
Peak memory use before GGC: 208293k
Peak memory use after GGC: 196536k
Maximum of released memory in single GGC run: 101565k -> 96536k
Garbage: 394891k -> 387353k
Leak: 47778k
Overhead: 49054k -> 47552k
GGC runs: 111

comparing PR rtl-optimization/28071 testcase compilation at -O3 -fno-tree-pre
-fno-tree-fre level:
Overall memory needed: 535696k -> 536260k
Peak memory use before GGC: 314602k
Peak memory use after GGC: 292946k
Maximum of released memory in single GGC run: 163430k
Garbage: 494953k -> 486928k
Leak: 65110k
Overhead: 60330k -> 58798k
GGC runs: 100

I will post a short summary of the remaining bottlenecks at each optimization
level.

Honza


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28071

[Bug middle-end/28071] [4.1/4.2 regression] A file that can not be compiled in reasonable time/space

2006-08-18 Thread hubicka at ucw dot cz



--- Comment #38 from hubicka at ucw dot cz  2006-08-19 00:19 ---
Subject: Re:  [4.1/4.2 regression] A file that can not be compiled in
reasonable time/space

At -O0 we get time sinks:
 life analysis         :   0.75 (10%) usr   0.01 ( 3%) sys   0.78 ( 9%) wall    2714 kB ( 4%) ggc
 expand                :   1.46 (15%) usr   0.04 (11%) sys   1.66 (15%) wall   37656 kB (58%) ggc
 local alloc           :   1.40 (14%) usr   0.04 (11%) sys   1.45 (13%) wall    1293 kB ( 2%) ggc
 global alloc          :   3.55 (36%) usr   0.05 (14%) sys   3.67 (34%) wall    7509 kB (12%) ggc
 final                 :   0.96 (10%) usr   0.04 (11%) sys   1.00 ( 9%) wall    1157 kB ( 2%) ggc
 TOTAL                 :   9.95             0.35            10.77              64543 kB

Expand seems reasonable given that almost everything is a call, which has a
long representation.

Global alloc is copying a significant portion of the insn stream because of:

  /* If we aren't replacing things permanently and we changed something,
 make another copy to ensure that all the RTL is new.  Otherwise
 things can go wrong if find_reload swaps commutative operands
 and one is inside RTL that has been copied while the other is not.  */
  new_body = old_body;
  if (! replace)
{
  new_body = copy_insn (old_body);
  if (REG_NOTES (insn))
REG_NOTES (insn) = copy_insn_1 (REG_NOTES (insn));
}

and a few other occurrences of copy_insn in reload1.c.  They seem to copy
quite a lot of unnecessary RTL "just to be sure".  Also, virtual register
elimination produces a lot of duplicated RTL; perhaps it can be cached?

Global alloc also spends 50% of its time clearing out
reg_has_output_reload.  I am testing a patch that fixes that.

 global alloc          :   1.51 (19%) usr   0.07 (20%) sys   1.60 (18%) wall    7509 kB (12%) ggc

Final is spending all its time in shorten branches, which is not needed
at all.

Honza


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28071

[Bug middle-end/28071] [4.1/4.2 regression] A file that can not be compiled in reasonable time/space

2006-08-18 Thread hubicka at ucw dot cz



--- Comment #39 from hubicka at ucw dot cz  2006-08-19 01:51 ---
Subject: Re:  [4.1/4.2 regression] A file that can not be compiled in
reasonable time/space

The -O1 time sinks:

 life analysis         :  25.44 (19%) usr   0.00 ( 0%) sys  25.49 (17%) wall    2565 kB ( 2%) ggc
 inline heuristics     :  14.92 (11%) usr   0.00 ( 0%) sys  14.95 (10%) wall    1486 kB ( 1%) ggc
 integration           :  20.73 (15%) usr   0.10 ( 4%) sys  22.72 (15%) wall   33445 kB (20%) ggc
 tree SSA to normal    :  27.97 (20%) usr   0.04 ( 2%) sys  28.13 (19%) wall      17 kB ( 0%) ggc
 expand                :   2.56 ( 2%) usr   0.04 ( 2%) sys   2.67 ( 2%) wall   24100 kB (14%) ggc
 local alloc           :   7.21 ( 5%) usr   0.03 ( 1%) sys   7.18 ( 5%) wall    1855 kB ( 1%) ggc
 global alloc          :  11.76 ( 9%) usr   0.99 (39%) sys  17.71 (12%) wall   11029 kB ( 6%) ggc
 reload CSE regs       :   7.91 ( 6%) usr   0.02 ( 1%) sys   7.97 ( 5%) wall    2393 kB ( 1%) ggc
 TOTAL                 : 136.62             2.56           148.01             170448 kB

tree SSA to normal spends most of its time in find_value_in_list because TER
is shuffling around singly linked lists in a quadratic way.  I got
quickly lost in the logic there.  Andrew, can you take a look, please?

integration runs into the quadratic behaviour of cgraph_edge.  Implementing
a hashtable for large cgraphs is easy, I will do so.  Also the
tree_split_block quadratic behaviour hits us here.
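
To make the hashtable idea concrete, here is a rough sketch of a lookup
keyed by the call statement, replacing a walk over the whole callee list.
Everything in it (struct call_edge, struct edge_table, the fixed table size
and the hash function) is a hypothetical illustration, not the
implementation that went into cgraph.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical call edge; not GCC's struct cgraph_edge.  */
struct call_edge
{
  const void *call_stmt;   /* key: address of the call statement */
};

#define EDGE_TABLE_SIZE 64   /* power of two, fixed for the sketch */

struct edge_table
{
  struct call_edge *slot[EDGE_TABLE_SIZE];
  size_t used;
};

/* Mix the pointer bits a little and reduce to a slot index.  */
static size_t
hash_stmt (const void *stmt)
{
  uintptr_t h = (uintptr_t) stmt;
  h ^= h >> 7;
  return (size_t) (h & (EDGE_TABLE_SIZE - 1));
}

/* Insert E using linear probing.  A real table would grow instead of
   asserting; one slot is always left empty so lookups terminate.  */
static void
edge_table_insert (struct edge_table *t, struct call_edge *e)
{
  assert (t->used < EDGE_TABLE_SIZE - 1);
  size_t i = hash_stmt (e->call_stmt);
  while (t->slot[i] != NULL)
    i = (i + 1) & (EDGE_TABLE_SIZE - 1);
  t->slot[i] = e;
  t->used++;
}

/* Return the edge for STMT, or NULL if there is none.  Expected O(1)
   instead of a linear walk over all callees of the function.  */
static struct call_edge *
edge_table_lookup (const struct edge_table *t, const void *stmt)
{
  size_t i = hash_stmt (stmt);
  while (t->slot[i] != NULL)
    {
      if (t->slot[i]->call_stmt == stmt)
        return t->slot[i];
      i = (i + 1) & (EDGE_TABLE_SIZE - 1);
    }
  return NULL;
}
```

With a list walk, looking up every call of a function with N calls costs
O(N^2) in total; with a table like this the expected total is O(N), which
is why it pays off only for large call graphs.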

reload CSE regs has a hard time tracking all the stack slot memory
locations.  It is working harder than needed because a lot of memories
are believed to be aliasing even though theoretically almost everything comes
from SRA and has no address taken, so it should have unique alias sets.

Life analysis spends most of its time in the dead store removal code.  Again
lowering --param might help.  I am also testing a little patch to cut it
to 13 seconds by speeding up reg_overlap_mentioned_p.  It would be
interesting to see how the dataflow branch scores here.

inline heuristics spends most of its time checking the inline_function_growth
limit, I will need to think about it a bit.

Honza


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28071

[Bug middle-end/28071] [4.1/4.2 regression] A file that can not be compiled in reasonable time/space

2006-08-19 Thread hubicka at ucw dot cz



--- Comment #41 from hubicka at ucw dot cz  2006-08-20 00:58 ---
Subject: Re:  [4.1/4.2 regression] A file that can not be compiled in
reasonable time/space

Thank you for the consideration,
Live on entry/exit code shows up high in -O3 compilation time too
(something like 30% of time without PRE/FRE I believe).  So if it is a
self-contained change, perhaps pushing it to mainline as a PR fix would
make sense.

Honza


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28071

[Bug c/28744] externally_visible attribute not effective with prior declaration of symbol.

2006-08-20 Thread hubicka at ucw dot cz



--- Comment #11 from hubicka at ucw dot cz  2006-08-20 12:42 ---
Subject: Re:  externally_visible attribute not effective with prior declaration
of symbol.

> Is there any reason why process_function_and_variable_attributes is called
> at the end of each TU rather than when all TUs were already parsed?

The reason is that we do unreachable function removal after each unit
(to conserve memory) and for that we do need to process USED attributes
(not really externally_visible, as those are used only in
cgraph_optimize).  Keeping the handling of USED attributes on a per-TU basis,
while moving externally_visible to a global basis, would not completely
work, since USED attributes in whole program mode can be used for public
variables too, pretty much as externally_visible is used in the
testcase.

I guess the only solution is to process all TU-local objects at the end of
each TU and all global objects at the cgraph_optimize stage.  I will
post the patch for this.
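
A minimal sketch of that local/global split, not the actual cgraph code.
The struct and both function names are hypothetical; the three flags only
model TREE_PUBLIC, DECL_COMDAT and DECL_EXTERNAL, and the predicate follows
the function-node test that the later patch in this PR uses.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the decl flags the split depends on.  */
struct decl_flags
{
  bool is_public;    /* models TREE_PUBLIC */
  bool is_comdat;    /* models DECL_COMDAT */
  bool is_external;  /* models DECL_EXTERNAL */
};

/* True when the object must wait for the whole-program pass.  */
bool
is_global_object (const struct decl_flags *d)
{
  assert (d != 0);
  return d->is_comdat || (d->is_public && !d->is_external);
}

/* True when the object is handled in the given pass: the per-TU pass
   (global_pass == false) or the final pass (global_pass == true).  */
bool
process_in_pass (const struct decl_flags *d, bool global_pass)
{
  return global_pass == is_global_object (d);
}
```

With this shape every object is handled in exactly one of the two passes.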

Thank you for looking into those dead ends!
Honza


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28744

[Bug debug/26881] [4.1/4.2 Regression] internal compiler error in dwarf2out_finish

2006-08-20 Thread hubicka at ucw dot cz



--- Comment #18 from hubicka at ucw dot cz  2006-08-20 12:47 ---
Subject: Re:  [4.1/4.2 Regression] internal compiler error in dwarf2out_finish

> (In reply to comment #14)
> Any news on the patch?

Sadly we are seeing just the tip of the iceberg here.  The patch to defer
output of debug symbols until later sort of works, but I noticed there are
other PRs related to the problem where an optimized-out static variable is
still referred to by debug info, so I attempted to move the debug output
code to the cgraph domain and failed to do so.  The problem is that we are
quite inconsistent in the way we handle optimized-out variables.  In
some cases we emit debug output for them, in others we don't, and
in others we ICE, depending on the case and frontend.

I guess I will back out and implement the deferring itself without
touching the whole issue for a start.  Then we probably ought to teach
the debug info output machinery to query cgraph about whether the particular
variable was output or not, output the location or optimized-out info
accordingly, and at last move the debug output to cgraph (for both local and
external stuff, so we will need a new data structure in cgraph holding all
declarations in the program somehow, as this is for now maintained only by
the frontends).

Honza


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=26881

[Bug c/28744] externally_visible attribute not effective with prior declaration of symbol.

2006-08-20 Thread hubicka at ucw dot cz



--- Comment #13 from hubicka at ucw dot cz  2006-08-20 12:59 ---
Subject: Re:  externally_visible attribute not effective with prior declaration
of symbol.

> 
> If this is really true, then there are several bugs (in the FEs?) because 
> there
> are numerous occurances where referenced_vars_insert() is called with
> TREE_USED(to) == 0
> 
> Should there be an assertion that only TREE_USED() > 0 are valid targets for
> insertion in/after dfa?

I am not quite convinced there is necessarily a problem, since from the
frontend's point of view all public variables are automatically used, so
the whole thing matters only for the cgraph code, where we start to
differentiate -fwhole-program mode from non-whole-program...

Honza


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28744

[Bug c/28744] externally_visible attribute not effective with prior declaration of symbol.

2006-08-20 Thread hubicka at ucw dot cz



--- Comment #14 from hubicka at ucw dot cz  2006-08-20 13:12 ---
Subject: Re:  externally_visible attribute not effective with prior declaration
of symbol.

Hi,
this is the patch I am testing now.  Can you think of a way to break it
(again? :))
The whole thing is a lot more slippery than I would like it to be...

Honza

Index: cgraphunit.c
===
*** cgraphunit.c(revision 116257)
--- cgraphunit.c(working copy)
*** cgraph_analyze_function (struct cgraph_n
*** 965,975 
 is valid.

 So, we walk the nodes at the end of the translation unit, applying the
!attributes at that point.  */

  static void
  process_function_and_variable_attributes (struct cgraph_node *first,
!   struct cgraph_varpool_node
*first_var)
  {
struct cgraph_node *node;
struct cgraph_varpool_node *vnode;
--- 985,1002 
 is valid.

 So, we walk the nodes at the end of the translation unit, applying the
!attributes at that point. 
!
!The local variables needs to be walked on the end of each compilation unit
!(to allow dead function/variable removal), while the global variables
needs
!to be handled on the end of compilation to allow flags to be declared only
!in one of units.  The GLOBAL is used to specify whether local or global
!variables shall be processed.  */

  static void
  process_function_and_variable_attributes (struct cgraph_node *first,
!   struct cgraph_varpool_node
*first_var,
! bool global)
  {
struct cgraph_node *node;
struct cgraph_varpool_node *vnode;
*** process_function_and_variable_attributes
*** 977,982 
--- 1004,1012 
for (node = cgraph_nodes; node != first; node = node->next)
  {
tree decl = node->decl;
+   if (global != (DECL_COMDAT (decl)
+|| (TREE_PUBLIC (decl) && !DECL_EXTERNAL (decl))))
+   continue;
if (lookup_attribute ("used", DECL_ATTRIBUTES (decl)))
{
  mark_decl_referenced (decl);
*** process_function_and_variable_attributes
*** 1000,1005 
--- 1030,1037 
for (vnode = cgraph_varpool_nodes; vnode != first_var; vnode = vnode->next)
  {
tree decl = vnode->decl;
+   if (global != (DECL_COMDAT (decl) || TREE_PUBLIC (decl)))
+   continue;
if (lookup_attribute ("used", DECL_ATTRIBUTES (decl)))
{
  mark_decl_referenced (decl);
*** cgraph_finalize_compilation_unit (void)
*** 1052,1058 
  }

timevar_push (TV_CGRAPH);
!   process_function_and_variable_attributes (first_analyzed,
first_analyzed_var);
cgraph_varpool_analyze_pending_decls ();
if (cgraph_dump_file)
  {
--- 1085,1092 
  }

timevar_push (TV_CGRAPH);
!   process_function_and_variable_attributes (first_analyzed,
first_analyzed_var,
!   false);
cgraph_varpool_analyze_pending_decls ();
if (cgraph_dump_file)
  {
*** cgraph_optimize (void)
*** 1505,1512 

timevar_push (TV_CGRAPHOPT);
if (!quiet_flag)
! fprintf (stderr, "Performing intraprocedural optimizations\n");

cgraph_function_and_variable_visibility ();
if (cgraph_dump_file)
  {
--- 1540,1548 

timevar_push (TV_CGRAPHOPT);
if (!quiet_flag)
! fprintf (stderr, "Performing interprocedural optimizations\n");

+   process_function_and_variable_attributes (NULL, NULL, true);
cgraph_function_and_variable_visibility ();
if (cgraph_dump_file)
  {


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28744

[Bug middle-end/28071] [4.1/4.2 regression] A file that can not be compiled in reasonable time/space

2006-08-20 Thread hubicka at ucw dot cz



--- Comment #44 from hubicka at ucw dot cz  2006-08-21 02:59 ---
Subject: Re:  [4.1/4.2 regression] A file that can not be compiled in
reasonable time/space

Hi,
an update at -O1 a few patches later (different machine with "only" 500MB
of RAM, so some swapping occurs, but we almost fit now):
 life analysis     :  23.50 (20%) usr   0.00 ( 0%) sys  23.51 (17%) wall    2565 kB ( 2%) ggc
 inline heuristics :   0.60 ( 1%) usr   0.00 ( 0%) sys   0.60 ( 0%) wall    1561 kB ( 1%) ggc
 integration       :   5.75 ( 5%) usr   0.04 ( 2%) sys   5.79 ( 4%) wall   33701 kB (20%) ggc
 tree SSA rewrite  :   0.51 ( 0%) usr   0.01 ( 1%) sys   0.53 ( 0%) wall   17087 kB (10%) ggc
 tree SRA          :   0.98 ( 1%) usr   0.08 ( 4%) sys   1.10 ( 1%) wall   24835 kB (15%) ggc
 tree SSA to normal:  45.11 (39%) usr   0.02 ( 1%) sys  45.14 (33%) wall      17 kB ( 0%) ggc
 local alloc       :   5.82 ( 5%) usr   0.01 ( 1%) sys   5.85 ( 4%) wall    1855 kB ( 1%) ggc
 global alloc      :   9.83 ( 8%) usr   0.76 (39%) sys  23.49 (17%) wall   11029 kB ( 6%) ggc
 reload CSE regs   :   7.30 ( 6%) usr   0.03 ( 2%) sys  10.16 ( 7%) wall    2393 kB ( 1%) ggc
 TOTAL             : 116.65             1.96           136.52             170783 kB
Life analysis is almost completely the code tracking dead stores after
reload (we have many stack slots).  Tree SSA to normal is the SRA
problem discussed, integration is split_block, global alloc allocates
a very huge conflict matrix, and reload CSE regs has a similar problem
tracking memories.  No idea about local alloc.

Honza


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28071

[Bug middle-end/28071] [4.1/4.2 regression] A file that can not be compiled in reasonable time/space

2006-08-21 Thread hubicka at ucw dot cz



--- Comment #45 from hubicka at ucw dot cz  2006-08-21 12:56 ---
Subject: Re:  [4.1/4.2 regression] A file that can not be compiled in
reasonable time/space

Hi,
-O2 times:
Execution times (seconds)
 life analysis     :  18.08 ( 3%) usr   0.04 ( 1%) sys  19.42 ( 3%) wall    1120 kB ( 0%) ggc
 integration       :   5.97 ( 1%) usr   0.07 ( 2%) sys   6.13 ( 1%) wall   33701 kB ( 8%) ggc
 tree PRE          : 233.01 (43%) usr   0.46 (13%) sys 241.22 (37%) wall   19480 kB ( 5%) ggc
 tree SSA to normal:  51.26 ( 9%) usr   0.07 ( 2%) sys  52.62 ( 8%) wall      22 kB ( 0%) ggc
 expand            :   2.62 ( 0%) usr   0.07 ( 2%) sys   9.45 ( 1%) wall   24095 kB ( 6%) ggc
 PRE               :  20.39 ( 4%) usr   0.05 ( 1%) sys  21.70 ( 3%) wall     200 kB ( 0%) ggc
 regmove           :  97.32 (18%) usr   0.17 ( 5%) sys 107.36 (16%) wall       2 kB ( 0%) ggc
 local alloc       :   6.28 ( 1%) usr   0.00 ( 0%) sys   6.29 ( 1%) wall    1480 kB ( 0%) ggc
 global alloc      :  13.12 ( 2%) usr   0.71 (21%) sys  62.79 (10%) wall   13764 kB ( 3%) ggc
 reload CSE regs   :  16.16 ( 3%) usr   0.02 ( 1%) sys  19.21 ( 3%) wall    4783 kB ( 1%) ggc
 scheduling 2      :  60.85 (11%) usr   0.57 (17%) sys  67.94 (10%) wall  206199 kB (51%) ggc
 TOTAL             : 547.14             3.41           651.49             404467 kB

Danny has a fix for PRE scheduled for 4.2.  Regmove again hits the same
predicate function since we now produce big basic blocks.  This can be
fixed rather easily, either by limiting the walk in that predicate or by
assigning INSNs consecutive indexes.  Scheduling has a problem moving
around linked lists of dependencies, and fixing it seems to require going
away from log links, so it is a 4.2 issue too.

One detail that just came to mind...  All memory consumed in scheduling
is log links.  Producing 206MB of them for a 24MB function is rather
dense.  Can't we prune them somewhat, perhaps by accounting for
transitivity (at least in special cases)?  The instructions are
mostly independent, but we apparently lose track of that fact
somewhere and end up producing an almost complete tournament.
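
A minimal sketch of pruning by transitivity, assuming the dependencies can be
modelled as a plain boolean DAG over instructions numbered in program order.
The function name and the fixed-size matrix are hypothetical; real scheduler
dependencies also carry kinds and latencies that this ignores, so it only
illustrates the idea rather than anything safe to apply as-is.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_INSNS 8

/* dep[i][j] is true when insn j depends directly on insn i.  Edges only
   go forward (i < j), so the graph is acyclic.  Removes every edge that
   is implied by a longer path and returns the number of edges removed.  */
int
prune_transitive_deps (bool dep[MAX_INSNS][MAX_INSNS], int n)
{
  bool reach[MAX_INSNS][MAX_INSNS];
  int removed = 0;
  int i, j, k;

  assert (n >= 0 && n <= MAX_INSNS);

  /* Transitive closure (Warshall): reach[i][j] iff a path i -> j exists.  */
  for (i = 0; i < n; i++)
    for (j = 0; j < n; j++)
      reach[i][j] = dep[i][j];
  for (k = 0; k < n; k++)
    for (i = 0; i < n; i++)
      for (j = 0; j < n; j++)
        if (reach[i][k] && reach[k][j])
          reach[i][j] = true;

  /* An edge i -> j is redundant when some other insn k lies between.  */
  for (i = 0; i < n; i++)
    for (j = 0; j < n; j++)
      if (dep[i][j])
        for (k = 0; k < n; k++)
          if (k != i && k != j && reach[i][k] && reach[k][j])
            {
              dep[i][j] = false;
              removed++;
              break;
            }
  return removed;
}
```

For a chain 0 -> 1 -> 2 with an extra direct edge 0 -> 2, only the direct
edge is dropped.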

Honza


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28071

[Bug middle-end/28071] [4.1/4.2 regression] A file that can not be compiled in reasonable time/space

2006-08-21 Thread hubicka at ucw dot cz



--- Comment #46 from hubicka at ucw dot cz  2006-08-21 17:11 ---
Subject: Re:  [4.1/4.2 regression] A file that can not be compiled in
reasonable time/space

Hi,
for completeness, the -O3 -fno-tree-pre -fno-tree-fre results
(tree-pre/fre need a little over 2GB of RAM to converge)

Execution times (seconds)
 garbage collection    :   1.11 ( 1%) usr   0.07 ( 2%) sys   8.57 ( 5%) wall       0 kB ( 0%) ggc
 life analysis         :   5.47 ( 4%) usr   0.12 ( 3%) sys   5.63 ( 3%) wall    2701 kB ( 1%) ggc
 life info update      :   2.05 ( 2%) usr   0.00 ( 0%) sys   2.10 ( 1%) wall     643 kB ( 0%) ggc
 integration           :   8.36 ( 7%) usr   0.18 ( 5%) sys   8.61 ( 5%) wall   79611 kB (27%) ggc
 tree CFG cleanup      :   3.69 ( 3%) usr   0.00 ( 0%) sys   3.77 ( 2%) wall      20 kB ( 0%) ggc
 tree alias analysis   :   2.64 ( 2%) usr   0.25 ( 6%) sys   3.01 ( 2%) wall       0 kB ( 0%) ggc
 tree SSA rewrite      :   2.17 ( 2%) usr   0.02 ( 1%) sys   2.22 ( 1%) wall   21589 kB ( 7%) ggc
 tree SSA incremental  :   4.04 ( 3%) usr   0.01 ( 0%) sys   4.10 ( 2%) wall    1061 kB ( 0%) ggc
 tree operand scan     :   1.54 ( 1%) usr   0.54 (14%) sys   1.95 ( 1%) wall   18382 kB ( 6%) ggc
 dominator optimization:   2.49 ( 2%) usr   0.06 ( 2%) sys   2.61 ( 1%) wall   11262 kB ( 4%) ggc
 tree SRA              :   3.04 ( 2%) usr   0.08 ( 2%) sys   3.12 ( 2%) wall   25600 kB ( 9%) ggc
 tree SSA to normal    :  38.17 (31%) usr   0.09 ( 2%) sys  38.56 (21%) wall   11214 kB ( 4%) ggc
 dominance computation :   2.40 ( 2%) usr   0.05 ( 1%) sys   2.52 ( 1%) wall       0 kB ( 0%) ggc
 expand                :   4.22 ( 3%) usr   0.20 ( 5%) sys  11.38 ( 6%) wall   35690 kB (12%) ggc
 global alloc          :  13.43 (11%) usr   1.28 (32%) sys  54.13 (29%) wall    5873 kB ( 2%) ggc
 flow 2                :   0.37 ( 0%) usr   0.01 ( 0%) sys   0.78 ( 0%) wall    5092 kB ( 2%) ggc
 TOTAL                 : 123.25             3.98           183.52             291674 kB

Note that the testcase is very different at -O3, because min/max
functions are inlined, breaking gigantic basic blocks into a number of
small BBs, so many of the bottlenecks visible at -O2 go away.  I don't know
what happens in global alloc; tree SSA to normal is the
live_on_entry/live_on_exit issue discussed.  We also have problems with very
deep recursion levels, as the dominator tree is deep.  I am thinking about
implementing iterators for walking in dom order, as the current full-blown
domtree walker is a bit awkward in some cases.
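
A minimal sketch of a non-recursive walk in dominator pre-order, using an
explicit stack so that a deep tree cannot exhaust the call stack.  The
first_child/next_sibling encoding and all names here are hypothetical and are
not GCC's dominance data structures.

```c
#include <assert.h>

#define MAX_BLOCKS 64

/* Hypothetical dominator tree encoding; -1 means "none".  */
struct dom_tree
{
  int first_child[MAX_BLOCKS];
  int next_sibling[MAX_BLOCKS];
};

/* Stores the blocks dominated by ROOT (including ROOT) into OUT in
   pre-order and returns how many were stored.  Uses no recursion.  */
int
dom_preorder (const struct dom_tree *t, int root, int out[MAX_BLOCKS])
{
  int stack[MAX_BLOCKS];
  int sp = 0, n = 0;

  assert (root >= 0 && root < MAX_BLOCKS);
  stack[sp++] = root;
  while (sp > 0)
    {
      int b = stack[--sp];
      out[n++] = b;
      /* Push the sibling first so that the child is popped first.  */
      if (b != root && t->next_sibling[b] >= 0)
        stack[sp++] = t->next_sibling[b];
      if (t->first_child[b] >= 0)
        stack[sp++] = t->first_child[b];
    }
  return n;
}
```

Each block is pushed at most once, so the stack never needs more than
MAX_BLOCKS entries regardless of how deep the tree is.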

With FRE/PRE enabled GGC also runs out of stack, because some
of the temporary values in annotations leak and make GGC recurse
insanely.

Honza


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28071

[Bug target/29401] [4.0/4.1/4.2 Regression] missed-optimization (in unneeded code elimination)

2006-10-15 Thread hubicka at ucw dot cz



--- Comment #4 from hubicka at ucw dot cz  2006-10-15 22:20 ---
Subject: Re:  [4.0/4.1/4.2 Regression] missed-optimization (in unneeded code
elimination)

> (insn:TI 38 37 26 2 (parallel [
> (set (reg:SI 1 dx [+4 ])
> (ashiftrt:SI (reg:SI 1 dx [+4 ])
> (const_int 15 [0xf])))
> (clobber (reg:CC 17 flags))
> ]) 443 {*ashrsi3_1} (nil)
> (expr_list:REG_UNUSED (reg:CC 17 flags)
> (expr_list:REG_UNUSED (reg:SI 1 dx [+4 ])
> (nil
> 
> note the problematic partially dead DI ax:dx which flow does not handle,
> so the redundant instruction does not get deleted.  A peephole might be
> able to fix this until new dataflow maybe handles this case(?).

It seems to me that the instruction is completely dead from the post-reload
dataflow point of view (i.e. the return value is just eax and both sets are
correctly marked as unused).  Perhaps we just somehow managed to drop
post-reload DCE?

Honza


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29401

[Bug middle-end/29299] [4.2 Regression] gcc "used" attribute has no effect on local-scope static variables

2006-10-15 Thread hubicka at ucw dot cz



--- Comment #9 from hubicka at ucw dot cz  2006-10-15 22:33 ---
Subject: Re:  [4.2 Regression] gcc "used" attribute has no effect on local-scope
static variables

> Reopening because it is not fixed for non unit at a time mode (-O0 for C).
-O0 gets it right; only -O1 -fno-unit-at-a-time fails, but I am already
testing a patch for this.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29299

[Bug target/29512] compile time hog / deadloop.

2006-10-19 Thread hubicka at ucw dot cz



--- Comment #6 from hubicka at ucw dot cz  2006-10-19 19:36 ---
Subject: Re:  compile time hog / deadloop.

> 
> 
> --- Comment #5 from rguenth at gcc dot gnu dot org  2006-10-19 15:15 
> ---
> It _looks_ like we're needlessly recursing into the BINFOs in the i386 
> backend.

What that code is shooting for is to interpret the class hierarchy as a
union - each base class is walked and classified.
This is not exactly my area, but does the testcase really have such deep
nesting of base classes that this shows up in profiles?

Honza


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29512

[Bug target/29512] compile time hog / deadloop.

2006-10-19 Thread hubicka at ucw dot cz



--- Comment #9 from hubicka at ucw dot cz  2006-10-19 23:32 ---
Subject: Re:  compile time hog / deadloop.

Just for the record, we discussed this a bit on IRC.  I originally wrote
that loop copying logic from alias.c just to be sure that all the fields
from base classes are merged into the result.

It seems that TYPE_FIELDS should already contain all of them, and if this
is true, I think it is safe to drop the first loop as Richard suggests,
since we should not worry about other properties of the base class, like
aliasing does.

The merging does slightly more than just dumb merging of all fields into
the classes array.  For instance it might be convinced that something is
misaligned and dump the whole thing to memory, so I would prefer that
the patch is tested by comparing assembly of some non-trivial C++ code.
But looking at the function, I can't come up with a scenario where this
would change ABI behaviour.

Honza

PS: Thanks for looking into this, obviously my failure!


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29512

[Bug gcov-profile/30650] [4.3 Regression] ICE with -fprofile-use

2007-02-03 Thread hubicka at ucw dot cz



--- Comment #6 from hubicka at ucw dot cz  2007-02-03 21:55 ---
Subject: Re:  [4.3 Regression] ICE with -fprofile-use

>   size = ((histogram->hvalue.counters[0]
>   + histogram->hvalue.counters[0] / 2)
> -  / histogram->hvalue.counters[0]);
> +  / histogram->hvalue.counters[1]);
> 
> micha suggested you meant 
> 
>   size = ((histogram->hvalue.counters[0]
>   + histogram->hvalue.counters[1] / 2)
>/ histogram->hvalue.counters[1]);
> 
> (upward rounding)

Ah, yes, thanks!  I probably should've scheduled updating this patch for
mainline after the trip as I didn't do particularly good work on it just
before leaving :(
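
A minimal sketch of the rounding in the corrected expression, assuming
counters[0] holds a sum and counters[1] a count (which the quoted code
suggests but does not state).  The function and parameter names are
hypothetical.

```c
#include <assert.h>

/* Divides SUM by COUNT, rounding to the nearest integer (halves round
   up) by adding half of the divisor before the truncating division.  */
long long
rounded_average (long long sum, long long count)
{
  assert (sum >= 0 && count > 0);
  return (sum + count / 2) / count;
}
```

By contrast, an expression of the form (x + x / 2) / x evaluates to 1 for
every positive x, so it cannot yield a useful size.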


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=30650

[Bug tree-optimization/31191] [4.3 Regression] 1000% Runtime regression for FreeFEM navier-stokes example

2007-03-15 Thread hubicka at ucw dot cz



--- Comment #2 from hubicka at ucw dot cz  2007-03-15 22:01 ---
Subject: Re:   New: [4.3 Regression] 1000% Runtime regression for FreeFEM
navier-stokes example

> The regression was introduced between r120825 (10s runtime) and r120846 (110s
> runtime).  The obvious candidate is:
> 
> +2007-01-16  Jan Hubicka  <[EMAIL PROTECTED]>
> +
> +   * cgraph.h (cgraph_decide_inlining_incrementally): Kill.
> +   * tree-pass.h: Reorder to make IPA passes appear toegher.
> +   (pass_early_inline, pass_inline_parameters, pass_apply_inline):
> +   Declare.
> +   * cgraphunit.c (cgraph_finalize_function): Do not compute inling
> +   parameters, do not call early inliner.
This patch enabled more inlining on many of C++ testcases with overall
win on C++ benchmark suite. I know this regression goes away with
http://gcc.gnu.org/ml/gcc-patches/2007-02/msg00318.html
(see Feb 4 results)

From a brief analysis I did some time ago, there are a number of functions
in freefem hitting aliasing grouping limits that didn't hit them before
the patch in January.

I believe the way it gets solved by simple DSE is just the fact
that, by eliminating these references early, we never build a function with
so many memory references that we lose track of them in the first place.
This seems to be quite common on other C++ testcases too, so I hope we
will find a way to do this in 4.3.

Hopefully Diego will get time to implement the AA before IPA so we can
get this without the extra pass.

Honza


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31191

[Bug target/27869] "-O -fregmove" handles SSE scalar instructions incorrectly

2007-04-06 Thread hubicka at ucw dot cz



--- Comment #7 from hubicka at ucw dot cz  2007-04-06 16:07 ---
Subject: Re:  "-O -fregmove" handles SSE scalar instructions incorrectly

> Investigating...
The attached patch to remove '%' seems correct to me.  The merge operation
wrapping the (commutative) plus/mult/min/max is not commutative, so '%'
is wrong.  Or am I missing something?
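
A minimal sketch of why the wrapped operation is not commutative, modelling an
SSE scalar add in the style of addss on a four-float vector: lane 0 gets the
sum, the upper lanes are taken from the first operand.  The struct and
function names are hypothetical and this is plain C, not the machine
description pattern.

```c
#include <assert.h>

struct vec4
{
  float e[4];
};

/* Lane 0 is a + b; lanes 1..3 are merged in from the first operand.  */
struct vec4
scalar_add (struct vec4 a, struct vec4 b)
{
  struct vec4 r = a;
  r.e[0] = a.e[0] + b.e[0];
  return r;
}

/* Swapping the operands keeps lane 0 but changes the upper lanes.  */
void
demo_noncommutative (void)
{
  struct vec4 a = { { 1.0f, 10.0f, 20.0f, 30.0f } };
  struct vec4 b = { { 2.0f, 40.0f, 50.0f, 60.0f } };

  assert (scalar_add (a, b).e[0] == scalar_add (b, a).e[0]);
  assert (scalar_add (a, b).e[1] != scalar_add (b, a).e[1]);
}
```

The inner addition commutes, but the result as a whole depends on which
operand supplies the untouched lanes.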

Honza


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27869

[Bug target/27869] "-O -fregmove" handles SSE scalar instructions incorrectly

2007-04-06 Thread hubicka at ucw dot cz



--- Comment #9 from hubicka at ucw dot cz  2007-04-06 17:01 ---
Subject: Re:  "-O -fregmove" handles SSE scalar instructions incorrectly

> 
> 
> --- Comment #8 from stevenb dot gcc at gmail dot com  2007-04-06 16:43 
> ---
> Subject: Re:  "-O -fregmove" handles SSE scalar instructions incorrectly
> 
> > The attached patch to remove '%' seems correct to me.  Merge operating
> > wrapping the (commutative) plus/mult/min/max is not commutative, so '%'
> > is wrong.  Or am I missing something?
> 
> The commutative alternative asm output should also be removed.

I don't think there are alternative asm outputs, just intel variants,
unless I missed something.  The min/max commutative variant should be
removed however, I am testing the attached patch.

Honza


--- Comment #10 from hubicka at ucw dot cz  2007-04-06 17:01 ---
Created an attachment (id=13334)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=13334&action=view)


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27869

[Bug middle-end/28071] [4.1 regression] A file that can not be compiled in reasonable time/space

2007-04-17 Thread hubicka at ucw dot cz



--- Comment #66 from hubicka at ucw dot cz  2007-04-17 19:38 ---
Subject: Re:  [4.1 regression] A file that can not be compiled in reasonable
time/space

Just to add some explanation to the numbers, df_scan_ref_pool is 50MB,
the bitmaps quoted are 8MB each.  Given the nature of the testcase, I think
we are doing a satisfactory job at -O2.  At -O3 there are still problems
(at -O2 the testcase has one huge BB, at -O3 we have many BBs).  PRE explodes
completely and we need over 1.2GB for -O3 -fno-tree-pre -fno-tree-fre.
What is also killing us at -O3 are the bitmaps.
385MB:
df-problems.c:2951 (df_chain_create_bb)    40198  386574160  385195560  385195560  462958
200MB:
df-problems.c:984 (df_rd_alloc)            40198  385290320  208450840          0       0
110MB:
df-problems.c:985 (df_rd_alloc)            40198  201714640  110324160          0       0
tree-ssa-live.c:540 (new_tree_live_info)   31939  114031520  113098360          0   84523
tree-ssa-live.c:536 (new_tree_live_info)   31939  113096920  113092320          0   80895

Honza


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28071

[Bug tree-optimization/37315] [4.4 Regression]: gcc.c-torture/execute/931018-1.c int-compare.c ieee/inf-2.c mzero6.c

2008-09-02 Thread hubicka at ucw dot cz



--- Comment #5 from hubicka at ucw dot cz  2008-09-02 10:14 ---
Subject: Re:  [4.4 Regression]: gcc.c-torture/execute/931018-1.c  int-compare.c
ieee/inf-2.c mzero6.c

> Honza, why is tree-inline.c:initialize_cfun not calling
> allocate_struct_function and *then* change whatever elements need changing? 
> There's no comment to reveal the reason.  Now, you're just allocating a 
> cleared
> area and doing a shallow-copy, which causes the clone to have e.g. the same
> cfun->machine. Badness results.

Well, the code is not mine, but it was written at a time when struct_function
did hold a lot of extra stuff.  I will take a look.
Why do we allocate the MDEP parts of cfun so early?  I will try to defer it
to a later stage of compilation.

Honza


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37315

[Bug tree-optimization/37315] [4.4 Regression]: gcc.c-torture/execute/931018-1.c int-compare.c ieee/inf-2.c mzero6.c

2008-09-02 Thread hubicka at ucw dot cz



--- Comment #7 from hubicka at ucw dot cz  2008-09-02 20:29 ---
Subject: Re:  [4.4 Regression]: gcc.c-torture/execute/931018-1.c  int-compare.c
ieee/inf-2.c mzero6.c

> 
> 
> --- Comment #6 from hp at gcc dot gnu dot org  2008-09-02 10:41 ---
> (In reply to comment #5)
> 
> > Well, the code is not mine, but it was wirtten at a time struct_function
> > did hold a lot of extra stuff.
> 
> SVN blamed you for that code in tree-inline.c and the revision range is yours.

I've moved that code around from the versioning function and enabled it by
default.

I am attaching the patch I am testing now.  It makes versioning copy only
the fields needed, and I also noticed that we leak memory in the gimple_body
pointer.  This pointer points to the sequence holding the first basic block
after build_cfg, and when this block is removed it then points to a bogus
sequence.  Setting this pointer to 0 confuses some places that use
gimple_body to check for the presence of a function body.
I added a predicate for this and replaced the checks with cgraph's analyzed
flag, which will be needed for WHOPR anyway.

Index: cgraph.c
===
*** cgraph.c(revision 139886)
--- cgraph.c(working copy)
*** cgraph_create_edge (struct cgraph_node *
*** 631,637 

gcc_assert (is_gimple_call (call_stmt));

!   if (!gimple_body (callee->decl))
  edge->inline_failed = N_("function body not available");
else if (callee->local.redefined_extern_inline)
  edge->inline_failed = N_("redefined extern inline functions are not "
--- 631,637 

gcc_assert (is_gimple_call (call_stmt));

!   if (!callee->analyzed)
  edge->inline_failed = N_("function body not available");
else if (callee->local.redefined_extern_inline)
  edge->inline_failed = N_("redefined extern inline functions are not "
*** dump_cgraph_node (FILE *f, struct cgraph
*** 1059,1065 
  fprintf (f, " needed");
else if (node->reachable)
  fprintf (f, " reachable");
!   if (gimple_body (node->decl))
  fprintf (f, " body");
if (node->output)
  fprintf (f, " output");
--- 1059,1065 
  fprintf (f, " needed");
else if (node->reachable)
  fprintf (f, " reachable");
!   if (gimple_has_body_p (node->decl))
  fprintf (f, " body");
if (node->output)
  fprintf (f, " output");
Index: cgraphunit.c
===
*** cgraphunit.c(revision 139886)
--- cgraphunit.c(working copy)
*** verify_cgraph_node (struct cgraph_node *
*** 639,645 
  }

if (node->analyzed
-   && gimple_body (node->decl)
&& !TREE_ASM_WRITTEN (node->decl)
&& (!DECL_EXTERNAL (node->decl) || node->global.inlined_to))
  {
--- 639,644 
*** cgraph_analyze_functions (void)
*** 860,866 
  {
fprintf (cgraph_dump_file, "Initial entry points:");
for (node = cgraph_nodes; node != first_analyzed; node = node->next)
!   if (node->needed && gimple_body (node->decl))
  fprintf (cgraph_dump_file, " %s", cgraph_node_name (node));
fprintf (cgraph_dump_file, "\n");
  }
--- 859,865 
  {
fprintf (cgraph_dump_file, "Initial entry points:");
for (node = cgraph_nodes; node != first_analyzed; node = node->next)
!   if (node->needed)
  fprintf (cgraph_dump_file, " %s", cgraph_node_name (node));
fprintf (cgraph_dump_file, "\n");
  }
*** cgraph_analyze_functions (void)
*** 912,918 
  {
fprintf (cgraph_dump_file, "Unit entry points:");
for (node = cgraph_nodes; node != first_analyzed; node = node->next)
!   if (node->needed && gimple_body (node->decl))
  fprintf (cgraph_dump_file, " %s", cgraph_node_name (node));
fprintf (cgraph_dump_file, "\n\nInitial ");
dump_cgraph (cgraph_dump_file);
--- 911,917 
  {
fprintf (cgraph_dump_file, "Unit entry points:");
for (node = cgraph_nodes; node != first_analyzed; node = node->next)
!   if (node->needed)
  fprintf (cgraph_dump_file, " %s", cgraph_node_name (node));
fprintf (cgraph_dump_file, "\n\nInitial ");
dump_cgraph (cgraph_dump_file);
*** cgraph_analyze_functions (void)
*** 926,935 
tree decl = node->decl;
next = node->next;

!   if (node->local.finalized && !gimple_body (decl))
cgraph_reset_node (node);

!   if (!node->reachable && gimple_body (de

[Bug middle-end/37343] [4.4 Regression] ICE in expand_expr_real_1, at expr.c:7290

2008-09-03 Thread hubicka at ucw dot cz



--- Comment #4 from hubicka at ucw dot cz  2008-09-03 14:30 ---
Subject: Re:  [4.4 Regression] ICE in expand_expr_real_1, at expr.c:7290

Hi,
this is a switch conversion bug.  It attempts to convert the switch and
construct a static array with &function_parameter in the initializer, which
naturally is wrong.
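
A minimal sketch of the distinction, using hypothetical functions rather than
the testcase from the PR.  A switch that selects among link-time constants can
be replaced by a lookup in a static table, while a case that yields the
address of a parameter cannot, because that address differs on every call and
is not a valid static initializer.

```c
#include <assert.h>

/* Every case yields a constant, so a table lookup is equivalent.  */
int
weight_switch (int sel)
{
  int w;
  switch (sel)
    {
    case 0: w = 10; break;
    case 1: w = 20; break;
    case 2: w = 40; break;
    default: w = 0; break;
    }
  return w;
}

/* Hand-written form of what the conversion aims for.  */
int
weight_table (int sel)
{
  static const int table[3] = { 10, 20, 40 };
  return (sel >= 0 && sel < 3) ? table[sel] : 0;
}

/* Case 2 yields &param, which only exists while this call is active,
   so the three addresses cannot be placed in a static array.  */
int
pick_value (int sel, int param)
{
  static const int one = 1, two = 2;
  const int *p;
  switch (sel)
    {
    case 0: p = &one; break;
    case 1: p = &two; break;
    case 2: p = &param; break;
    default: p = &one; break;
    }
  return *p;
}

/* The switch and the table agree for in-range and out-of-range input.  */
void
check_equivalence (void)
{
  int sel;
  for (sel = -1; sel <= 3; sel++)
    assert (weight_switch (sel) == weight_table (sel));
}
```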

I am testing the following fix:

Index: tree-switch-conversion.c
===
--- tree-switch-conversion.c(revision 139938)
+++ tree-switch-conversion.c(working copy)
@@ -298,7 +298,7 @@ check_final_bb (void)

  if ((bb == info.switch_bb
   || (single_pred_p (bb) && single_pred (bb) == info.switch_bb))
- && !is_gimple_min_invariant (gimple_phi_arg_def (phi, i)))
+ && !is_gimple_ip_invariant (gimple_phi_arg_def (phi, i)))
{
  info.reason = "   Non-invariant value from a case\n";
  return false; /* Non-invariant argument.  */


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37343

[Bug tree-optimization/37345] [4.4 Regression] Segfault in decl_function_context (TYPE_MAIN_VARIANT)

2008-09-03 Thread hubicka at ucw dot cz



--- Comment #5 from hubicka at ucw dot cz  2008-09-03 18:33 ---
Subject: Re:  [4.4 Regression] Segfault in decl_function_context
(TYPE_MAIN_VARIANT)

Testing:

* tree.c (build_function_type_skip_args): Build distinct type copy;
set TYPE_CONTEXT.
(build_function_decl_skip_args): Set type of new decl not orig decl;
clear DECL_VINDEX for methods turned into functions.

Index: tree.c
===
--- tree.c  (revision 139938)
+++ tree.c  (working copy)
@@ -5925,7 +5925,12 @@ build_function_type_skip_args (tree orig
   TYPE_ARG_TYPES (new_type) = new_reversed;
 }
   else
-new_type = build_function_type (TREE_TYPE (orig_type), new_reversed);
+{
+  new_type
+= build_distinct_type_copy (build_function_type (TREE_TYPE
(orig_type),
+new_reversed));
+  TYPE_CONTEXT (new_type) = TYPE_CONTEXT (orig_type);
+}

   /* This is a new type, not a copy of an old type.  Need to reassociate
  variants.  We can handle everything except the main variant lazily.  */
@@ -5959,7 +5964,12 @@ build_function_decl_skip_args (tree orig
   new_type = TREE_TYPE (orig_decl);
   if (prototype_p (new_type))
 new_type = build_function_type_skip_args (new_type, args_to_skip);
-  TREE_TYPE (orig_decl) = new_type;
+  TREE_TYPE (new_decl) = new_type;
+
+  /* For declarations setting DECL_VINDEX (i.e. methods)
+ we expect first argument to be THIS pointer.   */
+  if (bitmap_bit_p (args_to_skip, 0))
+DECL_VINDEX (new_decl) = NULL_TREE;
   return new_decl;
 }



-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37345

[Bug c++/37057] 7 Internal Compiler Errors when compiling OpenFOAM-1.5

2008-09-14 Thread hubicka at ucw dot cz



--- Comment #12 from hubicka at ucw dot cz  2008-09-14 09:18 ---
Subject: Re:  7 Internal Compiler Errors when compiling OpenFOAM-1.5

> Honza,
> 
> I may not be analyzing this correctly, but it looks like
> cgraph_remove_unreachable_nodes() may be removing something that is not dead. 
> Is cgraph handling constructors and destructors on non-ELF systems correctly?

It ought to be.  I.e. as far as I remember, the constructors either
appear local but have DECL_STATIC_CONSTRUCTOR on ELF, or they are
externally visible functions with specially mangled names.
Perhaps there is yet another way to handle it?  They should be
recognized by the decide_is_function_needed predicate in cgraphunit.c.

Honza


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37057

[Bug c++/37057] 7 Internal Compiler Errors when compiling OpenFOAM-1.5

2008-09-14 Thread hubicka at ucw dot cz



--- Comment #13 from hubicka at ucw dot cz  2008-09-14 09:24 ---
Subject: Re:  7 Internal Compiler Errors when compiling OpenFOAM-1.5

Looking at the log, it seems to be another leak where multiple
declarations point to a single struct function.  This is of course quite
an evil bug with various side effects (surprisingly often the sharing just
works, but it is always a memory leak and tends to break various targets);
we already had instances of it in IPCP versioning and template
instantiation.  This is why I added an explicit ggc_free in the cgraph code
now.

I am just leaving for a US trip, so I am not sure how soon I will be able
to look, but debugging is quite easy.  You figure out the shared decls
(i.e. one is in the backtrace where the garbage collector crashes, the other
is the one whose struct function we call ggc_free on when removing it).  Then
set a breakpoint at the end of ggc_page with the condition that the result is
either of those addresses to see who builds them; the second one is the
wrong copy.

Honza


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37057

[Bug middle-end/34852] [4.3 Regression] Revision 131576 miscompiled 178.galgel

2008-01-21 Thread hubicka at ucw dot cz



--- Comment #12 from hubicka at ucw dot cz  2008-01-21 09:54 ---
Subject: Re:  [4.3 Regression]  Revision 131576 miscompiled 178.galgel

And the haydn tester using -O3 -funroll-loops -fpeel-loops -ffast-math
-march=native -mtune=native -mfpmath=sse has also started failing on the
17th, so this should rule out the extra precision theory :(

Honza


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34852

[Bug middle-end/34852] [4.3 Regression] Revision 131576 miscompiled 178.galgel

2008-01-21 Thread hubicka at ucw dot cz



--- Comment #11 from hubicka at ucw dot cz  2008-01-21 09:48 ---
Subject: Re:  [4.3 Regression]  Revision 131576 miscompiled 178.galgel

Hi,
also one extra data point: the britten tester, which uses -O3
-fomit-frame-pointer -ftree-loop-linear -funroll-all-loops
-fprefetch-loop-arrays -march=k8 -mfpmath=sse -ffast-math, is not getting
the failure on 32-bit runs...

Honza


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34852

[Bug middle-end/34852] [4.3 Regression] Revision 131576 miscompiled 178.galgel

2008-01-21 Thread hubicka at ucw dot cz



--- Comment #10 from hubicka at ucw dot cz  2008-01-21 09:44 ---
Subject: Re:  [4.3 Regression]  Revision 131576 miscompiled 178.galgel

> 
> 
> --- Comment #8 from hjl dot tools at gmail dot com  2008-01-21 03:09 
> ---
> Add -ffloat-store seems to fix the problem. I will verify it.

I also found that, but since -ffloat-store changes almost all register
allocation, it is difficult to tell whether it just hid the bug or not :(


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34852

[Bug middle-end/34852] [4.3 Regression] Revision 131576 miscompiled 178.galgel

2008-01-21 Thread hubicka at ucw dot cz



--- Comment #9 from hubicka at ucw dot cz  2008-01-21 09:43 ---
Subject: Re:  [4.3 Regression]  Revision 131576 miscompiled 178.galgel

> 
> 
> --- Comment #7 from hjl dot tools at gmail dot com  2008-01-20 16:43 
> ---
> Oops. This one

Yes, it does make sense.  I must've missed it, since I was updating
similar cases all around the file.
It should not be a code correctness issue, just code quality - we still
rely on REG_N_CALLS when asking if a register crosses a call, and just use
frequency to drive the decision on profitability of using a caller-save
register.

Thanks!  I will also look into the two regressions this afternoon unless
you beat me.

Honza


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34852

[Bug middle-end/34852] [4.3 Regression] Revision 131576 miscompiled 178.galgel

2008-01-21 Thread hubicka at ucw dot cz



--- Comment #15 from hubicka at ucw dot cz  2008-01-21 16:42 ---
Subject: Re:  [4.3 Regression]  Revision 131576 miscompiled 178.galgel

> I tried -mpc64. It also works.
I would declare this a proof that it is an extra precision issue and
simply update the testers to use -mpc64.  It is what most other compilers
do anyway and thus we would get more comparable scores. Thanks a lot for
testing it (I've scheduled the same test for tonight, but you've beaten me
to it.  I will still try whether it works for the -mfpmath=sse case too)

Honza


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34852

[Bug middle-end/34852] [4.3 Regression] Revision 131576 miscompiled 178.galgel

2008-01-22 Thread hubicka at ucw dot cz



--- Comment #18 from hubicka at ucw dot cz  2008-01-22 20:12 ---
Subject: Re:  [4.3 Regression]  Revision 131576 miscompiled 178.galgel

> So, we indeed think this issue is invalid, right?
I am convinced so. -mpc64 does not change code generation.  I messed up
the change of britten flags to include -mpc64 by default, so I am waiting
for tonight's runs to see if the failure is consistently gone.

Honza


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34852

[Bug target/34982] [4.3 regression] calling a function with undefined parameters causes segmentation fault at -O1 or higher

2008-01-26 Thread hubicka at ucw dot cz



--- Comment #5 from hubicka at ucw dot cz  2008-01-26 20:19 ---
Subject: Re:  [4.3 regression] calling a function with undefined parameters
causes segmentation fault at -O1 or higher

> and if it is just not available (i == NULL) might give inconsistent
> answers.

I will look into this.  cgraph_local_info used to trap when asked for
unavailable local info, looks like someone fixed the bug by removing the
assert.

Honza


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34982

[Bug target/34982] [4.3 regression] calling a function with undefined parameters causes segmentation fault at -O1 or higher

2008-01-27 Thread hubicka at ucw dot cz



--- Comment #6 from hubicka at ucw dot cz  2008-01-27 13:54 ---
Subject: Re:  [4.3 regression] calling a function with undefined parameters
causes segmentation fault at -O1 or higher

cgraph_local_info still behaves as expected, returning NULL when info is
not computed yet. Unfortunately a check to simply ignore it when not
available has been added to ix86_function_regparm, which makes this bug
lead to wrong code. (revision 123146)

There are two occurrences where we call ix86_function_regparm. The first one
is for compatibility checking; I would just declare it invalid - we
don't want the type compatibility to depend on a backend decision and I
think it is perfectly sane to reject any types specifying different
REGPARM values or where one specifies it and the other doesn't.

I am testing the attached patch and will commit it if it passes.

The other case is from the gimplifier; I am looking into it.  This
definitely has to go or we need to drop the feature :(

Honza

Index: config/i386/i386.c
===
--- config/i386/i386.c  (revision 131882)
+++ config/i386/i386.c  (working copy)
@@ -3148,6 +3148,7 @@ ix86_comp_type_attributes (const_tree ty
 {
   /* Check for mismatch of non-default calling convention.  */
   const char *const rtdstr = TARGET_RTD ? "cdecl" : "stdcall";
+  tree attr1, attr2;

   if (TREE_CODE (type1) != FUNCTION_TYPE
   && TREE_CODE (type1) != METHOD_TYPE)
@@ -3155,11 +3156,27 @@ ix86_comp_type_attributes (const_tree ty

   /* Check for mismatched fastcall/regparm types.  */
   if ((!lookup_attribute ("fastcall", TYPE_ATTRIBUTES (type1))
-   != !lookup_attribute ("fastcall", TYPE_ATTRIBUTES (type2)))
-  || (ix86_function_regparm (type1, NULL)
- != ix86_function_regparm (type2, NULL)))
+   != !lookup_attribute ("fastcall", TYPE_ATTRIBUTES (type2))))
 return 0;

+  /* We don't want to use ix86_function_regparm here: it's decision depends
+ on middle end information, like localness of functions.  Here we only want
+ to know if types are declared compatible.  */
+  attr1 = lookup_attribute ("regparm", TYPE_ATTRIBUTES (type1));
+  attr2 = lookup_attribute ("regparm", TYPE_ATTRIBUTES (type2));
+
+  if ((attr1 != NULL_TREE) != (attr2 != NULL_TREE))
+return 0;
+
+  if (attr1)
+{
+  int val1 = TREE_INT_CST_LOW (TREE_VALUE (TREE_VALUE (attr1)));
+  int val2 = TREE_INT_CST_LOW (TREE_VALUE (TREE_VALUE (attr2)));
+
+  if (val1 != val2)
+   return 0;
+}
+
   /* Check for mismatched sseregparm types.  */
   if (!lookup_attribute ("sseregparm", TYPE_ATTRIBUTES (type1))
   != !lookup_attribute ("sseregparm", TYPE_ATTRIBUTES (type2)))


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34982

[Bug target/34982] [4.3 regression] calling a function with undefined parameters causes segmentation fault at -O1 or higher

2008-01-27 Thread hubicka at ucw dot cz



--- Comment #8 from hubicka at ucw dot cz  2008-01-27 18:10 ---
Subject: Re:  [4.3 regression] calling a function with undefined parameters
causes segmentation fault at -O1 or higher

> One more reason to gimplify unit-at-a-time...

Yep, on the other hand there is probably not much need to get that
amount of architectural detail so early.  I am looking into what makes
the compilation diverge.

Honza


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34982

[Bug target/34982] [4.3 regression] calling a function with undefined parameters causes segmentation fault at -O1 or higher

2008-01-27 Thread hubicka at ucw dot cz



--- Comment #9 from hubicka at ucw dot cz  2008-01-27 19:24 ---
Subject: Re:  [4.3 regression] calling a function with undefined parameters
causes segmentation fault at -O1 or higher

However the failure here is not the early call of cgraph_local_info (it
is ugly, but harmless; we are just looking for target promoting rules
that we don't change).

The problem is the good old broken type system scenario: the forward
declaration has no prototype and thus might be vararg and thus it is not
regparmized, however the definition is correct. When expanding the call
we use the type of the call, so the wrong type.

I am testing the attached patch. My type merging code fixes this too and
obviously we should work harder on the maybe_vaarg rule for local
functions; this should make a lot of difference on K&R code (I wonder if
any is still around in the usual distro)

Honza

Index: config/i386/i386.c
===
*** config/i386/i386.c  (revision 131882)
--- config/i386/i386.c  (working copy)
*** init_cumulative_args (CUMULATIVE_ARGS *c
*** 3432,3437 
--- 3449,3455 
  rtx libname,  /* SYMBOL_REF of library name or 0 */
  tree fndecl)
  {
+   struct cgraph_local_info *i = fndecl ? cgraph_local_info (fndecl) : NULL;
memset (cum, 0, sizeof (*cum));

/* Set up the number of registers to use for passing arguments.  */
*** init_cumulative_args (CUMULATIVE_ARGS *c
*** 3442,3447 
--- 3460,3474 
  cum->mmx_nregs = MMX_REGPARM_MAX;
cum->warn_sse = true;
cum->warn_mmx = true;
+ 
+   /* Because type might mismatch in between caller and callee, we need to
+  use actual type of function for local calls.
+  FIXME: cgraph_analyze can be told to actually record if function uses
+  va_start so for local functions maybe_vaarg can be made aggressive
+  helping K&R code.
+  FIXME: once typesytem is fixed, we won't need this code anymore.  */
+   if (i && i->local)
+ fntype = TREE_TYPE (fndecl);
cum->maybe_vaarg = (fntype
  ? (!prototype_p (fntype) || stdarg_p (fntype))
  : !libname);
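
To make the mismatch concrete, here is a minimal stand-alone sketch.  It is
not GCC code: struct fn_type, maybe_vaarg_for and the two sample types are
hypothetical names that only model the maybe_vaarg rule visible in the patch
above, assuming a K&R-style forward declaration "int f ();" and a
prototyped definition "int f (int a, int b)".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a function type: only the two properties
   that the maybe_vaarg rule looks at.  */
struct fn_type
{
  bool has_prototype;   /* models prototype_p (fntype) */
  bool is_stdarg;       /* models stdarg_p (fntype) */
};

/* Models: fntype ? (!prototype_p (fntype) || stdarg_p (fntype)) : !libname.  */
bool
maybe_vaarg_for (const struct fn_type *fntype, bool is_libcall)
{
  if (fntype == NULL)
    return !is_libcall;
  return !fntype->has_prototype || fntype->is_stdarg;
}

/* Type seen at the call site: "int f ();" (no prototype).  */
const struct fn_type call_site_type = { false, false };
/* Type of the definition: "int f (int a, int b) { ... }".  */
const struct fn_type definition_type = { true, false };

/* True when the call-site type and the definition type give different
   answers, i.e. the situation the patch sidesteps by using the type of
   the definition for local functions.  */
bool
types_disagree (void)
{
  return maybe_vaarg_for (&call_site_type, false)
         != maybe_vaarg_for (&definition_type, false);
}

/* Expected results for the two sample types.  */
void
check_example (void)
{
  assert (maybe_vaarg_for (&call_site_type, false));
  assert (!maybe_vaarg_for (&definition_type, false));
}
```

With the unprototyped type the call is treated as possibly vararg, while the
definition is not, which is the disagreement between caller and callee
described above.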


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34982

[Bug middle-end/34969] [4.3 regression] ICE with -fipa-cp -ffast-math

2008-01-28 Thread hubicka at ucw dot cz



--- Comment #6 from hubicka at ucw dot cz  2008-01-28 20:51 ---
Subject: Re:  [4.3 regression] ICE with -fipa-cp -ffast-math

> No, I mean providing something like cgraph_update_edges_for_call_stmt (tree
> old, tree new); or alternatively cgraph_remove_edge_from_call_stmt () and
> cgraph_add_edge_from_call_stmt () and call those two unconditionally.

My strategy so far was to rebuild cgraph edges from scratch when needed
(that is, when something possibly changed).  We can probably handle that via
a function-local TODO flag here too.

Updating the edges across multiple passes is kind of slippery, since we
would need to tie cgraph a lot more closely to gimple, pretty much as we do
for CFG.  This seems too much of a pie-in-the-sky project with the current
organization of trees and folders; I hope that with tuples we will have
a much closer correspondence between actual statements and calls here.

Since we need to have edges up to date across the inliner, I guess the patch
is fine (as would be adding the TODO flag).  Thanks for looking into it!

Honza


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34969

[Bug target/34982] [4.3 regression] calling a function with undefined parameters causes segmentation fault at -O1 or higher

2008-01-29 Thread hubicka at ucw dot cz



--- Comment #11 from hubicka at ucw dot cz  2008-01-29 17:51 ---
Subject: Re:  [4.3 regression] calling a function with undefined parameters
causes segmentation fault at -O1 or higher

Hi,
the patch seems to pass my local testing, but on Zdenek's tester I get
curious results on i686:

Tests that now fail, but worked before: 

libmudflap.cth/pass37-frag.c (-O2) (rerun 14) execution test
libmudflap.cth/pass37-frag.c (-O2) (rerun 18) execution test
libmudflap.cth/pass37-frag.c (-O2) (rerun 18) output pattern test   
libmudflap.cth/pass37-frag.c (-O3) (rerun 2) execution test 
libmudflap.cth/pass37-frag.c (-O3) (rerun 2) output pattern test
libmudflap.cth/pass37-frag.c (-O3) (rerun 3) execution test 
libmudflap.cth/pass37-frag.c (-O3) (rerun 3) output pattern test
libmudflap.cth/pass37-frag.c (-static -DSTATIC) (rerun 10) execution test   
libmudflap.cth/pass37-frag.c (-static -DSTATIC) (rerun 16) execution test   
libmudflap.cth/pass37-frag.c (-static -DSTATIC) (rerun 16) output pattern test  
libmudflap.cth/pass37-frag.c (rerun 10) execution test  
libmudflap.cth/pass37-frag.c (rerun 10) output pattern test 
libmudflap.cth/pass37-frag.c (rerun 12) execution test
libmudflap.cth/pass37-frag.c (rerun 12) output pattern test 
libmudflap.cth/pass37-frag.c (rerun 13) execution test  
libmudflap.cth/pass37-frag.c (rerun 14) execution test  
libmudflap.cth/pass37-frag.c (rerun 14) output pattern test 
libmudflap.cth/pass37-frag.c (rerun 15) execution test  
libmudflap.cth/pass37-frag.c (rerun 17) execution test  
libmudflap.cth/pass37-frag.c (rerun 17) output pattern test 
libmudflap.cth/pass37-frag.c (rerun 2) execution test   
libmudflap.cth/pass37-frag.c (rerun 2) output pattern test  
libmudflap.cth/pass37-frag.c (rerun 4) execution test   
libmudflap.cth/pass37-frag.c (rerun 4) output pattern test  
libmudflap.cth/pass39-frag.c (-O2) (rerun 11) execution test
libmudflap.cth/pass39-frag.c (-O2) (rerun 4) execution test 
libmudflap.cth/pass39-frag.c (-O3) (rerun 13) execution test
libmudflap.cth/pass39-frag.c (-O3) (rerun 13) output pattern test   
libmudflap.cth/pass39-frag.c (-static -DSTATIC) (rerun 10) execution test   
libmudflap.cth/pass39-frag.c (-static -DSTATIC) (rerun 10) output pattern test  
libmudflap.cth/pass39-frag.c (-static -DSTATIC) (rerun 14) execution test   
libmudflap.cth/pass39-frag.c (-static -DSTATIC) (rerun 14) output pattern test  
libmudflap.cth/pass39-frag.c (-static -DSTATIC) (rerun 16) execution test   
libmudflap.cth/pass39-frag.c (-static -DSTATIC) (rerun 16) output pattern test  
libmudflap.cth/pass39-frag.c (-static -DSTATIC) (rerun 4) execution test
libmudflap.cth/pass39-frag.c (-static -DSTATIC) (rerun 4) output pattern test   
libmudflap.cth/pass39-frag.c (-static -DSTATIC) (rerun 5) execution test
libmudflap.cth/pass39-frag.c (-static -DSTATIC) (rerun 5) output pattern test   
libmudflap.cth/pass39-frag.c (-static -DSTATIC) (rerun 7) execution test
libmudflap.cth/pass39-frag.c (-static -DSTATIC) (rerun 7) output pattern test   
libmudflap.cth/pass39-frag.c (-static -DSTATIC) (rerun 9) execution test
libmudflap.cth/pass39-frag.c (-static -DSTATIC) (rerun 9) output pattern test   
libmudflap.cth/pass39-frag.c (rerun 1) execution test   
libmudflap.cth/pass39-frag.c (rerun 1) output pattern test  
libmudflap.cth/pass39-frag.c (rerun 15) execution test  
libmudflap.cth/pass39-frag.c (rerun 18) execution test  
libmudflap.cth/pass39-frag.c (rerun 18) output pattern test 
libmudflap.cth/pass39-frag.c (rerun 19) execution test  
libmudflap.cth/pass39-frag.c (rerun 9) execution test
libmudflap.cth/pass39-frag.c (rerun 9) output pattern test  
libmudflap.cth/pass39-frag.c execution test 
libmudflap.cth/pass39-frag.c output pattern test
libmudflap.cth/pass40-frag.c (-O2) execution test   
libmudflap.cth/pass40-frag.c (-O2) output pattern test  
libmudflap.cth/pass40-frag.c (-static -DSTATIC) execution test  
libmudflap.cth/pass40-frag.c (-static -DSTATIC) output pattern test 
libmudflap.cth/pass40-f

[Bug target/34982] [4.3 regression] calling a function with undefined parameters causes segmentation fault at -O1 or higher

2008-01-30 Thread hubicka at ucw dot cz



--- Comment #17 from hubicka at ucw dot cz  2008-01-30 15:56 ---
Subject: Re:  [4.3 regression] calling a function with undefined parameters
causes segmentation fault at -O1 or higher

> These tests time out from time to time when the testing box is busy, that's
> quite
> normal.  The problem is in the use of sched_yield (), which puts the calling
> thread to the end of the runqueue.  If there are many processes in the
> runqueue,
> one or more of the 10 threads might miss the 10 sec timeout in one or more of
> the 20 repetitions in 100 sched_yield calls.
> So just ignore this.

Thanks for the explanation.  It happened to me a few times in the past that
I ignored mudflap failures, incorrectly claiming random noise. Now at least
I know how to look for a test that is supposed to have this problem.

Honza


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34982

[Bug tree-optimization/17863] [4.0/4.1/4.2/4.3 Regression] performance loss (not inlining as much??)

2008-01-30 Thread hubicka at ucw dot cz



--- Comment #38 from hubicka at ucw dot cz  2008-01-30 23:19 ---
Subject: Re:  [4.0/4.1/4.2/4.3 Regression] performance loss (not inlining as
much??)

> AFAICT, they are exactly in the form that some targets like it (e.g.
> auto-inc/dec and SMALL_REGISTER_CLASS targets).

Yep, but all the pointer arithmetic keeps us from realizing we are doing
quite simple manipulations with an array and from propagating loads/stores
through.  CSE undoes this later in the game, so we end up with normal
offsetted addressing. Doing it earlier should make load/store elimination
happier.

Honza


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17863

[Bug target/34982] [4.3 regression] calling a function with undefined parameters causes segmentation fault at -O1 or higher

2008-01-30 Thread hubicka at ucw dot cz



--- Comment #23 from hubicka at ucw dot cz  2008-01-30 23:20 ---
Subject: Re:  [4.3 regression] calling a function with undefined parameters
causes segmentation fault at -O1 or higher

> (In reply to comment #21)
> > but why does this happen only with -O1?
> 
> Random value in eax register so we could put 0 in some cases but not others.

Oops, I am going to commit obvious fix for that. Looks like my tester
got lucky.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34982

[Bug tree-optimization/17863] [4.0/4.1/4.2/4.3 Regression] performance loss (TER register presure and inlining limits problems)

2008-02-01 Thread hubicka at ucw dot cz



--- Comment #43 from hubicka at ucw dot cz  2008-02-01 22:45 ---
Subject: Re:  [4.0/4.1/4.2/4.3 Regression] performance loss (not inlining as
much??)

> TER will not replace any load into an expression if there is more than one use
> of the load. Your sample shows multiple uses of each load. If it did this
> substitution, it could be introducing worse code, it doesn't know.   (TER is
> also strictly a single block replacement as well).

I noticed that now too.   The code after reordering by TER simply needs
even more registers alive by changing how the temporaries overlap.
There is probably no simple heuristic to control this...

Honza


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17863

[Bug c++/35182] [4.2/4.3 Regression] ICE in coalesce_abnormal_edges

2008-02-15 Thread hubicka at ucw dot cz



--- Comment #8 from hubicka at ucw dot cz  2008-02-15 11:29 ---
Subject: Re:  [4.2/4.3 Regression] ICE in coalesce_abnormal_edges

The split happens in loop_optimizer_init during profile construction pass, not
in inliner.
#0  make_ssa_name (var=0xb7da1228, stmt=0xb7daf000) at ../../gcc/tree-ssanames.c:150
#1  0x0850c03c in tree_make_forwarder_block (fallthru=0xb7ce8e60) at ../../gcc/tree-cfg.c:4727
#2  0x082905d4 in make_forwarder_block (bb=0xb7da2ca8, redirect_edge_p=0x82952e0 , new_bb_cbk=0) at ../../gcc/cfghooks.c:782
#3  0x08295544 in merge_latch_edges (loop=0xb7daa8dc) at ../../gcc/cfgloop.c:697
#4  0x0829571c in disambiguate_loops_with_multiple_latches () at ../../gcc/cfgloop.c:762
#5  0x08428448 in loop_optimizer_init (flags=0) at ../../gcc/loop-init.c:65
#6  0x0846d892 in tree_estimate_probability () at ../../gcc/predict.c:1354

Don't we have separate PR for this?

Honza


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=35182

[Bug c++/35262] [4.4 Regression]: FAIL: abi_check

2008-02-20 Thread hubicka at ucw dot cz



--- Comment #3 from hubicka at ucw dot cz  2008-02-20 21:05 ---
Subject: Re:  [4.4 Regression]: FAIL: abi_check

> (In reply to comment #1)
> > Jan, is this related to your patch?
> 
> And if it is, then there is another bug somewhere else anyways as it could 
> only
> change inlining decisions.

It might be the negative inlining cost assert.  I will look into that.

Honza


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=35262

[Bug c++/35262] [4.4 Regression]: FAIL: abi_check

2008-02-20 Thread hubicka at ucw dot cz



--- Comment #5 from hubicka at ucw dot cz  2008-02-20 23:39 ---
Subject: Re:  [4.4 Regression]: FAIL: abi_check

OK,
if it really is just inlining decision difference then we are fine.
I guess we can either update symbol list or mark always_inline
> because of a changing inlining decision. My concern, however, is whether it's
> ok not to inline such a tiny function (fyi, the function is defined in
> basic_ios.h all the uses are in fstream.tcc). First blush, I don't think it's
> ok...

I can look into the reason why it is not getting inlined. It would help
to have preprocessed testcase as I am travelling now  :)

Honza


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=35262

[Bug c++/35262] [4.4 Regression]: FAIL: abi_check

2008-03-02 Thread hubicka at ucw dot cz



--- Comment #10 from hubicka at ucw dot cz  2008-03-03 00:50 ---
Subject: Re:  [4.4 Regression]: FAIL: abi_check

> Confirmed, of course. Honza, any news on the inlining issue?
Sorry,
I looked into it, got confused and then distracted by another problem and
forgot to come back to it.

At second look, the function is estimated to make the code grow slightly
after being inlined.  The function is still getting inlined by default;
however there are code paths that are marked by __builtin_expect as unlikely.
The call sites on these paths are considered cold and thus the function is
not inlined there, to optimize for size.  This seems very sane behaviour
at first sight.

However the catch is that the function is a bit bigger than the call
sequence, but still, after it is fully inlined, the overall code size is
estimated to shrink because the offline body is eliminated.  So it is
definitely better to inline and have more options for optimizing.  Quite a
corner case but easy enough to handle.
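
The arithmetic behind that corner case can be sketched as follows.  This is
only an illustration under simplifying assumptions, not the inliner's actual
cost metric: overall_growth and per_call_growth are hypothetical helpers, and
sizes are in arbitrary units.

```c
#include <assert.h>

/* Growth of a single call site when the callee body (size BODY) replaces
   the call sequence (size CALL).  This is all a purely local size check
   for a cold call site would look at.  */
int
per_call_growth (int body, int call)
{
  return body - call;
}

/* Estimated change in overall code size when the function is inlined
   into all N_CALLS call sites and its offline copy is then removed:
   every call site grows by (BODY - CALL), and dropping the offline
   body saves BODY once.  */
int
overall_growth (int body, int call, int n_calls)
{
  assert (n_calls >= 0);
  return n_calls * per_call_growth (body, call) - body;
}
```

For instance, with a body of 12 units, a call sequence of 10 units and 3
call sites, each site grows by 2 units, yet overall_growth (12, 10, 3) is
negative, so the unit as a whole gets smaller once the offline body is gone.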

I am testing a patch to take this fact into account; let's see if it
solves the failure too.  It gets the function in question inlined, but makes
another one not inlined this time ;)

Honza


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=35262

[Bug c++/35262] [4.4 Regression]: FAIL: abi_check

2008-03-03 Thread hubicka at ucw dot cz



--- Comment #14 from hubicka at ucw dot cz  2008-03-03 23:46 ---
Subject: Re:  [4.4 Regression]: FAIL: abi_check

> Honza, I'm sorry, can you please double-check the fix? On my x86_64-linux
> machines I'm not seeing any progress :(
Hi,
this is what I get from our tester:

Differences:
Tests that now work, but didn't before:
abi_check

so it ought to make a difference for i686-linux.

It is quite possible that things differ on 64bit hosts; we are standing
on quite fragile ground here because in the current cost metric the
benefits of inlining are very close to the costs. Given that a call of
the wrapped function is a bit cheaper than a call of the wrapper, this
is quite correct.

The decision on whether function should be inlined or not depends on
many things, like overall size, ABI details etc.  I see it is quite
irritating for ABI checking.

I am sending it for testing on x86-64 now. Perhaps we can deal with
this by checking the ABI with -Os, which is a bit less dependent on
fine-grained performance decisions like the one we are seeing here?

Honza


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=35262

[Bug c++/35262] [4.4 Regression]: FAIL: abi_check

2008-03-03 Thread hubicka at ucw dot cz



--- Comment #16 from hubicka at ucw dot cz  2008-03-04 07:03 ---
Subject: Re:  [4.4 Regression]: FAIL: abi_check

> Note however, that the patch also didn't help Geoff's i686-linux tester, just
> have a look to gcc-testresults.

Sorry, I had two versions of the patch and managed to commit the wrong copy.
Sent the correct one to the ML.  It should be fixed now.
> 
> 
> I think we should not mix the two issues, here. The first issue is that, IMO,
> the function we are discussing should be inlined, it's very small and we 
> always
> inlined it until recently.

The point I wanted to make is that the inliner, when it knows it is inlining
a cold call (because it was hinted so by __builtin_expect), is correctly a
lot more selective.  Basically anything that expands to a function call
and some extra code around it is a loss for code size when inlined.
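
As a small illustration of what such a hinted cold call looks like in source
form (a sketch only; report_error and checked_div are made-up names, not the
libstdc++ function under discussion):

```c
#include <assert.h>

/* Hypothetical small wrapper: the kind of function that expands to a
   call plus a little extra code around it when inlined.  */
int
report_error (int code)
{
  assert (code > 0);   /* hypothetical precondition of this sketch */
  return -code;
}

/* The error path is hinted as unlikely with __builtin_expect, so the
   call to report_error sits on a path the compiler may treat as cold.  */
int
checked_div (int a, int b)
{
  if (__builtin_expect (b == 0, 0))
    return report_error (1);
  return a / b;
}
```

Whether report_error actually gets inlined at that call site is up to the
inliner's heuristics; the hint only tells it that the path is rarely taken.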

Honza


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=35262

[Bug tree-optimization/35629] [4.4 Regression] gcc.dg/tree-ssa/loop-25.c scan-tree-dump-times profile fails

2008-03-19 Thread hubicka at ucw dot cz



--- Comment #6 from hubicka at ucw dot cz  2008-03-19 23:06 ---
Subject: Re:  [4.4 Regression] gcc.dg/tree-ssa/loop-25.c scan-tree-dump-times
profile fails

This seems to affect about every target, but it is essentially harmless.
I am discussing with Zdenek the proper fix.

Honza


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=35629

[Bug testsuite/35647] FAIL: gcc.dg/cpp/cmdlne-d(I|M)-M.c scan-file (^|\\n)cmdlne-d(I|M)-M[^\\n]*:[^\\n]*cmdlne-d(I|M)-M.c

2008-03-20 Thread hubicka at ucw dot cz



--- Comment #2 from hubicka at ucw dot cz  2008-03-20 10:32 ---
Subject: Re:  FAIL: gcc.dg/cpp/cmdlne-d(I|M)-M.c scan-file
(^|\\n)cmdlne-d(I|M)-M[^\\n]*:[^\\n]*cmdlne-d(I|M)-M.c

> It fails everywhere, due to commit 133342.  Author is informed and CC:ed.

Sorry for all the breakage.

There used to be an xfail, but since I've removed at least one bug in the
implementation (-dM was interpreted as both a GCC debugging option and a
preprocessor directive), I've removed the xfail assuming that the problem
is fixed.  It seems to pass for me:

Executing on host: /root/trunk-an/build/gcc/xgcc
-B/root/trunk-an/build/gcc/
/root/trunk-an/gcc/testsuite/gcc.dg/cpp/cmdlne-dI-M.c   -dI -M
-fno-show-column -E  -o cmdlne-dI-M.i(timeout = 300)
PASS: gcc.dg/cpp/cmdlne-dI-M.c (test for excess errors)
PASS: gcc.dg/cpp/cmdlne-dI-M.c scan-file-not (^|\\n)#define foo
bar($|\\n)
PASS: gcc.dg/cpp/cmdlne-dI-M.c scan-file-not variable
PASS: gcc.dg/cpp/cmdlne-dI-M.c scan-file
(^|\\n)cmdlne-dI-M.*:[^\\n]*cmdlne-dI-M.c
Executing on host: /root/trunk-an/build/gcc/xgcc
-B/root/trunk-an/build/gcc/
/root/trunk-an/gcc/testsuite/gcc.dg/cpp/cmdlne-dM-M.c   -dM -M
-fno-show-column -E  -o cmdlne-dM-M.i(timeout = 300)
PASS: gcc.dg/cpp/cmdlne-dM-M.c (test for excess errors)
PASS: gcc.dg/cpp/cmdlne-dM-M.c scan-file (^|\\n)#define foo bar($|\\n)
PASS: gcc.dg/cpp/cmdlne-dM-M.c scan-file-not variable
PASS: gcc.dg/cpp/cmdlne-dM-M.c scan-file
(^|\\n)cmdlne-dM-M[^\\n]*:[^\\n]*cmdlne-dM-M.c
Executing

I didn't check, perhaps it was xpassing for me previously too, but why
does the testcase work for me and fail everywhere else?

Reverting the xfail removal is probably the proper fix here.

Honza


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=35647

[Bug middle-end/35781] [4.4 Regression]: Revision 133759 breaks ia64

2008-03-31 Thread hubicka at ucw dot cz



--- Comment #2 from hubicka at ucw dot cz  2008-04-01 00:15 ---
Subject: Re:  [4.4 Regression]: Revision 133759 breaks ia64

> 
> 
> --- Comment #1 from wilson at tuliptree dot org  2008-03-31 22:42 ---
> Subject: Re:   New: [4.4 Regression]: Revision 133759
>  breaks ia64
> 
> hjl dot tools at gmail dot com wrote:
> > On Linux/ia64, I got
> > /net/gnu-13/export/gnu/src/gcc/gcc/gcc/emit-rtl.c: In function `init_emit':
> > /net/gnu-13/export/gnu/src/gcc/gcc/gcc/emit-rtl.c:5035: error: structure 
> > has no
> > member named `emit'
> 
> That isn't the only one broken.  Just grepping for cfun->emit, I see 
> that m32c and sparc are broken also.  There may also be others that are 
> broken, I haven't fully studied the patch yet.
> 
> It looks like
>  cfun->emit->regno_pointer_align
> is now
>  rtl.emit.regno_pointer_align
> And since rtl is apparently now static allocated, the old code
>  if (cfun && cfun->emit->regno_pointer_align)\
> becomes
>  if (rtl.emit.regno_pointer_align)  \

Yes, you are right, there are three backends using emit directly (sparc,
m32c and ia64).  I've posted the patch for it to
http://gcc.gnu.org/ml/gcc-patches/2008-03/msg02043.html
If someone could confirm that it solves the ia-64 problem, I will commit
it.  I am currently out of reach of ia-64 boxes.

Honza


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=35781

[Bug tree-optimization/35795] [4.4 Regression] Revision 133787 breaks ia64

2008-04-02 Thread hubicka at ucw dot cz



--- Comment #3 from hubicka at ucw dot cz  2008-04-02 09:35 ---
Subject: Re:   New: [4.4 Regression] Revision 133787 breaks ia64

Hi,
I've added the assert to check that we don't initialize RTL world twice
without freeing it first (and thus that we don't leak memory).  This
seems to be the case.  Naively, something like this should fix it.
I am building a cross and will try to reproduce it.

Index: config/ia64/ia64.c
===
*** config/ia64/ia64.c  (revision 133785)
--- config/ia64/ia64.c  (working copy)
*** ia64_output_mi_thunk (FILE *file, tree t
*** 9694,9699 
--- 9694,9700 
final_start_function (insn, file, 1);
final (insn, file, 1);
final_end_function ();
+   free_after_compilation (cfun);

reload_completed = 0;
epilogue_completed = 0;
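
The invariant the new assert enforces can be modelled in a few lines.  This
is a toy model with hypothetical names (model_prepare_function_start,
model_free_after_compilation, model_output_thunk), not the real GCC
functions; it only shows why a thunk emitter that never frees the state trips
the check the next time RTL generation starts.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the single, statically allocated RTL state.  */
static bool rtl_state_live;

/* Models the start of RTL generation for a function: the assert fires
   if the previous function's state was never released.  */
void
model_prepare_function_start (void)
{
  assert (!rtl_state_live);
  rtl_state_live = true;
}

/* Models free_after_compilation: releases the per-function RTL state.  */
void
model_free_after_compilation (void)
{
  rtl_state_live = false;
}

/* Models a thunk emitter after the fix: it starts RTL generation,
   emits the thunk, and frees the state before returning.  */
void
model_output_thunk (void)
{
  model_prepare_function_start ();
  /* ... emit the thunk body here ... */
  model_free_after_compilation ();
}

/* Lets a caller observe whether state is currently held.  */
bool
model_rtl_state_is_live (void)
{
  return rtl_state_live;
}
```

Without the model_free_after_compilation call, a second call to
model_output_thunk would abort in the assert, which mirrors the ICE reported
here.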


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=35795

[Bug target/35795] [4.4 Regression] Revision 133787 breaks ia64

2008-04-02 Thread hubicka at ucw dot cz



--- Comment #7 from hubicka at ucw dot cz  2008-04-03 06:12 ---
Subject: Re:  [4.4 Regression] Revision 133787 breaks ia64

> The patch is OK.
> 
> But won't all targets that have similar code need the same fix?  If I cd 
> to the config dir, and try "grep final_end_function */*" it looks like 
> alpha, ia64, m68k, mips, rs6000, score (both score3 and score7), sh, and 
> sparc all have the same problem.  The rs6000 port has already been fixed 
> via PR 35801.  We have an ia64 patch here.  We also need patches for the 
> rest.

Thanks,
I've just noticed that too and sent a patch for all the backends.  Looks
like we were leaking memory here for ages; probably not that big a deal
since thunks are pretty small functions, but still, keeping all the RTL
register tables around seems a bit expensive.

Honza

-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=35795

[Bug target/35795] [4.4 Regression] Revision 133787 breaks ia64

2008-04-03 Thread hubicka at ucw dot cz



--- Comment #10 from hubicka at ucw dot cz  2008-04-03 15:44 ---
Subject: Re:  [4.4 Regression] Revision 133787 breaks ia64

> I am pretty sure I saw this one targeting sparc-rtems.  Building an updated
> tree now to confirm it is the fix. 
> 
> ../../../../gcc/libstdc++-v3/src/iostream-inst.cc: In member function 'void
> std::basic_iostream >::_ZThn8_NSdD1Ev()':
> ../../../../gcc/libstdc++-v3/src/iostream-inst.cc:51: internal compiler error:
> in prepare_function_start, at function.c:3940

sparc, alpha, m68k, score and mips contained same problem, so hopefully
it is fixed now.

Honza


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=35795

[Bug tree-optimization/37709] [4.4 Regression] inlining causes explosion in debug info

2009-02-12 Thread hubicka at ucw dot cz



--- Comment #10 from hubicka at ucw dot cz  2009-02-12 10:28 ---
Subject: Re:  [4.4 Regression] inlining causes explosion in debug info

> sizeof (tree_block) is 52 bytes on 32-bit hosts.  Of these, 8 are unused (ann
> and type), 8 are frequently unused (block_fragment stuff -- always write-only
> at debug level 0).  Moving fragments into an annotation and reusing type for
> something would already save 20% of the memory.

I was looking into the bug last month but then got distracted by other
more urgent things.  I guess it is time to get back ;)

Even if our block representation is not the most efficient around, I
think the main problem is that we keep too many blocks around that never
make it to debug info or never serve a useful purpose.  The testcase
compiled at -g3 needs about 100MB of DWARF section and before inlining
we unfortunately need to keep debug info at -g3 verbosity.  I plan to
look into which blocks get ignored by dwarf2out, but it would also be
great to figure out if we really need them in debug info in the first place.
(For example, for every inlined function we create a container block and
then a block containing the arguments.  The current tree-ssa-live code to
prune out blocks preserves both.)

Honza


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37709

[Bug tree-optimization/39203] LTO and -fwhole-program do not play along well

2009-02-17 Thread hubicka at ucw dot cz



--- Comment #2 from hubicka at ucw dot cz  2009-02-17 17:43 ---
Subject: Re:  LTO and -fwhole-program do not play along well

Hi,
functions are brought local in function_and_variable_visibility, which
needs to be scheduled after LTO is read in.
The pass computes externally_visible flags that should be up to date for
early IPA passes before LTO is written out, so I guess we need an early
function_and_variable_visibility pass and a late one, where the first one
does not bring functions local at -fwhole-program -flto.

Honza


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39203

[Bug tree-optimization/39203] LTO and -fwhole-program do not play along well

2009-02-17 Thread hubicka at ucw dot cz



--- Comment #4 from hubicka at ucw dot cz  2009-02-17 18:01 ---
Subject: Re:  LTO and -fwhole-program do not play along well

> > Hi,
> > functions are brought local in function_and_variable_visibility that
> > needs to be scheduled after LTO is read in.
> > The pass computes externally_visible flags that should be up to date for
> > early IPA passes before LTO is written out, so I guess we need an early
> > function_and_variable_visibility pass and a late one, where the first one
> > does not bring functions local at -fwhole-program -flto.
> 
> OK, but I think there is a bigger issue here.  Even if -flto is *not*
> used, we get link errors.  Just by compiling each file with
> -fwhole-program is enough to reproduce the failure:
> 
> $ gcc -fwhole-program -c f1.c
> $ gcc -fwhole-program -c f2.c
> $ gcc -fwhole-program -o f f1.o f2.o
> 
> This is just a natural side-effect of using -fwhole-program.  It was
> not intended to be used like this.

This is intended behaviour.
-fwhole-program essentially hides everything except for main and
functions/variables explicitly marked via attribute, so this is
expected.  You need to use --combine and/or -flto to compile programs
consisting of multiple compilation units with -fwhole-program.
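
A minimal sketch of the "explicitly marked via attribute" case, assuming
the GCC externally_visible attribute is the marking meant here.  The names
shared_counter and bump_counter are hypothetical and only illustrate a
unit without main that another object file still needs to link against:

```c
/* Hypothetical unit without main, meant to be linked with another object.
   Under -fwhole-program, unmarked globals here may be treated like
   statics, so a separately compiled file could not link against them.  */

/* Assumption: externally_visible keeps the symbol exported.  */
__attribute__ ((externally_visible))
int shared_counter = 0;

/* Marked the same way so other units can still call it.  */
__attribute__ ((externally_visible))
int
bump_counter (void)
{
  return ++shared_counter;
}
```

Marking symbols by hand like this is only a workaround for individual
symbols; it does not replace compiling all units together.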

Honza


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39203

[Bug tree-optimization/39203] LTO and -fwhole-program do not play along well

2009-02-17 Thread hubicka at ucw dot cz



--- Comment #8 from hubicka at ucw dot cz  2009-02-17 18:34 ---
Subject: Re:  LTO and -fwhole-program do not play along well

> > -fwhole-program essentially hides everything except for main and
> > functions/variables explicitly marked via attribute, so this is
> > expected.  You need to use --combine and/or -flto to compile programs
> > consisting of multiple compilation units with -fwhole-program.
> 
> Yes.  As I just replied upthread, I think we could just turn off
> flag_whole_program when we know we are just generating IL out of the
> front ends.

Essentially yes, but since we are restarting the pass queue from a later
point, won't we miss the visibility pass at link time?
This is why I think we need two passes (one very early and one scheduled
after the LTO read).

Honza


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39203

[Bug tree-optimization/39203] LTO and -fwhole-program do not play along well

2009-02-17 Thread hubicka at ucw dot cz



--- Comment #12 from hubicka at ucw dot cz  2009-02-18 01:42 ---
Subject: Re:  LTO and -fwhole-program do not play along well

> I believe we should be set already.  During LTO read, we execute
> ipa_passes, which runs all_small_ipa_passes.  So,
> pass_ipa_function_and_variable_visibility is run both while writing
> the IL and right after we read it in.
> 
> Is that what you were referring to?

Well, you need to localize stuff at the WHOPR stage, so small IPA passes
are unlikely to be executable there, except for those not needing
function bodies (like the visibility pass, but unlike inlining and the
other passes included).  Are those still skipped via the lto flag gate?

Honza


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39203

[Bug c++/39242] [4.4 Regression] Inconsistent reject / accept of code

2009-02-19 Thread hubicka at ucw dot cz



--- Comment #7 from hubicka at ucw dot cz  2009-02-19 14:59 ---
Subject: Re:  [4.4 Regression] Inconsistent reject / accept of code

> Index: cp/pt.c
> ===
> --- cp/pt.c (revision 144292)
> +++ cp/pt.c (working copy)
> @@ -15285,7 +15285,7 @@ instantiate_decl (tree d, int defer_ok,
>/* ... but we instantiate inline functions so that we can inline
>  them and ... */
>&& ! (TREE_CODE (d) == FUNCTION_DECL
> -   && possibly_inlined_p (d))
> +   && DECL_DECLARED_INLINE_P (d))

This way we will lose inlining opportunities at -O3, since we will never
instantiate methods not declared inline.

Does the standard really make a difference between inline and !inline
here?  Are we just supposed to suppress the errors and instantiate only
stuff that is complete, are we supposed to give an error at -O0 too, or
is it our choice?

Honza


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39242

[Bug c++/39242] [4.4 Regression] Inconsistent reject / accept of code

2009-02-19 Thread hubicka at ucw dot cz



--- Comment #9 from hubicka at ucw dot cz  2009-02-19 15:40 ---
Subject: Re:  [4.4 Regression] Inconsistent reject / accept of code

>   /* Check to see whether we know that this template will be
>  instantiated in some other file, as with "extern template"
>  extension.  */
>   external_p = (DECL_INTERFACE_KNOWN (d) && DECL_REALLY_EXTERN (d));
>   /* In general, we do not instantiate such templates...  */
>   if (external_p
>   /* ... but we instantiate inline functions so that we can inline
>  them and ... */
>   && ! (TREE_CODE (d) == FUNCTION_DECL
> && DECL_DECLARED_INLINE_P (d))

Hmm, in general it is beneficial to instantiate stuff for the sake of the
IPA optimizers.  It would be interesting to know how this affects code
quality...

Honza


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39242

[Bug c++/39242] [4.4 Regression] Inconsistent reject / accept of code

2009-02-20 Thread hubicka at ucw dot cz



--- Comment #13 from hubicka at ucw dot cz  2009-02-20 14:39 ---
Subject: Re:  [4.4 Regression] Inconsistent reject / accept of code

> > What that means is that we *must not* implicitly instantiate things
> > declared "extern template" unless they are DECL_DECLARED_INLINE_P.  As a
> > consequence, at -O3, we cannot implicitly instantiate non-inline "extern
> > template" functions.
> 
> I'm not entirely sure that's what we want it to say, but it does seem like a
> reasonable expectation for users to have.
> 
> Beyond this issue, what is the compile speed impact of the earlier change to
> use possibly_inlined_p?  It seems like it might be making us speculatively
> instantiate a lot more functions for potential inlining even at -O1, which I
> would expect to cause memory bloat and slower compilation.

There were no slowdowns visible on our C++ testers, so I hope it is not
too bad.  We now get rid of unreachable functions quite early in the
pass queue, so they do not consume too much compilation time.

Honza


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39242

[Bug c++/14179] [4.2/4.3/4.4 Regression] out of memory while parsing array with many initializers

2009-02-22 Thread hubicka at ucw dot cz



--- Comment #51 from hubicka at ucw dot cz  2009-02-22 14:47 ---
Subject: Re:  [4.2/4.3/4.4 Regression] out of memory while parsing array with
many initializers

> Honza, you realize that the numbers are completely unreadable in bugzilla,
> right?
They need some care to read; the columns are still intact, just
interleaved... I wonder why bugzilla insists on the linebreaks?

Honza


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=14179

[Bug c++/14179] [4.2/4.3/4.4 Regression] out of memory while parsing array with many initializers

2009-02-23 Thread hubicka at ucw dot cz



--- Comment #54 from hubicka at ucw dot cz  2009-02-23 16:51 ---
Subject: Re:  [4.2/4.3/4.4 Regression] out of memory while parsing array with
many initializers

> > Perhaps explicitly freeing would be good idea? 
> 
> I certainly have no objection to explicitly freeing storage if we know
> we don't need it anymore.

The problem is that I don't know enough about the C++ parser to be sure
where we can safely free this vector.

Honza


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=14179

[Bug debug/39355] [4.4 Regression] ICE at dwarf2out.c:10353 in loc_descriptor_from_tree_1

2009-03-05 Thread hubicka at ucw dot cz



--- Comment #9 from hubicka at ucw dot cz  2009-03-05 11:18 ---
Subject: Re:  [4.4 Regression] ICE at dwarf2out.c:10353 in
loc_descriptor_from_tree_1

> --- Comment #8 from jakub at gcc dot gnu dot org  2009-03-05 08:07 ---
> Can you reproduce it with stage1 cc1plus (or non-bootstrapped cc1plus) built 
> by
> some older gcc, or only with stage2/stage3 cc1plus?  Wonder if cc1plus hasn't
> been miscompiled.  In any case, as this can't be reproduced with a
> cross-compiler, somebody with hppa-linux access needs to debug it.

I've just run across an interesting memory corruption in Ada: by
duplicating the DECL node for a nested function we ended up with a pointer
to a ggc_freed DECL_STRUCT_FUNCTION.  I have a patch to always make
functions nonlocal that might solve this problem as well (as we get
methods for C++ local classes).  I am just on the way to Prague, but I
will break it out of my tree then.

Honza


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39355

[Bug middle-end/39360] [4.4 Regression] ICE in referenced_var_lookup, at tree-dfa.c:563

2009-03-06 Thread hubicka at ucw dot cz



--- Comment #9 from hubicka at ucw dot cz  2009-03-06 17:30 ---
Subject: Re:  [4.4 Regression] ICE in referenced_var_lookup, at tree-dfa.c:563

> part.  So, either tree-inline.c needs to do the same, or it can't use
> referenced_vars bit as a test whether it has been queued already onto
> local_decls or not.  Honza?

Ah, I see.  Actually, the original version of the patch added a function
to tree-dfa to check whether a variable is referenced, and used this
predicate followed by add_referenced_var if it was not, but then I
noticed that referenced_var_check_and_insert is available, so I decided
to use it.  OK, I will prepare this version of the patch tomorrow.
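
The "predicate followed by add" shape could look roughly like the toy
sketch below.  This is not the tree-dfa code: struct var_set and the
var_* helpers are hypothetical stand-ins over a small fixed-size set, used
only to show a side-effect-free check kept separate from the insertion:

```c
#include <stdbool.h>

#define MAX_VARS 64  /* hypothetical capacity of the toy set */

/* Toy stand-in for a "referenced variables" set, keyed by a small id.  */
struct var_set
{
  bool present[MAX_VARS];
};

/* Pure predicate: reports membership without changing the set.  */
static bool
var_is_referenced (const struct var_set *set, unsigned id)
{
  return id < MAX_VARS && set->present[id];
}

/* Insertion: records ID; out-of-range ids are ignored.  */
static void
var_add_referenced (struct var_set *set, unsigned id)
{
  if (id < MAX_VARS)
    set->present[id] = true;
}

/* Predicate followed by insertion.  Returns true only when ID was
   newly added by this call.  */
static bool
var_note_reference (struct var_set *set, unsigned id)
{
  if (var_is_referenced (set, id))
    return false;
  var_add_referenced (set, id);
  return id < MAX_VARS;
}
```

Keeping the predicate free of side effects means a caller can ask the
question without also changing the set.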


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39360

[Bug middle-end/39659] [4.5 Regression] ICE building libstdc++v3 functexcept.cc

2009-04-06 Thread hubicka at ucw dot cz



--- Comment #3 from hubicka at ucw dot cz  2009-04-06 09:28 ---
Subject: Re:  [4.5 Regression] ICE building libstdc++v3 functexcept.cc

Hi,
this does not reproduce on my setup, but the following patch should fix it.

Honza

Index: except.c
===
--- except.c (revision 145584)
+++ except.c (working copy)
@@ -853,6 +853,7 @@ remove_unreachable_regions (sbitmap reac
 r->region_number,
 first_must_not_throw->region_number);
  remove_eh_handler_and_replace (r, first_must_not_throw);
+ first_must_not_throw->may_contain_throw |= r->may_contain_throw;
}
   else
bring_to_root (r);
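
The idea behind the added line, as a standalone sketch: when one region is
replaced by another, the surviving region has to inherit the removed
region's may-throw flag.  struct toy_region and toy_merge_region are
hypothetical simplifications, not the real except.c data structures:

```c
#include <stdbool.h>

/* Hypothetical, stripped-down region record; not GCC's real struct.  */
struct toy_region
{
  int region_number;
  bool may_contain_throw;
};

/* Fold REMOVED into SURVIVOR.  The survivor now covers whatever the
   removed region covered, so it inherits the may-throw flag.  */
static void
toy_merge_region (struct toy_region *survivor,
                  const struct toy_region *removed)
{
  survivor->may_contain_throw |= removed->may_contain_throw;
}
```

Using |= rather than plain assignment keeps a flag that was already set on
the surviving region.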


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39659

[Bug tree-optimization/39672] Local statics not promoted to const

2009-04-07 Thread hubicka at ucw dot cz



--- Comment #1 from hubicka at ucw dot cz  2009-04-07 11:44 ---
Subject: Re:   New: Local statics not promoted to const

ipa-reference definitely is supposed to do this transform.  I will debug
why it does not in this testcase.

Honza


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39672
