Re: [PATCH] Atom: Scheduler improvements for better imul placement

2012-04-13 Thread Andrey Belevantsev

On 12.04.2012 18:22, Richard Guenther wrote:

2012/4/12 Andrey Belevantsev:

On 12.04.2012 17:54, Richard Guenther wrote:


2012/4/12 Andrey Belevantsev:


On 12.04.2012 16:38, Richard Guenther wrote:



On Thu, Apr 12, 2012 at 2:36 PM, Igor Zamyatin
  wrote:



On Thu, Apr 12, 2012 at 4:24 PM, Richard Guenther
  wrote:



On Thu, Apr 12, 2012 at 2:00 PM, Alexander Monakov
  wrote:





Can atom execute two IMUL in parallel?  Or what exactly is the
pipeline
behavior?




As I understand from Intel's optimization reference manual, the
behavior is as
follows: if the instruction immediately following IMUL has shorter
latency,
execution is stalled for 4 cycles (which is IMUL's latency);
otherwise,
a
4-or-more cycles latency instruction can be issued after IMUL without
a
stall.
In other words, IMUL is pipelined with respect to other long-latency
instructions, but not to short-latency instructions.




It seems to be modeled in the pipeline description though:

;;; imul insn has 5 cycles latency
(define_reservation "atom-imul-32"
"atom-imul-1, atom-imul-2, atom-imul-3,
atom-imul-4,
 atom-port-0")

;;; imul instruction excludes other non-FP instructions.
(exclusion_set "atom-eu-0, atom-eu-1"
   "atom-imul-1, atom-imul-2, atom-imul-3, atom-imul-4")



The main idea is quite simple:

If we are going to schedule IMUL instruction (it is on the top of
ready list) we try to find out producer of other (independent) IMUL
instruction that is in ready list too. The goal is try to schedule
such a producer to get another IMUL in ready list and get scheduling
of 2 successive IMUL instructions.




Why does that not happen without your patch?  Does it never happen
without
your patch or does it merely not happen for one EEMBC benchmark (can
you provide a testcase?)?




It does not happen because the scheduler by itself does not do such
specific
reordering.  That said, it is easy to imagine the cases where this patch
will make things worse rather than better.



That surprises me.  What is so specific about this reordering?



I mean that the scheduler does things like "sort the ready list according to
a number of heuristics and to the insn priority, then choose the insn that
would allow the maximum of ready insns to be issued on the current cycle".
  The heuristics are rather general.  The scheduler does not do things like
"if some random insn is ready, then choose some other random insn from the
ready list and schedule it" (which is what the patch does). This is what
scheduler hooks are for, to account for some target-specific heuristic.

The problem is that this particular implementation looks somewhat like an
overkill and also motivated by a single benchmark.  Testing on a wider set
of benchmarks and checking compile-time hit would make the motivation more
convincing.


Yeah, and especially looking _why_ the generic heuristics are not working
and if they could be improved.  After all the issue seems to be properly
modeled in the DFA.


It is, but the DFA only models the actual stalls that you get when an imul 
is scheduled.  What you need here is to increase priority for ready insns 
that have imuls in their critical paths, and those imuls should be close 
enough to fill in the gap (actual consumers in the case of the patch).


The lookahead done in max_issue can help with this "dynamic" priority 
change in general, but it considers just the ready insns, not their 
immediate consumers.  You need to make the deeper lookahead if you want to 
do this in a target independent way -- no-no from compile time POV.   A 
number of already existing hooks can help:


- sched_reorder used in this patch just sorts the ready list in any way the 
target wishes;


- adjust_priority/adjust_cost  helps to boost or to lower priority in such 
subtle cases;


- is_costly_dependence (now rs6000 only) can guide the early queue insn 
removal and its addition to the ready list;


- dfa_lookahead_guard (now ia64 only) can ban some insns from being issued 
on the current try.


Using sched_reorder is existing practice, like in s390, spu, mips, etc. 
Usually though an implementation again only considers the actual ready 
insns and not the successors.  I'd say that trying more tests would help, 
or other hooks can be tried, too, but after all this is Atom specific, so 
it's the decision of i386 maintainers.


Andrey



Richard.


Andrey





Igor, why not try different subtler mechanisms like adjust_priority,
which
is get called when an insn is added to the ready list?  E.g. increase the
producer's priority.

The patch as is misses checks for NONDEBUG_INSN_P.  Also, why bail out
when
you have more than one imul in the ready list?  Don't you want to bump
the
priority of the other imul found?

Andrey





And MD allows us only prefer scheduling of successive IMUL
instructions,
i.e.
If IMUL was just scheduled and ready list contains another IMUL
instruction then it will be chosen as the best candidat

Re: [PATCH] Fix PR 52203 and 52715

2012-04-13 Thread Andrey Belevantsev

Hello,

On 07.03.2012 15:46, Alexander Monakov wrote:



On Wed, 7 Mar 2012, Andrey Belevantsev wrote:


Hello,

This PR is again about insns that are recog'ed as>=0 but do not change the
processor state.  As explained in
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52203#c8, I've tried experimenting
with an attribute marking those "empty" insns in MD files and asserting that
all other insns do have reservations.  As this doesn't seem to be interesting,
I give up with the idea, and the below patch makes sel-sched do exactly what
the Haifa scheduler does, i.e. do not count such insns against issue_rate when
modelling clock cycles.


But of course I wrongly microoptimized the decision of whether an insn is 
"empty" as shown in PR 52715, so the right fix is to check the emptiness 
right before issuing the insn.  Thus, the following patch is really needed 
(tested on ia64 and x86 again), OK?  The new hunk is the last one in the patch.


Andrey

2012-04-13  Andrey Belevantsev  

PR rtl-optimization/52203
PR rtl-optimization/52715

Revert the 2012-03-07 fix for PR 52203.
* sel-sched.c (reset_sched_cycles_in_current_ebb): Check that
the insn does not modify DFA right before issuing, adjust
issue_rate accordingly.




Tested on ia64 and x86-64, OK for trunk?  No testcase again because of the
amount of flags needed.

Andrey

2012-03-07  Andrey Belevantsev

PR rtl-optimization/52203
* sel-sched.c (estimate_insn_cost): New parameter pempty.  Adjust
all callers to pass NULL except ...
(reset_sched_cycles_in_current_ebb): ... here, save the value
in new variable 'empty'.  Increase issue_rate only for
non-empty insns.


This is OK.

Thanks.



diff --git a/gcc/sel-sched.c b/gcc/sel-sched.c
index 4e13230..ce38fa0 100644
--- a/gcc/sel-sched.c
+++ b/gcc/sel-sched.c
@@ -4265,10 +4265,9 @@ invoke_aftermath_hooks (fence_t fence, rtx best_insn, int issue_more)
   return issue_more;
 }
 
-/* Estimate the cost of issuing INSN on DFA state STATE.  Write to PEMPTY
-   true when INSN does not change the processor state.  */
+/* Estimate the cost of issuing INSN on DFA state STATE.  */
 static int
-estimate_insn_cost (rtx insn, state_t state, bool *pempty)
+estimate_insn_cost (rtx insn, state_t state)
 {
   static state_t temp = NULL;
   int cost;
@@ -4278,8 +4277,6 @@ estimate_insn_cost (rtx insn, state_t state, bool *pempty)
 
   memcpy (temp, state, dfa_state_size);
   cost = state_transition (temp, insn);
-  if (pempty)
-*pempty = (memcmp (temp, state, dfa_state_size) == 0);
 
   if (cost < 0)
 return 0;
@@ -4310,7 +4307,7 @@ get_expr_cost (expr_t expr, fence_t fence)
 	return 0;
 }
   else
-return estimate_insn_cost (insn, FENCE_STATE (fence), NULL);
+return estimate_insn_cost (insn, FENCE_STATE (fence));
 }
 
 /* Find the best insn for scheduling, either via max_issue or just take
@@ -7023,7 +7020,7 @@ reset_sched_cycles_in_current_ebb (void)
 {
   int cost, haifa_cost;
   int sort_p;
-  bool asm_p, real_insn, after_stall, all_issued, empty;
+  bool asm_p, real_insn, after_stall, all_issued;
   int clock;
 
   if (!INSN_P (insn))
@@ -7050,7 +7047,7 @@ reset_sched_cycles_in_current_ebb (void)
 	haifa_cost = 0;
 	}
   else
-haifa_cost = estimate_insn_cost (insn, curr_state, &empty);
+haifa_cost = estimate_insn_cost (insn, curr_state);
 
   /* Stall for whatever cycles we've stalled before.  */
   after_stall = 0;
@@ -7084,7 +7081,7 @@ reset_sched_cycles_in_current_ebb (void)
   if (!after_stall
   && real_insn
   && haifa_cost > 0
-  && estimate_insn_cost (insn, curr_state, NULL) == 0)
+  && estimate_insn_cost (insn, curr_state) == 0)
 break;
 
   /* When the data dependency stall is longer than the DFA stall,
@@ -7096,7 +7093,7 @@ reset_sched_cycles_in_current_ebb (void)
   if ((after_stall || all_issued)
   && real_insn
   && haifa_cost == 0)
-haifa_cost = estimate_insn_cost (insn, curr_state, NULL);
+haifa_cost = estimate_insn_cost (insn, curr_state);
 }
 
 	  haifa_clock += i;
@@ -7127,8 +7124,14 @@ reset_sched_cycles_in_current_ebb (void)
 
   if (real_insn)
 	{
+	  static state_t temp = NULL;
+
+	  if (!temp)
+	temp = xmalloc (dfa_state_size);
+	  memcpy (temp, curr_state, dfa_state_size);
+
 	  cost = state_transition (curr_state, insn);
-	  if (!empty)
+	  if (memcmp (temp, curr_state, dfa_state_size))
 	issued_insns++;
 
   if (sched_verbose >= 2)


use longcall to abort from !pic trampoline setup on ppc-vxworks

2012-04-13 Thread Olivier Hainque
Hello,

For some versions and execution modes, VxWorks features facilities to let
users download object modules and link them with the kernel at run-time.

Relocation troubles (24bit reloc overflows) might show up when module
instructions contain short references to kernel symbols and the module
happens to be loaded very far away from the kernel in memory.

As of today, the powerpc "__trampoline_setup" function embeds such a
potentially problematic reference, in the short call the "abort" at the
end.

The attached patch is a suggestion to prevent the potential troubles
by simply issuing a longcall sequence in the relevant case (vxworks, !pic).

We have been using it in our internal trees for years now. Tested on
mainline by checking that a linux hosted build for powerpc-wrs-vxworks
succeeds and that the expected sequence is found in the tramp.o of
interest.

OK to commit ?

Thanks in advance,

With Kind Regards,

Olivier

2012-04-12  Olivier Hainque  

libgcc/
* config/rs6000/vxworks/tramp.S (trampoline_setup): Use a longcall
sequence in the non pic case on VxWorks.



vxtramp.dif
Description: video/dv


[PATCH, i386] Move common definitions of gnu-user.h and gnu-user64.h to separate file

2012-04-13 Thread Ilya Enkovich
Hello,

Here is a patch which creates new gnu-user-common.h file and moves all
common gnu-user.h and gnu-user64.h definitions to this new file. New
file is required to avoid duplication of Android specific changes in
gnu-user.h and gnu-user64.h. This patch is actually a non Android
specific part of previously submitted patch to support Android in x86
target (http://gcc.gnu.org/ml/gcc-patches/2012-04/msg00127.html).
Bootstrapped and tested on linux-x86_64. Is it OK for mainline and
4.7?

Thanks
Ilya,
---
2012-04-13  Enkovich Ilya  

* config/i386/gnu-user.h: Include gnu-user-common.h.
(CPP_SPEC): Removed.
(CC1_SPEC): Removed.
(ENDFILE_SPEC): Removed.
(DEFAULT_PCC_STRUCT_RETURN): Removed.
(TARGET_TLS_DIRECT_SEG_REFS_DEFAULT): Removed.
(TARGET_OS_CPP_BUILTINS): Removed.
(LIBGCC2_HAS_TF_MODE): Removed.
(LIBGCC2_TF_CEXT): Removed.
(TF_SIZE): Removed.
(TARGET_ASM_FILE_END): Removed.
(STACK_CHECK_MOVING_SP): Removed.
(STACK_CHECK_STATIC_BUILTIN): Removed.
(TARGET_THREAD_SSP_OFFSET): Removed.
(TARGET_CAN_SPLIT_STACK): Removed.
(TARGET_THREAD_SPLIT_STACK_OFFSET): Removed.

* config/i386/gnu-user64.h: Likewise.

* config/i386/gnu-user-common.h: New.


mandroid.patch
Description: Binary data


[PATCH] Fix for PR52734 (-ftree-tail-merge)

2012-04-13 Thread Tom de Vries
Richard,

this patch fixes PR52743.

The problem is as follows: blocks 3 and 5, with successor 6 are considered equal
and merged.
...
  # BLOCK 3 freq:6102
  # PRED: 2 [61.0%]  (true,exec)
  # VUSE <.MEMD.1734_10>
  dddD.1710_3 = bbbD.1703;
  goto ;
  # SUCC: 6 [100.0%]  (fallthru,exec)

  # BLOCK 5 freq:2378
  # PRED: 4 [61.0%]  (false,exec)
  # SUCC: 6 [100.0%]  (fallthru,exec)

  # BLOCK 6 freq:1
  # PRED: 3 [100.0%]  (fallthru,exec) 7 [100.0%]  (fallthru) 5 [100.0%]
(fallthru,exec)
  # dddD.1710_1 = PHI 
  # .MEMD.1734_8 = PHI <.MEMD.1734_10(3), .MEMD.1734_11(7), .MEMD.1734_11(5)>
  # VUSE <.MEMD.1734_8>
  return dddD.1710_1;
  # SUCC: EXIT [100.0%]
...

Tail merge considers 2 blocks equal if the effect at the tail is equal,
meaning:
- the sequence of side effects produced by each block is equal
- the value phis are equal

There are no side effects in block 3 and 5, and the phi alternatives of
dddD.1710_1 for 3 (dddD.1710_3)  and 5 (dddD.1710_4)  are proven equal by gvn.

The problem is that changing the (4->5) edge into a (4->3) edge changes the
value of dddD.1710_3, because block 4 contains a store that affects the load in
block 3.
...
  # BLOCK 4 freq:3898
  # PRED: 2 [39.0%]  (false,exec)
  # VUSE <.MEMD.1734_10>
  dddD.1710_4 = bbbD.1703;
  # .MEMD.1734_11 = VDEF <.MEMD.1734_10>
  # USE = nonlocal null
  # CLB = nonlocal null
  D.1724_5 = aaaD.1705 ();
  if (D.1724_5 != 0)
goto ;
  else
goto ;
  # SUCC: 7 [39.0%]  (true,exec) 5 [61.0%]  (false,exec)
...

Or, put differently, the incoming vuse of block 3 affects a value phi
alternative for that block (dddD.1710_3), so the 2 blocks are equal only under
the condition that the incoming vuses are equal.

We could build an analysis that addresses that precisely, but for now I
implemented a more coarse-grained fix: if the incoming vuses are not equal, and
at least one of the vuses influenced a non-virtual result, we don't consider the
blocks equal.

Bootstrapped and reg-tested on x86_64.

ok for trunk, 4.7.1?

Thanks,
- Tom

2012-04-13  Tom de Vries  

* tree-ssa-tail-merge.c (gsi_advance_bw_nondebug_nonlocal): Add
parameters vuse and vuse_escaped.
(find_duplicate): Init vuse1, vuse2 and vuse_escaped.  Pass to
gsi_advance_bw_nondebug_nonlocal.  Return if vuse_escaped and
vuse1 != vuse2.

* gcc.dg/pr52734.c: New test.

Index: gcc/tree-ssa-tail-merge.c
===
--- gcc/tree-ssa-tail-merge.c (revision 185028)
+++ gcc/tree-ssa-tail-merge.c (working copy)
@@ -1123,18 +1123,31 @@ gimple_equal_p (same_succ same_succ, gim
 }
 }
 
-/* Let GSI skip backwards over local defs.  */
+/* Let GSI skip backwards over local defs.  Return the earliest vuse in VUSE.
+   Return true in VUSE_ESCAPED if the vuse influenced a SSA_OP_DEF of one of the
+   processed statements.  */
 
 static void
-gsi_advance_bw_nondebug_nonlocal (gimple_stmt_iterator *gsi)
+gsi_advance_bw_nondebug_nonlocal (gimple_stmt_iterator *gsi, tree *vuse,
+  bool *vuse_escaped)
 {
   gimple stmt;
+  tree lvuse;
 
   while (true)
 {
   if (gsi_end_p (*gsi))
 	return;
   stmt = gsi_stmt (*gsi);
+
+  lvuse = gimple_vuse (stmt);
+  if (lvuse != NULL_TREE)
+	{
+	  *vuse = lvuse;
+	  if (!ZERO_SSA_OPERANDS (stmt, SSA_OP_DEF))
+	*vuse_escaped = true;
+	}
+
   if (!(is_gimple_assign (stmt) && local_def (gimple_get_lhs (stmt))
 	&& !gimple_has_side_effects (stmt)))
 	return;
@@ -1150,9 +1163,11 @@ find_duplicate (same_succ same_succ, bas
 {
   gimple_stmt_iterator gsi1 = gsi_last_nondebug_bb (bb1);
   gimple_stmt_iterator gsi2 = gsi_last_nondebug_bb (bb2);
+  tree vuse1 = NULL_TREE, vuse2 = NULL_TREE;
+  bool vuse_escaped = false;
 
-  gsi_advance_bw_nondebug_nonlocal (&gsi1);
-  gsi_advance_bw_nondebug_nonlocal (&gsi2);
+  gsi_advance_bw_nondebug_nonlocal (&gsi1, &vuse1, &vuse_escaped);
+  gsi_advance_bw_nondebug_nonlocal (&gsi2, &vuse2, &vuse_escaped);
 
   while (!gsi_end_p (gsi1) && !gsi_end_p (gsi2))
 {
@@ -1161,13 +1176,20 @@ find_duplicate (same_succ same_succ, bas
 
   gsi_prev_nondebug (&gsi1);
   gsi_prev_nondebug (&gsi2);
-  gsi_advance_bw_nondebug_nonlocal (&gsi1);
-  gsi_advance_bw_nondebug_nonlocal (&gsi2);
+  gsi_advance_bw_nondebug_nonlocal (&gsi1, &vuse1, &vuse_escaped);
+  gsi_advance_bw_nondebug_nonlocal (&gsi2, &vuse2, &vuse_escaped);
 }
 
   if (!(gsi_end_p (gsi1) && gsi_end_p (gsi2)))
 return;
 
+  /* If the incoming vuses are not the same, and the vuse escaped into an
+ SSA_OP_DEF, then merging the 2 blocks will change the value of the def,
+ which potentially means the semantics of one of the blocks will be changed.
+ TODO: make this check more precise.  */
+  if (vuse_escaped && vuse1 != vuse2)
+return;
+
   if (dump_file)
 fprintf (dump_file, "find_duplicates:  duplicate of \n",
 	 bb1->index, bb2->index);
Index: gcc/testsuite/gcc.dg/pr52734.c
==

Re: [PR tree-optimization/52558]: RFC: questions on store data race

2012-04-13 Thread Richard Guenther
On Fri, Apr 13, 2012 at 12:11 AM, Aldy Hernandez  wrote:
> Here we have a testcase that affects both the C++ memory model and
> transactional memory.
>
> [Hans, this is caused by the same problem that is causing the speculative
> register promotion issue you and Torvald pointed me at].
>
> In the following testcase (adapted from the PR), the loop invariant motion
> pass caches a pre-existing value for g_2, and then performs a store to g_2
> on every path, causing a store data race:
>
> int g_1 = 1;
> int g_2 = 0;
>
> int func_1(void)
> {
>  int l;
>  for (l = 0; l < 1234; l++)
>  {
>    if (g_1)
>      return l;
>    else
>      g_2 = 0;  <-- Store to g_2 should only happen if !g_1
>  }
>  return 999;
> }
>
> This gets transformed into something like:
>
>        g_2_lsm = g_2;
>        if (g_1) {
>                g_2 = g_2_lsm;  // boo! hiss!
>                return 0;
>        } else {
>                g_2_lsm = 0;
>                g_2 = g_2_lsm;
>        }
>
> The spurious write to g_2 could cause a data race.
>
> Andrew has suggested verifying that the store to g_2 was actually different
> than on entry, and letting PHI copy propagation optimize the redundant
> comparisons.  Like this:
>
>        g_2_lsm = g_2;
>        if (g_1) {
>                if (g_2_lsm != g_2)     // hopefully optimized away
>                        g_2 = g_2_lsm;  // hopefully optimized away
>                return 0;
>        } else {
>                g_2_lsm = 0;
>                if (g_2_lsm != g_2)     // hopefully optimized away
>                        g_2 = g_2_lsm;
>        }
>
> ...which PHI copy propagation and/or friends should be able to optimize.
>
> For that matter, regardless of the memory model, the proposed solution would
> improve the existing code by removing the extra store (in this contrived
> test case anyhow).
>
> Anyone see a problem with this approach (see attached patch)?

Can we _remove_ a store we percieve as redundant (with a single-threaded
view) with the memory model?  If so, then this sounds plausible (minus
the optimization that if the variable is written to unconditionally we avoid
this comparison -- see the places where we check whether we can apply
store motion to possibly trapping locations).

A similar effect could be obtained by keeping a flag whether we entered the
path that stored the value before the transform.  Thus, do

  lsm = g2;  // technically not needed, if you also handle loads
  flag = false;
  for (...)
{
   if (g1)
 {
if (flag)
  g2 = lsm;
 }
   else
 {
lsm = 0;
flag = true;
 }
}
  if (flag)
g2 = lsm;

that would allow to extend this to cover the cases where the access may
eventually trap (if you omit the load before the loop).

Not sure which is more efficient (you can eventually combine flag variables
for different store locations, combining the if-changed compare is not so easy
I guess).

> Assuming the unlikely scenario that everyone likes this :), I have two
> problems that I would like answered.  But feel free to ignore if the
> approach is a no go.
>
> 1. No pass can figure out the equality (or inequality) of g_2_lsm and g_2.
>  In the PR, Richard mentions that both FRE/PRE can figure this out, but they
> are not run after store motion.  DOM should also be able to, but apparently
> does not :(.  Tips?

Well.  Schedule FRE after loop opts ...

DOM is not prepared to handle loads/stores with differing VOPs - it was never
updated really after the alias-improvements merge.

> 2. The GIMPLE_CONDs I create in this patch are currently causing problems
> with complex floats/doubles.  do_jump is complaining that it can't compare
> them, so I assume it is too late (in tree-ssa-loop-im.c) to compare complex
> values since complex lowering has already happened (??). Is there a more
> canonical way of creating a GIMPLE_COND that may contain complex floats at
> this stage?

As you are handling redundant stores you want a bitwise comparison anyway,
but yes, complex compares need to be lowered.  But as said, you want a
bitwise comparison, not a value-comparison.  You can try using

  if (VIEW_CONVERT_EXPR 
(lsm_tmp_reg) != VIEW_CONVERT_EXPR <  > (orig_value))
...

that can of course result in really awful codegen (think of fp - gp register
moves, or even stack spills).  Which maybe hints at that the flag approach
would be better in some cases?  (you need to cache the original value
in a register anyway, the only thing you save is updating it when you would
update the flag value)

Maybe you can combine both, eventually dependent on a target hook
can_compare_bitwise (mode) that would tell you if the target can efficiently
compare two registers in mode for bit equality.

Few comments on the patch itself.

+ new_bb = split_edge (ex);
+ then_bb = create_empty_bb (new_bb);
+ if (current_loops && new_bb->loop_father)
+   add_bb_to_loop (then_bb, new_bb->lo

Re: [PATCH] Fix PR 52203 and 52715

2012-04-13 Thread Alexander Monakov

On Fri, 13 Apr 2012, Andrey Belevantsev wrote:

> But of course I wrongly microoptimized the decision of whether an insn is
> "empty" as shown in PR 52715, so the right fix is to check the emptiness right
> before issuing the insn.  Thus, the following patch is really needed (tested
> on ia64 and x86 again), OK?  The new hunk is the last one in the patch.
> 
> Andrey
> 
> 2012-04-13  Andrey Belevantsev  
> 
>   PR rtl-optimization/52203
>   PR rtl-optimization/52715
> 
>   Revert the 2012-03-07 fix for PR 52203.
>   * sel-sched.c (reset_sched_cycles_in_current_ebb): Check that
>   the insn does not modify DFA right before issuing, adjust
>   issue_rate accordingly.

OK.  Thanks.

Alexander


Re: [PATCH] Fix for PR52734 (-ftree-tail-merge)

2012-04-13 Thread Richard Guenther
On Fri, Apr 13, 2012 at 10:32 AM, Tom de Vries  wrote:
> Richard,
>
> this patch fixes PR52743.
>
> The problem is as follows: blocks 3 and 5, with successor 6 are considered 
> equal
> and merged.
> ...
>  # BLOCK 3 freq:6102
>  # PRED: 2 [61.0%]  (true,exec)
>  # VUSE <.MEMD.1734_10>
>  dddD.1710_3 = bbbD.1703;
>  goto ;
>  # SUCC: 6 [100.0%]  (fallthru,exec)
>
>  # BLOCK 5 freq:2378
>  # PRED: 4 [61.0%]  (false,exec)
>  # SUCC: 6 [100.0%]  (fallthru,exec)
>
>  # BLOCK 6 freq:1
>  # PRED: 3 [100.0%]  (fallthru,exec) 7 [100.0%]  (fallthru) 5 [100.0%]
> (fallthru,exec)
>  # dddD.1710_1 = PHI 
>  # .MEMD.1734_8 = PHI <.MEMD.1734_10(3), .MEMD.1734_11(7), .MEMD.1734_11(5)>
>  # VUSE <.MEMD.1734_8>
>  return dddD.1710_1;
>  # SUCC: EXIT [100.0%]
> ...
>
> Tail merge considers 2 blocks equal if the effect at the tail is equal,
> meaning:
> - the sequence of side effects produced by each block is equal
> - the value phis are equal
>
> There are no side effects in block 3 and 5, and the phi alternatives of
> dddD.1710_1 for 3 (dddD.1710_3)  and 5 (dddD.1710_4)  are proven equal by gvn.
>
> The problem is that changing the (4->5) edge into a (4->3) edge changes the
> value of dddD.1710_3, because block 4 contains a store that affects the load 
> in
> block 3.
> ...
>  # BLOCK 4 freq:3898
>  # PRED: 2 [39.0%]  (false,exec)
>  # VUSE <.MEMD.1734_10>
>  dddD.1710_4 = bbbD.1703;
>  # .MEMD.1734_11 = VDEF <.MEMD.1734_10>
>  # USE = nonlocal null
>  # CLB = nonlocal null
>  D.1724_5 = aaaD.1705 ();
>  if (D.1724_5 != 0)
>    goto ;
>  else
>    goto ;
>  # SUCC: 7 [39.0%]  (true,exec) 5 [61.0%]  (false,exec)
> ...
>
> Or, put differently, the incoming vuse of block 3 affects a value phi
> alternative for that block (dddD.1710_3), so the 2 blocks are equal only under
> the condition that the incoming vuses are equal.
>
> We could build an analysis that addresses that precisely, but for now I
> implemented a more coarse-grained fix: if the incoming vuses are not equal, 
> and
> at least one of the vuses influenced a non-virtual result, we don't consider 
> the
> blocks equal.
>
> Bootstrapped and reg-tested on x86_64.
>
> ok for trunk, 4.7.1?

Hmm, I wonder if the proper observation would not be that while GVN considers
the PHI arguments in

>  # dddD.1710_1 = PHI 

to be equal, their definitions are not in the blocks we try to merge, so their
value can be dependent on the status as it was on their predecessors.
GVN, after all, considers flow-sensitivity.

Richard.


> Thanks,
> - Tom
>
> 2012-04-13  Tom de Vries  
>
>        * tree-ssa-tail-merge.c (gsi_advance_bw_nondebug_nonlocal): Add
>        parameters vuse and vuse_escaped.
>        (find_duplicate): Init vuse1, vuse2 and vuse_escaped.  Pass to
>        gsi_advance_bw_nondebug_nonlocal.  Return if vuse_escaped and
>        vuse1 != vuse2.
>
>        * gcc.dg/pr52734.c: New test.
>


Re: [PATCH][C] Fix PR52549

2012-04-13 Thread Richard Guenther
On Thu, 12 Apr 2012, Richard Guenther wrote:

> 
> This fixes PR52549 - we are running into an overzealous assert
> that wants to make sure we don't have PLUS_EXPR on pointers.
> But that code does not really check this and falls foul of
> the conversion removal code right before it that transforms
> (void *)(a + b) to a + b.
> 
> Fixed as follows.
> 
> Bootstrap and regtest pending on x86_64-unknown-linux-gnu.

Done.

> Ok?

Actually I changed my mind and consider it obvious.

Richard.

> 2012-04-12  Richard Guenther  
> 
>   PR c/52549
>   * c-typeck.c (pointer_diff): Remove bogus assert.
> 
>   * gcc.dg/pr52549.c: New testcase.
> 
> Index: gcc/c-typeck.c
> ===
> --- gcc/c-typeck.c(revision 186373)
> +++ gcc/c-typeck.c(working copy)
> @@ -3446,8 +3446,6 @@ pointer_diff (location_t loc, tree op0,
>else
>  con1 = op1;
>  
> -  gcc_assert (TREE_CODE (con0) != PLUS_EXPR
> -   && TREE_CODE (con1) != PLUS_EXPR);
>if (TREE_CODE (con0) == POINTER_PLUS_EXPR)
>  {
>lit0 = TREE_OPERAND (con0, 1);
> Index: gcc/testsuite/gcc.dg/pr52549.c
> ===
> --- gcc/testsuite/gcc.dg/pr52549.c(revision 0)
> +++ gcc/testsuite/gcc.dg/pr52549.c(revision 0)
> @@ -0,0 +1,6 @@
> +/* { dg-do compile } */
> +
> +_mark (long obj, int i, char *a)
> +{
> +  (char *)&(((long *)(obj)) [i]) - a;
> +}
> 


Re: [PATCH, i386] Move common definitions of gnu-user.h and gnu-user64.h to separate file

2012-04-13 Thread Uros Bizjak
Hello!

> Here is a patch which creates new gnu-user-common.h file and moves all
> common gnu-user.h and gnu-user64.h definitions to this new file. New
> file is required to avoid duplication of Android specific changes in
> gnu-user.h and gnu-user64.h. This patch is actually a non Android
> specific part of previously submitted patch to support Android in x86
> target (http://gcc.gnu.org/ml/gcc-patches/2012-04/msg00127.html).
> Bootstrapped and tested on linux-x86_64. Is it OK for mainline and
> 4.7?

> 2012-04-13  Enkovich Ilya  
>
>   * config/i386/gnu-user.h: Include gnu-user-common.h.
>   (CPP_SPEC): Removed.
>   (CC1_SPEC): Removed.
>   (ENDFILE_SPEC): Removed.
>   (DEFAULT_PCC_STRUCT_RETURN): Removed.
>   (TARGET_TLS_DIRECT_SEG_REFS_DEFAULT): Removed.
>   (TARGET_OS_CPP_BUILTINS): Removed.
>   (LIBGCC2_HAS_TF_MODE): Removed.
>   (LIBGCC2_TF_CEXT): Removed.
>   (TF_SIZE): Removed.
>   (TARGET_ASM_FILE_END): Removed.
>   (STACK_CHECK_MOVING_SP): Removed.
>   (STACK_CHECK_STATIC_BUILTIN): Removed.
>   (TARGET_THREAD_SSP_OFFSET): Removed.
>   (TARGET_CAN_SPLIT_STACK): Removed.
>   (TARGET_THREAD_SPLIT_STACK_OFFSET): Removed.
>
>   * config/i386/gnu-user64.h: Likewise.
>
>   * config/i386/gnu-user-common.h: New.

This ChangeLog is wrong, you didn't remove these files but move them
to new file.

+#ifdef TARGET_LIBC_PROVIDES_SSP
+/* i386 glibc provides __stack_chk_guard in %gs:0x14,
+   x32 glibc provides it in %fs:0x18.
+   x86_64 glibc provides it in %fs:0x28.  */
+#define TARGET_THREAD_SSP_OFFSET \
+  (TARGET_64BIT ? (TARGET_X32 ? 0x18 : 0x28) : 0x14)
+
+/* We steal the last transactional memory word.  */
+#define TARGET_CAN_SPLIT_STACK
+#define TARGET_THREAD_SPLIT_STACK_OFFSET \
+  (TARGET_64BIT ? (TARGET_X32 ? 0x40 : 0x70) : 0x30)
+#endif

Shouldn't TARGET_64BIT part remain in gnu_user64.h and !TARGET_64BIT
in gnu-user.h? I don't see the reason to put these conditinal defines
in shared file. However, this may be due to biarch handling that I'm
not familiar with in all details.

+#include "config/i386/gnu-user-common.h"

You shouldn't include new file here, list it in config.gcc before
i386/gnu-user.h or i386/gnu-user64.c

Uros.


Re: [PATCH] Option to build bare-metal ARM cross-compiler for arm-none-eabi target without libunwind

2012-04-13 Thread Sebastian Huber

Hello,

I think it is good to disable the exceptions for the division by zero.  Maybe 
this should be made target specific and not based on a configure option.  For 
example in "libgcc/config.host":


arm*-*-linux*)
[...]
tmake_file="${tmake_file} arm/t-div-by-zero-exc"
[...]

With "libgcc/config/arm/t-div-by-zero-exc":

LIB2_DIVMOD_CFLAGS := -fexceptions -fnon-call-exception

--
Sebastian Huber, embedded brains GmbH

Address : Obere Lagerstr. 30, D-82178 Puchheim, Germany
Phone   : +49 89 18 90 80 79-6
Fax : +49 89 18 90 80 79-9
E-Mail  : sebastian.hu...@embedded-brains.de
PGP : Public key available on request.

Diese Nachricht ist keine geschäftliche Mitteilung im Sinne des EHUG.


Re: [PATCH] Atom: Scheduler improvements for better imul placement

2012-04-13 Thread Igor Zamyatin
On Thu, Apr 12, 2012 at 5:01 PM, Andrey Belevantsev  wrote:
> On 12.04.2012 16:38, Richard Guenther wrote:
>>
>> On Thu, Apr 12, 2012 at 2:36 PM, Igor Zamyatin
>>  wrote:
>>>
>>> On Thu, Apr 12, 2012 at 4:24 PM, Richard Guenther
>>>   wrote:

 On Thu, Apr 12, 2012 at 2:00 PM, Alexander Monakov
  wrote:
>
>
>> Can atom execute two IMUL in parallel?  Or what exactly is the
>> pipeline
>> behavior?
>
>
> As I understand from Intel's optimization reference manual, the
> behavior is as
> follows: if the instruction immediately following IMUL has shorter
> latency,
> execution is stalled for 4 cycles (which is IMUL's latency); otherwise,
> a
> 4-or-more cycles latency instruction can be issued after IMUL without a
> stall.
> In other words, IMUL is pipelined with respect to other long-latency
> instructions, but not to short-latency instructions.


 It seems to be modeled in the pipeline description though:

 ;;; imul insn has 5 cycles latency
 (define_reservation "atom-imul-32"
                    "atom-imul-1, atom-imul-2, atom-imul-3, atom-imul-4,
                     atom-port-0")

 ;;; imul instruction excludes other non-FP instructions.
 (exclusion_set "atom-eu-0, atom-eu-1"
               "atom-imul-1, atom-imul-2, atom-imul-3, atom-imul-4")

>>>
>>> The main idea is quite simple:
>>>
>>> If we are going to schedule IMUL instruction (it is on the top of
>>> ready list) we try to find out producer of other (independent) IMUL
>>> instruction that is in ready list too. The goal is try to schedule
>>> such a producer to get another IMUL in ready list and get scheduling
>>> of 2 successive IMUL instructions.
>>
>>
>> Why does that not happen without your patch?  Does it never happen without
>> your patch or does it merely not happen for one EEMBC benchmark (can
>> you provide a testcase?)?
>
>
> It does not happen because the scheduler by itself does not do such specific
> reordering.  That said, it is easy to imagine the cases where this patch
> will make things worse rather than better.
>
> Igor, why not try different subtler mechanisms like adjust_priority, which
> is get called when an insn is added to the ready list?  E.g. increase the
> producer's priority.
>
> The patch as is misses checks for NONDEBUG_INSN_P.  Also, why bail out when
> you have more than one imul in the ready list?  Don't you want to bump the
> priority of the other imul found?

Could you provide some examples when this patch would harm the performance?

Sched_reorder was chosen since it is used in other ports and looks
most suitable for such case, e.g. it provides access to the whole
ready list.
BTW, just increasing producer's priority seems to be more risky in
performance sense - we can incorrectly start delaying some
instructions.

Thought ready list doesn't contain DEBUG_INSN... Is it so? If it
contains them - this could be added easily

The case with more than one imul in the ready list wasn't considered
just because the goal was to catch the particular case when there is a
risk to get the following picture: "imul-producer-imul" which is less
effective than "producer-imul-imul" for Atom

>
> Andrey
>
>
>>
>>> And MD allows us only prefer scheduling of successive IMUL instructions,
>>> i.e.
>>> If IMUL was just scheduled and ready list contains another IMUL
>>> instruction then it will be chosen as the best candidate for
>>> scheduling.
>>>
>>>
 at least from my very limited guessing of what the above does.  So, did
 you
 analyze why the scheduler does not properly schedule things for you?

 Richard.

>
>  From reading the patch, I could not understand the link between
> pipeline
> behavior and what the patch appears to do.
>
> Hope that helps.
>
> Alexander
>
>

Thanks,
Igor


Re: [PATCH, i386] Move common definitions of gnu-user.h and gnu-user64.h to separate file

2012-04-13 Thread Ilya Enkovich
> Hello!
>
>> Here is a patch which creates new gnu-user-common.h file and moves all
>> common gnu-user.h and gnu-user64.h definitions to this new file. New
>> file is required to avoid duplication of Android specific changes in
>> gnu-user.h and gnu-user64.h. This patch is actually a non Android
>> specific part of previously submitted patch to support Android in x86
>> target (http://gcc.gnu.org/ml/gcc-patches/2012-04/msg00127.html).
>> Bootstrapped and tested on linux-x86_64. Is it OK for mainline and
>> 4.7?
>
>> 2012-04-13  Enkovich Ilya  
>>
>>       * config/i386/gnu-user.h: Include gnu-user-common.h.
>>       (CPP_SPEC): Removed.
>>       (CC1_SPEC): Removed.
>>       (ENDFILE_SPEC): Removed.
>>       (DEFAULT_PCC_STRUCT_RETURN): Removed.
>>       (TARGET_TLS_DIRECT_SEG_REFS_DEFAULT): Removed.
>>       (TARGET_OS_CPP_BUILTINS): Removed.
>>       (LIBGCC2_HAS_TF_MODE): Removed.
>>       (LIBGCC2_TF_CEXT): Removed.
>>       (TF_SIZE): Removed.
>>       (TARGET_ASM_FILE_END): Removed.
>>       (STACK_CHECK_MOVING_SP): Removed.
>>       (STACK_CHECK_STATIC_BUILTIN): Removed.
>>       (TARGET_THREAD_SSP_OFFSET): Removed.
>>       (TARGET_CAN_SPLIT_STACK): Removed.
>>       (TARGET_THREAD_SPLIT_STACK_OFFSET): Removed.
>>
>>       * config/i386/gnu-user64.h: Likewise.
>>
>>       * config/i386/gnu-user-common.h: New.
>
> This ChangeLog is wrong, you didn't remove these files but move them
> to new file.
>
> +#ifdef TARGET_LIBC_PROVIDES_SSP
> +/* i386 glibc provides __stack_chk_guard in %gs:0x14,
> +   x32 glibc provides it in %fs:0x18.
> +   x86_64 glibc provides it in %fs:0x28.  */
> +#define TARGET_THREAD_SSP_OFFSET \
> +  (TARGET_64BIT ? (TARGET_X32 ? 0x18 : 0x28) : 0x14)
> +
> +/* We steal the last transactional memory word.  */
> +#define TARGET_CAN_SPLIT_STACK
> +#define TARGET_THREAD_SPLIT_STACK_OFFSET \
> +  (TARGET_64BIT ? (TARGET_X32 ? 0x40 : 0x70) : 0x30)
> +#endif
>
> Shouldn't TARGET_64BIT part remain in gnu_user64.h and !TARGET_64BIT
> in gnu-user.h? I don't see the reason to put these conditinal defines
> in shared file. However, this may be due to biarch handling that I'm
> not familiar with in all details.
>
> +#include "config/i386/gnu-user-common.h"
>
> You shouldn't include new file here, list it in config.gcc before
> i386/gnu-user.h or i386/gnu-user64.c
>
> Uros.

Hello,

Thanks for review! Here is a new version with all issues fixed.

Bootstrapped on linux-x86_64 and check is in progress. Will it be OK
for trunk and 4.7 after successfull check?

Thanks,
Ilya
---
2012-04-13  Enkovich Ilya  

* config/i386/gnu-user-common.h: New.

* config/i386/gnu-user.h (CPP_SPEC): Moved to
gnu-user-common.h.
(CC1_SPEC): Likewise.
(ENDFILE_SPEC): Likewise.
(DEFAULT_PCC_STRUCT_RETURN): Likewise.
(TARGET_TLS_DIRECT_SEG_REFS_DEFAULT): Likewise.
(TARGET_OS_CPP_BUILTINS): Likewise.
(LIBGCC2_HAS_TF_MODE): Likewise.
(LIBGCC2_TF_CEXT): Likewise.
(TF_SIZE): Likewise.
(TARGET_ASM_FILE_END): Likewise.
(STACK_CHECK_MOVING_SP): Likewise.
(STACK_CHECK_STATIC_BUILTIN): Likewise.

* config/i386/gnu-user64.h: Likewise.


mandroid.patch
Description: Binary data


Re: Phone call (was Re: Armhf dynamic linker path)

2012-04-13 Thread Richard Earnshaw
On 12/04/12 19:29, Dennis Gilmore wrote:
> 
> off topic but i find aarch64 weird and too generic is it arm alpha amd
> atom.  
> 

That's only 'cos it's new.  It's no different from names like ia64.

R.



Re: [PATCH, i386] Move common definitions of gnu-user.h and gnu-user64.h to separate file

2012-04-13 Thread Richard Guenther
On Fri, Apr 13, 2012 at 12:37 PM, Ilya Enkovich  wrote:
>> Hello!
>>
>>> Here is a patch which creates new gnu-user-common.h file and moves all
>>> common gnu-user.h and gnu-user64.h definitions to this new file. New
>>> file is required to avoid duplication of Android specific changes in
>>> gnu-user.h and gnu-user64.h. This patch is actually a non Android
>>> specific part of previously submitted patch to support Android in x86
>>> target (http://gcc.gnu.org/ml/gcc-patches/2012-04/msg00127.html).
>>> Bootstrapped and tested on linux-x86_64. Is it OK for mainline and
>>> 4.7?
>>
>>> 2012-04-13  Enkovich Ilya  
>>>
>>>       * config/i386/gnu-user.h: Include gnu-user-common.h.
>>>       (CPP_SPEC): Removed.
>>>       (CC1_SPEC): Removed.
>>>       (ENDFILE_SPEC): Removed.
>>>       (DEFAULT_PCC_STRUCT_RETURN): Removed.
>>>       (TARGET_TLS_DIRECT_SEG_REFS_DEFAULT): Removed.
>>>       (TARGET_OS_CPP_BUILTINS): Removed.
>>>       (LIBGCC2_HAS_TF_MODE): Removed.
>>>       (LIBGCC2_TF_CEXT): Removed.
>>>       (TF_SIZE): Removed.
>>>       (TARGET_ASM_FILE_END): Removed.
>>>       (STACK_CHECK_MOVING_SP): Removed.
>>>       (STACK_CHECK_STATIC_BUILTIN): Removed.
>>>       (TARGET_THREAD_SSP_OFFSET): Removed.
>>>       (TARGET_CAN_SPLIT_STACK): Removed.
>>>       (TARGET_THREAD_SPLIT_STACK_OFFSET): Removed.
>>>
>>>       * config/i386/gnu-user64.h: Likewise.
>>>
>>>       * config/i386/gnu-user-common.h: New.
>>
>> This ChangeLog is wrong, you didn't remove these files but move them
>> to new file.
>>
>> +#ifdef TARGET_LIBC_PROVIDES_SSP
>> +/* i386 glibc provides __stack_chk_guard in %gs:0x14,
>> +   x32 glibc provides it in %fs:0x18.
>> +   x86_64 glibc provides it in %fs:0x28.  */
>> +#define TARGET_THREAD_SSP_OFFSET \
>> +  (TARGET_64BIT ? (TARGET_X32 ? 0x18 : 0x28) : 0x14)
>> +
>> +/* We steal the last transactional memory word.  */
>> +#define TARGET_CAN_SPLIT_STACK
>> +#define TARGET_THREAD_SPLIT_STACK_OFFSET \
>> +  (TARGET_64BIT ? (TARGET_X32 ? 0x40 : 0x70) : 0x30)
>> +#endif
>>
>> Shouldn't TARGET_64BIT part remain in gnu_user64.h and !TARGET_64BIT
>> in gnu-user.h? I don't see the reason to put these conditinal defines
>> in shared file. However, this may be due to biarch handling that I'm
>> not familiar with in all details.
>>
>> +#include "config/i386/gnu-user-common.h"
>>
>> You shouldn't include new file here, list it in config.gcc before
>> i386/gnu-user.h or i386/gnu-user64.c
>>
>> Uros.
>
> Hello,
>
> Thanks for review! Here is a new version with all issues fixed.
>
> Bootstrapped on linux-x86_64 and check is in progress. Will it be OK
> for trunk and 4.7 after successfull check?

For sure such changes are not appropriate for a release branch.

Richard.

> Thanks,
> Ilya
> ---
> 2012-04-13  Enkovich Ilya  
>
>        * config/i386/gnu-user-common.h: New.
>
>        * config/i386/gnu-user.h (CPP_SPEC): Moved to
>        gnu-user-common.h.
>        (CC1_SPEC): Likewise.
>        (ENDFILE_SPEC): Likewise.
>        (DEFAULT_PCC_STRUCT_RETURN): Likewise.
>        (TARGET_TLS_DIRECT_SEG_REFS_DEFAULT): Likewise.
>        (TARGET_OS_CPP_BUILTINS): Likewise.
>        (LIBGCC2_HAS_TF_MODE): Likewise.
>        (LIBGCC2_TF_CEXT): Likewise.
>        (TF_SIZE): Likewise.
>        (TARGET_ASM_FILE_END): Likewise.
>        (STACK_CHECK_MOVING_SP): Likewise.
>        (STACK_CHECK_STATIC_BUILTIN): Likewise.
>
>        * config/i386/gnu-user64.h: Likewise.


Re: [PATCH] Fix for PR52734 (-ftree-tail-merge)

2012-04-13 Thread Tom de Vries
On 13/04/12 11:13, Richard Guenther wrote:
> On Fri, Apr 13, 2012 at 10:32 AM, Tom de Vries  wrote:
>> Richard,
>>
>> this patch fixes PR52743.
>>
>> The problem is as follows: blocks 3 and 5, with successor 6 are considered 
>> equal
>> and merged.
>> ...
>>  # BLOCK 3 freq:6102
>>  # PRED: 2 [61.0%]  (true,exec)
>>  # VUSE <.MEMD.1734_10>
>>  dddD.1710_3 = bbbD.1703;
>>  goto ;
>>  # SUCC: 6 [100.0%]  (fallthru,exec)
>>
>>  # BLOCK 5 freq:2378
>>  # PRED: 4 [61.0%]  (false,exec)
>>  # SUCC: 6 [100.0%]  (fallthru,exec)
>>
>>  # BLOCK 6 freq:1
>>  # PRED: 3 [100.0%]  (fallthru,exec) 7 [100.0%]  (fallthru) 5 [100.0%]
>> (fallthru,exec)
>>  # dddD.1710_1 = PHI 
>>  # .MEMD.1734_8 = PHI <.MEMD.1734_10(3), .MEMD.1734_11(7), .MEMD.1734_11(5)>
>>  # VUSE <.MEMD.1734_8>
>>  return dddD.1710_1;
>>  # SUCC: EXIT [100.0%]
>> ...
>>
>> Tail merge considers 2 blocks equal if the effect at the tail is equal,
>> meaning:
>> - the sequence of side effects produced by each block is equal
>> - the value phis are equal
>>
>> There are no side effects in block 3 and 5, and the phi alternatives of
>> dddD.1710_1 for 3 (dddD.1710_3)  and 5 (dddD.1710_4)  are proven equal by 
>> gvn.
>>
>> The problem is that changing the (4->5) edge into a (4->3) edge changes the
>> value of dddD.1710_3, because block 4 contains a store that affects the load 
>> in
>> block 3.
>> ...
>>  # BLOCK 4 freq:3898
>>  # PRED: 2 [39.0%]  (false,exec)
>>  # VUSE <.MEMD.1734_10>
>>  dddD.1710_4 = bbbD.1703;
>>  # .MEMD.1734_11 = VDEF <.MEMD.1734_10>
>>  # USE = nonlocal null
>>  # CLB = nonlocal null
>>  D.1724_5 = aaaD.1705 ();
>>  if (D.1724_5 != 0)
>>goto ;
>>  else
>>goto ;
>>  # SUCC: 7 [39.0%]  (true,exec) 5 [61.0%]  (false,exec)
>> ...
>>
>> Or, put differently, the incoming vuse of block 3 affects a value phi
>> alternative for that block (dddD.1710_3), so the 2 blocks are equal only 
>> under
>> the condition that the incoming vuses are equal.
>>
>> We could build an analysis that addresses that precisely, but for now I
>> implemented a more coarse-grained fix: if the incoming vuses are not equal, 
>> and
>> at least one of the vuses influenced a non-virtual result, we don't consider 
>> the
>> blocks equal.
>>
>> Bootstrapped and reg-tested on x86_64.
>>
>> ok for trunk, 4.7.1?
> 
> Hmm, I wonder if the proper observation would not be that while GVN considers
> the PHI arguments in
> 
>>  # dddD.1710_1 = PHI 
> 
> to be equal, their definitions are not in the blocks we try to merge, so their
> value can be dependent on the status as it was on their predecessors.
> GVN, after all, considers flow-sensitivity.
> 

we are replacing block 5 by block 3. The phi alternative for block 3 is
dddD.1710_3(3), which is defined in block 3. The problem is related to the vuse
dependency of the def of dddD.1710_3(3).

I'll try to reformulate. In tail_merge_optimize, we analyze:
1. if the blocks are equal (for which gvn might be used)
2. if the blocks can be merged. The blocks cannot be merged if the dependencies
   of the replacement block are not satisfied in the predecessors of the other
   blocks.

We handle the dependencies for virtual and normal values differently.

For normal values, we calculate BB_DEP_BB (The bb that either contains or is
dominated by the dependencies of the bb). If BB_DEP_BB of the replacement block
dominates the blocks that are replaced, the blocks can be merged.

For virtual values, we don't check for dependencies.

If we would check for virtual dependencies like normal dependencies, the
original example pr43864.c would not work anymore: there are 2 blocks with
identical result-less function calls, but with different incoming vuse.
It's safe to merge the blocks, so we don't treat the vuses as dependencies.

This test-case shows a case that the vuse is a dependency of the replacement
block, which is not satisfied by the predecessor of the replaced block.

We need an analysis of when a vuse needs to be considered a dependency.

This patch implement such an analysis. Conservative, but simple. OK for trunk,
4.7.1?

Thanks,
- Tom

> Richard.
> 
> 
>> Thanks,
>> - Tom
>>
>> 2012-04-13  Tom de Vries  
>>
>>* tree-ssa-tail-merge.c (gsi_advance_bw_nondebug_nonlocal): Add
>>parameters vuse and vuse_escaped.
>>(find_duplicate): Init vuse1, vuse2 and vuse_escaped.  Pass to
>>gsi_advance_bw_nondebug_nonlocal.  Return if vuse_escaped and
>>vuse1 != vuse2.
>>
>>* gcc.dg/pr52734.c: New test.
>>



Commit: RL78: Remove use of TODO_dump_func

2012-04-13 Thread Nick Clifton
Hi DJ,

  The optimization pass flag "TODO_dump_flag" has been removed (see
  patch committed 2012-04-11) which was causing the RL78 backend to fail
  to build.  I am applying the following patch as an obvious fix.

Cheers
  Nick

gcc/ChangeLog
2012-04-13  Nick Clifton  

* config/rl78/rl78.c (rl78_devirt_pass): Remove use of
TODO_dump_func flag.

Index: gcc/config/rl78/rl78.c
===
--- gcc/config/rl78/rl78.c  (revision 186405)
+++ gcc/config/rl78/rl78.c  (working copy)
@@ -140,7 +140,7 @@
   TV_MACH_DEP,
   0, 0, 0,
   0,
-  TODO_dump_func
+  0
 };
 
 static struct register_pass_info rl78_devirt_info =



Re: [PATCH] Atom: Scheduler improvements for better imul placement

2012-04-13 Thread Igor Zamyatin
On Fri, Apr 13, 2012 at 2:18 PM, Igor Zamyatin  wrote:
> On Thu, Apr 12, 2012 at 5:01 PM, Andrey Belevantsev  wrote:
>> On 12.04.2012 16:38, Richard Guenther wrote:
>>>
>>> On Thu, Apr 12, 2012 at 2:36 PM, Igor Zamyatin
>>>  wrote:

 On Thu, Apr 12, 2012 at 4:24 PM, Richard Guenther
   wrote:
>
> On Thu, Apr 12, 2012 at 2:00 PM, Alexander Monakov
>  wrote:
>>
>>
>>> Can atom execute two IMUL in parallel?  Or what exactly is the
>>> pipeline
>>> behavior?
>>
>>
>> As I understand from Intel's optimization reference manual, the
>> behavior is as
>> follows: if the instruction immediately following IMUL has shorter
>> latency,
>> execution is stalled for 4 cycles (which is IMUL's latency); otherwise,
>> a
>> 4-or-more cycles latency instruction can be issued after IMUL without a
>> stall.
>> In other words, IMUL is pipelined with respect to other long-latency
>> instructions, but not to short-latency instructions.
>
>
> It seems to be modeled in the pipeline description though:
>
> ;;; imul insn has 5 cycles latency
> (define_reservation "atom-imul-32"
>                    "atom-imul-1, atom-imul-2, atom-imul-3, atom-imul-4,
>                     atom-port-0")
>
> ;;; imul instruction excludes other non-FP instructions.
> (exclusion_set "atom-eu-0, atom-eu-1"
>               "atom-imul-1, atom-imul-2, atom-imul-3, atom-imul-4")
>

 The main idea is quite simple:

 If we are going to schedule IMUL instruction (it is on the top of
 ready list) we try to find out producer of other (independent) IMUL
 instruction that is in ready list too. The goal is try to schedule
 such a producer to get another IMUL in ready list and get scheduling
 of 2 successive IMUL instructions.
>>>
>>>
>>> Why does that not happen without your patch?  Does it never happen without
>>> your patch or does it merely not happen for one EEMBC benchmark (can
>>> you provide a testcase?)?
>>
>>
>> It does not happen because the scheduler by itself does not do such specific
>> reordering.  That said, it is easy to imagine the cases where this patch
>> will make things worse rather than better.
>>
>> Igor, why not try different subtler mechanisms like adjust_priority, which
>> is get called when an insn is added to the ready list?  E.g. increase the
>> producer's priority.
>>
>> The patch as is misses checks for NONDEBUG_INSN_P.  Also, why bail out when
>> you have more than one imul in the ready list?  Don't you want to bump the
>> priority of the other imul found?
>
> Could you provide some examples when this patch would harm the performance?
>
> Sched_reorder was chosen since it is used in other ports and looks
> most suitable for such case, e.g. it provides access to the whole
> ready list.
> BTW, just increasing producer's priority seems to be more risky in
> performance sense - we can incorrectly start delaying some
> instructions.
>
> Thought ready list doesn't contain DEBUG_INSN... Is it so? If it
> contains them - this could be added easily
>
> The case with more than one imul in the ready list wasn't considered
> just because the goal was to catch the particular case when there is a
> risk to get the following picture: "imul-producer-imul" which is less
> effective than "producer-imul-imul" for Atom


Actually, this constraint of one imul could be omitted. I'm going to
try this change

>
>>
>> Andrey
>>
>>
>>>
 And MD allows us only prefer scheduling of successive IMUL instructions,
 i.e.
 If IMUL was just scheduled and ready list contains another IMUL
 instruction then it will be chosen as the best candidate for
 scheduling.


> at least from my very limited guessing of what the above does.  So, did
> you
> analyze why the scheduler does not properly schedule things for you?
>
> Richard.
>
>>
>>  From reading the patch, I could not understand the link between
>> pipeline
>> behavior and what the patch appears to do.
>>
>> Hope that helps.
>>
>> Alexander
>>
>>
>
> Thanks,
> Igor


Re: [PATCH] Fix for PR52734 (-ftree-tail-merge)

2012-04-13 Thread Richard Guenther
On Fri, Apr 13, 2012 at 12:56 PM, Tom de Vries  wrote:
> On 13/04/12 11:13, Richard Guenther wrote:
>> On Fri, Apr 13, 2012 at 10:32 AM, Tom de Vries  
>> wrote:
>>> Richard,
>>>
>>> this patch fixes PR52743.
>>>
>>> The problem is as follows: blocks 3 and 5, with successor 6 are considered 
>>> equal
>>> and merged.
>>> ...
>>>  # BLOCK 3 freq:6102
>>>  # PRED: 2 [61.0%]  (true,exec)
>>>  # VUSE <.MEMD.1734_10>
>>>  dddD.1710_3 = bbbD.1703;
>>>  goto ;
>>>  # SUCC: 6 [100.0%]  (fallthru,exec)
>>>
>>>  # BLOCK 5 freq:2378
>>>  # PRED: 4 [61.0%]  (false,exec)
>>>  # SUCC: 6 [100.0%]  (fallthru,exec)
>>>
>>>  # BLOCK 6 freq:1
>>>  # PRED: 3 [100.0%]  (fallthru,exec) 7 [100.0%]  (fallthru) 5 [100.0%]
>>> (fallthru,exec)
>>>  # dddD.1710_1 = PHI 
>>>  # .MEMD.1734_8 = PHI <.MEMD.1734_10(3), .MEMD.1734_11(7), .MEMD.1734_11(5)>
>>>  # VUSE <.MEMD.1734_8>
>>>  return dddD.1710_1;
>>>  # SUCC: EXIT [100.0%]
>>> ...
>>>
>>> Tail merge considers 2 blocks equal if the effect at the tail is equal,
>>> meaning:
>>> - the sequence of side effects produced by each block is equal
>>> - the value phis are equal
>>>
>>> There are no side effects in block 3 and 5, and the phi alternatives of
>>> dddD.1710_1 for 3 (dddD.1710_3)  and 5 (dddD.1710_4)  are proven equal by 
>>> gvn.
>>>
>>> The problem is that changing the (4->5) edge into a (4->3) edge changes the
>>> value of dddD.1710_3, because block 4 contains a store that affects the 
>>> load in
>>> block 3.
>>> ...
>>>  # BLOCK 4 freq:3898
>>>  # PRED: 2 [39.0%]  (false,exec)
>>>  # VUSE <.MEMD.1734_10>
>>>  dddD.1710_4 = bbbD.1703;
>>>  # .MEMD.1734_11 = VDEF <.MEMD.1734_10>
>>>  # USE = nonlocal null
>>>  # CLB = nonlocal null
>>>  D.1724_5 = aaaD.1705 ();
>>>  if (D.1724_5 != 0)
>>>    goto ;
>>>  else
>>>    goto ;
>>>  # SUCC: 7 [39.0%]  (true,exec) 5 [61.0%]  (false,exec)
>>> ...
>>>
>>> Or, put differently, the incoming vuse of block 3 affects a value phi
>>> alternative for that block (dddD.1710_3), so the 2 blocks are equal only 
>>> under
>>> the condition that the incoming vuses are equal.
>>>
>>> We could build an analysis that addresses that precisely, but for now I
>>> implemented a more coarse-grained fix: if the incoming vuses are not equal, 
>>> and
>>> at least one of the vuses influenced a non-virtual result, we don't 
>>> consider the
>>> blocks equal.
>>>
>>> Bootstrapped and reg-tested on x86_64.
>>>
>>> ok for trunk, 4.7.1?
>>
>> Hmm, I wonder if the proper observation would not be that while GVN considers
>> the PHI arguments in
>>
>>>  # dddD.1710_1 = PHI 
>>
>> to be equal, their definitions are not in the blocks we try to merge, so 
>> their
>> value can be dependent on the status as it was on their predecessors.
>> GVN, after all, considers flow-sensitivity.
>>
>
> we are replacing block 5 by block 3. The phi alternative for block 3 is
> dddD.1710_3(3), which is defined in block 3. The problem is related to the 
> vuse
> dependency of the def of dddD.1710_3(3).
>
> I'll try to reformulate. In tail_merge_optimize, we analyze:
> 1. if the blocks are equal (for which gvn might be used)
> 2. if the blocks can be merged. The blocks cannot be merged if the 
> dependencies
>   of the replacement block are not satisfied in the predecessors of the other
>   blocks.
>
> We handle the dependencies for virtual and normal values differently.
>
> For normal values, we calculate BB_DEP_BB (The bb that either contains or is
> dominated by the dependencies of the bb). If BB_DEP_BB of the replacement 
> block
> dominates the blocks that are replaced, the blocks can be merged.
>
> For virtual values, we don't check for dependencies.
>
> If we would check for virtual dependencies like normal dependencies, the
> original example pr43864.c would not work anymore: there are 2 blocks with
> identical result-less function calls, but with different incoming vuse.
> It's safe to merge the blocks, so we don't treat the vuses as dependencies.
>
> This test-case shows a case that the vuse is a dependency of the replacement
> block, which is not satisfied by the predecessor of the replaced block.
>
> We need an analysis of when a vuse needs to be considered a dependency.

Ah, ok.  That makes sense then.

> This patch implement such an analysis. Conservative, but simple. OK for trunk,
> 4.7.1?

Yes.

Thanks,
Richard.

> Thanks,
> - Tom
>
>> Richard.
>>
>>
>>> Thanks,
>>> - Tom
>>>
>>> 2012-04-13  Tom de Vries  
>>>
>>>        * tree-ssa-tail-merge.c (gsi_advance_bw_nondebug_nonlocal): Add
>>>        parameters vuse and vuse_escaped.
>>>        (find_duplicate): Init vuse1, vuse2 and vuse_escaped.  Pass to
>>>        gsi_advance_bw_nondebug_nonlocal.  Return if vuse_escaped and
>>>        vuse1 != vuse2.
>>>
>>>        * gcc.dg/pr52734.c: New test.
>>>
>


Re: [i386, patch, RFC] HLE support in GCC

2012-04-13 Thread Kirill Yukhin
> No, just the bits; programmers would need to do
>  __atomic_...(..., __ATOMIC_RELEASE | HLE_RELEASE);
> I believe this is what you had in one of your versions of the patch.  My
> suggestions was not about doing something new but instead a
> suggestions/poll for a resolution of the discussion.

Oh, okay, got it.
So, seems it all covered by my recent patch (hle-rfc-5.gcc.patch).

Any other inputs?

Thanks, K


Re: [RFC] Should SRA stop producing COMPONENT_REF for non-bit-fields (again)?

2012-04-13 Thread Richard Guenther
On Thu, 12 Apr 2012, Martin Jambor wrote:

> Hi,
> 
> On Wed, Apr 04, 2012 at 04:42:05PM +0200, Richard Guenther wrote:
> > On Wed, 4 Apr 2012, Martin Jambor wrote:
> > 
> > > Hi everyone, especially Richi and Eric,
> > > 
> > > I'd like to know what is your attitude to changing SRA's
> > > build_ref_for_model to what it once looked like, so that it produces
> > > COMPONENT_REFs only for bit-fields.  The non-bit field handling was
> > > added in order to work-around problems when expanding non-aligned
> > > MEM_REFs on strict-alignment targets but that should not be a problem
> > > now and my experiments confirm that.  Last week I successfully
> > > bootstrapped and tested this patch on sparc64-linux (with the
> > > temporary MEM_EXPR patch, not including Java), ia64-linux (without
> > > Ada), x86_64-linux, i686-linux and tested it on hppa-linux (only C and
> > > C++).
> 
> ...
> 
> > > 
> > > 2012-03-20 Martin Jambor 
> > > 
> > >   * tree-sra.c (build_ref_for_model): Create COMPONENT_REFs only for
> > >   bit-fields.
> > > 
> > > Index: src/gcc/tree-sra.c
> > > ===
> > > *** src.orig/gcc/tree-sra.c
> > > --- src/gcc/tree-sra.c
> > > *** build_ref_for_offset (location_t loc, tr
> > > *** 1489,1558 
> > > return fold_build2_loc (loc, MEM_REF, exp_type, base, off);
> > >   }
> > >   
> > > - DEF_VEC_ALLOC_P_STACK (tree);
> > > - #define VEC_tree_stack_alloc(alloc) VEC_stack_alloc (tree, alloc)
> > > - 
> > >   /* Construct a memory reference to a part of an aggregate BASE at the 
> > > given
> > > !OFFSET and of the type of MODEL.  In case this is a chain of 
> > > references
> > > !to component, the function will replicate the chain of 
> > > COMPONENT_REFs of
> > > !the expression of MODEL to access it.  GSI and INSERT_AFTER have the 
> > > same
> > > !meaning as in build_ref_for_offset.  */
> > >   
> > >   static tree
> > >   build_ref_for_model (location_t loc, tree base, HOST_WIDE_INT offset,
> > >struct access *model, gimple_stmt_iterator *gsi,
> > >bool insert_after)
> > >   {
> > > !   tree type = model->type, t;
> > > !   VEC(tree,stack) *cr_stack = NULL;
> > > ! 
> > > !   if (TREE_CODE (model->expr) == COMPONENT_REF)
> > >   {
> > > !   tree expr = model->expr;
> > > ! 
> > > !   /* Create a stack of the COMPONENT_REFs so later we can walk them 
> > > in
> > > !  order from inner to outer.  */
> > > !   cr_stack = VEC_alloc (tree, stack, 6);
> > > ! 
> > > !   do {
> > > ! tree field = TREE_OPERAND (expr, 1);
> > > ! tree cr_offset = component_ref_field_offset (expr);
> > > ! HOST_WIDE_INT bit_pos
> > > !   = tree_low_cst (cr_offset, 1) * BITS_PER_UNIT
> > > !   + TREE_INT_CST_LOW (DECL_FIELD_BIT_OFFSET (field));
> > >   
> > > ! /* We can be called with a model different from the one 
> > > associated
> > > !with BASE so we need to avoid going up the chain too far.  */
> > > ! if (offset - bit_pos < 0)
> > > !   break;
> > > ! 
> > > ! offset -= bit_pos;
> > > ! VEC_safe_push (tree, stack, cr_stack, expr);
> > > ! 
> > > ! expr = TREE_OPERAND (expr, 0);
> > > ! type = TREE_TYPE (expr);
> > > !   } while (TREE_CODE (expr) == COMPONENT_REF);
> > >   }
> > > ! 
> > > !   t = build_ref_for_offset (loc, base, offset, type, gsi, insert_after);
> > > ! 
> > > !   if (TREE_CODE (model->expr) == COMPONENT_REF)
> > > ! {
> > > !   unsigned i;
> > > !   tree expr;
> > > ! 
> > > !   /* Now replicate the chain of COMPONENT_REFs from inner to outer. 
> > >  */
> > > !   FOR_EACH_VEC_ELT_REVERSE (tree, cr_stack, i, expr)
> > > ! {
> > > !   tree field = TREE_OPERAND (expr, 1);
> > > !   t = fold_build3_loc (loc, COMPONENT_REF, TREE_TYPE (field), 
> > > t, field,
> > > !TREE_OPERAND (expr, 2));
> > > ! }
> > > ! 
> > > !   VEC_free (tree, stack, cr_stack);
> > > ! }
> > > ! 
> > > !   return t;
> > >   }
> > >   
> > >   /* Construct a memory reference consisting of component_refs and 
> > > array_refs to
> > > --- 1489,1520 
> > > return fold_build2_loc (loc, MEM_REF, exp_type, base, off);
> > >   }
> > >   
> > >   /* Construct a memory reference to a part of an aggregate BASE at the 
> > > given
> > > !OFFSET and of the same type as MODEL.  In case this is a reference 
> > > to a
> > > !bit-field, the function will replicate the last component_ref of 
> > > model's
> > > !expr to access it.  GSI and INSERT_AFTER have the same meaning as in
> > > !build_ref_for_offset.  */
> > >   
> > >   static tree
> > >   build_ref_for_model (location_t loc, tree base, HOST_WIDE_INT offset,
> > >struct access *model, gimple_stmt_iterator *gsi,
> > >bool insert_after)
>

Re: [RFC] Should SRA stop producing COMPONENT_REF for non-bit-fields (again)?

2012-04-13 Thread Rainer Orth
Richard Guenther  writes:

>> Anyway, the patch I posted previously would risk re-introducing PR
>> 50386 and PR 50326, even though they are very unlikely with just
>> bit-fields.  So my current working version is the following, but it
>> causes failure of libmudflap.c++/pass55-frag.cxx execution test so I'm
>> not actually proposing it yet (sigh).
>
> I would not worry about mudflap tests.  The patch looks good to my
> eyes.

Are you sure the failure is new?  At least for 64-bit at -O,
libmudflap.c++/pass55-frag.cxx already fails right now (cf. PR
libmudflap/49843).

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


[PATCH] Fix PR52969

2012-04-13 Thread Richard Guenther

The following patch fixes the missed handling of TRUTH_NOT_EXPR
predicates in predicate_mem_writes and general combined predicates
which need gimplification.

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

Richard.

2012-04-13  Richard Guenther  

PR tree-optimization/52969
* tree-if-conv.c (predicate_mem_writes): Properly gimplify
the condition for the COND_EXPR and handle predicate negation
by swapping the COND_EXPR arms.

* gcc.dg/torture/pr52969.c: New testcase.

Index: gcc/tree-if-conv.c
===
--- gcc/tree-if-conv.c  (revision 186406)
+++ gcc/tree-if-conv.c  (working copy)
@@ -1543,11 +1522,19 @@ predicate_mem_writes (loop_p loop)
   gimple_stmt_iterator gsi;
   basic_block bb = ifc_bbs[i];
   tree cond = bb_predicate (bb);
+  bool swap;
   gimple stmt;
 
   if (is_true_predicate (cond))
continue;
 
+  swap = false;
+  if (TREE_CODE (cond) == TRUTH_NOT_EXPR)
+   {
+ swap = true;
+ cond = TREE_OPERAND (cond, 0);
+   }
+
   for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
if ((stmt = gsi_stmt (gsi))
&& gimple_assign_single_p (stmt)
@@ -1559,6 +1546,15 @@ predicate_mem_writes (loop_p loop)
 
lhs = ifc_temp_var (type, unshare_expr (lhs), &gsi);
rhs = ifc_temp_var (type, unshare_expr (rhs), &gsi);
+   if (swap)
+ {
+   tree tem = lhs;
+   lhs = rhs;
+   rhs = tem;
+ }
+   cond = force_gimple_operand_gsi_1 (&gsi, unshare_expr (cond),
+  is_gimple_condexpr, NULL_TREE,
+  true, GSI_SAME_STMT);
rhs = build3 (COND_EXPR, type, unshare_expr (cond), rhs, lhs);
gimple_assign_set_rhs1 (stmt, ifc_temp_var (type, rhs, &gsi));
update_stmt (stmt);
Index: gcc/testsuite/gcc.dg/torture/pr52969.c
===
--- gcc/testsuite/gcc.dg/torture/pr52969.c  (revision 0)
+++ gcc/testsuite/gcc.dg/torture/pr52969.c  (revision 0)
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-ftree-loop-if-convert-stores" } */
+
+int a, b;
+float xsum[100];
+void foo (float *cluster)
+{
+  int j;
+  for (; a ; ++j) {
+  xsum[j] = cluster[j];
+  if (xsum[j] > 0)
+   xsum[j] = 0;
+  }
+  if (xsum[0])
+b = 0;
+}


Re: [PATCH] Atom: Scheduler improvements for better imul placement

2012-04-13 Thread Andrey Belevantsev

On 13.04.2012 14:18, Igor Zamyatin wrote:

On Thu, Apr 12, 2012 at 5:01 PM, Andrey Belevantsev  wrote:

On 12.04.2012 16:38, Richard Guenther wrote:


On Thu, Apr 12, 2012 at 2:36 PM, Igor Zamyatin
  wrote:


On Thu, Apr 12, 2012 at 4:24 PM, Richard Guenther
wrote:


On Thu, Apr 12, 2012 at 2:00 PM, Alexander Monakov
  wrote:




Can atom execute two IMUL in parallel?  Or what exactly is the
pipeline
behavior?



As I understand from Intel's optimization reference manual, the
behavior is as
follows: if the instruction immediately following IMUL has shorter
latency,
execution is stalled for 4 cycles (which is IMUL's latency); otherwise,
a
4-or-more cycles latency instruction can be issued after IMUL without a
stall.
In other words, IMUL is pipelined with respect to other long-latency
instructions, but not to short-latency instructions.



It seems to be modeled in the pipeline description though:

;;; imul insn has 5 cycles latency
(define_reservation "atom-imul-32"
"atom-imul-1, atom-imul-2, atom-imul-3, atom-imul-4,
 atom-port-0")

;;; imul instruction excludes other non-FP instructions.
(exclusion_set "atom-eu-0, atom-eu-1"
   "atom-imul-1, atom-imul-2, atom-imul-3, atom-imul-4")



The main idea is quite simple:

If we are going to schedule IMUL instruction (it is on the top of
ready list) we try to find out producer of other (independent) IMUL
instruction that is in ready list too. The goal is try to schedule
such a producer to get another IMUL in ready list and get scheduling
of 2 successive IMUL instructions.



Why does that not happen without your patch?  Does it never happen without
your patch or does it merely not happen for one EEMBC benchmark (can
you provide a testcase?)?



It does not happen because the scheduler by itself does not do such specific
reordering.  That said, it is easy to imagine the cases where this patch
will make things worse rather than better.

Igor, why not try different subtler mechanisms like adjust_priority, which
is get called when an insn is added to the ready list?  E.g. increase the
producer's priority.

The patch as is misses checks for NONDEBUG_INSN_P.  Also, why bail out when
you have more than one imul in the ready list?  Don't you want to bump the
priority of the other imul found?


Could you provide some examples when this patch would harm the performance?


I thought of the cases when the other ready insns can fill up the hole and 
that would be more beneficial because e.g. they would be on more critical 
paths than the producer of your second imul.  I don't know enough of Atom 
to give an example -- maybe some long divisions?




Sched_reorder was chosen since it is used in other ports and looks
most suitable for such case, e.g. it provides access to the whole
ready list.
BTW, just increasing producer's priority seems to be more risky in
performance sense - we can incorrectly start delaying some
instructions.


Yes, but exactly because of the above example you can start incorrectly 
delaying other insns, too, as you force the insn to be the first in the 
list.  While bumping priority still leaves the scheduler sorting heuristics 
in place and actually lowers that risk.




Thought ready list doesn't contain DEBUG_INSN... Is it so? If it
contains them - this could be added easily


It does, but I'm not sure the sched_reorder hook gets them or they are 
immediately removed -- I saw similar checks in one of the targets' hooks.


Anyways, my main thought was that it is better to test on more benchmarks 
to alleviate the above concerns, so as long as the i386 maintainers are 
happy, I don't see major problems here.  A good idea could be to generalize 
the patch to handle other long latency insns as second consumers, not only 
imuls, if this is relevant for Atom.


Andrey



The case with more than one imul in the ready list wasn't considered
just because the goal was to catch the particular case when there is a
risk to get the following picture: "imul-producer-imul" which is less
effective than "producer-imul-imul" for Atom



Andrey





And MD allows us only prefer scheduling of successive IMUL instructions,
i.e.
If IMUL was just scheduled and ready list contains another IMUL
instruction then it will be chosen as the best candidate for
scheduling.



at least from my very limited guessing of what the above does.  So, did
you
analyze why the scheduler does not properly schedule things for you?

Richard.



  From reading the patch, I could not understand the link between
pipeline
behavior and what the patch appears to do.

Hope that helps.

Alexander





Thanks,
Igor




default to strict dwarf-2 on powerpc-vxworks

2012-04-13 Thread Olivier Hainque
Hello,

On typical VxWorks environments, WindRiver integrated tools are used as
much if not more than gdb for debugging purposes.

These evolve at an industrial pace, traditionally not as fast as GCC
regarding the support of latest dwarf standards.

As of today, in our experience, the best compromise to let WRS debugging
tools work fine and have negligible effect on gdb's experience for this
target is to enforce strict dwarf-2 debugging info by default. 

We have been doing this in our production compilers for a while and find
it really useful, so we're happy to contribute here, in case this is deemed
more generally appropriate. 

The attached patch achieves this in two mini steps:

- Allow to distinguish between a dwarf version picked by default by the
  compiler and one provided on the command line that happens to match the
  default, which is useful to make sure we don't override a request for
  an explicit level on the command line.

  This is achieved by initializing dwarf_version to -1 instead of 2 from
  common.opt, then pick 2 as a default from toplev.c:process_options,
  similar to the scheme in place for dwarf_strict already.

- Default to dwarf_strict = 1 && dwarf_version = 2 for VxWorks via
  vxworks_override_options.

Sanity checked on mainline for powerpc-wrs-vxworks and bootstrapped
on x86_64-pc-linux-gnu.

Thanks in advance for your feedback,

With Kind Regards,

Olivier

2012-04-13  Olivier Hainque  

* common.opt (gdwarf-): Initialize dwarf_version to -1 instead of 2.
* toplev.c (process_options): Default to dwarf_version 2.
* config/vxworks.c (vxworks_override_options): Default to strict-dwarf
and dwarf_version 2.



vxdwarf.dif
Description: video/dv





Re: [Patch] tree-parloops.c (eliminate_local_variables): Add braces to suppress warnings

2012-04-13 Thread NightStrike
On Wed, Apr 11, 2012 at 4:46 AM, Richard Guenther
 wrote:
> On Tue, Apr 10, 2012 at 9:08 PM, NightStrike  wrote:
>> On Tue, Apr 10, 2012 at 10:38 AM, Richard Guenther
>>  wrote:
>>> On Tue, Apr 10, 2012 at 3:06 PM, JonY  wrote:
 On 4/10/2012 20:37, Richard Guenther wrote:
> On Tue, Apr 10, 2012 at 2:15 PM, JonY wrote:
>> Hi,
>>
>> Patch OK?
>
> What kind of warning?
>

 Oops, I forgot to mention gcc was complaining about missing braces.
>>>
>>> Not for me.  And I fail to see why it should.
>>
>> This is the warning:
>>
>> ../../../../build/gcc/src/gcc/tree-parloops.c:724: warning: suggest
>> explicit braces to avoid ambiguous `else'
>
>> ./xgcc -B. -c   -g -DIN_GCC   -W -Wall -Wwrite-strings -Wcast-qual 
>> -Wstrict-prototypes -Wmissing-prototypes -Wmissing-format-attribute 
>> -pedantic -Wno-long-long -Wno-variadic-macros -Wno-overlength-strings 
>> -Wold-style-definition -Wc++-compat -fno-common  -DHAVE_CONFIG_H -I. -I. 
>> -I/space/rguenther/src/svn/trunk/gcc -I/space/rguenther/src/svn/trunk/gcc/. 
>> -I/space/rguenther/src/svn/trunk/gcc/../include 
>> -I/space/rguenther/src/svn/trunk/gcc/../libcpp/include  
>> -I/space/rguenther/src/svn/trunk/gcc/../libdecnumber 
>> -I/space/rguenther/src/svn/trunk/gcc/../libdecnumber/bid -I../libdecnumber   
>>  /space/rguenther/src/svn/trunk/gcc/tree-parloops.c -o tree-parloops.o
>
> no warning from trunk.  Which GCC version emits this warning?  Note that
> we do not try to keep stage1 warning-free but only later stages (which is
> why we use -Werror there).
>
> Richard.

Looks like cygwin gcc 3.4.4


recognize .persistent.bss sections as bss

2012-04-13 Thread Olivier Hainque
Hello,

For several years now, Ada has support for a "Persistent_BSS" pragma
that let users place data in a ".persistent.bss" section. This section: 

- needs to be treated as a bss section by the compiler (in particular,
  to set flags that will prevent use of space in executable files)

- can be treated specially by run-time loaders, not resetting to all
  zeroes on application warm-restarts. 

This patch simply adds ".persistent.bss" to the list of section names
that GCC identifies a bss sections.

Bootstrapped and regtested on x86_84-pc-linux-gnu.

OK to commit ?

Thanks in advance,

Olivier

2012-04-13  Olivier Hainque  

* varasm.c (default_section_type_flags): Flag .persistent.bss
sections as SECTION_BSS.



pbss.dif
Description: video/dv


Re: [patch, fortran] Trim spaces on list-directed reads

2012-04-13 Thread Thomas Koenig

Am 10.04.2012 14:32, schrieb Thomas Koenig:

Hello world,

this patch effectively trims the spaces from the string on
list-directed reads. This avoids the large overhead on
processing these spaces when reading from long lines.


Ping ** 0.4285714?


[PATCH, PR 52939] Gracefully deal with fold_ctor_reference returning NULL during devirtualization

2012-04-13 Thread Martin Jambor
Hi,

currently we ICE when attempting to devirtualize a call to a virtual
method introduced in a descendant but with a base which is an ancestor
which does not have it.  This is because fold_ctor_reference returns
constant zero when it cannot find the particular value in the provided
constructor which is something gimple_get_virt_method_for_binfo cannot
cope with.  The patch below avoids it by simply testing for that
value and bailing out.

Bootstrapped and tested on x86_64-linux, OK for trunk?

Thanks,

Martin


2012-04-12  Martin Jambor  

PR middle-end/52939
* gimple-fold.c (gimple_get_virt_method_for_binfo): Bail out if
fold_ctor_reference returns a zero constant.

* testsuite/g++.dg/ipa/pr52939.C: New test.

Index: src/gcc/gimple-fold.c
===
--- src.orig/gcc/gimple-fold.c
+++ src/gcc/gimple-fold.c
@@ -3087,7 +3087,7 @@ gimple_get_virt_method_for_binfo (HOST_W
   offset += token * size;
   fn = fold_ctor_reference (TREE_TYPE (TREE_TYPE (v)), DECL_INITIAL (v),
offset, size);
-  if (!fn)
+  if (!fn || integer_zerop (fn))
 return NULL_TREE;
   gcc_assert (TREE_CODE (fn) == ADDR_EXPR
  || TREE_CODE (fn) == FDESC_EXPR);
Index: src/gcc/testsuite/g++.dg/ipa/pr52939.C
===
--- /dev/null
+++ src/gcc/testsuite/g++.dg/ipa/pr52939.C
@@ -0,0 +1,58 @@
+/* Verify that we do not ICE on invalid devirtualizations (which might
+   be OK at run-time because never executed).  */
+/* { dg-do run } */
+/* { dg-options "-O3 -fno-early-inlining -fno-inline"  } */
+
+extern "C" void abort (void);
+
+class A
+{
+public:
+  int data;
+  virtual int foo (int i);
+};
+
+class B : public A
+{
+public:
+  virtual int foo (int i);
+  virtual int bar (int i);
+};
+
+int A::foo (int i)
+{
+  return i + 1;
+}
+
+int B::foo (int i)
+{
+  return i + 2;
+}
+
+int B::bar (int i)
+{
+  return i + 3;
+}
+
+static int middleman (class A *obj, int i)
+{
+  class B *b = (class B *) obj;
+
+  if (i != 1)
+return b->bar (i);
+  else
+return i;
+}
+
+int __attribute__ ((noinline,noclone)) get_input(void)
+{
+  return 1;
+}
+
+int main (int argc, char *argv[])
+{
+  class A o;
+  if (middleman (&o, get_input ()) != 1)
+abort ();
+  return 0;
+}


Re: PING Re: [PATCH] gfortran testsuite: implicitly cleanup-modules, part 2

2012-04-13 Thread Mike Stump
On Apr 13, 2012, at 3:51 AM, Bernhard Reutner-Fischer wrote:
> Ping.

Before advancing, has the problem that Rainer pointed out on March 19th with 
your earlier patch been fixed?


[patch, fortran] PR fortran/52537 Optimize string comparisons against empty strings

2012-04-13 Thread Thomas Koenig

Hello world,

this patch replaces  a != '' with len_trim(a) != 0, to
speed up the comparison.  It also introduces a bit of cleanup
in frontend-passes.c.

Regression-tested. OK for trunk?

Thomas

2012-04-13  Thomas Koenig  

PR fortran/52537
* frontend-passes.c (optimize_op):  Change
old-style comparison operators to new-style, simplify
switch as a result.
(empty_string):  New function.
(get_len_trim_call):  New function.
(optimize_comparison):  If comparing to an empty string,
use comparison of len_trim to zero.
Use new-style comparison operators only.
(optimize_trim):  Use get_len_trim_call.

2012-04-13  Thomas Koenig  

PR fortran/52537
* gfortran.dg/string_compare_4.f90:  New test.
Index: frontend-passes.c
===
--- frontend-passes.c	(Revision 186213)
+++ frontend-passes.c	(Arbeitskopie)
@@ -796,20 +796,45 @@ optimize_op (gfc_expr *e)
 {
   gfc_intrinsic_op op = e->value.op.op;
 
+  /* Only use new-style comparisions.  */
+  switch(op)
+{
+case INTRINSIC_EQ_OS:
+  op = INTRINSIC_EQ;
+  break;
+
+case INTRINSIC_GE_OS:
+  op = INTRINSIC_GE;
+  break;
+
+case INTRINSIC_LE_OS:
+  op = INTRINSIC_LE;
+  break;
+
+case INTRINSIC_NE_OS:
+  op = INTRINSIC_NE;
+  break;
+
+case INTRINSIC_GT_OS:
+  op = INTRINSIC_GT;
+  break;
+
+case INTRINSIC_LT_OS:
+  op = INTRINSIC_LT;
+  break;
+
+default:
+  break;
+}
+
   switch (op)
 {
 case INTRINSIC_EQ:
-case INTRINSIC_EQ_OS:
 case INTRINSIC_GE:
-case INTRINSIC_GE_OS:
 case INTRINSIC_LE:
-case INTRINSIC_LE_OS:
 case INTRINSIC_NE:
-case INTRINSIC_NE_OS:
 case INTRINSIC_GT:
-case INTRINSIC_GT_OS:
 case INTRINSIC_LT:
-case INTRINSIC_LT_OS:
   return optimize_comparison (e, op);
 
 default:
@@ -819,6 +844,61 @@ optimize_op (gfc_expr *e)
   return false;
 }
 
+/* Return true if a constant string contains spaces only.  */
+
+static bool
+empty_string (gfc_expr *e)
+{
+  int i;
+
+  if (e->ts.type != BT_CHARACTER || e->expr_type != EXPR_CONSTANT)
+return false;
+
+  for (i=0; ivalue.character.length; i++)
+{
+  if (e->value.character.string[i] != ' ')
+	return false;
+}
+
+  return true;
+}
+
+/* Insert a call to the intrinsic len_trim. Use a different name for
+   the symbol tree so we don't run into trouble when the user has
+   renamed len_trim for some reason.  */
+
+static gfc_expr*
+get_len_trim_call (gfc_expr *str, int kind)
+{
+  gfc_expr *fcn;
+  gfc_actual_arglist *actual_arglist, *next;
+
+  fcn = gfc_get_expr ();
+  fcn->expr_type = EXPR_FUNCTION;
+  fcn->value.function.isym = gfc_intrinsic_function_by_id (GFC_ISYM_LEN_TRIM);
+  actual_arglist = gfc_get_actual_arglist ();
+  actual_arglist->expr = str;
+  next = gfc_get_actual_arglist ();
+  next->expr = gfc_get_int_expr (gfc_default_integer_kind, NULL, kind);
+  actual_arglist->next = next;
+
+  fcn->value.function.actual = actual_arglist;
+  fcn->where = str->where;
+  fcn->ts.type = BT_INTEGER;
+  fcn->ts.kind = gfc_charlen_int_kind;
+
+  gfc_get_sym_tree("__internal_len_trim", current_ns, &fcn->symtree, false);
+  fcn->symtree->n.sym->ts = fcn->ts;
+  fcn->symtree->n.sym->attr.flavor = FL_PROCEDURE;
+  fcn->symtree->n.sym->attr.function = 1;
+  fcn->symtree->n.sym->attr.elemental = 1;
+  fcn->symtree->n.sym->attr.referenced = 1;
+  fcn->symtree->n.sym->attr.access = ACCESS_PRIVATE;
+  gfc_commit_symbol (fcn->symtree->n.sym);
+
+  return fcn;
+}
+
 /* Optimize expressions for equality.  */
 
 static bool
@@ -862,6 +942,46 @@ optimize_comparison (gfc_expr *e, gfc_intrinsic_op
   if (e->rank > 0)
 return change;
 
+  /* Replace a == '' with len_trim(a) == 0 and a /= '' with
+ len_trim(a) != 0 */
+  if (op1->ts.type == BT_CHARACTER && op2->ts.type == BT_CHARACTER
+  && (op == INTRINSIC_EQ || op == INTRINSIC_NE))
+{
+
+  bool empty_op1, empty_op2;
+  empty_op1 = empty_string(op1);
+  empty_op2 = empty_string(op2);
+
+  if (empty_op1 || empty_op2)
+	{
+	  gfc_expr *fcn;
+	  gfc_expr *zero;
+	  gfc_expr *str;
+
+	  /* This can only happen when an error for comparing
+	 characters of different kinds has already been issued.  */
+	  if (empty_op1 && empty_op2)
+	return false;
+
+	  zero = gfc_get_int_expr (gfc_charlen_int_kind, &e->where, 0);
+	  str = empty_op1 ? op2 : op1;
+
+	  fcn = get_len_trim_call (str, gfc_charlen_int_kind);
+
+
+	  if (empty_op1)
+	gfc_free_expr (op1);
+	  else
+	gfc_free_expr (op2);
+
+	  op1 = fcn;
+	  op2 = zero;
+	  e->value.op.op1 = fcn;
+	  e->value.op.op2 = zero;
+	}
+}
+
+
   /* Don't compare REAL or COMPLEX expressions when honoring NaNs.  */
 
   if (flag_finite_math_only
@@ -935,32 +1055,26 @@ optimize_comparison (gfc_expr *e, gfc_intrinsic_op
 	  switch (op)
 	{
 	case INTRINSIC_EQ:
-	case INTRI

Re: [PATCH, i386] Move common definitions of gnu-user.h and gnu-user64.h to separate file

2012-04-13 Thread Uros Bizjak
On Fri, Apr 13, 2012 at 12:37 PM, Ilya Enkovich  wrote:

> 2012-04-13  Enkovich Ilya  
>
>        * config/i386/gnu-user-common.h: New.
>
>        * config/i386/gnu-user.h (CPP_SPEC): Moved to
>        gnu-user-common.h.
>        (CC1_SPEC): Likewise.
>        (ENDFILE_SPEC): Likewise.
>        (DEFAULT_PCC_STRUCT_RETURN): Likewise.
>        (TARGET_TLS_DIRECT_SEG_REFS_DEFAULT): Likewise.
>        (TARGET_OS_CPP_BUILTINS): Likewise.
>        (LIBGCC2_HAS_TF_MODE): Likewise.
>        (LIBGCC2_TF_CEXT): Likewise.
>        (TF_SIZE): Likewise.
>        (TARGET_ASM_FILE_END): Likewise.
>        (STACK_CHECK_MOVING_SP): Likewise.
>        (STACK_CHECK_STATIC_BUILTIN): Likewise.
>
>        * config/i386/gnu-user64.h: Likewise.

OK for mainline SVN, but as Richard said, this is not appropriate for
release branches.

Thanks,
Uros.


Re: [PATCH, PR 52939] Gracefully deal with fold_ctor_reference returning NULL during devirtualization

2012-04-13 Thread Richard Guenther
On Fri, 13 Apr 2012, Martin Jambor wrote:

> Hi,
> 
> currently we ICE when attempting to devirtualize a call to a virtual
> method introduced in a descendant but with a base which is an ancestor
> which does not have it.  This is because fold_ctor_reference returns
> constant zero when it cannot find the particular value in the provided
> constructor which is something gimple_get_virt_method_for_binfo cannot
> cope with.  The patch below avoids it by simply testing for that
> value and bailing out.
> 
> Bootstrapped and tested on x86_64-linux, OK for trunk?

Ok.

Thanks,
Richard.

> Thanks,
> 
> Martin
> 
> 
> 2012-04-12  Martin Jambor  
> 
>   PR middle-end/52939
>   * gimple-fold.c (gimple_get_virt_method_for_binfo): Bail out if
>   fold_ctor_reference returns a zero constant.
> 
>   * testsuite/g++.dg/ipa/pr52939.C: New test.
> 
> Index: src/gcc/gimple-fold.c
> ===
> --- src.orig/gcc/gimple-fold.c
> +++ src/gcc/gimple-fold.c
> @@ -3087,7 +3087,7 @@ gimple_get_virt_method_for_binfo (HOST_W
>offset += token * size;
>fn = fold_ctor_reference (TREE_TYPE (TREE_TYPE (v)), DECL_INITIAL (v),
>   offset, size);
> -  if (!fn)
> +  if (!fn || integer_zerop (fn))
>  return NULL_TREE;
>gcc_assert (TREE_CODE (fn) == ADDR_EXPR
> || TREE_CODE (fn) == FDESC_EXPR);
> Index: src/gcc/testsuite/g++.dg/ipa/pr52939.C
> ===
> --- /dev/null
> +++ src/gcc/testsuite/g++.dg/ipa/pr52939.C
> @@ -0,0 +1,58 @@
> +/* Verify that we do not ICE on invalid devirtualizations (which might
> +   be OK at run-time because never executed).  */
> +/* { dg-do run } */
> +/* { dg-options "-O3 -fno-early-inlining -fno-inline"  } */
> +
> +extern "C" void abort (void);
> +
> +class A
> +{
> +public:
> +  int data;
> +  virtual int foo (int i);
> +};
> +
> +class B : public A
> +{
> +public:
> +  virtual int foo (int i);
> +  virtual int bar (int i);
> +};
> +
> +int A::foo (int i)
> +{
> +  return i + 1;
> +}
> +
> +int B::foo (int i)
> +{
> +  return i + 2;
> +}
> +
> +int B::bar (int i)
> +{
> +  return i + 3;
> +}
> +
> +static int middleman (class A *obj, int i)
> +{
> +  class B *b = (class B *) obj;
> +
> +  if (i != 1)
> +return b->bar (i);
> +  else
> +return i;
> +}
> +
> +int __attribute__ ((noinline,noclone)) get_input(void)
> +{
> +  return 1;
> +}
> +
> +int main (int argc, char *argv[])
> +{
> +  class A o;
> +  if (middleman (&o, get_input ()) != 1)
> +abort ();
> +  return 0;
> +}
> 
> 

-- 
Richard Guenther 
SUSE / SUSE Labs
SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer

Re: [PATCH] gfortran testsuite: implicitly cleanup-modules, part 2

2012-04-13 Thread Mike Stump
On Apr 3, 2012, at 5:16 AM, Bernhard Reutner-Fischer wrote:
> The second part of implicitly doing cleanup-modules is to remove the now
> superfluous dg-final directives.

Ok once the issue Rainer pointed out is addressed.  As for the ChangeLog, I'd 
be tempted to list them as:

* gfortran.dg/*.f90: Remove now redundant manual
  cleanup-modules directive.

I think that strikes a reasonable balance.


Re: [Patch] tree-parloops.c (eliminate_local_variables): Add braces to suppress warnings

2012-04-13 Thread Mike Stump
On Apr 13, 2012, at 5:30 AM, NightStrike wrote:
>> no warning from trunk.  Which GCC version emits this warning?

> Looks like cygwin gcc 3.4.4

3.4.4 is a little old now..  We'd encourage an upgrade to a fine new 
compiler...  :-)


Re: [PATCH] More code refactoring

2012-04-13 Thread Jan Hubicka
> 
> Eventually I'll succeed in making tree-optimize.c empty.  At least
> the pass stuff I'm interested in get's better now.

Decompozing tree-optimize was on my wishlist, too.
> 
> Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.
> 
> Richard.
> 
> 2012-04-12  Richard Guenther  
> 
>   * Makefile.in (cgraphunit.o): Add $(EXCEPT_H) dependency.
>   * cgraph.h (tree_rest_of_compilation): Remove.
>   * cgraph.c (cgraph_add_new_function): Move ...
>   * cgraphunit.c (cgraph_add_new_function): ... here.
>   (tree_rest_of_compilation): Make static.
>   (cgraph_expand_function): Do not set cgraph_function_flags_ready.

We could try keep in mind that cgraphunit.c is a historical mess and should
be also dismantled ;)
Majority of logic should go into symbol table (I have symtab.c for that),
cgraphbuild or pass management.

Honza


Re: [PATCH, i386] Move common definitions of gnu-user.h and gnu-user64.h to separate file

2012-04-13 Thread H.J. Lu
On Fri, Apr 13, 2012 at 7:02 AM, Uros Bizjak  wrote:
> On Fri, Apr 13, 2012 at 12:37 PM, Ilya Enkovich  
> wrote:
>
>> 2012-04-13  Enkovich Ilya  
>>
>>        * config/i386/gnu-user-common.h: New.
>>
>>        * config/i386/gnu-user.h (CPP_SPEC): Moved to
>>        gnu-user-common.h.
>>        (CC1_SPEC): Likewise.
>>        (ENDFILE_SPEC): Likewise.
>>        (DEFAULT_PCC_STRUCT_RETURN): Likewise.
>>        (TARGET_TLS_DIRECT_SEG_REFS_DEFAULT): Likewise.
>>        (TARGET_OS_CPP_BUILTINS): Likewise.
>>        (LIBGCC2_HAS_TF_MODE): Likewise.
>>        (LIBGCC2_TF_CEXT): Likewise.
>>        (TF_SIZE): Likewise.
>>        (TARGET_ASM_FILE_END): Likewise.
>>        (STACK_CHECK_MOVING_SP): Likewise.
>>        (STACK_CHECK_STATIC_BUILTIN): Likewise.
>>
>>        * config/i386/gnu-user64.h: Likewise.
>
> OK for mainline SVN, but as Richard said, this is not appropriate for
> release branches.
>

Ilya, I will check it for you.  There is no need for release branches.

Thanks.


-- 
H.J.


Re: [Patch] tree-parloops.c (eliminate_local_variables): Add braces to suppress warnings

2012-04-13 Thread Bernd Schmidt
On 04/13/2012 04:20 PM, Mike Stump wrote:
> On Apr 13, 2012, at 5:30 AM, NightStrike wrote:
>>> no warning from trunk.  Which GCC version emits this warning?
> 
>> Looks like cygwin gcc 3.4.4
> 
> 3.4.4 is a little old now..  We'd encourage an upgrade to a fine new 
> compiler...  :-)

The thing is, the else really is ambiguous, so it looks like we have a
warning regression.


Bernd



Re: PING Re: [PATCH] gfortran testsuite: implicitly cleanup-modules, part 2

2012-04-13 Thread Bernhard Reutner-Fischer
On Fri, Apr 13, 2012 at 06:57:44AM -0700, Mike Stump wrote:
>On Apr 13, 2012, at 3:51 AM, Bernhard Reutner-Fischer wrote:
>> Ping.
>
>Before advancing, has the problem that Rainer pointed out on March 19th with 
>your earlier patch been fixed?

I believe that it is fixed, yes. See r185688 and my follow up to him (
http://gcc.gnu.org/ml/gcc-patches/2012-03/msg01498.html )


Re: [PATCH, i386] Move common definitions of gnu-user.h and gnu-user64.h to separate file

2012-04-13 Thread H.J. Lu
On Fri, Apr 13, 2012 at 7:34 AM, H.J. Lu  wrote:
> On Fri, Apr 13, 2012 at 7:02 AM, Uros Bizjak  wrote:
>> On Fri, Apr 13, 2012 at 12:37 PM, Ilya Enkovich  
>> wrote:
>>
>>> 2012-04-13  Enkovich Ilya  
>>>
>>>        * config/i386/gnu-user-common.h: New.
>>>
>>>        * config/i386/gnu-user.h (CPP_SPEC): Moved to
>>>        gnu-user-common.h.
>>>        (CC1_SPEC): Likewise.
>>>        (ENDFILE_SPEC): Likewise.
>>>        (DEFAULT_PCC_STRUCT_RETURN): Likewise.
>>>        (TARGET_TLS_DIRECT_SEG_REFS_DEFAULT): Likewise.
>>>        (TARGET_OS_CPP_BUILTINS): Likewise.
>>>        (LIBGCC2_HAS_TF_MODE): Likewise.
>>>        (LIBGCC2_TF_CEXT): Likewise.
>>>        (TF_SIZE): Likewise.
>>>        (TARGET_ASM_FILE_END): Likewise.
>>>        (STACK_CHECK_MOVING_SP): Likewise.
>>>        (STACK_CHECK_STATIC_BUILTIN): Likewise.
>>>
>>>        * config/i386/gnu-user64.h: Likewise.
>>
>> OK for mainline SVN, but as Richard said, this is not appropriate for
>> release branches.
>>
>
> Ilya, I will check it for you.  There is no need for release branches.
>

ChangeLog for config.gcc is missing.


-- 
H.J.


Re: [Patch] tree-parloops.c (eliminate_local_variables): Add braces to suppress warnings

2012-04-13 Thread Jakub Jelinek
On Fri, Apr 13, 2012 at 04:33:17PM +0200, Bernd Schmidt wrote:
> On 04/13/2012 04:20 PM, Mike Stump wrote:
> > On Apr 13, 2012, at 5:30 AM, NightStrike wrote:
> >>> no warning from trunk.  Which GCC version emits this warning?
> > 
> >> Looks like cygwin gcc 3.4.4
> > 
> > 3.4.4 is a little old now..  We'd encourage an upgrade to a fine new 
> > compiler...  :-)
> 
> The thing is, the else really is ambiguous, so it looks like we have a
> warning regression.

To the compiler?  There is for in between, no need to put extra braces IMHO.
I believe it has been intentionally fixed for 4.0, but my bisect tree only
goes back to r9.

Jakub


Re: [Patch] tree-parloops.c (eliminate_local_variables): Add braces to suppress warnings

2012-04-13 Thread Bernd Schmidt
On 04/13/2012 04:44 PM, Jakub Jelinek wrote:
> On Fri, Apr 13, 2012 at 04:33:17PM +0200, Bernd Schmidt wrote:
>> On 04/13/2012 04:20 PM, Mike Stump wrote:
>>> On Apr 13, 2012, at 5:30 AM, NightStrike wrote:
> no warning from trunk.  Which GCC version emits this warning?
>>>
 Looks like cygwin gcc 3.4.4
>>>
>>> 3.4.4 is a little old now..  We'd encourage an upgrade to a fine new 
>>> compiler...  :-)
>>
>> The thing is, the else really is ambiguous, so it looks like we have a
>> warning regression.
> 
> To the compiler?  There is for in between, no need to put extra braces IMHO.
> I believe it has been intentionally fixed for 4.0, but my bisect tree only
> goes back to r9.

Well, to the compiler even
if ()
  if ()
x
  else
y

has an unambiguous meaning. The problem is that a human might be
confused, for example due to bad indentation. Whether there's a for in
between doesn't matter for this purpose, the following is most likely a bug:

if ()
  for (..)
if ()
  x
else
  y


Bernd


Re: [Patch] tree-parloops.c (eliminate_local_variables): Add braces to suppress warnings

2012-04-13 Thread NightStrike
On Fri, Apr 13, 2012 at 10:20 AM, Mike Stump  wrote:
> On Apr 13, 2012, at 5:30 AM, NightStrike wrote:
>>> no warning from trunk.  Which GCC version emits this warning?
>
>> Looks like cygwin gcc 3.4.4
>
> 3.4.4 is a little old now..  We'd encourage an upgrade to a fine new 
> compiler...  :-)

Yeah, not sure why cygwin doesn't make the gcc4 packages the default.


Re: [Patch] tree-parloops.c (eliminate_local_variables): Add braces to suppress warnings

2012-04-13 Thread NightStrike
On Fri, Apr 13, 2012 at 10:33 AM, Bernd Schmidt  wrote:
> On 04/13/2012 04:20 PM, Mike Stump wrote:
>> On Apr 13, 2012, at 5:30 AM, NightStrike wrote:
 no warning from trunk.  Which GCC version emits this warning?
>>
>>> Looks like cygwin gcc 3.4.4
>>
>> 3.4.4 is a little old now..  We'd encourage an upgrade to a fine new 
>> compiler...  :-)
>
> The thing is, the else really is ambiguous, so it looks like we have a
> warning regression.

I'd agree here.  It's why I didn't think to try with a new compiler.


Re: fix left-over debug insns in DCE

2012-04-13 Thread Alexandre Oliva
On Apr  9, 2012, Eric Botcazou  wrote:

> Could you add a comment for each value?

Done

> Missing "extern" for all declarations.

Thanks, added.

> I don't understand the "or _WITH_VALUE otherwise" part of the comment.

Sorry, my bad.  It didn't make sense.  Fixed.

> Please add a comment explaining what this is doing.

How's this?

I've just installed the patch, but if you find the need for any further
improvement, let me know and I'll do it right away.

Thanks,

for  gcc/ChangeLog
from  Alexandre Oliva  

	PR debug/48866
	* df.h (enum debug_temp_where): New.
	(dead_debug_init, dead_debug_finish) Declare.
	(dead_debug_add, dead_debug_insert_temp): Declare.
	(struct dead_debug_use, struct dead_debug): Moved from...
	* df-problems.c: ... here.
	(df_set_unused_notes_for_mw): Bind debug uses of unused regno
	to a debug temp.
	(df_create_unused_note): Likewise.
	(df_set_dead_notes_for_mw): Move comment where it belongs.
	(dead_debug_init): Export.
	(dead_debug_reset_uses): New, factored out of...
	(dead_debug_finish): ...this.  Export.
	(dead_debug_reset): Remove.
	(dead_debug_add): Export.
	(dead_debug_insert_before): Rename to...
	(dead_debug_insert_temp): ... this.  Add where argument.  Export.
	Locate stored value for BEFORE_WITH_VALUE.  Avoid repeat inserts.
	Return insertion count.
	(df_note_bb_compute): Adjust.
	* dce.c (word_dce_process_block): Adjust dead debug uses.
	(dce_process_block): Likewise.

Index: gcc/df.h
===
--- gcc/df.h.orig	2012-04-13 05:18:29.140781640 -0300
+++ gcc/df.h	2012-04-13 07:13:18.0 -0300
@@ -1101,4 +1101,46 @@ extern void union_defs (df_ref, struct w
 			unsigned int *used, struct web_entry *,
 			bool (*fun) (struct web_entry *, struct web_entry *));
 
+/* Debug uses of dead regs.  */
+
+/* Node of a linked list of uses of dead REGs in debug insns.  */
+struct dead_debug_use
+{
+  df_ref use;
+  struct dead_debug_use *next;
+};
+
+/* Linked list of the above, with a bitmap of the REGs in the
+   list.  */
+struct dead_debug
+{
+  struct dead_debug_use *head;
+  bitmap used;
+  bitmap to_rescan;
+};
+
+/* This type controls the behavior of dead_debug_insert_temp WRT
+   UREGNO and INSN.  */
+enum debug_temp_where
+  {
+/* Bind a newly-created debug temporary to a REG for UREGNO, and
+   insert the debug insn before INSN.  REG is expected to die at
+   INSN.  */
+DEBUG_TEMP_BEFORE_WITH_REG = -1,
+/* Bind a newly-created debug temporary to the value INSN stores
+   in REG, and insert the debug insn before INSN.  */
+DEBUG_TEMP_BEFORE_WITH_VALUE = 0,
+/* Bind a newly-created debug temporary to a REG for UREGNO, and
+   insert the debug insn after INSN.  REG is expected to be set at
+   INSN.  */
+DEBUG_TEMP_AFTER_WITH_REG = 1
+  };
+
+extern void dead_debug_init (struct dead_debug *, bitmap);
+extern void dead_debug_finish (struct dead_debug *, bitmap);
+extern void dead_debug_add (struct dead_debug *, df_ref, unsigned int);
+extern int dead_debug_insert_temp (struct dead_debug *,
+   unsigned int uregno, rtx insn,
+   enum debug_temp_where);
+
 #endif /* GCC_DF_H */
Index: gcc/df-problems.c
===
--- gcc/df-problems.c.orig	2012-04-13 06:58:42.053258184 -0300
+++ gcc/df-problems.c	2012-04-13 07:29:33.0 -0300
@@ -2886,25 +2886,6 @@ df_whole_mw_reg_unused_p (struct df_mw_h
 }
 
 
-/* Node of a linked list of uses of dead REGs in debug insns.  */
-struct dead_debug_use
-{
-  df_ref use;
-  struct dead_debug_use *next;
-};
-
-/* Linked list of the above, with a bitmap of the REGs in the
-   list.  */
-struct dead_debug
-{
-  struct dead_debug_use *head;
-  bitmap used;
-  bitmap to_rescan;
-};
-
-static void dead_debug_reset (struct dead_debug *, unsigned int);
-
-
 /* Set the REG_UNUSED notes for the multiword hardreg defs in INSN
based on the bits in LIVE.  Do not generate notes for registers in
artificial uses.  DO_NOT_GEN is updated so that REG_DEAD notes are
@@ -2930,7 +2911,7 @@ df_set_unused_notes_for_mw (rtx insn, st
 {
   unsigned int regno = mws->start_regno;
   df_set_note (REG_UNUSED, insn, mws->mw_reg);
-  dead_debug_reset (debug, regno);
+  dead_debug_insert_temp (debug, regno, insn, DEBUG_TEMP_AFTER_WITH_REG);
 
 #ifdef REG_DEAD_DEBUGGING
   df_print_note ("adding 1: ", insn, REG_NOTES (insn));
@@ -2945,7 +2926,7 @@ df_set_unused_notes_for_mw (rtx insn, st
 	&& !bitmap_bit_p (artificial_uses, r))
 	  {
 	df_set_note (REG_UNUSED, insn, regno_reg_rtx[r]);
-	dead_debug_reset (debug, r);
+	dead_debug_insert_temp (debug, r, insn, DEBUG_TEMP_AFTER_WITH_REG);
 #ifdef REG_DEAD_DEBUGGING
 	df_print_note ("adding 2: ", insn, REG_NOTES (insn));
 #endif
@@ -3013,12 +2994,12 @@ df_set_dead_notes_for_mw (rtx insn, stru
 
   if (df_whole_mw_reg_dead_p (mws, live, artificial_uses, do_not_gen))
 {
-  /* Add a dead note for the entire m

Re: fix incorrect debug temp added by df-problems

2012-04-13 Thread Alexandre Oliva
On Apr  9, 2012, Jakub Jelinek  wrote, in response to
my posting to the wrong thread (now fixed):

> On Mon, Apr 09, 2012 at 03:29:05AM -0300, Alexandre Oliva wrote:
>> +  && (!df_ignore_stack_reg (uregno)))

> Please remove the extra () around this line,
> && !df_ignore_stack_reg (uregno))

> Ok for trunk with that change, thanks.

Thanks, here's what I've just installed.

This fixes a bug in pr43165.c with -Os -g.

for  gcc/ChangeLog
from  Alexandre Oliva  

	* df-problems.c (df_note_bb_compute): Do not take note of
	debug uses for whose REGs we won't emit DEAD or UNUSED notes.
	
Index: gcc/df-problems.c
===
--- gcc/df-problems.c.orig	2012-04-13 05:18:28.604788011 -0300
+++ gcc/df-problems.c	2012-04-13 06:58:42.053258184 -0300
@@ -3453,7 +3453,12 @@ df_note_bb_compute (unsigned int bb_inde
 		{
 		  if (debug_insn > 0)
 		{
-		  dead_debug_add (&debug, use, uregno);
+		  /* We won't add REG_UNUSED or REG_DEAD notes for
+			 these, so we don't have to mess with them in
+			 debug insns either.  */
+		  if (!bitmap_bit_p (artificial_uses, uregno)
+			  && !df_ignore_stack_reg (uregno))
+			dead_debug_add (&debug, use, uregno);
 		  continue;
 		}
 		  break;

-- 
Alexandre Oliva, freedom fighterhttp://FSFLA.org/~lxoliva/
You must be the change you wish to see in the world. -- Gandhi
Be Free! -- http://FSFLA.org/   FSF Latin America board member
Free Software Evangelist  Red Hat Brazil Compiler Engineer


Re: [PATCH, i386] Move common definitions of gnu-user.h and gnu-user64.h to separate file

2012-04-13 Thread Ilya Enkovich
> On Fri, Apr 13, 2012 at 7:34 AM, H.J. Lu  wrote:
>> On Fri, Apr 13, 2012 at 7:02 AM, Uros Bizjak  wrote:
>>> On Fri, Apr 13, 2012 at 12:37 PM, Ilya Enkovich  
>>> wrote:
>>>
 2012-04-13  Enkovich Ilya  

        * config/i386/gnu-user-common.h: New.

        * config/i386/gnu-user.h (CPP_SPEC): Moved to
        gnu-user-common.h.
        (CC1_SPEC): Likewise.
        (ENDFILE_SPEC): Likewise.
        (DEFAULT_PCC_STRUCT_RETURN): Likewise.
        (TARGET_TLS_DIRECT_SEG_REFS_DEFAULT): Likewise.
        (TARGET_OS_CPP_BUILTINS): Likewise.
        (LIBGCC2_HAS_TF_MODE): Likewise.
        (LIBGCC2_TF_CEXT): Likewise.
        (TF_SIZE): Likewise.
        (TARGET_ASM_FILE_END): Likewise.
        (STACK_CHECK_MOVING_SP): Likewise.
        (STACK_CHECK_STATIC_BUILTIN): Likewise.

        * config/i386/gnu-user64.h: Likewise.
>>>
>>> OK for mainline SVN, but as Richard said, this is not appropriate for
>>> release branches.
>>>
>>
>> Ilya, I will check it for you.  There is no need for release branches.
>>
>
> ChangeLog for config.gcc is missing.
>
>
> --
> H.J.

Hi H.J.,

Here is fixed ChangeLog

Thanks,
Ilya
---
2012-04-13  Enkovich Ilya  

* config/i386/gnu-user-common.h: New.

* config/i386/gnu-user.h (CPP_SPEC): Moved to
gnu-user-common.h.
(CC1_SPEC): Likewise.
(ENDFILE_SPEC): Likewise.
(DEFAULT_PCC_STRUCT_RETURN): Likewise.
(TARGET_TLS_DIRECT_SEG_REFS_DEFAULT): Likewise.
(TARGET_OS_CPP_BUILTINS): Likewise.
(LIBGCC2_HAS_TF_MODE): Likewise.
(LIBGCC2_TF_CEXT): Likewise.
(TF_SIZE): Likewise.
(TARGET_ASM_FILE_END): Likewise.
(STACK_CHECK_MOVING_SP): Likewise.
(STACK_CHECK_STATIC_BUILTIN): Likewise.

* config/i386/gnu-user64.h: Likewise.

* config.gcc: Add i386/gnu-user-common.h before
all i386/gnu-user.h and i386/gnu-user64.h usages.


Re: fix left-over debug insns in DCE

2012-04-13 Thread Paolo Bonzini
Il 13/04/2012 17:58, Alexandre Oliva ha scritto:
> 
> I've just installed the patch, but if you find the need for any further
> improvement, let me know and I'll do it right away.

I wonder if it makes any sense to move the dead_debug_* stuff to its own
file...

Paolo


Re: [RFC] Should SRA stop producing COMPONENT_REF for non-bit-fields (again)?

2012-04-13 Thread Martin Jambor
On Fri, Apr 13, 2012 at 01:57:33PM +0200, Rainer Orth wrote:
> Richard Guenther  writes:
> 
> >> Anyway, the patch I posted previously would risk re-introducing PR
> >> 50386 and PR 50326, even though they are very unlikely with just
> >> bit-fields.  So my current working version is the following, but it
> >> causes failure of libmudflap.c++/pass55-frag.cxx execution test so I'm
> >> not actually proposing it yet (sigh).
> >
> > I would not worry about mudflap tests.  The patch looks good to my
> > eyes.
> 
> Are you sure the failure is new?  At least for 64-bit at -O,
> libmudflap.c++/pass55-frag.cxx already fails right now (cf. PR
> libmudflap/49843).
> 

Well, my patch makes it fail on all other optimization levels too (the
supposedly no-optimization variant is actually compiled with -O2).
But thanks for pointing me towards the bug, I'll use it as an excuse
to commit the patch nevertheless (mainly because I really cannot see
anything wrong going on there, at least with respect to SRA and this
change).

Thanks,

Martin



PATCH: Define _ILP32 and __ILP32__ for x32

2012-04-13 Thread H.J. Lu
Hi,

This patch defines _ILP32 and __ILP32__ for x32 as specified by x32 psABI.
OK for trunk and 4.7 branch?

Thanks.

H.J.
---
2012-04-13  H.J. Lu  

* config/i386/i386-c.c (ix86_target_macros): Define _ILP32
and __ILP32__ for x32.

diff --git a/gcc/config/i386/i386-c.c b/gcc/config/i386/i386-c.c
index 8adb3b4..49fd4d9 100644
--- a/gcc/config/i386/i386-c.c
+++ b/gcc/config/i386/i386-c.c
@@ -383,6 +383,11 @@ ix86_target_macros (void)
   cpp_define (parse_in, "__amd64__");
   cpp_define (parse_in, "__x86_64");
   cpp_define (parse_in, "__x86_64__");
+  if (TARGET_X32)
+   {
+ cpp_define (parse_in, "_ILP32");
+ cpp_define (parse_in, "__ILP32__");
+   }
 }
   else
 {


Re: [patch, fortran-dev] Use fixed variable sizes for stride calculations

2012-04-13 Thread Janne Blomqvist
On Wed, Apr 11, 2012 at 10:51, Thomas Koenig  wrote:
> Hi,
>
>
>> this patch uses division by known sizes (which can usually be replaced
>> by a simple shift because intrinsics have sizes of power of two) instead
>> of division by the size extracted from the array descriptor itself.
>>
>> This should save about 20 cycles for a single calculation.
>>
>> I'll go through the rest of the library to identify other possibilities
>> for this.
>>
>> Regression-tested, no new failures.
>>
>> OK for the branch?
>
>
> Full patch at http://gcc.gnu.org/ml/fortran/2012-03/msg00120.html
>
> Ping?
>
>        Thomas

I thought I already approved this a few weeks ago
(http://gcc.gnu.org/ml/fortran/2012-03/msg00144.html)? Or if it has
been decided that I'm no longer a GFortran reviewer, it would be nice
if I'd be informed of this as well.


-- 
Janne Blomqvist


Re: PATCH: Define _ILP32 and __ILP32__ for x32

2012-04-13 Thread Uros Bizjak
On Fri, Apr 13, 2012 at 7:14 PM, H.J. Lu  wrote:

> This patch defines _ILP32 and __ILP32__ for x32 as specified by x32 psABI.
> OK for trunk and 4.7 branch?

>        * config/i386/i386-c.c (ix86_target_macros): Define _ILP32
>        and __ILP32__ for x32.

OK.

Thanks,
Uros.


Re: [PATCH, PR 52939] Gracefully deal with fold_ctor_reference returning NULL during devirtualization

2012-04-13 Thread Martin Jambor
Hi,

On Fri, Apr 13, 2012 at 04:13:13PM +0200, Richard Guenther wrote:
> On Fri, 13 Apr 2012, Martin Jambor wrote:
> 
> > Hi,
> > 
> > currently we ICE when attempting to devirtualize a call to a virtual
> > method introduced in a descendant but with a base which is an ancestor
> > which does not have it.  This is because fold_ctor_reference returns
> > constant zero when it cannot find the particular value in the provided
> > constructor which is something gimple_get_virt_method_for_binfo cannot
> > cope with.  The patch below avoids it by simply testing for that
> > value and bailing out.
> > 
> > Bootstrapped and tested on x86_64-linux, OK for trunk?
> 
> Ok.

I forgot to ask permission also to commit it to the 4.7 branch but I
suppose that I can do that too and will go ahead with the same patch
soon (after bootstrap and testing on the branch).

Thanks,

Martin


> 
> Thanks,
> Richard.
> 
> > Thanks,
> > 
> > Martin
> > 
> > 
> > 2012-04-12  Martin Jambor  
> > 
> > PR middle-end/52939
> > * gimple-fold.c (gimple_get_virt_method_for_binfo): Bail out if
> > fold_ctor_reference returns a zero constant.
> > 
> > * testsuite/g++.dg/ipa/pr52939.C: New test.
> > 
> > Index: src/gcc/gimple-fold.c
> > ===
> > --- src.orig/gcc/gimple-fold.c
> > +++ src/gcc/gimple-fold.c
> > @@ -3087,7 +3087,7 @@ gimple_get_virt_method_for_binfo (HOST_W
> >offset += token * size;
> >fn = fold_ctor_reference (TREE_TYPE (TREE_TYPE (v)), DECL_INITIAL (v),
> > offset, size);
> > -  if (!fn)
> > +  if (!fn || integer_zerop (fn))
> >  return NULL_TREE;
> >gcc_assert (TREE_CODE (fn) == ADDR_EXPR
> >   || TREE_CODE (fn) == FDESC_EXPR);
> > Index: src/gcc/testsuite/g++.dg/ipa/pr52939.C
> > ===
> > --- /dev/null
> > +++ src/gcc/testsuite/g++.dg/ipa/pr52939.C
> > @@ -0,0 +1,58 @@
> > +/* Verify that we do not ICE on invalid devirtualizations (which might
> > +   be OK at run-time because never executed).  */
> > +/* { dg-do run } */
> > +/* { dg-options "-O3 -fno-early-inlining -fno-inline"  } */
> > +
> > +extern "C" void abort (void);
> > +
> > +class A
> > +{
> > +public:
> > +  int data;
> > +  virtual int foo (int i);
> > +};
> > +
> > +class B : public A
> > +{
> > +public:
> > +  virtual int foo (int i);
> > +  virtual int bar (int i);
> > +};
> > +
> > +int A::foo (int i)
> > +{
> > +  return i + 1;
> > +}
> > +
> > +int B::foo (int i)
> > +{
> > +  return i + 2;
> > +}
> > +
> > +int B::bar (int i)
> > +{
> > +  return i + 3;
> > +}
> > +
> > +static int middleman (class A *obj, int i)
> > +{
> > +  class B *b = (class B *) obj;
> > +
> > +  if (i != 1)
> > +return b->bar (i);
> > +  else
> > +return i;
> > +}
> > +
> > +int __attribute__ ((noinline,noclone)) get_input(void)
> > +{
> > +  return 1;
> > +}
> > +
> > +int main (int argc, char *argv[])
> > +{
> > +  class A o;
> > +  if (middleman (&o, get_input ()) != 1)
> > +abort ();
> > +  return 0;
> > +}
> > 
> > 
> 
> -- 
> Richard Guenther 
> SUSE / SUSE Labs
> SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746
> GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer



Re: Commit: RL78: Remove use of TODO_dump_func

2012-04-13 Thread DJ Delorie

>   The optimization pass flag "TODO_dump_flag" has been removed (see
>   patch committed 2012-04-11) which was causing the RL78 backend to fail
>   to build.  I am applying the following patch as an obvious fix.

Ok, thanks!


C++ PATCH for c++/52915 (accepts-invalid anonymous union in C++11 mode)

2012-04-13 Thread Jason Merrill
C++11 extends unions so that a member can have a non-trivial default 
constructor, but the union then has a deleted constructor unless the 
user defines one.  As a result, we can't assume that an anonymous union 
has a trivial default constructor anymore.


Tested x86_64-pc-linux-gnu, applying to trunk.
commit 761b558950409d024fd509228b1a3e04fcefb38a
Author: Jason Merrill 
Date:   Thu Apr 12 18:10:48 2012 -0400

	PR c++/52915
	* decl2.c (finish_anon_union): Use cp_finish_decl.
	* error.c (dump_function_name): Avoid showing anonymous "name".

diff --git a/gcc/cp/decl2.c b/gcc/cp/decl2.c
index b048ac7..212feea 100644
--- a/gcc/cp/decl2.c
+++ b/gcc/cp/decl2.c
@@ -1456,12 +1456,7 @@ finish_anon_union (tree anon_union_decl)
 }
 
   pushdecl (anon_union_decl);
-  if (building_stmt_list_p ()
-  && at_function_scope_p ())
-add_decl_expr (anon_union_decl);
-  else if (!processing_template_decl)
-rest_of_decl_compilation (anon_union_decl,
-			  toplevel_bindings_p (), at_eof);
+  cp_finish_decl (anon_union_decl, NULL_TREE, false, NULL_TREE, 0);
 }
 
 /* Auxiliary functions to make type signatures for
diff --git a/gcc/cp/error.c b/gcc/cp/error.c
index ee8f0e0..77eb306 100644
--- a/gcc/cp/error.c
+++ b/gcc/cp/error.c
@@ -1556,6 +1556,8 @@ dump_function_name (tree t, int flags)
 {
   if (LAMBDA_TYPE_P (DECL_CONTEXT (t)))
 	name = get_identifier ("");
+  else if (TYPE_ANONYMOUS_P (DECL_CONTEXT (t)))
+	name = get_identifier ("");
   else
 	name = constructor_name (DECL_CONTEXT (t));
 }
diff --git a/gcc/testsuite/g++.dg/other/anon-union2.C b/gcc/testsuite/g++.dg/other/anon-union2.C
new file mode 100644
index 000..31bb74f
--- /dev/null
+++ b/gcc/testsuite/g++.dg/other/anon-union2.C
@@ -0,0 +1,10 @@
+// PR c++/52915
+
+struct S {
+  int val;
+  S(int v) : val(v) {}
+};
+
+void f() {
+  union { S a; };		// { dg-error "constructor|no match" }
+}


C++ PATCH for c++/52905 (ICE with invalid list-initialization)

2012-04-13 Thread Jason Merrill
When a list-initialization doesn't quite match either a list constructor 
or a non-list constructor, we end up trying to compare them in joust and 
get confused because they have different numbers of parameters.  So 
let's just treat them as unordered; we're going to talk about what's 
wrong with both of them anyway.


Tested x86_64-pc-linux-gnu, applying to trunk.
commit fad09b389b18d6f375b07bb680abc7e4590ecae2
Author: Jason Merrill 
Date:   Fri Apr 13 13:01:59 2012 -0400

	PR c++/52905
	* call.c (joust): Handle comparing list and non-list ctors.

diff --git a/gcc/cp/call.c b/gcc/cp/call.c
index 3c3dabb..46ac55c 100644
--- a/gcc/cp/call.c
+++ b/gcc/cp/call.c
@@ -8011,6 +8011,12 @@ joust (struct z_candidate *cand1, struct z_candidate *cand2, bool warn)
   int static_1 = DECL_STATIC_FUNCTION_P (cand1->fn);
   int static_2 = DECL_STATIC_FUNCTION_P (cand2->fn);
 
+  if (DECL_CONSTRUCTOR_P (cand1->fn)
+	  && is_list_ctor (cand1->fn) != is_list_ctor (cand2->fn))
+	/* We're comparing a near-match list constructor and a near-match
+	   non-list constructor.  Just treat them as unordered.  */
+	return 0;
+
   gcc_assert (static_1 != static_2);
 
   if (static_1)
diff --git a/gcc/testsuite/g++.dg/cpp0x/initlist-ctor1.C b/gcc/testsuite/g++.dg/cpp0x/initlist-ctor1.C
new file mode 100644
index 000..82031cb
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/initlist-ctor1.C
@@ -0,0 +1,13 @@
+// PR c++/52905
+// { dg-options -std=c++11 }
+
+#include 
+
+enum E { e1, e2 };
+struct A
+{
+  A(std::initializer_list);	// { dg-message "A::A" }
+  A(int, E);			// { dg-message "A::A" }
+};
+
+A a{e1,2};			// { dg-error "" }


C++ PATCH for c++/52824 (pack expansion and fixed template parameters)

2012-04-13 Thread Jason Merrill

One case that I missed in my patch for PR 35722.

Tested x86_64-pc-linux-gnu, applying to trunk and 4.7.
commit 9fa7eea3608b19b53cc2f3c9a8195cf811b02d84
Author: Jason Merrill 
Date:   Fri Apr 13 13:37:26 2012 -0400

	PR c++/52824
	* pt.c (any_pack_expanson_args_p): New.
	(coerce_template_parms): Use it.

diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index ee38254..07a2cc0 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -6725,6 +6725,20 @@ coerce_template_parameter_pack (tree parms,
   return argument_pack;
 }
 
+/* Returns true if the template argument vector ARGS contains
+   any pack expansions, false otherwise.  */
+
+static bool
+any_pack_expanson_args_p (tree args)
+{
+  int i;
+  if (args)
+for (i = 0; i < TREE_VEC_LENGTH (args); ++i)
+  if (PACK_EXPANSION_P (TREE_VEC_ELT (args, i)))
+	return true;
+  return false;
+}
+
 /* Convert all template arguments to their appropriate types, and
return a vector containing the innermost resulting template
arguments.  If any error occurs, return error_mark_node. Error and
@@ -6790,6 +6804,7 @@ coerce_template_parms (tree parms,
   if ((nargs > nparms && !variadic_p)
   || (nargs < nparms - variadic_p
 	  && require_all_args
+	  && !any_pack_expanson_args_p (inner_args)
 	  && (!use_default_args
 	  || (TREE_VEC_ELT (parms, nargs) != error_mark_node
   && !TREE_PURPOSE (TREE_VEC_ELT (parms, nargs))
diff --git a/gcc/testsuite/g++.dg/cpp0x/alias-decl-15.C b/gcc/testsuite/g++.dg/cpp0x/alias-decl-15.C
index 2bc9b11..b23e402 100644
--- a/gcc/testsuite/g++.dg/cpp0x/alias-decl-15.C
+++ b/gcc/testsuite/g++.dg/cpp0x/alias-decl-15.C
@@ -2,7 +2,7 @@
 // { dg-options "-std=c++0x" }
 
 template //#1
-struct foo {}; // { dg-error "provided for|foo" }
+struct foo {};
 
 template
 struct P {};
@@ -10,8 +10,8 @@ struct P {};
 template class... TT>
 struct bar {
 template
-using mem = P...>;//#2 { dg-error "wrong number of|arguments" }
+using mem = P...>;//#2
 };
 
-bar::mem b;//#3 { dg-error "invalid type" }
+bar::mem b;//#3
 
diff --git a/gcc/testsuite/g++.dg/cpp0x/variadic123.C b/gcc/testsuite/g++.dg/cpp0x/variadic123.C
new file mode 100644
index 000..f0ab9fc
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/variadic123.C
@@ -0,0 +1,14 @@
+// PR c++/52824
+// { dg-do compile { target c++11 } }
+
+template
+struct foo
+{};
+
+template
+struct bar : foo
+{};
+
+int main() {
+  bar f;
+}


C++ PATCH for c++/52824 (pack expansion and fixed template parameters)

2012-04-13 Thread Jason Merrill

One case that I missed in my patch for PR 35722.

Tested x86_64-pc-linux-gnu, applying to trunk and 4.7.
commit 9fa7eea3608b19b53cc2f3c9a8195cf811b02d84
Author: Jason Merrill 
Date:   Fri Apr 13 13:37:26 2012 -0400

	PR c++/52824
	* pt.c (any_pack_expanson_args_p): New.
	(coerce_template_parms): Use it.

diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index ee38254..07a2cc0 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -6725,6 +6725,20 @@ coerce_template_parameter_pack (tree parms,
   return argument_pack;
 }
 
+/* Returns true if the template argument vector ARGS contains
+   any pack expansions, false otherwise.  */
+
+static bool
+any_pack_expanson_args_p (tree args)
+{
+  int i;
+  if (args)
+for (i = 0; i < TREE_VEC_LENGTH (args); ++i)
+  if (PACK_EXPANSION_P (TREE_VEC_ELT (args, i)))
+	return true;
+  return false;
+}
+
 /* Convert all template arguments to their appropriate types, and
return a vector containing the innermost resulting template
arguments.  If any error occurs, return error_mark_node. Error and
@@ -6790,6 +6804,7 @@ coerce_template_parms (tree parms,
   if ((nargs > nparms && !variadic_p)
   || (nargs < nparms - variadic_p
 	  && require_all_args
+	  && !any_pack_expanson_args_p (inner_args)
 	  && (!use_default_args
 	  || (TREE_VEC_ELT (parms, nargs) != error_mark_node
   && !TREE_PURPOSE (TREE_VEC_ELT (parms, nargs))
diff --git a/gcc/testsuite/g++.dg/cpp0x/alias-decl-15.C b/gcc/testsuite/g++.dg/cpp0x/alias-decl-15.C
index 2bc9b11..b23e402 100644
--- a/gcc/testsuite/g++.dg/cpp0x/alias-decl-15.C
+++ b/gcc/testsuite/g++.dg/cpp0x/alias-decl-15.C
@@ -2,7 +2,7 @@
 // { dg-options "-std=c++0x" }
 
 template //#1
-struct foo {}; // { dg-error "provided for|foo" }
+struct foo {};
 
 template
 struct P {};
@@ -10,8 +10,8 @@ struct P {};
 template class... TT>
 struct bar {
 template
-using mem = P...>;//#2 { dg-error "wrong number of|arguments" }
+using mem = P...>;//#2
 };
 
-bar::mem b;//#3 { dg-error "invalid type" }
+bar::mem b;//#3
 
diff --git a/gcc/testsuite/g++.dg/cpp0x/variadic123.C b/gcc/testsuite/g++.dg/cpp0x/variadic123.C
new file mode 100644
index 000..f0ab9fc
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/variadic123.C
@@ -0,0 +1,14 @@
+// PR c++/52824
+// { dg-do compile { target c++11 } }
+
+template
+struct foo
+{};
+
+template
+struct bar : foo
+{};
+
+int main() {
+  bar f;
+}


Re: RFA: consolidate DWARF strings into libiberty

2012-04-13 Thread Tom Tromey
Tom> Here is a new patch for gcc.
Tom> I still haven't updated the src side, but there's little to do there
Tom> that isn't already done in this patch.

Tom> Ok?

Tom> Ping.

Ping.

Tom


Re: PING Re: [PATCH] gfortran testsuite: implicitly cleanup-modules, part 2

2012-04-13 Thread Mike Stump
On Apr 13, 2012, at 7:39 AM, Bernhard Reutner-Fischer wrote:
> On Fri, Apr 13, 2012 at 06:57:44AM -0700, Mike Stump wrote:
>> On Apr 13, 2012, at 3:51 AM, Bernhard Reutner-Fischer wrote:
>>> Ping.
>> 
>> Before advancing, has the problem that Rainer pointed out on March 19th with 
>> your earlier patch been fixed?
> 
> I believe that it is fixed, yes. See r185688 and my follow up to him (
> http://gcc.gnu.org/ml/gcc-patches/2012-03/msg01498.html )

Hum, somehow I missed that... Thanks, I was hoping I just missed it.


Re: [Patch] tree-parloops.c (eliminate_local_variables): Add braces to suppress warnings

2012-04-13 Thread Mike Stump
On Apr 13, 2012, at 7:50 AM, Bernd Schmidt wrote:
> The problem is that a human might be
> confused, for example due to bad indentation. Whether there's a for in
> between doesn't matter for this purpose, the following is most likely a bug:
> 
> if ()
>  for (..)
>if ()
>  x
> else
>  y

I like the warning only when the indentation of the else does not match the 
indentation of the matching if.  Most humans I think can follow the meaning 
correctly, if the code is indented correctly.  The point of the warning would 
be to catch typos, and an else that isn't in the right column, is most likely 
wrong.  Though, I'd argue that any else in the wrong column, is most likely 
wrong...


Re: [Patch, Fortran, F03] PR52909: Procedure pointers not private to modules

2012-04-13 Thread Janne Blomqvist
On Tue, Apr 10, 2012 at 11:21, Tobias Burnus
 wrote:
> Regarding ABI breakage:
[snip]

In general I agree that ABI compatibility is something we should take
seriously, but OTOH we should take care that the anointed ABI makes
sense. Which IMHO would imply that known ABI bugs/misdesigns should be
fixed, and e.g. the mod file format should be extensible. And even
that may not be enough, considering there are rather big performance
issues with mod loading at the moment, which may or may not require a
complete redesign of the mod file format as well. So I worry that if
we commit to supporting something more or less like the current mod
format going forward, will that make solving these problems
prohibitively tedious as one would need to at the same time support
both the existing, possibly completely different module format, and
the new one?

Also, ideally we should have a document similar to what C++ has at
http://sourcery.mentor.com/public/cxx-abi/ which describes the ABI.

Also, while it would be nice to fix the ABI in one go, considering the
rather limited manpower we have, maybe it's more feasible to spread it
over several releases. Say if we update the array descriptor (and
hence library .so version number) in one release, name mangling fixes
and other  "OOP ABI" issues in another, better .mod file format
whenever it gets done, and so forth. And only when we're "done" we fix
the ABI and implement the -fabi-version thingy (since inevitably ABI
bugs will be found and fixed in the future as well, and also of course
new language features may require updating the ABI).


-- 
Janne Blomqvist


Re: [patch, fortran-dev] Use fixed variable sizes for stride calculations

2012-04-13 Thread Thomas Koenig

Hi Janne,


I thought I already approved this a few weeks ago


sorry, I totally missed that.  Comes from having a computer
crash on you...

Thanks for the review!

Thomas


RE: [PR tree-optimization/52558]: RFC: questions on store data race

2012-04-13 Thread Boehm, Hans


> -Original Message-
> From: Aldy Hernandez [mailto:al...@redhat.com]
> Sent: Thursday, April 12, 2012 3:12 PM
> To: Richard Guenther
> Cc: Andrew MacLeod; Boehm, Hans; gcc-patches; Torvald Riegel
> Subject: [PR tree-optimization/52558]: RFC: questions on store data
> race
> 
> Here we have a testcase that affects both the C++ memory model and
> transactional memory.
> 
> [Hans, this is caused by the same problem that is causing the
> speculative register promotion issue you and Torvald pointed me at].
> 
> In the following testcase (adapted from the PR), the loop invariant
> motion pass caches a pre-existing value for g_2, and then performs a
> store to g_2 on every path, causing a store data race:
> 
> int g_1 = 1;
> int g_2 = 0;
> 
> int func_1(void)
> {
>int l;
>for (l = 0; l < 1234; l++)
>{
>  if (g_1)
>return l;
>  else
>g_2 = 0;   <-- Store to g_2 should only happen if !g_1
>}
>return 999;
> }
> 
> This gets transformed into something like:
> 
>   g_2_lsm = g_2;
>   if (g_1) {
>   g_2 = g_2_lsm;  // boo! hiss!
>   return 0;
>   } else {
>   g_2_lsm = 0;
>   g_2 = g_2_lsm;
>   }
> 
> The spurious write to g_2 could cause a data race.
Presumably the g_2_lsm = g_2 is actually outside the loop?

Why does the second g_2 = g_2_lsm; get introduced?  I would have expected it 
before the return.  Was the example just over-abbreviated?

Other than that, this sounds right to me.  So does Richard's flag-based 
version, which is the approach I would have originally expected.  But that 
clearly costs you a register.  It would be interesting to see how they compare.

Hans

> 
> Andrew has suggested verifying that the store to g_2 was actually
> different than on entry, and letting PHI copy propagation optimize the
> redundant comparisons.  Like this:
> 
>   g_2_lsm = g_2;
>   if (g_1) {
>   if (g_2_lsm != g_2) // hopefully optimized away
>   g_2 = g_2_lsm;  // hopefully optimized away
>   return 0;
>   } else {
>   g_2_lsm = 0;
>   if (g_2_lsm != g_2) // hopefully optimized away
>   g_2 = g_2_lsm;
>   }
> 
> ...which PHI copy propagation and/or friends should be able to
> optimize.
> 
> For that matter, regardless of the memory model, the proposed solution
> would improve the existing code by removing the extra store (in this
> contrived test case anyhow).
> 
> Anyone see a problem with this approach (see attached patch)?
> 
> Assuming the unlikely scenario that everyone likes this :), I have two
> problems that I would like answered.  But feel free to ignore if the
> approach is a no go.
> 
> 1. No pass can figure out the equality (or inequality) of g_2_lsm and
> g_2.  In the PR, Richard mentions that both FRE/PRE can figure this
> out, but they are not run after store motion.  DOM should also be able
> to, but apparently does not :(.  Tips?
> 
> 2. The GIMPLE_CONDs I create in this patch are currently causing
> problems with complex floats/doubles.  do_jump is complaining that it
> can't compare them, so I assume it is too late (in tree-ssa-loop-im.c)
> to compare complex values since complex lowering has already happened
> (??). Is there a more canonical way of creating a GIMPLE_COND that may
> contain complex floats at this stage?
> 
> Thanks folks.


RE: [PR tree-optimization/52558]: RFC: questions on store data race

2012-04-13 Thread Boehm, Hans
> From: Richard Guenther [mailto:richard.guent...@gmail.com]
> Can we _remove_ a store we percieve as redundant (with a single-threaded
> view) with the memory model?
Generally yes, so long as synchronization operations either conservatively 
treated as completely opaque, or are treated correctly in the "single-threaded 
view".  If there is no synchronization between the original store, and the 
redundant one, then the redundant one changes things only if another thread 
writes to the same variable in-between.  That constitutes a data race, which 
invokes undefined behavior.  The general rule is that any sequentially correct 
transformation is OK between synchronization operations, so long as you don't 
store to anything you weren't supposed to modify, and the state at the next 
synchronization point is what would have been expected.  You can sometimes 
treat synchronizations more aggressively, but that should be safe.

Hans


Re: [Patch] tree-parloops.c (eliminate_local_variables): Add braces to suppress warnings

2012-04-13 Thread Magnus Fromreide
On Fri, 2012-04-13 at 16:50 +0200, Bernd Schmidt wrote:
> On 04/13/2012 04:44 PM, Jakub Jelinek wrote:
> > On Fri, Apr 13, 2012 at 04:33:17PM +0200, Bernd Schmidt wrote:
> >> On 04/13/2012 04:20 PM, Mike Stump wrote:
> >>> On Apr 13, 2012, at 5:30 AM, NightStrike wrote:
> > no warning from trunk.  Which GCC version emits this warning?
> >>>
>  Looks like cygwin gcc 3.4.4
> >>>
> >>> 3.4.4 is a little old now..  We'd encourage an upgrade to a fine new 
> >>> compiler...  :-)
> >>
> >> The thing is, the else really is ambiguous, so it looks like we have a
> >> warning regression.
> > 
> > To the compiler?  There is for in between, no need to put extra braces IMHO.
> > I believe it has been intentionally fixed for 4.0, but my bisect tree only
> > goes back to r9.
> 
> Well, to the compiler even
> if ()
>   if ()
> x
>   else
> y
> 
> has an unambiguous meaning. The problem is that a human might be
> confused, for example due to bad indentation.

For me this warning is almost always a false positive since I am unable
to turn it of selectively.

I am programming in C++.
My main use of

if () ; else y

is to get a temporary scope that doesn't have any braces, e.g. in this
macro

#define DEBUG if (!enable_debugging) ; else debug_object()

where debug_object is an instance of std::ostream.
I then use it like

DEBUG << "fancy message goes here";

Now, writing

if (condition)
  DEBUG << "ooops, condition is true";

is not that far-fetched and obviously triggers the warning but I have a
hard time seeing how that is confusing to a casual reader of the code.


(In order to avoid this warning I am currently writing the above code
as

#define DEBUG switch (!enable_debugging) case false: debug_object()

but that makes the define harder to understand and so the net result of
the warning is to reduce the clarity of the code)

/MF



Re: [PATCH] Fix for PR52081 - Missed tail merging with pure calls

2012-04-13 Thread Tom de Vries
On 06/03/12 15:21, Richard Guenther wrote:
> On Mon, Feb 13, 2012 at 1:36 PM, Tom de Vries  wrote:
>> On 13/02/12 12:54, Richard Guenther wrote:
>>> On Thu, Feb 2, 2012 at 11:44 AM, Tom de Vries  
>>> wrote:
 Richard,

 this patch fixes PR52801.

 Consider test-case pr51879-12.c:
 ...
 __attribute__((pure)) int bar (int);
 __attribute__((pure)) int bar2 (int);
 void baz (int);

 int x, z;

 void
 foo (int y)
 {
  int a = 0;
  if (y == 6)
{
  a += bar (7);
  a += bar2 (6);
}
  else
{
  a += bar2 (6);
  a += bar (7);
}
  baz (a);
 }
 ...

 When compiling at -O2, pr51879-12.c.094t.pre looks like this:
 ...
  # BLOCK 3 freq:1991
  # PRED: 2 [19.9%]  (true,exec)
  # VUSE <.MEMD.1722_12(D)>
  # USE = nonlocal escaped
  D.1717_4 = barD.1703 (7);
  # VUSE <.MEMD.1722_12(D)>
  # USE = nonlocal escaped
  D.1718_6 = bar2D.1705 (6);
  aD.1713_7 = D.1717_4 + D.1718_6;
  goto ;
  # SUCC: 5 [100.0%]  (fallthru,exec)

  # BLOCK 4 freq:8009
  # PRED: 2 [80.1%]  (false,exec)
  # VUSE <.MEMD.1722_12(D)>
  # USE = nonlocal escaped
  D.1720_8 = bar2D.1705 (6);
  # VUSE <.MEMD.1722_12(D)>
  # USE = nonlocal escaped
  D.1721_10 = barD.1703 (7);
  aD.1713_11 = D.1720_8 + D.1721_10;
  # SUCC: 5 [100.0%]  (fallthru,exec)

  # BLOCK 5 freq:1
  # PRED: 3 [100.0%]  (fallthru,exec) 4 [100.0%]  (fallthru,exec)
  # aD.1713_1 = PHI 
  # .MEMD.1722_13 = VDEF <.MEMD.1722_12(D)>
  # USE = nonlocal
  # CLB = nonlocal
  bazD.1707 (aD.1713_1);
  # VUSE <.MEMD.1722_13>
  return;
 ...
 block 3 and 4 can be tail-merged.

 Value numbering numbers the two phi arguments a_7 and a_11 the same so the
 problem is not in value numbering:
 ...
 Setting value number of a_11 to a_7 (changed)
 ...

 There are 2 reasons that tail_merge_optimize doesn't optimize this:

 1.
 The clause
  is_gimple_assign (stmt) && local_def (gimple_get_lhs (stmt))
  && !gimple_has_side_effects (stmt)
 used in both same_succ_hash and gsi_advance_bw_nondebug_nonlocal evaluates 
 to
 false for pure calls.
 This is fixed by replacing is_gimple_assign with gimple_has_lhs.

 2.
 In same_succ_equal we check gimples from the 2 bbs side-by-side:
 ...
  gsi1 = gsi_start_nondebug_bb (bb1);
  gsi2 = gsi_start_nondebug_bb (bb2);
  while (!(gsi_end_p (gsi1) || gsi_end_p (gsi2)))
{
  s1 = gsi_stmt (gsi1);
  s2 = gsi_stmt (gsi2);
  if (gimple_code (s1) != gimple_code (s2))
return 0;
  if (is_gimple_call (s1) && !gimple_call_same_target_p (s1, s2))
return 0;
  gsi_next_nondebug (&gsi1);
  gsi_next_nondebug (&gsi2);
}
 ...
 and we'll be comparing 'bar (7)' and 'bar2 (6)', and 
 gimple_call_same_target_p
 will return false.
 This is fixed by ignoring local defs in this comparison, by using
 gsi_advance_fw_nondebug_nonlocal on the iterators.

 bootstrapped and reg-tested on x86_64.

 ok for stage1?
>>>
>>> Sorry for responding so late ...
>>
>> no problem :)
>>
>>> I think these fixes hint at that we should
>>> use "structural" equality as fallback if value-numbering doesn't equate
>>> two stmt effects.  Thus, treat two stmts with exactly the same operands
>>> and flags as equal and using value-numbering to canonicalize operands
>>> (when they are SSA names) for that comparison, or use VN entirely
>>> if there are no side-effects on the stmt.
>>>
>>> Changing value-numbering of virtual operands, even if it looks correct in 
>>> the
>>> simple cases you change, doesn't look like a general solution for the missed
>>> tail merging opportunities.
>>>
>>
>> Your comment is relevant for the other recent tail-merge related fixes I
>> submitted, but I think not for this one.
>>
>> In this case, value-numbering manages to value number the 2 phi-alternatives
>> equal. It's tail-merging that doesn't take advantage of this, by treating 
>> pure
>> function calls the same as non-pure function calls. The fixes are therefore 
>> in
>> tail-merging, not in value numbering.
>>
>> So, ok for stage1?
> 
> I see.  A few comments.
> 
> +/* Returns whether VAL is used in the same bb as in which it is defined, or
> +   in the phi of a successor bb.  */
> +
> +static bool
> +local_def (tree val)
> +{
> +  gimple stmt, def_stmt;
> +  basic_block bb, def_bb;
> +  imm_use_iterator iter;
> +  bool res;
> +
> +  if (TREE_CODE (val) != SSA_NAME)
> +return false;
> 
> what about SSA_NAME_IS_DEFAULT_DEF names?  They have a def-stmt
> with a NULL basic-block.

As you suggested below, I've inlined local_def into stmt_local_def, so val is
now initialized with DEF_FROM_PTR (SINGLE_SSA_DEF_OPERAND (stmt, SSA_OP_DEF)