[PATCH] Bug fix in LSHIFT_EXPR case with a shift range in tree-vrp, handle more cases

2012-09-14 Thread Tom de Vries
Richard,

I've tried to handle more LSHIFT_EXPR cases with a shift range in tree-vrp.

Currently we handle cases like this:
- non-negative shifting out zeros
  [5, 6] << [1, 2]
  == [10, 24]

This patch adds these cases:
- unsigned shifting out ones
  [0xffffff00, 0xffffffff] << [1, 2]
  == [0xfffffc00, 0xfffffffe]
- negative numbers
  [-1, 1] << [1, 2]
  == [-4, 4]

My previous patch (for PR53986) contained a bug (test-case vrp82.c) which makes
vrp evaluate:
- [minint, 1] << [1, 2]
  == [0, 4]
It should conservatively have checked for vr0.min >= 0, for the signed negative
case.

This patch fixes that bug as well.
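
To illustrate the in-bounds checks described above, here is a minimal sketch for 32-bit types. It is not the tree-vrp.c code: the struct and function names are hypothetical, plain integer arithmetic stands in for double_int, and the result range is taken from the four corner values in the way a multiplicative range op would.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical range types for this sketch; not GCC's value_range_t.  */
struct srange { int32_t min, max; };
struct urange { uint32_t min, max; };

/* Signed case: succeed only if every value in V shifts out nothing but
   copies of its sign bit, for every shift count up to SMAX.  */
static bool
shift_range_signed (struct srange v, int smin, int smax, struct srange *out)
{
  assert (0 <= smin && smin <= smax && smax <= 31);
  int64_t bound = (int64_t) 1 << (31 - smax);
  if (!(v.max < bound && -bound < v.min))
    return false;

  const int64_t vals[2] = { v.min, v.max };
  const int shifts[2] = { smin, smax };
  int64_t lo = INT64_MAX, hi = INT64_MIN;
  for (int i = 0; i < 2; i++)
    for (int j = 0; j < 2; j++)
      {
	/* Multiply in 64 bits to avoid left-shifting a negative value.  */
	int64_t c = vals[i] * ((int64_t) 1 << shifts[j]);
	if (c < lo) lo = c;
	if (c > hi) hi = c;
      }
  out->min = (int32_t) lo;
  out->max = (int32_t) hi;
  return true;
}

/* Unsigned case: succeed if the whole range shifts out only zeros
   (v.max < bound) or only ones (complement < v.min).  */
static bool
shift_range_unsigned (struct urange v, int smin, int smax, struct urange *out)
{
  assert (0 <= smin && smin <= smax && smax <= 31);
  uint64_t bound = (uint64_t) 1 << (32 - smax);
  uint32_t complement = (uint32_t) ~(bound - 1);
  if (!(v.max < bound || complement < v.min))
    return false;

  const uint32_t vals[2] = { v.min, v.max };
  const int shifts[2] = { smin, smax };
  uint32_t lo = UINT32_MAX, hi = 0;
  for (int i = 0; i < 2; i++)
    for (int j = 0; j < 2; j++)
      {
	/* Wrapping 32-bit shift, as the target type would do.  */
	uint32_t c = (uint32_t) (vals[i] << shifts[j]);
	if (c < lo) lo = c;
	if (c > hi) hi = c;
      }
  out->min = lo;
  out->max = hi;
  return true;
}
```

With these helpers, [minint, 1] << [1, 2] is rejected instead of being folded to [0, 4], because minint is below the negated bound.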

Bootstrapped and regtested (ada inclusive) on x86_64.

OK for trunk?

Thanks,
- Tom

2012-09-10  Tom de Vries  

* tree-vrp.c (extract_range_from_binary_expr_1): Fix bug in handling of
LSHIFT_EXPR with shift range.  Handle more LSHIFT_EXPR cases with shift
range.

* gcc.dg/tree-ssa/vrp81.c: New test.
* gcc.dg/tree-ssa/vrp81-2.c: Same.
* gcc.dg/tree-ssa/vrp82.c: Same.
Index: gcc/tree-vrp.c
===
--- gcc/tree-vrp.c	(revision 191089)
+++ gcc/tree-vrp.c	(working copy)
@@ -2766,20 +2766,63 @@
 	  else if (code == LSHIFT_EXPR
 		   && range_int_cst_p (&vr0))
 	{
-	  int overflow_pos = TYPE_PRECISION (expr_type);
+	  int prec = TYPE_PRECISION (expr_type);
+	  int overflow_pos = prec;
 	  int bound_shift;
-	  double_int bound;
+	  double_int bound, complement, low_bound, high_bound;
+	  bool uns = TYPE_UNSIGNED (expr_type);
+	  bool in_bounds = false;
 
-	  if (!TYPE_UNSIGNED (expr_type))
+	  if (!uns)
 		overflow_pos -= 1;
 
 	  bound_shift = overflow_pos - TREE_INT_CST_LOW (vr1.max);
-	  bound = double_int_one.llshift (bound_shift,
-	  TYPE_PRECISION (expr_type));
-	  if (tree_to_double_int (vr0.max).ult (bound))
+	  /* If bound_shift == HOST_BITS_PER_DOUBLE_INT, the llshift can
+		 overflow.  However, for that to happen, vr1.max needs to be
+		 zero, which means vr1 is a singleton range of zero, which
+		 means it should be handled by the previous LSHIFT_EXPR
+		 if-clause.  */
+	  bound = double_int_one.llshift (bound_shift, prec);
+	  complement = ~(bound - double_int_one);
+
+	  if (uns)
 		{
-		  /* In the absense of overflow, (a << b) is equivalent
-		 to (a * 2^b).  */
+		  low_bound = bound;
+		  high_bound = complement.zext (prec);
+		  if (tree_to_double_int (vr0.max).ult (low_bound))
+		{
+		  /* [5, 6] << [1, 2] == [10, 24].  */
+		  /* We're shifting out only zeroes, the value increases
+			 monotomically.  */
+		  in_bounds = true;
+		}
+		  else if (high_bound.ult (tree_to_double_int (vr0.min)))
+		{
+		  /* [0xffffff00, 0xffffffff] << [1, 2]
+		 == [0xfffffc00, 0xfffffffe].  */
+		  /* We're shifting out only ones, the value decreases
+			 monotomically.  */
+		  in_bounds = true;
+		}
+		}
+	  else
+		{
+		  /* [-1, 1] << [1, 2] == [-4, 4].  */
+		  low_bound = complement.sext (prec);
+		  high_bound = bound;
+		  if (tree_to_double_int (vr0.max).slt (high_bound)
+		  && low_bound.slt (tree_to_double_int (vr0.min)))
+		{
+		  /* For non-negative numbers, we're shifting out only
+			 zeroes, the value increases monotomically.
+			 For negative numbers, we're shifting out only ones, the
+			 value decreases monotomically.  */
+		  in_bounds = true;
+		}
+		}
+
+	  if (in_bounds)
+		{
 		  extract_range_from_multiplicative_op_1 (vr, code, &vr0, &vr1);
 		  return;
 		}
Index: gcc/testsuite/gcc.dg/tree-ssa/vrp81-2.c
===
--- gcc/testsuite/gcc.dg/tree-ssa/vrp81-2.c	(revision 0)
+++ gcc/testsuite/gcc.dg/tree-ssa/vrp81-2.c	(revision 0)
@@ -0,0 +1,60 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-vrp1" } */
+
+extern void vrp_keep (void);
+
+void
+f2 (int c, int b)
+{
+  int s = 0;
+  if (c == 0)
+s += 1;
+  else if (c < 1)
+s -= 1;
+  /* s in [-1, 1].   */
+  b = (b & 1) + 1;
+  /* b in range [1, 2].  */
+  b = s << b;
+  /* b in range [-4, 4].  */
+  if (b == -4)
+vrp_keep ();
+  if (b == 4)
+vrp_keep ();
+}
+
+void
+f3 (int s, int b)
+{
+  if (s >> 3 == -2)
+{
+  /* s in range [-16, -9].  */
+  b = (b & 1) + 1;
+  /* b in range [1, 2].  */
+  b =  s << b;
+  /* b in range [bmin << smax, bmax << smin],
+== [-16 << 2, -9 << 1]
+== [-64, -18].  */
+  if (b == -64)
+	vrp_keep ();
+  if (b == -18)
+	vrp_keep ();
+}
+}
+
+void
+f4 (unsigned int s, unsigned int b)
+{
+  s |= ~(0xffU);
+  /* s in [0xffffff00, 0xffffffff].  */
+  b = (b & 1) + 1;
+  /* b in [1, 2].  */
+  b = s << b;
+  /* b in [0xfffffc00, 0xfffffffe].  */
+  if (b == ~0x3ffU)
+vrp_keep ();
+  if (b == ~0x1U)
+vrp_keep ();
+}
+
+/* { dg-final { scan-tree-dump-times "vrp_keep \\(" 6 "vrp1"} }

Re: [PATCH] Bug fix in LSHIFT_EXPR case with a shift range in tree-vrp, handle more cases

2012-09-14 Thread Jakub Jelinek
On Fri, Sep 14, 2012 at 09:27:27AM +0200, Tom de Vries wrote:
>   * gcc.dg/tree-ssa/vrp81.c: New test.
>   * gcc.dg/tree-ssa/vrp81-2.c: Same.
>   * gcc.dg/tree-ssa/vrp82.c: Same.

Why not vrp82.c, vrp83.c and vrp84.c (and rename the recently added
vrp80-2.c test to vrp81.c)?

Jakub


Re: [PATCH] Bug fix in LSHIFT_EXPR case with a shift range in tree-vrp, handle more cases

2012-09-14 Thread Tom de Vries
On 14/09/12 09:38, Jakub Jelinek wrote:
> On Fri, Sep 14, 2012 at 09:27:27AM +0200, Tom de Vries wrote:
>>  * gcc.dg/tree-ssa/vrp81.c: New test.
>>  * gcc.dg/tree-ssa/vrp81-2.c: Same.
>>  * gcc.dg/tree-ssa/vrp82.c: Same.
> 
> Why not vrp82.c, vrp83.c and vrp84.c (and rename the recently added
> vrp80-2.c test to vrp81.c)?
> 

My thinking behind this was the following: vrp80.c and vrp80-2.c are 2 versions
of more or less the same code. In one version, we test whether the inclusive
bounds of the range are folded. In the other version we test whether the
exclusive bounds of the range are not folded.

Given that rationale, should I leave the names like this or rename them?

Thanks,
- Tom

>   Jakub
> 



Re: [PATCH] Bug fix in LSHIFT_EXPR case with a shift range in tree-vrp, handle more cases

2012-09-14 Thread Jakub Jelinek
On Fri, Sep 14, 2012 at 09:51:48AM +0200, Tom de Vries wrote:
> On 14/09/12 09:38, Jakub Jelinek wrote:
> > On Fri, Sep 14, 2012 at 09:27:27AM +0200, Tom de Vries wrote:
> >>* gcc.dg/tree-ssa/vrp81.c: New test.
> >>* gcc.dg/tree-ssa/vrp81-2.c: Same.
> >>* gcc.dg/tree-ssa/vrp82.c: Same.
> > 
> > Why not vrp82.c, vrp83.c and vrp84.c (and rename the recently added
> > vrp80-2.c test to vrp81.c)?
> > 
> 
> My thinking behind this was the following: vrp80.c and vrp80-2.c are 2 
> versions
> of more or less the same code. In one version, we test whether the inclusive
> bounds of the range are folded. In the other version we test whether the
> exclusive bounds of the range are not folded.

IMHO it is enough to give them consecutive numbers, there are many cases
where multiple vrpNN.c tests have been added for more or less the same code,
but I don't care that much, will leave that decision to Richard as the
probable reviewer.

Jakub


Re: [PATCH] Changes in mode switching

2012-09-14 Thread Vladimir Yakovlev
Additionally, you can find the patch history at
http://gcc.gnu.org/ml/gcc-patches/2012-08/msg01590.html.
I need these changes for my implementation of vzeroupper placement:
for some statements I do not need to do a real insertion.
I tested the changes with a bootstrap, using the configuration
../gcc/configure
--prefix=/export/users/vbyakovl/workspaces/vzu/install-middle
--enable-languages=c,c++,fortran

2012/9/14 Vladimir Yakovlev :
> Hello,
>
> I reproduced the failure and found the reason for it. I understand how to
> resolve it, and now I need only a small change: an additional argument to
> EMIT_MODE_SET. Is it good for trunk?
>
> Thank you,
> Vladimir
>
> 2012-09-14  Vladimir Yakovlev  
>
> * (optimize_mode_switching): Added an argument EMIT_MODE_SET calls.
>
> * config/epiphany/epiphany.h (EMIT_MODE_SET): Added an argument.
>
> * config/i386/i386.h (EMIT_MODE_SET): Added an argument.
>
> * config/sh/sh.h (EMIT_MODE_SET): Added an argument.
>
>
> 2012/8/29 Vladimir Yakovlev :
>> I built using last configure.
>>
>> Thank you,
>> Vladimir
>>
>> 2012/8/29 Kaz Kojima :
>>>> I tried
>>>>
>>>> ../gcc/configure --host=i686-pc-linux-gnu
>>>> --target=sh4-unknown-linux-gnu --enable-build-with-cxx --enable-lto
>>>> --enable-shared --enable-threads=posix --enable-clocale=gnu
>>>> --enable-libitm --enable-libgcj
>>>> --with-ld=/usr/local/bin/sh4-unknown-linux-gnu-ld
>>>> --with-as=/usr/local/bin/sh4-unknown-linux-gnu-as
>>>> --with-sysroot=/exp/ldroot --with-mpfr=/opt2/i686-pc-linux-gnu
>>>> --with-mpc=/opt2/i686-pc-linux-gnu
>>>> --with-libelf=/opt2/i686-pc-linux-gnu --with-ppl=no
>>>> --enable-languages=c,c++,fortran,java,lto,objc
>>>> --prefix=/export/users/mstester/stability/work/trunk/64/install_sh4
>>>>
>>>> and got a build error. make.log attached. Could you take a look?
>>>
>>> make.log says
>>>
>>>> make[2]: i686-pc-linux-gnu-ar: Command not found
>>>
>>> It looks like your build system is x86_64-unknown-linux-gnu.
>>> Perhaps specifying --host=x86_64-unknown-linux-gnu instead
>>> of --host=i686-pc-linux-gnu in your configuration would resolve
>>> that error, though
>>>
>>>> --with-ld=/usr/local/bin/sh4-unknown-linux-gnu-ld
>>>> --with-as=/usr/local/bin/sh4-unknown-linux-gnu-as
>>>> --with-sysroot=/exp/ldroot --with-mpfr=/opt2/i686-pc-linux-gnu
>>>> --with-mpc=/opt2/i686-pc-linux-gnu
>>>> --with-libelf=/opt2/i686-pc-linux-gnu
>>>
>>> are strongly specific to my environment.  Maybe
>>>
>>>   ../gcc/configure --host=x86_64-unknown-linux-gnu 
>>> --target=sh4-unknown-linux-gnu --enable-languages=c
>>>
>>> and
>>>
>>>   make all-gcc
>>>
>>> is enough to get cc1 for sh4-unknown-linux-gnu.
>>>
>>> Best Regards,
>>> kaz


Re: Backtrace library [3/3]

2012-09-14 Thread Janne Blomqvist
A few quick comments,

1) Although mmap is not guaranteed to be async-signal-safe, in
practice it should be as you mentioned previously. However I see that
when using mmap, the implementation uses pthread mutexes. These are
not guaranteed to be async-signal-safe either, but I guess in practice
as long as you don't try to use the same mutex in "normal" code and
the signal handler (in this case, the same state struct), it should be
OK. Is this so?

2) In backtrace_print(), in lieu of a standard (de-facto or otherwise)
for the backtrace format, why not follow gdb? I.e. a format string
like

"#%d 0x%lx in %s at %s:%d\n"

where the first argument is a frame number, otherwise as currently done.
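
For illustration only, a frame printer using that format string could look like the sketch below. The function name and parameters are hypothetical and are not part of libbacktrace; the program counter is cast to unsigned long to match %lx.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: format one frame in the gdb-like style
   "#N 0xPC in FUNCTION at FILE:LINE".  Returns what snprintf returns.  */
static int
format_frame (char *buf, size_t size, int frame, uintptr_t pc,
	      const char *function, const char *file, int line)
{
  assert (buf != NULL && size > 0);
  /* Fall back to placeholders when debug info is missing.  */
  if (function == NULL)
    function = "??";
  if (file == NULL)
    file = "??";
  return snprintf (buf, size, "#%d 0x%lx in %s at %s:%d\n",
		   frame, (unsigned long) pc, function, file, line);
}
```

A caller would invoke this once per frame, with the frame number counting up from zero at the innermost frame.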

-- 
Janne Blomqvist


Re: [PATCH] Combine location with block using block_locations

2012-09-14 Thread Eric Botcazou
> I think it's going to make GCC harder to maintain if we drop the -g0
> vs. -g no-code-difference requirement for just some optimization
> levels.

Seconded, this is surely going to open yet another can of worms.

-- 
Eric Botcazou


Re: Backtrace library [3/3]

2012-09-14 Thread Jakub Jelinek
On Fri, Sep 14, 2012 at 11:50:31AM +0300, Janne Blomqvist wrote:
> A few quick comments,
> 
> 1) Although mmap is not guaranteed to be async-signal-safe, in
> practice it should be as you mentioned previously. However I see that
> when using mmap, the implementation uses pthread mutexes. These are
> not guaranteed to be async-signal-safe either, but I guess in practice

Not just not guaranteed, tho locking is just not async-signal-safe.

> as long as you don't try to use the same mutex in "normal" code and
> the signal handler (in this case, the same state struct), it should be
> OK. Is this so?

At least if you get interrupted by async signal while in libbacktrace
(trying to acquire/release some lock or hold it), you'd be definitely out of
luck.  File locking could be used as async signal safe locking, but then
there is still the case of what to do when backtrace* is invoked while the
"lock" is held.  If the locking is just about initialization of the DWARF
reader, then you might want to prepare everything and just atomically change
some global pointer.
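
A minimal sketch of that last idea, assuming C11 atomics are available: build the state completely in normal code, then publish it with a single compare-and-swap, so readers only ever see NULL or a finished object. The names are hypothetical and this is not libbacktrace code; it also assumes the atomic pointer is lock-free on the target.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-in for the parsed DWARF data.  */
struct dwarf_state
{
  int ready;
};

/* Zero-initialized, so it starts out as NULL.  */
static _Atomic (struct dwarf_state *) global_state;

/* Called from normal (non-signal) code: prepare everything first,
   then publish with one atomic operation.  Returns the state that
   ended up published, or NULL if allocation failed.  */
static struct dwarf_state *
publish_state (void)
{
  struct dwarf_state *fresh = malloc (sizeof *fresh);
  if (fresh == NULL)
    return NULL;
  fresh->ready = 1;

  struct dwarf_state *expected = NULL;
  if (atomic_compare_exchange_strong (&global_state, &expected, fresh))
    return fresh;

  /* Somebody else published first; EXPECTED now holds their pointer.  */
  assert (expected != NULL);
  free (fresh);
  return expected;
}

/* Readers take no lock: they see either NULL or a complete state.  */
static struct dwarf_state *
lookup_state (void)
{
  return atomic_load (&global_state);
}
```

A caller that gets NULL from lookup_state while inside a signal handler would have to give up on symbolization rather than start the initialization there.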

Jakub


Re: [PATCH] Fix PR54489 - FRE needing AVAIL_OUT

2012-09-14 Thread Richard Guenther
On Thu, 13 Sep 2012, Steven Bosscher wrote:

> On Wed, Sep 12, 2012 at 4:52 PM, Steven Bosscher wrote:
> > On Wed, Sep 12, 2012 at 4:02 PM, Richard Guenther wrote:
> >> for a followup (and I bet sth else than PRE blows up at -O2 as well).
> >
> > Actually, the only thing that really blows up is that enemy of scalability, 
> > VRP.
> 
> FWIW, this appears to be due to the well-known problem with the equiv
> set, but also due to the liveness computations that tree-vrp performs,
> since your commit in r139263.
> 
> Any reason why you didn't just re-use the tree-ssa-live machinery?

Probably I didn't know about it or didn't want to keep the full liveness
problem live (it tries to free things as soon as possible).

> And any reason why you don't let a DEF kill a live SSA name? AFAICT
> you're exposing all SSA names up without ever killing a USE :-)

Eh ;)  We also should traverse blocks backward I suppose.  Also
the RPO traversal suffers from the same issue I noticed in PRE
and for what I invented my_rev_post_order_compute ...
(pre_and_rev_post_order_compute doesn't compute an optimal
reverse post order).
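
As a toy illustration of letting a DEF kill a live name during a backward walk (this is not the tree-vrp.c code; the statement struct and the bitmask representation are assumptions made for the sketch):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical three-address statement: one definition and up to two
   uses, each an SSA version number below 32, or -1 for "none".  */
struct stmt
{
  int def;
  int use[2];
};

/* Walk the block backwards, starting from the names live on exit.
   A definition removes its name from the live set; uses add theirs.
   Returns the set of names live on entry to the block.  */
static uint32_t
block_live_in (const struct stmt *stmts, int n, uint32_t live_out)
{
  uint32_t live = live_out;
  for (int i = n - 1; i >= 0; i--)
    {
      assert (stmts[i].def < 32);
      if (stmts[i].def >= 0)
	live &= ~(UINT32_C (1) << stmts[i].def);
      for (int k = 0; k < 2; k++)
	if (stmts[i].use[k] >= 0)
	  {
	    assert (stmts[i].use[k] < 32);
	    live |= UINT32_C (1) << stmts[i].use[k];
	  }
    }
  return live;
}
```

Without the kill step, every name used anywhere in the block would stay live all the way up, which is the behaviour questioned above.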

Patch fixing the liveness below, untested so far, apart from on
tree-ssa.exp where it seems to regress gcc.dg/tree-ssa/pr21086.c :/

As for the equiv sets - yes, that's known.  I wanted to investigate
at some point what happens if we instead record the SSA name we
registered the assert for (thus look up a chain of lattice values
instead of recording all relevant entries in a bitmap).  ISTR there
were some correctness issues, but if we restrict the chaining maybe
we can get a good compromise here.

Richard.

Index: gcc/tree-vrp.c
===
--- gcc/tree-vrp.c  (revision 191289)
+++ gcc/tree-vrp.c  (working copy)
@@ -5439,7 +5439,6 @@ find_assert_locations_1 (basic_block bb,
 {
   gimple_stmt_iterator si;
   gimple last;
-  gimple phi;
   bool need_assert;
 
   need_assert = false;
@@ -5462,7 +5461,7 @@ find_assert_locations_1 (basic_block bb,
 
   /* Traverse all the statements in BB marking used names and looking
  for statements that may infer assertions for their used operands.  */
-  for (si = gsi_start_bb (bb); !gsi_end_p (si); gsi_next (&si))
+  for (si = gsi_last_bb (bb); !gsi_end_p (si); gsi_prev (&si))
 {
   gimple stmt;
   tree op;
@@ -5531,6 +5530,9 @@ find_assert_locations_1 (basic_block bb,
}
}
}
+
+  FOR_EACH_SSA_TREE_OPERAND (op, stmt, i, SSA_OP_DEF)
+   RESET_BIT (live, SSA_NAME_VERSION (op));
 }
 
   /* Traverse all PHI nodes in BB marking used operands.  */
@@ -5538,7 +5540,11 @@ find_assert_locations_1 (basic_block bb,
 {
   use_operand_p arg_p;
   ssa_op_iter i;
-  phi = gsi_stmt (si);
+  gimple phi = gsi_stmt (si);
+  tree res = gimple_phi_result (phi);
+
+  if (virtual_operand_p (res))
+   continue;
 
   FOR_EACH_PHI_ARG (arg_p, phi, i, SSA_OP_USE)
{
@@ -5546,6 +5552,8 @@ find_assert_locations_1 (basic_block bb,
  if (TREE_CODE (arg) == SSA_NAME)
SET_BIT (live, SSA_NAME_VERSION (arg));
}
+
+  RESET_BIT (live, SSA_NAME_VERSION (res));
 }
 
   return need_assert;


Re: PATCH: PR debug/54568: --eh-frame-hdr should also be enabled for static executable

2012-09-14 Thread Jakub Jelinek
On Thu, Sep 13, 2012 at 07:46:52PM -0700, H.J. Lu wrote:
> There is no reason why --eh-frame-hdr can't be used with static
> executable on Linux.  This patch enables --eh-frame-hdr for static

Well, there is.  For more than 2 years after the addition of --eh-frame-hdr
support dl_iterate_phdr in libc.a would simply always fail, you aren't
adding any kind of check that old glibc (2001-2003ish) isn't used.
Even in newer glibcs, it relies on AT_* aux vector values provided by the
kernel, if they are not provided for whatever reason, it would fail.

Jakub


Re: [PATCH] Fix PR54489 - FRE needing AVAIL_OUT

2012-09-14 Thread Richard Guenther
On Fri, 14 Sep 2012, Richard Guenther wrote:

> On Thu, 13 Sep 2012, Steven Bosscher wrote:
> 
> > On Wed, Sep 12, 2012 at 4:52 PM, Steven Bosscher wrote:
> > > On Wed, Sep 12, 2012 at 4:02 PM, Richard Guenther wrote:
> > >> for a followup (and I bet sth else than PRE blows up at -O2 as well).
> > >
> > > Actually, the only thing that really blows up is that enemy of 
> > > scalability, VRP.
> > 
> > FWIW, this appears to be due to the well-known problem with the equiv
> > set, but also due to the liveness computations that tree-vrp performs,
> > since your commit in r139263.
> > 
> > Any reason why you didn't just re-use the tree-ssa-live machinery?
> 
> Probably I didn't know about it or didn't want to keep the full liveness
> problem live (it tries to free things as soon as possible).
> 
> > And any reason why you don't let a DEF kill a live SSA name? AFAICT
> > you're exposing all SSA names up without ever killing a USE :-)
> 
> Eh ;)  We also should traverse blocks backward I suppose.  Also
> the RPO traversal suffers from the same issue I noticed in PRE
> and for what I invented my_rev_post_order_compute ...
> (pre_and_rev_post_order_compute doesn't compute an optimal
> reverse post order).
> 
> Patch fixing the liveness below, untested so far, apart from on
> tree-ssa.exp where it seems to regress gcc.dg/tree-ssa/pr21086.c :/

The following does not.  Queued for testing (though I doubt it will
help memory usage due to the use of sbitmaps).  I think
jump threading special-casing asserts and the equiv bitmaps are
the real problem of VRP.  What testcase did you notice the live
issue on?

Thanks,
Richard.

2012-09-14  Richard Guenther  

* tree-vrp.c (register_new_assert_for): Simplify for backward
walk.
(find_assert_locations_1): Walk the basic-block backwards,
properly add/prune from live.  Use live for asserts derived
from stmts.

Index: gcc/tree-vrp.c
===
--- gcc/tree-vrp.c  (revision 191289)
+++ gcc/tree-vrp.c  (working copy)
@@ -4384,24 +4384,7 @@ register_new_assert_for (tree name, tree
  && (loc->expr == expr
  || operand_equal_p (loc->expr, expr, 0)))
{
- /* If the assertion NAME COMP_CODE VAL has already been
-registered at a basic block that dominates DEST_BB, then
-we don't need to insert the same assertion again.  Note
-that we don't check strict dominance here to avoid
-replicating the same assertion inside the same basic
-block more than once (e.g., when a pointer is
-dereferenced several times inside a block).
-
-An exception to this rule are edge insertions.  If the
-new assertion is to be inserted on edge E, then it will
-dominate all the other insertions that we may want to
-insert in DEST_BB.  So, if we are doing an edge
-insertion, don't do this dominance check.  */
-  if (e == NULL
- && dominated_by_p (CDI_DOMINATORS, dest_bb, loc->bb))
-   return;
-
- /* Otherwise, if E is not a critical edge and DEST_BB
+ /* If E is not a critical edge and DEST_BB
 dominates the existing location for the assertion, move
 the assertion up in the dominance tree by updating its
 location information.  */
@@ -5439,7 +5422,6 @@ find_assert_locations_1 (basic_block bb,
 {
   gimple_stmt_iterator si;
   gimple last;
-  gimple phi;
   bool need_assert;
 
   need_assert = false;
@@ -5462,7 +5444,7 @@ find_assert_locations_1 (basic_block bb,
 
   /* Traverse all the statements in BB marking used names and looking
  for statements that may infer assertions for their used operands.  */
-  for (si = gsi_start_bb (bb); !gsi_end_p (si); gsi_next (&si))
+  for (si = gsi_last_bb (bb); !gsi_end_p (si); gsi_prev (&si))
 {
   gimple stmt;
   tree op;
@@ -5479,8 +5461,10 @@ find_assert_locations_1 (basic_block bb,
  tree value;
  enum tree_code comp_code;
 
- /* Mark OP in our live bitmap.  */
- SET_BIT (live, SSA_NAME_VERSION (op));
+ /* If op is not live beyond this stmt, do not bother to insert
+asserts for it.  */
+ if (!TEST_BIT (live, SSA_NAME_VERSION (op)))
+   continue;
 
  /* If OP is used in such a way that we can infer a value
 range for it, and we don't find a previous assertion for
@@ -5520,25 +5504,28 @@ find_assert_locations_1 (basic_block bb,
}
}
 
- /* If OP is used only once, namely in this STMT, don't
-bother creating an ASSERT_EXPR for it.  Such an
-ASSERT_EXPR would do nothing but increase compile time.  */
- if (!has_single_use (op))
-   {
- register_new_assert_for (op, op, comp_code, value,
-

Re: PR libstdc++/54576: random_device isn't protected by _GLIBCXX_USE_C99_STDINT_TR1

2012-09-14 Thread Paolo Carlini

Hi,

On 09/14/2012 03:16 AM, H.J. Lu wrote:

Hi,

include/random has

#ifdef _GLIBCXX_USE_C99_STDINT_TR1

#include  // For uint_fast32_t, uint_fast64_t, uint_least32_t
#include 
#include 

#endif // _GLIBCXX_USE_C99_STDINT_TR1

random_device is defined in . But src/c++11/random.cc
has

#include 
...
   void
   random_device::_M_init(const std::string& token)
   {

It doesn't check if _GLIBCXX_USE_C99_STDINT_TR1 is defined.  This
patch checks it.  OK to install?

I thought this was already history, because it's a Dup.

Anyway, the obvious patch is Ok, thanks, but please put a blank line 
right after "#ifdef _GLIBCXX_USE_C99_STDINT_TR1".


Thanks!
Paolo.


Re: minor cleanup in forwprop: use get_prop_source_stmt more

2012-09-14 Thread Richard Guenther
On Thu, Sep 13, 2012 at 8:05 PM, Marc Glisse  wrote:
> Hello,
>
> this patch is a minor cleanup of my previous forwprop patches for vectors. I
> have known about get_prop_source_stmt from the beginning, but for some
> reason I always used SSA_NAME_DEF_STMT. This makes the source code slightly
> shorter, and together with PR 54565 it should help get some optimizations to
> apply as early as forwprop1 instead of forwprop2.
>
> There is one line I had badly indented. I am not sure what the policy is for
> that. Silently bundling it with this patch as I am doing is probably not so
> good. I should probably just fix it in svn without asking the list, but I
> was wondering if I should add a ChangeLog entry and post the committed patch
> to the list afterwards? (that's what I would do by default, at worst it is a
> bit of spam)

That's ok, no changelog needed for that.

Ok.

Thanks,
Richard.

> passes bootstrap+testsuite
>
> 2012-09-14  Marc Glisse  
>
> * tree-ssa-forwprop.c (simplify_bitfield_ref): Call
> get_prop_source_stmt.
> (simplify_permutation): Likewise.
> (simplify_vector_constructor): Likewise.
>
> --
> Marc Glisse
> Index: tree-ssa-forwprop.c
> ===
> --- tree-ssa-forwprop.c (revision 191247)
> +++ tree-ssa-forwprop.c (working copy)
> @@ -2599,23 +2599,22 @@ simplify_bitfield_ref (gimple_stmt_itera
>elem_type = TREE_TYPE (TREE_TYPE (op0));
>if (TREE_TYPE (op) != elem_type)
>  return false;
>
>size = TREE_INT_CST_LOW (TYPE_SIZE (elem_type));
>op1 = TREE_OPERAND (op, 1);
>n = TREE_INT_CST_LOW (op1) / size;
>if (n != 1)
>  return false;
>
> -  def_stmt = SSA_NAME_DEF_STMT (op0);
> -  if (!def_stmt || !is_gimple_assign (def_stmt)
> -  || !can_propagate_from (def_stmt))
> +  def_stmt = get_prop_source_stmt (op0, false, NULL);
> +  if (!def_stmt || !can_propagate_from (def_stmt))
>  return false;
>
>op2 = TREE_OPERAND (op, 2);
>idx = TREE_INT_CST_LOW (op2) / size;
>
>code = gimple_assign_rhs_code (def_stmt);
>
>if (code == VEC_PERM_EXPR)
>  {
>tree p, m, index, tem;
> @@ -2630,21 +2629,21 @@ simplify_bitfield_ref (gimple_stmt_itera
> {
>   p = gimple_assign_rhs1 (def_stmt);
> }
>else
> {
>   p = gimple_assign_rhs2 (def_stmt);
>   idx -= nelts;
> }
>index = build_int_cst (TREE_TYPE (TREE_TYPE (m)), idx * size);
>tem = build3 (BIT_FIELD_REF, TREE_TYPE (op),
> -unshare_expr (p), op1, index);
> +   unshare_expr (p), op1, index);
>gimple_assign_set_rhs1 (stmt, tem);
>fold_stmt (gsi);
>update_stmt (gsi_stmt (*gsi));
>return true;
>  }
>
>return false;
>  }
>
>  /* Determine whether applying the 2 permutations (mask1 then mask2)
> @@ -2682,40 +2681,40 @@ is_combined_permutation_identity (tree m
>  /* Combine a shuffle with its arguments.  Returns 1 if there were any
> changes made, 2 if cfg-cleanup needs to run.  Else it returns 0.  */
>
>  static int
>  simplify_permutation (gimple_stmt_iterator *gsi)
>  {
>gimple stmt = gsi_stmt (*gsi);
>gimple def_stmt;
>tree op0, op1, op2, op3, arg0, arg1;
>enum tree_code code;
> +  bool single_use_op0 = false;
>
>gcc_checking_assert (gimple_assign_rhs_code (stmt) == VEC_PERM_EXPR);
>
>op0 = gimple_assign_rhs1 (stmt);
>op1 = gimple_assign_rhs2 (stmt);
>op2 = gimple_assign_rhs3 (stmt);
>
>if (TREE_CODE (op2) != VECTOR_CST)
>  return 0;
>
>if (TREE_CODE (op0) == VECTOR_CST)
>  {
>code = VECTOR_CST;
>arg0 = op0;
>  }
>else if (TREE_CODE (op0) == SSA_NAME)
>  {
> -  def_stmt = SSA_NAME_DEF_STMT (op0);
> -  if (!def_stmt || !is_gimple_assign (def_stmt)
> - || !can_propagate_from (def_stmt))
> +  def_stmt = get_prop_source_stmt (op0, false, &single_use_op0);
> +  if (!def_stmt || !can_propagate_from (def_stmt))
> return 0;
>
>code = gimple_assign_rhs_code (def_stmt);
>arg0 = gimple_assign_rhs1 (def_stmt);
>  }
>else
>  return 0;
>
>/* Two consecutive shuffles.  */
>if (code == VEC_PERM_EXPR)
> @@ -2740,35 +2739,31 @@ simplify_permutation (gimple_stmt_iterat
>return remove_prop_source_from_use (op0) ? 2 : 1;
>  }
>
>/* Shuffle of a constructor.  */
>else if (code == CONSTRUCTOR || code == VECTOR_CST)
>  {
>tree opt;
>bool ret = false;
>if (op0 != op1)
> {
> - if (TREE_CODE (op0) == SSA_NAME && !has_single_use (op0))
> + if (TREE_CODE (op0) == SSA_NAME && !single_use_op0)
> return 0;
>
>   if (TREE_CODE (op1) == VECTOR_CST)
> arg1 = op1;
>   else if (TREE_CODE (op1) == SSA_NAME)
> {
>   enum tree_code code2;
>
> - if (!has_single_use (op1))
> -   return 0;
> -
> - 

Re: [PATCH] Changes in mode switching

2012-09-14 Thread Kaz Kojima
Vladimir Yakovlev  wrote:
> I reproduced the failure and found the reason for it. I understand how to
> resolve it, and now I need only a small change: an additional argument to
> EMIT_MODE_SET. Is it good for trunk?
> 
> Thank you,
> Vladimir
> 
> 2012-09-14  Vladimir Yakovlev  
> 
> * (optimize_mode_switching): Added an argument EMIT_MODE_SET calls.
> 
> * config/epiphany/epiphany.h (EMIT_MODE_SET): Added an argument.
> 
> * config/i386/i386.h (EMIT_MODE_SET): Added an argument.
> 
> * config/sh/sh.h (EMIT_MODE_SET): Added an argument.

No new failures on sh4-unknown-linux-gnu with your patch.
Looks OK to me, though I have no authority to approve it
except SH specific part.

BTW, I guess that the active voice is usual in gcc/ChangeLog.
Also, though perhaps this is a mailer issue, a tab should be used for
indentation instead of 8 spaces, and the empty line isn't required
between items.  Maybe something like

* mode-switching.c (optimize_mode_switching): Add an argument
EMIT_MODE_SET calls.
* config/epiphany/epiphany.h (EMIT_MODE_SET): Add an argument.
* config/i386/i386.h (EMIT_MODE_SET): Likewise.
* config/sh/sh.h (EMIT_MODE_SET): Likewise.

is a usual form.

Regards,
kaz


Re: [PATCH] Bug fix in LSHIFT_EXPR case with a shift range in tree-vrp, handle more cases

2012-09-14 Thread Richard Guenther
On Fri, Sep 14, 2012 at 9:59 AM, Jakub Jelinek  wrote:
> On Fri, Sep 14, 2012 at 09:51:48AM +0200, Tom de Vries wrote:
>> On 14/09/12 09:38, Jakub Jelinek wrote:
>> > On Fri, Sep 14, 2012 at 09:27:27AM +0200, Tom de Vries wrote:
>> >>* gcc.dg/tree-ssa/vrp81.c: New test.
>> >>* gcc.dg/tree-ssa/vrp81-2.c: Same.
>> >>* gcc.dg/tree-ssa/vrp82.c: Same.
>> >
>> > Why not vrp82.c, vrp83.c and vrp84.c (and rename the recently added
>> > vrp80-2.c test to vrp81.c)?
>> >
>>
>> My thinking behind this was the following: vrp80.c and vrp80-2.c are 2 
>> versions
>> of more or less the same code. In one version, we test whether the inclusive
>> bounds of the range are folded. In the other version we test whether the
>> exclusive bounds of the range are not folded.
>
> IMHO it is enough to give them consecutive numbers, there are many cases
> where multiple vrpNN.c tests have been added for more or less the same code,
> but I don't care that much, will leave that decision to Richard as the
> probable reviewer.

I agree with Jakub - the patch is ok with adjusting the testcase names.

Thanks,
Richard.

> Jakub


Re: [Patch ARM] Update the test case to differ movs and lsrs for ARM mode and non-ARM mode

2012-09-14 Thread Ulrich Weigand
Terry Guo wrote:

> -/* { dg-final { scan-assembler "movs\tr\[0-9\]" } } */
> +/* { dg-final { scan-assembler "lsrs\tr\[0-9\]" { target arm_thumb2_ok } }
> }  */
> +/* { dg-final { scan-assembler "movs\tr\[0-9\]" { target { ! arm_thumb2_ok
> } } } } */

This causes the arm.exp testcase to fail with a tcl error for me:

ERROR: tcl error sourcing 
/home/uweigand/fsf/gcc-head/gcc/testsuite/gcc.target/arm/arm.exp.
ERROR: unmatched open brace in list
while executing
"foreach op $tmp {
verbose "Processing option: $op" 3
set status [catch "$op" errmsg]
if { $status != 0 } {
if { 0 && [info exists errorInfo] }..."
(procedure "saved-dg-test" line 75)
invoked from within
"saved-dg-test 
/home/uweigand/fsf/gcc-head/gcc/testsuite/gcc.target/arm/combine-movs.c {} { 
-ansi -pedantic-errors}"
("eval" body line 1)
invoked from within
"eval saved-dg-test $args "
(procedure "dg-test" line 10)
invoked from within
"dg-test $testcase $flags ${default-extra-flags}"
(procedure "dg-runtest" line 10)
invoked from within
"dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/*.\[cCS\]]] \
"" $DEFAULT_CFLAGS"
(file "/home/uweigand/fsf/gcc-head/gcc/testsuite/gcc.target/arm/arm.exp" 
line 37)
invoked from within
"source /home/uweigand/fsf/gcc-head/gcc/testsuite/gcc.target/arm/arm.exp"
("uplevel" body line 1)
invoked from within
"uplevel #0 source 
/home/uweigand/fsf/gcc-head/gcc/testsuite/gcc.target/arm/arm.exp"
invoked from within
"catch "uplevel #0 source $test_file_name""

which seems to be caused by a missing space between two closing braces.

Fixed by the following patch.
Committed to mainline as obvious.

Bye,
Ulrich

ChangeLog:

* gcc.target/arm/combine-movs.c: Add missing space.

Index: gcc/testsuite/gcc.target/arm/combine-movs.c
===
*** gcc/testsuite/gcc.target/arm/combine-movs.c (revision 191254)
--- gcc/testsuite/gcc.target/arm/combine-movs.c (working copy)
*** void foo (unsigned long r[], unsigned in
*** 9,13 
  r[i] = 0;
  }
  
! /* { dg-final { scan-assembler "lsrs\tr\[0-9\]" { target arm_thumb2_ok } }} */
  /* { dg-final { scan-assembler "movs\tr\[0-9\]" { target { ! arm_thumb2_ok} } 
} } */
--- 9,13 
  r[i] = 0;
  }
  
! /* { dg-final { scan-assembler "lsrs\tr\[0-9\]" { target arm_thumb2_ok } } } 
*/
  /* { dg-final { scan-assembler "movs\tr\[0-9\]" { target { ! arm_thumb2_ok} } 
} } */

-- 
  Dr. Ulrich Weigand
  GNU Toolchain for Linux on System z and Cell BE
  ulrich.weig...@de.ibm.com



Re: [PATCH 0/4] slim-lto-bootstrap.mk build-target support

2012-09-14 Thread Markus Trippelsdorf
On 2012.09.13 at 14:51 -0700, Andi Kleen wrote:
> Markus Trippelsdorf  writes:
> 
> > Because there is no enthusiastic support for a full libtool update,
> > here is a minimal version that adds a new slim-lto-bootstrap
> > build-config. 
> 
> Can you split the two patches? libtool and ltmain? Thanks for extracting
> those out.

I've split them into four.

> Looks good to me, but eventually this should be just the default for
> lto-bootstrap

Yes, but let's test the new build-config a little bit first.

BTW, I was wondering if I should add:

AR_FOR_TARGET = gcc-ar
NM_FOR_TARGET = gcc-nm
RANLIB_FOR_TARGET = gcc-ranlib

to the build target.

If the patches look Ok, it would be nice if someone could commit them,
because I have no commit access.

Thanks.
-- 
Markus


Re: [PATCH 1/4] Add slim-lto support to libtool.m4

2012-09-14 Thread Markus Trippelsdorf
Credits go to Ralf Wildenhues.

2012-09-13  Markus Trippelsdorf  

* libtool.m4 : Handle slim-lto objects

diff --git a/libtool.m4 b/libtool.m4
index a7f99ac..5754fb1 100644
--- a/libtool.m4
+++ b/libtool.m4
@@ -3434,6 +3434,7 @@ for ac_symprfx in "" "_"; do
   else
 lt_cv_sys_global_symbol_pipe="sed -n -e 's/^.*[[
]]\($symcode$symcode*\)[[   ]][[
]]*$ac_symprfx$sympat$opt_cr$/$symxfrm/p'"
   fi
+  lt_cv_sys_global_symbol_pipe="$lt_cv_sys_global_symbol_pipe | sed '/ 
__gnu_lto/d'"
 
   # Check to see that the pipe works correctly.
   pipe_works=no
@@ -4451,7 +4452,7 @@ _LT_EOF
   if $LD --help 2>&1 | $EGREP ': supported targets:.* elf' > /dev/null \
 && test "$tmp_diet" = no
   then
-   tmp_addflag=
+   tmp_addflag=' $pic_flag'
tmp_sharedflag='-shared'
case $cc_basename,$host_cpu in
 pgcc*) # Portland Group C compiler
@@ -5517,8 +5518,8 @@ if test "$_lt_caught_CXX_error" != yes; then
   # Check if GNU C++ uses GNU ld as the underlying linker, since the
   # archiving commands below assume that GNU ld is being used.
   if test "$with_gnu_ld" = yes; then
-_LT_TAGVAR(archive_cmds, $1)='$CC -shared -nostdlib $predep_objects $libobjs $deplibs $postdep_objects $compiler_flags ${wl}-soname $wl$soname -o $lib'
-_LT_TAGVAR(archive_expsym_cmds, $1)='$CC -shared -nostdlib $predep_objects $libobjs $deplibs $postdep_objects $compiler_flags ${wl}-soname $wl$soname ${wl}-retain-symbols-file $wl$export_symbols -o $lib'
+_LT_TAGVAR(archive_cmds, $1)='$CC $pic_flag -shared -nostdlib $predep_objects $libobjs $deplibs $postdep_objects $compiler_flags ${wl}-soname $wl$soname -o $lib'
+_LT_TAGVAR(archive_expsym_cmds, $1)='$CC $pic_flag -shared -nostdlib $predep_objects $libobjs $deplibs $postdep_objects $compiler_flags ${wl}-soname $wl$soname ${wl}-retain-symbols-file $wl$export_symbols -o $lib'
 
 _LT_TAGVAR(hardcode_libdir_flag_spec, $1)='${wl}-rpath ${wl}$libdir'
 _LT_TAGVAR(export_dynamic_flag_spec, $1)='${wl}--export-dynamic'
@@ -6495,6 +6496,13 @@ public class foo {
 };
 _LT_EOF
 ])
+
+_lt_libdeps_save_CFLAGS=$CFLAGS
+case "$CC $CFLAGS " in #(
+*\ -flto*\ *) CFLAGS="$CFLAGS -fno-lto" ;;
+*\ -fwhopr*\ *) CFLAGS="$CFLAGS -fno-whopr" ;;
+esac
+
 dnl Parse the compiler output and extract the necessary
 dnl objects, libraries and library flags.
 if AC_TRY_EVAL(ac_compile); then
@@ -6543,6 +6551,7 @@ if AC_TRY_EVAL(ac_compile); then
fi
;;
 
+*.lto.$objext) ;; # Ignore GCC LTO objects
 *.$objext)
# This assumes that the test object file only shows up
# once in the compiler output.
@@ -6578,6 +6587,7 @@ else
 fi
 
 $RM -f confest.$objext
+CFLAGS=$_lt_libdeps_save_CFLAGS
 
 # PORTME: override above test o


Re: [PATCH 2/4] Add slim-lto support to ltmain.sh

2012-09-14 Thread Markus Trippelsdorf
Credits go to Ralf Wildenhues.

2012-09-13  Markus Trippelsdorf  

* ltmain.sh: Handle lto options

diff --git a/ltmain.sh b/ltmain.sh
index a03433f..2e09101 100644
--- a/ltmain.sh
+++ b/ltmain.sh
@@ -4980,7 +4980,8 @@ func_mode_link ()
   # @file GCC response files
   # -tp=* Portland pgcc target processor selection
   -64|-mips[0-9]|-r[0-9][0-9]*|-xarch=*|-xtarget=*|+DA*|+DD*|-q*|-m*| \
-  -t[45]*|-txscale*|-p|-pg|--coverage|-fprofile-*|-F*|@*|-tp=*)
+  -t[45]*|-txscale*|-p|-pg|--coverage|-fprofile-*|-F*|@*|-tp=*| \
+  -O*|-flto*|-fwhopr|-fuse-linker-plugin)
 func_quote_for_eval "$arg"
arg="$func_quote_for_eval_result"
 func_append com


Re: [PATCH 3/4] Pass CFLAGS to fixincludes

2012-09-14 Thread Markus Trippelsdorf
2012-09-13  Markus Trippelsdorf  

* Makefile.in (configure-(build-)fixincludes): Pass CFLAGS

diff --git a/Makefile.in b/Makefile.in
index 0108162..891168d 100644
--- a/Makefile.in
+++ b/Makefile.in
@@ -2835,6 +2835,7 @@ configure-build-fixincludes:
test ! -f $(BUILD_SUBDIR)/fixincludes/Makefile || exit 0; \
$(SHELL) $(srcdir)/mkinstalldirs $(BUILD_SUBDIR)/fixincludes ; \
$(BUILD_EXPORTS)  \
+   CFLAGS="$(STAGE_CFLAGS)"; export CFLAGS; \
echo Configuring in $(BUILD_SUBDIR)/fixincludes; \
cd "$(BUILD_SUBDIR)/fixincludes" || exit 1; \
case $(srcdir) in \
@@ -2870,6 +2871,7 @@ all-build-fixincludes: configure-build-fixincludes
$(BUILD_EXPORTS)  \
(cd $(BUILD_SUBDIR)/fixincludes && \
  $(MAKE) $(BASE_FLAGS_TO_PASS) $(EXTRA_BUILD_FLAGS)  \
+   CFLAGS="$(STAGE_CFLAGS)" \
$(TARGET-build-fixincludes))
 @endif build-fixincludes
 
@@ -7745,6 +7747,7 @@ configure-fixincludes:
test ! -f $(HOST_SUBDIR)/fixincludes/Makefile || exit 0; \
$(SHELL) $(srcdir)/mkinstalldirs $(HOST_SUBDIR)/fixincludes ; \
$(HOST_EXPORTS)  \
+   CFLAGS="$(STAGE_CFLAGS)"; export CFLAGS; \
echo Configuring in $(HOST_SUBDIR)/fixincludes; \
cd "$(HOST_SUBDIR)/fixincludes" || exit 1; \
case $(srcdir) in \
@@ -7779,6 +7782,7 @@ all-fixincludes: configure-fixincludes
$(HOST_EXPORTS)  \
(cd $(HOST_SUBDIR)/fixincludes && \
  $(MAKE) $(BASE_FLAGS_TO_PASS) $(EXTRA_HOST_FLAGS)  \
+   CFLAGS="$(STAGE_CFLAGS)" \
$(TARGET-fixincludes))
 @endif fixincludes
 
-- 
Markus


Re: [PATCH 4/4] Add new slim-lto-bootstrap.mk build-target

2012-09-14 Thread Markus Trippelsdorf
2012-09-13  Markus Trippelsdorf  

* config/slim-lto-bootstrap.mk: new build-config

diff --git a/config/slim-lto-bootstrap.mk b/config/slim-lto-bootstrap.mk
new file mode 100644
index 000..11d1252
--- /dev/null
+++ b/config/slim-lto-bootstrap.mk
@@ -0,0 +1,9 @@
+# This option enables slim LTO for stage2 and stage3.
+
+STAGE2_CFLAGS += -flto=jobserver -fno-fat-lto-objects -frandom-seed=1
+STAGE3_CFLAGS += -flto=jobserver -fno-fat-lto-objects -frandom-seed=1
+STAGE_CFLAGS += -fuse-linker-plugin
+STAGEprofile_CFLAGS += -fno-lto
+AR = gcc-ar
+NM = gcc-nm
+RANLIB = gcc-ranlib
-- 
Markus


Re: [PATCH] Combine location with block using block_locations

2012-09-14 Thread Diego Novillo

On 2012-09-14 04:59 , Eric Botcazou wrote:
>> I think it's going to make GCC harder to maintain if we drop the -g0
>> vs. -g no-code-difference requirement for just some optimization
>> levels.
>
> Seconded, this is surely going to open yet another can of worms.

Agreed.

Diego.



Re: [PATCH] Fix PR54489 - FRE needing AVAIL_OUT

2012-09-14 Thread Steven Bosscher
On Fri, Sep 14, 2012 at 11:43 AM, Richard Guenther  wrote:
>> > Any reason why you didn't just re-use the tree-ssa-live machinery?
>>
>> Probably I didn't know about it or didn't want to keep the full life
>> problem life (it tries to free things as soon as possible).

I think it'd be good to use the tree-ssa-live stuff instead of a local
liveness dataflow solver, even if that may allocate a bit of extra
memory. tree-ssa-live is very smart about how liveness is computed.
I'll experiment with using it in tree-vrp.


>> > And any reason why you don't let a DEF kill a live SSA name? AFAICT
>> > you're exposing all SSA names up without ever killing a USE :-)
>>
>> Eh ;)  We also should traverse blocks backward I suppose.  Also
>> the RPO traversal suffers from the same issue I noticed in PRE
>> and for what I invented my_rev_post_order_compute ...
>> (pre_and_rev_post_order_compute doesn't compute an optimal
>> reverse post order).

Eh, what do you mean with "optimal" here?

Yikes, I didn't know about my_rev_post_order_compute. How horrible!
That function doesn't compute reverse post-order of the CFG, but a
post-order of the reverse CFG!


> The following not.  Queued for testing (though I doubt it will
> help memory usage due to the use of sbitmaps).  I think
> jump threading special-casing asserts and the equiv bitmaps are
> the real problem of VRP.  What testcase did you notice the live
> issue on?

I don't remember exactly what test case it was, but it had a large
switch in it. Maybe Brad Lucier's scheme interpreter, but I'm not
sure.

Ciao!
Steven


Re: [PATCH] Fix PR54489 - FRE needing AVAIL_OUT

2012-09-14 Thread Richard Guenther
On Fri, 14 Sep 2012, Steven Bosscher wrote:

> On Fri, Sep 14, 2012 at 11:43 AM, Richard Guenther  wrote:
> >> > Any reason why you didn't just re-use the tree-ssa-live machinery?
> >>
> >> Probably I didn't know about it or didn't want to keep the full life
> >> problem life (it tries to free things as soon as possible).
> 
> I think it'd be good to use the tree-ssa-live stuff instead of a local
> liveness dataflow solver, even if that may allocate a bit of extra
> memory. tree-ssa-live is very smart about how liveness is computed.
> I'll experiment with using it in tree-vrp.

Ok.  Note that VRP likes to know liveness at a given stmt (but
actually doesn't use it - it would, with my patch) here:

  /* If OP is used only once, namely in this STMT, don't
 bother creating an ASSERT_EXPR for it.  Such an
     ASSERT_EXPR would do nothing but increase compile time.  */
  if (!has_single_use (op))

that really wants to know if anything would consume the assert, thus
if OP is live below stmt.

> >> > And any reason why you don't let a DEF kill a live SSA name? AFAICT
> >> > you're exposing all SSA names up without ever killing a USE :-)
> >>
> >> Eh ;)  We also should traverse blocks backward I suppose.  Also
> >> the RPO traversal suffers from the same issue I noticed in PRE
> >> and for what I invented my_rev_post_order_compute ...
> >> (pre_and_rev_post_order_compute doesn't compute an optimal
> >> reverse post order).
> 
> Eh, what do you mean with "optimal" here?
> 
> Yikes, I didn't know about my_rev_post_order_compute. How horrible!
> That function doesn't compute reverse post-order of the CFG, but a
> post-order of the reverse CFG!

Ok, well - then that's what we need for compute_antic to have
minimal number of iterations and it is what VRP needs.  Visit
all successors before BB if possible.

;)

> 
> > The following not.  Queued for testing (though I doubt it will
> > help memory usage due to the use of sbitmaps).  I think
> > jump threading special-casing asserts and the equiv bitmaps are
> > the real problem of VRP.  What testcase did you notice the live
> > issue on?
> 
> I don't remember exactly what test case it was, but it had a large
> switch in it. Maybe Brad Lucier's scheme interpreter, but I'm not
> sure.

I see.

Richard.


[PATCH] Fix PR54565

2012-09-14 Thread Richard Guenther

This, as suggested in the PR, arranges for update_address_taken to be
run before forwprop.  The patch simply moves it to after CCP
instead of before alias computation.

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

Richard.

2012-09-14  Richard Guenther  

PR tree-optimization/54565
* passes.c (init_optimization_passes): Adjust comments.
(execute_function_todo): Do not execute execute_update_addresses_taken
before processing TODO_rebuild_alias.
* tree-ssa-ccp.c (do_ssa_ccp): Schedule TODO_update_address_taken.

* gcc.dg/tree-ssa/ssa-ccp-17.c: Adjust.
* gcc.dg/tree-ssa/forwprop-6.c: Likewise.  Remove XFAIL.

Index: gcc/passes.c
===
--- gcc/passes.c(revision 191289)
+++ gcc/passes.c(working copy)
@@ -1297,11 +1297,11 @@ init_optimization_passes (void)
  NEXT_PASS (pass_remove_cgraph_callee_edges);
  NEXT_PASS (pass_rename_ssa_copies);
  NEXT_PASS (pass_ccp);
+ /* After CCP we rewrite no longer addressed locals into SSA
+form if possible.  */
  NEXT_PASS (pass_forwprop);
  /* pass_build_ealias is a dummy pass that ensures that we
-execute TODO_rebuild_alias at this point.  Re-building
-alias information also rewrites no longer addressed
-locals into SSA form if possible.  */
+execute TODO_rebuild_alias at this point.  */
  NEXT_PASS (pass_build_ealias);
  NEXT_PASS (pass_sra_early);
  NEXT_PASS (pass_fre);
@@ -1371,11 +1371,11 @@ init_optimization_passes (void)
   NEXT_PASS (pass_rename_ssa_copies);
   NEXT_PASS (pass_complete_unrolli);
   NEXT_PASS (pass_ccp);
+  /* After CCP we rewrite no longer addressed locals into SSA
+form if possible.  */
   NEXT_PASS (pass_forwprop);
   /* pass_build_alias is a dummy pass that ensures that we
-execute TODO_rebuild_alias at this point.  Re-building
-alias information also rewrites no longer addressed
-locals into SSA form if possible.  */
+execute TODO_rebuild_alias at this point.  */
   NEXT_PASS (pass_build_alias);
   NEXT_PASS (pass_return_slot);
   NEXT_PASS (pass_phiprop);
@@ -1414,6 +1414,8 @@ init_optimization_passes (void)
   NEXT_PASS (pass_object_sizes);
   NEXT_PASS (pass_strlen);
   NEXT_PASS (pass_ccp);
+  /* After CCP we rewrite no longer addressed locals into SSA
+form if possible.  */
   NEXT_PASS (pass_copy_prop);
   NEXT_PASS (pass_cse_sincos);
   NEXT_PASS (pass_optimize_bswap);
@@ -1773,13 +1775,10 @@ execute_function_todo (void *data)
   cfun->last_verified &= ~TODO_verify_ssa;
 }
 
-  if (flags & TODO_rebuild_alias)
-{
-  execute_update_addresses_taken ();
-  if (flag_tree_pta)
-   compute_may_aliases ();
-}
-  else if (optimize && (flags & TODO_update_address_taken))
+  if (flag_tree_pta && (flags & TODO_rebuild_alias))
+compute_may_aliases ();
+
+  if (optimize && (flags & TODO_update_address_taken))
 execute_update_addresses_taken ();
 
   if (flags & TODO_remove_unused_locals)
Index: gcc/tree-ssa-ccp.c
===
--- gcc/tree-ssa-ccp.c  (revision 191289)
+++ gcc/tree-ssa-ccp.c  (working copy)
@@ -2105,7 +2105,8 @@ do_ssa_ccp (void)
   ccp_initialize ();
   ssa_propagate (ccp_visit_stmt, ccp_visit_phi_node);
   if (ccp_finalize ())
-todo = (TODO_cleanup_cfg | TODO_update_ssa | TODO_remove_unused_locals);
+todo = (TODO_cleanup_cfg | TODO_update_ssa | TODO_remove_unused_locals
+   | TODO_update_address_taken);
   free_dominance_info (CDI_DOMINATORS);
   return todo;
 }
Index: gcc/testsuite/gcc.dg/tree-ssa/ssa-ccp-17.c
===
--- gcc/testsuite/gcc.dg/tree-ssa/ssa-ccp-17.c  (revision 191289)
+++ gcc/testsuite/gcc.dg/tree-ssa/ssa-ccp-17.c  (working copy)
@@ -26,7 +26,7 @@ int foobar(void)
   return ((const struct Foo *)p)->i;
 }
 
-/* { dg-final { scan-tree-dump "= i;" "ccp1" } } */
+/* { dg-final { scan-tree-dump "= i_.;" "ccp1" } } */
 /* { dg-final { scan-tree-dump "= f.i;" "ccp1" } } */
 /* { dg-final { scan-tree-dump "= g.i;" "ccp1" } } */
 /* { dg-final { cleanup-tree-dump "ccp1" } } */
Index: gcc/testsuite/gcc.dg/tree-ssa/forwprop-6.c
===
--- gcc/testsuite/gcc.dg/tree-ssa/forwprop-6.c  (revision 191289)
+++ gcc/testsuite/gcc.dg/tree-ssa/forwprop-6.c  (working copy)
@@ -22,6 +22,7 @@ void f(void)
particular situation before doing this transformation we have to
assure that a is killed by a dominating store via type float for
it to be valid.  Then we might as well handle the situation by
-   value-numbering, removing the load alltogether.  */
-/* { dg-final { scan-tree-dump-times "VIEW_CONVERT_EXPR" 1 "forwprop1" { xf

Re: [PATCH] Fix PR54489 - FRE needing AVAIL_OUT

2012-09-14 Thread Richard Guenther
On Fri, 14 Sep 2012, Richard Guenther wrote:

> As for the equiv sets - yes, that's known.  I wanted to investigate
> at some point what happens if we instead record the SSA name we
> registered the assert for (thus look up a chain of lattice values
> instead of recording all relevant entries in a bitmap).  ISTR there
> were some correctness issues, but if we restrict the chaining maybe
> we can get a good compromise here.

Actually the only users are compare_name_with_value and
compare_names.  The idea with a simple chain, thus

struct value_range_d
{
...
  /* SSA name whose value range is equivalent to this one.  */
  tree equiv;
};

is only non-trivial for intersections (thus PHI nodes).  Still
there the equivalent range is the nearest dominating one from
the chains of all PHI arguments.  I can only think of the way
we do iteration that would mess up the chains (but not sure
if we need to require them to be in dom order).

Probably best to try to implement the above alongside the bitmaps
and compare the results in compare_name_with_value / compare_names.
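
To make the comparison idea concrete, here is a rough sketch in plain C.
It is not GCC code: struct toy_range, the integer "SSA names" and both
helper functions are made up for illustration.  It keeps a bitmap of
equivalences next to a single chain link and lets both be queried, so
the two representations can be cross-checked.

```c
#include <assert.h>
#include <stdbool.h>

#define NUM_NAMES 8
#define NO_EQUIV (-1)

/* Toy stand-in for a value range; only the equivalence data is modelled.  */
struct toy_range
{
  unsigned equiv_bitmap; /* Bit N set: name N is recorded as equivalent.  */
  int equiv;             /* Next name in the chain, or NO_EQUIV.  */
};

/* Walk the chain that starts at NAME and report whether TARGET is on it.  */
static bool
chain_has_equiv (const struct toy_range *ranges, int name, int target)
{
  int steps = 0;
  for (int cur = ranges[name].equiv; cur != NO_EQUIV; cur = ranges[cur].equiv)
    {
      assert (cur >= 0 && cur < NUM_NAMES);
      if (cur == target)
        return true;
      /* A well-formed chain never revisits a name; stop instead of looping.  */
      if (++steps > NUM_NAMES)
        break;
    }
  return false;
}

/* The same query answered from the bitmap representation.  */
static bool
bitmap_has_equiv (const struct toy_range *ranges, int name, int target)
{
  return (ranges[name].equiv_bitmap >> target) & 1u;
}
```

In this toy model the chain costs one word per name, while each bitmap
grows with the number of equivalences; whether the chain stays correct
across PHI intersections is exactly the open question above.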

Richard.


Re: PATCH: PR debug/54568: --eh-frame-hdr should also be enabled for static executable

2012-09-14 Thread H.J. Lu
On Fri, Sep 14, 2012 at 2:41 AM, Jakub Jelinek  wrote:
> On Thu, Sep 13, 2012 at 07:46:52PM -0700, H.J. Lu wrote:
>> There is no reason why --eh-frame-hdr can't be used with static
>> executable on Linux.  This patch enables --eh-frame-hdr for static
>
> Well, there is.  For more than 2 years after the addition of --eh-frame-hdr
> support dl_iterate_phdr in libc.a would simply always fail, you aren't
> adding any kind of check that old glibc (2001-2003ish) isn't used.
> Even in newer glibcs, it relies on AT_* aux vector values provided by the
> kernel, if they are not provided for whatever reason, it would fail.
>
> Jakub

It was implemented in

http://sourceware.org/ml/libc-alpha/2003-10/msg00098.html

for glibc 2.3.0 and we can check

AT_PHDR: 0x400040
AT_PHNUM:10

with LD_SHOW_AUXV.
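
The same two entries can also be read from inside a program.  A minimal
sketch, assuming glibc 2.16 or later, where getauxval is declared in
<sys/auxv.h> (that interface is not part of this thread, and the helper
names are invented here).  getauxval returns 0 when an entry is absent,
which is the failure case described in the quoted text above.

```c
#include <sys/auxv.h>

/* Address of the program headers as passed in the aux vector, or 0.  */
static unsigned long
auxv_phdr (void)
{
  return getauxval (AT_PHDR);
}

/* Number of program headers as passed in the aux vector, or 0.  */
static unsigned long
auxv_phnum (void)
{
  return getauxval (AT_PHNUM);
}

/* Nonzero when both entries needed to find the program headers exist.  */
static int
auxv_has_phdr_info (void)
{
  return auxv_phdr () != 0 && auxv_phnum () != 0;
}
```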

-- 
H.J.


Re: [PATCH] Fix up _mm_f{,n}m{add,sub}_s{s,d} (PR target/54564)

2012-09-14 Thread H.J. Lu
On Thu, Sep 13, 2012 at 10:42 AM, Uros Bizjak  wrote:
> On Thu, Sep 13, 2012 at 5:52 PM, Jakub Jelinek  wrote:
>
>> The fma-*.c testcases show that these intrinsics probably mean to preserve
>> the high elements (other than the lowest) of the first argument of the
>> fmaintrin.h *_s{s,d} intrinsics in the destination (the HW insns preserve
>> there the destination register, but that varies - for 132 and 213 it is the
>> first one (but the negation performed for _mm_fnm*_s[sd] breaks it anyway),
>> for 231 it is the last one).  What the expander did was to put there
>> an uninitialized pseudo, so we ended up with pretty random content, before
>> H.J's http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=190492 it happened
>> to work by accident, but when things changed slightly and reload chose
>> different alternative, this broke.
>>
>> The following patch fixes it, by tweaking the header so that the first
>> argument is not negated (we negate the second one instead), as we don't want
>> to negate the high elements if e.g. for whatever reason combiner doesn't
>> match it.  It fixes the expander to use a dup of the X operand as the high
>> element provider for the pattern, removes the 231 alternatives (because
>> those provide different destination high elements) and removes commutative
>> marker (again, that would mean different high elements).
>
> Can we introduce additional "*fmai_fmadd__1" pattern (and
> others) that would cover missing 231 alternative?
>
>> 2012-09-13  Jakub Jelinek  
>>
>> PR target/54564
>> * config/i386/sse.md (fmai_vmfmadd_): Use (match_dup 1)
>> instead of (match_dup 0) as second argument to vec_merge.
>> (*fmai_fmadd_, *fmai_fmsub_): Likewise.
>> Remove third alternative.
>> (*fmai_fnmadd_, *fmai_fnmsub_): Likewise.  Negate
>> operand 2 instead of operand 1, but put it as first argument
>> of fma.
>>
>> * config/i386/fmaintrin.h (_mm_fnmadd_sd, _mm_fnmadd_ss,
>> _mm_fnmsub_sd, _mm_fnmsub_ss): Negate the second argument instead
>> of the first.
>
> OK, but header change should be also reviewed by H.J.
>

It looks OK to me.
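
For readers following the quoted explanation, here is a rough model in
plain C of why the header negates the second argument rather than the
first.  It is not the real intrinsics: struct vec4 and the toy_* names
are invented for this sketch, and the multiply-add is not actually
fused.

```c
/* Toy 4 x float vector; element 0 is the "scalar" lane.  */
struct vec4 { float e[4]; };

/* Model of a scalar fmadd: element 0 is a*b+c, the upper three
   elements are copied from the first argument A.  */
static struct vec4
toy_fmadd_ss (struct vec4 a, struct vec4 b, struct vec4 c)
{
  struct vec4 r = a;
  r.e[0] = a.e[0] * b.e[0] + c.e[0];
  return r;
}

/* Negate every element, as a plain vector negation would.  */
static struct vec4
toy_negate (struct vec4 v)
{
  for (int i = 0; i < 4; i++)
    v.e[i] = -v.e[i];
  return v;
}

/* -(a*b)+c by negating the first argument: the upper elements of the
   result become -a instead of a if the negation is not combined away.  */
static struct vec4
toy_fnmadd_ss_negate_first (struct vec4 a, struct vec4 b, struct vec4 c)
{
  return toy_fmadd_ss (toy_negate (a), b, c);
}

/* -(a*b)+c by negating the second argument: the upper elements of the
   result are still those of a.  */
static struct vec4
toy_fnmadd_ss_negate_second (struct vec4 a, struct vec4 b, struct vec4 c)
{
  return toy_fmadd_ss (a, toy_negate (b), c);
}
```

Both variants give the same element 0; only the second keeps the upper
elements of the first argument intact in this model.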

Thanks.


-- 
H.J.


Re: [PATCH] Fix PR54489 - FRE needing AVAIL_OUT

2012-09-14 Thread Steven Bosscher
On Fri, Sep 14, 2012 at 12:50 PM, Richard Guenther wrote:
>> Yikes, I didn't know about my_rev_post_order_compute. How horrible!
>> That function doesn't compute reverse post-order of the CFG, but a
>> post-order of the reverse CFG!
>
> Ok, well - then that's what we need for compute_antic to have
> minimal number of iterations and it is what VRP needs.  Visit
> all successors before BB if possible.

Right, visit all successors of BB before BB itself, aka visiting in
topological order of the reverse CFG. But your
my_rev_post_order_compute doesn't actually compute a post-order of the
reverse CFG. The first block pushed into the array is EXIT_BLOCK, iff
include_entry_exit==true. Fortunately, the function is only ever
called with include_entry_exit==false.
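
To illustrate the terminology only (this is not the GCC function, and
the toy diamond CFG, array sizes and function names are assumptions
made for the example), the sketch below computes a DFS post-order over
an arbitrary edge table.  Fed with successor edges from the entry block
it gives a post-order of the CFG; reversing that gives a reverse
post-order; fed with predecessor edges from the exit block it gives a
post-order of the reverse CFG.

```c
#include <assert.h>
#include <string.h>

#define N_BLOCKS 4
#define MAX_EDGES 2

/* Toy CFG (a diamond): 0 -> 1, 0 -> 2, 1 -> 3, 2 -> 3.  -1 means no edge.  */
static const int succs[N_BLOCKS][MAX_EDGES]
  = { { 1, 2 }, { 3, -1 }, { 3, -1 }, { -1, -1 } };
static const int preds[N_BLOCKS][MAX_EDGES]
  = { { -1, -1 }, { 0, -1 }, { 0, -1 }, { 1, 2 } };

/* Depth-first walk over EDGES; a block is recorded after all blocks
   reachable through its edges have been recorded.  */
static void
dfs_post_order (const int edges[][MAX_EDGES], int bb, int *visited,
                int *order, int *n)
{
  visited[bb] = 1;
  for (int i = 0; i < MAX_EDGES; i++)
    {
      int next = edges[bb][i];
      if (next >= 0 && !visited[next])
        dfs_post_order (edges, next, visited, order, n);
    }
  order[(*n)++] = bb;
}

/* Fill ORDER with the post-order of a DFS from START; return its length.  */
static int
post_order (const int edges[][MAX_EDGES], int start, int *order)
{
  int visited[N_BLOCKS];
  int n = 0;
  memset (visited, 0, sizeof visited);
  dfs_post_order (edges, start, visited, order, &n);
  return n;
}

/* Turn a post-order into a reverse post-order.  */
static void
reverse_in_place (int *order, int n)
{
  assert (n >= 0);
  for (int i = 0, j = n - 1; i < j; i++, j--)
    {
      int tmp = order[i];
      order[i] = order[j];
      order[j] = tmp;
    }
}
```

On this diamond, post_order (succs, 0, ...) yields 3 1 2 0, so every
successor comes before its block.  Reversing it yields 0 2 1 3, while
post_order (preds, 3, ...) yields 0 1 2 3: both put predecessors first,
but they are not the same sequence.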

Ciao!
Steven


Re: PATCH: PR debug/54568: --eh-frame-hdr should also be enabled for static executable

2012-09-14 Thread Jakub Jelinek
On Fri, Sep 14, 2012 at 05:12:19AM -0700, H.J. Lu wrote:
> On Fri, Sep 14, 2012 at 2:41 AM, Jakub Jelinek  wrote:
> > Well, there is.  For more than 2 years after the addition of --eh-frame-hdr
> > support dl_iterate_phdr in libc.a would simply always fail, you aren't
> > adding any kind of check that old glibc (2001-2003ish) isn't used.
> > Even in newer glibcs, it relies on AT_* aux vector values provided by the
> > kernel, if they are not provided for whatever reason, it would fail.

> It was implemented in
> 
> http://sourceware.org/ml/libc-alpha/2003-10/msg00098.html
> 
> for glibc 2.3.0 and we can check

Yeah, I know, but that is still later than 2001 when it was implemented for
dynamically linked executables.
USE_PT_GNU_EH_FRAME is defined even for glibc 2.2.something (if DT_CONFIG
macro is defined in headers).

> AT_PHDR: 0x400040
> AT_PHNUM:10
> 
> with LD_SHOW_AUXV.

I was worried about some loaders that wouldn't pass the aux vector down.
E.g. valgrind's loader does, but perhaps others wouldn't need to.

Anyway, IMHO statically linked binaries aren't something one should spend
too much time on, they shouldn't be used (with very few exceptions) at all.

Jakub


Re: [PATCH] Fix PR54489 - FRE needing AVAIL_OUT

2012-09-14 Thread Richard Guenther
On Fri, 14 Sep 2012, Steven Bosscher wrote:

> On Fri, Sep 14, 2012 at 12:50 PM, Richard Guenther wrote:
> >> Yikes, I didn't know about my_rev_post_order_compute. How horrible!
> >> That function doesn't compute reverse post-order of the CFG, but a
> >> post-order of the reverse CFG!
> >
> > Ok, well - then that's what we need for compute_antic to have
> > minimal number of iterations and it is what VRP needs.  Visit
> > all successors before BB if possible.
> 
> Right, visit all successors of BB before BB itself, aka visiting in
> topological order of the reverse CFG. But your
> my_rev_post_order_compute doesn't actually compute a post-order of the
> reverse CFG. The first block pushed into the array is EXIT_BLOCK, iff
> include_entry_exit==true. Fortunately, the function is only ever
> called with include_entry_exit==false.

Oops.

If you can figure out a better name for the function we should
probably move it to cfganal.c

Richard.


[C++ Patch] PR 54575

2012-09-14 Thread Paolo Carlini

Hi,

here we crash because strip_typedefs while processing _RequireInputIter 
calls make_typename_type which returns error_mark_node (# line 3281). 
Thus I'm simply checking for result == error_mark_node right after the 
switch (in the switch finish_decltype_type could also return 
error_mark_node, for example) and before handling the alignment (where 
we are crashing now). Issue seems rather straightforward.


Tested x86_64-linux.

Thanks,
Paolo.

PS: I'm also attaching a patchlet for a couple of hard errors in
make_typename_type that I spotted along the way and that are not
protected by (complain & tf_error).
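
The pattern the patchlet applies can be sketched as follows.  This is a
toy model, not front-end code: the enum, the counter and
toy_lookup_type are hypothetical, with TOY_ERROR standing in for
tf_error and a return value of 0 standing in for error_mark_node.

```c
#include <stdio.h>

/* Toy equivalent of the complain flags.  */
enum toy_complain { TOY_NONE = 0, TOY_ERROR = 1 << 0 };

/* Number of diagnostics actually emitted.  */
static int toy_error_count;

/* Return 1 if the name denotes a type and 0 otherwise.  The failure is
   always reported through the return value, but a diagnostic is only
   printed when the caller asked for one.  */
static int
toy_lookup_type (int name_is_type, int complain)
{
  if (!name_is_type)
    {
      if (complain & TOY_ERROR)
        {
          fprintf (stderr, "name is not a type\n");
          toy_error_count++;
        }
      return 0;
    }
  return 1;
}
```

A caller that is only probing whether something is valid passes
TOY_NONE and gets the failure result without any output.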


/
/cp
2012-09-14  Paolo Carlini  

PR c++/54575
* tree.c (strip_typedefs): Check result for error_mark_node.
* pt.c (canonicalize_type_argument): Check strip_typedefs return
value for error_mark_node.

/testsuite
2012-09-14  Paolo Carlini  

PR c++/54575
* g++.dg/cpp0x/pr54575.C: New.
Index: testsuite/g++.dg/cpp0x/pr54575.C
===
--- testsuite/g++.dg/cpp0x/pr54575.C(revision 0)
+++ testsuite/g++.dg/cpp0x/pr54575.C(revision 0)
@@ -0,0 +1,27 @@
+// PR c++/54575
+// { dg-do compile { target c++11 } }
+
+template
+struct is_convertible { static const bool value = true; };
+
+template struct enable_if   { };
+template<> struct enable_if { typedef int type; };
+
+template
+using _RequireInputIter
+= typename enable_if::value>::type;
+
+template
+struct X
+{
+  template>
+void insert(_InputIterator) {}
+};
+
+template
+void foo()
+{
+  X subdomain_indices;
+  subdomain_indices.insert(0);
+}
Index: cp/tree.c
===
--- cp/tree.c   (revision 191290)
+++ cp/tree.c   (working copy)
@@ -1210,8 +1210,10 @@ strip_typedefs (tree t)
   break;
 }
 
+  if (result == error_mark_node)
+return error_mark_node;
   if (!result)
-  result = TYPE_MAIN_VARIANT (t);
+result = TYPE_MAIN_VARIANT (t);
   if (TYPE_USER_ALIGN (t) != TYPE_USER_ALIGN (result)
   || TYPE_ALIGN (t) != TYPE_ALIGN (result))
 {
Index: cp/pt.c
===
--- cp/pt.c (revision 191290)
+++ cp/pt.c (working copy)
@@ -6114,6 +6114,8 @@ canonicalize_type_argument (tree arg, tsubst_flags
 return arg;
   mv = TYPE_MAIN_VARIANT (arg);
   arg = strip_typedefs (arg);
+  if (arg == error_mark_node)
+return error_mark_node;
   if (TYPE_ALIGN (arg) != TYPE_ALIGN (mv)
   || TYPE_ATTRIBUTES (arg) != TYPE_ATTRIBUTES (mv))
 {
2012-09-14  Paolo Carlini  

* decl.c (make_typename_type): Only error out if tf_error is set
in complain.
Index: decl.c
===
--- decl.c  (revision 191290)
+++ decl.c  (working copy)
@@ -3235,13 +3235,15 @@ make_typename_type (tree context, tree name, enum
name = TREE_OPERAND (fullname, 0) = DECL_NAME (name);
   else if (TREE_CODE (name) == OVERLOAD)
{
- error ("%qD is not a type", name);
+ if (complain & tf_error)
+   error ("%qD is not a type", name);
  return error_mark_node;
}
 }
   if (TREE_CODE (name) == TEMPLATE_DECL)
 {
-  error ("%qD used without template parameters", name);
+  if (complain & tf_error)
+   error ("%qD used without template parameters", name);
   return error_mark_node;
 }
   gcc_assert (TREE_CODE (name) == IDENTIFIER_NODE);


Re: vector comparisons in C++

2012-09-14 Thread Jason Merrill

On 09/13/2012 07:37 PM, Marc Glisse wrote:
> Looks like a latent bug in fold_unary. The following seems to work in
> this case.

Looks good.

Jason




Re: vector comparisons in C++

2012-09-14 Thread Marc Glisse
Here is the patch I just tested. Changes compared to the previous patch 
include:


* same_type_ignoring_top_level_qualifiers_p
* build_vector_type: don't use an opaque vector for the return type of
  operator< (not sure what the point was of making it opaque?)
* Disable BIT_AND -> TRUTH_AND optimization for vectors
* Disable (type)(a (a a<=b, use a vector type, not boolean
* 2 more testcases (through which I discovered the issues)

I am sure there are other optimizations that do weird things, but I should 
stop increasing the size of this patch... I'll update the doc in a later 
patch.
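
For context, a small C sketch of the semantics being carried over to
C++ (the moved gcc.dg tests suggest the C front end already accepts
these operators; the function names below are made up).  It relies on
the GNU vector extension: a comparison yields an integer vector with
-1 in each lane where the comparison holds and 0 elsewhere, so the
result is meant to be combined with bitwise operators rather than
treated as a single truth value.

```c
typedef int v4i __attribute__ ((vector_size (4 * sizeof (int))));
typedef float v4f __attribute__ ((vector_size (4 * sizeof (float))));

/* Lane-wise a < b: -1 (all bits set) where true, 0 where false.  */
static v4i
less_than_mask (v4f a, v4f b)
{
  return a < b;
}

/* Pick A in lanes where MASK is set and B in the other lanes.  */
static v4i
select_lanes (v4i mask, v4i a, v4i b)
{
  return (a & mask) | (b & ~mask);
}

/* Lane-wise minimum built from a vector comparison.  */
static v4i
min_lanes (v4i a, v4i b)
{
  return select_lanes (a < b, a, b);
}
```

select_lanes only works because the mask is a full vector of lane
masks, which is presumably why folding such a BIT_AND into a TRUTH_AND
has to be disabled for vectors.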


Ok?

2012-09-14  Marc Glisse  
PR c++/54427

gcc/ChangeLog
* fold-const.c (fold_unary_loc): Disable for VECTOR_TYPE.
(fold_binary_loc): Likewise.
* gimple-fold.c (and_comparisons_1): Handle VECTOR_TYPE.
(or_comparisons_1): Likewise.

gcc/cp/ChangeLog
* typeck.c (cp_build_binary_op) [LSHIFT_EXPR, RSHIFT_EXPR, EQ_EXPR,
NE_EXPR, LE_EXPR, GE_EXPR, LT_EXPR, GT_EXPR]: Handle VECTOR_TYPE.

gcc/testsuite/ChangeLog
* g++.dg/other/vector-compare.C: New testcase.
* gcc/testsuite/c-c++-common/vector-compare-3.c: New testcase.
* gcc.dg/vector-shift.c: Move ...
* c-c++-common/vector-shift.c: ... here.
* gcc.dg/vector-shift1.c: Move ...
* c-c++-common/vector-shift1.c: ... here.
* gcc.dg/vector-shift3.c: Move ...
* c-c++-common/vector-shift3.c: ... here.
* gcc.dg/vector-compare-1.c: Move ...
* c-c++-common/vector-compare-1.c: ... here.
* gcc.dg/vector-compare-2.c: Move ...
* c-c++-common/vector-compare-2.c: ... here.
* gcc.c-torture/execute/vector-compare-1.c: Move ...
* c-c++-common/torture/vector-compare-1.c: ... here.
* gcc.c-torture/execute/vector-compare-2.x: Delete.
* gcc.c-torture/execute/vector-compare-2.c: Move ...
* c-c++-common/torture/vector-compare-2.c: ... here.
* gcc.c-torture/execute/vector-shift.c: Move ...
* c-c++-common/torture/vector-shift.c: ... here.
* gcc.c-torture/execute/vector-shift2.c: Move ...
* c-c++-common/torture/vector-shift2.c: ... here.
* gcc.c-torture/execute/vector-subscript-1.c: Move ...
* c-c++-common/torture/vector-subscript-1.c: ... here.
* gcc.c-torture/execute/vector-subscript-2.c: Move ...
* c-c++-common/torture/vector-subscript-2.c: ... here.
* gcc.c-torture/execute/vector-subscript-3.c: Move ...
* c-c++-common/torture/vector-subscript-3.c: ... here.


--
Marc Glisse
Index: gcc/testsuite/g++.dg/other/vector-compare.C
===
--- gcc/testsuite/g++.dg/other/vector-compare.C (revision 0)
+++ gcc/testsuite/g++.dg/other/vector-compare.C (revision 0)
@@ -0,0 +1,38 @@
+/* { dg-do compile } */
+/* { dg-options "-std=gnu++11 -Wall" } */
+
+// Check that we can compare vector types that really are the same through
+// typedefs.
+
+typedef float v4f __attribute__((vector_size(4*sizeof(float))));
+
+template  void eat (T&&) {}
+
+template 
+struct Vec
+{
+  typedef T type __attribute__((vector_size(4*sizeof(T))));
+
+  template 
+  static void fun (type const& t, U& u) { eat (t > u); }
+};
+
+long long
+f (v4f *x, v4f const *y)
+{
+  return ((*x < *y) | (*x <= *y))[2];
+}
+
+int main ()
+{
+  v4f x = {0,1,2,3};
+  typedef decltype (x < x) v4i;
+  v4i y = {4,5,6,7}; // v4i is not opaque
+  Vec::type f = {-1,5,2,3.1};
+  v4i c = (x == f) == y;
+  eat (c);
+  Vec::fun (f, x);
+  Vec::fun (x, f);
+  Vec::fun (f, f);
+  Vec::fun (x, x);
+}

Property changes on: gcc/testsuite/g++.dg/other/vector-compare.C
___
Added: svn:keywords
   + Author Date Id Revision URL
Added: svn:eol-style
   + native

Index: gcc/testsuite/c-c++-common/vector-compare-3.c
===
--- gcc/testsuite/c-c++-common/vector-compare-3.c   (revision 0)
+++ gcc/testsuite/c-c++-common/vector-compare-3.c   (revision 0)
@@ -0,0 +1,25 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+typedef int v4i __attribute__((vector_size(4*sizeof(int))));
+
+// fold should not turn (vec_other)(x *y;
+}
+

Property changes on: gcc/testsuite/c-c++-common/vector-compare-3.c
___
Added: svn:eol-style
   + native
Added: svn:keywords
   + Author Date Id Revision URL

Index: gcc/testsuite/c-c++-common/vector-shift.c
===
--- gcc/testsuite/c-c++-common/vector-shift.c   (revision 190834)
+++ gcc/testsuite/c-c++-common/vector-shift.c   (working copy)
@@ -1,11 +1,12 @@
 /* { dg-do compile } */
+/* { dg-prune-output "in evaluation of" } */
 #define vector(elcount, type)  \
 __attribute__((vector_size((elcount)*sizeof(type)))) type
 
 int main (int argc, char *argv[]) {
 vector(4,char)

Re: PATCH: PR debug/54568: --eh-frame-hdr should also be enabled for static executable

2012-09-14 Thread H.J. Lu
On Fri, Sep 14, 2012 at 5:26 AM, Jakub Jelinek  wrote:
> On Fri, Sep 14, 2012 at 05:12:19AM -0700, H.J. Lu wrote:
>> On Fri, Sep 14, 2012 at 2:41 AM, Jakub Jelinek  wrote:
>> > Well, there is.  For more than 2 years after the addition of --eh-frame-hdr
>> > support dl_iterate_phdr in libc.a would simply always fail, you aren't
>> > adding any kind of check that old glibc (2001-2003ish) isn't used.
>> > Even in newer glibcs, it relies on AT_* aux vector values provided by the
>> > kernel, if they are not provided for whatever reason, it would fail.
>
>> It was implemented in
>>
>> http://sourceware.org/ml/libc-alpha/2003-10/msg00098.html
>>
>> for glibc 2.3.0 and we can check
>
> Yeah, I know, but that is still later than 2001 when it was implemented for
> dynamically linked executables.
> USE_PT_GNU_EH_FRAME is defined even for glibc 2.2.something (if DT_CONFIG
> macro is defined in headers).

Here is a patch to add an option to use --eh-frame-hdr on static.
It won't enable it for uclibc since it is > glibc 2.2.

>> AT_PHDR: 0x400040
>> AT_PHNUM:10
>>
>> with LD_SHOW_AUXV.
>
> I was worried about some loaders that wouldn't pass the aux vector down.
> E.g. valgrind's loader does, but perhaps others wouldn't need to.

Those loaders are broken for binaries, static or dynamic, which use AUXV,
independent of this change, and they should be fixed.  It shouldn't block
using --eh-frame-hdr for -static.

> Anyway, IMHO statically linked binaries aren't something one should spend
> too much time on, they shouldn't be used (with very few exceptions) at all.
>

Android doesn't need that legacy stuff.  We have to keep it
since GCC doesn't pass --eh-frame-hdr for -static.  At minimum,
I'd like a configure option to use --eh-frame-hdr for -static, even if
it is off by default.

OK to install?

Thanks.

-- 
H.J.
---
gcc/

2012-09-14  H.J. Lu  

PR debug/54568
* configure.ac: Add --enable-eh-frame-hdr-for-static.  Set
USE_EH_FRAME_HDR_FOR_STATIC if PT_GNU_EH_FRAME is supported for
static executable.
* config.in: Regenerated.
* configure: Likewise.

* config/gnu-user.h (LINK_EH_SPEC): Defined as "--eh-frame-hdr "
if USE_EH_FRAME_HDR_FOR_STATIC is defined.
* config/sol2.h (LINK_EH_SPEC): Likewise.
* config/openbsd.h (LINK_EH_SPEC): Likewise.
* config/alpha/elf.h (LINK_EH_SPEC): Likewise.
* config/freebsd.h (LINK_EH_SPEC): Likewise.
* config/rs6000/sysv4.h (LINK_EH_SPEC): Likewise.

gcc/testsuite/

2012-09-13  H.J. Lu  

PR debug/54568
* g++.dg/eh/spec3-static.C: New test.

libgcc/

2012-09-14  H.J. Lu  

PR debug/54568
* crtstuff.c (USE_PT_GNU_EH_FRAME): Check CRTSTUFFT_O together
with USE_EH_FRAME_HDR_FOR_STATIC.


gcc-pr54568.patch
Description: Binary data


Re: [PATCH 0/4] slim-lto-bootstrap.mk build-target support

2012-09-14 Thread Andi Kleen
Markus Trippelsdorf  writes:
>
> If the patches look OK, it would be nice if someone could commit them,
> because I have no access.

I cannot approve them, so someone else needs to do that first.
But I can commit once that's done.

-Andi

-- 
a...@linux.intel.com -- Speaking for myself only


Re: vector comparisons in C++

2012-09-14 Thread Jason Merrill

On 09/14/2012 09:59 AM, Marc Glisse wrote:

* build_vector_type: don't use an opaque vector for the return type of
   operator< (not sure what the point was of making it opaque?)


I think the point was to allow conversion of the result to a different 
vector type.  Why do you want it not to be opaque?
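
As a rough illustration of why a convertible result type is handy, here is a small C sketch using GCC's generic vector extension (where vector comparisons are documented). The type names v4sf/v4si and the helper select_lt are assumptions made for this example, not anything from the patch.

```c
#include <assert.h>
#include <stdint.h>

typedef float   v4sf __attribute__ ((vector_size (16)));
typedef int32_t v4si __attribute__ ((vector_size (16)));

/* Comparing two float vectors yields a signed integer vector with the
   same number of lanes: -1 in each lane where the comparison holds,
   0 elsewhere.  The result can then be used as a bit mask.  */
static v4si
select_lt (v4sf a, v4sf b, v4si if_true, v4si if_false)
{
  v4si mask = a < b;
  return (if_true & mask) | (if_false & ~mask);
}

/* Small self-check of the lane values described above.  */
static void
select_lt_selfcheck (void)
{
  v4sf a = { 1.0f, 5.0f, 3.0f, 8.0f };
  v4sf b = { 2.0f, 4.0f, 3.0f, 9.0f };
  v4si m = a < b;
  assert (m[0] == -1 && m[1] == 0 && m[2] == 0 && m[3] == -1);
}
```

Here the comparison of float vectors is stored directly in an integer vector type, which is the kind of use a non-convertible result type would get in the way of.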



* Disable (type)(a (a

Right.  It certainly doesn't help for integer vectors.


+ error_at (location, "comparing vectors with different "
+ "element types");


Let's print the vector types in these errors.

Jason



Re: [C++ Patch] PR 54575

2012-09-14 Thread Jason Merrill

On 09/14/2012 09:05 AM, Paolo Carlini wrote:

here we crash because strip_typedefs while processing _RequireInputIter
calls make_typename_type which returns error_mark_node (# line 3281).


strip_typedefs should not return error_mark_node unless it gets 
error_mark_node as an argument; if a typedef-using type is valid, 
removing the typedefs should still produce a valid type.  I ran into a 
similar bug when I was working on the instantiation-dependent stuff, but 
I don't remember which bit fixed that one.



PS: I'm also attaching a patchlet for a couple of hard errors in
make_typename_type not protected by (complain & tf_error) spotted in
make_typename_type.


This patchlet is OK.

Jason



[PATCH][ARM] Require that one of the operands be a register in thumb2_movdf_vfp

2012-09-14 Thread Рубен Бучацкий
Hi,

There are some Thumb2 patterns in vfp.md which are duplicated with only
minor changes from their ARM equivalents.  This patch adds a requirement
to the "*thumb2_movdf_vfp" pattern that one of the operands should be a
register, as in the ARM "*movdf_vfp" pattern.

There is one functional change: the ARM "*movdf_vfp" pattern disallows the
[mem]=const store case while the Thumb2 one does not.  The ARM version is
more desirable, because [mem]=const would be split anyway and would add a
temporary VFP register.

Example:

double delta;

void foo ()
{
  delta = 0.0;
}

Generated code (without patch):
Compiler options: g++ -mcpu=cortex-a8 -mtune=cortex-a8 -mfpu=neon
-mfloat-abi=softfp -mthumb -O2

movs    r0, #0
movs    r1, #0
fmdrr   d16, r0, r1
fstd    d16, [r3, #0]

Generated code (with patch):

movs    r0, #0
movs    r1, #0
strd    r0, [r3]

Regtested with QEMU-ARM.

Code size: SPEC2K INT and FP sizes both decrease by ~0.2% (up to 2.64%
on 252.eon).
Performance: 252.eon +4.3%, other tests almost unaffected;
SPEC2K FP grows by 0.16% (up to 1.8% on 183.equake).
(Detailed data below).

Ok for trunk?



                   Base       Peak
Benchmarks         Size       Size
------------    -------    -------    ------    -------
164.gzip          33223      33223         0     0,000%
175.vpr          108947     108947         0     0,000%
176.gcc          936220     936220         0     0,000%
181.mcf            7768       7768         0     0,000%
186.crafty       158714     158714         0     0,000%
197.parser        66371      66371         0     0,000%
252.eon          239982     233646      6336     2,640%
253.perlbmk      367652     367628        24     0,007%
254.gap          311193     311193         0     0,000%
255.vortex       349276     349268         8     0,002%
256.bzip2         23044      23044         0     0,000%
300.twolf        144094     143990       104     0,072%

SPECint2000     2746484    2740012      6472     0,236%


                   Base       Peak
Benchmarks         Size       Size
------------    -------    -------    ------    -------
168.wupwise       17541      17541         0     0,000%
171.swim           8662       8686       -24    -0,277%
172.mgrid         10072      10088       -16    -0,159%
173.applu         43193      42585       608     1,408%
177.mesa         337863     337735       128     0,038%
178.galgel       123270     122854       416     0,337%
179.art           11762      11686        76     0,646%
183.equake        14836      14620       216     1,456%
187.facerec       44554      44554         0     0,000%
188.ammp          85940      85612       328     0,382%
189.lucas         31764      31700        64     0,201%
191.fma3d        662349     662215       134     0,020%
200.sixtrack     713980     712324      1656     0,232%
301.apsi          74277      73997       280     0,377%

SPECfp2000      2153860    2149970      3890     0,181%


                   Base       Peak
Benchmarks         Size       Size
------------    -------    -------    ------    -------
252.eon             0.7       0.67       0.3     4.286%
SPECint2000                                      4.286%

168.wupwise       18,16      18,04      0,12     0,661%
171.swim            1,5       1,52     -0,02    -1,333%
172.mgrid         40,33      40,39     -0,06    -0,149%
173.applu          0,54       0,55     -0,01    -1,852%
177.mesa           2,56       2,57     -0,01    -0,391%
178.galgel         22,1         22       0,1     0,452%
179.art           12,39      12,29       0,1     0,807%
183.equake         2,25       2,21      0,04     1,778%
187.facerec       10,25      10,21      0,04     0,390%
188.ammp          22,21       22,1      0,11     0,495%
189.lucas          6,26       6,25      0,01     0,160%
191.fma3d         0,645      0,648    -0,003    -0,465%
200.sixtrack      16,19       16,2     -0,01    -0,062%
301.apsi          10,84      10,92     -0,08    -0,738%

SPECfp2000      146.565    146.338     0.227     0.155%


-- 
Best Regards,
 Ruben. 


thumb2_movdf_vfp.diff
Description: Binary data


Re: vector comparisons in C++

2012-09-14 Thread Marc Glisse

On Fri, 14 Sep 2012, Jason Merrill wrote:


On 09/14/2012 09:59 AM, Marc Glisse wrote:

* build_vector_type: don't use an opaque vector for the return type of
   operator< (not sure what the point was of making it opaque?)


I think the point was to allow conversion of the result to a different vector 
type.


Ah, I see. I'll change it back to opaque and remove that use from the 
testcase. I noticed that for the fold patch, I am using once opaque and 
once not-opaque, I'll make them both opaque, although it probably doesn't 
matter once we are out of the front-end.



Why do you want it not to be opaque?


I wanted to use decltype(xsize as x, and then actually be able to use it. Being opaque, it refuses 
to be initialized (cp/decl.c:5550). Maybe decltype (and others?) could 
return non-opaque types?



+ error_at (location, "comparing vectors with different "
+ "element types");


Let's print the vector types in these errors.


Type is %qT right? I see a number of %q#T but can't remember where the doc 
is. Well, I'll try both and see what happens.


Thanks,

--
Marc Glisse


GCC 4.7.2 Status Report (2012-09-14), branch frozen

2012-09-14 Thread Jakub Jelinek
The GCC 4.7 branch is now frozen for creating a first release candidate
of the GCC 4.7.2 release.

All changes need explicit release manager approval until the final
release of GCC 4.7.2 which should happen roughly one week after
the release candidate if no issues show up with it.



Previous Report
===

http://gcc.gnu.org/ml/gcc/2012-06/msg00195.html


Re: [C++ Patch] PR 54575

2012-09-14 Thread Paolo Carlini

Hi,

On 09/14/2012 04:36 PM, Jason Merrill wrote:

On 09/14/2012 09:05 AM, Paolo Carlini wrote:

here we crash because strip_typedefs while processing _RequireInputIter
calls make_typename_type which returns error_mark_node (# line 3281).


strip_typedefs should not return error_mark_node unless it gets 
error_mark_node as an argument; if a typedef-using type is valid, 
removing the typedefs should still produce a valid type.

I see, makes sense. What I'm seeing is that make_typename_type, called
for this context (the enable_if):


 align 8 symtab 0 alias set -1 canonical type 0x7694ac78 context 


full-name "struct enable_if::value>"
no-binfo use_template=1 interface-unknown
chain >

wants to return error_mark_node, because here:

  if (!dependent_scope_p (context))
    /* We should only set WANT_TYPE when we're a nested typename type.
       Then we can give better diagnostics if we find a non-type.  */
    t = lookup_field (context, name, 2, /*want_type=*/true);
  else
    t = NULL_TREE;

  if ((!t || TREE_CODE (t) == TREE_LIST) && dependent_type_p (context))
    return build_typename_type (context, name, fullname, tag_type);

  want_template = TREE_CODE (fullname) == TEMPLATE_ID_EXPR;

  if (!t)
    {
      if (complain & tf_error)
        error (want_template ? G_("no class template named %q#T in %q#T")
               : G_("no type named %q#T in %q#T"), name, context);
      return error_mark_node;
    }

lookup_field returns NULL_TREE and dependent_type_p (context) is false. 
It seems to me that the return value of lookup_field is right. Thus 
either dependent_type_p shouldn't be false or something is wrong in the 
logic or strip_typedefs should not call make_typename_type at all?!?


Paolo.




Re: [PATCH] Set correct source location for deallocator calls

2012-09-14 Thread Dehao Chen
ping...

Thanks,
Dehao

On Sun, Sep 9, 2012 at 5:42 AM, Dehao Chen  wrote:
> Hi,
>
> I've added a libjava unittest which verifies that this patch will not
> break Java debug info. I've also incorporated Richard's review in the
> previous mail. Attached is the new patch, which passed bootstrap and
> all gcc/libjava testsuites on x86.
>
> Is it ok for trunk?
>
> Thanks,
> Dehao
>
> gcc/ChangeLog:
> 2012-09-08  Dehao Chen  
>
>  * tree-eh.c (goto_queue_node): New field.
> (record_in_goto_queue): New parameter.
> (record_in_goto_queue_label): New parameter.
> (lower_try_finally_dup_block): New parameter.
> (maybe_record_in_goto_queue): Update source location.
> (lower_try_finally_copy): Likewise.
> (honor_protect_cleanup_actions): Likewise.
> * gimplify.c (gimplify_expr): Reset the location to unknown.
>
> gcc/testsuite/ChangeLog:
> 2012-09-08  Dehao Chen  
>
> * g++.dg/debug/dwarf2/deallocator.C: New test.
>
> libjava/ChangeLog:
> 2012-09-08  Dehao Chen  
>
> * testsuite/libjava.lang/sourcelocation.java: New cases.
> * testsuite/libjava.lang/sourcelocation.out: New cases.


Re: [PATCH] Set correct source location for deallocator calls

2012-09-14 Thread Andrew Haley
On 09/08/2012 10:42 PM, Dehao Chen wrote:
> I've added a libjava unittest which verifies that this patch will not
> break Java debug info. I've also incorporated Richard's review in the
> previous mail. Attached is the new patch, which passed bootstrap and
> all gcc/libjava testsuites on x86.
> 
> Is it ok for trunk?

Yes, thanks.

Andrew.



Re: [Patch ARM] big-endian support for Neon vext tests

2012-09-14 Thread Christophe Lyon
On 13 September 2012 19:07, Mike Stump  wrote:
> On Sep 13, 2012, at 2:45 AM, Christophe Lyon  
> wrote:
>> Ping?
>> http://gcc.gnu.org/ml/gcc-patches/2012-09/msg00330.html
>
> So, two things I thought I'd ask about:
>
>> +/* __attribute__ ((noinline)) is currently required, otherwise the
>> +   generated code computes wrong results in big-endian.  */
>
> and:
>
>> +#ifdef __ARMEL__
>> +  uint64x2_t __mask1 = {1, 0};
>> +#else
>>uint64x2_t __mask1 = {1, 0};
>> +#endif
>
>
>>> * In the case of the test which is executed, I had to force the
>>> noinline attribute on the helper functions, otherwise the computed
>>> results are wrong in big-endian. It is probably an overkill workaround
>>> but it works :-)
>>>  I am going to file a bugzilla for this problem.
>
> I think that for developing the patches noinline was fine, we are confident 
> there aren't any more bugs,
> but, for checkin, I think it is better to leave the test case as is, and let 
> it fail until the PR you filed is fixed.
>  We usually don't put hack arounds for code-gen compiler bugs into the 
> testsuite just to make them
> pass…  :-)

OK. So I will remove the noinline stuff, and update the bugzilla entry
with the name of this testcase.
Should I really leave the test as FAILED rather than XFAIL?

> The second (occurs more than once) just looks odd.  I thought I'd mention it, 
> not sure what my preference is.
Well... it was just for consistency with all the other tests as in
general the mask vectors are different in big and little endian, and
to expose the fact that I hadn't forgotten to write the big-endian
variant :-)

I can remove these occurrences and add a comment instead.


Thanks for your review,

Christophe.


[PATCH, AArch64] Implement ffs standard pattern

2012-09-14 Thread Ian Bolton
I've implemented the standard pattern ffs, which leads to
__builtin_ffs being generated with 4 instructions instead
of 5 instructions.
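
For readers not familiar with the trick, the four-instruction sequence that the new test scans for (cmp, rbit, clz, csinc) can be modelled in plain C roughly as follows. This is only a sketch: rbit32 and ffs_sketch are names invented here, and __builtin_clz is assumed to operate on a 32-bit unsigned int.

```c
#include <assert.h>
#include <stdint.h>

/* Software model of a 32-bit bit reversal (what rbit does).  */
static uint32_t
rbit32 (uint32_t v)
{
  v = ((v >> 1) & 0x55555555u) | ((v & 0x55555555u) << 1);
  v = ((v >> 2) & 0x33333333u) | ((v & 0x33333333u) << 2);
  v = ((v >> 4) & 0x0f0f0f0fu) | ((v & 0x0f0f0f0fu) << 4);
  v = ((v >> 8) & 0x00ff00ffu) | ((v & 0x00ff00ffu) << 8);
  return (v >> 16) | (v << 16);
}

/* ffs (x) is 0 for x == 0; otherwise it is one more than the number of
   leading zeros of the bit-reversed input.  The conditional models the
   compare against zero plus the final conditional increment.  */
static int
ffs_sketch (uint32_t x)
{
  if (x == 0)
    return 0;
  return __builtin_clz (rbit32 (x)) + 1;
}

/* Compare the model against the builtin on a few inputs.  */
static void
ffs_selfcheck (void)
{
  static const uint32_t inputs[] = { 0u, 1u, 2u, 8u, 12u, 0xfffffff0u };
  unsigned int i;

  for (i = 0; i < sizeof inputs / sizeof inputs[0]; i++)
    assert (ffs_sketch (inputs[i]) == __builtin_ffs ((int) inputs[i]));
}
```

The explicit x == 0 check matters in the C model because __builtin_clz is undefined for a zero argument.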

Regression tests and my new test pass.

OK to commit?


Cheers,
Ian



2012-09-14  Ian Bolton  
 
gcc/
* config/aarch64/aarch64.md (csinc3): Make it into a
named pattern.
* config/aarch64/aarch64.md (ffs2): New pattern.

testsuite/

* gcc.target/aarch64/ffs.c: New test.

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 5278957..dfdba42 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -2021,7 +2021,7 @@
   [(set_attr "v8type" "csel")
(set_attr "mode" "")])
 
-(define_insn "*csinc3_insn"
+(define_insn "csinc3_insn"
   [(set (match_operand:GPI 0 "register_operand" "=r")
 (if_then_else:GPI
  (match_operator:GPI 1 "aarch64_comparison_operator"
@@ -2157,6 +2157,21 @@
   }
 )
 
+(define_expand "ffs2"
+  [(match_operand:GPI 0 "register_operand")
+   (match_operand:GPI 1 "register_operand")]
+  ""
+  {
+rtx ccreg = aarch64_gen_compare_reg (EQ, operands[1], const0_rtx);
+rtx x = gen_rtx_NE (VOIDmode, ccreg, const0_rtx);
+
+emit_insn (gen_rbit2 (operands[0], operands[1]));
+emit_insn (gen_clz2 (operands[0], operands[0]));
+emit_insn (gen_csinc3_insn (operands[0], x, ccreg, operands[0], const0_rtx));
+DONE;
+  }
+)
+
 (define_insn "*and3nr_compare0"
   [(set (reg:CC CC_REGNUM)
(compare:CC
diff --git a/gcc/testsuite/gcc.target/aarch64/ffs.c b/gcc/testsuite/gcc.target/aarch64/ffs.c
new file mode 100644
index 000..a344761
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/ffs.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+unsigned int functest(unsigned int x)
+{
+  return __builtin_ffs(x);
+}
+
+/* { dg-final { scan-assembler "cmp\tw" } } */
+/* { dg-final { scan-assembler "rbit\tw" } } */
+/* { dg-final { scan-assembler "clz\tw" } } */
+/* { dg-final { scan-assembler "csinc\tw" } } */


Re: [C++ Patch] PR 54575

2012-09-14 Thread Paolo Carlini

On 09/14/2012 05:14 PM, Paolo Carlini wrote:

lookup_field returns NULL_TREE and dependent_type_p (context) is
false. It seems to me that the return value of lookup_field is right.

No, I don't think it makes sense for lookup_field to return NULL_TREE.
For our testcase we should find the nested type (or we should not call 
lookup_field at all here)


Paolo.


[PATCH v2] rs6000: Add 2 built-ins to read the Time Base Register on PowerPC

2012-09-14 Thread Tulio Magno Quites Machado Filho
Add __builtin_ppc_get_timebase and __builtin_ppc_mftb to read the Time
Base Register on PowerPC.
They are required by applications that measure time at high frequencies
with high precision that can't afford a syscall.
__builtin_ppc_get_timebase returns the 64 bits of the Time Base Register
while __builtin_ppc_mftb generates only 1 instruction and returns the
least significant word on 32-bit environments and the whole Time Base value
on 64-bit.
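
To make the difference between the two built-ins concrete, here is a portable C sketch of the upper/lower/upper read with retry that a 32-bit environment needs in order to assemble a consistent 64-bit value. The counter is simulated: sim_tb, read_upper and read_lower are hypothetical stand-ins for the hardware reads, not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Simulated 64-bit time base that ticks once on every 32-bit read
   (an assumption made purely so the sketch runs anywhere).  */
static uint64_t sim_tb;

static uint32_t
read_upper (void)
{
  uint32_t hi = (uint32_t) (sim_tb >> 32);
  sim_tb++;
  return hi;
}

static uint32_t
read_lower (void)
{
  uint32_t lo = (uint32_t) sim_tb;
  sim_tb++;
  return lo;
}

/* Read upper, lower, then upper again; if the upper half changed in
   between, the lower half wrapped during the sequence, so retry.  */
static uint64_t
get_timebase_sketch (void)
{
  uint32_t hi, lo, hi2;

  do
    {
      hi = read_upper ();
      lo = read_lower ();
      hi2 = read_upper ();
    }
  while (hi != hi2);

  return ((uint64_t) hi << 32) | lo;
}
```

Without the retry, a read that straddles a carry out of the low word would combine the old upper half with the new lower half and appear to jump backwards in time.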

[gcc]
2012-09-14 Tulio Magno Quites Machado Filho 

* config/rs6000/rs6000-builtin.def: Add __builtin_ppc_get_timebase
and __builtin_ppc_mftb.
* config/rs6000/rs6000.c (rs6000_expand_zeroop_builtin): New
function to expand an expression that calls a built-in without
arguments.
(rs6000_expand_builtin): Add __builtin_ppc_get_timebase and
__builtin_ppc_mftb.
(rs6000_init_builtins): Likewise.
* config/rs6000/rs6000.md: Likewise.
* doc/extend.texi (PowerPC AltiVec/VSX Built-in Functions):
Document __builtin_ppc_get_timebase and __builtin_ppc_mftb.

[gcc/testsuite]
2012-09-14 Tulio Magno Quites Machado Filho 

* gcc.target/powerpc/ppc-get-timebase.c: New file.
* gcc.target/powerpc/ppc-mftb.c: New file.
---
 gcc/config/rs6000/rs6000-builtin.def   |6 ++
 gcc/config/rs6000/rs6000.c |   46 +++
 gcc/config/rs6000/rs6000.md|   80 
 gcc/doc/extend.texi|   10 +++
 .../gcc.target/powerpc/ppc-get-timebase.c  |   20 +
 gcc/testsuite/gcc.target/powerpc/ppc-mftb.c|   18 +
 6 files changed, 180 insertions(+), 0 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/ppc-get-timebase.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/ppc-mftb.c

diff --git a/gcc/config/rs6000/rs6000-builtin.def b/gcc/config/rs6000/rs6000-builtin.def
index c8f8f86..9fa3a0f 100644
--- a/gcc/config/rs6000/rs6000-builtin.def
+++ b/gcc/config/rs6000/rs6000-builtin.def
@@ -1429,6 +1429,12 @@ BU_SPECIAL_X (RS6000_BUILTIN_RSQRT, "__builtin_rsqrt", RS6000_BTM_FRSQRTE,
 BU_SPECIAL_X (RS6000_BUILTIN_RSQRTF, "__builtin_rsqrtf", RS6000_BTM_FRSQRTES,
  RS6000_BTC_FP)
 
+BU_SPECIAL_X (RS6000_BUILTIN_GET_TB, "__builtin_ppc_get_timebase",
+RS6000_BTM_ALWAYS, RS6000_BTC_MISC)
+
+BU_SPECIAL_X (RS6000_BUILTIN_MFTB, "__builtin_ppc_mftb",
+RS6000_BTM_ALWAYS, RS6000_BTC_MISC)
+
 /* Darwin CfString builtin.  */
 BU_SPECIAL_X (RS6000_BUILTIN_CFSTRING, "__builtin_cfstring", RS6000_BTM_ALWAYS,
  RS6000_BTC_MISC)
diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index a5a3848..c3bece1 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -9748,6 +9748,30 @@ rs6000_overloaded_builtin_p (enum rs6000_builtins fncode)
   return (rs6000_builtin_info[(int)fncode].attr & RS6000_BTC_OVERLOADED) != 0;
 }
 
+/* Expand an expression EXP that calls a builtin without arguments.  */
+static rtx
+rs6000_expand_zeroop_builtin (enum insn_code icode, rtx target)
+{
+  rtx pat;
+  enum machine_mode tmode = insn_data[icode].operand[0].mode;
+
+  if (icode == CODE_FOR_nothing)
+/* Builtin not supported on this processor.  */
+return 0;
+
+  if (target == 0
+  || GET_MODE (target) != tmode
+  || ! (*insn_data[icode].operand[0].predicate) (target, tmode))
+target = gen_reg_rtx (tmode);
+
+  pat = GEN_FCN (icode) (target);
+  if (! pat)
+return 0;
+  emit_insn (pat);
+
+  return target;
+}
+
 
 static rtx
 rs6000_expand_unop_builtin (enum insn_code icode, tree exp, rtx target)
@@ -11337,6 +11361,16 @@ rs6000_expand_builtin (tree exp, rtx target, rtx subtarget ATTRIBUTE_UNUSED,
   ? CODE_FOR_bpermd_di
   : CODE_FOR_bpermd_si), exp, target);
 
+case RS6000_BUILTIN_GET_TB:
+  return rs6000_expand_zeroop_builtin (CODE_FOR_rs6000_get_timebase,
+  target);
+
+case RS6000_BUILTIN_MFTB:
+  return rs6000_expand_zeroop_builtin (((TARGET_64BIT)
+   ? CODE_FOR_rs6000_mftb_di
+   : CODE_FOR_rs6000_mftb_si),
+  target);
+
 case ALTIVEC_BUILTIN_MASK_FOR_LOAD:
 case ALTIVEC_BUILTIN_MASK_FOR_STORE:
   {
@@ -11621,6 +11655,18 @@ rs6000_init_builtins (void)
 POWER7_BUILTIN_BPERMD, "__builtin_bpermd");
   def_builtin ("__builtin_bpermd", ftype, POWER7_BUILTIN_BPERMD);
 
+  ftype = build_function_type_list (unsigned_intDI_type_node,
+   NULL_TREE);
+  def_builtin ("__builtin_ppc_get_timebase", ftype, RS6000_BUILTIN_GET_TB);
+
+  if (TARGET_64BIT)
+ftype = build_function_type_list (unsigned_intDI_type_node,
+ NULL_TREE);
+  else
+fty

Re: vector comparisons in C++

2012-09-14 Thread Jason Merrill

On 09/14/2012 11:03 AM, Marc Glisse wrote:

I wanted to use decltype(x

That sounds like the right answer.


Type is %qT right? I see a number of %q#T but can't remember where the
doc is. Well, I'll try both and see what happens.


Either one works; the # asks for more verbose output.

Jason




Re: PATCH: PR debug/54568: --eh-frame-hdr should also be enabled for static executable

2012-09-14 Thread Joseph S. Myers
On Fri, 14 Sep 2012, H.J. Lu wrote:

> +# Only support for glibc 2.3.0 or higher with AT_PHDR/AT_PHNUM from
> +# Linux kernel.
> +   [[if test x$host = x$build -a x$host = x$target &&
> +   ldd --version 2>&1 >/dev/null &&

Could we please stop adding this sort of native-only test?  There is 
various existing code that examines headers to determine glibc features, 
which is more cross-compile friendly.  This should probably be factored 
out into an autoconf macro to determine values from target headers, used 
once in configure.ac to get the glibc version for tools with glibc 
targets, and the --enable-gnu-unique-object test should be changed to use 
the version from the headers like other tests do.
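
As a sketch of what a header-based check could look like: the C library version can be read from preprocessor macros rather than from running ldd on the build machine. __GLIBC__ and __GLIBC_MINOR__ are the real glibc macros; the LIBC_* macros and the helper functions are names invented for this example, and a configure test would normally only preprocess such a fragment rather than run it.

```c
#include <assert.h>
#include <limits.h>  /* any libc header; on glibc this defines __GLIBC__ */

#if defined (__GLIBC__) && defined (__GLIBC_MINOR__)
# define LIBC_IS_GLIBC 1
# define LIBC_MAJOR __GLIBC__
# define LIBC_MINOR __GLIBC_MINOR__
#else
# define LIBC_IS_GLIBC 0
# define LIBC_MAJOR 0
# define LIBC_MINOR 0
#endif

/* Return nonzero if MAJOR.MINOR is at least WANT_MAJOR.WANT_MINOR.  */
static int
version_at_least (int major, int minor, int want_major, int want_minor)
{
  return major > want_major
         || (major == want_major && minor >= want_minor);
}

/* Nonzero when the target headers report glibc 2.3 or newer.  */
static int
have_glibc_2_3 (void)
{
  return LIBC_IS_GLIBC
         && version_at_least (LIBC_MAJOR, LIBC_MINOR, 2, 3);
}
```

Because the answer comes from the target headers, the same check gives the right result in a cross build, where ldd --version would describe the build machine instead.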

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: vector comparisons in C++

2012-09-14 Thread Marc Glisse

On Fri, 14 Sep 2012, Jason Merrill wrote:


On 09/14/2012 11:03 AM, Marc Glisse wrote:

I wanted to use decltype(x

That sounds like the right answer.


Ok, I'll open a bugzilla to remember to try that later.


Type is %qT right? I see a number of %q#T but can't remember where the
doc is. Well, I'll try both and see what happens.


Either one works; the # asks for more verbose output.


The %qT looked good enough to me (it prints the alias and the type). I 
added the types in an "inform" because the line was already getting long, 
I hope that's ok.


The attached just finished the testsuite (changelog unchanged).

--
Marc Glisse

Index: gcc/gimple-fold.c
===
--- gcc/gimple-fold.c   (revision 191291)
+++ gcc/gimple-fold.c   (working copy)
@@ -23,20 +23,21 @@ along with GCC; see the file COPYING3.
 #include "coretypes.h"
 #include "tm.h"
 #include "tree.h"
 #include "flags.h"
 #include "function.h"
 #include "dumpfile.h"
 #include "tree-flow.h"
 #include "tree-ssa-propagate.h"
 #include "target.h"
 #include "gimple-fold.h"
+#include "langhooks.h"
 
 /* Return true when DECL can be referenced from current unit.
FROM_DECL (if non-null) specify constructor of variable DECL was taken from.
We can get declarations that are not possible to reference for various
reasons:
 
  1) When analyzing C++ virtual tables.
C++ virtual tables do have known constructors even
when they are keyed to other compilation unit.
Those tables can contain pointers to methods and vars
@@ -1685,41 +1686,51 @@ and_var_with_comparison_1 (gimple stmt,
(OP1A CODE1 OP1B) and (OP2A CODE2 OP2B), respectively.
If this can be done without constructing an intermediate value,
return the resulting tree; otherwise NULL_TREE is returned.
This function is deliberately asymmetric as it recurses on SSA_DEFs
in the first comparison but not the second.  */
 
 static tree
 and_comparisons_1 (enum tree_code code1, tree op1a, tree op1b,
   enum tree_code code2, tree op2a, tree op2b)
 {
+  tree truth_type = boolean_type_node;
+  if (TREE_CODE (TREE_TYPE (op1a)) == VECTOR_TYPE)
+{
+  tree vec_type = TREE_TYPE (op1a);
+  tree elem = lang_hooks.types.type_for_size
+   (GET_MODE_BITSIZE (TYPE_MODE (TREE_TYPE (vec_type))), 0);
+  truth_type = build_opaque_vector_type (elem,
+TYPE_VECTOR_SUBPARTS (vec_type));
+}
+
   /* First check for ((x CODE1 y) AND (x CODE2 y)).  */
   if (operand_equal_p (op1a, op2a, 0)
   && operand_equal_p (op1b, op2b, 0))
 {
   /* Result will be either NULL_TREE, or a combined comparison.  */
   tree t = combine_comparisons (UNKNOWN_LOCATION,
TRUTH_ANDIF_EXPR, code1, code2,
-   boolean_type_node, op1a, op1b);
+   truth_type, op1a, op1b);
   if (t)
return t;
 }
 
   /* Likewise the swapped case of the above.  */
   if (operand_equal_p (op1a, op2b, 0)
   && operand_equal_p (op1b, op2a, 0))
 {
   /* Result will be either NULL_TREE, or a combined comparison.  */
   tree t = combine_comparisons (UNKNOWN_LOCATION,
TRUTH_ANDIF_EXPR, code1,
swap_tree_comparison (code2),
-   boolean_type_node, op1a, op1b);
+   truth_type, op1a, op1b);
   if (t)
return t;
 }
 
   /* If both comparisons are of the same value against constants, we might
  be able to merge them.  */
   if (operand_equal_p (op1a, op2a, 0)
   && TREE_CODE (op1b) == INTEGER_CST
   && TREE_CODE (op2b) == INTEGER_CST)
 {
@@ -2147,41 +2158,51 @@ or_var_with_comparison_1 (gimple stmt,
(OP1A CODE1 OP1B) and (OP2A CODE2 OP2B), respectively.
If this can be done without constructing an intermediate value,
return the resulting tree; otherwise NULL_TREE is returned.
This function is deliberately asymmetric as it recurses on SSA_DEFs
in the first comparison but not the second.  */
 
 static tree
 or_comparisons_1 (enum tree_code code1, tree op1a, tree op1b,
  enum tree_code code2, tree op2a, tree op2b)
 {
+  tree truth_type = boolean_type_node;
+  if (TREE_CODE (TREE_TYPE (op1a)) == VECTOR_TYPE)
+{
+  tree vec_type = TREE_TYPE (op1a);
+  tree elem = lang_hooks.types.type_for_size
+   (GET_MODE_BITSIZE (TYPE_MODE (TREE_TYPE (vec_type))), 0);
+  truth_type = build_opaque_vector_type (elem,
+TYPE_VECTOR_SUBPARTS (vec_type));
+}
+
   /* First check for ((x CODE1 y) OR (x CODE2 y)).  */
   if (operand_equal_p (op1a, op2a, 0)
   && operand_equal_p (op1b, op2b, 0))
 {
   /* Result will be either NULL_TREE, or a combined comparison.  */
   tree t = combine_comparisons (UNKNOWN_LOC

Re: [PATCH] Prevent cselib substitution of FP, SP, SFP

2012-09-14 Thread Carrot Wei
Hi Jakub

I have run it on 4.6, it passes the following testing:

x86-64 bootstrap
x86-64 regression test
regression test on arm qemu

Is it OK for gcc4.6?

Ahmad, is it OK for google/gcc-4_6/ and google/gcc-4_6-mobile ?

thanks
Carrot

On Wed, Sep 12, 2012 at 2:01 PM, Carrot Wei  wrote:
> Hi Jakub
>
> The same problem also affects gcc4.6,
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54398. Could this be
> ported to 4.6 branch?
>
> thanks
> Carrot
>
> On Mon, Feb 13, 2012 at 11:54 AM, Jakub Jelinek  wrote:
>>
>> On Wed, Jan 04, 2012 at 05:21:38PM +, Marcus Shawcroft wrote:
>> > Alias analysis by DSE based on CSELIB expansion assumes that
>> > references to the stack frame from different base registers (ie FP, SP)
>> > never alias.
>> >
>> > The comment block in cselib explains that cselib does not allow
>> > substitution of FP, SP or SFP specifically in order not to break DSE.
>>
>> Looks reasonable, apart from coding style (no spaces around -> and
>> no {} around return p->loc;), I just wonder if having a separate
>> loop in expand_loc just for this isn't too expensive.  On sane targets
>> IMHO hard frame pointer in the prologue should be initialized from sp, not
>> the other way around, thus hard frame pointer based VALUEs should have
>> hard frame pointer earlier in the locs list (when there is
>> hfp = sp (+ optionally some const)
>> insn, we first cselib_lookup_from_insn the rhs and add to locs
>> of the new VALUE (plus (VALUE of sp) (const_int)), then process the
>> lhs and add it to locs, moving the plus to locs->next).
>> So I think the following patch could be enough (bootstrapped/regtested
>> on x86_64-linux and i686-linux).
>> There is AVR though, which has really weirdo prologue - PR50063,
>> but I think it should just use UNSPEC for that or something similar,
>> setting sp from hfp seems unnecessary and especially for values with long
>> locs chains could make cselib more expensive.
>>
>> Richard, what do you think about this?
>>
>> 2012-02-13  Jakub Jelinek  
>>
>> * cselib.c (expand_loc): Return sp, fp, hfp or cfa base reg right
>> away if seen.
>>
>> --- gcc/cselib.c.jj 2012-02-13 11:07:15.0 +0100
>> +++ gcc/cselib.c2012-02-13 18:15:17.531776145 +0100
>> @@ -1372,8 +1372,18 @@ expand_loc (struct elt_loc_list *p, stru
>>unsigned int regno = UINT_MAX;
>>struct elt_loc_list *p_in = p;
>>
>> -  for (; p; p = p -> next)
>> +  for (; p; p = p->next)
>>  {
>> +  /* Return these right away to avoid returning stack pointer based
>> +expressions for frame pointer and vice versa, which is something
>> +that would confuse DSE.  See the comment in cselib_expand_value_rtx_1
>> +for more details.  */
>> +  if (REG_P (p->loc)
>> + && (REGNO (p->loc) == STACK_POINTER_REGNUM
>> + || REGNO (p->loc) == FRAME_POINTER_REGNUM
>> + || REGNO (p->loc) == HARD_FRAME_POINTER_REGNUM
>> + || REGNO (p->loc) == cfa_base_preserved_regno))
>> +   return p->loc;
>>/* Avoid infinite recursion trying to expand a reg into a
>>  the same reg.  */
>>if ((REG_P (p->loc))
>>
>>
>> Jakub


Re: [PATCH] Prevent cselib substitution of FP, SP, SFP

2012-09-14 Thread Jakub Jelinek
On Fri, Sep 14, 2012 at 09:39:43AM -0700, Carrot Wei wrote:
> I have run it on 4.6, it passes the following testing:
> 
> x86-64 bootstrap
> x86-64 regression test
> regression test on arm qemu
> 
> Is it OK for gcc4.6?

Yes.

Jakub


Re: vector comparisons in C++

2012-09-14 Thread Jason Merrill

On 09/14/2012 12:33 PM, Marc Glisse wrote:

Ok, I'll open a bugzilla to remember to try that later.


OK.


The attached just finished the testsuite (changelog unchanged).


This patch is OK.

Jason




Re: [PATCH][ARM] Require that one of the operands be a register in thumb2_movdf_vfp

2012-09-14 Thread Richard Earnshaw
On 14/09/12 15:58, Рубен Бучацкий wrote:
> Hi,
> 
> There are some Thumb2 patterns in vfp.md which are duplicated with only
> minor changes from their ARM equivalents.  This patch adds a requirement
> to the "*thumb2_movdf_vfp" pattern that one of the operands should be a
> register, as in the ARM "*movdf_vfp" pattern.
> 
> There is one functional change: the ARM "*movdf_vfp" pattern disallows the
> [mem]=const store case while the Thumb2 one does not.  The ARM version is
> more desirable, because [mem]=const would be split anyway and would add a
> temporary VFP register.
> 
> Example:
> 
> double delta;
> 
> void foo ()
> {
>   delta = 0.0;
> }
> 
> Generated code (without patch):
> Compiler options: g++ -mcpu=cortex-a8 -mtune=cortex-a8 -mfpu=neon
> -mfloat-abi=softfp -mthumb -O2
> 
> movs    r0, #0
> movs    r1, #0
> fmdrr   d16, r0, r1
> fstd    d16, [r3, #0]
> 
> Generated code (with patch):
> 
> movs    r0, #0
> movs    r1, #0
> strd    r0, [r3]
> 
> Regtested with QEMU-ARM.
> 
> Code size: SPEC2K INT and FP sizes both decrease by ~0.2% (up to 2.64%
> on 252.eon).
> Performance: 252.eon +4.3%, other tests almost unaffected;
> SPEC2K FP grows by 0.16% (up to 1.8% on 183.equake).
> (Detailed data below).
> 
> Ok for trunk?
> 
> 

OK.

R.





Re: [PATCH v2] rs6000: Add 2 built-ins to read the Time Base Register on PowerPC

2012-09-14 Thread Segher Boessenkool

Hi Tulio,


@@ -14103,6 +14105,84 @@
   ""
   "")

+(define_expand "rs6000_get_timebase"
+  [(use (match_operand:DI 0 "gpc_reg_operand" ""))]
+  ""
+  "
+{
+  if (TARGET_POWERPC64)
+emit_insn (gen_rs6000_get_timebase_ppc64 (operands[0]));
+  else
+emit_insn (gen_rs6000_get_timebase_ppc32 (operands[0]));
+  DONE;
+}")


Please don't put quotes around the C block.


+(define_insn "rs6000_get_timebase_ppc32"
+  [(set (match_operand:DI 0 "gpc_reg_operand" "=r")
+(unspec_volatile:DI [(const_int 0)] UNSPECV_GETTB))
+   (clobber (match_scratch:SI 1 "=r"))
+   (clobber (match_scratch:CC 2 "=x"))]


You can do "y" instead, to allow all CRn fields.


+  "!TARGET_POWERPC64"
+{
+  if (WORDS_BIG_ENDIAN)
+if (TARGET_MFCRF)
+  {
+return "mfspr %0, 269\;"
+  "mfspr %L0, 268\;"
+  "mfspr %1, 269\;"
+  "cmpw %0,%1\;"
+  "bne- $-16";
+  }
+else
+  {
+return "mftbu %0\;"
+  "mftb %L0\;"
+  "mftbu %1\;"
+  "cmpw %0,%1\;"
+  "bne- $-16";
+  }
+  else
+if (TARGET_MFCRF)
+  {
+return "mfspr %L0, 269\;"
+  "mfspr %0, 268\;"
+  "mfspr %1, 269\;"
+  "cmpw %L0,%1\;"
+  "bne- $-16";
+  }
+else
+  {
+return "mftbu %L0\;"
+  "mftb %0\;"
+  "mftbu %1\;"
+  "cmpw %L0,%1\;"
+  "bne- $-16";
+  }
+})


I don't think TARGET_MFCRF is correct.  For example, if you use
-mcpu=powerpc64 (which doesn't set this flag) you will get code
that does not run on the newer machines.


diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index e850266..b3fc236 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -13660,6 +13660,8 @@ float __builtin_rsqrtf (float);
 double __builtin_recipdiv (double, double);
 double __builtin_rsqrt (double);
 long __builtin_bpermd (long, long);
+uint64_t __builtin_ppc_get_timebase ();
+unsigned long __builtin_ppc_mftb ();
 @end smallexample


This is section "PowerPC AltiVec/VSX Built-in Functions"; there
should be a preceding separate "PowerPC Built-in Functions" section.
Some of the current documentation should go there, too.

@@ -13671,6 +13673,14 @@ The @code{__builtin_recipdiv}, and @code{__builtin_recipdivf}

 functions generate multiple instructions to implement division using
 the reciprocal estimate instructions.

+The @code{__builtin_ppc_get_timebase} and @code{__builtin_ppc_mftb}
+functions generate instructions to read the Time Base Register. The
+@code{__builtin_ppc_get_timebase} function may generate multiple
+instructions and always return the 64 bits of the Time Base Register. The


s/return/returns/

+@code{__builtin_ppc_mftb} function always generate one instruction and


generates

+returns the Time Base Register value as an unsigned long, throwing away

+the most significant word on 32-bit environments.



Looks good other than those nits, and the MFCRF thing.  Oh, and you
didn't say how you tested it (what targets).


Segher



Fix C ICEs on truthvalue conversions of some expressions with integer constant operands (PR c/54103)

2012-09-14 Thread Joseph S. Myers
Bug 54103 is a C front-end regression involving ICEs in certain cases
of expressions such as 0 / 0 being converted to truthvalues.

c_common_truthvalue_conversion, or
c_objc_common_truthvalue_conversion, received 0 / 0 directly as a
TRUNC_DIV_EXPR, without any wrapping C_MAYBE_CONST_EXPR to indicate
that the expression has integer constant operands, so did not know
that it had to produce a result valid for an expression with integer
constant operands (that is, one satisfying EXPR_INT_CONST_OPERANDS -
meaning either an INTEGER_CST, or a C_MAYBE_CONST_EXPR with
C_MAYBE_CONST_EXPR_INT_OPERANDS set, but not anything with
C_MAYBE_CONST_EXPR deeper in the expression).  As a result it returned
an expression with a C_MAYBE_CONST_EXPR not at top level - but the
callers assumed this could not occur, so created their own wrapping
C_MAYBE_CONST_EXPR (and C_MAYBE_CONST_EXPR should never be nested).

This is fixed by ensuring that a properly marked expression gets
passed to c_objc_common_truthvalue_conversion; ensuring
c_objc_common_truthvalue_conversion handles such expressions directly
so c_common_truthvalue_conversion doesn't have to; and avoiding direct
calls to c_common_truthvalue_conversion in relevant places.

Bootstrapped with no regressions on x86_64-unknown-linux-gnu.  Applied
to mainline.  Will apply to 4.7 (when not frozen) and 4.6 branches
subject to testing there.

c:
2012-09-14  Joseph Myers  

PR c/54103
* c-typeck.c (build_unary_op): Pass original argument of
TRUTH_NOT_EXPR to c_objc_common_truthvalue_conversion, then remove
any C_MAYBE_CONST_EXPR, if it has integer operands.
(build_binary_op): Pass original arguments of TRUTH_ANDIF_EXPR,
TRUTH_ORIF_EXPR, TRUTH_AND_EXPR, TRUTH_OR_EXPR and TRUTH_XOR_EXPR
to c_objc_common_truthvalue_conversion, then remove any
C_MAYBE_CONST_EXPR, if they have integer operands.  Use
c_objc_common_truthvalue_conversion not
c_common_truthvalue_conversion.
(c_objc_common_truthvalue_conversion): Build NE_EXPR directly and
call note_integer_operands for arguments with integer operands
that are not integer constants.

testsuite:
2012-09-14  Joseph Myers  

PR c/54103
* gcc.c-torture/compile/pr54103-1.c,
gcc.c-torture/compile/pr54103-2.c,
gcc.c-torture/compile/pr54103-3.c,
gcc.c-torture/compile/pr54103-4.c,
gcc.c-torture/compile/pr54103-5.c,
gcc.c-torture/compile/pr54103-6.c: New tests.
* gcc.dg/c90-const-expr-8.c: Update expected column number.

Index: c/c-typeck.c
===
--- c/c-typeck.c(revision 191291)
+++ c/c-typeck.c(working copy)
@@ -3553,7 +3553,13 @@ build_unary_op (location_t location,
"wrong type argument to unary exclamation mark");
  return error_mark_node;
}
-  arg = c_objc_common_truthvalue_conversion (location, arg);
+  if (int_operands)
+   {
+ arg = c_objc_common_truthvalue_conversion (location, xarg);
+ arg = remove_c_maybe_const_expr (arg);
+   }
+  else
+   arg = c_objc_common_truthvalue_conversion (location, arg);
   ret = invert_truthvalue_loc (location, arg);
   /* If the TRUTH_NOT_EXPR has been folded, reset the location.  */
   if (EXPR_P (ret) && EXPR_HAS_LOCATION (ret))
@@ -9807,8 +9813,20 @@ build_binary_op (location_t location, enum tree_co
 but that does not mean the operands should be
 converted to ints!  */
  result_type = integer_type_node;
- op0 = c_common_truthvalue_conversion (location, op0);
- op1 = c_common_truthvalue_conversion (location, op1);
+ if (op0_int_operands)
+   {
+ op0 = c_objc_common_truthvalue_conversion (location, orig_op0);
+ op0 = remove_c_maybe_const_expr (op0);
+   }
+ else
+   op0 = c_objc_common_truthvalue_conversion (location, op0);
+ if (op1_int_operands)
+   {
+ op1 = c_objc_common_truthvalue_conversion (location, orig_op1);
+ op1 = remove_c_maybe_const_expr (op1);
+   }
+ else
+   op1 = c_objc_common_truthvalue_conversion (location, op1);
  converted = 1;
  boolean_op = true;
}
@@ -10520,13 +10538,18 @@ c_objc_common_truthvalue_conversion (location_t lo
 
   int_const = (TREE_CODE (expr) == INTEGER_CST && !TREE_OVERFLOW (expr));
   int_operands = EXPR_INT_CONST_OPERANDS (expr);
-  if (int_operands)
-expr = remove_c_maybe_const_expr (expr);
+  if (int_operands && TREE_CODE (expr) != INTEGER_CST)
+{
+  expr = remove_c_maybe_const_expr (expr);
+  expr = build2 (NE_EXPR, integer_type_node, expr,
+convert (TREE_TYPE (expr), integer_zero_node));
+  expr = note_integer_operands (expr);
+}
+  else
+/* ??? Should we also give an error for vectors rather tha

[PATCH, AArch64] Implement fnma, fms and fnms standard patterns

2012-09-14 Thread Ian Bolton
The following standard pattern names were implemented by simply
renaming some existing patterns:

* fnma
* fms
* fnms

I have added an extra pattern for when we don't care about
signed zero, so we can do "-fma (a,b,c)" more efficiently.
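
To illustrate why the extra pattern is only valid when signed zeros are
ignored, here is a minimal sketch in plain C (no AArch64 needed).  The
function names are made up for this example; it only models the two
arithmetic forms from the patch comment, -(a * b + c) and -a * b - c,
without any fusing.

```c
#include <stdint.h>
#include <string.h>

/* Return 1 if the IEEE sign bit of x is set (this also detects -0.0).
   Assumes double is a 64-bit IEEE binary64 value.  */
int sign_bit (double x)
{
  uint64_t bits;
  memcpy (&bits, &x, sizeof bits);
  return (int) (bits >> 63);
}

/* Form 1: negate the result of the multiply-add, as in "- fma (a, b, c)".  */
double neg_of_sum (double a, double b, double c)
{
  volatile double sum = a * b + c;   /* volatile: keep the compiler from reassociating */
  return -sum;
}

/* Form 2: negate the product, then subtract c, i.e. -a * b - c.  */
double negated_operands (double a, double b, double c)
{
  volatile double prod = -a * b;
  return prod - c;
}
```

For a = 1, b = 1, c = -1 both forms compare equal, but under the default
rounding mode the first yields -0.0 and the second +0.0, so the rewrite
is only safe when the sign of zero does not matter.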

Regression testing all passed.
 
OK to commit?

Cheers,
Ian


2012-09-14  Ian Bolton  

gcc/
* config/aarch64/aarch64.md (fmsub4): Renamed
to fnma4.
* config/aarch64/aarch64.md (fnmsub4): Renamed
to fms4.
* config/aarch64/aarch64.md (fnmadd4): Renamed
to fnms4.
* config/aarch64/aarch64.md (*fnmadd4): New pattern.

testsuite/
* gcc.target/aarch64/fmadd.c: Added extra tests.
* gcc.target/aarch64/fnmadd-fastmath.c: New test.


diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 3fbebf7..33815ff 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -2506,7 +2506,7 @@
(set_attr "mode" "")]
 )
 
-(define_insn "*fmsub4"
+(define_insn "fnma4"
   [(set (match_operand:GPF 0 "register_operand" "=w")
(fma:GPF (neg:GPF (match_operand:GPF 1 "register_operand" "w"))
 (match_operand:GPF 2 "register_operand" "w")
@@ -2517,7 +2517,7 @@
(set_attr "mode" "")]
 )
 
-(define_insn "*fnmsub4"
+(define_insn "fms4"
   [(set (match_operand:GPF 0 "register_operand" "=w")
 (fma:GPF (match_operand:GPF 1 "register_operand" "w")
 (match_operand:GPF 2 "register_operand" "w")
@@ -2528,7 +2528,7 @@
(set_attr "mode" "")]
 )
 
-(define_insn "*fnmadd4"
+(define_insn "fnms4"
   [(set (match_operand:GPF 0 "register_operand" "=w")
(fma:GPF (neg:GPF (match_operand:GPF 1 "register_operand" "w"))
 (match_operand:GPF 2 "register_operand" "w")
@@ -2539,6 +2539,18 @@
(set_attr "mode" "")]
 )
 
+;; If signed zeros are ignored, -(a * b + c) = -a * b - c.
+(define_insn "*fnmadd4"
+  [(set (match_operand:GPF 0 "register_operand")
+   (neg:GPF (fma:GPF (match_operand:GPF 1 "register_operand")
+ (match_operand:GPF 2 "register_operand")
+ (match_operand:GPF 3 "register_operand"))))]
+  "!HONOR_SIGNED_ZEROS (mode) && TARGET_FLOAT"
+  "fnmadd\\t%0, %1, %2, %3"
+  [(set_attr "v8type" "fmadd")
+   (set_attr "mode" "")]
+)
+
 ;; ---
 ;; Floating-point conversions
 ;; ---
diff --git a/gcc/testsuite/gcc.target/aarch64/fmadd.c b/gcc/testsuite/gcc.target/aarch64/fmadd.c
index 3b4..39975db 100644
--- a/gcc/testsuite/gcc.target/aarch64/fmadd.c
+++ b/gcc/testsuite/gcc.target/aarch64/fmadd.c
@@ -4,15 +4,52 @@
 extern double fma (double, double, double);
 extern float fmaf (float, float, float);
 
-double test1 (double x, double y, double z)
+double test_fma1 (double x, double y, double z)
 {
   return fma (x, y, z);
 }
 
-float test2 (float x, float y, float z)
+float test_fma2 (float x, float y, float z)
 {
   return fmaf (x, y, z);
 }
 
+double test_fnma1 (double x, double y, double z)
+{
+  return fma (-x, y, z);
+}
+
+float test_fnma2 (float x, float y, float z)
+{
+  return fmaf (-x, y, z);
+}
+
+double test_fms1 (double x, double y, double z)
+{
+  return fma (x, y, -z);
+}
+
+float test_fms2 (float x, float y, float z)
+{
+  return fmaf (x, y, -z);
+}
+
+double test_fnms1 (double x, double y, double z)
+{
+  return fma (-x, y, -z);
+}
+
+float test_fnms2 (float x, float y, float z)
+{
+  return fmaf (-x, y, -z);
+}
+
 /* { dg-final { scan-assembler-times "fmadd\td\[0-9\]" 1 } } */
 /* { dg-final { scan-assembler-times "fmadd\ts\[0-9\]" 1 } } */
+/* { dg-final { scan-assembler-times "fmsub\td\[0-9\]" 1 } } */
+/* { dg-final { scan-assembler-times "fmsub\ts\[0-9\]" 1 } } */
+/* { dg-final { scan-assembler-times "fnmsub\td\[0-9\]" 1 } } */
+/* { dg-final { scan-assembler-times "fnmsub\ts\[0-9\]" 1 } } */
+/* { dg-final { scan-assembler-times "fnmadd\td\[0-9\]" 1 } } */
+/* { dg-final { scan-assembler-times "fnmadd\ts\[0-9\]" 1 } } */
+
diff --git a/gcc/testsuite/gcc.target/aarch64/fnmadd-fastmath.c b/gcc/testsuite/gcc.target/aarch64/fnmadd-fastmath.c
new file mode 100644
index 000..9c115df
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/fnmadd-fastmath.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -ffast-math" } */
+
+extern double fma (double, double, double);
+extern float fmaf (float, float, float);
+
+double test_fma1 (double x, double y, double z)
+{
+  return - fma (x, y, z);
+}
+
+float test_fma2 (float x, float y, float z)
+{
+  return - fmaf (x, y, z);
+}
+
+/* { dg-final { scan-assembler-times "fnmadd\td\[0-9\]" 1 } } */
+/* { dg-final { scan-assembler-times "fnmadd\ts\[0-9\]" 1 } } */
+


Re: [PATCH, AArch64] Implement fnma, fms and fnms standard patterns

2012-09-14 Thread Richard Earnshaw
On 14/09/12 18:05, Ian Bolton wrote:
> The following standard pattern names were implemented by simply
> renaming some existing patterns:
> 
> * fnma
> * fms
> * fnms
> 
> I have added an extra pattern for when we don't care about
> signed zero, so we can do "-fma (a,b,c)" more efficiently.
> 
> Regression testing all passed.
>  
> OK to commit?
> 
> Cheers,
> Ian
> 
> 
> 2012-09-14  Ian Bolton  
> 
> gcc/
>   * config/aarch64/aarch64.md (fmsub4): Renamed
>   to fnma4.
>   * config/aarch64/aarch64.md (fnmsub4): Renamed
>   to fms4.
>   * config/aarch64/aarch64.md (fnmadd4): Renamed
>   to fnms4.
>   * config/aarch64/aarch64.md (*fnmadd4): New pattern.
> 
> testsuite/
>   * gcc.target/aarch64/fmadd.c: Added extra tests.
>   * gcc.target/aarch64/fnmadd-fastmath.c: New test.
> 

OK.

R.



Re: [Patch ARM] big-endian support for Neon vext tests

2012-09-14 Thread Mike Stump
On Sep 14, 2012, at 8:23 AM, Christophe Lyon  wrote:
> OK. So I will remove the noinline stuff, and update the bugzilla entry
> with the name of this testcase.
> Should I really leave the test as FAILED rather than XFAIL?

No.  Best to either add the test case when that bug is fixed, or xfail it.


Re: [PATCH] fix bootstrap on darwin to adapt to VEC changes

2012-09-14 Thread Mike Stump
On Sep 11, 2012, at 7:55 AM, Jack Howarth  wrote:
> The attached patch fixes the bootstrap on darwin to cope with the
> VEC changes to remove unnecessary VEC function overloads. Tested on
> x86_64-apple-darwin12. Okay for gcc trunk?

Ok.


Re: vector comparisons in C++

2012-09-14 Thread Marc Glisse

On Fri, 14 Sep 2012, Jason Merrill wrote:

[decltype of opaque vector]

I think a simple TYPE_MAIN_VARIANT will do, I just need to find where to 
add it, and how much to constrain that change. Type deduction in templates 
and auto already seem to remove opacity :-)



This patch is OK.


Committed. Thank you for all the quick comments,

--
Marc Glisse


Re: [PATCH, AArch64] Implement ffs standard pattern

2012-09-14 Thread Richard Earnshaw
On 14/09/12 16:26, Ian Bolton wrote:
> I've implemented the standard pattern ffs, which leads to
> __builtin_ffs being generated with 4 instructions instead
> of 5 instructions.
> 
> Regression tests and my new test pass.
> 
> OK to commit?
> 
> 
> Cheers,
> Ian
> 
> 
> 
> 2012-09-14  Ian Bolton  
>  
> gcc/
>   * config/aarch64/aarch64.md (csinc3): Make it into a
>   named pattern.
>   * config/aarch64/aarch64.md (ffs2): New pattern.
> 
> testsuite/
> 
>   * gcc.target/aarch64/ffs.c: New test.
> 

OK.

R.



Re: [PATCH v2] rs6000: Add 2 built-ins to read the Time Base Register on PowerPC

2012-09-14 Thread Tulio Magno Quites Machado Filho

Segher Boessenkool  writes:


Hi Tulio,


Hi Segher,


+(define_expand "rs6000_get_timebase"
+  [(use (match_operand:DI 0 "gpc_reg_operand" ""))]
+  ""
+  "
+{
+  if (TARGET_POWERPC64)
+emit_insn (gen_rs6000_get_timebase_ppc64 (operands[0]));
+  else
+emit_insn (gen_rs6000_get_timebase_ppc32 (operands[0]));
+  DONE;
+}")


Please don't put quotes around the C block.


Fixed.




+(define_insn "rs6000_get_timebase_ppc32"
+  [(set (match_operand:DI 0 "gpc_reg_operand" "=r")
+(unspec_volatile:DI [(const_int 0)] UNSPECV_GETTB))
+   (clobber (match_scratch:SI 1 "=r"))
+   (clobber (match_scratch:CC 2 "=x"))]


You can do "y" instead, to allow all CRn fields.


Fixed.


+  "!TARGET_POWERPC64"
+{
+  if (WORDS_BIG_ENDIAN)
+if (TARGET_MFCRF)
+  {
+return "mfspr %0, 269\;"
+  "mfspr %L0, 268\;"
+  "mfspr %1, 269\;"
+  "cmpw %0,%1\;"
+  "bne- $-16";
+  }
+else
+  {
+return "mftbu %0\;"
+  "mftb %L0\;"
+  "mftbu %1\;"
+  "cmpw %0,%1\;"
+  "bne- $-16";
+  }
+  else
+if (TARGET_MFCRF)
+  {
+return "mfspr %L0, 269\;"
+  "mfspr %0, 268\;"
+  "mfspr %1, 269\;"
+  "cmpw %L0,%1\;"
+  "bne- $-16";
+  }
+else
+  {
+return "mftbu %L0\;"
+  "mftb %0\;"
+  "mftbu %1\;"
+  "cmpw %L0,%1\;"
+  "bne- $-16";
+  }
+})


I don't think TARGET_MFCRF is correct.  For example, if you use
-mcpu=powerpc64 (which doesn't set this flag) you will get code
that does not run on the newer machines.


Sorry, but it seems to be working here...
I explain how I tested this at the end of the email.


diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index e850266..b3fc236 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -13660,6 +13660,8 @@ float __builtin_rsqrtf (float);
 double __builtin_recipdiv (double, double);
 double __builtin_rsqrt (double);
 long __builtin_bpermd (long, long);
+uint64_t __builtin_ppc_get_timebase ();
+unsigned long __builtin_ppc_mftb ();
 @end smallexample


This is section "PowerPC AltiVec/VSX Built-in Functions"; there
should be a preceding separate "PowerPC Built-in Functions" section.
Some of the current documentation should go there, too.


Agreed.
I'm changing this.


@@ -13671,6 +13673,14 @@ The @code{__builtin_recipdiv}, and @code{__builtin_recipdivf}
 functions generate multiple instructions to implement division using
 the reciprocal estimate instructions.

+The @code{__builtin_ppc_get_timebase} and @code{__builtin_ppc_mftb}
+functions generate instructions to read the Time Base Register. The
+@code{__builtin_ppc_get_timebase} function may generate multiple
+instructions and always return the 64 bits of the Time Base Register. The


s/return/returns/


Fixed.

+@code{__builtin_ppc_mftb} function always generate one instruction and


generates


Fixed.

+returns the Time Base Register value as an unsigned long, throwing away
+the most significant word on 32-bit environments.



Looks good other than those nits, and the MFCRF thing.  Oh, and you
didn't say how you tested it (what targets).


Sorry, I ran the test suite on a Power7 running a 64-bit environment.
Then I manually built and ran a modified copy of the 2 included tests
that only prints the TBR values 3 times, to make sure they were
increasing.  I did this for both 32 and 64 bits on a Power3 and a Power7,
making sure to use the correct -mcpu.
Finally, I checked that the generated assembly was right for Power3,
Power7 and a few other -mcpu values I picked at random.

Here are some snippets of the code generated for
__builtin_ppc_get_timebase:

- -m64 -mcpu=power7
.L.main:
std 31,-8(1)
stdu 1,-80(1)
mr 31,1
mfspr 9, 268
std 9,56(31)
li 9,0
stw 9,48(31)
b .L2

- -m32 -mcpu=power3
main:
stwu 1,-48(1)
stw 31,44(1)
mr 31,1
mftbu 8
mftb 9
mftbu 10
cmpw 8,10
bne- $-16
stw 8,24(31)
stw 9,28(31)
lfd 0,24(31)
stfd 0,16(31)
li 9,0
stw 9,8(31)
b .L2

- -m64 -mcpu=power3
.L.main:
std 31,-8(1)
stdu 1,-80(1)
mr 31,1
mftb 9
std 9,56(31)
li 9,0
stw 9,48(31)
b .L2
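
For readers unfamiliar with the 32-bit sequence: the
mftbu/mftb/mftbu/cmpw/bne- loop reads the upper half, the lower half,
and the upper half again, and retries if the upper half changed in
between (that is, if the lower half wrapped during the read).  Below is
a minimal sketch of that logic in C.  It runs against a simulated
counter; sim_timebase and the two sim_read_* helpers are hypothetical
stand-ins for the register and the mftbu/mftb instructions, not real
APIs.

```c
#include <stdint.h>

/* Hypothetical stand-in for the 64-bit Time Base Register.  */
uint64_t sim_timebase;

/* Simulated "mftbu": return the upper 32 bits, then let time advance.  */
uint32_t sim_read_upper (void)
{
  uint32_t v = (uint32_t) (sim_timebase >> 32);
  sim_timebase += 1;
  return v;
}

/* Simulated "mftb": return the lower 32 bits, then let time advance.  */
uint32_t sim_read_lower (void)
{
  uint32_t v = (uint32_t) sim_timebase;
  sim_timebase += 1;
  return v;
}

/* Read a consistent 64-bit value through two 32-bit halves.  */
uint64_t read_timebase_32bit (void)
{
  uint32_t hi, lo, hi_again;

  do
    {
      hi = sim_read_upper ();
      lo = sim_read_lower ();
      hi_again = sim_read_upper ();
    }
  while (hi != hi_again);   /* lower half wrapped in between: retry */

  return ((uint64_t) hi << 32) | lo;
}
```

Without the retry, a read that straddles a wrap of the lower half could
pair an old upper word with a new lower word and return a value that is
off by almost 2^32 ticks.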

Thanks,

--
Tulio Magno



[PATCH, ARM] Prefer vld1.64/vst1.64 over vldm/vstm

2012-09-14 Thread Ulrich Weigand
Hello,

this patch changes the ARM back-end to use vld1.64/vst1.64 instructions
instead of vldm/vstm  -where possible-  to implement double-word moves.

The main benefit of this is that it allows the compiler to provide
appropriate alignment hints, which may improve performance.

The patch is based on an earlier version by Ramana.  This version has
now successfully passed regression testing and benchmarking (no
performance regressions found, improvements of up to 2.5% on certain
benchmarks).

Tested on arm-linux-gnueabi.
OK for mainline?

Bye,
Ulrich


2012-09-14  Ramana Radhakrishnan  
Ulrich Weigand  

* config/arm/arm.c (output_move_neon): Update comment.
Use vld1.64/vst1.64 instead of vldm/vstm where possible.
(neon_vector_mem_operand): Support double-word modes.
* config/arm/neon.md (*neon_mov VD): Call output_move_neon
instead of output_move_vfp.  Change constraint from Uv to Un.

Index: gcc-head/gcc/config/arm/arm.c
===
--- gcc-head.orig/gcc/config/arm/arm.c  2012-09-14 19:38:20.0 +0200
+++ gcc-head/gcc/config/arm/arm.c   2012-09-14 19:40:51.0 +0200
@@ -9629,7 +9629,11 @@ neon_vector_mem_operand (rtx op, int typ
   && REG_MODE_OK_FOR_BASE_P (XEXP (ind, 0), VOIDmode)
   && CONST_INT_P (XEXP (ind, 1))
   && INTVAL (XEXP (ind, 1)) > -1024
-  && INTVAL (XEXP (ind, 1)) < 1016
+  /* For quad modes, we restrict the constant offset to be slightly less
+than what the instruction format permits.  We have no such constraint
+on double mode offsets.  (This must match arm_legitimate_index_p.)  */
+  && (INTVAL (XEXP (ind, 1))
+ < (VALID_NEON_QREG_MODE (GET_MODE (op))? 1016 : 1024))
   && (INTVAL (XEXP (ind, 1)) & 3) == 0)
 return TRUE;
 
@@ -14573,15 +14577,16 @@ output_move_vfp (rtx *operands)
   return "";
 }
 
-/* Output a Neon quad-word load or store, or a load or store for
-   larger structure modes.
+/* Output a Neon double-word or quad-word load or store, or a load
+   or store for larger structure modes.
 
WARNING: The ordering of elements is weird in big-endian mode,
-   because we use VSTM, as required by the EABI.  GCC RTL defines
-   element ordering based on in-memory order.  This can be differ
-   from the architectural ordering of elements within a NEON register.
-   The intrinsics defined in arm_neon.h use the NEON register element
-   ordering, not the GCC RTL element ordering.
+   because the EABI requires that vectors stored in memory appear
+   as though they were stored by a VSTM, as required by the EABI.
+   GCC RTL defines element ordering based on in-memory order.
+   This can be different from the architectural ordering of elements
+   within a NEON register. The intrinsics defined in arm_neon.h use the
+   NEON register element ordering, not the GCC RTL element ordering.
 
For example, the in-memory ordering of a big-endian a quadword
vector with 16-bit elements when stored from register pair {d0,d1}
@@ -14595,7 +14600,22 @@ output_move_vfp (rtx *operands)
  dN -> (rN+1, rN), dN+1 -> (rN+3, rN+2)
 
So that STM/LDM can be used on vectors in ARM registers, and the
-   same memory layout will result as if VSTM/VLDM were used.  */
+   same memory layout will result as if VSTM/VLDM were used.
+
+   Instead of VSTM/VLDM we prefer to use VST1.64/VLD1.64 where
+   possible, which allows use of appropriate alignment tags.
+   Note that the choice of "64" is independent of the actual vector
+   element size; this size simply ensures that the behavior is
+   equivalent to VSTM/VLDM in both little-endian and big-endian mode.
+
+   Due to limitations of those instructions, use of VST1.64/VLD1.64
+   is not possible if:
+- the address contains PRE_DEC, or
+- the mode refers to more than 4 double-word registers
+
+   In those cases, it would be possible to replace VSTM/VLDM by a
+   sequence of instructions; this is not currently implemented since
+   this is not certain to actually improve performance.  */
 
 const char *
 output_move_neon (rtx *operands)
@@ -14629,13 +14649,23 @@ output_move_neon (rtx *operands)
   switch (GET_CODE (addr))
 {
 case POST_INC:
-  templ = "v%smia%%?\t%%0!, %%h1";
-  ops[0] = XEXP (addr, 0);
+  /* We have to use vldm / vstm for too-large modes.  */
+  if (ARM_NUM_REGS (mode) / 2 > 4)
+   {
+ templ = "v%smia%%?\t%%0!, %%h1";
+ ops[0] = XEXP (addr, 0);
+   }
+  else
+   {
+ templ = "v%s1.64\t%%h1, %%A0";
+ ops[0] = mem;
+   }
   ops[1] = reg;
   break;
 
 case PRE_DEC:
-  /* FIXME: We should be using vld1/vst1 here in BE mode?  */
+  /* We have to use vldm / vstm in this case, since there is no
+pre-decrement form of the vld1 / vst1 instructions.  */
   templ = "v%smdb%%?\t%%0!, %%h1";
   ops[0] = XEXP (addr, 0);
   ops[1] = reg;
@@ -14679,7 

[PATCH, ARM] Use vld1/vst1 to implement vec_set/vec_extract

2012-09-14 Thread Ulrich Weigand
Hello,

following up on the prior patch, this patch exploits more opportunities to
generate the vld1 / vst1 family of instructions, this time to implement the
vec_set and vec_extract patterns with memory scalar operands.

Without the patch, vec_set/vec_extract only support register operands for the
scalar, possibly requiring extra moves.  In some cases we'd still get a vst1
instruction as a result, since combine would match a neon_vst1_lane pattern.
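
As a rough illustration of the source-level operations involved, here is
a sketch using the generic GCC vector extensions.  The function names
are invented for this example, and whether a given compiler actually
picks vld1/vst1 lane instructions for them depends on target and
options; the code only shows a lane being set from a scalar in memory
and a lane being extracted to a scalar in memory.

```c
/* Four 32-bit lanes, assuming int is 32 bits wide.  */
typedef int v4si __attribute__ ((vector_size (16)));

/* vec_set with a memory scalar source: insert *p into lane 1 of *v.  */
void set_lane1_from_memory (v4si *v, const int *p)
{
  (*v)[1] = *p;
}

/* vec_extract with a memory destination: store lane 2 of *v to *p.  */
void store_lane2_to_memory (const v4si *v, int *p)
{
  *p = (*v)[2];
}
```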

However, that pattern seems to be actually incorrect for big-endian systems
(due to a lane number ordering mismatch).  The patch therefore also changes
neon_vst1_lane to use an UNSPEC instead of vec_select to model its operation,
just like all the other NEON intrinsic patterns depending on lane numbers
already do.

Benchmarking showed only marginal improvements, but no regression.  It seems
useful to support this anyway ...

Tested on arm-linux-gnueabi.
OK for mainline?

Bye,
Ulrich


2012-09-14  Ulrich Weigand  

* config/arm/arm.c (arm_rtx_costs_1): Handle vec_extract and vec_set
patterns.
* config/arm/neon.md ("vec_set_internal"): Support memory source
operands, implemented via vld1 instruction.
("vec_extract"): Support memory destination operands, implemented
via vst1 instruction.
("neon_vst1_lane"): Use UNSPEC_VST1_LANE instead of vec_select.
* config/arm/predicates.md ("neon_lane_number"): Remove.

Index: gcc-head/gcc/config/arm/arm.c
===
--- gcc-head.orig/gcc/config/arm/arm.c  2012-09-14 19:40:51.0 +0200
+++ gcc-head/gcc/config/arm/arm.c   2012-09-14 19:41:14.0 +0200
@@ -7666,6 +7666,28 @@ arm_rtx_costs_1 (rtx x, enum rtx_code ou
   return true;
 
 case SET:
+  /* The vec_extract patterns accept memory operands that require an
+address reload.  Account for the cost of that reload to give the
+auto-inc-dec pass an incentive to try to replace them.  */
+  if (TARGET_NEON && MEM_P (SET_DEST (x))
+ && GET_CODE (SET_SRC (x)) == VEC_SELECT)
+   {
+ *total = rtx_cost (SET_DEST (x), code, 0, speed);
+ if (!neon_vector_mem_operand (SET_DEST (x), 2))
+   *total += COSTS_N_INSNS (1);
+ return true;
+   }
+  /* Likewise for the vec_set patterns.  */
+  if (TARGET_NEON && GET_CODE (SET_SRC (x)) == VEC_MERGE
+ && GET_CODE (XEXP (SET_SRC (x), 0)) == VEC_DUPLICATE
+ && MEM_P (XEXP (XEXP (SET_SRC (x), 0), 0)))
+   {
+ rtx mem = XEXP (XEXP (SET_SRC (x), 0), 0);
+ *total = rtx_cost (mem, code, 0, speed);
+ if (!neon_vector_mem_operand (mem, 2))
+   *total += COSTS_N_INSNS (1);
+ return true;
+   }
   return false;
 
 case UNSPEC:
Index: gcc-head/gcc/config/arm/neon.md
===
--- gcc-head.orig/gcc/config/arm/neon.md 2012-09-14 19:40:51.0 +0200
+++ gcc-head/gcc/config/arm/neon.md 2012-09-14 19:41:14.0 +0200
@@ -416,30 +416,33 @@
   [(set_attr "neon_type" "neon_vld1_1_2_regs")])
 
 (define_insn "vec_set_internal"
-  [(set (match_operand:VD 0 "s_register_operand" "=w")
+  [(set (match_operand:VD 0 "s_register_operand" "=w,w")
 (vec_merge:VD
   (vec_duplicate:VD
-(match_operand: 1 "s_register_operand" "r"))
-  (match_operand:VD 3 "s_register_operand" "0")
-  (match_operand:SI 2 "immediate_operand" "i")))]
+(match_operand: 1 "nonimmediate_operand" "Um,r"))
+  (match_operand:VD 3 "s_register_operand" "0,0")
+  (match_operand:SI 2 "immediate_operand" "i,i")))]
   "TARGET_NEON"
 {
   int elt = ffs ((int) INTVAL (operands[2])) - 1;
   if (BYTES_BIG_ENDIAN)
 elt = GET_MODE_NUNITS (mode) - 1 - elt;
   operands[2] = GEN_INT (elt);
-  
-  return "vmov.\t%P0[%c2], %1";
+
+  if (which_alternative == 0)
+return "vld1.\t{%P0[%c2]}, %A1";
+  else
+return "vmov.\t%P0[%c2], %1";
 }
-  [(set_attr "neon_type" "neon_mcr")])
+  [(set_attr "neon_type" "neon_vld1_vld2_lane,neon_mcr")])
 
 (define_insn "vec_set_internal"
-  [(set (match_operand:VQ 0 "s_register_operand" "=w")
+  [(set (match_operand:VQ 0 "s_register_operand" "=w,w")
 (vec_merge:VQ
   (vec_duplicate:VQ
-(match_operand: 1 "s_register_operand" "r"))
-  (match_operand:VQ 3 "s_register_operand" "0")
-  (match_operand:SI 2 "immediate_operand" "i")))]
+(match_operand: 1 "nonimmediate_operand" "Um,r"))
+  (match_operand:VQ 3 "s_register_operand" "0,0")
+  (match_operand:SI 2 "immediate_operand" "i,i")))]
   "TARGET_NEON"
 {
   HOST_WIDE_INT elem = ffs ((int) INTVAL (operands[2])) - 1;
@@ -454,18 +457,21 @@
   operands[0] = gen_rtx_REG (mode, regno + hi);
   operands[2] = GEN_INT (elt);
 
-  return "vmov.\t%P0[%c2], %1";
+  if (which_alternative == 0)
+return "vld1.\t{%P0[%c2]}, %A1";
+  else
+  

[doc] vector extensions

2012-09-14 Thread Marc Glisse
A fairly trivial follow-up to the patch with the code. I added a line for 
PR 53024 while I was there...
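
For reference, a small sketch of the behaviour the documentation below
describes, written in C with the GCC vector extensions: a power-of-two
vector size, element-wise shifts, and comparisons that produce 0 or -1
per element.  The function names are made up for this example, and it
assumes a 32-bit int.

```c
/* 16 bytes (a power of two), i.e. four 32-bit ints.  */
typedef int v4si __attribute__ ((vector_size (16)));

/* Element-wise shift: r[i] = a[i] << b[i].  */
void shift_left (const v4si *a, const v4si *b, v4si *r)
{
  *r = *a << *b;
}

/* Element-wise comparison: r[i] is -1 (all bits set) where a[i] == b[i],
   and 0 elsewhere.  */
void compare_equal (const v4si *a, const v4si *b, v4si *r)
{
  *r = (*a == *b);
}
```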


2012-09-14  Marc Glisse  

PR c/53024
PR c++/54427
* doc/extend.texi (Vector Extensions): C++ improvements.
Power of 2 size requirement.

--
Marc Glisse

Index: doc/extend.texi
===
--- doc/extend.texi (revision 191308)
+++ doc/extend.texi (working copy)
@@ -6813,21 +6813,22 @@ typedef int v4si __attribute__ ((vector_
 
 The @code{int} type specifies the base type, while the attribute specifies
 the vector size for the variable, measured in bytes.  For example, the
 declaration above causes the compiler to set the mode for the @code{v4si}
 type to be 16 bytes wide and divided into @code{int} sized units.  For
 a 32-bit @code{int} this means a vector of 4 units of 4 bytes, and the
 corresponding mode of @code{foo} will be @acronym{V4SI}.
 
 The @code{vector_size} attribute is only applicable to integral and
 float scalars, although arrays, pointers, and function return values
-are allowed in conjunction with this construct.
+are allowed in conjunction with this construct. Only power of two
+sizes are currently allowed.
 
 All the basic integer types can be used as base types, both as signed
 and as unsigned: @code{char}, @code{short}, @code{int}, @code{long},
 @code{long long}.  In addition, @code{float} and @code{double} can be
 used to build floating-point vector types.
 
 Specifying a combination that is not valid for the current architecture
 will cause GCC to synthesize the instructions using a narrower mode.
 For example, if you specify a variable of type @code{V4SI} and your
 architecture does not allow for this specific SIMD type, GCC will
@@ -6850,21 +6851,21 @@ v4si a, b, c;
 
 c = a + b;
 @end smallexample
 
 Subtraction, multiplication, division, and the logical operations
 operate in a similar manner.  Likewise, the result of using the unary
 minus or complement operators on a vector type is a vector whose
 elements are the negative or complemented values of the corresponding
 elements in the operand.
 
-In C it is possible to use shifting operators @code{<<}, @code{>>} on
+It is possible to use shifting operators @code{<<}, @code{>>} on
 integer-type vectors. The operation is defined as following: @code{@{a0,
 a1, @dots{}, an@} >> @{b0, b1, @dots{}, bn@} == @{a0 >> b0, a1 >> b1,
 @dots{}, an >> bn@}}@. Vector operands must have the same number of
 elements. 
 
 For the convenience in C it is allowed to use a binary vector operation
 where one operand is a scalar. In that case the compiler will transform
 the scalar operand into a vector where each element is the scalar from
 the operation. The transformation will happen only if the scalar could be
 safely converted to the vector-element type.
@@ -6881,21 +6882,21 @@ a = 2 * b;/* a = @{2,2,2,2@} * b; */
 
 a = l + a;/* Error, cannot convert long to int. */
 @end smallexample
 
 Vectors can be subscripted as if the vector were an array with
 the same number of elements and base type.  Out of bound accesses
 invoke undefined behavior at runtime.  Warnings for out of bound
 accesses for vector subscription can be enabled with
 @option{-Warray-bounds}.
 
-In GNU C vector comparison is supported within standard comparison
+Vector comparison is supported with standard comparison
 operators: @code{==, !=, <, <=, >, >=}. Comparison operands can be
 vector expressions of integer-type or real-type. Comparison between
 integer-type vectors and real-type vectors are not supported.  The
 result of the comparison is a vector of the same width and number of
 elements as the comparison operands with a signed integral element
 type.
 
 Vectors are compared element-wise producing 0 when comparison is false
 and -1 (constant of the appropriate type where all bits are set)
 otherwise. Consider the following example.


Re: [C++ Patch] PR 54575

2012-09-14 Thread Paolo Carlini

Jason,

H.J. figured out that this changed when case SCOPE_REF of 
value_dependent_expression_p started always returning true, part of the 
instantiation_dependent_p work...


I'm unassigning myself, I guess you will immediately see which further 
changes are needed.


Thanks!
Paolo.


Tighten forwprop1 testing

2012-09-14 Thread Marc Glisse

Hello,

recent patches have let optimizations move from forwprop2 to forwprop1. 
The attached patch checks that this remains the case. (copyprop1 is the first 
pass after forwprop1 that does a dce-like cleanup)
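
As an aside, the two permutations in forwprop-19.c compose to the
identity, which is presumably why the test expects no VEC_PERM_EXPR to
survive.  A scalar model in portable C, assuming the documented
__builtin_shuffle semantics (mask elements taken modulo N for one input
and modulo 2*N for two inputs); shuffle1 and shuffle2 are hypothetical
helpers for this sketch, not GCC functions:

```c
#include <stddef.h>

#define N 4

/* Model of two-operand __builtin_shuffle (v0, v1, mask): indices 0..N-1
   select from v0, indices N..2N-1 select from v1.  */
void shuffle2 (const int v0[N], const int v1[N], const int mask[N], int out[N])
{
  for (size_t i = 0; i < N; i++)
    {
      unsigned idx = (unsigned) mask[i] % (2 * N);
      out[i] = idx < N ? v0[idx] : v1[idx - N];
    }
}

/* Model of one-operand __builtin_shuffle (v, mask).  */
void shuffle1 (const int v[N], const int mask[N], int out[N])
{
  for (size_t i = 0; i < N; i++)
    out[i] = v[(unsigned) mask[i] % N];
}
```

With n = { 3, 0, 1, 2 } followed by m = { 1, 2, 3, 0 }, element i of the
final result is x1[n[m[i]]] = x1[i], so the second operand never
contributes and the result equals the first input.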


Only manually tested for now, will check better if it is accepted.

2012-09-15  Marc Glisse  

* gcc.dg/tree-ssa/forwprop-19.c: Check in forwprop1.
* gcc.dg/tree-ssa/forwprop-20.c: Check in forwprop1.
* gcc.dg/tree-ssa/forwprop-21.c: Check in copyprop1.
* gcc.dg/tree-ssa/forwprop-22.c: Check in copyprop1.

--
Marc Glisse

Index: testsuite/gcc.dg/tree-ssa/forwprop-19.c
===
--- testsuite/gcc.dg/tree-ssa/forwprop-19.c (revision 191308)
+++ testsuite/gcc.dg/tree-ssa/forwprop-19.c (working copy)
@@ -1,15 +1,15 @@
 /* { dg-do compile } */
-/* { dg-options "-O -fdump-tree-forwprop2" } */
+/* { dg-options "-O -fdump-tree-forwprop1" } */
 
 typedef int vec __attribute__((vector_size (4 * sizeof (int))));
 void f (vec *x1, vec *x2)
 {
   vec m = { 1, 2, 3, 0 };
   vec n = { 3, 0, 1, 2 };
   vec y = __builtin_shuffle (*x1, *x2, n);
   vec z = __builtin_shuffle (y, m);
   *x1 = z;
 }
 
-/* { dg-final { scan-tree-dump-not "VEC_PERM_EXPR" "forwprop2" } } */
-/* { dg-final { cleanup-tree-dump "forwprop2" } } */
+/* { dg-final { scan-tree-dump-not "VEC_PERM_EXPR" "forwprop1" } } */
+/* { dg-final { cleanup-tree-dump "forwprop1" } } */
Index: testsuite/gcc.dg/tree-ssa/forwprop-20.c
===
--- testsuite/gcc.dg/tree-ssa/forwprop-20.c (revision 191308)
+++ testsuite/gcc.dg/tree-ssa/forwprop-20.c (working copy)
@@ -1,13 +1,13 @@
 /* { dg-do compile } */
 /* { dg-require-effective-target double64 } */
-/* { dg-options "-O2 -fdump-tree-optimized" }  */
+/* { dg-options "-O -fdump-tree-forwprop1" }  */
 
 #include <stdint.h>
 
 /* All of these optimizations happen for unsupported vector modes as a
consequence of the lowering pass. We need to test with a vector mode
that is supported by default on at least some architectures, or make
the test target specific so we can pass a flag like -mavx.  */
 
 typedef double vecf __attribute__ ((vector_size (2 * sizeof (double))));
 typedef int64_t veci __attribute__ ((vector_size (2 * sizeof (int64_t))));
@@ -59,12 +59,12 @@ void k (vecf* r)
 }
 
 void l (double d, vecf* r)
 {
   vecf x = { -d, 5 };
   vecf y = {  d, 4 };
   veci m = {  2, 0 };
   *r = __builtin_shuffle (x, y, m); // { d, -d }
 }
 
-/* { dg-final { scan-tree-dump-not "VEC_PERM_EXPR" "optimized" } } */
-/* { dg-final { cleanup-tree-dump "optimized" } } */
+/* { dg-final { scan-tree-dump-not "VEC_PERM_EXPR" "forwprop1" } } */
+/* { dg-final { cleanup-tree-dump "forwprop1" } } */
Index: testsuite/gcc.dg/tree-ssa/forwprop-21.c
===
--- testsuite/gcc.dg/tree-ssa/forwprop-21.c (revision 191308)
+++ testsuite/gcc.dg/tree-ssa/forwprop-21.c (working copy)
@@ -1,13 +1,16 @@
 /* { dg-do compile } */
-/* { dg-options "-O -fdump-tree-optimized" } */
+/* { dg-options "-O -fdump-tree-copyprop1" } */
 typedef int v4si __attribute__ ((vector_size (4 * sizeof(int))));
 
 int
 test (v4si *x, v4si *y)
 {
   v4si m = { 2, 3, 6, 5 };
   v4si z = __builtin_shuffle (*x, *y, m);
   return z[2];
 }
-/* { dg-final { scan-tree-dump-not "VEC_PERM_EXPR" "optimized" } } */
-/* { dg-final { cleanup-tree-dump "optimized" } } */
+
+/* Optimization in forwprop1, cleanup in copyprop1.  */
+
+/* { dg-final { scan-tree-dump-not "VEC_PERM_EXPR" "copyprop1" } } */
+/* { dg-final { cleanup-tree-dump "copyprop1" } } */
Index: testsuite/gcc.dg/tree-ssa/forwprop-22.c
===
--- testsuite/gcc.dg/tree-ssa/forwprop-22.c (revision 191308)
+++ testsuite/gcc.dg/tree-ssa/forwprop-22.c (working copy)
@@ -1,18 +1,20 @@
 /* { dg-do compile } */
 /* { dg-require-effective-target vect_double } */
 /* { dg-require-effective-target vect_perm } */
-/* { dg-options "-O -fdump-tree-optimized" } */
+/* { dg-options "-O -fdump-tree-copyprop1" } */
 
 typedef double vec __attribute__((vector_size (2 * sizeof (double))));
 void f (vec *px, vec *y, vec *z)
 {
   vec x = *px;
   vec t1 = { x[1], x[0] };
   vec t2 = { x[0], x[1] };
   *y = t1;
   *z = t2;
 }
 
-/* { dg-final { scan-tree-dump-times "VEC_PERM_EXPR" 1 "optimized" } } */
-/* { dg-final { scan-tree-dump-not "BIT_FIELD_REF" "optimized" } } */
-/* { dg-final { cleanup-tree-dump "optimized" } } */
+/* Optimization in forwprop1, cleanup in copyprop1.  */
+
+/* { dg-final { scan-tree-dump-times "VEC_PERM_EXPR" 1 "copyprop1" } } */
+/* { dg-final { scan-tree-dump-not "BIT_FIELD_REF" "copyprop1" } } */
+/* { dg-final { cleanup-tree-dump "copyprop1" } } */


PR 44436 Associative containers emplace/emplace_hint

2012-09-14 Thread François Dumont

Hi

Here is a patch to add emplace/emplace_hint on associative 
containers in C++11 mode.
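
For readers unfamiliar with the interface, here is a minimal usage sketch of the C++11 emplace/emplace_hint members on associative containers. It only uses the standard interface; the helper names build_map and build_multiset are made up for illustration and are not part of the patch.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <utility>

// Hypothetical helper: fills a std::map with emplace and emplace_hint.
std::map<int, std::string> build_map()
{
  std::map<int, std::string> m;

  // emplace forwards its arguments to the element constructor and, for
  // unique containers, returns pair<iterator, bool> like insert does.
  auto r1 = m.emplace(2, "two");
  auto r2 = m.emplace(2, "deux");   // key already present: nothing inserted
  assert(r1.second && !r2.second);

  // emplace_hint takes a position hint and returns only an iterator.
  m.emplace_hint(m.begin(), 1, "one");
  m.emplace_hint(m.end(), 3, "three");
  return m;
}

// Hypothetical helper: same idea for a container that allows duplicates.
std::multiset<int> build_multiset()
{
  std::multiset<int> s;
  s.emplace(5);
  s.emplace(5);                 // duplicates are kept; returns an iterator
  s.emplace_hint(s.end(), 7);
  return s;
}
```

Calling build_map() should give a map with keys 1, 2 and 3 where key 2 still maps to "two", since the second emplace does not overwrite an existing element.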


I did some refactoring to share as much code as possible between the
insert and emplace methods.


I have also changed map::operator[] to use the emplace logic in
C++11, so that we only need the value type to be default constructible.
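
A small sketch of what this enables, assuming a standard library whose operator[] constructs the element in place: the mapped type below is default constructible but neither copyable nor movable, and operator[] can still create it. The Counter type and count_hits function are hypothetical names used only for this illustration.

```cpp
#include <cassert>
#include <map>

// Hypothetical mapped type: default constructible only.
// Declaring the copy operations as deleted also suppresses the moves.
struct Counter
{
  int hits;
  Counter() : hits(0) { }
  Counter(const Counter&) = delete;
  Counter& operator=(const Counter&) = delete;
};

// Uses operator[] to create and update elements without ever copying
// or moving a Counter.
int count_hits()
{
  std::map<int, Counter> m;
  m[42].hits += 1;   // element is default constructed in place
  m[42].hits += 1;   // existing element is reused
  m[7];              // creates a second, untouched element
  return m[42].hits * 10 + static_cast<int>(m.size());
}
```

With an operator[] that first builds a temporary value_type and then inserts it, this would not compile, because the temporary would have to be copied or moved into the tree node.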


The C++11 status table did not flag those methods as
missing, so I didn't have to update it.


2012-09-14  François Dumont  

PR libstdc++/44436
* include/bits/stl_tree.h
(_Rb_tree<>::_M_insert_): Take _Base_ptr rather than
_Const_Base_ptr.
(_Rb_tree<>::_M_insert_node): New.
(_Rb_tree<>::_M_get_insert_unique_pos): New, search code of
_M_insert_unique method.
(_Rb_tree<>::_M_insert_unique): Use latter.
(_Rb_tree<>::_M_emplace_unique): New, likewise.
(_Rb_tree<>::_M_get_insert_equal_pos): New, search code of
_M_insert_equal method.
(_Rb_tree<>::_M_insert_equal): Use latter.
(_Rb_tree<>::_M_emplace_equal): New, likewise.
(_Rb_tree<>::_M_get_insert_hint_unique_pos): New, search code of
_M_insert_unique_ method.
(_Rb_tree<>::_M_insert_unique_): Use latter.
(_Rb_tree<>::_M_emplace_hint_unique): New, likewise.
(_Rb_tree<>::_M_get_insert_hint_equal_pos): New, search code of
_M_insert_equal_ method.
(_Rb_tree<>::_M_insert_equal_): Use latter.
(_Rb_tree<>::_M_emplace_hint_equal): New, likewise.
(_Rb_tree<>::_M_insert_lower): Remove first _Base_ptr parameter,
useless as always null.
* include/bits/stl_map.h: Include  in C++11.
(map<>::operator[](const key_type&)): Use
_Rb_tree<>::_M_emplace_hint_unique in C++11.
(map<>::operator[](key_type&&)): Likewise.
(map<>::emplace): New.
(map<>::emplace_hint): New.
* include/bits/stl_multimap.h (multimap<>::emplace): New.
(multimap<>::emplace_hint): New.
* include/bits/stl_set.h (set<>::emplace): New.
(set<>::emplace_hint): New.
* include/bits/stl_multiset.h (multiset<>::emplace): New.
(multiset<>::emplace_hint): New.
* include/debug/map.h (std::__debug::map<>::emplace): New.
(std::__debug::map<>::emplace_hint): New.
* include/debug/multimap.h (std::__debug::multimap<>::emplace):
New.
(std::__debug::multimap<>::emplace_hint): New.
* include/debug/set.h (std::__debug::set<>::emplace): New.
(std::__debug::set<>::emplace_hint): New.
* include/debug/multiset.h (std::__debug::multiset<>::emplace):
New.
(std::__debug::multiset<>::emplace_hint): New.
* testsuite/util/testsuite_container_traits.h: Signal that emplace
and emplace_hint are available on std::map, std::multimap,
std::set and std::multiset in C++11.
* testsuite/23_containers/map/modifiers/emplate/1.cc: New.
* testsuite/23_containers/multimap/modifiers/emplate/1.cc: New.
* testsuite/23_containers/set/modifiers/emplate/1.cc: New.
* testsuite/23_containers/multiset/modifiers/emplate/1.cc: New.


Tested x86_64 linux, normal/debug modes, C++98/C++11 modes.

Ok to commit ?

François

Index: include/bits/stl_tree.h
===
--- include/bits/stl_tree.h	(revision 191279)
+++ include/bits/stl_tree.h	(working copy)
@@ -570,27 +570,50 @@
   typedef std::reverse_iterator const_reverse_iterator;
 
 private:
+  pair<_Base_ptr, _Base_ptr>
+  _M_get_insert_unique_pos(const key_type& __k);
+
+  pair<_Base_ptr, _Base_ptr>
+  _M_get_insert_equal_pos(const key_type& __k);
+
+  pair<_Base_ptr, _Base_ptr>
+  _M_get_insert_hint_unique_pos(const_iterator __pos,
+const key_type& __k);
+
+  pair<_Base_ptr, _Base_ptr>
+  _M_get_insert_hint_equal_pos(const_iterator __pos,
+   const key_type& __k);
+
 #ifdef __GXX_EXPERIMENTAL_CXX0X__
   template
 iterator
-_M_insert_(_Const_Base_ptr __x, _Const_Base_ptr __y, _Arg&& __v);
+_M_insert_(_Base_ptr __x, _Base_ptr __y, _Arg&& __v);
 
+  iterator
+  _M_insert_node(_Base_ptr __x, _Base_ptr __y, _Link_type __z);
+
   template
 iterator
-_M_insert_lower(_Base_ptr __x, _Base_ptr __y, _Arg&& __v);
+_M_insert_lower(_Base_ptr __y, _Arg&& __v);
 
   template
 iterator
 _M_insert_equal_lower(_Arg&& __x);
+
+  iterator
+  _M_insert_lower_node(_Base_ptr __p, _Link_type __z);
+
+  iterator
+  _M_insert_equal_lower_node(_Link_type __z);
 #else
   iterator
-  _M_insert_(_Const_Base_ptr __x, _Const_Base_ptr __y,
+  _M_insert_(_Base_ptr __x, _Base_ptr __y,
 		 const value_type& __v);
 
   // _GLIBCXX_RESOLVE_LIB_DEFECTS
   // 233. Insertion hints in associative containers.
   iterator
-  _M_insert_lower(_Base_ptr __x, _Base_ptr __y, const value_type& __v);
+  _M_insert_lower(_Base_ptr __y, const value_type& __v);
 
   iterator
   _M_insert_equal_lower(const value_type& __x);
@@ -726,6 +749,22 @@

[google/main] Backport counter histogram in fdo summary from trunk (issue6513045)

2012-09-14 Thread Teresa Johnson
Backport from trunk r190952 to add counter histogram to gcov program summary,
and follow-on fixes for PR gcov-profile/54487 (r191074 and r191238).

Tested on x86_64-unknown-linux-gnu. Ok for google branches?

2012-09-14  Teresa Johnson  

* libgcc/libgcov.c (gcov_histogram_insert): New function.
(gcov_compute_histogram): Ditto.
(sort_by_reverse_gcov_value): Remove function.
(gcov_compute_cutoff_values): Ditto.
(gcov_merge_gcda_file): Merge histogram while merging summary.
(gcov_gcda_file_size): Include histogram in summary size computation.
(gcov_write_gcda_file): Remove assert that is no longer valid.
(gcov_exit_init): Invoke gcov_compute_histogram.
* gcc/gcov-io.c (gcov_write_summary): Write out non-zero histogram
entries to function summary along with an occupancy bit vector.
(gcov_read_summary): Read in the histogram entries.
(gcov_histo_index): New function.
(gcov_histogram_merge): Ditto.
* gcc/gcov-io.h (gcov_type_unsigned): New type.
(struct gcov_bucket_type): Ditto.
(struct gcov_ctr_summary): Include histogram.
(GCOV_TAG_SUMMARY_LENGTH): Update to include histogram entries.
(GCOV_HISTOGRAM_SIZE): New macro.
(GCOV_HISTOGRAM_BITVECTOR_SIZE): Ditto.
(gcov_gcda_file_size): New parameter.
* gcc/profile.c (NUM_GCOV_WORKING_SETS): Ditto.
(gcov_working_sets): New global variable.
(compute_working_sets): New function.
(find_working_set): Ditto.
(get_exec_counts): Invoke compute_working_sets.
* gcc/loop-unroll.c (code_size_limit_factor): Call new function
find_working_set to obtain working set information.
* gcc/coverage.c (read_counts_file): Merge histograms, and
fix bug with accessing summary info for non-summable counters.
* gcc/basic-block.h (gcov_type_unsigned): New type.
(struct gcov_working_set_info): Ditto.
(find_working_set): Declare.
* gcc/gcov-dump.c (tag_summary): Dump out histogram.
* gcc/configure.ac (HOST_HAS_F_SETLKW): Set based on compile
test using F_SETLKW with fcntl.
* gcc/configure, gcc/config.in: Regenerate.
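
As a rough illustration of the histogram described in the ChangeLog above, the following sketch keeps the three per-bucket fields the patch updates (num_counters, min_value, cum_value). The bucketing function is an assumption: it uses a plain floor(log2) index, whereas the real gcov_histo_index may bucket differently. The min_value is initialised to the largest representable value here, while the patch initialises it from run_max. All names in this block are illustrative only.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Illustrative bucket with the fields the patch maintains.
struct Bucket
{
  unsigned num_counters = 0;             // how many counters fell in here
  std::uint64_t min_value = UINT64_MAX;  // smallest counter seen so far
  std::uint64_t cum_value = 0;           // sum of all counters seen
};

// ASSUMPTION: floor(log2(value)) bucketing, with 0 mapped to bucket 0.
// This is not the actual gcov_histo_index.
unsigned bucket_index(std::uint64_t value)
{
  unsigned i = 0;
  while (value >>= 1)
    ++i;
  return i;
}

// Record one counter value in its bucket.
void histogram_insert(std::vector<Bucket>& histogram, std::uint64_t value)
{
  Bucket& b = histogram[bucket_index(value)];
  b.num_counters++;
  b.cum_value += value;
  if (value < b.min_value)
    b.min_value = value;
}

// Build a histogram over a flat list of counter values.
std::vector<Bucket> compute_histogram(const std::vector<std::uint64_t>& counters)
{
  std::vector<Bucket> histogram(64);   // one bucket per possible log2 result
  for (std::uint64_t v : counters)
    histogram_insert(histogram, v);
  return histogram;
}
```

The real code walks the per-object gcov_info list instead of a flat vector, but the per-bucket bookkeeping follows the same pattern.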

Index: libgcc/libgcov.c
===
--- libgcc/libgcov.c(revision 191302)
+++ libgcc/libgcov.c(working copy)
@@ -585,6 +585,76 @@ gcov_dump_module_info (void)
   __gcov_finalize_dyn_callgraph ();
 }
 
+/* Insert counter VALUE into HISTOGRAM.  */
+
+static void
+gcov_histogram_insert(gcov_bucket_type *histogram, gcov_type value)
+{
+  unsigned i;
+
+  i = gcov_histo_index(value);
+  histogram[i].num_counters++;
+  histogram[i].cum_value += value;
+  if (value < histogram[i].min_value)
+histogram[i].min_value = value;
+}
+
+/* Computes a histogram of the arc counters to place in the summary SUM.  */
+
+static void
+gcov_compute_histogram (struct gcov_summary *sum)
+{
+  struct gcov_info *gi_ptr;
+  const struct gcov_fn_info *gfi_ptr;
+  const struct gcov_ctr_info *ci_ptr;
+  struct gcov_ctr_summary *cs_ptr;
+  unsigned t_ix, f_ix, ctr_info_ix, ix;
+  int h_ix;
+
+  /* This currently only applies to arc counters.  */
+  t_ix = GCOV_COUNTER_ARCS;
+
+  /* First check if there are any counts recorded for this counter.  */
+  cs_ptr = &(sum->ctrs[t_ix]);
+  if (!cs_ptr->num)
+return;
+
+  for (h_ix = 0; h_ix < GCOV_HISTOGRAM_SIZE; h_ix++)
+{
+  cs_ptr->histogram[h_ix].num_counters = 0;
+  cs_ptr->histogram[h_ix].min_value = cs_ptr->run_max;
+  cs_ptr->histogram[h_ix].cum_value = 0;
+}
+
+  /* Walk through all the per-object structures and record each of
+ the count values in histogram.  */
+  for (gi_ptr = __gcov_list; gi_ptr; gi_ptr = gi_ptr->next)
+{
+  if (!gi_ptr->merge[t_ix])
+continue;
+
+  /* Find the appropriate index into the gcov_ctr_info array
+ for the counter we are currently working on based on the
+ existence of the merge function pointer for this object.  */
+  for (ix = 0, ctr_info_ix = 0; ix < t_ix; ix++)
+{
+  if (gi_ptr->merge[ix])
+ctr_info_ix++;
+}
+  for (f_ix = 0; f_ix != gi_ptr->n_functions; f_ix++)
+{
+  gfi_ptr = gi_ptr->functions[f_ix];
+
+  if (!gfi_ptr || gfi_ptr->key != gi_ptr)
+continue;
+
+  ci_ptr = &gfi_ptr->ctrs[ctr_info_ix];
+  for (ix = 0; ix < ci_ptr->num; ix++)
+gcov_histogram_insert (cs_ptr->histogram, ci_ptr->values[ix]);
+}
+}
+}
+
 /* Dump the coverage counts. We merge with existing counts when
possible, to avoid growing the .da files ad infinitum. We use this
program's checksum to make sure we only accumulate whole program
@@ -758,118 +828,6 @@ gcov_sort_topn_counter_arrays (const struct gcov_i
  }
 }
 
-/* Used by qsort to sort gcov values in descending order.  */
-
-static int
-sort_by_reverse_gcov_va

Re: [google/main] Backport counter histogram in fdo summary from trunk (issue6513045)

2012-09-14 Thread Diego Novillo
On Fri, Sep 14, 2012 at 4:09 PM, Teresa Johnson  wrote:
> Backport from trunk r190952 to add counter histogram to gcov program summary,
> and follow-on fixes for PR gcov-profile/54487 (r191074 and r191238).

Why don't we just get this via the trunk -> google/main merges?


Diego.


Re: [google/main] Backport counter histogram in fdo summary from trunk (issue6513045)

2012-09-14 Thread Teresa Johnson
On Fri, Sep 14, 2012 at 1:10 PM, Diego Novillo  wrote:
> On Fri, Sep 14, 2012 at 4:09 PM, Teresa Johnson  wrote:
>> Backport from trunk r190952 to add counter histogram to gcov program summary,
>> and follow-on fixes for PR gcov-profile/54487 (r191074 and r191238).
>
> Why don't we just get this via the trunk -> google/main merges?
>
>
> Diego.

Should I just put it onto google/4_7 and 4_6 directly then?

Teresa

-- 
Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413


Re: [google/main] Backport counter histogram in fdo summary from trunk (issue6513045)

2012-09-14 Thread Diego Novillo

On Fri Sep 14 16:17:25 2012, Teresa Johnson wrote:
> Should I just put it onto google/4_7 and 4_6 directly then?

Yeah.  Not sure it's really needed in 4_6, though.


Diego.


Re: [google/main] Backport counter histogram in fdo summary from trunk (issue6513045)

2012-09-14 Thread Xinliang David Li
Yes. The google/main update will happen next quarter.

David

On Fri, Sep 14, 2012 at 1:17 PM, Teresa Johnson  wrote:
> On Fri, Sep 14, 2012 at 1:10 PM, Diego Novillo  wrote:
>> On Fri, Sep 14, 2012 at 4:09 PM, Teresa Johnson  wrote:
>>> Backport from trunk r190952 to add counter histogram to gcov program 
>>> summary,
>>> and follow-on fixes for PR gcov-profile/54487 (r191074 and r191238).
>>
>> Why don't we just get this via the trunk -> google/main merges?
>>
>>
>> Diego.
>
> Should I just put it onto google/4_7 and 4_6 directly then?
>
> Teresa
>
> --
> Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413


Re: [google/main] Backport counter histogram in fdo summary from trunk (issue6513045)

2012-09-14 Thread Teresa Johnson
On Fri, Sep 14, 2012 at 1:19 PM, Diego Novillo  wrote:
> On Fri Sep 14 16:17:25 2012, Teresa Johnson wrote:
>
>> Should I just put it onto google/4_7 and 4_6 directly then?
>
>
> Yeah.  Not sure it's really needed in 4_6, though.

Ok. There are only trivial differences between the patch I uploaded
for google/main and the google/4_7 patch that I have also already
created and tested. Ok for google/4_7?

Thanks,
Teresa

>
>
> Diego.



-- 
Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413


Re: [google/main] Backport counter histogram in fdo summary from trunk (issue6513045)

2012-09-14 Thread Xinliang David Li
yes.

thanks,

David

On Fri, Sep 14, 2012 at 1:20 PM, Teresa Johnson  wrote:
> On Fri, Sep 14, 2012 at 1:19 PM, Diego Novillo  wrote:
>> On Fri Sep 14 16:17:25 2012, Teresa Johnson wrote:
>>
>>> Should I just put it onto google/4_7 and 4_6 directly then?
>>
>>
>> Yeah.  Not sure it's really needed in 4_6, though.
>
> Ok. There are only trivial differences between the patch I uploaded
> for google/main and the google/4_7 patch that I have also already
> created and tested. Ok for google/4_7?
>
> Thanks,
> Teresa
>
>>
>>
>> Diego.
>
>
>
> --
> Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413


[PATCH,ARM] Suppress the dynamic linker commands for statically linked programs

2012-09-14 Thread Ben Cheng
Hi,

Recently we found out that the .interp section starts to show up in
ARM executables compiled with "-shared -static" and the gold linker
from binutils 2.22. We tracked down the origin of the dynamic linker
commands and they are always explicitly specified in
config/arm/linux-elf.h. We tested the following simple patch to
suppress the dynamic linker options for statically linked programs on
Android's AOSP tree. Everything builds fine and the unneeded .interp
section is gone.
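
To make the intended effect concrete, here is a toy model of the patched spec fragment. It is not the GCC driver's spec machinery; link_args and its parameters are hypothetical and only mirror the three conditionals touched by the patch.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical model of the patched LINK_SPEC fragment: the
// -export-dynamic and -dynamic-linker options are emitted only when
// -static was NOT given.
std::vector<std::string> link_args(bool is_static, bool shared, bool rdynamic,
                                   const std::string& dynamic_linker)
{
  std::vector<std::string> args;
  if (is_static)
    args.push_back("-Bstatic");            // %{static:-Bstatic}
  if (shared)
    args.push_back("-shared");             // %{shared:-shared}
  if (!is_static)                          // %{!static: ... }
    {
      if (rdynamic)
        args.push_back("-export-dynamic"); // %{rdynamic:-export-dynamic}
      args.push_back("-dynamic-linker");
      args.push_back(dynamic_linker);
    }
  return args;
}
```

In this model, a "-shared -static" link yields only -Bstatic and -shared, so no dynamic linker path is passed and nothing asks the linker for an .interp section.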

Thanks,
-Ben

==

gcc/ChangeLog
2012-09-14 Ben Cheng 
* config/arm/linux-elf.h: Suppress the dynamic linker commands for
statically linked programs.

Index: config/arm/linux-elf.h
===
--- config/arm/linux-elf.h  (revision 191198)
+++ config/arm/linux-elf.h  (working copy)
@@ -65,8 +65,9 @@
%{static:-Bstatic} \
%{shared:-shared} \
%{symbolic:-Bsymbolic} \
-   %{rdynamic:-export-dynamic} \
-   -dynamic-linker " GNU_USER_DYNAMIC_LINKER " \
+   %{!static: \
+ %{rdynamic:-export-dynamic} \
+ -dynamic-linker " GNU_USER_DYNAMIC_LINKER "} \
-X \
%{mbig-endian:-EB} %{mlittle-endian:-EL}" \
SUBTARGET_EXTRA_LINK_SPEC


Re: PR 44436 Associative containers emplace/emplace_hint

2012-09-14 Thread Paolo Carlini

On 09/14/2012 10:07 PM, François Dumont wrote:

> Hi
>
> Here is a patch to add emplace/emplace_hint on associative
> containers in C++11 mode.

Ah, excellent! I will review the patch over the next couple of days.

> I did some refactoring to share as much code as possible between the
> insert and emplace methods.
>
> I have also changed map::operator[] to use the emplace logic in
> C++11, so that we only need the value type to be default constructible.

About this, can I ask you to likewise add completely similar testcases too?

Thanks!
Paolo.


Re: [TILE-Gx, committed] support -mcmodel=MODEL

2012-09-14 Thread Walter Lee

On 9/1/2012 7:33 AM, Gerald Pfeifer wrote:

> On Tue, 28 Aug 2012, Walter Lee wrote:
>> This patch adds support for the -mcmodel=MODEL flag on TILE-Gx.
>
> At which point I cannot help asking for an update to the
> release notes at http://gcc.gnu.org/gcc-4.8/changes.html. ;-)
>
> Let me know if you need help with that.


How does this look:

--- changes.html6 Sep 2012 03:42:45 -   1.28
+++ changes.html14 Sep 2012 20:40:39 -
@@ -298,6 +298,13 @@ by this change.
 Added optimized instruction scheduling for Niagara4.
   

+TILE-Gx
+
+  
+Added support for the -mcmodel=MODEL command-line option.  The
+models supported are small and large.
+  
+
 XStormy16

   


>>   @emph{TILEPro Options}
>>   @gccoptlist{-mcpu=CPU -m32}
>
> Why are only -mcpu and -m32 listed here?


Those are the only options supported on TILEPro.  -m32 is basically a 
no-op as tilepro does not support other models.


I've fixed the remaining grammar/spelling issues you pointed out. See 
patch below.


Thanks,

Walter

* doc/invoke.texi (Option Summary): Fix typesetting for -mcpu
option for TILEPro and TILE-Gx.
(TILE-Gx Options): Fix grammar and spellings in documentation for
-mcmodel.

--- gcc/doc/invoke.texi(revision 191306)
+++ gcc/doc/invoke.texi(working copy)
@@ -932,10 +932,10 @@ See RS/6000 and PowerPC Options.
 @gccoptlist{-Qy  -Qn  -YP,@var{paths}  -Ym,@var{dir}}

 @emph{TILE-Gx Options}
-@gccoptlist{-mcpu=CPU -m32 -m64 -mcmodel=@var{code-model}}
+@gccoptlist{-mcpu=@var{cpu} -m32 -m64 -mcmodel=@var{code-model}}

 @emph{TILEPro Options}
-@gccoptlist{-mcpu=CPU -m32}
+@gccoptlist{-mcpu=@var{cpu} -m32}

 @emph{V850 Options}
 @gccoptlist{-mlong-calls  -mno-long-calls  -mep  -mno-ep @gol
@@ -19003,13 +19003,13 @@ These @samp{-m} options are supported on
 @table @gcctabopt
 @item -mcmodel=small
 @opindex mcmodel=small
-Generate code for the small model.  Distance for direct calls is
+Generate code for the small model.  The distance for direct calls is
 limited to 500M in either direction.  PC-relative addresses are 32
 bits.  Absolute addresses support the full address range.

 @item -mcmodel=large
 @opindex mcmodel=large
-Generate code for the large model.  There is no limiation on call
+Generate code for the large model.  There is no limitation on call
 distance, pc-relative addresses, or absolute addresses.

 @item -mcpu=@var{name}



Re: [PATCH v2] rs6000: Add 2 built-ins to read the Time Base Register on PowerPC

2012-09-14 Thread Segher Boessenkool

>> I don't think TARGET_MFCRF is correct.  For example, if you use
>> -mcpu=powerpc64 (which doesn't set this flag) you will get code
>> that does not run on the newer machines.
>
> Sorry, but it seems to be working here...
> I explain how I tested this in the end of the email.

David tells me all current CPUs actually do support the MFTB insns
just fine, so that there is no problem.
Cheers,


Segher



Fix C ICE with casts to pointers to VLAs (PR c/54552)

2012-09-14 Thread Joseph S. Myers
Bug 54552 is a C front-end regression involving ICEs when an
expression involving C_MAYBE_CONST_EXPR, such as a compound literal of
variably modified type, is cast to a variably modified type.  This
patch fixes it by doing the appropriate folding before creating the
outer C_MAYBE_CONST_EXPR.

Bootstrapped with no regressions on x86_64-unknown-linux-gnu.  Applied
to mainline.  Will apply to 4.7 (when not frozen) and 4.6 branches
subject to testing there.

c:
2012-09-14  Joseph Myers  

PR c/54552
* c-typeck.c (c_cast_expr): When casting to a type requiring
C_MAYBE_CONST_EXPR to be created, pass the inner expression to
c_fully_fold first.

testsuite:
2012-09-14  Joseph Myers  

PR c/54552
* gcc.c-torture/compile/pr54552-1.c: New test.

Index: c/c-typeck.c
===
--- c/c-typeck.c(revision 191305)
+++ c/c-typeck.c(working copy)
@@ -4779,8 +4779,11 @@ c_cast_expr (location_t loc, struct c_type_name *t
   ret = build_c_cast (loc, type, expr);
   if (type_expr)
 {
+  bool inner_expr_const = true;
+  ret = c_fully_fold (ret, require_constant_value, &inner_expr_const);
   ret = build2 (C_MAYBE_CONST_EXPR, TREE_TYPE (ret), type_expr, ret);
-  C_MAYBE_CONST_EXPR_NON_CONST (ret) = !type_expr_const;
+  C_MAYBE_CONST_EXPR_NON_CONST (ret) = !(type_expr_const
+&& inner_expr_const);
   SET_EXPR_LOCATION (ret, loc);
 }
 
Index: testsuite/gcc.c-torture/compile/pr54552-1.c
===
--- testsuite/gcc.c-torture/compile/pr54552-1.c (revision 0)
+++ testsuite/gcc.c-torture/compile/pr54552-1.c (revision 0)
@@ -0,0 +1,8 @@
+void
+f (void)
+{
+  unsigned n = 10;
+
+  typedef double T[n];
+  (double (*)[n])((unsigned char (*)[sizeof (T)]){ 0 });
+}

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH v2] rs6000: Add 2 built-ins to read the Time Base Register on PowerPC

2012-09-14 Thread David Edelsohn
On Fri, Sep 14, 2012 at 4:52 PM, Segher Boessenkool
 wrote:
>>> I don't think TARGET_MFCRF is correct.  For example, if you use
>>> -mcpu=powerpc64 (which doesn't set this flag) you will get code
>>> that does not run on the newer machines.
>>
>> Sorry, but it seems to be working here...
>> I explain how I tested this in the end of the email.
>
> David tells me all current CPUs actually do support the MFTB insns
> just fine, so that there is no problem.

There is no ideal solution and this method will work on all PowerPC
targets with a combination of GCC and Binutils support, so I think
this is the best we can do.

Thanks, David


Re: [PATCH v2] rs6000: Add 2 built-ins to read the Time Base Register on PowerPC

2012-09-14 Thread Tulio Magno Quites Machado Filho

Segher Boessenkool  writes:

>>> I don't think TARGET_MFCRF is correct.  For example, if you use
>>> -mcpu=powerpc64 (which doesn't set this flag) you will get code
>>> that does not run on the newer machines.
>>
>> Sorry, but it seems to be working here...
>> I explain how I tested this in the end of the email.
>
> David tells me all current CPUs actually do support the MFTB insns
> just fine, so that there is no problem.

I don't understand the problem.

Let me paste this snippet of code here again.
It was generated using the __builtin_ppc_get_timebase test case with the
extra parameters -S -m64 -mcpu=power7. The rest is the same as used by the
test suite.

.L.main:
std 31,-8(1)
stdu 1,-80(1)
mr 31,1
mfspr 9, 268
std 9,56(31)
li 9,0
stw 9,48(31)
b .L2

I've just done this same test for all -mcpu values from power3 through power7
and both -m32 and -m64. The only value that outputs mftb is power3 in both
environments.

What am I missing?

--
Tulio Magno



[PATCH, docs] Fix some obsolete info in tm.texi

2012-09-14 Thread Sandra Loosemore
While trying to revive an ancient 4.1-based port, I found that tm.texi 
still documented things like current_function_pretend_args_size which 
were deleted in 2008.  I've updated the text to reflect the idioms used 
in current code.


Checked in as obvious after building and inspecting the manual.

-Sandra


2012-09-14  Sandra Loosemore  

gcc/

* doc/tm.texi.in (Stack Arguments): Update obsolete references
to current_function_outgoing_args_size.
(Function Entry): Likewise for current_function_pops_args,
current_function_pretend_args_size,
current_function_outgoing_args_size, and
current_function_epilogue_delay_list.
(Misc): Fix garbled sentence referencing nonexistent
current_function_leaf_function.
* doc/tm.texi: Regenerated.

Index: gcc/doc/tm.texi.in
===
--- gcc/doc/tm.texi.in	(revision 191332)
+++ gcc/doc/tm.texi.in	(working copy)
@@ -3863,11 +3863,12 @@ alignment.  Then the definition should b
 If the value of this macro has a type, it should be an unsigned type.
 @end defmac
 
-@findex current_function_outgoing_args_size
+@findex outgoing_args_size
+@findex crtl->outgoing_args_size
 @defmac ACCUMULATE_OUTGOING_ARGS
 A C expression.  If nonzero, the maximum amount of space required for outgoing arguments
-will be computed and placed into the variable
-@code{current_function_outgoing_args_size}.  No space will be pushed
+will be computed and placed into
+@code{crtl->outgoing_args_size}.  No space will be pushed
 onto the stack for each call; instead, the function prologue should
 increase the stack frame size by this amount.
 
@@ -3901,7 +3902,7 @@ if the function called is a library func
 
 If @code{ACCUMULATE_OUTGOING_ARGS} is defined, this macro controls
 whether the space for these arguments counts in the value of
-@code{current_function_outgoing_args_size}.
+@code{crtl->outgoing_args_size}.
 @end defmac
 
 @defmac STACK_PARMS_IN_REG_PARM_AREA
@@ -4700,7 +4701,8 @@ others leave that for the caller to do. 
 given @option{-mrtd} pops arguments in functions that take a fixed
 number of arguments.
 
-@findex current_function_pops_args
+@findex pops_args
+@findex crtl->args.pops_args
 Your definition of the macro @code{RETURN_POPS_ARGS} decides which
 functions pop their own arguments.  @code{TARGET_ASM_FUNCTION_EPILOGUE}
 needs to know what was decided.  The number of bytes of the current
@@ -4710,8 +4712,9 @@ function's arguments that this function 
 
 @itemize @bullet
 @item
-@findex current_function_pretend_args_size
-A region of @code{current_function_pretend_args_size} bytes of
+@findex pretend_args_size
+@findex crtl->args.pretend_args_size
+A region of @code{crtl->args.pretend_args_size} bytes of
 uninitialized space just underneath the first argument arriving on the
 stack.  (This may not be at the very start of the allocated stack region
 if the calling sequence has pushed anything else since pushing the stack
@@ -4738,7 +4741,7 @@ save area closer to the top of the stack
 @item
 @cindex @code{ACCUMULATE_OUTGOING_ARGS} and stack frames
 Optionally, when @code{ACCUMULATE_OUTGOING_ARGS} is defined, a region of
-@code{current_function_outgoing_args_size} bytes to be used for outgoing
+@code{crtl->outgoing_args_size} bytes to be used for outgoing
 argument lists of the function.  @xref{Stack Arguments}.
 @end itemize
 
@@ -4787,11 +4790,12 @@ may be reconsidered for a subsequent del
 (at least in principle) be considered for the so far unfilled delay
 slot.
 
-@findex current_function_epilogue_delay_list
+@findex epilogue_delay_list
+@findex crtl->epilogue_delay_list
 @findex final_scan_insn
 The insns accepted to fill the epilogue delay slots are put in an RTL
-list made with @code{insn_list} objects, stored in the variable
-@code{current_function_epilogue_delay_list}.  The insn for the first
+list made with @code{insn_list} objects, stored in
+@code{crtl->epilogue_delay_list}.  The insn for the first
 delay slot comes first in the list.  Your definition of the macro
 @code{TARGET_ASM_FUNCTION_EPILOGUE} should fill the delay slots by
 outputting the insns in this list, usually by calling
@@ -10831,8 +10835,8 @@ the hard register itself, if it is known
 @code{MEM}.
 If you are returning a @code{MEM}, this is only a hint for the allocator;
 it might decide to use another register anyways.
-You may use @code{current_function_leaf_function} in the hook, functions
-that use @code{REG_N_SETS}, to determine if the hard
+You may use @code{current_function_is_leaf} or 
+@code{REG_N_SETS} in the hook to determine if the hard
 register in question will not be clobbered.
 The default value of this hook is @code{NULL}, which disables any special
 allocation.


Re: [PATCH,mmix] convert to constraints.md

2012-09-14 Thread Hans-Peter Nilsson
On Wed, 12 Sep 2012, Hans-Peter Nilsson wrote:
> On Wed, 12 Sep 2012, Nathan Froyd wrote:
> > > - Keeping old layout of "mmix_reg_or_8bit_operand".  That looked like
> > >   a spurious change and I prefer the ior construct to the
> > >   if_then_else.
> >
> > ISTR without this change, there were lots of assembly changes like:

> I'll try with your original patch and see if I can spot
> something.

Nope, I see no differences in the generated code before/after
the patch-patch below (applied to your original patch, except
edited as if using --no-prefix, to fit with my other patches).
Case closed: I don't think gen* mishandled either construct.
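
One way to see why the two predicate forms can behave identically is a quick truth-table check. This is only a sketch: it models the three sub-tests as booleans and assumes that an operand is never both a register and a const_int at the same time. The function names are made up for the illustration.

```cpp
#include <cassert>

// Models (if_then_else (match_code "const_int")
//                      (match_test "satisfies_constraint_I (op)")
//                      (match_operand 0 "register_operand"))
bool pred_if_then_else(bool is_reg, bool is_const_int, bool satisfies_I)
{
  return is_const_int ? satisfies_I : is_reg;
}

// Models (ior (match_operand 0 "register_operand")
//             (and (match_code "const_int")
//                  (match_test "satisfies_constraint_I (op)")))
bool pred_ior(bool is_reg, bool is_const_int, bool satisfies_I)
{
  return is_reg || (is_const_int && satisfies_I);
}

// Exhaustively compares both forms over every input combination that
// respects the assumption "not register and const_int at once".
bool forms_agree()
{
  for (int bits = 0; bits < 8; ++bits)
    {
      bool is_reg = bits & 1;
      bool is_const_int = bits & 2;
      bool satisfies_I = bits & 4;
      if (is_reg && is_const_int)
        continue;   // impossible operand under the stated assumption
      if (pred_if_then_else(is_reg, is_const_int, satisfies_I)
          != pred_ior(is_reg, is_const_int, satisfies_I))
        return false;
    }
  return true;
}
```

The two forms differ only on the excluded combination (register and const_int both true, constraint false), which is consistent with seeing no difference in generated code.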

--- patch.nathanorig.adjusted   2012-09-12 12:33:34.0 +0200
+++ patch3  2012-09-14 14:42:31.0 +0200
@@ -364,7 +364,7 @@ diff --git a/gcc/config/mmix/predicates.
 index b5773b8..7fa3bf1 100644
 --- gcc/config/mmix/predicates.md
 +++ gcc/config/mmix/predicates.md
-@@ -149,7 +149,13 @@
+@@ -149,7 +149,14 @@
  ;; True if this is a register or an int 0..255.

  (define_predicate "mmix_reg_or_8bit_operand"
@@ -372,9 +372,10 @@ index b5773b8..7fa3bf1 100644
 -   (match_operand 0 "register_operand")
 -   (and (match_code "const_int")
 -  (match_test "CONST_OK_FOR_LETTER_P (INTVAL (op), 'I')"
-+  (if_then_else (match_code "const_int")
-+(match_test "satisfies_constraint_I (op)")
-+(match_operand 0 "register_operand")))
++  (ior
++   (match_operand 0 "register_operand")
++   (and (match_code "const_int")
++  (match_test "satisfies_constraint_I (op)"
 +
 +;; True if this is a memory address, possibly strictly.
 +

brgds, H-P


Re: [PATCH v2] rs6000: Add 2 built-ins to read the Time Base Register on PowerPC

2012-09-14 Thread David Edelsohn
On Fri, Sep 14, 2012 at 8:34 PM, Tulio Magno Quites Machado Filho
 wrote:
> Segher Boessenkool  writes:

> I don't understand the problem.

There is no problem.

Segher is concerned that -mcpu=powerpc64, which is supposed to generate
"generic" PPC64 code, produces mftb instead of mfspr.  However, there
is no right answer and it is unclear if it is better for
-mcpu=powerpc64 to support the most processors now or only forward
compatibility for the future.  For the moment, working on all PPC64
systems seems like the better option.  We can revisit this in the
future when POWER3 is even farther in the past.

- david


Re: [PATCH] Set correct source location for deallocator calls

2012-09-14 Thread H.J. Lu
On Sat, Sep 8, 2012 at 2:42 PM, Dehao Chen  wrote:
> Hi,
>
> I've added a libjava unittest which verifies that this patch will not
> break Java debug info. I've also incorporated Richard's review in the
> previous mail. Attached is the new patch, which passed bootstrap and
> all gcc/libjava testsuites on x86.
>
> Is it ok for trunk?
>
> Thanks,
> Dehao
>
> gcc/ChangeLog:
> 2012-09-08  Dehao Chen  
>
>  * tree-eh.c (goto_queue_node): New field.
> (record_in_goto_queue): New parameter.
> (record_in_goto_queue_label): New parameter.
> (lower_try_finally_dup_block): New parameter.
> (maybe_record_in_goto_queue): Update source location.
> (lower_try_finally_copy): Likewise.
> (honor_protect_cleanup_actions): Likewise.
> * gimplify.c (gimplify_expr): Reset the location to unknown.
>
> gcc/testsuite/ChangeLog:
> 2012-09-08  Dehao Chen  
>
> * g++.dg/debug/dwarf2/deallocator.C: New test.
>
> libjava/ChangeLog:
> 2012-09-08  Dehao Chen  
>
> * testsuite/libjava.lang/sourcelocation.java: New cases.
> * testsuite/libjava.lang/sourcelocation.out: New cases.

On Linux/x86, I got

FAIL: sourcelocation -O3 -findirect-dispatch output - source compiled test
FAIL: sourcelocation -O3 output - source compiled test
FAIL: sourcelocation -findirect-dispatch output - source compiled test
FAIL: sourcelocation output - source compiled test

spawn [open ...]^M
-1
-1
-1
PASS: sourcelocation -findirect-dispatch execution - source compiled test
FAIL: sourcelocation -findirect-dispatch output - source compiled test


-- 
H.J.


Re: [PATCH] Set correct source location for deallocator calls

2012-09-14 Thread H.J. Lu
On Fri, Sep 14, 2012 at 9:25 PM, H.J. Lu  wrote:
> On Sat, Sep 8, 2012 at 2:42 PM, Dehao Chen  wrote:
>> Hi,
>>
>> I've added a libjava unittest which verifies that this patch will not
>> break Java debug info. I've also incorporated Richard's review in the
>> previous mail. Attached is the new patch, which passed bootstrap and
>> all gcc/libjava testsuites on x86.
>>
>> Is it ok for trunk?
>>
>> Thanks,
>> Dehao
>>
>> gcc/ChangeLog:
>> 2012-09-08  Dehao Chen  
>>
>>  * tree-eh.c (goto_queue_node): New field.
>> (record_in_goto_queue): New parameter.
>> (record_in_goto_queue_label): New parameter.
>> (lower_try_finally_dup_block): New parameter.
>> (maybe_record_in_goto_queue): Update source location.
>> (lower_try_finally_copy): Likewise.
>> (honor_protect_cleanup_actions): Likewise.
>> * gimplify.c (gimplify_expr): Reset the location to unknown.
>>
>> gcc/testsuite/ChangeLog:
>> 2012-09-08  Dehao Chen  
>>
>> * g++.dg/debug/dwarf2/deallocator.C: New test.
>>
>> libjava/ChangeLog:
>> 2012-09-08  Dehao Chen  
>>
>> * testsuite/libjava.lang/sourcelocation.java: New cases.
>> * testsuite/libjava.lang/sourcelocation.out: New cases.
>
> On Linux/x86, I got
>
> FAIL: sourcelocation -O3 -findirect-dispatch output - source compiled test
> FAIL: sourcelocation -O3 output - source compiled test
> FAIL: sourcelocation -findirect-dispatch output - source compiled test
> FAIL: sourcelocation output - source compiled test
>
> spawn [open ...]
> -1
> -1
> -1
> PASS: sourcelocation -findirect-dispatch execution - source compiled test
> FAIL: sourcelocation -findirect-dispatch output - source compiled test
>
>

I am using binutils mainline 20120914 from CVS.

-- 
H.J.


Re: [PATCH] Set correct source location for deallocator calls

2012-09-14 Thread Andrew Pinski
On Fri, Sep 14, 2012 at 9:25 PM, H.J. Lu wrote:
> On Sat, Sep 8, 2012 at 2:42 PM, Dehao Chen wrote:
>> Hi,
>>
>> I've added a libjava unittest which verifies that this patch will not
>> break Java debug info. I've also incorporated Richard's review in the
>> previous mail. Attached is the new patch, which passed bootstrap and
>> all gcc/libjava testsuites on x86.
>>
>> Is it ok for trunk?
>>
>> Thanks,
>> Dehao
>>
>> gcc/ChangeLog:
>> 2012-09-08  Dehao Chen  
>>
>>  * tree-eh.c (goto_queue_node): New field.
>> (record_in_goto_queue): New parameter.
>> (record_in_goto_queue_label): New parameter.
>> (lower_try_finally_dup_block): New parameter.
>> (maybe_record_in_goto_queue): Update source location.
>> (lower_try_finally_copy): Likewise.
>> (honor_protect_cleanup_actions): Likewise.
>> * gimplify.c (gimplify_expr): Reset the location to unknown.
>>
>> gcc/testsuite/ChangeLog:
>> 2012-09-08  Dehao Chen  
>>
>> * g++.dg/debug/dwarf2/deallocator.C: New test.
>>
>> libjava/ChangeLog:
>> 2012-09-08  Dehao Chen  
>>
>> * testsuite/libjava.lang/sourcelocation.java: New cases.
>> * testsuite/libjava.lang/sourcelocation.out: New cases.
>
> On Linux/x86, I got
>
> FAIL: sourcelocation -O3 -findirect-dispatch output - source compiled test
> FAIL: sourcelocation -O3 output - source compiled test
> FAIL: sourcelocation -findirect-dispatch output - source compiled test
> FAIL: sourcelocation output - source compiled test
>
> spawn [open ...]
> -1
> -1
> -1
> PASS: sourcelocation -findirect-dispatch execution - source compiled test
> FAIL: sourcelocation -findirect-dispatch output - source compiled test

I bet you have an older addr2line installed.

Thanks,
Andrew Pinski


Re: [Patch,avr] ad PR54222: Support saturated +, -, ABS

2012-09-14 Thread Denis Chertykov
2012/9/14 Georg-Johann Lay:
> This patch adds more fixed-point support, namely saturated operations:
>
> SS_PLUS, SS_MINUS, SS_NEG, SS_ABS,
> US_PLUS, US_MINUS, US_NEG
>
> for all supported fixed-point modes:
>
> [U]QQ,
> [U]HQ, [U]HA,
> [U]SQ, [U]SA,
> [U]DQ, [U]DA, [U]TA.
>
> Depending on their complexity, the functions are implemented in libgcc
> or are natively supported by avr-gcc.
>
> The bulk of code is in the avr_out_plus_1 routine which has been generalized
> to perform saturation.
>
> avr_out_plus has been rewritten and is now generic enough to handle all the
> cases that were formerly treated by:
>   avr_out_plus
>   avr_out_plus_noclobber
>   avr_out_minus
>   avr_out_plus64
>   avr_out_minus64
>
> The latter 4 functions are removed and the md files are cleaned up to use
> avr_out_plus.
>
> There are no new regressions.
>
> However, all new tests with "-Os -flto" fail because they trigger a
> segmentation fault in lto1 at
>   gcc/tree-streamer-in.c:unpack_ts_fixed_cst_value_fields()
> while that function tries to deserialize TREE_FIXED_CST.
>
> Thus, these FAILs are because of an LTO issue.
>
> Apart from the "-Os -flto" cases, all new tests PASS.
>
> Ok for trunk?
>

Ok. Please apply.

Denis.
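
For readers unfamiliar with the saturated operations named in the patch description (SS_PLUS, US_PLUS, US_MINUS, SS_NEG, SS_ABS), here is a minimal C sketch of their semantics on 8-bit values, treating the raw bits of a fixed-point value as an integer. This is only an illustration and not the avr-gcc or libgcc implementation; the function names (ss_add8, us_add8, us_sub8, ss_neg8, ss_abs8) are hypothetical and do not appear in the patch.

```c
#include <stdint.h>

/* Illustration only: hypothetical helpers, not code from the patch. */

/* Signed saturating add (the SS_PLUS idea): clamp instead of wrapping. */
static int8_t ss_add8(int8_t a, int8_t b)
{
    int16_t sum = (int16_t)a + (int16_t)b;  /* widen so the sum cannot overflow */
    if (sum > INT8_MAX) return INT8_MAX;
    if (sum < INT8_MIN) return INT8_MIN;
    return (int8_t)sum;
}

/* Unsigned saturating add (the US_PLUS idea): clamp at the maximum. */
static uint8_t us_add8(uint8_t a, uint8_t b)
{
    uint16_t sum = (uint16_t)a + (uint16_t)b;
    return sum > UINT8_MAX ? UINT8_MAX : (uint8_t)sum;
}

/* Unsigned saturating subtract (the US_MINUS idea): clamp at zero. */
static uint8_t us_sub8(uint8_t a, uint8_t b)
{
    return a > b ? (uint8_t)(a - b) : 0;
}

/* Signed saturating negate (the SS_NEG idea): -MIN does not fit, so clamp. */
static int8_t ss_neg8(int8_t a)
{
    return a == INT8_MIN ? INT8_MAX : (int8_t)-a;
}

/* Signed saturating absolute value (the SS_ABS idea). */
static int8_t ss_abs8(int8_t a)
{
    return a < 0 ? ss_neg8(a) : a;
}
```

The sketch widens to 16 bits before comparing against the limits, which is the simplest way to avoid overflow in portable C; it says nothing about how the AVR backend generates its instruction sequences.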