Re: [PATCH] Fix incorrect byte swap detection (PR tree-optimization/60454)

2014-03-12 Thread Jakub Jelinek
On Wed, Mar 12, 2014 at 11:45:03AM +0800, Thomas Preud'homme wrote:
> > From: Jakub Jelinek [mailto:ja...@redhat.com]
> > 
> > In theory you could have __CHAR_BIT__ different from 8 and what you care
> > about is that uint32_t has exactly 32 bits, so the check would need to be
> >   if (sizeof (uint32_t) * __CHAR_BIT__ != 32)
> >     return 0;
> 
> I could go with:
> 
> in = (0x12 << (__CHAR_BIT__ * 3))
>      | (0x34 << (__CHAR_BIT__ * 2))
>      | (0x56 << __CHAR_BIT__)
>      | 0x78;
> 
> and compare with a similarly constructed out, so that the test can run
> whenever sizeof (uint32_t) * __CHAR_BIT__ >= 32. Would that work?

No need to bother, I think we don't have any __CHAR_BIT__ != 8 targets right
now, and after all, the bswap code doesn't bother with it either:
static unsigned int
execute_optimize_bswap (void)
{
  basic_block bb;
  bool bswap16_p, bswap32_p, bswap64_p;
  bool changed = false;
  tree bswap16_type = NULL_TREE, bswap32_type = NULL_TREE, bswap64_type =
    NULL_TREE;

  if (BITS_PER_UNIT != 8)
    return 0;

> Thanks for the review. See attachment and below to check the version you 
> approved.

Thanks, this is ok.

Jakub
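
The CHAR_BIT-based construction Thomas proposes can be sketched in plain C.
This is only an illustration, not the testcase from the patch: make_pattern,
swap_bytes32 and check_swap are hypothetical names, and the hand-written swap
merely stands in for the kind of code the bswap pass tries to recognise.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Hypothetical helper: place four small values in CHAR_BIT-wide slots,
   most significant slot first.  */
static uint32_t
make_pattern (uint32_t b3, uint32_t b2, uint32_t b1, uint32_t b0)
{
  return (b3 << (CHAR_BIT * 3)) | (b2 << (CHAR_BIT * 2))
         | (b1 << CHAR_BIT) | b0;
}

/* Hand-written swap of the four CHAR_BIT-wide slots.  */
static uint32_t
swap_bytes32 (uint32_t x)
{
  uint32_t mask = ((uint32_t) 1 << CHAR_BIT) - 1;

  return ((x & mask) << (CHAR_BIT * 3))
         | (((x >> CHAR_BIT) & mask) << (CHAR_BIT * 2))
         | (((x >> (CHAR_BIT * 2)) & mask) << CHAR_BIT)
         | ((x >> (CHAR_BIT * 3)) & mask);
}

/* Returns 1 if the swap matches, -1 on mismatch, and 0 if the check is
   skipped because uint32_t does not have exactly 32 bits.  */
static int
check_swap (void)
{
  if (sizeof (uint32_t) * CHAR_BIT != 32)
    return 0;

  uint32_t in = make_pattern (0x12, 0x34, 0x56, 0x78);
  uint32_t out = make_pattern (0x78, 0x56, 0x34, 0x12);
  return swap_bytes32 (in) == out ? 1 : -1;
}
```

The guard is the one suggested earlier in the thread; with it in place the
shifts never exceed the width of uint32_t.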


Re: [PATCH] Fix incorrect byte swap detection (PR tree-optimization/60454)

2014-03-12 Thread Jakub Jelinek
On Wed, Mar 12, 2014 at 11:43:00AM +0800, Joey Ye wrote:
> 4.8 also has this bug. OK to backport?

Yes.

Jakub


Re: [PATCH] Fix PR60505

2014-03-12 Thread Jakub Jelinek
On Tue, Mar 11, 2014 at 04:16:13PM -0700, Cong Hou wrote:
> This patch is fixing PR60505 in which the vectorizer may produce
> unnecessary epilogues.
> 
> Bootstrapped and tested on a x86_64 machine.
> 
> OK for trunk?

That looks wrong.  Consider the case where the loop isn't versioned,
if you disable generation of the epilogue loop, you end up only with
a vector loop.

Say:
unsigned char ovec[16] __attribute__((aligned (16))) = { 0 };
void
foo (char *__restrict in, char *__restrict out, int num)
{
  int i;

  in = __builtin_assume_aligned (in, 16);
  out = __builtin_assume_aligned (out, 16);
  for (i = 0; i < num; ++i)
    out[i] = (ovec[i] = in[i]);
  out[num] = ovec[num / 2];
}
-O2 -ftree-vectorize.  Now, consider if this function is called
with num != 16 (num > 16 is of course invalid, but num 0 to 15 is
valid and your patch will cause wrong code in this case).

Jakub
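
To make the concern concrete, here is a rough model in plain C of a loop that
has only its vector body, compared with one that also has a scalar epilogue.
It is a sketch under assumptions, not what the vectorizer actually emits: VF,
copy_vector_main, copy_with_epilogue and matches_scalar are hypothetical
names, and memcpy of VF bytes stands in for one vector load and store.

```c
#include <assert.h>
#include <string.h>

enum { VF = 16 };  /* assumed vectorization factor: 16 chars per vector */

/* Reference behaviour: the original scalar loop.  */
static void
copy_scalar (const char *in, char *out, int num)
{
  for (int i = 0; i < num; ++i)
    out[i] = in[i];
}

/* Vector body only: handles full groups of VF elements and returns the
   number of elements it processed.  */
static int
copy_vector_main (const char *in, char *out, int num)
{
  int i;

  for (i = 0; i + VF <= num; i += VF)
    memcpy (out + i, in + i, VF);
  return i;
}

/* Vector body followed by a scalar epilogue for the remaining elements.  */
static void
copy_with_epilogue (const char *in, char *out, int num)
{
  int i = copy_vector_main (in, out, num);

  for (; i < num; ++i)
    out[i] = in[i];
}

/* Returns 1 if the chosen strategy produces the same result as the scalar
   loop for NUM elements (NUM must be at most 32), 0 otherwise.  */
static int
matches_scalar (int num, int use_epilogue)
{
  char in[32], expect[32] = { 0 }, got[32] = { 0 };

  for (int i = 0; i < 32; ++i)
    in[i] = (char) (i + 1);

  copy_scalar (in, expect, num);
  if (use_epilogue)
    copy_with_epilogue (in, got, num);
  else
    copy_vector_main (in, got, num);
  return memcmp (expect, got, sizeof got) == 0;
}
```

In this model the vector-only variant agrees with the scalar loop only when
num is a multiple of VF; any other count needs the epilogue.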


Re: [RFC] Do not consider volatile asms as optimization barriers #1

2014-03-12 Thread Eric Botcazou
> Thanks, and to Bernd for the review.  I went ahead and applied it to trunk.

Thanks.  We need something for the 4.8 branch as well, probably the builtins.c 
hunk and the reversion of the cse.c/cselib.c/dse.c changes to the 4.7 state.

-- 
Eric Botcazou


Re: [PATCH] Fix __builtin_unreachable related regression (PR middle-end/60482)

2014-03-12 Thread Richard Biener
On Tue, 11 Mar 2014, Jakub Jelinek wrote:

> Hi!
> 
> As described in the PR, the r208165 change regressed following test.
> The problem is that VRP inserts a useless ASSERT_EXPR right before
> __builtin_unreachable () (obviously, the uses of the ASSERT_EXPR
> lhs aren't and can't be used by anything), which then prevents
> assert_unreachable_fallthru_edge_p from detecting it properly
> (but, even ignoring ASSERT_EXPRs there still would fail, because
> the ASSERT_EXPR adds another user of the SSA_NAME we check imm uses for).
> 
> Perhaps FOUND_IN_SUBGRAPH (4.3 and earlier era) would be always true
> here, but live_on_edge provably isn't always true, so it makes sense to test
> it, something that isn't live on the edge is useless.
> 
> The tree-cfg.c change is just small improvement discovered when looking into
> it, clobber stmts before __builtin_unreachable can be certainly ignored,
> they don't do anything.
> 
> The patch regresses ssa-ifcombine-10.c testcase, I'll post a fix for that
> momentarily.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

Ok.

Thanks,
Richard.

> 2014-03-11  Jakub Jelinek  
> 
>   PR middle-end/60482
>   * tree-vrp.c (register_edge_assert_for_1): Don't add assert
>   if there are multiple uses, but op doesn't live on E edge.
>   * tree-cfg.c (assert_unreachable_fallthru_edge_p): Also ignore
>   clobber stmts before __builtin_unreachable.
> 
>   * gcc.dg/vect/pr60482.c: New test.
> 
> --- gcc/tree-vrp.c.jj 2014-01-25 00:11:37.0 +0100
> +++ gcc/tree-vrp.c 2014-03-10 14:59:03.748267354 +0100
> @@ -5423,12 +5423,9 @@ register_edge_assert_for_1 (tree op, enu
>  return false;
>  
>/* We know that OP will have a zero or nonzero value.  If OP is used
> - more than once go ahead and register an assert for OP.
> -
> - The FOUND_IN_SUBGRAPH support is not helpful in this situation as
> - it will always be set for OP (because OP is used in a COND_EXPR in
> - the subgraph).  */
> -  if (!has_single_use (op))
> + more than once go ahead and register an assert for OP.  */
> +  if (live_on_edge (e, op)
> +  && !has_single_use (op))
>  {
>val = build_int_cst (TREE_TYPE (op), 0);
>register_new_assert_for (op, op, code, val, NULL, e, bsi);
> --- gcc/tree-cfg.c.jj 2014-02-20 21:38:42.0 +0100
> +++ gcc/tree-cfg.c 2014-03-10 14:59:52.058957446 +0100
> @@ -410,9 +410,9 @@ assert_unreachable_fallthru_edge_p (edge
> if (gsi_end_p (gsi))
>   return false;
> stmt = gsi_stmt (gsi);
> -   if (is_gimple_debug (stmt))
> +   while (is_gimple_debug (stmt) || gimple_clobber_p (stmt))
>   {
> -   gsi_next_nondebug (&gsi);
> +   gsi_next (&gsi);
> if (gsi_end_p (gsi))
>   return false;
> stmt = gsi_stmt (gsi);
> --- gcc/testsuite/gcc.dg/vect/pr60482.c.jj 2014-03-10 15:08:16.700085976 +0100
> +++ gcc/testsuite/gcc.dg/vect/pr60482.c 2014-03-10 15:15:09.609738455 +0100
> @@ -0,0 +1,20 @@
> +/* PR middle-end/60482 */
> +/* { dg-do compile } */
> +/* { dg-additional-options "-Ofast" } */
> +/* { dg-require-effective-target vect_int } */
> +
> +double
> +foo (double *x, int n)
> +{
> +  double p = 0.0;
> +  int i;
> +  x = __builtin_assume_aligned (x, 128);
> +  if (n % 128)
> +    __builtin_unreachable ();
> +  for (i = 0; i < n; i++)
> +    p += x[i];
> +  return p;
> +}
> +
> +/* { dg-final { scan-tree-dump-not "epilog loop required" "vect" } } */
> +/* { dg-final { cleanup-tree-dump "vect" } } */
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE / SUSE Labs
SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer
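
The tree-cfg.c hunk above changes a single "skip one debug statement" step
into a loop that skips any run of debug and clobber statements before looking
for __builtin_unreachable.  A toy model of that scan, assuming a hypothetical
enum stmt_kind and a plain array in place of GCC's gimple statement iterator:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical statement kinds; GCC's real gimple statements are richer.  */
enum stmt_kind { STMT_DEBUG, STMT_CLOBBER, STMT_UNREACHABLE, STMT_OTHER };

/* Returns 1 if the first statement that is neither a debug statement nor a
   clobber is the unreachable call, 0 otherwise (including when the block
   runs out of statements).  */
static int
starts_with_unreachable (const enum stmt_kind *stmts, size_t n)
{
  size_t i = 0;

  while (i < n && (stmts[i] == STMT_DEBUG || stmts[i] == STMT_CLOBBER))
    i++;
  return i < n && stmts[i] == STMT_UNREACHABLE;
}
```

The loop mirrors the while condition in the patch; the bounds check plays the
role of the gsi_end_p test.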


Re: [PATCH] Improve ifcombine

2014-03-12 Thread Richard Biener
On Tue, 11 Mar 2014, Jakub Jelinek wrote:

> Hi!
> 
> This patch fixes the ssa-ifcombine-10.c regression.
> The thing is that the uselessly added ASSERT_EXPR makes vrp1 change
> the cfg slightly like this:
>:
>_4 = x_3(D) & 1;
>if (_4 == 0)
>  goto ;
>else
>  goto ;
>  
>:
>_5 = x_3(D) & 4;
>if (_5 != 0)
> -goto ;
> -  else
>  goto ;
> +  else
> +goto ;
>  
>:
>  
>:
> -  # t_1 = PHI <0(2), 3(3), 0(4)>
> +  # t_1 = PHI <0(2), 3(4), 0(3)>
>return t_1;
> (addition of the ASSERT_EXPR resulted in creation of a new bb to insert
> it into and that bb is then removed again during cfg cleanup, but
> it ends up effectively swapping the forwarder block from one edge of the
> gimple cond to the other with corresponding phi arg change).
> Now, tree_ssa_ifcombine_bb apparently only groks the latter form (the one
> with + lines), but not the equivalent form the testcase had before VRP
> (and with the PR60482 fix also has after VRP, the one with - lines).
> 
> This patch teaches tree_ssa_ifcombine_bb to handle both forms.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

Ok in principle, but is there a possibility to factor this a bit?
It looks like a lot of cut&paste (without looking too close for subtle
differences).

Thanks,
Richard.

> Note, the phi-opt-2.c change is there because the patch made the
> test fail, as for LOGICAL_OP_NON_SHORT_CIRCUIT we now generate even
> better code, return a && b.  So, I've added ssa-ifcombine-13.c test
> which is phi-opt-2.c and test that for -mbranch-cost=2 we have no ifs,
> and phi-opt-2.c now checks that for -mbranch-cost=1 we do have one if
> (ifcombine then doesn't do anything and we verify that phiopt does what it
> should).
> 
> 2014-03-11  Jakub Jelinek  
> 
>   * tree-ssa-ifcombine.c (forwarder_block_to): New function.
>   (tree_ssa_ifcombine_bb): Handle also cases where else_bb is
>   an empty forwarder block to then_bb or vice versa and then_bb
>   and else_bb are effectively swapped.
> 
>   * gcc.dg/tree-ssa/ssa-ifcombine-12.c: New test.
>   * gcc.dg/tree-ssa/ssa-ifcombine-13.c: New test.
>   * gcc.dg/tree-ssa/phi-opt-2.c: Pass -mbranch-cost=1 if
>   possible, only test for exactly one if if -mbranch-cost=1
>   has been passed.
> 
> --- gcc/tree-ssa-ifcombine.c.jj   2014-03-11 12:13:53.012618098 +0100
> +++ gcc/tree-ssa-ifcombine.c  2014-03-11 16:15:29.084329709 +0100
> @@ -135,6 +135,16 @@ bb_no_side_effects_p (basic_block bb)
>return true;
>  }
>  
> +/* Return true if BB is an empty forwarder block to TO_BB.  */
> +
> +static bool
> +forwarder_block_to (basic_block bb, basic_block to_bb)
> +{
> +  return empty_block_p (bb)
> +  && single_succ_p (bb)
> +  && single_succ (bb) == to_bb;
> +}
> +
>  /* Verify if all PHI node arguments in DEST for edges from BB1 or
> BB2 to DEST are the same.  This makes the CFG merge point
> free from side-effects.  Return true in this case, else false.  */
> @@ -660,6 +670,102 @@ tree_ssa_ifcombine_bb (basic_block inner
> return ifcombine_ifandif (inner_cond_bb, true, outer_cond_bb, false,
>   true);
>   }
> +
> +  if (forwarder_block_to (else_bb, then_bb))
> + {
> +   /* Other possibilities for the && form, if else_bb is
> +  empty forwarder block to then_bb.  Compared to the above simpler
> +  forms this can be treated as if then_bb and else_bb were swapped,
> +  and the corresponding inner_cond_bb not inverted because of that.
> +  For same_phi_args_p we look at equality of arguments between
> +  edge from outer_cond_bb and the forwarder block.  */
> +   if (recognize_if_then_else (outer_cond_bb, &inner_cond_bb, &then_bb)
> +   && same_phi_args_p (outer_cond_bb, else_bb, then_bb)
> +   && bb_no_side_effects_p (inner_cond_bb))
> + {
> +   /* We have
> +
> +  if (q) goto inner_cond_bb; else goto then_bb;
> +
> +  if (p) goto then_bb; else goto else_bb;
> +
> +  # empty fallthru
> +
> +  # x = PHI 
> +  ...
> +*/
> +   return ifcombine_ifandif (inner_cond_bb, false, outer_cond_bb,
> + false, false);
> + }
> +
> +   /* And a version where the outer condition is negated.  */
> +   if (recognize_if_then_else (outer_cond_bb, &then_bb, &inner_cond_bb)
> +   && same_phi_args_p (outer_cond_bb, else_bb, then_bb)
> +   && bb_no_side_effects_p (inner_cond_bb))
> + {
> +   /* We have
> +
> +  if (q) goto then_bb; else goto inner_cond_bb;
> +
> +  if (p) goto then_bb; else goto else_bb;
> +
> +  # empty fallthru
> +
> + 
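
The forwarder_block_to predicate from the patch above can be modelled outside
GCC with a toy basic-block structure.  Everything here is an assumption made
for illustration: struct toy_bb and toy_forwarder_block_to are hypothetical,
and a statement count stands in for empty_block_p.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for basic_block: a statement count and, when there
   is exactly one successor, a pointer to it.  */
struct toy_bb
{
  int n_stmts;
  int n_succs;
  const struct toy_bb *succ;
};

/* Returns 1 if BB is an empty block whose only successor is TO_BB.  */
static int
toy_forwarder_block_to (const struct toy_bb *bb, const struct toy_bb *to_bb)
{
  return bb->n_stmts == 0
         && bb->n_succs == 1
         && bb->succ == to_bb;
}
```

With such a predicate the caller can check both directions, else_bb
forwarding to then_bb and then_bb forwarding to else_bb, as the ChangeLog
entry describes.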

Re: [PATCH] Fix reassoc of vectors (PR tree-optimization/60502)

2014-03-12 Thread Richard Biener
On Tue, 11 Mar 2014, Jakub Jelinek wrote:

> Hi!
> 
> build_low_bits_mask doesn't work for vector types (even TYPE_PRECISION
> alone on it is meaningless), but what we actually want is a constant with
> all bits set.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

Ok.

Thanks,
Richard.

> 2014-03-11  Jakub Jelinek  
>   Marc Glisse  
> 
>   PR tree-optimization/60502
>   * tree-ssa-reassoc.c (eliminate_not_pairs): Use build_all_ones_cst
>   instead of build_low_bits_mask.
> 
>   * gcc.c-torture/compile/pr60502.c: New test.
> 
> --- gcc/tree-ssa-reassoc.c.jj 2014-03-11 15:47:44.0 +0100
> +++ gcc/tree-ssa-reassoc.c 2014-03-11 18:47:53.254946786 +0100
> @@ -828,8 +828,7 @@ eliminate_not_pairs (enum tree_code opco
> if (opcode == BIT_AND_EXPR)
>   oe->op = build_zero_cst (TREE_TYPE (oe->op));
> else if (opcode == BIT_IOR_EXPR)
> - oe->op = build_low_bits_mask (TREE_TYPE (oe->op),
> -   TYPE_PRECISION (TREE_TYPE (oe->op)));
> + oe->op = build_all_ones_cst (TREE_TYPE (oe->op));
>  
> reassociate_stats.ops_eliminated += ops->length () - 1;
> ops->truncate (0);
> --- gcc/testsuite/gcc.c-torture/compile/pr60502.c.jj 2014-03-11 18:36:45.341757473 +0100
> +++ gcc/testsuite/gcc.c-torture/compile/pr60502.c 2014-03-11 18:35:58.0 +0100
> @@ -0,0 +1,18 @@
> +/* PR tree-optimization/60502 */
> +
> +typedef signed char v16i8 __attribute__ ((vector_size (16)));
> +typedef unsigned char v16u8 __attribute__ ((vector_size (16)));
> +
> +void
> +foo (v16i8 *x)
> +{
> +  v16i8 m1 = { -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1 };
> +  *x |= *x ^ m1;
> +}
> +
> +void
> +bar (v16u8 *x)
> +{
> +  v16u8 m1 = { -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1 };
> +  *x |= *x ^ m1;
> +}
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE / SUSE Labs
SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer
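
The identity behind the fix is that x | ~x has every bit set, and for a
vector type that has to hold in every lane.  A portable sketch of what the
new testcase computes, using a plain array instead of GCC's vector
extension; LANES, or_with_complement and all_lanes_set are hypothetical
names chosen for this example:

```c
#include <assert.h>
#include <stdint.h>

enum { LANES = 16 };  /* matches the 16-byte vectors in the testcase */

/* Lane-wise equivalent of "*x |= *x ^ m1" with m1 being all ones.  */
static void
or_with_complement (uint8_t x[LANES])
{
  for (int i = 0; i < LANES; ++i)
    x[i] |= (uint8_t) (x[i] ^ 0xff);
}

/* Returns 1 if every lane has all of its bits set, 0 otherwise.  */
static int
all_lanes_set (const uint8_t x[LANES])
{
  for (int i = 0; i < LANES; ++i)
    if (x[i] != 0xff)
      return 0;
  return 1;
}
```

Whatever the input lanes hold, the result is the all-ones value per lane,
which is the constant the patch now builds directly.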


Re: [PATCH] Fix PR60505

2014-03-12 Thread Richard Biener
On Wed, 12 Mar 2014, Jakub Jelinek wrote:

> On Tue, Mar 11, 2014 at 04:16:13PM -0700, Cong Hou wrote:
> > This patch is fixing PR60505 in which the vectorizer may produce
> > unnecessary epilogues.
> > 
> > Bootstrapped and tested on a x86_64 machine.
> > 
> > OK for trunk?
> 
> That looks wrong.  Consider the case where the loop isn't versioned,
> if you disable generation of the epilogue loop, you end up only with
> a vector loop.
> 
> Say:
> unsigned char ovec[16] __attribute__((aligned (16))) = { 0 };
> void
> foo (char *__restrict in, char *__restrict out, int num)
> {
>   int i;
> 
>   in = __builtin_assume_aligned (in, 16);
>   out = __builtin_assume_aligned (out, 16);
>   for (i = 0; i < num; ++i)
>     out[i] = (ovec[i] = in[i]);
>   out[num] = ovec[num / 2];
> }
> -O2 -ftree-vectorize.  Now, consider if this function is called
> with num != 16 (num > 16 is of course invalid, but num 0 to 15 is
> valid and your patch will cause wrong code in this case)

Indeed - we also "share" the epilogue loop for the cost model
check.  See where we set its maximum number of iterations.

Richard.


Re: [PATCH][AARCH64]PR60034

2014-03-12 Thread Marcus Shawcroft
Hi Kugan


On 3 March 2014 21:56, Kugan  wrote:

> gcc/
>
> 2014-03-03  Kugan Vivekanandarajah  
>
> PR target/60034
> * aarch64/aarch64.c (aarch64_classify_address): Fix alignment for
> section anchor.
>
>
>
> gcc/testsuite/
>
> 2014-03-03  Kugan Vivekanandarajah  
>
> PR target/60034
> * gcc.target/aarch64/pr60034.c: New file.
>

+  else if (SYMBOL_REF_HAS_BLOCK_INFO_P (sym)

This test makes sense.

+   && SYMBOL_REF_ANCHOR_P (sym)

Do we need this test, or is the patch being conservative?  I would
have thought that it is sufficient to drop this test and just take the
block alignment...

+   && SYMBOL_REF_BLOCK (sym) != NULL)
+ align = SYMBOL_REF_BLOCK (sym)->alignment;

+/* { dg-options "-std=gnu99 -fgnu89-inline -O -Wall -Winline
-Wwrite-strings -fmerge-all-constants -frounding-math -g
-Wstrict-prototypes" } */

Can you drop all the options that are not actually required to
reproduce the issue?

Cheers
/Marcus


Re: [PATCH][AARCH64]PR60034

2014-03-12 Thread Kugan


On 12/03/14 20:07, Marcus Shawcroft wrote:
> Hi Kugan
> 
> 
> On 3 March 2014 21:56, Kugan  wrote:
> 
>> gcc/
>>
>> 2014-03-03  Kugan Vivekanandarajah  
>>
>> PR target/60034
>> * aarch64/aarch64.c (aarch64_classify_address): Fix alignment for
>> section anchor.
>>
>>
>>
>> gcc/testsuite/
>>
>> 2014-03-03  Kugan Vivekanandarajah  
>>
>> PR target/60034
>> * gcc.target/aarch64/pr60034.c: New file.
>>
> 
> +  else if (SYMBOL_REF_HAS_BLOCK_INFO_P (sym)
> 
> This test makes sense.
> 
> +   && SYMBOL_REF_ANCHOR_P (sym)
> 
> Do we need this test, or is the patch being conservative?  I would
> have thought that it is sufficient to drop this test and just take the
> block alignment...
> 
Thanks for the review.

If I understand gcc/rtl.h correctly, SYMBOL_REF_ANCHOR_P (sym) is
required for anchor SYMBOL_REFs. SYMBOL_REF_BLOCK (sym) != NULL is
probably redundant. This can probably become a gcc_assert
(SYMBOL_REF_BLOCK (sym)) instead.


> +   && SYMBOL_REF_BLOCK (sym) != NULL)
> + align = SYMBOL_REF_BLOCK (sym)->alignment;
> 
> +/* { dg-options "-std=gnu99 -fgnu89-inline -O -Wall -Winline
> -Wwrite-strings -fmerge-all-constants -frounding-math -g
> -Wstrict-prototypes" } */
> 
> Can you drop all the options that are not actually required to
> reproduce the issue?

I will change it.


Thanks,
Kugan


Re: [PATCH][AArch64] Fix default CPU configurations

2014-03-12 Thread Marcus Shawcroft
On 11 March 2014 11:48, Kyrill Tkachov  wrote:

>> - if test x$target_cpu_cname = x
>> + if test x"$target_cpu_cname" != x
>>
>> I think the addition of quoting here is orthogonal to the issue you
>> are fixing. There are several other references to target_cpu_cname in
>> config.gcc none of which are quoted, so I guess either they should all
>> be quoted, or not, and if they are it is a separate patch.
>
>
> Perhaps I should have commented on this.
> This change is not orthogonal.
> When I initially wrote it as " if test x$target_cpu_cname != x" the script
> complained of an error and happily ignored that line, giving the wrong value
> to target_cpu_default2 on the line below!
>
> "config.gcc: line 4065: test: too many arguments"
>
> If I quote it, it works fine. I suspect it's because of spaces introduced
> into target_cpu_cname earlier, since target_cpu_cname has the format
> "TARGET_CPU_$base_id | $ext_mask" from earlier, but I'm not sure.

For the benefit of the list, Kyrill and I just discussed the need for
quoting on target_cpu_cname. In the aarch64 path the value constructed
in target_cpu_cname is a '|' expression ripped from the table in
aarch64-option-extensions.def hence the quoting on the argument to the
test invocation is required to prevent the shell interpreting the '|'.
The following use of the variable on the RHS of an assignment does
not require additional quoting.

I'm happy that the patch makes sense and should be committed.

/Marcus


Re: [PATCH] Improve ifcombine

2014-03-12 Thread Jakub Jelinek
On Wed, Mar 12, 2014 at 09:51:46AM +0100, Richard Biener wrote:
> Ok in principle, but is there a possibility to factor this a bit?
> It looks like a lot of cut&paste (without looking too close for subtle
> differences).

Like this?

2014-03-12  Jakub Jelinek  

* tree-ssa-ifcombine.c (forwarder_block_to): New function.
(tree_ssa_ifcombine_bb_1): New function.
(tree_ssa_ifcombine_bb): Use it.  Handle also cases where else_bb
is an empty forwarder block to then_bb or vice versa and then_bb
and else_bb are effectively swapped.

* gcc.dg/tree-ssa/ssa-ifcombine-12.c: New test.
* gcc.dg/tree-ssa/ssa-ifcombine-13.c: New test.
* gcc.dg/tree-ssa/phi-opt-2.c: Pass -mbranch-cost=1 if
possible, only test for exactly one if if -mbranch-cost=1
has been passed.

--- gcc/tree-ssa-ifcombine.c.jj 2014-03-11 20:14:34.046082392 +0100
+++ gcc/tree-ssa-ifcombine.c 2014-03-12 10:45:29.355054715 +0100
@@ -135,6 +135,16 @@ bb_no_side_effects_p (basic_block bb)
   return true;
 }
 
+/* Return true if BB is an empty forwarder block to TO_BB.  */
+
+static bool
+forwarder_block_to (basic_block bb, basic_block to_bb)
+{
+  return empty_block_p (bb)
+         && single_succ_p (bb)
+         && single_succ (bb) == to_bb;
+}
+
 /* Verify if all PHI node arguments in DEST for edges from BB1 or
BB2 to DEST are the same.  This makes the CFG merge point
free from side-effects.  Return true in this case, else false.  */
@@ -561,6 +571,99 @@ ifcombine_ifandif (basic_block inner_con
   return false;
 }
 
+/* Helper function for tree_ssa_ifcombine_bb.  Recognize a CFG pattern and
+   dispatch to the appropriate if-conversion helper for a particular
+   set of INNER_COND_BB, OUTER_COND_BB, THEN_BB and ELSE_BB.
+   PHI_PRED_BB should be one of INNER_COND_BB, THEN_BB or ELSE_BB.  */
+
+static bool
+tree_ssa_ifcombine_bb_1 (basic_block inner_cond_bb, basic_block outer_cond_bb,
+                         basic_block then_bb, basic_block else_bb,
+                         basic_block phi_pred_bb)
+{
+  /* The && form is characterized by a common else_bb with
+ the two edges leading to it mergable.  The latter is
+ guaranteed by matching PHI arguments in the else_bb and
+ the inner cond_bb having no side-effects.  */
+  if (phi_pred_bb != else_bb
+  && recognize_if_then_else (outer_cond_bb, &inner_cond_bb, &else_bb)
+  && same_phi_args_p (outer_cond_bb, phi_pred_bb, else_bb)
+  && bb_no_side_effects_p (inner_cond_bb))
+{
+  /* We have
+  
+if (q) goto inner_cond_bb; else goto else_bb;
+  
+if (p) goto ...; else goto else_bb;
+...
+  
+...
+   */
+  return ifcombine_ifandif (inner_cond_bb, false, outer_cond_bb, false,
+   false);
+}
+
+  /* And a version where the outer condition is negated.  */
+  if (phi_pred_bb != else_bb
+  && recognize_if_then_else (outer_cond_bb, &else_bb, &inner_cond_bb)
+  && same_phi_args_p (outer_cond_bb, phi_pred_bb, else_bb)
+  && bb_no_side_effects_p (inner_cond_bb))
+{
+  /* We have
+  
+if (q) goto else_bb; else goto inner_cond_bb;
+  
+if (p) goto ...; else goto else_bb;
+...
+  
+...
+   */
+  return ifcombine_ifandif (inner_cond_bb, false, outer_cond_bb, true,
+   false);
+}
+
+  /* The || form is characterized by a common then_bb with the
+ two edges leading to it mergable.  The latter is guaranteed
+ by matching PHI arguments in the then_bb and the inner cond_bb
+ having no side-effects.  */
+  if (phi_pred_bb != then_bb
+  && recognize_if_then_else (outer_cond_bb, &then_bb, &inner_cond_bb)
+  && same_phi_args_p (outer_cond_bb, phi_pred_bb, then_bb)
+  && bb_no_side_effects_p (inner_cond_bb))
+{
+  /* We have
+  
+if (q) goto then_bb; else goto inner_cond_bb;
+  
+if (q) goto then_bb; else goto ...;
+  
+...
+   */
+  return ifcombine_ifandif (inner_cond_bb, true, outer_cond_bb, true,
+   true);
+}
+
+  /* And a version where the outer condition is negated.  */
+  if (phi_pred_bb != then_bb
+  && recognize_if_then_else (outer_cond_bb, &inner_cond_bb, &then_bb)
+  && same_phi_args_p (outer_cond_bb, phi_pred_bb, then_bb)
+  && bb_no_side_effects_p (inner_cond_bb))
+{
+  /* We have
+  
+if (q) goto inner_cond_bb; else goto then_bb;
+  
+if (q) goto then_bb; else goto ...;
+  
+...
+   */
+  return ifcombine_ifandif (inner_cond_bb, true, outer_cond_bb, false,
+   true);
+}
+
+  return false;
+}
+
 /* Recognize a CFG pattern and dispatch to the appropriate
if-conversion helper.  We start with BB as the innermost
  

Re: [PATCH][AARCH64]PR60034

2014-03-12 Thread Marcus Shawcroft
>> +  else if (SYMBOL_REF_HAS_BLOCK_INFO_P (sym)
>>
>> This test makes sense.
>>
>> +   && SYMBOL_REF_ANCHOR_P (sym)
>>
>> Do we need this test, or is the patch being conservative?  I would
>> have thought that it is sufficient to drop this test and just take the
>> block alignment...
>>
> Thanks for the review.
>
> If I understand gcc/rtl.h correctly, SYMBOL_REF_ANCHOR_P (sym) is
> required for anchor SYMBOL_REFs. SYMBOL_REF_BLOCK (sym) != NULL is
> probably redundant. This can probably become a gcc_assert
> (SYMBOL_REF_BLOCK (sym)) instead.

I agree with your interpretation of the code and comments in rtl.h.  I
also accept that SYMBOL_REF_ANCHOR_P() is sufficient to resolve the
test case.  However I'm wondering why we need to constrain the test
down to SYMBOL_REF_ANCHOR_P().  At this point in the code we are
trying to find alignment of the object, if we have a SYMBOL_REF_BLOCK
then we can get the block alignment irrespective of
SYMBOL_REF_ANCHOR_P().

Cheers
/Marcus


Re: [PATCH] Use the LTO linker plugin by default

2014-03-12 Thread Rainer Orth
Richard Biener  writes:

> On Mon, 10 Mar 2014, Rainer Orth wrote:
>
>> Richard Biener  writes:
>> 
>> > Ouch.  But as lto-plugin is a host module it should link against
>> > the host libgcc, no?  During stage1, that is.  So the question is
>> > why does it use the gcc/ compiler?
>> >
>> > For me it's using the host gcc:
>> >
>> > gcc -DHAVE_CONFIG_H -I. -I/space/rguenther/tramp3d/trunk/lto-plugin 
>> > -I/space/rguenther/tramp3d/trunk/lto-plugin/../include -DHAVE_CONFIG_H 
>> > -Wall -g -c /space/rguenther/tramp3d/trunk/lto-plugin/lto-plugin.c  -fPIC 
>> > -DPIC -o .libs/lto-plugin.o
>> > /bin/sh ./libtool --tag=CC --tag=disable-static  --mode=link gcc -Wall -g  
>> > -module -bindir /usr/local/lib/gcc/x86_64-unknown-linux-gnu/4.9.0  
>> > -static-libstdc++ -static-libgcc  -o liblto_plugin.la -rpath 
>> > /usr/local/lib/gcc/x86_64-unknown-linux-gnu/4.9.0 lto-plugin.lo 
>> > -Wc,../libiberty/pic/libiberty.a
>> > libtool: link: gcc -shared  .libs/lto-plugin.o
>> > ../libiberty/pic/libiberty.a   -Wl,-soname -Wl,liblto_plugin.so.0 -o 
>> > .libs/liblto_plugin.so.0.0.0
>> 
>> It does use the host compiler for me, too.
>
> So then if it succeeds to link to a shared libgcc_s then why
> is it not able to find that later?  Maybe you miss setting
> of a suitable LD_LIBRARY_PATH to pick up the runtime for
> your host compiler?

For the same reason that we use -static-libstdc++ to avoid this issue
for libstdc++.so.  I've always considered gcc's tendency to build
binaries that don't run by default a major annoyance, all the weasel
wording in the FAQ notwithstanding.  I hope to finally do something
about it for 4.10/5.0 (btw., any word on what the next release is going
to be?).

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


[PATCH, libjava]: Cleanup include/dwarf2-signal.h to avoid several warnings

2014-03-12 Thread Uros Bizjak
Hello!

Attached patch cleans include/dwarf2-signal.h to avoid

./include/java-signal.h:26:19: warning: declaration 'class
java::lang::Throwable' does not declare anything
./include/java-signal.h:24:42: warning: unused parameter '_sip'
[-Wunused-parameter]
./include/java-signal.h:26:19: warning: declaration 'class
java::lang::Throwable' does not declare anything
./include/java-signal.h:26:19: warning: declaration 'class
java::lang::Throwable' does not declare anything

on alpha-pc-linux-gnu build. The SIGNAL_HANDLER change is taken from
x86_64-signal.h header and allows empty definition of
MAKE_THROW_FRAME.

2014-03-12  Uros Bizjak  

* include/dwarf2-signal.h: Update copyright year.
(SIGNAL_HANDLER): Remove _sip argument.  Mark _p argument with
__attribute__ ((__unused__)).
(class java::lang::Throwable): Remove declaration.
(MAKE_THROW_FRAME) [!__ia64__]: Define as empty definition.

Patch was bootstrapped and regression tested on alpha-pc-linux-gnu and
is committed to mainline SVN under obvious rule.

Uros.
Index: include/dwarf2-signal.h
===
--- include/dwarf2-signal.h (revision 208493)
+++ include/dwarf2-signal.h (working copy)
@@ -1,6 +1,6 @@
 // dwarf2-signal.h - Catch runtime signals and turn them into exceptions.
 
-/* Copyright (C) 2000, 2001, 2009, 2011  Free Software Foundation
+/* Copyright (C) 2000, 2001, 2009, 2011, 2014  Free Software Foundation
 
This file is part of libgcj.
 
@@ -20,11 +20,10 @@ details.  */
 #define HANDLE_SEGV 1
 #undef HANDLE_FPE
 
-#define SIGNAL_HANDLER(_name)  \
-static void _Jv_##_name (int, siginfo_t *_sip, void *_p)
+#define SIGNAL_HANDLER(_name)  \
+static void _Jv_##_name (int, siginfo_t *, \
+void *_p __attribute__ ((__unused__)))
 
-class java::lang::Throwable;
-
 // Unwind the stack to the point at which the signal was generated and
 // then throw an exception.  With the dwarf2 unwinder we don't usually
 // need to do anything, with some minor exceptions.
@@ -47,12 +46,7 @@ do \
 while (0)
 
 #else
-#define MAKE_THROW_FRAME(_exception)   \
-do \
-{  \
-  (void)_p;\
-}  \
-while (0)
+#define MAKE_THROW_FRAME(_exception)
 #endif
 
 #if defined(__sparc__)
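
For readers who do not have the libjava headers at hand, the shape of the
change can be reproduced in a small stand-alone file.  This is a sketch in C
rather than the C++ of the real header, so the parameters have to be named
and are all marked unused here; the demo_ prefix, catch_segv, segv_count and
call_handler_once are hypothetical names, and the handler is called directly
instead of being installed with sigaction.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L  /* for siginfo_t */
#endif
#include <assert.h>
#include <signal.h>
#include <stddef.h>

/* Empty definition, as in the non-ia64 branch after the patch.  */
#define MAKE_THROW_FRAME(_exception)

/* Handler prototype macro; the attribute silences unused-parameter
   warnings now that MAKE_THROW_FRAME no longer mentions _p.  */
#define SIGNAL_HANDLER(_name)                                           \
static void demo_##_name (int _sig __attribute__ ((__unused__)),        \
                          siginfo_t *_sip __attribute__ ((__unused__)), \
                          void *_p __attribute__ ((__unused__)))

static int segv_count;

SIGNAL_HANDLER (catch_segv)
{
  MAKE_THROW_FRAME (nullp);  /* expands to nothing, leaving an empty statement */
  segv_count++;
}

/* Calls the handler once by hand and returns the updated counter.  */
static int
call_handler_once (void)
{
  demo_catch_segv (SIGSEGV, NULL, NULL);
  return segv_count;
}
```

Because the macro argument is never expanded, the identifier passed to
MAKE_THROW_FRAME does not need to be declared.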


Re: [PATCH] Use the LTO linker plugin by default

2014-03-12 Thread Rainer Orth
Richard Biener  writes:

> On Mon, 10 Mar 2014, Rainer Orth wrote:
>
>> Rainer Orth  writes:
>> 
>> > It does use the host compiler for me, too.
>> >
>> >> but maybe _that_ is the issue for you? (see also how it uses
>> >> -static-libgcc, for me it doesn't even depend on libgcc_s)
>> >
>> > But as you can see above, libtool, being its usual helpful self, simply
>> > drops -static-libgcc ;-(  If I use -Wc,-static-libgcc, all seems fine.
>> 
>> The following patch implements this.  The override is necessary to avoid
>> LDFLAGS passed in from the toplevel to replace the Makefile value.
>> 
>> Bootstraps on i386-pc-solaris2.10 and x86_64-unknown-linux-gnu are now
>> well beyond stage1.  Ok for mainline if they pass?
>
> If we go that route I wonder if we should rely on the toplevel passing
> -static-libgcc but instead force -static-libgcc for the plugin
> anyway?  (conditional on compiling with GCC, of course)

That would mean either duplicating the test from the toplevel or adding
a test for gcc in lto-plugin.  Either is ugly, so I'd like to avoid it
if possible.

It occurred to me that some of the complexity would go away if gcc just
accepted and stripped -Wc (which is currently a libtool-only option),
but even if so we'd have to deal with gcc's that don't have this for a
long time.

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: [PATCH] Improve ifcombine

2014-03-12 Thread Richard Biener
On March 12, 2014 10:52:23 AM CET, Jakub Jelinek  wrote:
>On Wed, Mar 12, 2014 at 09:51:46AM +0100, Richard Biener wrote:
>> Ok in principle, but is there a possibility to factor this a bit?
>> It looks like a lot of cut&paste (without looking too close for subtle
>> differences).
>
>Like this?

Yes.

Thanks,
Richard.

>2014-03-12  Jakub Jelinek  
>
>   * tree-ssa-ifcombine.c (forwarder_block_to): New function.
>   (tree_ssa_ifcombine_bb_1): New function.
>   (tree_ssa_ifcombine_bb): Use it.  Handle also cases where else_bb
>   is an empty forwarder block to then_bb or vice versa and then_bb
>   and else_bb are effectively swapped.
>
>   * gcc.dg/tree-ssa/ssa-ifcombine-12.c: New test.
>   * gcc.dg/tree-ssa/ssa-ifcombine-13.c: New test.
>   * gcc.dg/tree-ssa/phi-opt-2.c: Pass -mbranch-cost=1 if
>   possible, only test for exactly one if if -mbranch-cost=1
>   has been passed.
>
>--- gcc/tree-ssa-ifcombine.c.jj 2014-03-11 20:14:34.046082392 +0100
>+++ gcc/tree-ssa-ifcombine.c   2014-03-12 10:45:29.355054715 +0100
>@@ -135,6 +135,16 @@ bb_no_side_effects_p (basic_block bb)
>   return true;
> }
> 
>+/* Return true if BB is an empty forwarder block to TO_BB.  */
>+
>+static bool
>+forwarder_block_to (basic_block bb, basic_block to_bb)
>+{
>+  return empty_block_p (bb)
>+   && single_succ_p (bb)
>+   && single_succ (bb) == to_bb;
>+}
>+
> /* Verify if all PHI node arguments in DEST for edges from BB1 or
>BB2 to DEST are the same.  This makes the CFG merge point
>free from side-effects.  Return true in this case, else false.  */
>@@ -561,6 +571,99 @@ ifcombine_ifandif (basic_block inner_con
>   return false;
> }
> 
>+/* Helper function for tree_ssa_ifcombine_bb.  Recognize a CFG pattern and
>+   dispatch to the appropriate if-conversion helper for a particular
>+   set of INNER_COND_BB, OUTER_COND_BB, THEN_BB and ELSE_BB.
>+   PHI_PRED_BB should be one of INNER_COND_BB, THEN_BB or ELSE_BB.  */
>+
>+static bool
>+tree_ssa_ifcombine_bb_1 (basic_block inner_cond_bb, basic_block outer_cond_bb,
>+   basic_block then_bb, basic_block else_bb,
>+   basic_block phi_pred_bb)
>+{
>+  /* The && form is characterized by a common else_bb with
>+ the two edges leading to it mergable.  The latter is
>+ guaranteed by matching PHI arguments in the else_bb and
>+ the inner cond_bb having no side-effects.  */
>+  if (phi_pred_bb != else_bb
>+  && recognize_if_then_else (outer_cond_bb, &inner_cond_bb, &else_bb)
>+  && same_phi_args_p (outer_cond_bb, phi_pred_bb, else_bb)
>+  && bb_no_side_effects_p (inner_cond_bb))
>+{
>+  /* We have
>+ 
>+   if (q) goto inner_cond_bb; else goto else_bb;
>+ 
>+   if (p) goto ...; else goto else_bb;
>+   ...
>+ 
>+   ...
>+   */
>+  return ifcombine_ifandif (inner_cond_bb, false, outer_cond_bb, false,
>+  false);
>+}
>+
>+  /* And a version where the outer condition is negated.  */
>+  if (phi_pred_bb != else_bb
>+  && recognize_if_then_else (outer_cond_bb, &else_bb, &inner_cond_bb)
>+  && same_phi_args_p (outer_cond_bb, phi_pred_bb, else_bb)
>+  && bb_no_side_effects_p (inner_cond_bb))
>+{
>+  /* We have
>+ 
>+   if (q) goto else_bb; else goto inner_cond_bb;
>+ 
>+   if (p) goto ...; else goto else_bb;
>+   ...
>+ 
>+   ...
>+   */
>+  return ifcombine_ifandif (inner_cond_bb, false, outer_cond_bb,
>true,
>+  false);
>+}
>+
>+  /* The || form is characterized by a common then_bb with the
>+ two edges leading to it mergable.  The latter is guaranteed
>+ by matching PHI arguments in the then_bb and the inner cond_bb
>+ having no side-effects.  */
>+  if (phi_pred_bb != then_bb
>+  && recognize_if_then_else (outer_cond_bb, &then_bb,
>&inner_cond_bb)
>+  && same_phi_args_p (outer_cond_bb, phi_pred_bb, then_bb)
>+  && bb_no_side_effects_p (inner_cond_bb))
>+{
>+  /* We have
>+ 
>+   if (q) goto then_bb; else goto inner_cond_bb;
>+ 
>+   if (q) goto then_bb; else goto ...;
>+ 
>+   ...
>+   */
>+  return ifcombine_ifandif (inner_cond_bb, true, outer_cond_bb,
>true,
>+  true);
>+}
>+
>+  /* And a version where the outer condition is negated.  */
>+  if (phi_pred_bb != then_bb
>+  && recognize_if_then_else (outer_cond_bb, &inner_cond_bb,
>&then_bb)
>+  && same_phi_args_p (outer_cond_bb, phi_pred_bb, then_bb)
>+  && bb_no_side_effects_p (inner_cond_bb))
>+{
>+  /* We have
>+ 
>+   if (q) goto inner_cond_bb; else goto then_bb;
>+ 
>+   if (q) goto then_bb; else goto ...;
>+ 
>+   ...
>+   */
>+  return ifcombine_ifandif (inner_cond_bb, true, outer_cond_bb,
>false,
>+  
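
For illustration only (not part of the quoted patch): the && and || forms described in the comments above amount to merging two nested tests into one when the inner condition has no side effects. A minimal sketch in plain C, with hypothetical function names, assuming both conditions are already evaluated to booleans:

```c
#include <stdbool.h>

/* "&&" form: both failing edges lead to the same else block.  */
int nested_and(bool q, bool p)
{
    if (q) {
        if (p)
            return 1;   /* then block */
    }
    return 0;           /* common else block */
}

/* Combined form: a single test on (q & p).  */
int combined_and(bool q, bool p)
{
    return (q & p) ? 1 : 0;
}

/* "||" form: both succeeding edges lead to the same then block.  */
int nested_or(bool q, bool p)
{
    if (q)
        return 1;       /* common then block */
    if (p)
        return 1;       /* common then block */
    return 0;
}

/* Combined form: a single test on (q | p).  */
int combined_or(bool q, bool p)
{
    return (q | p) ? 1 : 0;
}
```

The merge is only valid because evaluating the inner condition unconditionally changes nothing observable, which is what the bb_no_side_effects_p check in the patch guards.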

[C++ Patch/RFC] PR 60254

2014-03-12 Thread Paolo Carlini

Hi,

in this regression (Jakub figured out that it started with r72165) we 
ICE during error recovery:


60254_1.C: In function ‘bool foo(T)’:
60254_1.C:4:3: error: non-constant condition for static assertion
static_assert(foo(i), "Error");
^
60254_1.C:4:3: internal compiler error: unexpected expression ‘foo’ of 
kind overload

0x743e15 cxx_eval_constant_expression
../../trunk/gcc/cp/semantics.c:9790
0x7425b6 cxx_eval_call_expression
../../trunk/gcc/cp/semantics.c:8364
0x743d64 cxx_eval_constant_expression
../../trunk/gcc/cp/semantics.c:9487
0x7471e6 cxx_eval_outermost_constant_expr
../../trunk/gcc/cp/semantics.c:9810
0x74fd0d cxx_constant_value
../../trunk/gcc/cp/semantics.c:9895
0x74fd0d finish_static_assert(tree_node*, tree_node*, unsigned int, bool)
../../trunk/gcc/cp/semantics.c:6863
0x6b770b cp_parser_static_assert
../../trunk/gcc/cp/parser.c:11838

Would it be Ok to trivially handle OVERLOAD for the benefit of error 
recovery per the attached? The error message we additionally emit makes 
sense:


60254_1.C:7:22: error: expression ‘foo’ does not designate a constexpr 
function

static_assert(foo(i), "Error");

Thanks!
Paolo.

PS: the second testcase is already fixed.
Index: cp/semantics.c
===
--- cp/semantics.c  (revision 208507)
+++ cp/semantics.c  (working copy)
@@ -9456,6 +9456,7 @@ cxx_eval_constant_expression (const constexpr_call
 case FUNCTION_DECL:
 case TEMPLATE_DECL:
 case LABEL_DECL:
+case OVERLOAD:
   return t;
 
 case PARM_DECL:
Index: testsuite/g++.dg/cpp0x/static_assert10.C
===
--- testsuite/g++.dg/cpp0x/static_assert10.C(revision 0)
+++ testsuite/g++.dg/cpp0x/static_assert10.C(working copy)
@@ -0,0 +1,8 @@
+// PR c++/60254
+// { dg-do compile { target c++11 } }
+
+template bool foo(T)
+{
+  int i;
+  static_assert(foo(i), "Error"); // { dg-error "non-constant 
condition|constexpr function" }
+}
Index: testsuite/g++.dg/cpp0x/static_assert11.C
===
--- testsuite/g++.dg/cpp0x/static_assert11.C(revision 0)
+++ testsuite/g++.dg/cpp0x/static_assert11.C(working copy)
@@ -0,0 +1,10 @@
+// PR c++/60254
+// { dg-do compile { target c++11 } }
+
+struct A
+{
+  template bool foo(T)
+  {
+static_assert(foo(0), "Error"); // { dg-error "non-constant condition" }
+  }
+};


[patch,avr] Fix PR60486: Typo cc_plus against cc_minus in calls of avr_out_plus_1

2014-03-12 Thread Georg-Johann Lay
This fixes a problem caused by cc_plus and cc_minus being swapped in the
calls of avr_out_plus_1.  This matters when avr_out_plus is called from
notice_update_cc, because cc_status might then be determined incorrectly.


In the vast majority of cases this leads to a performance regression because of
superfluous comparisons when an addition (using SUB instructions) has already 
set the condition code.


But there are also cases where this might lead to wrong code.
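
For illustration only (not the avr.c code): the pattern at issue is "compute two candidate sequences, each reporting its length and condition-code effect through out-parameters, then keep the shorter one". A minimal sketch with hypothetical names (out_variant, pick_shortest, enum cc_effect) and a toy cost model:

```c
#include <stddef.h>

/* Hypothetical condition-code effects; not the AVR backend's enum.  */
enum cc_effect { CC_NONE, CC_SET_BY_ADD, CC_SET_BY_SUB };

/* Hypothetical stand-in for one candidate generator: reports the length
   of the sequence and its condition-code effect via out-parameters.
   Toy cost model: negative addends are cheaper as a subtraction.  */
static void out_variant(int use_minus, long addend,
                        int *len, enum cc_effect *cc)
{
    if (use_minus) {
        *len = addend < 0 ? 1 : 2;
        *cc = CC_SET_BY_SUB;
    } else {
        *len = addend < 0 ? 2 : 1;
        *cc = CC_SET_BY_ADD;
    }
}

/* Pick the shorter candidate and return the condition-code effect that
   belongs to it.  Each call receives the length and cc slots of the
   same variant; crossing them would pair the chosen sequence with the
   other variant's cc effect.  */
enum cc_effect pick_shortest(long addend, int *plen)
{
    int len_minus, len_plus;
    enum cc_effect cc_minus, cc_plus;

    out_variant(1, addend, &len_minus, &cc_minus);
    out_variant(0, addend, &len_plus, &cc_plus);

    if (len_minus <= len_plus) {
        if (plen != NULL)
            *plen = len_minus;
        return cc_minus;
    }
    if (plen != NULL)
        *plen = len_plus;
    return cc_plus;
}
```

With the out-pointers crossed, the length comparison would still pick the right sequence, but the reported condition-code effect would describe the sequence that was not emitted.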

No changes in test suite results.

Ok to apply?


I haven't followed this list for some time. Is trunk open for such changes?

If so, I would apply it to trunk and 4.8 branch, otherwise to 4.8, 4.9 and 
trunk once they are open again.


Johann


PR target/60486
* config/avr/avr.c (avr_out_plus): Swap cc_plus and cc_minus in
calls of avr_out_plus_1.
Index: config/avr/avr.c
===
--- config/avr/avr.c	(revision 208473)
+++ config/avr/avr.c	(working copy)
@@ -6812,8 +6812,8 @@ avr_out_plus (rtx insn, rtx *xop, int *p
 
   /* Work out the shortest sequence.  */
 
-  avr_out_plus_1 (op, &len_minus, MINUS, &cc_plus, code_sat, sign, out_label);
-  avr_out_plus_1 (op, &len_plus, PLUS, &cc_minus, code_sat, sign, out_label);
+  avr_out_plus_1 (op, &len_minus, MINUS, &cc_minus, code_sat, sign, out_label);
+  avr_out_plus_1 (op, &len_plus, PLUS, &cc_plus, code_sat, sign, out_label);
 
   if (plen)
 {


Re: [PATCH] ARM: Weaker memory barriers

2014-03-12 Thread Will Deacon
On Tue, Mar 11, 2014 at 09:12:53PM +, John Carr wrote:
> Will Deacon  wrote:
> > On Tue, Mar 11, 2014 at 02:54:18AM +, John Carr wrote:
> > > A comment in arm/sync.md notes "We should consider issuing a inner
> > > shareability zone barrier here instead."  Here is my first attempt
> > > at a patch to emit weaker memory barriers.  Three instructions seem
> > > to be relevant for user mode code on my Cortex A9 Linux box:
> > > 
> > > dmb ishst, dmb ish, dmb sy
> > > 
> > > I believe these correspond to a release barrier, a full barrier
> > > with respect to other CPUs, and a full barrier that also orders
> > > relative to I/O.
> > 
> > Not quite; DMB ISHST only orders writes with other writes, so loads can move
> > across it in both directions. That means it's not sufficient for releasing a
> > lock, for example.
> 
> Release in this context doesn't mean "lock release".  I understand
> it to mean release in the specific context of the C++11 memory model.
> (Similarly, if you're arguing standards compliance "inline" really
> means "relax the one definition rule for this function.")
> 
> I don't see a prohibition on moving non-atomic loads across a release
> store.  Can you point to an analysis that shows a full barrier is needed?

Well, you can use acquire/release to implement a lock easily enough. For
example, try feeding the following to cppmem:

  int main() {
    int x = 0, y = 0;
    atomic_int z = 0;

    {{{ { r1 = x; y = 1;
          z.store(1, memory_order_release); }
    ||| { r0 = z.load(memory_order_acquire).readsvalue(1);
          r1 = y; x = 1; }
    }}}

    return 0;
  }

There is one consistent execution, which requires the first thread to have
r1 == 0 (i.e. read x as zero) and the second thread to have r1 == 1 (i.e.
read y as 1).

If we implement store-release using DMB ISHST, the assembly code would look
something like the following (I've treated the atomic accesses like normal
load/store instructions for clarity, since they don't affect the ordering
here):

  T0:

  LDR r1, [x]
  STR #1, [y]
  DMB ISHST
  STR #1, [z]

  T1:

  LDR r0, [z] // Reads 1
  DMB ISH
  LDR r1, [y]
  STR #1, [x]

The problem with this is that the LDR in T0 can be re-ordered *past* the
rest of the sequence, potentially resulting in r1 == 1, which is forbidden.
It's just like reading from a shared, lock-protected data structure without
the lock held.
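
For illustration only: the cppmem litmus test above can be written with C11 atomics roughly as follows. This is a sketch with hypothetical names (producer_step, consumer_step, shared_x, shared_y, flag_z); in real use the two functions would run on different threads, and the consumer would retry until it sees the flag.

```c
#include <stdatomic.h>

int shared_x, shared_y;   /* plain, non-atomic data */
atomic_int flag_z;        /* zero-initialized synchronization flag */

/* Thread 0: read x, write y, then publish with a store-release.
   Returns the value read from x.  */
int producer_step(void)
{
    int r1 = shared_x;
    shared_y = 1;
    atomic_store_explicit(&flag_z, 1, memory_order_release);
    return r1;
}

/* Thread 1: proceed only after observing the flag with a load-acquire.
   Returns -1 if the flag is not set yet, else the value read from y.  */
int consumer_step(void)
{
    if (atomic_load_explicit(&flag_z, memory_order_acquire) != 1)
        return -1;
    int r1 = shared_y;
    shared_x = 1;
    return r1;
}
```

The store-release has to keep the earlier read of shared_x ahead of the flag store, not just the earlier write to shared_y. That is the ordering a store-store barrier alone does not provide.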

> If we assume that gcc is used to generate code for processes running
> within a single inner shareable domain, then we can start by demoting
> "dmb sy" to "dmb ish" for the memory barrier with no other change.

I'm all for such a change.

> If a store-store barrier has no place in the gcc atomic memory model,
> that supports my hypothesis that a twisty maze of ifdefs is superior to
> a "portable" attractive nuisance.

I don't understand your point here.

Will


Re: [gomp4 2/3] OpenACC data construct implementation in terms of GF_OMP_TARGET_KIND_OACC_DATA.

2014-03-12 Thread Thomas Schwinge
Hi!

On Fri, 21 Feb 2014 21:32:14 +0100, I wrote:
> --- gcc/omp-low.c
> +++ gcc/omp-low.c
> @@ -1499,6 +1499,30 @@ scan_sharing_clauses (tree clauses, omp_context *ctx)
>  {
>tree c, decl;
>bool scan_array_reductions = false;
> +  bool offloaded;
> +  switch (gimple_code (ctx->stmt))
> +{
> +case GIMPLE_OACC_PARALLEL:
> +  offloaded = true;
> +  break;
> +case GIMPLE_OMP_TARGET:
> +  switch (gimple_omp_target_kind (ctx->stmt))
> + {
> + case GF_OMP_TARGET_KIND_REGION:
> +   offloaded = true;
> +   break;
> + case GF_OMP_TARGET_KIND_DATA:
> + case GF_OMP_TARGET_KIND_UPDATE:
> + case GF_OMP_TARGET_KIND_OACC_DATA:
> +   offloaded = false;
> +   break;
> + default:
> +   gcc_unreachable ();
> + }
> +  break;
> +default:
> +  offloaded = false;
> +}

I now need this information elsewhere; changed in gomp-4_0-branch
r208513 as follows:

commit 326592ef8fe7501f9ba7e67157d68c6c541e5601
Author: tschwinge 
Date:   Wed Mar 12 13:40:07 2014 +

is_gimple_omp_offloaded.

gcc/
* omp-low.c (scan_sharing_clauses): Move offloaded logic into...
* gimple.h (is_gimple_omp_offloaded): ... this new static inline
function.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@208513 
138bc75d-0d04-0410-961f-82ee72b054a4

diff --git gcc/ChangeLog.gomp gcc/ChangeLog.gomp
index 79030d6..4ee843f 100644
--- gcc/ChangeLog.gomp
+++ gcc/ChangeLog.gomp
@@ -1,3 +1,9 @@
+2014-03-12  Thomas Schwinge  
+
+   * omp-low.c (scan_sharing_clauses): Move offloaded logic into...
+   * gimple.h (is_gimple_omp_offloaded): ... this new static inline
+   function.
+
 2014-02-28  Thomas Schwinge  
 
* gimple.def (GIMPLE_OACC_KERNELS): New code.
diff --git gcc/gimple.h gcc/gimple.h
index 514af32..910072d 100644
--- gcc/gimple.h
+++ gcc/gimple.h
@@ -5823,6 +5823,31 @@ is_gimple_omp_oacc_specifically (const_gimple stmt)
 }
 
 
+/* Return true if OMP_* STMT is offloaded.  */
+
+static inline bool
+is_gimple_omp_offloaded (const_gimple stmt)
+{
+  gcc_assert (is_gimple_omp (stmt));
+  switch (gimple_code (stmt))
+{
+case GIMPLE_OACC_KERNELS:
+case GIMPLE_OACC_PARALLEL:
+  return true;
+case GIMPLE_OMP_TARGET:
+  switch (gimple_omp_target_kind (stmt))
+   {
+   case GF_OMP_TARGET_KIND_REGION:
+ return true;
+   default:
+ return false;
+   }
+default:
+  return false;
+}
+}
+
+
 /* Returns TRUE if statement G is a GIMPLE_NOP.  */
 
 static inline bool
diff --git gcc/omp-low.c gcc/omp-low.c
index 2f13fb4..6b676e5 100644
--- gcc/omp-low.c
+++ gcc/omp-low.c
@@ -1499,31 +1499,6 @@ scan_sharing_clauses (tree clauses, omp_context *ctx)
 {
   tree c, decl;
   bool scan_array_reductions = false;
-  bool offloaded;
-  switch (gimple_code (ctx->stmt))
-{
-case GIMPLE_OACC_KERNELS:
-case GIMPLE_OACC_PARALLEL:
-  offloaded = true;
-  break;
-case GIMPLE_OMP_TARGET:
-  switch (gimple_omp_target_kind (ctx->stmt))
-   {
-   case GF_OMP_TARGET_KIND_REGION:
- offloaded = true;
- break;
-   case GF_OMP_TARGET_KIND_DATA:
-   case GF_OMP_TARGET_KIND_UPDATE:
-   case GF_OMP_TARGET_KIND_OACC_DATA:
- offloaded = false;
- break;
-   default:
- gcc_unreachable ();
-   }
-  break;
-default:
-  offloaded = false;
-}
 
   for (c = clauses; c; c = OMP_CLAUSE_CHAIN (c))
 {
@@ -1696,7 +1671,8 @@ scan_sharing_clauses (tree clauses, omp_context *ctx)
  /* Ignore OMP_CLAUSE_MAP_POINTER kind for arrays in
 target regions that are not offloaded; there is nothing to map 
for
 those.  */
- if (!offloaded && !POINTER_TYPE_P (TREE_TYPE (decl)))
+ if (!is_gimple_omp_offloaded (ctx->stmt)
+ && !POINTER_TYPE_P (TREE_TYPE (decl)))
break;
}
  if (DECL_P (decl))
@@ -1721,7 +1697,7 @@ scan_sharing_clauses (tree clauses, omp_context *ctx)
install_var_field (decl, true, 7, ctx);
  else
install_var_field (decl, true, 3, ctx);
- if (offloaded)
+ if (is_gimple_omp_offloaded (ctx->stmt))
install_var_local (decl, ctx);
}
}
@@ -1845,7 +1821,7 @@ scan_sharing_clauses (tree clauses, omp_context *ctx)
  gcc_assert (gimple_code (ctx->stmt) != GIMPLE_OMP_TARGET
  || (gimple_omp_target_kind (ctx->stmt)
  != GF_OMP_TARGET_KIND_UPDATE));
- if (!offloaded)
+ if (!is_gimple_omp_offloaded (ctx->stmt))
break;
  decl = OMP_CLAUSE_DECL (c);
  if (DECL_P (decl)
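
For illustration only: the shape of this refactoring, reduced to a stand-alone sketch. The enums and struct below are hypothetical stand-ins, not the GIMPLE definitions; the point is that the classification lives in one predicate instead of a local flag computed by a switch at each use site.

```c
#include <stdbool.h>

/* Hypothetical statement codes and target kinds.  */
enum stmt_code {
    STMT_OACC_KERNELS,
    STMT_OACC_PARALLEL,
    STMT_OMP_TARGET,
    STMT_OMP_PARALLEL
};

enum target_kind {
    TARGET_KIND_REGION,
    TARGET_KIND_DATA,
    TARGET_KIND_UPDATE,
    TARGET_KIND_OACC_DATA
};

struct stmt {
    enum stmt_code code;
    enum target_kind kind;   /* only meaningful for STMT_OMP_TARGET */
};

/* Return true if the statement's body is offloaded.  */
bool stmt_is_offloaded(const struct stmt *s)
{
    switch (s->code) {
    case STMT_OACC_KERNELS:
    case STMT_OACC_PARALLEL:
        return true;
    case STMT_OMP_TARGET:
        return s->kind == TARGET_KIND_REGION;
    default:
        return false;
    }
}
```

Note one difference visible in the diff above: the removed switch hit gcc_unreachable () for an unknown target kind, whereas the new predicate returns false for every kind other than the region kind.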


Grüße,
 Thomas



Re: [gomp4] Accelerator constructs omp lowering and expansion

2014-03-12 Thread Thomas Schwinge
Hi!

On Wed, 4 Sep 2013 20:54:47 +0200, Jakub Jelinek  wrote:
> This patch implements #pragma omp {target{, data, update},teams} lowering
> and expansion, and adds stub calls into libgomp, so that (for now
> unconditionally) we can at least always fall back to host execution.

> 2013-09-04  Jakub Jelinek  

>   * omp-low.c [...]
>   (create_omp_child_function): If current function has
>   "omp declare target" attribute or if current region
>   is OMP_TARGET or lexically nested in it, add that
>   attribute to the omp child function.

It seems that I have missed this one when generalizing the existing code
for OpenACC:

> --- gcc/omp-low.c.jj  2013-08-27 22:44:31.0 +0200
> +++ gcc/omp-low.c 2013-09-04 19:58:30.320019227 +0200
> @@ -1677,6 +1775,26 @@ create_omp_child_function (omp_context *
>DECL_EXTERNAL (decl) = 0;
>DECL_CONTEXT (decl) = NULL_TREE;
>DECL_INITIAL (decl) = make_node (BLOCK);
> +  bool target_p = false;
> +  if (lookup_attribute ("omp declare target",
> + DECL_ATTRIBUTES (current_function_decl)))
> +target_p = true;
> +  else
> +{
> +  omp_context *octx;
> +  for (octx = ctx; octx; octx = octx->outer)
> + if (gimple_code (octx->stmt) == GIMPLE_OMP_TARGET
> + && gimple_omp_target_kind (octx->stmt)
> +== GF_OMP_TARGET_KIND_REGION)
> +   {
> + target_p = true;
> + break;
> +   }
> +}
> +  if (target_p)
> +DECL_ATTRIBUTES (decl)
> +  = tree_cons (get_identifier ("omp declare target"),
> +NULL_TREE, DECL_ATTRIBUTES (decl));
>  
>t = build_decl (DECL_SOURCE_LOCATION (decl),
> RESULT_DECL, NULL_TREE, void_type_node);

Even if not yet relevant at the moment for OpenACC, I think it makes
sense to make it more obvious, and change the code as follows.  Will
commit soon unless someone disagrees.

commit a07a6e3414da55ff4bbc8b7f0ceb747c1712fecc
Author: Thomas Schwinge 
Date:   Wed Mar 12 12:30:58 2014 +0100

* gcc/omp-low.c (create_omp_child_function): Use 
is_gimple_omp_offloaded.

diff --git gcc/omp-low.c gcc/omp-low.c
index 32f702c..82c0489 100644
--- gcc/omp-low.c
+++ gcc/omp-low.c
@@ -1979,16 +1979,12 @@ create_omp_child_function (omp_context *ctx, bool 
task_copy)
 {
   omp_context *octx;
   for (octx = ctx; octx; octx = octx->outer)
-   if (gimple_code (octx->stmt) == GIMPLE_OMP_TARGET
-   && gimple_omp_target_kind (octx->stmt)
-  == GF_OMP_TARGET_KIND_REGION)
+   if (is_gimple_omp_offloaded (octx->stmt))
  {
target_p = true;
break;
  }
 }
-  gcc_assert (!is_gimple_omp_oacc_specifically (ctx->stmt)
- || !target_p);
   if (target_p)
 DECL_ATTRIBUTES (decl)
   = tree_cons (get_identifier ("omp declare target"),
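
For illustration only: the loop in this hunk walks the chain of enclosing contexts and stops at the first offloaded one. A minimal sketch with a hypothetical context struct, where the per-statement predicate is reduced to a stored flag:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a lowering context; outer is NULL at the
   outermost level.  */
struct ctx {
    bool offloaded;            /* result of the per-statement predicate */
    const struct ctx *outer;   /* lexically enclosing context */
};

/* Return true if c or any lexically enclosing context is offloaded.  */
bool inside_offloaded_region(const struct ctx *c)
{
    for (; c != NULL; c = c->outer)
        if (c->offloaded)
            return true;
    return false;
}
```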


Grüße,
 Thomas



[Ping] [PATCH, AArch64] Use llfloor and llceil for vcvtmd_s64_f64 and vcvtpd_s64_f64 in arm_neon.h

2014-03-12 Thread Yufeng Zhang

Ping~

Possible for stage-4 as a bug-fix?

Thanks,
Yufeng

On 02/24/14 14:05, Yufeng Zhang wrote:

Hi Marcus,

On 01/14/14 12:30, Marcus Shawcroft wrote:

On 6 January 2014 12:30, Yufeng Zhang   wrote:

This patch fixes the implementation of vcvtmd_s64_f64 and vcvtpd_s64_f64 in
arm_neon.h to use llfloor and llceil instead, which are ILP32-friendly.
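
For illustration only (not the arm_neon.h code): under ILP32, long is 32 bits wide, so a conversion that must produce a 64-bit result has to go through long long. A minimal sketch with hypothetical helpers floor_to_ll and ceil_to_ll, using portable arithmetic in place of the llfloor and llceil operations the patch refers to, and assuming the input is finite and within the range of long long:

```c
/* Round toward minus infinity and return a 64-bit-capable integer.
   Assumes x is finite and representable as long long.  */
long long floor_to_ll(double x)
{
    long long t = (long long)x;   /* truncates toward zero */
    if ((double)t > x)
        t -= 1;                   /* negative value with a fraction */
    return t;
}

/* Round toward plus infinity under the same assumptions.  */
long long ceil_to_ll(double x)
{
    long long t = (long long)x;   /* truncates toward zero */
    if ((double)t < x)
        t += 1;                   /* positive value with a fraction */
    return t;
}
```

A result such as 4294967296 (2^32) fits in long long on every ABI, but would not fit in a 32-bit long.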

This patch will fix the following test failure in the ILP32 mode:

FAIL: gcc.target/aarch64/vect-vcvt.c scan-assembler fcvtms\\tx[0-9]+,
d[0-9]+

OK for the trunk?



OK, but we should wait for stage-1 now.


Although the ILP32 is an experimental feature for 4.9, I think as a bug
fix the patch shall go in for stage-4.

Thanks,
Yufeng







Re: Fwd: [RFC][gomp4] Offloading patches (2/3): Add tables generation

2014-03-12 Thread Bernd Schmidt

Hi,

On 03/08/2014 03:50 PM, Ilya Verbin wrote:

Here is updated patch for libgomp.  It assumes that there is a constructor with
a call to GOMP_offload_register in every target image, created by mkoffload
tool.  How does this look?


LGTM. Shall I start committing my changes to the branch?


Bernd




[Ping] [PATCH, AArch64] Sync merge libffi - fix call frame information in ffi_closure_SYSV

2014-03-12 Thread Yufeng Zhang

Ping~

Originally posted here: 
http://gcc.gnu.org/ml/gcc-patches/2014-02/msg01673.html


Thanks,
Yufeng

On 02/28/14 17:44, Yufeng Zhang wrote:

Hi,

The attached patch fixes a bug in ./src/aarch64/sysv.S:ffi_closure_SYSV
where stack unwinding information was not generated correctly.  The
change has been reviewed, approved and merged into the stand-alone
libffi release tree**.

OK for the trunk?

Thanks,
Yufeng

** http://github.com/atgreen/libffi


2014-02-28  Yufeng Zhang

* src/aarch64/sysv.S (ffi_closure_SYSV): Use x29 as the
main CFA reg; update cfi_rel_offset.





Re: [PATCH, AArch64] Sync merge libffi - fix call frame information in ffi_closure_SYSV

2014-03-12 Thread Marcus Shawcroft

On 28/02/14 17:44, Yufeng Zhang wrote:

Hi,

The attached patch fixes a bug in ./src/aarch64/sysv.S:ffi_closure_SYSV
where stack unwinding information was not generated correctly.  The
change has been reviewed, approved and merged into the stand-alone
libffi release tree**.

OK for the trunk?

Thanks,
Yufeng

** http://github.com/atgreen/libffi


2014-02-28  Yufeng Zhang  

* src/aarch64/sysv.S (ffi_closure_SYSV): Use x29 as the
main CFA reg; update cfi_rel_offset.



This change is already committed in upstream libffi.

Since it is a bug fix I think it should be merged to the gcc/libffi for 
4.9.  Please leave another 24 hours for the RM's to comment before 
committing it.


Thanks
/Marcus



Re: [PATCH, AArch64] Use llfloor and llceil for vcvtmd_s64_f64 and vcvtpd_s64_f64 in arm_neon.h

2014-03-12 Thread Marcus Shawcroft

On 24/02/14 14:05, Yufeng Zhang wrote:

Hi Marcus,

On 01/14/14 12:30, Marcus Shawcroft wrote:

On 6 January 2014 12:30, Yufeng Zhang  wrote:

This patch fixes the implementation of vcvtmd_s64_f64 and vcvtpd_s64_f64 in
arm_neon.h to use llfloor and llceil instead, which are ILP32-friendly.

This patch will fix the following test failure in the ILP32 mode:

FAIL: gcc.target/aarch64/vect-vcvt.c scan-assembler fcvtms\\tx[0-9]+,
d[0-9]+

OK for the trunk?



OK, but we should wait for stage-1 now.


Although the ILP32 is an experimental feature for 4.9, I think as a bug
fix the patch shall go in for stage-4.


OK provided there is no RM objection in the next 24 hours.

/Marcus




Re: [PATCH, AArch64] Use llfloor and llceil for vcvtmd_s64_f64 and vcvtpd_s64_f64 in arm_neon.h

2014-03-12 Thread Jakub Jelinek
On Wed, Mar 12, 2014 at 02:32:36PM +, Marcus Shawcroft wrote:
> >>OK, but we should wait for stage-1 now.
> >
> >Although the ILP32 is an experimental feature for 4.9, I think as a bug
> >fix the patch shall go in for stage-4.
> 
> OK provided there is no RM objection in the next 24 hours.

Ok.

Jakub


[PATCH] Ensure UNSUPPORTED tests in tree-prof do not report absolute, paths.

2014-03-12 Thread Marcus Shawcroft

Hi,

The test infrastructure for gcc.dg/tree-prof reports relative paths
for all test outcomes except UNSUPPORTED, for which it reports the
absolute path of the test case.  This patch ensures that a relative path is
reported consistently, reducing noise in regression test comparisons.


I'd like this to go in now during stage-4 in order to have slightly less 
noisy 4.9 regression tests for the next n years.


OK?

Cheers
/Marcus


testsuite/ChangeLog

2014-03-12  Marcus Shawcroft  

* lib/profopt.exp (profopt-execute): Use $testcase in
unsupported.

diff --git a/gcc/testsuite/lib/profopt.exp b/gcc/testsuite/lib/profopt.exp
index e0d849e..cb6a350 100644
--- a/gcc/testsuite/lib/profopt.exp
+++ b/gcc/testsuite/lib/profopt.exp
@@ -277,7 +277,7 @@ proc profopt-execute { src } {
 	set dg-do-what [list "run" "" P]
 	set extra_flags [profopt-get-options $src]
 	if { [lindex ${dg-do-what} 1 ] == "N" } {
-	unsupported "$src"
+	unsupported "$testcase"
 	unset testname_with_flags
 	verbose "$src not supported on this target, skipping it" 3
 	return
-- 
1.7.9.5


Re: [PATCH] Ensure UNSUPPORTED tests in tree-prof do not report absolute, paths.

2014-03-12 Thread Jakub Jelinek
On Wed, Mar 12, 2014 at 02:39:15PM +, Marcus Shawcroft wrote:
> 2014-03-12  Marcus Shawcroft  
> 
> * lib/profopt.exp (profopt-execute): Use $testcase in
> unsupported.

Ok, thanks.

> --- a/gcc/testsuite/lib/profopt.exp
> +++ b/gcc/testsuite/lib/profopt.exp
> @@ -277,7 +277,7 @@ proc profopt-execute { src } {
>   set dg-do-what [list "run" "" P]
>   set extra_flags [profopt-get-options $src]
>   if { [lindex ${dg-do-what} 1 ] == "N" } {
> - unsupported "$src"
> + unsupported "$testcase"
>   unset testname_with_flags
>   verbose "$src not supported on this target, skipping it" 3
>   return

Jakub


Re: Fwd: [RFC][gomp4] Offloading patches (2/3): Add tables generation

2014-03-12 Thread Ilya Verbin
2014-03-12 18:12 GMT+04:00 Bernd Schmidt :
> LGTM. Shall I start committing my changes to the branch?

Yes, I think you should commit your changes.
And we will rewrite our part to use the new configure approach.

  -- Ilya


Re: [Patch, AArch64] Fix shuffle for big-endian.

2014-03-12 Thread Alan Lawrence

I've been doing some local testing using this patch as a basis for some of my
own work on NEON intrinsics, and it seems good to me. A couple of points:

(1) Re. the comment that "If two vectors, we end up with a wierd mixed-endian
mode on NEON": firstly "wierd" should be spelt "weird";
secondly, if I understand right, this comment belongs with the next "if
(!d->one_vector_p...)" rather than the "if (BYTES_BIG_ENDIAN)" before which it's
written.

(2) as you say, this code is not exercised, unless you do something to remove
the 'if (BYTES_BIG_ENDIAN) return false;' earlier in that same function. Can I
politely suggest you do that here in this patch?

(3) In my own regression testing, with const_vec_perm enabled on big_endian, I
see 2*PASS->FAIL, namely

gcc.dg/vect/vect-114.c scan-tree-dump-times vect "vectorized 0 loops" 1

gcc.dg/vect/vect-114.c -flto -ffat-lto-objects  scan-tree-dump-times
vect "vectorized 0 loops" 1

These are essentially noise, but the noise is removed and I see no other
problems, if (after this patch) I re-enable the testsuite's "vect_perm" target
selector for aarch64 big-endian (testsuite/lib/target-supports.exp). Would you
like a separate patch for that or roll it in here?

Cheers, Alan

Tejas Belagod wrote:
> > Hi,
> >
> > When a shuffle of more than one input happens, on NEON we end up with a
> > 'mixed-endian' format in the register list which TBL operates on. We don't
> > make this correction in RTL and therefore the shuffle operation gets it
> > incorrect. Here is a patch that fixes-up the index table in the selector
> > rtx in RTL to also be mixed-endian to reflect what's happening on NEON.
> >
> > As trunk stands, this patch will not be exercised as constant vector
> > permute for Big-endian is disabled. I've tested this by locally enabling
> > const vec_perm and it fixes some regressions we have on big-endian:
> >
> > aarch64_be-none-elf:
> > FAIL->PASS: gcc.c-torture/execute/loop-11.c execution,  -O3 -fomit-frame-pointer
> > FAIL->PASS: gcc.c-torture/execute/loop-11.c execution,  -O3 -fomit-frame-pointer -funroll-all-loops -finline-functions
> > FAIL->PASS: gcc.c-torture/execute/loop-11.c execution,  -O3 -fomit-frame-pointer -funroll-loops
> > FAIL->PASS: gcc.c-torture/execute/loop-11.c execution,  -O3 -g
> > FAIL->PASS: gcc.dg/torture/vector-shuffle1.c  -O0  execution test
> > FAIL->PASS: gcc.dg/torture/vshuf-v16qi.c  -O2  execution test
> > FAIL->PASS: gcc.dg/torture/vshuf-v2df.c  -O2  execution test
> > FAIL->PASS: gcc.dg/torture/vshuf-v2di.c  -O2  execution test
> > FAIL->PASS: gcc.dg/torture/vshuf-v2sf.c  -O2  execution test
> > FAIL->PASS: gcc.dg/torture/vshuf-v2si.c  -O2  execution test
> > FAIL->PASS: gcc.dg/torture/vshuf-v4sf.c  -O2  execution test
> > FAIL->PASS: gcc.dg/torture/vshuf-v4si.c  -O2  execution test
> > FAIL->PASS: gcc.dg/torture/vshuf-v8hi.c  -O2  execution test
> > FAIL->PASS: gcc.dg/torture/vshuf-v8qi.c  -O2  execution test
> > FAIL->PASS: gcc.dg/vect/vect-114.c -flto -ffat-lto-objects execution test
> > FAIL->PASS: gcc.dg/vect/vect-114.c execution test
> > FAIL->PASS: gcc.dg/vect/vect-15.c -flto -ffat-lto-objects execution test
> > FAIL->PASS: gcc.dg/vect/vect-15.c execution test
> >
> > Also regressed on aarch64-none-elf.
> >
> > OK for stage-1?
> >
> > Thanks,
> > Tejas.
> >
> > 2014-02-21  Tejas Belagod  
> >
> > gcc/
> >   * config/aarch64/aarch64.c (aarch64_evpc_tbl): Fix index vector for
> >   big-endian when dealing with more than one input shuffle vector.
> >



Re: [C++ Patch/RFC] PR 60254

2014-03-12 Thread Jason Merrill

Perhaps we need a require_potential_rvalue_constant_expression?

Jason


Re: [PATCH, AArch64] Sync merge libffi - fix call frame information in ffi_closure_SYSV

2014-03-12 Thread Jakub Jelinek
On Wed, Mar 12, 2014 at 02:27:12PM +, Marcus Shawcroft wrote:
> On 28/02/14 17:44, Yufeng Zhang wrote:
> >** http://github.com/atgreen/libffi
> >
> >
> >2014-02-28  Yufeng Zhang  
> >
> > * src/aarch64/sysv.S (ffi_closure_SYSV): Use x29 as the
> > main CFA reg; update cfi_rel_offset.
> >
> 
> This change is already committed in upstream libffi.
> 
> Since it is a bug fix I think it should be merged to the gcc/libffi
> for 4.9.  Please leave another 24 hours for the RM's to comment
> before committing it.

Ok.

Jakub


[PATCH] Try to avoid sorting on SSA_NAME_VERSION during reassoc (PR middle-end/60418)

2014-03-12 Thread Jakub Jelinek
Hi!

Apparently 435.gromacs benchmark is very sensitive (of course with
-ffast-math) to reassociation ordering.

We were sorting on SSA_NAME_VERSIONs, which has the disadvantage that we
reuse SSA_NAME_VERSIONs from SSA_NAMEs dropped by earlier optimization
passes.  Thus even minor changes in unrelated parts of a function, made by
unrelated optimizations, can have very big effects on reassociation
decisions.

As discussed on IRC and in bugzilla, this patch attempts to sort on
the ordering of SSA_NAME_DEF_STMT statements.  If they are in different
basic blocks, it uses bb_rank for sorting, if they are within the same
bb, it checks which stmt dominates the other one in the bb (using
gimple_uid).
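
For illustration only (not the tree-ssa-reassoc.c code): the tie-break described above can be sketched as a qsort comparator. The struct and its fields are hypothetical stand-ins, and the sort direction is chosen for the sketch rather than copied from the patch.

```c
#include <stdlib.h>

/* Hypothetical stand-in for an operand entry.  */
struct operand {
    long rank;          /* operand rank; ties trigger the tie-break */
    int bb;             /* index of the defining statement's block */
    int bb_rank;        /* rank of that block */
    unsigned uid;       /* position of the definition within its block */
    unsigned version;   /* recycled version number, the unstable key */
};

/* Three-way compare that cannot overflow.  */
static int cmp_ll(long long a, long long b)
{
    return (a > b) - (a < b);
}

/* Order by rank; on a tie use the definition's position (block rank
   across blocks, uid within a block); use the version number only as
   a last resort.  */
int compare_operands(const void *pa, const void *pb)
{
    const struct operand *a = pa;
    const struct operand *b = pb;
    int c = cmp_ll(a->rank, b->rank);

    if (c != 0)
        return c;
    if (a->bb != b->bb)
        c = cmp_ll(a->bb_rank, b->bb_rank);
    else
        c = cmp_ll(a->uid, b->uid);
    if (c != 0)
        return c;
    return cmp_ll(a->version, b->version);
}

/* Convenience wrapper around qsort.  */
void sort_operands(struct operand *ops, size_t n)
{
    qsort(ops, n, sizeof *ops, compare_operands);
}
```

Because the position of a definition does not depend on which version numbers happened to be free, the resulting order stays the same when an unrelated pass releases and reuses names.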

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2014-03-12  Jakub Jelinek  

PR tree-optimization/59025
PR middle-end/60418
* tree-ssa-reassoc.c (sort_by_operand_rank): For SSA_NAMEs with the
same rank, sort by bb_rank and gimple_uid of SSA_NAME_DEF_STMT first.

--- gcc/tree-ssa-reassoc.c.jj   2014-03-10 18:12:30.782215912 +0100
+++ gcc/tree-ssa-reassoc.c  2014-03-12 10:09:03.341757696 +0100
@@ -219,6 +219,7 @@ static struct pointer_map_t *operand_ran
 
 /* Forward decls.  */
 static long get_rank (tree);
+static bool reassoc_stmt_dominates_stmt_p (gimple, gimple);
 
 
 /* Bias amount for loop-carried phis.  We want this to be larger than
@@ -506,11 +507,37 @@ sort_by_operand_rank (const void *pa, co
 }
 
   /* Lastly, make sure the versions that are the same go next to each
- other.  We use SSA_NAME_VERSION because it's stable.  */
+ other.  */
   if ((oeb->rank - oea->rank == 0)
   && TREE_CODE (oea->op) == SSA_NAME
   && TREE_CODE (oeb->op) == SSA_NAME)
 {
+  /* As SSA_NAME_VERSION is assigned pretty randomly, because we reuse
+versions of removed SSA_NAMEs, so if possible, prefer to sort
+based on basic block and gimple_uid of the SSA_NAME_DEF_STMT.
+See PR60418.  */
+  if (!SSA_NAME_IS_DEFAULT_DEF (oea->op)
+ && !SSA_NAME_IS_DEFAULT_DEF (oeb->op)
+ && SSA_NAME_VERSION (oeb->op) != SSA_NAME_VERSION (oea->op))
+   {
+ gimple stmta = SSA_NAME_DEF_STMT (oea->op);
+ gimple stmtb = SSA_NAME_DEF_STMT (oeb->op);
+ basic_block bba = gimple_bb (stmta);
+ basic_block bbb = gimple_bb (stmtb);
+ if (bbb != bba)
+   {
+ if (bb_rank[bbb->index] != bb_rank[bba->index])
+   return bb_rank[bbb->index] - bb_rank[bba->index];
+   }
+ else
+   {
+ bool da = reassoc_stmt_dominates_stmt_p (stmta, stmtb);
+ bool db = reassoc_stmt_dominates_stmt_p (stmtb, stmta);
+ if (da != db)
+   return da ? 1 : -1;
+   }
+   }
+
   if (SSA_NAME_VERSION (oeb->op) != SSA_NAME_VERSION (oea->op))
return SSA_NAME_VERSION (oeb->op) - SSA_NAME_VERSION (oea->op);
   else

Jakub


Re: [C++ Patch/RFC] PR 60254

2014-03-12 Thread Paolo Carlini

Hi,

On 03/12/2014 04:01 PM, Jason Merrill wrote:

> Perhaps we need a require_potential_rvalue_constant_expression?

Something like the below? Interesting, that should also save some 
duplicate work. Note however, that, besides the trivial adjustment of 
static_assert3.C, we produce slightly different secondary diagnostic for 
both static_assert10.C and static_assert11.C: for the former we notice 
the non-constant 'int i' (*), for the latter the use of 'this'. Is that Ok?


Thanks,
Paolo.

//

(*) I double checked that if 'int i' is changed to 'const int i = 0' we 
provide indeed the error message about the non-constexpr 'foo' as 
secondary error message.
Index: cp/semantics.c
===
--- cp/semantics.c  (revision 208507)
+++ cp/semantics.c  (working copy)
@@ -6860,7 +6860,7 @@ finish_static_assert (tree condition, tree message
   else if (condition && condition != error_mark_node)
{
  error ("non-constant condition for static assertion");
- cxx_constant_value (condition);
+ require_potential_rvalue_constant_expression (condition);
}
   input_location = saved_loc;
 }
Index: testsuite/g++.dg/cpp0x/static_assert10.C
===
--- testsuite/g++.dg/cpp0x/static_assert10.C(revision 0)
+++ testsuite/g++.dg/cpp0x/static_assert10.C(working copy)
@@ -0,0 +1,8 @@
+// PR c++/60254
+// { dg-do compile { target c++11 } }
+
+template bool foo(T)
+{
+  int i;
+  static_assert(foo(i), "Error"); // { dg-error "non-constant condition|not 
usable" }
+}
Index: testsuite/g++.dg/cpp0x/static_assert11.C
===
--- testsuite/g++.dg/cpp0x/static_assert11.C(revision 0)
+++ testsuite/g++.dg/cpp0x/static_assert11.C(working copy)
@@ -0,0 +1,10 @@
+// PR c++/60254
+// { dg-do compile { target c++11 } }
+
+struct A
+{
+  template bool foo(T)
+  {
+static_assert(foo(0), "Error"); // { dg-error "non-constant 
condition|constant expression" }
+  }
+};
Index: testsuite/g++.dg/cpp0x/static_assert3.C
===
--- testsuite/g++.dg/cpp0x/static_assert3.C (revision 208507)
+++ testsuite/g++.dg/cpp0x/static_assert3.C (working copy)
@@ -1,4 +1,4 @@
 // { dg-do compile { target c++11 } }
 static_assert(7 / 0, "X"); // { dg-error "non-constant condition" 
"non-constant" }
 // { dg-warning "division by zero" "zero" { target *-*-* } 2 }
-// { dg-error "7 / 0.. is not a constant expression" "not a constant" { target 
*-*-* } 2 }
+// { dg-error "division by zero is not a constant-expression" "non-constant" { 
target *-*-* } 2 }


[C++ Patch/RFC] Emit DW_TAG_imported_declaration under the right class for 'using' statements in a class

2014-03-12 Thread Siva Chandra
Hi,

The attached patch fixes what seems to me to be a bug in emitting the
DW_TAG_imported_declaration corresponding to 'using' statements in a
class.

Consider the following:

class A
{
 public:
  int a;
  int method (int i);
};

int
A::method (int i)
{
  return i + a;
}

class B : public A
{
 public:
  using A::method;
  int method (const B &b);
};

int
B::method (const B &b)
{
  return b.a + a;
}

Before the patch, the DIE corresponding to the 'using' statement in
class B was emitted as a child of the DIE corresponding to
class A.

ChangeLog:
2014-03-12  Siva Chandra Reddy  

cp/
* class.c (handle_using_decl): Pass the correct scope to
cp_emit_debug_info_for_using.

testsuite/
* g++.dg/debug/dwarf2/imported-decl-2.C: New testcase.
diff --git a/gcc/cp/class.c b/gcc/cp/class.c
index b46391b..6ad82d7 100644
--- a/gcc/cp/class.c
+++ b/gcc/cp/class.c
@@ -1299,7 +1299,7 @@ handle_using_decl (tree using_decl, tree t)
old_value = NULL_TREE;
 }
 
-  cp_emit_debug_info_for_using (decl, USING_DECL_SCOPE (using_decl));
+  cp_emit_debug_info_for_using (decl, t);
 
   if (is_overloaded_fn (decl))
 flist = decl;
diff --git a/gcc/testsuite/g++.dg/debug/dwarf2/imported-decl-2.C b/gcc/testsuite/g++.dg/debug/dwarf2/imported-decl-2.C
new file mode 100644
index 000..70200ec
--- /dev/null
+++ b/gcc/testsuite/g++.dg/debug/dwarf2/imported-decl-2.C
@@ -0,0 +1,32 @@
+// { dg-do compile }
+// { dg-options "-gdwarf-2 -dA -O0 -fno-merge-debug-strings" }
+
+class 
+{
+ public:
+  int method (void);
+  int a;
+};
+
+int
+::method (void)
+{
+  return a;
+}
+
+class  : public 
+{
+ public:
+  using ::method;
+
+  int method (int b);
+};
+
+int
+::method (int b)
+{
+  return a + b;
+}
+
+// { dg-final { scan-assembler-not "ascii \"0\".*ascii \"0\".*DW_TAG_imported_declaration" } }
+// { dg-final { scan-assembler-times "ascii \"0\".*ascii \"0\".*DW_TAG_imported_declaration" 1 } }


[GOOGLE, PR58066] preferred_stack_boundary update for tls expanded call

2014-03-12 Thread Wei Mi
This patch is to fix the problem described here:
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58066

The original patch is here:
http://gcc.gnu.org/ml/gcc-patches/2014-03/msg00369.html
The attached patch addresses HJ's comment.

Bootstrap and regression tests are OK. The perf test in plain mode is OK.
OK for the google-4_8 branch?

Thanks,
Wei.


gcc/ChangeLog:

2014-03-07  Wei Mi  

* config/i386/i386.c (ix86_compute_frame_layout): update
preferred_stack_boundary when there is tls expanded call.
* config/i386/i386.md: set
ix86_tls_descriptor_calls_expanded_in_cfun.

gcc/testsuite/ChangeLog:

2014-03-07  Wei Mi  

* g++.dg/pr58066.C: New test.

Index: config/i386/i386.c
===
--- config/i386/i386.c  (revision 208464)
+++ config/i386/i386.c  (working copy)
@@ -9211,6 +9211,19 @@ ix86_compute_frame_layout (struct ix86_f
   crtl->preferred_stack_boundary = 128;
   crtl->stack_alignment_needed = 128;
 }
+  /* For 64-bit target, preferred_stack_boundary is never updated for call
+ expanded from tls descriptor. Update it here. We don't update it in
+ expand stage because according to the comments before
+ ix86_current_function_calls_tls_descriptor, tls calls may be optimized
+ away.  */
+  else if (TARGET_64BIT
+  && ix86_current_function_calls_tls_descriptor
+  && crtl->preferred_stack_boundary < PREFERRED_STACK_BOUNDARY)
+{
+  crtl->preferred_stack_boundary = PREFERRED_STACK_BOUNDARY;
+  if (crtl->stack_alignment_needed < PREFERRED_STACK_BOUNDARY)
+   crtl->stack_alignment_needed = PREFERRED_STACK_BOUNDARY;
+}

   gcc_assert (!size || stack_alignment_needed);
   gcc_assert (preferred_alignment >= STACK_BOUNDARY / BITS_PER_UNIT);
Index: config/i386/i386.md
===
--- config/i386/i386.md (revision 208464)
+++ config/i386/i386.md (working copy)
@@ -12776,7 +12776,11 @@
 UNSPEC_TLS_GD))
  (clobber (match_scratch:SI 4))
  (clobber (match_scratch:SI 5))
- (clobber (reg:CC FLAGS_REG))])])
+ (clobber (reg:CC FLAGS_REG))])]
+  ""
+{
+  ix86_tls_descriptor_calls_expanded_in_cfun = true;
+})

 (define_insn "*tls_global_dynamic_64_"
   [(set (match_operand:P 0 "register_operand" "=a")
@@ -12809,7 +12813,10 @@
   (const_int 0)))
  (unspec:P [(match_operand 1 "tls_symbolic_operand")]
   UNSPEC_TLS_GD)])]
-  "TARGET_64BIT")
+  "TARGET_64BIT"
+{
+  ix86_tls_descriptor_calls_expanded_in_cfun = true;
+})

 (define_insn "*tls_local_dynamic_base_32_gnu"
   [(set (match_operand:SI 0 "register_operand" "=a")
@@ -12844,7 +12851,11 @@
UNSPEC_TLS_LD_BASE))
   (clobber (match_scratch:SI 3))
   (clobber (match_scratch:SI 4))
-  (clobber (reg:CC FLAGS_REG))])])
+  (clobber (reg:CC FLAGS_REG))])]
+  ""
+{
+  ix86_tls_descriptor_calls_expanded_in_cfun = true;
+})

 (define_insn "*tls_local_dynamic_base_64_"
   [(set (match_operand:P 0 "register_operand" "=a")
@@ -12870,7 +12881,10 @@
(mem:QI (match_operand 1 "constant_call_address_operand"))
(const_int 0)))
   (unspec:P [(const_int 0)] UNSPEC_TLS_LD_BASE)])]
-  "TARGET_64BIT")
+  "TARGET_64BIT"
+{
+  ix86_tls_descriptor_calls_expanded_in_cfun = true;
+})

 ;; Local dynamic of a single variable is a lose.  Show combine how
 ;; to convert that back to global dynamic.
Index: testsuite/g++.dg/pr58066.C
===
--- testsuite/g++.dg/pr58066.C  (revision 0)
+++ testsuite/g++.dg/pr58066.C  (revision 0)
@@ -0,0 +1,12 @@
+/* { dg-do compile { target {{ i?86-*-* x86_64-*-* } && { ! ia32 } } } } */
+/* { dg-options "-fPIC -O2" } */
+
+/* Check whether the stack frame starting address of tls expanded call
+   in __cxa_get_globals() is 16bytes aligned.  */
+static __thread char ccc;
+extern "C" void* __cxa_get_globals() throw()
+{
+ return &ccc;
+}
+
+/* { dg-final { scan-assembler ".cfi_def_cfa_offset 16" } } */


Re: [GOOGLE] Writes annotation info in elf section.

2014-03-12 Thread Dehao Chen
Thanks Cary for the comments.

Patch updated, and I also added a tool in contrib/ to dump the profile
annotation coverage.
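
For anyone who wants the aggregation logic of the coverage tool without the
readelf plumbing, here is a minimal C sketch of the same idea. It assumes
entries of the form function:total:annotated, which is what the script
parses. The names record_entry, coverage, struct profile_map and the
MAX_FUNCS limit are made up for illustration and are not part of the patch.
Unlike the script, the sketch returns 0 when the total count is zero instead
of dividing by zero.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_FUNCS 64   /* Hypothetical fixed capacity, for the sketch only.  */

struct func_profile
{
  char name[64];
  long total;
  long annotated;
};

struct profile_map
{
  struct func_profile entries[MAX_FUNCS];
  int count;
};

/* Record one "function:total:annotated" entry.  The first total seen for
   a function is kept; the annotated count is the maximum seen so far.
   Return 0 on success, -1 if ENTRY is malformed or the table is full.  */
int
record_entry (struct profile_map *map, const char *entry)
{
  char name[64];
  long total, annotated;
  char extra;

  assert (map != NULL && entry != NULL);
  /* Exactly three fields must convert; trailing junk makes it four.  */
  if (sscanf (entry, "%63[^:]:%ld:%ld%c",
              name, &total, &annotated, &extra) != 3)
    return -1;

  for (int i = 0; i < map->count; i++)
    if (strcmp (map->entries[i].name, name) == 0)
      {
        if (annotated > map->entries[i].annotated)
          map->entries[i].annotated = annotated;
        return 0;
      }

  if (map->count == MAX_FUNCS)
    return -1;

  struct func_profile *e = &map->entries[map->count++];
  strcpy (e->name, name);
  e->total = total;
  e->annotated = annotated;
  return 0;
}

/* Fraction of the profile that was used for annotation.  */
double
coverage (const struct profile_map *map)
{
  long total_sum = 0, annotated_sum = 0;

  for (int i = 0; i < map->count; i++)
    {
      total_sum += map->entries[i].total;
      annotated_sum += map->entries[i].annotated;
    }
  return total_sum ? (double) annotated_sum / total_sum : 0.0;
}
```

Feeding it foo:100:40, foo:100:60 and bar:50:0 leaves two functions in the
map, with foo's annotated count raised to 60.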

Dehao
>
>
> On Wed, Mar 12, 2014 at 9:48 AM, Cary Coutant  wrote:
>>
>> +void autofdo_source_profile::write_annotated_count () const
>> +{
>> +  switch_to_section (get_section (
>> +  ".gnu.switches.text.annotation",
>> +  SECTION_DEBUG | SECTION_MERGE | SECTION_STRINGS | 1, NULL));
>>
>> I think it would be worth a comment explaining the point of setting
>> the SECTION_MERGE and SECTION_STRINGS flags, and why it works for this
>> section. Also, the "1" is clearer if you write it as "(SECTION_ENTSIZE
>> & 1)".
>>
>> -cary
>
>
Index: contrib/autofdo_coverage.py
===
--- contrib/autofdo_coverage.py (revision 0)
+++ contrib/autofdo_coverage.py (revision 0)
@@ -0,0 +1,46 @@
+#!/usr/bin/python
+#
+# Copyright (C) 2013 Free Software Foundation, Inc.
+#
+# This script is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3, or (at your option)
+# any later version.
+
+# This script computes and outputs the percentage of profile that has
+# been used to annotate the AutoFDO optimized binary.
+
+import subprocess
+import sys
+
+if len(sys.argv) != 2:
+  print "Usage: " + sys.argv[0] + " "
+  exit(1)
+
+args = ["readelf", "-p", ".gnu.switches.text.annotation", sys.argv[1]]
+output, _ = subprocess.Popen(args, stdout=subprocess.PIPE).communicate()
+
+profile_map = {}
+
+for l in output.split('\n'):
+  parts = l.split()
+  if len(parts) != 3:
+    continue;
+  words = parts[2].split(':')
+  if len(words) != 3:
+    continue;
+  function = words[0]
+  total_count = int(words[1])
+  annotated_count = int(words[2])
+  if function not in profile_map:
+    profile_map[function] = [total_count, annotated_count]
+  elif annotated_count > profile_map[function][1]:
+    profile_map[function][1] = annotated_count
+
+total_sum = 0
+annotated_sum = 0
+for function in profile_map:
+  total_sum += profile_map[function][0]
+  annotated_sum += profile_map[function][1]
+
+print float(annotated_sum) / total_sum

Property changes on: contrib/autofdo_coverage.py
___
Added: svn:executable
   + *

Index: gcc/auto-profile.c
===
--- gcc/auto-profile.c  (revision 208283)
+++ gcc/auto-profile.c  (working copy)
@@ -49,6 +49,8 @@ along with GCC; see the file COPYING3.  If not see
 #include "l-ipo.h"
 #include "ipa-utils.h"
 #include "ipa-inline.h"
+#include "output.h"
+#include "dwarf2asm.h"
 #include "auto-profile.h"
 
 /* The following routines implements AutoFDO optimization.
@@ -100,9 +102,6 @@ typedef std::vector string_vector;
 /* Map from function name's index in function_name_map to target's
execution count.  */
 typedef std::map icall_target_map;
-/* Represent profile count of an inline stack,  profile count is represented as
-   (execution_count, value_profile_histogram).  */
-typedef std::pair count_info;
 
 /* Set of inline_stack. Used to track if the profile is already used to
annotate the program.  */
@@ -112,6 +111,13 @@ typedef std::set location_set;
to direct call.  */
 typedef std::set stmt_set;
 
+struct count_info
+{
+  gcov_type count;
+  icall_target_map targets;
+  bool annotated;
+};
+
 struct string_compare
 {
   bool operator() (const char *a, const char *b) const
@@ -154,7 +160,7 @@ class function_instance {
   /* Read the profile and create a function_instance with head count as
  HEAD_COUNT. Recursively read callsites to create nested function_instances
  too. STACK is used to track the recursive creation process.  */
-  static const function_instance *read_function_instance (
+  static function_instance *read_function_instance (
   function_instance_stack *stack, gcov_type head_count);
 
   /* Recursively deallocate all callsites (nested function_instances).  */
@@ -167,8 +173,8 @@ class function_instance {
 
   /* Recursively traverse STACK starting from LEVEL to find the corresponding
  function_instance.  */
-  const function_instance *get_function_instance (const inline_stack &stack,
- unsigned level) const;
+  function_instance *get_function_instance (const inline_stack &stack,
+   unsigned level);
 
   /* Store the profile info for LOC in INFO. Return TRUE if profile info
  is found.  */
@@ -178,18 +184,23 @@ class function_instance {
   MAP, return the total count for all inlined indirect calls.  */
   gcov_type find_icall_target_map (gimple stmt, icall_target_map *map) const;
 
+  /* Total number of counts that is used during annotation.  */
+  gcov_type total_annotated_count () const;
+
+  /* Mark LOC as annotated.  */
+  void mark_annotated (location_t loc

Re: [PATCH] Fix PR60505

2014-03-12 Thread Cong Hou
Thank you for pointing it out. I didn't realize that alias analysis
influences this issue.

The current problem is that the epilogue may be unnecessary if the
loop bound cannot be larger than the number of iterations of the
vectorized loop multiplied by VF when the vectorized loop is supposed
to be executed. My method is incorrect because I assumed the vectorized
loop will be executed, which is actually guaranteed only by the loop bound
check (and also the alias checks). So if the alias checks exist, my method
is fine, as both conditions are met. If there are no alias checks, I must
consider the possibility that the vectorized loop may not be executed
at runtime, and then the epilogue should not be eliminated. The warning
appears on the epilogue, and with loop bound checks (and without alias
checks) the warning will be gone. So I think the key is the alias checks:
my method only works if there are alias checks.

How about adding one more condition that checks whether alias checks are
needed, as in the code shown below?

  else if (LOOP_VINFO_PEELING_FOR_ALIGNMENT (loop_vinfo)
  || (tree_ctz (LOOP_VINFO_NITERS (loop_vinfo))
  < (unsigned)exact_log2 (LOOP_VINFO_VECT_FACTOR (loop_vinfo))
  && (!LOOP_REQUIRES_VERSIONING_FOR_ALIAS (loop_vinfo)
   || (unsigned HOST_WIDE_INT)max_stmt_executions_int
   (LOOP_VINFO_LOOP (loop_vinfo)) > (unsigned)th)))
LOOP_VINFO_PEELING_FOR_NITER (loop_vinfo) = true;
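
To make the trailing-zero part of that condition concrete, here is a small
standalone C sketch. It is not vectorizer code: it uses plain integers where
GCC uses trees, and the helper names trailing_zero_bits,
leftover_iterations_p and epilogue_iterations are hypothetical. For a known
constant iteration count, comparing trailing zero bits against log2 (VF) is
the same as asking whether the count is a multiple of VF.

```c
#include <assert.h>
#include <stdbool.h>

/* Count trailing zero bits of N; N must be non-zero.  Stand-in for what
   tree_ctz reports when the iteration count is a known constant.  */
static unsigned
trailing_zero_bits (unsigned n)
{
  unsigned count = 0;

  assert (n != 0);
  while ((n & 1u) == 0)
    {
      n >>= 1;
      count++;
    }
  return count;
}

/* True if NITERS is not a multiple of VF, where VF is a power of two.
   Mirrors the "tree_ctz (niters) < exact_log2 (vf)" part of the test;
   for a power of two, the trailing zero count equals log2.  */
bool
leftover_iterations_p (unsigned niters, unsigned vf)
{
  assert (vf != 0 && (vf & (vf - 1)) == 0);
  return trailing_zero_bits (niters) < trailing_zero_bits (vf);
}

/* Iterations left for the scalar epilogue after the vector loop has
   consumed NUM / VF full vectors.  */
unsigned
epilogue_iterations (unsigned num, unsigned vf)
{
  assert (vf != 0);
  return num % vf;
}
```

Assuming a VF of 16, any num from 1 to 15 gives num / 16 == 0 vector
iterations, so every iteration is left to the epilogue. That is the
unversioned case where dropping the epilogue would produce wrong code.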


thanks,
Cong


On Wed, Mar 12, 2014 at 1:24 AM, Jakub Jelinek  wrote:
> On Tue, Mar 11, 2014 at 04:16:13PM -0700, Cong Hou wrote:
>> This patch is fixing PR60505 in which the vectorizer may produce
>> unnecessary epilogues.
>>
>> Bootstrapped and tested on a x86_64 machine.
>>
>> OK for trunk?
>
> That looks wrong.  Consider the case where the loop isn't versioned,
> if you disable generation of the epilogue loop, you end up only with
> a vector loop.
>
> Say:
> unsigned char ovec[16] __attribute__((aligned (16))) = { 0 };
> void
> foo (char *__restrict in, char *__restrict out, int num)
> {
>   int i;
>
>   in = __builtin_assume_aligned (in, 16);
>   out = __builtin_assume_aligned (out, 16);
>   for (i = 0; i < num; ++i)
> out[i] = (ovec[i] = in[i]);
>   out[num] = ovec[num / 2];
> }
> -O2 -ftree-vectorize.  Now, consider if this function is called
> with num != 16 (num > 16 is of course invalid, but num 0 to 15 is
> valid and your patch will cause a wrong-code in this case).
>
> Jakub


Re: [RFC][gomp4] Offloading: Add device initialization and host->target function mapping

2014-03-12 Thread Ilya Verbin
Hi Thomas,

Here is a new version of this patch (it was discussed in another thread:
http://gcc.gnu.org/ml/gcc-patches/2014-03/msg00573.html ) with a ChangeLog.
Bootstrap and make check passed.
Ok to commit?
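
As a rough illustration of the plugin interface shape described in the
ChangeLog below, here is a self-contained C sketch. It is only a model under
stated assumptions: struct plugin_ops, host_plugin, run_on_device and
increment are hypothetical names, the table is filled in statically instead
of being looked up from a loaded plugin, and only three of the entry points
are shown.

```c
#include <assert.h>
#include <stddef.h>

enum { TARGET_TYPE_HOST = 0 };

/* Hypothetical table of plugin entry points, modelled on the function
   handlers added to struct gomp_device_descr.  */
struct plugin_ops
{
  int (*get_type) (void);
  int (*get_num_devices) (void);
  void (*device_run) (void *fn_ptr, void *vars);
};

static int
host_get_type (void)
{
  return TARGET_TYPE_HOST;
}

static int
host_get_num_devices (void)
{
  return 1;
}

/* On the host, running a target function is just a call through the
   pointer, as in the host plugin's device_run.  */
static void
host_device_run (void *fn_ptr, void *vars)
{
  void (*fn) (void *) = (void (*) (void *)) fn_ptr;

  fn (vars);
}

const struct plugin_ops host_plugin =
{
  host_get_type,
  host_get_num_devices,
  host_device_run
};

/* Example "target region" body: increment the int VARS points to.  */
void
increment (void *vars)
{
  ++*(int *) vars;
}

/* Run FN with VARS through OPS.  Return 0 on success, -1 if the plugin
   reports no devices.  */
int
run_on_device (const struct plugin_ops *ops, void (*fn) (void *), void *vars)
{
  assert (ops != NULL && fn != NULL);
  if (ops->get_num_devices () < 1)
    return -1;
  ops->device_run ((void *) fn, vars);
  return 0;
}
```

Passing a function pointer through void * follows what the host plugin in
the patch does; it relies on the platform allowing that conversion.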


 libgomp/ChangeLog.gomp |   29 
 libgomp/libgomp.map|1 +
 libgomp/plugin-host.c  |   58 -
 libgomp/target.c   |  170 
 4 files changed, 242 insertions(+), 16 deletions(-)

diff --git a/libgomp/ChangeLog.gomp b/libgomp/ChangeLog.gomp
index 7f9ce11..57a600e 100644
--- a/libgomp/ChangeLog.gomp
+++ b/libgomp/ChangeLog.gomp
@@ -1,3 +1,32 @@
+2014-03-12  Ilya Verbin  
+
+   * libgomp.map (GOMP_4.0): Add GOMP_offload_register.
+   * plugin-host.c (device_available): Replace with:
+   (get_num_devices): This.
+   (get_type): New.
+   (offload_register): Ditto.
+   (device_init): Ditto.
+   (device_get_table): Ditto.
+   (device_run): Ditto.
+   * target.c (target_type): New enum.
+   (offload_image_descr): New struct.
+   (offload_images, num_offload_images): New globals.
+   (struct gomp_device_descr): Remove device_available_func.
+   Add type, is_initialized, get_type_func, get_num_devices_func,
+   offload_register_func, device_init_func, device_get_table_func,
+   device_run_func.
+   (mapping_table): New struct.
+   (GOMP_offload_register): New function.
+   (gomp_init_device): Ditto.
+   (GOMP_target): Add device initialization and lookup for target fn.
+   (GOMP_target_data): Add device initialization.
+   (GOMP_target_update): Ditto.
+   (gomp_load_plugin_for_device): Take handles for get_type,
+   get_num_devices, offload_register, device_init, device_get_table,
+   device_run functions.
+   (gomp_register_images_for_device): New function.
+   (gomp_find_available_plugins): Add registration of offload images.
+
 2014-02-28  Thomas Schwinge  
 
* testsuite/libgomp.oacc-c/goacc_kernels.c: New file.
diff --git a/libgomp/libgomp.map b/libgomp/libgomp.map
index e9f8b55..9328767 100644
--- a/libgomp/libgomp.map
+++ b/libgomp/libgomp.map
@@ -208,6 +208,7 @@ GOMP_3.0 {
 
 GOMP_4.0 {
   global:
+   GOMP_offload_register;
GOMP_barrier_cancel;
GOMP_cancel;
GOMP_cancellation_point;
diff --git a/libgomp/plugin-host.c b/libgomp/plugin-host.c
index 5354ebe..ec0c78c 100644
--- a/libgomp/plugin-host.c
+++ b/libgomp/plugin-host.c
@@ -33,14 +33,53 @@
 #include 
 #include 
 
-bool
-device_available (void)
+const int TARGET_TYPE_HOST = 0;
+
+int
+get_type (void)
 {
 #ifdef DEBUG
   printf ("libgomp plugin: %s:%s\n", __FILE__, __FUNCTION__);
 #endif
 
-  return true;
+  return TARGET_TYPE_HOST;
+}
+
+int
+get_num_devices (void)
+{
+#ifdef DEBUG
+  printf ("libgomp plugin: %s:%s\n", __FILE__, __FUNCTION__);
+#endif
+
+  return 1;
+}
+
+void
+offload_register (void *host_table, void *target_data)
+{
+#ifdef DEBUG
+  printf ("libgomp plugin: %s:%s (%p, %p)\n", __FILE__, __FUNCTION__,
+ host_table, target_data);
+#endif
+}
+
+void
+device_init (void)
+{
+#ifdef DEBUG
+  printf ("libgomp plugin: %s:%s\n", __FILE__, __FUNCTION__);
+#endif
+}
+
+int
+device_get_table (void *table)
+{
+#ifdef DEBUG
+  printf ("libgomp plugin: %s:%s (%p)\n", __FILE__, __FUNCTION__, table);
+#endif
+
+  return 0;
 }
 
 void *
@@ -82,3 +121,16 @@ void *device_host2dev (void *dest, const void *src, size_t n)
 
   return memcpy (dest, src, n);
 }
+
+void
+device_run (void *fn_ptr, void *vars)
+{
+#ifdef DEBUG
+  printf ("libgomp plugin: %s:%s (%p, %p)\n", __FILE__, __FUNCTION__, fn_ptr,
+ vars);
+#endif
+
+  void (*fn)(void *) = (void (*)(void *)) fn_ptr;
+
+  fn (vars);
+}
diff --git a/libgomp/target.c b/libgomp/target.c
index a6a5505..0715b31 100644
--- a/libgomp/target.c
+++ b/libgomp/target.c
@@ -84,6 +84,26 @@ struct splay_tree_key_s {
   bool copy_from;
 };
 
+enum target_type {
+  TARGET_TYPE_HOST,
+  TARGET_TYPE_INTEL_MIC
+};
+
+/* This structure describes an offload image.
+   It contains type of the target, pointer to host table descriptor, and pointer
+   to target data.  */
+struct offload_image_descr {
+  int type;
+  void *host_table;
+  void *target_data;
+};
+
+/* Array of descriptors of offload images.  */
+static struct offload_image_descr *offload_images;
+
+/* Total number of offload images.  */
+static int num_offload_images;
+
 /* Array of descriptors of all available devices.  */
 static struct gomp_device_descr *devices;
 
@@ -117,15 +137,26 @@ struct gomp_device_descr
  TARGET construct.  */
   int id;
 
+  /* This is the TYPE of device.  */
+  int type;
+
+  /* Set to true when device is initialized.  */
+  bool is_initialized;
+
   /* Plugin file handler.  */
   void *plugin_handle;
 
   /* Function handlers.  */
-  bool (*device_available_func) (void);
+  int (*get_type_func) (void);
+  int (*get_num_devices_func) (void);
+  void (*offload_register_func) (void *, void *);
+  void (*d

Re: [PATCH 1/4] [GOMP4] [Fortran] OpenACC 1.0+ support in fortran front-end

2014-03-12 Thread Tobias Burnus

On March 7, 2014 11:45, Ilmir Usmanov wrote:

OpenACC 1.0 support to fortran FE -- core.


Looks good to me. As Thomas is also fine with the patch set [1], the 
patch can now go into the branch :-)


Previously and still approved are Part 3 and 4 [2] of the series. I will 
separately reply to Part 2.


Tobias

[1] http://gcc.gnu.org/ml/gcc-patches/2014-03/msg00472.html
[2] http://gcc.gnu.org/ml/gcc-patches/2014-03/msg00399.html , 
http://gcc.gnu.org/ml/gcc-patches/2014-03/msg00400.html



gcc/fortran/
* dump-parse-tree.c
(show_omp_node): Dump also OpenACC executable statements.
(show_code_node): Call it.
(show_namespace): Dump !$ACC DECLARE directive.
* gfortran.h
(ST_OACC_PARALLEL_LOOP, ST_OACC_END_PARALLEL_LOOP, ST_OACC_PARALLEL,
ST_OACC_END_PARALLEL, ST_OACC_KERNELS, ST_OACC_END_KERNELS,
ST_OACC_DATA, ST_OACC_END_DATA, ST_OACC_HOST_DATA,
ST_OACC_END_HOST_DATA, ST_OACC_LOOP, ST_OACC_DECLARE, ST_OACC_UPDATE,
ST_OACC_WAIT, ST_OACC_CACHE, ST_OACC_KERNELS_LOOP,
ST_OACC_END_KERNELS_LOOP, ST_OACC_ENTER_DATA,
ST_OACC_EXIT_DATA, ST_OACC_END_LOOP): New statements.
(gfc_expr_list): New structure to hold list of expressions.
(OMP_LIST_COPY, OMP_LIST_DATA_CLAUSE_FIRST,
OMP_LIST_OACC_COPYIN, OMP_LIST_COPYOUT, OMP_LIST_CREATE, OMP_LIST_DELETE,
OMP_LIST_PRESENT, OMP_LIST_PRESENT_OR_COPY,
OMP_LIST_PRESENT_OR_COPYIN, OMP_LIST_PRESENT_OR_COPYOUT,
OMP_LIST_PRESENT_OR_CREATE, OMP_LIST_DEVICEPTR,
OMP_LIST_DATA_CLAUSE_LAST, OMP_LIST_USE_DEVICE,
OMP_LIST_DEVICE_RESIDENT, OMP_LIST_HOST, OMP_LIST_DEVICE,
OMP_LIST_CACHE): New types of list, allowed in clauses.
(gfc_omp_clauses): Add OpenACC clauses.
(gfc_namespace): Add OpenACC declare directive clauses.
(EXEC_OACC_KERNELS_LOOP, EXEC_OACC_PARALLEL_LOOP, EXEC_OACC_PARALLEL,
EXEC_OACC_KERNELS, EXEC_OACC_DATA, EXEC_OACC_HOST_DATA, EXEC_OACC_LOOP,
EXEC_OACC_UPDATE, EXEC_OACC_WAIT, EXEC_OACC_CACHE, EXEC_OACC_ENTER_DATA,
EXEC_OACC_EXIT_DATA): New executable statements.
(gfc_free_expr_list): New function declaration.
(gfc_resolve_oacc_directive): Likewise.
(gfc_resolve_oacc_parallel_loop_blocks): Likewise.
(gfc_resolve_oacc_blocks): Likewise.
* match.c (match_exit_cycle): Add support of OpenACC regions and loops.
* match.h (gfc_match_oacc_cache): New function declaration.
(gfc_match_oacc_wait, gfc_match_oacc_update): Likewise.
(gfc_match_oacc_declare, gfc_match_oacc_loop): Likewise.
(gfc_match_oacc_host_data, gfc_match_oacc_data): Likewise.
(gfc_match_oacc_kernels, gfc_match_oacc_kernels_loop): Likewise.
(gfc_match_oacc_parallel, gfc_match_oacc_parallel_loop): Likewise.
(gfc_match_oacc_enter_data, gfc_match_oacc_exit_data): Likewise.
* parse.c (decode_oacc_directive): New function.
(verify_token_free, verify_token_fixed): New helper functions.
(next_free, next_fixed): Decode !$ACC sentinel.
(case_executable): Add ST_OACC_UPDATE, ST_OACC_WAIT, ST_OACC_CACHE,
ST_OACC_ENTER_DATA and ST_OACC_EXIT_DATA directives.
(case_exec_markers): Add ST_OACC_PARALLEL_LOOP, ST_OACC_PARALLEL,
ST_OACC_KERNELS, ST_OACC_DATA, ST_OACC_HOST_DATA, ST_OACC_LOOP and
ST_OACC_KERNELS_LOOP directives.
(push_state): Initialize OpenACC declare clauses.
(gfc_ascii_statement): Dump names of OpenACC directives.
(verify_st_order): Verify OpenACC declare directive as declarative.
(parse_spec): Push clauses to state stack when declare directive is
parsed.
(parse_oacc_structured_block, parse_oacc_loop): New functions.
(parse_executable): Call them.
(parse_progunit): Move declare clauses from state stack to namespace.
* parse.h (gfc_state_data): Add declare directive's clauses.
* resolve.c (gfc_resolve_blocks): Resolve OpenACC directives.
(resolve_code): Likewise.
* scanner.c (openacc_flag, openacc_locus): New static variables.
(skip_oacc_attribute, skip_omp_attribute): New helper functions.
(skip_free_comments, skip_fixed_comments): Don't skip !$ACC sentinel.
(gfc_next_char_literal): Support OpenACC directives.
* st.c (gfc_free_statement): Free also OpenACC directives.


Re: [PATCH 2/4] [GOMP4] [Fortran] OpenACC 1.0+ support in fortran front-end

2014-03-12 Thread Tobias Burnus
Hi Ilmir,

Ilmir Usmanov wrote:
> Is it OK now?

Yes. Thanks for the patches - and the patience!

Tobias



Re: [patch,libfortran] [4.7/4.8/4.9 Regression] PR38199 missed optimization: I/O performance

2014-03-12 Thread Tobias Burnus

Jerry DeLisle wrote:

+  if (dtp->common.unit == 0)
+   {
+ len = string_len_trim (dtp->internal_unit_len,
+dtp->internal_unit);
+ if (len > 0)
+   dtp->internal_unit_len = len;
+ iunit->recl = dtp->internal_unit_len;
+   }
Is there a reason for having the "len > 0" check? And would the 
following work?


  dtp->internal_unit_len = len ? len : 1;


+ if (len > 0)
+   dtp->internal_unit_len = len;


Ditto.

Otherwise, it looks good to me - even if Dominique has found another
special case [PR38199, Comment 14], where the performance with the patch
is 7% lower.
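
To spell out the difference between the two variants discussed above, here
is a small C sketch. It does not use the libgfortran routines: trimmed_len
is a hypothetical stand-in for the trimming step, and unit_len_keep and
unit_len_min_one are made-up names for the patch's behaviour and for the
"len ? len : 1" alternative.

```c
#include <assert.h>
#include <stddef.h>

/* Length of the blank-padded buffer BUF (LEN bytes) with trailing blanks
   removed.  Returns 0 for an all-blank buffer.  */
size_t
trimmed_len (size_t len, const char *buf)
{
  assert (buf != NULL);
  while (len > 0 && buf[len - 1] == ' ')
    len--;
  return len;
}

/* Behaviour of the "if (len > 0)" form: an all-blank buffer keeps its
   original length.  */
size_t
unit_len_keep (size_t old_len, const char *buf)
{
  size_t len = trimmed_len (old_len, buf);

  return len > 0 ? len : old_len;
}

/* Behaviour of the "len ? len : 1" form: an all-blank buffer is treated
   as having length 1.  */
size_t
unit_len_min_one (size_t old_len, const char *buf)
{
  size_t len = trimmed_len (old_len, buf);

  return len ? len : 1;
}
```

The two forms agree whenever the buffer has at least one non-blank
character; they differ only for an all-blank internal unit.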


Tobias


Re: [PATCH, PR58066] preferred_stack_boundary update for tls expanded call

2014-03-12 Thread H.J. Lu
On Fri, Mar 7, 2014 at 2:33 PM, Wei Mi  wrote:
> Yes, x32 has the same problem. It should be tested. Fixed.
>
> Thanks,
> Wei.
>
>
> On Fri, Mar 7, 2014 at 2:06 PM, H.J. Lu  wrote:
>> On Fri, Mar 7, 2014 at 1:26 PM, Wei Mi  wrote:
>>> Hi,
>>>
>>> This patch is to fix the problem described here:
>>> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58066
>>>
>>> I follow Ian's suggestion and set
>>> ix86_tls_descriptor_calls_expanded_in_cfun in
>>> tls_global_dynamic_64_ and tls_local_dynamic_base_64_.
>>> Although 32bit doesn't have the problem,
>>> ix86_tls_descriptor_calls_expanded_in_cfun is also set for
>>> tls_global_dynamic_32 and tls_local_dynamic_base_32 to make
>>> ix86_tls_descriptor_calls_expanded_in_cfun setting consistent across
>>> 32bits and 64bits.
>>>
>>> If ix86_current_function_calls_tls_descriptor is set, we know that
>>> there is tls expanded call in current function. Update
>>> crtl->preferred_stack_boundary and crtl->stack_alignment_needed to be
>>> no less than PREFERRED_STACK_BOUNDARY at the start of
>>> ix86_compute_frame_layout. We don't do the update in
>>> legitimize_tls_address in cfgexpand stage, which is too early because
>>> according to the comments before
>>> ix86_current_function_calls_tls_descriptor, tls call may be optimized
>>> away. ix86_compute_frame_layout is the latest place to do the update.
>>>
>>> bootstrap on x86_64-linux-gnu is ok. regression test is going on. Ok
>>> for trunk if tests pass?
>>>
>>> Thanks,
>>> Wei.
>>>
>>> gcc/ChangeLog:
>>>
>>> 2014-03-07  Wei Mi  
>>>
>>> * config/i386/i386.c (ix86_compute_frame_layout): Update
>>> preferred_stack_boundary when there is tls expanded call.
>>> * config/i386/i386.md: Set
>>> ix86_tls_descriptor_calls_expanded_in_cfun.
>>>
>>> gcc/testsuite/ChangeLog:
>>>
>>> 2014-03-07  Wei Mi  
>>>
>>> * g++.dg/pr58066.C: New test.
>>>
>>>
>>> Index: gcc/config/i386/i386.c
>>> ===
>>> --- gcc/config/i386/i386.c  (revision 208410)
>>> +++ gcc/config/i386/i386.c  (working copy)
>>> @@ -9504,6 +9504,19 @@ ix86_compute_frame_layout (struct ix86_f
>>>crtl->preferred_stack_boundary = 128;
>>>crtl->stack_alignment_needed = 128;
>>>  }
>>> +  /* For 64-bit target, preferred_stack_boundary is never updated for call
>>> + expanded from tls descriptor. Update it here. We don't update it in
>>> + expand stage because according to the comments before
>>> + ix86_current_function_calls_tls_descriptor, tls calls may be optimized
>>> + away.  */
>>> +  else if (TARGET_64BIT
>>> +  && ix86_current_function_calls_tls_descriptor
>>> +  && crtl->preferred_stack_boundary < PREFERRED_STACK_BOUNDARY)
>>> +{
>>> +  crtl->preferred_stack_boundary = PREFERRED_STACK_BOUNDARY;
>>> +  if (crtl->stack_alignment_needed < PREFERRED_STACK_BOUNDARY)
>>> +   crtl->stack_alignment_needed = PREFERRED_STACK_BOUNDARY;
>>> +}
>>>

There are several problems with this:

1.  It doesn't work with C.
2.  IA32 has the same issue and isn't fixed.
3.  There is no testcase for global dynamic model.

-- 
H.J.


Re: [PATCH, PR58066] preferred_stack_boundary update for tls expanded call

2014-03-12 Thread Wei Mi
> There are several problems with this:
>
> 1.  It doesn't work with C.

Ok, I will change the testcase using C.

> 2.  IA32 has the same issue and isn't fixed.

I thought IA32 didn't have the same issue because abi only requires 32
bit alignment for stack starting address.

oh, I found the old patch
http://gcc.gnu.org/ml/gcc-patches/2006-09/msg00298.html which changed
the default alignment to 128bit. Ok, will remove the TARGET_64BIT
constraint.
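
For reference, a minimal C model of the adjustment with the TARGET_64BIT
test dropped, as proposed above. This is only a sketch: struct stack_info
and raise_for_tls_call are hypothetical stand-ins for the crtl fields and
the code in ix86_compute_frame_layout, and the boundary is passed in as a
parameter instead of coming from PREFERRED_STACK_BOUNDARY.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the two crtl fields touched by the patch.
   Values are in bits.  */
struct stack_info
{
  unsigned preferred_stack_boundary;
  unsigned stack_alignment_needed;
};

/* If the function contains a TLS descriptor call and its preferred stack
   boundary is below PREFERRED_BOUNDARY, raise it, and raise the needed
   alignment along with it when that is lower too.  */
void
raise_for_tls_call (struct stack_info *s, bool calls_tls_descriptor,
                    unsigned preferred_boundary)
{
  assert (s != NULL);
  if (calls_tls_descriptor
      && s->preferred_stack_boundary < preferred_boundary)
    {
      s->preferred_stack_boundary = preferred_boundary;
      if (s->stack_alignment_needed < preferred_boundary)
        s->stack_alignment_needed = preferred_boundary;
    }
}
```

With a 128-bit boundary, a function that starts at 64/64 and contains a TLS
call ends up at 128/128, while a function without one is left untouched.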

> 3.  There is no testcase for global dynamic model.
>
> --
> H.J.

Will add the testcase.

Thanks,
Wei.


Re: [PATCH, PR58066] preferred_stack_boundary update for tls expanded call

2014-03-12 Thread H.J. Lu
On Wed, Mar 12, 2014 at 2:03 PM, Wei Mi  wrote:
>> There are several problems with this:
>>
>> 1.  It doesn't work with C.
>
> Ok, I will change the testcase using C.
>
>> 2.  IA32 has the same issue and isn't fixed.
>
> I thought IA32 didn't have the same issue because abi only requires 32
> bit alignment for stack starting address.
>
> oh, I found the old patch
> http://gcc.gnu.org/ml/gcc-patches/2006-09/msg00298.html which changed
> the default alignment to 128bit. Ok, will remove the TARGET_64BIT
> constraint.
>
>> 3.  There is no testcase for global dynamic model.
>>
>> --
>> H.J.
>
> Will add the testcase.
>

I posted a different patch in

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58066

-- 
H.J.


Re: [Patch, fortran] PR 60392 wrong descriptor when passing a transposed array to a contiguous assumed shape dummy.

2014-03-12 Thread Mikael Morin
Le 11/03/2014 10:25, Janus Weil a écrit :
>> Regression-tested on x86_64-unknown-linux-gnu.
>> This is not a regression as far as I know, but quite a severe
>> wrong-code, albeit limited to the corner case of transpose and
>> assumed shape and contiguous.  OK for trunk/4.8/4.7 anyway ?
> 
> I would say it's ok for trunk at least. About the branches I'm not
> sure. Maybe someone else can add an opinion here ...
> 
Now we are in stage4, so one could argue whether trunk should get a less
conservative treatment than the branches.

Anyway, I will apply to trunk by the end of the week, and leave the
branches untouched for now.

Thanks
Mikael


Re: [C++ Patch/RFC] PR 60254

2014-03-12 Thread Jason Merrill

On 03/12/2014 12:12 PM, Paolo Carlini wrote:

- cxx_constant_value (condition);
+ require_potential_rvalue_constant_expression (condition);


We need both, actually; cxx_constant_value catches some cases that the 
other doesn't.


Jason



Re: [PATCH, PR58066] preferred_stack_boundary update for tls expanded call

2014-03-12 Thread Wei Mi
Hi H.J.,

Could you show me why you postpone the setting
ix86_tls_descriptor_calls_expanded_in_cfun until reload_complete and
use ix86_tls_descriptor_calls_expanded_in_cfun instead of
ix86_current_function_calls_tls_descriptor? Isn't
ix86_current_function_calls_tls_descriptor useful to consider the case
that tls call is optimized away?

Thanks,
Wei.

On Wed, Mar 12, 2014 at 2:07 PM, H.J. Lu  wrote:
> On Wed, Mar 12, 2014 at 2:03 PM, Wei Mi  wrote:
>>> There are several problems with this:
>>>
>>> 1.  It doesn't work with C.
>>
>> Ok, I will change the testcase using C.
>>
>>> 2.  IA32 has the same issue and isn't fixed.
>>
>> I thought IA32 didn't have the same issue because abi only requires 32
>> bit alignment for stack starting address.
>>
>> oh, I found the old patch
>> http://gcc.gnu.org/ml/gcc-patches/2006-09/msg00298.html which changed
>> the default alignment to 128bit. Ok, will remove the TARGET_64BIT
>> constraint.
>>
>>> 3.  There is no testcase for global dynamic model.
>>>
>>> --
>>> H.J.
>>
>> Will add the testcase.
>>
>
> I posted a different patch in
>
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58066
>
> --
> H.J.


Re: [PATCH, PR58066] preferred_stack_boundary update for tls expanded call

2014-03-12 Thread H.J. Lu
On Wed, Mar 12, 2014 at 2:36 PM, Wei Mi  wrote:
> Hi H.J.,
>
> Could you show me why you postpone the setting
> ix86_tls_descriptor_calls_expanded_in_cfun until reload_complete and
> use ix86_tls_descriptor_calls_expanded_in_cfun instead of
> ix86_current_function_calls_tls_descriptor? Isn't
> ix86_current_function_calls_tls_descriptor useful to consider the case
> that tls call is optimized away?
>

When a tls call is optimized away, it won't survive reload.
If it does survive reload, it isn't optimized away.  Also
checking df_regs_ever_live_p (SP_REG) isn't reliable
when called from ix86_compute_frame_layout.

-- 
H.J.


Re: [PATCH] Add a new option "-ftree-bitfield-merge" (patch / doc inside)

2014-03-12 Thread Bernhard Reutner-Fischer
On Sun, Mar 09, 2014 at 08:35:43PM +, Zoran Jovanovic wrote:
> Hello,
> This is new patch version. 
> Approach suggested by Richard Biener with lowering bit-field accesses instead 
> of modifying gimple trees is implemented.
 
> New command line option "-fmerge-bitfields" is introduced.
> 
> Tested - passed gcc regression tests.
> 
> Changelog -
> 
> gcc/ChangeLog:
> 2014-03-09 Zoran Jovanovic (zoran.jovano...@imgtec.com)
>   * common.opt (fmerge-bitfields): New option.
>   * doc/invoke.texi: Added reference to "-fmerge-bitfields".

Present tense.

>   * tree-sra.c (lower_bitfields): New function.
>   Entry for (-fmerge-bitfields).
>   (bfaccess::hash): New function.
>   (bfaccess::equal): New function.
>   (bfaccess::remove): New function.
>   (bitfield_access_p): New function.
>   (lower_bitfield_read): New function.
>   (lower_bitfield_write): New function.
>   (bitfield_stmt_access_pair_htab_hash): New function.
>   (bitfield_stmt_access_pair_htab_eq): New function.
>   (create_and_insert_access): New function.
>   (get_bit_offset): New function.
>   (get_merged_bit_field_size): New function.
>   (add_stmt_access_pair): New function.
>   (cmp_access): New function.
>   * dwarf2out.c (simple_type_size_in_bits): moved to tree.c.

Present tense. Capital 'M'ove

>   (field_byte_offset): declaration moved to tree.h, static removed.

Capital 'D'eclaration. These are supposed to be sentences. By removing
static you IMHO 'make extern'.

>   * testsuite/gcc.dg/tree-ssa/bitfldmrg1.c: New test.
>   * testsuite/gcc.dg/tree-ssa/bitfldmrg2.c: New test.
>   * tree-ssa-sccvn.c (expressions_equal_p): moved to tree.c.

See above.

>   * tree-ssa-sccvn.h (expressions_equal_p): declaration moved to tree.h.

Likewise.

>   * tree.c (expressions_equal_p): moved from tree-ssa-sccvn.c.

See above.

>   (simple_type_size_in_bits): moved from dwarf2out.c.

See above.

>   * tree.h (expressions_equal_p): declaration added.

Ditto.

>   (field_byte_offset): declaration added.

Ditto.

> 
> Patch -
> 
> diff --git a/gcc/common.opt b/gcc/common.opt
> index 661516d..3331d03 100644
> --- a/gcc/common.opt
> +++ b/gcc/common.opt
> @@ -2193,6 +2193,10 @@ ftree-sra
>  Common Report Var(flag_tree_sra) Optimization
>  Perform scalar replacement of aggregates
>  
> +fmerge-bitfields
> +Common Report Var(flag_tree_bitfield_merge) Init(0) Optimization

Optimization, but not enabled at any level. So, where would one
generally want this enabled? CSiBE numbers? SPEC you-name-it
improvements? size(1) improvements, and where? In GCC there is generally
no interest in the size(1) added to the collection itself, so let me ask
for size(1) and bloat(-o-meter) stats for gcc, cc1 and collect2, just
for the sake of it.

> +Merge loads and stores of consecutive bitfields
> +
>  ftree-ter
>  Common Report Var(flag_tree_ter) Optimization
>  Replace temporary expressions in the SSA->normal pass
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index 24bd76e..54bae56 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -411,7 +411,7 @@ Objective-C and Objective-C++ Dialects}.
>  -fsplit-ivs-in-unroller -fsplit-wide-types -fstack-protector @gol
>  -fstack-protector-all -fstack-protector-strong -fstrict-aliasing @gol
>  -fstrict-overflow -fthread-jumps -ftracer -ftree-bit-ccp @gol
> --ftree-builtin-call-dce -ftree-ccp -ftree-ch @gol
> +-fmerge-bitfields -ftree-builtin-call-dce -ftree-ccp -ftree-ch @gol
>  -ftree-coalesce-inline-vars -ftree-coalesce-vars -ftree-copy-prop @gol
>  -ftree-copyrename -ftree-dce -ftree-dominator-opts -ftree-dse @gol
>  -ftree-forwprop -ftree-fre -ftree-loop-if-convert @gol
> @@ -7807,6 +7807,11 @@ pointer alignment information.
>  This pass only operates on local scalar variables and is enabled by default
>  at @option{-O} and higher.  It requires that @option{-ftree-ccp} is enabled.
>  
> +@item -fbitfield-merge

You are talking about '-fmerge-bitfields' up until here (except for the
Subject). [Confusion starts here -- Subject: -ftree-bitfield-merge; so far
Intro: -fmerge-bitfields and ChangeLog: -fmerge-bitfields]

> +@opindex fmerge-bitfields
> +Combines several adjacent bit-field accesses that copy values
> +from one memory location to another into one single bit-field access.
> +
>  @item -ftree-ccp
>  @opindex ftree-ccp
>  Perform sparse conditional constant propagation (CCP) on trees.  This
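
For illustration, a minimal sketch of the kind of source the option text above seems to describe: adjacent bit-fields copied one by one from one object to another. The struct layout and the names `flags` and `copy_fields` are made up for this example and are not taken from the patch; whether the pass actually merges these four accesses depends on the implementation under review.

```c
/* Hypothetical layout: four adjacent bit-fields that fit in 32 bits.  */
struct flags
{
  unsigned a : 3;
  unsigned b : 5;
  unsigned c : 8;
  unsigned d : 16;
};

/* Field-by-field copy.  Written this way, each assignment is a separate
   bit-field read from SRC and a separate bit-field write to DST; a
   merging pass could in principle replace them with one wider access.  */
static void
copy_fields (struct flags *dst, const struct flags *src)
{
  dst->a = src->a;
  dst->b = src->b;
  dst->c = src->c;
  dst->d = src->d;
}
```

Comparing the assembly for `copy_fields` with and without the new option would be one way to see whether the accesses were combined.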

> diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c
> index 284d544..c6a19b2 100644
> --- a/gcc/tree-sra.c
> +++ b/gcc/tree-sra.c
> @@ -3462,10 +3462,608 @@ perform_intra_sra (void)
>return ret;
>  }
>  
> +/* Bitfield access and hashtable support commoning same base and
> +   representative.  */
> +
> +struct bfaccess
> +{
> +  bfaccess (tree r):ref (r), r_count (1), w_count (1), merged (false),
> +modified (false), is_barrier (false), next (0), head_access (0)
> +  {
> +  }
> +
> +  tree ref;
> +  unsigned r_count;  /* Read counter.  */
> +  unsigned w_count;  /* Write counter.  */
> +
> +  /* hash_table support.  */
> +

Re: [PATCH, PR58066] preferred_stack_boundary update for tls expanded call

2014-03-12 Thread Wei Mi
Oh, I see. Thanks!

Wei.

On Wed, Mar 12, 2014 at 2:42 PM, H.J. Lu  wrote:
> On Wed, Mar 12, 2014 at 2:36 PM, Wei Mi  wrote:
>> Hi H.J.,
>>
>> Could you show me why you postpone the setting
>> ix86_tls_descriptor_calls_expanded_in_cfun until reload_complete and
>> use ix86_tls_descriptor_calls_expanded_in_cfun instead of
>> ix86_current_function_calls_tls_descriptor? Isn't
>> ix86_current_function_calls_tls_descriptor useful to consider the case
>> that tls call is optimized away?
>>
>
> When a tls call is optimized away, it won't survive reload.
> If it does survive reload, it isn't optimized away.  Also
> checking df_regs_ever_live_p (SP_REG) isn't reliable
> when called from ix86_compute_frame_layout.
>
> --
> H.J.


Re: [PATCH, PR58066] preferred_stack_boundary update for tls expanded call

2014-03-12 Thread Wei Mi
This is the updated testcase.

Thanks,
Wei.

===
--- testsuite/gcc.dg/pr58066.c (revision 0)
+++ testsuite/gcc.dg/pr58066.c (revision 0)
@@ -0,0 +1,18 @@
+/* { dg-do compile { target {{ i?86-*-* x86_64-*-* } && { ! ia32 } } } } */
+/* { dg-options "-fPIC -O2" } */
+
+/* Check whether the stack frame starting addresses of tls expanded calls
+   in foo and goo are 16bytes aligned.  */
+static __thread char ccc1;
+void* foo()
+{
+ return &ccc1;
+}
+
+__thread char ccc2;
+void* goo()
+{
+ return &ccc2;
+}
+
+/* { dg-final { scan-assembler-times ".cfi_def_cfa_offset 16" 2 } } */

On Wed, Mar 12, 2014 at 2:51 PM, Wei Mi  wrote:
> Oh, I see. Thanks!
>
> Wei.
>
> On Wed, Mar 12, 2014 at 2:42 PM, H.J. Lu  wrote:
>> On Wed, Mar 12, 2014 at 2:36 PM, Wei Mi  wrote:
>>> Hi H.J.,
>>>
>>> Could you show me why you postpone the setting
>>> ix86_tls_descriptor_calls_expanded_in_cfun until reload_complete and
>>> use ix86_tls_descriptor_calls_expanded_in_cfun instead of
>>> ix86_current_function_calls_tls_descriptor? Isn't
>>> ix86_current_function_calls_tls_descriptor useful to consider the case
>>> that tls call is optimized away?
>>>
>>
>> When a tls call is optimized away, it won't survive reload.
>> If it does survive reload, it isn't optimized away.  Also
>> checking df_regs_ever_live_p (SP_REG) isn't reliable
>> when called from ix86_compute_frame_layout.
>>
>> --
>> H.J.


Re: [PATCH, PR58066] preferred_stack_boundary update for tls expanded call

2014-03-12 Thread H.J. Lu
On Wed, Mar 12, 2014 at 2:58 PM, Wei Mi  wrote:
> This is the updated testcase.

Does my patch fix the original problem?

> Thanks,
> Wei.
>
> ===
> --- testsuite/gcc.dg/pr58066.c (revision 0)
> +++ testsuite/gcc.dg/pr58066.c (revision 0)
> @@ -0,0 +1,18 @@
> +/* { dg-do compile { target {{ i?86-*-* x86_64-*-* } && { ! ia32 } } } } */

Since it is a C testcase and we should also test it under ia32, it
should be moved to gcc.target/i386 and the target selector removed.

> +/* { dg-options "-fPIC -O2" } */
> +
> +/* Check whether the stack frame starting addresses of tls expanded calls
> +   in foo and goo are 16bytes aligned.  */
> +static __thread char ccc1;
> +void* foo()
> +{
> + return &ccc1;
> +}
> +
> +__thread char ccc2;
> +void* goo()
> +{
> + return &ccc2;
> +}
> +
> +/* { dg-final { scan-assembler-times ".cfi_def_cfa_offset 16" 2 } } */
>
> On Wed, Mar 12, 2014 at 2:51 PM, Wei Mi  wrote:
>> Oh, I see. Thanks!
>>
>> Wei.
>>
>> On Wed, Mar 12, 2014 at 2:42 PM, H.J. Lu  wrote:
>>> On Wed, Mar 12, 2014 at 2:36 PM, Wei Mi  wrote:
 Hi H.J.,

 Could you show me why you postpone the setting
 ix86_tls_descriptor_calls_expanded_in_cfun until reload_complete and
 use ix86_tls_descriptor_calls_expanded_in_cfun instead of
 ix86_current_function_calls_tls_descriptor? Isn't
 ix86_current_function_calls_tls_descriptor useful to consider the case
 that tls call is optimized away?

>>>
>>> When a tls call is optimized away, it won't survive reload.
>>> If it does survive reload, it isn't optimized away.  Also
>>> checking df_regs_ever_live_p (SP_REG) isn't reliable
>>> when called from ix86_compute_frame_layout.
>>>
>>> --
>>> H.J.



-- 
H.J.


Re: [C++ Patch/RFC] PR 60254

2014-03-12 Thread Paolo Carlini

Hi,

On 03/12/2014 10:28 PM, Jason Merrill wrote:
> On 03/12/2014 12:12 PM, Paolo Carlini wrote:
>> -  cxx_constant_value (condition);
>> +  require_potential_rvalue_constant_expression (condition);
>
> We need both, actually; cxx_constant_value catches some cases that the
> other doesn't.

Ok, I think I got confused when I compared to cxx_alignas_expr: in the 
present case 'condition' is already the result of maybe_constant_value 
(thus it seems we would waste work) but, at variance with 
cxx_constant_value called by the former, it allows for nonconstants, and 
we want to emit at this point also the errors suppressed the first time 
cxx_eval_outermost_constant_expr is called... Thanks for your patience, 
now the various *_constant_* helpers are more clear ;)


The below also passes testing.

Thanks,
Paolo.

///
Index: cp/semantics.c
===
--- cp/semantics.c  (revision 208507)
+++ cp/semantics.c  (working copy)
@@ -6860,7 +6860,8 @@ finish_static_assert (tree condition, tree message
   else if (condition && condition != error_mark_node)
{
  error ("non-constant condition for static assertion");
- cxx_constant_value (condition);
+ if (require_potential_rvalue_constant_expression (condition))
+   cxx_constant_value (condition);
}
   input_location = saved_loc;
 }
Index: testsuite/g++.dg/cpp0x/static_assert10.C
===
--- testsuite/g++.dg/cpp0x/static_assert10.C(revision 0)
+++ testsuite/g++.dg/cpp0x/static_assert10.C(working copy)
@@ -0,0 +1,8 @@
+// PR c++/60254
+// { dg-do compile { target c++11 } }
+
+template<typename T> bool foo(T)
+{
+  int i;
+  static_assert(foo(i), "Error"); // { dg-error "non-constant condition|not usable" }
+}
Index: testsuite/g++.dg/cpp0x/static_assert11.C
===
--- testsuite/g++.dg/cpp0x/static_assert11.C(revision 0)
+++ testsuite/g++.dg/cpp0x/static_assert11.C(working copy)
@@ -0,0 +1,10 @@
+// PR c++/60254
+// { dg-do compile { target c++11 } }
+
+struct A
+{
+  template<typename T> bool foo(T)
+  {
+    static_assert(foo(0), "Error"); // { dg-error "non-constant condition|constant expression" }
+  }
+};
Index: testsuite/g++.dg/cpp0x/static_assert3.C
===
--- testsuite/g++.dg/cpp0x/static_assert3.C (revision 208507)
+++ testsuite/g++.dg/cpp0x/static_assert3.C (working copy)
@@ -1,4 +1,4 @@
 // { dg-do compile { target c++11 } }
 static_assert(7 / 0, "X"); // { dg-error "non-constant condition" "non-constant" }
 // { dg-warning "division by zero" "zero" { target *-*-* } 2 }
-// { dg-error "7 / 0.. is not a constant expression" "not a constant" { target *-*-* } 2 }
+// { dg-error "division by zero is not a constant-expression" "not a constant" { target *-*-* } 2 }


Re: [PATCH, PR58066] preferred_stack_boundary update for tls expanded call

2014-03-12 Thread Wei Mi
On Wed, Mar 12, 2014 at 3:07 PM, H.J. Lu  wrote:
> On Wed, Mar 12, 2014 at 2:58 PM, Wei Mi  wrote:
>> This is the updated testcase.
>
> Does my patch fix the original problem?

Yes, it works. I am doing bootstrap and regression test for your patch. Thanks!

>
>> Thanks,
>> Wei.
>>
>> ===
>> --- testsuite/gcc.dg/pr58066.c (revision 0)
>> +++ testsuite/gcc.dg/pr58066.c (revision 0)
>> @@ -0,0 +1,18 @@
>> +/* { dg-do compile { target {{ i?86-*-* x86_64-*-* } && { ! ia32 } } } } */
>
> Since it is a C testcase and we should test it under ia32, it
> should be moved to gcc.target/i386 and remove target.
>

Fixed.

Thanks,
Wei.

Index: testsuite/gcc.target/i386/pr58066.c
===
--- testsuite/gcc.target/i386/pr58066.c (revision 0)
+++ testsuite/gcc.target/i386/pr58066.c (revision 0)
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-fPIC -O2" } */
+
+/* Check whether the stack frame starting addresses of tls expanded calls
+   in foo and goo are 16bytes aligned.  */
+static __thread char ccc1;
+void* foo()
+{
+ return &ccc1;
+}
+
+__thread char ccc2;
+void* goo()
+{
+ return &ccc2;
+}
+
+/* { dg-final { scan-assembler-times ".cfi_def_cfa_offset 16" 2 } } */


[Patch][google/main] Fix arm build broken

2014-03-12 Thread 沈涵
The ARM build (on chrome) is broken because of duplicate entries in arm.md
and unspecs.md. Fixed by removing the duplicates and merging the entries
from arm.md into unspecs.md.

(We had a similar fix for google/gcc-4_8 here -
http://gcc.gnu.org/viewcvs/gcc?view=revision&revision=198650)

Tested by building arm cross compiler successfully.

Ok for google/main?

Patch below -

diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 8b269a4..9aec213 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -75,27 +75,6 @@
   ]
 )

-;; UNSPEC Usage:
-;; Note: sin and cos are no-longer used.
-;; Unspec enumerators for Neon are defined in neon.md.
-
-(define_c_enum "unspec" [
-  UNSPEC_SIN; `sin' operation (MODE_FLOAT):
-;   operand 0 is the result,
-;   operand 1 the parameter.
-  UNPSEC_COS; `cos' operation (MODE_FLOAT):
-;   operand 0 is the result,
-;   operand 1 the parameter.
-  UNSPEC_PROLOGUE_USE   ; As USE insns are not meaningful after reload,
-; this unspec is used to prevent the deletion of
-; instructions setting registers for EH handling
-; and stack frame generation.  Operand 0 is the
-; register to "use".
-  UNSPEC_WMADDS ; Used by the intrinsic form of the iWMMXt
WMADDS instruction.
-  UNSPEC_WMADDU ; Used by the intrinsic form of the iWMMXt
WMADDU instruction.
-  UNSPEC_GOT_PREL_SYM   ; Specify an R_ARM_GOT_PREL relocation of a symbol.
-])
-
 ;; UNSPEC_VOLATILE Usage:


diff --git a/gcc/config/arm/unspecs.md b/gcc/config/arm/unspecs.md
index 8caa953..89bc528 100644
--- a/gcc/config/arm/unspecs.md
+++ b/gcc/config/arm/unspecs.md
@@ -24,6 +24,12 @@
 ;; Unspec enumerators for iwmmxt2 are defined in iwmmxt2.md

 (define_c_enum "unspec" [
+  UNSPEC_SIN; `sin' operation (MODE_FLOAT):
+;   operand 0 is the result,
+;   operand 1 the parameter.
+  UNPSEC_COS; `cos' operation (MODE_FLOAT):
+;   operand 0 is the result,
+;   operand 1 the parameter.
   UNSPEC_PUSH_MULT  ; `push multiple' operation:
 ;   operand 0 is the first register,
 ;   subsequent registers are in parallel (use ...)
@@ -58,6 +64,7 @@
 ; instruction stream.
   UNSPEC_PIC_OFFSET ; A symbolic 12-bit OFFSET that has been treated
 ; correctly for PIC usage.
+  UNSPEC_GOT_PREL_SYM   ; Specify an R_ARM_GOT_PREL relocation of a symbol.
   UNSPEC_GOTSYM_OFF ; The offset of the start of the GOT from a
 ; a given symbolic address.
   UNSPEC_THUMB1_CASESI  ; A Thumb1 compressed dispatch-table call.
@@ -70,6 +77,11 @@
  ; that.
   UNSPEC_UNALIGNED_STORE ; Same for str/strh.
   UNSPEC_PIC_UNIFIED; Create a common pic addressing form.
+  UNSPEC_PROLOGUE_USE   ; As USE insns are not meaningful after reload,
+; this unspec is used to prevent the deletion of
+; instructions setting registers for EH handling
+; and stack frame generation.  Operand 0 is the
+; register to "use".
   UNSPEC_LL ; Represent an unpaired load-register-exclusive.
   UNSPEC_VRINTZ ; Represent a float to integral float rounding
 ; towards zero.
@@ -87,6 +99,8 @@

 (define_c_enum "unspec" [
   UNSPEC_WADDC ; Used by the intrinsic form of the iWMMXt WADDC instruction.
+  UNSPEC_WMADDS ; Used by the intrinsic form of the iWMMXt
WMADDS instruction.
+  UNSPEC_WMADDU ; Used by the intrinsic form of the iWMMXt
WMADDU instruction.
   UNSPEC_WABS ; Used by the intrinsic form of the iWMMXt WABS instruction.
   UNSPEC_WQMULWMR ; Used by the intrinsic form of the iWMMXt WQMULWMR
instruction.
   UNSPEC_WQMULMR ; Used by the intrinsic form of the iWMMXt WQMULMR
instruction.




Han


Re: [Patch][google/main] Fix arm build broken

2014-03-12 Thread Dehao Chen
Looks good to me.

Dehao

On Wed, Mar 12, 2014 at 3:35 PM, Hán Shěn (沈涵)  wrote:
> The ARM build (on chrome) is broken because of duplicate entries in arm.md
> and unspecs.md. Fixed by removing the duplicates and merging the entries
> from arm.md into unspecs.md.
>
> (We had a similar fix for google/gcc-4_8 here -
> http://gcc.gnu.org/viewcvs/gcc?view=revision&revision=198650)
>
> Tested by building arm cross compiler successfully.
>
> Ok for google/main?
>
> Patch below -
>
> diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
> index 8b269a4..9aec213 100644
> --- a/gcc/config/arm/arm.md
> +++ b/gcc/config/arm/arm.md
> @@ -75,27 +75,6 @@
>]
>  )
>
> -;; UNSPEC Usage:
> -;; Note: sin and cos are no-longer used.
> -;; Unspec enumerators for Neon are defined in neon.md.
> -
> -(define_c_enum "unspec" [
> -  UNSPEC_SIN; `sin' operation (MODE_FLOAT):
> -;   operand 0 is the result,
> -;   operand 1 the parameter.
> -  UNPSEC_COS; `cos' operation (MODE_FLOAT):
> -;   operand 0 is the result,
> -;   operand 1 the parameter.
> -  UNSPEC_PROLOGUE_USE   ; As USE insns are not meaningful after reload,
> -; this unspec is used to prevent the deletion of
> -; instructions setting registers for EH handling
> -; and stack frame generation.  Operand 0 is the
> -; register to "use".
> -  UNSPEC_WMADDS ; Used by the intrinsic form of the iWMMXt
> WMADDS instruction.
> -  UNSPEC_WMADDU ; Used by the intrinsic form of the iWMMXt
> WMADDU instruction.
> -  UNSPEC_GOT_PREL_SYM   ; Specify an R_ARM_GOT_PREL relocation of a symbol.
> -])
> -
>  ;; UNSPEC_VOLATILE Usage:
>
>
> diff --git a/gcc/config/arm/unspecs.md b/gcc/config/arm/unspecs.md
> index 8caa953..89bc528 100644
> --- a/gcc/config/arm/unspecs.md
> +++ b/gcc/config/arm/unspecs.md
> @@ -24,6 +24,12 @@
>  ;; Unspec enumerators for iwmmxt2 are defined in iwmmxt2.md
>
>  (define_c_enum "unspec" [
> +  UNSPEC_SIN; `sin' operation (MODE_FLOAT):
> +;   operand 0 is the result,
> +;   operand 1 the parameter.
> +  UNPSEC_COS; `cos' operation (MODE_FLOAT):
> +;   operand 0 is the result,
> +;   operand 1 the parameter.
>UNSPEC_PUSH_MULT  ; `push multiple' operation:
>  ;   operand 0 is the first register,
>  ;   subsequent registers are in parallel (use ...)
> @@ -58,6 +64,7 @@
>  ; instruction stream.
>UNSPEC_PIC_OFFSET ; A symbolic 12-bit OFFSET that has been treated
>  ; correctly for PIC usage.
> +  UNSPEC_GOT_PREL_SYM   ; Specify an R_ARM_GOT_PREL relocation of a symbol.
>UNSPEC_GOTSYM_OFF ; The offset of the start of the GOT from a
>  ; a given symbolic address.
>UNSPEC_THUMB1_CASESI  ; A Thumb1 compressed dispatch-table call.
> @@ -70,6 +77,11 @@
>   ; that.
>UNSPEC_UNALIGNED_STORE ; Same for str/strh.
>UNSPEC_PIC_UNIFIED; Create a common pic addressing form.
> +  UNSPEC_PROLOGUE_USE   ; As USE insns are not meaningful after reload,
> +; this unspec is used to prevent the deletion of
> +; instructions setting registers for EH handling
> +; and stack frame generation.  Operand 0 is the
> +; register to "use".
>UNSPEC_LL ; Represent an unpaired load-register-exclusive.
>UNSPEC_VRINTZ ; Represent a float to integral float rounding
>  ; towards zero.
> @@ -87,6 +99,8 @@
>
>  (define_c_enum "unspec" [
>UNSPEC_WADDC ; Used by the intrinsic form of the iWMMXt WADDC instruction.
> +  UNSPEC_WMADDS ; Used by the intrinsic form of the iWMMXt
> WMADDS instruction.
> +  UNSPEC_WMADDU ; Used by the intrinsic form of the iWMMXt
> WMADDU instruction.
>UNSPEC_WABS ; Used by the intrinsic form of the iWMMXt WABS instruction.
>UNSPEC_WQMULWMR ; Used by the intrinsic form of the iWMMXt WQMULWMR
> instruction.
>UNSPEC_WQMULMR ; Used by the intrinsic form of the iWMMXt WQMULMR
> instruction.
>
>
>
>
> Han


Re: [PATCH] PR libstdc++/59392: Fix ARM EABI uncaught throw from unexpected exception handler

2014-03-12 Thread Roland McGrath
Committed (r208519 on trunk and r208520 on 4.8) after approval posted
in http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59392.
Only the dates are changed from what I posted originally.


Thanks,
Roland

On Fri, Dec 6, 2013 at 3:24 PM, Roland McGrath  wrote:
> [This patch is on the git-only branch roland/pr59392.]
>
> As described in http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59392, this
> bug looks to have been present since 4.2 originally introduced support
> for ARM EABI-based C++ exception handling.  I'd like to put this fix on
> trunk and 4.8, and don't personally care about older versions but the
> same fix should apply to all versions still being maintained.
>
> The nature of the bug is quite straightforward: it's an unconditional
> null pointer dereference in the code path for an unexpected throw done
> inside a user-supplied handler for unexpected exceptions.  I'm not
> really sure if there are other ways to make it manifest.
>
> Mark Seaborn is responsible for identifying the fix, which mimics the
> similar code for the non-EABI implementation (and copies its comment).
> I filled it out with a regression test.  (We're both covered by Google's
> blanket copyright assignment.)
>
> No regressions in 'make check-c++' on arm-linux-gnueabihf.
>
> Ok for trunk and 4.8?
>
>
> Thanks,
> Roland
>
>
> libstdc++-v3/
> 2013-12-06  Roland McGrath  
> Mark Seaborn  
>
> PR libstdc++/59392
> * libsupc++/eh_call.cc (__cxa_call_unexpected): Call __do_catch with
> the address of a null pointer, not with a null pointer to pointer.
> Copy comment for this case from 
> eh_personality.cc:__cxa_call_unexpected.
> * testsuite/18_support/bad_exception/59392.cc: New file.
>
> --- a/libstdc++-v3/libsupc++/eh_call.cc
> +++ b/libstdc++-v3/libsupc++/eh_call.cc
> @@ -140,7 +140,11 @@ __cxa_call_unexpected(void* exc_obj_in)
>&new_ptr) != ctm_failed)
> __throw_exception_again;
>
> - if (catch_type->__do_catch(&bad_exc, 0, 1))
> + // If the exception spec allows std::bad_exception, throw that.
> + // We don't have a thrown object to compare against, but since
> + // bad_exception doesn't have virtual bases, that's OK; just pass NULL.
> + void* obj = NULL;
> + if (catch_type->__do_catch(&bad_exc, &obj, 1))
> bad_exception_allowed = true;
> }
>
> --- /dev/null
> +++ b/libstdc++-v3/testsuite/18_support/bad_exception/59392.cc
> @@ -0,0 +1,51 @@
> +// Copyright (C) 2013 Free Software Foundation, Inc.
> +//
> +// This file is part of the GNU ISO C++ Library.  This library is free
> +// software; you can redistribute it and/or modify it under the
> +// terms of the GNU General Public License as published by the
> +// Free Software Foundation; either version 3, or (at your option)
> +// any later version.
> +
> +// This library is distributed in the hope that it will be useful,
> +// but WITHOUT ANY WARRANTY; without even the implied warranty of
> +// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +// GNU General Public License for more details.
> +
> +// You should have received a copy of the GNU General Public License along
> +// with this library; see the file COPYING3.  If not see
> +// .
> +
> +#include <exception>
> +#include <cstdlib>
> +
> +class expected {};
> +class unexpected {};
> +class from_handler {};
> +
> +static void func_with_exception_spec() throw(expected)
> +{
> +  throw unexpected();
> +}
> +
> +static void unexpected_handler()
> +{
> +  throw from_handler();
> +}
> +
> +static void terminate_handler()
> +{
> +  exit(0);
> +}
> +
> +// libstdc++/59392
> +int main()
> +{
> +  std::set_unexpected(unexpected_handler);
> +  std::set_terminate(terminate_handler);
> +  try {
> +func_with_exception_spec();
> +  } catch (expected&) {
> +abort();
> +  }
> +  abort();
> +}
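
To spell out the distinction the ChangeLog entry above makes ("the address of a null pointer, not ... a null pointer to pointer"), here is a minimal C sketch. `matches` is a hypothetical stand-in, not libsupc++'s `__do_catch`; it is only assumed to read through its pointer-to-pointer argument without checking it, which is what the bug description implies.

```c
#include <stddef.h>

/* Hypothetical callee: reads *THROWN_PTR unconditionally, so THROWN_PTR
   itself must point at valid storage.  Returns 1 when no thrown object
   is available.  */
static int
matches (void **thrown_ptr)
{
  return *thrown_ptr == NULL;
}

/* Safe call: OBJ holds NULL, but &OBJ is a valid address to read from.
   Calling matches (NULL) instead would dereference a null pointer inside
   the callee, which is the failure mode described in the report.  */
static int
call_with_address_of_null (void)
{
  void *obj = NULL;
  return matches (&obj);
}
```

The second form is what the one-line fix in eh_call.cc switches to: the callee still sees "no object", but it gets there by reading a valid location.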


Re: [GOOGLE] Writes annotation info in elf section.

2014-03-12 Thread Xinliang David Li
Why is it not enough to emit warnings during build time when source
code changes significantly?

David

On Tue, Mar 11, 2014 at 4:27 PM, Dehao Chen  wrote:
> During AutoFDO annotation, we want to record the annotation stats into
> an elf section, so that we can calculate how much percentage of the
> profile is annotated, which can be used as an indicator whether code
> has changed significantly comparing with the profiled source.
>
> Bootstrapped and performance test on-going.
>
> OK for google-4_8?
>
> Thanks,
> Dehao


Re: RFA: New ipa-devirt PATCH for c++/58678 (devirt vs. KDE)

2014-03-12 Thread Jan Hubicka
> > This patch fixes the latest 58678 testcase by avoiding speculative
> > devirtualization to the destructor of an abstract class, which can
> > never succeed: you can't create an object of an abstract class, so
> > the pointer must point to an object of a derived class, and the
> > derived class necessarily has its own destructor.  Other virtual
> > member functions of an abstract class are OK for devirtualization:
> > the destructor is the only virtual function that is always
> > overridden in every class.
> > 
> > We could also detect an abstract class by searching through the
> > vtable for __cxa_pure_virtual, but I figured it was easy enough for
> > the front end to set a flag on the BINFO.
> > 
> > Tested x86_64-pc-linux-gnu.  OK for trunk?
> 
> > commit b64f52066f3f4cdc9d5a30e2d48aaf6dd5efd3d4
> > Author: Jason Merrill 
> > Date:   Wed Mar 5 11:35:07 2014 -0500
> > 
> > PR c++/58678
> > gcc/
> > * tree.h (BINFO_ABSTRACT_P): New.
> > * ipa-devirt.c (abstract_class_dtor_p): New.
> > (likely_target_p): Check it.
> > gcc/cp/
> > * search.c (get_pure_virtuals): Set BINFO_ABSTRACT_P.
> > * tree.c (copy_binfo): Copy it.

Jason, also, if the functions for which abstract_class_dtor_p holds are never
called via vtables, is there a reason for the C++ FE to put them there?  I
understand that there is a slot in the vtable initializer for them, but things
would go smoother if it were initialized to NULL or some other marker
different from __cxa_pure_virtual.  Then gimple-fold would already substitute
__builtin_unreachable for it and they would get ignored during ipa-devirt's
walks.

Honza


Re: [GOOGLE] Writes annotation info in elf section.

2014-03-12 Thread Xinliang David Li
Dehao explained that the data needs to be merged at link time to give
meaningful diagnostics.

Ok for google branch.


David

On Wed, Mar 12, 2014 at 3:55 PM, Xinliang David Li  wrote:
> Why is it not enough to emit warnings during build time when source
> code changes significantly?
>
> David
>
> On Tue, Mar 11, 2014 at 4:27 PM, Dehao Chen  wrote:
>> During AutoFDO annotation, we want to record the annotation stats into
>> an elf section, so that we can calculate how much percentage of the
>> profile is annotated, which can be used as an indicator whether code
>> has changed significantly comparing with the profiled source.
>>
>> Bootstrapped and performance test on-going.
>>
>> OK for google-4_8?
>>
>> Thanks,
>> Dehao


Re: [PATCH] Add support for powerpc ISA 2.07 128-bit add/subtract builtins

2014-03-12 Thread David Edelsohn
On Wed, Mar 12, 2014 at 5:20 PM, Michael Meissner
 wrote:
> Internally within IBM, the people wanting to use the new 128-bit integer
> instructions (and the integer decimal support instructions that I will be
> doing soon), asked for a new type to do the calculations in, rather than
> depend on the register allocator choosing the correct register, and not
> bouncing things between the GPR and VSX register sets.
>
> So I reworked the patches to add these instructions to use a new type (vector
> __int128_t or V1TImode) to hold the 128-bit integer types.  I simplified
> things, by removing the __int128_t support.  To convert from TImode to
> V1TImode, you would use vector initialization:
>
> vector __int128_t v128;
> __int128_t i128;
>
> v128 = (vector __int128_t) { i128 };
>
> And to convert from vector __int128_t to __int128_t, you use extract:
>
> #include <altivec.h>
>
> vector __int128_t v128;
> __int128_t i128;
>
> i128 = vec_extract (v128, 0);
>
> I have bootstrapped these changes on a big endian power7, big endian power8,
> and little endian power8 system, and verified that both the test cases work in
> big and little endian modes.  There were no regressions in running make check.
> Are these patches ok to install in the tree (instead of the previous version)?
>
> [gcc]
> 2014-03-11  Michael Meissner  
>
> * config/rs6000/vector.md (VEC_L): Add V1TI mode to vector types.
> (VEC_M): Likewise.
> (VEC_N): Likewise.
> (VEC_R): Likewise.
> (VEC_base): Likewise.
> (mov, VEC_M modes): If we are loading TImode into VSX
> registers, we need to swap double words in little endian mode.
>
> * config/rs6000/rs6000-modes.def (V1TImode): Add new vector mode
> to be a container mode for 128-bit integer operations added in ISA
> 2.07.  Unlike TImode and PTImode, the preferred register set is
> the Altivec/VMX registers for the 128-bit operations.
>
> * config/rs6000/rs6000-protos.h (rs6000_move_128bit_ok_p): Add
> declarations.
> (rs6000_split_128bit_ok_p): Likewise.
>
> * config/rs6000/rs6000-builtin.def (BU_P8V_AV_3): Add new support
> macros for creating ISA 2.07 normal and overloaded builtin
> functions with 3 arguments.
> (BU_P8V_OVERLOAD_3): Likewise.
> (VPERM_1T): Add support for V1TImode in 128-bit vector operations
> for use as overloaded functions.
> (VPERM_1TI_UNS): Likewise.
> (VSEL_1TI): Likewise.
> (VSEL_1TI_UNS): Likewise.
> (ST_INTERNAL_1ti): Likewise.
> (LD_INTERNAL_1ti): Likewise.
> (XXSEL_1TI): Likewise.
> (XXSEL_1TI_UNS): Likewise.
> (VPERM_1TI): Likewise.
> (VPERM_1TI_UNS): Likewise.
> (XXPERMDI_1TI): Likewise.
> (SET_1TI): Likewise.
> (LXVD2X_V1TI): Likewise.
> (STXVD2X_V1TI): Likewise.
> (VEC_INIT_V1TI): Likewise.
> (VEC_SET_V1TI): Likewise.
> (VEC_EXT_V1TI): Likewise.
> (EQV_V1TI): Likewise.
> (NAND_V1TI): Likewise.
> (ORC_V1TI): Likewise.
> (VADDCUQ): Add support for 128-bit integer arithmetic instructions
> added in ISA 2.07.  Add both normal 'altivec' builtins, and the
> overloaded builtin.
> (VADDUQM): Likewise.
> (VSUBCUQ): Likewise.
> (VADDEUQM): Likewise.
> (VADDECUQ): Likewise.
> (VSUBEUQM): Likewise.
> (VSUBECUQ): Likewise.
>
> * config/rs6000/rs6000-c.c (__int128_type): New static to hold
> __int128_t and __uint128_t types.
> (__uint128_type): Likewise.
> (altivec_categorize_keyword): Add support for vector __int128_t,
> vector __uint128_t, vector __int128, and vector unsigned __int128
> as a container type for TImode operations that need to be done in
> VSX/Altivec registers.
> (rs6000_macro_to_expand): Likewise.
> (altivec_overloaded_builtins): Add ISA 2.07 overloaded functions
> to support 128-bit integer instructions vaddcuq, vadduqm,
> vaddecuq, vaddeuqm, vsubcuq, vsubuqm, vsubecuq, vsubeuqm.
> (altivec_resolve_overloaded_builtin): Add support for V1TImode.
>
> * config/rs6000/rs6000.c (rs6000_hard_regno_mode_ok): Add support
> for V1TImode, and set up preferences to use VSX/Altivec
> registers.  Setup VSX reload handlers.
> (rs6000_debug_reg_global): Likewise.
> (rs6000_init_hard_regno_mode_ok): Likewise.
> (rs6000_preferred_simd_mode): Likewise.
> (vspltis_constant): Do not allow V1TImode as easy altivec
> constants.
> (easy_altivec_constant): Likewise.
> (output_vec_const_move): Likewise.
> (rs6000_expand_vector_set): Convert V1TImode set and extract to
> simple move.
> (rs6000_expand_vector_extract): Likewise.
> (reg_offset_addressing_ok_p): Setup V

[patch] make -flto -save-temps less verbose

2014-03-12 Thread Cesar Philippidis
I noticed that the lto-wrapper is a little noisy without the -v option
when -save-temps is used. E.g.,

$ gcc main.c -flto -save-temps
[Leaving LTRANS /tmp/ccSEvaB7.args]
[Leaving LTRANS /tmp/ccQomDzb.ltrans.out]
[Leaving LTRANS /tmp/ccVzWdGZ.args]
[Leaving LTRANS /tmp/ccQomDzb.ltrans0.o]

Those messages should probably be suppressed unless the user asks for
verbose diagnostics. They would also show up as errors in the testsuite
(although no test currently uses -save-temps with -flto). The attached
patch addresses this by disabling those messages unless the user
passes -v to the driver. I've also included a simple test case which
would fail without the change.
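
As a rough standalone sketch of the intended behaviour (not the real lto-wrapper code): the flags `keep_temps` and `verbose` and the function name below are illustrative stand-ins for whatever state the driver passes down with -save-temps and -v.

```c
#include <stdbool.h>
#include <stdio.h>

/* Illustrative stand-ins for the wrapper's option state.  */
static bool keep_temps;   /* assumed to be set for -save-temps */
static bool verbose;      /* assumed to be set for -v */

/* Either delete FILE or leave it in place.  Returns true only when the
   "[Leaving LTRANS ...]" note was printed, so callers can observe the
   decision.  */
static bool
maybe_unlink_file_sketch (const char *file)
{
  if (!keep_temps)
    {
      /* Temporaries are not kept; a file that is already gone is fine.  */
      remove (file);
      return false;
    }

  /* Temporaries are kept: stay quiet unless verbose output was asked for.  */
  if (verbose)
    {
      fprintf (stderr, "[Leaving LTRANS %s]\n", file);
      return true;
    }
  return false;
}
```

With `keep_temps` set and `verbose` clear, the function prints nothing, which is the case that matters for the testsuite.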

Is this OK for stage-4? If so, please check it in since I don't have an
SVN account.

Thanks,
Cesar

2014-03-12  Cesar Philippidis  

gcc/
* lto-wrapper.c (maybe_unlink_file): Suppress diagnostic
messages.

gcc/testsuite/
* gcc.dg/lto/save-temps_0.c: New file.

Index: gcc/lto-wrapper.c
===
--- gcc/lto-wrapper.c   (revision 208513)
+++ gcc/lto-wrapper.c   (working copy)
@@ -246,7 +246,7 @@ maybe_unlink_file (const char *file)
  && errno != ENOENT)
fatal_perror ("deleting LTRANS file %s", file);
 }
-  else
+  else if (verbose)
 fprintf (stderr, "[Leaving LTRANS %s]\n", file);
 }
 
Index: gcc/testsuite/gcc.dg/lto/save-temps_0.c
===
--- gcc/testsuite/gcc.dg/lto/save-temps_0.c (revision 0)
+++ gcc/testsuite/gcc.dg/lto/save-temps_0.c (revision 0)
@@ -0,0 +1,8 @@
+/* { dg-lto-options {{ -O -flto -save-temps}} } */
+/* { dg-lto-do link } */
+
+int
+main (void)
+{
+  return 0;
+}


Re: [PATCH, PR58066] preferred_stack_boundary update for tls expanded call

2014-03-12 Thread Wei Mi
>> Does my patch fix the original problem?
>
> Yes, it works. I am doing bootstrap and regression test for your patch. 
> Thanks!
>

The patch passes bootstrap and regression test on x86_64-linux-gnu.

Thanks,
Wei.


Re: [PATCH, PR58066] preferred_stack_boundary update for tls expanded call

2014-03-12 Thread H.J. Lu
On Wed, Mar 12, 2014 at 5:28 PM, Wei Mi  wrote:
>>> Does my patch fix the original problem?
>>
>> Yes, it works. I am doing bootstrap and regression test for your patch. 
>> Thanks!
>>
>
> The patch passes bootstrap and regression test on x86_64-linux-gnu.
>

My patch fails to handle ia32.  Here is the updated one.
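
One point about the i386.c hunk below is ordering: the byte values are now derived only after the bit-valued boundaries have been raised. A minimal sketch of that ordering, using a stand-in struct and made-up names (`sketch_crtl`, `preferred_alignment_bytes`, `SKETCH_BITS_PER_UNIT`) rather than the real crtl fields and macros:

```c
/* Bits per byte; plays the role of BITS_PER_UNIT in this sketch.  */
#define SKETCH_BITS_PER_UNIT 8

/* Stand-in for the two fields the patch touches; both are in bits.  */
struct sketch_crtl
{
  unsigned stack_alignment_needed;
  unsigned preferred_stack_boundary;
};

/* If the function calls a TLS descriptor, raise both boundaries to at
   least WANTED_BITS.  Only then convert to bytes, so the returned value
   cannot be stale.  */
static unsigned
preferred_alignment_bytes (struct sketch_crtl *c, int calls_tls,
                           unsigned wanted_bits)
{
  if (calls_tls && c->preferred_stack_boundary < wanted_bits)
    {
      c->preferred_stack_boundary = wanted_bits;
      if (c->stack_alignment_needed < wanted_bits)
        c->stack_alignment_needed = wanted_bits;
    }
  return c->preferred_stack_boundary / SKETCH_BITS_PER_UNIT;
}
```

For example, starting from 32-bit boundaries and asking for 128 bits yields 16 bytes when a TLS call is present and 4 bytes otherwise.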

-- 
H.J.
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 9e33d53..da603b6 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -9490,20 +9490,30 @@ ix86_compute_frame_layout (struct ix86_frame *frame)
   frame->nregs = ix86_nsaved_regs ();
   frame->nsseregs = ix86_nsaved_sseregs ();
 
-  stack_alignment_needed = crtl->stack_alignment_needed / BITS_PER_UNIT;
-  preferred_alignment = crtl->preferred_stack_boundary / BITS_PER_UNIT;
-
   /* 64-bit MS ABI seem to require stack alignment to be always 16 except for
  function prologues and leaf.  */
-  if ((TARGET_64BIT_MS_ABI && preferred_alignment < 16)
+  if ((TARGET_64BIT_MS_ABI && crtl->preferred_stack_boundary < 128)
   && (!crtl->is_leaf || cfun->calls_alloca != 0
   || ix86_current_function_calls_tls_descriptor))
 {
-  preferred_alignment = 16;
-  stack_alignment_needed = 16;
   crtl->preferred_stack_boundary = 128;
   crtl->stack_alignment_needed = 128;
 }
+  /* preferred_stack_boundary is never updated for call
+ expanded from tls descriptor. Update it here. We don't update it in
+ expand stage because according to the comments before
+ ix86_current_function_calls_tls_descriptor, tls calls may be optimized
+ away.  */
+  else if (ix86_current_function_calls_tls_descriptor
+	   && crtl->preferred_stack_boundary < PREFERRED_STACK_BOUNDARY)
+{
+  crtl->preferred_stack_boundary = PREFERRED_STACK_BOUNDARY;
+  if (crtl->stack_alignment_needed < PREFERRED_STACK_BOUNDARY)
+	crtl->stack_alignment_needed = PREFERRED_STACK_BOUNDARY;
+}
+
+  stack_alignment_needed = crtl->stack_alignment_needed / BITS_PER_UNIT;
+  preferred_alignment = crtl->preferred_stack_boundary / BITS_PER_UNIT;
 
   gcc_assert (!size || stack_alignment_needed);
   gcc_assert (preferred_alignment >= STACK_BOUNDARY / BITS_PER_UNIT);
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index ea1d85f..bc8fb1f 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -12859,13 +12859,14 @@
 
 (define_insn "*tls_global_dynamic_32_gnu"
   [(set (match_operand:SI 0 "register_operand" "=a")
-	(unspec:SI
-	 [(match_operand:SI 1 "register_operand" "b")
-	  (match_operand 2 "tls_symbolic_operand")
-	  (match_operand 3 "constant_call_address_operand" "z")]
-	 UNSPEC_TLS_GD))
-   (clobber (match_scratch:SI 4 "=d"))
-   (clobber (match_scratch:SI 5 "=c"))
+	(call:SI
+	 (mem:QI (match_operand 3 "constant_call_address_operand" "z"))
+	 (match_operand 4)))
+   (unspec:SI [(match_operand:SI 1 "register_operand" "b")
+	   (match_operand 2 "tls_symbolic_operand")]
+	  UNSPEC_TLS_GD)
+   (clobber (match_scratch:SI 5 "=d"))
+   (clobber (match_scratch:SI 6 "=c"))
(clobber (reg:CC FLAGS_REG))]
   "!TARGET_64BIT && TARGET_GNU_TLS"
 {
@@ -12885,13 +12886,19 @@
 (define_expand "tls_global_dynamic_32"
   [(parallel
 [(set (match_operand:SI 0 "register_operand")
-	  (unspec:SI [(match_operand:SI 2 "register_operand")
-		  (match_operand 1 "tls_symbolic_operand")
-		  (match_operand 3 "constant_call_address_operand")]
-		 UNSPEC_TLS_GD))
+	  (call:SI
+	   (mem:QI (match_operand 3 "constant_call_address_operand"))
+	   (const_int 0)))
+ (unspec:SI [(match_operand:SI 2 "register_operand")
+		 (match_operand 1 "tls_symbolic_operand")]
+		UNSPEC_TLS_GD)
  (clobber (match_scratch:SI 4))
  (clobber (match_scratch:SI 5))
- (clobber (reg:CC FLAGS_REG))])])
+ (clobber (reg:CC FLAGS_REG))])]
+  ""
+{
+  ix86_tls_descriptor_calls_expanded_in_cfun = true;
+})
 
 (define_insn "*tls_global_dynamic_64_"
   [(set (match_operand:P 0 "register_operand" "=a")
@@ -12946,16 +12953,20 @@
 	   (const_int 0)))
  (unspec:P [(match_operand 1 "tls_symbolic_operand")]
 	   UNSPEC_TLS_GD)])]
-  "TARGET_64BIT")
+  "TARGET_64BIT"
+{
+  ix86_tls_descriptor_calls_expanded_in_cfun = true;
+})
 
 (define_insn "*tls_local_dynamic_base_32_gnu"
   [(set (match_operand:SI 0 "register_operand" "=a")
-	(unspec:SI
-	 [(match_operand:SI 1 "register_operand" "b")
-	  (match_operand 2 "constant_call_address_operand" "z")]
-	 UNSPEC_TLS_LD_BASE))
-   (clobber (match_scratch:SI 3 "=d"))
-   (clobber (match_scratch:SI 4 "=c"))
+	(call:SI
+	 (mem:QI (match_operand 2 "constant_call_address_operand" "z"))
+	 (match_operand 3)))
+   (unspec:SI [(match_operand:SI 1 "register_operand" "b")]
+	  UNSPEC_TLS_LD_BASE)
+   (clobber (match_scratch:SI 4 "=d"))
+   (clobber (match_scratch:SI 5 "=c"))
(clobber (reg:CC FLAGS_REG))]
   "!TARGET_64BIT && TARGET_GNU_TLS"
 {
@@ -12976,13 +12987,18 @@
 (define_expand "tls_local_dynamic_base_32"
   [(parallel
  [(set (match_operand:SI 0

Re: RFA: New ipa-devirt PATCH for c++/58678 (devirt vs. KDE)

2014-03-12 Thread Jason Merrill

On 03/11/2014 05:08 PM, Jan Hubicka wrote:

> Jason, I was looking into this and I think I have a patch that works.  I would
> just like to verify I understand things right.  The first thing I implemented
> is to consistently skip dtors of abstract classes, since per the comment in
> abstract_class_dtor_p there is no way to call those by virtual table pointer.
> Unlike your patch, this will enable better unreachable code removal, since
> they will not appear in possible target lists of polymorphic calls.


Makes sense.


> The second change I made is to move methods that are reachable only
> via an abstract class into the part of the list that is in construction,
> since obviously we do not have instances of these classes.


I'm not sure how you would tell that a method is reachable only via an
abstract class; a derived class doesn't have to override methods other
than the destructor, so we could get the abstract class method for an
object of a derived class.
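
A small illustration of that point, not taken from the PR; the class and
function names are hypothetical. Base is abstract only because of its pure
virtual destructor, and Derived overrides nothing else, so a virtual call on
a Derived object still lands in Base's method.

```cpp
#include <string>

struct Base {
  virtual ~Base() = 0;  // pure virtual destructor makes Base abstract
  virtual std::string name() const { return "Base::name"; }
};

// A pure virtual destructor still needs a definition, because derived
// destructors call it.
Base::~Base() {}

struct Derived : Base {
  ~Derived() override {}  // the only override; name() is inherited
};

// Polymorphic call through a reference to the abstract class.
std::string call_name(const Base &b) { return b.name(); }
```

Calling call_name on a Derived object returns "Base::name", so the abstract
class's method is a live target even though Base itself is never instantiated.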



> What I would like to verify with you is that I also changed the walk when
> looking for destructors to not consider types in construction.  I believe
> there is no way to get a destructor call via a construction vtable, as we
> always know the type.  Is that right?


I guess it would be possible to get the abstract destructor via 
construction vtable if someone deletes the object while it's being 
constructed.  But surely that's undefined behavior anyway.



> Also, if abstract_class_dtor_p functions are never called via vtables, is
> there a reason for the C++ FE to put them there?  I understand that there is
> a slot in the vtable initializer for them, but things would go smoother if it
> was initialized to NULL or some other marker different from cxa_pure_virtual.
> Then gimple-fold will already substitute it for builtin_unreachable and they
> will get ignored during the ipa-devirt's walks.


Hmm, interesting idea.  Shall I implement that?

Jason



Re: [C++ Patch/RFC] PR 60254

2014-03-12 Thread Jason Merrill

OK.

Jason


Re: [patch,libfortran] [4.7/4.8/4.9 Regression] PR38199 missed optimization: I/O performance

2014-03-12 Thread Jerry DeLisle
On 03/12/2014 11:46 AM, Tobias Burnus wrote:
> Jerry DeLisle wrote:
>> +  if (dtp->common.unit == 0)
>> +{
>> +  len = string_len_trim (dtp->internal_unit_len,
>> + dtp->internal_unit);
>> +  if (len > 0)
>> +dtp->internal_unit_len = len;
>> +  iunit->recl = dtp->internal_unit_len;
>> +}
> Is there a reason for having the "len > 0" check? And would the following 
> work?
> 
>   dtp->internal_unit_len = len ? len : 1;
> 
>> +  if (len > 0)
>> +dtp->internal_unit_len = len;
> 
> Ditto.
> 
> Otherwise, it looks good to me - even if Dominique has found another special
> case [PR38199, Comment 14], where the performance with the patch is 7% lower.
> 
> Tobias
> 

I cleaned up the logic around that in unit.c, added a snippet in
list_read.c to take care of comment #41 of the PR, regression tested, NIST
tested, and committed.
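
For reference, the two variants discussed in the quoted review differ only
when the internal unit is entirely blank. The sketch below is a stand-alone
model under that reading: len_trim is a simplified stand-in for libgfortran's
string_len_trim, the other two function names are hypothetical, and it does
not show which form ended up in the committed unit.c.

```cpp
#include <cstddef>

// Simplified stand-in for string_len_trim: length of s[0..len) with
// trailing blanks removed.
std::size_t len_trim(std::size_t len, const char *s)
{
  while (len > 0 && s[len - 1] == ' ')
    --len;
  return len;
}

// Variant from the quoted patch: shrink only if something non-blank remains,
// otherwise keep the original length.
std::size_t unit_len_guarded(std::size_t unit_len, const char *unit)
{
  std::size_t len = len_trim(unit_len, unit);
  return len > 0 ? len : unit_len;
}

// Variant suggested in the review: fall back to a length of 1.
std::size_t unit_len_clamped(std::size_t unit_len, const char *unit)
{
  std::size_t len = len_trim(unit_len, unit);
  return len ? len : 1;
}
```

For a four-character unit holding "ab  " both return 2; for four blanks the
guarded form keeps 4 while the clamped form gives 1.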

Sending        ChangeLog
Sending        io/list_read.c
Sending        io/read.c
Sending        io/unit.c
Transmitting file data 
Committed revision 208528.

Thanks for the review.

I plan to keep looking at the eat_line function, which I think can be done
more efficiently, but there is no hurry.

Jerry


Re: [PATCH, PR58066] preferred_stack_boundary update for tls expanded call

2014-03-12 Thread Wei Mi
I saw the problem the last patch had on ia32. Without an explicit call in the
rtl template, the scheduler may schedule the sp-adjusting insn across the tls
descriptor call and break the alignment assumption.
I am testing the updated patch on x86_64.

Can we combine the last two patches: add the call explicitly in the
rtl template for tls_local_dynamic_base_32/tls_global_dynamic_32, and
set ix86_tls_descriptor_calls_expanded_in_cfun to true only after
reload completes?
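
As a reading aid, the boundary update in the quoted i386.c hunk can be
modelled as below. This is a sketch under stated assumptions, not the backend
code: FrameAlign, update_alignment and the boolean parameters are hypothetical
stand-ins for crtl and the target macros, and the default preferred boundary
is passed in rather than assumed.

```cpp
// Hypothetical stand-in for the two crtl fields the hunk touches (in bits).
struct FrameAlign {
  unsigned preferred_stack_boundary;
  unsigned stack_alignment_needed;
};

constexpr unsigned kBitsPerUnit = 8;  // assumed, as BITS_PER_UNIT on x86

// Mirrors the if / else-if structure of the hunk: the 64-bit MS ABI case
// forces 128 bits, otherwise a tls descriptor call raises both values to the
// target's default preferred boundary.
void update_alignment(FrameAlign &a, bool ms_abi_64, bool is_leaf,
                      bool calls_alloca, bool calls_tls_descriptor,
                      unsigned default_preferred_boundary)
{
  if (ms_abi_64 && a.preferred_stack_boundary < 128
      && (!is_leaf || calls_alloca || calls_tls_descriptor))
    {
      a.preferred_stack_boundary = 128;
      a.stack_alignment_needed = 128;
    }
  else if (calls_tls_descriptor
           && a.preferred_stack_boundary < default_preferred_boundary)
    {
      a.preferred_stack_boundary = default_preferred_boundary;
      if (a.stack_alignment_needed < default_preferred_boundary)
        a.stack_alignment_needed = default_preferred_boundary;
    }
}
```

A leaf function that starts at 32 bits and contains a tls descriptor call ends
up at the default boundary, and the byte alignments are then derived by
dividing by kBitsPerUnit, which is why the hunk moves those two divisions
below the update.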

Regards,
Wei.

On Wed, Mar 12, 2014 at 5:33 PM, H.J. Lu  wrote:
> On Wed, Mar 12, 2014 at 5:28 PM, Wei Mi  wrote:
>>>> Does my patch fix the original problem?
>>>
>>> Yes, it works. I am doing bootstrap and regression test for your patch. 
>>> Thanks!
>>>
>>
>> The patch passes bootstrap and regression test on x86_64-linux-gnu.
>>
>
> My patch fails to handle ia32.  Here is the updated one.
>
> --
> H.J.