Re: [PATCH V3]PR other/70268: map one directory name (old) to another (new) in __FILE__

2016-03-23 Thread Hongxu Jia

Ping, is that OK? Any suggestions?

//Hongxu

On 03/18/2016 01:16 PM, Hongxu Jia wrote:

Changed in V3:

- Rebase to latest master (efc86c4c627b82364f118a29b5d9d58cad8b8c76)

- Fix bad formatting (missing space before '(').

- Use of @code{} around literal source code text.

//Hongxu





Re: [PATCH] Slightly improve TARGET_STV splitters (PR target/70321)

2016-03-23 Thread Uros Bizjak
On Tue, Mar 22, 2016 at 10:37 PM, Jakub Jelinek  wrote:
> Hi!
>
> As the PR mentions, DImode AND/IOR/XOR patterns often result in too ugly
> code, regression from when the patterns weren't there (before STV has been
> added).  This patch attempts to improve it a little bit by improving the
> splitter for these, rather than always generating two SImode AND/IOR/XOR
> instructions, if the last operand's subword is either 0 or -1, optimize
> the corresponding instruction in the pair to nothing, or to clearing, or
> negation.  More improvement can be IMHO only achieved by moving the STV
> pass before combiner and split patterns we don't adjust into vector patterns
> into corresponding SImode patterns, so that the combiner can handle them,
> but that sounds like stage1 material.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>
> 2016-03-22  Jakub Jelinek  
>
> PR target/70321
> * config/i386/i386.md (*anddi3_doubleword, *di3_doubleword):
> Optimize TARGET_STV splitters, if high or low word of last argument
> is 0 or -1.

Looks reasonable and safe.

OK.

Thanks,
Uros.

> --- gcc/config/i386/i386.md.jj  2016-03-22 09:13:54.0 +0100
> +++ gcc/config/i386/i386.md 2016-03-22 18:45:16.392316554 +0100
> @@ -8141,16 +8141,31 @@
>  (match_operand:DI 1 "nonimmediate_operand" "%0,0,0")
>  (match_operand:DI 2 "x86_64_szext_general_operand" "Z,re,rm")))
> (clobber (reg:CC FLAGS_REG))]
> -  "!TARGET_64BIT && TARGET_STV && TARGET_SSE2 && ix86_binary_operator_ok 
> (AND, DImode, operands)"
> +  "!TARGET_64BIT && TARGET_STV && TARGET_SSE2
> +   && ix86_binary_operator_ok (AND, DImode, operands)"
>"#"
>"&& reload_completed"
> -  [(parallel [(set (match_dup 0)
> -  (and:SI (match_dup 1) (match_dup 2)))
> - (clobber (reg:CC FLAGS_REG))])
> -   (parallel [(set (match_dup 3)
> -  (and:SI (match_dup 4) (match_dup 5)))
> - (clobber (reg:CC FLAGS_REG))])]
> -  "split_double_mode (DImode, &operands[0], 3, &operands[0], &operands[3]);")
> +  [(const_int 0)]
> +{
> +  split_double_mode (DImode, &operands[0], 3, &operands[0], &operands[3]);
> +  if (operands[2] == const0_rtx)
> +{
> +  operands[1] = const0_rtx;
> +  ix86_expand_move (SImode, &operands[0]);
> +}
> +  else if (operands[2] != constm1_rtx)
> +emit_insn (gen_andsi3 (operands[0], operands[1], operands[2]));
> +  else if (operands[5] == constm1_rtx)
> +emit_note (NOTE_INSN_DELETED);
> +  if (operands[5] == const0_rtx)
> +{
> +  operands[4] = const0_rtx;
> +  ix86_expand_move (SImode, &operands[3]);
> +}
> +  else if (operands[5] != constm1_rtx)
> +emit_insn (gen_andsi3 (operands[3], operands[4], operands[5]));
> +  DONE;
> +})
>
>  (define_insn "*andsi_1"
>[(set (match_operand:SI 0 "nonimmediate_operand" "=rm,r,Ya,!k")
> @@ -8665,16 +8680,41 @@
>  (match_operand:DI 1 "nonimmediate_operand" "%0,0,0")
>  (match_operand:DI 2 "x86_64_szext_general_operand" "Z,re,rm")))
> (clobber (reg:CC FLAGS_REG))]
> -  "!TARGET_64BIT && TARGET_STV && TARGET_SSE2 && ix86_binary_operator_ok 
> (, DImode, operands)"
> +  "!TARGET_64BIT && TARGET_STV && TARGET_SSE2
> +   && ix86_binary_operator_ok (, DImode, operands)"
>"#"
>"&& reload_completed"
> -  [(parallel [(set (match_dup 0)
> -  (any_or:SI (match_dup 1) (match_dup 2)))
> - (clobber (reg:CC FLAGS_REG))])
> -   (parallel [(set (match_dup 3)
> -  (any_or:SI (match_dup 4) (match_dup 5)))
> - (clobber (reg:CC FLAGS_REG))])]
> -  "split_double_mode (DImode, &operands[0], 3, &operands[0], &operands[3]);")
> +  [(const_int 0)]
> +{
> +  split_double_mode (DImode, &operands[0], 3, &operands[0], &operands[3]);
> +  if (operands[2] == constm1_rtx)
> +{
> +  if ( == IOR)
> +   {
> + operands[1] = constm1_rtx;
> + ix86_expand_move (SImode, &operands[0]);
> +   }
> +  else
> +   ix86_expand_unary_operator (NOT, SImode, &operands[0]);
> +}
> +  else if (operands[2] != const0_rtx)
> +ix86_expand_binary_operator (, SImode, &operands[0]);
> +  else if (operands[5] == const0_rtx)
> +emit_note (NOTE_INSN_DELETED);
> +  if (operands[5] == constm1_rtx)
> +{
> +  if ( == IOR)
> +   {
> + operands[4] = constm1_rtx;
> + ix86_expand_move (SImode, &operands[3]);
> +   }
> +  else
> +   ix86_expand_unary_operator (NOT, SImode, &operands[3]);
> +}
> +  else if (operands[5] != const0_rtx)
> +ix86_expand_binary_operator (, SImode, &operands[3]);
> +  DONE;
> +})
>
>  (define_insn_and_split "*andndi3_doubleword"
>[(set (match_operand:DI 0 "register_operand" "=r,r")
>
> Jakub


[RFA][PATCH][tree-optimization/64058] Add new coalescing tie breaker heuristic V2

2016-03-23 Thread Jeff Law


This patch contains two parts.  First is a bit of raw infrastructure in 
bitmap.c.  That change factors out the code to count the bits set in a 
given BITMAP_WORD and uses that new function from bitmap_count_bits.


It also introduces bitmap_count_unique_bits which counts the unique bits 
set across two bitmaps.



The second patch introduces the new tiebreaker in the coalescer and 
utilizes the new bitmap_count_unique_bits in the process.  The major 
change since V1 is lazily counting those bits only when the major sort 
keys are the same for two coalesce_pairs.  And we include the data for 
the new heuristic in the debugging dumps.


As I feared building a good testcase for this is a major PITA.  Testing 
for a specific set of coalescings is not easy at all and IMHO is going 
to be way too fragile.


I did verify by hand that the test in 64058 and its duplicate were fixed 
by various iterations of this patch using trunk compilers reported in 
those BZs.  Spot checking other tests showed a small, but consistent 
improvement in the generated code (fewer copies).


Bootstrapped and regression tested on x86-64-linux-gnu.  OK for the 
trunk now?


Jeff

commit 716d40d9b853795f007476936f2ac06851d201e0
Author: Jeff Law 
Date:   Wed Mar 16 16:56:03 2016 -0400

PR tree-optimization/64058
* bitmap.c (bitmap_count_unique_bits): New function.
(bitmap_count_bits_in_word): New function, extracted from
bitmap_count_bits.
(bitmap_count_bits): Use bitmap_count_bits_in_word.
* bitmap.h (bitmap_count_unique_bits): Declare it.

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 6f52c2d..05389b6 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,5 +1,12 @@
 2016-03-18  Jeff Law  
 
+   PR tree-optimization/64058
+   * bitmap.c (bitmap_count_unique_bits): New function.
+   (bitmap_count_bits_in_word): New function, extracted from
+   bitmap_count_bits.
+   (bitmap_count_bits): Use bitmap_count_bits_in_word.
+   * bitmap.h (bitmap_count_unique_bits): Declare it.
+
PR rtl-optimization/70263
* ira.c (memref_used_between_p): Assert we found END in the insn chain.
(update_equiv_regs): When trying to move a store to after the insn
diff --git a/gcc/bitmap.c b/gcc/bitmap.c
index ac20ae5..0c05512 100644
--- a/gcc/bitmap.c
+++ b/gcc/bitmap.c
@@ -662,6 +662,26 @@ bitmap_popcount (BITMAP_WORD a)
   return ret;
 }
 #endif
+
+/* Count and return the number of bits set in the bitmap word BITS.  */
+static unsigned long
+bitmap_count_bits_in_word (const BITMAP_WORD *bits)
+{
+  unsigned long count = 0;
+
+  for (unsigned ix = 0; ix != BITMAP_ELEMENT_WORDS; ix++)
+{
+#if GCC_VERSION >= 3400
+  /* Note that popcountl matches BITMAP_WORD in type, so the actual size
+of BITMAP_WORD is not material.  */
+  count += __builtin_popcountl (bits[ix]);
+#else
+  count += bitmap_popcount (bits[ix]);
+#endif
+}
+  return count;
+}
+
 /* Count the number of bits set in the bitmap, and return it.  */
 
 unsigned long
@@ -669,19 +689,44 @@ bitmap_count_bits (const_bitmap a)
 {
   unsigned long count = 0;
   const bitmap_element *elt;
-  unsigned ix;
 
   for (elt = a->first; elt; elt = elt->next)
+count += bitmap_count_bits_in_word (elt->bits);
+
+  return count;
+}
+
+/* Count the number of unique bits set in A and B and return it.  */
+
+unsigned long
+bitmap_count_unique_bits (const_bitmap a, const_bitmap b)
+{
+  unsigned long count = 0;
+  const bitmap_element *elt_a, *elt_b;
+
+  for (elt_a = a->first, elt_b = b->first; elt_a && elt_b; )
 {
-  for (ix = 0; ix != BITMAP_ELEMENT_WORDS; ix++)
+  /* If we're at different indices, then count all the bits
+in the lower element.  If we're at the same index, then
+count the bits in the IOR of the two elements.  */
+  if (elt_a->indx < elt_b->indx)
{
-#if GCC_VERSION >= 3400
- /* Note that popcountl matches BITMAP_WORD in type, so the actual size
-of BITMAP_WORD is not material.  */
- count += __builtin_popcountl (elt->bits[ix]);
-#else
- count += bitmap_popcount (elt->bits[ix]);
-#endif
+ count += bitmap_count_bits_in_word (elt_a->bits);
+ elt_a = elt_a->next;
+   }
+  else if (elt_b->indx < elt_a->indx)
+   {
+ count += bitmap_count_bits_in_word (elt_b->bits);
+ elt_b = elt_b->next;
+   }
+  else
+   {
+ BITMAP_WORD bits[BITMAP_ELEMENT_WORDS];
+ for (unsigned ix = 0; ix != BITMAP_ELEMENT_WORDS; ix++)
+   bits[ix] = elt_a->bits[ix] | elt_b->bits[ix];
+ count += bitmap_count_bits_in_word (bits);
+ elt_a = elt_a->next;
+ elt_b = elt_b->next;
}
 }
   return count;
diff --git a/gcc/bitmap.h b/gcc/bitmap.h
index 805e37e..1115711 100644
--- a/gcc/bitmap.h
+++ b/gcc/bitmap.h
@@ -280,6 +280,9 @@ extern bool bitmap_single_bit_set_p (const_bitmap);
 /* Count the number of bits set in the bitmap.  */
 extern

Re: [gomp4] Fix handling of subarrays with update directive

2016-03-23 Thread Thomas Schwinge
Hi!

On Fri, 22 Jan 2016 12:50:38 -0600, James Norris  
wrote:
> The attached patch fixes a defect reported with gcc 5.2
> (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69414).
> It is also the case, the issue is present in the gomp4
> branch. The patch also adds additional testing.
> 
> Committed to gomp4 after bootstrap and regtesting.

Jakub, is that following two-line patch (libgomp/oacc-mem.c, plus test
cases) OK for trunk?

> --- ChangeLog.gomp(revision 232740)
> +++ ChangeLog.gomp(revision 232741)
> @@ -1,3 +1,11 @@
> +2016-01-22  James Norris  
> +
> + * oacc-mem.c (delete_copyout, update_dev_host): Fix device address.

Given that Daichi Fukuoka submitted the patch in
, please acknowledge that in the ChangeLog.
Please also include the test case submitted there; rename their file from
gcc1.f90 to pr69414.f90 (or similar).

> --- oacc-mem.c(revision 232740)
> +++ oacc-mem.c(revision 232741)
> @@ -509,7 +509,7 @@
>gomp_fatal ("[%p,%d] is not mapped", (void *)h, (int)s);
>  }
>  
> -  d = (void *) (n->tgt->tgt_start + n->tgt_offset);
> +  d = (void *) (n->tgt->tgt_start + n->tgt_offset + h - n->host_start);
>  
>host_size = n->host_end - n->host_start;
>  
> @@ -562,7 +562,7 @@
>gomp_fatal ("[%p,%d] is not mapped", h, (int)s);
>  }
>  
> -  d = (void *) (n->tgt->tgt_start + n->tgt_offset);
> +  d = (void *) (n->tgt->tgt_start + n->tgt_offset + h - n->host_start);
>  
>gomp_mutex_unlock (&acc_dev->lock);
>  

> --- testsuite/libgomp.oacc-fortran/update-1-2.f90 (revision 0)
> +++ testsuite/libgomp.oacc-fortran/update-1-2.f90 (revision 232741)
> @@ -0,0 +1,239 @@
> +! Copy of update-1.f90 with self exchanged with host for !$acc update

Actually, that file contains additional changes.

Anyway, I suggest we don't add new test cases with just self vs. host
clauses exchanged -- their equivalence is something that should (already)
be tested in GCC front end test cases.  I know that we currently have a
distinct libgomp.oacc-c-c++-common/update-1-2.c test case as a "copy of
update-1.c with self exchanged with host for #pragma acc update", once
added by me; I suggest we remove that.  Let's just in the same file use a
mixture of host and self clauses.  (I'll take care of that.)


Grüße
 Thomas


Re: [PATCH] Adjust PR70251 fix

2016-03-23 Thread Richard Biener
On Tue, 22 Mar 2016, Marc Glisse wrote:

> On Tue, 22 Mar 2016, Richard Biener wrote:
> 
> > On March 22, 2016 4:55:13 PM GMT+01:00, Marc Glisse 
> > wrote:
> > > On Tue, 22 Mar 2016, Richard Biener wrote:
> > > 
> > > > 
> > > > This adjusts the PR70251 fix as discussed in the PR audit trail
> > > > and fixes a bug in genmatch required (bah, stupid GENERIC comparisons
> > > in
> > > > GIMPLE operands...).
> > > 
> > > Thanks !
> > > 
> > > Hmm, the transformation is still disabled on AVX512:
> > > 
> > > > ! /* A + (B vcmp C ? 1 : 0) -> A - (B vcmp C ? -1 : 0), since vector
> > > comparisons
> > > > !return all -1 or all 0 results.  */
> > > >  /* ??? We could instead convert all instances of the vec_cond to
> > > negate,
> > > > but that isn't necessarily a win on its own.  */
> > > >  (simplify
> > > > !  (plus:c @3 (view_convert? (vec_cond:s @0 integer_each_onep@1
> > > integer_zerop@2)))
> > > >   (if (VECTOR_TYPE_P (type)
> > > >&& TYPE_VECTOR_SUBPARTS (type) == TYPE_VECTOR_SUBPARTS
> > > (TREE_TYPE (@0))
> > > >&& (TYPE_MODE (TREE_TYPE (type))
> > > >== TYPE_MODE (TREE_TYPE (TREE_TYPE (@0)
> > > > !   (minus @3 (view_convert (vec_cond @0 (negate @1) @2)
> > > 
> > > It seems that the references to @0 in the "if" should use @1 instead
> > > (at
> > > least the last one). I assume this test is to make sure that A has as
> > > many
> > > integer elements of the same size as the result of the vec_cond_expr.
> > 
> > It looks like that is always guaranteed by the input form and instead these
> > are now useless checks which were guarding the original view-convert
> > transform.
> 
> The input form has a view_convert_expr in it, so I don't see what prevents
> from arriving here with
> 
> v4df + view_convert((v8si < v8si) ? v8si : v8si)
> 
> for instance. That seems to indicate that some test is still needed, it is
> just better on the second or third argument of the vec_cond_expr than on the
> condition.

Hmm, indeed.

> Or maybe you mean we could drop the view_convert_expr from the input form. It
> should have been sunk in the 2 constant arguments of the vec_cond_expr anyway
> (I didn't check if that really happens). That sounds good.

I think it originally was intended to cover sign changes that the
vectorizer generates eventually.  Those may still happen but yes,
we could sink them into the vec_cond.  Leaving this (and the
consideration on whether a ? -1 : 0 should be transformed to
view_convert ) to GCC 7 time.

Bootstrapping / testing the following now.

Thanks for noticing,
Richard.

2016-03-23  Richard Biener  

PR middle-end/70251
* match.pd (A + (B vcmp C ? 1 : 0) -> A - (B vcmp C)): Adjust
mode compatibility check.
(A - (B vcmp C ? 1 : 0) -> A + (B vcmp C)): Likewise.

Index: gcc/match.pd
===
--- gcc/match.pd(revision 234415)
+++ gcc/match.pd(working copy)
@@ -1759,18 +1759,18 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 (simplify
  (plus:c @3 (view_convert? (vec_cond:s @0 integer_each_onep@1 
integer_zerop@2)))
  (if (VECTOR_TYPE_P (type)
-  && TYPE_VECTOR_SUBPARTS (type) == TYPE_VECTOR_SUBPARTS (TREE_TYPE (@0))
+  && TYPE_VECTOR_SUBPARTS (type) == TYPE_VECTOR_SUBPARTS (TREE_TYPE (@1))
   && (TYPE_MODE (TREE_TYPE (type))
-  == TYPE_MODE (TREE_TYPE (TREE_TYPE (@0)
+  == TYPE_MODE (TREE_TYPE (TREE_TYPE (@1)
   (minus @3 (view_convert (vec_cond @0 (negate @1) @2)
 
 /* ... likewise A - (B vcmp C ? 1 : 0) -> A + (B vcmp C ? -1 : 0).  */
 (simplify
  (minus @3 (view_convert? (vec_cond:s @0 integer_each_onep@1 integer_zerop@2)))
  (if (VECTOR_TYPE_P (type)
-  && TYPE_VECTOR_SUBPARTS (type) == TYPE_VECTOR_SUBPARTS (TREE_TYPE (@0))
+  && TYPE_VECTOR_SUBPARTS (type) == TYPE_VECTOR_SUBPARTS (TREE_TYPE (@1))
   && (TYPE_MODE (TREE_TYPE (type))
-  == TYPE_MODE (TREE_TYPE (TREE_TYPE (@0)
+  == TYPE_MODE (TREE_TYPE (TREE_TYPE (@1)
   (plus @3 (view_convert (vec_cond @0 (negate @1) @2)
 
 


Re: [PATCH] Fix *vector_shift_pattern (PR tree-optimization/70354)

2016-03-23 Thread Richard Biener
On Tue, 22 Mar 2016, Jakub Jelinek wrote:

> Hi!
> 
> As the testcase shows, the C/C++ FEs narrow the shift counters from whatever
> type they had originally to unsigned int (previously signed int).
> Then the vect-patterns code, to be able to use vector by vector shifts
> attempts to narrow or widen them again to the right type.  If there is
> already a cast from the right precision, it just uses the rhs1 of that case,
> otherwise it adds a cast to the pattern, which performs the needed widening
> or narrowing.
> 
> Unfortunately, we have information loss during optimizations, we don't know
> anymore if it was say:
> long a[64], b[64];
> void foo (void)
> {
>   for (int i = 0; i < 64; i++)
> a[i] <<= b[i]; // Here we don't need any masking, it would be UB
>  // if b isn't in range
> }
> void bar (void)
> {
>   for (int i = 0; i < 64; i++)
> a[i] <<= (unsigned) b[i]; // But here we can't just use b[i] as the
> // shift count, because the upper bits are to be 
> masked off.
> }

True.  I suppose as signed ops with well-defined overflow would come
in useful sometimes here we'd like to have a conversion with
undefined overflow behavior to capture this fact in the IL...

> void baz (void)
> {
>   for (int i = 0; i < 64; i++)
> a[i] <<= b[i] - 0x72ULL; // And here the optimizers will likely
>// optimize away the subtraction, because 
> there
>// is implicit cast to (unsigned int).  We 
> need
>// to mask instead of using b[i] directly.
> }
> But, not casting say long long shift counters to unsigned int would penalize
> other code, computing unneeded operations.  So I'm afraid we want the
> following fix, which I've bootstrapped/regtested on x86_64-linux and
> i686-linux.  For short/char shifts we don't need this of course.
> 
> Ok for trunk?

Ok.

Thanks,
Richard.

> 2016-03-22  Jakub Jelinek  
> 
>   PR tree-optimization/70354
>   * tree-vect-patterns.c (vect_recog_vector_vector_shift_pattern): If
>   oprnd0 is wider than oprnd1 and there is a cast from the wider
>   type to oprnd1, mask it with the mask of the narrower type.
> 
>   * gcc.dg/vect/pr70354-1.c: New test.
>   * gcc.dg/vect/pr70354-2.c: New test.
>   * gcc.target/i386/avx2-pr70354-1.c: New test.
>   * gcc.target/i386/avx2-pr70354-2.c: New test.
> 
> --- gcc/tree-vect-patterns.c.jj   2016-03-04 15:42:12.0 +0100
> +++ gcc/tree-vect-patterns.c  2016-03-22 15:28:24.403579426 +0100
> @@ -2097,7 +2097,20 @@ vect_recog_vector_vector_shift_pattern (
>if (TYPE_MODE (TREE_TYPE (rhs1)) == TYPE_MODE (TREE_TYPE (oprnd0))
> && TYPE_PRECISION (TREE_TYPE (rhs1))
>== TYPE_PRECISION (TREE_TYPE (oprnd0)))
> - def = rhs1;
> + {
> +   if (TYPE_PRECISION (TREE_TYPE (oprnd1))
> +   >= TYPE_PRECISION (TREE_TYPE (rhs1)))
> + def = rhs1;
> +   else
> + {
> +   tree mask
> + = build_low_bits_mask (TREE_TYPE (rhs1),
> +TYPE_PRECISION (TREE_TYPE (oprnd1)));
> +   def = vect_recog_temp_ssa_var (TREE_TYPE (rhs1), NULL);
> +   def_stmt = gimple_build_assign (def, BIT_AND_EXPR, rhs1, mask);
> +   new_pattern_def_seq (stmt_vinfo, def_stmt);
> + }
> + }
>  }
>  
>if (def == NULL_TREE)
> --- gcc/testsuite/gcc.dg/vect/pr70354-1.c.jj  2016-03-22 15:36:42.210847707 
> +0100
> +++ gcc/testsuite/gcc.dg/vect/pr70354-1.c 2016-03-22 15:47:45.448878909 
> +0100
> @@ -0,0 +1,50 @@
> +/* PR tree-optimization/70354 */
> +/* { dg-do run } */
> +
> +#ifndef main
> +#include "tree-vect.h"
> +#endif
> +
> +long long int b[64], c[64], g[64];
> +unsigned long long int a[64], d[64], e[64], f[64], h[64];
> +
> +__attribute__ ((noinline, noclone)) void
> +foo (void)
> +{
> +  int i;
> +  for (i = 0; i < 64; i++)
> +{
> +  d[i] = h[i] << (unsigned long long int) b[i] * e[i])
> + << (-a[i] - 3752448776177690134ULL))
> +- 8214565720323784703ULL) - 1ULL);
> +  e[i] = (_Bool) (f[i] + (unsigned long long int) g[i]);
> +  g[i] = c[i];
> +}
> +}
> +
> +int
> +main ()
> +{
> +  int i;
> +#ifndef main
> +  check_vect ();
> +#endif
> +  if (__CHAR_BIT__ != 8 || sizeof (long long int) != 8)
> +return 0;
> +  for (i = 0; i < 64; ++i)
> +{
> +  a[i] = 14694295297531861425ULL;
> +  b[i] = -1725558902283030715LL;
> +  c[i] = 4402992416302558097LL;
> +  e[i] = 6297173129107286501ULL;
> +  f[i] = 13865724171235650855ULL;
> +  g[i] = 982871027473857427LL;
> +  h[i] = 8193845517487445944ULL;
> +}
> +  foo ();
> +  for (i = 0; i < 64; i++)
> +if (d[i] != 8193845517487445944ULL || e[i] != 1
> + || g[i] != 4402992416302558097ULL)
> +  abort ();
> +  return 0;
> +}
> --- gcc/testsuite/gcc.dg/vect/pr70354-2.c.jj  2016-03-22 15:36:45.

Re: [PATCH] Fix 69845

2016-03-23 Thread Richard Biener
On Tue, Mar 22, 2016 at 6:40 PM, Richard Henderson  wrote:
> In PR68142 you added a check for overflow + __INT_MIN__.
> I can't figure out why the check for __INT_MIN__, except
> that it seems specific to the test case you examined.
>
> And indeed, this test case shows how things go wrong
> with other distributed folding leading to overflow.

Huh, not sure what I was thinking .. but I remember being on
a hunt through various INT_MIN related overflow bugs when
running into this one.

> I added two tests, one signed, one unsigned.  The second
> verifies that we do still fold for the defined-overflow case.
>
> Ok?

Ok.

Note that always when I find bugs in extract_muldiv and try
to decipher what it does I think we need to rip that out,
replacing it with some simple patterns and leaving the rest
to passes like reassoc.  It's simply a beast that proved to
be a can of worms...

Thanks,
Richard.

>
> r~


Re: [RFA][PATCH][tree-optimization/64058] Add new coalescing tie breaker heuristic V2

2016-03-23 Thread Richard Biener
On Wed, Mar 23, 2016 at 8:49 AM, Jeff Law  wrote:
>
> This patch contains two parts.  First is a bit of raw infrastructure in
> bitmap.c.  That change factors out the code to count the bits set in a given
> BITMAP_WORD and uses that new function from bitmap_count_bits.
>
> It also introduces bitmap_count_unique_bits which counts the unique bits set
> across two bitmaps.
>
>
> The second patch introduces the new tiebreaker in the coalescer and utilizes
> the new bitmap_count_unique_bits in the process.  The major change since V1
> is lazily counting those bits only when the major sort keys are the same for
> two coalesce_pairs.  And we include the data for the new heuristic in the
> debugging dumps.
>
> As I feared building a good testcase for this is a major PITA.  Testing for
> a specific set of coalescings is not easy at all and IMHO is going to be way
> too fragile.
>
> I did verify by hand that the test in 64058 and its duplicate were fixed by
> various iterations of this patch using trunk compilers reported in those
> BZs.  Spot checking other tests showed a small, but consistent improvement
> in the generated code (fewer copies).
>
> Bootstrapped and regression tested on x86-64-linux-gnu.  OK for the trunk
> now?

+ssa_conflicts *conflicts_;
+var_map map_;

please make those 'static'.  Ok with this change.

Thanks,
Richard.

> Jeff
>
>
> commit 716d40d9b853795f007476936f2ac06851d201e0
> Author: Jeff Law 
> Date:   Wed Mar 16 16:56:03 2016 -0400
>
> PR tree-optimization/64058
> * bitmap.c (bitmap_count_unique_bits): New function.
> (bitmap_count_bits_in_word): New function, extracted from
> bitmap_count_bits.
> (bitmap_count_bits): Use bitmap_count_bits_in_word.
> * bitmap.h (bitmap_count_unique_bits): Declare it.
>
> diff --git a/gcc/ChangeLog b/gcc/ChangeLog
> index 6f52c2d..05389b6 100644
> --- a/gcc/ChangeLog
> +++ b/gcc/ChangeLog
> @@ -1,5 +1,12 @@
>  2016-03-18  Jeff Law  
>
> +   PR tree-optimization/64058
> +   * bitmap.c (bitmap_count_unique_bits): New function.
> +   (bitmap_count_bits_in_word): New function, extracted from
> +   bitmap_count_bits.
> +   (bitmap_count_bits): Use bitmap_count_bits_in_word.
> +   * bitmap.h (bitmap_count_unique_bits): Declare it.
> +
> PR rtl-optimization/70263
> * ira.c (memref_used_between_p): Assert we found END in the insn
> chain.
> (update_equiv_regs): When trying to move a store to after the insn
> diff --git a/gcc/bitmap.c b/gcc/bitmap.c
> index ac20ae5..0c05512 100644
> --- a/gcc/bitmap.c
> +++ b/gcc/bitmap.c
> @@ -662,6 +662,26 @@ bitmap_popcount (BITMAP_WORD a)
>return ret;
>  }
>  #endif
> +
> +/* Count and return the number of bits set in the bitmap word BITS.  */
> +static unsigned long
> +bitmap_count_bits_in_word (const BITMAP_WORD *bits)
> +{
> +  unsigned long count = 0;
> +
> +  for (unsigned ix = 0; ix != BITMAP_ELEMENT_WORDS; ix++)
> +{
> +#if GCC_VERSION >= 3400
> +  /* Note that popcountl matches BITMAP_WORD in type, so the actual
> size
> +of BITMAP_WORD is not material.  */
> +  count += __builtin_popcountl (bits[ix]);
> +#else
> +  count += bitmap_popcount (bits[ix]);
> +#endif
> +}
> +  return count;
> +}
> +
>  /* Count the number of bits set in the bitmap, and return it.  */
>
>  unsigned long
> @@ -669,19 +689,44 @@ bitmap_count_bits (const_bitmap a)
>  {
>unsigned long count = 0;
>const bitmap_element *elt;
> -  unsigned ix;
>
>for (elt = a->first; elt; elt = elt->next)
> +count += bitmap_count_bits_in_word (elt->bits);
> +
> +  return count;
> +}
> +
> +/* Count the number of unique bits set in A and B and return it.  */
> +
> +unsigned long
> +bitmap_count_unique_bits (const_bitmap a, const_bitmap b)
> +{
> +  unsigned long count = 0;
> +  const bitmap_element *elt_a, *elt_b;
> +
> +  for (elt_a = a->first, elt_b = b->first; elt_a && elt_b; )
>  {
> -  for (ix = 0; ix != BITMAP_ELEMENT_WORDS; ix++)
> +  /* If we're at different indices, then count all the bits
> +in the lower element.  If we're at the same index, then
> +count the bits in the IOR of the two elements.  */
> +  if (elt_a->indx < elt_b->indx)
> {
> -#if GCC_VERSION >= 3400
> - /* Note that popcountl matches BITMAP_WORD in type, so the actual
> size
> -of BITMAP_WORD is not material.  */
> - count += __builtin_popcountl (elt->bits[ix]);
> -#else
> - count += bitmap_popcount (elt->bits[ix]);
> -#endif
> + count += bitmap_count_bits_in_word (elt_a->bits);
> + elt_a = elt_a->next;
> +   }
> +  else if (elt_b->indx < elt_a->indx)
> +   {
> + count += bitmap_count_bits_in_word (elt_b->bits);
> + elt_b = elt_b->next;
> +   }
> +  else
> +   {
> + BITMAP_WORD bits[BITMAP_ELEMENT_WORDS];
> + for (unsigned ix = 0; ix != BITMAP_ELEMENT_WORDS; ix++)
> +   bits[ix] = elt_a->bits[ix] | e

Re: [PATCH] Fix 69845

2016-03-23 Thread Jakub Jelinek
On Wed, Mar 23, 2016 at 09:59:56AM +0100, Richard Biener wrote:
> Note that always when I find bugs in extract_muldiv and try
> to decipher what it does I think we need to rip that out,
> replacing it with some simple patterns and leaving the rest
> to passes like reassoc.  It's simply a beast that proved to
> be a can of worms...

That is true, but sadly any optimization removals from extract_muldiv
proved to be huge cans of worms too (e.g. in constexpr handling
or constant initializers).

Jakub


Re: [PATCH] Fix 69845

2016-03-23 Thread Richard Biener
On Wed, Mar 23, 2016 at 10:53 AM, Jakub Jelinek  wrote:
> On Wed, Mar 23, 2016 at 09:59:56AM +0100, Richard Biener wrote:
>> Note that always when I find bugs in extract_muldiv and try
>> to decipher what it does I think we need to rip that out,
>> replacing it with some simple patterns and leaving the rest
>> to passes like reassoc.  It's simply a beast that proved to
>> be a can of worms...
>
> That is true, but sadly any optimization removals from extract_muldiv
> proved to be huge cans of worms too (e.g. in constexpr handling
> or constant initializers).

Yes, I expect it is quite some work to decipher what this beast actually
does and what parts of it do not have an equivalent on the GIMPLE level
as well.  I'm not sure if constexprs or constnat initializers need to handle
arbitrary association for example.  For example

int a, b;
constexpr int c = (((a + b) - a) - b);

doesn't appear to be a valid constexpr.  So for that (and the constant
initializer case)
I don't see the issue really unless it is coping with pointer
arithmetic lowering
the FEs perform and symbolic constants.  And for those cases the FEs should
simply avoid the lowering (and thus the ME have appropriate representation and
lower itself at some point).

Richard.

> Jakub


Re: [PATCH][ARM] Add deprecation warning on pre-v4t architecture revisions

2016-03-23 Thread Kyrill Tkachov

Ping.

Thanks,
Kyrill

On 07/03/16 15:40, Kyrill Tkachov wrote:

Ping.
https://gcc.gnu.org/ml/gcc-patches/2016-03/msg00046.html

Thanks,
Kyrill

On 01/03/16 16:17, Kyrill Tkachov wrote:

Hi all,

For GCC 6 we want to deprecate architecture revisions prior to ARMv4T.
This patch implements this by documenting the deprecation in invoke.texi and 
adding
a warning whenever the user specifies an -march or -mcpu option that selects 
such
an architecture revision.

Bootstrapped and tested on arm.

Ok for trunk?

Thanks,
Kyrill

P.S. I'll add a note to changes.html to that effect separately.

2016-03-01  Kyrylo Tkachov  

* config/arm/arm.c (arm_option_override): Warn on pre-ARMv4T
architecture revisions.
* doc/invoke.texi (ARM Options): Add note on deprecation of pre-ARMv4T
architecture revisions.

2016-03-01  Kyrylo Tkachov  

* gcc.target/arm/ftest-armv4-arm.c: Add dg-warning for deprecation
warning.
* gcc.target/arm/pr62554.c: Likewise.
* gcc.target/arm/pr69610-1.c: Likewise.
* gcc.target/arm/pr69610-2.c: Likewise.






Re: [PATCH][ARM] PR driver/70132: Avoid double fclose in driver-arm.c

2016-03-23 Thread Kyrill Tkachov


On 14/03/16 12:04, Bernd Schmidt wrote:

On 03/11/2016 04:32 PM, Kyrill Tkachov wrote:

 PR driver/70132
 * config/arm/driver-arm.c (host_detect_local_cpu): Set file pointer
 to NULL after closing file.


Doesn't match the patch. Either variant is fine but please use the right 
combination :)




Sorry, that ChangeLog was from a earlier revision of the patch.
The correct one is:

2016-03-23  Kyrylo Tkachov  

PR driver/70132
* config/arm/driver-arm.c (host_detect_local_cpu): Reorder exit logic
to not call fclose twice on file.


Committed with r234419.

Thanks,
Kyrill


Bernd





Re: [gomp4] Fix handling of subarrays with update directive

2016-03-23 Thread Jakub Jelinek
On Wed, Mar 23, 2016 at 09:35:30AM +0100, Thomas Schwinge wrote:
> Hi!
> 
> On Fri, 22 Jan 2016 12:50:38 -0600, James Norris  
> wrote:
> > The attached patch fixes a defect reported with gcc 5.2
> > (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69414).
> > It is also the case, the issue is present in the gomp4
> > branch. The patch also adds additional testing.
> > 
> > Committed to gomp4 after bootstrap and regtesting.
> 
> Jakub, is that following two-line patch (libgomp/oacc-mem.c, plus test
> cases) OK for trunk?
> 
> > --- ChangeLog.gomp  (revision 232740)
> > +++ ChangeLog.gomp  (revision 232741)
> > @@ -1,3 +1,11 @@
> > +2016-01-22  James Norris  
> > +
> > +   * oacc-mem.c (delete_copyout, update_dev_host): Fix device address.
> 
> Given that Daichi Fukuoka submitted the patch in
> , please acknowledge that in the ChangeLog.
> Please also include the test case submitted there; rename their file from
> gcc1.f90 to pr69414.f90 (or similar).

Yeah, the oacc-mem.c is a two liner, we don't need copyright assignment for
that, so please mention Daichi-san's name + mail address, and PR libgpmp/69414
in the ChangeLog entry.

> > --- oacc-mem.c  (revision 232740)
> > +++ oacc-mem.c  (revision 232741)
> > @@ -509,7 +509,7 @@
> >gomp_fatal ("[%p,%d] is not mapped", (void *)h, (int)s);
> >  }
> >  
> > -  d = (void *) (n->tgt->tgt_start + n->tgt_offset);
> > +  d = (void *) (n->tgt->tgt_start + n->tgt_offset + h - n->host_start);

I'd prefer (uintptr_t) h instead of, and you probably need to wrap it then.

> >host_size = n->host_end - n->host_start;
> >  
> > @@ -562,7 +562,7 @@
> >gomp_fatal ("[%p,%d] is not mapped", h, (int)s);
> >  }
> >  
> > -  d = (void *) (n->tgt->tgt_start + n->tgt_offset);
> > +  d = (void *) (n->tgt->tgt_start + n->tgt_offset + h - n->host_start);

Ditto.

> > --- testsuite/libgomp.oacc-fortran/update-1-2.f90   (revision 0)
> > +++ testsuite/libgomp.oacc-fortran/update-1-2.f90   (revision 232741)
> > @@ -0,0 +1,239 @@
> > +! Copy of update-1.f90 with self exchanged with host for !$acc update
> 
> Actually, that file contains additional changes.
> 
> Anyway, I suggest we don't add new test cases with just self vs. host
> clauses exchanged -- their equivalence is something that should (already)
> be tested in GCC front end test cases.  I know that we currently have a
> distinct libgomp.oacc-c-c++-common/update-1-2.c test case as a "copy of
> update-1.c with self exchanged with host for #pragma acc update", once
> added by me; I suggest we remove that.  Let's just in the same file use a
> mixture of host and self clauses.  (I'll take care of that.)

Otherwise LGTM, but please repost it with all the testcase changes you want
to make.

Jakub


RE: [gomp4] Fix handling of subarrays with update directive

2016-03-23 Thread Fukuoka Daichi
Hi,

> Yeah, the oacc-mem.c is a two liner, we don't need copyright assignment for 
> that, so please mention Daichi-san's name + mail address, and PR 
> libgpmp/69414 in the ChangeLog entry.

If possible, could you add " SGI Japan, Ltd." before/after my name?
Because I am belonging to the company.
Thank you.

Regards,
Daichi Fukuoka

-Original Message-
From: Jakub Jelinek [mailto:ja...@redhat.com] 
Sent: Wednesday, March 23, 2016 7:24 PM
To: Thomas Schwinge
Cc: James Norris; GCC Patches; Fukuoka Daichi
Subject: Re: [gomp4] Fix handling of subarrays with update directive

On Wed, Mar 23, 2016 at 09:35:30AM +0100, Thomas Schwinge wrote:
> Hi!
> 
> On Fri, 22 Jan 2016 12:50:38 -0600, James Norris  
> wrote:
> > The attached patch fixes a defect reported with gcc 5.2 
> > (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69414).
> > It is also the case, the issue is present in the gomp4 branch. The 
> > patch also adds additional testing.
> > 
> > Committed to gomp4 after bootstrap and regtesting.
> 
> Jakub, is that following two-line patch (libgomp/oacc-mem.c, plus test
> cases) OK for trunk?
> 
> > --- ChangeLog.gomp  (revision 232740)
> > +++ ChangeLog.gomp  (revision 232741)
> > @@ -1,3 +1,11 @@
> > +2016-01-22  James Norris  
> > +
> > +   * oacc-mem.c (delete_copyout, update_dev_host): Fix device address.
> 
> Given that Daichi Fukuoka submitted the patch in 
> , please acknowledge that in the ChangeLog.
> Please also include the test case submitted there; rename their file 
> from
> gcc1.f90 to pr69414.f90 (or similar).

Yeah, the oacc-mem.c is a two liner, we don't need copyright assignment for 
that, so please mention Daichi-san's name + mail address, and PR libgpmp/69414 
in the ChangeLog entry.

> > --- oacc-mem.c  (revision 232740)
> > +++ oacc-mem.c  (revision 232741)
> > @@ -509,7 +509,7 @@
> >gomp_fatal ("[%p,%d] is not mapped", (void *)h, (int)s);
> >  }
> >  
> > -  d = (void *) (n->tgt->tgt_start + n->tgt_offset);
> > +  d = (void *) (n->tgt->tgt_start + n->tgt_offset + h - 
> > + n->host_start);

I'd prefer (uintptr_t) h instead of, and you probably need to wrap it then.

> >host_size = n->host_end - n->host_start;
> >  
> > @@ -562,7 +562,7 @@
> >gomp_fatal ("[%p,%d] is not mapped", h, (int)s);
> >  }
> >  
> > -  d = (void *) (n->tgt->tgt_start + n->tgt_offset);
> > +  d = (void *) (n->tgt->tgt_start + n->tgt_offset + h - 
> > + n->host_start);

Ditto.

> > --- testsuite/libgomp.oacc-fortran/update-1-2.f90   (revision 0)
> > +++ testsuite/libgomp.oacc-fortran/update-1-2.f90   (revision 232741)
> > @@ -0,0 +1,239 @@
> > +! Copy of update-1.f90 with self exchanged with host for !$acc 
> > +update
> 
> Actually, that file contains additional changes.
> 
> Anyway, I suggest we don't add new test cases with just self vs. host 
> clauses exchanged -- their equivalence is something that should 
> (already) be tested in GCC front end test cases.  I know that we 
> currently have a distinct libgomp.oacc-c-c++-common/update-1-2.c test 
> case as a "copy of update-1.c with self exchanged with host for 
> #pragma acc update", once added by me; I suggest we remove that.  
> Let's just in the same file use a mixture of host and self clauses.  
> (I'll take care of that.)

Otherwise LGTM, but please repost it with all the testcase changes you want to 
make.

Jakub


Re: [PATCH][ARM] Split out armv7ve effective target check

2016-03-23 Thread Kyrill Tkachov


On 16/03/16 15:54, Ramana Radhakrishnan wrote:

On Wed, Mar 2, 2016 at 1:32 PM, Kyrill Tkachov
 wrote:

Hi all,

I'm seeing the fails:
FAIL: gcc.target/arm/atomic_loaddi_2.c scan-assembler-times ldrd\tr[0-9]+,
r[0-9]+, \\[r[0-9]+\\] 1
FAIL: gcc.target/arm/atomic_loaddi_5.c scan-assembler-times ldrd\tr[0-9]+,
r[0-9]+, \\[r[0-9]+\\] 1
FAIL: gcc.target/arm/atomic_loaddi_8.c scan-assembler-times ldrd\tr[0-9]+,
r[0-9]+, \\[r[0-9]+\\] 1

when testing an arm multilib with /-march=armv7-a.

The tests have an effective target check for armv7ve but it doesn't work
because
under the hood the check is the same as for armv7-a, that is it checks for
the __ARM_ARCH_7A__
predefine which is set for both march values.

To check for armv7ve using predefines we need to check for both
__ARM_ARCH_7A__ and for the hardware
integer division predefine, making armv7ve special.

So this patch separates the effective target check definition from the rest
of the architectures
and defines it appropriately.

With this patch the aforementioned tests appear UNSUPPORTED when testing the
/-march=armv7-a multilib.

Ok for trunk?

Ok, but please follow up with updating sourcebuild.texi.


sourcebuild.texi shouldn't need any updating as I'm not adding a new effective
target check, I'm just fixing the way one of the already existing ones is 
defined.

Committed with r234420.

Thanks,
Kyrill



Ramana


Thanks,
Kyrill

2016-03-02  Kyrylo Tkachov  

 * lib/target-supports.exp: Remove v7ve entry from loop
 creating effective target checks.
 (check_effective_target_arm_arch_v7ve_ok): New procedure.
 (add_options_for_arm_arch_v7ve): Likewise.




Re: [gomp4] Fix handling of subarrays with update directive

2016-03-23 Thread Jakub Jelinek
On Wed, Mar 23, 2016 at 10:31:33AM +, Fukuoka Daichi wrote:
> > Yeah, the oacc-mem.c is a two liner, we don't need copyright assignment for 
> > that, so please mention Daichi-san's name + mail address, and PR 
> > libgpmp/69414 in the ChangeLog entry.
> 
> If possible, could you add " SGI Japan, Ltd." before/after my name?
> Because I am belonging to the company.

The ChangeLog format is  
thus company names aren't listed explicitly, but your email is @sgi.com, so
it will be there.

Jakub


Re: [PATCH][ARM] Split out armv7ve effective target check

2016-03-23 Thread Kyrill Tkachov


On 23/03/16 10:33, Kyrill Tkachov wrote:


On 16/03/16 15:54, Ramana Radhakrishnan wrote:

On Wed, Mar 2, 2016 at 1:32 PM, Kyrill Tkachov
 wrote:

Hi all,

I'm seeing the fails:
FAIL: gcc.target/arm/atomic_loaddi_2.c scan-assembler-times ldrd\tr[0-9]+,
r[0-9]+, \\[r[0-9]+\\] 1
FAIL: gcc.target/arm/atomic_loaddi_5.c scan-assembler-times ldrd\tr[0-9]+,
r[0-9]+, \\[r[0-9]+\\] 1
FAIL: gcc.target/arm/atomic_loaddi_8.c scan-assembler-times ldrd\tr[0-9]+,
r[0-9]+, \\[r[0-9]+\\] 1

when testing an arm multilib with /-march=armv7-a.

The tests have an effective target check for armv7ve but it doesn't work
because
under the hood the check is the same as for armv7-a, that is it checks for
the __ARM_ARCH_7A__
predefine which is set for both march values.

To check for armv7ve using predefines we need to check for both
__ARM_ARCH_7A__ and for the hardware
integer division predefine, making armv7ve special.

So this patch separates the effective target check definition from the rest
of the architectures
and defines it appropriately.

With this patch the aforementioned tests appear UNSUPPORTED when testing the
/-march=armv7-a multilib.

Ok for trunk?

Ok, but please follow up with updating sourcebuild.texi.


sourcebuild.texi shouldn't need any updating as I'm not adding a new effective
target check, I'm just fixing the way one of the already existing ones is 
defined.

Committed with r234420.



Is it ok to also backport this to the branches?
We'll need to it for multilib testing of:
https://gcc.gnu.org/ml/gcc-patches/2016-03/msg00924.html
and
https://gcc.gnu.org/ml/gcc-patches/2016-03/msg00923.html

Thanks,
Kyrill





Ramana


Thanks,
Kyrill

2016-03-02  Kyrylo Tkachov  

 * lib/target-supports.exp: Remove v7ve entry from loop
 creating effective target checks.
 (check_effective_target_arm_arch_v7ve_ok): New procedure.
 (add_options_for_arm_arch_v7ve): Likewise.






RE: [gomp4] Fix handling of subarrays with update directive

2016-03-23 Thread Fukuoka Daichi
Jakub,

> The ChangeLog format is   thus company names 
> aren't listed explicitly, but your email is @sgi.com, so it will be there.

I understand the format, thanks.

Regards,
Daichi Fukuoka

-Original Message-
From: Jakub Jelinek [mailto:ja...@redhat.com] 
Sent: Wednesday, March 23, 2016 7:37 PM
To: Fukuoka Daichi
Cc: James Norris; Thomas Schwinge; GCC Patches
Subject: Re: [gomp4] Fix handling of subarrays with update directive

On Wed, Mar 23, 2016 at 10:31:33AM +, Fukuoka Daichi wrote:
> > Yeah, the oacc-mem.c is a two liner, we don't need copyright assignment for 
> > that, so please mention Daichi-san's name + mail address, and PR 
> > libgpmp/69414 in the ChangeLog entry.
> 
> If possible, could you add " SGI Japan, Ltd." before/after my name?
> Because I am belonging to the company.

The ChangeLog format is   thus company names 
aren't listed explicitly, but your email is @sgi.com, so it will be there.

Jakub


[PATCH, i386, AVX] Fix typo in i386.c/standard_sse_constant_opcode ().

2016-03-23 Thread Kirill Yukhin
Hello,

`vpcmpeqd' insn is only available in AVX2, however
[1] added check for AVX instead. Looks like a typo.

Patch in the bottom fixes that.

Bootstrapped, regtest still run.
Is it ok for main trunk if pass?

gcc/
* config/i386/i386.c (standard_sse_constant_opcode): Emit
vpcmpeqd if AVX2 is enabled.

[1] - https://gcc.gnu.org/ml/gcc-cvs/2011-09/msg00318.html

--
Thanks, K

commit e30c86b06968416e7401a113b866ae353311aa10
Author: Kirill Yukhin 
Date:   Wed Mar 23 14:08:07 2016 +0300

AVX2. Emit vpcmpeqd for const_m1 only if AVX2 is enabled.

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 1639704..7f5db75 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -10859,7 +10859,7 @@ standard_sse_constant_opcode (rtx_insn *insn, rtx x)
  || get_attr_mode (insn) == MODE_V8DF
  || get_attr_mode (insn) == MODE_V16SF)
return "vpternlogd\t{$0xFF, %g0, %g0, %g0|%g0, %g0, %g0, 0xFF}";
-  if (TARGET_AVX)
+  if (TARGET_AVX2)
return "vpcmpeqd\t%0, %0, %0";
   else
return "pcmpeqd\t%0, %0";



[PATCH, CHKP, Solaris, PR target/69917, committed] Respect transparent alias chains of assembler names

2016-03-23 Thread Ilya Enkovich
Hi,

This patch fixes gcc.target/i386/chkp-hidden-def.c test failure
on Solaris.  Failure happens because some Solaris emit code
ignores transparent alias chains for assembler names.  The patch
was tested and approved by Rainer Orth (see [1] for more details).
Applied to trunk.

[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69917

Thanks,
Ilya
--
gcc/

2016-03-23  Ilya Enkovich  

PR target/69917
* config/i386/sol2.h (ASM_OUTPUT_DEF_FROM_DECLS): Follow
transparent alias chain for decl assembler name.
* config/sol2.c (solaris_assemble_visibility): Likewise.


diff --git a/gcc/config/i386/sol2.h b/gcc/config/i386/sol2.h
index e56e554..368c2d2 100644
--- a/gcc/config/i386/sol2.h
+++ b/gcc/config/i386/sol2.h
@@ -137,8 +137,9 @@ along with GCC; see the file COPYING3.  If not see
 /* The Solaris assembler wants a .local for non-exported aliases.  */
 #define ASM_OUTPUT_DEF_FROM_DECLS(FILE, DECL, TARGET)  \
   do { \
-const char *declname = \
-  IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (DECL)); \
+tree id = DECL_ASSEMBLER_NAME (DECL);  \
+ultimate_transparent_alias_target (&id);   \
+const char *declname = IDENTIFIER_POINTER (id);\
 ASM_OUTPUT_DEF ((FILE), declname,  \
IDENTIFIER_POINTER (TARGET));   \
 if (! TREE_PUBLIC (DECL))  \
diff --git a/gcc/config/sol2.c b/gcc/config/sol2.c
index 560a07b..47b41fd 100644
--- a/gcc/config/sol2.c
+++ b/gcc/config/sol2.c
@@ -142,8 +142,11 @@ solaris_assemble_visibility (tree decl, int vis 
ATTRIBUTE_UNUSED)
   };
 
   const char *name, *type;
+  tree id = DECL_ASSEMBLER_NAME (decl);
 
-  name = IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (decl));
+  while (IDENTIFIER_TRANSPARENT_ALIAS (id))
+id = TREE_CHAIN (id);
+  name = IDENTIFIER_POINTER (id);
   type = visibility_types[vis];
 
   fprintf (asm_out_file, "\t.%s\t", type);


Re: [PATCH][ARM] Split out armv7ve effective target check

2016-03-23 Thread Ramana Radhakrishnan
On 23/03/16 11:09, Kyrill Tkachov wrote:
> 
> On 23/03/16 10:33, Kyrill Tkachov wrote:
>>
>> On 16/03/16 15:54, Ramana Radhakrishnan wrote:
>>> On Wed, Mar 2, 2016 at 1:32 PM, Kyrill Tkachov
>>>  wrote:
 Hi all,

 I'm seeing the fails:
 FAIL: gcc.target/arm/atomic_loaddi_2.c scan-assembler-times ldrd\tr[0-9]+,
 r[0-9]+, \\[r[0-9]+\\] 1
 FAIL: gcc.target/arm/atomic_loaddi_5.c scan-assembler-times ldrd\tr[0-9]+,
 r[0-9]+, \\[r[0-9]+\\] 1
 FAIL: gcc.target/arm/atomic_loaddi_8.c scan-assembler-times ldrd\tr[0-9]+,
 r[0-9]+, \\[r[0-9]+\\] 1

 when testing an arm multilib with /-march=armv7-a.

 The tests have an effective target check for armv7ve but it doesn't work
 because
 under the hood the check is the same as for armv7-a, that is it checks for
 the __ARM_ARCH_7A__
 predefine which is set for both march values.

 To check for armv7ve using predefines we need to check for both
 __ARM_ARCH_7A__ and for the hardware
 integer division predefine, making armv7ve special.

 So this patch separates the effective target check definition from the rest
 of the architectures
 and defines it appropriately.

 With this patch the aforementioned tests appear UNSUPPORTED when testing 
 the
 /-march=armv7-a multilib.

 Ok for trunk?
>>> Ok, but please follow up with updating sourcebuild.texi.
>>
>> sourcebuild.texi shouldn't need any updating as I'm not adding a new 
>> effective
>> target check, I'm just fixing the way one of the already existing ones is 
>> defined.
>>
>> Committed with r234420.
>>
> 
> Is it ok to also backport this to the branches?
> We'll need to it for multilib testing of:
> https://gcc.gnu.org/ml/gcc-patches/2016-03/msg00924.html
> and
> https://gcc.gnu.org/ml/gcc-patches/2016-03/msg00923.html
> 

OK.

Ramana

> Thanks,
> Kyrill
> 
>>
>>>
>>> Ramana
>>>
 Thanks,
 Kyrill

 2016-03-02  Kyrylo Tkachov  

  * lib/target-supports.exp: Remove v7ve entry from loop
  creating effective target checks.
  (check_effective_target_arm_arch_v7ve_ok): New procedure.
  (add_options_for_arm_arch_v7ve): Likewise.
>>
> 



Re: [PATCH, i386, AVX] Fix typo in i386.c/standard_sse_constant_opcode ().

2016-03-23 Thread H.J. Lu
On Wed, Mar 23, 2016 at 4:28 AM, Kirill Yukhin  wrote:
> Hello,
>
> `vpcmpeqd' insn is only available in AVX2, however
> [1] added check for AVX instead. Looks like a typo.
>
> Patch in the bottom fixes that.
>
> Bootstrapped, regtest still run.
> Is it ok for main trunk if pass?
>
> gcc/
> * config/i386/i386.c (standard_sse_constant_opcode): Emit
> vpcmpeqd if AVX2 is enabled.
>
> [1] - https://gcc.gnu.org/ml/gcc-cvs/2011-09/msg00318.html
>
> --
> Thanks, K
>
> commit e30c86b06968416e7401a113b866ae353311aa10
> Author: Kirill Yukhin 
> Date:   Wed Mar 23 14:08:07 2016 +0300
>
> AVX2. Emit vpcmpeqd for const_m1 only if AVX2 is enabled.
>
> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> index 1639704..7f5db75 100644
> --- a/gcc/config/i386/i386.c
> +++ b/gcc/config/i386/i386.c
> @@ -10859,7 +10859,7 @@ standard_sse_constant_opcode (rtx_insn *insn, rtx x)
>   || get_attr_mode (insn) == MODE_V8DF
>   || get_attr_mode (insn) == MODE_V16SF)
> return "vpternlogd\t{$0xFF, %g0, %g0, %g0|%g0, %g0, %g0, 0xFF}";
> -  if (TARGET_AVX)
> +  if (TARGET_AVX2)
> return "vpcmpeqd\t%0, %0, %0";
>else
> return "pcmpeqd\t%0, %0";
>

Don't we want vpcmpeqd for 128-bit -1?

-- 
H.J.


Re: [PATCH, i386, AVX] Fix typo in i386.c/standard_sse_constant_opcode ().

2016-03-23 Thread Kirill Yukhin
On 23 Mar 04:36, H.J. Lu wrote:
> On Wed, Mar 23, 2016 at 4:28 AM, Kirill Yukhin  
> wrote:
> > diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> > index 1639704..7f5db75 100644
> > --- a/gcc/config/i386/i386.c
> > +++ b/gcc/config/i386/i386.c
> > @@ -10859,7 +10859,7 @@ standard_sse_constant_opcode (rtx_insn *insn, rtx x)
> >   || get_attr_mode (insn) == MODE_V8DF
> >   || get_attr_mode (insn) == MODE_V16SF)
> > return "vpternlogd\t{$0xFF, %g0, %g0, %g0|%g0, %g0, %g0, 0xFF}";
> > -  if (TARGET_AVX)
> > +  if (TARGET_AVX2)
> > return "vpcmpeqd\t%0, %0, %0";
> >else
> > return "pcmpeqd\t%0, %0";
> >
> 
> Don't we want vpcmpeqd for 128-bit -1?
Well, we have `pcmpeqd' convering same range of regnumx for XMMs. So, why?
> 
> -- 
> H.J.

--
Thanks, K


Re: [PATCH, i386, AVX] Fix typo in i386.c/standard_sse_constant_opcode ().

2016-03-23 Thread Jakub Jelinek
On Wed, Mar 23, 2016 at 02:28:03PM +0300, Kirill Yukhin wrote:
> Hello,
> 
> `vpcmpeqd' insn is only available in AVX2, however
> [1] added check for AVX instead. Looks like a typo.
> 
> Patch in the bottom fixes that.
> 
> Bootstrapped, regtest still run.
> Is it ok for main trunk if pass?

??
I see
VEX.NDS.128.66.0F.WIG 76 /r RVM V/V AVX Compare packed 
doublewords in xmm3/m128 and
VPCMPEQD xmm1, xmm2, xmm3/m128  xmm2 for equality.
and
VEX.NDS.256.66.0F.WIG 76 /r RVM V/V AVX2Compare packed 
doublewords in ymm3/m256 and
VPCMPEQD ymm1, ymm2, ymm3 /m256 ymm2 for equality.
in the ISA pdfs.
This code is only executed if standard_sse_constant_p returns 2, which
is for 16-byte vectors and all ones for TARGET_SSE2, and for
32-byte vectors for TARGET_AVX2.
Thus, IMNSHO the patch is wrong, even for plain -mavx -mno-avx2
we want to emit vpcmpeqd %xmm?, %xmm?, %xmm?, so that we don't mix
VEX with non-VEX encoded insns.

> gcc/
>   * config/i386/i386.c (standard_sse_constant_opcode): Emit
>   vpcmpeqd if AVX2 is enabled.
> 
> [1] - https://gcc.gnu.org/ml/gcc-cvs/2011-09/msg00318.html
> 
> --
> Thanks, K
> 
> commit e30c86b06968416e7401a113b866ae353311aa10
> Author: Kirill Yukhin 
> Date:   Wed Mar 23 14:08:07 2016 +0300
> 
> AVX2. Emit vpcmpeqd for const_m1 only if AVX2 is enabled.
> 
> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> index 1639704..7f5db75 100644
> --- a/gcc/config/i386/i386.c
> +++ b/gcc/config/i386/i386.c
> @@ -10859,7 +10859,7 @@ standard_sse_constant_opcode (rtx_insn *insn, rtx x)
>   || get_attr_mode (insn) == MODE_V8DF
>   || get_attr_mode (insn) == MODE_V16SF)
> return "vpternlogd\t{$0xFF, %g0, %g0, %g0|%g0, %g0, %g0, 0xFF}";
> -  if (TARGET_AVX)
> +  if (TARGET_AVX2)
> return "vpcmpeqd\t%0, %0, %0";
>else
> return "pcmpeqd\t%0, %0";

Jakub


Re: [PATCH, i386, AVX] Fix typo in i386.c/standard_sse_constant_opcode ().

2016-03-23 Thread Kirill Yukhin
On 23 Mar 12:41, Jakub Jelinek wrote:
> On Wed, Mar 23, 2016 at 02:28:03PM +0300, Kirill Yukhin wrote:
> > Hello,
> > 
> > `vpcmpeqd' insn is only available in AVX2, however
> > [1] added check for AVX instead. Looks like a typo.
> > 
> > Patch in the bottom fixes that.
> > 
> > Bootstrapped, regtest still run.
> > Is it ok for main trunk if pass?
> 
> ??
> I see
> VEX.NDS.128.66.0F.WIG 76 /r   RVM V/V AVX Compare packed 
> doublewords in xmm3/m128 and
> VPCMPEQD xmm1, xmm2, xmm3/m128xmm2 for 
> equality.
> and
> VEX.NDS.256.66.0F.WIG 76 /r   RVM V/V AVX2Compare packed 
> doublewords in ymm3/m256 and
> VPCMPEQD ymm1, ymm2, ymm3 /m256   ymm2 for 
> equality.
> in the ISA pdfs.
> This code is only executed if standard_sse_constant_p returns 2, which
> is for 16-byte vectors and all ones for TARGET_SSE2, and for
> 32-byte vectors for TARGET_AVX2.
> Thus, IMNSHO the patch is wrong, even for plain -mavx -mno-avx2
> we want to emit vpcmpeqd %xmm?, %xmm?, %xmm?, so that we don't mix
> VEX with non-VEX encoded insns.
You're right. I've updated the patch to check mode size as well.

--
Thanks, K

commit ae82089e3a4244d870737a7022e66e87042da811
Author: Kirill Yukhin 
Date:   Wed Mar 23 14:08:07 2016 +0300

AVX2. Emit vpcmpeqd for const_m1 only if AVX2 is enabled.

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 1639704..963cc3d 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -10859,7 +10859,7 @@ standard_sse_constant_opcode (rtx_insn *insn, rtx x)
  || get_attr_mode (insn) == MODE_V8DF
  || get_attr_mode (insn) == MODE_V16SF)
return "vpternlogd\t{$0xFF, %g0, %g0, %g0|%g0, %g0, %g0, 0xFF}";
-  if (TARGET_AVX)
+  if (TARGET_AVX2 || get_attr_mode (insn) == 16)
return "vpcmpeqd\t%0, %0, %0";
   else
return "pcmpeqd\t%0, %0";


Re: [PATCH, i386, AVX] Fix typo in i386.c/standard_sse_constant_opcode ().

2016-03-23 Thread Jakub Jelinek
On Wed, Mar 23, 2016 at 02:47:54PM +0300, Kirill Yukhin wrote:
> > This code is only executed if standard_sse_constant_p returns 2, which
> > is for 16-byte vectors and all ones for TARGET_SSE2, and for
> > 32-byte vectors for TARGET_AVX2.
> > Thus, IMNSHO the patch is wrong, even for plain -mavx -mno-avx2
> > we want to emit vpcmpeqd %xmm?, %xmm?, %xmm?, so that we don't mix
> > VEX with non-VEX encoded insns.
> You're right. I've updated the patch to check mode size as well.

That is also wrong, you will then emit vpcmpeqd even for -msse2 -mno-avx.

There is really nothing wrong in what is on the trunk, no reason to patch
anything.  What the trunk does is, if you have AVX, you emit VEX encoded
insn, if you don't have AVX, you emit normally encoded insn.
And whether there is support to get cheap all ones is the business of
standard_sse_constant_p function, which DTRT.

> commit ae82089e3a4244d870737a7022e66e87042da811
> Author: Kirill Yukhin 
> Date:   Wed Mar 23 14:08:07 2016 +0300
> 
> AVX2. Emit vpcmpeqd for const_m1 only if AVX2 is enabled.
> 
> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> index 1639704..963cc3d 100644
> --- a/gcc/config/i386/i386.c
> +++ b/gcc/config/i386/i386.c
> @@ -10859,7 +10859,7 @@ standard_sse_constant_opcode (rtx_insn *insn, rtx x)
>   || get_attr_mode (insn) == MODE_V8DF
>   || get_attr_mode (insn) == MODE_V16SF)
> return "vpternlogd\t{$0xFF, %g0, %g0, %g0|%g0, %g0, %g0, 0xFF}";
> -  if (TARGET_AVX)
> +  if (TARGET_AVX2 || get_attr_mode (insn) == 16)
> return "vpcmpeqd\t%0, %0, %0";
>else
> return "pcmpeqd\t%0, %0";

Jakub


[testsuite, sparc, gcc-5] Skip gcc.target/sparc/20151219-1.c in 64-bit mode

2016-03-23 Thread Rainer Orth
gcc.target/sparc/20151219-1.c currently FAILs on the gcc-5 branch only
for the 64-bit case:

FAIL: gcc.target/sparc/20151219-1.c (test for excess errors)
FAIL: gcc.target/sparc/20151219-1.c scan-assembler-not stx\\t%

Fixed by the following backport.  Tested with the appropriate runtest
invocation on sparc-sun-solaris2.12, installed on the gcc-5 branch.

Rainer


2016-03-23  Rainer Orth  

gcc/testsuite:
Backport from mainline
2016-01-06  Eric Botcazou  

* gcc.target/sparc/20151219-1.c: Skip in 64-bit mode.

# HG changeset patch
# Parent  102d08cd85f4b7b433160bcb69d8a1ca08031f53
Skip gcc.target/sparc/20151219-1.c in 64-bit mode

diff --git a/gcc/testsuite/gcc.target/sparc/20151219-1.c b/gcc/testsuite/gcc.target/sparc/20151219-1.c
--- a/gcc/testsuite/gcc.target/sparc/20151219-1.c
+++ b/gcc/testsuite/gcc.target/sparc/20151219-1.c
@@ -2,6 +2,7 @@
 /* Reported by Sebastian Huber  */
 
 /* { dg-do compile } */
+/* { dg-require-effective-target ilp32 } */
 /* { dg-options "-O2 -mtune=supersparc" } */
 
 typedef unsigned int size_t;

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: [PATCH, i386, AVX] Fix typo in i386.c/standard_sse_constant_opcode ().

2016-03-23 Thread Kirill Yukhin
On 23 Mar 12:55, Jakub Jelinek wrote:
> On Wed, Mar 23, 2016 at 02:47:54PM +0300, Kirill Yukhin wrote:
> > > This code is only executed if standard_sse_constant_p returns 2, which
> > > is for 16-byte vectors and all ones for TARGET_SSE2, and for
> > > 32-byte vectors for TARGET_AVX2.
> > > Thus, IMNSHO the patch is wrong, even for plain -mavx -mno-avx2
> > > we want to emit vpcmpeqd %xmm?, %xmm?, %xmm?, so that we don't mix
> > > VEX with non-VEX encoded insns.
> > You're right. I've updated the patch to check mode size as well.
> 
> That is also wrong, you will then emit vpcmpeqd even for -msse2 -mno-avx.
> 
> There is really nothing wrong in what is on the trunk, no reason to patch
> anything.  What the trunk does is, if you have AVX, you emit VEX encoded
> insn, if you don't have AVX, you emit normally encoded insn.
> And whether there is support to get cheap all ones is the business of
> standard_sse_constant_p function, which DTRT.
Probably, yes. But this just look pretty strange at first glance.
Something like this looks much more straightforward to me:
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 963cc3d..4aff647 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -10859,7 +10859,7 @@ standard_sse_constant_opcode (rtx_insn *insn, rtx x)
  || get_attr_mode (insn) == MODE_V8DF
  || get_attr_mode (insn) == MODE_V16SF)
return "vpternlogd\t{$0xFF, %g0, %g0, %g0|%g0, %g0, %g0, 0xFF}";
-  if (TARGET_AVX2 || get_attr_mode (insn) == 16)
+  if ((TARGET_AVX2 && get_attr_mode (insn) == 32) || TARGET_AVX)
return "vpcmpeqd\t%0, %0, %0";
   else
return "pcmpeqd\t%0, %0";

But yes, if standard_sse_constant_p () is used properly there's no need for
the change.

--
Thanks, K


Re: [PATCH][ARM][RFC] PR target/65578 Fix gcc.dg/torture/stackalign/builtin-apply-4.c for single-precision fpus

2016-03-23 Thread Ramana Radhakrishnan
On Tue, Feb 9, 2016 at 5:21 PM, Kyrill Tkachov
 wrote:
> Hi all,
>
> In this wrong-code PR the builtin-apply-4.c test fails with -flto but only
> when targeting an fpu
> with only single-precision capabilities.
>
> bar is a function returing a double. For non-LTO compilation the caller of
> bar reads the return value
> from it from the s0 and s1 VFP registers like expected, but for -flto the
> caller seems to expect the
> return value from the r0 and r1 regs.  The RTL dumps show that too.
>
> Debugging the calls to arm_function_value show that in the -flto compilation
> the function bar is deemed
> to be a local function call and assigned the ARM_PCS_AAPCS_LOCAL PCS
> variant, whereas for the non-LTO (and non-breaking)
> compilation it uses the ARM_PCS_AAPCS_VFP variant.
>
> Further down in use_vfp_abi when deciding whether to use VFP registers for
> the result there is a bit of
> logic that rejects VFP registers when handling the ARM_PCS_AAPCS_LOCAL
> variant with a double precision value
> on an FPU that is not TARGET_VFP_DOUBLE.
>
> This seems wrong for ARM_PCS_AAPCS_LOCAL to me. ARM_PCS_AAPCS_LOCAL means
> that the function doesn't escape
> the translation unit and we can thus use whatever variant we want. From what
> I understand we want to use the
> VFP regs when possible for FP values.
>
> So this patch removes that restriction and for the testcase the caller of
> bar correctly reads the return
> value of bar from the VFP registers and everything works.
>
> This patch has been bootstrapped and tested on arm-none-linux-gnueabihf
> configured with --with-fpu=fpv4-sp-d16.
> The bootstrapped was performed with LTO.
> I didn't see any regressions.
>
> It seems that this logic was put there in 2009 with r154034 as part of a
> large patch to enable support for half-precision
> floating point.
>
> I'm not very familiar with this part of the code, so is this a safe patch to
> do?
> The patch should only ever change behaviour for single-precision-only fpus
> and only for static functions
> that don't get called outside their translation units (or during LTO I
> suppose) so there shouldn't
> be any ABI problems, I think.
>
> Is this ok for trunk?

I spent sometime this morning reading through this patch and it does
look reasonably ok. The AAPCS tests if run for hardfloat should catch
any regressions. However given the stage we are in I'd like this
tested through compat.exp and struct-layout.exp across the range of
ABIs and FPU options to ensure we haven't missed anything. Richard ,
could you also give this a once over ?


Ramana







>
> Thanks,
> Kyrill
>
> 2016-02-09  Kyrylo Tkachov  
>
> PR target/65578
> * config/arm/arm.c (use_vfp_abi): Remove id_double argument.
> Don't check for is_double and TARGET_VFP_DOUBLE.
> (aapcs_vfp_is_call_or_return_candidate): Update callsite.
> (aapcs_vfp_is_return_candidate): Likewise.
> (aapcs_vfp_is_call_candidate): Likewise.
> (aapcs_vfp_allocate_return_reg): Likewise.


Re: [PATCH, i386, AVX] Fix typo in i386.c/standard_sse_constant_opcode ().

2016-03-23 Thread Jakub Jelinek
On Wed, Mar 23, 2016 at 03:06:22PM +0300, Kirill Yukhin wrote:
> On 23 Mar 12:55, Jakub Jelinek wrote:
> > On Wed, Mar 23, 2016 at 02:47:54PM +0300, Kirill Yukhin wrote:
> > > > This code is only executed if standard_sse_constant_p returns 2, which
> > > > is for 16-byte vectors and all ones for TARGET_SSE2, and for
> > > > 32-byte vectors for TARGET_AVX2.
> > > > Thus, IMNSHO the patch is wrong, even for plain -mavx -mno-avx2
> > > > we want to emit vpcmpeqd %xmm?, %xmm?, %xmm?, so that we don't mix
> > > > VEX with non-VEX encoded insns.
> > > You're right. I've updated the patch to check mode size as well.
> > 
> > That is also wrong, you will then emit vpcmpeqd even for -msse2 -mno-avx.
> > 
> > There is really nothing wrong in what is on the trunk, no reason to patch
> > anything.  What the trunk does is, if you have AVX, you emit VEX encoded
> > insn, if you don't have AVX, you emit normally encoded insn.
> > And whether there is support to get cheap all ones is the business of
> > standard_sse_constant_p function, which DTRT.
> Probably, yes. But this just look pretty strange at first glance.
> Something like this looks much more straightforward to me:
> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> index 963cc3d..4aff647 100644
> --- a/gcc/config/i386/i386.c
> +++ b/gcc/config/i386/i386.c
> @@ -10859,7 +10859,7 @@ standard_sse_constant_opcode (rtx_insn *insn, rtx x)
>   || get_attr_mode (insn) == MODE_V8DF
>   || get_attr_mode (insn) == MODE_V16SF)
> return "vpternlogd\t{$0xFF, %g0, %g0, %g0|%g0, %g0, %g0, 0xFF}";
> -  if (TARGET_AVX2 || get_attr_mode (insn) == 16)
> +  if ((TARGET_AVX2 && get_attr_mode (insn) == 32) || TARGET_AVX)
> return "vpcmpeqd\t%0, %0, %0";
>else
> return "pcmpeqd\t%0, %0";
> 
> But yes, if standard_sse_constant_p () is used properly there's no need for
> the change.

Even if used improperly.  standard_sse_constant_opcode starts by calling
standard_sse_constant_p.  So there is no point duplicating anything from it,
it has been already checked.
And
((TARGET_AVX2 && get_attr_mode (insn) == 32) || TARGET_AVX)
is just more fancy and expensive equivalent to
(TARGET_AVX)
(hint, if (TARGET_AVX2), then we know TARGET_AVX is non-zero too).
There is just nothing to guard, you don't want to ever emit pcmpeqd if
TARGET_AVX, simply because you don't want to mix VEX with old encoding,
as there is some penalty for that on some of the older CPUs.

Jakub


Re: [PATCH, i386, AVX] Fix typo in i386.c/standard_sse_constant_opcode ().

2016-03-23 Thread Kirill Yukhin
>On 23 Mar 13:13, Jakub Jelinek wrote:
> On Wed, Mar 23, 2016 at 03:06:22PM +0300, Kirill Yukhin wrote:
> > On 23 Mar 12:55, Jakub Jelinek wrote:
> > > On Wed, Mar 23, 2016 at 02:47:54PM +0300, Kirill Yukhin wrote:
> > > > > This code is only executed if standard_sse_constant_p returns 2, which
> > > > > is for 16-byte vectors and all ones for TARGET_SSE2, and for
> > > > > 32-byte vectors for TARGET_AVX2.
> > > > > Thus, IMNSHO the patch is wrong, even for plain -mavx -mno-avx2
> > > > > we want to emit vpcmpeqd %xmm?, %xmm?, %xmm?, so that we don't mix
> > > > > VEX with non-VEX encoded insns.
> > > > You're right. I've updated the patch to check mode size as well.
> > > 
> > > That is also wrong, you will then emit vpcmpeqd even for -msse2 -mno-avx.
> > > 
> > > There is really nothing wrong in what is on the trunk, no reason to patch
> > > anything.  What the trunk does is, if you have AVX, you emit VEX encoded
> > > insn, if you don't have AVX, you emit normally encoded insn.
> > > And whether there is support to get cheap all ones is the business of
> > > standard_sse_constant_p function, which DTRT.
> > Probably, yes. But this just look pretty strange at first glance.
> > Something like this looks much more straightforward to me:
> > diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> > index 963cc3d..4aff647 100644
> > --- a/gcc/config/i386/i386.c
> > +++ b/gcc/config/i386/i386.c
> > @@ -10859,7 +10859,7 @@ standard_sse_constant_opcode (rtx_insn *insn, rtx x)
> >   || get_attr_mode (insn) == MODE_V8DF
> >   || get_attr_mode (insn) == MODE_V16SF)
> > return "vpternlogd\t{$0xFF, %g0, %g0, %g0|%g0, %g0, %g0, 0xFF}";
> > -  if (TARGET_AVX2 || get_attr_mode (insn) == 16)
> > +  if ((TARGET_AVX2 && get_attr_mode (insn) == 32) || TARGET_AVX)
> > return "vpcmpeqd\t%0, %0, %0";
> >else
> > return "pcmpeqd\t%0, %0";
> > 
> > But yes, if standard_sse_constant_p () is used properly there's no need for
> > the change.
> 
> Even if used improperly.  standard_sse_constant_opcode starts by calling
> standard_sse_constant_p.  So there is no point duplicating anything from it,
> it has been already checked.
> And
> ((TARGET_AVX2 && get_attr_mode (insn) == 32) || TARGET_AVX)
> is just more fancy and expensive equivalent to
> (TARGET_AVX)
> (hint, if (TARGET_AVX2), then we know TARGET_AVX is non-zero too).
> There is just nothing to guard, you don't want to ever emit pcmpeqd if
> TARGET_AVX, simply because you don't want to mix VEX with old encoding,
> as there is some penalty for that on some of the older CPUs.
Okay, got it, thanks!
> 
>   Jakub


Re: [PATCH, i386, AVX] Fix typo in i386.c/standard_sse_constant_opcode ().

2016-03-23 Thread Uros Bizjak
On Wed, Mar 23, 2016 at 12:28 PM, Kirill Yukhin  wrote:
> Hello,
>
> `vpcmpeqd' insn is only available in AVX2, however
> [1] added check for AVX instead. Looks like a typo.
>
> Patch in the bottom fixes that.
>
> Bootstrapped, regtest still run.
> Is it ok for main trunk if pass?
>
> gcc/
> * config/i386/i386.c (standard_sse_constant_opcode): Emit
> vpcmpeqd if AVX2 is enabled.
>
> [1] - https://gcc.gnu.org/ml/gcc-cvs/2011-09/msg00318.html

No, the change is OK (standard_sse_constant_p does all the checks),
but nowadays we could write this part as:

--cut here--
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 1639704..59154c3 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -10859,10 +10859,7 @@ standard_sse_constant_opcode (rtx_insn *insn, rtx x)
  || get_attr_mode (insn) == MODE_V8DF
  || get_attr_mode (insn) == MODE_V16SF)
return "vpternlogd\t{$0xFF, %g0, %g0, %g0|%g0, %g0, %g0, 0xFF}";
-  if (TARGET_AVX)
-   return "vpcmpeqd\t%0, %0, %0";
-  else
-   return "pcmpeqd\t%0, %0";
+  return "%vpcmpeqd\t%0, %d0";

 default:
   break;
-- cut here--

Uros.

> --
> Thanks, K
>
> commit e30c86b06968416e7401a113b866ae353311aa10
> Author: Kirill Yukhin 
> Date:   Wed Mar 23 14:08:07 2016 +0300
>
> AVX2. Emit vpcmpeqd for const_m1 only if AVX2 is enabled.
>
> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> index 1639704..7f5db75 100644
> --- a/gcc/config/i386/i386.c
> +++ b/gcc/config/i386/i386.c
> @@ -10859,7 +10859,7 @@ standard_sse_constant_opcode (rtx_insn *insn, rtx x)
>   || get_attr_mode (insn) == MODE_V8DF
>   || get_attr_mode (insn) == MODE_V16SF)
> return "vpternlogd\t{$0xFF, %g0, %g0, %g0|%g0, %g0, %g0, 0xFF}";
> -  if (TARGET_AVX)
> +  if (TARGET_AVX2)
> return "vpcmpeqd\t%0, %0, %0";
>else
> return "pcmpeqd\t%0, %0";
>


Re: Fix 69650, bogus line numbers from libcpp

2016-03-23 Thread Richard Biener
On Mon, Mar 21, 2016 at 11:58 PM, Bernd Schmidt  wrote:
> On 03/21/2016 09:15 PM, David Malcolm wrote:
>>
>> On Mon, 2016-03-14 at 14:20 +0100, Bernd Schmidt wrote:
>>>
>>> On 03/11/2016 11:09 PM, David Malcolm wrote:
>
> + cpp_error (pfile, CPP_DL_ERROR,
> +"file \"%s\" left but not entered",
> new_file);


 Although it looks like you're preserving the existing behavior from
 when this was in linemap_add, shouldn't this be
 ORDINARY_MAP_FILE_NAME (from)
 rather than new_file?  (i.e. shouldn't it report the name of the
 file
 being *left*, rather than the one being entered?)
>>
>>
>> I realize now that I was wrong here: "new_file" refers to the filename
>> given in the linemarker directive, which, depending on the (optional)
>> flag could be a directive to enter or leave a linemap.
>>
>> Sorry about that; you may want to disregard my earlier worries.
>
>
> No, I think you were mostly on point: new_file is the one in the #line
> directive, which AFAICT is the file being reentered. so the wording is in
> fact misleading. Including a file called "t.h" from "v.c" produces this
> pattern:
>
> # 1 "t.h" 1
> int t;
> # 2 "v.c" 2
>
>> I was thinking of something like the attached, which makes things more
>> explicit; I felt the first dg-error in your patch was a little too
>> concise.
>
>
> No problem, but I do think we want to change the wording of the message to
> something like "file %s unexpectedly reentered".
>
>> I'm very familiar with the code for tracking ranges and issuing
>> diagnostics, but less familiar with other parts of libcpp, so I'm not
>> sure I'm fully qualified to approve the patch.  That said, the patch
>> looks sane, mostly...  The remaining thing I have a concern about
>> relates to the other question I had, which I don't think you addressed:
>> is it possible to construct a testcase that triggers the logic in the
>> non-MAIN_FILE_P clause?
>
>
> Are you thinking of anything more complex than the following?
>
> # 1 "v.c"
> # 1 "t.h" 1
> # 1 "t2.h" 1
> int t;
> # 2 "t.h" 2
> # 2 "v.c" 2
>
> Change any of the filenames of the "^#.*2$" lines and you'll see error
> messages. For example, changing "t.h" to "x.h" in the second to last line:
>
> In file included from t.h:1:0,
>  from v.c:1:
> t2.h:2:13: error: file "x.h" left but not entered
> t2.h:3:12: error: file "v.c" left but not entered
>
> Of course any such error is likely to have a large number of follow-on
> errors. Not sure how to avoid this given that the input file most likely has
> a completely messed up structure.
>
>> (How much existing test coverage do we have for line-markers?  I found
>> about 15 existing testcases that use them in a crude search with grep,
>> but these are all apparently only incidentally as part of testing other
>> functionality; is it worth me adding some more general test coverage
>> for this?)
>
>
> It might be worthwhile but I'm not planning to for this issue.

Btw, the issue in the PR is also fixed with a simple

Index: libcpp/line-map.c
===
--- libcpp/line-map.c   (revision 234415)
+++ libcpp/line-map.c   (working copy)
@@ -543,7 +543,7 @@ linemap_add (struct line_maps *set, enum
 to_file);

   /* A TO_FILE of NULL is special - we use the natural values.  */
-  if (error || to_file == NULL)
+  if (to_file == NULL)
{
  to_file = ORDINARY_MAP_FILE_NAME (from);
  to_line = SOURCE_LINE (from, from[1].start_location);

I did some archeology and the revision that introduced the error || is
44789 which documented
that part as

(add_line_map): Return pointer to const.  When passed NULL to_file
with LC_LEAVE, use the obvious values for the return point so the
caller doesn't have to figure them out.

where obviously the values used in the error case are not obvious.
Simply keeping the
info parsed from the line directive makes us happy here.

Bootstrap/test still running, I also have a LTO testcase for the crash.

Note this doesn't make the testcase error as I think your patch does
(which I think is
an improvement in itself but possibly would require some -fpermissive
handling as I
expect some old code generators to generate invalid line directives).

Richard.

>
> Bernd
>


Re: [PATCH, i386, AVX] Fix typo in i386.c/standard_sse_constant_opcode ().

2016-03-23 Thread Kirill Yukhin
On 23 Mar 13:28, Uros Bizjak wrote:
> On Wed, Mar 23, 2016 at 12:28 PM, Kirill Yukhin  
> wrote:
> > Hello,
> >
> > `vpcmpeqd' insn is only available in AVX2, however
> > [1] added check for AVX instead. Looks like a typo.
> >
> > Patch in the bottom fixes that.
> >
> > Bootstrapped, regtest still run.
> > Is it ok for main trunk if pass?
> >
> > gcc/
> > * config/i386/i386.c (standard_sse_constant_opcode): Emit
> > vpcmpeqd if AVX2 is enabled.
> >
> > [1] - https://gcc.gnu.org/ml/gcc-cvs/2011-09/msg00318.html
> 
> No, the change is OK (standard_sse_constant_p does all the checks),
> but nowadays we could write this part as:
> 
> --cut here--
> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> index 1639704..59154c3 100644
> --- a/gcc/config/i386/i386.c
> +++ b/gcc/config/i386/i386.c
> @@ -10859,10 +10859,7 @@ standard_sse_constant_opcode (rtx_insn *insn, rtx x)
>   || get_attr_mode (insn) == MODE_V8DF
>   || get_attr_mode (insn) == MODE_V16SF)
> return "vpternlogd\t{$0xFF, %g0, %g0, %g0|%g0, %g0, %g0, 0xFF}";
> -  if (TARGET_AVX)
> -   return "vpcmpeqd\t%0, %0, %0";
> -  else
> -   return "pcmpeqd\t%0, %0";
> +  return "%vpcmpeqd\t%0, %d0";
> 
>  default:
>break;
> -- cut here--
This looks much better to me.
Going to bootstrap/regtest.

Is it ok to check into main trunk?

> 
> Uros.
> 

--
Thanks, K


Re: [PATCH] Fix PR c++/70332 (ICE due to aggregate initialization of NSDMI)

2016-03-23 Thread Jason Merrill

On 03/22/2016 07:12 PM, Patrick Palka wrote:

On Tue, Mar 22, 2016 at 6:12 PM, Patrick Palka  wrote:

On Tue, Mar 22, 2016 at 6:00 PM, Jason Merrill  wrote:

On 03/22/2016 05:35 PM, Patrick Palka wrote:


+ if (cp_unevaluated_operand == 0



Why check this here?


Just so that the change doesn't affect the behavior of tsubst_decl()
when cp_unevaluated_operand != 0.  Presumably the existing code (10
lines below) handles that case just fine.


Turns out that without the check we can trigger the cxx_dialect >=
cxx14 assert because in c++11 mode we can reach the assert through
get_defaulted_eh_spec() which increments cp_unevaluated_operand and
then calls get_nsdmi (..., /*in_ctor=*/false) causing
current_class_ref to get set to a PLACEHOLDER_EXPR.

So for example g++.dg/cpp0x/nsdmi-template2.C regresses with an ICE.
So it seems the cp_unevaluated_operand != 0 check is necessary as long
as the assert stays.

There are no regressions if both the cp_unevaluated_operand check and
the assert are removed however.


I think that's my preference.

Jason




Re: [PATCH, i386, AVX] Fix typo in i386.c/standard_sse_constant_opcode ().

2016-03-23 Thread Jakub Jelinek
On Wed, Mar 23, 2016 at 03:55:01PM +0300, Kirill Yukhin wrote:
> > No, the change is OK (standard_sse_constant_p does all the checks),
> > but nowadays we could write this part as:
> > 
> > --cut here--
> > diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> > index 1639704..59154c3 100644
> > --- a/gcc/config/i386/i386.c
> > +++ b/gcc/config/i386/i386.c
> > @@ -10859,10 +10859,7 @@ standard_sse_constant_opcode (rtx_insn *insn, rtx 
> > x)
> >   || get_attr_mode (insn) == MODE_V8DF
> >   || get_attr_mode (insn) == MODE_V16SF)
> > return "vpternlogd\t{$0xFF, %g0, %g0, %g0|%g0, %g0, %g0, 0xFF}";
> > -  if (TARGET_AVX)
> > -   return "vpcmpeqd\t%0, %0, %0";
> > -  else
> > -   return "pcmpeqd\t%0, %0";
> > +  return "%vpcmpeqd\t%0, %d0";
> > 
> >  default:
> >break;
> > -- cut here--
> This looks much better to me.
> Going to bootstrap/regtest.
> 
> Is it ok to check into main trunk?

But what is the advantage of doing that?  You trade one simple conditional
for lots more processing at the insn emit time (parse %v, handle if
(TARGET_AVX) there, parse 'd' modifier, handle it if (TARGET_AVX)
conditionally).  One thing is in pattern, where it increases readability,
but in a routine which handles several different conditions anyway?

Jakub


Re: [gomp4] Fix handling of subarrays with update directive

2016-03-23 Thread James Norris

Jakub,

On 03/23/2016 05:24 AM, Jakub Jelinek wrote:

Otherwise LGTM, but please repost it with all the testcase changes you want
to make.


Attached is the updated patch with the castings added, as well as
the updated tests.

Thanks!
Jim

==

2016-03-23  James Norris  
Daichi Fukuoka 

* oacc-mem.c (delete_copyout, update_dev_host): Fix device address.
* testsuite/libgomp.oacc-c-c++-common/update-1.c: Additional tests.
* testsuite/libgomp.oacc-c-c++-common/update-1-2.c: Likewise.
* testsuite/libgomp.oacc-fortran/update-1.f90: New file.

diff --git a/libgomp/ChangeLog b/libgomp/ChangeLog
index fca65e6..66dc6a0 100644
--- a/libgomp/ChangeLog
+++ b/libgomp/ChangeLog
@@ -1,3 +1,11 @@
+2016-03-22  James Norris  
+	Daichi Fukuoka 
+
+	* oacc-mem.c (delete_copyout, update_dev_host): Fix device address.
+	* testsuite/libgomp.oacc-c-c++-common/update-1.c: Additional tests.
+	* testsuite/libgomp.oacc-c-c++-common/update-1-2.c: Likewise.
+	* testsuite/libgomp.oacc-fortran/update-1.f90: New file.
+
 2016-03-16  Thomas Schwinge  
 
 	* testsuite/libgomp.oacc-fortran/kernels-loop-2.f95: Adjust to
diff --git a/libgomp/oacc-mem.c b/libgomp/oacc-mem.c
index f6cc373..487691e 100644
--- a/libgomp/oacc-mem.c
+++ b/libgomp/oacc-mem.c
@@ -509,7 +509,8 @@ delete_copyout (unsigned f, void *h, size_t s)
   gomp_fatal ("[%p,%d] is not mapped", (void *)h, (int)s);
 }
 
-  d = (void *) (n->tgt->tgt_start + n->tgt_offset);
+  d = (void *) (n->tgt->tgt_start + n->tgt_offset
+	+ (uintptr_t) h - n->host_start);
 
   host_size = n->host_end - n->host_start;
 
@@ -562,7 +563,8 @@ update_dev_host (int is_dev, void *h, size_t s)
   gomp_fatal ("[%p,%d] is not mapped", h, (int)s);
 }
 
-  d = (void *) (n->tgt->tgt_start + n->tgt_offset);
+  d = (void *) (n->tgt->tgt_start + n->tgt_offset
+		+ (uintptr_t) h - n->host_start);
 
   gomp_mutex_unlock (&acc_dev->lock);
 
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/update-1-2.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/update-1-2.c
index c7e7257..82c3192 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/update-1-2.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/update-1-2.c
@@ -13,6 +13,7 @@ int
 main (int argc, char **argv)
 {
 int N = 8;
+int NDIV2 = N / 2;
 float *a, *b, *c;
 float *d_a, *d_b, *d_c;
 int i;
@@ -242,7 +243,7 @@ main (int argc, char **argv)
 a[i] = 6.0;
 }
 
-#pragma acc update device (a[0:N >> 1])
+#pragma acc update device (a[0:NDIV2])
 
 #pragma acc parallel present (a[0:N], b[0:N])
 {
@@ -254,7 +255,7 @@ main (int argc, char **argv)
 
 #pragma acc update self (a[0:N], b[0:N])
 
-for (i = 0; i < (N >> 1); i++)
+for (i = 0; i < NDIV2; i++)
 {
 if (a[i] != 6.0)
 abort ();
@@ -263,7 +264,7 @@ main (int argc, char **argv)
 abort ();
 }
 
-for (i = (N >> 1); i < N; i++)
+for (i = NDIV2; i < N; i++)
 {
 if (a[i] != 5.0)
 abort ();
@@ -278,5 +279,83 @@ main (int argc, char **argv)
 if (!acc_is_present (&b[0], (N * sizeof (float
   abort ();
 
+for (i = 0; i < N; i++)
+{
+a[i] = 0.0;
+}
+
+#pragma acc update device (a[0:4])
+
+#pragma acc parallel present (a[0:N])
+{
+int ii;
+
+for (ii = 0; ii < N; ii++)
+a[ii] = a[ii] + 1.0;
+}
+
+#pragma acc update self (a[4:4])
+
+for (i = 0; i < NDIV2; i++)
+{
+if (a[i] != 0.0)
+abort ();
+}
+
+for (i = NDIV2; i < N; i++)
+{
+if (a[i] != 6.0)
+abort ();
+}
+
+#pragma acc update self (a[0:4])
+
+for (i = 0; i < NDIV2; i++)
+{
+if (a[i] != 1.0)
+abort ();
+}
+
+for (i = NDIV2; i < N; i++)
+{
+if (a[i] != 6.0)
+abort ();
+}
+
+a[2] = 9;
+a[3] = 9;
+a[4] = 9;
+a[5] = 9;
+
+#pragma acc update device (a[2:4])
+
+#pragma acc parallel present (a[0:N])
+{
+int ii;
+
+for (ii = 0; ii < N; ii++)
+a[ii] = a[ii] + 1.0;
+}
+
+#pragma acc update self (a[2:4])
+
+for (i = 0; i < 2; i++)
+{
+  if (a[i] != 1.0)
+	abort ();
+}
+
+for (i = 2; i < 6; i++)
+{
+  if (a[i] != 10.0)
+	abort ();
+}
+
+for (i = 6; i < N; i++)
+{
+if (a[i] != 6.0)
+abort ();
+}
+
 return 0;
 }
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/update-1.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/update-1.c
index dff139f..1b2a460 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/update-1.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/update-1.c
@@ -11,6 +11,7 @@ int
 main (int argc, char **argv)
 {
 int N = 8;
+int NDIV2 = N / 2;
 float *a, *b, *c;
 float *d_a, *d_b, *d_c;
 int i;
@@ -109,7 +110,7 @@ main (int argc, char **argv)
 b[ii] = a[ii];
 }
 
-#pragma acc update self (a[0:N], b[0:N])
+#pragma acc upda

Re: Fix 69650, bogus line numbers from libcpp

2016-03-23 Thread Bernd Schmidt

On 03/23/2016 01:41 PM, Richard Biener wrote:

Btw, the issue in the PR is also fixed with a simple

Index: libcpp/line-map.c
===
--- libcpp/line-map.c   (revision 234415)
+++ libcpp/line-map.c   (working copy)
@@ -543,7 +543,7 @@ linemap_add (struct line_maps *set, enum
  to_file);

/* A TO_FILE of NULL is special - we use the natural values.  */
-  if (error || to_file == NULL)
+  if (to_file == NULL)
 {
   to_file = ORDINARY_MAP_FILE_NAME (from);
   to_line = SOURCE_LINE (from, from[1].start_location);


I looked at that, but that made it hard to add the testcase as the line 
numbers no longer match the dg-error directives. By moving this code we 
can ignore the erroneous #line directive, and for this one testcase at 
least, that makes the line numbers (and caret diagnostics etc.) come out 
right.



Bernd


Re: [gomp4] Fix handling of subarrays with update directive

2016-03-23 Thread Jakub Jelinek
On Wed, Mar 23, 2016 at 08:05:19AM -0500, James Norris wrote:
> Jakub,
> 
> On 03/23/2016 05:24 AM, Jakub Jelinek wrote:
> >Otherwise LGTM, but please repost it with all the testcase changes you want
> >to make.
> 
> Attached is the updated patch with the castings added, as well as
> the updated tests.
> 
> Thanks!
> Jim
> 
> ==
> 
> 2016-03-23  James Norris  
> Daichi Fukuoka 
> 

Missing
PR libgomp/69414
> * oacc-mem.c (delete_copyout, update_dev_host): Fix device address.
> * testsuite/libgomp.oacc-c-c++-common/update-1.c: Additional tests.
> * testsuite/libgomp.oacc-c-c++-common/update-1-2.c: Likewise.
> * testsuite/libgomp.oacc-fortran/update-1.f90: New file.

Ok with that change.


Jakub


Re: [RFC][ARM,AArch64] Adding crypto Advsimd intrinsics tests

2016-03-23 Thread Christophe Lyon
On 8 March 2016 at 16:37, Christophe Lyon  wrote:
> To illustrate what I mean, in fact we already have similar cases:
>
> On 7 March 2016 at 10:12, Christophe Lyon  wrote:
>> Hi,
>>
>> While preparing the cleanup of neon-testgen.ml, I'm adding the missing
>> tests to gcc.target/aarch64/advsimd-intrinsics.
>>
>> All the *_p64 and *_p128 are currently missing, and I am wondering
>> what's the best option. I can think of:
>> 1- Update existing tests using #ifdef __ARM_FEATURE_CRYPTO
> This somewhat what we currently have in vfma.c/vfms.c (but here in
> fact the test is empty if the #ifdef is false)
>
>> 2- Update existing tests without #ifdef, but adding effective_target
>> arm_crypto_ok
> That's what we have with vqrdmlah.c, testing arm_v8_1a_neon_hw
>
>> 3- Create dedicated tests, either grouping alll p64/p128 in one single
>> source, or splitting them in as many source files as there are
>> intrinsics.
> That's almost what we have in vcvt_f16.c, which also uses effective
> target arm_neon_fp16_hw.
>

For the record, I've started implementing (3).
I've created https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70369
to keep track of this.

Christophe.

>
>> 1- means that we would test different things depending on how GCC is
>> configured (--with-fpu)
>> 2- means that we would not be able to test the subset which does not
>> require crypto if for some reason we cannot force the right effective
>> target
>> 3- might be a bit more confusing as several places cover the same intrinsics.
>>
>> Thoughts?
>>
>> Thanks,
>>
>> Christophe.


Re: [PATCH] Fix PR c++/70332 (ICE due to aggregate initialization of NSDMI)

2016-03-23 Thread Patrick Palka
On Wed, 23 Mar 2016, Jason Merrill wrote:

> On 03/22/2016 07:12 PM, Patrick Palka wrote:
> > On Tue, Mar 22, 2016 at 6:12 PM, Patrick Palka  wrote:
> > > On Tue, Mar 22, 2016 at 6:00 PM, Jason Merrill  wrote:
> > > > On 03/22/2016 05:35 PM, Patrick Palka wrote:
> > > > > 
> > > > > + if (cp_unevaluated_operand == 0
> > > > 
> > > > 
> > > > Why check this here?
> > > 
> > > Just so that the change doesn't affect the behavior of tsubst_decl()
> > > when cp_unevaluated_operand != 0.  Presumably the existing code (10
> > > lines below) handles that case just fine.
> > 
> > Turns out that without the check we can trigger the cxx_dialect >=
> > cxx14 assert because in c++11 mode we can reach the assert through
> > get_defaulted_eh_spec() which increments cp_unevaluated_operand and
> > then calls get_nsdmi (..., /*in_ctor=*/false) causing
> > current_class_ref to get set to a PLACEHOLDER_EXPR.
> > 
> > So for example g++.dg/cpp0x/nsdmi-template2.C regresses with an ICE.
> > So it seems the cp_unevaluated_operand != 0 check is necessary as long
> > as the assert stays.
> > 
> > There are no regressions if both the cp_unevaluated_operand check and
> > the assert are removed however.
> 
> I think that's my preference.
> 
> Jason
> 
> 
> 

Done.  Does this version look OK to commit after bootstrap + regtesting?

gcc/cp/ChangeLog:

PR c++/70332
* pt.c (tsubst_copy) [PARM_DECL]: Handle the use of 'this' in an
NSDMI that's part of an aggregrate initialization.

gcc/testsuite/ChangeLog:

PR c++/70332
* g++.dg/cpp1y/nsdmi-aggr5.C: New test.
---
 gcc/cp/pt.c  |  9 ++---
 gcc/testsuite/g++.dg/cpp1y/nsdmi-aggr5.C | 24 
 2 files changed, 30 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp1y/nsdmi-aggr5.C

diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index ebfc45b..231d112 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -13877,10 +13877,13 @@ tsubst_copy (tree t, tree args, tsubst_flags_t 
complain, tree in_decl)
 
   if (r == NULL_TREE)
{
- /* We get here for a use of 'this' in an NSDMI.  */
+ /* We get here for a use of 'this' in an NSDMI as part of a
+constructor call or as part of an aggregate initialization.  */
  if (DECL_NAME (t) == this_identifier
- && current_function_decl
- && DECL_CONSTRUCTOR_P (current_function_decl))
+ && ((current_function_decl
+  && DECL_CONSTRUCTOR_P (current_function_decl))
+ || (current_class_ref
+ && TREE_CODE (current_class_ref) == PLACEHOLDER_EXPR)))
return current_class_ptr;
 
  /* This can happen for a parameter name used later in a function
diff --git a/gcc/testsuite/g++.dg/cpp1y/nsdmi-aggr5.C 
b/gcc/testsuite/g++.dg/cpp1y/nsdmi-aggr5.C
new file mode 100644
index 000..fe377c3
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp1y/nsdmi-aggr5.C
@@ -0,0 +1,24 @@
+// PR c++/70332
+// { dg-do run { target c++14 } }
+
+template 
+struct C
+{
+ T m;
+ T *n = &m;
+};
+
+C c { };
+
+int
+main ()
+{
+  *c.n = 5;
+  if (c.m != 5)
+__builtin_abort ();
+
+  C d { 10 };
+  *d.n = *d.n + 1;
+  if (d.m != 11)
+__builtin_abort ();
+}
-- 
2.8.0.rc3.27.gade0865



Re: [RFA][PATCH][tree-optimization/64058] Add new coalescing tie breaker heuristic V2

2016-03-23 Thread Jeff Law

On 03/23/2016 03:14 AM, Richard Biener wrote:

On Wed, Mar 23, 2016 at 8:49 AM, Jeff Law  wrote:


This patch contains two parts.  First is a bit of raw infrastructure in
bitmap.c.  That change factors out the code to count the bits set in a given
BITMAP_WORD and uses that new function from bitmap_count_bits.

It also introduces bitmap_count_unique_bits which counts the unique bits set
across two bitmaps.


The second patch introduces the new tiebreaker in the coalescer and utilizes
the new bitmap_count_unique_bits in the process.  The major change since V1
is lazily counting those bits only when the major sort keys are the same for
two coalesce_pairs.  And we include the data for the new heuristic in the
debugging dumps.

As I feared building a good testcase for this is a major PITA.  Testing for
a specific set of coalescings is not easy at all and IMHO is going to be way
too fragile.

I did verify by hand that the test in 64058 and its duplicate were fixed by
various iterations of this patch using trunk compilers reported in those
BZs.  Spot checking other tests showed a small, but consistent improvement
in the generated code (fewer copies).

Bootstrapped and regression tested on x86-64-linux-gnu.  OK for the trunk
now?


+ssa_conflicts *conflicts_;
+var_map map_;

please make those 'static'.  Ok with this change.

Thanks.  Updated and installed.

jeff



Re: [PATCH, i386, AVX] Fix typo in i386.c/standard_sse_constant_opcode ().

2016-03-23 Thread Uros Bizjak
On Wed, Mar 23, 2016 at 2:03 PM, Jakub Jelinek  wrote:
> On Wed, Mar 23, 2016 at 03:55:01PM +0300, Kirill Yukhin wrote:
>> > No, the change is OK (standard_sse_constant_p does all the checks),
>> > but nowadays we could write this part as:
>> >
>> > --cut here--
>> > diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
>> > index 1639704..59154c3 100644
>> > --- a/gcc/config/i386/i386.c
>> > +++ b/gcc/config/i386/i386.c
>> > @@ -10859,10 +10859,7 @@ standard_sse_constant_opcode (rtx_insn *insn, rtx 
>> > x)
>> >   || get_attr_mode (insn) == MODE_V8DF
>> >   || get_attr_mode (insn) == MODE_V16SF)
>> > return "vpternlogd\t{$0xFF, %g0, %g0, %g0|%g0, %g0, %g0, 0xFF}";
>> > -  if (TARGET_AVX)
>> > -   return "vpcmpeqd\t%0, %0, %0";
>> > -  else
>> > -   return "pcmpeqd\t%0, %0";
>> > +  return "%vpcmpeqd\t%0, %d0";
>> >
>> >  default:
>> >break;
>> > -- cut here--
>> This looks much better to me.
>> Going to bootstrap/regtest.
>>
>> Is it ok to check into main trunk?
>
> But what is the advantage of doing that?  You trade one simple conditional
> for lots more processing at the insn emit time (parse %v, handle if
> (TARGET_AVX) there, parse 'd' modifier, handle it if (TARGET_AVX)
> conditionally).  One thing is in pattern, where it increases readability,
> but in a routine which handles several different conditions anyway?

There are several instances of the same construct in this function, so
I think this change follows the established pattern. And I would do it
in the same way in the original patch, if %d was available back
then...

So, Kirill, please go ahead with the patch (after bootstrap/regtest cycle).

Thanks,
Uros.


Re: [PATCH] Fix PR c++/70332 (ICE due to aggregate initialization of NSDMI)

2016-03-23 Thread Jason Merrill

OK.

Jason


[PATCH] Limit alias walking by speculative devirt

2016-03-23 Thread Richard Biener

This reduces the compile-time for the testcase from PR12392 from
~50s to ~35s, dropping the alias-stmt walking time from 40% to around 8%.

Currently (even when -fno-devirtualize-speculatively - heh) when
looking for a must-def that specifies the dynamic type of an object
we invoke a virtual call on we skip may-defs speculatively until
we run into the function start which of course may be quite some
work to do and which is of course not acceptable.

The following limits the number of may-defs we skip.

It does not limit the number of stmts we skip as non-aliasing, thus we can
still run into large overhead cases but that would require to assign an
overall budget to the current function which isn't that trivial
because this is a helper which is called from multiple places in GCC.
Ideally the devirt machinery would record interesting must-defs in
a body walk and thus when looking for it it could find candidates
with a hashtable lookup and would only need to check whether there
is no intermediate may-def.  But as said, with the "tool" nature
of the devirt thing this is hard (but maybe speculative devirt
is really only done at IPA time and not from PRE?).

Bootstrap and regtest running on x86_64-unknown-linux-gnu.

Honza, is this ok?  Can you check effects on devirt numbers for
the testcases you have monitored that?

Thanks,
Richard.

2016-03-23  Michael Matz  
Richard Biener  

PR ipa/12392
* ipa-polymorphic-call.c (struct type_change_info): Change
speculative to an unsigned allowing to limit the work we do.
(MAXSPEC): New define.
(check_stmt_for_type_change): Limit the number of may-defs
skipped for speculative devirtualization to
max-speculative-devirt-maydefs.
* params.def (max-speculative-devirt-maydefs): New param.
* doc/invoke.texi (--param max-speculative-devirt-maydefs): Document.

Index: gcc/params.def
===
*** gcc/params.def  (revision 234415)
--- gcc/params.def  (working copy)
*** DEFPARAM (PARAM_HSA_GEN_DEBUG_STORES,
*** 1203,1208 
--- 1203,1214 
  "hsa-gen-debug-stores",
  "Level of hsa debug stores verbosity",
  0, 0, 1)
+ 
+ DEFPARAM (PARAM_MAX_SPECULATIVE_DEVIRT_MAYDEFS,
+ "max-speculative-devirt-maydefs",
+ "Maximum number of may-defs visited when devirtualizing "
+ "speculatively", 50, 0, 0)
+ 
  /*
  
  Local variables:
Index: gcc/ipa-polymorphic-call.c
===
*** gcc/ipa-polymorphic-call.c  (revision 234415)
--- gcc/ipa-polymorphic-call.c  (working copy)
*** along with GCC; see the file COPYING3.
*** 38,43 
--- 38,44 
  #include "tree-dfa.h"
  #include "gimple-pretty-print.h"
  #include "tree-into-ssa.h"
+ #include "params.h"
  
  /* Return true when TYPE contains an polymorphic type and thus is interesting
 for devirtualization machinery.  */
*** struct type_change_info
*** 1094,1107 
tree known_current_type;
HOST_WIDE_INT known_current_offset;
  
/* Set to true if dynamic type change has been detected.  */
bool type_maybe_changed;
/* Set to true if multiple types have been encountered.  known_current_type
   must be disregarded in that case.  */
bool multiple_types_encountered;
-   /* Set to true if we possibly missed some dynamic type changes and we should
-  consider the set to be speculative.  */
-   bool speculative;
bool seen_unanalyzed_store;
  };
  
--- 1095,1109 
tree known_current_type;
HOST_WIDE_INT known_current_offset;
  
+   /* Set to nonzero if we possibly missed some dynamic type changes and we
+  should consider the set to be speculative.  */
+   unsigned speculative;
+ 
/* Set to true if dynamic type change has been detected.  */
bool type_maybe_changed;
/* Set to true if multiple types have been encountered.  known_current_type
   must be disregarded in that case.  */
bool multiple_types_encountered;
bool seen_unanalyzed_store;
  };
  
*** record_known_type (struct type_change_in
*** 1338,1343 
--- 1340,1353 
tci->type_maybe_changed = true;
  }
  
+ 
+ /* The maximum number of may-defs we visit when looking for a must-def
+that changes the dynamic type in check_stmt_for_type_change.  Tuned
+after the PR12392 testcase which unlimited spends 40% time within
+these alias walks and 8% with the following limit.  */
+ 
+ #define MAXSPEC (PARAM_VALUE (PARAM_MAX_SPECULATIVE_DEVIRT_MAYDEFS))
+ 
  /* Callback of walk_aliased_vdefs and a helper function for
 detect_type_change to check whether a particular statement may modify
 the virtual table pointer, and if possible also determine the new type of
*** check_stmt_for_type_change (ao_ref *ao A
*** 1384,1398 
  &size, &max_size, &reverse);

Re: [PR69315] enable finish_function to recurse for constexpr functions

2016-03-23 Thread Jakub Jelinek
On Tue, Mar 22, 2016 at 05:28:10PM -0400, Jason Merrill wrote:
> Jakub, you added defer_mark_used_calls for BZ 37189, do you think it's still
> needed?  The testcase passes without it now.

That's a question.  Digging through history, I found:
1) r149750 aka gimplification unit-at-a-time
2) PR48869 that fixed the regression caused by that; so we now should be
   trying to get_copy_ctor/get_dtor for parallel/task already during
   cp-genericize.c
But, looking at this function, e.g. taskloop construct isn't handled,
will try to construct a testcase where it could matter.  Dunno about OpenACC
or Cilk+ constructs, those aren't handled during cp-gimplify.c either.
Though, as we have gimplification unit-at-a-time, I'd say that even if we
synthetize_method during gimplification of something, we'd still enqueue it
in cgraph for later analysis instead of gimplification right away.
The gimplifier still asserts that it isn't called recursively, so I'd hope
the patch can be reverted (test kept).

Jakub


Re: C PATCH for c/70297 (crash with duplicate typedefs and -g)

2016-03-23 Thread Marek Polacek
On Mon, Mar 21, 2016 at 09:57:54PM +0100, Richard Biener wrote:
> On March 21, 2016 6:55:28 PM GMT+01:00, Marek Polacek  
> wrote:
> >This PR points out to a GC problem: when we freed a duplicate typedef,
> >we were
> >leaving its type in the variants list, with its TYPE_NAME still
> >pointing to the
> >now-freed TYPE_DECL, leading to a crash.   I was lucky to discover that
> >the
> >exact same problem was fixed in November for the C++ FE, so I could
> >just follow
> >suit here.  And because that change didn't add a new testcase, I'm
> >putting the
> >new test into c-c++-common/.
> >
> >Bootstrapped/regtested on x86_64-linux, ok for trunk/5?
> 
> But IIRC this will drop the aligned attribute effect as that applied to the 
> new type?

Yes.  So this metamorphosed into another issue: what to do with

typedef int T;
typedef int T __attribute__((aligned (16)));
typedef int T __attribute__((aligned (32)));
?  Either we can reject this, or do what clang (and other compilers) do, that
is pick the strictest alignment - 32 in this case.  The following patch does
that.
I've also checked what the C FE does for a few other attributes:
packed
visibility
 - ignored on TYPE_DECLs
used
unused
deprecated
scalar_storage_order
 - we seem to do the right thing
transparent_union
 - typedef int T;
   typedef int T __attribute__((transparent_union));
   gives an error
vector_size
mode
 - typedef int T;
   typedef int T __attribute__((mode (SI)));
   works while
   typedef int T;
   typedef int T __attribute__((mode (DI)));
   results in "conflicts" error
   similarly for vector_size

Well, here's the patch.  Thoughts?

Bootstrapped/regtested on x86_64-linux, ok for trunk?

2016-03-23  Marek Polacek  

PR c/70297
* c-decl.c (merge_decls): Also set TYPE_ALIGN and TYPE_USER_ALIGN.

* decl.c (duplicate_decls): Also set TYPE_ALIGN and TYPE_USER_ALIGN.

* c-c++-common/pr70297.c: New test.
* g++.dg/cpp0x/typedef-redecl.C: New test.
* gcc.dg/typedef-redecl2.c: New test.

diff --git gcc/c/c-decl.c gcc/c/c-decl.c
index bab4715..0dd2121 100644
--- gcc/c/c-decl.c
+++ gcc/c/c-decl.c
@@ -2358,6 +2358,35 @@ merge_decls (tree newdecl, tree olddecl, tree newtype, 
tree oldtype)
   DECL_ATTRIBUTES (newdecl)
 = targetm.merge_decl_attributes (olddecl, newdecl);
 
+  /* For typedefs use the old type, as the new type's DECL_NAME points
+ at newdecl, which will be ggc_freed.  */
+  if (TREE_CODE (newdecl) == TYPE_DECL)
+{
+  /* But NEWTYPE might have an attribute, honor that.  */
+  tree tem = newtype;
+  newtype = oldtype;
+
+  if (TYPE_USER_ALIGN (tem))
+   {
+ if (TYPE_ALIGN (tem) > TYPE_ALIGN (newtype))
+   TYPE_ALIGN (newtype) = TYPE_ALIGN (tem);
+ TYPE_USER_ALIGN (newtype) = true;
+   }
+
+  /* And remove the new type from the variants list.  */
+  if (TYPE_NAME (TREE_TYPE (newdecl)) == newdecl)
+   {
+ tree remove = TREE_TYPE (newdecl);
+ for (tree t = TYPE_MAIN_VARIANT (remove); ;
+  t = TYPE_NEXT_VARIANT (t))
+   if (TYPE_NEXT_VARIANT (t) == remove)
+ {
+   TYPE_NEXT_VARIANT (t) = TYPE_NEXT_VARIANT (remove);
+   break;
+ }
+   }
+}
+
   /* Merge the data types specified in the two decls.  */
   TREE_TYPE (newdecl)
 = TREE_TYPE (olddecl)
diff --git gcc/cp/decl.c gcc/cp/decl.c
index 47a53cb..c21d321 100644
--- gcc/cp/decl.c
+++ gcc/cp/decl.c
@@ -2033,8 +2033,17 @@ duplicate_decls (tree newdecl, tree olddecl, bool 
newdecl_is_friend)
 at newdecl, which will be ggc_freed.  */
   if (TREE_CODE (newdecl) == TYPE_DECL)
{
+ /* But NEWTYPE might have an attribute, honor that.  */
+ tree tem = TREE_TYPE (newdecl);
  newtype = oldtype;
 
+ if (TYPE_USER_ALIGN (tem))
+   {
+ if (TYPE_ALIGN (tem) > TYPE_ALIGN (newtype))
+   TYPE_ALIGN (newtype) = TYPE_ALIGN (tem);
+ TYPE_USER_ALIGN (newtype) = true;
+   }
+
  /* And remove the new type from the variants list.  */
  if (TYPE_NAME (TREE_TYPE (newdecl)) == newdecl)
{
diff --git gcc/testsuite/c-c++-common/pr70297.c 
gcc/testsuite/c-c++-common/pr70297.c
index e69de29..70a4f15 100644
--- gcc/testsuite/c-c++-common/pr70297.c
+++ gcc/testsuite/c-c++-common/pr70297.c
@@ -0,0 +1,9 @@
+/* PR c/70297 */
+/* { dg-do compile } */
+/* { dg-options "-g" } */
+
+typedef int T;
+typedef int T __attribute__((aligned (4)));
+struct S {
+  T *t;
+};
diff --git gcc/testsuite/g++.dg/cpp0x/typedef-redecl.C 
gcc/testsuite/g++.dg/cpp0x/typedef-redecl.C
index e69de29..576c7ce 100644
--- gcc/testsuite/g++.dg/cpp0x/typedef-redecl.C
+++ gcc/testsuite/g++.dg/cpp0x/typedef-redecl.C
@@ -0,0 +1,12 @@
+// PR c/70297
+// { dg-do compile { target c++11 } }
+
+#define N 64
+
+typedef int T;
+typedef int T __attribute__((aligned (N)));
+typedef int T __attribute__((aligned (N * 2)));
+typedef int T __a

Re: [PATCH PR69042/01]Add IV candidate for use with constant offset stripped in base.

2016-03-23 Thread Bin.Cheng
On Tue, Mar 22, 2016 at 11:01 AM, Richard Biener
 wrote:
> On Tue, Mar 22, 2016 at 11:22 AM, Bin.Cheng  wrote:
>> On Wed, Mar 16, 2016 at 10:06 AM, Richard Biener
>>  wrote:
>>>
>>> On Wed, Mar 16, 2016 at 10:48 AM, Bin Cheng  wrote:
>>> > Hi,
>>> > When I tried to decrease # of IV candidates, I removed code that adds IV 
>>> > candidates for use with constant offset stripped in use->base.  This is 
>>> > kind of too aggressive and triggers PR69042.  So here is a patch adding 
>>> > back the missing candidates.  Honestly, this patch doesn't truly fix the 
>>> > issue, it just brings back the original behavior in IVOPT part (Which is 
>>> > still a right thing to do I think).  The issue still depends on 
>>> > PIC_OFFSET register used on x86 target.  As discussed in 
>>> > https://gcc.gnu.org/ml/gcc/2016-02/msg00040.html.  Furthermore, the real 
>>> > problem could be in register pressure modeling about PIC_OFFSET symbol in 
>>> > IVOPT.
>>> >
>>> > On AArch64, overall spec2k number isn't changed, though 173.applu is 
>>> > regressed by ~4% because couple of loops' # of candidates now hits 
>>> > "--param iv-consider-all-candidates-bound=30".  For spec2k6 data on 
>>> > AArch64, INT is not affected; FP overall is not changed, as for specific 
>>> > case: 459.GemsFDTD is regressed by 2%, 433.milc is improved by 2%.  To 
>>> > address the regression, I will send another patch increasing the 
>>> > parameter bound.
>>> >
>>> > Bootstrap&test on x86_64 and AArch64, is it OK?  In the meantime, I will 
>>> > collect spec2k6 data on x86_64.
>>>
>>> Ok.
>> Hi Richard,
>> Hmm, I got spec2k6 data on my x86_64, it (along with patch increasing
>> param iv-consider-all-candidates-bound) causes 1% regression for
>> 436.cactusADM in my run.  I looked into the code, for function
>> bench_staggeredleapfrog2_ (takes 99% running time after patching),
>> IVOPT chooses one fewer candidates for outer loop, but it does result
>> in couple of more instructions there.
>
> You mean IVOPTs chooses one fewer IVs for the outer loop?
Yes.
>
>>  For this case, register
>> pressure is a more interesting issue (36 candidates chosen in outer
>> loop, many stack accesses), not sure if this 1% regression blocks the
>> patch at this stage, or not?
>
> Is this with or without the increase of the param?  What compiler options and
> on what sub-architecture was this?
It's not the param, we have both address and generic iv_uses in the
form of {base + 4, step}, and there is another address iv_use {base +
8, step}.  With the patch we now add candidate {base, step}, which
will be used for all these uses.  Before the patch, candidates {base +
4, step} and {base + 8, step} are chosen for these iv_uses
respectively.
As a matter of fact, {base + 4, step} should be the best choice, but
we can't do that because candidate added from one iv_use is not
related to other iv_use and we exceed the bound of param
iv-consider-all-candidates-bound = 30.
>
> I think if the IVO choice looks optimal before the patch and not optimal after
> then it's worth blocking but it sounds like the IVO choice is a mess anyway?
Yes, it's a mess here.  For three outermost loops in 436, IVO chooses
>32 candidates for each loop.  I found at least one bug in register
pressure calculation is responsible for this.  By fixing this (plus
one another local patch), # of candidates of these loops are reduced
from 36/36/32 to 12/12/7.  And # of lines of assembly is reduced from
~7150 to ~6850.  Interesting thing is, the performance isn't
improved...  That may be because they are outermost loops.

> [can you maybe check IV choice by ICC?]
I don't have ICC at hand, but from past experiments, ICC tends to
reuse more common IVs than GCC.

Thanks,
bin


Re: Patches to fix GCC’s C++ exception handling on NetBSD/VAX

2016-03-23 Thread Christos Zoulas
In article ,
Jake Hamby   wrote:

Hi,

Thanks a lot for your patch. I applied it to our gcc-5 in the tree.
Unfortunately gcc-5 seems that it was never tested to even compile.
I fixed the simple compilation issue, but it fails to compile some
files with an internal error trying to construct a dwarf frame info
element with the wrong register. If you have some time, can you
take a look? I will too.

Thanks,

christos



[PATCH] Fix PR70372

2016-03-23 Thread Richard Biener

I am testing the following obvious patch to fix an ICE in reassoc.

Bootstrap / regtest on x86_64-unknown-linux-gnu running.

Richard.

2016-03-23  Richard Biener  

PR tree-optimization/70372
* tree-ssa-reassoc.c (eliminate_plus_minus_pair): Use
build_minus_one_cst to also handle vector types correctly.

* gcc.dg/tree-ssa/pr70372.c: New testcase.

Index: gcc/tree-ssa-reassoc.c
===
*** gcc/tree-ssa-reassoc.c  (revision 234415)
--- gcc/tree-ssa-reassoc.c  (working copy)
*** eliminate_plus_minus_pair (enum tree_cod
*** 821,827 
}
  
  ops->ordered_remove (i);
! add_to_ops_vec (ops, build_int_cst_type (op_type, -1));
  ops->ordered_remove (currindex);
  reassociate_stats.ops_eliminated ++;
  
--- 821,827 
}
  
  ops->ordered_remove (i);
! add_to_ops_vec (ops, build_minus_one_cst (op_type));
  ops->ordered_remove (currindex);
  reassociate_stats.ops_eliminated ++;
  
Index: gcc/testsuite/gcc.dg/tree-ssa/pr70372.c
===
*** gcc/testsuite/gcc.dg/tree-ssa/pr70372.c (revision 0)
--- gcc/testsuite/gcc.dg/tree-ssa/pr70372.c (working copy)
***
*** 0 
--- 1,14 
+ /* { dg-do compile } */
+ /* { dg-options "-O -fno-tree-fre -w -Wno-psabi" } */
+ 
+ typedef unsigned v2ti __attribute__ ((vector_size (32)));
+ 
+ v2ti
+ foo (v2ti u, v2ti v)
+ {
+   u[0] >>= 0xf;
+   v ^= ~v;
+   v &= ~u;
+   v -= -u;
+   return v;
+ }


Re: [PATCH PR69042/01]Add IV candidate for use with constant offset stripped in base.

2016-03-23 Thread Richard Biener
On Wed, Mar 23, 2016 at 2:58 PM, Bin.Cheng  wrote:
> On Tue, Mar 22, 2016 at 11:01 AM, Richard Biener
>  wrote:
>> On Tue, Mar 22, 2016 at 11:22 AM, Bin.Cheng  wrote:
>>> On Wed, Mar 16, 2016 at 10:06 AM, Richard Biener
>>>  wrote:

 On Wed, Mar 16, 2016 at 10:48 AM, Bin Cheng  wrote:
 > Hi,
 > When I tried to decrease # of IV candidates, I removed code that adds IV 
 > candidates for use with constant offset stripped in use->base.  This is 
 > kind of too aggressive and triggers PR69042.  So here is a patch adding 
 > back the missing candidates.  Honestly, this patch doesn't truly fix the 
 > issue, it just brings back the original behavior in IVOPT part (Which is 
 > still a right thing to do I think).  The issue still depends on 
 > PIC_OFFSET register used on x86 target.  As discussed in 
 > https://gcc.gnu.org/ml/gcc/2016-02/msg00040.html.  Furthermore, the real 
 > problem could be in register pressure modeling about PIC_OFFSET symbol 
 > in IVOPT.
 >
 > On AArch64, overall spec2k number isn't changed, though 173.applu is 
 > regressed by ~4% because couple of loops' # of candidates now hits 
 > "--param iv-consider-all-candidates-bound=30".  For spec2k6 data on 
 > AArch64, INT is not affected; FP overall is not changed, as for specific 
 > case: 459.GemsFDTD is regressed by 2%, 433.milc is improved by 2%.  To 
 > address the regression, I will send another patch increasing the 
 > parameter bound.
 >
 > Bootstrap&test on x86_64 and AArch64, is it OK?  In the meantime, I will 
 > collect spec2k6 data on x86_64.

 Ok.
>>> Hi Richard,
>>> Hmm, I got spec2k6 data on my x86_64, it (along with patch increasing
>>> param iv-consider-all-candidates-bound) causes 1% regression for
>>> 436.cactusADM in my run.  I looked into the code, for function
>>> bench_staggeredleapfrog2_ (takes 99% running time after patching),
>>> IVOPT chooses one fewer candidates for outer loop, but it does result
>>> in couple of more instructions there.
>>
>> You mean IVOPTs chooses one fewer IVs for the outer loop?
> Yes.
>>
>>>  For this case, register
>>> pressure is a more interesting issue (36 candidates chosen in outer
>>> loop, many stack accesses), not sure if this 1% regression blocks the
>>> patch at this stage, or not?
>>
>> Is this with or without the increase of the param?  What compiler options and
>> on what sub-architecture was this?
> It's not the param, we have both address and generic iv_uses in the
> form of {base + 4, step}, and there is another address iv_use {base +
> 8, step}.  With the patch we now add candidate {base, step}, which
> will be used for all these uses.  Before the patch, candidates {base +
> 4, step} and {base + 8, step} are chosen for these iv_uses
> respectively.
> As a matter of fact, {base + 4, step} should be the best choice, but
> we can't do that because candidate added from one iv_use is not
> related to other iv_use and we exceed the bound of param
> iv-consider-all-candidates-bound = 30.

I suppose upping that to 40 doesn't help here either (AFAIR the loops are quite
big).

>>
>> I think if the IVO choice looks optimal before the patch and not optimal 
>> after
>> then it's worth blocking but it sounds like the IVO choice is a mess anyway?
> Yes, it's a mess here.  For three outermost loops in 436, IVO chooses
>>32 candidates for each loop.  I found at least one bug in register
> pressure calculation is responsible for this.  By fixing this (plus
> one another local patch), # of candidates of these loops are reduced
> from 36/36/32 to 12/12/7.  And # of lines of assembly is reduced from
> ~7150 to ~6850.  Interesting thing is, the performance isn't
> improved...  That may be because they are outermost loops.

Yeah, maybe a good default heuristic would be to keep existing BIVs
and add no related candidates at all for outer loops.  Or to adjust
heuristic to prefer the least number of IVs for those.

>> [can you maybe check IV choice by ICC?]
> I don't have ICC at hand, but from past experiments, ICC tends to
> reuse more common IVs than GCC.

Ok.

I think the patch is ok and eventually we need to revisit the outer loop
handling with this as an excuse.

Richard.

> Thanks,
> bin


Re: Fix 69650, bogus line numbers from libcpp

2016-03-23 Thread Richard Biener
On Wed, Mar 23, 2016 at 2:15 PM, Bernd Schmidt  wrote:
> On 03/23/2016 01:41 PM, Richard Biener wrote:
>>
>> Btw, the issue in the PR is also fixed with a simple
>>
>> Index: libcpp/line-map.c
>> ===
>> --- libcpp/line-map.c   (revision 234415)
>> +++ libcpp/line-map.c   (working copy)
>> @@ -543,7 +543,7 @@ linemap_add (struct line_maps *set, enum
>>   to_file);
>>
>> /* A TO_FILE of NULL is special - we use the natural values.  */
>> -  if (error || to_file == NULL)
>> +  if (to_file == NULL)
>>  {
>>to_file = ORDINARY_MAP_FILE_NAME (from);
>>to_line = SOURCE_LINE (from, from[1].start_location);
>
>
> I looked at that, but that made it hard to add the testcase as the line
> numbers no longer match the dg-error directives. By moving this code we can
> ignore the erroneous #line directive, and for this one testcase at least,
> that makes the line numbers (and caret diagnostics etc.) come out right.

After some more digging and looking at your patch I'd approve that if it would
emit a warning rather than an error - so can you please adjust it?

Thanks,
Richard.

>
> Bernd


Re: [PATCH] Fix PR70372

2016-03-23 Thread Marc Glisse

On Wed, 23 Mar 2016, Richard Biener wrote:


I am testing the following obvious patch to fix an ICE in reassoc.

Bootstrap / regtest on x86_64-unknown-linux-gnu running.

Richard.

2016-03-23  Richard Biener  

PR tree-optimization/70372
* tree-ssa-reassoc.c (eliminate_plus_minus_pair): Use
build_minus_one_cst to also handle vector types correctly.


Are complex numbers also handled in reassoc? If so, build_all_ones_cst 
seems better. I never know if integer complex are supported...



* gcc.dg/tree-ssa/pr70372.c: New testcase.

Index: gcc/tree-ssa-reassoc.c
===
*** gcc/tree-ssa-reassoc.c  (revision 234415)
--- gcc/tree-ssa-reassoc.c  (working copy)
*** eliminate_plus_minus_pair (enum tree_cod
*** 821,827 
}

  ops->ordered_remove (i);
! add_to_ops_vec (ops, build_int_cst_type (op_type, -1));
  ops->ordered_remove (currindex);
  reassociate_stats.ops_eliminated ++;

--- 821,827 
}

  ops->ordered_remove (i);
! add_to_ops_vec (ops, build_minus_one_cst (op_type));
  ops->ordered_remove (currindex);
  reassociate_stats.ops_eliminated ++;

Index: gcc/testsuite/gcc.dg/tree-ssa/pr70372.c
===
*** gcc/testsuite/gcc.dg/tree-ssa/pr70372.c (revision 0)
--- gcc/testsuite/gcc.dg/tree-ssa/pr70372.c (working copy)
***
*** 0 
--- 1,14 
+ /* { dg-do compile } */
+ /* { dg-options "-O -fno-tree-fre -w -Wno-psabi" } */
+
+ typedef unsigned v2ti __attribute__ ((vector_size (32)));
+
+ v2ti
+ foo (v2ti u, v2ti v)
+ {
+   u[0] >>= 0xf;
+   v ^= ~v;
+   v &= ~u;
+   v -= -u;
+   return v;
+ }



--
Marc Glisse


Re: [PATCH] Fix PR70372

2016-03-23 Thread Richard Biener
On Wed, 23 Mar 2016, Marc Glisse wrote:

> On Wed, 23 Mar 2016, Richard Biener wrote:
> 
> > I am testing the following obvious patch to fix an ICE in reassoc.
> > 
> > Bootstrap / regtest on x86_64-unknown-linux-gnu running.
> > 
> > Richard.
> > 
> > 2016-03-23  Richard Biener  
> > 
> > PR tree-optimization/70372
> > * tree-ssa-reassoc.c (eliminate_plus_minus_pair): Use
> > build_minus_one_cst to also handle vector types correctly.
> 
> Are complex numbers also handled in reassoc? If so, build_all_ones_cst seems
> better. I never know if integer complex are supported...

I think so, they are ANY_INTEGRAL_TYPE_P.  I'll adjust accordingly.

Richard.


Re: [PATCH PR69042/01]Add IV candidate for use with constant offset stripped in base.

2016-03-23 Thread Bin.Cheng
On Wed, Mar 23, 2016 at 2:18 PM, Richard Biener
 wrote:
> On Wed, Mar 23, 2016 at 2:58 PM, Bin.Cheng  wrote:
>> On Tue, Mar 22, 2016 at 11:01 AM, Richard Biener
>>  wrote:
>>> On Tue, Mar 22, 2016 at 11:22 AM, Bin.Cheng  wrote:
 On Wed, Mar 16, 2016 at 10:06 AM, Richard Biener
  wrote:
>
> On Wed, Mar 16, 2016 at 10:48 AM, Bin Cheng  wrote:
> > Hi,
> > When I tried to decrease # of IV candidates, I removed code that adds 
> > IV candidates for use with constant offset stripped in use->base.  This 
> > is kind of too aggressive and triggers PR69042.  So here is a patch 
> > adding back the missing candidates.  Honestly, this patch doesn't truly 
> > fix the issue, it just brings back the original behavior in IVOPT part 
> > (Which is still a right thing to do I think).  The issue still depends 
> > on PIC_OFFSET register used on x86 target.  As discussed in 
> > https://gcc.gnu.org/ml/gcc/2016-02/msg00040.html.  Furthermore, the 
> > real problem could be in register pressure modeling about PIC_OFFSET 
> > symbol in IVOPT.
> >
> > On AArch64, overall spec2k number isn't changed, though 173.applu is 
> > regressed by ~4% because couple of loops' # of candidates now hits 
> > "--param iv-consider-all-candidates-bound=30".  For spec2k6 data on 
> > AArch64, INT is not affected; FP overall is not changed, as for 
> > specific case: 459.GemsFDTD is regressed by 2%, 433.milc is improved by 
> > 2%.  To address the regression, I will send another patch increasing 
> > the parameter bound.
> >
> > Bootstrap&test on x86_64 and AArch64, is it OK?  In the meantime, I 
> > will collect spec2k6 data on x86_64.
>
> Ok.
 Hi Richard,
 Hmm, I got spec2k6 data on my x86_64, it (along with patch increasing
 param iv-consider-all-candidates-bound) causes 1% regression for
 436.cactusADM in my run.  I looked into the code, for function
 bench_staggeredleapfrog2_ (takes 99% running time after patching),
 IVOPT chooses one fewer candidates for outer loop, but it does result
 in couple of more instructions there.
>>>
>>> You mean IVOPTs chooses one fewer IVs for the outer loop?
>> Yes.
>>>
  For this case, register
 pressure is a more interesting issue (36 candidates chosen in outer
 loop, many stack accesses), not sure if this 1% regression blocks the
 patch at this stage, or not?
>>>
>>> Is this with or without the increase of the param?  What compiler options 
>>> and
>>> on what sub-architecture was this?
>> It's not the param, we have both address and generic iv_uses in the
>> form of {base + 4, step}, and there is another address iv_use {base +
>> 8, step}.  With the patch we now add candidate {base, step}, which
>> will be used for all these uses.  Before the patch, candidates {base +
>> 4, step} and {base + 8, step} are chosen for these iv_uses
>> respectively.
>> As a matter of fact, {base + 4, step} should be the best choice, but
>> we can't do that because candidate added from one iv_use is not
>> related to other iv_use and we exceed the bound of param
>> iv-consider-all-candidates-bound = 30.
>
> I suppose upping that to 40 doesn't help here either (AFAIR the loops are 
> quite
> big).
Yes, there are ~100 candidates in total for each loop.
>
>>>
>>> I think if the IVO choice looks optimal before the patch and not optimal 
>>> after
>>> then it's worth blocking but it sounds like the IVO choice is a mess anyway?
>> Yes, it's a mess here.  For three outermost loops in 436, IVO chooses
>>>32 candidates for each loop.  I found at least one bug in register
>> pressure calculation is responsible for this.  By fixing this (plus
>> one another local patch), # of candidates of these loops are reduced
>> from 36/36/32 to 12/12/7.  And # of lines of assembly is reduced from
>> ~7150 to ~6850.  Interesting thing is, the performance isn't
>> improved...  That may be because they are outermost loops.
>
> Yeah, maybe a good default heuristic would be to keep existing BIVs
> and add no related candidates at all for outer loops.  Or to adjust
> heuristic to prefer the least number of IVs for those.
>
>>> [can you maybe check IV choice by ICC?]
>> I don't have ICC at hand, but from past experiments, ICC tends to
>> reuse more common IVs than GCC.
>
> Ok.
>
> I think the patch is ok and eventually we need to revisit the outer loop
> handling with this as an excuse.
Yes, currently GCC doesn't handle loop nest at all.  Will revisit this
part in coming stage1.
Patch applied as revision 234429.

Thanks,
bin


Re: [PATCH PR69042/01]Add IV candidate for use with constant offset stripped in base.

2016-03-23 Thread Jeff Law

On 03/23/2016 09:29 AM, Bin.Cheng wrote:



I think the patch is ok and eventually we need to revisit the outer loop
handling with this as an excuse.

Yes, currently GCC doesn't handle loop nest at all.  Will revisit this
part in coming stage1.
Patch applied as revision 234429.

So given there's follow-up items to do, it's not clear the state of 69042.

Are the regressions for 69042 resolved?  If so, we should consider 
closing 69042 and opening a new BZ for the additional items we want to 
fix in the next stage1 cycle.


If not, let's make sure the 69042 is clear on what is still regressing.

Jeff


Re: [PR69315] enable finish_function to recurse for constexpr functions

2016-03-23 Thread Jakub Jelinek
On Wed, Mar 23, 2016 at 02:57:36PM +0100, Jakub Jelinek wrote:
> On Tue, Mar 22, 2016 at 05:28:10PM -0400, Jason Merrill wrote:
> > Jakub, you added defer_mark_used_calls for BZ 37189, do you think it's still
> > needed?  The testcase passes without it now.
> 
> That's a question.  Digging through history, I found:
> 1) r149750 aka gimplification unit-at-a-time
> 2) PR48869 that fixed the regression caused by that; so we now should be
>trying to get_copy_ctor/get_dtor for parallel/task already during
>cp-genericize.c
> But, looking at this function, e.g. taskloop construct isn't handled,
> will try to construct a testcase where it could matter.  Dunno about OpenACC
> or Cilk+ constructs, those aren't handled during cp-gimplify.c either.
> Though, as we have gimplification unit-at-a-time, I'd say that even if we
> synthetize_method during gimplification of something, we'd still enqueue it
> in cgraph for later analysis instead of gimplification right away.
> The gimplifier still asserts that it isn't called recursively, so I'd hope
> the patch can be reverted (test kept).

I'll test then following (had to tweak the testcase a little bit, put c++11
into target, move it to proper dir, and build with -O2 (as starting with
r233671 we don't ICE on the testcase when -fno-inline, which is the default
at -O0).

2016-03-23  Alexandre Oliva  
Jason Merrill  
Jakub Jelinek  

PR c++/69315
* cp-tree.h (defer_mark_used_calls, deferred_mark_used_calls): Remove.
* decl.c (defer_mark_used_calls, deferred_mark_used_calls): Remove.
(finish_function): Don't set or test them.
* decl2.c (mark_used): Don't handle defer_mark_used_calls.

* g++.dg/cpp0x/constexpr-69315.C: New test.
* g++.dg/cpp0x/variadic122.C: Change one dg-warning into dg-bogus.

--- gcc/cp/cp-tree.h.jj 2016-03-21 10:12:31.0 +0100
+++ gcc/cp/cp-tree.h2016-03-23 16:38:46.193501388 +0100
@@ -5846,8 +5846,6 @@ extern tree fndecl_declared_return_type
 extern bool undeduced_auto_decl(tree);
 extern void require_deduced_type   (tree);
 
-extern bool defer_mark_used_calls;
-extern GTY(()) vec *deferred_mark_used_calls;
 extern tree finish_case_label  (location_t, tree, tree);
 extern tree cxx_maybe_build_cleanup(tree, tsubst_flags_t);
 
--- gcc/cp/decl.c.jj2016-03-22 09:05:31.0 +0100
+++ gcc/cp/decl.c   2016-03-23 16:39:11.563158827 +0100
@@ -227,11 +227,6 @@ struct GTY((for_user)) named_label_entry
function, two inside the body of a function in a local class, etc.)  */
 int function_depth;
 
-/* To avoid unwanted recursion, finish_function defers all mark_used calls
-   encountered during its execution until it finishes.  */
-bool defer_mark_used_calls;
-vec *deferred_mark_used_calls;
-
 /* States indicating how grokdeclarator() should handle declspecs marked
with __attribute__((deprecated)).  An object declared as
__attribute__((deprecated)) suppresses warnings of uses of other
@@ -14594,9 +14589,6 @@ finish_function (int flags)
   if (c_dialect_objc ())
 objc_finish_function ();
 
-  gcc_assert (!defer_mark_used_calls);
-  defer_mark_used_calls = true;
-
   record_key_method_defined (fndecl);
 
   fntype = TREE_TYPE (fndecl);
@@ -14846,17 +14838,6 @@ finish_function (int flags)
   /* Clean up.  */
   current_function_decl = NULL_TREE;
 
-  defer_mark_used_calls = false;
-  if (deferred_mark_used_calls)
-{
-  unsigned int i;
-  tree decl;
-
-  FOR_EACH_VEC_SAFE_ELT (deferred_mark_used_calls, i, decl)
-   mark_used (decl);
-  vec_free (deferred_mark_used_calls);
-}
-
   invoke_plugin_callbacks (PLUGIN_FINISH_PARSE_FUNCTION, fndecl);
   return fndecl;
 }
--- gcc/cp/decl2.c.jj   2016-02-19 08:55:04.0 +0100
+++ gcc/cp/decl2.c  2016-03-23 16:39:28.604928715 +0100
@@ -5140,14 +5140,6 @@ mark_used (tree decl, tsubst_flags_t com
   if (DECL_ODR_USED (decl))
 return true;
 
-  /* If within finish_function, defer the rest until that function
- finishes, otherwise it might recurse.  */
-  if (defer_mark_used_calls)
-{
-  vec_safe_push (deferred_mark_used_calls, decl);
-  return true;
-}
-
   /* Normally, we can wait until instantiation-time to synthesize DECL.
  However, if DECL is a static data member initialized with a constant
  or a constexpr function, we need it right now because a reference to
--- gcc/testsuite/g++.dg/cpp0x/constexpr-69315.C.jj 2016-03-23 
16:48:59.675217676 +0100
+++ gcc/testsuite/g++.dg/cpp0x/constexpr-69315.C2016-03-23 
16:46:37.0 +0100
@@ -0,0 +1,35 @@
+// PR c++/69315
+// { dg-do compile { target c++11 } }
+// { dg-options "-O2" }
+
+// Template instantiation and evaluation for folding within
+// finish_function may call finish_function recursively.
+// Make sure we don't reject or delay that sort of recursion.
+
+template  struct Iter;
+
+struct Arg {
+  Iter begin()

rs6000 stack_tie mishap again

2016-03-23 Thread Olivier Hainque
Hello,

This is a proposal to address what I think has been a long-standing, very
nasty latent code generation problem, which just manifested in-house on a
proprietary testcase.

This is intricate, so long email ...

Visible misbehavior


The visible effect is a powerpc-eabispe compiler (no red-zone) producing an
epilogue sequence like

   addi %r11,%r1,184# temp pointer within frame

   addi %r1,%r11,104# release frame

   evldd %r21,16(%r11)  # restore registers
   ...  # ...
   evldd %r31,96(%r11)  # ...

   blr  # return

This is causing troubles in situations where interrupt code somehow quicks in
between the frame release and the blr, clobbering the stack slots before the
register restore insn have run.

Triggering conditions
=

We are witnessing a situation where the rs6000 stack_tie mechanism is
ineffective, out of a missed dependency between the stack_tie insn and some
register restores.

We have observed this with a gcc 4.7 back-end and weren't able to reproduce
with a more recent version. I believe, however, that the problem is still
latent, just requiring extremely precise conditions to trigger the problematic
scheduling decision.

At the RTL level, with our 4.7 based compiler, the chain of events is as
follows:

Compiling our testcase with -O2 -mcpu=8540 -mfloat-gprs=double, we get up to:

 p.adb.216r.split4:

(insn 710 147 712 2 (set (reg/f:SI 11 %r11 [650])
(high:SI (symbol_ref/u:SI ("*.LC0") [flags 0x82]))) 495 {elf_high}
...

(note 891 627 892 32 NOTE_INSN_EPILOGUE_BEG)
...
... sequence of register restores ...

(insn 894 893 895 32 (set (reg:SI 11 %r11)
(plus:SI (reg/f:SI 1 %r1)
(const_int 184 [0xb8]))) 
... 
(insn 907 906 908 32 (set (reg:V2SI 31 31)
(mem:V2SI (plus:SI (reg:SI 11 11)
(const_int 96 [0x60])) [0 S8 A64])) 
 
... STACK TIE insn, intended to prevent the scheduler from moving ...
... restores past this point, so they remain issued prior to the  ...
... following insn:
 
(insn 908 907 909 32 (parallel [
(set (mem/c:BLK (reg/f:SI 1 1) [17 A8])
(const_int 0 [0]))
(set (mem/c:BLK (reg:SI 11 11) [17 A8])
(const_int 0 [0]))
]) ../p.adb:84:8 681 {stack_tie}
 (nil))
 
... bumping r1 (stack pointer), releasing the stack frame:
 
(insn/f 909 908 910 32 (set (reg/f:SI 1 1)
(plus:SI (reg:SI 11 11)
(const_int 104 [0x68]))) ../p.adb:84:8 78 {*addsi3_internal1}

The tie references a mem with alias set 17 (stack accesses). The restores
use alias set 0, which should conflict anyway, so this looks OK.

When sched2 kicks in, we however fail to establish a dependency between
907 (and other restores relative to r11) and the stack_tie.

At the bottom of the chain, we see write_dependence_p claiming absence
of dependence for 907 from

  if (! writep)
{
  base = find_base_term (mem_addr);
  if (base && (GET_CODE (base) == LABEL_REF
   || (GET_CODE (base) == SYMBOL_REF
   && CONSTANT_POOL_ADDRESS_P (base
return 0;
}


with

(gdb) pr mem_addr
(plus:SI (reg:SI 11 11)
(const_int 96 [0x60]))
 
and
 
(gdb) pr base
(symbol_ref/u:SI ("*.LC0") [flags 0x82])
 
coming from insn 710, despite 894 in between. Ug.

Further Analysis & patch proposal
=

The reason why 894 is not accounted in the base ref computation is because it
is part of the epilogue sequence, and init_alias_analysis has:

  /* Walk the insns adding values to the new_reg_base_value array.  */
  for (i = 0; i < rpo_cnt; i++)
{ ...
  if (could_be_prologue_epilogue
  && prologue_epilogue_contains (insn))
continue;

The motivation for this is unclear to me.

My rough understanding is that we probably really care about frame_related
insns only here, at least on targets where the flag is supposed to be set
accurately.

This is the basis of the proposed patch, which essentially disconnects the
shortcut above for !frame_related insns on targets which need precise alias
analysis within epilogues, as indicated by a new target hook.

On the key insn at hand, the frame_related bit was cleared on purpose,
per:

https://gcc.gnu.org/ml/gcc-patches/2011-11/msg00543.html

  "1) Marking an instruction setting up r11 for use by _save64gpr_* as
   frame-related results in r11 being set as the cfa for the rest of the
   function.  That's bad when r11 gets used for other things later.
  "

I have verified that it cures the problem with our gcc-4.7 based toolchain,
and we have been exercising it in gcc-4.9 based compilers on lots of ppc and
e500 configurations without problems for a few weeks now

I'll be happy to perform further testing if required. This is pretty heavy for
me, however, so I'd rather have a discussion and a preliminary agreement on
the approach first.

Thi

Re: [AArch64] Emit division using the Newton series

2016-03-23 Thread Evandro Menezes

On 03/17/16 15:09, Evandro Menezes wrote:

This patch implements FP division by an approximation using the Newton
series.

With this patch, DF division is sped up by over 100% and SF division,
zilch, both on A57 and on M1.


gcc/
* config/aarch64/aarch64-tuning-flags.def
(AARCH64_EXTRA_TUNE_APPROX_DIV_{SF,DF}: New tuning macros.
* config/aarch64/aarch64-protos.h
(AARCH64_EXTRA_TUNE_APPROX_DIV): New macro.
(aarch64_emit_approx_div): Declare new function.
* config/aarch64/aarch64.c
(aarch64_emit_approx_div): Define new function.
* config/aarch64/aarch64.md ("div3"): New expansion.
* config/aarch64/aarch64-simd.md ("div3"): Likewise.


This version of the patch cleans up the changes to the MD files and 
optimizes the division when the numerator is 1.0.


Again, I look forward to your feedback.

Thank you,

--
Evandro Menezes

>From 5cd2a628086af3656b3242f0c4f41784646f52b1 Mon Sep 17 00:00:00 2001
From: Evandro Menezes 
Date: Thu, 17 Mar 2016 14:44:55 -0500
Subject: [PATCH] [AArch64] Emit division using the Newton series

2016-03-17  Evandro Menezes  

gcc/
	* config/aarch64/aarch64-tuning-flags.def
	(AARCH64_EXTRA_TUNE_APPROX_DIV_{SF,DF}: New tuning macros.
	* config/aarch64/aarch64-protos.h
	(AARCH64_EXTRA_TUNE_APPROX_DIV): New macro.
	(aarch64_emit_approx_div): Declare new function.
	* config/aarch64/aarch64.c
	(aarch64_emit_approx_div): Define new function.
	* config/aarch64/aarch64.md ("div3"): New expansion.
	* config/aarch64/aarch64-simd.md ("div3"): Likewise.
---
 gcc/config/aarch64/aarch64-protos.h |  4 ++
 gcc/config/aarch64/aarch64-simd.md  | 14 +-
 gcc/config/aarch64/aarch64-tuning-flags.def |  3 +-
 gcc/config/aarch64/aarch64.c| 73 +
 gcc/config/aarch64/aarch64.md   | 19 ++--
 5 files changed, 107 insertions(+), 6 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index dced209..52c4838 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -263,6 +263,9 @@ enum aarch64_extra_tuning_flags
 };
 #undef AARCH64_EXTRA_TUNING_OPTION
 
+#define AARCH64_EXTRA_TUNE_APPROX_DIV \
+(AARCH64_EXTRA_TUNE_APPROX_DIV_DF | AARCH64_EXTRA_TUNE_APPROX_DIV_SF)
+
 extern struct tune_params aarch64_tune_params;
 
 HOST_WIDE_INT aarch64_initial_elimination_offset (unsigned, unsigned);
@@ -362,6 +365,7 @@ void aarch64_relayout_simd_types (void);
 void aarch64_reset_previous_fndecl (void);
 void aarch64_save_restore_target_globals (tree);
 void aarch64_emit_approx_rsqrt (rtx, rtx);
+bool aarch64_emit_approx_div (rtx, rtx, rtx);
 
 /* Initialize builtins for SIMD intrinsics.  */
 void init_aarch64_simd_builtins (void);
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index bd73bce..99be92e 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -1509,7 +1509,19 @@
   [(set_attr "type" "neon_fp_mul_")]
 )
 
-(define_insn "div3"
+(define_expand "div3"
+ [(set (match_operand:VDQF 0 "register_operand")
+   (div:VDQF (match_operand:VDQF 1 "general_operand")
+		 (match_operand:VDQF 2 "register_operand")))]
+ "TARGET_SIMD"
+{
+  if (aarch64_emit_approx_div (operands[0], operands[1], operands[2]))
+DONE;
+
+  operands[1] = force_reg (mode, operands[1]);
+})
+
+(define_insn "*div3"
  [(set (match_operand:VDQF 0 "register_operand" "=w")
(div:VDQF (match_operand:VDQF 1 "register_operand" "w")
 		 (match_operand:VDQF 2 "register_operand" "w")))]
diff --git a/gcc/config/aarch64/aarch64-tuning-flags.def b/gcc/config/aarch64/aarch64-tuning-flags.def
index 7e45a0c..ececdc1 100644
--- a/gcc/config/aarch64/aarch64-tuning-flags.def
+++ b/gcc/config/aarch64/aarch64-tuning-flags.def
@@ -30,4 +30,5 @@
 
 AARCH64_EXTRA_TUNING_OPTION ("rename_fma_regs", RENAME_FMA_REGS)
 AARCH64_EXTRA_TUNING_OPTION ("approx_rsqrt", APPROX_RSQRT)
-
+AARCH64_EXTRA_TUNING_OPTION ("approx_div", APPROX_DIV_DF)
+AARCH64_EXTRA_TUNING_OPTION ("approx_divf", APPROX_DIV_SF)
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 12e498d..2c878ce 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -7540,6 +7540,79 @@ aarch64_emit_approx_rsqrt (rtx dst, rtx src)
   emit_move_insn (dst, x0);
 }
 
+/* Emit the instruction sequence to compute the approximation for a reciprocal.  */
+
+bool
+aarch64_emit_approx_div (rtx quo, rtx num, rtx div)
+{
+  machine_mode mode = GET_MODE (quo);
+
+  if (!flag_finite_math_only
+  || flag_trapping_math
+  || !flag_unsafe_math_optimizations
+  || optimize_function_for_size_p (cfun)
+  || ((GET_MODE_INNER (mode) != SFmode
+   || !(aarch64_tune_params.extra_tuning_flags
+& AARCH64_EXTRA_TUNE_APPROX_DIV_SF))
+  && (GET_MODE_INNER (mode) != DFmode
+  || !(aarch64_

Re: [AArch64] Emit division using the Newton series

2016-03-23 Thread Evandro Menezes

On 03/17/16 15:09, Evandro Menezes wrote:
This patch implements FP division by an approximation using the Newton 
series.


With this patch, DF division is sped up by over 100% and SF division, 
zilch, both on A57 and on M1.


gcc/
* config/aarch64/aarch64-tuning-flags.def
(AARCH64_EXTRA_TUNE_APPROX_DIV_{SF,DF}: New tuning macros.
* config/aarch64/aarch64-protos.h
(AARCH64_EXTRA_TUNE_APPROX_DIV): New macro.
(aarch64_emit_approx_div): Declare new function.
* config/aarch64/aarch64.c
(aarch64_emit_approx_div): Define new function.
* config/aarch64/aarch64.md ("div3"): New expansion.
* config/aarch64/aarch64-simd.md ("div3"): Likewise.


This version of the patch cleans up the changes to the MD files and 
optimizes the division when the numerator is 1.0.


Again, I look forward to your feedback.

Thank you,

--
Evandro Menezes



[PATCH] Fix PR c++/70347 (default member initializer not picked up by union)

2016-03-23 Thread Patrick Palka
When performing aggregate initialization on a union we fail to take into
account the union's NSDMI (if it has one).  This patch teaches
process_init_constructor_union to consider NSDMIs.

Does this look OK to commit after bootstrap + regtest?

gcc/cp/ChangeLog:

PR c++/70347
* typeck.c (process_init_constructor_union): If the initializer
is empty, use the union's NSDMI if it has one.

gcc/testsuite/ChangeLog:

PR c++/70347
* g++.dg/cpp1y/nsdmi-union1.C: New test.
---
 gcc/cp/typeck2.c  | 19 --
 gcc/testsuite/g++.dg/cpp1y/nsdmi-union1.C | 33 +++
 2 files changed, 50 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp1y/nsdmi-union1.C

diff --git a/gcc/cp/typeck2.c b/gcc/cp/typeck2.c
index 2a76c96..4ab77cd 100644
--- a/gcc/cp/typeck2.c
+++ b/gcc/cp/typeck2.c
@@ -1499,9 +1499,24 @@ process_init_constructor_union (tree type, tree init,
   constructor_elt *ce;
   int len;
 
-  /* If the initializer was empty, use default zero initialization.  */
+  /* If the initializer was empty, use the union's NSDMI if it has one.
+ Otherwise use default zero initialization.  */
   if (vec_safe_is_empty (CONSTRUCTOR_ELTS (init)))
-return 0;
+{
+  for (tree field = TYPE_FIELDS (type); field; field = TREE_CHAIN (field))
+   {
+ if (DECL_INITIAL (field))
+   {
+ CONSTRUCTOR_APPEND_ELT (CONSTRUCTOR_ELTS (init),
+ field,
+ get_nsdmi (field, /*in_ctor=*/false));
+ break;
+   }
+   }
+
+  if (vec_safe_is_empty (CONSTRUCTOR_ELTS (init)))
+   return 0;
+}
 
   len = CONSTRUCTOR_ELTS (init)->length ();
   if (len > 1)
diff --git a/gcc/testsuite/g++.dg/cpp1y/nsdmi-union1.C 
b/gcc/testsuite/g++.dg/cpp1y/nsdmi-union1.C
new file mode 100644
index 000..45b9fd0
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp1y/nsdmi-union1.C
@@ -0,0 +1,33 @@
+// PR c++/70347
+// { dg-do compile { target c++14 } }
+
+union A {
+  char a;
+  long b = -42;
+};
+
+struct B {
+  union {
+char a = 10;
+long b;
+  };
+};
+
+A c1{};
+A c2{4};
+B c3{};
+B c4{{9}};
+
+int main() {
+  if (c1.b != -42)
+__builtin_abort ();
+
+  if (c2.a != 4)
+__builtin_abort ();
+
+  if (c3.a != 10)
+__builtin_abort ();
+
+  if (c4.a != 9)
+__builtin_abort ();
+}
-- 
2.8.0.rc3.27.gade0865



C++ PATCH to add -Wignored-attributes (PR c++/69884)

2016-03-23 Thread Marek Polacek
This PR is a request to add a way how to suppress the warning about
ignored attributes.  We concluded that -Wattributes isn't the right
warning for this case, so I went ahead with Richi's suggestion to
add -Wignored-attributes.

Bootstrapped/regtested on x86_64-linux, ok for trunk?

2016-03-23  Marek Polacek  

PR c++/69884
* c.opt (Wignored-attributes): New option.

* pt.c (canonicalize_type_argument): Use OPT_Wignored_attributes.

* doc/invoke.texi: Document -Wignored-attributes.

* g++.dg/warn/Wignored-attributes-1.C: New test.
* g++.dg/warn/Wignored-attributes-2.C: New test.

diff --git gcc/c-family/c.opt gcc/c-family/c.opt
index 7c5f6c7..4f86876 100644
--- gcc/c-family/c.opt
+++ gcc/c-family/c.opt
@@ -482,6 +482,10 @@ Wignored-qualifiers
 C C++ Var(warn_ignored_qualifiers) Warning EnabledBy(Wextra)
 Warn whenever type qualifiers are ignored.
 
+Wignored-attributes
+C C++ Var(warn_ignored_attributes) Init(1) Warning
+Warn whenever attributes are ignored.
+
 Wincompatible-pointer-types
 C ObjC Var(warn_incompatible_pointer_types) Init(1) Warning
 Warn when there is a conversion between pointers that have incompatible types.
diff --git gcc/cp/pt.c gcc/cp/pt.c
index 45cd1ea..6837438 100644
--- gcc/cp/pt.c
+++ gcc/cp/pt.c
@@ -6950,7 +6950,8 @@ canonicalize_type_argument (tree arg, tsubst_flags_t 
complain)
   tree canon = strip_typedefs (arg, &removed_attributes);
   if (removed_attributes
   && (complain & tf_warning))
-warning (0, "ignoring attributes on template argument %qT", arg);
+warning (OPT_Wignored_attributes,
+"ignoring attributes on template argument %qT", arg);
   return canon;
 }
 
diff --git gcc/doc/invoke.texi gcc/doc/invoke.texi
index 99ac11b..9e54bb7 100644
--- gcc/doc/invoke.texi
+++ gcc/doc/invoke.texi
@@ -267,7 +267,7 @@ Objective-C and Objective-C++ Dialects}.
 -Wno-format-contains-nul -Wno-format-extra-args -Wformat-nonliteral @gol
 -Wformat-security  -Wformat-signedness  -Wformat-y2k -Wframe-address @gol
 -Wframe-larger-than=@var{len} -Wno-free-nonheap-object -Wjump-misses-init @gol
--Wignored-qualifiers  -Wincompatible-pointer-types @gol
+-Wignored-qualifiers  -Wignored-attributes  -Wincompatible-pointer-types @gol
 -Wimplicit  -Wimplicit-function-declaration  -Wimplicit-int @gol
 -Winit-self  -Winline  -Wno-int-conversion @gol
 -Wno-int-to-pointer-cast -Winvalid-memory-model -Wno-invalid-offsetof @gol
@@ -3886,6 +3886,14 @@ even without this option.
 
 This warning is also enabled by @option{-Wextra}.
 
+@item -Wignored-attributes @r{(C and C++ only)}
+@opindex Wignored-attributes
+@opindex Wno-ignored-attributes
+Warn when an attribute is ignored.  This is different from the
+@option{-Wattributes} option in that it warns whenever the compiler decides
+to drop an attribute, not that the attribute is either unknown, used in a
+wrong place, etc.  This warning is enabled by default.
+
 @item -Wmain
 @opindex Wmain
 @opindex Wno-main
diff --git gcc/testsuite/g++.dg/warn/Wignored-attributes-1.C 
gcc/testsuite/g++.dg/warn/Wignored-attributes-1.C
index e69de29..46ccc4a 100644
--- gcc/testsuite/g++.dg/warn/Wignored-attributes-1.C
+++ gcc/testsuite/g++.dg/warn/Wignored-attributes-1.C
@@ -0,0 +1,6 @@
+// PR c++/69884
+// { dg-do compile }
+
+typedef float __m128 __attribute__((__vector_size__(16), __may_alias__));
+template  struct A;
+template <> struct A<__m128>; // { dg-warning "ignoring attributes on template 
argument" }
diff --git gcc/testsuite/g++.dg/warn/Wignored-attributes-2.C 
gcc/testsuite/g++.dg/warn/Wignored-attributes-2.C
index e69de29..6431607 100644
--- gcc/testsuite/g++.dg/warn/Wignored-attributes-2.C
+++ gcc/testsuite/g++.dg/warn/Wignored-attributes-2.C
@@ -0,0 +1,7 @@
+// PR c++/69884
+// { dg-do compile }
+// { dg-options "-Wno-ignored-attributes" }
+
+typedef float __m128 __attribute__((__vector_size__(16), __may_alias__));
+template  struct A;
+template <> struct A<__m128>;

Marek


[committed, testsuite] Add missing initializations in oacc testcases

2016-03-23 Thread Tom de Vries

Hi,

I've run:
- the goacc.exp testcases, and
- the target-libgomp oacc testcases
with -Wuninitialized, and found a few missing initalizations. Fixed and 
committed as attached.


Thanks,
- Tom
Add missing initializations in oacc testcases

2016-03-23  Tom de Vries  

	* c-c++-common/goacc/kernels-default.c (foo): Add missing
	initialization.
	* gfortran.dg/goacc/private-3.f95 (test): Same.
	* gfortran.dg/goacc/routine-5.f90 (gang, worker, vector, seq): Add
	missing use param.

	* testsuite/libgomp.oacc-fortran/reduction-2.f90: Add missing
	initialization of lresult and lvresult.
	* testsuite/libgomp.oacc-fortran/reduction-3.f90: Same.

---
 gcc/testsuite/c-c++-common/goacc/kernels-default.c | 2 +-
 gcc/testsuite/gfortran.dg/goacc/private-3.f95  | 1 +
 gcc/testsuite/gfortran.dg/goacc/routine-5.f90  | 4 
 libgomp/testsuite/libgomp.oacc-fortran/reduction-2.f90 | 4 ++--
 libgomp/testsuite/libgomp.oacc-fortran/reduction-3.f90 | 4 ++--
 5 files changed, 10 insertions(+), 5 deletions(-)

diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-default.c b/gcc/testsuite/c-c++-common/goacc/kernels-default.c
index 58cd5e1..cb031c1 100644
--- a/gcc/testsuite/c-c++-common/goacc/kernels-default.c
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-default.c
@@ -4,7 +4,7 @@
 void
 foo (void)
 {
-  unsigned int i;
+  unsigned int i = 0;
 #pragma acc kernels
   {
 i++;
diff --git a/gcc/testsuite/gfortran.dg/goacc/private-3.f95 b/gcc/testsuite/gfortran.dg/goacc/private-3.f95
index 3490263..a7c6d81 100644
--- a/gcc/testsuite/gfortran.dg/goacc/private-3.f95
+++ b/gcc/testsuite/gfortran.dg/goacc/private-3.f95
@@ -14,6 +14,7 @@ program test
 !  !$acc end parallel
 
   !$acc parallel private (k)
+  k = 0
   !$acc loop reduction (+:k)
   do i = 1, n
  k = k + 1
diff --git a/gcc/testsuite/gfortran.dg/goacc/routine-5.f90 b/gcc/testsuite/gfortran.dg/goacc/routine-5.f90
index 68c5149..5037a31 100644
--- a/gcc/testsuite/gfortran.dg/goacc/routine-5.f90
+++ b/gcc/testsuite/gfortran.dg/goacc/routine-5.f90
@@ -5,6 +5,7 @@ module param
 end module param
 
 subroutine gang (a)
+  use param
   !$acc routine gang
   integer, intent (inout) :: a(N)
   integer :: i
@@ -31,6 +32,7 @@ subroutine gang (a)
 end subroutine gang
 
 subroutine worker (a)
+  use param
   !$acc routine worker
   integer, intent (inout) :: a(N)
   integer :: i
@@ -57,6 +59,7 @@ subroutine worker (a)
 end subroutine worker
 
 subroutine vector (a)
+  use param
   !$acc routine vector
   integer, intent (inout) :: a(N)
   integer :: i
@@ -83,6 +86,7 @@ subroutine vector (a)
 end subroutine vector
 
 subroutine seq (a)
+  use param
   !$acc routine seq
   integer, intent (inout) :: a(N)
   integer :: i
diff --git a/libgomp/testsuite/libgomp.oacc-fortran/reduction-2.f90 b/libgomp/testsuite/libgomp.oacc-fortran/reduction-2.f90
index 3d99668..96955ce 100644
--- a/libgomp/testsuite/libgomp.oacc-fortran/reduction-2.f90
+++ b/libgomp/testsuite/libgomp.oacc-fortran/reduction-2.f90
@@ -92,8 +92,8 @@ program reduction_2
 
   if (result.ne.vresult) call abort
 
-  result = 1
-  vresult = 1
+  lresult = .true.
+  lvresult = .true.
 
   ! '.and.' reductions
 
diff --git a/libgomp/testsuite/libgomp.oacc-fortran/reduction-3.f90 b/libgomp/testsuite/libgomp.oacc-fortran/reduction-3.f90
index d0b590e..ecf7fbe 100644
--- a/libgomp/testsuite/libgomp.oacc-fortran/reduction-3.f90
+++ b/libgomp/testsuite/libgomp.oacc-fortran/reduction-3.f90
@@ -92,8 +92,8 @@ program reduction_3
 
   if (result.ne.vresult) call abort
 
-  result = 1
-  vresult = 1
+  lresult = .true.
+  lvresult = .true.
 
   ! '.and.' reductions
 


Re: rs6000 stack_tie mishap again

2016-03-23 Thread David Edelsohn
First, SPE has not been maintained and little participation from
Freescale.  I would rather deprecate all SPE support.  SPE ABI is
broken by design.

I find the approach very heavy-handed.  If you want to enable the
target hook for SPE *only*, that's fine with me.  The description and
references to prior SPE prologue and epilogue changes do not confirm a
wider problem.

- David


Re: [PATCH] Fix PR c++/70347 (default member initializer not picked up by union)

2016-03-23 Thread Jason Merrill

OK.

Jason


Re: C++ PATCH to add -Wignored-attributes (PR c++/69884)

2016-03-23 Thread Jason Merrill

OK.

Jason


Re: [PR69315] enable finish_function to recurse for constexpr functions

2016-03-23 Thread Jason Merrill

OK.

Jason


Re: [PATCH] 69517 - [5/6 regression] SEGV on a VLA with excess initializer elements

2016-03-23 Thread Jason Merrill

On 03/06/2016 08:38 PM, Martin Sebor wrote:

There are a few additional points to note about the patch:

1) It restores the std::bad_array_length exception from N3639,
even though the class isn't specified by the C++ standard.
At first I thought that introducing a different (private)
type would be more appropriate, but in the end couldn't come
up with a good argument for not keeping the same type.  Using
the same type also allows programs that rely on the exception
and that were built with GCC 4.9 to be ported to GCC 6 without
change.


Makes sense.


2) It hardwires a rather arbitrarily restrictive limit of 64 KB
on the size of the biggest C++ VLA.  (This could stand to be
improved and made more intelligent, and perhaps integrated
with stack checking via -fstack-limit, after the GCC 6
release.)


The bounds checking should share code with build_new_1.


3) By throwing an exception for erroneous VLAs the patch largely
defeats the VLA Sanitizer.  The sanitizer is still useful in
C++ 98 mode where the N3639 VLA runtime checking is disabled,
and when exceptions are disabled via -fno-exceptions.
Disabling  the VLA checking in C++ 98 mode doesn't seem like
a useful feature, but I didn't feel like reverting what was
a deliberate decision.


What deliberate decision?  The old code checked for C++14 mode because 
the feature was part of the C++14 working paper.  What's the rationale 
for C++11 as the cutoff?


Jason



C++ PATCH for c++/70344 (ICE with recursive constexpr)

2016-03-23 Thread Jason Merrill
cp_fold tries to fold calls to constexpr functions.  When we are inside 
a constexpr function at the time and see a call to the same function, 
this caused trouble because we hadn't adjusted the parameter types yet. 
 Fixed by handling this special case of calling a constexpr function 
that isn't defined yet.


Tested x86_64-pc-linux-gnu, applying to trunk.
commit 4d4c5c65269b6dcde93fa13a90f2af925c8082c7
Author: Jason Merrill 
Date:   Wed Mar 23 13:51:42 2016 -0400

	PR c++/70344
	* constexpr.c (cxx_eval_call_expression): Catch invalid recursion.

diff --git a/gcc/cp/constexpr.c b/gcc/cp/constexpr.c
index 7b13633..d71e488 100644
--- a/gcc/cp/constexpr.c
+++ b/gcc/cp/constexpr.c
@@ -1239,6 +1239,21 @@ cxx_eval_call_expression (const constexpr_ctx *ctx, tree t,
   return t;
 }
 
+  if (fun == current_function_decl)
+{
+  /* A call to the current function, i.e.
+	 constexpr int f (int i) {
+	   constexpr int j = f(i-1);
+	   return j;
+	 }
+	 This would be OK without the constexpr on the declaration of j.  */
+  if (!ctx->quiet)
+	error_at (loc, "%qD called in a constant expression before its "
+		  "definition is complete", fun);
+  *non_constant_p = true;
+  return t;
+}
+
   constexpr_ctx new_ctx = *ctx;
   if (DECL_CONSTRUCTOR_P (fun) && !ctx->object
   && TREE_CODE (t) == AGGR_INIT_EXPR)
diff --git a/gcc/testsuite/g++.dg/cpp0x/constexpr-recursion2.C b/gcc/testsuite/g++.dg/cpp0x/constexpr-recursion2.C
new file mode 100644
index 000..978b998
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/constexpr-recursion2.C
@@ -0,0 +1,17 @@
+// PR c++/70344
+// { dg-do compile { target c++11 } }
+
+struct Z
+{
+  Z () = default;
+  Z (Z const &) = default;
+  constexpr Z (Z &&) {}
+};
+
+constexpr int
+fn (Z v)
+{
+  return fn (v);
+}
+
+auto t = fn (Z ());


Re: [C++ PATCH] Diagnose constexpr overflow (PR c++/70323)

2016-03-23 Thread Jason Merrill
For GCC 7 we should do constexpr evaluation on the unfolded function, 
but for GCC 6 this is OK.


Jason


[committed] Fix up OpenMP taskloop C++ handling (PR c++/70376)

2016-03-23 Thread Jakub Jelinek
Hi!

As the testcase shows, taskloop needs similar handling to task - we need
to find implicitly firstprivate variables and synthetize their copy ctors
and dtors.

2016-03-23  Jakub Jelinek  

PR c++/70376
* cp-gimplify.c (genericize_omp_for_stmt): Don't walk OMP_FOR_CLAUSES
for OMP_TASKLOOP here.
(cp_genericize_r): Handle OMP_TASKLOOP like OMP_TASK, except do call
genericize_omp_for_stmt instead of cp_walk_tree on OMP_BODY.

* testsuite/libgomp.c++/pr70376.C: New test.

--- gcc/cp/cp-gimplify.c.jj 2016-03-22 09:05:32.0 +0100
+++ gcc/cp/cp-gimplify.c2016-03-23 15:23:28.790585928 +0100
@@ -386,7 +386,8 @@ genericize_omp_for_stmt (tree *stmt_p, i
   tree clab = begin_bc_block (bc_continue, locus);
 
   cp_walk_tree (&OMP_FOR_BODY (stmt), cp_genericize_r, data, NULL);
-  cp_walk_tree (&OMP_FOR_CLAUSES (stmt), cp_genericize_r, data, NULL);
+  if (TREE_CODE (stmt) != OMP_TASKLOOP)
+cp_walk_tree (&OMP_FOR_CLAUSES (stmt), cp_genericize_r, data, NULL);
   cp_walk_tree (&OMP_FOR_INIT (stmt), cp_genericize_r, data, NULL);
   cp_walk_tree (&OMP_FOR_COND (stmt), cp_genericize_r, data, NULL);
   cp_walk_tree (&OMP_FOR_INCR (stmt), cp_genericize_r, data, NULL);
@@ -1272,7 +1273,9 @@ cp_genericize_r (tree *stmt_p, int *walk
   if (TREE_CODE (d) == VAR_DECL)
gcc_assert (CP_DECL_THREAD_LOCAL_P (d) == DECL_THREAD_LOCAL_P (d));
 }
-  else if (TREE_CODE (stmt) == OMP_PARALLEL || TREE_CODE (stmt) == OMP_TASK)
+  else if (TREE_CODE (stmt) == OMP_PARALLEL
+  || TREE_CODE (stmt) == OMP_TASK
+  || TREE_CODE (stmt) == OMP_TASKLOOP)
 {
   struct cp_genericize_omp_taskreg omp_ctx;
   tree c, decl;
@@ -1312,7 +1315,10 @@ cp_genericize_r (tree *stmt_p, int *walk
  default:
break;
  }
-  cp_walk_tree (&OMP_BODY (stmt), cp_genericize_r, data, NULL);
+  if (TREE_CODE (stmt) == OMP_TASKLOOP)
+   genericize_omp_for_stmt (stmt_p, walk_subtrees, data);
+  else
+   cp_walk_tree (&OMP_BODY (stmt), cp_genericize_r, data, NULL);
   wtd->omp_ctx = omp_ctx.outer;
   splay_tree_delete (omp_ctx.variables);
 }
@@ -1380,8 +1386,7 @@ cp_genericize_r (tree *stmt_p, int *walk
 genericize_break_stmt (stmt_p);
   else if (TREE_CODE (stmt) == OMP_FOR
   || TREE_CODE (stmt) == OMP_SIMD
-  || TREE_CODE (stmt) == OMP_DISTRIBUTE
-  || TREE_CODE (stmt) == OMP_TASKLOOP)
+  || TREE_CODE (stmt) == OMP_DISTRIBUTE)
 genericize_omp_for_stmt (stmt_p, walk_subtrees, data);
   else if ((flag_sanitize
& (SANITIZE_NULL | SANITIZE_ALIGNMENT | SANITIZE_VPTR))
--- libgomp/testsuite/libgomp.c++/pr70376.C.jj  2016-03-23 15:32:59.906867697 
+0100
+++ libgomp/testsuite/libgomp.c++/pr70376.C 2016-03-23 15:26:04.0 
+0100
@@ -0,0 +1,20 @@
+// PR c++/70376
+// { dg-do link }
+
+template 
+struct A
+{
+  A() { }
+  A(const A&) { }
+  void foo() { }
+};
+
+int
+main ()
+{
+  A a;
+  #pragma omp taskloop
+  for (int i = 0; i < 64; i++)
+a.foo();
+  return 0;
+}

Jakub


Re: Also test -O0 for OpenACC C, C++ offloading test cases

2016-03-23 Thread Bernd Schmidt

On 03/23/2016 07:47 AM, Thomas Schwinge wrote:

Want me to re-word that?  :-| I thought it would be obvious from looking
at the test case code; will not be a problem in practice.  It's because
of constructs used in the test cases, like the following, for example:



   if (__builtin_acc_on_device (5))
 {
   int g = 0, w = 0, v = 0;

   __asm__ volatile ("mov.u32 %0,%%ctaid.x;" : "=r" (g));
   __asm__ volatile ("mov.u32 %0,%%tid.y;" : "=r" (w));
   __asm__ volatile ("mov.u32 %0,%%tid.x;" : "=r" (v));
   ary[ix] = (g << 16) | (w << 8) | v;


Ok, maybe write "This code uses nvptx assembly guarded with 
__builtin_acc_on_device, which fails to be optimized away at -O0."


Ok with something like that.


Bernd


Re: Also test -O0 for OpenACC C, C++ offloading test cases

2016-03-23 Thread Jakub Jelinek
On Tue, Mar 22, 2016 at 11:23:43AM +0100, Thomas Schwinge wrote:
> As discussed in
> 
> (and similar to what we're already doing for Fortran, and similar to what
> recently got committed to libgomp/testsuite/libgomp.hsa.c/c.exp), it has
> been helpful to also run C, C++ offloading test cases with -O0 in
> addition to the -O2 default.  Making my earlier gomp-4_0-branch patch
> conceptually simpler, I came up with the following; OK for trunk?

How big difference in make check-target-libgomp time is that?
Without PTX offloading I bet zero, but with PTX offloading configured, is it
10% or 50% slower?

Jakub


Re: [PATCH] c++/67376 Comparison with pointer to past-the-end, of array fails inside constant expression

2016-03-23 Thread Jason Merrill

On 03/22/2016 04:01 PM, Martin Sebor wrote:

On 03/22/2016 12:52 PM, Jason Merrill wrote:

On 03/21/2016 06:09 PM, Jeff Law wrote:

On 03/21/2016 11:54 AM, Jason Merrill wrote:

Both b0 and b1 are invalid and should be diagnosed, but only b1
is.  b1 isn't because because by the time we see its initializer
in constexpr.c it's been transformed into the equivalent of "b1
= (int*)ps" (though we don't see the cast which would also make
it invalid).

But if we can avoid these early simplifying transformations and
retain a more faithful representation of the original source then
doing the checking later will likely be simpler and result in
detecting more problems with greater consistency and less effort.

Do we know where the folding is happening for this case and is it
something we can reasonably defer?ie, is this just a case we
missed
as part of the deferred folding work and hence should have its own
distinct BZ to track?


Yes, why is it already folded?



Let's pull that out into a separate BZ and tackle it for gcc-7.


I need to understand the issue before I agree to defer it.

It turns out that the problem is with how cp_build_binary_op calls
cp_pointer_int_sum and thus the c-common pointer_int_sum, which folds.

The POINTER_PLUS_EXPRs thus created have been a source of many issues
with constexpr evaluation, since it's impossible to reconstruct the
original expression, especially because POINTER_PLUS_EXPR uses an
unsigned second operand.  Deferring lowering to POINTER_PLUS_EXPR would
help a lot.  But it would indeed be a significant risk at this point.

I think let's defer the fix for c++/60760 (i.e. the nullptr_p bits)
until stage 1, when it can be combined with the POINTER_PLUS_EXPR fix,
and put the rest of this patch in now.


I can split up the patch into two and post the subset without
the fix for c++/60760, though I don't expect to be done with
it after I get back (next week).

I'd like to understand your concern with the fix for c++/60760.
Is it that it's incomplete (doesn't reject taking the address
of the first member of a struct, as in &null->first_member),
or are you worried that the changes may not be stable enough?


More the latter; it seems like significant new code and doesn't fix a 
regression.


Jason



Re: [PATCH] 69517 - [5/6 regression] SEGV on a VLA with excess initializer elements

2016-03-23 Thread Martin Sebor

Thanks for the comments.


2) It hardwires a rather arbitrarily restrictive limit of 64 KB
on the size of the biggest C++ VLA.  (This could stand to be
improved and made more intelligent, and perhaps integrated
with stack checking via -fstack-limit, after the GCC 6
release.)


The bounds checking should share code with build_new_1.


I agree that sharing the same code is right long term approach.

I had initially started out down that path, by factoring out code
from build_new_1 into a general function that I had thought could
be shared between it and cp_finish_decl.  But I ended up abandoning
that approach two reasons:

a) The checking expression built for array new is quite a bit less
   involved because only the major dimension of the array in requires
   runtime checking (the others must be constant and are checked at
   compile time).  In contrast, all VLA dimensions are potentially
   dynamic and so must be checked at runtime.

b) While (a) can be solved by making the checking function smart
   and general enough, it felt too intrusive and potentially risky
   to change array new at this stage.

That said, I'm happy to do this refactoring in stage 1.




3) By throwing an exception for erroneous VLAs the patch largely
defeats the VLA Sanitizer.  The sanitizer is still useful in
C++ 98 mode where the N3639 VLA runtime checking is disabled,
and when exceptions are disabled via -fno-exceptions.
Disabling  the VLA checking in C++ 98 mode doesn't seem like
a useful feature, but I didn't feel like reverting what was
a deliberate decision.


What deliberate decision?  The old code checked for C++14 mode because
the feature was part of the C++14 working paper.  What's the rationale
for C++11 as the cutoff?


Sorry, I had misremembered the original C++ 14 check as one for
C++ 11.  Are you suggesting to restore the checking only for C++
14 mode, or for all modes?  Either is easy enough to do though,
IMO, it should be safe in all modes and I would expect it to be
preferable to undefined behavior.

Martin


[PATCH, applied] PR 70381, fix -mfloat128 setting

2016-03-23 Thread Michael Meissner
This patch undoes a change that I made that I did not intend to install in the
trunk.  On October 29th, 2015 I commited a major patch to enable the IEEE
128-bit floating point support (subversion id 229545).  I didn't realize I had
set OPTION_MASK_FLOAT128 in the ISA_2_6_MASKS_SERVER macro (which sets the
default options for power7).  I had only meant to set the mask in
POWERPC_MASKS, which would allow it to be set.

The upshot is -mfloat128 would be enabled for any power7, power8, or power9
system by default.  However, the IEEE 128-bit emulation functions are only
built on the Linux OS.  In addition, while the basic compiler and libgcc
support is there, more work needs to be done on libraries, and other
infrastructure before __float128 can be considered available in the GCC 7.0
time frame.

I built compilers with/without these patches on a Power8 system, and there were
no regressions.  Sorry about enabling it by default.

2016-03-23  Michael Meissner  

PR target/70381
* config/rs6000/rs6000-cpus.def (ISA_2_6_MASKS_SERVER): Do not set
-mfloat128 here.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797
Index: gcc/config/rs6000/rs6000-cpus.def
===
--- gcc/config/rs6000/rs6000-cpus.def   (revision 234407)
+++ gcc/config/rs6000/rs6000-cpus.def   (working copy)
@@ -44,7 +44,6 @@
 #define ISA_2_6_MASKS_SERVER   (ISA_2_5_MASKS_SERVER   \
 | OPTION_MASK_POPCNTD  \
 | OPTION_MASK_ALTIVEC  \
-| OPTION_MASK_FLOAT128 \
 | OPTION_MASK_VSX  \
 | OPTION_MASK_UPPER_REGS_DF)
 


Re: [PATCH 7/7] ira.c validate_equiv_mem

2016-03-23 Thread Alan Modra
On Wed, Mar 23, 2016 at 01:03:18AM +0100, Bernd Schmidt wrote:
> On 03/21/2016 02:43 AM, Alan Modra wrote:
> >
> >+enum valid_equiv { valid_none, valid_combine, valid_reload };
> >+
> 
> Might be worth documenting that each step represents a superset of the
> previous one.

Fixed.

> >+  ret = valid_combine;
> >+  if (! MEM_READONLY_P (memref)
> >+  && ! RTL_CONST_OR_PURE_CALL_P (insn))
> >+return valid_none;
> >+}
> 
> The gcc style is actually not to have a space after unary "!". None of the
> code in this file follows that, but I think you may want to change that as
> you modify things in your patches, and have new code follow the recommended
> style.

Fixed.

> >@@ -3536,7 +3557,8 @@ update_equiv_regs (void)
> > {
> >   /* Note that the statement below does not affect the priority
> >  in local-alloc!  */
> >-  REG_LIVE_LENGTH (regno) *= 2;
> >+  if (note)
> >+REG_LIVE_LENGTH (regno) *= 2;
> 
> That's a very suspicious comment. It would be worth testing whether
> REG_LIVE_LENGTH has any effect on our current register allocation at all,
> and remove this code if not.

Yes, REG_LIVE_LENGTH is used in just one place in the whole of gcc,
and that's the test in update_equiv_regs just above the code you
quote.
  /* Don't mess with things live during setjmp.  */
  if (REG_LIVE_LENGTH (regno) >= 0 && optimize)

That could be replaced with
  if (optimize && !bitmap_bit_p (setjmp_crosses, regno))
and outside the loop
  bitmap setjmp_crosses = regstat_get_setjmp_crosses ();

For now, I've removed the REG_LIVE_LENGTH adjustment from patch 7/7.
I'll also prepare a patch to delete REG_LIVE_LENGTH everywhere.

-- 
Alan Modra
Australia Development Lab, IBM


[PATCH] Fix PR c++/62212 (ICE during mangling of array type)

2016-03-23 Thread Patrick Palka
When mangling an ARRAY_TYPE whose element type is a typedef variant and
whose domain is a type-dependent expression, we get an ICE when
canonicalize_for_substitution() tries to create a canonical version of
this ARRAY_TYPE with its element type being stripped of typedefs.

During this canonicalization we call build_cplus_array_type which builds
dependent and non-dependent array types differently.  But because at
this this point (during mangling) we are outside of a template decl, the
function wrongly thinks that the array type being built is non-dependent
according to dependent_type_p().  Because it's considered to be
non-dependent we build the type with build_array_type() which tries to
lay the type out and eventually we ICE because of its dependent TYPE_DOMAIN.

So to fix this, the check for type-dependentness in
build_cplus_array_type() should not depend on processing_template_decl.
This patch makes uses_template_parms() get used instead so that it
behaves properly outside of a template context, like during
mangling.

Does this look OK to commit after bootstrap + regtest?

gcc/cp/ChangeLog:

PR c++/62212
* tree.c (build_cplus_array_type): Determine type-dependentess
with uses_template_parms instead of with dependent_type_p.

gcc/testsuite/ChangeLog:

PR c++/62212
* g++.dg/template/mangle2.C: New test.
---
 gcc/cp/tree.c   |  5 ++---
 gcc/testsuite/g++.dg/template/mangle2.C | 19 +++
 2 files changed, 21 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/template/mangle2.C

diff --git a/gcc/cp/tree.c b/gcc/cp/tree.c
index f784952..5d9de34 100644
--- a/gcc/cp/tree.c
+++ b/gcc/cp/tree.c
@@ -824,9 +824,8 @@ build_cplus_array_type (tree elt_type, tree index_type)
   if (elt_type == error_mark_node || index_type == error_mark_node)
 return error_mark_node;
 
-  bool dependent = (processing_template_decl
-   && (dependent_type_p (elt_type)
-   || (index_type && dependent_type_p (index_type;
+  bool dependent = (uses_template_parms (elt_type)
+   || (index_type && uses_template_parms (index_type)));
 
   if (elt_type != TYPE_MAIN_VARIANT (elt_type))
 /* Start with an array of the TYPE_MAIN_VARIANT.  */
diff --git a/gcc/testsuite/g++.dg/template/mangle2.C 
b/gcc/testsuite/g++.dg/template/mangle2.C
new file mode 100644
index 000..50b0d9f
--- /dev/null
+++ b/gcc/testsuite/g++.dg/template/mangle2.C
@@ -0,0 +1,19 @@
+// PR c++/62212
+// { dg-do assemble }
+
+typedef int my_int;
+
+template
+struct X {
+enum {value = 1};
+};
+
+template
+void f(const my_int(&)[X::value]);
+
+int main() {
+const my_int a[1] = {};
+f(a);
+}
+
+// { dg-final { scan-assembler "_Z1fIvEvRAsr1XIT_E5value_Ki" } }
-- 
2.8.0.rc3.27.gade0865



Re: [PATCH] Fix PR c++/62212 (ICE during mangling of array type)

2016-03-23 Thread Jason Merrill

OK.

Jason


Re: rs6000 stack_tie mishap again

2016-03-23 Thread Alan Modra
On Wed, Mar 23, 2016 at 05:04:39PM +0100, Olivier Hainque wrote:
> The reason why 894 is not accounted in the base ref computation is because it
> is part of the epilogue sequence, and init_alias_analysis has:
> 
>   /* Walk the insns adding values to the new_reg_base_value array.  */
>   for (i = 0; i < rpo_cnt; i++)
>   { ...
> if (could_be_prologue_epilogue
> && prologue_epilogue_contains (insn))
>   continue;
> 
> The motivation for this is unclear to me.

https://gcc.gnu.org/ml/gcc-patches/1999-08n/msg00048.html

> My rough understanding is that we probably really care about frame_related
> insns only here, at least on targets where the flag is supposed to be set
> accurately.

Also, possibly just prologue insns.  So you might be able to modify
init_alias_analysis just to ignore the prologue and skip any need for
yet another hook.

Let's see what rth thinks.  He did say the patch might need to be
redone.  :)
https://gcc.gnu.org/ml/gcc-patches/1999-08n/msg00072.html

[snip]
> So, aside from the dependency issue which needs to be fixed somehow, I
> think it would make sense to consider using a strong blockage mecanism in
> expand_epilogue.

That's what we both said here
https://gcc.gnu.org/ml/gcc-patches/2011-11/msg01180.html

and David agreed too
https://gcc.gnu.org/ml/gcc-patches/2011-11/msg01842.html

but if you can have the alias analysis changes accepted that would be
even better.

-- 
Alan Modra
Australia Development Lab, IBM


Re: rs6000 stack_tie mishap again

2016-03-23 Thread Alan Modra
On Wed, Mar 23, 2016 at 01:38:26PM -0400, David Edelsohn wrote:
> The description and
> references to prior SPE prologue and epilogue changes do not confirm a
> wider problem.

There's a good chance this affects ABI_V4 large stack frames too.
If restoring regs inline we'll be using r11 as a base, just like SPE
does with moderate sized stack frames when restoring 64-bit regs.

-- 
Alan Modra
Australia Development Lab, IBM