[PATCH] Obvious fix for PR66828: left shift with undefined behavior in bswap pass

2015-07-28 Thread Thomas Preud'homme
The bswap pass contains the following loop:

for (i = 0; i < size; i++, inc <<= BITS_PER_MARKER)

In the update to inc and i just before exiting the loop, inc can be shifted by
a total of more than 62 bits, making the value too large to be represented by
int64_t. This is undefined behavior [1] and it triggers an error under a
ubsan bootstrap. This patch changes the type of inc to unsigned, removing the
undefined behavior.

[1] C++ 98 standard section 5.8 paragraph 2:

"The value of E1 << E2 is E1 left-shifted E2 bit positions; vacated bits are 
zero-filled. If E1 has an unsigned type, the value of the result is E1 × 2E2 , 
reduced modulo one more than the maximum value representable in the result 
type. Otherwise, if E1 has a signed type and non-negative value, and E1 × 2E2 
is representable in the corresponding unsigned type of the result type, then 
that value, converted to the result type, is the resulting value; otherwise, the
behavior is undefined."

ChangeLog entry is as follows:

2015-07-28  Thomas Preud'homme  

PR tree-optimization/66828
* tree-ssa-math-opts.c (perform_symbolic_merge): Change type of inc
from int64_t to uint64_t.

Testsuite was run on a native x86_64-linux-gnu bootstrapped GCC and an 
arm-none-eabi cross-compiler without any regression. Committed as obvious, as
suggested by Markus Trippelsdorf in PR66828.

Best regards,

Thomas




RE: [PATCH] Obvious fix for PR66828: left shift with undefined behavior in bswap pass

2015-07-28 Thread Thomas Preud'homme
> From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
> ow...@gcc.gnu.org] On Behalf Of Thomas Preud'homme
> 
> ChangeLog entry is as follows:
> 
> 2015-07-28  Thomas Preud'homme  
> 
> PR tree-optimization/66828
> * tree-ssa-math-opts.c (perform_symbolic_merge): Change type of
> inc
> from int64_t to uint64_t.

And the patch is:

diff --git a/gcc/tree-ssa-math-opts.c b/gcc/tree-ssa-math-opts.c
index 55382f3..c3098db 100644
--- a/gcc/tree-ssa-math-opts.c
+++ b/gcc/tree-ssa-math-opts.c
@@ -2122,7 +2122,7 @@ perform_symbolic_merge (gimple source_stmt1, struct symbolic_number *n1,
  the same base (array, structure, ...).  */
   if (gimple_assign_rhs1 (source_stmt1) != gimple_assign_rhs1 (source_stmt2))
 {
-  int64_t inc;
+  uint64_t inc;
   HOST_WIDE_INT start_sub, end_sub, end1, end2, end;
   struct symbolic_number *toinc_n_ptr, *n_end;


Best regards,

Thomas




[PATCH][21/n] Remove GENERIC stmt combining from SCCVN

2015-07-28 Thread Richard Biener

This moves/merges the equality folding of decl addresses from
fold_comparison with that from fold_binary in match.pd.
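
For illustration, the kind of source-level comparison this folds (a minimal
sketch):

extern int a, b;

int
f (void)
{
  /* a and b are distinct declarations, so absent aliases (which the
     symtab equal_address_to check rules out) the comparison folds
     to false.  */
  return &a == &b;
}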

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk.

Richard.

2015-07-28  Richard Biener  

* fold-const.c (fold_comparison): Remove equality folding
of decl addresses ...
* match.pd: ... here and merge with existing pattern.

Index: gcc/fold-const.c
===
--- gcc/fold-const.c(revision 226240)
+++ gcc/fold-const.c(working copy)
@@ -8511,30 +8389,6 @@ fold_comparison (location_t loc, enum tr
  return fold_build2_loc (loc, code, type, offset0, offset1);
}
}
-  /* For non-equal bases we can simplify if they are addresses
-declarations with different addresses.  */
-  else if (indirect_base0 && indirect_base1
-  /* We know that !operand_equal_p (base0, base1, 0)
- because the if condition was false.  But make
- sure two decls are not the same.  */
-  && base0 != base1
-  && TREE_CODE (arg0) == ADDR_EXPR
-  && TREE_CODE (arg1) == ADDR_EXPR
-  && DECL_P (base0)
-  && DECL_P (base1)
-  /* Watch for aliases.  */
-  && (!decl_in_symtab_p (base0)
-  || !decl_in_symtab_p (base1)
-  || !symtab_node::get_create (base0)->equal_address_to
-(symtab_node::get_create (base1
-   {
- if (code == EQ_EXPR)
-   return omit_two_operands_loc (loc, type, boolean_false_node,
- arg0, arg1);
- else if (code == NE_EXPR)
-   return omit_two_operands_loc (loc, type, boolean_true_node,
- arg0, arg1);
-   }
   /* For equal offsets we can simplify to a comparison of the
 base addresses.  */
   else if (bitpos0 == bitpos1
Index: gcc/match.pd
===
--- gcc/match.pd(revision 226240)
+++ gcc/match.pd(working copy)
@@ -1808,15 +1869,20 @@ (define_operator_list CBRT BUILT_IN_CBRT
 have access to attributes for externs), then we know the result.  */
  (simplify
   (cmp (convert? addr@0) (convert? addr@1))
-  (if (decl_in_symtab_p (TREE_OPERAND (@0, 0))
-   && decl_in_symtab_p (TREE_OPERAND (@1, 0)))
-   (with
-{
-  int equal = symtab_node::get_create (TREE_OPERAND (@0, 0))
+  (if (DECL_P (TREE_OPERAND (@0, 0))
+   && DECL_P (TREE_OPERAND (@1, 0)))
+   (if (decl_in_symtab_p (TREE_OPERAND (@0, 0))
+   && decl_in_symtab_p (TREE_OPERAND (@1, 0)))
+(with
+ {
+   int equal = symtab_node::get_create (TREE_OPERAND (@0, 0))
->equal_address_to (symtab_node::get_create (TREE_OPERAND (@1, 0)));
-}
-(if (equal != 2)
- { constant_boolean_node (equal ? cmp == EQ_EXPR : cmp != EQ_EXPR, type); 
}
+ }
+ (if (equal != 2)
+  { constant_boolean_node (equal
+  ? cmp == EQ_EXPR : cmp != EQ_EXPR, type); }))
+(if (TREE_OPERAND (@0, 0) != TREE_OPERAND (@1, 0))
+ { constant_boolean_node (cmp == EQ_EXPR ? false : true, type); }
 
  (simplify
   (cmp (convert? addr@0) integer_zerop)



[committed, gomp4, PR46193] Handle min/max pointer reductions in parloops

2015-07-28 Thread Tom de Vries

On 22/07/15 20:15, Tom de Vries wrote:

On 13/07/15 13:02, Tom de Vries wrote:

Hi,

this patch fixes PR46193.

It handles min and max reductions of pointer type in parloops.

Bootstrapped and reg-tested on x86_64.

OK for trunk?



Ping.



Committed to gomp-4_0-branch.

Thanks,
- Tom


0001-Handle-mix-max-pointer-reductions-in-parloops.patch


Handle min/max pointer reductions in parloops

2015-07-13  Tom de Vries

PR tree-optimization/46193
* omp-low.c (omp_reduction_init): Handle pointer type for min or max
clause.

* gcc.dg/autopar/pr46193.c: New test.

* testsuite/libgomp.c/pr46193.c: New test.
---
  gcc/omp-low.c  |  4 ++
  gcc/testsuite/gcc.dg/autopar/pr46193.c | 38 +++
  libgomp/testsuite/libgomp.c/pr46193.c  | 67
++
  3 files changed, 109 insertions(+)
  create mode 100644 gcc/testsuite/gcc.dg/autopar/pr46193.c
  create mode 100644 libgomp/testsuite/libgomp.c/pr46193.c

diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index 2e2070a..20d0010 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -3423,6 +3423,8 @@ omp_reduction_init (tree clause, tree type)
  real_maxval (&min, 1, TYPE_MODE (type));
return build_real (type, min);
  }
+  else if (POINTER_TYPE_P (type))
+return lower_bound_in_type (type, type);
else
  {
gcc_assert (INTEGRAL_TYPE_P (type));
@@ -3439,6 +3441,8 @@ omp_reduction_init (tree clause, tree type)
  real_maxval (&max, 0, TYPE_MODE (type));
return build_real (type, max);
  }
+  else if (POINTER_TYPE_P (type))
+return upper_bound_in_type (type, type);
else
  {
gcc_assert (INTEGRAL_TYPE_P (type));
diff --git a/gcc/testsuite/gcc.dg/autopar/pr46193.c
b/gcc/testsuite/gcc.dg/autopar/pr46193.c
new file mode 100644
index 000..544a5da
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/autopar/pr46193.c
@@ -0,0 +1,38 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -ftree-parallelize-loops=2
-fdump-tree-parloops-details" } */
+
+extern void abort (void);
+
+char *
+foo (int count, char **list)
+{
+  char *minaddr = list[0];
+  int i;
+
+  for (i = 0; i < count; i++)
+{
+  char *addr = list[i];
+  if (addr < minaddr)
+minaddr = addr;
+}
+
+  return minaddr;
+}
+
+char *
+foo2 (int count, char **list)
+{
+  char *maxaddr = list[0];
+  int i;
+
+  for (i = 0; i < count; i++)
+{
+  char *addr = list[i];
+  if (addr > maxaddr)
+maxaddr = addr;
+}
+
+  return maxaddr;
+}
+
+/* { dg-final { scan-tree-dump-times "parallelizing inner loop" 2
"parloops" } } */
diff --git a/libgomp/testsuite/libgomp.c/pr46193.c
b/libgomp/testsuite/libgomp.c/pr46193.c
new file mode 100644
index 000..1e27faf
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c/pr46193.c
@@ -0,0 +1,67 @@
+/* { dg-do run } */
+/* { dg-additional-options "-ftree-parallelize-loops=2" } */
+
+extern void abort (void);
+
+char *
+foo (int count, char **list)
+{
+  char *minaddr = list[0];
+  int i;
+
+  for (i = 0; i < count; i++)
+{
+  char *addr = list[i];
+  if (addr < minaddr)
+minaddr = addr;
+}
+
+  return minaddr;
+}
+
+char *
+foo2 (int count, char **list)
+{
+  char *maxaddr = list[0];
+  int i;
+
+  for (i = 0; i < count; i++)
+{
+  char *addr = list[i];
+  if (addr > maxaddr)
+maxaddr = addr;
+}
+
+  return maxaddr;
+}
+
+#define N 5
+
+static void
+init (char **list)
+{
+  int i;
+  for (i = 0; i < N; ++i)
+list[i] = (char *)&list[i];
+}
+
+int
+main (void)
+{
+  char *list[N];
+  char * res;
+
+  init (list);
+
+  res = foo (N, list);
+
+  if (res != (char *)&list[0])
+abort ();
+
+  res = foo2 (N, list);
+
+  if (res != (char *)&list[N-1])
+abort ();
+
+  return 0;
+}
-- 1.9.1







[committed, gomp4] Allow non-overflow ops in vect_is_simple_reduction_1

2015-07-28 Thread Tom de Vries

On 24/07/15 16:39, Tom de Vries wrote:

Hi,

this patch allows parallelization and vectorization of reduction
operators that are guaranteed to not overflow (such as min and max
operators), independent of the overflow behaviour of the type.
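
As an example of the kind of loop this enables (a sketch; a min reduction
over a signed type, where the reduction operation cannot overflow whatever
the type's overflow semantics):

int
vecmin (int *a, int n)
{
  int m = a[0];
  int i;

  for (i = 1; i < n; i++)
    /* The reduction operation is a min: it never overflows, so it is
       safe to parallelize or vectorize even for signed int.  */
    m = a[i] < m ? a[i] : m;

  return m;
}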

Bootstrapped and reg-tested on x86_64.

OK for trunk?



Committed to gomp-4_0-branch.

Thanks,
- Tom


0002-Allow-non-overflow-ops-in-vect_is_simple_reduction_1.patch


Allow non-overflow ops in vect_is_simple_reduction_1

2015-07-24  Tom de Vries

* tree.c (no_overflow_tree_code): New function.
* tree.h (no_overflow_tree_code): Declare.
* tree-vect-loop.c (vect_is_simple_reduction_1): Use
no_overflow_tree_code.

* gcc.dg/autopar/reduc-2char.c (init_arrays): Mark with attribute
optimize ("-ftree-parallelize-loops=0").
Add successful scans for 2 detected reductions.  Add xfail scans for 3
detected reductions.
* gcc.dg/autopar/reduc-2short.c: Same.
* gcc.dg/autopar/reduc-8.c  (init_arrays): Mark with attribute
optimize ("-ftree-parallelize-loops=0").  Add successful scans for 2
detected reductions.
* gcc.dg/vect/trapv-vect-reduc-4.c: Expect successful reductions for min
and max loops.
---
  gcc/testsuite/gcc.dg/autopar/reduc-2char.c | 10 +++---
  gcc/testsuite/gcc.dg/autopar/reduc-2short.c| 10 ++
  gcc/testsuite/gcc.dg/autopar/reduc-8.c |  7 ---
  gcc/testsuite/gcc.dg/vect/trapv-vect-reduc-4.c |  2 +-
  gcc/tree-vect-loop.c   |  3 ++-
  gcc/tree.c | 24 
  gcc/tree.h |  1 +
  7 files changed, 45 insertions(+), 12 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/autopar/reduc-2char.c 
b/gcc/testsuite/gcc.dg/autopar/reduc-2char.c
index 14867f3..a2dad44 100644
--- a/gcc/testsuite/gcc.dg/autopar/reduc-2char.c
+++ b/gcc/testsuite/gcc.dg/autopar/reduc-2char.c
@@ -39,8 +39,9 @@ void main1 (signed char x, signed char max_result, signed char min_result)
  abort ();
  }

- __attribute__((noinline))
- void init_arrays ()
+void __attribute__((noinline))
+  __attribute__((optimize ("-ftree-parallelize-loops=0")))
+init_arrays ()
   {
 int i;

@@ -60,7 +61,10 @@ int main (void)
  }


-/* { dg-final { scan-tree-dump-times "Detected reduction" 2 "parloops" { xfail 
*-*-* } } } */
+/* { dg-final { scan-tree-dump-times "Detected reduction" 2 "parloops" } } */
+/* { dg-final { scan-tree-dump-times "Detected reduction" 3 "parloops" { xfail 
*-*-* } } } */
+
+/* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 2 
"parloops" } } */
  /* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 3 
"parloops" { xfail *-*-* } } } */


diff --git a/gcc/testsuite/gcc.dg/autopar/reduc-2short.c 
b/gcc/testsuite/gcc.dg/autopar/reduc-2short.c
index 7c19cc5..a50e14f 100644
--- a/gcc/testsuite/gcc.dg/autopar/reduc-2short.c
+++ b/gcc/testsuite/gcc.dg/autopar/reduc-2short.c
@@ -38,8 +38,9 @@ void main1 (short x, short max_result, short min_result)
  abort ();
  }

- __attribute__((noinline))
- void init_arrays ()
+void __attribute__((noinline))
+  __attribute__((optimize ("-ftree-parallelize-loops=0")))
+init_arrays ()
   {
 int i;

@@ -58,7 +59,8 @@ int main (void)
return 0;
  }

+/* { dg-final { scan-tree-dump-times "Detected reduction" 2 "parloops" } } */
+/* { dg-final { scan-tree-dump-times "Detected reduction" 3 "parloops" { xfail 
*-*-* } } } */

-/* { dg-final { scan-tree-dump-times "Detected reduction" 2 "parloops" { xfail 
*-*-* } } } */
+/* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 2 
"parloops" } } */
  /* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 3 
"parloops" { xfail *-*-* } } } */
-
diff --git a/gcc/testsuite/gcc.dg/autopar/reduc-8.c 
b/gcc/testsuite/gcc.dg/autopar/reduc-8.c
index 1d05c48..18ba03d 100644
--- a/gcc/testsuite/gcc.dg/autopar/reduc-8.c
+++ b/gcc/testsuite/gcc.dg/autopar/reduc-8.c
@@ -40,7 +40,8 @@ testmin (const T *c, T init, T result)
  abort ();
  }

-int main (void)
+int __attribute__((optimize ("-ftree-parallelize-loops=0")))
+main (void)
  {
static signed char A[N] = {
  0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, 0x08,
@@ -84,5 +85,5 @@ int main (void)
  }


-/* { dg-final { scan-tree-dump-times "Detected reduction" 2 "parloops" { xfail 
*-*-* } } } */
-/* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 3 
"parloops" { xfail *-*-* } } } */
+/* { dg-final { scan-tree-dump-times "Detected reduction" 2 "parloops" } } */
+/* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 2 
"parloops" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/trapv-vect-reduc-4.c 
b/gcc/testsuite/gcc.dg/vect/trapv-vect-reduc-4.c
index 2129717..86f9b90 100644
--- a/gcc/testsuite/gcc.dg/vect/trapv-vect-reduc-4.c
+++ b/gcc/testsuite/gcc.dg/vect/trapv-vect-reduc-4.c
@@ -46,4 +46,4 @@ int main (void)
return 0;
  }

-

[committed, gomp4] Handle non-overflow reductions in graphite

2015-07-28 Thread Tom de Vries

On 26/07/15 18:53, Tom de Vries wrote:

On 26/07/15 18:49, Tom de Vries wrote:

On 24/07/15 16:39, Tom de Vries wrote:

Hi,

this patch allows parallelization and vectorization of reduction
operators that are guaranteed to not overflow (such as min and max
operators), independent of the overflow behaviour of the type.

Bootstrapped and reg-tested on x86_64.

OK for trunk?

Thanks,
- Tom


[ Slip-of-the-keyboard ]

This is the graphite version of this patch.

Bootstrapped and reg-tested on x86_64.

OK for trunk?



Committed to gomp-4_0-branch.

Thanks,
- Tom


0002-Handle-non-overflow-reductions-in-graphite.patch


Handle non-overflow reductions in graphite

2015-07-21  Tom de Vries

* graphite-sese-to-poly.c (is_reduction_operation_p): Allow operations
that do not overflow.
---
  gcc/graphite-sese-to-poly.c | 15 +--
  1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/gcc/graphite-sese-to-poly.c b/gcc/graphite-sese-to-poly.c
index c583f16..531c848 100644
--- a/gcc/graphite-sese-to-poly.c
+++ b/gcc/graphite-sese-to-poly.c
@@ -2614,8 +2614,19 @@ is_reduction_operation_p (gimple stmt)
if (FLOAT_TYPE_P (type))
  return flag_associative_math;

-  return (INTEGRAL_TYPE_P (type)
- && TYPE_OVERFLOW_WRAPS (type));
+  if (ANY_INTEGRAL_TYPE_P (type))
+{
+  if (INTEGRAL_TYPE_P (type)
+ && TYPE_OVERFLOW_WRAPS (type))
+   return true;
+
+  if (no_overflow_tree_code (code, type))
+   return true;
+
+  return false;
+}
+
+  return false;
  }

  /* Returns true when PHI contains an argument ARG.  */
-- 1.9.1





Re: [PATCH] Optimize certain end of loop conditions into min/max operation

2015-07-28 Thread Richard Biener
On Mon, Jul 27, 2015 at 6:20 PM, Jeff Law  wrote:
> On 07/27/2015 03:25 AM, Richard Biener wrote:
>>
>> On Mon, Jul 27, 2015 at 5:41 AM, Michael Collison
>>  wrote:
>>>
>>> This patch is designed to optimize end-of-loop conditions of the form
>>>   i < x && i < y into i < min (x, y). Loop conditions involving '>' are
>>> handled similarly using max(x,y).
>>> As an example:
>>>
>>> #define N 1024
>>>
>>> int a[N], b[N], c[N];
>>>
>>> void add (unsigned int m, unsigned int n)
>>> {
>>>   unsigned int i, bound = (m < n) ? m : n;
>>>   for (i = 0; i < m && i < n; ++i)
>>>     a[i] = b[i] + c[i];
>>> }
>>>
>>>
>>> Performed bootstrap and make check on: x86_64-unknown-linux-gnu,
>>> arm-linux-gnueabihf, and aarch64-linux-gnu.
>>> Okay for trunk?
>>
>>
>> So this works only for && that has been lowered to non-CFG form
>> (I suppose phiopt would catch that?  If not, ifcombine would be the
>> place to implement it I guess).
>
> phiopt is supposed to be generating MIN/MAX expressions for us.  If it isn't
> it'd be good to see any testcases where it isn't.
>
> I think that raises a general question though.  Does it make more sense to
> capture MIN/MAX (and others) in phiopt or in the match.pd framework?

match.pd is good for pattern recognition - patterns of fixed size.  There are
cases that are done in fold-const.c for example that don't fit very well
and should be done as a separate pass, like for example figuring out whether
an expression can be easily negated or whether there are sign-changes that
can be stripped.  Basically all cases where fold currently recurses (unbound).

The above case is a corner case I think - the number of && you can change
into (multiple) MIN/MAX is unbounded, but we might only care about the case
where there will be one MIN/MAX operation.

Generally phiopt and other patterns that match the CFG are not yet well
supported by match.pd (though I outlined how matching PHI nodes when
facing (simplify (cond ...) ...) would be possible).

So while putting something into match.pd is easy I'd like people to
think about whether doing the same thing elsewhere is better - that is, whether
this is really a pattern transform operation or if you are just implementing a
special case of a general transform as a pattern.

Richard.

> Jeff
>


Re: [PING][PATCH, PR66851] Handle double reduction in parloops

2015-07-28 Thread Richard Biener
On Tue, Jul 28, 2015 at 12:32 AM, Tom de Vries  wrote:
> On 24/07/15 12:30, Tom de Vries wrote:
>>
>> On 13/07/15 16:55, Tom de Vries wrote:
>>>
>>> Hi,
>>>
>>> this patch fixes PR66851.
>>>
>>> In parloops, we manage to parallelize outer loops, but not if the inner
>>> loop contains a reduction. There is an xfail in autopar/outer-4.c for
>>> this:
>>> ...
>>> /* { dg-final { scan-tree-dump-times "parallelizing outer loop" 1
>>> "parloops" { xfail *-*-* } } } */
>>> ...
>>>
>>> This patch allows outer loops with a reduction in the inner loop to be
>>> parallelized.
>>>
>
> Updated patch checks that we actually have an inner reduction that we can
> parallelize. So, uns-outer-4.c with unsigned int reduction will be
> parallelized, while outer-4.c with signed int reduction will not be
> parallelized.
>
> Bootstrapped on x86_64, reg-test in progress.
>
>
> OK for trunk?

Ok.

Richard.

> Thanks,
> - Tom
>


Re: [PATCH] Allow non-overflow ops in vect_is_simple_reduction_1

2015-07-28 Thread Richard Biener
On Fri, Jul 24, 2015 at 4:39 PM, Tom de Vries  wrote:
> Hi,
>
> this patch allows parallelization and vectorization of reduction operators
> that are guaranteed to not overflow (such as min and max operators),
> independent of the overflow behaviour of the type.
>
> Bootstrapped and reg-tested on x86_64.
>
> OK for trunk?

Hmm, I don't like that no_overflow_tree_code function.  We have a much
clearer understanding of which codes may overflow or trap.  Thus please add
an operation-specific variant of TYPE_OVERFLOW_{TRAPS,WRAPS,UNDEFINED} like

bool
operation_overflow_traps (tree type, enum tree_code code)
{
  if (!ANY_INTEGRAL_TYPE_P (type)
      || !TYPE_OVERFLOW_TRAPS (type))
    return false;
  switch (code)
    {
    case PLUS_EXPR:
    case MINUS_EXPR:
    case MULT_EXPR:
    case LSHIFT_EXPR:
      /* Can overflow in various ways.  */
    case TRUNC_DIV_EXPR:
    case EXACT_DIV_EXPR:
    case FLOOR_DIV_EXPR:
    case CEIL_DIV_EXPR:
      /* For INT_MIN / -1.  */
    case NEGATE_EXPR:
    case ABS_EXPR:
      /* For -INT_MIN.  */
      return true;
    default:
      return false;
    }
}

and similar variants for _wraps and _undefined.  I think we decided at some
point that the compiler should not take advantage of the fact that lshift or
*_div have undefined behavior on signed integer overflow; similarly, we only
take advantage of integral-type overflow behavior, not vector or complex.  So
we could reduce the number of cases in which the functions return true if we
document that they return true only for the cases where the compiler needs
to / may assume wrapping behavior does not take place.  As for _traps, for
example, we only have optabs and libfuncs for plus, minus, mult, negate
and abs.
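
For concreteness, a _wraps variant following the same shape could look like
this (a sketch only; the exact set of codes that should return true is the
point to decide):

bool
operation_overflow_wraps (tree type, enum tree_code code)
{
  if (!ANY_INTEGRAL_TYPE_P (type)
      || !TYPE_OVERFLOW_WRAPS (type))
    return false;
  switch (code)
    {
    case PLUS_EXPR:
    case MINUS_EXPR:
    case MULT_EXPR:
    case NEGATE_EXPR:
      /* Wrapping arithmetic the compiler may rely on.  */
      return true;
    default:
      return false;
    }
}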

Thanks,
Richard.

> Thanks,
> - Tom


Re: rx: remove some asserts

2015-07-28 Thread Nicholas Clifton

Hi DJ,

There is no need to assert these just to say "not supported" and gcc
may rarely generate addresses from valid code which trigger these
asserts.  Ok?

OK - please apply.

Cheers
  Nick


[gomp4, committed] more parloops-related backports

2015-07-28 Thread Tom de Vries

Hi

I've backported these parloops-related patches to gomp-4_0-branch.

- Simplify gather_scalar_reductions
- Update outer-4.c and uns-outer-4.c
- Handle double reduction in parloops

Thanks,
- Tom
Simplify gather_scalar_reductions

2015-07-27  Tom de Vries  

	backport from trunk:
	2015-07-27  Tom de Vries  

	* tree-parloops.c (gather_scalar_reductions): Simplify function
	structure.
---
 gcc/tree-parloops.c | 21 -
 1 file changed, 12 insertions(+), 9 deletions(-)

diff --git a/gcc/tree-parloops.c b/gcc/tree-parloops.c
index 05508e7..47cb5aa 100644
--- a/gcc/tree-parloops.c
+++ b/gcc/tree-parloops.c
@@ -2513,6 +2513,8 @@ gather_scalar_reductions (loop_p loop, reduction_info_table_type *reduction_list
   loop_vec_info simple_loop_info;
 
   simple_loop_info = vect_analyze_loop_form (loop);
+  if (simple_loop_info == NULL)
+return;
 
   for (gsi = gsi_start_phis (loop->header); !gsi_end_p (gsi); gsi_next (&gsi))
 {
@@ -2524,15 +2526,16 @@ gather_scalar_reductions (loop_p loop, reduction_info_table_type *reduction_list
   if (virtual_operand_p (res))
 	continue;
 
-  if (!simple_iv (loop, loop, res, &iv, true)
-	&& simple_loop_info)
-	{
-	   gimple reduc_stmt
-	 = vect_force_simple_reduction (simple_loop_info, phi, true,
-	&double_reduc, true);
-	   if (reduc_stmt && !double_reduc)
-  build_new_reduction (reduction_list, reduc_stmt, phi);
-}
+  if (simple_iv (loop, loop, res, &iv, true))
+	continue;
+
+  gimple reduc_stmt
+	= vect_force_simple_reduction (simple_loop_info, phi, true,
+   &double_reduc, true);
+  if (!reduc_stmt || double_reduc)
+	continue;
+
+  build_new_reduction (reduction_list, reduc_stmt, phi);
 }
   destroy_loop_vec_info (simple_loop_info, true);
 
-- 
1.9.1

Update outer-4.c and uns-outer-4.c

2015-07-28  Tom de Vries  

	backport from trunk:
	2015-07-27  Tom de Vries  

	* gcc.dg/autopar/outer-4.c (parloop): Remove superfluous noinline
	attribute.  Update comment.
	(main): Remove.
	Add scan for not parallelizing inner loop.
	* gcc.dg/autopar/uns-outer-4.c (parloop): Remove superfluous noinline
	attribute.
	(main): Remove.
---
 gcc/testsuite/gcc.dg/autopar/outer-4.c | 19 +++
 gcc/testsuite/gcc.dg/autopar/uns-outer-4.c | 11 +--
 2 files changed, 8 insertions(+), 22 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/autopar/outer-4.c b/gcc/testsuite/gcc.dg/autopar/outer-4.c
index 2027499..681cf85 100644
--- a/gcc/testsuite/gcc.dg/autopar/outer-4.c
+++ b/gcc/testsuite/gcc.dg/autopar/outer-4.c
@@ -6,15 +6,16 @@ void abort (void);
 int g_sum=0;
 int x[500][500];
 
-__attribute__((noinline))
-void parloop (int N)
+void
+parloop (int N)
 {
   int i, j;
   int sum;
 
-  /* Double reduction is currently not supported, outer loop is not 
- parallelized.  Inner reduction is detected, inner loop is 
- parallelized.  */
+  /* The inner reduction is not recognized as reduction because we cannot assume
+ that int wraps on overflow.  The way to fix this is to implement the
+ reduction operation in unsigned type, but we've not yet implemented
+ this.  */
   sum = 0;
   for (i = 0; i < N; i++)
 for (j = 0; j < N; j++)
@@ -23,13 +24,7 @@ void parloop (int N)
   g_sum = sum;
 }
 
-int main(void)
-{
-  parloop(500);
-
-  return 0;
-}
-
 
+/* { dg-final { scan-tree-dump-times "parallelizing inner loop" 0 "parloops" } } */
 /* { dg-final { scan-tree-dump-times "parallelizing outer loop" 1 "parloops" { xfail *-*-* } } } */
 /* { dg-final { scan-tree-dump-times "loopfn" 4 "optimized" { xfail *-*-* } } } */
diff --git a/gcc/testsuite/gcc.dg/autopar/uns-outer-4.c b/gcc/testsuite/gcc.dg/autopar/uns-outer-4.c
index 8365a89..30ead25 100644
--- a/gcc/testsuite/gcc.dg/autopar/uns-outer-4.c
+++ b/gcc/testsuite/gcc.dg/autopar/uns-outer-4.c
@@ -6,7 +6,7 @@ void abort (void);
 unsigned int g_sum=0;
 unsigned int x[500][500];
 
-void __attribute__((noinline))
+void
 parloop (int N)
 {
   int i, j;
@@ -23,14 +23,5 @@ parloop (int N)
   g_sum = sum;
 }
 
-int
-main (void)
-{
-  parloop (500);
-
-  return 0;
-}
-
-
 /* { dg-final { scan-tree-dump-times "parallelizing outer loop" 1 "parloops" { xfail *-*-* } } } */
 /* { dg-final { scan-tree-dump-times "loopfn" 4 "optimized" } } */
-- 
1.9.1

Handle double reduction in parloops

2015-07-28  Tom de Vries  

	backport from trunk:
	2015-07-28  Tom de Vries  

	* tree-parloops.c (reduc_stmt_res): New function.
	(initialize_reductions, add_field_for_reduction)
	(create_phi_for_local_result, create_loads_for_reductions)
	(create_stores_for_reduction, build_new_reduction): Handle case that
	reduc_stmt is a phi.
	(gather_scalar_reductions): Allow double_reduc reductions.

	* gcc.dg/autopar/uns-outer-4.c: Remove xfail on scan for parallelizing
	outer loop.

	* testsuite/libgomp.c/uns-outer-4.c: New test.
---
 gcc/testsuite/gcc.dg/autopar/uns-outer-4.c |  6 +--
 gcc/tree-parloops.c| 73 ++

[gomp4,committed] Replace pass_fre with pass_copy_prop in kernels pass group

2015-07-28 Thread Tom de Vries

Hi,

I couldn't reproduce the need for pass_fre in the oacc kernels pass 
group, so I've replaced it with an instance of pass_copy_prop.


Bootstrapped and reg-tested on x86_64.

Committed to gomp-4_0-branch.

Thanks,
- Tom
Replace pass_fre with pass_copy_prop in kernels pass group

2015-07-28  Tom de Vries  

	* passes.def: Replace pass_fre with pass_copy_prop in oacc kernels pass
	group.

	* g++.dg/ipa/devirt-37.C: Update for removal of pass_fre.
	* g++.dg/ipa/devirt-40.C: Same.
	* g++.dg/tree-ssa/pr61034.C: Same.
	* gcc.dg/ipa/ipa-pta-13.c: Same.
	* gcc.dg/ipa/ipa-pta-3.c: Same.
	* gcc.dg/ipa/ipa-pta-4.c: Same.
---
 gcc/passes.def  |  4 +---
 gcc/testsuite/g++.dg/ipa/devirt-37.C| 10 +-
 gcc/testsuite/g++.dg/ipa/devirt-40.C|  4 ++--
 gcc/testsuite/g++.dg/tree-ssa/pr61034.C |  8 
 gcc/testsuite/gcc.dg/ipa/ipa-pta-13.c   |  4 ++--
 gcc/testsuite/gcc.dg/ipa/ipa-pta-3.c|  4 ++--
 gcc/testsuite/gcc.dg/ipa/ipa-pta-4.c|  4 ++--
 7 files changed, 18 insertions(+), 20 deletions(-)

diff --git a/gcc/passes.def b/gcc/passes.def
index 6a2b095..ae91ed1 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -95,9 +95,7 @@ along with GCC; see the file COPYING3.  If not see
 	  NEXT_PASS (pass_ch_oacc_kernels);
 	  NEXT_PASS (pass_tree_loop_init);
 	  NEXT_PASS (pass_lim);
-	  NEXT_PASS (pass_tree_loop_done);
-	  NEXT_PASS (pass_fre);
-	  NEXT_PASS (pass_tree_loop_init);
+	  NEXT_PASS (pass_copy_prop);
 	  NEXT_PASS (pass_scev_cprop);
   	  NEXT_PASS (pass_parallelize_loops_oacc_kernels);
 	  NEXT_PASS (pass_expand_omp_ssa);
diff --git a/gcc/testsuite/g++.dg/ipa/devirt-37.C b/gcc/testsuite/g++.dg/ipa/devirt-37.C
index b7f52a0..9c5287e 100644
--- a/gcc/testsuite/g++.dg/ipa/devirt-37.C
+++ b/gcc/testsuite/g++.dg/ipa/devirt-37.C
@@ -1,4 +1,4 @@
-/* { dg-options "-fpermissive -O2 -fno-indirect-inlining -fno-devirtualize-speculatively -fdump-tree-fre3-details -fno-early-inlining"  } */
+/* { dg-options "-fpermissive -O2 -fno-indirect-inlining -fno-devirtualize-speculatively -fdump-tree-fre2-details -fno-early-inlining"  } */
 #include 
 struct A {virtual void test() {abort ();}};
 struct B:A
@@ -30,7 +30,7 @@ t()
 /* After inlining the call within constructor needs to be checked to not go into a basetype.
We should see the vtbl store and we should notice extcall as possibly clobbering the
type but ignore it because b is in static storage.  */
-/* { dg-final { scan-tree-dump "No dynamic type change found."  "fre3"  } } */
-/* { dg-final { scan-tree-dump "Checking vtbl store:"  "fre3"  } } */
-/* { dg-final { scan-tree-dump "Function call may change dynamic type:extcall"  "fre3"  } } */
-/* { dg-final { scan-tree-dump "converting indirect call to function virtual void"  "fre3"  } } */
+/* { dg-final { scan-tree-dump "No dynamic type change found."  "fre2"  } } */
+/* { dg-final { scan-tree-dump "Checking vtbl store:"  "fre2"  } } */
+/* { dg-final { scan-tree-dump "Function call may change dynamic type:extcall"  "fre2"  } } */
+/* { dg-final { scan-tree-dump "converting indirect call to function virtual void"  "fre2"  } } */
diff --git a/gcc/testsuite/g++.dg/ipa/devirt-40.C b/gcc/testsuite/g++.dg/ipa/devirt-40.C
index 5107c29..279a228 100644
--- a/gcc/testsuite/g++.dg/ipa/devirt-40.C
+++ b/gcc/testsuite/g++.dg/ipa/devirt-40.C
@@ -1,4 +1,4 @@
-/* { dg-options "-O2 -fdump-tree-fre3-details"  } */
+/* { dg-options "-O2 -fdump-tree-fre2-details"  } */
 typedef enum
 {
 } UErrorCode;
@@ -19,4 +19,4 @@ A::m_fn1 (UnicodeString &, int &p2, UErrorCode &) const
   UnicodeString a[2];
 }
 
-/* { dg-final { scan-tree-dump-not "\\n  OBJ_TYPE_REF" "fre3"  } } */
+/* { dg-final { scan-tree-dump-not "\\n  OBJ_TYPE_REF" "fre2"  } } */
diff --git a/gcc/testsuite/g++.dg/tree-ssa/pr61034.C b/gcc/testsuite/g++.dg/tree-ssa/pr61034.C
index c019830..14fd85a 100644
--- a/gcc/testsuite/g++.dg/tree-ssa/pr61034.C
+++ b/gcc/testsuite/g++.dg/tree-ssa/pr61034.C
@@ -1,5 +1,5 @@
 // { dg-do compile }
-// { dg-options "-O3 -fdump-tree-fre3" }
+// { dg-options "-O3 -fdump-tree-fre2" }
 
 #define assume(x) if(!(x))__builtin_unreachable()
 
@@ -42,6 +42,6 @@ bool f(I a, I b, I c, I d) {
 // a bunch of conditional free()s and unreachable()s.
 // This works only if everything is inlined into 'f'.
 
-// { dg-final { scan-tree-dump-times ";; Function" 1 "fre3" } }
-// { dg-final { scan-tree-dump-times "free" 18 "fre3" } }
-// { dg-final { scan-tree-dump-times "unreachable" 11 "fre3" } }
+// { dg-final { scan-tree-dump-times ";; Function" 1 "fre2" } }
+// { dg-final { scan-tree-dump-times "free" 18 "fre2" } }
+// { dg-final { scan-tree-dump-times "unreachable" 11 "fre2" } }
diff --git a/gcc/testsuite/gcc.dg/ipa/ipa-pta-13.c b/gcc/testsuite/gcc.dg/ipa/ipa-pta-13.c
index 71b31c4..f558df3 100644
--- a/gcc/testsuite/gcc.dg/ipa/ipa-pta-13.c
+++ b/gcc/testsuite/gcc.dg/ipa/ipa-pta-13.c
@@ -1,5 +1,5 @@
 /* { dg-do link } */
-/* { dg-options "-O2 -fipa-pta -fdump-ipa-pta-d

[gomp4, committed] Handle double reduction in oacc kernels pass group

2015-07-28 Thread Tom de Vries

Hi,

this patch adds a test-case with a double reduction in an oacc kernels 
region.


In order to get it in the proper shape for parloops to deal with, I 
needed to repeat the pass_lim/pass_copy_prop sequence.


Bootstrapped and reg-tested on x86_64.

Committed to gomp-4_0-branch.

Thanks,
- Tom
Handle double reduction in oacc kernels pass group

2015-07-28  Tom de Vries  

	* passes.def: Repeat pass_lim and pass_copy_prop in oacc kernels pass
	group.

	* c-c++-common/goacc/kernels-double-reduction.c: New test.
---
 gcc/passes.def |  2 ++
 .../c-c++-common/goacc/kernels-double-reduction.c  | 37 ++
 2 files changed, 39 insertions(+)
 create mode 100644 gcc/testsuite/c-c++-common/goacc/kernels-double-reduction.c

diff --git a/gcc/passes.def b/gcc/passes.def
index ae91ed1..e31e39f 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -96,6 +96,8 @@ along with GCC; see the file COPYING3.  If not see
 	  NEXT_PASS (pass_tree_loop_init);
 	  NEXT_PASS (pass_lim);
 	  NEXT_PASS (pass_copy_prop);
+	  NEXT_PASS (pass_lim);
+	  NEXT_PASS (pass_copy_prop);
 	  NEXT_PASS (pass_scev_cprop);
   	  NEXT_PASS (pass_parallelize_loops_oacc_kernels);
 	  NEXT_PASS (pass_expand_omp_ssa);
diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-double-reduction.c b/gcc/testsuite/c-c++-common/goacc/kernels-double-reduction.c
new file mode 100644
index 000..81467a9
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-double-reduction.c
@@ -0,0 +1,37 @@
+/* { dg-additional-options "-O2" } */
+/* { dg-additional-options "-ftree-parallelize-loops=32" } */
+/* { dg-additional-options "-fdump-tree-parloops_oacc_kernels-all" } */
+/* { dg-additional-options "-fdump-tree-optimized" } */
+
+#include 
+
+#define N 500
+
+unsigned int a[N][N];
+
+void  __attribute__((noinline,noclone))
+foo (void)
+{
+  int i, j;
+  unsigned int sum = 1;
+
+#pragma acc kernels copyin (a[0:N]) copy (sum)
+  {
+for (i = 0; i < N; ++i)
+  for (j = 0; j < N; ++j)
+	sum += a[i][j];
+  }
+
+  if (sum != 5001)
+abort ();
+}
+
+/* Check that only one loop is analyzed, and that it can be parallelized.  */
+/* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 1 "parloops_oacc_kernels" } } */
+/* { dg-final { scan-tree-dump-not "FAILED:" "parloops_oacc_kernels" } } */
+/* { dg-final { scan-tree-dump-times "parallelizing outer loop" 1 "parloops_oacc_kernels" } } */
+
+/* Check that the loop has been split off into a function.  */
+/* { dg-final { scan-tree-dump-times "(?n);; Function .*foo.*._omp_fn.0" 1 "optimized" } } */
+
+/* { dg-final { scan-tree-dump-times "(?n)pragma omp target oacc_parallel.*num_gangs\\(32\\)" 1 "parloops_oacc_kernels" } } */
-- 
1.9.1



[C++ Patch, preapproved] Prefer DECL_SOURCE_LOCATION to "+D" and "+#D" (2/n)

2015-07-28 Thread Paolo Carlini

Hi,

other bits. Tested x86_64-linux.

Thanks,
Paolo.

/
2015-07-28  Paolo Carlini  

* call.c (build_op_delete_call, convert_like_real, build_over_call):
Use DECL_SOURCE_LOCATION and "%qD" in inform and pedwarn instead
of "%q+D".
* constexpr.c (explain_invalid_constexpr_fn): Likewise.
* decl.c (duplicate_decls): Likewise for warning/warning_at.
* except.c (maybe_noexcept_warning): Likewise.
* friend.c (make_friend_class): Likewise for inform.
* mangle.c (mangle_decl): Likewise for warning/warning_at.
* method.c (process_subob_fn, walk_field_subobs,
maybe_explain_implicit_delete): Likewise for inform.
* parser.c (cp_parser_lambda_introducer): Likewise.
* pt.c (check_specialization_namespace,
maybe_process_partial_specialization): Likewise for permerror.
(redeclare_class_template): Likewise for inform_n.
(coerce_template_parms, tsubst_copy_and_build): Likewise for inform.
* search.c (check_final_overrider): Likewise.
* semantics.c (process_outer_var_ref): Likewise.
Index: call.c
===
--- call.c  (revision 226260)
+++ call.c  (working copy)
@@ -5843,7 +5843,7 @@ build_op_delete_call (enum tree_code code, tree ad
= G_("exception cleanup for this placement new selects "
 "non-placement operator delete");
  const char *msg2
-   = G_("%q+D is a usual (non-placement) deallocation "
+   = G_("%qD is a usual (non-placement) deallocation "
 "function in C++14 (or with -fsized-deallocation)");
 
  /* But if the class has an operator delete (void *), then that is
@@ -5865,7 +5865,7 @@ build_op_delete_call (enum tree_code code, tree ad
{
  if ((complain & tf_warning)
  && warning (OPT_Wc__14_compat, msg1))
-   inform (0, msg2, fn);
+   inform (DECL_SOURCE_LOCATION (fn), msg2, fn);
  goto ok;
}
 
@@ -5875,9 +5875,10 @@ build_op_delete_call (enum tree_code code, tree ad
{
  /* Only mention C++14 for namespace-scope delete.  */
  if (DECL_NAMESPACE_SCOPE_P (fn))
-   inform (0, msg2, fn);
+   inform (DECL_SOURCE_LOCATION (fn), msg2, fn);
  else
-   inform (0, "%q+D is a usual (non-placement) deallocation "
+   inform (DECL_SOURCE_LOCATION (fn),
+   "%qD is a usual (non-placement) deallocation "
"function", fn);
}
}
@@ -6333,8 +6334,8 @@ convert_like_real (conversion *convs, tree expr, t
  build_user_type_conversion (totype, convs->u.expr, LOOKUP_NORMAL,
  complain);
  if (fn)
-   inform (input_location, "  initializing argument %P of %q+D",
-   argnum, fn);
+   inform (DECL_SOURCE_LOCATION (fn),
+   "  initializing argument %P of %qD", argnum, fn);
}
   return error_mark_node;
 
@@ -6486,8 +6487,8 @@ convert_like_real (conversion *convs, tree expr, t
  gcc_unreachable ();
maybe_print_user_conv_context (convs);
if (fn)
- inform (input_location,
- "  initializing argument %P of %q+D", argnum, fn);
+ inform (DECL_SOURCE_LOCATION (fn),
+ "  initializing argument %P of %qD", argnum, fn);
return error_mark_node;
  }
 
@@ -7307,7 +7308,8 @@ build_over_call (struct z_candidate *cand, int fla
  pedwarn (input_location, 0, "deducing %qT as %qT",
   non_reference (TREE_TYPE (patparm)),
   non_reference (type));
- pedwarn (input_location, 0, "  in call to %q+D", cand->fn);
+ pedwarn (DECL_SOURCE_LOCATION (cand->fn), 0,
+  "  in call to %qD", cand->fn);
  pedwarn (input_location, 0,
   "  (you can disable this with -fno-deduce-init-list)");
}
Index: constexpr.c
===
--- constexpr.c (revision 226260)
+++ constexpr.c (working copy)
@@ -829,7 +829,8 @@ explain_invalid_constexpr_fn (tree fun)
 
   save_loc = input_location;
   input_location = DECL_SOURCE_LOCATION (fun);
-  inform (0, "%q+D is not usable as a constexpr function because:", fun);
+  inform (input_location,
+ "%qD is not usable as a constexpr function because:", fun);
   /* First check the declaration.  */
   if (is_valid_constexpr_fn (fun, true))
 {
Index: decl.c
===
--- decl.c  (revision 226260)
+++ decl.c  (working copy)
@@ -1378,8 +1378,9 @@ duplicate_decls (tree newdecl

[PATCH GCC]By pass following iterations if expr has already been simplified into const.

2015-07-28 Thread Bin Cheng
Hi,
This is an obvious change to bypass the remaining iterations once expr has
been simplified into a constant in function
simplify_using_initial_conditions.

Is it OK?

Thanks,
bin

2015-07-28  Bin Cheng  

* tree-ssa-loop-niter.c (simplify_using_initial_conditions): Break
loop if EXPR is simplified to const values.
Index: gcc/tree-ssa-loop-niter.c
===
--- gcc/tree-ssa-loop-niter.c   (revision 225859)
+++ gcc/tree-ssa-loop-niter.c   (working copy)
@@ -1815,6 +1815,10 @@ simplify_using_initial_conditions (struct loop *lo
   if (e->flags & EDGE_FALSE_VALUE)
cond = invert_truthvalue (cond);
   expr = tree_simplify_using_condition (cond, expr);
+  /* Break if EXPR is simplified to const values.  */
+  if (expr && (integer_zerop (expr) || integer_nonzerop (expr)))
+   break;
+
   ++cnt;
 }
 


Re: [gomp4] Add new oacc_transform patch

2015-07-28 Thread Thomas Schwinge
Hi!

On Tue, 21 Jul 2015 10:15:05 -0700, Cesar Philippidis  
wrote:
> Jakub,
> 
> Nathan pointed out that I should make the fold_oacc_reductions pass that
> I introduced in my reduction patch more generic so that other openacc
> transformations may use it. This patch introduces an empty skeleton pass
> called oacc_transform. Currently I'm stashing it inside omp-low.c. Is
> that a good place for it, or should I move it to its own separate file?
> 
> The motivation behind this pass is to allow us to generate
> target-specific code in a generic manner. E.g., for reductions, I'm
> emitting calls to internal functions during lowering, then later on in
> this pass I'm expanding those calls using target machine hooks. This
> pass will run after lto on the target compiler.

(Another use case for this is to evaluate acc_on_device with compile-time
constant argument earlier than currently.)

Jakub, is this conceptually OK, or even OK to commit to trunk already?


Cesar, please address the following compiler diagnostic:

> 2015-07-21  Cesar Philippidis  
> 
>   gcc/
>   * omp-low.c (execute_oacc_transform): New function.
>   (class pass_oacc_transform): New function.
>   (make_pass_oacc_transform): New function.
>   * passes.def: Add pass_oacc_transform to all_passes.
>   * tree-pass.h (make_pass_oacc_transform): Declare.
>   
> 
> diff --git a/gcc/omp-low.c b/gcc/omp-low.c
> index 388013c..23989f9 100644
> --- a/gcc/omp-low.c
> +++ b/gcc/omp-low.c
> @@ -14394,4 +14394,76 @@ make_pass_late_lower_omp (gcc::context *ctxt)
>return new pass_late_lower_omp (ctxt);
>  }
>  
> +/* Main entry point for oacc transformations which run on the device
> +   compiler.  */
> +
> +static unsigned int
> +execute_oacc_transform ()
> +{
> +  basic_block bb;
> +  gimple_stmt_iterator gsi;
> +  gimple stmt;
> +
> +  if (!lookup_attribute ("oacc function",
> +  DECL_ATTRIBUTES (current_function_decl)))
> +return 0;
> +
> +
> +  FOR_ALL_BB_FN (bb, cfun)
> +{
> +  gsi = gsi_start_bb (bb);
> +
> +  while (!gsi_end_p (gsi))
> + {
> +   stmt = gsi_stmt (gsi);
> +   gsi_next (&gsi);
> + }
> +}
> +
> +  return 0;
> +}

[...]/source-gcc/gcc/omp-low.c: In function 'unsigned int 
execute_oacc_transform()':
[...]/source-gcc/gcc/omp-low.c:14406:10: error: variable 'stmt' set but not 
used [-Werror=unused-but-set-variable]
   gimple stmt;
  ^

> +
> +namespace {
> +
> +const pass_data pass_data_oacc_transform =
> +{
> +  GIMPLE_PASS, /* type */
> +  "fold_oacc_transform", /* name */
> +  OPTGROUP_NONE, /* optinfo_flags */
> +  TV_NONE, /* tv_id */
> +  PROP_cfg, /* properties_required */
> +  0 /* Possibly PROP_gimple_eomp.  */, /* properties_provided */
> +  0, /* properties_destroyed */
> +  0, /* todo_flags_start */
> +  TODO_update_ssa, /* todo_flags_finish */
> +};
> +
> +class pass_oacc_transform : public gimple_opt_pass
> +{
> +public:
> +  pass_oacc_transform (gcc::context *ctxt)
> +: gimple_opt_pass (pass_data_oacc_transform, ctxt)
> +  {}
> +
> +  /* opt_pass methods: */
> +  virtual unsigned int execute (function *)
> +{
> +  bool gate = (flag_openacc != 0 && !seen_error ());
> +
> +  if (!gate)
> + return 0;
> +
> +  return execute_oacc_transform ();
> +}
> +
> +}; // class pass_oacc_transform
> +
> +} // anon namespace
> +
> +gimple_opt_pass *
> +make_pass_oacc_transform (gcc::context *ctxt)
> +{
> +  return new pass_oacc_transform (ctxt);
> +}
> +
>  #include "gt-omp-low.h"
> diff --git a/gcc/passes.def b/gcc/passes.def
> index 43e67df..6a2b095 100644
> --- a/gcc/passes.def
> +++ b/gcc/passes.def
> @@ -165,6 +165,7 @@ along with GCC; see the file COPYING3.  If not see
>INSERT_PASSES_AFTER (all_passes)
>NEXT_PASS (pass_fixup_cfg);
>NEXT_PASS (pass_lower_eh_dispatch);
> +  NEXT_PASS (pass_oacc_transform);
>NEXT_PASS (pass_all_optimizations);
>PUSH_INSERT_PASSES_WITHIN (pass_all_optimizations)
>NEXT_PASS (pass_remove_cgraph_callee_edges);
> diff --git a/gcc/tree-pass.h b/gcc/tree-pass.h
> index 13f20ea..67dc017 100644
> --- a/gcc/tree-pass.h
> +++ b/gcc/tree-pass.h
> @@ -410,6 +410,7 @@ extern gimple_opt_pass *make_pass_late_lower_omp 
> (gcc::context *ctxt);
>  extern gimple_opt_pass *make_pass_diagnose_omp_blocks (gcc::context *ctxt);
>  extern gimple_opt_pass *make_pass_expand_omp (gcc::context *ctxt);
>  extern gimple_opt_pass *make_pass_expand_omp_ssa (gcc::context *ctxt);
> +extern gimple_opt_pass *make_pass_oacc_transform (gcc::context *ctxt);
>  extern gimple_opt_pass *make_pass_object_sizes (gcc::context *ctxt);
>  extern gimple_opt_pass *make_pass_strlen (gcc::context *ctxt);
>  extern gimple_opt_pass *make_pass_fold_builtins (gcc::context *ctxt);


Grüße,
 Thomas




RE: [PATCH][1/N] Change GET_MODE_INNER to always return a non-void mode

2015-07-28 Thread David Sherwood
> 
> On 07/27/2015 04:25 AM, David Sherwood wrote:
> > Hi,
> >
> > Part 1 of this change is a clean-up. I have changed calls to
> > GET_MODE_INNER (m) so that it returns m in cases where there is no
> > inner mode. This simplifies some of the calling code by removing the
> > need to check for VOIDmode and allows calling it unconditionally. I
> > also removed element_precision () as it was only called in one place
> > and I thought it neater to call GET_MODE_PRECISION explicitly.
> >
> > Parts 2-4 will include further tidy-ups and optimisations based on [1/N].
> >
> > Good to go?
> >
> > Regards,
> > David Sherwood.
> >
> > 2015-07-17  David Sherwood
> >
> >  gcc/
> >  * config/arm/arm.c (neon_element_bits, neon_valid_immediate): Call
> >  GET_MODE_INNER unconditionally.
> >  * config/spu/spu.c (arith_immediate_p): Likewise.
> >  * config/i386/i386.c (ix86_build_signbit_mask): Likewise.  New 
> > variable.
> >  * expmed.c (synth_mult): Remove check for VOIDmode result from
> >  GET_MODE_INNER.
> >  (expand_mult_const): Likewise.
> >  * fold-const.c (): Replace call to element_precision with call to
> >  GET_MODE_PRECISION.
> >  * genmodes.c (emit_mode_inner_inline): Replace void_mode->name with
> >  m->name.
> >  (emit_mode_inner): Likewise.
> >  * lto-streamer-out.c (lto_write_mode_table): Update GET_MODE_INNER
> >  result check.
> >  * machmode.h (GET_MODE_UNIT_SIZE): Simplify.
> >  (GET_MODE_UNIT_PRECISION): Likewise.
> >  * rtlanal.c (subreg_get_info): Call GET_MODE_INNER unconditionally.
> >  * simplify-rtx.c (simplify_immed_subreg): Likewise.
> >  * stor-layout.c (bitwise_type_for_mode): Update assert.
> >  (element_precision): Remove.
> Somehow my brain kept translating INNER into NARROWER.  Naturally I was
> having considerable trouble seeing how the patch could be correct ;-)
> Looking at insn-modes.h cleared things up quickly.
> 
> In a lot of ways this makes GET_MODE_INNER act more like
> GET_MODE_NUNITS, which is probably good.
> 
> You need to update the comment for GET_MODE_INNER in machmode.h to
> reflect the change in its return value for non-vector modes.
Thanks for the quick response! Before I post a new patch, does this new
comment seem ok?

/* Where MODE represents a vector, return the mode of the inner elements;
otherwise just return MODE.  */

Dave.

> 
> With that update, this patch is fine.
> 
> jeff





[PATCH GCC]Improve bound information in loop niter analysis

2015-07-28 Thread Bin Cheng
Hi,
Loop niter analysis computes inaccurate bound information for some loops.  This
patch improves it by using the loop initial condition in
determine_value_range.  Generally, loop niter is computed by subtracting the
start var from the end var in the loop exit condition.  Moreover, the loop
bound is computed using value range information of both the start and end
variables.  The basic idea of this patch is to check whether the loop initial
condition implies more range information for the start/end variables.  If so,
we refine the range information and use that to compute the loop bound.
With this improvement, more accurate loop bound information is computed for
the test cases added by this patch.
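
As a concrete instance of the idea (the reasoning behind loop-bound-1.c
below; a sketch):

extern int a[256];

int
foo (unsigned char s, unsigned char l)
{
  unsigned char i;
  int sum = 0;

  /* The loop is entered only when s < l holds, which implies s <= 254
     and l >= 1.  With these refined ranges the analysis can bound the
     number of latch iterations by 254 instead of the type-derived 255.  */
  for (i = s; i < l; i += 1)
    sum += a[i];

  return sum;
}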

Is it OK?

Thanks,
bin

2015-07-28  Bin Cheng  

* tree-ssa-loop-niter.c (refine_value_range_using_guard): New.
(determine_value_range): Call refine_value_range_using_guard for
each loop initial condition to improve value range.

gcc/testsuite/ChangeLog
2015-07-28  Bin Cheng  

* gcc.dg/tree-ssa/loop-bound-1.c: New test.
* gcc.dg/tree-ssa/loop-bound-3.c: New test.
* gcc.dg/tree-ssa/loop-bound-5.c: New test.
Index: gcc/testsuite/gcc.dg/tree-ssa/loop-bound-3.c
===
--- gcc/testsuite/gcc.dg/tree-ssa/loop-bound-3.c(revision 0)
+++ gcc/testsuite/gcc.dg/tree-ssa/loop-bound-3.c(revision 0)
@@ -0,0 +1,22 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-ivopts-details" } */
+
+int *a;
+
+int
+foo (unsigned char s, unsigned char l)
+{
+  unsigned char i;
+  int sum = 0;
+
+  for (i = s; i > l; i -= 1)
+{
+  sum += a[i];
+}
+
+  return sum;
+}
+
+/* Check loop niter bound information.  */
+/* { dg-final { scan-tree-dump "bounded by 254" "ivopts" } } */
+/* { dg-final { scan-tree-dump-not "bounded by 255" "ivopts" } } */
Index: gcc/testsuite/gcc.dg/tree-ssa/loop-bound-5.c
===
--- gcc/testsuite/gcc.dg/tree-ssa/loop-bound-5.c(revision 0)
+++ gcc/testsuite/gcc.dg/tree-ssa/loop-bound-5.c(revision 0)
@@ -0,0 +1,22 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-ivopts-details" } */
+
+int *a;
+
+int
+foo (unsigned char s)
+{
+  unsigned char i;
+  int sum = 0;
+
+  for (i = s; i > 0; i -= 1)
+{
+  sum += a[i];
+}
+
+  return sum;
+}
+
+/* Check loop niter bound information.  */
+/* { dg-final { scan-tree-dump "bounded by 254" "ivopts" } } */
+/* { dg-final { scan-tree-dump-not "bounded by 255" "ivopts" } } */
Index: gcc/testsuite/gcc.dg/tree-ssa/loop-bound-1.c
===
--- gcc/testsuite/gcc.dg/tree-ssa/loop-bound-1.c(revision 0)
+++ gcc/testsuite/gcc.dg/tree-ssa/loop-bound-1.c(revision 0)
@@ -0,0 +1,22 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-ivopts-details" } */
+
+int *a;
+
+int
+foo (unsigned char s, unsigned char l)
+{
+  unsigned char i;
+  int sum = 0;
+
+  for (i = s; i < l; i += 1)
+{
+  sum += a[i];
+}
+
+  return sum;
+}
+
+/* Check loop niter bound information.  */
+/* { dg-final { scan-tree-dump "bounded by 254" "ivopts" } } */
+/* { dg-final { scan-tree-dump-not "bounded by 255" "ivopts" } } */
Index: gcc/tree-ssa-loop-niter.c
===
--- gcc/tree-ssa-loop-niter.c   (revision 225859)
+++ gcc/tree-ssa-loop-niter.c   (working copy)
@@ -122,6 +122,233 @@ split_to_var_and_offset (tree expr, tree *var, mpz
 }
 }
 
+/* From condition C0 CMP C1 derives information regarding the value range
+   of VAR, which is of TYPE.  Results are stored in to BELOW and UP.  */
+
+static void
+refine_value_range_using_guard (tree type, tree var,
+   tree c0, enum tree_code cmp, tree c1,
+   mpz_t below, mpz_t up)
+{
+  tree varc0, varc1, ctype;
+  mpz_t offc0, offc1;
+  mpz_t mint, maxt, minc1, maxc1;
+  wide_int minv, maxv;
+  bool no_wrap = nowrap_type_p (type);
+  bool c0_ok, c1_ok;
+  signop sgn = TYPE_SIGN (type);
+
+  switch (cmp)
+{
+case LT_EXPR:
+case LE_EXPR:
+case GT_EXPR:
+case GE_EXPR:
+  STRIP_SIGN_NOPS (c0);
+  STRIP_SIGN_NOPS (c1);
+  ctype = TREE_TYPE (c0);
+  if (!useless_type_conversion_p (ctype, type))
+   return;
+
+  break;
+
+case EQ_EXPR:
+  /* We could derive quite precise information from EQ_EXPR, however,
+such a guard is unlikely to appear, so we do not bother with
+handling it.  */
+  return;
+
+case NE_EXPR:
+  /* NE_EXPR comparisons do not contain much of useful information,
+except for cases of comparing with bounds.  */
+  if (TREE_CODE (c1) != INTEGER_CST
+ || !INTEGRAL_TYPE_P (type))
+   return;
+
+  /* Ensure that the condition speaks about an expression in the same
+type as X and Y.  */
+  ctype = TREE_TYPE (c0);
+  if (TYPE_PRECISION (ctype) != TYPE

[PATCH GCC]Improve loop bound info by simplifying conversions in iv base

2015-07-28 Thread Bin Cheng
Hi,
For now, SCEV may compute an iv base in the form "(signed T)((unsigned
T)base + step)".  This complicates other optimizations/analyses depending
on SCEV because it's hard to see through the type conversions.  In many cases
such type conversions can be simplified with additional range information
implied by loop initial conditions.  This patch does that simplification.
With a simplified iv base, loop niter analysis can compute more accurate bound
information since a sensible value range can be derived for "base + step".  For
example, accurate loop bound & may_be_zero information is computed for the
cases added by this patch.
The code is borrowed from loop_exits_before_overflow.  Moreover, with a
simplified iv base, the second case handled in that function now becomes the
first case.  I didn't remove that part of the code because it may(?) still be
visited in scev analysis itself, and simple_iv isn't an interface for that.
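
A concrete instance of the kind of conversion this removes (a sketch;
loop-bound-2.c below exhibits it):

extern int a[];

int
foo (signed char s, signed char l)
{
  signed char i;
  int sum = 0;

  /* SCEV may express the iv base as
       (signed char) ((unsigned char) s + 1)
     to avoid asserting signed overflow.  The entry guard s < l implies
     s <= 126, so s + 1 cannot overflow and the conversions can be
     stripped, letting niter analysis derive a sensible range for the
     base and compute accurate bound and may_be_zero information.  */
  for (i = s; i < l; i++)
    sum += a[i];

  return sum;
}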

Is it OK?

Thanks,
bin

2015-07-28  Bin Cheng  

* tree-ssa-loop-niter.c (tree_simplify_using_condition): Export
the interface.
* tree-ssa-loop-niter.h (tree_simplify_using_condition): Declare.
* tree-scalar-evolution.c (simple_iv): Simplify type conversions
in iv base using loop initial conditions.

gcc/testsuite/ChangeLog
2015-07-28  Bin Cheng  

* gcc.dg/tree-ssa/loop-bound-2.c: New test.
* gcc.dg/tree-ssa/loop-bound-4.c: New test.
* gcc.dg/tree-ssa/loop-bound-6.c: New test.
Index: gcc/testsuite/gcc.dg/tree-ssa/loop-bound-2.c
===
--- gcc/testsuite/gcc.dg/tree-ssa/loop-bound-2.c(revision 0)
+++ gcc/testsuite/gcc.dg/tree-ssa/loop-bound-2.c(revision 0)
@@ -0,0 +1,23 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-ivopts-details" } */
+
+int *a;
+
+int
+foo (signed char s, signed char l)
+{
+  signed char i;
+  int sum = 0;
+
+  for (i = s; i < l; i++)
+{
+  sum += a[i];
+}
+
+  return sum;
+}
+
+/* Check loop niter bound information.  */
+/* { dg-final { scan-tree-dump "bounded by 254" "ivopts" } } */
+/* { dg-final { scan-tree-dump-not "bounded by 255" "ivopts" } } */
+/* { dg-final { scan-tree-dump-not "zero if " "ivopts" } } */
Index: gcc/testsuite/gcc.dg/tree-ssa/loop-bound-4.c
===
--- gcc/testsuite/gcc.dg/tree-ssa/loop-bound-4.c(revision 0)
+++ gcc/testsuite/gcc.dg/tree-ssa/loop-bound-4.c(revision 0)
@@ -0,0 +1,23 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-ivopts-details" } */
+
+int *a;
+
+int
+foo (signed char s, signed char l)
+{
+  signed char i;
+  int sum = 0;
+
+  for (i = s; i > l; i--)
+{
+  sum += a[i];
+}
+
+  return sum;
+}
+
+/* Check loop niter bound information.  */
+/* { dg-final { scan-tree-dump "bounded by 254" "ivopts" } } */
+/* { dg-final { scan-tree-dump-not "bounded by 255" "ivopts" } } */
+/* { dg-final { scan-tree-dump-not "zero if " "ivopts" } } */
Index: gcc/testsuite/gcc.dg/tree-ssa/loop-bound-6.c
===
--- gcc/testsuite/gcc.dg/tree-ssa/loop-bound-6.c(revision 0)
+++ gcc/testsuite/gcc.dg/tree-ssa/loop-bound-6.c(revision 0)
@@ -0,0 +1,23 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-ivopts-details" } */
+
+int *a;
+
+int
+foo (signed char s)
+{
+  signed char i;
+  int sum = 0;
+
+  for (i = s; i > 0; i--)
+{
+  sum += a[i];
+}
+
+  return sum;
+}
+
+/* Check loop niter bound information.  */
+/* { dg-final { scan-tree-dump "bounded by 126" "ivopts" } } */
+/* { dg-final { scan-tree-dump-not "bounded by 127" "ivopts" } } */
+/* { dg-final { scan-tree-dump-not "zero if " "ivopts" } } */
Index: gcc/tree-ssa-loop-niter.c
===
--- gcc/tree-ssa-loop-niter.c   (revision 225859)
+++ gcc/tree-ssa-loop-niter.c   (working copy)
@@ -1779,9 +2043,9 @@ tree_simplify_using_condition (tree cond, tree exp
 
 /* Tries to simplify EXPR using the conditions on entry to LOOP.
Returns the simplified expression (or EXPR unchanged, if no
-   simplification was possible).*/
+   simplification was possible).  */
 
-static tree
+tree
 simplify_using_initial_conditions (struct loop *loop, tree expr)
 {
   edge e;
Index: gcc/tree-ssa-loop-niter.h
===
--- gcc/tree-ssa-loop-niter.h   (revision 225859)
+++ gcc/tree-ssa-loop-niter.h   (working copy)
@@ -21,6 +21,7 @@ along with GCC; see the file COPYING3.  If not see
 #define GCC_TREE_SSA_LOOP_NITER_H
 
 extern tree expand_simple_operations (tree, tree = NULL);
+extern tree simplify_using_initial_conditions (struct loop *, tree);
 extern bool loop_only_exit_p (const struct loop *, const_edge);
 extern bool number_of_iterations_exit (struct loop *, edge,
   struct tree_niter_desc *niter, bool,
Index: gcc/

Re: [PATCH, PR66846] Mark inner loop for fixup in parloops

2015-07-28 Thread Richard Biener
On Fri, Jul 24, 2015 at 12:10 PM, Tom de Vries  wrote:
> On 20/07/15 15:04, Tom de Vries wrote:
>>
>> On 16/07/15 12:15, Richard Biener wrote:
>>>
>>> On Thu, Jul 16, 2015 at 11:39 AM, Tom de Vries
>>>  wrote:

 On 16/07/15 10:44, Richard Biener wrote:
>
>
> On Wed, Jul 15, 2015 at 9:36 PM, Tom de Vries 
> wrote:
>>
>>
>> Hi,
>>
>> I.
>>
>> In openmp expansion of loops, we do some effort to try to create
>> matching
>> loops in the loop state of the child function, f.i. in
>> expand_omp_for_generic:
>> ...
>> struct loop *outer_loop;
>> if (seq_loop)
>>   outer_loop = l0_bb->loop_father;
>> else
>>   {
>> outer_loop = alloc_loop ();
>> outer_loop->header = l0_bb;
>> outer_loop->latch = l2_bb;
>> add_loop (outer_loop, l0_bb->loop_father);
>>   }
>>
>> if (!gimple_omp_for_combined_p (fd->for_stmt))
>>   {
>> struct loop *loop = alloc_loop ();
>> loop->header = l1_bb;
>> /* The loop may have multiple latches.  */
>> add_loop (loop, outer_loop);
>>   }
>> ...
>>
>> And if that doesn't work out, we try to mark the loop state for
>> fixup, in
>> expand_omp_taskreg and expand_omp_target:
>> ...
>> /* When the OMP expansion process cannot guarantee an
>> up-to-date
>>loop tree arrange for the child function to fixup
>> loops.  */
>> if (loops_state_satisfies_p (LOOPS_NEED_FIXUP))
>>   child_cfun->x_current_loops->state |= LOOPS_NEED_FIXUP;
>> ...
>>
>> and expand_omp_for:
>> ...
>> else
>>   /* If there isn't a continue then this is a degenerate case where
>>  the introduction of abnormal edges during lowering will
>> prevent
>>  original loops from being detected.  Fix that up.  */
>>   loops_state_set (LOOPS_NEED_FIXUP);
>> ...
>>
>> However, loops are fixed up anyway, because the first pass we execute
>> with
>> the new child function is pass_fixup_cfg.
>>
>> The new child function contains a function call to
>> __builtin_omp_get_num_threads, which is marked with ECF_CONST, so
>> execute_fixup_cfg marks the function for TODO_cleanup_cfg, and
>> subsequently
>> the loops with LOOPS_NEED_FIXUP.
>>
>>
>> II.
>>
>> This patch adds a verification that at the end of the omp-expand
>> processing
>> of the child function, either the loop structure is ok, or marked for
>> fixup.
>>
>> This verification triggered a failure in parloops. When an outer
>> loop is
>> being parallelized, both the outer and inner loop are cancelled. Then
>> during
>> omp-expansion, we create a loop in the loop state for the outer
>> loop (the
>> one that is transformed), but not for the inner, which causes the
>> verification failure:
>> ...
>> outer-1.c:11:3: error: loop with header 5 not in loop tree
>> ...
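
A minimal nest of the kind that trips this check, in the spirit of the
outer-1.c cited above (a hypothetical reconstruction, not the actual
testcase):

/* Compile with -O2 -ftree-parallelize-loops=2.  Parallelizing the
   outer loop cancels both loops in the nest; omp-expansion then
   recreates a loop for the transformed outer loop only, leaving the
   inner loop's header outside the loop tree.  */
void
f (int n, int m, double *__restrict a)
{
  for (int i = 0; i < n; i++)
    for (int j = 0; j < m; j++)
      a[i * m + j] = 0.0;
}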
>>
>> [ I ran into this verification failure with an openacc kernels
>> testcase
>> on
>> the gomp-4_0-branch, where parloops is called additionally from a
>> different
>> location, and pass_fixup_cfg is not the first pass that the child
>> function
>> is processed by. ]
>>
>> The patch contains a bit that makes sure that the loop state of the
>> child
>> function is marked for fixup in parloops. The bit is non-trivial
>> since it
>> creates a loop state and sets the fixup flag on the loop state, but
>> postpones
>> the init_loops_structure call till move_sese_region_to_fn, where it
>> can
>> succeed.
>>
>>
>>
>> 
>>
>>> Can we fix the root-cause of the issue instead?  That is, build a
>>> valid loop
>>> structure in the first place?
>>>
>>
>> This patch manages to keep the loop structure, that is, to not cancel
>> the loop tree in parloops, and guarantee a valid loop structure at the
>> end of parloops.
>>
>> The transformation to insert the omp_for invalidates the loop state
>> properties LOOPS_HAVE_RECORDED_EXITS and LOOPS_HAVE_SIMPLE_LATCHES, so
>> we drop those in parloops.
>>
>> In expand_omp_for_static_nochunk, we detect the existing loop struct of
>> the omp_for, and keep it.
>>
>> Then by calling pass_tree_loop_init after pass_expand_omp_ssa, we get
>> the loop state properties LOOPS_HAVE_RECORDED_EXITS and
>> LOOPS_HAVE_SIMPLE_LATCHES back.
>>
>
> This updated patch tries a more minimal approach.
>
> Rather than dropping property LOOPS_HAVE_RECORDED_EXITS, we record the new
> exit instead.
>
> And rather than adding pass_tree_loop_init after pass_expand_omp_ssa, we
> just set LOOPS_HAVE_SIMPLE_LATCHES back at the end of pass_expand_omp_ssa.
>
> Bootstrapped and reg-tested on x86_

Re: [gomp4] Fix some gomp tests

2015-07-28 Thread Thomas Schwinge
Hi Nathan!

On Sat, 25 Jul 2015 16:02:01 -0400, Nathan Sidwell  wrote:
> I've committed this to gomp4 branch.  It fixes some tests that were incorrect 

Hmm, I fail to see what you deem incorrect in the following two Fortran
test cases?  Implicit present_or_copy clauses should be added by the
compiler, basically equal to your explicit present clauses.

> and fail with some development I am working on.

Fail in what way?  I'd expect the original code still to be valid?

>   * testsuite/libgomp.oacc-fortran/data-2.f90: Add present clauses
>   to parallels.
>   * testsuite/libgomp.oacc-fortran/lib-14.f90: Likewise.

> --- libgomp/testsuite/libgomp.oacc-fortran/data-2.f90 (revision 226189)
> +++ libgomp/testsuite/libgomp.oacc-fortran/data-2.f90 (working copy)
> @@ -19,7 +19,7 @@ program test
>  
>!$acc enter data copyin (a(1:N,1:N), b(1:N,1:N))
>  
> -  !$acc parallel
> +  !$acc parallel present (a(1:N,1:N), b(1:N,1:N))
>do i = 1, n
>  do j = 1, n
>b(j,i) = a (j,i)
> @@ -45,7 +45,7 @@ program test
>!$acc enter data copyin (c(1:N)) create (d(1:N)) async
>!$acc wait
>
> -  !$acc parallel 
> +  !$acc parallel present (c(1:N), d(1:N))
>  do i = 1, N
>d(i) = c(i) + 1
>  end do
> @@ -65,7 +65,7 @@ program test
>!$acc enter data create (d(1:N)) wait
>!$acc wait
>  
> -  !$acc parallel 
> +  !$acc parallel present (c(1:N), d(1:N))
>  do i = 1, N
>d(i) = c(i) + 1
>  end do
> @@ -128,7 +128,7 @@ program test
>if (acc_is_present (c) .eqv. .FALSE.) call abort
>if (acc_is_present (d) .eqv. .FALSE.) call abort
>  
> -  !$acc parallel
> +  !$acc parallel present (c(0:N), d(0:N))
>  do i = 1, N
>c(i) = 1.0;
>d(i) = 2.0;

> --- libgomp/testsuite/libgomp.oacc-fortran/lib-14.f90 (revision 226189)
> +++ libgomp/testsuite/libgomp.oacc-fortran/lib-14.f90 (working copy)
> @@ -46,7 +46,7 @@ program main
>  
>if (acc_is_present (h) .neqv. .TRUE.) call abort
>  
> -  !$acc parallel loop
> +  !$acc parallel loop present (h)
>  do i = 1, N
>h(i) = i
>  end do


Regards,
 Thomas




Re: [PATCH][RTL-ifcvt] Make non-conditional execution if-conversion more aggressive

2015-07-28 Thread Kyrill Tkachov

Hi Jeff,

On 27/07/15 17:40, Kyrill Tkachov wrote:

On 27/07/15 17:09, Jeff Law wrote:

On 07/27/2015 04:17 AM, Kyrill Tkachov wrote:

I experimented with resource.c and the roadblock I hit is that it
seems to have an assumption that it operates on hard regs (in fact
the struct it uses to describe the resources has a HARD_REG_SET for
the regs) and so it triggers various HARD_REGISTER_P asserts when I
try to use the functions there. if-conversion runs before register
allocation, so we're dealing with pseudos here.

Sigh.  resource.c probably isn't going to be useful then.


My other attempt was to go over BB_A and mark the set registers in a
   bitmap then go over BB_B and do a FOR_EACH_SUBRTX of the SET_SRC of
each insn. If a sub-rtx is a reg that is set in the bitmap from BB_A
we return false. This seemed to do the job and testing worked out ok.
That would require one walk over BB_A, one walk over BB_B but I don't
know how expensive FOR_EACH_SUBRTX walks are...

Would that be an acceptable solution?

I think the latter is reasonable.  Ultimately we have to do a full look
at those rtxs, so it's unavoidable to some extent.

The only other possibility would be to use the DF framework.  I'm not
sure if it's even initialized for the ifcvt code.  If it is, then you
might be able to avoid some of the walking of the insns and instead walk
the DF structures.

I think it is initialized (I look at df_get_live_out earlier on
in the call chain). I suppose what we want is for the live in regs for BB_B
to not include any of the set regs in BB_A?







It fails when the last insn is not recognised, because
noce_try_cmove_arith can modify the last insn, but I have not seen
it cause any trouble. If it fails then back in noce_try_cmove_arith
we goto end_seq_and_fail which ends the sequence and throws it away
(and cancels if-conversion down that path), so it should be safe.
OK, I was working from the assumption that memoization ought not to
fail, but it seems that was a bad assumption on my part.  So
given that noce_try_cmove_arith can change the last insn and make it
non-recognizable, this code seems reasonable.

So I think the only outstanding issues are:

1. Investigate moving rather than re-emitting insns.

I'll look into that, but what is the machinery by which one moves
insns?

I don't think we have any good generic machinery for this.  I think
every pass that needs this capability unlinks the insn from the chain
and patches it back in at the new location.

That's the SET_PREV_INSN, SET_NEXT_INSN functions, right?

The current way the top-level noce_process_if_block is structured
it expects the various ifcvt functions (like noce_try_cmove_arith)
to generate a sequence, then it takes it, unshares it and removes
the empty basic blocks.

If we're to instead move insns around we'd need to further modify
   noce_process_if_block to handle differently
   this one case where we move insns instead of re-emitting them.
I think this would make that function more convoluted than it needs to be.
With the current approach we always call unshare_all_rtl_in_chain on the
emitted sequence which should take care of any RTL sharing issues and in
practice I don't expect to have more than 3-4 insns in these sequences since
they will be guarded by the branch cost.

So I would rather argue for re-emitting insns in this case to keep consistent
with the dozen or so similar functions in ifcvt.c that already work that way.


Here's a respin.
I've reworked bbs_ok_for_cmove_arith to go over BB_A once and record
the set registers then go over BB_B once and look inside the SET_SRC
of each insn for those registers. How does this look? Would you like
me to investigate the data-flow infrastructure approach?
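
To make the shape of that check concrete, here is a rough sketch of
the two-pass walk (illustrative only, not the actual patch; names and
the include set are assumptions, and a real version must also handle
multi-set insns, clobbers and non-single_set uses):

#include "config.h"
#include "system.h"
#include "coretypes.h"
#include "rtl.h"
#include "rtl-iter.h"

static bool
bbs_ok_for_cmove_arith_sketch (basic_block bb_a, basic_block bb_b)
{
  rtx_insn *insn;
  bitmap bba_sets = BITMAP_ALLOC (NULL);

  /* Pass 1: record every register set in BB_A.  */
  FOR_BB_INSNS (bb_a, insn)
    if (NONDEBUG_INSN_P (insn))
      {
        rtx set = single_set (insn);
        if (set && REG_P (SET_DEST (set)))
          bitmap_set_bit (bba_sets, REGNO (SET_DEST (set)));
      }

  /* Pass 2: reject if any SET_SRC in BB_B reads one of those regs.  */
  FOR_BB_INSNS (bb_b, insn)
    {
      rtx set = NONDEBUG_INSN_P (insn) ? single_set (insn) : NULL_RTX;
      if (!set)
        continue;
      subrtx_iterator::array_type array;
      FOR_EACH_SUBRTX (iter, array, SET_SRC (set), NONCONST)
        if (REG_P (*iter) && bitmap_bit_p (bba_sets, REGNO (*iter)))
          {
            BITMAP_FREE (bba_sets);
            return false;
          }
    }

  BITMAP_FREE (bba_sets);
  return true;
}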

Also, in bb_valid_for_noce_process_p I iterate through the sub-rtxes
looking for a MEM with the FOR_EACH_SUBRTX machinery.

As I said above, I think moving the insns rather than re-emitting them
would make the function more convoluted than I think it needs to be.

Bootstrapped and tested on arm, aarch64, x86_64.



2015-07-28  Kyrylo Tkachov  

* ifcvt.c (struct noce_if_info): Add then_simple, else_simple,
then_cost, else_cost fields.  Change branch_cost field to unsigned int.
(end_ifcvt_sequence): Call set_used_flags on each insn in the
sequence.
(noce_simple_bbs): New function.
(noce_try_move): Bail if basic blocks are not simple.
(noce_try_store_flag): Likewise.
(noce_try_store_flag_constants): Likewise.
(noce_try_addcc): Likewise.
(noce_try_store_flag_mask): Likewise.
(noce_try_cmove): Likewise.
(noce_try_minmax): Likewise.
(noce_try_abs): Likewise.
(noce_try_sign_mask): Likewise.
(noce_try_bitop): Likewise.
(bbs_ok_for_cmove_arith): New function.
(noce_emit_all_but_last): Likewise.
(noce_emit_insn): Likewise.
(noce_emit_bb): Likewise.
(noce_try_cmove_arith): Handle non-simple basic blocks.
(insn_valid_noce_process_p): New function.
(contains_mem_rtx_p): Likewise.

Re: Re: [PATCH][ARM] PR target/66731 Fix vnmul insn with -frounding-math

2015-07-28 Thread Szabolcs Nagy

On 24/07/15 12:27, Kyrill Tkachov wrote:

On 24/07/15 12:10, Szabolcs Nagy wrote:

(-a)*b should not be compiled to vnmul a,b with -frounding-math.
Added a new -(a*b) pattern for vnmul and the old one is only
used if !flag_rounding_math.  Updated the costs too.

This is the ARM version of
https://gcc.gnu.org/ml/gcc-patches/2015-07/msg00300.html

Tested with arm-none-linux-gnueabihf cross compiler.
is this OK?

gcc/Changelog:

2015-07-20  Szabolcs Nagy  

PR target/66731
* config/arm/arm.md (muldf3negdf_vfp): Handle -frounding-math.
(mulsf3negsf_vfp): Likewise.


This entry is misleading. You disable two existing patterns
for flag_rounding_math and you add two new patterns. The
entry should reflect that.

..

Can you give the new pattern a different name to reflect that
the neg is on the outside? Something like *negmulsf3_vfp.

..

+/* { dg-options "-O2 -mfpu=vfp -mfloat-abi=hard" } */

Can you please add an explicit -fno-rounding-math here? That way we get a hint 
as to
why these tests exist. Alternatively, you can rename the tests to be 
pr66731_1.c,
pr66731_2.c etc. That way in the future we'll know what issue they're testing 
for.

..

+float
+foo_s (float a, float b)
+{
+  /* { dg-final { scan-assembler "vneg\.f32" } } */
+  /* { dg-final { scan-assembler "vmul\.f32" } } */
+  return -a * b;
+}

I'd prefer if you just do a scan-assembler not "vnmul", which is what this
patch really fixes. Whether the midend decides to use a pair of vneg+vmul
is tangential to this patch, it's the vnmul that we're trying to avoid.


[v2]:
- used different names for the new patterns
- fixed change log accordingly
- used explicit -fno-rounding-math in tests
- used scan-assembler-not "vnmul"

(I haven't changed the names of the tests to be
consistent with the aarch64 patches but if people
prefer pr.c name I can do that.)
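
For reference, a small self-contained illustration of why the fold is
invalid under dynamic rounding modes (a hypothetical test, not one of
the files above): with round-toward-negative, (-a)*b and -(a*b) can
differ in the last ulp, which is exactly what a vnmul for (-a)*b
would get wrong.

/* Compile with: gcc -O2 -frounding-math example.c -lm  */
#include <fenv.h>
#include <stdio.h>

int
main (void)
{
  volatile double a = 1.0 + 0x1.0p-52;  /* 1 + ulp */
  volatile double b = 1.0 + 0x1.0p-52;
  fesetround (FE_DOWNWARD);
  double r1 = (-a) * b;   /* exact product rounded toward -inf */
  double r2 = -(a * b);   /* a*b rounded toward -inf, then negated */
  /* r1 = -0x1.0000000000003p+0, r2 = -0x1.0000000000002p+0; folding
     (-a)*b into -(a*b) would silently change r1 to r2.  */
  printf ("%a\n%a\n", r1, r2);
  return 0;
}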

gcc/Changelog:

2015-07-28  Szabolcs Nagy  

PR target/66731
* config/arm/arm.md (negmuldf3_vfp): Add new pattern.
(negmulsf3_vfp): Likewise.
(muldf3negdf_vfp): Disable for -frounding-math.
(mulsf3negsf_vfp): Likewise.
* config/arm/arm.c (arm_new_rtx_costs): Fix NEG cost for VNMUL,
fix MULT cost with -frounding-math.

gcc/testsuite/Changelog:

2015-07-28  Szabolcs Nagy  

PR target/66731
* gcc.target/arm/vnmul-1.c: New.
* gcc.target/arm/vnmul-2.c: New.
* gcc.target/arm/vnmul-3.c: New.
* gcc.target/arm/vnmul-4.c: New.

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index e1bc727..797c9e5 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -10177,7 +10177,7 @@ arm_new_rtx_costs (rtx x, enum rtx_code code, enum rtx_code outer_code,
 	{
 	  rtx op0 = XEXP (x, 0);
 
-	  if (GET_CODE (op0) == NEG)
+	  if (GET_CODE (op0) == NEG && !flag_rounding_math)
 	op0 = XEXP (op0, 0);
 
 	  if (speed_p)
@@ -10251,6 +10251,13 @@ arm_new_rtx_costs (rtx x, enum rtx_code code, enum rtx_code outer_code,
   if (TARGET_HARD_FLOAT && GET_MODE_CLASS (mode) == MODE_FLOAT
 	  && (mode == SFmode || !TARGET_VFP_SINGLE))
 	{
+	  if (GET_CODE (XEXP (x, 0)) == MULT)
+	{
+	  /* VNMUL.  */
+	  *cost = rtx_cost (XEXP (x, 0), mode, NEG, 0, speed_p);
+	  return true;
+	}
+
 	  if (speed_p)
 	*cost += extra_cost->fp[mode != SFmode].neg;
 
diff --git a/gcc/config/arm/vfp.md b/gcc/config/arm/vfp.md
index f62ff79..081aab2 100644
--- a/gcc/config/arm/vfp.md
+++ b/gcc/config/arm/vfp.md
@@ -770,6 +770,17 @@
   [(set (match_operand:SF		   0 "s_register_operand" "=t")
 	(mult:SF (neg:SF (match_operand:SF 1 "s_register_operand" "t"))
 		 (match_operand:SF	   2 "s_register_operand" "t")))]
+  "TARGET_32BIT && TARGET_HARD_FLOAT && TARGET_VFP && !flag_rounding_math"
+  "vnmul%?.f32\\t%0, %1, %2"
+  [(set_attr "predicable" "yes")
+   (set_attr "predicable_short_it" "no")
+   (set_attr "type" "fmuls")]
+)
+
+(define_insn "*negmulsf3_vfp"
+  [(set (match_operand:SF		   0 "s_register_operand" "=t")
+	(neg:SF (mult:SF (match_operand:SF 1 "s_register_operand" "t")
+		 (match_operand:SF	   2 "s_register_operand" "t"]
   "TARGET_32BIT && TARGET_HARD_FLOAT && TARGET_VFP"
   "vnmul%?.f32\\t%0, %1, %2"
   [(set_attr "predicable" "yes")
@@ -781,6 +792,18 @@
   [(set (match_operand:DF		   0 "s_register_operand" "=w")
 	(mult:DF (neg:DF (match_operand:DF 1 "s_register_operand" "w"))
 		 (match_operand:DF	   2 "s_register_operand" "w")))]
+  "TARGET_32BIT && TARGET_HARD_FLOAT && TARGET_VFP_DOUBLE
+  && !flag_rounding_math"
+  "vnmul%?.f64\\t%P0, %P1, %P2"
+  [(set_attr "predicable" "yes")
+   (set_attr "predicable_short_it" "no")
+   (set_attr "type" "fmuld")]
+)
+
+(define_insn "*negmuldf3_vfp"
+  [(set (match_operand:DF		   0 "s_register_operand" "=w")
+	(neg:DF (mult:DF (match_operand:DF 1 "s_register_operand" "w")
+		 (match_operand:DF	   2 "s_register_operand" "w"]
   "TARGET_32BIT && TARGET_HARD_FLOAT && TARGET_VFP_DOUBLE"
   "vnmul%?.f64\\t%P0, %P1, %P2"
   [(set_attr "predicable" "yes")

Re: Another benefit of the new if converter: better performance for half hammocks when running the generated code on a modern high-speed CPU with write-back caching, relative to the code produced by t

2015-07-28 Thread Richard Biener
On Fri, Jul 17, 2015 at 10:12 PM, Abe  wrote:
> Dear all,
>
> Another benefit of the new if converter that perhaps I neglected to
> mention/explain...
>
> TLDR: for some source code which can reasonably be expected to exist in
> "real-world code",
> when the false/true values of the condition bits of a given "if" in a given
> loop are very
> well-clustered, the code produced by the new converter runs _much_ faster
> for the same
> inputs than the code produced by the old converter when write-back caching
> is in effect.
>
> The long explanation follows.
>
>
>
> In the case of a loop with a "half-hammock" that looks something like this:
>
>   if (C[index])  A[index] = foo(bar);
>   // important: no else here
>
> ... or problem-wise-equivalently:
>
>   if (C[index])  ; // empty "then" section
>   else   B[index] = foo(bar);
>
> ... the latter of which is semantically equivalent to:
>
>   if (! C[index])  B[index] = foo(bar);
>   // important: no else here
>
>
> ... the old if converter does something that may massively damage
> performance.
>
> Basically, the old if converter converts...
>
>   if (C[index])  A[index] = foo(bar);
>   // important: no else here
>
> ... to the equivalent of:
>
>   __compiler_temp = foo(bar);
>   A[index] = C[index] ? __compiler_temp : A[index];
>
>
> For now, let`s assume the preceding conversion is valid even in the face of
> multithreading,
> since multithreading bugs introduced by an "optimization" are a whole other
> ball of wax than what this message is all about; for now, let`s assume that
> all of A[] is thread-local and no nasty, sneaky pointer-passing has
> occurred.
>
> The problem is this: the compiler cannot, in the general case, predict what
> the values of C[]
> will be at runtime.  Therefore, it cannot [again, in the general case] arrive
> at a conclusion
> "this is definitely worth it" or "this is definitely _not_ worth it".  All
> the compiler can do
> statically without profiling information is to say "I guess a probability of
> 50% on the
> elements of C[] being equivalent to true", which -- under an assumption of
> vectorization --
> means that the vectorization factor is going to make the transformation
> worthwhile.
>
> However: what if the values of C[] are mostly equivalent to false, not to
> true?  For such
> cases, the old if converter yielded code that may cause a big performance
> degradation due to
> the if conversion, even in the presence of vectorization.  If we assume that
> the CPU hardware
> is not checking to see whether writes to an address change the contents,
> then each execution
> of "A[index] = C[index] ? foo(bar) : A[index];" is causing a write to occur
> *_no matter
> what the value of "C[index]" is/was_*.  Now, instead of reading the whole
> A[] and writing
> a tiny fraction of it, the program is reading all of A[] and also
> (re)writing at least
> almost all of A[] (possibly writing all of it even when the probability of
> the
> relevant elements of C[] is _not_ 100%, due to cache-line granularity of
> writes:
> when every cache line from A[] is modified, all of A[] will be rewritten).
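
A rough sketch of the two conversions may make this concrete (names
are hypothetical; `scratch' stands for the converter's thread-local
scratchpad, and foo is assumed side-effect-free so that it can be
evaluated speculatively):

extern double foo (double);
static double scratch;  /* stand-in for the scratchpad */

/* Old converter: every iteration stores to A[i], dirtying its cache
   line even when C[i] is false.  */
void
old_convert (double *A, const char *C, double bar, int n)
{
  for (int i = 0; i < n; i++)
    A[i] = C[i] ? foo (bar) : A[i];
}

/* New converter: the store is unconditional, but its address is
   selected, so useless writes land in the scratchpad and the cache
   lines of A[] stay clean when C[i] is false.  */
void
new_convert (double *A, const char *C, double bar, int n)
{
  for (int i = 0; i < n; i++)
    *(C[i] ? &A[i] : &scratch) = foo (bar);
}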
>
> The preceding problem could be somewhat ameliorated by profiling, providing
> that the data
> you run through your program while profiling is a good representation of the
> data run
> through the same program by "real executions", but there is no need for that
> extra work
> or extra qualification given the existence of the new if converter.  Plus,
> the profiling
> approach to "fixing" this problem with the old converter would only result
> in a binary
> decision -- "do convert this if" vs. "don`t convert this if" -- which in
> cases where the
> decision is to do/retain the conversion, the converted code is going to
> rewrite the whole
> array.  The new converter`s conversion, OTOH, can produce better performance
> than the
> conversion from the old converter in cases where the elements of C[] in our
> example are
> very clustered: in other words, the _overall_ probability can still be close
> [or =] to the
> hardest-to-deal-with challenge of 50%, but there is a large degree of
> clustering of the
> "true" values and the "false" values.  For example, all the "false" values
> come first in C[].
> In this case, if a profiling-based approach using the old converter decides
> to do/keep
> the conversion, then there are still lots of wasted writes that the new
> converter
> would avoid, assuming at least one level of write-back cache in the relevant
> data path.
>
> The key factor for understanding how/why the new converter`s
> resulting code
> is better than that of the old converter is this: the new converter uses a
> scratchpad
> to "throw away" useless writes.  This not only fixes problems with
> speculative writes
> through a null pointer that the pre-conversion code never actually does, it
> also fixes
> the above-described potential performance problem, at least on architectures
> with
> write-back data cache, which AFAIK co

[PATCH][AArch64] Properly handle simple arith+extend ops in rtx costs

2015-07-28 Thread Kyrill Tkachov

Hi all,

Currently we assign the wrong rtx cost to instructions of the form
  add x0, x0, w1, sxtw

that is, an arith operation plus a single extend (no shifting).
We correctly catch the cases where the extend is inside a shift, but
not the simple case.

This patch fixes that oversight by catching the simple case in
aarch64_rtx_arith_op_extract_p and thus making sure that it gets
assigned the alu.extend_arith extra cost.
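
For illustration, C that typically produces this form on AArch64 at
-O2 (a hypothetical example, not taken from the patch):

/* Usually compiles to:  add x0, x0, w1, sxtw
   -- an add whose second operand is a sign-extended 32-bit register,
   i.e. arith plus extend with no shift.  */
long
add_extended (long a, int b)
{
  return a + b;
}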

Bootstrapped and tested on aarch64.

Ok for trunk?
Thanks,
Kyrill


2015-07-28  Kyrylo Tkachov  

* config/aarch64/aarch64.c (aarch64_rtx_arith_op_extract_p):
Handle simple SIGN_EXTEND or ZERO_EXTEND.
(aarch64_rtx_costs): Properly strip extend or extract before
passing down to rtx costs again.
commit 6ad208ea10b0893b356dab9d0c6f59821441229c
Author: Kyrylo Tkachov 
Date:   Fri Jul 24 15:02:10 2015 +0100

[AArch64] Properly handle simple arith+extend ops in rtx costs

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 617c079..eb70c30 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -5622,6 +5622,11 @@ aarch64_rtx_arith_op_extract_p (rtx x, machine_mode mode)
 	  return true;
 	}
 }
+  /* The simple case <ARITH>, XD, XN, XM, [us]xt.
+ No shift.  */
+  else if (GET_CODE (x) == SIGN_EXTEND
+	   || GET_CODE (x) == ZERO_EXTEND)
+return REG_P (XEXP (x, 0));
 
   return false;
 }
@@ -6133,7 +6138,8 @@ cost_minus:
 	if (speed)
 	  *cost += extra_cost->alu.extend_arith;
 
-	*cost += rtx_cost (XEXP (XEXP (op1, 0), 0), VOIDmode,
+	op1 = aarch64_strip_extend (op1);
+	*cost += rtx_cost (op1, VOIDmode,
 			   (enum rtx_code) GET_CODE (op1), 0, speed);
 	return true;
 	  }
@@ -6211,7 +6217,8 @@ cost_plus:
 	if (speed)
 	  *cost += extra_cost->alu.extend_arith;
 
-	*cost += rtx_cost (XEXP (XEXP (op0, 0), 0), VOIDmode,
+	op0 = aarch64_strip_extend (op0);
+	*cost += rtx_cost (op0, VOIDmode,
 			   (enum rtx_code) GET_CODE (op0), 0, speed);
 	return true;
 	  }


Re: [libstdc++/67015, patch] Fix regex POSIX bracket parsing

2015-07-28 Thread Jonathan Wakely

On 27/07/15 19:40 -0700, Tim Shen wrote:

Done by s/_M_add_collating_element/_M_add_collate_element/.


Great, thanks. OK for trunk and gcc-5-branch.


Re: [PATCH] Unswitching outer loops.

2015-07-28 Thread Richard Biener
On Thu, Jul 23, 2015 at 4:45 PM, Yuri Rumyantsev  wrote:
> Hi Richard,
>
> I checked that both test-cases from 23855 are successfully unswitched
> by proposed patch. I understand that it does not catch deeper loop
> nest as
>for (i=0; i<10; i++)
>  for (j=0; j<n; j++)
>for (k=0; k<20; k++)
>   ...
> but duplication of middle-loop does not look reasonable.
>
> Here is dump for your second test-case:
>
> void foo(int *ie, int *je, double *x)
> {
>   int i, j;
>   for (j=0; j<*je; ++j)
> for (i=0; i<*ie; ++i)
>   x[i+j] = 0.0;
> }
> grep -i unswitch t6.c.119t.unswitch
> ;; Unswitching outer loop
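
For clarity, a rough sketch of the unswitched result, assuming the
pass can treat *ie as invariant across the nest (the else arm is the
degenerate copy that is later removed as dead code):

void
foo_unswitched (int *ie, int *je, double *x)
{
  int i, j;
  if (*ie > 0)                    /* hoisted zero-trip-count check */
    for (j = 0; j < *je; ++j)
      for (i = 0; i < *ie; ++i)
        x[i + j] = 0.0;
  else
    for (j = 0; j < *je; ++j)     /* degenerate copy: dead code */
      ;
}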

I was saying: why go with a limited approach when a patch (in
unknown state...)
is available that does it more generally?  Also unswitching is quite
expensive compared
to "moving" the invariant condition.

In your patch:

+  if (!nloop->force_vectorize)
+nloop->force_vectorize = true;
+  if (loop->safelen != 0)
+nloop->safelen = loop->safelen;

I see no guard on force_vectorize so = true looks bogus here.  Please just use
copy_loop_info.

+  if (integer_nonzerop (cond_new))
+gimple_cond_set_condition_from_tree (cond_stmt, boolean_true_node);
+  else if (integer_zerop (cond_new))
+gimple_cond_set_condition_from_tree (cond_stmt, boolean_false_node);

gimple_cond_make_true/false (cond_stmt);

btw, seems odd that we have to recompute which loop is the true / false variant
when we just fed a guard condition to loop_version.  Can't we statically
determine whether loop or nloop has the in-loop condition true or false?

+  /* Clean-up cfg to remove useless one-argument phi in exit block of
+ outer-loop.  */
+  cleanup_tree_cfg ();

I know unswitching is already O(number-of-unswitched-loops * size-of-function)
because it updates SSA form after each individual unswitching (and it does that
because it invokes itself recursively on unswitched loops).  But do you really
need to invoke CFG cleanup here?

Richard.

> Yuri.
>
> 2015-07-14 14:06 GMT+03:00 Richard Biener :
>> On Fri, Jul 10, 2015 at 12:02 PM, Yuri Rumyantsev  wrote:
>>> Hi All,
>>>
>>> Presented here is a simple transformation which tries to hoist out of
>>> the outer loop a check on zero trip count for the inner loop. This is a very
>>> restricted transformation since it accepts outer loops with very
>>> simple cfg, as for example:
>>> acc = 0;
>>>for (i = 1; i <= m; i++) {
>>>   for (j = 0; j < n; j++)
>>>  if (l[j] == i) { v[j] = acc; acc++; };
>>>   acc <<= 1;
>>>}
>>>
>>> Note that a degenerate outer loop (without an inner loop) will be
>>> completely deleted as dead code.
>>> The main goal of this transformation was to convert the outer loop to a form
>>> accepted by outer-loop vectorization (such a test case is also included
>>> in the patch).
>>>
>>> Bootstrap and regression testing did not show any new failures.
>>>
>>> Is it OK for trunk?
>>
>> I think this is
>>
>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=23855
>>
>> as well.  It has a patch adding an invariant loop guard hoisting
>> phase to loop-header copying.  Yeah, it needs updating to
>> trunk again I suppose.  It's always non-stage1 when I come
>> back to that patch.
>>
>> Your patch seems to be very specific and only handles outer
>> loops of innermost loops.
>>
>> Richard.
>>
>>> ChangeLog:
>>> 2015-07-10  Yuri Rumyantsev  
>>>
>>> * tree-ssa-loop-unswitch.c: Include "tree-cfgcleanup.h" and
>>> "gimple-iterator.h", add prototype for tree_unswitch_outer_loop.
>>> (tree_ssa_unswitch_loops): Add invoke of tree_unswitch_outer_loop.
>>> (tree_unswitch_outer_loop): New function.
>>>
>>> gcc/testsuite/ChangeLog:
>>> * gcc.dg/tree-ssa/unswitch-outer-loop-1.c: New test.
>>> * gcc.dg/vect/vect-outer-simd-3.c: New test.


Re: [PATCH][AArch64] Properly handle simple arith+extend ops in rtx costs

2015-07-28 Thread Richard Earnshaw
On 28/07/15 11:25, Kyrill Tkachov wrote:
> Hi all,
> 
> Currently we assign the wrong rtx cost to instructions of the form
>add x0, x0, w1, sxtw
> 
> that is, an arith operation plus a single extend (no shifting).
> We correctly catch the cases where the extend is inside a shift, but
> not the simple case.
> 
> This patch fixes that oversight by catching the simple case in
> aarch64_rtx_arith_op_extract_p and thus making sure that it gets
> assigned the alu.extend_arith extra cost.
> 
> Bootstrapped and tested on aarch64.
> 
> Ok for trunk?

OK.

R.

> Thanks,
> Kyrill
> 
> 
> 2015-07-28  Kyrylo Tkachov  
> 
>  * config/aarch64/aarch64.c (aarch64_rtx_arith_op_extract_p):
>  Handle simple SIGN_EXTEND or ZERO_EXTEND.
>  (aarch64_rtx_costs): Properly strip extend or extract before
>  passing down to rtx costs again.
> 
> 
> aarch64-arith-extend-costs.patch
> 
> 
> commit 6ad208ea10b0893b356dab9d0c6f59821441229c
> Author: Kyrylo Tkachov 
> Date:   Fri Jul 24 15:02:10 2015 +0100
> 
> [AArch64] Properly handle simple arith+extend ops in rtx costs
> 
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index 617c079..eb70c30 100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -5622,6 +5622,11 @@ aarch64_rtx_arith_op_extract_p (rtx x, machine_mode 
> mode)
> return true;
>   }
>  }
> +  /* The simple case <ARITH>, XD, XN, XM, [us]xt.
> + No shift.  */
> +  else if (GET_CODE (x) == SIGN_EXTEND
> +|| GET_CODE (x) == ZERO_EXTEND)
> +return REG_P (XEXP (x, 0));
>  
>return false;
>  }
> @@ -6133,7 +6138,8 @@ cost_minus:
>   if (speed)
> *cost += extra_cost->alu.extend_arith;
>  
> - *cost += rtx_cost (XEXP (XEXP (op1, 0), 0), VOIDmode,
> + op1 = aarch64_strip_extend (op1);
> + *cost += rtx_cost (op1, VOIDmode,
>  (enum rtx_code) GET_CODE (op1), 0, speed);
>   return true;
> }
> @@ -6211,7 +6217,8 @@ cost_plus:
>   if (speed)
> *cost += extra_cost->alu.extend_arith;
>  
> - *cost += rtx_cost (XEXP (XEXP (op0, 0), 0), VOIDmode,
> + op0 = aarch64_strip_extend (op0);
> + *cost += rtx_cost (op0, VOIDmode,
>  (enum rtx_code) GET_CODE (op0), 0, speed);
>   return true;
> }
> 



Re: [RFC] Elimination of zext/sext - type promotion pass

2015-07-28 Thread Richard Biener
On Tue, Jun 2, 2015 at 1:14 AM, Kugan  wrote:
>

Sorry for replying so late...

> On 08/05/15 22:48, Richard Biener wrote:
>> You compute which promotions are unsafe, like sources/sinks of memory
>> (I think you miss call arguments/return values and also asm operands here).
>> But instead of simply marking those SSA names as not to be promoted
>> I'd instead split their life-ranges, thus replace
>>
>>   _1 = mem;
>>
>> with
>>
>>   _2 = mem;
>>   _1 = [zs]ext (_2, ...);
>>
>> and promote _1 anyway.  So in the first phase I'd do that (and obviously
>> note that _2 isn't to be promoted in the specific example).
>>
>> For promotions that apply I wouldn't bother allocating new SSA names
>> but just "fix" their types (assign to their TREE_TYPE).  This also means
>> they have to become anonymous and if they didn't have a !DECL_IGNORED_P
>> decl before then a debug stmt should be inserted at the point of the
>> promotions.  So
>>
>>   bar_3 = _1 + _2;
>>
>> when promoted would become
>>
>>  _4 = _1 + _2;
>>  _3 = sext <_4, ...>;
>>  # DEBUG bar = (orig-type) _4;  // or _3?
>>
>> so you'd basically always promote defs (you have a lot of stmt/operand
>> walking code I didn't look too closely at - but it looks like too much) and
>> the uses get promoted automagically (because you promote the original
>> SSA name). Promotion of constants has to remain, of course.
>
>
> Thanks Richard. I experimented with this idea to understand it better.
> Please see the attached prototype (I am still working on your other
> comments, which are not addressed here). Please have a look and let me
> know if this is along the lines of what you would expect. I have a few questions though.
>
> 1. In the example above:
>   char _1;
>   _1 = mem;
>
> when changing with
>
>   char _2;
>   int _1;
>   _2 = mem;
>   _1 = [zs]ext (_2, ...);
>
> for the [zs]ext operation we now use BIT_AND_EXPR and ZEXT_EXPR, which
> (as of now) require that the LHS and RHS are of the same type. Are you
> suggesting that we should have a true ZEXT_EXPR and SEXT_EXPR which can
> do the above in the gimple? I am now using CONVERT_EXPR, which is the
> source of many optimization issues.

You indeed need to use CONVERT_EXPR here, maybe you can elaborate
on the optimization issues.

> 2. For inline asm (a reduced test case that might not make much sense as a
> stand-alone test case, but I ran into similar cases with valid programs)
>
> ;; Function fn1 (fn1, funcdef_no=0, decl_uid=4220, cgraph_uid=0,
> symbol_order=0)
>
> fn1 (short int p1)
> {
>   <bb 2>:
>   __asm__("" : "=r" p1_2 : "0" p1_1(D));
>   return;
>
> }
>
>
> I am generating something like the following which ICEs. What is the
> expected output?
>
> ;; Function fn1 (fn1, funcdef_no=0, decl_uid=4220, cgraph_uid=0,
> symbol_order=0)
>
> fn1 (short int p1)
> {
>   int _1;
>   int _2;
>   short int _5;
>
>   <bb 2>:
>   _1 = (int) p1_4(D);
>   _5 = (short int) _1;
>   __asm__("" : "=r" p1_6 : "0" _5);
>   _2 = (int) p1_6;
>   return;
>
> }

Parameters are indeed "interesting" to handle ;)  As we now see on ARM
the incoming parameter (the default def) and later assignments to it
can require different promotions (well, different extensions for ARM).

The only sensible way to deal with promoting parameters is to
promote them by changing the function signature.  Thus reflect the
targets ABI for parameters in the GIMPLE representation (which
includes TYPE_ARG_TYPES and DECL_ARGUMENTS).
IMHO we should do this during gimplification of parameters / call
arguments already.

So for your example you'd end up with

fn1 (int p1)
{
  __asm__("" : "=r" p1_6 : "0" p1_4(D));
  return;
}

that is, promotions also apply to asm inputs/outputs (no?)

Richard.

> Thanks a lot for your time,
> Kugan


[PATCH 1/3] [gomp] Add RTEMS configuration

2015-07-28 Thread Sebastian Huber
libgomp/ChangeLog
2015-07-28  Sebastian Huber  

* config/rtems/bar.c: New.
* config/rtems/bar.h: Likewise.
* config/rtems/mutex.c: Likewise.
* config/rtems/mutex.h: Likewise.
* config/rtems/sem.c: Likewise.
* config/rtems/sem.h: Likewise.
* configure.ac (*-*-rtems*): Check that Newlib provides a proper
 <sys/lock.h> header file.
* configure.tgt (*-*-rtems*): Enable RTEMS configuration if
supported by Newlib.
* configure: Regenerate.
---
 libgomp/config/rtems/bar.c   | 255 +++
 libgomp/config/rtems/bar.h   | 170 +
 libgomp/config/rtems/mutex.c |   1 +
 libgomp/config/rtems/mutex.h |  57 ++
 libgomp/config/rtems/sem.c   |   1 +
 libgomp/config/rtems/sem.h   |  55 ++
 libgomp/configure|  17 +++
 libgomp/configure.ac |   7 ++
 libgomp/configure.tgt|   7 ++
 9 files changed, 570 insertions(+)
 create mode 100644 libgomp/config/rtems/bar.c
 create mode 100644 libgomp/config/rtems/bar.h
 create mode 100644 libgomp/config/rtems/mutex.c
 create mode 100644 libgomp/config/rtems/mutex.h
 create mode 100644 libgomp/config/rtems/sem.c
 create mode 100644 libgomp/config/rtems/sem.h

diff --git a/libgomp/config/rtems/bar.c b/libgomp/config/rtems/bar.c
new file mode 100644
index 000..05bb320
--- /dev/null
+++ b/libgomp/config/rtems/bar.c
@@ -0,0 +1,255 @@
+/* Copyright (C) 2005-2015 Free Software Foundation, Inc.
+   Contributed by Sebastian Huber .
+
+   This file is part of the GNU OpenMP Library (libgomp).
+
+   Libgomp is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   Libgomp is distributed in the hope that it will be useful, but WITHOUT ANY
+   WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
+   FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+   more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   .  */
+
+/* This is the RTEMS implementation of a barrier synchronization
+   mechanism for libgomp.  It is identical to the Linux implementation, except
+   that the futex API is slightly different.  This type is private to the
+   library.  */
+
+#include 
+#include 
+#include 
+
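+/* Recover the enclosing gomp_barrier_t from the address of its
+   GENERATION field, so that the futex helpers below can reach the
+   RTEMS futex object stored alongside it.  */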
+static gomp_barrier_t *
+generation_to_barrier (int *addr)
+{
+  return (gomp_barrier_t *)
+((char *) addr - __builtin_offsetof (gomp_barrier_t, generation));
+}
+
+static void
+futex_wait (int *addr, int val)
+{
+  gomp_barrier_t *bar = generation_to_barrier (addr);
+  _Futex_Wait (&bar->futex, addr, val);
+}
+
+static void
+futex_wake (int *addr, int count)
+{
+  gomp_barrier_t *bar = generation_to_barrier (addr);
+  _Futex_Wake (&bar->futex, count);
+}
+
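+/* Spin up to the configured iteration count waiting for *ADDR to
+   change from VAL; returns nonzero if it still equals VAL, in which
+   case the caller falls back to a futex wait.  */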
+static int
+do_spin (int *addr, int val)
+{
+  unsigned long long i, count = gomp_spin_count_var;
+
+  if (__builtin_expect (gomp_managed_threads > gomp_available_cpus, 0))
+count = gomp_throttled_spin_count_var;
+  for (i = 0; i < count; i++)
+if (__builtin_expect (__atomic_load_n (addr, MEMMODEL_RELAXED) != val, 0))
+  return 0;
+  return 1;
+}
+
+static void
+do_wait (int *addr, int val)
+{
+  if (do_spin (addr, val))
+futex_wait (addr, val);
+}
+
+/* Everything below this point should be identical to the Linux
+   implementation.  */
+
+void
+gomp_barrier_wait_end (gomp_barrier_t *bar, gomp_barrier_state_t state)
+{
+  if (__builtin_expect (state & BAR_WAS_LAST, 0))
+{
+  /* Next time we'll be awaiting TOTAL threads again.  */
+  bar->awaited = bar->total;
+  __atomic_store_n (&bar->generation, bar->generation + BAR_INCR,
+   MEMMODEL_RELEASE);
+  futex_wake ((int *) &bar->generation, INT_MAX);
+}
+  else
+{
+  do
+   do_wait ((int *) &bar->generation, state);
+  while (__atomic_load_n (&bar->generation, MEMMODEL_ACQUIRE) == state);
+}
+}
+
+void
+gomp_barrier_wait (gomp_barrier_t *bar)
+{
+  gomp_barrier_wait_end (bar, gomp_barrier_wait_start (bar));
+}
+
+/* Like gomp_barrier_wait, except that if the encountering thread
+   is not the last one to hit the barrier, it returns immediately.
+   The intended usage is that a thread which intends to gomp_barrier_destroy
+   this barrier calls gomp_barrier_wait, while all other threads
+   call gomp_barrier_wait_last.  When gomp_barrier_wait returns,
+   the barrier can be safely destroyed.  */
+
+void
+gomp_barrier_wait_last (gomp_barrier_t *ba

[PATCH 3/3] [gomp] Add thread attribute customization

2015-07-28 Thread Sebastian Huber
libgomp/ChangeLog
2015-07-28  Sebastian Huber  

* config/posix/pool.h (gomp_adjust_thread_attr): New.
* config/rtems/pool.h (gomp_adjust_thread_attr): Likewise.
(gomp_thread_pool_reservoir): Add priority member.
* config/rtems/proc.c (allocate_thread_pool_reservoir): Add
priority.
(parse_thread_pools): Likewise.
* team.c (gomp_team_start): Rename thread_attr to mutable_attr.
Call configuration provided gomp_adjust_thread_attr(). Destroy
mutable attributes if necessary.
* libgomp.texi: Document GOMP_RTEMS_THREAD_POOLS.
---
 libgomp/config/posix/pool.h |  7 +
 libgomp/config/rtems/pool.h | 29 ++
 libgomp/config/rtems/proc.c | 23 +++---
 libgomp/libgomp.texi| 75 +++--
 libgomp/team.c  | 15 -
 5 files changed, 121 insertions(+), 28 deletions(-)
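
The team.c side, inferred from the ChangeLog above, roughly follows
this shape (a sketch only; everything except gomp_adjust_thread_attr
and gomp_thread_attr is an assumption):

#include <pthread.h>

extern pthread_attr_t gomp_thread_attr;  /* libgomp's global attr */

/* gomp_adjust_thread_attr itself comes from the configuration's
   pool.h, shown below.  */

static void
team_start_sketch (void)
{
  pthread_attr_t mutable_attr;
  pthread_attr_t *attr
    = gomp_adjust_thread_attr (&gomp_thread_attr, &mutable_attr);

  /* ... pthread_create (..., attr, ...) for each new team thread ...  */

  if (attr == &mutable_attr)
    pthread_attr_destroy (attr);
}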

diff --git a/libgomp/config/posix/pool.h b/libgomp/config/posix/pool.h
index 0d127a0..a8e2eec 100644
--- a/libgomp/config/posix/pool.h
+++ b/libgomp/config/posix/pool.h
@@ -57,4 +57,11 @@ gomp_release_thread_pool (struct gomp_thread_pool *pool)
   /* Do nothing in the default implementation.  */
 }
 
+static inline pthread_attr_t *
+gomp_adjust_thread_attr (pthread_attr_t *attr, pthread_attr_t *mutable_attr)
+{
+  /* Do nothing in the default implementation.  */
+  return attr;
+}
+
 #endif /* GOMP_POOL_H */
diff --git a/libgomp/config/rtems/pool.h b/libgomp/config/rtems/pool.h
index 5c989d0..facac05 100644
--- a/libgomp/config/rtems/pool.h
+++ b/libgomp/config/rtems/pool.h
@@ -41,6 +41,7 @@ struct gomp_thread_pool_reservoir {
   gomp_sem_t available;
   gomp_mutex_t lock;
   size_t index;
+  int priority;
   struct gomp_thread_pool *pools[];
 };
 
@@ -125,4 +126,32 @@ gomp_release_thread_pool (struct gomp_thread_pool *pool)
 }
 }
 
+static inline pthread_attr_t *
+gomp_adjust_thread_attr (pthread_attr_t *attr, pthread_attr_t *mutable_attr)
+{
+  struct gomp_thread_pool_reservoir *res = gomp_get_thread_pool_reservoir ();
+  if (res != NULL && res->priority > 0)
+{
+  struct sched_param param;
+  int err;
+  if (attr != mutable_attr)
+   {
+ attr = mutable_attr;
+ pthread_attr_init (attr);
+   }
+  memset (&param, 0, sizeof (param));
+  param.sched_priority = res->priority;
+  err = pthread_attr_setschedparam (attr, &param);
+  if (err != 0)
+   gomp_fatal ("Thread attribute set scheduler parameters failed: %s", 
strerror (err));
+  err = pthread_attr_setschedpolicy (attr, SCHED_FIFO);
+  if (err != 0)
+   gomp_fatal ("Thread attribute set scheduler policy failed: %s", 
strerror (err));
+  err = pthread_attr_setinheritsched (attr, PTHREAD_EXPLICIT_SCHED);
+  if (err != 0)
+   gomp_fatal ("Thread attribute set explicit scheduler failed: %s", 
strerror (err));
+}
+  return attr;
+}
+
 #endif /* GOMP_POOL_H */
diff --git a/libgomp/config/rtems/proc.c b/libgomp/config/rtems/proc.c
index 9c36dcb..2939928 100644
--- a/libgomp/config/rtems/proc.c
+++ b/libgomp/config/rtems/proc.c
@@ -48,7 +48,8 @@ allocate_thread_pool_reservoirs (void)
 }
 
 static void
-allocate_thread_pool_reservoir (unsigned long count, unsigned long scheduler)
+allocate_thread_pool_reservoir (unsigned long count, unsigned long priority,
+   unsigned long scheduler)
 {
   struct gomp_thread_pool_reservoir *res;
   struct gomp_thread_pool *pools;
@@ -63,6 +64,7 @@ allocate_thread_pool_reservoir (unsigned long count, unsigned 
long scheduler)
   memset (pools, 0, size);
   res = (struct gomp_thread_pool_reservoir *) (pools + count);
   res->index = count;
+  res->priority = priority;
   gomp_sem_init (&res->available, count);
   gomp_mutex_init (&res->lock);
   for (i = 0; i < count; ++i)
@@ -71,7 +73,8 @@ allocate_thread_pool_reservoir (unsigned long count, unsigned 
long scheduler)
 }
 
 static char *
-parse_thread_pools (char *env, unsigned long *count, unsigned long *scheduler)
+parse_thread_pools (char *env, unsigned long *count, unsigned long *priority,
+   unsigned long *scheduler)
 {
   size_t len;
   int i;
@@ -84,6 +87,17 @@ parse_thread_pools (char *env, unsigned long *count, 
unsigned long *scheduler)
   if (errno != 0)
 gomp_fatal ("Invalid thread pool count");
 
+  if (*env == '$')
+{
+  ++env;
+  errno = 0;
+  *priority = strtoul (env, &env, 10);
+  if (errno != 0)
+   gomp_fatal ("Invalid thread pool priority");
+}
+  else
+*priority = -1;
+
   if (*env != '@')
 gomp_fatal ("Invalid thread pool scheduler prefix");
   ++env;
@@ -110,9 +124,10 @@ init_thread_pool_reservoirs (void)
   while (*env != '\0')
{
  unsigned long count;
+ unsigned long priority;
  unsigned long scheduler;
- env = parse_thread_pools (env, &count, &scheduler);
- allocate_thread_pool_reservoir (count, scheduler);
+ env = parse
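
Based on the parser above, a pool configuration names a count, an
optional priority after '$', and a scheduler after '@'.  A hedged
usage sketch (scheduler names, the ':' separator between pool
specifications, and the values are assumptions for illustration):

#include <stdlib.h>

int
main (void)
{
  /* One pool on scheduler WRK0, plus three pools at priority 4 on
     scheduler WRK1.  */
  setenv ("GOMP_RTEMS_THREAD_POOLS", "1@WRK0:3$4@WRK1", 1);
  /* ... then start the OpenMP program ...  */
  return 0;
}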

[PATCH 2/3] [gomp] Thread pool management

2015-07-28 Thread Sebastian Huber
In RTEMS we may have multiple scheduler instances with different
scheduling algorithms.  In addition we have a single process environment
so all threads run in one address space.  In order to support work
stealing applications it is important to limit the number of thread
pools used for OpenMP, since otherwise we may end up with an explosion of
OpenMP worker threads.

libgomp/ChangeLog
2015-07-28  Sebastian Huber  

* config/posix/pool.h: New.
* config/rtems/pool.h: Likewise.
* config/rtems/proc.c: Likewise.
* libgomp.h (gomp_thread_destructor): Declare.
* team.c: Include configuration provided .
(gomp_get_thread_pool): Define in configuration.
(gomp_team_end): Call configuration defined
gomp_release_thread_pool().
---
 libgomp/config/posix/pool.h |  60 ++
 libgomp/config/rtems/pool.h | 128 ++
 libgomp/config/rtems/proc.c | 145 
 libgomp/libgomp.h   |   2 +
 libgomp/team.c  |  22 +--
 5 files changed, 337 insertions(+), 20 deletions(-)
 create mode 100644 libgomp/config/posix/pool.h
 create mode 100644 libgomp/config/rtems/pool.h
 create mode 100644 libgomp/config/rtems/proc.c

diff --git a/libgomp/config/posix/pool.h b/libgomp/config/posix/pool.h
new file mode 100644
index 000..0d127a0
--- /dev/null
+++ b/libgomp/config/posix/pool.h
@@ -0,0 +1,60 @@
+/* Copyright (C) 2005-2015 Free Software Foundation, Inc.
+   Contributed by Sebastian Huber .
+
+   This file is part of the GNU Offloading and Multi Processing Library
+   (libgomp).
+
+   Libgomp is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   Libgomp is distributed in the hope that it will be useful, but WITHOUT ANY
+   WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
+   FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+   more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   .  */
+
+/* This is the default implementation of the thread pool management
+   for libgomp.  This type is private to the library.  */
+
+#ifndef GOMP_POOL_H
+#define GOMP_POOL_H 1
+
+#include 
+
+/* Get the thread pool, allocate and initialize it on demand.  */
+
+static inline struct gomp_thread_pool *
+gomp_get_thread_pool (struct gomp_thread *thr, unsigned nthreads)
+{
+  struct gomp_thread_pool *pool = thr->thread_pool;
+  if (__builtin_expect (pool == NULL, 0))
+{
+  pool = gomp_malloc (sizeof (*pool));
+  pool->threads = NULL;
+  pool->threads_size = 0;
+  pool->threads_used = 0;
+  pool->last_team = NULL;
+  pool->threads_busy = nthreads;
+  thr->thread_pool = pool;
+  pthread_setspecific (gomp_thread_destructor, thr);
+}
+  return pool;
+}
+
+static inline void
+gomp_release_thread_pool (struct gomp_thread_pool *pool)
+{
+  /* Do nothing in the default implementation.  */
+}
+
+#endif /* GOMP_POOL_H */
diff --git a/libgomp/config/rtems/pool.h b/libgomp/config/rtems/pool.h
new file mode 100644
index 000..5c989d0
--- /dev/null
+++ b/libgomp/config/rtems/pool.h
@@ -0,0 +1,128 @@
+/* Copyright (C) 2015 Free Software Foundation, Inc.
+   Contributed by Sebastian Huber .
+
+   This file is part of the GNU Offloading and Multi Processing Library
+   (libgomp).
+
+   Libgomp is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   Libgomp is distributed in the hope that it will be useful, but WITHOUT ANY
+   WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
+   FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+   more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   .  */
+
+/* This is the RTEMS implementation of the thread pool management
+   for libgomp.  This type is private to the library.  */
+
+#ifndef GOMP_POOL_H

Re: [PATCH] [gomp] Simplify thread pool initialization

2015-07-28 Thread Sebastian Huber

Ping.

This is a pre-requisite for:

https://gcc.gnu.org/ml/gcc-patches/2015-07/msg02347.html

On 22/07/15 14:56, Sebastian Huber wrote:

Move the thread pool initialization from the team start to the team
creation.  This eliminates one conditional expression.  In addition this
is a preparation patch to enable shared thread pools which I would like
to use for RTEMS later.  No unexpected failures on
x86_64-unknown-linux-gnu.

libgomp/ChangeLog
2015-07-22  Sebastian Huber  

* team.c (gomp_new_thread_pool): Delete and move content to ...
(gomp_get_thread_pool): ... new function.  Allocate and
initialize thread pool on demand.
(get_last_team): Use gomp_get_thread_pool().
(gomp_team_start): Delete thread pool initialization.
---


--
Sebastian Huber, embedded brains GmbH

Address : Dornierstr. 4, D-82178 Puchheim, Germany
Phone   : +49 89 189 47 41-16
Fax : +49 89 189 47 41-09
E-Mail  : sebastian.hu...@embedded-brains.de
PGP : Public key available on request.

This message is not a business communication within the meaning of the EHUG.



RE: [PATCH][1/N] Change GET_MODE_INNER to always return a non-void mode

2015-07-28 Thread David Sherwood
Hi,

I have updated the comment above GET_MODE_INNER and while there I have
fixed a spelling mistake in the comment above GET_MODE_UNIT_SIZE.

Tested:
aarch64 and aarch64_be - no regressions in gcc testsuite
x86_64 - bootstrap build, no testsuite regressions
arm-none-eabi - no regressions in gcc testsuite
Ran contrib/config-list.mk - the only build failures are ones that fail anyway with
warnings being treated as errors.

Hope this is ok.

Cheers,
Dave.

2015-07-28  David Sherwood  

gcc/
* config/arm/arm.c (neon_element_bits, neon_valid_immediate): Call
GET_MODE_INNER unconditionally.
* config/spu/spu.c (arith_immediate_p): Likewise.
* config/i386/i386.c (ix86_build_signbit_mask): Likewise.  New variable.
* expmed.c (synth_mult): Remove check for VOIDmode result from
GET_MODE_INNER.
(expand_mult_const): Likewise.
* fold-const.c (): Replace call to element_precision with call to
GET_MODE_PRECISION.
* genmodes.c (emit_mode_inner_inline): Replace void_mode->name with
m->name.
(emit_mode_inner): Likewise.
* lto-streamer-out.c (lto_write_mode_table): Update GET_MODE_INNER
result check.
* machmode.h (GET_MODE_INNER): Update comment.
(GET_MODE_UNIT_SIZE): Simplify and fix spelling mistake in comment.
(GET_MODE_UNIT_PRECISION): Simplify.
(element_precision): Remove.
* rtlanal.c (subreg_get_info): Call GET_MODE_INNER unconditionally.
* simplify-rtx.c (simplify_immed_subreg): Likewise.
* stor-layout.c (bitwise_type_for_mode): Update assert.
(element_precision): Remove.

> 
> On 07/27/2015 04:25 AM, David Sherwood wrote:
> > Hi,
> >
> > Part 1 of this change is a clean-up. I have changed calls to GET_MODE_INNER 
> > (m)
> > so that it returns m in cases where there is no inner mode. This simplifies 
> > some
> > of the calling code by removing the need to check for VOIDmode and allows
> > calling it unconditionally. I also removed element_precision () as it was 
> > only
> > called in one place and thought it neater to call GET_MODE_PRECISION 
> > explicitly.
> >
> > Parts 2-4 will include further tidy-ups and optimisations based on [1/N].
> >
> > Good to go?
> >
> > Regards,
> > David Sherwood.
> >
> > 2015-07-17  David Sherwood
> >
> >  gcc/
> >  * config/arm/arm.c (neon_element_bits, neon_valid_immediate): Call
> >  GET_MODE_INNER unconditionally.
> >  * config/spu/spu.c (arith_immediate_p): Likewise.
> >  * config/i386/i386.c (ix86_build_signbit_mask): Likewise.  New 
> > variable.
> >  * expmed.c (synth_mult): Remove check for VOIDmode result from
> >  GET_MODE_INNER.
> >  (expand_mult_const): Likewise.
> >  * fold-const.c (): Replace call to element_precision with call to
> >  GET_MODE_PRECISION.
> >  * genmodes.c (emit_mode_inner_inline): Replace void_mode->name with
> >  m->name.
> >  (emit_mode_inner): Likewise.
> >  * lto-streamer-out.c (lto_write_mode_table): Update GET_MODE_INNER
> >  result check.
> >  * machmode.h (GET_MODE_UNIT_SIZE): Simplify.
> >  (GET_MODE_UNIT_PRECISION): Likewise.
> >  * rtlanal.c (subreg_get_info): Call GET_MODE_INNER unconditionally.
> >  * simplify-rtx.c (simplify_immed_subreg): Likewise.
> >  * stor-layout.c (bitwise_type_for_mode): Update assert.
> >  (element_precision): Remove.
> Somehow my brain kept translating INNER into NARROWER.  Naturally I was
> having considerable trouble seeing how the patch could be correct ;-)
> Looking at insn-modes.h cleared things up quickly.
> 
> In a lot of ways this makes GET_MODE_INNER act more like
> GET_MODE_NUNITS, which is probably good.
> 
> You need to update the comment for GET_MODE_INNER in machmode.h to
> reflect the change in its return value for non-vector modes.
> 
> With that update, this patch is fine.
> 
> jeff





[PATCH 0/15][ARM/AArch64] Add support for float16_t vectors (v3)

2015-07-28 Thread Alan Lawrence
All AArch64 patches are unchanged from previous version. However, in response to 
discussion, the ARM patches are changed (much as I suggested 
https://gcc.gnu.org/ml/gcc-patches/2015-07/msg02249.html); this version:


   * Hides the existing vcvt_f16_f32 and vcvt_f32_f16 intrinsics, and 
float16x4_t type, unless we have a scalar __FP16 type (i.e. unless 
-mfp16-format=ieee/alternative is specified on the command line). Although this 
loses us the ability to write code that uses hardware instructions to work with 
either IEEE or Alternative formats according to the FPSCR bit, it is consistent 
with ACLE statements that the vector types (float16x4_t and float16x8_t) should 
only be available if the scalar type is, and that if the scalar type is 
available, then one or other of __ARM_FP16_FORMAT_IEEE or 
__ARM_FP16_FORMAT_ALTERNATIVE should be set.


(Straightforward interpretation of ACLE can be confusing because GCC has made 
the choice of supporting the __FP16 type even when hardware is not available, 
via software conversion routines - the -mfp16-format flag then picking which set 
of sw routines are in use.)


  * Makes all the new intrinsics available, similarly, only if we have a scalar 
__FP16 type. This means that (in contrast to previous versions of this patch 
series) we will not gain the ability to write programs that pass 
half-precision-float values through as "bags of bits".


I considered the alternative of making -mfp16-format default to ieee, but that 
makes the -mfp16-format=alternative option almost unusable, as one cannot link 
object files compiled with different -mfp16-format :(. We could set the default 
to be ieee only when neon-fp16 is specified, but that change is pretty much 
orthogonal to this patch series so can follow independently if desired.


  * To ease testing (including a couple of existing tests), I modified the 
arm_neon_fp16_ok functions in lib/target-supports.exp to try also flags 
specifying -mfp16-format=ieee (if flags without that fail to compile, presumably 
because of the absence of an __FP16 type; however, this still allows an explicit 
-mfp16-format=alternative if desired). On ARM targets, we then pass in 
-mfpu=neon-fp16 and -mfp16-format flags for all tests in advsimd-intrinsics.exp, 
unless these are overridden by an explicit multilib, in which case we will run 
the advsimd-intrinsics tests without the float16 variants (via #if).


Are these patches OK for trunk? If so I will commit along with the 
previously-approved fix to fold-const.c for HFmode, 
https://gcc.gnu.org/ml/gcc-patches/2015-07/msg00696.html


Bootstrapped on arm-none-linux-gnueabihf (--with-arch=armv7-a --with-fpu=neon 
--with-float=hard), and aarch64-none-linux-gnu; cross-tested arm-none-eabi (a 
number of variants, especially for the advsimd-intrinsics tests in patch 13+14).


Thanks, Alan



[PATCH 1/15][ARM] Hide existing float16 intrinsics unless we have a scalar __fp16 type

2015-07-28 Thread Alan Lawrence
This makes the existing float16 vector intrinsics available only when we have an 
__fp16 type (i.e. when one of the ARM_FP16_FORMAT_... macros is defined).


Thus, we also rearrange the float16x[48]_t types to use the same type as __fp16 
for the element type (ACLE says that __fp16 should be an alias).


To keep the existing gcc.target/arm/neon/vcvt{f16_f32,f32_f16} tests working, as 
these do not specify an -mfp16-format, I've modified 
check_effective_target_arm_neon_fp16_ok to add in -mfp16-format=ieee *if 
necessary* (hence still allowing an explicit -mfp16-format=alternative). A 
documentation fix for this follows in the last patch.


gcc/ChangeLog:

* config/arm/arm-builtins.c (arm_init_simd_builtin_types): Move
initialization of HFmode scalar type (float16_t) to...
(arm_init_fp16_builtins): ...here, combining with previous __fp16.
(arm_init_builtins): Call arm_init_fp16_builtins earlier and always.

* config/arm/arm_neon.h (vcvt_f16_f32, vcvt_f32_f16): Condition on
having an -mfp16-format.

gcc/testsuite/ChangeLog:

* lib/target-supports.exp
(check_effective_target_arm_neon_fp16_ok_nocache): Add flag variants
with -mfp16-format=ieee.
diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c
index 89b1b0cd2c51d83abb02d555f3881d0270557ccd..8d4833428382305dc3595cee2e172289c9a874cf 100644
--- a/gcc/config/arm/arm-builtins.c
+++ b/gcc/config/arm/arm-builtins.c
@@ -771,13 +771,6 @@ arm_init_simd_builtin_types (void)
   int nelts = sizeof (arm_simd_types) / sizeof (arm_simd_types[0]);
   tree tdecl;
 
-  /* Initialize the HFmode scalar type.  */
-  arm_simd_floatHF_type_node = make_node (REAL_TYPE);
-  TYPE_PRECISION (arm_simd_floatHF_type_node) = GET_MODE_PRECISION (HFmode);
-  layout_type (arm_simd_floatHF_type_node);
-  (*lang_hooks.types.register_builtin_type) (arm_simd_floatHF_type_node,
-	 "__builtin_neon_hf");
-
   /* Poly types are a world of their own.  In order to maintain legacy
  ABI, they get initialized using the old interface, and don't get
  an entry in our mangling table, consequently, they get default
@@ -825,6 +818,8 @@ arm_init_simd_builtin_types (void)
  mangling.  */
 
   /* Continue with standard types.  */
+  /* The __builtin_simd{64,128}_float16 types are kept private unless
+ we have a scalar __fp16 type.  */
   arm_simd_types[Float16x4_t].eltype = arm_simd_floatHF_type_node;
   arm_simd_types[Float32x2_t].eltype = float_type_node;
   arm_simd_types[Float32x4_t].eltype = float_type_node;
@@ -1704,10 +1699,12 @@ arm_init_iwmmxt_builtins (void)
 static void
 arm_init_fp16_builtins (void)
 {
-  tree fp16_type = make_node (REAL_TYPE);
-  TYPE_PRECISION (fp16_type) = 16;
-  layout_type (fp16_type);
-  (*lang_hooks.types.register_builtin_type) (fp16_type, "__fp16");
+  arm_simd_floatHF_type_node = make_node (REAL_TYPE);
+  TYPE_PRECISION (arm_simd_floatHF_type_node) = GET_MODE_PRECISION (HFmode);
+  layout_type (arm_simd_floatHF_type_node);
+  if (arm_fp16_format)
+(*lang_hooks.types.register_builtin_type) (arm_simd_floatHF_type_node,
+	   "__fp16");
 }
 
 static void
@@ -1752,12 +1749,11 @@ arm_init_builtins (void)
   if (TARGET_REALLY_IWMMXT)
 arm_init_iwmmxt_builtins ();
 
+  arm_init_fp16_builtins ();
+
   if (TARGET_NEON)
 arm_init_neon_builtins ();
 
-  if (arm_fp16_format)
-arm_init_fp16_builtins ();
-
   if (TARGET_CRC32)
 arm_init_crc32_builtins ();
 
diff --git a/gcc/config/arm/arm_neon.h b/gcc/config/arm/arm_neon.h
index c923e294cda2f8cb88e4b1ccca6fd4f13a3ed98d..2b30be61a46a0c906478c599a005c27cd467dfa6 100644
--- a/gcc/config/arm/arm_neon.h
+++ b/gcc/config/arm/arm_neon.h
@@ -41,7 +41,9 @@ typedef __simd64_int8_t int8x8_t;
 typedef __simd64_int16_t int16x4_t;
 typedef __simd64_int32_t int32x2_t;
 typedef __builtin_neon_di int64x1_t;
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
 typedef __simd64_float16_t float16x4_t;
+#endif
 typedef __simd64_float32_t float32x2_t;
 typedef __simd64_poly8_t poly8x8_t;
 typedef __simd64_poly16_t poly16x4_t;
@@ -6220,21 +6222,25 @@ vcvtq_u32_f32 (float32x4_t __a)
 }
 
 #if ((__ARM_FP & 0x2) != 0)
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
 __extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
 vcvt_f16_f32 (float32x4_t __a)
 {
   return (float16x4_t)__builtin_neon_vcvtv4hfv4sf (__a);
 }
-
 #endif
+#endif
+
 #if ((__ARM_FP & 0x2) != 0)
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
 __extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
 vcvt_f32_f16 (float16x4_t __a)
 {
   return (float32x4_t)__builtin_neon_vcvtv4sfv4hf (__a);
 }
-
 #endif
+#endif
+
 __extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
 vcvt_n_s32_f32 (float32x2_t __a, const int __b)
 {
diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/

[PATCH 2/15][ARM] float16x4_t intrinsics in arm_neon.h

2015-07-28 Thread Alan Lawrence
This is a respin of https://gcc.gnu.org/ml/gcc-patches/2015-07/msg00476.html. 
The change is to provide all the new float16 intrinsics only if we actually have 
a scalar __fp16 type. (This covers the intrinsics whose implementation is 
entirely within arm_neon.h; those requiring .md changes follow in patch 7).
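
As a quick usage sketch (my example, not from the patch), the lane accessors
behave like their f32 counterparts even though both are macros (see the
patch below):

#include <arm_neon.h>

#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
/* Hypothetical helper: read lane 2, double it, and write it back.  */
float16x4_t
double_lane2 (float16x4_t v)
{
  float16_t x = vget_lane_f16 (v, 2);
  return vset_lane_f16 (x + x, v, 2);	/* arithmetic promotes via float */
}
#endif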


gcc/ChangeLog (unchanged):

* config/arm/arm_neon.h (float16_t, vget_lane_f16, vset_lane_f16,
vcreate_f16, vld1_lane_f16, vld1_dup_f16, vreinterpret_p8_f16,
vreinterpret_p16_f16, vreinterpret_f16_p8, vreinterpret_f16_p16,
vreinterpret_f16_f32, vreinterpret_f16_p64, vreinterpret_f16_s64,
vreinterpret_f16_u64, vreinterpret_f16_s8, vreinterpret_f16_s16,
vreinterpret_f16_s32, vreinterpret_f16_u8, vreinterpret_f16_u16,
vreinterpret_f16_u32, vreinterpret_f32_f16, vreinterpret_p64_f16,
vreinterpret_s64_f16, vreinterpret_u64_f16, vreinterpret_s8_f16,
vreinterpret_s16_f16, vreinterpret_s32_f16, vreinterpret_u8_f16,
vreinterpret_u16_f16, vreinterpret_u32_f16): New.
diff --git a/gcc/config/arm/arm_neon.h b/gcc/config/arm/arm_neon.h
index 2b30be61a46a0c906478c599a005c27cd467dfa6..3c40f9f94fae30cab5e8833d72d0ac9ff3ac7b0f 100644
--- a/gcc/config/arm/arm_neon.h
+++ b/gcc/config/arm/arm_neon.h
@@ -42,6 +42,7 @@ typedef __simd64_int16_t int16x4_t;
 typedef __simd64_int32_t int32x2_t;
 typedef __builtin_neon_di int64x1_t;
 #if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+typedef __fp16 float16_t;
 typedef __simd64_float16_t float16x4_t;
 #endif
 typedef __simd64_float32_t float32x2_t;
@@ -5203,6 +5204,21 @@ vget_lane_s32 (int32x2_t __a, const int __b)
   return (int32_t)__builtin_neon_vget_lanev2si (__a, __b);
 }
 
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+/* Functions cannot accept or return __FP16 types.  Even if the function
+   were marked always-inline so there were no call sites, the declaration
+   would nonetheless raise an error.  Hence, we must use a macro instead.  */
+
+#define vget_lane_f16(__v, __idx)		\
+  __extension__	\
+({		\
+  float16x4_t __vec = (__v);		\
+  __builtin_arm_lane_check (4, __idx);	\
+  float16_t __res = __vec[__idx];		\
+  __res;	\
+})
+#endif
+
 __extension__ static __inline float32_t __attribute__ ((__always_inline__))
 vget_lane_f32 (float32x2_t __a, const int __b)
 {
@@ -5335,6 +5351,18 @@ vset_lane_s32 (int32_t __a, int32x2_t __b, const int __c)
   return (int32x2_t)__builtin_neon_vset_lanev2si ((__builtin_neon_si) __a, __b, __c);
 }
 
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+#define vset_lane_f16(__e, __v, __idx)		\
+  __extension__	\
+({		\
+  float16_t __elem = (__e);			\
+  float16x4_t __vec = (__v);		\
+  __builtin_arm_lane_check (4, __idx);	\
+  __vec[__idx] = __elem;			\
+  __vec;	\
+})
+#endif
+
 __extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
 vset_lane_f32 (float32_t __a, float32x2_t __b, const int __c)
 {
@@ -5481,6 +5509,14 @@ vcreate_s64 (uint64_t __a)
   return (int64x1_t)__builtin_neon_vcreatedi ((__builtin_neon_di) __a);
 }
 
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vcreate_f16 (uint64_t __a)
+{
+  return (float16x4_t) __a;
+}
+#endif
+
 __extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
 vcreate_f32 (uint64_t __a)
 {
@@ -8802,6 +8838,14 @@ vld1_lane_s32 (const int32_t * __a, int32x2_t __b, const int __c)
   return (int32x2_t)__builtin_neon_vld1_lanev2si ((const __builtin_neon_si *) __a, __b, __c);
 }
 
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vld1_lane_f16 (const float16_t * __a, float16x4_t __b, const int __c)
+{
+  return vset_lane_f16 (*__a, __b, __c);
+}
+#endif
+
 __extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
 vld1_lane_f32 (const float32_t * __a, float32x2_t __b, const int __c)
 {
@@ -8950,6 +8994,15 @@ vld1_dup_s32 (const int32_t * __a)
   return (int32x2_t)__builtin_neon_vld1_dupv2si ((const __builtin_neon_si *) __a);
 }
 
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vld1_dup_f16 (const float16_t * __a)
+{
+  float16_t __f = *__a;
+  return (float16x4_t) { __f, __f, __f, __f };
+}
+#endif
+
 __extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
 vld1_dup_f32 (const float32_t * __a)
 {
@@ -11833,6 +11886,14 @@ vreinterpret_p8_p16 (poly16x4_t __a)
   return (poly8x8_t)__builtin_neon_vreinterpretv8qiv4hi ((int16x4_t) __a);
 }
 
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+__extension__ static _

[PATCH 4/15][ARM] float16x8_t intrinsics in arm_neon.h

2015-07-28 Thread Alan Lawrence
This is a respin of https://gcc.gnu.org/ml/gcc-patches/2015-07/msg00478.html , 
again making the intrinsics available only if we have a scalar __fp16 type. 
(This covers the intrinsics whose implementation is entirely within arm_neon.h; 
those requiring .md changes follow in the next patch).
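
A short sketch of the q-form accessors (my example, not from the patch):

#include <arm_neon.h>

#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
/* Hypothetical example: extract the raw bit pattern of lane 0 by
   reinterpreting the vector as uint16x8_t; no fp16 arithmetic needed.  */
uint16_t
lane0_bits (float16x8_t v)
{
  uint16x8_t u = vreinterpretq_u16_f16 (v);
  return vgetq_lane_u16 (u, 0);
}
#endif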


gcc/ChangeLog:

* config/arm/arm_neon.h (vgetq_lane_f16, vsetq_lane_f16, vld1q_lane_f16,
vld1q_dup_f16, vreinterpretq_p8_f16, vreinterpretq_p16_f16,
vreinterpretq_f16_p8, vreinterpretq_f16_p16, vreinterpretq_f16_f32,
vreinterpretq_f16_p64, vreinterpretq_f16_p128, vreinterpretq_f16_s64,
vreinterpretq_f16_u64, vreinterpretq_f16_s8, vreinterpretq_f16_s16,
vreinterpretq_f16_s32, vreinterpretq_f16_u8, vreinterpretq_f16_u16,
vreinterpretq_f16_u32, vreinterpretq_f32_f16, vreinterpretq_p64_f16,
vreinterpretq_p128_f16, vreinterpretq_s64_f16, vreinterpretq_u64_f16,
vreinterpretq_s8_f16, vreinterpretq_s16_f16, vreinterpretq_s32_f16,
vreinterpretq_u8_f16, vreinterpretq_u16_f16, vreinterpretq_u32_f16):
New.
diff --git a/gcc/config/arm/arm_neon.h b/gcc/config/arm/arm_neon.h
index 10d2de945e16d8056a7f137bc6d892617576ddb8..b1c9cc76a4cc3480cd23ec254390f492721c4d04 100644
--- a/gcc/config/arm/arm_neon.h
+++ b/gcc/config/arm/arm_neon.h
@@ -5288,6 +5288,17 @@ vgetq_lane_s32 (int32x4_t __a, const int __b)
   return (int32_t)__builtin_neon_vget_lanev4si (__a, __b);
 }
 
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+#define vgetq_lane_f16(__v, __idx)		\
+  __extension__	\
+({		\
+  float16x8_t __vec = (__v);		\
+  __builtin_arm_lane_check (8, __idx);	\
+  float16_t __res = __vec[__idx];		\
+  __res;	\
+})
+#endif
+
 __extension__ static __inline float32_t __attribute__ ((__always_inline__))
 vgetq_lane_f32 (float32x4_t __a, const int __b)
 {
@@ -5432,6 +5443,18 @@ vsetq_lane_s32 (int32_t __a, int32x4_t __b, const int __c)
   return (int32x4_t)__builtin_neon_vset_lanev4si ((__builtin_neon_si) __a, __b, __c);
 }
 
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+#define vsetq_lane_f16(__e, __v, __idx)		\
+  __extension__	\
+({		\
+  float16_t __elem = (__e);			\
+  float16x8_t __vec = (__v);		\
+  __builtin_arm_lane_check (8, __idx);	\
+  __vec[__idx] = __elem;			\
+  __vec;	\
+})
+#endif
+
 __extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
 vsetq_lane_f32 (float32_t __a, float32x4_t __b, const int __c)
 {
@@ -8923,6 +8946,14 @@ vld1q_lane_s32 (const int32_t * __a, int32x4_t __b, const int __c)
   return (int32x4_t)__builtin_neon_vld1_lanev4si ((const __builtin_neon_si *) __a, __b, __c);
 }
 
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vld1q_lane_f16 (const float16_t * __a, float16x8_t __b, const int __c)
+{
+  return vsetq_lane_f16 (*__a, __b, __c);
+}
+#endif
+
 __extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
 vld1q_lane_f32 (const float32_t * __a, float32x4_t __b, const int __c)
 {
@@ -9080,6 +9111,15 @@ vld1q_dup_s32 (const int32_t * __a)
   return (int32x4_t)__builtin_neon_vld1_dupv4si ((const __builtin_neon_si *) __a);
 }
 
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+__extension__ static __inline float16x8_t __attribute__ ((__always_inline__))
+vld1q_dup_f16 (const float16_t * __a)
+{
+  float16_t __f = *__a;
+  return (float16x8_t) { __f, __f, __f, __f, __f, __f, __f, __f };
+}
+#endif
+
 __extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
 vld1q_dup_f32 (const float32_t * __a)
 {
@@ -12922,6 +12962,14 @@ vreinterpretq_p8_p16 (poly16x8_t __a)
   return (poly8x16_t)__builtin_neon_vreinterpretv16qiv8hi ((int16x8_t) __a);
 }
 
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+__extension__ static __inline poly8x16_t __attribute__ ((__always_inline__))
+vreinterpretq_p8_f16 (float16x8_t __a)
+{
+  return (poly8x16_t) __a;
+}
+#endif
+
 __extension__ static __inline poly8x16_t __attribute__ ((__always_inline__))
 vreinterpretq_p8_f32 (float32x4_t __a)
 {
@@ -12998,6 +13046,14 @@ vreinterpretq_p16_p8 (poly8x16_t __a)
   return (poly16x8_t)__builtin_neon_vreinterpretv8hiv16qi ((int8x16_t) __a);
 }
 
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+__extension__ static __inline poly16x8_t __attribute__ ((__always_inline__))
+vreinterpretq_p16_f16 (float16x8_t __a)
+{
+  return (poly16x8_t) __a;
+}
+#endif
+
 __extension__ static __inline poly16x8_t __attribute__ ((__always_inline__))
 vreinterpretq_p16_f32 (float32x4_t __a)
 {
@@ -13068,6 +13124,114 @@ vreinterpretq_p16_u32 (uint32x4_t __a)
   return (poly16x8_t)__builtin_neon_vreinterpretv8hiv4si ((int32x4_t) __a);
 }
 
+#if defined (__ARM_FP16_FORMAT_IEEE) || defi

[PATCH 5/15][ARM] Remaining intrinsics

2015-07-28 Thread Alan Lawrence
This is a respin of https://gcc.gnu.org/ml/gcc-patches/2015-07/msg00479.html, 
again to make the intrinsics available only if we have a scalar __fp16 type.


This does not fix existing indentation issues in neon.md but rather keeps the 
affected lines consistent with those around them.
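
To show the intent of the new vldN/vstN variants, a usage sketch (mine, not
from the patch):

#include <arm_neon.h>

#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
/* Hypothetical example: vld2 de-interleaves eight half-floats into a
   vector of even-indexed and a vector of odd-indexed elements.  */
void
deinterleave_f16 (const float16_t *in, float16_t *even, float16_t *odd)
{
  float16x4x2_t pair = vld2_f16 (in);	/* vld2.16 {d0-d1}, [r0] */
  vst1_f16 (even, pair.val[0]);
  vst1_f16 (odd, pair.val[1]);
}
#endif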


gcc/ChangeLog (as before):

* config/arm/arm-builtins.c (VAR11, VAR12): New.
* config/arm/arm_neon_builtins.def (vcombine, vld2_dup, vld3_dup,
vld4_dup): Add v4hf variant.
(vget_high, vget_low): Add v8hf variant.
(vld1, vst1, vst1_lane, vld2, vld2_lane, vst2, vst2_lane, vld3,
vld3_lane, vst3, vst3_lane, vld4, vld4_lane, vst4, vst4_lane): Add
v4hf and v8hf variants.

* config/arm/iterators.md (VD_LANE, VD_RE, VQ2, VQ_HS): New.
(VDX): Add V4HF.
(V_DOUBLE): Add case for V4HF.
(VQX): Add V8HF.
(V_HALF): Add case for V8HF.
(VDQX): Add V4HF, V8HF.
(V_elem, V_two_elem, V_three_elem, V_four_elem, V_cmp_result,
V_uf_sclr, V_sz_elem, V_mode_nunits, q): Add cases for V4HF & V8HF.

* config/arm/neon.md (vec_setinternal, vec_extract,
neon_vget_lane_sext_internal, neon_vget_lane_zext_internal,
vec_load_lanesoi, neon_vld2, vec_store_lanesoi,
neon_vst2, vec_load_lanesci, neon_vld3,
neon_vld3qa, neon_vld3qb, vec_store_lanesci,
neon_vst3, neon_vst3qa, neon_vst3qb,
vec_load_lanesxi, neon_vld4, neon_vld4qa,
neon_vld4qb, vec_store_lanesxi, neon_vst4,
neon_vst4qa, neon_vst4qb): Change VQ iterator to VQ2.

(neon_vcreate, neon_vreinterpretv8qi,
neon_vreinterpretv4hi, neon_vreinterpretv2si,
neon_vreinterpretv2sf, neon_vreinterpretdi):
Change VDX to VD_RE.

(neon_vld2_lane, neon_vst2_lane, neon_vld3_lane,
neon_vst3_lane, neon_vld4_lane, neon_vst4_lane):
Change VD iterator to VD_LANE, and VMQ iterator to VQ_HS.

* config/arm/arm_neon.h (float16x4x2_t, float16x8x2_t, float16x4x3_t,
float16x8x3_t, float16x4x4_t, float16x8x4_t, vcombine_f16,
vget_high_f16, vget_low_f16, vld1_f16, vld1q_f16, vst1_f16, vst1q_f16,
vst1_lane_f16, vst1q_lane_f16, vld2_f16, vld2q_f16, vld2_lane_f16,
vld2q_lane_f16, vld2_dup_f16, vst2_f16, vst2q_f16, vst2_lane_f16,
vst2q_lane_f16, vld3_f16, vld3q_f16, vld3_lane_f16, vld3q_lane_f16,
vld3_dup_f16, vst3_f16, vst3q_f16, vst3_lane_f16, vst3q_lane_f16,
vld4_f16, vld4q_f16, vld4_lane_f16, vld4q_lane_f16, vld4_dup_f16,
vst4_f16, vst4q_f16, vst4_lane_f16, vst4q_lane_f16, ): New.
diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c
index 624839ef58b3e4b49cb70dfc3dfbca141941eb7f..7afa3396a6d3e46165ca634ecc60ec42fad78a6e 100644
--- a/gcc/config/arm/arm-builtins.c
+++ b/gcc/config/arm/arm-builtins.c
@@ -241,6 +241,12 @@ typedef struct {
 #define VAR10(T, N, A, B, C, D, E, F, G, H, I, J) \
   VAR9 (T, N, A, B, C, D, E, F, G, H, I) \
   VAR1 (T, N, J)
+#define VAR11(T, N, A, B, C, D, E, F, G, H, I, J, K) \
+  VAR10 (T, N, A, B, C, D, E, F, G, H, I, J) \
+  VAR1 (T, N, K)
+#define VAR12(T, N, A, B, C, D, E, F, G, H, I, J, K, L) \
+  VAR11 (T, N, A, B, C, D, E, F, G, H, I, J, K) \
+  VAR1 (T, N, L)
 
 /* The NEON builtin data can be found in arm_neon_builtins.def.
The mode entries in the following table correspond to the "key" type of the
diff --git a/gcc/config/arm/arm_neon.h b/gcc/config/arm/arm_neon.h
index b1c9cc76a4cc3480cd23ec254390f492721c4d04..66622dfcfe2d6f3d575db98a1420f6a58e13baee 100644
--- a/gcc/config/arm/arm_neon.h
+++ b/gcc/config/arm/arm_neon.h
@@ -166,6 +166,20 @@ typedef struct uint64x2x2_t
   uint64x2_t val[2];
 } uint64x2x2_t;
 
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+typedef struct float16x4x2_t
+{
+  float16x4_t val[2];
+} float16x4x2_t;
+#endif
+
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+typedef struct float16x8x2_t
+{
+  float16x8_t val[2];
+} float16x8x2_t;
+#endif
+
 typedef struct float32x2x2_t
 {
   float32x2_t val[2];
@@ -292,6 +306,20 @@ typedef struct uint64x2x3_t
   uint64x2_t val[3];
 } uint64x2x3_t;
 
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+typedef struct float16x4x3_t
+{
+  float16x4_t val[3];
+} float16x4x3_t;
+#endif
+
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+typedef struct float16x8x3_t
+{
+  float16x8_t val[3];
+} float16x8x3_t;
+#endif
+
 typedef struct float32x2x3_t
 {
   float32x2_t val[3];
@@ -418,6 +446,20 @@ typedef struct uint64x2x4_t
   uint64x2_t val[4];
 } uint64x2x4_t;
 
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+typedef struct float16x4x4_t
+{
+  float16x4_t val[4];
+} float16x4x4_t;
+#endif
+
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+typedef struct float16x8x4_t
+{
+  float16x8_t val[4];
+} float16x8x4_t;
+#endif

[PATCH 3/15][ARM] Add V8HFmode and float16x8_t type

2015-07-28 Thread Alan Lawrence
This is a respin of https://gcc.gnu.org/ml/gcc-patches/2015-07/msg00477.html. 
The only change is to publish float16x8_t only if we actually have a scalar 
__fp16 type.


gcc/ChangeLog:

* config/arm/arm.h (VALID_NEON_QREG_MODE): Add V8HFmode.

* config/arm/arm.c (arm_vector_mode_supported_p): Support V8HFmode.

* config/arm/arm-builtins.c (v8hf_UP): New.
(arm_init_simd_builtin_types): Initialise Float16x8_t.

* config/arm/arm-simd-builtin-types.def (Float16x8_t): New.

* config/arm/arm_neon.h (float16x8_t): New typedef.
diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c
index 8d4833428382305dc3595cee2e172289c9a874cf..624839ef58b3e4b49cb70dfc3dfbca141941eb7f 100644
--- a/gcc/config/arm/arm-builtins.c
+++ b/gcc/config/arm/arm-builtins.c
@@ -192,6 +192,7 @@ arm_storestruct_lane_qualifiers[SIMD_MAX_BUILTIN_ARGS]
 #define di_UPDImode
 #define v16qi_UP V16QImode
 #define v8hi_UP  V8HImode
+#define v8hf_UP  V8HFmode
 #define v4si_UP  V4SImode
 #define v4sf_UP  V4SFmode
 #define v2di_UP  V2DImode
@@ -822,6 +823,7 @@ arm_init_simd_builtin_types (void)
  we have a scalar __fp16 type.  */
   arm_simd_types[Float16x4_t].eltype = arm_simd_floatHF_type_node;
   arm_simd_types[Float32x2_t].eltype = float_type_node;
+  arm_simd_types[Float16x8_t].eltype = arm_simd_floatHF_type_node;
   arm_simd_types[Float32x4_t].eltype = float_type_node;
 
   for (i = 0; i < nelts; i++)
diff --git a/gcc/config/arm/arm-simd-builtin-types.def b/gcc/config/arm/arm-simd-builtin-types.def
index bcbd20be057d8bc6c94ca155eb6051f20e5300b6..b178ae6c05f0c532b105b30f9a2706c9f0aa8afe 100644
--- a/gcc/config/arm/arm-simd-builtin-types.def
+++ b/gcc/config/arm/arm-simd-builtin-types.def
@@ -44,5 +44,7 @@
 
   ENTRY (Float16x4_t, V4HF, none, 64, float16, 18)
   ENTRY (Float32x2_t, V2SF, none, 64, float32, 18)
+
+  ENTRY (Float16x8_t, V8HF, none, 128, float16, 19)
   ENTRY (Float32x4_t, V4SF, none, 128, float32, 19)
 
diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
index 373dc85778d1bd4694c666ea4c6d82dc9ce8e819..c0a83b288b8c801235c160aa0e9611a510244117 100644
--- a/gcc/config/arm/arm.h
+++ b/gcc/config/arm/arm.h
@@ -999,7 +999,7 @@ extern int arm_arch_crc;
 /* Modes valid for Neon Q registers.  */
 #define VALID_NEON_QREG_MODE(MODE) \
   ((MODE) == V4SImode || (MODE) == V8HImode || (MODE) == V16QImode \
-   || (MODE) == V4SFmode || (MODE) == V2DImode)
+   || (MODE) == V8HFmode || (MODE) == V4SFmode || (MODE) == V2DImode)
 
 /* Structure modes valid for Neon registers.  */
 #define VALID_NEON_STRUCT_MODE(MODE) \
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 6e074ea3d3910e1d7abf0299f441973259023606..0faa46ceea51ef6c524c8ff8c063f329a524c11d 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -26251,7 +26251,8 @@ arm_vector_mode_supported_p (machine_mode mode)
 {
   /* Neon also supports V2SImode, etc. listed in the clause below.  */
   if (TARGET_NEON && (mode == V2SFmode || mode == V4SImode || mode == V8HImode
-  || mode == V4HFmode || mode == V16QImode || mode == V4SFmode || mode == V2DImode))
+  || mode == V4HFmode || mode == V16QImode || mode == V4SFmode
+  || mode == V2DImode || mode == V8HFmode))
 return true;
 
   if ((TARGET_NEON || TARGET_IWMMXT)
diff --git a/gcc/config/arm/arm_neon.h b/gcc/config/arm/arm_neon.h
index 3c40f9f94fae30cab5e8833d72d0ac9ff3ac7b0f..10d2de945e16d8056a7f137bc6d892617576ddb8 100644
--- a/gcc/config/arm/arm_neon.h
+++ b/gcc/config/arm/arm_neon.h
@@ -60,6 +60,9 @@ typedef __simd128_int8_t int8x16_t;
 typedef __simd128_int16_t int16x8_t;
 typedef __simd128_int32_t int32x4_t;
 typedef __simd128_int64_t int64x2_t;
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+typedef __simd128_float16_t float16x8_t;
+#endif
 typedef __simd128_float32_t float32x4_t;
 typedef __simd128_poly8_t poly8x16_t;
 typedef __simd128_poly16_t poly16x8_t;


[PATCH 6/15][AArch64] Add basic FP16 support

2015-07-28 Thread Alan Lawrence

gcc/ChangeLog:

* config/aarch64/aarch64-builtins.c (aarch64_fp16_type_node): New.
(aarch64_init_builtins): Make aarch64_fp16_type_node, use for __fp16.

* config/aarch64/aarch64-modes.def: Add HFmode.

* config/aarch64/aarch64.h (TARGET_CPU_CPP_BUILTINS): Define
__ARM_FP16_FORMAT_IEEE and __ARM_FP16_ARGS. Set bit 1 of __ARM_FP.

* config/aarch64/aarch64.c (aarch64_init_libfuncs,
aarch64_promoted_type): New.

(aarch64_float_const_representable_p): Disable HFmode.
(aarch64_mangle_type): Mangle half-precision floats to "Dh".
(TARGET_PROMOTED_TYPE): Define to aarch64_promoted_type.
(TARGET_INIT_LIBFUNCS): Define to aarch64_init_libfuncs.

* config/aarch64/aarch64.md (mov): Include HFmode using GPF_F16.
(movhf_aarch64, extendhfsf2, extendhfdf2, truncsfhf2, truncdfhf2): New.

* config/aarch64/iterators.md (GPF_F16): New.


gcc/testsuite/ChangeLog:

* gcc.target/aarch64/f16_movs_1.c: New test.
commit 989af1492bbf268be1ecfae06f3303b90ae514c8
Author: Alan Lawrence 
Date:   Tue Dec 2 12:57:39 2014 +

AArch64 1/6: Basic HFmode support (less tests), aarch64_fp16_type_node, patterns, mangling, predefines.

No --fp16-format option.

Disable constants as NYI.
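
The effect on scalar code can be sketched as follows (my example, and the
generated-code comment is only indicative): with the libfuncs set to NULL,
__fp16 arithmetic is done by widening to SFmode rather than by libcall:

/* Hypothetical example for AArch64.  */
__fp16
add_hf (__fp16 a, __fp16 b)
{
  /* Roughly: fcvt s0, h0; fcvt s1, h1; fadd s0, s0, s1; fcvt h0, s0.  */
  return a + b;
}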

diff --git a/gcc/config/aarch64/aarch64-builtins.c b/gcc/config/aarch64/aarch64-builtins.c
index ec60955..cfb2dc1 100644
--- a/gcc/config/aarch64/aarch64-builtins.c
+++ b/gcc/config/aarch64/aarch64-builtins.c
@@ -439,6 +439,9 @@ static struct aarch64_simd_type_info aarch64_simd_types [] = {
 };
 #undef ENTRY
 
+/* This type is not SIMD-specific; it is the user-visible __fp16.  */
+static tree aarch64_fp16_type_node = NULL_TREE;
+
 static tree aarch64_simd_intOI_type_node = NULL_TREE;
 static tree aarch64_simd_intEI_type_node = NULL_TREE;
 static tree aarch64_simd_intCI_type_node = NULL_TREE;
@@ -849,6 +852,12 @@ aarch64_init_builtins (void)
 = add_builtin_function ("__builtin_aarch64_set_fpsr", ftype_set_fpr,
 			AARCH64_BUILTIN_SET_FPSR, BUILT_IN_MD, NULL, NULL_TREE);
 
+  aarch64_fp16_type_node = make_node (REAL_TYPE);
+  TYPE_PRECISION (aarch64_fp16_type_node) = 16;
+  layout_type (aarch64_fp16_type_node);
+
+  (*lang_hooks.types.register_builtin_type) (aarch64_fp16_type_node, "__fp16");
+
   if (TARGET_SIMD)
 aarch64_init_simd_builtins ();
   if (TARGET_CRC32)
diff --git a/gcc/config/aarch64/aarch64-modes.def b/gcc/config/aarch64/aarch64-modes.def
index b17b90d..c30059b 100644
--- a/gcc/config/aarch64/aarch64-modes.def
+++ b/gcc/config/aarch64/aarch64-modes.def
@@ -36,6 +36,10 @@ CC_MODE (CC_DLTU);
 CC_MODE (CC_DGEU);
 CC_MODE (CC_DGTU);
 
+/* Half-precision floating point for arm_neon.h float16_t.  */
+FLOAT_MODE (HF, 2, 0);
+ADJUST_FLOAT_FORMAT (HF, &ieee_half_format);
+
 /* Vector modes.  */
 VECTOR_MODES (INT, 8);/*   V8QI V4HI V2SI.  */
 VECTOR_MODES (INT, 16);   /* V16QI V8HI V4SI V2DI.  */
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 17bae08..f338033 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -8339,6 +8339,10 @@ aarch64_mangle_type (const_tree type)
   if (lang_hooks.types_compatible_p (CONST_CAST_TREE (type), va_list_type))
 return "St9__va_list";
 
+  /* Half-precision float.  */
+  if (TREE_CODE (type) == REAL_TYPE && TYPE_PRECISION (type) == 16)
+return "Dh";
+
   /* Mangle AArch64-specific internal types.  TYPE_NAME is non-NULL_TREE for
  builtin types.  */
   if (TYPE_NAME (type) != NULL)
@@ -9578,6 +9582,33 @@ aarch64_start_file (void)
   default_file_start();
 }
 
+static void
+aarch64_init_libfuncs (void)
+{
+   /* Half-precision float operations.  The compiler handles all operations
+ with NULL libfuncs by converting to SFmode.  */
+
+  /* Conversions.  */
+  set_conv_libfunc (trunc_optab, HFmode, SFmode, "__gnu_f2h_ieee");
+  set_conv_libfunc (sext_optab, SFmode, HFmode, "__gnu_h2f_ieee");
+
+  /* Arithmetic.  */
+  set_optab_libfunc (add_optab, HFmode, NULL);
+  set_optab_libfunc (sdiv_optab, HFmode, NULL);
+  set_optab_libfunc (smul_optab, HFmode, NULL);
+  set_optab_libfunc (neg_optab, HFmode, NULL);
+  set_optab_libfunc (sub_optab, HFmode, NULL);
+
+  /* Comparisons.  */
+  set_optab_libfunc (eq_optab, HFmode, NULL);
+  set_optab_libfunc (ne_optab, HFmode, NULL);
+  set_optab_libfunc (lt_optab, HFmode, NULL);
+  set_optab_libfunc (le_optab, HFmode, NULL);
+  set_optab_libfunc (ge_optab, HFmode, NULL);
+  set_optab_libfunc (gt_optab, HFmode, NULL);
+  set_optab_libfunc (unord_optab, HFmode, NULL);
+}
+
 /* Target hook for c_mode_for_suffix.  */
 static machine_mode
 aarch64_c_mode_for_suffix (char suffix)
@@ -9616,7 +9647,8 @@ aarch64_float_const_representable_p (rtx x)
   if (!CONST_DOUBLE_P (x))
 return false;
 
-  if (GET_MODE (x) == VOIDmode)
+  /* We don't support HFmode constants yet.  */
+  if (GET_MODE (x) == VOIDmode || GET_MODE (x) == HFmode)
 return false;
 
   REAL

[PATCH 7/15][ARM/AArch64 Testsuite] Add basic fp16 tests

2015-07-28 Thread Alan Lawrence

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/fp16/fp16.exp: New.
* gcc.target/aarch64/fp16/f16_convs_1.c: New.
* gcc.target/aarch64/fp16/f16_convs_2.c: New.
commit bc5045c0d3dd34b8cb94910281384f9ab9880325
Author: Alan Lawrence 
Date:   Thu May 7 10:08:12 2015 +0100

(ARM+AArch64) Add gcc.target/aarch64/fp16, f16_conv_[12].c tests

diff --git a/gcc/testsuite/gcc.target/aarch64/fp16/f16_convs_1.c b/gcc/testsuite/gcc.target/aarch64/fp16/f16_convs_1.c
new file mode 100644
index 000..a1c95fd
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/fp16/f16_convs_1.c
@@ -0,0 +1,34 @@
+/* { dg-do run } */
+/* { dg-options "-O2" } */
+/* { dg-additional-options "-mfp16-format=ieee" {target "arm*-*-*"} } */
+
+extern void abort (void);
+
+#define EPSILON 0.0001
+
+int
+main (int argc, char **argv)
+{
+  float f1 = 3.14159f;
+  float f2 = 2.718f;
+  /* This 'assembler' statement should be portable between ARM and AArch64.  */
+  asm volatile ("" : : : "memory");
+  __fp16 in1 = f1;
+  __fp16 in2 = f2;
+
+  /* Do the addition on __fp16's (implicitly converts both operands to
+ float32, adds, converts back to f16, then we convert back to f32).  */
+  __fp16 res1 = in1 + in2;
+  asm volatile ("" : : : "memory");
+  float f_res_1 = res1;
+
+  /* Do the addition on float32's (we convert both operands to f32, and add,
+ as above, but skip the final conversion f32 -> f16 -> f32).  */
+  float f1a = in1;
+  float f2a = in2;
+  float f_res_2 = f1a + f2a;
+
+  if (__builtin_fabs (f_res_2 - f_res_1) > EPSILON)
+abort ();
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/fp16/f16_convs_2.c b/gcc/testsuite/gcc.target/aarch64/fp16/f16_convs_2.c
new file mode 100644
index 000..6aa3e59
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/fp16/f16_convs_2.c
@@ -0,0 +1,33 @@
+/* { dg-do run } */
+/* { dg-options "-O2" } */
+/* { dg-additional-options "-mfp16-format=ieee" {target "arm*-*-*"} } */
+
+extern void abort (void);
+
+#define EPSILON 0.0001
+
+int
+main (int argc, char **argv)
+{
+  int i1 = 3;
+  int i2 = 2;
+  /*  This 'assembler' should be portable across ARM and AArch64.  */
+  asm volatile ("" : : : "memory");
+
+  __fp16 in1 = i1;
+  __fp16 in2 = i2;
+
+  /* Do the addition on __fp16's (implicitly converts both operands to
+ float32, adds, converts back to f16, then we convert to int).  */
+  __fp16 res1 = in1 + in2;
+  asm volatile ("" : : : "memory");
+  int result1 = res1;
+
+  /* Do the addition on int's (we convert both operands directly to int, add,
+ and we're done).  */
+  int result2 = ((int) in1) + ((int) in2);
+
+  if (__builtin_abs (result2 - result1) > EPSILON)
+abort ();
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/fp16/fp16.exp b/gcc/testsuite/gcc.target/aarch64/fp16/fp16.exp
new file mode 100644
index 000..7dc8d65
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/fp16/fp16.exp
@@ -0,0 +1,43 @@
+# Tests of 16-bit floating point (__fp16), for both ARM and AArch64.
+# Copyright (C) 2015 Free Software Foundation, Inc.
+
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with GCC; see the file COPYING3.  If not see
+# .
+
+# GCC testsuite that uses the `dg.exp' driver.
+
+# Exit immediately if this isn't an ARM or AArch64 target.
+if {![istarget arm*-*-*]
+&& ![istarget aarch64*-*-*]} then {
+  return
+}
+
+# Load support procs.
+load_lib gcc-dg.exp
+
+# If a testcase doesn't have special options, use these.
+global DEFAULT_CFLAGS
+if ![info exists DEFAULT_CFLAGS] then {
+set DEFAULT_CFLAGS " -ansi -pedantic-errors"
+}
+
+# Initialize `dg'.
+dg-init
+
+# Main loop.
+dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/*.\[cC\]]] \
+	"" $DEFAULT_CFLAGS
+
+# All done.
+dg-finish


[PATCH 8/15][AArch64] Add support for float16x{4,8}_t vectors/builtins

2015-07-28 Thread Alan Lawrence

gcc/ChangeLog:

* config/aarch64/aarch64.c (aarch64_vector_mode_supported_p): Support
V4HFmode and V8HFmode.
(aarch64_split_simd_move): Add case for V8HFmode.
* config/aarch64/aarch64-builtins.c (v4hf_UP, v8hf_UP): Define.
(aarch64_simd_builtin_std_type): Handle HFmode.
(aarch64_init_simd_builtin_types): Include Float16x4_t and Float16x8_t.

* config/aarch64/aarch64-simd.md (mov, aarch64_get_lane,
aarch64_ld1, aarch64_st1, aarch64_be_st1): Use VALLDI_F16 iterator.

* config/aarch64/aarch64-simd-builtin-types.def: Add Float16x4_t,
Float16x8_t.

* config/aarch64/aarch64-simd-builtins.def (ld1, st1): Use VALL_F16.
* config/aarch64/arm_neon.h (float16x4_t, float16x8_t, float16_t):
New typedefs.
(vget_lane_f16, vgetq_lane_f16, vset_lane_f16, vsetq_lane_f16,
vld1_f16, vld1q_f16, vst1_f16, vst1q_f16, vst1_lane_f16,
vst1q_lane_f16): New.
* config/aarch64/iterators.md (VD, VQ, VQ_NO2E): Add vectors of HFmode.
(VALLDI_F16, VALL_F16): New.
(Vmtype, VEL, VCONQ, VHALF, VRL3, VRL4, V_TWO_ELEM, V_THREE_ELEM,
V_FOUR_ELEM, q): Add cases for V4HF and V8HF.
(VDBL, VRL2): Add V4HF case.

gcc/testsuite/ChangeLog:

* g++.dg/abi/mangle-neon-aarch64.C: Add cases for float16x4_t and
float16x8_t.
* gcc.target/aarch64/vset_lane_1.c: Likewise.
* gcc.target/aarch64/vld1-vst1_1.c: Likewise, also missing float32x4_t.
* gcc.target/aarch64/vld1_lane.c: Remove unused constants; add cases
for float16x4_t and float16x8_t.
commit 49cb53a94a44fcda845c3f6ef11e88f9be458aad
Author: Alan Lawrence 
Date:   Tue Dec 2 13:08:15 2014 +

AArch64 2/N: Vector/__builtin basics: define+support types, movs, test ABI.

Patterns, builtins, intrinsics for {ld1,st1}{,_lane},v{g,s}et_lane. Tests: vld1-vst1_1, vset_lane_1, vld1_lane.c
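
A usage sketch of the new AArch64 intrinsics (my example, not from the
patch); no format-macro guard is needed here, since AArch64 __fp16 is
always IEEE:

#include <arm_neon.h>

/* Hypothetical example: copy four half-floats, zeroing lane 3.  */
void
copy_zero_lane3 (const float16_t *src, float16_t *dst)
{
  float16x4_t v = vld1_f16 (src);	/* ld1 {v0.4h}, [x0] */
  v = vset_lane_f16 (0.0, v, 3);
  vst1_f16 (dst, v);			/* st1 {v0.4h}, [x1] */
}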

diff --git a/gcc/config/aarch64/aarch64-builtins.c b/gcc/config/aarch64/aarch64-builtins.c
index cfb2dc1..a6c3377 100644
--- a/gcc/config/aarch64/aarch64-builtins.c
+++ b/gcc/config/aarch64/aarch64-builtins.c
@@ -66,6 +66,7 @@
 
 #define v8qi_UP  V8QImode
 #define v4hi_UP  V4HImode
+#define v4hf_UP  V4HFmode
 #define v2si_UP  V2SImode
 #define v2sf_UP  V2SFmode
 #define v1df_UP  V1DFmode
@@ -73,6 +74,7 @@
 #define df_UPDFmode
 #define v16qi_UP V16QImode
 #define v8hi_UP  V8HImode
+#define v8hf_UP  V8HFmode
 #define v4si_UP  V4SImode
 #define v4sf_UP  V4SFmode
 #define v2di_UP  V2DImode
@@ -523,6 +525,8 @@ aarch64_simd_builtin_std_type (enum machine_mode mode,
   return aarch64_simd_intCI_type_node;
 case XImode:
   return aarch64_simd_intXI_type_node;
+case HFmode:
+  return aarch64_fp16_type_node;
 case SFmode:
   return float_type_node;
 case DFmode:
@@ -607,6 +611,8 @@ aarch64_init_simd_builtin_types (void)
   aarch64_simd_types[Poly64x2_t].eltype = aarch64_simd_types[Poly64_t].itype;
 
   /* Continue with standard types.  */
+  aarch64_simd_types[Float16x4_t].eltype = aarch64_fp16_type_node;
+  aarch64_simd_types[Float16x8_t].eltype = aarch64_fp16_type_node;
   aarch64_simd_types[Float32x2_t].eltype = float_type_node;
   aarch64_simd_types[Float32x4_t].eltype = float_type_node;
   aarch64_simd_types[Float64x1_t].eltype = double_type_node;
diff --git a/gcc/config/aarch64/aarch64-simd-builtin-types.def b/gcc/config/aarch64/aarch64-simd-builtin-types.def
index bb54e56..ea219b7 100644
--- a/gcc/config/aarch64/aarch64-simd-builtin-types.def
+++ b/gcc/config/aarch64/aarch64-simd-builtin-types.def
@@ -44,6 +44,8 @@
   ENTRY (Poly16x8_t, V8HI, poly, 12)
   ENTRY (Poly64x1_t, DI, poly, 12)
   ENTRY (Poly64x2_t, V2DI, poly, 12)
+  ENTRY (Float16x4_t, V4HF, none, 13)
+  ENTRY (Float16x8_t, V8HF, none, 13)
   ENTRY (Float32x2_t, V2SF, none, 13)
   ENTRY (Float32x4_t, V4SF, none, 13)
   ENTRY (Float64x1_t, V1DF, none, 13)
diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def b/gcc/config/aarch64/aarch64-simd-builtins.def
index dd2bc47..4dd2bc7 100644
--- a/gcc/config/aarch64/aarch64-simd-builtins.def
+++ b/gcc/config/aarch64/aarch64-simd-builtins.def
@@ -367,11 +367,11 @@
   VAR1 (UNOP, float_extend_lo_, 0, v2df)
   VAR1 (UNOP, float_truncate_lo_, 0, v2sf)
 
-  /* Implemented by aarch64_ld1<VALL:mode>.  */
-  BUILTIN_VALL (LOAD1, ld1, 0)
+  /* Implemented by aarch64_ld1<VALL_F16:mode>.  */
+  BUILTIN_VALL_F16 (LOAD1, ld1, 0)
 
-  /* Implemented by aarch64_st1<VALL:mode>.  */
-  BUILTIN_VALL (STORE1, st1, 0)
+  /* Implemented by aarch64_st1<VALL_F16:mode>.  */
+  BUILTIN_VALL_F16 (STORE1, st1, 0)
 
   /* Implemented by fma4.  */
   BUILTIN_VDQF (TERNOP, fma, 4)
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index b90f938..5cc45ed 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -19,8 +19,8 @@
 ;; .
 
 (define_expand "mov"
-  [(set (match_operand:VALL 0 "nonimmediate_operand" "")
-	(match_operand:VALL 1 "general_opera

[PATCH 9/15][AArch64] vld{2,3,4}{,_lane,_dup}, vcombine, vcreate

2015-07-28 Thread Alan Lawrence

gcc/ChangeLog:

* config/aarch64/aarch64.c (aarch64_split_simd_combine): Add V4HFmode.
* config/aarch64/aarch64-builtins.c (VAR13, VAR14): New.
(aarch64_scalar_builtin_types, aarch64_init_simd_builtin_scalar_types):
Add __builtin_aarch64_simd_hf.
* config/aarch64/arm_neon.h (float16x4x2_t, float16x8x2_t,
float16x4x3_t, float16x8x3_t, float16x4x4_t, float16x8x4_t,
vcombine_f16, vst2_lane_f16, vst2q_lane_f16, vst3_lane_f16,
vst3q_lane_f16, vst4_lane_f16, vst4q_lane_f16, vld2_f16, vld2q_f16,
vld3_f16, vld3q_f16, vld4_f16, vld4q_f16, vld2_dup_f16, vld2q_dup_f16,
vld3_dup_f16, vld3q_dup_f16, vld4_dup_f16, vld4q_dup_f16,
vld2_lane_f16, vld2q_lane_f16, vld3_lane_f16, vld3q_lane_f16,
vld4_lane_f16, vld4q_lane_f16, vst2_f16, vst2q_f16, vst3_f16,
vst3q_f16, vst4_f16, vst4q_f16, vcreate_f16): New.

* config/aarch64/iterators.md (VALLDIF, Vtype, Vetype, Vbtype,
V_cmp_result, v_cmp_result): Add cases for V4HF and V8HF.
(VDC, Vdbl): Add V4HF.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/vldN_1.c: Add float16x4_t and float16x8_t cases.
* gcc.target/aarch64/vldN_dup_1.c: Likewise.
* gcc.target/aarch64/vldN_lane_1.c: Likewise.
commit ef719e5d3d6eccc5cf621851283b7c0ba1a9ee6c
Author: Alan Lawrence 
Date:   Tue Aug 5 17:52:28 2014 +0100

AArch64 3/N: v(create|combine|v(ld|st|ld...dup/lane|st...lane)[234](q?))_f16; tests vldN{,_lane,_dup} inc bigendian. Add __builtin_aarch64_simd_hf.

Fix some casts, to ..._hf not ..._sf !
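
For example (a sketch of mine, not from the patch), the new create/combine
variants compose as for the other element types:

#include <arm_neon.h>

/* Hypothetical example: build a float16x8_t from a 64-bit bit pattern
   (reinterpreted as four half-floats) and an existing high half.  */
float16x8_t
make_f16q (uint64_t bits, float16x4_t hi)
{
  float16x4_t lo = vcreate_f16 (bits);
  return vcombine_f16 (lo, hi);
}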

diff --git a/gcc/config/aarch64/aarch64-builtins.c b/gcc/config/aarch64/aarch64-builtins.c
index a6c3377..5367ba6 100644
--- a/gcc/config/aarch64/aarch64-builtins.c
+++ b/gcc/config/aarch64/aarch64-builtins.c
@@ -300,6 +300,12 @@ aarch64_types_storestruct_lane_qualifiers[SIMD_MAX_BUILTIN_ARGS]
 #define VAR12(T, N, MAP, A, B, C, D, E, F, G, H, I, J, K, L) \
   VAR11 (T, N, MAP, A, B, C, D, E, F, G, H, I, J, K) \
   VAR1 (T, N, MAP, L)
+#define VAR13(T, N, MAP, A, B, C, D, E, F, G, H, I, J, K, L, M) \
+  VAR12 (T, N, MAP, A, B, C, D, E, F, G, H, I, J, K, L) \
+  VAR1 (T, N, MAP, M)
+#define VAR14(T, X, MAP, A, B, C, D, E, F, G, H, I, J, K, L, M, N) \
+  VAR13 (T, X, MAP, A, B, C, D, E, F, G, H, I, J, K, L, M) \
+  VAR1 (T, X, MAP, N)
 
 #include "aarch64-builtin-iterators.h"
 
@@ -377,6 +383,7 @@ const char *aarch64_scalar_builtin_types[] = {
   "__builtin_aarch64_simd_qi",
   "__builtin_aarch64_simd_hi",
   "__builtin_aarch64_simd_si",
+  "__builtin_aarch64_simd_hf",
   "__builtin_aarch64_simd_sf",
   "__builtin_aarch64_simd_di",
   "__builtin_aarch64_simd_df",
@@ -664,6 +671,8 @@ aarch64_init_simd_builtin_scalar_types (void)
 	 "__builtin_aarch64_simd_qi");
   (*lang_hooks.types.register_builtin_type) (intHI_type_node,
 	 "__builtin_aarch64_simd_hi");
+  (*lang_hooks.types.register_builtin_type) (aarch64_fp16_type_node,
+	 "__builtin_aarch64_simd_hf");
   (*lang_hooks.types.register_builtin_type) (intSI_type_node,
 	 "__builtin_aarch64_simd_si");
   (*lang_hooks.types.register_builtin_type) (float_type_node,
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index ccf063a..bbf5230 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -1063,6 +1063,9 @@ aarch64_split_simd_combine (rtx dst, rtx src1, rtx src2)
 	case V2SImode:
 	  gen = gen_aarch64_simd_combinev2si;
 	  break;
+	case V4HFmode:
+	  gen = gen_aarch64_simd_combinev4hf;
+	  break;
 	case V2SFmode:
 	  gen = gen_aarch64_simd_combinev2sf;
 	  break;
diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h
index 7425485..d61e619 100644
--- a/gcc/config/aarch64/arm_neon.h
+++ b/gcc/config/aarch64/arm_neon.h
@@ -153,6 +153,16 @@ typedef struct uint64x2x2_t
   uint64x2_t val[2];
 } uint64x2x2_t;
 
+typedef struct float16x4x2_t
+{
+  float16x4_t val[2];
+} float16x4x2_t;
+
+typedef struct float16x8x2_t
+{
+  float16x8_t val[2];
+} float16x8x2_t;
+
 typedef struct float32x2x2_t
 {
   float32x2_t val[2];
@@ -273,6 +283,16 @@ typedef struct uint64x2x3_t
   uint64x2_t val[3];
 } uint64x2x3_t;
 
+typedef struct float16x4x3_t
+{
+  float16x4_t val[3];
+} float16x4x3_t;
+
+typedef struct float16x8x3_t
+{
+  float16x8_t val[3];
+} float16x8x3_t;
+
 typedef struct float32x2x3_t
 {
   float32x2_t val[3];
@@ -393,6 +413,16 @@ typedef struct uint64x2x4_t
   uint64x2_t val[4];
 } uint64x2x4_t;
 
+typedef struct float16x4x4_t
+{
+  float16x4_t val[4];
+} float16x4x4_t;
+
+typedef struct float16x8x4_t
+{
+  float16x8_t val[4];
+} float16x8x4_t;
+
 typedef struct float32x2x4_t
 {
   float32x2_t val[4];
@@ -2644,6 +2674,12 @@ vcreate_s64 (uint64_t __a)
   return (int64x1_t) {__a};
 }
 
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vcreate_f16 (uint64_t __a)
+{
+  return (float16x4_t) __a;
+}
+
 __extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
 vcr

[PATCH 10/15][AArch64] Implement vcvt_{,high_}f16_f32

2015-07-28 Thread Alan Lawrence

gcc/ChangeLog:

* config/aarch64/aarch64-simd.md (aarch64_float_truncate_lo_v2sf):
Reparameterize to...
(aarch64_float_truncate_lo_): ...this, for both V2SF and V4HF.
(aarch64_float_truncate_hi_v4sf): Reparameterize to...
(aarch64_float_truncate_hi_): ...this, for both V4SF and V8HF.

* config/aarch64/aarch64-simd-builtins.def (float_truncate_hi_): Add
v8hf variant.
(float_truncate_lo_): Use BUILTIN_VDF iterator.

* config/aarch64/arm_neon.h (vcvt_f16_f32, vcvt_high_f16_f32): New.

* config/aarch64/iterators.md (VDF, Vdtype): New.
(VWIDE, Vmwtype): Add cases for V4HF and V2SF.
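
As a usage sketch of the two intrinsics named in the ChangeLog above (my
example; the instruction comments are only indicative):

#include <arm_neon.h>

/* Hypothetical example: narrow eight floats to eight half-floats.  */
float16x8_t
narrow8 (float32x4_t lo, float32x4_t hi)
{
  float16x4_t n = vcvt_f16_f32 (lo);	/* fcvtn  v0.4h, v0.4s */
  return vcvt_high_f16_f32 (n, hi);	/* fcvtn2 v0.8h, v1.4s */
}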



[PATCH 11/15][AArch64] vreinterpret(q?), vget_(low|high), vld1(q?)_dup

2015-07-28 Thread Alan Lawrence

gcc/ChangeLog:

* config/aarch64/arm_neon.h (vreinterpret_p8_f16, vreinterpret_p16_f16,
vreinterpret_f16_f64, vreinterpret_f16_s8, vreinterpret_f16_s16,
vreinterpret_f16_s32, vreinterpret_f16_s64, vreinterpret_f16_f32,
vreinterpret_f16_u8, vreinterpret_f16_u16, vreinterpret_f16_u32,
vreinterpret_f16_u64, vreinterpret_f16_p8, vreinterpret_f16_p16,
vreinterpretq_f16_f64, vreinterpretq_f16_s8, vreinterpretq_f16_s16,
vreinterpretq_f16_s32, vreinterpretq_f16_s64, vreinterpretq_f16_f32,
vreinterpretq_f16_u8, vreinterpretq_f16_u16, vreinterpretq_f16_u32,
vreinterpretq_f16_u64, vreinterpretq_f16_p8, vreinterpretq_f16_p16,
vreinterpret_f32_f16, vreinterpret_f64_f16, vreinterpret_s64_f16,
vreinterpret_u64_f16, vreinterpretq_u64_f16, vreinterpret_s8_f16,
vreinterpret_s16_f16, vreinterpret_s32_f16, vreinterpret_u8_f16,
vreinterpret_u16_f16, vreinterpret_u32_f16, vreinterpretq_p8_f16,
vreinterpretq_p16_f16, vreinterpretq_f32_f16, vreinterpretq_f64_f16,
vreinterpretq_s64_f16, vreinterpretq_s8_f16, vreinterpretq_s16_f16,
vreinterpretq_s32_f16, vreinterpretq_u8_f16, vreinterpretq_u16_f16,
vreinterpretq_u32_f16, vget_low_f16, vget_high_f16, vld1_dup_f16,
vld1q_dup_f16): New.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/vget_high_1.c: Add float16x8->float16x4 case.
* gcc.target/aarch64/vget_low_1.c: Likewise.
commit beb21a6bce76d4fbedb13fcf25796563b27f6bae
Author: Alan Lawrence 
Date:   Mon Jun 29 18:46:49 2015 +0100

[AArch64 5/N v2] vreinterpret, vget_(low|high), vld1(q?)_dup. update tests for vget_low/high
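
A sketch of how the new reinterpret casts compose (my example, not from
the patch):

#include <arm_neon.h>

/* Hypothetical example: flip the sign bit of each element purely as a
   bit-pattern operation, with no fp16 arithmetic.  */
float16x4_t
negate_f16 (float16x4_t v)
{
  uint16x4_t u = vreinterpret_u16_f16 (v);
  u = veor_u16 (u, vdup_n_u16 (0x8000));	/* fp16 sign bit */
  return vreinterpret_f16_u16 (u);
}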

diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h
index b915754..ff1a45c 100644
--- a/gcc/config/aarch64/arm_neon.h
+++ b/gcc/config/aarch64/arm_neon.h
@@ -2891,6 +2891,12 @@ vgetq_lane_u64 (uint64x2_t __a, const int __b)
 /* vreinterpret  */
 
 __extension__ static __inline poly8x8_t __attribute__ ((__always_inline__))
+vreinterpret_p8_f16 (float16x4_t __a)
+{
+  return (poly8x8_t) __a;
+}
+
+__extension__ static __inline poly8x8_t __attribute__ ((__always_inline__))
 vreinterpret_p8_f64 (float64x1_t __a)
 {
   return (poly8x8_t) __a;
@@ -2987,6 +2993,12 @@ vreinterpretq_p8_s64 (int64x2_t __a)
 }
 
 __extension__ static __inline poly8x16_t __attribute__ ((__always_inline__))
+vreinterpretq_p8_f16 (float16x8_t __a)
+{
+  return (poly8x16_t) __a;
+}
+
+__extension__ static __inline poly8x16_t __attribute__ ((__always_inline__))
 vreinterpretq_p8_f32 (float32x4_t __a)
 {
   return (poly8x16_t) __a;
@@ -3023,6 +3035,12 @@ vreinterpretq_p8_p16 (poly16x8_t __a)
 }
 
 __extension__ static __inline poly16x4_t __attribute__ ((__always_inline__))
+vreinterpret_p16_f16 (float16x4_t __a)
+{
+  return (poly16x4_t) __a;
+}
+
+__extension__ static __inline poly16x4_t __attribute__ ((__always_inline__))
 vreinterpret_p16_f64 (float64x1_t __a)
 {
   return (poly16x4_t) __a;
@@ -3119,6 +3137,12 @@ vreinterpretq_p16_s64 (int64x2_t __a)
 }
 
 __extension__ static __inline poly16x8_t __attribute__ ((__always_inline__))
+vreinterpretq_p16_f16 (float16x8_t __a)
+{
+  return (poly16x8_t) __a;
+}
+
+__extension__ static __inline poly16x8_t __attribute__ ((__always_inline__))
 vreinterpretq_p16_f32 (float32x4_t __a)
 {
   return (poly16x8_t) __a;
@@ -3154,6 +3178,156 @@ vreinterpretq_p16_p8 (poly8x16_t __a)
   return (poly16x8_t) __a;
 }
 
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vreinterpret_f16_f64 (float64x1_t __a)
+{
+  return (float16x4_t) __a;
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vreinterpret_f16_s8 (int8x8_t __a)
+{
+  return (float16x4_t) __a;
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vreinterpret_f16_s16 (int16x4_t __a)
+{
+  return (float16x4_t) __a;
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vreinterpret_f16_s32 (int32x2_t __a)
+{
+  return (float16x4_t) __a;
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vreinterpret_f16_s64 (int64x1_t __a)
+{
+  return (float16x4_t) __a;
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vreinterpret_f16_f32 (float32x2_t __a)
+{
+  return (float16x4_t) __a;
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vreinterpret_f16_u8 (uint8x8_t __a)
+{
+  return (float16x4_t) __a;
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vreinterpret_f16_u16 (uint16x4_t __a)
+{
+  return (float16x4_t) __a;
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vreinterpret_f16_u32 (uint32x2_t __a)
+{
+  return (float16x4_t) __a;
+}
+
+__extension__ static __inline float16x4_t __attribute__ ((__always_inline__))
+vreinterpret_f16_u64 (uint64x1_t __a)
+{
+  return (float16x4_t) __a;
+}
+

[PATCH 14/15][ARM/AArch64 Testsuite]Add test of vcvt{,_high}_{f16_f32,f32_f16}

2015-07-28 Thread Alan Lawrence

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/advsimd-intrinsics/advsimd-intrinsics.exp:
set additional flags for neon-fp16 support.
* gcc.target/aarch64/advsimd-intrinsics/vcvt_f16.c: New.
commit e6cc7467ddf5702d3a122b8ac4163621d0164b37
Author: Alan Lawrence 
Date:   Wed Jan 28 13:02:22 2015 +

v2 Test vcvt{,_high on aarch64}_f{32_f16,16_f32}, with neon-fp16 for ARM targets.

v2a: #if defined(__aarch64__); + clean_results(); fp16 opts for ARM; fp16_hw_ok

diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/advsimd-intrinsics.exp b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/advsimd-intrinsics.exp
index ceada83..5f5e1fe 100644
--- a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/advsimd-intrinsics.exp
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/advsimd-intrinsics.exp
@@ -52,8 +52,10 @@ if {[istarget arm*-*-*]} then {
 torture-init
 set-torture-options $C_TORTURE_OPTIONS {{}} $LTO_TORTURE_OPTIONS
 
-# Make sure Neon flags are provided, if necessary.
-set additional_flags [add_options_for_arm_neon ""]
+# Make sure Neon flags are provided, if necessary. We try to add FP16 flags
+# for all tests; tests requiring FP16 will abort if a non-FP16 option
+# was forced.
+set additional_flags [add_options_for_arm_neon_fp16 ""]
 
 # Main loop.
 gcc-dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/*.c]] \
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvt_f16.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvt_f16.c
new file mode 100644
index 000..7a1c256
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvt_f16.c
@@ -0,0 +1,98 @@
+/* { dg-require-effective-target arm_neon_fp16_hw_ok { target { arm*-*-* } } } */
+#include 
+#include "arm-neon-ref.h"
+#include "compute-ref-data.h"
+#include 
+
+/* Expected results for vcvt.  */
+VECT_VAR_DECL (expected,hfloat,32,4) [] = { 0x4180, 0x4170,
+	0x4160, 0x4150 };
+VECT_VAR_DECL (expected,hfloat,16,4) [] = { 0x3e00, 0x4100, 0x4300, 0x4480 };
+
+/* Expected results for vcvt_high_f32_f16.  */
+VECT_VAR_DECL (expected_high,hfloat,32,4) [] = { 0xc140, 0xc130,
+		 0xc120, 0xc110 };
+/* Expected results for vcvt_high_f16_f32.  */
+VECT_VAR_DECL (expected_high,hfloat,16,8) [] = { 0x4000, 0x4000, 0x4000, 0x4000,
+		 0xcc00, 0xcb80, 0xcb00, 0xca80 };
+
+void
+exec_vcvt (void)
+{
+  clean_results();
+
+#define TEST_MSG vcvt_f32_f16
+  {
+VECT_VAR_DECL (buffer_src, float, 16, 4) [] = { 16.0, 15.0, 14.0, 13.0 };
+
+DECL_VARIABLE (vector_src, float, 16, 4);
+
+VLOAD (vector_src, buffer_src, , float, f, 16, 4);
+DECL_VARIABLE (vector_res, float, 32, 4) =
+	vcvt_f32_f16 (VECT_VAR (vector_src, float, 16, 4));
+vst1q_f32 (VECT_VAR (result, float, 32, 4),
+	   VECT_VAR (vector_res, float, 32, 4));
+
+CHECK_FP (TEST_MSG, float, 32, 4, PRIx32, expected, "");
+  }
+#undef TEST_MSG
+
+  clean_results ();
+
+#define TEST_MSG vcvt_f16_f32
+  {
+VECT_VAR_DECL (buffer_src, float, 32, 4) [] = { 1.5, 2.5, 3.5, 4.5 };
+DECL_VARIABLE (vector_src, float, 32, 4);
+
+VLOAD (vector_src, buffer_src, q, float, f, 32, 4);
+DECL_VARIABLE (vector_res, float, 16, 4) =
+  vcvt_f16_f32 (VECT_VAR (vector_src, float, 32, 4));
+vst1_f16 (VECT_VAR (result, float, 16, 4),
+	  VECT_VAR (vector_res, float, 16 ,4));
+
+CHECK_FP (TEST_MSG, float, 16, 4, PRIx16, expected, "");
+  }
+#undef TEST_MSG
+
+#if defined(__aarch64__)
+  clean_results ();
+
+#define TEST_MSG "vcvt_high_f32_f16"
+  {
+DECL_VARIABLE (vector_src, float, 16, 8);
+VLOAD (vector_src, buffer, q, float, f, 16, 8);
+DECL_VARIABLE (vector_res, float, 32, 4);
+VECT_VAR (vector_res, float, 32, 4) =
+  vcvt_high_f32_f16 (VECT_VAR (vector_src, float, 16, 8));
+vst1q_f32 (VECT_VAR (result, float, 32, 4),
+	   VECT_VAR (vector_res, float, 32, 4));
+CHECK_FP (TEST_MSG, float, 32, 4, PRIx32, expected_high, "");
+  }
+#undef TEST_MSG
+  clean_results ();
+
+#define TEST_MSG "vcvt_high_f16_f32"
+  {
+DECL_VARIABLE (vector_low, float, 16, 4);
+VDUP (vector_low, , float, f, 16, 4, 2.0);
+
+DECL_VARIABLE (vector_src, float, 32, 4);
+VLOAD (vector_src, buffer, q, float, f, 32, 4);
+
+DECL_VARIABLE (vector_res, float, 16, 8) =
+  vcvt_high_f16_f32 (VECT_VAR (vector_low, float, 16, 4),
+			 VECT_VAR (vector_src, float, 32, 4));
+vst1q_f16 (VECT_VAR (result, float, 16, 8),
+	   VECT_VAR (vector_res, float, 16, 8));
+
+CHECK_FP (TEST_MSG, float, 16, 8, PRIx16, expected_high, "");
+  }
+#endif
+}
+
+int
+main (void)
+{
+  exec_vcvt ();
+  return 0;
+}
diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index f0c209f..591e022 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -2705,6 +2705,21 @@ proc check_effective_target_arm_neon_fp16_ok { } {
 		check_effective_target_arm_neon_fp16_ok_n

[PATCH 13/15][ARM/AArch64 Testsuite] Add float16 tests to advsimd-intrinsics testsuite

2015-07-28 Thread Alan Lawrence
This is a respin of https://gcc.gnu.org/ml/gcc-patches/2015-07/msg00488.html, 
fixing up the testsuite for float16 vectors. Relative to the previous version, 
most of the additions to the tests are now within #if..#endif such that they are 
only compiled if we have a scalar __fp16 type (the exception is hfloat16_t: 
since this is actually an integer type, we can define and use it without any 
compiler fp16 support). Also we try to use add_options_for_arm_neon_fp16 for 
all tests (on ARM targets), falling back to add_options_for_arm_neon if the 
former fails.
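
The guard pattern can be sketched like this (an illustration of the shape
of the change, not the exact harness code):

#include <stdint.h>

/* hfloat16_t holds raw half-float bit patterns, so it is usable even
   without compiler fp16 support.  */
typedef uint16_t hfloat16_t;

#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
/* float16 test variants are only compiled when a scalar __fp16 exists,
   e.g. expected results expressed as bit patterns (here, 4 x 2.0).  */
hfloat16_t expected_f16[4] = { 0x4000, 0x4000, 0x4000, 0x4000 };
#endif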


Cross-tested on many multilibs, including -march=armv6, 
-march=armv7-a{,-mfpu=neon-fp16}, -march=armv7-a/-mfpu=neon, 
-march=armv7-a/-mfp16-format=none{,/-mfpu=neon-fp16,/-mfpu=neon}, 
-march=armv7-a/-mfp16-format=alternative .


Note that on bigendian, this requires the patch at 
https://gcc.gnu.org/ml/gcc-patches/2015-07/msg00696.html, which I will commit 
at the same time.


gcc/testsuite/ChangeLog:

* gcc.target/aarch64/advsimd-intrinsics/advsimd-intrinsics.exp:
Set additional_flags for neon-fp16 if supported, else fallback to neon.

* gcc.target/aarch64/advsimd-intrinsics/arm-neon-ref.h
(hfloat16_t): New.
(result, expected, clean_results, DECL_VARIABLE_64BITS_VARIANTS,
DECL_VARIABLE_128BITS_VARIANTS): Add float16x4_t and float16x8_t cases
if supported.
(CHECK_RESULTS): Redefine using CHECK_RESULTS_NAMED.
(CHECK_RESULTS_NAMED): Move body to CHECK_RESULTS_NAMED_NO_FP16;
redefine in terms of CHECK_RESULTS_NAMED_NO_FP16 with float16 variants
when those are supported.
(CHECK_RESULTS_NAMED_NO_FP16, CHECK_RESULTS_NO_FP16): New.
(vdup_n_f16): New.

* gcc.target/aarch64/advsimd-intrinsics/compute-ref-data.h (buffer,
buffer_pad, buffer_dup, buffer_dup_pad): Add float16x4 and float16x8_t
cases if supported.

* gcc.target/aarch64/advsimd-intrinsics/vbsl.c (exec_vbsl):
Use CHECK_RESULTS_NO_FP16 in place of CHECK_RESULTS.
* gcc.target/aarch64/advsimd-intrinsics/vdup-vmov.c (exec_vdup_vmov):
Likewise.
* gcc.target/aarch64/advsimd-intrinsics/vdup_lane.c (exec_vdup_lane):
Likewise.
* gcc.target/aarch64/advsimd-intrinsics/vext.c (exec_vext): Likewise.

* gcc.target/aarch64/advsimd-intrinsics/vcombine.c (expected):
Add float16x8_t case.
(main, exec_vcombine): test float16x4_t -> float16x8_t, if supported.
* gcc.target/aarch64/advsimd-intrinsics/vcreate.c (expected,
main, exec_vcreate): Likewise.
* gcc.target/aarch64/advsimd-intrinsics/vget_high.c (expected,
 exec_vget_high): Likewise.
* gcc.target/aarch64/advsimd-intrinsics/vget_low.c (expected,
exec_vget_low): Likewise.
* gcc.target/aarch64/advsimd-intrinsics/vld1.c (expected, exec_vld1):
Likewise.
* gcc.target/aarch64/advsimd-intrinsics/vld1_dup.c (expected,
exec_vld1_dup): Likewise.
* gcc.target/aarch64/advsimd-intrinsics/vld1_lane.c (expected,
exec_vld1_lane): Likewise.
* gcc.target/aarch64/advsimd-intrinsics/vldX.c (expected, exec_vldX):
Likewise.
* gcc.target/aarch64/advsimd-intrinsics/vldX_dup.c (expected,
exec_vldX_dup): Likewise.
* gcc.target/aarch64/advsimd-intrinsics/vldX_lane.c (expected,
exec_vldX_lane): Likewise.
* gcc.target/aarch64/advsimd-intrinsics/vset_lane.c (expected,
exec_vset_lane): Likewise.
* gcc.target/aarch64/advsimd-intrinsics/vst1_lane.c (expected,
 exec_vst1_lane): Likewise.
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/advsimd-intrinsics.exp b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/advsimd-intrinsics.exp
index ceada839d982d3b6a922d924cad910a1c860eed0..462696315e05ea220dff60c1a605160ae2b59a1c 100644
--- a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/advsimd-intrinsics.exp
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/advsimd-intrinsics.exp
@@ -52,8 +52,12 @@ if {[istarget arm*-*-*]} then {
 torture-init
 set-torture-options $C_TORTURE_OPTIONS {{}} $LTO_TORTURE_OPTIONS
 
-# Make sure Neon flags are provided, if necessary.
-set additional_flags [add_options_for_arm_neon ""]
+# Make sure Neon flags are provided, if necessary.  Use fp16 if we can.
+if {[check_effective_target_arm_neon_fp16_ok]} then {
+  set additional_flags [add_options_for_arm_neon_fp16 ""]
+} else {
+  set additional_flags [add_options_for_arm_neon ""]
+}
 
 # Main loop.
 gcc-dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/*.c]] \
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/arm-neon-ref.h b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/arm-neon-ref.h
index 4e728d5572c8e669bf6e175a07b6575cb6baf66d..49fbd843e507ede8aa81d02c175a82a1221750a4 100644
--- a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/arm-neon-ref.h
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-i

[PATCH 12/15][AArch64] Add vcvt(_high)?_f32_f16 intrinsics, with BE RTL fix

2015-07-28 Thread Alan Lawrence
The fix here (as noted at https://gcc.gnu.org/ml/gcc-patches/2015-04/msg01345.html) 
is that it changes the vector indices present in the RTL on bigendian for 
float vec_unpacks to be the same as for integer vec_unpacks. This appears 
consistent with the usage of VEC_UNPACK_(FLOAT_)?EXPR in tree-vect-stmts.c, 
which uses a different EXPR for the same half of the vector depending on 
endianness. I was not able to construct a testcase where the RTL here mattered 
(i.e. where the RTL was constant-folded, but the tree had not been), but the 
correctness can be seen from a testcase:


double d[4];
void
bar (float *f)
{
  for (int i = 0; i < 4; i++)
    d[i] = f[i];
}

which used to produce the following final RTL (-O3):

(insn:TI 8 10 12 (set (reg:V2DF 33 v1 [orig:78 vect__9.19 ] [78])
        (float_extend:V2DF (vec_select:V2SF (reg:V4SF 32 v0 [orig:77 MEM[(float *)f_6(D)] ] [77])
                (parallel [
                        (const_int 2 [0x2])
                        (const_int 3 [0x3])
                    ])))) test.c:40 1274 {vec_unpacks_hi_v4sf}
     (expr_list:REG_EQUIV (mem/c:V2DF (reg/f:DI 0 x0 [79]) [2 MEM[(double *)&d]+0 S16 A64])
        (nil)))
(insn:TI 12 8 11 (set (reg:V2DF 32 v0 [orig:81 vect__9.19 ] [81])
        (float_extend:V2DF (vec_select:V2SF (reg:V4SF 32 v0 [orig:77 MEM[(float *)f_6(D)] ] [77])
                (parallel [
                        (const_int 0 [0])
                        (const_int 1 [0x1])
                    ])))) test.c:40 1272 {vec_unpacks_lo_v4sf}
     (expr_list:REG_EQUIV (mem/c:V2DF (plus:DI (reg/f:DI 0 x0 [79])
                (const_int 16 [0x10])) [2 MEM[(double *)&d + 16B]+0 S16 A64])
        (nil)))
(insn:TI 11 12 15 (set (mem/c:V2DF (reg/f:DI 0 x0 [79]) [2 MEM[(double *)&d]+0 S16 A64])
        (reg:V2DF 33 v1 [orig:78 vect__9.19 ] [78])) test.c:40 808 {*aarch64_simd_movv2df}
     (expr_list:REG_DEAD (reg:V2DF 33 v1 [orig:78 vect__9.19 ] [78])
        (nil)))
(insn:TI 15 11 22 (set (mem/c:V2DF (plus:DI (reg/f:DI 0 x0 [79])
                (const_int 16 [0x10])) [2 MEM[(double *)&d + 16B]+0 S16 A64])
        (reg:V2DF 32 v0 [orig:81 vect__9.19 ] [81])) test.c:40 808 {*aarch64_simd_movv2df}
     (expr_list:REG_DEAD (reg:V2DF 32 v0 [orig:81 vect__9.19 ] [81])
        (nil)))

i.e. apparently storing vector elements 2 and 3 to the address of d, and elems 
0+1 to address (d+16). Of course this was flipped back again to be correct at 
assembly time, but following this patch the RTL indices are also correct (elems 
0+1 to address d, elems 2+3 to address d+16).



gcc/ChangeLog:

* config/aarch64/aarch64-simd.md (aarch64_simd_vec_unpacks_lo_<mode>,
aarch64_simd_vec_unpacks_hi_<mode>): New insn.
(vec_unpacks_lo_v4sf, vec_unpacks_hi_v4sf): Delete insn.
(vec_unpacks_lo_<mode>, vec_unpacks_hi_<mode>): New expand.
(aarch64_float_extend_lo_v2df): Rename to...
(aarch64_float_extend_lo_<Vwide>): this, using VDF and so adding V4SF.

* config/aarch64/aarch64-simd-builtins.def (vec_unpacks_hi): Add v8hf.
(float_extend_lo): Add v4sf.

* config/aarch64/arm_neon.h (vcvt_f32_f16, vcvt_high_f32_f16): New.
* config/aarch64/iterators.md (VQ_HSF): New iterator.
(VWIDE, Vwtype, Vhalftype): Add V8HF, V4SF.
(Vwide): New mode_attr.
commit 214fcc00475a543a79ed444f9a64061215397cc8
Author: Alan Lawrence 
Date:   Wed Jan 28 13:01:31 2015 +

AArch64 6/N: vcvt{,_high}_f32_f16 (using vect_par_cnst_hi_half, fixing bigendian indices)

diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def b/gcc/config/aarch64/aarch64-simd-builtins.def
index 8bcab72..9869b73 100644
--- a/gcc/config/aarch64/aarch64-simd-builtins.def
+++ b/gcc/config/aarch64/aarch64-simd-builtins.def
@@ -361,11 +361,11 @@
   BUILTIN_VSDQ_I_DI (UNOP, abs, 0)
   BUILTIN_VDQF (UNOP, abs, 2)
 
-  VAR1 (UNOP, vec_unpacks_hi_, 10, v4sf)
+  VAR2 (UNOP, vec_unpacks_hi_, 10, v4sf, v8hf)
   VAR1 (BINOP, float_truncate_hi_, 0, v4sf)
   VAR1 (BINOP, float_truncate_hi_, 0, v8hf)
 
-  VAR1 (UNOP, float_extend_lo_, 0, v2df)
+  VAR2 (UNOP, float_extend_lo_, 0, v2df, v4sf)
   BUILTIN_VDF (UNOP, float_truncate_lo_, 0)
 
   /* Implemented by aarch64_ld1.  */
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index 2dc54e1..1a7d858 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -1691,36 +1691,57 @@
 
 ;; Float widening operations.
 
-(define_insn "vec_unpacks_lo_v4sf"
-  [(set (match_operand:V2DF 0 "register_operand" "=w")
-	(float_extend:V2DF
-	  (vec_select:V2SF
-	(match_operand:V4SF 1 "register_operand" "w")
-	(parallel [(const_int 0) (const_int 1)])
-	  )))]
+(define_insn "aarch64_simd_vec_unpacks_lo_"
+  [(set (match_operand: 0 "register_operand" "=w")
+(float_extend: (vec_select:
+			   (match_operand:VQ_HSF 1 "register_operand" "w")
+			   (match_operand:VQ_HSF 2 "vect_par_cnst_lo_half" "")
+			)))]
   "TARGET_SIMD"
-  "fcvtl\\t%0.2d, %1.2s"
+  "fcvtl\\t%0., %1."
   [(set_attr "type" "neon_fp_cvt_widen_s")

[PATCH 15/15][ARM] Update sourcebuild.texi with testsuite/effective-target hooks

2015-07-28 Thread Alan Lawrence
This documents the change to arm_neon_fp16_ok in the first patch and the
addition of arm_neon_fp16_hw_ok in the last patch, and corrects a cross-reference.


(I tried using an @ref instead of "Implies previous." but the page ref looked 
very out-of-place in PDF when I am referring to the previous item in the list!)


gcc/ChangeLog:

* doc/sourcebuild.texi (arm_neon_fp16): Correct cross-reference.
(arm_neon_fp16_ok): Document adding of -mfp16-format=ieee flag.
(arm_neon_fp16_hw_ok): New.
diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi
index 13f63d1..0c0fe84 100644
--- a/gcc/doc/sourcebuild.texi
+++ b/gcc/doc/sourcebuild.texi
@@ -1549,7 +1549,12 @@ options.  Some multilibs may be incompatible with these options.
 @item arm_neon_fp16_ok
 @anchor{arm_neon_fp16_ok}
 ARM Target supports @code{-mfpu=neon-fp16 -mfloat-abi=softfp} or compatible
-options.  Some multilibs may be incompatible with these options.
+options, including @code{-mfp16-format=ieee} if necessary to obtain the
+@code{__fp16} type.  Some multilibs may be incompatible with these options.
+
+@item arm_neon_fp16_hw_ok
+Test system supports executing Neon half-precision float instructions.
+(Implies previous.)
 
 @item arm_thumb1_ok
 ARM target generates Thumb-1 code for @code{-mthumb}.
@@ -2016,7 +2021,7 @@ keyword}.
 @item arm_neon_fp16
 NEON and half-precision floating point support.  Only ARM targets
 support this feature, and only then in certain modes; see
-the @ref{arm_neon_ok,,arm_neon_fp16_ok effective target keyword}.
+the @ref{arm_neon_fp16_ok,,arm_neon_fp16_ok effective target keyword}.
 
 @item arm_vfp3
 arm vfp3 floating point support; see


Re: [gomp4] fixup openacc default handling

2015-07-28 Thread Nathan Sidwell

On 07/27/15 11:21, Tom de Vries wrote:

On 26/07/15 19:09, Nathan Sidwell wrote:

I've committed this update to my earlier breakout of default handling.
After complaining about something because of 'none', we should fall
through to the default handling, to prevent ICEing later (on a patch
series I'm working on).  This matches the OMP default handling.  Also
tweaked the setting of GOVD_ flags slightly, to make the firstprivate
handling I'm working on less invasive.



Hi,

this causes PR 67027 - "[gomp4] FAIL: gfortran.dg/goacc/modules.f95 -O (internal
compiler error)".


Fixed thusly.  committed to gomp4

2015-07-28  Nathan Sidwell  

	* gimplify.c (oacc_default_clause): Always set GOVD_MAP if found
	in outer scope.

Index: gcc/gimplify.c
===
--- gcc/gimplify.c	(revision 226250)
+++ gcc/gimplify.c	(working copy)
@@ -5948,7 +5948,7 @@ oacc_default_clause (struct gimplify_omp
 		= splay_tree_lookup (octx->variables, (splay_tree_key) decl);
 	  if (n2)
 		{
-		  flags |= n2->value & GOVD_MAP;
+		  flags |= GOVD_MAP;
 		  goto found_outer;
 		}
 	  }


[PATCH] Improve compare with min/max simplification for bools

2015-07-28 Thread Richard Biener

For types with just two values, max - 1 is equal to min, and thus
we fail to optimize some cases of comparisons.  With fold-const.c,
bool < 0 needed the abs(x) < 0 simplification to trigger it
(the same issue arises from the mis-ordered if / else-ifs).
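
As a concrete illustration (my sketch, not part of the patch): for a
two-valued type, min == 0 and max == 1, so max - 1 == min, and whichever
case is tried first wins.

_Bool f1 (_Bool b) { return b < 0; }  /* bool < min: always false.  */
_Bool f2 (_Bool b) { return b > 0; }  /* bool > min: same as b != 0.  */

Both now fold via the "min" cases once those are tried first.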

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.

Richard.

2015-07-28  Richard Biener  

* match.pd: Re-order two cases in comparison with max/min
value simplification to make it apply for bools.

Index: gcc/match.pd
===
--- gcc/match.pd(revision 226306)
+++ gcc/match.pd(working copy)
@@ -1856,12 +1856,6 @@ (define_operator_list CBRT BUILT_IN_CBRT
{ constant_boolean_node (true, type); })
(if (cmp == LT_EXPR)
(ne @2 @1
- (if (wi::eq_p (@1, max - 1))
-  (switch
-   (if (cmp == GT_EXPR)
-(eq @2 { wide_int_to_tree (TREE_TYPE (@1), wi::add (@1, 1)); }))
-   (if (cmp == LE_EXPR)
-(ne @2 { wide_int_to_tree (TREE_TYPE (@1), wi::add (@1, 1)); }
  (if (wi::eq_p (@1, min))
   (switch
(if (cmp == LT_EXPR)
@@ -1872,6 +1866,12 @@ (define_operator_list CBRT BUILT_IN_CBRT
 { constant_boolean_node (true, type); })
(if (cmp == GT_EXPR)
 (ne @2 @1
+ (if (wi::eq_p (@1, max - 1))
+  (switch
+   (if (cmp == GT_EXPR)
+(eq @2 { wide_int_to_tree (TREE_TYPE (@1), wi::add (@1, 1)); }))
+   (if (cmp == LE_EXPR)
+(ne @2 { wide_int_to_tree (TREE_TYPE (@1), wi::add (@1, 1)); }
  (if (wi::eq_p (@1, min + 1))
   (switch
(if (cmp == GE_EXPR)


[PATCH][22/n] Remove GENERIC stmt combining from SCCVN

2015-07-28 Thread Richard Biener

This implements some remaining parts of fold_comparison address
comparisons but still no complete part of it.  Still it is good
enough to make fold_stmt not regress when not dispatching to fold_binary.
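
For example (an illustrative sketch, not from the patch), the new
patterns let fold_stmt fold comparisons of addresses that share a base
and differ only in a constant offset:

struct S { int a[4]; } s;

int f (void)
{
  /* Same base &s, unit offsets 4 and 8: folds to 0 at compile time.  */
  return &s.a[1] == &s.a[2];
}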

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk.

Richard.

2015-07-28  Richard Biener  

* match.pd: Add more simplification of address comparisons.

Index: gcc/match.pd
===
--- gcc/match.pd(revision 226299)
+++ gcc/match.pd(working copy)
@@ -1828,6 +1828,46 @@ (define_operator_list CBRT BUILT_IN_CBRT
   (if (tree_single_nonzero_warnv_p (@0, NULL))
{ constant_boolean_node (cmp == NE_EXPR, type); })))
 
+/* When the addresses are not directly of decls compare base and offset.
+   This implements some remaining parts of fold_comparison address
+   comparisons but still no complete part of it.  Still it is good
+   enough to make fold_stmt not regress when not dispatching to fold_binary.  */
+(for cmp (simple_comparison)
+ (simplify
+  (cmp (convert? addr@0) (convert? addr@1))
+  (with
+   {
+ HOST_WIDE_INT off0, off1;
+ tree base0 = get_addr_base_and_unit_offset (TREE_OPERAND (@0, 0), &off0);
+ tree base1 = get_addr_base_and_unit_offset (TREE_OPERAND (@1, 0), &off1);
+ if (base0 && TREE_CODE (base0) == MEM_REF)
+   {
+off0 += mem_ref_offset (base0).to_short_addr ();
+ base0 = TREE_OPERAND (base0, 0);
+   }
+ if (base1 && TREE_CODE (base1) == MEM_REF)
+   {
+ off1 += mem_ref_offset (base1).to_short_addr ();
+ base1 = TREE_OPERAND (base1, 0);
+   }
+   }
+   (if (base0 && base1
+   && operand_equal_p (base0, base1, 0)
+   && (cmp == EQ_EXPR || cmp == NE_EXPR
+   || POINTER_TYPE_OVERFLOW_UNDEFINED))
+(switch
+ (if (cmp == EQ_EXPR)
+  { constant_boolean_node (off0 == off1, type); })
+ (if (cmp == NE_EXPR)
+  { constant_boolean_node (off0 != off1, type); })
+ (if (cmp == LT_EXPR)
+  { constant_boolean_node (off0 < off1, type); })
+ (if (cmp == LE_EXPR)
+  { constant_boolean_node (off0 <= off1, type); })
+ (if (cmp == GE_EXPR)
+  { constant_boolean_node (off0 >= off1, type); })
+ (if (cmp == GT_EXPR)
+  { constant_boolean_node (off0 > off1, type); }))
 
 /* Non-equality compare simplifications from fold_binary  */
 (for cmp (lt gt le ge)


Re: [PATCH 1/2] Allow REG_EQUAL for ZERO_EXTRACT

2015-07-28 Thread Kugan


On 27/07/15 05:38, Andreas Schwab wrote:
> Kugan  writes:
> 
>>  * cse.c (cse_insn): Fix missing check for STRICT_LOW_PART and minor
>>  clean up.
> 
> This breaks 
> 
> gcc.target/m68k/tls-ie-xgot.c scan-assembler jsr __m68k_read_tp
> gcc.target/m68k/tls-ie.c scan-assembler jsr __m68k_read_tp
> gcc.target/m68k/tls-le-xtls.c scan-assembler jsr __m68k_read_tp
> gcc.target/m68k/tls-le.c scan-assembler jsr __m68k_read_tp
> 
> Andreas.
> 

Sorry for the breakage.  My patch to add ZERO_EXTRACT unfortunately
restricted the behaviour in one other case: even when the REG_EQUAL
note and src are the same, we used to set src_eqv to src when the
destination is a STRICT_LOW_PART.  I am not sure why, but I have
restored the old behaviour.

I could reproduce this issue by inspecting the generated asm and made
sure that it is fixed. However I could not run regression for m68k
(Sorry I don’t have access to the set-up).
I bootstrapped and regression tested on x86_64-linux-gnu and
arm-none-linux-gnu with no new regressions.

Thanks,
Kugan


gcc/ChangeLog:

2015-07-27  Kugan Vivekanandarajah  

* cse.c (cse_insn): Restore the old behaviour for src_eqv
 when the dest and the value in the REG_EQUAL note are the same
 and dest is a STRICT_LOW_PART.
diff --git a/gcc/cse.c b/gcc/cse.c
index 96adf18..17c0954 100644
--- a/gcc/cse.c
+++ b/gcc/cse.c
@@ -4529,12 +4529,13 @@ cse_insn (rtx_insn *insn)
  this case, and if it isn't set, then there will be no equivalence
  for the destination.  */
   if (n_sets == 1 && REG_NOTES (insn) != 0
-  && (tem = find_reg_note (insn, REG_EQUAL, NULL_RTX)) != 0
-  && (! rtx_equal_p (XEXP (tem, 0), SET_SRC (sets[0].rtl
+  && (tem = find_reg_note (insn, REG_EQUAL, NULL_RTX)) != 0)
 {
-  if (GET_CODE (SET_DEST (sets[0].rtl)) == STRICT_LOW_PART)
-   src_eqv = copy_rtx (XEXP (tem, 0));
 
+  if (GET_CODE (SET_DEST (sets[0].rtl)) != ZERO_EXTRACT
+ && (! rtx_equal_p (XEXP (tem, 0), SET_SRC (sets[0].rtl))
+ || GET_CODE (SET_DEST (sets[0].rtl)) == STRICT_LOW_PART))
+   src_eqv = copy_rtx (XEXP (tem, 0));
   /* If DEST is of the form ZERO_EXTACT, as in:
 (set (zero_extract:SI (reg:SI 119)
  (const_int 16 [0x10])


Re: [PATCH] Allow non-overflow ops in vect_is_simple_reduction_1

2015-07-28 Thread Tom de Vries

On 28/07/15 09:59, Richard Biener wrote:

On Fri, Jul 24, 2015 at 4:39 PM, Tom de Vries  wrote:

Hi,

this patch allows parallelization and vectorization of reduction operators
that are guaranteed to not overflow (such as min and max operators),
independent of the overflow behaviour of the type.

Bootstrapped and reg-tested on x86_64.

OK for trunk?


Hmm, I don't like that no_overflow_tree_code function.  We have a much
clearer understanding of which codes may overflow or trap.  Thus please add
an operation-specific variant of TYPE_OVERFLOW_{TRAPS,WRAPS,UNDEFINED} like



Done.


bool
operation_overflow_traps (tree type, enum tree_code code)
{
   if (!ANY_INTEGRAL_TYPE_P (type)


I've changed this test into a gcc_checking_assert.


  || !TYPE_OVERFLOW_TRAPS (type))
 return false;
   switch (code)
 {
 case PLUS_EXPR:
 case MINUS_EXPR:
 case MULT_EXPR:
 case LSHIFT_EXPR:
/* Can overflow in various ways */
 case TRUNC_DIV_EXPR:
 case EXACT_DIV_EXPR:
 case FLOOR_DIV_EXPR:
 case CEIL_DIV_EXPR:
/* For INT_MIN / -1 */
 case NEGATE_EXPR:
 case ABS_EXPR:
/* For -INT_MIN */
return true;
 default:
return false;
}
}

and similar variants for _wraps and _undefined.  I think we decided at
some point
the compiler should not take advantage of the fact that lshift or
*_div have undefined
behavior on signed integer overflow, similar we only take advantage of
integral-type
overflow behavior, not vector or complex.  So we could reduce the
number of cases
the functions return true if we document that it returns true only for
the cases where
the compiler needs to / may assume wrapping behavior does not take place.
As for _traps for example we only have optabs and libfuncs for
plus,minus,mult,negate
and abs.


I've tried to capture all of this in the three new functions:
- operation_overflows_and_traps
- operation_no_overflow_or_wraps
- operation_overflows_and_undefined (unused atm)
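
To illustrate the intent of the middle one (a minimal sketch only; the
real tree.c bodies are in the attached patch): min and max can never
overflow, while anything else is only safe when the type wraps.

bool
operation_no_overflow_or_wraps (tree type, enum tree_code code)
{
  gcc_checking_assert (ANY_INTEGRAL_TYPE_P (type));
  switch (code)
    {
    case MIN_EXPR:
    case MAX_EXPR:
      /* These never overflow, whatever the overflow semantics.  */
      return true;
    default:
      /* Everything else is only safe if overflow wraps.  */
      return TYPE_OVERFLOW_WRAPS (type);
    }
}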

I've also added the graphite bit.

OK for trunk, if bootstrap and reg-test succeeds?

Thanks,
- Tom
Allow non-overflow ops in vect_is_simple_reduction_1

2015-07-28  Tom de Vries  

	* tree.c (operation_overflows_and_traps, operation_no_overflow_or_wraps)
	(operation_overflows_and_undefined): New function.
	* tree.h (operation_overflows_and_traps, operation_no_overflow_or_wraps)
	(operation_overflows_and_undefined): Declare.
	* tree-vect-loop.c (vect_is_simple_reduction_1): Use
	operation_overflows_and_traps and operation_no_overflow_or_wraps.
	* graphite-sese-to-poly.c (is_reduction_operation_p): Same.

	* gcc.dg/autopar/reduc-2char.c (init_arrays): Mark with attribute
	optimize ("-ftree-parallelize-loops=0").
	Add successful scans for 2 detected reductions.	 Add xfail scans for 3
	detected reductions.
	* gcc.dg/autopar/reduc-2short.c: Same.
	* gcc.dg/autopar/reduc-8.c (init_arrays): Mark with attribute
	optimize ("-ftree-parallelize-loops=0").  Add successful scans for 2
	detected reductions.
	* gcc.dg/vect/trapv-vect-reduc-4.c: Update scan to match vectorized min
	and max reductions.
---
 gcc/graphite-sese-to-poly.c|   6 +-
 gcc/testsuite/gcc.dg/autopar/reduc-2char.c |  10 +-
 gcc/testsuite/gcc.dg/autopar/reduc-2short.c|  10 +-
 gcc/testsuite/gcc.dg/autopar/reduc-8.c |   7 +-
 gcc/testsuite/gcc.dg/vect/trapv-vect-reduc-4.c |   2 +-
 gcc/tree-vect-loop.c   |   5 +-
 gcc/tree.c | 125 +
 gcc/tree.h |   3 +
 8 files changed, 153 insertions(+), 15 deletions(-)

diff --git a/gcc/graphite-sese-to-poly.c b/gcc/graphite-sese-to-poly.c
index c583f16..b57dc9c 100644
--- a/gcc/graphite-sese-to-poly.c
+++ b/gcc/graphite-sese-to-poly.c
@@ -2614,8 +2614,10 @@ is_reduction_operation_p (gimple stmt)
   if (FLOAT_TYPE_P (type))
 return flag_associative_math;
 
-  return (INTEGRAL_TYPE_P (type)
-	  && TYPE_OVERFLOW_WRAPS (type));
+  if (ANY_INTEGRAL_TYPE_P (type))
+return operation_no_overflow_or_wraps (type, code);
+
+  return false;
 }
 
 /* Returns true when PHI contains an argument ARG.  */
diff --git a/gcc/testsuite/gcc.dg/autopar/reduc-2char.c b/gcc/testsuite/gcc.dg/autopar/reduc-2char.c
index 14867f3..a2dad44 100644
--- a/gcc/testsuite/gcc.dg/autopar/reduc-2char.c
+++ b/gcc/testsuite/gcc.dg/autopar/reduc-2char.c
@@ -39,8 +39,9 @@ void main1 (signed char x, signed char max_result, signed char min_result)
 abort ();
 }
 
- __attribute__((noinline))
- void init_arrays ()
+void __attribute__((noinline))
+  __attribute__((optimize ("-ftree-parallelize-loops=0")))
+init_arrays ()
  {
int i;
 
@@ -60,7 +61,10 @@ int main (void)
 }
 
 
-/* { dg-final { scan-tree-dump-times "Detected reduction" 2 "parloops" { xfail *-*-* } } } */
+/* { dg-final { scan-tree-dump-times "Detected reduction" 2 "parloops" } } */
+/* { dg-final { scan-tree-dump-times "Detected reduction" 3 "parloops" { xfail *-*-* } } } */

Re: [gomp4] Fix some gomp tests

2015-07-28 Thread Nathan Sidwell

On 07/28/15 06:14, Thomas Schwinge wrote:

Hi Nathan!

On Sat, 25 Jul 2015 16:02:01 -0400, Nathan Sidwell  wrote:

I've committed this to gomp4 branch.  It fixes some tests that were incorrect


Hmm, I fail to see what you deem incorrect in the following two Fortran
test cases?  Implicit present_or_copy clauses should be added by the
compiler, basically equal to your explicit present clauses.


and fail with some development I am working on.


Fail in what way?  I'd expect the original code still to be valid?



acc enter data creates a dynamic scope with no associated static scope.  It is
therefore not visible to a later acc parallel, even if both are in the
same static scope.  If a data object used within the parallel is not mentioned
in a data clause on the parallel, the default behaviour of the parallel then
occurs.  That means we get another copy clause generated (not copy_or_present),
which fails at runtime because the data is already present on the device.


This is different from what would happen if a data/end data pair were used.
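
A hypothetical sketch (not one of the affected tests) of why the tests
now name the data explicitly on the parallel:

void
bump (float *a, int n)
{
  #pragma acc enter data copyin(a[0:n])
  /* The mapping made by "enter data" is only found dynamically, so the
     parallel names it in an explicit clause instead of relying on the
     default clause generation.  */
  #pragma acc parallel loop present(a[0:n])
  for (int i = 0; i < n; i++)
    a[i] += 1;
}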

nathan

--
Nathan Sidwell


Re: [gomp4] Fix some gomp tests

2015-07-28 Thread Thomas Schwinge
Hi Nathan!

On Tue, 28 Jul 2015 08:19:17 -0400, Nathan Sidwell  
wrote:
> On 07/28/15 06:14, Thomas Schwinge wrote:
> > On Sat, 25 Jul 2015 16:02:01 -0400, Nathan Sidwell  wrote:
> >> I've committed this to gomp4 branch.  It fixes some tests that were 
> >> incorrect
> >
> > Hmm, I fail to see what you deem incorrect in the following two Fortran
> > test cases?  Implicit present_or_copy clauses should be added by the
> > compiler, basically equal to your explicit present clauses.
> >
> >> and fail with some development I am working on.
> >
> > Fail in what way?  I'd expect the original code still to be valid?
> 
> 
> acc enter data creates a dynamic scope with no associated static scope.  As 
> such 
> it is therefore not visible by a later acc parallel, even if both are in the 
> same static scope.  If a data object used within the parallel is not 
> mentioned 
> in a data clause on the parallel, the default behaviour of the parallel then 
> occurs. That means we get another copy clause generated (not 
> copy_or_present), 
> which fails at runtime because the data is already present on the device.

I do agree that a copy clause is wrong (expected to fail at runtime), but
why do you say an implicit copy clause is created?  OpenACC 2.0a, 2.5.1
Parallel Construct, says that »[...] An array or variable of aggregate
data type referenced in the parallel construct that does not appear in a
data clause for the construct or any enclosing data construct will be
treated as if it appeared in a present_or_copy clause for the parallel
construct [...]«.

> This is different to if a data/end data pair were used.


Grüße,
 Thomas




Re: [PATCH] PR fortran/66942 -- avoid referencing a NULL C++ thing

2015-07-28 Thread Mikael Morin

Le 21/07/2015 21:08, Steve Kargl a écrit :

When C++ was injected into trans-expr.c in the form of vec,
it seems whoever did the conversion to vec forgot to check
for a NULL C++ thing.  This patch seems to avoid the problem,
but having zero knowledge of C++ I could be wrong.

OK for trunk?

2015-07-21  Steven G. Kargl  

PR fortran/66942
* trans-expr.c (gfc_conv_procedure_call): Avoid dereferencing NULL
C++ thing.


Hello Steve,

I believe the vec API should have all that is necessary to handle this 
automatically.

Did you try using vec_safe_splice?

Mikael


[PATCH] Fix uninitialized variable with ubsan on ARM (PR sanitizer/66977)

2015-07-28 Thread Marek Polacek
This fixes a problem where on ARM ubsan can introduce an uninitialized variable.
It's ARM only since the ARM C++ ABI says that when creating a pointer to member
function, the LSB of ptr discriminates between the address of a non-virtual 
member
function and the offset in the class's virtual table of the address of a virtual
function.  That means the compiler will create a RSHIFT_EXPR, and with ubsan 
this
RSHIFT_EXPR is instrumented, i.e. the expression involves SAVE_EXPRs.

But this expr is used multiple times, and that is the crux of the problem:
get_member_function_from_ptrfunc returns a tree that contains the expr, and here
4927   fn = get_member_function_from_ptrfunc (&object_addr, fn,
4928  complain);
4929   vec_safe_insert (*args, 0, object_addr);
4930 }
it also saves the expr into OBJECT_ADDR which is then pushed to args.

Long story short: can't use unshare_expr here, because that doesn't copy
SAVE_EXPRs.  I could use copy_tree_r, as outlined in the PR.  But I think
we can just not instrument the RSHIFT_EXPR -- we know that this one can't
overflow anyway.

I have tried on a cross that the problem indeed goes away.

Bootstrapped/regtested on x86_64-linux, ok for trunk?

2015-07-28  Marek Polacek  

PR sanitizer/66977
* typeck.c (get_member_function_from_ptrfunc): Don't sanitize
RSHIFT_EXPR.

* g++.dg/ubsan/pr66977.C: New test.

diff --git gcc/cp/typeck.c gcc/cp/typeck.c
index 2ed43be..8530be5 100644
--- gcc/cp/typeck.c
+++ gcc/cp/typeck.c
@@ -3288,6 +3288,7 @@ get_member_function_from_ptrfunc (tree *instance_ptrptr, 
tree function,
   idx = build1 (NOP_EXPR, vtable_index_type, e3);
   switch (TARGET_PTRMEMFUNC_VBIT_LOCATION)
{
+   int flag_sanitize_save;
case ptrmemfunc_vbit_in_pfn:
  e1 = cp_build_binary_op (input_location,
   BIT_AND_EXPR, idx, integer_one_node,
@@ -3303,9 +3304,15 @@ get_member_function_from_ptrfunc (tree *instance_ptrptr, 
tree function,
  e1 = cp_build_binary_op (input_location,
   BIT_AND_EXPR, delta, integer_one_node,
   complain);
+ /* Don't instrument the RSHIFT_EXPR we're about to create because
+we're going to use DELTA number of times, and that wouldn't play
+well with SAVE_EXPRs therein.  */
+ flag_sanitize_save = flag_sanitize;
+ flag_sanitize = 0;
  delta = cp_build_binary_op (input_location,
  RSHIFT_EXPR, delta, integer_one_node,
  complain);
+ flag_sanitize = flag_sanitize_save;
  if (delta == error_mark_node)
return error_mark_node;
  break;
diff --git gcc/testsuite/g++.dg/ubsan/pr66977.C 
gcc/testsuite/g++.dg/ubsan/pr66977.C
index e69de29..3ab8d90 100644
--- gcc/testsuite/g++.dg/ubsan/pr66977.C
+++ gcc/testsuite/g++.dg/ubsan/pr66977.C
@@ -0,0 +1,27 @@
+// PR sanitizer/66977
+// { dg-do compile }
+// { dg-options "-fsanitize=shift -Wmaybe-uninitialized -O" }
+
+class Foo {
+
+private:
+
+  int a_;
+
+public:
+
+  Foo (int a) : a_(a) {};
+
+  inline int get_a () { return a_; };
+};
+
+int bar (int (Foo::*get)()) {
+  Foo *A = new Foo(1);
+  int result = (A->*get)();
+  delete (A);
+  return result;
+}
+
+int main () {
+  return bar (&Foo::get_a);
+}

Marek


Re: [PATCH] PR fortran/66942 -- avoid referencing a NULL C++ thing

2015-07-28 Thread Steve Kargl
On Tue, Jul 28, 2015 at 03:04:52PM +0200, Mikael Morin wrote:
> Le 21/07/2015 21:08, Steve Kargl a écrit :
> > When C++ was injected into trans-expr.c in the form of vec,
> > it seems whomever did the conversion to vec forgot to check
> > for a NULL C++ thing.  This patch seems to avoid the problem,
> > but having zero knowledge of C++ I could be wrong.
> >
> > OK for trunk?
> >
> > 2015-07-21  Steven G. Kargl  
> >
> > PR fortran/66942
> > * trans-expr.c (gfc_conv_procedure_call): Avoid dereferencing NULL
> > C++ thing.
> >
> Hello Steve,
> 
> I believe the vec API should have all that is necessary to handle this 
> automatically.
> Did you try using vec_safe_splice?
> 

I know zero about vec and I know zero about C++.

-- 
Steve


[AArch64] Improve TLS Descriptor pattern to release RTL loop IV opt

2015-07-28 Thread Jiong Wang

The instruction sequences that prepare the argument for the TLS
descriptor runtime resolver, and the later function call to the resolver,
can actually be hoisted out of loops.

Currently we can't do this because we have exposed the hard register X0
as the destination of the "set".  GCC's RTL data flow infrastructure
skips, or makes very conservative assumptions, when hard registers are
involved, so some loop IV optimization opportunities are missed.

This patch adds another "tlsdesc_small_pseudo_<mode>" pattern and avoids
exposing x0 to GCC's generic code.

Generally, we define a new register class FIXED_R0 which only contains
register 0, so the instruction sequence generated from the new pattern is
the same as for tlsdesc_small_<mode>, while operand 0 is wrapped as a
pseudo register that RTL IV opt can handle.

Ideally, we should allow operand 0 to be any pseudo register, but then
we couldn't model the clobbering of x0 caused by the function call, which
is hidden by the UNSPEC.

So here we restrict operand 0 to be x0, so that the clobbering of x0 is
visible to GCC.
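
A hypothetical example (not the attached testcase) of the kind of loop
that benefits:

/* The tlsdesc resolver call for "counter" is loop-invariant; once its
   result lives in a pseudo rather than hard register x0, RTL loop IV
   opt can hoist the whole sequence out of the loop.  */
__thread int counter;

void
bump (int n)
{
  int i;
  for (i = 0; i < n; i++)
    counter++;
}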

OK for trunk?

2015-07-28  Ramana Radhakrishnan  
Jiong Wang  

gcc/
  * config/aarch64/aarch64.md (tlsdesc_small_pseudo_<mode>): New pattern.
  * config/aarch64/aarch64.h (reg_class): New enumeration FIXED_REG0.
  (REG_CLASS_NAMES): Likewise.
  (REG_CLASS_CONTENTS): Likewise.
  * config/aarch64/aarch64.c (aarch64_class_max_nregs): Likewise.
  (aarch64_register_move_cost): Likewise.
  (aarch64_load_symref_appropriately): Invoke the new added pattern if
  possible.
  * config/aarch64/constraints.md (Uc0): New constraint.

gcc/testsuite.
  * gcc.target/aarch64/tlsdesc_hoist.c: New testcase.

-- 
Regards,
Jiong

diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index 3851564..fb4834a 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -454,6 +454,7 @@ extern unsigned long aarch64_isa_flags;
 enum reg_class
 {
   NO_REGS,
+  FIXED_REG0,
   CALLER_SAVE_REGS,
   GENERAL_REGS,
   STACK_REG,
@@ -469,6 +470,7 @@ enum reg_class
 #define REG_CLASS_NAMES\
 {		\
   "NO_REGS",	\
+  "FIXED_REG0"	\
   "CALLER_SAVE_REGS",\
   "GENERAL_REGS",\
   "STACK_REG",	\
@@ -481,6 +483,7 @@ enum reg_class
 #define REG_CLASS_CONTENTS		\
 {	\
   { 0x, 0x, 0x },	/* NO_REGS */		\
+  { 0x0001, 0x, 0x },	/* FIXED_REG0 */	\
   { 0x0007, 0x, 0x },	/* CALLER_SAVE_REGS */	\
   { 0x7fff, 0x, 0x0003 },	/* GENERAL_REGS */	\
   { 0x8000, 0x, 0x },	/* STACK_REG */		\
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index ef07e05..f1f2cab 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -1038,22 +1038,39 @@ aarch64_load_symref_appropriately (rtx dest, rtx imm,
   {
 	machine_mode mode = GET_MODE (dest);
 	rtx x0 = gen_rtx_REG (mode, R0_REGNUM);
+	rtx offset;
 	rtx tp;
 
 	gcc_assert (mode == Pmode || mode == ptr_mode);
 
-	/* In ILP32, the got entry is always of SImode size.  Unlike
-	   small GOT, the dest is fixed at reg 0.  */
-	if (TARGET_ILP32)
-	  emit_insn (gen_tlsdesc_small_si (imm));
+	if (can_create_pseudo_p ())
+	  {
+	rtx reg = gen_reg_rtx (mode);
+
+	if (TARGET_ILP32)
+	  emit_insn (gen_tlsdesc_small_pseudo_si (reg, imm));
+	else
+	  emit_insn (gen_tlsdesc_small_pseudo_di (reg, imm));
+
+	offset = reg;
+	  }
 	else
-	  emit_insn (gen_tlsdesc_small_di (imm));
+	  {
+	/* In ILP32, the got entry is always of SImode size.  Unlike
+	   small GOT, the dest is fixed at reg 0.  */
+	if (TARGET_ILP32)
+	  emit_insn (gen_tlsdesc_small_si (imm));
+	else
+	  emit_insn (gen_tlsdesc_small_di (imm));
+
+	offset = x0;
+	  }
 	tp = aarch64_load_tp (NULL);
 
 	if (mode != Pmode)
 	  tp = gen_lowpart (mode, tp);
 
-	emit_insn (gen_rtx_SET (dest, gen_rtx_PLUS (mode, tp, x0)));
+	emit_insn (gen_rtx_SET (dest, gen_rtx_PLUS (mode, tp, offset)));
 	set_unique_reg_note (get_last_insn (), REG_EQUIV, imm);
 	return;
   }
@@ -5099,6 +5116,7 @@ aarch64_class_max_nregs (reg_class_t regclass, machine_mode mode)
 	aarch64_vector_mode_p (mode)
 	  ? (GET_MODE_SIZE (mode) + UNITS_PER_VREG - 1) / UNITS_PER_VREG
 	  : (GET_MODE_SIZE (mode) + UNITS_PER_WORD - 1) / UNITS_PER_WORD;
+case FIXED_REG0:
 case STACK_REG:
   return 1;
 
@@ -6948,10 +6966,10 @@ aarch64_register_move_cost (machine_mode mode,
 = aarch64_tune_params.regmove_cost;
 
   /* Caller save and pointer regs are equivalent to GENERAL_REGS.  */
-  if (to == CALLER_SAVE_REGS || to == POINTER_REGS)
+  if (to == CALLER_SAVE_REGS || to == POINTER_REGS || to == FIXED_REG0)
 to = GENERAL_REGS;
 
-  if (from == CALLER_SAVE_REGS || from == POINTER_REGS)
+  if (from == CALLER_SAVE_REGS || from == POINTER_REGS || from == FIXED_REG0)
 from = GENERAL_REGS;
 
   /* Moving between GPR and stack cost is the same as GP2GP.  */
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/con

Re: [PATCHv2] [fixincludes] Ignore .DS_Store junk files when running make check

2015-07-28 Thread Bruce Korb
Definitely much better.  I won't apply it until the weekend, so
someone else will likely beat me to it.  Thank you.

On Mon, Jul 27, 2015 at 7:36 PM, Eric Gallager  wrote:
> On 7/27/15, Andreas Schwab  wrote:
>> Eric Gallager  writes:
>>
>>> Okay, I tried embedding "! -name CVS/ ! -name .svn/" into the find


Re: [PATCH][AArch64] Properly handle simple arith+extend ops in rtx costs

2015-07-28 Thread pinskia

> On Jul 28, 2015, at 3:25 AM, Kyrill Tkachov  wrote:
> 
> Hi all,
> 
> Currently we assign the wrong rtx cost to instructions of the form
>  add x0, x0, x1, sxtw
> 
> that is, an arith operation plus a single extend (no shifting).
> We correctly catch the cases where the extend is inside a shift, but
> not the simple case.
> 
> This patch fixes that oversight by catching the simple case in
> aarch64_rtx_arith_op_extract_p and thus making sure that it gets
> assigned the alu.extend_arith extra cost.

This patch reminds me: on ThunderX, the cost for add with sign extend is
different from add with zero extend.  The zero-extend case is the same as a
normal add, while sign extend takes one extra cycle.  So soon we need to split
extend into zextend and sextend.

Thanks,
Andrew

> 
> Bootstrapped and tested on aarch64.
> 
> Ok for trunk?
> Thanks,
> Kyrill
> 
> 
> 2015-07-28  Kyrylo Tkachov  
> 
>* config/aarch64/aarch64.c (aarch64_rtx_arith_op_extract_p):
>Handle simple SIGN_EXTEND or ZERO_EXTEND.
>(aarch64_rtx_costs): Properly strip extend or extract before
>passing down to rtx costs again.
> 


Re: [gomp4] Fix some gomp tests

2015-07-28 Thread Nathan Sidwell

On 07/28/15 08:30, Thomas Schwinge wrote:


I do agree that a copy clause is wrong (expected to fail at runtime), but
why do you say an implicit copy clause is created?  OpenACC 2.0a, 2.5.1
Parallel Construct, says that »[...] An array or variable of aggregate
data type referenced in the parallel construct that does not appear in a
data clause for the construct or any enclosing data construct will be
treated as if it appeared in a present_or_copy clause for the parallel
construct [...]«.


sigh, I thought it was just 'copy' -- I believed the comment in gimplify.c was 
correct :(


nathan

--
Nathan Sidwell


Re: [PATCH] warn for unsafe calls to __builtin_return_address

2015-07-28 Thread Segher Boessenkool
On Mon, Jul 27, 2015 at 09:08:34PM -0600, Martin Sebor wrote:
> >>So, my suggestion would be to warn for any call with a nonzero value.
> >
> >The current documentation says that you should only use nonzero values
> >for debug purposes.  A warning would help yes, how many people read the
> >manual after all :-)
> 
> Thank you both for the feedback. Attached is a simplified patch
> to issue a warning for all builtin_xxx_address calls with any
> non-zero argument.
> 
> Martin
> 

> gcc/ChangeLog
> 2015-07-27  Martin Sebor  
> 
> * c-family/c.opt (-Wbuiltin-address): New warning option.
> * doc/invoke.texi (Wbuiltin-address): Document it.
> * doc/extend.texi (__builtin_frame_addrress, __builtin_return_addrress):

Typoes (rr).

> -  if (DECL_FUNCTION_CODE (fndecl) == BUILT_IN_FRAME_ADDRESS)
> - error ("invalid argument to %<__builtin_frame_address%>");
> -  else
> - error ("invalid argument to %<__builtin_return_address%>");
> +  error ("invalid argument to %qD", fndecl);

That works?  Nice.

>  {
> -  rtx tem
> - = expand_builtin_return_addr (DECL_FUNCTION_CODE (fndecl),
> -   tree_to_uhwi (CALL_EXPR_ARG (exp, 0)));
> +  /* Number of frames to scan up the stack.  */
> +  const unsigned HOST_WIDE_INT count = tree_to_uhwi (CALL_EXPR_ARG (exp, 
> 0));
> +
> +  rtx tem = expand_builtin_return_addr (DECL_FUNCTION_CODE (fndecl), 
> count);

Do we need to say "const"?

>/* Some ports cannot access arbitrary stack frames.  */
>if (tem == NULL)
>   {
> -   if (DECL_FUNCTION_CODE (fndecl) == BUILT_IN_FRAME_ADDRESS)
> - warning (0, "unsupported argument to %<__builtin_frame_address%>");
> -   else
> - warning (0, "unsupported argument to %<__builtin_return_address%>");
> +   warning (0, "invalid argument to %qD", fndecl);

"unsupported argument".

> return const0_rtx;
>   }
> 
> +  if (0 < count)

Yoda :-)  You can just say "if (count)" fwiw.

> +Wbuiltin-address
> +C ObjC C++ ObjC++ Var(warn_builtin_address) Warning LangEnabledBy(C ObjC C++ 
> ObjC++,Wall)
> +Warn when __builtin_frame_address or __builtin_return_address is used 
> unsafely

This is not such a nice warning name, maybe -Wbuiltin-frame-address or
-Wframe-address?

> --- a/gcc/doc/extend.texi
> +++ b/gcc/doc/extend.texi
> @@ -8562,8 +8562,11 @@ to determine if the top of the stack has been reached.
>  Additional post-processing of the returned value may be needed, see
>  @code{__builtin_extract_return_addr}.
> 
> -This function should only be used with a nonzero argument for debugging
> -purposes.
> +Calling this function with a nonzero argument can have unpredictable
> +effects, including crashing the calling program.  As a result, calls
> +that are considered unsafe are diagnosed when the @option{-Wbuiltin-address}
> +option is in effect.  Such calls are typically only useful in debugging
> +situations.

I like the original "should only be used" better than that last line.
Elsewhere there was a "non-zero" btw, but we should use "nonzero" according
to the coding conventions.  Huh.

> +void* __attribute__ ((weak))

Not all targets support weak.


Segher


[RFC] [Patch]: Try and vectorize with shift for mult expr with power 2 integer constant.

2015-07-28 Thread Kumar, Venkataramanan
Hi Richard,

For the AArch64 target, I was trying to vectorize the expression
"arr[i]=arr[i]*4;" via vector shift instructions, since there is no
vector multiply for this type.

unsigned  long int __attribute__ ((aligned (64)))arr[100];
int i;
#if 1
void test_vector_shifts()
{
for(i=0; i<=99;i++)
arr[i]=arr[i]<<2;
}
#endif

void test_vectorshift_via_mul()
{
for(i=0; i<=99;i++)
arr[i]=arr[i]*4;

}

I found a similar PR and your comments 
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65952#c6. 
Based on that and IRC discussion I had with you,  I added vector recog pattern 
that transforms mults to shifts.  The vectorizer is now able to generate vector 
shifts for the above test case.
PR case also gets vectorized 
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65952#c10.

This is just an initial patch; it tries to optimize multiplications by
power-of-2 integer constants.  I wanted to get feedback on this.  I
bootstrapped and reg-tested it on aarch64-none-linux-gnu.

Regards,
Venkat.
diff --git a/gcc/tree-vect-patterns.c b/gcc/tree-vect-patterns.c
index f034635..948203d 100644
--- a/gcc/tree-vect-patterns.c
+++ b/gcc/tree-vect-patterns.c
@@ -76,6 +76,10 @@ static gimple vect_recog_vector_vector_shift_pattern (vec<gimple> *,
  tree *, tree *);
 static gimple vect_recog_divmod_pattern (vec<gimple> *,
 tree *, tree *);
+
+static gimple vect_recog_multconst_pattern (vec<gimple> *,
+ tree *, tree *);
+
 static gimple vect_recog_mixed_size_cond_pattern (vec<gimple> *,
  tree *, tree *);
 static gimple vect_recog_bool_pattern (vec<gimple> *, tree *, tree *);
@@ -90,6 +94,7 @@ static vect_recog_func_ptr vect_vect_recog_func_ptrs[NUM_PATTERNS] = {
vect_recog_rotate_pattern,
vect_recog_vector_vector_shift_pattern,
vect_recog_divmod_pattern,
+vect_recog_multconst_pattern,
vect_recog_mixed_size_cond_pattern,
vect_recog_bool_pattern};
 
@@ -2147,6 +2152,87 @@ vect_recog_vector_vector_shift_pattern (vec<gimple> *stmts,
   return pattern_stmt;
 }
 
+/* Detect multiplication by a power-of-2 integer constant and rewrite it
+   as a left shift (e.g. x * 4 becomes x << 2) when the target has no
+   vector multiply but does support a vector left shift.  */
+
+static gimple
+vect_recog_multconst_pattern (vec<gimple> *stmts,
+   tree *type_in, tree *type_out)
+{
+  gimple last_stmt = stmts->pop ();
+  tree oprnd0, oprnd1, vectype, itype;
+  gimple pattern_stmt;
+  enum tree_code rhs_code;
+  optab optab;
+  stmt_vec_info stmt_vinfo = vinfo_for_stmt (last_stmt);
+
+  if (!is_gimple_assign (last_stmt))
+return NULL;
+
+  rhs_code = gimple_assign_rhs_code (last_stmt);
+  switch (rhs_code)
+{
+case MULT_EXPR:
+  break;
+default:
+  return NULL;
+}
+
+  if (STMT_VINFO_IN_PATTERN_P (stmt_vinfo))
+return NULL;
+
+  oprnd0 = gimple_assign_rhs1 (last_stmt);
+  oprnd1 = gimple_assign_rhs2 (last_stmt);
+  itype = TREE_TYPE (oprnd0);
+
+  if (TREE_CODE (oprnd0) != SSA_NAME
+  || TREE_CODE (oprnd1) != INTEGER_CST
+  || TREE_CODE (itype) != INTEGER_TYPE
+  || TYPE_PRECISION (itype) != GET_MODE_PRECISION (TYPE_MODE (itype)))
+return NULL;
+
+  vectype = get_vectype_for_scalar_type (itype);
+  if (vectype == NULL_TREE)
+return NULL;
+
+  /* If the target can handle vectorized multiplication natively,
+ don't attempt to optimize this.  */
+  optab = optab_for_tree_code (rhs_code, vectype, optab_default);
+  if (optab != unknown_optab)
+{
+  machine_mode vec_mode = TYPE_MODE (vectype);
+  int icode = (int) optab_handler (optab, vec_mode);
+  if (icode != CODE_FOR_nothing)
+return NULL;
+}
+
+  /* If target cannot handle vector left shift then we cannot 
+ optimize and bail out.  */ 
+  optab = optab_for_tree_code (LSHIFT_EXPR, vectype, optab_vector);
+  if (!optab
+  || optab_handler (optab, TYPE_MODE (vectype)) == CODE_FOR_nothing)
+return NULL;
+
+  if (integer_pow2p (oprnd1))
+{
+  /* Pattern detected.  */
+  if (dump_enabled_p ())
+   dump_printf_loc (MSG_NOTE, vect_location,
+"vect_recog_multconst_pattern: detected:\n");
+
+  tree shift;
+  shift = build_int_cst (itype, tree_log2 (oprnd1));
+  pattern_stmt = gimple_build_assign (vect_recog_temp_ssa_var (itype, 
NULL),
+ LSHIFT_EXPR, oprnd0, shift);
+  if (dump_enabled_p ())
+   dump_gimple_stmt_loc (MSG_NOTE, vect_location, TDF_SLIM, pattern_stmt,
+  0);
+  stmts->safe_push (last_stmt);
+  *type_in = vectype;
+  *type_out = vectype;
+  return pattern_stmt;
+} 
+
+  return NULL;
+}
+
 /* Detect a signed division by a constant that wouldn't be
otherwise vectorized:
 
diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
index 48c1f8d..833fe4b 100644
--- a/gcc/tree-vectorizer.h
+++ b/gcc/tree-vectorizer.h
@@ -1131,7 +1131,7 @@ extern void vect_slp_transform_bb (basic_block);
Additional pattern recognition functions can (and will) be added
in th

Re: [Patch] Small refactor on _State<>

2015-07-28 Thread Jonathan Wakely

On 26/07/15 13:38 -0700, Tim Shen wrote:

On Sat, Jul 25, 2015 at 8:31 AM, Jonathan Wakely  wrote:

On 25/07/15 00:11 -0700, Tim Shen wrote:


It's not a very necessary refactoring, but simply can't resist. :)

I'm not sure of the ::memcpy calls. It looks not very idiomatic, but
std::copy on char* looks even more weird? :/



The ::memcpy(this, &rhs, sizeof(rhs))...) makes me quite
uncomfortable, because _State is not trivially-copyable, although its
_State_base base class _is_ trivially-copyable, and is at the same
address and has the same size ... so I think it's safe.

But couldn't you replace that memcpy with an assignment?

   _State_base::operator=(__rhs);


Done.


The implicitly defined assignment operator should do the same thing as
a memcpy.

_State should have a deleted copy assignment operator though (or a
user-provided one that correctly handles the _S_opcode_match case, but
since it's not needed it should just be deleted).


Actually it's needed in _StateSeq::_M_clone, but I defined a normal
member function _State::_M_clone to avoid unexpected copying.


But that's a copy construction, I'm talking about assignment.

The copy constructor is fine, you've defined it and it does the right
thing, so I don't think making it private and defining _M_clone() is
useful. Just copying is safe and correct. If there's an unwanted copy
we just get a slight performance hit, but not a disaster.

What I'm concerned about is assignment. You haven't defined an
assignment operator. If there's an unwanted assignment we could get
undefined behaviour. Please delete the assignment operator if it's not
needed.

The private default constructor doesn't seem to be used, so that can
go, and I would get rid of _M_clone() and make the copy constructor
public again, although I'm not going to insist on that if you really
prefer to add _M_clone.


Also, what are the alignment requirements on the _Matcher<> objects?
Is it definitely safe to store them in the _M_matcher_storage buffer?

You could use alignas(_Matcher<char>) to be sure (using char there
should be OK because std::function<bool(char)> has the same alignment
as std::function<bool(wchar_t)> or any other _Matcher type).
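
Something like this (a sketch only, not the actual header):

  // Align the raw buffer for the matchers we may store in it; any
  // std::function specialization has the same alignment requirement.
  alignas(std::function<bool(char)>) char _M_matcher_storage[_Matcher_size];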



--
Regards,
Tim Shen



diff --git a/libstdc++-v3/include/bits/regex_automaton.h 
b/libstdc++-v3/include/bits/regex_automaton.h
index fc0eb41..c9f7bb3 100644
--- a/libstdc++-v3/include/bits/regex_automaton.h
+++ b/libstdc++-v3/include/bits/regex_automaton.h
@@ -70,51 +70,115 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
  _S_opcode_accept,
  };

-  struct _State_base
-  {
-_Opcode  _M_opcode;   // type of outgoing transition
-_StateIdT_M_next; // outgoing transition
-union // Since they are mutually exclusive.
+  template<size_t _Matcher_size>
+struct _State_base
{
-  size_t _M_subexpr;// for _S_opcode_subexpr_*
-  size_t _M_backref_index;  // for _S_opcode_backref
-  struct
+protected:
+  _Opcode  _M_opcode;   // type of outgoing transition
+
+public:
+  _StateIdT_M_next; // outgoing transition
+  union // Since they are mutually exclusive.
  {
-   // for _S_opcode_alternative, _S_opcode_repeat and
-   // _S_opcode_subexpr_lookahead
-   _StateIdT  _M_alt;
-   // for _S_opcode_word_boundary or _S_opcode_subexpr_lookahead or
-   // quantifiers (ungreedy if set true)
-   bool   _M_neg;
+   size_t _M_subexpr;// for _S_opcode_subexpr_*
+   size_t _M_backref_index;  // for _S_opcode_backref
+   struct
+   {
+ // for _S_opcode_alternative, _S_opcode_repeat and
+ // _S_opcode_subexpr_lookahead
+ _StateIdT  _M_alt;
+ // for _S_opcode_word_boundary or _S_opcode_subexpr_lookahead or
+ // quantifiers (ungreedy if set true)
+ bool   _M_neg;
+   };
+   char _M_matcher_storage[_Matcher_size];// for _S_opcode_match
  };
-};

-explicit _State_base(_Opcode __opcode)
-: _M_opcode(__opcode), _M_next(_S_invalid_state_id)
-{ }
+protected:
+  _State_base() : _M_opcode(_S_opcode_unknown) { }

-  protected:
-~_State_base() = default;
+  explicit _State_base(_Opcode __opcode)
+  : _M_opcode(__opcode), _M_next(_S_invalid_state_id)
+  { }
+
+public:
+  bool
+  _M_has_alt()
+  {
+   return _M_opcode == _S_opcode_alternative
+ || _M_opcode == _S_opcode_repeat
+ || _M_opcode == _S_opcode_subexpr_lookahead;
+  }

-  public:
#ifdef _GLIBCXX_DEBUG
-std::ostream&
-_M_print(std::ostream& ostr) const;
+  std::ostream&
+  _M_print(std::ostream& ostr) const;

-// Prints graphviz dot commands for state.
-std::ostream&
-_M_dot(std::ostream& __ostr, _StateIdT __id) const;
+  // Prints graphviz dot commands for state.
+  std::ostream&
+  _M_dot(std::ostream& __ostr, _StateIdT __id) const;
#endif
-  };
+};

-  template<typename _Char_type>
-struct _State : _State_base
+  template<typename _Char_type>
+struct _State : _Stat

Re: [gomp4] Add new oacc_transform patch

2015-07-28 Thread Cesar Philippidis
On 07/28/2015 02:21 AM, Thomas Schwinge wrote:

> Cesar, please address the following compiler diagnostic:
> 
>> 2015-07-21  Cesar Philippidis  
>>
>>  gcc/
>>  * omp-low.c (execute_oacc_transform): New function.
>>  (class pass_oacc_transform): New function.
>>  (make_pass_oacc_transform): New function.
>>  * passes.def: Add pass_oacc_transform to all_passes.
>>  * tree-pass.h (make_pass_oacc_transform): Declare.
>>  
>>
>> diff --git a/gcc/omp-low.c b/gcc/omp-low.c
>> index 388013c..23989f9 100644
>> --- a/gcc/omp-low.c
>> +++ b/gcc/omp-low.c
>> @@ -14394,4 +14394,76 @@ make_pass_late_lower_omp (gcc::context *ctxt)
>>return new pass_late_lower_omp (ctxt);
>>  }
>>  
>> +/* Main entry point for oacc transformations which run on the device
>> +   compiler.  */
>> +
>> +static unsigned int
>> +execute_oacc_transform ()
>> +{
>> +  basic_block bb;
>> +  gimple_stmt_iterator gsi;
>> +  gimple stmt;
>> +
>> +  if (!lookup_attribute ("oacc function",
>> + DECL_ATTRIBUTES (current_function_decl)))
>> +return 0;
>> +
>> +
>> +  FOR_ALL_BB_FN (bb, cfun)
>> +{
>> +  gsi = gsi_start_bb (bb);
>> +
>> +  while (!gsi_end_p (gsi))
>> +{
>> +  stmt = gsi_stmt (gsi);
>> +  gsi_next (&gsi);
>> +}
>> +}
>> +
>> +  return 0;
>> +}
> 
> [...]/source-gcc/gcc/omp-low.c: In function 'unsigned int 
> execute_oacc_transform()':
> [...]/source-gcc/gcc/omp-low.c:14406:10: error: variable 'stmt' set but 
> not used [-Werror=unused-but-set-variable]
>gimple stmt;
>   ^

I could apply the attached patch, but I figured that you'd need the stmt
iterator for acc_on_device anyway. Should I apply the patch to
gomp-4_0-branch?

Cesar

diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index 479b28a..e237c75 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -14431,26 +14431,10 @@ make_pass_late_lower_omp (gcc::context *ctxt)
 static unsigned int
 execute_oacc_transform ()
 {
-  basic_block bb;
-  gimple_stmt_iterator gsi;
-  gimple stmt;
-
   if (!lookup_attribute ("oacc function",
 			 DECL_ATTRIBUTES (current_function_decl)))
 return 0;
 
-
-  FOR_ALL_BB_FN (bb, cfun)
-{
-  gsi = gsi_start_bb (bb);
-
-  while (!gsi_end_p (gsi))
-	{
-	  stmt = gsi_stmt (gsi);
-	  gsi_next (&gsi);
-	}
-}
-
   return 0;
 }
 


Re: [PATCH 3/4] Add libgomp plugin for Intel MIC

2015-07-28 Thread Maxim Blumental
 Applied the idea with the python script alternative.  Please review.

2015-07-24 17:18 GMT+03:00 David Malcolm :
> On Fri, 2015-07-24 at 10:01 +0200, Jakub Jelinek wrote:
>> #!/usr/bin/python
>> import sys
>> with open(sys.argv[1],"rb") as f:
>> nextblock = f.read(12)
>> while 1:
>> block = nextblock
>> nextblock = f.read(12)
>> if block == "":
>> break
>> str = ""
>> for ch in block:
>> if str == "":
>> str = "  "
>> else:
>> str += ", "
>> if ord(ch) < 10:
>> str += "0x0" + chr(ord('0')+ord(ch))
>> elif ord(ch) < 16:
>> str += "0x0" + chr(ord('a')+ord(ch)-10)
>> else:
>> str += hex(ord(ch))
>> if nextblock != "":
>> str += ","
>> print str
>>
>> python ./xxd.py $< >> $@
>> does the same thing as
>> cat $< | xxd -include >> $@
>> (CCing David as python expert, my python knowledge is limited and
>> 15 years old, not sure how portable this is (python 2 vs. python 3,
>> and
>> even python 2 minimal versions)).
>
> It doesn't work with Python 3 for various reasons ("print" syntax, and
> str vs bytes issues).
>
> I'm attaching a version which works with both Python 2 and Python 3
> (2.7.5 and 3.3.2 were the versions I tried).
>
> It ought to work with much older python 2 versions (as your script
> appears to), but I don't have them handy.
>
> Presumably it would need a license header and some descriptive comments.
>
> (snip)
>
> Dave



-- 


-
Sincerely yours,
Maxim Blumental
2015-07-28  Maxim Blumenthal  

* configure.ac: Add a check for xxd or python presence when the target
is intelmic or intelmicemul.
* configure: Regenerate.
* liboffloadmic/plugin/Makefile.am: Add a condition to the
make_target_image.h generation code.  The condition invokes either
xxd or a special python script during the generation.
* liboffloadmic/plugin/xxd.py: New file.
* liboffloadmic/plugin/Makefile.in: Regenerate.


xxd_check.patch
Description: Binary data


Re: [PATCH] warn for unsafe calls to __builtin_return_address

2015-07-28 Thread Martin Sebor

gcc/ChangeLog
2015-07-27  Martin Sebor  

 * c-family/c.opt (-Wbuiltin-address): New warning option.
 * doc/invoke.texi (Wbuiltin-address): Document it.
 * doc/extend.texi (__builtin_frame_addrress, __builtin_return_addrress):


Typoes (rr).


Fixed.




-  rtx tem
-   = expand_builtin_return_addr (DECL_FUNCTION_CODE (fndecl),
- tree_to_uhwi (CALL_EXPR_ARG (exp, 0)));
+  /* Number of frames to scan up the stack.  */
+  const unsigned HOST_WIDE_INT count = tree_to_uhwi (CALL_EXPR_ARG (exp, 
0));
+
+  rtx tem = expand_builtin_return_addr (DECL_FUNCTION_CODE (fndecl), 
count);


Do we need to say "const"?


No, we don't. FWIW, I find code easier to think about when it's
explicit about things like this, even if they have no semantic
effect. But since it's not common practice I took the const out.




/* Some ports cannot access arbitrary stack frames.  */
if (tem == NULL)
{
- if (DECL_FUNCTION_CODE (fndecl) == BUILT_IN_FRAME_ADDRESS)
-   warning (0, "unsupported argument to %<__builtin_frame_address%>");
- else
-   warning (0, "unsupported argument to %<__builtin_return_address%>");
+ warning (0, "invalid argument to %qD", fndecl);


"unsupported argument".


Thanks, fixed.


+  if (0 < count)


Yoda :-)  You can just say "if (count)" fwiw.


Sure.


This is not such a nice warning name, maybe -Wbuiltin-frame-address or
-Wframe-address?


I renamed it to -Wframe-address.


I like the original "should only be used" better than that last line.


Okay, reworded.


Elsewhere there was a "non-zero" btw, but we should use "nonzero" according
to the coding conventions.  Huh.


Changed.


Not all targets support weak.


I replaced it with __attribute__((noclone, noinline)).

Attached is an updated patch with the changes above.

Thanks
Martin
gcc/ChangeLog
2015-07-28  Martin Sebor  

* c-family/c.opt (-Wframe-address): New warning option.
* doc/invoke.texi (Wframe-address): Document it.
* doc/extend.texi (__builtin_frame_address, __builtin_return_address):
Clarify possible effects of calling the functions with non-zero
arguments and mention -Wframe-address.
* builtins.c (expand_builtin_frame_address): Handle -Wframe-address.

gcc/testsuite/ChangeLog
2015-07-28  Martin Sebor  

* g++.dg/Wframe-address-in-Wall.C: New test.
* g++.dg/Wframe-address.C: New test.
* g++.dg/Wno-frame-address.C: New test.
* gcc.dg/Wframe-address-in-Wall.c: New test.
* gcc.dg/Wframe-address.c: New test.
* gcc.dg/Wno-frame-address.c: New test.

diff --git a/gcc/builtins.c b/gcc/builtins.c
index e8fe3db..b7c5572 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -4564,34 +4564,38 @@ expand_builtin_frame_address (tree fndecl, tree exp)
 {
   /* The argument must be a nonnegative integer constant.
  It counts the number of frames to scan up the stack.
- The value is the return address saved in that frame.  */
+ The value is either the frame pointer value or the return
+ address saved in that frame.  */
   if (call_expr_nargs (exp) == 0)
 /* Warning about missing arg was already issued.  */
 return const0_rtx;
   else if (! tree_fits_uhwi_p (CALL_EXPR_ARG (exp, 0)))
 {
-  if (DECL_FUNCTION_CODE (fndecl) == BUILT_IN_FRAME_ADDRESS)
-	error ("invalid argument to %<__builtin_frame_address%>");
-  else
-	error ("invalid argument to %<__builtin_return_address%>");
+  error ("invalid argument to %qD", fndecl);
   return const0_rtx;
 }
   else
 {
-  rtx tem
-	= expand_builtin_return_addr (DECL_FUNCTION_CODE (fndecl),
-  tree_to_uhwi (CALL_EXPR_ARG (exp, 0)));
+  /* Number of frames to scan up the stack.  */
+  unsigned HOST_WIDE_INT count = tree_to_uhwi (CALL_EXPR_ARG (exp, 0));
+
+  rtx tem = expand_builtin_return_addr (DECL_FUNCTION_CODE (fndecl), count);

   /* Some ports cannot access arbitrary stack frames.  */
   if (tem == NULL)
 	{
-	  if (DECL_FUNCTION_CODE (fndecl) == BUILT_IN_FRAME_ADDRESS)
-	warning (0, "unsupported argument to %<__builtin_frame_address%>");
-	  else
-	warning (0, "unsupported argument to %<__builtin_return_address%>");
+	  warning (0, "unsupported argument to %qD", fndecl);
 	  return const0_rtx;
 	}

+  if (count)
+	{
+	  /* Warn since no effort is made to ensure that any frame
+	 beyond the current one exists or can be safely reached.  */
+	  warning (OPT_Wframe_address, "calling %qD with "
+		   "a nonzero argument is unsafe", fndecl);
+	}
+
   /* For __builtin_frame_address, return what we've got.  */
   if (DECL_FUNCTION_CODE (fndecl) == BUILT_IN_FRAME_ADDRESS)
 	return tem;
diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt
index 285952e..ccbb399 100644
--- a/gcc/c-family/c.opt
+++ b/gcc/c-family/c.opt
@@ -295,6 +295,10 @@ Wbool-compare
 C ObjC C++ ObjC++ Var(warn_bool_compare) Warning LangEnabledBy(C ObjC C++ ObjC++,Wall)
 Warn abo

[PATCH][AARCH64] Make arm_align_max_stack_pwr.c and arm_align_max_pwr.c compile testcase, instead of execution.

2015-07-28 Thread Renlin Li

Hi all,

This is a simple patch to make arm_align_max_stack_pwr.c and 
arm_align_max_pwr.c compile test cases, instead of execution tests.


On my local machine, those test cases pass.  However, they fail on some
systems with process memory usage restrictions: the space required by
those two newly defined macros is simply too big.


By rewriting the test cases, the basic maximum alignment support is 
checked at compile time. The correct code generation is checked by 
scanning assembly output.


Tested using the aarch64-none-linux-gnu and aarch64-none-elf toolchains.  They
all pass.


Okay to commit?

gcc/testsuite/ChangeLog:

2015-07-28  Renlin Li  

* gcc.target/aarch64/arm_align_max_pwr.c: Make it a compile test case,
check the assembly.
* gcc.target/aarch64/arm_align_max_stack_pwr.c: Likewise.
diff --git a/gcc/testsuite/gcc.target/aarch64/arm_align_max_pwr.c b/gcc/testsuite/gcc.target/aarch64/arm_align_max_pwr.c
index bbb4c6f..ffa4d22 100644
--- a/gcc/testsuite/gcc.target/aarch64/arm_align_max_pwr.c
+++ b/gcc/testsuite/gcc.target/aarch64/arm_align_max_pwr.c
@@ -1,15 +1,23 @@
-/* { dg-do run } */
-
-#include 
-#include 
+/* { dg-do compile } */
+/* { dg-options "-O1" } */
 
 #define align (1ul << __ARM_ALIGN_MAX_PWR)
 static int x __attribute__ ((aligned (align)));
+static int y __attribute__ ((aligned (align)));
+
+extern void foo (int *x, int *y);
+extern int bar (int x, int y);
 
 int
-main ()
+dummy ()
 {
-  assert ((((unsigned long)&x) & (align - 1)) == 0);
+  int result;
 
-  return 0;
+  foo (&x, &y);
+  result = bar (x, y);
+
+  return result;
 }
+
+/* { dg-final { scan-assembler-times "zero\t4" 2 } } */
+/* { dg-final { scan-assembler "zero\t268435452" } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/arm_align_max_stack_pwr.c b/gcc/testsuite/gcc.target/aarch64/arm_align_max_stack_pwr.c
index 7a6355b..ea22b80 100644
--- a/gcc/testsuite/gcc.target/aarch64/arm_align_max_stack_pwr.c
+++ b/gcc/testsuite/gcc.target/aarch64/arm_align_max_stack_pwr.c
@@ -1,15 +1,23 @@
-/* { dg-do run } */
+/* { dg-do compile } */
+/* { dg-options "-O1" } */
 
 #include 
 #include 
 
 #define align (1ul << __ARM_ALIGN_MAX_STACK_PWR)
+extern void foo (int *x);
+extern int bar (int x);
 
 int
-main ()
+dummy ()
 {
   int x __attribute__ ((aligned (align)));
+  int result;
 
-  assert ((((unsigned long)&x) & (align - 1)) == 0);
-  return 0;
+  foo (&x);
+  result = bar (x);
+
+  return result;
 }
+
+/* { dg-final { scan-assembler "and\tx\[0-9\]+, x\[0-9\]+, -65536" } } */


Re: [C/C++ PATCH] Implement -Wtautological-compare (PR c++/66555, c/54979)

2015-07-28 Thread Steve Ellcey
Marek,

I have run into a problem with this warning while building glibc.

sysdeps/ieee754/s_matherr.c has:

int
weak_function
__matherr (struct exception *x)
{
  int n = 0;
  if (x->arg1 != x->arg1) return 0;
  return n;
}


And arg1 is a floating point type.  I think that if the value of
x->arg1 is a NaN then the comparison should evaluate to TRUE, because a NaN
never compares equal to anything, not even another NaN (check with your
local IEEE expert).  I believe this method of checking for a NaN is
fairly common, and I am not sure GCC should be emitting a warning for
it.
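
For reference (illustration only, not the glibc code), the idiom relies
on IEEE 754 making a NaN compare unequal to everything, including itself:

int
is_nan (double x)
{
  return x != x;  /* True only when x is a NaN; equivalent to isnan (x).  */
}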

Steve Ellcey
sell...@imgtec.com



[Patch, MIPS] MIPS specific optimization for o32 ABI

2015-07-28 Thread Steve Ellcey
This patch implements a MIPS o32 ABI specific optimization called frame
header optimization.  In the o32 ABI, routines allocate 16 bytes on the
stack before calling another routine.  This space is used by the callee
as space to write the register arguments to if their address is taken.
The n32 and n64 ABI's use the more common approach of copying register
arguments to local variables if their address is needed.

This optimization allows the callee to use that 16 bytes for other
purposes if it does not need it to write its arguments out to memory and
if it only needs 16 bytes of stack space (or less) for saving callee-saved
registers.

This can allow us to avoid having to allocate extra stack space in a routine
and to remove the stack pointer increment/decrement instructions from the
prologue and epilogue, which results in a small performance improvement.
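
A hypothetical sketch (not from the patch) of the o32 convention involved:

/* An o32 caller always reserves 16 bytes above the outgoing stack
   pointer.  The callee only needs that space if it must write its
   register arguments to memory, e.g. because their address is taken.  */
int
callee_needs_header (int a)
{
  int *p = &a;  /* Forces a into the caller-allocated save area.  */
  return *p;
}

int
callee_reuses_header (int a, int b)
{
  /* No argument address is taken, so with -mframe-header-opt this
     function may save callee-saved registers in the caller-reserved
     16 bytes instead of allocating its own stack frame.  */
  return a + b;
}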

This patch has been in the Mentor GCC toolchain for MIPS for a while and
gotten some testing there and I tested it on the top-of-tree GCC sources
with no regressions.

OK to checkin?

Steve Ellcey
sell...@imgtec.com


2015-07-28  Steve Ellcey  
Zoran Jovanovic  
Catherine Moore  
Tom de Vries  

* config/mips/mips.opt (mframe-header-opt): New option.
* config/mips/mips.c (struct mips_frame_info): Add
skip_stack_frame_allocation_p field.
(struct machine_function): Add callees_use_frame_header_p,
uses_frame_header_p, and initial_total_size fields.
(mips_frame_header_usage): New hash.
(mips_find_if_frame_header_is_used): New function.
(mips_callee_use_frame_header): New function.
(mips_callees_use_frame_header_p): New function.
(mips_cfun_use_frame_header_p): New function.
(mips_get_updated_offset): New function.
(mips_skip_stack_frame_alloc): New function.
(mips_frame_header_update_insn): New function.
(mips_rest_of_frame_header_opt): New function.
(mips_compute_frame_info): Add recalculate and frame arguments.
(mips_frame_pointer_required): Add new args to
mips_compute_frame_info call.
(mips_initial_elimination_offset): Ditto.
(mips_gp_expand_needed_p): New function factored out of
mips_expand_ghost_gp_insns.
(mips_expand_ghost_gp_insns): Use mips_gp_expand_needed_p.
(mips_reorg): Use mips_rest_of_frame_header_opt.



2015-07-28  Steve Ellcey  
Tom de Vries  

* gcc.target/mips/fho-1.c: New test.
* gcc.target/mips/fho-2.c: New test.
* gcc.target/mips/mips.exp: Add -mframe-header-opt to
mips_option_groups.

diff --git a/gcc/config/mips/mips.c b/gcc/config/mips/mips.c
index c3cd52d..7cdef89 100644
--- a/gcc/config/mips/mips.c
+++ b/gcc/config/mips/mips.c
@@ -77,6 +77,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "cgraph.h"
 #include "builtins.h"
 #include "rtl-iter.h"
+#include "dumpfile.h"
 
 /* This file should be included last.  */
 #include "target-def.h"
@@ -380,6 +381,9 @@ struct GTY(())  mips_frame_info {
 
   /* The offset of hard_frame_pointer_rtx from the bottom of the frame.  */
   HOST_WIDE_INT hard_frame_pointer_offset;
+
+  /* Skip stack frame allocation if possible.  */
+  bool skip_stack_frame_allocation_p;
 };
 
 /* Enumeration for masked vectored (VI) and non-masked (EIC) interrupts.  */
@@ -472,6 +476,15 @@ struct GTY(())  machine_function {
   /* True if this is an interrupt handler that should use DERET
  instead of ERET.  */
   bool use_debug_exception_return_p;
+
+  /* True if any of the callees uses its frame header.  */
+  bool callees_use_frame_header_p;
+
+  /* True if current function uses its frame header.  */
+  bool uses_frame_header_p;
+
+  /* Frame size before being updated by optimizations.  */
+  HOST_WIDE_INT initial_total_size;
 };
 
 /* Information about a single argument.  */
@@ -574,6 +587,8 @@ struct mips_rtx_cost_data
 
 /* Global variables for machine-dependent things.  */
 
+static hash_map<tree, bool> *mips_frame_header_usage;
+
 /* The -G setting, or the configuration's default small-data limit if
no -G option is given.  */
 static unsigned int mips_small_data_threshold;
@@ -1296,6 +1311,7 @@ static const struct mips_rtx_cost_data
   }
 };
 
+static void mips_rest_of_frame_header_opt (void);
 static rtx mips_find_pic_call_symbol (rtx_insn *, rtx, bool);
 static int mips_register_move_cost (machine_mode, reg_class_t,
 reg_class_t);
@@ -10358,6 +10374,114 @@ mips_save_reg_p (unsigned int regno)
   return false;
 }
 
+/* Return true if FNDECL may use its incoming frame header.  */
+
+static bool
+mips_find_if_frame_header_is_used (tree fndecl)
+{
+  bool *frame_header_unused;
+
+  if (mips_frame_header_usage)
+    frame_header_unused = mips_frame_header_usage->get (fndecl);
+  else
+    frame_header_unused = NULL;
+
+  return !frame_header_unused;
+}
+
+/* Return true if the instruction is a call and the called function may use its
+   incoming

[gomp4] Redesign oacc_parallel launch API

2015-07-28 Thread Nathan Sidwell
I've committed this patch to the gomp4 branch to redo the launch API.  I'll post 
a version for trunk once the versioning patch gets approved & committed.


This changes the API in a number of ways, allowing device-specific knowledge to 
be moved into the device compiler and out of the host compiler.


Firstly, we attach a tuple of launch dimensions to the offloaded
function's 'oacc function' attribute.  These are the constant launch dimensions.
Dynamic dimensions get a zero for their slot in this list.  Further, this list
can be extended in the future to an alist keyed by device_type.


Dynamic dimensions are computed on the host.  However, they are passed via
variadic args to the GOACC_parallel function (which is renamed).  The variadic
args are passed using a key/value representation, and 3 keys are currently defined:

END -- end of the variadic list
DIM -- set of runtime-computed dimensions.  Only the dynamic ones are passed.
ASYNC_WAIT -- an async and a set of waits (possibly zero).

I have arranged for the key to have a slot that can later be filled by
device_type, and hence support multiple device types.
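
As a sketch, a consumer of the keyed list might walk it like this (the
stub macros stand in for the real encodings in gomp-constants.h, and
walk_launch_args is a made-up name):

#include <stdarg.h>

#define GOMP_LAUNCH_END        0  /* stub values for illustration */
#define GOMP_LAUNCH_DIM        1
#define GOMP_LAUNCH_ASYNC_WAIT 2
#define GOMP_LAUNCH_CODE(X)    (((X) >> 28) & 0xf)  /* hypothetical layout */

static void
walk_launch_args (unsigned tag, ...)
{
  va_list ap;

  va_start (ap, tag);
  for (; GOMP_LAUNCH_CODE (tag) != GOMP_LAUNCH_END;
       tag = va_arg (ap, unsigned))
    switch (GOMP_LAUNCH_CODE (tag))
      {
      case GOMP_LAUNCH_DIM:
        /* Only the dynamic dimensions follow.  */
        (void) va_arg (ap, unsigned);
        break;
      case GOMP_LAUNCH_ASYNC_WAIT:
        /* An async handle and a set of waits (possibly zero) follow.  */
        (void) va_arg (ap, int);
        break;
      }
  va_end (ap);
}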


The constant dimensions can be used in expansion of the GOACC_nid function in 
the device compiler.  The device compiler could also process that list to select 
the device_type slot that is appropriate.


For PTX the backend is augmented to emit the launch dimensions into the target 
data, from whence the ptx plugin can pick them up and overwrite with any dynamic 
ones passed in from the launch function.


nathan
2015-07-28  Nathan Sidwell  

	include/
	* gomp-constants.h (GOMP_DIM_GANG, GOMP_DIM_WORKER,
	GOMP_DIM_VECTOR): New.
	(GOMP_DIM_MAX, GOMP_DIM_MASK): New.
	(GOMP_LAUNCH_END, GOMP_LAUNCH_DIM, GOMP_LAUNCH_ASYNC_WAIT): New.
	(GOMP_LAUNCH_CODE_SHIFT, GOMP_LAUNCH_DEVICE_SHIFT,
	GOMP_LAUNCH_OP_SHIFT): New.
	(GOMP_LAUNCH_PACK, GOMP_LAUNCH_CODE, GOMP_LAUNCH_DEVICE,
	GOMP_LAUNCH_OP): New.
	(GOMP_VERSION_NVIDIA_PTX): Increment to 1.

	gcc/
	* tree.h (OMP_CLAUSE_EXPR): New.
	* omp-low.c (create_omp_child_function): Do not set oacc function
	attribute here.
	(oacc_launch_pack): New.
	(OACC_FN_ATTRIB): New define.
	(set_oacc_fn_attrib): New.
	(get_oacc_fn_attrib): New.
	(expand_omp_target): Reimplement openacc launch parameters.
	* omp-low.h (get_oacc_fn_attrib): Declare.
	* omp-builtins.def (BUILT_IN_GOACC_KERNELS_INTERNAL): Change type.
	(BUILT_IN_GOACC_PARALLEL): Change type and target name.
	* builtin-types.def
	(BT_FN_VOID_INT_OMPFN_SIZE_PTR_PTR_PTR_INT_INT_INT_SIZE_INT_INT_VAR): Replace with ...
	(BT_FN_VOID_INT_OMPFN_SIZE_PTR_PTR_PTR_SIZE_VAR): ... this.
	* tree-parloops.c (create_parallel_loop): Adjust index of
	shared_size arg.
	* except.c: Include omp-low.h
	(finish_eh_generation): Call get_oacc_fn_attrib.
	* config/nvptx/mkoffload.c (process): Accumulate compute grid
	dimensions and emit them.
	* config/nvptx/nvptx.c: Include gomp-constants.h
	(nvptx_record_offload_symbol): Emit compute grid dimensions.

	libgomp/
	* libgomp.map: Add GOACC_parallel_keyed.
	* libgomp.h (struct acc_dispatch_t): Change exec_func parameters.
	* libgomp_g.h (GOACC_parallel): Replace with ...
	(GOACC_parallel_keyed): ... this.
	* oacc-parallel.c (goacc_wait): Take pointer to va_list.  Adjust
	all callers.
	(GOACC_parallel_keyed): Use varadic keyed interface for optional
	parameters.  Renamed from ...
	(GOACC_parallel): ... here.  Replace with forwarding fn.
	* plugin/plugin-host.c (GOMP_OFFLOAD_openacc_parallel): Adjust
	parameters.
	* plugin/plugin-nvptx.c (struct targ_fn_launch): New structure.
	(targ_fn_descriptor): Point to targ_fn_launch instance.
	(nvptx_exec): Adjust parameters.  Process compute dimensions.
	(struct nvptx_tdata): Adjust type.
	(GOMP_OFFLOAD_load_image_ver): Adjust function handling.
	(GOMP_OFFLOAD_openacc_parallel): Adjust.


	gcc/c-family/
	* c-common.c (DEF_FUNCTION_TYPE_VAR_12): Delete.

	gcc/fortran/
	* f95-lang.c (DEF_FUNCTION_TYPE_VAR_12): Delete.
	* types.def (BT_FN_VOID_INT_OMPFN_SIZE_PTR_PTR_PTR_INT_INT_INT_SIZE_INT_INT_VAR): Replace with ...
	(BT_FN_VOID_INT_OMPFN_SIZE_PTR_PTR_PTR_SIZE_VAR): ... this.

	gcc/lto/
	* lto-lang.c (DEF_FUNCTION_TYPE_VAR_12): Delete.

Index: libgomp/oacc-parallel.c
===
--- libgomp/oacc-parallel.c	(revision 226312)
+++ libgomp/oacc-parallel.c	(working copy)
@@ -171,15 +171,12 @@ goacc_deallocate_static (acc_device_t d)
   alloc_done = false;
 }
 
-static void goacc_wait (int async, int num_waits, va_list ap);
+static void goacc_wait (int async, int num_waits, va_list *ap);
 
 void
-GOACC_parallel (int device, void (*fn) (void *),
-		size_t mapnum, void **hostaddrs, size_t *sizes,
-		unsigned short *kinds,
-		int num_gangs, int num_workers, int vector_length,
-		size_t shared_size,
-		int async, int num_waits, ...)
+GOACC_parallel_keyed (int device, void (*fn) (void *), size_t mapnum,
+		  void **hostaddrs, size_t *sizes, unsigned short *kinds,
+		  s

Re: [gomp4] Redesign oacc_parallel launch API

2015-07-28 Thread Nathan Sidwell
Oh, one more thing.  I placed constants for the 3 launch dimensions into 
gomp-constants.h, as they are needed by both library and compiler.  Working on a 
patch to remove the current set of constants from omp-low.h


nathan


[PATCH, PR 66521] Fix bootstrap segfault with vtable verification enabled

2015-07-28 Thread Caroline Tice
I believe the following patch fixes a problem with bootstrap failures
on some architectures with vtable verification enabled.  The problem
was related to a change in name mangling, where classes in anonymous
namespaces get "<anon>" as their DECL_ASSEMBLER_NAME, rather than the
real mangled name.  This was causing multiple problems for vtable
verification, since not only do we use the mangled name to uniquely
identify the various classes (the anonymous classes were no longer
being properly 'uniqued'), but also the DECL_ASSEMBLER_NAME was being
incorporated into our variable names and ending up in the assembly
code, and angle-brackets are not legal there.

This patch should fix those problems, as well as a few other minor
issues I found while working on this.

I have bootstrapped with this patch on an x86_64 linux system; I have
run all the testsuites with no regressions; and I have verified that
it fixes the problem.  Is this ok to commit?

-- Caroline Tice
cmt...@google.com

ChangeLogs:

libvtv/ChangeLog

2015-07-28  Caroline Tice  

PR 66521
* Makefile.am: Update to match latest tree.
* Makefile.in: Regenerate.
* testsuite/lib/libvtv: Brought up to date.
* vtv_malloc.cc (VTV_DEBUG): Update function call to match renamed
function (old bug!).
* vtv_rts.cc (debug_functions, debug_init, debug_verify_vtable): Update
initializations to work correctly with VTV_DEBUG defined.

gcc/ChangeLog:

2015-07-28  Caroline Tice  

PR 66521
* vtable-verify.c (vtbl_mangled_name_types, vtbl_mangled_name_ids): New
global variables.
(vtbl_find_mangled_name):  New function.
(vtbl_register_mangled_name):  New function.
(vtbl_map_get_node):  If DECL_ASSEMBLER_NAME is "<anon>", look up
mangled name in mangled name vectors.
(find_or_create_vtbl_map_node):  Ditto.
(var_is_used_for_virtual_call_p):  Add recursion_depth parameter;
update recursion_depth on function entry; pass it to every recursive
call; automatically exit if depth > 25 (give up looking at that point).
(verify_bb_vtables):  Initialize recursion_depth and pass it to
var_is_used_for_virtual_call_p.
* vtable-verify.h (vtbl_mangled_name_types, vtbl_mangled_name_ids): New
global variable decls.
(vtbl_register_mangled_name): New extern function decl.

gcc/cp/ChangeLog:
2015-07-28  Caroline Tice  

PR 66521
* mangle.c : Add vtable-verify.h to include files.
(get_mangled_vtable_map_var_name):  If the DECL_ASSEMBLER_NAME
is "" get the real mangled name for the class instead, and
also store the real mangled name in a vector for use later.
Index: gcc/cp/mangle.c
===
--- gcc/cp/mangle.c	(revision 226275)
+++ gcc/cp/mangle.c	(working copy)
@@ -62,6 +62,7 @@
 #include "function.h"
 #include "cgraph.h"
 #include "attribs.h"
+#include "vtable-verify.h"
 
 /* Debugging support.  */
 
@@ -4034,6 +4035,13 @@
   gcc_assert (TREE_CODE (class_type) == RECORD_TYPE);
 
   tree class_id = DECL_ASSEMBLER_NAME (TYPE_NAME (class_type));
+
+  if (strstr (IDENTIFIER_POINTER (class_id), "<anon>") != NULL)
+{
+  class_id = get_mangled_id (TYPE_NAME (class_type));
+  vtbl_register_mangled_name (TYPE_NAME (class_type), class_id);
+}
+
   unsigned int len = strlen (IDENTIFIER_POINTER (class_id)) +
  strlen (prefix) +
  strlen (postfix) + 1;
Index: gcc/vtable-verify.c
===
--- gcc/vtable-verify.c	(revision 226275)
+++ gcc/vtable-verify.c	(working copy)
@@ -310,6 +310,70 @@
 /* Vtable map variable nodes stored in a vector.  */
vec<struct vtbl_map_node *> vtbl_map_nodes_vec;
 
+/* Vector of mangled names for anonymous classes.  */
+extern GTY(()) vec<tree, va_gc> *vtbl_mangled_name_types;
+extern GTY(()) vec<tree, va_gc> *vtbl_mangled_name_ids;
+vec<tree, va_gc> *vtbl_mangled_name_types;
+vec<tree, va_gc> *vtbl_mangled_name_ids;
+
+/* Look up class_type (a type decl for record types) in the vtbl_mangled_name_*
+   vectors.  This is a linear lookup.  Return the associated mangled name for
+   the class type.  This is for handling types from anonymous namespaces, whose
+   DECL_ASSEMBLER_NAME ends up being "<anon>", which is useless for our
+   purposes.
+
+   We use two vectors of trees to keep track of the mangled names:  One is a
+   vector of class types and the other is a vector of the mangled names.  The
+   assumption is that these two vectors are kept in perfect lock-step so that
+   vtbl_mangled_name_ids[i] is the mangled name for
+   vtbl_mangled_name_types[i].  */
+
+static tree
+vtbl_find_mangled_name (tree class_type)
+{
+  tree result = NULL_TREE;
+  unsigned i;
+
+  if (!vtbl_mangled_name_types || !vtbl_mangled_name_ids)
+    return result;
+
+  if (vtbl_mangled_name_types->length () != vtbl_mangled_name_ids->length ())
+    return result;
+
+  for (i = 0; i < vtbl_mangled_name_types->length (); ++i)
+    if ((*vtbl_mangled_name_types)[i] == class_type)
+      {
+	result = (*vtbl_mangled_name_ids)[i];
+	break;
+      }
+
+  return result;
+}
+
+/*

Re: Re: [PATCH] [PATCH][ARM] Fix sibcall testcases.

2015-07-28 Thread Alex Velenko

Hi,

Following the last patch, this patch prevents arm_thumb1 XPASSes in
sibcall-3.c and sibcall-4.c by skipping those tests on arm_thumb1
targets.
This patch also documents arm_thumb1 and arm_thumb2 effective target 
options.


Is patch ok for trunk and fsf-5?

gcc/testsuite

2015-07-28  Alex Velenko  

* gcc.dg/sibcall-3.c (dg-skip-if): Skip if arm_thumb1.
* gcc.dg/sibcall-4.c (dg-skip-if): Likewise.

gcc/

2015-07-28  Alex Velenko  

* doc/sourcebuild.texi (arm_thumb1): Documented.
(arm-thumb2): Likewise.
---
 gcc/doc/sourcebuild.texi | 8 ++++++++
 gcc/testsuite/gcc.dg/sibcall-3.c | 1 +
 gcc/testsuite/gcc.dg/sibcall-4.c | 1 +
 3 files changed, 10 insertions(+)

diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi
index c6ef40e..ca42a09 100644
--- a/gcc/doc/sourcebuild.texi
+++ b/gcc/doc/sourcebuild.texi
@@ -1551,6 +1551,14 @@ options.  Some multilibs may be incompatible with these options.
 ARM Target supports @code{-mfpu=neon-fp16 -mfloat-abi=softfp} or compatible
 options.  Some multilibs may be incompatible with these options.

+@item arm_thumb1
+ARM target interworks with Thumb-1: given @code{-mthumb-interwork},
+both ARM and Thumb code may be generated interleaved.
+
+@item arm_thumb2
+ARM target interworks with Thumb-2: given @code{-mthumb-interwork},
+both ARM and Thumb code may be generated interleaved.
+
 @item arm_thumb1_ok
 ARM target generates Thumb-1 code for @code{-mthumb}.

diff --git a/gcc/testsuite/gcc.dg/sibcall-3.c b/gcc/testsuite/gcc.dg/sibcall-3.c
index eafe8dd..e44596e 100644
--- a/gcc/testsuite/gcc.dg/sibcall-3.c
+++ b/gcc/testsuite/gcc.dg/sibcall-3.c
@@ -8,6 +8,7 @@
 /* { dg-do run { xfail { { cris-*-* crisv32-*-* h8300-*-* hppa*64*-*-* 
m32r-*-* mcore-*-* mn10300-*-* msp430*-*-* nds32*-*-* xstormy16-*-* 
v850*-*-* vax-*-* xtensa*-*-* } || { arm*-*-* && { ! arm32 } } } } } */

 /* -mlongcall disables sibcall patterns.  */
 /* { dg-skip-if "" { powerpc*-*-* } { "-mlongcall" } { "" } } */
+/* { dg-skip-if "" { arm_thumb1 } } */
 /* { dg-options "-O2 -foptimize-sibling-calls" } */

 /* The option -foptimize-sibling-calls is the default, but serves as
diff --git a/gcc/testsuite/gcc.dg/sibcall-4.c b/gcc/testsuite/gcc.dg/sibcall-4.c
index 1e039c6..5c69490 100644
--- a/gcc/testsuite/gcc.dg/sibcall-4.c
+++ b/gcc/testsuite/gcc.dg/sibcall-4.c
@@ -8,6 +8,7 @@
 /* { dg-do run { xfail { { cris-*-* crisv32-*-* h8300-*-* hppa*64*-*-* m32r-*-* mcore-*-* mn10300-*-* msp430*-*-* nds32*-*-* xstormy16-*-* v850*-*-* vax-*-* xtensa*-*-* } || { arm*-*-* && { ! arm32 } } } } } */

 /* -mlongcall disables sibcall patterns.  */
 /* { dg-skip-if "" { powerpc*-*-* } { "-mlongcall" } { "" } } */
+/* { dg-skip-if "" { arm_thumb1 } } */
 /* { dg-options "-O2 -foptimize-sibling-calls" } */

 /* The option -foptimize-sibling-calls is the default, but serves as
--
1.8.1.2



Re: Another benefit of the new if converter: better performance for half hammocks when running the generated code on a modern high-speed CPU with write-back caching, relative to the code produced by t

2015-07-28 Thread Abe

[Richard wrote:]

Note the store to *pointer can be done unconditionally


Yes; if I`m mapping things correctly in my mind, this is
something that Sebastian [and Alan, via email?] and I have
already discussed and which we plan to fix in good time.

Please note that this is a minor problem at most,
if/when it is safe to assume that the target can handle
two vectorized conditional operations in the same loop,
since anything remotely resembling an expensive
operation in the [pure] condition should be [is being?]
computed once per index and stored in a temporary.
For example: if the source code looks something like:

  if ( condition(index) )  A[index] = foo(index);
  // important: no else here

... then the new converter currently converts it to something like:

  /* an appropriate type goes here */ __compiler_temp_1 = condition(index);
  /* the type of the scalar goes here */ *pointer = __compiler_temp_1 ? &A[index] : &scratchpad;
  /* an appropriate type goes here */ __compiler_temp_2 = foo(index);
  *pointer = __compiler_temp_1 ? __compiler_temp_2 : scratchpad;

… so "condition(index)" is being evaluated only once
per {evaluation that exists in the source code}.

The fix for this would/will therefore be a minor optimization IMO;
the benefit would/will be that in/for iterations/columns
for which the condition is false, the scratchpad will not be
needlessly read from in order to derive the value to throw away.
Always throwing away the unneeded result of evaluating "foo(index)"
is good enough, and by removing an unneeded conditional expression
the burden on the vectorizer is reduced: it now only needs:
  {vectorized decision followed by vectorized store}
in each such loop, not:
  {vectorized decision followed by vectorized decision followed by vectorized 
store}.
[intentionally omitting whatever else it must do
 in a vectorized manner in the same loop]
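
Concretely, the improved conversion would be something like
[a sketch only, using the same notational conventions as above]:

  /* an appropriate type goes here */ __compiler_temp_1 = condition(index);
  /* the type of the scalar goes here */ *pointer = __compiler_temp_1 ? &A[index] : &scratchpad;
  /* an appropriate type goes here */ __compiler_temp_2 = foo(index);
  *pointer = __compiler_temp_2;  /* stored unconditionally; the scratchpad is never read */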

This is something we [Sebastian and I] plan on fixing eventually anyway,
i.e. regardless of whether or not it fixes a test case we already have.


[Richard wrote:]

and note that another performance issue of if-conversion
is  that foo(bar) is executed unconditionally.


AFAIK this is a fundamental limitation/necessity of if conversion.

A fundamental assumption/requirement here is that "foo(bar)"/"foo(index)"
is/are both pure and low-cost.  [I`ve renamed the example to "foo(index)"
to show that it`s not loop-invariant, since if it were then LICM should
make multiple evaluations of it unneeded and probably not going to happen
unless you are targeting a VLIW ISA and have an unused slot in the
instruction word if you do LICM on the sub-instruction in question.]

If "foo(index)" is not being checked for purity,
then we have a correctness bug.

If "foo(index)" is not being checked for low evaluation cost,
then we have a performance bug IMO.  The compiler should use its
existing estimation mechanism[s] to make an educated guess on
the cost of "foo(index)" and intentionally not do if conversion
if/when {the predicted cost of evaluating "foo(index)"
 for each iteration regardless of the condition bits}
is too high even in the presence of vectorization.


[Richard wrote:]

We have a bugreport that
   if (C[index]) A[index] = exp (x);
massively slows down things if C[index] is almost never true.


Quite understandable.  However, unfortunately I cannot think of
any mechanism that already exists in GCC [or any other compiler
the internals of which I am even slightly familiar] to estimate
the probability of the elements of an arbitrary array --
or [worse yet] of the probability of an arbitrary expression`s
evaluation result -- being convertible to either particular
Boolean value.  Perhaps this is feasible if/when "C[...]" is
truly an array, i.e. not a pointer, and the array`s contents
are known at compile time.  Otherwise, it seems to require
pointer analysis at best, and is infeasible at worst
[e.g. a pointer received from another translation unit].

I think the only thing we can do about this, other than alter our
plans for defaulting the if conversion, is to endeavor to make profiling
[e.g. "gprof"] able to "understand" that a certain piece of code has been
if-converted and able to suggest -- based on profiling -- that the
conversion should be undone b/c it is "costing" more than it is "saving",
even with vectorization, which IMO should be an extremely rare occurrence
if/once we are checking e.g. "exp(x)" [assuming it`s not loop-invariant]
for low cost of evaluation.

IOW, whatever we have [or will set] the threshold on evaluation cost of
the RHS expression for if conversion of source code like the above example
should, IMO, solve most instances of the abovementioned problem.
The remaining problem cases will likely be something like:
  {"exp(x)" is _not_ loop-invariant,
   the probability of C[index] being convertible to true is very low,
   _and_ the statically-estimated evaluation cost of "exp(x)"
   is both under the maximum and too close to that maximum}.

Re: [PATCHv2] [fixincludes] Ignore .DS_Store junk files when running make check

2015-07-28 Thread Mike Stump
On Jul 28, 2015, at 6:38 AM, Bruce Korb  wrote:
> Definitely much better.  I won't apply it until the weekend, so
> someone else will likely beat me to it.

Looks good to me as well, I checked it in.

Committed revision 226317.


Re: [PATCHv2] [fixincludes] Ignore .DS_Store junk files when running make check

2015-07-28 Thread Mike Stump
On Jul 27, 2015, at 7:36 PM, Eric Gallager  wrote:
> On 7/27/15, Andreas Schwab  wrote:
>> Eric Gallager  writes:
>> 
>>> Okay, I tried embedding "! -name CVS/ ! -name .svn/" into the find
>> 
>> -name does an exact match, so you don't need the slash.

> Okay, attached a new version of the patch; make check for fixincludes
> still passes with it.

If someone can test the gcc-5 branch with it and if it fixes it, I’ll approve 
it for the 5 branch as well.



Re: Re: ira.c update_equiv_regs patch causes gcc/testsuite/gcc.target/arm/pr43920-2.c regression

2015-07-28 Thread Alex Velenko

On 21/04/15 06:27, Jeff Law wrote:

On 04/20/2015 01:09 AM, Shiva Chen wrote:

Hi, Jeff

Thanks for your advice.

can_replace_by.patch is the new patch to handle both cases.

pr43920-2.c.244r.jump2.ori is the original  jump2 rtl dump

pr43920-2.c.244r.jump2.patch_can_replace_by is the jump2 rtl dump
after patch  can_replace_by.patch

Could you help me to review the patch?

Thanks.  This looks pretty good.

I expanded the comment for the new function a bit and renamed the
function in an effort to clarify its purpose.  From reviewing
can_replace_by, it seems it should have been handling this case, but
clearly wasn't due to implementation details.

I then bootstrapped and regression tested the patch on x86_64-linux-gnu
where it passed.  I also instrumented that compiler to see how often
this code triggers.  During a bootstrap it triggers a couple hundred
times (which is obviously a proxy for cross jumping improvements).  So
it's triggering regularly on x86_64, which is good.

I also verified that this fixes BZ64916 for an arm-non-eabi toolchain
configured with --with-arch=armv7.

Installed on the trunk.  No new testcase as it's covered by existing tests.

Thanks,
jeff



Hi,
I see this patch has been committed in r56 on trunk.  Is it okay to port
this to fsf-5?

Kind regards,
Alex



Re: [PING] Re: [PATCH] New configure option to default enable Smart Stack Protection

2015-07-28 Thread Magnus Granberg
On Monday, 20 July 2015 at 16:32:01, Magnus Granberg wrote:
> > Patch updated and tested on x86_64-unknown-linux-gnu (Gentoo)
> > 
> > Changlogs
> > /gcc
> > 2015-07-05  Magnus Granberg  
> > 
> > * common.opt (fstack-protector): Initialize to -1.
> > (fstack-protector-all): Likewise.
> > (fstack-protector-strong): Likewise.
> > (fstack-protector-explicit): Likewise.
> > * configure.ac: Add --enable-default-ssp.
> > * defaults.h (DEFAULT_FLAG_SSP): New.  Default SSP to strong.
> > * opts.c (finish_options): Update opts->x_flag_stack_protect if it
> > 
> > is -1. * doc/install.texi: Document --enable-default-ssp.
> > 
> > * config.in: Regenerated.
> > * configure: Likewise.
> > 
> > /testsuite
> > 2015-07-13  Magnus Granberg  
> > 
> > * lib/target-supports.exp
> > (check_effective_target_fstack_protector_enabled): New test.
> > * gcc.target/i386/ssp-default.c: New test.
> > 
> > ---
> 
> Ping
> Can this be commited to trunk?
Pinging one more time on this patch.
/Magnus G.




Re: [C/C++ PATCH] Implement -Wtautological-compare (PR c++/66555, c/54979)

2015-07-28 Thread Marek Polacek
On Tue, Jul 28, 2015 at 09:02:51AM -0700, Steve Ellcey wrote:
> Marek,
> 
> I have run into a problem with this warning and building glibc.
> 
> sysdeps/ieee754/s_matherr.c has:
> 
> int
> weak_function
> __matherr(struct exception *x)
> {
>   int n=0;
>   if(x->arg1!=x->arg1) return 0;
>   return n;
> }
> 
> 
> And arg1 is a floating point type.  I think that if the value of
> x->arg1 is a NaN then the if condition should evaluate to TRUE, because a NaN
> never compares equal to anything, even another NaN (check with your
> local IEEE expert).  I believe this method of checking for a NaN is
> fairly common and I am not sure if GCC should be emitting a warning for
> it.

Oh, you're right.  In IEEE-754, NaN != NaN.  So I need to adjust the
warning and the documentation a bit.  I suppose this is just about
using get_inner_reference and punting for FLOAT_TYPE_P (I'll try to fix
this tomorrow).
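
Roughly along these lines, I think (untested sketch):

  /* In warn_tautological_cmp, punt for floating-point operands, since
     "x != x" is a legitimate NaN test under IEEE-754.  */
  if (FLOAT_TYPE_P (TREE_TYPE (lhs)) || FLOAT_TYPE_P (TREE_TYPE (rhs)))
    return;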

This certainly didn't occur to me when I was writing the warning...
Thanks for bringing this up.

Marek


[gomp4] unify open acc level constants

2015-07-28 Thread Nathan Sidwell
I've committed this to the gomp4 branch.  It cleans up the existing OACC_foo 
constants defined in omp-low.h, replacing them with the ones I just defined in 
gomp-constants.h


nathan
2015-07-28  Nathan Sidwell  

	* omp-low.h (enum oacc_loop_levels): Delete.
	(OACC_LOOP_MASK): Delete.
	* omp-low.c: Change all OACC_x to GOMP_DIM_x
	* config/nvptx/nvptx.c: Change all OACC_x to GOMP_DIM_x
	* builtins.c (expand_oacc_id): Change OACC_HWM to GOMP_DIM_MAX.

Index: gcc/config/nvptx/nvptx.c
===
--- gcc/config/nvptx/nvptx.c	(revision 226314)
+++ gcc/config/nvptx/nvptx.c	(working copy)
@@ -1075,7 +1075,7 @@ void
 nvptx_expand_oacc_fork (rtx mode)
 {
   /* Emit fork for worker level.  */
-  if (UINTVAL (mode) == OACC_worker)
+  if (UINTVAL (mode) == GOMP_DIM_WORKER)
     emit_insn (gen_nvptx_fork (mode));
 }
 
@@ -2169,8 +2169,6 @@ nvptx_reorg_subreg ()
a NULL loop.  We should be able to extend this to represent
superblocks.  */
 
-#define OACC_null OACC_HWM
-
 struct parallel
 {
   /* Parent parallel.  */
@@ -2369,7 +2367,7 @@ typedef auto_vec bb_par_vec_t;
 static parallel *
 nvptx_discover_pars (bb_insn_map_t *map)
 {
-  parallel *outer_par = new parallel (0, OACC_null);
+  parallel *outer_par = new parallel (0, GOMP_DIM_MAX);
   bb_par_vec_t worklist;
   basic_block block;
 
@@ -2413,7 +2411,7 @@ nvptx_discover_pars (bb_insn_map_t *map)
 		l = new parallel (l, mode);
 		l->forked_block = block;
 		l->forked_insn = end;
-		if (mode == OACC_worker)
+		if (mode == GOMP_DIM_WORKER)
 		  l->fork_insn
 		= nvptx_discover_pre (block, CODE_FOR_nvptx_fork);
 	  }
@@ -2428,7 +2426,7 @@ nvptx_discover_pars (bb_insn_map_t *map)
 		gcc_assert (l->mode == mode);
 		l->join_block = block;
 		l->join_insn = end;
-		if (mode == OACC_worker)
+		if (mode == GOMP_DIM_WORKER)
 		  l->joining_insn
 		= nvptx_discover_pre (block, CODE_FOR_nvptx_joining);
 		l = l->parent;
@@ -2706,7 +2704,7 @@ nvptx_single (unsigned mask, basic_block
 	{
 	  /* If we're only doing vector single, there's no need to
 	 emit skip code because we'll not insert anything.  */
-	  if (!(mask & OACC_LOOP_MASK (OACC_vector)))
+	  if (!(mask & GOMP_DIM_MASK (GOMP_DIM_VECTOR)))
 	skip_mask = 0;
 	}
   else if (tail_branch)
@@ -2717,8 +2715,8 @@ nvptx_single (unsigned mask, basic_block
   /* Insert the vector test inside the worker test.  */
   unsigned mode;
   rtx_insn *before = tail;
-  for (mode = OACC_worker; mode <= OACC_vector; mode++)
-    if (OACC_LOOP_MASK (mode) & skip_mask)
+  for (mode = GOMP_DIM_WORKER; mode <= GOMP_DIM_VECTOR; mode++)
+    if (GOMP_DIM_MASK (mode) & skip_mask)
   {
 	rtx id = gen_reg_rtx (SImode);
 	rtx pred = gen_reg_rtx (BImode);
@@ -2728,7 +2726,7 @@ nvptx_single (unsigned mask, basic_block
 	rtx cond = gen_rtx_SET (pred, gen_rtx_NE (BImode, id, const0_rtx));
 	emit_insn_before (cond, head);
 	rtx br;
-	if (mode == OACC_vector)
+	if (mode == GOMP_DIM_VECTOR)
 	  br = gen_br_true (pred, label);
 	else
 	  br = gen_br_true_uni (pred, label);
@@ -2746,7 +2744,7 @@ nvptx_single (unsigned mask, basic_block
 {
   rtx pvar = XEXP (XEXP (cond_branch, 0), 0);
 
-  if (OACC_LOOP_MASK (OACC_vector) == mask)
+  if (GOMP_DIM_MASK (GOMP_DIM_VECTOR) == mask)
 	{
 	  /* Vector mode only, do a shuffle.  */
 	  emit_insn_before (nvptx_gen_vcast (pvar), tail);
@@ -2806,7 +2804,7 @@ nvptx_skip_par (unsigned mask, parallel
 static unsigned
 nvptx_process_pars (parallel *par)
 {
-  unsigned inner_mask = OACC_LOOP_MASK (par->mode);
+  unsigned inner_mask = GOMP_DIM_MASK (par->mode);
   
   /* Do the inner parallels first.  */
   if (par->inner)
@@ -2817,15 +2815,15 @@ nvptx_process_pars (parallel *par)
   
   switch (par->mode)
 {
-    case OACC_null:
+    case GOMP_DIM_MAX:
   /* Dummy parallel.  */
   break;
 
-    case OACC_vector:
+    case GOMP_DIM_VECTOR:
   nvptx_vpropagate (par->forked_block, par->forked_insn);
   break;
   
-    case OACC_worker:
+    case GOMP_DIM_WORKER:
   {
 	nvptx_wpropagate (false, par->forked_block,
 			  par->forked_insn);
@@ -2836,7 +2834,7 @@ nvptx_process_pars (parallel *par)
   }
   break;
 
-    case OACC_gang:
+    case GOMP_DIM_GANG:
   break;
 
 default:gcc_unreachable ();
@@ -2855,30 +2853,30 @@ nvptx_process_pars (parallel *par)
 static void
 nvptx_neuter_pars (parallel *par, unsigned modes, unsigned outer)
 {
-  unsigned me = (OACC_LOOP_MASK (par->mode)
-		 & (OACC_LOOP_MASK (OACC_worker)
-		| OACC_LOOP_MASK (OACC_vector)));
+  unsigned me = (GOMP_DIM_MASK (par->mode)
+		 & (GOMP_DIM_MASK (GOMP_DIM_WORKER)
+		| GOMP_DIM_MASK (GOMP_DIM_VECTOR)));
   unsigned  skip_mask = 0, neuter_mask = 0;
   
   if (par->inner)
 nvptx_neuter_pars (par->inner, modes, outer | me);
 
-  for (unsigned mode = OACC_worker; mode <= OACC_vector; mode++)
+  for (unsigned mode = GOMP_DIM_WORKER; mode <= GOMP_DIM_VECTOR; mode++)
 {
-  if ((outer | me) & OACC

Re: [RFC] [Patch]: Try and vectorize with shift for mult expr with power 2 integer constant.

2015-07-28 Thread Jakub Jelinek
Hi!

> This is just an initial patch and tries to optimize integer type power 2
> constants.  I wanted to get feedback on this .  I bootstrapped and reg
> tested on aarch64-none-linux-gnu .

Thanks for working on it.
ChangeLog entry for the patch is missing, probably also some testcases.

> @@ -90,6 +94,7 @@ static vect_recog_func_ptr vect_vect_recog_func_ptrs[NUM_PATTERNS] = {
>   vect_recog_rotate_pattern,
>   vect_recog_vector_vector_shift_pattern,
>   vect_recog_divmod_pattern,
> +vect_recog_multconst_pattern,
>   vect_recog_mixed_size_cond_pattern,
>   vect_recog_bool_pattern};

Please watch formatting, the other lines are tab indented, so please use a
tab rather than 8 spaces.

> @@ -2147,6 +2152,87 @@ vect_recog_vector_vector_shift_pattern (vec<gimple> *stmts,
>return pattern_stmt;
>  }
>  

Function comment is missing here.

> +static gimple
> +vect_recog_multconst_pattern (vec<gimple> *stmts,
> +   tree *type_in, tree *type_out)

About the function name, wonder if just vect_recog_mult_pattern wouldn't be
enough.

> +  rhs_code = gimple_assign_rhs_code (last_stmt);
> +  switch (rhs_code)
> +{
> +case MULT_EXPR:
> +  break;
> +default:
> +  return NULL;
> +}

This looks too weird, I'd just do
  if (gimple_assign_rhs_code (last_stmt) != MULT_EXPR)
    return NULL;
(you handle just one pattern).

> +  /* If the target can handle vectorized multiplication natively,
> + don't attempt to optimize this.  */
> +  optab = optab_for_tree_code (rhs_code, vectype, optab_default);

Supposedly you can use MULT_EXPR directly here.

> +  /* If target cannot handle vector left shift then we cannot 
> + optimize and bail out.  */ 
> +  optab = optab_for_tree_code (LSHIFT_EXPR, vectype, optab_vector);
> +  if (!optab
> +  || optab_handler (optab, TYPE_MODE (vectype)) == CODE_FOR_nothing)
> +return NULL;
> +
> +  if (integer_pow2p (oprnd1))
> +{
> +  /* Pattern detected.  */
> +  if (dump_enabled_p ())
> + dump_printf_loc (MSG_NOTE, vect_location,
> +  "vect_recog_multconst_pattern: detected:\n");
> +
> +  tree shift;
> +  shift = build_int_cst (itype, tree_log2 (oprnd1));
> +  pattern_stmt = gimple_build_assign (vect_recog_temp_ssa_var (itype, 
> NULL),
> +   LSHIFT_EXPR, oprnd0, shift);
> +  if (dump_enabled_p ())
> + dump_gimple_stmt_loc (MSG_NOTE, vect_location, TDF_SLIM, pattern_stmt,
> +  0);
> +  stmts->safe_push (last_stmt);
> +  *type_in = vectype;
> +  *type_out = vectype;
> +  return pattern_stmt;
> +} 

Trailing whitespace.
The integer_pow2p case (have you checked signed multiply by INT_MIN?)
is only one of the cases you can actually handle, you can look at
expand_mult for many other cases - e.g. multiplication by negated powers of
2, or call choose_mult_variant and handle whatever it returns.
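
E.g. for a negated power of two, something like (illustrative only):

int
mul_m8 (int x)
{
  return x * -8;  /* can be expanded as -(x << 3) instead of a multiply.  */
}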

Jakub


Re: C++ delayed folding branch review

2015-07-28 Thread Kai Tietz
2015-07-28 1:14 GMT+02:00 Kai Tietz :
> 2015-07-27 18:51 GMT+02:00 Jason Merrill :
>> I've trimmed this to the previously mentioned issues that still need to be
>> addressed; I'll do another full review after these are dealt with.
>
> Thanks for doing this summary of missing parts of prior review.
>
>> On 06/13/2015 12:15 AM, Jason Merrill wrote:
>>>
>>> On 06/12/2015 12:11 PM, Kai Tietz wrote:
>>
>> @@ -1052,6 +1054,9 @@ adjust_temp_type (tree type, tree temp)
>>   {
>> if (TREE_TYPE (temp) == type)
>>   return temp;
>> +  STRIP_NOPS (temp);
>> +  if (TREE_TYPE (temp) == type)
>> +return temp;
>> @@ -1430,6 +1438,8 @@ cxx_eval_call_expression (const constexpr_ctx
>> *ctx,
>> tree t,
>>   bool
>>   reduced_constant_expression_p (tree t)
>>   {
>> +  /* Make sure we remove useless initial NOP_EXPRs.  */
>> +  STRIP_NOPS (t);
>
>
> Within the constexpr code we should be folding away NOPs as they are
> generated, they shouldn't live this long.


 Well, we might see them on overflows ...
>>>
>>>
>>> We shouldn't within the constexpr code.  NOPs for expressions that are
>>> non-constant due to overflow are added in
>>> cxx_eval_outermost_constant_expr, so we shouldn't see them in the middle
>>> of constexpr evaluation.
>>>
>> @@ -1088,7 +1093,10 @@ cxx_bind_parameters_in_call (const constexpr_ctx
>> *ctx, tree t,
>>&& is_dummy_object (x))
>>  {
>>x = ctx->object;
>> - x = cp_build_addr_expr (x, tf_warning_or_error);
>> + if (x)
>> +   x = cp_build_addr_expr (x, tf_warning_or_error);
>> + else
>> +   x = get_nth_callarg (t, i);
>
>
> This still should not be necessary.


 Yeah, most likely.  But I got initially here some issues, so I don't
 see that this code would worsen things.
>>>
>>>
>>> If this code path is hit, that means something has broken my design, and
>>> I don't want to just paper over that.  Please revert this change.
>>>
>>   case SIZEOF_EXPR:
>> +  if (processing_template_decl
>> + && (!COMPLETE_TYPE_P (TREE_TYPE (t))
>> + || TREE_CODE (TYPE_SIZE (TREE_TYPE (t))) != INTEGER_CST))
>> +   return t;
>
>
> Why is this necessary?


 We don't want to resolve SIZEOF_EXPR within template-declarations for
 incomplete types, of if its size isn't fixed.  Issue is that we
 otherwise get issues about expressions without existing type (as usual
 within template-declarations for some expressions).
>>>
>>>
>>> Yes, but we shouldn't have gotten this far with a dependent sizeof;
>>> maybe_constant_value just returns if
>>> instantiation_dependent_expression_p is true.
>>>
>> @@ -3391,8 +3431,23 @@ cxx_eval_constant_expression (const
>> constexpr_ctx
>> *ctx, tree t,
>>   case CONVERT_EXPR:
>>   case VIEW_CONVERT_EXPR:
>>   case NOP_EXPR:
>> +case UNARY_PLUS_EXPR:
>> {
>> +   enum tree_code tcode = TREE_CODE (t);
>>  tree oldop = TREE_OPERAND (t, 0);
>> +
>> +   if (tcode == NOP_EXPR && TREE_TYPE (t) == TREE_TYPE (oldop) &&
>> TREE_OVERFLOW_P (oldop))
>> + {
>> +   if (!ctx->quiet)
>> + permerror (input_location, "overflow in constant
>> expression");
>> +   /* If we're being permissive (and are in an enforcing
>> +   context), ignore the overflow.  */
>> +   if (!flag_permissive)
>> + *overflow_p = true;
>> +   *non_constant_p = true;
>> +
>> +   return t;
>> + }
>>  tree op = cxx_eval_constant_expression (ctx, oldop,
>
>
> Why doesn't the call to cxx_eval_constant_expression at the bottom here
> handle oldop having TREE_OVERFLOW set?


 I just handled the case that we see here a wrapping NOP_EXPR around an
 overflow.  As this isn't handled by cxx_eval_constant_expression.
>>>
>>>
>>> How does it need to be handled?  A NOP_EXPR wrapped around an overflow
>>> is there to indicated that the expression is non-constant, and it can't
>>> be simplified any farther.
>>>
>>> Please give an example of what was going wrong.
>>>
>> @@ -565,6 +571,23 @@ cp_gimplify_expr (tree *expr_p, gimple_seq *pre_p,
>> gimple_seq *post_p)
>>
>> switch (code)
>>   {
>> +case SIZEOF_EXPR:
>> +  if (SIZEOF_EXPR_TYPE_P (*expr_p))
>> +   *expr_p = cxx_sizeof_or_alignof_type (TREE_TYPE (TREE_OPERAND
>> (*expr_p,
>> +
>> 0)),
>> + SIZEOF_EXPR, false);
>> +  else if (TYPE_P (TREE_OPERAND (*expr_p, 0)))
>> +   *expr_p = cxx_sizeof_or_alignof_type (TREE_OPERAND (*expr_p,
>> 0),
>> + 

Re: [PATCH 0/9] start converting POINTER_SIZE to a hook

2015-07-28 Thread Richard Sandiford
Trevor Saunders  writes:
> On Mon, Jul 27, 2015 at 09:05:08PM +0100, Richard Sandiford wrote:
>> Alternatively we could have a new target_globals structure that is
>> initialised with the result of calling the hook.  If we do that though,
>> it might make sense to consolidate the hooks rather than have one for
>> every value.  E.g. having one function for UNITS_PER_WORD, one for
>> POINTER_SIZE, one for Pmode, etc., would lead to some very verbose
>> target code.
>
> so something like
>
> struct target_types
> {
>   unsigned long pointer_size;
>   ...
> };
>
> const target_types &targetm.get_type_data ()
>
> ? that seems pretty reasonable, and I wouldn't expect too many ordering
> issues, but who knows.  Its too bad nobody has taken on the big job of
> turning targetm into a class so we can hope for some devirt help from
> the compiler.

I was thinking more:

  void targetm.get_type_data (target_types *);

The caller could then initialise or post-process the defaults.  The
target_types would eventually end up in some target_globals structure.
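
In other words, roughly (a sketch; the field and function names are
made up):

struct target_types
{
  unsigned int pointer_size;
  unsigned int units_per_word;
};

/* Each target fills in its raw defaults...  */
static void
example_get_type_data (target_types *t)
{
  t->pointer_size = 64;
  t->units_per_word = 8;
}

/* ...and target-independent code post-processes the result and caches
   it in a target_globals structure.  */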

Thanks,
Richard


Re: [PATCH][1/N] Change GET_MODE_INNER to always return a non-void mode

2015-07-28 Thread Richard Sandiford
"David Sherwood"  writes:
> Hi,
>
> I have updated the comment above GET_MODE_INNER and while there I have
> fixed a spelling mistake in the comment above GET_MODE_UNIT_SIZE.
>
> Tested:
> aarch64 and aarch64_be - no regressions in gcc testsuite
> x86_64 - bootstrap build, no testsuite regressions
> arm-none-eabi - no regressions in gcc testsuite
> Run contrib/config-list.mk - only build failures are ones that fail anyway 
> with
> warnings being treated as errors.
>
> Hope this is ok.
>
> Cheers,
> Dave.

Since Jeff conditionally approved the patch, I went ahead and applied it.

Thanks,
Richard


[committed] Use target-insns.def for indirect_jump

2015-07-28 Thread Richard Sandiford
Continuing after a break for the fr30 patch...

Bootstrapped & regression-tested on x86_64-linux-gnu and aarch64-linux-gnu.
Also tested via config-list.mk.  Committed as preapproved.

Thanks,
Richard


gcc/
* target-insns.def (indirect_jump): New targetm instruction pattern.
* optabs.c (emit_indirect_jump): Use it instead of HAVE_*/gen_*
interface.

Index: gcc/target-insns.def
===
--- gcc/target-insns.def2015-07-28 20:34:48.452276705 +0100
+++ gcc/target-insns.def2015-07-28 20:34:48.444276797 +0100
@@ -44,6 +44,7 @@ DEF_TARGET_INSN (epilogue, (void))
 DEF_TARGET_INSN (exception_receiver, (void))
 DEF_TARGET_INSN (extv, (rtx x0, rtx x1, rtx x2, rtx x3))
 DEF_TARGET_INSN (extzv, (rtx x0, rtx x1, rtx x2, rtx x3))
+DEF_TARGET_INSN (indirect_jump, (rtx x0))
 DEF_TARGET_INSN (insv, (rtx x0, rtx x1, rtx x2, rtx x3))
 DEF_TARGET_INSN (jump, (rtx x0))
 DEF_TARGET_INSN (load_multiple, (rtx x0, rtx x1, rtx x2))
Index: gcc/optabs.c
===
--- gcc/optabs.c2015-07-28 20:34:48.452276705 +0100
+++ gcc/optabs.c2015-07-28 20:34:48.448276751 +0100
@@ -4484,16 +4484,15 @@ prepare_float_lib_cmp (rtx x, rtx y, enu
 /* Generate code to indirectly jump to a location given in the rtx LOC.  */
 
 void
-emit_indirect_jump (rtx loc ATTRIBUTE_UNUSED)
+emit_indirect_jump (rtx loc)
 {
-#ifndef HAVE_indirect_jump
-  sorry ("indirect jumps are not available on this target");
-#else
+  if (!targetm.have_indirect_jump ())
+    sorry ("indirect jumps are not available on this target");
+
   struct expand_operand ops[1];
   create_address_operand (&ops[0], loc);
-  expand_jump_insn (CODE_FOR_indirect_jump, 1, ops);
+  expand_jump_insn (targetm.code_for_indirect_jump, 1, ops);
   emit_barrier ();
-#endif
 }
 
 



[committed] Use target-insns.def for eh_return

2015-07-28 Thread Richard Sandiford
The thread_prologue_and_epilogue_insns code that used to be protected
by #ifdef HAVE_eh_return is unconditionally correct and isn't on any
kind of hot path (it's only run once per function).

Bootstrapped & regression-tested on x86_64-linux-gnu and aarch64-linux-gnu.
Also tested via config-list.mk.  Committed as preapproved.

Thanks,
Richard


gcc/
* target-insns.def (eh_return): New targetm instruction pattern.
* except.c (expand_eh_return): Use it instead of HAVE_*/gen_*
interface.
* function.c (thread_prologue_and_epilogue_insns): Remove
preprocessor condition.

Index: gcc/target-insns.def
===
--- gcc/target-insns.def2015-06-28 12:00:27.698196372 +0100
+++ gcc/target-insns.def2015-06-28 12:00:31.448043556 +0100
@@ -39,6 +39,7 @@ DEF_TARGET_INSN (check_stack, (rtx x0))
 DEF_TARGET_INSN (clear_cache, (rtx x0, rtx x1))
 DEF_TARGET_INSN (doloop_begin, (rtx x0, rtx x1))
 DEF_TARGET_INSN (doloop_end, (rtx x0, rtx x1))
+DEF_TARGET_INSN (eh_return, (rtx x0))
 DEF_TARGET_INSN (epilogue, (void))
 DEF_TARGET_INSN (exception_receiver, (void))
 DEF_TARGET_INSN (extv, (rtx x0, rtx x1, rtx x2, rtx x3))
Index: gcc/except.c
===
--- gcc/except.c2015-06-28 12:00:27.698196372 +0100
+++ gcc/except.c2015-06-28 12:00:31.449043515 +0100
@@ -2271,11 +2271,9 @@ expand_eh_return (void)
   emit_move_insn (EH_RETURN_STACKADJ_RTX, crtl->eh.ehr_stackadj);
 #endif
 
-#ifdef HAVE_eh_return
-  if (HAVE_eh_return)
-    emit_insn (gen_eh_return (crtl->eh.ehr_handler));
+  if (targetm.have_eh_return ())
+    emit_insn (targetm.gen_eh_return (crtl->eh.ehr_handler));
   else
-#endif
 {
 #ifdef EH_RETURN_HANDLER_RTX
   emit_move_insn (EH_RETURN_HANDLER_RTX, crtl->eh.ehr_handler);
Index: gcc/function.c
===
--- gcc/function.c  2015-06-28 12:00:27.698196372 +0100
+++ gcc/function.c  2015-06-28 12:00:31.447043596 +0100
@@ -5963,7 +5963,6 @@ thread_prologue_and_epilogue_insns (void
  uses the flag in the meantime.  */
   epilogue_completed = 1;
 
-#ifdef HAVE_eh_return
   /* Find non-fallthru edges that end with EH_RETURN instructions.  On
  some targets, these get split to a special version of the epilogue
  code.  In order to be able to properly annotate these with unwind
@@ -5987,7 +5986,6 @@ thread_prologue_and_epilogue_insns (void
   record_insns (NEXT_INSN (prev), NEXT_INSN (trial), &epilogue_insn_hash);
   emit_note_after (NOTE_INSN_EPILOGUE_BEG, prev);
 }
-#endif
 
   /* If nothing falls through into the exit block, we don't need an
  epilogue.  */



[committed] Use target-insns.def for can_extend and ptr_extend

2015-07-28 Thread Richard Sandiford
Bootstrapped & regression-tested on x86_64-linux-gnu and aarch64-linux-gnu.
Also tested via config-list.mk.  Committed as preapproved.

Thanks,
Richard


gcc/
* target-insns.def (can_extend, ptr_extend): New targetm instruction
patterns.
* optabs.c (can_extend_p): Use them instead of HAVE_*/gen_* interface.
* simplify-rtx.c (simplify_unary_operation_1): Likewise.
* emit-rtl.c (set_reg_attrs_from_value): Likewise.
* rtlanal.c (nonzero_bits1): Likewise.
(num_sign_bit_copies1): Likewise.

Index: gcc/target-insns.def
===
--- gcc/target-insns.def2015-07-28 20:56:29.721512028 +0100
+++ gcc/target-insns.def2015-07-28 20:56:29.713512127 +0100
@@ -34,6 +34,7 @@ DEF_TARGET_INSN (allocate_stack, (rtx x0
 DEF_TARGET_INSN (builtin_longjmp, (rtx x0))
 DEF_TARGET_INSN (builtin_setjmp_receiver, (rtx x0))
 DEF_TARGET_INSN (builtin_setjmp_setup, (rtx x0))
+DEF_TARGET_INSN (can_extend, (rtx x0, rtx x1))
 DEF_TARGET_INSN (canonicalize_funcptr_for_compare, (rtx x0, rtx x1))
 DEF_TARGET_INSN (casesi, (rtx x0, rtx x1, rtx x2, rtx x3, rtx x4))
 DEF_TARGET_INSN (check_stack, (rtx x0))
@@ -58,6 +59,7 @@ DEF_TARGET_INSN (prefetch, (rtx x0, rtx
 DEF_TARGET_INSN (probe_stack, (rtx x0))
 DEF_TARGET_INSN (probe_stack_address, (rtx x0))
 DEF_TARGET_INSN (prologue, (void))
+DEF_TARGET_INSN (ptr_extend, (rtx x0, rtx x1))
 DEF_TARGET_INSN (restore_stack_block, (rtx x0, rtx x1))
 DEF_TARGET_INSN (restore_stack_function, (rtx x0, rtx x1))
 DEF_TARGET_INSN (restore_stack_nonlocal, (rtx x0, rtx x1))
Index: gcc/optabs.c
===
--- gcc/optabs.c2015-07-28 20:56:29.721512028 +0100
+++ gcc/optabs.c2015-07-28 20:56:29.709512177 +0100
@@ -4851,10 +4851,8 @@ can_extend_p (machine_mode to_mode, mach
  int unsignedp)
 {
   convert_optab tab;
-#ifdef HAVE_ptr_extend
-  if (unsignedp < 0)
-    return CODE_FOR_ptr_extend;
-#endif
+  if (unsignedp < 0 && targetm.have_ptr_extend ())
+    return targetm.code_for_ptr_extend;
 
   tab = unsignedp ? zext_optab : sext_optab;
   return convert_optab_handler (tab, to_mode, from_mode);
Index: gcc/simplify-rtx.c
===
--- gcc/simplify-rtx.c  2015-07-28 20:56:29.721512028 +0100
+++ gcc/simplify-rtx.c  2015-07-28 20:56:29.713512127 +0100
@@ -1431,7 +1431,7 @@ simplify_unary_operation_1 (enum rtx_cod
}
}
 
-#if defined(POINTERS_EXTEND_UNSIGNED) && !defined(HAVE_ptr_extend)
+#if defined(POINTERS_EXTEND_UNSIGNED)
   /* As we do not know which address space the pointer is referring to,
 we can do this only if the target does not support different pointer
 or address modes depending on the address space.  */
@@ -1442,7 +1442,8 @@ simplify_unary_operation_1 (enum rtx_cod
  || (GET_CODE (op) == SUBREG
  && REG_P (SUBREG_REG (op))
  && REG_POINTER (SUBREG_REG (op))
- && GET_MODE (SUBREG_REG (op)) == Pmode)))
+ && GET_MODE (SUBREG_REG (op)) == Pmode))
+ && !targetm.have_ptr_extend ())
return convert_memory_address (Pmode, op);
 #endif
   break;
@@ -1552,7 +1553,7 @@ simplify_unary_operation_1 (enum rtx_cod
 GET_MODE (SUBREG_REG (op)));
}
 
-#if defined(POINTERS_EXTEND_UNSIGNED) && !defined(HAVE_ptr_extend)
+#if defined(POINTERS_EXTEND_UNSIGNED)
   /* As we do not know which address space the pointer is referring to,
 we can do this only if the target does not support different pointer
 or address modes depending on the address space.  */
@@ -1563,7 +1564,8 @@ simplify_unary_operation_1 (enum rtx_cod
  || (GET_CODE (op) == SUBREG
  && REG_P (SUBREG_REG (op))
  && REG_POINTER (SUBREG_REG (op))
- && GET_MODE (SUBREG_REG (op)) == Pmode)))
+ && GET_MODE (SUBREG_REG (op)) == Pmode))
+ && !targetm.have_ptr_extend ())
return convert_memory_address (Pmode, op);
 #endif
   break;
Index: gcc/emit-rtl.c
===
--- gcc/emit-rtl.c  2015-07-28 20:56:29.721512028 +0100
+++ gcc/emit-rtl.c  2015-07-28 20:56:29.713512127 +0100
@@ -1159,9 +1159,10 @@ set_reg_attrs_from_value (rtx reg, rtx x
 || GET_CODE (x) == TRUNCATE
 || (GET_CODE (x) == SUBREG && subreg_lowpart_p (x)))
 {
-#if defined(POINTERS_EXTEND_UNSIGNED) && !defined(HAVE_ptr_extend)
-  if ((GET_CODE (x) == SIGN_EXTEND && POINTERS_EXTEND_UNSIGNED)
- || (GET_CODE (x) != SIGN_EXTEND && ! POINTERS_EXTEND_UNSIGNED))
+#if defined(POINTERS_EXTEND_UNSIGNED)
+  if (((GET_CODE (x) == SIGN_EXTEND && POINTERS_EXTEND_UNSIGNED)
+  || (GET_CODE (x) != SIGN_EXTEND && ! POINTERS_EXTEND_UNSIGNED))
+ && !targetm.have_

[committed] Use target-insns.def for atomic_test_and_set

2015-07-28 Thread Richard Sandiford
Bootstrapped & regression-tested on x86_64-linux-gnu and aarch64-linux-gnu.
Also tested via config-list.mk.  Committed as preapproved.

Thanks,
Richard


gcc/
* target-insns.def (atomic_test_and_set): New targetm instruction
pattern.
* optabs.c (maybe_emit_atomic_test_and_set): Use it instead of
HAVE_*/gen_* interface.

Index: gcc/target-insns.def
===
--- gcc/target-insns.def2015-07-28 21:00:09.815019853 +0100
+++ gcc/target-insns.def2015-07-28 21:00:09.811019905 +0100
@@ -31,6 +31,7 @@
 
Instructions should be documented in md.texi rather than here.  */
 DEF_TARGET_INSN (allocate_stack, (rtx x0, rtx x1))
+DEF_TARGET_INSN (atomic_test_and_set, (rtx x0, rtx x1, rtx x2))
 DEF_TARGET_INSN (builtin_longjmp, (rtx x0))
 DEF_TARGET_INSN (builtin_setjmp_receiver, (rtx x0))
 DEF_TARGET_INSN (builtin_setjmp_setup, (rtx x0))
Index: gcc/optabs.c
===
--- gcc/optabs.c2015-07-28 21:00:09.815019853 +0100
+++ gcc/optabs.c2015-07-28 21:00:09.811019905 +0100
@@ -7258,35 +7258,30 @@ maybe_emit_compare_and_swap_exchange_loo
using the atomic_test_and_set instruction pattern.  A boolean value
is returned from the operation, using TARGET if possible.  */
 
-#ifndef HAVE_atomic_test_and_set
-#define HAVE_atomic_test_and_set 0
-#define CODE_FOR_atomic_test_and_set CODE_FOR_nothing
-#endif
-
 static rtx
 maybe_emit_atomic_test_and_set (rtx target, rtx mem, enum memmodel model)
 {
   machine_mode pat_bool_mode;
   struct expand_operand ops[3];
 
-  if (!HAVE_atomic_test_and_set)
+  if (!targetm.have_atomic_test_and_set ())
     return NULL_RTX;
 
   /* While we always get QImode from __atomic_test_and_set, we get
  other memory modes from __sync_lock_test_and_set.  Note that we
  use no endian adjustment here.  This matches the 4.6 behavior
  in the Sparc backend.  */
-  gcc_checking_assert
-(insn_data[CODE_FOR_atomic_test_and_set].operand[1].mode == QImode);
+  enum insn_code icode = targetm.code_for_atomic_test_and_set;
+  gcc_checking_assert (insn_data[icode].operand[1].mode == QImode);
   if (GET_MODE (mem) != QImode)
     mem = adjust_address_nv (mem, QImode, 0);
 
-  pat_bool_mode = insn_data[CODE_FOR_atomic_test_and_set].operand[0].mode;
+  pat_bool_mode = insn_data[icode].operand[0].mode;
   create_output_operand (&ops[0], target, pat_bool_mode);
   create_fixed_operand (&ops[1], mem);
   create_integer_operand (&ops[2], model);
 
-  if (maybe_expand_insn (CODE_FOR_atomic_test_and_set, 3, ops))
+  if (maybe_expand_insn (icode, 3, ops))
     return ops[0].value;
   return NULL_RTX;
 }



[committed] Use target-insns.def for reload_load_address

2015-07-28 Thread Richard Sandiford
Bootstrapped & regression-tested on x86_64-linux-gnu and aarch64-linux-gnu.
Also tested via config-list.mk.  Committed as preapproved.

Thanks,
Richard


gcc/
* target-insns.def (reload_load_address): New targetm instruction
pattern.
* reload1.c (gen_reload): Use it instead of HAVE_*/gen_* interface.

Index: gcc/target-insns.def
===
--- gcc/target-insns.def2015-06-28 12:29:34.245633312 +0100
+++ gcc/target-insns.def2015-06-28 12:30:58.298088971 +0100
@@ -61,6 +61,7 @@ DEF_TARGET_INSN (probe_stack, (rtx x0))
 DEF_TARGET_INSN (probe_stack_address, (rtx x0))
 DEF_TARGET_INSN (prologue, (void))
 DEF_TARGET_INSN (ptr_extend, (rtx x0, rtx x1))
+DEF_TARGET_INSN (reload_load_address, (rtx x0, rtx x1))
 DEF_TARGET_INSN (restore_stack_block, (rtx x0, rtx x1))
 DEF_TARGET_INSN (restore_stack_function, (rtx x0, rtx x1))
 DEF_TARGET_INSN (restore_stack_nonlocal, (rtx x0, rtx x1))
Index: gcc/reload1.c
===
--- gcc/reload1.c   2015-06-27 17:11:38.224086135 +0100
+++ gcc/reload1.c   2015-06-28 12:29:34.245633312 +0100
@@ -8813,10 +8813,8 @@ gen_reload (rtx out, rtx in, int opnum,
   mark_jump_label (in, tem, 0);
 }
 
-#ifdef HAVE_reload_load_address
-  else if (HAVE_reload_load_address)
-    emit_insn (gen_reload_load_address (out, in));
-#endif
+  else if (targetm.have_reload_load_address ())
+    emit_insn (targetm.gen_reload_load_address (out, in));
 
   /* Otherwise, just write (set OUT IN) and hope for the best.  */
   else



Re: [committed] Use target-insns.def for indirect_jump

2015-07-28 Thread Andrew Pinski
On Tue, Jul 28, 2015 at 1:35 PM, Richard Sandiford
 wrote:
> Continuing after a break for the fr30 patch...
>
> Bootstrapped & regression-tested on x86_64-linux-gnu and aarch64-linux-gnu.
> Also tested via config-list.mk.  Committed as preapproved.
>
> Thanks,
> Richard
>
>
> gcc/
> * target-insns.def (indirect_jump): New targetm instruction pattern.
> * optabs.c (emit_indirect_jump): Use it instead of HAVE_*/gen_*
> interface.
>
> Index: gcc/target-insns.def
> ===
> --- gcc/target-insns.def2015-07-28 20:34:48.452276705 +0100
> +++ gcc/target-insns.def2015-07-28 20:34:48.444276797 +0100
> @@ -44,6 +44,7 @@ DEF_TARGET_INSN (epilogue, (void))
>  DEF_TARGET_INSN (exception_receiver, (void))
>  DEF_TARGET_INSN (extv, (rtx x0, rtx x1, rtx x2, rtx x3))
>  DEF_TARGET_INSN (extzv, (rtx x0, rtx x1, rtx x2, rtx x3))
> +DEF_TARGET_INSN (indirect_jump, (rtx x0))
>  DEF_TARGET_INSN (insv, (rtx x0, rtx x1, rtx x2, rtx x3))
>  DEF_TARGET_INSN (jump, (rtx x0))
>  DEF_TARGET_INSN (load_multiple, (rtx x0, rtx x1, rtx x2))
> Index: gcc/optabs.c
> ===
> --- gcc/optabs.c2015-07-28 20:34:48.452276705 +0100
> +++ gcc/optabs.c2015-07-28 20:34:48.448276751 +0100
> @@ -4484,16 +4484,15 @@ prepare_float_lib_cmp (rtx x, rtx y, enu
>  /* Generate code to indirectly jump to a location given in the rtx LOC.  */
>
>  void
> -emit_indirect_jump (rtx loc ATTRIBUTE_UNUSED)
> +emit_indirect_jump (rtx loc)
>  {
> -#ifndef HAVE_indirect_jump
> -  sorry ("indirect jumps are not available on this target");
> -#else
> +  if (!targetm.have_indirect_jump ())
> +sorry ("indirect jumps are not available on this target");

Hmm, can you make sure the if gets predicted as not taken in this
case?  Not that it is going to matter much, as indirect jumps are few
and far between anyway.

Thanks,
Andrew

> +
>struct expand_operand ops[1];
>create_address_operand (&ops[0], loc);
> -  expand_jump_insn (CODE_FOR_indirect_jump, 1, ops);
> +  expand_jump_insn (targetm.code_for_indirect_jump, 1, ops);
>emit_barrier ();
> -#endif
>  }
>
>
>


Re: [committed] Use target-insns.def for atomic_test_and_set

2015-07-28 Thread Andrew Pinski
On Tue, Jul 28, 2015 at 1:36 PM, Richard Sandiford
 wrote:
> Bootstrapped & regression-tested on x86_64-linux-gnu and aarch64-linux-gnu.
> Also tested via config-list.mk.  Committed as preapproved.
>
> Thanks,
> Richard
>
>
> gcc/
> * target-insns.def (atomic_test_and_set): New targetm instruction
> pattern.
> * optabs.c (maybe_emit_atomic_test_and_set): Use it instead of
> HAVE_*/gen_* interface.
>
> Index: gcc/target-insns.def
> ===
> --- gcc/target-insns.def2015-07-28 21:00:09.815019853 +0100
> +++ gcc/target-insns.def2015-07-28 21:00:09.811019905 +0100
> @@ -31,6 +31,7 @@
>
> Instructions should be documented in md.texi rather than here.  */
>  DEF_TARGET_INSN (allocate_stack, (rtx x0, rtx x1))
> +DEF_TARGET_INSN (atomic_test_and_set, (rtx x0, rtx x1, rtx x2))
>  DEF_TARGET_INSN (builtin_longjmp, (rtx x0))
>  DEF_TARGET_INSN (builtin_setjmp_receiver, (rtx x0))
>  DEF_TARGET_INSN (builtin_setjmp_setup, (rtx x0))
> Index: gcc/optabs.c
> ===
> --- gcc/optabs.c2015-07-28 21:00:09.815019853 +0100
> +++ gcc/optabs.c2015-07-28 21:00:09.811019905 +0100
> @@ -7258,35 +7258,30 @@ maybe_emit_compare_and_swap_exchange_loo
> using the atomic_test_and_set instruction pattern.  A boolean value
> is returned from the operation, using TARGET if possible.  */
>
> -#ifndef HAVE_atomic_test_and_set
> -#define HAVE_atomic_test_and_set 0
> -#define CODE_FOR_atomic_test_and_set CODE_FOR_nothing
> -#endif
> -
>  static rtx
>  maybe_emit_atomic_test_and_set (rtx target, rtx mem, enum memmodel model)
>  {
>machine_mode pat_bool_mode;
>struct expand_operand ops[3];
>
> -  if (!HAVE_atomic_test_and_set)
> +  if (!targetm.have_atomic_test_and_set ())
>  return NULL_RTX;

I know this was not there before, but this if should be marked as
unlikely: most targets where someone is using __atomic_*/__sync_*
will have those patterns.

Thanks,
Andrew Pinski


>
>/* While we always get QImode from __atomic_test_and_set, we get
>   other memory modes from __sync_lock_test_and_set.  Note that we
>   use no endian adjustment here.  This matches the 4.6 behavior
>   in the Sparc backend.  */
> -  gcc_checking_assert
> -(insn_data[CODE_FOR_atomic_test_and_set].operand[1].mode == QImode);
> +  enum insn_code icode = targetm.code_for_atomic_test_and_set;
> +  gcc_checking_assert (insn_data[icode].operand[1].mode == QImode);
>if (GET_MODE (mem) != QImode)
>  mem = adjust_address_nv (mem, QImode, 0);
>
> -  pat_bool_mode = insn_data[CODE_FOR_atomic_test_and_set].operand[0].mode;
> +  pat_bool_mode = insn_data[icode].operand[0].mode;
>create_output_operand (&ops[0], target, pat_bool_mode);
>create_fixed_operand (&ops[1], mem);
>create_integer_operand (&ops[2], model);
>
> -  if (maybe_expand_insn (CODE_FOR_atomic_test_and_set, 3, ops))
> +  if (maybe_expand_insn (icode, 3, ops))
>  return ops[0].value;
>return NULL_RTX;
>  }
>


Re: [committed] Use target-insns.def for atomic_test_and_set

2015-07-28 Thread Richard Sandiford
Andrew Pinski  writes:
> On Tue, Jul 28, 2015 at 1:36 PM, Richard Sandiford
>  wrote:
>> Bootstrapped & regression-tested on x86_64-linux-gnu and aarch64-linux-gnu.
>> Also tested via config-list.mk.  Committed as preapproved.
>>
>> Thanks,
>> Richard
>>
>>
>> gcc/
>> * target-insns.def (atomic_test_and_set): New targetm instruction
>> pattern.
>> * optabs.c (maybe_emit_atomic_test_and_set): Use it instead of
>> HAVE_*/gen_* interface.
>>
>> Index: gcc/target-insns.def
>> ===
>> --- gcc/target-insns.def2015-07-28 21:00:09.815019853 +0100
>> +++ gcc/target-insns.def2015-07-28 21:00:09.811019905 +0100
>> @@ -31,6 +31,7 @@
>>
>> Instructions should be documented in md.texi rather than here.  */
>>  DEF_TARGET_INSN (allocate_stack, (rtx x0, rtx x1))
>> +DEF_TARGET_INSN (atomic_test_and_set, (rtx x0, rtx x1, rtx x2))
>>  DEF_TARGET_INSN (builtin_longjmp, (rtx x0))
>>  DEF_TARGET_INSN (builtin_setjmp_receiver, (rtx x0))
>>  DEF_TARGET_INSN (builtin_setjmp_setup, (rtx x0))
>> Index: gcc/optabs.c
>> ===
>> --- gcc/optabs.c2015-07-28 21:00:09.815019853 +0100
>> +++ gcc/optabs.c2015-07-28 21:00:09.811019905 +0100
>> @@ -7258,35 +7258,30 @@ maybe_emit_compare_and_swap_exchange_loo
>> using the atomic_test_and_set instruction pattern.  A boolean value
>> is returned from the operation, using TARGET if possible.  */
>>
>> -#ifndef HAVE_atomic_test_and_set
>> -#define HAVE_atomic_test_and_set 0
>> -#define CODE_FOR_atomic_test_and_set CODE_FOR_nothing
>> -#endif
>> -
>>  static rtx
>>  maybe_emit_atomic_test_and_set (rtx target, rtx mem, enum memmodel model)
>>  {
>>machine_mode pat_bool_mode;
>>struct expand_operand ops[3];
>>
>> -  if (!HAVE_atomic_test_and_set)
>> +  if (!targetm.have_atomic_test_and_set ())
>>  return NULL_RTX;
>
> I know this was not there before, but this if should be marked as
> unlikely: most targets where someone is using __atomic_*/__sync_*
> will have those patterns.

I think that'd be premature optimisation.  The path being guarded here
generates new rtl instructions, which is a much more expensive operation
than a mispredicted branch.

Thanks,
Richard



Re: [committed] Use target-insns.def for atomic_test_and_set

2015-07-28 Thread Andrew Pinski
On Tue, Jul 28, 2015 at 3:10 PM, Richard Sandiford
 wrote:
> Andrew Pinski  writes:
>> On Tue, Jul 28, 2015 at 1:36 PM, Richard Sandiford
>>  wrote:
>>> Bootstrapped & regression-tested on x86_64-linux-gnu and aarch64-linux-gnu.
>>> Also tested via config-list.mk.  Committed as preapproved.
>>>
>>> Thanks,
>>> Richard
>>>
>>>
>>> gcc/
>>> * target-insns.def (atomic_test_and_set): New targetm instruction
>>> pattern.
>>> * optabs.c (maybe_emit_atomic_test_and_set): Use it instead of
>>> HAVE_*/gen_* interface.
>>>
>>> Index: gcc/target-insns.def
>>> ===
>>> --- gcc/target-insns.def2015-07-28 21:00:09.815019853 +0100
>>> +++ gcc/target-insns.def2015-07-28 21:00:09.811019905 +0100
>>> @@ -31,6 +31,7 @@
>>>
>>> Instructions should be documented in md.texi rather than here.  */
>>>  DEF_TARGET_INSN (allocate_stack, (rtx x0, rtx x1))
>>> +DEF_TARGET_INSN (atomic_test_and_set, (rtx x0, rtx x1, rtx x2))
>>>  DEF_TARGET_INSN (builtin_longjmp, (rtx x0))
>>>  DEF_TARGET_INSN (builtin_setjmp_receiver, (rtx x0))
>>>  DEF_TARGET_INSN (builtin_setjmp_setup, (rtx x0))
>>> Index: gcc/optabs.c
>>> ===
>>> --- gcc/optabs.c2015-07-28 21:00:09.815019853 +0100
>>> +++ gcc/optabs.c2015-07-28 21:00:09.811019905 +0100
>>> @@ -7258,35 +7258,30 @@ maybe_emit_compare_and_swap_exchange_loo
>>> using the atomic_test_and_set instruction pattern.  A boolean value
>>> is returned from the operation, using TARGET if possible.  */
>>>
>>> -#ifndef HAVE_atomic_test_and_set
>>> -#define HAVE_atomic_test_and_set 0
>>> -#define CODE_FOR_atomic_test_and_set CODE_FOR_nothing
>>> -#endif
>>> -
>>>  static rtx
>>>  maybe_emit_atomic_test_and_set (rtx target, rtx mem, enum memmodel model)
>>>  {
>>>machine_mode pat_bool_mode;
>>>struct expand_operand ops[3];
>>>
>>> -  if (!HAVE_atomic_test_and_set)
>>> +  if (!targetm.have_atomic_test_and_set ())
>>>  return NULL_RTX;
>>
>> I know this was not there before, but this if should be marked as
>> unlikely: most targets where someone is using __atomic_*/__sync_*
>> will have those patterns.
>
> I think that'd be premature optimisation.  The path being guarded here
> generates new rtl instructions, which is a much more expensive operation
> than a mispredicted branch.

It may be true that the rest is more expensive, but the common path
goes through here.
It is not just about the mispredicted branch; it is more about the
icache miss.

Thanks,
Andrew


>
> Thanks,
> Richard
>


[RFC PATCH] parse #pragma GCC diagnostic in libcpp

2015-07-28 Thread Manuel López-Ibáñez
Currently, #pragma GCC diagnostic is handled entirely by the FE. This
has several drawbacks:

* PR c++/53431 - C++ preprocessor ignores #pragma GCC diagnostic: The
C++ parser lexes (and preprocesses) before handling the pragmas.

* PR 53920 - "gcc -E" does not honor #pragma GCC diagnostic ignored
"-Wunused-macro": Because -E does not invoke the FE code that parses
the FE pragmas.

* PR 64698 - preprocessor ignores #pragma GCC diagnostic when using
-save-temps. Same issue as above.

The following patch moves the handling of #pragma GCC diagnostic to
libcpp but keeps the interface with the diagnostic machinery in the FE
by using a call-back function.

One serious problem with this approach is that the preprocessor deletes
the pragmas from the preprocessed output; thus the output of '-E' or
'-save-temps' will not contain the pragmas, and compiling the
preprocessed file will trigger the warnings they were meant to
suppress.  Any ideas how to prevent libcpp from deleting the #pragmas?
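
To make the problem concrete, a minimal (made-up) test case showing
what PR 53920 and PR 64698 are about:

  /* t.c -- illustration only.  Compiling this directly is clean, but
     "gcc -E t.c" or -save-temps drops the pragma from the preprocessed
     output, so compiling that output with -Wunused-macros warns about
     NEVER_USED again.  */
  #pragma GCC diagnostic ignored "-Wunused-macros"
  #define NEVER_USED 1
  int main (void) { return 0; }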

No ChangeLog, since this is not a request for approval, but comments are welcome.

Cheers,

Manuel.
Index: gcc/c-family/c-opts.c
===
--- gcc/c-family/c-opts.c   (revision 226219)
+++ gcc/c-family/c-opts.c   (working copy)
@@ -969,10 +969,11 @@ c_common_post_options (const char **pfil
 }
 
   cb = cpp_get_callbacks (parse_in);
   cb->file_change = cb_file_change;
   cb->dir_change = cb_dir_change;
+  cb->handle_pragma_diagnostic = cb_handle_pragma_diagnostic;
   cpp_post_options (parse_in);
   init_global_opts_from_cpp (&global_options, cpp_get_options (parse_in));
 
   input_location = UNKNOWN_LOCATION;
 
Index: gcc/c-family/c-common.h
===
--- gcc/c-family/c-common.h (revision 226219)
+++ gcc/c-family/c-common.h (working copy)
@@ -769,10 +769,12 @@ extern void check_function_arguments_rec
  unsigned HOST_WIDE_INT);
 extern bool check_builtin_function_arguments (tree, int, tree *);
 extern void check_function_format (tree, int, tree *);
 extern tree handle_format_attribute (tree *, tree, tree, int, bool *);
 extern tree handle_format_arg_attribute (tree *, tree, tree, int, bool *);
+extern void cb_handle_pragma_diagnostic (location_t, const char *,
+location_t, const char *);
 extern bool attribute_takes_identifier_p (const_tree);
 extern bool c_common_handle_option (size_t, const char *, int, int, location_t,
const struct cl_option_handlers *);
 extern bool default_handle_c_option (size_t, const char *, int);
 extern tree c_common_type_for_mode (machine_mode, int);
Index: gcc/c-family/c-pragma.c
===
--- gcc/c-family/c-pragma.c (revision 226219)
+++ gcc/c-family/c-pragma.c (working copy)
@@ -699,58 +699,75 @@ handle_pragma_visibility (cpp_reader *du
 }
   if (pragma_lex (&x) != CPP_EOF)
 warning (OPT_Wpragmas, "junk at end of %<#pragma GCC visibility%>");
 }
 
-static void
-handle_pragma_diagnostic(cpp_reader *ARG_UNUSED(dummy))
-{
-  const char *kind_string, *option_string;
-  unsigned int option_index;
-  enum cpp_ttype token;
+/* CPP call-back to handle "#pragma GCC diagnostic KIND_STRING
+   OPTION_STRING", where KIND_STRING is error, warning, ignored, push
+   or pop. LOC_KIND is the location of the KIND_STRING. LOC_OPTION is
+   the location of the warning option string.  */
+
+extern void
+cb_handle_pragma_diagnostic (location_t loc_kind, const char * kind_string,
+location_t loc_option, const char * option_string)
+{
+  if (!kind_string)
+{
+  warning_at (loc_kind, OPT_Wpragmas,
+ "missing [error|warning|ignored|push|pop]"
+ " after %<#pragma GCC diagnostic%>");
+  return;
+}
+
   diagnostic_t kind;
-  tree x;
-  struct cl_option_handlers handlers;
 
-  token = pragma_lex (&x);
-  if (token != CPP_NAME)
-GCC_BAD ("missing [error|warning|ignored] after %<#pragma GCC 
diagnostic%>");
-  kind_string = IDENTIFIER_POINTER (x);
   if (strcmp (kind_string, "error") == 0)
 kind = DK_ERROR;
   else if (strcmp (kind_string, "warning") == 0)
 kind = DK_WARNING;
   else if (strcmp (kind_string, "ignored") == 0)
 kind = DK_IGNORED;
   else if (strcmp (kind_string, "push") == 0)
 {
-  diagnostic_push_diagnostics (global_dc, input_location);
+  diagnostic_push_diagnostics (global_dc, loc_kind);
   return;
 }
   else if (strcmp (kind_string, "pop") == 0)
 {
-  diagnostic_pop_diagnostics (global_dc, input_location);
+  diagnostic_pop_diagnostics (global_dc, loc_kind);
   return;
 }
   else
-GCC_BAD ("expected [error|warning|ignored|push|pop] after %<#pragma GCC 
diagnostic%>");
+{
+  warning_at (loc_kind, OPT_Wpragmas,
+ "expected [error|warning|ign

Re: [PATCH 0/9] start converting POINTER_SIZE to a hook

2015-07-28 Thread Trevor Saunders
On Tue, Jul 28, 2015 at 09:24:17PM +0100, Richard Sandiford wrote:
> Trevor Saunders  writes:
> > On Mon, Jul 27, 2015 at 09:05:08PM +0100, Richard Sandiford wrote:
> >> Alternatively we could have a new target_globals structure that is
> >> initialised with the result of calling the hook.  If we do that though,
> >> it might make sense to consolidate the hooks rather than have one for
> >> every value.  E.g. having one function for UNITS_PER_WORD, one for
> >> POINTER_SIZE, one for Pmode, etc., would lead to some very verbose
> >> target code.
> >
> > so something like
> >
> > struct target_types
> > {
> >   unsigned long pointer_size;
> >   ...
> > };
> >
> > const target_types &targetm.get_type_data ()
> >
> > ? that seems pretty reasonable, and I wouldn't expect too many ordering
> > issues, but who knows.  Its too bad nobody has taken on the big job of
> > turning targetm into a class so we can hope for some devirt help from
> > the compiler.
> 
> I was thinking more:
> 
>   void targetm.get_type_data (target_types *);
> 
> The caller could then initialise or post-process the defaults.  The
> target_types would eventually end up in some target_globals structure.

But wouldn't that mean the hook would need to initialize all the fields
every time it was called?  I'd think you'd want to avoid that work and
instead have a global constant target_types struct, or a set of them,
returning the appropriate one based on the target in use.  A target
could also just have a single global one and change its values when the
subtarget changes, but keeping it constant seems like a nicer design
for the target.  I guess the disadvantage is that if the target isn't a
switchable target, you need to reinitialize the whole struct every time
the hook is called on the off chance something has changed; but that
seems like the same thing that happens in your proposal?
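
A sketch of the two variants being contrasted, with made-up names and
values (none of this is existing GCC code):

  struct target_types
  {
    unsigned int pointer_size;
    unsigned int units_per_word;
  };

  /* Variant 1: the hook fills in a caller-provided struct, paying the
     full initialization cost on every call.  */
  static void
  example_get_type_data (struct target_types *t)
  {
    t->pointer_size = 64;
    t->units_per_word = 8;
  }

  /* Variant 2: the target returns one of a set of constant structs;
     switching subtargets is just swapping a pointer.  */
  static const struct target_types lp64_types = { 64, 8 };
  static const struct target_types ilp32_types = { 32, 4 };

  static const struct target_types *
  example_get_type_data_const (int lp64_p)
  {
    return lp64_p ? &lp64_types : &ilp32_types;
  }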

Trev

> 
> Thanks,
> Richard


Re: [PING] Re: [PATCH] c/66516 - missing diagnostic on taking the address of a builtin function

2015-07-28 Thread Jason Merrill
Sorry about the slow response on IRC today; I got distracted onto
another issue and forgot to check back.  What I started to write:



I'm exploring your suggestion to see if the back end could emit the 
diagnostics. But I'm not sure it has sufficient context (location information) 
to point to the line of code that uses the function.


Hmm, that's a good point.  I think it would make sense for the ADDR_EXPR 
to carry location information as long as we're dealing with trees, but I 
suspect we don't currently set the location of an ADDR_EXPR.  So that 
would need to be fixed as part of this approach.



I suspect the back-end or even the middle-end route isn't going to work
even if there were enough context to diagnose the problem expressions,
because some of them will have been optimized away by then (e.g.,
'if (& __builtin_foo != 0)' is optimized into 'if (1)' by gimple).


I was thinking that if they're optimized away, they aren't problematic 
anymore; that was part of the attraction for me of handling this lower down.



The second question is about your suggestion to consolidate the code into 
mark_rvalue_use. The problem I'm running into there is that mark_rvalue_use is 
called for calls to builtins as well as for other uses and doesn't have enough 
context to tell one from the other.


Ah, true.  But special-casing call uses is still fewer places than 
special-casing all non-call uses.


Jason
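
For reference, a minimal illustration of the construct under
discussion, reusing the placeholder __builtin_foo from the quoted text
(the real set of affected builtins is defined by PR c/66516):

  /* Illustration only.  Taking the address of a builtin with no
     library fallback should be diagnosed; the comparison below is
     folded to "if (1)" long before RTL expansion, so the FE is the
     last point that still sees the ADDR_EXPR with a usable
     location.  */
  int
  f (void)
  {
    if (&__builtin_foo != 0)  /* folded away before the back end */
      return 1;
    return 0;
  }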

