date:20150722

[PING][gomp4, PATCH] Fix libgomp.oacc-c-c++-common/lib-3.c

2015-07-22 Thread Tom de Vries


On 01/07/15 13:16, Tom de Vries wrote:

Hi,

testcase libgomp.oacc-c-c++-common/lib-3.c is supposed to fail.

It fails currently in two ways:
- no device found, if there is no nonhost device type supported, so
   just host and host_nonshm
- no device initialized, if there is a nonhost device type supported,
   e.g. nvidia

The reason for the different failure modes is the usage of
acc_device_not_host.

Neither of the current failure modes is matched by the current dg-output:
...
/* { dg-output "device \[0-9\]+\\\(\[0-9\]+\\\) is initialized" } */
...
I don't understand what this dg-output is trying to achieve.

Attached patch makes sure that both current failure modes are tested and
accepted.

OK for gomp-4_0-branch?



Ping.

Thanks,
- Tom


0003-Fix-libgomp.oacc-c-c-common-lib-3.c.patch


Fix libgomp.oacc-c-c++-common/lib-3.c

2015-07-01  Tom de Vries

* testsuite/lib/libgomp.exp (offload_targets_nonhost): New var.
(check_effective_target_offload_target_nonhost_supported): New proc.
* testsuite/libgomp.oacc-c-c++-common/lib-3.c: Only run if
offload_target_nonhost_supported.
* testsuite/libgomp.oacc-c-c++-common/lib-3b.c: New test.  Copy of
lib-3.c, but only run if !offload_target_nonhost_supported.
---
  libgomp/testsuite/lib/libgomp.exp| 13 +
  libgomp/testsuite/libgomp.oacc-c-c++-common/lib-3.c  |  4 ++--
  libgomp/testsuite/libgomp.oacc-c-c++-common/lib-3b.c | 16 
  3 files changed, 31 insertions(+), 2 deletions(-)
  create mode 100644 libgomp/testsuite/libgomp.oacc-c-c++-common/lib-3b.c

diff --git a/libgomp/testsuite/lib/libgomp.exp 
b/libgomp/testsuite/lib/libgomp.exp
index 6dba22b..951e043 100644
--- a/libgomp/testsuite/lib/libgomp.exp
+++ b/libgomp/testsuite/lib/libgomp.exp
@@ -38,9 +38,11 @@ load_file libgomp-test-support.exp
  # Populate offload_targets_s (offloading targets separated by a space), and
  # offload_targets_s_openacc (the same, but with OpenACC names; OpenACC spells
  # some of them a little differently).
+set offload_targets_nonhost 0
  set offload_targets_s [split $offload_targets ","]
  set offload_targets_s_openacc {}
  foreach offload_target_openacc $offload_targets_s {
+set nonhost 1
  switch $offload_target_openacc {
intelmic {
# TODO.  Skip; will all FAIL because of missing
@@ -50,8 +52,14 @@ foreach offload_target_openacc $offload_targets_s {
nvptx {
set offload_target_openacc "nvidia"
}
+   host_nonshm {
+   set nonhost 0
+   }
  }
  lappend offload_targets_s_openacc "$offload_target_openacc"
+if { $nonhost == 1 } {
+   set offload_targets_nonhost 1
+}
  }
  lappend offload_targets_s_openacc "host"

@@ -369,6 +377,11 @@ proc check_effective_target_offload_device { } {
  } ]
  }

+proc check_effective_target_offload_target_nonhost_supported { } {
+global offload_targets_nonhost
+return $offload_targets_nonhost;
+}
+
  proc check_effective_target_openacc_nvidia_accel_supported { } {
  global offload_targets_s_openacc
  set res [lsearch $offload_targets_s_openacc "nvidia" ]
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/lib-3.c 
b/libgomp/testsuite/libgomp.oacc-c-c++-common/lib-3.c
index bb76c82..2a8c437 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/lib-3.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/lib-3.c
@@ -1,4 +1,4 @@
-/* { dg-do run } */
+/* { dg-do run { target { offload_target_nonhost_supported } } } */

  #include <openacc.h>

@@ -12,5 +12,5 @@ main (int argc, char **argv)
return 0;
  }

-/* { dg-output "device \[0-9\]+\\\(\[0-9\]+\\\) is initialized" } */
+/* { dg-output "no device initialized" } */
  /* { dg-shouldfail "" } */
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/lib-3b.c 
b/libgomp/testsuite/libgomp.oacc-c-c++-common/lib-3b.c
new file mode 100644
index 000..f21830c
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/lib-3b.c
@@ -0,0 +1,16 @@
+/* { dg-do run { target { ! offload_target_nonhost_supported } } } */
+
+#include <openacc.h>
+
+int
+main (int argc, char **argv)
+{
+  acc_init (acc_device_host);
+
+  acc_shutdown (acc_device_not_host);
+
+  return 0;
+}
+
+/* { dg-output "no device found" } */
+/* { dg-shouldfail "" } */
-- 1.9.1
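
The decision the patch adds to libgomp.exp can be sketched in C as follows. This is only an illustration of the logic, not code from libgomp: the helper names are hypothetical, empty list entries are ignored, and the intelmic case (whose handling is elided in the diff context above) is not modeled.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: return 1 if the comma-separated offload target
   list names at least one target other than "host_nonshm", which is how
   the patch sets offload_targets_nonhost.  Empty entries are ignored.  */
static int
has_nonhost_offload_target (const char *targets)
{
  static const char host_nonshm[] = "host_nonshm";
  const char *p = targets;

  assert (targets != NULL);
  while (*p != '\0')
    {
      size_t len = strcspn (p, ",");
      int is_host_nonshm = (len == strlen (host_nonshm)
                            && strncmp (p, host_nonshm, len) == 0);
      if (len > 0 && !is_host_nonshm)
        return 1;
      p += len;
      if (*p == ',')
        p++;
    }
  return 0;
}

/* Expected dg-output text for each of the two failure modes.  */
static const char *
expected_failure_message (const char *targets)
{
  return has_nonhost_offload_target (targets)
         ? "no device initialized"  /* checked by lib-3.c */
         : "no device found";       /* checked by lib-3b.c */
}
```

With a target list of "nvptx,host_nonshm" the sketch selects the lib-3.c expectation; with only "host_nonshm" it selects the lib-3b.c one.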

Re: [PATCH] [PATCH][ARM] Fix split-live-ranges-for-shrink-wrap.c testcase.

2015-07-22 Thread Ramana Radhakrishnan


>>
> Committed to trunk r226036.
> Is patch ok for fsf-5?

OK for all release branches where affected as this is a testism.


Ramana

> kind regards,
> Alex
>

Re: [gomp] Move openacc vector& worker single handling to RTL

2015-07-22 Thread Thomas Schwinge

Hi Nathan!

On Tue, 21 Jul 2015 16:05:05 -0400, Nathan Sidwell wrote:
> On 07/18/15 11:37, Thomas Schwinge wrote:
> > On Thu, 09 Jul 2015 20:25:22 -0400, Nathan Sidwell  wrote:
> >> This is the patch I committed.  [...]
> >
> > Prompted by your recent "-O0 patch" to »[f]ix PTX worker spill/fill«, I
> > used the attached patch 0001-O0-libgomp-C-C-testing.patch to run all C
> > and C++ libgomp testing with -O0 (for Fortran, we iterate through various
> > kinds of optimization levels anyway).  (There are no regressions of
> > OpenMP testing.)
> >
> > For OpenACC nvptx offloading, there must still be something wrong; here's
> > a count of the (non-deterministic!) regressions of ten runs of the
> > libgomp testsuite.  As private-vars-loop-worker-5.c fails most often, it
> > probably makes sense to look into that one first.
> >
> > For avoidance of doubt, there are no such regressions if I un-apply your
> > patch to »[m]ove openacc vector& worker single handling to RTL«.
> 
> I cannot reproduce the failures.  Applying your patch I see the following new 
> fails:
> 
> FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/lib-5.c -DACC_DEVICE_TYPE_host_nonshm=1 -DACC_MEM_SHARED=0 execution test
> FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/private-vars-local-worker-3.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 execution test
> FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/private-vars-loop-worker-7.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 execution test
> FAIL: libgomp.oacc-c++/../libgomp.oacc-c-c++-common/present-1.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 output pattern test, is , should match present clause: !acc_is_present
> FAIL: libgomp.oacc-c++/../libgomp.oacc-c-c++-common/private-vars-local-worker-2.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 execution test
> FAIL: libgomp.oacc-c++/../libgomp.oacc-c-c++-common/private-vars-loop-vector-1.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 execution test
> FAIL: libgomp.oacc-c++/../libgomp.oacc-c-c++-common/private-vars-loop-worker-4.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 execution test
> FAIL: libgomp.oacc-c++/../libgomp.oacc-c-c++-common/private-vars-loop-worker-5.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 execution test
> 
> Which differs from your list.

Well, then instead look into one of these (the private-vars-* ones)?  :-)
(Still hoping they're all caused by the same problem.)

> Attempting to reproduce outside the test suite 
> results in working executables.

Have you tried running it multiple times?  As I said, it's
non-deterministic.

Taking from libgomp.log the compile command line of
private-vars-loop-worker-5.c for »-DACC_DEVICE_TYPE_nvidia=1«, removing
the constructor.o stuff, replacing »-L« by »{-L,-Wl\,-rpath\,}«, and
adding »-O0« at the end, I then see the following:

$ while :; do ./private-vars-loop-worker-5.exe 2> /dev/null && echo -n .; done
...Aborted (core dumped)
.Aborted (core dumped)
Aborted (core dumped)
Aborted (core dumped)
.Aborted (core dumped)
...Aborted (core dumped)
Aborted (core dumped)
Aborted (core dumped)
.Aborted (core dumped)
...Aborted (core dumped)
[...]
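
To put a number on such a non-deterministic failure, the shell loop above could be replaced by a small counting harness. The following is a minimal sketch for POSIX systems; the function name is hypothetical and the command is run through the shell via system().

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/wait.h>

/* Run CMD through the shell RUNS times and return how many runs did not
   exit with status 0.  A run that aborts (as in the "Aborted (core
   dumped)" lines above) is counted as a failure as well.  */
static int
count_failed_runs (const char *cmd, int runs)
{
  int failures = 0;

  assert (cmd != NULL && runs >= 0);
  for (int i = 0; i < runs; i++)
    {
      int status = system (cmd);
      if (status == -1 || !WIFEXITED (status) || WEXITSTATUS (status) != 0)
        failures++;
    }
  return failures;
}
```

For example, count_failed_runs ("./private-vars-loop-worker-5.exe 2> /dev/null", 100) would give the failure count over 100 runs, which makes it easier to compare before and after a change.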


Grüße,
 Thomas



RE: [Patch] Add support for IEEE-conformant versions of scalar fmin* and fmax*

2015-07-22 Thread David Sherwood

Hi,

Sorry to bother people again. Is this OK to go now?

Thanks!
David.

-----Original Message-----
From: David Sherwood [mailto:david.sherw...@arm.com] 
Sent: 15 July 2015 11:29
To: 'Joseph Myers'
Cc: gcc-patches@gcc.gnu.org
Subject: RE: [Patch] Add support for IEEE-conformant versions of scalar fmin* and fmax*

> >
> > > On Mon, 29 Jun 2015, David Sherwood wrote:
> > >
> > > > Hi,
> > > >
> > > > I have added new STRICT_MAX_EXPR and STRICT_MIN_EXPR expressions to
> > > > support the IEEE versions of fmin and fmax. This is done by recognising
> > > > the math library "fmax" and "fmin" builtin functions in a similar way to
> > > > how this is done for -ffast-math. This also allows us to vectorise the
> > > > IEEE max/min functions for targets that support it, for example
> > > > aarch64/aarch32.
> > >
> > > This patch is missing documentation.  You need to document the new insn
> > > patterns in md.texi and the new tree codes in generic.texi.
> >
> > Hi, I've uploaded a new patch with the documentation. Hope this is ok.
> 
> In various places where you refer to one operand being NaN, I think you
> mean one operand being a *quiet* NaN (if one is a signaling NaN - only
> supported by GCC if -fsignaling-nans - the IEEE minNum and maxNum
> operations raise "invalid" and return a quiet NaN).

Hi, I have a new patch that hopefully addresses the documentation issues.

Thanks,
David.

ChangeLog:

2015-07-15  David Sherwood  

gcc/
* builtins.c (integer_valued_real_p): Add STRICT_MIN_EXPR and
STRICT_MAX_EXPR.
(fold_builtin_fmin_fmax): For strict math, convert builtin fmin and
fmax to STRICT_MIN_EXPR and STRICT_MAX_EXPR, respectively.
* expr.c (expand_expr_real_2): Add STRICT_MIN_EXPR and STRICT_MAX_EXPR.
* fold-const.c (const_binop): Likewise.
(fold_binary_loc, tree_binary_nonnegative_warnv_p): Likewise.
(tree_binary_nonzero_warnv_p): Likewise.
* optabs.h (strict_minmax_support): Declare.
* optabs.def: Add new optabs strict_max_optab/strict_min_optab.
* optabs.c (optab_for_tree_code): Return new optabs for STRICT_MIN_EXPR
and STRICT_MAX_EXPR.
(strict_minmax_support): New function.
* real.c (real_arithmetic): Add STRICT_MIN_EXPR and STRICT_MAX_EXPR.
* tree.def: Likewise.
* tree.c (associative_tree_code, commutative_tree_code): Likewise.
* tree-cfg.c (verify_expr): Likewise.
(verify_gimple_assign_binary): Likewise.
* tree-inline.c (estimate_operator_cost): Likewise.
* tree-pretty-print.c (dump_generic_node, op_code_prio): Likewise.
(op_symbol_code): Likewise.
gcc/config:
* aarch64/aarch64.md: New pattern.
* aarch64/aarch64-simd.md: Likewise.
* aarch64/iterators.md: New unspecs, iterators.
* arm/iterators.md: New iterators.
* arm/unspecs.md: New unspecs.
* arm/neon.md: New pattern.
* arm/vfp.md: Likewise.
gcc/doc:
* generic.texi: Add STRICT_MAX_EXPR and STRICT_MIN_EXPR.
* md.texi: Add strict_min and strict_max patterns.
gcc/testsuite
* gcc.target/aarch64/maxmin_strict.c: New test.
* gcc.target/arm/maxmin_strict.c: New test.

strict_max.patch
Description: Binary data
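
The quiet-NaN point raised in the review can be illustrated with a small C sketch. It is not taken from the patch: the function names are hypothetical, and the second function only imitates the behaviour C99 specifies for fmax when exactly one operand is a quiet NaN (the other operand is returned). Signaling NaNs and signed zeros are not considered.

```c
#include <assert.h>
#include <math.h>

/* A plain comparison: the result depends on operand order when one
   operand is a NaN, because every ordered comparison with NaN is false.  */
static double
naive_max (double a, double b)
{
  return a > b ? a : b;
}

/* Sketch of the strict behaviour: a single quiet NaN operand is ignored
   and the other operand is returned.  */
static double
quiet_nan_aware_max (double a, double b)
{
  if (isnan (a))
    return b;
  if (isnan (b))
    return a;
  return a > b ? a : b;
}

/* Usage example showing where the two differ.  */
static void
check_nan_examples (void)
{
  assert (quiet_nan_aware_max (NAN, 1.0) == 1.0);
  assert (quiet_nan_aware_max (1.0, NAN) == 1.0);
  assert (naive_max (NAN, 1.0) == 1.0);
  assert (isnan (naive_max (1.0, NAN)));
}
```

The order dependence of the plain comparison is the reason a separate, strict tree code is needed when -ffast-math is not in effect.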

Re: [gomp4] libgomp: Cope with DejaGnu having no mechanism to transfer environment variables to remote boards

2015-07-22 Thread Thomas Schwinge

Hi!

This is about communicating environment variables to the target for use
in (libgomp) execution testing.  In particular, the ACC_DEVICE_TYPE
environment variable to select at runtime which offloading device to use.
I had an ugly hack for that particular case,
but this is clearly not acceptable for commit.

On Thu, 14 May 2015 10:54:07 +0200, Jakub Jelinek  wrote:
> On Thu, May 14, 2015 at 10:05:36AM +0200, Jakub Jelinek wrote:
> > On Thu, May 14, 2015 at 12:10:50AM +0200, Thomas Schwinge wrote:
> > > No doubt, looking forward to the day, when this can be reverted.
> > > 
> > >   libgomp/
> > > * env.c (initialize_env): Remove static attribute.
> > >   * libgomp.map (INTERNAL): Export initialize_env.
> > 
> > Ugh.  While you achieve what you want for the remote board cases,
> > doesn't this completely break all OpenMP and OpenACC programs not built
> > as part of the testsuite, because initialize_env won't be called in that 
> > case?

(No, it doesn't break them; initialize_env is still called as before.)

> Can't you just tweak *.exp files so that if dg-set-target-env-var is used
> or some forced env var is added to the same list through other means,

The patch adding that machinery just bails out for remote testing: the wrapper
gcc/testsuite/lib/gcc-dg.exp:${tool}_load does »if { [is_remote target] }
{ return [list "unsupported" ""]« if environment variables are to be set,
so test cases using dg-set-target-env-var are currently UNSUPPORTED in
case of remote testing.

> you invoke on the remo[t]e side env VAR1=val1 VAR2=val2 program arguments
> instead of program arguments ?

Yes, using a wrapper for the program invocation is an idea I had before,
and also what Mike suggested,
but implementing that is not exactly trivial as far as I can tell.

First, such a "env wrapper" is appropriate only for a certain class of
target systems (only for "Unix" systems, or something like that), so we
need a way (preferably automatic) to determine that property of the
target.

We can't do the wrapping in gcc/testsuite/lib/gcc-dg.exp:${tool}_load
because when that invokes the original *_load routine
(save_${tool}_load), its program argument will be checked for existence,
to be downloaded to the target, which in our case then would be the
wrapper executable (/usr/bin/env), which doesn't make sense/is not
correct/won't work.

Leaving aside the option of DejaGnu modifications as mentioned by Mike
(difficult to coordinate, deploy...), it seems that we can achieve what
we want by wrapping DejaGnu's remote_exec and remote_spawn procedures in
gcc/testsuite/lib/gcc-dg.exp, where we then can prepend the "env wrapper"
to the program and commandline strings, respectively.  But probably,
doing such wrapping is not exactly elegant, and apart from the question
whether it is appropriate to use an "env wrapper" for the execution at
stake (only for "Unix" systems), there is also a new problem: these
procedures are also invoked for host compilation, for example, in which
case we also don't want to set the "env wrapper".

Any clever ideas?

I can also back out, and follow the other idea that I had,
about ditching usage of the ACC_DEVICE_TYPE environment variable during
testing, and instead use -foffload=[...] during compilation to only
include offloading code in the executable for one particular device, so
that the runtime then only can offload to this particular device.  But of
course, that won't solve the general problem (other execution tests that
currently are UNSUPPORTED because of dg-set-target-env-var usage).

Grüße,
 Thomas
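
For reference, the "env wrapper" idea discussed above amounts to turning a program invocation into "env VAR1=val1 VAR2=val2 program". A minimal C sketch of building such a command line follows; the helper is hypothetical (the real change would live in the Tcl of gcc-dg.exp), and it does no shell quoting.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: write "env A=1 B=2 PROGRAM" into BUF.
   Returns 0 on success and -1 if BUF is too small.  The assignments
   and the program string are copied verbatim, without quoting.  */
static int
build_env_command (char *buf, size_t size,
                   const char *const *assignments, size_t n,
                   const char *program)
{
  size_t len;
  int used;

  assert (buf != NULL && size > 0 && program != NULL);
  used = snprintf (buf, size, "env");
  if (used < 0 || (size_t) used >= size)
    return -1;
  len = (size_t) used;

  for (size_t i = 0; i < n; i++)
    {
      used = snprintf (buf + len, size - len, " %s", assignments[i]);
      if (used < 0 || (size_t) used >= size - len)
        return -1;
      len += (size_t) used;
    }

  used = snprintf (buf + len, size - len, " %s", program);
  if (used < 0 || (size_t) used >= size - len)
    return -1;
  return 0;
}
```

With the single assignment "ACC_DEVICE_TYPE=nvidia" and the program "./a.out", the buffer ends up holding "env ACC_DEVICE_TYPE=nvidia ./a.out". The open questions from the message remain: whether the target has an env program at all, and how to avoid wrapping host-side invocations.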


libgomp testing, RUNTESTFLAGS (was: [gomp4] libgomp: Cope with DejaGnu having no mechanism to transfer environment variables to remote boards)

2015-07-22 Thread Thomas Schwinge

Hi!

(Cesar, you had the same question.)

On Thu, 14 May 2015 11:26:15 +0200, Jakub Jelinek  wrote:
> Talking about the libgomp testsuite, can we rename the
> libgomp/testsuite/libgomp-oacc-*/*.exp files to something unique?
> I mean, trying to run say just the OpenMP C tests is impossible since
> the OpenACC merge, because RUNTESTFLAGS=c.exp runs both OpenMP and OpenACC.

Hmm, I like it that c.exp in fact does run all C language tests, and so
on.

> If it was say acc-c.exp, then you can choose if you want to test just
> OpenACC, or just OpenMP, both, and which particular tests more accurately.

You can use RUNTESTFLAGS='libgomp.c/c.exp', or use runtest's --directory
option to restrict it to libgomp.c only, for example:
RUNTESTFLAGS='--directory libgomp.c c.exp=[...]'.

Grüße,
 Thomas


[PATCH] Fix PR66952

2015-07-22 Thread Richard Biener


This PR shows an issue with ifcombine which ends up executing
stmts producing range info under a different condition than
before (always true).  With a twisted enough maze we end up
miscompiling the testcase for this reason.

Thus the following patch which resets all flow-sensitive
info on defs in the affected block.

I chose this over disabling ifcombine on blocks with flow-sensitive
info and I didn't see an easy way to preserve some of it by analyzing
the condition we replace with true/false.

Bootstrap & regtest running on x86_64-unknown-linux-gnu.

Richard.

2015-07-22  Richard Biener  

PR tree-optimization/66952
* tree-ssa-ifcombine.c (pass_tree_ifcombine::execute): For
blocks we end up executing unconditionally reset all SSA
info such as range and alignment.
* tree-ssanames.h (reset_flow_sensitive_info): Declare.
* tree-ssanames.c (reset_flow_sensitive_info): New function.

* gcc.dg/torture/pr66952.c: New testcase.

Index: gcc/tree-ssa-ifcombine.c
===
--- gcc/tree-ssa-ifcombine.c(revision 226042)
+++ gcc/tree-ssa-ifcombine.c(working copy)
@@ -765,7 +765,22 @@ pass_tree_ifcombine::execute (function *
 
   if (stmt
  && gimple_code (stmt) == GIMPLE_COND)
-   cfg_changed |= tree_ssa_ifcombine_bb (bb);
+   if (tree_ssa_ifcombine_bb (bb))
+ {
+   /* Clear range info from all stmts in BB which is now executed
+  conditional on a always true/false condition.  */
+   for (gimple_stmt_iterator gsi = gsi_start_bb (bb);
+!gsi_end_p (gsi); gsi_next (&gsi))
+ {
+   gimple stmt = gsi_stmt (gsi);
+   ssa_op_iter i;
+   tree op;
+   FOR_EACH_SSA_TREE_OPERAND (op, stmt, i, SSA_OP_DEF)
+ reset_flow_sensitive_info (op);
+ }
+
+   cfg_changed |= true;
+ }
 }
 
   free (bbs);
Index: gcc/tree-ssanames.c
===
--- gcc/tree-ssanames.c (revision 226042)
+++ gcc/tree-ssanames.c (working copy)
@@ -528,6 +528,23 @@ duplicate_ssa_name_fn (struct function *
 }
 
 
+/* Reset all flow sensitive data on NAME such as range-info, nonzero
+   bits and alignment.  */
+
+void
+reset_flow_sensitive_info (tree name)
+{
+  if (POINTER_TYPE_P (TREE_TYPE (name)))
+{
+  /* points-to info is not flow-sensitive.  */
+  if (SSA_NAME_PTR_INFO (name))
+   mark_ptr_info_alignment_unknown (SSA_NAME_PTR_INFO (name));
+}
+  else
+SSA_NAME_RANGE_INFO (name) = NULL;
+}
+
+
 /* Release all the SSA_NAMEs created by STMT.  */
 
 void
Index: gcc/tree-ssanames.h
===
--- gcc/tree-ssanames.h (revision 226042)
+++ gcc/tree-ssanames.h (working copy)
@@ -94,6 +94,7 @@ extern void duplicate_ssa_name_ptr_info
 extern tree duplicate_ssa_name_fn (struct function *, tree, gimple);
 extern void duplicate_ssa_name_range_info (tree, enum value_range_type,
   struct range_info_def *);
+extern void reset_flow_sensitive_info (tree);
 extern void release_defs (gimple);
 extern void replace_ssa_name_symbol (tree, tree);
 
Index: gcc/testsuite/gcc.dg/torture/pr66952.c
===
--- gcc/testsuite/gcc.dg/torture/pr66952.c  (revision 0)
+++ gcc/testsuite/gcc.dg/torture/pr66952.c  (working copy)
@@ -0,0 +1,28 @@
+/* { dg-do run } */
+
+int a = 128, b;
+
+static int
+fn1 (char p1, int p2)
+{
+  return p1 < 0 || p1 > 1 >> p2 ? 0 : p1 << 1;
+}
+
+static int
+fn2 ()
+{
+  char c = a;
+  b = fn1 (c, 1);
+  if ((128 | c) < 0 ? 1 : 0)
+return 1;
+  return 0;
+}
+
+int
+main ()
+{
+  if (fn2 () != 1)
+__builtin_abort ();
+
+  return 0;
+}
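
As a reading aid, here is a toy model of what reset_flow_sensitive_info does. The struct and all names are hypothetical stand-ins, not GCC's SSA name representation: integer names lose their range info, pointer names lose only their alignment info, and points-to information is kept because it is not flow-sensitive.

```c
#include <assert.h>
#include <stddef.h>

/* Toy value range, standing in for GCC's range info.  */
struct toy_range
{
  long min;
  long max;
};

/* Toy SSA name; none of these fields correspond to real GCC fields.  */
struct toy_ssa_name
{
  int is_pointer;
  struct toy_range *range;  /* flow-sensitive, integer names only */
  unsigned int align;       /* flow-sensitive, pointers; 0 = unknown */
  int points_to_id;         /* not flow-sensitive, always kept */
};

/* Drop everything that was derived under a condition that may no longer
   guard the definition.  */
static void
toy_reset_flow_sensitive_info (struct toy_ssa_name *name)
{
  assert (name != NULL);
  if (name->is_pointer)
    name->align = 0;
  else
    name->range = NULL;
}
```

In the patch this reset is applied to every SSA definition in a block that ifcombine has made execute under an always true/false condition.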

[PATCH] Fix PR66945

2015-07-22 Thread Richard Biener


The following fixes issues that arise when a SSA propagator ends up
deciding only a single outgoing edge is executable but the folder
at substitute-and-fold time decides the other one is executable.
This can of course only happen with undefined behavior (or with
bugs...).  In this case the propagator (copyprop) sees a
condition if (0 > unsigned-var) and decides this always evaluates
to false.  But fold, given more context and being inherently more
powerful than a simple copyprop sees that unsigned-var is actually
0 % 0 and 0 > 0 % 0 evaluates to true (because we have that match.pd
pattern saying that X % Y is smaller than Y which is a valid answer
considering that % 0 invokes undefined behavior).

Rather than trying to fix this by conditionalizing that pattern
against a zero modulo I decided that of course what the propagator
thinks of edge executability has to agree with what fold produces,
thus we just force the propagator's idea.

Otherwise you run into this testcase's issue where the lattice
contains a value that is computed in the path that fold now
decides to delete.

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

Richard.

2015-07-22  Richard Biener  

PR tree-optimization/66945
* tree-ssa-propagate.c (substitute_and_fold_dom_walker
::before_dom_children): Force the propagator's idea of
non-executable edges to materialize, not what the folder
chooses.

* gcc.dg/torture/pr66945.c: New testcase.
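
The edge-flag test in the patch can be read as the following decision function. This is a sketch with hypothetical flag values and names, not GCC's definitions: if exactly one of the two successor edges is marked executable by the propagator, the condition is forced to whichever value makes that edge the taken one.

```c
#include <assert.h>

/* Toy edge flags; the values are arbitrary and not GCC's.  */
enum
{
  TOY_EDGE_TRUE_VALUE = 1 << 0,
  TOY_EDGE_FALSE_VALUE = 1 << 1,
  TOY_EDGE_EXECUTABLE = 1 << 2
};

enum toy_forced
{
  TOY_KEEP,        /* both or neither edge executable: leave condition */
  TOY_FORCE_TRUE,
  TOY_FORCE_FALSE
};

/* SUCC0 and SUCC1 are the flag words of the two successor edges.  */
static enum toy_forced
toy_forced_condition (unsigned int succ0, unsigned int succ1)
{
  int exec0 = (succ0 & TOY_EDGE_EXECUTABLE) != 0;
  int exec1 = (succ1 & TOY_EDGE_EXECUTABLE) != 0;
  int succ0_is_true_edge = (succ0 & TOY_EDGE_TRUE_VALUE) != 0;

  if (exec0 == exec1)
    return TOY_KEEP;
  /* Force "true" when the true edge is the executable one.  */
  return succ0_is_true_edge == exec0 ? TOY_FORCE_TRUE : TOY_FORCE_FALSE;
}
```

The point of forcing the condition here, rather than trusting fold, is that the lattice values were computed under the propagator's view of which edges run.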

Index: gcc/tree-ssa-propagate.c
===
--- gcc/tree-ssa-propagate.c(revision 226042)
+++ gcc/tree-ssa-propagate.c(working copy)
@@ -1236,13 +1236,33 @@ substitute_and_fold_dom_walker::before_d
 
   /* If we made a replacement, fold the statement.  */
   if (did_replace)
-   fold_stmt (&i, follow_single_use_edges);
+   {
+ fold_stmt (&i, follow_single_use_edges);
+ stmt = gsi_stmt (i);
+   }
+
+  /* If this is a control statement the propagator left edges
+ unexecuted on force the condition in a way consistent with
+that.  See PR66945 for cases where the propagator can end
+up with a different idea of a taken edge than folding
+(once undefined behavior is involved).  */
+  if (gimple_code (stmt) == GIMPLE_COND)
+   {
+ if ((EDGE_SUCC (bb, 0)->flags & EDGE_EXECUTABLE)
+ ^ (EDGE_SUCC (bb, 1)->flags & EDGE_EXECUTABLE))
+   {
+ if (((EDGE_SUCC (bb, 0)->flags & EDGE_TRUE_VALUE) != 0)
+ == ((EDGE_SUCC (bb, 0)->flags & EDGE_EXECUTABLE) != 0))
+   gimple_cond_make_true (as_a <gcond *> (stmt));
+ else
+   gimple_cond_make_false (as_a <gcond *> (stmt));
+ did_replace = true;
+   }
+   }
 
   /* Now cleanup.  */
   if (did_replace)
{
- stmt = gsi_stmt (i);
-
  /* If we cleaned up EH information from the statement,
 remove EH edges.  */
  if (maybe_clean_or_replace_eh_stmt (old_stmt, stmt))
Index: gcc/testsuite/gcc.dg/torture/pr66945.c
===
--- gcc/testsuite/gcc.dg/torture/pr66945.c  (revision 0)
+++ gcc/testsuite/gcc.dg/torture/pr66945.c  (working copy)
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+
+unsigned b;
+void f()
+{
+  for(;;)
+if(!b?:(b=0))
+  ;
+else if(b%0

Re: [PATCH][AArch64][11/14] Re-layout SIMD builtin types on builtin expansion

2015-07-22 Thread James Greenhalgh

On Tue, Jul 21, 2015 at 05:59:39PM +0100, Kyrill Tkachov wrote:
> Sorry, here's the correct version, which uses initialized instead of inited 
> in one of the variable names.

Some nits below.

> 
> Kyrill
> 
> 2015-07-21  Kyrylo Tkachov  
> 
>  * config/aarch64/aarch64.c (aarch64_option_valid_attribute_p):
>  Initialize simd builtins if TARGET_SIMD.
>  * config/aarch64/aarch64-builtins.c (aarch64_init_simd_builtins):
>  Make sure that the builtins are initialized only once no matter how
>  many times the function is called.
>  (aarch64_init_builtins): Unconditionally initialize crc builtins.
>  (aarch64_relayout_simd_param): New function.
>  (aarch64_simd_expand_args): Use above during argument expansion.
>  * config/aarch64/aarch64-c.c (aarch64_pragma_target_parse): Initialize
>  simd builtins if TARGET_SIMD.
>  * config/aarch64/aarch64-protos.h (aarch64_init_simd_builtins): New
>  prototype.
>  (aarch64_relayout_simd_types): Likewise.
> 
> 2015-07-21  Kyrylo Tkachov  
> 
>  * gcc.target/aarch64/target-attr-crypto-ice-1.c: New test.
> 
> diff --git a/gcc/config/aarch64/aarch64-builtins.c 
> b/gcc/config/aarch64/aarch64-builtins.c
> index ec60955..ae0ea5b 100644
> --- a/gcc/config/aarch64/aarch64-builtins.c
> +++ b/gcc/config/aarch64/aarch64-builtins.c
> @@ -684,11 +684,18 @@ aarch64_init_simd_builtin_scalar_types (void)
>"__builtin_aarch64_simd_udi");
>  }
>  
> -static void
> +static bool simd_builtins_initialized_p = false;

This should be in the "aarch64_" "namespace". simd_builtins_initialized_p
sounds generic enough that it might one day collide.

> +
> +void
>  aarch64_init_simd_builtins (void)
>  {
>unsigned int i, fcode = AARCH64_SIMD_PATTERN_START;
>  
> +  if (simd_builtins_initialized_p)
> +return;
> +
> +  simd_builtins_initialized_p = true;
> +
>aarch64_init_simd_builtin_types ();
>  
>/* Strong-typing hasn't been implemented for all AdvSIMD builtin 
> intrinsics.
> @@ -851,8 +858,8 @@ aarch64_init_builtins (void)
>  
>if (TARGET_SIMD)
>  aarch64_init_simd_builtins ();
> -  if (TARGET_CRC32)
> -aarch64_init_crc32_builtins ();
> +
> +  aarch64_init_crc32_builtins ();
>  }
>  
>  tree
> @@ -872,6 +879,31 @@ typedef enum
>SIMD_ARG_STOP
>  } builtin_simd_arg;
>  
> +/* Relayout the decl of a function arg.  Keep the RTL component the same,
> +   as varasm.c ICEs at varasm.c:1324.  It doesn't like reinitializing the RTL

I think hard coding the line number is probably not helpful as the code
base evolves.

> +   on PARM decls.  Something like this needs to be done when compiling a
> +   file without SIMD and then tagging a function with +simd and using SIMD
> +   intrinsics in there.  The types will have been laid out assuming no SIMD,
> +   so we want to re-lay them out.  */
> +
> +static void
> +aarch64_relayout_simd_param (tree arg)
> +{
> +  tree argdecl = arg;
> +  if (TREE_CODE (argdecl) == SSA_NAME)
> +argdecl = SSA_NAME_VAR (argdecl);
> +
> +  if (argdecl
> +  && (TREE_CODE (argdecl) == PARM_DECL
> +   || TREE_CODE (argdecl) == VAR_DECL))
> +{
> +  rtx rtl = NULL_RTX;
> +  rtl = DECL_RTL_IF_SET (argdecl);
> +  relayout_decl (argdecl);
> +  SET_DECL_RTL (argdecl, rtl);
> +}
> +}
> +
>  static rtx
>  aarch64_simd_expand_args (rtx target, int icode, int have_retval,
> tree exp, builtin_simd_arg *args)
> @@ -900,6 +932,7 @@ aarch64_simd_expand_args (rtx target, int icode, int 
> have_retval,
>   {
> tree arg = CALL_EXPR_ARG (exp, opc - have_retval);
> enum machine_mode mode = insn_data[icode].operand[opc].mode;
> +   aarch64_relayout_simd_param (arg);
> op[opc] = expand_normal (arg);
>  
> switch (thisarg)
> diff --git a/gcc/config/aarch64/aarch64-c.c b/gcc/config/aarch64/aarch64-c.c
> index c3798a1..ecc9974 100644
> --- a/gcc/config/aarch64/aarch64-c.c
> +++ b/gcc/config/aarch64/aarch64-c.c
> @@ -179,6 +179,19 @@ aarch64_pragma_target_parse (tree args, tree pop_target)
>  
>cpp_opts->warn_unused_macros = saved_warn_unused_macros;
>  
> +  /* Initialize SIMD builtins if we haven't already.
> + Set current_target_pragma to NULL for the duration so that
> + the builtin initialization code doesn't try to tag the functions
> + being built with the attributes specified by any current pragma, thus
> + going into an infinite recursion.  */
> +  if (TARGET_SIMD)
> +{
> +  tree saved_current_target_pragma = current_target_pragma;
> +  current_target_pragma = NULL;
> +  aarch64_init_simd_builtins ();
> +  current_target_pragma = saved_current_target_pragma;
> +}
> +
>return ret;
>  }
>  
> diff --git a/gcc/config/aarch64/aarch64-protos.h 
> b/gcc/config/aarch64/aarch64-protos.h
> index 0191f35..4fe437f 100644
> --- a/gcc/config/aarch64/aarch64-protos.h
> +++ b/gcc/config/aarch64/aarch64-protos.h
> @@ -382,6 +382,8 @@ extern bool

Re: [PATCH 3/4] S390 -march=native related fixes

2015-07-22 Thread Dominik Vogt

On Tue, Jul 21, 2015 at 07:05:27PM +0200, Ulrich Weigand wrote:
> Dominik Vogt wrote:
> > * config/s390/driver-native.c (s390_host_detect_local_cpu): Handle
> > processor capabilities with -march=native.
> > * config/s390/s390.h (MARCH_MTUNE_NATIVE_SPECS): Likewise.
> > (DRIVER_SELF_SPECS): Likewise.  Join specs for 31 and 64 bit.
> > * (S390_TARGET_BITS_STRING): Macro to simplify specs.
> (That last "*" is superfluous.)
> 
> This looks correct to me now, just a cosmetic comment:
> 
> > +/* Defaulting rules.  */
> > +#define DRIVER_SELF_SPECS  \
> > +  "%{!m31:%{!m64:-m" S390_TARGET_BITS_STRING "}} ",\
> > +  MARCH_MTUNE_NATIVE_SPECS,\
> > +  "%{!mesa:%{!mzarch:%{m31:-mesa}%{m64:-mzarch}}} ",   \
> > +  "%{!march=*:%{mesa:-march=g5}%{mzarch:-march=z900}} "
> 
> There's no need to add those spaces at the end -- the self specs
> are all independent string, they don't need to end in a space.
> 
> Also, I had thought to put MARCH_MTUNE_NATIVE_SPECS right at the
> top of list, like so:
> 
> #define DRIVER_SELF_SPECS \
>   MARCH_MTUNE_NATIVE_SPECS,   \
>   "%{!m31:%{!m64:-m" S390_TARGET_BITS_STRING "}}",\
>   "%{!mesa:%{!mzarch:%{m31:-mesa}%{m64:-mzarch}}}",   \
>   "%{!march=*:%{mesa:-march=g5}%{mzarch:-march=z900}}"
> 
> But there should not be any functional difference between the two,
> it just looks a bit nicer maybe.

The order was important in an earlier version of the patch, so
that's why I missed that.  Version 5 of the patch cleans up all
the things you've mentioned and also removes a superfluous newline.

Ciao

Dominik ^_^  ^_^

-- 

Dominik Vogt
IBM Germany
gcc/ChangeLog
 
	* config/s390/driver-native.c (s390_host_detect_local_cpu): Handle
	processor capabilities with -march=native.
	* config/s390/s390.h (MARCH_MTUNE_NATIVE_SPECS): Likewise.
	(DRIVER_SELF_SPECS): Likewise.  Join specs for 31 and 64 bit.
	(S390_TARGET_BITS_STRING): Macro to simplify specs.
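
The machine-type detection reworked by the patch below can be sketched as a standalone function. The function name is hypothetical, and the mapping is the one from the strstr chain that the patch removes; how the new switch continues after the truncated hunk is not shown here, so treat the table as an assumption about it. The machine number is parsed as hexadecimal, as in the patch.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: map one /proc/cpuinfo "processor" line to a CPU
   name, or return NULL if the line does not identify a known machine.  */
static const char *
toy_cpu_from_cpuinfo_line (const char *line)
{
  const char *p;

  assert (line != NULL);
  if (strncmp (line, "processor", 9) != 0)
    return NULL;
  p = strstr (line, "machine = ");
  if (p == NULL)
    return NULL;
  p += 10;  /* strlen ("machine = ") */

  switch (strtol (p, NULL, 16))
    {
    case 0x9672:
      return "g5";
    case 0x2064:
    case 0x2066:
      return "z900";
    case 0x2084:
    case 0x2086:
      return "z990";
    case 0x2094:
    case 0x2096:
      return "z9-109";
    case 0x2097:
    case 0x2098:
      return "z10";
    case 0x2817:
    case 0x2818:
      return "z196";
    case 0x2827:
    case 0x2828:
      return "zEC12";
    case 0x2964:
      return "z13";
    default:
      return NULL;
    }
}
```

Parsing the number once and switching on it replaces one strstr call per machine type, which is the simplification visible in the hunk.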
From 34a28f0cee71af2ec46cdc6f37485746750b2874 Mon Sep 17 00:00:00 2001
From: Dominik Vogt 
Date: Mon, 6 Jul 2015 16:28:32 +0100
Subject: [PATCH 3/4] S390: Handle processor capabilities with -march=native.

---
 gcc/config/s390/driver-native.c | 143 
 gcc/config/s390/s390.h  |  25 ---
 2 files changed, 127 insertions(+), 41 deletions(-)

diff --git a/gcc/config/s390/driver-native.c b/gcc/config/s390/driver-native.c
index 88c76bd..5f7fe0a 100644
--- a/gcc/config/s390/driver-native.c
+++ b/gcc/config/s390/driver-native.c
@@ -42,6 +42,16 @@ s390_host_detect_local_cpu (int argc, const char **argv)
   char buf[256];
   FILE *f;
   bool arch;
+  const char *options = "";
+  unsigned int has_features;
+  unsigned int has_processor;
+  unsigned int is_cpu_z9_109 = 0;
+  unsigned int has_highgprs = 0;
+  unsigned int has_dfp = 0;
+  unsigned int has_te = 0;
+  unsigned int has_vx = 0;
+  unsigned int has_opt_esa_zarch = 0;
+  int i;
 
   if (argc < 1)
 return NULL;
@@ -49,43 +59,120 @@ s390_host_detect_local_cpu (int argc, const char **argv)
   arch = strcmp (argv[0], "arch") == 0;
   if (!arch && strcmp (argv[0], "tune"))
 return NULL;
+  for (i = 1; i < argc; i++)
+if (strcmp (argv[i], "mesa_mzarch") == 0)
+  has_opt_esa_zarch = 1;
 
   f = fopen ("/proc/cpuinfo", "r");
   if (f == NULL)
 return NULL;
 
-  while (fgets (buf, sizeof (buf), f) != NULL)
-if (strncmp (buf, "processor", sizeof ("processor") - 1) == 0)
-  {
-	if (strstr (buf, "machine = 9672") != NULL)
-	  cpu = "g5";
-	else if (strstr (buf, "machine = 2064") != NULL
-		 || strstr (buf, "machine = 2066") != NULL)
-	  cpu = "z900";
-	else if (strstr (buf, "machine = 2084") != NULL
-		 || strstr (buf, "machine = 2086") != NULL)
-	  cpu = "z990";
-	else if (strstr (buf, "machine = 2094") != NULL
-		 || strstr (buf, "machine = 2096") != NULL)
-	  cpu = "z9-109";
-	else if (strstr (buf, "machine = 2097") != NULL
-		 || strstr (buf, "machine = 2098") != NULL)
-	  cpu = "z10";
-	else if (strstr (buf, "machine = 2817") != NULL
-		 || strstr (buf, "machine = 2818") != NULL)
-	  cpu = "z196";
-	else if (strstr (buf, "machine = 2827") != NULL
-		 || strstr (buf, "machine = 2828") != NULL)
-	  cpu = "zEC12";
-	else if (strstr (buf, "machine = 2964") != NULL)
-	  cpu = "z13";
-	break;
-  }
+  for (has_features = 0, has_processor = 0;
+   (has_features == 0 || has_processor == 0)
+	 && fgets (buf, sizeof (buf), f) != NULL; )
+{
+  if (has_processor == 0 && strncmp (buf, "processor", 9) == 0)
+	{
+	  const char *p;
+	  long machine_id;
+
+	  p = strstr (buf, "machine = ");
+	  if (p == NULL)
+	continue;
+	  p += 10;
+	  has_processor = 1;
+	  machine_id = strtol (p, NULL, 16);
+	  switch (machine_id)
+	{
+	case 0x9672:
+	  cpu = "g5";
+	  break;
+	case 0x2064:
+	case 0x2066:
+	  c

[PATCH] Fix match with result operands and conditions

2015-07-22 Thread Richard Biener


Currently the code generated for the following (stupid example)

(match (integer_zerop @0)
 INTEGER_CST@0
 (if (integer_zerop (@0

is wrong in not assigning anything to the result @0.  The following
obvious patch fixes that.  We don't have a match pattern like the
above so it doesn't affect generated code.

Applied.

Richard.

2015-07-22  Richard Biener  

* genmatch.c (parser::parse_result): Properly handle
match with result operands and conditions.

Index: gcc/genmatch.c
===
--- gcc/genmatch.c  (revision 226042)
+++ gcc/genmatch.c  (working copy)
@@ -3555,6 +3555,7 @@ parser::parse_result (operand *result, p
{
  if (!matcher)
fatal_at (peek (), "manual transform not implemented");
+ ife->trueexpr = result;
}
   eat_token (CPP_CLOSE_PAREN);
   return ife;

Re: [AArch64] PR63521 Define REG_ALLOC_ORDER/HONOR_REG_ALLOC_ORDER

2015-07-22 Thread Jiong Wang


Jiong Wang writes:

> Current IRA still uses both target macros in a few places.
>
> Tell IRA to use the order we defined rather than its own cost
> calculation. Allocate caller saved first, then callee saved.
>
> This is especially useful for LR/x30, as it's free to allocate and is
> pure caller saved when used in leaf function.
>
> Haven't noticed significant impact on benchmarks, but by grepping some
> keywords like "Spilling", "Push.*spill" etc in ira rtl dump, the number
> is smaller.
>
> OK for trunk?
>
> 2015-05-19  Jiong. Wang  
>
> gcc/
>   PR 63521
>   * config/aarch64/aarch64.h (REG_ALLOC_ORDER): Define.
>   (HONOR_REG_ALLOC_ORDER): Define.
>
> Regards,
> Jiong

Ping.

I know it's hard to notice the register allocation improvements from this
hook, as current IRA/LRA has improved register allocation quite a lot.

But consider the example below:

test.c
==

double dec (double, double);

int cal (int a, int b, double d, double e)
{
  double sum = dec (a , a + b);
  sum = dec (b, a - b);
  sum = dec (sum, a * b);
  return d + e + sum;
}

Although the instruction count is the same before and after this patch,
the instruction scheduling looks better with it: because w7 is allocated
instead of w0, there are fewer instruction dependencies.

Before Patch (-O2)
==
cal:
        stp     x29, x30, [sp, -48]!
        add     x29, sp, 0
        stp     x19, x20, [sp, 16]
        stp     d8, d9, [sp, 32]
        mov     w19, w0
        add     w0, w0, w1
        fmov    d9, d1
        mov     w20, w1
        fmov    d8, d0
        scvtf   d1, w0
        scvtf   d0, w19
        bl      dec
        scvtf   d0, w20
        sub     w0, w19, w20
        mul     w19, w19, w20
        scvtf   d1, w0
        bl      dec
        scvtf   d1, w19
        bl      dec
        fadd    d8, d8, d9
        ldp     x19, x20, [sp, 16]
        fadd    d0, d8, d0
        ldp     d8, d9, [sp, 32]
        ldp     x29, x30, [sp], 48
        fcvtzs  w0, d0
        ret

After Patch
===
cal:
        stp     x29, x30, [sp, -48]!
        add     w7, w0, w1
        add     x29, sp, 0
        stp     d8, d9, [sp, 32]
        fmov    d9, d1
        fmov    d8, d0
        scvtf   d1, w7
        scvtf   d0, w0
        stp     x19, x20, [sp, 16]
        mov     w20, w1
        mov     w19, w0
        bl      dec
        scvtf   d0, w20
        sub     w7, w19, w20
        mul     w19, w19, w20
        scvtf   d1, w7
        bl      dec
        scvtf   d1, w19
        bl      dec
        fadd    d8, d8, d9
        ldp     x19, x20, [sp, 16]
        fadd    d0, d8, d0
        ldp     d8, d9, [sp, 32]
        ldp     x29, x30, [sp], 48
        fcvtzs  w0, d0
        ret
-- 
Regards,
Jiong

[PATCH] [AArch64] fix typo in vec_store_lanesoi_lane

2015-07-22 Thread Charles Baylis

Committed as obvious r226061.

gcc/ChangeLog:

2015-07-22  Charles Baylis  

* config/aarch64/aarch64-simd.md (vec_store_lanesoi_lane): Fix
typo in attribute.
From 7d98f7fc82cfc3012b460e4f4f91200fedcb04db Mon Sep 17 00:00:00 2001
From: Charles Baylis 
Date: Tue, 21 Jul 2015 16:54:32 +0100
Subject: [PATCH 2/2] [AArch64] fix typo in vec_store_lanesoi_lane

gcc/ChangeLog:

  Charles Baylis  

* config/aarch64/aarch64-simd.md (vec_store_lanesoi_lane): Fix
	typo in attribute.

Change-Id: I299ea5c01d64cfc72a29c386128ce9e0fef2624b
---
 gcc/config/aarch64/aarch64-simd.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index d5da35a..40afced 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -3970,7 +3970,7 @@
 operands[2] = GEN_INT (ENDIAN_LANE_N (mode, INTVAL (operands[2])));
 return "st2\\t{%S1. - %T1.}[%2], %0";
   }
-  [(set_attr "type" "neon_store3_one_lane")]
+  [(set_attr "type" "neon_store2_one_lane")]
 )
 
 (define_expand "vec_store_lanesoi"
-- 
1.9.1

Re: [RFC, PR66873] Use graphite for parloops

2015-07-22 Thread Richard Biener

On Tue, Jul 21, 2015 at 8:42 PM, Sebastian Pop  wrote:
> Tom de Vries wrote:
>> Fix reduction safety checks
>>
>>   * graphite-sese-to-poly.c (is_reduction_operation_p): Limit
>>   flag_associative_math to SCALAR_FLOAT_TYPE_P.  Honour
>>   TYPE_OVERFLOW_TRAPS and TYPE_OVERFLOW_WRAPS for INTEGRAL_TYPE_P.
>>   Only allow wrapping fixed-point otherwise.
>>   (build_poly_scop): Always call
>>   rewrite_commutative_reductions_out_of_ssa.
>
> The changes to graphite look good to me.

+  if (SCALAR_FLOAT_TYPE_P (type))
+return flag_associative_math;
+

why only scalar floats?  Please use FLOAT_TYPE_P.

+  if (INTEGRAL_TYPE_P (type))
+return (!TYPE_OVERFLOW_TRAPS (type)
+   && TYPE_OVERFLOW_WRAPS (type));

it cannot both wrap and trap thus TYPE_OVERFLOW_WRAPS is enough.

I'm sure you'll disable quite some parallelization this way... (the routine is
modeled after the vectorizer's IIRC, so it would be affected as well).  Yeah - I
see you modify autopar testcases.  Please instead XFAIL the existing ones and
add variants with unsigned reductions.  Adding -fwrapv isn't a good solution
either.

Can you think of a testcase that breaks btw?

The "proper" solution (see other passes) is to rewrite the reduction to a
wrapping one (cast to unsigned for the reduction op).
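
To illustrate the cast-to-unsigned idea, here is a minimal sketch in plain C
(not GCC internals; the function names are made up for this note).  A signed sum
is carried out in unsigned arithmetic, where wrap-around is well defined, so the
additions can be reassociated into partial sums without changing the result.
The final conversion back to int is only exercised here with an in-range value.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* In-order sum, with the reduction op done in unsigned arithmetic.  */
static int sum_in_order (const int *v, size_t n)
{
  unsigned acc = 0;
  for (size_t i = 0; i < n; i++)
    acc += (unsigned) v[i];
  return (int) acc;
}

/* Reassociated sum: two partial accumulators, as a parallelizer or
   vectorizer might split the loop.  */
static int sum_reassociated (const int *v, size_t n)
{
  unsigned even = 0, odd = 0;
  for (size_t i = 0; i < n; i++)
    {
      if (i % 2 == 0)
        even += (unsigned) v[i];
      else
        odd += (unsigned) v[i];
    }
  return (int) (even + odd);
}

/* Returns 1 if both orders give the mathematically expected result (5).
   A signed in-order sum would overflow at the second element.  */
static int reduction_demo_matches (void)
{
  const int v[] = { INT_MAX, INT_MAX, -INT_MAX, -INT_MAX, 5 };
  size_t n = sizeof v / sizeof v[0];
  return sum_in_order (v, n) == 5 && sum_reassociated (v, n) == 5;
}
```

The point of the sketch is only that the intermediate overflow becomes harmless
once the reduction operation itself wraps.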

+  return (FIXED_POINT_TYPE_P (type)
+ && FIXED_POINT_TYPE_OVERFLOW_WRAPS_P (type));

why?  Simply return false here instead?

diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index 9145dbf..e014be2 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -2613,16 +2613,30 @@ vect_is_simple_reduction_1 (loop_vec_info loop_info, gimple phi,
"reduction: unsafe fp math optimization: ");
   return NULL;
 }
-  else if (INTEGRAL_TYPE_P (type) && TYPE_OVERFLOW_TRAPS (type)
-  && check_reduction)
+  else if (INTEGRAL_TYPE_P (type) && check_reduction)
 {
...

You didn't need to adjust any testcases?  That's probably because the checking
above is not always executed (see PR66623 for a related testcase).  The code
needs refactoring.  And we need a way out, that is, we do _not_ want to stop
vectorizing signed reductions.  So you need to fix code generation instead.

+/* Nonzero if fixed-point type TYPE wraps at overflow.
+
+   GCC support of fixed-point types as specified by the draft technical report
+   (N1169 draft of ISO/IEC DTR 18037) is incomplete: Pragmas to control overflow
+   and rounding behaviors are not implemented.
+
+   So, if not saturating, we assume modular wrap-around (see Annex E.4 Modwrap
+   overflow).  */
+
+#define FIXED_POINT_TYPE_OVERFLOW_WRAPS_P(TYPE) \
+  (NON_SAT_FIXED_POINT_TYPE_P (TYPE))

Somebody with knowledge about fixed-point types needs to review this.
I suggest leaving the fixed-point changes out of the initial patch submission.

Thanks,
Richard.

> Thanks,
> Sebastian

Re: [PATCH v3] [AArch64] PR63870 Improve error messages for NEON single lane memory access intrinsics

2015-07-22 Thread Charles Baylis

On 17 July 2015 at 09:32, James Greenhalgh  wrote:
> This seems an odd limitation, presumably this is a side effect of waiting
> until expand time to throw an error... It does suggest that we're tackling
> the problem in the wrong way by pushing this to so late in the compilation
> pipeline. The property here is on a type itself, which must take a constant
> value within a given range. That feels much more like the sort of thing
> we should be detecting and bailing out on closer to the front-end - perhaps
> with a more generic extension allowing you to annotate any type with an
> expected/required range (both as a helping hand for VRP and as a way to
> express programmer defined preconditions).
>
> But, given that adding such an extension is likely more effort than needed

Agreed on all counts :)

> I think this is OK for now!

Thanks.

Committed in r226059 with suggested fixes. The attribute typo fix was applied
separately (https://gcc.gnu.org/ml/gcc-patches/2015-07/msg01816.html).

Thanks
Charles

Re: [RFC, PR66873] Use graphite for parloops

2015-07-22 Thread Richard Biener

On Wed, Jul 22, 2015 at 1:01 PM, Richard Biener
 wrote:
> On Tue, Jul 21, 2015 at 8:42 PM, Sebastian Pop  wrote:
>> Tom de Vries wrote:
>>> Fix reduction safety checks
>>>
>>>   * graphite-sese-to-poly.c (is_reduction_operation_p): Limit
>>>   flag_associative_math to SCALAR_FLOAT_TYPE_P.  Honour
>>>   TYPE_OVERFLOW_TRAPS and TYPE_OVERFLOW_WRAPS for INTEGRAL_TYPE_P.
>>>   Only allow wrapping fixed-point otherwise.
>>>   (build_poly_scop): Always call
>>>   rewrite_commutative_reductions_out_of_ssa.
>>
>> The changes to graphite look good to me.
>
> +  if (SCALAR_FLOAT_TYPE_P (type))
> +return flag_associative_math;
> +
>
> why only scalar floats?  Please use FLOAT_TYPE_P.
>
> +  if (INTEGRAL_TYPE_P (type))
> +return (!TYPE_OVERFLOW_TRAPS (type)
> +   && TYPE_OVERFLOW_WRAPS (type));
>
> it cannot both wrap and trap thus TYPE_OVERFLOW_WRAPS is enough.
>
> I'm sure you'll disable quite some parallelization this way... (the routine is
> modeled after the vectorizer's IIRC, so it would be affected as well).  Yeah - I
> see you modify autopar testcases.  Please instead XFAIL the existing ones and
> add variants with unsigned reductions.  Adding -fwrapv isn't a good solution
> either.
>
> Can you think of a testcase that breaks btw?
>
> The "proper" solution (see other passes) is to rewrite the reduction to a
> wrapping one (cast to unsigned for the reduction op).
>
> +  return (FIXED_POINT_TYPE_P (type)
> + && FIXED_POINT_TYPE_OVERFLOW_WRAPS_P (type));
>
> why?  Simply return false here instead?
>
> diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
> index 9145dbf..e014be2 100644
> --- a/gcc/tree-vect-loop.c
> +++ b/gcc/tree-vect-loop.c
> @@ -2613,16 +2613,30 @@ vect_is_simple_reduction_1 (loop_vec_info loop_info, gimple phi,
> "reduction: unsafe fp math optimization: ");
>return NULL;
>  }
> -  else if (INTEGRAL_TYPE_P (type) && TYPE_OVERFLOW_TRAPS (type)
> -  && check_reduction)
> +  else if (INTEGRAL_TYPE_P (type) && check_reduction)
>  {
> ...
>
> You didn't need to adjust any testcases?  That's probably because the checking
> above is not always executed (see PR66623 for a related testcase).  The code
> needs refactoring.  And we need a way out, that is, we do _not_ want to stop
> vectorizing signed reductions.  So you need to fix code generation instead.

Btw, for the vectorizer the current "trick" is that nobody takes advantage of
overflow undefinedness for vector types.

> +/* Nonzero if fixed-point type TYPE wraps at overflow.
> +
> +   GCC support of fixed-point types as specified by the draft technical report
> +   (N1169 draft of ISO/IEC DTR 18037) is incomplete: Pragmas to control overflow
> +   and rounding behaviors are not implemented.
> +
> +   So, if not saturating, we assume modular wrap-around (see Annex E.4 Modwrap
> +   overflow).  */
> +
> +#define FIXED_POINT_TYPE_OVERFLOW_WRAPS_P(TYPE) \
> +  (NON_SAT_FIXED_POINT_TYPE_P (TYPE))
>
> Somebody with knowledge about fixed-point types needs to review this.
> I suggest leaving the fixed-point changes out of the initial patch submission.
>
> Thanks,
> Richard.
>
>> Thanks,
>> Sebastian

Re: Fold some equal to and not equal to patterns in match.pd

2015-07-22 Thread Richard Biener

On Wed, Jul 22, 2015 at 2:40 AM, Andrew Pinski
 wrote:
> On Tue, Jul 21, 2015 at 12:16 PM, Richard Biener
>  wrote:
>> On July 21, 2015 11:38:31 AM GMT+02:00, Jakub Jelinek  
>> wrote:
>>>On Tue, Jul 21, 2015 at 09:15:31AM +, Hurugalawadi, Naveen wrote:
>>>> Please find attached the patch which performs following patterns folding
>>>> in match.pd:-
>>>>
>>>> a ==/!= a p+ b to b ==/!= 0.
>>>> a << N ==/!= 0 to a&(-1>>N) ==/!= 0.
>>>
>>>Not sure about this second one.  Why do you think it is generally
>>>beneficial?  On many targets, shifts are as fast as bitwise and, and
>>>-1>>N could be e.g. significantly more expensive constant (say require
>>>3 instructions to construct).
>>
>> And may set flags while shift not? Of course we do a very poor job of 
>> representing this kind of stuff on gimple.
>
> The biggest question now becomes which way is the canonical form for
> gimple and we can decide to optimize it on the RTL level (combine)
> instead if it produces better code in those cases.
> Note on AARCH64, producing x&(-1>>N) has no cost difference from a< so we would like to do it there.  Also in this case producing flags is
> useful.

Canonical GIMPLE is whichever form has fewer GIMPLE operations.  In this case
it's a << N ==/!= 0 (two ops), not a & (-1 >> N) ==/!= 0 (three ops).

Thus on GIMPLE the transform is not wanted.
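
For readers following the thread, the equivalence under discussion can be
checked with a small C sketch.  It assumes a 32-bit unsigned operand; the
function names are illustrative only and do not come from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Shift form: a << N ==/!= 0 (shift, compare).  */
static int shift_is_nonzero (uint32_t a, unsigned n)
{
  return (uint32_t) (a << n) != 0;
}

/* Mask form: a & (-1 >> N) ==/!= 0 (constant shift, and, compare).  */
static int mask_is_nonzero (uint32_t a, unsigned n)
{
  return (a & (UINT32_MAX >> n)) != 0;
}

/* Returns 1 if both forms agree for every shift count that is valid
   for a 32-bit type.  */
static int forms_agree (uint32_t a)
{
  for (unsigned n = 0; n < 32; n++)
    if (shift_is_nonzero (a, n) != mask_is_nonzero (a, n))
      return 0;
  return 1;
}
```

Both forms test whether any of the low 32 - N bits of a are set, which is why
the choice between them is about operation count and constant cost rather than
correctness.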

Richard.

> Thanks,
> Andrew
>
>>
>> Richard.
>>
>>>   Jakub
>>
>>

Re: Fold some equal to and not equal to patterns in match.pd

2015-07-22 Thread Richard Biener

On Tue, Jul 21, 2015 at 11:15 AM, Hurugalawadi, Naveen
 wrote:
> Hi,
>
> Please find attached the patch which performs following patterns folding
> in match.pd:-
>
> a ==/!= a p+ b to b ==/!= 0.
> a << N ==/!= 0 to a&(-1>>N) ==/!= 0.
> a * N ==/!= 0 where N is a power of 2 to a & (-1< log2 of N.
>
> Please review the same and let us know if its okay?

+(match unsigned_integral_valued_p
+ @0
+ (if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type

please avoid adding matches for simple things like this, instead inline the if.

> a * N ==/!= 0 where N is a power of 2 to a & (-1< log2 of N.

We have a similar transform (for the signed case)

/* Transform comparisons of the form X * C1 CMP 0 to X CMP 0 in the
   signed arithmetic case.  That form is created by the compiler
   often enough for folding it to be of value.  One example is in
   computing loop trip counts after Operator Strength Reduction.  */

which shows the transform may be valid for other comparison codes as well.
Note as Jakub says the (-1 << N2) constant may be expensive to generate.
Did you check if we expand to a multiply again if that is the case?

Also you have the same idea in two patterns so if a & (-1 << N2) should
be canonical on GIMPLE (I'd say gimple should prefer "smaller" constants
when the number of operations is the same) then why not simplify
all multiplies that way instead of just those feeding comparisons?  That is,
a * N -> a << N?

So we have the remaining

+/* Fold a ==/!= a p + b to b ==/!= 0.  */
+(for op (ne eq)
+ (simplify
+  (op:c @0 (pointer_plus @0 @1))
+  (op @1 { build_zero_cst (TREE_TYPE (@1)); })))

fold_comparison has the more "complete"

  /* For comparisons of pointers we can decompose it to a compile time
 comparison of the base objects and the offsets into the object.
 This requires at least one operand being an ADDR_EXPR or a
 POINTER_PLUS_EXPR to do more than the operand_equal_p test below.  */

but it looks like you are implementing fold_binary's

  /* Transform comparisons of the form X +- Y CMP X to Y CMP 0.  */
  if ((TREE_CODE (arg0) == PLUS_EXPR
       || TREE_CODE (arg0) == POINTER_PLUS_EXPR
       || TREE_CODE (arg0) == MINUS_EXPR)
      && operand_equal_p (tree_strip_nop_conversions (TREE_OPERAND (arg0, 0)),
                          arg1, 0)
      && (INTEGRAL_TYPE_P (TREE_TYPE (arg0))
          || POINTER_TYPE_P (TREE_TYPE (arg0))))
    {
      tree val = TREE_OPERAND (arg0, 1);
      return omit_two_operands_loc (loc, type,
                                    fold_build2_loc (loc, code, type,
                                                     val,
                                                     build_int_cst (TREE_TYPE (val), 0)),
                                    TREE_OPERAND (arg0, 0), arg1);
    }

but leave out non-pointer support.  So please start over with just moving the
above code from fold-const.c to match.pd as a pattern.
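
As a sanity check of the X +- Y ==/!= X to Y ==/!= 0 transform for integral
operands, here is a small C sketch using wrapping 32-bit unsigned arithmetic.
The helper names are hypothetical, and only the equality case is shown.

```c
#include <assert.h>
#include <stdint.h>

/* X + Y == X, written out directly (unsigned, so the addition wraps).  */
static int plus_cmp_original (uint32_t x, uint32_t y)
{
  return (uint32_t) (x + y) == x;
}

/* X - Y == X, written out directly.  */
static int minus_cmp_original (uint32_t x, uint32_t y)
{
  return (uint32_t) (x - y) == x;
}

/* The folded form: Y == 0.  X no longer takes part in the comparison.  */
static int cmp_folded (uint32_t x, uint32_t y)
{
  (void) x;
  return y == 0;
}
```

Because addition modulo 2^32 is invertible, x + y == x holds exactly when
y == 0, even when the addition wraps.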

Thanks,
Richard.

> Regression Tested on X86_64.
>
> On Behalf of Andrew Pinski.
>
> Thanks,
>
> gcc/testsuite/ChangeLog:
>
> 2015-01-21  Andrew Pinski  
>
> * testsuite/gcc.dg/tree-ssa/compare-shiftmult-1.c: New testcase.
> * testsuite/gcc.dg/tree-ssa/compare_pointers-1.c: New testcase.
>
> gcc/ChangeLog:
>
> 2015-01-21  Andrew Pinski  
>
> * match.pd (define_predicates): Add integer_pow2p.
> Add pattern for folding of a ==/!= a p+ b to b ==/!= 0.
> (unsigned_integral_valued_p): New match.
> Add pattern for folding of a<<N ==/!= 0 to a&(-1>>N) ==/!= 0.
> Add pattern for folding of a*N ==/!= 0 where N is a power of 2
> to a&(-1<

Re: [AArch64] PR 63521. define REG_ALLOC_ORDER/HONOR_REG_ALLOC_ORDER

2015-07-22 Thread James Greenhalgh

On Wed, May 20, 2015 at 01:35:41PM +0100, Jiong Wang wrote:
> Current IRA still uses both target macros in a few places.
> 
> Tell IRA to use the order we defined rather than its own cost
> calculation. Allocate caller saved first, then callee saved.
> 
> This is especially useful for LR/x30, as it's free to allocate and is
> pure caller saved when used in leaf function.
> 
> Haven't noticed significant impact on benchmarks, but by grepping some
> keywords like "Spilling", "Push.*spill" etc in ira rtl dump, the number
> is smaller.
> 
> OK for trunk?

OK, sorry for the delay.

It might be mail client mangling, but please check that the trailing slashes
line up in the version that gets committed.

Thanks,
James

> 2015-05-19  Jiong. Wang  
> 
> gcc/
>   PR 63521
>   * config/aarch64/aarch64.h (REG_ALLOC_ORDER): Define.
>   (HONOR_REG_ALLOC_ORDER): Define.
> 
 

> diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
> index bf59e40..0acdf10 100644
> --- a/gcc/config/aarch64/aarch64.h
> +++ b/gcc/config/aarch64/aarch64.h
> @@ -337,6 +337,31 @@ extern unsigned long aarch64_tune_flags;
>  V_ALIASES(28), V_ALIASES(29), V_ALIASES(30), V_ALIASES(31)  \
>}
>  
> +#define REG_ALLOC_ORDER  \
> +{\
> +  /* Reverse order for argument registers.  */   \
> +  7, 6, 5, 4, 3, 2, 1, 0,\
> +  /* Other caller-saved registers.  */   \
> +  8, 9, 10, 11, 12, 13, 14, 15,  \
> +  16, 17, 18, 30,\
> +  /* Callee-saved registers.  */ \
> +  19, 20, 21, 22, 23, 24, 25, 26,\
> +  27, 28,\
> +  /* All other registers.  */\
> +  29, 31,\
> +  /* Reverse order for argument vregisters.  */  \
> +  39, 38, 37, 36, 35, 34, 33, 32,\
> +  /* Other caller-saved vregisters.  */  \
> +  48, 49, 50, 51, 52, 53, 54, 55,\
> +  56, 57, 58, 59, 60, 61, 62, 63,\
> +  /* Callee-saved vregisters.  */\
> +  40, 41, 42, 43, 44, 45, 46, 47,\
> +  /* Other pseudo registers.  */ \
> +  64, 65, 66 \
> +}
> +
> +#define HONOR_REG_ALLOC_ORDER 1
> +
>  /* Say that the epilogue uses the return address register.  Note that
> in the case of sibcalls, the values "used by the epilogue" are
> considered live at the start of the called function.  */

New Ukrainian PO file for 'gcc' (version 5.2.0)

2015-07-22 Thread Translation Project Robot

Hello, gentle maintainer.

This is a message from the Translation Project robot.

A revised PO file for textual domain 'gcc' has been submitted
by the Ukrainian team of translators.  The file is available at:

http://translationproject.org/latest/gcc/uk.po

(This file, 'gcc-5.2.0.uk.po', has just now been sent to you in
a separate email.)

All other PO files for your package are available in:

http://translationproject.org/latest/gcc/

Please consider including all of these in your next release, whether
official or a pretest.

Whenever you have a new distribution with a new version number ready,
containing a newer POT file, please send the URL of that distribution
tarball to the address below.  The tarball may be just a pretest or a
snapshot, it does not even have to compile.  It is just used by the
translators when they need some extra translation context.

The following HTML page has been updated:

http://translationproject.org/domain/gcc.html

If any question arises, please contact the translation coordinator.

Thank you for all your work,

The Translation Project robot, in the
name of your translation coordinator.

[PATCH] Fix canonical form of true/false conditions

2015-07-22 Thread Richard Biener


Currently fold_stmt via gimple_cond_set_condition_from_tree and
gimple_cond_get_ops_from_tree and gimple_cond_make_false and
gimple_cond_make_true do not agree on the canonical form of
if (true) and if (false) resulting in spurious foldings.

The following makes gimple_cond_make_false/true follow the
!= 0 canonicalization that gimple_cond_get_ops_from_tree performs
and thus produce already folded conditions.

Bootstrap and regtest running on x86_64-unknown-linux-gnu.

Richard.

2015-07-22  Richard Biener  

* gimple.h (gimple_cond_make_false): Use 0 != 0.
(gimple_cond_make_true): Use 1 != 0.

Index: gcc/gimple.h
===
--- gcc/gimple.h(revision 226059)
+++ gcc/gimple.h(working copy)
@@ -3187,9 +3187,9 @@ gimple_cond_false_label (const gcond *gs
 static inline void
 gimple_cond_make_false (gcond *gs)
 {
-  gimple_cond_set_lhs (gs, boolean_true_node);
+  gimple_cond_set_lhs (gs, boolean_false_node);
   gimple_cond_set_rhs (gs, boolean_false_node);
-  gs->subcode = EQ_EXPR;
+  gs->subcode = NE_EXPR;
 }
 
 
@@ -3199,8 +3199,8 @@ static inline void
 gimple_cond_make_true (gcond *gs)
 {
   gimple_cond_set_lhs (gs, boolean_true_node);
-  gimple_cond_set_rhs (gs, boolean_true_node);
-  gs->subcode = EQ_EXPR;
+  gimple_cond_set_rhs (gs, boolean_false_node);
+  gs->subcode = NE_EXPR;
 }
 
 /* Check if conditional statemente GS is of the form 'if (1 == 1)',

[PATCH, nios2] Remove unused header from libgcc linux-atomic.c

2015-07-22 Thread Chung-Lin Tang

The  header was used back when Nios II Linux used a syscall 
cmpxchg,
long since removed and actually never got into the FSF trunk.

Patch removes the #include, and the following error code #defines which are
all no longer used. Committed.

Chung-Lin

2015-07-22  Chung-Lin Tang  

* config/nios2/linux-atomic.c (): Remove #include.
(EFAULT,EBUSY,ENOSYS): Delete unused #defines.

Index: config/nios2/linux-atomic.c
===
--- config/nios2/linux-atomic.c (revision 226061)
+++ config/nios2/linux-atomic.c (working copy)
@@ -20,11 +20,6 @@ a copy of the GCC Runtime Library Exception along
 see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
 .  */
 
-#include 
-#define EFAULT  14
-#define EBUSY   16
-#define ENOSYS  38
-
 /* We implement byte, short and int versions of each atomic operation
using the kernel helper defined below.  There is no support for
64-bit operations yet.  */

Re: [PR25529] Convert (unsigned t * 2)/2 into unsigned (t & 0x7FFFFFFF)

2015-07-22 Thread Richard Biener

On Tue, Jul 21, 2015 at 11:16 AM, Hurugalawadi, Naveen
 wrote:
> Hi,
>
>>> For signed types with TYPE_OVERFLOW_UNDEFINED
>>> you can simply cancel the operation (even for non-power-of-two multipliers).
>
> Thanks for the review and comments.
>
> Please find attached the modified patch as per your comments.
>
> Please review the same and let me know if any further modifications are 
> required.
>
> Regression Tested on X86_64.

@@ -280,6 +280,20 @@ along with GCC; see the file COPYING3.  If not see
&& integer_pow2p (@2) && tree_int_cst_sgn (@2) > 0)
(bit_and @0 (convert (minus @1 { build_int_cst (TREE_TYPE (@1), 1); }))

+/* Simplify (unsigned t * 2)/2 -> unsigned t & 0x7FFF.  */
+(simplify
+ (exact_div (mult @0 INTEGER_CST@1) @1)
+ (if (TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (@0)))
+  @0))

The comment applies to the pattern below, and the pattern above lacks a comment.

+(simplify
+ (trunc_div (mult @0 integer_pow2p@1) @1)
+ (if (TYPE_UNSIGNED (TREE_TYPE (@0)))
+ (with { tree n2 = build_int_cst (TREE_TYPE (@0),
+ wi::exact_log2 (@1)); }
+  (bit_and @0 (rshift (lshift { build_minus_one_cst (TREE_TYPE (@0)); }
+   { n2; }) { n2; })

please use

  (with
{
  int n2 = wi::exact_log2 (@1);
  tree mask = wide_int_to_tree (type, wi::rshift (wi::lshift (-1, n2), n2));
}
   (bit_and @0 { mask; }

In fact, the -1 << log2 >> log2 looks like it does
wi::mask (TYPE_PRECISION (type) - wi::exact_log2 (@1), false, TYPE_PRECISION (type));
so using wi::mask is preferred here.
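
A quick way to convince oneself of the unsigned identity is the following C
sketch.  It assumes a 32-bit unsigned type and a multiplier N = 2^k with
0 <= k < 32; the function names are invented for illustration and do not use the
wide-int API.

```c
#include <assert.h>
#include <stdint.h>

/* (t * N) / N with N = 2^k, computed in wrapping 32-bit arithmetic.  */
static uint32_t mul_then_div (uint32_t t, unsigned k)
{
  uint32_t n = (uint32_t) 1 << k;
  return (uint32_t) (t * n) / n;
}

/* t & mask, where mask has the low 32 - k bits set.  This mirrors the
   "precision minus log2" low-bit mask described above.  */
static uint32_t and_with_mask (uint32_t t, unsigned k)
{
  return t & (UINT32_MAX >> k);
}

/* Returns 1 if both forms agree for every valid k.  */
static int mul_div_forms_agree (uint32_t t)
{
  for (unsigned k = 0; k < 32; k++)
    if (mul_then_div (t, k) != and_with_mask (t, k))
      return 0;
  return 1;
}
```

For k = 1 the mask is 0x7FFFFFFF, matching the subject line of this thread.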

Thanks,
Richard.


> Thanks,
> Naveen
>
> gcc/testsuite/ChangeLog:
>
> 2015-07-21  Naveen H.S  
>
> PR middle-end/25529
> * gcc.dg/pr25529.c: New test.
>
> gcc/ChangeLog:
>
> 2015-07-21  Naveen H.S  
>
> PR middle-end/25529
> * match.pd (exact_div (mult @0 INTEGER_CST@1) @1) : New 
> simplifier.
> (trunc_div (mult @0 integer_pow2p@1) @1) : New simplifier.

Re: [PR25530] Convert (unsigned t / 2) * 2 into (unsigned t & ~1)

2015-07-22 Thread Richard Biener

On Tue, Jul 21, 2015 at 11:16 AM, Hurugalawadi, Naveen
 wrote:
> Hi,
>
>>> handle exact_div differently, like fold-const.c does.
>>> Then expressing ~1 with the result expression is really excessive - you
>>> should simply build this with @1 - 1 if @1 is a power of two.

(*)

> Thanks for the review and comments.
>
> Please find attached the modified patch as per your comments.
>
> Please review the same and let me know if any further modifications are 
> required.
>
> Regression Tested on X86_64.

We already have

+(simplify
+ (mult (exact_div @0 INTEGET_CST@1) @1)
+  @0)

as

/* (X /[ex] A) * A -> X.  */
(simplify
  (mult (convert? (exact_div @0 @1)) @1)
  /* Look through a sign-changing conversion.  */
  (convert @0))

as before the comment applies to your second pattern.

+(simplify
+ (mult (trunc_div @0 integer_pow2p@1) @1)
+  (bit_and @0 (negate @1)))

This doesn't work for signed types at least.
-1 / 2 * 2 == 0, not -2.  Your previous patch correctly
restricted this to unsigned types.
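
The signed counterexample and the unsigned identity can be reproduced with a
short C sketch.  It assumes two's complement int and 32-bit unsigned; the
function names are made up for this note.

```c
#include <assert.h>
#include <stdint.h>

/* Signed: x / n * n with C's truncating division.  */
static int div_then_mul_signed (int x, int n)
{
  return x / n * n;
}

/* Signed: x & -n, the form the second pattern would produce.  */
static int and_negated_signed (int x, int n)
{
  return x & -n;
}

/* Unsigned: t / n * n, with n a power of two.  */
static uint32_t div_then_mul_unsigned (uint32_t t, uint32_t n)
{
  return t / n * n;
}

/* Unsigned: t & -n, where -n is computed modulo 2^32.  */
static uint32_t and_negated_unsigned (uint32_t t, uint32_t n)
{
  return t & ((uint32_t) 0 - n);
}
```

With x = -1 and n = 2 the signed forms give 0 and -2 respectively, while the
unsigned forms agree because t / n * n simply clears the low log2(n) bits.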

Thanks,
Richard.

> Thanks,
> Naveen
>
> gcc/testsuite/ChangeLog:
>
> 2015-07-21  Naveen H.S  
>
> PR middle-end/25530
> * gcc.dg/pr25530.c: New test.
>
> gcc/ChangeLog:
>
> 2015-07-21  Naveen H.S  
>
> PR middle-end/25530
> * match.pd (mult (exact_div @0 INTEGET_CST@1) @1) : New 
> simplifier.
> (mult (trunc_div @0 integer_pow2p@1) @1) : New simplifier.

Re: RFC: [PATCH] Add __builtin_ia32_stack_top

2015-07-22 Thread H.J. Lu

On Tue, Jul 21, 2015 at 2:45 PM, H.J. Lu  wrote:
> When __builtin_frame_address is used to retrieve the address of the
> function stack frame, the frame pointer is always kept, which wastes one
> register and 2 instructions.  For x86-32, one less register means
> significant negative impact on performance.  This patch adds a new
> builtin function, __builtin_ia32_stack_top, to x86 backend.  It
> returns the stack address when the function is called.
>
> Any comments, feedbacks?
>
> Thanks.
>
>
> H.J.
> ---
> gcc/
>
> PR target/66960
> * config/i386/i386.c (ix86_expand_prologue): Sorry if DRAP is
> used and the stack address has been taken.
> (ix86_builtins): Add IX86_BUILTIN_STACK_TOP.
> (ix86_init_mmx_sse_builtins): Add __builtin_ia32_stack_top.
> (ix86_expand_builtin): Handle IX86_BUILTIN_STACK_TOP.
> * config/i386/i386.h (machine_function): Add stack_top_taken.
> * doc/extend.texi: Document __builtin_ia32_stack_top.
>

I got feedback suggesting __builtin_stack_top instead of
__builtin_ia32_stack_top.  But I don't know if
+  /* After the prologue, stack top is at -WORD(AP) in the current
+frame.  */
+  emit_insn (gen_rtx_SET (target,
+ plus_constant (Pmode, arg_pointer_rtx,
+-UNITS_PER_WORD)));

is true for all backends.  If it works on all backends, I can move
it to builtins.c.
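
For context, this is roughly how the existing builtin mentioned above is used
today.  The sketch relies only on __builtin_frame_address, which GCC already
provides; the proposed __builtin_ia32_stack_top is deliberately not used since
it is not available yet, and the wrapper name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Returns the frame address of this function.  Using the builtin forces
   the frame pointer to be kept in this function, which is the cost the
   proposal wants to avoid.  noinline keeps the frame distinct from the
   caller's.  */
__attribute__ ((noinline)) static void *current_frame_address (void)
{
  return __builtin_frame_address (0);
}
```

A caller that only needs the stack address at function entry pays for the frame
pointer setup here, which is what motivates a dedicated builtin.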

-- 
H.J.

[PATCH] Add location to genmatch operator

2015-07-22 Thread Richard Biener


This simplifies code by adding a location to each operator.  This also
fixes bogus locations in the current generated files.

Bootstrap running on x86_64-unknown-linux-gnu.

Richard.

2015-07-22  Richard Biener  

* genmatch.c (struct operand): Add location member.
(predicate, expr, c_expr, capture, if_expr, with_expr): Adjust
constructors.
(struct simplify): Remove match_location and result_location
members.
(elsewhere): Adjust.

Index: gcc/genmatch.c
===
--- gcc/genmatch.c  (revision 226060)
+++ gcc/genmatch.c  (working copy)
@@ -483,8 +483,10 @@ struct capture_info;
 
 struct operand {
   enum op_type { OP_PREDICATE, OP_EXPR, OP_CAPTURE, OP_C_EXPR, OP_IF, OP_WITH };
-  operand (enum op_type type_) : type (type_) {}
+  operand (enum op_type type_, source_location loc_)
+: type (type_), location (loc_) {}
   enum op_type type;
+  source_location location;
   virtual void gen_transform (FILE *, int, const char *, bool, int,
  const char *, capture_info *,
  dt_operand ** = 0,
@@ -496,7 +498,8 @@ struct operand {
 
 struct predicate : public operand
 {
-  predicate (predicate_id *p_) : operand (OP_PREDICATE), p (p_) {}
+  predicate (predicate_id *p_, source_location loc)
+: operand (OP_PREDICATE, loc), p (p_) {}
   predicate_id *p;
 };
 
@@ -505,12 +508,12 @@ struct predicate : public operand
 
 struct expr : public operand
 {
-  expr (id_base *operation_, bool is_commutative_ = false)
-: operand (OP_EXPR), operation (operation_),
+  expr (id_base *operation_, source_location loc, bool is_commutative_ = false)
+: operand (OP_EXPR, loc), operation (operation_),
   ops (vNULL), expr_type (NULL), is_commutative (is_commutative_),
   is_generic (false), force_single_use (false) {}
   expr (expr *e)
-: operand (OP_EXPR), operation (e->operation),
+: operand (OP_EXPR, e->location), operation (e->operation),
   ops (vNULL), expr_type (e->expr_type), is_commutative (e->is_commutative),
   is_generic (e->is_generic), force_single_use (e->force_single_use) {}
   void append_op (operand *op) { ops.safe_push (op); }
@@ -546,10 +549,11 @@ struct c_expr : public operand
 id_tab (const char *id_, const char *oper_): id (id_), oper (oper_) {}
   };
 
-  c_expr (cpp_reader *r_, vec code_, unsigned nr_stmts_,
+  c_expr (cpp_reader *r_, source_location loc,
+ vec code_, unsigned nr_stmts_,
  vec ids_, cid_map_t *capture_ids_)
-: operand (OP_C_EXPR), r (r_), code (code_), capture_ids (capture_ids_),
-  nr_stmts (nr_stmts_), ids (ids_) {}
+: operand (OP_C_EXPR, loc), r (r_), code (code_),
+  capture_ids (capture_ids_), nr_stmts (nr_stmts_), ids (ids_) {}
   /* cpplib tokens and state to transform this back to source.  */
   cpp_reader *r;
   vec code;
@@ -567,8 +571,8 @@ struct c_expr : public operand
 
 struct capture : public operand
 {
-  capture (unsigned where_, operand *what_)
-  : operand (OP_CAPTURE), where (where_), what (what_) {}
+  capture (source_location loc, unsigned where_, operand *what_)
+  : operand (OP_CAPTURE, loc), where (where_), what (what_) {}
   /* Identifier index for the value.  */
   unsigned where;
   /* The captured value.  */
@@ -582,8 +586,8 @@ struct capture : public operand
 
 struct if_expr : public operand
 {
-  if_expr () : operand (OP_IF), cond (NULL), trueexpr (NULL),
-falseexpr (NULL) {}
+  if_expr (source_location loc)
+: operand (OP_IF, loc), cond (NULL), trueexpr (NULL), falseexpr (NULL) {}
   c_expr *cond;
   operand *trueexpr;
   operand *falseexpr;
@@ -593,7 +597,8 @@ struct if_expr : public operand
 
 struct with_expr : public operand
 {
-  with_expr () : operand (OP_WITH), with (NULL), subexpr (NULL) {}
+  with_expr (source_location loc)
+: operand (OP_WITH, loc), with (NULL), subexpr (NULL) {}
   c_expr *with;
   operand *subexpr;
 };
@@ -655,25 +660,20 @@ struct simplify
 {
   enum simplify_kind { SIMPLIFY, MATCH };
 
-  simplify (simplify_kind kind_,
-   operand *match_, source_location match_location_,
-   struct operand *result_, source_location result_location_,
+  simplify (simplify_kind kind_, operand *match_, operand *result_,
vec > for_vec_, cid_map_t *capture_ids_)
-  : kind (kind_), match (match_), match_location (match_location_),
-  result (result_), result_location (result_location_),
+  : kind (kind_), match (match_), result (result_),
   for_vec (for_vec_),
   capture_ids (capture_ids_), capture_max (capture_ids_->elements () - 1) {}
 
   simplify_kind kind;
   /* The expression that is matched against the GENERIC or GIMPLE IL.  */
   operand *match;
-  source_location match_location;
   /* For a (simplify ...) an expression with ifs and withs with the expression
  produced when the pattern applies in the leafs.
  For a (match ...) the l

Re: [Patch, fortran] PR 37131, inline matmul

2015-07-22 Thread Mikael Morin


On 21/07/2015 21:49, Thomas Koenig wrote:

On 21.07.2015 at 19:26, Mikael Morin wrote:

I would like to avoid the hack in iresolve.  So let's reuse the
frontend-passes.c part of my patch (set resolved_isym)


I would much prefer if that was put into gfc_resolve_fe_runtime_error,
next to the assignment to c->resolved_sym.


Makes sense.


and then handle
it in gfc_conv_intrinsic_subroutine, the way my patch does it (I'm not
sure it actually fixes anything) or some other way (set
resolved_sym->backend_decl as in iresolve, ...).


It does actually fix the issue.  One way of constructing a test case
is to run

$ gfortran -fdump-tree-optimized -fno-realloc-lhs -fcheck=all -O -S inline_matmul_2.f90

and count the number of calls to "_gfortran_runtime_error " in the
*.optimized dump (without the _at).  It should be zero.

So, OK from my side with the change above and corresponding test case.


This is what it looks like.
However, it introduces regressions on matmul_bounds_{2,4,5}.
It seems the "incorrect extent" runtime errors are completely optimized 
away (even at -O0).

Any ideas?

Mikael


2015-07-22  Mikael Morin  

* iresolve.c (gfc_resolve_fe_runtime_error): Set c->resolved_isym.
* trans-intrinsic.c (gfc_conv_intrinsic_function_args,
conv_intrinsic_procedure_args): Factor the non-function-specific code
from the former into the latter.
(gfc_intrinsic_argument_list_length, intrinsic_argument_list_length):
Ditto.
(gfc_conv_intrinsic_lib_function, conv_intrinsic_lib_procedure):
Ditto.
(gfc_conv_intrinsic_lib_function, find_intrinsic_map):
Factor out from the former into the latter.
(conv_intrinsic_runtime_error): New function.
(gfc_conv_intrinsic_subroutine): Call it
in the GFC_ISYM_FE_RUNTIME_ERROR case.

2015-07-22  Mikael Morin  

* gfortran.dg/inline_matmul_12.f90: New.




diff --git a/gcc/fortran/iresolve.c b/gcc/fortran/iresolve.c
index 9dab49e..1ccd93d 100644
--- a/gcc/fortran/iresolve.c
+++ b/gcc/fortran/iresolve.c
@@ -2208,6 +2208,7 @@ gfc_resolve_fe_runtime_error (gfc_code *c)
 a->name = "%VAL";
 
   c->resolved_sym = gfc_get_intrinsic_sub_symbol (name);
+  c->resolved_isym = gfc_intrinsic_subroutine_by_id (GFC_ISYM_FE_RUNTIME_ERROR);
 }
 
 void
diff --git a/gcc/fortran/trans-intrinsic.c b/gcc/fortran/trans-intrinsic.c
index 1155481..bed8a1e 100644
--- a/gcc/fortran/trans-intrinsic.c
+++ b/gcc/fortran/trans-intrinsic.c
@@ -195,18 +195,14 @@ gfc_builtin_decl_for_float_kind (enum built_in_function double_built_in,
generated code to be ignored.  */
 
 static void
-gfc_conv_intrinsic_function_args (gfc_se *se, gfc_expr *expr,
-  tree *argarray, int nargs)
+conv_intrinsic_procedure_args (gfc_se *se, gfc_intrinsic_arg *formal,
+			   gfc_actual_arglist *actual, tree *argarray,
+			   int nargs)
 {
-  gfc_actual_arglist *actual;
   gfc_expr *e;
-  gfc_intrinsic_arg  *formal;
   gfc_se argse;
   int curr_arg;
 
-  formal = expr->value.function.isym->formal;
-  actual = expr->value.function.actual;
-
for (curr_arg = 0; curr_arg < nargs; curr_arg++,
 	actual = actual->next,
 	formal = formal ? formal->next : NULL)
@@ -248,16 +244,29 @@ gfc_conv_intrinsic_function_args (gfc_se *se, gfc_expr *expr,
 }
 }
 
+
+static void
+gfc_conv_intrinsic_function_args (gfc_se *se, gfc_expr *expr,
+  tree *argarray, int nargs)
+{
+  gfc_actual_arglist *actual;
+  gfc_intrinsic_arg  *formal;
+
+  formal = expr->value.function.isym->formal;
+  actual = expr->value.function.actual;
+  conv_intrinsic_procedure_args (se, formal, actual, argarray, nargs);
+}
+
+
 /* Count the number of actual arguments to the intrinsic function EXPR
including any "hidden" string length arguments.  */
 
 static unsigned int
-gfc_intrinsic_argument_list_length (gfc_expr *expr)
+intrinsic_argument_list_length (gfc_actual_arglist *actual)
 {
   int n = 0;
-  gfc_actual_arglist *actual;
 
-  for (actual = expr->value.function.actual; actual; actual = actual->next)
+  for (; actual; actual = actual->next)
 {
   if (!actual->expr)
 	continue;
@@ -272,6 +281,13 @@ gfc_intrinsic_argument_list_length (gfc_expr *expr)
 }
 
 
+static unsigned int
+gfc_intrinsic_argument_list_length (gfc_expr *expr)
+{
+  return intrinsic_argument_list_length (expr->value.function.actual);
+}
+
+
 /* Conversions between different types are output by the frontend as
intrinsic functions.  We implement these directly with inline code.  */
 
@@ -837,17 +853,31 @@ gfc_get_intrinsic_lib_fndecl (gfc_intrinsic_map_t * m, gfc_expr * expr)
 /* Convert an intrinsic function into an external or builtin call.  */
 
 static void
-gfc_conv_intrinsic_lib_function (gfc_se * se, gfc_expr * expr)
+conv_intrinsic_lib_procedure (gfc_se * se, tree fndecl,
+			  gfc_intrinsic_arg * formal,
+			  gfc_actual_arglist * actual)
 {
-  gfc_intrinsic_map_t *m;
-  tree fndecl;
   tree rettype;
   tree *args;
   unsigned int num_arg

Re: [C/C++ PATCH] Implement -Wshift-overflow (PR c++/55095) (take 3)

2015-07-22 Thread David Edelsohn

On Tue, Jul 21, 2015 at 5:59 AM, Marek Polacek  wrote:
> On Mon, Jul 20, 2015 at 04:23:08PM -0400, David Edelsohn wrote:
>> This seems to have caused a number of new failures in the PPC
>> testsuite for vmx/unpack.
>
> Sorry about that.  Should be fixed with this patch I'm about to commit.
>
> 2015-07-21  Marek Polacek  
>
> * gcc.dg/vmx/unpack-be-order.c: Use -Wno-shift-overflow.
> * gcc.dg/vmx/unpack.c: Likewise.

This doesn't fully fix the failures.

> * gcc.target/powerpc/quad-atomic.c: Likewise.

> diff --git gcc/testsuite/gcc.dg/vmx/unpack.c gcc/testsuite/gcc.dg/vmx/unpack.c
> index 3c13163..e71a5a6 100644
> --- gcc/testsuite/gcc.dg/vmx/unpack.c
> +++ gcc/testsuite/gcc.dg/vmx/unpack.c
> @@ -1,3 +1,5 @@
> +/* { dg-options "-Wno-shift-overflow" } */
> +

Should this be dg-additional-options ?

>  #include "harness.h"
>
>  #define BIG 4294967295

Re: [C/C++ PATCH] Implement -Wshift-overflow (PR c++/55095) (take 3)

2015-07-22 Thread Marek Polacek

On Wed, Jul 22, 2015 at 08:35:12AM -0400, David Edelsohn wrote:
> On Tue, Jul 21, 2015 at 5:59 AM, Marek Polacek  wrote:
> > On Mon, Jul 20, 2015 at 04:23:08PM -0400, David Edelsohn wrote:
> >> This seems to have caused a number of new failures in the PPC
> >> testsuite for vmx/unpack.
> >
> > Sorry about that.  Should be fixed with this patch I'm about to commit.
> >
> > 2015-07-21  Marek Polacek  
> >
> > * gcc.dg/vmx/unpack-be-order.c: Use -Wno-shift-overflow.
> > * gcc.dg/vmx/unpack.c: Likewise.
> 
> This doesn't fully fix the failures.
 
Ouch.  What other failures do you see?  I've tried the patch on ppc64-linux
and didn't see any others.

It'd be very weird to see -Wshift-overflow warnings when -Wno-shift-overflow
is in effect.

> > --- gcc/testsuite/gcc.dg/vmx/unpack.c
> > +++ gcc/testsuite/gcc.dg/vmx/unpack.c
> > @@ -1,3 +1,5 @@
> > +/* { dg-options "-Wno-shift-overflow" } */
> > +
> 
> Should this be dg-additional-options ?

I did what was in gcc.dg/vmx/unpack-be-order.c, i.e. dg-options.  Or does
using dg-additional-options help?

Marek

Re: [C/C++ PATCH] Implement -Wshift-overflow (PR c++/55095) (take 3)

2015-07-22 Thread Segher Boessenkool

On Wed, Jul 22, 2015 at 02:39:14PM +0200, Marek Polacek wrote:
> On Wed, Jul 22, 2015 at 08:35:12AM -0400, David Edelsohn wrote:
> > On Tue, Jul 21, 2015 at 5:59 AM, Marek Polacek  wrote:
> > > On Mon, Jul 20, 2015 at 04:23:08PM -0400, David Edelsohn wrote:
> > >> This seems to have caused a number of new failures in the PPC
> > >> testsuite for vmx/unpack.
> > >
> > > Sorry about that.  Should be fixed with this patch I'm about to commit.
> > >
> > > 2015-07-21  Marek Polacek  
> > >
> > > * gcc.dg/vmx/unpack-be-order.c: Use -Wno-shift-overflow.
> > > * gcc.dg/vmx/unpack.c: Likewise.
> > 
> > This doesn't fully fix the failures.
>  
> Ouch.  What other failures do you see?  I've tried the patch on ppc64-linux
> and didn't see any others.

vmx.exp sets a bunch of options and the test overrides that now.  Options
like -maltivec are pretty important for this test to work -- it #includes
altivec.h, which does #error unless -maltivec is set, and things go downhill
from that.  unpack-be-order.c works, unpack.c blows up.
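For reference, a minimal sketch of a test header that appends to the options set by the .exp driver instead of replacing them.  dg-additional-options is the existing DejaGnu directive David asks about above; the function demo_shift_big is purely illustrative and is not taken from unpack.c, and whether this change alone fixes the failures is not confirmed in this thread.

```c
/* { dg-do compile } */
/* dg-options replaces the default options used for a set of tests;
   dg-additional-options adds to them, so flags passed by the driver
   (for vmx.exp, things like -maltivec) stay in effect.  */
/* { dg-additional-options "-Wno-shift-overflow" } */

/* Same constant as in the quoted unpack.c context.  */
#define BIG 4294967295

/* Illustrative helper only (assumption, not part of the real test):
   shift BIG left in a 64-bit unsigned type, which is well defined
   for counts 0..31.  */
unsigned long long
demo_shift_big (unsigned int count)
{
  return (unsigned long long) BIG << (count & 31u);
}
```

Outside the GCC testsuite harness the directives are ordinary comments, so the file compiles with any C compiler.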

Does your compiler maybe default to -maltivec?


Segher

RE: [PATCH, MIPS] Scheduling for M51xx core family

2015-07-22 Thread Robert Suchanek

Hi Matthew,

> > gcc/
> >
> > * config/mips/m5100.md: New file.
> > * config/mips/mips-cpus.def (m5100, m5101): Define.
> > * config/mips/mips-tables.opt: Regenerate.
> > * config/mips/mips.c (mips_rtx_cost_data): Add costs for m5100.
> > * config/mips/mips.h (MIPS_ISA_LEVEL_SPEC): Map -march=m5100 and
> > -march=m5101 to -mips32r5.
> > (MIPS_ARCH_FLOAT_SPEC): Map -m5101 to -msoft-float.
> > (MIPS_ISA_NAN2008_SPEC): Map -march=m51* to -mnan=2008 if
> > !-msoft-float.
> > * config/mips/mips.md: Include m5100.md.
> > (processor): Add m5100.
> > * doc/invoke.texi (-march=@var{arch}): Add m5100, m5101.
> 
> OK, this looks fine.

The patch committed as r226065.

> I did realise while reading through this that the MIPS_ARCH_FLOAT_SPEC
> is not used for an ordinary MIPS Linux compiler which seems odd but
> I presume this is to make it possible to use one hard-float sysroot
> for any core and emulate the FPU when not present.
> 
> I think it is probably a mistake to have put MIPS_ARCH_FLOAT_SPEC in
> the mti-linux.h and android.h DRIVER_SELF_SPECS so I think they need
> removing. Although we support building soft-float multilibs I don't
> think they actually get used very much so leaving the selection of
> soft-float down to the end user in Linux seems wise.
> 
> With the i6400 scheduler committed then we can also get rid of the w32
> and w64 placeholders that were there solely to provide an R6 processor
> to use as the default processor for the generic arch options.

I'll send a follow-up patch to remove the redundant bits.

Regards,
Robert

RE: [PATCH, MIPS] Add -march=interaptiv

2015-07-22 Thread Robert Suchanek

Hi Catherine,

> > gcc/
> > * config/mips/mips-cpus.def (interaptiv): Define.
> > * config/mips/mips-tables.opt: Regenerate.
> > * config/mips/mips.h (MIPS_ISA_LEVEL_SPEC): Map -march=interaptiv to
> > -mips32r2.
> > (BASE_DRIVER_SELF_SPECS): Likewise but map to -mdsp.
> > * doc/invoke.texi (-march=@var{arch}): Add interaptiv.
> > ---
> 
> Yes, this looks OK.

Committed as r226064.

Regards,
Robert

[PATCH] [gomp] Simplify thread pool initialization

2015-07-22 Thread Sebastian Huber

Move the thread pool initialization from the team start to the team
creation.  This eliminates one conditional expression.  In addition this
is a preparation patch to enable shared thread pools which I would like
to use for RTEMS later.  No unexpected failures on
x86_64-unknown-linux-gnu.
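
The get-or-create idea can be sketched in isolation as follows.  This is a simplified illustration under stated assumptions, not the libgomp code: struct demo_pool, struct demo_thread and demo_get_pool are hypothetical stand-ins, and the thread-specific destructor registration done in the real patch is left out.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the libgomp types; the real structures
   have more fields.  */
struct demo_pool
{
  unsigned threads_used;
  unsigned threads_busy;
};

struct demo_thread
{
  struct demo_pool *pool;
};

/* Return the pool of THR, allocating and initializing it on first use.
   Callers never need their own NULL check, which is the conditional
   the patch description says is eliminated from the team start.  */
struct demo_pool *
demo_get_pool (struct demo_thread *thr, unsigned nthreads)
{
  struct demo_pool *pool = thr->pool;
  if (pool == NULL)
    {
      pool = malloc (sizeof (*pool));
      if (pool == NULL)
        abort ();
      pool->threads_used = 0;
      pool->threads_busy = nthreads;
      thr->pool = pool;
    }
  return pool;
}
```

A second call with the same thread returns the pool created by the first call and leaves its fields untouched.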

libgomp/ChangeLog
2015-07-22  Sebastian Huber  

* team.c (gomp_new_thread_pool): Delete and move content to ...
(gomp_get_thread_pool): ... new function.  Allocate and
initialize thread pool on demand.
(get_last_team): Use gomp_get_thread_pool().
(gomp_team_start): Delete thread pool initialization.
---
 libgomp/team.c | 56 +++-
 1 file changed, 27 insertions(+), 29 deletions(-)

diff --git a/libgomp/team.c b/libgomp/team.c
index 7671b05..5c56182 100644
--- a/libgomp/team.c
+++ b/libgomp/team.c
@@ -134,22 +134,39 @@ gomp_thread_start (void *xdata)
   return NULL;
 }
 
+/* Get the thread pool, allocate and initialize it on demand.  */
+
+static struct gomp_thread_pool *
+gomp_get_thread_pool (struct gomp_thread *thr, unsigned nthreads)
+{
+  struct gomp_thread_pool *pool = thr->thread_pool;
+  if (__builtin_expect (pool == NULL, 0))
+{
+  pool = gomp_malloc (sizeof (*pool));
+  pool->threads = NULL;
+  pool->threads_size = 0;
+  pool->threads_used = 0;
+  pool->last_team = NULL;
+  pool->threads_busy = nthreads;
+  thr->thread_pool = pool;
+  pthread_setspecific (gomp_thread_destructor, thr);
+}
+  return pool;
+}
+
 static inline struct gomp_team *
 get_last_team (unsigned nthreads)
 {
   struct gomp_thread *thr = gomp_thread ();
   if (thr->ts.team == NULL)
 {
-  struct gomp_thread_pool *pool = thr->thread_pool;
-  if (pool != NULL)
-   {
- struct gomp_team *last_team = pool->last_team;
- if (last_team != NULL && last_team->nthreads == nthreads)
-   {
- pool->last_team = NULL;
- return last_team;
-   }
-   }
+  struct gomp_thread_pool *pool = gomp_get_thread_pool (thr, nthreads);
+  struct gomp_team *last_team = pool->last_team;
+  if (last_team != NULL && last_team->nthreads == nthreads)
+{
+  pool->last_team = NULL;
+  return last_team;
+}
 }
   return NULL;
 }
@@ -219,19 +236,6 @@ free_team (struct gomp_team *team)
   free (team);
 }
 
-/* Allocate and initialize a thread pool. */
-
-static struct gomp_thread_pool *gomp_new_thread_pool (void)
-{
-  struct gomp_thread_pool *pool
-= gomp_malloc (sizeof(struct gomp_thread_pool));
-  pool->threads = NULL;
-  pool->threads_size = 0;
-  pool->threads_used = 0;
-  pool->last_team = NULL;
-  return pool;
-}
-
 static void
 gomp_free_pool_helper (void *thread_pool)
 {
@@ -316,12 +320,6 @@ gomp_team_start (void (*fn) (void *), void *data, unsigned nthreads,
 
   thr = gomp_thread ();
   nested = thr->ts.team != NULL;
-  if (__builtin_expect (thr->thread_pool == NULL, 0))
-{
-  thr->thread_pool = gomp_new_thread_pool ();
-  thr->thread_pool->threads_busy = nthreads;
-  pthread_setspecific (gomp_thread_destructor, thr);
-}
   pool = thr->thread_pool;
   task = thr->task;
   icv = task ? &task->icv : &gomp_global_icv;
-- 
1.8.4.5

Re: Fold some equal to and not equal to patterns in match.pd

2015-07-22 Thread Segher Boessenkool

On Tue, Jul 21, 2015 at 05:40:07PM -0700, Andrew Pinski wrote:
> The biggest question now becomes which way is the canonical form for
> gimple and we can decide to optimize it on the RTL level (combine)
> instead if it produces better code in those cases.

combine does not do instruction selection in general; it only does
instruction combination.  It already handles most cases where shifts
and masks are combined; if you find one where it doesn't, please
report.

Segher

[PATCH] Add location to 'two conversions in a row' error

2015-07-22 Thread Richard Biener


... and improve wording.

Committed as obvious.

Richard.

2015-07-22  Richard Biener  

* genmatch.c (expr::gen_transform): Clarify error message
and display location.

Index: gcc/genmatch.c
===
--- gcc/genmatch.c  (revision 226067)
+++ gcc/genmatch.c  (working copy)
@@ -1877,7 +1890,7 @@ expr::gen_transform (FILE *f, int indent
   type = optype;
 }
   if (!type)
-fatal ("two conversions in a row");
+fatal_at (location, "cannot determine type of operand");
 
   fprintf_indent (f, indent, "{\n");
   indent += 2;

[PATCH] S/390: Improve risbg usage

2015-07-22 Thread Andreas Krebbel

Hi,

with the attached patch we use risbg in more situations.

This especially helps the SpecCPU 400.perlbench testcase.
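
As a plain C illustration of the operation the new insv patterns in the patch target, described in the .md comment as x = (y << s) | (x & ((1 << s) - 1)), here is a minimal sketch.  The function name demo_append_bits_left is hypothetical, and the sketch says nothing about which instruction a given compiler actually emits for it.

```c
#include <stdint.h>

/* Keep the low S bits of X and place Y directly to their left.
   Precondition (assumption of this sketch): 0 < S < 64, so that
   both shifts are well defined.  */
uint64_t
demo_append_bits_left (uint64_t x, uint64_t y, unsigned s)
{
  uint64_t low_mask = (UINT64_C (1) << s) - 1;
  return (y << s) | (x & low_mask);
}
```

The mask must be exactly (1 << s) - 1 for the shift count s; this is the same condition the new cost check and insn conditions in the patch test for.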

Bootstrapped on s390 and s390x. No regressions.

I'll commit the patch after waiting a few days for review comments.

Bye,

-Andreas-


gcc/ChangeLog:

2015-07-22  Andreas Krebbel  

* config/s390/s390.c (s390_rtx_costs): Make risbg patterns
cheaper.
(s390_expand_insv): Don't generate risbg pattern for constant zero
sources.
* config/s390/s390.md ("*insv_zEC12_appendbitsleft")
("*insv_z10_appendbitsleft"): New pattern definitions.  New
splitters.

gcc/testsuite/ChangeLog:

2015-07-22  Andreas Krebbel  

* gcc.target/s390/insv-1.c: New test.
* gcc.target/s390/insv-2.c: New test.
* gcc.target/s390/insv-3.c: New test.


diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c
index 861dfb2..a8712b9 100644
--- a/gcc/config/s390/s390.c
+++ b/gcc/config/s390/s390.c
@@ -3321,13 +3321,26 @@ s390_rtx_costs (rtx x, machine_mode mode, int outer_code,
   *total = 0;
   return true;
 
+case IOR:
+  /* risbg */
+  if (GET_CODE (XEXP (x, 0)) == AND
+ && GET_CODE (XEXP (x, 1)) == ASHIFT
+ && REG_P (XEXP (XEXP (x, 0), 0))
+ && REG_P (XEXP (XEXP (x, 1), 0))
+ && CONST_INT_P (XEXP (XEXP (x, 0), 1))
+ && CONST_INT_P (XEXP (XEXP (x, 1), 1))
+ && (UINTVAL (XEXP (XEXP (x, 0), 1)) ==
+ (1UL << UINTVAL (XEXP (XEXP (x, 1), 1))) - 1))
+   {
+ *total = COSTS_N_INSNS (2);
+ return true;
+   }
 case ASHIFT:
 case ASHIFTRT:
 case LSHIFTRT:
 case ROTATE:
 case ROTATERT:
 case AND:
-case IOR:
 case XOR:
 case NEG:
 case NOT:
@@ -5839,8 +5852,17 @@ s390_expand_insv (rtx dest, rtx op1, rtx op2, rtx src)
 
   if (mode_s == VOIDmode)
{
- /* Assume const_int etc already in the proper mode.  */
- src = force_reg (mode, src);
+ /* For constant zero values the representation with AND
+appears to be folded in more situations than the (set
+(zero_extract) ...).
+We only do this when the start and end of the bitfield
+remain in the same SImode chunk.  That way nihf or nilf
+can be used.
+The AND patterns might still generate a risbg for this.  */
+ if (src == const0_rtx && bitpos / 32  == (bitpos + bitsize - 1) / 32)
+   return false;
+ else
+   src = force_reg (mode, src);
}
   else if (mode_s != mode)
{
diff --git a/gcc/config/s390/s390.md b/gcc/config/s390/s390.md
index 8c07d1b..2961f61 100644
--- a/gcc/config/s390/s390.md
+++ b/gcc/config/s390/s390.md
@@ -3776,6 +3776,71 @@
   [(set_attr "op_type" "RIE")
(set_attr "z10prop" "z10_super_E1")])
 
+; Implement appending Y on the left of S bits of X
+; x = (y << s) | (x & ((1 << s) - 1))
+(define_insn "*insv_zEC12_appendbitsleft"
+  [(set (match_operand:GPR 0 "nonimmediate_operand" "=d")
+   (ior:GPR (and:GPR (match_operand:GPR 1 "nonimmediate_operand" "0")
+ (match_operand:GPR 2 "immediate_operand" ""))
+(ashift:GPR (match_operand:GPR 3 "nonimmediate_operand" "d")
+(match_operand:GPR 4 "nonzero_shift_count_operand" 
""]
+  "TARGET_ZEC12 && UINTVAL (operands[2]) == (1UL << UINTVAL (operands[4])) - 1"
+  "risbgn\t%0,%3,64-,64-%4-1,%4"
+  [(set_attr "op_type" "RIE")
+   (set_attr "z10prop" "z10_super_E1")])
+
+(define_insn "*insv_z10_appendbitsleft"
+  [(set (match_operand:GPR 0 "nonimmediate_operand" "=d")
+   (ior:GPR (and:GPR (match_operand:GPR 1 "nonimmediate_operand" "0")
+ (match_operand:GPR 2 "immediate_operand" ""))
+(ashift:GPR (match_operand:GPR 3 "nonimmediate_operand" "d")
+(match_operand:GPR 4 "nonzero_shift_count_operand" 
""
+   (clobber (reg:CC CC_REGNUM))]
+  "TARGET_Z10 && !TARGET_ZEC12 && UINTVAL (operands[2]) == (1UL << UINTVAL 
(operands[4])) - 1"
+  "risbg\t%0,%3,64-,64-%4-1,%4"
+  [(set_attr "op_type" "RIE")
+   (set_attr "z10prop" "z10_super_E1")])
+
+; z = (x << c) | (y >> d) with (x << c) and (y >> d) not overlapping after shifting
+;  -> z = y >> d; z = (x << c) | (y & ((1 << c) - 1))
+;  -> z = y >> d; z = risbg;
+
+(define_split
+  [(set (match_operand:GPR 0 "nonimmediate_operand" "")
+   (ior:GPR (lshiftrt:GPR (match_operand:GPR 1 "nonimmediate_operand" "")
+  (match_operand:GPR 2 
"nonzero_shift_count_operand" ""))
+(ashift:GPR (match_operand:GPR 3 "nonimmediate_operand" "")
+(match_operand:GPR 4 "nonzero_shift_count_operand" 
""]
+  "TARGET_ZEC12 && UINTVAL (operands[2]) + UINTVAL (operands[4]) >= "
+  [(set (match_dup 0)
+   (lshiftrt:GPR (match_dup 1) (match_dup 2)))
+   (set (match_dup 0)
+   (ior:GPR (and:GPR (match_dup 0) (m

Re: Tests for libgomp based on OpenMP Examples 4.0.2.

2015-07-22 Thread Maxim Blumental

Changed simd-7 test for C and Fortran to lower the execution time to
~0.6 and ~1 second respectively while keeping the test representative.
Measured the time like:

$ time make check-target-libgomp RUNTESTFLAGS="c.exp"=simd-7.c

on a Linux x86-64 machine. (Without the patch I also had a very long
time: ~7.5 and ~60 seconds)

2015-07-20 22:40 GMT+03:00 H.J. Lu :
> On Tue, Jul 14, 2015 at 11:04 AM, Jakub Jelinek  wrote:
>> On Tue, Jul 14, 2015 at 08:40:50PM +0300, Maxim Blumental wrote:
>>>  The patch replaces all FP comparisons with inequalities and epsilons
>>> in those tests for libgomp.
>>
>
> libgomp.fortran/examples-4/simd-7.f90 takes a very long time to
> run on Linux/ia32.  I get random:
>
> FAIL: libgomp.fortran/examples-4/simd-7.f90   -O0  execution test
>
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66950
>
>
> --
> H.J.



-- 
Sincerely yours,
Maxim Blumental

2015-07-22  Maxim Blumenthal  

PR libgomp/66950
* testsuite/libgomp.c/examples-4/simd-7.c: Lower the defined constant
N to 30.  Add iterative reference function for Fibonacci numbers
- fib_ref.
(fib): Correct corner cases in the recursion.
(main): Replace the non-simd loop with fib_ref call.
* testsuite/libgomp.fortran/examples-4/simd-7.f90: Add iterative
reference subroutine for Fibonacci numbers - fib_ref.
(fibonacci): Lower the parameter N to 30.  Correct accordingly check for
the last array element value.  Replace the non-simd loop with fib_ref
call.  Remove redundant b_ref array.  Remove the comparison of the last
array element with the corresponding Fibonacci sequence element.
(fib): Correct corner cases in the recursion.


fib_fix.patch
Description: Binary data

Re: RFC: [PATCH] Add __builtin_ia32_stack_top

2015-07-22 Thread Segher Boessenkool

On Wed, Jul 22, 2015 at 05:10:04AM -0700, H.J. Lu wrote:
> I got a feedback, suggesting __builtin_stack_top, instead of
> __builtin_ia32_stack_top.  But I don't know if
> 
> +  /* After the prologue, stack top is at -WORD(AP) in the current
> +frame.  */
> +  emit_insn (gen_rtx_SET (target,
> + plus_constant (Pmode, arg_pointer_rtx,
> +-UNITS_PER_WORD)));
> 
> is true for all backends.  If it works on all backends, I can move
> it to builtins.c.

It doesn't afaik.  But can't you define INITIAL_FRAME_ADDRESS_RTX?


Segher

[PATCH] S/390: Fix cfi for GPR 2 FPR saves

2015-07-22 Thread Andreas Krebbel

Hi,

GCC currently does not emit register save cfi information for the
stack pointer register.  Instead dwarf2cfi considers the load into an
FPR as using a new CFA register from now on.

Adding a CFA_REGISTER note prevents dwarf2cfi from interpreting the
insn itself.

This fixes the Glibc testcases tst-cancelx4 and tst-cancelx5.

I'll commit the patch after waiting a few days for review comments.

Bootstrapped on s390x. No regressions.

Bye,

-Andreas-

gcc/ChangeLog:

2015-07-22  Andreas Krebbel  

* config/s390/s390.c (s390_save_gprs_to_fprs): Add CFA_REGISTER
reg note to the GPR -> FPR save instructions.

gcc/testsuite/ChangeLog:

2015-07-22  Andreas Krebbel  

* gcc.target/s390/gpr2fprsavecfi.c: New test.


diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c
index faf7621..a31f33c 100644
--- a/gcc/config/s390/s390.c
+++ b/gcc/config/s390/s390.c
@@ -10134,6 +10134,10 @@ s390_save_gprs_to_fprs (void)
emit_move_insn (gen_rtx_REG (DImode, cfun_gpr_save_slot (i)),
gen_rtx_REG (DImode, i));
  RTX_FRAME_RELATED_P (insn) = 1;
+ /* This prevents dwarf2cfi from interpreting the set.  Doing
+so it might emit def_cfa_register infos setting an FPR as
+new CFA.  */
+ add_reg_note (insn, REG_CFA_REGISTER, PATTERN (insn));
}
 }
 }
diff --git a/gcc/testsuite/gcc.target/s390/gpr2fprsavecfi.c b/gcc/testsuite/gcc.target/s390/gpr2fprsavecfi.c
new file mode 100644
index 000..92a0d3a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/s390/gpr2fprsavecfi.c
@@ -0,0 +1,23 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -march=z10 -mzarch -fdwarf2-cfi-asm" } */
+
+char *gl[100];
+
+long
+foo ()
+{
+  long r = 0;
+  char bla[100];
+  int i;
+
+  __builtin_memcpy (bla, gl, 100);
+
+  for (i = 0; i < 100; i++)
+r += bla[i];
+
+  return r;
+}
+
+/* { dg-final { scan-assembler-not "cfi_def_cfa_register" } } */
+/* { dg-final { scan-assembler "cfi_register" } } */
+/* { dg-final { scan-assembler "cfi_def_cfa_offset" } } */

Re: RFC: [PATCH] Add __builtin_ia32_stack_top

2015-07-22 Thread H.J. Lu

On Wed, Jul 22, 2015 at 6:55 AM, Segher Boessenkool
 wrote:
> On Wed, Jul 22, 2015 at 05:10:04AM -0700, H.J. Lu wrote:
>> I got a feedback, suggesting __builtin_stack_top, instead of
>> __builtin_ia32_stack_top.  But I don't know if
>>
>> +  /* After the prologue, stack top is at -WORD(AP) in the current
>> +frame.  */
>> +  emit_insn (gen_rtx_SET (target,
>> + plus_constant (Pmode, arg_pointer_rtx,
>> +-UNITS_PER_WORD)));
>>
>> is true for all backends.  If it works on all backends, I can move
>> it to builtins.c.
>
> It doesn't afaik.  But can't you define INITIAL_FRAME_ADDRESS_RTX?
>
>
> Segher

Does INITIAL_FRAME_ADDRESS_RTX point to stack top? It certainly
can't be defined for x86.   I will write a middle-end patch and leave it to each
backend to enable it.

-- 
H.J.

Re: Tests for libgomp based on OpenMP Examples 4.0.2.

2015-07-22 Thread Jakub Jelinek

On Wed, Jul 22, 2015 at 04:55:09PM +0300, Maxim Blumental wrote:
> 2015-07-22  Maxim Blumenthal  
> 
>   PR libgomp/66950
>   * testsuite/libgomp.c/examples-4/simd-7.c: Lower the defined constant
>   N to 30.  Add iterative reference function for Fibonacci numbers
>   - fib_ref.

Incorrect changelog.  It should be:
* testsuite/libgomp.c/examples-4/simd-7.c (N): Change to 30 from 45.
(fib_ref): New function.

>   (fib): Correct corner cases in the recursion.

Why the n == 2 special case?  I think only the if (n <= 1) return n;
case is really needed.
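
To make the point concrete, a minimal sketch with only that base case, next to an iterative reference.  The names demo_fib and demo_fib_ref are illustrative; the patch's own fib and fib_ref are not shown in this thread and may differ.

```c
/* Recursive definition with the single base case: demo_fib (0) == 0,
   demo_fib (1) == 1, and n == 2 needs no special handling because it
   reduces to demo_fib (1) + demo_fib (0).  */
int
demo_fib (int n)
{
  if (n <= 1)
    return n;
  return demo_fib (n - 1) + demo_fib (n - 2);
}

/* Iterative reference in linear time.  After I loop iterations A
   holds the I-th Fibonacci number and B the (I+1)-th.  */
int
demo_fib_ref (int n)
{
  int a = 0, b = 1;
  for (int i = 0; i < n; i++)
    {
      int next = a + b;
      a = b;
      b = next;
    }
  return a;
}
```

With N lowered to 30, every value involved fits comfortably in a 32-bit int.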

>   (main): Replace the non-simd loop with fib_ref call.
>   * testsuite/libgomp.fortran/examples-4/simd-7.f90: Add iterative
>   reference subroutine for Fibonacci numbers - fib_ref.

See above for ChangeLog entry issue.

>   (fibonacci): Lower the parameter N to 30.  Correct accordingly check for
>   the last array element value.  Replace the non-simd loop with fib_ref
>   call.  Remove redundant b_ref array.  Remove the comparison of the last
>   array element with the corresponding Fibonacci sequence element.
>   (fib): Correct corner cases in the recursion.

See above for n == 2 special case.

Jakub

[PATCH] MIPS: Prevent the p5600-bonding.c test from being run for the n32 and 64 ABIs

2015-07-22 Thread Andrew Bennett

Hi,

The MIPS p5600-bonding.c test is currently failing for the n32 and n64 
ABIs.  The test is checking if the load/store bonding patterns correctly 
match sequences of load/store instructions.  There are currently no load/store 
bonding patterns to match DI mode values.  For the n32 and n64 ABIs the code 
generated for the testcase produces DI mode load and stores; which means the 
load/store bonding patterns are not matched and the test fails. 

To fix this issue I have added a dg-skip-if option to the test to prevent it
from being run for the n32 and n64 ABIs.  When support for load/store bonding
for DI mode values has been added this can be removed.

The patch has been tested on the mti/img elf/linux-gnu toolchains, and
there have been no new regressions.

The patch and ChangeLog are below.

Ok to commit?


Many thanks,



Andrew


testsuite/
* gcc.target/mips/p5600-bonding.c (dg-skip-if): Don't run the test for
the n32 or n64 ABIs.
   
   
   
diff --git a/gcc/testsuite/gcc.target/mips/p5600-bonding.c b/gcc/testsuite/gcc.target/mips/p5600-bonding.c
index 0890ffa..20c26ca 100644
--- a/gcc/testsuite/gcc.target/mips/p5600-bonding.c
+++ b/gcc/testsuite/gcc.target/mips/p5600-bonding.c
@@ -1,6 +1,7 @@
 /* { dg-do compile } */
 /* { dg-options "-dp -mtune=p5600  -mno-micromips -mno-mips16" } */
 /* { dg-skip-if "Bonding needs peephole optimization." { *-*-* } { "-O0" "-O1" } { "" } } */
+/* { dg-skip-if "There is no DI mode support for load/store bonding" { *-*-* } { "-mabi=n32" "-mabi=64" } { "" } } */
 typedef int VINT32 __attribute__ ((vector_size((16))));
 
 void

[Patch ARM/AArch64 obvious] Fix typo: Rename insn_reservation cortex_53_advsimd to cortex_a53_advsimd

2015-07-22 Thread James Greenhalgh


Hi,

As subject. This makes the naming scheme for insn_reservations consistent in
config/arm/cortex-a53.md.

Checked that we still build a compiler after this cosmetic change, and
committed as obvious as revision 226069.

Thanks,
James

2015-07-22  James Greenhalgh  

* config/arm/cortex-a53.md (cortex_53_advsimd): Rename to...
(cortex_a53_advsimd): ...This.

diff --git a/gcc/config/arm/cortex-a53.md b/gcc/config/arm/cortex-a53.md
index 9065170..db572f6 100644
--- a/gcc/config/arm/cortex-a53.md
+++ b/gcc/config/arm/cortex-a53.md
@@ -360,7 +360,7 @@
 ;; Crude Advanced SIMD approximation.
 
 
-(define_insn_reservation "cortex_53_advsimd" 4
+(define_insn_reservation "cortex_a53_advsimd" 4
   (and (eq_attr "tune" "cortexa53")
(eq_attr "is_neon_type" "yes"))
   "cortex_a53_simd0")

RE: [PATCH] MIPS: Prevent the p5600-bonding.c test from being run for the n32 and 64 ABIs

2015-07-22 Thread Matthew Fortune

Andrew Bennett  writes:
> diff --git a/gcc/testsuite/gcc.target/mips/p5600-bonding.c
> b/gcc/testsuite/gcc.target/mips/p5600-bonding.c
> index 0890ffa..20c26ca 100644
> --- a/gcc/testsuite/gcc.target/mips/p5600-bonding.c
> +++ b/gcc/testsuite/gcc.target/mips/p5600-bonding.c
> @@ -1,6 +1,7 @@
>  /* { dg-do compile } */
>  /* { dg-options "-dp -mtune=p5600  -mno-micromips -mno-mips16" } */
>  /* { dg-skip-if "Bonding needs peephole optimization." { *-*-* } { "-O0" "-O1" } { "" } } */
> +/* { dg-skip-if "There is no DI mode support for load/store bonding" { *-*-* } { "-mabi=n32" "-mabi=64" } { "" } } */
>  typedef int VINT32 __attribute__ ((vector_size((16))));

If the best fix we can do for this test is to limit what it tests then we
should still not just skip it. There is some precedent for tests that
require a specific arch with the isa=loongson special case. I'd rather
just lock the test down to p5600 as per the filename.

Thanks,
Matthew

RE: [PATCH, MIPS] I6400 scheduling

2015-07-22 Thread Robert Suchanek

Hi,

> > diff --git a/gcc/config/mips/i6400.md b/gcc/config/mips/i6400.md new
> > file mode 100644 index 000..101a20c
> > --- /dev/null
> > +++ b/gcc/config/mips/i6400.md
> > @@ -0,0 +1,142 @@
> > +;; DFA-based pipeline description for I6400.
> > +;;
> > +;; Copyright (C) 2007-2015 Free Software Foundation, Inc.
> 
> This should just be 2015.

Fixed.
 
> > diff --git a/gcc/config/mips/mips-cpus.def b/gcc/config/mips/mips-
> > cpus.def index fb4bae0..90836a3 100644
> > --- a/gcc/config/mips/mips-cpus.def
> > +++ b/gcc/config/mips/mips-cpus.def
> > @@ -50,13 +50,13 @@ MIPS_CPU ("mips32r2", PROCESSOR_74KF2_1, 33,
> > PTF_AVOID_BRANCHLIKELY)
> > as mips32r2.  */
> >  MIPS_CPU ("mips32r3", PROCESSOR_M4K, 34, PTF_AVOID_BRANCHLIKELY)
> > MIPS_CPU ("mips32r5", PROCESSOR_P5600, 36, PTF_AVOID_BRANCHLIKELY) -
> > MIPS_CPU ("mips32r6", PROCESSOR_W32, 37, PTF_AVOID_BRANCHLIKELY)
> > +MIPS_CPU ("mips32r6", PROCESSOR_I6400, 37, PTF_AVOID_BRANCHLIKELY)
> >  MIPS_CPU ("mips64", PROCESSOR_5KC, 64, PTF_AVOID_BRANCHLIKELY)
> >  /* ??? For now just tune the generic MIPS64r2 and above for 5KC as
> > well.   */
> >  MIPS_CPU ("mips64r2", PROCESSOR_5KC, 65, PTF_AVOID_BRANCHLIKELY)
> > MIPS_CPU ("mips64r3", PROCESSOR_5KC, 66, PTF_AVOID_BRANCHLIKELY)
> > MIPS_CPU ("mips64r5", PROCESSOR_5KC, 68, PTF_AVOID_BRANCHLIKELY) -
> > MIPS_CPU ("mips64r6", PROCESSOR_W64, 69, PTF_AVOID_BRANCHLIKELY)
> > +MIPS_CPU ("mips64r6", PROCESSOR_I6400, 69, PTF_AVOID_BRANCHLIKELY)
> >
> >  /* MIPS I processors.  */
> >  MIPS_CPU ("r3000", PROCESSOR_R3000, 1, 0) @@ -166,3 +166,6 @@ MIPS_CPU
> > ("octeon+", PROCESSOR_OCTEON, 65, PTF_AVOID_BRANCHLIKELY)  MIPS_CPU
> > ("octeon2", PROCESSOR_OCTEON2, 65, PTF_AVOID_BRANCHLIKELY)  MIPS_CPU
> > ("octeon3", PROCESSOR_OCTEON3, 65, PTF_AVOID_BRANCHLIKELY)  MIPS_CPU
> > ("xlp", PROCESSOR_XLP, 65, PTF_AVOID_BRANCHLIKELY)
> > +
> > +/* MIPS64 Release 6 processors.  */
> > +MIPS_CPU ("i6400", PROCESSOR_I6400, 69, PTF_AVOID_BRANCHLIKELY)
> 
> I don't think this really matters but the PTF_AVOID_BRANCHLIKELY should
> not be necessary for R6 cores as there are no branch likely instructions.
> Changing this may also require an update to the option handling code
> in mips.c I don't know if it will try to enable branch likely if you
> remove this.

PTF_AVOID_BRANCHLIKELY replaced with 0 in all 3 cases.
AFAICS, there is no need to update the option handling code. The branch
likely will not be enabled as it is additionally guarded by 
ISA_HAS_BRANCHLIKELY.
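
A simplified sketch of that double guard, for illustration only.  The names below are hypothetical stand-ins, not the identifiers or the full logic from mips.c, which takes further conditions into account.

```c
#include <stdbool.h>

/* Hypothetical flag bit standing in for PTF_AVOID_BRANCHLIKELY.  */
#define DEMO_PTF_AVOID_BRANCHLIKELY 0x1u

/* Branch-likely is only considered when the ISA provides it at all;
   the tuning flag can merely veto it.  Passing 0 as the tuning flags
   for an ISA without branch-likely therefore changes nothing.  */
bool
demo_use_branch_likely (bool isa_has_branch_likely, unsigned tune_flags)
{
  return isa_has_branch_likely
	 && (tune_flags & DEMO_PTF_AVOID_BRANCHLIKELY) == 0;
}
```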

> 
> OK with those changes.

I'll commit the updated patch once the build completes.

> Does the I6400 support load/store bonding? I seem to think it does but
> could be wrong. If it does then dealing with it in a follow up patch is
> OK with me.

It does support the load/store bonding.  I'll test and send another patch
for this.

Regards,
Robert

Re: [gomp4, PATCH] Fix libgomp.oacc-c-c++-common/lib-3.c

2015-07-22 Thread Thomas Schwinge

Hi Tom!

On Wed, 1 Jul 2015 13:16:14 +0200, Tom de Vries  wrote:
> testcase libgomp.oacc-c-c++-common/lib-3.c is supposed to fail.
> 
> It fails currently in two ways:
> - no device found, if there is no nonhost device type supported, so
>    just host and host_nonshm
> - no device initialized, if there is a nonhost device type supported,
>    e.g. nvidia
> 
> The reason for the different failure modes is the usage of 
> acc_device_not_host.
> 
> Neither of the current failure modes is matched by the current dg-output:
> ...
> /* { dg-output "device \[0-9\]+\\\(\[0-9\]+\\\) is initialized" } */
> ...
> I don't understand what this dg-output is trying to achieve.

Yeah, neither do I.  I guess the behavior of libgomp changed at some
point.  (For avoidance of doubt, the current behavior is in accord with
the specification, as far as I can tell.)

> Attached patch makes sure that both current failure modes are tested and 
> accepted.

> Fix libgomp.oacc-c-c++-common/lib-3.c
> 
> 2015-07-01  Tom de Vries  
> 
>   * testsuite/lib/libgomp.exp (offload_targets_nonhost): New var.
>   (check_effective_target_offload_target_nonhost_supported): New proc.
>   * testsuite/libgomp.oacc-c-c++-common/lib-3.c: Only run if
>   offload_target_nonhost_supported.
>   * testsuite/libgomp.oacc-c-c++-common/lib-3b.c: New test.  Copy of
>   lib-3.c, but only run if !offload_target_nonhost_supported.

Thanks, but that seemed a bit heavy-weight to me for something as small
as this test case, so in r226070 I committed the following to
gomp-4_0-branch:

commit bb8f2ef333bb999e6d5e9fe834efab3fbbefa6d8
Author: tschwinge 
Date:   Wed Jul 22 14:24:22 2015 +

libgomp: Resolve XFAIL in libgomp.oacc-c-c++-common/lib-3.c

libgomp/
* testsuite/libgomp.oacc-c-c++-common/lib-3.c: Resolve XFAIL.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@226070 
138bc75d-0d04-0410-961f-82ee72b054a4
---
 libgomp/ChangeLog.gomp  |  4 
 libgomp/testsuite/libgomp.oacc-c-c++-common/lib-3.c | 10 +-
 2 files changed, 9 insertions(+), 5 deletions(-)

diff --git libgomp/ChangeLog.gomp libgomp/ChangeLog.gomp
index 746003f..d71282c 100644
--- libgomp/ChangeLog.gomp
+++ libgomp/ChangeLog.gomp
@@ -1,3 +1,7 @@
+2015-07-22  Thomas Schwinge  
+
+   * testsuite/libgomp.oacc-c-c++-common/lib-3.c: Resolve XFAIL.
+
 2015-07-21  James Norris  
 
* testsuite/libgomp.oacc-c-c++-common/asyncwait-1.c: Additional tests.
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/lib-3.c libgomp/testsuite/libgomp.oacc-c-c++-common/lib-3.c
index d5f390d..e00053c 100644
--- libgomp/testsuite/libgomp.oacc-c-c++-common/lib-3.c
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/lib-3.c
@@ -1,4 +1,6 @@
-/* { dg-do run } */
+/* Expect an error message when shutting down a device different from the one
+   that has been initialized.  */
+/* { dg-do run { target { ! openacc_host_selected } } } */
 
 #include 
 
@@ -6,12 +8,10 @@ int
 main (int argc, char **argv)
 {
   acc_init (acc_device_host);
-
-  acc_shutdown (acc_device_not_host);
+  acc_shutdown (acc_device_default);
 
   return 0;
 }
 
-/* TODO: currently prints: "libgomp: no device found".  */
-/* { dg-output "device \[0-9\]+\\\(\[0-9\]+\\\) is initialized" { xfail *-*-* } } */
+/* { dg-output "no device initialized" } */
 /* { dg-shouldfail "" } */


Regards,
 Thomas



Re: [gomp4.1] calculate pointer offsets for depend(sink)

2015-07-22 Thread Jakub Jelinek

On Tue, Jul 21, 2015 at 03:53:47PM -0700, Aldy Hernandez wrote:
> commit 61b2d11dfa8083014b385fc6ec6564fc18c41c72
> Author: Aldy Hernandez 
> Date:   Tue Jul 21 08:02:39 2015 -0700
> 
>   * tree-pretty-print.c (dump_omp_clause): Pass TYPE_SIGN to
>   wi::neg_p.
> c/
>   * c-typeck.c (c_finish_omp_clauses): Adjust pointer offsets for
>   OMP_CLAUSE_DEPEND_SINK.
> cp/
>   * semantics.c (cp_finish_omp_clause_depend_sink): New.
>   (finish_omp_clauses): Call cp_finish_omp_clause_depend_sink.

Ok, thanks.

Jakub

[gomp4] libgomp: Additional acc_shutdown bug fixing and testing (was: [gomp4, PATCH] Fix libgomp.oacc-c-c++-common/lib-3.c)

2015-07-22 Thread Thomas Schwinge

Hi!

On Wed, 22 Jul 2015 16:32:17 +0200, I wrote:
> On Wed, 1 Jul 2015 13:16:14 +0200, Tom de Vries  
> wrote:
> > testcase libgomp.oacc-c-c++-common/lib-3.c is supposed to fail.

> --- libgomp/testsuite/libgomp.oacc-c-c++-common/lib-3.c
> +++ libgomp/testsuite/libgomp.oacc-c-c++-common/lib-3.c
> @@ -1,4 +1,6 @@
> -/* { dg-do run } */
> +/* Expect an error message when shutting down a device different from the one
> +   that has been initialized.  */
> +/* { dg-do run { target { ! openacc_host_selected } } } */
>  
>  #include 
>  
> @@ -6,12 +8,10 @@ int
>  main (int argc, char **argv)
>  {
>    acc_init (acc_device_host);
> -
> -  acc_shutdown (acc_device_not_host);
> +  acc_shutdown (acc_device_default);
>  
>    return 0;
>  }
>  
> -/* TODO: currently prints: "libgomp: no device found".  */
> -/* { dg-output "device \[0-9\]+\\\(\[0-9\]+\\\) is initialized" { xfail *-*-* } } */
> +/* { dg-output "no device initialized" } */
>  /* { dg-shouldfail "" } */

While looking at this issue, I found an inconsistency in libgomp; committed
to gomp-4_0-branch in r226071:

commit 4e1d42a292c3f868f63ec0b9a3577b6344e087e5
Author: tschwinge 
Date:   Wed Jul 22 14:24:33 2015 +

libgomp: Additional acc_shutdown bug fixing and testing

libgomp/
* oacc-init.c (acc_shutdown): Call gomp_init_targets_once.
* testsuite/libgomp.oacc-c-c++-common/lib-8.c: New file.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@226071 
138bc75d-0d04-0410-961f-82ee72b054a4
---
 libgomp/ChangeLog.gomp  |  3 +++
 libgomp/oacc-init.c |  2 ++
 libgomp/testsuite/libgomp.oacc-c-c++-common/lib-8.c | 16 
 3 files changed, 21 insertions(+)

diff --git libgomp/ChangeLog.gomp libgomp/ChangeLog.gomp
index d71282c..0d3c62f 100644
--- libgomp/ChangeLog.gomp
+++ libgomp/ChangeLog.gomp
@@ -1,5 +1,8 @@
 2015-07-22  Thomas Schwinge  
 
+   * oacc-init.c (acc_shutdown): Call gomp_init_targets_once.
+   * testsuite/libgomp.oacc-c-c++-common/lib-8.c: New file.
+
* testsuite/libgomp.oacc-c-c++-common/lib-3.c: Resolve XFAIL.
 
 2015-07-21  James Norris  
diff --git libgomp/oacc-init.c libgomp/oacc-init.c
index 63ac710..f0d1df9 100644
--- libgomp/oacc-init.c
+++ libgomp/oacc-init.c
@@ -461,6 +461,8 @@ ialias (acc_init)
 void
 acc_shutdown (acc_device_t d)
 {
+  gomp_init_targets_once ();
+
   gomp_mutex_lock (&acc_device_lock);
 
   acc_shutdown_1 (d);
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/lib-8.c libgomp/testsuite/libgomp.oacc-c-c++-common/lib-8.c
new file mode 100644
index 000..5eb28af
--- /dev/null
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/lib-8.c
@@ -0,0 +1,16 @@
+/* Expect error message when shutting down a device that has never been
+   initialized.  */
+/* { dg-do run } */
+
+#include 
+
+int
+main (int argc, char **argv)
+{
+  acc_shutdown (acc_device_default);
+
+  return 0;
+}
+
+/* { dg-output "no device initialized" } */
+/* { dg-shouldfail "" } */


Regards,
 Thomas
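
The ordering issue addressed in r226071 can be sketched outside libgomp.  The following is a minimal mock, not the libgomp implementation: init_targets_once, device_init, device_shutdown and both flags are hypothetical stand-ins for gomp_init_targets_once, acc_init, acc_shutdown and the library's internal state.  It only shows why the shutdown entry point should run the one-time setup itself, so that a shutdown with no preceding init takes the same "no device initialized" path as lib-8.c expects.  Locking is left out, so the sketch is not thread-safe.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the library's internal state.  */
static bool targets_probed;   /* set by the one-time target setup */
static bool device_active;    /* set while a device is initialized */

/* One-time setup; safe to call from every entry point.  */
static void
init_targets_once (void)
{
  if (!targets_probed)
    targets_probed = true;
}

static void
device_init (void)
{
  init_targets_once ();
  device_active = true;
}

/* Returns 0 on success and -1 where the real library would report
   "no device initialized".  */
static int
device_shutdown (void)
{
  /* Run the setup here as well, so a shutdown without a prior init
     sees consistent state.  */
  init_targets_once ();

  if (!device_active)
    return -1;

  device_active = false;
  return 0;
}
```

Calling device_shutdown before any device_init returns -1, as does a second device_shutdown after a successful one.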



[gomp4] libgomp testsuite: Remove some explicit acc_device_nvidia usage (was: [gomp4, PATCH] Fix libgomp.oacc-c-c++-common/lib-3.c)

2015-07-22 Thread Thomas Schwinge

Hi!

On Wed, 22 Jul 2015 16:32:17 +0200, I wrote:
> On Wed, 1 Jul 2015 13:16:14 +0200, Tom de Vries  
> wrote:
> > testcase libgomp.oacc-c-c++-common/lib-3.c is supposed to fail.

> libgomp: Resolve XFAIL in libgomp.oacc-c-c++-common/lib-3.c

Working on this, I also came up with the following cleanup; committed to
gomp-4_0-branch in r226072:

commit be5ec016c6e0e3981c609851a40fc1645e6b5d36
Author: tschwinge 
Date:   Wed Jul 22 14:24:47 2015 +

libgomp testsuite: Remove some explicit acc_device_nvidia usage.

libgomp/
* testsuite/libgomp.oacc-c-c++-common/lib-1.c: Remove explicit
acc_device_nvidia usage.
* testsuite/libgomp.oacc-c-c++-common/lib-10.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/lib-2.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/lib-9.c: Likewise.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@226072 
138bc75d-0d04-0410-961f-82ee72b054a4
---
 libgomp/ChangeLog.gomp   |  6 ++
 libgomp/testsuite/libgomp.oacc-c-c++-common/lib-1.c  | 14 ++
 libgomp/testsuite/libgomp.oacc-c-c++-common/lib-10.c |  9 +
 libgomp/testsuite/libgomp.oacc-c-c++-common/lib-2.c  | 17 +++--
 libgomp/testsuite/libgomp.oacc-c-c++-common/lib-9.c  | 14 ++
 5 files changed, 18 insertions(+), 42 deletions(-)

diff --git libgomp/ChangeLog.gomp libgomp/ChangeLog.gomp
index 0d3c62f..33e7b3b 100644
--- libgomp/ChangeLog.gomp
+++ libgomp/ChangeLog.gomp
@@ -1,5 +1,11 @@
 2015-07-22  Thomas Schwinge  
 
+   * testsuite/libgomp.oacc-c-c++-common/lib-1.c: Remove explicit
+   acc_device_nvidia usage.
+   * testsuite/libgomp.oacc-c-c++-common/lib-10.c: Likewise.
+   * testsuite/libgomp.oacc-c-c++-common/lib-2.c: Likewise.
+   * testsuite/libgomp.oacc-c-c++-common/lib-9.c: Likewise.
+
* oacc-init.c (acc_shutdown): Call gomp_init_targets_once.
* testsuite/libgomp.oacc-c-c++-common/lib-8.c: New file.
 
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/lib-1.c libgomp/testsuite/libgomp.oacc-c-c++-common/lib-1.c
index 5ff23b2..b7729df 100644
--- libgomp/testsuite/libgomp.oacc-c-c++-common/lib-1.c
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/lib-1.c
@@ -5,18 +5,8 @@
 int
 main (int argc, char **argv)
 {
-  acc_device_t devtype = acc_device_host;
-
-#if ACC_DEVICE_TYPE_nvidia
-  devtype = acc_device_nvidia;
-
-  if (acc_get_num_devices (devtype) == 0)
-return 0;
-#endif
-
-  acc_init (devtype);
-
-  acc_init (devtype);
+  acc_init (acc_device_default);
+  acc_init (acc_device_default);
 
   return 0;
 }
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/lib-10.c libgomp/testsuite/libgomp.oacc-c-c++-common/lib-10.c
index cf1af8c..55054c0 100644
--- libgomp/testsuite/libgomp.oacc-c-c++-common/lib-10.c
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/lib-10.c
@@ -7,14 +7,7 @@ int
 main (int argc, char **argv)
 {
   void *d;
-  acc_device_t devtype = acc_device_host;
-
-#if ACC_DEVICE_TYPE_nvidia
-  devtype = acc_device_nvidia;
-
-  if (acc_get_num_devices (acc_device_nvidia) == 0)
-return 0;
-#endif
+  acc_device_t devtype = acc_device_default;
 
   acc_init (devtype);
 
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/lib-2.c libgomp/testsuite/libgomp.oacc-c-c++-common/lib-2.c
index b16e9e6..90e67d4 100644
--- libgomp/testsuite/libgomp.oacc-c-c++-common/lib-2.c
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/lib-2.c
@@ -5,20 +5,9 @@
 int
 main (int argc, char **argv)
 {
-  acc_device_t devtype = acc_device_host;
-
-#if ACC_DEVICE_TYPE_nvidia
-  devtype = acc_device_nvidia;
-
-  if (acc_get_num_devices (acc_device_nvidia) == 0)
-return 0;
-#endif
-
-  acc_init (devtype);
-
-  acc_shutdown (devtype);
-
-  acc_shutdown (devtype);
+  acc_init (acc_device_default);
+  acc_shutdown (acc_device_default);
+  acc_shutdown (acc_device_default);
 
   return 0;
 }
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/lib-9.c libgomp/testsuite/libgomp.oacc-c-c++-common/lib-9.c
index a4cf7f2..5dce9b8 100644
--- libgomp/testsuite/libgomp.oacc-c-c++-common/lib-9.c
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/lib-9.c
@@ -9,18 +9,17 @@ main (int argc, char **argv)
   int i;
   int num_devices;
   int devnum;
-  acc_device_t devtype = acc_device_host;
-
-#if ACC_DEVICE_TYPE_nvidia
-  devtype = acc_device_nvidia;
-#endif
+  acc_device_t devtype = acc_device_default;
 
   num_devices = acc_get_num_devices (devtype);
   if (num_devices == 0)
-return 0;
+abort ();
 
   acc_init (devtype);
 
+  if (num_devices != acc_get_num_devices (devtype))
+abort ();
+
   for (i = 0; i < num_devices; i++)
 {
   acc_set_device_num (i, devtype);
@@ -31,8 +30,7 @@ main (int argc, char **argv)
 
   acc_shutdown (devtype);
 
-  num_devices = acc_get_num_devices (devtype);
-  if (num_devices == 0)
+  if (num_devices != acc_get_num_devices (devtype))
 abort ();
 
   for (i = 0; i < num_devices; i++)


Regards,
 Thomas



Re: Tests for libgomp based on OpenMP Examples 4.0.2.

2015-07-22 Thread Maxim Blumental

 Corrected log and patch.

2015-07-22 17:12 GMT+03:00 Jakub Jelinek :
> On Wed, Jul 22, 2015 at 04:55:09PM +0300, Maxim Blumental wrote:
>> 2015-07-22  Maxim Blumenthal  
>>
>>   PR libgomp/66950
>>   * testsuite/libgomp.c/examples-4/simd-7.c: Lower the defined constant
>>   N to 30.  Add iterative reference function for Fibonacci numbers
>>   - fib_ref.
>
> Incorrect changelog.  It should be:
> * testsuite/libgomp.c/examples-4/simd-7.c (N): Change to 30 from 45.
> (fib_ref): New function.
>
>>   (fib): Correct corner cases in the recursion.
>
> Why the n == 2 special case?  I think only the if (n <= 1) return n;
> case is really needed.
>
>>   (main): Replace the non-simd loop with fib_ref call.
>>   * testsuite/libgomp.fortran/examples-4/simd-7.f90: Add iterative
>>   reference subroutine for Fibonacci numbers - fib_ref.
>
> See above for ChangeLog entry issue.
>
>>   (fibonacci): Lower the parameter N to 30.  Correct accordingly check for
>>   the last array element value.  Replace the non-simd loop with fib_ref
>>   call.  Remove redundant b_ref array.  Remove the comparison of the last
>>   array element with according Fibonacci sequence element.
>>   (fib): Correct corner cases in the recursion.
>
> See above for n == 2 special case.
>
> Jakub

-- 

Sincerely yours,
Maxim Blumental
2015-07-22  Maxim Blumenthal  

PR libgomp/66950
* testsuite/libgomp.c/examples-4/simd-7.c(N): Change to 30 from 45.
(fib_ref): New function.
(fib): Correct corner cases in the recursion.
(main): Replace the non-simd loop with fib_ref call.
* testsuite/libgomp.fortran/examples-4/simd-7.f90: (fib_ref): New
subroutine.
(fibonacci): Lower the parameter N to 30.  Correct accordingly check
for the last array element value.  Replace the non-simd loop with
fib_ref call.  Remove redundant b_ref array.  Remove the comparison
of the last array element with according Fibonacci sequence element.
(fib): Correct corner cases in the recursion.

fib_fix.patch
Description: Binary data

Re: Tests for libgomp based on OpenMP Examples 4.0.2.

2015-07-22 Thread Jakub Jelinek

On Wed, Jul 22, 2015 at 05:49:13PM +0300, Maxim Blumental wrote:
> 2015-07-22  Maxim Blumenthal  
> 
>   PR libgomp/66950
>   * testsuite/libgomp.c/examples-4/simd-7.c(N): Change to 30 from 45.

There should be a space before (N).

>   (fib_ref): New function.
>   (fib): Correct corner cases in the recursion.
>   (main): Replace the non-simd loop with fib_ref call.
>   * testsuite/libgomp.fortran/examples-4/simd-7.f90: (fib_ref): New
>   subroutine.
>   (fibonacci): Lower the parameter N to 30.  Correct accordingly check
>   for the last array element value.  Replace the non-simd loop with
>   fib_ref call.  Remove redundant b_ref array.  Remove the comparison
>   of the last array element with according Fibonacci sequence element.
>   (fib): Correct corner cases in the recursion.

Ok with that change.

Jakub
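
For readers without the attachment, the base-case question raised earlier in this thread can be shown with a minimal C sketch.  The names fib, fib_ref and N come from the ChangeLog; the bodies below are assumptions, since the actual patch is only attached as binary data.  With `n <= 1` as the only base case, fib (2) falls through to fib (1) + fib (0) = 1, so a separate n == 2 case adds nothing.

```c
#define N 30  /* the value the ChangeLog lowers N to */

/* Recursive Fibonacci.  The single test n <= 1 covers both corner
   cases: fib (0) == 0 and fib (1) == 1.  */
static int
fib (int n)
{
  if (n <= 1)
    return n;
  return fib (n - 1) + fib (n - 2);
}

/* Iterative reference: fills a[0] .. a[n - 1] with the first n
   Fibonacci numbers.  */
static void
fib_ref (int *a, int n)
{
  int i;

  for (i = 0; i < n; i++)
    a[i] = i <= 1 ? i : a[i - 1] + a[i - 2];
}
```

A checker would fill an array of N elements with fib_ref and compare each element against fib (i).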

[PATCH] Document ftrapv/fwrapv interaction

2015-07-22 Thread Tom de Vries


[ Re: [RFC, PR66873] Use graphite for parloops ]
On 22/07/15 13:01, Richard Biener wrote:

> why only scalar floats?  Please use FLOAT_TYPE_P.
>
> +  if (INTEGRAL_TYPE_P (type))
> +    return (!TYPE_OVERFLOW_TRAPS (type)
> +            && TYPE_OVERFLOW_WRAPS (type));
>
> it cannot both wrap and trap thus TYPE_OVERFLOW_WRAPS is enough.


Hmm, indeed, when specifying both, one is quietly ignored. The 
documentation also doesn't mention this.


The attached, untested patch documents this ftrapv/fwrapv interaction.

OK for trunk, if bootstrap succeeds?

Thanks,
- Tom
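
The two overflow behaviours discussed above cannot both apply to the same operation, which a small C sketch makes concrete.  It does not depend on either command-line option; wrapping_add and add_overflows are hypothetical helper names that emulate the two semantics explicitly.  It assumes GCC 5 or later (or a compatible compiler) for __builtin_add_overflow, and relies on GCC's implementation-defined modulo conversion from unsigned int to int.

```c
#include <limits.h>
#include <stdbool.h>

/* The result -fwrapv promises for signed addition: two's-complement
   wraparound.  Done in unsigned arithmetic to avoid undefined
   behaviour; the conversion back to int is implementation-defined
   and reduces modulo 2^N with GCC.  */
static int
wrapping_add (int a, int b)
{
  return (int) ((unsigned int) a + (unsigned int) b);
}

/* The condition -ftrapv reacts to: returns true when a + b overflows
   int.  *RESULT receives the wrapped sum either way.  */
static bool
add_overflows (int a, int b, int *result)
{
  return __builtin_add_overflow (a, b, result);
}
```

For INT_MAX + 1, wrapping_add yields INT_MIN while add_overflows reports true; a compiler has to choose one of the two outcomes for a plain `a + b`.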


Document ftrapv/fwrapv interaction

2015-07-22  Tom de Vries  

	* doc/invoke.texi (@item -ftrapv, @item -fwrapv): Document interaction.
---
 gcc/doc/invoke.texi | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 55c2659..aa0b0c0 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -23676,6 +23676,11 @@ option is used to control the temporary stack reuse optimization.
 @opindex ftrapv
 This option generates traps for signed overflow on addition, subtraction,
 multiplication operations.
+The options @option{-ftrapv} and @option{-fwrapv} override each other, so using
+@option{-ftrapv} @option{-fwrapv} on the command-line results in
+@option{-fwrapv} being effective.  Note that only active options override, so
+using @option{-ftrapv} @option{-fwrapv} @option{-fno-wrapv} on the command-line
+results in @option{-ftrapv} being effective.
 
 @item -fwrapv
 @opindex fwrapv
@@ -23684,6 +23689,11 @@ overflow of addition, subtraction and multiplication wraps around
 using twos-complement representation.  This flag enables some optimizations
 and disables others.  This option is enabled by default for the Java
 front end, as required by the Java language specification.
+The options @option{-ftrapv} and @option{-fwrapv} override each other, so using
+@option{-ftrapv} @option{-fwrapv} on the command-line results in
+@option{-fwrapv} being effective.  Note that only active options override, so
+using @option{-ftrapv} @option{-fwrapv} @option{-fno-wrapv} on the command-line
+results in @option{-ftrapv} being effective.
 
 @item -fexceptions
 @opindex fexceptions
-- 
1.9.1

Some additional comments for nvptx.c

2015-07-22 Thread Bernd Schmidt

Nathan asked me to go through nvptx.c and update some comments for 
things that aren't completely obvious. I've committed the following to 
trunk.



Bernd
Index: gcc/config/nvptx/nvptx.c
===
--- gcc/config/nvptx/nvptx.c	(revision 225937)
+++ gcc/config/nvptx/nvptx.c	(working copy)
@@ -280,7 +280,9 @@ write_as_kernel (tree attrs)
 	  || lookup_attribute ("omp target entrypoint", attrs) != NULL_TREE);
 }
 
-/* Write a function decl for DECL to S, where NAME is the name to be used.  */
+/* Write a function decl for DECL to S, where NAME is the name to be used.
+   This includes ptx .visible or .extern specifiers, .func or .kernel, and
+   argument and return types.  */
 
 static void
 nvptx_write_function_decl (std::stringstream &s, const char *name, const_tree decl)
@@ -770,7 +772,11 @@ nvptx_end_call_args (void)
   free_EXPR_LIST_list (&cfun->machine->call_args);
 }
 
-/* Emit the sequence for a call.  */
+/* Emit the sequence for a call to ADDRESS, setting RETVAL.  Keep
+   track of whether calls involving static chains or varargs were seen
+   in the current function.
+   For libcalls, maintain a hash table of decls we have seen, and
+   record a function decl for later when encountering a new one.  */
 
 void
 nvptx_expand_call (rtx retval, rtx address)
@@ -829,6 +835,8 @@ nvptx_expand_call (rtx retval, rtx addre
   XVECEXP (pat, 0, nargs + 1) = gen_rtx_USE (VOIDmode, this_arg);
 }
 
+  /* Construct the call insn, including a USE for each argument pseudo
+ register.  These will be used when printing the insn.  */
   int i;
   rtx arg;
   for (i = 1, arg = cfun->machine->call_args; arg; arg = XEXP (arg, 1), i++)
@@ -846,6 +854,11 @@ nvptx_expand_call (rtx retval, rtx addre
   t = gen_rtx_SET (tmp_retval, t);
 }
   XVECEXP (pat, 0, 0) = t;
+
+  /* If this is a libcall, decl_type is NULL. For a call to a non-libcall
+ undeclared function, we'll have an external decl without arg types.
+ In either case we have to try to construct a ptx declaration from one of
+ the calls to the function.  */
   if (!REG_P (callee)
   && (decl_type == NULL_TREE
 	  || (external_decl && TYPE_ARG_TYPES (decl_type) == NULL_TREE)))
@@ -1412,7 +1425,10 @@ nvptx_addr_space_from_address (rtx addr)
   return ADDR_SPACE_GLOBAL;
 }
 
-/* Machinery to output constant initializers.  */
+/* Machinery to output constant initializers.  When beginning an initializer,
+   we decide on a chunk size (which is visible in ptx in the type used), and
+   then all initializer data is buffered until a chunk is filled and ready to
+   be written out.  */
 
 /* Used when assembling integers to ensure data is emitted in
pieces whose size matches the declaration we printed.  */
@@ -1682,7 +1698,8 @@ nvptx_assemble_undefined_decl (FILE *fil
 }
 
 /* Output INSN, which is a call to CALLEE with result RESULT.  For ptx, this
-   involves writing .param declarations and in/out copies into them.  */
+   involves writing .param declarations and in/out copies into them.  For
+   indirect calls, also write the .callprototype.  */
 
 const char *
 nvptx_output_call_insn (rtx_insn *insn, rtx result, rtx callee)
@@ -1702,6 +1719,7 @@ nvptx_output_call_insn (rtx_insn *insn,
 	 false));
 }
 
+  /* Ensure we have a ptx declaration in the output if necessary.  */
   if (GET_CODE (callee) == SYMBOL_REF)
 {
   decl = SYMBOL_REF_DECL (callee);
@@ -3031,7 +3049,8 @@ nvptx_file_start (void)
   fputs ("// END PREAMBLE\n", asm_out_file);
 }
 
-/* Write out the function declarations we've collected.  */
+/* Write out the function declarations we've collected and declare storage
+   for the broadcast buffer.  */
 
 static void
 nvptx_file_end (void)

PR c/16351 Extend Wnonnull for returns_nonnull

2015-07-22 Thread Manuel López-Ibáñez

While looking at PR c/16351, I noticed that all tests proposed for
-Wnull-attribute
(https://gcc.gnu.org/ml/gcc-patches/2014-01/msg01715.html) could be
diagnosed from the FEs by simply extending the existing -Wnonnull.

Bootstrapped and regression tested on x86_64-linux-gnu.

OK?
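
As an illustration of the kind of code the extended warning targets, here is a minimal sketch.  The function name lookup is hypothetical and not taken from the testcase; the returns_nonnull attribute itself is an existing GCC function attribute.  The offending statement is left in a comment so that the snippet compiles cleanly.

```c
#include <stddef.h>

static int z;

/* Declared as never returning a null pointer.  */
int *lookup (int a) __attribute__ ((returns_nonnull));

int *
lookup (int a)
{
  (void) a;
  /* Writing `return NULL;` here is the pattern the proposed
     -Wnonnull extension is meant to diagnose.  */
  return &z;
}
```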


gcc/ChangeLog:

2015-07-22  Manuel López-Ibáñez  

PR c/16351
* doc/invoke.texi (Wnonnull): Document behavior for
returns_nonnull.

gcc/testsuite/ChangeLog:

2015-07-22  Manuel López-Ibáñez  

PR c/16351
* c-c++-common/wnonnull-1.c: New test.

gcc/cp/ChangeLog:

2015-07-22  Manuel López-Ibáñez  

PR c/16351
* typeck.c (check_return_expr): Call maybe_warn_returns_nonnull.


gcc/c-family/ChangeLog:

2015-07-22  Manuel López-Ibáñez  

PR c/16351
* c-common.c (maybe_warn_returns_nonnull): New.
* c-common.h (maybe_warn_returns_nonnull): Declare.

gcc/c/ChangeLog:

2015-07-22  Manuel López-Ibáñez  

PR c/16351
* c-typeck.c (c_finish_return): Call maybe_warn_returns_nonnull.
Index: gcc/doc/invoke.texi
===
--- gcc/doc/invoke.texi (revision 225868)
+++ gcc/doc/invoke.texi (working copy)
@@ -3709,11 +3709,13 @@ formats that may yield only a two-digit 
 
 @item -Wnonnull
 @opindex Wnonnull
 @opindex Wno-nonnull
 Warn about passing a null pointer for arguments marked as
-requiring a non-null value by the @code{nonnull} function attribute.
+requiring a non-null value by the @code{nonnull} function attribute
+or returning a null pointer from a function declared with the attribute
+@code{returns_nonnull}.
 
 @option{-Wnonnull} is included in @option{-Wall} and @option{-Wformat}.  It
 can be disabled with the @option{-Wno-nonnull} option.
 
 @item -Winit-self @r{(C, C++, Objective-C and Objective-C++ only)}
Index: gcc/c-family/c-common.c
===
--- gcc/c-family/c-common.c (revision 225868)
+++ gcc/c-family/c-common.c (working copy)
@@ -9508,10 +9508,22 @@ check_nonnull_arg (void * ARG_UNUSED (ct
   if (integer_zerop (param))
 warning (OPT_Wnonnull, "null argument where non-null required "
 "(argument %lu)", (unsigned long) param_num);
 }
 
+/* Possibly warn if RETVAL is a null pointer and FNDECL is declared
+   with attribute returns_nonnull.  LOC is the location of RETVAL.  */
+
+void
+maybe_warn_returns_nonnull (location_t loc, tree fndecl, tree retval)
+{
+  if (integer_zerop (retval)
+      && lookup_attribute ("returns_nonnull",
+                           TYPE_ATTRIBUTES (TREE_TYPE (fndecl))))
+    warning_at (loc, OPT_Wnonnull,
+                "null return value where non-null required");
+}
+
 /* Helper for nonnull attribute handling; fetch the operand number
from the attribute argument list.  */
 
 static bool
 get_nonnull_operand (tree arg_num_expr, unsigned HOST_WIDE_INT *valp)
Index: gcc/c-family/c-common.h
===
--- gcc/c-family/c-common.h (revision 225868)
+++ gcc/c-family/c-common.h (working copy)
@@ -1049,10 +1049,11 @@ extern void do_warn_double_promotion (tr
 extern void set_underlying_type (tree);
 extern void record_types_used_by_current_var_decl (tree);
 extern void record_locally_defined_typedef (tree);
 extern void maybe_record_typedef_use (tree);
 extern void maybe_warn_unused_local_typedefs (void);
+extern void maybe_warn_returns_nonnull (location_t, tree, tree);
 extern void maybe_warn_bool_compare (location_t, enum tree_code, tree, tree);
 extern vec *make_tree_vector (void);
 extern void release_tree_vector (vec *);
 extern vec *make_tree_vector_single (tree);
 extern vec *make_tree_vector_from_list (tree);
Index: gcc/c/c-typeck.c
===
--- gcc/c/c-typeck.c(revision 225868)
+++ gcc/c/c-typeck.c(working copy)
@@ -9372,10 +9372,11 @@ c_finish_return (location_t loc, tree re
{
  semantic_type = TREE_TYPE (retval);
  retval = TREE_OPERAND (retval, 0);
}
   retval = c_fully_fold (retval, false, NULL);
+  maybe_warn_returns_nonnull (loc, current_function_decl, retval);
   if (semantic_type)
retval = build1 (EXCESS_PRECISION_EXPR, semantic_type, retval);
 }
 
   if (!retval)
Index: gcc/testsuite/c-c++-common/wnonnull-1.c
===
--- gcc/testsuite/c-c++-common/wnonnull-1.c (revision 0)
+++ gcc/testsuite/c-c++-common/wnonnull-1.c (revision 0)
@@ -0,0 +1,42 @@
+/* { dg-do compile } */ 
+/* { dg-options "-Wnonnull" } */
+
+
+extern void foo(void *) __attribute__ ((__nonnull__ (1)));
+
+int z;
+int y;
+
+void
+com (int a)
+{
+  foo (a == 42 ? &z  : (void *) 0); /* { dg-warning "null" } */
+}
+
+void
+bar (void)
+{
+  foo ((void *)0); /* { dg-warning "null" } */
+}
+
+int * foo_r(int a) __attribute__((returns_nonnull));
+int * bar_r(void) __attribute__((returns_nonnull));
+
+int *
+foo_r(int a

Re: RFC: [PATCH] Add __builtin_ia32_stack_top

2015-07-22 Thread H.J. Lu

On Wed, Jul 22, 2015 at 6:59 AM, H.J. Lu  wrote:
> On Wed, Jul 22, 2015 at 6:55 AM, Segher Boessenkool
>  wrote:
>> On Wed, Jul 22, 2015 at 05:10:04AM -0700, H.J. Lu wrote:
>>> I got feedback suggesting __builtin_stack_top instead of
>>> __builtin_ia32_stack_top.  But I don't know if
>>>
>>> +  /* After the prologue, stack top is at -WORD(AP) in the current
>>> +frame.  */
>>> +  emit_insn (gen_rtx_SET (target,
>>> + plus_constant (Pmode, arg_pointer_rtx,
>>> +-UNITS_PER_WORD)));
>>>
>>> is true for all backends.  If it works on all backends, I can move
>>> it to builtins.c.
>>
>> It doesn't afaik.  But can't you define INITIAL_FRAME_ADDRESS_RTX?
>>
>>
>> Segher
>
> Does INITIAL_FRAME_ADDRESS_RTX point to stack top? It certainly
> can't be defined for x86.   I will write a middle-end patch and leave it to
> each backend to enable it.

Here is a patch.  Any comments or feedback?

Thanks.

-- 
H.J.
---
When __builtin_frame_address is used to retrieve the address of the
function stack frame, the frame pointer is always kept, which wastes one
register and 2 instructions.  For x86-32, one less register means
significant negative impact on performance.  This patch adds a new
builtin function, __builtin_stack_top.  It returns the stack address
when the function is called.

This patch only enables __builtin_stack_top for x86 backend.  Using
__builtin_stack_top with other backends will lead to

sorry, unimplemented: ‘__builtin_stack_top’ not supported on this target

TARGET_STACK_TOP_RTX must be defined to enable __builtin_stack_top.
default_stack_top_rtx may be extended to support more backends,
including those with INITIAL_FRAME_ADDRESS_RTX.
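
For comparison, this is how the existing builtin mentioned above is used today.  It is a minimal sketch: current_frame is a hypothetical helper name, and __builtin_stack_top appears only in a comment because it is merely proposed by this patch and not available in released compilers.

```c
#include <stddef.h>

/* Existing builtin: level 0 yields the frame address of the function
   that contains the call.  The returned pointer is only inspected
   here, never dereferenced.  */
__attribute__ ((noinline)) static void *
current_frame (void)
{
  /* With the proposed patch applied, an x86 build could call
     __builtin_stack_top () here instead (assumption: signature as in
     the patch description, taking no arguments).  */
  return __builtin_frame_address (0);
}
```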

gcc/

PR target/66960
* builtin-types.def (BT_FN_PTR_VOID): New function type.
* builtins.c (expand_builtin): Handle BUILT_IN_STACK_TOP.
(is_simple_builtin): Likewise.
* ipa-pure-const.c (special_builtin_state): Likewise.
* builtins.def: Add BUILT_IN_STACK_TOP.
* function.h (function): Add stack_top_taken.
* target.def (stack_top_rtx): New target hook.
* targhooks.c (default_stack_top_rtx): New.
* targhooks.h (default_stack_top_rtx): Likewise.
* config/i386/i386.c (ix86_expand_prologue): Sorry if DRAP is
used and the stack address has been taken.
(TARGET_STACK_TOP_RTX): New.
* doc/extend.texi: Document __builtin_stack_top.
* doc/tm.texi.in (TARGET_STACK_TOP_RTX): New.
* doc/tm.texi: Regenerated.

gcc/testsuite/

PR target/66960
* gcc.target/i386/pr66960-1.c: New test.
* gcc.target/i386/pr66960-2.c: Likewise.
* gcc.target/i386/pr66960-3.c: Likewise.
* gcc.target/i386/pr66960-4.c: Likewise.
* gcc.target/i386/pr66960-5.c: Likewise.
From 53c2dd6e303d48eccf050696020b3765d3c4c382 Mon Sep 17 00:00:00 2001
From: "H.J. Lu" 
Date: Tue, 21 Jul 2015 14:32:09 -0700
Subject: [PATCH] Add __builtin_stack_top
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

When __builtin_frame_address is used to retrieve the address of the
function stack frame, the frame pointer is always kept, which wastes one
register and 2 instructions.  For x86-32, one less register means
significant negative impact on performance.  This patch adds a new
builtin function, __builtin_stack_top.  It returns the stack address
when the function is called.

This patch only enables __builtin_stack_top for x86 backend.  Using
__builtin_stack_top with other backends will lead to

sorry, unimplemented: ‘__builtin_stack_top’ not supported on this target

TARGET_STACK_TOP_RTX must be defined to enable __builtin_stack_top.
default_stack_top_rtx may be extended to support more backends,
including those with INITIAL_FRAME_ADDRESS_RTX.

gcc/

	PR target/66960
	* builtin-types.def (BT_FN_PTR_VOID): New function type.
	* builtins.c (expand_builtin): Handle BUILT_IN_STACK_TOP.
	(is_simple_builtin): Likewise.
	* ipa-pure-const.c (special_builtin_state): Likewise.
	* builtins.def: Add BUILT_IN_STACK_TOP.
	* function.h (function): Add stack_top_taken.
	* target.def (stack_top_rtx): New target hook.
	* targhooks.c (default_stack_top_rtx): New.
	* targhooks.h (default_stack_top_rtx): Likewise.
	* config/i386/i386.c (ix86_expand_prologue): Sorry if DRAP is
	used and the stack address has been taken.
	(TARGET_STACK_TOP_RTX): New.
	* doc/extend.texi: Document __builtin_stack_top.
	* doc/tm.texi.in (TARGET_STACK_TOP_RTX): New.
	* doc/tm.texi: Regenerated.

gcc/testsuite/

	PR target/66960
	* gcc.target/i386/pr66960-1.c: New test.
	* gcc.target/i386/pr66960-2.c: Likewise.
	* gcc.target/i386/pr66960-3.c: Likewise.
	* gcc.target/i386/pr66960-4.c: Likewise.
	* gcc.target/i386/pr66960-5.c: Likewise.
---
 gcc/builtin-types.def |  1 +
 gcc/builtins.c| 11 +++
 gcc/builtins.def  |  1 +
 gcc/config/i386/i386.c|  8 
 gcc/doc/extend.texi   |  7 +++
 gcc/doc/tm.texi   |  5 +

PR middle-end/16351 NULL dereference warnings

2015-07-22 Thread Manuel López-Ibáñez

I took the patch in
https://gcc.gnu.org/ml/gcc-patches/2014-01/msg01715.html and removed
the Wnull-attribute part, since most of it can be done from the FE as
shown in https://gcc.gnu.org/ml/gcc-patches/2015-07/msg01857.html  and
also to make the patch smaller and easier to review.

I also fixed the comments by Florian here:
https://gcc.gnu.org/ml/gcc-patches/2014-02/msg00149.html and added
more tests from the PR and its duplicates (one xfailed, I'll open a
new PR about it).

Further cleanups may be possible (infer_nonnull_range_by_attribute
checks flag_delete_null_pointer_checks, which seems weird to me but it
matches the existing behavior of infer_nonnull_range).

I added this to Wall to get as much testing as possible, we can always
move it to Wextra or disable it by default just before the release if
it turns out to be too noisy.

Bootstrapped and regression tested on x86_64-linux-gnu.

OK?

gcc/ChangeLog:

2015-07-22  Manuel López-Ibáñez  
Jeff Law  

PR c/16351
* doc/invoke.texi (Wnull-dereference): New.
* tree-vrp.c (infer_value_range): Update call to infer_nonnull_range.
* gimple-ssa-isolate-paths.c (find_implicit_erroneous_behaviour):
Warn for potential NULL dereferences.
(find_explicit_erroneous_behaviour): Warn for NULL dereferences.
* ubsan.c (instrument_nonnull_arg): Call
infer_nonnull_range_by_attribute.
(instrument_nonnull_return): Likewise.
* common.opt (Wnull-dereference): New.
* gimple.c (infer_nonnull_range): Remove bool arguments.
(infer_nonnull_range_by_dereference): New.
(infer_nonnull_range_by_attribute): New.
* gimple.h: Update declarations.

gcc/testsuite/ChangeLog:

2015-07-22  Manuel López-Ibáñez  
Jeff Law  

PR c/16351
* gcc.dg/tree-ssa/isolate-2.c: Close comment.
* gcc.dg/tree-ssa/isolate-4.c: Likewise.
* gcc.dg/tree-ssa/wnull-dereference.c: New test.
* gcc.dg/tree-ssa/isolate-1.c: Test warnings with -Wnull-dereference.
* gcc.dg/tree-ssa/isolate-3.c: Likewise.
* gcc.dg/tree-ssa/isolate-5.c: Likewise.
* c-c++-common/wnonnull-1.c: New test.
Index: gcc/tree-vrp.c
===
--- gcc/tree-vrp.c  (revision 225868)
+++ gcc/tree-vrp.c  (working copy)
@@ -4936,11 +4936,11 @@ infer_value_range (gimple stmt, tree op,
  break;
   if (e == NULL)
return false;
 }
 
-  if (infer_nonnull_range (stmt, op, true, true))
+  if (infer_nonnull_range (stmt, op))
 {
   *val_p = build_int_cst (TREE_TYPE (op), 0);
   *comp_code_p = NE_EXPR;
   return true;
 }
Index: gcc/doc/invoke.texi
===
--- gcc/doc/invoke.texi (revision 225868)
+++ gcc/doc/invoke.texi (working copy)
@@ -258,10 +258,11 @@ Objective-C and Objective-C++ Dialects}.
 -Wframe-larger-than=@var{len} -Wno-free-nonheap-object -Wjump-misses-init @gol
 -Wignored-qualifiers  -Wincompatible-pointer-types @gol
 -Wimplicit  -Wimplicit-function-declaration  -Wimplicit-int @gol
 -Winit-self  -Winline  -Wno-int-conversion @gol
 -Wno-int-to-pointer-cast -Wno-invalid-offsetof @gol
+-Wnull-dereference @gol
 -Winvalid-pch -Wlarger-than=@var{len}  -Wunsafe-loop-optimizations @gol
 -Wlogical-op -Wlogical-not-parentheses -Wlong-long @gol
 -Wmain -Wmaybe-uninitialized -Wmemset-transposed-args @gol
 -Wmisleading-indentation -Wmissing-braces @gol
 -Wmissing-field-initializers -Wmissing-include-dirs @gol
@@ -4130,10 +4133,20 @@ All the above @option{-Wunused} options 
 
 In order to get a warning about an unused function parameter, you must
 either specify @option{-Wextra -Wunused} (note that @option{-Wall} implies
 @option{-Wunused}), or separately specify @option{-Wunused-parameter}.
 
+@item -Wnull-dereference
+@opindex Wnull-dereference
+@opindex Wno-null-dereference
+Warn if the compiler detects paths which trigger erroneous or
+undefined behaviour due to dereferencing a NULL pointer.  This option
+is only active when @option{-fdelete-null-pointer-checks} is active,
+which is enabled by optimizations in most targets.  The precision of
+the warnings depends on the optimization options used.  This option is
+enabled by @option{-Wall}.
+
 @item -Wuninitialized
 @opindex Wuninitialized
 @opindex Wno-uninitialized
 Warn if an automatic variable is used without first being initialized
 or if a variable may be clobbered by a @code{setjmp} call. In C++,
Index: gcc/testsuite/gcc.dg/tree-ssa/isolate-2.c
===
--- gcc/testsuite/gcc.dg/tree-ssa/isolate-2.c   (revision 225868)
+++ gcc/testsuite/gcc.dg/tree-ssa/isolate-2.c   (working copy)
@@ -33,10 +33,10 @@ bar (void)
returns non-null attribute to isolate a path where NULL flows into
a return statement.  We test this twice, once where the NULL flows
from a PHI, the second with an explicit return 0 in the IL.
 
We also verify that after isolation phi-cprop simplifies the
-

[PATCH] Don't allow unsafe reductions in graphite

2015-07-22 Thread Tom de Vries


[ was: Re: [RFC, PR66873] Use graphite for parloops ]

On 22/07/15 13:02, Richard Biener wrote:

On Wed, Jul 22, 2015 at 1:01 PM, Richard Biener
  wrote:

>On Tue, Jul 21, 2015 at 8:42 PM, Sebastian Pop  wrote:

>>Tom de Vries wrote:

>>>Fix reduction safety checks
>>>
>>>   * graphite-sese-to-poly.c (is_reduction_operation_p): Limit
>>>   flag_associative_math to SCALAR_FLOAT_TYPE_P.  Honour
>>>   TYPE_OVERFLOW_TRAPS and TYPE_OVERFLOW_WRAPS for INTEGRAL_TYPE_P.
>>>   Only allow wrapping fixed-point otherwise.
>>>   (build_poly_scop): Always call
>>>   rewrite_commutative_reductions_out_of_ssa.

>>
>>The changes to graphite look good to me.

>
>+  if (SCALAR_FLOAT_TYPE_P (type))
>+return flag_associative_math;
>+
>
>why only scalar floats?


Copied from the conditions in vect_is_simple_reduction_1.

>> >Please use FLOAT_TYPE_P.

Done.


>
>+  if (INTEGRAL_TYPE_P (type))
>+return (!TYPE_OVERFLOW_TRAPS (type)
>+   && TYPE_OVERFLOW_WRAPS (type));
>
>it cannot both wrap and trap thus TYPE_OVERFLOW_WRAPS is enough.
>


Done.


>I'm sure you'll disable quite some parallelization this way... (the
>routine is modeled after
>the vectorizers IIRC, so it would be affected as well).  Yeah - I see
>you modify autopar
>testcases.


I now split up the patch, this bit only relates to graphite, so no 
autopar testcases are affected.



>Please instead XFAIL the existing ones and add variants
>with unsigned
>reductions.  Adding -fwrapv isn't a good solution either.


Done.


>
>Can you think of a testcase that breaks btw?
>


If you mean a testcase that fails to execute properly without the fix, and 
executes correctly with the fix, then no.  The problem this patch is 
trying to fix is that we assume wrapping overflow without -fwrapv.  In 
order to run into a runtime failure, we need a target that does not do 
wrapping overflow without -fwrapv.



>The "proper" solution (see other passes) is to rewrite the reduction
>to a wrapping
>one (cast to unsigned for the reduction op).
>


Right.


>+  return (FIXED_POINT_TYPE_P (type)
>+ && FIXED_POINT_TYPE_OVERFLOW_WRAPS_P (type));
>
>why?


Again, copied from the conditions in vect_is_simple_reduction_1.

>> >  Simply return false here instead?

Done.


[ Btw, looking at associative_tree_code, I realized that the
  overflow checking is only necessary for PLUS_EXPR and MULT_EXPR:
...
  switch (code)
{
case BIT_IOR_EXPR:
case BIT_AND_EXPR:
case BIT_XOR_EXPR:
case PLUS_EXPR:
case MULT_EXPR:
case MIN_EXPR:
case MAX_EXPR:
  return true;
...

The other operators cannot overflow to begin with. My guess is that it's 
better to leave this for a trunk-only follow-up patch.

]
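
A sketch of what such a follow-up check could look like, under the assumption that only addition and multiplication need the wrapping requirement.  The enum and both function names are hypothetical stand-ins, not GCC tree codes or existing functions, and this is not what the attached patch does.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the tree codes listed above.  */
enum reduc_op
{
  OP_BIT_IOR, OP_BIT_AND, OP_BIT_XOR,
  OP_PLUS, OP_MULT,
  OP_MIN, OP_MAX
};

/* Of the associative operators, only addition and multiplication can
   overflow; bitwise operations and min/max always produce a value
   that fits in the operand type.  */
bool
reduction_can_overflow (enum reduc_op op)
{
  switch (op)
    {
    case OP_PLUS:
    case OP_MULT:
      return true;
    default:
      return false;
    }
}

/* Regrouping an integer reduction is safe if the operator cannot
   overflow, or if overflow wraps for the type.  */
bool
integer_reduction_is_safe (enum reduc_op op, bool type_overflow_wraps)
{
  return !reduction_can_overflow (op) || type_overflow_wraps;
}
```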

Currently bootstrapping and reg-testing on x86_64.

OK for trunk?

OK 5 and 4.9 release branches?

Thanks,
- Tom

Don't allow unsafe reductions in graphite

2015-07-21  Tom de Vries  

	* graphite-sese-to-poly.c (is_reduction_operation_p): Limit
	flag_associative_math to FLOAT_TYPE_P.  Honour
	TYPE_OVERFLOW_WRAPS for INTEGRAL_TYPE_P. Don't allow any other types.

	* gcc.dg/graphite/block-1.c: Xfail scan.
	* gcc.dg/graphite/interchange-12.c: Same.
	* gcc.dg/graphite/interchange-14.c: Same.
	* gcc.dg/graphite/interchange-15.c: Same.
	* gcc.dg/graphite/interchange-9.c: Same.
	* gcc.dg/graphite/interchange-mvt.c: Same.
	* gcc.dg/graphite/uns-block-1.c: New test.
	* gcc.dg/graphite/uns-interchange-12.c: New test.
	* gcc.dg/graphite/uns-interchange-14.c: New test.
	* gcc.dg/graphite/uns-interchange-15.c: New test.
	* gcc.dg/graphite/uns-interchange-9.c: New test.
	* gcc.dg/graphite/uns-interchange-mvt.c: New test.
---
 gcc/graphite-sese-to-poly.c| 14 +++--
 gcc/testsuite/gcc.dg/graphite/block-1.c|  2 +-
 gcc/testsuite/gcc.dg/graphite/interchange-12.c |  2 +-
 gcc/testsuite/gcc.dg/graphite/interchange-14.c |  2 +-
 gcc/testsuite/gcc.dg/graphite/interchange-15.c |  2 +-
 gcc/testsuite/gcc.dg/graphite/interchange-9.c  |  2 +-
 gcc/testsuite/gcc.dg/graphite/interchange-mvt.c|  2 +-
 gcc/testsuite/gcc.dg/graphite/uns-block-1.c| 48 +
 gcc/testsuite/gcc.dg/graphite/uns-interchange-12.c | 56 +++
 gcc/testsuite/gcc.dg/graphite/uns-interchange-14.c | 58 
 gcc/testsuite/gcc.dg/graphite/uns-interchange-15.c | 53 ++
 gcc/testsuite/gcc.dg/graphite/uns-interchange-9.c  | 47 
 .../gcc.dg/graphite/uns-interchange-mvt.c  | 63 ++
 13 files changed, 342 insertions(+), 9 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/graphite/uns-block-1.c
 create mode 100644 gcc/testsuite/gcc.dg/graphite/uns-interchange-12.c
 create mode 100644 gcc/testsuite/gcc.dg/graphite/uns-interchange-14.c
 create mode 100644 gcc/testsuite/gcc.dg/graphite/uns-interchange-15.c
 create mode 100644 gcc/testsuite/gcc.dg/graphite/uns-interchange-9.c
 create mode 100644 gcc/testsuite/gcc.dg/graphite/uns-interchange-mvt.

Re: [AArch64/wwwdoc] Document -fpic support for small memory model

2015-07-22 Thread Jiong Wang


Jiong Wang writes:

> Marcus Shawcroft writes:
>
>> On 26 June 2015 at 10:32, Jiong Wang  wrote:
>>>
>>> This patch respin https://gcc.gnu.org/ml/gcc-patches/2015-05/msg01804.html.
>>>
>>> A new symbol classification "SYMBOL_SMALL_GOT_28K" added to represent symbol
>>> which needs go through GOT table and it's under -fpic/-mcmodel-small. the 
>>> "_28K"
>>> suffix can reflect the symbol's attribute better, and by introducing this 
>>> new
>>> symbol type, we could avoid checking aarch64_cmodel at some extent
>>> though still needs the check somewhere.
>>>
>>> All other code logic not changed.
>>>
>>> OK for trunk?
>>>
>>> Thanks.
>>>
>>> 2015-06-26  Jiong. Wang  
>>>
>>> gcc/
>>>   * config/aarch64/aarch64-protos.h (aarch64_symbol_type): New type
>>>   SYMBOL_SMALL_GOT_28K.
>>>   * config/aarch64/aarch64.md: (ldr_got_small_): Support new GOT
>>>   relocation modifiers.
>>>   (unspec): New enum "UNSPEC_GOTMALLPIC28K.
>>>   (ldr_got_small_28k_): New.
>>>   (ldr_got_small_28k_sidi): New.
>>>   * config/aarch64/iterators.md (got_modifier): New mode iterator.
>>>   * config/aarch64/aarch64-opts.h (aarch64_code_model): New model.
>>>   * config/aarch64/aarch64.c (aarch64_load_symref_appropriately): Support
>>>   SYMBOL_SMALL_GOT_28K.
>>>   (aarch64_rtx_costs): Add costs for new instruction sequences.
>>>   (initialize_aarch64_code_model): Initialize new model.
>>>   (aarch64_classify_symbol): Recognize new model and new symbol 
>>> classification.
>>>   (aarch64_asm_preferred_eh_data_format): Support new model.
>>>   (aarch64_load_symref_appropriately): Generate new instruction
>>>   sequences for -fpic.
>>>   (TARGET_USE_PSEUDO_PIC_REG): New definition.
>>>   (aarch64_use_pseudo_pic_reg): New function.
>>>
>>> gcc/testsuite/
>>>   * gcc.target/aarch64/pic-small.c: New testcase.
>>
>>
>> OK, Thanks Jiong.  Could you prepare a NEWS entry for this change?
>> Cheers
>> /Marcus
>
> How about this one?
>
> 2015-06-26  Jiong Wang  
>
> wwwdocs/
>   * htdocs/gcc-6/changes.html (AArch64): Document -fpic for small
>   model.

Ping.

-fpic patch for AArch64 has been committed, this is the documentation
 counterpart which needs approval.

Thanks.

-- 
Regards,
Jiong

[PATCH] Enable reductions without fassociative-math in graphite

2015-07-22 Thread Tom de Vries


Hi,

this patch allows non-float reductions to be detected by graphite, 
independent of whether fassociative-math (which only has effect for 
float operations) is set.
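
A small sketch of why the flag matters for floats but not for wrapping integers.  The helper names are hypothetical; the volatile temporaries are only there to force each intermediate to be rounded to float.

```c
#include <stdbool.h>

/* True if (a + b) + c and a + (b + c) give the same float result.  */
bool
float_sum_regroups (float a, float b, float c)
{
  volatile float left = a + b;
  volatile float right = b + c;
  volatile float r1 = left + c;
  volatile float r2 = a + right;
  return r1 == r2;
}

/* Unsigned arithmetic is modulo 2^N, so regrouping never changes
   the result.  */
bool
unsigned_sum_regroups (unsigned a, unsigned b, unsigned c)
{
  return (a + b) + c == a + (b + c);
}
```

With a = 1e20f, b = -1e20f, c = 1.0f the left-to-right sum is 1 while the regrouped sum is 0, because c is absorbed when added to b first.  The unsigned version agrees for every input, including ones that wrap.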


Currently bootstrapping and reg-testing on x86_64.

OK for trunk?

Thanks,
- Tom
Enable reductions without fassociative-math in graphite

2015-07-21  Tom de Vries  

	* graphite-sese-to-poly.c (build_poly_scop): Always call
	rewrite_commutative_reductions_out_of_ssa.
---
 gcc/graphite-sese-to-poly.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/gcc/graphite-sese-to-poly.c b/gcc/graphite-sese-to-poly.c
index 68f7df1..28b9817 100644
--- a/gcc/graphite-sese-to-poly.c
+++ b/gcc/graphite-sese-to-poly.c
@@ -3155,8 +3155,7 @@ build_poly_scop (scop_p scop)
   if (!scop_ivs_can_be_represented (scop))
 return;
 
-  if (flag_associative_math)
-rewrite_commutative_reductions_out_of_ssa (scop);
+  rewrite_commutative_reductions_out_of_ssa (scop);
 
   build_sese_loop_nests (region);
   /* Record all conditions in REGION.  */
-- 
1.9.1

Re: [WIP] OpenMP 4 NVPTX support

2015-07-22 Thread Thomas Schwinge

Hi!

On Tue, 21 Apr 2015 17:58:39 +0200, Jakub Jelinek  wrote:
> Attached is a minimal patch to get at least a trivial OpenMP 4.0 testcase
> offloading to NVPTX (the first patch).  The second patch is WIP, just first
> few needed changes to make libgomp to build for NVPTX (several weeks of work
> at least).

We're not specifically working on making nvptx offloading work for
OpenMP, but OpenACC offloading also requires a tiny bit of code to
be shipped in an offloading device's runtime library -- code that
conceptually belongs in libgomp.  (On gomp-4_0-branch, it currently
lives in libgcc because that was easier to do.)  As I found out,
building a "dummy" (empty) libgomp for nvptx is not actually
difficult.  In addition to your second patch (U2; quoted at the end of
this email), we'll need the following:

commit ea5213c1eb6e525f64aa103312e8e0ac88048122
Author: Thomas Schwinge 
Date:   Wed Jul 22 12:12:41 2015 +0200

Empty libgomp for nvptx

$ mkdir libgomp/config/nvptx
$ cp libgomp/config/{linux,nvptx}/omp-lock.h
$ for f in libgomp{,/config/linux,/config/posix}/*.c; do touch libgomp/config/nvptx/"$(basename "$f")"; done
---
 libgomp/config/nvptx/affinity.c   |  0
 libgomp/config/nvptx/alloc.c  |  0
 libgomp/config/nvptx/bar.c|  0
 libgomp/config/nvptx/barrier.c|  0
 libgomp/config/nvptx/critical.c   |  0
 libgomp/config/nvptx/env.c|  0
 libgomp/config/nvptx/error.c  |  0
 libgomp/config/nvptx/fortran.c|  0
 libgomp/config/nvptx/iter.c   |  0
 libgomp/config/nvptx/iter_ull.c   |  0
 libgomp/config/nvptx/libgomp-plugin.c |  0
 libgomp/config/nvptx/lock.c   |  0
 libgomp/config/nvptx/loop.c   |  0
 libgomp/config/nvptx/loop_ull.c   |  0
 libgomp/config/nvptx/mutex.c  |  0
 libgomp/config/nvptx/oacc-async.c |  0
 libgomp/config/nvptx/oacc-cuda.c  |  0
 libgomp/config/nvptx/oacc-host.c  |  0
 libgomp/config/nvptx/oacc-init.c  |  0
 libgomp/config/nvptx/oacc-mem.c   |  0
 libgomp/config/nvptx/oacc-parallel.c  |  0
 libgomp/config/nvptx/oacc-plugin.c|  0
 libgomp/config/nvptx/omp-lock.h   | 12 
 libgomp/config/nvptx/ordered.c|  0
 libgomp/config/nvptx/parallel.c   |  0
 libgomp/config/nvptx/proc.c   |  0
 libgomp/config/nvptx/ptrlock.c|  0
 libgomp/config/nvptx/sections.c   |  0
 libgomp/config/nvptx/sem.c|  0
 libgomp/config/nvptx/single.c |  0
 libgomp/config/nvptx/splay-tree.c |  0
 libgomp/config/nvptx/target.c |  0
 libgomp/config/nvptx/task.c   |  0
 libgomp/config/nvptx/team.c   |  0
 libgomp/config/nvptx/time.c   |  0
 libgomp/config/nvptx/work.c   |  0
 36 files changed, 12 insertions(+)

diff --git libgomp/config/nvptx/affinity.c libgomp/config/nvptx/affinity.c
new file mode 100644
index 000..e69de29
diff --git libgomp/config/nvptx/alloc.c libgomp/config/nvptx/alloc.c
new file mode 100644
index 000..e69de29
diff --git libgomp/config/nvptx/bar.c libgomp/config/nvptx/bar.c
new file mode 100644
index 000..e69de29
diff --git libgomp/config/nvptx/barrier.c libgomp/config/nvptx/barrier.c
new file mode 100644
index 000..e69de29
diff --git libgomp/config/nvptx/critical.c libgomp/config/nvptx/critical.c
new file mode 100644
index 000..e69de29
diff --git libgomp/config/nvptx/env.c libgomp/config/nvptx/env.c
new file mode 100644
index 000..e69de29
diff --git libgomp/config/nvptx/error.c libgomp/config/nvptx/error.c
new file mode 100644
index 000..e69de29
diff --git libgomp/config/nvptx/fortran.c libgomp/config/nvptx/fortran.c
new file mode 100644
index 000..e69de29
diff --git libgomp/config/nvptx/iter.c libgomp/config/nvptx/iter.c
new file mode 100644
index 000..e69de29
diff --git libgomp/config/nvptx/iter_ull.c libgomp/config/nvptx/iter_ull.c
new file mode 100644
index 000..e69de29
diff --git libgomp/config/nvptx/libgomp-plugin.c 
libgomp/config/nvptx/libgomp-plugin.c
new file mode 100644
index 000..e69de29
diff --git libgomp/config/nvptx/lock.c libgomp/config/nvptx/lock.c
new file mode 100644
index 000..e69de29
diff --git libgomp/config/nvptx/loop.c libgomp/config/nvptx/loop.c
new file mode 100644
index 000..e69de29
diff --git libgomp/config/nvptx/loop_ull.c libgomp/config/nvptx/loop_ull.c
new file mode 100644
index 000..e69de29
diff --git libgomp/config/nvptx/mutex.c libgomp/config/nvptx/mutex.c
new file mode 100644
index 000..e69de29
diff --git libgomp/config/nvptx/oacc-async.c libgomp/config/nvptx/oacc-async.c
new file mode 100644
index 000..e69de29
diff --git libgomp/config/nvptx/oacc-cuda.c libgomp/config/nvptx/oacc-cuda.c
new file mode 100644
index 000..e69de29
diff --git libgomp/config/nvptx/oacc-host.c libgomp/config/nvptx/oacc-host.c
new file mode 100644
index 000..e69de29
diff --git libgomp/config/nvptx/oacc-init.c libgomp/config/nvptx/oac

[PATCH PR66926,PR66951] simple fix for ICE.

2015-07-22 Thread Yuri Rumyantsev

Hi All,

Here is a simple fix for PR66926 and PR66951: it corrects the condition
used when renaming virtual operands to determine whether a statement is
outside of the loop.

Bootstrap and regression testing did not show any new failures.

Is it OK for trunk?

gcc/ChangeLog
2015-07-22  Yuri Rumyantsev  

PR tree-optimization/66926,66951
* tree-vect-loop-manip.c (slpeel_tree_peel_loop_to_edge): Delete
INNER_LOOP and fix up condition for renaming virtual operands.


gcc/testsuite/ChangeLog
* gcc.dg/vect/pr66951.c: New test.


patch
Description: Binary data

[PATCH] Check TYPE_OVERFLOW_WRAPS for parloops reductions

2015-07-22 Thread Tom de Vries


[ was: Re: [RFC, PR66873] Use graphite for parloops ]

On 22/07/15 13:02, Richard Biener wrote:

On Wed, Jul 22, 2015 at 1:01 PM, Richard Biener
 wrote:

On Tue, Jul 21, 2015 at 8:42 PM, Sebastian Pop  wrote:

Tom de Vries wrote:

Fix reduction safety checks




diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index 9145dbf..e014be2 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -2613,16 +2613,30 @@ vect_is_simple_reduction_1 (loop_vec_info
loop_info, gimple phi,
 "reduction: unsafe fp math optimization: ");
return NULL;
  }
-  else if (INTEGRAL_TYPE_P (type) && TYPE_OVERFLOW_TRAPS (type)
-  && check_reduction)
+  else if (INTEGRAL_TYPE_P (type) && check_reduction)
  {
...

You didn't need to adjust any testcases?  That's probably because the
checking above is not always executed (see PR66623 for a related
testcase).  The code needs refactoring.  And we need a way-out, that is,
we do _not_ want to not vectorize signed reductions.  So you need to fix
code generation instead.


Btw, for the vectorizer the current "trick" is that nobody takes advantage of
overflow undefinedness for vector types.



AFAIU, you're saying here that there's no current bug related to 
assuming wrapping overflow in the vectorizer?


I've updated the patch accordingly, so we only bother about 
TYPE_OVERFLOW_WRAPS for parloops reductions.


Currently bootstrapping and reg-testing on x86_64.

OK for trunk?

Thanks,
- Tom

Check TYPE_OVERFLOW_WRAPS for parloops reductions

2015-07-21  Tom de Vries  

	* tree-parloops.c (gather_scalar_reductions): Add arg to call to
	vect_force_simple_reduction.
	* tree-vect-loop.c (vect_analyze_scalar_cycles_1): Same.
	(vect_is_simple_reduction_1): Add and handle
	need_wrapping_integral_overflow parameter.
	(vect_is_simple_reduction, vect_force_simple_reduction): Add and pass
	need_wrapping_integral_overflow parameter.
	(vectorizable_reduction): Add arg to call to vect_is_simple_reduction.
	* tree-vectorizer.h (vect_force_simple_reduction): Add parameter to decl.

	* gcc.dg/autopar/outer-4.c: Add xfail.
	* gcc.dg/autopar/outer-5.c: Same.
	* gcc.dg/autopar/outer-6.c: Same.
	* gcc.dg/autopar/reduc-2.c: Same.
	* gcc.dg/autopar/reduc-2char.c: Same.
	* gcc.dg/autopar/reduc-2short.c: Same.
	* gcc.dg/autopar/reduc-8.c: Same.
	* gcc.dg/autopar/uns-outer-4.c: New test.
	* gcc.dg/autopar/uns-outer-5.c: New test.
	* gcc.dg/autopar/uns-outer-6.c: New test.
---
 gcc/testsuite/gcc.dg/autopar/outer-4.c  |  2 +-
 gcc/testsuite/gcc.dg/autopar/outer-5.c  |  2 +-
 gcc/testsuite/gcc.dg/autopar/outer-6.c  |  4 +--
 gcc/testsuite/gcc.dg/autopar/reduc-2.c  |  4 +--
 gcc/testsuite/gcc.dg/autopar/reduc-2char.c  |  4 +--
 gcc/testsuite/gcc.dg/autopar/reduc-2short.c |  4 +--
 gcc/testsuite/gcc.dg/autopar/reduc-8.c  |  4 +--
 gcc/testsuite/gcc.dg/autopar/uns-outer-4.c  | 36 
 gcc/testsuite/gcc.dg/autopar/uns-outer-5.c  | 49 +++
 gcc/testsuite/gcc.dg/autopar/uns-outer-6.c  | 51 +
 gcc/tree-parloops.c |  6 ++--
 gcc/tree-vect-loop.c| 44 +
 gcc/tree-vectorizer.h   |  3 +-
 13 files changed, 183 insertions(+), 30 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/autopar/uns-outer-4.c
 create mode 100644 gcc/testsuite/gcc.dg/autopar/uns-outer-5.c
 create mode 100644 gcc/testsuite/gcc.dg/autopar/uns-outer-6.c

diff --git a/gcc/testsuite/gcc.dg/autopar/outer-4.c b/gcc/testsuite/gcc.dg/autopar/outer-4.c
index 6fd37c5..2027499 100644
--- a/gcc/testsuite/gcc.dg/autopar/outer-4.c
+++ b/gcc/testsuite/gcc.dg/autopar/outer-4.c
@@ -32,4 +32,4 @@ int main(void)
 
 
 /* { dg-final { scan-tree-dump-times "parallelizing outer loop" 1 "parloops" { xfail *-*-* } } } */
-/* { dg-final { scan-tree-dump-times "loopfn" 4 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "loopfn" 4 "optimized" { xfail *-*-* } } } */
diff --git a/gcc/testsuite/gcc.dg/autopar/outer-5.c b/gcc/testsuite/gcc.dg/autopar/outer-5.c
index 6a0ae91..d6e0dd3 100644
--- a/gcc/testsuite/gcc.dg/autopar/outer-5.c
+++ b/gcc/testsuite/gcc.dg/autopar/outer-5.c
@@ -45,4 +45,4 @@ int main(void)
 }
 
 /* { dg-final { scan-tree-dump-times "parallelizing outer loop" 1 "parloops" { xfail *-*-* } } } */
-/* { dg-final { scan-tree-dump-times "loopfn" 4 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "loopfn" 4 "optimized" { xfail *-*-* } } } */
diff --git a/gcc/testsuite/gcc.dg/autopar/outer-6.c b/gcc/testsuite/gcc.dg/autopar/outer-6.c
index 6bef7cc..726794c 100644
--- a/gcc/testsuite/gcc.dg/autopar/outer-6.c
+++ b/gcc/testsuite/gcc.dg/autopar/outer-6.c
@@ -44,6 +44,6 @@ int main(void)
 
 
 /* Check that outer loop is parallelized.  */
-/* { dg-final { scan-tree-dump-times "parallelizing outer loop" 1 "parloops" } } */
+/* { dg-final { scan-tree-dump-times "parallelizing outer loop" 1 "parloops" { xfail *-*-* } } } */
 /* {

Re: [AArch64/wwwdoc] Document -fpic support for small memory model

2015-07-22 Thread James Greenhalgh

On Fri, Jun 26, 2015 at 02:45:39PM +0100, Jiong Wang wrote:
> 
> Marcus Shawcroft writes:
> 
> 2015-06-26  Jiong Wang  
> 
> wwwdocs/
>   * htdocs/gcc-6/changes.html (AArch64): Document -fpic for small model.
> 

> Index: gcc-6/changes.html
> ===
> RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-6/changes.html,v
> retrieving revision 1.12
> diff -u -r1.12 changes.html
> --- gcc-6/changes.html16 Jun 2015 08:48:02 -  1.12
> +++ gcc-6/changes.html26 Jun 2015 13:30:05 -
> @@ -90,6 +90,15 @@
> If GCC is unable to detect the host CPU these options have no effect.
>   
> 
> +   

This should be a new  (list item) in the above  (unordered list),
rather than a new .

> + 
> +   -fpic is now supported on AArch64 for small memory
> +   model. 

In invoke.texi we describe -mcmodel as the "small code model" rather
than as a "memory model". How about rewording this as so:

  -fpic is now supported by the AArch64 target when generating
  code for the small code model (-mcmodel=small). 

> Compared with -fPIC, -fpic
> +   will guide GCC to generate more efficient position independent
> +   instruction sequences when accessing global objects and
> +   28KiB/15KiB global offset table size supported under ILP64/32.

I'm not sure this part is needed, the difference between -fpic and -fPIC
is already covered by invoke.texi. If you do want to include this text,
I might try rewriting it as:

  -fpic generates position-independent code which accesses all
  constant addresses through a global offset table (GOT). For AArch64, the
  size of the GOT is limited to 28KiB under the LP64 SysV ABI, and 15KiB
  under the ILP32 SysV ABI.

As I was looking in invoke.texi, do we want to document the limits on our
GOT size there as other targets have?

"These maximums are 8k on the SPARC and 32k on the m68k and RS/6000.
 The x86 has no such limit."

Thanks,
James

Re: [WIP] OpenMP 4 NVPTX support

2015-07-22 Thread Jakub Jelinek

On Wed, Jul 22, 2015 at 06:04:20PM +0200, Thomas Schwinge wrote:
> On Tue, 21 Apr 2015 17:58:39 +0200, Jakub Jelinek  wrote:
> > Attached is a minimal patch to get at least a trivial OpenMP 4.0 testcase
> > offloading to NVPTX (the first patch).  The second patch is WIP, just first
> > few needed changes to make libgomp to build for NVPTX (several weeks of work
> > at least).
> 
> We're not in particular working on making nvptx offloading work for
> OpenMP, but also for OpenACC offloading a tiny bit of code is required to
> be shipped in an offloading device's runtime library -- code that
> conceptually belongs into libgomp.  (On gomp-4_0-branch, it currently
> lives in libgcc because that was easier to do.)  Actually, as I should
> find out, building a "dummy" (empty) libgomp for nvptx is not actually
> difficult.  Additionally to your second patch (U2; quoted at the end of
> this email), we'll need the following:

The U2 version was a very early one, I've posted a newer version later,
but supposedly we can go with my U2 (if you've tested it together with your
patch, please check it in yourself) and your patch, and then
incrementally start removing the zero sized stubs or replacing them with
something real.

Jakub

Re: [C/C++ PATCH] Implement -Wshift-overflow (PR c++/55095) (take 3)

2015-07-22 Thread Marek Polacek

On Wed, Jul 22, 2015 at 07:48:47AM -0500, Segher Boessenkool wrote:
> vmx.exp sets a bunch of options and the test overrides that now.  Options
> like -maltivec are pretty important for this test to work -- it #includes
> altivec.h, which does #error unless -maltivec is set, and things go downhill
> from that.  unpack-be-order.c works, unpack.c blows up.

Ah, right.
 
> Does your compiler maybe default to -maltivec?

I suppose -- I didn't see any failures on cfarm 112, but I was able to
reproduce the unpack.c fail on cfarm 110.  Turned out I should've used
dg-additional-options.  Thanks.

Tested vmx.exp on powerpc64-unknown-linux-gnu, applying to trunk.

2015-07-22  Marek Polacek  

* gcc.dg/vmx/unpack.c: Use dg-additional-options rather than
dg-options.

diff --git gcc/testsuite/gcc.dg/vmx/unpack.c gcc/testsuite/gcc.dg/vmx/unpack.c
index e71a5a6..b3ec93a 100644
--- gcc/testsuite/gcc.dg/vmx/unpack.c
+++ gcc/testsuite/gcc.dg/vmx/unpack.c
@@ -1,4 +1,4 @@
-/* { dg-options "-Wno-shift-overflow" } */
+/* { dg-additional-options "-Wno-shift-overflow" } */
 
 #include "harness.h"
 

Marek

Re: [gomp] Move openacc vector& worker single handling to RTL

2015-07-22 Thread Nathan Sidwell


On 07/20/15 11:08, Nathan Sidwell wrote:

On 07/20/15 09:01, Nathan Sidwell wrote:

On 07/18/15 11:37, Thomas Schwinge wrote:

Hi Nathan!



For OpenACC nvptx offloading, there must still be something wrong; here's
a count of the (non-deterministic!) regressions of ten runs of the
libgomp testsuite.  As private-vars-loop-worker-5.c fails most often, it
probably makes sense to look into that one first.


I'll take a look. :(


Having difficulty reproducing it (preprocessed source compiled at -O0 works for
me).  Do you have an exact recipe?


Thomas helped me reproduce them -- they are very intermittent.  Anyway, fixed 
with the attached patch I've committed to gomp branch.


The bug was a race condition in the worker-level 'follow along' algorithm. 
Worker zero could overwrite the flag for some subsequent block before all the 
other workers had read the previous value of the flag.  This wasn't 
optimization-level specific, but it appears unoptimized code creates better 
conditions to cause the behaviour.
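
The race can be modelled without any GPU.  The following is a sequential simulation, not the nvptx code: the worker count, the step layout and the "lowest-numbered runnable worker goes first" scheduler are all assumptions, picked so that worker zero runs ahead as far as the barriers allow.

```c
#include <stdbool.h>

enum { WORKERS = 3, ROUNDS = 4 };

/* Steps of one round, executed in this order by every worker.  */
enum step
{
  STEP_WRITE,                 /* Worker zero stores the flag.  */
  STEP_BARRIER_BEFORE_READ,
  STEP_READ,                  /* Every worker loads the flag.  */
  STEP_BARRIER_AFTER_READ     /* Present only with the fix.  */
};

/* Return how many reads observed a flag from the wrong round.  */
int
count_stale_reads (bool barrier_after_read)
{
  const int steps_per_round = barrier_after_read ? 4 : 3;
  const int total_steps = steps_per_round * ROUNDS;
  int pc[WORKERS] = { 0 };        /* Next step of each worker.  */
  int arrived[WORKERS] = { 0 };   /* Barriers each worker has reached.  */
  bool waiting[WORKERS] = { false };
  int buffer = -1;                /* The shared broadcast flag.  */
  int stale = 0;

  for (;;)
    {
      bool progressed = false;

      /* Run one step of the lowest-numbered worker that can move.  */
      for (int w = 0; w < WORKERS && !progressed; w++)
        {
          if (pc[w] == total_steps)
            continue;
          int round = pc[w] / steps_per_round;
          enum step s = (enum step) (pc[w] % steps_per_round);

          if (s == STEP_BARRIER_BEFORE_READ || s == STEP_BARRIER_AFTER_READ)
            {
              if (!waiting[w])
                {
                  /* Arrive at the barrier and start waiting.  */
                  waiting[w] = true;
                  arrived[w]++;
                  progressed = true;
                  continue;
                }
              bool all_arrived = true;
              for (int k = 0; k < WORKERS; k++)
                if (arrived[k] < arrived[w])
                  all_arrived = false;
              if (!all_arrived)
                continue;         /* Still blocked; try the next worker.  */
              waiting[w] = false;
            }
          else if (s == STEP_WRITE)
            {
              if (w == 0)
                buffer = round;
            }
          else if (buffer != round)   /* STEP_READ */
            stale++;

          pc[w]++;
          progressed = true;
        }

      if (!progressed)
        break;                    /* Every worker has finished.  */
    }
  return stale;
}
```

Without the barrier after the read, worker zero reads round N, immediately writes the flag for round N+1, and the other workers then read the wrong value.  With the extra barrier, worker zero cannot write again until every worker has read.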


This appears to fix all the -O0 regressions you observed, Thomas.

nathan
2015-07-22  Nathan Sidwell  

	* config/nvptx/nvptx.c (nvptx_option_override): Initialize worker
	buffer alignment here.
	(nvptx_wsync): Generate pattern, not emit instruction.
	(nvptx_single): Insert barrier after read.
	(nvptx_process_pars): Adjust nvptx_wsync use.
	(nvptx_file_end): No need to apply default alignment here.

Index: config/nvptx/nvptx.c
===
--- config/nvptx/nvptx.c	(revision 226044)
+++ config/nvptx/nvptx.c	(working copy)
@@ -124,6 +124,7 @@ nvptx_option_override (void)
 = hash_table::create_ggc (17);
 
   worker_bcast_sym = gen_rtx_SYMBOL_REF (Pmode, worker_bcast_name);
+  worker_bcast_align = GET_MODE_SIZE (SImode);
 }
 
 /* Return the mode to be used when declaring a ptx object for OBJ.
@@ -2627,12 +2628,13 @@ nvptx_wpropagate (bool pre_p, basic_bloc
 }
 }
 
-/* Emit a worker-level synchronization barrier.  */
+/* Emit a worker-level synchronization barrier.  We use different
+   markers for before and after synchronizations.  */
 
-static void
-nvptx_wsync (bool tail_p, rtx_insn *insn)
+static rtx
+nvptx_wsync (bool after)
 {
-  emit_insn_after (gen_nvptx_barsync (GEN_INT (tail_p)), insn);
+  return gen_nvptx_barsync (GEN_INT (after));
 }
 
 /* Single neutering according to MASK.  FROM is the incoming block and
@@ -2750,7 +2752,7 @@ nvptx_single (unsigned mask, basic_block
 	}
   else
 	{
-	  /* Includes worker mode, do spill & fill.  by construction
+	  /* Includes worker mode, do spill & fill.  By construction
 	 we should never have worker mode only. */
 	  wcast_data_t data;
 
@@ -2763,10 +2765,14 @@ nvptx_single (unsigned mask, basic_block
 	  data.offset = 0;
 	  emit_insn_before (nvptx_gen_wcast (pvar, PM_read, 0, &data),
 			before);
-	  emit_insn_before (gen_nvptx_barsync (GEN_INT (2)), tail);
+	  /* Barrier so other workers can see the write.  */
+	  emit_insn_before (nvptx_wsync (false), tail);
 	  data.offset = 0;
-	  emit_insn_before (nvptx_gen_wcast (pvar, PM_write, 0, &data),
-			tail);
+	  emit_insn_before (nvptx_gen_wcast (pvar, PM_write, 0, &data), tail);
+	  /* This barrier is needed to avoid worker zero clobbering
+	 the broadcast buffer before all the other workers have
+	 had a chance to read this instance of it.  */
+	  emit_insn_before (nvptx_wsync (true), tail);
 	}
 
   extract_insn (tail);
@@ -2824,8 +2830,8 @@ nvptx_process_pars (parallel *par)
 			  par->forked_insn);
 	nvptx_wpropagate (true, par->forked_block, par->fork_insn);
 	/* Insert begin and end synchronizations.  */
-	nvptx_wsync (false, par->forked_insn);
-	nvptx_wsync (true, par->joining_insn);
+	emit_insn_after (nvptx_wsync (false), par->forked_insn);
+	emit_insn_before (nvptx_wsync (true), par->joining_insn);
   }
   break;
 
@@ -3046,8 +3052,6 @@ nvptx_file_end (void)
 {
   /* Define the broadcast buffer.  */
 
-  if (worker_bcast_align < GET_MODE_SIZE (SImode))
-	worker_bcast_align = GET_MODE_SIZE (SImode);
   worker_bcast_hwm = (worker_bcast_hwm + worker_bcast_align - 1)
 	& ~(worker_bcast_align - 1);

[PATCH] Fix default_binds_local_p_2 for extern protected data

2015-07-22 Thread Szabolcs Nagy

The commit
https://gcc.gnu.org/viewcvs/gcc?view=revision&revision=222184
changed a true to false in varasm.c:

 bool
 default_binds_local_p_2 (const_tree exp)
 {
-  return default_binds_local_p_3 (exp, flag_shlib != 0, true, true);
+  return default_binds_local_p_3 (exp, flag_shlib != 0, true, false,
+ !flag_pic);
 }

where

 default_binds_local_p_3 (const_tree exp, bool shlib, bool weak_dominate,
-bool extern_protected_data)
+bool extern_protected_data, bool common_local_p)
 {

false means that extern protected data binds locally,
which is wrong if the target can have copy relocations
against it (the address must then be loaded from the GOT,
otherwise the main executable will see a different address).

Currently S/390, ARM and AArch64 targets use this predicate
and the current default is wrong for all of them (they can
have copy relocs) so I changed the default instead of doing
it in a target specific way.

The equivalent x86_64 bug was
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65248
the default was changed for
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65780
and I have now opened
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66912
for ARM and AArch64.

This also needs a further binutils patch to emit R_*_GLOB_DAT
instead of R_*_RELATIVE relocs for protected data.
The glibc elf/tst-protected1a and elf/tst-protected1b
tests depend on this.

Tested ARM and AArch64 targets.

gcc/ChangeLog:

2015-07-22  Szabolcs Nagy  

PR target/66912
* varasm.c (default_binds_local_p_2): Turn on extern_protected_data.

gcc/testsuite/ChangeLog:

2015-07-22  Szabolcs Nagy  

PR target/66912
* gcc.target/aarch64/pr66912.c: New.
* gcc.target/arm/pr66912.c: New.
diff --git a/gcc/testsuite/gcc.target/aarch64/pr66912.c b/gcc/testsuite/gcc.target/aarch64/pr66912.c
new file mode 100644
index 000..b8aabcd
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/pr66912.c
@@ -0,0 +1,42 @@
+/* { dg-do compile { target *-*-linux* } } */
+/* { dg-options "-O2 -fpic" } */
+
+__attribute__((visibility("protected")))
+int n_common;
+
+__attribute__((weak, visibility("protected")))
+int n_weak_common;
+
+__attribute__((visibility("protected")))
+int n_init = -1;
+
+__attribute__((weak, visibility("protected")))
+int n_weak_init = -1;
+
+int
+f1 ()
+{
+  /* { dg-final { scan-assembler ":got(page_lo15)?:n_common" } } */
+  return n_common;
+}
+
+int
+f2 ()
+{
+  /* { dg-final { scan-assembler ":got(page_lo15)?:n_weak_common" } } */
+  return n_weak_common;
+}
+
+int
+f3 ()
+{
+  /* { dg-final { scan-assembler ":got(page_lo15)?:n_init" } } */
+  return n_init;
+}
+
+int
+f4 ()
+{
+  /* { dg-final { scan-assembler ":got(page_lo15)?:n_weak_init" } } */
+  return n_weak_init;
+}
diff --git a/gcc/testsuite/gcc.target/arm/pr66912.c b/gcc/testsuite/gcc.target/arm/pr66912.c
new file mode 100644
index 000..27e4c45
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/pr66912.c
@@ -0,0 +1,42 @@
+/* { dg-do compile { target *-*-linux* } } */
+/* { dg-options "-O2 -fpic" } */
+
+__attribute__((visibility("protected")))
+int n_common;
+
+__attribute__((weak, visibility("protected")))
+int n_weak_common;
+
+__attribute__((visibility("protected")))
+int n_init = -1;
+
+__attribute__((weak, visibility("protected")))
+int n_weak_init = -1;
+
+int
+f1 ()
+{
+  /* { dg-final { scan-assembler "\\.word\\tn_common\\(GOT\\)" } } */
+  return n_common;
+}
+
+int
+f2 ()
+{
+  /* { dg-final { scan-assembler "\\.word\\tn_weak_common\\(GOT\\)" } } */
+  return n_weak_common;
+}
+
+int
+f3 ()
+{
+  /* { dg-final { scan-assembler "\\.word\\tn_init\\(GOT\\)" } } */
+  return n_init;
+}
+
+int
+f4 ()
+{
+  /* { dg-final { scan-assembler "\\.word\\tn_weak_init\\(GOT\\)" } } */
+  return n_weak_init;
+}
diff --git a/gcc/varasm.c b/gcc/varasm.c
index 6a4ba0b..a056792 100644
--- a/gcc/varasm.c
+++ b/gcc/varasm.c
@@ -6907,12 +6907,13 @@ default_binds_local_p (const_tree exp)
   return default_binds_local_p_3 (exp, flag_shlib != 0, true, false, false);
 }
 
-/* Similar to default_binds_local_p, but common symbol may be local.  */
+/* Similar to default_binds_local_p, but common symbol may be local and
+   extern protected data is non-local.  */
 
 bool
 default_binds_local_p_2 (const_tree exp)
 {
-  return default_binds_local_p_3 (exp, flag_shlib != 0, true, false,
+  return default_binds_local_p_3 (exp, flag_shlib != 0, true, true,
   !flag_pic);
 }

[gomp4, committed] Set safelen to INT_MAX for oacc independent pragma

2015-07-22 Thread Tom de Vries

[ was: Re: [PATCH, gomp4] Propagate independent clause for OpenACC
kernels pass ]


On 14/07/15 11:48, Jakub Jelinek wrote:

On Tue, Jul 14, 2015 at 05:35:28PM +0800, Chung-Lin Tang wrote:

The wording of OpenACC independent is simpler:
"... the independent clause tells the implementation that the iterations of
this loop are data-independent with respect to each other." -- OpenACC spec 2.7.9

I would say this implies even more relaxed conditions than OpenMP simd safelen,
essentially saying that the compiler doesn't even need dependence analysis; just
assume independence of iterations.


safelen is also saying that the compiler doesn't even need dependence
analysis.  It is just that only some transformations of the loop are ok
without dependence analysis, others need to be with dependence analysis.
Classical vectorization optimizations (instead of doing one iteration
at a time you can do up to safelen consecutive iterations together) for the
first statement in the loop, then second statement, etc. are ok without
dependence analysis, but e.g. reversing the loop and running first the last
iteration and so on up to first, or running the iterations in random orders
is not ok.
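
As an illustration of the safelen point, here is a minimal sketch of a loop
whose iterations are only independent within a window of four.  The
function name is hypothetical, and the _OPENMP guard is only there so the
code also builds without -fopenmp.

```c
#include <stddef.h>

/* Each iteration reads the element written four iterations earlier, so
   any two iterations less than 4 apart are independent of each other.
   safelen(4) states exactly that: up to 4 consecutive iterations may be
   executed together, with no dependence analysis needed.  */
void
add_one_from_four_back (int *a, size_t n)
{
#ifdef _OPENMP
#pragma omp simd safelen(4)
#endif
  for (size_t i = 4; i < n; i++)
    a[i] = a[i - 4] + 1;
}
```

Running this loop in reverse or in random order would break the dependence
on a[i - 4], which is the kind of transformation safelen does not license.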


So if OpenACC independent means there are no dependencies in between
iterations, the OpenMP counterpart here is #pragma omp for simd schedule (auto)
or #pragma omp distribute parallel for simd schedule (auto).


schedule(auto) appears to correspond to the OpenACC 'auto' clause, or
what is implied in a kernels compute construct, but I'm not sure it implies
no dependencies between iterations?


By the schedule(auto) I meant that the user tells the compiler it can
parallelize the loop with whatever schedule it wants.  Other schedules are
quite well defined: if the team has that many threads, which of the threads
gets which iteration, so the user could rely on a particular parallelization and
the loop iterations still could not be 100% independent.  With
schedule(auto) you say it is up to the compiler to schedule them, thus they
really have to be all independent.


Putting aside the semantic issues, currently safelen>0 turns on a certain
amount of vectorization code that we are not currently using (and not likely
at all for nvptx).  Right now, we're just trying to pass the new flag to a
kernels tree-parloops based pass.


In any case, when setting your flag you should also set safelen = INT_MAX,
as the OpenACC independent implies that you can vectorize the loop with any
vectorization factor without performing dependency analysis on the loop.
OpenACC is (hopefully) not just about PTX and most other targets will want
to vectorize such loops.



This patch sets safelen to INT_MAX for loops marked with the independent 
clause on the openacc loop directive.


Build and reg-tested on x86_64 with nvidia accelerator.

Committed to gomp-4_0-branch.

Thanks,
- Tom

Set safelen to INT_MAX for oacc independent pragma

2015-07-22  Tom de Vries  

	* omp-low.c (expand_omp_for): Set loop->safelen to INT_MAX if
	marked_independent.
---
 gcc/omp-low.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index 0419dcd..65c6321 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -8286,6 +8286,7 @@ expand_omp_for (struct omp_region *region, gimple inner_stmt)
 	{
 	  struct loop *loop = region->cont->loop_father; 
 	  loop->marked_independent = true;
+	  loop->safelen = INT_MAX;
 	}
 }
   else if (gimple_omp_for_kind (fd.for_stmt) & GF_OMP_FOR_SIMD)
-- 
1.9.1

[Ping] Re: [C++ Patch/RFC] PR 53184

2015-07-22 Thread Paolo Carlini


Hi,

On 05/05/2015 11:24 PM, Paolo Carlini wrote:

Hi,

per the audit trail, this issue appears to boil down to two separate 
issues:
- The warning doesn't appear universally useful, thus it would be nice 
to give it a name in order to enable disabling it.
- As shown by the testcase, sometimes the wording is misleading: it 
talks about 'anonymous namespace', where, as clarified by Jason in the 
trail, the issue is really about a type with no linkage, no namespace 
involved.


- The former is easily done, I picked: -Wsubobject-linkage. Makes sense?
- The latter is a little more tricky, because it doesn't seem always 
easy to tell one case from the other, in particular when templates are 
involved (eg, g++.dg/warn/anonymous-namespace-3.C) and the linkage 
issue involves template arguments. Given that the warning doesn't seem 
terribly important (as another data point, clang doesn't have it), so 
far I have conditionals which reliably figure out cases of anonymous 
namespace and cases of no linkage (per the testcase at issue, for 
example) and otherwise fall back to an 'or' wording. I hope the 
improvement is good enough. Alternately, I suppose the warning could 
use a completely different, more generic, wording, but in that case 
testcases like anonymous-namespace-3.C will need adjustment.

Any feedback on this?

https://gcc.gnu.org/ml/gcc-patches/2015-05/msg00370.html

Thanks!
Paolo.

Re: [PR64164] drop copyrename, integrate into expand

2015-07-22 Thread Alexandre Oliva

On Jul 21, 2015, Richard Biener  wrote:

> On Sat, Jul 18, 2015 at 9:37 AM, Alexandre Oliva  wrote:
>> On Jul 16, 2015, Alexandre Oliva  wrote:
>> + /* If we are assigning parameters for a function, rather
>> +than for a call, propagate the RTL of the complex parm to
>> +the split declarations, and set their contexts so that
>> +maybe_reset_rtl_for_parm can recognize them and refrain
>> +from resetting their RTL.  */
>> + if (cfun->gimple_df)

> If the cfun->gimple_df check is to decide whether this is a call or a function
> then no, this can't work reliably.  What is this test for else?

That was the reason: call or function.

> You pass another argument to split_complex_arg, so why not pass in a bool
> on whether we split it for this or the other case?

There's only one call to split_complex_args.  I'll try to figure out
where the paths converge and see if it's reasonable to pass an argument
all the way to tell the two cases apart.

Thanks for the suggestion,

-- 
Alexandre Oliva, freedom fighter    http://FSFLA.org/~lxoliva/
You must be the change you wish to see in the world. -- Gandhi
Be Free! -- http://FSFLA.org/   FSF Latin America board member
Free Software Evangelist|Red Hat Brasil GNU Toolchain Engineer

[PATCH][AArch64] Fix LINUX_TARGET_LINK_SPEC to be consistent with ARM

2015-07-22 Thread Szabolcs Nagy

Same as
https://gcc.gnu.org/ml/gcc-patches/2015-04/msg01387.html
but for AArch64.

-dynamic-linker is only passed to the linker if !static && !shared.

-rdynamic handling is changed too to be consistent with arm:
only pass -export-dynamic if !static.
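
The intended behaviour can be sketched as a small truth-table model.  This
is only an illustration of the conditions described above, not GCC's spec
machinery; the struct and function names are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical model of which options reach the linker.  */
struct link_flags
{
  bool dynamic_linker;   /* pass -dynamic-linker <path> */
  bool export_dynamic;   /* pass -export-dynamic */
};

struct link_flags
model_link_flags (bool is_static, bool is_shared, bool rdynamic)
{
  struct link_flags f;
  /* %{!static: %{rdynamic:-export-dynamic} ...}  */
  f.export_dynamic = !is_static && rdynamic;
  /* %{!static: %{!shared:-dynamic-linker ...}}  */
  f.dynamic_linker = !is_static && !is_shared;
  return f;
}
```

For example, model_link_flags (true, false, true) yields neither option,
whereas before the change both would have been passed for a static link
with -rdynamic.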

2015-07-22  Szabolcs Nagy  

PR target/65711
* config/aarch64/aarch64-linux.h (LINUX_TARGET_LINK_SPEC): Move
-dynamic-linker within %{!static %{!shared, and -rdynamic within
%{!static.
diff --git a/gcc/config/aarch64/aarch64-linux.h b/gcc/config/aarch64/aarch64-linux.h
index 1600a32..c51c8b2 100644
--- a/gcc/config/aarch64/aarch64-linux.h
+++ b/gcc/config/aarch64/aarch64-linux.h
@@ -38,8 +38,9 @@
%{static:-Bstatic}\
%{shared:-shared}\
%{symbolic:-Bsymbolic}			\
-   %{rdynamic:-export-dynamic}			\
-   -dynamic-linker " GNU_USER_DYNAMIC_LINKER "	\
+   %{!static:	\
+ %{rdynamic:-export-dynamic}		\
+ %{!shared:-dynamic-linker " GNU_USER_DYNAMIC_LINKER "}} \
-X		\
%{mbig-endian:-EB} %{mlittle-endian:-EL} \
-maarch64linux%{mabi=ilp32:32}%{mbig-endian:b}"

Re: [Patch, fortran] PR 37131, inline matmul

2015-07-22 Thread Thomas Koenig


Hi Mikael,


However, it introduces regressions on matmul_bounds_{2,4,5}.
It seems the "incorrect extent" runtime errors are completely optimized
away (even at -O0).
Any ideas?


This is seriously weird.  It seems that the call to gfortran_error is
really optimized away, because the middle-end decides something strange.

I would assume the backend decl for gfortran_error is somehow wrong.

I will take a look, but this is an area that I don't really know a lot
about...

Thomas

[PATCH][AArch64] elf toolchain does not pass -shared linker option

2015-07-22 Thread Szabolcs Nagy

Valid linker options should be handled by the elf target consistently
with the linux-gnu target.

I'm not sure about the undocumented -h option (blindly copied
LINUX_TARGET_LINK_SPEC from aarch64-linux without the
dynamic-linker flag).

(Not passing -shared can cause broken vdso.so in the linux
kernel when it is built with the elf toolchain.)

2015-07-22  Szabolcs Nagy  

* config/aarch64/aarch64-elf-raw.h (LINK_SPEC): Handle -h, -static,
-shared, -symbolic, -rdynamic.
diff --git a/gcc/config/aarch64/aarch64-elf-raw.h b/gcc/config/aarch64/aarch64-elf-raw.h
index bd5e51c..d8c682f 100644
--- a/gcc/config/aarch64/aarch64-elf-raw.h
+++ b/gcc/config/aarch64/aarch64-elf-raw.h
@@ -44,7 +44,12 @@
 #endif
 
 #ifndef LINK_SPEC
-#define LINK_SPEC "%{mbig-endian:-EB} %{mlittle-endian:-EL} -X \
+#define LINK_SPEC "%{h*}			\
+   %{static:-Bstatic}\
+   %{shared:-shared}\
+   %{symbolic:-Bsymbolic}			\
+   %{!static:%{rdynamic:-export-dynamic}}	\
+   %{mbig-endian:-EB} %{mlittle-endian:-EL} -X	\
   -maarch64elf%{mabi=ilp32*:32}%{mbig-endian:b}" \
   CA53_ERR_835769_SPEC \
   CA53_ERR_843419_SPEC

[PATCH] Fix ubsan tree sharing (PR sanitizer/66908)

2015-07-22 Thread Marek Polacek

In this testcase we were generating an uninitialized variable when doing
-fsanitize=shift,bounds sanitization.  The shift instrumentation is done
first; after that, the IR looks like

  res[i] = (m > 31) ? __ubsan (... tab[i] ...) ? 0, ... tab[i] ...;

where tab[i] are identical.  That means that when we instrument the first
tab[i] (we shouldn't do this I suppose), the second tab[i] is changed as
well as they're shared.  But that doesn't play well with SAVE_EXPRs, because
SAVE_EXPR  would only be initialized on one path.  Fixed by unsharing
the operands when constructing the ubsan check.  The .gimple diff is in
essence just

+  i.2 = i;
+  UBSAN_BOUNDS (0B, i.2, 21);
-  UBSAN_BOUNDS (0B, i.1, 21);

(Merely not instrumenting __ubsan_* wouldn't help exactly because of the
sharing.)
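
To illustrate why sharing matters, here is a toy model in plain C.  It is
not GCC's tree representation: struct node, copy_expr and free_expr are
hypothetical stand-ins, with copy_expr playing the role that unshare_expr
plays in the patch.

```c
#include <stdlib.h>

/* Toy expression node; a stand-in for a tree, not its real layout.  */
struct node
{
  int value;
  struct node *operand;   /* NULL for a leaf */
};

/* Free a chain of nodes allocated by copy_expr.  */
void
free_expr (struct node *n)
{
  while (n != NULL)
    {
      struct node *next = n->operand;
      free (n);
      n = next;
    }
}

/* Deep-copy an expression so the copy shares no nodes with the original.
   Returns NULL if N is NULL or if an allocation fails.  */
struct node *
copy_expr (const struct node *n)
{
  if (n == NULL)
    return NULL;
  struct node *c = malloc (sizeof *c);
  if (c == NULL)
    return NULL;
  c->value = n->value;
  c->operand = copy_expr (n->operand);
  if (n->operand != NULL && c->operand == NULL)
    {
      free (c);
      return NULL;
    }
  return c;
}
```

If two uses point at the same operand node, rewriting that node through one
use silently changes the other.  Rewriting a copy made with copy_expr
leaves the original untouched.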

Bootstrapped/regtested on x86_64-linux, ok for trunk?

2015-07-22  Marek Polacek  

PR sanitizer/66908
* c-ubsan.c: Include gimplify.h.
(ubsan_instrument_division): Unshare OP0 and OP1.
(ubsan_instrument_shift): Likewise.

* c-c++-common/ubsan/pr66908.c: New test.

diff --git gcc/c-family/c-ubsan.c gcc/c-family/c-ubsan.c
index 0baf118..3869511 100644
--- gcc/c-family/c-ubsan.c
+++ gcc/c-family/c-ubsan.c
@@ -38,6 +38,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "internal-fn.h"
 #include "stor-layout.h"
 #include "builtins.h"
+#include "gimplify.h"
 
 /* Instrument division by zero and INT_MIN / -1.  If not instrumenting,
return NULL_TREE.  */
@@ -54,6 +55,9 @@ ubsan_instrument_division (location_t loc, tree op0, tree op1)
   gcc_assert (TYPE_MAIN_VARIANT (TREE_TYPE (op0))
  == TYPE_MAIN_VARIANT (TREE_TYPE (op1)));
 
+  op0 = unshare_expr (op0);
+  op1 = unshare_expr (op1);
+
   if (TREE_CODE (type) == INTEGER_TYPE
   && (flag_sanitize & SANITIZE_DIVIDE))
 t = fold_build2 (EQ_EXPR, boolean_type_node,
@@ -134,6 +138,9 @@ ubsan_instrument_shift (location_t loc, enum tree_code code,
   HOST_WIDE_INT op0_prec = TYPE_PRECISION (type0);
   tree uprecm1 = build_int_cst (op1_utype, op0_prec - 1);
 
+  op0 = unshare_expr (op0);
+  op1 = unshare_expr (op1);
+
   t = fold_convert_loc (loc, op1_utype, op1);
   t = fold_build2 (GT_EXPR, boolean_type_node, t, uprecm1);
 
diff --git gcc/testsuite/c-c++-common/ubsan/pr66908.c 
gcc/testsuite/c-c++-common/ubsan/pr66908.c
index e69de29..5f731f0 100644
--- gcc/testsuite/c-c++-common/ubsan/pr66908.c
+++ gcc/testsuite/c-c++-common/ubsan/pr66908.c
@@ -0,0 +1,15 @@
+/* PR sanitizer/66908 */
+/* { dg-do compile } */
+/* { dg-options "-fsanitize=shift,bounds -O2 -Werror=maybe-uninitialized" } */
+/* { dg-additional-options "-std=gnu90" { target c } } */
+
+struct S { int a[22]; };
+static int const e[22] = { };
+
+void
+foo (struct S const *s, unsigned int m, unsigned int *res)
+{
+  unsigned int i;
+  for (i = 0; i < 22; ++i)
+res[i] = ((s->a[i] + e[i]) << m);
+}

Marek

Re: [PR64164] drop copyrename, integrate into expand

2015-07-22 Thread Alexandre Oliva

On Jul 21, 2015, Richard Biener  wrote:

> On Sat, Jul 18, 2015 at 9:37 AM, Alexandre Oliva  wrote:
>> + if (cfun->gimple_df)

> If the cfun->gimple_df check is to decide whether this is a call or a function
> then no, this can't work reliably.  What is this test for else?

It turns out it's not call or function, as I thought at first, but
gimplifying or expanding the function.  split_complex_args is not used
for calls.  So the above might actually work (minus the misleading
comments I wrote), and I think it's cleaner than adding a bool
expanding_p arg to split_complex_args and
assign_parms_augmented_arg_list, called from gimplify_parameters (during
gimplification of a function) and assign_parms (during its expansion).
Do you agree, or would you prefer the explicit argument?

-- 
Alexandre Oliva, freedom fighter    http://FSFLA.org/~lxoliva/
You must be the change you wish to see in the world. -- Gandhi
Be Free! -- http://FSFLA.org/   FSF Latin America board member
Free Software Evangelist|Red Hat Brasil GNU Toolchain Engineer

Re: [C/C++ PATCH] Implement -Wtautological-compare (PR c++/66555, c/54979)

2015-07-22 Thread Marek Polacek

On Tue, Jul 14, 2015 at 11:42:21PM +0200, Marek Polacek wrote:
> On Tue, Jul 14, 2015 at 10:30:06PM +0100, Richard Sandiford wrote:
> > Marek Polacek  writes:
> > > +  /* Don't warn for e.g.
> > > + HOST_WIDE_INT n;
> > > + ...
> > > + if (n == (long) n) ...
> > > +   */
> > > +  if ((CONVERT_EXPR_P (lhs) || TREE_CODE (lhs) == NON_LVALUE_EXPR)
> > > +  ^ (CONVERT_EXPR_P (rhs) || TREE_CODE (rhs) == NON_LVALUE_EXPR))
> > > +return;
> > 
> > I might be misreading it, sorry, but it looks like the XOR means that
> > we'd still warn for:
> > 
> >   if ((HOST_WIDE_INT) n == (long) n) ...
> > 
> > in cases where HOST_WIDE_INT and long have the same precision.
> 
> Yes, that's true.  Maybe we want to warn in that case as well,
> I didn't know.  If we do, just changing ^ into || would probably
> help.  It's somewhat hazy to me what to do in this case.

This is the version with || rather than ^.  Pick whichever you prefer.
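
The difference between the two variants can be sketched with two small
predicates.  The names are hypothetical; lhs_conv and rhs_conv stand for
"this operand is a conversion" (CONVERT_EXPR_P or NON_LVALUE_EXPR in the
patch), and a true result means the function returns early without warning.

```c
#include <stdbool.h>

/* Variant with ^: bail out only when exactly one side is a conversion.  */
bool
skip_warning_xor (bool lhs_conv, bool rhs_conv)
{
  return lhs_conv != rhs_conv;   /* XOR on booleans */
}

/* Variant with ||: bail out when either side is a conversion.  */
bool
skip_warning_or (bool lhs_conv, bool rhs_conv)
{
  return lhs_conv || rhs_conv;
}
```

The two differ only when both operands are conversions, as in
(HOST_WIDE_INT) n == (long) n: the ^ variant goes on to the equality check,
the || variant does not.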

Bootstrapped/regtested on x86_64-linux, ok for trunk?

2015-07-22  Marek Polacek  

PR c++/66555
PR c/54979
* c-common.c (find_array_ref_with_const_idx_r): New function.
(warn_tautological_cmp): New function.
* c-common.h (warn_tautological_cmp): Declare.
* c.opt (Wtautological-compare): New option.

* c-typeck.c (parser_build_binary_op): Call warn_tautological_cmp.

* call.c (build_new_op_1): Call warn_tautological_cmp.
* pt.c (tsubst_copy_and_build): Use sentinel to suppress tautological
compare warnings.

* doc/invoke.texi: Document -Wtautological-compare.

* c-c++-common/Wtautological-compare-1.c: New test.

diff --git gcc/c-family/c-common.c gcc/c-family/c-common.c
index 84e7242..6ceed36 100644
--- gcc/c-family/c-common.c
+++ gcc/c-family/c-common.c
@@ -1849,6 +1849,70 @@ warn_logical_operator (location_t location, enum 
tree_code code, tree type,
 }
 }
 
+/* Helper function for warn_tautological_cmp.  Look for ARRAY_REFs
+   with constant indices.  */
+
+static tree
+find_array_ref_with_const_idx_r (tree *expr_p, int *walk_subtrees, void *data)
+{
+  tree expr = *expr_p;
+
+  if ((TREE_CODE (expr) == ARRAY_REF
+   || TREE_CODE (expr) == ARRAY_RANGE_REF)
+  && TREE_CODE (TREE_OPERAND (expr, 1)) == INTEGER_CST)
+{
+  *(bool *) data = true;
+  *walk_subtrees = 0;
+}
+
+  return NULL_TREE;
+}
+
+/* Warn if a self-comparison always evaluates to true or false.  LOC
+   is the location of the comparison with code CODE, LHS and RHS are
+   operands of the comparison.  */
+
+void
+warn_tautological_cmp (location_t loc, enum tree_code code, tree lhs, tree rhs)
+{
+  if (TREE_CODE_CLASS (code) != tcc_comparison)
+return;
+
+  /* We do not warn for constants because they are typical of macro
+ expansions that test for features, sizeof, and similar.  */
+  if (CONSTANT_CLASS_P (lhs) || CONSTANT_CLASS_P (rhs))
+return;
+
+  /* Don't warn for e.g.
+ HOST_WIDE_INT n;
+ ...
+ if (n == (long) n) ...
+   */
+  if ((CONVERT_EXPR_P (lhs) || TREE_CODE (lhs) == NON_LVALUE_EXPR)
+  || (CONVERT_EXPR_P (rhs) || TREE_CODE (rhs) == NON_LVALUE_EXPR))
+return;
+
+  if (operand_equal_p (lhs, rhs, 0))
+{
+  /* Don't warn about array references with constant indices;
+these are likely to come from a macro.  */
+  bool found = false;
+  walk_tree_without_duplicates (&lhs, find_array_ref_with_const_idx_r,
+   &found);
+  if (found)
+   return;
+  const bool always_true = (code == EQ_EXPR || code == LE_EXPR
+   || code == GE_EXPR || code == UNLE_EXPR
+   || code == UNGE_EXPR || code == UNEQ_EXPR);
+  if (always_true)
+   warning_at (loc, OPT_Wtautological_compare,
+   "self-comparison always evaluates to true");
+  else
+   warning_at (loc, OPT_Wtautological_compare,
+   "self-comparison always evaluates to false");
+}
+}
+
 /* Warn about logical not used on the left hand side operand of a comparison.
This function assumes that the LHS is inside of TRUTH_NOT_EXPR.
Do not warn if RHS is of a boolean type.  */
diff --git gcc/c-family/c-common.h gcc/c-family/c-common.h
index a2a4621..b891bbd 100644
--- gcc/c-family/c-common.h
+++ gcc/c-family/c-common.h
@@ -812,6 +812,7 @@ extern bool warn_if_unused_value (const_tree, location_t);
 extern void warn_logical_operator (location_t, enum tree_code, tree,
   enum tree_code, tree, enum tree_code, tree);
 extern void warn_logical_not_parentheses (location_t, enum tree_code, tree);
+extern void warn_tautological_cmp (location_t, enum tree_code, tree, tree);
 extern void check_main_parameter_types (tree decl);
 extern bool c_determine_visibility (tree);
 extern bool vector_types_compatible_elements_p (tree, tree);
diff --git gcc/c-family/c.opt gcc/c-family/c.opt
index 285952e..2f6369b 100644
--- gcc/c-family/

Re: [PATCH] Fix ubsan tree sharing (PR sanitizer/66908)

2015-07-22 Thread Jakub Jelinek

On Wed, Jul 22, 2015 at 07:26:22PM +0200, Marek Polacek wrote:
> In this testcase we were generating an uninitialized variable when doing
> -fsanitize=shift,bounds sanitization.  The shift instrumentation is done
> first; after that, the IR looks like
> 
>   res[i] = (m > 31) ? __ubsan (... tab[i] ...) ? 0, ... tab[i] ...;
> 
> where tab[i] are identical.  That means that when we instrument the first
> tab[i] (we shouldn't do this I suppose), the second tab[i] is changed as
> well as they're shared.  But that doesn't play well with SAVE_EXPRs, because
> SAVE_EXPR  would only be initialized on one path.  Fixed by unsharing
> the operands when constructing the ubsan check.  The .gimple diff is in
> essence just
> 
> +  i.2 = i;
> +  UBSAN_BOUNDS (0B, i.2, 21);
> -  UBSAN_BOUNDS (0B, i.1, 21);
> 
> (Merely not instrumenting __ubsan_* wouldn't help exactly because of the
> sharing.)
> 
> Bootstrapped/regtested on x86_64-linux, ok for trunk?

That is strange.  I'd have expected you'd want to unshare if you want to use
the same operand multiple times in the same function, instead of unsharing
it just in case it is shared with something different.

So isn't the bug instead that the UBSAN_BOUNDS generating code doesn't
unshare?  Of course, these two functions use op0 and/or op1 sometimes
multiple times too and thus they might want to unshare too, but I'd have
expected in a different spot.

Jakub

[PATCH, i386]: Fix PR 66954, function multiversioning fails for target "aes"

2015-07-22 Thread Uros Bizjak

Straightforward implementation.

libgcc/ChangeLog:

2015-07-22  Uros Bizjak  

PR target/66954
* config/i386/cpuinfo.c (enum processor_features): Add FEATURE_AES.
(get_available_features): Handle FEATURE_AES.

gcc/ChangeLog:

2015-07-22  Uros Bizjak  

PR target/66954
* config/i386/i386.c (get_builtin_code_for_version): Add P_AES
to enum feature_priority and feature_list.
(fold_builtin_cpu): Add F_AES to enum processor_features
and isa_names_table.

gcc/testsuite/ChangeLog:

2015-07-22  Uros Bizjak  

PR target/66954
* g++.dg/ext/mv24.C: New test.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Committed to mainline SVN.

Uros.
Index: gcc/config/i386/i386.c
===
--- gcc/config/i386/i386.c  (revision 226075)
+++ gcc/config/i386/i386.c  (working copy)
@@ -34611,6 +34611,7 @@ get_builtin_code_for_version (tree decl, tree *pre
 P_SSE4_2,
 P_PROC_SSE4_2,
 P_POPCNT,
+P_AES,
 P_AVX,
 P_PROC_AVX,
 P_BMI,
@@ -34648,6 +34649,7 @@ get_builtin_code_for_version (tree decl, tree *pre
   {"sse4.1", P_SSE4_1},
   {"sse4.2", P_SSE4_2},
   {"popcnt", P_POPCNT},
+  {"aes", P_AES},
   {"avx", P_AVX},
   {"bmi", P_BMI},
   {"fma4", P_FMA4},
@@ -35635,6 +35637,7 @@ fold_builtin_cpu (tree fndecl, tree *args)
 F_AVX512F,
 F_BMI,
 F_BMI2,
+F_AES,
 F_MAX
   };
 
@@ -35730,7 +35733,8 @@ fold_builtin_cpu (tree fndecl, tree *args)
   {"avx2",   F_AVX2},
   {"avx512f",F_AVX512F},
   {"bmi",F_BMI},
-  {"bmi2",   F_BMI2}
+  {"bmi2",   F_BMI2},
+  {"aes",F_AES}
 };
 
   tree __processor_model_type = build_processor_model_struct ();
Index: gcc/testsuite/g++.dg/ext/mv24.C
===
--- gcc/testsuite/g++.dg/ext/mv24.C (revision 0)
+++ gcc/testsuite/g++.dg/ext/mv24.C (working copy)
@@ -0,0 +1,35 @@
+// Test case to check if Multiversioning works for AES
+
+// { dg-do run { target i?86-*-* x86_64-*-* } }
+// { dg-require-ifunc "" }
+// { dg-options "-O2" }
+
+#include <assert.h>
+
+// Check if AES feature selection works
+int foo () __attribute__((target("default")));
+int foo () __attribute__((target("aes")));
+
+int main ()
+{
+  int val = foo ();
+
+  if (__builtin_cpu_supports ("aes"))
+assert (val == 1);
+  else
+assert (val == 0);
+
+  return 0;
+}
+
+int __attribute__ ((target("default")))
+foo ()
+{
+  return 0;
+}
+
+int __attribute__ ((target("aes")))
+foo ()
+{
+  return 1;
+}
Index: libgcc/config/i386/cpuinfo.c
===
--- libgcc/config/i386/cpuinfo.c(revision 226075)
+++ libgcc/config/i386/cpuinfo.c(working copy)
@@ -100,7 +100,8 @@ enum processor_features
   FEATURE_FMA,
   FEATURE_AVX512F,
   FEATURE_BMI,
-  FEATURE_BMI2
+  FEATURE_BMI2,
+  FEATURE_AES
 };
 
 struct __processor_model
@@ -273,6 +274,8 @@ get_available_features (unsigned int ecx, unsigned
 features |= (1 << FEATURE_SSE2);
   if (ecx & bit_POPCNT)
 features |= (1 << FEATURE_POPCNT);
+  if (ecx & bit_AES)
+features |= (1 << FEATURE_AES);
   if (ecx & bit_SSE3)
 features |= (1 << FEATURE_SSE3);
   if (ecx & bit_SSSE3)

[PING][PATCH, PR46193] Handle min/max pointer reductions in parloops

2015-07-22 Thread Tom de Vries


On 13/07/15 13:02, Tom de Vries wrote:

Hi,

this patch fixes PR46193.

It handles min and max reductions of pointer type in parloops.
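
As a rough picture of what a parallel min reduction over pointers needs,
here is a sketch that emulates two threads by hand.  The function names are
hypothetical, and the identity element (the largest pointer value) is an
assumption modelled on the use of upper_bound_in_type in the patch; the
comparisons go through uintptr_t to stay within defined behaviour.

```c
#include <stddef.h>
#include <stdint.h>

/* Reduce list[lo..hi) to its lowest address, starting from the identity
   element for min: the largest representable pointer value.  */
static char *
min_addr_chunk (char **list, size_t lo, size_t hi)
{
  char *m = (char *) UINTPTR_MAX;
  for (size_t i = lo; i < hi; i++)
    if ((uintptr_t) list[i] < (uintptr_t) m)
      m = list[i];
  return m;
}

/* Emulate two threads: reduce each half separately, then combine the
   partial results with the same min operation.  */
char *
min_addr_two_chunks (char **list, size_t n)
{
  char *a = min_addr_chunk (list, 0, n / 2);
  char *b = min_addr_chunk (list, n / 2, n);
  return (uintptr_t) a < (uintptr_t) b ? a : b;
}
```

Starting each partial result from the identity element means an empty chunk
never affects the combined minimum.  A max reduction would start from the
smallest pointer value instead.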

Bootstrapped and reg-tested on x86_64.

OK for trunk?



Ping.

Thanks,
- Tom


0001-Handle-mix-max-pointer-reductions-in-parloops.patch


Handle min/max pointer reductions in parloops

2015-07-13  Tom de Vries

PR tree-optimization/46193
* omp-low.c (omp_reduction_init): Handle pointer type for min or max
clause.

* gcc.dg/autopar/pr46193.c: New test.

* testsuite/libgomp.c/pr46193.c: New test.
---
  gcc/omp-low.c  |  4 ++
  gcc/testsuite/gcc.dg/autopar/pr46193.c | 38 +++
  libgomp/testsuite/libgomp.c/pr46193.c  | 67 ++
  3 files changed, 109 insertions(+)
  create mode 100644 gcc/testsuite/gcc.dg/autopar/pr46193.c
  create mode 100644 libgomp/testsuite/libgomp.c/pr46193.c

diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index 2e2070a..20d0010 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -3423,6 +3423,8 @@ omp_reduction_init (tree clause, tree type)
real_maxval (&min, 1, TYPE_MODE (type));
  return build_real (type, min);
}
+  else if (POINTER_TYPE_P (type))
+   return lower_bound_in_type (type, type);
else
{
  gcc_assert (INTEGRAL_TYPE_P (type));
@@ -3439,6 +3441,8 @@ omp_reduction_init (tree clause, tree type)
real_maxval (&max, 0, TYPE_MODE (type));
  return build_real (type, max);
}
+  else if (POINTER_TYPE_P (type))
+   return upper_bound_in_type (type, type);
else
{
  gcc_assert (INTEGRAL_TYPE_P (type));
diff --git a/gcc/testsuite/gcc.dg/autopar/pr46193.c 
b/gcc/testsuite/gcc.dg/autopar/pr46193.c
new file mode 100644
index 000..544a5da
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/autopar/pr46193.c
@@ -0,0 +1,38 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -ftree-parallelize-loops=2 -fdump-tree-parloops-details" 
} */
+
+extern void abort (void);
+
+char *
+foo (int count, char **list)
+{
+  char *minaddr = list[0];
+  int i;
+
+  for (i = 0; i < count; i++)
+{
+  char *addr = list[i];
+  if (addr < minaddr)
+   minaddr = addr;
+}
+
+  return minaddr;
+}
+
+char *
+foo2 (int count, char **list)
+{
+  char *maxaddr = list[0];
+  int i;
+
+  for (i = 0; i < count; i++)
+{
+  char *addr = list[i];
+  if (addr > maxaddr)
+   maxaddr = addr;
+}
+
+  return maxaddr;
+}
+
+/* { dg-final { scan-tree-dump-times "parallelizing inner loop" 2 "parloops" } 
} */
diff --git a/libgomp/testsuite/libgomp.c/pr46193.c 
b/libgomp/testsuite/libgomp.c/pr46193.c
new file mode 100644
index 000..1e27faf
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c/pr46193.c
@@ -0,0 +1,67 @@
+/* { dg-do run } */
+/* { dg-additional-options "-ftree-parallelize-loops=2" } */
+
+extern void abort (void);
+
+char *
+foo (int count, char **list)
+{
+  char *minaddr = list[0];
+  int i;
+
+  for (i = 0; i < count; i++)
+{
+  char *addr = list[i];
+  if (addr < minaddr)
+   minaddr = addr;
+}
+
+  return minaddr;
+}
+
+char *
+foo2 (int count, char **list)
+{
+  char *maxaddr = list[0];
+  int i;
+
+  for (i = 0; i < count; i++)
+{
+  char *addr = list[i];
+  if (addr > maxaddr)
+   maxaddr = addr;
+}
+
+  return maxaddr;
+}
+
+#define N 5
+
+static void
+init (char **list)
+{
+  int i;
+  for (i = 0; i < N; ++i)
+list[i] = (char *)&list[i];
+}
+
+int
+main (void)
+{
+  char *list[N];
+  char * res;
+
+  init (list);
+
+  res = foo (N, list);
+
+  if (res != (char *)&list[0])
+abort ();
+
+  res = foo2 (N, list);
+
+  if (res != (char *)&list[N-1])
+abort ();
+
+  return 0;
+}
-- 1.9.1

[PATCH] Remove unused get_current_pass_name

2015-07-22 Thread Bernd Edlinger

Hi,


I noticed recently that tree-pass.h contains a declaration of
get_current_pass_name, but this function is not defined, and wherever we need
the current pass name, we simply use current_pass->name.  So I would like to
remove that declaration.


Bootstrapped and regression-tested on x86-64-linux-gnu.
OK for trunk?


Thanks
Bernd.
  2015-07-22  Bernd Edlinger  

* tree-pass.h (get_current_pass_name): Removed.



patch-tree-pass.diff
Description: Binary data

Re: [C/C++ PATCH] Implement -Wtautological-compare (PR c++/66555, c/54979)

2015-07-22 Thread Martin Sebor


On 07/14/2015 09:18 AM, Marek Polacek wrote:

Code such as "if (i == i)" is hardly ever desirable, so we should be able
to warn about this to prevent dumb mistakes.


I haven't tried the patch or even studied it very carefully but
I wonder if this is also the case when i is declared volatile.
I.e., do we want to issue a warning there? (If we do, the text
of the warning would need to be adjusted in those cases since
the expression need not evaluate to true.)

Martin

Re: [C/C++ PATCH] Implement -Wtautological-compare (PR c++/66555, c/54979)

2015-07-22 Thread Marek Polacek

On Wed, Jul 22, 2015 at 12:43:53PM -0600, Martin Sebor wrote:
> On 07/14/2015 09:18 AM, Marek Polacek wrote:
> >Code such as "if (i == i)" is hardly ever desirable, so we should be able
> >to warn about this to prevent dumb mistakes.
> 
> I haven't tried the patch or even studied it very carefully but
> I wonder if this is also the case when i is declared volatile.
> I.e., do we want to issue a warning there? (If we do, the text
> of the warning would need to be adjusted in those cases since
> the expression need not evaluate to true.)

We don't warn for volatiles because operand_equal_p doesn't consider
decls with side effects as same.  Admittedly the test doesn't test
that...
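
For reference, a small sketch of why a volatile self-comparison is not a
tautology.  The function name is hypothetical, and the two reads are
written out as separate statements to make them explicit; "i == i" on a
volatile i performs the same two reads.

```c
/* Compare two successive reads of a volatile object.  Each access is a
   real load, so hardware or another agent could change the value between
   them and the result need not be 1.  */
int
volatile_self_compare (volatile int *p)
{
  int first = *p;    /* first read */
  int second = *p;   /* second read, may observe a different value */
  return first == second;
}
```

On ordinary memory that nothing else modifies, this returns 1, but the
compiler may not assume so, which is why staying silent here seems right.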

Marek

Re: [C/C++ PATCH] Implement -Wtautological-compare (PR c++/66555, c/54979)

2015-07-22 Thread Martin Sebor


On 07/22/2015 01:06 PM, Marek Polacek wrote:

On Wed, Jul 22, 2015 at 12:43:53PM -0600, Martin Sebor wrote:

On 07/14/2015 09:18 AM, Marek Polacek wrote:

Code such as "if (i == i)" is hardly ever desirable, so we should be able
to warn about this to prevent dumb mistakes.


I haven't tried the patch or even studied it very carefully but
I wonder if this is also the case when i is declared volatile.
I.e., do we want to issue a warning there? (If we do, the text
of the warning would need to be adjusted in those cases since
the expression need not evaluate to true.)


We don't warn for volatiles because operand_equal_p doesn't consider
decls with side effects as same.  Admittedly the test doesn't test
that...


I see. Thanks for clarifying that. Not warning makes sense. I would
suggest to add a test case for it then to make sure it's deliberate.

Martin

Re: [C/C++ PATCH] Implement -Wtautological-compare (PR c++/66555, c/54979)

2015-07-22 Thread Marek Polacek

On Wed, Jul 22, 2015 at 01:48:03PM -0600, Martin Sebor wrote:
> On 07/22/2015 01:06 PM, Marek Polacek wrote:
> >On Wed, Jul 22, 2015 at 12:43:53PM -0600, Martin Sebor wrote:
> >>On 07/14/2015 09:18 AM, Marek Polacek wrote:
> >>>Code such as "if (i == i)" is hardly ever desirable, so we should be able
> >>>to warn about this to prevent dumb mistakes.
> >>
> >>I haven't tried the patch or even studied it very carefully but
> >>I wonder if this is also the case when i is declared volatile.
> >>I.e., do we want to issue a warning there? (If we do, the text
> >>of the warning would need to be adjusted in those cases since
> >>the expression need not evaluate to true.)
> >
> >We don't warn for volatiles because operand_equal_p doesn't consider
> >decls with side effects as same.  Admittedly the test doesn't test
> >that...
> 
> I see. Thanks for clarifying that. Not warning makes sense. I would
> suggest to add a test case for it then to make sure it's deliberate.

Here:

Bootstrapped/regtested on x86_64-linux, ok for trunk?

2015-07-22  Marek Polacek  

PR c++/66555
PR c/54979
* c-common.c (find_array_ref_with_const_idx_r): New function.
(warn_tautological_cmp): New function.
* c-common.h (warn_tautological_cmp): Declare.
* c.opt (Wtautological-compare): New option.

* c-typeck.c (parser_build_binary_op): Call warn_tautological_cmp.

* call.c (build_new_op_1): Call warn_tautological_cmp.
* pt.c (tsubst_copy_and_build): Use sentinel to suppress tautological
compare warnings.

* doc/invoke.texi: Document -Wtautological-compare.

* c-c++-common/Wtautological-compare-1.c: New test.

diff --git gcc/c-family/c-common.c gcc/c-family/c-common.c
index c94596f..6a79b95 100644
--- gcc/c-family/c-common.c
+++ gcc/c-family/c-common.c
@@ -1861,6 +1861,70 @@ warn_logical_operator (location_t location, enum tree_code code, tree type,
 }
 }
 
+/* Helper function for warn_tautological_cmp.  Look for ARRAY_REFs
+   with constant indices.  */
+
+static tree
+find_array_ref_with_const_idx_r (tree *expr_p, int *walk_subtrees, void *data)
+{
+  tree expr = *expr_p;
+
+  if ((TREE_CODE (expr) == ARRAY_REF
+   || TREE_CODE (expr) == ARRAY_RANGE_REF)
+  && TREE_CODE (TREE_OPERAND (expr, 1)) == INTEGER_CST)
+{
+  *(bool *) data = true;
+  *walk_subtrees = 0;
+}
+
+  return NULL_TREE;
+}
+
+/* Warn if a self-comparison always evaluates to true or false.  LOC
+   is the location of the comparison with code CODE, LHS and RHS are
+   operands of the comparison.  */
+
+void
+warn_tautological_cmp (location_t loc, enum tree_code code, tree lhs, tree rhs)
+{
+  if (TREE_CODE_CLASS (code) != tcc_comparison)
+return;
+
+  /* We do not warn for constants because they are typical of macro
+ expansions that test for features, sizeof, and similar.  */
+  if (CONSTANT_CLASS_P (lhs) || CONSTANT_CLASS_P (rhs))
+return;
+
+  /* Don't warn for e.g.
+ HOST_WIDE_INT n;
+ ...
+ if (n == (long) n) ...
+   */
+  if ((CONVERT_EXPR_P (lhs) || TREE_CODE (lhs) == NON_LVALUE_EXPR)
+  || (CONVERT_EXPR_P (rhs) || TREE_CODE (rhs) == NON_LVALUE_EXPR))
+return;
+
+  if (operand_equal_p (lhs, rhs, 0))
+{
+  /* Don't warn about array references with constant indices;
+these are likely to come from a macro.  */
+  bool found = false;
+  walk_tree_without_duplicates (&lhs, find_array_ref_with_const_idx_r,
+   &found);
+  if (found)
+   return;
+  const bool always_true = (code == EQ_EXPR || code == LE_EXPR
+   || code == GE_EXPR || code == UNLE_EXPR
+   || code == UNGE_EXPR || code == UNEQ_EXPR);
+  if (always_true)
+   warning_at (loc, OPT_Wtautological_compare,
+   "self-comparison always evaluates to true");
+  else
+   warning_at (loc, OPT_Wtautological_compare,
+   "self-comparison always evaluates to false");
+}
+}
+
 /* Warn about logical not used on the left hand side operand of a comparison.
This function assumes that the LHS is inside of TRUTH_NOT_EXPR.
Do not warn if RHS is of a boolean type.  */
diff --git gcc/c-family/c-common.h gcc/c-family/c-common.h
index a198e79..f0640c7 100644
--- gcc/c-family/c-common.h
+++ gcc/c-family/c-common.h
@@ -812,6 +812,7 @@ extern bool warn_if_unused_value (const_tree, location_t);
 extern void warn_logical_operator (location_t, enum tree_code, tree,
   enum tree_code, tree, enum tree_code, tree);
 extern void warn_logical_not_parentheses (location_t, enum tree_code, tree);
+extern void warn_tautological_cmp (location_t, enum tree_code, tree, tree);
 extern void check_main_parameter_types (tree decl);
 extern bool c_determine_visibility (tree);
 extern bool vector_types_compatible_elements_p (tree, tree);
diff --git gcc/c-family/c.opt gcc/c-family/c.opt
index dc7

Re: [PATCH] PR target/65612: Multiversioning doesn't work with DSO nor PIE

2015-07-22 Thread Sriraman Tallam

On Fri, Apr 17, 2015 at 5:36 AM, H.J. Lu  wrote:
> On Fri, Apr 17, 2015 at 4:59 AM, Jakub Jelinek  wrote:
>> On Fri, Apr 17, 2015 at 04:48:48AM -0700, H.J. Lu wrote:
>>> > I don't like it.  Nonshared libgcc is libgcc.a, period.  No sense in
>>> > creating yet another library for that.
>>> > So, IMHO beyond making the __cpu* entrypoints compat symbols only (@ instead
>>> > of @@ symbol versions) the right fix is simply tweak init_gcc_spec, so that
>>> > static_name is always linked in, in the switch combinations that it isn't
>>> > right now of course after shared_name rather than before that.
>>> > I thought we've fixed that years ago...
>>> >
>>>
>>> We never pass -lgcc to linker when building C++ DSO:
>>>
>>>  /usr/libexec/gcc/x86_64-redhat-linux/4.9.2/collect2 -plugin
>>> /usr/libexec/gcc/x86_64-redhat-linux/4.9.2/liblto_plugin.so
>>> -plugin-opt=/usr/libexec/gcc/x86_64-redhat-linux/4.9.2/lto-wrapper
>>> -plugin-opt=-fresolution=/tmp/ccZC7iqy.res
>>> -plugin-opt=-pass-through=-lgcc_s -plugin-opt=-pass-through=-lc
>>> -plugin-opt=-pass-through=-lgcc_s --build-id --no-add-needed
>>> --eh-frame-hdr --hash-style=gnu -m elf_x86_64 -shared
>>> /usr/lib/gcc/x86_64-redhat-linux/4.9.2/../../../../lib64/crti.o
>>> /usr/lib/gcc/x86_64-redhat-linux/4.9.2/crtbeginS.o
>>> -L/usr/lib/gcc/x86_64-redhat-linux/4.9.2
>>> -L/usr/lib/gcc/x86_64-redhat-linux/4.9.2/../../../../lib64
>>> -L/lib/../lib64 -L/usr/lib/../lib64
>>> -L/usr/lib/gcc/x86_64-redhat-linux/4.9.2/../../.. x.o -lstdc++ -lm
>>> -lgcc_s -lc -lgcc_s /usr/lib/gcc/x86_64-redhat-linux/4.9.2/crtendS.o
>>> /usr/lib/gcc/x86_64-redhat-linux/4.9.2/../../../../lib64/crtn.o
>>> [hjl@gnu-32 tmp]$
>>>
>>> That is why libgcc_nonshared.a is needed.
>>
>> See what I wrote.  I think it is a bug that we don't do that, in your case
>> we should pass -lgcc_s -lgcc -lc -lgcc_s -lgcc.
>> Or, if you don't want to change that, as the multi-versioning change is
>> i386/x86_64 only change, just ensure that those targets have
>> t-slibgcc-libgcc in libgcc/config.host and thus behave like most other linux
>> targets where -lgcc is linked in always after -lgcc_s.
>>
>> Jakub
>
> This patch works for me.  OK for trunk?

H.J:  This patch assumes that libgcc.a is built with
-fvisibility=hidden and __cpu_indicator_init is LOCAL to libgcc.a.  We
have a config where libgcc.a is not built with hidden visibility.  Can
we consider this additional patch, which makes this explicit by marking
__cpu_indicator_init with hidden visibility when not building a shared
object?


--- config/i386/cpuinfo.c (revision 225800)
+++ config/i386/cpuinfo.c (working copy)
@@ -34,6 +34,9 @@
 #endif

 int __cpu_indicator_init (void)
+#if !defined(SHARED)
+__attribute__ ((visibility("hidden")))
+#endif
   __attribute__ ((constructor CONSTRUCTOR_PRIORITY));

 /* Get the specific type of AMD CPU.  */
@@ -321,6 +324,9 @@
needs to be called explicitly there.  */

 int __attribute__ ((constructor CONSTRUCTOR_PRIORITY))
+#if !defined(SHARED)
+__attribute__ ((visibility("hidden")))
+#endif
 __cpu_indicator_init (void)


Also,  gold and ld have an incompatibility with symbol versioning as
discussed here:  https://sourceware.org/bugzilla/show_bug.cgi?id=18703

H.J. suggested this nice fix to solve this problem where BFD ld is
always used to build libgcc_s.so.1:

You can pass -fuse-ld=bfd to build libgcc_s.so.1 on Linux:

diff --git a/libgcc/config/i386/t-linux b/libgcc/config/i386/t-linux
index 11bb46e..12aab16 100644
--- a/libgcc/config/i386/t-linux
+++ b/libgcc/config/i386/t-linux
@@ -3,4 +3,8 @@
 # t-slibgcc-elf-ver and t-linux
 SHLIB_MAPFILES = libgcc-std.ver $(srcdir)/config/i386/libgcc-glibc.ver

+# Work around gold bug:
+# https://sourceware.org/bugzilla/show_bug.cgi?id=18703
+SHLIB_LDFLAGS += -fuse-ld=bfd
+
 HOST_LIBGCC2_CFLAGS += -mlong-double-80 -DUSE_ELF_SYMVER


Thanks
Sri

>
> gcc/testsuite/
>
> PR target/65612
> * g++.dg/ext/mv18.C: New test.
> * g++.dg/ext/mv19.C: Likewise.
> * g++.dg/ext/mv20.C: Likewise.
> * g++.dg/ext/mv21.C: Likewise.
> * g++.dg/ext/mv22.C: Likewise.
> * g++.dg/ext/mv23.C: Likewise.
>
> libgcc/
>
> PR target/65612
> * config.host (tmake_file): Add t-slibgcc-libgcc for Linux/x86.
> * config/i386/cpuinfo.c (__cpu_model): Initialize.
> (__cpu_indicator_init@GCC_4.8.0): New.
> (__cpu_model@GCC_4.8.0): Likewise.
> * config/i386/t-linux (HOST_LIBGCC2_CFLAGS): Add
> -DUSE_ELF_SYMVER.
>
> Thanks.
>
> --
> H.J.

[C++ Patch] PR 52987

2015-07-22 Thread Paolo Carlini


Hi,

this bug is purely about error recovery. A while ago I fixed the first 
half, but for, eg:


int foo(x a) {
}

we still emit the pointless:

52987_2.C:1:14: error: expected ‘,’ or ‘;’ before ‘{’ token

In fact, we *already* have code helping error recovery in 
cp_parser_simple_declaration:


  /* If we have already issued an error message we don't need
 to issue another one.  */
  if (decl != error_mark_node
  || cp_parser_uncommitted_to_tentative_parse_p (parser))
cp_parser_error (parser, "expected %<,%> or %<;%>");

but it doesn't trigger in such cases, because the decl is in fact != 
error_mark_node; a trace of the error can be found only in its 
DECL_INITIAL (I noticed that only today ;) ). Thus the patch below, which 
so far appears to work well for me and passes testing on x86_64-linux 
(g++.old-deja/g++.law/init7.C included, which blocks even simpler 
solutions).
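
To make the difference concrete, here is a small stand-alone model of the two
checks. It is only a sketch: struct decl_model and both function names are
hypothetical, and the two booleans stand in for "decl == error_mark_node" and
"DECL_INITIAL (decl) == error_mark_node" respectively.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the parts of a decl that matter here.  */
struct decl_model
{
  bool is_error;          /* models decl == error_mark_node */
  bool initial_is_error;  /* models DECL_INITIAL (decl) == error_mark_node */
};

/* Old condition: emit "expected ',' or ';'" unless the decl itself
   is erroneous, or whenever we are still parsing tentatively.  */
bool
old_should_report (const struct decl_model *d, bool uncommitted)
{
  return !d->is_error || uncommitted;
}

/* New condition: additionally stay quiet when the error was only
   recorded in the initializer of the decl.  */
bool
new_should_report (const struct decl_model *d, bool uncommitted)
{
  return (!d->is_error && !d->initial_is_error) || uncommitted;
}
```

For the "int foo(x a) {" case described above (decl fine, initializer
erroneous, parse committed) the old check reports and the new one does not.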


Thanks,
Paolo.



/cp
2015-07-23  Paolo Carlini  

PR c++/52987
* parser.c (cp_parser_simple_declaration): Robustify check avoiding
duplicated error messages.

/testsuite
2015-07-23  Paolo Carlini  

PR c++/52987
* g++.dg/parse/error57.C: New.
* g++.dg/expr/string-2.C: Update.
Index: cp/parser.c
===
--- cp/parser.c (revision 226075)
+++ cp/parser.c (working copy)
@@ -11660,7 +11660,8 @@ cp_parser_simple_declaration (cp_parser* parser,
{
  /* If we have already issued an error message we don't need
 to issue another one.  */
- if (decl != error_mark_node
+ if ((decl != error_mark_node
+  && DECL_INITIAL (decl) != error_mark_node)
  || cp_parser_uncommitted_to_tentative_parse_p (parser))
cp_parser_error (parser, "expected %<,%> or %<;%>");
  /* Skip tokens until we reach the end of the statement.  */
Index: testsuite/g++.dg/expr/string-2.C
===
--- testsuite/g++.dg/expr/string-2.C(revision 226075)
+++ testsuite/g++.dg/expr/string-2.C(working copy)
@@ -4,7 +4,7 @@
 char a[1];
 
 int foo(a = "") // { dg-error "invalid array assignment" }
-{ // { dg-error "" }
+{
   return 0;
 }
 
Index: testsuite/g++.dg/parse/error57.C
===
--- testsuite/g++.dg/parse/error57.C(revision 0)
+++ testsuite/g++.dg/parse/error57.C(working copy)
@@ -0,0 +1,4 @@
+// PR c++/52987
+
+int foo(x a) {  // { dg-error "9:'x' was not declared in this scope" }
+}

Re: [gomp4.1] Initial support for some OpenMP 4.1 construct parsing

2015-07-22 Thread Jakub Jelinek

On Mon, Jul 20, 2015 at 08:10:41PM +0200, Jakub Jelinek wrote:
> And here is untested incremental libgomp side of the proposed
> GOMP_MAP_FIRSTPRIVATE_POINTER.

Actually, that seems unnecessary, for the array section maps we already
have there a pointer, so we can easily implement that just on the
compiler side.

Here is a WIP patch.

Unfortunately, in order not to break numerous examples-4/ testcases that
were doing target data map of array sections with target region without
any explicit maps, with the new way where target{, enter, exit} data
no longer map the base pointer, I had to implement the new implicit
pointer mapping semantics (map (alloc:ptr[0:0])) already in this patch.

And, that patch really requires that if there is ptr[0:something] for
something > 0 already mapped that we use the ptr[0:something] mapping rather
than ptr[0:0].  See the libgomp changes for that.
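
The situation described above can be sketched as follows. This is only an
illustration under stated assumptions: sum_mapped is a hypothetical function,
not one of the examples-4/ testcases, and without OpenMP support the pragmas
are simply ignored and the code runs on the host.

```c
/* Hypothetical illustration; not one of the examples-4/ testcases.  */
int
sum_mapped (int *p, int n)
{
  int s = 0;

  /* The array section is mapped by the enclosing target data
     construct; the base pointer itself is not mapped here.  */
  #pragma omp target data map(to: p[0:n])
  {
    /* No explicit map for p on the target region.  With the implicit
       map (alloc:p[0:0]) semantics, the already-mapped p[0:n] must be
       found and used instead of a fresh zero-length mapping.  */
    #pragma omp target map(tofrom: s)
    for (int i = 0; i < n; i++)
      s += p[i];
  }

  return s;
}
```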

Unfortunately, that occasionally breaks the target8.f90 testcase at -O0,
where we map a zero-sized FRAME.6 object which happens to be adjacent to the
array.  And that reveals IMNSHO a very serious flaw in the current standard
draft, no idea what can be done about that...

--- libgomp/testsuite/libgomp.c++/target-7.C.jj 2015-07-22 11:36:53.042867520 +0200
+++ libgomp/testsuite/libgomp.c++/target-7.C 2015-07-22 11:32:00.0 +0200
@@ -0,0 +1,90 @@
+extern "C" void abort ();
+
+void
+foo (int *x, int *&y, int (&z)[15])
+{
+  int a[10], b[15], err, i;
+  for (i = 0; i < 10; i++)
+a[i] = 7 * i;
+  for (i = 0; i < 15; i++)
+b[i] = 8 * i;
+  #pragma omp target map(to:x[5:10], y[5:10], z[5:10], a[0:10], b[5:10]) map(from:err)
+  {
+err = 0;
+for (i = 0; i < 10; i++)
+  if (x[5 + i] != 20 + 4 * i
+ || y[5 + i] != 25 + 5 * i
+ || z[5 + i] != 30 + 6 * i
+ || a[i] != 7 * i
+ || b[5 + i] != 40 + 8 * i)
+   err = 1;
+  }
+  if (err)
+abort ();
+}
+
+void
+bar (int n, int v)
+{
+  int a[n], b[n], c[n], d[n], e[n], err, i;
+  int (*x)[n] = &c;
+  int (*y2)[n] = &d;
+  int (*&y)[n] = y2;
+  int (&z)[n] = e;
+  for (i = 0; i < n; i++)
+{
+  (*x)[i] = 4 * i;
+  (*y)[i] = 5 * i;
+  z[i] = 6 * i;
+  a[i] = 7 * i;
+  b[i] = 8 * i;
+}
+  #pragma omp target map(to:x[0][5:10], y[0][5:10], z[5:10], a[0:10], b[5:10]) map(from:err)
+  {
+err = 0;
+for (i = 0; i < 10; i++)
+  if ((*x)[5 + i] != 20 + 4 * i
+ || (*y)[5 + i] != 25 + 5 * i
+ || z[5 + i] != 30 + 6 * i
+ || a[i] != 7 * i
+ || b[5 + i] != 40 + 8 * i)
+   err = 1;
+  }
+  if (err)
+abort ();
+  for (i = 0; i < n; i++)
+{
+  (*x)[i] = 9 * i;
+  (*y)[i] = 10 * i;
+  z[i] = 11 * i;
+  a[i] = 12 * i;
+  b[i] = 13 * i;
+}
+  #pragma omp target map(to:x[0][v:v+5], y[0][v:v+5], z[v:v+5], a[v-5:v+5], b[v:v+5]) map(from:err)
+  {
+err = 0;
+for (i = 0; i < 10; i++)
+  if ((*x)[5 + i] != 45 + 9 * i
+ || (*y)[5 + i] != 50 + 10 * i
+ || z[5 + i] != 55 + 11 * i
+ || a[i] != 12 * i
+ || b[5 + i] != 65 + 13 * i)
+   err = 1;
+  }
+  if (err)
+abort ();
+}
+
+int
+main ()
+{
+  int x[15], y2[15], z[15], *y = y2, i;
+  for (i = 0; i < 15; i++)
+{
+  x[i] = 4 * i;
+  y[i] = 5 * i;
+  z[i] = 6 * i;
+}
+  foo (x, y, z);
+  bar (15, 5);
+}
--- libgomp/testsuite/libgomp.c/target-15.c.jj  2015-07-22 11:37:11.655612690 +0200
+++ libgomp/testsuite/libgomp.c/target-15.c 2015-07-22 11:38:54.590203394 +0200
@@ -0,0 +1,74 @@
+extern void abort ();
+
+void
+foo (int *x)
+{
+  int a[10], b[15], err, i;
+  for (i = 0; i < 10; i++)
+a[i] = 7 * i;
+  for (i = 0; i < 15; i++)
+b[i] = 8 * i;
+  #pragma omp target map(to:x[5:10], a[0:10], b[5:10]) map(from:err)
+  {
+err = 0;
+for (i = 0; i < 10; i++)
+  if (x[5 + i] != 20 + 4 * i
+ || a[i] != 7 * i
+ || b[5 + i] != 40 + 8 * i)
+   err = 1;
+  }
+  if (err)
+abort ();
+}
+
+void
+bar (int n, int v)
+{
+  int a[n], b[n], c[n], d[n], e[n], err, i;
+  int (*x)[n] = &c;
+  for (i = 0; i < n; i++)
+{
+  (*x)[i] = 4 * i;
+  a[i] = 7 * i;
+  b[i] = 8 * i;
+}
+  #pragma omp target map(to:x[0][5:10], a[0:10], b[5:10]) map(from:err)
+  {
+err = 0;
+for (i = 0; i < 10; i++)
+  if ((*x)[5 + i] != 20 + 4 * i
+ || a[i] != 7 * i
+ || b[5 + i] != 40 + 8 * i)
+   err = 1;
+  }
+  if (err)
+abort ();
+  for (i = 0; i < n; i++)
+{
+  (*x)[i] = 9 * i;
+  a[i] = 12 * i;
+  b[i] = 13 * i;
+}
+  #pragma omp target map(to:x[0][v:v+5], a[v-5:v+5], b[v:v+5]) map(from:err)
+  {
+err = 0;
+for (i = 0; i < 10; i++)
+  if ((*x)[5 + i] != 45 + 9 * i
+ || a[i] != 12 * i
+ || b[5 + i] != 65 + 13 * i)
+   err = 1;
+  }
+  if (err)
+abort ();
+}
+
+int
+main ()
+{
+  int x[15], i;
+  for (i = 0; i < 15; i++)
+x[i] = 4 * i;
+  foo (x);
+  bar (15, 5);
+  return 0;
+}
--- libgomp/target.c.jj 2015-07-21 09:07:23.690851

[PATCH, MIPS] Compact branch support for MIPS32R6/MIPS64R6

2015-07-22 Thread Matthew Fortune

A full range of 'compact' branch instructions were introduced to MIPS
as part of Release 6. The compact term is used to identify the fact
that these do not have a delay slot.

http://imgtec.com/mips/architectures/mips64/

The one subtlety of compact branches is that while they do not have
a delay slot they do have a restriction on what can immediately follow
them. The restriction is referred to as a forbidden slot in the
architecture specification and exists only on the not-taken path of
a conditional compact branch. (The detail of whether the hazard exists
on a not-taken branch is not relevant to a compiler however as it
has to be accounted for anyway as we would not generate a compact
branch if it were always taken.)

The forbidden slot restriction equates to the same rule as delay slots
where control flow instructions are not allowed to be placed there. The
exact same set of instructions cannot be placed in a forbidden slot.

An additional class of branch instructions is also available in
compact form only which allow ordering conditions to be applied
between two register sources. Support for these is included in this
patch.

So how does all this work in GCC?

Compact branches are used based on a branch policy. The policies are:

never: Only use delay slot branches
optimal: Do whatever is best for the current architecture.  This will
 generally mean that delay slot branches will be used if the delay
 slot gets filled but otherwise a compact branch will be used. A
 special case here is that JAL and J will not be used in R6 code
 regardless of whether the delay slot could be filled.
always: Never emit a delay slot form of a branch if a compact form exists.
This policy cannot apply 100% as FP branches (and MSA branches when
committed) only have delay slot forms.

These user choices are combined with the features available in the chosen
architecture and, in particular, the optimal form will get handled like
'never' when there are no compact branches available and will get handled
like 'always' when there are no delay slot branches available.

From an instruction description perspective we also mark each branch with
a compact_form attribute that says if it 'never' has a compact form, 'maybe'
has a compact form dependent on delay slot filling, or 'always' comes in
a compact form. A secondary attribute is also used to describe whether the
instruction has a forbidden slot hazard. This applies to conditional compact
branches and means that although they do not have a delay slot, it is still
not possible to place a branch instruction immediately after them.

The define_delay definitions are configured by a combination of the user
selected branch policy and the compact_form attribute. This means the
delay slot filler will only operate on branches that should have delay slots.
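
As a rough model of the two paragraphs above, the interaction between the
user policy, the architecture and the compact_form attribute could be
sketched like this. All names are hypothetical and this is not the code in
the patch; in particular, how 'never' and 'always' behave on architectures
lacking the corresponding branch forms is not modelled.

```c
#include <stdbool.h>

/* Hypothetical names; a simplified model, not the patch itself.  */
enum branch_policy { POLICY_NEVER, POLICY_OPTIMAL, POLICY_ALWAYS };
enum compact_form { FORM_NEVER, FORM_MAYBE, FORM_ALWAYS };

/* 'optimal' degrades to 'never' when the architecture has no compact
   branches and to 'always' when it has no delay slot branches.
   Other policies are passed through unchanged (an assumption).  */
enum branch_policy
effective_policy (enum branch_policy user, bool has_compact,
                  bool has_delay_slot_branches)
{
  if (user == POLICY_OPTIMAL)
    {
      if (!has_compact)
        return POLICY_NEVER;
      if (!has_delay_slot_branches)
        return POLICY_ALWAYS;
    }
  return user;
}

/* Whether the delay slot filler should operate on a branch: only
   branches that can end up in delay slot form qualify.  */
bool
may_have_delay_slot (enum branch_policy effective, enum compact_form form)
{
  switch (form)
    {
    case FORM_NEVER:
      return true;   /* only a delay slot form exists */
    case FORM_ALWAYS:
      return false;  /* only a compact form exists */
    case FORM_MAYBE:
      return effective != POLICY_ALWAYS;
    }
  return false;
}
```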

Output patterns for branches fall into two categories:

1) Predetermined to be compact or delay slot, or this has been detected at
   the point of emitting the pattern. These will generally not use any
   formatters for the 'c' or the trailing NOP that normally get automatically
   injected by the mips_print_operand_punctuation function.
2) Use instruction formatters to enable a branch to naturally become a
   delay slot or compact form depending on whether a delay slot has been
   filled. These will use the %: formatter to indicate that a 'c' can
   be added instead of inserting a NOP using the %/ formatter. I.e.
   %: and %/ should never appear in the same branch instruction pattern.

It is generally safe to rely on using the formatters to produce the correct
branch instructions as a branch instruction that can only have a compact
form will not have a define_delay and therefore will never be in a final
sequence... This then means the %: is guaranteed to emit a 'c'.

The most complicated aspect of this change is to the MIPS_CALL and
MICROMIPS_J macros. These have been rewritten from scratch as a function
that generates an instruction instead.  This code is more complicated than
ordinary 'branch' code as J becomes BC and JAL becomes BALC which renders
instruction formatters impossible to use. The complexities of pic/non-pic
microMIPS/MIPS and absolute/relative addressing meant that wrapping all
that up in one place made much more sense. Matching the old macros to the
new function is hard but the conversion has been done carefully with a
significant amount of focussed testing.

Some of the framework in this patch is there in preparation for microMIPSR6
which only has compact branches. The support for adding microMIPSR6 to
GCC is a trivial patch on top of this.

This has been tested on multiple configurations albeit that most
configurations were tested from an older trunk revision. A re-run of a
wide range of configurations will be done after review/before commit.
This code has also been in use as part of tools to support internal
development of the I6400 core from Imagination.

gcc/
* con

[patch] Include reduction on libbackend.a and language source files

2015-07-22 Thread Andrew MacLeod

This is the result of running include reduction on all the files which 
make up libbackend.a, as well as most of the language files found in 
subdirectories lto, c, cp, java, go, fortran, jit, ada. Well, some of 
ada. :-)


I looked at the output and hand tweaked a few things... removing 
comments that no longer made sense and stuff like that.


The reduction tool was run across all the targets to pick up macros that 
might be defined.  An include file was not removed if it defined a macro 
which was used in a conditional expression (i.e. #if) either in the source 
file, or in other include files which were determined to be required.
During removal, the header was removed on the host machine, and if 
compilation was successful, the tool proceeded to try it on all 
targets.  I did a dry run on all 201 functioning targets, and the 
results from 1.7 million lines of log file showed that full coverage can 
be attained  with 13 targets:
 aarch64-linux-gnu arm-netbsdelf avr-rtems c6x-elf epiphany-elf 
hppa2.0-hpux10.1 i686-mingw32crt i686-pc-msdosdjgpp mipsel-elf 
powerpc-eabisimaltivec rs6000-ibm-aix5.1.0 sh-superh-elf sparc64-elf spu-elf


The final run was on the coverage targets, and ran much much faster.

I then ran it through an ordering tool (which I will eventually put in 
contrib).  This tool looks at include files, puts them in a 
"standard" order, and removes duplicates that have already been 
included, even if only indirectly via another file.  I.e., it will 
remove obstack.h from the list if bitmap.h has been included, for instance.
Removing duplicates was a very delicate balancing act when trying to 
aggregate them with other includes, 

i.e.
#include "options.h"
<...>
#include "target.h"

Since target.h includes tm.h (which includes options.h), we don't need 
to include options.h.  But there may be header files between the two that 
require something in options.h, so target.h needs to be moved up to the 
options.h location.   There are often secondary effects which affect 
other files, and it turned out to be a frustrating juggling act.  So I 
wrote the tool to take care of it. The standard "grouping" order of 
includes turns out to look like:

  "system.h",
  "coretypes.h",
  "backend.h",
  "target.h",
  "rtl.h",
  "tree.h",
  "fortran/gfortran.h",
  "c-family/c-common.h",
  "c/c-tree.h",
  "cp/cp-tree.h",
  "gimple.h",
  "df.h",
  "tm_p.h",
  "gimple-iterators.h",
  "ssa.h",
  "expmed.h",
  "optabs.h",
  "recog.h",
  "gimple-streamer.h"

This order resolves any issues.  The tool also looks at all the files 
included by these and avoids including them a second time. Note that it 
only puts header files into this order which are in the source file, so 
if backend.h isn't in the file and function.h is, function.h will occur 
where backend.h would be.   Any headers included by backend.h will occur 
in that relative position, and in the order they are included by backend.h.
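
The basic ordering step can be sketched as below. This is not the actual
tool: the function names are hypothetical, and headers that are not in the
standard list are simply kept, in their original relative order, after the
listed ones, which is a simplification of the placement rule described above.
Duplicate removal via the include graph is not modelled at all.

```c
#include <stddef.h>
#include <string.h>

/* The "grouping" order quoted above.  */
static const char *const standard_order[] = {
  "system.h", "coretypes.h", "backend.h", "target.h", "rtl.h", "tree.h",
  "fortran/gfortran.h", "c-family/c-common.h", "c/c-tree.h",
  "cp/cp-tree.h", "gimple.h", "df.h", "tm_p.h", "gimple-iterators.h",
  "ssa.h", "expmed.h", "optabs.h", "recog.h", "gimple-streamer.h"
};

#define N_STANDARD (sizeof standard_order / sizeof standard_order[0])

/* Position of NAME in the standard order; unlisted headers rank
   after every listed one (a simplifying assumption).  */
size_t
header_rank (const char *name)
{
  for (size_t i = 0; i < N_STANDARD; i++)
    if (strcmp (name, standard_order[i]) == 0)
      return i;
  return N_STANDARD;
}

/* Stable insertion sort of HEADERS by rank, so headers of equal rank
   keep the order they had in the source file.  */
void
order_headers (const char **headers, size_t n)
{
  for (size_t i = 1; i < n; i++)
    {
      const char *h = headers[i];
      size_t r = header_rank (h);
      size_t j = i;
      while (j > 0 && header_rank (headers[j - 1]) > r)
        {
          headers[j] = headers[j - 1];
          j--;
        }
      headers[j] = h;
    }
}
```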


I will eventually put all these tools into a directory in contrib. It 
is simple enough to run this ordering on any source file.


This patch is my best effort at a correct include reduction. I 
bootstrapped it on both x86_64-unknown-linux-gnu and 
powerpc64le-unknown-linux-gnu, with no regressions on either host. It 
builds all 201 config-list.mk targets which currently build.  I'm not 
aware of any bugs in the tools, and my testing seems to show they work 
OK and everything seems sane.  Some files look dramatically better :-)


ok for trunk?

I will make tweaks to the tool in order to do the config directories 
next, and a few remaining files that have not been reduced yet.


Andrew



reduce.patch.gz
Description: application/gzip

libstdc++: more __intN tweaks

2015-07-22 Thread DJ Delorie


Another place where "all" types are explicitly listed, and the __intN
types need to be included; elsewhere, protection against errors
[-Wnarrowing] on targets that have a small size_t.  Ok?
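
The guarding technique used in thin_heap_.hpp can be illustrated in isolation
with the sketch below. It is written in C with SIZE_MAX from <stdint.h>
rather than __SIZE_MAX__, the table is shortened, and the thresholds shown
are assumptions chosen for the illustration, not values copied from the patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Entries that would not fit in size_t are compiled out, so the
   initializer never contains a value that needs narrowing.  */
static const size_t sizes[] = {
  39602ul,
  64079ul
#if SIZE_MAX > 0xfffful      /* assumed threshold: more than 16 bits */
  ,
  103681ul,
  710646ul
#if SIZE_MAX > 0xffffful     /* assumed threshold: more than 20 bits */
  ,
  1149851ul
#endif
#endif
};

/* Number of entries that survived preprocessing on this target.  */
size_t
table_len (void)
{
  return sizeof sizes / sizeof sizes[0];
}
```

Note how each comma is moved inside the guard so that the last surviving
entry is never followed by a dangling separator before a removed block.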

* include/bits/functional_hash.h: Add specializations for __intN
types.

* include/ext/pb_ds/detail/thin_heap_/thin_heap_.hpp (__gnu_pbds):
Guard against values that might exceed size_t's precision.
 

Index: libstdc++-v3/include/ext/pb_ds/detail/thin_heap_/thin_heap_.hpp
===
--- libstdc++-v3/include/ext/pb_ds/detail/thin_heap_/thin_heap_.hpp (revision 226081)
+++ libstdc++-v3/include/ext/pb_ds/detail/thin_heap_/thin_heap_.hpp (working copy)
@@ -267,36 +267,45 @@ namespace __gnu_pbds
/* 18*/ 3571ul,
/* 19*/ 5777ul,
/* 20*/ 9349ul,
/* 21*/ 15126ul,
/* 22*/ 24476ul,
/* 23*/ 39602ul,
-   /* 24*/ 64079ul,
+   /* 24*/ 64079ul
+#if __SIZE_MAX__ > 0xul
+   ,
/* 25*/ 103681ul,
/* 26*/ 167761ul,
/* 27*/ 271442ul,
/* 28*/ 439204ul,
-   /* 29*/ 710646ul,
+   /* 29*/ 710646ul
+#if __SIZE_MAX__ > 0xful
+   ,
/* 30*/ 1149851ul,
/* 31*/ 1860497ul,
/* 32*/ 3010349ul,
/* 33*/ 4870846ul,
/* 34*/ 7881196ul,
-   /* 35*/ 12752042ul,
+   /* 35*/ 12752042ul
+#if __SIZE_MAX__ > 0xfful
+   ,
/* 36*/ 20633239ul,
/* 37*/ 33385282ul,
/* 38*/ 54018521ul,
/* 39*/ 87403803ul,
/* 40*/ 141422324ul,
/* 41*/ 228826127ul,
/* 42*/ 370248451ul,
/* 43*/ 599074578ul,
/* 44*/ 969323029ul,
/* 45*/ 1568397607ul,
/* 46*/ 2537720636ul,
/* 47*/ 4106118243ul
+#endif
+#endif
+#endif
/* Pot's good, let's play */
   };
 
 #define PB_DS_ASSERT_NODE_CONSISTENT(_Node, _Bool) \
   _GLIBCXX_DEBUG_ONLY(assert_node_consistent(_Node, _Bool, \
 __FILE__, __LINE__);)
Index: libstdc++-v3/include/bits/functional_hash.h
===
--- libstdc++-v3/include/bits/functional_hash.h (revision 226081)
+++ libstdc++-v3/include/bits/functional_hash.h (working copy)
@@ -118,12 +118,29 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   /// Explicit specialization for unsigned long.
   _Cxx_hashtable_define_trivial_hash(unsigned long)
 
   /// Explicit specialization for unsigned long long.
   _Cxx_hashtable_define_trivial_hash(unsigned long long)
 
+#ifdef __GLIBCXX_TYPE_INT_N_0
+  _Cxx_hashtable_define_trivial_hash(__GLIBCXX_TYPE_INT_N_0)
+  _Cxx_hashtable_define_trivial_hash(__GLIBCXX_TYPE_INT_N_0 unsigned)
+#endif
+#ifdef __GLIBCXX_TYPE_INT_N_1
+  _Cxx_hashtable_define_trivial_hash(__GLIBCXX_TYPE_INT_N_1)
+  _Cxx_hashtable_define_trivial_hash(__GLIBCXX_TYPE_INT_N_1 unsigned)
+#endif
+#ifdef __GLIBCXX_TYPE_INT_N_2
+  _Cxx_hashtable_define_trivial_hash(__GLIBCXX_TYPE_INT_N_2)
+  _Cxx_hashtable_define_trivial_hash(__GLIBCXX_TYPE_INT_N_2 unsigned)
+#endif
+#ifdef __GLIBCXX_TYPE_INT_N_3
+  _Cxx_hashtable_define_trivial_hash(__GLIBCXX_TYPE_INT_N_3)
+  _Cxx_hashtable_define_trivial_hash(__GLIBCXX_TYPE_INT_N_3 unsigned)
+#endif
+
 #undef _Cxx_hashtable_define_trivial_hash
 
   struct _Hash_impl
   {
 static size_t
 hash(const void* __ptr, size_t __clength,

[msp430] minor optimizations and tweaks

2015-07-22 Thread DJ Delorie


As indicated.  Committed.

* config/msp430/t-msp430 (MULTILIB_DIRNAMES): Remove trailing
slashes.

* config/msp430/msp430.md (ashlhi3): Optimize shifts of subregs.
(ashrhi3): Likewise.
(lshrhi3): Likewise.
(movhi): Take advantage of zero-extend to load small constants.
(movpsi): Likewise.
(and3): Likewise.
(zero_extendqihi2): Likewise.
(zero_extendqisi2): New.
* config/msp430/constraints.md (N,O): New.
* config/msp430/msp430.h (WORD_REGISTER_OPERATIONS): Define.

Index: config/msp430/msp430.md
===
--- config/msp430/msp430.md (revision 226084)
+++ config/msp430/msp430.md (working copy)
@@ -196,16 +196,17 @@
   "@
   MOV.B\t%1, %0
   MOV%X0.B\t%1, %0"
 )
 
 (define_insn "movhi"
-  [(set (match_operand:HI 0 "msp_nonimmediate_operand" "=rYs,rm")
-   (match_operand:HI 1 "msp_general_operand" "riYs,rmi"))]
+  [(set (match_operand:HI 0 "msp_nonimmediate_operand" "=r,rYs,rm")
+   (match_operand:HI 1 "msp_general_operand" "N,riYs,rmi"))]
   ""
   "@
+  MOV.B\t%1, %0
   MOV.W\t%1, %0
   MOV%X0.W\t%1, %0"
 )
 
 (define_expand "movsi"
   [(set (match_operand:SI 0 "nonimmediate_operand")
@@ -239,16 +240,18 @@
(match_operand:HI 5 "general_operand"))]
   "msp430_split_movsi (operands);"
 )
 
 ;; Some MOVX.A cases can be done with MOVA, this is only a few of them.
 (define_insn "movpsi"
-  [(set (match_operand:PSI 0 "msp_nonimmediate_operand" "=r,Ya,rm")
-   (match_operand:PSI 1 "msp_general_operand" "riYa,r,rmi"))]
+  [(set (match_operand:PSI 0 "msp_nonimmediate_operand" "=r,r,r,Ya,rm")
+   (match_operand:PSI 1 "msp_general_operand" "N,O,riYa,r,rmi"))]
   ""
   "@
+  MOV.B\t%1, %0
+  MOV.W\t%1, %0
   MOVA\t%1, %0
   MOVA\t%1, %0
   MOVX.A\t%1, %0")
 
 ; This pattern is identical to the truncsipsi2 pattern except
 ; that it uses a SUBREG instead of a TRUNC.  It is needed in
@@ -497,17 +500,18 @@
   "@
BIC%x0%b0\t%1, %0
BIC%X0%b0\t%1, %0"
 )
 
 (define_insn "and3"
-  [(set (match_operand:QHI 0 "msp_nonimmediate_operand" "=rYs,rm")
-   (and:QHI (match_operand:QHI 1 "msp_nonimmediate_operand" "%0,0")
-(match_operand:QHI 2 "msp_general_operand" "riYs,rmi")))]
+  [(set (match_operand:QHI 0 "msp_nonimmediate_operand" "=r,rYs,rm")
+   (and:QHI (match_operand:QHI 1 "msp_nonimmediate_operand" "%0,0,0")
+(match_operand:QHI 2 "msp_general_operand" "N,riYs,rmi")))]
   ""
   "@
+   AND%x0.B\t%2, %0
AND%x0%b0\t%2, %0
AND%X0%b0\t%2, %0"
 )
 
 (define_insn "ior3"
   [(set (match_operand:QHI  0 "msp_nonimmediate_operand" "=rYs,rm")
@@ -546,17 +550,19 @@
   "@
SXT%X0\t%0
SXT%X0\t%0"
 )
 
 (define_insn "zero_extendqihi2"
-  [(set (match_operand:HI 0 "msp_nonimmediate_operand" "=rYs,m")
-   (zero_extend:HI (match_operand:QI 1 "msp_nonimmediate_operand" "0,0")))]
+  [(set (match_operand:HI 0 "msp_nonimmediate_operand" "=rYs,r,r,m")
+   (zero_extend:HI (match_operand:QI 1 "msp_nonimmediate_operand" "0,rYs,m,0")))]
   ""
   "@
AND\t#0xff, %0
+   MOV.B\t%1, %0
+   MOV%X0.B\t%1, %0
AND%X0\t#0xff, %0"
 )
 
 ;; Eliminate extraneous zero-extends mysteriously created by gcc.
 (define_peephole2
   [(set (match_operand:HI 0 "register_operand")
@@ -599,12 +605,20 @@
 )
 
 ;; Look for cases where integer/pointer conversions are suboptimal due
 ;; to missing patterns, despite us not having opcodes for these
 ;; patterns.  Doing these manually allows for alternate optimization
 ;; paths.
+
+(define_insn "zero_extendqisi2"
+  [(set (match_operand:SI 0 "nonimmediate_operand" "=r")
+   (zero_extend:SI (subreg:HI (match_operand:QI 1 "nonimmediate_operand" "rm") 0)))]
+  "msp430x"
+  "MOV.B\t%1,%L0 { CLR\t%H0"
+)
+
 (define_insn "zero_extendhisi2"
   [(set (match_operand:SI 0 "nonimmediate_operand" "=rm,r")
(zero_extend:SI (match_operand:HI 1 "nonimmediate_operand" "0,r")))]
   "msp430x"
   "@
   MOV.W\t#0,%H0
@@ -731,12 +745,15 @@
 (define_expand "ashlhi3"
   [(set (match_operand:HI0 "nonimmediate_operand")
(ashift:HI (match_operand:HI 1 "general_operand")
   (match_operand:HI 2 "general_operand")))]
   ""
   {
+if (GET_CODE (operands[1]) == SUBREG
+&& REG_P (XEXP (operands[1], 0)))
+  operands[1] = force_reg (HImode, operands[1]);
 if (msp430x
 && REG_P (operands[0])
 && REG_P (operands[1])
 && CONST_INT_P (operands[2]))
   emit_insn (gen_430x_shift_left (operands[0], operands[1], operands[2]));
 else
@@ -797,12 +814,15 @@
 (define_expand "ashrhi3"
   [(set (match_operand:HI  0 "nonimmediate_operand")
(ashiftrt:HI (match_operand:HI 1 "general_operand")
 (match_operand:HI 2 "general_operand")))]
   ""
   {
+if (GET_CODE (operands[1]) == SUBREG
+&& REG_P (XEXP (operands[1], 0)))
+

Re: [PR25529] Convert (unsigned t * 2)/2 into unsigned (t & 0x7FFFFFFF)

2015-07-22 Thread Hurugalawadi, Naveen

>> so using wi::mask is preferred here.

Thanks for your review and comments.

Please find attached the modified patch as per your comments.

Please let me know if this version is okay.

Thanks,
Naveen

2015-07-22  Naveen H.S  

gcc/testsuite/ChangeLog:
 PR middle-end/25529
 * gcc.dg/pr25529.c: New test.

gcc/ChangeLog:
 PR middle-end/25529
 * match.pd (exact_div (mult @0 INTEGER_CST@1) @1) : New simplifier.
 (trunc_div (mult @0 integer_pow2p@1) @1) : New simplifier.
diff --git a/gcc/match.pd b/gcc/match.pd
index 9a66f52..9c8080f 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -29,7 +29,8 @@ along with GCC; see the file COPYING3.  If not see
integer_each_onep integer_truep
real_zerop real_onep real_minus_onep
CONSTANT_CLASS_P
-   tree_expr_nonnegative_p)
+   tree_expr_nonnegative_p
+   integer_pow2p)
 
 /* Operator lists.  */
 (define_operator_list tcc_comparison
@@ -280,6 +281,20 @@ along with GCC; see the file COPYING3.  If not see
 	&& integer_pow2p (@2) && tree_int_cst_sgn (@2) > 0)
    (bit_and @0 (convert (minus @1 { build_int_cst (TREE_TYPE (@1), 1); }))))))
 
+/* Simplify (t * 2)/2 ->  t.  */
+(simplify
+ (exact_div (mult @0 INTEGER_CST@1) @1)
+ (if (TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (@0)))
+  @0))
+
+/* Simplify (unsigned t * 2)/2 -> unsigned t & 0x7FFFFFFF.  */
+(simplify
+ (trunc_div (mult @0 integer_pow2p@1) @1)
+ (if (TYPE_UNSIGNED (TREE_TYPE (@0)))
+  (bit_and @0 { wide_int_to_tree
+		(type, wi::mask (TYPE_PRECISION (type) - wi::exact_log2 (@1),
+ false, TYPE_PRECISION (type))); })))
+
 /* X % Y is smaller than Y.  */
 (for cmp (lt ge)
  (simplify
diff --git a/gcc/testsuite/gcc.dg/pr25529.c b/gcc/testsuite/gcc.dg/pr25529.c
new file mode 100644
index 0000000..4d9fe9e
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr25529.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+int
+f (unsigned t)
+{
+  return (t * 2) / 2;
+}
+
+/* { dg-final { scan-tree-dump "\& 2147483647" "optimized" } } */
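
For reference, the identity behind the second simplifier can be checked outside the compiler. The following is only a sketch, assuming a 32-bit unsigned type; the helper names mul_then_div, masked and check_mul_div_mask are hypothetical and appear neither in the patch nor in GCC. Multiplying by 2^k discards the top k bits through wraparound, so dividing by 2^k afterwards leaves the low (32 - k) bits, which is the mask the pattern builds with wi::mask.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: evaluate (t * 2^k) / 2^k literally.
   Unsigned multiplication wraps modulo 2^32, so the top k bits are lost.
   Valid for k in 0..31.  */
static uint32_t mul_then_div(uint32_t t, unsigned k)
{
    uint32_t p = (uint32_t)1 << k;
    return (uint32_t)(t * p) / p;
}

/* Hypothetical helper: the simplified form, keeping the low (32 - k) bits.
   For k == 1 the mask is 0x7FFFFFFF (2147483647).  */
static uint32_t masked(uint32_t t, unsigned k)
{
    uint32_t mask = UINT32_MAX >> k;
    return t & mask;
}

/* Compare both forms for every shift count over a few sample values.  */
static void check_mul_div_mask(void)
{
    const uint32_t samples[] = { 0u, 1u, 0x12345678u, 0x80000001u, 0xFFFFFFFFu };
    for (unsigned k = 0; k < 32; k++)
        for (unsigned i = 0; i < sizeof samples / sizeof samples[0]; i++)
            assert(mul_then_div(samples[i], k) == masked(samples[i], k));
}
```

Calling check_mul_div_mask () from a small main exercises all 32 shift counts. The same reasoning does not carry over to signed types, where overflow is undefined, which is why the first simplifier is guarded by TYPE_OVERFLOW_UNDEFINED instead.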

Re: [PR25530] Convert (unsigned t / 2) * 2 into (unsigned t & ~1)

2015-07-22 Thread Hurugalawadi, Naveen

>> Your previous patch correctly restricted this to unsigned types.

Thanks for your review and comments.

Please find attached the modified patch as per your comments.

Please let me know if this version is okay.

Thanks,
Naveen

2015-07-22  Naveen H.S  

gcc/testsuite/ChangeLog:
 PR middle-end/25530
 * gcc.dg/pr25530.c: New test.

 gcc/ChangeLog:
 PR middle-end/25530
 * match.pd (mult (trunc_div @0 integer_pow2p@1) @1) : New simplifier.
diff --git a/gcc/match.pd b/gcc/match.pd
index 9a66f52..6c37a20 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -29,7 +29,8 @@ along with GCC; see the file COPYING3.  If not see
integer_each_onep integer_truep
real_zerop real_onep real_minus_onep
CONSTANT_CLASS_P
-   tree_expr_nonnegative_p)
+   tree_expr_nonnegative_p
+   integer_pow2p)
 
 /* Operator lists.  */
 (define_operator_list tcc_comparison
@@ -280,6 +281,12 @@ along with GCC; see the file COPYING3.  If not see
 	&& integer_pow2p (@2) && tree_int_cst_sgn (@2) > 0)
    (bit_and @0 (convert (minus @1 { build_int_cst (TREE_TYPE (@1), 1); }))))))
 
+/* Simplify (unsigned t / 2) * 2 -> unsigned t & ~1.  */
+(simplify
+ (mult (trunc_div @0 integer_pow2p@1) @1)
+ (if (TYPE_UNSIGNED (TREE_TYPE (@0)))
+  (bit_and @0 (negate @1))))
+
 /* X % Y is smaller than Y.  */
 (for cmp (lt ge)
  (simplify
diff --git a/gcc/testsuite/gcc.dg/pr25530.c b/gcc/testsuite/gcc.dg/pr25530.c
new file mode 100644
index 0000000..f180768
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr25530.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+int
+f (unsigned t)
+{
+  return (t / 2) * 2;
+}
+
+/* { dg-final { scan-tree-dump "\& -2" "optimized" } } */
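
As a sanity check of the transformation, here is a sketch assuming a 32-bit unsigned type; the helper names div_then_mul, masked_low_clear and check_div_mul_mask are hypothetical and are not part of the patch. Dividing by 2^k and multiplying back clears the low k bits, and the unsigned negation of 2^k is exactly the mask with those k bits cleared (for k == 1 this is ~1, printed as -2 in the tree dump).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: evaluate (t / 2^k) * 2^k literally.
   Valid for k in 0..31.  */
static uint32_t div_then_mul(uint32_t t, unsigned k)
{
    uint32_t p = (uint32_t)1 << k;
    return (t / p) * p;
}

/* Hypothetical helper: the simplified form t & -(2^k).
   Unsigned negation of a power of two sets every bit from k upwards.  */
static uint32_t masked_low_clear(uint32_t t, unsigned k)
{
    uint32_t p = (uint32_t)1 << k;
    return t & (0u - p);
}

/* Compare both forms for every shift count over a few sample values.  */
static void check_div_mul_mask(void)
{
    const uint32_t samples[] = { 0u, 1u, 7u, 0x12345678u, 0xFFFFFFFFu };
    for (unsigned k = 0; k < 32; k++)
        for (unsigned i = 0; i < sizeof samples / sizeof samples[0]; i++)
            assert(div_then_mul(samples[i], k) == masked_low_clear(samples[i], k));
}
```

Calling check_div_mul_mask () from a small main covers all 32 shift counts. For signed operands trunc_div rounds toward zero, so the two forms differ on negative odd values, which is consistent with the TYPE_UNSIGNED guard in the pattern.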

Re: [C++ Patch] PR 52987

2015-07-22 Thread Jason Merrill


OK.

Jason
