Re: [PATCH][25/25] Remove GENERIC stmt combining from SCCVN

2015-10-01 Thread Richard Biener
On Wed, 30 Sep 2015, Richard Biener wrote:

> 
> This is the last patch in the series and it finally ditches the
> stmt combining code from SCCVN which uses GENERIC.  I've been sitting
> on this for a while because of the "bad" interface that the new mprts_hook
> is, but I couldn't think of a better way short of completely refactoring
> stmt folding into more C++ (and I'm not even sure what that end result
> would look like).  So rather than pondering on this forever, the following
> patch goes forward.
> 
> Net result is that there will hopefully be no regressions (I know
> about a few corner cases I found by plastering the code with asserts,
> but I do not consider them important) but progressions both with regard
> to compile-time / memory use and to optimization (because the new code
> is strictly more powerful, not relying on the has_constants heuristic).
> 
> This is also the last major piece that was sitting on the
> match-and-simplify branch.
> 
> Bootstrapped on x86_64-unknown-linux-gnu, re-testing in progress.

This is the variant I committed.

Richard.

2015-10-01  Richard Biener  

* gimple-match.h (mprts_hook): Declare.
* gimple-match.head.c (mprts_hook): Define.
(maybe_push_res_to_seq): Use new hook.
* gimple-fold.c (gimple_fold_stmt_to_constant_1): Likewise.
* tree-ssa-sccvn.h (vn_ssa_aux::expr): Change to a gimple_seq.
(vn_ssa_aux::has_constants): Remove.
* tree-ssa-sccvn.c: Include gimple-match.h.
(VN_INFO_GET): Assert we don't re-use SSA names.
(vn_get_expr_for): Remove.
(expr_has_constants): Likewise.
(stmt_has_constants): Likewise.
(simplify_binary_expression): Likewise.
(simplify_unary_expression): Likewise.
(vn_lookup_simplify_result): New hook.
(visit_copy): Adjust.
(visit_reference_op_call): Likewise.
(visit_phi): Likewise.
(visit_use): Likewise.
(process_scc): Likewise.
(init_scc_vn): Likewise.
(visit_reference_op_load): Likewise.  Use match-and-simplify and
a gimple seq for inserted expressions.
(try_to_simplify): Remove GENERIC stmt combining code.
(sccvn_dom_walker::before_dom_children): Use match-and-simplify.
* tree-ssa-pre.c (eliminate_insert): Adjust.
(eliminate_dom_walker::before_dom_children): Likewise.

* gcc.dg/tree-ssa/ssa-fre-7.c: Adjust.
* gcc.dg/tree-ssa/ssa-fre-8.c: Likewise.

Index: gcc/tree-ssa-sccvn.c
===
*** gcc/tree-ssa-sccvn.c.orig   2015-09-30 15:00:30.636706575 +0200
--- gcc/tree-ssa-sccvn.c        2015-09-30 15:32:03.543978540 +0200
*** along with GCC; see the file COPYING3.
*** 58,63 
--- 58,64 
  #include "domwalk.h"
  #include "cgraph.h"
  #include "gimple-iterator.h"
+ #include "gimple-match.h"
  
  /* This algorithm is based on the SCC algorithm presented by Keith
 Cooper and L. Taylor Simpson in "SCC-Based Value numbering"
*** VN_INFO_GET (tree name)
*** 401,406 
--- 402,409 
  {
vn_ssa_aux_t newinfo;
  
+   gcc_assert (SSA_NAME_VERSION (name) >= vn_ssa_aux_table.length ()
+ || vn_ssa_aux_table[SSA_NAME_VERSION (name)] == NULL);
newinfo = XOBNEW (&vn_ssa_aux_obstack, struct vn_ssa_aux);
memset (newinfo, 0, sizeof (struct vn_ssa_aux));
if (SSA_NAME_VERSION (name) >= vn_ssa_aux_table.length ())
*** VN_INFO_GET (tree name)
*** 410,501 
  }
  
  
- /* Get the representative expression for the SSA_NAME NAME.  Returns
-the representative SSA_NAME if there is no expression associated with it.  */
- 
- tree
- vn_get_expr_for (tree name)
- {
-   vn_ssa_aux_t vn = VN_INFO (name);
-   gimple *def_stmt;
-   tree expr = NULL_TREE;
-   enum tree_code code;
- 
-   if (vn->valnum == VN_TOP)
- return name;
- 
-   /* If the value-number is a constant it is the representative
-  expression.  */
-   if (TREE_CODE (vn->valnum) != SSA_NAME)
- return vn->valnum;
- 
-   /* Get to the information of the value of this SSA_NAME.  */
-   vn = VN_INFO (vn->valnum);
- 
-   /* If the value-number is a constant it is the representative
-  expression.  */
-   if (TREE_CODE (vn->valnum) != SSA_NAME)
- return vn->valnum;
- 
-   /* Else if we have an expression, return it.  */
-   if (vn->expr != NULL_TREE)
- return vn->expr;
- 
-   /* Otherwise use the defining statement to build the expression.  */
-   def_stmt = SSA_NAME_DEF_STMT (vn->valnum);
- 
-   /* If the value number is not an assignment use it directly.  */
-   if (!is_gimple_assign (def_stmt))
- return vn->valnum;
- 
-   /* Note that we can valueize here because we clear the cached
-  simplified expressions after each optimistic iteration.  */
-   code = gimple_assign_rhs_code (def_stmt);
-   switch (TREE_CODE_CLASS (code))
- {
- case tcc_reference:
-   if ((code == REALPART_EXPR
-  || code == IMAGPART_EXPR
-   

Re: [PATCH] Clear flow-sensitive info in phiopt (PR tree-optimization/67769)

2015-10-01 Thread Richard Biener
On Wed, 30 Sep 2015, Marek Polacek wrote:

> Another instance of out of date SSA range info.  Before phiopt1 we had
> 
>   <bb 2>:
>   if (N_2(D) >= 0)
> goto <bb 3>;
>   else
> goto <bb 4>;
> 
>   <bb 3>:
>   iftmp.0_3 = MIN_EXPR <N_2(D), 16>;
> 
>   <bb 4>:
>   # iftmp.0_5 = PHI <0(2), iftmp.0_3(3)>
>   value_4 = (short int) iftmp.0_5;
>   return value_4;
> 
> and after phiopt1:
> 
>   <bb 2>:
>   iftmp.0_3 = MIN_EXPR <N_2(D), 16>;
>   iftmp.0_6 = MAX_EXPR <iftmp.0_3, 0>;
>   value_4 = (short int) iftmp.0_6;
>   return value_4;
> 
> But the flow-sensitive info in this BB hasn't been cleared up.
> 
> This problem doesn't show up in GCC5 but might be latent there.
> 
> Bootstrapped/regtested on x86_64-linux, ok for trunk and 5 as well?
> 
> 2015-09-30  Marek Polacek  
> 
>   PR tree-optimization/67769
>   * tree-ssa-phiopt.c (tree_ssa_phiopt_worker): Call
>   reset_flow_sensitive_info_in_bb when changing the CFG.
> 
>   * gcc.dg/torture/pr67769.c: New test.
> 
> diff --git gcc/testsuite/gcc.dg/torture/pr67769.c 
> gcc/testsuite/gcc.dg/torture/pr67769.c
> index e69de29..c1d17c3 100644
> --- gcc/testsuite/gcc.dg/torture/pr67769.c
> +++ gcc/testsuite/gcc.dg/torture/pr67769.c
> @@ -0,0 +1,23 @@
> +/* { dg-do run } */
> +
> +static int
> +clamp (int x, int lo, int hi)
> +{
> +  return (x < lo) ? lo : ((x > hi) ? hi : x);
> +}
> +
> +__attribute__ ((noinline))
> +short
> +foo (int N)
> +{
> +  short value = clamp (N, 0, 16);
> +  return value;
> +}
> +
> +int
> +main ()
> +{
> +  if (foo (-5) != 0)
> +__builtin_abort ();
> +  return 0;
> +}
> diff --git gcc/tree-ssa-phiopt.c gcc/tree-ssa-phiopt.c
> index 37fdf28..101988a 100644
> --- gcc/tree-ssa-phiopt.c
> +++ gcc/tree-ssa-phiopt.c
> @@ -338,6 +338,8 @@ tree_ssa_phiopt_worker (bool do_store_elim, bool 
> do_hoist_loads)
> else if (minmax_replacement (bb, bb1, e1, e2, phi, arg0, arg1))
>   cfgchanged = true;
>   }
> +  if (cfgchanged)
> + reset_flow_sensitive_info_in_bb (bb);

That's a bit conservative.  I believe most PHI opt transforms should
be fine as the conditionally executed blocks did not contain any
stmts that prevail.  The merge PHI also should have valid range info.

So I don't think the patch is good as-is.  Please consider reverting
if you already applied it.

Thanks,
Richard.


Re: [PATCH] Optimize certain end of loop conditions into min/max operation

2015-10-01 Thread Michael Collison


ChangeLog formatting and test case fixed.

On 09/30/2015 12:30 PM, Marc Glisse wrote:

On Fri, 18 Sep 2015, Marc Glisse wrote:

+(bit_and:c (op @0 @1) (op @0 @2))


:c seems useless here. On the other hand, it might make sense to 
use op:s
since this is mostly useful if it removes the 2 original 
comparisons.


As I was saying, :c is useless.
(x:c y z)
is replaced by two copies of the transformation, one with
(x y z)
and the other with
(x z y)
In your transformation, both versions would be equivalent, so the second
one is redundant.

Also, if you have:
a=x

--
Michael Collison
Linaro Toolchain Working Group
michael.colli...@linaro.org

2015-09-30  Michael Collison  
Andrew Pinski 

* match.pd ((x < y) && (x < z) -> x < min (y,z),
(x > y) and (x > z) -> x > max (y,z))
* testsuite/gcc.dg/tree-ssa/minmax-loopend.c: New test.
diff --git a/gcc/match.pd b/gcc/match.pd
index bd5c267..ef2e025 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -2311,3 +2311,13 @@ along with GCC; see the file COPYING3.  If not see
 (with { tree utype = unsigned_type_for (TREE_TYPE (@0)); }
  (convert (bit_and (op (convert:utype @0) (convert:utype @1))
 	   (convert:utype @4
+
+/* Transform (@0 < @1 and @0 < @2) to use min, 
+   (@0 > @1 and @0 > @2) to use max */
+(for op (lt le gt ge)
+ ext (min min max max)
+(simplify
+(bit_and (op:s @0 @1) (op:s @0 @2))
+(if (INTEGRAL_TYPE_P (TREE_TYPE (@0)))
+(op @0 (ext @1 @2)
+
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/minmax-loopend.c b/gcc/testsuite/gcc.dg/tree-ssa/minmax-loopend.c
new file mode 100644
index 000..2e4300c
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/minmax-loopend.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+int min_test(long a, long b, long c) {
+  int cmp1 = a < b;
+  int cmp2 = a < c;
+  return cmp1 & cmp2;
+}
+
+int max_test (long a, long b, long c) {
+  int cmp1 = a > b;
+  int cmp2 = a > c;
+  return cmp1 & cmp2;
+}
+
+/* { dg-final { scan-tree-dump-times "MIN_EXPR" 1 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "MAX_EXPR" 1 "optimized" } } */
-- 
1.9.1



Re: [PATCH 2/2] call scev analysis in scop-detection as in sese-to-poly

2015-10-01 Thread Tobias Grosser

On 10/01/2015 12:11 AM, Sebastian Pop wrote:

Before our rewrite of the scop detection, we did not have a valid SESE
region at hand, so we used to do more ad-hoc analysis of data references,
trying to prove that the data references would still be valid at all levels
of a loop nest.

Now that we have a valid SESE region, we can call the scev analysis in the same
way on the same computed loop nest in the scop-detection as in the sese-to-poly.

Next step will be to cache the data references analyzed in the scop detection
and not compute the same info in sese-to-poly.

The patch fixes block-1.f90 that used to ICE on x86_64-linux when compiled with
-m32.  Patch passed bootstrap with BOOT_CFLAGS="-g -O2 -fgraphite-identity
-floop-nest-optimize" and check on x86_64-linux using ISL-0.15.


Nice.

Tobias


Re: [PATCH] Optimize certain end of loop conditions into min/max operation

2015-10-01 Thread Marc Glisse

On Thu, 1 Oct 2015, Michael Collison wrote:


ChangeLog formatting and test case fixed.


Oops, sorry for the lack of precision, but I meant indenting the code in 
match.pd, I hadn't even looked at the ChangeLog.


--
Marc Glisse


Re: [gomp4] remove goacc locking

2015-10-01 Thread Thomas Schwinge
Hi Nathan!

On Mon, 28 Sep 2015 11:56:09 -0400, Nathan Sidwell  wrote:
> I've committed this to remove the now no longer needed lock and unlock 
> builtins 
> and related infrastructure.

If I understand correctly, it is an implementation detail of the nvptx
offloading implementation that it doesn't require such locking
primitives, but such locking may still be required for other (future)
offloading implementations, at which point something like the following
would have to be introduced again:

>   * target.def (GOACC_LOCK): Delete hook.
>   * doc/tm.texi.in (TARGET_GOACC_LOCK): Delete.
>   * doc/tm.texi: Rebuilt.
>   * targhooks.h (default_goacc_lock): Delete.
>   * internal-fn.def (GOACC_LOCK,  GOACC_UNLOCK, GOACC_LOCK_INIT): Delete.
>   * internal-fn.c (expand_GOACC_LOCK, expand_GOACC_UNLOCK,
>   expand_GOACC_LOCK_INIT): Delete.
>   * omp-low.c (lower_oacc_reductions): Remove locking.
>   (execute_oacc_transform): Remove lock transforming.
>   (default_goacc_lock): Delete.

(Of course, I agree that it doesn't make sense to try to maintain such a
locking implementation, if now/currently there isn't any user of it.)

>   * config/nvptx/nvptx-protos.h (nvptx_expand_oacc_lock): Delete.
>   * config/nvptx/nvptx.md (oacc_lock, oacc_unlock, oacc_lock_init):
>   Delete.
>   (nvptx_spin_lock, nvptx_spin_reset): Delete.
>   * config/nvptx/nvptx.c (LOCK_GLOBAL, LOCK_SHARED, LOCK_MAX): Delete.
>   (lock_names, lock_space, lock_level, lock_used): Delete.
>   (force_global_locks): Delete.
>   (nvptx_option_override): Do not initialize lock syms.
>   (nvptx_expand_oacc_lock): Delete.
>   (nvptx_file_end): Do not finalize locks.
>   (TARGET_GOACC_LOCK): Delete.

> --- config/nvptx/nvptx.c  (revision 228200)
> +++ config/nvptx/nvptx.c  (working copy)

> @@ -4930,9 +4864,6 @@ nvptx_use_anchors_for_symbol (const_rtx
>  #undef TARGET_GOACC_FORK_JOIN
>  #define TARGET_GOACC_FORK_JOIN nvptx_xform_fork_join
>  
> -#undef TARGET_GOACC_LOCK
> -#define TARGET_GOACC_LOCK nvptx_xform_lock
> -
>  #undef TARGET_GOACC_REDUCTION
>  #define TARGET_GOACC_REDUCTION nvptx_goacc_reduction

[...]/source-gcc/gcc/config/nvptx/nvptx.c:4328:1: warning: 'bool 
nvptx_xform_lock(gcall*, const int*, unsigned int)' defined but not used 
[-Wunused-function]
 nvptx_xform_lock (gcall *ARG_UNUSED (call),
 ^

Committed to gomp-4_0-branch in r228321:

commit a65ecefb6320407b221297880e9c25e1b7b9a430
Author: tschwinge 
Date:   Thu Oct 1 08:12:01 2015 +

Address -Wunused-function diagnostic

gcc/
* config/nvptx/nvptx.c (nvptx_xform_lock): Remove function.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@228321 
138bc75d-0d04-0410-961f-82ee72b054a4
---
 gcc/ChangeLog.gomp   |4 
 gcc/config/nvptx/nvptx.c |9 -
 2 files changed, 4 insertions(+), 9 deletions(-)

diff --git gcc/ChangeLog.gomp gcc/ChangeLog.gomp
index c4033e0..3a9e01d 100644
--- gcc/ChangeLog.gomp
+++ gcc/ChangeLog.gomp
@@ -1,3 +1,7 @@
+2015-10-01  Thomas Schwinge  
+
+   * config/nvptx/nvptx.c (nvptx_xform_lock): Remove function.
+
 2015-09-30  Thomas Schwinge  
 
* tree-cfg.c (replace_ssa_name): Revert obsolete changes.
diff --git gcc/config/nvptx/nvptx.c gcc/config/nvptx/nvptx.c
index e81cbba..fcebf02 100644
--- gcc/config/nvptx/nvptx.c
+++ gcc/config/nvptx/nvptx.c
@@ -4322,15 +4322,6 @@ nvptx_xform_fork_join (gcall *call, const int dims[],
   return false;
 }
 
-/* Check lock & unlock.  We don't reqyire any locks.  */
-
-static bool
-nvptx_xform_lock (gcall *ARG_UNUSED (call),
- const int *ARG_UNUSED (dims), unsigned ARG_UNUSED (ifn_code))
-{
-  return true;
-}
-
 static tree
 nvptx_get_worker_red_addr (tree type, tree rid, tree lid)
 {


Grüße,
 Thomas




Re: [PATCH] Optimize certain end of loop conditions into min/max operation

2015-10-01 Thread Michael Collison

Marc,

Ah I did misunderstand you. Patch with match.pd formatting fix.

On 10/01/2015 01:05 AM, Marc Glisse wrote:

On Thu, 1 Oct 2015, Michael Collison wrote:


ChangeLog formatting and test case fixed.


Oops, sorry for the lack of precision, but I meant indenting the code 
in match.pd, I hadn't even looked at the ChangeLog.




--
Michael Collison
Linaro Toolchain Working Group
michael.colli...@linaro.org

2015-09-30  Michael Collison  
Andrew Pinski 

* match.pd ((x < y) && (x < z) -> x < min (y,z),
(x > y) and (x > z) -> x > max (y,z))
* testsuite/gcc.dg/tree-ssa/minmax-loopend.c: New test.
diff --git a/gcc/match.pd b/gcc/match.pd
index bd5c267..caf3c82 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -2311,3 +2311,13 @@ along with GCC; see the file COPYING3.  If not see
 (with { tree utype = unsigned_type_for (TREE_TYPE (@0)); }
  (convert (bit_and (op (convert:utype @0) (convert:utype @1))
 	   (convert:utype @4
+
+/* Transform (@0 < @1 and @0 < @2) to use min, 
+   (@0 > @1 and @0 > @2) to use max */
+(for op (lt le gt ge)
+ ext (min min max max)
+ (simplify
+  (bit_and (op:s @0 @1) (op:s @0 @2))
+  (if (INTEGRAL_TYPE_P (TREE_TYPE (@0)))
+   (op @0 (ext @1 @2)
+
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/minmax-loopend.c b/gcc/testsuite/gcc.dg/tree-ssa/minmax-loopend.c
new file mode 100644
index 000..2e4300c
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/minmax-loopend.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+int min_test(long a, long b, long c) {
+  int cmp1 = a < b;
+  int cmp2 = a < c;
+  return cmp1 & cmp2;
+}
+
+int max_test (long a, long b, long c) {
+  int cmp1 = a > b;
+  int cmp2 = a > c;
+  return cmp1 & cmp2;
+}
+
+/* { dg-final { scan-tree-dump-times "MIN_EXPR" 1 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "MAX_EXPR" 1 "optimized" } } */
-- 
1.9.1



Re: [Patch 2/2 ARM/AArch64] Add a new Cortex-A53 scheduling model

2015-10-01 Thread Kyrill Tkachov


On 25/09/15 08:59, James Greenhalgh wrote:

Hi,


Hi James,



This patch introduces a new scheduling model for Cortex-A53.

Bootstrapped and tested on arm-none-linux-gnueabi and aarch64-none-linux-gnu
and checked with a variety of popular benchmarking and microbenchmarking
suites to show a benefit.

OK?

Thanks,
James

---
2015-09-25  James Greenhalgh  

* config/arm/aarch-common-protos.h
(aarch_accumulator_forwarding): New.
(aarch_forward_to_shift_is_not_shifted_reg): Likewise.
* config/arm/aarch-common.c (aarch_accumulator_forwarding): New.
(aarch_forward_to_shift_is_not_shifted_reg): Liekwise.


s/Liekwise/Likewise/


* config/arm/cortex-a53.md: Rewrite.



This is ok arm-wise.
Thanks,
Kyrill



Re: [Patch 2/2 ARM/AArch64] Add a new Cortex-A53 scheduling model

2015-10-01 Thread Marcus Shawcroft

On 25/09/15 08:59, James Greenhalgh wrote:


Hi,

This patch introduces a new scheduling model for Cortex-A53.

Bootstrapped and tested on arm-none-linux-gnueabi and aarch64-none-linux-gnu
and checked with a variety of popular benchmarking and microbenchmarking
suites to show a benefit.

OK?

Thanks,
James

---
2015-09-25  James Greenhalgh  

* config/arm/aarch-common-protos.h
(aarch_accumulator_forwarding): New.
(aarch_forward_to_shift_is_not_shifted_reg): Likewise.
* config/arm/aarch-common.c (aarch_accumulator_forwarding): New.
(aarch_forward_to_shift_is_not_shifted_reg): Liekwise.
* config/arm/cortex-a53.md: Rewrite.



OK aarch64 with Kyrill's comments fixed.
/M



Re: Do not use TYPE_CANONICAL in useless_type_conversion

2015-10-01 Thread Richard Biener
On Wed, 30 Sep 2015, Jan Hubicka wrote:

> Hi,
> this implements the idea we discussed at Cauldron to not use TYPE_CANONICAL 
> for
> useless_type_conversion_p.  The basic idea is that TYPE_CANONICAL is language
> specific and should not be part of definition of the Gimple type system that 
> should
> be quite agnostic of language.
> 
> useless_type_conversion_p clearly is about operations on the type, and
> those do not depend on TYPE_CANONICAL or alias info.  For LTO and C/Fortran
> interoperability rules we are forced to make TYPE_CANONICAL more coarse
> than it needs to be, which results in troubles with
> useless_type_conversion_p use.
> 
> After dropping the check I needed to solve two issues.  First is that we
> need a definition of useless conversions for aggregates.  As discussed
> earlier I made it depend only on size.  The basic idea is that the only
> operations you can do on gimple with those are moves and field accesses.
> Field accesses have corresponding type info in COMPONENT_REF or MEM_REF,
> so we do not care about conversions of those.  This caused three Ada
> failures on PPC64, because we cannot move between structures of the same
> size but different mode.
> 
> The other failure introduced was 2 cases in the C++ testsuite because we
> currently do not handle OFFSET_TYPE at all.  I added the obvious check for
> TREE_TYPE and TYPE_OFFSET_BASETYPE to be compatible.
> I think we can allow more conversions between OFFSET_TYPEs and integer
> types, but I would like to leave this for incremental changes.  (It is
> probably not too important anyway.)
> 
> Bootstrapped/regtested x86_64-linux except Ada and ppc64-linux for all 
> languages
> including Ada. OK?

Comments below

> I have reviewed the uses of useless_type_conversion_p on non-register types
> and there are some oddities, will send separate patches for those.
> 
> Honza
> 
>   * gimple-expr.c (useless_type_conversion_p): Do not use TYPE_CANONICAL
>   for defining useless conversions; make structure compatible if size
>   and mode are.
> Index: gimple-expr.c
> ===
> --- gimple-expr.c (revision 228267)
> +++ gimple-expr.c (working copy)
> @@ -87,11 +87,6 @@ useless_type_conversion_p (tree outer_ty
>if (inner_type == outer_type)
>  return true;
>  
> -  /* If we know the canonical types, compare them.  */
> -  if (TYPE_CANONICAL (inner_type)
> -  && TYPE_CANONICAL (inner_type) == TYPE_CANONICAL (outer_type))
> -return true;
> -
>/* Changes in machine mode are never useless conversions unless we
>   deal with aggregate types in which case we defer to later checks.  */
>if (TYPE_MODE (inner_type) != TYPE_MODE (outer_type)
> @@ -270,12 +265,23 @@ useless_type_conversion_p (tree outer_ty
>return true;
>  }
>  
> -  /* For aggregates we rely on TYPE_CANONICAL exclusively and require
> - explicit conversions for types involving to be structurally
> - compared types.  */
> +  /* For aggregates compare only the size and mode.  Accesses to fields do 
> have
> + a type information by themselves and thus we only care if we can i.e.
> + use the types in move operations.  */
>else if (AGGREGATE_TYPE_P (inner_type)
>  && TREE_CODE (inner_type) == TREE_CODE (outer_type))
> -return false;
> +return (!TYPE_SIZE (outer_type)
> + || (TYPE_SIZE (inner_type)
> + && operand_equal_p (TYPE_SIZE (inner_type),
> + TYPE_SIZE (outer_type), 0)))
> +&& TYPE_MODE (outer_type) == TYPE_MODE (inner_type);

If we require the TYPE_MODE check then just remove the !AGGREGATE_TYPE_P
check on the mode equality check at the beginning of the function.
There must be a reason why I allowed modes to differ there btw ;)

The size compare might be too conservative for

  X = WITH_SIZE_EXPR ;

where we take the size to copy from the WITH_SIZE_EXPR.  Of course
you can't tell this from the types.  Well.  Do we actually need
the !TYPE_SIZE (aka !COMPLETE_TYPE_P) case?  Or does this happen
exactly when WITH_SIZE_EXPR is used?

vertical space missing

> +  else if (TREE_CODE (inner_type) == OFFSET_TYPE
> +&& TREE_CODE (inner_type) == TREE_CODE (outer_type))
> +return useless_type_conversion_p (TREE_TYPE (outer_type),
> +   TREE_TYPE (inner_type))
> +&& useless_type_conversion_p
> + (TYPE_OFFSET_BASETYPE (outer_type),
> +  TYPE_OFFSET_BASETYPE (inner_type));

I believe OFFSET_TYPEs in GIMPLE are a red herring - the only places
I saw them are on INTEGER_CSTs.  Nothing in the middle-end looks at their 
type or TYPE_OFFSET_BASETYPE.  IMHO the C++ frontend should
GIMPLIFY those away.  I don't remember trying that btw. so I believe
it might be quite easy (and OFFSET_TYPE should become a C++ frontend
tree code).

>  
>return false;
>  }
> 
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Grah

Re: Re: Ping: New suggested patch for pr 62242 & pr 52332

2015-10-01 Thread Paul Richard Thomas
Dear Louis,

I have just a minor nit to pick; otherwise your patch is OK for trunk.

I do not think that quoting the code in the first comment is
necessary. If anybody is interested, they can waltz off to
expand_expr_real_1 themselves. The textual part of your comment is
perfectly clear.

Many thanks for the patch.

Paul

PS Do you now have all the paperwork to commit the patch yourself?

On 29 September 2015 at 09:36, Louis Krupp  wrote:
> Paul,
>
> The patch is attached.  I last compiled and tested the code at 21:15 UTC on 
> 28 September.
>
> Would you like me to resend the test cases?
>
> Louis
>
>  On Mon, 28 Sep 2015 00:55:10 -0700 Paul Richard Thomas  wrote 
>>Hi Louis,
>>
>>Could you please send the patch as an attachment - in your message,
>>much of the lhs whitespace information has been lost. Fundamentally,
>>the patch looks OK. Since they pertain to the same PRs, I would
>>consider combining the first and third testcases, either by throwing
>>the procedures into one module of by renaming the modules; eg. gfbug1
>>and gfbug2.
>>
>>Cheers
>>
>>Paul
>>
>>On 25 September 2015 at 18:49, Louis Krupp  wrote:
>>> Index: gcc/fortran/ChangeLog
>>> ===
>>> --- gcc/fortran/ChangeLog (revision 227895)
>>> +++ gcc/fortran/ChangeLog (working copy)
>>> @@ -1,3 +1,15 @@
>>> +2015-09-18 Louis Krupp 
>>> +
>>> + PR fortran/62242
>>> + PR fortran/52332
>>> + * trans-array.c
>>> + (store_backend_decl): Create new gfc_charlen instance if requested
>>> + (get_array_ctor_all_strlen): Call store_backend_decl requesting
>>> + new gfc_charlen
>>> + (trans_array_constructor): Call store_backend_decl requesting
>>> + new gfc_charlen if get_array_ctor_strlen was called
>>> + (gfc_add_loop_ss_code): Don't try to convert non-constant length
>>> +
>>> 2015-09-17 Paul Thomas 
>>>
>>> PR fortran/52846
>>> Index: gcc/testsuite/ChangeLog
>>> ===
>>> --- gcc/testsuite/ChangeLog (revision 227895)
>>> +++ gcc/testsuite/ChangeLog (working copy)
>>> @@ -1,3 +1,10 @@
>>> +2015-09-18 Louis Krupp 
>>> +
>>> + PR fortran/62242 fortran/52332
>>> + * gfortran.dg/string_array_constructor_1.f90: New.
>>> + * gfortran.dg/string_array_constructor_2.f90: New.
>>> + * gfortran.dg/string_array_constructor_3.f90: New.
>>> +
>>> 2015-09-17 Bernd Edlinger 
>>>
>>> PR sanitizer/64078
>>> Index: gcc/fortran/trans-array.c
>>> ===
>>> --- gcc/fortran/trans-array.c (revision 227895)
>>> +++ gcc/fortran/trans-array.c (working copy)
>>> @@ -1799,6 +1799,39 @@ gfc_trans_array_constructor_value (stmtblock_t * p
>>> }
>>>
>>>
>>> +/* The array constructor code can create a string length with an operand
>>> + in the form of a temporary variable. This variable will retain its
>>> + context (current_function_decl). If we store this length tree in a
>>> + gfc_charlen structure which is shared by a variable in another
>>> + context, the resulting gfc_charlen structure with a variable in a
>>> + different context, we could trip the assertion in expand_expr_real_1
>>> + when it sees that a variable has been created in one context and
>>> + referenced in another:
>>> +
>>> + if (exp)
>>> + context = decl_function_context (exp);
>>> + gcc_assert (!exp
>>> + || SCOPE_FILE_SCOPE_P (context)
>>> + || context == current_function_decl
>>> + || TREE_STATIC (exp)
>>> + || DECL_EXTERNAL (exp)
>>> + // ??? C++ creates functions that are not TREE_STATIC.
>>> + || TREE_CODE (exp) == FUNCTION_DECL);
>>> +
>>> + If this might be the case, we create a new gfc_charlen structure and
>>> + link it into the current namespace. */
>>> +
>>> +static void
>>> +store_backend_decl (gfc_charlen **clp, tree len, bool force_new_cl)
>>> +{
>>> + if (force_new_cl)
>>> + {
>>> + gfc_charlen *new_cl = gfc_new_charlen (gfc_current_ns, *clp);
>>> + *clp = new_cl;
>>> + }
>>> + (*clp)->backend_decl = len;
>>> +}
>>> +
>>> /* A catch-all to obtain the string length for anything that is not
>>> a substring of non-constant length, a constant, array or variable. */
>>>
>>> @@ -1836,7 +1869,7 @@ get_array_ctor_all_strlen (stmtblock_t *block, gfc
>>> gfc_add_block_to_block (block, &se.pre);
>>> gfc_add_block_to_block (block, &se.post);
>>>
>>> - e->ts.u.cl->backend_decl = *len;
>>> + store_backend_decl (&e->ts.u.cl, *len, true);
>>> }
>>> }
>>>
>>> @@ -2226,6 +2259,7 @@ trans_array_constructor (gfc_ss * ss, locus * wher
>>> if (expr->ts.type == BT_CHARACTER)
>>> {
>>> bool const_string;
>>> + bool force_new_cl = false;
>>>
>>> /* get_array_ctor_strlen walks the elements of the constructor, if a
>>> typespec was given, we already know the string length and want the one
>>> @@ -2244,14 +2278,17 @@ trans_array_constructor (gfc_ss * ss, locus * wher
>>> gfc_add_block_to_block (&outer_loop->post, &length_se.post);
>>> }
>>> else
>>> - const_string = get_array_ctor_strlen (&outer_loop->pre, c,
>>> - &ss_info->string_len

Re: [GCC, ARM] armv8 linux toolchain asan testcase fail due to stl missing conditional code

2015-10-01 Thread Kyrill Tkachov


On 30/09/15 17:39, Kyrill Tkachov wrote:

On 09/06/15 09:17, Kyrill Tkachov wrote:

On 05/06/15 14:14, Kyrill Tkachov wrote:

On 05/06/15 14:11, Richard Earnshaw wrote:

On 05/06/15 14:08, Kyrill Tkachov wrote:

Hi Shiva,

On 05/06/15 10:42, Shiva Chen wrote:

Hi, Kyrill

I add the testcase as stl-cond.c.

Could you help to check the testcase ?

If it's OK, Could you help me to apply the patch ?


This looks ok to me.
One nit on the testcase:

diff --git a/gcc/testsuite/gcc.target/arm/stl-cond.c
b/gcc/testsuite/gcc.target/arm/stl-cond.c
new file mode 100755
index 000..44c6249
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/stl-cond.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_arch_v8a_ok } */
+/* { dg-options "-O2" } */

This should also have -marm as the problem exhibited itself in arm state.
I'll commit this patch with this change in 24 hours on your behalf if no
one
objects.


Explicit use of -marm will break multi-lib testing.  I've forgotten the
correct hook, but there's most-likely something that will give you the
right behaviour, even if it means that thumb-only multi-lib testing
skips this test.

So I think what we want is:

dg-require-effective-target arm_arm_ok

The comment in target-supports.exp is:
# Return 1 if this is an ARM target where -marm causes ARM to be
# used (not Thumb)


I've committed the attached patch to trunk on Shiva's behalf with r224269.
It gates the test on arm_arm_ok and adds -marm, like other similar tests.
The ChangeLog I used is below:

I'd like to backport this to GCC 5 and 4.9
The patch applies and tests cleanly on GCC 5.
On 4.9 it needs some minor changes, which I'm attaching here.
I've bootstrapped and tested this patch on 4.9 and the Shiva's
original patch on GCC 5.

2015-09-30  Kyrylo Tkachov  

  Backport from mainline
  2015-06-09  Shiva Chen  

  * sync.md (atomic_load): Add conditional code for lda/ldr
  (atomic_store): Likewise.

2015-09-30  Kyrylo Tkachov  

  Backport from mainline
  2015-06-09  Shiva Chen  

  * gcc.target/arm/stl-cond.c: New test.


I'll commit them tomorrow.


I've now backported the patch to GCC 5 with r228322
and 4.9 with r228323.

Kyrill


Thanks,
Kyrill




2015-06-09  Shiva Chen  

   * sync.md (atomic_load): Add conditional code for lda/ldr
   (atomic_store): Likewise.

2015-06-09  Shiva Chen  

   * gcc.target/arm/stl-cond.c: New test.


Thanks,
Kyrill


Kyrill



R.


Ramana, Richard, we need to backport it to GCC 5 as well, right?

Thanks,
Kyrill



Thanks,

Shiva

2015-06-05 16:34 GMT+08:00 Kyrill Tkachov :

Hi Shiva,

On 05/06/15 09:29, Shiva Chen wrote:

Hi, Kyrill

I update the patch as Richard's suggestion.

-  return \"str\t%1, %0\";
+  return \"str%(%)\t%1, %0\";
  else
-  return \"stl\t%1, %0\";
+  return \"stl%?\t%1, %0\";
}
-)
+  [(set_attr "predicable" "yes")
+   (set_attr "predicable_short_it" "no")])
+  [(set_attr "predicable" "yes")
+   (set_attr "predicable_short_it" "no")])


Let me sum up.

We add the predicable attribute to allow GCC to do if-conversion in
ce1/ce2/ce3, not only in the final phase via the final_prescan_insn
finite state machine.

We set predicable_short_it to "no" to restrict conditional code
generation on armv8 in thumb mode.

However, we could use the flag -mno-restrict-it to force generating
conditional code in thumb mode.

Therefore, we have to consider the assembly output format for strb
with condition code on arm/thumb mode.

Because ARM and Thumb modes use different syntax for strb,
we output the assembly as str%(%)
which will put the condition code in the right place according to
TARGET_UNIFIED_ASM.

Is there still missing something ?

That's all correct, and well summarised :)
The patch looks good to me, but please include the testcase
(test.c from earlier) appropriately marked up for the testsuite.
I think to the level of dg-assemble, just so we know everything is
wired up properly.

Thanks for dealing with this.
Kyrill



Thanks,

Shiva

2015-06-04 18:00 GMT+08:00 Kyrill Tkachov
:

Hi Shiva,

On 04/06/15 10:57, Shiva Chen wrote:

Hi, Kyrill

Thanks for the tips of syntax.

It seems that correct syntax for

ldrb with condition code is ldreqb

ldab with condition code is ldabeq


So I modified the pattern as follow

{
  enum memmodel model = (enum memmodel) INTVAL (operands[2]);
  if (model == MEMMODEL_RELAXED
  || model == MEMMODEL_CONSUME
  || model == MEMMODEL_RELEASE)
return \"ldr%?\\t%0, %1\";
  else
return \"lda%?\\t%0, %1\";
}
[(set_attr "predicable" "yes")
 (set_attr "predicable_short_it" "no")])

It seems we don't have to worry about thumb mode,

I suggest you use Richard's suggestion from:
https://gcc.gnu.org/ml/gcc-patches/2015-06/msg00384.html
to write this in a clean way.


Because we already set "predicable" "yes" and "predicable_short_it"
"no" for the pattern.

That's not qui

Re: [PATCH] Add new hooks ASM_OUTPUT_START_FUNCTION_HEADER ...

2015-10-01 Thread Bernd Schmidt

On 10/01/2015 08:30 AM, Dominik Vogt wrote:


However, you probably should add a sentence or two to
the documentation to specify ordering wrt other parts of the header
of a function.


Any suggestions where that information should be placed in the
documentation?


Just in the hook definition.


Bernd



Re: Re: Re: Ping: New suggested patch for pr 62242 & pr 52332

2015-10-01 Thread Louis Krupp
Paul,

I'll delete the code in the comment.

I'm doing one more update and build just to make sure everything still works as 
intended.

I'm not sure what paperwork I need.  Shall I try to do "svn commit" and see 
what happens?

Louis

 On Thu, 01 Oct 2015 02:10:34 -0700 Paul Richard Thomas  wrote  
>Dear Louis, 
> 
>I have just a minor nit to pick; otherwise your patch is OK for trunk. 
> 
>I do not think that quoting the code in the first comment is 
>necessary. If anybody is interested, they can waltz off to 
>expand_expr_real_1 themselves. The textual part of your comment is 
>perfectly clear. 
> 
>Many thanks for the patch. 
> 
>Paul 
> 
>PS Do you now have all the paperwork to commit the patch yourself? 
> 
>On 29 September 2015 at 09:36, Louis Krupp  wrote: 
>> Paul, 
>> 
>> The patch is attached. I last compiled and tested the code at 21:15 UTC on 
>> 28 September. 
>> 
>> Would you like me to resend the test cases? 
>> 
>> Louis 
>> 
>>  On Mon, 28 Sep 2015 00:55:10 -0700 Paul Richard Thomas wrote  
>>>Hi Louis, 
>>> 
>>>Could you please send the patch as an attachment - in your message, 
>>>much of the lhs whitespace information has been lost. Fundamentally, 
>>>the patch looks OK. Since they pertain to the same PRs, I would 
>>>consider combining the first and third testcases, either by throwing 
>>>the procedures into one module or by renaming the modules; e.g. gfbug1 
>>>and gfbug2. 
>>> 
>>>Cheers 
>>> 
>>>Paul 
>>> 
>>>On 25 September 2015 at 18:49, Louis Krupp  wrote: 
 Index: gcc/fortran/ChangeLog 
 === 
 --- gcc/fortran/ChangeLog (revision 227895) 
 +++ gcc/fortran/ChangeLog (working copy) 
 @@ -1,3 +1,15 @@ 
 +2015-09-18 Louis Krupp  
 + 
 + PR fortran/62242 
 + PR fortran/52332 
 + * trans-array.c 
 + (store_backend_decl): Create new gfc_charlen instance if requested 
 + (get_array_ctor_all_strlen): Call store_backend_decl requesting 
 + new gfc_charlen 
 + (trans_array_constructor): Call store_backend_decl requesting 
 + new gfc_charlen if get_array_ctor_strlen was called 
 + (gfc_add_loop_ss_code): Don't try to convert non-constant length 
 + 
 2015-09-17 Paul Thomas  
 
 PR fortran/52846 
 Index: gcc/testsuite/ChangeLog 
 === 
 --- gcc/testsuite/ChangeLog (revision 227895) 
 +++ gcc/testsuite/ChangeLog (working copy) 
 @@ -1,3 +1,10 @@ 
 +2015-09-18 Louis Krupp  
 + 
 + PR fortran/62242 fortran/52332 
 + * gfortran.dg/string_array_constructor_1.f90: New. 
 + * gfortran.dg/string_array_constructor_2.f90: New. 
 + * gfortran.dg/string_array_constructor_3.f90: New. 
 + 
 2015-09-17 Bernd Edlinger  
 
 PR sanitizer/64078 
 Index: gcc/fortran/trans-array.c 
 === 
 --- gcc/fortran/trans-array.c (revision 227895) 
 +++ gcc/fortran/trans-array.c (working copy) 
 @@ -1799,6 +1799,39 @@ gfc_trans_array_constructor_value (stmtblock_t * p 
 } 
 
 
 +/* The array constructor code can create a string length with an operand 
 + in the form of a temporary variable. This variable will retain its 
 + context (current_function_decl). If we store this length tree in a 
 + gfc_charlen structure which is shared by a variable in another 
 + context, the resulting gfc_charlen structure with a variable in a 
 + different context, we could trip the assertion in expand_expr_real_1 
 + when it sees that a variable has been created in one context and 
 + referenced in another: 
 + 
 + if (exp) 
 + context = decl_function_context (exp); 
 + gcc_assert (!exp 
 + || SCOPE_FILE_SCOPE_P (context) 
 + || context == current_function_decl 
 + || TREE_STATIC (exp) 
 + || DECL_EXTERNAL (exp) 
 + // ??? C++ creates functions that are not TREE_STATIC. 
 + || TREE_CODE (exp) == FUNCTION_DECL); 
 + 
 + If this might be the case, we create a new gfc_charlen structure and 
 + link it into the current namespace. */ 
 + 
 +static void 
 +store_backend_decl (gfc_charlen **clp, tree len, bool force_new_cl) 
 +{ 
 + if (force_new_cl) 
 + { 
 + gfc_charlen *new_cl = gfc_new_charlen (gfc_current_ns, *clp); 
 + *clp = new_cl; 
 + } 
 + (*clp)->backend_decl = len; 
 +} 
 + 
 /* A catch-all to obtain the string length for anything that is not 
 a substring of non-constant length, a constant, array or variable. */ 
 
 @@ -1836,7 +1869,7 @@ get_array_ctor_all_strlen (stmtblock_t *block, gfc 
 gfc_add_block_to_block (block, &se.pre); 
 gfc_add_block_to_block (block, &se.post); 
 
 - e->ts.u.cl->backend_decl = *len; 
 + store_backend_decl (&e->ts.u.cl, *len, true); 
 } 
 } 
 
 @@ -2226,6 +2

Re: [Patch ifcvt costs 0/3] Introduce a new target hook for ifcvt costs.

2015-10-01 Thread Bernd Schmidt

On 09/29/2015 04:31 PM, James Greenhalgh wrote:

On the other side of the equation, we want a cost for the converted
sequence. We can build a cost of the generated rtl sequence, but for
targets like AArch64 this is going to be wildly off. AArch64 will expand
(a > b) ? x : y; as a set to the CC register, followed by a conditional
move based on the CC register. Consequently, where we have multiple sets
back to back we end up with:

   set CC (a > b)
   set x1 (CC ? x : y)
   set CC (a > b)
   set x2 (CC ? x : z)
   set CC (a > b)
   set x3 (CC ? x : k)

Which we know will be simplified later to:

   set CC (a > b)
   set x1 (CC ? x : y)
   set x2 (CC ? x : z)
   set x3 (CC ? x : k)


I guess the transformation you want to make is a bit unusual in that it 
generates such extra instructions. rtx_cost has problems taking such 
secondary considerations into account.


I haven't quite made up my mind about the new target hook, but I wonder 
if it might be a good idea to try and simplify the above sequence on the 
spot before calculating costs for it. (Incidentally, which pass removes 
the extra CC sets?)



Bernd



Re: [Patch 2/2 ARM/AArch64] Add a new Cortex-A53 scheduling model

2015-10-01 Thread James Greenhalgh
On Thu, Oct 01, 2015 at 09:33:07AM +0100, Marcus Shawcroft wrote:
> On 25/09/15 08:59, James Greenhalgh wrote:
> >
> > Hi,
> >
> > This patch introduces a new scheduling model for Cortex-A53.
> >
> > Bootstrapped and tested on arm-none-linux-gnueabi and aarch64-none-linux-gnu
> > and checked with a variety of popular benchmarking and microbenchmarking
> > suites to show a benefit.
> >
> > OK?
> >
> > Thanks,
> > James
> >
> > ---
> > 2015-09-25  James Greenhalgh  
> >
> > * config/arm/aarch-common-protos.h
> > (aarch_accumulator_forwarding): New.
> > (aarch_forward_to_shift_is_not_shifted_reg): Likewise.
> > * config/arm/aarch-common.c (aarch_accumulator_forwarding): New.
> > (aarch_forward_to_shift_is_not_shifted_reg): Liekwise.
> > * config/arm/cortex-a53.md: Rewrite.
> >
> 
> OK aarch64 with Kyrill's comments fixed.
> /M

Thanks,

I had to rebase this over Evandro's recent patch adding neon_ldp/neon_ldp_q
types to the old scheduling model. The rebase was obvious to resolve, and
while I was there I also added the neon_stp/neon_stp_q types which were
missing.

I've attached what I ultimately committed as revision 228324. I messed up
fixing the ChangeLog typo before commit, so that is revision 228325.

Thanks,
James

Index: gcc/ChangeLog
===
--- gcc/ChangeLog	(revision 228323)
+++ gcc/ChangeLog	(working copy)
@@ -1,3 +1,12 @@
+2015-10-01  James Greenhalgh  
+
+	* config/arm/aarch-common-protos.h
+	(aarch_accumulator_forwarding): New.
+	(aarch_forward_to_shift_is_not_shifted_reg): Likewise.
+	* config/arm/aarch-common.c (aarch_accumulator_forwarding): New.
+	(aarch_forward_to_shift_is_not_shifted_reg): Liekwise.
+	* config/arm/cortex-a53.md: Rewrite.
+
 2015-10-01  Richard Biener  
 
 	* gimple-match.h (mprts_hook): Declare.
Index: gcc/config/arm/aarch-common-protos.h
===
--- gcc/config/arm/aarch-common-protos.h	(revision 228323)
+++ gcc/config/arm/aarch-common-protos.h	(working copy)
@@ -23,7 +23,9 @@
 #ifndef GCC_AARCH_COMMON_PROTOS_H
 #define GCC_AARCH_COMMON_PROTOS_H
 
+extern int aarch_accumulator_forwarding (rtx_insn *, rtx_insn *);
 extern int aarch_crypto_can_dual_issue (rtx_insn *, rtx_insn *);
+extern int aarch_forward_to_shift_is_not_shifted_reg (rtx_insn *, rtx_insn *);
 extern bool aarch_rev16_p (rtx);
 extern bool aarch_rev16_shleft_mask_imm_p (rtx, machine_mode);
 extern bool aarch_rev16_shright_mask_imm_p (rtx, machine_mode);
Index: gcc/config/arm/aarch-common.c
===
--- gcc/config/arm/aarch-common.c	(revision 228323)
+++ gcc/config/arm/aarch-common.c	(working copy)
@@ -394,6 +394,112 @@
   && !reg_overlap_mentioned_p (result, op1));
 }
 
+/* Return non-zero if the destination of PRODUCER feeds the accumulator
+   operand of an MLA-like operation.  */
+
+int
+aarch_accumulator_forwarding (rtx_insn *producer, rtx_insn *consumer)
+{
+  rtx producer_set = single_set (producer);
+  rtx consumer_set = single_set (consumer);
+
+  /* We are looking for a SET feeding a SET.  */
+  if (!producer_set || !consumer_set)
+return 0;
+
+  rtx dest = SET_DEST (producer_set);
+  rtx mla = SET_SRC (consumer_set);
+
+  /* We're looking for a register SET.  */
+  if (!REG_P (dest))
+return 0;
+
+  rtx accumulator;
+
+  /* Strip a zero_extend.  */
+  if (GET_CODE (mla) == ZERO_EXTEND)
+mla = XEXP (mla, 0);
+
+  switch (GET_CODE (mla))
+{
+case PLUS:
+  /* Possibly an MADD.  */
+  if (GET_CODE (XEXP (mla, 0)) == MULT)
+	accumulator = XEXP (mla, 1);
+  else
+	return 0;
+  break;
+case MINUS:
+  /* Possibly an MSUB.  */
+  if (GET_CODE (XEXP (mla, 1)) == MULT)
+	accumulator = XEXP (mla, 0);
+  else
+	return 0;
+  break;
+case FMA:
+	{
+	  /* Possibly an FMADD/FMSUB/FNMADD/FNMSUB.  */
+	  if (REG_P (XEXP (mla, 1))
+	  && REG_P (XEXP (mla, 2))
+	  && (REG_P (XEXP (mla, 0))
+		  || GET_CODE (XEXP (mla, 0)) == NEG))
+
+	{
+	  /* FMADD/FMSUB.  */
+	  accumulator = XEXP (mla, 2);
+	}
+	  else if (REG_P (XEXP (mla, 1))
+		   && GET_CODE (XEXP (mla, 2)) == NEG
+		   && (REG_P (XEXP (mla, 0))
+		   || GET_CODE (XEXP (mla, 0)) == NEG))
+	{
+	  /* FNMADD/FNMSUB.  */
+	  accumulator = XEXP (XEXP (mla, 2), 0);
+	}
+	  else
+	return 0;
+	  break;
+	}
+  default:
+	/* Not an MLA-like operation.  */
+	return 0;
+}
+
+  return (REGNO (dest) == REGNO (accumulator));
+}
+
+/* Return nonzero if the CONSUMER instruction is some sort of
+   arithmetic or logic + shift operation, and the register we are
+   writing in PRODUCER is not used in a register shift by register
+   operation.  */
+
+int
+aarch_forward_to_shift_is_not_shifted_reg (rtx_insn *producer,
+	   rtx_insn *consumer)
+{
+  rtx value, op;
+  rtx early_op;
+
+  if (!arm_get_set_operands (producer, consumer, &value, &op))
+return 0;
+

Re: [PATCH] Fix warnings building pdp11 port

2015-10-01 Thread Richard Biener
On Wed, Sep 30, 2015 at 6:43 PM, Jeff Law  wrote:
> On 09/30/2015 01:48 AM, Richard Biener wrote:
>>
>> On Tue, Sep 29, 2015 at 6:55 PM, Jeff Law  wrote:
>>>
>>> The pdp11 port fails to build with the trunk because of a warning.
>>> Essentially VRP determines that the result of using BRANCH_COST is a
>>> constant with the range [0..1].  That's always less than 4, 3 and the
>>> various other magic constants used with BRANCH_COST and VRP issues a
>>> warning
>>> about that comparison.
>>
>>
>> It does?  Huh.  Is it about undefined overflow which is the only thing
>> VRP should end up
>> warning about?  If so I wonder how that happens, at least I can't
>> reproduce it for
>> --target=pdp11 --enable-werror build of cc1.
>
> You have to use a trunk compiler to build the pdp11 cross.  You'll bump into
> this repeatedly:
>
>   if (warn_type_limits
>   && ret && only_ranges
>   && TREE_CODE_CLASS (code) == tcc_comparison
>   && TREE_CODE (op0) == SSA_NAME)
> {
>   /* If the comparison is being folded and the operand on the LHS
>  is being compared against a constant value that is outside of
>  the natural range of OP0's type, then the predicate will
>  always fold regardless of the value of OP0.  If -Wtype-limits
>  was specified, emit a warning.  */
>   tree type = TREE_TYPE (op0);
>   value_range_t *vr0 = get_value_range (op0);
>
>   if (vr0->type == VR_RANGE
>   && INTEGRAL_TYPE_P (type)
>   && vrp_val_is_min (vr0->min)
>   && vrp_val_is_max (vr0->max)
>   && is_gimple_min_invariant (op1))
> {
>   location_t location;
>
>   if (!gimple_has_location (stmt))
> location = input_location;
>   else
> location = gimple_location (stmt);
>
>   warning_at (location, OPT_Wtype_limits,
>   integer_zerop (ret)
>   ? G_("comparison always false "
>"due to limited range of data type")
>   : G_("comparison always true "
>"due to limited range of data type"));
> }
> }

Oh, I didn't remember we have this kind of warning in VRP ... it's
bound to trigger for example after jump-threading.  So I'm not sure
it's useful.
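The warning itself is easy to reproduce in isolation; a sketch of the shape VRP flags (unrelated to the pdp11 sources, value chosen for illustration):

```c
#include <assert.h>

/* With -Wtype-limits, GCC warns "comparison always false due to limited
   range of data type": c promotes to int but its range stays [0, 255],
   so the comparison folds to 0 regardless of the value of c.  */
static int
compare_out_of_range (unsigned char c)
{
  return c > 300;   /* always false */
}
```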

Richard.

>   return ret;
> }
>
>
> Jeff


Re: [PATCH] Update SSA_NAME manager to use two lists

2015-10-01 Thread Richard Biener
On Wed, Sep 30, 2015 at 7:44 PM, Jeff Law  wrote:
>
> The SSA_NAME manager currently has a single free list.  As names are
> released, they're put on the free list and recycled immediately.
>
> This has led to several problems through the years -- in particular removal
> of an edge may result in removal of a PHI when the target of the edge is
> unreachable.  This can result in released names being left in the IL until
> *all* unreachable code is eliminated.  Long term we'd like to discover all
> the unreachable code exposed by a deleted edge earlier, but that's further
> out.
>
> Richi originally suggested using a two list implementation to avoid this
> class of problems.  Essentially released names are queued until it's safe to
> start recycling them.  I agreed, but didn't get around to doing any of the
> implementation work.
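Generically, the two-list scheme looks like this — an illustration of the idea only, not GCC's tree-ssanames.c code; all names here are made up:

```c
#include <assert.h>
#include <stddef.h>

/* Two-list free list: released items go on a pending queue and are only
   made available for recycling by an explicit flush, at a point where no
   stale references to them can remain.  */
#define POOL_MAX 16

struct two_list_pool
{
  int free_items[POOL_MAX];    /* safe to hand out again */
  int pending_items[POOL_MAX]; /* released, but not yet recyclable */
  size_t n_free, n_pending;
};

static void
release_item (struct two_list_pool *p, int item)
{
  p->pending_items[p->n_pending++] = item;  /* queue, don't recycle yet */
}

static void
flush_pending (struct two_list_pool *p)
{
  /* Called only when no dangling uses can exist, e.g. after
     unreachable code has been removed.  */
  while (p->n_pending > 0)
    p->free_items[p->n_free++] = p->pending_items[--p->n_pending];
}

static int
allocate_item (struct two_list_pool *p, int fresh)
{
  return p->n_free ? p->free_items[--p->n_free] : fresh;
}
```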
>
> Bernd recently took care of the implementation side.  This patch is mostly
> his.  The only change of significance I made is the placement of the call to
> flush the pending list.  Bernd had it in the ssa updater, I put it after cfg
> cleanups.  The former does recycle better, but there's nothing that
> inherently ensures there aren't unreachables in the CFG during update_ssa
> (in practice it's not a problem because we typically update dominators
> first, which requires a cleaned cfg).
>
> I've got a follow-up which exploits this improved safety in DOM to optimize
> things better in DOM rather than waiting for jump threading to clean things
> up.
>
> No additional tests in this patch as the only failure seen when I twiddled
> things a little was already covered by existing tests.
>
> Bootstrapped and regression tested on x86-linux-gnu, with and without the
> follow-up patch to exploit the capability in DOM.
>
> Installed on the trunk.
>
> jeff
>
>
>
>
> * gimple-ssa.h (gimple_df): Add free_ssanames_queue field.
> * passes.c: Include tree-ssanames.h.
> (execute_function_todo): Flush the pending free SSA_NAMEs after
> eliminating unreachable basic blocks.
> * tree-ssanames.c (FREE_SSANAMES_QUEUE): new.
> (init_ssanames): Initialize FREE_SSANAMES_QUEUE.
> (fini_ssanames): Finalize FREE_SSANAMES_QUEUE.
> (flush_ssanames_freelist): New function.
> (release_ssaname_fn): Put released names on the queue.
> (pass_release_ssa_names::execute): Call flush_ssanames_freelist.
> * tree-ssanames.h (flush_ssanames_freelist): Declare.
>
>
>
> diff --git a/gcc/gimple-ssa.h b/gcc/gimple-ssa.h
> index c89071e..39551da 100644
> --- a/gcc/gimple-ssa.h
> +++ b/gcc/gimple-ssa.h
> @@ -90,6 +90,9 @@ struct GTY(()) gimple_df {
>/* Free list of SSA_NAMEs.  */
>vec *free_ssanames;
>
> +  /* Queue of SSA_NAMEs to be freed at the next opportunity.  */
> +  vec *free_ssanames_queue;
> +
>/* Hashtable holding definition for symbol.  If this field is not NULL,
> it
>   means that the first reference to this variable in the function is a
>   USE or a VUSE.  In those cases, the SSA renamer creates an SSA name
> diff --git a/gcc/passes.c b/gcc/passes.c
> index d06a293..5b41102 100644
> --- a/gcc/passes.c
> +++ b/gcc/passes.c
> @@ -84,6 +84,7 @@ along with GCC; see the file COPYING3.  If not see
>  #include "cfgrtl.h"
>  #include "tree-ssa-live.h"  /* For remove_unused_locals.  */
>  #include "tree-cfgcleanup.h"
> +#include "tree-ssanames.h"
>
>  using namespace gcc;
>
> @@ -1913,6 +1914,14 @@ execute_function_todo (function *fn, void *data)
>  {
>cleanup_tree_cfg ();
>
> +  /* Once unreachable nodes have been removed from the CFG,
> +there can't be any lingering references to released
> +SSA_NAMES (because there is no more unreachable code).
> +
> +Thus, now is the time to flush the SSA_NAMEs freelist.  */
> +  if (fn->gimple_df)
> +   flush_ssaname_freelist ();
> +

Apart from what Jakub said - this keeps the list non-recycled for example
after DCE if that doesn't call cleanup_cfg.  Likewise after passes that call
cleanup_cfg manually.  It also doesn't get called after IPA transform
passes (which would require calling on each function).

To at least catch those passes returning 0 (do nothing) I'd place the
call into execute_todo instead, unconditionally on flags.

>/* When cleanup_tree_cfg merges consecutive blocks, it may
>  perform some simplistic propagation when removing single
>  valued PHI nodes.  This propagation may, in turn, cause the
> diff --git a/gcc/tree-ssanames.c b/gcc/tree-ssanames.c
> index 4199290..e029062 100644
> --- a/gcc/tree-ssanames.c
> +++ b/gcc/tree-ssanames.c
> @@ -69,6 +69,7 @@ unsigned int ssa_name_nodes_reused;
>  unsigned int ssa_name_nodes_created;
>
>  #define FREE_SSANAMES(fun) (fun)->gimple_df->free_ssanames
> +#define FREE_SSANAMES_QUEUE(fun) (fun)->gimple_df->free_ssanames_queue
>
>
>  /* Initialize management of SSA_NAMEs to default SIZE.  If SIZE is
> @@ -91,6 +92,7 @@ init_ssanames (struct function *

Re: Fold acc_on_device

2015-10-01 Thread Richard Biener
On Wed, Sep 30, 2015 at 9:22 PM, Jakub Jelinek  wrote:
> On Wed, Sep 30, 2015 at 03:01:22PM -0400, Nathan Sidwell wrote:
>> On 09/30/15 08:46, Richard Biener wrote:
>>
>> >>>Please don't add any new GENERIC based builtin folders.  Instead add to
>> >>>gimple-fold.c:gimple_fold_builtin
>>
>> Is this patch ok?
>>
>> nathan
>
>> 2015-09-30  Nathan Sidwell  
>>
>>   * builtins.c: Don't include gomp-constants.h.
>>   (fold_builtin_1): Don't fold acc_on_device here.
>>   * gimple-fold.c: Include gomp-constants.h.
>>   (gimple_fold_builtin_acc_on_device): New.
>>   (gimple_fold_builtin): Call it.
>>
>> Index: gimple-fold.c
>> ===
>> --- gimple-fold.c (revision 228288)
>> +++ gimple-fold.c (working copy)
>> @@ -62,6 +62,7 @@ along with GCC; see the file COPYING3.
>>  #include "output.h"
>>  #include "tree-eh.h"
>>  #include "gimple-match.h"
>> +#include "gomp-constants.h"
>>
>>  /* Return true when DECL can be referenced from current unit.
>> FROM_DECL (if non-null) specify constructor of variable DECL was taken 
>> from.
>> @@ -2708,6 +2709,34 @@ gimple_fold_builtin_strlen (gimple_stmt_
>>return true;
>>  }
>>
>> +/* Fold a call to __builtin_acc_on_device.  */
>> +
>> +static bool
>> +gimple_fold_builtin_acc_on_device (gimple_stmt_iterator *gsi, tree arg0)
>> +{
>> +  /* Defer folding until we know which compiler we're in.  */
>> +  if (symtab->state != EXPANSION)
>> +return false;
>> +
>> +  unsigned val_host = GOMP_DEVICE_HOST;
>> +  unsigned val_dev = GOMP_DEVICE_NONE;
>> +
>> +#ifdef ACCEL_COMPILER
>> +  val_host = GOMP_DEVICE_NOT_HOST;
>> +  val_dev = ACCEL_COMPILER_acc_device;
>> +#endif
>> +
>> +  tree host = build2 (EQ_EXPR, boolean_type_node, arg0,
>> +   build_int_cst (integer_type_node, val_host));
>> +  tree dev = build2 (EQ_EXPR, boolean_type_node, arg0,
>> +  build_int_cst (integer_type_node, val_dev));
>> +
>> +  tree result = build2 (TRUTH_OR_EXPR, boolean_type_node, host, dev);
>> +
>> +  result = fold_convert (integer_type_node, result);
>> +  gimplify_and_update_call_from_tree (gsi, result);
>> +  return true;
>> +}
>
> Wouldn't it be better to just emit GIMPLE here instead?
> So
>   tree res = make_ssa_name (boolean_type_node);
>   gimple g = gimple_build_assign (res, EQ_EXPR, arg0,
>   build_int_cst (integer_type_node, 
> val_host));
>   gsi_insert_before (gsi, g);
> ...
> ?

Yeah, best with using gimple_build which also canonicalizes/optimizes.

Richard.

> Jakub
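For reference, the value the call folds to on the host side is just a pair of equality tests ORed together; a scalar sketch of that fold result (the GOMP_DEVICE_* constants below are illustrative stand-ins — see gomp-constants.h for the real values):

```c
#include <assert.h>

/* Host-compiler fold of acc_on_device (arg):
   arg == val_host || arg == val_dev, with val_host = GOMP_DEVICE_HOST
   and val_dev = GOMP_DEVICE_NONE.  Values here are assumptions.  */
enum { DEV_NONE = 0, DEV_HOST = 2, DEV_NOT_HOST = 4 };

static int
acc_on_device_folded (int arg)
{
  return arg == DEV_HOST || arg == DEV_NONE;
}
```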


Re: [ARM] Use vector wide add for mixed-mode adds

2015-10-01 Thread Michael Collison

Kyrill,

I have modified the patch to address your comments. I also modified
check_effective_target_vect_widen_sum_hi_to_si_pattern in
target-supports.exp to indicate that ARM NEON supports vector widen
sum of HImode to SImode. This resolved several test suite failures.

Successfully tested on arm-none-eabi and arm-none-linux-gnueabihf. I
have four related execution failures on armeb-none-linux-gnueabihf
with -flto only:

gcc.dg/vect/vect-outer-4f.c -flto -ffat-lto-objects execution test
gcc.dg/vect/vect-outer-4g.c -flto -ffat-lto-objects execution test
gcc.dg/vect/vect-outer-4k.c -flto -ffat-lto-objects execution test
gcc.dg/vect/vect-outer-4l.c -flto -ffat-lto-objects execution test


I am debugging but have not tracked down the root cause yet. Feedback?

2015-07-22  Michael Collison  

* config/arm/neon.md (widen_sum): New patterns
where mode is VQI to improve mixed mode vectorization.
* config/arm/neon.md (vec_sel_widen_ssum_lo3): New
define_insn to match low half of signed vaddw.
* config/arm/neon.md (vec_sel_widen_ssum_hi3): New
define_insn to match high half of signed vaddw.
* config/arm/neon.md (vec_sel_widen_usum_lo3): New
define_insn to match low half of unsigned vaddw.
* config/arm/neon.md (vec_sel_widen_usum_hi3): New
define_insn to match high half of unsigned vaddw.
* testsuite/gcc.target/arm/neon-vaddws16.c: New test.
* testsuite/gcc.target/arm/neon-vaddws32.c: New test.
* testsuite/gcc.target/arm/neon-vaddwu16.c: New test.
* testsuite/gcc.target/arm/neon-vaddwu32.c: New test.
* testsuite/gcc.target/arm/neon-vaddwu8.c: New test.
* testsuite/lib/target-supports.exp
(check_effective_target_vect_widen_sum_hi_to_si_pattern): Indicate
that arm neon supports vector widen sum of HImode to SImode.

On 09/23/2015 01:49 AM, Kyrill Tkachov wrote:

Hi Michael,

On 23/09/15 00:52, Michael Collison wrote:

This is a modified version of the previous patch that removes the
documentation and read-md.c fixes. These patches have been submitted
separately and approved.

This patch is designed to address code that was not being vectorized due
to missing widening patterns in the ARM backend. Code such as:

int t6(int len, void * dummy, short * __restrict x)
{
len = len & ~31;
int result = 0;
__asm volatile ("");
for (int i = 0; i < len; i++)
  result += x[i];
return result;
}
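The semantics the widening patterns must preserve are that each short element is sign-extended to int before being accumulated, so no 16-bit wrap-around occurs; a scalar reference with exactly that property:

```c
#include <assert.h>

/* Scalar reference for the signed widening sum: x[i] promotes to int
   before the add, so the accumulation is done in 32 bits.  */
static int
widen_ssum_ref (const short *x, int len)
{
  int result = 0;
  for (int i = 0; i < len; i++)
    result += x[i];
  return result;
}
```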

Validated on arm-none-eabi, arm-none-linux-gnueabi,
arm-none-linux-gnueabihf, and armeb-none-linux-gnueabihf.

2015-09-22  Michael Collison 

  * config/arm/neon.md (widen_sum): New patterns
  where mode is VQI to improve mixed mode add vectorization.



Please list all the new define_expands and define_insns
in the changelog. Also, please add an ChangeLog entry for
the testsuite additions.

The approach looks ok to me with a few comments on some
parts of the patch itself.


+(define_insn "vec_sel_widen_ssum_hi3"
+  [(set (match_operand: 0 "s_register_operand" "=w")
+(plus: (sign_extend: (vec_select:VW 
(match_operand:VQI 1 "s_register_operand" "%w")
+   (match_operand:VQI 2 
"vect_par_constant_high" "")))
+(match_operand: 3 "s_register_operand" 
"0")))]

+  "TARGET_NEON"
+  "vaddw.\t%q0, %q3, %f1"
+  [(set_attr "type" "neon_add_widen")
+  (set_attr "length" "8")]
+)


This is a single instruction, and it has a length of 4, so no need to 
override the length attribute.

Same with the other define_insns in this patch.


diff --git a/gcc/testsuite/gcc.target/arm/neon-vaddws16.c 
b/gcc/testsuite/gcc.target/arm/neon-vaddws16.c

new file mode 100644
index 000..ed10669
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/neon-vaddws16.c
@@ -0,0 +1,21 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_neon_hw } */

The arm_neon_hw check is usually used when you want to run the tests.
Since this is a compile-only test you just need arm_neon_ok.

 +/* { dg-add-options arm_neon_ok } */
+/* { dg-options "-O3" } */
+
+
+int
+t6(int len, void * dummy, short * __restrict x)
+{
+  len = len & ~31;
+  int result = 0;
+  __asm volatile ("");
+  for (int i = 0; i < len; i++)
+result += x[i];
+  return result;
+}
+
+/* { dg-final { scan-assembler "vaddw\.s16" } } */
+
+
+

Stray trailing newlines. Similar comments for the other testcases.

Thanks,
Kyrill



--
Michael Collison
Linaro Toolchain Working Group
michael.colli...@linaro.org

diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md
index 654d9d5..b3485f1 100644
--- a/gcc/config/arm/neon.md
+++ b/gcc/config/arm/neon.md
@@ -1174,6 +1174,55 @@
 
 ;; Widening operations
 
+(define_expand "widen_ssum3"
+  [(set (match_operand: 0 "s_register_operand" "")
+	(plus: (sign_extend: (match_operand:VQI 1 "s_register_operand" ""))
+			   (match_operand: 2 "s_register_operand" "")))]
+  "TARGET_NEON"
+  {
+int i;
+int half_elem = /2;
+rtvec v1 = rtvec_alloc (half_elem);
+rtvec v2 = rtvec_alloc (half_elem);
+rtx p1, 

[C PATCH, committed] Explain parameters better for convert_for_assignment

2015-10-01 Thread Marek Polacek
The location parameters description was IMHO inadequate.  The following
should be much clearer.

Applying to trunk.

2015-10-01  Marek Polacek  

* c-typeck.c (convert_for_assignment): Improve commentary.

diff --git gcc/c/c-typeck.c gcc/c/c-typeck.c
index a11ccb2..11e487c 100644
--- gcc/c/c-typeck.c
+++ gcc/c/c-typeck.c
@@ -5690,8 +5690,18 @@ maybe_warn_string_init (location_t loc, tree type, 
struct c_expr expr)
ERRTYPE says whether it is argument passing, assignment,
initialization or return.
 
-   LOCATION is the location of the assignment, EXPR_LOC is the location of
-   the RHS or, for a function, location of an argument.
+   In the following example, '~' denotes where EXPR_LOC and '^' where
+   LOCATION point to:
+
+ f (var);  [ic_argpass]
+ ^  ~~~
+ x = var;  [ic_assign]
+   ^ ~~~;
+ int x = var;  [ic_init]
+^^^
+ return x; [ic_return]
+   ^
+
FUNCTION is a tree for the function being called.
PARMNUM is the number of the argument, for printing in error messages.  */
 
Marek


[PATCH, testsuite]: Fix gcc.target/i386/pr65105-1.c test

2015-10-01 Thread Uros Bizjak
Hello!

Attached patch fixes gcc.target/i386/pr65105-1.c:

a) As a runtime SSE2 test, we have to check for target SSE2 support
and use proper test infrastructure.

b) A runtime test can't check output assembly without -save-temps.

The patch also fixes another misuse of -save-temps in the gcc.target/i386 directory.

The patch solves:

UNRESOLVED: gcc.target/i386/pr65105-1.c scan-assembler por
UNRESOLVED: gcc.target/i386/pr65105-1.c scan-assembler pand

2015-10-01  Uros Bizjak  

* gcc.target/i386/pr65105-1.c: Require sse2 effective target.
(main): Rename to sse2_test.  Abort if count != 5.
(dg-options): Add -save-temps.  Use "-msse2 -mtune=slm" instead
of -march=slm.
* gcc.target/i386/pr46865-2.c (dg-options): Remove -save-temps.

Tested on x86_64-linux-gnu {,-m32} and committed to mainline SVN.

Uros.
Index: gcc.target/i386/pr65105-1.c
===
--- gcc.target/i386/pr65105-1.c (revision 228292)
+++ gcc.target/i386/pr65105-1.c (working copy)
@@ -1,9 +1,12 @@
 /* PR target/pr65105 */
 /* { dg-do run { target { ia32 } } } */
-/* { dg-options "-O2 -march=slm" } */
+/* { dg-options "-O2 -msse2 -mtune=slm -save-temps" } */
+/* { dg-require-effective-target sse2 } */
 /* { dg-final { scan-assembler "por" } } */
 /* { dg-final { scan-assembler "pand" } } */
 
+#include "sse2-check.h"
+
 #include "stdlib.h"
 
 static int count = 0;
@@ -40,11 +43,13 @@
   arr[5] = 0xff00L;
 }
 
-int
-main (int argc, const char **argv)
+static void
+sse2_test (void)
 {
   long long arr[6];
   fill_data (arr);
   test (arr);
-  return count - 5;
+
+  if (count != 5)
+__builtin_abort ();
 }
Index: gcc.target/i386/pr46865-2.c
===
--- gcc.target/i386/pr46865-2.c (revision 228292)
+++ gcc.target/i386/pr46865-2.c (working copy)
@@ -1,6 +1,6 @@
 /* PR rtl-optimization/46865 */
 /* { dg-do compile } */
-/* { dg-options "-O2 -save-temps" } */
+/* { dg-options "-O2" } */
 
 extern unsigned long f;
 


Re: [PATCH] rs6000: Add "cannot_copy" attribute, use it (PR67788, PR67789)

2015-10-01 Thread Richard Biener
On Thu, Oct 1, 2015 at 8:08 AM, Segher Boessenkool
 wrote:
> After the shrink-wrapping patches the prologue will often be pushed
> "deeper" into the function, which in turn means the software trace cache
> pass will more often want to duplicate the basic block containing the
> prologue.  This caused failures for 32-bit SVR4 with -msecure-plt PIC.
>
> This configuration uses the load_toc_v4_PIC_1 instruction, which creates
> assembler labels without using the normal machinery for that.  If now
> the compiler decides to duplicate the insn, it will emit the same label
> twice.  Boom.
>
> It isn't so easy to fix this to use labels the compiler knows about (let
> alone test that properly).  Instead, this patch wires up a "cannot_copy"
> attribute to be used by TARGET_CANNOT_COPY_P, and sets that attribute on
> these insns we do not want copied.
>
> Bootstrapped and tested on powerpc64-linux, with the usual configurations
> (-m32,-m32/-mpowerpc64,-m64,-m64/-mlra); new testcase fails before, works
> after (on 32-bit).
>
> Is this okay for mainline?

Isn't that quite expensive?  So even if not "easy", can you try?

Do we have other ports with local labels in define_insns?  I see some in
darwin.md as well which your patch doesn't handle btw., otherwise
suspicious %0: also appears (only) in h8300.md.  arc.md also has
a suspicious case in its doloop_end_i pattern.

Richard.

>
> Segher
>
>
> 2015-09-30  Segher Boessenkool  
>
> PR target/67788
> PR target/67789
> * config/rs6000/rs6000.c (TARGET_CANNOT_COPY_INSN_P): New.
> (rs6000_cannot_copy_insn_p): New function.
> * config/rs6000/rs6000.md (cannot_copy): New attribute.
> (load_toc_v4_PIC_1_normal): Set cannot_copy.
> (load_toc_v4_PIC_1_476): Ditto.
>
> gcc/testsuite/
> PR target/67788
> PR target/67789
> * gcc.target/powerpc/pr67789.c: New testcase.
>
> ---
>  gcc/config/rs6000/rs6000.c | 11 +
>  gcc/config/rs6000/rs6000.md|  9 +--
>  gcc/testsuite/gcc.target/powerpc/pr67789.c | 39 
> ++
>  3 files changed, 57 insertions(+), 2 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr67789.c
>
> diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
> index 93bb725..29fd198 100644
> --- a/gcc/config/rs6000/rs6000.c
> +++ b/gcc/config/rs6000/rs6000.c
> @@ -1513,6 +1513,8 @@ static const struct attribute_spec 
> rs6000_attribute_table[] =
>  #define TARGET_REGISTER_MOVE_COST rs6000_register_move_cost
>  #undef TARGET_MEMORY_MOVE_COST
>  #define TARGET_MEMORY_MOVE_COST rs6000_memory_move_cost
> +#undef TARGET_CANNOT_COPY_INSN_P
> +#define TARGET_CANNOT_COPY_INSN_P rs6000_cannot_copy_insn_p
>  #undef TARGET_RTX_COSTS
>  #define TARGET_RTX_COSTS rs6000_rtx_costs
>  #undef TARGET_ADDRESS_COST
> @@ -31226,6 +31228,15 @@ rs6000_xcoff_encode_section_info (tree decl, rtx 
> rtl, int first)
>  #endif /* HAVE_AS_TLS */
>  #endif /* TARGET_XCOFF */
>
> +/* Return true if INSN should not be copied.  */
> +
> +static bool
> +rs6000_cannot_copy_insn_p (rtx_insn *insn)
> +{
> +  return recog_memoized (insn) >= 0
> +&& get_attr_cannot_copy (insn);
> +}
> +
>  /* Compute a (partial) cost for rtx X.  Return true if the complete
> cost has been computed, and false if subexpressions should be
> scanned.  In either case, *TOTAL contains the cost result.  */
> diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
> index cfdb286..8c53c40 100644
> --- a/gcc/config/rs6000/rs6000.md
> +++ b/gcc/config/rs6000/rs6000.md
> @@ -226,6 +226,9 @@ (define_attr "var_shift" "no,yes"
>   (const_string "no"))
> (const_string "no")))
>
> +;; Is copying of this instruction disallowed?
> +(define_attr "cannot_copy" "no,yes" (const_string "no"))
> +
>  ;; Define floating point instruction sub-types for use with Xfpu.md
>  (define_attr "fp_type" 
> "fp_default,fp_addsub_s,fp_addsub_d,fp_mul_s,fp_mul_d,fp_div_s,fp_div_d,fp_maddsub_s,fp_maddsub_d,fp_sqrt_s,fp_sqrt_d"
>  (const_string "fp_default"))
>
> @@ -9130,7 +9133,8 @@ (define_insn "load_toc_v4_PIC_1_normal"
> && (flag_pic == 2 || (flag_pic && TARGET_SECURE_PLT))"
>"bcl 20,31,%0\\n%0:"
>[(set_attr "type" "branch")
> -   (set_attr "length" "4")])
> +   (set_attr "length" "4")
> +   (set_attr "cannot_copy" "yes")])
>
>  (define_insn "load_toc_v4_PIC_1_476"
>[(set (reg:SI LR_REGNO)
> @@ -9148,7 +9152,8 @@ (define_insn "load_toc_v4_PIC_1_476"
>return templ;
>  }"
>[(set_attr "type" "branch")
> -   (set_attr "length" "4")])
> +   (set_attr "length" "4")
> +   (set_attr "cannot_copy" "yes")])
>
>  (define_expand "load_toc_v4_PIC_1b"
>[(parallel [(set (reg:SI LR_REGNO)
> diff --git a/gcc/testsuite/gcc.target/powerpc/pr67789.c 
> b/gcc/testsuite/gcc.target/powerpc/pr67789.c
> new file mode 100644
> index 000..d1bd047
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr67789

Re: [PATCH, testsuite]: Fix gcc.target/i386/pr65105-1.c test

2015-10-01 Thread Ilya Enkovich
2015-10-01 13:12 GMT+03:00 Uros Bizjak :
> Hello!
>
> Attached patch fixes gcc.target/i386/pr65105-1.c:

Thanks!
Ilya

>
> a) As a runtime SSE2 test, we have to check for target SSE2 support
> and use proper test infrastructure.
>
> b) A runtime test can't check output assembly without -save-temps.
>
> The patch also fixes another misuse of -save-temps in the gcc.target/i386 directory.
>
> The patch solves:
>
> UNRESOLVED: gcc.target/i386/pr65105-1.c scan-assembler por
> UNRESOLVED: gcc.target/i386/pr65105-1.c scan-assembler pand
>
> 2015-10-01  Uros Bizjak  
>
> * gcc.target/i386/pr65105-1.c: Require sse2 effective target.
> (main): Rename to sse2_test.  Abort if count != 5.
> (dg-options): Add -save-temps.  Use "-msse2 -mtune=slm" instead
> of -march=slm.
> * gcc.target/i386/pr46865-2.c (dg-options): Remove -save-temps.
>
> Tested on x86_64-linux-gnu {,-m32} and committed to mainline SVN.
>
> Uros.
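
For reference, the directive changes described in the ChangeLog amount to a test header of roughly this shape (a sketch of the DejaGnu conventions involved, not the exact committed test; the runtime sse2 harness supplies main and calls sse2_test after checking CPU support):

```c
/* { dg-do run } */
/* { dg-require-effective-target sse2 } */
/* { dg-options "-O2 -msse2 -mtune=slm -save-temps" } */
/* { dg-final { scan-assembler "por" } } */
/* { dg-final { scan-assembler "pand" } } */
```

Without -save-temps the assembly is never kept for a run test, so the scan-assembler directives come back UNRESOLVED, which is exactly what the patch fixes.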


[PATCH, i386, AVX-512] Update extract_even_odd w/ AVX-512BW insns.

2015-10-01 Thread Kirill Yukhin
Hello,
Patch in the bottom improves insns sequences for
strided loads.
E.g. on `-march=skylake-avx512' for this test:
 unsigned char yy[1];
 unsigned char xx[1];

 void
 __attribute__ ((noinline)) generateMTFValues (unsigned char s)
 {
   unsigned char i;
   for (i = 0; i < s; i++)
     yy[i] = xx[i*2 + 1];
 }

We have:
vmovdqa64   .LC0(%rip), %zmm0   # 34    *movv32hi_internal/2       [length = 11]
vmovdqa64   .LC1(%rip), %zmm1   # 36    *movv32hi_internal/2       [length = 11]
vmovdqu64   xx+1(%rip), %zmm3   # 29    *movv64qi_internal/2       [length = 11]
vmovdqu64   xx+65(%rip), %zmm4  # 32    *movv64qi_internal/2       [length = 11]
vmovdqa64   %zmm0, %zmm2        # 153   *movv32hi_internal/2       [length = 6]
vmovdqa64   %zmm1, %zmm5        # 155   *movv32hi_internal/2       [length = 6]
vpermi2w    %zmm4, %zmm3, %zmm2 # 35    avx512bw_vpermi2varv32hi3  [length = 6]
vpermi2w    %zmm4, %zmm3, %zmm5 # 37    avx512bw_vpermi2varv32hi3  [length = 6]
vmovdqa64   .LC2(%rip), %zmm4   # 38    *movv64qi_internal/2       [length = 11]
vmovdqa64   .LC3(%rip), %zmm3   # 40    *movv64qi_internal/2       [length = 11]
vpshufb     %zmm4, %zmm2, %zmm2 # 39    avx512bw_pshufbv64qi3/2    [length = 6]
vpshufb     %zmm3, %zmm5, %zmm5 # 41    avx512bw_pshufbv64qi3/2    [length = 6]
vporq       %zmm5, %zmm2, %zmm2 # 42    *iorv64qi3/2               [length = 4]
vmovdqu32   %zmm2, yy(%rip)     # 44    avx512f_storedquv16si      [length = 11]
This happens because the most generic permute expander takes charge.

Patch reduces the code to:
vmovdqu64   xx+1(%rip), %zmm0   # 28    *movv16si_internal/2            [length = 11]
vmovdqu64   xx+65(%rip), %zmm1  # 134   *movv16si_internal/2            [length = 11]
vpmovwb     %zmm0, %ymm0        # 34    avx512bw_truncatev32hiv32qi2/1  [length = 6]
vpmovwb     %zmm1, %ymm1        # 35    avx512bw_truncatev32hiv32qi2/1  [length = 6]
vinserti64x4 $0x1, %ymm1, %zmm0, %zmm0  # 36    avx_vec_concatv64qi/1   [length = 7]
vmovdqu32   %zmm0, yy(%rip)     # 38    avx512f_storedquv16si           [length = 11]

It also enables extract_even_odd for V64QImode.
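
In scalar terms, the vpmovwb-based sequence computes the following (a model only, not GCC code; the function name is illustrative, and the byte selection assumes the little-endian lane layout of x86):

```c
#include <assert.h>
#include <stdint.h>

/* vpmovwb keeps the low byte of each 16-bit lane, which for a byte
   vector viewed as words selects the even-indexed bytes of that half;
   vinserti64x4 then concatenates the two 32-byte results into one
   64-byte vector.  */
static void
extract_even (const uint8_t src[128], uint8_t dst[64])
{
  for (int half = 0; half < 2; ++half)      /* one vpmovwb per zmm half */
    for (int i = 0; i < 32; ++i)
      dst[half * 32 + i] = src[half * 64 + 2 * i];
}
```

The odd-element variant the comment in the patch describes is the same shape with a 16-bit right shift by 8 in front, so the high byte of each word lands in the truncated lane.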

Bootstrapped. New tests pass (fail w/o the change). Regtesting is in progress.

Is it ok for trunk?

gcc/
* config/i386/i386.c (expand_vec_perm_even_odd_trunc): New.
(expand_vec_perm_even_odd_1): Handle V64QImode.
(ix86_expand_vec_perm_const_1): Try expansion with
expand_vec_perm_even_odd_trunc as well.
* config/i386/sse.md (VI124_AVX512F): Rename to ...
(define_mode_iterator VI124_AVX2_24_AVX512F_1_AVX512BW): This. Extend
to V64QI.
(define_mode_iterator VI248_AVX2_8_AVX512F): Rename to ...
(define_mode_iterator VI248_AVX2_8_AVX512F_24_AVX512BW): This. Extend
to V32HI and V16SI.
(define_insn "avx512bw_v32hiv32qi2"): Unhide pattern name.
(define_expand "vec_pack_trunc_"): Update iterator name.
(define_expand "vec_unpacks_lo_"): Ditto.
(define_expand "vec_unpacks_hi_"): Ditto.
(define_expand "vec_unpacku_lo_"): Ditto.
(define_expand "vec_unpacku_hi_"): Ditto.

gcc/testsuite/
* gcc.target/i386/vect-pack-trunc-1.c: New test.
* gcc.target/i386/vect-pack-trunc-2.c: Ditto.
* gcc.target/i386/vect-perm-even-1.c: Ditto.
* gcc.target/i386/vect-perm-odd-1.c: Ditto.
* gcc.target/i386/vect-unpack-1.c: Ditto.
* gcc.target/i386/vect-unpack-2.c: Ditto.
--
Thanks, K

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index d370521..24202b3 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -49216,6 +49216,62 @@ expand_vec_perm_even_odd_pack (struct 
expand_vec_perm_d *d)
   return true;
 }
 
+/* A subroutine of expand_vec_perm_even_odd_1.  Implement extract-even
+   and extract-odd permutations of two V64QI operands
+   with two "shifts", two "truncs" and one "concat" insns for "odd"
+   and two "truncs" and one concat insn for "even."
+   Have already failed all two instruction sequences.  */
+
+static bool
+expand_vec_perm_even_odd_trunc (struct expand_vec_perm_d *d)
+{
+  rtx t1, t2, t3, t4;
+  unsigned i, odd, nelt = d->nelt;
+
+  if (!TARGET_AVX512BW
+  || d->one_operand_p
+  || d->vmode != V64QImode)
+return false;
+
+  /* Check that permutation is even or odd.  */
+  odd = d->perm[0];
+  if (odd > 1)
+return false;
+
+  for (i = 1; i < nelt; ++i)
+if (d->perm[i] != 2 * i + odd)
+  return false;
+
+  if (d->testing_p)
+return true;
+
+
+  if (odd)
+{
+  t1 = gen_reg_rtx (V32HImode);
+  t2 = gen_reg_rtx (V32HImode);
+  emit_insn (gen_lshrv32hi3 (t1,
+gen_lowpart (V32HImode, d->op0),
+GEN_INT (8)));
+  emit_insn (gen_lshrv32hi3 (t2,
+  

Re: [PATCH] rs6000: Add "cannot_copy" attribute, use it (PR67788, PR67789)

2015-10-01 Thread Bernd Schmidt


Do we have other ports with local labels in define_insns?  I see some in
darwin.md as well which your patch doesn't handle btw., otherwise
suspicious %0: also appears (only) in h8300.md.  arc.md also has
a suspicious case in its doloop_end_i pattern.


It is reasonably common, and defining cannot_copy_insn_p is the normal
way of dealing with it; sh, for example, does this.



Bernd


Re: [patch] Leave errno unchanged by successful std::stoi etc

2015-10-01 Thread Jonathan Wakely

commit 502928c8061343e82e982e06299c11d465f64b6c
Author: Jonathan Wakely 
Date:   Wed Sep 30 14:10:58 2015 +0100

   Save-and-restore errno more carefully in libstdc++
   
   	* doc/xml/manual/diagnostics.xml: Document use of errno.

* config/locale/generic/c_locale.cc (_Save_errno): New helper.
(__convert_to_v): Use _Save_errno.
* include/ext/string_conversions.h (__stoa): Only restore errno when
it isn't set to non-zero.


Committed to trunk.
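
The __stoa change described in the ChangeLog amounts to the following pattern, sketched here in plain C with strtol (to_long and its error reporting are illustrative, not the libstdc++ code):

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Save errno, clear it so ERANGE from strtol is detectable, and
   restore the saved value only when the conversion left errno at
   zero -- i.e. a successful conversion leaves errno unchanged.  */
static long
to_long (const char *s, int *conv_errno)
{
  char *end;
  int saved_errno = errno;
  errno = 0;
  long v = strtol (s, &end, 10);
  *conv_errno = errno;          /* ERANGE on overflow, else 0 */
  if (errno == 0)
    errno = saved_errno;        /* success: caller's errno preserved */
  return v;
}
```

The key point is the conditional restore: unconditionally restoring would hide a genuine ERANGE from the caller, while never restoring would clobber a caller's earlier errno on success.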



Re: [PATCH] Clarify __atomic_compare_exchange_n docs

2015-10-01 Thread Andrew Haley
On 09/29/2015 04:21 PM, Sandra Loosemore wrote:
> What is "weak compare_exchange", and what is "the strong variation", and 
> how do they differ in terms of behavior?

It's in C++11 29.6.5:

Remark: The weak compare-and-exchange operations may fail spuriously,
that is, return false while leaving the contents of memory pointed to
by expected before the operation the same as that of the object and
the same as that of expected after the operation. [ Note: This
spurious failure enables implementation of compare-and-exchange on a
broader class of machines, e.g., load-locked store-conditional
machines. A consequence of spurious failure is that nearly all uses of
weak compare-and-exchange will be in a loop.  When a
compare-and-exchange is in a loop, the weak version will yield better
performance on some platforms. When a weak compare-and-exchange would
require a loop and a strong one would not, the strong one is
preferable. — end note ]

The classic use of this is for shared counters: you don't care if you
miss an occasional count but you don't want the counter to go
backwards.
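
The counter case can be sketched with the builtin itself (a minimal example; count_up is an illustrative name):

```c
#include <assert.h>

/* Increment a shared counter with the weak variant.  A spurious
   failure merely retries, so the loop is the natural shape; on
   failure __atomic_compare_exchange_n refreshes 'old' with the
   value currently in memory.  */
static void
count_up (int *ctr)
{
  int old = __atomic_load_n (ctr, __ATOMIC_RELAXED);
  while (!__atomic_compare_exchange_n (ctr, &old, old + 1,
                                       1 /* weak */,
                                       __ATOMIC_RELAXED,
                                       __ATOMIC_RELAXED))
    ;  /* retry with the refreshed 'old' */
}
```

Every path through the function either installs old + 1 or retries, so the counter can only move forward, even if individual exchanges fail spuriously.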

Whether we should replicate all of the C++11 language is perhaps
something we should discuss.

Andrew.


[PATCH] Fix PR67783, quadraticness in IPA inline analysis

2015-10-01 Thread Richard Biener

The following avoids quadraticness in the loop depth by only considering
loop header defs as IVs for the analysis of the loop_stride predicate.
This will miss cases like

foo (int inv)
{
 for (i = inv; i < n; ++i)
  {
int derived_iv = i + i * inv;
...
  }
}

but I doubt that's important in practice.  Another way would be to
just consider the containing loop when analyzing the IV, thus iterate
over outermost loop bodies only, replacing the

  simple_iv (loop, loop_containing_stmt (stmt), use, &iv, true)

check with

  simple_iv (loop_containing_stmt (stmt), loop_containing_stmt (stmt), 
use, &iv, true);

but doing all this analysis for each stmt is already quite expensive,
esp. as we are doing it for all uses instead of all defs ...

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

Honza, is this ok or did you do the current way on purpose (rather
than for completeness as it was easy to do?)

Thanks,
Richard.

2015-10-01  Richard Biener  

PR ipa/67783
* ipa-inline-analysis.c (estimate_function_body_sizes): Only
consider loop header PHI defs as IVs.

Index: gcc/ipa-inline-analysis.c
===
*** gcc/ipa-inline-analysis.c   (revision 228319)
--- gcc/ipa-inline-analysis.c   (working copy)
*** estimate_function_body_sizes (struct cgr
*** 2760,2768 
{
  vec exits;
  edge ex;
! unsigned int j, i;
  struct tree_niter_desc niter_desc;
- basic_block *body = get_loop_body (loop);
  bb_predicate = *(struct predicate *) loop->header->aux;
  
  exits = get_loop_exit_edges (loop);
--- 2760,2767 
{
  vec exits;
  edge ex;
! unsigned int j;
  struct tree_niter_desc niter_desc;
  bb_predicate = *(struct predicate *) loop->header->aux;
  
  exits = get_loop_exit_edges (loop);
*** estimate_function_body_sizes (struct cgr
*** 2788,2833 
}
  exits.release ();
  
! for (i = 0; i < loop->num_nodes; i++)
{
! gimple_stmt_iterator gsi;
! bb_predicate = *(struct predicate *) body[i]->aux;
! for (gsi = gsi_start_bb (body[i]); !gsi_end_p (gsi);
!  gsi_next (&gsi))
!   {
! gimple *stmt = gsi_stmt (gsi);
! affine_iv iv;
! ssa_op_iter iter;
! tree use;
! 
! FOR_EACH_SSA_TREE_OPERAND (use, stmt, iter, SSA_OP_USE)
! {
!   predicate will_be_nonconstant;
! 
!   if (!simple_iv
!   (loop, loop_containing_stmt (stmt), use, &iv, true)
!   || is_gimple_min_invariant (iv.step))
! continue;
!   will_be_nonconstant
! = will_be_nonconstant_expr_predicate (fbi.info, info,
!   iv.step,
!   nonconstant_names);
!   if (!true_predicate_p (&will_be_nonconstant))
! will_be_nonconstant
!= and_predicates (info->conds,
!  &bb_predicate,
!  &will_be_nonconstant);
!   if (!true_predicate_p (&will_be_nonconstant)
!   && !false_predicate_p (&will_be_nonconstant))
! /* This is slightly inprecise.  We may want to represent
!each loop with independent predicate.  */
! loop_stride =
!   and_predicates (info->conds, &loop_stride,
!   &will_be_nonconstant);
! }
!   }
}
- free (body);
}
set_hint_predicate (&inline_summaries->get (node)->loop_iterations,
  loop_iterations);
--- 2787,2818 
}
  exits.release ();
  
! for (gphi_iterator gsi = gsi_start_phis (loop->header);
!  !gsi_end_p (gsi); gsi_next (&gsi))
{
! gphi *phi = gsi.phi ();
! tree use = gimple_phi_result (phi);
! affine_iv iv;
! predicate will_be_nonconstant;
! if (!virtual_operand_p (use)
! || !simple_iv (loop, loop, use, &iv, true)
! || is_gimple_min_invariant (iv.step))
!   continue;
! will_be_nonconstant
!   = will_be_nonconstant_expr_predicate (fbi.info, info,
! iv.step,
! nonconstant_names);
! if (!true_predicate_p (&will_be_nonconstant))
!   will_be_nonconstant = and_predicates (info->conds,
!

Re: [PATCH] fortran/66979 -- FLUSH requires a UNIT number in the spec-list

2015-10-01 Thread Mikael Morin

Le 01/10/2015 01:58, Steve Kargl a écrit :

When FLUSH is used with a flush-spec-list, a unit is required.
Thus, a statement like 'flush(iostat=i)' would lead to an ICE
because gfortran was dereferencing a pointer to a non-existent
unit number.  The attached patch was built and tested on
x86-*-freebsd.  OK to commit?


OK, thanks.


Re: [build] Support PIE on Solaris

2015-10-01 Thread Rainer Orth
Rainer Orth  writes:

> Rainer Orth  writes:
>
>> Beyond the reasons for the bundled Solaris CRTs already cited in
>>
>>  https://gcc.gnu.org/ml/gcc-patches/2015-09/msg01638.html
>>
>> they need to be PIC to support position independent executables (PIE).
>>
>> While linker support for PIE has existed in Solaris ld since at least
>> Solaris 11.2 and GNU ld has just gotten the last (mostly cosmetic) bit
>> for binutils 2.26, there were no usable CRTs before.
>>
>> Now those pieces are in place, this patch enables PIE if the necessary
>> support (linker and CRTs) is detected.  It's mostly straightforward,
>> adapting specs changes in gnu-user.h and allowing for differences in
>> linker options.
>>
>> crtp.o, crtpg.o, and gmon.o are now compiled as PIC to also work with
>> PIE.  I don't think there's any point in having separate PIC and non-PIC
>> versions here.
>>
>> During early development of the patch, I found that gmon.c includes the
>> trailing NULs in error messages it prints.  Now corrected, though not
>> strictly related to the patch.
>>
>> Contrary to other targets, where -pie seems to be silently ignored if
>> PIE support is missing, I've decided to have gcc error out on Solaris in
>> this situation.  This also allows to easily distinguish between
>> configurations with and without PIE support in the testsuite.  
>>
>> Tested on i386-pc-solaris2.1[012] and sparc-sun-solaris2.1[012] with
>> both as/ld and gas/gld, and x86_64-unknown-linux-gnu.
>>
>> I've also bootstrapped on i386-pc-solaris2.12 and sparc-sun-solaris2.12
>> with --enable-default-pie.  There are a couple of new failures, but they
>> also occur on Linux/x86_64 and I've already filed PRs for (most of?)
>> them.
>>
>> Again, perhaps with exception of the obvious hunk in gcc.c, this patch
>> is purely Solaris-specific, so I'll commit it in a couple of days.  I'd
>> also like to backport it to the gcc-5 branch after some soak time on
>> mainline.
>
> I've now installed both the previous Solaris CRTs patch and this one.  A
> final round of testing revealed a problem with gld PIE support
> detection, though: by mistake I initially did the gas/gld testing with
> an unmodified gld 2.25.  While this works just fine on Solaris/x86 (with
> the exception of the PIE executables not being marked with DF_1_PIE in
> DF_FLAGS_1, a purely informational thing), Solaris/SPARC was different:
> even in a default build, many PIE tests failed with
>
>   gld-2.25: read-only segment has dynamic relocations.
>
> which doesn't happen with a gld 2.25.51 with the Solaris PIE patch.
> Also, Solaris ld links those exact same objects just fine.
>
> Therefore I'm now requiring gld 2.26 on Solaris for PIE support, as in
> the following patchlet.  Tested by configuring with ld, gld 2.25 and a
> gld 2.25.51 faked to call itself gld 2.26 on i386-pc-solaris2.11 and the
> bundled gld 2.23.2 on x86_64-pc-linux-gnu and checking that HAVE_LD_PIE
> is set correctly.
>
> diff --git a/gcc/configure.ac b/gcc/configure.ac
> --- a/gcc/configure.ac
> +++ b/gcc/configure.ac
> @@ -4751,7 +4751,12 @@ AC_MSG_RESULT($gcc_cv_ld_eh_frame_ciev3)
>  AC_MSG_CHECKING(linker position independent executable support)
>  gcc_cv_ld_pie=no
>  if test $in_tree_ld = yes ; then
> -  if test "$gcc_cv_gld_major_version" -eq 2 -a "$gcc_cv_gld_minor_version" 
> -ge 15 -o "$gcc_cv_gld_major_version" -gt 2 \
> +  case "$target" in
> +# Full PIE support on Solaris was only introduced in gld 2.26.
> +*-*-solaris2*)  gcc_gld_pie_min_version=26 ;;
> +*)   gcc_gld_pie_min_version=15 ;;
> +  esac
> +  if test "$gcc_cv_gld_major_version" -eq 2 -a "$gcc_cv_gld_minor_version" 
> -ge "$gcc_gld_pie_min_version" -o "$gcc_cv_gld_major_version" -gt 2 \
>   && test $in_tree_ld_is_elf = yes; then
>  gcc_cv_ld_pie=yes
>fi
> @@ -4759,6 +4764,14 @@ elif test x$gcc_cv_ld != x; then
># Check if linker supports -pie option
>if $gcc_cv_ld --help 2>/dev/null | grep -- -pie > /dev/null; then
>  gcc_cv_ld_pie=yes
> +case "$target" in
> +  *-*-solaris2*)
> + if echo "$ld_ver" | grep GNU > /dev/null \
> +   && test "$ld_vers_major" -eq 2 -a "$ld_vers_minor" -lt 26; then
> +   gcc_cv_ld_pie=no
> + fi
> + ;;
> +esac
>else
>  case "$target" in
>*-*-solaris2.1[[1-9]]*)
> @@ -4772,7 +4785,7 @@ elif test x$gcc_cv_ld != x; then
>  fi
>  if test x"$gcc_cv_ld_pie" = xyes; then
>   AC_DEFINE(HAVE_LD_PIE, 1,
> -[Define if your linker supports -pie option.])
> +[Define if your linker supports PIE option.])
>  fi
>  AC_MSG_RESULT($gcc_cv_ld_pie)
>  
>
> Despite that patch, with --enable-default-pie, there are many failures
> on sparc-sun-solaris2.12 with gas/gld 2.26, both the same error as above
> and execution failures in boehm-gc, libgomp, and libjava.  Given that
> those errors don't occur with as/ld or on i386-pc-solaris2.12, ISTM that
> there's something amiss with gld on Solaris/SPARC.  Given that this is a
> non-recommended and niche c

[gomp4.1] Fixup doacross lastprivate handling

2015-10-01 Thread Jakub Jelinek
On Thu, Sep 24, 2015 at 08:32:10PM +0200, Jakub Jelinek wrote:
> some edge for that case and condition checking), lastprivate also needs
> checking for all the cases,

This patch handles lastprivate in the doacross loops.  In certain cases
(C++ class iterators and addressable iterators) the user IVs are replaced
with artificial IVs, and the user IVs are assigned (non-class) or adjusted
(class iterators) inside of the body of the loop, but while for normal omp
for (both collapse == 1 and > 1) lastprivate is undefined if there are no
iterations, for doacross it is IMHO only if the collapsed loops have zero
iterations; but if they have non-zero iters, but the ordered loops nested in
them have zero iterations, then the body might be not actually ever invoked.
So we need slightly different lastprivate sequences in that case.  And, to
make it more complicated, for the collapsed > 1 loops we need to add step to
the artificial IV before that, while for collapse == 1 loops or >= collapse
loops we should not.
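
A minimal C sketch of the situation (the doacross syntax follows the gomp4.1 branch; f and the sink/source placement are illustrative): the inner ordered loop can have zero iterations while the collapsed loop does not, yet after the construct the lastprivate iterators must still hold their values past the last iteration.

```c
int last_i, last_j;

/* i is the (single) collapsed loop, j an ordered loop nested in it.
   With m == 0 the body never runs, but lastprivate still has to give
   i and j their post-loop values -- which is why the IV copy-out
   cannot simply live in the loop body.  */
void
f (int n, int m)
{
  int i, j;
#pragma omp parallel for ordered(2) lastprivate(i, j)
  for (i = 0; i < n; ++i)
    for (j = 0; j < m; ++j)
      {
#pragma omp ordered depend(sink: i - 1, j)
        /* loop body */
#pragma omp ordered depend(source)
      }
  last_i = i;
  last_j = j;
}
```

Compiled without -fopenmp the pragmas are ignored and the sequential semantics above are what the lastprivate handling has to reproduce.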

2015-10-01  Jakub Jelinek  

* gimplify.c (gimplify_omp_for): Fix handling of lastprivate
iterators in doacross loops.
* omp-low.c (expand_omp_for_ordered_loops): Add ordered_lastprivate
argument.  If true, add extra initializers for IVs starting with the
one inner to the first >= collapse loop that could have zero
iterations.
(expand_omp_for_generic): Adjust caller.

* tree-pretty-print.c (dump_omp_clause): Remove unused variable.
gcc/cp/
* semantics.c (handle_omp_for_class_iterator): Add collapse and
ordered arguments.  Fix handling of lastprivate iterators in
doacross loops.
(finish_omp_for): Adjust caller.
libgomp/
* testsuite/libgomp.c++/doacross-1.C: New test.

--- gcc/gimplify.c.jj   2015-09-24 20:20:32.0 +0200
+++ gcc/gimplify.c  2015-10-01 12:55:44.955218974 +0200
@@ -8108,9 +8108,7 @@ gimplify_omp_for (tree *expr_p, gimple_s
  OMP_CLAUSE_LINEAR_STEP (c2) = OMP_CLAUSE_LINEAR_STEP (c);
}
 
-  if ((var != decl || collapse > 1)
- && orig_for_stmt == for_stmt
- && i < collapse)
+  if ((var != decl || collapse > 1) && orig_for_stmt == for_stmt)
{
  for (c = OMP_FOR_CLAUSES (for_stmt); c ; c = OMP_CLAUSE_CHAIN (c))
if (((OMP_CLAUSE_CODE (c) == OMP_CLAUSE_LASTPRIVATE
@@ -8120,16 +8118,22 @@ gimplify_omp_for (tree *expr_p, gimple_s
 && OMP_CLAUSE_LINEAR_GIMPLE_SEQ (c) == NULL))
&& OMP_CLAUSE_DECL (c) == decl)
  {
-   t = TREE_VEC_ELT (OMP_FOR_INCR (for_stmt), i);
-   gcc_assert (TREE_CODE (t) == MODIFY_EXPR);
-   gcc_assert (TREE_OPERAND (t, 0) == var);
-   t = TREE_OPERAND (t, 1);
-   gcc_assert (TREE_CODE (t) == PLUS_EXPR
-   || TREE_CODE (t) == MINUS_EXPR
-   || TREE_CODE (t) == POINTER_PLUS_EXPR);
-   gcc_assert (TREE_OPERAND (t, 0) == var);
-   t = build2 (TREE_CODE (t), TREE_TYPE (decl), decl,
-   TREE_OPERAND (t, 1));
+   if (is_doacross && (collapse == 1 || i >= collapse))
+ t = var;
+   else
+ {
+   t = TREE_VEC_ELT (OMP_FOR_INCR (for_stmt), i);
+   gcc_assert (TREE_CODE (t) == MODIFY_EXPR);
+   gcc_assert (TREE_OPERAND (t, 0) == var);
+   t = TREE_OPERAND (t, 1);
+   gcc_assert (TREE_CODE (t) == PLUS_EXPR
+   || TREE_CODE (t) == MINUS_EXPR
+   || TREE_CODE (t) == POINTER_PLUS_EXPR);
+   gcc_assert (TREE_OPERAND (t, 0) == var);
+   t = build2 (TREE_CODE (t), TREE_TYPE (decl),
+   is_doacross ? var : decl,
+   TREE_OPERAND (t, 1));
+ }
gimple_seq *seq;
if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_LASTPRIVATE)
  seq = &OMP_CLAUSE_LASTPRIVATE_GIMPLE_SEQ (c);
--- gcc/omp-low.c.jj2015-09-29 19:07:25.0 +0200
+++ gcc/omp-low.c   2015-09-30 12:09:13.866406256 +0200
@@ -7303,7 +7303,8 @@ expand_omp_ordered_source_sink (struct o
 
 static basic_block
 expand_omp_for_ordered_loops (struct omp_for_data *fd, tree *counts,
- basic_block cont_bb, basic_block body_bb)
+ basic_block cont_bb, basic_block body_bb,
+ bool ordered_lastprivate)
 {
   if (fd->ordered == fd->collapse)
 return cont_bb;
@@ -7411,6 +7412,31 @@ expand_omp_for_ordered_loops (struct omp
  add_loop (loop, body_bb->loop_father);
}
 }
+
+  /* If there are any lastprivate clauses and it is possible some loops
+ might have zero iterations, ensure all the decls are initialized,
+ otherwise we could crash evaluat

Re: [PATCH] fortran/67758 -- Prevent ICE caused by misplaced COMMON

2015-10-01 Thread Mikael Morin

Le 01/10/2015 02:07, Steve Kargl a écrit :

On Wed, Sep 30, 2015 at 05:06:30PM -0700, Steve Kargl wrote:

Patch built and regression tested on x86_64-*-freebsd.
OK to commit?

The patch prevents the dereferencing of a NULL pointer
by jumping out of the cleanup of a list of COMMON blocks.

Hold on, I believe p should be present in the common symbol list pointed
to by p->common.
And by the way, if we are in gfc_restore_last_undo_checkpoint, we have 
found something bogus enough to backtrack, so hopefully an error has 
already been prepared (but maybe not emitted).

I will investigate more.

Mikael


Re: [PATCH] Fix PR67783, quadraticness in IPA inline analysis

2015-10-01 Thread Richard Biener
On Thu, 1 Oct 2015, Richard Biener wrote:

> 
> The following avoids quadraticness in the loop depth by only considering
> loop header defs as IVs for the analysis of the loop_stride predicate.
> This will miss cases like
> 
> foo (int inv)
> {
>  for (i = inv; i < n; ++i)
>   {
> int derived_iv = i + i * inv;
> ...
>   }
> }
> 
> but I doubt that's important in practice.  Another way would be to
> just consider the containing loop when analyzing the IV, thus iterate
> over outermost loop bodies only, replacing the
> 
>   simple_iv (loop, loop_containing_stmt (stmt), use, &iv, true)
> 
> check with
> 
>   simple_iv (loop_containing_stmt (stmt), loop_containing_stmt (stmt), 
> use, &iv, true);
> 
> but doing all this analysis for each stmt is already quite expensive,
> esp. as we are doing it for all uses instead of all defs ...
> 
> Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.
> 
> Honza, is this ok or did you do the current way on purpose (rather
> than for completeness as it was easy to do?)
> 
> Thanks,
> Richard.
> 
> 2015-10-01  Richard Biener  
> 
>   PR ipa/67783
>   * ipa-inline-analysis.c (estimate_function_body_sizes): Only
>   consider loop header PHI defs as IVs.
> 
> Index: gcc/ipa-inline-analysis.c
> ===
> *** gcc/ipa-inline-analysis.c (revision 228319)
> --- gcc/ipa-inline-analysis.c (working copy)
> *** estimate_function_body_sizes (struct cgr
> *** 2760,2768 
>   {
> vec exits;
> edge ex;
> !   unsigned int j, i;
> struct tree_niter_desc niter_desc;
> -   basic_block *body = get_loop_body (loop);
> bb_predicate = *(struct predicate *) loop->header->aux;
>   
> exits = get_loop_exit_edges (loop);
> --- 2760,2767 
>   {
> vec exits;
> edge ex;
> !   unsigned int j;
> struct tree_niter_desc niter_desc;
> bb_predicate = *(struct predicate *) loop->header->aux;
>   
> exits = get_loop_exit_edges (loop);
> *** estimate_function_body_sizes (struct cgr
> *** 2788,2833 
>   }
> exits.release ();
>   
> !   for (i = 0; i < loop->num_nodes; i++)
>   {
> !   gimple_stmt_iterator gsi;
> !   bb_predicate = *(struct predicate *) body[i]->aux;
> !   for (gsi = gsi_start_bb (body[i]); !gsi_end_p (gsi);
> !gsi_next (&gsi))
> ! {
> !   gimple *stmt = gsi_stmt (gsi);
> !   affine_iv iv;
> !   ssa_op_iter iter;
> !   tree use;
> ! 
> !   FOR_EACH_SSA_TREE_OPERAND (use, stmt, iter, SSA_OP_USE)
> !   {
> ! predicate will_be_nonconstant;
> ! 
> ! if (!simple_iv
> ! (loop, loop_containing_stmt (stmt), use, &iv, true)
> ! || is_gimple_min_invariant (iv.step))
> !   continue;
> ! will_be_nonconstant
> !   = will_be_nonconstant_expr_predicate (fbi.info, info,
> ! iv.step,
> ! nonconstant_names);
> ! if (!true_predicate_p (&will_be_nonconstant))
> !   will_be_nonconstant
> !  = and_predicates (info->conds,
> !&bb_predicate,
> !&will_be_nonconstant);
> ! if (!true_predicate_p (&will_be_nonconstant)
> ! && !false_predicate_p (&will_be_nonconstant))
> !   /* This is slightly inprecise.  We may want to represent
> !  each loop with independent predicate.  */
> !   loop_stride =
> ! and_predicates (info->conds, &loop_stride,
> ! &will_be_nonconstant);
> !   }
> ! }
>   }
> -   free (body);
>   }
> set_hint_predicate (&inline_summaries->get (node)->loop_iterations,
> loop_iterations);
> --- 2787,2818 
>   }
> exits.release ();
>   
> !   for (gphi_iterator gsi = gsi_start_phis (loop->header);
> !!gsi_end_p (gsi); gsi_next (&gsi))
>   {
> !   gphi *phi = gsi.phi ();
> !   tree use = gimple_phi_result (phi);
> !   affine_iv iv;
> !   predicate will_be_nonconstant;
> !   if (!virtual_operand_p (use)

Just noticed the inverted predicate during testing.  Re-testing with
that fixed now.

Richard.

> !   || !simple_iv (loop, loop, use, &iv, true)
> !   || is_gimple_min_invariant (iv.step))
> ! continue;
> !   will_be_nonconstant
> ! = will_be_nonconstant_expr_predicate (fbi.info, info,
> !   iv.

Re: [PATCH] fortran/67616 -- Fix ICE in BLOCK with a DATA statement

2015-10-01 Thread Mikael Morin

Le 01/10/2015 02:03, Steve Kargl a écrit :

The attached patch was built and tested on x86_64-*-freebsd.
OK to commit?

The patch prevents an ICE in a BLOCK construct that uses
a DATA statement and default initialization.  The problem
was that the derived type was declared in the host and
was not in the BLOCK's symtree.  The fix looks for the
derived type through host association.

Just remembered that Mikael pre-approved the patch.


Yes, OK again. :-)


Re: [gomp4] remove goacc locking

2015-10-01 Thread Nathan Sidwell

On 10/01/15 04:14, Thomas Schwinge wrote:

Hi Nathan!

On Mon, 28 Sep 2015 11:56:09 -0400, Nathan Sidwell  wrote:

I've committed this to remove the now no longer needed lock and unlock builtins
and related infrastructure.


If I understand correctly, it is an implementation detail of the nvptx
offloading implementation that it doesn't require such locking
primitives, but such locking may still be required for other (future)
offloading implementations, at which point something like the following
would have to be introduced again:


I thought about that.  Other implementations can also use the lockless idiom. 
There's always going to be a cmp&swap atomic, otherwise the HW is terribly 
designed.  lock/unlock doesn't work between PTX warps, as we've discovered, and 
I think the same will probably be true of HSA, as that's roughly the same 
conceptually.  If it turns out I'm wrong, these bits can always be resurrected then.


The reason I originally went with lock/unlock is that it was conceptually 
simpler.  The lockless scheme is quite straightforward, once one understands 
the underlying concept.


nathan


Re: Fold acc_on_device

2015-10-01 Thread Nathan Sidwell

On 10/01/15 06:03, Richard Biener wrote:

On Wed, Sep 30, 2015 at 9:22 PM, Jakub Jelinek  wrote:



Wouldn't it be better to just emit GIMPLE here instead?
So
   tree res = make_ssa_name (boolean_type_node);
   gimple g = gimple_build_assign (res, EQ_EXPR, arg0,
   build_int_cst (integer_type_node, val_host));
   gsi_insert_before (gsi, g);
...


Like this?

nathan
2015-10-01  Nathan Sidwell  

	* builtins.c: Don't include gomp-constants.h.
	(fold_builtin_1): Don't fold acc_on_device here.
	* gimple-fold.c: Include gomp-constants.h.
	(gimple_fold_builtin_acc_on_device): New.
	(gimple_fold_builtin): Call it.

Index: gcc/gimple-fold.c
===
--- gcc/gimple-fold.c	(revision 228288)
+++ gcc/gimple-fold.c	(working copy)
@@ -62,6 +62,7 @@ along with GCC; see the file COPYING3.
 #include "output.h"
 #include "tree-eh.h"
 #include "gimple-match.h"
+#include "gomp-constants.h"
 
 /* Return true when DECL can be referenced from current unit.
FROM_DECL (if non-null) specify constructor of variable DECL was taken from.
@@ -2708,6 +2709,47 @@ gimple_fold_builtin_strlen (gimple_stmt_
   return true;
 }
 
+/* Fold a call to __builtin_acc_on_device.  */
+
+static bool
+gimple_fold_builtin_acc_on_device (gimple_stmt_iterator *gsi, tree arg0)
+{
+  /* Defer folding until we know which compiler we're in.  */
+  if (symtab->state != EXPANSION)
+return false;
+
+  unsigned val_host = GOMP_DEVICE_HOST;
+  unsigned val_dev = GOMP_DEVICE_NONE;
+
+#ifdef ACCEL_COMPILER
+  val_host = GOMP_DEVICE_NOT_HOST;
+  val_dev = ACCEL_COMPILER_acc_device;
+#endif
+
+  location_t loc = gimple_location (gsi_stmt (*gsi));
+  
+  tree host_eq = make_ssa_name (boolean_type_node);
+  gimple *host_ass = gimple_build_assign
+(host_eq, EQ_EXPR, arg0, build_int_cst (integer_type_node, val_host));
+  gimple_set_location (host_ass, loc);
+  gsi_insert_before (gsi, host_ass, GSI_SAME_STMT);
+
+  tree dev_eq = make_ssa_name (boolean_type_node);
+  gimple *dev_ass = gimple_build_assign
+(dev_eq, EQ_EXPR, arg0, build_int_cst (integer_type_node, val_dev));
+  gimple_set_location (dev_ass, loc);
+  gsi_insert_before (gsi, dev_ass, GSI_SAME_STMT);
+
+  tree result = make_ssa_name (boolean_type_node);
+  gimple *result_ass = gimple_build_assign
+(result, BIT_IOR_EXPR, host_eq, dev_eq);
+  gimple_set_location (result_ass, loc);
+  gsi_insert_before (gsi, result_ass, GSI_SAME_STMT);
+
+  replace_call_with_value (gsi, result);
+
+  return true;
+}
 
 /* Fold the non-target builtin at *GSI and return whether any simplification
was made.  */
@@ -2848,6 +2890,9 @@ gimple_fold_builtin (gimple_stmt_iterato
 	   n == 3
 	   ? gimple_call_arg (stmt, 2)
 	   : NULL_TREE, fcode);
+case BUILT_IN_ACC_ON_DEVICE:
+  return gimple_fold_builtin_acc_on_device (gsi,
+		gimple_call_arg (stmt, 0));
 default:;
 }
 
Index: gcc/builtins.c
===
--- gcc/builtins.c	(revision 228288)
+++ gcc/builtins.c	(working copy)
@@ -64,7 +64,6 @@ along with GCC; see the file COPYING3.
 #include "cgraph.h"
 #include "tree-chkp.h"
 #include "rtl-chkp.h"
-#include "gomp-constants.h"
 
 
 static tree do_mpc_arg1 (tree, tree, int (*)(mpc_ptr, mpc_srcptr, mpc_rnd_t));
@@ -10230,27 +10229,6 @@ fold_builtin_1 (location_t loc, tree fnd
 	return build_empty_stmt (loc);
   break;
 
-case BUILT_IN_ACC_ON_DEVICE:
-  /* Don't fold on_device until we know which compiler is active.  */
-  if (symtab->state == EXPANSION)
-	{
-	  unsigned val_host = GOMP_DEVICE_HOST;
-	  unsigned val_dev = GOMP_DEVICE_NONE;
-
-#ifdef ACCEL_COMPILER
-	  val_host = GOMP_DEVICE_NOT_HOST;
-	  val_dev = ACCEL_COMPILER_acc_device;
-#endif
-	  tree host = build2 (EQ_EXPR, boolean_type_node, arg0,
-			  build_int_cst (integer_type_node, val_host));
-	  tree dev = build2 (EQ_EXPR, boolean_type_node, arg0,
-			 build_int_cst (integer_type_node, val_dev));
-
-	  tree result = build2 (TRUTH_OR_EXPR, boolean_type_node, host, dev);
-	  return fold_convert (integer_type_node, result);
-	}
-  break;
-
 default:
   break;
 }


Re: [PATCH] x86 interrupt attribute

2015-10-01 Thread Yulia Koval
Ok, here is the patch.

The interrupt and exception handlers are called by x86 processors.
X86 hardware pushes information onto stack and calls the handler.  The
requirements are

1. Both interrupt and exception handlers must use the 'IRET'
instruction, instead of the 'RET' instruction, to return from the
handlers.

2. All registers are callee-saved in interrupt and exception handlers.

3. The difference between interrupt and exception handlers is that the
exception handler must pop 'ERROR_CODE' off the stack before the
'IRET' instruction.

The design goals of interrupt and exception handlers for x86 processors
are:

1. Support both 32-bit and 64-bit modes.
2. Flexible for compilers to optimize.
3. Easy to use by programmers.

To implement interrupt and exception handlers for x86 processors, a
compiler should support:

'interrupt' attribute

Use this attribute to indicate that the specified function, with the
mandatory arguments described below, is an interrupt or exception handler.  The
compiler generates function entry and exit sequences suitable for use
in an interrupt handler when this attribute is present.  The 'IRET'
instruction, instead of the 'RET' instruction, is used to return from
interrupt or exception handlers.  All registers, except for the EFLAGS
register which is restored by the 'IRET' instruction, are preserved by
the compiler.

Any interruptible-without-stack-switch code must be compiled with
-mno-red-zone since interrupt handlers can and will, because of the
hardware design, touch the red zone.

1. An interrupt handler must be declared with a mandatory pointer argument:

struct interrupt_frame;

__attribute__ ((interrupt))
void
f (struct interrupt_frame *frame)
{
...
}

and the user must properly define the structure the pointer points to.

2. exception handler:

The exception handler is very similar to the interrupt handler with a
different mandatory function signature:

typedef unsigned long long int uword_t; /* in 64-bit mode */
typedef unsigned int uword_t;           /* in 32-bit mode */

struct interrupt_frame;

__attribute__ ((interrupt))
void
f (struct interrupt_frame *frame, uword_t error_code)
{
 ...
}

and the compiler pops the error code off the stack before the 'IRET' instruction.

The exception handler should only be used for exceptions that push an
error code; all other exceptions must use the interrupt handler.

The system will crash if the wrong handler is used.

Bootstrapped/regtested on Linux/x86_64 and Linux/i686.

Ok for trunk?

2015-09-29  Julia Koval 
  H.J. Lu 

gcc/
PR target/67630
PR target/67634
* config/i386/i386.c (ix86_frame): Add nbndregs and nmaskregs.
(ix86_nsaved_bndregs): New function.
(ix86_nsaved_maskregs): Likewise.
(ix86_reg_save_area_size): Likewise.
(ix86_nsaved_sseregs): Don't return 0 in interrupt handler.
(ix86_compute_frame_layout): Set nbndregs and nmaskregs.  Set
save_regs_using_mov to true to save bound and mask registers.
Call ix86_reg_save_area_size to get register save area size.
Allocate space to save full vector registers in
interrupt handler.
(ix86_emit_save_reg_using_mov): Set alignment to word_mode
alignment when saving full vector registers in
interrupt handler.
(ix86_emit_save_regs_using_mov): Use regno_reg_rtx to get
register size.
(ix86_emit_restore_regs_using_mov): Likewise.
(ix86_emit_save_sse_regs_using_mov): Save full vector
registers in interrupt handler.
(ix86_emit_restore_sse_regs_using_mov): Restore full vector
registers in interrupt handler.
(ix86_expand_epilogue): Use move to restore bound registers.
* config/i386/sse.md (*mov_internal): Handle misaligned
SSE load and store in interrupt handler.

PR target/66960
* config/i386/i386-protos.h (ix86_epilogue_uses): New
function declaration.
* config/i386/i386.c (ix86_epilogue_uses): New function.
(ix86_set_current_function): Set is_interrupt and is_exception.
Mark arguments in interrupt handler as used.
(ix86_function_ok_for_sibcall): Return false if in interrupt
handler.
(type_natural_mode): Don't warn ABI change for MMX in interrupt
handler.
(ix86_function_arg_advance): Skip for callee in interrupt
handler.
(ix86_function_arg): Handle arguments for callee in interrupt
handler.
(ix86_can_use_return_insn_p): Don't use `ret' instruction in
interrupt handler.
(ix86_save_reg): Preserve callee-saved and caller-saved
registers in interrupt handler if needed.
(ix86_expand_epilogue): Generate interrupt return for
  

[PATCH] Tune for lakemont

2015-10-01 Thread Yulia Koval
Hi,

The patch below contains some tuning changes for Lakemont, introduced
by H.J. Lu. Bootstrapped/regtested for Linux/x86_64. Ok for trunk?

* gcc/config/i386/x86-tune.def (X86_TUNE_USE_BT): Enable
for Lakemont.
(X86_TUNE_ZERO_EXTEND_WITH_AND): Disable for Lakemont.

Julia


0001-gcc-config-i386-x86-tune.def-X86_TUNE_USE_BT-Enable.patch
Description: Binary data


Re: [SH][committed] Improve treg_set_expr matching

2015-10-01 Thread Oleg Endo
On Mon, 2015-09-28 at 23:03 +0900, Oleg Endo wrote:
> Hi,
> 
> This patch has been hanging around in my queue for a while.  Basically,
> it uses reverse_condition to get better matching for treg_set_expr.
> Tested on sh-elf with
> make -k check RUNTESTFLAGS="--target_board=sh-sim
> \{-m2/-ml,-m2/-mb,-m2a/-mb,-m4/-ml,-m4/-mb,-m4a/-ml,-m4a/-mb}"
> and no new failures.
> Committed as r228202.

Attached is a small follow up patch that fixes a typo in one of the
tests.  Committed as r228332.

Cheers,
Oleg

gcc/testsuite/ChangeLog:
PR target/54236
* gcc.target/sh/pr54236-6.c: Fix assembler-not string.
Index: gcc/testsuite/gcc.target/sh/pr54236-6.c
===================================================================
--- gcc/testsuite/gcc.target/sh/pr54236-6.c	(revision 228244)
+++ gcc/testsuite/gcc.target/sh/pr54236-6.c	(working copy)
@@ -14,7 +14,7 @@
 /* { dg-final { scan-assembler-times {tst	#1,r0} 1 } }  */
 /* { dg-final { scan-assembler-times {subc	r} 1 } }  */
 
-/* { dg-final { scan-assembler-not "movt|not|neg\movrt" } }  */
+/* { dg-final { scan-assembler-not "movt|not\t|neg\t|movrt" } }  */
 
 
 struct inode


Re: [PATCH] Tune for lakemont

2015-10-01 Thread Uros Bizjak
On Thu, Oct 1, 2015 at 2:37 PM, Yulia Koval  wrote:
> Hi,
>
> The patch below contains some tuning changes for Lakemont, introduced
> by H.J. Lu. Bootstrapped/regtested for Linux/x86_64. Ok for trunk?
>
> * gcc/config/i386/x86-tune.def (X86_TUNE_USE_BT): Enable
> for Lakemont.
> (X86_TUNE_ZERO_EXTEND_WITH_AND): Disable for Lakemont.

Non-algorithmic tuning of various parameters, especially the ones in
x86-tune.def, is always OK and pre-approved as an "obvious" patch.

Thanks,
Uros.


[patch] Add counter inits to zero_iter_bb in expand_omp_for_init_counts

2015-10-01 Thread Tom de Vries

Hi,

this patch adds initialization in zero_iter_bb of counters introduced in 
expand_omp_for_init_counts.


This removes the need to set TREE_NO_WARNING on those counters.

Build on x86_64 and reg-tested with gomp.exp and target-libgomp c.exp.

OK for trunk, if bootstrap and reg-test on x86_64 succeeds?

Thanks,
- Tom
Add counter inits to zero_iter_bb in expand_omp_for_init_counts

2015-10-01  Tom de Vries  

	* omp-low.c (expand_omp_for_init_counts): Add inits for counters in
	zero_iter_bb.
	(expand_omp_for_generic): Remove TREE_NO_WARNING settings on counters.

	* gcc.dg/gomp/collapse-2.c: New test.
---
 gcc/omp-low.c  | 26 +++---
 gcc/testsuite/gcc.dg/gomp/collapse-2.c | 19 +++
 2 files changed, 38 insertions(+), 7 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/gomp/collapse-2.c

diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index 8bcad08..8181757 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -5732,6 +5732,7 @@ expand_omp_for_init_counts (struct omp_for_data *fd, gimple_stmt_iterator *gsi,
   return;
 }
 
+  bool created_zero_iter_bb = false;
   for (i = 0; i < fd->collapse; i++)
 {
   tree itype = TREE_TYPE (fd->loops[i].v);
@@ -5774,6 +5775,7 @@ expand_omp_for_init_counts (struct omp_for_data *fd, gimple_stmt_iterator *gsi,
 	  gsi_insert_before (gsi, assign_stmt, GSI_SAME_STMT);
 	  set_immediate_dominator (CDI_DOMINATORS, zero_iter_bb,
    entry_bb);
+	  created_zero_iter_bb = true;
 	}
 	  ne = make_edge (entry_bb, zero_iter_bb, EDGE_FALSE_VALUE);
 	  ne->probability = REG_BR_PROB_BASE / 2000 - 1;
@@ -5826,6 +5828,23 @@ expand_omp_for_init_counts (struct omp_for_data *fd, gimple_stmt_iterator *gsi,
 	  expand_omp_build_assign (gsi, fd->loop.n2, t);
 	}
 }
+
+  if (created_zero_iter_bb)
+{
+  gimple_stmt_iterator gsi = gsi_after_labels (zero_iter_bb);
+  /* At the moment counts[0] doesn't seem to be used beyond zero_iter_bb,
+	 but for robustness' sake we include that one as well.  */
+  for (i = 0; i < fd->collapse; i++)
+	{
+	  tree var = counts[i];
+	  if (!SSA_VAR_P (var))
+	continue;
+
+	  tree zero = build_zero_cst (type);
+	  gassign *assign_stmt = gimple_build_assign (var, zero);
+	  gsi_insert_before (&gsi, assign_stmt, GSI_SAME_STMT);
+	}
+}
 }
 
 
@@ -6116,7 +6135,6 @@ expand_omp_for_generic (struct omp_region *region,
   bool broken_loop = region->cont == NULL;
   edge e, ne;
   tree *counts = NULL;
-  int i;
 
   gcc_assert (!broken_loop || !in_combined_parallel);
   gcc_assert (fd->iter_type == long_integer_type_node
@@ -6185,12 +6203,6 @@ expand_omp_for_generic (struct omp_region *region,
 
   if (zero_iter_bb)
 	{
-	  /* Some counts[i] vars might be uninitialized if
-	 some loop has zero iterations.  But the body shouldn't
-	 be executed in that case, so just avoid uninit warnings.  */
-	  for (i = first_zero_iter; i < fd->collapse; i++)
-	if (SSA_VAR_P (counts[i]))
-	  TREE_NO_WARNING (counts[i]) = 1;
 	  gsi_prev (&gsi);
 	  e = split_block (entry_bb, gsi_stmt (gsi));
 	  entry_bb = e->dest;
diff --git a/gcc/testsuite/gcc.dg/gomp/collapse-2.c b/gcc/testsuite/gcc.dg/gomp/collapse-2.c
new file mode 100644
index 000..5319f89
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/gomp/collapse-2.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fopenmp -fdump-tree-ssa" } */
+
+#define N 100
+
+int a[N][N];
+
+void
+foo (int m, int n)
+{
+  int i, j;
+#pragma omp parallel
+#pragma omp for collapse(2) schedule (runtime)
+  for (i = 0; i < m; i++)
+for (j = 0; j < n; j++)
+  a[i][j] = 1;
+}
+
+/* { dg-final { scan-tree-dump-not "(?n)PHI.*count.*\\(D\\)" "ssa" } } */
-- 
1.9.1



Re: Fold acc_on_device

2015-10-01 Thread Richard Biener
On Thu, Oct 1, 2015 at 2:33 PM, Nathan Sidwell  wrote:
> On 10/01/15 06:03, Richard Biener wrote:
>>
>> On Wed, Sep 30, 2015 at 9:22 PM, Jakub Jelinek  wrote:
>
>
>>> Wouldn't it be better to just emit GIMPLE here instead?
>>> So
>>>tree res = make_ssa_name (boolean_type_node);
>>>gimple g = gimple_build_assign (res, EQ_EXPR, arg0,
>>>build_int_cst (integer_type_node,
>>> val_host));
>>>gsi_insert_before (gsi, g);
>>> ...
>
>
> Like this?

+  gimple *host_ass = gimple_build_assign
+(host_eq, EQ_EXPR, arg0, build_int_cst (integer_type_node, val_host));

use TREE_TYPE (arg0) for the integer cst.

Otherwise looks good to me.

Thanks,
Richard.

> nathan


Re: [patch] Add counter inits to zero_iter_bb in expand_omp_for_init_counts

2015-10-01 Thread Jakub Jelinek
On Thu, Oct 01, 2015 at 02:46:01PM +0200, Tom de Vries wrote:
> this patch adds initialization in zero_iter_bb of counters introduced in
> expand_omp_for_init_counts.
> 
> This removes the need to set TREE_NO_WARNING on those counters.

Why do you think it is a good idea?  I'd be afraid it slows things down
unnecessarily.  Furthermore, I'd prefer not to change this area of code before
gomp-4_1-branch is merged, as it will be a nightmare for the merge
otherwise.

Jakub


[PATCH, testsuite]: Skip gcc.dg/lto/pr55113_0.c on all x86 targets.

2015-10-01 Thread Uros Bizjak
Hello!

There is no point in using -fshort-double on x86 targets, although it
works with -mno-sse, where it avoids construction of DFmode-based
vector builtins.

So, disable the nonsensical test on all x86 targets.

2015-10-01  Uros Bizjak  

* gcc.dg/lto/pr55113_0.c: Skip on all x86 targets.

Tested on x86_64-linux-gnu {,-m32} and committed to mainline SVN.

Uros.

Index: gcc.dg/lto/pr55113_0.c
===================================================================
--- gcc.dg/lto/pr55113_0.c  (revision 228326)
+++ gcc.dg/lto/pr55113_0.c  (working copy)
@@ -1,8 +1,7 @@
 /* PR 55113 */
 /* { dg-lto-do link } */
 /* { dg-lto-options { { -flto -fshort-double -O0 } } }*/
-/* { dg-skip-if "PR60410" { x86_64-*-* || { i?86-*-* && lp64 } } } */
-/* { dg-skip-if "PR60410" { i?86-*-solaris2.1[0-9]* } } */
+/* { dg-skip-if "PR60410" { i?86-*-* x86_64-*-* } } */

 int
 main(void)


[PATCH][AArch64] Don't allow -mgeneral-regs-only to change the .arch assembler directives

2015-10-01 Thread Kyrill Tkachov

Hi all,

As part of the SWITCHABLE_TARGET work I inadvertently changed the behaviour of 
-mgeneral-regs-only with respect to the .arch directives that we emit.
The behaviour of -mgeneral-regs-only in GCC 5 and earlier is such that it 
disallows the usage of FP/SIMD registers but does *not* stop the compiler from
emitting the +fp,+simd etc extensions in the .arch directive of the generated 
assembly. This is to accommodate users who may want to write inline assembly
in a file compiled with -mgeneral-regs-only.

This patch restores the trunk behaviour in that respect to that of GCC 5 and 
the documentation for the option is tweaked a bit to reflect that.
Bootstrapped and tested on aarch64.

Ok for trunk?

Thanks,
Kyrill

2015-10-01  Kyrylo Tkachov  

* config/aarch64/aarch64.c (aarch64_override_options_internal):
Do not alter target_flags due to TARGET_GENERAL_REGS_ONLY_P.
* doc/invoke.texi (AArch64 options): Mention that -mgeneral-regs-only
does not affect the assembler directives.

2015-10-01  Kyrylo Tkachov  

* gcc.target/aarch64/mgeneral-regs_4.c: New test.
commit bd99347f0dad9346dc16ffc13cd423a4889ae339
Author: Kyrylo Tkachov 
Date:   Fri Sep 11 09:40:44 2015 +0100

[AArch64] Don't allow -mgeneral-regs-only to change the .arch assembler directives

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 115c3a7..81e0eb0 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -7658,19 +7658,6 @@ aarch64_override_options_internal (struct gcc_options *opts)
   if (opts->x_flag_strict_volatile_bitfields < 0 && abi_version_at_least (2))
 opts->x_flag_strict_volatile_bitfields = 1;
 
-  /* -mgeneral-regs-only sets a mask in target_flags, make sure that
- aarch64_isa_flags does not contain the FP/SIMD/Crypto feature flags
- in case some code tries reading aarch64_isa_flags directly to check if
- FP is available.  Reuse the aarch64_parse_extension machinery since it
- knows how to disable any other flags that fp implies.  */
-  if (TARGET_GENERAL_REGS_ONLY_P (opts->x_target_flags))
-{
-  /* aarch64_parse_extension takes char* rather than const char* because
-	 it is usually called from within other parsing functions.  */
-  char tmp_str[] = "+nofp";
-  aarch64_parse_extension (tmp_str, &opts->x_aarch64_isa_flags);
-}
-
   initialize_aarch64_code_model (opts);
   initialize_aarch64_tls_size (opts);
 
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 547ee2d..e8067f2 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -12304,10 +12304,9 @@ Generate big-endian code.  This is the default when GCC is configured for an
 
 @item -mgeneral-regs-only
 @opindex mgeneral-regs-only
-Generate code which uses only the general-purpose registers.  This is equivalent
-to feature modifier @option{nofp} of @option{-march} or @option{-mcpu}, except
-that @option{-mgeneral-regs-only} takes precedence over any conflicting feature
-modifier regardless of sequence.
+Generate code which uses only the general-purpose registers.  This will prevent
+the compiler from using floating-point and Advanced SIMD registers but will not
+impose any restrictions on the assembler.
 
 @item -mlittle-endian
 @opindex mlittle-endian
diff --git a/gcc/testsuite/gcc.target/aarch64/mgeneral-regs_4.c b/gcc/testsuite/gcc.target/aarch64/mgeneral-regs_4.c
new file mode 100644
index 000..8eb50aa
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/mgeneral-regs_4.c
@@ -0,0 +1,9 @@
+/* { dg-options "-mgeneral-regs-only -march=armv8-a+simd+fp -O2" } */
+
+int
+test (void)
+{
+  return 1;
+}
+
+/* { dg-final { scan-assembler "\.arch.*fp.*simd" } } */


[PATCH][RTL ifcvt] PR 67786, 67787: Check that intermediate instructions in the basic block don't clobber a reg used in condition

2015-10-01 Thread Kyrill Tkachov

Hi all,

This patch fixes the two wrong-code PRs.
The problem is related to the way the noce_emit_cmove helper function emits 
conditional moves.
For some targets it re-emits the comparison from the condition block and then 
the conditional move
after we have emitted the two basic blocks. Later passes always catch the 
redundant comparison and eliminate
it anyway.  However, this means that if any of the basic blocks clobbers a
register that is used in that comparison, the comparison will go wrong.

This happens in the testcase where one of the intermediate insns in the basic 
block re-used a pseudo reg
that was used in the comparison to store an intermediate result. When the 
comparison was re-emitted by
noce_emit_cmove later, it used the clobbered pseudo reg.

There's no reason why the basic block should have used that pseudo-reg and not 
just used a fresh one,
but that's what the previous passes produced (this takes place in ce2 after 
combine) and RTL is not
in SSA form.

Anyway, the simple way to deal with this is in bb_valid_for_noce_process_p to 
reject a SET destination that
appears in the cond expression.

This patch fixes the testcases and bootstrap and testing passes on arm, x86_64 
and aarch64.

Ok for trunk?

Thanks,
Kyrill

2015-10-01  Kyrylo Tkachov  

PR rtl-optimization/67786
PR rtl-optimization/67787
* ifcvt.c (bb_valid_for_noce_process_p): Reject basic block if
it modifies a reg used in the condition calculation.

2015-10-01  Kyrylo Tkachov  

* gcc.dg/pr67786.c: New test.
* gcc.dg/pr67787.c: Likewise.
commit ee8b9f163dad61e43f9f53f1e6c4e224a3712095
Author: Kyrylo Tkachov 
Date:   Thu Oct 1 09:37:27 2015 +0100

[RTL ifcvt] PR 67786, 67787: Check that intermediate instructions in the basic block don't clobber a reg used in condition

diff --git a/gcc/ifcvt.c b/gcc/ifcvt.c
index f280c64..8846e69 100644
--- a/gcc/ifcvt.c
+++ b/gcc/ifcvt.c
@@ -3110,7 +3110,8 @@ bb_valid_for_noce_process_p (basic_block test_bb, rtx cond,
 	  gcc_assert (sset);
 
 	  if (contains_mem_rtx_p (SET_SRC (sset))
-	  || !REG_P (SET_DEST (sset)))
+	  || !REG_P (SET_DEST (sset))
+	  || reg_overlap_mentioned_p (SET_DEST (sset), cond))
 	goto free_bitmap_and_fail;
 
 	  potential_cost += insn_rtx_cost (sset, speed_p);
diff --git a/gcc/testsuite/gcc.dg/pr67786.c b/gcc/testsuite/gcc.dg/pr67786.c
new file mode 100644
index 000..76525e5
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr67786.c
@@ -0,0 +1,20 @@
+/* { dg-do run } */
+/* { dg-options "-O3" } */
+
+int a, b = 10;
+char c;
+
+int
+main ()
+{
+  char d;
+  int e = 5;
+  for (a = 0; a; a--)
+e = 0;
+  c = (b & 15) ^ e;
+  d = c > e ? c : c << e;
+  __builtin_printf ("%d\n", d);
+  return 0;
+}
+
+/* { dg-output "15" } */
diff --git a/gcc/testsuite/gcc.dg/pr67787.c b/gcc/testsuite/gcc.dg/pr67787.c
new file mode 100644
index 000..238d7e3
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr67787.c
@@ -0,0 +1,43 @@
+/* { dg-do run } */
+/* { dg-options "-O3" } */
+
+int a, c, f, g;
+char b;
+
+static int
+fn1 ()
+{
+  char h;
+  int k = -1, i, j;
+  for (; b < 16; b++)
+;
+  __builtin_printf (" ");
+  if (b < 5)
+k++;
+  if (k)
+{
+  int l = 2;
+  a = h = b < 0 || b > (127 >> l) ? b : b << 1;
+  return 0;
+}
+  for (i = 0; i < 1; i++)
+for (j = 0; j < 7; j++)
+  f = 0;
+  for (c = 0; c; c++)
+;
+  if (g)
+for (;;)
+  ;
+  return 0;
+}
+
+int
+main ()
+{
+  fn1 ();
+
+  if (a != 32)
+__builtin_abort ();
+
+  return 0;
+}


Re: [PATCH] Clear flow-sensitive info in phiopt (PR tree-optimization/67769)

2015-10-01 Thread Marek Polacek
On Thu, Oct 01, 2015 at 09:57:54AM +0200, Richard Biener wrote:
> On Wed, 30 Sep 2015, Marek Polacek wrote:
> 
> > Another instance of out of date SSA range info.  Before phiopt1 we had
> > 
> >   :
> >   if (N_2(D) >= 0)
> > goto ;
> >   else
> > goto ;
> > 
> >   :
> >   iftmp.0_3 = MIN_EXPR ;
> > 
> >   :
> >   # iftmp.0_5 = PHI <0(2), iftmp.0_3(3)>
> >   value_4 = (short int) iftmp.0_5;
> >   return value_4;
> > 
> > and after phiop1:
> > 
> >   :
> >   iftmp.0_3 = MIN_EXPR ;
> >   iftmp.0_6 = MAX_EXPR ;
> >   value_4 = (short int) iftmp.0_6;
> >   return value_4;
> > 
> > But the flow-sensitive info in this BB hasn't been cleared up.
> > 
> > This problem doesn't show up in GCC5 but might be latent there.
> > 
> > Bootstrapped/regtested on x86_64-linux, ok for trunk and 5 as well?
> > 
> > 2015-09-30  Marek Polacek  
> > 
> > PR tree-optimization/67769
> > * tree-ssa-phiopt.c (tree_ssa_phiopt_worker): Call
> > reset_flow_sensitive_info_in_bb when changing the CFG.
> > 
> > * gcc.dg/torture/pr67769.c: New test.
> > 
> > diff --git gcc/testsuite/gcc.dg/torture/pr67769.c 
> > gcc/testsuite/gcc.dg/torture/pr67769.c
> > index e69de29..c1d17c3 100644
> > --- gcc/testsuite/gcc.dg/torture/pr67769.c
> > +++ gcc/testsuite/gcc.dg/torture/pr67769.c
> > @@ -0,0 +1,23 @@
> > +/* { dg-do run } */
> > +
> > +static int
> > +clamp (int x, int lo, int hi)
> > +{
> > +  return (x < lo) ? lo : ((x > hi) ? hi : x);
> > +}
> > +
> > +__attribute__ ((noinline))
> > +short
> > +foo (int N)
> > +{
> > +  short value = clamp (N, 0, 16);
> > +  return value;
> > +}
> > +
> > +int
> > +main ()
> > +{
> > +  if (foo (-5) != 0)
> > +__builtin_abort ();
> > +  return 0;
> > +}
> > diff --git gcc/tree-ssa-phiopt.c gcc/tree-ssa-phiopt.c
> > index 37fdf28..101988a 100644
> > --- gcc/tree-ssa-phiopt.c
> > +++ gcc/tree-ssa-phiopt.c
> > @@ -338,6 +338,8 @@ tree_ssa_phiopt_worker (bool do_store_elim, bool 
> > do_hoist_loads)
> >   else if (minmax_replacement (bb, bb1, e1, e2, phi, arg0, arg1))
> > cfgchanged = true;
> > }
> > +  if (cfgchanged)
> > +   reset_flow_sensitive_info_in_bb (bb);
> 
> That's a bit conservative.  I believe most PHI opt transforms should
> be fine as the conditionally executed blocks did not contain any
> stmts that prevail.  The merge PHI also should have valid range info.

Aha.  So would resetting the flow info at the end of conditional_replacement,
minmax_replacement, and abs_replacement work for you?  As in the untested
patch.  Or even somewhere else?

Yeah, I thought I'd rather be conservative here...
 
> So I don't think the patch is good as-is.  Please consider reverting
> if you already applied it.

Luckily I have not ;).

2015-10-01  Marek Polacek  

PR tree-optimization/67769
* tree-ssa-phiopt.c (conditional_replacement): Call
reset_flow_sensitive_info_in_bb.
(minmax_replacement): Likewise.
(abs_replacement): Likewise.

* gcc.dg/torture/pr67769.c: New test.

diff --git gcc/testsuite/gcc.dg/torture/pr67769.c 
gcc/testsuite/gcc.dg/torture/pr67769.c
index e69de29..c1d17c3 100644
--- gcc/testsuite/gcc.dg/torture/pr67769.c
+++ gcc/testsuite/gcc.dg/torture/pr67769.c
@@ -0,0 +1,23 @@
+/* { dg-do run } */
+
+static int
+clamp (int x, int lo, int hi)
+{
+  return (x < lo) ? lo : ((x > hi) ? hi : x);
+}
+
+__attribute__ ((noinline))
+short
+foo (int N)
+{
+  short value = clamp (N, 0, 16);
+  return value;
+}
+
+int
+main ()
+{
+  if (foo (-5) != 0)
+__builtin_abort ();
+  return 0;
+}
diff --git gcc/tree-ssa-phiopt.c gcc/tree-ssa-phiopt.c
index 37fdf28..697836a 100644
--- gcc/tree-ssa-phiopt.c
+++ gcc/tree-ssa-phiopt.c
@@ -646,6 +646,7 @@ conditional_replacement (basic_block cond_bb, basic_block 
middle_bb,
 }
 
   replace_phi_edge_with_variable (cond_bb, e1, phi, new_var);
+  reset_flow_sensitive_info_in_bb (cond_bb);
 
   /* Note that we optimized this PHI.  */
   return true;
@@ -1284,6 +1285,8 @@ minmax_replacement (basic_block cond_bb, basic_block 
middle_bb,
   gsi_insert_before (&gsi, new_stmt, GSI_NEW_STMT);
 
   replace_phi_edge_with_variable (cond_bb, e1, phi, result);
+  reset_flow_sensitive_info_in_bb (cond_bb);
+
   return true;
 }
 
@@ -1402,6 +1405,7 @@ abs_replacement (basic_block cond_bb, basic_block 
middle_bb,
 }
 
   replace_phi_edge_with_variable (cond_bb, e1, phi, result);
+  reset_flow_sensitive_info_in_bb (cond_bb);
 
   /* Note that we optimized this PHI.  */
   return true;

Marek


Re: [PATCH] fortran/67758 -- Prevent ICE caused by misplaced COMMON

2015-10-01 Thread Mikael Morin

On 01/10/2015 14:16, Mikael Morin wrote:

On 01/10/2015 02:07, Steve Kargl wrote:

On Wed, Sep 30, 2015 at 05:06:30PM -0700, Steve Kargl wrote:

Patch built and regression tested on x86_64-*-freebsd.
OK to commit?

The patch prevents the dereferencing of a NULL pointer
by jumping out of the cleanup of a list of COMMON blocks.


Hold on, I believe p should be present in the common symbol list pointed
by p->common.

s/p->common/p->common_block/

And by the way, if we are in gfc_restore_last_undo_checkpoint, we have
found something bogus enough to backtrack, so hopefully an error has
already been prepared (but maybe not emitted).
I will investigate more.

It seems the error [1] is reported in gfc_add_in_common, between the 
time the symbol's common_block pointer is set and the time the symbol is 
added to the list.
As the program goes straight to clean-up/return upon error, this interim 
state is not fixed and poses a problem.


So we need to reduce the interim time to zero or fix the state upon error.
I propose the following, which delays setting the common_block pointer
until after error checking (I believe it is not used in that interval).


Regression-tested on x86_64-unknown-linux-gnu. OK for trunk?

Mikael


[1] Error: PROCEDURE attribute conflicts with COMMON attribute in 'xx' 
at (1)


2015-10-01  Mikael Morin  

PR fortran/67758
* match.c (gfc_match_common): Delay the common_block pointer
assignment after error checking.

2015-10-01  Mikael Morin  

PR fortran/67758
* gfortran.dg/common_24.f: New.
Index: match.c
===================================================================
--- match.c	(revision 228170)
+++ match.c	(working copy)
@@ -4330,10 +4330,6 @@ gfc_match_common (void)
 	  if (m == MATCH_NO)
 	goto syntax;
 
-  /* Store a ref to the common block for error checking.  */
-  sym->common_block = t;
-  sym->common_block->refs++;
-
   /* See if we know the current common block is bind(c), and if
  so, then see if we can check if the symbol is (which it'll
  need to be).  This can happen if the bind(c) attr stmt was
@@ -4379,6 +4375,10 @@ gfc_match_common (void)
 	  if (!gfc_add_in_common (&sym->attr, sym->name, NULL))
 	goto cleanup;
 
+  /* Store a ref to the common block for error checking.  */
+  sym->common_block = t;
+  sym->common_block->refs++;
+
 	  if (tail != NULL)
 	tail->common_next = sym;
 	  else

c { dg-do compile }
c PR fortran/67758
c
c Check the absence of ICE after emitting the error message
c
c Contributed by Ilya Enkovich 

  COMMON /FMCOM / X(80 000 000)
  CALL T(XX(A))
  COMMON /FMCOM / XX(80 000 000) ! { dg-error "conflicts with COMMON" }
  END



Re: [PATCH] Clear flow-sensitive info in phiopt (PR tree-optimization/67769)

2015-10-01 Thread Marek Polacek
On Thu, Oct 01, 2015 at 03:26:34PM +0200, Richard Biener wrote:
> No, this looks fine.

Thanks.  Let me do proper testing then.  (And I suppose we might want this 
in gcc-5 as well.)

Marek


Re: [PATCH] Clear flow-sensitive info in phiopt (PR tree-optimization/67769)

2015-10-01 Thread Richard Biener
On Thu, 1 Oct 2015, Marek Polacek wrote:

> On Thu, Oct 01, 2015 at 09:57:54AM +0200, Richard Biener wrote:
> > On Wed, 30 Sep 2015, Marek Polacek wrote:
> > 
> > > Another instance of out of date SSA range info.  Before phiopt1 we had
> > > 
> > >   :
> > >   if (N_2(D) >= 0)
> > > goto ;
> > >   else
> > > goto ;
> > > 
> > >   :
> > >   iftmp.0_3 = MIN_EXPR ;
> > > 
> > >   :
> > >   # iftmp.0_5 = PHI <0(2), iftmp.0_3(3)>
> > >   value_4 = (short int) iftmp.0_5;
> > >   return value_4;
> > > 
> > > and after phiop1:
> > > 
> > >   :
> > >   iftmp.0_3 = MIN_EXPR ;
> > >   iftmp.0_6 = MAX_EXPR ;
> > >   value_4 = (short int) iftmp.0_6;
> > >   return value_4;
> > > 
> > > But the flow-sensitive info in this BB hasn't been cleared up.
> > > 
> > > This problem doesn't show up in GCC5 but might be latent there.
> > > 
> > > Bootstrapped/regtested on x86_64-linux, ok for trunk and 5 as well?
> > > 
> > > 2015-09-30  Marek Polacek  
> > > 
> > >   PR tree-optimization/67769
> > >   * tree-ssa-phiopt.c (tree_ssa_phiopt_worker): Call
> > >   reset_flow_sensitive_info_in_bb when changing the CFG.
> > > 
> > >   * gcc.dg/torture/pr67769.c: New test.
> > > 
> > > diff --git gcc/testsuite/gcc.dg/torture/pr67769.c 
> > > gcc/testsuite/gcc.dg/torture/pr67769.c
> > > index e69de29..c1d17c3 100644
> > > --- gcc/testsuite/gcc.dg/torture/pr67769.c
> > > +++ gcc/testsuite/gcc.dg/torture/pr67769.c
> > > @@ -0,0 +1,23 @@
> > > +/* { dg-do run } */
> > > +
> > > +static int
> > > +clamp (int x, int lo, int hi)
> > > +{
> > > +  return (x < lo) ? lo : ((x > hi) ? hi : x);
> > > +}
> > > +
> > > +__attribute__ ((noinline))
> > > +short
> > > +foo (int N)
> > > +{
> > > +  short value = clamp (N, 0, 16);
> > > +  return value;
> > > +}
> > > +
> > > +int
> > > +main ()
> > > +{
> > > +  if (foo (-5) != 0)
> > > +__builtin_abort ();
> > > +  return 0;
> > > +}
> > > diff --git gcc/tree-ssa-phiopt.c gcc/tree-ssa-phiopt.c
> > > index 37fdf28..101988a 100644
> > > --- gcc/tree-ssa-phiopt.c
> > > +++ gcc/tree-ssa-phiopt.c
> > > @@ -338,6 +338,8 @@ tree_ssa_phiopt_worker (bool do_store_elim, bool 
> > > do_hoist_loads)
> > > else if (minmax_replacement (bb, bb1, e1, e2, phi, arg0, arg1))
> > >   cfgchanged = true;
> > >   }
> > > +  if (cfgchanged)
> > > + reset_flow_sensitive_info_in_bb (bb);
> > 
> > That's a bit conservative.  I believe most PHI opt transforms should
> > be fine as the conditionally executed blocks did not contain any
> > stmts that prevail.  The merge PHI also should have valid range info.
> 
> Aha.  So would resetting the flow info at the end of conditional_replacement,
> minmax_replacement, and abs_replacement work for you?  As in the untested
> patch.  Or even somewhere else?

No, this looks fine.

Thanks,
Richard.

> Yeah, I thought I'd rather be conservative here...
>  
> > So I don't think the patch is good as-is.  Please consider reverting
> > if you already applied it.
> 
> Luckily I have not ;).
> 
> 2015-10-01  Marek Polacek  
> 
>   PR tree-optimization/67769
>   * tree-ssa-phiopt.c (conditional_replacement): Call
>   reset_flow_sensitive_info_in_bb.
>   (minmax_replacement): Likewise.
>   (abs_replacement): Likewise.
> 
>   * gcc.dg/torture/pr67769.c: New test.
> 
> diff --git gcc/testsuite/gcc.dg/torture/pr67769.c 
> gcc/testsuite/gcc.dg/torture/pr67769.c
> index e69de29..c1d17c3 100644
> --- gcc/testsuite/gcc.dg/torture/pr67769.c
> +++ gcc/testsuite/gcc.dg/torture/pr67769.c
> @@ -0,0 +1,23 @@
> +/* { dg-do run } */
> +
> +static int
> +clamp (int x, int lo, int hi)
> +{
> +  return (x < lo) ? lo : ((x > hi) ? hi : x);
> +}
> +
> +__attribute__ ((noinline))
> +short
> +foo (int N)
> +{
> +  short value = clamp (N, 0, 16);
> +  return value;
> +}
> +
> +int
> +main ()
> +{
> +  if (foo (-5) != 0)
> +__builtin_abort ();
> +  return 0;
> +}
> diff --git gcc/tree-ssa-phiopt.c gcc/tree-ssa-phiopt.c
> index 37fdf28..697836a 100644
> --- gcc/tree-ssa-phiopt.c
> +++ gcc/tree-ssa-phiopt.c
> @@ -646,6 +646,7 @@ conditional_replacement (basic_block cond_bb, basic_block 
> middle_bb,
>  }
>  
>replace_phi_edge_with_variable (cond_bb, e1, phi, new_var);
> +  reset_flow_sensitive_info_in_bb (cond_bb);
>  
>/* Note that we optimized this PHI.  */
>return true;
> @@ -1284,6 +1285,8 @@ minmax_replacement (basic_block cond_bb, basic_block 
> middle_bb,
>gsi_insert_before (&gsi, new_stmt, GSI_NEW_STMT);
>  
>replace_phi_edge_with_variable (cond_bb, e1, phi, result);
> +  reset_flow_sensitive_info_in_bb (cond_bb);
> +
>return true;
>  }
>  
> @@ -1402,6 +1405,7 @@ abs_replacement (basic_block cond_bb, basic_block 
> middle_bb,
>  }
>  
>replace_phi_edge_with_variable (cond_bb, e1, phi, result);
> +  reset_flow_sensitive_info_in_bb (cond_bb);
>  
>/* Note that we optimized this PHI.  */
>return true;
> 
>   Marek
> 
> 
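For reference, the clamp diamond in the testcase is exactly the shape that minmax_replacement collapses into straight-line MIN/MAX operations. A minimal sketch of the equivalence the transform relies on (helper names are illustrative, not GCC internals):

```c
#include <assert.h>

/* The branchy clamp from the testcase: a PHI-node diamond before phiopt.  */
int clamp_branchy (int x, int lo, int hi)
{
  return (x < lo) ? lo : ((x > hi) ? hi : x);
}

/* Straight-line MIN/MAX form of the kind minmax_replacement produces.  */
int min_i (int a, int b) { return a < b ? a : b; }
int max_i (int a, int b) { return a > b ? a : b; }

int clamp_minmax (int x, int lo, int hi)
{
  return max_i (lo, min_i (x, hi));
}
```

After the rewrite the conditionally executed blocks are gone, which is why flow-sensitive range info derived under the old CFG has to be reset.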

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
21284 (AG

Re: [PATCH] Clear flow-sensitive info in phiopt (PR tree-optimization/67769)

2015-10-01 Thread Richard Biener
On Thu, 1 Oct 2015, Marek Polacek wrote:

> On Thu, Oct 01, 2015 at 03:26:34PM +0200, Richard Biener wrote:
> > No, this looks fine.
> 
> Thanks.  Let me do proper testing then.  (And I suppose we might want this 
> in gcc-5 as well.)

Yes.  All the bugs are latent in GCC 5 (and also 4.9); they are just
now exposed more easily by trusting the ranges in VRP2 (which was really
meant as a way to catch these bugs, not so much to improve VRP2).

Richard.


Re: [patch] Add counter inits to zero_iter_bb in expand_omp_for_init_counts

2015-10-01 Thread Tom de Vries

On 01/10/15 14:49, Jakub Jelinek wrote:

On Thu, Oct 01, 2015 at 02:46:01PM +0200, Tom de Vries wrote:

this patch adds initialization in zero_iter_bb of counters introduced in
expand_omp_for_init_counts.

This removes the need to set TREE_NO_WARNING on those counters.


Why do you think it is a good idea?


In replace_ssa_name, I've recently added the assert:
...
  gcc_assert (!SSA_NAME_IS_DEFAULT_DEF (name));
...

On the gomp-4_0-branch, this assert triggers for a collapsed acc loop, 
which uses expand_omp_for_generic for omp-expansion.  The assert 
triggers because (some of) the counters added by 
expand_omp_for_init_counts are not initialized on all paths.


On trunk, for the test-case in the patch, this assert doesn't trigger 
because the omp function is split off before ssa.



I'd be afraid it slows things down unnecessarily.


I think zero_iter_bb is a block that is expected not to be executed 
frequently.


I've attached an sdiff of x86_64 assembly for the test-case (before 
left, after right). AFAICT, this patch has the effect that it speeds up 
the frequent path with one instruction.



 Furthermore, I'd prefer not to change this area of code before
gomp-4_1-branch is merged, as it will be a nightmare for the merge
otherwise.


Committing to gomp-4_0-branch for now would work for me.

Thanks,
- Tom

.file   "collapse.c".file   "collapse.c"
.text   .text
.p2align 4,,15  .p2align 4,,15
.type   foo._omp_fn.0, @funct   .type   foo._omp_fn.0, @funct
foo._omp_fn.0:  foo._omp_fn.0:
.LFB12: .LFB12:
.cfi_startproc  .cfi_startproc
pushq   %rbppushq   %rbp
.cfi_def_cfa_offset 16  .cfi_def_cfa_offset 16
.cfi_offset 6, -16  .cfi_offset 6, -16
pushq   %rbxpushq   %rbx
.cfi_def_cfa_offset 24  .cfi_def_cfa_offset 24
.cfi_offset 3, -24  .cfi_offset 3, -24
xorl%esi, %esi<
subq$24, %rsp   subq$24, %rsp
.cfi_def_cfa_offset 48  .cfi_def_cfa_offset 48
movl(%rdi), %eaxmovl(%rdi), %eax
movl4(%rdi), %ebp   movl4(%rdi), %ebp
testl   %eax, %eax  testl   %eax, %eax
jle .L8   | jle .L9
testl   %ebp, %ebp  testl   %ebp, %ebp
jle .L8   | jle .L9
movslq  %ebp, %rbx  movslq  %ebp, %rbx
movslq  %eax, %rsi  movslq  %eax, %rsi
imulq   %rbx, %rsi  imulq   %rbx, %rsi
.L8:.L8:
leaq8(%rsp), %r8leaq8(%rsp), %r8
xorl%edi, %edi  xorl%edi, %edi
movq%rsp, %rcx  movq%rsp, %rcx
movl$1, %edxmovl$1, %edx
callGOMP_loop_runtime_sta   callGOMP_loop_runtime_sta
testb   %al, %altestb   %al, %al
jne .L10jne .L10
.L6:.L6:
callGOMP_loop_end_nowaitcallGOMP_loop_end_nowait
addq$24, %rsp   addq$24, %rsp
.cfi_remember_state .cfi_remember_state
.cfi_def_cfa_offset 24  .cfi_def_cfa_offset 24
popq%rbxpopq%rbx
.cfi_def_cfa_offset 16  .cfi_def_cfa_offset 16
popq%rbppopq%rbp
.cfi_def_cfa_offset 8   .cfi_def_cfa_offset 8
ret ret
.p2align 4,,10  .p2align 4,,10
.p2align 3  .p2align 3
.L15:   .L15:
.cfi_restore_state  .cfi_restore_state
leaq8(%rsp), %rsi   leaq8(%rsp), %rsi
movq%rsp, %rdi  movq%rsp, %rdi
callGOMP_loop_runtime_nex   callGOMP_loop_runtime_nex
testb   %al, %altestb   %al, %al
je  .L6 je  .L6
.L10:   .L10:
movq(%rsp), %rsimovq(%rsp), %rsi
movq8(%rsp), %r9movq8(%rsp), %r9
movq%rsi, %rax  movq%rsi, %rax
cqtocqto
idivq   %rbx 

Re: [PATCH] Tune for lakemont

2015-10-01 Thread H.J. Lu
On Thu, Oct 1, 2015 at 5:42 AM, Uros Bizjak  wrote:
> On Thu, Oct 1, 2015 at 2:37 PM, Yulia Koval  wrote:
>> Hi,
>>
>> The patch below contains some tuning changes for Lakemont, introduced
>> by H.J. Lu. Bootstraped/regtested for Linux/x86_64. Ok for trunk?
>>
>> * gcc/config/i386/x86-tune.def (X86_TUNE_USE_BT): Enable
>> for Lakemont.
>> (X86_TUNE_ZERO_EXTEND_WITH_AND): Disable for Lakemont.
>
> Non-algorithmic tuning of various parameters, especially the ones in
> x86-tune.def, is always OK and pre-approved as an "obvious" patch.
>

Checked in.

Thanks.

-- 
H.J.


[PATCH] Remove gimplifier use from PRE

2015-10-01 Thread Richard Biener

The following patch from the match-and-simplify branch removes
gimplifier use from PRE replacing it with use of the gimple_build API
building GIMPLE directly.

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

Richard.

2015-10-01  Richard Biener  

* tree-ssa-pre.c (create_component_ref_by_pieces_1): Build
GIMPLE calls directly.
(create_expression_by_pieces): Use gimple_build API and avoid
force_gimple_operand.
(insert_into_preds_of_block): Likewise.
(do_regular_insertion): Add comment.

Index: gcc/tree-ssa-pre.c
===
*** gcc/tree-ssa-pre.c  (revision 228320)
--- gcc/tree-ssa-pre.c  (working copy)
*** create_component_ref_by_pieces_1 (basic_
*** 2475,2483 
  {
  case CALL_EXPR:
{
!   tree folded, sc = NULL_TREE;
!   unsigned int nargs = 0;
!   tree fn, *args;
if (TREE_CODE (currop->op0) == FUNCTION_DECL)
  fn = currop->op0;
else
--- 2475,2482 
  {
  case CALL_EXPR:
{
!   tree sc = NULL_TREE;
!   tree fn;
if (TREE_CODE (currop->op0) == FUNCTION_DECL)
  fn = currop->op0;
else
*** create_component_ref_by_pieces_1 (basic_
*** 2490,2514 
if (!sc)
  return NULL_TREE;
  }
!   args = XNEWVEC (tree, ref->operands.length () - 1);
while (*operand < ref->operands.length ())
  {
!   args[nargs] = create_component_ref_by_pieces_1 (block, ref,
!   operand, stmts);
!   if (!args[nargs])
  return NULL_TREE;
!   nargs++;
  }
!   folded = build_call_array (currop->type,
!  (TREE_CODE (fn) == FUNCTION_DECL
!   ? build_fold_addr_expr (fn) : fn),
!  nargs, args);
!   if (currop->with_bounds)
! CALL_WITH_BOUNDS_P (folded) = true;
!   free (args);
if (sc)
! CALL_EXPR_STATIC_CHAIN (folded) = sc;
!   return folded;
}
  
  case MEM_REF:
--- 2489,2521 
if (!sc)
  return NULL_TREE;
  }
!   auto_vec args (ref->operands.length () - 1);
while (*operand < ref->operands.length ())
  {
!   tree arg = create_component_ref_by_pieces_1 (block, ref,
!operand, stmts);
!   if (!arg)
  return NULL_TREE;
!   args.quick_push (arg);
  }
!   gcall *call = gimple_build_call_vec ((TREE_CODE (fn) == FUNCTION_DECL
! ? build_fold_addr_expr (fn) : fn),
!args);
!   gimple_call_set_with_bounds (call, currop->with_bounds);
if (sc)
! gimple_call_set_chain (call, sc);
!   tree forcedname = make_ssa_name (currop->type);
!   gimple_call_set_lhs (call, forcedname);
!   gimple_set_vuse (call, BB_LIVE_VOP_ON_EXIT (block));
!   gimple_seq_add_stmt_without_update (stmts, call);
!   bitmap_set_bit (inserted_exprs, SSA_NAME_VERSION (forcedname));
!   VN_INFO_GET (forcedname)->valnum = forcedname;
!   VN_INFO (forcedname)->value_id = get_next_value_id ();
!   pre_expr nameexpr = get_or_alloc_expr_for_name (forcedname);
!   add_to_value (VN_INFO (forcedname)->value_id, nameexpr);
!   bitmap_value_replace_in_set (NEW_SETS (block), nameexpr);
!   bitmap_value_replace_in_set (AVAIL_OUT (block), nameexpr);
!   return forcedname;
}
  
  case MEM_REF:
*** create_expression_by_pieces (basic_block
*** 2851,2866 
switch (nary->length)
  {
  case 1:
!   folded = fold_build1 (nary->opcode, nary->type,
! genop[0]);
break;
  case 2:
!   folded = fold_build2 (nary->opcode, nary->type,
! genop[0], genop[1]);
break;
  case 3:
!   folded = fold_build3 (nary->opcode, nary->type,
! genop[0], genop[1], genop[2]);
break;
  default:
gcc_unreachable ();
--- 2858,2873 
switch (nary->length)
  {
  case 1:
!   folded = gimple_build (&forced_stmts, nary->opcode, nary->type,
!  genop[0]);
break;
  case 2:
!   folded = gimple_build (&forced_stmts, nary->opcode, nary->type,
!  genop[0], genop[1]);
break;
  case 3:
!   folded = gimple_build (&forced_stmts, nary->opcode, nary->type,
!  genop[0], genop[1],

Add a build_real_truncate helper function

2015-10-01 Thread Richard Sandiford
...which simplifies the match.pd patterns I'm about to add.

Bootstrapped & regression-tested on x86_64-linux-gnu.  OK to install?

Thanks,
Richard

gcc/
* real.h (build_real_truncate): Declare.
* real.c (build_real_truncate): New function.
(strip_float_extensions): Use it.
* builtins.c (fold_builtin_cabs, fold_builtin_sqrt, fold_builtin_cbrt)
(fold_builtin_hypot, fold_builtin_pow): Likewise.
* match.pd: Likewise.
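Judging from the ChangeLog, build_real_truncate fuses the recurring two-step dance: round a (conceptually extended-precision) constant to the target type's format with real_value_truncate, then wrap it with build_real. A rough standalone analogue of the rounding step, with a plain cast standing in for real_value_truncate on SFmode:

```c
/* Stand-in for real_value_truncate (SFmode, c): round a double
   constant to single precision.  */
float truncate_to_float (double c)
{
  return (float) c;
}
```

For example, sqrt(2) is not exactly representable in float, so the truncated constant differs from the double one by a small rounding error — which is exactly why the truncation must happen before building the tree constant for a float-typed expression.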

diff --git a/gcc/builtins.c b/gcc/builtins.c
index 2ff1a8c..1751b37 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -7593,12 +7593,10 @@ fold_builtin_cabs (location_t loc, tree arg, tree type, 
tree fndecl)
   if (flag_unsafe_math_optimizations
  && operand_equal_p (real, imag, OEP_PURE_SAME))
 {
- const REAL_VALUE_TYPE sqrt2_trunc
-   = real_value_truncate (TYPE_MODE (type), dconst_sqrt2 ());
  STRIP_NOPS (real);
  return fold_build2_loc (loc, MULT_EXPR, type,
- fold_build1_loc (loc, ABS_EXPR, type, real),
- build_real (type, sqrt2_trunc));
+ fold_build1_loc (loc, ABS_EXPR, type, real),
+ build_real_truncate (type, dconst_sqrt2 ()));
}
 }
 
@@ -7757,8 +7755,7 @@ fold_builtin_sqrt (location_t loc, tree arg, tree type)
 
  /* Adjust for the outer root.  */
  SET_REAL_EXP (&dconstroot, REAL_EXP (&dconstroot) - 1);
- dconstroot = real_value_truncate (TYPE_MODE (type), dconstroot);
- tree_root = build_real (type, dconstroot);
+ tree_root = build_real_truncate (type, dconstroot);
  return build_call_expr_loc (loc, powfn, 2, arg0, tree_root);
}
 }
@@ -7805,11 +7802,9 @@ fold_builtin_cbrt (location_t loc, tree arg, tree type)
   if (BUILTIN_EXPONENT_P (fcode))
{
  tree expfn = TREE_OPERAND (CALL_EXPR_FN (arg), 0);
- const REAL_VALUE_TYPE third_trunc =
-   real_value_truncate (TYPE_MODE (type), dconst_third ());
  arg = fold_build2_loc (loc, MULT_EXPR, type,
-CALL_EXPR_ARG (arg, 0),
-build_real (type, third_trunc));
+CALL_EXPR_ARG (arg, 0),
+build_real_truncate (type, dconst_third ()));
  return build_call_expr_loc (loc, expfn, 1, arg);
}
 
@@ -7825,8 +7820,7 @@ fold_builtin_cbrt (location_t loc, tree arg, tree type)
  REAL_VALUE_TYPE dconstroot = dconst_third ();
 
  SET_REAL_EXP (&dconstroot, REAL_EXP (&dconstroot) - 1);
- dconstroot = real_value_truncate (TYPE_MODE (type), dconstroot);
- tree_root = build_real (type, dconstroot);
+ tree_root = build_real_truncate (type, dconstroot);
  return build_call_expr_loc (loc, powfn, 2, arg0, tree_root);
}
}
@@ -7846,8 +7840,7 @@ fold_builtin_cbrt (location_t loc, tree arg, tree type)
 
  real_arithmetic (&dconstroot, MULT_EXPR,
dconst_third_ptr (), dconst_third_ptr ());
- dconstroot = real_value_truncate (TYPE_MODE (type), 
dconstroot);
- tree_root = build_real (type, dconstroot);
+ tree_root = build_real_truncate (type, dconstroot);
  return build_call_expr_loc (loc, powfn, 2, arg0, tree_root);
}
}
@@ -7863,10 +7856,8 @@ fold_builtin_cbrt (location_t loc, tree arg, tree type)
  if (tree_expr_nonnegative_p (arg00))
{
  tree powfn = TREE_OPERAND (CALL_EXPR_FN (arg), 0);
- const REAL_VALUE_TYPE dconstroot
-   = real_value_truncate (TYPE_MODE (type), dconst_third ());
- tree narg01 = fold_build2_loc (loc, MULT_EXPR, type, arg01,
-build_real (type, dconstroot));
+ tree c = build_real_truncate (type, dconst_third ());
+ tree narg01 = fold_build2_loc (loc, MULT_EXPR, type, arg01, c);
  return build_call_expr_loc (loc, powfn, 2, arg00, narg01);
}
}
@@ -8392,13 +8383,9 @@ fold_builtin_hypot (location_t loc, tree fndecl,
   /* hypot(x,x) -> fabs(x)*sqrt(2).  */
   if (flag_unsafe_math_optimizations
   && operand_equal_p (arg0, arg1, OEP_PURE_SAME))
-{
-  const REAL_VALUE_TYPE sqrt2_trunc
-   = real_value_truncate (TYPE_MODE (type), dconst_sqrt2 ());
-  return fold_build2_loc (loc, MULT_EXPR, type,
- fold_build1_loc (loc, ABS_EXPR, type, arg0),
- build_real (type, sqrt2_trunc));
-}
+return fold_build2_loc (loc, MULT_EXPR, type,
+   fold_build1_loc (loc, ABS_EXPR, type, arg0),
+   build_real_truncate (type, dconst_sqrt2 ()));
 
   return NULL_TREE;
 }
@@ -8530,10 +8517,8 @@ fold_builtin_po

Cache reals for 1/4, 1/6 and 1/9

2015-10-01 Thread Richard Sandiford
We have a global 1/2 and a cached 1/3, but recalculate 1/4, 1/6 and 1/9
each time we need them.  That seems a bit arbitrary and makes the folding
code more noisy (especially once it's moved to match.pd).

This patch caches the other three constants too.  Bootstrapped &
regression-tested on x86_64-linux-gnu.  OK to install?

Thanks,
Richard

gcc/
* real.h (dconst_quarter, dconst_sixth, dconst_ninth): New macros.
(dconst_quarter_ptr, dconst_sixth_ptr, dconst_ninth_ptr): Declare.
* real.c (CACHED_FRACTION): New helper macro.
(dconst_third_ptr): Use it.
(dconst_quarter_ptr, dconst_sixth_ptr, dconst_ninth_ptr): New.
* builtins.c (fold_builtin_sqrt): Use dconst_quarter and
dconst_sixth.
(fold_builtin_cbrt): Use dconst_sixth and dconst_ninth.
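The CACHED_FRACTION helper in the patch is a lazy-initialization memoization pattern: compute 1/N on first use, then hand back a pointer to the cached value on every later call. A self-contained sketch of the same idea, simplified to plain double (the real code computes to at least 160 bits via REAL_VALUE_TYPE):

```c
/* Compute 1/N once, on first use, and cache it in function-local
   static storage; later calls return the same pointer.  */
#define CACHED_FRACTION(NAME, N)                \
  const double *                                \
  NAME (void)                                   \
  {                                             \
    static double value;                        \
    static int initialized;                     \
    if (!initialized)                           \
      {                                         \
        value = 1.0 / (N);                      \
        initialized = 1;                        \
      }                                         \
    return &value;                              \
  }

CACHED_FRACTION (dconst_quarter_ptr, 4)
CACHED_FRACTION (dconst_sixth_ptr, 6)
CACHED_FRACTION (dconst_ninth_ptr, 9)
```

One macro instantiation per fraction keeps the call sites uniform, which is the point of folding dconst_third_ptr into the same scheme.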

diff --git a/gcc/builtins.c b/gcc/builtins.c
index 1751b37..63724b9 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -7743,20 +7743,10 @@ fold_builtin_sqrt (location_t loc, tree arg, tree type)
   if (powfn)
{
  tree arg0 = CALL_EXPR_ARG (arg, 0);
- tree tree_root;
- /* The inner root was either sqrt or cbrt.  */
- /* This was a conditional expression but it triggered a bug
-in Sun C 5.5.  */
- REAL_VALUE_TYPE dconstroot;
- if (BUILTIN_SQRT_P (fcode))
-   dconstroot = dconsthalf;
- else
-   dconstroot = dconst_third ();
-
- /* Adjust for the outer root.  */
- SET_REAL_EXP (&dconstroot, REAL_EXP (&dconstroot) - 1);
- tree_root = build_real_truncate (type, dconstroot);
- return build_call_expr_loc (loc, powfn, 2, arg0, tree_root);
+ tree arg1 = (BUILTIN_SQRT_P (fcode)
+  ? build_real (type, dconst_quarter ())
+  : build_real_truncate (type, dconst_sixth ()));
+ return build_call_expr_loc (loc, powfn, 2, arg0, arg1);
}
 }
 
@@ -7816,11 +7806,7 @@ fold_builtin_cbrt (location_t loc, tree arg, tree type)
  if (powfn)
{
  tree arg0 = CALL_EXPR_ARG (arg, 0);
- tree tree_root;
- REAL_VALUE_TYPE dconstroot = dconst_third ();
-
- SET_REAL_EXP (&dconstroot, REAL_EXP (&dconstroot) - 1);
- tree_root = build_real_truncate (type, dconstroot);
+ tree tree_root = build_real_truncate (type, dconst_sixth ());
  return build_call_expr_loc (loc, powfn, 2, arg0, tree_root);
}
}
@@ -7835,12 +7821,7 @@ fold_builtin_cbrt (location_t loc, tree arg, tree type)
 
  if (powfn)
{
- tree tree_root;
- REAL_VALUE_TYPE dconstroot;
-
- real_arithmetic (&dconstroot, MULT_EXPR,
-   dconst_third_ptr (), dconst_third_ptr ());
- tree_root = build_real_truncate (type, dconstroot);
+ tree tree_root = build_real_truncate (type, dconst_ninth ());
  return build_call_expr_loc (loc, powfn, 2, arg0, tree_root);
}
}
diff --git a/gcc/real.c b/gcc/real.c
index c1ff78d..78f3623 100644
--- a/gcc/real.c
+++ b/gcc/real.c
@@ -2379,21 +2379,26 @@ dconst_e_ptr (void)
   return &value;
 }
 
-/* Returns the special REAL_VALUE_TYPE corresponding to 1/3.  */
-
-const REAL_VALUE_TYPE *
-dconst_third_ptr (void)
-{
-  static REAL_VALUE_TYPE value;
-
-  /* Initialize mathematical constants for constant folding builtins.
- These constants need to be given to at least 160 bits precision.  */
-  if (value.cl == rvc_zero)
-{
-  real_arithmetic (&value, RDIV_EXPR, &dconst1, real_digit (3));
-}
-  return &value;
-}
+/* Returns a cached REAL_VALUE_TYPE corresponding to 1/n, for various n.  */
+
+#define CACHED_FRACTION(NAME, N)   \
+  const REAL_VALUE_TYPE *  \
+  NAME (void)  \
+  {\
+static REAL_VALUE_TYPE value;  \
+   \
+/* Initialize mathematical constants for constant folding builtins.
\
+   These constants need to be given to at least 160 bits   \
+   precision.  */  \
+if (value.cl == rvc_zero)  \
+  real_arithmetic (&value, RDIV_EXPR, &dconst1, real_digit (N));   \
+return &value; \
+  }
+
+CACHED_FRACTION (dconst_third_ptr, 3)
+CACHED_FRACTION (dconst_quarter_ptr, 4)
+CACHED_FRACTION (dconst_sixth_ptr, 6)
+CACHED_FRACTION (dconst_ninth_ptr, 9)
 
 /* Returns the special REAL_VALUE_TYPE corresponding to sqrt(2).  */
 
diff --git a/gcc/real.h b/gcc/real.h
index 455d853..5d8c92c 10

Re: Add a build_real_truncate helper function

2015-10-01 Thread Bernd Schmidt

On 10/01/2015 03:48 PM, Richard Sandiford wrote:

...which simplifies the match.pd patterns I'm about to add.

Bootstrapped & regression-tested on x86_64-linux-gnu.  OK to install?


Ok.


Bernd



[gomp4] backport some changes

2015-10-01 Thread Nathan Sidwell
I've applied this to gomp4 to apply some changes to these areas that occurred on 
merging to trunk.


nathan
2015-10-01  Nathan Sidwell  

	* config/nvptx/nvptx.c (nvptx_validate_dims): Rename to ...
	(nvptx_goacc_validate_dims): ... here.
	(TARGET_GOACC_VALIDATE_DIMS): Update.
	* target.def (validate_dims): Expand documentation.
	* omp-low.c (default_goacc_validate_dims): Remove erroneous ARG_UNUSED.
	* doc/tm.texi: Rebuild.

Index: gcc/config/nvptx/nvptx.c
===
--- gcc/config/nvptx/nvptx.c	(revision 228304)
+++ gcc/config/nvptx/nvptx.c	(working copy)
@@ -4248,12 +4248,12 @@ nvptx_expand_builtin (tree exp, rtx targ
 #define PTX_VECTOR_LENGTH 32
 #define PTX_WORKER_LENGTH 32
 
-/* Validate compute dimensions, fill in non-unity defaults.  FN_LEVEL
-   indicates the level at which a routine might spawn a loop.  It is
-   negative for non-routines.  */
+/* Validate compute dimensions of an OpenACC offload or routine, fill
+   in non-unity defaults.  FN_LEVEL indicates the level at which a
+   routine might spawn a loop.  It is negative for non-routines.  */
 
 static bool
-nvptx_validate_dims (tree decl, int dims[], int fn_level)
+nvptx_goacc_validate_dims (tree decl, int dims[], int fn_level)
 {
   bool changed = false;
 
@@ -4856,7 +4856,7 @@ nvptx_use_anchors_for_symbol (const_rtx
 #define TARGET_BUILTIN_DECL nvptx_builtin_decl
 
 #undef TARGET_GOACC_VALIDATE_DIMS
-#define TARGET_GOACC_VALIDATE_DIMS nvptx_validate_dims
+#define TARGET_GOACC_VALIDATE_DIMS nvptx_goacc_validate_dims
 
 #undef TARGET_GOACC_DIM_LIMIT
 #define TARGET_GOACC_DIM_LIMIT nvptx_dim_limit
Index: gcc/target.def
===
--- gcc/target.def	(revision 228304)
+++ gcc/target.def	(working copy)
@@ -1646,11 +1646,15 @@ HOOK_VECTOR (TARGET_GOACC, goacc)
 
 DEFHOOK
 (validate_dims,
-"This hook should check the launch dimensions provided.  It should fill\n\
-in anything that needs to default to non-unity and verify non-defaults.\n\
-Defaults are represented as -1.  Diagnostics should be issued as\n\
-appropriate.  Return true if changes have been made.  You must override\n\
-this hook to provide dimensions larger than 1.",
+"This hook should check the launch dimensions provided for an OpenACC\n\
+compute region, or routine.  Defaulted values are represented as -1\n\
+and non-constant values as 0. The @var{fn_level} is negative for the\n\
+function corresponding to the compute region.  For a routine it is the\n\
+outermost level at which partitioned execution may be spawned.  It\n\
+should fill in anything that needs to default to non-unity and verify\n\
+non-defaults.  Diagnostics should be issued as appropriate.  Return\n\
+true, if changes have been made.  You must override this hook to\n\
+provide dimensions larger than 1.",
 bool, (tree decl, int dims[], int fn_level),
 default_goacc_validate_dims)
 
Index: gcc/doc/tm.texi
===
--- gcc/doc/tm.texi	(revision 228304)
+++ gcc/doc/tm.texi	(working copy)
@@ -5749,11 +5749,15 @@ to use it.
 @end deftypefn
 
 @deftypefn {Target Hook} bool TARGET_GOACC_VALIDATE_DIMS (tree @var{decl}, int @var{dims[]}, int @var{fn_level})
-This hook should check the launch dimensions provided.  It should fill
-in anything that needs to default to non-unity and verify non-defaults.
-Defaults are represented as -1.  Diagnostics should be issued as
-appropriate.  Return true if changes have been made.  You must override
-this hook to provide dimensions larger than 1.
+This hook should check the launch dimensions provided for an OpenACC
+compute region, or routine.  Defaulted values are represented as -1
+and non-constant values as 0. The @var{fn_level} is negative for the
+function corresponding to the compute region.  For a routine it is the
+outermost level at which partitioned execution may be spawned.  It
+should fill in anything that needs to default to non-unity and verify
+non-defaults.  Diagnostics should be issued as appropriate.  Return
+true, if changes have been made.  You must override this hook to
+provide dimensions larger than 1.
 @end deftypefn
 
 @deftypefn {Target Hook} unsigned TARGET_GOACC_DIM_LIMIT (unsigned @var{axis})
Index: gcc/omp-low.c
===
--- gcc/omp-low.c	(revision 228304)
+++ gcc/omp-low.c	(working copy)
@@ -14904,7 +14904,7 @@ execute_oacc_device_lower ()
hook.  */
 
 bool
-default_goacc_validate_dims (tree ARG_UNUSED (decl), int *ARG_UNUSED (dims),
+default_goacc_validate_dims (tree ARG_UNUSED (decl), int *dims,
 			 int ARG_UNUSED (fn_level))
 {
   bool changed = false;


[Patch match.pd] Add a simplify rule for x * copysign (1.0, y);

2015-10-01 Thread James Greenhalgh

Hi,

If it is cheap enough to treat a floating-point value as an integer and
to do bitwise arithmetic on it (as it is for AArch64) we can rewrite:

  x * copysign (1.0, y)

as:

  x ^ (y & (1 << sign_bit_position))

This patch implements that rewriting rule in match.pd, and a testcase
expecting the transform.
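The identity behind the rewrite is easy to check outside the compiler: for the IEEE cases the pattern targets, multiplying by copysign (1.0, y) just XORs y's sign bit into x. A minimal sketch (bit width and mask position are for IEEE double):

```c
#include <stdint.h>
#include <string.h>

/* x * copysign (1.0, y) rewritten as integer bit arithmetic:
   XOR y's sign bit into x.  */
double mul_via_xor (double x, double y)
{
  uint64_t xi, yi;
  memcpy (&xi, &x, sizeof xi);
  memcpy (&yi, &y, sizeof yi);
  xi ^= yi & (UINT64_C (1) << 63);   /* 1 << sign_bit_position */
  memcpy (&x, &xi, sizeof x);
  return x;
}
```

On a target where moving between FP and integer registers is cheap, this trades a multiply (and the copysign) for an AND and an XOR.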

This is worth about 6% in 481.wrf for AArch64. I don't know enough
about the x86 microarchitectures to know how productive this transformation
is there. In Spec2006FP I didn't see any interesting results in either
direction. Looking at code generation for the testcase I add, I think the
x86 code generation looks worse, but I can't understand why it doesn't use
a vector-side xor and load the mask vector-side. With that fixed up I think
the code generation would look better - though as I say, I'm not an expert
here...

Bootstrapped on both aarch64-none-linux-gnu and x86_64 with no issues.

OK for trunk?

Thanks,
James

---
gcc/

2015-10-01  James Greenhalgh  

* match.pd (mult (COPYSIGN:s real_onep @0) @1): New simplifier.

gcc/testsuite/

2015-10-01  James Greenhalgh  

* gcc.dg/tree-ssa/copysign.c: New.

diff --git a/gcc/match.pd b/gcc/match.pd
index bd5c267..d51ad2e 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -61,6 +61,7 @@ along with GCC; see the file COPYING3.  If not see
 (define_operator_list TAN BUILT_IN_TANF BUILT_IN_TAN BUILT_IN_TANL)
 (define_operator_list COSH BUILT_IN_COSHF BUILT_IN_COSH BUILT_IN_COSHL)
 (define_operator_list CEXPI BUILT_IN_CEXPIF BUILT_IN_CEXPI BUILT_IN_CEXPIL)
+(define_operator_list COPYSIGN BUILT_IN_COPYSIGNF BUILT_IN_COPYSIGN BUILT_IN_COPYSIGNL)
 
 /* Simplifications of operations with one constant operand and
simplifications to constants or single values.  */
@@ -2079,6 +2080,21 @@ along with GCC; see the file COPYING3.  If not see
 
 /* Simplification of math builtins.  */
 
+/* Simplify x * copysign (1.0, y) -> x ^ (y & (1 << sign_bit_position)).  */
+(simplify
+  (mult:c (COPYSIGN:s real_onep @0) @1)
+  (with
+{
+  wide_int m = wi::min_value (TYPE_PRECISION (type), SIGNED);
+  tree tt
+	= build_nonstandard_integer_type (TYPE_PRECISION (type),
+	  false);
+  tree mask = wide_int_to_tree (tt, m);
+}
+(view_convert (bit_xor (view_convert:tt @1)
+			   (bit_and (view_convert:tt @0)
+ { mask; })
+
 /* fold_builtin_logarithm */
 (if (flag_unsafe_math_optimizations)
 
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/copysign.c b/gcc/testsuite/gcc.dg/tree-ssa/copysign.c
new file mode 100644
index 000..b67f3c1
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/copysign.c
@@ -0,0 +1,22 @@
+/* { dg-do compile } */
+/* { dg-options "-O -fdump-tree-gimple" } */
+
+double
+foo_d (double x, double y)
+{
+  return x * __builtin_copysign (1.0, y);
+}
+
+float
+foo_f (float x, float y)
+{
+  return x * __builtin_copysignf (1.0f, y);
+}
+
+long double
+foo_l (long double x, long double y)
+{
+  return x * __builtin_copysignl (1.0, y);
+}
+
+/* { dg-final { scan-tree-dump-not "copysign" "gimple"} } */


Re: Cache reals for 1/4, 1/6 and 1/9

2015-10-01 Thread Bernd Schmidt

On 10/01/2015 03:51 PM, Richard Sandiford wrote:

We have a global 1/2 and a cached 1/3, but recalculate 1/4, 1/6 and 1/9
each time we need them.  That seems a bit arbitrary and makes the folding
code more noisy (especially once it's moved to match.pd).

This patch caches the other three constants too.  Bootstrapped &
regression-tested on x86_64-linux-gnu.  OK to install?


Looks reasonable enough.


Bernd


Re: [PATCH] rs6000: Add "cannot_copy" attribute, use it (PR67788, PR67789)

2015-10-01 Thread David Edelsohn
On Thu, Oct 1, 2015 at 2:08 AM, Segher Boessenkool
 wrote:
> After the shrink-wrapping patches the prologue will often be pushed
> "deeper" into the function, which in turn means the software trace cache
> pass will more often want to duplicate the basic block containing the
> prologue.  This caused failures for 32-bit SVR4 with -msecure-plt PIC.
>
> This configuration uses the load_toc_v4_PIC_1 instruction, which creates
> assembler labels without using the normal machinery for that.  If now
> the compiler decides to duplicate the insn, it will emit the same label
> twice.  Boom.
>
> It isn't so easy to fix this to use labels the compiler knows about (let
> alone test that properly).  Instead, this patch wires up a "cannot_copy"
> attribute to be used by TARGET_CANNOT_COPY_P, and sets that attribute on
> these insns we do not want copied.
>
> Bootstrapped and tested on powerpc64-linux, with the usual configurations
> (-m32,-m32/-mpowerpc64,-m64,-m64/-mlra); new testcase fails before, works
> after (on 32-bit).
>
> Is this okay for mainline?
>
>
> Segher
>
>
> 2015-09-30  Segher Boessenkool  
>
> PR target/67788
> PR target/67789
> * config/rs6000/rs6000.c (TARGET_CANNOT_COPY_INSN_P): New.
> (rs6000_cannot_copy_insn_p): New function.
> * config/rs6000/rs6000.md (cannot_copy): New attribute.
> (load_toc_v4_PIC_1_normal): Set cannot_copy.
> (load_toc_v4_PIC_1_476): Ditto.
>
> gcc/testsuite/
> PR target/67788
> PR target/67789
> * gcc.target/powerpc/pr67789.c: New testcase.

Bernd mentions that this is the normal way of handling this problem, so okay.

Thanks, David


Re: [PATCH] rs6000: Add "cannot_copy" attribute, use it (PR67788, PR67789)

2015-10-01 Thread David Edelsohn
On Thu, Oct 1, 2015 at 2:08 AM, Segher Boessenkool
 wrote:
> After the shrink-wrapping patches the prologue will often be pushed
> "deeper" into the function, which in turn means the software trace cache
> pass will more often want to duplicate the basic block containing the
> prologue.  This caused failures for 32-bit SVR4 with -msecure-plt PIC.
>
> This configuration uses the load_toc_v4_PIC_1 instruction, which creates
> assembler labels without using the normal machinery for that.  If now
> the compiler decides to duplicate the insn, it will emit the same label
> twice.  Boom.
>
> It isn't so easy to fix this to use labels the compiler knows about (let
> alone test that properly).  Instead, this patch wires up a "cannot_copy"
> attribute to be used by TARGET_CANNOT_COPY_P, and sets that attribute on
> these insns we do not want copied.
>
> Bootstrapped and tested on powerpc64-linux, with the usual configurations
> (-m32,-m32/-mpowerpc64,-m64,-m64/-mlra); new testcase fails before, works
> after (on 32-bit).
>
> Is this okay for mainline?

Is this expensive enough that it is worth limiting the definition of
the hook to configurations that include 32-bit SVR4 support so that
not every configuration incurs the overhead?

Thanks, David


Re: Do not use TYPE_CANONICAL in useless_type_conversion

2015-10-01 Thread Eric Botcazou
> After dropping the check I needed to solve two issues. First is that we need
> a definition of useless conversions for aggregates. As discussed earlier I
> made it to depend only on size. The basic idea is that only operations you
> can do on gimple with those are moves and field accesses. Field accesses
> have corresponding type into in COMPONENT_REF or MEM_REF, so we do not care
> about conversions of those.  This caused three Ada failures on PPC64,
> because we can not move between structures of same size but different mode.

Do you disregard the alignment on purpose here?

-- 
Eric Botcazou


Re: Fold acc_on_device

2015-10-01 Thread Nathan Sidwell

On 10/01/15 08:46, Richard Biener wrote:

On Thu, Oct 1, 2015 at 2:33 PM, Nathan Sidwell  wrote:



use TREE_TYPE (arg0) for the integer cst.

Otherwise looks good to me.

thanks,

fixed up and applied (also noticed a copy & paste malfunction setting the 
location)


nathan
2015-10-01  Nathan Sidwell  

	* builtins.c: Don't include gomp-constants.h.
	(fold_builtin_1): Don't fold acc_on_device here.
	* gimple-fold.c: Include gomp-constants.h.
	(gimple_fold_builtin_acc_on_device): New.
	(gimple_fold_builtin): Call it.
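The folded form replaces the builtin call with branch-free GIMPLE: two equality tests combined with BIT_IOR_EXPR, as the patch below shows. A sketch of the host-compiler semantics — the device constants here are illustrative placeholders, not the real values from gomp-constants.h:

```c
/* Illustrative placeholders; the real values live in gomp-constants.h.  */
enum { FAKE_DEVICE_NONE = 0, FAKE_DEVICE_HOST = 2, FAKE_DEVICE_NOT_HOST = 4 };

/* What the folded GIMPLE computes when not compiling for an
   accelerator: arg matches either "host" or "none".  */
int folded_acc_on_device_host (int arg)
{
  int host_eq = (arg == FAKE_DEVICE_HOST);
  int dev_eq = (arg == FAKE_DEVICE_NONE);
  return host_eq | dev_eq;   /* BIT_IOR_EXPR, no branches.  */
}
```

Under ACCEL_COMPILER the two compared values switch to "not host" and the specific accelerator device, which is why the fold must be deferred until it is known which compiler is active.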

Index: gcc/gimple-fold.c
===
--- gcc/gimple-fold.c	(revision 228288)
+++ gcc/gimple-fold.c	(working copy)
@@ -62,6 +62,7 @@ along with GCC; see the file COPYING3.
 #include "output.h"
 #include "tree-eh.h"
 #include "gimple-match.h"
+#include "gomp-constants.h"
 
 /* Return true when DECL can be referenced from current unit.
FROM_DECL (if non-null) specify constructor of variable DECL was taken from.
@@ -2708,6 +2709,47 @@ gimple_fold_builtin_strlen (gimple_stmt_
   return true;
 }
 
+/* Fold a call to __builtin_acc_on_device.  */
+
+static bool
+gimple_fold_builtin_acc_on_device (gimple_stmt_iterator *gsi, tree arg0)
+{
+  /* Defer folding until we know which compiler we're in.  */
+  if (symtab->state != EXPANSION)
+return false;
+
+  unsigned val_host = GOMP_DEVICE_HOST;
+  unsigned val_dev = GOMP_DEVICE_NONE;
+
+#ifdef ACCEL_COMPILER
+  val_host = GOMP_DEVICE_NOT_HOST;
+  val_dev = ACCEL_COMPILER_acc_device;
+#endif
+
+  location_t loc = gimple_location (gsi_stmt (*gsi));
+  
+  tree host_eq = make_ssa_name (boolean_type_node);
+  gimple *host_ass = gimple_build_assign
+(host_eq, EQ_EXPR, arg0, build_int_cst (TREE_TYPE (arg0), val_host));
+  gimple_set_location (host_ass, loc);
+  gsi_insert_before (gsi, host_ass, GSI_SAME_STMT);
+
+  tree dev_eq = make_ssa_name (boolean_type_node);
+  gimple *dev_ass = gimple_build_assign
+(dev_eq, EQ_EXPR, arg0, build_int_cst (TREE_TYPE (arg0), val_dev));
+  gimple_set_location (dev_ass, loc);
+  gsi_insert_before (gsi, dev_ass, GSI_SAME_STMT);
+
+  tree result = make_ssa_name (boolean_type_node);
+  gimple *result_ass = gimple_build_assign
+(result, BIT_IOR_EXPR, host_eq, dev_eq);
+  gimple_set_location (result_ass, loc);
+  gsi_insert_before (gsi, result_ass, GSI_SAME_STMT);
+
+  replace_call_with_value (gsi, result);
+
+  return true;
+}
 
 /* Fold the non-target builtin at *GSI and return whether any simplification
was made.  */
@@ -2848,6 +2890,9 @@ gimple_fold_builtin (gimple_stmt_iterato
 	   n == 3
 	   ? gimple_call_arg (stmt, 2)
 	   : NULL_TREE, fcode);
+case BUILT_IN_ACC_ON_DEVICE:
+  return gimple_fold_builtin_acc_on_device (gsi,
+		gimple_call_arg (stmt, 0));
 default:;
 }
 
Index: gcc/builtins.c
===================================================================
--- gcc/builtins.c	(revision 228288)
+++ gcc/builtins.c	(working copy)
@@ -64,7 +64,6 @@ along with GCC; see the file COPYING3.
 #include "cgraph.h"
 #include "tree-chkp.h"
 #include "rtl-chkp.h"
-#include "gomp-constants.h"
 
 
 static tree do_mpc_arg1 (tree, tree, int (*)(mpc_ptr, mpc_srcptr, mpc_rnd_t));
@@ -10230,27 +10229,6 @@ fold_builtin_1 (location_t loc, tree fnd
 	return build_empty_stmt (loc);
   break;
 
-case BUILT_IN_ACC_ON_DEVICE:
-  /* Don't fold on_device until we know which compiler is active.  */
-  if (symtab->state == EXPANSION)
-	{
-	  unsigned val_host = GOMP_DEVICE_HOST;
-	  unsigned val_dev = GOMP_DEVICE_NONE;
-
-#ifdef ACCEL_COMPILER
-	  val_host = GOMP_DEVICE_NOT_HOST;
-	  val_dev = ACCEL_COMPILER_acc_device;
-#endif
-	  tree host = build2 (EQ_EXPR, boolean_type_node, arg0,
-			  build_int_cst (integer_type_node, val_host));
-	  tree dev = build2 (EQ_EXPR, boolean_type_node, arg0,
-			 build_int_cst (integer_type_node, val_dev));
-
-	  tree result = build2 (TRUTH_OR_EXPR, boolean_type_node, host, dev);
-	  return fold_convert (integer_type_node, result);
-	}
-  break;
-
 default:
   break;
 }


Re: Do not use TYPE_CANONICAL in useless_type_conversion

2015-10-01 Thread Richard Biener
On Thu, 1 Oct 2015, Eric Botcazou wrote:

> > After dropping the check I needed to solve two issues. First is that we need
> > a definition of useless conversions for aggregates. As discussed earlier I
> > made it to depend only on size. The basic idea is that only operations you
> > can do on gimple with those are moves and field accesses.  Field accesses
> > have the corresponding type info in COMPONENT_REF or MEM_REF, so we do not
> > care about conversions of those.  This caused three Ada failures on PPC64,
> > because we cannot move between structures of the same size but different modes.
> 
> Do you disregard the alignment on purpose here?

Do we require that to match?  I don't remember that we do.  Note the
function only gets types.

Richard.


Re: [Patch match.pd] Add a simplify rule for x * copysign (1.0, y);

2015-10-01 Thread pinskia
> 
> On Oct 1, 2015, at 6:57 AM, James Greenhalgh  wrote:
> 
> 
> Hi,
> 
> If it is cheap enough to treat a floating-point value as an integer and
> to do bitwise arithmetic on it (as it is for AArch64) we can rewrite:
> 
>  x * copysign (1.0, y)
> 
> as:
> 
>  x ^ (y & (1 << sign_bit_position))

Why not just convert it to copysign (x, y) instead and let expand choose the
better implementation?  Also I think this can only be done for finite and
non-trapping types.

Thanks,
Andrew

> 
> This patch implements that rewriting rule in match.pd, and a testcase
> expecting the transform.
> 
> This is worth about 6% in 481.wrf for AArch64. I don't know enough
> about the x86 microarchitectures to know how productive this transformation
> is there. In Spec2006FP I didn't see any interesting results in either
> direction. Looking at code generation for the testcase I add, I think the
> x86 code generation looks worse, but I can't understand why it doesn't use
> a vector-side xor and load the mask vector-side. With that fixed up I think
> the code generation would look better - though as I say, I'm not an expert
> here...
> 
> Bootstrapped on both aarch64-none-linux-gnu and x86_64 with no issues.
> 
> OK for trunk?
> 
> Thanks,
> James
> 
> ---
> gcc/
> 
> 2015-10-01  James Greenhalgh  
> 
>* match.pd (mult (COPYSIGN:s real_onep @0) @1): New simplifier.
> 
> gcc/testsuite/
> 
> 2015-10-01  James Greenhalgh  
> 
>* gcc.dg/tree-ssa/copysign.c: New.
> 
> <0001-Patch-match.pd-Add-a-simplify-rule-for-x-copysign-1..patch>


Re: Do not use TYPE_CANONICAL in useless_type_conversion

2015-10-01 Thread Eric Botcazou
> Do we require that to match?  I don't remember that we do.

For scalar types (and arrays of scalars), the alignment is essentially encoded 
in the size/mode pair but that's not the case for non-array aggregate types, 
so declaring a conversion that changes the alignment as useless seems weird.

-- 
Eric Botcazou


[PATCH, MIPS, PR/61114] Migrate to reduc_..._scal optabs.

2015-10-01 Thread Simon Dardis
Hello,

This patch migrates the MIPS backend to the new vector reduction optabs. 


No new regressions, ok to apply?

Thanks,
Simon

gcc/ChangeLog:

* config/mips/loongson.md (vec_loongson_extract_lo_): New,
extract low part to scalar.
(reduc_uplus_): Remove.
(reduc_plus_scal_): Rename from reduc_splus_, use
vec_loongson_extract_lo_.
(reduc_smax_scal_, reduc_smin_scal_): Rename from
reduc_smax_, reduc_smin_, fix constraints, use
vec_loongson_extract_lo_.
(reduc_umax_scal_, reduc_umin_scal_): Rename, change
constraints.

Index: config/mips/loongson.md
===================================================================
--- config/mips/loongson.md (revision 228282)
+++ config/mips/loongson.md (working copy)
@@ -852,58 +852,66 @@
   "dsrl\t%0,%1,%2"
   [(set_attr "type" "fcvt")])
 
-(define_expand "reduc_uplus_"
-  [(match_operand:VWH 0 "register_operand" "")
-   (match_operand:VWH 1 "register_operand" "")]
+(define_insn "vec_loongson_extract_lo_"
+  [(set (match_operand: 0 "register_operand" "=r")
+(vec_select:
+  (match_operand:VWHB 1 "register_operand" "f")
+  (parallel [(const_int 0)])))]
   "TARGET_HARD_FLOAT && TARGET_LOONGSON_VECTORS"
-{
-  mips_expand_vec_reduc (operands[0], operands[1], gen_add3);
-  DONE;
-})
+  "mfc1\t%0,%1"
+  [(set_attr "type" "mfc")])
 
-; ??? Given that we're not describing a widening reduction, we should
-; not have separate optabs for signed and unsigned.
-(define_expand "reduc_splus_"
-  [(match_operand:VWHB 0 "register_operand" "")
+(define_expand "reduc_plus_scal_"
+  [(match_operand: 0 "register_operand" "")
(match_operand:VWHB 1 "register_operand" "")]
   "TARGET_HARD_FLOAT && TARGET_LOONGSON_VECTORS"
 {
-  emit_insn (gen_reduc_uplus_(operands[0], operands[1]));
+  rtx tmp = gen_reg_rtx (GET_MODE (operands[1]));
+  mips_expand_vec_reduc (tmp, operands[1], gen_add3);
+  emit_insn ( gen_vec_loongson_extract_lo_ (operands[0], tmp));
   DONE;
 })
 
-(define_expand "reduc_smax_"
-  [(match_operand:VWHB 0 "register_operand" "")
-   (match_operand:VWHB 1 "register_operand" "")]
+(define_expand "reduc_smax_scal_"
+  [(match_operand:HI 0 "register_operand" "")
+   (match_operand:VH 1 "register_operand" "")]
   "TARGET_HARD_FLOAT && TARGET_LOONGSON_VECTORS"
 {
-  mips_expand_vec_reduc (operands[0], operands[1], gen_smax3);
+  rtx tmp = gen_reg_rtx (GET_MODE (operands[1]));
+  mips_expand_vec_reduc (tmp, operands[1], gen_smax3);
+  emit_insn ( gen_vec_loongson_extract_lo_ (operands[0], tmp));
   DONE;
 })
 
-(define_expand "reduc_smin_"
-  [(match_operand:VWHB 0 "register_operand" "")
-   (match_operand:VWHB 1 "register_operand" "")]
+(define_expand "reduc_smin_scal_"
+  [(match_operand:HI 0 "register_operand" "")
+   (match_operand:VH 1 "register_operand" "")]
   "TARGET_HARD_FLOAT && TARGET_LOONGSON_VECTORS"
 {
-  mips_expand_vec_reduc (operands[0], operands[1], gen_smin3);
+  rtx tmp = gen_reg_rtx (GET_MODE (operands[1]));
+  mips_expand_vec_reduc (tmp, operands[1], gen_smin3);
+  emit_insn ( gen_vec_loongson_extract_lo_ (operands[0], tmp));
   DONE;
 })
 
-(define_expand "reduc_umax_"
-  [(match_operand:VB 0 "register_operand" "")
+(define_expand "reduc_umax_scal_"
+  [(match_operand:QI 0 "register_operand" "")
(match_operand:VB 1 "register_operand" "")]
   "TARGET_HARD_FLOAT && TARGET_LOONGSON_VECTORS"
 {
-  mips_expand_vec_reduc (operands[0], operands[1], gen_umax3);
+  rtx tmp = gen_reg_rtx (GET_MODE (operands[1]));
+  mips_expand_vec_reduc (tmp, operands[1], gen_umax3);
+  emit_insn ( gen_vec_loongson_extract_lo_ (operands[0], tmp));
   DONE;
 })
 
-(define_expand "reduc_umin_"
-  [(match_operand:VB 0 "register_operand" "")
+(define_expand "reduc_umin_scal_"
+  [(match_operand:QI 0 "register_operand" "")
(match_operand:VB 1 "register_operand" "")]
   "TARGET_HARD_FLOAT && TARGET_LOONGSON_VECTORS"
 {
-  mips_expand_vec_reduc (operands[0], operands[1], gen_umin3);
+  rtx tmp = gen_reg_rtx (GET_MODE (operands[1]));
+  mips_expand_vec_reduc (tmp, operands[1], gen_umin3);
+  emit_insn ( gen_vec_loongson_extract_lo_ (operands[0], tmp));
   DONE;
 })




Go patch committed: Only check OPT_m32 if TARGET_CAN_SPLIT_STACK_64BIT

2015-10-01 Thread Ian Lance Taylor
The option OPT_m32 is not defined on all targets.  This patch only
checks for it if TARGET_CAN_SPLIT_STACK_64BIT is defined, which is the
only case where we need to know the value.  Bootstrapped on
x86_64-unknown-linux-gnu.  Committed to mainline.

Ian

2015-10-01  Ian Lance Taylor  

PR go/66870
* gospec.c (lang_specific_driver): Only look for OPT_m32 if
TARGET_CAN_SPLIT_STACK_64BIT is defined.
Index: gospec.c
===================================================================
--- gospec.c(revision 228311)
+++ gospec.c(working copy)
@@ -158,9 +158,11 @@ lang_specific_driver (struct cl_decoded_
library = (library == 0) ? 1 : library;
  break;
 
+#ifdef TARGET_CAN_SPLIT_STACK_64BIT
case OPT_m32:
  saw_opt_m32 = true;
  break;
+#endif
 
case OPT_pg:
case OPT_p:


C PATCH for c/65345 (file-scope _Atomic expansion with floats)

2015-10-01 Thread Marek Polacek
Joseph reminded me that I had forgotten about this patch.  As mentioned
here , I'm
removing the XFAILs in the tests so people are likely to see new FAILs.

I think the following targets will need a similar fix to the one below:
* MIPS
* rs6000
* alpha
* sparc
* s390
* arm
* sh
* aarch64

I'm CCing the respective maintainers.  You might want to XFAIL those tests.

Applying to trunk.

2015-10-01  Marek Polacek  

PR c/65345
* config/i386/i386.c (ix86_atomic_assign_expand_fenv): Adjust to use
create_tmp_var_raw rather than create_tmp_var.

* gcc.dg/atomic/pr65345-4.c: New test.
* gcc.dg/pr65345-3.c: New test.

diff --git gcc/config/i386/i386.c gcc/config/i386/i386.c
index fe9c756..cfeba76 100644
--- gcc/config/i386/i386.c
+++ gcc/config/i386/i386.c
@@ -53128,13 +53128,13 @@ ix86_atomic_assign_expand_fenv (tree *hold, tree *clear, tree *update)
 {
   if (!TARGET_80387 && !TARGET_SSE_MATH)
 return;
-  tree exceptions_var = create_tmp_var (integer_type_node);
+  tree exceptions_var = create_tmp_var_raw (integer_type_node);
   if (TARGET_80387)
 {
   tree fenv_index_type = build_index_type (size_int (6));
   tree fenv_type = build_array_type (unsigned_type_node, fenv_index_type);
-  tree fenv_var = create_tmp_var (fenv_type);
-  mark_addressable (fenv_var);
+  tree fenv_var = create_tmp_var_raw (fenv_type);
+  TREE_ADDRESSABLE (fenv_var) = 1;
   tree fenv_ptr = build_pointer_type (fenv_type);
   tree fenv_addr = build1 (ADDR_EXPR, fenv_ptr, fenv_var);
   fenv_addr = fold_convert (ptr_type_node, fenv_addr);
@@ -53144,10 +53144,12 @@ ix86_atomic_assign_expand_fenv (tree *hold, tree *clear, tree *update)
   tree fnclex = ix86_builtins[IX86_BUILTIN_FNCLEX];
   tree hold_fnstenv = build_call_expr (fnstenv, 1, fenv_addr);
   tree hold_fnclex = build_call_expr (fnclex, 0);
-  *hold = build2 (COMPOUND_EXPR, void_type_node, hold_fnstenv,
+  fenv_var = build4 (TARGET_EXPR, fenv_type, fenv_var, hold_fnstenv,
+NULL_TREE, NULL_TREE);
+  *hold = build2 (COMPOUND_EXPR, void_type_node, fenv_var,
  hold_fnclex);
   *clear = build_call_expr (fnclex, 0);
-  tree sw_var = create_tmp_var (short_unsigned_type_node);
+  tree sw_var = create_tmp_var_raw (short_unsigned_type_node);
   tree fnstsw_call = build_call_expr (fnstsw, 0);
   tree sw_mod = build2 (MODIFY_EXPR, short_unsigned_type_node,
sw_var, fnstsw_call);
@@ -53161,8 +53163,8 @@ ix86_atomic_assign_expand_fenv (tree *hold, tree *clear, tree *update)
 }
   if (TARGET_SSE_MATH)
 {
-  tree mxcsr_orig_var = create_tmp_var (unsigned_type_node);
-  tree mxcsr_mod_var = create_tmp_var (unsigned_type_node);
+  tree mxcsr_orig_var = create_tmp_var_raw (unsigned_type_node);
+  tree mxcsr_mod_var = create_tmp_var_raw (unsigned_type_node);
   tree stmxcsr = ix86_builtins[IX86_BUILTIN_STMXCSR];
   tree ldmxcsr = ix86_builtins[IX86_BUILTIN_LDMXCSR];
   tree stmxcsr_hold_call = build_call_expr (stmxcsr, 0);
diff --git gcc/testsuite/gcc.dg/atomic/pr65345-4.c gcc/testsuite/gcc.dg/atomic/pr65345-4.c
index e69de29..6d44def 100644
--- gcc/testsuite/gcc.dg/atomic/pr65345-4.c
+++ gcc/testsuite/gcc.dg/atomic/pr65345-4.c
@@ -0,0 +1,58 @@
+/* PR c/65345 */
+/* { dg-options "" } */
+
+#define CHECK(X) if (!(X)) __builtin_abort ()
+
+_Atomic float i = 5;
+_Atomic float j = 2;
+
+void
+fn1 (float a[(int) (i = 0)])
+{
+}
+
+void
+fn2 (float a[(int) (i += 2)])
+{
+}
+
+void
+fn3 (float a[(int) ++i])
+{
+}
+
+void
+fn4 (float a[(int) ++i])
+{
+}
+
+void
+fn5 (float a[(int) ++i][(int) (j = 10)])
+{
+}
+
+void
+fn6 (float a[(int) (i = 7)][(int) j--])
+{
+}
+
+int
+main ()
+{
+  float a[10];
+  float aa[10][10];
+  fn1 (a);
+  CHECK (i == 0);
+  fn2 (a);
+  CHECK (i == 2);
+  fn3 (a);
+  CHECK (i == 3);
+  fn4 (a);
+  CHECK (i == 4);
+  fn5 (aa);
+  CHECK (i == 5);
+  CHECK (j == 10);
+  fn6 (aa);
+  CHECK (i == 7);
+  CHECK (j == 9);
+}
diff --git gcc/testsuite/gcc.dg/pr65345-3.c gcc/testsuite/gcc.dg/pr65345-3.c
index e69de29..cda9364 100644
--- gcc/testsuite/gcc.dg/pr65345-3.c
+++ gcc/testsuite/gcc.dg/pr65345-3.c
@@ -0,0 +1,35 @@
+/* PR c/65345 */
+/* { dg-options "" } */
+
+_Atomic float i = 3.0f;
+
+float a1 = sizeof (i + 1.2);
+float a2 = sizeof (i = 0);
+float a3 = sizeof (i++);
+float a4 = sizeof (i--);
+float a5 = sizeof (-i);
+
+float b1 = _Alignof (i + 1);
+float b2 = _Alignof (i = 0);
+float b3 = _Alignof (i++);
+float b4 = _Alignof (i--);
+float b5 = _Alignof (-i);
+
+float c1 = i; /* { dg-error "initializer element is not constant" } */
+float c2 = (i ? 1 : 2); /* { dg-error "initializer element is not constant" } */
+float c3[(int) i]; /* { dg-error "variably modified" } */
+float c4 = 0 || i; /* { dg-error "initializer element is not constant" } */
+float c5 = (i += 10); /* { dg-error "initializer element is not constant" } */

Re: [Patch match.pd] Add a simplify rule for x * copysign (1.0, y);

2015-10-01 Thread James Greenhalgh
On Thu, Oct 01, 2015 at 03:28:22PM +0100, pins...@gmail.com wrote:
> > 
> > On Oct 1, 2015, at 6:57 AM, James Greenhalgh  
> > wrote:
> > 
> > 
> > Hi,
> > 
> > If it is cheap enough to treat a floating-point value as an integer and
> > to do bitwise arithmetic on it (as it is for AArch64) we can rewrite:
> > 
> >  x * copysign (1.0, y)
> > 
> > as:
> > 
> >  x ^ (y & (1 << sign_bit_position))
> 
> Why not just convert it to copysign (x, y) instead and let expand choose
> the better implementation?

Because that transformation is invalid :-)

let x = -1.0, y = -1.0

  x * copysign (1.0, y)
=  -1.0 * copysign (1.0, -1.0)
= -1.0 * -1.0
= 1.0

  copysign (x, y)
= copysign (-1.0, -1.0)
= -1.0

Or have I completely lost my maths skills :-)

> Also I think this can only be done for finite and non-trapping types.

That may well be true; I swithered either way and went for no checks, but
I'd happily go back on that and wrap this in something suitably restrictive
if I need to.

Thanks,
James


> > 
> > This patch implements that rewriting rule in match.pd, and a testcase
> > expecting the transform.
> > 
> > This is worth about 6% in 481.wrf for AArch64. I don't know enough
> > about the x86 microarchitectures to know how productive this transformation
> > is there. In Spec2006FP I didn't see any interesting results in either
> > direction. Looking at code generation for the testcase I add, I think the
> > x86 code generation looks worse, but I can't understand why it doesn't use
> > a vector-side xor and load the mask vector-side. With that fixed up I think
> > the code generation would look better - though as I say, I'm not an expert
> > here...
> > 
> > Bootstrapped on both aarch64-none-linux-gnu and x86_64 with no issues.
> > 
> > OK for trunk?
> > 
> > Thanks,
> > James
> > 
> > ---
> > gcc/
> > 
> > 2015-10-01  James Greenhalgh  
> > 
> >* match.pd (mult (COPYSIGN:s real_onep @0) @1): New simplifier.
> > 
> > gcc/testsuite/
> > 
> > 2015-10-01  James Greenhalgh  
> > 
> >* gcc.dg/tree-ssa/copysign.c: New.
> > 
> > <0001-Patch-match.pd-Add-a-simplify-rule-for-x-copysign-1..patch>
> 


[PATCH, i386, AVX-512, doc] Mention all AVX-512 switches in invoke.texi.

2015-10-01 Thread Kirill Yukhin
Hello,

This patch adds missing AVX-512 switches to invoke.texi.

`make pdf` looks ok.
Is it ok for trunk and gcc-5-branch (a week after check in to trunk)?

gcc/
* doc/invoke.texi: Mention -mavx512vl, -mavx512bw, -mavx512dq,
-mavx512vbmi, -mavx512ifma.  Add missing opindex-es.

--
Thanks, K

commit 5615034caed821c52f0a8c97966e0160f6dd9a5e
Author: Kirill Yukhin 
Date:   Thu Oct 1 16:57:52 2015 +0300

AVX-512. Mention all AVX-512 switches in invoke.texi.

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index ebfaaa1..b5f4b81 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -1085,9 +1085,10 @@ See RS/6000 and PowerPC Options.
 -mrecip -mrecip=@var{opt} @gol
 -mvzeroupper -mprefer-avx128 @gol
 -mmmx  -msse  -msse2 -msse3 -mssse3 -msse4.1 -msse4.2 -msse4 -mavx @gol
--mavx2 -mavx512f -mavx512pf -mavx512er -mavx512cd -msha @gol
--maes -mpclmul -mfsgsbase -mrdrnd -mf16c -mfma -mprefetchwt1 @gol
--mclflushopt -mxsavec -mxsaves @gol
+-mavx2 -mavx512f -mavx512pf -mavx512er -mavx512cd -mavx512vl @gol
+-mavx512bw -mavx512dq -mavx512ifma -mavx512vbmi -msha -maes @gol
+-mpclmul -mfsgsbase -mrdrnd -mf16c -mfma @gol
+-mprefetchwt1 -mclflushopt -mxsavec -mxsaves @gol
 -msse4a -m3dnow -mpopcnt -mabm -mbmi -mtbm -mfma4 -mxop -mlzcnt @gol
 -mbmi2 -mfxsr -mxsave -mxsaveopt -mrtm -mlwp -mmpx -mmwaitx -mthreads @gol
 -mno-align-stringops  -minline-all-stringops @gol
@@ -22810,31 +22811,58 @@ preferred alignment to @option{-mpreferred-stack-boundary=2}.
 @opindex msse
 @need 200
 @itemx -msse2
+@opindex msse2
 @need 200
 @itemx -msse3
+@opindex msse3
 @need 200
 @itemx -mssse3
+@opindex mssse3
 @need 200
 @itemx -msse4
+@opindex msse4
 @need 200
 @itemx -msse4a
+@opindex msse4a
 @need 200
 @itemx -msse4.1
+@opindex msse4.1
 @need 200
 @itemx -msse4.2
+@opindex msse4.2
 @need 200
 @itemx -mavx
 @opindex mavx
 @need 200
 @itemx -mavx2
+@opindex mavx2
 @need 200
 @itemx -mavx512f
+@opindex mavx512f
 @need 200
 @itemx -mavx512pf
+@opindex mavx512pf
 @need 200
 @itemx -mavx512er
+@opindex mavx512er
 @need 200
 @itemx -mavx512cd
+@opindex mavx512cd
+@need 200
+@itemx -mavx512vl
+@opindex mavx512vl
+@need 200
+@itemx -mavx512bw
+@opindex mavx512bw
+@need 200
+@itemx -mavx512dq
+@opindex mavx512dq
+@need 200
+@itemx -mavx512ifma
+@opindex mavx512ifma
+@need 200
+@itemx -mavx512vbmi
+@opindex mavx512vbmi
 @need 200
 @itemx -msha
 @opindex msha
@@ -22861,8 +22889,10 @@ preferred alignment to @option{-mpreferred-stack-boundary=2}.
 @opindex mfma
 @need 200
 @itemx -mfma4
+@opindex mfma4
 @need 200
 @itemx -mno-fma4
+@opindex mno-fma4
 @need 200
 @itemx -mprefetchwt1
 @opindex mprefetchwt1
@@ -22919,7 +22949,8 @@ preferred alignment to @option{-mpreferred-stack-boundary=2}.
 These switches enable the use of instructions in the MMX, SSE,
 SSE2, SSE3, SSSE3, SSE4.1, AVX, AVX2, AVX512F, AVX512PF, AVX512ER, AVX512CD,
 SHA, AES, PCLMUL, FSGSBASE, RDRND, F16C, FMA, SSE4A, FMA4, XOP, LWP, ABM,
-BMI, BMI2, FXSR, XSAVE, XSAVEOPT, LZCNT, RTM, MPX, MWAITX or 3DNow!@:
+AVX512VL, AVX512BW, AVX512DQ, AVX512IFMA AVX512VBMI, BMI, BMI2, FXSR,
+XSAVE, XSAVEOPT, LZCNT, RTM, MPX, MWAITX or 3DNow!@:
 extended instruction sets.  Each has a corresponding @option{-mno-} option
 to disable use of these instructions.
 


Re: C PATCH for c/65345 (file-scope _Atomic expansion with floats)

2015-10-01 Thread David Edelsohn
On Thu, Oct 1, 2015 at 10:49 AM, Marek Polacek  wrote:
> Joseph reminded me that I had forgotten about this patch.  As mentioned
> here , I'm
> removing the XFAILs in the tests so people are likely to see new FAILs.
>
> I think the following targets will need a similar fix to the one below:
> * MIPS
> * rs6000
> * alpha
> * sparc
> * s390
> * arm
> * sh
> * aarch64
>
> I'm CCing the respective maintainers.  You might want to XFAIL those tests.

Why aren't you testing the appropriate fix on all of the targets?

- David


Re: [Patch match.pd] Add a simplify rule for x * copysign (1.0, y);

2015-10-01 Thread pinskia

> On Oct 1, 2015, at 7:51 AM, James Greenhalgh  wrote:
> 
> On Thu, Oct 01, 2015 at 03:28:22PM +0100, pins...@gmail.com wrote:
>>> 
>>> On Oct 1, 2015, at 6:57 AM, James Greenhalgh  
>>> wrote:
>>> 
>>> 
>>> Hi,
>>> 
>>> If it is cheap enough to treat a floating-point value as an integer and
>>> to do bitwise arithmetic on it (as it is for AArch64) we can rewrite:
>>> 
>>> x * copysign (1.0, y)
>>> 
>>> as:
>>> 
>>> x ^ (y & (1 << sign_bit_position))
>> 
>> Why not just convert it to copysign (x, y) instead and let expand choose
>> the better implementation?
> 
> Because that transformation is invalid :-)
> 
> let x = -1.0, y = -1.0
> 
>  x * copysign (1.0, y)
>=  -1.0 * copysign (1.0, -1.0)
>= -1.0 * -1.0
>= 1.0
> 
>  copysign (x, y)
>= copysign (-1.0, -1.0)
>= -1.0
> 
> Or have I completely lost my maths skills :-)

No, you are correct.  Note I would rather see the copysign form at the tree
level and have the integer form at the RTL level, so placing this in expand
would be better than in match.pd.

Thanks,
Andrew

> 
>> Also I think this can only be done for finite and non-trapping types.
> 
> That may well be true; I swithered either way and went for no checks, but
> I'd happily go back on that and wrap this in something suitably restrictive
> if I need to.
> 
> Thanks,
> James
> 
> 
>>> 
>>> This patch implements that rewriting rule in match.pd, and a testcase
>>> expecting the transform.
>>> 
>>> This is worth about 6% in 481.wrf for AArch64. I don't know enough
>>> about the x86 microarchitectures to know how productive this transformation
>>> is there. In Spec2006FP I didn't see any interesting results in either
>>> direction. Looking at code generation for the testcase I add, I think the
>>> x86 code generation looks worse, but I can't understand why it doesn't use
>>> a vector-side xor and load the mask vector-side. With that fixed up I think
>>> the code generation would look better - though as I say, I'm not an expert
>>> here...
>>> 
>>> Bootstrapped on both aarch64-none-linux-gnu and x86_64 with no issues.
>>> 
>>> OK for trunk?
>>> 
>>> Thanks,
>>> James
>>> 
>>> ---
>>> gcc/
>>> 
>>> 2015-10-01  James Greenhalgh  
>>> 
>>>   * match.pd (mult (COPYSIGN:s real_onep @0) @1): New simplifier.
>>> 
>>> gcc/testsuite/
>>> 
>>> 2015-10-01  James Greenhalgh  
>>> 
>>>   * gcc.dg/tree-ssa/copysign.c: New.
>>> 
>>> <0001-Patch-match.pd-Add-a-simplify-rule-for-x-copysign-1..patch>
>> 


[gomp4] gimple fold acc_on_device

2015-10-01 Thread Nathan Sidwell

I've applied this version of the acc_on_device folding to gomp4.

See https://gcc.gnu.org/ml/gcc-patches/2015-10/msg00074.html for the trunk 
discussion.


nathan
2015-10-01  Nathan Sidwell  

	* builtins.c: Don't include gomp-constants.h.
	(fold_builtin_1): Don't fold acc_on_device here.
	* gimple-fold.c: Include gomp-constants.h.
	(gimple_fold_builtin_acc_on_device): New.
	(gimple_fold_builtin): Call it.

Index: builtins.c
===================================================================
--- builtins.c	(revision 228288)
+++ builtins.c	(working copy)
@@ -64,8 +64,6 @@ along with GCC; see the file COPYING3.
 #include "cgraph.h"
 #include "tree-chkp.h"
 #include "rtl-chkp.h"
-#include "gomp-constants.h"
-#include "omp-low.h"
 
 static tree do_mpc_arg1 (tree, tree, int (*)(mpc_ptr, mpc_srcptr, mpc_rnd_t));
 
@@ -10230,27 +10228,6 @@ fold_builtin_1 (location_t loc, tree fnd
 	return build_empty_stmt (loc);
   break;
 
-case BUILT_IN_ACC_ON_DEVICE:
-  /* Don't fold on_device until we know which compiler is active.  */
-  if (symtab->state == EXPANSION)
-	{
-	  unsigned val_host = GOMP_DEVICE_HOST;
-	  unsigned val_dev = GOMP_DEVICE_NONE;
-
-#ifdef ACCEL_COMPILER
-	  val_host = GOMP_DEVICE_NOT_HOST;
-	  val_dev = ACCEL_COMPILER_acc_device;
-#endif
-	  tree host = build2 (EQ_EXPR, boolean_type_node, arg0,
-			  build_int_cst (integer_type_node, val_host));
-	  tree dev = build2 (EQ_EXPR, boolean_type_node, arg0,
-			 build_int_cst (integer_type_node, val_dev));
-
-	  tree result = build2 (TRUTH_OR_EXPR, boolean_type_node, host, dev);
-	  return fold_convert (integer_type_node, result);
-	}
-  break;
-
 default:
   break;
 }
Index: gimple-fold.c
===================================================================
--- gimple-fold.c	(revision 228288)
+++ gimple-fold.c	(working copy)
@@ -62,6 +62,7 @@ along with GCC; see the file COPYING3.
 #include "output.h"
 #include "tree-eh.h"
 #include "gimple-match.h"
+#include "gomp-constants.h"
 
 /* Return true when DECL can be referenced from current unit.
FROM_DECL (if non-null) specify constructor of variable DECL was taken from.
@@ -2708,6 +2709,47 @@ gimple_fold_builtin_strlen (gimple_stmt_
   return true;
 }
 
+/* Fold a call to __builtin_acc_on_device.  */
+
+static bool
+gimple_fold_builtin_acc_on_device (gimple_stmt_iterator *gsi, tree arg0)
+{
+  /* Defer folding until we know which compiler we're in.  */
+  if (symtab->state != EXPANSION)
+return false;
+
+  unsigned val_host = GOMP_DEVICE_HOST;
+  unsigned val_dev = GOMP_DEVICE_NONE;
+
+#ifdef ACCEL_COMPILER
+  val_host = GOMP_DEVICE_NOT_HOST;
+  val_dev = ACCEL_COMPILER_acc_device;
+#endif
+
+  location_t loc = gimple_location (gsi_stmt (*gsi));
+  
+  tree host_eq = make_ssa_name (boolean_type_node);
+  gimple *host_ass = gimple_build_assign
+(host_eq, EQ_EXPR, arg0, build_int_cst (TREE_TYPE (arg0), val_host));
+  gimple_set_location (host_ass, loc);
+  gsi_insert_before (gsi, host_ass, GSI_SAME_STMT);
+
+  tree dev_eq = make_ssa_name (boolean_type_node);
+  gimple *dev_ass = gimple_build_assign
+(dev_eq, EQ_EXPR, arg0, build_int_cst (TREE_TYPE (arg0), val_dev));
+  gimple_set_location (dev_ass, loc);
+  gsi_insert_before (gsi, dev_ass, GSI_SAME_STMT);
+
+  tree result = make_ssa_name (boolean_type_node);
+  gimple *result_ass = gimple_build_assign
+(result, BIT_IOR_EXPR, host_eq, dev_eq);
+  gimple_set_location (result_ass, loc);
+  gsi_insert_before (gsi, result_ass, GSI_SAME_STMT);
+
+  replace_call_with_value (gsi, result);
+
+  return true;
+}
 
 /* Fold the non-target builtin at *GSI and return whether any simplification
was made.  */
@@ -2848,6 +2890,9 @@ gimple_fold_builtin (gimple_stmt_iterato
 	   n == 3
 	   ? gimple_call_arg (stmt, 2)
 	   : NULL_TREE, fcode);
+case BUILT_IN_ACC_ON_DEVICE:
+  return gimple_fold_builtin_acc_on_device (gsi,
+		gimple_call_arg (stmt, 0));
 default:;
 }
 


Re: [PATCH] vectorizing conditional expressions (PR tree-optimization/65947)

2015-10-01 Thread Alan Hayward


On 30/09/2015 13:45, "Richard Biener"  wrote:

>On Wed, Sep 23, 2015 at 5:51 PM, Alan Hayward 
>wrote:
>>
>>
>> On 18/09/2015 14:53, "Alan Hayward"  wrote:
>>
>>>
>>>
>>>On 18/09/2015 14:26, "Alan Lawrence"  wrote:
>>>
On 18/09/15 13:17, Richard Biener wrote:
>
> Ok, I see.
>
> That this case is already vectorized is because it implements
>MAX_EXPR,
> modifying it slightly to
>
> int foo (int *a)
> {
>int val = 0;
>for (int i = 0; i < 1024; ++i)
>  if (a[i] > val)
>val = a[i] + 1;
>return val;
> }
>
> makes it no longer handled by current code.
>

Yes. I believe the idea for the patch is to handle arbitrary
expressions
like

int foo (int *a)
{
int val = 0;
for (int i = 0; i < 1024; ++i)
  if (some_expression (i))
val = another_expression (i);
return val;
}
>>>
>>>Yes, that’s correct. Hopefully my new test cases should cover
>>>everything.
>>>
>>
>> Attached is a new version of the patch containing all the changes
>> requested by Richard.
>
>+  /* Compare the max index vector to the vector of found indexes to find
>+ the position of the max value.  This will result in either a single
>+ match or all of the values.  */
>+  tree vec_compare = make_ssa_name (index_vec_type_signed);
>+  gimple vec_compare_stmt = gimple_build_assign (vec_compare, EQ_EXPR,
>+ induction_index,
>+ max_index_vec);
>
>I'm not sure all targets can handle this.  If I decipher the code
>correctly then we do
>
>  mask = induction_index == max_index_vec;
>  vec_and = mask & vec_data;
>
>plus some casts.  So this is basically
>
>  vec_and = induction_index == max_index_vec ? vec_data : {0, 0, ... };
>
>without the need to relate the induction index vector type to the data
>vector type.
>I believe this is also the form all targets support.


Ok, I’ll replace this.

>
>I am missing a comment before all this code-generation that shows the
>transform result with the variable names used in the code-gen.  I have a
>hard time connecting things here.

Ok, I’ll add some comments.

>
>+  tree matched_data_reduc_cast = build1 (VIEW_CONVERT_EXPR, scalar_type,
>+ matched_data_reduc);
>+  epilog_stmt = gimple_build_assign (new_scalar_dest,
>+ matched_data_reduc_cast);
>+  new_temp = make_ssa_name (new_scalar_dest, epilog_stmt);
>+  gimple_assign_set_lhs (epilog_stmt, new_temp);
>
>this will leave the stmt unsimplified.  Scalar sign-changes should use
>NOP_EXPR, not VIEW_CONVERT_EXPR.  The easiest fix is to use fold_convert
>instead.  Also just do like before - first make_ssa_name and then
>directly use it in the gimple_build_assign.

We need the VIEW_CONVERT_EXPR for the cases where we have float data
values. The index is always integer.


>
>The patch is somewhat hard to parse with all the indentation changes.  A
>context diff would be much easier to read in those contexts.

Ok, I’ll make the next patch like that

>
>+  if (v_reduc_type == COND_REDUCTION)
>+{
>+  widest_int ni;
>+
>+  if (! max_loop_iterations (loop, &ni))
>+   {
>+ if (dump_enabled_p ())
>+   dump_printf_loc (MSG_NOTE, vect_location,
>+"loop count not known, cannot create cond "
>+"reduction.\n");
>
>ugh.  That's bad.
>
>+  /* The additional index will be the same type as the condition.  Check
>+ that the loop can fit into this less one (because we'll use up the
>+ zero slot for when there are no matches).  */
>+  tree max_index = TYPE_MAX_VALUE (cr_index_scalar_type);
>+  if (wi::geu_p (ni, wi::to_widest (max_index)))
>+   {
>+ if (dump_enabled_p ())
>+   dump_printf_loc (MSG_NOTE, vect_location,
>+"loop size is greater than data size.\n");
>+ return false;
>
>Likewise.

We could do better if we made the index type larger.
But as a first implementation of this optimisation, I didn’t want to
overcomplicate things more.

>
>@@ -5327,6 +5540,8 @@ vectorizable_reduction (gimple stmt, gimple_stmt_iterator *gsi,
>   if (dump_enabled_p ())
> dump_printf_loc (MSG_NOTE, vect_location, "transform reduction.\n");
>
>+  STMT_VINFO_TYPE (stmt_info) = reduc_vec_info_type;
>+
>   /* FORNOW: Multiple types are not supported for condition.  */
>   if (code == COND_EXPR)
>
>this change looks odd (or wrong).  The type should be _only_ set/changed
>during analysis.


The problem is, for COND_EXPRs, this function calls
vectorizable_condition(), which sets STMT_VINFO_TYPE to
condition_vec_info_type.

Therefore we need something to restore it back to reduc_vec_info_type on
the non-analysis call.

I considered setting STMT_VINFO_TY

RE: [PATCH][AArch64] Add separate insn sched class for vector LDP & STP

2015-10-01 Thread Evandro Menezes
Hi, Rama.

My patch changed the type of a couple of A64 insns from "neon_load2_2reg_q" and
"neon_store2_2reg_q" to the new "neon_ldp" and "neon_stp" types.  However,
neither ThunderX nor Xgene referred to these types, only A57 and A53 did, so I
didn't feel that I'd be the best person to add them to those machines.

Thank you,

-- 
Evandro Menezes  Austin, TX


> -Original Message-
> From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-ow...@gcc.gnu.org] On
> Behalf Of Ramana Radhakrishnan
> Sent: Tuesday, September 29, 2015 19:47
> To: Evandro Menezes
> Cc: gcc-patches; James Greenhalgh; Marcus Shawcroft; Philipp Tomsich
> Subject: Re: [PATCH][AArch64] Add separate insn sched class for vector LDP &
> STP
> 
> On Tue, Sep 29, 2015 at 12:52 AM, Evandro Menezes 
> wrote:
> > In some micro-architectures the insns to load or store pairs of vector
> > registers are implemented rather differently from those affecting
> > lanes in vector registers.  Then, it's important that such insns be
> > described likewise differently in the scheduling model.
> >
> > This patch adds the insn types neon_ldp{,_q} and neon_stp{,_q} apart
> > from the current neon_load2_2reg_q and neon_store2_2reg_q types,
> respectively.
> 
> In such types.md restructuring, please handle these in *all* affected
> scheduler descriptions, in this case thunder and xgene are 2 scheduler
> descriptions that you have missed - Given Andrew is handling Thunder, please
> update the xgene backend too at the same time. I can't think of anything else
> that is affected right now.
> 
> A simple way to do that is to rename the old form to something else in an
> intermediate patch using git to figure out all the micro-architectures
> affected that need to be handled for both arm and
> aarch64 backends and then add the new forms to handle this.
> 
> If there need to be follow up patches for xgene with different handling, I'm
> sure Philipp will follow up - added him to CC.
> 
> Thanks,
> Ramana
> 
> >
> > Thank you,
> >
> > --
> > Evandro Menezes
> >



Re: [Patch match.pd] Add a simplify rule for x * copysign (1.0, y);

2015-10-01 Thread Michael Matz
Hi,

On Thu, 1 Oct 2015, James Greenhalgh wrote:

> > >  x * copysign (1.0, y)
> > > 
> > >  x ^ (y & (1 << sign_bit_position))
> > 
> > Also I think this can only be done for finite and non trapping types. 
> 
> > That may well be true; I swithered either way and went for no checks, 
> > but I'd happily go back on that and wrap this in something suitably 
> > restrictive if I need to.

I don't think that's necessary.  copysign (1.0, y) is always 1.0 or -1.0, 
even with y being a NaN or inf.  Additionally copysign is allowed to not 
signal even if y is a sNaN.  That leaves only the form of x to doubt.  If 
x is inf all is well (multiplying by +-1.0 is defined and both sequences 
get the same result), if x is NaN the result must be a NaN, and it is in 
both cases.  The catch is that strictly speaking (NaN * -1.0) needs to 
deliver NaN, not -NaN (operations involving quiet NaNs need to provide 
one of the input NaNs as result), and here both are not equivalent.  OTOH 
the sign of NaNs isn't specified, so I think we could reasonably decide to 
not care about this case (it would have to be checked if the hardware 
multiplication even follows that rule, otherwise it's moot anyway).

And yes, also on x86-64 cores the new sequence would be better (or at 
least as good; the latency of xor[sp][sd] is less than or equal to mul), 
but that only is the case if the arithmetic really happen in SSE 
registers, not in integer ones, and this isn't the case right now.  Hmpf.
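For doubles, the transformation under discussion amounts to copying y's sign bit and XOR-ing it into x. A minimal sketch using memcpy for the bit-level view (and ignoring the sNaN caveat, which the thread addresses separately):

```c
#include <stdint.h>
#include <string.h>

/* x * copysign (1.0, y) rewritten as a sign-bit XOR: valid because
   copysign (1.0, y) is always +1.0 or -1.0, so the product is x with
   its sign flipped iff y is negative.  Not valid when sNaN exceptions
   must be honored.  */
static double mul_copysign1(double x, double y)
{
    uint64_t xb, yb;
    memcpy(&xb, &x, sizeof xb);
    memcpy(&yb, &y, sizeof yb);
    xb ^= yb & (UINT64_C(1) << 63);  /* x ^ (y & (1 << sign_bit_position)) */
    memcpy(&x, &xb, sizeof xb);
    return x;
}
```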


Ciao,
Michael.


Re: [PATCH] Fix warnings building pdp11 port

2015-10-01 Thread Jeff Law

On 10/01/2015 03:49 AM, Richard Biener wrote:

On Wed, Sep 30, 2015 at 6:43 PM, Jeff Law  wrote:

On 09/30/2015 01:48 AM, Richard Biener wrote:


On Tue, Sep 29, 2015 at 6:55 PM, Jeff Law  wrote:


The pdp11 port fails to build with the trunk because of a warning.
Essentially VRP determines that the result of using BRANCH_COST is a
constant with the range [0..1].  That's always less than 4, 3 and the
various other magic constants used with BRANCH_COST and VRP issues a
warning
about that comparison.



It does?  Huh.  Is it about undefined overflow which is the only thing
VRP should end up
warning about?  If so I wonder how that happens, at least I can't
reproduce it for
--target=pdp11 --enable-werror build of cc1.


You have to use a trunk compiler to build the pdp11 cross.  You'll bump into
this repeatedly:

   if (warn_type_limits
       && ret && only_ranges
       && TREE_CODE_CLASS (code) == tcc_comparison
       && TREE_CODE (op0) == SSA_NAME)
     {
       /* If the comparison is being folded and the operand on the LHS
          is being compared against a constant value that is outside of
          the natural range of OP0's type, then the predicate will
          always fold regardless of the value of OP0.  If -Wtype-limits
          was specified, emit a warning.  */
       tree type = TREE_TYPE (op0);
       value_range_t *vr0 = get_value_range (op0);

       if (vr0->type == VR_RANGE
           && INTEGRAL_TYPE_P (type)
           && vrp_val_is_min (vr0->min)
           && vrp_val_is_max (vr0->max)
           && is_gimple_min_invariant (op1))
         {
           location_t location;

           if (!gimple_has_location (stmt))
             location = input_location;
           else
             location = gimple_location (stmt);

           warning_at (location, OPT_Wtype_limits,
                       integer_zerop (ret)
                       ? G_("comparison always false "
                            "due to limited range of data type")
                       : G_("comparison always true "
                            "due to limited range of data type"));
         }
     }


Oh, I didn't remember we have this kind of warning in VRP ... it's
bound to trigger
for example after jump-threading.  So I'm not sure it's useful.
It caught me by surprise as well.  It's a poor man's attempt at 
unreachable code warnings.  Strangely, it's been around since 2009, but 
is only just now causing problems.  I'd certainly question its utility 
as well.


That may be a symptom of something else not optimizing the condition 
earlier or we've made some changes that expose the collapsed range to VRP.


Jeff




Re: [Patch match.pd] Add a simplify rule for x * copysign (1.0, y);

2015-10-01 Thread Jakub Jelinek
On Thu, Oct 01, 2015 at 05:43:15PM +0200, Michael Matz wrote:
> Hi,
> 
> On Thu, 1 Oct 2015, James Greenhalgh wrote:
> 
> > > >  x * copysign (1.0, y)
> > > > 
> > > >  x ^ (y & (1 << sign_bit_position))
> > > 
> > > Also I think this can only be done for finite and non trapping types. 
> > 
> > That may well be true; I swithered either way and went for no checks, 
> > but I'd happily go back on that and wrap this in something suitably 
> > restrictive if I need to.
> 
> I don't think that's necessary.  copysign (1.0, y) is always 1.0 or -1.0, 
> even with y being a NaN or inf.  Additionally copysign is allowed to not 
> signal even if y is a sNaN.  That leaves only the form of x to doubt.  If 
> x is inf all is well (multiplying by +-1.0 is defined and both sequences 
> get the same result), if x is NaN the result must be a NaN, and it is in 
> both cases.  The catch is that strictly speaking (NaN * -1.0) needs to 
> deliver NaN, not -NaN (operations involving quiet NaNs need to provide 
> one of the input NaNs as result), and here both are not equivalent.  OTOH 
> the sign of NaNs isn't specified, so I think we could reasonably decide to 
> not care about this case (it would have to be checked if the hardware 
> multiplication even follows that rule, otherwise it's moot anyway).

But if x is a sNaN, then the multiplication will throw an exception, while
the transformed operation will not.  So perhaps it should be guarded by
!HONOR_SNANS (TYPE_MODE (type))
?  And sure, somebody should look at why this isn't done in SSE.

Jakub


Re: [PATCH] x86 interrupt attribute

2015-10-01 Thread Uros Bizjak
On Thu, Oct 1, 2015 at 2:24 AM, H.J. Lu  wrote:
> On Wed, Sep 30, 2015 at 12:53 PM, Yulia Koval  wrote:
>> Done.
>>
>
> +  /* If true, the current function is an interrupt service
> + routine as specified by the "interrupt" attribute.  */
> +  BOOL_BITFIELD is_interrupt : 1;
> +
> +  /* If true, the current function is an exception service
> + routine as specified by the "interrupt" attribute.  */
> +  BOOL_BITFIELD is_exception : 1;
>
>
> It is not very clear what is the difference between is_interrupt
> and is_exception.  How about
>
>   /* If true, the current function is an interrupt service routine with
>  a pointer argument and an optional integer argument as specified by
>  the "interrupt" attribute.  */
>   BOOL_BITFIELD is_interrupt : 1;
>
>   /* If true, the current function is an interrupt service routine with
>  a pointer argument and an integer argument as specified by the
>  "interrupt" attribute.  */
>   BOOL_BITFIELD is_exception : 1;

Actually, both BOOL_BITFIELD flags should be rewritten as 2-bit
ENUM_BITFIELD using descriptive enum, e.g.

  ENUM_BITFIELD(function_type) func_type : 2;

with

TYPE_NORMAL = 0,
TYPE_INTERRUPT,
TYPE_EXCEPTION

This will simplify checking of function types, and make everything
more readable and maintainable.

Uros.


Re: [Patch match.pd] Add a simplify rule for x * copysign (1.0, y);

2015-10-01 Thread Joseph Myers
On Thu, 1 Oct 2015, Michael Matz wrote:

> both cases.  The catch is that strictly speaking (NaN * -1.0) needs to 
> deliver NaN, not -NaN (operations involving quiet NaNs need to provide 
> one of the input NaNs as result), and here both are not equivalent.  OTOH 
> the sign of NaNs isn't specified, so I think we could reasonably decide to 
> not care about this case (it would have to be checked if the hardware 
> multiplication even follows that rule, otherwise it's moot anyway).

"For all other operations, this standard does not specify the sign bit of 
a NaN result, even when there is only one input NaN, or when the NaN is 
produced from an invalid operation." (IEEE 754-2008, 6.3 The sign bit).  
So no need to care about this case.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH] x86 interrupt attribute

2015-10-01 Thread H.J. Lu
On Thu, Oct 1, 2015 at 8:59 AM, Uros Bizjak  wrote:
> On Thu, Oct 1, 2015 at 2:24 AM, H.J. Lu  wrote:
>> On Wed, Sep 30, 2015 at 12:53 PM, Yulia Koval  wrote:
>>> Done.
>>>
>>
>> +  /* If true, the current function is an interrupt service
>> + routine as specified by the "interrupt" attribute.  */
>> +  BOOL_BITFIELD is_interrupt : 1;
>> +
>> +  /* If true, the current function is an exception service
>> + routine as specified by the "interrupt" attribute.  */
>> +  BOOL_BITFIELD is_exception : 1;
>>
>>
>> It is not very clear what is the difference between is_interrupt
>> and is_exception.  How about
>>
>>   /* If true, the current function is an interrupt service routine with
>>  a pointer argument and an optional integer argument as specified by
>>  the "interrupt" attribute.  */
>>   BOOL_BITFIELD is_interrupt : 1;
>>
>>   /* If true, the current function is an interrupt service routine with
>>  a pointer argument and an integer argument as specified by the
>>  "interrupt" attribute.  */
>>   BOOL_BITFIELD is_exception : 1;
>
> Actually, both BOOL_BITFIELD flags should be rewritten as 2-bit
> ENUM_BITFIELD using descriptive enum, e.g.
>
>   ENUM_BITFIELD(function_type) func_type : 2;
>
> with
>
> TYPE_NORMAL = 0,
> TYPE_INTERRUPT,
> TYPE_EXCEPTION
>
> This will simplify checking of function types, and make everything
> more readable and maintainable.
>

Since an exception handler is a subset of interrupt handlers,
we need to check 2 bits separately.  Making the field 2 bits
doesn't make the code more maintainable.

-- 
H.J.


Re: [Patch match.pd] Add a simplify rule for x * copysign (1.0, y);

2015-10-01 Thread Michael Matz
Hi,

On Thu, 1 Oct 2015, Jakub Jelinek wrote:

> But if x is a sNaN, then the multiplication will throw an exception, while
> the transformed operation will not.

Hmm, that's right, silly me.

> So perhaps it should be guarded by
> !HONOR_SNANS (TYPE_MODE (type))
> ?

That makes sense, yes.


Ciao,
Michael.


Re: [PATCH] Update SSA_NAME manager to use two lists

2015-10-01 Thread Jeff Law

On 10/01/2015 04:00 AM, Richard Biener wrote:



Apart from what Jakub said - this keeps the list non-recycled for example
after DCE if that doesnt call cleanup_cfg.  Likewise after passes that call
cleanup_cfg manually.  It also doesn't get called after IPA transform
passes (which would require calling on each function).

To at least catch those passes returning 0 (do nothing) I'd place the
call into execute_todo instead, unconditionally on flags.
I can speculate that there are pathological cases where it'd be useful, 
but in the general case the cost will be small.  It's easy enough to do 
some testing around this.


I'm also still pondering whether or not to have the code simply adapt 
itself to the conditions.   Essentially allowing immediate recycling up 
until the point where we release an SSA_NAME after removing an edge. 
At that point it'd switch to deferred mode until the next time the 
pending list is flushed.


We'd have to arrange to get notified of edge removals, obviously.  I'm 
also still pondering the long term safety issues of that scheme as well 
as the implementation & testing details.





@@ -607,6 +626,7 @@ unsigned int
  pass_release_ssa_names::execute (function *fun)
  {
unsigned i, j;
+  flush_ssaname_freelist ();


which would make this redundant as well.  I suppose it would be
interesting to see some before/after
statistics of the release_ssa_names pass.  I expect the number of
holes removed to increase, hopefully
not too much (esp. important for analysis passes using sbitmaps of SSA names).
There's some TLC that needs to happen in pass_release_ssa_names -- it 
knows far too much about the underlying details of name management.  All 
that code really belongs in the name manager itself and the pass should 
just issue the call into the name manager to release & pack the data. 
That's one of the refactorings I mentioned in my reply to Jakub.


Essentially this should be driven by the name manager and occur at 
points where it's safe and likely profitable.  Safe points occur between 
passes.  Profitability can likely be estimated cheaply within the 
manager itself since it has a good handle on what's in the free lists vs 
the overall size of the name table.


The other area ripe for refactoring and extension are the verification 
bits.  I was torn whether or not to tackle that first or as a follow-up. 
 I ultimately chose the latter.


Jeff


Re: [Patch match.pd] Add a simplify rule for x * copysign (1.0, y);

2015-10-01 Thread Michael Matz
Hi,

On Thu, 1 Oct 2015, Joseph Myers wrote:

> On Thu, 1 Oct 2015, Michael Matz wrote:
> 
> > both cases.  The catch is that strictly speaking (NaN * -1.0) needs to 
> > deliver NaN, not -NaN (operations involving quiet NaNs need to provide 
> > one of the input NaNs as result), and here both are not equivalent.  OTOH 
> > the sign of NaNs isn't specified, so I think we could reasonably decide to 
> > not care about this case (it would have to be checked if the hardware 
> > multiplication even follows that rule, otherwise it's moot anyway).
> 
> "For all other operations, this standard does not specify the sign bit of 
> a NaN result, even when there is only one input NaN, or when the NaN is 
> produced from an invalid operation." (IEEE 754-2008, 6.3 The sign bit).  
> So no need to care about this case.

Ah.  I was looking at an old version; thanks.


Ciao,
Michael.


Re: [PATCH] x86 interrupt attribute

2015-10-01 Thread Uros Bizjak
On Thu, Oct 1, 2015 at 6:08 PM, H.J. Lu  wrote:
> On Thu, Oct 1, 2015 at 8:59 AM, Uros Bizjak  wrote:
>> On Thu, Oct 1, 2015 at 2:24 AM, H.J. Lu  wrote:
>>> On Wed, Sep 30, 2015 at 12:53 PM, Yulia Koval  wrote:
 Done.

>>>
>>> +  /* If true, the current function is an interrupt service
>>> + routine as specified by the "interrupt" attribute.  */
>>> +  BOOL_BITFIELD is_interrupt : 1;
>>> +
>>> +  /* If true, the current function is an exception service
>>> + routine as specified by the "interrupt" attribute.  */
>>> +  BOOL_BITFIELD is_exception : 1;
>>>
>>>
>>> It is not very clear what is the difference between is_interrupt
>>> and is_exception.  How about
>>>
>>>   /* If true, the current function is an interrupt service routine with
>>>  a pointer argument and an optional integer argument as specified by
>>>  the "interrupt" attribute.  */
>>>   BOOL_BITFIELD is_interrupt : 1;
>>>
>>>   /* If true, the current function is an interrupt service routine with
>>>  a pointer argument and an integer argument as specified by the
>>>  "interrupt" attribute.  */
>>>   BOOL_BITFIELD is_exception : 1;
>>
>> Actually, both BOOL_BITFIELD flags should be rewritten as 2-bit
>> ENUM_BITFIELD using descriptive enum, e.g.
>>
>>   ENUM_BITFIELD(function_type) func_type : 2;
>>
>> with
>>
>> TYPE_NORMAL = 0,
>> TYPE_INTERRUPT,
>> TYPE_EXCEPTION
>>
>> This will simplify checking of function types, and make everything
>> more readable and maintainable.
>>
>
> Since an exception handler is a subset of interrupt handlers,
> we need to check 2 bits separately.  Making the field 2 bits
> doesn't make the code more maintainable.

For example, the following code:

+  if (!cum->caller && cfun->machine->is_interrupt)
+{
+  /* The first argument of interrupt handler is a pointer and
+ points to the return address on stack.  The optional second
+ argument is an integer for error code on stack.  */
+  gcc_assert (type != NULL_TREE);
+  if (POINTER_TYPE_P (type))
+ {
+  if (cfun->machine->is_exception)

would become:

if (!cum->caller && cfun->machine->func_type != TYPE_NORMAL)
  {
[...]
if (cfun->machine->func_type == TYPE_EXCEPTION)
[...]

It is kind of unintuitive for the function to be an interrupt and an
exception at the same time, as is implied in the original code.  In the
proposed improvement, it is clear that the function is not normal, and
the check is later refined to an exception.
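A standalone sketch of the proposed layout; the names follow the snippets above, with the surrounding i386 backend structures simplified away:

```c
/* The proposed replacement for the two BOOL_BITFIELDs: one 2-bit
   field classifying the function (standalone sketch; GCC would write
   the field as ENUM_BITFIELD(function_type) func_type : 2).  */
enum function_type {
    TYPE_NORMAL = 0,
    TYPE_INTERRUPT,
    TYPE_EXCEPTION
};

struct machine_function {
    unsigned int func_type : 2;
};

/* An exception handler is still interrupt-like: anything non-normal
   gets the interrupt-style argument handling...  */
static int is_interrupt_like(const struct machine_function *m)
{
    return m->func_type != TYPE_NORMAL;
}

/* ...and only TYPE_EXCEPTION additionally has the error-code argument.  */
static int has_error_code(const struct machine_function *m)
{
    return m->func_type == TYPE_EXCEPTION;
}
```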

Uros.

> --
> H.J.


Re: C PATCH for c/65345 (file-scope _Atomic expansion with floats)

2015-10-01 Thread Marek Polacek
On Thu, Oct 01, 2015 at 11:02:09AM -0400, David Edelsohn wrote:
> On Thu, Oct 1, 2015 at 10:49 AM, Marek Polacek  wrote:
> > Joseph reminded me that I had forgotten about this patch.  As mentioned
> > here, I'm
> > removing the XFAILs in the tests so people are likely to see new FAILs.
> >
> > I think the following targets will need similar fix as the one below:
> > * MIPS
> > * rs6000
> > * alpha
> > * sparc
> > * s390
> > * arm
> > * sh
> > * aarch64
> >
> > I'm CCing the respective maintainers.  You might want to XFAIL those tests.
> 
> Why aren't you testing the appropriate fix on all of the targets?

It's very improbable that I could fix and properly test all of them;
I simply don't have the cycles and resources to fix e.g. sh/sparc/alpha/mips.

You want me to revert my fix, but I don't really see the point here; the
patch doesn't introduce any regressions, it's just that the new tests are
likely to FAIL.  It sounds preferable to me to fix 2 targets than to leave
all of them broken (and I bet many maintainers were unaware of the issue).

Would XFAILing the new tests work for you, if you don't want to see any
new FAILs?

If you still insist on reverting the patch, ok, but I think this PR is
unlikely to be resolved any time soon then.

Marek


Re: [PATCH] fortran/67758 -- Prevent ICE caused by misplaced COMMON

2015-10-01 Thread Steve Kargl
On Thu, Oct 01, 2015 at 03:29:05PM +0200, Mikael Morin wrote:
> Le 01/10/2015 14:16, Mikael Morin a écrit :
> > Le 01/10/2015 02:07, Steve Kargl a écrit :
> >> On Wed, Sep 30, 2015 at 05:06:30PM -0700, Steve Kargl wrote:
> >>> Patch built and regression tested on x86_64-*-freebsd.
> >>> OK to commit?
> >>>
> >>> The patch prevents the dereferencing of a NULL pointer
> >>> by jumping out of the cleanup of a list of COMMON blocks.
> >>>
> > Hold on, I believe p should be present in the common symbol list pointed
> > by p->common.
> s/p->common/p->common_block/
> > And by the way, if we are in gfc_restore_last_undo_checkpoint, we have
> > found something bogus enough to backtrack, so hopefully an error has
> > already been prepared (but maybe not emitted).
> > I will investigate more.
> >
> It seems the error [1] is reported in gfc_add_in_common, between the 
> time the symbol's common_block pointer is set and the time the symbol is 
> added to the list.
> As the program goes straight to clean-up/return upon error, this interim 
> state is not fixed and poses problem.
> 
> So we need to reduce the interim time to zero or fix the state upon error.
> I propose the following, which delays setting the common_block after 
> error_checking (I believe it is not used in that time).
> 
> Regression-tested on x86_64-unknown-linux-gnu. OK for trunk?
> 

I'm fine with your patch, although I find the error message
to be somewhat confusing as no procedure appears in COMMON.
The call-stmt in the code is the start of an execution-construct.
A common-stmt is not allowed in an execution-construct.  At
least, that's how I interpret the BNF in 2.1 of F2008.

-- 
Steve


[PATCH] correctly handle non affine data references

2015-10-01 Thread Sebastian Pop
PR tree-optimization/66980
* graphite-scop-detection.c (stmt_has_simple_data_refs_p): Return false
when data reference analysis has failed.
---
 gcc/graphite-scop-detection.c|  7 +++
 gcc/testsuite/gcc.dg/graphite/scop-pr66980.c | 10 ++
 2 files changed, 17 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/graphite/scop-pr66980.c

diff --git a/gcc/graphite-scop-detection.c b/gcc/graphite-scop-detection.c
index c45df55..dee4f86d1 100644
--- a/gcc/graphite-scop-detection.c
+++ b/gcc/graphite-scop-detection.c
@@ -274,6 +274,13 @@ stmt_has_simple_data_refs_p (sese_l scop, gimple *stmt)
   FOR_EACH_VEC_ELT (drs, j, dr)
 {
   int nb_subscripts = DR_NUM_DIMENSIONS (dr);
+
+  if (nb_subscripts < 1)
+   {
+ free_data_refs (drs);
+ return false;
+   }
+
   tree ref = DR_REF (dr);
 
   for (int i = nb_subscripts - 1; i >= 0; i--)
diff --git a/gcc/testsuite/gcc.dg/graphite/scop-pr66980.c 
b/gcc/testsuite/gcc.dg/graphite/scop-pr66980.c
new file mode 100644
index 000..cf93452
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/graphite/scop-pr66980.c
@@ -0,0 +1,10 @@
+void foo(unsigned char *in, unsigned char *out, int w, int h)
+{
+  unsigned int i, j;
+  for (i = 0; i < 3*w*h; i++)
+for (j = 0; j < 3*w*h; j++)
+  out[i * w + j] = in[(i * w + j)*3] + in[(i * w + j)*3 + 1] + in[(i * w + 
j)*3 + 2];
+}
+
+/* Requires delinearization to be able to represent "i*w".  */
+/* { dg-final { scan-tree-dump-times "number of SCoPs: 1" 1 "graphite" { xfail 
*-*-* } } } */
-- 
2.1.0.243.g30d45f7



Re: Fold acc_on_device

2015-10-01 Thread Andrew MacLeod

On 09/30/2015 08:46 AM, Richard Biener wrote:

On Wed, Sep 30, 2015 at 2:18 PM, Nathan Sidwell  wrote:

Please don't add any new GENERIC based builtin folders.  Instead add to
gimple-fold.c:gimple_fold_builtin

Otherwise you're just generating more work for us who move foldings from
builtins.c to gimple-fold.c.


Oh, sorry, I didn't know about that.  Will fix.

Should I use the same
  if (symtab->state == EXPANSION)
test to make sure we're after LTO read back (i.e. know which compiler we're
in), or is there another way?

I don't know of a better way, no.  I'll add a comment to builtins.c
(not that I expect anyone sees it ;))




btw, not that it's necessarily important, but I'm about to submit the 
include reduction patches today,  and it turns out this line is the 
first use of anything from cgraph.h in builtins.c.


So if this is "the way" of doing the test, be aware it adds a dependency 
on cgraph.h that wasn't there before.


I noticed because the reducer finished on a 9/28 branch.  When I 
re-applied the patch to a 9/30 branch, builtins.c failed to compile 
because it wasn't including cgraph.h for 'symtab' and 'EXPANSION'.  So it 
wasn't required in 9/28.


Going forward, it will be obvious when we are adding a new dependency 
because the file will fail to compile without adding the requisite 
header file.


Andrew






Do not describe -std=c11 etc. as experimental in c.opt help text

2015-10-01 Thread Joseph Myers
I noticed that c.opt still described -std=c11 and related options as
experimental in the --help text.  This patch fixes this.

Jason, note that -std=gnu++11 and -std=gnu++14 still have that text,
contrary to the descriptions of -std=c++11 and -std=c++14.

Bootstrapped with no regressions on x86_64-pc-linux-gnu.  Applied to
mainline.

2015-10-01  Joseph Myers  

* c.opt (std=c11): Do not describe as experimental.
(std=gnu11): Likewise.
(std=iso9899:2011): Likewise.

Index: gcc/c-family/c.opt
===
--- gcc/c-family/c.opt  (revision 228327)
+++ gcc/c-family/c.opt  (working copy)
@@ -1656,7 +1656,7 @@
 
 std=c11
 C ObjC
-Conform to the ISO 2011 C standard (experimental and incomplete support)
+Conform to the ISO 2011 C standard
 
 std=c1x
 C ObjC Alias(std=c11)
@@ -1713,7 +1713,7 @@
 
 std=gnu11
 C ObjC
-Conform to the ISO 2011 C standard with GNU extensions (experimental and 
incomplete support)
+Conform to the ISO 2011 C standard with GNU extensions
 
 std=gnu1x
 C ObjC Alias(std=gnu11)
@@ -1753,7 +1753,7 @@
 
 std=iso9899:2011
 C ObjC Alias(std=c11)
-Conform to the ISO 2011 C standard (experimental and incomplete support)
+Conform to the ISO 2011 C standard
 
 traditional
 Driver

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: Fold acc_on_device

2015-10-01 Thread Nathan Sidwell

On 10/01/15 13:00, Andrew MacLeod wrote:


btw, not that it's necessarily important, but I'm about to submit the include
reduction patches today,  and it turns out this line is the first use of
anything from cgraph.h in builtins.c.

So if this is "the way" of doing the test, be aware it adds a dependency on
cgraph.h that wasn't there before.


The patch I just committed has moved this to gimple-fold.c.  That appears to 
already include cgraph.h.


nathan



Re: [PATCH] rs6000: Add "cannot_copy" attribute, use it (PR67788, PR67789)

2015-10-01 Thread Segher Boessenkool
On Thu, Oct 01, 2015 at 12:14:44PM +0200, Richard Biener wrote:
> On Thu, Oct 1, 2015 at 8:08 AM, Segher Boessenkool
>  wrote:
> > After the shrink-wrapping patches the prologue will often be pushed
> > "deeper" into the function, which in turn means the software trace cache
> > pass will more often want to duplicate the basic block containing the
> > prologue.  This caused failures for 32-bit SVR4 with -msecure-plt PIC.
> >
> > This configuration uses the load_toc_v4_PIC_1 instruction, which creates
> > assembler labels without using the normal machinery for that.  If now
> > the compiler decides to duplicate the insn, it will emit the same label
> > twice.  Boom.
> >
> > It isn't so easy to fix this to use labels the compiler knows about (let
> > alone test that properly).  Instead, this patch wires up a "cannot_copy"
> > attribute to be used by TARGET_CANNOT_COPY_P, and sets that attribute on
> > these insns we do not want copied.
> >
> > Bootstrapped and tested on powerpc64-linux, with the usual configurations
> > (-m32,-m32/-mpowerpc64,-m64,-m64/-mlra); new testcase fails before, works
> > after (on 32-bit).
> >
> > Is this okay for mainline?
> 
> Isn't that quite expensive?

Not really?  recog_memoized isn't so bad.  I cannot measure a difference.

> So even if not "easy", can you try?

I did, and after half a day had a big mess and lots of things failing,
no idea where this was headed, and in the meantime bootstrap still fails
(on affected targets).

Other targets use cannot_copy_p in similar situations.

> Do we have other ports with local labels in define_insns?

I of course tried to find such, but didn't.  Oh, you're asking for any
insn that does anything whatsoever, but with a label.  That's the
"half a day" above.

It might matter that these insns are created after reload.  Or I somehow
need to force the BB to be split, that seems to have been the problem;
and splitting the prologue will be FUN.

> I see some in darwin.md as well which your patch doesn't handle btw.,

Oh argh forgot to grep outside of rs6000.md.  Will fix.

> otherwise suspicious %0: also appears (only) in h8300.md.

I think it is part of the syntax for that insn?   jsr ...:8

> arc.md also has
> a suspicious case in its doloop_end_i pattern.

That one is in an assembler comment  :-)


Segher


Re: [patch] libstdc++/67747 Allocate space for dirent::d_name

2015-10-01 Thread Jonathan Wakely

On 30/09/15 09:30 -0600, Martin Sebor wrote:

On 09/30/2015 05:01 AM, Jonathan Wakely wrote:

On 29/09/15 12:54 -0600, Martin Sebor wrote:

On 09/29/2015 05:37 AM, Jonathan Wakely wrote:

POSIX says that dirent::d_name has an unspecified length, so calls to
readdir_r must pass a buffer with enough trailing space for
{NAME_MAX}+1 characters. I wasn't doing that, which works OK on
GNU/Linux and BSD where d_name is a large array, but fails on Solaris
32-bit.

This uses pathconf to get NAME_MAX and allocates a buffer.

Tested powerpc64le-linux and x86_64-dragonfly4.1, I'm going to commit
this to trunk today (and backport all the filesystem fixes to
gcc-5-branch).


Calling pathconf is only necessary when _POSIX_NO_TRUNC is zero
which I think exists mainly for legacy file systems. Otherwise,
it's safe to use NAME_MAX instead. Avoiding the call to pathconf


Oh, nice. I was using NAME_MAX originally but the glibc readdir_r(3)
man-page has an example using pathconf and says that should be used,
so I went down that route.


GLIBC pathconf calls statfs to get the NAME_MAX value from the OS,
so there's some chance that some unusual file system will define
it as greater than 255. I tested all those on my Fedora PC and on
my RHEL 7.2 powerpc64le box and didn't find one.

There also isn't one in the Wikipedia comparison of file systems:
 https://en.wikipedia.org/wiki/Comparison_of_file_systems

But to be 100% safe, we would need to call pathconf if readdir
failed due to truncation.


Can we be sure NAME_MAX will never be too big for the stack?


When it's defined I think it's probably safe in practice but not
guaranteed. Also, like all of these  constants, it's not
guaranteed to be defined at all when the implementation doesn't
impose a limit. I believe AIX is one implementation that doesn't
define it. So some preprocessor magic will be required to deal
with that.


OK, latest version attached. This defines a helper type,
dirent_buffer, which either contains aligned_union or
unique_ptr depending on whether NAME_MAX is defined
or whether we have to use pathconf at run-time.

The dirent_buffer is created as part of the directory_iterator or
recursive_directory_iterator internals, so only once per iterator, and
shared between copies of that iterator. When NAME_MAX is defined there
is no second allocation, we just allocate a bit more space.

Tested powerpc64le-linux and x86_64-dragonfly.

I've started a build on powerpc-aix too but I don't know when the
tests for that will finish.
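The sizing logic described above reduces to the following sketch (illustrative, not the committed libstdc++ code):

```c
#include <dirent.h>
#include <limits.h>
#include <stddef.h>
#include <unistd.h>

/* Bytes needed for a struct dirent whose d_name can hold name_max
   characters plus the terminating NUL.  */
static size_t dirent_size(size_t name_max)
{
    return offsetof(struct dirent, d_name) + name_max + 1;
}

/* Name limit for a directory: the compile-time NAME_MAX when the
   platform defines one, otherwise ask pathconf at run time, falling
   back to 255 (an informed guess) when that fails too.  */
static size_t name_limit(const char *path)
{
#ifdef NAME_MAX
    (void)path;
    return NAME_MAX;
#else
    long n = pathconf(path, _PC_NAME_MAX);
    return n > 0 ? (size_t)n : 255;
#endif
}
```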

commit 6b30f2675aee599e43bcb55594de3abd2221c3ae
Author: Jonathan Wakely 
Date:   Thu Oct 1 17:01:24 2015 +0100

PR libstdc++/67747 Allocate space for dirent::d_name

	PR libstdc++/67747
	* src/filesystem/dir.cc (_Dir::direntp): New member.
	(dirent_size, dirent_buffer): New helpers for readdir results.
	(native_readdir) [_GLIBCXX_FILESYSTEM_IS_WINDOWS]: Copy to supplied
	dirent object. Handle end of directory.
	(_Dir::advance): Allocate space for d_name.
	(directory_iterator(const path&, directory_options, error_code*)):
	Allocate a dirent_buffer alongside the _Dir object.
	(recursive_directory_iterator::_Dir_stack): Add dirent_buffer member,
	constructor and push() function that sets the element's entp.

diff --git a/libstdc++-v3/src/filesystem/dir.cc b/libstdc++-v3/src/filesystem/dir.cc
index bce751c..9372074 100644
--- a/libstdc++-v3/src/filesystem/dir.cc
+++ b/libstdc++-v3/src/filesystem/dir.cc
@@ -25,8 +25,12 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
+#ifdef _GLIBCXX_HAVE_UNISTD_H
+# include 
+#endif
 #ifdef _GLIBCXX_HAVE_DIRENT_H
 # ifdef _GLIBCXX_HAVE_SYS_TYPES_H
 #  include 
@@ -51,7 +55,7 @@ struct fs::_Dir
 
   _Dir(_Dir&& d)
   : dirp(std::exchange(d.dirp, nullptr)), path(std::move(d.path)),
-entry(std::move(d.entry)), type(d.type)
+entry(std::move(d.entry)), type(d.type), entp(d.entp)
   { }
 
   _Dir& operator=(_Dir&&) = delete;
@@ -64,20 +68,64 @@ struct fs::_Dir
   fs::path		path;
   directory_entry	entry;
   file_type		type = file_type::none;
+  ::dirent*		entp = nullptr;
 };
 
 namespace
 {
+#ifdef _GLIBCXX_FILESYSTEM_IS_WINDOWS
+# undef NAME_MAX
+# define NAME_MAX 260
+#endif
+
+  // Size needed for struct dirent with {NAME_MAX} + 1 chars
+  static constexpr size_t
+  dirent_size(size_t name_max)
+  { return offsetof(::dirent, d_name) + name_max + 1; }
+
+  // Manage a buffer large enough for struct dirent with {NAME_MAX} + 1 chars
+  struct dirent_buffer
+  {
+#ifdef NAME_MAX
+dirent_buffer(const fs::path&) { }
+
+::dirent* get() { return reinterpret_cast<::dirent*>(&ent); }
+
+std::aligned_union<dirent_size(NAME_MAX), ::dirent>::type ent;
+#else
+
+dirent_buffer(const fs::path& path __attribute__((__unused__)))
+{
+  long name_max = 255;  // An informed guess.
+#ifdef _GLIBCXX_HAVE_UNISTD_H
+  long pc_name_max = pathconf(path.c_str(), _PC_NAME_MAX);
+  if (pc_name_max != -1)
+	name_max = pc_name_max;
+#endif
+  ptr.reset(static_cast<::diren

Re: [PATCH] Clarify __atomic_compare_exchange_n docs

2015-10-01 Thread Jonathan Wakely

On 01/10/15 12:28 +0100, Andrew Haley wrote:

On 09/29/2015 04:21 PM, Sandra Loosemore wrote:

What is "weak compare_exchange", and what is "the strong variation", and
how do they differ in terms of behavior?


It's in C++11 29.6.5:

Remark: The weak compare-and-exchange operations may fail spuriously,
that is, return false while leaving the contents of memory pointed to
by expected before the operation is the same that same as that of the
object and the same as that of expected after the operation. [ Note:
This spurious failure enables implementation of compare-and-exchange
on a broader class of machines, e.g., load-locked store-conditional
machines. A consequence of spurious failure is that nearly all uses of
weak compare-and-exchange will be in a loop.  When a
compare-and-exchange is in a loop, the weak version will yield better
performance on some platforms. When a weak compare-and-exchange would
require a loop and a strong one would not, the strong one is
preferable. — end note ]

The classic use of this is for shared counters: you don't care if you
miss an occasional count but you don't want the counter to go
backwards.

Whether we should replicate all of the C++11 language is perhaps
something we should discuss.


I would suggest we don't try to reproduce the standard definition, but
just say the weak version can fail spuriously and the strong can't.
IMHO this isn't the place to educate people in the fine points of
low-level atomics. As it says, "when in doubt use the strong
variation".

i.e. apply this in addition to my earlier suggestion.


diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 0de94f2..ce1b4ae 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -9354,7 +9354,8 @@ This compares the contents of @code{*@var{ptr}} with the contents of
 operation that writes @var{desired} into @code{*@var{ptr}}.  If they are not
 equal, the operation is a @emph{read} and the current contents of
 @code{*@var{ptr}} are written into @code{*@var{expected}}.  @var{weak} is true
-for weak compare_exchange, and false for the strong variation.  Many targets 
+for weak compare_exchange, which may fail spuriously, and false for
+the strong variation, which never fails spuriously.  Many targets 
 only offer the strong variation and ignore the parameter.  When in doubt, use
 the strong variation.
 


Re: [PATCH] Clarify __atomic_compare_exchange_n docs

2015-10-01 Thread Andrew Haley
On 10/01/2015 06:32 PM, Jonathan Wakely wrote:
> I would suggest we don't try to reproduce the standard definition, but
> just say the weak version can fail spuriously and the strong can't.
> IMHO this isn't the place to educate people in the fine points of
> low-level atomics. As it says, "when in doubt use the strong
> variation".
> 
> i.e. apply this in addition to my earlier suggestion.

"If you don't already know what a weak CAS is you probably should
not even think of using it."  :)

Andrew.



Re: [PATCH] rs6000: Add "cannot_copy" attribute, use it (PR67788, PR67789)

2015-10-01 Thread Segher Boessenkool
On Thu, Oct 01, 2015 at 10:08:50AM -0400, David Edelsohn wrote:
> Is this expensive enough that it is worth limiting the definition of
> the hook to configurations that include 32-bit SVR4 support so that
> not every configuration incurs the overhead?

I don't think so.  That won't save the call to the target hook, and
that is a big part of the overhead already.


Segher


Re: [PATCH] Clarify __atomic_compare_exchange_n docs

2015-10-01 Thread Jonathan Wakely

On 01/10/15 18:34 +0100, Andrew Haley wrote:

On 10/01/2015 06:32 PM, Jonathan Wakely wrote:

I would suggest we don't try to reproduce the standard definition, but
just say the weak version can fail spuriously and the strong can't.
IMHO this isn't the place to educate people in the fine points of
low-level atomics. As it says, "when in doubt use the strong
variation".

i.e. apply this in addition to my earlier suggestion.


"If you don't already know what a weak CAS is you probably should
not even think of using it."  :)


Exactly. These are not the built-ins you're looking for. :)



