date:20151201

Re: [UPC 01/22] front-end changes

2015-12-01 Thread Eric Botcazou

> All languages (c, c++, fortran, go, lto, objc, obj-c++) have been
> bootstrapped; no test suite regressions were introduced,
> relative to the GCC trunk.

That's not all languages though, Ada and Java are missing.

-- 
Eric Botcazou

Re: [gomp4.5] Handle #pragma omp declare target link

2015-12-01 Thread Jakub Jelinek

On Mon, Nov 30, 2015 at 11:55:20PM +0300, Ilya Verbin wrote:
> Ok, but it doesn't solve the issue with doing it for the executable, because
> gomp_unmap_tgt (n->tgt) will want to run free_func on uninitialized device.

?? You mean that the
devicep->unload_image_func (devicep->target_id, version, target_data);
call deinitializes the device or something else (I mean, if there is some
other tgt, then it had to be initialized)?
If it is just that order, I wonder if you can't just move the
unload_image_func call after the splay_tree_remove loops (or even after the
node freeing call).

Jakub

Re: [UPC 01/22] front-end changes

2015-12-01 Thread Gary Funck

On 12/01/15 09:12:44, Eric Botcazou wrote:
> > All languages (c, c++, fortran, go, lto, objc, obj-c++) have been
> > bootstrapped; no test suite regressions were introduced,
> > relative to the GCC trunk.
> 
> That's not all languages though, Ada and Java are missing.

OK. I'll bootstrap and run tests on those as well, and
report back in a day/two.

thanks,
- Gary

Re: [OpenACC 0/7] host_data construct

2015-12-01 Thread Jakub Jelinek

On Mon, Nov 30, 2015 at 07:30:34PM +, Julian Brown wrote:
> Julian Brown  
> Cesar Philippidis  
> James Norris  
> 
> gcc/
> * c-family/c-pragma.c (oacc_pragmas): Add PRAGMA_OACC_HOST_DATA.
> * c-family/c-pragma.h (pragma_kind): Add PRAGMA_OACC_HOST_DATA.

c-family/, c/ and cp/ subdirectories have their own ChangeLog, so you need
to split the entry into multiple ChangeLog files and remove the directory
prefixes.

> @@ -6120,6 +6121,9 @@ omp_notice_variable (struct gimplify_omp_ctx *ctx, tree 
> decl, bool in_code)
>(splay_tree_key) decl);
> if (n2)
>   {
> +   if (octx->region_type == ORT_ACC_HOST_DATA)
> + error ("variable %qE declared in enclosing "
> +"host_data region", DECL_NAME (decl));

%<host_data%> instead?
> nflags |= GOVD_MAP;
> goto found_outer;
>   }
> @@ -6418,6 +6422,7 @@ gimplify_scan_omp_clauses (tree *list_p, gimple_seq 
> *pre_p,
>case OMP_TARGET_DATA:
>case OMP_TARGET_ENTER_DATA:
>case OMP_TARGET_EXIT_DATA:
> +  case OACC_HOST_DATA:
>   ctx->target_firstprivatize_array_bases = true;
>default:
>   break;
> @@ -6683,6 +6688,7 @@ gimplify_scan_omp_clauses (tree *list_p, gimple_seq 
> *pre_p,
>   case OMP_TARGET_DATA:
>   case OMP_TARGET_ENTER_DATA:
>   case OMP_TARGET_EXIT_DATA:
> + case OACC_HOST_DATA:
> if (OMP_CLAUSE_MAP_KIND (c) == GOMP_MAP_FIRSTPRIVATE_POINTER
> || (OMP_CLAUSE_MAP_KIND (c)
> == GOMP_MAP_FIRSTPRIVATE_REFERENCE))
> @@ -6695,6 +6701,22 @@ gimplify_scan_omp_clauses (tree *list_p, gimple_seq 
> *pre_p,
>   }
> if (remove)
>   break;
> +   if (DECL_P (decl) && outer_ctx && (region_type & ORT_ACC))
> + {
> +   struct gimplify_omp_ctx *octx;
> +   for (octx = outer_ctx; octx; octx = octx->outer_context)
> + {
> +   if (!(octx->region_type & (ORT_TARGET_DATA | ORT_TARGET)))
> + break;

Wouldn't it be better to do
if (octx->region_type != ORT_ACC_HOST_DATA)
  continue;
here, thus only lookup if you really want to use it?

> +   splay_tree_node n2
> + = splay_tree_lookup (octx->variables,
> +  (splay_tree_key) decl);
> +   if (n2 && octx->region_type == ORT_ACC_HOST_DATA)

and remove the && ... part from the condition?

> + error_at (OMP_CLAUSE_LOCATION (c), "variable %qE "
> +   "declared in enclosing host_data region",
> +   DECL_NAME (decl));
> + }
> + }
> if (OMP_CLAUSE_SIZE (c) == NULL_TREE)
>   OMP_CLAUSE_SIZE (c) = DECL_P (decl) ? DECL_SIZE_UNIT (decl)
> : TYPE_SIZE_UNIT (TREE_TYPE (decl));

Ok with those changes.

Jakub
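
The loop shape suggested in the review above can be modelled in standalone C. This is only a sketch under stated assumptions: `struct ctx`, `lookup`, the enum values and `declared_in_enclosing_host_data` are simplified stand-ins invented for illustration, not GCC's gimplify_omp_ctx or splay-tree API, and the function returns on the first hit where the patch would call error_at.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified region kinds; the values are NOT GCC's.  */
enum region_type
{
  ORT_WORKSHARE = 0,
  ORT_TARGET = 1,
  ORT_TARGET_DATA = 2,
  ORT_ACC = 4,
  ORT_ACC_HOST_DATA = ORT_ACC | ORT_TARGET_DATA | 8
};

/* Stand-in for a gimplify context: a NULL-terminated array of names
   replaces the splay tree of variables.  */
struct ctx
{
  enum region_type region_type;
  const char *const *variables;
  struct ctx *outer_context;
};

/* Stand-in for splay_tree_lookup: linear search by name.  */
static bool
lookup (const struct ctx *c, const char *name)
{
  for (const char *const *v = c->variables; v && *v; v++)
    if (strcmp (*v, name) == 0)
      return true;
  return false;
}

/* Return true if NAME is declared in an enclosing host_data region.  */
bool
declared_in_enclosing_host_data (const struct ctx *outer_ctx, const char *name)
{
  for (const struct ctx *octx = outer_ctx; octx; octx = octx->outer_context)
    {
      /* Stop at the first context that is not a target or data region.  */
      if (!(octx->region_type & (ORT_TARGET_DATA | ORT_TARGET)))
        break;
      /* Skip the lookup unless this really is a host_data region.  */
      if (octx->region_type != ORT_ACC_HOST_DATA)
        continue;
      /* No region_type test is needed on the lookup result any more.  */
      if (lookup (octx, name))
        return true;
    }
  return false;
}
```

With the early `continue`, the lookup only runs for host_data contexts, which is why the `&& octx->region_type == ORT_ACC_HOST_DATA` part of the condition becomes redundant.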

Re: [openacc] fortran loop clauses and splitting

2015-12-01 Thread Jakub Jelinek

On Mon, Nov 30, 2015 at 10:00:06AM -0800, Cesar Philippidis wrote:
> This patch contains the following bug fixes:
> 
>  * Teaches gfortran to accept both num and static gang arguments inside
>same clause. E.g. gang(num:10, static:30). Currently, gfortran only
>allows one of those arguments to appear in a gang clause.
> 
>  * Make the diagnostics reported by resolve_oacc_positive_int_expr more
>accurate for worker and vector clauses.
> 
>  * Updates how combined loops are split to account for the renamed gang
>clause members in gfc_omp_clauses.  Also corrected a bug that Tom
>discovered in the c front end where combined reductions were being
>attached to kernels and parallel constructs. Now, they are only
>associated with the split acc loop.
> 
> Is this OK for trunk?

Ok, thanks.

Jakub

[Patch,microblaze]: Instruction prefetch optimization for microblaze.

2015-12-01 Thread Ajit Kumar Agarwal

This patch adds an instruction prefetch optimization for Microblaze.

Regression tested for the Microblaze target.

The "wic" Microblaze instruction is the instruction prefetch instruction. The
optimization generates the iprefetch instruction on the fall-through path of
call sites. It is enabled with the Microblaze target flag -mxl-prefetch. The
flag exists because selection of the "wic" instruction has to be enabled in the
reconfigurable design, and that selection is not enabled by default.

ChangeLog:
2015-12-01  Ajit Agarwal  

* config/microblaze/microblaze.c
(get_branch_target): New.
(insert_wic_for_ilb_runout): New.
(insert_wic): New.
(microblaze_machine_dependent_reorg): New.
(TARGET_MACHINE_DEPENDENT_REORG): Define macro.
* config/microblaze/microblaze.md
(UNSPEC_IPREFETCH): Define.
(iprefetch): New pattern
* config/microblaze/microblaze.opt
(mxl-prefetch): New flag.

Signed-off-by:Ajit Agarwal ajit...@xilinx.com


Thanks & Regards
Ajit


iprefetch.patch
Description: iprefetch.patch

Re: [gomp4.5] Handle #pragma omp declare target link

2015-12-01 Thread Ilya Verbin


> On 01 Dec 2015, at 11:18, Jakub Jelinek  wrote:
> 
>> On Mon, Nov 30, 2015 at 11:55:20PM +0300, Ilya Verbin wrote:
>> Ok, but it doesn't solve the issue with doing it for the executable, because
>> gomp_unmap_tgt (n->tgt) will want to run free_func on uninitialized device.
> 
> ?? You mean that the
> devicep->unload_image_func (devicep->target_id, version, target_data);
> call deinitializes the device or something else (I mean, if there is some
> other tgt, then it had to be initialized)?

No, I mean that it can be deinitialized from plugin's __run_exit_handlers (see 
my last mail with the patch).

  -- Ilya

[PATCH] Fix PR68590

2015-12-01 Thread Richard Biener


The following avoids PR68590 by merging two match.pd patterns.

Bootstrapped and tested on x86_64-unknown-linux-gnu.

Richard.

2015-12-01  Richard Biener  

PR middle-end/68590
* match.pd: Merge (eq @0 @0) and (ge/le @0 @0) patterns.

Index: gcc/match.pd
===
--- gcc/match.pd(revision 231065)
+++ gcc/match.pd(working copy)
@@ -1828,15 +1828,14 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
  
 /* Simplify comparison of something with itself.  For IEEE
floating-point, we can only do some of these simplifications.  */
-(simplify
- (eq @0 @0)
- (if (! FLOAT_TYPE_P (TREE_TYPE (@0))
-  || ! HONOR_NANS (TYPE_MODE (TREE_TYPE (@0))))
-  { constant_boolean_node (true, type); }))
-(for cmp (ge le)
+(for cmp (eq ge le)
  (simplify
   (cmp @0 @0)
-  (eq @0 @0)))
+  (if (! FLOAT_TYPE_P (TREE_TYPE (@0))
+   || ! HONOR_NANS (TYPE_MODE (TREE_TYPE (@0))))
+   { constant_boolean_node (true, type); }
+   (if (cmp != EQ_EXPR)
+    (eq @0 @0)))))
 (for cmp (ne gt lt)
  (simplify
   (cmp @0 @0)

[Patch AArch64] Fix typo in aarch64_builtin_reciprocal.

2015-12-01 Thread Ramana Radhakrishnan

The patch to restructure builtin_reciprocals missed out an obvious ')'. 
Adjusted thusly and applied as obvious to trunk.

regards
Ramana


2015-12-01  Ramana Radhakrishnan  

* config/aarch64/aarch64.c (aarch64_builtin_reciprocal): Fix typo.

RE: [Patch,microblaze]: Instruction prefetch optimization for microblaze.

2015-12-01 Thread Ajit Kumar Agarwal

In addition, this patch has been tested on hardware with the Mibench/EEMBC
benchmarks for the Microblaze target. The reconfigurable design was built with
the "wic" instruction prefetch instruction selected, and the benchmarks were
compiled with the -mxl-prefetch flag.

Thanks & Regards
Ajit
-Original Message-
From: Ajit Kumar Agarwal 
Sent: Tuesday, December 01, 2015 2:19 PM
To: GCC Patches
Cc: Vinod Kathail; Shail Aditya Gupta; Vidhumouli Hunsigida; Nagaraju Mekala
Subject: [Patch,microblaze]: Instruction prefetch optimization for microblaze.

The changes are made in this patch for the instruction prefetch optimizations 
for Microblaze.

Reg tested for Microblaze target.

The changes are made for instruction prefetch optimizations for Microblaze. The 
"wic" microblaze instruction is the instruction prefetch instruction. The 
instruction prefetch optimization is done to generate the iprefetch instruction 
at the call site fall through path. This optimization is enabled with  
microblaze target flag mxl-prefetch. The purpose of adding the flags is that 
selection of "wic" instruction should be enabled in the reconfigurable design 
and the selection is not enabled by default.

ChangeLog:
2015-12-01  Ajit Agarwal  

* config/microblaze/microblaze.c
(get_branch_target): New.
(insert_wic_for_ilb_runout): New.
(insert_wic): New.
(microblaze_machine_dependent_reorg): New.
(TARGET_MACHINE_DEPENDENT_REORG): Define macro.
* config/microblaze/microblaze.md
(UNSPEC_IPREFETCH): Define.
(iprefetch): New pattern
* config/microblaze/microblaze.opt
(mxl-prefetch): New flag.

Signed-off-by:Ajit Agarwal ajit...@xilinx.com


Thanks & Regards
Ajit

Re: [Patch AArch64] Fix typo in aarch64_builtin_reciprocal.

2015-12-01 Thread Jakub Jelinek

On Tue, Dec 01, 2015 at 08:58:53AM +, Ramana Radhakrishnan wrote:
> The patch to restructure builtin_reciprocals missed out an obvious ')'. 
> Adjusted thusly and applied as obvious to trunk.

Sorry for that.  Could you please also handle the gimple_call_internal_p
case, so that it actually returns the aarch64 builtin decls if
it is internal SQRT call with the right modes?  See the i386 and rs6000
builtins.  Haven't done that for aarch64, because it uses a helper function
defined somewhere else, so haven't been sure how you want it to look like.
> 
> 2015-12-01  Ramana Radhakrishnan  
> 
> * config/aarch64/aarch64.c (aarch64_builtin_reciprocal): Fix typo.

Jakub

Re: [gomp] Move openacc vector& worker single handling to RTL

2015-12-01 Thread Thomas Schwinge

Hi!

On Thu, 09 Jul 2015 20:25:22 -0400, Nathan Sidwell  wrote:
> This is the patch I committed.  [...]

> 2015-07-09  Nathan Sidwell  

>   * omp-low.c (omp_region): [...]
>   (enclosing_target_region, required_predication_mask,
>   generate_vector_broadcast, generate_oacc_broadcast,
>   make_predication_test, predicate_bb, find_predicatable_bbs,
>   predicate_omp_regions): Delete.
>   [...]

This removed all usage of bb_region_map.  Now cleaned up in
gomp-4_0-branch r231102:

commit ff7e1eb4e855aa16d14ae047172269bc7192a069
Author: tschwinge 
Date:   Tue Dec 1 09:04:33 2015 +

gcc/omp-low.c: Remove bb_region_map

gcc/
* omp-low.c (bb_region_map): Remove.  Adjust all users.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@231102 
138bc75d-0d04-0410-961f-82ee72b054a4
---
 gcc/ChangeLog.gomp |  4 
 gcc/omp-low.c  | 42 +-
 2 files changed, 21 insertions(+), 25 deletions(-)

diff --git gcc/ChangeLog.gomp gcc/ChangeLog.gomp
index 0e4f371..4842164 100644
--- gcc/ChangeLog.gomp
+++ gcc/ChangeLog.gomp
@@ -1,3 +1,7 @@
+2015-12-01  Thomas Schwinge  
+
+   * omp-low.c (bb_region_map): Remove.  Adjust all users.
+
 2015-11-30  Cesar Philippidis  
 
* tree-nested.c (convert_nonlocal_omp_clauses): Handle optional
diff --git gcc/omp-low.c gcc/omp-low.c
index 1b52f6b..a1e7a14 100644
--- gcc/omp-low.c
+++ gcc/omp-low.c
@@ -13356,9 +13356,6 @@ expand_omp (struct omp_region *region)
 }
 }
 
-/* Map each basic block to an omp_region.  */
-static hash_map<basic_block, omp_region *> *bb_region_map;
-
 static void
 find_omp_for_region_data (struct omp_region *region, gomp_for *stmt)
 {
@@ -13394,8 +13391,6 @@ build_omp_regions_1 (basic_block bb, struct omp_region 
*parent,
   gimple *stmt;
   basic_block son;
 
-  bb_region_map->put (bb, parent);
-
   gsi = gsi_last_bb (bb);
   if (!gsi_end_p (gsi) && is_gimple_omp (gsi_stmt (gsi)))
 {
@@ -13536,31 +13531,28 @@ build_omp_regions (void)
 static unsigned int
 execute_expand_omp (void)
 {
-  bb_region_map = new hash_map<basic_block, omp_region *>;
-
   build_omp_regions ();
 
-  if (root_omp_region)
+  if (!root_omp_region)
+return 0;
+
+  if (dump_file)
 {
-  if (dump_file)
-   {
- fprintf (dump_file, "\nOMP region tree\n\n");
- dump_omp_region (dump_file, root_omp_region, 0);
- fprintf (dump_file, "\n");
-   }
-
-  remove_exit_barriers (root_omp_region);
-
-  expand_omp (root_omp_region);
-
-  if (flag_checking && !loops_state_satisfies_p (LOOPS_NEED_FIXUP))
-   verify_loop_structure ();
-  cleanup_tree_cfg ();
-
-  free_omp_regions ();
+  fprintf (dump_file, "\nOMP region tree\n\n");
+  dump_omp_region (dump_file, root_omp_region, 0);
+  fprintf (dump_file, "\n");
 }
 
-  delete bb_region_map;
+  remove_exit_barriers (root_omp_region);
+
+  expand_omp (root_omp_region);
+
+  if (flag_checking && !loops_state_satisfies_p (LOOPS_NEED_FIXUP))
+verify_loop_structure ();
+  cleanup_tree_cfg ();
+
+  free_omp_regions ();
+
   return 0;
 }
 


Grüße
 Thomas



PR68577: Handle narrowing for vector popcount, etc.

2015-12-01 Thread Richard Sandiford

This patch adds support for simple cases where a vector internal
function returns wider results than the scalar equivalent.  It punts
on other cases.

Tested on powerpc64-linux-gnu and x86_64-linux-gnu.  OK to install?

Thanks,
Richard


gcc/
PR tree-optimization/68577
* tree-vect-stmts.c (simple_integer_narrowing): New function.
(vectorizable_call): Restrict internal function handling
to NONE and NARROW cases, using simple_integer_narrowing
to test for the latter.  Add cost of narrowing operation
and insert it where necessary.

gcc/testsuite/
PR tree-optimization/68577
* gcc.dg/vect/pr68577.c: New test.

diff --git a/gcc/testsuite/gcc.dg/vect/pr68577.c 
b/gcc/testsuite/gcc.dg/vect/pr68577.c
new file mode 100644
index 000..999c1c8
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr68577.c
@@ -0,0 +1,25 @@
+/* { dg-do compile } */
+
+int a, b;
+
+void
+__sched_cpucount (void)
+{
+  while (b)
+{
+  long l = b++;
+  a += __builtin_popcountl(l);
+}
+}
+
+void
+slp_test (int *x, long *y)
+{
+  for (int i = 0; i < 512; i += 4)
+{
+  x[i] = __builtin_popcountl(y[i]);
+  x[i + 1] = __builtin_popcountl(y[i + 1]);
+  x[i + 2] = __builtin_popcountl(y[i + 2]);
+  x[i + 3] = __builtin_popcountl(y[i + 3]);
+}
+}
diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
index 3b078da..af86bce 100644
--- a/gcc/tree-vect-stmts.c
+++ b/gcc/tree-vect-stmts.c
@@ -2122,6 +2122,40 @@ vectorizable_mask_load_store (gimple *stmt, 
gimple_stmt_iterator *gsi,
   return true;
 }
 
+/* Return true if vector type VECTYPE_OUT has integer elements and
+   if we can narrow two integer vectors with the same shape as
+   VECTYPE_IN to VECTYPE_OUT in a single step.  On success,
+   return the binary pack code in *CONVERT_CODE and the types
+   of the input vectors in *CONVERT_FROM.  */
+
+static bool
+simple_integer_narrowing (tree vectype_out, tree vectype_in,
+ tree_code *convert_code, tree *convert_from)
+{
+  if (!INTEGRAL_TYPE_P (TREE_TYPE (vectype_out)))
+return false;
+
+  if (!INTEGRAL_TYPE_P (TREE_TYPE (vectype_in)))
+{
+  unsigned int bits
+   = GET_MODE_BITSIZE (TYPE_MODE (TREE_TYPE (vectype_in)));
+  tree scalar_type = build_nonstandard_integer_type (bits, 0);
+  vectype_in = get_same_sized_vectype (scalar_type, vectype_in);
+}
+
+  tree_code code;
+  int multi_step_cvt = 0;
+  auto_vec  interm_types;
+  if (!supportable_narrowing_operation (NOP_EXPR, vectype_out, vectype_in,
+   &code, &multi_step_cvt,
+   &interm_types)
+  || multi_step_cvt)
+return false;
+
+  *convert_code = code;
+  *convert_from = vectype_in;
+  return true;
+}
 
 /* Function vectorizable_call.
 
@@ -2288,7 +2322,13 @@ vectorizable_call (gimple *gs, gimple_stmt_iterator 
*gsi, gimple **vec_stmt,
   tree callee = gimple_call_fndecl (stmt);
 
   /* First try using an internal function.  */
-  if (cfn != CFN_LAST)
+  tree_code convert_code = ERROR_MARK;
+  tree convert_from = NULL_TREE;
+  if (cfn != CFN_LAST
+  && (modifier == NONE
+ || (modifier == NARROW
+ && simple_integer_narrowing (vectype_out, vectype_in,
+  &convert_code, &convert_from))))
 ifn = vectorizable_internal_function (cfn, callee, vectype_out,
  vectype_in);
 
@@ -2328,7 +2368,7 @@ vectorizable_call (gimple *gs, gimple_stmt_iterator *gsi, 
gimple **vec_stmt,
 
   if (slp_node || PURE_SLP_STMT (stmt_info))
 ncopies = 1;
-  else if (modifier == NARROW)
+  else if (modifier == NARROW && ifn == IFN_LAST)
 ncopies = LOOP_VINFO_VECT_FACTOR (loop_vinfo) / nunits_out;
   else
 ncopies = LOOP_VINFO_VECT_FACTOR (loop_vinfo) / nunits_in;
@@ -2344,6 +2384,10 @@ vectorizable_call (gimple *gs, gimple_stmt_iterator 
*gsi, gimple **vec_stmt,
 dump_printf_loc (MSG_NOTE, vect_location, "=== vectorizable_call ==="
  "\n");
   vect_model_simple_cost (stmt_info, ncopies, dt, NULL, NULL);
+  if (ifn != IFN_LAST && modifier == NARROW && !slp_node)
+   add_stmt_cost (stmt_info->vinfo->target_cost_data, ncopies / 2,
+  vec_promote_demote, stmt_info, 0, vect_body);
+
   return true;
 }
 
@@ -2357,9 +2401,9 @@ vectorizable_call (gimple *gs, gimple_stmt_iterator *gsi, 
gimple **vec_stmt,
   vec_dest = vect_create_destination_var (scalar_dest, vectype_out);
 
   prev_stmt_info = NULL;
-  switch (modifier)
+  if (modifier == NONE || ifn != IFN_LAST)
 {
-case NONE:
+  tree prev_res = NULL_TREE;
   for (j = 0; j < ncopies; ++j)
{
  /* Build argument list for the vectorized call.  */
@@ -2387,12 +2431,30 @@ vectorizable_call (gimple *gs, gimple_stmt_iterator 
*gsi, gimple **vec_stmt,
   vec<tree> vec_oprndsk = vec_defs[k];
  vargs[k] =
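
The narrowing case handled by this patch can be pictured with a scalar C model. This is a hedged sketch, not the vectorizer's code: `popcount_lanes64` and `popcount4_narrow` are hypothetical names, the 2 x 64-bit and 4 x 32-bit shapes are just one possible choice, and `__builtin_popcountll` assumes a GCC-compatible compiler.

```c
#include <stdint.h>

/* Model of a vector popcount on two 64-bit lanes: each result is as
   wide as its input, unlike the scalar builtin, which returns int.  */
static void
popcount_lanes64 (const uint64_t in[2], uint64_t out[2])
{
  for (int i = 0; i < 2; i++)
    out[i] = (uint64_t) __builtin_popcountll (in[i]);
}

/* Compute four popcounts as two wide vector results, then pack the two
   64-bit vectors into one 4 x 32-bit vector in a single narrowing step.  */
void
popcount4_narrow (const uint64_t y[4], int32_t x[4])
{
  uint64_t lo[2], hi[2];

  popcount_lanes64 (y, lo);
  popcount_lanes64 (y + 2, hi);

  for (int i = 0; i < 2; i++)
    {
      x[i] = (int32_t) lo[i];
      x[i + 2] = (int32_t) hi[i];
    }
}
```

Each packed result consumes two wide input vectors. Cases that would need more than one packing step are the ones the patch punts on.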

Re: [PATCH testsuite ARM] : Update armv6 unaligned macro tests

2015-12-01 Thread Kyrill Tkachov


Hi Christian,

On 30/11/15 10:16, Christian Bruel wrote:
> Hi Kyrill,
>
> Your fix (https://gcc.gnu.org/ml/gcc-patches/2015-11/msg01392.html) exposed new
> FAILs with the macro tests in ftest-armv6[kz]-thumb.c.
>
> From what I understood, only ARMv6T2 will have TARGET_32BIT set, and set
> unaligned_access as tested in ftest-armv6t2-thumb.c.
> It seems that the other ftest-armv6-thumb tests should be updated to reflect
> your fix.
>
> Tested for arm-none-eabi.

Yes, thanks for catching this.
Ok.

Kyrill

PR68474: Fix tree-call-cdce.c:use_internal_fn

2015-12-01 Thread Richard Sandiford

We'd call gen_shrink_wrap_conditions for functions that it can't handle
but edom_only_function can.

Tested on x86_64-linux-gnu.  OK to install?

Thanks,
Richard


gcc/
PR tree-optimization/68474
* tree-call-cdce.c (use_internal_fn): Protect call to
gen_shrink_wrap_conditions.

gcc/testsuite/
PR tree-optimization/68474
* gcc.dg/pr68474.c: New test.

diff --git a/gcc/testsuite/gcc.dg/pr68474.c b/gcc/testsuite/gcc.dg/pr68474.c
new file mode 100644
index 000..8ad7def
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr68474.c
@@ -0,0 +1,7 @@
+/* { dg-options "-O -funsafe-math-optimizations" } */
+
+long double
+foo (long double d1, long double d2)
+{
+  return d1 || __builtin_significandl (d2);
+}
diff --git a/gcc/tree-call-cdce.c b/gcc/tree-call-cdce.c
index 75ef180..4123130 100644
--- a/gcc/tree-call-cdce.c
+++ b/gcc/tree-call-cdce.c
@@ -959,7 +959,8 @@ use_internal_fn (gcall *call)
 {
   unsigned nconds = 0;
   auto_vec conds;
-  gen_shrink_wrap_conditions (call, conds, &nconds);
+  if (can_test_argument_range (call))
+gen_shrink_wrap_conditions (call, conds, &nconds);
   if (nconds == 0 && !edom_only_function (call))
 return false;

[committed] Improve error reporting from genattrtab.c

2015-12-01 Thread Richard Sandiford

The errors reported by check_attr_value weren't very helpful because
they always used the location of the define(_enum)_attr, even if the
error was in a define_insn.  Also, the errors reported by
check_attr_test didn't say which attribute was faulty.

Although not technically a bug fix, it was really useful in writing
the patch for PR68432.

Tested on a variety of targets and applied.

Richard


gcc/
* genattrtab.c (check_attr_test): Take an attr_desc instead of
an is_const flag.  Put the file_location argument first.
Update recursive calls.  Improve error messages.
(check_attr_value): Take a file location and use it instead
of attr->loc.  Improve error messages.  Update calls to
check_attr_test.
(check_defs): Update call to check_attr_value.
(make_canonical): Likewise.
(gen_attr): Likewise.
(main): Likewise.
(gen_insn_reserv): Update call to check_attr_test.

diff --git a/gcc/genattrtab.c b/gcc/genattrtab.c
index 32b837c..2caf8f6 100644
--- a/gcc/genattrtab.c
+++ b/gcc/genattrtab.c
@@ -729,9 +729,8 @@ attr_copy_rtx (rtx orig)
   return copy;
 }
 
-/* Given a test expression for an attribute, ensure it is validly formed.
-   IS_CONST indicates whether the expression is constant for each compiler
-   run (a constant expression may not test any particular insn).
+/* Given a test expression EXP for attribute ATTR, ensure it is validly
+   formed.  LOC is the location of the .md construct that contains EXP.
 
Convert (eq_attr "att" "a1,a2") to (ior (eq_attr ... ) (eq_attrq ..))
and (eq_attr "att" "!a1") to (not (eq_attr "att" "a1")).  Do the latter
@@ -744,9 +743,8 @@ attr_copy_rtx (rtx orig)
Return the new expression, if any.  */
 
 static rtx
-check_attr_test (rtx exp, int is_const, file_location loc)
+check_attr_test (file_location loc, rtx exp, attr_desc *attr)
 {
-  struct attr_desc *attr;
   struct attr_value *av;
   const char *name_ptr, *p;
   rtx orexp, newexp;
@@ -756,26 +754,27 @@ check_attr_test (rtx exp, int is_const, file_location loc)
 case EQ_ATTR:
   /* Handle negation test.  */
   if (XSTR (exp, 1)[0] == '!')
-   return check_attr_test (attr_rtx (NOT,
+   return check_attr_test (loc,
+   attr_rtx (NOT,
  attr_eq (XSTR (exp, 0),
   &XSTR (exp, 1)[1])),
-   is_const, loc);
+   attr);
 
   else if (n_comma_elts (XSTR (exp, 1)) == 1)
{
- attr = find_attr (&XSTR (exp, 0), 0);
- if (attr == NULL)
+ attr_desc *attr2 = find_attr (&XSTR (exp, 0), 0);
+ if (attr2 == NULL)
{
  if (! strcmp (XSTR (exp, 0), "alternative"))
return mk_attr_alt (((uint64_t) 1) << atoi (XSTR (exp, 1)));
  else
-   fatal_at (loc, "unknown attribute `%s' in EQ_ATTR",
- XSTR (exp, 0));
+   fatal_at (loc, "unknown attribute `%s' in definition of"
+ " attribute `%s'", XSTR (exp, 0), attr->name);
}
 
- if (is_const && ! attr->is_const)
-   fatal_at (loc, "constant expression uses insn attribute `%s'"
- " in EQ_ATTR", XSTR (exp, 0));
+ if (attr->is_const && ! attr2->is_const)
+   fatal_at (loc, "constant attribute `%s' cannot test non-constant"
+ " attribute `%s'", attr->name, attr2->name);
 
  /* Copy this just to make it permanent,
 so expressions using it can be permanent too.  */
@@ -784,26 +783,26 @@ check_attr_test (rtx exp, int is_const, file_location loc)
  /* It shouldn't be possible to simplify the value given to a
 constant attribute, so don't expand this until it's time to
 write the test expression.  */
- if (attr->is_const)
+ if (attr2->is_const)
ATTR_IND_SIMPLIFIED_P (exp) = 1;
 
- if (attr->is_numeric)
+ if (attr2->is_numeric)
{
  for (p = XSTR (exp, 1); *p; p++)
if (! ISDIGIT (*p))
  fatal_at (loc, "attribute `%s' takes only numeric values",
-   XSTR (exp, 0));
+   attr2->name);
}
  else
{
- for (av = attr->first_value; av; av = av->next)
+ for (av = attr2->first_value; av; av = av->next)
if (GET_CODE (av->value) == CONST_STRING
&& ! strcmp (XSTR (exp, 1), XSTR (av->value, 0)))
  break;
 
  if (av == NULL)
-   fatal_at (loc, "unknown value `%s' for `%s' attribute",
- XSTR (exp, 1), XSTR (exp, 0));
+   fatal_at (loc, "unknown value `%s' for attribute `%s'",
+ XSTR (exp, 1), attr2->name);

[PATCH, PR middle-end/68595] Fix invariant boolean vector generation

2015-12-01 Thread Ilya Enkovich

Hi,

This patch fixes the way an invariant boolean vector is generated.  It makes
sure the boolean vector consists of 0 and -1 values.  Bootstrapped and tested
on x86_64-unknown-linux-gnu.  OK for trunk?

Thanks,
Ilya
--
gcc/

2015-12-01  Ilya Enkovich  

PR middle-end/68595
* tree-vect-stmts.c (vect_init_vector): Cast boolean
scalars to a proper value before building a vector.

gcc/testsuite/

2015-12-01  Ilya Enkovich  

PR middle-end/68595
* gcc.dg/pr68595.c: New test.


diff --git a/gcc/testsuite/gcc.dg/pr68595.c b/gcc/testsuite/gcc.dg/pr68595.c
new file mode 100644
index 000..179c6c3
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr68595.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-O3" } */
+
+int a, b;
+char c;
+void fn1() {
+  b = 30;
+  for (; b <= 32; b++) {
+c = -17;
+for (; c <= 56; c++)
+  a -= 0 == (c || b);
+  }
+}
diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
index 3b078da..5bb2289 100644
--- a/gcc/tree-vect-stmts.c
+++ b/gcc/tree-vect-stmts.c
@@ -1300,7 +1300,25 @@ vect_init_vector (gimple *stmt, tree val, tree type, 
gimple_stmt_iterator *gsi)
 {
   if (!types_compatible_p (TREE_TYPE (type), TREE_TYPE (val)))
{
- if (CONSTANT_CLASS_P (val))
+ /* Scalar boolean value should be transformed into
+all zeros or all ones value before building a vector.  */
+ if (VECTOR_BOOLEAN_TYPE_P (type))
+   {
+ tree true_val = build_all_ones_cst (TREE_TYPE (type));
+ tree false_val = build_zero_cst (TREE_TYPE (type));
+
+ if (CONSTANT_CLASS_P (val))
+   val = integer_zerop (val) ? false_val : true_val;
+ else
+   {
+ new_temp = make_ssa_name (TREE_TYPE (type));
+ init_stmt = gimple_build_assign (new_temp, COND_EXPR,
+  val, true_val, false_val);
+ vect_init_vector_1 (stmt, init_stmt, gsi);
+ val = new_temp;
+   }
+   }
+ else if (CONSTANT_CLASS_P (val))
val = fold_convert (TREE_TYPE (type), val);
  else
{
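
The intent described in this message (every element of an invariant boolean vector is 0 or -1) can be sketched in plain C. The names `bool_to_mask` and `splat_bool`, and the choice of four 32-bit lanes, are assumptions made for illustration only; they are not part of the patch.

```c
#include <stdint.h>

/* Map a scalar truth value to the element value a boolean vector
   expects: all bits set (-1) for true, all bits clear (0) for false.  */
int32_t
bool_to_mask (int val)
{
  return val ? -1 : 0;
}

/* Build an invariant 4-lane boolean vector from scalar VAL.  */
void
splat_bool (int val, int32_t vec[4])
{
  int32_t m = bool_to_mask (val);

  for (int i = 0; i < 4; i++)
    vec[i] = m;
}
```

Converting first matters because a plain widening of the scalar would put 1 in each lane, not -1.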

Re: [Patch AArch64] Fix typo in aarch64_builtin_reciprocal.

2015-12-01 Thread Ramana Radhakrishnan



On 01/12/15 09:04, Jakub Jelinek wrote:
> On Tue, Dec 01, 2015 at 08:58:53AM +, Ramana Radhakrishnan wrote:
>> The patch to restructure builtin_reciprocals missed out an obvious ')'. 
>> Adjusted thusly and applied as obvious to trunk.
> 
> Sorry for that.  Could you please also handle the gimple_call_internal_p
> case, so that it actually returns the aarch64 builtin decls if
> it is internal SQRT call with the right modes?  See the i386 and rs6000
> builtins.  Haven't done that for aarch64, because it uses a helper function
> defined somewhere else, so haven't been sure how you want it to look like.

Thanks for pointing this out. James - can you please take a look ?  I don't 
think I'll have the time to get to this today.

I just realized my patch wasn't attached to the previous mail - here it is FTR.

regards
Ramana


>>
>> 2015-12-01  Ramana Radhakrishnan  
>>
>> * config/aarch64/aarch64.c (aarch64_builtin_reciprocal): Fix typo.
> 
>   Jakub
> 
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index b150283..88dbe15 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -7112,7 +7112,7 @@ aarch64_builtin_reciprocal (gcall *call)
   & AARCH64_EXTRA_TUNE_RECIP_SQRT))
 return NULL_TREE;
 
-  if (gimple_call_internal_p (call)
+  if (gimple_call_internal_p (call))
 return NULL_TREE;
 
   tree fndecl = gimple_call_fndecl (call);

Re: S/390: Fix warnings in "*setmem_long..." patterns.

2015-12-01 Thread Dominik Vogt

On Mon, Nov 30, 2015 at 06:11:33PM +0100, Ulrich Weigand wrote:
> On 11/30/2015 04:11 PM, Dominik Vogt wrote:
> > The attached patch fixes some warnings generated by the setmem...
> > patterns in s390.md during build and add test cases for the
> > patterns.  The patch is to be added on top of the movstr patch:
> > https://gcc.gnu.org/ml/gcc-patches/2015-11/msg03485.html
> > 
> > The test cases validate that the patterns are actually used, but
> > at the moment the setmem_long_and pattern is never actually used
> > and thus the test case would fail.  So I've split the patch in two
> > (both attached to this message) to activate this part of the test
> > once we've fixed that.
> > 
> > The patch has passed the SPEC2006 testsuite without any measurable
> > changes in performance.
> 
> What would you think about something like the following?
> 
> (define_insn "*setmem_long"
>   [(clobber (match_operand: 0 "register_operand" "=d"))
>(set (mem:BLK (subreg:P (match_operand: 3 "register_operand" "0") 0))
> (unspec:BLK [(match_operand:P 2 "shift_count_or_setmem_operand" "Y")
>  (subreg:P (match_dup 3) 1)] UNSPEC_REPLICATE_BYTE))
>(use (match_operand: 1 "register_operand" "d"))
>(clobber (reg:CC CC_REGNUM))]

New patch attached (patch 1.5 and ChangeLog are the same).  I've
swapped the operands 1 and 3 so that the numbering is the same as
before.  I think there are still a couple of problems with the
patched code:

1.

The new pattern has "(use (match_operand 3))" where the old one
just had match_dup (which did not express that a register pair was
required).  The expander function now requires a fourth, unused
argument that I don't know how to get rid of.

  emit_insn (gen_setmem_long_di (dst, convert_to_mode (Pmode, len, 1),
  val, NULL_RTX));
   

2.

I think the pattern should express that the register pair with the
destination address and length gets clobbered by the mvcle
instruction, and I'm not sure whether it's necessary to tell GCC
explicitly that the register pair with the source address and
length gets zeroed.

> [ Not sure if we'd need an extra (use (match_dup 3)) any more. ]
> 
> B.t.w. this is certainly wrong and cannot be generated by common code:
> (and:BLK (unspec:BLK
> [(match_operand:P 2 "shift_count_or_setmem_operand" "Y")]
> UNSPEC_P_TO_BLK)
>(match_operand 4 "const_int_operand" "n"))
> (This explains why the pattern would never match.)

It never matched before this change either.

Ciao

Dominik ^_^  ^_^

-- 

Dominik Vogt
IBM Germany
gcc/ChangeLog

* config/s390/s390.c (s390_expand_setmem): Use new expanders.
* config/s390/s390.md ("*setmem_long")
("*setmem_long_and", "*setmem_long_31z"): Fix warnings.
("setmem_long_"): New expanders.
("setmem_long"): Removed.

gcc/testsuite/ChangeLog

* gcc.target/s390/md/setmem_long-1.c: New test.
* gcc.target/s390/md/setmem_long-2.c: New test.
From 0e1bc4be3466b0f07b1d5c1334e3717802a7db82 Mon Sep 17 00:00:00 2001
From: Dominik Vogt 
Date: Wed, 4 Nov 2015 03:16:24 +0100
Subject: [PATCH 1/1.5] S/390: Fix warnings in "*setmem_long..." patterns.

---
 gcc/config/s390/s390.c   |  7 +++-
 gcc/config/s390/s390.md  | 51 ++--
 gcc/testsuite/gcc.target/s390/md/setmem_long-1.c | 20 ++
 gcc/testsuite/gcc.target/s390/md/setmem_long-2.c | 20 ++
 4 files changed, 75 insertions(+), 23 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/s390/md/setmem_long-1.c
 create mode 100644 gcc/testsuite/gcc.target/s390/md/setmem_long-2.c

diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c
index 40ee2f7..df7af91 100644
--- a/gcc/config/s390/s390.c
+++ b/gcc/config/s390/s390.c
@@ -5178,7 +5178,12 @@ s390_expand_setmem (rtx dst, rtx len, rtx val)
   else if (TARGET_MVCLE)
 {
   val = force_not_mem (convert_modes (Pmode, QImode, val, 1));
-  emit_insn (gen_setmem_long (dst, convert_to_mode (Pmode, len, 1), val));
+  if (TARGET_64BIT)
+	emit_insn (gen_setmem_long_di (dst, convert_to_mode (Pmode, len, 1),
+   val, NULL_RTX));
+  else
+	emit_insn (gen_setmem_long_si (dst, convert_to_mode (Pmode, len, 1),
+   val, NULL_RTX));
 }
 
   else
diff --git a/gcc/config/s390/s390.md b/gcc/config/s390/s390.md
index 75e9af7..e093fd3 100644
--- a/gcc/config/s390/s390.md
+++ b/gcc/config/s390/s390.md
@@ -70,6 +70,9 @@
; Copy CC as is into the lower 2 bits of an integer register
UNSPEC_CC_TO_INT
 
+   ; Convert Pmode to BLKmode
+   UNSPEC_REPLICATE_BYTE
+
; GOT/PLT and lt-relative accesses
UNSPEC_LTREL_OFFSET
UNSPEC_LTREL_BASE
@@ -3281,13 +3284,13 @@
 
 ; Initialize a block of arbitrary length with (operands[2] % 256).
 
-(define_expand "setmem_long"
+(define_expand "setmem_long_"
   [(parallel
-[(clobber (match_

Re: [PATCH AArch64]Handle REG+REG+CONST and REG+NON_REG+CONST in legitimize address

2015-12-01 Thread Richard Earnshaw

On 01/12/15 03:19, Bin.Cheng wrote:
> On Tue, Nov 24, 2015 at 6:18 PM, Richard Earnshaw
>  wrote:
>> On 24/11/15 09:56, Richard Earnshaw wrote:
>>> On 24/11/15 02:51, Bin.Cheng wrote:
>> The aarch64's problem is we don't define addptr3 pattern, and we don't
 have direct insn pattern describing the "x + y << z".  According to
 gcc internal:

 ‘addptrm3’
 Like addm3 but is guaranteed to only be used for address calculations.
 The expanded code is not allowed to clobber the condition code. It
 only needs to be defined if addm3 sets the condition code.
>>
>> addm3 on aarch64 does not set the condition codes, so by this rule we
>> shouldn't need to define this pattern.
 Hi Richard,
 I think that rule has a prerequisite that backend needs to support
 register shifted addition in addm3 pattern.
>>>
>>> addm3 is a named pattern and its format is well defined.  It does not
>>> take a shifted operand and never has.
>>>
 Apparently for AArch64,
 addm3 only supports "reg+reg" or "reg+imm".  Also we don't really
 "does not set the condition codes" actually, because both
 "adds_shift_imm_*" and "adds_mul_imm_*" do set the condition flags.
>>>
>>> You appear to be confusing named patterns (used by expand) with
>>> recognizers.  Anyway, we have
>>>
>>> (define_insn "*add__"
>>>   [(set (match_operand:GPI 0 "register_operand" "=r")
>>> (plus:GPI (ASHIFT:GPI (match_operand:GPI 1 "register_operand" "r")
>>>   (match_operand:QI 2
>>> "aarch64_shift_imm_" "n"))
>>>   (match_operand:GPI 3 "register_operand" "r")))]
>>>
>>> Which is a non-flag setting add with shifted operand.
>>>
 Either way I think it is another backend issue, so do you approve that
 I commit this patch now?
>>>
>>> Not yet.  I think there's something fundamental amiss here.
>>>
>>> BTW, it looks to me as though addptr3 should have exactly the same
>>> operand rules as add3 (documentation reads "like add3"), so a
>>> shifted operand shouldn't be supported there either.  If that isn't the
>>> case then that should be clearly called out in the documentation.
>>>
>>> R.
>>>
>>
>> PS.
>>
>> I presume you are aware of the canonicalization rules for add?  That is,
>> for a shift-and-add operation, the shift operand must appear first.  Ie.
>>
>> (plus (shift (op, op)), op)
>>
>> not
>>
>> (plus (op, (shift (op, op))
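
The ordering rule quoted above can be illustrated with a toy expression tree. This is only a sketch: `toy_code`, `toy_expr` and `toy_canonicalize_plus` are hypothetical names invented for illustration and are not GCC's rtx API, and the real canonicalization rules cover more cases than this one.

```c
#include <assert.h>
#include <stddef.h>

/* Toy expression codes; these are not GCC's real rtx codes.  */
enum toy_code { TOY_REG, TOY_SHIFT, TOY_PLUS };

struct toy_expr
{
  enum toy_code code;
  struct toy_expr *op0;
  struct toy_expr *op1;
};

/* If E is a PLUS whose second operand is a shift and whose first
   operand is not, swap the operands so that the shift comes first.  */
static void
toy_canonicalize_plus (struct toy_expr *e)
{
  assert (e != NULL);
  if (e->code == TOY_PLUS
      && e->op0 != NULL && e->op1 != NULL
      && e->op1->code == TOY_SHIFT
      && e->op0->code != TOY_SHIFT)
    {
      struct toy_expr *tmp = e->op0;
      e->op0 = e->op1;
      e->op1 = tmp;
    }
}
```

Applied to (plus (reg) (shift ...)), the helper yields (plus (shift ...) (reg)); an expression already in that form is left unchanged.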
> 
> Hi Richard,
> Thanks for the comments.  I realized that the not-recognized insn
> issue is because the original patch builds non-canonical expressions.
> When reloading an address expression, LRA generates a non-canonical
> register-scaled insn, which can't be recognized by the aarch64 backend.
> 
> Here is the updated patch using the canonical form pattern; it passes
> bootstrap and regression test.  Well, the ivo failure still exists,
> but it was analyzed in the original message.
> 
> Is this patch OK?
> 
> As for Jiong's concern about the additional extension instruction, I
> think this only stands for atomic load store instructions.  For
> general load store, AArch64 supports zext/sext in register scaling
> addressing mode, the additional instruction can be forward propagated
> into memory reference.  The problem for atomic load store is AArch64
> only supports direct register addressing mode.  After LRA reloads
> address expression out of memory reference, there is no combine/fwprop
> optimizer to merge instructions.  The problem is atomic_store's
> predicate doesn't match its constraint.   The predicate used for
> atomic_store is memory_operand, while all other atomic patterns
> use aarch64_sync_memory_operand.  I think this might be a typo.  With
> this change, expand will not generate addressing mode requiring reload
> anymore.  I will test another patch fixing this.
> 
> Thanks,
> bin

Some comments inline.

>>
>> R.
>>
>> aarch64_legitimize_addr-20151128.txt
>>
>>
>> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
>> index 3fe2f0f..5b3e3c4 100644
>> --- a/gcc/config/aarch64/aarch64.c
>> +++ b/gcc/config/aarch64/aarch64.c
>> @@ -4757,13 +4757,65 @@ aarch64_legitimize_address (rtx x, rtx /* orig_x  
>> */, machine_mode mode)
>>   We try to pick as large a range for the offset as possible to
>>   maximize the chance of a CSE.  However, for aligned addresses
>>   we limit the range to 4k so that structures with different sized
>> - elements are likely to use the same base.  */
>> + elements are likely to use the same base.  We need to be careful
>> + not split CONST for some forms address expressions, otherwise it

not to split a CONST for some forms of address expression,

>> + will generate sub-optimal code.  */
>>  
>>if (GET_CODE (x) == PLUS && CONST_INT_P (XEXP (x, 1)))
>>  {
>>HOST_WIDE_INT offset = INTVAL (XEXP (x, 1));
>>HOST_WIDE_INT base_offset;
>>  
>> +  if (GET_CODE (XEXP (x, 0)) == PLUS)
>> +{
>> +  rt

Re: -fstrict-aliasing fixes 2/5: drop alias set 0 streaming

2015-12-01 Thread Richard Biener

On Tue, 1 Dec 2015, Jan Hubicka wrote:

> Hi,
> this patch disables the streaming of alias 0 flag and adds a comment why.
> 
> Bootstrapped/regtested x86_64-linux, OK?

Ok.

Thanks,
Richard.

> Honza
> 
>   * lto-streamer-out.c (hash_tree): Do not stream TYPE_ALIAS_SET.
>   * tree-streamer-out.c (pack_ts_type_common_value_fields): Do not
>   stream TYPE_ALIAS_SET.
>   * tree-streamer-in.c (unpack_ts_type_common_value_fields): Do not
>   stream TYPE_ALIAS_SET.
> 
>   * lto.c (compare_tree_sccs_1): Do not compare TYPE_ALIAS_SET.
> 
> Index: lto-streamer-out.c
> ===
> --- lto-streamer-out.c(revision 231081)
> +++ lto-streamer-out.c(working copy)
> @@ -1109,10 +1109,6 @@ hash_tree (struct streamer_tree_cache_d
>hstate.commit_flag ();
>hstate.add_int (TYPE_PRECISION (t));
>hstate.add_int (TYPE_ALIGN (t));
> -  hstate.add_int ((TYPE_ALIAS_SET (t) == 0
> -  || (!in_lto_p
> -  && get_alias_set (t) == 0))
> - ? 0 : -1);
>  }
>  
>if (CODE_CONTAINS_STRUCT (code, TS_TRANSLATION_UNIT_DECL))
> Index: lto/lto.c
> ===
> --- lto/lto.c (revision 231081)
> +++ lto/lto.c (working copy)
> @@ -1166,7 +1166,9 @@ compare_tree_sccs_1 (tree t1, tree t2, t
>compare_values (TYPE_READONLY);
>compare_values (TYPE_PRECISION);
>compare_values (TYPE_ALIGN);
> -  compare_values (TYPE_ALIAS_SET);
> +  /* Do not compare TYPE_ALIAS_SET.  Doing so introduce ordering issues
> + with calls to get_alias_set which may initialize it for streamed
> +  in types.  */
>  }
>  
>/* We don't want to compare locations, so there is nothing do compare
> Index: tree-streamer-out.c
> ===
> --- tree-streamer-out.c   (revision 231081)
> +++ tree-streamer-out.c   (working copy)
> @@ -317,13 +317,9 @@ pack_ts_type_common_value_fields (struct
>bp_pack_value (bp, TYPE_RESTRICT (expr), 1);
>bp_pack_value (bp, TYPE_USER_ALIGN (expr), 1);
>bp_pack_value (bp, TYPE_READONLY (expr), 1);
> -  /* Make sure to preserve the fact whether the frontend would assign
> - alias-set zero to this type.  Do that only for main variants, because
> - type variants alias sets are never computed.
> - FIXME:  This does not work for pre-streamed builtin types.  */
> -  bp_pack_value (bp, (TYPE_ALIAS_SET (expr) == 0
> -   || (!in_lto_p && TYPE_MAIN_VARIANT (expr) == expr
> -   && get_alias_set (expr) == 0)), 1);
> +  /* We used to stream TYPE_ALIAS_SET == 0 information to let frontends mark
> + types that are opaque for TBAA.  This however did not work as intended,
> + because TYPE_ALIAS_SET == 0 was regularly lost in canonical type 
> merging.  */
>if (RECORD_OR_UNION_TYPE_P (expr))
>  {
>bp_pack_value (bp, TYPE_TRANSPARENT_AGGR (expr), 1);
> Index: tree-streamer-in.c
> ===
> --- tree-streamer-in.c(revision 231081)
> +++ tree-streamer-in.c(working copy)
> @@ -366,7 +366,6 @@ unpack_ts_type_common_value_fields (stru
>TYPE_RESTRICT (expr) = (unsigned) bp_unpack_value (bp, 1);
>TYPE_USER_ALIGN (expr) = (unsigned) bp_unpack_value (bp, 1);
>TYPE_READONLY (expr) = (unsigned) bp_unpack_value (bp, 1);
> -  TYPE_ALIAS_SET (expr) = bp_unpack_value (bp, 1) ? 0 : -1;
>if (RECORD_OR_UNION_TYPE_P (expr))
>  {
>TYPE_TRANSPARENT_AGGR (expr) = (unsigned) bp_unpack_value (bp, 1);
> 
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
21284 (AG Nuernberg)

Re: -fstrict-aliasing fixes 3/5: Do not ignore -fstrict-aliasing changes when parsing optimization attribute

2015-12-01 Thread Richard Biener

On Tue, 1 Dec 2015, Jan Hubicka wrote:

> Hi,
> this is the third part, which enables us to change -fstrict-aliasing using
> the optimize attribute.  This ought to work safely now because the inliner
> propagates the flag.

Ok.

Thanks,
Richard.

> Bootstrapped/regtested x86_64-linux.
> 
> Honza
> 
>   * gcc.c-torture/execute/alias-1.c: New testcase.
>   * c-common.c: Do not silently ignore -fstrict-aliasing changes.
> Index: testsuite/gcc.c-torture/execute/alias-1.c
> ===
> --- testsuite/gcc.c-torture/execute/alias-1.c (revision 0)
> +++ testsuite/gcc.c-torture/execute/alias-1.c (revision 0)
> @@ -0,0 +1,19 @@
> +int val;
> +
> +int *ptr = &val;
> +float *ptr2 = &val;
> +
> +__attribute__((optimize ("-fno-strict-aliasing")))
> +typepun ()
> +{
> +  *ptr2=0;
> +}
> +
> +main()
> +{
> +  *ptr=1;
> +  typepun ();
> +  if (*ptr)
> +__builtin_abort ();
> +}
> +
> Index: c-family/c-common.c
> ===
> --- c-family/c-common.c   (revision 231097)
> +++ c-family/c-common.c   (working copy)
> @@ -9988,7 +9988,6 @@ parse_optimize_options (tree args, bool
>bool ret = true;
>unsigned opt_argc;
>unsigned i;
> -  int saved_flag_strict_aliasing;
>const char **opt_argv;
>struct cl_decoded_option *decoded_options;
>unsigned int decoded_options_count;
> @@ -10081,8 +10080,6 @@ parse_optimize_options (tree args, bool
>for (i = 1; i < opt_argc; i++)
>  opt_argv[i] = (*optimize_args)[i];
>  
> -  saved_flag_strict_aliasing = flag_strict_aliasing;
> -
>/* Now parse the options.  */
>decode_cmdline_options_to_array_default_mask (opt_argc, opt_argv,
>   &decoded_options,
> @@ -10093,9 +10090,6 @@ parse_optimize_options (tree args, bool
>  
>targetm.override_options_after_change();
>  
> -  /* Don't allow changing -fstrict-aliasing.  */
> -  flag_strict_aliasing = saved_flag_strict_aliasing;
> -
>optimize_args->truncate (0);
>return ret;
>  }
> 
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
21284 (AG Nuernberg)

Re: When not optimizing do not compute RTX memory attributes

2015-12-01 Thread Richard Biener

On Tue, 1 Dec 2015, Jan Hubicka wrote:

> Hi,
> memory attributes are currently optimized and attached to RTL even when not
> optimizing. This is obviously just a wasted effort.

Huh, are you sure?  What about globals used from different optimize
contexts?

> Bootstrapped/regtested x86_64-linux, OK?

I don't think so.  Did you bootstrap with BOOT_CFLAGS="-O0 -g"?

Richard.

> Honza
>   * emit-rtl.c (set_mem_attrs, set_mem_attributes_minus_bitpos):
>   Do not compute memory attributes when not optimizing.
> 
> Index: emit-rtl.c
> ===
> --- emit-rtl.c(revision 231081)
> +++ emit-rtl.c(working copy)
> @@ -336,7 +336,8 @@ static void
>  set_mem_attrs (rtx mem, mem_attrs *attrs)
>  {
>/* If everything is the default, we can just clear the attributes.  */
> -  if (mem_attrs_eq_p (attrs, mode_mem_attrs[(int) GET_MODE (mem)]))
> +  if (!optimize
> +  || mem_attrs_eq_p (attrs, mode_mem_attrs[(int) GET_MODE (mem)]))
>  {
>MEM_ATTRS (mem) = 0;
>return;
> @@ -1749,6 +1750,9 @@ set_mem_attributes_minus_bitpos (rtx ref
>struct mem_attrs attrs, *defattrs, *refattrs;
>addr_space_t as;
>  
> +  if (!optimize)
> +return;
> +
>/* It can happen that type_for_mode was given a mode for which there
>   is no language-level type.  In which case it returns NULL, which
>   we can see here.  */
> 
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
21284 (AG Nuernberg)

Re: PR68577: Handle narrowing for vector popcount, etc.

2015-12-01 Thread Richard Biener

On Tue, Dec 1, 2015 at 10:14 AM, Richard Sandiford
 wrote:
> This patch adds support for simple cases where a vector internal
> function returns wider results than the scalar equivalent.  It punts
> on other cases.
>
> Tested on powerpc64-linux-gnu and x86_64-linux-gnu.  OK to install?
>
> Thanks,
> Richard
>
>
> gcc/
> PR tree-optimization/68577
> * tree-vect-stmts.c (simple_integer_narrowing): New function.
> (vectorizable_call): Restrict internal function handling
> to NONE and NARROW cases, using simple_integer_narrowing
> to test for the latter.  Add cost of narrowing operation
> and insert it where necessary.
>
> gcc/testsuite/
> PR tree-optimization/68577
> * gcc.dg/vect/pr68577.c: New test.
>
> diff --git a/gcc/testsuite/gcc.dg/vect/pr68577.c 
> b/gcc/testsuite/gcc.dg/vect/pr68577.c
> new file mode 100644
> index 000..999c1c8
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/pr68577.c
> @@ -0,0 +1,25 @@
> +/* { dg-do compile } */
> +
> +int a, b;
> +
> +void
> +__sched_cpucount (void)
> +{
> +  while (b)
> +{
> +  long l = b++;
> +  a += __builtin_popcountl(l);
> +}
> +}
> +
> +void
> +slp_test (int *x, long *y)
> +{
> +  for (int i = 0; i < 512; i += 4)
> +{
> +  x[i] = __builtin_popcountl(y[i]);
> +  x[i + 1] = __builtin_popcountl(y[i + 1]);
> +  x[i + 2] = __builtin_popcountl(y[i + 2]);
> +  x[i + 3] = __builtin_popcountl(y[i + 3]);
> +}
> +}
> diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
> index 3b078da..af86bce 100644
> --- a/gcc/tree-vect-stmts.c
> +++ b/gcc/tree-vect-stmts.c
> @@ -2122,6 +2122,40 @@ vectorizable_mask_load_store (gimple *stmt, 
> gimple_stmt_iterator *gsi,
>return true;
>  }
>
> +/* Return true if vector type VECTYPE_OUT has integer elements and
> +   if we can narrow two integer vectors with the same shape as
> +   VECTYPE_IN to VECTYPE_OUT in a single step.  On success,
> +   return the binary pack code in *CONVERT_CODE and the types
> +   of the input vectors in *CONVERT_FROM.  */
> +
> +static bool
> +simple_integer_narrowing (tree vectype_out, tree vectype_in,
> + tree_code *convert_code, tree *convert_from)
> +{
> +  if (!INTEGRAL_TYPE_P (TREE_TYPE (vectype_out)))
> +return false;
> +
> +  if (!INTEGRAL_TYPE_P (TREE_TYPE (vectype_in)))
> +{
> +  unsigned int bits
> +   = GET_MODE_BITSIZE (TYPE_MODE (TREE_TYPE (vectype_in)));
> +  tree scalar_type = build_nonstandard_integer_type (bits, 0);
> +  vectype_in = get_same_sized_vectype (scalar_type, vectype_in);
> +}
> +

any reason for supporting non-integer types on the input?  It seems to me
you are doing this for the lrint case?  If so isn't the "question" wrong and
you should pass the integer type the IFN returns as vectype_in instead?

That said, this conversion doesn't seem to belong to simple_integer_narrowing.

The patch is ok with simply removing it.

Thanks,
Richard.

> +  tree_code code;
> +  int multi_step_cvt = 0;
> +  auto_vec  interm_types;
> +  if (!supportable_narrowing_operation (NOP_EXPR, vectype_out, vectype_in,
> +   &code, &multi_step_cvt,
> +   &interm_types)
> +  || multi_step_cvt)
> +return false;
> +
> +  *convert_code = code;
> +  *convert_from = vectype_in;
> +  return true;
> +}
>
>  /* Function vectorizable_call.
>
> @@ -2288,7 +2322,13 @@ vectorizable_call (gimple *gs, gimple_stmt_iterator 
> *gsi, gimple **vec_stmt,
>tree callee = gimple_call_fndecl (stmt);
>
>/* First try using an internal function.  */
> -  if (cfn != CFN_LAST)
> +  tree_code convert_code = ERROR_MARK;
> +  tree convert_from = NULL_TREE;
> +  if (cfn != CFN_LAST
> +  && (modifier == NONE
> + || (modifier == NARROW
> + && simple_integer_narrowing (vectype_out, vectype_in,
> +  &convert_code, &convert_from))))
>  ifn = vectorizable_internal_function (cfn, callee, vectype_out,
>   vectype_in);
>
> @@ -2328,7 +2368,7 @@ vectorizable_call (gimple *gs, gimple_stmt_iterator 
> *gsi, gimple **vec_stmt,
>
>if (slp_node || PURE_SLP_STMT (stmt_info))
>  ncopies = 1;
> -  else if (modifier == NARROW)
> +  else if (modifier == NARROW && ifn == IFN_LAST)
>  ncopies = LOOP_VINFO_VECT_FACTOR (loop_vinfo) / nunits_out;
>else
>  ncopies = LOOP_VINFO_VECT_FACTOR (loop_vinfo) / nunits_in;
> @@ -2344,6 +2384,10 @@ vectorizable_call (gimple *gs, gimple_stmt_iterator 
> *gsi, gimple **vec_stmt,
>  dump_printf_loc (MSG_NOTE, vect_location, "=== vectorizable_call ==="
>   "\n");
>vect_model_simple_cost (stmt_info, ncopies, dt, NULL, NULL);
> +  if (ifn != IFN_LAST && modifier == NARROW && !slp_node)
> +   add_stmt_cost (stmt_info->vinfo->target_cost_data, ncopies / 2,
> +  vec_promote_demote,
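
The narrowing case discussed in this message can be pictured with a scalar emulation. This is a rough sketch under stated assumptions: the vector shapes (2 x 64-bit in, 4 x 32-bit out) and the helper names `popcount_2x64`, `pack_trunc_2x64` and `popcount_narrow_4` are invented for illustration; only `__builtin_popcountll` is a real GCC builtin. It shows a wide popcount producing 64-bit lanes and a single pack step merging two such results into one vector of 32-bit lanes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Popcount as the wide vector operation would produce it: one 64-bit
   result per 64-bit input lane.  */
static void
popcount_2x64 (const uint64_t in[2], uint64_t out[2])
{
  for (size_t i = 0; i < 2; i++)
    out[i] = (uint64_t) __builtin_popcountll (in[i]);
}

/* Emulate one "pack" step: truncate the lanes of two 2 x 64-bit
   vectors and concatenate them into one 4 x 32-bit vector.  */
static void
pack_trunc_2x64 (const uint64_t lo[2], const uint64_t hi[2],
                 uint32_t out[4])
{
  out[0] = (uint32_t) lo[0];
  out[1] = (uint32_t) lo[1];
  out[2] = (uint32_t) hi[0];
  out[3] = (uint32_t) hi[1];
}

/* Two wide popcounts followed by a single narrowing step.  */
static void
popcount_narrow_4 (const uint64_t in[4], uint32_t out[4])
{
  assert (in != NULL && out != NULL);
  uint64_t wide_lo[2], wide_hi[2];
  popcount_2x64 (in, wide_lo);
  popcount_2x64 (in + 2, wide_hi);
  pack_trunc_2x64 (wide_lo, wide_hi, out);
}
```

Truncation is lossless here because a 64-bit popcount never exceeds 64.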

Re: PR68474: Fix tree-call-cdce.c:use_internal_fn

2015-12-01 Thread Richard Biener

On Tue, Dec 1, 2015 at 10:24 AM, Richard Sandiford
 wrote:
> We'd call gen_shrink_wrap_conditions for functions that it can't handle
> but edom_only_function can.
>
> Tested on x86_64-linux-gnu.  OK to install?

Ok.

Richard.

> Thanks,
> Richard
>
>
> gcc/
> PR tree-optimization/68474
> * tree-call-cdce.c (use_internal_fn): Protect call to
> gen_shrink_wrap_conditions.
>
> gcc/testsuite/
> PR tree-optimization/68474
> * gcc.dg/pr68474.c: New test.
>
> diff --git a/gcc/testsuite/gcc.dg/pr68474.c b/gcc/testsuite/gcc.dg/pr68474.c
> new file mode 100644
> index 000..8ad7def
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/pr68474.c
> @@ -0,0 +1,7 @@
> +/* { dg-options "-O -funsafe-math-optimizations" } */
> +
> +long double
> +foo (long double d1, long double d2)
> +{
> +  return d1 || __builtin_significandl (d2);
> +}
> diff --git a/gcc/tree-call-cdce.c b/gcc/tree-call-cdce.c
> index 75ef180..4123130 100644
> --- a/gcc/tree-call-cdce.c
> +++ b/gcc/tree-call-cdce.c
> @@ -959,7 +959,8 @@ use_internal_fn (gcall *call)
>  {
>unsigned nconds = 0;
>auto_vec conds;
> -  gen_shrink_wrap_conditions (call, conds, &nconds);
> +  if (can_test_argument_range (call))
> +gen_shrink_wrap_conditions (call, conds, &nconds);
>if (nconds == 0 && !edom_only_function (call))
>  return false;
>
>

Re: [PATCH, PR middle-end/68595] Fix invariant boolean vector generation

2015-12-01 Thread Richard Biener

On Tue, Dec 1, 2015 at 10:44 AM, Ilya Enkovich  wrote:
> Hi,
>
> This patch fixes the way an invariant boolean vector is generated.  It makes sure 
> the boolean vector consists of 0 and -1 values.  Bootstrapped and tested on 
> x86_64-unknown-linux-gnu.  OK for trunk?

Ok.

Richard.

> Thanks,
> Ilya
> --
> gcc/
>
> 2015-12-01  Ilya Enkovich  
>
> PR middle-end/68595
> * tree-vect-stmts.c (vect_init_vector): Cast boolean
> scalars to a proper value before building a vector.
>
> gcc/testsuite/
>
> 2015-12-01  Ilya Enkovich  
>
> PR middle-end/68595
> * gcc.dg/pr68595.c: New test.
>
>
> diff --git a/gcc/testsuite/gcc.dg/pr68595.c b/gcc/testsuite/gcc.dg/pr68595.c
> new file mode 100644
> index 000..179c6c3
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/pr68595.c
> @@ -0,0 +1,13 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O3" } */
> +
> +int a, b;
> +char c;
> +void fn1() {
> +  b = 30;
> +  for (; b <= 32; b++) {
> +c = -17;
> +for (; c <= 56; c++)
> +  a -= 0 == (c || b);
> +  }
> +}
> diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
> index 3b078da..5bb2289 100644
> --- a/gcc/tree-vect-stmts.c
> +++ b/gcc/tree-vect-stmts.c
> @@ -1300,7 +1300,25 @@ vect_init_vector (gimple *stmt, tree val, tree type, 
> gimple_stmt_iterator *gsi)
>  {
>if (!types_compatible_p (TREE_TYPE (type), TREE_TYPE (val)))
> {
> - if (CONSTANT_CLASS_P (val))
> + /* Scalar boolean value should be transformed into
> +all zeros or all ones value before building a vector.  */
> + if (VECTOR_BOOLEAN_TYPE_P (type))
> +   {
> + tree true_val = build_all_ones_cst (TREE_TYPE (type));
> + tree false_val = build_zero_cst (TREE_TYPE (type));
> +
> + if (CONSTANT_CLASS_P (val))
> +   val = integer_zerop (val) ? false_val : true_val;
> + else
> +   {
> + new_temp = make_ssa_name (TREE_TYPE (type));
> + init_stmt = gimple_build_assign (new_temp, COND_EXPR,
> +  val, true_val, false_val);
> + vect_init_vector_1 (stmt, init_stmt, gsi);
> + val = new_temp;
> +   }
> +   }
> + else if (CONSTANT_CLASS_P (val))
> val = fold_convert (TREE_TYPE (type), val);
>   else
> {
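
The 0 / -1 encoding this patch enforces can be sketched in plain C. This is an illustration only, assuming 32-bit mask lanes; `bool_to_mask` and `splat_bool_mask` are hypothetical helpers and not part of the vectorizer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Map a scalar truth value to the mask-lane encoding: all ones (-1)
   for true, all zeros for false.  */
static int32_t
bool_to_mask (int val)
{
  return val ? -1 : 0;
}

/* Build an invariant boolean vector by converting the scalar first
   and only then replicating it into every lane.  */
static void
splat_bool_mask (int32_t *lanes, size_t n, int val)
{
  assert (lanes != NULL);
  int32_t m = bool_to_mask (val);
  for (size_t i = 0; i < n; i++)
    lanes[i] = m;
}
```

Replicating the raw scalar value 1 instead would give lanes with only the low bit set, which is not the encoding described in the message.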

Re: RFC: Merge the GUPC branch into the GCC 6.0 trunk

2015-12-01 Thread Richard Biener

On Mon, 30 Nov 2015, Gary Funck wrote:

> 
> Some time ago, we submitted an RFC for the introduction of
> UPC support into GCC.  During the intervening time period,
> we have continued to keep the 'gupc' (GNU UPC) branch in sync
> with the GCC trunk and have incorporated feedback and contributions from
> various GCC developers (Joseph Myers, Tom Tromey, Jakub Jelinek,
> Richard Henderson, Meador Inge, and others).  We have also implemented
> various bug fixes and improvements.
> 
> At this time, we would like to re-submit the UPC patches for comment
> with the goal of introducing these changes into GCC 6.0.

First of all let me say that it is IMNSHO now too late for GCC 6.

> This email provides an overview of UPC and summarizes the
> impact of UPC changes on the GCC front-end.
> 
> Subsequent emails will include various patch sets which are grouped
> by the area of GCC that they impact (front-end, generic, documentation,
> build, test, target-specific, and so on), so that they can receive
> a more focused review by their respective maintainers.
> 
> The main review-related changes are:
> 
> * GUPC is no longer implemented as a separate language
> (e.g., Objective-C or C++) compiler.  Rather, a new -fupc switch
> has been added, which enables UPC support in the C compiler.
> 
> * The UPC blocking factor now only uses two of the tree's
> "spare" bits.  If the UPC blocking factor is not the default
> value of 1 or the "indefinite" value of 0, then it is recorded
> in a separate hash table, indexed by the tree node.
> 
> * UPC-specific tree support has been integrated into
> gcc/c-family/c-common.def and gcc/c-family/c-common.h.
> 
> * The number of UPC-specific configuration options
> has been reduced.
> 
> * The UPC pointer-to-shared format per-target configuration
> has been simplified.  Before, both a "packed" and a "struct"
> pointer-to-shared representation were supported.  Now, only
> the "struct" format is supported and various configuration
> options for tweaking field sizes and such have been removed.
> 
> * In keeping with current GCC development guidelines
> target macros are no longer used.  Rather, where needed,
> target hooks are defined and used.
> 
> * FIXME's and TODO's were either fixed or cleaned up.
> 
> * The copyright and license notices were updated.
> 
> * The code was reviewed for conformance to coding standards and updated.
> 
> * Diagnostics now use appropriate format strings rather than building
> up the strings with sprintf().
> 
> * Files in c-family/ no longer include c-tree.h to conform with modularization
> improvements.
> 
> * Most of the #ifdef conditionals have been removed.  Some target hooks
> have been defined and documented in tm.texi.
> 
> * The code was reviewed to verify that it conforms with
> current GCC coding practices and that it incorporates cleanups
> done in the past several years.
> 
> * Comments were added to most new functions, and typos and
> spelling errors in comments were fixed.
> 
> * Changes that appeared in the diff's that were unrelated to UPC
> were removed or incorporated into the trunk.
> 
> * The linkage to the libgupc library was changed to use the newly
> defined method (used in libgomp/libgo for example) of including
> library 'spec' files.  This led to a simplification where we no
> longer needed to add UPC-specific spec. files in various
> target-specific config. directories.
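
The blocking-factor item in the list above (two flag bits plus a side table) might look roughly like the following. This is a sketch under loud assumptions: the meaning assigned to `block_factor_0` and `block_factor_x`, the fixed-size open-addressing table, and every `toy_` name are guesses made for illustration, not the GUPC implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct toy_node
{
  unsigned block_factor_0 : 1;  /* assumed: factor is 0 ("indefinite") */
  unsigned block_factor_x : 1;  /* assumed: factor lives in the side table */
};

#define TOY_TABLE_SIZE 64

/* Side table indexed by node address; used only for factors > 1.  */
static struct { const struct toy_node *key; unsigned long factor; }
  toy_table[TOY_TABLE_SIZE];

/* Find the slot holding N, or the first free slot, by linear probing.  */
static size_t
toy_slot (const struct toy_node *n)
{
  size_t i = ((uintptr_t) n >> 3) % TOY_TABLE_SIZE;
  for (size_t probes = 0; probes < TOY_TABLE_SIZE; probes++)
    {
      if (toy_table[i].key == n || toy_table[i].key == NULL)
        return i;
      i = (i + 1) % TOY_TABLE_SIZE;
    }
  assert (!"toy side table is full");
  return 0;
}

static void
toy_set_block_factor (struct toy_node *n, unsigned long factor)
{
  n->block_factor_0 = (factor == 0);
  n->block_factor_x = (factor > 1);
  if (factor > 1)
    {
      size_t i = toy_slot (n);
      toy_table[i].key = n;
      toy_table[i].factor = factor;
    }
}

static unsigned long
toy_get_block_factor (const struct toy_node *n)
{
  if (n->block_factor_0)
    return 0;                   /* "indefinite" */
  if (!n->block_factor_x)
    return 1;                   /* default: no table lookup needed */
  return toy_table[toy_slot (n)].factor;
}
```

The point of such a scheme is that the two common values cost no storage beyond the flag bits; only unusual factors pay for a table entry.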
> 
> Introduction: UPC-related Changes
> ---------------------------------
> 
> Below, various UPC-related changes are summarized.
> This introduction is provided as background for review of the UPC
> changes implemented in the GUPC branch.  Each individual change will be
> discussed in more detail in the patch sets found in the following emails.
> 
> The current GUPC branch is based upon a recent version of the GCC trunk
> and has been bootstrapped on x86_64/i686 Linux, x86_64
> Darwin, IA64/Altix Linux, PowerPC Power7 (big endian), and Power8
> (little endian).  Also some testing has been done on various flavors
> of BSD and Solaris and in the past MIPS was tested and supported.
> 
> All languages (c, c++, fortran, go, lto, objc, obj-c++) have been
> bootstrapped; no test suite regressions were introduced,
> relative to the GCC trunk.
> 
> The GUPC branch is described here:
>   http://gcc.gnu.org/projects/gupc.html
> 
> The UPC-related source code differences are summarized here:
>   http://gccupc.org/gupc-changes
> 
> In the discussion below, some changes are excerpted in order to
> highlight important aspects of the changes.
> 
> UPC's Shared Qualifier and Layout Qualifier
> -------------------------------------------
> 
> The UPC language specification describes
> the language syntax and semantics:
>   http://upc.lbl.gov/publications/upc-spec-1.3.pdf
> 
> UPC introduces a new qualifier, "shared" that indicates that the
> qualified object is located in a global shared address space that is
> accessible by all UPC threads.  Additional qualifiers ("strict" and
> "r

Re: RFC: Merge the GUPC branch into the GCC 6.0 trunk

2015-12-01 Thread Bernd Schmidt


On 12/01/2015 06:31 AM, Gary Funck wrote:

At this time, we would like to re-submit the UPC patches for comment
with the goal of introducing these changes into GCC 6.0.


This has missed stage 1 by a few weeks, we'd have to make an exception 
to include it at this late stage.



@@ -857,9 +875,14 @@ struct GTY(()) tree_base {
unsigned user_align : 1;
unsigned nameless_flag : 1;
unsigned atomic_flag : 1;
-  unsigned spare0 : 3;
-
-  unsigned spare1 : 8;
+  unsigned shared_flag : 1;
+  unsigned strict_flag : 1;
+  unsigned relaxed_flag : 1;
+
+  unsigned threads_factor_flag : 1;
+  unsigned block_factor_0 : 1;
+  unsigned block_factor_x : 1;
+  unsigned spare1 : 5;


That's a lot of bits used up at once.

Does this solve anything that cannot be done with OpenMP, which we 
already support? Can you show us any users of this that demonstrate that 
this is actually in use by anyone outside the universities responsible 
for UPC? The language standard is apparently from 2005, but I've never 
heard of it and googling "upc" does not give any sensible results. The 
gccupc mailing list seems to have been dead for years judging by the 
archives. I'm worried we'll end up carrying something around as a burden 
that is of no practical use (considering we already support the more 
widespread OpenMP).



Bernd

Re: [UPC 02/22] tree-related changes

2015-12-01 Thread Richard Biener

On Mon, 30 Nov 2015, Gary Funck wrote:

> 
> Background
> ----------
> 
> An overview email, describing the UPC-related changes is here:
>   https://gcc.gnu.org/ml/gcc-patches/2015-12/msg5.html
> 
> The GUPC branch is described here:
>   http://gcc.gnu.org/projects/gupc.html
> 
> The UPC-related source code differences are summarized here:
>   http://gccupc.org/gupc-changes
> 
> All languages (c, c++, fortran, go, lto, objc, obj-c++) have been
> bootstrapped; no test suite regressions were introduced,
> relative to the GCC trunk.
> 
> If you are on the cc-list, your name was chosen either
> because you are listed as a maintainer for the area that
> applies to the patches described in this email, or you
> were a frequent contributor of patches made to files listed
> in this email.
> 
> In the change log entries included in each patch, the directory
> containing the affected files is listed, followed by the files.
> When the patches are applied, the change log entries will be
> distributed to the appropriate ChangeLog file.
> 
> Overview
> --------
> 
> UPC introduces a new qualifier, "shared", that indicates that the
> qualified object is located in a global shared address space that is
> accessible by all UPC threads.  Additional qualifiers ("strict" and
> "relaxed") further specify the semantics of accesses to
> UPC shared objects.
> 
> In UPC, a shared qualified array can further specify a "layout
> qualifier" (blocking factor) that indicates how the shared data
> is blocked and distributed.
> 
> The following example illustrates the use of the UPC "shared" qualifier
> combined with a layout qualifier.
> 
> #define BLKSIZE 5
> #define N_PER_THREAD (4 * BLKSIZE)
> shared [BLKSIZE] double A[N_PER_THREAD*THREADS];
> 
> Above, the "[BLKSIZE]" construct is the UPC layout qualifier; this
> specifies that the shared array, A, distributes its elements across
> each thread in blocks of 5 elements.  If the program is run with two
> threads, then A is distributed as shown below:
> 
> Thread 0    Thread 1
> ---------   ---------
> A[ 0.. 4]   A[ 5.. 9]
> A[10..14]   A[15..19]
> A[20..24]   A[25..29]
> A[30..34]   A[35..39]
> 
> Above, the elements shown for thread 0 are defined as having "affinity"
> to thread 0.  Similarly, those elements shown for thread 1 have
> affinity to thread 1.  In UPC, a pointer to a shared object can be
> cast to a thread local pointer (a "C" pointer), when the designated
> shared object has affinity to the referencing thread.
> 
> A UPC "pointer-to-shared" (PTS) is a pointer that references a UPC
> shared object.  A UPC pointer-to-shared is a "fat" pointer with the
> following logical fields:
>(virt_addr, thread, phase)
> 
> The virtual address (virt_addr) field is combined with the thread
> number (thread) to derive the location of the referenced object
> within the UPC shared address space.  The phase field is used to
> keep track of the current block offset for PTS's that have a
> blocking factor that is greater than one.
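
The layout shown in the table above can be reproduced with a little index arithmetic. This is a minimal sketch assuming blocks are dealt out to threads round-robin, as the table suggests; `toy_affinity_thread` and `toy_phase` are hypothetical helper names, not GUPC runtime functions.

```c
#include <assert.h>

/* Thread with affinity to element INDEX of an array declared as
   "shared [blksize] T A[...]" when the program runs with THREADS
   threads, assuming round-robin distribution of blocks.  */
static unsigned
toy_affinity_thread (unsigned index, unsigned blksize, unsigned threads)
{
  assert (blksize > 0 && threads > 0);
  return (index / blksize) % threads;
}

/* Offset of element INDEX within its block, i.e. the "phase".  */
static unsigned
toy_phase (unsigned index, unsigned blksize)
{
  assert (blksize > 0);
  return index % blksize;
}
```

With a block size of 5 and two threads, element 7 falls in A[5..9] and so maps to thread 1 with phase 2, while element 12 falls in A[10..14] and maps to thread 0.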
> 
> GUPC implements pointer-to-shared objects using a "struct" internal
> representation.  Until recently, GUPC also supported a "packed"
> representation, which is more space efficient, but limits the range of
> various fields in the UPC pointer-to-shared representation.  We have
> decided to support only the "struct" representation so that the
> compiler uses a single ABI that supports the full range of addresses,
> threads, and blocking factors.
> 
> GCC's internal tree representation is extended to record the UPC
> "shared", "strict", "relaxed" qualifiers, and the layout qualifier.
> 
> --- gcc/tree-core.h (.../trunk) (revision 228959)
> +++ gcc/tree-core.h (.../branches/gupc) (revision 229159)
> @@ -470,7 +470,11 @@ enum cv_qualifier {
>TYPE_QUAL_CONST= 0x1,
>TYPE_QUAL_VOLATILE = 0x2,
>TYPE_QUAL_RESTRICT = 0x4,
> -  TYPE_QUAL_ATOMIC   = 0x8
> +  TYPE_QUAL_ATOMIC   = 0x8,
> +  /* UPC qualifiers */
> +  TYPE_QUAL_SHARED   = 0x10,
> +  TYPE_QUAL_RELAXED  = 0x20,
> +  TYPE_QUAL_STRICT   = 0x40
>  };
> [...]
> @@ -857,9 +875,14 @@ struct GTY(()) tree_base {
>unsigned user_align : 1;
>unsigned nameless_flag : 1;
>unsigned atomic_flag : 1;
> -  unsigned spare0 : 3;
> -
> -  unsigned spare1 : 8;
> +  unsigned shared_flag : 1;
> +  unsigned strict_flag : 1;
> +  unsigned relaxed_flag : 1;
> +
> +  unsigned threads_factor_flag : 1;
> +  unsigned block_factor_0 : 1;
> +  unsigned block_factor_x : 1;
> +  unsigned spare1 : 5;
> 
> A given type is a UPC shared type if its 'shared_flag' is set.
> However, for array types, the shared_flag of the *element type*
> must be checked.  Thus,
> 
> /* Return TRUE if TYPE is a shared type.  For arrays,
>the element type must be queried, because array types
>are never qualified.  */
> #define SHARED_TYPE_P(TYPE) \
>   ((TYPE) && TYPE_P (TYPE) \
>&& TYPE_SHARED ((TREE_CODE

[patch] Fix clang error with std::experimental::filesystem::path

2015-12-01 Thread Jonathan Wakely


I got a report that Clang fails to compile the filesystem lib, with
the following error:

/home/jwakely/gcc/latest/include/c++/6.0.0/experimental/bits/fs_path.h:563:18: fatal 
error: explicit specialization of 
'std::experimental::filesystem::v1::path::__is_encoded_char' after 
instantiation
   struct path::__is_encoded_char : std::true_type
^~~
/home/jwakely/gcc/latest/include/c++/6.0.0/experimental/bits/fs_path.h:104:9: 
note: implicit instantiation first required here
 : decltype(__is_path_src(std::declval<_Source>(), 0))
   ^

The problem is that the path::_Cmpt constructors do overload
resolution on the path constructors, which requires the explicit
specializations of __is_encoded_char. Solved by moving the definition
of _Cmpt after the specializations.
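
For illustration only, the ordering constraint can be reproduced outside the library. This is a minimal sketch with made-up names (widget, is_encoded_char, part), not the libstdc++ code: the nested type whose constructor does overload resolution on the enclosing class's constrained constructor is defined only after the explicit specialization that the constraint depends on.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Hypothetical stand-in for filesystem::path; none of these names are
// taken from libstdc++.
struct widget
{
  // Primary template: by default a character type is not "encoded".
  template<typename CharT>
    struct is_encoded_char : std::false_type { };

  widget() = default;

  // Constrained constructor.  Checking the constraint during overload
  // resolution implicitly instantiates is_encoded_char<CharT>.
  template<typename CharT,
           typename = typename std::enable_if<
             is_encoded_char<CharT>::value>::type>
    widget(const CharT*) { }

  struct part;  // plays the role of path::_Cmpt; defined further down
};

// The explicit specialization comes first ...
template<>
  struct widget::is_encoded_char<char> : std::true_type
  { using value_type = char; };

// ... and only then the nested type whose constructor needs it.
struct widget::part : widget
{
  part(const char* s, std::size_t position)
  : widget(s), pos(position) { }

  part() : pos(static_cast<std::size_t>(-1)) { }

  std::size_t pos;
};

void example_specialization_order()
{
  widget::part p("abc", 3);
  assert(p.pos == 3);
  assert(widget::is_encoded_char<char>::value);
  assert(!widget::is_encoded_char<int>::value);
}
```

Swapping the last two definitions in this sketch should give the same "explicit specialization after instantiation" situation that Clang diagnosed.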

I'm still reducing it to report a g++ bug for the missed diagnostic.

Tested powerpc64le-linux, committed to trunk. This is unfortunately
too late for 5.3, so I'll fix it on the branch when it reopens.


commit 0bd97117b71c759cbccfe9d19ea09b96c3bce472
Author: Jonathan Wakely 
Date:   Tue Dec 1 10:30:31 2015 +

Define path::_Cmpt after specializing path::__is_encoded_char

	* include/experimental/bits/fs_path.h (path::_Cmpt): Move definition
	after path::__is_encoded_char explicit specializations.

diff --git a/libstdc++-v3/include/experimental/bits/fs_path.h b/libstdc++-v3/include/experimental/bits/fs_path.h
index 40462a6..98820ad 100644
--- a/libstdc++-v3/include/experimental/bits/fs_path.h
+++ b/libstdc++-v3/include/experimental/bits/fs_path.h
@@ -549,16 +549,6 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11
 std::string _M_what = _M_gen_what();
   };
 
-  struct path::_Cmpt : path
-  {
-_Cmpt(string_type __s, _Type __t, size_t __pos)
-  : path(std::move(__s), __t), _M_pos(__pos) { }
-
-_Cmpt() : _M_pos(-1) { }
-
-size_t _M_pos;
-  };
-
   template<>
 struct path::__is_encoded_char<char> : std::true_type
 { using value_type = char; };
@@ -575,6 +565,16 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11
 struct path::__is_encoded_char<char32_t> : std::true_type
 { using value_type = char32_t; };
 
+  struct path::_Cmpt : path
+  {
+_Cmpt(string_type __s, _Type __t, size_t __pos)
+  : path(std::move(__s), __t), _M_pos(__pos) { }
+
+_Cmpt() : _M_pos(-1) { }
+
+size_t _M_pos;
+  };
+
   // specialize _Cvt for degenerate 'noconv' case
   template<>
 struct path::_Cvt

Re: [UPC 07/22] lowering, pointer-to-shared ops

2015-12-01 Thread Richard Biener

On Mon, 30 Nov 2015, Gary Funck wrote:

> 
> Background
> --
> 
> An overview email, describing the UPC-related changes is here:
>   https://gcc.gnu.org/ml/gcc-patches/2015-12/msg5.html
> 
> The GUPC branch is described here:
>   http://gcc.gnu.org/projects/gupc.html
> 
> The UPC-related source code differences are summarized here:
>   http://gccupc.org/gupc-changes
> 
> All languages (c, c++, fortran, go, lto, objc, obj-c++) have been
> bootstrapped; no test suite regressions were introduced,
> relative to the GCC trunk.
> 
> If you are on the cc-list, your name was chosen either
> because you are listed as a maintainer for the area that
> applies to the patches described in this email, or you
> were a frequent contributor of patches made to files listed
> in this email.
> 
> In the change log entries included in each patch, the directory
> containing the affected files is listed, followed by the files.
> When the patches are applied, the change log entries will be
> distributed to the appropriate ChangeLog file.
> 
> Overview
> 
> 
> The UPC lowering pass traverses the current function tree
> and rewrites UPC related statements and operations into GENERIC.
> The resulting GENERIC tree code will retain UPC pointers-to-shared (PTS)
> types, but all operations such as 'get' and 'put' which indirect
> through a pointer-to-shared have been lowered to use the internal
> representation type.  Most of these operations on UPC pointers-to-shared
> are implemented in c/c-upc-pts-ops.c.
> 
> The UPC lowering pass is implemented by upc_genericize() in
> c/c-upc-low.c.  upc_genericize() is called from finish_function()
> in c/c-decl.c. It is called just prior to calling c_genericize(),
> if -fupc has been asserted.
> 
> The file c/c-upc-rts-names.h defines the names of the UPC runtime
> entry points and variables that implement the runtime ABI.
> To date, there has been no need to implement target dependent names,
> perhaps partly because UPC is supported primarily on POSIX-compliant targets.
> 
> UPC requires some special logic for handling file scoped initializations.
> This is due to the fact that UPC shared addresses are not known
> until runtime and therefore cannot be statically initialized
> in the usual way.  For example, 'addr_x' below must be initialized
> at runtime.
> 
>   shared int x;
>   shared int *addr_x = &x;
> 
> The routine, upc_check_decl_init(), checks an initialization
> statement to determine if it needs special handling.
> It is called from store_init_value().  If an initialization
> refers to UPC-related constructs that require initialization
> at runtime, then upc_decl_init() is called to save the
> initialization statement on a list.  This list is
> processed by upc_write_global_declarations(), which
> is called via a UPC-specific language hook from
> c_common_parse_file(), just after calling c_parse_file().
> 
> 
> 2015-11-30  Gary Funck  
> 
>   gcc/c-family/
>   * c-upc-pts.h: New.  Define the sizes and types of fields
>   in the UPC pointer-to-shared representation.
>   gcc/c/
>   * c-upc-low.c: New.  Lower UPC constructs to GENERIC.
>   * c-upc-low.h: New.  Prototypes for c-upc-low.c.
>   * c-upc-pts-ops.c: New. Implement UPC pointer-to-shared-operations.
>   * c-upc-pts-ops.h: New. Prototypes for c-upc-pts-ops.c.
>   * c-upc-rts-names.h: New.  Names of some functions in the UPC runtime.
> 
> Index: gcc/c-family/c-upc-pts.h
> ===
> --- gcc/c-family/c-upc-pts.h  (.../trunk) (revision 0)
> +++ gcc/c-family/c-upc-pts.h  (.../branches/gupc) (revision 231080)
> @@ -0,0 +1,40 @@
> +/* Define UPC pointer-to-shared representation characteristics.
> +   Copyright (C) 2008-2015 Free Software Foundation, Inc.
> +   Contributed by Gary Funck 
> + and Nenad Vukicevic .
> +
> +This file is part of GCC.
> +
> +GCC is free software; you can redistribute it and/or modify
> +it under the terms of the GNU General Public License as published by
> +the Free Software Foundation; either version 3, or (at your option)
> +any later version.
> +
> +GCC is distributed in the hope that it will be useful,
> +but WITHOUT ANY WARRANTY; without even the implied warranty of
> +MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +GNU General Public License for more details.
> +
> +You should have received a copy of the GNU General Public License
> +along with GCC; see the file COPYING3.  If not see
> +.  */
> +
> +#ifndef GCC_C_FAMILY_UPC_PTS_H
> +#define GCC_C_FAMILY_UPC_PTS_H 1
> +
> +#define UPC_PTS_SIZE        (LONG_TYPE_SIZE + POINTER_SIZE)
> +#define UPC_PTS_PHASE_SIZE  (LONG_TYPE_SIZE / 2)
> +#define UPC_PTS_THREAD_SIZE (LONG_TYPE_SIZE / 2)
> +#define UPC_PTS_VADDR_SIZE  POINTER_SIZE
> +#define UPC_PTS_PHASE_TYPE  ((LONG_TYPE_SIZE == 64) \
> + ? "uint32_t" : "uint16_t")
> +#define UPC_PTS_THREAD
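
As a rough picture of what the size macros quoted above describe, here is a sketch of a "struct" pointer-to-shared layout. It assumes a target where LONG_TYPE_SIZE is 64, so that phase and thread are 32 bits each; the type name pts_sketch, the helper functions and the field order are illustrative assumptions, not taken from the patch.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: LONG_TYPE_SIZE == 64, so the phase and thread fields are
// 32 bits wide ("uint32_t" in UPC_PTS_PHASE_TYPE).  The field order
// here is a guess made for illustration only.
struct pts_sketch
{
  void*         vaddr;   // UPC_PTS_VADDR_SIZE  == POINTER_SIZE
  std::uint32_t thread;  // UPC_PTS_THREAD_SIZE == LONG_TYPE_SIZE / 2
  std::uint32_t phase;   // UPC_PTS_PHASE_SIZE  == LONG_TYPE_SIZE / 2
};

// UPC_PTS_SIZE == LONG_TYPE_SIZE + POINTER_SIZE, all in bits.
constexpr unsigned pts_size_bits(unsigned long_bits, unsigned pointer_bits)
{
  return long_bits + pointer_bits;
}

// Phase and thread each take half of a long.
constexpr unsigned pts_field_bits(unsigned long_bits)
{
  return long_bits / 2;
}

void example_pts_layout()
{
  assert(pts_size_bits(64, 64) == 128);
  assert(pts_field_bits(64) == 32);
  // The sketch struct occupies exactly the computed number of bits.
  assert(sizeof(pts_sketch) * 8
         == pts_size_bits(64, sizeof(void*) * 8));
}
```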

Re: [UPC 14/22] constant folding changes

2015-12-01 Thread Richard Biener

On Mon, 30 Nov 2015, Gary Funck wrote:

> 
> Overview
> 
> 
> UPC pointers-to-shared (aka shared pointers) are not interchangeable
> with integers as they are in regular "C".  Therefore, additions
> and subtraction operations which involve UPC shared pointers
> should not be further simplified.

This looks worrisome.  I suppose this applies to simplifications
done before lowering only?  If so, I wonder whether not using regular
plus/minus/convert/nop for operations on UPC shared pointers would be
better than introducing this kind of check.

Richard.

> 2015-11-30  Gary Funck  
> 
>   gcc/
>   * fold-const.c (fold_unary_loc): Do not perform this simplification
>   if either of the types are UPC pointer-to-shared types.
>   (fold_binary_loc): Disable optimizations involving UPC
>   pointers-to-shared because integers are not interoperable
>   with UPC pointers-to-shared.
>   * match.pd: Do not simplify POINTER_PLUS operations which
>   involve UPC pointers-to-shared.  Do not simplify integral
>   conversions involving UPC pointers-to-shared.  For a chain
>   of two conversions, do not simplify conversions involving
>   UPC pointers-to-shared unless they meet specific criteria.
> 
> Index: gcc/fold-const.c
> ===
> --- gcc/fold-const.c  (.../trunk) (revision 231059)
> +++ gcc/fold-const.c  (.../branches/gupc) (revision 231080)
> @@ -7805,10 +7805,16 @@ fold_unary_loc (location_t loc, enum tre
>  
>/* Convert (T1)(X p+ Y) into ((T1)X p+ Y), for pointer type, when the 
> new
>cast (T1)X will fold away.  We assume that this happens when X itself
> -  is a cast.  */
> +  is a cast.
> +  
> +  Do not perform this simplification if either of the types 
> +  are UPC pointer-to-shared types.  */
>if (POINTER_TYPE_P (type)
> && TREE_CODE (arg0) == POINTER_PLUS_EXPR
> -   && CONVERT_EXPR_P (TREE_OPERAND (arg0, 0)))
> +   && CONVERT_EXPR_P (TREE_OPERAND (arg0, 0))
> +   && !SHARED_TYPE_P (TREE_TYPE (type))
> +   && !SHARED_TYPE_P (TREE_TYPE (
> +TREE_TYPE (TREE_OPERAND (arg0, 0)))))
>   {
> tree arg00 = TREE_OPERAND (arg0, 0);
> tree arg01 = TREE_OPERAND (arg0, 1);
> @@ -9271,6 +9277,14 @@ fold_binary_loc (location_t loc,
>return NULL_TREE;
>  
>  case PLUS_EXPR:
> +  /* Disable further optimizations involving UPC shared pointers,
> + because integers are not interoperable with shared pointers.  */
> +  if ((TREE_TYPE (arg0) && POINTER_TYPE_P (TREE_TYPE (arg0))
> +  && SHARED_TYPE_P (TREE_TYPE (TREE_TYPE (arg0))))
> + || (TREE_TYPE (arg1) && POINTER_TYPE_P (TREE_TYPE (arg1))
> + && SHARED_TYPE_P (TREE_TYPE (TREE_TYPE (arg1)))))
> +return NULL_TREE;
> +
>if (INTEGRAL_TYPE_P (type) || VECTOR_INTEGER_TYPE_P (type))
>   {
> /* X + (X / CST) * -CST is X % CST.  */
> @@ -9679,6 +9693,16 @@ fold_binary_loc (location_t loc,
>return NULL_TREE;
>  
>  case MINUS_EXPR:
> +
> +  /* Disable further optimizations involving UPC shared pointers,
> + because integers are not interoperable with shared pointers.
> +  (The test below also detects pointer difference between
> +  shared pointers, which cannot be folded.  */
> +
> +  if (TREE_TYPE (arg0) && POINTER_TYPE_P (TREE_TYPE (arg0))
> +  && SHARED_TYPE_P (TREE_TYPE (TREE_TYPE (arg0))))
> +return NULL_TREE;
> +
>/* (-A) - B -> (-B) - A  where B is easily negated and we can swap.  */
>if (TREE_CODE (arg0) == NEGATE_EXPR
> && negate_expr_p (op1)
> Index: gcc/match.pd
> ===
> --- gcc/match.pd  (.../trunk) (revision 231059)
> +++ gcc/match.pd  (.../branches/gupc) (revision 231080)
> @@ -931,10 +931,13 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>

Re: [UPC 16/22] gimple/gimplify changes

2015-12-01 Thread Richard Biener

On Mon, 30 Nov 2015, Gary Funck wrote:

> 
> Overview
> 
> 
> In gimple-expr.c, logic is added to useless_type_conversion_p() to
> handle conversions involving UPC pointers-to-shared.
> lang_hooks.types_compatible_p() is called to check conversions
> between UPC pointers-to-shared.  This will in turn call c_types_compatible_p()
> which will call upc_types_compatible_p() if -fupc is asserted.
> 
> The hook is needed here because the gimple-related routines are
> defined at the top-level of the GCC tree and can be linked with
> other front-ends.

Like I said elsewhere this is purely middle-end code and thus
may not call langhooks.

> In gimplify.c, flag_instrument_functions_exclude_p() is exported
> as an external function rather than being defined as a static function.
> It is called from upc_genericize_function() defined in c/c-upc-low.c,
> when -fupc-instrument-functions is asserted.
> 
> 2015-11-30  Gary Funck  
> 
>   gcc/
>   * gimple-expr.c: #include "langhooks.h".
>   (useless_type_conversion_p): Retain conversions from UPC
>   pointer-to-shared and a regular C pointer.
>   Retain conversions between incompatible UPC pointers-to-shared.
>   Call lang_hooks.types_compatible_p() to check type
>   compatibility between UPC pointers-to-shared.
>   * gimplify.c (flag_instrument_functions_exclude_p): Make it into
>   an external function.
>   * gimplify.h (flag_instrument_functions_exclude_p): New prototype.
> 
> Index: gcc/gimple-expr.c
> ===
> --- gcc/gimple-expr.c (.../trunk) (revision 231059)
> +++ gcc/gimple-expr.c (.../branches/gupc) (revision 231080)
> @@ -29,6 +29,7 @@ along with GCC; see the file COPYING3.
>  #include "gimple-ssa.h"
>  #include "fold-const.h"
>  #include "tree-eh.h"
> +#include "langhooks.h"
>  #include "gimplify.h"
>  #include "stor-layout.h"
>  #include "demangle.h"
> @@ -67,6 +68,19 @@ useless_type_conversion_p (tree outer_ty
>if (POINTER_TYPE_P (inner_type)
>&& POINTER_TYPE_P (outer_type))
>  {
> +  int i_shared = SHARED_TYPE_P (TREE_TYPE (inner_type));
> +  int o_shared = SHARED_TYPE_P (TREE_TYPE (outer_type));
> +
> +  /* Retain conversions from a UPC shared pointer to
> + a regular C pointer.  */
> +  if (!o_shared && i_shared)
> +return false;
> +
> +  /* Retain conversions between incompatible UPC shared pointers.  */
> +  if (o_shared && i_shared
> +   && !lang_hooks.types_compatible_p (inner_type, outer_type))
> +return false;
> +
>/* Do not lose casts between pointers to different address spaces.  */
>if (TYPE_ADDR_SPACE (TREE_TYPE (outer_type))
> != TYPE_ADDR_SPACE (TREE_TYPE (inner_type)))

As the addr-space check is right below the place you change - why
are incompatible UPC shared pointers not using different address-spaces
then?

That is, why are you introducing a different kind of "address space"
representation?

Richard.

> Index: gcc/gimplify.c
> ===
> --- gcc/gimplify.c(.../trunk) (revision 231059)
> +++ gcc/gimplify.c(.../branches/gupc) (revision 231080)
> @@ -11269,7 +11269,7 @@ typedef char *char_p; /* For DEF_VEC_P.
>  
>  /* Return whether we should exclude FNDECL from instrumentation.  */
>  
> -static bool
> +bool
>  flag_instrument_functions_exclude_p (tree fndecl)
>  {
>vec *v;
> Index: gcc/gimplify.h
> ===
> --- gcc/gimplify.h(.../trunk) (revision 231059)
> +++ gcc/gimplify.h(.../branches/gupc) (revision 231080)
> @@ -77,6 +77,7 @@ extern enum gimplify_status gimplify_exp
>  extern void gimplify_type_sizes (tree, gimple_seq *);
>  extern void gimplify_one_sizepos (tree *, gimple_seq *);
>  extern gbind *gimplify_body (tree, bool);
> +extern bool flag_instrument_functions_exclude_p (tree);
>  extern

Re: [PR68432 00/26] Handle size/speed choices for internal functions

2015-12-01 Thread Bernd Schmidt


On 11/26/2015 05:22 PM, Richard Sandiford wrote:
> Bernd Schmidt  writes:
>
>> I wish we'd taken some more time to think through the consequences of
>> the original internal_fn patchset.
>
> I don't think this PR shows that the approach was wrong.

I think it does. Internal functions make a new assumption, that
expanders don't FAIL - but as we've now seen, they do. The optimize_size
thing is reasonably easy to grep for and it looks like only i386 is
affected, but have you looked at every expander in every port that could
be used by an internal function to ensure it does not FAIL for a
different reason?

Is there a simple way to disable the entire internal_fn machinery and
get us back to where we were in gcc-5, without taking out all the code
immediately? That would give us time until next stage 1 to think through
the issues.


Bernd

Re: [PATCH, ARM] PR target/68617 Fix armv6 unaligned_access with attribute thumb

2015-12-01 Thread Kyrill Tkachov


Hi Christian,

On 01/12/15 09:18, Christian Bruel wrote:
> Hi,
>
> This patch fixes the PR by making the unaligned_access flag sensitive to the
> attribute target, since some armv6 might use unaligned loads depending on the
> TARGET_32BIT flag.
>
> OK for stage3 ?
>
> Index: gcc/testsuite/gcc.target/arm/attr-unaligned-load-ice.c
> ===
> --- gcc/testsuite/gcc.target/arm/attr-unaligned-load-ice.c  (revision 0)
> +++ gcc/testsuite/gcc.target/arm/attr-unaligned-load-ice.c  (working copy)
> @@ -0,0 +1,19 @@
> +/* PR target/68617
> +   Verify that unaligned_access is correctly with attribute target.  */
> +/* { dg-do compile } */
> +/* { dg-skip-if "avoid conflicting multilib options" { *-*-* } { "-march=*" } { "-march=armv6" } } */
> +/* { dg-options "-Os -mfloat-abi=softfp -mtp=soft" } */
> +/* { dg-add-options arm_arch_v6 } */


Do you need the -mtp=soft ?

This is ok for trunk.
Thanks for picking this up.

Kyrill

Re: [PATCH, libgomp] Rewire OpenACC async

2015-12-01 Thread Julian Brown

On Tue, 24 Nov 2015 18:27:24 +0800
Chung-Lin Tang  wrote:

> Hi, this patch reworks some of the way that asynchronous copyouts are
> implemented for OpenACC in libgomp.
> 
> Before this patch, we had a somewhat confusing way of implementing
> this by having two refcounts for each mapping: refcount and
> async_refcount, which I never got working again after the last wave
> of async regressions showed up.
> 
> So this patch implements what I believe to be a simplification:
> async_refcount is removed, and instead of trying to queue the async
> copyouts during unmapping we actually do that during the plugin event
> handling. This requires an addition of the async stream integer as an
> argument to the register_async_cleanup plugin hook, but overall I
> think this should be more elegant than before.

This looks OK to me I think (I've only looked fairly briefly). I vaguely
remember trying something along these lines in an earlier iteration of
the async support -- maybe hitting problems with locking (I see you
have code to mitigate problems with that, and locking generally has
probably evolved a bit since I last looked at the code in detail
anyway).

Can event_gc ever be called when the *device* lock is held?

I'm slightly concerned that pushing async unmapping into event_gc means
that program-level semantics are deferred to the backend, which is
arguably the wrong place. But then I don't understand what went wrong
with the dual-refcount implementation, so maybe it's unavoidable for
some reason.

HTH,

Julian

Re: [PR68432 00/26] Handle size/speed choices for internal functions

2015-12-01 Thread Richard Biener

On Tue, Dec 1, 2015 at 12:54 PM, Bernd Schmidt  wrote:
> On 11/26/2015 05:22 PM, Richard Sandiford wrote:
>>
>> Bernd Schmidt  writes:
>>
>>> I wish we'd taken some more time to think through the consequences of
>>> the original internal_fn patchset.
>>
>>
>> I don't think this PR shows that the approach was wrong.
>
>
> I think it does. Internal functions make a new assumption, that expanders
> don't FAIL - but as we've now seen, they do. The optimize_size thing is
> reasonably easy to grep for and it looks like only i386 is affected, but
> have you looked at every expander in every port that could be used by an
> internal function to ensure it does not FAIL for a different reason?

Of course we are not sure.  But I think the approach in the series is the only
reasonable one.  I view the internal_fn support for optabs as a great way to
provide something like instruction selection to GIMPLE, with the goal of
simplifying RTL expansion itself (which for quite some time has not been able
to rely on seeing "large" expressions, and with TER is limited to seeing only
some of them, under the constraints that TER and out-of-SSA operate under).

> Is there a simple way to disable the entire internal_fn machinery and get us
> back to where we were in gcc-5, without taking out all the code immediately?
> That would give us time until next stage 1 to think through the issues.

Do you have even a guess as to how to approach the issue differently?

Yes, I think we can rip out uses of the new machinery quite easily but I don't
think we're at the point of declaring failure yet.

Richard.

>
> Bernd

Re: [PR68432 00/26] Handle size/speed choices for internal functions

2015-12-01 Thread Bernd Schmidt


On 12/01/2015 01:16 PM, Richard Biener wrote:
> On Tue, Dec 1, 2015 at 12:54 PM, Bernd Schmidt  wrote:
>> On 11/26/2015 05:22 PM, Richard Sandiford wrote:
>>> Bernd Schmidt  writes:
>>>
>>>> I wish we'd taken some more time to think through the consequences of
>>>> the original internal_fn patchset.
>>>
>>> I don't think this PR shows that the approach was wrong.
>>
>> I think it does. Internal functions make a new assumption, that expanders
>> don't FAIL - but as we've now seen, they do. The optimize_size thing is
>> reasonably easy to grep for and it looks like only i386 is affected, but
>> have you looked at every expander in every port that could be used by an
>> internal function to ensure it does not FAIL for a different reason?
>
> Of course we are not sure.

Hence, the correct approach for gcc-6 is to recognize that these patches
were not ready, and disable the new functionality IMO. I'm not
suggesting reverting the patches just yet, maybe we can solve these
issues for gcc-7.

> Do you have even a guess as to how to approach the issue differently?

Not off-hand. That's not a question we're facing for stage3 though, and
I'd rather take a cautious approach than go deeper into the hole.


Bernd

Re: [patch] RFC asan support for i?86/x86_64-freebsd

2015-12-01 Thread Uros Bizjak

Hello!

> 2015-11-29  Andreas Tobler  
>
> * config/i386/i386.h: Define two new macros:
> SUBTARGET_SHADOW_OFFSET_64 and SUBTARGET_SHADOW_OFFSET_32.
> * config/i386/i386.c (ix86_asan_shadow_offset): Use these macros.
> * config/i386/darwin.h: Override the SUBTARGET_SHADOW_OFFSET_64
> macro.
> * config/i386/freebsd.h: Override the SUBTARGET_SHADOW_OFFSET_64
> and the SUBTARGET_SHADOW_OFFSET_32 macro.
> * config/freebsd.h (LIBASAN_EARLY_SPEC): Define.
> (LIBTSAN_EARLY_SPEC): Likewise.
> (LIBLSAN_EARLY_SPEC): Likewise.

IMO, there is no compelling reason for _64 and _32 subtargets split,
especially since it depends on TARGET_LP64, not on the usual
TARGET_64BIT. Due to this, I'd rather introduce only
TARGET_SHADOW_OFFSET, like:

#define TARGET_SHADOW_OFFSET \
  (TARGET_LP64 ? HOST_WIDE_INT_C (0x7fff8000) : HOST_WIDE_INT_1 << 29)

(and similar for other targets).
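
To make concrete what the offset is used for, here is a small sketch. It assumes the usual AddressSanitizer mapping of 8 application bytes to 1 shadow byte, and it reuses the two constants from the macro above; the function names and the bool parameter standing in for TARGET_LP64 are made up for the example.

```cpp
#include <cassert>
#include <cstdint>

// Same two constants as in the proposed TARGET_SHADOW_OFFSET macro;
// "lp64" stands in for TARGET_LP64 (an assumption of this sketch).
constexpr std::uint64_t shadow_offset(bool lp64)
{
  return lp64 ? UINT64_C(0x7fff8000) : (UINT64_C(1) << 29);
}

// Usual ASan mapping: one shadow byte describes eight application
// bytes, so shadow = (addr >> 3) + offset.
constexpr std::uint64_t shadow_address(std::uint64_t addr, bool lp64)
{
  return (addr >> 3) + shadow_offset(lp64);
}

void example_shadow_offset()
{
  assert(shadow_address(0, true) == UINT64_C(0x7fff8000));
  assert(shadow_address(0x1000, false) == UINT64_C(0x20000200));
}
```

A target header overriding the macro would only change the value returned by shadow_offset in this picture; the mapping itself stays the same.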

Uros.

Re: [PATCH, PR46032] Handle BUILT_IN_GOMP_PARALLEL in ipa-pta

2015-12-01 Thread Christophe Lyon

On 30 November 2015 at 18:55, Tom de Vries  wrote:
> On 30/11/15 17:48, Jakub Jelinek wrote:
>>
>> On Mon, Nov 30, 2015 at 05:36:25PM +0100, Tom de Vries wrote:
>>>
>>> +int
>>> +main (void)
>>> +{
>>> +  unsigned results[nEvents];
>>> +  unsigned pData[nEvents];
>>> +  unsigned coeff = 2;
>>> +
>>> +  init (&results[0], &pData[0]);
>>> +
>>> +#pragma omp parallel for
>>> +  for (int idx = 0; idx < (int)nEvents; idx++)
>>> +results[idx] = coeff * pData[idx];
>>
>>
>> Could you please add another testcase, where you have say pData
>> and some other pointer that init sets to alias with pData, and verify
>> that such loop (would need to be say normal loop inside #pragma omp single
>> or master) is not vectorized?
>
>
> I've:
> - added a simpler (not vectorizer-based) version of the testcase as
>   pr46032-2.c, and
> - copied pr46032-2.c to pr46032-3.c and modified it such that two
>   pointers are aliasing
>
> Committed to trunk.
>

Hi,

I've committed the attached patch as obvious: it adds
dg-require-effective-target fopenmp to these new tests
so that they are skipped e.g. on arm bare-metal targets
(using newlib).

Note that pr46032.c has some failures:
FAIL:  gcc.dg/pr46032.c scan-tree-dump-times vect "note: vectorized 1 loop" 1
on arm-none-linux-gnueabi, on arm-none-linux-gnueabihf with -mfpu=vfp*,
and on armeb-none-linux-gnueabihf

I haven't looked at the details yet; see
http://people.linaro.org/~christophe.lyon/cross-validation/gcc/trunk/231076/report-build-info.html
for more information.

Thanks,

Christophe.

2015-12-01  Christophe Lyon  

* gcc.dg/pr46032.c: Add dg-require-effective-target fopenmp.
* gcc.dg/pr46032-2.c: Likewise.
* gcc.dg/pr46032-3.c: Likewise.


> Thanks,
> - Tom
>
Index: gcc/testsuite/gcc.dg/pr46032-2.c
===
--- gcc/testsuite/gcc.dg/pr46032-2.c(revision 231108)
+++ gcc/testsuite/gcc.dg/pr46032-2.c(working copy)
@@ -1,4 +1,5 @@
 /* { dg-do compile } */
+/* { dg-require-effective-target fopenmp } */
 /* { dg-options "-O2 -fopenmp -std=c99 -fipa-pta -fdump-tree-optimized" } */
 
 #define N 2
Index: gcc/testsuite/gcc.dg/pr46032-3.c
===
--- gcc/testsuite/gcc.dg/pr46032-3.c(revision 231108)
+++ gcc/testsuite/gcc.dg/pr46032-3.c(working copy)
@@ -1,4 +1,5 @@
 /* { dg-do compile } */
+/* { dg-require-effective-target fopenmp } */
 /* { dg-options "-O2 -fopenmp -std=c99 -fipa-pta -fdump-tree-optimized" } */
 
 #define N 2
Index: gcc/testsuite/gcc.dg/pr46032.c
===
--- gcc/testsuite/gcc.dg/pr46032.c  (revision 231108)
+++ gcc/testsuite/gcc.dg/pr46032.c  (working copy)
@@ -1,4 +1,5 @@
 /* { dg-do compile } */
+/* { dg-require-effective-target fopenmp } */
 /* { dg-options "-O2 -fopenmp -ftree-vectorize -std=c99 -fipa-pta 
-fdump-tree-vect-all" } */
 
 extern void abort (void);

Re: [PATCH, ARM] PR target/68617 Fix armv6 unaligned_access with attribute thumb

2015-12-01 Thread Kyrill Tkachov



On 01/12/15 12:28, Christian Bruel wrote:
> On 12/01/2015 12:57 PM, Kyrill Tkachov wrote:
>> Hi Christian,
>>
>> On 01/12/15 09:18, Christian Bruel wrote:
>>> Hi,
>>>
>>> This patch fixes the PR by making the unaligned_access flag sensitive to the
>>> attribute target, since some armv6 might use unaligned loads depending on the
>>> TARGET_32BIT flag.
>>>
>>> OK for stage3 ?
>>>
>>> Index: gcc/testsuite/gcc.target/arm/attr-unaligned-load-ice.c
>>> ===
>>> --- gcc/testsuite/gcc.target/arm/attr-unaligned-load-ice.c (revision 0)
>>> +++ gcc/testsuite/gcc.target/arm/attr-unaligned-load-ice.c (working copy)
>>> @@ -0,0 +1,19 @@
>>> +/* PR target/68617
>>> +   Verify that unaligned_access is correctly with attribute target.  */
>>> +/* { dg-do compile } */
>>> +/* { dg-skip-if "avoid conflicting multilib options" { *-*-* } { "-march=*" } { "-march=armv6" } } */
>>> +/* { dg-options "-Os -mfloat-abi=softfp -mtp=soft" } */
>>> +/* { dg-add-options arm_arch_v6 } */
>>
>> Do you need the -mtp=soft ?
>
> I think so. When auto, the TP mode is "TP_SOFT" for arm and "TP_CP15" for
> thumb, which cannot be thumb1. To avoid this kind of discrepancy I prefer to force it.
>
> this is guarded by the lines @arm.c:2759:
>
>   if (TARGET_HARD_TP && TARGET_THUMB1_P (flags))
>     error ("can not use -mtp=cp15 with 16-bit Thumb");

Ok, thanks, I was just curious.
Kyrill

>> This is ok for trunk.
>> Thanks for picking this up.
>>
>> Kyrill

Re: [PATCH, PR46032] Handle BUILT_IN_GOMP_PARALLEL in ipa-pta

2015-12-01 Thread Jakub Jelinek

On Tue, Dec 01, 2015 at 01:27:32PM +0100, Christophe Lyon wrote:
> I've committed the attached patch as obvious: it adds
> dg-require-effective-target fopenmp to these new tests
> so that they are skipped e.g. on arm bare-metal targets
> (using newlib).
> 
> Note that pr46032.c has some failures:
> FAIL:  gcc.dg/pr46032.c scan-tree-dump-times vect "note: vectorized 1 loop" 1
> on arm-none-linux-gnueabi, on arm-none-linux-gnueabihf with -mfpu=vfp*,
> and on armeb-none-linux-gnueabihf
> 
> I haven't looked at the details yet; see
> http://people.linaro.org/~christophe.lyon/cross-validation/gcc/trunk/231076/report-build-info.html
> for more information.

Supposedly pr46032-{2,3}.c should go into testsuite/gcc.dg/gomp/ instead and
pr46032.c into testsuite/gcc.dg/vect/ (with the fopenmp effective target and
perhaps some other effective target conditions)?
> 2015-12-01  Christophe Lyon  
> 
> * gcc.dg/pr46032.c: Add dg-require-effective-target fopenmp.
> * gcc.dg/pr46032-2.c: Likewise.
> * gcc.dg/pr46032-3.c: Likewise.

Jakub

[PATCH] Derive interface buffers from max name length

2015-12-01 Thread Bernhard Reutner-Fischer

These three functions used a hardcoded buffer of 100 bytes, but it would be
better to base the size on GFC_MAX_SYMBOL_LEN, which denotes the maximum
length of a name in any of our supported standards (63 as of f2003 ff.).
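
The sizing arithmetic can be checked in isolation. This is only a sketch: max_symbol_len stands in for GFC_MAX_SYMBOL_LEN using the value 63 quoted above, and format_generic_interface is a hypothetical helper that uses snprintf purely so the fit can be tested; it is not a function from interface.c.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstring>
#include <string>

// Stand-in for GFC_MAX_SYMBOL_LEN (63, as stated above).
constexpr std::size_t max_symbol_len = 63;

// sizeof on a string literal counts the terminating NUL, so the buffer
// holds the fixed text, the longest possible name, and the NUL.
constexpr std::size_t generic_buf_size
  = max_symbol_len + sizeof("generic interface ''");

// Hypothetical helper: formats into BUF and reports whether the result
// fit without truncation.
bool format_generic_interface(char (&buf)[generic_buf_size],
                              const char* name)
{
  const int n = std::snprintf(buf, sizeof buf,
                              "generic interface '%s'", name);
  return n >= 0 && static_cast<std::size_t>(n) < sizeof buf;
}

void example_interface_buffer()
{
  char buf[generic_buf_size];

  // The longest legal name fills the buffer exactly.
  const std::string longest(max_symbol_len, 'x');
  assert(format_generic_interface(buf, longest.c_str()));
  assert(std::strlen(buf) == sizeof buf - 1);

  // One character more no longer fits.
  const std::string too_long(max_symbol_len + 1, 'x');
  assert(!format_generic_interface(buf, too_long.c_str()));
}
```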

Regstrapped without regressions, ok for trunk stage3 now / next stage1?

gcc/fortran/ChangeLog

2015-11-29  Bernhard Reutner-Fischer  

* interface.c (check_sym_interfaces, check_uop_interfaces,
gfc_check_interfaces): Base interface_name buffer off
GFC_MAX_SYMBOL_LEN.

Signed-off-by: Bernhard Reutner-Fischer 
---
 gcc/fortran/interface.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/gcc/fortran/interface.c b/gcc/fortran/interface.c
index dcf3eae..30cc522 100644
--- a/gcc/fortran/interface.c
+++ b/gcc/fortran/interface.c
@@ -1696,7 +1696,7 @@ check_interface1 (gfc_interface *p, gfc_interface *q0,
 static void
 check_sym_interfaces (gfc_symbol *sym)
 {
-  char interface_name[100];
+  char interface_name[GFC_MAX_SYMBOL_LEN + sizeof("generic interface ''")];
   gfc_interface *p;
 
   if (sym->ns != gfc_current_ns)
@@ -1733,7 +1733,7 @@ check_sym_interfaces (gfc_symbol *sym)
 static void
 check_uop_interfaces (gfc_user_op *uop)
 {
-  char interface_name[100];
+  char interface_name[GFC_MAX_SYMBOL_LEN + sizeof("operator interface ''")];
   gfc_user_op *uop2;
   gfc_namespace *ns;
 
@@ -1810,7 +1810,7 @@ void
 gfc_check_interfaces (gfc_namespace *ns)
 {
   gfc_namespace *old_ns, *ns2;
-  char interface_name[100];
+  char interface_name[GFC_MAX_SYMBOL_LEN + sizeof("intrinsic '' operator")];
   int i;
 
   old_ns = gfc_current_ns;
-- 
2.6.2

[PATCH] Commentary typo fix for gfc_typenode_for_spec()

2015-12-01 Thread Bernhard Reutner-Fischer

Regstrapped without regressions, ok for trunk stage3 now / next stage1?

gcc/fortran/ChangeLog

2015-11-29  Bernhard Reutner-Fischer  

* trans-types.c (gfc_typenode_for_spec): Commentary typo fix.

Signed-off-by: Bernhard Reutner-Fischer 
---
 gcc/fortran/trans-types.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/fortran/trans-types.c b/gcc/fortran/trans-types.c
index 6e2b3f1..0ac337e 100644
--- a/gcc/fortran/trans-types.c
+++ b/gcc/fortran/trans-types.c
@@ -1049,7 +1049,7 @@ gfc_get_character_type (int kind, gfc_charlen * cl)
   return gfc_get_character_type_len (kind, len);
 }
 
-/* Covert a basic type.  This will be an array for character types.  */
+/* Convert a basic type.  This will be an array for character types.  */
 
 tree
 gfc_typenode_for_spec (gfc_typespec * spec)
-- 
2.6.2

[PATCH] RFC: Use Levenshtein spelling suggestions in Fortran FE

2015-12-01 Thread Bernhard Reutner-Fischer

gcc/fortran/ChangeLog

2015-11-29  Bernhard Reutner-Fischer  

* gfortran.h (gfc_lookup_function_fuzzy): New declaration.
* resolve.c: Include spellcheck.h.
(lookup_function_fuzzy_find_candidates): New static function.
(lookup_uop_fuzzy_find_candidates): Likewise.
(lookup_uop_fuzzy): Likewise.
(resolve_operator) : Call lookup_uop_fuzzy.
(gfc_lookup_function_fuzzy): New definition.
(resolve_unknown_f): Call gfc_lookup_function_fuzzy.
* interface.c (check_interface0): Likewise.
* symbol.c: Include spellcheck.h.
(lookup_symbol_fuzzy_find_candidates): New static function.
(lookup_symbol_fuzzy): Likewise.
(gfc_set_default_type): Call lookup_symbol_fuzzy.
(lookup_component_fuzzy_find_candidates): New static function.
(lookup_component_fuzzy): Likewise.
(gfc_find_component): Call lookup_component_fuzzy.

gcc/testsuite/ChangeLog

2015-11-29  Bernhard Reutner-Fischer  

* gfortran.dg/spellcheck-operator.f90: New testcase.
* gfortran.dg/spellcheck-procedure.f90: New testcase.
* gfortran.dg/spellcheck-structure.f90: New testcase.

---

David Malcolm's nice Levenshtein distance spelling check helpers
are already used in parts of other frontends. This proposed patch adds
some spelling corrections to the Fortran frontend.

Suggestions are printed if we can find a suitable name, currently
using a very simple cutoff factor:
/* If more than half of the letters were misspelled, the suggestion is
   likely to be meaningless.  */
cutoff = MAX (strlen (typo), strlen (best_guess)) / 2;
which effectively skips names with fewer than 4 characters.
For e.g. structures, one could try to be much smarter in an attempt to
also provide suggestions for single-letter members/components.
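
A minimal sketch of the cutoff rule quoted above, written as standalone C
rather than against GCC's spellcheck.h. The names demo_levenshtein and
demo_suggest are hypothetical and exist only for illustration; the sketch
also assumes that a guess is dropped when its distance is strictly greater
than the cutoff.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Two-row dynamic-programming edit distance between S and T.  */
static size_t
demo_levenshtein (const char *s, const char *t)
{
  size_t ls = strlen (s), lt = strlen (t);
  size_t *prev = malloc ((lt + 1) * sizeof *prev);
  size_t *cur = malloc ((lt + 1) * sizeof *cur);
  assert (prev != NULL && cur != NULL);

  for (size_t j = 0; j <= lt; j++)
    prev[j] = j;
  for (size_t i = 1; i <= ls; i++)
    {
      cur[0] = i;
      for (size_t j = 1; j <= lt; j++)
        {
          size_t subst = prev[j - 1] + (s[i - 1] != t[j - 1]);
          size_t del = prev[j] + 1;
          size_t ins = cur[j - 1] + 1;
          size_t best = subst < del ? subst : del;
          cur[j] = best < ins ? best : ins;
        }
      /* The row just computed becomes the previous row.  */
      size_t *tmp = prev;
      prev = cur;
      cur = tmp;
    }
  size_t d = prev[lt];
  free (prev);
  free (cur);
  return d;
}

/* Return the closest candidate to TYPO, or NULL if even the best one
   needs more edits than half the length of the longer string.  */
static const char *
demo_suggest (const char *typo, const char *const *candidates, size_t n)
{
  const char *best = NULL;
  size_t best_dist = (size_t) -1;
  for (size_t i = 0; i < n; i++)
    {
      size_t d = demo_levenshtein (typo, candidates[i]);
      if (d < best_dist)
        {
          best_dist = d;
          best = candidates[i];
        }
    }
  if (best == NULL)
    return NULL;
  size_t lt = strlen (typo), lb = strlen (best);
  size_t cutoff = (lt > lb ? lt : lb) / 2;
  return best_dist > cutoff ? NULL : best;
}
```

With the candidates "print", "write" and "reshape", the typo "prnt" is one
edit away from "print" and the cutoff is 2, so a suggestion is made; for
"xy" against the single candidate "ab" the distance is 2 and the cutoff is
1, so nothing is suggested.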

This patch covers (at least partly):
- user-defined operators
- structures (types and their components)
- functions
- symbols (variables)

I do not immediately see how to handle subroutines. Ideas?

If anybody has a testcase where a spelling-suggestion would make sense
then please pass it along so we maybe can add support for GCC-7.

Signed-off-by: Bernhard Reutner-Fischer 
---
 gcc/fortran/gfortran.h |   1 +
 gcc/fortran/interface.c|  16 ++-
 gcc/fortran/resolve.c  | 135 -
 gcc/fortran/symbol.c   | 129 +++-
 gcc/testsuite/gfortran.dg/spellcheck-operator.f90  |  30 +
 gcc/testsuite/gfortran.dg/spellcheck-procedure.f90 |  41 +++
 gcc/testsuite/gfortran.dg/spellcheck-structure.f90 |  35 ++
 7 files changed, 376 insertions(+), 11 deletions(-)
 create mode 100644 gcc/testsuite/gfortran.dg/spellcheck-operator.f90
 create mode 100644 gcc/testsuite/gfortran.dg/spellcheck-procedure.f90
 create mode 100644 gcc/testsuite/gfortran.dg/spellcheck-structure.f90

diff --git a/gcc/fortran/gfortran.h b/gcc/fortran/gfortran.h
index 5487c93..cbfd592 100644
--- a/gcc/fortran/gfortran.h
+++ b/gcc/fortran/gfortran.h
@@ -3060,6 +3060,7 @@ bool gfc_type_is_extensible (gfc_symbol *);
 bool gfc_resolve_intrinsic (gfc_symbol *, locus *);
 bool gfc_explicit_interface_required (gfc_symbol *, char *, int);
 extern int gfc_do_concurrent_flag;
+const char* gfc_lookup_function_fuzzy (const char *, gfc_symtree *);
 
 
 /* array.c */
diff --git a/gcc/fortran/interface.c b/gcc/fortran/interface.c
index 30cc522..19f800f 100644
--- a/gcc/fortran/interface.c
+++ b/gcc/fortran/interface.c
@@ -1590,10 +1590,18 @@ check_interface0 (gfc_interface *p, const char *interface_name)
  if (p->sym->attr.external)
gfc_error ("Procedure %qs in %s at %L has no explicit interface",
   p->sym->name, interface_name, &p->sym->declared_at);
- else
-   gfc_error ("Procedure %qs in %s at %L is neither function nor "
-  "subroutine", p->sym->name, interface_name,
- &p->sym->declared_at);
+ else {
+   const char *guessed
+ = gfc_lookup_function_fuzzy (p->sym->name, p->sym->ns->sym_root);
+   if (guessed)
+ gfc_error ("Procedure %qs in %s at %L is neither function nor "
+"subroutine; did you mean %qs?", p->sym->name,
+   interface_name, &p->sym->declared_at, guessed);
+   else
+ gfc_error ("Procedure %qs in %s at %L is neither function nor "
+"subroutine", p->sym->name, interface_name,
+   &p->sym->declared_at);
+ }
  return 1;
}
 
diff --git a/gcc/fortran/resolve.c b/gcc/fortran/resolve.c
index 685e3f5..6e1f63c 100644
--- a/gcc/fortran/resolve.c
+++ b/gcc/fortran/resolve.c
@@ -29,6 +29,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "data.h"
 #include "target-memory.h" /* for gfc_simplify_transfer */
 #include "constructor

[PATCH] Use gfc_add_*_component defines where appropriate

2015-12-01 Thread Bernhard Reutner-Fischer

A couple of places used gfc_add_component_ref(expr, "string") instead of
the defines from gfortran.h.

Regstrapped without regressions, ok for trunk stage3 now / next stage1?

gcc/fortran/ChangeLog

2015-11-29  Bernhard Reutner-Fischer  

* class.c (gfc_add_class_array_ref): Call gfc_add_data_component()
instead of gfc_add_component_ref().
(gfc_get_len_component): Call gfc_add_len_component() instead of
gfc_add_component_ref().
* trans-intrinsic.c (gfc_conv_intrinsic_loc): Call
gfc_add_data_component() instead of gfc_add_component_ref().
* trans.c (gfc_add_finalizer_call): Call
gfc_add_final_component() and gfc_add_size_component() instead
of gfc_add_component_ref.

Signed-off-by: Bernhard Reutner-Fischer 
---
 gcc/fortran/class.c   | 4 ++--
 gcc/fortran/trans-intrinsic.c | 2 +-
 gcc/fortran/trans.c   | 4 ++--
 3 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/gcc/fortran/class.c b/gcc/fortran/class.c
index 8b49ae9..027cb89 100644
--- a/gcc/fortran/class.c
+++ b/gcc/fortran/class.c
@@ -258,7 +258,7 @@ gfc_add_class_array_ref (gfc_expr *e)
   int rank = CLASS_DATA (e)->as->rank;
   gfc_array_spec *as = CLASS_DATA (e)->as;
   gfc_ref *ref = NULL;
-  gfc_add_component_ref (e, "_data");
+  gfc_add_data_component (e);
   e->rank = rank;
   for (ref = e->ref; ref; ref = ref->next)
 if (!ref->next)
@@ -584,7 +584,7 @@ gfc_get_len_component (gfc_expr *e)
   ref = ref->next;
 }
   /* And replace if with a ref to the _len component.  */
-  gfc_add_component_ref (ptr, "_len");
+  gfc_add_len_component (ptr);
   return ptr;
 }
 
diff --git a/gcc/fortran/trans-intrinsic.c b/gcc/fortran/trans-intrinsic.c
index 1dabc26..2ef0709 100644
--- a/gcc/fortran/trans-intrinsic.c
+++ b/gcc/fortran/trans-intrinsic.c
@@ -7112,7 +7112,7 @@ gfc_conv_intrinsic_loc (gfc_se * se, gfc_expr * expr)
   if (arg_expr->rank == 0)
 {
   if (arg_expr->ts.type == BT_CLASS)
-   gfc_add_component_ref (arg_expr, "_data");
+   gfc_add_data_component (arg_expr);
   gfc_conv_expr_reference (se, arg_expr);
 }
   else
diff --git a/gcc/fortran/trans.c b/gcc/fortran/trans.c
index 2a91c35..14dad0f 100644
--- a/gcc/fortran/trans.c
+++ b/gcc/fortran/trans.c
@@ -1132,11 +1132,11 @@ gfc_add_finalizer_call (stmtblock_t *block, gfc_expr *expr2)
 
   final_expr = gfc_copy_expr (expr);
   gfc_add_vptr_component (final_expr);
-  gfc_add_component_ref (final_expr, "_final");
+  gfc_add_final_component (final_expr);
 
   elem_size = gfc_copy_expr (expr);
   gfc_add_vptr_component (elem_size);
-  gfc_add_component_ref (elem_size, "_size");
+  gfc_add_size_component (elem_size);
 }
 
   gcc_assert (final_expr->expr_type == EXPR_VARIABLE);
-- 
2.6.2

Re: [gomp4.5] Handle #pragma omp declare target link

2015-12-01 Thread Jakub Jelinek

On Tue, Dec 01, 2015 at 11:48:51AM +0300, Ilya Verbin wrote:
> 
> > On 01 Dec 2015, at 11:18, Jakub Jelinek  wrote:
> > 
> >> On Mon, Nov 30, 2015 at 11:55:20PM +0300, Ilya Verbin wrote:
> >> Ok, but it doesn't solve the issue with doing it for the executable, 
> >> because
> >> gomp_unmap_tgt (n->tgt) will want to run free_func on uninitialized device.
> > 
> > ?? You mean that the
> > devicep->unload_image_func (devicep->target_id, version, target_data);
> > call deinitializes the device or something else (I mean, if there is some
> > other tgt, then it had to be initialized)?
> 
> No, I mean that it can be deinitialized from plugin's __run_exit_handlers 
> (see my last mail with the patch).

Then the bug is that you have too many atexit-registered handlers that
perform some finalization; it would be better to have a single one that
performs everything in order.

Anyway, the other option is to set some flag in the atexit handlers
(liboffloadmic and/or the intelmic plugin) and ignore free_func calls while
the flag is set, or something like that.

Note that library destructors can also use OpenMP code in them, similarly
C++ dtors etc., so once you finalize a certain device, you should
arrange for newer events on the device to be ignored and new offloadings to
go to host fallback.
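
A minimal sketch of the flag idea, assuming a made-up per-device structure.
struct demo_device and the demo_* functions are hypothetical and are not
libgomp's gomp_device_descr or the plugin API; the "device" is simulated by
counters so the ordering can be observed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-device state, for illustration only.  */
struct demo_device
{
  bool finalized;     /* Set once by the exit handler.  */
  int device_frees;   /* Frees actually forwarded to the device.  */
  int device_runs;    /* Regions run on the device.  */
  int host_runs;      /* Regions that fell back to the host.  */
};

/* Body of the single exit handler: mark the device as gone.  */
static void
demo_finalize (struct demo_device *dev)
{
  assert (dev != NULL);
  dev->finalized = true;
}

/* free_func wrapper: ignore requests that arrive after finalization.  */
static void
demo_free (struct demo_device *dev, void *ptr)
{
  assert (dev != NULL);
  (void) ptr;
  if (dev->finalized)
    return;
  dev->device_frees++;
}

/* Offload entry: count a device run while the device is alive, otherwise
   count a host fallback.  FN is executed either way in this simulation.  */
static void
demo_target (struct demo_device *dev, void (*fn) (void *), void *data)
{
  assert (dev != NULL && fn != NULL);
  if (dev->finalized)
    dev->host_runs++;
  else
    dev->device_runs++;
  fn (data);
}

/* Example region: increments the int that DATA points to.  */
static void
demo_region (void *data)
{
  ++*(int *) data;
}
```

A late caller such as a library destructor would then still get its region
executed (on the host), and its late free request would be dropped instead
of reaching an uninitialized device.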

Jakub

Re: Solaris vtv port breaks x32 build

2015-12-01 Thread Bernd Schmidt


(add gcc-patches)

On 12/01/2015 08:39 AM, Matthias Klose wrote:

On 01.12.2015 03:58, Ulrich Drepper wrote:

On Mon, Nov 30, 2015 at 9:14 PM, Jeff Law  wrote:

Right, but isn't AC_COMPILE_IFELSE a compile test, not a run test?



The problem macro is _AC_COMPILER_EXEEXT_WORKS.  The message is at the
end.

This macro *should* work for cross-compiling but somehow it doesn't
work.  In libvtv/configure $cross_compiling is not defined
appropriately.  I'm configuring with the following which definitely
indicates that cross-compiling is selected.


that might be another instance of
https://gcc.gnu.org/ml/gcc-patches/2015-01/msg02064.html
Does something like this help?


Given that the change you referenced in the archive was installed, 
I think your suggestion for libvtv can be checked in as obvious if it helps.



Bernd

Re: PING: [PATCH] PR target/67215: -fno-plt needs improvements for x86

2015-12-01 Thread David Edelsohn

On Wed, 28 Oct 2015 at 18:14 PM, H.J. Lu wrote:
> On Wed, Oct 28, 2015 at 6:11 PM, Bernd Schmidt  wrote:
>> On 10/29/2015 02:10 AM, H.J. Lu wrote:
>>>
>>> On Wed, Oct 28, 2015 at 5:23 PM, Jeff Law  wrote:


 So I'll ask again, why did you commit a patch which you clearly knew did
 not
 meet the conditions Bernd set forth for approval?
>>>
>>>
>>> I believed that aarch64 backend didn't properly handle -fno-plt,
>>> which shouldn't block my patch.
>>
>>
>> This really isn't how the rules work, and you've been around long enough to
>> know it.
>>
>
> Sometimes it seems that it is the only way to get attention from the
> community.  BTW, my patch was submitted in August.

H.J.:

Because you have committed unapproved patches on several occasions,
contrary to specific requests from GCC reviewers, the GCC Steering
Committee has voted to suspend your committer privileges to the GCC
Repository for two weeks. Future unapproved commits will lead to
longer suspensions.

- The GCC Steering Committee

Re: [PR68432 00/26] Handle size/speed choices for internal functions

2015-12-01 Thread Richard Sandiford

Bernd Schmidt  writes:
> On 11/26/2015 05:22 PM, Richard Sandiford wrote:
>> Bernd Schmidt  writes:
>>
>>> I wish we'd taken some more time to think through the consequences of
>>> the original internal_fn patchset.
>>
>> I don't think this PR shows that the approach was wrong.
>
> I think it does. Internal functions make a new assumption, that 
> expanders don't FAIL - but as we've now seen, they do. The optimize_size 
> thing is reasonably easy to grep for and it looks like only i386 is 
> affected, but have you looked at every expander in every port that could 
> be used by an internal function to ensure it does not FAIL for a 
> different reason?

I've tried and I couldn't see any other problems.

I don't think what you say is an argument that the approach is wrong.
The C conditions for optabs have always been more restricted than
other define_expands and define_insns, since they cannot refer
to operands.  When caching of optabs was added, they also lost
the ability to test for size/speed choices.  There have also
always been optabs that are not allowed to FAIL (such as moves,
get_thread_pointer, widening multiplication, vec_cond, etc.).
This series is extending that list, but it's in the spirit
of restrictions that have always existed.  I don't see that
that's an argument that the approach is wrong.

The current approach to FAILs dated from a time when expand was
the first code-generation pass.  The FAILs aren't a good fit for
gimple optimisers that are trying to find out what the target
can do (and how cheaply it can do it).

> Is there a simple way to disable the entire internal_fn machinery and 
> get us back to where we were in gcc-5, without taking out all the code 
> immediately? That would give us time until next stage 1 to think through 
> the issues.

That seems like an overreaction.

I went for the 22-patch series because I think it's the best fix for
this problem.  It also makes the existing enabled, preferred_for_size
and preferred_for_speed handling more robust (as shown by the ARM bug
that the structural changes exposed at compile time).  But there are
other less-invasive ways of fixing it too, as described in the thread
about rsqrt.  I'm going to work on that today.

Thanks,
Richard

Re: S/390: Fix warnings in "*setmem_long..." patterns.

2015-12-01 Thread Dominik Vogt

On Tue, Dec 01, 2015 at 10:59:54AM +0100, Dominik Vogt wrote:
> @@ -3336,11 +3342,12 @@
> (set_attr "type" "vs")])
>  
>  (define_insn "*setmem_long_31z"
> -  [(clobber (match_operand:TI 0 "register_operand" "=d"))
> -   (set (mem:BLK (subreg:SI (match_operand:TI 3 "register_operand" "0") 4))
> -(match_operand 2 "shift_count_or_setmem_operand" "Y"))
> -   (use (match_dup 3))
> -   (use (match_operand:TI 1 "register_operand" "d"))
> +  [(clobber
> +(mem:BLK (subreg:SI (match_operand:TI 0 "register_operand" "=d") 4)))
> +   (set (mem:BLK (subreg:SI (match_operand:TI 1 "register_operand" "0") 0))
> +(unspec:BLK [(match_operand:P 2 "shift_count_or_setmem_operand" "Y")
 ^^^

match_operand:SI

> + (subreg:P (match_dup 1) 8)] UNSPEC_REPLICATE_BYTE))
 
subreg:SI

> +   (use (match_operand:TI 3 "register_operand" "d"))
> (clobber (reg:CC CC_REGNUM))]
>"!TARGET_64BIT && TARGET_ZARCH"
>"mvcle\t%0,%1,%Y2\;jo\t.-4"

Ciao

Dominik ^_^  ^_^

-- 

Dominik Vogt
IBM Germany

Re: RFC: Merge the GUPC branch into the GCC 6.0 trunk

2015-12-01 Thread Jeff Law


On 12/01/2015 04:12 AM, Richard Biener wrote:

On Mon, 30 Nov 2015, Gary Funck wrote:



Some time ago, we submitted an RFC for the introduction of
UPC support into GCC.  During the intervening time period,
we have continued to keep the 'gupc' (GNU UPC) branch in sync
with the GCC trunk and have incorporated feedback and contributions from
various GCC developers (Joseph Myers, Tom Tromey, Jakub Jelinek,
Richard Henderson, Meador Inge, and others).  We have also implemented
various bug fixes and improvements.

At this time, we would like to re-submit the UPC patches for comment
with the goal of introducing these changes into GCC 6.0.


First of all let me say that it is IMNSHO now too late for GCC 6.

Agreed.  I put it in my queue of stuff to look at in the spring when 
development opens for GCC 7.


jeff

Re: [PR68432 00/26] Handle size/speed choices for internal functions

2015-12-01 Thread Bernd Schmidt


On 12/01/2015 02:43 PM, Richard Sandiford wrote:

I don't think what you say is an argument that the approach is wrong.
The C conditions for optabs have always been more restricted than
other define_expands and define_insns, since they cannot refer
to operands.  When caching of optabs was added, they also lost
the ability to test for size/speed choices.  There have also
always been optabs that are not allowed to FAIL (such as moves,
get_thread_pointer, widening multiplication, vec_cond, etc.).
This series is extending that list, but it's in the spirit
of restrictions that have always existed.  I don't see that
that's an argument that the approach is wrong.


Ok, you can of course change the rules, but that means the following 
needs to be done as a minimum (and it should have been done initially):

 * the new rules must be documented
 * all existing expanders need to be examined to see whether they
   comply.

At the moment we don't know how widespread the problem is. If you're 
willing to do the audit of all ports then I'd be more willing to 
consider this suitable for gcc-6.



Bernd

[PATCH] Fix PR68590

2015-12-01 Thread Richard Biener


This fixes the PR in another way as well, allowing as many capture
uses in the replacement expression as in the original one, avoiding
some spurious save_exprs that way (and not perform "CSE" within
the match-and-simplify framework).

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk.

Richard.

2015-12-01  Richard Biener  

PR middle-end/68590
* genmatch.c (struct capture_info): Add match_use_count.
(capture_info::walk_match): Increment match_use_count.
(dt_simplify::gen_1): For GENERIC, only wrap multi-use
replacements in a save_expr if they occur more often than
in the original expression.

Index: gcc/genmatch.c
===
--- gcc/genmatch.c  (revision 231101)
+++ gcc/genmatch.c  (working copy)
@@ -1851,7 +1850,8 @@ struct capture_info
   bool force_single_use;
   bool cond_expr_cond_p;
   unsigned long toplevel_msk;
-  int result_use_count;
+  unsigned match_use_count;
+  unsigned result_use_count;
   unsigned same_as;
   capture *c;
 };
@@ -1901,6 +1901,7 @@ capture_info::walk_match (operand *o, un
   if (capture *c = dyn_cast <capture *> (o))
 {
   unsigned where = c->where;
+  info[where].match_use_count++;
   info[where].toplevel_msk |= 1 << toplevel_arg;
   info[where].force_no_side_effects_p |= conditional_p;
   info[where].cond_expr_cond_p |= cond_expr_cond_p;
@@ -3106,13 +3107,16 @@ dt_simplify::gen_1 (FILE *f, int indent,
  else if (is_a  (opr))
is_predicate = true;
  /* Search for captures used multiple times in the result expression
-and dependent on TREE_SIDE_EFFECTS emit a SAVE_EXPR.  */
+and wrap them in a SAVE_EXPR.  Allow as many uses as in the
+original expression.  */
  if (!is_predicate)
for (int i = 0; i < s->capture_max + 1; ++i)
  {
-   if (cinfo.info[i].same_as != (unsigned)i)
+   if (cinfo.info[i].same_as != (unsigned)i
+   || cinfo.info[i].cse_p)
  continue;
-   if (cinfo.info[i].result_use_count > 1)
+   if (cinfo.info[i].result_use_count
+   > cinfo.info[i].match_use_count)
  fprintf_indent (f, indent,
  "captures[%d] = save_expr (captures[%d]);\n",
  i, i);

[PATCH] Fix PR68379

2015-12-01 Thread Richard Biener


The following fixes PR68379.

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk.

Richard.

2015-11-30  Richard Biener  

PR tree-optimization/68379
* tree-vect-stmts.c (vectorizable_load): For BB vectorization
always base loads on the first used DR of a group.
* tree-vect-data-refs.c (vect_slp_analyze_and_verify_node_alignment):
Compute alignment of the first scalar element unconditionally.

* gcc.dg/torture/pr68379.c: New testcase.
* gfortran.dg/pr68379-1.f90: Likewise.
* gfortran.dg/pr68379-2.f: Likewise.

Index: gcc/tree-vect-stmts.c
===
--- gcc/tree-vect-stmts.c   (revision 231065)
+++ gcc/tree-vect-stmts.c   (working copy)
@@ -6130,6 +6133,7 @@ vectorizable_load (gimple *stmt, gimple_
   bool grouped_load = false;
   bool load_lanes_p = false;
   gimple *first_stmt;
+  gimple *first_stmt_for_drptr = NULL;
   bool inv_p;
   bool negative = false;
   bool compute_in_loop = false;
@@ -6733,10 +6737,14 @@ vectorizable_load (gimple *stmt, gimple_
   if (grouped_load)
 {
   first_stmt = GROUP_FIRST_ELEMENT (stmt_info);
-  /* For BB vectorization we directly vectorize a subchain
+  /* For SLP vectorization we directly vectorize a subchain
  without permutation.  */
   if (slp && ! SLP_TREE_LOAD_PERMUTATION (slp_node).exists ())
-first_stmt = SLP_TREE_SCALAR_STMTS (slp_node)[0];
+   first_stmt = SLP_TREE_SCALAR_STMTS (slp_node)[0];
+  /* For BB vectorization always use the first stmt to base
+the data ref pointer on.  */
+  if (bb_vinfo)
+   first_stmt_for_drptr = SLP_TREE_SCALAR_STMTS (slp_node)[0];
 
   /* Check if the chain of loads is already vectorized.  */
   if (STMT_VINFO_VEC_STMT (vinfo_for_stmt (first_stmt))
@@ -6948,6 +6956,24 @@ vectorizable_load (gimple *stmt, gimple_
  (DR_REF (first_dr)), 0);
  inv_p = false;
}
+ else if (first_stmt_for_drptr
+  && first_stmt != first_stmt_for_drptr)
+   {
+ dataref_ptr
+   = vect_create_data_ref_ptr (first_stmt_for_drptr, aggr_type,
+   at_loop, offset, &dummy, gsi,
+   &ptr_incr, simd_lane_access_p,
+   &inv_p, byte_offset);
+ /* Adjust the pointer by the difference to first_stmt.  */
+ data_reference_p ptrdr
+   = STMT_VINFO_DATA_REF (vinfo_for_stmt (first_stmt_for_drptr));
+ tree diff = fold_convert (sizetype,
+   size_binop (MINUS_EXPR,
+   DR_INIT (first_dr),
+   DR_INIT (ptrdr)));
+ dataref_ptr = bump_vector_ptr (dataref_ptr, ptr_incr, gsi,
+stmt, diff);
+   }
  else
dataref_ptr
  = vect_create_data_ref_ptr (first_stmt, aggr_type, at_loop,
Index: gcc/testsuite/gcc.dg/torture/pr68379.c
===
--- gcc/testsuite/gcc.dg/torture/pr68379.c  (revision 0)
+++ gcc/testsuite/gcc.dg/torture/pr68379.c  (working copy)
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+
+int a, b[3], c[3][5];
+
+void
+fn1 ()
+{
+  int e;
+  for (a = 2; a >= 0; a--)
+for (e = 0; e < 4; e++)
+  c[a][e] = b[a];
+}
Index: gcc/testsuite/gfortran.dg/pr68379-1.f90
===
--- gcc/testsuite/gfortran.dg/pr68379-1.f90 (revision 0)
+++ gcc/testsuite/gfortran.dg/pr68379-1.f90 (working copy)
@@ -0,0 +1,35 @@
+! { dg-do compile }
+! { dg-options "-O3" }
+MODULE qs_efield_berry
+  TYPE cp_error_type
+  END TYPE
+  INTEGER, PARAMETER :: dp=8
+  TYPE qs_energy_type
+REAL(KIND=dp), POINTER :: efield
+  END TYPE
+  TYPE qs_environment_type
+  END TYPE
+  INTERFACE 
+SUBROUTINE foo(qs_env,energy,error)
+   IMPORT 
+   TYPE(qs_environment_type), POINTER :: qs_env
+   TYPE(cp_error_type)  :: error
+   TYPE(qs_energy_type), POINTER   :: energy
+END SUBROUTINE
+  END INTERFACE
+CONTAINS
+  SUBROUTINE qs_efield_mo_derivatives()
+TYPE(qs_environment_type), POINTER :: qs_env
+TYPE(cp_error_type)  :: error
+COMPLEX(dp)  ::   zi(3), zphase(3)
+REAL(dp) :: ci(3)
+TYPE(qs_energy_type), POINTER  :: energy
+CALL foo(qs_env, energy, error)
+zi = zi * zphase
+ci = AIMAG(LOG(zi))
+DO idir=1,3
+   ener_field=ener_field+ci(idir)*fieldfac(idir)
+END DO
+energy%efield=ener_field
+  END SUBROUTINE qs_efield_mo_derivatives
+END MODULE qs_efield_berry
Index: gcc/testsuite/gfortran.dg/pr68379-2.f

[PATCH] Handle BUILT_IN_GOACC_PARALLEL in ipa-pta

2015-12-01 Thread Tom de Vries


[ was: Re: [PATCH, PR46032] Handle BUILT_IN_GOMP_PARALLEL in ipa-pta ]

On 30/11/15 17:36, Tom de Vries wrote:

On 30/11/15 14:24, Richard Biener wrote:

On Mon, 30 Nov 2015, Tom de Vries wrote:


On 30/11/15 10:16, Richard Biener wrote:

On Mon, 30 Nov 2015, Tom de Vries wrote:


Hi,

this patch fixes PR46032.

It handles a call:
...
__builtin_GOMP_parallel (fn, data, num_threads, flags)
...
as:
...
fn (data)
...
in ipa-pta.

This improves ipa-pta alias analysis in the parallelized function
fn,


This follow-up patch does the same for BUILT_IN_GOACC_PARALLEL.
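
A minimal sketch of what "handle the call as fn (data)" means for the
analysis, assuming a serial stand-in for the runtime entry point.
demo_parallel is hypothetical and is not the libgomp or libgomp OpenACC
interface; it only shows that, for points-to purposes, the launch behaves
like a direct call that passes the data block to the outlined body.

```c
#include <assert.h>

/* Block of shared variables handed to the outlined region.  */
struct demo_shared
{
  unsigned *a;
  unsigned *b;
  unsigned *c;
};

/* Outlined region body, mirroring the stores in the new testcases.  */
static void
demo_body (void *p)
{
  struct demo_shared *s = p;
  s->a[0] = 0;
  s->b[0] = 1;
  s->c[0] = s->a[0];
}

/* Hypothetical launch function: the extra arguments do not affect which
   objects DATA points to, so the call can be modelled as fn (data).  */
static void
demo_parallel (void (*fn) (void *), void *data,
               unsigned num_threads, unsigned flags)
{
  (void) num_threads;
  (void) flags;
  assert (fn != 0);
  fn (data);
}

/* Caller: a, b and c are distinct objects, which the analysis can only
   exploit inside demo_body if it sees DATA flowing into the body.  */
static unsigned
demo_run (void)
{
  unsigned a[2], b[2], c[2];
  struct demo_shared s = { a, b, c };
  demo_parallel (demo_body, &s, 0, 0);
  return c[0];
}
```

Once the body is known to receive pointers to three distinct objects, the
store to b[0] cannot change a[0], which is what lets the load in
c[0] = a[0] be replaced by the constant, as the scan-tree-dump checks in the
testcases expect.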

Bootstrapped and reg-tested on x86_64.

OK for stage3 trunk?

Thanks,
- Tom
Handle BUILT_IN_GOACC_PARALLEL in ipa-pta

2015-12-01  Tom de Vries  

	* tree-ssa-structalias.c (find_func_aliases_for_builtin_call)
	(find_func_clobbers, ipa_pta_execute): Handle BUILT_IN_GOACC_PARALLEL.

	* c-c++-common/goacc/kernels-alias-ipa-pta-2.c: New test.
	* c-c++-common/goacc/kernels-alias-ipa-pta-3.c: New test.
	* c-c++-common/goacc/kernels-alias-ipa-pta.c: New test.

---
 .../c-c++-common/goacc/kernels-alias-ipa-pta-2.c   | 37 ++
 .../c-c++-common/goacc/kernels-alias-ipa-pta-3.c   | 36 +
 .../c-c++-common/goacc/kernels-alias-ipa-pta.c | 23 ++
 gcc/tree-ssa-structalias.c | 28 +---
 .../kernels-alias-ipa-pta-2.c  | 27 
 .../kernels-alias-ipa-pta-3.c  | 26 +++
 .../kernels-alias-ipa-pta.c| 26 +++
 7 files changed, 199 insertions(+), 4 deletions(-)

diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-alias-ipa-pta-2.c b/gcc/testsuite/c-c++-common/goacc/kernels-alias-ipa-pta-2.c
new file mode 100644
index 000..f16d698
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-alias-ipa-pta-2.c
@@ -0,0 +1,37 @@
+/* { dg-additional-options "-O2" } */
+/* { dg-additional-options "-fipa-pta -fdump-tree-optimized" } */
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+typedef __SIZE_TYPE__ size_t;
+void *malloc (size_t);
+void free (void *);
+#ifdef __cplusplus
+}
+#endif
+
+#define N 2
+
+void
+foo (void)
+{
+  unsigned int *a = (unsigned int *)malloc (N * sizeof (unsigned int));
+  unsigned int *b = (unsigned int *)malloc (N * sizeof (unsigned int));
+  unsigned int *c = (unsigned int *)malloc (N * sizeof (unsigned int));
+
+#pragma acc kernels pcopyout (a[0:N], b[0:N], c[0:N])
+  {
+a[0] = 0;
+b[0] = 1;
+c[0] = a[0];
+  }
+
+  free (a);
+  free (b);
+  free (c);
+}
+
+/* { dg-final { scan-tree-dump-times "(?n)= 0;$" 2 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "(?n)= 1;$" 1 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "(?n)= \\*a" 0 "optimized" } } */
diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-alias-ipa-pta-3.c b/gcc/testsuite/c-c++-common/goacc/kernels-alias-ipa-pta-3.c
new file mode 100644
index 000..1eb56eb
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-alias-ipa-pta-3.c
@@ -0,0 +1,36 @@
+/* { dg-additional-options "-O2" } */
+/* { dg-additional-options "-fipa-pta -fdump-tree-optimized" } */
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+typedef __SIZE_TYPE__ size_t;
+void *malloc (size_t);
+void free (void *);
+#ifdef __cplusplus
+}
+#endif
+
+#define N 2
+
+void
+foo (void)
+{
+  unsigned int *a = (unsigned int *)malloc (N * sizeof (unsigned int));
+  unsigned int *b = a;
+  unsigned int *c = (unsigned int *)malloc (N * sizeof (unsigned int));
+
+#pragma acc kernels pcopyout (a[0:N], b[0:N], c[0:N])
+  {
+a[0] = 0;
+b[0] = 1;
+c[0] = a[0];
+  }
+
+  free (a);
+  free (c);
+}
+
+/* { dg-final { scan-tree-dump-times "(?n)= 0;$" 1 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "(?n)= 1;$" 1 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "(?n)= \\*a" 1 "optimized" } } */
diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-alias-ipa-pta.c b/gcc/testsuite/c-c++-common/goacc/kernels-alias-ipa-pta.c
new file mode 100644
index 000..969b466
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-alias-ipa-pta.c
@@ -0,0 +1,23 @@
+/* { dg-additional-options "-O2" } */
+/* { dg-additional-options "-fipa-pta -fdump-tree-optimized" } */
+
+#define N 2
+
+void
+foo (void)
+{
+  unsigned int a[N];
+  unsigned int b[N];
+  unsigned int c[N];
+
+#pragma acc kernels pcopyout (a, b, c)
+  {
+a[0] = 0;
+b[0] = 1;
+c[0] = a[0];
+  }
+}
+
+/* { dg-final { scan-tree-dump-times "(?n)= 0;$" 2 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "(?n)= 1;$" 1 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "(?n)= \\*_\[0-9\]\\\[0\\\];$" 0 "optimized" } } */
diff --git a/gcc/tree-ssa-structalias.c b/gcc/tree-ssa-structalias.c
index 7f4a8ad..060ff3e 100644
--- a/gcc/tree-ssa-structalias.c
+++ b/gcc/tree-ssa-structalias.c
@@ -4507,15 +4507,32 @@ find_func_aliases_for_builtin_call (struct function *fn, gcall *t)
 	  return true;
 	}
   case BUILT_IN_GOMP_PA

[PATCH, rs6000] Fix analyze_swaps to handle vperm for large and small code models

2015-12-01 Thread Bill Schmidt

Hi,

Uli Weigand discovered that the gcc.target/powerpc/swaps-p8-21.c test
case fails when large and small code models are used, rather than the
default medium code model.  This is because analyze_swaps is determining
whether the mask used for a vperm insn is loaded from the constant pool,
and there is an extra indirection for such loads when the large or small
code model is used.  This patch changes analyze_swaps to handle the
extra indirection correctly.  A new test case variant is added to check
for it.
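
A minimal sketch of the extra-indirection check, using a toy expression type
instead of GCC's rtx. The demo_* names are hypothetical; the point is only
that one optional MEM wrapper is peeled before asking whether the source is
TOC-relative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy expression codes standing in for RTL codes.  */
enum demo_code { DEMO_MEM, DEMO_TOCREL, DEMO_REG };

struct demo_expr
{
  enum demo_code code;
  const struct demo_expr *op0;   /* Address operand for DEMO_MEM.  */
};

/* Stand-in for the TOC-relative predicate.  */
static bool
demo_toc_relative_p (const struct demo_expr *x)
{
  return x != NULL && x->code == DEMO_TOCREL;
}

/* Accept both a direct TOC-relative source and one wrapped in a single
   memory reference, the latter modelling the small/large code model
   case.  */
static bool
demo_const_load_source_p (const struct demo_expr *src)
{
  assert (src != NULL);
  if (src->code == DEMO_MEM)
    src = src->op0;
  return demo_toc_relative_p (src);
}
```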

Bootstrapped and tested on powerpc64le-unknown-linux-gnu with no
regressions.  Ok for trunk?

Thanks,
Bill


[gcc]

2015-12-01  Bill Schmidt  

* config/rs6000/rs6000.c (const_load_sequence_p): Handle extra
indirection for large and small code models.
(adjust_vperm): Likewise.

[gcc/testsuite]

2015-12-01  Bill Schmidt  

* gcc.target/powerpc/swaps-p8-22.c: New.


Index: gcc/config/rs6000/rs6000.c
===
--- gcc/config/rs6000/rs6000.c  (revision 231083)
+++ gcc/config/rs6000/rs6000.c  (working copy)
@@ -36613,7 +36613,12 @@ const_load_sequence_p (swap_web_entry *insn_entry,
  rtx base, offset;
  if (GET_CODE (tocrel_body) != SET)
return false;
- if (!toc_relative_expr_p (SET_SRC (tocrel_body), false))
+ /* There is an extra level of indirection for small/large
+code models.  */
+ rtx tocrel_expr = SET_SRC (tocrel_body);
+ if (GET_CODE (tocrel_expr) == MEM)
+   tocrel_expr = XEXP (tocrel_expr, 0);
+ if (!toc_relative_expr_p (tocrel_expr, false))
return false;
  split_const (XVECEXP (tocrel_base, 0, 0), &base, &offset);
  if (GET_CODE (base) != SYMBOL_REF || !CONSTANT_POOL_ADDRESS_P (base))
@@ -37294,10 +37299,19 @@ adjust_vperm (rtx_insn *insn)
  to set tocrel_base; otherwise it would be unnecessary as we've
  already established it will return true.  */
   rtx base, offset;
-  if (!toc_relative_expr_p (SET_SRC (PATTERN (tocrel_insn)), false))
+  rtx tocrel_expr = SET_SRC (PATTERN (tocrel_insn));
+  /* There is an extra level of indirection for small/large code models.  */
+  if (GET_CODE (tocrel_expr) == MEM)
+tocrel_expr = XEXP (tocrel_expr, 0);
+  if (!toc_relative_expr_p (tocrel_expr, false))
 gcc_unreachable ();
   split_const (XVECEXP (tocrel_base, 0, 0), &base, &offset);
   rtx const_vector = get_pool_constant (base);
+  /* With the extra indirection, get_pool_constant will produce the
+ real constant from the reg_equal expression, so get the real
+ constant.  */
+  if (GET_CODE (const_vector) == SYMBOL_REF)
+const_vector = get_pool_constant (const_vector);
   gcc_assert (GET_CODE (const_vector) == CONST_VECTOR);
 
   /* Create an adjusted mask from the initial mask.  */
Index: gcc/testsuite/gcc.target/powerpc/swaps-p8-22.c
===
--- gcc/testsuite/gcc.target/powerpc/swaps-p8-22.c  (revision 0)
+++ gcc/testsuite/gcc.target/powerpc/swaps-p8-22.c  (working copy)
@@ -0,0 +1,29 @@
+/* { dg-do compile { target { powerpc64le-*-* } } } */
+/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power8" } }
+/* { dg-options "-O2 -mcpu=power8 -maltivec -mcmodel=large" } */
+
+/* The expansion for vector character multiply introduces a vperm operation.
+   This tests that changing the vperm mask allows us to remove all swaps
+   from the generated code.  It is a duplicate of swaps-p8-21.c, except
+   that it applies the large code model, which requires an extra indirection
+   in the load of the constant mask.  */
+
+#include <altivec.h>
+
+void abort ();
+
+vector unsigned char r;
+vector unsigned char v =
+  { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 };
+vector unsigned char i =
+  { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16 };
+
+int main ()
+{
+  int j;
+  r = v * i;
+  return 0;
+}
+
+/* { dg-final { scan-assembler-times "vperm" 1 } } */
+/* { dg-final { scan-assembler-not "xxpermdi" } } */

[PATCH] Fix PR68625

2015-12-01 Thread Richard Biener


I am testing the following patch to avoid CFG cleanup calling merge-blocks
in dead regions thereby exposing that (in this case) copyprop propagates
a def from a dead region to another place in another dead region
(itself not realizing they are dead).  The issue with that is that
if CFG cleanup first removes the region with the def and later
tries to propagate out a PHI merging two blocks (in the other dead
region, before finally removing it) it ICEs because may_propagate_copy
is not happy with being called on released SSA names.

The above scheme can only happen if a propagator created "invalid"
SSA form by viewing some edges as non-executable thereby making
the use being dominated by the def only "virtually".  It forces
non-executable edges to be optimized away by CFG cleanup by making
the controlling conditions trivially true/false.

The fix is to make the SSA form "valid" as a very first thing in
CFG cleanup, thus remove trivially dead edges before trying to
execute BB merging opportunities.
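
A minimal sketch of the ordering change, using a toy block array rather than
the real CFG. The demo_* names are hypothetical; the function merely counts
how many merge attempts would run while some block still has a dead edge,
once with the interleaved order and once with all edge removal done first.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy basic block: only tracks whether a trivially dead edge remains.  */
struct demo_block
{
  bool has_dead_edge;
};

/* Stand-in for the per-block edge removal step.  */
static void
demo_remove_dead_edges (struct demo_block *bb)
{
  bb->has_dead_edge = false;
}

/* Return true if any block still carries a dead edge.  */
static bool
demo_any_dead_edge (const struct demo_block *blocks, size_t n)
{
  for (size_t i = 0; i < n; i++)
    if (blocks[i].has_dead_edge)
      return true;
  return false;
}

/* Walk all blocks and return how many merge attempts happened while dead
   edges were still present somewhere.  With EDGES_FIRST, edge removal is
   done for every block before any merge attempt.  */
static unsigned
demo_cleanup (struct demo_block *blocks, size_t n, bool edges_first)
{
  unsigned risky_merges = 0;
  assert (blocks != NULL);

  if (edges_first)
    for (size_t i = 0; i < n; i++)
      demo_remove_dead_edges (&blocks[i]);

  for (size_t i = 0; i < n; i++)
    {
      if (!edges_first)
        demo_remove_dead_edges (&blocks[i]);
      /* A merge attempt for block i would happen here.  */
      if (demo_any_dead_edge (blocks, n))
        risky_merges++;
    }
  return risky_merges;
}
```

With three blocks that all start out with a dead edge, the interleaved order
performs two merge attempts in the risky state and the edges-first order
performs none.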

Eventually we can avoid iterating cleanup_control_flow_bb (secondary
opportunities should be very rare) but I didn't want to change
this during stage3.  I'll place a gcc_assert there in another
bootstrap anyway (in theory out-propagating a PHI during BB merging
could expose a trivially true/false condition through BB merging).

Bootstrap and regtest running on x86_64-unknown-linux-gnu.

Richard.

2015-12-01  Richard Biener  

PR tree-optimization/68625
* tree-cfgcleanup.c (cleanup_tree_cfg_bb): Do not call
cleanup_control_flow_bb.
(cleanup_tree_cfg_1): First perform cleanup_control_flow_bb
on all BBs, then cleanup_tree_cfg_bb and finally iterate
over the worklist doing both.

* gcc.dg/torture/pr68625.c: New testcase.

Index: gcc/tree-cfgcleanup.c
===
*** gcc/tree-cfgcleanup.c   (revision 231101)
--- gcc/tree-cfgcleanup.c   (working copy)
*** fixup_noreturn_call (gimple *stmt)
*** 614,621 
  static bool
  cleanup_tree_cfg_bb (basic_block bb)
  {
-   bool retval = cleanup_control_flow_bb (bb);
- 
if (tree_forwarder_block_p (bb, false)
&& remove_forwarder_block (bb))
  return true;
--- 614,619 
*** cleanup_tree_cfg_bb (basic_block bb)
*** 640,646 
}
  }
  
!   return retval;
  }
  
  /* Iterate the cfg cleanups, while anything changes.  */
--- 638,644 
}
  }
  
!   return false;
  }
  
  /* Iterate the cfg cleanups, while anything changes.  */
*** cleanup_tree_cfg_1 (void)
*** 660,667 
   recording of edge to CASE_LABEL_EXPR.  */
start_recording_case_labels ();
  
!   /* Start by iterating over all basic blocks.  We cannot use FOR_EACH_BB_FN,
   since the basic blocks may get removed.  */
n = last_basic_block_for_fn (cfun);
for (i = NUM_FIXED_BLOCKS; i < n; i++)
  {
--- 658,683 
   recording of edge to CASE_LABEL_EXPR.  */
start_recording_case_labels ();
  
!   /* We cannot use FOR_EACH_BB_FN for the BB iterations below
   since the basic blocks may get removed.  */
+ 
+   /* Start by iterating over all basic blocks looking for edge removal
+  opportunities.  Do this first because incoming SSA form may be
+  invalid and we want to avoid performing SSA related tasks such
+  as propagating out a PHI node during BB merging in that state.  */
+   n = last_basic_block_for_fn (cfun);
+   for (i = NUM_FIXED_BLOCKS; i < n; i++)
+ {
+   bb = BASIC_BLOCK_FOR_FN (cfun, i);
+   if (bb)
+   retval |= cleanup_control_flow_bb (bb);
+ }
+ 
+   /* After doing the above SSA form should be valid (or an update SSA
+  should be required).  */
+ 
+   /* Continue by iterating over all basic blocks looking for BB merging
+  opportunities.  */
n = last_basic_block_for_fn (cfun);
for (i = NUM_FIXED_BLOCKS; i < n; i++)
  {
*** cleanup_tree_cfg_1 (void)
*** 682,687 
--- 698,704 
if (!bb)
continue;
  
+   retval |= cleanup_control_flow_bb (bb);
retval |= cleanup_tree_cfg_bb (bb);
  }
  
Index: gcc/testsuite/gcc.dg/torture/pr68625.c
===
*** gcc/testsuite/gcc.dg/torture/pr68625.c  (revision 0)
--- gcc/testsuite/gcc.dg/torture/pr68625.c  (working copy)
***
*** 0 
--- 1,51 
+ /* { dg-do compile } */
+ /* { dg-additional-options "-w" } */
+ 
+ int **dp;
+ int sg;
+ 
+ void
+ z9(void)
+ {
+   int pz, oi, vz, yp, zi, hd, pw, gr, w9 = 0, j0 = -1, rb = &w9;
+   int *lr;
+   while (w9 < 1) {
+   lr++;
+   *lr = 1;
+   if (*lr < 1)
+   for (;;)
+ if (pz && *lr) {
+ ee:
+ **dp = 0;
+ }
+   pz = zi = vz;
+   if (j0 ^ (vz > 0))
+   continue;
+   **dp = 1;
+   while (**dp)
+   if (++oi) {
+   int mq = dp;
+   j0 = 1;
+

Re: [PATCH, rs6000] Fix analyze_swaps to handle vperm for large and small code models

2015-12-01 Thread David Edelsohn

> Uli Weigand discovered that the gcc.target/powerpc/swaps-p8-21.c test
> case fails when large and small code models are used, rather than the
> default medium code model.  This is because analyze_swaps is determining
> whether the mask used for a vperm insn is loaded from the constant pool,
> and there is an extra indirection for such loads when the large or small
> code model is used.  This patch changes analyze_swaps to handle the
> extra indirection correctly.  A new test case variant is added to check
> for it.
>
> Bootstrapped and tested on powerpc64le-unknown-linux-gnu with no
> regressions.  Ok for trunk?
>
> Thanks,
> Bill
>
>
> [gcc]
>
> 2015-12-01  Bill Schmidt  
>
> * config/rs6000/rs6000.c (const_load_sequence_p): Handle extra
> indirection for large and small code models.
> (adjust_vperm): Likewise.
>
> [gcc/testsuite]
>
> 2015-12-01  Bill Schmidt  
>
> * gcc.target/powerpc/swaps-p8-22.c: New.

Okay.

Thanks, David

Re: [RFA] Implement incremental IL linking

2015-12-01 Thread Richard Biener

On Tue, 1 Dec 2015, Jan Hubicka wrote:

> > Hi,
> > this is polished version of the patch to implement IL level incremental
> > linking.
> > -flinker-output is now documented and can be specified to the GCC driver.
> > In this case plugin gets option -linker-output-known and it stops from
> > attempts to detect it from info passed down by linker. I also added doc for
> > the flag to invoke.texi
> > 
> > Modulo the testsuite compensation the rest of patch is basically unchanged
> > since earlier version: lto-wrapper looks for linker-output flag and 
> > switches to
> > non-WPA mode (because we do not want to execute ltrans compilations) and lto
> > frontends configure the compiler to output IL and possibly flat lto binary 
> > to
> > the object file.
> > 
> > Bootstrapped/regtested x86_64-linux, OK?
> Hmm and now for the fun part.  I just noticed that the patch works well with 
> both
> GNU LD and Gold from my system installation, which is
> 
> GNU gold (GNU Binutils 2.24.51.20140405) 1.11
> 
> while newer version:
> 
> GNU gold (GNU Binutils 2.25.51.20150520) 1.11
> 
> fails with:
> 
> /tmp/ccPIuUSA.lto.o: plugin needed to handle lto object
> 
> in the final stage of incremental linking.  This seems like binutils bug 
> - the message should be output only if there are LTO objects not claimed 
> by the linker before invoking the plugin. There is no need to error out 
> when plugin itself produce IL for incremental linking.

Ah, yeah - I ran into this issue with GNU ld as well with LTO early 
debug...

Richard.

> I will check if new version fixes it and fill in PR.  I suppose I can 
> whitelist
> ld versions in the plugin and enable -flinker-output=rel only on binutils
> version where this works. There is LDPT_GOLD_VERSION which tells me the
> info.  I will update patch accordingly and check what version range refuses to
> finish the link.
> 
> Honza
> 
> > 
> > Honza
> > 
> > * lto-plugin.c: Document options; add -linker-output-known;
> > determine when to use rel and when nolto-rel output.
> > 
> > * lto-wrapper.c (run_gcc): Look for -flinker-output=rel also in the
> > list of options passed from the driver.
> > * passes.c (ipa_write_summaries): Only modify statements if body
> > is in memory.
> > * cgraphunit.c (ipa_passes): Also produce intermediate code when
> > incrementally linking.
> > (ipa_passes): Likewise.
> > * lto-cgraph.c (lto_output_node): When incrementally linking do not
> > pass down resolution info.
> > * common.opt (flag_incremental_link): Update info.
> > * gcc.c (plugin specs): Turn flinker-output=* to
> > -plugin-opt=-linker-output-known
> > * toplev.c (compile_file): Also cut compilation when doing incremental
> > link.
> > * flag-types.h (enum lto_partition_model): Add
> > LTO_LINKER_OUTPUT_NOLTOREL.
> > (invoke.texi): Add -flinker-output docs.
> > 
> > * lang.opt (lto_linker_output): Add nolto-rel.
> > * lto-lang.c (lto_post_options): Handle LTO_LINKER_OUTPUT_REL
> > and LTO_LINKER_OUTPUT_NOLTOREL.
> > (lto_init): Generate lto when doing incremental link.
> > 
> > * gcc.dg/lto/20081120-2_0.c: Add -flinker-output=nolto-rel
> > * gcc.dg/lto/20090126-1_0.c: Likewise.
> > * gcc.dg/lto/20091020-2_0.c: Likewise.
> > * gcc.dg/lto/20081204-2_0.c: Likewise.
> > * gcc.dg/lto/20091015-1_0.c: Likewise.
> > * gcc.dg/lto/20090126-2_0.c: Likewise.
> > * gcc.dg/lto/20090116_0.c: Likewise.
> > * gcc.dg/lto/20081224_0.c: Likewise.
> > * gcc.dg/lto/20091027-1_0.c: Likewise.
> > * gcc.dg/lto/20090219_0.c: Likewise.
> > * gcc.dg/lto/20081212-1_0.c: Likewise.
> > * gcc.dg/lto/20091013-1_0.c: Likewise.
> > * gcc.dg/lto/20081126_0.c: Likewise.
> > * gcc.dg/lto/20090206-1_0.c: Likewise.
> > * gcc.dg/lto/20091016-1_0.c: Likewise.
> > * gcc.dg/lto/20081120-1_0.c: Likewise.
> > * gcc.dg/lto/20091020-1_0.c: Likewise.
> > * gcc.dg/lto/20100426_0.c: Likewise.
> > * gcc.dg/lto/20081204-1_0.c: Likewise.
> > * gcc.dg/lto/20091014-1_0.c: Likewise.
> > * g++.dg/lto/20081109-1_0.C: Likewise.
> > * g++.dg/lto/20100724-1_0.C: Likewise.
> > * g++.dg/lto/20081204-1_0.C: Likewise.
> > * g++.dg/lto/pr45679-2_0.C: Likewise.
> > * g++.dg/lto/20110311-1_0.C: Likewise.
> > * g++.dg/lto/20090302_0.C: Likewise.
> > * g++.dg/lto/20081118_0.C: Likewise.
> > * g++.dg/lto/20091002-2_0.C: Likewise.
> > * g++.dg/lto/20081120-2_0.C: Likewise.
> > * g++.dg/lto/20081123_0.C: Likewise.
> > * g++.dg/lto/20090313_0.C: Likewise.
> > * g++.dg/lto/pr54625-1_0.c: Likewise.
> > * g++.dg/lto/pr48354-1_0.C: Likewise.
> > * g++.dg/lto/20081219_0.C: Likewise.
> > * g++.dg/lto/pr48042_0.C: Likewise.
> > * g++.dg/lto/20101015-2_0.C: Likewise.
> > * g++.dg/lto/pr45679-1_0.C: Likewise.
> > * g++.dg/lto/20091026-1_0.C: Likewise.
> > * g++.dg/lto/pr45621_0.C: Likewise.
> > * g++.dg/lto/20081119-1_0.C: Likewise.

Re: [PATCH] Handle BUILT_IN_GOACC_PARALLEL in ipa-pta

2015-12-01 Thread Jakub Jelinek

On Tue, Dec 01, 2015 at 03:25:42PM +0100, Tom de Vries wrote:
> Handle BUILT_IN_GOACC_PARALLEL in ipa-pta
> 
> 2015-12-01  Tom de Vries  
> 
>   * tree-ssa-structalias.c (find_func_aliases_for_builtin_call)
>   (find_func_clobbers, ipa_pta_execute): Handle BUILT_IN_GOACC_PARALLEL.

Isn't this cheating though?  The kernel will be called with those addresses
only if doing host fallback (and for GOMP_target_ext even not for that
always - firstprivate vars will have the addresses replaced by addresses of
alloca-ed copies of those objects).
I haven't studied in detail what exactly IPA-PTA does, so maybe it is good
enough to pretend that.

Jakub

Re: [PATCH] Handle BUILT_IN_GOACC_PARALLEL in ipa-pta

2015-12-01 Thread Richard Biener

On Tue, 1 Dec 2015, Tom de Vries wrote:

> [ was: Re: [PATCH, PR46032] Handle BUILT_IN_GOMP_PARALLEL in ipa-pta ]
> 
> On 30/11/15 17:36, Tom de Vries wrote:
> > On 30/11/15 14:24, Richard Biener wrote:
> > > On Mon, 30 Nov 2015, Tom de Vries wrote:
> > > 
> > > > On 30/11/15 10:16, Richard Biener wrote:
> > > > > On Mon, 30 Nov 2015, Tom de Vries wrote:
> > > > > 
> > > > > > Hi,
> > > > > > 
> > > > > > this patch fixes PR46032.
> > > > > > 
> > > > > > It handles a call:
> > > > > > ...
> > > > > > __builtin_GOMP_parallel (fn, data, num_threads, flags)
> > > > > > ...
> > > > > > as:
> > > > > > ...
> > > > > > fn (data)
> > > > > > ...
> > > > > > in ipa-pta.
> > > > > > 
> > > > > > This improves ipa-pta alias analysis in the parallelized function
> > > > > > fn,
> 
> This follow-up patch does the same for BUILT_IN_GOACC_PARALLEL.
> 
> Bootstrapped and reg-tested on x86_64.
> 
> OK for stage3 trunk?

Ok.

Richard.
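
For readers following the quoted thread, the modelling it describes (treating the parallel launch as a plain call `fn (data)` for alias analysis) can be pictured with a serial stand-in. This is only an illustrative sketch: `toy_parallel`, `toy_body` and `toy_args` are hypothetical names, and the stand-in ignores threading entirely. It shows only that, from a points-to perspective, whatever `data` points to flows into `fn`.

```c
/* Hypothetical argument block handed to the outlined function.  */
struct toy_args {
  int *in;
  int *out;
};

/* Hypothetical outlined body: reads and writes through pointers that the
   caller stored in the argument block.  */
static void toy_body(void *data)
{
  struct toy_args *a = data;
  *a->out = *a->in + 1;
}

/* Hypothetical serial stand-in for a parallel launch.  The thread count and
   flags are ignored; the only data flow is DATA being passed to FN, which is
   what the quoted patch description models as "fn (data)".  */
static void toy_parallel(void (*fn)(void *), void *data,
                         unsigned num_threads, unsigned flags)
{
  (void)num_threads;
  (void)flags;
  fn(data);
}
```

With that view, an analysis that already understands direct calls can relate the pointers stored in the argument block to the objects accessed inside the outlined function.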

[PATCH] Don't ignore noreturn functions for "unused" warning (PR middle-end/68582)

2015-12-01 Thread Marek Polacek

We were failing to give "defined but not used" warning for functions marked
with the attribute noreturn/volatile.  The problem is that for functions the
TREE_THIS_VOLATILE flag means something different than for decls.  The fix is
to check the flag only for VAR_DECLs, as suggested by Richi in the PR.
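
The change in the predicate can be sketched with a toy model. The types and functions below are hypothetical stand-ins, not GCC's tree API. The single `this_volatile` flag is overloaded: for a variable it means volatile, for a function it means noreturn.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for declaration kinds.  */
enum toy_decl_kind { TOY_VAR_DECL, TOY_FUNCTION_DECL };

struct toy_decl {
  enum toy_decl_kind kind;
  bool this_volatile;  /* volatile for variables, noreturn for functions */
  bool is_public;
  bool used;
};

/* Old check: any declaration with the flag set is exempt from the warning,
   so noreturn functions were silently skipped.  */
static bool toy_warn_unused_old(const struct toy_decl *d)
{
  return !d->used && !d->is_public && !d->this_volatile;
}

/* New check: the flag only exempts variables.  */
static bool toy_warn_unused_new(const struct toy_decl *d)
{
  return !d->used && !d->is_public
         && (d->kind != TOY_VAR_DECL || !d->this_volatile);
}
```

Under this model an unused static noreturn function is skipped by the old check and reported by the new one, while an unused volatile variable stays exempt in both.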

Bootstrapped/regtested on x86_64-linux, ok for trunk?

2015-12-01  Marek Polacek  

PR middle-end/68582
* cgraphunit.c (check_global_declaration): Only depend on 
TREE_THIS_VOLATILE
for VAR_DECLs.

* c-c++-common/pr68582.c: New test.

diff --git gcc/cgraphunit.c gcc/cgraphunit.c
index f73d9a7..4ce5f9b 100644
--- gcc/cgraphunit.c
+++ gcc/cgraphunit.c
@@ -956,7 +956,7 @@ check_global_declaration (symtab_node *snode)
   && ! DECL_ABSTRACT_ORIGIN (decl)
   && ! TREE_PUBLIC (decl)
   /* A volatile variable might be used in some non-obvious way.  */
-  && ! TREE_THIS_VOLATILE (decl)
+  && (! VAR_P (decl) || ! TREE_THIS_VOLATILE (decl))
   /* Global register variables must be declared to reserve them.  */
   && ! (TREE_CODE (decl) == VAR_DECL && DECL_REGISTER (decl))
   /* Global ctors and dtors are called by the runtime.  */
diff --git gcc/testsuite/c-c++-common/pr68582.c 
gcc/testsuite/c-c++-common/pr68582.c
index e69de29..95ca9a4 100644
--- gcc/testsuite/c-c++-common/pr68582.c
+++ gcc/testsuite/c-c++-common/pr68582.c
@@ -0,0 +1,25 @@
+/* PR middle-end/68582 */
+/* { dg-do compile } */
+/* { dg-options "-Wunused-function" } */
+
+/* We failed to give the warning for functions with TREE_THIS_VOLATILE set.  */
+
+static void
+fn1 (void) /* { dg-warning "defined but not used" } */
+{
+  __builtin_abort ();
+}
+
+__attribute__ ((noreturn))
+static void
+fn2 (void) /* { dg-warning "defined but not used" } */
+{
+  __builtin_abort ();
+}
+
+__attribute__ ((volatile))
+static void
+fn3 (void) /* { dg-warning "defined but not used" } */
+{
+  __builtin_abort ();
+}

Marek

Re: [1/2] OpenACC routine support

2015-12-01 Thread Thomas Schwinge

Hi Cesar!

I noticed while working on other test cases:

On Wed, 18 Nov 2015 11:02:01 -0800, Cesar Philippidis  
wrote:
> --- a/gcc/cp/parser.c
> +++ b/gcc/cp/parser.c

> @@ -1318,13 +1318,21 @@ cp_finalize_omp_declare_simd (cp_parser *parser, tree 
> fndecl)
>  }
>  }
>  
> -/* Diagnose if #pragma omp routine isn't followed immediately
> -   by function declaration or definition.   */
> +/* Diagnose if #pragma acc routine isn't followed immediately by function
> +   declaration or definition.  */
>  
>  static inline void
>  cp_ensure_no_oacc_routine (cp_parser *parser)
>  {
> -  cp_finalize_oacc_routine (parser, NULL_TREE, false, true);
> +  if (parser->oacc_routine && !parser->oacc_routine->error_seen)
> +{
> +  tree clauses = parser->oacc_routine->clauses;
> +  location_t loc = OMP_CLAUSE_LOCATION (TREE_PURPOSE(clauses));
> +
> +  error_at (loc, "%<#pragma oacc routine%> not followed by function "
> + "declaration or definition");
> +  parser->oacc_routine = NULL;
> +}
>  }

"#pragma acc routine", not "oacc".  Also in a few other places.

Next, in the function quoted above, you use "not followed by function
declaration or definition", but you use "not followed by a single
function declaration or definition" in a lot of (but not all) other
places -- is that intentional?

For example:

>  cp_parser_oacc_routine (cp_parser *parser, cp_token *pragma_tok,
>   enum pragma_context context)
>  {
> [...]
> +   error_at (OMP_CLAUSE_LOCATION (parser->oacc_routine->clauses),
> + "%<#pragma oacc routine%> not followed by a single "
> + "function declaration or definition");

"a single".

> [...]
> +   if (parser->oacc_routine
> +   && !parser->oacc_routine->error_seen
> +   && !parser->oacc_routine->fndecl_seen)
> + error_at (loc, "%<#pragma acc routine%> not followed by "
> +   "function declaration or definition");

Not "a single".

> +
> +   data.tokens.release ();
> +   parser->oacc_routine = NULL;
> + }
> +}
> +}
> +
> +/* Finalize #pragma acc routine clauses after direct declarator has
> +   been parsed, and put that into "oacc routine" attribute.  */

There is no "oacc routine" attribute (anymore)?

> +static tree
> +cp_parser_late_parsing_oacc_routine (cp_parser *parser, tree attrs)
> +{
> [...]
> +  if ((!data->error_seen && data->fndecl_seen)
> +  || data->tokens.length () != 1)
> +{
> +  error_at (loc, "%<#pragma oacc routine%> not followed by a single "
> + "function declaration or definition");

"a single".

(I have not verified all of the parser(s) source code.)


Grüße
 Thomas


Re: [PATCH, PR46032] Handle BUILT_IN_GOMP_PARALLEL in ipa-pta

2015-12-01 Thread Tom de Vries


On 01/12/15 13:29, Jakub Jelinek wrote:
> On Tue, Dec 01, 2015 at 01:27:32PM +0100, Christophe Lyon wrote:
> > I've committed the attached patch as obvious: it adds
> > dg-require-effective-target fopenmp to these new tests
> > so that they are skipped e.g. on arm bare-metal targets
> > (using newlib).
> >
> > Note that pr46032.c has some failures:
> > FAIL:  gcc.dg/pr46032.c scan-tree-dump-times vect "note: vectorized 1 loop" 1
> > on arm-none-linux-gnueabi, on arm-none-linux-gnueabihf with -mfpu=vfp*,
> > and on armeb-none-linux-gnueabihf
> >
> > I haven't looked at the details yet; see
> > http://people.linaro.org/~christophe.lyon/cross-validation/gcc/trunk/231076/report-build-info.html
> > for more information.
>
> Supposedly pr46032-{2,3}.c should go into testsuite/gcc.dg/gomp/ instead and
> pr46032.c into testsuite/gcc.dg/vect/ (with the fopenmp effective target and
> perhaps some other effective target conditions)?

I've moved the tests, and added dg-require-effective-target vect_int in 
pr46032.c.


Committed to trunk as obvious.

Thanks,
- Tom

Move pr46032*.c tests

2015-12-01  Tom de Vries  

	* gcc.dg/pr46032.c: Move to ...
	* gcc.dg/vect/pr46032.c: here.  Add dg-require-effective-target
	vect_int.
	* gcc.dg/pr46032-2.c: Move to ...
	* gcc.dg/gomp/pr46032-2.c: ... here.  Drop dg-require-effective-target fopenmp.
	* gcc.dg/pr46032-3.c: Move to ...
	* gcc.dg/gomp/pr46032-3.c: ... here.  Drop dg-require-effective-target fopenmp.

---
 gcc/testsuite/gcc.dg/gomp/pr46032-2.c | 29 +
 gcc/testsuite/gcc.dg/gomp/pr46032-3.c | 28 
 gcc/testsuite/gcc.dg/pr46032-2.c  | 30 -
 gcc/testsuite/gcc.dg/pr46032-3.c  | 29 -
 gcc/testsuite/gcc.dg/pr46032.c| 48 --
 gcc/testsuite/gcc.dg/vect/pr46032.c   | 49 +++
 6 files changed, 106 insertions(+), 107 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/gomp/pr46032-2.c b/gcc/testsuite/gcc.dg/gomp/pr46032-2.c
new file mode 100644
index 000..e110880
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/gomp/pr46032-2.c
@@ -0,0 +1,29 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fopenmp -std=c99 -fipa-pta -fdump-tree-optimized" } */
+
+#define N 2
+
+int
+foo (void)
+{
+  int a[N], b[N], c[N];
+  int *ap = &a[0];
+  int *bp = &b[0];
+  int *cp = &c[0];
+
+#pragma omp parallel for
+  for (unsigned int idx = 0; idx < N; idx++)
+{
+  ap[idx] = 1;
+  bp[idx] = 2;
+  cp[idx] = ap[idx];
+}
+
+  return *cp;
+}
+
+/* { dg-final { scan-tree-dump-times "\\] = 1;" 2 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "\\] = 2;" 1 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "\\] = _\[0-9\]*;" 0 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "\\] = " 3 "optimized" } } */
+
diff --git a/gcc/testsuite/gcc.dg/gomp/pr46032-3.c b/gcc/testsuite/gcc.dg/gomp/pr46032-3.c
new file mode 100644
index 000..a4af7ec
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/gomp/pr46032-3.c
@@ -0,0 +1,28 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fopenmp -std=c99 -fipa-pta -fdump-tree-optimized" } */
+
+#define N 2
+
+int
+foo (void)
+{
+  int a[N], c[N];
+  int *ap = &a[0];
+  int *bp = &a[0];
+  int *cp = &c[0];
+
+#pragma omp parallel for
+  for (unsigned int idx = 0; idx < N; idx++)
+{
+  ap[idx] = 1;
+  bp[idx] = 2;
+  cp[idx] = ap[idx];
+}
+
+  return *cp;
+}
+
+/* { dg-final { scan-tree-dump-times "\\] = 1;" 1 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "\\] = 2;" 1 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "\\] = _\[0-9\]*;" 1 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "\\] = " 3 "optimized" } } */
diff --git a/gcc/testsuite/gcc.dg/pr46032-2.c b/gcc/testsuite/gcc.dg/pr46032-2.c
deleted file mode 100644
index d769597..000
--- a/gcc/testsuite/gcc.dg/pr46032-2.c
+++ /dev/null
@@ -1,30 +0,0 @@
-/* { dg-do compile } */
-/* { dg-require-effective-target fopenmp } */
-/* { dg-options "-O2 -fopenmp -std=c99 -fipa-pta -fdump-tree-optimized" } */
-
-#define N 2
-
-int
-foo (void)
-{
-  int a[N], b[N], c[N];
-  int *ap = &a[0];
-  int *bp = &b[0];
-  int *cp = &c[0];
-
-#pragma omp parallel for
-  for (unsigned int idx = 0; idx < N; idx++)
-{
-  ap[idx] = 1;
-  bp[idx] = 2;
-  cp[idx] = ap[idx];
-}
-
-  return *cp;
-}
-
-/* { dg-final { scan-tree-dump-times "\\] = 1;" 2 "optimized" } } */
-/* { dg-final { scan-tree-dump-times "\\] = 2;" 1 "optimized" } } */
-/* { dg-final { scan-tree-dump-times "\\] = _\[0-9\]*;" 0 "optimized" } } */
-/* { dg-final { scan-tree-dump-times "\\] = " 3 "optimized" } } */
-
diff --git a/gcc/testsuite/gcc.dg/pr46032-3.c b/gcc/testsuite/gcc.dg/pr46032-3.c
deleted file mode 100644
index a9e74d0..000
--- a/gcc/testsuite/gcc.dg/pr46032-3.c
+++ /dev/null
@@ -1,29 +0,0 @@
-/* { dg-do compile } */
-/* { dg-require-effective-target fopenmp } */
-/* { dg-options "-O2 -fopenmp -std=c99 -fipa-pta -fdump-tree-optimized" } */
-
-#define N 2
-
-int

Re: [PATCH] Derive interface buffers from max name length

2015-12-01 Thread Janne Blomqvist

On Tue, Dec 1, 2015 at 2:54 PM, Bernhard Reutner-Fischer
 wrote:
> These three functions used a hardcoded buffer of 100 but would be better
> off to base off GFC_MAX_SYMBOL_LEN which denotes the maximum length of a
> name in any of our supported standards (63 as of f2003 ff.).

Please use xasprintf() instead (and free the result, of course). One
of my backburner projects is to get rid of these static symbol
buffers, and use dynamic buffers (or the symbol table) instead. We
IIRC already have some ugly hacks by using hashing to get around
GFC_MAX_SYMBOL_LEN when handling mangled symbols. Your patch doesn't
make the situation worse per se, but if you're going to fix it, let's
do it properly.
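
As a rough illustration of the difference between a fixed buffer and a dynamically sized one, here is a minimal sketch. It does not use libiberty: `toy_asprintf` is a hypothetical stand-in built on standard `vsnprintf`, and unlike `xasprintf` it returns NULL on allocation failure instead of aborting.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for an asprintf-style helper: measure the formatted
   length first, then allocate exactly that much.  The caller frees the
   result.  Returns NULL on formatting or allocation failure.  */
static char *toy_asprintf(const char *fmt, ...)
{
  va_list ap, ap2;
  va_start(ap, fmt);
  va_copy(ap2, ap);

  /* First pass: compute the required length without writing anything.  */
  int len = vsnprintf(NULL, 0, fmt, ap);
  va_end(ap);

  char *buf = NULL;
  if (len >= 0) {
    buf = malloc((size_t)len + 1);
    if (buf != NULL)
      /* Second pass: format into the exactly sized buffer.  */
      vsnprintf(buf, (size_t)len + 1, fmt, ap2);
  }
  va_end(ap2);
  return buf;
}
```

A call such as `toy_asprintf ("op_%s", name)` then works for a name of any length, where a `char buf[100]` would silently depend on an upper bound for the name.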

Ok for GCC 7 stage1 with these changes. I don't think it's worth
putting it into GCC 6 at this point anymore, unless this is actually
fixing some bugs that are visible to users?

-- 
Janne Blomqvist

Re: [PATCH] RFC: Use Levenshtein spelling suggestions in Fortran FE

2015-12-01 Thread Steve Kargl

On Tue, Dec 01, 2015 at 01:55:01PM +0100, Bernhard Reutner-Fischer wrote:
> 
> David Malcolm's nice Levenshtein distance spelling check helpers
> were used in some parts of other frontends. This proposed patch adds
> some spelling corrections to the fortran frontend.
> 
> Suggestions are printed if we can find a suitable name, currently
> perusing a very simple cutoff factor:
> /* If more than half of the letters were misspelled, the suggestion is
>likely to be meaningless.  */
> cutoff = MAX (strlen (typo), strlen (best_guess)) / 2;
> which effectively skips names with less than 4 characters.
> For e.g. structures, one could try to be much smarter in an attempt to
> also provide suggestions for single-letter members/components.
> 
> This patch covers (at least partly):
> - user-defined operators
> - structures (types and their components)
> - functions
> - symbols (variables)
> 
> I do not immediately see how to handle subroutines. Ideas?
> 
> If anybody has a testcase where a spelling-suggestion would make sense
> then please pass it along so we maybe can add support for GCC-7.
> 
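
The cutoff rule quoted above can be sketched in isolation. This is a minimal illustration, not the helper code from GCC: `toy_levenshtein` and `toy_suggestion_ok` are hypothetical names, and accepting a guess when the distance is less than or equal to the cutoff is an assumption, since the quoted text only shows how the cutoff is computed.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Classic Levenshtein distance using a single rolling row.
   Returns (size_t)-1 if the row cannot be allocated.  */
static size_t toy_levenshtein(const char *s, const char *t)
{
  size_t n = strlen(s), m = strlen(t);
  size_t *row = malloc((m + 1) * sizeof *row);
  if (row == NULL)
    return (size_t)-1;

  for (size_t j = 0; j <= m; j++)
    row[j] = j;                     /* distance from "" to t[0..j) */

  for (size_t i = 1; i <= n; i++) {
    size_t diag = row[0];           /* D[i-1][j-1] */
    row[0] = i;
    for (size_t j = 1; j <= m; j++) {
      size_t up = row[j];           /* D[i-1][j] */
      size_t best = diag + (s[i - 1] != t[j - 1] ? 1 : 0);  /* substitute */
      if (up + 1 < best)
        best = up + 1;              /* delete from s */
      if (row[j - 1] + 1 < best)
        best = row[j - 1] + 1;      /* insert into s */
      diag = up;
      row[j] = best;
    }
  }

  size_t d = row[m];
  free(row);
  return d;
}

/* Accept the guess only if at most half of the longer name differs.
   The "<=" comparison is an assumption made for this sketch.  */
static bool toy_suggestion_ok(const char *typo, const char *guess)
{
  size_t a = strlen(typo), b = strlen(guess);
  size_t cutoff = (a > b ? a : b) / 2;
  return toy_levenshtein(typo, guess) <= cutoff;
}
```

With this rule a six-letter name tolerates up to three edits, while a three-letter name tolerates only one, which is why short names rarely yield a suggestion.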

What problem are you trying to solve here?  The patch looks like
unneeded complexity with the result of injecting C++ idioms into
the Fortran FE.

-- 
Steve

Re: [1/2] OpenACC routine support

2015-12-01 Thread Cesar Philippidis

On 12/01/2015 06:40 AM, Thomas Schwinge wrote:

> I noticed while working on other test cases:
> 
> On Wed, 18 Nov 2015 11:02:01 -0800, Cesar Philippidis 
>  wrote:
>> --- a/gcc/cp/parser.c
>> +++ b/gcc/cp/parser.c
> 
>> @@ -1318,13 +1318,21 @@ cp_finalize_omp_declare_simd (cp_parser *parser, 
>> tree fndecl)
>>  }
>>  }
>>  
>> -/* Diagnose if #pragma omp routine isn't followed immediately
>> -   by function declaration or definition.   */
>> +/* Diagnose if #pragma acc routine isn't followed immediately by function
>> +   declaration or definition.  */
>>  
>>  static inline void
>>  cp_ensure_no_oacc_routine (cp_parser *parser)
>>  {
>> -  cp_finalize_oacc_routine (parser, NULL_TREE, false, true);
>> +  if (parser->oacc_routine && !parser->oacc_routine->error_seen)
>> +{
>> +  tree clauses = parser->oacc_routine->clauses;
>> +  location_t loc = OMP_CLAUSE_LOCATION (TREE_PURPOSE(clauses));
>> +
>> +  error_at (loc, "%<#pragma oacc routine%> not followed by function "
>> +"declaration or definition");
>> +  parser->oacc_routine = NULL;
>> +}
>>  }
> 
> "#pragma acc routine", not "oacc".  Also in a few other places.

Good eyes. Thanks for catching that.

> Next, in the function quoted above, you use "not followed by function
> declaration or definition", but you use "not followed by a single
> function declaration or definition" in a lot of (but not all) other
> places -- is that intentional?

I probably wasn't being consistent. Which error message do you prefer?
I'll take a look at what the c front end does.

> For example:
> 
>>  cp_parser_oacc_routine (cp_parser *parser, cp_token *pragma_tok,
>>  enum pragma_context context)
>>  {
>> [...]
>> +  error_at (OMP_CLAUSE_LOCATION (parser->oacc_routine->clauses),
>> +"%<#pragma oacc routine%> not followed by a single "
>> +"function declaration or definition");
> 
> "a single".
> 
>> [...]
>> +  if (parser->oacc_routine
>> +  && !parser->oacc_routine->error_seen
>> +  && !parser->oacc_routine->fndecl_seen)
>> +error_at (loc, "%<#pragma acc routine%> not followed by "
>> +  "function declaration or definition");
> 
> Not "a single".
> 
>> +
>> +  data.tokens.release ();
>> +  parser->oacc_routine = NULL;
>> +}
>> +}
>> +}
>> +
>> +/* Finalize #pragma acc routine clauses after direct declarator has
>> +   been parsed, and put that into "oacc routine" attribute.  */
> 
> There is no "oacc routine" attribute (anymore)?

You're right, it was renamed to 'oacc function'.

>> +static tree
>> +cp_parser_late_parsing_oacc_routine (cp_parser *parser, tree attrs)
>> +{
>> [...]
>> +  if ((!data->error_seen && data->fndecl_seen)
>> +  || data->tokens.length () != 1)
>> +{
>> +  error_at (loc, "%<#pragma oacc routine%> not followed by a single "
>> +"function declaration or definition");
> 
> "a single".
> 
> (I have not verified all of the parser(s) source code.)

Thanks. I'll go through and update the comments and error messages.

Cesar

Re: RFC: Merge the GUPC branch into the GCC 6.0 trunk

2015-12-01 Thread Andi Kleen

Bernd Schmidt  writes:

> I'm worried we'll end up carrying
> something around as a burden that is of no practical use (considering
> we already support the more widespread OpenMP).

I'm not an expert on UPC, but from glancing over the description it
seems to target a distributed message passing programming model,
which is very different from OpenMP. I don't think any of the existing
parallelization models in gcc (OpenMP, cilk) support that niche.

-Andi
-- 
a...@linux.intel.com -- Speaking for myself only

RFD: annotate iterator patterns with expanded forms

2015-12-01 Thread Bernd Schmidt

One problem I have whenever I try to edit i386.md is that I can't find
the patterns I'm looking for. Let's say I'm looking for lshrsi3, but
there's no pattern by this name, what I'm looking for is
"<shift_insn><mode>3". Even worse are things like "*xordi_2", which has
just "*<code><mode>_2" and can't reasonably be searched for.


I've made a little proof-of-concept patch which makes gensupport 
generate ed scripts that can be applied to machine descriptions after 
some post processing. I'm attaching that patch, and the effect of the 
annotations on i386.md.


What should I do with this? Would people like to see a fully automatic method of 
updating machine descriptions? Should we just generate them once for the 
most difficult files such as i386.md and apply them? Or do people find 
the additional comments to be visual clutter (the i386.md ones are 
brief, but the avx patterns in sse.md would end up with pretty long lists)?



Bernd
diff --git a/gcc/gensupport.c b/gcc/gensupport.c
index 484ead2..4daaef9 100644
--- a/gcc/gensupport.c
+++ b/gcc/gensupport.c
@@ -2236,6 +2236,32 @@ rtx_handle_directive (file_location loc, const char *rtx_name)
 
   rtx x;
   unsigned int i;
+  if (subrtxs.length () > 1
+  && (GET_CODE (subrtxs[0]) == DEFINE_INSN
+	  || GET_CODE (subrtxs[0]) == DEFINE_EXPAND))
+{
+  const char *p = "";
+  fprintf (stderr, "%s:%d\\ni\\n;; Expands to:\\n;; ", loc.filename, loc.lineno);
+  int len = 3;
+  int p_len = 0;
+  FOR_EACH_VEC_ELT (subrtxs, i, x)
+	{
+	  int this_len = strlen (XSTR (x, 0));
+	  if (len + this_len + p_len >= 78)
+	{
+	  fprintf (stderr, "\\n;; ");
+	  len = 3;
+	  p = "";
+	  p_len = 0;
+	}
+	  fprintf (stderr, "%s%s", p, XSTR (x, 0));
+	  len += this_len + p_len;
+	  p = ", ";
+	  p_len = 2;
+	}
+  fprintf (stderr, "\\n.\n");
+}
+
   FOR_EACH_VEC_ELT (subrtxs, i, x)
 process_rtx (x, loc);
 }
--- ../../git/gcc/config/i386/i386.md	2015-11-30 14:34:27.995459571 +0100
+++ ./i386.md	2015-12-01 15:58:59.817779596 +0100
@@ -1199,6 +1199,8 @@
 
 ;; Compare and branch/compare and store instructions.
 
+;; Expands to:
+;; cbranchqi4, cbranchhi4, cbranchsi4, cbranchdi4, cbranchti4
 (define_expand "cbranch4"
   [(set (reg:CC FLAGS_REG)
 	(compare:CC (match_operand:SDWIM 1 "nonimmediate_operand")
@@ -1217,6 +1219,8 @@
   DONE;
 })
 
+;; Expands to:
+;; cstoreqi4, cstorehi4, cstoresi4, cstoredi4
 (define_expand "cstore4"
   [(set (reg:CC FLAGS_REG)
 	(compare:CC (match_operand:SWIM 2 "nonimmediate_operand")
@@ -1233,11 +1237,15 @@
   DONE;
 })
 
+;; Expands to:
+;; cmpsi_1, cmpdi_1
 (define_expand "cmp_1"
   [(set (reg:CC FLAGS_REG)
 	(compare:CC (match_operand:SWI48 0 "nonimmediate_operand")
 		(match_operand:SWI48 1 "")))])
 
+;; Expands to:
+;; *cmpqi_ccno_1, *cmphi_ccno_1, *cmpsi_ccno_1, *cmpdi_ccno_1
 (define_insn "*cmp_ccno_1"
   [(set (reg FLAGS_REG)
 	(compare (match_operand:SWI 0 "nonimmediate_operand" ",?m")
@@ -1251,6 +1259,8 @@
(set_attr "modrm_class" "op0,unknown")
(set_attr "mode" "")])
 
+;; Expands to:
+;; *cmpqi_1, *cmphi_1, *cmpsi_1, *cmpdi_1
 (define_insn "*cmp_1"
   [(set (reg FLAGS_REG)
 	(compare (match_operand:SWI 0 "nonimmediate_operand" "m,")
@@ -1260,6 +1270,8 @@
   [(set_attr "type" "icmp")
(set_attr "mode" "")])
 
+;; Expands to:
+;; *cmpqi_minus_1, *cmphi_minus_1, *cmpsi_minus_1, *cmpdi_minus_1
 (define_insn "*cmp_minus_1"
   [(set (reg FLAGS_REG)
 	(compare
@@ -1382,6 +1394,8 @@
   DONE;
 })
 
+;; Expands to:
+;; cbranchsf4, cbranchdf4
 (define_expand "cbranch4"
   [(set (reg:CC FLAGS_REG)
 	(compare:CC (match_operand:MODEF 1 "cmp_fp_expander_operand")
@@ -1399,6 +1413,8 @@
   DONE;
 })
 
+;; Expands to:
+;; cstoresf4, cstoredf4
 (define_expand "cstore4"
   [(set (reg:CC FLAGS_REG)
 	(compare:CC (match_operand:MODEF 2 "cmp_fp_expander_operand")
@@ -1450,6 +1466,8 @@
 ;; We may not use "#" to split and emit these, since the REG_DEAD notes
 ;; used to manage the reg stack popping would not be preserved.
 
+;; Expands to:
+;; *cmpsf_0_i387, *cmpdf_0_i387, *cmpxf_0_i387
 (define_insn "*cmp_0_i387"
   [(set (match_operand:HI 0 "register_operand" "=a")
 	(unspec:HI
@@ -1516,6 +1534,8 @@
(set_attr "unit" "i387")
(set_attr "mode" "XF")])
 
+;; Expands to:
+;; *cmpsf_i387, *cmpdf_i387
 (define_insn "*cmp_i387"
   [(set (match_operand:HI 0 "register_operand" "=a")
 	(unspec:HI
@@ -1549,6 +1569,8 @@
(set_attr "unit" "i387")
(set_attr "mode" "")])
 
+;; Expands to:
+;; *cmpusf_i387, *cmpudf_i387, *cmpuxf_i387
 (define_insn "*cmpu_i387"
   [(set (match_operand:HI 0 "register_operand" "=a")
 	(unspec:HI
@@ -1582,6 +1604,9 @@
(set_attr "unit" "i387")
(set_attr "mode" "")])
 
+;; Expands to:
+;; *cmpsf_hi_i387, *cmpdf_hi_i387, *cmpxf_hi_i387, *cmpsf_si_i387
+;; *cmpdf_si_i387, *cmpxf_si_i387
 (define_insn "*cmp__i387"
   [(set (match_operand:HI 0 "register_operand" "=a")
 	(unspec:HI
@@ -1666,6 +1691,8 @@
 (define_mode_iterator FPCMP [CCFP CCFPU])
 (define_mode_attr unord [(CCFP "") (CCFPU "u")])
 
+;; E

[PATCH] Add testcase for tree-optimization/67916

2015-12-01 Thread Marek Polacek

This PR was fixed in r228767 (or went latent?), but this testcase has never
been added.

Tested on x86_64-linux, ok for trunk?

2015-12-01  Marek Polacek  

PR tree-optimization/67916
* gcc.dg/torture/pr67916.c: New test.

diff --git gcc/testsuite/gcc.dg/torture/pr67916.c 
gcc/testsuite/gcc.dg/torture/pr67916.c
index e69de29..88541f9 100644
--- gcc/testsuite/gcc.dg/torture/pr67916.c
+++ gcc/testsuite/gcc.dg/torture/pr67916.c
@@ -0,0 +1,46 @@
+/* PR tree-optimization/67916 */
+/* { dg-do run } */
+
+int a[6], b = 1, d, e;
+long long c;
+static int f = 1;
+
+void
+fn1 (int p1)
+{
+  b = (b >> 1) & (1 ^ a[(1 ^ p1) & 5]);
+}
+
+void
+fn2 ()
+{
+  b = (b >> 1) & (1 ^ a[(b ^ 1) & 1]);
+  fn1 (c >> 1 & 5);
+  fn1 (c >> 2 & 5);
+  fn1 (c >> 4 & 5);
+  fn1 (c >> 8 & 5);
+}
+
+int
+main ()
+{
+  int i, j;
+  for (; d;)
+{
+  for (; e;)
+   fn2 ();
+  f = 0;
+}
+  for (i = 0; i < 8; i++)
+{
+  if (f)
+   i = 9;
+  for (j = 0; j < 7; j++)
+   fn2 ();
+}
+
+  if (b != 0)
+__builtin_abort ();
+
+  return 0;
+}

Marek

Re: [PATCH] Don't ignore noreturn functions for "unused" warning (PR middle-end/68582)

2015-12-01 Thread Richard Biener

On Tue, Dec 1, 2015 at 3:47 PM, Marek Polacek  wrote:
> We were failing to give "defined but not used" warning for functions marked
> with the attribute noreturn/volatile.  The problem is that for functions the
> TREE_THIS_VOLATILE flag means something different than for decls.  The fix is
> to check the flag only for VAR_DECLs, as suggested by Richi in the PR.
>
> Bootstrapped/regtested on x86_64-linux, ok for trunk?

Ok.

Thanks.
Richard.

> 2015-12-01  Marek Polacek  
>
> PR middle-end/68582
> * cgraphunit.c (check_global_declaration): Only depend on 
> TREE_THIS_VOLATILE
> for VAR_DECLs.
>
> * c-c++-common/pr68582.c: New test.
>
> diff --git gcc/cgraphunit.c gcc/cgraphunit.c
> index f73d9a7..4ce5f9b 100644
> --- gcc/cgraphunit.c
> +++ gcc/cgraphunit.c
> @@ -956,7 +956,7 @@ check_global_declaration (symtab_node *snode)
>&& ! DECL_ABSTRACT_ORIGIN (decl)
>&& ! TREE_PUBLIC (decl)
>/* A volatile variable might be used in some non-obvious way.  */
> -  && ! TREE_THIS_VOLATILE (decl)
> +  && (! VAR_P (decl) || ! TREE_THIS_VOLATILE (decl))
>/* Global register variables must be declared to reserve them.  */
>&& ! (TREE_CODE (decl) == VAR_DECL && DECL_REGISTER (decl))
>/* Global ctors and dtors are called by the runtime.  */
> diff --git gcc/testsuite/c-c++-common/pr68582.c 
> gcc/testsuite/c-c++-common/pr68582.c
> index e69de29..95ca9a4 100644
> --- gcc/testsuite/c-c++-common/pr68582.c
> +++ gcc/testsuite/c-c++-common/pr68582.c
> @@ -0,0 +1,25 @@
> +/* PR middle-end/68582 */
> +/* { dg-do compile } */
> +/* { dg-options "-Wunused-function" } */
> +
> +/* We failed to give the warning for functions with TREE_THIS_VOLATILE set.  
> */
> +
> +static void
> +fn1 (void) /* { dg-warning "defined but not used" } */
> +{
> +  __builtin_abort ();
> +}
> +
> +__attribute__ ((noreturn))
> +static void
> +fn2 (void) /* { dg-warning "defined but not used" } */
> +{
> +  __builtin_abort ();
> +}
> +
> +__attribute__ ((volatile))
> +static void
> +fn3 (void) /* { dg-warning "defined but not used" } */
> +{
> +  __builtin_abort ();
> +}
>
> Marek

Re: [PATCH] Add testcase for tree-optimization/67916

2015-12-01 Thread Richard Biener

On Tue, Dec 1, 2015 at 4:18 PM, Marek Polacek  wrote:
> This PR was fixed in r228767 (or went latent?), but this testcase has never
> been added.
>
> Tested on x86_64-linux, ok for trunk?

Ok.

Richard.

> 2015-12-01  Marek Polacek  
>
> PR tree-optimization/67916
> * gcc.dg/torture/pr67916.c: New test.
>
> diff --git gcc/testsuite/gcc.dg/torture/pr67916.c 
> gcc/testsuite/gcc.dg/torture/pr67916.c
> index e69de29..88541f9 100644
> --- gcc/testsuite/gcc.dg/torture/pr67916.c
> +++ gcc/testsuite/gcc.dg/torture/pr67916.c
> @@ -0,0 +1,46 @@
> +/* PR tree-optimization/67916 */
> +/* { dg-do run } */
> +
> +int a[6], b = 1, d, e;
> +long long c;
> +static int f = 1;
> +
> +void
> +fn1 (int p1)
> +{
> +  b = (b >> 1) & (1 ^ a[(1 ^ p1) & 5]);
> +}
> +
> +void
> +fn2 ()
> +{
> +  b = (b >> 1) & (1 ^ a[(b ^ 1) & 1]);
> +  fn1 (c >> 1 & 5);
> +  fn1 (c >> 2 & 5);
> +  fn1 (c >> 4 & 5);
> +  fn1 (c >> 8 & 5);
> +}
> +
> +int
> +main ()
> +{
> +  int i, j;
> +  for (; d;)
> +{
> +  for (; e;)
> +   fn2 ();
> +  f = 0;
> +}
> +  for (i = 0; i < 8; i++)
> +{
> +  if (f)
> +   i = 9;
> +  for (j = 0; j < 7; j++)
> +   fn2 ();
> +}
> +
> +  if (b != 0)
> +__builtin_abort ();
> +
> +  return 0;
> +}
>
> Marek

[gomp-nvptx 7/9] nvptx mkoffload: pass -mgomp for OpenMP offloading

2015-12-01 Thread Alexander Monakov

This patch wires up use of alternative -mgomp multilib for OpenMP offloading
via nvptx mkoffload.  It makes OpenACC and OpenMP incompatible for
simultaneous offloading compilation, so I've added a diagnostic for that.

* config/nvptx/mkoffload.c (main): Check that either OpenACC or OpenMP
is selected.  Pass -mgomp to offload compiler in OpenMP case.
---
 gcc/config/nvptx/mkoffload.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/gcc/config/nvptx/mkoffload.c b/gcc/config/nvptx/mkoffload.c
index 7aa6f09..9a5d36d 100644
--- a/gcc/config/nvptx/mkoffload.c
+++ b/gcc/config/nvptx/mkoffload.c
@@ -460,6 +460,7 @@ main (int argc, char **argv)
 
   /* Scan the argument vector.  */
   bool fopenmp = false;
+  bool fopenacc = false;
   for (int i = 1; i < argc; i++)
 {
 #define STR "-foffload-abi="
@@ -476,11 +477,15 @@ main (int argc, char **argv)
 #undef STR
   else if (strcmp (argv[i], "-fopenmp") == 0)
fopenmp = true;
+  else if (strcmp (argv[i], "-fopenacc") == 0)
+   fopenacc = true;
   else if (strcmp (argv[i], "-save-temps") == 0)
save_temps = true;
   else if (strcmp (argv[i], "-v") == 0)
verbose = true;
 }
+  if (!(fopenacc ^ fopenmp))
+fatal_error (input_location, "either -fopenacc or -fopenmp must be set");
 
   struct obstack argv_obstack;
   obstack_init (&argv_obstack);
@@ -501,6 +506,8 @@ main (int argc, char **argv)
 default:
   gcc_unreachable ();
 }
+  if (fopenmp)
+obstack_ptr_grow (&argv_obstack, "-mgomp");
 
   for (int ix = 1; ix != argc; ix++)
 {
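
The option check in this patch can be tried in isolation with a small host-side sketch. This is only an illustration, not code from mkoffload.c: the function name `exactly_one_offload_model` is hypothetical, and it simply mirrors the `fopenacc ^ fopenmp` test. Because of the exclusive-or, the diagnostic fires both when neither option is given and when both are.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: scan argv the way the patched loop does and
   report whether exactly one of -fopenacc / -fopenmp was given.  */
static bool
exactly_one_offload_model (int argc, char **argv)
{
  bool fopenmp = false;
  bool fopenacc = false;

  for (int i = 1; i < argc; i++)
    {
      if (strcmp (argv[i], "-fopenmp") == 0)
        fopenmp = true;
      else if (strcmp (argv[i], "-fopenacc") == 0)
        fopenacc = true;
    }

  /* True for exactly one of the two; false for none or both.  */
  return fopenacc ^ fopenmp;
}
```

In the patch itself a false result leads to fatal_error; here the caller decides what to do with it.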

[gomp-nvptx 4/9] nvptx backend: add -mgomp option and multilib

2015-12-01 Thread Alexander Monakov

Since OpenMP offloading requires both soft-stacks and "uniform SIMT", both
non-traditional codegen variants, I'm building a multilib variant with those
enabled.  This patch adds option -mgomp which enables -msoft-stack plus
-muniform-simt, and builds a multilib with it.

* config/nvptx/nvptx.c (nvptx_option_override): Handle TARGET_GOMP.
* config/nvptx/nvptx.opt (mgomp): New option.
* config/nvptx/t-nvptx (MULTILIB_OPTIONS): New.
* doc/invoke.texi (mgomp): Document.
---
 gcc/config/nvptx/nvptx.c   | 3 +++
 gcc/config/nvptx/nvptx.opt | 4 
 gcc/config/nvptx/t-nvptx   | 2 ++
 gcc/doc/invoke.texi| 5 +
 4 files changed, 14 insertions(+)

diff --git a/gcc/config/nvptx/nvptx.c b/gcc/config/nvptx/nvptx.c
index 3bd3cf7..48ee96e 100644
--- a/gcc/config/nvptx/nvptx.c
+++ b/gcc/config/nvptx/nvptx.c
@@ -153,6 +153,9 @@ nvptx_option_override (void)
 
   worker_red_sym = gen_rtx_SYMBOL_REF (Pmode, worker_red_name);
   worker_red_align = GET_MODE_ALIGNMENT (SImode) / BITS_PER_UNIT;
+
+  if (TARGET_GOMP)
+target_flags |= MASK_SOFT_STACK | MASK_UNIFORM_SIMT;
 }
 
 /* Return the mode to be used when declaring a ptx object for OBJ.
diff --git a/gcc/config/nvptx/nvptx.opt b/gcc/config/nvptx/nvptx.opt
index 47e811e..8826659 100644
--- a/gcc/config/nvptx/nvptx.opt
+++ b/gcc/config/nvptx/nvptx.opt
@@ -36,3 +36,7 @@ Use custom stacks instead of local memory for automatic 
storage.
 muniform-simt
 Target Report Mask(UNIFORM_SIMT)
 Generate code that executes all threads in a warp as if one was active.
+
+mgomp
+Target Report Mask(GOMP)
+Generate code for OpenMP offloading: enables -msoft-stack and -muniform-simt.
diff --git a/gcc/config/nvptx/t-nvptx b/gcc/config/nvptx/t-nvptx
index e2580c9..6c1010d 100644
--- a/gcc/config/nvptx/t-nvptx
+++ b/gcc/config/nvptx/t-nvptx
@@ -8,3 +8,5 @@ ALL_HOST_OBJS += mkoffload.o
 mkoffload$(exeext): mkoffload.o collect-utils.o libcommon-target.a 
$(LIBIBERTY) $(LIBDEPS)
+$(LINKER) $(ALL_LINKERFLAGS) $(LDFLAGS) -o $@ \
  mkoffload.o collect-utils.o libcommon-target.a $(LIBIBERTY) $(LIBS)
+
+MULTILIB_OPTIONS = mgomp
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 46cd2e9..7e7f3b4 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -18956,6 +18956,11 @@ all-ones bitmasks for each warp, indicating current 
mode (0 outside of SIMD
 regions).  Each thread can bitwise-and the bitmask at position @code{tid.y}
 with current lane index to compute the master lane index.
 
+@item -mgomp
+@opindex mgomp
+Generate code for use in OpenMP offloading: enables @option{-msoft-stack} and
+@option{-muniform-simt} options, and selects corresponding multilib variant.
+
 @end table
 
 @node PDP-11 Options

[gomp-nvptx 2/9] nvptx backend: new "uniform SIMT" codegen variant

2015-12-01 Thread Alexander Monakov

This patch introduces a code generation variant for NVPTX that I'm using for
SIMD work in OpenMP offloading.  Let me try to explain the idea behind it...

In place of SIMD vectorization, NVPTX is using SIMT (single
instruction/multiple threads) execution: groups of 32 threads execute the same
instruction, with some threads possibly masked off if under a divergent branch.
So we are mapping OpenMP threads to such thread groups ("warps"), and hardware
threads are then mapped to OpenMP SIMD lanes.

We need to reach heads of SIMD regions with all hw threads active, because
there's no way to "resurrect" them once masked off: they need to follow the
same control flow, and reach the SIMD region entry with the same local state
(registers, and stack too for OpenACC).

The approach in OpenACC is to, outside of "vector" loops, 1) make threads 1-31
"slaves" which just follow branches without any computation -- that requires
extra jumps and broadcasting branch predicates, -- and 2) broadcast register
state and stack state from master to slaves when entering "vector" regions.

I'm taking a different approach.  I want to execute all insns in all warp
members, while ensuring that the effect (on global and local state) is the same
as if any single thread was executing that instruction.  Most instructions
automatically satisfy that: if threads have the same state, then executing an
arithmetic instruction, normal memory load/store, etc. keeps local state the
same in all threads.

The two exception insn categories are atomics and calls.  For calls, we can
demand recursively that they uphold this execution model, until we reach
runtime-provided "syscalls": malloc/free/vprintf.  Those we can handle like
atomics.

To handle atomics, we
  1) execute the atomic conditionally only in one warp member -- so its side
  effect happens once;
  2) copy the register that was set from that warp member to others -- so
  local state is kept synchronized:

atom.op dest, ...

becomes

/* pred = (current_lane == 0);  */
@pred atom.op dest, ...
shuffle.idx dest, dest, /*srclane=*/0

So the overhead is one shuffle insn following each atomic, plus predicate
setup in the prologue.
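
The rewrite above can be modeled on the host with a short C sketch. It is an illustration under stated assumptions, not code from the patch: `struct warp_model` and `unisimt_atomic_add` are hypothetical names, the warp is simulated with a plain loop over 32 lanes, and an atomic add stands in for the generic `atom.op`.

```c
#include <assert.h>

#define WARP_SIZE 32

/* Hypothetical host-side model of one warp: each lane has its own
   copy of the destination register 'dest'.  */
struct warp_model
{
  int dest[WARP_SIZE];
};

/* Model of "atom.add dest, [mem], val" in uniform-SIMT style.  */
static void
unisimt_atomic_add (struct warp_model *w, int *mem, int val)
{
  assert (w && mem);

  for (int lane = 0; lane < WARP_SIZE; lane++)
    {
      int pred = (lane == 0);   /* pred = (current_lane == 0) */
      if (pred)
        {
          /* @pred atom.op dest, ...: the side effect happens once,
             and the atomic returns the old memory value.  */
          w->dest[lane] = *mem;
          *mem += val;
        }
    }

  /* shuffle.idx dest, dest, 0: every lane copies lane 0's register,
     so local state is identical across the warp again.  */
  int from_lane0 = w->dest[0];
  for (int lane = 0; lane < WARP_SIZE; lane++)
    w->dest[lane] = from_lane0;
}
```

After one call, memory has been updated exactly once and all 32 modeled lanes hold the same old value in `dest`.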

OK, so the above handles execution out of SIMD regions nicely, but then we'd
also need to run code inside of SIMD regions, where we need to turn off this
synching effect.  Turns out we can keep atomics decorated almost like before:

@pred atom.op dest, ...
shuffle.idx dest, dest, master_lane

and compute 'pred' and 'master_lane' accordingly: outside of SIMD regions we
need (master_lane == 0 && pred == (current_lane == 0)), and inside we need
(master_lane == current_lane && pred == true) (so that shuffle is no-op, and
predicate is 'true' for all lanes).  Then, pred = (current_lane ==
master_lane) works in both cases, and we just need to set up master_lane
accordingly: master_lane = current_lane & mask, where mask is all-0 outside of
SIMD regions, and all-1 inside.  To store these per-warp masks, I've
introduced another shared memory array, __nvptx_uni.
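
The mask arithmetic described above is easy to check with a few lines of C. This is a sketch only: the function names are hypothetical, and `mask` stands for the per-warp value that the message says is kept in __nvptx_uni (0 outside SIMD regions, all-ones inside).

```c
#include <assert.h>

#define WARP_SIZE 32

/* master_lane = current_lane & mask
   mask == 0  (outside SIMD): every lane selects lane 0 as master.
   mask == -1 (inside SIMD):  every lane is its own master.  */
static int
unisimt_master_lane (int current_lane, int mask)
{
  assert (current_lane >= 0 && current_lane < WARP_SIZE);
  return current_lane & mask;
}

/* pred = (current_lane == master_lane): true only for lane 0 outside
   SIMD regions, and true for all lanes inside them.  */
static int
unisimt_pred (int current_lane, int mask)
{
  return current_lane == unisimt_master_lane (current_lane, mask);
}
```

With mask 0 the shuffle reads from lane 0 and only lane 0 runs the atomic; with mask -1 the shuffle is a no-op and every lane runs it.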

* config/nvptx/nvptx.c (need_unisimt_decl): New variable.  Set it...
(nvptx_init_unisimt_predicate): ...here (new function) and use it...
(nvptx_file_end): ...here to emit declaration of __nvptx_uni array.
(nvptx_declare_function_name): Call nvptx_init_unisimt_predicate.
(nvptx_get_unisimt_master): New helper function.
(nvptx_get_unisimt_predicate): Ditto.
(nvptx_call_insn_is_syscall_p): Ditto.
(nvptx_unisimt_handle_set): Ditto.
(nvptx_reorg_uniform_simt): New.  Transform code for -muniform-simt.
(nvptx_get_axis_predicate): New helper function, factored out from...
(nvptx_single): ...here.
(nvptx_reorg): Call nvptx_reorg_uniform_simt.
* config/nvptx/nvptx.h (TARGET_CPU_CPP_BUILTINS): Define
__nvptx_unisimt__ when -muniform-simt option is active.
(struct machine_function): Add unisimt_master, unisimt_predicate
rtx fields.
* config/nvptx/nvptx.md (divergent): New attribute.
(atomic_compare_and_swap_1): Mark as divergent.
(atomic_exchange): Ditto.
(atomic_fetch_add): Ditto.
(atomic_fetch_addsf): Ditto.
(atomic_fetch_): Ditto.
* config/nvptx/nvptx.opt (muniform-simt): New option.
* doc/invoke.texi (-muniform-simt): Document.
---
 gcc/config/nvptx/nvptx.c   | 138 ++---
 gcc/config/nvptx/nvptx.h   |   4 ++
 gcc/config/nvptx/nvptx.md  |  18 --
 gcc/config/nvptx/nvptx.opt |   4 ++
 gcc/doc/invoke.texi|  14 +
 5 files changed, 165 insertions(+), 13 deletions(-)

diff --git a/gcc/config/nvptx/nvptx.c b/gcc/config/nvptx/nvptx.c
index 2dad3e2..9209b47 100644
--- a/gcc/config/nvptx/nvptx.c
+++ b/gcc/config/nvptx/nvptx.c
@@ -117,6 +117,9 @@ static GTY(()) rtx worker_red_sym;
 /* True if any function references __nvptx_stacks.  */
 static bool need_softstack_decl;
 
+/* True

Re: RFC: Merge the GUPC branch into the GCC 6.0 trunk

2015-12-01 Thread Richard Biener

On Tue, 1 Dec 2015, Andi Kleen wrote:

> Bernd Schmidt  writes:
> 
> > I'm worried we'll end up carrying
> > something around as a burden that is of no practical use (considering
> > we already support the more widespread OpenMP).
> 
> I'm not an expert on UPC, but from glancing over the description it
> seems to target a distributed message passing programming model,
> which is very different from OpenMP. I don't think any of the existing
> parallelization models in gcc (OpenMP, cilk) support that niche.

Fortran CoArrays do though.  Ok, slightly irrelevant...

Btw, I don't think we should talk about "no practical use" given
we took openACC.

Richard.

[gomp-nvptx 8/9] libgomp: update gomp_nvptx_main for -mgomp

2015-12-01 Thread Alexander Monakov

Here's how I've updated gomp_nvptx_main to set up shared memory arrays
__nvptx_stacks and __nvptx_uni for -mgomp.  Since it makes sense only for
-mgomp multilib, I've wrapped the whole file under #ifdef that checks
corresponding built-in macros.

Reaching those shared memory arrays is awkward.  I cannot declare them with
toplevel asms because the compiler implicitly declares them too, and ptxas
does not handle duplicate declarations.  Ideally I'd like to be able to say:

extern char *__shared __nvptx_stacks[32];

Bernd, is your position on exposing shared memory as first-class address space
on NVPTX subject to change?  Do you remember what middle-end issues you've
encountered when trying that?

* config/nvptx/team.c (gomp_nvptx_main): Rename to...
(gomp_nvptx_main_1): ... this and mark noinline.
(gomp_nvptx_main): Wrap the above, set up __nvptx_uni and
__nvptx_stacks.
---
 libgomp/config/nvptx/team.c | 37 +
 1 file changed, 29 insertions(+), 8 deletions(-)

diff --git a/libgomp/config/nvptx/team.c b/libgomp/config/nvptx/team.c
index 88d1d34..deb0860 100644
--- a/libgomp/config/nvptx/team.c
+++ b/libgomp/config/nvptx/team.c
@@ -24,6 +24,8 @@
 
 /* This file handles the maintainence of threads on NVPTX.  */
 
+#if defined __nvptx_softstack && defined __nvptx_unisimt__
+
 #include "libgomp.h"
 #include 
 
@@ -31,15 +33,9 @@ struct gomp_thread *nvptx_thrs;
 
 static void gomp_thread_start (struct gomp_thread_pool *);
 
-void
-gomp_nvptx_main (void (*fn) (void *), void *fn_data)
+static void __attribute__((noinline))
+gomp_nvptx_main_1 (void (*fn) (void *), void *fn_data, int ntids, int tid)
 {
-  int ntids, tid, laneid;
-  asm ("mov.u32 %0, %%laneid;" : "=r" (laneid));
-  if (laneid)
-return;
-  asm ("mov.u32 %0, %%tid.y;" : "=r" (tid));
-  asm ("mov.u32 %0, %%ntid.y;" : "=r"(ntids));
   if (tid == 0)
 {
   gomp_global_icv.nthreads_var = ntids;
@@ -72,6 +68,30 @@ gomp_nvptx_main (void (*fn) (void *), void *fn_data)
 }
 }
 
+void
+gomp_nvptx_main (void (*fn) (void *), void *fn_data)
+{
+  int tid, ntids;
+  asm ("mov.u32 %0, %%tid.y;" : "=r" (tid));
+  asm ("mov.u32 %0, %%ntid.y;" : "=r"(ntids));
+  char *stacks = 0;
+  int *__nvptx_uni;
+  asm ("cvta.shared.u64 %0, __nvptx_uni;" : "=r" (__nvptx_uni));
+  __nvptx_uni[tid] = 0;
+  if (tid == 0)
+{
+  size_t stacksize = 131072;
+  stacks = gomp_malloc (stacksize * ntids);
+  char **__nvptx_stacks = 0;
+  asm ("cvta.shared.u64 %0, __nvptx_stacks;" : "=r" (__nvptx_stacks));
+  for (int i = 0; i < ntids; i++)
+   __nvptx_stacks[i] = stacks + stacksize * (i + 1);
+}
+  asm ("bar.sync 0;");
+  gomp_nvptx_main_1 (fn, fn_data, ntids, tid);
+  free (stacks);
+}
+
 /* This function is a pthread_create entry point.  This contains the idle
loop in which a thread waits to be called up to become part of a team.  */
 
@@ -160,3 +180,4 @@ gomp_team_start (void (*fn) (void *), void *data, unsigned 
nthreads,
 }
 
 #include "../../team.c"
+#endif
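
The stack setup in gomp_nvptx_main above carves one allocation into per-warp regions and records a pointer to the end of each region. A host-side sketch of that pointer arithmetic, with a hypothetical helper name and no PTX involved, might look like this. Storing the end rather than the start presumably reflects a soft stack that grows downward, which matches crt0.c storing `gstack + sizeof gstack`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: split 'base' (stacksize * n bytes) into n
   consecutive regions and store the end of region i in tops[i],
   as the loop over __nvptx_stacks does in the patch.  */
static void
init_stack_tops (char **tops, char *base, size_t stacksize, int n)
{
  assert (tops && base && n >= 0);

  for (int i = 0; i < n; i++)
    tops[i] = base + stacksize * (i + 1);
}
```

Region i then spans from `tops[i] - stacksize` up to `tops[i]`, and adjacent regions do not overlap.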

[gomp-nvptx 5/9] new target hook: TARGET_SIMT_VF

2015-12-01 Thread Alexander Monakov

This patch adds a new target hook and implements it in a straightforward
manner on NVPTX to indicate that the target is running in SIMT fashion with 32
threads in a synchronous group ("warp").  For use in OpenMP transforms.
---
 gcc/config/nvptx/nvptx.c | 12 
 gcc/doc/tm.texi  |  4 
 gcc/doc/tm.texi.in   |  2 ++
 gcc/target.def   | 12 
 4 files changed, 30 insertions(+)

diff --git a/gcc/config/nvptx/nvptx.c b/gcc/config/nvptx/nvptx.c
index 48ee96e..eb3b67e 100644
--- a/gcc/config/nvptx/nvptx.c
+++ b/gcc/config/nvptx/nvptx.c
@@ -3684,10 +3684,19 @@ nvptx_expand_builtin (tree exp, rtx target, rtx 
ARG_UNUSED (subtarget),
 }
 }
 
+
 /* Define dimension sizes for known hardware.  */
 #define PTX_VECTOR_LENGTH 32
 #define PTX_WORKER_LENGTH 32
 
+/* Implement TARGET_SIMT_VF target hook: number of threads in a warp.  */
+
+static int
+nvptx_simt_vf ()
+{
+  return PTX_VECTOR_LENGTH;
+}
+
 /* Validate compute dimensions of an OpenACC offload or routine, fill
in non-unity defaults.  FN_LEVEL indicates the level at which a
routine might spawn a loop.  It is negative for non-routines.  */
@@ -4258,6 +4267,9 @@ nvptx_goacc_reduction (gcall *call)
 #undef  TARGET_BUILTIN_DECL
 #define TARGET_BUILTIN_DECL nvptx_builtin_decl
 
+#undef TARGET_SIMT_VF
+#define TARGET_SIMT_VF nvptx_simt_vf
+
 #undef TARGET_GOACC_VALIDATE_DIMS
 #define TARGET_GOACC_VALIDATE_DIMS nvptx_goacc_validate_dims
 
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index f394db7..e54944d 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -5765,6 +5765,10 @@ usable.  In that case, the smaller the number is, the 
more desirable it is
 to use it.
 @end deftypefn
 
+@deftypefn {Target Hook} int TARGET_SIMT_VF (void)
+Return number of threads in SIMT thread group on the target.
+@end deftypefn
+
 @deftypefn {Target Hook} bool TARGET_GOACC_VALIDATE_DIMS (tree @var{decl}, int 
*@var{dims}, int @var{fn_level})
 This hook should check the launch dimensions provided for an OpenACC
 compute region, or routine.  Defaulted values are represented as -1
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index d188c57..44ba697c 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -4260,6 +4260,8 @@ address;  but often a machine-dependent strategy can 
generate better code.
 
 @hook TARGET_SIMD_CLONE_USABLE
 
+@hook TARGET_SIMT_VF
+
 @hook TARGET_GOACC_VALIDATE_DIMS
 
 @hook TARGET_GOACC_DIM_LIMIT
diff --git a/gcc/target.def b/gcc/target.def
index c7ec292..f5a03d6 100644
--- a/gcc/target.def
+++ b/gcc/target.def
@@ -1639,6 +1639,18 @@ int, (struct cgraph_node *), NULL)
 
 HOOK_VECTOR_END (simd_clone)
 
+/* Functions relating to OpenMP SIMT vectorization transform.  */
+#undef HOOK_PREFIX
+#define HOOK_PREFIX "TARGET_SIMT_"
+HOOK_VECTOR (TARGET_SIMT, simt)
+
+DEFHOOK
+(vf,
+"Return number of threads in SIMT thread group on the target.",
+int, (void), NULL)
+
+HOOK_VECTOR_END (simt)
+
 /* Functions relating to openacc.  */
 #undef HOOK_PREFIX
 #define HOOK_PREFIX "TARGET_GOACC_"

[gomp-nvptx 0/9] Codegen bits for NVPTX OpenMP SIMD

2015-12-01 Thread Alexander Monakov

Hello!

This patch series shows how I'm approaching OpenMP SIMD for NVPTX.  It looks
good both in check-c testing and libgomp testing, including new target-3x.c
cases (but for-5.c fails to run with resource exhaustion, maybe it should be
split for NVPTX -- will investigate more later).

The previously posted patch to handle 'omp_data_o' is no longer necessary with
soft-stacks.

Looking forward to your comments.

Alexander

  nvptx backend: allow emitting COND_EXEC insns
  nvptx backend: new "uniform SIMT" codegen variant
  nvptx backend: add two more identifier maps
  nvptx backend: add -mgomp option and multilib
  new target hook: TARGET_SIMT_VF
  nvptx libgcc: rewrite in C
  nvptx mkoffload: pass -mgomp for OpenMP offloading
  libgomp: update gomp_nvptx_main for -mgomp
  adjust SIMD loop lowering for SIMT targets

 gcc/config/nvptx/mkoffload.c   |   7 ++
 gcc/config/nvptx/nvptx.c   | 181 -
 gcc/config/nvptx/nvptx.h   |   4 +
 gcc/config/nvptx/nvptx.md  |  61 +
 gcc/config/nvptx/nvptx.opt |   8 ++
 gcc/config/nvptx/t-nvptx   |   2 +
 gcc/doc/invoke.texi|  19 
 gcc/doc/tm.texi|   4 +
 gcc/doc/tm.texi.in |   2 +
 gcc/internal-fn.c  |  22 +
 gcc/internal-fn.def|   2 +
 gcc/omp-low.c  | 138 ++--
 gcc/passes.def |   1 +
 gcc/target.def |  12 +++
 gcc/tree-pass.h|   2 +
 libgcc/config/nvptx/crt0.c |  61 +
 libgcc/config/nvptx/crt0.s |  54 ---
 libgcc/config/nvptx/free.asm   |  50 --
 libgcc/config/nvptx/free.c |  34 +++
 libgcc/config/nvptx/malloc.asm |  55 ---
 libgcc/config/nvptx/malloc.c   |  35 +++
 libgcc/config/nvptx/nvptx-malloc.h |   5 +
 libgcc/config/nvptx/realloc.c  |   2 +
 libgcc/config/nvptx/stacks.c   |  30 ++
 libgcc/config/nvptx/t-nvptx|  11 ++-
 libgomp/config/nvptx/team.c|  37 ++--
 26 files changed, 622 insertions(+), 217 deletions(-)
 create mode 100644 libgcc/config/nvptx/crt0.c
 delete mode 100644 libgcc/config/nvptx/crt0.s
 delete mode 100644 libgcc/config/nvptx/free.asm
 create mode 100644 libgcc/config/nvptx/free.c
 delete mode 100644 libgcc/config/nvptx/malloc.asm
 create mode 100644 libgcc/config/nvptx/malloc.c
 create mode 100644 libgcc/config/nvptx/stacks.c

[gomp-nvptx 9/9] adjust SIMD loop lowering for SIMT targets

2015-12-01 Thread Alexander Monakov

This is incomplete.

This handles OpenMP SIMD for NVPTX in simple cases, partly by punting on
anything unusual such as simduid loops, partly by getting lucky, as testcases
do not expose the missing bits.

What it currently does is transform the SIMD loop

  for (V = N1; V cmp N2; V += STEP) BODY;

into

  for (V = N1 + (STEP * LANE); V cmp N2; V += (STEP * VF)) BODY;

and then fold LANE/VF to 0/1 on non-NVPTX targets post-IPA.
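
To see why the transformed loop visits the same iterations, here is a small host-side C sketch. It is an illustration rather than code from the patch: the function names are hypothetical, `cmp` is fixed to `<`, STEP is assumed positive, and the lanes are run one after another instead of in parallel.

```c
#include <assert.h>

/* Original loop: for (V = N1; V < N2; V += STEP) BODY;
   BODY here just accumulates V.  */
static long
sum_scalar (long n1, long n2, long step)
{
  assert (step > 0);
  long total = 0;
  for (long v = n1; v < n2; v += step)
    total += v;
  return total;
}

/* Transformed loop, run once per lane:
   for (V = N1 + STEP * LANE; V < N2; V += STEP * VF) BODY;  */
static long
sum_simt (long n1, long n2, long step, int vf)
{
  assert (step > 0 && vf > 0);
  long total = 0;
  for (int lane = 0; lane < vf; lane++)
    for (long v = n1 + step * lane; v < n2; v += step * vf)
      total += v;
  return total;
}
```

With `vf` equal to 1 the only lane is 0, so the inner loop is exactly the original one, which corresponds to folding LANE/VF to 0/1 on targets without SIMT execution.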

To make it proper, I'll need to handle SIMDUID loops (still thinking how to
best approach that), and SAFELEN (but that simply needs a conditional jump
around the loop, "if (LANE >= SAFELEN)").  Handling collapsed loops eventually
should be nice too.

Also, it needs something like __nvptx_{enter/exit}_simd() calls around the
loop, to switch from uniform to non-uniform SIMT execution (set bitmask in
__nvptx_uni from 0 to -1, and back on exit), and to switch from per-warp
soft-stacks to per-hwthread hard-stacks (by reserving a small area in .local
memory, and setting __nvptx_stacks[] pointer to top of that area).

Also, since SIMD regions should run on per-hwthread stacks, I'm thinking I'll
have to outline the loop into its own function.  Can I do that post-ipa
easily?
---
 gcc/internal-fn.c   |  22 +
 gcc/internal-fn.def |   2 +
 gcc/omp-low.c   | 138 +---
 gcc/passes.def  |   1 +
 gcc/tree-pass.h |   2 +
 5 files changed, 158 insertions(+), 7 deletions(-)

diff --git a/gcc/internal-fn.c b/gcc/internal-fn.c
index a3c4a90..3189e96 100644
--- a/gcc/internal-fn.c
+++ b/gcc/internal-fn.c
@@ -142,6 +142,28 @@ expand_ANNOTATE (gcall *)
   gcc_unreachable ();
 }
 
+/* Lane index on SIMT targets: thread index in the warp on NVPTX.  On targets
+   without SIMT execution this should be expanded in omp_device_lower pass.  */
+
+static void
+expand_GOMP_SIMT_LANE (gcall *stmt)
+{
+  tree lhs = gimple_call_lhs (stmt);
+
+  rtx target = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE);
+  /* FIXME: use a separate pattern for OpenMP?  */
+  gcc_assert (targetm.have_oacc_dim_pos ());
+  emit_insn (targetm.gen_oacc_dim_pos (target, const2_rtx));
+}
+
+/* This should get expanded in omp_device_lower pass.  */
+
+static void
+expand_GOMP_SIMT_VF (gcall *)
+{
+  gcc_unreachable ();
+}
+
 /* This should get expanded in adjust_simduid_builtins.  */
 
 static void
diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
index 1cb14a8..66c7422 100644
--- a/gcc/internal-fn.def
+++ b/gcc/internal-fn.def
@@ -41,6 +41,8 @@ along with GCC; see the file COPYING3.  If not see
 
 DEF_INTERNAL_FN (LOAD_LANES, ECF_CONST | ECF_LEAF, NULL)
 DEF_INTERNAL_FN (STORE_LANES, ECF_CONST | ECF_LEAF, NULL)
+DEF_INTERNAL_FN (GOMP_SIMT_LANE, ECF_NOVOPS | ECF_LEAF | ECF_NOTHROW, NULL)
+DEF_INTERNAL_FN (GOMP_SIMT_VF, ECF_NOVOPS | ECF_LEAF | ECF_NOTHROW, NULL)
 DEF_INTERNAL_FN (GOMP_SIMD_LANE, ECF_NOVOPS | ECF_LEAF | ECF_NOTHROW, NULL)
 DEF_INTERNAL_FN (GOMP_SIMD_VF, ECF_CONST | ECF_LEAF | ECF_NOTHROW, NULL)
 DEF_INTERNAL_FN (GOMP_SIMD_LAST_LANE, ECF_CONST | ECF_LEAF | ECF_NOTHROW, NULL)
diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index cc0435e..51ac0e5 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -10173,7 +10173,7 @@ expand_omp_simd (struct omp_region *region, struct 
omp_for_data *fd)
  OMP_CLAUSE_SAFELEN);
   tree simduid = find_omp_clause (gimple_omp_for_clauses (fd->for_stmt),
  OMP_CLAUSE__SIMDUID_);
-  tree n1, n2;
+  tree n1, n2, step;
 
   type = TREE_TYPE (fd->loop.v);
   entry_bb = region->entry;
@@ -10218,12 +10218,37 @@ expand_omp_simd (struct omp_region *region, struct 
omp_for_data *fd)
 
   n1 = fd->loop.n1;
   n2 = fd->loop.n2;
+  step = fd->loop.step;
+  bool do_simt_transform
+= (cgraph_node::get (current_function_decl)->offloadable
+   && !broken_loop
+   && !safelen
+   && !simduid
+   && !(fd->collapse > 1));
+  if (do_simt_transform)
+{
+  tree simt_lane
+   = build_call_expr_internal_loc (UNKNOWN_LOCATION, IFN_GOMP_SIMT_LANE,
+   integer_type_node, 0);
+  simt_lane = fold_convert (TREE_TYPE (step), simt_lane);
+  simt_lane = fold_build2 (MULT_EXPR, TREE_TYPE (step), step, simt_lane);
+  cfun->curr_properties &= ~PROP_gimple_lomp_dev;
+}
+
   if (gimple_omp_for_combined_into_p (fd->for_stmt))
 {
   tree innerc = find_omp_clause (gimple_omp_for_clauses (fd->for_stmt),
 OMP_CLAUSE__LOOPTEMP_);
   gcc_assert (innerc);
   n1 = OMP_CLAUSE_DECL (innerc);
+  if (do_simt_transform)
+   {
+ n1 = fold_convert (type, n1);
+ if (POINTER_TYPE_P (type))
+   n1 = fold_build_pointer_plus (n1, simt_lane);
+ else
+   n1 = fold_build2 (PLUS_EXPR, type, n1, fold_convert (type, 
simt_lane));
+   }
   innerc = find_omp_clause (OMP_CLAUSE_CHAIN (innerc),
OMP_CLAUSE__LOOPTEMP_);
   gcc_asse

Re: [PATCH] RFC: Use Levenshtein spelling suggestions in Fortran FE

2015-12-01 Thread VandeVondele Joost

Today, I ran 'gfortran -static-libfortran test.f90' and was very pleased with 
the answer:

gfortran: error: unrecognized command line option ‘-static-libfortran’; did you 
mean ‘-static-libgfortran’?

So thanks David, and hopefully we get this user experience for the FE as well.

Joost

[gomp-nvptx 6/9] nvptx libgcc: rewrite in C

2015-12-01 Thread Alexander Monakov

To easily build libgcc for -mgomp multilib, I've rewritten libgcc routines
from asm to C.

En passant, I've fixed a bug in malloc and realloc wrappers where they failed
to handle out-of-memory conditions.  I'm assuming it wasn't intentional.

I also use a patch for Newlib that rewrites its nvptx-specific 'printf'
implementation in C.

* config/nvptx/crt0.c: New, rewritten in C from ...
* config/nvptx/crt0.s: ...this.  Delete.
* config/nvptx/free.c: New, rewritten in C from ...
* config/nvptx/free.asm: ...this.  Delete.
* config/nvptx/malloc.c: New, rewritten in C from ...
* config/nvptx/malloc.asm: ...this.  Delete.
* config/nvptx/realloc.c: Handle out-of-memory condition.
* config/nvptx/nvptx-malloc.h (__nvptx_real_free,
__nvptx_real_malloc): Declare.
* config/nvptx/stacks.c: New.
* config/nvptx/t-nvptx: Adjust.
---
 libgcc/config/nvptx/crt0.c | 61 ++
 libgcc/config/nvptx/crt0.s | 54 -
 libgcc/config/nvptx/free.asm   | 50 ---
 libgcc/config/nvptx/free.c | 34 +
 libgcc/config/nvptx/malloc.asm | 55 --
 libgcc/config/nvptx/malloc.c   | 35 ++
 libgcc/config/nvptx/nvptx-malloc.h |  5 
 libgcc/config/nvptx/realloc.c  |  2 ++
 libgcc/config/nvptx/stacks.c   | 30 +++
 libgcc/config/nvptx/t-nvptx| 11 +++
 10 files changed, 173 insertions(+), 164 deletions(-)
 create mode 100644 libgcc/config/nvptx/crt0.c
 delete mode 100644 libgcc/config/nvptx/crt0.s
 delete mode 100644 libgcc/config/nvptx/free.asm
 create mode 100644 libgcc/config/nvptx/free.c
 delete mode 100644 libgcc/config/nvptx/malloc.asm
 create mode 100644 libgcc/config/nvptx/malloc.c
 create mode 100644 libgcc/config/nvptx/stacks.c

diff --git a/libgcc/config/nvptx/crt0.c b/libgcc/config/nvptx/crt0.c
new file mode 100644
index 000..74483c4
--- /dev/null
+++ b/libgcc/config/nvptx/crt0.c
@@ -0,0 +1,61 @@
+/* Startup routine for standalone execution.
+
+   Copyright (C) 2015 Free Software Foundation, Inc.
+
+   This file is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by the
+   Free Software Foundation; either version 3, or (at your option) any
+   later version.
+
+   This file is distributed in the hope that it will be useful, but
+   WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   General Public License for more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   .  */
+
+void exit (int);
+void abort (void);
+void __attribute__((kernel)) __main (int *, int, char *[]);
+
+static int *__exitval;
+
+void
+exit (int arg)
+{
+  *__exitval = arg;
+  asm volatile ("exit;");
+  __builtin_unreachable ();
+}
+
+void
+abort (void)
+{
+  exit (255);
+}
+
+asm ("// BEGIN GLOBAL VAR DECL: __nvptx_stacks");
+asm (".extern .shared .u64 __nvptx_stacks[32];");
+asm ("// BEGIN GLOBAL VAR DECL: __nvptx_uni");
+asm (".extern .shared .u32 __nvptx_uni[32];");
+
+extern int main (int argc, char *argv[]);
+
+void __attribute__((kernel))
+__main (int *__retval, int __argc, char *__argv[])
+{
+  __exitval = __retval;
+
+  static char gstack[131072] __attribute__((aligned(8)));
+  asm ("st.shared.u64 [__nvptx_stacks], %0;" : : "r" (gstack + sizeof gstack));
+  asm ("st.shared.u32 [__nvptx_uni], %0;" : : "r" (0));
+
+  exit (main (__argc, __argv));
+}
diff --git a/libgcc/config/nvptx/crt0.s b/libgcc/config/nvptx/crt0.s
deleted file mode 100644
index 1ac69a5..000
--- a/libgcc/config/nvptx/crt0.s
+++ /dev/null
@@ -1,54 +0,0 @@
-   .version 3.1
-   .target sm_30
-   .address_size 64
-
-.global .u64 %__exitval;
-// BEGIN GLOBAL FUNCTION DEF: abort
-.visible .func abort
-{
-.reg .u64 %rd1;
-ld.global.u64   %rd1,[%__exitval];
-st.u32   [%rd1], 255;
-exit;
-}
-// BEGIN GLOBAL FUNCTION DEF: exit
-.visible .func exit (.param .u32 %arg)
-{
-.reg .u64 %rd1;
-   .reg .u32 %val;
-   ld.param.u32 %val,[%arg];
-ld.global.u64   %rd1,[%__exitval];
-st.u32   [%rd1], %val;
-exit;
-}
-
-.visible .shared .u64 __nvptx_stacks[1];
-.global .align 8 .u8 %__softstack[131072];
-
-.extern .func (.param.u32 retval) main (.param.u32 argc, .param.u64 argv);
-
-.visible .entry __main (.param .u64 __retval, .param.u32 __argc, .param.u64 
_

Re: RFD: annotate iterator patterns with expanded forms

2015-12-01 Thread Jakub Jelinek

On Tue, Dec 01, 2015 at 04:14:21PM +0100, Bernd Schmidt wrote:
> One problem I have whenever I try to edit i386.md is that I can't find the
> patterns I'm looking for. Let's say I'm looking for lshrsi3, but there's no
> pattern by this name, what I'm looking for is "<shift_insn><mode>3". Even
> worse are things like "*xordi_2", which has just "*<code><mode>_2" and can't
> reasonably be searched for.

For this purpose there is
make mddump
goal which generates tmp-mddump.md in the object directory with expanded
iterators, where you can search for whatever you want.
With the comments in the *.md file I'd worry about them getting out of date,
or people feeling they have to edit them manually (rather than being
regenerated or whatever).

Jakub

Re: RFD: annotate iterator patterns with expanded forms

2015-12-01 Thread Bernd Schmidt


On 12/01/2015 04:23 PM, Jakub Jelinek wrote:
> On Tue, Dec 01, 2015 at 04:14:21PM +0100, Bernd Schmidt wrote:
> > One problem I have whenever I try to edit i386.md is that I can't find the
> > patterns I'm looking for. Let's say I'm looking for lshrsi3, but there's no
> > pattern by this name, what I'm looking for is "<shift_insn><mode>3". Even
> > worse are things like "*xordi_2", which has just "*<code><mode>_2" and can't
> > reasonably be searched for.
>
> For this purpose there is
> make mddump
> goal which generates tmp-mddump.md in the object directory with expanded
> iterators, where you can search for whatever you want.
> With the comments in the *.md file I'd worry about them getting out of date,
> or people feeling they have to edit them manually (rather than being
> regenerated or whatever).


I suppose we could have a Makefile rule that checks for out-of-date 
comments (by redoing the annotation and running diff). That would also 
alleviate the second worry.


I'd much prefer the original source files to be searchable, because if I 
want to make modifications, I can't make them in tmp-mddump.md and going 
back and forth between two files is just inconvenient.



Bernd

[PATCH] Add testcase for tree-optimization/64769

2015-12-01 Thread Marek Polacek

There's an open PR with -fopenmp-simd testcase that used to ICE but is now
fixed for 5/6, but not 4.9.

Should I commit this right away to trunk, wait for gcc-5 branch to open and
then commit it to 5 as well and then close the PR?

Or just to trunk and close the PR?

Tested on x86_64-linux.

2015-12-01  Marek Polacek  

PR tree-optimization/64769
* c-c++-common/gomp/pr64769.c: New test.

diff --git gcc/testsuite/c-c++-common/gomp/pr64769.c 
gcc/testsuite/c-c++-common/gomp/pr64769.c
index e69de29..3a30149 100644
--- gcc/testsuite/c-c++-common/gomp/pr64769.c
+++ gcc/testsuite/c-c++-common/gomp/pr64769.c
@@ -0,0 +1,9 @@
+/* PR tree-optimization/64769 */
+/* { dg-do compile } */
+/* { dg-options "-fopenmp-simd" } */
+
+#pragma omp declare simd linear(i)
+void
+foo (int i)
+{
+}

Marek

[gomp-nvptx 3/9] nvptx backend: add two more identifier maps

2015-12-01 Thread Alexander Monakov

This allows rewriting libgcc wrappers in C by adding back-maps
__nvptx_real_malloc -> malloc and __nvptx_real_free -> free.  While at it,
I've made the implementation leaner.

* config/nvptx/nvptx.c (nvptx_name_replacement): Rewrite.  Add
__nvptx_real_malloc -> malloc and __nvptx_real_free -> free
replacements.
---
 gcc/config/nvptx/nvptx.c | 16 
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/gcc/config/nvptx/nvptx.c b/gcc/config/nvptx/nvptx.c
index 9209b47..3bd3cf7 100644
--- a/gcc/config/nvptx/nvptx.c
+++ b/gcc/config/nvptx/nvptx.c
@@ -538,14 +538,14 @@ write_function_decl_and_comment (std::stringstream &s, 
const char *name, const_t
 static const char *
 nvptx_name_replacement (const char *name)
 {
-  if (strcmp (name, "call") == 0)
-return "__nvptx_call";
-  if (strcmp (name, "malloc") == 0)
-return "__nvptx_malloc";
-  if (strcmp (name, "free") == 0)
-return "__nvptx_free";
-  if (strcmp (name, "realloc") == 0)
-return "__nvptx_realloc";
+  static const char *const replacements[] = {
+"malloc", "__nvptx_malloc", "free", "__nvptx_free",
+"realloc", "__nvptx_realloc", "call", "__nvptx_call",
+"__nvptx_real_malloc", "malloc", "__nvptx_real_free", "free"
+  };
+  for (size_t i = 0; i < ARRAY_SIZE (replacements) / 2; i++)
+if (!strcmp (name, replacements[2 * i]))
+  return replacements[2 * i + 1];
   return name;
 }
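
For readers following along outside the GCC tree, the table-driven lookup in the hunk above can be sketched as a standalone C function. This is only an illustration: ARRAY_SIZE is defined locally here (GCC supplies its own), and name_replacement is a stand-in name rather than the backend function itself.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Local stand-in for the ARRAY_SIZE macro that GCC provides itself.  */
#define ARRAY_SIZE(a) (sizeof (a) / sizeof ((a)[0]))

/* Map NAME to its replacement, or return NAME unchanged.  The table is a
   flat list of (from, to) pairs, as in the patch.  */
static const char *
name_replacement (const char *name)
{
  static const char *const replacements[] = {
    "malloc", "__nvptx_malloc", "free", "__nvptx_free",
    "realloc", "__nvptx_realloc", "call", "__nvptx_call",
    "__nvptx_real_malloc", "malloc", "__nvptx_real_free", "free"
  };

  assert (name != NULL);

  /* Even indices hold the name to match, odd indices the replacement.  */
  for (size_t i = 0; i < ARRAY_SIZE (replacements) / 2; i++)
    if (!strcmp (name, replacements[2 * i]))
      return replacements[2 * i + 1];
  return name;
}
```

Because the lookup is applied once and returns on the first match, "malloc" becomes "__nvptx_malloc" while "__nvptx_real_malloc" becomes plain "malloc"; the two mappings do not chain.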

[gomp-nvptx 1/9] nvptx backend: allow emitting COND_EXEC insns

2015-12-01 Thread Alexander Monakov

This allows using COND_EXEC patterns on nvptx.  The backend is mostly ready
for that, although I had to slightly fix nvptx_print_operand.  I've also opted
to make calls predicable to make the uniform-simt patch simpler, and to that
end I need a small fixup in nvptx_output_call_insn.

RTL optimization won't emit COND_EXEC insns, because it's done only after
reload, and register allocation is not done.  I need this patch to create
COND_EXEC patterns in the backend during reorg.

* config/nvptx/nvptx.c (nvptx_output_call_insn): Handle COND_EXEC
patterns.  Emit instruction predicate.
(nvptx_print_operand): Unbreak handling of instruction predicates.
* config/nvptx/nvptx.md (predicable): New attribute.  Generate
predicated forms via define_cond_exec.
(br_true): Mark as not predicable.
(br_false): Ditto.
(br_true_uni): Ditto.
(br_false_uni): Ditto.
(return): Ditto.
(trap_if_true): Ditto.
(trap_if_false): Ditto.
(nvptx_fork): Ditto.
(nvptx_forked): Ditto.
(nvptx_joining): Ditto.
(nvptx_join): Ditto.
(nvptx_barsync): Ditto.
---
 gcc/config/nvptx/nvptx.c  | 12 +++-
 gcc/config/nvptx/nvptx.md | 43 +++
 2 files changed, 38 insertions(+), 17 deletions(-)

diff --git a/gcc/config/nvptx/nvptx.c b/gcc/config/nvptx/nvptx.c
index 19445ad..2dad3e2 100644
--- a/gcc/config/nvptx/nvptx.c
+++ b/gcc/config/nvptx/nvptx.c
@@ -1905,6 +1905,8 @@ nvptx_assemble_undefined_decl (FILE *file, const char 
*name, const_tree decl)
   fprintf (file, ";\n\n");
 }
 
+static void nvptx_print_operand (FILE *, rtx, int);
+
 /* Output INSN, which is a call to CALLEE with result RESULT.  For ptx, this
involves writing .param declarations and in/out copies into them.  For
indirect calls, also write the .callprototype.  */
@@ -1916,6 +1918,8 @@ nvptx_output_call_insn (rtx_insn *insn, rtx result, rtx 
callee)
   static int labelno;
   bool needs_tgt = register_operand (callee, Pmode);
   rtx pat = PATTERN (insn);
+  if (GET_CODE (pat) == COND_EXEC)
+pat = COND_EXEC_CODE (pat);
   int arg_end = XVECLEN (pat, 0);
   tree decl = NULL_TREE;
 
@@ -1975,6 +1979,7 @@ nvptx_output_call_insn (rtx_insn *insn, rtx result, rtx 
callee)
}
 }
 
+  nvptx_print_operand (asm_out_file, NULL_RTX, '.');
   fprintf (asm_out_file, "\t\tcall ");
   if (result != NULL_RTX)
 fprintf (asm_out_file, "(%%retval_in), ");
@@ -2032,8 +2037,6 @@ nvptx_print_operand_punct_valid_p (unsigned char c)
   return c == '.' || c== '#';
 }
 
-static void nvptx_print_operand (FILE *, rtx, int);
-
 /* Subroutine of nvptx_print_operand; used to print a memory reference X to 
FILE.  */
 
 static void
@@ -2098,11 +2101,10 @@ nvptx_print_operand (FILE *file, rtx x, int code)
   if (x)
{
  unsigned int regno = REGNO (XEXP (x, 0));
- fputs ("[", file);
+ fputs ("@", file);
  if (GET_CODE (x) == EQ)
fputs ("!", file);
- fputs (reg_names [regno], file);
- fputs ("]", file);
+ fprintf (file, "%%r%d", regno);
}
   return;
 }
diff --git a/gcc/config/nvptx/nvptx.md b/gcc/config/nvptx/nvptx.md
index 7930f8d..5ce7a89 100644
--- a/gcc/config/nvptx/nvptx.md
+++ b/gcc/config/nvptx/nvptx.md
@@ -226,6 +226,17 @@ (define_predicate "call_operation"
   return true;
 })
 
+(define_attr "predicable" "false,true"
+  (const_string "true"))
+
+(define_cond_exec
+  [(match_operator 0 "predicate_operator"
+  [(match_operand:BI 1 "nvptx_register_operand" "")
+   (match_operand:BI 2 "const0_operand" "")])]
+  ""
+  ""
+  )
+
 (define_constraint "P0"
   "An integer with the value 0."
   (and (match_code "const_int")
@@ -821,7 +832,8 @@ (define_insn "br_true"
  (label_ref (match_operand 1 "" ""))
  (pc)))]
   ""
-  "%j0\\tbra\\t%l1;")
+  "%j0\\tbra\\t%l1;"
+  [(set_attr "predicable" "false")])
 
 (define_insn "br_false"
   [(set (pc)
@@ -830,7 +842,8 @@ (define_insn "br_false"
  (label_ref (match_operand 1 "" ""))
  (pc)))]
   ""
-  "%J0\\tbra\\t%l1;")
+  "%J0\\tbra\\t%l1;"
+  [(set_attr "predicable" "false")])
 
 ;; unified conditional branch
 (define_insn "br_true_uni"
@@ -839,7 +852,8 @@ (define_insn "br_true_uni"
   UNSPEC_BR_UNIFIED) (const_int 0))
 (label_ref (match_operand 1 "" "")) (pc)))]
   ""
-  "%j0\\tbra.uni\\t%l1;")
+  "%j0\\tbra.uni\\t%l1;"
+  [(set_attr "predicable" "false")])
 
 (define_insn "br_false_uni"
   [(set (pc) (if_then_else
@@ -847,7 +861,8 @@ (define_insn "br_false_uni"
   UNSPEC_BR_UNIFIED) (const_int 0))
 (label_ref (match_operand 1 "" "")) (pc)))]
   ""
-  "%J0\\tbra.uni\\t%l1;")
+  "%J0\\tbra.uni\\t%l1;"
+  [(set_attr "predicable" "false")])
 
 (define_expand "cbranch4"
   [(set (pc)
@@ -1239,7 +1254,8 @@ (define_insn "return"
   ""
 {
   return nvptx
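
As a rough illustration of the nvptx_print_operand hunk above: after the change, an instruction predicate is printed as "@", an optional "!" for the negated form, and then the register written as "%r" followed by its number. The helper below is a standalone sketch of that formatting only; format_predicate and its parameters are hypothetical names, not part of the backend.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Write a PTX guard predicate such as "@%r5" or "@!%r5" into BUF.
   Returns 1 if the whole string fit, 0 if it was truncated or an
   output error occurred.  */
static int
format_predicate (char *buf, size_t size, unsigned int regno, int negate)
{
  assert (buf != NULL && size > 0);

  /* "%%" emits a literal '%', so the register prints as %r<regno>.  */
  int n = snprintf (buf, size, "@%s%%r%u", negate ? "!" : "", regno);
  return n >= 0 && (size_t) n < size;
}
```

In the patch itself the "!" is emitted when the comparison code is EQ; the negate flag here simply stands in for that test.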

Re: [PATCH] Add testcase for tree-optimization/64769

2015-12-01 Thread Jakub Jelinek

On Tue, Dec 01, 2015 at 04:38:03PM +0100, Marek Polacek wrote:
> There's an open PR with -fopenmp-simd testcase that used to ICE but is now
> fixed for 5/6, but not 4.9.
> 
> Should I commit this right away to trunk, wait for gcc-5 branch to open and
> then commit it to 5 as well and then close the PR?
> 
> Or just to trunk and close the PR?
> 
> Tested on x86_64-linux.
> 
> 2015-12-01  Marek Polacek  
> 
>   PR tree-optimization/64769
>   * c-c++-common/gomp/pr64769.c: New test.

Ok for trunk.

> diff --git gcc/testsuite/c-c++-common/gomp/pr64769.c 
> gcc/testsuite/c-c++-common/gomp/pr64769.c
> index e69de29..3a30149 100644
> --- gcc/testsuite/c-c++-common/gomp/pr64769.c
> +++ gcc/testsuite/c-c++-common/gomp/pr64769.c
> @@ -0,0 +1,9 @@
> +/* PR tree-optimization/64769 */
> +/* { dg-do compile } */
> +/* { dg-options "-fopenmp-simd" } */
> +
> +#pragma omp declare simd linear(i)
> +void
> +foo (int i)
> +{
> +}

Jakub

Re: [PATCH] Commentary typo fix for gfc_typenode_for_spec()

2015-12-01 Thread Steve Kargl

On Tue, Dec 01, 2015 at 01:55:00PM +0100, Bernhard Reutner-Fischer wrote:
> Regstrapped without regressions, ok for trunk stage3 now / next stage1?
> 
> gcc/fortran/ChangeLog
> 
> 2015-11-29  Bernhard Reutner-Fischer  
> 
>   * trans-types.c (gfc_typenode_for_spec): Commentary typo fix.
> 

Patches to fix typographical errors in comments are pre-approved.

-- 
Steve

Re: [gomp-nvptx 8/9] libgomp: update gomp_nvptx_main for -mgomp

2015-12-01 Thread Bernd Schmidt


On 12/01/2015 04:28 PM, Alexander Monakov wrote:
> Bernd, is your position on exposing shared memory as first-class address space
> on NVPTX subject to change?  Do you remember what middle-end issues you've
> encountered when trying that?


TYPE_ADDR_SPACE does not reliably contain the address space. Patches to 
deal with that (rather than fix it which Joseph doesn't like) got really 
ugly and I gave up on it. So please use the patch I sent which deals 
with .shared inside the ptx backend (although I think it may have to be 
reworked a little since Nathan changed the code around recently).



Bernd

Re: [gomp-nvptx 2/9] nvptx backend: new "uniform SIMT" codegen variant

2015-12-01 Thread Bernd Schmidt


On 12/01/2015 04:28 PM, Alexander Monakov wrote:
> I'm taking a different approach.  I want to execute all insns in all warp
> members, while ensuring that effect (on global and local state) is the same
> as if any single thread was executing that instruction.  Most instructions
> automatically satisfy that: if threads have the same state, then executing an
> arithmetic instruction, normal memory load/store, etc. keep local state the
> same in all threads.
>
> The two exception insn categories are atomics and calls.  For calls, we can
> demand recursively that they uphold this execution model, until we reach
> runtime-provided "syscalls": malloc/free/vprintf.  Those we can handle like
> atomics.


Didn't we also conclude that address-taking (let's say for stack 
addresses) is also an operation that does not result in the same state?


Have you tried to use the mechanism used for OpenACC? IMO that would be 
a good first step - get things working with fewer changes, and then look 
into optimizing them (ideally for OpenMP and OpenACC both).



Bernd

Re: [PATCH] RFC: Use Levenshtein spelling suggestions in Fortran FE

2015-12-01 Thread Bernhard Reutner-Fischer

On 1 December 2015 at 16:01, Steve Kargl
 wrote:
> On Tue, Dec 01, 2015 at 01:55:01PM +0100, Bernhard Reutner-Fischer wrote:
>>
>> David Malcolm's nice Levenshtein distance spelling check helpers
>> were used in some parts of other frontends. This proposed patch adds
>> some spelling corrections to the fortran frontend.

> What problem are you trying to solve here?  The patch looks like

The idea is to improve the programmer experience when writing code.
See the testcases enclosed in the patch. I consider this a feature :)

> unneeded complexity with the result of injecting C++ idioms into
> the Fortran FE.

What C++ idioms are you referring to? The autovec?
AFAIU the light use of C++ in GCC is deemed OK. I see usage of
std::swap and std::map in the FE, not to mention the wide-int uses
(wi::). Thus we don't have to realloc/strcat but can use vectors to
the same effect, just as other frontends, including the C frontend,
do.
I take it you remember that we had to change all "try" to something
C++ friendly. If the Fortran FE meant to opt-out of being compiled
with a C++ compiler in the first place, why were all the C++ clashes
rewritten, back then? :)

thanks,

Re: [PATCH][ARM] Use snprintf rather than sprintf where possible

2015-12-01 Thread Kyrill Tkachov


Ping.
https://gcc.gnu.org/ml/gcc-patches/2015-11/msg00937.html

This fell through the cracks for me.
Is this ok at this stage? Or should I leave it for GCC 7?

Thanks,
Kyrill

On 09/11/15 11:36, Kyrill Tkachov wrote:

Hi all,

Judging by the thread at 
https://gcc.gnu.org/ml/gcc-patches/2015-10/msg01912.html
I looked at replacing calls to sprintf with calls to snprintf in the arm 
backend.
We use them a lot to print assembly mnemonics into static char buffers.
This patch replaces the calls with snprintf and adds a size argument equal to 
the size
of the buffer used. This way, if any of the format strings changes/increases 
past the size
of the allocated buffer, snprintf will truncate it (and the assembler will 
catch it) rather
than trying to write past the end of the buffer with unexpected results.

I managed to replace all uses of sprintf in the arm backend except the one in 
aout.h:
#define ASM_GENERATE_INTERNAL_LABEL(STRING, PREFIX, NUM)  \
  sprintf (STRING, "*%s%s%u", LOCAL_LABEL_PREFIX, PREFIX, (unsigned int)(NUM))

Here, ASM_GENERATE_INTERNAL_LABEL is used in various places in the midend to 
print labels
to static buffers. I've seen those buffers have sizes ranging from 12 chars to 
256 chars.
The size of the buffer that ASM_GENERATE_INTERNAL_LABEL can depend on is not 
mandated in the
documentation or passed down to the macro, so I think this is a bit dangerous. 
In practice, however,
I don't think we print labels so long that this would cause an issue.

Bootstrapped and tested on arm-none-linux-gnueabihf.

Ok for trunk?

Thanks,
Kyrill

2015-11-09  Kyrylo Tkachov  

* config/arm/arm.c (arm_set_fixed_optab_libfunc): Use snprintf
rather than sprintf.
(arm_set_fixed_conv_libfunc): Likewise.
(arm_option_override): Likewise.
(neon_output_logic_immediate): Likewise.
(neon_output_shift_immediate): Likewise.
(arm_output_multireg_pop): Likewise.
(vfp_output_vstmd): Likewise.
(output_move_vfp): Likewise.
(output_move_neon): Likewise.
(output_return_instruction): Likewise.
(arm_elf_asm_cdtor): Likewise.
(arm_output_shift): Likewise.
(arm_output_iwmmxt_shift_immediate): Likewise.
(arm_output_iwmmxt_tinsr): Likewise.
* config/arm/neon.md (*neon_mov, VDX): Likewise.
(*neon_mov, VQXMOV): Likewise.
(neon_vc_insn): Likewise.
(neon_vc_insn_unspec): Likewise.
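
To make the sprintf-versus-snprintf argument above concrete, here is a minimal standalone sketch. The helper name format_insn, the format string, and the buffer sizes are assumptions chosen for illustration and are not taken from the arm backend.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Format "<op>\tr<reg>" into BUF, which holds SIZE bytes.  snprintf never
   writes more than SIZE bytes (including the terminating NUL), so an
   over-long mnemonic is truncated rather than overflowing BUF.
   Returns 1 if the text fit completely, 0 otherwise.  */
static int
format_insn (char *buf, size_t size, const char *op, int reg)
{
  assert (buf != NULL && size > 0 && op != NULL);

  int n = snprintf (buf, size, "%s\tr%d", op, reg);

  /* snprintf returns the length the full string would have had, so a
     value >= SIZE signals truncation.  */
  return n >= 0 && (size_t) n < size;
}
```

With an 8-byte buffer, format_insn (buf, 8, "mov", 1) fits, while a longer mnemonic is cut off at 7 characters plus the terminator. A truncated mnemonic is what the assembler would then reject, as described above.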

Re: [gomp-nvptx 2/9] nvptx backend: new "uniform SIMT" codegen variant

2015-12-01 Thread Alexander Monakov

On Tue, 1 Dec 2015, Bernd Schmidt wrote:
> 
> Didn't we also conclude that address-taking (let's say for stack addresses) is
> also an operation that does not result in the same state?

This is intended to be used with soft-stacks in OpenMP offloading, and
soft-stacks are per-warp outside of SIMD regions, not private to hwthread.  So
no such problem arises.

(also, I wouldn't phrase it that way -- I wouldn't say that taking address of
a classic .local stack slot desyncs state)

> Have you tried to use the mechanism used for OpenACC? IMO that would be a good
> first step - get things working with fewer changes, and then look into
> optimizing them (ideally for OpenMP and OpenACC both).

I don't think I would have as much success trying to apply the OpenACC
mechanism with the overall direction I'm taking, that is, running with a
slightly modified libgomp port.  The way parallel regions are activated in the
guts of libgomp via GOMP_parallel/gomp_team_start makes things different, for
example.

Alexander

RE: [Patch 2/3][Aarch64] Add support for IEEE-conformant versions of scalar fmin* and fmax*

2015-12-01 Thread David Sherwood

Hi,

Thanks for the comments James, I've moved the patterns around
and added new comments to them. Hope this is ok.

Regards,
David Sherwood.

ChangeLog:

2015-12-01  David Sherwood  

gcc/
* config/aarch64/aarch64.md: New pattern.
* config/aarch64/aarch64-simd.md: Likewise.
* config/aarch64/iterators.md: New unspecs, iterators.
gcc/testsuite
* gcc.target/aarch64/fmaxmin.c: New test.

> -Original Message-
> From: James Greenhalgh [mailto:james.greenha...@arm.com]
> Sent: 26 November 2015 16:53
> To: David Sherwood
> Cc: GCC Patches
> Subject: Re: [Patch 2/3][Aarch64] Add support for IEEE-conformant versions of scalar fmin* and fmax*
> 
> On Thu, Nov 26, 2015 at 04:20:35PM -, David Sherwood wrote:
> > Hi,
> >
> > Here is the second patch of the fmin/fmax change, which adds the optabs
> > to the aarch64 backend.
> >
> > Tested:
> >
> > x86_64-linux: no regressions
> > aarch64-none-elf: no regressions
> >
> > Good to go?
> > David Sherwood.
> 
> Could you also update the comment a few lines above the pattern you add
> in aarch64-simd.md? Unless I've misunderstood the point of this patch set,
> that looks to be out of date now:
> 
>   ;; FP Max/Min
>   ;; Max/Min are introduced by idiom recognition by GCC's mid-end.  An
>   ;; expression like:
>   ;;  a = (b < c) ? b : c;
>   ;; is idiom-matched as MIN_EXPR only if -ffinite-math-only is enabled
>   ;; either explicitly or indirectly via -ffast-math.
>   ;;
>   ;; MIN_EXPR and MAX_EXPR eventually map to 'smin' and 'smax' in RTL.
>   ;; The 'smax' and 'smin' RTL standard pattern names do not specify which
>   ;; operand will be returned when both operands are zero (i.e. they may not
>   ;; honour signed zeroes), or when either operand is NaN.  Therefore GCC
>   ;; only introduces MIN_EXPR/MAX_EXPR in fast math mode or when not honouring
>   ;; NaNs.
> 
> Either that, or reorder the patterns you add so the existing patterns that
> this comment pertains to are kept close to it, and add a new comment for
> your new pattern - explaining that it is the auto-vectorized form of the
> IEEE-754 fmax/fmin functions.
> 
> Thanks,
> James
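
The difference between the MIN_EXPR idiom and an IEEE-style minimum that the quoted comment describes can be shown with a small standalone C sketch. Both function names are hypothetical; ieee_style_min only imitates the NaN rule of the C99 fmin function (if exactly one argument is a NaN, the other argument is returned) and is not the aarch64 pattern itself.

```c
#include <math.h>   /* NAN and isnan */

/* The idiom from the comment: a = (b < c) ? b : c.  Any comparison with
   a NaN is false, so the result depends on which operand is the NaN.  */
static double
ternary_min (double b, double c)
{
  return (b < c) ? b : c;
}

/* NaN handling in the style of fmin: a single NaN operand is ignored
   and the other operand is returned.  */
static double
ieee_style_min (double b, double c)
{
  if (isnan (b))
    return c;
  if (isnan (c))
    return b;
  return (b < c) ? b : c;
}
```

Here ternary_min (1.0, NAN) yields a NaN but ternary_min (NAN, 1.0) yields 1.0, whereas ieee_style_min returns 1.0 in both orders. That asymmetry is why the plain idiom can only be treated as a minimum when NaNs need not be honoured.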
> 
> 
> 
> >
> > ChangeLog:
> >
> > 2015-11-26  David Sherwood  
> >
> > gcc/
> > * config/aarch64/aarch64.md: New pattern.
> > * config/aarch64/aarch64-simd.md: Likewise.
> > * config/aarch64/iterators.md: New unspecs, iterators.
> > gcc/testsuite
> > * gcc.target/aarch64/fmaxmin.c: New test.
> >
> > > -Original Message-
> > > From: Richard Biener [mailto:richard.guent...@gmail.com]
> > > Sent: 25 November 2015 12:39
> > > To: David Sherwood
> > > Cc: GCC Patches; Richard Sandiford
> > > Subject: Re: [PING][Patch] Add support for IEEE-conformant versions of 
> > > scalar fmin* and fmax*
> > >
> > > On Mon, Nov 23, 2015 at 10:21 AM, David Sherwood  
> > > wrote:
> > > > Hi,
> > > >
> > > > This is part 1 of a reworked version of a patch I originally submitted 
> > > > in
> > > > August, rebased after Richard Sandiford's recent work on the internal
> > > > functions. This first patch adds the internal function definitions and 
> > > > optabs
> > > > that provide support for IEEE fmax()/fmin() functions.
> > > >
> > > > Later patches will add the appropriate aarch64/aarch32 vector 
> > > > instructions.
> > >
> > > Ok.
> > >
> > > Thanks,
> > > Richard.
> > >
> > > > Tested:
> > > >
> > > > x86_64-linux: no regressions
> > > > aarch64-none-elf: no regressions
> > > > arm-none-eabi: no regressions
> > > >
> > > > Regards,
> > > > David Sherwood.
> > > >
> > > > ChangeLog:
> > > >
> > > > 2015-11-19  David Sherwood  
> > > >
> > > > gcc/
> > > > * optabs.def: Add new optabs fmax_optab/fmin_optab.
> > > > * internal-fn.def: Add new fmax/fmin internal functions.
> > > > * config/aarch64/aarch64.md: New pattern.
> > > > * config/aarch64/aarch64-simd.md: Likewise.
> > > > * config/aarch64/iterators.md: New unspecs, iterators.
> > > > * config/arm/iterators.md: New iterators.
> > > > * config/arm/unspecs.md: New unspecs.
> > > > * config/arm/neon.md: New pattern.
> > > > * config/arm/vfp.md: Likewise.
> > > > * doc/md.texi: Add fmin and fmax patterns.
> > > > gcc/testsuite
> > > > * gcc.target/aarch64/fmaxmin.c: New test.
> > > > * gcc.target/arm/fmaxmin.c: New test.
> > > >
> > > >
> > > >> -Original Message-
> > > >> From: Richard Biener [mailto:richard.guent...@gmail.com]
> > > >> Sent: 19 August 2015 13:35
> > > >> To: Richard Biener; David Sherwood; GCC Patches; Richard Sandiford
> > > > >> Subject: Re: [PING][Patch] Add support for IEEE-conformant versions of scalar fmin* and fmax*
> > > >>
> > > >> On Wed, Aug 19, 2015 at 2:11 PM, Richard Sandiford
> > > >>  wrote:
> > > >> > Richard Biener  writes:
> > > >> >> On Wed, Aug 19, 2015 at 11:54 AM, Richard Sandiford
> > > >> >>  wrote:
> > > >> >>> Richard Biener  writes:
> > > >>  On

Re: RFC: Merge the GUPC branch into the GCC 6.0 trunk

2015-12-01 Thread Gary Funck

On 12/01/15 12:12:29, Richard Biener wrote:
> On Mon, 30 Nov 2015, Gary Funck wrote:
> > At this time, we would like to re-submit the UPC patches for comment
> > with the goal of introducing these changes into GCC 6.0.
>
>  First of all let me say that it is IMNSHO now too late for GCC 6.

I realize that stage 1 recently closed, and that if UPC were
accepted for inclusion, it would be an exception.  To offset
potential risk, we perform weekly merges and run a large suite
of tests and apps. on differing hosts/cpu architectures.
We have also tried to follow the sorts of re-factoring and C++
changes made over the course of the last year/so.  I'd just ask
that the changes be given some further consideration for 6.0.

> You claim bits in tree_base - are those bits really used for
> all tree kinds?  The qualifiers look type specific where
> eventually FE specific flags in type-lang-specific parts could
> have been used (yeah, there are no spare bits in tree_type_*).
> Similar the _factor stuff should not be on all tree kinds.

When we first started building the gupc branch, it was suggested
that UPC be implemented as a separate language ala ObjC.
In that case, we used "language bits".  Over time, this approach
fell out of favor, and we were asked to move everything into
the C front-end and middle-end, making compilation contingent
upon -fupc, which is the way it is now.  Also, over the past
couple of years, there has been work to minimize the number of
bits used by tree nodes, so some additional changes were needed.

The main change recommended to reduce tree space was moving the
"layout factor" (blocking factor) out of the tree node, and using
only two bits there, one bit for a relatively common case of 0,
and the other for > 1.  It was suggested that we use a hash
table to map tree nodes to layout qualifiers for the case they
are > 1.  This necessitated using a garbage collected tree map,
which unfortunately meant that tree nodes needed special garbage
collection logic.

It is worth noting that the "layout qualifier" is an integral
constant, currently represented as a tree node reference.
It might be possible to represent it as a "wide int" instead.
I did give that a go once, but it rippled through the code
making some things awkward.  Perhaps not as awkward as a
custom tree node GC routine; this could be re-visited.
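
The two-bits-plus-side-table scheme described above can be sketched in plain C as follows. Everything here is hypothetical: the struct, the field names, and the fixed-size linear table (standing in for the garbage-collected tree map) are illustrative only, and the sketch assumes a default layout factor of 1 when neither bit is set.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical miniature "tree node": two flag bits instead of a full
   field holding the layout (blocking) factor.  */
struct node
{
  unsigned factor_is_zero : 1;   /* layout factor is 0 */
  unsigned factor_in_table : 1;  /* layout factor > 1, stored on the side */
};

/* Tiny side table mapping nodes to factors > 1.  A linear array keeps
   the sketch short; it stands in for a real hash map.  */
#define MAX_ENTRIES 16
static struct { const struct node *key; unsigned long factor; }
  side_table[MAX_ENTRIES];
static size_t side_count;

static void
set_factor (struct node *n, unsigned long factor)
{
  n->factor_is_zero = (factor == 0);
  n->factor_in_table = (factor > 1);
  if (factor <= 1)
    return;

  /* Update an existing entry if there is one, else append a new one.  */
  for (size_t i = 0; i < side_count; i++)
    if (side_table[i].key == n)
      {
        side_table[i].factor = factor;
        return;
      }
  assert (side_count < MAX_ENTRIES);
  side_table[side_count].key = n;
  side_table[side_count].factor = factor;
  side_count++;
}

static unsigned long
get_factor (const struct node *n)
{
  if (n->factor_is_zero)
    return 0;
  if (!n->factor_in_table)
    return 1;   /* assumed default when neither bit is set */
  for (size_t i = 0; i < side_count; i++)
    if (side_table[i].key == n)
      return side_table[i].factor;
  assert (!"flag set but no side-table entry");
  return 1;
}
```

The common cases (0 and the default) cost only the two bits, and only nodes with a factor greater than 1 pay for a table entry. Keying such a table on node addresses is also what ties it to the garbage collector in the real design.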

> I find the names used a bit unspecific, please consider
> prefixing them with upc_ (esp. shared_flag may be confused
> with the similar private_flag).

When we previously asked for a review, it was noted that
if the UPC bits were moved into what amounts to common/generic
tree node fields that we should drop UPC_ or upc_ from the
related node names and functions.  That's what we did.
There is some middle ground, for example, where only
TYPE_SHARED_P() is renamed to UPC_SHARED_TYPE_P()
and the rest remain as is.

Since renames are straight forward, we can make any
recommended changes quickly.

Originally, we were keeping the door open for UPC++, but
there are complications with generalizing C++ into a multi-node
environment, and that idea has been tabled for now.
Therefore, the current structure/implementation is C only,
with most of the new front-end/middle-end logic under
the c/ directory.

> Are these and the new tree codes below living beyond the time
> the frontend is in control?  That is, do they need to survive
> throughout the middle-end?

I'm not sure where the line is drawn for the front-end and middle-end.
After upc_genericize() runs (just before c_genericize())
all operations on tree nodes that are UPC-specific are lowered
into operations on the internal representation of a pointer-to-shared
and/or runtime calls that operate on the internal representation.
The pointer-to-shared values/types still show up
in the tree, but only as containers (pointers-to-shared
are typically 2x the size of a regular "C" pointer).

The places where SHARED_TYPE_P() is referenced in 'c/'
and 'c-family/' are:

c/c-convert.c
c/c-objc-common.c
c/c-upc-pts-ops.c
c/c-parser.c
c/c-typeck.c
c/c-upc-low.c
c/c-upc-lang.c
c/c-decl.c
c/c-upc.c
c-family/c-common.c

The places in the gcc top-level where SHARED_TYPE_P()
is referenced are:

convert.c
explow.c
fold-const.c
function.c
gimple-expr.c
match.pd
tree.c
tree.h
tree-sra.c

The target-specific references are here:

config/rs6000/rs6000.c
config/i386/i386.c

All of the references outside of c/ and c-family/
and tree.[ch] are to differentiate operations on UPC pointers-to-shared from
regular "C" pointers.  (Some/all of those references might
be mitigated by defining new language hooks.  We haven't looked
into that.)

It may be the case that in the current design, that only
the "shared" bit is needed in the common (base?) tree node,
as long as there is some way to record the additional
"strict" and "relaxed" qualifiers, the "layout qualifier"
(a tree reference to an integral constant)

Re: [PATCH] RFC: Use Levenshtein spelling suggestions in Fortran FE

2015-12-01 Thread Steve Kargl

On Tue, Dec 01, 2015 at 05:12:57PM +0100, Bernhard Reutner-Fischer wrote:
> On 1 December 2015 at 16:01, Steve Kargl
>  wrote:
> > On Tue, Dec 01, 2015 at 01:55:01PM +0100, Bernhard Reutner-Fischer wrote:
> >>
> >> David Malcolm's nice Levenshtein distance spelling check helpers
> >> were used in some parts of other frontends. This proposed patch adds
> >> some spelling corrections to the fortran frontend.
> 
> > What problem are you trying to solve here?  The patch looks like
> 
> The idea is to improve the programmer experience when writing code.
> See the testcases enclosed in the patch. I consider this a feature :)

Opinions differ.  I consider it unnecessary bloat.

> > unneeded complexity with the result of injecting C++ idioms into
> > the Fortran FE.
> 
> What C++ idioms are you referring to? The autovec?
> AFAIU the light use of C++ in GCC is deemed OK. I see usage of
> std::swap and std::map in the FE, not to mention the wide-int uses
> (wi::). Thus we don't have to realloc/strcat but can use vectors to
> the same effect, just as other frontends, including the C frontend,
> do.
> I take it you remember that we had to change all "try" to something
> C++ friendly. If the Fortran FE meant to opt-out of being compiled
> with a C++ compiler in the first place, why were all the C++ clashes
> rewritten, back then? :)

Yes, I know there are other C++ (mis)features within the
Fortran FE especially in the trans-*.c files.  Those are
accepted (by some) as necessary evils to interface with 
the ME.  Your patch injects C++ into otherwise perfectly
fine C code, which makes it more difficult for those with
no or very limited C++ knowledge to maintain gfortran.

There are currently 806 open bug reports for gfortran.
AFAIK, your patch does not address any of those bug reports.
The continued push to inject C++ into the Fortran FE will
have the (un)intentional consequence of forcing at least one
active gfortran contributor to stop.

--  
Steve

Gimple loop splitting v2

2015-12-01 Thread Michael Matz

Hi,

On Mon, 16 Nov 2015, Jeff Law wrote:

> OK, if you want to keep them, then have a consistent way to turn them 
> on/off for future debugging.  if0/if1 doesn't provide much of a clue to 
> someone else what to turn on/off if they need to debug this stuff.

> > > I don't see any negative tests -- ie tests that should not be split 
> > > due to boundary conditions.  Do you have any from development?
> > 
> > Good point, I had some but only ones where I was able to extend the 
> > splitters to cover them.  I'll think of some that really shouldn't be 
> > split.
> If you've got them, certainly add them.  Though I realize they may get 
> lost over time.

Actually, thinking a bit more about this, I don't have any that wouldn't 
be merely restrictions in the implementation that couldn't be lifted in 
the future (e.g. unequal step sizes), so I've added no additional ones.

> But in that case, the immediate dominator of pre2 & join is still the 
> initial if statement.  So I think we're OK.  That was the conclusion I 
> was starting to come to yesterday, having the ascii art makes it pretty 
> clear.  I'm just not good at conceptualizing a CFG.  I have to see it 
> explicitly and then everything seems so clear and simple.

So, this second version should reflect the review.  I've moved everything 
to a new file, split the long function into several logically separate 
ones, and even included ascii art in the comments :)  The testcase got a 
comment about what to #define for debugging.  I've included the pass to 
-O3 or alternatively if profile-use is on, similar to funswitch-loops.  
I've also added a proper -fsplit-loops option.

There are two functional changes in v2: a bugfix to not try splitting a 
non-iterating loop (irritatingly such a loop returns true from 
number_of_iterations_exit, but with an ERROR_MARK comparator), and a 
limitation to avoid combinatorial explosion in artificial testcases: once 
we have done a splitting, we don't do any in that loop's parents (we may 
still do splitting in siblings or children of siblings).
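
For readers unfamiliar with the transformation, the kind of loop this pass targets can be sketched at the source level. The two functions below are hand-written illustrations, not compiler output: the first has a condition on the induction variable inside the loop, the second shows the same computation as two consecutive loops with the condition removed from the bodies.

```c
/* Original form: the test "i < m" is re-evaluated on every iteration,
   although it is true for a prefix of the iteration space and false
   for the rest.  */
static long
sum_original (const int *a, const int *b, int n, int m)
{
  long s = 0;
  for (int i = 0; i < n; i++)
    {
      if (i < m)
        s += a[i];
      else
        s += b[i];
    }
  return s;
}

/* Split form: the first loop covers the iterations where "i < m"
   holds, the second loop continues from there with the else-branch.  */
static long
sum_split (const int *a, const int *b, int n, int m)
{
  long s = 0;
  int i = 0;
  for (; i < n && i < m; i++)
    s += a[i];
  for (; i < n; i++)
    s += b[i];
  return s;
}
```

Both versions compute the same sum for any m, including values below 0 or above n, where one of the two loops simply runs zero times.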

I've also done some measurements: first, bootstrap time is unaffected, and 
regstrapping succeeds without regressions when I activate the pass by 
default.  Then SPECcpu2006: build times are unaffected, everything builds 
and works also with -fsplit-loops, performance is mostly unaffected, base 
is -Ofast -funroll-loops -fpeel-loops, peak adds -fsplit-loops.

                                  Estimated                       Estimated
                Base     Base       Base        Peak     Peak       Peak
Benchmarks      Ref.   Run Time     Ratio       Ref.   Run Time     Ratio
-------------- ------  ---------  ---------    ------  ---------  ---------
400.perlbench    9770        325       30.1 *    9770        323       30.3 *
401.bzip2        9650        382       25.2 *    9650        382       25.3 *
403.gcc          8050        242       33.3 *    8050        241       33.4 *
429.mcf          9120        311       29.3 *    9120        311       29.3 *
445.gobmk       10490        392       26.8 *   10490        391       26.8 *
456.hmmer        9330        345       27.0 *    9330        342       27.3 *
458.sjeng       12100        422       28.7 *   12100        420       28.8 *
462.libquantum  20720        308       67.3 *   20720        308       67.3 *
464.h264ref     22130        423       52.3 *   22130        423       52.3 *
471.omnetpp      6250        273       22.9 *    6250        273       22.9 *
473.astar        7020        311       22.6 *    7020        311       22.6 *
483.xalancbmk    6900        191       36.2 *    6900        190       36.2 *
 Est. SPECint_base2006                 31.7
 Est. SPECint2006                                                      31.7
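
As a reading aid for the numbers above: each ratio is the reference time 
divided by the run time, and the overall SPEC figure is the geometric mean 
of the per-benchmark ratios.  The sketch below recomputes that mean from 
the rounded ratios printed in the table.  The function and array names are 
made up for illustration, and the n-th root is taken with a few Newton 
steps only to avoid depending on libm.

```c
#include <stddef.h>

/* Geometric mean of n positive ratios: the n-th root of their product.
   Newton's method on x^n = product, started at the largest ratio (which
   is never below the geometric mean), so the iteration descends onto
   the root. */
static double geometric_mean(const double *ratio, size_t n)
{
    if (n == 0)
        return 0.0;

    double product = 1.0, x = 0.0;
    for (size_t i = 0; i < n; i++) {
        product *= ratio[i];
        if (ratio[i] > x)
            x = ratio[i];
    }
    for (int iter = 0; iter < 100; iter++) {
        double pow_nm1 = 1.0;            /* x^(n-1) */
        for (size_t i = 0; i + 1 < n; i++)
            pow_nm1 *= x;
        x = ((double)(n - 1) * x + product / pow_nm1) / (double)n;
    }
    return x;
}

/* Rounded integer-suite ratios as printed in the table above. */
static const double base_ratio[12] = {
    30.1, 25.2, 33.3, 29.3, 26.8, 27.0, 28.7, 67.3, 52.3, 22.9, 22.6, 36.2
};
static const double peak_ratio[12] = {
    30.3, 25.3, 33.4, 29.3, 26.8, 27.3, 28.8, 67.3, 52.3, 22.9, 22.6, 36.2
};
```

Calling geometric_mean(base_ratio, 12) and geometric_mean(peak_ratio, 12) 
gives values that both round to the 31.7 reported above.  Small differences 
in further digits are expected, because the inputs here are already rounded 
to one decimal.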

                                  Estimated                       Estimated
                Base     Base       Base        Peak     Peak       Peak
Benchmarks      Ref.   Run Time     Ratio       Ref.   Run Time     Ratio
-------------- ------  ---------  ---------    ------  ---------  ---------
410.bwaves      13590        235       57.7 *   13590        235       57.8 *
416.gamess                               NR                              NR
433.milc         9180        347       26.5 *    9180        345       26.6 *
434.zeusmp       9100        269       33.9 *    9100        268       33.9 *
435.gromacs      7140        260       27.4 *    7140        262       27.3 *
436.cactusADM   11950        237       50.5 *   11950        240       49.9 *
437.leslie3d     9400        228       41.3 *    9400        228       41.2 *
444.namd         8020        312       25.7 *    8020        311       25.7 *
447.dealII      11440        254       45.0 *   11440        254       45.0 *
450.soplex       8340        201       41.4 *    8340        202       41.4 *
453.povray                               NR                              NR
454.calculix

Re: [PATCH] Derive interface buffers from max name length

2015-12-01 Thread Bernhard Reutner-Fischer

On 1 December 2015 at 15:52, Janne Blomqvist wrote:
> On Tue, Dec 1, 2015 at 2:54 PM, Bernhard Reutner-Fischer wrote:
>> These three functions used a hardcoded buffer of 100 but would be better
>> off basing it on GFC_MAX_SYMBOL_LEN, which denotes the maximum length of a
>> name in any of our supported standards (63 as of f2003 ff.).
>
> Please use xasprintf() instead (and free the result, of course). One
> of my backburner projects is to get rid of these static symbol
> buffers, and use dynamic buffers (or the symbol table) instead. We
> IIRC already have some ugly hacks by using hashing to get around
> GFC_MAX_SYMBOL_LEN when handling mangled symbols. Your patch doesn't
> make the situation worse per se, but if you're going to fix it, let's
> do it properly.
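
To make the two approaches discussed above concrete, here is a small 
standalone sketch.  It is not code from the patch or from gfortran: 
SKETCH_MAX_SYMBOL_LEN, name_into_fixed and sketch_xasprintf are 
hypothetical names, and sketch_xasprintf is only a stand-in written with 
standard C calls to show the allocate, format, free pattern that 
xasprintf() implies.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical limit, mirroring the 63 quoted above. */
#define SKETCH_MAX_SYMBOL_LEN 63

/* Fixed-buffer style: the caller supplies the storage.  Returns 1 if
   "prefix_sym" fit, 0 if it would not fit (nothing is written then). */
static int name_into_fixed(char out[SKETCH_MAX_SYMBOL_LEN + 1],
                           const char *prefix, const char *sym)
{
    if (strlen(prefix) + 1 + strlen(sym) > SKETCH_MAX_SYMBOL_LEN)
        return 0;
    snprintf(out, SKETCH_MAX_SYMBOL_LEN + 1, "%s_%s", prefix, sym);
    return 1;
}

/* Dynamic style: measure first, then allocate exactly what is needed.
   The caller owns the result and must free() it.  Aborts on failure. */
static char *sketch_xasprintf(const char *fmt, ...)
{
    va_list ap, ap2;
    va_start(ap, fmt);
    va_copy(ap2, ap);
    int len = vsnprintf(NULL, 0, fmt, ap);   /* length without the NUL */
    va_end(ap);
    if (len < 0) {
        va_end(ap2);
        abort();
    }
    char *buf = malloc((size_t)len + 1);
    if (buf == NULL) {
        va_end(ap2);
        abort();
    }
    vsnprintf(buf, (size_t)len + 1, fmt, ap2);
    va_end(ap2);
    return buf;
}
```

With the fixed buffer, an over-long name has to be rejected or truncated; 
with the dynamic variant the only obligation is the matching free(), e.g. 
char *s = sketch_xasprintf("%s_%s", prefix, sym); ... free(s);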

I see.

/scratch/src/gcc-6.0.mine/gcc/fortran$ git grep \
  "^[[:space:]]*char[[:space:]][[:space:]]*[^[;[:space:]]*\[" | wc -l
142
/scratch/src/gcc-6.0.mine/gcc/fortran$ git grep "xasprintf" | wc -l
32

What about memory fragmentation when switching to heap-based allocation?
Or is there consensus that these are in the noise compared to other
parts of the compiler?

BTW:
$ git grep APO
io.c:  static const char *delim[] = { "APOSTROPHE", "QUOTE", "NONE", NULL };
io.c:  static const char *delim[] = { "APOSTROPHE", "QUOTE", "NONE", NULL };


> Ok for GCC 7 stage1 with these changes. I don't think it's worth
> putting it into GCC 6 at this point anymore, unless this is actually
> fixing some bugs that are visible to users?

Not visible, no, can wait easily.
