Re: [patch] adjust default nvptx launch geometry for OpenACC offloaded regions

2018-07-02 Thread Cesar Philippidis
On 07/02/2018 07:14 AM, Tom de Vries wrote:
> On 06/21/2018 03:58 PM, Cesar Philippidis wrote:
>> On 06/20/2018 03:15 PM, Tom de Vries wrote:
>>> On 06/20/2018 11:59 PM, Cesar Philippidis wrote:
>>>> Now it follows the formula contained in
>>>> the "CUDA Occupancy Calculator" spreadsheet that's distributed with CUDA.
>>>
>>> Any reason we're not using the cuda runtime functions to get the
>>> occupancy (see PR85590 - [nvptx, libgomp, openacc] Use cuda runtime fns
>>> to determine launch configuration in nvptx ) ?
>>
>> There are two reasons:
>>
>>   1) cuda_occupancy.h depends on the CUDA runtime to extract the device
>>  properties instead of the CUDA driver API. However, we can always
>>  teach libgomp how to populate the cudaDeviceProp struct using the
>>  driver API.
>>
>>   2) CUDA is not always present on the build host, and that's why
>>  libgomp maintains its own cuda.h. So at the very least, this
>>  functionality would be good to have in libgomp as a fallback
>>  implementation;
> 
> Libgomp maintains its own cuda.h to "allow building GCC with PTX
> offloading even without CUDA being installed" (
> https://gcc.gnu.org/ml/gcc-patches/2017-01/msg00980.html ).
> 
> The libgomp nvptx plugin however uses the cuda driver API to launch
> kernels etc, so we can assume that's always available at launch time.
> And according to the "CUDA Pro Tip: Occupancy API Simplifies Launch
> Configuration", the occupancy API is also available in the driver API.

Thanks for the info. I was not aware that the CUDA driver API had a
thread occupancy calculator (it's described in section 4.18).

> What we cannot assume to be available is the occupancy API pre cuda-6.5.
> So it's fine to have a fallback for that (properly isolated in utility
> functions), but for cuda 6.5 and up we want to use the occupancy API.

That seems reasonable. I'll run some experiments with that. In the
meantime, would it be OK to make this fallback the default, then add
support for the driver occupancy calculator as a follow up?
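For reference, a fallback of this kind can be sketched as a simplified version of the per-SM limits in the "CUDA Occupancy Calculator": the number of resident blocks per SM is the minimum of the hardware block limit, the thread limit, the register limit and the shared-memory limit. This is a hypothetical sketch, not the patch's code. The struct and function names are illustrative, and it ignores the register and shared-memory allocation granularities that the real spreadsheet applies.

```c
#include <assert.h>

/* Hypothetical per-SM resource limits; in libgomp these would come
   from the CUDA driver's device attributes.  */
struct sm_limits
{
  int max_threads_per_sm;
  int max_blocks_per_sm;
  int regs_per_sm;
  int smem_per_sm;  /* bytes */
};

static int
min_int (int a, int b)
{
  return a < b ? a : b;
}

/* Return how many blocks of THREADS_PER_BLOCK threads can be resident
   on one SM, given per-thread register usage and per-block shared
   memory.  Simplified: no allocation granularity is modelled.  */
static int
blocks_per_sm (const struct sm_limits *sm, int threads_per_block,
	       int regs_per_thread, int smem_per_block)
{
  assert (threads_per_block > 0);

  int blocks = sm->max_blocks_per_sm;
  blocks = min_int (blocks, sm->max_threads_per_sm / threads_per_block);
  if (regs_per_thread > 0)
    blocks = min_int (blocks,
		      sm->regs_per_sm / (regs_per_thread * threads_per_block));
  if (smem_per_block > 0)
    blocks = min_int (blocks, sm->smem_per_sm / smem_per_block);
  return blocks;
}
```

With made-up limits of 2048 threads, 32 blocks, 65536 registers and 48 KiB of shared memory per SM, a 256-thread block using 64 registers per thread is register-bound at 4 resident blocks.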

>>  it's not good to have a program fail due to
>>  insufficient hardware resource errors when it is avoidable.
>>
> 
> Right, in fact there are two separate things you're trying to address
> here: launch failure and occupancy heuristic, so split the patch.

ACK. I'll split those changes into separate patches.

By the way, do you have any preferences on how to break up the nvptx
vector length changes for trunk submission? I was planning on breaking
it down into four components - generic ME changes, tests, nvptx
reductions and the rest. Those two nvptx components are large, so I'll
probably break them down to smaller patches, but I'm not sure if it's
worthwhile to make them independent from one another with the use of a
lot of stub functions.

Cesar


Re: [patch] adjust default nvptx launch geometry for OpenACC offloaded regions

2018-07-11 Thread Cesar Philippidis
On 07/02/2018 07:14 AM, Tom de Vries wrote:
> On 06/21/2018 03:58 PM, Cesar Philippidis wrote:
>> On 06/20/2018 03:15 PM, Tom de Vries wrote:
>>> On 06/20/2018 11:59 PM, Cesar Philippidis wrote:
>>>> Now it follows the formula contained in
>>>> the "CUDA Occupancy Calculator" spreadsheet that's distributed with CUDA.
>>>
>>> Any reason we're not using the cuda runtime functions to get the
>>> occupancy (see PR85590 - [nvptx, libgomp, openacc] Use cuda runtime fns
>>> to determine launch configuration in nvptx ) ?
>>
>> There are two reasons:
>>
>>   1) cuda_occupancy.h depends on the CUDA runtime to extract the device
>>  properties instead of the CUDA driver API. However, we can always
>>  teach libgomp how to populate the cudaDeviceProp struct using the
>>  driver API.
>>
>>   2) CUDA is not always present on the build host, and that's why
>>  libgomp maintains its own cuda.h. So at the very least, this
>>  functionality would be good to have in libgomp as a fallback
>>  implementation;
> 
> Libgomp maintains its own cuda.h to "allow building GCC with PTX
> offloading even without CUDA being installed" (
> https://gcc.gnu.org/ml/gcc-patches/2017-01/msg00980.html ).
> 
> The libgomp nvptx plugin however uses the cuda driver API to launch
> kernels etc, so we can assume that's always available at launch time.
> And according to the "CUDA Pro Tip: Occupancy API Simplifies Launch
> Configuration", the occupancy API is also available in the driver API.
> 
> What we cannot assume to be available is the occupancy API pre cuda-6.5.
> So it's fine to have a fallback for that (properly isolated in utility
> functions), but for cuda 6.5 and up we want to use the occupancy API.

Here's revision 2 of the patch. I replaced all of my thread occupancy
heuristics with calls to the CUDA driver as you suggested. The
performance is worse than with my heuristics, but that's to be expected
because the CUDA driver only guarantees the minimal launch geometry
needed to fully utilize the hardware, not the optimal one. I'll
reintroduce my heuristics later as a follow-up patch. The major
advantage of the CUDA thread occupancy calculator is that it allows the
runtime to select a sensible default num_workers, avoiding those annoying
runtime failures due to insufficient GPU hardware resources.

One thing that may stick out in this patch is how it probes for the
driver version instead of the API version. It turns out that the API
version corresponds to the SM version declared in the PTX sources,
whereas the driver version corresponds to the latest version of CUDA
supported by the driver. At least that's the case with driver version
396.24.
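The version probe can be sketched as follows, assuming the value comes from cuDriverGetVersion, which encodes a CUDA version as 1000 * major + 10 * minor (e.g. 9020 for CUDA 9.2). The helper names are hypothetical; the 6.5 threshold is the one Tom gives for the availability of the occupancy API.

```c
#include <assert.h>
#include <stdbool.h>

/* CUDA encodes versions as 1000 * major + 10 * minor.  */
static void
decode_cuda_version (int version, int *major, int *minor)
{
  assert (version >= 0);
  *major = version / 1000;
  *minor = (version % 1000) / 10;
}

/* Hypothetical helper: the occupancy API is only assumed to be
   available from CUDA 6.5 (encoded as 6050) onwards.  */
static bool
have_occupancy_api (int driver_version)
{
  return driver_version >= 6050;
}
```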

>>  it's not good to have a program fail due to
>>  insufficient hardware resource errors when it is avoidable.
>>
> 
> Right, in fact there are two separate things you're trying to address
> here: launch failure and occupancy heuristic, so split the patch.

That hunk was small, so I included it in this patch, although I can
remove it if you insist.

Is this patch OK for trunk? I tested it on x86_64 with nvptx offloading.

Cesar
2018-07-XX  Cesar Philippidis  
	Tom de Vries  

	gcc/
	* config/nvptx/nvptx.c (PTX_GANG_DEFAULT): Rename to ...
	(PTX_DEFAULT_RUNTIME_DIM): ... this.
	(nvptx_goacc_validate_dims): Set default worker and gang dims to
	PTX_DEFAULT_RUNTIME_DIM.
	(nvptx_dim_limit): Ignore GOMP_DIM_WORKER.

	libgomp/
	* plugin/cuda/cuda.h (CUoccupancyB2DSize): Declare.
	(cuOccupancyMaxPotentialBlockSizeWithFlags): Likewise.
	* plugin/plugin-nvptx.c (struct ptx_device): Add driver_version member.
	(nvptx_open_device): Set it.
	(nvptx_exec): Use the CUDA driver to both determine default num_gangs
	and num_workers, and error if the hardware doesn't have sufficient
	resources to launch a kernel.


diff --git a/gcc/config/nvptx/nvptx.c b/gcc/config/nvptx/nvptx.c
index 5608bee8a8d..c1946e75f42 100644
--- a/gcc/config/nvptx/nvptx.c
+++ b/gcc/config/nvptx/nvptx.c
@@ -5165,7 +5165,7 @@ nvptx_expand_builtin (tree exp, rtx target, rtx ARG_UNUSED (subtarget),
 /* Define dimension sizes for known hardware.  */
 #define PTX_VECTOR_LENGTH 32
 #define PTX_WORKER_LENGTH 32
-#define PTX_GANG_DEFAULT  0 /* Defer to runtime.  */
+#define PTX_DEFAULT_RUNTIME_DIM 0 /* Defer to runtime.  */
 
 /* Implement TARGET_SIMT_VF target hook: number of threads in a warp.  */
 
@@ -5214,9 +5214,9 @@ nvptx_goacc_validate_dims (tree decl, int dims[], int fn_level)
 {
   dims[GOMP_DIM_VECTOR] = PTX_VECTOR_LENGTH;
   if (dims[GOMP_DIM_WORKER] < 0)
-	dims[GOMP_DIM_WORKER] = PTX_WORKER_LENGTH;
+	dims[GOMP_DIM_WORKER] = PTX_DEFAULT_RUNTIME_DIM;
   if (dims[GOMP_DIM_GANG] < 0)
-	dims[GOMP_

[PATCH] Fix PR70828 - broken array-type subarrays inside acc data, in OpenACC

2018-07-20 Thread Cesar Philippidis
Attached is an old gomp-4_0-branch patch that fixes PR70828. Besides
fixing the PR, it also introduces some changes which will enable the
forthcoming nvptx vector length enhancements. More details on the patch
can be found here: <https://gcc.gnu.org/ml/gcc-patches/2016-08/msg01293.html>.
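A hypothetical illustration of the situation the patch targets (not the testcase from the PR): a local array of array type is only partially mapped by an enclosing acc data region, and the parallel region then uses it without a data clause of its own. Per the patch's comments, the implicit data clause inside the offloaded region must then respect the enclosing subarray. Compiled without -fopenacc, the pragmas are ignored and the function simply runs on the host.

```c
#include <assert.h>

#define N 30

static int
sum_middle (void)
{
  int a[N];  /* array type, not a pointer */
  for (int i = 0; i < N; i++)
    a[i] = 0;

  /* Only a[10..19] is mapped by the data region.  */
#pragma acc data copy (a[10:10])
  {
    /* A has no explicit data clause here, so its implicit clause
       must refer to the subarray mapped above.  */
#pragma acc parallel loop
    for (int i = 10; i < 20; i++)
      a[i] = i;
  }

  int sum = 0;
  for (int i = 0; i < N; i++)
    sum += a[i];
  return sum;
}
```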

I bootstrapped and regtested on x86_64/nvptx. Is it OK for trunk?

Thanks,
Cesar
>From 3a58144cfaca8f6e3a889346e736e68a9ed17e6a Mon Sep 17 00:00:00 2001
From: Cesar Philippidis 
Date: Thu, 18 Aug 2016 01:12:15 +
Subject: [PATCH 1/5] Fix PR70828s "broken array-type subarrays inside acc data
 in openacc"

2018-XX-YY  Cesar Philippidis  

	gcc/
	* gimplify.c (struct gimplify_omp_ctx): Add tree clauses member.
	(new_omp_context): Initialize clauses to NULL_TREE.
	(gimplify_scan_omp_clauses): Set clauses in the gimplify_omp_ctx.
	(omp_clause_matching_array_ref): New function.
	(gomp_needs_data_present): New function.
	(gimplify_adjust_omp_clauses_1): Use preset or pointer omp clause map
	kinds when creating implicit data clauses for OpenACC offloaded
	variables used inside an acc data region, as necessary.  Link the new
	ACC clauses with the old ones.

	gcc/testsuite/
	* c-c++-common/goacc/acc-data-chain.c: New test.

	libgomp/
	* testsuite/libgomp.oacc-c-c++-common/pr70828.c: New test.
	* testsuite/libgomp.oacc-fortran/pr70828.f90: New test.
	* testsuite/libgomp.oacc-fortran/lib-13.f90: Remove XFAIL.
---
 gcc/gimplify.c| 101 +-
 .../c-c++-common/goacc/acc-data-chain.c   |  24 +
 .../libgomp.oacc-c-c++-common/pr70828.c   |  25 +
 .../testsuite/libgomp.oacc-fortran/lib-13.f90 |   1 -
 .../libgomp.oacc-fortran/pr70828.f90  |  24 +
 5 files changed, 173 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/goacc/acc-data-chain.c
 create mode 100644 libgomp/testsuite/libgomp.oacc-c-c++-common/pr70828.c
 create mode 100644 libgomp/testsuite/libgomp.oacc-fortran/pr70828.f90

diff --git a/gcc/gimplify.c b/gcc/gimplify.c
index 4a109aee27a..cf8977c8508 100644
--- a/gcc/gimplify.c
+++ b/gcc/gimplify.c
@@ -191,6 +191,7 @@ struct gimplify_omp_ctx
   bool target_map_scalars_firstprivate;
   bool target_map_pointers_as_0len_arrays;
   bool target_firstprivatize_array_bases;
+  tree clauses;
 };
 
 static struct gimplify_ctx *gimplify_ctxp;
@@ -409,6 +410,7 @@ new_omp_context (enum omp_region_type region_type)
   c->privatized_types = new hash_set;
   c->location = input_location;
   c->region_type = region_type;
+  c->clauses = NULL_TREE;
   if ((region_type & ORT_TASK) == 0)
 c->default_kind = OMP_CLAUSE_DEFAULT_SHARED;
   else
@@ -7501,6 +7503,7 @@ gimplify_scan_omp_clauses (tree *list_p, gimple_seq *pre_p,
   tree *prev_list_p = NULL;
 
   ctx = new_omp_context (region_type);
+  ctx->clauses = *list_p;
   outer_ctx = ctx->outer_context;
   if (code == OMP_TARGET)
 {
@@ -8696,6 +8699,58 @@ struct gimplify_adjust_omp_clauses_data
   gimple_seq *pre_p;
 };
 
+/* Return true if clause contains an array_ref of DECL.  */
+
+static bool
+omp_clause_matching_array_ref (tree clause, tree decl)
+{
+  tree cdecl = OMP_CLAUSE_DECL (clause);
+
+  if (TREE_CODE (cdecl) != ARRAY_REF)
+return false;
+
+  return TREE_OPERAND (cdecl, 0) == decl;
+}
+
+/* Inside OpenACC parallel and kernels regions, the implicit data
+   clauses for arrays must respect the explicit data clauses set by a
+   containing acc data region.  Specifically, care must be taken with
+   pointers, or if a subarray of a local array is specified in an acc
+   data region, so that the referenced array inside the offloaded
+   region has a present data clause for that array with an
+   appropriate subarray argument.  This function returns the tree
+   node of the acc data clause that utilizes DECL as an argument.  */
+
+static tree
+gomp_needs_data_present (tree decl)
+{
+  gimplify_omp_ctx *ctx = NULL;
+  bool found_match = false;
+  tree c = NULL_TREE;
+
+  if (TREE_CODE (TREE_TYPE (decl)) != ARRAY_TYPE)
+return NULL_TREE;
+
+  if (gimplify_omp_ctxp->region_type != ORT_ACC_PARALLEL
+  && gimplify_omp_ctxp->region_type != ORT_ACC_KERNELS)
+return NULL_TREE;
+
+  for (ctx = gimplify_omp_ctxp->outer_context; !found_match && ctx;
+   ctx = ctx->outer_context)
+{
+  if (ctx->region_type != ORT_ACC_DATA)
+	break;
+
+  for (c = ctx->clauses; c; c = OMP_CLAUSE_CHAIN (c))
+	if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_MAP
+	&& (omp_clause_matching_array_ref (c, decl)
+		|| OMP_CLAUSE_MAP_KIND (c) == GOMP_MAP_POINTER))
+	  return c;
+}
+
+  return NULL_TREE;
+}
+
 /* For all variables that were not actually used within the context,
remove PRIVATE, SHARED, and FIRSTPRIVATE clauses.  */
 
@@ -8849,7 +8904,51 @@ gimplify_adjust_omp_clauses_1 (splay_tree_node n, void *data)
 	  gcc_unreachable ();
 	}
   OMP_CLAUSE_SET_MAP_KIND (clause, kind);
-  if (DECL_SIZE (decl)
+

[PATCH] Add support for making maps 'private' inside OpenACC offloaded regions

2018-07-20 Thread Cesar Philippidis
Due to the different levels of parallelism available in OpenACC, it is
useful to mark certain variables as GOMP_MAP_PRIVATE so that they can be
used in reductions. This patch was introduced in openacc-gcc-7-branch
here <https://gcc.gnu.org/ml/gcc-patches/2017-09/msg00274.html>.
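The decision the gimplifier makes in omp_add_variable reduces to the predicate below. This is a standalone sketch: struct loop_clauses and reduction_map_private are hypothetical names, but the rule itself is the patch's `map_private = !gang && (worker || vector)`.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the parallelism clauses on an acc loop.  */
struct loop_clauses
{
  bool gang, worker, vector;
};

/* The implicit copy map created for a reduction variable is marked
   private only when the loop is known not to be gang-partitioned,
   i.e. it names worker or vector but not gang.  */
static bool
reduction_map_private (struct loop_clauses c)
{
  return !c.gang && (c.worker || c.vector);
}
```

Note that a loop with no partitioning clause at all is not privatized by this rule.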


I bootstrapped and regtested on x86_64/nvptx. Is it OK for trunk?

Thanks,
Cesar

>From b0e7fb09bf3a3f853e77c2712b6f85ad21472e72 Mon Sep 17 00:00:00 2001
From: Chung-Lin Tang 
Date: Tue, 5 Sep 2017 22:09:34 +0800
Subject: [PATCH 2/5] [OpenACC] Add support for making maps 'private' inside
 offloaded regions

2018-XX-YY Chung-Lin Tang  
	   Cesar Philippidis  

	gcc/
	* tree.h (OMP_CLAUSE_MAP_PRIVATE): Define macro.
	* gimplify.c (enum gimplify_omp_var_data): Add GOVD_MAP_PRIVATE enum value.
	(omp_add_variable): Add GOVD_MAP_PRIVATE to reduction clause flags if
	not a gang-partitioned loop directive.
	(gimplify_adjust_omp_clauses_1): Set OMP_CLAUSE_MAP_PRIVATE of new map
	clause to 1 if GOVD_MAP_PRIVATE flag is present.
	* omp-low.c (lower_oacc_reductions): Handle map clauses with
	OMP_CLAUSE_MAP_PRIVATE set in the same manner as firstprivate/private.
	(lower_omp_target): Likewise. Add copy back code for map clauses with
	OMP_CLAUSE_MAP_PRIVATE set.

	libgomp/
	* testsuite/libgomp.oacc-c-c++-common/reduction-9.c: New test.

(cherry picked from openacc-gcc-7-branch commit
2dc21f336368889c1ebf031801a7613f65899ef1, e17bb2068f9)
---
 gcc/gimplify.c| 34 ++-
 gcc/omp-low.c | 28 +++--
 gcc/tree.h|  3 ++
 .../libgomp.oacc-c-c++-common/reduction-9.c   | 41 +++
 4 files changed, 101 insertions(+), 5 deletions(-)
 create mode 100644 libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-9.c

diff --git a/gcc/gimplify.c b/gcc/gimplify.c
index cf8977c8508..7dadf69b758 100644
--- a/gcc/gimplify.c
+++ b/gcc/gimplify.c
@@ -105,6 +105,9 @@ enum gimplify_omp_var_data
   /* Flag for GOVD_MAP: must be present already.  */
   GOVD_MAP_FORCE_PRESENT = 524288,
 
+  /* Flag for GOVD_MAP, copy to/from private storage inside offloaded region.  */
+  GOVD_MAP_PRIVATE = 1048576,
+
   GOVD_DATA_SHARE_CLASS = (GOVD_SHARED | GOVD_PRIVATE | GOVD_FIRSTPRIVATE
 			   | GOVD_LASTPRIVATE | GOVD_REDUCTION | GOVD_LINEAR
 			   | GOVD_LOCAL)
@@ -6835,6 +6838,21 @@ omp_add_variable (struct gimplify_omp_ctx *ctx, tree decl, unsigned int flags)
   if (ctx->region_type == ORT_ACC && (flags & GOVD_REDUCTION))
 {
   struct gimplify_omp_ctx *outer_ctx = ctx->outer_context;
+
+  bool gang = false, worker = false, vector = false;
+  for (tree c = ctx->clauses; c; c = OMP_CLAUSE_CHAIN (c))
+	{
+	  if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_GANG)
+	gang = true;
+	  else if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_WORKER)
+	worker = true;
+	  else if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_VECTOR)
+	vector = true;
+	}
+
+  /* Set new copy map as 'private' if sure we're not gang-partitioning.  */
+  bool map_private = !gang && (worker || vector);
+
   while (outer_ctx)
 	{
 	  n = splay_tree_lookup (outer_ctx->variables, (splay_tree_key)decl);
@@ -6856,12 +6874,21 @@ omp_add_variable (struct gimplify_omp_ctx *ctx, tree decl, unsigned int flags)
 		  /* Remove firstprivate and make it a copy map.  */
 		  n->value &= ~GOVD_FIRSTPRIVATE;
 		  n->value |= GOVD_MAP;
+
+		  /* If not gang-partitioned, add MAP_PRIVATE on the map
+		 clause.  */
+		  if (map_private)
+		n->value |= GOVD_MAP_PRIVATE;
 		}
 	}
 	  else if (outer_ctx->region_type == ORT_ACC_PARALLEL)
 	{
-	  splay_tree_insert (outer_ctx->variables, (splay_tree_key)decl,
- GOVD_MAP | GOVD_SEEN);
+	  unsigned f = GOVD_MAP | GOVD_SEEN;
+
+	  /* If not gang-partitioned, add MAP_PRIVATE on the map clause.  */
+	  if (map_private)
+		f |= GOVD_MAP_PRIVATE;
+	  splay_tree_insert (outer_ctx->variables, (splay_tree_key)decl, f);
 	  break;
 	}
 	  outer_ctx = outer_ctx->outer_context;
@@ -8904,6 +8931,9 @@ gimplify_adjust_omp_clauses_1 (splay_tree_node n, void *data)
 	  gcc_unreachable ();
 	}
   OMP_CLAUSE_SET_MAP_KIND (clause, kind);
+  if ((flags & GOVD_MAP_PRIVATE)
+	  && TREE_CODE (OMP_CLAUSE_DECL (clause)) == VAR_DECL)
+	OMP_CLAUSE_MAP_PRIVATE (clause) = 1;
   tree c2 = gomp_needs_data_present (decl);
   /* Handle OpenACC pointers that were declared inside acc data
 	 regions.  */
diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index 714490d6921..ef3c7651c74 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -4907,7 +4907,9 @@ lower_oacc_reductions (location_t loc, tree clauses, tree level, bool inner,
 		  goto has_outer_reduction;
 		}
 		  else if ((OMP_CLAUSE_CODE (cls) == OMP_CLAUSE_FIRSTPRIVATE
-			|| OMP_CLAUSE_CODE (cls) == OMP_CLAUSE_PRIVATE)
+			|| OMP_CLAUSE_CODE (cls) == OMP

[PATCH] Privatize independent OpenACC reductions

2018-07-20 Thread Cesar Philippidis
This is another OpenACC reduction patch to privatize reduction variables
used inside inner acc loops. For some reason, I can't find the original
email announcement on the gcc-patches mailing list. But according to the
ChangeLog, I committed that change to og7 back on Jan 26, 2018.
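The new oacc_privatize_reduction helper can be modelled outside GCC roughly as follows. This is a simplified sketch: the enum, struct and function names are hypothetical stand-ins for gimplify_omp_ctx and its clause list. For a reduction loop without partitioning clauses, the patch applies the helper to the loop's outer context, so a reduction nested inside a possibly gang-partitioned loop (as in the inner-reduction.c test) gets a private map.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for gimplify_omp_ctx.  */
enum region { REGION_ACC_LOOP, REGION_ACC_PARALLEL };

struct ctx
{
  enum region type;
  bool seq;                 /* loop carries a seq clause */
  const struct ctx *outer;
};

/* Return true if C is an acc loop that might be gang-partitioned.
   A seq loop defers to its own enclosing context; any other acc loop
   (explicit gang, or no partitioning clause at all) may take gangs,
   since independent loops are assigned gangs outermost first.  */
static bool
may_be_gang_partitioned (const struct ctx *c)
{
  if (c == NULL || c->type != REGION_ACC_LOOP)
    return false;
  if (c->seq)
    return may_be_gang_partitioned (c->outer);
  return true;
}
```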

I bootstrapped and regtested on x86_64/nvptx. Is it OK for trunk?

Thanks,
Cesar
>From a4753e2b40cf3d707aabd7c9d5bad7d8f9be8b6f Mon Sep 17 00:00:00 2001
From: Cesar Philippidis 
Date: Fri, 26 Jan 2018 08:30:13 -0800
Subject: [PATCH 3/5] Privatize independent OpenACC reductions

2018-XX-YY  Cesar Philippidis  

	gcc/
	* gimplify.c (oacc_privatize_reduction): New function.
	(omp_add_variable): Use it to determine if a reduction variable
	needs to be privatized.

	libgomp/
	* testsuite/libgomp.oacc-c-c++-common/inner-reduction.c: New test.

(cherry picked from openacc-gcc-7-branch commit
330ba2316fabd0e5525c99fdacedb0bfae270244, 133f3a8fb5c)
---
 gcc/gimplify.c| 35 ++-
 .../inner-reduction.c | 23 
 2 files changed, 57 insertions(+), 1 deletion(-)
 create mode 100644 libgomp/testsuite/libgomp.oacc-c-c++-common/inner-reduction.c

diff --git a/gcc/gimplify.c b/gcc/gimplify.c
index 7dadf69b758..737a280cfe9 100644
--- a/gcc/gimplify.c
+++ b/gcc/gimplify.c
@@ -6722,6 +6722,32 @@ omp_firstprivatize_type_sizes (struct gimplify_omp_ctx *ctx, tree type)
   lang_hooks.types.omp_firstprivatize_type_sizes (ctx, type);
 }
 
+/* Determine if CTX might contain any gang partitioned loops.  During
+   oacc_dev_low, independent loops are assigned gangs at the outermost
+   level, and vectors in the innermost.  */
+
+static bool
+oacc_privatize_reduction (struct gimplify_omp_ctx *ctx)
+{
+  if (ctx == NULL)
+return false;
+
+  if (ctx->region_type != ORT_ACC)
+return false;
+
+  for (tree c = ctx->clauses; c; c = OMP_CLAUSE_CHAIN (c))
+switch (OMP_CLAUSE_CODE (c))
+  {
+  case OMP_CLAUSE_SEQ:
+	return oacc_privatize_reduction (ctx->outer_context);
+  case OMP_CLAUSE_GANG:
+	return true;
+  default:;
+  }
+
+  return true;
+}
+
 /* Add an entry for DECL in the OMP context CTX with FLAGS.  */
 
 static void
@@ -6851,7 +6877,14 @@ omp_add_variable (struct gimplify_omp_ctx *ctx, tree decl, unsigned int flags)
 	}
 
   /* Set new copy map as 'private' if sure we're not gang-partitioning.  */
-  bool map_private = !gang && (worker || vector);
+  bool map_private;
+
+  if (gang)
+	map_private = false;
+  else if (worker || vector)
+	map_private = true;
+  else
+	map_private = oacc_privatize_reduction (ctx->outer_context);
 
   while (outer_ctx)
 	{
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/inner-reduction.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/inner-reduction.c
new file mode 100644
index 000..0c317dcf8a6
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/inner-reduction.c
@@ -0,0 +1,23 @@
+#include 
+
+int
+main ()
+{
+  const int n = 1000;
+  int i, j, temp, a[n];
+
+#pragma acc parallel loop
+  for (i = 0; i < n; i++)
+{
+  temp = i;
+#pragma acc loop reduction (+:temp)
+  for (j = 0; j < n; j++)
+	temp ++;
+  a[i] = temp;
+}
+
+  for (i = 0; i < n; i++)
+assert (a[i] == i+n);
+
+  return 0;
+}
-- 
2.17.1



[PATCH] Enable firstprivate OpenACC reductions

2018-07-20 Thread Cesar Philippidis
At present, all reduction variables are transferred via an implicit
'copy' clause. As shown in the recent patches I've been posting, that
causes a lot of problems when the reduction variables are used by
multiple workers or vectors. This patch teaches the gimplifier to
transfer reduction variables as firstprivate in OpenACC parallel regions
if they are in an inner loop. This matches the behavior of reductions in
OpenACC 2.6.
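Restricted to acc parallel regions, the resulting behaviour can be summarized as below. This is a sketch with hypothetical names: only a reduction loop immediately enclosed by the parallel construct forces a copy map onto it (private or not, depending on the map_private decision), while reductions on inner loops leave the variable firstprivate on the parallel region. Kernels regions are not modelled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical region kinds, loosely mirroring ORT_ACC and
   ORT_ACC_PARALLEL in gimplify.c.  */
enum region { REGION_ACC_LOOP, REGION_ACC_PARALLEL };

enum reduction_xfer { XFER_FIRSTPRIVATE, XFER_COPY, XFER_COPY_PRIVATE };

/* IMMEDIATELY_ENCLOSING is the region directly around the loop that
   carries the reduction clause.  */
static enum reduction_xfer
reduction_transfer (enum region immediately_enclosing, bool map_private)
{
  if (immediately_enclosing != REGION_ACC_PARALLEL)
    return XFER_FIRSTPRIVATE;  /* inner loop: no implicit copy map */
  return map_private ? XFER_COPY_PRIVATE : XFER_COPY;
}
```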

Is this patch OK for trunk? I bootstrapped and regtested on x86_64/nvptx.

Thanks,
Cesar
>From 035be51a795ad8bed5342ba181220bf3102bcd6d Mon Sep 17 00:00:00 2001
From: Cesar Philippidis 
Date: Wed, 31 Jan 2018 07:21:53 -0800
Subject: [PATCH 4/5] Enable firstprivate OpenACC reductions

2018-XX-YY  Cesar Philippidis  

	gcc/
	* gimplify.c (omp_add_variable): Allow certain OpenACC reduction
	variables to remain firstprivate.

	gcc/testsuite/
	* c-c++-common/goacc/reduction-8.c: New test.

(cherry picked from openacc-gcc-7-branch commit
441621739e2a067c97409f8b0e3e30362a7905be, cec00212ad8)
---
 gcc/gimplify.c| 30 --
 .../c-c++-common/goacc/reduction-8.c  | 94 +++
 2 files changed, 117 insertions(+), 7 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/goacc/reduction-8.c

diff --git a/gcc/gimplify.c b/gcc/gimplify.c
index 737a280cfe9..bcfb029275c 100644
--- a/gcc/gimplify.c
+++ b/gcc/gimplify.c
@@ -6858,9 +6858,16 @@ omp_add_variable (struct gimplify_omp_ctx *ctx, tree decl, unsigned int flags)
   else
 splay_tree_insert (ctx->variables, (splay_tree_key)decl, flags);
 
-  /* For reductions clauses in OpenACC loop directives, by default create a
- copy clause on the enclosing parallel construct for carrying back the
- results.  */
+  /* For OpenACC loop directives, when a reduction is immediately
+ enclosed within an acc parallel or kernels construct, it must
+ have an implied copy data mapping. E.g.
+
+   #pragma acc parallel
+	 {
+	   #pragma acc loop reduction (+:sum)
+
+ a copy clause for sum should be added on the enclosing parallel
+ construct for carrying back the results.  */
   if (ctx->region_type == ORT_ACC && (flags & GOVD_REDUCTION))
 {
   struct gimplify_omp_ctx *outer_ctx = ctx->outer_context;
@@ -6876,8 +6883,11 @@ omp_add_variable (struct gimplify_omp_ctx *ctx, tree decl, unsigned int flags)
 	vector = true;
 	}
 
-  /* Set new copy map as 'private' if sure we're not gang-partitioning.  */
-  bool map_private;
+  /* Reduction data maps need to be marked as private for worker
+	 and vector loops, in order to ensure that the value of the
+	 reduction is carried back to the host.  Set new copy map as
+	 'private' if sure we're not gang-partitioning.  */
+  bool map_private, update_data_map = false;
 
   if (gang)
 	map_private = false;
@@ -6886,6 +6896,10 @@ omp_add_variable (struct gimplify_omp_ctx *ctx, tree decl, unsigned int flags)
   else
 	map_private = oacc_privatize_reduction (ctx->outer_context);
 
+  if (ctx->outer_context
+	  && ctx->outer_context->region_type == ORT_ACC_PARALLEL)
+	update_data_map = true;
+
   while (outer_ctx)
 	{
 	  n = splay_tree_lookup (outer_ctx->variables, (splay_tree_key)decl);
@@ -6902,7 +6916,8 @@ omp_add_variable (struct gimplify_omp_ctx *ctx, tree decl, unsigned int flags)
 		  gcc_assert (!(n->value & GOVD_FIRSTPRIVATE)
 			  && (n->value & GOVD_MAP));
 		}
-	  else if (outer_ctx->region_type == ORT_ACC_PARALLEL)
+	  else if (update_data_map
+		   && outer_ctx->region_type == ORT_ACC_PARALLEL)
 		{
 		  /* Remove firstprivate and make it a copy map.  */
 		  n->value &= ~GOVD_FIRSTPRIVATE;
@@ -6914,7 +6929,8 @@ omp_add_variable (struct gimplify_omp_ctx *ctx, tree decl, unsigned int flags)
 		n->value |= GOVD_MAP_PRIVATE;
 		}
 	}
-	  else if (outer_ctx->region_type == ORT_ACC_PARALLEL)
+	  else if (update_data_map
+		   && outer_ctx->region_type == ORT_ACC_PARALLEL)
 	{
 	  unsigned f = GOVD_MAP | GOVD_SEEN;
 
diff --git a/gcc/testsuite/c-c++-common/goacc/reduction-8.c b/gcc/testsuite/c-c++-common/goacc/reduction-8.c
new file mode 100644
index 000..8a0283f4ac3
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/goacc/reduction-8.c
@@ -0,0 +1,94 @@
+/* { dg-additional-options "-fdump-tree-gimple" } */
+
+#define n 1000
+
+int
+main(void)
+{
+  int i, j;
+  int result, array[n];
+
+#pragma acc parallel loop reduction (+:result)
+  for (i = 0; i < n; i++)
+result ++;
+
+#pragma acc parallel
+#pragma acc loop reduction (+:result)
+  for (i = 0; i < n; i++)
+result ++;
+
+#pragma acc parallel
+#pragma acc loop
+  for (i = 0; i < n; i++)
+{
+  result = i;
+
+#pragma acc loop reduction(+:result)
+  for (j = 0; j < n; j++)
+	result ++;
+
+  array[i] = result;
+}
+

[PATCH] Adjust offsets for present data clauses

2018-07-20 Thread Cesar Philippidis
This is another old gomp4 patch; it corrects a bug where the runtime
was passing the wrong offset for subarray data to the accelerator. The
original description of this patch can be found here:
<https://gcc.gnu.org/ml/gcc-patches/2016-08/msg01676.html>.
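The address computation the patch introduces can be sketched in isolation as follows. The two structs are hypothetical, flattened stand-ins for libgomp's mapping records (in libgomp the base address lives in the key's target_mem_desc), and the meaning given to the per-entry offset is an assumption based on the patch description. In the data_offset.c test, `data+10` is present inside a mapping of `data[0:n]`, so the entry offset would be 10 * sizeof (int).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, flattened version of a mapping record.  */
struct mapping
{
  uintptr_t tgt_start;   /* device base address of the mapped block */
  uintptr_t tgt_offset;  /* offset of this variable within the block */
};

/* Hypothetical per-argument entry.  */
struct list_entry
{
  const struct mapping *key;  /* NULL if nothing was mapped */
  uintptr_t offset;           /* assumed: host address minus mapping start */
};

/* Device address passed to the kernel: before the fix the entry's own
   offset was dropped, so a present clause naming a subarray that does
   not start at the mapping's start got the wrong address.  */
static void *
device_address (const struct list_entry *e)
{
  if (e->key == NULL)
    return NULL;
  return (void *) (e->key->tgt_start + e->key->tgt_offset + e->offset);
}
```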

I bootstrapped and regtested on x86_64/nvptx. Is it OK for trunk?

Thanks,
Cesar
>From fb743d8a45193c177cb0082400d140949e8c1e6d Mon Sep 17 00:00:00 2001
From: Cesar Philippidis 
Date: Wed, 24 Aug 2016 00:02:50 +
Subject: [PATCH 5/5] [libgomp, OpenACC] Adjust offsets for present data
 clauses

2018-XX-YY  Cesar Philippidis  

	libgomp/
	* oacc-parallel.c (GOACC_parallel_keyed): Add offset to devaddrs.
	* testsuite/libgomp.oacc-c-c++-common/data_offset.c: New test.
	* testsuite/libgomp.oacc-fortran/data_offset.f90: New test.

(cherry picked from gomp-4_0-branch r239723, 00c2585)
---
 libgomp/oacc-parallel.c   | 10 -
 .../libgomp.oacc-c-c++-common/data_offset.c   | 41 ++
 .../libgomp.oacc-fortran/data_offset.f90  | 43 +++
 3 files changed, 92 insertions(+), 2 deletions(-)
 create mode 100644 libgomp/testsuite/libgomp.oacc-c-c++-common/data_offset.c
 create mode 100644 libgomp/testsuite/libgomp.oacc-fortran/data_offset.f90

diff --git a/libgomp/oacc-parallel.c b/libgomp/oacc-parallel.c
index b80ace58590..20e9ab2e251 100644
--- a/libgomp/oacc-parallel.c
+++ b/libgomp/oacc-parallel.c
@@ -231,8 +231,14 @@ GOACC_parallel_keyed (int device, void (*fn) (void *),
 
   devaddrs = gomp_alloca (sizeof (void *) * mapnum);
   for (i = 0; i < mapnum; i++)
-devaddrs[i] = (void *) (tgt->list[i].key->tgt->tgt_start
-			+ tgt->list[i].key->tgt_offset);
+{
+  if (tgt->list[i].key != NULL)
+	devaddrs[i] = (void *) (tgt->list[i].key->tgt->tgt_start
++ tgt->list[i].key->tgt_offset
++ tgt->list[i].offset);
+  else
+	devaddrs[i] = NULL;
+}
 
   acc_dev->openacc.exec_func (tgt_fn, mapnum, hostaddrs, devaddrs,
 			  async, dims, tgt);
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/data_offset.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/data_offset.c
new file mode 100644
index 000..ccbbfcab87b
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/data_offset.c
@@ -0,0 +1,41 @@
+/* Test present data clauses in acc offloaded regions when the
+   subarray inside the present clause does not have the same base
+   offset value as the subarray in the enclosing acc data or acc enter
+   data variable.  */
+
+#include 
+
+void
+offset (int *data, int n)
+{
+  int i;
+
+#pragma acc parallel loop present (data[0:n])
+  for (i = 0; i < n; i++)
+data[i] = n;
+}
+
+int
+main ()
+{
+  const int n = 30;
+  int data[n], i;
+
+  for (i = 0; i < n; i++)
+data[i] = -1;
+
+#pragma acc data copy(data[0:n])
+  {
+offset (data+10, 10);
+  }
+
+  for (i = 0; i < n; i++)
+{
+  if (i < 10 || i >= 20)
+	assert (data[i] == -1);
+  else
+	assert (data[i] == 10);
+}
+
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.oacc-fortran/data_offset.f90 b/libgomp/testsuite/libgomp.oacc-fortran/data_offset.f90
new file mode 100644
index 000..ff8ee39f964
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-fortran/data_offset.f90
@@ -0,0 +1,43 @@
+! Test present data clauses in acc offloaded regions when the subarray
+! inside the present clause does not have the same base offset value
+! as the subarray in the enclosing acc data or acc enter data variable.
+
+program test
+  implicit none
+
+  integer, parameter :: n = 30, m = 10
+  integer :: i
+  integer, allocatable :: data(:)
+  logical bounded
+
+  allocate (data(n))
+
+  data(:) = -1
+
+  !$acc data copy (data(5:20))
+  call test_data (data, n, m)
+  !$acc end data
+
+  do i = 1, n
+ bounded = i < m .or. i >= m+m
+ if (bounded .and. (data(i) /= -1)) then
+call abort
+ else if (.not. bounded .and. data(i) /= 10) then
+call abort
+ end if
+  end do
+
+  deallocate (data)
+end program test
+
+subroutine test_data (data, n, m)
+  implicit none
+
+  integer :: n, m, data(n), i
+
+  !$acc parallel loop present (data(m:m))
+  do i = m, m+m-1
+ data(i) = m
+  end do
+  !$acc end parallel loop
+end subroutine test_data
-- 
2.17.1



[PATCH 1/3] Correct the reported line number in fortran combined OpenACC directives

2018-07-25 Thread Cesar Philippidis
The Fortran FE incorrectly records the line locations of combined acc
loop directives when it lowers the construct to gimple. Usually this
isn't a problem because the Fortran FE is able to report problems with
acc loops itself. However, there will be inaccuracies if the ME tries
to use those locations.

Note that test cases are conspicuously absent from this patch.
However, without this bug fix, -fopt-info-note-omp will report bogus
line numbers. This code will be exercised by a test case in a later
patch in this series.

Is this OK for trunk? I bootstrapped and regtested it on x86_64 with
nvptx offloading.

Thanks,
Cesar

2018-XX-YY  Cesar Philippidis  

gcc/fortran/
* trans-openmp.c (gfc_trans_oacc_combined_directive): Set the
location of combined acc loops.

(cherry picked from gomp-4_0-branch r245653)

diff --git a/gcc/fortran/trans-openmp.c b/gcc/fortran/trans-openmp.c
index f038f4c..e7707d0 100644
--- a/gcc/fortran/trans-openmp.c
+++ b/gcc/fortran/trans-openmp.c
@@ -3869,6 +3869,7 @@ gfc_trans_oacc_combined_directive (gfc_code *code)
   gfc_omp_clauses construct_clauses, loop_clauses;
   tree stmt, oacc_clauses = NULL_TREE;
   enum tree_code construct_code;
+  location_t loc = input_location;
 
   switch (code->op)
 {
@@ -3930,12 +3931,16 @@ gfc_trans_oacc_combined_directive (gfc_code *code)
   else
 pushlevel ();
   stmt = gfc_trans_omp_do (code, EXEC_OACC_LOOP, pblock, &loop_clauses, NULL);
+
+  if (CAN_HAVE_LOCATION_P (stmt))
+SET_EXPR_LOCATION (stmt, loc);
+
   if (TREE_CODE (stmt) != BIND_EXPR)
 stmt = build3_v (BIND_EXPR, NULL, stmt, poplevel (1, 0));
   else
 poplevel (0, 0);
-  stmt = build2_loc (input_location, construct_code, void_type_node, stmt,
-oacc_clauses);
+
+  stmt = build2_loc (loc, construct_code, void_type_node, stmt, oacc_clauses);
   gfc_add_expr_to_block (&block, stmt);
   return gfc_finish_block (&block);
 }
-- 
2.7.4



[PATCH 0/3] Add OpenACC diagnostics to -fopt-info-note-omp

2018-07-25 Thread Cesar Philippidis
This patch series extends -fopt-info-note-omp to include OpenACC loop
diagnostics when it is used in conjunction with -fopenacc. At present,
the diagnostics are limited to reporting how OpenACC loops are
partitioned, e.g., seq, gang, worker or vector. The major advantage of
these diagnostics is that they inform the user how GCC automatically
partitions independent loops, i.e., acc loops without any parallelism
clauses inside acc parallel regions. This information gives the
user insight into how to select num_gangs, num_workers and
vector_length for their application.
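Code like the following is the intended use case: neither acc loop carries a parallelism clause, so when it is compiled with -fopenacc -fopt-info-note-omp the notes should report how GCC partitioned each loop. This is an illustrative example only; the exact text of the notes is not reproduced here. Without -fopenacc the pragmas are ignored and the loops run serially.

```c
#include <assert.h>

#define N 64

static long
sum_matrix (void)
{
  static int m[N][N];
  long total = 0;

  /* Independent loops: GCC chooses the gang/worker/vector mapping.  */
#pragma acc parallel loop copyout (m)
  for (int i = 0; i < N; i++)
    {
#pragma acc loop
      for (int j = 0; j < N; j++)
	m[i][j] = i + j;
    }

  for (int i = 0; i < N; i++)
    for (int j = 0; j < N; j++)
      total += m[i][j];
  return total;
}
```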

All three patches in this series are independent of one
another. Patches 1 and 2 fix diagnostics bugs involving incorrect line
numbers. Patch 3 is responsible for generating the actual diagnostics.

Cesar


[PATCH 2/3] Correct the reported line number in c++ combined OpenACC directives

2018-07-25 Thread Cesar Philippidis
Like the Fortran FE, the C++ FE doesn't set the expr_location of the
split acc loop in combined acc parallel/kernels loop directives. This
only happens with combined directives; otherwise,
cp_parser_omp_construct would be responsible for setting the
location. After fixing this bug, I was able to resolve a couple of
long-standing diagnostics discrepancies between the C and C++ FEs in the
test suite.

Is this patch OK for trunk? I bootstrapped and regtested using x86_64
with nvptx offloading.

Thanks,
Cesar

2018-XX-YY  Cesar Philippidis  

gcc/cp/
* parser.c (cp_parser_oacc_kernels_parallel): Adjust EXPR_LOCATION
on the combined acc loop.

gcc/testsuite/
* c-c++-common/goacc/combined-directives-3.c: New test.
* c-c++-common/goacc/loop-2-kernels.c (void K): Adjust test.
* c-c++-common/goacc/loop-2-parallel.c (void P): Adjust test.
* c-c++-common/goacc/loop-3.c (void p2): Adjust test.

(cherry picked from gomp-4_0-branch r245673)

diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index 90d5d00..52e61fc 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -37183,8 +37183,9 @@ cp_parser_oacc_kernels_parallel (cp_parser *parser, cp_token *pragma_tok,
  cp_lexer_consume_token (parser->lexer);
  tree block = begin_omp_parallel ();
  tree clauses;
- cp_parser_oacc_loop (parser, pragma_tok, p_name, mask, &clauses,
-  if_p);
+ tree stmt = cp_parser_oacc_loop (parser, pragma_tok, p_name, mask,
+  &clauses, if_p);
+ protected_set_expr_location (stmt, pragma_tok->location);
  return finish_omp_construct (code, block, clauses);
}
 }
diff --git a/gcc/testsuite/c-c++-common/goacc/combined-directives-3.c b/gcc/testsuite/c-c++-common/goacc/combined-directives-3.c
new file mode 100644
index 000..77d4182
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/goacc/combined-directives-3.c
@@ -0,0 +1,24 @@
+/* Verify the accuracy of the line number associated with combined
+   constructs.  */
+
+int
+main ()
+{
+  int x, y, z;
+
+#pragma acc parallel loop seq auto /* { dg-error "'seq' overrides other OpenACC loop specifiers" } */
+  for (x = 0; x < 10; x++)
+#pragma acc loop
+for (y = 0; y < 10; y++)
+  ;
+
+#pragma acc parallel loop gang auto /* { dg-error "'auto' conflicts with other OpenACC loop specifiers" } */
+  for (x = 0; x < 10; x++)
+#pragma acc loop worker auto /* { dg-error "'auto' conflicts with other OpenACC loop specifiers" } */
+for (y = 0; y < 10; y++)
+#pragma acc loop vector
+  for (z = 0; z < 10; z++)
+   ;
+
+  return 0;
+}
diff --git a/gcc/testsuite/c-c++-common/goacc/loop-2-kernels.c b/gcc/testsuite/c-c++-common/goacc/loop-2-kernels.c
index 01ad32d..3a11ef5f 100644
--- a/gcc/testsuite/c-c++-common/goacc/loop-2-kernels.c
+++ b/gcc/testsuite/c-c++-common/goacc/loop-2-kernels.c
@@ -145,8 +145,8 @@ void K(void)
 #pragma acc kernels loop worker(num:5)
   for (i = 0; i < 10; i++)
 { }
-#pragma acc kernels loop seq worker // { dg-error "'seq' overrides" "" { target c } }
-  for (i = 0; i < 10; i++) // { dg-error "'seq' overrides" "" { target c++ } }
+#pragma acc kernels loop seq worker // { dg-error "'seq' overrides" }
+  for (i = 0; i < 10; i++)
 { }
 #pragma acc kernels loop gang worker
   for (i = 0; i < 10; i++)
@@ -161,8 +161,8 @@ void K(void)
 #pragma acc kernels loop vector(length:5)
   for (i = 0; i < 10; i++)
 { }
-#pragma acc kernels loop seq vector // { dg-error "'seq' overrides" "" { target c } }
-  for (i = 0; i < 10; i++) // { dg-error "'seq' overrides" "" { target c++ } }
+#pragma acc kernels loop seq vector // { dg-error "'seq' overrides" }
+  for (i = 0; i < 10; i++)
 { }
 #pragma acc kernels loop gang vector
   for (i = 0; i < 10; i++)
@@ -174,16 +174,16 @@ void K(void)
 #pragma acc kernels loop auto
   for (i = 0; i < 10; i++)
 { }
-#pragma acc kernels loop seq auto // { dg-error "'seq' overrides" "" { target c } }
-  for (i = 0; i < 10; i++) // { dg-error "'seq' overrides" "" { target c++ } }
+#pragma acc kernels loop seq auto // { dg-error "'seq' overrides" }
+  for (i = 0; i < 10; i++)
 { }
-#pragma acc kernels loop gang auto // { dg-error "'auto' conflicts" "" { target c } }
-  for (i = 0; i < 10; i++) // { dg-error "'auto' conflicts" "" { target c++ } }
+#pragma acc kernels loop gang auto // { dg-error "'auto' conflicts" }
+  for (i = 0; i < 10; i++)
 { }
-#pragma acc kernels loop worker aut

[PATCH 3/3] Add user-friendly OpenACC diagnostics regarding detected parallelism.

2018-07-25 Thread Cesar Philippidis
This patch teaches GCC to inform the user how it assigned parallelism
to each OpenACC loop at compile time using the -fopt-info-note-omp
flag. For instance, given the acc parallel loop nest:

  #pragma acc parallel loop
  for (...)
#pragma acc loop vector
for (...)

GCC will report something like

  foo.c:4:0: note: Detected parallelism 
  foo.c:6:0: note: Detected parallelism 

Note how only the inner loop specifies vector parallelism. In this
example, GCC automatically assigned gang and worker parallelism to the
outermost loop. Perhaps, going forward, it would be useful to
distinguish which parallelism was specified by the user and which was
assigned by the compiler. But that can be added in a follow-up patch.
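For reference, the way inform_oacc_loop turns loop->mask into the words of the note can be sketched standalone. The GOMP_DIM_* values and GOMP_DIM_MASK below are assumed to match libgomp's gomp-constants.h. format_parallelism is a hypothetical helper, and it reproduces only the four strings passed to dump_printf_loc, not the rest of the message text.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Assumed to mirror gomp-constants.h; check against the tree.  */
#define GOMP_DIM_GANG 0
#define GOMP_DIM_WORKER 1
#define GOMP_DIM_VECTOR 2
#define GOMP_DIM_MASK(X) (1u << (X))

/* Hypothetical helper: concatenate the same four pieces that
   inform_oacc_loop computes.  An empty mask means the loop is seq.
   BUF must be able to hold " gang worker vector" plus the NUL.  */
static const char *
format_parallelism (unsigned mask, char *buf, size_t len)
{
  assert (len >= 32);
  snprintf (buf, len, "%s%s%s%s",
	    mask == 0 ? " seq" : "",
	    mask & GOMP_DIM_MASK (GOMP_DIM_GANG) ? " gang" : "",
	    mask & GOMP_DIM_MASK (GOMP_DIM_WORKER) ? " worker" : "",
	    mask & GOMP_DIM_MASK (GOMP_DIM_VECTOR) ? " vector" : "");
  return buf;
}
```

For the example nest above, the outer loop's mask would have the gang and worker bits set and the inner loop's only the vector bit, giving " gang worker" and " vector" respectively.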

Is this patch OK for trunk? I bootstrapped and regtested it for x86_64
with nvptx offloading.

Thanks,
Cesar

2018-XX-YY  Cesar Philippidis  

gcc/
* omp-offload.c (inform_oacc_loop): New function.
(execute_oacc_device_lower): Use it to display loop parallelism.

gcc/testsuite/
* c-c++-common/goacc/note-parallelism.c: New test.
* gfortran.dg/goacc/note-parallelism.f90: New test.

(cherry picked from gomp-4_0-branch r245683, and gcc/testsuite/ parts of
r245770)

diff --git a/gcc/omp-offload.c b/gcc/omp-offload.c
index 0abf028..66b99bb 100644
--- a/gcc/omp-offload.c
+++ b/gcc/omp-offload.c
@@ -866,6 +866,31 @@ debug_oacc_loop (oacc_loop *loop)
   dump_oacc_loop (stderr, loop, 0);
 }
 
+/* Provide diagnostics on OpenACC loops LOOP, its siblings and its
+   children.  */
+
+static void
+inform_oacc_loop (oacc_loop *loop)
+{
+  const char *seq = loop->mask == 0 ? " seq" : "";
+  const char *gang = loop->mask & GOMP_DIM_MASK (GOMP_DIM_GANG)
+? " gang" : "";
+  const char *worker = loop->mask & GOMP_DIM_MASK (GOMP_DIM_WORKER)
+? " worker" : "";
+  const char *vector = loop->mask & GOMP_DIM_MASK (GOMP_DIM_VECTOR)
+? " vector" : "";
+  dump_location_t loc = dump_location_t::from_location_t (loop->loc);
+
+  dump_printf_loc (MSG_NOTE, loc,
+  "Detected parallelism \n", seq, gang,
+  worker, vector);
+
+  if (loop->child)
+inform_oacc_loop (loop->child);
+  if (loop->sibling)
+inform_oacc_loop (loop->sibling);
+}
+
 /* DFS walk of basic blocks BB onwards, creating OpenACC loop
structures as we go.  By construction these loops are properly
nested.  */
@@ -1533,6 +1558,8 @@ execute_oacc_device_lower ()
   dump_oacc_loop (dump_file, loops, 0);
   fprintf (dump_file, "\n");
 }
+  if (dump_enabled_p () && loops->child)
+inform_oacc_loop (loops->child);
 
   /* Offloaded targets may introduce new basic blocks, which require
  dominance information to update SSA.  */
diff --git a/gcc/testsuite/c-c++-common/goacc/note-parallelism.c b/gcc/testsuite/c-c++-common/goacc/note-parallelism.c
new file mode 100644
index 000..3ec794c
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/goacc/note-parallelism.c
@@ -0,0 +1,61 @@
+/* Test the output of -fopt-info-note-omp.  */
+
+/* { dg-additional-options "-fopt-info-note-omp" } */
+
+int
+main ()
+{
+  int x, y, z;
+
+#pragma acc parallel loop seq /* { dg-message "note: Detected parallelism " } */
+  for (x = 0; x < 10; x++)
+;
+
+#pragma acc parallel loop gang /* { dg-message "note: Detected parallelism " } */
+  for (x = 0; x < 10; x++)
+;
+
+#pragma acc parallel loop worker /* { dg-message "note: Detected parallelism " } */
+  for (x = 0; x < 10; x++)
+;
+
+#pragma acc parallel loop vector /* { dg-message "note: Detected parallelism " } */
+  for (x = 0; x < 10; x++)
+;
+
+#pragma acc parallel loop gang vector /* { dg-message "note: Detected parallelism " } */
+  for (x = 0; x < 10; x++)
+;
+
+#pragma acc parallel loop gang worker /* { dg-message "note: Detected parallelism " } */
+  for (x = 0; x < 10; x++)
+;
+
+#pragma acc parallel loop worker vector /* { dg-message "note: Detected parallelism " } */
+  for (x = 0; x < 10; x++)
+;
+
+#pragma acc parallel loop gang worker vector /* { dg-message "note: Detected parallelism " } */
+  for (x = 0; x < 10; x++)
+;
+
+#pragma acc parallel loop /* { dg-message "note: Detected parallelism " } */
+  for (x = 0; x < 10; x++)
+;
+
+#pragma acc parallel loop /* { dg-message "note: Detected parallelism " } */
+  for (x = 0; x < 10; x++)
+#pragma acc loop /* { dg-message "note: Detected parallelism " } */
+for (y = 0; y < 10; y++)
+  ;
+
+#pragma acc parallel loop gang /* { dg-message "note: Detected parallelism " } */
+  for (x = 0; x < 10; x++)
+#pragma acc loop worker /* { dg-message "note: Detected parallelism " } */
+ 

Re: [PATCH 1/3] Correct the reported line number in fortran combined OpenACC directives

2018-07-25 Thread Cesar Philippidis
On 07/25/2018 08:32 AM, Marek Polacek wrote:
> On Wed, Jul 25, 2018 at 08:29:17AM -0700, Cesar Philippidis wrote:
>> The fortran FE incorrectly records the line locations of combined acc
>> loop directives when it lowers the construct to gimple. Usually this
>> isn't a problem because the fortran FE is able to report problems with
>> acc loops itself. However, there will be inaccuracies if the ME tries
>> to use those locations.
>>
>> Note that test cases are inconspicuously absent in this patch.
>> However, without this bug fix, -fopt-info-note-omp will report bogus
>> line numbers. This code patch will be tested in a later patch in
>> this series.
>>
>> Is this OK for trunk? I bootstrapped and regtested it on x86_64 with
>> nvptx offloading.
>>
>> Thanks,
>> Cesar
>>
>> 2018-XX-YY  Cesar Philippidis  
>>
>>  gcc/fortran/
>>  * trans-openmp.c (gfc_trans_oacc_combined_directive): Set the
>>  location of combined acc loops.
>>
>> (cherry picked from gomp-4_0-branch r245653)
>>
>> diff --git a/gcc/fortran/trans-openmp.c b/gcc/fortran/trans-openmp.c
>> index f038f4c..e7707d0 100644
>> --- a/gcc/fortran/trans-openmp.c
>> +++ b/gcc/fortran/trans-openmp.c
>> @@ -3869,6 +3869,7 @@ gfc_trans_oacc_combined_directive (gfc_code *code)
>>gfc_omp_clauses construct_clauses, loop_clauses;
>>tree stmt, oacc_clauses = NULL_TREE;
>>enum tree_code construct_code;
>> +  location_t loc = input_location;
>>  
>>switch (code->op)
>>  {
>> @@ -3930,12 +3931,16 @@ gfc_trans_oacc_combined_directive (gfc_code *code)
>>else
>>  pushlevel ();
>>stmt = gfc_trans_omp_do (code, EXEC_OACC_LOOP, pblock, &loop_clauses, NULL);
>> +
>> +  if (CAN_HAVE_LOCATION_P (stmt))
>> +SET_EXPR_LOCATION (stmt, loc);
> 
> This is protected_set_expr_location.

Neat, thanks! This patch includes that correction. Is it ok for trunk
after bootstrapping and regression testing?
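As the review comment points out, protected_set_expr_location is the CAN_HAVE_LOCATION_P guard and SET_EXPR_LOCATION rolled into one call, which is why the open-coded check could be dropped. The toy model below illustrates only that contract. The toy_tree type and the helper names are hypothetical stand-ins, not GCC's tree representation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for GCC's location_t and tree nodes.  */
typedef unsigned location_t;
#define UNKNOWN_LOCATION ((location_t) 0)

struct toy_tree
{
  bool is_expr;        /* Only expression nodes carry a location.  */
  location_t locus;
};

/* Analogue of CAN_HAVE_LOCATION_P: a non-null expression node.  */
static bool
can_have_location_p (const struct toy_tree *t)
{
  return t != NULL && t->is_expr;
}

/* Analogue of protected_set_expr_location: set the location only when
   the node can hold one, so callers need not test first.  */
static void
toy_protected_set_expr_location (struct toy_tree *t, location_t loc)
{
  if (can_have_location_p (t))
    t->locus = loc;
}
```

Calling it on a node that cannot carry a location (or on NULL) is simply a no-op, which is the property the Fortran FE change relies on.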

Thanks,
Cesar

2018-XX-YY  Cesar Philippidis  

	gcc/fortran/
	* trans-openmp.c (gfc_trans_oacc_combined_directive): Set the
	location of combined acc loops.

(cherry picked from gomp-4_0-branch r245653)
---
 gcc/fortran/trans-openmp.c | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/gcc/fortran/trans-openmp.c b/gcc/fortran/trans-openmp.c
index f038f4c5bf8..b549c682533 100644
--- a/gcc/fortran/trans-openmp.c
+++ b/gcc/fortran/trans-openmp.c
@@ -3869,6 +3869,7 @@ gfc_trans_oacc_combined_directive (gfc_code *code)
   gfc_omp_clauses construct_clauses, loop_clauses;
   tree stmt, oacc_clauses = NULL_TREE;
   enum tree_code construct_code;
+  location_t loc = input_location;
 
   switch (code->op)
 {
@@ -3929,13 +3930,16 @@ gfc_trans_oacc_combined_directive (gfc_code *code)
 pblock = &block;
   else
 pushlevel ();
+
   stmt = gfc_trans_omp_do (code, EXEC_OACC_LOOP, pblock, &loop_clauses, NULL);
+  protected_set_expr_location (stmt, loc);
+
   if (TREE_CODE (stmt) != BIND_EXPR)
 stmt = build3_v (BIND_EXPR, NULL, stmt, poplevel (1, 0));
   else
 poplevel (0, 0);
-  stmt = build2_loc (input_location, construct_code, void_type_node, stmt,
-		 oacc_clauses);
+
+  stmt = build2_loc (loc, construct_code, void_type_node, stmt, oacc_clauses);
   gfc_add_expr_to_block (&block, stmt);
   return gfc_finish_block (&block);
 }
-- 
2.17.1



Re: [PATCH 00/11] [nvptx] Initial vector length changes

2018-07-25 Thread Cesar Philippidis
On 07/24/2018 01:47 PM, ce...@codesourcery.com wrote:
> From: Cesar Philippidis 
> 
> This patch series contains various cleanups and structural
> reorganizations to the NVPTX BE in preparation for the forthcoming
> variable length vector length enhancements. Tom, in order to make
> these changes easier for you to review, I broke these patches into
> logical components. If approved for trunk, would you like to see these
> patches committed individually, or all together in a single huge
> commit?
> 
> One notable change in this patch set is the partial inclusion of the
> PTX_DEFAULT_RUNTIME_DIM change that I previously placed with the
> libgomp default geometry update patch that I posted a couple of weeks
> ago. I don't want to block this patch series so I included the nvptx
> changes in patch 01.
> 
> Is this OK for trunk? I regtested both standalone and offloading
> compilers. I'm seeing some inconsistencies in the standalone compiler
> results, so I might rerun those just to be safe. But the results using
> nvptx as an offloading compiler came back clean.

On further inspection, the inconsistencies turned out to be isolated to
the C++ tests. The C test results are clean.

Cesar


Re: [PATCH 3/3] Add user-friendly OpenACC diagnostics regarding detected parallelism.

2018-07-26 Thread Cesar Philippidis
On 07/26/2018 01:33 AM, Richard Biener wrote:
> On Wed, Jul 25, 2018 at 5:30 PM Cesar Philippidis
>  wrote:
>>
>> This patch teaches GCC to inform the user how it assigned parallelism
>> to each OpenACC loop at compile time using the -fopt-info-note-omp
>> flag. For instance, given the acc parallel loop nest:
>>
>>   #pragma acc parallel loop
>>   for (...)
>> #pragma acc loop vector
>> for (...)
>>
>> GCC will report something like
>>
>>   foo.c:4:0: note: Detected parallelism 
>>   foo.c:6:0: note: Detected parallelism 
>>
>> Note how only the inner loop specifies vector parallelism. In this
>> example, GCC automatically assigned gang and worker parallelism to the
>> outermost loop. Perhaps, going forward, it would be useful to
>> distinguish which parallelism was specified by the user and which was
>> assigned by the compiler. But that can be added in a follow up patch.
>>
>> Is this patch OK for trunk? I bootstrapped and regtested it for x86_64
>> with nvptx offloading.
> 
> Shouldn't this use MSG_OPTIMIZED_LOCATIONS instead?  Are there
> any other optinfo notes emitted?  Like when despite pragmas loops
> are not handled or so?

Early on I was just using the diagnostics in omp-grid.c as a model, but
yes, it does make sense to use MSG_OPTIMIZED_LOCATIONS instead of
MSG_NOTE. And no, these are the only optinfo notes that we're emitting
at the moment. All of the other diagnostics are just errors and
warnings, although we probably should revisit that for some of the
forthcoming acc routine diagnostics. Going forward, now that there's
interest in automatic parallelism inside acc kernels, we do plan on
expanding the diagnostics.

The attached revised patch now uses MSG_OPTIMIZED_LOCATIONS for the
diagnostics. If this gets approved for trunk, I'll go ahead and backport
it to og8 and update the OpenACC wiki to change the usage of
-fopt-info-note-omp to -fopt-info-optimized-omp.
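Here is a toy model of why the flag name changes along with the message kind. Each message has a kind, and it is printed only if the -fopt-info-<kind> keyword enabled that kind. The enum values, the keyword table and the toy_* names are illustrative assumptions, not GCC's dump_flags_t machinery.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical bit values; GCC's real flags differ.  */
enum toy_msg_kind
{
  TOY_OPTIMIZED = 1u << 0,  /* cf. MSG_OPTIMIZED_LOCATIONS */
  TOY_MISSED = 1u << 1,     /* cf. MSG_MISSED_OPTIMIZATION */
  TOY_NOTE = 1u << 2        /* cf. MSG_NOTE */
};

/* Map an -fopt-info-<kind> keyword to the set of kinds it enables.
   Unknown keywords enable nothing.  */
static unsigned
toy_kinds_for_keyword (const char *kw)
{
  if (strcmp (kw, "optimized") == 0)
    return TOY_OPTIMIZED;
  if (strcmp (kw, "missed") == 0)
    return TOY_MISSED;
  if (strcmp (kw, "note") == 0)
    return TOY_NOTE;
  if (strcmp (kw, "all") == 0)
    return TOY_OPTIMIZED | TOY_MISSED | TOY_NOTE;
  return 0;
}

/* A message is printed only if its kind is in the enabled set.  */
static bool
toy_should_print (unsigned enabled, enum toy_msg_kind kind)
{
  return (enabled & kind) != 0;
}
```

Under this model, a message emitted as an "optimized" location is no longer shown by the "note" keyword, so the documented usage has to move to the "optimized" spelling.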

Is this OK for trunk?

Thanks,
Cesar
2018-XX-YY  Cesar Philippidis  

	gcc/
	* omp-offload.c (inform_oacc_loop): New function.
	(execute_oacc_device_lower): Use it to display loop parallelism.

	gcc/testsuite/
	* c-c++-common/goacc/note-parallelism.c: New test.
	* gfortran.dg/goacc/note-parallelism.f90: New test.

(cherry picked from gomp-4_0-branch r245683, and gcc/testsuite/ parts of
r245770)

use MSG_OPTIMIZED_LOCATIONS instead of MSG_NOTE
---
 gcc/omp-offload.c | 27 
 .../c-c++-common/goacc/note-parallelism.c | 61 ++
 .../gfortran.dg/goacc/note-parallelism.f90| 62 +++
 3 files changed, 150 insertions(+)
 create mode 100644 gcc/testsuite/c-c++-common/goacc/note-parallelism.c
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/note-parallelism.f90

diff --git a/gcc/omp-offload.c b/gcc/omp-offload.c
index 0abf0283c9e..3582dda3d1a 100644
--- a/gcc/omp-offload.c
+++ b/gcc/omp-offload.c
@@ -866,6 +866,31 @@ debug_oacc_loop (oacc_loop *loop)
   dump_oacc_loop (stderr, loop, 0);
 }
 
+/* Provide diagnostics on OpenACC loops LOOP, its siblings and its
+   children.  */
+
+static void
+inform_oacc_loop (oacc_loop *loop)
+{
+  const char *seq = loop->mask == 0 ? " seq" : "";
+  const char *gang = loop->mask & GOMP_DIM_MASK (GOMP_DIM_GANG)
+? " gang" : "";
+  const char *worker = loop->mask & GOMP_DIM_MASK (GOMP_DIM_WORKER)
+? " worker" : "";
+  const char *vector = loop->mask & GOMP_DIM_MASK (GOMP_DIM_VECTOR)
+? " vector" : "";
+  dump_location_t loc = dump_location_t::from_location_t (loop->loc);
+
+  dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, loc,
+		   "Detected parallelism \n", seq, gang,
+		   worker, vector);
+
+  if (loop->child)
+inform_oacc_loop (loop->child);
+  if (loop->sibling)
+inform_oacc_loop (loop->sibling);
+}
+
 /* DFS walk of basic blocks BB onwards, creating OpenACC loop
structures as we go.  By construction these loops are properly
nested.  */
@@ -1533,6 +1558,8 @@ execute_oacc_device_lower ()
   dump_oacc_loop (dump_file, loops, 0);
   fprintf (dump_file, "\n");
 }
+  if (dump_enabled_p () && loops->child)
+inform_oacc_loop (loops->child);
 
   /* Offloaded targets may introduce new basic blocks, which require
  dominance information to update SSA.  */
diff --git a/gcc/testsuite/c-c++-common/goacc/note-parallelism.c b/gcc/testsuite/c-c++-common/goacc/note-parallelism.c
new file mode 100644
index 000..2e50d86cd23
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/goacc/note-parallelism.c
@@ -0,0 +1,61 @@
+/* Test the output of -fopt-info-note-omp.  */
+
+/* { dg-additional-options "-fopt-info-note-optimized" } */
+
+int
+main ()
+{
+  int x, y, z;
+
+

Re: [patch] adjust default nvptx launch geometry for OpenACC offloaded regions

2018-07-26 Thread Cesar Philippidis
Hi Tom,

I see that you're reviewing the libgomp changes. Please disregard the
following hunk:

On 07/11/2018 12:13 PM, Cesar Philippidis wrote:
> @@ -1199,12 +1202,59 @@ nvptx_exec (void (*fn), size_t mapnum, void 
> **hostaddrs, void **devaddrs,
>default_dims[GOMP_DIM_VECTOR]);
>   }
>pthread_mutex_unlock (&ptx_dev_lock);
> +  int vectors = default_dims[GOMP_DIM_VECTOR];
> +  int workers = default_dims[GOMP_DIM_WORKER];
> +  int gangs = default_dims[GOMP_DIM_GANG];
> +
> +  if (nvptx_thread()->ptx_dev->driver_version > 6050)
> + {
> +   int grids, blocks;
> +   CUDA_CALL_ASSERT (cuOccupancyMaxPotentialBlockSize, &grids,
> + &blocks, function, NULL, 0,
> + dims[GOMP_DIM_WORKER] * dims[GOMP_DIM_VECTOR]);
> +   GOMP_PLUGIN_debug (0, "cuOccupancyMaxPotentialBlockSize: "
> +  "grid = %d, block = %d\n", grids, blocks);
> +
> +   gangs = grids * dev_size;
> +   workers = blocks / vectors;
> + }

I revisited this change yesterday and I noticed it was setting gangs
incorrectly. Basically, gangs should be set as follows

  gangs = grids * (blocks / warp_size);

or, to be closer to og8, as

  gangs = 2 * grids * (blocks / warp_size);

The magic constant 2 is there to prevent thread starvation. It's the
same idea as running make -j<2*#threads>.
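The two gang formulas above can be written as a small helper. grids and blocks are assumed to be the minimum grid size and block size returned through the first two out-parameters of cuOccupancyMaxPotentialBlockSize. default_num_gangs and its oversubscribe parameter are hypothetical names: 1 gives the plain formula and 2 gives the og8-style variant.

```c
#include <assert.h>

/* Sketch of gangs = OVERSUBSCRIBE * grids * (blocks / warp_size).
   blocks / warp_size is the number of warps in one block.  */
static int
default_num_gangs (int grids, int blocks, int warp_size, int oversubscribe)
{
  assert (grids > 0 && blocks > 0 && warp_size > 0 && oversubscribe > 0);
  return oversubscribe * grids * (blocks / warp_size);
}
```

As an illustration, take grids = 80, blocks = 1024 and a 32-thread warp (made-up numbers, not measurements). The plain formula gives 80 * 32 = 2560 gangs, and the doubled variant gives 5120.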

Anyway, I'm still experimenting with that change. There are still some
discrepancies between the way that I select num_workers and how the
driver does. The driver appears to be a little bit more conservative,
but according to the thread occupancy calculator, that should yield
greater performance on GPUs.

I just wanted to give you a heads-up because you seem to be working on this.

Thanks for all of your reviews!

By the way, are you now the maintainer of the libgomp nvptx plugin?

Cesar


Re: [PATCH 0/8] Reduce/remove dependencies on _GLIBCXX_USE_C99_STDINT_TR1

2018-07-26 Thread Cesar Philippidis
On 07/26/2018 07:01 AM, jwak...@redhat.com wrote:
> From: Jonathan Wakely 

It looks like you're using git send-email for this patch series. And it
seems like you made the same mistake that I did when you configured git
sendemail.from. According to the git send-email manpage, from should be
your email address; however, it really wants it to be of the form
  Full Name 

This is not a huge deal because the email went through, but it was
something that wasn't immediately obvious to me.

Cesar


Re: [libgomp, nvptx, committed] Calculate default dims per device

2018-07-30 Thread Cesar Philippidis
On 07/30/2018 03:19 AM, Tom de Vries wrote:
> 
> [libgomp, nvptx] Calculate default dims per device
> 
> The default dimensions are calculated using per-device properties, but
> initialized once and used on all devices.
> 
> This patch fixes this problem by introducing per-device default dimensions.

Neat, thanks!

I wonder if it's worthwhile to optimize the case where a system has more
than one identical GPU.
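Here is one way the identical-GPU case could be handled, sketched under stated assumptions. Defaults are still computed per device, but a device whose relevant properties match an earlier one copies that device's result. The property set, the toy_* types and the dims formula in compute_dims are all hypothetical placeholders, not what libgomp does.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical dimension indices; libgomp uses GOMP_DIM_*.  */
enum { DIM_GANG, DIM_WORKER, DIM_VECTOR, NUM_DIMS };

/* Hypothetical subset of the properties the defaults depend on.  */
struct toy_dev_props
{
  int num_sms;
  int warp_size;
  int max_threads_per_block;
  int max_threads_per_sm;
};

struct toy_device
{
  struct toy_dev_props props;
  int default_dims[NUM_DIMS];
  bool dims_valid;
};

static bool
same_props (const struct toy_dev_props *a, const struct toy_dev_props *b)
{
  return (a->num_sms == b->num_sms && a->warp_size == b->warp_size
	  && a->max_threads_per_block == b->max_threads_per_block
	  && a->max_threads_per_sm == b->max_threads_per_sm);
}

/* Placeholder heuristic, not the one libgomp uses.  */
static void
compute_dims (const struct toy_dev_props *p, int dims[NUM_DIMS])
{
  assert (p->warp_size > 0 && p->max_threads_per_block > 0);
  dims[DIM_VECTOR] = p->warp_size;
  dims[DIM_WORKER] = p->max_threads_per_block / p->warp_size;
  dims[DIM_GANG] = p->num_sms * (p->max_threads_per_sm
				 / p->max_threads_per_block);
}

/* Set default dims for DEVS[0..N).  A device whose properties match an
   earlier one copies that device's dims instead of recomputing them.
   Returns how many times compute_dims actually ran.  */
static int
init_default_dims (struct toy_device *devs, int n)
{
  int computed = 0;

  for (int i = 0; i < n; i++)
    {
      devs[i].dims_valid = false;
      for (int j = 0; j < i && !devs[i].dims_valid; j++)
	if (same_props (&devs[i].props, &devs[j].props))
	  {
	    for (int k = 0; k < NUM_DIMS; k++)
	      devs[i].default_dims[k] = devs[j].default_dims[k];
	    devs[i].dims_valid = true;
	  }
      if (!devs[i].dims_valid)
	{
	  compute_dims (&devs[i].props, devs[i].default_dims);
	  devs[i].dims_valid = true;
	  computed++;
	}
    }
  return computed;
}
```

The pairwise comparison is quadratic in the number of devices, which should be harmless for the handful of GPUs in a typical node.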

Cesar


[PATCH,nvptx] Use CUDA driver API to select default runtime launch geometry

2018-07-31 Thread Cesar Philippidis
The attached patch teaches libgomp how to use the CUDA thread occupancy
calculator built into the CUDA driver. Although both are based on the
CUDA thread occupancy spreadsheet distributed with CUDA, the built-in
occupancy calculator differs from the occupancy calculator in og8 in two
key ways. First, og8 launches twice as many gangs as the driver's
occupancy calculator. This was my attempt at preventing threads from
idling, and it operates on the same principle as running 'make -jN',
where N is twice the number of CPU threads. Second, whereas og8 always
attempts to maximize the CUDA block size, the driver may select a
smaller block, which effectively decreases num_workers.

In terms of performance, there isn't much difference between the CUDA
driver's occupancy calculator and og8's. On the tests that are
affected, the two are generally within a factor of two of each other,
with some tests running faster with the driver's occupancy calculator
and others with og8's.

Unfortunately, the occupancy calculator in the CUDA driver API isn't
universally available; it was only introduced in CUDA 6.5 (driver
version 6050). In this patch, I'm exploiting the fact that init_cuda_lib
only checks for errors on the last library function initialized.
Therefore, the patch guards the use of
  cuOccupancyMaxPotentialBlockSizeWithFlags

by checking driver_version. If the driver's occupancy calculator isn't
available, it falls back to the existing defaults. Maybe og8's thread
occupancy calculator would make a better default for older versions of
CUDA, but that's a patch for another day.
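The version guard itself is small. cuDriverGetVersion reports 1000 * major + 10 * minor, so CUDA 6.5 is 6050 and CUDA 9.2 is 9020. If the check is meant to admit 6.5 itself, it needs >= rather than the > 6050 used in the earlier quoted hunk. The helper and macro names below are hypothetical, and the sketch does not link against libcuda.

```c
#include <assert.h>
#include <stdbool.h>

/* First driver version assumed to provide the occupancy API.  */
#define OCCUPANCY_API_MIN_VERSION 6050

/* Split a cuDriverGetVersion value into major and minor parts.  */
static void
decode_cuda_version (int version, int *major, int *minor)
{
  assert (version >= 0);
  *major = version / 1000;
  *minor = (version % 1000) / 10;
}

/* True when the driver is new enough for the occupancy calculator;
   otherwise the caller keeps the existing default dimensions.  */
static bool
occupancy_api_available (int driver_version)
{
  return driver_version >= OCCUPANCY_API_MIN_VERSION;
}
```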

Is this patch OK for trunk? I bootstrapped and regression tested it
using x86_64 with nvptx offloading.

Thanks,
Cesar
[nvptx] Use CUDA driver API to select default runtime launch geometry

2018-XX-YY  Cesar Philippidis  
	libgomp/
	* plugin/cuda/cuda.h (CUoccupancyB2DSize): New typedef.
	(cuDriverGetVersion): Declare.
	(cuOccupancyMaxPotentialBlockSizeWithFlags): Declare.
	* plugin/plugin-nvptx.c (CUDA_ONE_CALL): Add entries for
	cuDriverGetVersion and cuOccupancyMaxPotentialBlockSize.
	(ptx_device): Add driver_version member.
	(nvptx_open_device): Initialize it.
	(nvptx_exec): Use cuOccupancyMaxPotentialBlockSize to set the
	default num_gangs and num_workers when the driver supports it.
---
 libgomp/plugin/cuda/cuda.h|  5 +
 libgomp/plugin/plugin-nvptx.c | 37 -
 2 files changed, 41 insertions(+), 1 deletion(-)

diff --git a/libgomp/plugin/cuda/cuda.h b/libgomp/plugin/cuda/cuda.h
index 4799825..1fc694d 100644
--- a/libgomp/plugin/cuda/cuda.h
+++ b/libgomp/plugin/cuda/cuda.h
@@ -44,6 +44,7 @@ typedef void *CUevent;
 typedef void *CUfunction;
 typedef void *CUlinkState;
 typedef void *CUmodule;
+typedef size_t (*CUoccupancyB2DSize)(int);
 typedef void *CUstream;
 
 typedef enum {
@@ -123,6 +124,7 @@ CUresult cuCtxSynchronize (void);
 CUresult cuDeviceGet (CUdevice *, int);
 CUresult cuDeviceGetAttribute (int *, CUdevice_attribute, CUdevice);
 CUresult cuDeviceGetCount (int *);
+CUresult cuDriverGetVersion (int *);
 CUresult cuEventCreate (CUevent *, unsigned);
 #define cuEventDestroy cuEventDestroy_v2
 CUresult cuEventDestroy (CUevent);
@@ -170,6 +172,9 @@ CUresult cuModuleGetGlobal (CUdeviceptr *, size_t *, CUmodule, const char *);
 CUresult cuModuleLoad (CUmodule *, const char *);
 CUresult cuModuleLoadData (CUmodule *, const void *);
 CUresult cuModuleUnload (CUmodule);
+CUresult cuOccupancyMaxPotentialBlockSizeWithFlags (int *, int *, CUfunction,
+		CUoccupancyB2DSize, size_t,
+		int, unsigned int);
 CUresult cuStreamCreate (CUstream *, unsigned);
 #define cuStreamDestroy cuStreamDestroy_v2
 CUresult cuStreamDestroy (CUstream);
diff --git a/libgomp/plugin/plugin-nvptx.c b/libgomp/plugin/plugin-nvptx.c
index b6ec5f8..2647af6 100644
--- a/libgomp/plugin/plugin-nvptx.c
+++ b/libgomp/plugin/plugin-nvptx.c
@@ -63,6 +63,7 @@ CUDA_ONE_CALL (cuCtxSynchronize)	\
 CUDA_ONE_CALL (cuDeviceGet)		\
 CUDA_ONE_CALL (cuDeviceGetAttribute)	\
 CUDA_ONE_CALL (cuDeviceGetCount)	\
+CUDA_ONE_CALL (cuDriverGetVersion)	\
 CUDA_ONE_CALL (cuEventCreate)		\
 CUDA_ONE_CALL (cuEventDestroy)		\
 CUDA_ONE_CALL (cuEventElapsedTime)	\
@@ -94,6 +95,7 @@ CUDA_ONE_CALL (cuModuleGetGlobal)	\
 CUDA_ONE_CALL (cuModuleLoad)		\
 CUDA_ONE_CALL (cuModuleLoadData)	\
 CUDA_ONE_CALL (cuModuleUnload)		\
+CUDA_ONE_CALL (cuOccupancyMaxPotentialBlockSize) \
 CUDA_ONE_CALL (cuStreamCreate)		\
 CUDA_ONE_CALL (cuStreamDestroy)		\
 CUDA_ONE_CALL (cuStreamQuery)		\
@@ -423,6 +425,7 @@ struct ptx_device
   int max_threads_per_block;
   int max_threads_per_multiprocessor;
   int default_dims[GOMP_DIM_MAX];
+  int driver_version;
 
   struct ptx_image_data *images;  /* Images loaded on device.  */
   pthread_mutex_t image_lock; /* Lock for above list.  */
@@ -734,6 +737,7 @@ nvptx_open_device (int n)
   ptx_dev->ord = n;
   ptx_dev->dev = dev;
   ptx_dev->ctx_shared = f

[PATCH,nvptx] Remove use of 'struct map' from plugin (nvptx)

2018-07-31 Thread Cesar Philippidis
This is an old patch which removes the struct map from the nvptx plugin.
I believe at one point this was supposed to be used to manage async data
mappings, but in practice that never worked out.

Is this OK for trunk? I bootstrapped and regtested on x86_64 with nvptx
offloading.

Thanks,
Cesar
[PATCH] Remove use of 'struct map' from plugin (nvptx)

2018-XX-YY  Cesar Philippidis  
	James Norris 	

	libgomp/
	* plugin/plugin-nvptx.c (struct map): Removed.
	(map_init, map_pop): Remove use of struct map. (map_push):
	Likewise and change argument list.
	* testsuite/libgomp.oacc-c-c++-common/mapping-1.c: New test.

(cherry picked from gomp-4_0-branch r231616)
---
 libgomp/plugin/plugin-nvptx.c  | 33 +++-
 .../libgomp.oacc-c-c++-common/mapping-1.c  | 63 ++
 2 files changed, 69 insertions(+), 27 deletions(-)
 create mode 100644 libgomp/testsuite/libgomp.oacc-c-c++-common/mapping-1.c

diff --git a/libgomp/plugin/plugin-nvptx.c b/libgomp/plugin/plugin-nvptx.c
index a92f054..1237ea10 100644
--- a/libgomp/plugin/plugin-nvptx.c
+++ b/libgomp/plugin/plugin-nvptx.c
@@ -225,13 +225,6 @@ struct nvptx_thread
   struct ptx_device *ptx_dev;
 };
 
-struct map
-{
-  int async;
-  size_t  size;
-  charmappings[0];
-};
-
 static bool
 map_init (struct ptx_stream *s)
 {
@@ -265,16 +258,12 @@ map_fini (struct ptx_stream *s)
 static void
 map_pop (struct ptx_stream *s)
 {
-  struct map *m;
-
   assert (s != NULL);
   assert (s->h_next);
   assert (s->h_prev);
   assert (s->h_tail);
 
-  m = s->h_tail;
-
-  s->h_tail += m->size;
+  s->h_tail = s->h_next;
 
   if (s->h_tail >= s->h_end)
 s->h_tail = s->h_begin + (int) (s->h_tail - s->h_end);
@@ -292,37 +281,27 @@ map_pop (struct ptx_stream *s)
 }
 
 static void
-map_push (struct ptx_stream *s, int async, size_t size, void **h, void **d)
+map_push (struct ptx_stream *s, size_t size, void **h, void **d)
 {
   int left;
   int offset;
-  struct map *m;
 
   assert (s != NULL);
 
   left = s->h_end - s->h_next;
-  size += sizeof (struct map);
 
   assert (s->h_prev);
   assert (s->h_next);
 
   if (size >= left)
 {
-  m = s->h_prev;
-  m->size += left;
-  s->h_next = s->h_begin;
-
-  if (s->h_next + size > s->h_end)
-	GOMP_PLUGIN_fatal ("unable to push map");
+  assert (s->h_next == s->h_prev);
+  s->h_next = s->h_prev = s->h_tail = s->h_begin;
 }
 
   assert (s->h_next);
 
-  m = s->h_next;
-  m->async = async;
-  m->size = size;
-
-  offset = (void *)&m->mappings[0] - s->h;
+  offset = s->h_next - s->h;
 
   *d = (void *)(s->d + offset);
   *h = (void *)(s->h + offset);
@@ -1291,7 +1270,7 @@ nvptx_exec (void (*fn), size_t mapnum, void **hostaddrs, void **devaddrs,
   /* This reserves a chunk of a pre-allocated page of memory mapped on both
  the host and the device. HP is a host pointer to the new chunk, and DP is
  the corresponding device pointer.  */
-  map_push (dev_str, async, mapnum * sizeof (void *), &hp, &dp);
+  map_push (dev_str, mapnum * sizeof (void *), &hp, &dp);
 
   GOMP_PLUGIN_debug (0, "  %s: prepare mappings\n", __FUNCTION__);
 
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/mapping-1.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/mapping-1.c
new file mode 100644
index 000..593e7d4
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/mapping-1.c
@@ -0,0 +1,63 @@
+/* { dg-do run } */
+
+#include 
+#include 
+#include 
+
+/* Exercise the kernel launch argument mapping.  */
+
+int
+main (int argc, char **argv)
+{
+  int a[256], b[256], c[256], d[256], e[256], f[256];
+  int i;
+  int n;
+
+  /* 48 is the size of the mappings for the first parallel construct.  */
+  n = sysconf (_SC_PAGESIZE) / 48 - 1;
+
+  i = 0;
+
+  for (i = 0; i < n; i++)
+{
+  #pragma acc parallel copy (a, b, c, d)
+	{
+	  int j;
+
+	  for (j = 0; j < 256; j++)
+	{
+	  a[j] = j;
+	  b[j] = j;
+	  c[j] = j;
+	  d[j] = j;
+	}
+	}
+}
+
+#pragma acc parallel copy (a, b, c, d, e, f)
+  {
+int j;
+
+for (j = 0; j < 256; j++)
+  {
+	a[j] = j;
+	b[j] = j;
+	c[j] = j;
+	d[j] = j;
+	e[j] = j;
+	f[j] = j;
+  }
+  }
+
+  for (i = 0; i < 256; i++)
+   {
+ if (a[i] != i) abort();
+ if (b[i] != i) abort();
+ if (c[i] != i) abort();
+ if (d[i] != i) abort();
+ if (e[i] != i) abort();
+ if (f[i] != i) abort();
+   }
+
+  exit (0);
+}
-- 
2.7.4
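A worked example of the loop bound in mapping-1.c: with a 4096-byte page (the test actually asks sysconf (_SC_PAGESIZE)), n = 4096 / 48 - 1 = 85 - 1 = 84 launches of the four-array construct run before the six-array one. My reading, which the patch does not state, is that this nearly fills a page of launch mappings, so the larger final launch has to wrap around. mapping_test_iterations is a hypothetical name.

```c
#include <assert.h>

/* Loop bound used by the test: launches of MAPPING_BYTES each that fit
   in one page, minus one.  */
static long
mapping_test_iterations (long page_size, long mapping_bytes)
{
  assert (page_size > 0 && mapping_bytes > 0);
  return page_size / mapping_bytes - 1;
}
```

With 64 KiB pages, as on some non-x86 systems, the same formula gives 65536 / 48 - 1 = 1364.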



[PATCH,nvptx] Remove use of CUDA unified memory in libgomp

2018-07-31 Thread Cesar Philippidis
At present, libgomp uses CUDA unified memory only as a buffer to pass
the struct containing the pointers to the data mappings to the
offloaded functions. I'm not sure why unified memory is needed here if
it is still being managed explicitly by the driver.

This patch removes the use of CUDA unified memory from the plugin. I
don't recall observing any reduction in performance. Besides,
eventually, we'd like to eliminate the struct containing all pointers to
the offloaded data mappings and pass those pointers as individual
function arguments to cuLaunchKernel directly.
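Here is a host-only model of the per-stream FIFO of launch-argument buffers that replaces the unified-memory page. malloc stands in for cuMemAlloc. The arg_* names, and the free list used to reuse retired buffers, are assumptions for illustration. The patch itself tracks an active flag on each cuda_map, and this sketch does not claim to match its exact reuse policy.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

struct arg_buf
{
  void *d;               /* Stand-in for a CUdeviceptr.  */
  size_t size;
  bool active;           /* True while a pending launch owns it.  */
  struct arg_buf *next;
};

struct arg_queue
{
  struct arg_buf *head;       /* Oldest pending launch.  */
  struct arg_buf *tail;       /* Newest pending launch.  */
  struct arg_buf *free_list;  /* Retired buffers kept for reuse.  */
};

/* Reserve a buffer of at least SIZE bytes for a new launch and append
   it to the pending queue.  Returns NULL if allocation fails.  */
static struct arg_buf *
arg_queue_push (struct arg_queue *q, size_t size)
{
  struct arg_buf *b = NULL, **prev;

  /* First fit from the retired buffers.  */
  for (prev = &q->free_list; *prev; prev = &(*prev)->next)
    if ((*prev)->size >= size)
      {
	b = *prev;
	*prev = b->next;
	break;
      }

  if (b == NULL)
    {
      b = malloc (sizeof *b);
      if (b == NULL)
	return NULL;
      b->d = malloc (size);
      if (b->d == NULL)
	{
	  free (b);
	  return NULL;
	}
      b->size = size;
    }

  b->active = true;
  b->next = NULL;
  if (q->tail)
    q->tail->next = b;
  else
    q->head = b;
  q->tail = b;
  return b;
}

/* Retire the oldest pending buffer.  Work on one stream completes in
   order, so the head belongs to the launch that finished first.  */
static void
arg_queue_pop (struct arg_queue *q)
{
  struct arg_buf *b = q->head;

  assert (b != NULL && b->active);
  q->head = b->next;
  if (q->head == NULL)
    q->tail = NULL;
  b->active = false;
  b->next = q->free_list;
  q->free_list = b;
}

/* Free every buffer, pending or retired.  */
static void
arg_queue_destroy (struct arg_queue *q)
{
  struct arg_buf *lists[2] = { q->head, q->free_list };

  for (int i = 0; i < 2; i++)
    while (lists[i])
      {
	struct arg_buf *next = lists[i]->next;
	free (lists[i]->d);
	free (lists[i]);
	lists[i] = next;
      }
  q->head = q->tail = q->free_list = NULL;
}
```

Keeping retired buffers around avoids a cuMemAlloc/cuMemFree pair per launch, at the cost of holding on to device memory until the stream is torn down.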

Is this patch OK for trunk? I bootstrapped and regression tested it for
x86_64 with nvptx offloading.

Thanks,
Cesar
[PATCH] [nvptx] Remove use of CUDA unified memory in libgomp

2018-XX-YY  Cesar Philippidis  

	libgomp/
	* plugin/plugin-nvptx.c (struct cuda_map): New.
	(struct ptx_stream): Replace d, h, h_begin, h_end, h_next, h_prev,
	h_tail with (cuda_map *) map.
	(cuda_map_create): New function.
	(cuda_map_destroy): New function.
	(map_init): Update to use a linked list of cuda_map objects.
	(map_fini): Likewise.
	(map_pop): Likewise.
	(map_push): Likewise.  Return CUdeviceptr instead of void.
	(init_streams_for_device): Remove stale references to ptx_stream
	members.
	(select_stream_for_async): Likewise.
	(nvptx_exec): Update call to map_init.

(cherry picked from gomp-4_0-branch r242614)
---
 libgomp/plugin/plugin-nvptx.c | 167 +++---
 1 file changed, 90 insertions(+), 77 deletions(-)

diff --git a/libgomp/plugin/plugin-nvptx.c b/libgomp/plugin/plugin-nvptx.c
index 1237ea10..d79ddf1 100644
--- a/libgomp/plugin/plugin-nvptx.c
+++ b/libgomp/plugin/plugin-nvptx.c
@@ -200,20 +200,20 @@ cuda_error (CUresult r)
 static unsigned int instantiated_devices = 0;
 static pthread_mutex_t ptx_dev_lock = PTHREAD_MUTEX_INITIALIZER;
 
+struct cuda_map
+{
+  CUdeviceptr d;
+  size_t size;
+  bool active;
+  struct cuda_map *next;
+};
+
 struct ptx_stream
 {
   CUstream stream;
   pthread_t host_thread;
   bool multithreaded;
-
-  CUdeviceptr d;
-  void *h;
-  void *h_begin;
-  void *h_end;
-  void *h_next;
-  void *h_prev;
-  void *h_tail;
-
+  struct cuda_map *map;
   struct ptx_stream *next;
 };
 
@@ -225,101 +225,114 @@ struct nvptx_thread
   struct ptx_device *ptx_dev;
 };
 
+static struct cuda_map *
+cuda_map_create (size_t size)
+{
+  struct cuda_map *map = GOMP_PLUGIN_malloc (sizeof (struct cuda_map));
+
+  assert (map);
+
+  map->next = NULL;
+  map->size = size;
+  map->active = false;
+
+  CUDA_CALL_ERET (NULL, cuMemAlloc, &map->d, size);
+  assert (map->d);
+
+  return map;
+}
+
+static void
+cuda_map_destroy (struct cuda_map *map)
+{
+  CUDA_CALL_ASSERT (cuMemFree, map->d);
+  free (map);
+}
+
+/* The following map_* routines manage the CUDA device memory that
+   contains the data mapping arguments for cuLaunchKernel.  Each
+   asynchronous PTX stream may have multiple pending kernel
+   invocations, which are launched in a FIFO order.  As such, the map
+   routines maintain a queue of cuLaunchKernel arguments.
+
+   Calls to map_push and map_pop must be guarded by ptx_event_lock.
+   Likewise, calls to map_init and map_fini are guarded by
+   ptx_dev_lock inside GOMP_OFFLOAD_init_device and
+   GOMP_OFFLOAD_fini_device, respectively.  */
+
 static bool
 map_init (struct ptx_stream *s)
 {
   int size = getpagesize ();
 
   assert (s);
-  assert (!s->d);
-  assert (!s->h);
-
-  CUDA_CALL (cuMemAllocHost, &s->h, size);
-  CUDA_CALL (cuMemHostGetDevicePointer, &s->d, s->h, 0);
 
-  assert (s->h);
+  s->map = cuda_map_create (size);
 
-  s->h_begin = s->h;
-  s->h_end = s->h_begin + size;
-  s->h_next = s->h_prev = s->h_tail = s->h_begin;
-
-  assert (s->h_next);
-  assert (s->h_end);
   return true;
 }
 
 static bool
 map_fini (struct ptx_stream *s)
 {
-  CUDA_CALL (cuMemFreeHost, s->h);
+  assert (s->map->next == NULL);
+  assert (!s->map->active);
+
+  cuda_map_destroy (s->map);
+
   return true;
 }
 
 static void
 map_pop (struct ptx_stream *s)
 {
-  assert (s != NULL);
-  assert (s->h_next);
-  assert (s->h_prev);
-  assert (s->h_tail);
-
-  s->h_tail = s->h_next;
-
-  if (s->h_tail >= s->h_end)
-    s->h_tail = s->h_begin + (int) (s->h_tail - s->h_end);
+  struct cuda_map *next;
 
-  if (s->h_next == s->h_tail)
-    s->h_prev = s->h_next;
+  assert (s != NULL);
 
-  assert (s->h_next >= s->h_begin);
-  assert (s->h_tail >= s->h_begin);
-  assert (s->h_prev >= s->h_begin);
+  if (s->map->next == NULL)
+    {
+      s->map->active = false;
+      return;
+    }
 
-  assert (s->h_next <= s->h_end);
-  assert (s->h_tail <= s->h_end);
-  assert (s->h_prev <= s->h_end);
+  next = s->map->next;
+  cuda_map_destroy (s->map);
+  s->map = next;
 }
 
-static void
-ma
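The queue discipline described in the map_* comment above can be modeled on the host. In that scheme, each stream keeps its cuLaunchKernel argument buffers in FIFO order and releases them as launches complete. The following is a minimal sketch, not the plugin code, and it rests on several assumptions. First, malloc stands in for cuMemAlloc so the sketch runs without a GPU. Second, arg_buf, queue_push and queue_pop are hypothetical names. Third, the push policy is a guess, because map_push is cut off here. Only queue_pop follows the map_pop shown in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Host-only stand-in for struct cuda_map.  malloc replaces cuMemAlloc
   so the sketch runs without a GPU.  */
struct arg_buf
{
  void *d;              /* Argument block (device memory in the plugin).  */
  size_t size;
  bool active;          /* Still referenced by a pending launch.  */
  struct arg_buf *next; /* Next buffer in launch (FIFO) order.  */
};

static struct arg_buf *
arg_buf_create (size_t size)
{
  struct arg_buf *b = malloc (sizeof *b);
  assert (b);
  b->d = malloc (size);
  assert (b->d);
  b->size = size;
  b->active = false;
  b->next = NULL;
  return b;
}

static void
arg_buf_destroy (struct arg_buf *b)
{
  free (b->d);
  free (b);
}

/* Hypothetical push policy (map_push is not shown above): reuse the
   head when it is idle, otherwise append a new buffer at the tail.
   Only the last remaining buffer can be idle, as queue_pop ensures.  */
static void *
queue_push (struct arg_buf **head, size_t size)
{
  struct arg_buf *b = *head;

  if (!b->active)
    {
      assert (b->next == NULL);
      if (b->size < size)
	{
	  /* Idle but too small: replace it.  */
	  arg_buf_destroy (b);
	  b = *head = arg_buf_create (size);
	}
      b->active = true;
      return b->d;
    }

  while (b->next)
    b = b->next;
  b->next = arg_buf_create (size);
  b->next->active = true;
  return b->next->d;
}

/* Follows map_pop in the patch: the oldest launch has completed, so
   free its buffer, but keep the last buffer around for reuse.  */
static void
queue_pop (struct arg_buf **head)
{
  struct arg_buf *b = *head;

  if (b->next == NULL)
    {
      b->active = false;
      return;
    }

  *head = b->next;
  arg_buf_destroy (b);
}
```

map_pop keeps the last buffer alive, and so does queue_pop. As a result, a stream that launches one kernel at a time keeps reusing a single allocation instead of allocating on every launch.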

[PATCH,nvptx] Truncate config/nvptx/oacc-parallel.c

2018-07-31 Thread Cesar Philippidis
Way back in the GCC 5 days, when support for OpenACC was in its infancy,
we used to rely on various GOACC_ thread functions in the runtime to
implement the execution model, or the lack thereof (that version of GCC
only supported vector-level parallelism). However, beginning with GCC 6,
those external functions were replaced with internal functions that get
expanded by the nvptx BE directly.

This patch removes those stale libgomp functions from the nvptx libgomp
target. Is this OK for trunk, or does libgomp still need to maintain
backwards compatibility with GCC 5?

This patch has been bootstrapped and regtested for x86_64 with nvptx
offloading.

Thanks,
Cesar
[PATCH] [libgomp] Truncate config/nvptx/oacc-parallel.c

2018-XX-YY  Cesar Philippidis  
	Thomas Schwinge 

	libgomp/
	* config/nvptx/oacc-parallel.c: Truncate.

(cherry picked from gomp-4_0-branch r228836)
---
 libgomp/config/nvptx/oacc-parallel.c | 358 ---
 1 file changed, 358 deletions(-)

diff --git a/libgomp/config/nvptx/oacc-parallel.c b/libgomp/config/nvptx/oacc-parallel.c
index 5dc53da..e69de29 100644
--- a/libgomp/config/nvptx/oacc-parallel.c
+++ b/libgomp/config/nvptx/oacc-parallel.c
@@ -1,358 +0,0 @@
-/* OpenACC constructs
-
-   Copyright (C) 2014-2018 Free Software Foundation, Inc.
-
-   Contributed by Mentor Embedded.
-
-   This file is part of the GNU Offloading and Multi Processing Library
-   (libgomp).
-
-   Libgomp is free software; you can redistribute it and/or modify it
-   under the terms of the GNU General Public License as published by
-   the Free Software Foundation; either version 3, or (at your option)
-   any later version.
-
-   Libgomp is distributed in the hope that it will be useful, but WITHOUT ANY
-   WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
-   FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
-   more details.
-
-   Under Section 7 of GPL version 3, you are granted additional
-   permissions described in the GCC Runtime Library Exception, version
-   3.1, as published by the Free Software Foundation.
-
-   You should have received a copy of the GNU General Public License and
-   a copy of the GCC Runtime Library Exception along with this program;
-   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
-   <http://www.gnu.org/licenses/>.  */
-
-#include "libgomp_g.h"
-
-__asm__ (".visible .func (.param .u32 %out_retval) GOACC_tid (.param .u32 %in_ar1);\n"
-	 ".visible .func (.param .u32 %out_retval) GOACC_ntid (.param .u32 %in_ar1);\n"
-	 ".visible .func (.param .u32 %out_retval) GOACC_ctaid (.param .u32 %in_ar1);\n"
-	 ".visible .func (.param .u32 %out_retval) GOACC_nctaid (.param .u32 %in_ar1);\n"
-	 "// BEGIN GLOBAL FUNCTION DECL: GOACC_get_num_threads\n"
-	 ".visible .func (.param .u32 %out_retval) GOACC_get_num_threads;\n"
-	 "// BEGIN GLOBAL FUNCTION DECL: GOACC_get_thread_num\n"
-	 ".visible .func (.param .u32 %out_retval) GOACC_get_thread_num;\n"
-	 "// BEGIN GLOBAL FUNCTION DECL: abort\n"
-	 ".extern .func abort;\n"
-	 ".visible .func (.param .u32 %out_retval) GOACC_tid (.param .u32 %in_ar1)\n"
-	 "{\n"
-	 ".reg .u32 %ar1;\n"
-	 ".reg .u32 %retval;\n"
-	 ".reg .u64 %hr10;\n"
-	 ".reg .u32 %r22;\n"
-	 ".reg .u32 %r23;\n"
-	 ".reg .u32 %r24;\n"
-	 ".reg .u32 %r25;\n"
-	 ".reg .u32 %r26;\n"
-	 ".reg .u32 %r27;\n"
-	 ".reg .u32 %r28;\n"
-	 ".reg .u32 %r29;\n"
-	 ".reg .pred %r30;\n"
-	 ".reg .u32 %r31;\n"
-	 ".reg .pred %r32;\n"
-	 ".reg .u32 %r33;\n"
-	 ".reg .pred %r34;\n"
-	 ".local .align 8 .b8 %frame[4];\n"
-	 "ld.param.u32 %ar1,[%in_ar1];\n"
-	 "mov.u32 %r27,%ar1;\n"
-	 "st.local.u32 [%frame],%r27;\n"
-	 "ld.local.u32 %r28,[%frame];\n"
-	 "mov.u32 %r29,1;\n"
-	 "setp.eq.u32 %r30,%r28,%r29;\n"
-	 "@%r30 bra $L4;\n"
-	 "mov.u32 %r31,2;\n"
-	 "setp.eq.u32 %r32,%r28,%r31;\n"
-	 "@%r32 bra $L5;\n"
-	 "mov.u32 %r33,0;\n"
-	 "setp.eq.u32 %r34,%r28,%r33;\n"
-	 "@!%r34 bra $L8;\n"
-	 "mov.u32 %r23,%tid.x;\n"
-	 "mov.u32 %r22,%r23;\n"
-	 "bra $L7;\n"
-	 "$L4:\n"
-	 "mov.u32 %r24,%tid.y;\n"
-	 "mov.u32 %r22,%r24;\n"
-	 "bra $L7;\n"
-	 "$L5:\n"
-	 "mov.u32 %r25,%tid.z;\n"
-	 "mov.u32 %r22,%r25;\n"
-	 "bra $L7;\n"
-	 "$L8:\n"
-	 "{\n"
-	 "{\n"
-	 "call abort;\n"
-	 "}\n"
-	 "}\n"
-	 "$L7:\n"
-	 "mov.u32 %r26,%r22;\n"
-	 "mov.u32 %retval,%r26;\n"
-	 "st.param.u32 [%ou

[og8] Add __builtin_goacc_parlevel_{id,size}

2018-07-31 Thread Cesar Philippidis
I've committed this patch to og8, which backports the first of Tom's
goacc_parlevel patches from mainline. I'll post a follow-up patch
which contains various bug fixes. I believe that this patch was
originally introduced in PR82428, or at least it resolves that PR.

Cesar
[og8] Add __builtin_goacc_parlevel_{id,size}

2018-07-31  Cesar Philippidis  

	Backport from mainline:
	2018-05-02  Tom de Vries  

	PR libgomp/82428
	gcc/
	* builtins.def (DEF_GOACC_BUILTIN_ONLY): Define.
	* omp-builtins.def (BUILT_IN_GOACC_PARLEVEL_ID)
	(BUILT_IN_GOACC_PARLEVEL_SIZE): New builtin.
	* builtins.c (expand_builtin_goacc_parlevel_id_size): New function.
	(expand_builtin): Call expand_builtin_goacc_parlevel_id_size.
	* doc/extend.texi (Other Builtins): Add __builtin_goacc_parlevel_id and
	__builtin_goacc_parlevel_size.

	gcc/fortran/
	* f95-lang.c (DEF_GOACC_BUILTIN_ONLY): Define.

	gcc/testsuite/
	* c-c++-common/goacc/builtin-goacc-parlevel-id-size-2.c: New test.
	* c-c++-common/goacc/builtin-goacc-parlevel-id-size.c: New test.

	libgomp/
	* testsuite/libgomp.oacc-c-c++-common/gang-static-2.c: Use
	__builtin_goacc_parlevel_{id,size}.
	* testsuite/libgomp.oacc-c-c++-common/loop-auto-1.c: Same.
	* testsuite/libgomp.oacc-c-c++-common/loop-dim-default.c: Same.
	* testsuite/libgomp.oacc-c-c++-common/loop-g-1.c: Same.
	* testsuite/libgomp.oacc-c-c++-common/loop-g-2.c: Same.
	* testsuite/libgomp.oacc-c-c++-common/loop-gwv-1.c: Same.
	* testsuite/libgomp.oacc-c-c++-common/loop-red-g-1.c: Same.
	* testsuite/libgomp.oacc-c-c++-common/loop-red-gwv-1.c: Same.
	* testsuite/libgomp.oacc-c-c++-common/loop-red-v-1.c: Same.
	* testsuite/libgomp.oacc-c-c++-common/loop-red-v-2.c: Same.
	* testsuite/libgomp.oacc-c-c++-common/loop-red-w-1.c: Same.
	* testsuite/libgomp.oacc-c-c++-common/loop-red-w-2.c: Same.
	* testsuite/libgomp.oacc-c-c++-common/loop-red-wv-1.c: Same.
	* testsuite/libgomp.oacc-c-c++-common/loop-v-1.c: Same.
	* testsuite/libgomp.oacc-c-c++-common/loop-w-1.c: Same.
	* testsuite/libgomp.oacc-c-c++-common/loop-wv-1.c: Same.
	* testsuite/libgomp.oacc-c-c++-common/parallel-dims.c: Same.
	* testsuite/libgomp.oacc-c-c++-common/routine-g-1.c: Same.
	* testsuite/libgomp.oacc-c-c++-common/routine-gwv-1.c: Same.
	* testsuite/libgomp.oacc-c-c++-common/routine-v-1.c: Same.
	* testsuite/libgomp.oacc-c-c++-common/routine-w-1.c: Same.
	* testsuite/libgomp.oacc-c-c++-common/routine-wv-1.c: Same.
	* testsuite/libgomp.oacc-c-c++-common/routine-wv-2.c: Same.
	* testsuite/libgomp.oacc-c-c++-common/tile-1.c: Same.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@259850
138bc75d-0d04-0410-961f-82ee72b054a4

diff --git a/gcc/builtins.c b/gcc/builtins.c
index a71555e..300e13c 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -71,6 +71,8 @@ along with GCC; see the file COPYING3.  If not see
 #include "gimple-fold.h"
 #include "intl.h"
 #include "file-prefix-map.h" /* remap_macro_filename()  */
+#include "gomp-constants.h"
+#include "omp-general.h"
 
 struct target_builtins default_target_builtins;
 #if SWITCHABLE_TARGET
@@ -6628,6 +6630,71 @@ expand_stack_save (void)
   return ret;
 }
 
+/* Emit code to get the openacc gang, worker or vector id or size.  */
+
+static rtx
+expand_builtin_goacc_parlevel_id_size (tree exp, rtx target, int ignore)
+{
+  const char *name;
+  rtx fallback_retval;
+  rtx_insn *(*gen_fn) (rtx, rtx);
+  switch (DECL_FUNCTION_CODE (get_callee_fndecl (exp)))
+    {
+    case BUILT_IN_GOACC_PARLEVEL_ID:
+      name = "__builtin_goacc_parlevel_id";
+      fallback_retval = const0_rtx;
+      gen_fn = targetm.gen_oacc_dim_pos;
+      break;
+    case BUILT_IN_GOACC_PARLEVEL_SIZE:
+      name = "__builtin_goacc_parlevel_size";
+      fallback_retval = const1_rtx;
+      gen_fn = targetm.gen_oacc_dim_size;
+      break;
+    default:
+      gcc_unreachable ();
+    }
+
+  if (oacc_get_fn_attrib (current_function_decl) == NULL_TREE)
+    {
+      error ("%qs only supported in OpenACC code", name);
+      return const0_rtx;
+    }
+
+  tree arg = CALL_EXPR_ARG (exp, 0);
+  if (TREE_CODE (arg) != INTEGER_CST)
+    {
+      error ("non-constant argument 0 to %qs", name);
+      return const0_rtx;
+    }
+
+  int dim = TREE_INT_CST_LOW (arg);
+  switch (dim)
+    {
+    case GOMP_DIM_GANG:
+    case GOMP_DIM_WORKER:
+    case GOMP_DIM_VECTOR:
+      break;
+    default:
+      error ("illegal argument 0 to %qs", name);
+      return const0_rtx;
+    }
+
+  if (ignore)
+    return target;
+
+  if (!targetm.have_oacc_dim_size ())
+    {
+      emit_move_insn (target, fallback_retval);
+      return target;
+    }
+
+  rtx reg = MEM_P (target) ? gen_reg_rtx (GET_MODE (target)) : target;
+  emit_insn (gen_fn (reg, GEN_INT (dim)));
+  if (reg != target)
+    emit_move_insn (target, reg);
+
+  return target;
+}
 
 /* Expand an expression EXP that calls a built-in function,
with result going to TARGET if that
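For reference, here is a minimal sketch of how test code can query the launch geometry through the new builtin instead of inline PTX. It assumes GCC with -fopenacc. The GOMP_DIM_* values are copied locally because gomp-constants.h is internal to GCC, and gangs_in_region is a hypothetical helper. When the code is built without OpenACC support, the pragma is ignored and the #else branch supplies 1. That is the same value expand_builtin_goacc_parlevel_id_size above emits as its fallback when the target has no oacc_dim_size pattern.

```c
#include <assert.h>

/* Local copies of the GOMP_DIM_* values from GCC's internal
   gomp-constants.h: gang = 0, worker = 1, vector = 2.  */
#define GOMP_DIM_GANG 0
#define GOMP_DIM_WORKER 1
#define GOMP_DIM_VECTOR 2

/* Hypothetical helper: report the gang count seen inside a one-gang
   parallel region.  */
static int
gangs_in_region (void)
{
  int n = 0;

#pragma acc parallel num_gangs (1) copyout (n)
  {
#if defined (_OPENACC) && defined (__GNUC__) && !defined (__clang__)
    /* GCC only: expands to the target's oacc_dim_size pattern, or to
       the constant 1 when the target has none (host fallback).  */
    n = __builtin_goacc_parlevel_size (GOMP_DIM_GANG);
#else
    /* No OpenACC: the pragma is ignored; use the same fallback value.  */
    n = 1;
#endif
  }

  return n;
}
```

With num_gangs (1), the result should be 1 whether the region runs offloaded or in host fallback. __builtin_goacc_parlevel_id is used the same way, and its fallback value is 0.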

[og8] More goacc_parlevel enhancements

2018-07-31 Thread Cesar Philippidis
I've committed this patch which contains all of the remaining
goacc_parlevel bug fixes present in trunk to og8.

The goal of the goacc parlevel changes is to replace the use of inline
PTX code with builtin functions so that certain OpenACC execution tests
that exercise the execution model can be target independent. For the
most part, these patches applied cleanly to og8. However, as I noted in
PR86757, there were a couple of og8-specific regressions involving tests
that started to fail when built with -O0. I believe that problem is
caused by the ganglocal memory changes.

Chung-Lin, we'll need to fix PR86757 before we push the gangprivate
changes upstream.

Julian, I'm not sure if the GCN port supports gangprivate memory. If it
does, you might be hit by this failure at -O0. But those tests have
already been xfailed, so you should be OK.

Cesar
[og8] More goacc_parlevel enhancements

2018-07-31  Cesar Philippidis  

	libgomp/
	* testsuite/libgomp.oacc-c-c++-common/loop-gwv-1.c: Adjust test.
	* testsuite/libgomp.oacc-c-c++-common/loop-red-gwv-1.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/loop-red-v-2.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/loop-red-w-1.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/loop-red-w-2.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/loop-w-1.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/parallel-dims.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/routine-g-1.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/routine-w-1.c: Likewise.

	Backport from mainline:
	2018-05-02  Tom de Vries  

	PR libgomp/85411
	libgomp/
	* plugin/plugin-nvptx.c (nvptx_exec): Move parsing of
	GOMP_OPENACC_DIM ...
	* env.c (parse_gomp_openacc_dim): ... here.  New function.
	(initialize_env): Call parse_gomp_openacc_dim.
	(goacc_default_dims): Define.
	* libgomp.h (goacc_default_dims): Declare.
	* oacc-plugin.c (GOMP_PLUGIN_acc_default_dim): New function.
	* oacc-plugin.h (GOMP_PLUGIN_acc_default_dim): Declare.
	* libgomp.map: New version "GOMP_PLUGIN_1.2". Add
	GOMP_PLUGIN_acc_default_dim.
	* testsuite/libgomp.oacc-c-c++-common/loop-default-runtime.c: New test.
	* testsuite/libgomp.oacc-c-c++-common/loop-default.h: New test.

	2018-05-04  Tom de Vries  
	PR libgomp/85639
	gcc/
	* builtins.c (expand_builtin_goacc_parlevel_id_size): Handle null target
	if ignore == 0.

	2018-05-07  Tom de Vries  
	PR testsuite/85677
	libgomp/
	* testsuite/lib/libgomp.exp (libgomp_init): Move inclusion of top-level
	include directory in ALWAYS_CFLAGS out of $blddir != "" condition.

[openacc] Move GOMP_OPENACC_DIM parsing out of nvptx plugin

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@259852
138bc75d-0d04-0410-961f-82ee72b054a4

[expand] Handle null target in expand_builtin_goacc_parlevel_id_size

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@259927
138bc75d-0d04-0410-961f-82ee72b054a4

[openacc, testsuite] Allow installed testing of libgomp to find gomp-constants.h

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@259992
138bc75d-0d04-0410-961f-82ee72b054a4

diff --git a/gcc/builtins.c b/gcc/builtins.c
index 300e13c..0097d5b 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -6682,6 +6682,9 @@ expand_builtin_goacc_parlevel_id_size (tree exp, rtx target, int ignore)
   if (ignore)
     return target;
 
+  if (target == NULL_RTX)
+    target = gen_reg_rtx (TYPE_MODE (TREE_TYPE (exp)));
+
   if (!targetm.have_oacc_dim_size ())
     {
       emit_move_insn (target, fallback_retval);
diff --git a/libgomp/env.c b/libgomp/env.c
index c99ba85..fab35b7 100644
--- a/libgomp/env.c
+++ b/libgomp/env.c
@@ -90,6 +90,7 @@ int gomp_debug_var;
 unsigned int gomp_num_teams_var;
 char *goacc_device_type;
 int goacc_device_num;
+int goacc_default_dims[GOMP_DIM_MAX];
 
 #ifndef LIBGOMP_OFFLOADED_ONLY
 
@@ -1066,6 +1067,36 @@ parse_acc_device_type (void)
 }
 
 static void
+parse_gomp_openacc_dim (void)
+{
+  /* The syntax is the same as for the -fopenacc-dim compilation option.  */
+  const char *var_name = "GOMP_OPENACC_DIM";
+  const char *env_var = getenv (var_name);
+  if (!env_var)
+    return;
+
+  const char *pos = env_var;
+  int i;
+  for (i = 0; *pos && i != GOMP_DIM_MAX; i++)
+    {
+      if (i && *pos++ != ':')
+	break;
+
+      if (*pos == ':')
+	continue;
+
+      const char *eptr;
+      errno = 0;
+      long val = strtol (pos, (char **)&eptr, 10);
+      if (errno || val < 0 || (unsigned)val != val)
+	break;
+
+      goacc_default_dims[i] = (int)val;
+      pos = eptr;
+    }
+}
+
+static void
 handle_omp_display_env (unsigned long stacksize, int wait_policy)
 {
   const char *env;
@@ -1336,6 +1367,7 @@ initialize_env (void)
 goacc_device_num = 0;
 
   parse_acc_device_type ();
+  parse_gomp_openacc_dim ();
 
   goacc_runtime_initialize ();
 
diff --git a/libgomp/libgomp.h b/libgomp/libgomp.h
index a9aca74..607f4c2 100644
--- a/libgomp/libgomp.h
+++ b/libgomp/libgomp.h
@@ -44,6 +44,7 @@
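The comment in parse_gomp_openacc_dim above says GOMP_OPENACC_DIM uses the same syntax as -fopenacc-dim. That means up to three colon-separated fields in gang:worker:vector order, where an empty field leaves that dimension at 0 (unspecified). The standalone sketch below repeats the same loop with a caller-supplied array, so the behavior can be checked in isolation. parse_dims and DIM_MAX are illustrative names, not libgomp symbols.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* gang, worker, vector: the role GOMP_DIM_MAX plays in libgomp.  */
#define DIM_MAX 3

/* Same loop as parse_gomp_openacc_dim, but parsing POS into DIMS
   instead of reading GOMP_OPENACC_DIM into goacc_default_dims.  */
static void
parse_dims (const char *pos, int dims[DIM_MAX])
{
  int i;

  for (i = 0; *pos && i != DIM_MAX; i++)
    {
      /* Every field after the first must follow a ':'.  */
      if (i && *pos++ != ':')
	break;

      /* Empty field: leave this dimension untouched.  */
      if (*pos == ':')
	continue;

      char *eptr;
      errno = 0;
      long val = strtol (pos, &eptr, 10);
      /* Reject overflow, negative values and values that do not fit
	 in an unsigned int, as the original does.  */
      if (errno || val < 0 || (unsigned) val != val)
	break;

      dims[i] = (int) val;
      pos = eptr;
    }
}
```

For example, GOMP_OPENACC_DIM=32::128 should request 32 gangs and a vector length of 128 while leaving the worker count at 0.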

[patch] combine ICE fix

2013-10-10 Thread Cesar Philippidis
This patch addresses an ICE when building qemu for a Mips target in
Yocto. Both gcc-trunk and gcc-4.8 are affected, potentially on all
targets. The problem occurs because the instruction combine phase uses
two data structures to keep track of registers, reg_stat and
regstat_n_sets_and_refs, which can end up out of sync: when combine
inserts a new register into reg_stat, it does not update
regstat_n_sets_and_refs. Failing to update the latter results in an
occasional segmentation fault.

Is this OK for trunk and gcc-4.8? If so, please check it in. I tested it
on Mips and x86-64 and no regressions showed up.

Thanks,
Cesar
2013-10-10  Cesar Philippidis  

gcc/
* regs.h (REG_N_GROW): New function. 
* combine.c (combine_split_insns): Call REG_N_GROW when
new registers are created.

Index: gcc/regs.h
===
--- gcc/regs.h  (revision 421441)
+++ gcc/regs.h  (working copy)
@@ -85,6 +85,17 @@ REG_N_SETS (int regno)
   return regstat_n_sets_and_refs[regno].sets;
 }
 
+/* Indexed by n, inserts a new register (REG n).  */
+static inline void
+REG_N_GROW (int regno)
+{
+  regstat_n_sets_and_refs = XRESIZEVEC (struct regstat_n_sets_and_refs_t, 
+   regstat_n_sets_and_refs, regno+1);
+
+  regstat_n_sets_and_refs[regno].sets = 1;
+  regstat_n_sets_and_refs[regno].refs = 1;
+}
+
 /* Indexed by n, gives number of times (REG n) is set.  */
 #define SET_REG_N_SETS(N,V) (regstat_n_sets_and_refs[N].sets = V)
 #define INC_REG_N_SETS(N,V) (regstat_n_sets_and_refs[N].sets += V)
Index: gcc/combine.c
===
--- gcc/combine.c   (revision 421441)
+++ gcc/combine.c   (working copy)
@@ -518,7 +518,10 @@ combine_split_insns (rtx pattern, rtx insn)
   ret = split_insns (pattern, insn);
   nregs = max_reg_num ();
   if (nregs > reg_stat.length ())
-    reg_stat.safe_grow_cleared (nregs);
+    {
+      reg_stat.safe_grow_cleared (nregs);
+      REG_N_GROW (nregs);
+    }
   return ret;
 }
 


Re: [patch] combine ICE fix

2013-10-11 Thread Cesar Philippidis
On 10/10/13 9:25 AM, Jakub Jelinek wrote:

> That looks broken.  You leave everything from the last size till the current
> one uninitialized, so it would only work if combine_split_insns
> always increments max_reg_num () by at most one.

Good catch.

> Furthermore, there are macros which should be used to access
> the fields, and, if the vector is ever going to be resized, supposedly
> it should be vec.h vector rather than just array.
> Or perhaps take into account:
> /* If a pass need to change these values in some magical way or the
>pass needs to have accurate values for these and is not using
>incremental df scanning, then it should use REG_N_SETS and
>REG_N_USES.  If the pass is doing incremental scanning then it
>should be getting the info from DF_REG_DEF_COUNT and
>DF_REG_USE_COUNT.  */
> and not use REG_N_SETS etc. but instead the df stuff.

I was thinking about converting that array to a vec. But I don't want to
touch more code than I have to right now. Is this OK as a stopgap?

Thanks for the review!

Cesar
2013-10-11  Cesar Philippidis  

gcc/
* regs.h (REG_N_GROW): New function. 
* combine.c (combine_split_insns): Call REG_N_GROW when
new registers are created.

Index: gcc/regs.h
===
--- gcc/regs.h  (revision 203289)
+++ gcc/regs.h  (working copy)
@@ -89,6 +89,20 @@ REG_N_SETS (int regno)
 #define SET_REG_N_SETS(N,V) (regstat_n_sets_and_refs[N].sets = V)
 #define INC_REG_N_SETS(N,V) (regstat_n_sets_and_refs[N].sets += V)
 
+/* Indexed by n, inserts new registers (old_regno+1)..new_regno.  */
+static inline void
+REG_N_GROW (int new_regno, int old_regno)
+{
+  regstat_n_sets_and_refs = XRESIZEVEC (struct regstat_n_sets_and_refs_t, 
+   regstat_n_sets_and_refs, new_regno+1);
+
+  for (int i = old_regno + 1; i <= new_regno; ++i)
+    {
+      SET_REG_N_SETS (i, 1);
+      SET_REG_N_REFS (i, 1);
+    }
+}
+
 /* Given a REG, return TRUE if the reg is a PARM_DECL, FALSE otherwise.  */
 extern bool reg_is_parm_p (rtx);
 
Index: gcc/combine.c
===
--- gcc/combine.c   (revision 203289)
+++ gcc/combine.c   (working copy)
@@ -518,7 +518,10 @@ combine_split_insns (rtx pattern, rtx insn)
   ret = split_insns (pattern, insn);
   nregs = max_reg_num ();
   if (nregs > reg_stat.length ())
-    reg_stat.safe_grow_cleared (nregs);
+    {
+      REG_N_GROW (nregs, reg_stat.length ());
+      reg_stat.safe_grow_cleared (nregs);
+    }
   return ret;
 }
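Jakub's objection to the first version is that growing the array without initializing every new slot only works if combine_split_insns creates at most one register. Below is a minimal sketch of the grow-and-initialize pattern in plain C. It assumes realloc semantics, which is what XRESIZEVEC provides. The names struct reg_counts and grow_reg_counts are hypothetical stand-ins for regstat_n_sets_and_refs and REG_N_GROW.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical per-register counters standing in for
   struct regstat_n_sets_and_refs_t.  */
struct reg_counts
{
  int sets;
  int refs;
};

/* Grow COUNTS from OLD_LEN to NEW_LEN entries (NEW_LEN > 0) and give
   every new entry, i.e. indices OLD_LEN .. NEW_LEN - 1, the same
   initial counts, however many registers were added at once.  */
static struct reg_counts *
grow_reg_counts (struct reg_counts *counts, size_t old_len, size_t new_len)
{
  assert (new_len > 0 && new_len >= old_len);

  struct reg_counts *p = realloc (counts, new_len * sizeof *p);
  assert (p != NULL);

  for (size_t i = old_len; i < new_len; i++)
    {
      p[i].sets = 1;
      p[i].refs = 1;
    }
  return p;
}
```

A half-open range makes the boundary explicit: the first new register number equals the old length, and the last one is new_len - 1.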
 


Re: [patch] combine ICE fix

2013-10-16 Thread Cesar Philippidis
On 10/15/13 12:16 PM, Jeff Law wrote:
> On 10/10/13 10:25, Jakub Jelinek wrote:
>> On Thu, Oct 10, 2013 at 07:26:43AM -0700, Cesar Philippidis wrote:
>>> This patch addresses an ICE when building qemu for a Mips target in
>>> Yocto. Both gcc-trunk, gcc-4.8 and all of the targets are potentially
>>> affected. The problem occurs because the instruction combine phase uses
>>> two data structures to keep track of registers, reg_stat and
>>> regstat_n_sets_and_refs, but they potentially end up out of sync; when
>>> combine inserts a new register into reg_stat it does not update
>>> regstat_n_sets_and_refs. Failing to update the latter results in an
>>> occasional segmentation fault.
>>>
>>> Is this OK for trunk and gcc-4.8? If so, please check it in. I tested it
>>> on Mips and x86-64 and no regressions showed up.
>>
>> That looks broken.  You leave everything from the last size till the
>> current
>> one uninitialized, so it would only work if combine_split_insns
>> always increments max_reg_num () by at most one.
> I don't think that assumption is safe.  Consider a parallel with a bunch
> of (clobber (match_scratch)) expressions.

I address that in the patch posted here
<http://gcc.gnu.org/ml/gcc-patches/2013-10/msg00802.html>. Is that still
insufficient?

>> Furthermore, there are macros which should be used to access
>> the fields, and, if the vector is ever going to be resized, supposedly
>> it should be vec.h vector rather than just array.
>> Or perhaps take into account:
>> /* If a pass need to change these values in some magical way or the
>> pass needs to have accurate values for these and is not using
>> incremental df scanning, then it should use REG_N_SETS and
>> REG_N_USES.  If the pass is doing incremental scanning then it
>> should be getting the info from DF_REG_DEF_COUNT and
>> DF_REG_USE_COUNT.  */
>> and not use REG_N_SETS etc. but instead the df stuff.
> Which begs the question, how exactly is combine utilizing the
> regstat_n_* structures and is that use even valid for combine?

I'll take a look at that.

Cesar



Re: [PATCH] libgomp testsuite fixes

2013-10-24 Thread Cesar Philippidis
On 6/20/13 9:49 AM, Mike Stump wrote:
> On May 30, 2013, at 12:59 PM, Cesar Philippidis  
> wrote:
>> Here is a patch from our backlog at Mentor Graphics that addresses a 
>> libgomp issue where setting ENABLE_LTO=1 in site.exp causes the following 
>> error with dejagnu
> 
>> Is it OK for trunk?
> 
> Ok.
> 
> Committed revision 200253.

Please backport this patch along with the libitm fix to 4.8. Thank you.
(The libitm patch was discussed here
<http://gcc.gnu.org/ml/gcc-patches/2013-06/msg01229.html>.)

Cesar

Index: libgomp/testsuite/libgomp.fortran/fortran.exp
===
--- libgomp/testsuite/libgomp.fortran/fortran.exp   (revision 199267)
+++ libgomp/testsuite/libgomp.fortran/fortran.exp   (working copy)
@@ -1,4 +1,6 @@
 load_lib libgomp-dg.exp
+load_gcc_lib gcc-dg.exp
+load_gcc_lib gfortran-dg.exp
 
 global shlib_ext
 global ALWAYS_CFLAGS
Index: libgomp/testsuite/lib/libgomp.exp
===
--- libgomp/testsuite/lib/libgomp.exp   (revision 199267)
+++ libgomp/testsuite/lib/libgomp.exp   (working copy)
@@ -9,24 +9,27 @@
 }
 
 load_lib dg.exp
+
+# Required to use gcc-dg.exp - however, the latter should NOT be
+# loaded until ${tool}_target_compile is defined since it uses that
+# to determine default LTO options.
+
+load_gcc_lib prune.exp
+load_gcc_lib target-libpath.exp
+load_gcc_lib wrapper.exp
+load_gcc_lib gcc-defs.exp
+load_gcc_lib timeout.exp
+load_gcc_lib target-supports.exp
 load_gcc_lib file-format.exp
-load_gcc_lib target-supports.exp
 load_gcc_lib target-supports-dg.exp
 load_gcc_lib scanasm.exp
 load_gcc_lib scandump.exp
 load_gcc_lib scanrtl.exp
 load_gcc_lib scantree.exp
 load_gcc_lib scanipa.exp
-load_gcc_lib prune.exp
-load_gcc_lib target-libpath.exp
-load_gcc_lib wrapper.exp
-load_gcc_lib gcc-defs.exp
+load_gcc_lib timeout-dg.exp
 load_gcc_lib torture-options.exp
-load_gcc_lib timeout.exp
-load_gcc_lib timeout-dg.exp
 load_gcc_lib fortran-modules.exp
-load_gcc_lib gcc-dg.exp
-load_gcc_lib gfortran-dg.exp
 
 set dg-do-what-default run
 
Index: libgomp/testsuite/libgomp.c/c.exp
===
--- libgomp/testsuite/libgomp.c/c.exp   (revision 199267)
+++ libgomp/testsuite/libgomp.c/c.exp   (working copy)
@@ -7,6 +7,7 @@
 }
 
 load_lib libgomp-dg.exp
+load_gcc_lib gcc-dg.exp
 
 # If a testcase doesn't have special options, use these.
 if ![info exists DEFAULT_CFLAGS] then {
Index: libgomp/testsuite/libgomp.graphite/graphite.exp
===
--- libgomp/testsuite/libgomp.graphite/graphite.exp (revision 199267)
+++ libgomp/testsuite/libgomp.graphite/graphite.exp (working copy)
@@ -23,6 +23,7 @@
 }
 
 load_lib libgomp-dg.exp
+load_gcc_lib gcc-dg.exp
 
 if ![check_effective_target_pthread] {
   return
Index: libgomp/testsuite/libgomp.c++/c++.exp
===
--- libgomp/testsuite/libgomp.c++/c++.exp   (revision 199267)
+++ libgomp/testsuite/libgomp.c++/c++.exp   (working copy)
@@ -1,4 +1,5 @@
 load_lib libgomp-dg.exp
+load_gcc_lib gcc-dg.exp
 
 global shlib_ext
 
@@ -53,7 +54,7 @@
 }
 
 # Main loop.
-gfortran-dg-runtest $tests $libstdcxx_includes
+dg-runtest $tests "" $libstdcxx_includes
 }
 
 # All done.
Index: libitm/testsuite/lib/libitm.exp
===
--- libitm/testsuite/lib/libitm.exp (revision 199267)
+++ libitm/testsuite/lib/libitm.exp (working copy)
@@ -23,23 +23,27 @@
 }
 
 load_lib dg.exp
+
+# Required to use gcc-dg.exp - however, the latter should NOT be
+# loaded until ${tool}_target_compile is defined since it uses that
+# to determine default LTO options.
+
+load_gcc_lib prune.exp
+load_gcc_lib target-libpath.exp
+load_gcc_lib wrapper.exp
+load_gcc_lib gcc-defs.exp
+load_gcc_lib timeout.exp
+load_gcc_lib target-supports.exp
 load_gcc_lib file-format.exp
-load_gcc_lib target-supports.exp
 load_gcc_lib target-supports-dg.exp
 load_gcc_lib scanasm.exp
 load_gcc_lib scandump.exp
 load_gcc_lib scanrtl.exp
 load_gcc_lib scantree.exp
 load_gcc_lib scanipa.exp
-load_gcc_lib prune.exp
-load_gcc_lib target-libpath.exp
-load_gcc_lib wrapper.exp
-load_gcc_lib gcc-defs.exp
+load_gcc_lib timeout-dg.exp
 load_gcc_lib torture-options.exp
-load_gcc_lib timeout.exp
-load_gcc_lib timeout-dg.exp
 load_gcc_lib fortran-modules.exp
-load_gcc_lib gcc-dg.exp
 
 set dg-do-what-default run
 
Index: libitm/testsuite/libitm.c++/c++.exp
===
--- libitm/testsuite/libitm.c++/c++.exp (revision 199267)
+++ libitm/testsuite/libitm.c++/c++.exp (working copy)
@@ -15,6 +15,7 @@
 # Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
 
 load_lib libitm-dg.exp
+load_gcc_lib gcc-d

[PATCH] libstdc++ testsuite cxxflags

2013-10-28 Thread Cesar Philippidis
This patch addresses two issues with the libstdc++ testsuite:

  * duplicate "-g -O2" CXXFLAGS
  * missing "-g -O2" for remote targets

The duplicate "-g -O2" flags are a result of testsuite_flags.in using
build-time CXXFLAGS and proc libstdc++_init using the CXXFLAGS
environment variable, which defaults to its build-time value. This patch
prevents testsuite_flags.in from using build-time CXXFLAGS.

Certain remote targets require an optimization level of at least -O1 in
order to pass several atomic built-in function tests. This patch ensures
that cxxflags contains "-g -O2" at minimum when no other optimization
flags are specified. The testsuite used to set those flags prior to
Benjamin's patch to remove duplicate cxxflags, posted here
<http://gcc.gnu.org/ml/gcc-patches/2012-03/msg01572.html>.

Is this OK for trunk? If so, please apply.

Thanks,
Cesar
2013-10-28  Cesar Philippidis  

libstdc++-v3/
* scripts/testsuite_flags.in (cxxflags): Don't use build-time
CXXFLAGS and EXTRA_CXX_FLAGS.
* testsuite/lib/libstdc++.exp (libstdc++_init): Ensure, at minimum,
cxxflags contains "-g -O2".

diff --git a/libstdc++-v3/scripts/testsuite_flags.in b/libstdc++-v3/scripts/testsuite_flags.in
index d7710ca..35b36e7 100755
--- a/libstdc++-v3/scripts/testsuite_flags.in
+++ b/libstdc++-v3/scripts/testsuite_flags.in
@@ -55,7 +55,7 @@ case ${query} in
   ;;
 --cxxflags)
   CXXFLAGS_default="-D_GLIBCXX_ASSERT -fmessage-length=0"
-  CXXFLAGS_config="@SECTION_FLAGS@ @CXXFLAGS@ @EXTRA_CXX_FLAGS@"
+  CXXFLAGS_config="@SECTION_FLAGS@"
   echo ${CXXFLAGS_default} ${CXXFLAGS_config}
   ;;
 --cxxparallelflags)
diff --git a/libstdc++-v3/testsuite/lib/libstdc++.exp b/libstdc++-v3/testsuite/lib/libstdc++.exp
index 51ff6dd..68dcb15 100644
--- a/libstdc++-v3/testsuite/lib/libstdc++.exp
+++ b/libstdc++-v3/testsuite/lib/libstdc++.exp
@@ -265,6 +265,15 @@ proc libstdc++_init { testfile } {
 }
 append cxxflags " "
 append cxxflags [getenv CXXFLAGS]
+
+if {$cxxflags == "-D_GLIBCXX_ASSERT -fmessage-length=0 "} {
+   append cxxflags "-g"
+}
+
+if ![regexp "\-O" $cxxflags] {
+   append cxxflags " -O2"
+}
+
 v3track cxxflags 2
 
 # Always use MO files built by this test harness.
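The two checks added to libstdc++_init above do the following. If cxxflags holds nothing beyond the default flags and the trailing separator, they append -g. Then, if no -O option appears anywhere, they append -O2. Below is a C transliteration of that Tcl logic, for illustration only. ensure_default_cxxflags is a hypothetical name; the harness itself does this in Tcl.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical C version of the checks added to libstdc++_init.
   FLAGS is a NUL-terminated buffer of CAP bytes.  */
static void
ensure_default_cxxflags (char *flags, size_t cap)
{
  static const char bare[] = "-D_GLIBCXX_ASSERT -fmessage-length=0 ";

  assert (strlen (flags) < cap);

  /* Nothing beyond the defaults was supplied: add debug info.  */
  if (strcmp (flags, bare) == 0)
    strncat (flags, "-g", cap - strlen (flags) - 1);

  /* No -O option anywhere (the Tcl uses regexp "\-O"): add -O2.  */
  if (strstr (flags, "-O") == NULL)
    strncat (flags, " -O2", cap - strlen (flags) - 1);
}
```

An explicit optimization level such as -O0 is left alone, while other user flags such as -Wall still get -O2 but not -g.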


Re: [gomp4] fix c++ reference mappings in openacc

2016-01-20 Thread Cesar Philippidis
On 01/20/2016 07:46 PM, Cesar Philippidis wrote:
> I've applied this patch to gomp-4_0-branch, which fixes two problems
> involving reference-type variables in OpenACC data clauses. The first
> problem was that the C++ front end was handling reference types
> incorrectly in OpenACC in general: instead of mapping the variable, it
> would map the pointer to the variable by itself. The second problem was
> that, if the gimplifier saw a pointer mapping for a data clause, it would
> propagate it to omp-lower. That's bad because if you have something like this
> 
>   int &var = ...
> 
>   #pragma acc data copy (var)
>   {
>  ...var...
>   }
> 
> then the var inside the data region would have some uninitialized value,
> because omp-lower installs a new variable for it. The gimplifier
> already handles OpenMP target data regions properly, so this patch
> extends it to ignore pointer mappings in acc enter/exit and data constructs.
> 
> Ultimately this patch will need to go in trunk, but the c++ changes
> don't apply cleanly. I'll need to work on that later.

And here's the patch.

Cesar
2016-01-20  Cesar Philippidis  

	gcc/cp/
	* parser.c (cp_parser_oacc_all_clauses): Call finish_omp_clauses
	with allow_fields set to true.
	(cp_parser_oacc_cache): Likewise.
	(cp_parser_oacc_loop): Likewise.
	* semantics.c (finish_omp_clauses): Ensure that is_oacc is properly
	set when calling handle_omp_array_sections.

	gcc/
	* gimplify.c (gimplify_scan_omp_clauses): Consider OACC_{DATA,
	PARALLEL, KERNELS} when processing firstprivate pointers and
	references, and setting target_firstprivatize_array_bases.

	libgomp/
	* testsuite/libgomp.oacc-c++/non-scalar-data.C: New test.


diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index d88877a..4882b19 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -32324,7 +32324,7 @@ cp_parser_oacc_all_clauses (cp_parser *parser, omp_clause_mask mask,
   cp_parser_skip_to_pragma_eol (parser, pragma_tok);
 
   if (finish_p)
-return finish_omp_clauses (clauses, true, false);
+return finish_omp_clauses (clauses, true, true);
 
   return clauses;
 }
@@ -35140,7 +35140,7 @@ cp_parser_oacc_cache (cp_parser *parser, cp_token *pragma_tok)
   tree stmt, clauses;
 
   clauses = cp_parser_omp_var_list (parser, OMP_CLAUSE__CACHE_, NULL_TREE);
-  clauses = finish_omp_clauses (clauses, true, false);
+  clauses = finish_omp_clauses (clauses, true, true);
 
   cp_parser_require_pragma_eol (parser, cp_lexer_peek_token (parser->lexer));
 
@@ -35471,9 +35471,9 @@ cp_parser_oacc_loop (cp_parser *parser, cp_token *pragma_tok, char *p_name,
 {
   clauses = c_oacc_split_loop_clauses (clauses, cclauses);
   if (*cclauses)
-	finish_omp_clauses (*cclauses, true, false);
+	finish_omp_clauses (*cclauses, true, true);
   if (clauses)
-	finish_omp_clauses (clauses, true, false);
+	finish_omp_clauses (clauses, true, true);
 }
 
   tree block = begin_omp_structured_block ();
diff --git a/gcc/cp/semantics.c b/gcc/cp/semantics.c
index 3ca6137..e161186 100644
--- a/gcc/cp/semantics.c
+++ b/gcc/cp/semantics.c
@@ -5807,7 +5807,7 @@ finish_omp_clauses (tree clauses, bool is_oacc, bool allow_fields,
 	  t = OMP_CLAUSE_DECL (c);
 	  if (TREE_CODE (t) == TREE_LIST)
 	{
-	  if (handle_omp_array_sections (c, allow_fields))
+	  if (handle_omp_array_sections (c, allow_fields && !is_oacc))
 		{
 		  remove = true;
 		  break;
@@ -6567,7 +6567,7 @@ finish_omp_clauses (tree clauses, bool is_oacc, bool allow_fields,
 	}
 	  if (TREE_CODE (t) == TREE_LIST)
 	{
-	  if (handle_omp_array_sections (c, allow_fields))
+	  if (handle_omp_array_sections (c, allow_fields && !is_oacc))
 		remove = true;
 	  break;
 	}
@@ -6601,7 +6601,7 @@ finish_omp_clauses (tree clauses, bool is_oacc, bool allow_fields,
 	  t = OMP_CLAUSE_DECL (c);
 	  if (TREE_CODE (t) == TREE_LIST)
 	{
-	  if (handle_omp_array_sections (c, allow_fields))
+	  if (handle_omp_array_sections (c, allow_fields && !is_oacc))
 		remove = true;
 	  else
 		{
diff --git a/gcc/gimplify.c b/gcc/gimplify.c
index cdb5b96..152942f 100644
--- a/gcc/gimplify.c
+++ b/gcc/gimplify.c
@@ -6092,7 +6092,7 @@ omp_notice_variable (struct gimplify_omp_ctx *ctx, tree decl, bool in_code)
 	{
 	  unsigned nflags = flags;
 	  if (ctx->target_map_pointers_as_0len_arrays
-	  || ctx->target_map_scalars_firstprivate)
+	   || ctx->target_map_scalars_firstprivate)
 	{
 	  bool is_declare_target = false;
 	  bool is_scalar = false;
@@ -6456,7 +6456,10 @@ gimplify_scan_omp_clauses (tree *list_p, gimple_seq *pre_p,
   case OMP_TARGET_DATA:
   case OMP_TARGET_ENTER_DATA:
   case OMP_TARGET_EXIT_DATA:
+  case OACC_DATA:
   case OACC_HOST_DATA:
+  case OACC_PARALLEL:
+  case OACC_KERNELS:
 	ctx->target_firstprivatize_array_bases = true;
   defau

[gomp4] fix c++ reference mappings in openacc

2016-01-20 Thread Cesar Philippidis
I've applied this patch to gomp-4_0-branch, which fixes two problems
involving reference type variables in openacc data clauses. The first
problem was, the c++ front end was incorrectly handling reference types
in general in openacc. Instead of mapping the variable, it would map the
pointer to the variable by itself. The second problem was, if the
gimplifier saw a pointer mapping for a data clause, it would propagate
it to omp-lower. That's bad because if you have something like this

  int &var = ...

  #pragma acc data copy (var)
  {
 ...var...
  }

where the var inside the data region would have an uninitialized value
because omplower installs a new variable for it. The gimplifier
already handles openmp target data regions properly, so this patch
extends it to ignore pointer mappings in acc enter/exit and data constructs.

Ultimately this patch will need to go in trunk, but the c++ changes
don't apply cleanly. I'll need to work on that later.
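A minimal, self-contained version of the failing pattern is sketched below. It is an illustration under stated assumptions, not one of the test cases from the patch: add_through_ref is a hypothetical name, and it assumes GCC with nvptx offloading and -fopenacc. Without -fopenacc the acc pragmas are ignored and the function simply runs on the host, which gives the same result.

```cpp
#include <cassert>

// Hypothetical example: 'var' is a C++ reference named in an OpenACC
// data clause.  Before the fix, the region could operate on a freshly
// installed copy of 'var' instead of the referenced object.
static int add_through_ref (int &var, int delta)
{
#pragma acc data copy (var)
  {
    // num_gangs (1): in gang-redundant mode every gang would otherwise
    // execute the update once.
#pragma acc parallel num_gangs (1) present (var)
    {
      var += delta;
    }
  }
  return var;
}
```

With the fix, calling it with int x = 40, a reference int &r = x and a delta of 2 should leave x at 42.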

Cesar


[openacc] reference-typed data mappings

2016-02-01 Thread Cesar Philippidis
This patch fixes a couple of bugs preventing c++ reference-typed
variables from working in openacc data clauses. These fixes include:

 * Teach the gimplifier to filter out pointer data mappings for
   OACC_DATA, OACC_ENTER_DATA, OACC_EXIT_DATA and OACC_UPDATE regions.
   Also use a firstprivate mapping for the array base pointers
   in OACC_DATA, OACC_PARALLEL and OACC_KERNELS regions.

 * Make the data mapping errors emitted by the c and c++ front ends
   more consistent with openacc by reporting data mapping errors, not
   omp-specific map errors.

 * Add some light checking for duplicate reference mappings in c++. The
   c++ FE still fails to detect duplicate component refs, but that's not
   working in openacc at the moment, anyway.
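The duplicate check, and the OpenACC/OpenMP wording split, can be modelled outside the compiler. This is a simplified sketch, not the front-end code: Clause, ClauseKind and check_duplicates are hypothetical, a single std::set stands in for the bitmaps keyed on DECL_UID, and the message texts follow the c-typeck.c hunk in this patch, where is_omp selects "map clauses" versus "data clauses".

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical, simplified clause model: each clause names one variable
// (identified by a unique id) and is either a map/data clause or a
// motion clause (OpenMP 'target update' to/from).
enum class ClauseKind { Map, Motion };

struct Clause
{
  ClauseKind kind;
  int decl_uid;       // stand-in for DECL_UID (t)
  std::string name;   // variable name used in the diagnostic
};

// Return one diagnostic per repeated variable, worded per language.
static std::vector<std::string>
check_duplicates (const std::vector<Clause> &clauses, bool is_omp)
{
  std::set<int> seen;
  std::vector<std::string> diags;
  for (const Clause &c : clauses)
    {
      if (seen.insert (c.decl_uid).second)
        continue;  // first mention of this variable
      const char *where;
      if (c.kind != ClauseKind::Map)
        where = "motion clauses";
      else if (is_omp)
        where = "map clauses";
      else
        where = "data clauses";
      diags.push_back ("'" + c.name + "' appears more than once in " + where);
    }
  return diags;
}
```

The point of the split is only the wording: the same duplicate is rejected in both languages, but OpenACC users see "data clauses", which matches the terminology of the OpenACC specification.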

Jakub, the latter issue also affects openmp. I've added a simple openmp
test case, but it could probably be more extensive. Can you add more
test coverage or tell me what should be included?

Is this patch ok for trunk?

Cesar
2016-02-01  Cesar Philippidis  

	gcc/c/
	* c-typeck.c (c_finish_omp_clauses): Report OMP_CLAUSE_MAP errors
	as data clause errors for OpenACC.

	gcc/cp/
	* cp-tree.h (finish_omp_clauses): Update prototype.
	* parser.c (cp_parser_oacc_all_clauses): Update call to
	finish_omp_clauses.
	(cp_parser_omp_all_clauses): Likewise.
	(cp_parser_omp_for_loop): Likewise.
	(cp_omp_split_clauses): Likewise.
	(cp_parser_oacc_cache): Likewise.
	(cp_parser_oacc_loop): Likewise.
	(cp_parser_omp_declare_target): Likewise.
	(cp_parser_cilk_for): Likewise.
	* pt.c (tsubst_attribute): Update call to tsubst_omp_clauses.
	(tsubst_omp_clauses): New is_oacc argument.  Use it when calling
	finish_omp_clauses.
	(tsubst_omp_for_iterator): Update call to finish_omp_clauses.
	(tsubst_expr): Update calls to tsubst_omp_clauses.
	* semantics.c (finish_omp_clauses): Add is_oacc argument.  Report
	OMP_CLAUSE_MAP errors as data clause errors for OpenACC.  Check for
	duplicate reference mappings.  Exclude "omp declare simd"-isms when
	processing OpenACC clauses.
	(finish_omp_for): Update call to finish_omp_clauses.

	gcc/
	* gimplify.c (gimplify_scan_omp_clauses): Consider OACC_{DATA, PARALLEL,
	KERNELS} when setting target_firstprivatize_array_bases.  Consider
	OACC_{DATA, ENTER_DATA, EXIT_DATA, UPDATE} when filtering out pointer
	mappings.  Also filter out GOMP_MAP_POINTER.

	gcc/testsuite/
	* c-c++-common/goacc/data-clause-duplicate-1.c: Adjust test.
	* c-c++-common/goacc/declare-2.c: Likewise.
	* c-c++-common/goacc/deviceptr-1.c: Likewise.
	* c-c++-common/goacc/kernels-alias-ipa-pta-3.c: Likewise.
	* c-c++-common/goacc/pcopy.c: Likewise.
	* c-c++-common/goacc/pcopyin.c: Likewise.
	* c-c++-common/goacc/pcopyout.c: Likewise.
	* c-c++-common/goacc/pcreate.c: Likewise.
	* c-c++-common/goacc/present-1.c: Likewise.
	* g++.dg/goacc/data-1.C: New test.
	* g++.dg/goacc/data-2.C: New test.
	* g++.dg/goacc/reduction-1.C: New test.
	* g++.dg/goacc/update.C: New test.
	* g++.dg/gomp/template-data.C: New test.

	libgomp/
	* testsuite/libgomp.c++/non-scalar-data.C: New test.
	* testsuite/libgomp.oacc-c++/data-references.C: New test.
	* testsuite/libgomp.oacc-c++/data-templates.C: New test.
	* testsuite/libgomp.oacc-c++/non-scalar-data-templates.C: New test.
	* testsuite/libgomp.oacc-c++/update-reference.C: New test.
	* testsuite/libgomp.oacc-c++/update-template.C: New test.
	* testsuite/libgomp.oacc-c-c++-common/asyncwait-1.c: Adjust test.
	* testsuite/libgomp.oacc-c-c++-common/data-2.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/data-3.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/if-1.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/nested-1.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/present-2.c: Likewise.

diff --git a/gcc/c/c-typeck.c b/gcc/c/c-typeck.c
index 65925cb..4d93005 100644
--- a/gcc/c/c-typeck.c
+++ b/gcc/c/c-typeck.c
@@ -13157,8 +13157,10 @@ c_finish_omp_clauses (tree clauses, bool is_omp, bool declare_simd)
 	{
 	  if (OMP_CLAUSE_CODE (c) != OMP_CLAUSE_MAP)
 		error ("%qD appears more than once in motion clauses", t);
-	  else
+	  else if (is_omp)
 		error ("%qD appears more than once in map clauses", t);
+	  else
+		error ("%qD appears more than once in data clauses", t);
 	  remove = true;
 	}
 	  else if (bitmap_bit_p (&generic_head, DECL_UID (t))
diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index 0aeee57..baf3495 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -6415,7 +6415,7 @@ extern tree omp_reduction_id			(enum tree_code, tree, tree);
 extern tree cp_remove_omp_priv_cleanup_stmt	(tree *, int *, void *);
 extern void cp_check_omp_declare_reduction	(tree);
 extern void finish_omp_declare_simd_methods	(tree);
-extern tree finish_omp_clauses			(tree, bool, bool = false);
+extern tree finish_omp_clauses			(tree, bool, bool, bool = false);
 extern tree push_omp_privatization_clauses	(bool);
 extern void pop_omp_privatization_clauses	(tree);
 extern voi

Re: [openacc] reference-typed data mappings

2016-02-09 Thread Cesar Philippidis
On 02/01/2016 09:57 AM, Cesar Philippidis wrote:

> This patch fixes a couple of bugs preventing c++ reference-typed
> variables from working in openacc data clauses. These fixes include:
> 
>  * Teach the gimplifier to filter out pointer data mappings for
>    OACC_DATA, OACC_ENTER_DATA, OACC_EXIT_DATA and OACC_UPDATE regions.
>    Also use a firstprivate mapping for the array base pointers
>    in OACC_DATA, OACC_PARALLEL and OACC_KERNELS regions.
> 
>  * Make the data mapping errors emitted by the c and c++ front ends
>    more consistent with openacc by reporting data mapping errors, not
>    omp-specific map errors.
> 
>  * Add some light checking for duplicate reference mappings in c++. The
>    c++ FE still fails to detect duplicate component refs, but that's not
>    working in openacc at the moment, anyway.
> 
> Jakub, the latter issue also affects openmp. I've added a simple openmp
> test case, but it could probably be more extensive. Can you add more
> test coverage or tell me what should be included?

While working on a different reduction problem, I noticed that both the
c and c++ front ends are treating reductions as generic data clauses.
That means parallel reductions of the form

  #pragma acc parallel copy(foo) reduction(+:foo)

would get treated as an error. This patch fixes that, in addition to the
changes listed above.
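For concreteness, here is a function using that construct shape: the same variable appears in a copy clause and in a parallel reduction, with an inner loop-level reduction as well. It is a sketch that assumes a GCC containing this fix (before it, the copy/reduction pairing was diagnosed as an error); sum_grid is a hypothetical name. Compiled without -fopenacc it runs serially on the host and yields the same sum.

```cpp
#include <cassert>

// Hypothetical example: sums j * cols + i over a rows x cols grid,
// i.e. the integers 0 .. rows*cols - 1.
static long sum_grid (int rows, int cols)
{
  assert (rows >= 0 && cols >= 0);
  long res = 0;
  // 'res' is both copied and reduced on the parallel construct; each
  // gang gets a private partial sum that is combined at region exit.
#pragma acc parallel copy (res) reduction (+:res)
  {
#pragma acc loop gang
    for (int j = 0; j < rows; j++)
      {
        // Vector-level partial sums are folded into the gang's copy.
#pragma acc loop vector reduction (+:res)
        for (int i = 0; i < cols; i++)
          res += (long) j * cols + i;
      }
  }
  return res;
}
```

For example, sum_grid (4, 8) adds 0 through 31, which is 496.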

Is this patch ok for trunk?

Cesar
2016-02-09  Cesar Philippidis  

	gcc/c/
	* c-parser.c (c_parser_oacc_all_clauses): Update call to
	c_finish_omp_clauses.
	(c_parser_omp_all_clauses): Likewise.
	(c_parser_oacc_cache): Likewise.
	(c_parser_oacc_loop): Likewise.
	(omp_split_clauses): Likewise.
	(c_parser_omp_declare_target): Likewise.
	(c_parser_cilk_for): Likewise.
	* c-tree.h (c_finish_omp_clauses): Update prototype.
	* c-typeck.c (c_finish_omp_clauses): Add is_oacc argument.  Report
	OMP_CLAUSE_MAP errors as data clause errors for OpenACC.  Allow OpenACC
	reduction variables to appear in data clauses.

	gcc/cp/
	* cp-tree.h (finish_omp_clauses): Update prototype.
	* parser.c (cp_parser_oacc_all_clauses): Update call to
	finish_omp_clauses.
	(cp_parser_omp_all_clauses): Likewise.
	(cp_parser_omp_for_loop): Likewise.
	(cp_omp_split_clauses): Likewise.
	(cp_parser_oacc_cache): Likewise.
	(cp_parser_oacc_loop): Likewise.
	(cp_parser_omp_declare_target): Likewise.
	(cp_parser_cilk_for): Likewise.
	* pt.c (tsubst_attribute): Update call to tsubst_omp_clauses.
	(tsubst_omp_clauses): New is_oacc argument.  Use it when calling
	finish_omp_clauses.
	(tsubst_omp_for_iterator): Update call to finish_omp_clauses.
	(tsubst_expr): Update calls to tsubst_omp_clauses.
	* semantics.c (finish_omp_clauses): Add is_oacc argument.  Report
	OMP_CLAUSE_MAP errors as data clause errors for OpenACC.  Check for
	duplicate reference mappings.  Exclude "omp declare simd"-isms when
	processing OpenACC clauses.  Allow OpenACC reduction variables to
	appear in data clauses.
	(finish_omp_for): Update call to finish_omp_clauses.

	gcc/
	* gimplify.c (gimplify_scan_omp_clauses): Consider OACC_{DATA, PARALLEL,
	KERNELS} when setting target_firstprivatize_array_bases.  Consider
	OACC_{DATA, ENTER_DATA, EXIT_DATA, UPDATE} when filtering out pointer
	mappings.  Also filter out GOMP_MAP_POINTER.

	gcc/testsuite/
	* c-c++-common/goacc/data-clause-duplicate-1.c: Adjust test.
	* c-c++-common/goacc/declare-2.c: Likewise.
	* c-c++-common/goacc/deviceptr-1.c: Likewise.
	* c-c++-common/goacc/kernels-alias-ipa-pta-3.c: Likewise.
	* c-c++-common/goacc/parallel-reduction.c: New test.
	* c-c++-common/goacc/pcopy.c: Adjust test.
	* c-c++-common/goacc/pcopyin.c: Likewise.
	* c-c++-common/goacc/pcopyout.c: Likewise.
	* c-c++-common/goacc/pcreate.c: Likewise.
	* c-c++-common/goacc/present-1.c: Likewise.
	* c-c++-common/goacc/private-reduction-1.c: New test.
	* c-c++-common/goacc/reduction-5.c: New test.
	* g++.dg/goacc/data-1.C: New test.
	* g++.dg/goacc/data-2.C: New test.
	* g++.dg/goacc/reduction-1.C: New test.
	* g++.dg/goacc/update.C: New test.
	* g++.dg/gomp/template-data.C: New test.

	libgomp/
	* testsuite/libgomp.c++/non-scalar-data.C: New test.
	* testsuite/libgomp.oacc-c++/data-references.C: New test.
	* testsuite/libgomp.oacc-c++/data-templates.C: New test.
	* testsuite/libgomp.oacc-c++/non-scalar-data-templates.C: New test.
	* testsuite/libgomp.oacc-c++/update-reference.C: New test.
	* testsuite/libgomp.oacc-c++/update-template.C: New test.
	* testsuite/libgomp.oacc-c-c++-common/asyncwait-1.c: Adjust test.
	* testsuite/libgomp.oacc-c-c++-common/data-2.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/data-3.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/if-1.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/nested-1.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/present-2.c: Likewise.

diff --git a/gcc/c/c-parser.c b/gcc/c/c-parser.c
index eede3a7..20ff7da 100644
--- a/gcc/c/c-parser.c
+++ b/gcc/c/c-parser.c
@@

openacc reference reductions

2016-02-09 Thread Cesar Philippidis
This patch teaches omp-lower how to handle reference-typed reductions,
which are common in fortran subroutines. Unlike the implementation in
the gomp4 branch, this patch doesn't rewrite the reference reduction
variables as local variables. Instead, a local copy is created for each
reduction variable.

There are two things that stick out in this patch. First, I took care
not to remap any reduction variable appearing on a parallel directive
inside an offloaded region in order to keep it private. Second, you'll
notice that I'm creating quite a few temporary pointers inside
lower_oacc_reductions. Without those separate pointers, I'd get SSA
validation errors because those pointers get dereferenced multiple times.
I didn't investigate that problem further.
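In fortran every dummy argument is passed by reference, so a reduction on one is a reduction on a reference. A rough C++ analogue is sketched below. accumulate is a hypothetical name, and whether a particular GCC release accepts a reference-typed reduction variable in C++ is an assumption; the message motivates the change with fortran.

```cpp
#include <cassert>

// Hypothetical C++ analogue of a fortran dummy argument: 'sum' is a
// reference, so the reduction variable lives in the caller.
static void accumulate (const int *a, int n, int &sum)
{
  assert (a != nullptr || n == 0);
  // Each gang/worker/vector lane gets a private partial sum; the partial
  // sums are combined with the original value of 'sum' at loop exit.
#pragma acc parallel loop copy (sum) copyin (a[0:n]) reduction (+:sum)
  for (int i = 0; i < n; i++)
    sum += a[i];
}
```

Starting from sum = 10 and adding {1, 2, 3, 4} should leave sum at 20, because a reduction folds the partial results into the variable's original value rather than overwriting it.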

Is this patch ok for trunk?

Cesar
2016-02-09  Cesar Philippidis  

	gcc/
	* omp-low.c (is_oacc_parallel_reduction): New function.
	(scan_sharing_clauses): Use it to prevent installing local variables
	for those used in acc parallel reductions.
	(lower_rec_input_clauses): Remove dead code.
	(lower_oacc_reductions): Add support for reference reductions.
	(lower_reduction_clauses): Remove dead code.
	(lower_omp_target): Don't remap variables appearing in acc parallel
	reductions.

	gcc/testsuite/
	* c-c++-common/goacc/reduction-1.c: Add more test coverage.
	* c-c++-common/goacc/reduction-2.c: Likewise.
	* c-c++-common/goacc/reduction-3.c: Likewise.
	* c-c++-common/goacc/reduction-4.c: Likewise.

	libgomp/
	* testsuite/libgomp.oacc-c-c++-common/data-clauses.h: New test.
	* testsuite/libgomp.oacc-c-c++-common/loop-default-compile.c: New test.
	* testsuite/libgomp.oacc-c-c++-common/loop-default.h: New test.
	* testsuite/libgomp.oacc-c-c++-common/loop-g-1.c: Add more test
	coverage.
	* testsuite/libgomp.oacc-c-c++-common/loop-g-2.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/loop-red-g-1.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/loop-red-w-1.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/loop-red-w-2.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-gang-np-1.c: New
	test.
	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-gv-np-1.c: New
	test.
	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-gw-np-1.c: New
	test.
	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-gwv-np-1.c: New
	test.
	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-gwv-np-2.c: New
	test.
	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-gwv-np-3.c: New
	test.
	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-gwv-np-4.c: New
	test.
	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-vector-p-1.c: New
	test.
	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-vector-p-2.c: New
	test.
	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-worker-p-1.c: New
	test.
	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-wv-p-1.c: New test.
	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-wv-p-2.c: New test.
	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-wv-p-3.c: New test.
	* testsuite/libgomp.oacc-c-c++-common/loop-w-1.c: Add more test
	coverage.
	* testsuite/libgomp.oacc-c-c++-common/par-loop-comb-reduction-1.c: New
	test.
	* testsuite/libgomp.oacc-c-c++-common/par-loop-comb-reduction-2.c: New
	test.
	* testsuite/libgomp.oacc-c-c++-common/par-loop-comb-reduction-3.c: New
	test.
	* testsuite/libgomp.oacc-c-c++-common/par-loop-comb-reduction-4.c: New
	test.
	* testsuite/libgomp.oacc-c-c++-common/par-reduction-1.c: Add more test
	coverage.
	* testsuite/libgomp.oacc-c-c++-common/par-reduction-2.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/parallel-1.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/parallel-dims.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/parallel-reduction.c: New test.
	* testsuite/libgomp.oacc-c-c++-common/reduction-1.c: Add more test
	coverage.
	* testsuite/libgomp.oacc-c-c++-common/reduction-2.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/reduction-3.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/reduction-4.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/reduction-5.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/reduction-6.c: New test.
	* testsuite/libgomp.oacc-c-c++-common/reduction.h: New test.
	* testsuite/libgomp.oacc-fortran/parallel-loop-1.f90: New test.
	* testsuite/libgomp.oacc-fortran/parallel-reduction.f90: New test.
	* testsuite/libgomp.oacc-fortran/reduction-1.f90: Add more test
	coverage.
	* testsuite/libgomp.oacc-fortran/reduction-2.f90: Likewise.
	* testsuite/libgomp.oacc-fortran/reduction-3.f90: Likewise.
	* testsuite/libgomp.oacc-fortran/reduction-4.f90: Likewise.
	* testsuite/libgomp.oacc-fortran/reduction-5.f90: Likewise.
	* testsuite/libgomp.oacc-fortran/reduction-6.f90: Likewise.
	* testsuite/libgomp.oacc-fortran/reduction-7.f90: New test.


diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index d41688b..8a66760 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -308,6 +308,28 @@ is_oacc_kernels (omp_context *ctx)
 	  == G

Re: openacc reference reductions

2016-02-09 Thread Cesar Philippidis
On 02/09/2016 07:33 AM, Nathan Sidwell wrote:
> While I've not looked at the rest of the patch, this bit stood out:
> 
>> +static bool
>> +is_oacc_parallel_reduction (tree var, omp_context *ctx)
>> +{
>> +  if (!is_oacc_parallel (ctx))
>> +return false;
>> +
>> +  tree clauses = gimple_omp_target_clauses (ctx->stmt);
>> +
>> +  /* Don't install a local copy of the decl if it used
>> + inside a acc parallel reduction.  */
> 
> ^^ comment is misleading -- this routine's not installing anything
> 
>> +  if (is_oacc_parallel (ctx))
> 
> ^^ already checked above.
> 
>> +for (tree c = clauses; c; c = OMP_CLAUSE_CHAIN (c))
>> +  if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_REDUCTION
>> +  && OMP_CLAUSE_DECL (c) == var)
>> +return true;
>> +
>> +  return false;
>> +}
>> +

Thanks for catching that. Those are artifacts from when this code used
to be located exclusively in scan_sharing_clauses. I've updated the
patch with those changes.
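Applying both review comments (test the construct kind once, and describe what the predicate checks rather than what callers do with it) gives a function of the following shape. To keep it compilable on its own, clause and context are hypothetical stand-ins for GCC's clause chain (OMP_CLAUSE_CHAIN, OMP_CLAUSE_CODE, OMP_CLAUSE_DECL) and for omp_context; this is a sketch, not the updated patch.

```cpp
#include <cassert>

// Hypothetical stand-ins that carry only the fields the predicate uses.
enum clause_code { CLAUSE_COPY, CLAUSE_PRIVATE, CLAUSE_REDUCTION };

struct clause
{
  clause_code code;
  const void *decl;      // the variable this clause names
  const clause *chain;   // next clause, or nullptr
};

struct context
{
  bool is_oacc_parallel;
  const clause *clauses;
};

// Return true if VAR appears in a reduction clause of an OpenACC
// parallel construct.
static bool
is_oacc_parallel_reduction (const void *var, const context *ctx)
{
  if (!ctx->is_oacc_parallel)
    return false;

  for (const clause *c = ctx->clauses; c; c = c->chain)
    if (c->code == CLAUSE_REDUCTION && c->decl == var)
      return true;

  return false;
}
```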

Cesar

2016-02-09  Cesar Philippidis  

	gcc/
	* omp-low.c (is_oacc_parallel_reduction): New function.
	(scan_sharing_clauses): Use it to prevent installing local variables
	for those used in acc parallel reductions.
	(lower_rec_input_clauses): Remove dead code.
	(lower_oacc_reductions): Add support for reference reductions.
	(lower_reduction_clauses): Remove dead code.
	(lower_omp_target): Don't remap variables appearing in acc parallel
	reductions.

	gcc/testsuite/
	* c-c++-common/goacc/reduction-1.c: Add more test coverage.
	* c-c++-common/goacc/reduction-2.c: Likewise.
	* c-c++-common/goacc/reduction-3.c: Likewise.
	* c-c++-common/goacc/reduction-4.c: Likewise.

	libgomp/
	* testsuite/libgomp.oacc-c-c++-common/data-clauses.h: New test.
	* testsuite/libgomp.oacc-c-c++-common/loop-default-compile.c: New test.
	* testsuite/libgomp.oacc-c-c++-common/loop-default.h: New test.
	* testsuite/libgomp.oacc-c-c++-common/loop-g-1.c: Add more test
	coverage.
	* testsuite/libgomp.oacc-c-c++-common/loop-g-2.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/loop-red-g-1.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/loop-red-w-1.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/loop-red-w-2.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-gang-np-1.c: New
	test.
	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-gv-np-1.c: New
	test.
	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-gw-np-1.c: New
	test.
	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-gwv-np-1.c: New
	test.
	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-gwv-np-2.c: New
	test.
	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-gwv-np-3.c: New
	test.
	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-gwv-np-4.c: New
	test.
	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-vector-p-1.c: New
	test.
	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-vector-p-2.c: New
	test.
	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-worker-p-1.c: New
	test.
	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-wv-p-1.c: New test.
	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-wv-p-2.c: New test.
	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-wv-p-3.c: New test.
	* testsuite/libgomp.oacc-c-c++-common/loop-w-1.c: Add more test
	coverage.
	* testsuite/libgomp.oacc-c-c++-common/par-loop-comb-reduction-1.c: New
	test.
	* testsuite/libgomp.oacc-c-c++-common/par-loop-comb-reduction-2.c: New
	test.
	* testsuite/libgomp.oacc-c-c++-common/par-loop-comb-reduction-3.c: New
	test.
	* testsuite/libgomp.oacc-c-c++-common/par-loop-comb-reduction-4.c: New
	test.
	* testsuite/libgomp.oacc-c-c++-common/par-reduction-1.c: Add more test
	coverage.
	* testsuite/libgomp.oacc-c-c++-common/par-reduction-2.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/parallel-1.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/parallel-dims.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/parallel-reduction.c: New test.
	* testsuite/libgomp.oacc-c-c++-common/reduction-1.c: Add more test
	coverage.
	* testsuite/libgomp.oacc-c-c++-common/reduction-2.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/reduction-3.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/reduction-4.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/reduction-5.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/reduction-6.c: New test.
	* testsuite/libgomp.oacc-c-c++-common/reduction.h: New test.
	* testsuite/libgomp.oacc-fortran/parallel-loop-1.f90: New test.
	* testsuite/libgomp.oacc-fortran/parallel-reduction.f90: New test.
	* testsuite/libgomp.oacc-fortran/reduction-1.f90: Add more test
	coverage.
	* testsuite/libgomp.oacc-fortran/reduction-2.f90: Likewise.
	* testsuite/libgomp.oacc-fortran/reduction-3.f90: Likewise.
	* testsuite/libgomp.oacc-fortran/reduction-4.f90: Likewise.
	* testsuite/libgomp.oacc-fortran/reduction-5.f

Re: [openacc] reference-typed data mappings

2016-02-09 Thread Cesar Philippidis
On 02/09/2016 07:00 AM, Cesar Philippidis wrote:
> On 02/01/2016 09:57 AM, Cesar Philippidis wrote:
> 
>> > This patch fixes a couple of bugs preventing c++ reference-typed
>> > variables from working in openacc data clauses. These fixes include:
>> > 
>> >  * Teach the gimplifier to filter out pointer data mappings for
>> >    OACC_DATA, OACC_ENTER_DATA, OACC_EXIT_DATA and OACC_UPDATE regions.
>> >    Also use a firstprivate mapping for the array base pointers
>> >    in OACC_DATA, OACC_PARALLEL and OACC_KERNELS regions.
>> > 
>> >  * Make the data mapping errors emitted by the c and c++ front ends
>> >    more consistent with openacc by reporting data mapping errors, not
>> >    omp-specific map errors.
>> > 
>> >  * Add some light checking for duplicate reference mappings in c++. The
>> >    c++ FE still fails to detect duplicate component refs, but that's not
>> >    working in openacc at the moment, anyway.
>> > 
>> > Jakub, the latter issue also affects openmp. I've added a simple openmp
>> > test case, but it could probably be more extensive. Can you add more
>> > test coverage or tell me what should be included?
> While working on a different reduction problem, I noticed that both the
> c and c++ front ends are treating reductions as generic data clauses.
> That means parallel reductions of the form
> 
>   #pragma acc parallel copy(foo) reduction(+:foo)
> 
> would get treated as an error. This patch fixes that, in addition to the
> changes listed above.
> 
> Is this patch ok for trunk?

>   libgomp/
>   * testsuite/libgomp.c++/non-scalar-data.C: New test.

I copied the wrong test here. It should be testing omp target, not the
acc constructs. This patch updates that test case.

Cesar

2016-02-09  Cesar Philippidis  

	gcc/c/
	* c-parser.c (c_parser_oacc_all_clauses): Update call to
	c_finish_omp_clauses.
	(c_parser_omp_all_clauses): Likewise.
	(c_parser_oacc_cache): Likewise.
	(c_parser_oacc_loop): Likewise.
	(omp_split_clauses): Likewise.
	(c_parser_omp_declare_target): Likewise.
	(c_parser_cilk_for): Likewise.
	* c-tree.h (c_finish_omp_clauses): Update prototype.
	* c-typeck.c (c_finish_omp_clauses): Add is_oacc argument.  Report
	OMP_CLAUSE_MAP errors as data clause errors for OpenACC.  Allow OpenACC
	reduction variables to appear in data clauses.

	gcc/cp/
	* cp-tree.h (finish_omp_clauses): Update prototype.
	* parser.c (cp_parser_oacc_all_clauses): Update call to
	finish_omp_clauses.
	(cp_parser_omp_all_clauses): Likewise.
	(cp_parser_omp_for_loop): Likewise.
	(cp_omp_split_clauses): Likewise.
	(cp_parser_oacc_cache): Likewise.
	(cp_parser_oacc_loop): Likewise.
	(cp_parser_omp_declare_target): Likewise.
	(cp_parser_cilk_for): Likewise.
	* pt.c (tsubst_attribute): Update call to tsubst_omp_clauses.
	(tsubst_omp_clauses): New is_oacc argument.  Use it when calling
	finish_omp_clauses.
	(tsubst_omp_for_iterator): Update call to finish_omp_clauses.
	(tsubst_expr): Update calls to tsubst_omp_clauses.
	* semantics.c (finish_omp_clauses): Add is_oacc argument.  Report
	OMP_CLAUSE_MAP errors as data clause errors for OpenACC.  Check for
	duplicate reference mappings.  Exclude "omp declare simd"-isms when
	processing OpenACC clauses.  Allow OpenACC reduction variables to
	appear in data clauses.
	(finish_omp_for): Update call to finish_omp_clauses.

	gcc/
	* gimplify.c (gimplify_scan_omp_clauses): Consider OACC_{DATA, PARALLEL,
	KERNELS} when setting target_firstprivatize_array_bases.  Consider
	OACC_{DATA, ENTER_DATA, EXIT_DATA, UPDATE} when filtering out pointer
	mappings.  Also filter out GOMP_MAP_POINTER.

	gcc/testsuite/
	* c-c++-common/goacc/data-clause-duplicate-1.c: Adjust test.
	* c-c++-common/goacc/declare-2.c: Likewise.
	* c-c++-common/goacc/deviceptr-1.c: Likewise.
	* c-c++-common/goacc/kernels-alias-ipa-pta-3.c: Likewise.
	* c-c++-common/goacc/parallel-reduction.c: New test.
	* c-c++-common/goacc/pcopy.c: Adjust test.
	* c-c++-common/goacc/pcopyin.c: Likewise.
	* c-c++-common/goacc/pcopyout.c: Likewise.
	* c-c++-common/goacc/pcreate.c: Likewise.
	* c-c++-common/goacc/present-1.c: Likewise.
	* c-c++-common/goacc/private-reduction-1.c: New test.
	* c-c++-common/goacc/reduction-5.c: New test.
	* g++.dg/goacc/data-1.C: New test.
	* g++.dg/goacc/data-2.C: New test.
	* g++.dg/goacc/reduction-1.C: New test.
	* g++.dg/goacc/update.C: New test.
	* g++.dg/gomp/template-data.C: New test.

	libgomp/
	* testsuite/libgomp.c++/non-scalar-data.C: New test.
	* testsuite/libgomp.oacc-c++/data-references.C: New test.
	* testsuite/libgomp.oacc-c++/data-templates.C: New test.
	* testsuite/libgomp.oacc-c++/non-scalar-data-templates.C: New test.
	* testsuite/libgomp.oacc-c++/update-reference.C: New test.
	* testsuite/libgomp.oacc-c++/update-template.C: New test.
	* testsuite/lib

[openacc] vector state propagation

2016-02-22 Thread Cesar Philippidis
This patch teaches the nvptx vector state propagator how to handle
QImode and HImode variables. Basically, I'm widening the 8- and 16-bit
values to 32 bits so that the shuffle broadcast can be used to
propagate the register.
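The widen, shuffle, truncate sequence can be modelled on the host. In this sketch Warp32, shuffle_broadcast and broadcast_narrow are hypothetical: a 32-element array stands in for the lanes of a warp, and the broadcast copies one lane's 32-bit value to every lane, as shuffle-based propagation does. The actual change emits RTL (ZERO_EXTEND, a shuffle, TRUNCATE) in nvptx_gen_shuffle.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <type_traits>

// Hypothetical host model of one warp.  The shuffle moves only 32-bit
// lane values, which is why 8- and 16-bit values are widened first.
const int kWarpSize = 32;
typedef std::array<std::uint32_t, kWarpSize> Warp32;

// Every lane reads the value held by SRC_LANE (a broadcast shuffle).
static Warp32 shuffle_broadcast (const Warp32 &lanes, int src_lane)
{
  Warp32 out;
  out.fill (lanes[src_lane]);
  return out;
}

// Zero-extend to 32 bits, shuffle, then truncate back.
template <typename T>
static std::array<T, kWarpSize>
broadcast_narrow (const std::array<T, kWarpSize> &lanes, int src_lane)
{
  static_assert (std::is_integral<T>::value
                 && !std::is_same<T, bool>::value && sizeof (T) < 4,
                 "only 8- and 16-bit integer types need widening");
  typedef typename std::make_unsigned<T>::type U;

  Warp32 wide;
  for (int lane = 0; lane < kWarpSize; lane++)
    wide[lane] = static_cast<U> (lanes[lane]);      // zero-extend

  Warp32 shuffled = shuffle_broadcast (wide, src_lane);

  std::array<T, kWarpSize> out;
  for (int lane = 0; lane < kWarpSize; lane++)
    out[lane] = static_cast<T> (shuffled[lane]);    // truncate
  return out;
}
```

Broadcasting a signed char holding -2 (the value the vprop.c test checks) yields -2 in every lane: the zero-extended 0x000000FE truncates back to the original 8-bit pattern, so no sign information is lost on the round trip.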

I'm not sure if my solution is the best way to resolve this problem. It
looks like the nvptx backend frequently assigns larger .u16 and .u32
registers to chars and shorts, which masks this problem at -O0.
Because a lot of the registers are already u32, the conversion to
and from u8 and u16 seems like an unnecessary step, when the nvptx
backend should be able to broadcast the original u32 register directly.

Is there a better way to resolve this issue, or is this patch OK for
trunk as-is?

Cesar
2016-02-22  Cesar Philippidis  

	gcc/
	* config/nvptx/nvptx.c (nvptx_gen_shuffle): Add support for QImode
	and HImode registers.

	libgomp/
	* testsuite/libgomp.oacc-c-c++-common/vprop.c: New test.


diff --git a/gcc/config/nvptx/nvptx.c b/gcc/config/nvptx/nvptx.c
index 3faacd5..728cb00 100644
--- a/gcc/config/nvptx/nvptx.c
+++ b/gcc/config/nvptx/nvptx.c
@@ -1306,6 +1306,20 @@ nvptx_gen_shuffle (rtx dst, rtx src, rtx idx, nvptx_shuffle_kind kind)
 	end_sequence ();
   }
   break;
+case QImode:
+case HImode:
+  {
+	rtx tmp = gen_reg_rtx (SImode);
+
+	start_sequence ();
+	emit_insn (gen_rtx_SET (tmp, gen_rtx_fmt_e (ZERO_EXTEND, SImode, src)));
+	emit_insn (nvptx_gen_shuffle (tmp, tmp, idx, kind));
+	emit_insn (gen_rtx_SET (dst, gen_rtx_fmt_e (TRUNCATE, GET_MODE (dst),
+		tmp)));
+	res = get_insns ();
+	end_sequence ();
+  }
+  break;
   
 default:
   gcc_unreachable ();
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/vprop.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/vprop.c
new file mode 100644
index 000..a9b63dc
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/vprop.c
@@ -0,0 +1,34 @@
+#include 
+
+#define test(type)\
+void		\
+test_##type ()	\
+{		\
+  type b[100];	\
+  type i, j, x = -1, y = -1;			\
+		\
+  _Pragma("acc parallel loop copyout (b)")	\
+  for (j = 0; j > -5; j--)			\
+{		\
+  type c = x+y; \
+  _Pragma("acc loop vector")		\
+  for (i = 0; i < 20; i++)			\
+	b[-j*20 + i] = c;			\
+  b[5-j] = c;   \
+}		\
+		\
+  for (i = 0; i < 100; i++)			\
+assert (b[i] == -2);			\
+}
+
+test(char)
+test(short)
+
+int
+main ()
+{
+  test_char ();
+  test_short ();
+
+  return 0;
+}


Re: openacc reference reductions

2016-02-22 Thread Cesar Philippidis
Ping. This patch still needs a review.

Cesar

On 02/09/2016 08:17 AM, Cesar Philippidis wrote:
> On 02/09/2016 07:33 AM, Nathan Sidwell wrote:
>> While I've not looked at the rest of the patch, this bit stood out:
>>
>>> +static bool
>>> +is_oacc_parallel_reduction (tree var, omp_context *ctx)
>>> +{
>>> +  if (!is_oacc_parallel (ctx))
>>> +return false;
>>> +
>>> +  tree clauses = gimple_omp_target_clauses (ctx->stmt);
>>> +
>>> +  /* Don't install a local copy of the decl if it used
>>> + inside a acc parallel reduction.  */
>>
>> ^^ comment is misleading -- this routine's not installing anything
>>
>>> +  if (is_oacc_parallel (ctx))
>>
>> ^^ already checked above.
>>
>>> +for (tree c = clauses; c; c = OMP_CLAUSE_CHAIN (c))
>>> +  if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_REDUCTION
>>> +  && OMP_CLAUSE_DECL (c) == var)
>>> +return true;
>>> +
>>> +  return false;
>>> +}
>>> +
> 
> Thanks for catching that. Those are artifacts from when this code used
> to be located exclusively in scan_sharing_clauses. I've updated the
> patch with those changes.
> 
> Cesar
> 

2016-02-09  Cesar Philippidis  

	gcc/
	* omp-low.c (is_oacc_parallel_reduction): New function.
	(scan_sharing_clauses): Use it to prevent installing local variables
	for those used in acc parallel reductions.
	(lower_rec_input_clauses): Remove dead code.
	(lower_oacc_reductions): Add support for reference reductions.
	(lower_reduction_clauses): Remove dead code.
	(lower_omp_target): Don't remap variables appearing in acc parallel
	reductions.

	gcc/testsuite/
	* c-c++-common/goacc/reduction-1.c: Add more test coverage.
	* c-c++-common/goacc/reduction-2.c: Likewise.
	* c-c++-common/goacc/reduction-3.c: Likewise.
	* c-c++-common/goacc/reduction-4.c: Likewise.

	libgomp/
	* testsuite/libgomp.oacc-c-c++-common/data-clauses.h: New test.
	* testsuite/libgomp.oacc-c-c++-common/loop-default-compile.c: New test.
	* testsuite/libgomp.oacc-c-c++-common/loop-default.h: New test.
	* testsuite/libgomp.oacc-c-c++-common/loop-g-1.c: Add more test
	coverage.
	* testsuite/libgomp.oacc-c-c++-common/loop-g-2.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/loop-red-g-1.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/loop-red-w-1.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/loop-red-w-2.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-gang-np-1.c: New
	test.
	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-gv-np-1.c: New
	test.
	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-gw-np-1.c: New
	test.
	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-gwv-np-1.c: New
	test.
	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-gwv-np-2.c: New
	test.
	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-gwv-np-3.c: New
	test.
	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-gwv-np-4.c: New
	test.
	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-vector-p-1.c: New
	test.
	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-vector-p-2.c: New
	test.
	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-worker-p-1.c: New
	test.
	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-wv-p-1.c: New test.
	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-wv-p-2.c: New test.
	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-wv-p-3.c: New test.
	* testsuite/libgomp.oacc-c-c++-common/loop-w-1.c: Add more test
	coverage.
	* testsuite/libgomp.oacc-c-c++-common/par-loop-comb-reduction-1.c: New
	test.
	* testsuite/libgomp.oacc-c-c++-common/par-loop-comb-reduction-2.c: New
	test.
	* testsuite/libgomp.oacc-c-c++-common/par-loop-comb-reduction-3.c: New
	test.
	* testsuite/libgomp.oacc-c-c++-common/par-loop-comb-reduction-4.c: New
	test.
	* testsuite/libgomp.oacc-c-c++-common/par-reduction-1.c: Add more test
	coverage.
	* testsuite/libgomp.oacc-c-c++-common/par-reduction-2.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/parallel-1.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/parallel-dims.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/parallel-reduction.c: New test.
	* testsuite/libgomp.oacc-c-c++-common/reduction-1.c: Add more test
	coverage.
	* testsuite/libgomp.oacc-c-c++-common/reduction-2.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/reduction-3.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/reduction-4.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/reduction-5.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/reduction-6.c: New test.
	* testsuite/libgomp.oacc-c-c++-common/reduction.h: New test.
	* testsuite/libgomp.oacc-fortran/parallel-loop-1.f90: New test.
	* testsuite/libgomp.oacc-fortran/parallel-reduction.f90: New test.
	* testsuite/libgomp.oacc-fortran/reduction-1.f90: Add more test
	

Re: [openacc] reference-typed data mappings

2016-02-22 Thread Cesar Philippidis
Ping.

Cesar

On 02/09/2016 09:05 AM, Cesar Philippidis wrote:
> On 02/09/2016 07:00 AM, Cesar Philippidis wrote:
>> On 02/01/2016 09:57 AM, Cesar Philippidis wrote:
>>
>>>> This patch fixes a couple of bugs preventing c++ reference-typed
>>>> variables from working in openacc data clauses. These fixes include:
>>>>
>>>>  * Teach the gimplifier to filter out pointer data mappings for
>>>>OACC_DATA, OACC_ENTER_DATA, OACC_EXIT_DATA and OACC_UPDATE regions,
>>>>and use a firstprivate mapping for the array base pointers
>>>>in OACC_DATA, OACC_PARALLEL and OACC_KERNELS regions.
>>>>
>>>>  * Make the data mapping errors emitted by the c and c++ front ends
>>>>more consistent with openacc by reporting data mapping errors, not
>>>>omp-specific map errors.
>>>>
>>>>  * Add some light checking for duplicate reference mappings in c++. The
>>>>c++ FE still fails to detect duplicate component refs, but that's not
>>>>working in openacc at the moment, anyway.
>>>>
>>>> Jakub, the latter issue also affects openmp. I've added a simple openmp
>>>> test case, but it could probably be more extensive. Can you add more
>>>> test coverage or tell me what should be included?
>> While working on a different reduction problem, I noticed that both the
>> c and c++ front ends are treating reductions as generic data clauses.
>> That means parallel reductions of the form
>>
>>   #pragma acc parallel copy(foo) reduction(+:foo)
>>
>> would be treated as an error. This patch fixes that, in addition to the
>> changes listed above.
>>
>> Is this patch ok for trunk?
> 
>>  libgomp/
>>  * testsuite/libgomp.c++/non-scalar-data.C: New test.
> 
> I copied the wrong test here. It should be testing omp target, not acc
> *. This patch updates that test case.
> 
> Cesar
> 

2016-02-09  Cesar Philippidis  

	gcc/c/
	* c-parser.c (c_parser_oacc_all_clauses): Update call to
	c_finish_omp_clauses.
	(c_parser_omp_all_clauses): Likewise.
	(c_parser_oacc_cache): Likewise.
	(c_parser_oacc_loop): Likewise.
	(omp_split_clauses): Likewise.
	(c_parser_omp_declare_target): Likewise.
	(c_parser_cilk_for): Likewise.
	* c-tree.h (c_finish_omp_clauses): Update prototype.
	* c-typeck.c (c_finish_omp_clauses): Add is_oacc argument.  Report
	OMP_CLAUSE_MAP errors as data clause errors for OpenACC.  Allow OpenACC
	reduction variables to appear in data clauses.

	gcc/cp/
	* cp-tree.h (finish_omp_clauses): Update prototype.
	* parser.c (cp_parser_oacc_all_clauses): Update call to
	finish_omp_clauses.
	(cp_parser_omp_all_clauses): Likewise.
	(cp_parser_omp_for_loop): Likewise.
	(cp_omp_split_clauses): Likewise.
	(cp_parser_oacc_cache): Likewise.
	(cp_parser_oacc_loop): Likewise.
	(cp_parser_omp_declare_target): Likewise.
	(cp_parser_cilk_for): Likewise.
	* pt.c (tsubst_attribute): Update call to tsubst_omp_clauses.
	(tsubst_omp_clauses): New is_oacc argument.  Use it when calling
	finish_omp_clauses.
	(tsubst_omp_for_iterator): Update call to finish_omp_clauses.
	(tsubst_expr): Update calls to tsubst_omp_clauses.
	* semantics.c (finish_omp_clauses): Add is_oacc argument.  Report
	OMP_CLAUSE_MAP errors as data clause errors for OpenACC.  Check for
	duplicate reference mappings.  Exclude "omp declare simd"-isms when
	processing OpenACC clauses.  Allow OpenACC reduction variables to
	appear in data clauses.
	(finish_omp_for): Update call to finish_omp_clauses.

	gcc/
	* gimplify.c (gimplify_scan_omp_clauses): Consider OACC_{DATA, PARALLEL,
	KERNELS} when setting target_firstprivatize_array_bases.  Consider
	OACC_{DATA, ENTER_DATA, EXIT_DATA, UPDATE} when filtering out pointer
	mappings.  Also filter out GOMP_MAP_POINTER.

	gcc/testsuite/
	* c-c++-common/goacc/data-clause-duplicate-1.c: Adjust test.
	* c-c++-common/goacc/declare-2.c: Likewise.
	* c-c++-common/goacc/deviceptr-1.c: Likewise.
	* c-c++-common/goacc/kernels-alias-ipa-pta-3.c: Likewise.
	* c-c++-common/goacc/parallel-reduction.c: New test.
	* c-c++-common/goacc/pcopy.c: Adjust test.
	* c-c++-common/goacc/pcopyin.c: Likewise.
	* c-c++-common/goacc/pcopyout.c: Likewise.
	* c-c++-common/goacc/pcreate.c: Likewise.
	* c-c++-common/goacc/present-1.c: Likewise.
	* c-c++-common/goacc/private-reduction-1.c: New test.
	* c-c++-common/goacc/reduction-5.c: New test.
	* g++.dg/goacc/data-1.C: New test.
	* g++.dg/goacc/data-2.C: New test.
	* g++.dg/goacc/reduction-1.C: New test.
	* g++.dg/goacc/update.C: New test.
	* g++.dg/gomp/template-data.C: New test.

	libgomp/
	* testsuite/libgomp.c++/non-scalar-data.C: New test.
	* testsuite/libgomp.oacc-c++/data-references.C: New test.
	* testsuite/libgomp.oacc-c++/data-templates.C: New test.

Re: [patch] combine ICE fix

2013-11-27 Thread Cesar Philippidis
On 10/16/13, 11:03 AM, Jeff Law wrote:
> On 10/16/13 09:34, Cesar Philippidis wrote:
>> On 10/15/13 12:16 PM, Jeff Law wrote:
>>> On 10/10/13 10:25, Jakub Jelinek wrote:
>>>> On Thu, Oct 10, 2013 at 07:26:43AM -0700, Cesar Philippidis wrote:
>>>>> This patch addresses an ICE when building qemu for a Mips target in
>>>>> Yocto. Both gcc-trunk and gcc-4.8 are potentially affected, on all
>>>>> targets. The problem occurs because the instruction combine phase uses
>>>>> two data structures to keep track of registers, reg_stat and
>>>>> regstat_n_sets_and_refs, but they potentially end up out of sync; when
>>>>> combine inserts a new register into reg_stat it does not update
>>>>> regstat_n_sets_and_refs. Failing to update the latter results in an
>>>>> occasional segmentation fault.
>>>>>
>>>>> Is this OK for trunk and gcc-4.8? If so, please check it in. I tested it
>>>>> on Mips and x86-64 and no regressions showed up.
>>>>
>>>> That looks broken.  You leave everything from the last size till the
>>>> current one uninitialized, so it would only work if combine_split_insns
>>>> always increments max_reg_num () by at most one.
>>> I don't think that assumption is safe.  Consider a parallel with a bunch
>>> of (clobber (match_scratch)) expressions.
>>
>> I address that in the patch posted here
>> <http://gcc.gnu.org/ml/gcc-patches/2013-10/msg00802.html>. Is that still
>> insufficient?
> Thanks.  I wasn't aware of the follow-up.
> 
>>
>>>> Furthermore, there are macros which should be used to access
>>>> the fields, and, if the vector is ever going to be resized, supposedly
>>>> it should be vec.h vector rather than just array.
>>>> Or perhaps take into account:
>>>> /* If a pass need to change these values in some magical way or the
>>>>  pass needs to have accurate values for these and is not using
>>>>  incremental df scanning, then it should use REG_N_SETS and
>>>>  REG_N_USES.  If the pass is doing incremental scanning then it
>>>>  should be getting the info from DF_REG_DEF_COUNT and
>>>>  DF_REG_USE_COUNT.  */
>>>> and not use REG_N_SETS etc. but instead the df stuff.
>>> Which begs the question, how exactly is combine utilizing the
>>> regstat_n_* structures and is that use even valid for combine?
>>
>> I'll take a look at that.
> This needs to be resolved before we can go forward with your patch.

Sorry for the delayed response. I had some time to work on this recently.

I looked into adding support for incremental DF scanning from df*.[ch]
in combine, but there are a couple of problems. First of all, combine
does its own DF analysis. It does so because its usage falls under this
category (df-core.c):

   c) If the pass modifies insns several times, this incremental
  updating may be expensive.

Furthermore, combine's DF relies on the DF scanning being deferred, so
the DF_REG_DEF_COUNT values would be off. E.g., calls to SET_INSN_DELETED
take place before it updates the notes for those insns. Also, combine
has a tendency to undo its changes occasionally.

With that in mind, is the patch here
<http://gcc.gnu.org/ml/gcc-patches/2013-10/msg00802.html> OK? Otherwise,
since combine only uses REG_N_SETS, I was considering adding a
reg_n_sets member to struct reg_stat_struct. Is that approach better?

Thanks,
Cesar



Re: [gomp4, fortran] Patch to fix continuation checks of OpenACC and OpenMP directives

2015-07-14 Thread Cesar Philippidis
On 07/14/2015 02:20 PM, Ilmir Usmanov wrote:
> Ping

Sorry, I thought I had already approved this. It's fine for gomp-4_0-branch.

Cesar

> On 07.07.2015 14:27, Ilmir Usmanov wrote:
>> Ping
>>
>> 30.06.2015, 03:43, "Ilmir Usmanov" :
>>> Hi Cesar!
>>>
>>> Thanks for your review!
>>>
>>> 08.06.2015, 17:59, "Cesar Philippidis" :
>>>>   On 06/07/2015 02:05 PM, Ilmir Usmanov wrote:
>>>>>Fixed the fortran mailing-list address. Sorry for the inconvenience.
>>>>>
>>>>>08.06.2015, 00:01, "Ilmir Usmanov" :
>>>>>>>Hi Cesar!
>>>>>>>
>>>>>>>This patch fixes checks of OpenMP and OpenACC continuations in
>>>>>>>case someone mixes them (i.e. continues an OpenMP directive with an
>>>>>>>!$ACC sentinel or vice versa).
>>>>>>>
>>>>>>>OK for gomp branch?
>>>>   Thanks for working on this. Does this fix PR63858 by any chance?
>>> No problem. I had a feeling that something was wrong in the scanner ever
>>> since I committed initial support for OpenACC ver. 1.0 to the gomp branch
>>> (more than a year ago).
>>> Now it does fix the PR, because I've added support for fixed form to the
>>> patch. BTW, your test in the PR has a wrong continuation. I've added a
>>> fixed test to the patch.
>>>
>>>>   two minor nits...
>>>>
>>>>>0001-Fix-mix-of-OpenACC-and-OpenMP-sentinels-in-continuat.patch
>>>>>
>>>>>From 5492bf5bc991b6924f5e3b35c11eeaed745df073 Mon Sep 17
>>>>> 00:00:00 2001
>>>>>From: Ilmir Usmanov 
>>>>>Date: Sun, 7 Jun 2015 23:55:22 +0300
>>>>>Subject: [PATCH] Fix mix of OpenACC and OpenMP sentinels in
>>>>> continuation
>>>>>
>>>>>---
>>>>> gcc/fortran/ChangeLog | 5 +
>>>>   Use ChangeLog.gomp for gomp-4_0-branch.
>>> Done.
>>>
>>>>>+ /* In case we have an OpenMP directive continued by OpenACC
>>>>>+ sentinel, or vice versa, we get both openmp_flag and
>>>>>+ openacc_flag on. */
>>>>>+
>>>>>+ if (openacc_flag && openmp_flag)
>>>>>+ {
>>>>>+ int is_openmp = 0;
>>>>>+ for (i = 0; i < 5; i++, c = next_char ())
>>>>>+ {
>>>>>+ if (gfc_wide_tolower (c) != (unsigned char) "!$acc"[i])
>>>>>+ is_openmp = 1;
>>>>>+ if (i == 4)
>>>>>+ old_loc = gfc_current_locus;
>>>>>+ }
>>>>>+ gfc_error ("Wrong %s continuation at %C: expected %s, got %s",
>>>>>+ is_openmp ? "OpenACC" : "OpenMP",
>>>>>+ is_openmp ? "!$ACC" : "!$OMP",
>>>>>+ is_openmp ? "!$OMP" : "!$ACC");
>>>>   I think it's better for the translation project if you made this a
>>>>   complete string. So maybe change this line into
>>>>
>>>>   gfc_error (is_openmp
>>>>              ? "Wrong continuation at %C: expected !$ACC, got !$OMP"
>>>>              : "Wrong continuation at %C: expected !$OMP, got !$ACC");
>>> Done
>>>
>>>>   Other than that, it looks fine.
>>>>
>>>>   Thanks,
>>>>   Cesar
>>> OK for gomp branch?
>>>
>>> -- 
>>> Ilmir.
>> -- 
>> Ilmir.
>>



[gomp4] OpenACC vector and worker reductions

2015-07-17 Thread Cesar Philippidis
This patch adds support for OpenACC vector and worker reductions in a
target-independent fashion. It adds quite a bit of machinery to
accomplish that goal. For starters, three internal functions,
GOACC_REDUCTION_INIT, GOACC_REDUCTION and GOACC_REDUCTION_WRITEBACK,
have been introduced. It's probably easiest to explain all of the
changes with an example. Given an acc loop reduction as follows:

  red = ...

  #pragma acc loop reduction (+:red) vector
  for (...)
    red++;

The OpenMP way to lower this reduction would be to introduce a new
private variable for 'red', which I'll call red.private. That private
reduction variable gets initialized with some value depending on the
reduction operation. All of the references to the original reduction
variable inside the loop get replaced with the private copy. Immediately
after the loop exits, the original reduction variable is atomically
updated with the private copy.

The code ends up looking something as follows:

  red = ...
  red.private = 0;   // initialize red.private
  #pragma omp for (...)
    red.private++;
  #pragma omp continue
    red += red.private // this is an atomic operation
  #pragma omp end

Conceptually, this loop may be decomposed into three sections. The first
section is the reduction initializer, the second is the loop, and the
third is the reduction finalizer.

This gets a little more complicated in OpenACC. For starters, there are
three levels of parallelism that may be associated with a single acc
loop. When transferring from one level of parallelism to another, some
targets (e.g. nvptx) may require variable state propagation and
predication due to the constraints of static thread scheduling. Nathan
solved that problem, at least at a high level, by surrounding acc
loops with GOACC_FORK and GOACC_JOIN function markers.

Furthermore, certain targets have hardware limitations preventing
general atomic operations from being utilized. Specifically, on nvptx
targets, spinlocks may not be used by threads inside the same warp. In
gcc 6.0, warps correspond to vectors, which currently contain 32
threads. That said, spinlocks are usable on nvptx targets if only one
thread within a warp uses them. This patch solves this problem by
breaking up the reduction finalizer into two steps: a parallel
reduction (a call to GOACC_REDUCTION) and a write-back to the original
variable. In OpenACC, the original loop gets lowered into the following
form:

  red = ...
  red.private = GOACC_REDUCTION_INIT (0)
  GOACC_FORK ()
  #pragma omp for (...)
    red.private++;
  #pragma omp continue
    red.private = GOACC_REDUCTION (gwv_mask, op, red.private)
    GOACC_WRITEBACK ()
    red += red.private // this is an atomic operation
  #pragma omp end
  GOACC_JOIN ()

First of all, the call to GOACC_REDUCTION_INIT is necessary to ensure
that red.private has a value to propagate to all of the threads
associated with that loop. Without it, in situations where there are
more threads than loop iterations, the threads that didn't enter the
body of the loop would not contain a proper initial value, so the
reduction finalizer would generate bogus results.

Both GOACC_REDUCTION and GOACC_WRITEBACK get evaluated inside the target
compiler by a new fold_oacc_reductions pass. That pass uses
targetm.goacc.fold_reduction to fold GOACC_REDUCTION in a
target-specific way. That pass also removes the GOACC_WRITEBACK marker
and moves the nearest GOACC_JOIN call to its place if necessary
(worker-only loops are special). This is guaranteed to work because
OpenACC loops are single-entry, single-exit and there is only one
GOACC_WRITEBACK marker per acc loop (there is one GOACC_REDUCTION per
reduction, though). Moving the GOACC_JOIN up allows the reduction
write-back to operate in the corresponding 'single' mode. E.g., since this
example executes the body in vector-partitioned mode, the original
reduction variable must be updated in vector-single mode.

There's one more quirk that I encountered while working on this patch.
All dummy arguments to Fortran subroutines are passed by reference. That
causes problems for loop state propagation, because only the pointer
gets propagated, and not the value being pointed to. To get around this,
I taught the gimplifier to introduce a new local copy of the reduction
variable. Now the reduction clause has five operands associated with it,
with the fifth one being the new private reduction variable.

In addition to the above machinery, this patch also implements the
fold_reduction hook on nvptx targets to use a tree-reduction for vector
loops. All other reductions on nvptx targets use atomics.

I hope I've ironed out all of the bugs in this patch, but I am rerunning
the entire regression testsuite. Any comments are welcome. Is this
reduction scheme too nvptx-specific?

I'll post the test cases in a follow up patch because the patch would be
too big for the mailing list otherwise.

Thanks,
Cesar
2015-07-17  

[gomp4] OpenACC reduction tests

2015-07-17 Thread Cesar Philippidis
This patch updates the libgomp OpenACC reduction test cases to check
worker, vector and combined gang worker vector reductions. I tried to
use some macros to simplify the c test cases a bit. I probably could
have made them more generic with an additional header file/macro, but
that would make them too confusing to debug. The fortran tests are a bit
of a lost cause, unless someone knows how to use the preprocessor with
!$acc loops.

Cesar
2015-07-17  Cesar Philippidis  

	libgomp/
	* testsuite/libgomp.oacc-c-c++-common/reduction.h: New file.
	* testsuite/libgomp.oacc-c-c++-common/reduction-1.c: Update tests
	with worker, vector and combined reductions.
	* testsuite/libgomp.oacc-c-c++-common/reduction-2.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/reduction-3.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/reduction-4.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/reduction-5.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/reduction-6.c: Likewise.
	* testsuite/libgomp.oacc-fortran/reduction-1.f90: Likewise.
	* testsuite/libgomp.oacc-fortran/reduction-2.f90: Likewise.
	* testsuite/libgomp.oacc-fortran/reduction-3.f90: Likewise.
	* testsuite/libgomp.oacc-fortran/reduction-4.f90: Likewise.
	* testsuite/libgomp.oacc-fortran/reduction-5.f90: Likewise.
	* testsuite/libgomp.oacc-fortran/reduction-6.f90: Likewise.


diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-1.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-1.c
index bb81759..8738927 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-1.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-1.c
@@ -3,44 +3,54 @@
 /* Integer reductions.  */
 
 #include 
-#include 
-
-#define ng 32
-
-#define DO_PRAGMA(x) _Pragma (#x)
-
-#define check_reduction_op(type, op, init, b)	\
-  {		\
-type res, vres;\
-res = (init);\
-DO_PRAGMA (acc parallel num_gangs (ng) copy (res)) \
-DO_PRAGMA (acc loop gang reduction (op:res))	\
-for (i = 0; i < n; i++)			\
-  res = res op (b);\
-		\
-vres = (init);\
-for (i = 0; i < n; i++)			\
-  vres = vres op (b);			\
-		\
-if (res != vres)\
-  abort ();	\
-  }
+#include "reduction.h"
+
+const int ng = 8;
+const int nw = 4;
+const int vl = 32;
 
 static void
-test_reductions_int (void)
+test_reductions (void)
 {
-  const int n = 1000;
+  const int n = 100;
   int i;
   int array[n];
 
   for (i = 0; i < n; i++)
-array[i] = i;
-
-  check_reduction_op (int, +, 0, array[i]);
-  check_reduction_op (int, *, 1, array[i]);
-  check_reduction_op (int, &, -1, array[i]);
-  check_reduction_op (int, |, 0, array[i]);
-  check_reduction_op (int, ^, 0, array[i]);
+array[i] = i+1;
+
+  /* Gang reductions.  */
+  check_reduction_op (int, +, 0, array[i], num_gangs (ng), gang);
+  check_reduction_op (int, *, 1, array[i], num_gangs (ng), gang);
+  check_reduction_op (int, &, -1, array[i], num_gangs (ng), gang);
+  check_reduction_op (int, |, 0, array[i], num_gangs (ng), gang);
+  check_reduction_op (int, ^, 0, array[i], num_gangs (ng), gang);
+
+  /* Worker reductions.  */
+  check_reduction_op (int, +, 0, array[i], num_workers (nw), worker);
+  check_reduction_op (int, *, 1, array[i], num_workers (nw), worker);
+  check_reduction_op (int, &, -1, array[i], num_workers (nw), worker);
+  check_reduction_op (int, |, 0, array[i], num_workers (nw), worker);
+  check_reduction_op (int, ^, 0, array[i], num_workers (nw), worker);
+
+  /* Vector reductions.  */
+  check_reduction_op (int, +, 0, array[i], vector_length (vl), vector);
+  check_reduction_op (int, *, 1, array[i], vector_length (vl), vector);
+  check_reduction_op (int, &, -1, array[i], vector_length (vl), vector);
+  check_reduction_op (int, |, 0, array[i], vector_length (vl), vector);
+  check_reduction_op (int, ^, 0, array[i], vector_length (vl), vector);
+
+  /* Combined reductions.  */
+  check_reduction_op (int, +, 0, array[i], num_gangs (ng) num_workers (nw)
+		  vector_length (vl), gang worker vector);
+  check_reduction_op (int, *, 1, array[i], num_gangs (ng) num_workers (nw)
+		  vector_length (vl), gang worker vector);
+  check_reduction_op (int, &, -1, array[i], num_gangs (ng) num_workers (nw)
+		  vector_length (vl), gang worker vector);
+  check_reduction_op (int, |, 0, array[i], num_gangs (ng) num_workers (nw)
+		  vector_length (vl), gang worker vector);
+  check_reduction_op (int, ^, 0, array[i], num_gangs (ng) num_workers (nw)
+		  vector_length (vl), gang worker vector);
 }
 
 static void
@@ -55,32 +65,31 @@ test_reductions_bool (void)
 array[i] = i;
 
   cmp_val = 5;
-#if 0
-  // TODO
-  check_reduction_op (bool, &&, true, (cmp_val > array[i]));
-  check_reduction_op (bool, ||, false, (cmp_val > array[i]));
-#endif
-}
 
-#define check_reduction_macro(type, op, init, b)	\
-  {			\
-type res, vres;	\
-res = (init);	\
-DO_PRAGMA (acc parallel 

[gomp4] cleanup firstprivate test case

2015-07-17 Thread Cesar Philippidis
Tom noticed that one of my firstprivate test cases in libgomp had an omp
pragma. That pragma shouldn't be there. I probably forgot to remove that
pragma when I integrated that test into the libgomp test suite. This
patch corrects that test.

I applied this patch to gomp-4_0-branch.

Cesar
2015-07-17  Cesar Philippidis  

	libgomp/
	* testsuite/libgomp.oacc-c-c++-common/firstprivate-2.c: Remove
	omp pragma.


diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/firstprivate-2.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/firstprivate-2.c
index e5fc6a0..69abb23 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/firstprivate-2.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/firstprivate-2.c
@@ -16,7 +16,6 @@ main()
   for (i = 0; i < n; i++)
 b[i] = -1;
 
-  #pragma omp parallel for firstprivate (a)
   #pragma acc parallel num_gangs (n) firstprivate (a)
   #pragma acc loop gang
   for (i = 0; i < n; i++)


[gomp4] Add new oacc_transform patch

2015-07-21 Thread Cesar Philippidis
Jakub,

Nathan pointed out that I should make the fold_oacc_reductions pass that
I introduced in my reduction patch more generic so that other openacc
transformations may use it. This patch introduces an empty skeleton pass
called oacc_transform. Currently I'm stashing it inside omp-low.c. Is
that a good place for it, or should I move it to its own separate file?

The motivation behind this pass is to allow us to generate
target-specific code in a generic manner. E.g., for reductions, I'm
emitting calls to internal functions during lowering, then later on in
this pass I'm expanding those calls using target machine hooks. This
pass will run in the target compiler, after LTO.

Thanks,
Cesar
2015-07-21  Cesar Philippidis  

	gcc/
	* omp-low.c (execute_oacc_transform): New function.
	(class pass_oacc_transform): New class.
	(make_pass_oacc_transform): New function.
	* passes.def: Add pass_oacc_transform to all_passes.
	* tree-pass.h (make_pass_oacc_transform): Declare.
	

diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index 388013c..23989f9 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -14394,4 +14394,76 @@ make_pass_late_lower_omp (gcc::context *ctxt)
   return new pass_late_lower_omp (ctxt);
 }
 
+/* Main entry point for oacc transformations which run on the device
+   compiler.  */
+
+static unsigned int
+execute_oacc_transform ()
+{
+  basic_block bb;
+  gimple_stmt_iterator gsi;
+  gimple stmt;
+
+  if (!lookup_attribute ("oacc function",
+			 DECL_ATTRIBUTES (current_function_decl)))
+return 0;
+
+
+  FOR_ALL_BB_FN (bb, cfun)
+{
+  gsi = gsi_start_bb (bb);
+
+  while (!gsi_end_p (gsi))
+	{
+	  stmt = gsi_stmt (gsi);
+	  gsi_next (&gsi);
+	}
+}
+
+  return 0;
+}
+
+namespace {
+
+const pass_data pass_data_oacc_transform =
+{
+  GIMPLE_PASS, /* type */
+  "fold_oacc_transform", /* name */
+  OPTGROUP_NONE, /* optinfo_flags */
+  TV_NONE, /* tv_id */
+  PROP_cfg, /* properties_required */
+  0 /* Possibly PROP_gimple_eomp.  */, /* properties_provided */
+  0, /* properties_destroyed */
+  0, /* todo_flags_start */
+  TODO_update_ssa, /* todo_flags_finish */
+};
+
+class pass_oacc_transform : public gimple_opt_pass
+{
+public:
+  pass_oacc_transform (gcc::context *ctxt)
+: gimple_opt_pass (pass_data_oacc_transform, ctxt)
+  {}
+
+  /* opt_pass methods: */
+  virtual unsigned int execute (function *)
+{
+  bool gate = (flag_openacc != 0 && !seen_error ());
+
+  if (!gate)
+	return 0;
+
+  return execute_oacc_transform ();
+}
+
+}; // class pass_oacc_transform
+
+} // anon namespace
+
+gimple_opt_pass *
+make_pass_oacc_transform (gcc::context *ctxt)
+{
+  return new pass_oacc_transform (ctxt);
+}
+
 #include "gt-omp-low.h"
diff --git a/gcc/passes.def b/gcc/passes.def
index 43e67df..6a2b095 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -165,6 +165,7 @@ along with GCC; see the file COPYING3.  If not see
   INSERT_PASSES_AFTER (all_passes)
   NEXT_PASS (pass_fixup_cfg);
   NEXT_PASS (pass_lower_eh_dispatch);
+  NEXT_PASS (pass_oacc_transform);
   NEXT_PASS (pass_all_optimizations);
   PUSH_INSERT_PASSES_WITHIN (pass_all_optimizations)
   NEXT_PASS (pass_remove_cgraph_callee_edges);
diff --git a/gcc/tree-pass.h b/gcc/tree-pass.h
index 13f20ea..67dc017 100644
--- a/gcc/tree-pass.h
+++ b/gcc/tree-pass.h
@@ -410,6 +410,7 @@ extern gimple_opt_pass *make_pass_late_lower_omp (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_diagnose_omp_blocks (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_expand_omp (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_expand_omp_ssa (gcc::context *ctxt);
+extern gimple_opt_pass *make_pass_oacc_transform (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_object_sizes (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_strlen (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_fold_builtins (gcc::context *ctxt);


[patch] PR66714 -- Re: Re: [RFC] two-phase marking in gt_cleare_cache

2015-07-23 Thread Cesar Philippidis
On 07/13/2015 06:43 AM, Michael Matz wrote:

> This also hints at other problems (which might not actually occur in the 
> case at hand, but still): the contents of DECL_VALUE_EXPR is the "real" 
> thing containing the value of a decl (i.e. a decl having a value-expr 
> doesn't itself occur in the code anymore), be it a decl itself, or some 
> expression (which might also refer to decls).  Now, in PR 66714 you 
> analyzed that one of those D* was removed from the function, which should 
> have happened only because no code referred to it anymore, i.e. D* was also 
> rewritten to some other D'* (if it weren't rewritten and D* was referred 
> to in code, you would have created a miscompilation).  At that point also 
> the DECL_VALUE_EXPRs need to be rewritten to refer to D'*, not to D* 
> anymore.

The attached patch does just that; it teaches
replace_block_vars_by_duplicates to replace the decls inside the
value-exprs with a duplicate too. It's kind of messy, though. At the
moment I'm only considering VAR_DECL, PARM_DECL, RESULT_DECL, ADDR_EXPR,
ARRAY_REF, COMPONENT_REF, CONVERT_EXPR, NOP_EXPR, INDIRECT_REF and
MEM_REF. I suspect that I may be missing some, but these are the only
ones that triggered gcc_unreachable during testing.

As Tom mentioned in PR66714, this bug is present on trunk, specifically
in code using omp targets. Is this patch OK for trunk? I bootstrapped
and tested on x86_64-linux-gnu.

Cesar
2015-07-22  Cesar Philippidis  
	Tom de Vries  

	gcc/
	* tree-cfg.c (replace_by_duplicate_decl_value_expr): New function.
	(replace_block_vars_by_duplicates): Ensure that value expr decls
	have been copied using replace_by_duplicate_decl_value_expr.

	libgomp/
	* testsuite/libgomp.c/pr66714.c: New file.
	

diff --git a/gcc/tree-cfg.c b/gcc/tree-cfg.c
index fde7fbc..15cb122 100644
--- a/gcc/tree-cfg.c
+++ b/gcc/tree-cfg.c
@@ -6439,6 +6439,99 @@ replace_by_duplicate_decl (tree *tp, hash_map *vars_map,
   *tp = new_t;
 }
 
+/* Replaces the value expression *TP with a duplicate (belonging to function
+   TO_CONTEXT).  The duplicates are recorded in VARS_MAP.  */
+
+static void
+replace_by_duplicate_decl_value_expr (tree *tp,
+  hash_map *vars_map,
+  tree to_context)
+{
+  tree x = *tp;
+
+  switch (TREE_CODE (*tp))
+{
+case VAR_DECL:
+case PARM_DECL:
+case RESULT_DECL:
+  replace_by_duplicate_decl (tp, vars_map, to_context);
+  break;
+case ADDR_EXPR:
+  {
+	tree expr = TREE_OPERAND (x, 0);
+
+	replace_by_duplicate_decl_value_expr (&expr, vars_map, to_context);
+	*tp = build1 (ADDR_EXPR, TREE_TYPE (x), expr);
+  }
+  break;
+case ARRAY_REF:
+  {
+	tree array = TREE_OPERAND (x, 0);
+	tree index = TREE_OPERAND (x, 1);
+	tree arg2 = TREE_OPERAND (x, 2);
+	tree arg3 = TREE_OPERAND (x, 3);
+
+	replace_by_duplicate_decl (&array, vars_map, to_context);
+	replace_by_duplicate_decl (&index, vars_map, to_context);
+
+	*tp = build4 (ARRAY_REF, TREE_TYPE (x), array, index,
+		  arg2, arg3);
+  }
+  break;
+case COMPONENT_REF:
+  {
+	tree component = TREE_OPERAND (x, 0);
+	tree field = TREE_OPERAND (x, 1);
+	tree ref;
+
+	/* Components may be MEM_REFs.  */
+	replace_by_duplicate_decl_value_expr (&component, vars_map,
+	  to_context);
+	ref = build3 (COMPONENT_REF, TREE_TYPE (field), component,
+		  field, NULL);
+
+	if (TREE_THIS_VOLATILE (x))
+	  TREE_THIS_VOLATILE (ref) |= 1;
+	if (TREE_READONLY (x))
+	  TREE_READONLY (ref) |= 1;
+
+	*tp = ref;
+  }
+  break;
+case CONVERT_EXPR:
+case NOP_EXPR:
+case INDIRECT_REF:
+  {
+	tree expr = TREE_OPERAND (x, 0);
+	tree decl;
+
+	if (CONVERT_EXPR_CODE_P (TREE_CODE (expr)))
+	  decl = TREE_OPERAND (expr, 0);
+	else
+	  decl = expr;
+
+	replace_by_duplicate_decl (&decl, vars_map, to_context);
+
+	if (CONVERT_EXPR_CODE_P (TREE_CODE (expr)))
+	  expr = build1 (TREE_CODE (expr), TREE_TYPE (expr), decl);
+	else
+	  expr = decl;
+
+	*tp = build_simple_mem_ref (expr);
+  }
+  break;
+case MEM_REF:
+  {
+	tree mem = TREE_OPERAND (x, 0);
+
+	replace_by_duplicate_decl_value_expr (&mem, vars_map, to_context);
+	*tp = build_simple_mem_ref (mem);
+  }
+  break;
+default:
+  gcc_unreachable ();
+}
+}
 
 /* Creates an ssa name in TO_CONTEXT equivalent to NAME.
VARS_MAP maps old ssa names and var_decls to the new ones.  */
@@ -6916,7 +7009,11 @@ replace_block_vars_by_duplicates (tree block, hash_map *vars_map,
 	{
 	  if (TREE_CODE (*tp) == VAR_DECL && DECL_HAS_VALUE_EXPR_P (*tp))
 	{
-	  SET_DECL_VALUE_EXPR (t, DECL_VALUE_EXPR (*tp));
+	  tree x = DECL_VALUE_EXPR (*tp);
+
+	  replace_by_duplicate_decl_value_expr (&x, vars_map, to_context);
+
+	  SET_DECL_VALUE_EXPR (t, x);
 	  DECL_HAS_VALUE_EXPR_P (t) = 1;
 	}
 	  DECL_CHAIN (t) = DECL_CHAIN (*tp);
diff --git a/libgomp/testsuite/libgomp.

Re: [patch] PR66714 -- Re: Re: [RFC] two-phase marking in gt_cleare_cache

2015-07-23 Thread Cesar Philippidis
On 07/23/2015 08:32 AM, Jakub Jelinek wrote:
> On Thu, Jul 23, 2015 at 08:20:50AM -0700, Cesar Philippidis wrote:
>> The attached patch does just that; it teaches
>> replace_block_vars_by_duplicates to replace the decls inside the
>> value-exprs with a duplicate too. It's kind of messy though. At the
>> moment I'm only considering VAR_DECL, PARM_DECL, RESULT_DECL, ADDR_EXPR,
>> ARRAY_REF, COMPONENT_REF, CONVERT_EXPR, NOP_EXPR, INDIRECT_REF and
>> MEM_REFs. I suspect that I may be missing some, but these are the only
>> ones that were triggered gcc_unreachable during testing.
> 
> Ugh, that looks ugly, why do we have all the tree walkers?
> I'd unshare_expr the value expr first, you really don't want to share
> it anyway, and then just walk_tree and find all the decls in there
> (with *walk_subtrees on types and perhaps something else too) and for them
> replace_by_duplicate_decl (tp, vars_map, to_context);

Something like the attached patch? Why do TREE_TYPEs need special handling?

Is it OK for trunk?

Cesar
2015-07-23  Cesar Philippidis  

	gcc/
	* tree-cfg.c (struct replace_decls_d): New struct.
	(replace_block_vars_by_duplicates_1): New function.
	(replace_block_vars_by_duplicates): Use it to replace the decls
	in the value exprs by duplicates.

	libgomp/
	* testsuite/libgomp.c/pr66714.c: New test.


diff --git a/gcc/tree-cfg.c b/gcc/tree-cfg.c
index fde7fbc..900274a 100644
--- a/gcc/tree-cfg.c
+++ b/gcc/tree-cfg.c
@@ -70,6 +70,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "omp-low.h"
 #include "tree-cfgcleanup.h"
 #include "wide-int-print.h"
+#include "gimplify.h"
 
 /* This file contains functions for building the Control Flow Graph (CFG)
for a function tree.  */
@@ -108,6 +109,13 @@ struct cfg_stats_d
 
 static struct cfg_stats_d cfg_stats;
 
+/* Data to pass to replace_block_vars_by_duplicates_1.  */
+struct replace_decls_d
+{
+  hash_map *vars_map;
+  tree to_context;
+};
+
 /* Hash table to store last discriminator assigned for each locus.  */
 struct locus_discrim_map
 {
@@ -6897,6 +6905,29 @@ new_label_mapper (tree decl, void *data)
   return m->to;
 }
 
+/* Tree walker to replace the decls used inside value expressions by
+   duplicates.  */
+
+static tree
+replace_block_vars_by_duplicates_1 (tree *tp, int *walk_subtrees, void *data)
+{
+  struct replace_decls_d *rd = (struct replace_decls_d *)data;
+
+  switch (TREE_CODE (*tp))
+{
+case VAR_DECL:
+case PARM_DECL:
+case RESULT_DECL:
+  replace_by_duplicate_decl (tp, rd->vars_map, rd->to_context);
+  *walk_subtrees = 0;
+  break;
+default:
+  break;
+}
+
+  return NULL;
+}
+
 /* Change DECL_CONTEXT of all BLOCK_VARS in block, including
subblocks.  */
 
@@ -6916,7 +6947,11 @@ replace_block_vars_by_duplicates (tree block, hash_map *vars_map,
 	{
 	  if (TREE_CODE (*tp) == VAR_DECL && DECL_HAS_VALUE_EXPR_P (*tp))
 	{
-	  SET_DECL_VALUE_EXPR (t, DECL_VALUE_EXPR (*tp));
+	  tree x = DECL_VALUE_EXPR (*tp);
+	  struct replace_decls_d rd = { vars_map, to_context };
+	  unshare_expr (x);
+	  walk_tree (&x, replace_block_vars_by_duplicates_1, &rd, NULL);
+	  SET_DECL_VALUE_EXPR (t, x);
 	  DECL_HAS_VALUE_EXPR_P (t) = 1;
 	}
 	  DECL_CHAIN (t) = DECL_CHAIN (*tp);
diff --git a/libgomp/testsuite/libgomp.c/pr66714.c b/libgomp/testsuite/libgomp.c/pr66714.c
new file mode 100644
index 000..c9af4a9
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c/pr66714.c
@@ -0,0 +1,17 @@
+/* { dg-do "compile" } */
+/* { dg-additional-options "--param ggc-min-expand=0" } */
+/* { dg-additional-options "--param ggc-min-heapsize=0" } */
+/* { dg-additional-options "-g" } */
+
+/* Minimized from target-2.c.  */
+
+void
+fn3 (int x)
+{
+  double b[3 * x];
+  int i;
+#pragma omp target
+#pragma omp parallel for
+  for (i = 0; i < x; i++)
+b[i] += 1;
+}
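Jakub's suggestion (copy the value expression, then walk it with a callback that swaps each decl for its duplicate, instead of enumerating every tree code) can be sketched in standalone C. Everything here is hypothetical: `struct node`, `walk`, `copy_expr` and `struct decl_map` are toy stand-ins for GCC's `tree`, `walk_tree`, `unshare_expr` and `vars_map`, not their real definitions. As with `unshare_expr`, decl nodes are shared rather than copied, and the copy is returned rather than made in place, so the result must be assigned back before walking. GCC's `unshare_expr` likewise returns the unshared tree, so a call whose result is discarded does not unshare anything.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical toy expression nodes; not GCC's tree type.  */
enum node_kind { NODE_DECL, NODE_ADDR, NODE_INDEX, NODE_CONST };

struct node
{
  enum node_kind kind;
  int value;               /* payload for NODE_CONST */
  struct node *ops[2];     /* operands; unused slots are NULL */
};

/* Callback loosely modeled on walk_tree's: it may rewrite *TP and may
   clear *WALK_SUBTREES to stop descent below the current node.  */
typedef void (*walk_fn) (struct node **tp, int *walk_subtrees, void *data);

static void
walk (struct node **tp, walk_fn fn, void *data)
{
  if (*tp == NULL)
    return;
  int walk_subtrees = 1;
  fn (tp, &walk_subtrees, data);
  if (walk_subtrees)
    for (int i = 0; i < 2; i++)
      walk (&(*tp)->ops[i], fn, data);
}

/* Copy expression nodes but share decls, so rewriting the copy can
   never change an expression some other decl still points to.  */
static struct node *
copy_expr (struct node *n)
{
  if (n == NULL || n->kind == NODE_DECL)
    return n;
  struct node *c = malloc (sizeof *c);
  assert (c != NULL);
  *c = *n;
  for (int i = 0; i < 2; i++)
    c->ops[i] = copy_expr (n->ops[i]);
  return c;
}

/* Hypothetical stand-in for VARS_MAP: old decl -> duplicate decl.  */
struct decl_map
{
  struct node *from[4];
  struct node *to[4];
  int n;
};

static void
replace_decl (struct node **tp, int *walk_subtrees, void *data)
{
  struct decl_map *map = data;
  if ((*tp)->kind != NODE_DECL)
    return;
  for (int i = 0; i < map->n; i++)
    if (map->from[i] == *tp)
      {
	*tp = map->to[i];
	break;
      }
  *walk_subtrees = 0;  /* Decls are leaves; nothing to remap below.  */
}

/* Return a copy of VALUE_EXPR whose decls have been remapped.  */
static struct node *
remap_value_expr (struct node *value_expr, struct decl_map *map)
{
  /* Keep the returned copy; the argument itself is not modified.  */
  struct node *x = copy_expr (value_expr);
  walk (&x, replace_decl, map);
  return x;
}
```

Compared with the first version of the patch, new tree codes need no special handling: only the callback decides what a decl is.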


Re: [patch] PR66714 -- Re: Re: [RFC] two-phase marking in gt_cleare_cache

2015-07-24 Thread Cesar Philippidis
On 07/23/2015 03:11 PM, Jakub Jelinek wrote:
> On Thu, Jul 23, 2015 at 03:01:25PM -0700, Cesar Philippidis wrote:
>> On 07/23/2015 08:32 AM, Jakub Jelinek wrote:
>>> On Thu, Jul 23, 2015 at 08:20:50AM -0700, Cesar Philippidis wrote:
>>>> The attached patch does just that; it teaches
>>>> replace_block_vars_by_duplicates to replace the decls inside the
>>>> value-exprs with a duplicate too. It's kind of messy though. At the
>>>> moment I'm only considering VAR_DECL, PARM_DECL, RESULT_DECL, ADDR_EXPR,
>>>> ARRAY_REF, COMPONENT_REF, CONVERT_EXPR, NOP_EXPR, INDIRECT_REF and
>>>> MEM_REFs. I suspect that I may be missing some, but these are the only
>>>> ones that were triggered gcc_unreachable during testing.
>>>
>>> Ugh, that looks ugly, why do we have all the tree walkers?
>>> I'd unshare_expr the value expr first, you really don't want to share
>>> it anyway, and then just walk_tree and find all the decls in there
>>> (with *walk_subtrees on types and perhaps something else too) and for them
>>> replace_by_duplicate_decl (tp, vars_map, to_context);
>>
>> Something like the attached patch? Why do TREE_TYPEs need special handling?
> 
> They can have decls in various places like TYPE_SIZE_UNIT, TYPE_SIZE, the
> bounds of TYPE_DOMAIN etc. and I believe you generally don't want to replace
> those.  Plus you risk infinite recursion then (unless 
> walk_tree_without_duplicates).
> Most walk_tree callbacks just do something like
>   if (IS_TYPE_OR_DECL_P (*tp))
> *walk_subtrees = 0;

This patch adds the check for IS_TYPE_OR_DECL_P. Is this OK for
trunk?

Cesar
2015-07-24  Cesar Philippidis  

	gcc/
	* tree-cfg.c (struct replace_decls_d): New struct.
	(replace_block_vars_by_duplicates_1): New function.
	(replace_block_vars_by_duplicates): Use it to replace the decls
	in the value exprs by duplicates.

	libgomp/
	* testsuite/libgomp.c/pr66714.c: New test.
	

diff --git a/gcc/tree-cfg.c b/gcc/tree-cfg.c
index fde7fbc..cb9fe6d 100644
--- a/gcc/tree-cfg.c
+++ b/gcc/tree-cfg.c
@@ -70,6 +70,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "omp-low.h"
 #include "tree-cfgcleanup.h"
 #include "wide-int-print.h"
+#include "gimplify.h"
 
 /* This file contains functions for building the Control Flow Graph (CFG)
for a function tree.  */
@@ -108,6 +109,13 @@ struct cfg_stats_d
 
 static struct cfg_stats_d cfg_stats;
 
+/* Data to pass to replace_block_vars_by_duplicates_1.  */
+struct replace_decls_d
+{
+  hash_map *vars_map;
+  tree to_context;
+};
+
 /* Hash table to store last discriminator assigned for each locus.  */
 struct locus_discrim_map
 {
@@ -6897,6 +6905,31 @@ new_label_mapper (tree decl, void *data)
   return m->to;
 }
 
+/* Tree walker to replace the decls used inside value expressions by
+   duplicates.  */
+
+static tree
+replace_block_vars_by_duplicates_1 (tree *tp, int *walk_subtrees, void *data)
+{
+  struct replace_decls_d *rd = (struct replace_decls_d *)data;
+
+  switch (TREE_CODE (*tp))
+{
+case VAR_DECL:
+case PARM_DECL:
+case RESULT_DECL:
+  replace_by_duplicate_decl (tp, rd->vars_map, rd->to_context);
+  break;
+default:
+  break;
+}
+
+  if (IS_TYPE_OR_DECL_P (*tp))
+*walk_subtrees = false;
+
+  return NULL;
+}
+
 /* Change DECL_CONTEXT of all BLOCK_VARS in block, including
subblocks.  */
 
@@ -6916,7 +6949,11 @@ replace_block_vars_by_duplicates (tree block, hash_map *vars_map,
 	{
 	  if (TREE_CODE (*tp) == VAR_DECL && DECL_HAS_VALUE_EXPR_P (*tp))
 	{
-	  SET_DECL_VALUE_EXPR (t, DECL_VALUE_EXPR (*tp));
+	  tree x = DECL_VALUE_EXPR (*tp);
+	  struct replace_decls_d rd = { vars_map, to_context };
+	  unshare_expr (x);
+	  walk_tree (&x, replace_block_vars_by_duplicates_1, &rd, NULL);
+	  SET_DECL_VALUE_EXPR (t, x);
 	  DECL_HAS_VALUE_EXPR_P (t) = 1;
 	}
 	  DECL_CHAIN (t) = DECL_CHAIN (*tp);
diff --git a/libgomp/testsuite/libgomp.c/pr66714.c b/libgomp/testsuite/libgomp.c/pr66714.c
new file mode 100644
index 000..c9af4a9
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c/pr66714.c
@@ -0,0 +1,17 @@
+/* { dg-do "compile" } */
+/* { dg-additional-options "--param ggc-min-expand=0" } */
+/* { dg-additional-options "--param ggc-min-heapsize=0" } */
+/* { dg-additional-options "-g" } */
+
+/* Minimized from target-2.c.  */
+
+void
+fn3 (int x)
+{
+  double b[3 * x];
+  int i;
+#pragma omp target
+#pragma omp parallel for
+  for (i = 0; i < x; i++)
+b[i] += 1;
+}
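Jakub's point about types can be illustrated with a similar toy walker. Here a hypothetical type node carries a size expression that mentions a decl, as a VLA type's size might. Because the callback clears `*walk_subtrees` for anything that is a type or a decl (the role `IS_TYPE_OR_DECL_P` plays in the patch), the use in the expression is remapped while the type's size is left alone. The `type` slot the walker descends into is an assumption made so that a type node is reachable at all; GCC's `walk_tree` has its own rules for which operands it visits. This sketch rewrites in place and does no copying.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy nodes; not GCC's tree representation.  */
enum kind { K_DECL, K_TYPE, K_MEM };

struct node
{
  enum kind kind;
  struct node *op;    /* K_MEM: address operand; K_TYPE: size expression.  */
  struct node *type;  /* Type of an expression node, or NULL.  */
};

/* Remap FROM to TO, counting how many nodes the callback sees.  */
struct remap
{
  struct node *from, *to;
  int visits;
};

/* Plays the role of IS_TYPE_OR_DECL_P.  */
static int
is_type_or_decl (const struct node *n)
{
  return n->kind == K_TYPE || n->kind == K_DECL;
}

static void
remap_cb (struct node **tp, int *walk_subtrees, struct remap *r)
{
  r->visits++;
  if (*tp == r->from)
    *tp = r->to;
  /* Types may mention decls (e.g. in their size) that must stay as they
     are, and type graphs can be cyclic; decls have nothing to remap
     below them.  So stop descending at either.  */
  if (is_type_or_decl (*tp))
    *walk_subtrees = 0;
}

static void
walk (struct node **tp, struct remap *r)
{
  if (*tp == NULL)
    return;
  int walk_subtrees = 1;
  remap_cb (tp, &walk_subtrees, r);
  if (walk_subtrees)
    {
      walk (&(*tp)->op, r);
      walk (&(*tp)->type, r);
    }
}
```

Without the check, the walk would also enter the type node and rewrite the decl in its size expression, and it could recurse forever on a cyclic type graph.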


[gomp4] acc routines bugfix

2015-07-24 Thread Cesar Philippidis
Jim ran into an ICE in a Fortran program which contains an acc vector
loop with a call to a subroutine. There are two things going on here.
First, a couple of functions in tree-nested.c weren't handling the
OMP_CLAUSE_GANG, _WORKER, _VECTOR, and _SEQ clause codes. Second, the
target LTO compiler would fail an assert if a routine clause wasn't
applied to the subroutine.

The second point is interesting. Offloaded functions require the "omp
target" attribute; otherwise they won't reach the LTO compiler. That's
fine because not all targets can handle general code. The problem occurs
when a user forgets to bless a function as offloaded, which OpenACC
allows. This patch teaches the LTO reader (input_overwrite_node in
lto-cgraph.c) to report an error for such unrecognized functions when
flag_openacc is set, and to hit gcc_unreachable otherwise. I couldn't
think of a way to test the LTO error message because that involves
having two compilers present. I wonder if it's OK to have libgomp check
for expected compiler errors? However, that's more of a
gcc/testsuite type of check.

I don't think trunk has much support for acc routines just yet, so I
applied this patch to gomp-4_0-branch for now.

Cesar
2015-07-24  Cesar Philippidis  

	gcc/
	* lto-cgraph.c (input_overwrite_node): Gracefully error on missing
	symbols with flag_openacc.
	* tree-nested.c (convert_nonlocal_omp_clauses): Handle OMP_CLAUSE_GANG,
	OMP_CLAUSE_WORKER, OMP_CLAUSE_VECTOR, and OMP_CLAUSE_SEQ.
	(convert_local_omp_clauses): Likewise.

	libgomp/
	* testsuite/libgomp.oacc-fortran/vector-routine.f90: New test.
	

diff --git a/gcc/lto-cgraph.c b/gcc/lto-cgraph.c
index 97585c9..bc589bd 100644
--- a/gcc/lto-cgraph.c
+++ b/gcc/lto-cgraph.c
@@ -1219,9 +1219,23 @@ input_overwrite_node (struct lto_file_decl_data *file_data,
  LDPR_NUM_KNOWN);
   node->instrumentation_clone = bp_unpack_value (bp, 1);
   node->split_part = bp_unpack_value (bp, 1);
-  gcc_assert (flag_ltrans
-	  || (!node->in_other_partition
-		  && !node->used_from_other_partition));
+
+  int success = flag_ltrans || (!node->in_other_partition
+&& !node->used_from_other_partition);
+
+  if (!success)
+{
+  if (flag_openacc)
+	{
+	  if (TREE_CODE (node->decl) == FUNCTION_DECL)
+	error ("Missing routine function %<%s%>", node->name ());
+	  else
+	error ("Missing declared variable %<%s%>", node->name ());
+	}
+
+  else
+	gcc_unreachable ();
+}
 }
 
 /* Return string alias is alias of.  */
diff --git a/gcc/tree-nested.c b/gcc/tree-nested.c
index 6b75020..3b02443 100644
--- a/gcc/tree-nested.c
+++ b/gcc/tree-nested.c
@@ -1188,6 +1188,10 @@ convert_nonlocal_omp_clauses (tree *pclauses, struct walk_stmt_info *wi)
 	case OMP_CLAUSE_UNTIED:
 	case OMP_CLAUSE_MERGEABLE:
 	case OMP_CLAUSE_PROC_BIND:
+	case OMP_CLAUSE_GANG:
+	case OMP_CLAUSE_WORKER:
+	case OMP_CLAUSE_VECTOR:
+	case OMP_CLAUSE_SEQ:
 	  break;
 
 	default:
@@ -1828,6 +1832,10 @@ convert_local_omp_clauses (tree *pclauses, struct walk_stmt_info *wi)
 	case OMP_CLAUSE_UNTIED:
 	case OMP_CLAUSE_MERGEABLE:
 	case OMP_CLAUSE_PROC_BIND:
+	case OMP_CLAUSE_GANG:
+	case OMP_CLAUSE_WORKER:
+	case OMP_CLAUSE_VECTOR:
+	case OMP_CLAUSE_SEQ:
 	  break;
 
 	default:
diff --git a/libgomp/testsuite/libgomp.oacc-fortran/vector-routine.f90 b/libgomp/testsuite/libgomp.oacc-fortran/vector-routine.f90
new file mode 100644
index 000..a8d078a
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-fortran/vector-routine.f90
@@ -0,0 +1,46 @@
+! { dg-do run }
+
+module param
+  integer, parameter :: N = 32
+end module param
+
+program main
+  use param
+  integer :: i
+  integer :: a(N)
+
+  do i = 1, N
+a(i) = i
+  end do
+
+  !
+  ! Appears there's two bugs...
+  ! 1) loop with vector
+  ! 2) loop without vector
+  !
+
+  !$acc parallel copy (a)
+  !$acc loop vector
+do i = 1, N
+  call vector (a)
+end do
+  !$acc end parallel
+
+  do i = 1, N
+if (a(i) .ne. 0) call abort
+  end do
+
+contains
+
+  subroutine vector (a)
+  !$acc routine vector
+  integer, intent (inout) :: a(N)
+  integer :: i
+
+  do i = 1, N
+a(i) = a(i) - a(i) 
+  end do
+
+end subroutine vector
+
+end program main
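The tree-nested.c change follows a common pattern: a switch lists every clause code the pass knows how to treat, and anything else falls into a `default` that stops compilation. A minimal sketch with hypothetical clause names (not GCC's `OMP_CLAUSE_*` enumerators) shows why a newly introduced code, such as the OpenACC loop clauses, triggers an ICE until it is added to the list. As in the patch, the loop clauses are treated as having nothing to remap.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical subset of clause kinds.  */
enum clause_code
{
  CLAUSE_PRIVATE,   /* Carries a decl operand that may need remapping.  */
  CLAUSE_UNTIED,    /* No operands.  */
  CLAUSE_GANG,      /* OpenACC loop clauses, handled like UNTIED here.  */
  CLAUSE_WORKER,
  CLAUSE_VECTOR,
  CLAUSE_SEQ,
  CLAUSE_NEW_KIND   /* A code the walker has not been taught about.  */
};

/* Return how many decl operands of a clause with CODE need remapping.
   Unknown codes abort, mirroring gcc_unreachable () in the default.  */
static int
convert_clause (enum clause_code code)
{
  switch (code)
    {
    case CLAUSE_PRIVATE:
      return 1;
    case CLAUSE_UNTIED:
    case CLAUSE_GANG:
    case CLAUSE_WORKER:
    case CLAUSE_VECTOR:
    case CLAUSE_SEQ:
      return 0;
    default:
      fprintf (stderr, "unhandled clause code %d\n", (int) code);
      abort ();
    }
}
```

Calling `convert_clause (CLAUSE_NEW_KIND)` aborts, which is the analogue of the ICE Jim hit before the loop clauses were listed.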


Re: [gomp4] acc routines bugfix

2015-07-24 Thread Cesar Philippidis
On 07/24/2015 08:21 AM, Ilya Verbin wrote:
> On Fri, Jul 24, 2015 at 08:05:00 -0700, Cesar Philippidis wrote:
>> The second point is interesting. Offloaded functions require the "omp
>> target" attribute or that function won't reach the lto compiler. That's
>> fine because not all targets can handle general code. The problem occurs
>> when a user forgets to bless a function as offloaded, which OpenACC
>> allows. This patch teaches the lto-wrapper to error on unrecognized
>> functions with flag_openacc or hit gcc_unreachable otherwise. I couldn't
>> think of a way to test the lto error message because that involves
>> having two compilers present. I wonder if it's ok to have libgomp check
>> for compiler expected compiler errors? However, that's more of a
>> gcc/testsuite type of check.
>>
>> I don't think trunk has much support for acc routines just yet, so I
>> applied this patch to gomp-4_0-branch for now.
> 
> OpenMP has similar issue.
> 
>> diff --git a/gcc/lto-cgraph.c b/gcc/lto-cgraph.c
>> index 97585c9..bc589bd 100644
>> --- a/gcc/lto-cgraph.c
>> +++ b/gcc/lto-cgraph.c
>> @@ -1219,9 +1219,23 @@ input_overwrite_node (struct lto_file_decl_data *file_data,
>>   LDPR_NUM_KNOWN);
>>node->instrumentation_clone = bp_unpack_value (bp, 1);
>>node->split_part = bp_unpack_value (bp, 1);
>> -  gcc_assert (flag_ltrans
>> -  || (!node->in_other_partition
>> -  && !node->used_from_other_partition));
>> +
>> +  int success = flag_ltrans || (!node->in_other_partition
>> +&& !node->used_from_other_partition);
>> +
>> +  if (!success)
>> +{
>> +  if (flag_openacc)
>> +{
>> +  if (TREE_CODE (node->decl) == FUNCTION_DECL)
>> +error ("Missing routine function %<%s%>", node->name ());
>> +  else
>> +error ("Missing declared variable %<%s%>", node->name ());
>> +}
>> +
>> +  else
>> +gcc_unreachable ();
>> +}
>>  }
> 
> This will print an error not only when a fn/var, referenced from offload region,
> missed its attribute, but also when something goes wrong in general LTO
> partitioning (if flag_openacc is set).  So, maybe just replace gcc_assert ()
> with error () without checking for flag_openacc?
> 
> And how about similar assert in input_varpool_node?

Good catch. I've been too focused on OpenACC lately. Would this patch be
OK for trunk if it passes testing?

Cesar
2015-07-24  Cesar Philippidis  

	gcc/
	* lto-cgraph.c (input_overwrite_node): Error instead of assert
	on missing cgraph partitions.
	(input_varpool_node): Likewise.


diff --git a/gcc/lto-cgraph.c b/gcc/lto-cgraph.c
index d70537d..7e2fc80 100644
--- a/gcc/lto-cgraph.c
+++ b/gcc/lto-cgraph.c
@@ -1218,9 +1218,11 @@ input_overwrite_node (struct lto_file_decl_data *file_data,
  LDPR_NUM_KNOWN);
   node->instrumentation_clone = bp_unpack_value (bp, 1);
   node->split_part = bp_unpack_value (bp, 1);
-  gcc_assert (flag_ltrans
-	  || (!node->in_other_partition
-		  && !node->used_from_other_partition));
+
+  int success = flag_ltrans || (!node->in_other_partition
+&& !node->used_from_other_partition);
+  if (!success)
+error ("Missing %<%s%>", node->name ());
 }
 
 /* Return string alias is alias of.  */
@@ -1432,9 +1434,11 @@ input_varpool_node (struct lto_file_decl_data *file_data,
 node->set_section_for_node (section);
   node->resolution = streamer_read_enum (ib, ld_plugin_symbol_resolution,
 	LDPR_NUM_KNOWN);
-  gcc_assert (flag_ltrans
-	  || (!node->in_other_partition
-		  && !node->used_from_other_partition));
+
+  int success = flag_ltrans || (!node->in_other_partition
+&& !node->used_from_other_partition);
+  if (!success)
+error ("Missing %<%s%>", node->name ());
 
   return node;
 }


Re: [gomp4] Add new oacc_transform patch

2015-07-28 Thread Cesar Philippidis
On 07/28/2015 02:21 AM, Thomas Schwinge wrote:

> Cesar, please address the following compiler diagnostic:
> 
>> 2015-07-21  Cesar Philippidis  
>>
>>  gcc/
>>  * omp-low.c (execute_oacc_transform): New function.
>>  (class pass_oacc_transform): New function.
>>  (make_pass_oacc_transform): New function.
>>  * passes.def: Add pass_oacc_transform to all_passes.
>>  * tree-pass.h (make_pass_oacc_transform): Declare.
>>  
>>
>> diff --git a/gcc/omp-low.c b/gcc/omp-low.c
>> index 388013c..23989f9 100644
>> --- a/gcc/omp-low.c
>> +++ b/gcc/omp-low.c
>> @@ -14394,4 +14394,76 @@ make_pass_late_lower_omp (gcc::context *ctxt)
>>return new pass_late_lower_omp (ctxt);
>>  }
>>  
>> +/* Main entry point for oacc transformations which run on the device
>> +   compiler.  */
>> +
>> +static unsigned int
>> +execute_oacc_transform ()
>> +{
>> +  basic_block bb;
>> +  gimple_stmt_iterator gsi;
>> +  gimple stmt;
>> +
>> +  if (!lookup_attribute ("oacc function",
>> + DECL_ATTRIBUTES (current_function_decl)))
>> +return 0;
>> +
>> +
>> +  FOR_ALL_BB_FN (bb, cfun)
>> +{
>> +  gsi = gsi_start_bb (bb);
>> +
>> +  while (!gsi_end_p (gsi))
>> +{
>> +  stmt = gsi_stmt (gsi);
>> +  gsi_next (&gsi);
>> +}
>> +}
>> +
>> +  return 0;
>> +}
> 
> [...]/source-gcc/gcc/omp-low.c: In function 'unsigned int execute_oacc_transform()':
> [...]/source-gcc/gcc/omp-low.c:14406:10: error: variable 'stmt' set but not used [-Werror=unused-but-set-variable]
>gimple stmt;
>   ^

I could apply the attached patch, but I figured that you'd need the stmt
iterator for acc_on_device anyway. Should I apply the patch to
gomp-4_0-branch?

Cesar

diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index 479b28a..e237c75 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -14431,26 +14431,10 @@ make_pass_late_lower_omp (gcc::context *ctxt)
 static unsigned int
 execute_oacc_transform ()
 {
-  basic_block bb;
-  gimple_stmt_iterator gsi;
-  gimple stmt;
-
   if (!lookup_attribute ("oacc function",
 			 DECL_ATTRIBUTES (current_function_decl)))
 return 0;
 
-
-  FOR_ALL_BB_FN (bb, cfun)
-{
-  gsi = gsi_start_bb (bb);
-
-  while (!gsi_end_p (gsi))
-	{
-	  stmt = gsi_stmt (gsi);
-	  gsi_next (&gsi);
-	}
-}
-
   return 0;
 }
 


Re: [gomp4] fix spinlock

2015-08-06 Thread Cesar Philippidis
On 08/06/2015 01:41 AM, Nathan Sidwell wrote:

> I've committed this to fix the spinlock problem Cesar fell over.  While
> there I added more checking on the worker dimension.

I hit a couple more bugs with the spinlocks. First, the address space
argument to membar wasn't being handled properly. Second,
nvptx_spinunlock should probably be using atom.exch instead of atom.cas.
Finally, ptxas complains about the period prefix on the atom
instructions. This patch addresses these problems.

Is there a better way to allocate a scratch register for
nvptx_spinunlock, or is my solution ok as-is for gomp-4_0-branch?

Thanks,
Cesar



2015-08-06  Cesar Philippidis  

	gcc/
	* config/nvptx/nvptx.c (nvptx_expand_lock_unlock): Pass an
	additional scratch register to gen_nvptx_spinlock.
	* config/nvptx/nvptx.md (nvptx_membar): Use %B for the address
	space operand.
	(nvptx_spinlock): Remove period prefix from atom.
	(nvptx_spinunlock): Take additional scratch register argument.
	Use atom.exch to update the lock.
	

diff --git a/gcc/config/nvptx/nvptx.c b/gcc/config/nvptx/nvptx.c
index 2013219..881aea4 100644
--- a/gcc/config/nvptx/nvptx.c
+++ b/gcc/config/nvptx/nvptx.c
@@ -3327,7 +3327,7 @@ nvptx_expand_lock_unlock (tree exp, bool lock)
 label);
 }
   else
-pat = gen_nvptx_spinunlock (mem, space);
+pat = gen_nvptx_spinunlock (mem, space, gen_reg_rtx (SImode));
   emit_insn (pat);
   if (lock)
 emit_insn (barrier);
diff --git a/gcc/config/nvptx/nvptx.md b/gcc/config/nvptx/nvptx.md
index 8cd8300..fb88c72 100644
--- a/gcc/config/nvptx/nvptx.md
+++ b/gcc/config/nvptx/nvptx.md
@@ -1569,7 +1569,7 @@
   [(unspec_volatile [(match_operand:SI 0 "const_int_operand" "")]
 		UNSPECV_MEMBAR)]
   ""
-  "membar%M0;")
+  "membar%B0;")
 
 ;; spinlock and unlock
 (define_insn "nvptx_spinlock"
@@ -1581,11 +1581,12 @@
   (match_operand:BI 3 "register_operand" "=R")
   (label_ref (match_operand 4 "" ""))])]
""
-   "%4:\\t.atom%R1.cas.b32 %2,%0,0,1;setp.ne.u32 %3,%2,0;@%3 bra.uni %4;")
+   "%4:\\tatom%R1.cas.b32 %2,%0,0,1;setp.ne.u32 %3,%2,0;@%3 bra.uni %4;")
 
 (define_insn "nvptx_spinunlock"
[(unspec_volatile [(match_operand:SI 0 "memory_operand" "m")
 		  (match_operand:SI 1 "const_int_operand" "i")]
-		  UNSPECV_UNLOCK)]
+		  UNSPECV_UNLOCK)
+(match_operand:SI 2 "register_operand" "=R")]
""
-   ".atom%R1.cas.b32 %0,1,0;")
+   "atom%R1.exch.b32 %2,%0,0;")


Re: [gomp4] Redesign oacc_parallel launch API

2015-08-06 Thread Cesar Philippidis
On 07/28/2015 09:52 AM, Nathan Sidwell wrote:
> I've committed this patch to the gomp4 branch to redo the launch API. 
> I'll post a version for trunk once the versioning patch gets approved &
> committed.
> 
> This changes the API in a number of ways, allowing device-specific
> knowledge to be moved into the device compiler and out of the host
> compiler.
> 
> Firstly, we attach a tuple of launch dimensions as an attribute to the
> offloaded function's 'oacc function' attribute.  These are the constant
> launch dimensions.  Dynamic dimensions get a zero for their slot in this
> list.  Further this list can be extended in the future to an alist keyed
> by device_type.
> 
> Dynamic dimensions are computed on the host; however, they are passed
> via variadic args to the GOACC_parallel function (which is renamed).  The
> variadic args are passed using key/value representation, and 3 keys are
> currently defined:
> END -- end of the variadic list
> DIM - set of runtime-computed dimensions.  Only the dynamic ones are
> passed.
> ASYNC_WAIT - an async and a set of waits (possibly zero).
> 
> I have arranged for the key to have a slot that can later be filled by
> device_type, and hence support multiple device types.
> 
> The constant dimensions can be used in expansion of the GOACC_nid
> function in the device compiler.  The device compiler could also process
> that list to select the device_type slot that is appropriate.
> 
> For PTX the backend is augmented to emit the launch dimensions into the
> target data, from whence the ptx plugin can pick them up and overwrite
> with any dynamic ones passed in from the launch function.

Looking at set_oacc_fn_attrib, it appears that const values are also
considered dynamic. See the attached test case for more info. Is that
the expected behavior? If not, I could take a look at this after I
finish my reduction patch.

Cesar
#include 

const int vl = 32;

int
main ()
{
  unsigned int red = 0;

#pragma acc parallel loop vector_length (vl) vector reduction (+:red) copy (red)
  for (int i = 0; i < 100; i++)
    red ++;

  printf ("red = %u\n", red);

  return 0;
}
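The key/value launch-argument scheme Nathan describes can be sketched as a variadic parser. The bit layout (`LAUNCH_PACK`: code in bits 28 to 31, a device_type slot in bits 16 to 27, an operand in bits 0 to 15), the key names, and the convention that a DIM key's operand is a mask of the dimensions that follow are all assumptions for illustration; they are not claimed to match GOACC_parallel's actual encoding. In the usage below the vector slot is treated as known at compile time. In C, unlike C++, a `const int` variable such as `vl` in the test case above is not an integer constant expression, which may be why it ends up on the dynamic path.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdlib.h>

/* Hypothetical key layout, not claimed to match libgomp's.  */
enum launch_code { LAUNCH_END = 0, LAUNCH_DIM = 1, LAUNCH_ASYNC_WAIT = 2 };

#define LAUNCH_PACK(code, device, op) \
  (((unsigned) (code) << 28) | ((unsigned) (device) << 16) | (unsigned) (op))
#define LAUNCH_CODE(key) ((key) >> 28)
#define LAUNCH_DEVICE(key) (((key) >> 16) & 0xfffu)
#define LAUNCH_OP(key) ((key) & 0xffffu)

enum { GANG, WORKER, VECTOR, NDIMS };

/* DIMS starts out holding the compile-time dimensions, with 0 for the
   dynamic ones; the keys that follow NUM_WAITS fill in the rest.  */
static void
parse_launch_args (int dims[NDIMS], int *async, int *num_waits, ...)
{
  va_list ap;
  va_start (ap, num_waits);
  for (;;)
    {
      unsigned key = va_arg (ap, unsigned);
      if (LAUNCH_CODE (key) == LAUNCH_END)
	break;
      /* Only the default device_type slot is handled in this sketch.  */
      assert (LAUNCH_DEVICE (key) == 0);
      switch (LAUNCH_CODE (key))
	{
	case LAUNCH_DIM:
	  /* The operand is a mask of the dynamic dimensions that follow,
	     in GANG, WORKER, VECTOR order.  */
	  for (int i = 0; i < NDIMS; i++)
	    if (LAUNCH_OP (key) & (1u << i))
	      dims[i] = va_arg (ap, int);
	  break;
	case LAUNCH_ASYNC_WAIT:
	  /* An async value followed by OPERAND wait ids (possibly zero).  */
	  *async = va_arg (ap, int);
	  *num_waits = (int) LAUNCH_OP (key);
	  for (int i = 0; i < *num_waits; i++)
	    (void) va_arg (ap, int);  /* Wait ids are ignored here.  */
	  break;
	default:
	  abort ();
	}
    }
  va_end (ap);
}
```

For example, a launch with 64 gangs and 8 workers computed at run time, async queue 5 and two waits would pass `LAUNCH_PACK (LAUNCH_DIM, 0, 3), 64, 8, LAUNCH_PACK (LAUNCH_ASYNC_WAIT, 0, 2), 5, 1, 2, LAUNCH_PACK (LAUNCH_END, 0, 0)`. The spare device field is where a device_type selector could later go.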


Re: [gomp4] Worker reduction builtin

2015-08-06 Thread Cesar Philippidis
On 08/04/2015 04:50 AM, Nathan Sidwell wrote:

> +/* Worker reduction address expander.  */
> +static rtx
> +nvptx_expand_work_red_addr (tree exp, rtx target,
> + machine_mode ARG_UNUSED (mode),
> + int ignore)
>  {
> -  return nvptx_expand_lock_unlock (desc, exp, false);
> +  if (ignore)
> +return target;
> +  
> +  rtx loop_id = expand_expr (CALL_EXPR_ARG (exp, 0),
> +  NULL_RTX, mode, EXPAND_NORMAL);
> +  rtx red_id = expand_expr (CALL_EXPR_ARG (exp, 1),
> +  NULL_RTX, mode, EXPAND_NORMAL);
> +  gcc_assert (GET_CODE (loop_id) == CONST_INT
> +   && GET_CODE (red_id) == CONST_INT);
> +  gcc_assert (REG_P (target));
> +
> +  unsigned lid = (unsigned)UINTVAL (loop_id);
> +  unsigned rid = (unsigned)UINTVAL (red_id);
> +
> +  unsigned ix;
> +
> +  for (ix = 0; ix != loop_reds.length (); ix++)
> +if (loop_reds[ix].id == lid)
> +  goto found_lid;
> +  /* Allocate a new loop.  */
> +  loop_reds.safe_push (loop_red (lid));
> + found_lid:
> +  loop_red &loop = loop_reds[ix];
> +  for (ix = 0; ix != loop.vars.length (); ix++)
> +if (loop.vars[ix].first == rid)
> +  goto found_rid;
> +
> +  /* Allocate a new var. */
> +  {
> +tree type = TREE_TYPE (TREE_TYPE (exp));
> +enum machine_mode mode = TYPE_MODE (type);
> +unsigned align = GET_MODE_ALIGNMENT (mode) / BITS_PER_UNIT;
> +unsigned off = loop.hwm;
> +
> +if (align > worker_red_align)
> +  worker_red_align = align;
> +off = (off + align - 1) & ~(align -1);
> +loop.hwm = off + GET_MODE_SIZE (mode);
> +loop.vars.safe_push (var_red_t (rid, off));
> +  }
> + found_rid:
> +
> +  /* Return offset into worker reduction array.  */
> +  unsigned offset = loop.vars[ix].second;
> +  
> +  rtx addr = gen_reg_rtx (Pmode);
> +  emit_move_insn (addr,
> +   gen_rtx_PLUS (Pmode, worker_red_sym, GEN_INT (offset)));
> +  emit_insn (gen_rtx_SET (target,
> +   gen_rtx_UNSPEC (Pmode, gen_rtvec (1, addr),
> +   UNSPEC_FROM_SHARED)));
> +  return target;
>  }

Something is wrong over here. I'm seeing this ICE:

wred.c: In function ‘main._omp_fn.0’:
wred.c:9:9: error: unrecognizable insn:
 #pragma acc parallel loop vector_length (32) num_workers (32) worker reduction (+:red) copy (red)
 ^
(insn 28 27 29 2 (set (reg:DI 59)
(plus:DI (symbol_ref:DI ("__worker_red"))
(const_int 0 [0]))) wred.c:9 -1
 (nil))

The attached patch fixes it by assigning worker_red_sym to a scratch
register. Is this OK for gomp-4_0-branch?

Cesar
2015-08-06  Cesar Philippidis  

gcc/
	* config/nvptx/nvptx.c (nvptx_expand_work_red_addr): Use a
	scratch register for worker_red_sym.
	

diff --git a/gcc/config/nvptx/nvptx.c b/gcc/config/nvptx/nvptx.c
index e343e53..389e370 100644
--- a/gcc/config/nvptx/nvptx.c
+++ b/gcc/config/nvptx/nvptx.c
@@ -3415,10 +3415,12 @@ nvptx_expand_work_red_addr (tree exp, rtx target,
 
   /* Return offset into worker reduction array.  */
   unsigned offset = loop.vars[ix].second;
-  
+
+  rtx base = gen_reg_rtx (Pmode);
   rtx addr = gen_reg_rtx (Pmode);
+  emit_insn (gen_rtx_SET (base, worker_red_sym));
   emit_move_insn (addr,
-		  gen_rtx_PLUS (Pmode, worker_red_sym, GEN_INT (offset)));
+		  gen_rtx_PLUS (Pmode, base, GEN_INT (offset)));
   emit_insn (gen_rtx_SET (target,
 			  gen_rtx_UNSPEC (Pmode, gen_rtvec (1, addr),
 	  UNSPEC_FROM_SHARED)));


Re: [gomp4, wip] remove references to ganglocal shared memory inside gcc

2015-09-18 Thread Cesar Philippidis
On 09/18/2015 01:39 AM, Thomas Schwinge wrote:

> On Tue, 1 Sep 2015 18:29:55 +0200, Tom de Vries  
> wrote:
>> On 27/08/15 03:37, Cesar Philippidis wrote:
>>> -  ctx->ganglocal_size_host = align_and_expand (&gl_host, host_size, align);
>>
>> I suspect this caused a bootstrap failure (align_and_expand unused). 
>> Worked-around as attached.
> 
>> --- a/gcc/omp-low.c
>> +++ b/gcc/omp-low.c
>> @@ -1450,7 +1450,7 @@ omp_copy_decl (tree var, copy_body_data *cb)
>>  
>>  /* Modify the old size *POLDSZ to align it up to ALIGN, and then return
>> a value with SIZE added to it.  */
>> -static tree
>> +static tree ATTRIBUTE_UNUSED
>>  align_and_expand (tree *poldsz, tree size, unsigned int align)
>>  {
>>tree oldsz = *poldsz;
> 
> If I remember correctly, this has only ever been used in the "ganglocal"
> implementation -- which is now gone.  So, should align_and_expand also be
> elided (Cesar)?

Most likely. I probably overlooked it when I was working on that
ganglocal removal patch. Can you remove it please? I'm already juggling
a couple of patches right now.

Thanks,
Cesar





Re: New post-LTO OpenACC pass

2015-09-21 Thread Cesar Philippidis
On 09/21/2015 09:30 AM, Nathan Sidwell wrote:

> +const pass_data pass_data_oacc_transform =
> +{
> +  GIMPLE_PASS, /* type */
> +  "fold_oacc_transform", /* name */

Want to rename the tree dump file to oacc_xforms, as I did in the
attached patch? Regardless, I think we need to document this flag in
invoke.texi.

> +  OPTGROUP_NONE, /* optinfo_flags */
> +  TV_NONE, /* tv_id */
> +  PROP_cfg, /* properties_required */
> +  0 /* Possibly PROP_gimple_eomp.  */, /* properties_provided */
> +  0, /* properties_destroyed */
> +  0, /* todo_flags_start */
> +  TODO_update_ssa | TODO_cleanup_cfg, /* todo_flags_finish */
> +};

Cesar
2015-09-21  Cesar Philippidis  

	gcc/
	* doc/invoke.texi: Document -fdump-tree-oacc_xforms.
	* omp-low.c (pass_data_oacc_transform): Rename the tree dump for
	oacc_transform as oacc_xforms.

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 92f82d7..7406941 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -7158,6 +7158,11 @@ is made by appending @file{.slp} to the source file name.
 Dump each function after Value Range Propagation (VRP).  The file name
 is made by appending @file{.vrp} to the source file name.
 
+@item oacc_xforms
+@opindex fdump-tree-oacc_xforms
+Dump each function after applying target-specific OpenACC transformations.
+The file name is made by appending @file{.oacc_xforms} to the source file name.
+
 @item all
 @opindex fdump-tree-all
 Enable all the available tree dumps with the flags provided in this option.
diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index e3dc160..f31e6cd 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -15086,7 +15086,7 @@ namespace {
 const pass_data pass_data_oacc_transform =
 {
   GIMPLE_PASS, /* type */
-  "fold_oacc_transform", /* name */
+  "oacc_xforms", /* name */
   OPTGROUP_NONE, /* optinfo_flags */
   TV_NONE, /* tv_id */
   PROP_cfg, /* properties_required */


OpenACC subarray data alignment in fortran

2015-09-22 Thread Cesar Philippidis
In both OpenACC and OpenMP, each subarray has at least two data mappings
associated with it, one for the pointer and another for the data in
the array section (Fortran also has a pset mapping). One problem I
observed in Fortran is that array section data is cast to char *.
Consequently, when lower_omp_target assigns alignment for the subarray
data, it does so incorrectly. This is a problem on nvptx if you have a
data clause such as

  integer foo
  real*8 bar (100)

  !$acc data copy (foo, bar(1:100))

Here, the data associated with bar could get aligned on a 4-byte
boundary instead of an 8-byte boundary. That causes problems on nvptx targets.

My fix for this is to prevent the Fortran front end from casting the
data pointers to char *. I only removed the cast in the code that
handles OMP_CLAUSE_MAP. The subarrays associated with OMP_CLAUSE_SHARED
are also cast to char *, but I left those as-is because I'm not that
familiar with how non-OpenMP target regions get lowered.
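
To make the alignment problem concrete, here is a small C model of the
offset arithmetic. It is only a sketch, assuming the mapped items are
laid out one after another in clause order and each item is aligned to
the alignment derived from its pointer's pointee type (1 for char *, 8
for real*8 on nvptx). align_up and bar_offset are hypothetical helpers,
not GCC or libgomp functions.

```c
#include <assert.h>
#include <stddef.h>

/* Round OFFSET up to the next multiple of ALIGN, which must be a
   power of two.  */
size_t
align_up (size_t offset, size_t align)
{
  assert (align != 0 && (align & (align - 1)) == 0);
  return (offset + align - 1) & ~(align - 1);
}

/* Hypothetical layout: the data for bar(1:100) is placed right after
   the 4-byte default-kind integer foo, aligned to POINTEE_ALIGN.  */
size_t
bar_offset (size_t pointee_align)
{
  const size_t foo_size = 4;
  return align_up (foo_size, pointee_align);
}
```

With the char * cast, bar_offset (1) yields 4, so the real*8 data starts
on a 4-byte boundary; keeping the real*8 pointee type gives
bar_offset (8) == 8, which is properly aligned for 8-byte loads.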

Is this patch OK for trunk?

Thanks,
Cesar
2015-09-22  Cesar Philippidis  

	gcc/
	* fortran/trans-openmp.c (gfc_omp_finish_clause): Don't cast ptr
	into a character pointer.
	(gfc_trans_omp_clauses_1): Likewise.

	libgomp/
	* testsuite/libgomp.oacc-fortran/data-alignment.f90: New test.

diff --git a/gcc/fortran/trans-openmp.c b/gcc/fortran/trans-openmp.c
index cd76f2a..8c1e897 100644
--- a/gcc/fortran/trans-openmp.c
+++ b/gcc/fortran/trans-openmp.c
@@ -1065,7 +1065,6 @@ gfc_omp_finish_clause (tree c, gimple_seq *pre_p)
   gfc_start_block (&block);
   tree type = TREE_TYPE (decl);
   tree ptr = gfc_conv_descriptor_data_get (decl);
-  ptr = fold_convert (build_pointer_type (char_type_node), ptr);
   ptr = build_fold_indirect_ref (ptr);
   OMP_CLAUSE_DECL (c) = ptr;
   c2 = build_omp_clause (input_location, OMP_CLAUSE_MAP);
@@ -1972,8 +1971,6 @@ gfc_trans_omp_clauses_1 (stmtblock_t *block, gfc_omp_clauses *clauses,
 		{
 		  tree type = TREE_TYPE (decl);
 		  tree ptr = gfc_conv_descriptor_data_get (decl);
-		  ptr = fold_convert (build_pointer_type (char_type_node),
-	  ptr);
 		  ptr = build_fold_indirect_ref (ptr);
 		  OMP_CLAUSE_DECL (node) = ptr;
 		  node2 = build_omp_clause (input_location,
@@ -2066,8 +2063,6 @@ gfc_trans_omp_clauses_1 (stmtblock_t *block, gfc_omp_clauses *clauses,
    OMP_CLAUSE_SIZE (node), elemsz);
 		}
 		  gfc_add_block_to_block (block, &se.post);
-		  ptr = fold_convert (build_pointer_type (char_type_node),
-  ptr);
 		  OMP_CLAUSE_DECL (node) = build_fold_indirect_ref (ptr);
 
 		  if (POINTER_TYPE_P (TREE_TYPE (decl))
diff --git a/libgomp/testsuite/libgomp.oacc-fortran/data-alignment.f90 b/libgomp/testsuite/libgomp.oacc-fortran/data-alignment.f90
new file mode 100644
index 000..3c309c0
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-fortran/data-alignment.f90
@@ -0,0 +1,35 @@
+! Test if the array data associated with c is properly aligned
+! on the accelerator.  If it is not, this program will crash.
+
+! { dg-do run }
+
+integer function routine_align()
+  implicit none
+  integer, parameter :: n = 1
+  real*8, dimension(:), allocatable :: c
+  integer :: i, idx
+
+  allocate (c(n))
+  routine_align = 0
+  c = 0.0
+
+  !$acc data copyin(idx) copy(c(1:n))
+
+  !$acc parallel vector_length(32)
+  !$acc loop vector
+  do i=1, n
+ c(i) = i
+  enddo
+  !$acc end parallel
+
+  !$acc end data
+end function routine_align
+
+
+! main driver
+program routine_align_main
+  implicit none
+  integer :: success
+  integer routine_align
+  success = routine_align()
+end program routine_align_main


[gomp4] implicit data mappings of dummy arguments

2015-09-22 Thread Cesar Philippidis
Currently, the gimplifier creates implicit firstprivate mappings for
pointer variables. That's fine except when the pointer refers to a dummy
argument, in which case the gimplifier should check the type of the
value being pointed to before deciding on the type of implicit mapping.
This patch teaches the gimplifier to do that. It fixes a bug where a
dummy array was implicitly transferred as firstprivate instead of pcopy.
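
The decision the patch changes can be modelled in a few lines of C. This
is a hypothetical, heavily simplified stand-in for the types and flags
that oacc_default_clause works with (struct type, implicit_mapping and
the enumerators are illustrative names, not GCC's tree API), and it only
models the parallel-region default used by the dummy-array test: look
through one level of reference or pointer, then map aggregates with
pcopy and everything else as firstprivate.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified type model; not GCC's tree API.  */
enum type_code { SCALAR_TYPE, ARRAY_TYPE, RECORD_TYPE, POINTER_TYPE,
                 REFERENCE_TYPE };

struct type
{
  enum type_code code;
  const struct type *target;  /* pointee for pointers and references */
};

enum implicit_mapping { IMPLICIT_FIRSTPRIVATE, IMPLICIT_PCOPY };

static int
aggregate_p (const struct type *t)
{
  return t->code == ARRAY_TYPE || t->code == RECORD_TYPE;
}

/* Like the patch, strip a reference *or* a pointer before checking
   whether the underlying type is an aggregate.  */
enum implicit_mapping
implicit_mapping (const struct type *t)
{
  if (t->code == REFERENCE_TYPE || t->code == POINTER_TYPE)
    t = t->target;
  return aggregate_p (t) ? IMPLICIT_PCOPY : IMPLICIT_FIRSTPRIVATE;
}
```

Before the change, only the REFERENCE_TYPE test existed, so a pointer to
a dummy array fell through to the firstprivate branch.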

I've committed this patch to gomp-4_0-branch.

Cesar
2015-09-22  Cesar Philippidis  

	gcc/
	* gimplify.c (oacc_default_clause): Inspect pointer types when
	determining implicit data mappings.

	libgomp/
	* testsuite/libgomp.oacc-fortran/dummy-array.f90: New test.
	* testsuite/libgomp.oacc-fortran/reference-reductions.f90: New test.

diff --git a/gcc/gimplify.c b/gcc/gimplify.c
index 914570b..6dc7df7 100644
--- a/gcc/gimplify.c
+++ b/gcc/gimplify.c
@@ -5948,7 +5948,8 @@ oacc_default_clause (struct gimplify_omp_ctx *ctx, tree decl, unsigned flags)
 	  {
 	tree type = TREE_TYPE (decl);
 
-	if (TREE_CODE (type) == REFERENCE_TYPE)
+	if (TREE_CODE (type) == REFERENCE_TYPE
+		|| POINTER_TYPE_P (type))
 	  type = TREE_TYPE (type);
 	
 	if (AGGREGATE_TYPE_P (type))
diff --git a/libgomp/testsuite/libgomp.oacc-fortran/dummy-array.f90 b/libgomp/testsuite/libgomp.oacc-fortran/dummy-array.f90
new file mode 100644
index 000..e95563c
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-fortran/dummy-array.f90
@@ -0,0 +1,28 @@
+! Ensure that dummy arrays are transferred to the accelerator
+! via an implicit pcopy.
+
+! { dg-do run } 
+
+program main
+  integer, parameter :: n = 1000
+  integer :: a(n)
+  integer :: i
+
+  a(:) = -1
+
+  call dummy_array (a, n)
+  
+  do i = 1, n
+ if (a(i) .ne. i) call abort
+  end do
+end program main
+
+subroutine dummy_array (a, n)
+  integer a(n)
+
+  !$acc parallel loop num_gangs (100) gang
+  do i = 1, n
+ a(i) = i
+  end do
+  !$acc end parallel loop
+end subroutine
diff --git a/libgomp/testsuite/libgomp.oacc-fortran/reference-reductions.f90 b/libgomp/testsuite/libgomp.oacc-fortran/reference-reductions.f90
new file mode 100644
index 000..a684d07
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-fortran/reference-reductions.f90
@@ -0,0 +1,38 @@
+! Test reductions on dummy arguments inside modules.
+
+! { dg-do run }
+
+module prm
+  implicit none
+
+contains
+
+subroutine param_reduction(var)
+  implicit none
+  integer(kind=8) :: var
+  integer  :: j,k
+
+!$acc parallel copy(var)
+!$acc loop reduction(+ : var) gang
+ do k=1,10
+!$acc loop vector reduction(+ : var)
+do j=1,100
+ var = var + 1.0
+enddo
+ enddo
+!$acc end parallel
+end subroutine param_reduction
+
+end module prm
+
+program test
+  use prm
+  implicit none
+
+  integer(8) :: r
+
+  r=10.0
+  call param_reduction (r)
+
+  if (r .ne. 1010) call abort ()
+end program test


[gomp4] remap variables inside gang, worker, vector and collapse clauses

2015-09-23 Thread Cesar Philippidis
Gang, worker, vector and collapse all contain optional arguments which
may be used during loop expansion. In OpenACC, those expressions could
contain variables, but those variables aren't always getting remapped
automatically. This patch remaps those variables inside lower_omp_loop.

Note that a tree walker isn't needed for more complicated expressions.
By the time those clauses reach lower_omp_loop, only the result of the
expression is available, so the other variables in those expressions
get remapped with everything else during omplow. Therefore, the only
problematic case is when the optional expression is just a decl,
e.g. gang(static:foo).
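
The rule described above (only a bare decl in an optional clause
operand needs remapping) can be sketched in C. The types below are
hypothetical stand-ins for OMP_CLAUSE_* codes and operands, and setting
the remapped flag stands in for build_outer_var_ref. Collapse is left
out, since, as pointed out later in this thread, its argument must be a
compile-time constant.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for clause codes and operands; not GCC's API.  */
enum clause_code { CLAUSE_GANG, CLAUSE_WORKER, CLAUSE_VECTOR,
                   CLAUSE_COLLAPSE, CLAUSE_OTHER };

struct operand
{
  int is_decl;   /* nonzero if the operand is a bare variable */
  int remapped;  /* set once it refers to the region's copy */
};

struct clause
{
  enum clause_code code;
  struct operand *ops[2];
  struct clause *next;
};

/* How many optional operands of each clause may hold a variable.  */
static int
remappable_args (enum clause_code code)
{
  switch (code)
    {
    case CLAUSE_GANG:
      return 2;   /* num: and static: */
    case CLAUSE_WORKER:
    case CLAUSE_VECTOR:
      return 1;
    default:
      return 0;   /* includes collapse, a compile-time constant */
    }
}

void
remap_loop_clauses (struct clause *c)
{
  for (; c != NULL; c = c->next)
    for (int i = 0; i < remappable_args (c->code); i++)
      {
        struct operand *op = c->ops[i];
        /* A compound expression has already been reduced to its result
           and its variables remapped elsewhere; only a bare decl is
           handled here.  */
        if (op != NULL && op->is_decl)
          op->remapped = 1;   /* stands in for build_outer_var_ref */
      }
}
```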

I've applied this patch to gomp-4_0-branch.

Cesar


Re: [gomp4] remap variables inside gang, worker, vector and collapse clauses

2015-09-23 Thread Cesar Philippidis
On 09/23/2015 10:42 AM, Cesar Philippidis wrote:

> I've applied this patch to gomp-4_0-branch.

This patch, that is.

Cesar

2015-09-23  Cesar Philippidis  

	gcc/
	* omp-low.c (lower_omp_for): Remap any variables present in
	OMP_CLAUSE_GANG, OMP_CLAUSE_WORKER, OMP_CLAUSE_VECTOR and
	OMP_CLAUSE_COLLAPSE because they will be used later by expand_omp_for.

	libgomp/
	* testsuite/libgomp.oacc-c-c++-common/gang-static-2.c: Test if
	static gang expressions containing variables work.
	* testsuite/libgomp.oacc-fortran/gang-static-1.f90: Likewise.

diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index ec76096..3f36b7a 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -11325,6 +11325,35 @@ lower_omp_for (gimple_stmt_iterator *gsi_p, omp_context *ctx)
   if (oacc_tail)
 gimple_seq_add_seq (&body, oacc_tail);
 
+  /* Update the variables inside any clauses which may be involved in loop
+ expansion later on.  */
+  for (tree c = gimple_omp_for_clauses (stmt); c; c = OMP_CLAUSE_CHAIN (c))
+{
+  int args;
+
+  switch (OMP_CLAUSE_CODE (c))
+	{
+	default:
+	  args = 0;
+	  break;
+	case OMP_CLAUSE_GANG:
+	  args = 2;
+	  break;
+	case OMP_CLAUSE_VECTOR:
+	case OMP_CLAUSE_WORKER:
+	case OMP_CLAUSE_COLLAPSE:
+	  args = 1;
+	  break;
+	}
+
+  for (int i = 0; i < args; i++)
+	{
+	  tree expr = OMP_CLAUSE_OPERAND (c, i);
+	  if (expr && DECL_P (expr))
+	OMP_CLAUSE_OPERAND (c, i) = build_outer_var_ref (expr, ctx);
+	}
+}
+
   pop_gimplify_context (new_stmt);
 
   gimple_bind_append_vars (new_stmt, ctx->block_vars);
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/gang-static-2.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/gang-static-2.c
index 3a9a508..20a866d 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/gang-static-2.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/gang-static-2.c
@@ -39,7 +39,7 @@ int
 main ()
 {
   int a[N];
-  int i;
+  int i, x;
 
 #pragma acc parallel loop gang (static:*) num_gangs (10)
   for (i = 0; i < 100; i++)
@@ -78,5 +78,21 @@ main ()
 
   test_nonstatic (a, 10);
 
+  /* Static arguments with a variable expression.  */
+
+  x = 20;
+#pragma acc parallel loop gang (static:0+x) num_gangs (10)
+  for (i = 0; i < 100; i++)
+a[i] = GANG_ID (i);
+
+  test_static (a, 10, 20);
+
+  x = 20;
+#pragma acc parallel loop gang (static:x) num_gangs (10)
+  for (i = 0; i < 100; i++)
+a[i] = GANG_ID (i);
+
+  test_static (a, 10, 20);
+
   return 0;
 }
diff --git a/libgomp/testsuite/libgomp.oacc-fortran/gang-static-1.f90 b/libgomp/testsuite/libgomp.oacc-fortran/gang-static-1.f90
index e562535..7d56060 100644
--- a/libgomp/testsuite/libgomp.oacc-fortran/gang-static-1.f90
+++ b/libgomp/testsuite/libgomp.oacc-fortran/gang-static-1.f90
@@ -3,6 +3,7 @@
 program main
   integer, parameter :: n = 100
   integer i, a(n), b(n)
+  integer x
 
   do i = 1, n
  b(i) = i
@@ -48,6 +49,23 @@ program main
 
   call test (a, b, 20, n)
 
+  x = 5
+  !$acc parallel loop gang (static:0+x) num_gangs (10)
+  do i = 1, n
+ a(i) = b(i) + 5
+  end do
+  !$acc end parallel loop
+
+  call test (a, b, 5, n)
+
+  x = 10
+  !$acc parallel loop gang (static:x) num_gangs (10)
+  do i = 1, n
+ a(i) = b(i) + 10
+  end do
+  !$acc end parallel loop
+
+  call test (a, b, 10, n)
 end program main
 
 subroutine test (a, b, sarg, n)


Re: [gomp4] remap variables inside gang, worker, vector and collapse clauses

2015-09-23 Thread Cesar Philippidis
On 09/23/2015 11:26 AM, Thomas Schwinge wrote:
> On Wed, 23 Sep 2015 10:57:40 -0700, Cesar Philippidis 
>  wrote:
>> On 09/23/2015 10:42 AM, Cesar Philippidis wrote:
>> | Gang, worker, vector and collapse all contain optional arguments which
>> | may be used during loop expansion. In OpenACC, those expressions could
>> | contain variables
> 
> I'm fairly sure that at least the collapse clause needs to be a
> compile-time constant?

Thanks, you're correct. I was looking at a user application and not the
spec when I made this change. I've applied this patch to fix that.

>> | but those variables aren't always getting remapped
>> | automatically. This patch remaps those variables inside lower_omp_loop.
> 
> Shouldn't that be done in lower_rec_input_clauses?  (Maybe I'm confused
> -- it's been a long time that I looked at this code.)  (Jakub?)

I thought that lower_rec_input_clauses was for omp reductions and
firstprivate initialization? Variables ultimately get remapped when
omplower eventually calls gimple_regimplify_operands. That function uses
the value-expr for remapping.

In this case, since lower_omp_for is responsible for GIMPLE_OMP_FOR
stmts, gimple_regimplify_operands doesn't get called on the clauses.

Cesar
2015-09-23  Cesar Philippidis  

	gcc/
	* omp-low.c (lower_omp_for): Don't remap OMP_CLAUSE_COLLAPSE
	because it is always a constant value.

diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index fa6b8a5..753996b 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -11341,7 +11341,6 @@ lower_omp_for (gimple_stmt_iterator *gsi_p, omp_context *ctx)
 	  break;
 	case OMP_CLAUSE_VECTOR:
 	case OMP_CLAUSE_WORKER:
-	case OMP_CLAUSE_COLLAPSE:
 	  args = 1;
 	  break;
 	}


Re: [gomp4] Another oacc reduction simplification

2015-09-24 Thread Cesar Philippidis
On 09/22/2015 08:29 AM, Nathan Sidwell wrote:

> 1) Don't have a fake gang reduction outside of worker & vector loops. 
> Deal with the receiver object directly.  I.e. 'ref_to_res' need not be a
> null pointer for vector and worker loops.

What happens when there is no receiver object, e.g. a reduction inside a
routine? Specifically, inside lower_oacc_reductions, you're doing this:

/* This is the outermost construct with this reduction,
   see if there's a mapping for it.  */
if (maybe_lookup_field (orig, outer))
  ref_to_res = build_receiver_ref (orig, false, outer);

That's going to ICE inside a routine.

> 2) Create a local private instance for all cases of reference var
> reductions, not just those in vector & worker loops

Good. I was about to make a similar change to fix a gang reduction bug.

Cesar



Re: [gomp4] Another oacc reduction simplification

2015-09-25 Thread Cesar Philippidis
On 09/25/2015 03:57 AM, Nathan Sidwell wrote:
> On 09/24/15 16:32, Cesar Philippidis wrote:
>> On 09/22/2015 08:29 AM, Nathan Sidwell wrote:
>>
>>> 1) Don't have a fake gang reduction outside of worker & vector loops.
>>> Deal with the receiver object directly.  I.e. 'ref_to_res' need not be a
>>> null pointer for vector and worker loops.
>>
>> What happens when there is no receiver object, e.g. a reduction inside a
>> routine? Specifically, inside lower_oacc_reductions, you're doing this:
>>
>> /* This is the outermost construct with this reduction,
>>see if there's a mapping for it.  */
>> if (maybe_lookup_field (orig, outer))
>>   ref_to_res = build_receiver_ref (orig, false, outer);
>>
>> That's going to ICE inside a routine.
> 
> Is it?  the 'maybe_lookup' should protect against that.  do you have a
> testcase?

See gcc/testsuite/c-c++-common/goacc/routine-7.c.

Cesar


[gomp4] error on acc loops not associated with offloaded acc regions

2015-09-28 Thread Cesar Philippidis
I've applied this patch to gomp-4_0-branch. It teaches omplower to report
an error when it detects acc loops that aren't nested inside an acc
parallel or kernels region or located within a function marked as an acc
routine. A couple of test cases needed to be updated.
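
The new check boils down to a single predicate. The following C sketch
mirrors the condition in the patch using hypothetical stand-ins
(struct omp_ctx for the enclosing construct, NULL if there is none, and
struct function_info for whether the current function carries an acc
routine attribute); it is an illustration, not the omp-low.c code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for what check_omp_nesting_restrictions
   consults.  */
struct omp_ctx { int region_kind; };
struct function_info { bool is_acc_routine; };

/* Return false (the patch emits an error at this point) for an OpenACC
   loop that has no enclosing construct at all and is not inside a
   function marked as an acc routine.  */
bool
acc_loop_placement_ok (bool is_oacc, const struct omp_ctx *ctx,
                       const struct function_info *fn)
{
  if (is_oacc && ctx == NULL && !fn->is_acc_routine)
    return false;
  return true;
}
```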

The error message is kind of long. Let me know if it should be revised.

Cesar
2015-09-28  Cesar Philippidis  

	gcc/
	* omp-low.c (check_omp_nesting_restrictions): Check for acc loops not
	associated with acc regions or routines.

	gcc/testsuite/
	* c-c++-common/goacc/non-routine.c: New test.
	* c-c++-common/goacc-gomp/nesting-1.c: Add checks for invalid loop
	nesting.
	* c-c++-common/goacc-gomp/nesting-fail-1.c: Likewise.
	* c-c++-common/goacc/clauses-fail.c: Likewise.
	* c-c++-common/goacc/sb-1.c: Likewise.
	* c-c++-common/goacc/sb-3.c: Likewise.
	* gcc.dg/goacc/sb-1.c: Likewise.
	* gcc.dg/goacc/sb-3.c: Likewise.


diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index 99b3939..2329a71 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -2901,6 +2901,14 @@ check_omp_nesting_restrictions (gimple *stmt, omp_context *ctx)
 	}
 	  return true;
 	}
+  if (is_gimple_omp_oacc (stmt) && ctx == NULL
+	  && get_oacc_fn_attrib (current_function_decl) == NULL)
+	{
+	  error_at (gimple_location (stmt),
+		"acc loops must be associated with an acc region or "
+		"routine");
+	  return false;
+	}
   /* FALLTHRU */
 case GIMPLE_CALL:
   if (is_gimple_call (stmt)
diff --git a/gcc/testsuite/c-c++-common/goacc-gomp/nesting-1.c b/gcc/testsuite/c-c++-common/goacc-gomp/nesting-1.c
index b38e181..75d6a1d 100644
--- a/gcc/testsuite/c-c++-common/goacc-gomp/nesting-1.c
+++ b/gcc/testsuite/c-c++-common/goacc-gomp/nesting-1.c
@@ -20,6 +20,7 @@ f_acc_kernels (void)
   }
 }
 
+#pragma acc routine
 void
 f_acc_loop (void)
 {
diff --git a/gcc/testsuite/c-c++-common/goacc-gomp/nesting-fail-1.c b/gcc/testsuite/c-c++-common/goacc-gomp/nesting-fail-1.c
index 14c6aa6..6d91484 100644
--- a/gcc/testsuite/c-c++-common/goacc-gomp/nesting-fail-1.c
+++ b/gcc/testsuite/c-c++-common/goacc-gomp/nesting-fail-1.c
@@ -361,72 +361,72 @@ f_acc_data (void)
 void
 f_acc_loop (void)
 {
-#pragma acc loop
+#pragma acc loop /* { dg-error "acc loops must be associated with an acc region or routine" } */
   for (i = 0; i < 2; ++i)
 {
-#pragma omp parallel /* { dg-error "non-OpenACC construct inside of OpenACC region" } */
+#pragma omp parallel
   ;
 }
 
-#pragma acc loop
+#pragma acc loop /* { dg-error "acc loops must be associated with an acc region or routine" } */
   for (i = 0; i < 2; ++i)
 {
-#pragma omp for /* { dg-error "non-OpenACC construct inside of OpenACC region" } */
+#pragma omp for
   for (i = 0; i < 3; i++)
 	;
 }
 
-#pragma acc loop
+#pragma acc loop /* { dg-error "acc loops must be associated with an acc region or routine" } */
   for (i = 0; i < 2; ++i)
 {
-#pragma omp sections /* { dg-error "non-OpenACC construct inside of OpenACC region" } */
+#pragma omp sections
   {
 	;
   }
 }
 
-#pragma acc loop
+#pragma acc loop /* { dg-error "acc loops must be associated with an acc region or routine" } */
   for (i = 0; i < 2; ++i)
 {
-#pragma omp single /* { dg-error "non-OpenACC construct inside of OpenACC region" } */
+#pragma omp single
   ;
 }
 
-#pragma acc loop
+#pragma acc loop /* { dg-error "acc loops must be associated with an acc region or routine" } */
   for (i = 0; i < 2; ++i)
 {
-#pragma omp task /* { dg-error "non-OpenACC construct inside of OpenACC region" } */
+#pragma omp task
   ;
 }
 
-#pragma acc loop
+#pragma acc loop /* { dg-error "acc loops must be associated with an acc region or routine" } */
   for (i = 0; i < 2; ++i)
 {
-#pragma omp master /* { dg-error "non-OpenACC construct inside of OpenACC region" } */
+#pragma omp master
   ;
 }
 
-#pragma acc loop
+#pragma acc loop /* { dg-error "acc loops must be associated with an acc region or routine" } */
   for (i = 0; i < 2; ++i)
 {
-#pragma omp critical /* { dg-error "non-OpenACC construct inside of OpenACC region" } */
+#pragma omp critical
   ;
 }
 
-#pragma acc loop
+#pragma acc loop /* { dg-error "acc loops must be associated with an acc region or routine" } */
   for (i = 0; i < 2; ++i)
 {
-#pragma omp ordered /* { dg-error "non-OpenACC construct inside of OpenACC region" } */
+#pragma omp ordered
   ;
 }
 
-#pragma acc loop
+#pragma acc loop /* { dg-error "acc loops must be associated with an acc region or routine" } */
   for (i = 0; i < 2; ++i)
 {
-#pragma omp target /* { dg-error "non-OpenACC construct inside of OpenACC region" } */
+#pragma omp target
   ;
-#pragma omp target data /* { dg-error "non-OpenACC construct i

Re: OpenACC subarray data alignment in fortran

2015-09-29 Thread Cesar Philippidis
Ping.

In the meantime, I'll apply this patch to gomp-4_0-branch.

Cesar

On 09/22/2015 08:24 AM, Cesar Philippidis wrote:
> In both OpenACC and OpenMP, each subarray has at least two data mappings
> associated with it, one for the pointer and another for the data in
> the array section (Fortran also has a pset mapping). One problem I
> observed in Fortran is that array section data is cast to char *.
> Consequently, when lower_omp_target assigns alignment for the subarray
> data, it does so incorrectly. This is a problem on nvptx if you have a
> data clause such as
> 
>   integer foo
>   real*8 bar (100)
> 
>   !$acc data copy (foo, bar(1:100))
> 
> Here, the data associated with bar could get aligned on a 4-byte
> boundary instead of an 8-byte boundary. That causes problems on nvptx targets.
> 
> My fix for this is to prevent the Fortran front end from casting the
> data pointers to char *. I only removed the cast in the code that
> handles OMP_CLAUSE_MAP. The subarrays associated with OMP_CLAUSE_SHARED
> are also cast to char *, but I left those as-is because I'm not that
> familiar with how non-OpenMP target regions get lowered.
> 
> Is this patch OK for trunk?
> 
> Thanks,
> Cesar
> 



Re: [gomp4] error on acc loops not associated with offloaded acc regions

2015-09-29 Thread Cesar Philippidis
On 09/29/2015 02:48 AM, Thomas Schwinge wrote:

> On Mon, 28 Sep 2015 10:08:34 -0700, Cesar Philippidis 
>  wrote:
>> I've applied this patch to gomp-4_0-branch which teaches omplower how to
>> error when it detects acc loops which aren't nested inside an acc
>> parallel or kernels region or located within a function marked as an acc
>> routine. A couple of test cases needed to be updated.
>>
>> The error message is kind of long. Let me know if it should be revised.
> 
>>  gcc/testsuite/
>>  * c-c++-common/goacc/non-routine.c: New test.
>>  * c-c++-common/goacc-gomp/nesting-1.c: Add checks for invalid loop
>>  nesting.
>>  * c-c++-common/goacc-gomp/nesting-fail-1.c: Likewise.
>>  * c-c++-common/goacc/clauses-fail.c: Likewise.
>>  * c-c++-common/goacc/sb-1.c: Likewise.
>>  * c-c++-common/goacc/sb-3.c: Likewise.
>>  * gcc.dg/goacc/sb-1.c: Likewise.
>>  * gcc.dg/goacc/sb-3.c: Likewise.
> 
> What about any Fortran test cases?

My first thought was that we didn't need one because this is generic
error handling in omplow, and there are already a lot of C test cases
exercising it. However, a Fortran test can't hurt, so I added one in this
new patch. Note that I had to create a new test instead of hijacking an
existing one, because the Fortran front end bails out when it detects
errors before it hands anything over to omplow, and the existing tests
had a bunch of expected front end errors.

>> --- a/gcc/omp-low.c
>> +++ b/gcc/omp-low.c
>> @@ -2901,6 +2901,14 @@ check_omp_nesting_restrictions (gimple *stmt, omp_context *ctx)
>>  }
>>return true;
>>  }
>> +  if (is_gimple_omp_oacc (stmt) && ctx == NULL
>> +  && get_oacc_fn_attrib (current_function_decl) == NULL)
>> +{
>> +  error_at (gimple_location (stmt),
>> +"acc loops must be associated with an acc region or "
>> +"routine");
>> +  return false;
>> +}
>>/* FALLTHRU */
>>  case GIMPLE_CALL:
>>if (is_gimple_call (stmt)
> 
> I see that the error reporting doesn't really use a consistent style
> currently, but what about something like "loop directive must be
> associated with compute region" (where "compute region" is the language
> used by OpenACC 2.0a to mean the structured block associated with a
> compute construct as well as routine directive)?

That sounds reasonable, but it's not much shorter.

>> --- a/gcc/testsuite/c-c++-common/goacc-gomp/nesting-1.c
>> +++ b/gcc/testsuite/c-c++-common/goacc-gomp/nesting-1.c
>> @@ -20,6 +20,7 @@ f_acc_kernels (void)
>>}
>>  }
>>  
>> +#pragma acc routine
>>  void
>>  f_acc_loop (void)
>>  {
> 
> OK, but...
> 
>> --- a/gcc/testsuite/c-c++-common/goacc-gomp/nesting-fail-1.c
>> +++ b/gcc/testsuite/c-c++-common/goacc-gomp/nesting-fail-1.c
>> @@ -361,72 +361,72 @@ f_acc_data (void)
>>  void
>>  f_acc_loop (void)
>>  {
>> -#pragma acc loop
>> +#pragma acc loop /* { dg-error "acc loops must be associated with an acc region or routine" } */
>>for (i = 0; i < 2; ++i)
>>  {
>> -#pragma omp parallel /* { dg-error "non-OpenACC construct inside of OpenACC region" } */
>> +#pragma omp parallel
>>;
>>  }
> 
> ... here you're changing what this is meant to be testing, so please
> restore the original meaning (by adding "#pragma acc routine" to this
> function, I suppose), and then perhaps add whichever additional test
> cases you deem necessary.

I was wondering about that too. After thinking about it some more, I did
as you suggested: reverted those changes and used a routine pragma.

>> --- /dev/null
>> +++ b/gcc/testsuite/c-c++-common/goacc/non-routine.c
>> @@ -0,0 +1,16 @@
>> +/* This program validates the behavior of acc loops which are
>> +   not associated with a parallel or kernels region or routine.  */
> 
> :-) Thanks for adding such a comment -- this is missing in too many test
> cases.

We definitely need more of them. I'm now starting to forget what I was
trying to test several months ago.

I'll apply this patch to gomp4.

Cesar

2015-09-29  Cesar Philippidis  

	gcc/
	* omp-low.c (check_omp_nesting_restrictions): Update the error
	message for loops not affiliated with acc compute regions.

	gcc/testsuite/
	* c-c++-common/goacc-gomp/nesting-fail-1.c (f_omp): Revert changes and
	mark the function as an acc routine.
	* c-c++-common/goacc/clauses-fail.c: Likewise.
	* c-c++-commo

[gomp4] tile clause asterisk argument

2015-09-30 Thread Cesar Philippidis
This patch fixes a Fortran ICE when a tile clause contains an asterisk.
The problem was that the asterisk argument is represented by a NULL
expression, which caused problems when the code is translated into
GIMPLE. The fix is to convert those NULL expressions into -1
expressions late, since that's what the C and C++ front ends do.
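
The conversion can be sketched in plain C. struct tile_arg and
resolve_tile_args are hypothetical names; a NULL value stands for '*',
which becomes -1 (the encoding the message attributes to the C and C++
front ends), and explicit values are kept but counted when they are not
positive, matching the "must be positive" warning exercised in
loop-5.f95.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tile argument: a NULL VALUE means the user wrote '*'.  */
struct tile_arg
{
  const long *value;
  long resolved;
};

/* Resolve tile arguments in place.  '*' becomes -1; explicit values are
   kept.  Returns how many explicit values are not positive.  */
int
resolve_tile_args (struct tile_arg *args, size_t n)
{
  int nonpositive = 0;
  for (size_t i = 0; i < n; i++)
    {
      if (args[i].value == NULL)
        {
          args[i].resolved = -1;
          continue;
        }
      args[i].resolved = *args[i].value;
      if (*args[i].value <= 0)
        nonpositive++;
    }
  return nonpositive;
}
```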

It looks like there is a lot of existing test coverage for the tile
clause. However, this ICE isn't triggered if there are parser errors.
The new test does contain some deliberate errors, but I included them to
test for invalid nesting which gets triggered in omplow.

I've applied this patch to gomp-4_0-branch.

Cesar
2015-09-30  Cesar Philippidis  

	gcc/fortran/
	* openmp.c (resolve_oacc_loop_blocks): Represent asterisk tile
	arguments as -1.

	gcc/testsuite/
	* gfortran.dg/goacc/loop-5.f95: New test.

diff --git a/gcc/fortran/openmp.c b/gcc/fortran/openmp.c
index 0bdbb73..c42a2c2 100644
--- a/gcc/fortran/openmp.c
+++ b/gcc/fortran/openmp.c
@@ -4891,10 +4891,21 @@ resolve_oacc_loop_blocks (gfc_code *code)
 	{
 	  num++;
 	  if (el->expr == NULL)
-	continue;
-	  resolve_oacc_positive_int_expr (el->expr, "TILE");
-	  if (el->expr->expr_type != EXPR_CONSTANT)
-	gfc_error ("TILE requires constant expression at %L", &code->loc);
+	{
+	  /* NULL expressions are used to represent '*' arguments.
+		 Convert those to a -1 expressions.  */
+	  el->expr = gfc_get_constant_expr (BT_INTEGER,
+		gfc_default_integer_kind,
+		&code->loc);
+	  mpz_set_si (el->expr->value.integer, -1);
+	}
+	  else
+	{
+	  resolve_oacc_positive_int_expr (el->expr, "TILE");
+	  if (el->expr->expr_type != EXPR_CONSTANT)
+		gfc_error ("TILE requires constant expression at %L",
+			   &code->loc);
+	}
 	}
   resolve_oacc_nested_loops (code, code->block->next, num, "tiled");
 }
diff --git a/gcc/testsuite/gfortran.dg/goacc/loop-5.f95 b/gcc/testsuite/gfortran.dg/goacc/loop-5.f95
new file mode 100644
index 000..c2db090
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/goacc/loop-5.f95
@@ -0,0 +1,429 @@
+! { dg-do compile }
+! { dg-additional-options "-fmax-errors=100" }
+
+! TODO: nested kernels are allowed in 2.0
+
+program test
+  implicit none
+  integer :: i, j
+
+  !$acc kernels
+!$acc loop auto
+DO i = 1,10
+ENDDO
+!$acc loop gang
+DO i = 1,10
+ENDDO
+!$acc loop gang(5)
+DO i = 1,10
+ENDDO
+!$acc loop gang(num:5)
+DO i = 1,10
+ENDDO
+!$acc loop gang(static:5)
+DO i = 1,10
+ENDDO
+!$acc loop gang(static:*)
+DO i = 1,10
+ENDDO
+!$acc loop gang
+DO i = 1,10
+  !$acc loop vector
+  DO j = 1,10
+  ENDDO
+  !$acc loop worker
+  DO j = 1,10
+  ENDDO
+ENDDO
+
+!$acc loop worker
+DO i = 1,10
+ENDDO
+!$acc loop worker(5)
+DO i = 1,10
+ENDDO
+!$acc loop worker(num:5)
+DO i = 1,10
+ENDDO
+!$acc loop worker
+DO i = 1,10
+  !$acc loop vector
+  DO j = 1,10
+  ENDDO
+ENDDO
+!$acc loop gang worker
+DO i = 1,10
+ENDDO
+
+!$acc loop vector
+DO i = 1,10
+ENDDO
+!$acc loop vector(5)
+DO i = 1,10
+ENDDO
+!$acc loop vector(length:5)
+DO i = 1,10
+ENDDO
+!$acc loop vector
+DO i = 1,10
+ENDDO
+!$acc loop gang vector
+DO i = 1,10
+ENDDO
+!$acc loop worker vector
+DO i = 1,10
+ENDDO
+
+!$acc loop auto
+DO i = 1,10
+ENDDO
+
+!$acc loop tile(1)
+DO i = 1,10
+ENDDO
+!$acc loop tile(2)
+DO i = 1,10
+ENDDO
+!$acc loop tile(6-2)
+DO i = 1,10
+ENDDO
+!$acc loop tile(6+2)
+DO i = 1,10
+ENDDO
+!$acc loop tile(*)
+DO i = 1,10
+ENDDO
+!$acc loop tile(*, 1)
+DO i = 1,10
+  DO j = 1,10
+  ENDDO
+ENDDO
+!$acc loop tile(-1) ! { dg-warning "must be positive" }
+do i = 1,10
+enddo
+!$acc loop vector tile(*)
+DO i = 1,10
+ENDDO
+!$acc loop worker tile(*)
+DO i = 1,10
+ENDDO
+!$acc loop gang tile(*)
+DO i = 1,10
+ENDDO
+!$acc loop vector gang tile(*)
+DO i = 1,10
+ENDDO
+!$acc loop vector worker tile(*)
+DO i = 1,10
+ENDDO
+!$acc loop gang worker tile(*)
+DO i = 1,10
+ENDDO
+  !$acc end kernels
+
+
+  !$acc parallel
+!$acc loop auto
+DO i = 1,10
+ENDDO
+!$acc loop gang
+DO i = 1,10
+ENDDO
+!$acc loop gang(static:5)
+DO i = 1,10
+ENDDO
+!$acc loop gang(static:*)
+DO i = 1,10
+ENDDO
+!$acc loop gang
+DO i = 1,10
+  !$acc loop vector
+  DO j = 1,10
+  ENDDO
+  !$acc loop worker
+  DO j = 1,10
+  ENDDO
+ENDDO
+
+!$acc loop worker
+DO i = 1,10
+ENDDO
+!$acc loop worker
+DO i = 1,10
+  !$acc loop vector
+  DO j = 1,10
+  E

[gomp4] handle missing OMP_LIST_ clauses in fortran's parse tree debugger

2015-10-01 Thread Cesar Philippidis
While debugging gfortran with -fdump-fortran-*, I noticed that a couple
of OMP_LIST_ entries weren't being handled by show_omp_clauses, so I've
added them. I also took the opportunity to rearrange the cases in the
switch statement that handles those lists so that they match the enum
in gfortran.h, because I couldn't figure out how they were ordered
before.
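
As an aside, one way to keep such a name mapping tied to the enum,
rather than relying on case order, is a table built with designated
initializers. This is a swapped-in technique, not what the patch does:
the enum below is a hypothetical subset of the OMP_LIST_* values, and
designated initializers are C99, which the C++ compilers used to build
GCC at the time did not accept, so treat it as an illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical subset of gfortran's OMP_LIST_* enum.  */
enum list_type
{
  LIST_PRIVATE,
  LIST_FIRSTPRIVATE,
  LIST_SHARED,
  LIST_MAP,
  LIST_USE_DEVICE,
  LIST_CACHE,
  LIST_NUM
};

/* Each name is bound to its enumerator, so reordering the enum cannot
   silently mismatch the table.  */
static const char *const list_names[LIST_NUM] = {
  [LIST_PRIVATE] = "PRIVATE",
  [LIST_FIRSTPRIVATE] = "FIRSTPRIVATE",
  [LIST_SHARED] = "SHARED",
  [LIST_MAP] = "MAP",
  [LIST_USE_DEVICE] = "USE_DEVICE",
  [LIST_CACHE] = "CACHE",
};

const char *
list_type_name (enum list_type t)
{
  /* Plays the role of gcc_unreachable for unknown entries.  */
  assert (t < LIST_NUM && list_names[t] != NULL);
  return list_names[t];
}
```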

I've applied this patch to gomp-4_0-branch.

Cesar
2015-10-01  Cesar Philippidis  

	gcc/fortran/
	* dump-parse-tree.c (show_omp_clauses): Add missing omp list_types
	and reorder the switch cases to match the enum in gfortran.h.
	

diff --git a/gcc/fortran/dump-parse-tree.c b/gcc/fortran/dump-parse-tree.c
index 48476af..3e5ac17 100644
--- a/gcc/fortran/dump-parse-tree.c
+++ b/gcc/fortran/dump-parse-tree.c
@@ -1251,19 +1251,24 @@ show_omp_clauses (gfc_omp_clauses *omp_clauses)
 	const char *type = NULL;
 	switch (list_type)
 	  {
-	  case OMP_LIST_USE_DEVICE: type = "USE_DEVICE"; break;
-	  case OMP_LIST_DEVICE_RESIDENT: type = "USE_DEVICE"; break;
-	  case OMP_LIST_CACHE: type = ""; break;
 	  case OMP_LIST_PRIVATE: type = "PRIVATE"; break;
 	  case OMP_LIST_FIRSTPRIVATE: type = "FIRSTPRIVATE"; break;
 	  case OMP_LIST_LASTPRIVATE: type = "LASTPRIVATE"; break;
-	  case OMP_LIST_SHARED: type = "SHARED"; break;
+	  case OMP_LIST_COPYPRIVATE: type = "COPYPRIVATE"; break;
+	  case OMP_LIST_SHARED: type = "SHARE"; break;
 	  case OMP_LIST_COPYIN: type = "COPYIN"; break;
 	  case OMP_LIST_UNIFORM: type = "UNIFORM"; break;
 	  case OMP_LIST_ALIGNED: type = "ALIGNED"; break;
 	  case OMP_LIST_LINEAR: type = "LINEAR"; break;
-	  case OMP_LIST_REDUCTION: type = "REDUCTION"; break;
 	  case OMP_LIST_DEPEND: type = "DEPEND"; break;
+	  case OMP_LIST_MAP: type = "MAP"; break;
+	  case OMP_LIST_TO: type = "TO"; break;
+	  case OMP_LIST_FROM: type = "FROM"; break;
+	  case OMP_LIST_REDUCTION: type = "REDUCTION"; break;
+	  case OMP_LIST_DEVICE_RESIDENT: type = "DEVICE_RESIDENT"; break;
+	  case OMP_LIST_LINK: type = "LINK"; break;
+	  case OMP_LIST_USE_DEVICE: type = "USE_DEVICE"; break;
+	  case OMP_LIST_CACHE: type = "CACHE"; break;
 	  default:
 	gcc_unreachable ();
 	  }


[gomp4] privatize internal array variables introduced by the fortran FE

2015-10-13 Thread Cesar Philippidis
Arrays in fortran have a couple of internal variables associated with
them, e.g. stride, lbound, ubound, size, etc. Depending on how and where
the array was declared, these internal variables may be packed inside an
array descriptor represented by a struct or defined individually. The
major problem with this is that kernels and parallel regions with
default(none) will generate errors if those internal variables are
defined individually since the user has no way to add clauses to them. I
suspect this is also true for arrays inside omp target regions.

My fix for this involves two parts. First, I reinitialize those private
array variables which aren't associated with array descriptors at the
beginning of the parallel/kernels region they are used in. Second, I
added OMP_CLAUSE_PRIVATE for those internal variables.
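Below is a rough model of the first part of the fix. It is not the FE's actual code, and the struct and functions are hypothetical. The derived values are rebuilt from the declared bounds inside the region instead of being mapped in from the host.

```c
#include <assert.h>

/* Hypothetical model of the internal variables gfortran can create for an
   explicit-shape array without a descriptor, e.g. REAL A(LB:UB).  The
   field names are illustrative, not the FE's actual decl names.  */
struct nodesc_bounds
{
  long lbound, ubound;	/* Declared bounds, from user-visible variables.  */
  long size;		/* Derived: number of elements.  */
  long offset;		/* Derived: added to a Fortran index.  */
};

/* Recompute the derived values from the declared bounds.  Doing this at
   the start of the offloaded region means SIZE and OFFSET never need a
   data clause or a host-to-device copy.  */
static void
reinit_bounds (struct nodesc_bounds *b)
{
  b->size = b->ubound >= b->lbound ? b->ubound - b->lbound + 1 : 0;
  b->offset = -b->lbound;
}

/* Zero-based element position of Fortran index I.  */
static long
linear_index (const struct nodesc_bounds *b, long i)
{
  assert (i >= b->lbound && i <= b->ubound);
  return i + b->offset;
}
```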

I'll apply this patch to gomp-4_0-branch shortly.

Is there any reason why only certain arrays have array descriptors? The
arrays with descriptors don't have this problem. It's only the ones
without descriptors that leak new internal variables that cause errors
with default(none).

Cesar
2015-10-13  Cesar Philippidis  

	gcc/fortran/
	* trans-array.c (gfc_trans_array_bounds): Add an INIT_VLA argument
	to control whether VLAs should be initialized.  Don't mark this
	function as static.
	(gfc_trans_auto_array_allocation): Update call to
	gfc_trans_array_bounds.
	(gfc_trans_g77_array): Likewise.
	* trans-array.h: Declare gfc_trans_array_bounds.
	* trans-openmp.c (gfc_scan_nodesc_arrays): New function.
	(gfc_privatize_nodesc_arrays_1): New function.
	(gfc_privatize_nodesc_arrays): New function.
	(gfc_init_nodesc_arrays): New function.
	(gfc_trans_oacc_construct): Initialize any internal variables for
	arrays without array descriptors inside the offloaded parallel and
	kernels region.
	(gfc_trans_oacc_combined_directive): Likewise.

	gcc/testsuite/
	* gfortran.dg/goacc/default_none.f95: New test.

diff --git a/gcc/fortran/trans-array.c b/gcc/fortran/trans-array.c
index a6b761b..86f983a 100644
--- a/gcc/fortran/trans-array.c
+++ b/gcc/fortran/trans-array.c
@@ -5709,9 +5709,9 @@ gfc_trans_array_cobounds (tree type, stmtblock_t * pblock,
 /* Generate code to evaluate non-constant array bounds.  Sets *poffset and
returns the size (in elements) of the array.  */
 
-static tree
+tree
 gfc_trans_array_bounds (tree type, gfc_symbol * sym, tree * poffset,
-stmtblock_t * pblock)
+stmtblock_t * pblock, bool init_vla)
 {
   gfc_array_spec *as;
   tree size;
@@ -5788,7 +5788,9 @@ gfc_trans_array_bounds (tree type, gfc_symbol * sym, tree * poffset,
 }
 
   gfc_trans_array_cobounds (type, pblock, sym);
-  gfc_trans_vla_type_sizes (sym, pblock);
+
+  if (init_vla)
+gfc_trans_vla_type_sizes (sym, pblock);
 
   *poffset = offset;
   return size;
@@ -5852,7 +5854,7 @@ gfc_trans_auto_array_allocation (tree decl, gfc_symbol * sym,
   && !INTEGER_CST_P (sym->ts.u.cl->backend_decl))
 gfc_conv_string_length (sym->ts.u.cl, NULL, &init);
 
-  size = gfc_trans_array_bounds (type, sym, &offset, &init);
+  size = gfc_trans_array_bounds (type, sym, &offset, &init, true);
 
   /* Don't actually allocate space for Cray Pointees.  */
   if (sym->attr.cray_pointee)
@@ -5947,7 +5949,7 @@ gfc_trans_g77_array (gfc_symbol * sym, gfc_wrapped_block * block)
 gfc_conv_string_length (sym->ts.u.cl, NULL, &init);
 
   /* Evaluate the bounds of the array.  */
-  gfc_trans_array_bounds (type, sym, &offset, &init);
+  gfc_trans_array_bounds (type, sym, &offset, &init, true);
 
   /* Set the offset.  */
   if (TREE_CODE (GFC_TYPE_ARRAY_OFFSET (type)) == VAR_DECL)
diff --git a/gcc/fortran/trans-array.h b/gcc/fortran/trans-array.h
index 52f1c9a..8dbafb9 100644
--- a/gcc/fortran/trans-array.h
+++ b/gcc/fortran/trans-array.h
@@ -44,6 +44,8 @@ void gfc_trans_g77_array (gfc_symbol *, gfc_wrapped_block *);
 /* Generate code to deallocate an array, if it is allocated.  */
 tree gfc_trans_dealloc_allocated (tree, bool, gfc_expr *);
 
+tree gfc_trans_array_bounds (tree, gfc_symbol *, tree *, stmtblock_t *, bool);
+
 tree gfc_full_array_size (stmtblock_t *, tree, int);
 
 tree gfc_duplicate_allocatable (tree, tree, tree, int, tree);
diff --git a/gcc/fortran/trans-openmp.c b/gcc/fortran/trans-openmp.c
index 8c1e897..f2e9803 100644
--- a/gcc/fortran/trans-openmp.c
+++ b/gcc/fortran/trans-openmp.c
@@ -39,6 +39,8 @@ along with GCC; see the file COPYING3.  If not see
 #include "arith.h"
 #include "omp-low.h"
 #include "gomp-constants.h"
+#include "hash-set.h"
+#include "tree-iterator.h"
 
 int ompws_flags;
 
@@ -2716,22 +2718,157 @@ gfc_trans_omp_code (gfc_code *code, bool force_empty)
   return stmt;
 }
 
+void gfc_debug_expr (gfc_expr *);
+
+/* Add any array that does not have an array descriptor to the hash_set
+   pointed to by DATA.  */
+
+static in

Re: [gomp4] privatize internal array variables introduced by the fortran FE

2015-10-13 Thread Cesar Philippidis
On 10/13/2015 01:29 PM, Jakub Jelinek wrote:
> On Tue, Oct 13, 2015 at 01:12:25PM -0700, Cesar Philippidis wrote:
>> Arrays in fortran have a couple of internal variables associated with
>> them, e.g. stride, lbound, ubound, size, etc. Depending on how and where
>> the array was declared, these internal variables may be packed inside an
>> array descriptor represented by a struct or defined individually. The
>> major problem with this is that kernels and parallel regions with
>> default(none) will generate errors if those internal variables are
>> defined individually since the user has no way to add clauses to them. I
>> suspect this is also true for arrays inside omp target regions.
> 
> I believe gfc_omp_predetermined_sharing is supposed to handle this,
> returning predetermined shared for certain DECL_ARTIFICIAL decls.
> If you are not using that hook, perhaps you should have similar one tuned
> for OpenACC purposes?

We do have one for OpenACC. I thought its job was to mark variables as
firstprivate or pcopy as necessary. Anyway, calling
gfc_omp_predetermined_sharing from the gimplifier might be too late from
a performance standpoint. Consider something like this:

  !$acc data copy (array)
  do i = 1,n
    !$acc parallel loop
    do j = 1,n
      ...array...
    end do
  end do
  !$acc end data

The problem here is that all of those internal variables would end up
getting marked as firstprivate. And that would cause more data to be
transferred to the accelerator. This patch reinitializes those variables
on the accelerator so they don't have to be transferred at all.
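For readers more familiar with C, here is a hypothetical C analogue of that loop nest that shows the launch structure. The data region surrounds N separate launches of the parallel region, so anything made firstprivate for the inner region would be copied N times. C arrays have no descriptor-related internal variables, so this only illustrates the transfer pattern. Without -fopenacc the pragmas are ignored and the code runs serially on the host.

```c
#include <assert.h>

/* Add the row index I to every element of row I of an N x N array.
   Under OpenACC, ARRAY stays on the device for the whole data region
   while the inner parallel region is launched once per row.  */
static void
add_row_index (int n, int *array)
{
  assert (n >= 0);

#pragma acc data copy (array[0:n * n])
  for (int i = 0; i < n; i++)
    {
#pragma acc parallel loop
      for (int j = 0; j < n; j++)
	array[i * n + j] += i;
    }
}
```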

Cesar


Re: [gomp4 03/14] nvptx: expand support for address spaces

2015-10-20 Thread Cesar Philippidis
On 10/20/2015 02:13 PM, Bernd Schmidt wrote:
> On 10/20/2015 11:04 PM, Alexander Monakov wrote:
>> On Tue, 20 Oct 2015, Bernd Schmidt wrote:
>>
>>> On 10/20/2015 08:34 PM, Alexander Monakov wrote:
>>>> This allows to emit decls in 'shared' memory from the middle-end.
>>>>
>>>>   * config/nvptx/nvptx.c (nvptx_legitimate_address_p): Adjust
>>>>   prototype.
>>>>   (nvptx_section_for_decl): If type of decl has a specific address
>>>>   space, return it.
>>>>   (nvptx_addr_space_from_address): Ditto.
>>>>   (TARGET_ADDR_SPACE_POINTER_MODE): Define.
>>>>   (TARGET_ADDR_SPACE_ADDRESS_MODE): Ditto.
>>>>   (TARGET_ADDR_SPACE_SUBSET_P): Ditto.
>>>>   (TARGET_ADDR_SPACE_CONVERT): Ditto.
>>>>   (TARGET_ADDR_SPACE_LEGITIMATE_ADDRESS_P): Ditto.
>>>
>>> Not a fan of this I'm afraid. I used to have address space support in
>>> the nvptx backend, but the middle-end was too broken for it to work, so I
>>> made nvptx deal with all the address space complications internally. Is
>>> there a reason why this approach can't work for what you want to do?
>>> (Also, where are you using this?)
>>
>> It is used in patch 06/14, to copy omp_data_o to shared memory.  I
>> don't see any other sane approach.
> 
> There is an alternative - decorate anything you'd like to go to shared
> memory with a special attribute, then handle that attribute in
> nvptx_addr_space_from_address and nvptx_section_for_decl. I actually
> made such a patch for Cesar a while ago, maybe he still has it?
> 
> This would avoid the pitfalls with gcc's middle-end address space
> handling, and the #ifdef ADDR_SPACE_SHARED in patch 6 which is a bit ugly.

Was it this one that you're referring to, Bernd? I think this is the
patch that introduces the "oacc ganglocal" attribute. It has bitrotted
significantly, though.

Regardless, keep in mind that we're abandoning dynamically allocated
shared memory in gcc 6.0. Right now in gomp-4_0-branch the two use cases
for shared memory are spill-and-fill for worker variable broadcasting
and worker reductions.

What are you planning on using shared memory for? It's an extremely
limited resource and it has some quirks.

Cesar
Index: gcc/cgraphunit.c
===
--- gcc/cgraphunit.c	(revision 224547)
+++ gcc/cgraphunit.c	(working copy)
@@ -2171,6 +2171,23 @@ ipa_passes (void)
   execute_ipa_pass_list (passes->all_small_ipa_passes);
   if (seen_error ())
 	return;
+
+  if (g->have_offload)
+	{
+	  extern void write_offload_lto ();
+	  section_name_prefix = OFFLOAD_SECTION_NAME_PREFIX;
+	  write_offload_lto ();
+	}
+}
+  bool do_local_opts = !in_lto_p;
+#ifdef ACCEL_COMPILER
+  do_local_opts = true;
+#endif
+  if (do_local_opts)
+{
+  execute_ipa_pass_list (passes->all_local_opt_passes);
+  if (seen_error ())
+	return;
 }
 
   /* This extra symtab_remove_unreachable_nodes pass tends to catch some
@@ -2182,7 +2199,7 @@ ipa_passes (void)
   if (symtab->state < IPA_SSA)
 symtab->state = IPA_SSA;
 
-  if (!in_lto_p)
+  if (do_local_opts)
 {
   /* Generate coverage variables and constructors.  */
   coverage_finish ();
@@ -2285,6 +2302,14 @@ symbol_table::compile (void)
   if (seen_error ())
 return;
 
+#ifdef ACCEL_COMPILER
+  {
+cgraph_node *node;
+FOR_EACH_DEFINED_FUNCTION (node)
+  node->get_untransformed_body ();
+  }
+#endif
+
 #ifdef ENABLE_CHECKING
   symtab_node::verify_symtab_nodes ();
 #endif
Index: gcc/config/nvptx/nvptx.c
===
--- gcc/config/nvptx/nvptx.c	(revision 224547)
+++ gcc/config/nvptx/nvptx.c	(working copy)
@@ -1171,18 +1171,42 @@ nvptx_section_from_addr_space (addr_spac
 }
 }
 
-/* Determine whether DECL goes into .const or .global.  */
+/* Determine the address space DECL lives in.  */
 
-const char *
-nvptx_section_for_decl (const_tree decl)
+static addr_space_t
+nvptx_addr_space_for_decl (const_tree decl)
 {
+  if (decl == NULL_TREE || TREE_CODE (decl) == FUNCTION_DECL)
+return ADDR_SPACE_GENERIC;
+
+  if (lookup_attribute ("oacc ganglocal", DECL_ATTRIBUTES (decl)) != NULL_TREE)
+return ADDR_SPACE_SHARED;
+
   bool is_const = (CONSTANT_CLASS_P (decl)
 		   || TREE_CODE (decl) == CONST_DECL
 		   || TREE_READONLY (decl));
   if (is_const)
-return ".const";
+return ADDR_SPACE_CONST;
 
-  return ".global";
+  return ADDR_SPACE_GLOBAL;
+}
+
+/* Return a ptx string representing the address space for a variable DECL.  */
+
+const char *
+nvptx_section_for_decl (const_tree decl)
+{
+  switch (nvptx_addr_space_for_decl (decl))
+{
+case ADDR_SPACE_CONST:
+  return ".const";
+case ADDR_SPACE_SHARED:
+  return ".shared";
+case ADDR_SPACE_GLOBAL:
+  return ".global";
+default:
+  gcc_unreachable ();
+}
 }
 
 /* Look for a SYMBOL_REF in ADDR and return the address space to be used
@@ -1196,17 +1220,7 @@ nvp

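The attached patch splits the nvptx code into nvptx_addr_space_for_decl and nvptx_section_for_decl, which makes the decision a two-step classification. Here is a standalone sketch of that control flow. struct decl_info and the AS_* enumerators are hypothetical stand-ins for const_tree and the backend's ADDR_SPACE_* values.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-ins for the tree predicates the patch consults.  */
struct decl_info
{
  bool is_function;
  bool has_ganglocal_attr;	/* "oacc ganglocal" attribute present.  */
  bool is_readonly;		/* Constant, CONST_DECL or TREE_READONLY.  */
};

enum addr_space { AS_GENERIC, AS_GLOBAL, AS_CONST, AS_SHARED };

/* Step 1: classify the decl.  The attribute is checked before constness,
   so a read-only decl carrying it still goes to shared memory.  */
static enum addr_space
addr_space_for_decl (const struct decl_info *d)
{
  if (d == NULL || d->is_function)
    return AS_GENERIC;
  if (d->has_ganglocal_attr)
    return AS_SHARED;
  if (d->is_readonly)
    return AS_CONST;
  return AS_GLOBAL;
}

/* Step 2: map the address space to a PTX state-space directive.  As in
   the patch, the generic space has no section of its own.  */
static const char *
section_for_decl (const struct decl_info *d)
{
  switch (addr_space_for_decl (d))
    {
    case AS_CONST:
      return ".const";
    case AS_SHARED:
      return ".shared";
    case AS_GLOBAL:
      return ".global";
    default:
      assert (!"no section for the generic address space");
      return NULL;
    }
}
```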
Re: [OpenACC 11/11] execution tests

2015-10-22 Thread Cesar Philippidis
On 10/22/2015 07:23 AM, Nathan Sidwell wrote:
> On 10/22/15 10:05, Jakub Jelinek wrote:
>> On Thu, Oct 22, 2015 at 09:53:46AM -0400, Nathan Sidwell wrote:
>>> On 10/22/15 05:37, Jakub Jelinek wrote:
>>>
>>>> And, I must say I'm at least missing testcases that check parsing but
>>>> also runtime behavior of the vector or worker clause arguments (there
>>>> is one gang (static:1) clause, but not the other clauses nor other
>>>> styles of gang arguments.
>>>
>>> the static clause is only valid on gang.
>>
>> That is what I've figured out.
>> But it is unclear from the parsing what from these is allowed:
> 
> good questions.  As you may have guessed, I'm not the primary author of
> the parsing code.  Cesar's stepped up to address this.

I'll go into more detail later when I post the revised patch, but for
the time being, in response to your earlier question, I've inlined
how the clauses should be translated in comments below:

> But it is unclear from the parsing what from these is allowed:

int v, w;
...
gang(26)  // equivalent to gang(num:26)
gang(v)   // gang(num:v)
vector(length: 16)  // vector(length: 16)
vector(length: v)  // vector(length: v)
vector(16)  // vector(length: 16)
vector(v)   // vector(length: v)
worker(num: 16)  // worker(num: 16)
worker(num: v)   // worker(num: v)
worker(16)  // worker(num: 16)
worker(v)   // worker(num: v)
gang(16, 24)  // technically gang(num:16, num:24) is acceptable but it
  // should be an error
gang(v, w)  // likewise
gang(static: 16, num: 5)  // gang(static: 16, num: 5)
gang(static: v, num: w)   // gang(static: v, num: w)
gang(num: 5, static: 4)   // gang(num: 5, static: 4)
gang(num: v, static: w)   // gang(num: v, static: w)

Also note that the static argument can accept '*'.

> and if the length: or num: part is really optional, then
> int length, num;
> vector(length)
> worker(num)
> gang(num, static: 6)
> gang(static: 5, num)
> should be also accepted (or subset thereof?).

Interesting question. The spec is unclear. It defines gang, worker and
vector as follows in section 2.7 in the OpenACC 2.0a spec:

  gang [( gang-arg-list )]
  worker [( [num:] int-expr )]
  vector [( [length:] int-expr )]

where gang-arg is one of:

  [num:] int-expr
  static: size-expr

and gang-arg-list may have at most one num and one static argument,
and where size-expr is one of:

  *
  int-expr

So I've interpreted that as a requirement that length and num must be
followed by an int-expr, whatever that is.
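For the static argument, the C FE patch later in this thread stores integer_minus_one_node when it sees '*'. Below is a small sketch of a size-expr parser that uses the same -1 sentinel. To keep it short, it assumes the int-expr is a plain decimal literal. SIZE_EXPR_ERROR is a made-up error code for this sketch only.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>

/* -1 plays the role of integer_minus_one_node for "static: *".  */
#define GANG_STATIC_STAR (-1L)
/* Hypothetical error code for this sketch only.  */
#define SIZE_EXPR_ERROR (-2L)

/* Parse a size-expr, which is either '*' or an int-expr.  To stay small,
   only a non-negative decimal literal is accepted as the int-expr.  */
static long
parse_size_expr (const char *s)
{
  const char *rest;
  long val;

  assert (s != NULL);
  while (isspace ((unsigned char) *s))
    s++;

  if (*s == '*')
    {
      val = GANG_STATIC_STAR;
      rest = s + 1;
    }
  else if (isdigit ((unsigned char) *s))
    {
      char *end;
      val = strtol (s, &end, 10);
      rest = end;
    }
  else
    return SIZE_EXPR_ERROR;

  /* Nothing but whitespace may follow.  */
  while (isspace ((unsigned char) *rest))
    rest++;
  return *rest == '\0' ? val : SIZE_EXPR_ERROR;
}
```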

I've been meaning to clean up the C and C++ front ends for a while
now, but I've been bogged down by other things. This is next on my todo
list.

Cesar


Re: [OpenACC 11/11] execution tests

2015-10-22 Thread Cesar Philippidis
On 10/22/2015 08:00 AM, Jakub Jelinek wrote:
> On Thu, Oct 22, 2015 at 07:47:01AM -0700, Cesar Philippidis wrote:
>>> But it is unclear from the parsing what from these is allowed:
>>
>> int v, w;
>> ...
>> gang(26)  // equivalent to gang(num:26)
>> gang(v)   // gang(num:v)
>> vector(length: 16)  // vector(length: 16)
>> vector(length: v)  // vector(length: v)
>> vector(16)  // vector(length: 16)
>> vector(v)   // vector(length: v)
>> worker(num: 16)  // worker(num: 16)
>> worker(num: v)   // worker(num: v)
>> worker(16)  // worker(num: 16)
>> worker(v)   // worker(num: v)
>> gang(16, 24)  // technically gang(num:16, num:24) is acceptable but it
>>   // should be an error
>> gang(v, w)  // likewise
>> gang(static: 16, num: 5)  // gang(static: 16, num: 5)
>> gang(static: v, num: w)   // gang(static: v, num: w)
>> gang(num: 5, static: 4)   // gang(num: 5, static: 4)
>> gang(num: v, static: w)   // gang(num: v, static: w)
>>
>> Also note that the static argument can accept '*'.
>>
>>> and if the length: or num: part is really optional, then
>>> int length, num;
>>> vector(length)
>>> worker(num)
>>> gang(num, static: 6)
>>> gang(static: 5, num)
>>> should be also accepted (or subset thereof?).
>>
>> Interesting question. The spec is unclear. It defines gang, worker and
>> vector as follows in section 2.7 in the OpenACC 2.0a spec:
>>
>>   gang [( gang-arg-list )]
>>   worker [( [num:] int-expr )]
>>   vector [( [length:] int-expr )]
>>
>> where gang-arg is one of:
>>
>>   [num:] int-expr
>>   static: size-expr
>>
>> and gang-arg-list may have at most one num and one static argument,
>> and where size-expr is one of:
>>
>>   *
>>   int-expr
>>
>> So I've interpreted that as a requirement that length and num must be
>> followed by an int-expr, whatever that is.
> 
> My reading of the above is that
> vector(length)
> is equivalent to
> vector(length: length)
> and
> worker(num)
> is equivalent to
> worker(num: num)
> etc.  Basically, neither length nor num is a reserved identifier,
> so you can use them for variable names, and if
> vector(v) is equivalent to vector(length: v), then
> vector(length) should be equivalent to vector(length:length)
> or
> vector(length + 1) should be equivalent to vector(length: length+1)
> static is a keyword that can't start an integral expression, so I guess
> it is fine if you issue an expected : diagnostics after it.

You're correct. I overlooked that 'int length, num' declaration.

> In any case, please add a testcase (both C and C++) which covers all these
> allowed variants (ideally one testcase) and rejected variants (another
> testcase with dg-error).
> 
> This is still an easy case, as even the C FE has 2-token lookahead.
> E.g. for OpenMP map clause where
> map (always, tofrom: x)
> means one thing and
> map (always, tofrom, y)
> another one (map (tofrom: always, tofrom, y))
> I had to do quite ugly things to get around this.

I'll add more test cases.

Thanks,
Cesar



more accurate omp in fortran

2015-10-22 Thread Cesar Philippidis
Currently, for certain omp and oacc errors, the Fortran front end
inaccurately reports where in the omp/acc construct the error occurred. E.g.

   !$acc parallel copy (i) copy (i) copy (j)
                                            1
Error: Symbol ‘i’ present on multiple clauses at (1)

instead of

   !$acc parallel copy (i) copy (i) copy (j)
                                 1
Error: Symbol ‘i’ present on multiple clauses at (1)

The problem is that the front end uses the locus of the construct
rather than that of the individual clause. As a result, the diagnostic
marker points to the end of the construct.

This patch teaches resolve_omp_clauses to use the locus of each
individual list item instead of the construct when reporting errors
involving OMP_LIST_ clauses (which are typically clauses involving
variables). It's still not perfect, but it does improve the quality of
the error reporting a little. This matters in particular for OpenACC:
other compilers are somewhat lenient in allowing variables to appear in
multiple clauses, e.g. copyin (foo) copyout (foo), but this is clearly
forbidden by the spec. I received some bug reports complaining that
gfortran's errors aren't accurate.
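Here is a concrete case. In the example above, the second i is at column 34 once the three leading spaces are counted. A per-item locus lets the marker land there. Below is a hypothetical helper that builds such a marker line from a recorded column. It is only an illustration, not gfortran's diagnostic code.

```c
#include <assert.h>
#include <string.h>

/* Build the marker line printed under the source line, placing '1' at
   the 1-based COLUMN recorded for the offending list item rather than
   at the construct's locus.  Returns the length written, or -1 if BUF
   is too small.  */
static int
caret_line (char *buf, size_t bufsz, int column)
{
  assert (buf != NULL && column >= 1);
  if ((size_t) column + 1 > bufsz)
    return -1;
  memset (buf, ' ', (size_t) column - 1);
  buf[column - 1] = '1';
  buf[column] = '\0';
  return column;
}
```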

I've also split off the check for variables appearing in multiple
clauses into a separate function. It's a little overkill for trunk right
now, but it is used quite a bit in gomp4 for oacc declare.
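Below is a simplified sketch of the two checks in resolve_omp_duplicate_list. Ordinary clauses use a persistent mark per symbol. OpenACC reductions use a list-local check that leaves the marks alone. The types are hypothetical, and a quadratic scan stands in for the patch's hash_set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, trimmed-down symbol and namelist types.  */
struct symbol
{
  const char *name;
  bool mark;			/* Plays the role of gfc_symbol::mark.  */
};

struct namelist
{
  struct symbol *sym;
  int column;			/* Stand-in for the per-item 'where' locus.  */
  struct namelist *next;
};

/* Count items naming a symbol that is already marked, i.e. one seen in
   an earlier clause or earlier in this list.  Marks persist across calls,
   so running this over every list of a construct finds cross-clause
   duplicates; the caller clears the marks afterwards.  */
static int
count_duplicates (struct namelist *list, int *first_dup_column)
{
  int dups = 0;
  for (struct namelist *n = list; n; n = n->next)
    {
      if (n->sym->mark)
	{
	  if (dups++ == 0 && first_dup_column)
	    *first_dup_column = n->column;
	}
      else
	n->sym->mark = true;
    }
  return dups;
}

/* OpenACC reductions: only repeats within the reduction list are errors,
   and the global marks are left untouched.  */
static int
count_reduction_duplicates (const struct namelist *list)
{
  int dups = 0;
  for (const struct namelist *n = list; n; n = n->next)
    for (const struct namelist *m = list; m != n; m = m->next)
      if (m->sym == n->sym)
	{
	  dups++;
	  break;
	}
  return dups;
}
```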

I've tested these changes on x86_64. Is this ok for trunk?

Cesar


2015-10-22  Cesar Philippidis  

	gcc/fortran/
	* gfortran.h (gfc_omp_namelist): Add locus where member.
	* openmp.c (gfc_match_omp_variable_list): Set where for each list
	item found.
	(resolve_omp_duplicate_list): New function.
	(oacc_compatible_clauses): Delete.
	(resolve_omp_clauses): Remove where argument and use the where
	gfc_omp_namelist member when reporting errors.  Use
	resolve_omp_duplicate_list to check for variables appearing in
	multiple clauses.
	(resolve_omp_do): Update call to resolve_omp_clauses.
	(resolve_oacc_loop): Likewise.
	(gfc_resolve_oacc_directive): Likewise.
	(gfc_resolve_omp_directive): Likewise.
	(gfc_resolve_omp_declare_simd): Likewise.

	gcc/testsuite/
	* gfortran.dg/gomp/intentin1.f90: Adjust copyprivate warning.

diff --git a/gcc/fortran/gfortran.h b/gcc/fortran/gfortran.h
index b2894cc..93adb7b 100644
--- a/gcc/fortran/gfortran.h
+++ b/gcc/fortran/gfortran.h
@@ -1123,6 +1123,7 @@ typedef struct gfc_omp_namelist
 } u;
   struct gfc_omp_namelist_udr *udr;
   struct gfc_omp_namelist *next;
+  locus where;
 }
 gfc_omp_namelist;
 
diff --git a/gcc/fortran/openmp.c b/gcc/fortran/openmp.c
index 3c12d8e..56a95d4 100644
--- a/gcc/fortran/openmp.c
+++ b/gcc/fortran/openmp.c
@@ -244,6 +244,7 @@ gfc_match_omp_variable_list (const char *str, gfc_omp_namelist **list,
 	}
 	  tail->sym = sym;
 	  tail->expr = expr;
+	  tail->where = cur_loc;
 	  goto next_item;
 	case MATCH_NO:
 	  break;
@@ -278,6 +279,7 @@ gfc_match_omp_variable_list (const char *str, gfc_omp_namelist **list,
 	  tail = tail->next;
 	}
 	  tail->sym = sym;
+	  tail->where = cur_loc;
 	}
 
 next_item:
@@ -2832,36 +2834,47 @@ resolve_omp_udr_clause (gfc_omp_namelist *n, gfc_namespace *ns,
   return copy;
 }
 
-/* Returns true if clause in list 'list' is compatible with any of
-   of the clauses in lists [0..list-1].  E.g., a reduction variable may
-   appear in both reduction and private clauses, so this function
-   will return true in this case.  */
+/* Check if a variable appears in multiple clauses.  */
 
-static bool
-oacc_compatible_clauses (gfc_omp_clauses *clauses, int list,
-			   gfc_symbol *sym, bool openacc)
+static void
+resolve_omp_duplicate_list (gfc_omp_namelist *clause_list, bool openacc,
+			int list)
 {
   gfc_omp_namelist *n;
+  const char *error_msg = "Symbol %qs present on multiple clauses at %L";
 
-  if (!openacc)
-return false;
+  /* OpenACC reduction clauses are compatible with everything.  We only
+ need to check if a reduction variable is used more than once.  */
+  if (openacc && list == OMP_LIST_REDUCTION)
+{
+  hash_set<gfc_symbol *> reductions;
 
-  if (list != OMP_LIST_REDUCTION)
-return false;
+  for (n = clause_list; n; n = n->next)
+	{
+	  if (reductions.contains (n->sym))
+	gfc_error (error_msg, n->sym->name, &n->where);
+	  else
+	reductions.add (n->sym);
+	}
 
-  for (n = clauses->lists[OMP_LIST_FIRST]; n; n = n->next)
-if (n->sym == sym)
-  return true;
+  return;
+}
 
-  return false;
+  /* Ensure that variables are only used in one clause.  */
+  for (n = clause_list; n; n = n->next)
+{
+  if (n->sym->mark)
+	gfc_error (error_msg, n->sym->name, &n->where);
+  else
+	n->sym->mark = 1;
+}
 }
 
 /* OpenMP directive resolving routines.  */
 
 static void
-resolve_omp_clauses (gfc_code *code, locus *where,
-		 gfc_omp_cl

Re: Re: [OpenACC 4/11] C FE changes

2015-10-23 Thread Cesar Philippidis
On 10/22/2015 01:22 AM, Jakub Jelinek wrote:
> On Wed, Oct 21, 2015 at 03:16:20PM -0400, Nathan Sidwell wrote:
>> 2015-10-20  Cesar Philippidis  
>>  Thomas Schwinge  
>>  James Norris  
>>  Joseph Myers  
>>  Julian Brown  
>>
>>  * c-parser.c (c_parser_oacc_shape_clause): New.
>>  (c_parser_oacc_simple_clause): New.
>>  (c_parser_oacc_all_clauses): Add auto, gang, seq, vector, worker.
>>  (OACC_LOOP_CLAUSE_MASK): Add gang, worker, vector, auto, seq.
> 
> Ok, with one nit.
> 
>>  /* OpenACC:
>> +   gang [( gang_expr_list )]
>> +   worker [( expression )]
>> +   vector [( expression )] */
>> +
>> +static tree
>> +c_parser_oacc_shape_clause (c_parser *parser, pragma_omp_clause c_kind,
>> +const char *str, tree list)
> 
> I think it would be better to remove the c_kind argument and pass to this
> function omp_clause_code kind instead.  The callers are already in a big
> switch, with a separate call for each of the clauses.
> After all, e.g. for c_parser_oacc_simple_clause you already do it that way
> too.
> 
>> +{
>> +  omp_clause_code kind;
>> +  const char *id = "num";
>> +
>> +  switch (c_kind)
>> +{
>> +default:
>> +  gcc_unreachable ();
>> +case PRAGMA_OACC_CLAUSE_GANG:
>> +  kind = OMP_CLAUSE_GANG;
>> +  break;
>> +case PRAGMA_OACC_CLAUSE_VECTOR:
>> +  kind = OMP_CLAUSE_VECTOR;
>> +  id = "length";
>> +  break;
>> +case PRAGMA_OACC_CLAUSE_WORKER:
>> +  kind = OMP_CLAUSE_WORKER;
>> +  break;
>> +}
> 
> Then you can replace this switch with just if (kind == OMP_CLAUSE_VECTOR)
> id = "length";

Good idea, thanks. This patch also corrects the problems parsing weird
combinations of num, static and length arguments that you mentioned
elsewhere.

Is this OK for trunk?

Nathan, can you try out this patch with your updated patch set? I saw
some test cases getting stuck in expand_GOACC_DIM_SIZE in the host
compiler, which is wrong. I don't see that happening in
gomp-4_0-branch with this patch. Also, can you merge this patch along
with the c++ and new test case patches to trunk? I'll handle the gomp4
backport.

Cesar

2015-10-20  Cesar Philippidis  
	Thomas Schwinge  
	James Norris  
	Joseph Myers  
	Julian Brown  
	Bernd Schmidt  

	* c-parser.c (c_parser_oacc_shape_clause): New.
	(c_parser_oacc_simple_clause): New.
	(c_parser_oacc_all_clauses): Add auto, gang, seq, vector, worker.
	(OACC_LOOP_CLAUSE_MASK): Add gang, worker, vector, auto, seq.


diff --git a/gcc/c/c-parser.c b/gcc/c/c-parser.c
index c8c6a2d..1e3c333 100644
--- a/gcc/c/c-parser.c
+++ b/gcc/c/c-parser.c
@@ -11188,6 +11188,142 @@ c_parser_omp_clause_num_workers (c_parser *parser, tree list)
 }
 
 /* OpenACC:
+
+gang [( gang-arg-list )]
+worker [( [num:] int-expr )]
+vector [( [length:] int-expr )]
+
+  where gang-arg is one of:
+
+[num:] int-expr
+static: size-expr
+
+  and size-expr may be:
+
+*
+int-expr
+*/
+
+static tree
+c_parser_oacc_shape_clause (c_parser *parser, omp_clause_code kind,
+			const char *str, tree list)
+{
+  const char *id = "num";
+
+  if (kind == OMP_CLAUSE_VECTOR)
+id = "length";
+
+  tree op0 = NULL_TREE, op1 = NULL_TREE;
+  location_t loc = c_parser_peek_token (parser)->location;
+
+  if (c_parser_next_token_is (parser, CPP_OPEN_PAREN))
+{
+  tree *op_to_parse = &op0;
+  c_parser_consume_token (parser);
+
+  do
+	{
+	  loc = c_parser_peek_token (parser)->location;
+	  op_to_parse = &op0;
+
+	  if ((c_parser_next_token_is (parser, CPP_NAME)
+	   || c_parser_next_token_is (parser, CPP_KEYWORD))
+	  && c_parser_peek_2nd_token (parser)->type == CPP_COLON)
+	{
+	  tree name_kind = c_parser_peek_token (parser)->value;
+	  const char *p = IDENTIFIER_POINTER (name_kind);
+	  if (kind == OMP_CLAUSE_GANG
+		  && c_parser_next_token_is_keyword (parser, RID_STATIC))
+		{
+		  c_parser_consume_token (parser); /* static  */
+		  c_parser_consume_token (parser); /* ':'  */
+
+		  op_to_parse = &op1;
+		  if (c_parser_next_token_is (parser, CPP_MULT))
+		{
+		  c_parser_consume_token (parser);
+		  *op_to_parse = integer_minus_one_node;
+
+		  /* Consume a comma if present.  */
+		  if (c_parser_next_token_is (parser, CPP_COMMA))
+			c_parser_consume_token (parser);
+
+		  continue;
+		}
+		}
+	  else if (strcmp (id, p) == 0)
+		{
+		  c_parser_consume_token (parser);  /* id  */
+		  c_parser_consume_token (parser);  /* ':'  */
+		}
+	  else
+		{
+		  if (kind == OMP_CLAUSE_GANG)

Re: Re: [OpenACC 5/11] C++ FE changes

2015-10-23 Thread Cesar Philippidis
On 10/22/2015 01:52 AM, Jakub Jelinek wrote:
> On Wed, Oct 21, 2015 at 03:18:55PM -0400, Nathan Sidwell wrote:
>> This patch is the C++ changes matching the C ones of patch 4.  In
>> finish_omp_clauses, the gang, worker, & vector clauses are handled the same
>> as OpenMP's 'num_threads' clause.  One change to num_threads is the
>> augmentation of a diagnostic to add %<...%>  markers to the clause name.
> 
> Indeed, lots of older OpenMP diagnostics is missing %<...%> markers around
> keywords.  Something to fix eventually.

I updated omp tasks and teams in semantics.c.

>> 2015-10-20  Cesar Philippidis  
>>  Thomas Schwinge  
>>  James Norris  
>>  Joseph Myers  
>>  Julian Brown  
>>  Nathan Sidwell 
>>
>>  * parser.c (cp_parser_omp_clause_name): Add auto, gang, seq,
>>  vector, worker.
>>  (cp_parser_oacc_simple_clause): New.
>>  (cp_parser_oacc_shape_clause): New.
> 
> What I've said for the C FE patch, plus:
> 
>> +  if (cp_lexer_next_token_is (lexer, CPP_NAME)
>> +  || cp_lexer_next_token_is (lexer, CPP_KEYWORD))
>> +{
>> +  tree name_kind = cp_lexer_peek_token (lexer)->u.value;
>> +  const char *p = IDENTIFIER_POINTER (name_kind);
>> +  if (kind == OMP_CLAUSE_GANG && strcmp ("static", p) == 0)
> 
> As static is a keyword, wouldn't it be better to just handle that case
> using cp_lexer_next_token_is_keyword (lexer, RID_STATIC)?
> 
> Also, what is the exact grammar of the shape arguments?
> Would be nice to describe the grammar, in the grammar you just say
> expression, at least for vector/worker, which is clearly not accurate.
> 
> It seems the intent is that num: or length: or static: is optional, right?
> But if that is the case, you should treat those as parsed only if followed
> by :.  While static is a keyword, so you can't have a variable called like
> that, having vector(length) or vector(num) should not be rejected.
> So, I would have expected that it should test if it is RID_STATIC
> followed by CPP_COLON (and only in that case consume those tokens),
> or CPP_NAME of id followed by CPP_COLON (and only in that case consume those
> tokens), otherwise parse it as assignment expression.

That function now peeks ahead to look for a colon, so it can handle
variables named after the clause keywords.
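Below is a string-based sketch of the lookahead described above. An identifier is treated as a label only when the next token is a single ':'. Otherwise everything is handed to the expression parser. The sketch is hypothetical and handles one argument only. The real parsers work on tokens, handle comma-separated gang arguments, and treat the static keyword separately.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

struct shape_arg
{
  char label[16];		/* Empty when no label was given.  */
  const char *expr;		/* Start of the int-expr text.  */
};

static const char *
skip_ws (const char *p)
{
  while (isspace ((unsigned char) *p))
    p++;
  return p;
}

/* Split TEXT, the contents of e.g. vector(...), into an optional label
   and an expression.  ID is the only label accepted ("length" for
   vector, "num" for gang and worker).  Returns 0 on success, -1 if some
   other label is used.  */
static int
split_shape_arg (const char *text, const char *id, struct shape_arg *out)
{
  const char *p = skip_ws (text);
  const char *q = p;

  assert (strlen (id) < sizeof out->label);

  /* Token 1: is there an identifier?  */
  if (isalpha ((unsigned char) *q) || *q == '_')
    {
      while (isalnum ((unsigned char) *q) || *q == '_')
	q++;

      /* Token 2: only a single ':' (not C++ '::') makes token 1 a label.  */
      const char *r = skip_ws (q);
      if (r[0] == ':' && r[1] != ':')
	{
	  size_t len = (size_t) (q - p);
	  if (len != strlen (id) || strncmp (p, id, len) != 0)
	    return -1;
	  memcpy (out->label, p, len);
	  out->label[len] = '\0';
	  out->expr = skip_ws (r + 1);
	  return 0;
	}
    }

  /* No label: the whole text, including a variable that happens to be
     named "length" or "num", is the expression.  */
  out->label[0] = '\0';
  out->expr = p;
  return 0;
}
```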

> The C FE may have similar issue.  Plus of course there should be testsuite
> coverage for all the weird cases.

I included a new test in a different patch because it's common to both C
and C++.

>> +case OMP_CLAUSE_GANG:
>> +case OMP_CLAUSE_VECTOR:
>> +case OMP_CLAUSE_WORKER:
>> +  /* Operand 0 is the num: or length: argument.  */
>> +  t = OMP_CLAUSE_OPERAND (c, 0);
>> +  if (t == NULL_TREE)
>> +break;
>> +
>> +  t = maybe_convert_cond (t);
> 
> Can you explain the maybe_convert_cond calls (in both cases here,
> plus the preexisting in OMP_CLAUSE_VECTOR_LENGTH)?
> The reason why it is used for OpenMP if and final clauses is that those have
> a condition argument, either the condition is zero or non-zero (so
> effectively it is turned into a bool).
> But aren't the gang/vector/worker/vector_length arguments integers rather
> than conditions?  I'd expect that finish_omp_clauses should verify
> those operands are indeed integral expressions (if that is the requirement
> in the standard), as it is something that for C++ can't be verified during
> parsing, if arbitrary expressions are parsed there.

It's probably a copy-and-paste error. This functionality was added
incrementally. I removed that check.

>> @@ -5959,32 +5990,58 @@ finish_omp_clauses (tree clauses, bool a
>>break;
>>  
>>  case OMP_CLAUSE_NUM_THREADS:
>> -  t = OMP_CLAUSE_NUM_THREADS_EXPR (c);
>> -  if (t == error_mark_node)
>> -remove = true;
>> -  else if (!type_dependent_expression_p (t)
>> -   && !INTEGRAL_TYPE_P (TREE_TYPE (t)))
>> -{
>> -  error ("num_threads expression must be integral");
>> -  remove = true;
>> -}
>> -  else
>> -{
>> -  t = mark_rvalue_use (t);
>> -  if (!processing_template_decl)
>> -{
>> -  t = maybe_constant_value (t);
>> -  if (TREE_CODE (t) == INTEGER_CST
>> -  && tree_int_cst_sgn (t) != 1)
>> -{
>> -  warning_at (OMP_CLAUSE_LOCATION (c), 0,
>> -  

Re: [OpenACC 11/11] execution tests

2015-10-23 Thread Cesar Philippidis
On 10/22/2015 08:00 AM, Jakub Jelinek wrote:
> On Thu, Oct 22, 2015 at 07:47:01AM -0700, Cesar Philippidis wrote:
>>> But it is unclear from the parsing what from these is allowed:
>>
>> int v, w;
>> ...
>> gang(26)  // equivalent to gang(num:26)
>> gang(v)   // gang(num:v)
>> vector(length: 16)  // vector(length: 16)
>> vector(length: v)  // vector(length: v)
>> vector(16)  // vector(length: 16)
>> vector(v)   // vector(length: v)
>> worker(num: 16)  // worker(num: 16)
>> worker(num: v)   // worker(num: v)
>> worker(16)  // worker(num: 16)
>> worker(v)   // worker(num: v)
>> gang(16, 24)  // technically gang(num:16, num:24) is acceptable but it
>>   // should be an error
>> gang(v, w)  // likewise
>> gang(static: 16, num: 5)  // gang(static: 16, num: 5)
>> gang(static: v, num: w)   // gang(static: v, num: w)
>> gang(num: 5, static: 4)   // gang(num: 5, static: 4)
>> gang(num: v, static: w)   // gang(num: v, static: w)
>>
>> Also note that the static argument can accept '*'.
>>
>>> and if the length: or num: part is really optional, then
>>> int length, num;
>>> vector(length)
>>> worker(num)
>>> gang(num, static: 6)
>>> gang(static: 5, num)
>>> should be also accepted (or subset thereof?).
>>
>> Interesting question. The spec is unclear. It defines gang, worker and
>> vector as follows in section 2.7 in the OpenACC 2.0a spec:
>>
>>   gang [( gang-arg-list )]
>>   worker [( [num:] int-expr )]
>>   vector [( [length:] int-expr )]
>>
>> where gang-arg is one of:
>>
>>   [num:] int-expr
>>   static: size-expr
>>
>> and gang-arg-list may have at most one num and one static argument,
>> and where size-expr is one of:
>>
>>   *
>>   int-expr
>>
>> So I've interpreted that as a requirement that length and num must be
>> followed by an int-expr, whatever that is.
> 
> My reading of the above is that
> vector(length)
> is equivalent to
> vector(length: length)
> and
> worker(num)
> is equivalent to
> worker(num: num)
> etc.  Basically, neither length nor num is a reserved identifier,
> so you can use them for variable names, and if
> vector(v) is equivalent to vector(length: v), then
> vector(length) should be equivalent to vector(length:length)
> or
> vector(length + 1) should be equivalent to vector(length: length+1)
> static is a keyword that can't start an integral expression, so I guess
> it is fine if you issue an expected : diagnostic after it.
> 
> In any case, please add a testcase (both C and C++) which covers all these
> allowed variants (ideally one testcase) and rejected variants (another
> testcase with dg-error).
> 
> This is still an easy case, as even the C FE has 2-token lookahead.
> E.g. for OpenMP map clause where
> map (always, tofrom: x)
> means one thing and
> map (always, tofrom, y)
> another one (map (tofrom: always, tofrom, y))
> I had to do quite ugly things to get around this.
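Jakub's reading amounts to a two-token lookahead rule: a leading num or length is consumed as a label only when the next token is ':'; otherwise it is an ordinary variable that starts the int-expr. A minimal sketch of that rule follows, assuming a hypothetical toy representation in which the argument list is an array of token spellings terminated by NULL. The function names are invented for illustration; this is not GCC's c_parser interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical toy model, not GCC's c_parser: the clause argument list
   is an array of token spellings terminated by NULL, for example
   { "length", ":", "16", NULL } for vector (length: 16).  */

/* Return true when tokens[i] spells ID ("num" or "length") and the
   following token is ":", i.e. the name is used as a label.  Otherwise
   the name is an ordinary variable that begins the int-expr.  */
static bool
is_shape_label (const char *const *tokens, size_t i, const char *id)
{
  assert (tokens != NULL && id != NULL);
  if (tokens[i] == NULL || strcmp (tokens[i], id) != 0)
    return false;
  /* The second token of lookahead decides.  */
  return tokens[i + 1] != NULL && strcmp (tokens[i + 1], ":") == 0;
}

/* Index of the first token of the int-expr, skipping "ID :" if present.  */
static size_t
shape_expr_start (const char *const *tokens, const char *id)
{
  return is_shape_label (tokens, 0, id) ? 2 : 0;
}
```

Under this rule vector (length) and vector (length + 1) keep length as (part of) the expression, while vector (length: 16) and worker (num: num) skip the label.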

Here are the updated test cases. Besides adding a new test to
exercise the loop shape parsing, I also removed the assembly file
that Ilya noticed in the original patch.

Is this OK for trunk?

Cesar

2015-10-23  Nathan Sidwell  

	libgomp/
	* testsuite/libgomp.oacc-c-c++-common/loop-g-1.c: New.
	* testsuite/libgomp.oacc-c-c++-common/loop-w-1.c: New.
	* testsuite/libgomp.oacc-c-c++-common/loop-wv-1.c: New.
	* testsuite/libgomp.oacc-c-c++-common/loop-g-2.c: New.
	* testsuite/libgomp.oacc-c-c++-common/loop-gwv-1.c: New.
	* testsuite/libgomp.oacc-c-c++-common/loop-v-1.c: New.

2015-10-23  Cesar Philippidis  

	gcc/testsuite/
	* c-c++-common/goacc/loop-shape.c: New.


diff --git a/gcc/testsuite/c-c++-common/goacc/loop-shape.c b/gcc/testsuite/c-c++-common/goacc/loop-shape.c
new file mode 100644
index 000..3cb3006
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/goacc/loop-shape.c
@@ -0,0 +1,197 @@
+/* Exercise *_parser_oacc_shape_clause by checking various combinations
+   of gang, worker and vector clause arguments.  */
+
+/* { dg-do compile } */
+
+int main ()
+{
+  int i;
+  int v, w;
+  int length, num;
+
+  /* Valid uses.  */
+
+  #pragma acc kernels
+  #pragma acc loop gang worker vector
+  for (i = 0; i < 10; i++)
+;
+  
+  #pragma acc kernels
+  #pragma acc loop gang(26)
+  for (i = 0; i < 10; i++)
+;
+  
+  #pragma acc kernels
+  #pragma acc loop gang(v)
+  for (i = 0; i < 10; i++)
+;
+
+  #pragma acc kernels
+  #pragma acc loop vector(length: 16)
+  for (i = 0; i < 10; i++)
+;
+
+  #pragma acc kernels
+  #pragma acc loop vector(length: v)
+  for (i = 0; i < 10; i++)
+;
+
+  #pragma acc kernels
+  #pra

Re: [OpenACC 4/11] C FE changes

2015-10-23 Thread Cesar Philippidis
On 10/23/2015 01:31 PM, Jakub Jelinek wrote:
> On Fri, Oct 23, 2015 at 01:17:07PM -0700, Cesar Philippidis wrote:
>> Good idea, thanks. This patch also corrects the problems parsing weird
>> combinations of num, static and length arguments that you mentioned
>> elsewhere.
>>
>> Is this OK for trunk?
> 
> I'd strongly prefer to always see patches accompanied by testcases.
> 
>> +  loc = c_parser_peek_token (parser)->location;
>> +  op_to_parse = &op0;
>> +
>> +  if ((c_parser_next_token_is (parser, CPP_NAME)
>> +   || c_parser_next_token_is (parser, CPP_KEYWORD))
>> +  && c_parser_peek_2nd_token (parser)->type == CPP_COLON)
>> +{
>> +  tree name_kind = c_parser_peek_token (parser)->value;
>> +  const char *p = IDENTIFIER_POINTER (name_kind);
> 
> I think I'd prefer not to peek at this at all if it is RID_STATIC,
> so perhaps just have (and name_kind is weird):
> else
>   {
> tree val = c_parser_peek_token (parser)->value;
> if (strcmp (id, IDENTIFIER_POINTER (val)) == 0)
>   {
> c_parser_consume_token (parser);  /* id  */
> c_parser_consume_token (parser);  /* ':'  */
>   }
> else
>   {
> ...
>   }
>   }
> ?

My plan here was to try to catch any argument followed by a colon. But that
fell through because...

>> +  if (kind == OMP_CLAUSE_GANG
>> +  && c_parser_next_token_is_keyword (parser, RID_STATIC))
>> +{
>> +  c_parser_consume_token (parser); /* static  */
>> +  c_parser_consume_token (parser); /* ':'  */
>> +
>> +  op_to_parse = &op1;
>> +  if (c_parser_next_token_is (parser, CPP_MULT))
>> +{
>> +  c_parser_consume_token (parser);
>> +  *op_to_parse = integer_minus_one_node;
>> +
>> +  /* Consume a comma if present.  */
>> +  if (c_parser_next_token_is (parser, CPP_COMMA))
>> +c_parser_consume_token (parser);
> 
> Doesn't this mean that you happily parse
> gang (static: * abc)
> or
> gang (static:*num:1)
> etc.?  I'd say the comma should be non-optional (i.e. either accept
> CPP_COMMA, or CPP_CLOSE_PAREN, but nothing else) in that case (at least,
> when in OpenMP grammar something is *-list it is meant to be comma
> separated).

I'm not handling commas properly. My next patch is going to handle the
static argument separately.

>> +  /* Consume a comma if present.  */
>> +  if (c_parser_next_token_is (parser, CPP_COMMA))
>> +c_parser_consume_token (parser);
> 
> Similarly this means
> gang (num: 5 static: *)
> is accepted.  If it is valid, then again it should have testsuite coverage.

I'll include a test case for this with the next patch.

Cesar



Re: [OpenACC 4/11] C FE changes

2015-10-23 Thread Cesar Philippidis
On 10/23/2015 02:31 PM, Cesar Philippidis wrote:
> On 10/23/2015 01:31 PM, Jakub Jelinek wrote:
>> On Fri, Oct 23, 2015 at 01:17:07PM -0700, Cesar Philippidis wrote:
>>> Good idea, thanks. This patch also corrects the problems parsing weird
>>> combinations of num, static and length arguments that you mentioned
>>> elsewhere.
>>>
>>> Is this OK for trunk?
>>
>> I'd strongly prefer to always see patches accompanied by testcases.
>>
>>> + loc = c_parser_peek_token (parser)->location;
>>> + op_to_parse = &op0;
>>> +
>>> + if ((c_parser_next_token_is (parser, CPP_NAME)
>>> +  || c_parser_next_token_is (parser, CPP_KEYWORD))
>>> + && c_parser_peek_2nd_token (parser)->type == CPP_COLON)
>>> +   {
>>> + tree name_kind = c_parser_peek_token (parser)->value;
>>> + const char *p = IDENTIFIER_POINTER (name_kind);
>>
>> I think I'd prefer not to peek at this at all if it is RID_STATIC,
>> so perhaps just have (and name_kind is weird):
>>else
>>  {
>>tree val = c_parser_peek_token (parser)->value;
>>if (strcmp (id, IDENTIFIER_POINTER (val)) == 0)
>>  {
>>c_parser_consume_token (parser);  /* id  */
>>c_parser_consume_token (parser);  /* ':'  */
>>  }
>>else
>>  {
>> ...
>>  }
>>  }
>> ?
> 
> My plan here was to try to catch any argument followed by a colon. But that
> fell through because...
> 
>>> + if (kind == OMP_CLAUSE_GANG
>>> + && c_parser_next_token_is_keyword (parser, RID_STATIC))
>>> +   {
>>> + c_parser_consume_token (parser); /* static  */
>>> + c_parser_consume_token (parser); /* ':'  */
>>> +
>>> + op_to_parse = &op1;
>>> + if (c_parser_next_token_is (parser, CPP_MULT))
>>> +   {
>>> + c_parser_consume_token (parser);
>>> + *op_to_parse = integer_minus_one_node;
>>> +
>>> + /* Consume a comma if present.  */
>>> + if (c_parser_next_token_is (parser, CPP_COMMA))
>>> +   c_parser_consume_token (parser);
>>
>> Doesn't this mean that you happily parse
>> gang (static: * abc)
>> or
>> gang (static:*num:1)
>> etc.?  I'd say the comma should be non-optional (i.e. either accept
>> CPP_COMMA, or CPP_CLOSE_PAREN, but nothing else) in that case (at least,
>> when in OpenMP grammar something is *-list it is meant to be comma
>> separated).
> 
> I'm not handling commas properly. My next patch is going to handle the
> static argument separately.
> 
>>> + /* Consume a comma if present.  */
>>> + if (c_parser_next_token_is (parser, CPP_COMMA))
>>> +   c_parser_consume_token (parser);
>>
>> Similarly this means
>> gang (num: 5 static: *)
>> is accepted.  If it is valid, then again it should have testsuite coverage.
> 
> I'll include a test case for this with the next patch.

Here's the updated patch. Hopefully I addressed everything. Thank you
for suggesting all of those test cases.

Is this OK for trunk?

Cesar

2015-10-23  Cesar Philippidis  
	Thomas Schwinge  
	James Norris  
	Joseph Myers  
	Julian Brown  
	Bernd Schmidt  

	gcc/c/
	* c-parser.c (c_parser_oacc_shape_clause): New.
	(c_parser_oacc_simple_clause): New.
	(c_parser_oacc_all_clauses): Add auto, gang, seq, vector, worker.
	(OACC_LOOP_CLAUSE_MASK): Add gang, worker, vector, auto, seq.

2015-10-23  Cesar Philippidis  

	gcc/testsuite/
	* c-c++-common/goacc/loop-shape.c: New test.

diff --git a/gcc/c/c-parser.c b/gcc/c/c-parser.c
index c8c6a2d..7d2baa9 100644
--- a/gcc/c/c-parser.c
+++ b/gcc/c/c-parser.c
@@ -11188,6 +11188,156 @@ c_parser_omp_clause_num_workers (c_parser *parser, tree list)
 }
 
 /* OpenACC:
+
+gang [( gang-arg-list )]
+worker [( [num:] int-expr )]
+vector [( [length:] int-expr )]
+
+  where gang-arg is one of:
+
+[num:] int-expr
+static: size-expr
+
+  and size-expr may be:
+
+*
+int-expr
+*/
+
+static tree
+c_parser_oacc_shape_clause (c_parser *parser, omp_clause_code kind,
+			const char *str, tree list)
+{
+  const char *id = "num";
+  tree op0 = NULL_TREE, op1 = NULL_TREE, c;
+  location_t loc = c_parser_peek_token (parser)->location;
+
+  if (kind == OMP_CLAUSE

Re: [OpenACC 5/11] C++ FE changes

2015-10-23 Thread Cesar Philippidis
On 10/23/2015 01:25 PM, Cesar Philippidis wrote:
> On 10/22/2015 01:52 AM, Jakub Jelinek wrote:
>> On Wed, Oct 21, 2015 at 03:18:55PM -0400, Nathan Sidwell wrote:
>>> This patch is the C++ changes matching the C ones of patch 4.  In
>>> finish_omp_clauses, the gang, worker, & vector clauses are handled the same
>>> as OpenMP's 'num_threads' clause.  One change to num_threads is the
>>> augmentation of a diagnostic to add %<...%>  markers to the clause name.
>>
>> Indeed, lots of older OpenMP diagnostics is missing %<...%> markers around
>> keywords.  Something to fix eventually.
> 
> I updated omp tasks and teams in semantics.c.
> 
>>> 2015-10-20  Cesar Philippidis  
>>> Thomas Schwinge  
>>> James Norris  
>>> Joseph Myers  
>>> Julian Brown  
>>> Nathan Sidwell 
>>>
>>> * parser.c (cp_parser_omp_clause_name): Add auto, gang, seq,
>>> vector, worker.
>>> (cp_parser_oacc_simple_clause): New.
>>> (cp_parser_oacc_shape_clause): New.
>>
>> What I've said for the C FE patch, plus:
>>
>>> + if (cp_lexer_next_token_is (lexer, CPP_NAME)
>>> + || cp_lexer_next_token_is (lexer, CPP_KEYWORD))
>>> +   {
>>> + tree name_kind = cp_lexer_peek_token (lexer)->u.value;
>>> + const char *p = IDENTIFIER_POINTER (name_kind);
>>> + if (kind == OMP_CLAUSE_GANG && strcmp ("static", p) == 0)
>>
>> As static is a keyword, wouldn't it be better to just handle that case
>> using cp_lexer_next_token_is_keyword (lexer, RID_STATIC)?
>>
>> Also, what is the exact grammar of the shape arguments?
>> Would be nice to describe the grammar, in the grammar you just say
>> expression, at least for vector/worker, which is clearly not accurate.
>>
>> It seems the intent is that num: or length: or static: is optional, right?
>> But if that is the case, you should treat those as parsed only if followed
>> by :.  While static is a keyword, so you can't have a variable called like
>> that, having vector(length) or vector(num) should not be rejected.
>> So, I would have expected that it should test if it is RID_STATIC
>> followed by CPP_COLON (and only in that case consume those tokens),
>> or CPP_NAME of id followed by CPP_COLON (and only in that case consume those
>> tokens), otherwise parse it as assignment expression.
> 
> That function now peeks ahead to look for a colon, so now it can handle
> variables with the name of clause keywords.
> 
>> The C FE may have similar issue.  Plus of course there should be testsuite
>> coverage for all the weird cases.
> 
> I included a new test in a different patch because it's common to both C
> and C++.
> 
>>> +   case OMP_CLAUSE_GANG:
>>> +   case OMP_CLAUSE_VECTOR:
>>> +   case OMP_CLAUSE_WORKER:
>>> + /* Operand 0 is the num: or length: argument.  */
>>> + t = OMP_CLAUSE_OPERAND (c, 0);
>>> + if (t == NULL_TREE)
>>> +   break;
>>> +
>>> + t = maybe_convert_cond (t);
>>
>> Can you explain the maybe_convert_cond calls (in both cases here,
>> plus the preexisting in OMP_CLAUSE_VECTOR_LENGTH)?
>> The reason why it is used for OpenMP if and final clauses is that those have
>> a condition argument, either the condition is zero or non-zero (so
>> effectively it is turned into a bool).
>> But aren't the gang/vector/worker/vector_length arguments integers rather
>> than conditions?  I'd expect that finish_omp_clauses should verify
>> those operands are indeed integral expressions (if that is the requirement
>> in the standard), as it is something that for C++ can't be verified during
>> parsing, if arbitrary expressions are parsed there.
> 
> It's probably a copy-and-paste error. This functionality was added
> incrementally. I removed that check.
> 
>>> @@ -5959,32 +5990,58 @@ finish_omp_clauses (tree clauses, bool a
>>>   break;
>>>  
>>> case OMP_CLAUSE_NUM_THREADS:
>>> - t = OMP_CLAUSE_NUM_THREADS_EXPR (c);
>>> - if (t == error_mark_node)
>>> -   remove = true;
>>> - else if (!type_dependent_expression_p (t)
>>> -  && !INTEGRAL_TYPE_P (TREE_TYPE (t)))
>>> -   {
>>> - error ("num_threads expression must be integral");
>>> - remove = true;
>>> -   }
>>> - else
>>> -  

Re: [OpenACC 11/11] execution tests

2015-10-23 Thread Cesar Philippidis
On 10/23/2015 01:29 PM, Cesar Philippidis wrote:
> On 10/22/2015 08:00 AM, Jakub Jelinek wrote:
>> On Thu, Oct 22, 2015 at 07:47:01AM -0700, Cesar Philippidis wrote:
>>>> But it is unclear from the parsing what from these is allowed:
>>>
>>> int v, w;
>>> ...
>>> gang(26)  // equivalent to gang(num:26)
>>> gang(v)   // gang(num:v)
>>> vector(length: 16)  // vector(length: 16)
>>> vector(length: v)  // vector(length: v)
>>> vector(16)  // vector(length: 16)
>>> vector(v)   // vector(length: v)
>>> worker(num: 16)  // worker(num: 16)
>>> worker(num: v)   // worker(num: v)
>>> worker(16)  // worker(num: 16)
>>> worker(v)   // worker(num: v)
>>> gang(16, 24)  // technically gang(num:16, num:24) is acceptable but it
>>>   // should be an error
>>> gang(v, w)  // likewise
>>> gang(static: 16, num: 5)  // gang(static: 16, num: 5)
>>> gang(static: v, num: w)   // gang(static: v, num: w)
>>> gang(num: 5, static: 4)   // gang(num: 5, static: 4)
>>> gang(num: v, static: w)   // gang(num: v, static: w)
>>>
>>> Also note that the static argument can accept '*'.
>>>
>>>> and if the length: or num: part is really optional, then
>>>> int length, num;
>>>> vector(length)
>>>> worker(num)
>>>> gang(num, static: 6)
>>>> gang(static: 5, num)
>>>> should be also accepted (or subset thereof?).
>>>
>>> Interesting question. The spec is unclear. It defines gang, worker and
>>> vector as follows in section 2.7 in the OpenACC 2.0a spec:
>>>
>>>   gang [( gang-arg-list )]
>>>   worker [( [num:] int-expr )]
>>>   vector [( [length:] int-expr )]
>>>
>>> where gang-arg is one of:
>>>
>>>   [num:] int-expr
>>>   static: size-expr
>>>
>>> and gang-arg-list may have at most one num and one static argument,
>>> and where size-expr is one of:
>>>
>>>   *
>>>   int-expr
>>>
>>> So I've interpreted that as a requirement that length and num must be
>>> followed by an int-expr, whatever that is.
>>
>> My reading of the above is that
>> vector(length)
>> is equivalent to
>> vector(length: length)
>> and
>> worker(num)
>> is equivalent to
>> worker(num: num)
>> etc.  Basically, neither length nor num is a reserved identifier,
>> so you can use them for variable names, and if
>> vector(v) is equivalent to vector(length: v), then
>> vector(length) should be equivalent to vector(length:length)
>> or
>> vector(length + 1) should be equivalent to vector(length: length+1)
>> static is a keyword that can't start an integral expression, so I guess
>> it is fine if you issue an expected : diagnostic after it.
>>
>> In any case, please add a testcase (both C and C++) which covers all these
>> allowed variants (ideally one testcase) and rejected variants (another
>> testcase with dg-error).
>>
>> This is still an easy case, as even the C FE has 2-token lookahead.
>> E.g. for OpenMP map clause where
>> map (always, tofrom: x)
>> means one thing and
>> map (always, tofrom, y)
>> another one (map (tofrom: always, tofrom, y))
>> I had to do quite ugly things to get around this.
> 
> Here are the updated test cases. Besides adding a new test to
> exercise the loop shape parsing, I also removed the assembly file
> that Ilya noticed in the original patch.
> 
> Is this OK for trunk?

This patch is mostly the same as the one I posted earlier, except that it
no longer includes the loop-shape parser test. That test was included with
the C parser changes.

Is this OK for trunk?

Cesar

2015-10-23  Nathan Sidwell  

	* testsuite/libgomp.oacc-c-c++-common/loop-g-1.c: New.
	* testsuite/libgomp.oacc-c-c++-common/loop-w-1.c: New.
	* testsuite/libgomp.oacc-c-c++-common/loop-g-1.s: New.
	* testsuite/libgomp.oacc-c-c++-common/loop-wv-1.c: New.
	* testsuite/libgomp.oacc-c-c++-common/loop-g-2.c: New.
	* testsuite/libgomp.oacc-c-c++-common/loop-gwv-1.c: New.
	* testsuite/libgomp.oacc-c-c++-common/loop-v-1.c: New.

diff --git a/libgomp/testsuite/libgomp.c++/member-2.C b/libgomp/testsuite/libgomp.c++/member-2.C
index bb348d8..bbe2bdf4 100644
--- a/libgomp/testsuite/libgomp.c++/member-2.C
+++ b/libgomp/testsuite/libgomp.c++/member-2.C
@@ -154,7 +154,7 @@ A::m1 ()
 {
   f = false;
 #pragma omp single
-#pragma omp taskloop lastprivate (a, T::t, b, n)
+#pragma omp taskloop lastprivate (a, T::t, b, n) private (R::r)
   for (int i = 0; i < 30;

Re: [OpenACC 4/11] C FE changes

2015-10-24 Thread Cesar Philippidis
On 10/24/2015 01:03 AM, Jakub Jelinek wrote:
> On Fri, Oct 23, 2015 at 07:31:51PM -0700, Cesar Philippidis wrote:
> 
>> +static tree
>> +c_parser_oacc_shape_clause (c_parser *parser, omp_clause_code kind,
>> +const char *str, tree list)
>> +{
>> +  const char *id = "num";
>> +  tree op0 = NULL_TREE, op1 = NULL_TREE, c;
>> +  location_t loc = c_parser_peek_token (parser)->location;
>> +
>> +  if (kind == OMP_CLAUSE_VECTOR)
>> +id = "length";
>> +
>> +  if (c_parser_next_token_is (parser, CPP_OPEN_PAREN))
>> +{
>> +  tree *op_to_parse = &op0;
>> +  c_token *next;
>> +
>> +  c_parser_consume_token (parser);
>> +
>> +  do
>> +{
>> +  op_to_parse = &op0;
>> +
>> +  /* Consume a comma if present.  */
>> +  if (c_parser_next_token_is (parser, CPP_COMMA))
>> +{
>> +  if (op0 == NULL && op1 == NULL)
>> +{
>> +  c_parser_error (parser, "unexpected argument");
>> +  goto cleanup_error;
>> +}
>> +
>> +  c_parser_consume_token (parser);
>> +}
> 
> This means you parse
> gang (, static: *)
> vector (, 5)
> etc., even when you error on it afterwards with unexpected argument,
> it is still different diagnostics from other invalid tokens immediately
> after the opening (.

So your objection is that the error messages are inconsistent? It was
still catching those errors.

I've added those new test cases. Unfortunately, C and C++ report
different error messages, so I had to make the dg-error generic for
the lines containing those types of errors.

> Also, loc and next are wrong if there is a valid comma.

Yeah, I don't think it needs to be adjusted in the loop. c_parser_error
already knows where to report the error anyway.

> So I'm really wondering why
> gang (static: *, num: 5)
> works, because next is the CPP_COMMA token, so while
> c_parser_next_token_is (parser, CPP_NAME) matches the actual name,
> what exactly next->value contains is unclear.
> 
> I think it would be better to:
> 
>   tree ops[2] = { NULL_TREE, NULL_TREE };
> 
>   do
>   {
> // Note, declare these here
> c_token *next = c_parser_peek_token (parser);
> location_t loc = next->location;
> // Just use ops[idx] instead of *op_to_parse etc., though if you strongly
> // prefer *op_to_parse, I won't object.
> int idx = 0;
> // Note it seems generally the C parser doesn't check for CPP_KEYWORD
> // before calling c_parser_next_token_is_keyword.  And I'd just do it
> // for OMP_CLAUSE_GANG, which has it in the grammar.
> if (kind == OMP_CLAUSE_GANG
> && c_parser_next_token_is_keyword (parser, RID_STATIC))
>   {
> // ...
> // Your current code, except that for 
> if (c_parser_next_token_is (parser, CPP_MULT))
>   {
> c_parser_consume_token (parser);
> if (c_parser_next_token_is (parser, CPP_COMMA))
>   {
> c_parser_consume_token (parser);
> continue;
>   }
> break;
>   }
>   }
> else if (... num: / length: )
>   {
> // ...
>   }
> // ...
> mark_exp_read (expr);
> ops[idx] = expr;
> 
> if (kind == OMP_CLAUSE_GANG
> && c_parser_next_token_is (parser, CPP_COMMA))
>   {
> c_parser_consume_token (parser);
> continue;
>   }
> break;
>   }
>   while (1);
> 
>   if (!c_parser_require (parser, CPP_CLOSE_PAREN, "expected %<)%>"))
>   goto cleanup_error;
> 
> That way you don't parse something that is not in the grammar.

I did that. It turned out to be a little more compact than what I had
before. Is this OK for trunk?
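The loop structure Jakub suggests can be modelled outside GCC to check which spellings it accepts. The sketch below assumes a hypothetical token array (one string per token, starting after the opening parenthesis and ending with the closing one), treats an int-expr as a single token, and uses invented names (parse_shape_args, is_tok). It mirrors only the control flow of the suggestion: fresh lookahead on every iteration, ops[idx] slots with a duplicate check, a comma accepted only for gang, and a mandatory closing parenthesis. It is not the GCC API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Toy model (hypothetical): an int-expr is one token; ops[0] receives
   the num:/length: argument, ops[1] the gang static: argument, with
   "*" kept as its spelling.  */
enum shape_kind { SHAPE_GANG, SHAPE_WORKER, SHAPE_VECTOR };

static bool
is_tok (const char *tok, const char *s)
{
  return tok != NULL && strcmp (tok, s) == 0;
}

/* Return true and fill OPS on success, false on a syntax error.  */
static bool
parse_shape_args (enum shape_kind kind, const char *const *toks,
		  const char *ops[2])
{
  const char *id = kind == SHAPE_VECTOR ? "length" : "num";
  size_t i = 0;

  assert (toks != NULL && ops != NULL);
  ops[0] = ops[1] = NULL;

  do
    {
      int idx = 0;

      if (kind == SHAPE_GANG && is_tok (toks[i], "static"))
	{
	  i++;
	  if (!is_tok (toks[i], ":"))
	    return false;		/* expected ':' */
	  i++;
	  idx = 1;
	}
      else if (is_tok (toks[i], id) && is_tok (toks[i + 1], ":"))
	i += 2;			/* Optional "num:" / "length:" label.  */

      if (ops[idx] != NULL)
	return false;		/* Duplicate num or static argument.  */

      /* The argument itself: "*" only after static:, otherwise a single
	 token that is not a separator.  */
      const char *arg = toks[i];
      if (arg == NULL || is_tok (arg, ",") || is_tok (arg, ")")
	  || is_tok (arg, ":") || (is_tok (arg, "*") && idx != 1))
	return false;
      ops[idx] = arg;
      i++;

      /* Only gang takes a comma-separated argument list.  */
      if (kind == SHAPE_GANG && is_tok (toks[i], ","))
	{
	  i++;
	  continue;
	}
      break;
    }
  while (1);

  /* Anything other than ")" here, e.g. "abc" or "static", is an error.  */
  return is_tok (toks[i], ")") && toks[i + 1] == NULL;
}
```

With this structure gang (static: *, num: 5) and gang (num, static: 6) are accepted, while gang (num: 5 static: *), gang (static: * abc), gang (, static: *) and gang (16, 24) are rejected.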

Cesar

2015-10-24  Cesar Philippidis  
	Thomas Schwinge  
	James Norris  
	Joseph Myers  
	Julian Brown  
	Bernd Schmidt  

	gcc/c/
	* c-parser.c (c_parser_oacc_shape_clause): New.
	(c_parser_oacc_simple_clause): New.
	(c_parser_oacc_all_clauses): Add auto, gang, seq, vector, worker.
	(OACC_LOOP_CLAUSE_MASK): Add gang, worker, vector, auto, seq.

2015-10-24  Cesar Philippidis  

	gcc/testsuite/
	* c-c++-common/goacc/loop-shape.c: New test.

diff --git a/gcc/c/c-parser.c b/gcc/c/c-parser.c
index c8c6a2d..2ad3825 100644
--- a/gcc/c/c-parser.c
+++ b/gcc/c/c-parser.c
@@ -11188,6 +11188,144 @@ c_parser_omp_clause_num_workers (c_p

Re: [OpenACC 5/11] C++ FE changes

2015-10-24 Thread Cesar Philippidis
On 10/23/2015 07:37 PM, Cesar Philippidis wrote:
> On 10/23/2015 01:25 PM, Cesar Philippidis wrote:
>> On 10/22/2015 01:52 AM, Jakub Jelinek wrote:
>>> On Wed, Oct 21, 2015 at 03:18:55PM -0400, Nathan Sidwell wrote:
>>>> This patch is the C++ changes matching the C ones of patch 4.  In
>>>> finish_omp_clauses, the gang, worker, & vector clauses are handled the same
>>>> as OpenMP's 'num_threads' clause.  One change to num_threads is the
>>>> augmentation of a diagnostic to add %<...%>  markers to the clause name.
>>>
>>> Indeed, lots of older OpenMP diagnostics is missing %<...%> markers around
>>> keywords.  Something to fix eventually.
>>
>> I updated omp tasks and teams in semantics.c.
>>
>>>> 2015-10-20  Cesar Philippidis  
>>>>Thomas Schwinge  
>>>>James Norris  
>>>>Joseph Myers  
>>>>Julian Brown  
>>>>Nathan Sidwell 
>>>>
>>>>* parser.c (cp_parser_omp_clause_name): Add auto, gang, seq,
>>>>vector, worker.
>>>>(cp_parser_oacc_simple_clause): New.
>>>>(cp_parser_oacc_shape_clause): New.
>>>
>>> What I've said for the C FE patch, plus:
>>>
>>>> +if (cp_lexer_next_token_is (lexer, CPP_NAME)
>>>> +|| cp_lexer_next_token_is (lexer, CPP_KEYWORD))
>>>> +  {
>>>> +tree name_kind = cp_lexer_peek_token (lexer)->u.value;
>>>> +const char *p = IDENTIFIER_POINTER (name_kind);
>>>> +if (kind == OMP_CLAUSE_GANG && strcmp ("static", p) == 0)
>>>
>>> As static is a keyword, wouldn't it be better to just handle that case
>>> using cp_lexer_next_token_is_keyword (lexer, RID_STATIC)?
>>>
>>> Also, what is the exact grammar of the shape arguments?
>>> Would be nice to describe the grammar, in the grammar you just say
>>> expression, at least for vector/worker, which is clearly not accurate.
>>>
>>> It seems the intent is that num: or length: or static: is optional, right?
>>> But if that is the case, you should treat those as parsed only if followed
>>> by :.  While static is a keyword, so you can't have a variable called like
>>> that, having vector(length) or vector(num) should not be rejected.
>>> So, I would have expected that it should test if it is RID_STATIC
>>> followed by CPP_COLON (and only in that case consume those tokens),
>>> or CPP_NAME of id followed by CPP_COLON (and only in that case consume those
>>> tokens), otherwise parse it as assignment expression.
>>
>> That function now peeks ahead to look for a colon, so now it can handle
>> variables with the name of clause keywords.
>>
>>> The C FE may have similar issue.  Plus of course there should be testsuite
>>> coverage for all the weird cases.
>>
>> I included a new test in a different patch because it's common to both C
>> and C++.
>>
>>>> +  case OMP_CLAUSE_GANG:
>>>> +  case OMP_CLAUSE_VECTOR:
>>>> +  case OMP_CLAUSE_WORKER:
>>>> +/* Operand 0 is the num: or length: argument.  */
>>>> +t = OMP_CLAUSE_OPERAND (c, 0);
>>>> +if (t == NULL_TREE)
>>>> +  break;
>>>> +
>>>> +t = maybe_convert_cond (t);
>>>
>>> Can you explain the maybe_convert_cond calls (in both cases here,
>>> plus the preexisting in OMP_CLAUSE_VECTOR_LENGTH)?
>>> The reason why it is used for OpenMP if and final clauses is that those have
>>> a condition argument, either the condition is zero or non-zero (so
>>> effectively it is turned into a bool).
>>> But aren't the gang/vector/worker/vector_length arguments integers rather
>>> than conditions?  I'd expect that finish_omp_clauses should verify
>>> those operands are indeed integral expressions (if that is the requirement
>>> in the standard), as it is something that for C++ can't be verified during
>>> parsing, if arbitrary expressions are parsed there.
>>
>> It's probably a copy-and-paste error. This functionality was added
>> incrementally. I removed that check.
>>
>>>> @@ -5959,32 +5990,58 @@ finish_omp_clauses (tree clauses, bool a
>>>>  break;
>>>>  
>>>>case OMP_CLAUSE_NUM_THREADS:
>>>> -t = OMP_CLAUSE_NUM_THREADS_EXPR (c);
>>>> -if (t == error_mark_node)
>>

Re: [OpenACC 4/11] C FE changes

2015-10-26 Thread Cesar Philippidis
On 10/26/2015 01:59 AM, Jakub Jelinek wrote:

> Ok for trunk with those changes fixed.

Here's the patch with those changes. Nathan will commit this patch along
with the rest of the OpenACC execution model patches.

Thanks,
Cesar

2015-10-26  Cesar Philippidis  
	Thomas Schwinge  
	James Norris  
	Joseph Myers  
	Julian Brown  
	Bernd Schmidt  

	gcc/c/
	* c-parser.c (c_parser_oacc_shape_clause): New.
	(c_parser_oacc_simple_clause): New.
	(c_parser_oacc_all_clauses): Add auto, gang, seq, vector, worker.
	(OACC_LOOP_CLAUSE_MASK): Add gang, worker, vector, auto, seq.

2015-10-26  Cesar Philippidis  

	gcc/testsuite/
	* c-c++-common/goacc/loop-shape.c: New test.


diff --git a/gcc/c/c-parser.c b/gcc/c/c-parser.c
index c8c6a2d..13f09d8 100644
--- a/gcc/c/c-parser.c
+++ b/gcc/c/c-parser.c
@@ -11188,6 +11188,167 @@ c_parser_omp_clause_num_workers (c_parser *parser, tree list)
 }
 
 /* OpenACC:
+
+gang [( gang-arg-list )]
+worker [( [num:] int-expr )]
+vector [( [length:] int-expr )]
+
+  where gang-arg is one of:
+
+[num:] int-expr
+static: size-expr
+
+  and size-expr may be:
+
+*
+int-expr
+*/
+
+static tree
+c_parser_oacc_shape_clause (c_parser *parser, omp_clause_code kind,
+			const char *str, tree list)
+{
+  const char *id = "num";
+  tree ops[2] = { NULL_TREE, NULL_TREE }, c;
+  location_t loc = c_parser_peek_token (parser)->location;
+
+  if (kind == OMP_CLAUSE_VECTOR)
+id = "length";
+
+  if (c_parser_next_token_is (parser, CPP_OPEN_PAREN))
+{
+  c_parser_consume_token (parser);
+
+  do
+	{
+	  c_token *next = c_parser_peek_token (parser);
+	  int idx = 0;
+
+	  /* Gang static argument.  */
+	  if (kind == OMP_CLAUSE_GANG
+	  && c_parser_next_token_is_keyword (parser, RID_STATIC))
+	{
+	  c_parser_consume_token (parser);
+
+	  if (!c_parser_require (parser, CPP_COLON, "expected %<:%>"))
+		goto cleanup_error;
+
+	  idx = 1;
+	  if (ops[idx] != NULL_TREE)
+		{
+		  c_parser_error (parser, "too many %<static%> arguments");
+		  goto cleanup_error;
+		}
+
+	  /* Check for the '*' argument.  */
+	  if (c_parser_next_token_is (parser, CPP_MULT))
+		{
+		  c_parser_consume_token (parser);
+		  ops[idx] = integer_minus_one_node;
+
+		  if (c_parser_next_token_is (parser, CPP_COMMA))
+		{
+		  c_parser_consume_token (parser);
+		  continue;
+		}
+		  else
+		break;
+		}
+	}
+	  /* Worker num: argument and vector length: arguments.  */
+	  else if (c_parser_next_token_is (parser, CPP_NAME)
+		   && strcmp (id, IDENTIFIER_POINTER (next->value)) == 0
+		   && c_parser_peek_2nd_token (parser)->type == CPP_COLON)
+	{
+	  c_parser_consume_token (parser);  /* id  */
+	  c_parser_consume_token (parser);  /* ':'  */
+	}
+
+	  /* Now collect the actual argument.  */
+	  if (ops[idx] != NULL_TREE)
+	{
+	  c_parser_error (parser, "unexpected argument");
+	  goto cleanup_error;
+	}
+
+	  location_t expr_loc = c_parser_peek_token (parser)->location;
+	  tree expr = c_parser_expr_no_commas (parser, NULL).value;
+	  if (expr == error_mark_node)
+	goto cleanup_error;
+
+	  mark_exp_read (expr);
+	  expr = c_fully_fold (expr, false, NULL);
+
+	  /* Attempt to statically determine when the number isn't a
+	 positive integer.  */
+
+	  if (!INTEGRAL_TYPE_P (TREE_TYPE (expr)))
+	{
+	  c_parser_error (parser, "expected integer expression");
+	  return list;
+	}
+
+	  tree c = fold_build2_loc (expr_loc, LE_EXPR, boolean_type_node, expr,
+build_int_cst (TREE_TYPE (expr), 0));
+	  if (c == boolean_true_node)
+	{
+	  warning_at (loc, 0,
+			  "%<%s%> value must be positive", str);
+	  expr = integer_one_node;
+	}
+
+	  ops[idx] = expr;
+
+	  if (kind == OMP_CLAUSE_GANG
+	  && c_parser_next_token_is (parser, CPP_COMMA))
+	{
+	  c_parser_consume_token (parser);
+	  continue;
+	}
+	  break;
+	}
+  while (1);
+
+  if (!c_parser_require (parser, CPP_CLOSE_PAREN, "expected %<)%>"))
+	goto cleanup_error;
+}
+
+  check_no_duplicate_clause (list, kind, str);
+
+  c = build_omp_clause (loc, kind);
+
+  if (ops[1])
+OMP_CLAUSE_OPERAND (c, 1) = ops[1];
+
+  OMP_CLAUSE_OPERAND (c, 0) = ops[0];
+  OMP_CLAUSE_CHAIN (c) = list;
+
+  return c;
+
+ cleanup_error:
+  c_parser_skip_until_found (parser, CPP_CLOSE_PAREN, 0);
+  return list;
+}
+
+/* OpenACC:
+   auto
+   independent
+   nohost
+   seq */
+
+static tree
+c_parser_oacc_simple_clause (c_parser *parser, enum omp_clause_code code,
+			 tree list)
+{
+  check_no_duplicate_clause (list, code, omp_clause_code_name[code]);
+
+  tree c = build_omp_clause (c_parser_peek_token (parser)->location, code);
+  OMP_CLAUSE_CHAIN (c) = list;
+
+  return c;
+}
+
+/* OpenACC:
async [( int-expr )] */
 
 stati

Re: [OpenACC 5/11] C++ FE changes

2015-10-26 Thread Cesar Philippidis
On 10/26/2015 03:20 AM, Jakub Jelinek wrote:
> On Sat, Oct 24, 2015 at 02:11:41PM -0700, Cesar Philippidis wrote:

>> --- a/gcc/cp/semantics.c
>> +++ b/gcc/cp/semantics.c
>> @@ -5911,6 +5911,31 @@ finish_omp_clauses (tree clauses, bool allow_fields, 
>> bool declare_simd)
>>  bitmap_set_bit (&firstprivate_head, DECL_UID (t));
>>goto handle_field_decl;
>>  
>> +case OMP_CLAUSE_GANG:
>> +case OMP_CLAUSE_VECTOR:
>> +case OMP_CLAUSE_WORKER:
>> +  /* Operand 0 is the num: or length: argument.  */
>> +  t = OMP_CLAUSE_OPERAND (c, 0);
>> +  if (t == NULL_TREE)
>> +break;
>> +
>> +  if (!processing_template_decl)
>> +t = fold_build_cleanup_point_expr (TREE_TYPE (t), t);
>> +  OMP_CLAUSE_OPERAND (c, 0) = t;
>> +
>> +  if (OMP_CLAUSE_CODE (c) != OMP_CLAUSE_GANG)
>> +break;
> 
> I think it would be better to do the Operand 1 stuff first for
> case OMP_CLAUSE_GANG: only, and then have /* FALLTHRU */ into
> case OMP_CLAUSE_{VECTOR,WORKER}: which would handle the first argument.
> 
> You should add testing that the operand has INTEGRAL_TYPE_P type
> (except that for processing_template_decl it can be
> type_dependent_expression_p instead of INTEGRAL_TYPE_P).
>
> Also, the if (t == NULL_TREE) stuff looks fishy, because e.g. right now
> if you have OMP_CLAUSE_GANG gang (static: expr) or similar,
> you wouldn't wrap the expr into cleanup point.
> So, instead it should be
>   if (t)
>     {
>       if (t == error_mark_node)
>         remove = true;
>       else if (!type_dependent_expression_p (t)
>                && !INTEGRAL_TYPE_P (TREE_TYPE (t)))
>         {
>           error_at (OMP_CLAUSE_LOCATION (c), ...);
>           remove = true;
>         }
>       else
>         {
>           t = mark_rvalue_use (t);
>           if (!processing_template_decl)
>             t = fold_build_cleanup_point_expr (TREE_TYPE (t), t);
>           OMP_CLAUSE_OPERAND (c, 0) = t;
>         }
>     }
> or so.  Also, can the expressions be arbitrary integers, or just
> non-negative, or positive?  If it is INTEGER_CST, that is something that
> could be checked here too.

I ended up handling them together with OMP_CLAUSE_NUM_*, since they all
require positive integer expressions. The only exception was
OMP_CLAUSE_GANG, which has two optional arguments.
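The operand checks discussed above (defer type-dependent operands, reject non-integral ones, and reject an integer constant that is not positive) can be modelled with a small classification routine. This is a hypothetical sketch: `struct clause_operand` and `check_positive_int_operand` are invented stand-ins for GCC trees and for the logic in finish_omp_clauses, not GCC's API. Whether a non-positive constant should be a hard error or only a warning is left to the caller as an assumption.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a clause operand; GCC uses trees instead.  */
struct clause_operand
{
  bool type_dependent;  /* Only possible while processing a template.  */
  bool integral;        /* Would correspond to INTEGRAL_TYPE_P.  */
  bool constant;        /* Would correspond to an INTEGER_CST.  */
  long value;           /* Meaningful only when CONSTANT is true.  */
};

enum operand_status
{
  OPERAND_OK,            /* Keep the clause.  */
  OPERAND_DEFERRED,      /* Recheck after template instantiation.  */
  OPERAND_NOT_INTEGRAL,  /* Diagnose and remove the clause.  */
  OPERAND_NOT_POSITIVE   /* Diagnose a constant that is <= 0.  */
};

static enum operand_status
check_positive_int_operand (const struct clause_operand *op)
{
  /* The type is unknown until instantiation, so nothing can be checked.  */
  if (op->type_dependent)
    return OPERAND_DEFERRED;
  if (!op->integral)
    return OPERAND_NOT_INTEGRAL;
  /* Only a constant can be checked for positivity at compile time.  */
  if (op->constant && op->value <= 0)
    return OPERAND_NOT_POSITIVE;
  return OPERAND_OK;
}
```

A non-constant integral operand is accepted as-is, since its sign is only known at run time.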

>>else if (!type_dependent_expression_p (t)
>> && !INTEGRAL_TYPE_P (TREE_TYPE (t)))
>>  {
>> -  error ("num_threads expression must be integral");
>> + switch (OMP_CLAUSE_CODE (c))
>> +{
>> +case OMP_CLAUSE_NUM_TASKS:
>> +  error ("% expression must be integral"); break;
>> +case OMP_CLAUSE_NUM_TEAMS:
>> +  error ("% expression must be integral"); break;
>> +case OMP_CLAUSE_NUM_THREADS:
>> +  error ("% expression must be integral"); break;
>> +case OMP_CLAUSE_NUM_GANGS:
>> +  error ("% expression must be integral"); break;
>> +case OMP_CLAUSE_NUM_WORKERS:
>> +  error ("% expression must be integral");
>> +  break;
>> +case OMP_CLAUSE_VECTOR_LENGTH:
>> +  error ("% expression must be integral");
>> +  break;
> 
> When touching these, can you please use error_at (OMP_CLAUSE_LOCATION (c),
> instead of error ( ?

Done

>> +default:
>> +  error ("invalid argument");
> 
> What invalid argument?  I'd say that is clearly gcc_unreachable (); case.
> 
> But, I think it would be better to just use
>   error_at (OMP_CLAUSE_LOCATION (c), "%qs expression must be integral",
>   omp_clause_code_name[c]);

I used that generic message for all of those clauses except for _GANG,
_WORKER and _VECTOR. The gang clause, at the very least, needed it to
disambiguate the static and num arguments. If you want I can handle
_WORKER and _VECTOR with the generic message. I only included it because
those arguments are optional, whereas they are mandatory for the other
clauses.
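For reference, the table-driven message Jakub suggests can be sketched as follows. The enum, the name table and `format_integral_error` are hypothetical stand-ins for GCC's omp_clause_code and omp_clause_code_name[]. Such a table is indexed by the clause code (in GCC, presumably `OMP_CLAUSE_CODE (c)`) rather than by the clause itself, and a single lookup replaces both the per-clause switch and its default case.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical subset of clause codes; GCC's enum is much larger.  */
enum clause_code
{
  CLAUSE_NUM_TASKS,
  CLAUSE_NUM_TEAMS,
  CLAUSE_NUM_THREADS,
  CLAUSE_NUM_GANGS,
  CLAUSE_NUM_WORKERS,
  CLAUSE_VECTOR_LENGTH,
  CLAUSE_CODE_COUNT
};

/* Name table indexed by clause code, in the spirit of
   omp_clause_code_name[].  */
static const char *const clause_code_name[CLAUSE_CODE_COUNT] = {
  "num_tasks", "num_teams", "num_threads",
  "num_gangs", "num_workers", "vector_length"
};

/* Format the diagnostic into BUF.  Because the table is indexed by the
   clause code, no switch and no "invalid argument" fallback is needed.  */
static int
format_integral_error (char *buf, size_t size, enum clause_code code)
{
  assert (code < CLAUSE_CODE_COUNT);
  return snprintf (buf, size, "'%s' expression must be integral",
                   clause_code_name[code]);
}
```

GCC's `%qs` would add locale-appropriate quotes; the plain `'%s'` here only approximates that.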

Is this patch OK for trunk?

Cesar

2015-10-26  Cesar Philippidis  
	Thomas Schwinge  
	James Norris  
	Joseph Myers  
	Julian Brown  
	    Nathan Sidwell 
	Bernd Schmidt  

	gcc/cp/
	* parser.c (cp_parser_omp_clause_name): Add auto, gang, seq,
	vector, worker.
	(cp_parser_oacc_simple_clause): New.
	(cp_parser_oacc_shape_clause): New.
	(cp_parser_oacc_all_clauses): Add auto, gang, seq, vector, worker.
	(OACC_LOOP_CLA

Re: more accurate error messages omp in fortran

2015-10-27 Thread Cesar Philippidis
(was "Re: more accurate omp in fortran")

Ping.

Cesar

On 10/22/2015 08:21 AM, Cesar Philippidis wrote:
> Currently, for certain omp and oacc errors, the Fortran front end
> inaccurately reports where in the omp/acc construct the error occurred. E.g.
> 
>!$acc parallel copy (i) copy (i) copy (j)
>1
> Error: Symbol ‘i’ present on multiple clauses at (1)
> 
> instead of
> 
>!$acc parallel copy (i) copy (i) copy (j)
> 1
> Error: Symbol ‘i’ present on multiple clauses at (1)
> 
> The problem is that the front end uses the locus of the construct
> rather than that of the individual clause. As a result, the diagnostic
> pointer points to the end of the construct.
> 
> This patch teaches gfc_resolve_omp_clauses how to use the locus of each
> individual clause instead of the construct when reporting errors
> involving OMP_LIST_ clauses (which are typically clauses involving
> variables). It's still not perfect, but it does improve the quality of
> the error reporting a little. In particular, in openacc, other compilers
> are somewhat lenient in allowing variables to appear in multiple
> clauses, e.g. copyin (foo) copyout (foo), but this is clearly forbidden
> by the spec. I received some bug reports complaining that gfortran's
> errors aren't accurate.
> 
> I've also split off the check for variables appearing in multiple
> clauses into a separate function. It's a little overkill for trunk right
> now, but it is used quite a bit in gomp4 for oacc declare.
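A minimal sketch of the idea, assuming each list entry records its own location: the hypothetical `struct namelist_entry` and `find_duplicate_symbol` below are not gfortran code, and `where` is reduced to a 1-based column number.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of a clause variable-list entry.  WHERE plays the
   role of the per-clause locus; here it is just a column number.  */
struct namelist_entry
{
  const char *sym;
  int where;
};

/* Return the index of the first entry whose symbol already appeared
   earlier in the list, or -1 if every symbol is unique.  The caller can
   then report the error at ENTRIES[i].where instead of at the locus of
   the whole construct.  */
static int
find_duplicate_symbol (const struct namelist_entry *entries, size_t n)
{
  for (size_t i = 1; i < n; i++)
    for (size_t j = 0; j < i; j++)
      if (strcmp (entries[i].sym, entries[j].sym) == 0)
        return (int) i;
  return -1;
}
```

For the example line `!$acc parallel copy (i) copy (i) copy (j)`, the entries `{ "i", 22 }, { "i", 31 }, { "j", 40 }` give index 1, so the error would be reported at column 31 instead of at the end of the construct.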
> 
> I've tested these changes on x86_64. Is this ok for trunk?
> 
> Cesar
> 
> 



Re: Re: [Bulk] [OpenACC 0/7] host_data construct

2015-10-27 Thread Cesar Philippidis
On 10/26/2015 11:34 AM, Jakub Jelinek wrote:
> On Fri, Oct 23, 2015 at 10:51:42AM -0500, James Norris wrote:
>> @@ -12942,6 +12961,7 @@ c_finish_omp_clauses (tree clauses, bool is_omp, 
>> bool declare_simd)
>>  case OMP_CLAUSE_GANG:
>>  case OMP_CLAUSE_WORKER:
>>  case OMP_CLAUSE_VECTOR:
>> +case OMP_CLAUSE_USE_DEVICE:
>>pc = &OMP_CLAUSE_CHAIN (c);
>>continue;
>>  
> 
> Are there any restrictions on whether you can specify the same var multiple
> times in use_device clause?
> #pragma acc host_data use_device (x) use_device (x) use_device (y, y, y)
> ?
> If not, have you verified that the gimplifier doesn't ICE on it?  Generally
> it doesn't like the same var being mentioned multiple times.
> If yes, you can use e.g. the generic_head bitmap for that and in any case,
> cover that with sufficient testsuite coverage.

Generally variables cannot appear in multiple clauses. I'll add more
testing for this.
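If duplicates are to be rejected, the bitmap approach Jakub mentions amounts to a test-and-set per variable UID. The sketch below uses a hand-rolled fixed-size bitmap. `struct uid_bitmap`, `MAX_UID` and the helper names are hypothetical. In the front end, GCC's own bitmap type and DECL_UID would take their place.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical fixed-size bitmap keyed by a variable's UID.  */
#define MAX_UID 1024
#define WORD_BITS (sizeof (unsigned long) * CHAR_BIT)

struct uid_bitmap
{
  unsigned long words[(MAX_UID + WORD_BITS - 1) / WORD_BITS];
};

/* Set bit UID and return true if it was already set.  */
static bool
uid_bitmap_test_and_set (struct uid_bitmap *b, unsigned uid)
{
  assert (uid < MAX_UID);
  unsigned long mask = 1UL << (uid % WORD_BITS);
  bool was_set = (b->words[uid / WORD_BITS] & mask) != 0;
  b->words[uid / WORD_BITS] |= mask;
  return was_set;
}

/* Count how many entries of UIDS[0..N) repeat an earlier entry, e.g. the
   variables of "use_device (x) use_device (x) use_device (y, y, y)".
   Each repeat would be diagnosed and its clause removed before
   gimplification sees it.  */
static int
count_duplicate_uids (const unsigned *uids, int n)
{
  struct uid_bitmap seen;
  memset (&seen, 0, sizeof seen);
  int dups = 0;
  for (int i = 0; i < n; i++)
    if (uid_bitmap_test_and_set (&seen, uids[i]))
      dups++;
  return dups;
}
```

With `x` and `y` given UIDs 5 and 7, the pragma above contributes `{ 5, 5, 7, 7, 7 }`, which has three repeats.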

>> diff --git a/gcc/gimplify.c b/gcc/gimplify.c
>> index ab9e540..0c32219 100644
>> --- a/gcc/gimplify.c
>> +++ b/gcc/gimplify.c
>> @@ -93,6 +93,8 @@ enum gimplify_omp_var_data
>>  
>>GOVD_MAP_0LEN_ARRAY = 32768,
>>  
>> +  GOVD_USE_DEVICE = 65536,
>> +
>>GOVD_DATA_SHARE_CLASS = (GOVD_SHARED | GOVD_PRIVATE | GOVD_FIRSTPRIVATE
>> | GOVD_LASTPRIVATE | GOVD_REDUCTION | GOVD_LINEAR
>> | GOVD_LOCAL)
>> @@ -116,7 +118,9 @@ enum omp_region_type
>>ORT_COMBINED_TARGET = 33,
>>/* Dummy OpenMP region, used to disable expansion of
>>   DECL_VALUE_EXPRs in taskloop pre body.  */
>> -  ORT_NONE = 64
>> +  ORT_NONE = 64,
>> +  /* An OpenACC host-data region.  */
>> +  ORT_HOST_DATA = 128
> 
> I'd prefer ORT_NONE to be the last one, can you just renumber it and put
> ORT_HOST_DATA before it?

OK.

>> +static tree
>> +gimplify_oacc_host_data_1 (tree *tp, int *walk_subtrees,
>> +   void *data ATTRIBUTE_UNUSED)
>> +{
> 
> Your use_device sounds very similar to use_device_ptr clause in OpenMP,
> which is allowed on #pragma omp target data construct and is implemented
> quite a bit differently from this; it is unclear if the OpenACC standard
> requires this kind of implementation, or you just chose to implement it this
> way.  In particular, the GOMP_target_data call puts the variables mentioned
> in the use_device_ptr clauses into the mapping structures (similarly how
> map clause appears) and the corresponding vars are privatized within the
> target data region (which is a host region, basically a fancy { } braces),
> where the private variables contain the offloading device's pointers.

Is this a new OpenMP 4.5 feature? I'll take a closer look and see if
they are similar enough. I also noticed that OpenMP 4.5 has something
similar to OpenACC's enter/exit data construct now.

>> +  splay_tree_node n = NULL;
>> +  location_t loc = EXPR_LOCATION (*tp);
>> +
>> +  switch (TREE_CODE (*tp))
>> +{
>> +case ADDR_EXPR:
>> +  {
>> +tree decl = TREE_OPERAND (*tp, 0);
>> +
>> +switch (TREE_CODE (decl))
>> +  {
>> +  case ARRAY_REF:
>> +  case ARRAY_RANGE_REF:
>> +  case COMPONENT_REF:
>> +  case VIEW_CONVERT_EXPR:
>> +  case REALPART_EXPR:
>> +  case IMAGPART_EXPR:
>> +if (TREE_CODE (TREE_OPERAND (decl, 0)) == VAR_DECL)
>> +  n = splay_tree_lookup (gimplify_omp_ctxp->variables,
>> + (splay_tree_key) TREE_OPERAND (decl, 0));
>> +break;
> 
> I must say this looks really strange, you throw away all the offsets
> embedded in the component codes (fixed or variable).
> Where comes the above list?  What about other components (say bit field refs,
> etc.)?

I'm not sure. This is one of those things where multiple developers
worked on it, and the history got lost. I'll investigate it.

>> +case VAR_DECL:
> 
> What is so special about VAR_DECLs?  Shouldn't PARM_DECLs / RESULT_DECLs
> be treated the same way?
>> --- a/libgomp/libgomp.map
>> +++ b/libgomp/libgomp.map
>> @@ -378,6 +378,7 @@ GOACC_2.0 {
>>  GOACC_wait;
>>  GOACC_get_thread_num;
>>  GOACC_get_num_threads;
>> +GOACC_deviceptr;
>>  };
>>  
>>  GOACC_2.0.1 {
> 
> You shouldn't be adding new symbols into a symbol version that appeared in a
> compiler that shipped already (GCC 5 already had GOACC_2.0 symbols).
> So it should go into GOACC_2.0.1.

OK.

>> diff --git a/libgomp/oacc-mem.c b/libgomp/oacc-mem.c
>> index af067d6..497ab92 100644
>> --- a/libgomp/oacc-mem.c
>> +++ b/libgomp/oacc-mem.c
>> @@ -204,6 +204,38 @@ acc_deviceptr (void *h)
>>return d;
>>  }
>>  
>> +/* This function is used as a helper in generated code to implement pointer
>> +   lookup in host_data regions.  Unlike acc_deviceptr, it returns its 
>> argument
>> +   unchanged on a shared-memory system (e.g. the host).  */
>> +
>> +void *
>> +GOACC_deviceptr (void *h)
>> +{
>> +  splay_tree_key n;
>> +  void *d;
>> +  void *offset;
>> +
>> +  goacc_lazy_initialize ();
>> +
>> +  st
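The lookup GOACC_deviceptr performs can be illustrated with plain address arithmetic. In this hedged sketch, `struct mapping` and `lookup_device_address` are hypothetical, and a linear array replaces libgomp's splay tree. Addresses are handled as `uintptr_t`. Returning 0 for an unmapped address is an assumption of the sketch, not documented libgomp behavior.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record of one mapped host range and its device copy.  */
struct mapping
{
  uintptr_t host_start;
  uintptr_t host_end;     /* One past the last mapped host byte.  */
  uintptr_t device_start;
};

/* Translate host address H to the corresponding device address.  On a
   shared-memory device H is returned unchanged; an unmapped H yields 0
   in this sketch.  */
static uintptr_t
lookup_device_address (const struct mapping *maps, size_t n,
                       int shared_memory, uintptr_t h)
{
  if (shared_memory)
    return h;
  for (size_t i = 0; i < n; i++)
    if (h >= maps[i].host_start && h < maps[i].host_end)
      /* Keep H's offset within its block when moving to the device.  */
      return maps[i].device_start + (h - maps[i].host_start);
  return 0;
}
```

Preserving the offset is what lets a pointer into the middle of a mapped array, not just its base, be translated.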

[gomp4] fortran cleanups and c/c++ loop parsing backport

2015-10-27 Thread Cesar Philippidis
This patch contains the following:

  * C front end changes from trunk:
https://gcc.gnu.org/ml/gcc-patches/2015-10/msg02528.html

  * C++ front end changes from trunk:
https://gcc.gnu.org/ml/gcc-patches/2015-10/msg02540.html

  * Proposed fortran cleanups and enhanced error reporting changes:
https://gcc.gnu.org/ml/gcc-patches/2015-10/msg02288.html

In addition, I've added a couple more test cases and updated the
way that combined directives are handled in fortran. Because the
device_type clauses form a chain of gfc_omp_clauses, I couldn't reuse
gfc_split_omp_clauses for combined parallel and kernels loops. So that's
why I introduced a new gfc_filter_oacc_combined_clauses function.
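The splitting that gfc_filter_oacc_combined_clauses performs can be pictured on a trimmed-down clause structure. This is a hypothetical sketch: `struct clauses` keeps only a few fields of gfc_omp_clauses, and `filter_combined_clauses` and `free_clauses` are invented names. Loop-level clauses move to a new list, compute-construct clauses stay behind, and the function recurses along the device_type chain.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical, heavily trimmed stand-in for gfc_omp_clauses.
   DEVICE_TYPES chains the clauses that apply to specific device types.  */
struct clauses
{
  bool gang, worker, vector, seq;   /* Loop-level clauses.  */
  int num_gangs;                    /* A compute-construct clause.  */
  struct clauses *device_types;
};

/* Move the loop-level clauses of ORIG into a new list, leaving the
   compute-construct clauses in ORIG, and split each device_type entry
   the same way.  */
static struct clauses *
filter_combined_clauses (struct clauses *orig)
{
  if (orig == NULL)
    return NULL;

  /* calloc zero-initializes, so no separate memset is needed.  */
  struct clauses *loop = calloc (1, sizeof *loop);
  if (loop == NULL)
    abort ();

  loop->gang = orig->gang;
  orig->gang = false;
  loop->worker = orig->worker;
  orig->worker = false;
  loop->vector = orig->vector;
  orig->vector = false;
  loop->seq = orig->seq;
  orig->seq = false;

  loop->device_types = filter_combined_clauses (orig->device_types);
  return loop;
}

/* Free a list returned by filter_combined_clauses.  */
static void
free_clauses (struct clauses *c)
{
  while (c != NULL)
    {
      struct clauses *next = c->device_types;
      free (c);
      c = next;
    }
}
```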

I'll apply this patch to gomp-4_0-branch shortly. I know that I should
have broken this patch down into smaller patches, but it was already
arranged as one big patch in my source tree.

Cesar
2015-10-27  Cesar Philippidis  

	gcc/c/
	* c-parser.c (c_parser_oacc_shape_clause): Backport from trunk.
	(c_parser_omp_simple_clause): Likewise.
	(c_parser_oacc_all_clauses): Likewise.

	gcc/cp/
	* parser.c (cp_parser_oacc_shape_clause): Backport from trunk.
	(cp_parser_oacc_all_clauses): Likewise.
	* semantics.c (finish_omp_clauses): Likewise.

	gcc/fortran/
	* gfortran.h (gfc_omp_namelist): Add locus where member.
	* openmp.c (gfc_free_omp_clauses): Recursively deallocate device_type
	clauses.
	(gfc_match_omp_variable_list): New function.
	(resolve_omp_clauses): Remove where argument and use the where
	gfc_omp_namelist member when reporting errors.  Use
	resolve_omp_duplicate_list to check for variables appearing in
	multiple clauses.
	(gfc_match_omp_clauses): Update call to resolve_omp_clauses.
	(gfc_match_oacc_declare): Likewise.
	(resolve_omp_do): Likewise.
	(resolve_oacc_loop): Likewise.
	(gfc_resolve_oacc_directive): Likewise.
	(gfc_resolve_omp_directive): Likewise.
	(gfc_resolve_omp_declare_simd): Likewise.
	(resolve_oacc_declare_map): New function.
	(gfc_resolve_oacc_declare): Use it.
	* trans-openmp.c (gfc_filter_oacc_combined_clauses): New function.
	(gfc_trans_oacc_combined_directive): Use it.

	gcc/testsuite/
	* c-c++-common/goacc/loop-shape.c (int main): New test.
	* g++.dg/gomp/pr33372-1.C: Adjust expected error messages.
	* g++.dg/gomp/pr33372-3.C: Likewise.
	* gfortran.dg/goacc/combined-directives.f90: New test.
	* gfortran.dg/goacc/declare-2.f95: Adjust error message.
	* gfortran.dg/goacc/multi-clause.f90: New test.
	* gfortran.dg/gomp/intentin1.f90: Adjust error message.

	libgomp/
	* testsuite/libgomp.oacc-fortran/combdir-1.f90: Rename to ...
	* testsuite/libgomp.oacc-fortran/combined-directive-1.f90: ... this.
	Add a description of the test at the top of the file.

diff --git a/gcc/c/c-parser.c b/gcc/c/c-parser.c
index 3c36fc6..a1465bf 100644
--- a/gcc/c/c-parser.c
+++ b/gcc/c/c-parser.c
@@ -11226,119 +11226,146 @@ c_parser_omp_clause_is_device_ptr (c_parser *parser, tree list)
 }
 
 /* OpenACC:
-   gang [( gang_expr_list )]
-   worker [( expression )]
-   vector [( expression )] */
+
+gang [( gang-arg-list )]
+worker [( [num:] int-expr )]
+vector [( [length:] int-expr )]
+
+  where gang-arg is one of:
+
+[num:] int-expr
+static: size-expr
+
+  and size-expr may be:
+
+*
+int-expr
+*/
 
 static tree
-c_parser_oacc_shape_clause (c_parser *parser, pragma_omp_clause c_kind,
+c_parser_oacc_shape_clause (c_parser *parser, omp_clause_code kind,
 			const char *str, tree list)
 {
-  omp_clause_code kind;
   const char *id = "num";
-
-  switch (c_kind)
-{
-default:
-  gcc_unreachable ();
-case PRAGMA_OACC_CLAUSE_GANG:
-  kind = OMP_CLAUSE_GANG;
-  break;
-case PRAGMA_OACC_CLAUSE_VECTOR:
-  kind = OMP_CLAUSE_VECTOR;
-  id = "length";
-  break;
-case PRAGMA_OACC_CLAUSE_WORKER:
-  kind = OMP_CLAUSE_WORKER;
-  break;
-}
-
-  tree op0 = NULL_TREE, op1 = NULL_TREE;
+  tree ops[2] = { NULL_TREE, NULL_TREE }, c;
   location_t loc = c_parser_peek_token (parser)->location;
 
+  if (kind == OMP_CLAUSE_VECTOR)
+id = "length";
+
   if (c_parser_next_token_is (parser, CPP_OPEN_PAREN))
 {
-  tree *op_to_parse = &op0;
   c_parser_consume_token (parser);
 
   do
 	{
-	  if (c_parser_next_token_is (parser, CPP_NAME)
-	  || c_parser_next_token_is (parser, CPP_KEYWORD))
+	  c_token *next = c_parser_peek_token (parser);
+	  int idx = 0;
+
+	  /* Gang static argument.  */
+	  if (kind == OMP_CLAUSE_GANG
+	  && c_parser_next_token_is_keyword (parser, RID_STATIC))
 	{
-	  tree name_kind = c_parser_peek_token (parser)->value;
-	  const char *p = IDENTIFIER_POINTER (name_kind);
-	  if (kind == OMP_CLAUSE_GANG && strcmp ("static", p) == 0)
+	  c_parser_consume_token (parser);
+
+	  if (!c_parser_require (parser, CPP_COLON, "expected %<:%>"))
+		goto cleanup_error;
+
+	  idx = 1;
+	

Re: [gomp4] fortran cleanups and c/c++ loop parsing backport

2015-10-28 Thread Cesar Philippidis
On 10/28/2015 04:00 AM, Thomas Schwinge wrote:
> Hi Cesar!
> 
> On Tue, 27 Oct 2015 11:36:10 -0700, Cesar Philippidis 
>  wrote:
>> This patch contains the following:
>>
>>   * C front end changes from trunk:
>> https://gcc.gnu.org/ml/gcc-patches/2015-10/msg02528.html
>>
>>   * C++ front end changes from trunk:
>> https://gcc.gnu.org/ml/gcc-patches/2015-10/msg02540.html
>>
>>   * Proposed fortran cleanups and enhanced error reporting changes:
>> https://gcc.gnu.org/ml/gcc-patches/2015-10/msg02288.html
> 
> I suppose the latter is a prerequisite for other Fortran front end
> changes you've also committed?  Otherwise, why not get that patch into
> trunk first?  That should save me from having to deal with potentially
> more merge conflicts later on...

It wasn't necessarily a prerequisite for these changes, but I've been
trying to get that patch into trunk for a while now. Plus, part of those
cleanups touched declare, which Jim is going to work on soon.

>> In addition, I've added a couple more test cases and updated the
>> way that combined directives are handled in fortran. Because the
>> device_type clauses form a chain of gfc_omp_clauses, I couldn't reuse
>> gfc_split_omp_clauses for combined parallel and kernels loops. So that's
>> why I introduced a new gfc_filter_oacc_combined_clauses function.
> 
> Thanks, but...
> 
>> I'll apply this patch to gomp-4_0-branch shortly. I know that I should
>> have broken this patch down into smaller patches
> 
> Yes.
> 
>> but it was already
>> arranged as one big patch in my source tree.
> 
> You're using Git, so that's not a good excuse.  :-P

I find git to be too temperamental.

>> --- a/gcc/fortran/trans-openmp.c
>> +++ b/gcc/fortran/trans-openmp.c
>> @@ -3634,12 +3634,65 @@ gfc_trans_omp_do (gfc_code *code, gfc_exec_op op, 
>> stmtblock_t *pblock,
>>return gfc_finish_block (&block);
>>  }
>>  
>> -/* parallel loop and kernels loop. */
>> +/* Helper function to filter combined oacc constructs.  ORIG_CLAUSES
>> +   contains the unfiltered list of clauses.  LOOP_CLAUSES corresponds to
>> +   the filter list of loop clauses corresponding to the enclosed list.
>> +   This function is called recursively to handle device_type clauses.  */
>> +
>> +static void
>> +gfc_filter_oacc_combined_clauses (gfc_omp_clauses **orig_clauses,
>> +  gfc_omp_clauses **loop_clauses)
>> +{
>> +  if (*orig_clauses == NULL)
>> +{
>> +  *loop_clauses = NULL;
>> +  return;
>> +}
>> +
>> +  *loop_clauses = gfc_get_omp_clauses ();
>> +
>> +  memset (*loop_clauses, 0, sizeof (gfc_omp_clauses));
> 
> This has already been created zero-initialized.

I was just doing what I was doing before. I removed that in the
follow-up patch.

>> +  (*loop_clauses)->gang = (*orig_clauses)->gang;
>> +  (*orig_clauses)->gang = false;
>> +  (*loop_clauses)->gang_expr = (*orig_clauses)->gang_expr;
>> +  (*orig_clauses)->gang_expr = NULL;
>> +  (*loop_clauses)->gang_static = (*orig_clauses)->gang_static;
>> +  (*orig_clauses)->gang_static = false;
>> +  (*loop_clauses)->vector = (*orig_clauses)->vector;
>> +  (*orig_clauses)->vector = false;
>> +  (*loop_clauses)->vector_expr = (*orig_clauses)->vector_expr;
>> +  (*orig_clauses)->vector_expr = NULL;
>> +  (*loop_clauses)->worker = (*orig_clauses)->worker;
>> +  (*orig_clauses)->worker = false;
>> +  (*loop_clauses)->worker_expr = (*orig_clauses)->worker_expr;
>> +  (*orig_clauses)->worker_expr = NULL;
>> +  (*loop_clauses)->seq = (*orig_clauses)->seq;
>> +  (*orig_clauses)->seq = false;
>> +  (*loop_clauses)->independent = (*orig_clauses)->independent;
>> +  (*orig_clauses)->independent = false;
>> +  (*loop_clauses)->par_auto = (*orig_clauses)->par_auto;
>> +  (*orig_clauses)->par_auto = false;
>> +  (*loop_clauses)->acc_collapse = (*orig_clauses)->acc_collapse;
>> +  (*orig_clauses)->acc_collapse = false;
>> +  (*loop_clauses)->collapse = (*orig_clauses)->collapse;
>> +  /* Don't reset (*orig_clauses)->collapse.  */
> 
> Why?  (Extend source code comment?)  The original code (cited just below)
> did this differently.

Because that's what gfc_split_omp_clauses does. I'm not sure why that's
required for gfc_trans_omp_do, but it is. gfc_trans_omp_do appears to be
operating on two sets of clauses for some non-obvious reason.

>> +  (*loop_clause

Re: [OpenACC] declare directive

2015-10-28 Thread Cesar Philippidis
On 10/27/2015 01:18 PM, James Norris wrote:

> This patch adds the processing of OpenACC declare directive in C
> and C++. (Note: Support in Fortran is already in trunk.)
> Commentary on the changes is included as an attachment (NOTES).

A quick diff of gomp4 and trunk reveals quite a few fortran changes that
aren't present in trunk. Can you post those changes as a separate patch?

Thanks,
Cesar



[gomp4] minor cfe backports

2015-10-28 Thread Cesar Philippidis
I've applied this patch which backports a change in the way that seq and
auto are parsed in the c front end from trunk to gomp4.

Next up, I'm preparing a patch to remove *_omp_positive_int_clause from
the c and c++ front ends in gomp4. That function is used to parse
num_threads, num_gangs, num_workers and vector_length in gomp4. But
support for those clauses is already present in trunk. I'll post more
details with the patch later.

Cesar
2015-10-28  Cesar Philippidis  

	* gcc/c/c-parser.c (c_parser_oacc_simple_clause): New
	function.
	(c_parser_oacc_all_clauses): Use it instead of
	c_parser_omp_simple_clause.

diff --git a/gcc/c/c-parser.c b/gcc/c/c-parser.c
index a1465bf..e4a0aca 100644
--- a/gcc/c/c-parser.c
+++ b/gcc/c/c-parser.c
@@ -11365,7 +11365,25 @@ c_parser_oacc_shape_clause (c_parser *parser, omp_clause_code kind,
 
  cleanup_error:
   c_parser_skip_until_found (parser, CPP_CLOSE_PAREN, 0);
-  return list;  return c;
+  return list;
+}
+
+/* OpenACC:
+   auto
+   independent
+   nohost
+   seq */
+
+static tree
+c_parser_oacc_simple_clause (c_parser *parser, enum omp_clause_code code,
+			 tree list)
+{
+  check_no_duplicate_clause (list, code, omp_clause_code_name[code]);
+
+  tree c = build_omp_clause (c_parser_peek_token (parser)->location, code);
+  OMP_CLAUSE_CHAIN (c) = list;
+
+  return c;
 }
 
 /* OpenACC:
@@ -12724,7 +12742,7 @@ c_parser_oacc_all_clauses (c_parser *parser, omp_clause_mask mask,
 	  c_name = "async";
 	  break;
 	case PRAGMA_OACC_CLAUSE_AUTO:
-	  clauses = c_parser_omp_simple_clause (parser, OMP_CLAUSE_AUTO,
+	  clauses = c_parser_oacc_simple_clause (parser, OMP_CLAUSE_AUTO,
 		clauses);
 	  c_name = "auto";
 	  break;
@@ -12848,7 +12866,7 @@ c_parser_oacc_all_clauses (c_parser *parser, omp_clause_mask mask,
 	  c_name = "reduction";
 	  break;
 	case PRAGMA_OACC_CLAUSE_SEQ:
-	  clauses = c_parser_omp_simple_clause (parser, OMP_CLAUSE_SEQ,
+	  clauses = c_parser_oacc_simple_clause (parser, OMP_CLAUSE_SEQ,
 		clauses);
 	  c_name = "seq";
 	  break;
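The new helper follows a common front-end pattern: diagnose a duplicate, build the clause, and prepend it to the chain, so the finished chain lists clauses in reverse order of appearance. As in the patch, the duplicate check only diagnoses and the clause is still added. This is a hypothetical model. `struct clause`, `has_clause` and `add_simple_clause` are invented, while GCC links real clauses through OMP_CLAUSE_CHAIN.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical clause node with no arguments.  */
enum clause_code { CLAUSE_AUTO, CLAUSE_INDEPENDENT, CLAUSE_NOHOST, CLAUSE_SEQ };

struct clause
{
  enum clause_code code;
  struct clause *chain;
};

/* Return true if CODE already appears in LIST.  */
static bool
has_clause (const struct clause *list, enum clause_code code)
{
  for (; list != NULL; list = list->chain)
    if (list->code == code)
      return true;
  return false;
}

/* Diagnose a duplicate, then build a clause and prepend it to LIST,
   returning the new head of the chain.  */
static struct clause *
add_simple_clause (struct clause *list, enum clause_code code)
{
  if (has_clause (list, code))
    fprintf (stderr, "duplicate clause of kind %d\n", (int) code);
  struct clause *c = malloc (sizeof *c);
  if (c == NULL)
    abort ();
  c->code = code;
  c->chain = list;
  return c;
}
```

Parsing `seq auto` this way yields the chain `auto -> seq`.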


[gomp4] revert num_gangs, num_workers, vector_length and num_threads parser changes in c/c++

2015-10-29 Thread Cesar Philippidis
In gomp-4_0-branch, we've tried to consolidate the parsing of all of the
clauses of the form

  foo (int-expression)

into a single c*_parser_omp_positive_int_clause function. At the time,
such clauses included num_gangs, num_workers, vector_length and
num_threads. Looking at OpenMP 4.5, there are additional candidates for
this function, specifically num_tasks, grainsize, priority and hint.
With that in mind, parser support for all of the aforementioned clauses
is already present in trunk, so I'll revert these changes in
gomp-4_0-branch since they add no functionality. We might revisit a
similar patch if OpenACC adds new clauses of this form in the future.

I've applied this patch to gomp-4_0-branch.

Cesar


Re: [gomp4] revert num_gangs, num_workers, vector_length and num_threads parser changes in c/c++

2015-10-29 Thread Cesar Philippidis
On 10/29/2015 07:08 AM, Cesar Philippidis wrote:
> In gomp-4_0-branch, we've tried to consolidate the parsing of all of the
> clauses of the form
> 
>   foo (int-expression)
> 
> into a single c*_parser_omp_positive_int_clause function. At the time,
> such clauses included num_gangs, num_workers, vector_length and
> num_threads. Looking at OpenMP 4.5, there are additional candidates for
> this function, specifically num_tasks, grainsize, priority and hint.
> With that in mind, parser support for all of the aforementioned clauses
> is already present in trunk, so I'll revert these changes in
> gomp-4_0-branch since they add no functionality. We might revisit a
> similar patch if OpenACC adds new clauses of this form in the future.
> 
> I've applied this patch to gomp-4_0-branch.

I found some other bits that needed to be transferred from trunk, which
the attached patch does.

Note that I introduced a regression in template.C in gomp-4_0-branch in
the previous patch. The plan is to get templates working in trunk first,
then backport the fix to gomp-4_0-branch.

I've applied this patch to gomp-4_0-branch.

Cesar
2015-10-29  Cesar Philippidis  

	gcc/cp/
	* parser.c (cp_parser_omp_simple_clause): Rename to ...
	(cp_parser_oacc_simple_clause): ... this.
	(cp_parser_omp_clause_untied): Restore from trunk.
	(cp_parser_omp_clause_branch): Likewise.
	(cp_parser_oacc_all_clauses): Use cp_parser_oacc_simple_clause for
	OACC_CLAUSE_{AUTO,INDEPENDENT,NOHOST,NUM_GANGS,SEQ}.
	(cp_parser_omp_all_clauses): Use cp_parser_omp_clause_untied for
	OMP_CLAUSE_UNTIED, and cp_parser_omp_clause_branch for
	OMP_CLAUSE_{INBRANCH,NOTINBRANCH} and CILK_CLAUSE_{MASK,NOMASK}.

diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index 71c33c4..8c1b20d 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -29642,6 +29642,23 @@ cp_parser_oacc_data_clause_deviceptr (cp_parser *parser, tree list)
   return list;
 }
 
+/* OpenACC 2.0:
+   auto
+   independent
+   nohost
+   seq */
+
+static tree
+cp_parser_oacc_simple_clause (cp_parser * /* parser  */,
+			  enum omp_clause_code code,
+			  tree list, location_t location)
+{
+  check_no_duplicate_clause (list, code, omp_clause_code_name[code], location);
+  tree c = build_omp_clause (location, code);
+  OMP_CLAUSE_CHAIN (c) = list;
+  return c;
+}
+
 /* OpenACC:
 
 gang [( gang-arg-list )]
@@ -30886,20 +30903,27 @@ cp_parser_omp_clause_schedule (cp_parser *parser, tree list, location_t location
 }
 
 /* OpenMP 3.0:
-   untied
+   untied */
 
-   OpenMP 4.0:
-   inbranch
-   notinbranch
+static tree
+cp_parser_omp_clause_untied (cp_parser * /*parser*/,
+			 tree list, location_t location)
+{
+  tree c;
 
-   OpenACC 2.0:
-   auto
-   independent
-   nohost
-   seq */
+  check_no_duplicate_clause (list, OMP_CLAUSE_UNTIED, "untied", location);
+
+  c = build_omp_clause (location, OMP_CLAUSE_UNTIED);
+  OMP_CLAUSE_CHAIN (c) = list;
+  return c;
+}
+
+/* OpenMP 4.0:
+   inbranch
+   notinbranch */
 
 static tree
-cp_parser_omp_simple_clause (cp_parser * /*parser*/, enum omp_clause_code code,
+cp_parser_omp_clause_branch (cp_parser * /*parser*/, enum omp_clause_code code,
 			 tree list, location_t location)
 {
   check_no_duplicate_clause (list, code, omp_clause_code_name[code], location);
@@ -31697,7 +31721,7 @@ cp_parser_oacc_all_clauses (cp_parser *parser, omp_clause_mask mask,
 	  c_name = "async";
 	  break;
 	case PRAGMA_OACC_CLAUSE_AUTO:
-	  clauses = cp_parser_omp_simple_clause (parser, OMP_CLAUSE_AUTO,
+	  clauses = cp_parser_oacc_simple_clause (parser, OMP_CLAUSE_AUTO,
 		 clauses, here);
 	  c_name = "auto";
 	  break;
@@ -31762,9 +31786,9 @@ cp_parser_oacc_all_clauses (cp_parser *parser, omp_clause_mask mask,
 	  c_name = "if";
 	  break;
 	case PRAGMA_OACC_CLAUSE_INDEPENDENT:
-	  clauses = cp_parser_omp_simple_clause (parser,
-		 OMP_CLAUSE_INDEPENDENT,
-		 clauses, here);
+	  clauses = cp_parser_oacc_simple_clause (parser,
+		  OMP_CLAUSE_INDEPENDENT,
+		  clauses, here);
 	  c_name = "independent";
 	  break;
 	case PRAGMA_OACC_CLAUSE_GANG:
@@ -31781,8 +31805,8 @@ cp_parser_oacc_all_clauses (cp_parser *parser, omp_clause_mask mask,
 	  c_name = "link";
 	  break;
 	case PRAGMA_OACC_CLAUSE_NOHOST:
-	  clauses = cp_parser_omp_simple_clause (parser, OMP_CLAUSE_NOHOST,
-		 clauses, here);
+	  clauses = cp_parser_oacc_simple_clause (parser, OMP_CLAUSE_NOHOST,
+		  clauses, here);
 	  c_name = "nohost";
 	  break;
 	case PRAGMA_OACC_CLAUSE_NUM_GANGS:
@@ -31823,7 +31847,7 @@ cp_parser_oacc_all_clauses (cp_parser *parser, omp_clause_mask mask,
 	  c_name = "reduction";
 	  break;
 	case PRAGMA_OACC_CLAUSE_SEQ:
-	  clauses = cp_parser_omp_simple_clause (parser, OMP_CLAUSE_SEQ,
+	  clauses = cp_parser_oacc_simple_clause (parser, OMP_CLAUSE_SEQ,
 		 clauses, here);
 	  c_name = "s

[OpenACC] num_gangs, num_workers and vector_length in c++

2015-10-29 Thread Cesar Philippidis
I noticed that num_gangs, num_workers and vector_length are parsed in
somewhat inconsistent ways in the c++ FE. Both vector_length and
num_gangs bail out as soon as they detect errors, whereas num_workers
does not. Besides that, they also check for integral expressions as
the arguments are scanned instead of deferring that check to
finish_omp_clauses. That check will cause ICEs with template arguments
once we add support for them later on.

Rather than fix each function individually, I've consolidated them into
a single cp_parser_oacc_positive_int_clause function. While this
function could be extended to support openmp clauses which accept an
integer expression argument, like num_threads, I've decided to leave
those as-is since there are no known problems with those functions at
this moment.
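The benefit of a single routine is that only the clause name varies. A toy, string-based sketch of that shape follows, assuming plain decimal literals instead of full C++ expressions. `parse_int_clause` is hypothetical and much simpler than cp_parser_oacc_positive_int_clause, which parses an assignment-expression. Like the patch, it does not check the sign of the value and leaves that to a later pass.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Parse "NAME ( integer )" at the start of *P.  On success store the
   integer in *VALUE, advance *P past the closing paren and return true.
   On a syntax error return false and leave *P unchanged.  */
static bool
parse_int_clause (const char **p, const char *name, long *value)
{
  const char *s = *p;
  size_t len = strlen (name);

  if (strncmp (s, name, len) != 0)
    return false;
  s += len;
  while (isspace ((unsigned char) *s))
    s++;
  if (*s++ != '(')
    return false;

  /* strtol skips leading whitespace; END == S means no digits.  */
  char *end;
  long v = strtol (s, &end, 10);
  if (end == s)
    return false;
  s = end;
  while (isspace ((unsigned char) *s))
    s++;
  if (*s++ != ')')
    return false;

  *value = v;
  *p = s;
  return true;
}
```

The same function handles `num_gangs`, `num_workers` and `vector_length` by passing a different NAME, which is the point of consolidating them.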

Is this OK for trunk? I've regression tested and bootstrapped on
x86_64-linux.

Cesar


2015-10-29  Cesar Philippidis  

	gcc/cp/
	* parser.c (cp_parser_oacc_positive_int_clause): New function.
	(cp_parser_oacc_clause_vector_length): Delete.
	(cp_parser_omp_clause_num_gangs): Delete.
	(cp_parser_omp_clause_num_workers): Delete.
	(cp_parser_oacc_all_clauses): Use cp_parser_oacc_positive_int_clause
	to handle OMP_CLAUSE_{NUM_GANGS,NUM_WORKERS,VECTOR_LENGTH}.

diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index c8f8b3d..b1172e7 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -29603,6 +29603,39 @@ cp_parser_oacc_simple_clause (cp_parser * /* parser  */,
   return c;
 }
 
+ /* OpenACC:
+   num_gangs ( expression )
+   num_workers ( expression )
+   vector_length ( expression )  */
+
+static tree
+cp_parser_oacc_positive_int_clause (cp_parser *parser, omp_clause_code code,
+const char *str, tree list)
+{
+  location_t loc = cp_lexer_peek_token (parser->lexer)->location;
+
+  if (!cp_parser_require (parser, CPP_OPEN_PAREN, RT_OPEN_PAREN))
+return list;
+
+  tree t = cp_parser_assignment_expression (parser, NULL, false, false);
+
+  if (t == error_mark_node
+  || !cp_parser_require (parser, CPP_CLOSE_PAREN, RT_CLOSE_PAREN))
+{
+  cp_parser_skip_to_closing_parenthesis (parser, /*recovering=*/true,
+	 /*or_comma=*/false,
+	 /*consume_paren=*/true);
+  return list;
+}
+
+  check_no_duplicate_clause (list, code, str, loc);
+
+  tree c = build_omp_clause (loc, code);
+  OMP_CLAUSE_OPERAND (c, 0) = t;
+  OMP_CLAUSE_CHAIN (c) = list;
+  return c;
+}
+
 /* OpenACC:
 
 gang [( gang-arg-list )]
@@ -29726,45 +29759,6 @@ cp_parser_oacc_shape_clause (cp_parser *parser, omp_clause_code kind,
   return list;
 }
 
-/* OpenACC:
-   vector_length ( expression ) */
-
-static tree
-cp_parser_oacc_clause_vector_length (cp_parser *parser, tree list)
-{
-  tree t, c;
-  location_t location = cp_lexer_peek_token (parser->lexer)->location;
-  bool error = false;
-
-  if (!cp_parser_require (parser, CPP_OPEN_PAREN, RT_OPEN_PAREN))
-return list;
-
-  t = cp_parser_condition (parser);
-  if (t == error_mark_node || !INTEGRAL_TYPE_P (TREE_TYPE (t)))
-{
-  error_at (location, "expected positive integer expression");
-  error = true;
-}
-
-  if (error || !cp_parser_require (parser, CPP_CLOSE_PAREN, RT_CLOSE_PAREN))
-{
-  cp_parser_skip_to_closing_parenthesis (parser, /*recovering=*/true,
-	   /*or_comma=*/false,
-	   /*consume_paren=*/true);
-  return list;
-}
-
-  check_no_duplicate_clause (list, OMP_CLAUSE_VECTOR_LENGTH, "vector_length",
-			 location);
-
-  c = build_omp_clause (location, OMP_CLAUSE_VECTOR_LENGTH);
-  OMP_CLAUSE_VECTOR_LENGTH_EXPR (c) = t;
-  OMP_CLAUSE_CHAIN (c) = list;
-  list = c;
-
-  return list;
-}
-
 /* OpenACC 2.0
Parse wait clause or directive parameters.  */
 
@@ -30143,42 +30137,6 @@ cp_parser_omp_clause_nowait (cp_parser * /*parser*/,
   return c;
 }
 
-/* OpenACC:
-   num_gangs ( expression ) */
-
-static tree
-cp_parser_omp_clause_num_gangs (cp_parser *parser, tree list)
-{
-  tree t, c;
-  location_t location = cp_lexer_peek_token (parser->lexer)->location;
-
-  if (!cp_parser_require (parser, CPP_OPEN_PAREN, RT_OPEN_PAREN))
-return list;
-
-  t = cp_parser_condition (parser);
-
-  if (t == error_mark_node
-  || !cp_parser_require (parser, CPP_CLOSE_PAREN, RT_CLOSE_PAREN))
-cp_parser_skip_to_closing_parenthesis (parser, /*recovering=*/true,
-	   /*or_comma=*/false,
-	   /*consume_paren=*/true);
-
-  if (!INTEGRAL_TYPE_P (TREE_TYPE (t)))
-{
-  error_at (location, "expected positive integer expression");
-  return list;
-}
-
-  check_no_duplicate_clause (list, OMP_CLAUSE_NUM_GANGS, "num_gangs", location);
-
-  c = build_omp_clause (location, OMP_CLAUSE_NUM_GANGS);
-  OMP_CLAUSE_NUM_GANGS_EXPR (c) = t;
-  OMP_CLAUSE_CHAIN (c) = list;
-  list = c;
-
-  return list;
-}
-
 /* OpenMP 2.5:
num_threads ( expression ) */
 
@@ -30387,43 +30345,6 @@ cp_parser_omp_clause_d

Re: [OpenACC] num_gangs, num_workers and vector_length in c++

2015-10-30 Thread Cesar Philippidis
On 10/30/2015 06:37 AM, Jakub Jelinek wrote:
> On Thu, Oct 29, 2015 at 04:02:11PM -0700, Cesar Philippidis wrote:
>> I noticed that num_gangs, num_workers and vector_length are parsed in
>> somewhat inconsistent ways in the c++ FE. Both vector_length and
>> num_gangs bail out as soon as they detect errors, whereas num_workers
>> does not. Besides that, they also check for integral expressions as
>> the arguments are scanned instead of deferring that check to
>> finish_omp_clauses. That check will cause ICEs with template arguments
>> once we add support for them later on.
>>
>> Rather than fix each function individually, I've consolidated them into
>> a single cp_parser_oacc_positive_int_clause function. While this
>> function could be extended to support openmp clauses which accept an
>> integer expression argument, like num_threads, I've decided to leave
>> those as-is since there are no known problems with those functions at
>> this moment.
> 
> First question is what int-expr in OpenACC actually stands for (but I'll
> have to raise similar question for OpenMP too).
> 
> Previously you were using cp_parser_condition, which is clearly undesirable
> in this case, it allows e.g.
> num_gangs (int a = 5)
> but the question is if
> num_gangs (5, 6)
> is valid and stands for (5, 6) expression, then it should use
> cp_parser_expression, or if you want to error on it, then you should use
> cp_parser_assignment_expression.
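The following toy recursive-descent sketch makes the distinction concrete. It is not GCC code: every name is hypothetical, and an assignment-expression is reduced to a bare decimal literal. Under the expression rule, `(5, 6)` is accepted and, like the C/C++ comma operator, yields 6. Under the assignment-expression rule the comma is left unconsumed, so the closing-parenthesis check fails and the clause is rejected.

```c
#include <ctype.h>
#include <stdbool.h>

/* Toy cursor over a clause argument such as "(5, 6)".  In this sketch an
   "assignment-expression" is just a decimal literal.  */
struct toy_cursor
{
  const char *p;
  bool ok;
};

static void
skip_ws (struct toy_cursor *c)
{
  while (isspace ((unsigned char) *c->p))
    c->p++;
}

/* assignment-expression: stops at the first comma.  */
static long
parse_assignment_expr (struct toy_cursor *c)
{
  long v = 0;
  skip_ws (c);
  if (!isdigit ((unsigned char) *c->p))
    {
      c->ok = false;
      return 0;
    }
  while (isdigit ((unsigned char) *c->p))
    v = v * 10 + (*c->p++ - '0');
  return v;
}

/* expression: assignment-expression { "," assignment-expression }.
   As with the comma operator, the value is that of the last operand.  */
static long
parse_expr (struct toy_cursor *c)
{
  long v = parse_assignment_expr (c);
  skip_ws (c);
  while (c->ok && *c->p == ',')
    {
      c->p++;
      v = parse_assignment_expr (c);
      skip_ws (c);
    }
  return v;
}

/* Parse "( arg )".  ALLOW_COMMA selects expression vs assignment-expression.
   Returns false when the argument or the closing parenthesis is missing.  */
static bool
parse_clause_arg (const char *src, bool allow_comma, long *out)
{
  struct toy_cursor c = { src, true };
  long v;

  skip_ws (&c);
  if (*c.p != '(')
    return false;
  c.p++;
  v = allow_comma ? parse_expr (&c) : parse_assignment_expr (&c);
  skip_ws (&c);
  if (!c.ok || *c.p != ')')
    return false;
  *out = v;
  return true;
}
```

With `allow_comma` false, `(5, 6)` is rejected, which is the "error on it" behaviour described above.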

The openacc spec doesn't actually define int-expr, but we take it to
mean a single integral value. In general, the openacc spec uses the term
list to describe comma-separated expressions. So we've been assuming
that expr cannot contain commas. Besides, for num_gangs, num_workers and
vector_length it doesn't make sense to accept more than one value. A
construct can accept more than one of those clauses, but they'd have to
be associated with a different device_type.
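Under this reading, a duplicate check only needs to compare the clause kind together with its device_type. Below is a minimal sketch that uses a hypothetical clause struct. It does not model GCC's check_no_duplicate_clause or its tree-based clause chains, and it assumes that device_type 0 means no device_type was given.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified clause representation.  */
enum toy_clause_kind { TOY_NUM_GANGS, TOY_NUM_WORKERS, TOY_VECTOR_LENGTH };

struct toy_clause
{
  enum toy_clause_kind kind;
  int device_type;            /* 0: no device_type; otherwise an ID.  */
  long value;                 /* the single int-expr argument  */
  struct toy_clause *next;
};

/* Return true if adding a clause of KIND for DEVICE_TYPE to LIST would
   duplicate an existing one.  Repeating num_gangs and friends is only
   accepted when each occurrence has a different device_type.  */
static bool
toy_duplicate_p (const struct toy_clause *list, enum toy_clause_kind kind,
                 int device_type)
{
  for (const struct toy_clause *c = list; c != NULL; c = c->next)
    if (c->kind == kind && c->device_type == device_type)
      return true;
  return false;
}
```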

> From quick skimming of the (now removed) C/C++ Grammar Appendix in OpenMP,
> I believe all the places where expression or scalar-expression is used
> in the grammar are meant to be cp_parser_expression cases (except
> expression-list used in UDRs which stands for normal C++ expression-list
> non-terminal), so clearly I need to fix up omp_clause_{if,final} to call
> cp_parser_expression instead of cp_parser_condition, and the various
> OpenMP clauses that use cp_parser_assignment_expression to instead use
> cp_parser_expression.  Say schedule(static, 3, 6) should be valid IMHO.
> But, in OpenMP expression or scalar-expression in the grammar is never
> followed by , or optional , while in OpenACC grammar clearly is (e.g. for
> the gang clause).
> If OpenACC wants something different, clearly you can't share the parsing
> routines between say num_tasks and num_workers.

So num_threads, num_tasks, grainsize, priority, hint, num_teams,
thread_limit should all accept comma-separated lists?

> Another thing is what Jason as C++ maintainer wants, it is nice to get rid
> of some code redundancies, on the other side the fact that there is one
> function per non-terminal in the grammar is also quite nice property.
> I know I've violated this a few times too.
> 
> Next question is, why do you call it cp_parser_oacc_positive_int_clause
> when the parsing function actually doesn't verify neither the positive nor
> the int properties (and it should not), so perhaps it should just reflect
> in the name that it is a clause with assignment? expression.
> Or, see the previous paragraph, have a helper that does that and then
> have a separate function for each clause kind that calls those with the
> right arguments.
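The sketch below shows one way to arrange a shared helper with one thin function per clause. The types and function names are hypothetical. Each wrapper receives an argument that has already been parsed instead of reading tokens itself. The helper only builds the clause and prepends it to the list, following the `OMP_CLAUSE_CHAIN (c) = list; return c;` idiom in the patch, and leaves type and sign checks to a later pass.

```c
#include <stdlib.h>

/* Hypothetical stand-ins for GCC's tree-based clause list.  */
enum toy_clause_kind { TOY_NUM_GANGS, TOY_NUM_WORKERS, TOY_VECTOR_LENGTH };

struct toy_clause
{
  enum toy_clause_kind kind;
  long expr;                  /* parsed argument, not yet checked  */
  struct toy_clause *chain;
};

/* Shared helper: build a clause of KIND holding EXPR and prepend it to
   LIST.  No integral or positivity checks happen here.  */
static struct toy_clause *
toy_single_expr_clause (enum toy_clause_kind kind, long expr,
                        struct toy_clause *list)
{
  struct toy_clause *c = malloc (sizeof *c);
  if (c == NULL)
    return list;              /* out of memory: leave the list unchanged  */
  c->kind = kind;
  c->expr = expr;
  c->chain = list;
  return c;
}

/* One small function per grammar non-terminal.  */
static struct toy_clause *
toy_parse_num_gangs (long expr, struct toy_clause *list)
{
  return toy_single_expr_clause (TOY_NUM_GANGS, expr, list);
}

static struct toy_clause *
toy_parse_num_workers (long expr, struct toy_clause *list)
{
  return toy_single_expr_clause (TOY_NUM_WORKERS, expr, list);
}

static struct toy_clause *
toy_parse_vector_length (long expr, struct toy_clause *list)
{
  return toy_single_expr_clause (TOY_VECTOR_LENGTH, expr, list);
}
```

Because clauses are prepended, the most recently parsed clause ends up at the head of the list.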

That name had some legacy from the c FE in gomp-4_0-branch which the
function was inherited from. On one hand, it doesn't make sense to allow
negative integer values for those clauses, but at the same time, those
values aren't checked during scanning. Maybe it should be renamed
cp_parser_oacc_single_int_clause instead?

If you like, I could make a more general
cp_parser_omp_generic_expression that has a scan_list argument so that
it can be used for both general expressions and assignment-expressions.
That way it can be used for both omp and oacc clauses of the form 'foo (
expression )'.

What's your preference?

Thanks,
Cesar


Re: more accurate omp in fortran

2015-10-30 Thread Cesar Philippidis
On 10/30/2015 07:47 AM, Jakub Jelinek wrote:
> On Thu, Oct 22, 2015 at 08:21:35AM -0700, Cesar Philippidis wrote:
>> diff --git a/gcc/fortran/gfortran.h b/gcc/fortran/gfortran.h
>> index b2894cc..93adb7b 100644
>> --- a/gcc/fortran/gfortran.h
>> +++ b/gcc/fortran/gfortran.h
>> @@ -1123,6 +1123,7 @@ typedef struct gfc_omp_namelist
>>  } u;
>>struct gfc_omp_namelist_udr *udr;
>>struct gfc_omp_namelist *next;
>> +  locus where;
>>  }
>>  gfc_omp_namelist;
>>  
>> diff --git a/gcc/fortran/openmp.c b/gcc/fortran/openmp.c
>> index 3c12d8e..56a95d4 100644
>> --- a/gcc/fortran/openmp.c
>> +++ b/gcc/fortran/openmp.c
>> @@ -244,6 +244,7 @@ gfc_match_omp_variable_list (const char *str, 
>> gfc_omp_namelist **list,
>>  }
>>tail->sym = sym;
>>tail->expr = expr;
>> +  tail->where = cur_loc;
>>goto next_item;
>>  case MATCH_NO:
>>break;
>> @@ -278,6 +279,7 @@ gfc_match_omp_variable_list (const char *str, 
>> gfc_omp_namelist **list,
>>tail = tail->next;
>>  }
>>tail->sym = sym;
>> +  tail->where = cur_loc;
>>  }
>>  
>>  next_item:
> 
> The above is fine.

Thanks. I'll apply this change separately.

>> @@ -2832,36 +2834,47 @@ resolve_omp_udr_clause (gfc_omp_namelist *n, 
>> gfc_namespace *ns,
>>return copy;
>>  }
>>  
>> -/* Returns true if clause in list 'list' is compatible with any of
>> -   of the clauses in lists [0..list-1].  E.g., a reduction variable may
>> -   appear in both reduction and private clauses, so this function
>> -   will return true in this case.  */
>> +/* Check if a variable appears in multiple clauses.  */
>>  
>> -static bool
>> -oacc_compatible_clauses (gfc_omp_clauses *clauses, int list,
>> -   gfc_symbol *sym, bool openacc)
>> +static void
>> +resolve_omp_duplicate_list (gfc_omp_namelist *clause_list, bool openacc,
>> +int list)
>>  {
>>gfc_omp_namelist *n;
>> +  const char *error_msg = "Symbol %qs present on multiple clauses at %L";
> 
> Please don't do this, I'm afraid this breaks translations.
> Also, can you explain why all the mess with OMP_LIST_REDUCTION && openacc?
> That clearly looks misplaced to me.
> If one list item may be in at most one reduction clause, but may be in
> any other clause too, then it is the same case as e.g. OpenMP
> OMP_LIST_ALIGNED case, so you should instead just:
>   && (list != OMP_LIST_REDUCTION || !openacc)
> to the for (list = 0; list < OMP_LIST_NUM; list++) loop, and handle
> OMP_LIST_REDUCTION specially, similarly how OMP_LIST_ALIGNED is handled,
> just guarded with if (openacc).
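Here is a rough model of that suggestion. Hypothetical types replace gfc_omp_namelist and gfc_symbol, and errors are counted rather than reported through gfc_error. When openacc is set, the general mark loop skips the reduction list. A second pass then clears and re-checks the marks of the reduction items. As a result, a symbol may appear in one reduction clause plus other clauses, but not in two reduction clauses. This is a sketch of the idea, not the patch that was eventually committed.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified model of gfortran's per-clause lists.  */
enum toy_list { TOY_LIST_PRIVATE, TOY_LIST_COPY, TOY_LIST_REDUCTION, TOY_LIST_NUM };

struct toy_sym
{
  const char *name;
  int mark;
};

struct toy_item
{
  struct toy_sym *sym;
  struct toy_item *next;
};

static int
toy_count_duplicates (struct toy_item *lists[TOY_LIST_NUM], bool openacc)
{
  int errors = 0;

  /* General pass: a symbol seen twice across these lists is an error.
     For OpenACC the reduction list is excluded here.  */
  for (int list = 0; list < TOY_LIST_NUM; list++)
    if (list != TOY_LIST_REDUCTION || !openacc)
      for (struct toy_item *n = lists[list]; n != NULL; n = n->next)
        {
          if (n->sym->mark)
            errors++;
          else
            n->sym->mark = 1;
        }

  if (openacc)
    {
      /* Clear the marks of reduction items so that also appearing in
         another clause is fine, then reject a second reduction.  */
      for (struct toy_item *n = lists[TOY_LIST_REDUCTION]; n != NULL; n = n->next)
        n->sym->mark = 0;
      for (struct toy_item *n = lists[TOY_LIST_REDUCTION]; n != NULL; n = n->next)
        {
          if (n->sym->mark)
            errors++;
          else
            n->sym->mark = 1;
        }
    }

  return errors;
}
```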

That's a good idea, thanks. Reduction variables may appear in multiple
clauses in openacc because you can have reductions on kernels and
parallel constructs. And the same reduction variable may be associated
with a data clause.

Cesar


Re: more accurate omp in fortran

2015-10-30 Thread Cesar Philippidis
On 10/30/2015 09:58 AM, Jakub Jelinek wrote:

> What I meant not just the above changes, but also all changes that
> replace where with &n->where and the like, so pretty much everything
> except for the oacc_compatible_clauses removal and addition of
> resolve_omp_duplicate_list.  That is kind of unrelated change.

Yeah, I was going to post the patch before applying it anyway. Here's what I'm
testing now. I just ran into some fallout with Andrew MacLeod's header file
reduction patch when building offloading compilers. Seems like some
files are not including context.h anymore.

Cesar

diff --git a/gcc/fortran/gfortran.h b/gcc/fortran/gfortran.h
index 90f63cf..13e730f 100644
--- a/gcc/fortran/gfortran.h
+++ b/gcc/fortran/gfortran.h
@@ -1123,6 +1123,7 @@ typedef struct gfc_omp_namelist
 } u;
   struct gfc_omp_namelist_udr *udr;
   struct gfc_omp_namelist *next;
+  locus where;
 }
 gfc_omp_namelist;
 
diff --git a/gcc/fortran/openmp.c b/gcc/fortran/openmp.c
index 6c78c97..197b6d6 100644
--- a/gcc/fortran/openmp.c
+++ b/gcc/fortran/openmp.c
@@ -244,6 +244,7 @@ gfc_match_omp_variable_list (const char *str, gfc_omp_namelist **list,
 	}
 	  tail->sym = sym;
 	  tail->expr = expr;
+	  tail->where = cur_loc;
 	  goto next_item;
 	case MATCH_NO:
 	  break;
@@ -278,6 +279,7 @@ gfc_match_omp_variable_list (const char *str, gfc_omp_namelist **list,
 	  tail = tail->next;
 	}
 	  tail->sym = sym;
+	  tail->where = cur_loc;
 	}
 
 next_item:
@@ -2860,9 +2862,8 @@ oacc_compatible_clauses (gfc_omp_clauses *clauses, int list,
 /* OpenMP directive resolving routines.  */
 
 static void
-resolve_omp_clauses (gfc_code *code, locus *where,
-		 gfc_omp_clauses *omp_clauses, gfc_namespace *ns,
-		 bool openacc = false)
+resolve_omp_clauses (gfc_code *code, gfc_omp_clauses *omp_clauses,
+		 gfc_namespace *ns, bool openacc = false)
 {
   gfc_omp_namelist *n;
   gfc_expr_list *el;
@@ -2921,7 +2922,7 @@ resolve_omp_clauses (gfc_code *code, locus *where,
 	  {
 	if (!code && (!n->sym->attr.dummy || n->sym->ns != ns))
 	  gfc_error ("Variable %qs is not a dummy argument at %L",
-			 n->sym->name, where);
+			 n->sym->name, n->where);
 	continue;
 	  }
 	if (n->sym->attr.flavor == FL_PROCEDURE
@@ -2953,7 +2954,7 @@ resolve_omp_clauses (gfc_code *code, locus *where,
 	  }
 	  }
 	gfc_error ("Object %qs is not a variable at %L", n->sym->name,
-		   where);
+		   &n->where);
   }
 
   for (list = 0; list < OMP_LIST_NUM; list++)
@@ -2969,7 +2970,7 @@ resolve_omp_clauses (gfc_code *code, locus *where,
 	  if (n->sym->mark && !oacc_compatible_clauses (omp_clauses, list,
 			n->sym, openacc))
 	gfc_error ("Symbol %qs present on multiple clauses at %L",
-		   n->sym->name, where);
+		   n->sym->name, n->where);
 	  else
 	n->sym->mark = 1;
 	}
@@ -2980,7 +2981,7 @@ resolve_omp_clauses (gfc_code *code, locus *where,
   if (n->sym->mark)
 	{
 	  gfc_error ("Symbol %qs present on multiple clauses at %L",
-		 n->sym->name, where);
+		 n->sym->name, n->where);
 	  n->sym->mark = 0;
 	}
 
@@ -2988,7 +2989,7 @@ resolve_omp_clauses (gfc_code *code, locus *where,
 {
   if (n->sym->mark)
 	gfc_error ("Symbol %qs present on multiple clauses at %L",
-		   n->sym->name, where);
+		   n->sym->name, n->where);
   else
 	n->sym->mark = 1;
 }
@@ -2999,7 +3000,7 @@ resolve_omp_clauses (gfc_code *code, locus *where,
 {
   if (n->sym->mark)
 	gfc_error ("Symbol %qs present on multiple clauses at %L",
-		   n->sym->name, where);
+		   n->sym->name, n->where);
   else
 	n->sym->mark = 1;
 }
@@ -3011,7 +3012,7 @@ resolve_omp_clauses (gfc_code *code, locus *where,
 {
   if (n->sym->mark)
 	gfc_error ("Symbol %qs present on multiple clauses at %L",
-		   n->sym->name, where);
+		   n->sym->name, n->where);
   else
 	n->sym->mark = 1;
 }
@@ -3025,7 +3026,7 @@ resolve_omp_clauses (gfc_code *code, locus *where,
 {
   if (n->expr == NULL && n->sym->mark)
 	gfc_error ("Symbol %qs present on both FROM and TO clauses at %L",
-		   n->sym->name, where);
+		   n->sym->name, &n->where);
   else
 	n->sym->mark = 1;
 }
@@ -3047,7 +3048,7 @@ resolve_omp_clauses (gfc_code *code, locus *where,
 	  {
 		if (!n->sym->attr.threadprivate)
 		  gfc_error ("Non-THREADPRIVATE object %qs in COPYIN clause"
-			 " at %L", n->sym->name, where);
+			 " at %L", n->sym->name, &n->where);
 	  }
 	break;
 	  case OMP_LIST_COPYPRIVATE:
@@ -3055,10 +3056,10 @@ resolve_omp_clauses (gfc_code *code, locus *where,
 	  {
 		if (n->sym->as && n->sym->as->type == AS_ASSUMED_SIZE)
 		  gfc_error ("Assumed size array %qs in COPYPRIVATE clause "
-			 "at %L", n->sym->name, where);
+			 "at %L", n->sym->name, &n->where);
 		if (n->sym->attr.pointer && n->sym->attr.intent == INTENT_IN)
 		  gfc_error ("INTENT(IN) POINTER %qs in COPYPRIVATE clause "
-			 "at %L", n->sym->name, where);
+			 "at %L", n->sym->name, &n->where);
 	  }
 	br

Re: [patch] New backend header reduction

2015-10-30 Thread Cesar Philippidis
On 10/23/2015 12:24 PM, Jeff Law wrote:
> On 10/23/2015 10:53 AM, Andrew MacLeod wrote:
>> Just finished running...  I think the external hard drive was slowing
>> down this run :-P  It took quite a while.
>>
>> Anyway, this is the reduction patch independent of the header-ordering
>> patch... ie, that patch needs to be applied before this one.   So this
>> should be mostly just removals.   I also need to follow up and build all
>> the targets and bootstrap from scratch to make sure there aren't any
>> weirdnesses with it.   But you can at least get a look at it now.
>>
>> a few interesting stats:
>>
>> Top reductions:
>>
>> passes.c: Reduction performed, 26 includes removed.
>> shrink-wrap.c: Reduction performed, 21 includes removed.
>> ipa-polymorphic-call.c: Reduction performed, 21 includes removed.
>> lto-cgraph.c: Reduction performed, 19 includes removed.
>> ddg.c: Reduction performed, 19 includes removed.
>> tree-ssa-pre.c: Reduction performed, 18 includes removed.
>> lra-remat.c: Reduction performed, 18 includes removed.
>> cgraph.c: Reduction performed, 18 includes removed.
>> cgraphclones.c: Reduction performed, 18 includes removed.
>> tsan.c: Reduction performed, 17 includes removed.
>> tree-into-ssa.c: Reduction performed, 17 includes removed.
>> lto-section-in.c: Reduction performed, 17 includes removed.
>>
>> And headers most often removed:
>>
>> alias.h: Removed 230 times.
>> flags.h: Removed 207 times.
>> internal-fn.h: Removed 143 times.
>> stmt.h: Removed 128 times.
>> dojump.h: Removed 122 times.
>> expmed.h: Removed 115 times.
>> explow.h: Removed 115 times.
>> varasm.h: Removed 114 times.
>> calls.h: Removed 114 times.
>> expr.h: Removed 81 times.
>> insn-config.h: Removed 77 times.
>> emit-rtl.h: Removed 62 times.
>> hard-reg-set.h: Removed 60 times.
>> tm_p.h: Removed 56 times.
>> fold-const.h: Removed 56 times.
>> diagnostic-core.h: Removed 53 times.
>> except.h: Removed 51 times.
> Approved.  This was the easy part :-)
> 
> A quick grep shows 2309 unnecessary #includes removed.

There's a little bit of fallout with this patch when building an
offloaded compiler for openacc. It looks like cgraph.c needs to include
context.h and varpool.c needs context.h and omp-low.h. There are a couple
of #ifdef ENABLE_OFFLOADING blocks which may have gone undetected by your script.

I've bootstrapped the attached patch for an nvptx/x86_64-linux target.
I'm still testing that toolchain. If the testing comes back clean, is
this patch OK for trunk?
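A compile-based reduction script can miss a dependency like this because code behind `#ifdef ENABLE_OFFLOADING` is compiled, and therefore checked for missing declarations, only when that macro is defined. The illustration below uses hypothetical names, with `NEED_CONTEXT_H` standing in for context.h having been included.

```c
/* Declarations that would normally come from a header such as context.h.  */
#ifdef NEED_CONTEXT_H
static int
toy_offload_target_count (void)
{
  return 1;
}
#endif

/* The offloading branch is only compiled when ENABLE_OFFLOADING is
   defined, so a build without it never notices the missing header.  */
static int
toy_count_offload_targets (void)
{
#ifdef ENABLE_OFFLOADING
  return toy_offload_target_count ();
#else
  return 0;
#endif
}
```

Built without ENABLE_OFFLOADING, this compiles cleanly and returns 0. Defining ENABLE_OFFLOADING without NEED_CONTEXT_H turns the call into a call to an undeclared function, which the compiler diagnoses.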

Cesar

2015-10-30  Cesar Philippidis  

	gcc/
	* cgraph.c: Include context.h for offloading.
	* varpool.c: Include context.h and omp-low.h.

diff --git a/gcc/cgraph.c b/gcc/cgraph.c
index 92b8613..7839c72 100644
--- a/gcc/cgraph.c
+++ b/gcc/cgraph.c
@@ -57,6 +57,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "profile.h"
 #include "params.h"
 #include "tree-chkp.h"
+#include "context.h"
 
 /* FIXME: Only for PROP_loops, but cgraph shouldn't have to know about this.  */
 #include "tree-pass.h"
diff --git a/gcc/varpool.c b/gcc/varpool.c
index 3010dbb..478f365 100644
--- a/gcc/varpool.c
+++ b/gcc/varpool.c
@@ -31,6 +31,8 @@ along with GCC; see the file COPYING3.  If not see
 #include "varasm.h"
 #include "debug.h"
 #include "output.h"
+#include "omp-low.h"
+#include "context.h"
 
 const char * const tls_model_names[]={"none", "emulated",
   "global-dynamic", "local-dynamic",


Re: [patch] New backend header reduction

2015-10-30 Thread Cesar Philippidis
On 10/30/2015 01:20 PM, Andrew MacLeod wrote:
> On 10/30/2015 02:09 PM, Andrew MacLeod wrote:
>> On 10/30/2015 01:56 PM, Cesar Philippidis wrote:
>>> On 10/23/2015 12:24 PM, Jeff Law wrote:
>>>> On 10/23/2015 10:53 AM, Andrew MacLeod wrote:
>>>>
>>> There's a little bit of fallout with this patch when building an
>>> offloaded compiler for openacc. It looks like cgraph.c needs to include
>>> context.h and varpool.c needs context.h and omp-low.h. There are a couple
>>> of #ifdef ENABLE_OFFLOADING blocks which may have gone undetected by your
>>> script.
>> If they are defined on the command line or some other way I couldn't
>> see with the targets I built, then that is the common case when that
>> happens.  I don't think I did any openacc builds. OR maybe I need
>> to add nvptx to my coverage builds. Perhaps that is best.
>>> I've bootstrapped the attached patch for an nvptx/x86_64-linux target.
>>> I'm still testing that toolchain. If the testing comes back clean, is
>>> this patch OK for trunk?
> Ah, I see.  there is no nvptx target in config-list.mk, so it never got
> covered.

Yeah, you need to build two separate compilers. Thomas posted some
directions here <https://gcc.gnu.org/wiki/Offloading>. You could
probably reproduce it with openmp and Intel's MIC emulation target too.

Cesar



Re: [OpenACC] num_gangs, num_workers and vector_length in c++

2015-10-30 Thread Cesar Philippidis
On 10/30/2015 10:05 AM, Jakub Jelinek wrote:
> On Fri, Oct 30, 2015 at 07:42:39AM -0700, Cesar Philippidis wrote:

>>> Another thing is what Jason as C++ maintainer wants, it is nice to get rid
>>> of some code redundancies, on the other side the fact that there is one
>>> function per non-terminal in the grammar is also quite nice property.
>>> I know I've violated this a few times too.
> 
>> That name had some legacy from the c FE in gomp-4_0-branch which the
>> function was inherited from. On one hand, it doesn't make sense to allow
>> negative integer values for those clauses, but at the same time, those
>> values aren't checked during scanning. Maybe it should be renamed
>> cp_parser_oacc_single_int_clause instead?
> 
> That is better.
> 
>> If you like, I could make a more general
>> cp_parser_omp_generic_expression that has a scan_list argument so that
>> it can be used for both general expressions and assignment-expressions.
>> That way it can be used for both omp and oacc clauses of the form 'foo (
>> expression )'.
> 
> No, that will only confuse readers of the parser.  After all, the code to
> parse an expression argument of a clause is not that large.
> So, either cp_parser_oacc_single_int_clause or just keeping the old separate
> parsing functions, just remove the cruft from those (testing the type,
> using cp_parser_condition instead of cp_parser_assignment_expression) is ok
> with me.  Please ping Jason on what he prefers from those two.

Jason, what's your preference here? Should I create a single function to
parse num_gangs, num_workers and vector_length since they all accept
the same type of argument or should I just correct the existing
functions as I did in the attached patch? Either one would be specific
to openacc.
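Removing the type test from the parser assumes the checks happen later, and the original message proposed finish_omp_clauses for that. The sketch below is a hypothetical deferred check that operates on a toy expression record instead of GCC trees. It leaves type-dependent arguments, such as ones involving a template parameter, untouched until instantiation, because inspecting them early is what risks the ICE mentioned earlier. Only arguments with a known constant value can be tested for positivity.

```c
#include <stdbool.h>

/* Hypothetical model of a parsed clause argument.  */
struct toy_expr
{
  bool type_dependent;   /* e.g. involves a template parameter  */
  bool integral;         /* known to have integral type  */
  bool constant;         /* value known at compile time  */
  long value;
};

enum toy_verdict { TOY_OK, TOY_DEFER, TOY_ERR_NOT_INTEGRAL, TOY_ERR_NOT_POSITIVE };

/* Checks the parser no longer performs.  Dependent arguments are
   deferred; non-constant integral values cannot be range-checked here.  */
static enum toy_verdict
toy_finish_positive_int (const struct toy_expr *e)
{
  if (e->type_dependent)
    return TOY_DEFER;
  if (!e->integral)
    return TOY_ERR_NOT_INTEGRAL;
  if (e->constant && e->value <= 0)
    return TOY_ERR_NOT_POSITIVE;
  return TOY_OK;
}
```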

This patch has been bootstrapped and regression tested on trunk.

Cesar
2015-10-30  Cesar Philippidis  

	gcc/cp/
	* parser.c (cp_parser_oacc_clause_vector_length): Parse the clause
	argument as an assignment expression. Bail out early on error.
	(cp_parser_omp_clause_num_gangs): Likewise.
	(cp_parser_omp_clause_num_workers): Likewise.

diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index c8f8b3d..a0d3f3b 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -29732,37 +29732,29 @@ cp_parser_oacc_shape_clause (cp_parser *parser, omp_clause_code kind,
 static tree
 cp_parser_oacc_clause_vector_length (cp_parser *parser, tree list)
 {
-  tree t, c;
-  location_t location = cp_lexer_peek_token (parser->lexer)->location;
-  bool error = false;
+  location_t loc = cp_lexer_peek_token (parser->lexer)->location;
 
   if (!cp_parser_require (parser, CPP_OPEN_PAREN, RT_OPEN_PAREN))
 return list;
 
-  t = cp_parser_condition (parser);
-  if (t == error_mark_node || !INTEGRAL_TYPE_P (TREE_TYPE (t)))
-{
-  error_at (location, "expected positive integer expression");
-  error = true;
-}
+  tree t = cp_parser_assignment_expression (parser, NULL, false, false);
 
-  if (error || !cp_parser_require (parser, CPP_CLOSE_PAREN, RT_CLOSE_PAREN))
+  if (t == error_mark_node
+  || !cp_parser_require (parser, CPP_CLOSE_PAREN, RT_CLOSE_PAREN))
 {
   cp_parser_skip_to_closing_parenthesis (parser, /*recovering=*/true,
-	   /*or_comma=*/false,
-	   /*consume_paren=*/true);
+	 /*or_comma=*/false,
+	 /*consume_paren=*/true);
   return list;
 }
 
   check_no_duplicate_clause (list, OMP_CLAUSE_VECTOR_LENGTH, "vector_length",
-			 location);
+			 loc);
 
-  c = build_omp_clause (location, OMP_CLAUSE_VECTOR_LENGTH);
-  OMP_CLAUSE_VECTOR_LENGTH_EXPR (c) = t;
+  tree c = build_omp_clause (loc, OMP_CLAUSE_VECTOR_LENGTH);
+  OMP_CLAUSE_OPERAND (c, 0) = t;
   OMP_CLAUSE_CHAIN (c) = list;
-  list = c;
-
-  return list;
+  return c;
 }
 
 /* OpenACC 2.0
@@ -30149,34 +30141,28 @@ cp_parser_omp_clause_nowait (cp_parser * /*parser*/,
 static tree
 cp_parser_omp_clause_num_gangs (cp_parser *parser, tree list)
 {
-  tree t, c;
-  location_t location = cp_lexer_peek_token (parser->lexer)->location;
+  location_t loc = cp_lexer_peek_token (parser->lexer)->location;
 
   if (!cp_parser_require (parser, CPP_OPEN_PAREN, RT_OPEN_PAREN))
 return list;
 
-  t = cp_parser_condition (parser);
+  tree t = cp_parser_assignment_expression (parser, NULL, false, false);
 
   if (t == error_mark_node
   || !cp_parser_require (parser, CPP_CLOSE_PAREN, RT_CLOSE_PAREN))
-cp_parser_skip_to_closing_parenthesis (parser, /*recovering=*/true,
-	   /*or_comma=*/false,
-	   /*consume_paren=*/true);
-
-  if (!INTEGRAL_TYPE_P (TREE_TYPE (t)))
 {
-  error_at (location, "expected positive integer expression");
+  cp_parser_skip_to_closing_parenthesis (parser, /*recovering=*/true,
+	 /*or_comma=*/false,
+	 /*c

Re: more accurate omp in fortran

2015-10-31 Thread Cesar Philippidis
On 10/30/2015 09:29 PM, Dominique d'Humières wrote:
>> diff --git a/gcc/fortran/openmp.c b/gcc/fortran/openmp.c
> 
> Revision r229609 breaks bootstrap:
> 
> ../../work/gcc/fortran/openmp.c: In function 'void 
> resolve_omp_clauses(gfc_code*, gfc_omp_clauses*, gfc_namespace*, bool)':
> ../../work/gcc/fortran/openmp.c:2925:27: error: format '%L' expects argument 
> of type 'locus*', but argument 3 has type 'locus' [-Werror=format=]
>  n->sym->name, n->where);
>^
> cc1plus: all warnings being treated as errors

Sorry about that. As I explained in PR68168, I wasn't using
--enable-bootstrap when I tested this patch because I thought it was
implied by default. I was able to reproduce this problem and fix it with
the attached patch after I explicitly configured and built gcc with
--enable-bootstrap.

I've applied this patch to trunk, since it should have been included
with the original patch in the first place.
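The same mistake is easy to reproduce outside gfortran. Once `where` is stored by value in each list item, any consumer that expects a locus pointer, as `%L` does, has to receive `&n->where`. The minimal sketch below uses hypothetical stand-in types. Since toy_format_error is an ordinary prototyped function, passing `n->where` to it would be a hard type error, not the -Wformat diagnostic that the variadic gfc_error produced under -Werror.

```c
#include <stdio.h>

/* Hypothetical stand-ins for gfortran's locus and gfc_omp_namelist.  */
typedef struct
{
  int line;
  int column;
} toy_locus;

typedef struct toy_namelist
{
  const char *name;
  toy_locus where;            /* stored by value, one per list item  */
  struct toy_namelist *next;
} toy_namelist;

/* Like a %L argument, LOC is a pointer to a locus, so callers pass
   &n->where.  Returns the length of the formatted message.  */
static int
toy_format_error (char *buf, size_t size, const char *name,
                  const toy_locus *loc)
{
  return snprintf (buf, size,
                   "Symbol '%s' present on multiple clauses at %d:%d",
                   name, loc->line, loc->column);
}
```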

Cesar

2015-10-31  Cesar Philippidis  

	PR Bootstrap/68168

	gcc/fortran/
	* openmp.c (resolve_omp_clauses): Pass &n->where when calling
	gfc_error.

diff --git a/gcc/fortran/openmp.c b/gcc/fortran/openmp.c
index 3fd19b8..e59139c 100644
--- a/gcc/fortran/openmp.c
+++ b/gcc/fortran/openmp.c
@@ -2922,7 +2922,7 @@ resolve_omp_clauses (gfc_code *code, gfc_omp_clauses *omp_clauses,
 	  {
 	if (!code && (!n->sym->attr.dummy || n->sym->ns != ns))
 	  gfc_error ("Variable %qs is not a dummy argument at %L",
-			 n->sym->name, n->where);
+			 n->sym->name, &n->where);
 	continue;
 	  }
 	if (n->sym->attr.flavor == FL_PROCEDURE

