Hi!

First, thanks for picking this up, and improving the patch you inherited.


Then, just a few individual comments, not a complete review.

(As far as I concerned, and as far as relevant, these can be addressed
later, incrementally, of course.)


I understand right that this will address some aspects of PR90115
"OpenACC: predetermined private levels for variables declared in blocks"
(so please mention that one in the ChangeLog updates, and commit log),
but it doesn't address all of these aspects (and see also Cesar's list in
<http://mid.mail-archive.com/70d27ebd-762e-59a3-082f-48fa0c687212@codesourcery.com>),
and also not yet PR90114 "Predetermined private levels for variables
declared in OpenACC accelerator routines"?


On Fri, 7 Jun 2019 15:08:37 +0100, Julian Brown <jul...@codesourcery.com> wrote:
> --- a/gcc/config/nvptx/nvptx.c
> +++ b/gcc/config/nvptx/nvptx.c

> @@ -5237,6 +5248,10 @@ nvptx_file_end (void)
>      write_shared_buffer (asm_out_file, vector_red_sym,
>                        vector_red_align, vector_red_size);
>  
> +  if (gangprivate_shared_size)
> +    write_shared_buffer (asm_out_file, gangprivate_shared_sym,
> +                      gangprivate_shared_align, gangprivate_shared_size);

Curious, what is the reason that we maintain this '__gangprivate_shared'
variable on a per-file basis instead of on a per-function basis (with
names '__gangprivate_shared_[function]', or similar), which should make
it more obvious where each block of '.shared' memory belongs to?


> --- a/gcc/doc/tm.texi
> +++ b/gcc/doc/tm.texi

> +@deftypefn {Target Hook} rtx TARGET_GOACC_EXPAND_ACCEL_VAR (tree @var{var})
> +This hook, if defined, is used by accelerator target back-ends to expand
> +specially handled kinds of VAR_DECL expressions.  A particular use is to
> +place variables with specific attributes inside special accelarator
> +memories.  A return value of NULL indicates that the target does not
> +handle this VAR_DECL, and normal RTL expanding is resumed.
> +@end deftypefn

I guess I'm not terribly happy with the 'goacc.expand_accel_var' name.
Using different "memories" for specially tagged DECLs seems to be a
pretty generic concept (address spaces?), and...

> --- a/gcc/expr.c
> +++ b/gcc/expr.c
> @@ -9974,8 +9974,19 @@ expand_expr_real_1 (tree exp, rtx target, machine_mode 
> tmode,
>        exp = SSA_NAME_VAR (ssa_name);
>        goto expand_decl_rtl;
>  
> -    case PARM_DECL:
>      case VAR_DECL:
> +      /* Allow accel compiler to handle specific cases of variables,
> +      specifically those tagged with the "oacc gangprivate" attribute,
> +      which may be intended to be placed in special memory in GPUs.  */
> +      if (flag_openacc && targetm.goacc.expand_accel_var)
> +     {
> +       temp = targetm.goacc.expand_accel_var (exp);
> +       if (temp)
> +         return temp;
> +     }
> +      /* ... fall through ...  */
> +
> +    case PARM_DECL:

... I'm thus confused that there isn't already a generic mechanism
available in GCC, that we can just use instead of adding a new one here?
Thinking about the "address spaces" stuff in 'gcc/target.def' -- or is
that the wrong concept?  (I'm not familiar with all that, and haven't
looked closely.)


> --- a/gcc/omp-low.c
> +++ b/gcc/omp-low.c

> +/* Record vars listed in private clauses in CLAUSES in CTX.  This information
> +   is used to mark up variables that should be made private per-gang.  */
> +
> +static void
> +oacc_record_private_var_clauses (omp_context *ctx, tree clauses)
> +{
> +  for (tree c = clauses; c; c = OMP_CLAUSE_CHAIN (c))
> +    if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_PRIVATE)
> +      {
> +     tree decl = OMP_CLAUSE_DECL (c);
> +     if (VAR_P (decl) && TREE_ADDRESSABLE (decl))
> +       {
> +         ctx->oacc_addressable_var_decls.safe_push (decl);
> +         maybe_oacc_gangprivate_vars = true;
> +       }
> +      }
> +}

Are all the relevant variables addressable?  And/or, need only those be
considered?

> +/* Record addressable vars declared in BINDVARS in CTX.  This information is
> +   used to mark up variables that should be made private per-gang.  */
> +
> +static void
> +oacc_record_vars_in_bind (omp_context *ctx, tree bindvars)
> +{
> +  for (tree v = bindvars; v; v = DECL_CHAIN (v))
> +    if (VAR_P (v) && TREE_ADDRESSABLE (v))
> +      {
> +     ctx->oacc_addressable_var_decls.safe_push (v);
> +     maybe_oacc_gangprivate_vars = true;
> +      }
> +}

Likewise.


> +/* Mark addressable variables which are declared implicitly or explicitly as
> +   gang private with a special attribute.  These may need to have their
> +   declarations altered later on in compilation (e.g. in
> +   execute_oacc_device_lower or the backend, depending on how the OpenACC
> +   execution model is implemented on a given target) to ensure that sharing
> +   semantics are correct.  */
> +
> +static void
> +mark_oacc_gangprivate (vec<tree> *decls, omp_context *ctx)
> +{
> +  int i;
> +  tree decl;
> +
> +  FOR_EACH_VEC_ELT (*decls, i, decl)
> +    {
> +      for (omp_context *thisctx = ctx; thisctx; thisctx = thisctx->outer)
> +     {
> +       tree inner_decl = maybe_lookup_decl (decl, thisctx);
> +       if (inner_decl)
> +         {
> +           decl = inner_decl;
> +           break;
> +         }
> +     }
> +      if (!lookup_attribute ("oacc gangprivate", DECL_ATTRIBUTES (decl)))
> +     {
> +       if (dump_file && (dump_flags & TDF_DETAILS))
> +         {
> +           fprintf (dump_file,
> +                    "Setting 'oacc gangprivate' attribute for decl:");
> +           print_generic_decl (dump_file, decl, TDF_SLIM);
> +           fputc ('\n', dump_file);
> +         }
> +       DECL_ATTRIBUTES (decl)
> +         = tree_cons (get_identifier ("oacc gangprivate"),
> +                      NULL, DECL_ATTRIBUTES (decl));
> +     }
> +    }
> +}

So I'm confused how that can be done here ('omplower'), given that the
decision about how levels of parallelism (gang, worker, vector) are
assigned is only done later ('oaccdevlow'), separately/differently per
offloading target?

The following seems relevant:

> +/* Find gang-private variables in a context.  */
> +
> +static int
> +process_oacc_gangprivate (splay_tree_node node, void * ARG_UNUSED (data))
> +{
> +  omp_context *ctx = (omp_context *) node->value;
> +  unsigned level_total = 0;
> +  omp_context *thisctx;
> +
> +  for (thisctx = ctx; thisctx; thisctx = thisctx->outer)
> +    level_total += thisctx->oacc_partitioning_levels;
> +
> +  /* If the current context and parent contexts are distributed over a
> +     total of one parallelism level, we have gang partitioning.  */
> +  if (level_total == 1)
> +    mark_oacc_gangprivate (&ctx->oacc_addressable_var_decls, ctx);
> +
> +  return 0;
> +}

..., but I didn't quickly manage to grok that.  (I shall try harder,
later on.)

But still then, this looks like it might work for the outer level (gang)
only (because all offloading targets are expected to assign gang level to
the outermost loop -- might that be the underlying assumption?), but it
won't work for inner loop/privatization levels?  (..., which I understand
this patch isn't doing anything about.)


> --- /dev/null
> +++ b/libgomp/testsuite/libgomp.oacc-c/pr85465.c
> @@ -0,0 +1,11 @@
> +/* { dg-do compile } */
> +/* { dg-additional-options "-w" } */
> +
> +int
> +main (void)
> +{
> +#pragma acc parallel
> +  foo ();
> +
> +  return 0;
> +}

I think that given your re-work of the implementation (move stuff from
front ends into OMP lowering) this test case isn't relevant anymore (was
a front end ICE).


> --- /dev/null
> +++ b/libgomp/testsuite/libgomp.oacc-fortran/gangprivate-attrib-1.f90
> @@ -0,0 +1,25 @@
> +! Test for "oacc gangprivate" attribute on gang-private variables
> +
> +! { dg-do run }
> +! { dg-additional-options "-fdump-tree-omplower-details" }
> +! { dg-final { scan-tree-dump-times "Setting 'oacc gangprivate' attribute 
> for decl:  integer\\(kind=4\\) w;" 1 "omplower" } } */

I prefer if such scanning is placed close to relevant source code
constructs, so I'd move this 'scan-tree-dump-times'...

> +
> +program main
> +  integer :: w, arr(0:31)
> +
> +  !$acc parallel num_gangs(32) num_workers(32) copyout(arr)
> +    !$acc loop gang private(w)

... here.

(Just to make sure, a Fortran 'integer' will always be
'integer(kind=4)'?)

> +    do j = 0, 31
> +      w = 0
> +      !$acc loop seq
> +      do i = 0, 31
> +        !$acc atomic update
> +        w = w + 1
> +        !$acc end atomic
> +      end do
> +      arr(j) = w
> +    end do
> +  !$acc end parallel
> +
> +  if (any (arr .ne. 32)) stop 1
> +end program main

> --- /dev/null
> +++ b/libgomp/testsuite/libgomp.oacc-fortran/gangprivate-attrib-2.f90
> @@ -0,0 +1,23 @@
> +! Test for lack of "oacc gangprivate" attribute on worker-private variables
> +
> +! { dg-do run }
> +! { dg-additional-options "-fdump-tree-omplower-details" }
> +! { dg-final { scan-tree-dump-times "Setting 'oacc gangprivate' attribute 
> for decl" 0 "omplower" } } */

Likewise...

> +
> +program main
> +  integer :: w, arr(0:31)
> +
> +  !$acc parallel num_gangs(32) num_workers(32) copyout(arr)
> +    !$acc loop gang worker private(w)

... here (I suppose).

> +    do j = 0, 31
> +      w = 0
> +      !$acc loop seq
> +      do i = 0, 31
> +        w = w + 1
> +      end do
> +      arr(j) = w
> +    end do
> +  !$acc end parallel
> +
> +  if (any (arr .ne. 32)) stop 1
> +end program main


Grüße
 Thomas

Attachment: signature.asc
Description: PGP signature

Reply via email to