On Wed, 3 Jun 2020 17:19:47 +0200
Thomas Schwinge <tho...@codesourcery.com> wrote:

> Hi Julian!
> 
> On 2020-06-03T14:36:14+0200, I wrote:
> > On 2020-05-22T15:16:05-0700, Julian Brown <jul...@codesourcery.com>
> > wrote:  
> >> This patch adjusts the semantics of dynamic reference counts, as
> >> described in the parent email.  
> >
> > Thanks!
> >
> > A few questions, but no need to send an updated patch.
> >  
> >> --- a/libgomp/oacc-mem.c
> >> +++ b/libgomp/oacc-mem.c  
> >  
> >> @@ -1018,13 +1036,102 @@ goacc_enter_data_internal (struct
> >> gomp_device_descr *acc_dev, size_t mapnum, {
> >>    for (size_t i = 0; i < mapnum; i++)
> >>      {
> >> -      int group_last = find_group_last (i, mapnum, sizes, kinds);
> >> +      splay_tree_key n;
> >> +      size_t group_last = find_group_last (i, mapnum, sizes,
> >> kinds);
> >> +      bool struct_p = false;
> >> +      size_t size, groupnum = (group_last - i) + 1;
> >>  
> >> -      gomp_map_vars_async (acc_dev, aq,
> >> -                     (group_last - i) + 1,
> >> -                     &hostaddrs[i], NULL,
> >> -                     &sizes[i], &kinds[i], true,
> >> -                     GOMP_MAP_VARS_OPENACC_ENTER_DATA);
> >> +      switch (kinds[i] & 0xff)
> >> +  {
> >> +  case GOMP_MAP_STRUCT:
> >> +    {
> >> +      int last = i + sizes[i];  
> >
> > The 'last' calculated here must always equal the 'group_last'
> > calculated above.  ;-) (... so we might just use 'group_last'
> > instead of 'last' in the following.)
> >  
> >> +      size = (uintptr_t) hostaddrs[last] + sizes[last]
> >> +             - (uintptr_t) hostaddrs[i];
> >> +      struct_p = true;
> >> +    }
> >> +    break;
> >> +
> >> +  case GOMP_MAP_ATTACH:
> >> +    size = sizeof (void *);
> >> +    break;
> >> +
> >> +  default:
> >> +    size = sizes[i];
> >> +  }
> >> +
> >> +      n = lookup_host (acc_dev, hostaddrs[i], size);
> >> +  
> >  
> >> +      if (n && struct_p)
> >> +  {
> >> +    if (n->refcount != REFCOUNT_INFINITY)
> >> +      n->refcount += groupnum - 1;
> >> +    n->dynamic_refcount += groupnum - 1;
> >> +    gomp_mutex_unlock (&acc_dev->lock);
> >> +  }  
> >
> > Is the 'GOMP_MAP_STRUCT' handling here specifically necessary, or
> > is that just an optimization of the 'n && groupnum > 1' case below?
> >  
> 
> Eh, OK, I think I see where this is going; the 'n && groupnum > 1'
> case below might not necessarily take care of the 'groupnum - 1'
> refcounts that we're filing here?

Right. GOMP_MAP_STRUCT is a little special in this case.

> >> +      else if (n && groupnum == 1)
> >> +  {
> >> +    void *h = hostaddrs[i];
> >> +    size_t s = sizes[i];
> >> +
> >> +    /* A standalone attach clause.  */
> >> +    if ((kinds[i] & 0xff) == GOMP_MAP_ATTACH)
> >> +      gomp_attach_pointer (acc_dev, aq, &acc_dev->mem_map,
> >> n,
> >> +                           (uintptr_t) h, s, NULL);
> >> +    else if (h + s > (void *) n->host_end)
> >> +      {
> >> +        gomp_mutex_unlock (&acc_dev->lock);
> >> +        gomp_fatal ("[%p,+%d] not mapped", (void *)h,
> >> (int)s);
> >> +      }
> >> +
> >> +    assert (n->refcount != REFCOUNT_LINK);
> >> +    if (n->refcount != REFCOUNT_INFINITY)
> >> +      n->refcount++;
> >> +    n->dynamic_refcount++;
> >> +
> >> +    gomp_mutex_unlock (&acc_dev->lock);
> >> +  }  
> >  
> >> +      else if (n && groupnum > 1)
> >> +  {
> >> +    assert (n->refcount != REFCOUNT_INFINITY
> >> +            && n->refcount != REFCOUNT_LINK);
> >> +
> >> +    bool processed = false;
> >> +
> >> +    struct target_mem_desc *tgt = n->tgt;
> >> +    for (size_t j = 0; j < tgt->list_count; j++)
> >> +      if (tgt->list[j].key == n)
> >> +        {
> >> +          for (size_t k = 0; k < groupnum; k++)
> >> +            if (j + k < tgt->list_count && tgt->list[j +
> >> k].key)
> >> +              {
> >> +                tgt->list[j + k].key->refcount++;
> >> +                tgt->list[j + k].key->dynamic_refcount++;
> >> +              }
> >> +          processed = true;
> >> +        }
> >> +
> >> +    gomp_mutex_unlock (&acc_dev->lock);
> >> +    if (!processed)
> >> +      gomp_fatal ("dynamic refcount incrementing failed for
> >> "
> >> +                  "pointer/pset");
> >> +  }  
> >
> > Please add some text to explain the nested 'j', 'k' loops and their
> > 'if' conditionals, and the 'groupnum' usage in the 'k' loop
> > boundary.  Should the 'k' loop maybe run 'for (size_t k = j; k <
> > tgt->list_count; ++k)' (..., or is 'groupnum' relevant?), and in
> > the loop body then use 'k' instead of 'j + k'?  (Maybe I've now
> > confused myself, staring at this for a while...)  
> 
> Audacious as I am sometimes, I did put a '__builtin_abort' right after
> 'tgt->list[j].key == n' -- and it doesn't trigger one single time for
> the current libgomp test cases, meaning this is all dead code?  I'm
> confused.

Huh, I didn't expect that! Indeed that stanza appears to be dead code
(at least with mapping clauses generated from current GCC). The reason
is a late bug-fix to the manual deep copy code that strips
GOMP_MAP_TO_PSET and GOMP_MAP_POINTER from OpenACC enter/exit mappings
altogether. (In
https://gcc.gnu.org/legacy-ml/gcc-patches/2019-12/msg01253.html).

That means "grouped" mappings are actually only now used
for GOMP_MAP_STRUCT, so actually even more of the find_group_last
function is probably dead now too, modulo backward compatibility issues.

Rewinding a bit, here is an explanation of the problem that the removal
of those clauses fixes, in case we want to revisit that.

With the attached patch (reverting the fix), the attached test case
fails (e.g. compiled at -O0). The problem is that with a dynamic data
lifetime, it's possible for an array descriptor on the stack to go out
of scope before the array data it is associated with does. This might
well be violating either Fortran rules or OpenACC semantics -- if that's
the case, then we had no problem here. (I did see a similar problem "in
the wild", but hadn't come up with a standalone test case until now.)

The attached test case starts out with a explicit-shape array local. It
passes this to a subroutine "enterdata_wrapper". This subroutine
fabricates an assumed-shape array pointer to its argument (creating an
array descriptor), and passes it to another subroutine "enterdata".

The "enterdata" subroutine then performs an OpenACC "enter data"
operation with the array -- whose data comes from the original
explicit-shape array in the main program, but whose descriptor comes
from the stack frame of the caller (i.e. "enterdata_wrapper"). This
descriptor then goes out of scope before returning to the main program.

The test case tries to fiddle with the stack layout by adding arbitrary
other arrays, and does the same dance again with nested subroutines to
perform an "exit data" operation.  But now the address of the (new)
descriptor is different, and the unmapping operation fails.

In short -- OpenACC "enter data" operations can (could) create hidden
dangling references to array descriptors, in some circumstances.

So, the fix was to strip out GOMP_MAP_TO_PSET (and GOMP_MAP_POINTER,
which I don't think has any meaning on these directives) from OpenACC
"enter data" and "exit data" directives altogether. If an array has a
descriptor when we get to a compute kernel, that descriptor is copied
to the target anyway, *even for present clauses*, so passing the
array descriptor to "enter data" descriptor doesn't appear to be
necessary, even in cases where it stays in scope before unmapping from
the target.

So, questions:

1. Does the attached program violate Fortran semantics in some way?

2. Or OpenACC semantics?

3. Are there unintended side-effects of removing GOMP_MAP_TO_PSET and
   GOMP_MAP_POINTER from OpenACC enter/exit data directives?

4. Should the clauses be stripped from the equivalent OpenMP directives
   too?

(FAOD, I'm not asking for review on the attached patch at this time.)

HTH,

Julian
program myprog
  implicit none
  integer :: a(16)
  integer :: i

  call enterdata_wrapper(a, 16)
  call exitdata_wrapper(a, 16)

  contains

  subroutine enterdata_wrapper(a, n)
    implicit none
    integer :: n
    integer, target :: a(n)
    integer :: aa(16)
    integer :: bb(16)
    integer, pointer :: ap(:)
    integer :: cc(16)
    integer :: dd(16)

    ! An array descriptor appears somewhere around here...
    ap => a

    !$acc enter data copyin(aa,bb,cc,dd)
    call enterdata(ap)
    !$acc exit data copyout(aa,bb,cc,dd)

    ! ...and goes out of scope.
  end subroutine enterdata_wrapper

  subroutine enterdata(a)
    implicit none
    integer, pointer :: a(:)

    ! Map "to(a.data) to_pset(a) pointer(a.data)"
    !$acc enter data copyin(a)
  end subroutine enterdata

  subroutine exitdata_wrapper(a, n)
    implicit none
    integer :: n
    integer, target :: a(n)
    integer :: aa(32)
    integer :: bb(32)
    integer, pointer :: ap(:)
    integer :: cc(32)
    integer :: dd(32)

    ! A different array descriptor appears...
    ap => a

    !$acc enter data copyin(aa,bb,cc,dd)
    call exitdata(ap)
    !$acc exit data copyout(aa,bb,cc,dd)

    ! ...and goes out of scope.
  end subroutine exitdata_wrapper

  subroutine exitdata(a)
    implicit none
    integer, pointer :: a(:)

    ! Try to unmap the fresh array descriptor: FAILS.
    !$acc exit data copyout(a)
  end subroutine exitdata
end program myprog
commit 7a4d9939a7c5f770f3d2fcd02be01bfd146589ce
Author: Julian Brown <jul...@codesourcery.com>
Date:   Fri Jun 5 14:46:41 2020 -0700

    Remove GOMP_MAP_TO_PSET dangling reference bugfix

diff --git a/gcc/gimplify.c b/gcc/gimplify.c
index e14932fafaf..79120e53129 100644
--- a/gcc/gimplify.c
+++ b/gcc/gimplify.c
@@ -8748,6 +8748,8 @@ gimplify_scan_omp_clauses (tree *list_p, gimple_seq *pre_p,
 	    case OMP_TARGET_DATA:
 	    case OMP_TARGET_ENTER_DATA:
 	    case OMP_TARGET_EXIT_DATA:
+	    case OACC_ENTER_DATA:
+	    case OACC_EXIT_DATA:
 	    case OACC_HOST_DATA:
 	      if (OMP_CLAUSE_MAP_KIND (c) == GOMP_MAP_FIRSTPRIVATE_POINTER
 		  || (OMP_CLAUSE_MAP_KIND (c)
@@ -8756,15 +8758,6 @@ gimplify_scan_omp_clauses (tree *list_p, gimple_seq *pre_p,
 		   mapped, but not the pointer to it.  */
 		remove = true;
 	      break;
-	    case OACC_ENTER_DATA:
-	    case OACC_EXIT_DATA:
-	      if (OMP_CLAUSE_MAP_KIND (c) == GOMP_MAP_POINTER
-		  || OMP_CLAUSE_MAP_KIND (c) == GOMP_MAP_TO_PSET
-		  || OMP_CLAUSE_MAP_KIND (c) == GOMP_MAP_FIRSTPRIVATE_POINTER
-		  || (OMP_CLAUSE_MAP_KIND (c)
-		      == GOMP_MAP_FIRSTPRIVATE_REFERENCE))
-		remove = true;
-	      break;
 	    default:
 	      break;
 	    }
diff --git a/libgomp/oacc-mem.c b/libgomp/oacc-mem.c
index bc25527c616..c462cbb1007 100644
--- a/libgomp/oacc-mem.c
+++ b/libgomp/oacc-mem.c
@@ -1015,9 +1015,12 @@ find_group_last (int pos, size_t mapnum, size_t *sizes, unsigned short *kinds)
   switch (kind0)
     {
     case GOMP_MAP_TO_PSET:
-      while (pos + 1 < mapnum && (kinds[pos + 1] & 0xff) == GOMP_MAP_POINTER)
+      while (pos + 1 < mapnum
+	     && ((kinds[pos + 1] & 0xff) == GOMP_MAP_POINTER
+		 || (kinds[pos + 1] & 0xff) == GOMP_MAP_ATTACH))
 	pos++;
-      /* We expect at least one GOMP_MAP_POINTER after a GOMP_MAP_TO_PSET.  */
+      /* We expect at least one GOMP_MAP_POINTER (or GOMP_MAP_ATTACH)
+	 after a GOMP_MAP_TO_PSET.  */
       assert (pos > first_pos);
       break;
 
@@ -1044,7 +1047,9 @@ find_group_last (int pos, size_t mapnum, size_t *sizes, unsigned short *kinds)
 
       /* We can have zero or more GOMP_MAP_POINTER mappings after a to/from
 	 (etc.) mapping.  */
-      while (pos + 1 < mapnum && (kinds[pos + 1] & 0xff) == GOMP_MAP_POINTER)
+      while (pos + 1 < mapnum
+	     && ((kinds[pos + 1] & 0xff) == GOMP_MAP_POINTER
+		 || (kinds[pos + 1] & 0xff) == GOMP_MAP_ATTACH))
 	pos++;
     }
 
@@ -1122,6 +1127,15 @@ goacc_enter_data_internal (struct gomp_device_descr *acc_dev, size_t mapnum,
 	  assert (n->refcount != REFCOUNT_INFINITY
 		  && n->refcount != REFCOUNT_LINK);
 
+	  for (size_t j = i + 1; j <= group_last; j++)
+	    if ((kinds[j] & 0xff) == GOMP_MAP_ATTACH)
+	      {
+		splay_tree_key m
+		  = lookup_host (acc_dev, hostaddrs[j], sizeof (void *));
+		gomp_attach_pointer (acc_dev, aq, &acc_dev->mem_map, m,
+				     (uintptr_t) hostaddrs[j], sizes[j], NULL);
+	      }
+
 	  bool processed = false;
 
 	  struct target_mem_desc *tgt = n->tgt;

Reply via email to