Hi!

This is a new version of the patch which hopefully addresses all review
comments. Further commentary below.

On Mon, 21 Oct 2019 16:14:11 +0200
Thomas Schwinge <tho...@codesourcery.com> wrote:

> On 2019-10-03T09:35:04-0700, Julian Brown <jul...@codesourcery.com>
> wrote:
> > This patch has been broken out of the patch supporting OpenACC 2.6
> > manual deep copy last posted here:
> >
> >   https://gcc.gnu.org/ml/gcc-patches/2018-12/msg01084.html  
> 
> Thanks.
> 
> Remeber to look into <https://gcc.gnu.org/PR92116> "Potential null
> pointer dereference in 'gomp_acc_remove_pointer'", which may be
> relevant here.

I've deleted the whole function (see below) so nothing to do there, I
don't think, even if that code had still been live in the last version
of the patch.

> I see you've merged in the relevant parts of my incremental patch
> '[WIP] OpenACC 2.6 manual deep copy support (attach/detach): adjust
> for "goacc_async_unmap_tgt" removal', that I included in
> <http://mid.mail-archive.com/yxfpftuqpakv.fsf@hertz.schwinge.homeip.net>,
> which tells me that I supposedly understood that part alright.  ;-D

Yes I think so -- I'll add you as co-author to the ChangeLog. Apologies
for the omission!

> > As part of developing that patch, libgomp's OpenACC reference
> > counting implementation proved to be somewhat inconsistent,
> > especially when used in combination with the deep copy support
> > which exercises it more thoroughly.
> >
> > So, this patch contains just the changes to reference-counting
> > behaviour, for ease of (re-)review.  The other parts of OpenACC 2.6
> > manual deep copy support are forthcoming, but some changes in this
> > patch anticipate that support.  
> 
> As we're discussing these separately, please for now remove the
> changes related to the 'VREFCOUNT_LINK_KEY' toggle flag, and moving
> 'link_key' into an union (to later be shared with 'attach_count');
> <87pniuuhkj.fsf@euler.schwinge.homeip.net">http://mid.mail-archive.com/87pniuuhkj.fsf@euler.schwinge.homeip.net>.

Done (I have a plan for the link_key/attach_count fields, but it's not
in this patch, and I'm not sure how well it'll work out yet).

> > Tested with offloading to NVPTX, with good results (though a couple
> > of tests need fixing also).  
> 
> The testsuite changes we're discussing separately, and need to go in
> before this one, obviously.

Those tests no longer regress, so no testsuite changes are strictly
necessary for this patch.

> > OK for trunk?  
> 
> I haven't understood all the changes related to replacing
> 'dynamic_refcount' with 'virtual_refcount', getting rid of
> 'data_environ', the 'lookup_dev' rework, but I trust you got that
> right. In particular, these seem to remove special-case OpenACC code
> in favor of generic OMP code, which is good.

Yep -- the previous email you dug up included the following rationale:

 - reference counts in the linked memory-mapping splay tree structure
   can be self-checked for consistency using optional (i.e.
   development-only) code.  This survives a libgomp test run (with
   offloading to nvptx), so I'm reasonably confident it's good.

 - the "data_environ" field in the device descriptor -- a linear linked
   list containing a target memory descriptor for each "acc enter data"
   mapping -- has been removed.  This brings OpenACC closer to the
   OpenMP implementation for non-lexically-scoped data mapping
   (GOMP_target_enter_exit_data), and is potentially a performance win
   if lots of data is mapped in this way.

 - the semantics of the "dynamic_refcount" field in the splay_tree_key
   structure have shifted slightly, so I've renamed the field.  It now
   represents references that are excess to those represented by actual
   pointers in the linked splay tree/target-memory descriptor structure.
   That might have been the intention before in fact, but the
   implementation was inconsistent.

The big thing here is the auto-checking of refcounting behaviour. There
were quite a few corner cases that were broken before.

> A few more comments:
> 
> > --- a/libgomp/libgomp.h
> > +++ b/libgomp/libgomp.h  
> 
> >  typedef struct acc_dispatch_t
> >  {
> > -  /* This is a linked list of data mapped using the
> > -     acc_map_data/acc_unmap_data or "acc enter data"/"acc exit
> > data" pragmas.
> > -     Unlike mapped_data in the goacc_thread struct, unmapping can
> > -     happen out-of-order with respect to mapping.  */
> > -  /* This is guarded by the lock in the "outer" struct
> > gomp_device_descr.  */
> > -  struct target_mem_desc *data_environ;  
> 
> As mentioned before, please also accordingly update the comment
> attached to 'acc_dispatch_t openacc' in 'struct gomp_device_descr'.

Done.

> That code:
> 
> > -/* Free address mapping tables.  MM must be locked on entry, and
> > remains locked
> > -   on return.  */
> > -
> > -attribute_hidden void
> > -gomp_free_memmap (struct splay_tree_s *mem_map)
> > -{
> > -  while (mem_map->root)
> > -    {
> > -      struct target_mem_desc *tgt = mem_map->root->key.tgt;
> > -
> > -      splay_tree_remove (mem_map, &mem_map->root->key);
> > -      free (tgt->array);
> > -      free (tgt);
> > -    }
> > -}  
> 
> ... kind-of gets inlined here:
> 
> > --- a/libgomp/oacc-init.c
> > +++ b/libgomp/oacc-init.c
> > @@ -356,9 +356,13 @@ acc_shutdown_1 (acc_device_t d)
> >  
> >        if (walk->dev)
> >     {
> > -     gomp_mutex_lock (&walk->dev->lock);
> > -     gomp_free_memmap (&walk->dev->mem_map);
> > -     gomp_mutex_unlock (&walk->dev->lock);
> > +     while (walk->dev->mem_map.root)
> > +       {
> > +         splay_tree_key k = &walk->dev->mem_map.root->key;
> > +         gomp_remove_var (walk->dev, k);
> > +       }
> >  
> >       walk->dev = NULL;
> >       walk->base_dev = NULL;  
> 
> It's not obvious to me why it's OK to remove the locking?  Don't all
> operations on the 'mem_map' have to have the device locked?

You're probably right about this -- good catch. Although if the user is
shutting down the device whilst it is still active (from some other
thread?) it's just a case of how ugly their crash is going to be
either way, I suspect!

> Does that code now still have the previous (and expected?) "finalize"
> semantics (don't consider 'refcount', always unmap)?  (Should we
> assert here that 'gomp_remove_var' always returns 'true'?  And/or, if
> it doesn't, what does that mean then?)  Or am I confused?  ;-)

Yeah. The splay tree keys are removed one at a time (without paying
attention to the refcounts for those), and the linked target_mem_descs
are freed when their refcounts drop to zero. Hence is_tgt_unmapped
won't always be true -- only when one of the linked target_mem_descs
gets freed.

> > --- a/libgomp/oacc-mem.c
> > +++ b/libgomp/oacc-mem.c  
> 
> > @@ -427,6 +418,7 @@ acc_unmap_data (void *h)
> >  {
> >    struct goacc_thread *thr = goacc_thread ();
> >    struct gomp_device_descr *acc_dev = thr->dev;
> > +  struct splay_tree_key_s cur_node;  
> 
> I know it's often not the case in existing code, but when adding new
> code, please move definitions next to their first use.

Done.

> > @@ -438,12 +430,11 @@ acc_unmap_data (void *h)
> >    acc_api_info api_info;
> >    bool profiling_p = GOACC_PROFILING_SETUP_P (thr, &prof_info,
> > &api_info); 
> >    gomp_mutex_lock (&acc_dev->lock);
> >  
> > -  splay_tree_key n = lookup_host (acc_dev, h, 1);
> > -  struct target_mem_desc *t;
> > +  cur_node.host_start = (uintptr_t) h;
> > +  cur_node.host_end = cur_node.host_start + 1;
> > +  splay_tree_key n = splay_tree_lookup (&acc_dev->mem_map,
> > &cur_node); 
> >    if (!n)
> >      {  
> 
> Isn't this just inlining 'lookup_host'?  There may be a good reason
> to do that, but what is it?

Yeah, looks like it. I changed the code to use lookup_host.

> > @@ -451,47 +442,28 @@ acc_unmap_data (void *h)  
> 
> > -  /* Mark for removal.  */
> > -  n->refcount = 1;
> > +  splay_tree_remove (&acc_dev->mem_map, n);
> >  
> > -  t = n->tgt;
> > +  struct target_mem_desc *tgt = n->tgt;
> >  
> > -  if (t->refcount == 2)
> > +  if (tgt->refcount > 0)
> > +    tgt->refcount--;
> > +  else
> >      {
> > -[...]
> > +      free (tgt->array);
> > +      free (tgt);
> >      }  
> 
> Shouldn't that be 'if (tgt->refcount > 1)' (instead of '> 0'), like in
> 'gomp_unref_tgt' -- or actually use that function?

I think you're right about the condition -- well spotted! We can't use
gomp_unref_tgt here because acc_unmap_data isn't supposed to free the
device memory.

> >  
> >    gomp_mutex_unlock (&acc_dev->lock);
> >  
> > -  gomp_unmap_vars (t, true);
> > -
> >    if (profiling_p)
> >      {
> >        thr->prof_info = NULL;  
> 
> Hmm, I don't understand the changes leading to this, but again, I
> shall trust that you've got that right.
> 
> Or, was that a bug in the existing code, and we don't have proper test
> coverage?

I think that was a bug in the original code.

> > @@ -577,17 +551,14 @@ present_create_copy (unsigned f, void *h,
> > size_t s, int async)  
> 
> > -      d = tgt->to_free;  
> 
> > +      n = lookup_host (acc_dev, h, s);
> > +      assert (n != NULL);
> > +      d = (void *) (n->tgt->tgt_start + n->tgt_offset +
> > (uintptr_t) h
> > +               - n->host_start);  
> 
> |   return d;
> 
> Again, it's not obvious to me how that is semantically equivalent to
> what we've returned before?

This is a bug fix (it's mentioned in the ChangeLog).

> >  void
> > -gomp_acc_remove_pointer (void *h, size_t s, bool force_copyfrom,
> > int async,
> > -                    int finalize, int mapnum)
> > +gomp_acc_remove_pointer (struct gomp_device_descr *acc_dev, void
> > **hostaddrs,
> > +                    size_t *sizes, unsigned short *kinds, int
> > async,
> > +                    bool finalize, int mapnum)
> >  {  
> 
> > +      switch (kind)
> > +        {
> > +   case GOMP_MAP_FROM:
> > +   case GOMP_MAP_FORCE_FROM:
> > +   case GOMP_MAP_ALWAYS_FROM:
> > +     copyfrom = true;
> > +     /* Fallthrough.  */
> > +
> > +   case GOMP_MAP_TO_PSET:
> > +   case GOMP_MAP_POINTER:
> > +   case GOMP_MAP_DELETE:
> > +   case GOMP_MAP_RELEASE:
> > [...]
> > +   default:
> > +     gomp_mutex_unlock (&acc_dev->lock);
> > +     gomp_fatal ("gomp_acc_remove_pointer unhandled kind
> > 0x%.2x",
> > +                 kind);  
> 
> Thanks for being explicit about the expected mapping kinds, etc.

That code's all gone with this version...

> > -      /* If running synchronously, unmap immediately.  */
> > -      if (async < acc_async_noval)
> > -   gomp_unmap_vars (t, true);
> > -      else
> > -   {
> > -     goacc_aq aq = get_goacc_asyncqueue (async);
> > -     gomp_unmap_vars_async (t, true, aq);  
> 
> As mentioned before, 'gomp_acc_remove_pointer' now "has an unused
> 'async' formal parameter.  Is that meant to be resolved to an
> asyncqueue, and pass that one to 'gomp_copy_dev2host', and call
> 'gomp_remove_var_async' instead of 'gomp_remove_var'"?  That's here:

Hmm yeah, that's all gone however.

> > +     if (copyfrom)
> > +       gomp_copy_dev2host (acc_dev, NULL, (void *)
> > cur_node.host_start,
> > +                           (void *) (n->tgt->tgt_start +
> > n->tgt_offset
> > +                                     + cur_node.host_start
> > +                                     - n->host_start),
> > +                           cur_node.host_end -
> > cur_node.host_start); +
> > +     if (n->refcount == 0)
> > +       gomp_remove_var (acc_dev, n);
> > +     break;  
> 
> > --- a/libgomp/oacc-parallel.c
> > +++ b/libgomp/oacc-parallel.c
> > @@ -56,12 +56,29 @@ find_pointer (int pos, size_t mapnum, unsigned
> > short *kinds)  
> 
> I've always been confused by this function (before known as
> 'find_pset'); this feels wrong, but I've never gotten to the bottom
> of it.

This version removes that function in favour of a function that finds
groups of consecutive mappings that should be kept together for a
single gomp_map_vars invocation. I think that fits better with my
findings as written up on the wiki page
https://gcc.gnu.org/wiki/LibgompPointerMappingKinds.

> I'll trust that your changes there can only improve the current
> situation, not worsen it.  ;-)
> 
> And, again, thanks for being explicit about the expected mapping
> kinds, etc.
> 
> > @@ -745,8 +762,14 @@ GOACC_enter_exit_data (int flags_m, size_t
> > mapnum, }
> >       else
> >         {
> > -         gomp_acc_insert_pointer (pointer, &hostaddrs[i],
> > -                                  &sizes[i], &kinds[i],
> > async);
> > +         goacc_aq aq = get_goacc_asyncqueue (async);
> > +         for (int j = 0; j < 2; j++)  
> 
> Should this magic constant '2' be derived from 'pointer' or some such?
> 
> > +           gomp_map_vars_async (acc_dev, aq,
> > +                                (j == 0 || pointer == 2) ?
> > 1 : 2,
> > +                                &hostaddrs[i + j], NULL,
> > +                                &sizes[i + j], &kinds[i + j],
> > true,
> > +
> > GOMP_MAP_VARS_OPENACC_ENTER_DATA);  
> 
> ;-) Yuck.  As requested before: "Can we get a comment added to such
> 'magic', please?"

That magic is gone now. 

> I just wish that eventually we'll be able to can get rid of that
> stuff, and just let 'gomp_map_vars' do its thing.  Similar to
> <https://gcc.gnu.org/PR90596> "'GOACC_parallel_keyed' should use
> 'GOMP_MAP_VARS_TARGET'".
> 
> (For avoidance of doubt, that's not your task right now.)

Does this version look better? I've removed the special-case handling
of pointers in the enter/exit data code, and combined the
gomp_acc_remove_pointer code (which now iterated over mappings
one-at-a-time anyway) with the loop iterating over mappings in the
new goacc_exit_data_internal function. It was a bit nonsensical to have
the "exit data" code split over two files, as before.

> > --- a/libgomp/target.c
> > +++ b/libgomp/target.c
> > @@ -536,7 +536,8 @@ gomp_map_vars_internal (struct
> > gomp_device_descr *devicep, struct target_mem_desc *tgt
> >      = gomp_malloc (sizeof (*tgt) + sizeof (tgt->list[0]) * mapnum);
> >    tgt->list_count = mapnum;
> > -  tgt->refcount = pragma_kind == GOMP_MAP_VARS_ENTER_DATA ? 0 : 1;
> > +  tgt->refcount = (pragma_kind == GOMP_MAP_VARS_ENTER_DATA
> > +              || pragma_kind ==
> > GOMP_MAP_VARS_OPENACC_ENTER_DATA) ? 0 : 1; tgt->device_descr =
> > devicep; struct gomp_coalesce_buf cbuf, *cbufp = NULL;
> >  
> > @@ -1051,8 +1053,20 @@ gomp_map_vars_internal (struct
> > gomp_device_descr *devicep, /* If the variable from "omp target
> > enter data" map-list was already mapped, tgt is not needed.
> > Otherwise tgt will be freed by gomp_unmap_vars or gomp_exit_data.
> > */
> > -  if (pragma_kind == GOMP_MAP_VARS_ENTER_DATA && tgt->refcount ==
> > 0)
> > -    {
> > +  if ((pragma_kind == GOMP_MAP_VARS_ENTER_DATA
> > +       || pragma_kind == GOMP_MAP_VARS_OPENACC_ENTER_DATA)
> > +      && tgt->refcount == 0)
> > +    {
> > +      /* If we're about to discard a target_mem_desc with no
> > "structural"
> > +    references (tgt->refcount == 0), any splay keys linked in
> > the tgt's
> > +    list must have their virtual refcount incremented to
> > represent that
> > +    "lost" reference in order to implement the semantics of
> > the OpenACC
> > +    "present increment" operation properly.  */
> > +      if (pragma_kind == GOMP_MAP_VARS_OPENACC_ENTER_DATA)
> > +   for (i = 0; i < tgt->list_count; i++)
> > +     if (tgt->list[i].key)
> > +       tgt->list[i].key->virtual_refcount++;
> > +
> >        free (tgt);
> >        tgt = NULL;
> >      }  
> 
> So that last item is the only difference between
> 'GOMP_MAP_VARS_ENTER_DATA' and 'GOMP_MAP_VARS_OPENACC_ENTER_DATA'.
> Again I have not digested that one, but will trust you.

Yeah, because of the OpenACC reference counting & finalize semantics,
which I don't think are applicable to OpenMP.

> > @@ -1310,7 +1366,7 @@ gomp_load_image_to_device (struct
> > gomp_device_descr *devicep, unsigned version, k->tgt = tgt;
> >        k->tgt_offset = target_table[i].start;
> >        k->refcount = REFCOUNT_INFINITY;
> > -      k->link_key = NULL;
> > +      k->virtual_refcount = 0;
> >        array->left = NULL;
> >        array->right = NULL;
> >        splay_tree_insert (&devicep->mem_map, array);
> > @@ -1342,7 +1398,7 @@ gomp_load_image_to_device (struct
> > gomp_device_descr *devicep, unsigned version, k->tgt = tgt;
> >        k->tgt_offset = target_var->start;
> >        k->refcount = target_size & link_bit ? REFCOUNT_LINK :
> > REFCOUNT_INFINITY;
> > -      k->link_key = NULL;
> > +      k->virtual_refcount = 0;
> >        array->left = NULL;
> >        array->right = NULL;
> >        splay_tree_insert (&devicep->mem_map, array);  
> 
> Why no longer initialize 'link_key' here?
> 
> I'd expect that always all fields of 'k' ('struct splay_tree_key_s')
> get initialized, so like:
> 
> > @@ -2612,6 +2652,8 @@ omp_target_associate_ptr (const void
> > *host_ptr, const void *device_ptr, k->tgt = tgt;
> >        k->tgt_offset = (uintptr_t) device_ptr + device_offset;
> >        k->refcount = REFCOUNT_INFINITY;
> > +      k->virtual_refcount = 0;
> > +      k->u.link_key = NULL;
> >        array->left = NULL;
> >        array->right = NULL;
> >        splay_tree_insert (&devicep->mem_map, array);  
> 
> (I haven't verified whether that's always done, please verify.)

This version (without the link_key union, etc.) should avoid those
problems. I've added some missing initialisations, too.

Re-tested with offloading to nvptx. OK for trunk?

Thanks,

Julian
commit 5edb6e32d9afb5e9d2e0b72e8a311ce4d3865a34
Author: Julian Brown <jul...@codesourcery.com>
Date:   Mon Nov 5 15:51:46 2018 -0800

    OpenACC reference count overhaul
    
    2019-10-28  Julian Brown  <jul...@codesourcery.com>
                Thomas Schwinge  <tho...@codesourcery.com>
    
            libgomp/
            * libgomp.h (struct splay_tree_key_s): Substitute dynamic_refcount
            field for virtual_refcount.
            (struct acc_dispatch_t): Remove data_environ field.
            (struct gomp_device_descr): Update comment on openacc field.
            (enum gomp_map_vars_kind): Add GOMP_MAP_VARS_OPENACC_ENTER_DATA.
            (gomp_acc_insert_pointer, gomp_acc_remove_pointer, gomp_free_memmap):
            Remove prototypes.
            (gomp_remove_var_async): Add prototype.
            * oacc-host.c (host_dispatch): Don't initialise removed data_environ
            field.
            * oacc-init.c (acc_shutdown_1): Iteratively call gomp_remove_var
            instead of calling gomp_free_memmap.
            * oacc-mem.c (lookup_dev_1): New function.
            (lookup_dev): Reimplement using above.
            (acc_free, acc_hostptr): Update calls to lookup_dev.
            (acc_map_data): Likewise.  Don't add to data_environ list.
            (acc_unmap_data): Remove call to gomp_unmap_vars.  Fix semantics to
            remove mapping, but not mapped data.
            (present_create_copy): Use virtual_refcount instead of
            dynamic_refcount.  Don't manipulate data_environ.  Fix target pointer
            return value.
            (delete_copyout): Update for virtual_refcount semantics.  Use
            goacc_remove_var_async for asynchronous delete/copyouts.
            (gomp_acc_insert_pointer, gomp_acc_remove_pointer): Remove functions.
            * oacc-parallel.c (find_pointer): Remove function.
            (find_group_last, goacc_enter_data_internal,
            goacc_exit_data_internal): New functions.
            (GOACC_enter_exit_data): Use goacc_enter_data_internal and
            goacc_exit_data_internal helper functions.
            * target.c (gomp_map_vars_internal): Handle
            GOMP_MAP_VARS_OPENACC_ENTER_DATA.  Update for virtual_refcount
            semantics.
            (gomp_remove_var): Reimplement in terms of...
            (gomp_remove_var_internal): ...this new helper function.
            (gomp_remove_var_async): Implement using above helper function.
            (gomp_unmap_vars_internal): Update for virtual_refcount
            semantics.
            (gomp_load_image_to_device): Zero-initialise virtual_refcount field.
            (gomp_free_memmap): Remove function.
            (omp_target_associate_ptr): Zero-initialise virtual_refcount and
            link_key splay tree key fields.
            (gomp_target_init): Don't initialise removed data_environ field.

diff --git a/libgomp/libgomp.h b/libgomp/libgomp.h
index 178eb600ccd..8f01e9db9c5 100644
--- a/libgomp/libgomp.h
+++ b/libgomp/libgomp.h
@@ -919,8 +919,11 @@ struct splay_tree_key_s {
   uintptr_t tgt_offset;
   /* Reference count.  */
   uintptr_t refcount;
-  /* Dynamic reference count.  */
-  uintptr_t dynamic_refcount;
+  /* Reference counts beyond those that represent genuine references in the
+     linked splay tree key/target memory structures, e.g. for multiple OpenACC
+     "present increment" operations (via "acc enter data") referring to the same
+     host-memory block.  */
+  uintptr_t virtual_refcount;
   /* Pointer to the original mapping of "omp declare target link" object.  */
   splay_tree_key link_key;
 };
@@ -944,13 +947,6 @@ splay_compare (splay_tree_key x, splay_tree_key y)
 
 typedef struct acc_dispatch_t
 {
-  /* This is a linked list of data mapped using the
-     acc_map_data/acc_unmap_data or "acc enter data"/"acc exit data" pragmas.
-     Unlike mapped_data in the goacc_thread struct, unmapping can
-     happen out-of-order with respect to mapping.  */
-  /* This is guarded by the lock in the "outer" struct gomp_device_descr.  */
-  struct target_mem_desc *data_environ;
-
   /* Execute.  */
   __typeof (GOMP_OFFLOAD_openacc_exec) *exec_func;
 
@@ -1051,8 +1047,7 @@ struct gomp_device_descr
   enum gomp_device_state state;
 
   /* OpenACC-specific data and functions.  */
-  /* This is mutable because of its mutable data_environ and target_data
-     members.  */
+  /* This is mutable because of the target_data member.  */
   acc_dispatch_t openacc;
 };
 
@@ -1060,13 +1055,12 @@ struct gomp_device_descr
 enum gomp_map_vars_kind
 {
   GOMP_MAP_VARS_OPENACC,
+  GOMP_MAP_VARS_OPENACC_ENTER_DATA,
   GOMP_MAP_VARS_TARGET,
   GOMP_MAP_VARS_DATA,
   GOMP_MAP_VARS_ENTER_DATA
 };
 
-extern void gomp_acc_insert_pointer (size_t, void **, size_t *, void *, int);
-extern void gomp_acc_remove_pointer (void *, size_t, bool, int, int, int);
 extern void gomp_acc_declare_allocate (bool, size_t, void **, size_t *,
 				       unsigned short *);
 struct gomp_coalesce_buf;
@@ -1092,9 +1086,10 @@ extern void gomp_unmap_vars_async (struct target_mem_desc *, bool,
 				   struct goacc_asyncqueue *);
 extern void gomp_init_device (struct gomp_device_descr *);
 extern bool gomp_fini_device (struct gomp_device_descr *);
-extern void gomp_free_memmap (struct splay_tree_s *);
 extern void gomp_unload_device (struct gomp_device_descr *);
 extern bool gomp_remove_var (struct gomp_device_descr *, splay_tree_key);
+extern void gomp_remove_var_async (struct gomp_device_descr *, splay_tree_key,
+				   struct goacc_asyncqueue *);
 
 /* work.c */
 
diff --git a/libgomp/oacc-host.c b/libgomp/oacc-host.c
index 12299aee65d..1b9adcec774 100644
--- a/libgomp/oacc-host.c
+++ b/libgomp/oacc-host.c
@@ -264,8 +264,6 @@ static struct gomp_device_descr host_dispatch =
     .state = GOMP_DEVICE_UNINITIALIZED,
 
     .openacc = {
-      .data_environ = NULL,
-
       .exec_func = host_openacc_exec,
 
       .create_thread_data_func = host_openacc_create_thread_data,
diff --git a/libgomp/oacc-init.c b/libgomp/oacc-init.c
index e1568c535b3..8465ed503a3 100644
--- a/libgomp/oacc-init.c
+++ b/libgomp/oacc-init.c
@@ -357,7 +357,14 @@ acc_shutdown_1 (acc_device_t d)
       if (walk->dev)
 	{
 	  gomp_mutex_lock (&walk->dev->lock);
-	  gomp_free_memmap (&walk->dev->mem_map);
+
+	  while (walk->dev->mem_map.root)
+	    {
+	      splay_tree_key k = &walk->dev->mem_map.root->key;
+	      k->link_key = NULL;
+	      gomp_remove_var (walk->dev, k);
+	    }
+
 	  gomp_mutex_unlock (&walk->dev->lock);
 
 	  walk->dev = NULL;
diff --git a/libgomp/oacc-mem.c b/libgomp/oacc-mem.c
index 2f271009fb8..c6876fdff4f 100644
--- a/libgomp/oacc-mem.c
+++ b/libgomp/oacc-mem.c
@@ -50,42 +50,39 @@ lookup_host (struct gomp_device_descr *dev, void *h, size_t s)
   return key;
 }
 
-/* Return block containing [D->S), or NULL if not contained.
-   The list isn't ordered by device address, so we have to iterate
-   over the whole array.  This is not expected to be a common
-   operation.  The device lock associated with TGT must be locked on entry, and
-   remains locked on exit.  */
+/* Helper for lookup_dev.  Iterate over splay tree.  */
 
 static splay_tree_key
-lookup_dev (struct target_mem_desc *tgt, void *d, size_t s)
+lookup_dev_1 (splay_tree_node node, uintptr_t d, size_t s)
 {
-  int i;
-  struct target_mem_desc *t;
+  splay_tree_key k = &node->key, found = NULL;
+  struct target_mem_desc *t = k->tgt;
 
-  if (!tgt)
-    return NULL;
+  if (d >= t->tgt_start && d + s <= t->tgt_end)
+    return k;
 
-  for (t = tgt; t != NULL; t = t->prev)
-    {
-      if (t->tgt_start <= (uintptr_t) d && t->tgt_end >= (uintptr_t) d + s)
-        break;
-    }
+  if (node->left)
+    found = lookup_dev_1 (node->left, d, s);
 
-  if (!t)
-    return NULL;
+  if (!found && node->right)
+    found = lookup_dev_1 (node->right, d, s);
 
-  for (i = 0; i < t->list_count; i++)
-    {
-      void * offset;
+  return found;
+}
 
-      splay_tree_key k = &t->array[i].key;
-      offset = d - t->tgt_start + k->tgt_offset;
+/* Return block containing [D->S), or NULL if not contained.
+   The list isn't ordered by device address, so we have to iterate
+   over the whole array.  This is not expected to be a common
+   operation.  The device lock associated with TGT must be locked on entry, and
+   remains locked on exit.  */
 
-      if (k->host_start + offset <= (void *) k->host_end)
-        return k;
-    }
+static splay_tree_key
+lookup_dev (splay_tree mem_map, void *d, size_t s)
+{
+  if (!mem_map || !mem_map->root)
+    return NULL;
 
-  return NULL;
+  return lookup_dev_1 (mem_map->root, (uintptr_t) d, s);
 }
 
 /* OpenACC is silent on how memory exhaustion is indicated.  We return
@@ -150,7 +147,7 @@ acc_free (void *d)
   /* We don't have to call lazy open here, as the ptr value must have
      been returned by acc_malloc.  It's not permitted to pass NULL in
      (unless you got that null from acc_malloc).  */
-  if ((k = lookup_dev (acc_dev->openacc.data_environ, d, 1)))
+  if ((k = lookup_dev (&acc_dev->mem_map, d, 1)))
     {
       void *offset;
 
@@ -301,7 +298,7 @@ acc_hostptr (void *d)
 
   gomp_mutex_lock (&acc_dev->lock);
 
-  n = lookup_dev (acc_dev->openacc.data_environ, d, 1);
+  n = lookup_dev (&acc_dev->mem_map, d, 1);
 
   if (!n)
     {
@@ -396,7 +393,7 @@ acc_map_data (void *h, void *d, size_t s)
 		      (int)s);
 	}
 
-      if (lookup_dev (thr->dev->openacc.data_environ, d, s))
+      if (lookup_dev (&thr->dev->mem_map, d, s))
         {
 	  gomp_mutex_unlock (&acc_dev->lock);
 	  gomp_fatal ("device address [%p, +%d] is already mapped", (void *)d,
@@ -415,11 +412,6 @@ acc_map_data (void *h, void *d, size_t s)
 	  thr->api_info = NULL;
 	}
     }
-
-  gomp_mutex_lock (&acc_dev->lock);
-  tgt->prev = acc_dev->openacc.data_environ;
-  acc_dev->openacc.data_environ = tgt;
-  gomp_mutex_unlock (&acc_dev->lock);
 }
 
 void
@@ -438,12 +430,9 @@ acc_unmap_data (void *h)
   acc_api_info api_info;
   bool profiling_p = GOACC_PROFILING_SETUP_P (thr, &prof_info, &api_info);
 
-  size_t host_size;
-
   gomp_mutex_lock (&acc_dev->lock);
 
   splay_tree_key n = lookup_host (acc_dev, h, 1);
-  struct target_mem_desc *t;
 
   if (!n)
     {
@@ -451,47 +440,28 @@ acc_unmap_data (void *h)
       gomp_fatal ("%p is not a mapped block", (void *)h);
     }
 
-  host_size = n->host_end - n->host_start;
-
   if (n->host_start != (uintptr_t) h)
     {
+      size_t host_size = n->host_end - n->host_start;
       gomp_mutex_unlock (&acc_dev->lock);
       gomp_fatal ("[%p,%d] surrounds %p",
 		  (void *) n->host_start, (int) host_size, (void *) h);
     }
 
-  /* Mark for removal.  */
-  n->refcount = 1;
+  splay_tree_remove (&acc_dev->mem_map, n);
 
-  t = n->tgt;
+  struct target_mem_desc *tgt = n->tgt;
 
-  if (t->refcount == 2)
+  if (tgt->refcount > 1)
+    tgt->refcount--;
+  else
     {
-      struct target_mem_desc *tp;
-
-      /* This is the last reference, so pull the descriptor off the
-         chain. This avoids gomp_unmap_vars via gomp_unmap_tgt from
-         freeing the device memory. */
-      t->tgt_end = 0;
-      t->to_free = 0;
-
-      for (tp = NULL, t = acc_dev->openacc.data_environ; t != NULL;
-	   tp = t, t = t->prev)
-	if (n->tgt == t)
-	  {
-	    if (tp)
-	      tp->prev = t->prev;
-	    else
-	      acc_dev->openacc.data_environ = t->prev;
-
-	    break;
-	  }
+      free (tgt->array);
+      free (tgt);
     }
 
   gomp_mutex_unlock (&acc_dev->lock);
 
-  gomp_unmap_vars (t, true);
-
   if (profiling_p)
     {
       thr->prof_info = NULL;
@@ -552,8 +522,9 @@ present_create_copy (unsigned f, void *h, size_t s, int async)
       if (n->refcount != REFCOUNT_INFINITY)
 	{
 	  n->refcount++;
-	  n->dynamic_refcount++;
+	  n->virtual_refcount++;
 	}
+
       gomp_mutex_unlock (&acc_dev->lock);
     }
   else if (!(f & FLAG_CREATE))
@@ -563,7 +534,6 @@ present_create_copy (unsigned f, void *h, size_t s, int async)
     }
   else
     {
-      struct target_mem_desc *tgt;
       size_t mapnum = 1;
       unsigned short kinds;
       void *hostaddrs = h;
@@ -577,17 +547,14 @@ present_create_copy (unsigned f, void *h, size_t s, int async)
 
       goacc_aq aq = get_goacc_asyncqueue (async);
 
-      tgt = gomp_map_vars_async (acc_dev, aq, mapnum, &hostaddrs, NULL, &s,
-				 &kinds, true, GOMP_MAP_VARS_OPENACC);
-      /* Initialize dynamic refcount.  */
-      tgt->list[0].key->dynamic_refcount = 1;
+      gomp_map_vars_async (acc_dev, aq, mapnum, &hostaddrs, NULL, &s, &kinds,
+			   true, GOMP_MAP_VARS_OPENACC_ENTER_DATA);
 
       gomp_mutex_lock (&acc_dev->lock);
-
-      d = tgt->to_free;
-      tgt->prev = acc_dev->openacc.data_environ;
-      acc_dev->openacc.data_environ = tgt;
-
+      n = lookup_host (acc_dev, h, s);
+      assert (n != NULL);
+      d = (void *) (n->tgt->tgt_start + n->tgt_offset + (uintptr_t) h
+		    - n->host_start);
       gomp_mutex_unlock (&acc_dev->lock);
     }
 
@@ -671,7 +638,6 @@ delete_copyout (unsigned f, void *h, size_t s, int async, const char *libfnname)
 {
   size_t host_size;
   splay_tree_key n;
-  void *d;
   struct goacc_thread *thr = goacc_thread ();
   struct gomp_device_descr *acc_dev = thr->dev;
 
@@ -700,9 +666,6 @@ delete_copyout (unsigned f, void *h, size_t s, int async, const char *libfnname)
       gomp_fatal ("[%p,%d] is not mapped", (void *)h, (int)s);
     }
 
-  d = (void *) (n->tgt->tgt_start + n->tgt_offset
-		+ (uintptr_t) h - n->host_start);
-
   host_size = n->host_end - n->host_start;
 
   if (n->host_start != (uintptr_t) h || host_size != s)
@@ -715,48 +678,34 @@ delete_copyout (unsigned f, void *h, size_t s, int async, const char *libfnname)
   if (n->refcount == REFCOUNT_INFINITY)
     {
       n->refcount = 0;
-      n->dynamic_refcount = 0;
-    }
-  if (n->refcount < n->dynamic_refcount)
-    {
-      gomp_mutex_unlock (&acc_dev->lock);
-      gomp_fatal ("Dynamic reference counting assert fail\n");
+      n->virtual_refcount = 0;
     }
 
   if (f & FLAG_FINALIZE)
     {
-      n->refcount -= n->dynamic_refcount;
-      n->dynamic_refcount = 0;
+      n->refcount -= n->virtual_refcount;
+      n->virtual_refcount = 0;
     }
-  else if (n->dynamic_refcount)
+
+  if (n->virtual_refcount > 0)
     {
-      n->dynamic_refcount--;
       n->refcount--;
+      n->virtual_refcount--;
     }
+  else if (n->refcount > 0)
+    n->refcount--;
 
   if (n->refcount == 0)
     {
-      if (n->tgt->refcount == 2)
-	{
-	  struct target_mem_desc *tp, *t;
-	  for (tp = NULL, t = acc_dev->openacc.data_environ; t != NULL;
-	       tp = t, t = t->prev)
-	    if (n->tgt == t)
-	      {
-		if (tp)
-		  tp->prev = t->prev;
-		else
-		  acc_dev->openacc.data_environ = t->prev;
-		break;
-	      }
-	}
+      goacc_aq aq = get_goacc_asyncqueue (async);
 
       if (f & FLAG_COPYOUT)
 	{
-	  goacc_aq aq = get_goacc_asyncqueue (async);
+	  void *d = (void *) (n->tgt->tgt_start + n->tgt_offset
+			      + (uintptr_t) h - n->host_start);
 	  gomp_copy_dev2host (acc_dev, aq, h, d, s);
 	}
-      gomp_remove_var (acc_dev, n);
+      gomp_remove_var_async (acc_dev, n, aq);
     }
 
   gomp_mutex_unlock (&acc_dev->lock);
@@ -892,142 +841,3 @@ acc_update_self_async (void *h, size_t s, int async)
 {
   update_dev_host (0, h, s, async);
 }
-
-void
-gomp_acc_insert_pointer (size_t mapnum, void **hostaddrs, size_t *sizes,
-			 void *kinds, int async)
-{
-  struct target_mem_desc *tgt;
-  struct goacc_thread *thr = goacc_thread ();
-  struct gomp_device_descr *acc_dev = thr->dev;
-
-  if (acc_is_present (*hostaddrs, *sizes))
-    {
-      splay_tree_key n;
-      gomp_mutex_lock (&acc_dev->lock);
-      n = lookup_host (acc_dev, *hostaddrs, *sizes);
-      gomp_mutex_unlock (&acc_dev->lock);
-
-      tgt = n->tgt;
-      for (size_t i = 0; i < tgt->list_count; i++)
-	if (tgt->list[i].key == n)
-	  {
-	    for (size_t j = 0; j < mapnum; j++)
-	      if (i + j < tgt->list_count && tgt->list[i + j].key)
-		{
-		  tgt->list[i + j].key->refcount++;
-		  tgt->list[i + j].key->dynamic_refcount++;
-		}
-	    return;
-	  }
-      /* Should not reach here.  */
-      gomp_fatal ("Dynamic refcount incrementing failed for pointer/pset");
-    }
-
-  gomp_debug (0, "  %s: prepare mappings\n", __FUNCTION__);
-  goacc_aq aq = get_goacc_asyncqueue (async);
-  tgt = gomp_map_vars_async (acc_dev, aq, mapnum, hostaddrs,
-			     NULL, sizes, kinds, true, GOMP_MAP_VARS_OPENACC);
-  gomp_debug (0, "  %s: mappings prepared\n", __FUNCTION__);
-
-  /* Initialize dynamic refcount.  */
-  tgt->list[0].key->dynamic_refcount = 1;
-
-  gomp_mutex_lock (&acc_dev->lock);
-  tgt->prev = acc_dev->openacc.data_environ;
-  acc_dev->openacc.data_environ = tgt;
-  gomp_mutex_unlock (&acc_dev->lock);
-}
-
-void
-gomp_acc_remove_pointer (void *h, size_t s, bool force_copyfrom, int async,
-			 int finalize, int mapnum)
-{
-  struct goacc_thread *thr = goacc_thread ();
-  struct gomp_device_descr *acc_dev = thr->dev;
-  splay_tree_key n;
-  struct target_mem_desc *t;
-  int minrefs = (mapnum == 1) ? 2 : 3;
-
-  if (!acc_is_present (h, s))
-    return;
-
-  gomp_mutex_lock (&acc_dev->lock);
-
-  n = lookup_host (acc_dev, h, 1);
-
-  if (!n)
-    {
-      gomp_mutex_unlock (&acc_dev->lock);
-      gomp_fatal ("%p is not a mapped block", (void *)h);
-    }
-
-  gomp_debug (0, "  %s: restore mappings\n", __FUNCTION__);
-
-  t = n->tgt;
-
-  if (n->refcount < n->dynamic_refcount)
-    {
-      gomp_mutex_unlock (&acc_dev->lock);
-      gomp_fatal ("Dynamic reference counting assert fail\n");
-    }
-
-  if (finalize)
-    {
-      n->refcount -= n->dynamic_refcount;
-      n->dynamic_refcount = 0;
-    }
-  else if (n->dynamic_refcount)
-    {
-      n->dynamic_refcount--;
-      n->refcount--;
-    }
-
-  gomp_mutex_unlock (&acc_dev->lock);
-
-  if (n->refcount == 0)
-    {
-      if (t->refcount == minrefs)
-	{
-	  /* This is the last reference, so pull the descriptor off the
-	     chain. This prevents gomp_unmap_vars via gomp_unmap_tgt from
-	     freeing the device memory. */
-	  struct target_mem_desc *tp;
-	  for (tp = NULL, t = acc_dev->openacc.data_environ; t != NULL;
-	       tp = t, t = t->prev)
-	    {
-	      if (n->tgt == t)
-		{
-		  if (tp)
-		    tp->prev = t->prev;
-		  else
-		    acc_dev->openacc.data_environ = t->prev;
-		  break;
-		}
-	    }
-	}
-
-      /* Set refcount to 1 to allow gomp_unmap_vars to unmap it.  */
-      n->refcount = 1;
-      t->refcount = minrefs;
-      for (size_t i = 0; i < t->list_count; i++)
-	if (t->list[i].key == n)
-	  {
-	    t->list[i].copy_from = force_copyfrom ? 1 : 0;
-	    break;
-	  }
-
-      /* If running synchronously, unmap immediately.  */
-      if (async < acc_async_noval)
-	gomp_unmap_vars (t, true);
-      else
-	{
-	  goacc_aq aq = get_goacc_asyncqueue (async);
-	  gomp_unmap_vars_async (t, true, aq);
-	}
-    }
-
-  gomp_mutex_unlock (&acc_dev->lock);
-
-  gomp_debug (0, "  %s: mappings restored\n", __FUNCTION__);
-}
diff --git a/libgomp/oacc-parallel.c b/libgomp/oacc-parallel.c
index 68a60de24fa..87f9ae01906 100644
--- a/libgomp/oacc-parallel.c
+++ b/libgomp/oacc-parallel.c
@@ -47,23 +47,39 @@ _Static_assert (GOACC_FLAGS_UNMARSHAL (GOMP_DEVICE_HOST_FALLBACK)
 		"legacy GOMP_DEVICE_HOST_FALLBACK broken");
 
 
-/* Returns the number of mappings associated with the pointer or pset. PSET
-   have three mappings, whereas pointer have two.  */
+/* Some types of (pointer) variables use several consecutive mappings, which
+   must be treated as a group for enter/exit data directives.  This function
+   returns the last mapping in such a group (inclusive), or POS for singleton
+   mappings.  */
 
 static int
-find_pointer (int pos, size_t mapnum, unsigned short *kinds)
+find_group_last (int pos, size_t mapnum, unsigned short *kinds)
 {
-  if (pos + 1 >= mapnum)
-    return 0;
+  unsigned char kind0 = kinds[pos] & 0xff;
+  int first_pos = pos, last_pos = pos;
 
-  unsigned char kind = kinds[pos+1] & 0xff;
-
-  if (kind == GOMP_MAP_TO_PSET)
-    return 3;
-  else if (kind == GOMP_MAP_POINTER)
-    return 2;
+  if (kind0 == GOMP_MAP_TO_PSET)
+    {
+      while (pos + 1 < mapnum && (kinds[pos + 1] & 0xff) == GOMP_MAP_POINTER)
+	last_pos = ++pos;
+      /* We expect at least one GOMP_MAP_POINTER after a GOMP_MAP_TO_PSET.  */
+      assert (last_pos > first_pos);
+    }
+  else
+    {
+      /* GOMP_MAP_ALWAYS_POINTER can only appear directly after some other
+	 mapping.  */
+      if (pos + 1 < mapnum
+	  && (kinds[pos + 1] & 0xff) == GOMP_MAP_ALWAYS_POINTER)
+	return pos + 1;
+
+      /* We can have one or several GOMP_MAP_POINTER mappings after a to/from
+	 (etc.) mapping.  */
+      while (pos + 1 < mapnum && (kinds[pos + 1] & 0xff) == GOMP_MAP_POINTER)
+	last_pos = ++pos;
+    }
 
-  return 0;
+  return last_pos;
 }
 
 /* Handle the mapping pair that are presented when a
@@ -580,6 +596,108 @@ GOACC_data_end (void)
     }
 }
 
+/* Map variables for OpenACC "enter data".  We can't just call
+   gomp_map_vars_async once, because individual mapped variables might have
+   "exit data" called for them at different times.  */
+
+static void
+goacc_enter_data_internal (struct gomp_device_descr *acc_dev, size_t mapnum,
+			   void **hostaddrs, size_t *sizes,
+			   unsigned short *kinds, goacc_aq aq)
+{
+  for (size_t i = 0; i < mapnum; i++)
+    {
+      int group_last = find_group_last (i, mapnum, kinds);
+
+      gomp_map_vars_async (acc_dev, aq,
+			   (group_last - i) + 1,
+			   &hostaddrs[i], NULL,
+			   &sizes[i], &kinds[i], true,
+			   GOMP_MAP_VARS_OPENACC_ENTER_DATA);
+
+      i = group_last;
+    }
+}
+
+/* Unmap variables for OpenACC "exit data", with optional finalization
+   (affecting all mappings in this operation).  */
+
+static void
+goacc_exit_data_internal (struct gomp_device_descr *acc_dev, size_t mapnum,
+			  void **hostaddrs, size_t *sizes,
+			  unsigned short *kinds, bool finalize, goacc_aq aq)
+{
+  gomp_mutex_lock (&acc_dev->lock);
+
+  for (size_t i = 0; i < mapnum; ++i)
+    {
+      unsigned char kind = kinds[i] & 0xff;
+      bool copyfrom = false;
+
+      switch (kind)
+	{
+	case GOMP_MAP_FROM:
+	case GOMP_MAP_FORCE_FROM:
+	case GOMP_MAP_ALWAYS_FROM:
+	  copyfrom = true;
+	  /* Fallthrough.  */
+
+	case GOMP_MAP_TO_PSET:
+	case GOMP_MAP_POINTER:
+	case GOMP_MAP_DELETE:
+	case GOMP_MAP_RELEASE:
+	  {
+	    struct splay_tree_key_s cur_node;
+	    cur_node.host_start = (uintptr_t) hostaddrs[i];
+	    cur_node.host_end = cur_node.host_start
+				+ (kind == GOMP_MAP_POINTER
+				   ? sizeof (void *) : sizes[i]);
+	    splay_tree_key n
+	      = splay_tree_lookup (&acc_dev->mem_map, &cur_node);
+
+	    if (n == NULL)
+	      continue;
+
+	    if (n->refcount == REFCOUNT_INFINITY)
+	      {
+		n->refcount = 1;
+		n->virtual_refcount = 0;
+	      }
+
+	    if (finalize)
+	      {
+		n->refcount -= n->virtual_refcount;
+		n->virtual_refcount = 0;
+	      }
+
+	    if (n->virtual_refcount > 0)
+	      {
+		n->refcount--;
+		n->virtual_refcount--;
+	      }
+	    else if (n->refcount > 0)
+	      n->refcount--;
+
+	    if (copyfrom)
+	      gomp_copy_dev2host (acc_dev, aq, (void *) cur_node.host_start,
+				  (void *) (n->tgt->tgt_start + n->tgt_offset
+					    + cur_node.host_start
+					    - n->host_start),
+				  cur_node.host_end - cur_node.host_start);
+
+	    if (n->refcount == 0)
+	      gomp_remove_var_async (acc_dev, n, aq);
+	  }
+	  break;
+	default:
+	  gomp_fatal (">>>> goacc_exit_data_internal UNHANDLED kind 0x%.2x",
+			  kind);
+	}
+    }
+
+  gomp_mutex_unlock (&acc_dev->lock);
+}
+
 void
 GOACC_enter_exit_data (int flags_m, size_t mapnum,
 		       void **hostaddrs, size_t *sizes, unsigned short *kinds,
@@ -707,98 +825,13 @@ GOACC_enter_exit_data (int flags_m, size_t mapnum,
       va_end (ap);
     }
 
-  /* In c, non-pointers and arrays are represented by a single data clause.
-     Dynamically allocated arrays and subarrays are represented by a data
-     clause followed by an internal GOMP_MAP_POINTER.
-
-     In fortran, scalars and not allocated arrays are represented by a
-     single data clause. Allocated arrays and subarrays have three mappings:
-     1) the original data clause, 2) a PSET 3) a pointer to the array data.
-  */
+  goacc_aq aq = get_goacc_asyncqueue (async);
 
   if (data_enter)
-    {
-      for (i = 0; i < mapnum; i++)
-	{
-	  unsigned char kind = kinds[i] & 0xff;
-
-	  /* Scan for pointers and PSETs.  */
-	  int pointer = find_pointer (i, mapnum, kinds);
-
-	  if (!pointer)
-	    {
-	      switch (kind)
-		{
-		case GOMP_MAP_ALLOC:
-		case GOMP_MAP_FORCE_ALLOC:
-		  acc_create_async (hostaddrs[i], sizes[i], async);
-		  break;
-		case GOMP_MAP_TO:
-		case GOMP_MAP_FORCE_TO:
-		  acc_copyin_async (hostaddrs[i], sizes[i], async);
-		  break;
-		default:
-		  gomp_fatal (">>>> GOACC_enter_exit_data UNHANDLED kind 0x%.2x",
-			      kind);
-		  break;
-		}
-	    }
-	  else
-	    {
-	      gomp_acc_insert_pointer (pointer, &hostaddrs[i],
-				       &sizes[i], &kinds[i], async);
-	      /* Increment 'i' by two because OpenACC requires fortran
-		 arrays to be contiguous, so each PSET is associated with
-		 one of MAP_FORCE_ALLOC/MAP_FORCE_PRESET/MAP_FORCE_TO, and
-		 one MAP_POINTER.  */
-	      i += pointer - 1;
-	    }
-	}
-    }
+    goacc_enter_data_internal (acc_dev, mapnum, hostaddrs, sizes, kinds, aq);
   else
-    for (i = 0; i < mapnum; ++i)
-      {
-	unsigned char kind = kinds[i] & 0xff;
-
-	int pointer = find_pointer (i, mapnum, kinds);
-
-	if (!pointer)
-	  {
-	    switch (kind)
-	      {
-	      case GOMP_MAP_RELEASE:
-	      case GOMP_MAP_DELETE:
-		if (acc_is_present (hostaddrs[i], sizes[i]))
-		  {
-		    if (finalize)
-		      acc_delete_finalize_async (hostaddrs[i], sizes[i], async);
-		    else
-		      acc_delete_async (hostaddrs[i], sizes[i], async);
-		  }
-		break;
-	      case GOMP_MAP_FROM:
-	      case GOMP_MAP_FORCE_FROM:
-		if (finalize)
-		  acc_copyout_finalize_async (hostaddrs[i], sizes[i], async);
-		else
-		  acc_copyout_async (hostaddrs[i], sizes[i], async);
-		break;
-	      default:
-		gomp_fatal (">>>> GOACC_enter_exit_data UNHANDLED kind 0x%.2x",
-			    kind);
-		break;
-	      }
-	  }
-	else
-	  {
-	    bool copyfrom = (kind == GOMP_MAP_FORCE_FROM
-			     || kind == GOMP_MAP_FROM);
-	    gomp_acc_remove_pointer (hostaddrs[i], sizes[i], copyfrom, async,
-				     finalize, pointer);
-	    /* See the above comment.  */
-	    i += pointer - 1;
-	  }
-      }
+    goacc_exit_data_internal (acc_dev, mapnum, hostaddrs, sizes, kinds,
+			      finalize, aq);
 
  out_prof:
   if (profiling_p)
diff --git a/libgomp/target.c b/libgomp/target.c
index 84d6daa76ca..552820d874f 100644
--- a/libgomp/target.c
+++ b/libgomp/target.c
@@ -536,8 +536,10 @@ gomp_map_vars_internal (struct gomp_device_descr *devicep,
   struct target_mem_desc *tgt
     = gomp_malloc (sizeof (*tgt) + sizeof (tgt->list[0]) * mapnum);
   tgt->list_count = mapnum;
-  tgt->refcount = pragma_kind == GOMP_MAP_VARS_ENTER_DATA ? 0 : 1;
+  tgt->refcount = (pragma_kind == GOMP_MAP_VARS_ENTER_DATA
+		   || pragma_kind == GOMP_MAP_VARS_OPENACC_ENTER_DATA) ? 0 : 1;
   tgt->device_descr = devicep;
+  tgt->prev = NULL;
   struct gomp_coalesce_buf cbuf, *cbufp = NULL;
 
   if (mapnum == 0)
@@ -937,7 +939,7 @@ gomp_map_vars_internal (struct gomp_device_descr *devicep,
 		tgt->list[i].offset = 0;
 		tgt->list[i].length = k->host_end - k->host_start;
 		k->refcount = 1;
-		k->dynamic_refcount = 0;
+		k->virtual_refcount = 0;
 		tgt->refcount++;
 		array->left = NULL;
 		array->right = NULL;
@@ -1075,8 +1077,20 @@ gomp_map_vars_internal (struct gomp_device_descr *devicep,
   /* If the variable from "omp target enter data" map-list was already mapped,
      tgt is not needed.  Otherwise tgt will be freed by gomp_unmap_vars or
      gomp_exit_data.  */
-  if (pragma_kind == GOMP_MAP_VARS_ENTER_DATA && tgt->refcount == 0)
-    {
+  if ((pragma_kind == GOMP_MAP_VARS_ENTER_DATA
+       || pragma_kind == GOMP_MAP_VARS_OPENACC_ENTER_DATA)
+      && tgt->refcount == 0)
+    {
+      /* If we're about to discard a target_mem_desc with no "structural"
+	 references (tgt->refcount == 0), any splay keys linked in the tgt's
+	 list must have their virtual refcount incremented to represent that
+	 "lost" reference in order to implement the semantics of the OpenACC
+	 "present increment" operation properly.  */
+      if (pragma_kind == GOMP_MAP_VARS_OPENACC_ENTER_DATA)
+	for (i = 0; i < tgt->list_count; i++)
+	  if (tgt->list[i].key)
+	    tgt->list[i].key->virtual_refcount++;
+
       free (tgt);
       tgt = NULL;
     }
@@ -1116,32 +1130,63 @@ gomp_unmap_tgt (struct target_mem_desc *tgt)
   free (tgt);
 }
 
-attribute_hidden bool
-gomp_remove_var (struct gomp_device_descr *devicep, splay_tree_key k)
+static bool
+gomp_unref_tgt (void *ptr)
 {
   bool is_tgt_unmapped = false;
-  splay_tree_remove (&devicep->mem_map, k);
-  if (k->link_key)
-    splay_tree_insert (&devicep->mem_map, (splay_tree_node) k->link_key);
-  if (k->tgt->refcount > 1)
-    k->tgt->refcount--;
+
+  struct target_mem_desc *tgt = (struct target_mem_desc *) ptr;
+
+  if (tgt->refcount > 1)
+    tgt->refcount--;
   else
     {
+      gomp_unmap_tgt (tgt);
       is_tgt_unmapped = true;
-      gomp_unmap_tgt (k->tgt);
     }
+
   return is_tgt_unmapped;
 }
 
 static void
-gomp_unref_tgt (void *ptr)
+gomp_unref_tgt_void (void *ptr)
 {
-  struct target_mem_desc *tgt = (struct target_mem_desc *) ptr;
+  (void) gomp_unref_tgt (ptr);
+}
 
-  if (tgt->refcount > 1)
-    tgt->refcount--;
+static inline __attribute__((always_inline)) bool
+gomp_remove_var_internal (struct gomp_device_descr *devicep, splay_tree_key k,
+			  struct goacc_asyncqueue *aq)
+{
+  bool is_tgt_unmapped = false;
+  splay_tree_remove (&devicep->mem_map, k);
+  if (k->link_key)
+    splay_tree_insert (&devicep->mem_map, (splay_tree_node) k->link_key);
+  if (aq)
+    devicep->openacc.async.queue_callback_func (aq, gomp_unref_tgt_void,
+						(void *) k->tgt);
   else
-    gomp_unmap_tgt (tgt);
+    is_tgt_unmapped = gomp_unref_tgt ((void *) k->tgt);
+  return is_tgt_unmapped;
+}
+
+attribute_hidden bool
+gomp_remove_var (struct gomp_device_descr *devicep, splay_tree_key k)
+{
+  return gomp_remove_var_internal (devicep, k, NULL);
+}
+
+/* Remove a variable asynchronously.  This actually removes the variable
+   mapping immediately, but retains the linked target_mem_desc until the
+   asynchronous operation has completed (as it may still refer to target
+   memory).  The device lock must be held before entry, and remains locked on
+   exit.  */
+
+attribute_hidden void
+gomp_remove_var_async (struct gomp_device_descr *devicep, splay_tree_key k,
+		       struct goacc_asyncqueue *aq)
+{
+  (void) gomp_remove_var_internal (devicep, k, aq);
 }
 
 /* Unmap variables described by TGT.  If DO_COPYFROM is true, copy relevant
@@ -1177,7 +1222,14 @@ gomp_unmap_vars_internal (struct target_mem_desc *tgt, bool do_copyfrom,
 	continue;
 
       bool do_unmap = false;
-      if (k->refcount > 1 && k->refcount != REFCOUNT_INFINITY)
+      if (k->tgt == tgt
+	  && k->virtual_refcount > 0
+	  && k->refcount != REFCOUNT_INFINITY)
+	{
+	  k->virtual_refcount--;
+	  k->refcount--;
+	}
+      else if (k->refcount > 1 && k->refcount != REFCOUNT_INFINITY)
 	k->refcount--;
       else if (k->refcount == 1)
 	{
@@ -1197,7 +1249,7 @@ gomp_unmap_vars_internal (struct target_mem_desc *tgt, bool do_copyfrom,
     }
 
   if (aq)
-    devicep->openacc.async.queue_callback_func (aq, gomp_unref_tgt,
+    devicep->openacc.async.queue_callback_func (aq, gomp_unref_tgt_void,
 						(void *) tgt);
   else
     gomp_unref_tgt ((void *) tgt);
@@ -1334,6 +1386,7 @@ gomp_load_image_to_device (struct gomp_device_descr *devicep, unsigned version,
       k->tgt = tgt;
       k->tgt_offset = target_table[i].start;
       k->refcount = REFCOUNT_INFINITY;
+      k->virtual_refcount = 0;
       k->link_key = NULL;
       array->left = NULL;
       array->right = NULL;
@@ -1366,6 +1419,7 @@ gomp_load_image_to_device (struct gomp_device_descr *devicep, unsigned version,
       k->tgt = tgt;
       k->tgt_offset = target_var->start;
       k->refcount = target_size & link_bit ? REFCOUNT_LINK : REFCOUNT_INFINITY;
+      k->virtual_refcount = 0;
       k->link_key = NULL;
       array->left = NULL;
       array->right = NULL;
@@ -1600,22 +1654,6 @@ gomp_unload_device (struct gomp_device_descr *devicep)
     }
 }
 
-/* Free address mapping tables.  MM must be locked on entry, and remains locked
-   on return.  */
-
-attribute_hidden void
-gomp_free_memmap (struct splay_tree_s *mem_map)
-{
-  while (mem_map->root)
-    {
-      struct target_mem_desc *tgt = mem_map->root->key.tgt;
-
-      splay_tree_remove (mem_map, &mem_map->root->key);
-      free (tgt->array);
-      free (tgt);
-    }
-}
-
 /* Host fallback for GOMP_target{,_ext} routines.  */
 
 static void
@@ -2636,6 +2674,8 @@ omp_target_associate_ptr (const void *host_ptr, const void *device_ptr,
       k->tgt = tgt;
       k->tgt_offset = (uintptr_t) device_ptr + device_offset;
       k->refcount = REFCOUNT_INFINITY;
+      k->virtual_refcount = 0;
+      k->link_key = NULL;
       array->left = NULL;
       array->right = NULL;
       splay_tree_insert (&devicep->mem_map, array);
@@ -2906,7 +2946,6 @@ gomp_target_init (void)
 		current_device.type = current_device.get_type_func ();
 		current_device.mem_map.root = NULL;
 		current_device.state = GOMP_DEVICE_UNINITIALIZED;
-		current_device.openacc.data_environ = NULL;
 		for (i = 0; i < new_num_devices; i++)
 		  {
 		    current_device.target_id = i;

Reply via email to