Hi! This is a new version of the patch which hopefully addresses all review comments. Further commentary below.
On Mon, 21 Oct 2019 16:14:11 +0200 Thomas Schwinge <tho...@codesourcery.com> wrote: > On 2019-10-03T09:35:04-0700, Julian Brown <jul...@codesourcery.com> > wrote: > > This patch has been broken out of the patch supporting OpenACC 2.6 > > manual deep copy last posted here: > > > > https://gcc.gnu.org/ml/gcc-patches/2018-12/msg01084.html > > Thanks. > > Remeber to look into <https://gcc.gnu.org/PR92116> "Potential null > pointer dereference in 'gomp_acc_remove_pointer'", which may be > relevant here. I've deleted the whole function (see below) so nothing to do there, I don't think, even if that code had still been live in the last version of the patch. > I see you've merged in the relevant parts of my incremental patch > '[WIP] OpenACC 2.6 manual deep copy support (attach/detach): adjust > for "goacc_async_unmap_tgt" removal', that I included in > <http://mid.mail-archive.com/yxfpftuqpakv.fsf@hertz.schwinge.homeip.net>, > which tells me that I supposedly understood that part alright. ;-D Yes I think so -- I'll add you as co-author to the ChangeLog. Apologies for the omission! > > As part of developing that patch, libgomp's OpenACC reference > > counting implementation proved to be somewhat inconsistent, > > especially when used in combination with the deep copy support > > which exercises it more thoroughly. > > > > So, this patch contains just the changes to reference-counting > > behaviour, for ease of (re-)review. The other parts of OpenACC 2.6 > > manual deep copy support are forthcoming, but some changes in this > > patch anticipate that support. > > As we're discussing these separately, please for now remove the > changes related to the 'VREFCOUNT_LINK_KEY' toggle flag, and moving > 'link_key' into an union (to later be shared with 'attach_count'); > <87pniuuhkj.fsf@euler.schwinge.homeip.net">http://mid.mail-archive.com/87pniuuhkj.fsf@euler.schwinge.homeip.net>. Done (I have a plan for the link_key/attach_count fields, but it's not in this patch, and I'm not sure how well it'll work out yet). > > Tested with offloading to NVPTX, with good results (though a couple > > of tests need fixing also). > > The testsuite changes we're discussing separately, and need to go in > before this one, obviously. Those tests no longer regress, so no testsuite changes are strictly necessary for this patch. > > OK for trunk? > > I haven't understood all the changes related to replacing > 'dynamic_refcount' with 'virtual_refcount', getting rid of > 'data_environ', the 'lookup_dev' rework, but I trust you got that > right. In particular, these seem to remove special-case OpenACC code > in favor of generic OMP code, which is good. Yep -- the previous email you dug up included the following rationale: - reference counts in the linked memory-mapping splay tree structure can be self-checked for consistency using optional (i.e. development-only) code. This survives a libgomp test run (with offloading to nvptx), so I'm reasonably confident it's good. - the "data_environ" field in the device descriptor -- a linear linked list containing a target memory descriptor for each "acc enter data" mapping -- has been removed. This brings OpenACC closer to the OpenMP implementation for non-lexically-scoped data mapping (GOMP_target_enter_exit_data), and is potentially a performance win if lots of data is mapped in this way. - the semantics of the "dynamic_refcount" field in the splay_tree_key structure have shifted slightly, so I've renamed the field. It now represents references that are excess to those represented by actual pointers in the linked splay tree/target-memory descriptor structure. That might have been the intention before in fact, but the implementation was inconsistent. The big thing here is the auto-checking of refcounting behaviour. There were quite a few corner cases that were broken before. > A few more comments: > > > --- a/libgomp/libgomp.h > > +++ b/libgomp/libgomp.h > > > typedef struct acc_dispatch_t > > { > > - /* This is a linked list of data mapped using the > > - acc_map_data/acc_unmap_data or "acc enter data"/"acc exit > > data" pragmas. > > - Unlike mapped_data in the goacc_thread struct, unmapping can > > - happen out-of-order with respect to mapping. */ > > - /* This is guarded by the lock in the "outer" struct > > gomp_device_descr. */ > > - struct target_mem_desc *data_environ; > > As mentioned before, please also accordingly update the comment > attached to 'acc_dispatch_t openacc' in 'struct gomp_device_descr'. Done. > That code: > > > -/* Free address mapping tables. MM must be locked on entry, and > > remains locked > > - on return. */ > > - > > -attribute_hidden void > > -gomp_free_memmap (struct splay_tree_s *mem_map) > > -{ > > - while (mem_map->root) > > - { > > - struct target_mem_desc *tgt = mem_map->root->key.tgt; > > - > > - splay_tree_remove (mem_map, &mem_map->root->key); > > - free (tgt->array); > > - free (tgt); > > - } > > -} > > ... kind-of gets inlined here: > > > --- a/libgomp/oacc-init.c > > +++ b/libgomp/oacc-init.c > > @@ -356,9 +356,13 @@ acc_shutdown_1 (acc_device_t d) > > > > if (walk->dev) > > { > > - gomp_mutex_lock (&walk->dev->lock); > > - gomp_free_memmap (&walk->dev->mem_map); > > - gomp_mutex_unlock (&walk->dev->lock); > > + while (walk->dev->mem_map.root) > > + { > > + splay_tree_key k = &walk->dev->mem_map.root->key; > > + gomp_remove_var (walk->dev, k); > > + } > > > > walk->dev = NULL; > > walk->base_dev = NULL; > > It's not obvious to me why it's OK to remove the locking? Don't all > operations on the 'mem_map' have to have the device locked? You're probably right about this -- good catch. Although if the user is shutting down the device whilst it is still active (from some other thread?) it's just a case of how ugly their crash is going to be either way, I suspect! > Does that code now still have the previous (and expected?) "finalize" > semantics (don't consider 'refcount', always unmap)? (Should we > assert here that 'gomp_remove_var' always returns 'true'? And/or, if > it doesn't, what does that mean then?) Or am I confused? ;-) Yeah. The splay tree keys are removed one at a time (without paying attention to the refcounts for those), and the linked target_mem_descs are freed when their refcounts drop to zero. Hence is_tgt_unmapped won't always be true -- only when one of the linked target_mem_descs gets freed. > > --- a/libgomp/oacc-mem.c > > +++ b/libgomp/oacc-mem.c > > > @@ -427,6 +418,7 @@ acc_unmap_data (void *h) > > { > > struct goacc_thread *thr = goacc_thread (); > > struct gomp_device_descr *acc_dev = thr->dev; > > + struct splay_tree_key_s cur_node; > > I know it's often not the case in existing code, but when adding new > code, please move definitions next to their first use. Done. > > @@ -438,12 +430,11 @@ acc_unmap_data (void *h) > > acc_api_info api_info; > > bool profiling_p = GOACC_PROFILING_SETUP_P (thr, &prof_info, > > &api_info); > > gomp_mutex_lock (&acc_dev->lock); > > > > - splay_tree_key n = lookup_host (acc_dev, h, 1); > > - struct target_mem_desc *t; > > + cur_node.host_start = (uintptr_t) h; > > + cur_node.host_end = cur_node.host_start + 1; > > + splay_tree_key n = splay_tree_lookup (&acc_dev->mem_map, > > &cur_node); > > if (!n) > > { > > Isn't this just inlining 'lookup_host'? There may be a good reason > to do that, but what is it? Yeah, looks like it. I changed the code to use lookup_host. > > @@ -451,47 +442,28 @@ acc_unmap_data (void *h) > > > - /* Mark for removal. */ > > - n->refcount = 1; > > + splay_tree_remove (&acc_dev->mem_map, n); > > > > - t = n->tgt; > > + struct target_mem_desc *tgt = n->tgt; > > > > - if (t->refcount == 2) > > + if (tgt->refcount > 0) > > + tgt->refcount--; > > + else > > { > > -[...] > > + free (tgt->array); > > + free (tgt); > > } > > Shouldn't that be 'if (tgt->refcount > 1)' (instead of '> 0'), like in > 'gomp_unref_tgt' -- or actually use that function? I think you're right about the condition -- well spotted! We can't use gomp_unref_tgt here because acc_unmap_data isn't supposed to free the device memory. > > > > gomp_mutex_unlock (&acc_dev->lock); > > > > - gomp_unmap_vars (t, true); > > - > > if (profiling_p) > > { > > thr->prof_info = NULL; > > Hmm, I don't understand the changes leading to this, but again, I > shall trust that you've got that right. > > Or, was that a bug in the existing code, and we don't have proper test > coverage? I think that was a bug in the original code. > > @@ -577,17 +551,14 @@ present_create_copy (unsigned f, void *h, > > size_t s, int async) > > > - d = tgt->to_free; > > > + n = lookup_host (acc_dev, h, s); > > + assert (n != NULL); > > + d = (void *) (n->tgt->tgt_start + n->tgt_offset + > > (uintptr_t) h > > + - n->host_start); > > | return d; > > Again, it's not obvious to me how that is semantically equivalent to > what we've returned before? This is a bug fix (it's mentioned in the ChangeLog). > > void > > -gomp_acc_remove_pointer (void *h, size_t s, bool force_copyfrom, > > int async, > > - int finalize, int mapnum) > > +gomp_acc_remove_pointer (struct gomp_device_descr *acc_dev, void > > **hostaddrs, > > + size_t *sizes, unsigned short *kinds, int > > async, > > + bool finalize, int mapnum) > > { > > > + switch (kind) > > + { > > + case GOMP_MAP_FROM: > > + case GOMP_MAP_FORCE_FROM: > > + case GOMP_MAP_ALWAYS_FROM: > > + copyfrom = true; > > + /* Fallthrough. */ > > + > > + case GOMP_MAP_TO_PSET: > > + case GOMP_MAP_POINTER: > > + case GOMP_MAP_DELETE: > > + case GOMP_MAP_RELEASE: > > [...] > > + default: > > + gomp_mutex_unlock (&acc_dev->lock); > > + gomp_fatal ("gomp_acc_remove_pointer unhandled kind > > 0x%.2x", > > + kind); > > Thanks for being explicit about the expected mapping kinds, etc. That code's all gone with this version... > > - /* If running synchronously, unmap immediately. */ > > - if (async < acc_async_noval) > > - gomp_unmap_vars (t, true); > > - else > > - { > > - goacc_aq aq = get_goacc_asyncqueue (async); > > - gomp_unmap_vars_async (t, true, aq); > > As mentioned before, 'gomp_acc_remove_pointer' now "has an unused > 'async' formal parameter. Is that meant to be resolved to an > asyncqueue, and pass that one to 'gomp_copy_dev2host', and call > 'gomp_remove_var_async' instead of 'gomp_remove_var'"? That's here: Hmm yeah, that's all gone however. > > + if (copyfrom) > > + gomp_copy_dev2host (acc_dev, NULL, (void *) > > cur_node.host_start, > > + (void *) (n->tgt->tgt_start + > > n->tgt_offset > > + + cur_node.host_start > > + - n->host_start), > > + cur_node.host_end - > > cur_node.host_start); + > > + if (n->refcount == 0) > > + gomp_remove_var (acc_dev, n); > > + break; > > > --- a/libgomp/oacc-parallel.c > > +++ b/libgomp/oacc-parallel.c > > @@ -56,12 +56,29 @@ find_pointer (int pos, size_t mapnum, unsigned > > short *kinds) > > I've always been confused by this function (before known as > 'find_pset'); this feels wrong, but I've never gotten to the bottom > of it. This version removes that function in favour of a function that finds groups of consecutive mappings that should be kept together for a single gomp_map_vars invocation. I think that fits better with my findings as written up on the wiki page https://gcc.gnu.org/wiki/LibgompPointerMappingKinds. > I'll trust that your changes there can only improve the current > situation, not worsen it. ;-) > > And, again, thanks for being explicit about the expected mapping > kinds, etc. > > > @@ -745,8 +762,14 @@ GOACC_enter_exit_data (int flags_m, size_t > > mapnum, } > > else > > { > > - gomp_acc_insert_pointer (pointer, &hostaddrs[i], > > - &sizes[i], &kinds[i], > > async); > > + goacc_aq aq = get_goacc_asyncqueue (async); > > + for (int j = 0; j < 2; j++) > > Should this magic constant '2' be derived from 'pointer' or some such? > > > + gomp_map_vars_async (acc_dev, aq, > > + (j == 0 || pointer == 2) ? > > 1 : 2, > > + &hostaddrs[i + j], NULL, > > + &sizes[i + j], &kinds[i + j], > > true, > > + > > GOMP_MAP_VARS_OPENACC_ENTER_DATA); > > ;-) Yuck. As requested before: "Can we get a comment added to such > 'magic', please?" That magic is gone now. > I just wish that eventually we'll be able to can get rid of that > stuff, and just let 'gomp_map_vars' do its thing. Similar to > <https://gcc.gnu.org/PR90596> "'GOACC_parallel_keyed' should use > 'GOMP_MAP_VARS_TARGET'". > > (For avoidance of doubt, that's not your task right now.) Does this version look better? I've removed the special-case handling of pointers in the enter/exit data code, and combined the gomp_acc_remove_pointer code (which now iterated over mappings one-at-a-time anyway) with the loop iterating over mappings in the new goacc_exit_data_internal function. It was a bit nonsensical to have the "exit data" code split over two files, as before. > > --- a/libgomp/target.c > > +++ b/libgomp/target.c > > @@ -536,7 +536,8 @@ gomp_map_vars_internal (struct > > gomp_device_descr *devicep, struct target_mem_desc *tgt > > = gomp_malloc (sizeof (*tgt) + sizeof (tgt->list[0]) * mapnum); > > tgt->list_count = mapnum; > > - tgt->refcount = pragma_kind == GOMP_MAP_VARS_ENTER_DATA ? 0 : 1; > > + tgt->refcount = (pragma_kind == GOMP_MAP_VARS_ENTER_DATA > > + || pragma_kind == > > GOMP_MAP_VARS_OPENACC_ENTER_DATA) ? 0 : 1; tgt->device_descr = > > devicep; struct gomp_coalesce_buf cbuf, *cbufp = NULL; > > > > @@ -1051,8 +1053,20 @@ gomp_map_vars_internal (struct > > gomp_device_descr *devicep, /* If the variable from "omp target > > enter data" map-list was already mapped, tgt is not needed. > > Otherwise tgt will be freed by gomp_unmap_vars or gomp_exit_data. > > */ > > - if (pragma_kind == GOMP_MAP_VARS_ENTER_DATA && tgt->refcount == > > 0) > > - { > > + if ((pragma_kind == GOMP_MAP_VARS_ENTER_DATA > > + || pragma_kind == GOMP_MAP_VARS_OPENACC_ENTER_DATA) > > + && tgt->refcount == 0) > > + { > > + /* If we're about to discard a target_mem_desc with no > > "structural" > > + references (tgt->refcount == 0), any splay keys linked in > > the tgt's > > + list must have their virtual refcount incremented to > > represent that > > + "lost" reference in order to implement the semantics of > > the OpenACC > > + "present increment" operation properly. */ > > + if (pragma_kind == GOMP_MAP_VARS_OPENACC_ENTER_DATA) > > + for (i = 0; i < tgt->list_count; i++) > > + if (tgt->list[i].key) > > + tgt->list[i].key->virtual_refcount++; > > + > > free (tgt); > > tgt = NULL; > > } > > So that last item is the only difference between > 'GOMP_MAP_VARS_ENTER_DATA' and 'GOMP_MAP_VARS_OPENACC_ENTER_DATA'. > Again I have not digested that one, but will trust you. Yeah, because of the OpenACC reference counting & finalize semantics, which I don't think are applicable to OpenMP. > > @@ -1310,7 +1366,7 @@ gomp_load_image_to_device (struct > > gomp_device_descr *devicep, unsigned version, k->tgt = tgt; > > k->tgt_offset = target_table[i].start; > > k->refcount = REFCOUNT_INFINITY; > > - k->link_key = NULL; > > + k->virtual_refcount = 0; > > array->left = NULL; > > array->right = NULL; > > splay_tree_insert (&devicep->mem_map, array); > > @@ -1342,7 +1398,7 @@ gomp_load_image_to_device (struct > > gomp_device_descr *devicep, unsigned version, k->tgt = tgt; > > k->tgt_offset = target_var->start; > > k->refcount = target_size & link_bit ? REFCOUNT_LINK : > > REFCOUNT_INFINITY; > > - k->link_key = NULL; > > + k->virtual_refcount = 0; > > array->left = NULL; > > array->right = NULL; > > splay_tree_insert (&devicep->mem_map, array); > > Why no longer initialize 'link_key' here? > > I'd expect that always all fields of 'k' ('struct splay_tree_key_s') > get initialized, so like: > > > @@ -2612,6 +2652,8 @@ omp_target_associate_ptr (const void > > *host_ptr, const void *device_ptr, k->tgt = tgt; > > k->tgt_offset = (uintptr_t) device_ptr + device_offset; > > k->refcount = REFCOUNT_INFINITY; > > + k->virtual_refcount = 0; > > + k->u.link_key = NULL; > > array->left = NULL; > > array->right = NULL; > > splay_tree_insert (&devicep->mem_map, array); > > (I haven't verified whether that's always done, please verify.) This version (without the link_key union, etc.) should avoid those problems. I've added some missing initialisations, too. Re-tested with offloading to nvptx. OK for trunk? Thanks, Julian
commit 5edb6e32d9afb5e9d2e0b72e8a311ce4d3865a34 Author: Julian Brown <jul...@codesourcery.com> Date: Mon Nov 5 15:51:46 2018 -0800 OpenACC reference count overhaul 2019-10-28 Julian Brown <jul...@codesourcery.com> Thomas Schwinge <tho...@codesourcery.com> libgomp/ * libgomp.h (struct splay_tree_key_s): Substitute dynamic_refcount field for virtual_refcount. (struct acc_dispatch_t): Remove data_environ field. (struct gomp_device_descr): Update comment on openacc field. (enum gomp_map_vars_kind): Add GOMP_MAP_VARS_OPENACC_ENTER_DATA. (gomp_acc_insert_pointer, gomp_acc_remove_pointer, gomp_free_memmap): Remove prototypes. (gomp_remove_var_async): Add prototype. * oacc-host.c (host_dispatch): Don't initialise removed data_environ field. * oacc-init.c (acc_shutdown_1): Iteratively call gomp_remove_var instead of calling gomp_free_memmap. * oacc-mem.c (lookup_dev_1): New function. (lookup_dev): Reimplement using above. (acc_free, acc_hostptr): Update calls to lookup_dev. (acc_map_data): Likewise. Don't add to data_environ list. (acc_unmap_data): Remove call to gomp_unmap_vars. Fix semantics to remove mapping, but not mapped data. (present_create_copy): Use virtual_refcount instead of dynamic_refcount. Don't manipulate data_environ. Fix target pointer return value. (delete_copyout): Update for virtual_refcount semantics. Use goacc_remove_var_async for asynchronous delete/copyouts. (gomp_acc_insert_pointer, gomp_acc_remove_pointer): Remove functions. * oacc-parallel.c (find_pointer): Remove function. (find_group_last, goacc_enter_data_internal, goacc_exit_data_internal): New functions. (GOACC_enter_exit_data): Use goacc_enter_data_internal and goacc_exit_data_internal helper functions. * target.c (gomp_map_vars_internal): Handle GOMP_MAP_VARS_OPENACC_ENTER_DATA. Update for virtual_refcount semantics. (gomp_remove_var): Reimplement in terms of... (gomp_remove_var_internal): ...this new helper function. (gomp_remove_var_async): Implement using above helper function. (gomp_unmap_vars_internal): Update for virtual_refcount semantics. (gomp_load_image_to_device): Zero-initialise virtual_refcount field. (gomp_free_memmap): Remove function. (omp_target_associate_ptr): Zero-initialise virtual_refcount and link_key splay tree key fields. (gomp_target_init): Don't initialise removed data_environ field. diff --git a/libgomp/libgomp.h b/libgomp/libgomp.h index 178eb600ccd..8f01e9db9c5 100644 --- a/libgomp/libgomp.h +++ b/libgomp/libgomp.h @@ -919,8 +919,11 @@ struct splay_tree_key_s { uintptr_t tgt_offset; /* Reference count. */ uintptr_t refcount; - /* Dynamic reference count. */ - uintptr_t dynamic_refcount; + /* Reference counts beyond those that represent genuine references in the + linked splay tree key/target memory structures, e.g. for multiple OpenACC + "present increment" operations (via "acc enter data") referring to the same + host-memory block. */ + uintptr_t virtual_refcount; /* Pointer to the original mapping of "omp declare target link" object. */ splay_tree_key link_key; }; @@ -944,13 +947,6 @@ splay_compare (splay_tree_key x, splay_tree_key y) typedef struct acc_dispatch_t { - /* This is a linked list of data mapped using the - acc_map_data/acc_unmap_data or "acc enter data"/"acc exit data" pragmas. - Unlike mapped_data in the goacc_thread struct, unmapping can - happen out-of-order with respect to mapping. */ - /* This is guarded by the lock in the "outer" struct gomp_device_descr. */ - struct target_mem_desc *data_environ; - /* Execute. */ __typeof (GOMP_OFFLOAD_openacc_exec) *exec_func; @@ -1051,8 +1047,7 @@ struct gomp_device_descr enum gomp_device_state state; /* OpenACC-specific data and functions. */ - /* This is mutable because of its mutable data_environ and target_data - members. */ + /* This is mutable because of the target_data member. */ acc_dispatch_t openacc; }; @@ -1060,13 +1055,12 @@ struct gomp_device_descr enum gomp_map_vars_kind { GOMP_MAP_VARS_OPENACC, + GOMP_MAP_VARS_OPENACC_ENTER_DATA, GOMP_MAP_VARS_TARGET, GOMP_MAP_VARS_DATA, GOMP_MAP_VARS_ENTER_DATA }; -extern void gomp_acc_insert_pointer (size_t, void **, size_t *, void *, int); -extern void gomp_acc_remove_pointer (void *, size_t, bool, int, int, int); extern void gomp_acc_declare_allocate (bool, size_t, void **, size_t *, unsigned short *); struct gomp_coalesce_buf; @@ -1092,9 +1086,10 @@ extern void gomp_unmap_vars_async (struct target_mem_desc *, bool, struct goacc_asyncqueue *); extern void gomp_init_device (struct gomp_device_descr *); extern bool gomp_fini_device (struct gomp_device_descr *); -extern void gomp_free_memmap (struct splay_tree_s *); extern void gomp_unload_device (struct gomp_device_descr *); extern bool gomp_remove_var (struct gomp_device_descr *, splay_tree_key); +extern void gomp_remove_var_async (struct gomp_device_descr *, splay_tree_key, + struct goacc_asyncqueue *); /* work.c */ diff --git a/libgomp/oacc-host.c b/libgomp/oacc-host.c index 12299aee65d..1b9adcec774 100644 --- a/libgomp/oacc-host.c +++ b/libgomp/oacc-host.c @@ -264,8 +264,6 @@ static struct gomp_device_descr host_dispatch = .state = GOMP_DEVICE_UNINITIALIZED, .openacc = { - .data_environ = NULL, - .exec_func = host_openacc_exec, .create_thread_data_func = host_openacc_create_thread_data, diff --git a/libgomp/oacc-init.c b/libgomp/oacc-init.c index e1568c535b3..8465ed503a3 100644 --- a/libgomp/oacc-init.c +++ b/libgomp/oacc-init.c @@ -357,7 +357,14 @@ acc_shutdown_1 (acc_device_t d) if (walk->dev) { gomp_mutex_lock (&walk->dev->lock); - gomp_free_memmap (&walk->dev->mem_map); + + while (walk->dev->mem_map.root) + { + splay_tree_key k = &walk->dev->mem_map.root->key; + k->link_key = NULL; + gomp_remove_var (walk->dev, k); + } + gomp_mutex_unlock (&walk->dev->lock); walk->dev = NULL; diff --git a/libgomp/oacc-mem.c b/libgomp/oacc-mem.c index 2f271009fb8..c6876fdff4f 100644 --- a/libgomp/oacc-mem.c +++ b/libgomp/oacc-mem.c @@ -50,42 +50,39 @@ lookup_host (struct gomp_device_descr *dev, void *h, size_t s) return key; } -/* Return block containing [D->S), or NULL if not contained. - The list isn't ordered by device address, so we have to iterate - over the whole array. This is not expected to be a common - operation. The device lock associated with TGT must be locked on entry, and - remains locked on exit. */ +/* Helper for lookup_dev. Iterate over splay tree. */ static splay_tree_key -lookup_dev (struct target_mem_desc *tgt, void *d, size_t s) +lookup_dev_1 (splay_tree_node node, uintptr_t d, size_t s) { - int i; - struct target_mem_desc *t; + splay_tree_key k = &node->key, found = NULL; + struct target_mem_desc *t = k->tgt; - if (!tgt) - return NULL; + if (d >= t->tgt_start && d + s <= t->tgt_end) + return k; - for (t = tgt; t != NULL; t = t->prev) - { - if (t->tgt_start <= (uintptr_t) d && t->tgt_end >= (uintptr_t) d + s) - break; - } + if (node->left) + found = lookup_dev_1 (node->left, d, s); - if (!t) - return NULL; + if (!found && node->right) + found = lookup_dev_1 (node->right, d, s); - for (i = 0; i < t->list_count; i++) - { - void * offset; + return found; +} - splay_tree_key k = &t->array[i].key; - offset = d - t->tgt_start + k->tgt_offset; +/* Return block containing [D->S), or NULL if not contained. + The list isn't ordered by device address, so we have to iterate + over the whole array. This is not expected to be a common + operation. The device lock associated with TGT must be locked on entry, and + remains locked on exit. */ - if (k->host_start + offset <= (void *) k->host_end) - return k; - } +static splay_tree_key +lookup_dev (splay_tree mem_map, void *d, size_t s) +{ + if (!mem_map || !mem_map->root) + return NULL; - return NULL; + return lookup_dev_1 (mem_map->root, (uintptr_t) d, s); } /* OpenACC is silent on how memory exhaustion is indicated. We return @@ -150,7 +147,7 @@ acc_free (void *d) /* We don't have to call lazy open here, as the ptr value must have been returned by acc_malloc. It's not permitted to pass NULL in (unless you got that null from acc_malloc). */ - if ((k = lookup_dev (acc_dev->openacc.data_environ, d, 1))) + if ((k = lookup_dev (&acc_dev->mem_map, d, 1))) { void *offset; @@ -301,7 +298,7 @@ acc_hostptr (void *d) gomp_mutex_lock (&acc_dev->lock); - n = lookup_dev (acc_dev->openacc.data_environ, d, 1); + n = lookup_dev (&acc_dev->mem_map, d, 1); if (!n) { @@ -396,7 +393,7 @@ acc_map_data (void *h, void *d, size_t s) (int)s); } - if (lookup_dev (thr->dev->openacc.data_environ, d, s)) + if (lookup_dev (&thr->dev->mem_map, d, s)) { gomp_mutex_unlock (&acc_dev->lock); gomp_fatal ("device address [%p, +%d] is already mapped", (void *)d, @@ -415,11 +412,6 @@ acc_map_data (void *h, void *d, size_t s) thr->api_info = NULL; } } - - gomp_mutex_lock (&acc_dev->lock); - tgt->prev = acc_dev->openacc.data_environ; - acc_dev->openacc.data_environ = tgt; - gomp_mutex_unlock (&acc_dev->lock); } void @@ -438,12 +430,9 @@ acc_unmap_data (void *h) acc_api_info api_info; bool profiling_p = GOACC_PROFILING_SETUP_P (thr, &prof_info, &api_info); - size_t host_size; - gomp_mutex_lock (&acc_dev->lock); splay_tree_key n = lookup_host (acc_dev, h, 1); - struct target_mem_desc *t; if (!n) { @@ -451,47 +440,28 @@ acc_unmap_data (void *h) gomp_fatal ("%p is not a mapped block", (void *)h); } - host_size = n->host_end - n->host_start; - if (n->host_start != (uintptr_t) h) { + size_t host_size = n->host_end - n->host_start; gomp_mutex_unlock (&acc_dev->lock); gomp_fatal ("[%p,%d] surrounds %p", (void *) n->host_start, (int) host_size, (void *) h); } - /* Mark for removal. */ - n->refcount = 1; + splay_tree_remove (&acc_dev->mem_map, n); - t = n->tgt; + struct target_mem_desc *tgt = n->tgt; - if (t->refcount == 2) + if (tgt->refcount > 1) + tgt->refcount--; + else { - struct target_mem_desc *tp; - - /* This is the last reference, so pull the descriptor off the - chain. This avoids gomp_unmap_vars via gomp_unmap_tgt from - freeing the device memory. */ - t->tgt_end = 0; - t->to_free = 0; - - for (tp = NULL, t = acc_dev->openacc.data_environ; t != NULL; - tp = t, t = t->prev) - if (n->tgt == t) - { - if (tp) - tp->prev = t->prev; - else - acc_dev->openacc.data_environ = t->prev; - - break; - } + free (tgt->array); + free (tgt); } gomp_mutex_unlock (&acc_dev->lock); - gomp_unmap_vars (t, true); - if (profiling_p) { thr->prof_info = NULL; @@ -552,8 +522,9 @@ present_create_copy (unsigned f, void *h, size_t s, int async) if (n->refcount != REFCOUNT_INFINITY) { n->refcount++; - n->dynamic_refcount++; + n->virtual_refcount++; } + gomp_mutex_unlock (&acc_dev->lock); } else if (!(f & FLAG_CREATE)) @@ -563,7 +534,6 @@ present_create_copy (unsigned f, void *h, size_t s, int async) } else { - struct target_mem_desc *tgt; size_t mapnum = 1; unsigned short kinds; void *hostaddrs = h; @@ -577,17 +547,14 @@ present_create_copy (unsigned f, void *h, size_t s, int async) goacc_aq aq = get_goacc_asyncqueue (async); - tgt = gomp_map_vars_async (acc_dev, aq, mapnum, &hostaddrs, NULL, &s, - &kinds, true, GOMP_MAP_VARS_OPENACC); - /* Initialize dynamic refcount. */ - tgt->list[0].key->dynamic_refcount = 1; + gomp_map_vars_async (acc_dev, aq, mapnum, &hostaddrs, NULL, &s, &kinds, + true, GOMP_MAP_VARS_OPENACC_ENTER_DATA); gomp_mutex_lock (&acc_dev->lock); - - d = tgt->to_free; - tgt->prev = acc_dev->openacc.data_environ; - acc_dev->openacc.data_environ = tgt; - + n = lookup_host (acc_dev, h, s); + assert (n != NULL); + d = (void *) (n->tgt->tgt_start + n->tgt_offset + (uintptr_t) h + - n->host_start); gomp_mutex_unlock (&acc_dev->lock); } @@ -671,7 +638,6 @@ delete_copyout (unsigned f, void *h, size_t s, int async, const char *libfnname) { size_t host_size; splay_tree_key n; - void *d; struct goacc_thread *thr = goacc_thread (); struct gomp_device_descr *acc_dev = thr->dev; @@ -700,9 +666,6 @@ delete_copyout (unsigned f, void *h, size_t s, int async, const char *libfnname) gomp_fatal ("[%p,%d] is not mapped", (void *)h, (int)s); } - d = (void *) (n->tgt->tgt_start + n->tgt_offset - + (uintptr_t) h - n->host_start); - host_size = n->host_end - n->host_start; if (n->host_start != (uintptr_t) h || host_size != s) @@ -715,48 +678,34 @@ delete_copyout (unsigned f, void *h, size_t s, int async, const char *libfnname) if (n->refcount == REFCOUNT_INFINITY) { n->refcount = 0; - n->dynamic_refcount = 0; - } - if (n->refcount < n->dynamic_refcount) - { - gomp_mutex_unlock (&acc_dev->lock); - gomp_fatal ("Dynamic reference counting assert fail\n"); + n->virtual_refcount = 0; } if (f & FLAG_FINALIZE) { - n->refcount -= n->dynamic_refcount; - n->dynamic_refcount = 0; + n->refcount -= n->virtual_refcount; + n->virtual_refcount = 0; } - else if (n->dynamic_refcount) + + if (n->virtual_refcount > 0) { - n->dynamic_refcount--; n->refcount--; + n->virtual_refcount--; } + else if (n->refcount > 0) + n->refcount--; if (n->refcount == 0) { - if (n->tgt->refcount == 2) - { - struct target_mem_desc *tp, *t; - for (tp = NULL, t = acc_dev->openacc.data_environ; t != NULL; - tp = t, t = t->prev) - if (n->tgt == t) - { - if (tp) - tp->prev = t->prev; - else - acc_dev->openacc.data_environ = t->prev; - break; - } - } + goacc_aq aq = get_goacc_asyncqueue (async); if (f & FLAG_COPYOUT) { - goacc_aq aq = get_goacc_asyncqueue (async); + void *d = (void *) (n->tgt->tgt_start + n->tgt_offset + + (uintptr_t) h - n->host_start); gomp_copy_dev2host (acc_dev, aq, h, d, s); } - gomp_remove_var (acc_dev, n); + gomp_remove_var_async (acc_dev, n, aq); } gomp_mutex_unlock (&acc_dev->lock); @@ -892,142 +841,3 @@ acc_update_self_async (void *h, size_t s, int async) { update_dev_host (0, h, s, async); } - -void -gomp_acc_insert_pointer (size_t mapnum, void **hostaddrs, size_t *sizes, - void *kinds, int async) -{ - struct target_mem_desc *tgt; - struct goacc_thread *thr = goacc_thread (); - struct gomp_device_descr *acc_dev = thr->dev; - - if (acc_is_present (*hostaddrs, *sizes)) - { - splay_tree_key n; - gomp_mutex_lock (&acc_dev->lock); - n = lookup_host (acc_dev, *hostaddrs, *sizes); - gomp_mutex_unlock (&acc_dev->lock); - - tgt = n->tgt; - for (size_t i = 0; i < tgt->list_count; i++) - if (tgt->list[i].key == n) - { - for (size_t j = 0; j < mapnum; j++) - if (i + j < tgt->list_count && tgt->list[i + j].key) - { - tgt->list[i + j].key->refcount++; - tgt->list[i + j].key->dynamic_refcount++; - } - return; - } - /* Should not reach here. */ - gomp_fatal ("Dynamic refcount incrementing failed for pointer/pset"); - } - - gomp_debug (0, " %s: prepare mappings\n", __FUNCTION__); - goacc_aq aq = get_goacc_asyncqueue (async); - tgt = gomp_map_vars_async (acc_dev, aq, mapnum, hostaddrs, - NULL, sizes, kinds, true, GOMP_MAP_VARS_OPENACC); - gomp_debug (0, " %s: mappings prepared\n", __FUNCTION__); - - /* Initialize dynamic refcount. */ - tgt->list[0].key->dynamic_refcount = 1; - - gomp_mutex_lock (&acc_dev->lock); - tgt->prev = acc_dev->openacc.data_environ; - acc_dev->openacc.data_environ = tgt; - gomp_mutex_unlock (&acc_dev->lock); -} - -void -gomp_acc_remove_pointer (void *h, size_t s, bool force_copyfrom, int async, - int finalize, int mapnum) -{ - struct goacc_thread *thr = goacc_thread (); - struct gomp_device_descr *acc_dev = thr->dev; - splay_tree_key n; - struct target_mem_desc *t; - int minrefs = (mapnum == 1) ? 2 : 3; - - if (!acc_is_present (h, s)) - return; - - gomp_mutex_lock (&acc_dev->lock); - - n = lookup_host (acc_dev, h, 1); - - if (!n) - { - gomp_mutex_unlock (&acc_dev->lock); - gomp_fatal ("%p is not a mapped block", (void *)h); - } - - gomp_debug (0, " %s: restore mappings\n", __FUNCTION__); - - t = n->tgt; - - if (n->refcount < n->dynamic_refcount) - { - gomp_mutex_unlock (&acc_dev->lock); - gomp_fatal ("Dynamic reference counting assert fail\n"); - } - - if (finalize) - { - n->refcount -= n->dynamic_refcount; - n->dynamic_refcount = 0; - } - else if (n->dynamic_refcount) - { - n->dynamic_refcount--; - n->refcount--; - } - - gomp_mutex_unlock (&acc_dev->lock); - - if (n->refcount == 0) - { - if (t->refcount == minrefs) - { - /* This is the last reference, so pull the descriptor off the - chain. This prevents gomp_unmap_vars via gomp_unmap_tgt from - freeing the device memory. */ - struct target_mem_desc *tp; - for (tp = NULL, t = acc_dev->openacc.data_environ; t != NULL; - tp = t, t = t->prev) - { - if (n->tgt == t) - { - if (tp) - tp->prev = t->prev; - else - acc_dev->openacc.data_environ = t->prev; - break; - } - } - } - - /* Set refcount to 1 to allow gomp_unmap_vars to unmap it. */ - n->refcount = 1; - t->refcount = minrefs; - for (size_t i = 0; i < t->list_count; i++) - if (t->list[i].key == n) - { - t->list[i].copy_from = force_copyfrom ? 1 : 0; - break; - } - - /* If running synchronously, unmap immediately. */ - if (async < acc_async_noval) - gomp_unmap_vars (t, true); - else - { - goacc_aq aq = get_goacc_asyncqueue (async); - gomp_unmap_vars_async (t, true, aq); - } - } - - gomp_mutex_unlock (&acc_dev->lock); - - gomp_debug (0, " %s: mappings restored\n", __FUNCTION__); -} diff --git a/libgomp/oacc-parallel.c b/libgomp/oacc-parallel.c index 68a60de24fa..87f9ae01906 100644 --- a/libgomp/oacc-parallel.c +++ b/libgomp/oacc-parallel.c @@ -47,23 +47,39 @@ _Static_assert (GOACC_FLAGS_UNMARSHAL (GOMP_DEVICE_HOST_FALLBACK) "legacy GOMP_DEVICE_HOST_FALLBACK broken"); -/* Returns the number of mappings associated with the pointer or pset. PSET - have three mappings, whereas pointer have two. */ +/* Some types of (pointer) variables use several consecutive mappings, which + must be treated as a group for enter/exit data directives. This function + returns the last mapping in such a group (inclusive), or POS for singleton + mappings. */ static int -find_pointer (int pos, size_t mapnum, unsigned short *kinds) +find_group_last (int pos, size_t mapnum, unsigned short *kinds) { - if (pos + 1 >= mapnum) - return 0; + unsigned char kind0 = kinds[pos] & 0xff; + int first_pos = pos, last_pos = pos; - unsigned char kind = kinds[pos+1] & 0xff; - - if (kind == GOMP_MAP_TO_PSET) - return 3; - else if (kind == GOMP_MAP_POINTER) - return 2; + if (kind0 == GOMP_MAP_TO_PSET) + { + while (pos + 1 < mapnum && (kinds[pos + 1] & 0xff) == GOMP_MAP_POINTER) + last_pos = ++pos; + /* We expect at least one GOMP_MAP_POINTER after a GOMP_MAP_TO_PSET. */ + assert (last_pos > first_pos); + } + else + { + /* GOMP_MAP_ALWAYS_POINTER can only appear directly after some other + mapping. */ + if (pos + 1 < mapnum + && (kinds[pos + 1] & 0xff) == GOMP_MAP_ALWAYS_POINTER) + return pos + 1; + + /* We can have one or several GOMP_MAP_POINTER mappings after a to/from + (etc.) mapping. */ + while (pos + 1 < mapnum && (kinds[pos + 1] & 0xff) == GOMP_MAP_POINTER) + last_pos = ++pos; + } - return 0; + return last_pos; } /* Handle the mapping pair that are presented when a @@ -580,6 +596,108 @@ GOACC_data_end (void) } } +/* Map variables for OpenACC "enter data". We can't just call + gomp_map_vars_async once, because individual mapped variables might have + "exit data" called for them at different times. */ + +static void +goacc_enter_data_internal (struct gomp_device_descr *acc_dev, size_t mapnum, + void **hostaddrs, size_t *sizes, + unsigned short *kinds, goacc_aq aq) +{ + for (size_t i = 0; i < mapnum; i++) + { + int group_last = find_group_last (i, mapnum, kinds); + + gomp_map_vars_async (acc_dev, aq, + (group_last - i) + 1, + &hostaddrs[i], NULL, + &sizes[i], &kinds[i], true, + GOMP_MAP_VARS_OPENACC_ENTER_DATA); + + i = group_last; + } +} + +/* Unmap variables for OpenACC "exit data", with optional finalization + (affecting all mappings in this operation). */ + +static void +goacc_exit_data_internal (struct gomp_device_descr *acc_dev, size_t mapnum, + void **hostaddrs, size_t *sizes, + unsigned short *kinds, bool finalize, goacc_aq aq) +{ + gomp_mutex_lock (&acc_dev->lock); + + for (size_t i = 0; i < mapnum; ++i) + { + unsigned char kind = kinds[i] & 0xff; + bool copyfrom = false; + + switch (kind) + { + case GOMP_MAP_FROM: + case GOMP_MAP_FORCE_FROM: + case GOMP_MAP_ALWAYS_FROM: + copyfrom = true; + /* Fallthrough. */ + + case GOMP_MAP_TO_PSET: + case GOMP_MAP_POINTER: + case GOMP_MAP_DELETE: + case GOMP_MAP_RELEASE: + { + struct splay_tree_key_s cur_node; + cur_node.host_start = (uintptr_t) hostaddrs[i]; + cur_node.host_end = cur_node.host_start + + (kind == GOMP_MAP_POINTER + ? sizeof (void *) : sizes[i]); + splay_tree_key n + = splay_tree_lookup (&acc_dev->mem_map, &cur_node); + + if (n == NULL) + continue; + + if (n->refcount == REFCOUNT_INFINITY) + { + n->refcount = 1; + n->virtual_refcount = 0; + } + + if (finalize) + { + n->refcount -= n->virtual_refcount; + n->virtual_refcount = 0; + } + + if (n->virtual_refcount > 0) + { + n->refcount--; + n->virtual_refcount--; + } + else if (n->refcount > 0) + n->refcount--; + + if (copyfrom) + gomp_copy_dev2host (acc_dev, aq, (void *) cur_node.host_start, + (void *) (n->tgt->tgt_start + n->tgt_offset + + cur_node.host_start + - n->host_start), + cur_node.host_end - cur_node.host_start); + + if (n->refcount == 0) + gomp_remove_var_async (acc_dev, n, aq); + } + break; + default: + gomp_fatal (">>>> goacc_exit_data_internal UNHANDLED kind 0x%.2x", + kind); + } + } + + gomp_mutex_unlock (&acc_dev->lock); +} + void GOACC_enter_exit_data (int flags_m, size_t mapnum, void **hostaddrs, size_t *sizes, unsigned short *kinds, @@ -707,98 +825,13 @@ GOACC_enter_exit_data (int flags_m, size_t mapnum, va_end (ap); } - /* In c, non-pointers and arrays are represented by a single data clause. - Dynamically allocated arrays and subarrays are represented by a data - clause followed by an internal GOMP_MAP_POINTER. - - In fortran, scalars and not allocated arrays are represented by a - single data clause. Allocated arrays and subarrays have three mappings: - 1) the original data clause, 2) a PSET 3) a pointer to the array data. - */ + goacc_aq aq = get_goacc_asyncqueue (async); if (data_enter) - { - for (i = 0; i < mapnum; i++) - { - unsigned char kind = kinds[i] & 0xff; - - /* Scan for pointers and PSETs. */ - int pointer = find_pointer (i, mapnum, kinds); - - if (!pointer) - { - switch (kind) - { - case GOMP_MAP_ALLOC: - case GOMP_MAP_FORCE_ALLOC: - acc_create_async (hostaddrs[i], sizes[i], async); - break; - case GOMP_MAP_TO: - case GOMP_MAP_FORCE_TO: - acc_copyin_async (hostaddrs[i], sizes[i], async); - break; - default: - gomp_fatal (">>>> GOACC_enter_exit_data UNHANDLED kind 0x%.2x", - kind); - break; - } - } - else - { - gomp_acc_insert_pointer (pointer, &hostaddrs[i], - &sizes[i], &kinds[i], async); - /* Increment 'i' by two because OpenACC requires fortran - arrays to be contiguous, so each PSET is associated with - one of MAP_FORCE_ALLOC/MAP_FORCE_PRESET/MAP_FORCE_TO, and - one MAP_POINTER. */ - i += pointer - 1; - } - } - } + goacc_enter_data_internal (acc_dev, mapnum, hostaddrs, sizes, kinds, aq); else - for (i = 0; i < mapnum; ++i) - { - unsigned char kind = kinds[i] & 0xff; - - int pointer = find_pointer (i, mapnum, kinds); - - if (!pointer) - { - switch (kind) - { - case GOMP_MAP_RELEASE: - case GOMP_MAP_DELETE: - if (acc_is_present (hostaddrs[i], sizes[i])) - { - if (finalize) - acc_delete_finalize_async (hostaddrs[i], sizes[i], async); - else - acc_delete_async (hostaddrs[i], sizes[i], async); - } - break; - case GOMP_MAP_FROM: - case GOMP_MAP_FORCE_FROM: - if (finalize) - acc_copyout_finalize_async (hostaddrs[i], sizes[i], async); - else - acc_copyout_async (hostaddrs[i], sizes[i], async); - break; - default: - gomp_fatal (">>>> GOACC_enter_exit_data UNHANDLED kind 0x%.2x", - kind); - break; - } - } - else - { - bool copyfrom = (kind == GOMP_MAP_FORCE_FROM - || kind == GOMP_MAP_FROM); - gomp_acc_remove_pointer (hostaddrs[i], sizes[i], copyfrom, async, - finalize, pointer); - /* See the above comment. */ - i += pointer - 1; - } - } + goacc_exit_data_internal (acc_dev, mapnum, hostaddrs, sizes, kinds, + finalize, aq); out_prof: if (profiling_p) diff --git a/libgomp/target.c b/libgomp/target.c index 84d6daa76ca..552820d874f 100644 --- a/libgomp/target.c +++ b/libgomp/target.c @@ -536,8 +536,10 @@ gomp_map_vars_internal (struct gomp_device_descr *devicep, struct target_mem_desc *tgt = gomp_malloc (sizeof (*tgt) + sizeof (tgt->list[0]) * mapnum); tgt->list_count = mapnum; - tgt->refcount = pragma_kind == GOMP_MAP_VARS_ENTER_DATA ? 0 : 1; + tgt->refcount = (pragma_kind == GOMP_MAP_VARS_ENTER_DATA + || pragma_kind == GOMP_MAP_VARS_OPENACC_ENTER_DATA) ? 0 : 1; tgt->device_descr = devicep; + tgt->prev = NULL; struct gomp_coalesce_buf cbuf, *cbufp = NULL; if (mapnum == 0) @@ -937,7 +939,7 @@ gomp_map_vars_internal (struct gomp_device_descr *devicep, tgt->list[i].offset = 0; tgt->list[i].length = k->host_end - k->host_start; k->refcount = 1; - k->dynamic_refcount = 0; + k->virtual_refcount = 0; tgt->refcount++; array->left = NULL; array->right = NULL; @@ -1075,8 +1077,20 @@ gomp_map_vars_internal (struct gomp_device_descr *devicep, /* If the variable from "omp target enter data" map-list was already mapped, tgt is not needed. Otherwise tgt will be freed by gomp_unmap_vars or gomp_exit_data. */ - if (pragma_kind == GOMP_MAP_VARS_ENTER_DATA && tgt->refcount == 0) - { + if ((pragma_kind == GOMP_MAP_VARS_ENTER_DATA + || pragma_kind == GOMP_MAP_VARS_OPENACC_ENTER_DATA) + && tgt->refcount == 0) + { + /* If we're about to discard a target_mem_desc with no "structural" + references (tgt->refcount == 0), any splay keys linked in the tgt's + list must have their virtual refcount incremented to represent that + "lost" reference in order to implement the semantics of the OpenACC + "present increment" operation properly. */ + if (pragma_kind == GOMP_MAP_VARS_OPENACC_ENTER_DATA) + for (i = 0; i < tgt->list_count; i++) + if (tgt->list[i].key) + tgt->list[i].key->virtual_refcount++; + free (tgt); tgt = NULL; } @@ -1116,32 +1130,63 @@ gomp_unmap_tgt (struct target_mem_desc *tgt) free (tgt); } -attribute_hidden bool -gomp_remove_var (struct gomp_device_descr *devicep, splay_tree_key k) +static bool +gomp_unref_tgt (void *ptr) { bool is_tgt_unmapped = false; - splay_tree_remove (&devicep->mem_map, k); - if (k->link_key) - splay_tree_insert (&devicep->mem_map, (splay_tree_node) k->link_key); - if (k->tgt->refcount > 1) - k->tgt->refcount--; + + struct target_mem_desc *tgt = (struct target_mem_desc *) ptr; + + if (tgt->refcount > 1) + tgt->refcount--; else { + gomp_unmap_tgt (tgt); is_tgt_unmapped = true; - gomp_unmap_tgt (k->tgt); } + return is_tgt_unmapped; } static void -gomp_unref_tgt (void *ptr) +gomp_unref_tgt_void (void *ptr) { - struct target_mem_desc *tgt = (struct target_mem_desc *) ptr; + (void) gomp_unref_tgt (ptr); +} - if (tgt->refcount > 1) - tgt->refcount--; +static inline __attribute__((always_inline)) bool +gomp_remove_var_internal (struct gomp_device_descr *devicep, splay_tree_key k, + struct goacc_asyncqueue *aq) +{ + bool is_tgt_unmapped = false; + splay_tree_remove (&devicep->mem_map, k); + if (k->link_key) + splay_tree_insert (&devicep->mem_map, (splay_tree_node) k->link_key); + if (aq) + devicep->openacc.async.queue_callback_func (aq, gomp_unref_tgt_void, + (void *) k->tgt); else - gomp_unmap_tgt (tgt); + is_tgt_unmapped = gomp_unref_tgt ((void *) k->tgt); + return is_tgt_unmapped; +} + +attribute_hidden bool +gomp_remove_var (struct gomp_device_descr *devicep, splay_tree_key k) +{ + return gomp_remove_var_internal (devicep, k, NULL); +} + +/* Remove a variable asynchronously. This actually removes the variable + mapping immediately, but retains the linked target_mem_desc until the + asynchronous operation has completed (as it may still refer to target + memory). The device lock must be held before entry, and remains locked on + exit. */ + +attribute_hidden void +gomp_remove_var_async (struct gomp_device_descr *devicep, splay_tree_key k, + struct goacc_asyncqueue *aq) +{ + (void) gomp_remove_var_internal (devicep, k, aq); } /* Unmap variables described by TGT. If DO_COPYFROM is true, copy relevant @@ -1177,7 +1222,14 @@ gomp_unmap_vars_internal (struct target_mem_desc *tgt, bool do_copyfrom, continue; bool do_unmap = false; - if (k->refcount > 1 && k->refcount != REFCOUNT_INFINITY) + if (k->tgt == tgt + && k->virtual_refcount > 0 + && k->refcount != REFCOUNT_INFINITY) + { + k->virtual_refcount--; + k->refcount--; + } + else if (k->refcount > 1 && k->refcount != REFCOUNT_INFINITY) k->refcount--; else if (k->refcount == 1) { @@ -1197,7 +1249,7 @@ gomp_unmap_vars_internal (struct target_mem_desc *tgt, bool do_copyfrom, } if (aq) - devicep->openacc.async.queue_callback_func (aq, gomp_unref_tgt, + devicep->openacc.async.queue_callback_func (aq, gomp_unref_tgt_void, (void *) tgt); else gomp_unref_tgt ((void *) tgt); @@ -1334,6 +1386,7 @@ gomp_load_image_to_device (struct gomp_device_descr *devicep, unsigned version, k->tgt = tgt; k->tgt_offset = target_table[i].start; k->refcount = REFCOUNT_INFINITY; + k->virtual_refcount = 0; k->link_key = NULL; array->left = NULL; array->right = NULL; @@ -1366,6 +1419,7 @@ gomp_load_image_to_device (struct gomp_device_descr *devicep, unsigned version, k->tgt = tgt; k->tgt_offset = target_var->start; k->refcount = target_size & link_bit ? REFCOUNT_LINK : REFCOUNT_INFINITY; + k->virtual_refcount = 0; k->link_key = NULL; array->left = NULL; array->right = NULL; @@ -1600,22 +1654,6 @@ gomp_unload_device (struct gomp_device_descr *devicep) } } -/* Free address mapping tables. MM must be locked on entry, and remains locked - on return. */ - -attribute_hidden void -gomp_free_memmap (struct splay_tree_s *mem_map) -{ - while (mem_map->root) - { - struct target_mem_desc *tgt = mem_map->root->key.tgt; - - splay_tree_remove (mem_map, &mem_map->root->key); - free (tgt->array); - free (tgt); - } -} - /* Host fallback for GOMP_target{,_ext} routines. */ static void @@ -2636,6 +2674,8 @@ omp_target_associate_ptr (const void *host_ptr, const void *device_ptr, k->tgt = tgt; k->tgt_offset = (uintptr_t) device_ptr + device_offset; k->refcount = REFCOUNT_INFINITY; + k->virtual_refcount = 0; + k->link_key = NULL; array->left = NULL; array->right = NULL; splay_tree_insert (&devicep->mem_map, array); @@ -2906,7 +2946,6 @@ gomp_target_init (void) current_device.type = current_device.get_type_func (); current_device.mem_map.root = NULL; current_device.state = GOMP_DEVICE_UNINITIALIZED; - current_device.openacc.data_environ = NULL; for (i = 0; i < new_num_devices; i++) { current_device.target_id = i;