Hi! On 2019-10-03T09:35:04-0700, Julian Brown <jul...@codesourcery.com> wrote: > This patch has been broken out of the patch supporting OpenACC 2.6 manual > deep copy last posted here: > > https://gcc.gnu.org/ml/gcc-patches/2018-12/msg01084.html
Thanks. Remeber to look into <https://gcc.gnu.org/PR92116> "Potential null pointer dereference in 'gomp_acc_remove_pointer'", which may be relevant here. I see you've merged in the relevant parts of my incremental patch '[WIP] OpenACC 2.6 manual deep copy support (attach/detach): adjust for "goacc_async_unmap_tgt" removal', that I included in <http://mid.mail-archive.com/yxfpftuqpakv.fsf@hertz.schwinge.homeip.net>, which tells me that I supposedly understood that part alright. ;-D > As part of developing that patch, libgomp's OpenACC reference counting > implementation proved to be somewhat inconsistent, especially when > used in combination with the deep copy support which exercises it > more thoroughly. > > So, this patch contains just the changes to reference-counting behaviour, > for ease of (re-)review. The other parts of OpenACC 2.6 manual deep > copy support are forthcoming, but some changes in this patch anticipate > that support. As we're discussing these separately, please for now remove the changes related to the 'VREFCOUNT_LINK_KEY' toggle flag, and moving 'link_key' into an union (to later be shared with 'attach_count'); <87pniuuhkj.fsf@euler.schwinge.homeip.net">http://mid.mail-archive.com/87pniuuhkj.fsf@euler.schwinge.homeip.net>. > Tested with offloading to NVPTX, with good results (though a couple of > tests need fixing also). The testsuite changes we're discussing separately, and need to go in before this one, obviously. > OK for trunk? I haven't understood all the changes related to replacing 'dynamic_refcount' with 'virtual_refcount', getting rid of 'data_environ', the 'lookup_dev' rework, but I trust you got that right. In particular, these seem to remove special-case OpenACC code in favor of generic OMP code, which is good. A few more comments: > --- a/libgomp/libgomp.h > +++ b/libgomp/libgomp.h > typedef struct acc_dispatch_t > { > - /* This is a linked list of data mapped using the > - acc_map_data/acc_unmap_data or "acc enter data"/"acc exit data" pragmas. > - Unlike mapped_data in the goacc_thread struct, unmapping can > - happen out-of-order with respect to mapping. */ > - /* This is guarded by the lock in the "outer" struct gomp_device_descr. */ > - struct target_mem_desc *data_environ; As mentioned before, please also accordingly update the comment attached to 'acc_dispatch_t openacc' in 'struct gomp_device_descr'. That code: > -/* Free address mapping tables. MM must be locked on entry, and remains > locked > - on return. */ > - > -attribute_hidden void > -gomp_free_memmap (struct splay_tree_s *mem_map) > -{ > - while (mem_map->root) > - { > - struct target_mem_desc *tgt = mem_map->root->key.tgt; > - > - splay_tree_remove (mem_map, &mem_map->root->key); > - free (tgt->array); > - free (tgt); > - } > -} ... kind-of gets inlined here: > --- a/libgomp/oacc-init.c > +++ b/libgomp/oacc-init.c > @@ -356,9 +356,13 @@ acc_shutdown_1 (acc_device_t d) > > if (walk->dev) > { > - gomp_mutex_lock (&walk->dev->lock); > - gomp_free_memmap (&walk->dev->mem_map); > - gomp_mutex_unlock (&walk->dev->lock); > + while (walk->dev->mem_map.root) > + { > + splay_tree_key k = &walk->dev->mem_map.root->key; > + gomp_remove_var (walk->dev, k); > + } > > walk->dev = NULL; > walk->base_dev = NULL; It's not obvious to me why it's OK to remove the locking? Don't all operations on the 'mem_map' have to have the device locked? Does that code now still have the previous (and expected?) "finalize" semantics (don't consider 'refcount', always unmap)? (Should we assert here that 'gomp_remove_var' always returns 'true'? And/or, if it doesn't, what does that mean then?) Or am I confused? ;-) > --- a/libgomp/oacc-mem.c > +++ b/libgomp/oacc-mem.c > @@ -427,6 +418,7 @@ acc_unmap_data (void *h) > { > struct goacc_thread *thr = goacc_thread (); > struct gomp_device_descr *acc_dev = thr->dev; > + struct splay_tree_key_s cur_node; I know it's often not the case in existing code, but when adding new code, please move definitions next to their first use. > @@ -438,12 +430,11 @@ acc_unmap_data (void *h) > acc_api_info api_info; > bool profiling_p = GOACC_PROFILING_SETUP_P (thr, &prof_info, &api_info); > > gomp_mutex_lock (&acc_dev->lock); > > - splay_tree_key n = lookup_host (acc_dev, h, 1); > - struct target_mem_desc *t; > + cur_node.host_start = (uintptr_t) h; > + cur_node.host_end = cur_node.host_start + 1; > + splay_tree_key n = splay_tree_lookup (&acc_dev->mem_map, &cur_node); > > if (!n) > { Isn't this just inlining 'lookup_host'? There may be a good reason to do that, but what is it? > @@ -451,47 +442,28 @@ acc_unmap_data (void *h) > - /* Mark for removal. */ > - n->refcount = 1; > + splay_tree_remove (&acc_dev->mem_map, n); > > - t = n->tgt; > + struct target_mem_desc *tgt = n->tgt; > > - if (t->refcount == 2) > + if (tgt->refcount > 0) > + tgt->refcount--; > + else > { > -[...] > + free (tgt->array); > + free (tgt); > } Shouldn't that be 'if (tgt->refcount > 1)' (instead of '> 0'), like in 'gomp_unref_tgt' -- or actually use that function? > > gomp_mutex_unlock (&acc_dev->lock); > > - gomp_unmap_vars (t, true); > - > if (profiling_p) > { > thr->prof_info = NULL; Hmm, I don't understand the changes leading to this, but again, I shall trust that you've got that right. Or, was that a bug in the existing code, and we don't have proper test coverage? > @@ -577,17 +551,14 @@ present_create_copy (unsigned f, void *h, size_t s, int > async) > - d = tgt->to_free; > + n = lookup_host (acc_dev, h, s); > + assert (n != NULL); > + d = (void *) (n->tgt->tgt_start + n->tgt_offset + (uintptr_t) h > + - n->host_start); | return d; Again, it's not obvious to me how that is semantically equivalent to what we've returned before? > void > -gomp_acc_remove_pointer (void *h, size_t s, bool force_copyfrom, int async, > - int finalize, int mapnum) > +gomp_acc_remove_pointer (struct gomp_device_descr *acc_dev, void **hostaddrs, > + size_t *sizes, unsigned short *kinds, int async, > + bool finalize, int mapnum) > { > + switch (kind) > + { > + case GOMP_MAP_FROM: > + case GOMP_MAP_FORCE_FROM: > + case GOMP_MAP_ALWAYS_FROM: > + copyfrom = true; > + /* Fallthrough. */ > + > + case GOMP_MAP_TO_PSET: > + case GOMP_MAP_POINTER: > + case GOMP_MAP_DELETE: > + case GOMP_MAP_RELEASE: > [...] > + default: > + gomp_mutex_unlock (&acc_dev->lock); > + gomp_fatal ("gomp_acc_remove_pointer unhandled kind 0x%.2x", > + kind); Thanks for being explicit about the expected mapping kinds, etc. > - /* If running synchronously, unmap immediately. */ > - if (async < acc_async_noval) > - gomp_unmap_vars (t, true); > - else > - { > - goacc_aq aq = get_goacc_asyncqueue (async); > - gomp_unmap_vars_async (t, true, aq); As mentioned before, 'gomp_acc_remove_pointer' now "has an unused 'async' formal parameter. Is that meant to be resolved to an asyncqueue, and pass that one to 'gomp_copy_dev2host', and call 'gomp_remove_var_async' instead of 'gomp_remove_var'"? That's here: > + if (copyfrom) > + gomp_copy_dev2host (acc_dev, NULL, (void *) cur_node.host_start, > + (void *) (n->tgt->tgt_start + n->tgt_offset > + + cur_node.host_start > + - n->host_start), > + cur_node.host_end - cur_node.host_start); > + > + if (n->refcount == 0) > + gomp_remove_var (acc_dev, n); > + break; > --- a/libgomp/oacc-parallel.c > +++ b/libgomp/oacc-parallel.c > @@ -56,12 +56,29 @@ find_pointer (int pos, size_t mapnum, unsigned short > *kinds) I've always been confused by this function (before known as 'find_pset'); this feels wrong, but I've never gotten to the bottom of it. I'll trust that your changes there can only improve the current situation, not worsen it. ;-) And, again, thanks for being explicit about the expected mapping kinds, etc. > @@ -745,8 +762,14 @@ GOACC_enter_exit_data (int flags_m, size_t mapnum, > } > else > { > - gomp_acc_insert_pointer (pointer, &hostaddrs[i], > - &sizes[i], &kinds[i], async); > + goacc_aq aq = get_goacc_asyncqueue (async); > + for (int j = 0; j < 2; j++) Should this magic constant '2' be derived from 'pointer' or some such? > + gomp_map_vars_async (acc_dev, aq, > + (j == 0 || pointer == 2) ? 1 : 2, > + &hostaddrs[i + j], NULL, > + &sizes[i + j], &kinds[i + j], true, > + GOMP_MAP_VARS_OPENACC_ENTER_DATA); ;-) Yuck. As requested before: "Can we get a comment added to such 'magic', please?" I just wish that eventually we'll be able to can get rid of that stuff, and just let 'gomp_map_vars' do its thing. Similar to <https://gcc.gnu.org/PR90596> "'GOACC_parallel_keyed' should use 'GOMP_MAP_VARS_TARGET'". (For avoidance of doubt, that's not your task right now.) > --- a/libgomp/target.c > +++ b/libgomp/target.c > @@ -536,7 +536,8 @@ gomp_map_vars_internal (struct gomp_device_descr *devicep, > struct target_mem_desc *tgt > = gomp_malloc (sizeof (*tgt) + sizeof (tgt->list[0]) * mapnum); > tgt->list_count = mapnum; > - tgt->refcount = pragma_kind == GOMP_MAP_VARS_ENTER_DATA ? 0 : 1; > + tgt->refcount = (pragma_kind == GOMP_MAP_VARS_ENTER_DATA > + || pragma_kind == GOMP_MAP_VARS_OPENACC_ENTER_DATA) ? 0 : 1; > tgt->device_descr = devicep; > struct gomp_coalesce_buf cbuf, *cbufp = NULL; > > @@ -1051,8 +1053,20 @@ gomp_map_vars_internal (struct gomp_device_descr > *devicep, > /* If the variable from "omp target enter data" map-list was already > mapped, > tgt is not needed. Otherwise tgt will be freed by gomp_unmap_vars or > gomp_exit_data. */ > - if (pragma_kind == GOMP_MAP_VARS_ENTER_DATA && tgt->refcount == 0) > - { > + if ((pragma_kind == GOMP_MAP_VARS_ENTER_DATA > + || pragma_kind == GOMP_MAP_VARS_OPENACC_ENTER_DATA) > + && tgt->refcount == 0) > + { > + /* If we're about to discard a target_mem_desc with no "structural" > + references (tgt->refcount == 0), any splay keys linked in the tgt's > + list must have their virtual refcount incremented to represent that > + "lost" reference in order to implement the semantics of the OpenACC > + "present increment" operation properly. */ > + if (pragma_kind == GOMP_MAP_VARS_OPENACC_ENTER_DATA) > + for (i = 0; i < tgt->list_count; i++) > + if (tgt->list[i].key) > + tgt->list[i].key->virtual_refcount++; > + > free (tgt); > tgt = NULL; > } So that last item is the only difference between 'GOMP_MAP_VARS_ENTER_DATA' and 'GOMP_MAP_VARS_OPENACC_ENTER_DATA'. Again I have not digested that one, but will trust you. > @@ -1310,7 +1366,7 @@ gomp_load_image_to_device (struct gomp_device_descr > *devicep, unsigned version, > k->tgt = tgt; > k->tgt_offset = target_table[i].start; > k->refcount = REFCOUNT_INFINITY; > - k->link_key = NULL; > + k->virtual_refcount = 0; > array->left = NULL; > array->right = NULL; > splay_tree_insert (&devicep->mem_map, array); > @@ -1342,7 +1398,7 @@ gomp_load_image_to_device (struct gomp_device_descr > *devicep, unsigned version, > k->tgt = tgt; > k->tgt_offset = target_var->start; > k->refcount = target_size & link_bit ? REFCOUNT_LINK : > REFCOUNT_INFINITY; > - k->link_key = NULL; > + k->virtual_refcount = 0; > array->left = NULL; > array->right = NULL; > splay_tree_insert (&devicep->mem_map, array); Why no longer initialize 'link_key' here? I'd expect that always all fields of 'k' ('struct splay_tree_key_s') get initialized, so like: > @@ -2612,6 +2652,8 @@ omp_target_associate_ptr (const void *host_ptr, const > void *device_ptr, > k->tgt = tgt; > k->tgt_offset = (uintptr_t) device_ptr + device_offset; > k->refcount = REFCOUNT_INFINITY; > + k->virtual_refcount = 0; > + k->u.link_key = NULL; > array->left = NULL; > array->right = NULL; > splay_tree_insert (&devicep->mem_map, array); (I haven't verified whether that's always done, please verify.) Grüße Thomas
signature.asc
Description: PGP signature