Hi!

On 2019-10-03T09:35:04-0700, Julian Brown <jul...@codesourcery.com> wrote:
> This patch has been broken out of the patch supporting OpenACC 2.6 manual
> deep copy last posted here:
>
>   https://gcc.gnu.org/ml/gcc-patches/2018-12/msg01084.html

Thanks.

Remeber to look into <https://gcc.gnu.org/PR92116> "Potential null
pointer dereference in 'gomp_acc_remove_pointer'", which may be relevant
here.

I see you've merged in the relevant parts of my incremental patch '[WIP]
OpenACC 2.6 manual deep copy support (attach/detach): adjust for
"goacc_async_unmap_tgt" removal', that I included in
<http://mid.mail-archive.com/yxfpftuqpakv.fsf@hertz.schwinge.homeip.net>,
which tells me that I supposedly understood that part alright.  ;-D

> As part of developing that patch, libgomp's OpenACC reference counting
> implementation proved to be somewhat inconsistent, especially when
> used in combination with the deep copy support which exercises it
> more thoroughly.
>
> So, this patch contains just the changes to reference-counting behaviour,
> for ease of (re-)review.  The other parts of OpenACC 2.6 manual deep
> copy support are forthcoming, but some changes in this patch anticipate
> that support.

As we're discussing these separately, please for now remove the changes
related to the 'VREFCOUNT_LINK_KEY' toggle flag, and moving 'link_key'
into an union (to later be shared with 'attach_count');
<87pniuuhkj.fsf@euler.schwinge.homeip.net">http://mid.mail-archive.com/87pniuuhkj.fsf@euler.schwinge.homeip.net>.

> Tested with offloading to NVPTX, with good results (though a couple of
> tests need fixing also).

The testsuite changes we're discussing separately, and need to go in
before this one, obviously.

> OK for trunk?

I haven't understood all the changes related to replacing
'dynamic_refcount' with 'virtual_refcount', getting rid of
'data_environ', the 'lookup_dev' rework, but I trust you got that right.
In particular, these seem to remove special-case OpenACC code in favor of
generic OMP code, which is good.

A few more comments:

> --- a/libgomp/libgomp.h
> +++ b/libgomp/libgomp.h

>  typedef struct acc_dispatch_t
>  {
> -  /* This is a linked list of data mapped using the
> -     acc_map_data/acc_unmap_data or "acc enter data"/"acc exit data" pragmas.
> -     Unlike mapped_data in the goacc_thread struct, unmapping can
> -     happen out-of-order with respect to mapping.  */
> -  /* This is guarded by the lock in the "outer" struct gomp_device_descr.  */
> -  struct target_mem_desc *data_environ;

As mentioned before, please also accordingly update the comment attached
to 'acc_dispatch_t openacc' in 'struct gomp_device_descr'.

That code:

> -/* Free address mapping tables.  MM must be locked on entry, and remains 
> locked
> -   on return.  */
> -
> -attribute_hidden void
> -gomp_free_memmap (struct splay_tree_s *mem_map)
> -{
> -  while (mem_map->root)
> -    {
> -      struct target_mem_desc *tgt = mem_map->root->key.tgt;
> -
> -      splay_tree_remove (mem_map, &mem_map->root->key);
> -      free (tgt->array);
> -      free (tgt);
> -    }
> -}

... kind-of gets inlined here:

> --- a/libgomp/oacc-init.c
> +++ b/libgomp/oacc-init.c
> @@ -356,9 +356,13 @@ acc_shutdown_1 (acc_device_t d)
>  
>        if (walk->dev)
>       {
> -       gomp_mutex_lock (&walk->dev->lock);
> -       gomp_free_memmap (&walk->dev->mem_map);
> -       gomp_mutex_unlock (&walk->dev->lock);
> +       while (walk->dev->mem_map.root)
> +         {
> +           splay_tree_key k = &walk->dev->mem_map.root->key;
> +           gomp_remove_var (walk->dev, k);
> +         }
>  
>         walk->dev = NULL;
>         walk->base_dev = NULL;

It's not obvious to me why it's OK to remove the locking?  Don't all
operations on the 'mem_map' have to have the device locked?

Does that code now still have the previous (and expected?) "finalize"
semantics (don't consider 'refcount', always unmap)?  (Should we assert
here that 'gomp_remove_var' always returns 'true'?  And/or, if it
doesn't, what does that mean then?)  Or am I confused?  ;-)

> --- a/libgomp/oacc-mem.c
> +++ b/libgomp/oacc-mem.c

> @@ -427,6 +418,7 @@ acc_unmap_data (void *h)
>  {
>    struct goacc_thread *thr = goacc_thread ();
>    struct gomp_device_descr *acc_dev = thr->dev;
> +  struct splay_tree_key_s cur_node;

I know it's often not the case in existing code, but when adding new
code, please move definitions next to their first use.

> @@ -438,12 +430,11 @@ acc_unmap_data (void *h)
>    acc_api_info api_info;
>    bool profiling_p = GOACC_PROFILING_SETUP_P (thr, &prof_info, &api_info);
>  
>    gomp_mutex_lock (&acc_dev->lock);
>  
> -  splay_tree_key n = lookup_host (acc_dev, h, 1);
> -  struct target_mem_desc *t;
> +  cur_node.host_start = (uintptr_t) h;
> +  cur_node.host_end = cur_node.host_start + 1;
> +  splay_tree_key n = splay_tree_lookup (&acc_dev->mem_map, &cur_node);
>  
>    if (!n)
>      {

Isn't this just inlining 'lookup_host'?  There may be a good reason to do
that, but what is it?

> @@ -451,47 +442,28 @@ acc_unmap_data (void *h)

> -  /* Mark for removal.  */
> -  n->refcount = 1;
> +  splay_tree_remove (&acc_dev->mem_map, n);
>  
> -  t = n->tgt;
> +  struct target_mem_desc *tgt = n->tgt;
>  
> -  if (t->refcount == 2)
> +  if (tgt->refcount > 0)
> +    tgt->refcount--;
> +  else
>      {
> -[...]
> +      free (tgt->array);
> +      free (tgt);
>      }

Shouldn't that be 'if (tgt->refcount > 1)' (instead of '> 0'), like in
'gomp_unref_tgt' -- or actually use that function?

>  
>    gomp_mutex_unlock (&acc_dev->lock);
>  
> -  gomp_unmap_vars (t, true);
> -
>    if (profiling_p)
>      {
>        thr->prof_info = NULL;

Hmm, I don't understand the changes leading to this, but again, I shall
trust that you've got that right.

Or, was that a bug in the existing code, and we don't have proper test
coverage?

> @@ -577,17 +551,14 @@ present_create_copy (unsigned f, void *h, size_t s, int 
> async)

> -      d = tgt->to_free;

> +      n = lookup_host (acc_dev, h, s);
> +      assert (n != NULL);
> +      d = (void *) (n->tgt->tgt_start + n->tgt_offset + (uintptr_t) h
> +                 - n->host_start);

|   return d;

Again, it's not obvious to me how that is semantically equivalent to what
we've returned before?

>  void
> -gomp_acc_remove_pointer (void *h, size_t s, bool force_copyfrom, int async,
> -                      int finalize, int mapnum)
> +gomp_acc_remove_pointer (struct gomp_device_descr *acc_dev, void **hostaddrs,
> +                      size_t *sizes, unsigned short *kinds, int async,
> +                      bool finalize, int mapnum)
>  {

> +      switch (kind)
> +        {
> +     case GOMP_MAP_FROM:
> +     case GOMP_MAP_FORCE_FROM:
> +     case GOMP_MAP_ALWAYS_FROM:
> +       copyfrom = true;
> +       /* Fallthrough.  */
> +
> +     case GOMP_MAP_TO_PSET:
> +     case GOMP_MAP_POINTER:
> +     case GOMP_MAP_DELETE:
> +     case GOMP_MAP_RELEASE:
> [...]
> +     default:
> +       gomp_mutex_unlock (&acc_dev->lock);
> +       gomp_fatal ("gomp_acc_remove_pointer unhandled kind 0x%.2x",
> +                   kind);

Thanks for being explicit about the expected mapping kinds, etc.

> -      /* If running synchronously, unmap immediately.  */
> -      if (async < acc_async_noval)
> -     gomp_unmap_vars (t, true);
> -      else
> -     {
> -       goacc_aq aq = get_goacc_asyncqueue (async);
> -       gomp_unmap_vars_async (t, true, aq);

As mentioned before, 'gomp_acc_remove_pointer' now "has an unused 'async'
formal parameter.  Is that meant to be resolved to an asyncqueue, and
pass that one to 'gomp_copy_dev2host', and call 'gomp_remove_var_async'
instead of 'gomp_remove_var'"?  That's here:

> +       if (copyfrom)
> +         gomp_copy_dev2host (acc_dev, NULL, (void *) cur_node.host_start,
> +                             (void *) (n->tgt->tgt_start + n->tgt_offset
> +                                       + cur_node.host_start
> +                                       - n->host_start),
> +                             cur_node.host_end - cur_node.host_start);
> +
> +       if (n->refcount == 0)
> +         gomp_remove_var (acc_dev, n);
> +       break;

> --- a/libgomp/oacc-parallel.c
> +++ b/libgomp/oacc-parallel.c
> @@ -56,12 +56,29 @@ find_pointer (int pos, size_t mapnum, unsigned short 
> *kinds)

I've always been confused by this function (before known as 'find_pset');
this feels wrong, but I've never gotten to the bottom of it.

I'll trust that your changes there can only improve the current
situation, not worsen it.  ;-)

And, again, thanks for being explicit about the expected mapping kinds,
etc.

> @@ -745,8 +762,14 @@ GOACC_enter_exit_data (int flags_m, size_t mapnum,
>           }
>         else
>           {
> -           gomp_acc_insert_pointer (pointer, &hostaddrs[i],
> -                                    &sizes[i], &kinds[i], async);
> +           goacc_aq aq = get_goacc_asyncqueue (async);
> +           for (int j = 0; j < 2; j++)

Should this magic constant '2' be derived from 'pointer' or some such?

> +             gomp_map_vars_async (acc_dev, aq,
> +                                  (j == 0 || pointer == 2) ? 1 : 2,
> +                                  &hostaddrs[i + j], NULL,
> +                                  &sizes[i + j], &kinds[i + j], true,
> +                                  GOMP_MAP_VARS_OPENACC_ENTER_DATA);

;-) Yuck.  As requested before: "Can we get a comment added to such
'magic', please?"

I just wish that eventually we'll be able to can get rid of that stuff,
and just let 'gomp_map_vars' do its thing.  Similar to
<https://gcc.gnu.org/PR90596> "'GOACC_parallel_keyed' should use
'GOMP_MAP_VARS_TARGET'".

(For avoidance of doubt, that's not your task right now.)

> --- a/libgomp/target.c
> +++ b/libgomp/target.c
> @@ -536,7 +536,8 @@ gomp_map_vars_internal (struct gomp_device_descr *devicep,
>    struct target_mem_desc *tgt
>      = gomp_malloc (sizeof (*tgt) + sizeof (tgt->list[0]) * mapnum);
>    tgt->list_count = mapnum;
> -  tgt->refcount = pragma_kind == GOMP_MAP_VARS_ENTER_DATA ? 0 : 1;
> +  tgt->refcount = (pragma_kind == GOMP_MAP_VARS_ENTER_DATA
> +                || pragma_kind == GOMP_MAP_VARS_OPENACC_ENTER_DATA) ? 0 : 1;
>    tgt->device_descr = devicep;
>    struct gomp_coalesce_buf cbuf, *cbufp = NULL;
>  
> @@ -1051,8 +1053,20 @@ gomp_map_vars_internal (struct gomp_device_descr 
> *devicep,
>    /* If the variable from "omp target enter data" map-list was already 
> mapped,
>       tgt is not needed.  Otherwise tgt will be freed by gomp_unmap_vars or
>       gomp_exit_data.  */
> -  if (pragma_kind == GOMP_MAP_VARS_ENTER_DATA && tgt->refcount == 0)
> -    {
> +  if ((pragma_kind == GOMP_MAP_VARS_ENTER_DATA
> +       || pragma_kind == GOMP_MAP_VARS_OPENACC_ENTER_DATA)
> +      && tgt->refcount == 0)
> +    {
> +      /* If we're about to discard a target_mem_desc with no "structural"
> +      references (tgt->refcount == 0), any splay keys linked in the tgt's
> +      list must have their virtual refcount incremented to represent that
> +      "lost" reference in order to implement the semantics of the OpenACC
> +      "present increment" operation properly.  */
> +      if (pragma_kind == GOMP_MAP_VARS_OPENACC_ENTER_DATA)
> +     for (i = 0; i < tgt->list_count; i++)
> +       if (tgt->list[i].key)
> +         tgt->list[i].key->virtual_refcount++;
> +
>        free (tgt);
>        tgt = NULL;
>      }

So that last item is the only difference between
'GOMP_MAP_VARS_ENTER_DATA' and 'GOMP_MAP_VARS_OPENACC_ENTER_DATA'.  Again
I have not digested that one, but will trust you.

> @@ -1310,7 +1366,7 @@ gomp_load_image_to_device (struct gomp_device_descr 
> *devicep, unsigned version,
>        k->tgt = tgt;
>        k->tgt_offset = target_table[i].start;
>        k->refcount = REFCOUNT_INFINITY;
> -      k->link_key = NULL;
> +      k->virtual_refcount = 0;
>        array->left = NULL;
>        array->right = NULL;
>        splay_tree_insert (&devicep->mem_map, array);
> @@ -1342,7 +1398,7 @@ gomp_load_image_to_device (struct gomp_device_descr 
> *devicep, unsigned version,
>        k->tgt = tgt;
>        k->tgt_offset = target_var->start;
>        k->refcount = target_size & link_bit ? REFCOUNT_LINK : 
> REFCOUNT_INFINITY;
> -      k->link_key = NULL;
> +      k->virtual_refcount = 0;
>        array->left = NULL;
>        array->right = NULL;
>        splay_tree_insert (&devicep->mem_map, array);

Why no longer initialize 'link_key' here?

I'd expect that always all fields of 'k' ('struct splay_tree_key_s') get
initialized, so like:

> @@ -2612,6 +2652,8 @@ omp_target_associate_ptr (const void *host_ptr, const 
> void *device_ptr,
>        k->tgt = tgt;
>        k->tgt_offset = (uintptr_t) device_ptr + device_offset;
>        k->refcount = REFCOUNT_INFINITY;
> +      k->virtual_refcount = 0;
> +      k->u.link_key = NULL;
>        array->left = NULL;
>        array->right = NULL;
>        splay_tree_insert (&devicep->mem_map, array);

(I haven't verified whether that's always done, please verify.)


Grüße
 Thomas

Attachment: signature.asc
Description: PGP signature

Reply via email to