On Sat, 14 Dec 2019 00:19:04 +0000
Julian Brown <jul...@codesourcery.com> wrote:

> On Fri, 13 Dec 2019 16:25:25 +0100
> Thomas Schwinge <tho...@codesourcery.com> wrote:
> 
> > Hi Julian!
> > 
> > On 2019-10-29T12:15:01+0000, Julian Brown <jul...@codesourcery.com>
> > wrote:  
> > >  static int
> > > -find_pointer (int pos, size_t mapnum, unsigned short *kinds)
> > > +find_group_last (int pos, size_t mapnum, unsigned short *kinds)
> > >  {
> > > -  if (pos + 1 >= mapnum)
> > > -    return 0;
> > > +  unsigned char kind0 = kinds[pos] & 0xff;
> > > +  int first_pos = pos, last_pos = pos;
> > >  
> > > -  unsigned char kind = kinds[pos+1] & 0xff;
> > > -
> > > -  if (kind == GOMP_MAP_TO_PSET)
> > > -    return 3;
> > > -  else if (kind == GOMP_MAP_POINTER)
> > > -    return 2;
> > > +  if (kind0 == GOMP_MAP_TO_PSET)
> > > +    {
> > > +      while (pos + 1 < mapnum && (kinds[pos + 1] & 0xff) ==
> > > GOMP_MAP_POINTER)
> > > + last_pos = ++pos;
> > > +      /* We expect at least one GOMP_MAP_POINTER after a
> > > GOMP_MAP_TO_PSET.  */
> > > +      assert (last_pos > first_pos);
> > > +    }
> > > +  else
> > > +    {
> > > +      /* GOMP_MAP_ALWAYS_POINTER can only appear directly after
> > > some other
> > > +  mapping.  */
> > > +      if (pos + 1 < mapnum
> > > +   && (kinds[pos + 1] & 0xff) == GOMP_MAP_ALWAYS_POINTER)
> > > + return pos + 1;
> > > +
> > > +      /* We can have one or several GOMP_MAP_POINTER mappings
> > > after a to/from
> > > +  (etc.) mapping.  */
> > > +      while (pos + 1 < mapnum && (kinds[pos + 1] & 0xff) ==
> > > GOMP_MAP_POINTER)
> > > + last_pos = ++pos;
> > > +    }
> > >  
> > > -  return 0;
> > > +  return last_pos;
> > >  }    
> > 
> > So I ran a simple experiment where I did:
> > 
> >     assert (find_group_last (i, mapnum, kinds) == i + pointer);
> > 
> > ... where 'pointer' is the current 'find_pointer' function.  (That
> > is, compare that the old and new way are doing the same things,
> > given the current GCC code generation/test cases.)
> > 
> > This 'assert' triggers for a few test cases:
> > 'libgomp.oacc-fortran/allocatable-array-1.f90',
> > 'libgomp.oacc-fortran/data-2.f90',
> > 'libgomp.oacc-fortran/data-3.f90',
> > 'libgomp.oacc-fortran/data-4-2.f90',
> > 'libgomp.oacc-fortran/data-4.f90',
> > 'libgomp.oacc-fortran/data-5.f90', 'libgomp.oacc-fortran/if-1.f90',
> > 'libgomp.oacc-fortran/optional-data-enter-exit.f90'.  (Maybe those
> > are the only ones actually using that stuff?)
> > 
> > I looked into the first one
> > ('libgomp.oacc-fortran/allocatable-array-1.f90'), and for:
> > 
> >     integer, parameter :: n = 40
> >     integer, allocatable :: ar(:,:,:)
> > 
> >     allocate (ar(1:n,0:n-1,0:n-1))
> >     !$acc enter data copyin (ar)
> > 
> > ... found:
> > 
> >     (gdb) print mapnum
> >     $2 = 3
> >     (gdb) print kinds[0]
> >     $3 = 1 // GOMP_MAP_TO
> >     (gdb) print kinds[1]
> >     $4 = 773
> >     (gdb) print kinds[1] & 0xff
> >     $5 = 5 // GOMP_MAP_TO_PSET
> >     (gdb) print kinds[2]
> >     $6 = 772
> >     (gdb) print kinds[2] & 0xff
> >     $7 = 4 // GOMP_MAP_POINTER
> > 
> > Current behavior: 'find_pointer (0, mapnum, kinds) == 3', so all
> > three get mapped as one group.
> > 
> > New behavior: 'find_group_last (0, mapnum, kinds) == 0', so the
> > 'GOMP_MAP_TO' gets mapped alone.  Then, 'find_group_last (1, mapnum,
> > kinds) == 2', so the 'GOMP_MAP_TO_PSET', 'GOMP_MAP_POINTER' get
> > mapped as one group.
> > 
> > Is that intentional?  
> 
> Yes. In a previous iteration of the refcount overhaul patch, we had
> the "magic" code fragment:
> 
> > +         for (int j = 0; j < 2; j++)  
> > +           gomp_map_vars_async (acc_dev, aq,
> > +                                (j == 0 || pointer == 2) ?
> > 1 : 2,
> > +                                &hostaddrs[i + j], NULL,
> > +                                &sizes[i + j], &kinds[i + j],
> > true,
> > +
> > GOMP_MAP_VARS_OPENACC_ENTER_DATA);    
> 
> The "pointer == 3" case here will do precisely the same thing as the
> current iteration of the patch: pass the GOMP_MAP_TO to one
> gomp_map_vars_async call, and pass the GOMP_MAP_TO_PSET +
> GOMP_MAP_POINTER as a pair in a second call.
> 
> The "pointer == 2" case (i.e. with a GOMP_MAP_TO and a
> GOMP_MAP_POINTER) will also handle the mappings separately in both the
> earlier patch iteration and this one.
> 
> That's different from the current behaviour, because we don't want all
> three mappings to be bound together. The problematic cases of doing
> so might only appear with the manual deep copy patch applied also,
> though (and/or with the refcount-checking patch applied/enabled). (I
> don't remember exactly which test cases this affected, but I can
> check.)

To follow up from this: the change in this patch is really to ensure
that reference counts are correct/consistent for *all* mappings at all
times. Contrast the behaviour described in the following comment in the
existing code (goacc_insert_pointer):

   /* ...

   Only the first mapping is considered in reference counting; the
   following ones implicitly follow suit.  */

This is problematic with automated checking since the "hidden" mappings
will have incorrect counts, and the problem becomes worse when the
GOMP_MAP_ATTACH, etc. mappings are added by the manual deep copy patch.

I tweaked the patch together with some debug-dumping code, and the
change from "find_pointer-like" behaviour and "find_group_last-like"
behaviour can be seen as follows (from deep-copy-8.c):

with find_pointer:

mapping group 0-4
  0 : gomp_map_struct                           0x7ffd5aa10ce0          4
  1 : gomp_map_to                               0x7ffd5aa10ce0          4
  2 : gomp_map_alloc                            0x7ffd5aa10ce8          8
  3 : gomp_map_alloc                            0x7ffd5aa10cf0          8
  4 : gomp_map_alloc                            0x7ffd5aa10cf8          8
mapping group 5-6
  0 : gomp_map_to                                    0x14ee050        400
  1 : gomp_map_attach                           0x7ffd5aa10ce8          0
mapping group 7-8
  0 : gomp_map_to                                    0x14ee1f0        400
  1 : gomp_map_attach                           0x7ffd5aa10cf0          0
mapping group 9-10
  0 : gomp_map_to                                    0x14ee390        400
  1 : gomp_map_attach                           0x7ffd5aa10cf8          0

with find_group_last:

mapping group 0-4
  0 : gomp_map_struct                           0x7ffc9011c3b0          4
  1 : gomp_map_to                               0x7ffc9011c3b0          4
  2 : gomp_map_alloc                            0x7ffc9011c3b8          8
  3 : gomp_map_alloc                            0x7ffc9011c3c0          8
  4 : gomp_map_alloc                            0x7ffc9011c3c8          8
mapping group 5-5
  0 : gomp_map_to                                    0x10e0050        400
mapping group 6-6
  0 : gomp_map_attach                           0x7ffc9011c3b8          0
mapping group 7-7
  0 : gomp_map_to                                    0x10e01f0        400
mapping group 8-8
  0 : gomp_map_attach                           0x7ffc9011c3c0          0
mapping group 9-9
  0 : gomp_map_to                                    0x10e0390        400
mapping group 10-10
  0 : gomp_map_attach                           0x7ffc9011c3c8          0

In the former case, each grouped "gomp_map_to/gomp_map_attach" will
form a single target_mem_desc. Then, goacc_exit_data_internal (or the
previous code it replaces) performs unmapping one mapping (splay tree
key) at a time.

If any of these splay trees reference count hits zero,
gomp_remove_var_async will be called, and then (I think) that
grouping-together becomes problematic: the reference for the "other"
splay tree key in the target_mem_desc's list gets lost.

> > Any then, compating that to
> > 'libgomp/target.c:GOMP_target_enter_exit_data', where (aside from
> > 'GOMP_MAP_STRUCT'; not relevant for us right now, yes?) everything
> > always gets mapped alone:
> > 
> >     for (i = 0; i < mapnum; i++)
> >       if ((kinds[i] & 0xff) == GOMP_MAP_STRUCT)
> >         { [...] }
> >       else
> >         gomp_map_vars (devicep, 1, &hostaddrs[i], NULL, &sizes[i],
> > &kinds[i], true, GOMP_MAP_VARS_ENTER_DATA);
> > 
> > Is it just an "accident" that for OpenACC we were and still are
> > going to do this differently, or is there an actual reason?  
> 
> ...why mapping one-at-a-time is the right thing to do here. Maybe the
> OpenMP version never sees GOMP_MAP_TO_PSET (or
> GOMP_MAP_ALWAYS_POINTER, which has a hard-wired dependency on the
> previous clause)? (I can try to check that too.)

Actually it looks like GOMP_MAP_TO_PSET can occur in
GOMP_target_enter_exit_data, but it seems that only a single test case
exercises that (libgomp.fortran/target9.f90). I'd guess probably either
way works -- either with GOMP_MAP_POINTER grouped together after
a related GOMP_MAP_TO_PSET, or not.

Thanks,

Julian

Reply via email to