On Tue, Oct 24, 2017 at 08:39:13PM +0300, Alexander Monakov wrote:
> On Tue, 24 Oct 2017, Jakub Jelinek wrote:
> > loop transferring the addresses or firstprivate_int values to the device
> > - where we issued mapnum host2dev transfers each just pointer-sized
> > when we could have just prepared all the pointers in an array and host2dev
> > copy them all together.
> 
> Can you please give an example of OpenMP code? I thought such variables are
> just fields of one omp_data_? struct that is copied all at once, but I guess
> I'm misunderstanding.

Basically anything with multiple mappings.
void foo () {
  int a[10], b[10], c[10], d, e, f;
  struct S { int g[10]; } h;
  init (a, b, c, &d, &e, &f, &h);
  #pragma omp target map(to:a, b, c) firstprivate (d, e, f, h)
  use (a, b, c, d, e, f, h);
}
The above has mapnum 7.  If none of this is already mapped, the current trunk
will perform 3 host2dev copies of 40 bytes each for the 3 arrays, one 40-byte
copy for the firstprivate h, and 7 pointer-sized copies: the addresses of the
3 arrays and of the firstprivate struct, plus the 3 ints encoded in pointers.
As the four 40-byte allocations and the 7 pointer-sized slots are all
adjacent with no gaps, with the patch there will be a single host2dev
transfer of 160+7*sizeof(void*) bytes.
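
To make the arithmetic concrete, here is a minimal sketch of the coalescing
idea.  It is not the code from the patch: struct region, coalesce_regions and
example_layout are hypothetical names, and putting the pointer slots first
in the device block is only an assumption for the sake of illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical descriptor: byte range [start, end) inside the device block.  */
struct region { size_t start; size_t end; };

/* Merge regions that touch or overlap, in place.  The input must be sorted
   by start offset.  Returns the number of regions left, i.e. the number of
   host2dev transfers that would still be needed.  */
static size_t
coalesce_regions (struct region *r, size_t n)
{
  if (n == 0)
    return 0;
  size_t out = 0;
  for (size_t i = 1; i < n; i++)
    {
      assert (r[i].start >= r[out].start);  /* sorted precondition */
      if (r[i].start <= r[out].end)
        {
          /* No gap: extend the current region.  */
          if (r[i].end > r[out].end)
            r[out].end = r[i].end;
        }
      else
        r[++out] = r[i];  /* Gap: start a new region.  */
    }
  return out + 1;
}

/* Layout for the foo () example: 7 pointer-sized slots followed by four
   40-byte blocks (a, b, c and h).  R must have room for 11 entries.
   Returns the number of regions written.  */
static size_t
example_layout (struct region *r)
{
  size_t off = 0, n = 0;
  for (int i = 0; i < 7; i++)
    {
      r[n].start = off;
      off += sizeof (void *);
      r[n++].end = off;
    }
  for (int i = 0; i < 4; i++)
    {
      r[n].start = off;
      off += 40;
      r[n++].end = off;
    }
  return n;
}
```

Feeding the 11 regions from example_layout into coalesce_regions leaves one
region spanning 160+7*sizeof(void*) bytes, versus 11 separate copies without
coalescing.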

> > +struct gomp_map_cache
> > +{
> > +  void *buf;
> > +  struct target_mem_desc *tgt;
> > +  size_t *chunks;
> > +  long chunk_cnt;
> > +  long use_cnt;
> > +};
> 
> Would really appreciate comments for meaning of fields here.  Also, is the
> struct properly named?  From the patch description I understood it to be a
> copy coalescing buffer, not a cache.

I'll rename it and add comments.

> > @@ -449,19 +531,34 @@ gomp_map_vars (struct gomp_device_descr
> >           size_t align = (size_t) 1 << (kind >> rshift);
> >           if (tgt_align < align)
> >             tgt_align = align;
> > -         tgt_size -= (uintptr_t) hostaddrs[first]
> > -                     - (uintptr_t) hostaddrs[i];
> > +         tgt_size -= (uintptr_t) hostaddrs[first] - cur_node.host_start;
> >           tgt_size = (tgt_size + align - 1) & ~(align - 1);
> > -         tgt_size += cur_node.host_end - (uintptr_t) hostaddrs[i];
> > +         tgt_size += cur_node.host_end - cur_node.host_start;
> >           not_found_cnt += last - i;
> >           for (i = first; i <= last; i++)
> > -           tgt->list[i].key = NULL;
> > +           {
> > +             tgt->list[i].key = NULL;
> > +             switch (get_kind (short_mapkind, kinds, i) & typemask)
> > +               {
> > +               case GOMP_MAP_ALLOC:
> > +               case GOMP_MAP_FROM:
> > +               case GOMP_MAP_FORCE_ALLOC:
> > +               case GOMP_MAP_ALWAYS_FROM:
> > +                 break;
> > +               default:
> > +                 /* All the others copy data if newly allocated.  */
> > +                 gomp_cache_add (&cache, tgt_size - cur_node.host_end
> > +                                         + (uintptr_t) hostaddrs[i],
> > +                                 sizes[i]);
> 
> A similar switch needed to be duplicated below.  Would it be appropriate to
> pass the map kind to gomp_cache_add, or have a thin wrapper around it to have
> checks for appropriate kinds in one place?

No, I'd prefer to keep the logic out of gomp_cache_add, but I can add an
inline predicate for whether the kind writes to the device.
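
Roughly along these lines; this is only a sketch, not the final code.  The
enum is a stand-in whose names mirror the kinds in the switch quoted above
but whose values are placeholders rather than libgomp's real encoding, and
the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in map kinds; the values are placeholders, NOT libgomp's encoding.  */
enum example_map_kind
{
  EX_MAP_ALLOC,
  EX_MAP_TO,
  EX_MAP_FROM,
  EX_MAP_TOFROM,
  EX_MAP_FORCE_ALLOC,
  EX_MAP_ALWAYS_FROM
};

/* Hypothetical predicate: return true if a newly allocated mapping of KIND
   copies data from the host to the device.  */
static inline bool
example_kind_writes_to_device_p (int kind)
{
  switch (kind)
    {
    case EX_MAP_ALLOC:
    case EX_MAP_FROM:
    case EX_MAP_FORCE_ALLOC:
    case EX_MAP_ALWAYS_FROM:
      return false;
    default:
      /* All the others copy data if newly allocated.  */
      return true;
    }
}

/* Hypothetical caller: count how many of the N kinds would need a
   host2dev copy, with the kind check kept in one place.  */
static size_t
example_count_device_writes (const int *kinds, size_t n)
{
  assert (kinds != NULL || n == 0);
  size_t cnt = 0;
  for (size_t i = 0; i < n; i++)
    if (example_kind_writes_to_device_p (kinds[i]))
      cnt++;
  return cnt;
}
```

Both places that currently carry the switch could then test the predicate
before calling gomp_cache_add, which keeps the adder itself free of map-kind
knowledge.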

        Jakub
