On Tue, Oct 24, 2017 at 08:39:13PM +0300, Alexander Monakov wrote:
> On Tue, 24 Oct 2017, Jakub Jelinek wrote:
> > loop transferring the addresses or firstprivate_int values to the device
> > - where we issued mapnum host2dev transfers each just pointer-sized
> > when we could have just prepared all the pointers in an array and host2dev
> > copy them all together.
>
> Can you please give an example OpenMP code?  I thought such variables are
> just fields of one omp_data_? struct that is copied all at once, but I guess
> I'm misunderstanding.

Basically anything with multiple mappings.

void foo () {
  int a[10], b[10], c[10], d, e, f;
  struct S { int g[10]; } h;
  init (a, b, c, &d, &e, &f, &h);
  #pragma omp target map(to:a, b, c) firstprivate (d, e, f, h)
  use (a, b, c, d, e, f, h);
}

The above has mapnum 7, if none of this is mapped, then the current trunk
will perform 3 host2dev 40 byte copies for the 3 arrays, 1 40 byte copy for
the firstprivate h, and 7 pointer-sized copies for the addresses of the 3
arrays, one firstprivate struct and 3 ints encoded in pointers.
As all the 4 40 byte allocations plus the 7 * pointer sized allocations
are adjacent with no gaps, with the patch there will be a single host2dev
transfer of 160+7*sizeof(void*) bytes.

> > +struct gomp_map_cache
> > +{
> > +  void *buf;
> > +  struct target_mem_desc *tgt;
> > +  size_t *chunks;
> > +  long chunk_cnt;
> > +  long use_cnt;
> > +};
>
> Would really appreciate comments for meaning of fields here.  Also, is the
> struct properly named?  From the patch description I understood it to be a
> copy coalescing buffer, not a cache.

I'll rename it and add comments.

> > @@ -449,19 +531,34 @@ gomp_map_vars (struct gomp_device_descr
> >                size_t align = (size_t) 1 << (kind >> rshift);
> >                if (tgt_align < align)
> >                  tgt_align = align;
> > -              tgt_size -= (uintptr_t) hostaddrs[first]
> > -                          - (uintptr_t) hostaddrs[i];
> > +              tgt_size -= (uintptr_t) hostaddrs[first] - cur_node.host_start;
> >                tgt_size = (tgt_size + align - 1) & ~(align - 1);
> > -              tgt_size += cur_node.host_end - (uintptr_t) hostaddrs[i];
> > +              tgt_size += cur_node.host_end - cur_node.host_start;
> >                not_found_cnt += last - i;
> >                for (i = first; i <= last; i++)
> > -                tgt->list[i].key = NULL;
> > +                {
> > +                  tgt->list[i].key = NULL;
> > +                  switch (get_kind (short_mapkind, kinds, i) & typemask)
> > +                    {
> > +                    case GOMP_MAP_ALLOC:
> > +                    case GOMP_MAP_FROM:
> > +                    case GOMP_MAP_FORCE_ALLOC:
> > +                    case GOMP_MAP_ALWAYS_FROM:
> > +                      break;
> > +                    default:
> > +                      /* All the others copy data if newly allocated.  */
> > +                      gomp_cache_add (&cache, tgt_size - cur_node.host_end
> > +                                              + (uintptr_t) hostaddrs[i],
> > +                                      sizes[i]);
>
> A similar switch needed to be duplicated below.  Would it be appropriate to
> pass the map kind to gomp_cache_add, or have a thin wrapper around it to have
> checks for appropriate kinds in one place?

No, I'd prefer to keep the logic out of gomp_cache_add, but can add an
inline predicate whether kind writes to device.

	Jakub
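
For illustration only, the inline predicate mentioned above might look
roughly like the sketch below.  It is not the actual patch: the enum is a
hypothetical stand-in for the GOMP_MAP_* kinds listed in the hunk (the real
constants and their values live in libgomp's headers and are not reproduced
here), and the names sketch_kind_writes_to_device and sketch_coalesced_size
are made up.  The second helper just restates the size arithmetic from the
foo () example, assuming 40-byte blocks as described there.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the map kinds named in the hunk.  These are
   NOT the real GOMP_MAP_* constants or values.  */
enum sketch_map_kind
{
  SKETCH_MAP_ALLOC,
  SKETCH_MAP_TO,
  SKETCH_MAP_FROM,
  SKETCH_MAP_TOFROM,
  SKETCH_MAP_FORCE_ALLOC,
  SKETCH_MAP_ALWAYS_FROM
};

/* Return true if a newly allocated mapping of KIND copies data from host
   to device, mirroring the switch in the quoted hunk: alloc-only and
   from-only kinds do not, all the others do.  */
static inline bool
sketch_kind_writes_to_device (enum sketch_map_kind kind)
{
  switch (kind)
    {
    case SKETCH_MAP_ALLOC:
    case SKETCH_MAP_FROM:
    case SKETCH_MAP_FORCE_ALLOC:
    case SKETCH_MAP_ALWAYS_FROM:
      return false;
    default:
      return true;
    }
}

/* Size in bytes of the single coalesced transfer for the foo () example:
   four adjacent 40-byte blocks (a, b, c and the firstprivate h) followed
   by seven pointer-sized slots.  PTR_SIZE is the pointer size to assume.  */
static inline size_t
sketch_coalesced_size (size_t ptr_size)
{
  return 4 * 40 + 7 * ptr_size;
}
```

With something of that shape, both call sites could guard the
gomp_cache_add call with the predicate instead of duplicating the switch,
while gomp_cache_add itself stays unaware of map kinds.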