On Sat, 14 Dec 2019 00:19:04 +0000 Julian Brown <jul...@codesourcery.com> wrote:
> On Fri, 13 Dec 2019 16:25:25 +0100 > Thomas Schwinge <tho...@codesourcery.com> wrote: > > > Hi Julian! > > > > On 2019-10-29T12:15:01+0000, Julian Brown <jul...@codesourcery.com> > > wrote: > > > static int > > > -find_pointer (int pos, size_t mapnum, unsigned short *kinds) > > > +find_group_last (int pos, size_t mapnum, unsigned short *kinds) > > > { > > > - if (pos + 1 >= mapnum) > > > - return 0; > > > + unsigned char kind0 = kinds[pos] & 0xff; > > > + int first_pos = pos, last_pos = pos; > > > > > > - unsigned char kind = kinds[pos+1] & 0xff; > > > - > > > - if (kind == GOMP_MAP_TO_PSET) > > > - return 3; > > > - else if (kind == GOMP_MAP_POINTER) > > > - return 2; > > > + if (kind0 == GOMP_MAP_TO_PSET) > > > + { > > > + while (pos + 1 < mapnum && (kinds[pos + 1] & 0xff) == > > > GOMP_MAP_POINTER) > > > + last_pos = ++pos; > > > + /* We expect at least one GOMP_MAP_POINTER after a > > > GOMP_MAP_TO_PSET. */ > > > + assert (last_pos > first_pos); > > > + } > > > + else > > > + { > > > + /* GOMP_MAP_ALWAYS_POINTER can only appear directly after > > > some other > > > + mapping. */ > > > + if (pos + 1 < mapnum > > > + && (kinds[pos + 1] & 0xff) == GOMP_MAP_ALWAYS_POINTER) > > > + return pos + 1; > > > + > > > + /* We can have one or several GOMP_MAP_POINTER mappings > > > after a to/from > > > + (etc.) mapping. */ > > > + while (pos + 1 < mapnum && (kinds[pos + 1] & 0xff) == > > > GOMP_MAP_POINTER) > > > + last_pos = ++pos; > > > + } > > > > > > - return 0; > > > + return last_pos; > > > } > > > > So I ran a simple experiment where I did: > > > > assert (find_group_last (i, mapnum, kinds) == i + pointer); > > > > ... where 'pointer' is the current 'find_pointer' function. (That > > is, compare that the old and new way are doing the same things, > > given the current GCC code generation/test cases.) > > > > This 'assert' triggers for a few test cases: > > 'libgomp.oacc-fortran/allocatable-array-1.f90', > > 'libgomp.oacc-fortran/data-2.f90', > > 'libgomp.oacc-fortran/data-3.f90', > > 'libgomp.oacc-fortran/data-4-2.f90', > > 'libgomp.oacc-fortran/data-4.f90', > > 'libgomp.oacc-fortran/data-5.f90', 'libgomp.oacc-fortran/if-1.f90', > > 'libgomp.oacc-fortran/optional-data-enter-exit.f90'. (Maybe those > > are the only ones actually using that stuff?) > > > > I looked into the first one > > ('libgomp.oacc-fortran/allocatable-array-1.f90'), and for: > > > > integer, parameter :: n = 40 > > integer, allocatable :: ar(:,:,:) > > > > allocate (ar(1:n,0:n-1,0:n-1)) > > !$acc enter data copyin (ar) > > > > ... found: > > > > (gdb) print mapnum > > $2 = 3 > > (gdb) print kinds[0] > > $3 = 1 // GOMP_MAP_TO > > (gdb) print kinds[1] > > $4 = 773 > > (gdb) print kinds[1] & 0xff > > $5 = 5 // GOMP_MAP_TO_PSET > > (gdb) print kinds[2] > > $6 = 772 > > (gdb) print kinds[2] & 0xff > > $7 = 4 // GOMP_MAP_POINTER > > > > Current behavior: 'find_pointer (0, mapnum, kinds) == 3', so all > > three get mapped as one group. > > > > New behavior: 'find_group_last (0, mapnum, kinds) == 0', so the > > 'GOMP_MAP_TO' gets mapped alone. Then, 'find_group_last (1, mapnum, > > kinds) == 2', so the 'GOMP_MAP_TO_PSET', 'GOMP_MAP_POINTER' get > > mapped as one group. > > > > Is that intentional? > > Yes. In a previous iteration of the refcount overhaul patch, we had > the "magic" code fragment: > > > + for (int j = 0; j < 2; j++) > > + gomp_map_vars_async (acc_dev, aq, > > + (j == 0 || pointer == 2) ? > > 1 : 2, > > + &hostaddrs[i + j], NULL, > > + &sizes[i + j], &kinds[i + j], > > true, > > + > > GOMP_MAP_VARS_OPENACC_ENTER_DATA); > > The "pointer == 3" case here will do precisely the same thing as the > current iteration of the patch: pass the GOMP_MAP_TO to one > gomp_map_vars_async call, and pass the GOMP_MAP_TO_PSET + > GOMP_MAP_POINTER as a pair in a second call. > > The "pointer == 2" case (i.e. with a GOMP_MAP_TO and a > GOMP_MAP_POINTER) will also handle the mappings separately in both the > earlier patch iteration and this one. > > That's different from the current behaviour, because we don't want all > three mappings to be bound together. The problematic cases of doing > so might only appear with the manual deep copy patch applied also, > though (and/or with the refcount-checking patch applied/enabled). (I > don't remember exactly which test cases this affected, but I can > check.) To follow up from this: the change in this patch is really to ensure that reference counts are correct/consistent for *all* mappings at all times. Contrast the behaviour described in the following comment in the existing code (goacc_insert_pointer): /* ... Only the first mapping is considered in reference counting; the following ones implicitly follow suit. */ This is problematic with automated checking since the "hidden" mappings will have incorrect counts, and the problem becomes worse when the GOMP_MAP_ATTACH, etc. mappings are added by the manual deep copy patch. I tweaked the patch together with some debug-dumping code, and the change from "find_pointer-like" behaviour and "find_group_last-like" behaviour can be seen as follows (from deep-copy-8.c): with find_pointer: mapping group 0-4 0 : gomp_map_struct 0x7ffd5aa10ce0 4 1 : gomp_map_to 0x7ffd5aa10ce0 4 2 : gomp_map_alloc 0x7ffd5aa10ce8 8 3 : gomp_map_alloc 0x7ffd5aa10cf0 8 4 : gomp_map_alloc 0x7ffd5aa10cf8 8 mapping group 5-6 0 : gomp_map_to 0x14ee050 400 1 : gomp_map_attach 0x7ffd5aa10ce8 0 mapping group 7-8 0 : gomp_map_to 0x14ee1f0 400 1 : gomp_map_attach 0x7ffd5aa10cf0 0 mapping group 9-10 0 : gomp_map_to 0x14ee390 400 1 : gomp_map_attach 0x7ffd5aa10cf8 0 with find_group_last: mapping group 0-4 0 : gomp_map_struct 0x7ffc9011c3b0 4 1 : gomp_map_to 0x7ffc9011c3b0 4 2 : gomp_map_alloc 0x7ffc9011c3b8 8 3 : gomp_map_alloc 0x7ffc9011c3c0 8 4 : gomp_map_alloc 0x7ffc9011c3c8 8 mapping group 5-5 0 : gomp_map_to 0x10e0050 400 mapping group 6-6 0 : gomp_map_attach 0x7ffc9011c3b8 0 mapping group 7-7 0 : gomp_map_to 0x10e01f0 400 mapping group 8-8 0 : gomp_map_attach 0x7ffc9011c3c0 0 mapping group 9-9 0 : gomp_map_to 0x10e0390 400 mapping group 10-10 0 : gomp_map_attach 0x7ffc9011c3c8 0 In the former case, each grouped "gomp_map_to/gomp_map_attach" will form a single target_mem_desc. Then, goacc_exit_data_internal (or the previous code it replaces) performs unmapping one mapping (splay tree key) at a time. If any of these splay trees reference count hits zero, gomp_remove_var_async will be called, and then (I think) that grouping-together becomes problematic: the reference for the "other" splay tree key in the target_mem_desc's list gets lost. > > Any then, compating that to > > 'libgomp/target.c:GOMP_target_enter_exit_data', where (aside from > > 'GOMP_MAP_STRUCT'; not relevant for us right now, yes?) everything > > always gets mapped alone: > > > > for (i = 0; i < mapnum; i++) > > if ((kinds[i] & 0xff) == GOMP_MAP_STRUCT) > > { [...] } > > else > > gomp_map_vars (devicep, 1, &hostaddrs[i], NULL, &sizes[i], > > &kinds[i], true, GOMP_MAP_VARS_ENTER_DATA); > > > > Is it just an "accident" that for OpenACC we were and still are > > going to do this differently, or is there an actual reason? > > ...why mapping one-at-a-time is the right thing to do here. Maybe the > OpenMP version never sees GOMP_MAP_TO_PSET (or > GOMP_MAP_ALWAYS_POINTER, which has a hard-wired dependency on the > previous clause)? (I can try to check that too.) Actually it looks like GOMP_MAP_TO_PSET can occur in GOMP_target_enter_exit_data, but it seems that only a single test case exercises that (libgomp.fortran/target9.f90). I'd guess probably either way works -- either with GOMP_MAP_POINTER grouped together after a related GOMP_MAP_TO_PSET, or not. Thanks, Julian