On Mon, Oct 26, 2015 at 09:35:52PM +0300, Ilya Verbin wrote: > On Fri, Jul 17, 2015 at 15:05:59 +0200, Jakub Jelinek wrote: > > As the testcases show, #pragma omp declare target has now a new form (well, > > two; with some issues on it pending), where it is used just as a single > > declarative directive rather than a pair of them and allows marking > > vars and functions by name as "omp declare target" vars/functions (which the > > middle-end etc. already handles), but also "omp declare target link", which > > is a deferred var, that is not initially mapped (on devices without shared > > memory with host), but has to be mapped explicitly. > > I don't quite understand how link should work. OpenMP 4.5 says: > > "The list items of a link clause are not mapped by the declare target > directive. > Instead, their mapping is deferred until they are mapped by target data or > target constructs. They are mapped only for such regions." > > But doesn't this mean that the example bellow should work identically > with/without USE_LINK defined? Or is there some difference on other > testcases?
On your testcase, the end result is pretty much the same, the variable is not mapped initially to the device, and at the beginning of omp target it is mapped to device, at the end of the region it is unmapped from the device (without copying back). But consider: int a = 1, b = 1; #pragma omp declare target link (a) to (b) int foo (void) { return a++ + b++; } #pragma omp declare target to (foo) int main () { a = 2; b = 2; int res; #pragma omp target map (to: a, b) map (from: res) { res = foo () + foo (); } // This assumes only non-shared address space, so would need to be guarded // for that. if (res != (2 + 1) + (3 + 2)) __builtin_abort (); return 0; } Without declare target link or to, you can't use the global variables in orphaned accelerated routines (unless you e.g. take the address of the mapped variable in the region and pass it around). The to variables (non-deferred) are always mapped and are initialized with the original initializer, refcount is infinity. link (deferred) work more like the normal mapping, referencing those vars when they aren't explicitly (or implicitly) mapped is unspecified behavior, if it is e.g. mapped freshly with to kind, it gets the current value of the host var rather than the original one. But, beyond the mapping the compiler needs to ensure that all uses of the link global var (or perhaps just all uses of the link global var outside of the target construct body where it is mapped, because you could use there the pointer you got from GOMP_target) are replaced by dereference of some artificial pointer, so a becomes *a_tmp and &a becomes &*a_tmp, and that the runtime library during registration of the tables is told about the address of this artificial pointer. During registration, I'd expect it would stick an entry for this range into the table, with some special flag or something similar, indicating that it is deferred mapping and where the offloading device pointer is. During mapping, it would map it as any other not yet mapped object, but additionally would also set this device pointer to the device address of the mapped object. We also need to ensure that when we drop the refcount of that mapping back to 0, we get it back to the state where it is described as a range with registered deferred mapping and where the device pointer is. > > This patch only marks them with the new attribute, the actual middle-end > > implementation needs to be implemented. > > > > I believe OpenACC has something similar, but no idea if it is already > > implemented. > > > > Anyway, I think the implementation should be that in some pass running on > > the ACCEL_COMPILER side (guarded by separate address space aka non-HSA) > > HSA does not define ACCEL_COMPILER, because it uses only one compiler. HSA is a non-issue here, as it has shared address space, therefore map clause does nothing, declare target to or link clauses also don't do anything. > > we actually replace the variables with pointers to variables, then need > > to somehow also mark those in the offloading tables, so that the library > > I see 2 possible options: use the MSB of the size, or introduce the third > field > for flags. Well, it can be either recorded in the host variable tables (which contain address and size pair, right), or in corresponding offloading device table (which contains the pointer, something else?). Jakub