On Mon, Oct 26, 2015 at 09:35:52PM +0300, Ilya Verbin wrote:
> On Fri, Jul 17, 2015 at 15:05:59 +0200, Jakub Jelinek wrote:
> > As the testcases show, #pragma omp declare target has now a new form (well,
> > two; with some issues on it pending), where it is used just as a single
> > declarative directive rather than a pair of them and allows marking
> > vars and functions by name as "omp declare target" vars/functions (which the
> > middle-end etc. already handles), but also "omp declare target link", which
> > is a deferred var, that is not initially mapped (on devices without shared
> > memory with host), but has to be mapped explicitly.
> 
> I don't quite understand how link should work.  OpenMP 4.5 says:
> 
> "The list items of a link clause are not mapped by the declare target 
> directive.
> Instead, their mapping is deferred until they are mapped by target data or
> target constructs. They are mapped only for such regions."
>
> But doesn't this mean that the example bellow should work identically
> with/without USE_LINK defined?  Or is there some difference on other 
> testcases?

On your testcase, the end result is pretty much the same, the variable is
not mapped initially to the device, and at the beginning of omp target it is
mapped to device, at the end of the region it is unmapped from the device
(without copying back).

But consider:

int a = 1, b = 1;
#pragma omp declare target link (a) to (b)
int
foo (void)
{
  return a++ + b++;
}
#pragma omp declare target to (foo)
int
main ()
{
  a = 2;
  b = 2;
  int res;
  #pragma omp target map (to: a, b) map (from: res)
  {
    res = foo () + foo ();
  }
  // This assumes only non-shared address space, so would need to be guarded
  // for that.
  if (res != (2 + 1) + (3 + 2))
    __builtin_abort ();
  return 0;
}

Without declare target link or to, you can't use the global variables
in orphaned accelerated routines (unless you e.g. take the address of the
mapped variable in the region and pass it around).
The to variables (non-deferred) are always mapped and are initialized with
the original initializer, refcount is infinity.  link (deferred) work more
like the normal mapping, referencing those vars when they aren't explicitly
(or implicitly) mapped is unspecified behavior, if it is e.g. mapped freshly
with to kind, it gets the current value of the host var rather than the
original one.  But, beyond the mapping the compiler needs to ensure that
all uses of the link global var (or perhaps just all uses of the link global
var outside of the target construct body where it is mapped, because you
could use there the pointer you got from GOMP_target) are replaced by
dereference of some artificial pointer, so a becomes *a_tmp and &a becomes
&*a_tmp, and that the runtime library during registration of the tables is
told about the address of this artificial pointer.  During registration,
I'd expect it would stick an entry for this range into the table, with some
special flag or something similar, indicating that it is deferred mapping
and where the offloading device pointer is.  During mapping, it would map it
as any other not yet mapped object, but additionally would also set this
device pointer to the device address of the mapped object.  We also need to
ensure that when we drop the refcount of that mapping back to 0, we get it
back to the state where it is described as a range with registered deferred
mapping and where the device pointer is.

> > This patch only marks them with the new attribute, the actual middle-end
> > implementation needs to be implemented.
> > 
> > I believe OpenACC has something similar, but no idea if it is already
> > implemented.
> > 
> > Anyway, I think the implementation should be that in some pass running on
> > the ACCEL_COMPILER side (guarded by separate address space aka non-HSA)
> 
> HSA does not define ACCEL_COMPILER, because it uses only one compiler.

HSA is a non-issue here, as it has shared address space, therefore map
clause does nothing, declare target to or link clauses also don't do
anything.

> > we actually replace the variables with pointers to variables, then need
> > to somehow also mark those in the offloading tables, so that the library
> 
> I see 2 possible options: use the MSB of the size, or introduce the third 
> field
> for flags.

Well, it can be either recorded in the host variable tables (which contain
address and size pair, right), or in corresponding offloading device table
(which contains the pointer, something else?).

        Jakub

Reply via email to