On 2017/6/27 6:45 AM, Cesar Philippidis wrote:
>> (1) Instead of essentially implementing the entire OpenACC async support
>> inside the plugin, we now use an opaque 'goacc_asyncqueue' implemented
>> by the plugin, along with core 'test', 'synchronize', 'serialize', etc.
>> plugin functions. Most of the OpenACC specific logic is pulled into
>> libgomp/oacc-async.c
> I'm not sure if plugins need to maintain backwards compatibility.
> However, I don't see any changes inside libgomp.map, so maybe it's not
> required.

This patch is pretty large, but only inner workings (including libgomp vs. 
plugin interface) were modified.
No user API compatibility was altered.

>> (3) For 'wait + async', we now add a local thread synchronize, instead
>> of just ordering the streams.
>>
>> (4) To work with the (3) change, some front end changes were added to
>> propagate argument-less wait clauses as 'wait(GOACC_ASYNC_NOVAL)' to
>> represent a 'wait all'.
> What's the significance of GOMP_ASYNC_NOVAL? Wouldn't it have been
> easier to make that change in the gimplifier?

Actually, we were basically throwing away argument-less wait clauses in 
front-ends
before this patch; i.e. '#pragma acc parallel async' and '#pragma acc parallel 
wait async'
were internally the same.

The use of GOMP_ASYNC_NOVAL (-1) was just following the current 'async' clause 
representation
convention.

>> Patch was tested to have no regressions on gomp-4_0-branch. I'll commit
>> this after the weekend (or Tues.)
>>      * plugin/plugin-nvptx.c (struct cuda_map): Remove.
>>         (GOMP_OFFLOAD_openacc_exec): Adjust parameters and code.
>>         (GOMP_OFFLOAD_openacc_async_exec): New plugin hook function.
> These two functions seem extremely similar.  I wonder if you should
> consolidate them.

It would be nice to have a proper set of pthreads based host fallback hooks
for the openacc.async substruct later. Ideally, an accelerator plugin can
just implement GOMP_OFFLOAD_openacc_exec, and the default host pthreads-based
GOMP_OFFLOAD_openacc_async_exec can be implemented in terms of
the synchronous GOMP_OFFLOAD_openacc_exec. Combining the two hook routines
would make this less clean.

> Overall, I like how you were able eliminate the externally managed map_*
> data structure which was used to pass in arguments to nvptx_exec.
> Although I wonder if we should just pass in those individual arguments
> directly to cuLaunchKernel. But that's a big change in itself.

I didn't think of that when working on the current patch, maybe later.

Thanks,
Chung-Lin

Reply via email to