On 2017/6/27 6:45 AM, Cesar Philippidis wrote: >> (1) Instead of essentially implementing the entire OpenACC async support >> inside the plugin, we now use an opaque 'goacc_asyncqueue' implemented >> by the plugin, along with core 'test', 'synchronize', 'serialize', etc. >> plugin functions. Most of the OpenACC specific logic is pulled into >> libgomp/oacc-async.c > I'm not sure if plugins need to maintain backwards compatibility. > However, I don't see any changes inside libgomp.map, so maybe it's not > required.
This patch is pretty large, but only inner workings (including libgomp vs. plugin interface) were modified. No user API compatibility was altered. >> (3) For 'wait + async', we now add a local thread synchronize, instead >> of just ordering the streams. >> >> (4) To work with the (3) change, some front end changes were added to >> propagate argument-less wait clauses as 'wait(GOACC_ASYNC_NOVAL)' to >> represent a 'wait all'. > What's the significance of GOMP_ASYNC_NOVAL? Wouldn't it have been > easier to make that change in the gimplifier? Actually, we were basically throwing away argument-less wait clauses in front-ends before this patch; i.e. '#pragma acc parallel async' and '#pragma acc parallel wait async' were internally the same. The use of GOMP_ASYNC_NOVAL (-1) was just following the current 'async' clause representation convention. >> Patch was tested to have no regressions on gomp-4_0-branch. I'll commit >> this after the weekend (or Tues.) >> * plugin/plugin-nvptx.c (struct cuda_map): Remove. >> (GOMP_OFFLOAD_openacc_exec): Adjust parameters and code. >> (GOMP_OFFLOAD_openacc_async_exec): New plugin hook function. > These two functions seem extremely similar. I wonder if you should > consolidate them. It would be nice to have a proper set of pthreads based host fallback hooks for the openacc.async substruct later. Ideally, an accelerator plugin can just implement GOMP_OFFLOAD_openacc_exec, and the default host pthreads-based GOMP_OFFLOAD_openacc_async_exec can be implemented in terms of the synchronous GOMP_OFFLOAD_openacc_exec. Combining the two hook routines would make this less clean. > Overall, I like how you were able eliminate the externally managed map_* > data structure which was used to pass in arguments to nvptx_exec. > Although I wonder if we should just pass in those individual arguments > directly to cuLaunchKernel. But that's a big change in itself. I didn't think of that when working on the current patch, maybe later. Thanks, Chung-Lin