On Thu, 15 Oct 2015, Jakub Jelinek wrote:
> Looking at Cuda, for async target region kernels we'd probably use
> a non-default stream and enqueue the async kernel in there.  I see
> we can e.g. cudaEventRecord into the stream and then either cudaEventQuery
> to busy poll the event, or cudaEventSynchronize to block until the event
> occurs, plus there is cudaStreamWaitEvent that perhaps might be even used to
> resolve the above mentioned mapping/unmapping async issues for Cuda
> - like add an event after the mapping operations that the other target tasks
> could wait for if they see any in_flux stuff, and wait for an event etc.
> I don't see a possibility to have something like a callback on stream
> completion though, so it has to be handled with polling.

Not sure why you say so.  There's cu[da]StreamAddCallback, which exists
exactly for registering a completion callback, but there are restrictions:

  - this functionality doesn't currently work through CUDA MPS ("multi-process
    server", for funneling CUDA calls from different processes through a
    single "server" process, avoiding context-switch overhead on the device,
    sometimes used for CUDA-with-MPI applications);

  - it is explicitly forbidden to invoke CUDA API calls from the callback;
    perhaps understandable, as the callback may be running in a signal-handler
    context (unlikely), or, more plausibly, in a different thread than the one
    that registered the callback.

Ideally we'd queue all accelerator work up front via
EventRecord/StreamWaitEvent, and not rely on callbacks.

If host-side work must be done on completion, we could spawn a helper thread
waiting on cudaEventSynchronize.

> > --- /dev/null
> > +++ b/libgomp/testsuite/libgomp.c/target-tmp.c
> > @@ -0,0 +1,40 @@
> > +#include <stdio.h>
> > +#include <unistd.h>
> > +
> > +#pragma omp declare target
> > +void foo (int n)
> > +{
> > +  printf ("Start tgt %d\n", n);
> > +  usleep (5000000);
>
> 5s is too long.  Not to mention that not sure if PTX can do printf
> and especially usleep.

printf is available, usleep is not (but presumably the use of usleep needs to
be revised anyway).

Alexander