On 2018/12/17 10:32 PM, Thomas Schwinge wrote:
The reason there are deadlocks from inside the plugin on GOMP_PLUGIN_fatal() is that we hold the struct gomp_device_descr's *device* lock, which is also acquired when we execute the atexit device shutdown handlers, hence the deadlock.
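
For illustration, the pattern boils down to something like this (a standalone pthreads sketch with made-up names, not actual libgomp code; it hangs by design when run, since the same thread re-acquires a non-recursive mutex):

#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

static pthread_mutex_t device_lock = PTHREAD_MUTEX_INITIALIZER;

/* Stands in for the atexit device shutdown handler, which also needs
   the device lock.  */
static void
device_shutdown_atexit (void)
{
  pthread_mutex_lock (&device_lock);   /* <-- self-deadlocks here.  */
  puts ("shutting down device");
  pthread_mutex_unlock (&device_lock);
}

/* Stands in for GOMP_PLUGIN_fatal(): report and exit(), which runs the
   atexit handlers.  */
static void
plugin_fatal (const char *msg)
{
  fprintf (stderr, "fatal: %s\n", msg);
  exit (EXIT_FAILURE);
}

int
main (void)
{
  atexit (device_shutdown_atexit);
  pthread_mutex_lock (&device_lock);         /* Device lock held...  */
  plugin_fatal ("error inside the plugin");  /* ...and we never return to unlock it.  */
}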

I don't think this is the case for the OpenACC entry points that grab at the openacc.async.* hooks,
Ah, OK, I see.  (But I thought that method of deadlock had been fixed by
some structural changes, to have plugin functions call the
non-terminating "GOMP_PLUGIN_error" and return some error, instead of
calling "GOMP_PLUGIN_fatal"?  I may be misremembering.  Or, that's
another TODO item for later, separately...  Or, if that's actually the
case, that this has been fixed in the way I described, then should these
functions also be changed accordingly: instead of "GOMP_PLUGIN_fatal"
call "GOMP_PLUGIN_error", and then return an error code?)

You remembered correctly, although...

though I can audit them again if deemed necessary.
My understanding had been that deadlock may happen if we're inside some
of these async/wait/serialize/synchronize functions, with "async" locked,
then run into an error, then libgomp prepares to abort, and at that time
shuts down the device, which will shut down the asyncqueues
("goacc_fini_asyncqueues"), which will again try to lock "async" -- which
it actually doesn't.  My misunderstanding, I guess?

...at least now, you can see that goacc_fini_asyncqueues() does not attempt to acquire devicep->openacc.async.lock when doing cleanup.

Come to think of it, that might be a bug there. :P
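
If it does turn out to be one, the fix would presumably be along these lines (hypothetical sketch only, not a patch: bracket the existing cleanup loop with the same lock the runtime entry points take):

static void
goacc_fini_asyncqueues_sketch (struct gomp_device_descr *devicep)
{
  gomp_mutex_lock (&devicep->openacc.async.lock);
  /* ...existing per-asyncqueue destruction loop...  */
  gomp_mutex_unlock (&devicep->openacc.async.lock);
}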

"If there are two or more host threads executing and sharing the same 
accelerator device,
two asynchronous operations with the same async-value will be enqueued on the same 
activity queue"
Right, but then, in the very next sentence, it goes on to state: "If the
threads are not synchronized with respect to each other, the operations
may be enqueued in either order and therefore may execute on the device
in either order".  So this, and given that:

I actually didn't care much about that next sentence, since it's just stating the obvious :)

It also seems to imply that multiple host threads are enqueuing operations onto the same async queue, hence further corroborating that queues are device-wide, not thread-wide.
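
For example (illustrative only, not a test case), under the device-wide reading both host threads below push work onto the *same* activity queue for async-value 1, merely in whichever order they happen to run; under a thread-local reading each would get its own queue:

#include <pthread.h>
#include <openacc.h>

#define N 1024
static float a[N], b[N];

static void *
worker (void *arg)
{
  float *data = (float *) arg;
  /* Both host threads use the same async-value.  */
  #pragma acc parallel loop async(1) copy(data[0:N])
  for (int i = 0; i < N; i++)
    data[i] += 1.0f;
  return NULL;
}

int
main (void)
{
  pthread_t t1, t2;
  pthread_create (&t1, NULL, worker, a);
  pthread_create (&t2, NULL, worker, b);
  pthread_join (t1, NULL);
  pthread_join (t2, NULL);
  /* Device-wide reading: this drains the single queue 1, i.e. both
     threads' work.  Thread-local reading: it would only refer to the
     main thread's own (empty) queue 1.  */
  #pragma acc wait(1)
  return 0;
}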

That said, I recall most (if not all) of the synchronization operations and behavior are defined to be with respect to operations of the local host thread only, so the spec mentioning interaction with other host threads here may be moot, as there's no meaningful way to synchronize with the OpenACC activity of other host threads (though correct me if I forgot some API)
..., I concluded something must be wrong in the OpenACC 2.6,
2.16.1. "async clause" text, and no such (host-side) inter-thread
synchronization can be expected from OpenACC "async"/"wait".  I've also
already got that on my list of things to clarify with the OpenACC
technical committee, later on.

I just remembered, there does seem to be one area where the device-wide vs. thread-wide interpretation will be visible: when using acc_get/set_cuda_stream(). Supposedly, given the specification's device-wide queue/stream model, different host threads should access the same CUDA stream when using acc_get/set_cuda_stream(). This will break if we make async queues thread-local.
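
Something like this is what I have in mind (hypothetical usage, not from the testsuite); whether the assertion holds is exactly the device-wide vs. thread-local question:

#include <assert.h>
#include <pthread.h>
#include <openacc.h>
#include <cuda_runtime_api.h>

static void *installed_stream;

static void *
thread1 (void *arg)
{
  cudaStream_t s;
  cudaStreamCreate (&s);
  /* Attach S to async-value 1.  */
  acc_set_cuda_stream (1, (void *) s);
  installed_stream = (void *) s;
  return NULL;
}

static void *
thread2 (void *arg)
{
  /* Under the device-wide model, async-value 1 maps to the stream that
     thread 1 installed; under a thread-local model it would not.  */
  void *s = acc_get_cuda_stream (1);
  assert (s == installed_stream);
  return NULL;
}

int
main (void)
{
  pthread_t t1, t2;
  acc_init (acc_device_nvidia);
  pthread_create (&t1, NULL, thread1, NULL);
  pthread_join (t1, NULL);
  pthread_create (&t2, NULL, thread2, NULL);
  pthread_join (t2, NULL);
  return 0;
}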

Also, CUDA streams do not seem to support local-thread-operation-only synchronization. I remember this was an issue in the old implementation inside the nvptx plugin as well, and we had hacks to somewhat alleviate it (i.e. marking streams as "single" or "multi" threaded).
Right.
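
For reference, the limitation boils down to this: cuStreamSynchronize() waits for everything previously enqueued on the stream, with no way to restrict the wait to the calling host thread's operations (minimal driver API sketch, error checking omitted):

#include <cuda.h>

int
main (void)
{
  CUdevice dev;
  CUcontext ctx;
  CUstream stream;

  cuInit (0);
  cuDeviceGet (&dev, 0);
  cuCtxCreate (&ctx, 0, dev);
  cuStreamCreate (&stream, CU_STREAM_DEFAULT);

  /* Imagine several host threads having launched kernels and copies on
     STREAM at this point.  The call below drains all of them,
     regardless of which host thread enqueued what.  */
  cuStreamSynchronize (stream);

  cuStreamDestroy (stream);
  cuCtxDestroy (ctx);
  return 0;
}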

Well, another issue that we might want to bring up to the OpenACC committee :) I agree that if async queue spaces were simply thread-local then things would be much simpler.
OK, so you agree with that, good.

And, no problem foreseeable about simply moving the asyncqueues into
"goacc_thread" -- and removing the "async" lock?

I think we should still try to solve the potential deadlock problems, and stay close to the current implementation just for now. We can ask the committee for further guidance later.

Chung-Lin
