David Eklund <[email protected]> writes:
> We have a persistent problem attempting to multithread using pycuda. I have
> a thread pool with one thread per GPU, each one initializes its own context
> with its given device ID and waits to read jobs from a common Queue object.
> The main thread processes requests and adds CUDA related jobs to the Queue.
> This works well enough and utilizes all available GPUs but we frequently
> run into a locking issue when issuing lots of relatively fast cuda calls
> where one computation will hang indefinitely. When the contexts are created
> with the pycuda.driver.ctx_flags.SCHED_BLOCKING_SYNC flag and I attach to a
> hung process I find it's waiting on a semaphore in cuCtxSynchronize in
> libcuda.so; when the contexts are created without the SCHED_BLOCKING_SYNC
> flag I find it's still stuck in cuCtxSynchronize but in a spin loop waiting
> for results.
>
> I have an alternative version with all the same code but bypassing pycuda
> and calling directly into an nvcc compiled shared library using ctypes that
> uses cudaSetDevice and cudaDeviceSynchronize rather than the cuCtx*
> functions and it does not experience these same locking issues.

This looks much like an Nvidia bug--I really don't know what PyCUDA could be
doing to prompt this sort of behavior. Do you get the same behavior if you
use multiple processes? Anyway, it might be worth pinging Nvidia over
this. It'd also be helpful if you could post a minimal program that
reproduces this. Also, what driver version?

> Has anyone run into this kind of issue before? Also, is there support in
> pycuda (or planned support for future releases) to use cudaDevice*
> functions rather than explicit context management?

cuda* functions are from the so-called 'run-time API', whereas PyCUDA uses
the cu* functions, which form the so-called 'driver API'.

Andreas
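
For reference, the setup described in the quoted message (one worker thread per GPU, each owning its own driver-API context and pulling jobs from a shared Queue) might be reduced to a skeleton like the one below as a starting point for a minimal reproducer. This is only a sketch under assumptions: the names PyCudaContext, worker and run_pool are hypothetical and not taken from the original code, and the actual jobs that trigger the hang are not known. The PyCUDA calls used are Device.make_context, ctx_flags.SCHED_BLOCKING_SYNC, Context.synchronize and Context.pop.

```python
import queue
import threading


class PyCudaContext:
    """Hypothetical wrapper: owns one driver-API context on the calling thread.

    Needs PyCUDA and a GPU; it is only the default, so the skeleton itself
    can be exercised with a stand-in class.
    """

    def __init__(self, device_id):
        import pycuda.driver as cuda  # lazy import: not needed for a stand-in

        cuda.init()
        self._cuda = cuda
        # make_context() also makes the new context current on this thread.
        self._ctx = cuda.Device(device_id).make_context(
            flags=cuda.ctx_flags.SCHED_BLOCKING_SYNC
        )

    def synchronize(self):
        # Wraps cuCtxSynchronize, where the hang was observed.
        self._cuda.Context.synchronize()

    def close(self):
        self._ctx.pop()
        self._ctx = None


def worker(device_id, jobs, results, make_context):
    """Create a context for device_id, then run jobs until a None sentinel."""
    ctx = make_context(device_id)
    try:
        while True:
            job = jobs.get()
            if job is None:
                break
            value = job(ctx)
            ctx.synchronize()
            results.put((device_id, value))
    finally:
        ctx.close()


def run_pool(device_ids, job_list, make_context=PyCudaContext):
    """Run job_list on one thread per device; return (device_id, value) pairs."""
    jobs, results = queue.Queue(), queue.Queue()
    threads = [
        threading.Thread(target=worker, args=(d, jobs, results, make_context))
        for d in device_ids
    ]
    for t in threads:
        t.start()
    for job in job_list:
        jobs.put(job)
    for _ in threads:
        jobs.put(None)  # one sentinel per worker
    for t in threads:
        t.join()
    out = []
    while not results.empty():  # safe: all producers have been joined
        out.append(results.get())
    return out
```

Each job here is a callable taking the context wrapper, so many short kernel launches can be queued to try to provoke the hang. Replacing threading.Thread with multiprocessing.Process (and the queues with their multiprocessing counterparts) would be one way to run the multiple-process comparison asked about above.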
_______________________________________________
PyCUDA mailing list
[email protected]
http://lists.tiker.net/listinfo/pycuda
