Sounds like the thing to do is to use thread.join() as the synchronization point, rather than an event. It should work out pretty much the same. Sound about right? I don't have any strict need to use events; I'm just trying to figure out a decent way to spread the work over multiple GPUs without having to do a lot of bookkeeping.
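Concretely, the pattern I have in mind looks something like this. It's a minimal sketch using plain threading: the pycuda context/kernel calls are shown only as comments, and gpu_worker and its placeholder result are mine, not real API.

```python
import threading

def gpu_worker(device_id, results):
    """One worker thread per GPU; all CUDA state lives in this thread."""
    # ctx = pycuda.driver.Device(device_id).make_context()
    # ... upload data, launch kernel ...
    # ctx.synchronize()  # block here until the kernel finishes
    results[device_id] = device_id * 2  # placeholder for real kernel output
    # ctx.pop()

def run_on_gpus(n_gpus):
    results = [None] * n_gpus
    threads = [threading.Thread(target=gpu_worker, args=(i, results))
               for i in range(n_gpus)]
    for t in threads:
        t.start()
    # join() is the synchronization point: once it returns, that thread's
    # kernel has finished and its context has already been popped.
    for t in threads:
        t.join()
    return results
```

No events, no cross-thread context handoff; the main thread just joins and reads the results.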
We actually started this project with OpenCL, but ran into too many bugs in the implementation we were using (Apple's) and decided it wasn't stable enough for our use (it also didn't seem like there was good profiler / debugger support at the time; not sure if that's changed).

Throwing around all of our numpy data across processes doesn't sound like any less of a headache than threads. :/ Maybe MPC's gotten a lot better since I looked at it last? Threads should get the job done, though...

Thanks,
Eli

On Thu, Mar 8, 2012 at 11:08 AM, Andreas Kloeckner <[email protected]> wrote:
> Hi Eli,
>
> On Thu, 8 Mar 2012 09:33:21 -0800, "Eli Stevens (Gmail)"
> <[email protected]> wrote:
>> I was wondering if the following will work:
>>
>> - Main thread spins up thread B.
>> - Thread B creates a context, invokes a kernel, and creates an event.
>> - Event is saved.
>> - Thread B pops the context (kernel is still running at this point)
>>   and finishes.
>> - Main thread join()s B and grabs the event.
>> - Main thread does other stuff and eventually calls .synchronize()
>>
>> Does that work? Or will trying to use an event after popping the
>> associated context (and from a different thread) cause problems? My
>> actual use case involves a thread C that's doing other things on a
>> second GPU. Maybe instead of an event, I should just have the threads
>> block and then use the join to indicate when the kernel is done? Any
>> advice appreciated. :)
>
> This should work as well as it does in the underlying CUDA
> implementation. The problem is that there is always a question of what
> context is active where, and there's an intricate dance that has to be
> performed of one thread having to release the context and another one
> grabbing it (as you describe). To me, this seems not worth the
> headache. OpenCL is cleaner in this respect, if that's an option for
> you.
> Failing that, keeping all CUDA objects associated with a context
> within a thread *will* make your life easier (especially with respect to
> garbage collection). If you can, processes make all of this even less
> brittle (and more concurrent).
>
> HTH,
> Andreas

_______________________________________________
PyCUDA mailing list
[email protected]
http://lists.tiker.net/listinfo/pycuda
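Along the lines of Andreas's process suggestion, here is a hedged sketch of the one-process-per-GPU layout: each process owns exactly one context for its whole lifetime, so there is no cross-thread context push/pop to get wrong. The pycuda calls are commented out, and gpu_process / the arithmetic result are illustrative placeholders, not real API.

```python
import multiprocessing as mp

def gpu_process(device_id, queue):
    """One process per GPU; the context never leaves this process."""
    # import pycuda.driver as cuda; cuda.init()
    # ctx = cuda.Device(device_id).make_context()
    # ... launch kernel, download result ...
    result = device_id + 100  # placeholder for the kernel's output
    queue.put((device_id, result))
    # ctx.detach()

def run_in_processes(n_gpus):
    ctx = mp.get_context("fork")  # fork keeps this sketch self-contained
    queue = ctx.Queue()
    procs = [ctx.Process(target=gpu_process, args=(i, queue))
             for i in range(n_gpus)]
    for p in procs:
        p.start()
    # Drain the queue before joining so no child blocks on a full pipe.
    results = dict(queue.get() for _ in procs)
    for p in procs:
        p.join()
    return results
```

The cost, as noted above, is shipping numpy data across process boundaries; the benefit is that context lifetime and garbage collection stay entirely local to one process.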
