I never had the pleasure of working with multiple GPUs at once, but I have nonetheless given some thought to how I might handle this sort of situation. My plan would have been to use MPI to launch one process per GPU and shuttle the data between processes using Python MPI bindings that understand NumPy arrays (e.g. mpi4py). On Ubuntu, at least, it's possible to run MPI with multiple processes on a single machine.
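That one-process-per-GPU structure can be sketched without MPI at all; below the stdlib multiprocessing module stands in for mpi4py, and doubling each chunk stands in for the kernel launch, so the shape of the plan is runnable without a GPU. The function names (`run_on_device`, `scatter_across_gpus`) are mine, not anything from PyCUDA or mpi4py.

```python
# Sketch only: multiprocessing stands in for MPI, and the "kernel"
# is simulated. Real code would use mpi4py and select a GPU per rank.
import multiprocessing as mp
import numpy as np

def run_on_device(args):
    device_id, chunk = args
    # Real code would create a CUDA context on GPU `device_id` here,
    # e.g. pycuda.driver.Device(device_id).make_context(), and launch
    # a kernel; doubling the chunk stands in for that work.
    return device_id, chunk * 2

def scatter_across_gpus(data, n_gpus):
    # Split the array into one chunk per GPU-owning process,
    # farm the chunks out, and gather the results back in order.
    chunks = np.array_split(data, n_gpus)
    with mp.Pool(n_gpus) as pool:
        results = dict(pool.map(run_on_device, enumerate(chunks)))
    return np.concatenate([results[i] for i in range(n_gpus)])

if __name__ == "__main__":
    print(scatter_across_gpus(np.arange(8, dtype=np.float32), 2))
```

With mpi4py the scatter/gather would instead use `comm.Scatter`/`comm.Gather` on the NumPy buffers, but the division of labor is the same.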
Hope that helps, if you can't solve your original problem.

David

On Thu, Mar 8, 2012 at 2:55 PM, Eli Stevens (Gmail) <[email protected]> wrote:

> Sounds like the thing to do is use the thread.join() as the
> synchronization callable, rather than an event. Should work out
> pretty much the same. Sound about right? I don't have any strict
> need to use events; just trying to figure out a decent way to spread
> the work over multiple GPUs without having to do a lot of bookkeeping.
>
> We actually started this project with OpenCL, but ran into too many
> bugs in the implementation we were using (Apple's) and decided it
> wasn't stable enough for our use (it also didn't seem like there was
> good profiler / debugger support at the time; not sure if that's
> changed).
>
> Throwing around all of our numpy data across processes doesn't sound
> like any less of a headache than threads. :/ Maybe MPI's gotten a
> lot better since I looked at it last? Threads should get the job
> done, though...
>
> Thanks,
> Eli
>
> On Thu, Mar 8, 2012 at 11:08 AM, Andreas Kloeckner
> <[email protected]> wrote:
> > <#part sign=pgpmime>
> > Hi Eli,
> >
> > On Thu, 8 Mar 2012 09:33:21 -0800, "Eli Stevens (Gmail)"
> > <[email protected]> wrote:
> >> I was wondering if the following will work:
> >>
> >> - Main thread spins up thread B.
> >> - Thread B creates a context, invokes a kernel, and creates an event.
> >> - Event is saved.
> >> - Thread B pops the context (kernel is still running at this point)
> >>   and finishes.
> >> - Main thread join()s B and grabs the event.
> >> - Main thread does other stuff and eventually calls .synchronize()
> >>
> >> Does that work? Or will trying to use an event after popping the
> >> associated context (and from a different thread) cause problems? My
> >> actual use case involves a thread C that's doing other things on a
> >> second GPU. Maybe instead of an event, I should just have the threads
> >> block and then use the join to indicate when the kernel is done?
> >> Any advice appreciated. :)
> >
> > This should work as well as it does in the underlying CUDA
> > implementation. The problem is that there is always a question of what
> > context is active where, and there's an intricate dance that has to be
> > performed of one thread having to release the context and another one
> > grabbing it (as you describe). To me, this seems not worth the
> > headache. OpenCL is cleaner in this respect, if that's an option for
> > you. Failing that, keeping all CUDA objects associated with a context
> > within a thread *will* make your life easier (especially with respect
> > to garbage collection). If you can, processes make all of this even
> > less brittle (and more concurrent).
> >
> > HTH,
> > Andreas
>
> _______________________________________________
> PyCUDA mailing list
> [email protected]
> http://lists.tiker.net/listinfo/pycuda

-- 
"Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are, by
definition, not smart enough to debug it." -- Brian Kernighan
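The pattern the thread converges on — one worker thread per GPU, all objects tied to a context created, used, and torn down inside that thread, with join() as the only cross-thread synchronization — can be sketched as follows. DummyContext is a stand-in of mine for a real PyCUDA context (pycuda.driver.Device(i).make_context()), and the squaring loop stands in for the kernel launch; nothing here is PyCUDA API.

```python
# Sketch: one thread per GPU, context confined to its thread,
# thread.join() as the synchronization point instead of events.
import threading

class DummyContext:
    """Stand-in for a PyCUDA context; real code would call
    pycuda.driver.Device(device_id).make_context()."""
    def __init__(self):
        self.owner = threading.get_ident()

    def check(self):
        # Using a context from another thread is the "intricate dance"
        # Andreas warns about; this stand-in simply forbids it.
        assert threading.get_ident() == self.owner, "context used off-thread"

def gpu_worker(device_id, data, results):
    ctx = DummyContext()          # create the context in this thread...
    ctx.check()                   # ...and use it only in this thread
    results[device_id] = [x * x for x in data]  # stand-in for kernel + copy-back
    # Teardown (ctx.pop() in real PyCUDA) would also happen here, so
    # nothing tied to the context ever crosses a thread boundary.

results = {}
threads = [threading.Thread(target=gpu_worker, args=(i, list(range(4)), results))
           for i in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()   # join() is the synchronization point -- no events needed
print(results)
```

The main thread never touches a context or an event; it only hands each worker its inputs and join()s, which sidesteps the context push/pop bookkeeping entirely.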
