Thanks for the quick response. Threading isn't essential for what I'm doing right now, but it would certainly be nice to have. Disabling the thread check does seem to remove the leak in this case (I'm only using one context), but it introduces other odd behavior that I haven't figured out yet. I've removed the other threads for now, but I'd be interested in looking at this some more; perhaps I will stop by sometime next week.

David

On Feb 2, 2012, at 4:56 PM, Andreas Kloeckner wrote:

> On Thu, 2 Feb 2012 14:49:32 -0500, David Eigen <[email protected]> wrote:
>> Hi,
>>
>> I ran into a gpu memory leak that appears to happen when gc frees GPUArrays
>> that were created in a thread other than the one gc is running in. I did
>> not see a github issue tracking this. Is this a known issue that others
>> have run into? I'm using pycuda 2011.2.2 with python 2.6.7.
>>
>> This happens when gc frees a DeviceAllocation that had been created in
>> another thread. Since there are no refs to it, it is indeed freed, and the
>> destructor tries to free the corresponding CUdeviceptr. However, the
>> following lines cause mem_free not to be called: the
>> scoped_context_activation checks that the running thread matches the
>> context's thread; since it doesn't, it throws an exception, which is
>> silently caught in CUDAPP_CATCH_CLEANUP_ON_DEAD_CONTEXT:
>>
>> class device_allocation ...
>>   void free()
>>   {
>>     if (m_valid)
>>     {
>>       try
>>       {
>>         scoped_context_activation ca(get_context());
>>         mem_free(m_devptr);
>>       }
>>       CUDAPP_CATCH_CLEANUP_ON_DEAD_CONTEXT(device_allocation);
>>
>> I was wondering how to go about fixing or working with this, or if
>> anyone has any advice?
>
> OpenCL is a better API if you absolutely need threading, so using
> PyOpenCL is one possible workaround. Using processes rather than threads
> is another workaround, possibly with some explicitly shared memory. [1]
> (Note you need to fork before pycuda.init(), it seems.)
>
> [1]
> http://docs.python.org/dev/library/multiprocessing.html#module-multiprocessing
>
> If you absolutely want this fixed, you might introduce a per-context
> queue of things to be freed. I'll warn you that context management in
> CUDA is a mess with poorly documented semantics. This got partially
> fixed with a new, less broken API in CUDA 4.0, and I'm happy that the
> current code doesn't seem to be too horribly broken. If we were to
> switch to APIs now, we'd ditch backward compatibility with CUDA 3.x and
> below.
>
> If you like, you can also just come by to discuss this. (I'm in 1105A WWH)
>
> Andreas

_______________________________________________
PyCUDA mailing list
[email protected]
http://lists.tiker.net/listinfo/pycuda
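
For reference, here is a rough sketch of the "per-context queue of things to be freed" idea from the thread. It is pure Python and does not touch PyCUDA at all: DeferredFreeQueue, release(), drain() and free_fn are hypothetical names made up for illustration, and free_fn stands in for whatever actually frees the device memory. It also assumes Python 3 (threading.get_ident), not the 2.6 mentioned above. The idea is that a free requested from the wrong thread is queued instead of being dropped, and the thread that owns the context drains the queue later.

```python
import queue
import threading


class DeferredFreeQueue:
    """Hypothetical per-context queue of handles waiting to be freed."""

    def __init__(self, free_fn):
        self._free_fn = free_fn              # stand-in for the real device free
        self._owner = threading.get_ident()  # thread that "owns" the context
        self._pending = queue.Queue()        # thread-safe FIFO of deferred handles

    def release(self, handle):
        """Free immediately on the owner thread, otherwise defer."""
        if threading.get_ident() == self._owner:
            self._free_fn(handle)
        else:
            self._pending.put(handle)

    def drain(self):
        """Free everything deferred so far; only valid on the owner thread."""
        if threading.get_ident() != self._owner:
            raise RuntimeError("drain() must run on the context's thread")
        freed = 0
        while True:
            try:
                handle = self._pending.get_nowait()
            except queue.Empty:
                return freed
            self._free_fn(handle)
            freed += 1


def demo():
    freed = []                               # records what free_fn was asked to free
    q = DeferredFreeQueue(freed.append)

    # A release from another thread is queued rather than freed.
    worker = threading.Thread(target=q.release, args=("buf-from-worker",))
    worker.start()
    worker.join()
    freed_before_drain = list(freed)

    count = q.drain()                        # owner thread performs the deferred free
    q.release("buf-from-owner")              # owner-thread release frees right away
    return freed_before_drain, count, freed
```

In a real setup the owning thread would have to call drain() at some convenient point (for example before each allocation), and a destructor running on another thread would call release() instead of trying to activate the context itself. Whether that fits into the actual device_allocation code is an open question; this only shows the queueing pattern.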
