Found the interesting part in the CUDA programming guide (Chapter 3.2.6.5.3 ... who made up that chapter number?)

Implicit Synchronization

Two commands from different streams cannot run concurrently if either one of the following operations is issued in-between them by the host thread:

* a page-locked host memory allocation,
* a device memory allocation,
* a device memory set,
* a device-device memory copy,

So for the example I gave before, I suspect that the fft allocates device memory and is therefore blocking. get_async allocates page-locked host memory, so that will block as well.

So am I correct in assuming that, for streams to run concurrently, I need to issue them from different threads in Python? If that is the case, I have just lost any interest in using streams.

-Magnus
-----------------------------------------------
Magnus Paulsson
Assistant Professor
School of Computer Science, Physics and Mathematics
Linnaeus University
Phone: +46-480-446308
Mobile: +46-70-6942987
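The quoted rule can be sketched as a small pure-Python model. This is not PyCUDA code and not a description of how the driver actually schedules work. It walks a hypothetical host-side issue order and reports which pairs of stream commands are *not* separated by one of the four listed operations. The operation names (`device_alloc`, `pagelocked_host_alloc`, `kernel`, and so on) are made up for illustration. Treating the listed operations as host-side events with no stream attached is also an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical names for the four operations listed in the quoted guide text.
# Any of them issued by the host between two stream commands prevents those
# commands from running concurrently.
IMPLICIT_SYNC_OPS = {
    "pagelocked_host_alloc",
    "device_alloc",
    "device_memset",
    "device_to_device_copy",
}


@dataclass(frozen=True)
class Op:
    name: str               # hypothetical label, e.g. "kernel" or "device_alloc"
    stream: Optional[int]   # stream the command goes to; None for host-side ops


def overlappable_pairs(ops: List[Op]) -> List[Tuple[int, int]]:
    """Return index pairs (i, j), i < j, of commands in different streams
    that have no implicit-sync operation issued between them."""
    pairs = []
    for i, a in enumerate(ops):
        if a.stream is None:
            continue  # host-side ops are not stream commands
        for j in range(i + 1, len(ops)):
            b = ops[j]
            if b.name in IMPLICIT_SYNC_OPS:
                # Every command issued after this point is separated from a.
                break
            if b.stream is not None and b.stream != a.stream:
                pairs.append((i, j))
    return pairs


# Allocations interleaved with stream work: nothing can overlap.
interleaved = [
    Op("kernel", 1),
    Op("device_alloc", None),           # e.g. an allocation inside an FFT call
    Op("kernel", 1),
    Op("pagelocked_host_alloc", None),  # e.g. get_async allocating its buffer
    Op("memcpy_dtoh_async", 2),
]

# Same work, but with the allocations moved ahead of all stream commands.
preallocated = [
    Op("device_alloc", None),
    Op("pagelocked_host_alloc", None),
    Op("kernel", 1),
    Op("memcpy_dtoh_async", 2),
]

print(overlappable_pairs(interleaved))   # []
print(overlappable_pairs(preallocated))  # [(2, 3)]
```

Under this model, the issue order of the allocations relative to the stream commands is what decides whether overlap is possible. The sketch does not say whether issuing from separate Python threads changes that.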
