Found the interesting part in the CUDA programming guide (Chapter
3.2.6.5.3 ... who made up that chapter number?)

Implicit Synchronization
Two commands from different streams cannot run concurrently if either
one of the following operations is issued in-between them by the host
thread:
* a page-locked host memory allocation,
* a device memory allocation,
* a device memory set,
* a device <-> device memory copy,

So, for the example I gave before, I can imagine that the FFT allocates
device memory and therefore blocks. get_async allocates page-locked
host memory, so it will block as well ...
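If the quoted rule is the whole story, doing every allocation up front
should leave only asynchronous commands between the streams. Below is a
minimal sketch of that idea from a single host thread. It is not the
earlier FFT example: the function names split_ranges and
roundtrip_on_streams are made up for illustration, and it assumes a
working PyCUDA install, a GPU, and n >= n_streams. Whether the copies
actually overlap depends on the device and is not checked here.

```python
import numpy as np


def split_ranges(n, parts):
    """Split range(n) into `parts` contiguous (start, stop) chunks."""
    base, extra = divmod(n, parts)
    ranges, start = [], 0
    for i in range(parts):
        stop = start + base + (1 if i < extra else 0)
        ranges.append((start, stop))
        start = stop
    return ranges


def roundtrip_on_streams(n=1 << 20, n_streams=2):
    """Copy data to the GPU and back, one chunk per stream.

    Hypothetical helper: needs PyCUDA and a GPU, assumes n >= n_streams.
    """
    # Imported here so split_ranges stays usable without a GPU.
    import pycuda.autoinit  # noqa: F401  (creates the CUDA context)
    import pycuda.driver as drv

    ranges = split_ranges(n, n_streams)

    # 1) All allocations happen first: page-locked host buffers,
    #    device buffers, and the streams themselves.
    h_in = drv.pagelocked_empty(n, np.float32)
    h_out = drv.pagelocked_empty(n, np.float32)
    h_in[:] = np.arange(n, dtype=np.float32)
    d_bufs = [drv.mem_alloc((b - a) * h_in.itemsize) for a, b in ranges]
    streams = [drv.Stream() for _ in ranges]

    # 2) From here on only asynchronous copies are issued, so no
    #    allocation lands in between commands of different streams.
    for s, d_buf, (a, b) in zip(streams, d_bufs, ranges):
        drv.memcpy_htod_async(d_buf, h_in[a:b], s)
        drv.memcpy_dtoh_async(h_out[a:b], d_buf, s)

    # 3) Wait for every stream before touching the results.
    for s in streams:
        s.synchronize()
    return np.array_equal(h_in, h_out)
```

Calling roundtrip_on_streams() on a machine with a GPU should return
True if the round trip worked. To see whether the streams really run
concurrently, the run would have to be looked at in a profiler.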

So am I correct in assuming that, for streams to work, I need to run
them in different threads in Python? If that is the case, I just lost
any interest in using streams.

-Magnus

-----------------------------------------------
Magnus Paulsson
Assistant Professor
School of Computer Science, Physics and Mathematics
Linnaeus University
Phone: +46-480-446308
Mobile: +46-70-6942987

_______________________________________________
PyCUDA mailing list
http://lists.tiker.net/listinfo/pycuda
