Hello Tomi,

On Thu, Apr 5, 2012 at 12:01 AM, Tomi Pieviläinen
<[email protected]> wrote:
> On Wed, Apr 04, 2012 at 04:35:21PM +0300, Tomi Pieviläinen wrote:
>> Hi all,
>>
>> after profiling my script, pstats shows that significant amount of
>> time is spent in a function that only calls
>> pycuda.autoinit.context.synchronize() and few cuda kernel calls
>> depending on a setting (different calls in brances of if).
>>
>> Is it really possible, that most of the time is spent processing that
>> if, or are syncing or gpu-function calls somehow skipped in cProfile?

Synchronization call on host waits for all queued kernels. This is why
you see it in the profiler — all kernel invocations are quite fast, so
the host code just reaches the synchronization call and waits there.
If you want to profile the calls themselves, temporary replace stream
argument to kernels by None, which will make them run synchronously.

> Well, I ran it through the line_profiler, and it seems like 99% of the
> time is spent on synchronization calls. Weird that the syncthreads
> within the kernel code doesn't cause much delays (I'm running only one
> block, so they should be equivalent, right?).

Synchronization call inside the kernel synchronizes warps inside the
block and does not have anything to do with your problem.

_______________________________________________
PyCUDA mailing list
[email protected]
http://lists.tiker.net/listinfo/pycuda

Reply via email to