[PyCUDA] Context destruction vs. profiler (patch)

Tomasz Rybak Wed, 09 Feb 2011 13:00:50 -0800

Hello.
Recently I have been playing with the profiler and logger in CUDA.
A detailed description is at
http://wiki.tiker.net/ToolCheatSheet
and
http://wiki.tiker.net/ToolCheatSheet?action=AttachFile&do=view&target=compute-profiler-manual.txt


Basically, I set the environment variable
COMPUTE_PROFILE to 1 and ran PyCUDA programs.
I observed that the logger does not write every function
call to the log file. I also observed that test cases
(functions decorated with pycuda.tools.mark_cuda_test)
did generate full logs. The only difference I
found was the call to context.detach() in mark_cuda_test.
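
A minimal sketch of enabling the profiler for a single run, assuming
only the COMPUTE_PROFILE variable named above. The helper names
profiler_env and run_profiled are hypothetical; the variable is set in
a child process so that it is in place before CUDA is initialised.

```python
import os
import subprocess
import sys


def profiler_env(base_env=None):
    """Return a copy of the environment with the CUDA profiler enabled."""
    env = dict(os.environ if base_env is None else base_env)
    env["COMPUTE_PROFILE"] = "1"  # the variable described in this message
    return env


def run_profiled(script_path):
    """Run a PyCUDA script in a child process with profiling switched on."""
    return subprocess.call([sys.executable, script_path], env=profiler_env())
```

For example, run_profiled("examples/demo.py") would run the demo with
the profiler enabled, given a working PyCUDA installation.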

I then experimented a bit and observed
that when I did not use pycuda.autoinit but
instead created the context manually and then popped
_and detached_ it, the full log was generated.
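
The manual variant described above can be sketched roughly as follows.
The wrapper run_with_manual_context and its driver parameter are
hypothetical (the parameter only exists so the call order can be
checked without a GPU); init, Device, make_context, pop and detach are
the PyCUDA driver calls.

```python
def run_with_manual_context(work, driver=None, device_index=0):
    """Create a context by hand, run work(), then pop AND detach it."""
    if driver is None:
        # Imported lazily: this needs PyCUDA and a CUDA-capable device.
        import pycuda.driver as driver
    driver.init()
    ctx = driver.Device(device_index).make_context()
    try:
        return work()
    finally:
        ctx.pop()     # what pycuda.autoinit already does at exit
        ctx.detach()  # the extra step after which the full log appeared
```

Calling it with a function that launches some kernels should then
produce a profiler log covering all of them, if the observation above
holds on your setup.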

I am attaching a patch that adds context.detach to the functions
called at program exit in pycuda.autoinit.
I have tested PyCUDA with this patch, and all programs
from test/* run without problems.
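
A note on ordering: atexit runs handlers in reverse order of
registration, so registering detach before pop means pop runs first
and detach second at exit. The sketch below illustrates this with
stand-in print calls instead of a real context; exit_order is a
hypothetical helper.

```python
import subprocess
import sys

# Child program: registers two stand-in handlers in the same order as
# the patch (detach first, pop second).
CHILD = """
import atexit
atexit.register(print, "detach")
atexit.register(print, "pop")
"""


def exit_order():
    """Return the order in which the child's exit handlers actually ran."""
    output = subprocess.check_output([sys.executable, "-c", CHILD])
    return output.decode().split()
```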

I am also attaching two logs from examples/demo.py.
One is the result of using autoinit with detach, the other without.
As you can see, the latter is missing some of the calls,
such as (2*gpuarray).get() (the axpb kernel).
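
To compare two such logs mechanically, something like the following
sketch may help. It assumes the "method=[ name ] ..." line format of
the attached logs; method_counts and missing_calls are hypothetical
helper names.

```python
import re
from collections import Counter

# Matches data lines such as "method=[ axpb ] gputime=[ 2.592 ] ...".
# Comment lines and the "method,gputime,..." header do not match.
METHOD_RE = re.compile(r"^method=\[\s*(\S+)\s*\]")


def method_counts(log_text):
    """Count how often each method appears in a profiler log."""
    counts = Counter()
    for line in log_text.splitlines():
        match = METHOD_RE.match(line.strip())
        if match:
            counts[match.group(1)] += 1
    return counts


def missing_calls(full_log, partial_log):
    """Return the calls present in full_log but absent from partial_log."""
    return method_counts(full_log) - method_counts(partial_log)
```

Passing the contents of the two log files to missing_calls gives the
number of calls per method that the shorter log lacks.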

So Andreas, please apply this patch before finalising 2011.1.

Best regards.

-- 
Tomasz Rybak <[email protected]> GPG/PGP key ID: 2AD5 9860
Fingerprint A481 824E 7DD3 9C0E C40A  488E C654 FB33 2AD5 9860
http://member.acm.org/~tomaszrybak

# CUDA_PROFILE_LOG_VERSION 2.0
# CUDA_DEVICE 0 GeForce GTX 460
# TIMESTAMPFACTOR fffff703cfe75b50
method,gputime,cputime,occupancy
method=[ memcpyHtoD ] gputime=[ 0.992 ] cputime=[ 29.000 ] 
method=[ doublify ] gputime=[ 2.976 ] cputime=[ 38.000 ] occupancy=[ 0.021 ] 
method=[ memcpyDtoH ] gputime=[ 1.248 ] cputime=[ 22.000 ] 
method=[ memcpyHtoD ] gputime=[ 0.864 ] cputime=[ 8.000 ] 
method=[ doublify ] gputime=[ 1.920 ] cputime=[ 12.000 ] occupancy=[ 0.021 ] 
method=[ memcpyDtoH ] gputime=[ 1.056 ] cputime=[ 22.000 ] 
method=[ memcpyHtoD ] gputime=[ 0.832 ] cputime=[ 3.000 ] 
method=[ axpb ] gputime=[ 2.592 ] cputime=[ 19.000 ] occupancy=[ 0.021 ] 
method=[ memcpyDtoH ] gputime=[ 1.280 ] cputime=[ 17.000 ] 
method=[ memcpyDtoH ] gputime=[ 1.056 ] cputime=[ 14.000 ]

# CUDA_PROFILE_LOG_VERSION 2.0
# CUDA_DEVICE 0 GeForce GTX 460
# TIMESTAMPFACTOR fffff703d17a0930
method,gputime,cputime,occupancy
method=[ memcpyHtoD ] gputime=[ 1.024 ] cputime=[ 28.000 ] 
method=[ doublify ] gputime=[ 3.008 ] cputime=[ 51.000 ] occupancy=[ 0.021 ] 
method=[ memcpyDtoH ] gputime=[ 1.248 ] cputime=[ 25.000 ] 
method=[ memcpyHtoD ] gputime=[ 0.832 ] cputime=[ 11.000 ] 
method=[ doublify ] gputime=[ 1.920 ] cputime=[ 17.000 ] occupancy=[ 0.021 ]

diff --git a/pycuda/autoinit.py b/pycuda/autoinit.py
index ec1c7bd..9aadb90 100644
--- a/pycuda/autoinit.py
+++ b/pycuda/autoinit.py
@@ -8,4 +8,5 @@ context = make_default_context()
 device = context.get_device()
 
 import atexit
+atexit.register(context.detach)
 atexit.register(context.pop)


_______________________________________________
PyCUDA mailing list
[email protected]
http://lists.tiker.net/listinfo/pycuda
