Dear Matthew,
first of all, which GPU are you using? A GeForce, a Tesla, or something
similar?
I see a quadruple loop in your kernel. If it executes quickly, that's
fine, but be careful: on a GeForce (or any GPU that is also driving a
display) there is a watchdog time limit on kernel execution (around 5
seconds, I believe), so a kernel cannot run indefinitely. As far as I
know, Teslas are not subject to this limit. Your traceback shows
"launch timeout", which is exactly this watchdog firing. Try executing
something shorter and see whether it works; then add the sub-loops back
one at a time and you'll see where it becomes too slow.
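If the full amount of work really is needed, one common workaround is to split it across several short kernel launches, each handling a slice of the outer loop, so no single launch hits the watchdog. Here is a host-side sketch of how the iteration range could be partitioned (plain Python; the `partition` helper and the chunk size of 16 are just illustrative, not part of your code):

```python
def partition(total, chunk):
    """Split range(total) into (start, stop) pairs of at most `chunk` iterations."""
    return [(start, min(start + chunk, total))
            for start in range(0, total, chunk)]

# e.g. split the 8x8 (i, j) node loop into 4 launches of 16 pairs each;
# each (start, stop) pair would be passed to the kernel as arguments,
# and the kernel would loop only over its assigned slice.
chunks = partition(8 * 8, 16)
print(chunks)
```

Each launch returns control to the driver, which resets the watchdog, and the kernel accumulates into `nearest` across launches.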
++
Peter
On 06/21/2011 08:05 AM, Matthew Graham wrote:
> Hi,
>
> I'm having problems with big arrays; this crashes horribly:
>
> from pycuda import driver, compiler, gpuarray, tools
> from pycuda.curandom import rand as curand
> import numpy as np
> import pycuda.autoinit
>
> NV = 5632 # No of vectors
> DIM = 2031 # No of dimensions per vector
> BLOCK_SIZE = 16 # 512 max threads per mp
> GRID_SIZE = 352
> HEIGHT = 8
> WIDTH = 8
>
> gj = curand((NV, DIM))
> nodes = curand((HEIGHT, WIDTH, DIM)) # Each row is a vector
> dissim = curand((DIM,DIM))
> nearest = gpuarray.zeros((NV), np.float32)
>
> kernel_code = '''
> __global__ void NearestPrototypeKernel(float *gj, float *nodes, float
> *dissim, float *nearest)
> {
> // Element
> int idx = blockIdx.x * blockDim.x + threadIdx.x;
> float value = 0.;
>
> for (int i = 0; i < 8; ++i) {
> for (int j = 0; j < 8; ++j) {
> for (int l = 0; l < 2031; ++l) {
> float wp_l = gj[idx * 2031 + l];
> if (wp_l > 0) {
> for (int k = 0; k < 2031; ++k) {
> value += 0.01;
> }
> }
> }
> }
> }
>
> nearest[idx] = value;
> }
> '''
> mod = compiler.SourceModule(kernel_code)
> npker = mod.get_function("NearestPrototypeKernel")
> npker(
> # inputs
> gj, nodes, dissim,
> # output
> nearest,
> # block of multiple threads
> block = (BLOCK_SIZE, 1, 1),
> # grid of blocks
> grid = (GRID_SIZE, 1)
> )
> a = nearest.get()
>
> ---
>
> with this:
>
> /Library/Frameworks/Python.framework/Versions/6.0.0/lib/python2.6/site-packages/pycuda-2011.1-py2.6-macosx-10.5-i386.egg/pycuda/compiler.py:122:
> UserWarning: The CUDA compiler succeeded, but said the following:
> ptxas /tmp/tmpxft_000003e5_00000000-2_kernel.ptx, line 93; warning : Double
> is not supported. Demoting to float
>
> +stdout+stderr)
> Traceback (most recent call last):
> File "kernel_test.py", line 54, in <module>
> a = nearest.get()
> File
> "/Library/Frameworks/Python.framework/Versions/6.0.0/lib/python2.6/site-packages/pycuda-2011.1-py2.6-macosx-10.5-i386.egg/pycuda/gpuarray.py",
> line 177, in get
> drv.memcpy_dtoh(ary, self.gpudata)
> pycuda._driver.LaunchError: cuMemcpyDtoH failed: launch timeout
> Error in atexit._run_exitfuncs:
> Traceback (most recent call last):
> File
> "/Library/Frameworks/Python.framework/Versions/6.0.0/lib/python2.6/atexit.py",
> line 24, in _run_exitfuncs
> func(*targs, **kargs)
> File
> "/Library/Frameworks/Python.framework/Versions/6.0.0/lib/python2.6/site-packages/pycuda-2011.1-py2.6-macosx-10.5-i386.egg/pycuda/autoinit.py",
> line 13, in _finish_up
> context.pop()
> LaunchError: cuCtxPopCurrent failed: launch timeout
> Error in sys.exitfunc:
> Traceback (most recent call last):
> File
> "/Library/Frameworks/Python.framework/Versions/6.0.0/lib/python2.6/atexit.py",
> line 24, in _run_exitfuncs
> func(*targs, **kargs)
> File
> "/Library/Frameworks/Python.framework/Versions/6.0.0/lib/python2.6/site-packages/pycuda-2011.1-py2.6-macosx-10.5-i386.egg/pycuda/autoinit.py",
> line 13, in _finish_up
> context.pop()
> pycuda._driver.LaunchError: cuCtxPopCurrent failed: launch timeout
> PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
> cuMemFree failed: invalid context
> PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
> cuMemFree failed: invalid context
> PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
> cuMemFree failed: invalid context
> PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
> cuModuleUnload failed: invalid context
> PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
> cuMemFree failed: invalid context
> -------------------------------------------------------------------
> PyCUDA ERROR: The context stack was not empty upon module cleanup.
> -------------------------------------------------------------------
> A context was still active when the context stack was being
> cleaned up. At this point in our execution, CUDA may already
> have been deinitialized, so there is no way we can finish
> cleanly. The program will be aborted now.
> Use Context.pop() to avoid this problem.
> -------------------------------------------------------------------
> Abort
>
> ----
>
> Does anybody have any idea to get around this?
>
> Cheers,
>
> Matthew
> _______________________________________________
> PyCUDA mailing list
> [email protected]
> http://lists.tiker.net/listinfo/pycuda
>
--
Peter Schmidtke
PhD Student
Dept. Physical Chemistry
Faculty of Pharmacy
University of Barcelona