Hi,

In the following case I get unexpected behavior from the
gpuarray.to_gpu() function:

import numpy

import pycuda.autoinit
import pycuda.gpuarray

A = numpy.random.rand(3, 3)
A_GPU = pycuda.gpuarray.to_gpu(A)
# works as expected
assert numpy.allclose(A_GPU.get(), A)


AT = A.T
AT_GPU = pycuda.gpuarray.to_gpu(AT)
# FAILS!
assert numpy.allclose(AT_GPU.get(), AT)

The problem is that to_gpu() copies the numpy array's memory buffer
without checking its strides. This amounts to assuming the numpy
array is always C-contiguous. In the second case it is not: A.T is a
strided view of A's buffer, not a rearranged copy.
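This can be demonstrated in pure numpy, with no GPU involved. Copying the raw buffer byte-for-byte, as the naive memcpy does, can be simulated with frombuffer(); the reconstructed array equals A, not AT:

```python
import numpy

A = numpy.random.rand(3, 3)
AT = A.T

# The transpose is a view: same buffer, swapped strides.
assert A.flags['C_CONTIGUOUS']
assert not AT.flags['C_CONTIGUOUS']
# For float64: A.strides == (24, 8), AT.strides == (8, 24)

# Simulate a byte-for-byte copy of the shared buffer (A is C-contiguous,
# so A.tobytes() is exactly the raw memory both arrays point at).
naive = numpy.frombuffer(A.tobytes(), dtype=AT.dtype).reshape(AT.shape)

# Reading that buffer row-major reproduces A, not the transpose.
assert numpy.allclose(naive, A)
assert not numpy.allclose(naive, AT)
```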

Is there some reason or explanation for this behavior?

I know that GPUArrays don't have strides and as such are just a
memory buffer with a shape attribute for convenience. But when the
data is not C-contiguous on the CPU, I think pycuda should either 1)
raise an error, or 2) make a contiguous copy and use that for the
transfer (there are optimizations possible, but that's a separate
topic).
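In the meantime, a caller can apply option 2) by hand before the transfer. The helper name as_c_contiguous below is just illustrative; numpy.ascontiguousarray already does the right thing, returning the input unchanged when it is already C-contiguous and a C-ordered copy otherwise (shown here in pure numpy, no GPU needed):

```python
import numpy

def as_c_contiguous(ary):
    """Return `ary` itself if already C-contiguous, else a C-ordered copy."""
    return numpy.ascontiguousarray(ary)

A = numpy.random.rand(3, 3)
AT = A.T

fixed = as_c_contiguous(AT)
assert fixed.flags['C_CONTIGUOUS']
assert numpy.allclose(fixed, AT)

# Already-contiguous input passes through without a copy.
assert as_c_contiguous(A) is A
```

One would then pass `fixed` (rather than `AT`) to pycuda.gpuarray.to_gpu().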

Here is a simple patch that makes the input C-contiguous automatically:

diff --git a/pycuda/gpuarray.py b/pycuda/gpuarray.py
index 6579926..6e2431f 100644
--- a/pycuda/gpuarray.py
+++ b/pycuda/gpuarray.py
@@ -155,6 +155,8 @@ class GPUArray(object):
     def set(self, ary):
         assert ary.size == self.size
         assert ary.dtype == self.dtype
+        if not ary.flags['C_CONTIGUOUS']:
+            ary = ary.copy()
         if self.size:
             drv.memcpy_htod(self.gpudata, ary)


thanks

Frédéric Bastien

_______________________________________________
PyCUDA mailing list
[email protected]
http://lists.tiker.net/listinfo/pycuda