Hi,
In the following case I get unexpected behavior from the
gpuarray.to_gpu() function:
import numpy
import pycuda.autoinit
import pycuda.gpuarray
A=numpy.random.rand(3,3)
A_GPU=pycuda.gpuarray.to_gpu(A)
# works as expected
assert numpy.allclose(A_GPU.get(),A)
AT=A.T
AT_GPU=pycuda.gpuarray.to_gpu(AT)
# FAIL!
assert numpy.allclose(AT_GPU.get(),AT)
The problem is that to_gpu() copies the numpy array's memory buffer
without checking its strides. This amounts to assuming that the numpy
array is always C-contiguous, which is not true in the second case:
A.T is a view with swapped strides, not a new buffer.
Is there some reason or explanation for this behavior?
I know that GPUArrays don't have strides and as such are just a memory
buffer with a shape attribute for convenience. Still, when the data is
not C-contiguous on the CPU, I think pycuda should either 1) raise an
error or 2) make a contiguous copy and use that for the transfer
(optimizations are possible, but that's a separate topic).
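In the meantime, callers can protect themselves with numpy.ascontiguousarray, which makes a C-contiguous copy only when one is actually needed. A sketch (pure NumPy; the pycuda call is shown in a comment since it needs a GPU):

```python
import numpy

A = numpy.arange(9.0).reshape(3, 3)
AT_view = A.T                            # non-contiguous view
AT = numpy.ascontiguousarray(AT_view)    # fresh C-contiguous copy

assert AT.flags['C_CONTIGUOUS']
assert numpy.allclose(AT, AT_view)

# No copy is made when the input is already C-contiguous:
assert numpy.ascontiguousarray(A) is A

# With that, the failing transfer from the example above becomes:
#   AT_GPU = pycuda.gpuarray.to_gpu(numpy.ascontiguousarray(A.T))
#   assert numpy.allclose(AT_GPU.get(), A.T)
```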
Here is a simple patch to make it automatically c contiguous:
diff --git a/pycuda/gpuarray.py b/pycuda/gpuarray.py
index 6579926..6e2431f 100644
--- a/pycuda/gpuarray.py
+++ b/pycuda/gpuarray.py
@@ -155,6 +155,8 @@ class GPUArray(object):
     def set(self, ary):
         assert ary.size == self.size
         assert ary.dtype == self.dtype
+        if not ary.flags['C_CONTIGUOUS']:
+            ary = ary.copy()
         if self.size:
             drv.memcpy_htod(self.gpudata, ary)
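The two added lines are enough because ndarray.copy() defaults to C order regardless of the source array's layout, so the copy handed to memcpy_htod is always C-contiguous. A quick check of that assumption:

```python
import numpy

A = numpy.arange(9.0).reshape(3, 3)
AT = A.T
assert not AT.flags['C_CONTIGUOUS']

copied = AT.copy()  # ndarray.copy() defaults to order='C'
assert copied.flags['C_CONTIGUOUS']
assert numpy.allclose(copied, AT)
```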
Thanks,
Frédéric Bastien
_______________________________________________
PyCUDA mailing list
[email protected]
http://lists.tiker.net/listinfo/pycuda