Hello all,
I am implementing a simple 3D convolution on the GPU using pyfft. The
basic idea is straightforward: take the 3D Fourier transform of each
array, multiply the two transforms, and take the inverse transform of
the product. The code below works correctly when my input arrays are
256^3 but fails (it executes, but gives garbage results) for a 512^3
voxel grid.
# w, h, k are the array dimensions (each a power of 2)
# im1, im2 are the input 3d arrays of dtype complex64
import numpy as np
import pycuda.autoinit
import pycuda.gpuarray as gpuarray
from pyfft.cuda import Plan

plan = Plan((w, h, k), normalize=True)
# forward transform on device (in place)
im1_gpu = gpuarray.to_gpu(im1)
plan.execute(im1_gpu)
im1_ft = im1_gpu.get()
del im1_gpu
im2_gpu = gpuarray.to_gpu(im2)
plan.execute(im2_gpu)
im2_ft = im2_gpu.get()
del im2_gpu
# do multiplication on host - could also be done on device
conv = im1_ft * im2_ft
# inverse transform on device
conv_gpu = gpuarray.to_gpu(conv)
plan.execute(conv_gpu, inverse=True)  # pass a bool, not the string 'True'
corr = conv_gpu.get()
I don't think there is anything wrong with the code as such (it works
for smaller array sizes), but I am perplexed as to why the failure
occurs. I am running the code on a Tesla C2050 (2.8GB of available
memory), so there should be enough space to hold a 512^3 complex64
array (1GB). Does anyone have an explanation?
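For what it's worth, a quick way to localize the failure is a pure-NumPy sanity check of the same FFT-convolution pipeline on a small grid: circularly convolving with a unit impulse must return the other array unchanged. If this identity holds on the CPU but not in the GPU pipeline at large sizes, the problem is in the GPU path rather than the math. A sketch (using numpy.fft as the reference, grid size chosen arbitrarily):

```python
import numpy as np

n = 8  # small test grid; any power of 2 works
rng = np.random.default_rng(0)
im2 = (rng.standard_normal((n, n, n))
       + 1j * rng.standard_normal((n, n, n))).astype(np.complex64)

# Unit impulse at the origin: FFT convolution with it is the identity.
im1 = np.zeros((n, n, n), dtype=np.complex64)
im1[0, 0, 0] = 1.0

conv = np.fft.ifftn(np.fft.fftn(im1) * np.fft.fftn(im2))
print(np.allclose(conv, im2, atol=1e-5))
```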
-Saigopal
_______________________________________________
PyCUDA mailing list
[email protected]
http://lists.tiker.net/listinfo/pycuda