Hello all,
        I am implementing a simple 3D convolution on the GPU using pyfft. The
basic idea is straightforward: take the 3D Fourier transform of each
array, multiply the transforms, and take the inverse transform of the
product. The code below works correctly when my input array is 256^3
but fails (it executes but gives garbage results) for a 512^3 voxel grid.


import numpy
import pycuda.autoinit          # initializes a CUDA context
from pycuda import gpuarray
from pyfft.cuda import Plan

# w, h, k are the array dimensions, each a power of 2
# im1, im2 are the input 3D arrays of dtype complex64

plan = Plan((w, h, k), normalize=True)

# forward transform on device
im1_gpu = gpuarray.to_gpu(im1)
plan.execute(im1_gpu) 
im1_ft = im1_gpu.get()
del im1_gpu

im2_gpu = gpuarray.to_gpu(im2) 
plan.execute(im2_gpu)   
im2_ft = im2_gpu.get()
del im2_gpu


# do the multiplication on the host - this could also be done on the device
conv = im1_ft * im2_ft

# inverse transform on device
conv_gpu = gpuarray.to_gpu(conv)
plan.execute(conv_gpu, inverse=True)  # pass the boolean True, not the string 'True'
corr = conv_gpu.get() 
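For reference, the same FFT-multiply-inverse pipeline can be sanity-checked on the host with numpy alone, comparing against a brute-force circular convolution on a tiny grid. The 8^3 size and random data below are illustrative only, not the 256^3/512^3 arrays from the code above:

```python
import numpy as np

# Small 3D grid so the brute-force check stays cheap
n = 8
rng = np.random.default_rng(0)
im1 = rng.standard_normal((n, n, n)).astype(np.complex64)
im2 = rng.standard_normal((n, n, n)).astype(np.complex64)

# FFT route: forward transforms, pointwise multiply, inverse transform
conv_fft = np.fft.ifftn(np.fft.fftn(im1) * np.fft.fftn(im2))

# Direct circular convolution for comparison:
# conv[k] = sum_j im1[j] * im2[(k - j) mod n], per axis
conv_direct = np.zeros((n, n, n), dtype=np.complex128)
rev = im2[::-1, ::-1, ::-1]
for dx in range(n):
    for dy in range(n):
        for dz in range(n):
            shifted = np.roll(np.roll(np.roll(rev, dx + 1, axis=0),
                                      dy + 1, axis=1), dz + 1, axis=2)
            conv_direct[dx, dy, dz] = np.sum(im1 * shifted)

# The two routes should agree to single precision
print(np.allclose(conv_fft, conv_direct, atol=1e-3))
```

If the two results agree on the small grid, the algorithm itself is fine and any large-size failure points at the GPU path rather than the math.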


I don't think there's anything wrong with the code as such (it works
for smaller array sizes), but I am perplexed as to why the failure
occurs. I am running the code on a Tesla C2050 (2.8GB available
memory), so there should be enough space to hold the 512^3 array with
complex64 dtype. Does anyone have an explanation?
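For the memory side, the raw footprint of a single n^3 complex64 array is easy to check on the host. This is just arithmetic about the arrays themselves; it says nothing about whatever temporary scratch buffers the FFT plan may allocate on top:

```python
import numpy as np

def grid_bytes(n, dtype=np.complex64):
    """Bytes occupied by one n^3 array of the given dtype (complex64 = 8 bytes/element)."""
    return n ** 3 * np.dtype(dtype).itemsize

print(grid_bytes(256) / 2 ** 20, "MiB")  # 256^3 -> 128.0 MiB
print(grid_bytes(512) / 2 ** 30, "GiB")  # 512^3 -> 1.0 GiB
```

So a single 512^3 complex64 array is 1 GiB, which by itself fits comfortably in 2.8 GB, though plan workspace and any extra copies would come on top of that.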

-Saigopal 


_______________________________________________
PyCUDA mailing list
[email protected]
http://lists.tiker.net/listinfo/pycuda
