This is a more general question.  I was attempting to run the code below on my 
aforementioned OLD POS CUDA card.  I get the following error:

Traceback (most recent call last):
  File "histo.py", line 54, in <module>
    mod_grid = compiler.SourceModule(grid_gpu)
  File "/usr/local/lib/python2.7/site-packages/pycuda-2011.2.2-py2.7-linux-x86_64.egg/pycuda/compiler.py", line 283, in __init__
    arch, code, cache_dir, include_dirs)
  File "/usr/local/lib/python2.7/site-packages/pycuda-2011.2.2-py2.7-linux-x86_64.egg/pycuda/compiler.py", line 273, in compile
    return compile_plain(source, options, keep, nvcc, cache_dir)
  File "/usr/local/lib/python2.7/site-packages/pycuda-2011.2.2-py2.7-linux-x86_64.egg/pycuda/compiler.py", line 137, in compile_plain
    cmdline, stdout=stdout, stderr=stderr)
pycuda.driver.CompileError: nvcc said it demoted types in source code it compiled--this is likely not what you want.
[command: nvcc --cubin -arch sm_11 -I/usr/local/lib/python2.7/site-packages/pycuda-2011.2.2-py2.7-linux-x86_64.egg/pycuda/../include/pycuda kernel.cu]
[stderr:
ptxas /tmp/tmpxft_0000153b_00000000-2_kernel.ptx, line 95; warning : Double is not supported. Demoting to float
]

Nowhere in the code is Double referenced.  I'm guessing something behind the 
scenes is trying to declare a Double on my behalf.  Is there a way to turn this 
off?

Thanks
Dominic
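
A possible explanation, offered as an assumption rather than a confirmed diagnosis: in CUDA C an unsuffixed literal such as 1.0 has type double, so the line temp_grid[id*interv+bin]+=1.0; in the grid kernel may request double arithmetic even though no variable is declared double. The sketch below lists such literals in a kernel source string. find_double_literals is a hypothetical helper (it is not part of PyCUDA), and its regex is deliberately simple: it does not skip comments or string literals and does not handle forms like 1e3 that have no decimal point.

```python
import re

# Hypothetical helper, not part of PyCUDA. Matches literals like 1.0, .5 or
# 2.5e-3 that are NOT followed by an 'f'/'F' suffix (or any other word char).
_FLOAT_LITERAL = re.compile(
    r"(?<![\w.])(?:\d+\.\d*|\.\d+)(?:[eE][+-]?\d+)?(?![\w.])"
)


def find_double_literals(source):
    """Return (line_number, literal) pairs for unsuffixed float literals."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), 1):
        for match in _FLOAT_LITERAL.finditer(line):
            hits.append((lineno, match.group(0)))
    return hits


# The increment from the grid kernel, as posted and with an 'f' suffix.
print(find_double_literals("temp_grid[id*interv+bin]+=1.0;"))   # [(1, '1.0')]
print(find_double_literals("temp_grid[id*interv+bin]+=1.0f;"))  # []
```

If that is indeed the cause, writing the increment as 1.0f should keep the kernel in single precision. This has not been verified on the card in question.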


-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of 
Francisco Villaescusa Navarro
Sent: Friday, April 06, 2012 11:26 AM
To: Thomas Wiecki
Cc: [email protected]
Subject: Re: [PyCUDA] Histograms with PyCUDA

Thanks for all the suggestions!

Regarding removing sqrt: it seems that the code only gains about 1%,
and you lose the ability to easily define linear intervals...

I have tried with sqrt and sqrtf, but there is no difference in the
total time (or it is very small).

The code to find the histogram of an array with values between 0 and 1
should be something like:

import numpy as np
import time
import pycuda.driver as cuda
import pycuda.autoinit
import pycuda.gpuarray as gpuarray
import pycuda.cumath as cumath
from pycuda.compiler import SourceModule
from pycuda import compiler

grid_gpu_template = """
__global__ void grid(float *values, int size, float *temp_grid)
{
     unsigned int id = threadIdx.x;
     int i,bin;
     const uint interv = %(interv)s;

     for(i=id;i<size;i+=blockDim.x){
         bin=(int)(values[i]*interv);
         if (bin==interv){
            bin=interv-1;
         }
         temp_grid[id*interv+bin]+=1.0;
     }
}
"""

reduction_gpu_template = """
__global__ void reduction(float *temp_grid, float *his)
{
     unsigned int id = blockIdx.x*blockDim.x+threadIdx.x;
     const uint interv = %(interv)s;
     const uint threads = %(max_number_of_threads)s;

     if(id<interv){
         for(int i=0;i<threads;i++){
             his[id]+=temp_grid[id+interv*i];
         }
     }
}
"""

number_of_points=100000000
max_number_of_threads=512
interv=1024

blocks=interv/max_number_of_threads
if interv%max_number_of_threads!=0:
     blocks+=1

values=np.random.random(number_of_points).astype(np.float32)

grid_gpu = grid_gpu_template % {
     'interv': interv,
}
mod_grid = compiler.SourceModule(grid_gpu)
grid = mod_grid.get_function("grid")

reduction_gpu = reduction_gpu_template % {
     'interv': interv,
     'max_number_of_threads': max_number_of_threads,
}
mod_redt = compiler.SourceModule(reduction_gpu)
redt = mod_redt.get_function("reduction")

values_gpu=gpuarray.to_gpu(values)
temp_grid_gpu=gpuarray.zeros((max_number_of_threads,interv),dtype=np.float32)
hist=np.zeros(interv,dtype=np.float32)
hist_gpu=gpuarray.to_gpu(hist)

start=time.clock()*1e3
grid(values_gpu,np.int32(number_of_points),temp_grid_gpu,grid=(1,1),block=(max_number_of_threads,1,1))
redt(temp_grid_gpu,hist_gpu,grid=(blocks,1),block=(max_number_of_threads,1,1))
hist=hist_gpu.get()
print 'Time used to grid with GPU:',time.clock()*1e3-start,' ms'


start=time.clock()*1e3
bins_histo=np.linspace(0.0,1.0,interv+1)
hist_CPU=np.histogram(values,bins=bins_histo)[0]
print 'Time used to grid with CPU:',time.clock()*1e3-start,' ms'

print 'max difference between methods=',np.max(hist_CPU-hist)
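
The two kernels above can be mirrored on the CPU to check the binning logic without a GPU. The following is a minimal NumPy sketch under stated assumptions, not the code that produced the timings: the function names grid_cpu and reduction_cpu are illustrative, and the sizes (interv, threads, n) are shrunk so that it runs quickly.

```python
import numpy as np


def grid_cpu(values, interv, threads):
    """Mimic the 'grid' kernel: one private histogram row per thread."""
    temp_grid = np.zeros((threads, interv), dtype=np.float32)
    for tid in range(threads):
        # Thread tid handles elements tid, tid+threads, tid+2*threads, ...
        chunk = values[tid::threads]
        # Same float32 multiply and truncation to int as in the kernel.
        bins = (chunk * np.float32(interv)).astype(np.int32)
        # A value of exactly 1.0 would land in bin 'interv'; clamp it.
        bins[bins == interv] = interv - 1
        temp_grid[tid] = np.bincount(bins, minlength=interv)
    return temp_grid


def reduction_cpu(temp_grid):
    """Mimic the 'reduction' kernel: sum the per-thread rows bin by bin."""
    return temp_grid.sum(axis=0)


# Illustrative sizes, much smaller than in the script above.
interv, threads, n = 16, 8, 10000
rng = np.random.RandomState(0)
values = rng.random_sample(n).astype(np.float32)

hist = reduction_cpu(grid_cpu(values, interv, threads))
hist_ref = np.histogram(values, bins=np.linspace(0.0, 1.0, interv + 1))[0]
print(np.abs(hist_ref - hist).max())
```

Giving each thread its own row means no two threads ever write to the same cell, so no atomic operations are needed. The cost is the temporary array of threads x interv floats and the second reduction pass.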


################

Results:

Time used to grid with GPU: 680.0  ms
Time used to grid with CPU: 9320.0  ms
max difference between methods= 0.0

So it seems that with this algorithm we can't achieve speedup factors larger
than ~15.
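
For reference, the factor can be worked out from the two timings reported under Results. The short calculation below only makes that arithmetic explicit; the variable names are illustrative, and the numbers are copied from the output above.

```python
# Timings copied from the Results block, in milliseconds.
gpu_ms = 680.0
cpu_ms = 9320.0
number_of_points = 100000000

speedup = cpu_ms / gpu_ms
points_per_second = number_of_points / (gpu_ms / 1e3)

print("speedup: %.1fx" % speedup)                      # speedup: 13.7x
print("GPU throughput: %.2e points/s" % points_per_second)
```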

Fran.



_______________________________________________
PyCUDA mailing list
[email protected]
http://lists.tiker.net/listinfo/pycuda
