Thanks a lot for the replies!
I'm not sure I fully understand what you are saying, so please let me
put it in my own words (if I'm wrong, please let me know):
I transfer the array of numbers I want to bin to the GPU. There, I
overwrite each element of that array with the index of the bin it
falls into, and I return that array (now containing integer bin
indices) to the CPU, where I perform the reduction.
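In NumPy terms, the CPU-side reduction I have in mind would be something like this sketch (the bin indices here are just a made-up stand-in for what the GPU would return):

```python
import numpy as np

# Pretend output from the GPU: for each input value, the index of
# the bin it fell into (hypothetical example with 5 bins).
bin_indices = np.array([0, 2, 2, 4, 1, 2, 0, 3], dtype=np.int32)

# CPU-side reduction: count how many elements landed in each bin.
histogram = np.bincount(bin_indices, minlength=5)
print(histogram)  # [2 1 3 1 1]
```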
Fran.
On 04/04/2012, at 22:34, Pazzula, Dominic J wrote:
Exactly what I was about to propose. Doing the reduction would
probably be faster on the CPU. NumPy + MKL would thread what is
essentially a series of element-wise array additions.
From: [email protected] [mailto:[email protected]] On
Behalf Of David Mertens
Sent: Wednesday, April 04, 2012 3:27 PM
To: Francisco Villaescusa Navarro
Cc: [email protected]
Subject: Re: [PyCUDA] Histograms with PyCUDA
You could take a two-step approach: (1) Have each thread work on a
subset of the data and create its own histogram, then (2) run the
resulting collection of bins through a sum/reduction kernel. The
time for the first step would be roughly N_data /
N_simultaneous_blocks_per_device, and the running time for the
second step would be roughly N_blocks * log(N_blocks).
Not a very detailed answer, but I hope that helps. :-)
David
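As a rough CPU-side sketch of the two steps above (chunks standing in for blocks, and the bin edges being an assumption for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.random(10000).astype(np.float32)
edges = np.linspace(0.0, 1.0, 11)  # 10 bins over [0, 1]

# Step 1: each chunk (standing in for a thread/block) builds
# its own partial histogram over its subset of the data.
chunks = np.array_split(data, 8)
partials = np.array([np.histogram(c, bins=edges)[0] for c in chunks])

# Step 2: sum/reduce the collection of partial histograms;
# this is just an element-wise column sum.
total = partials.sum(axis=0)
assert total.sum() == data.size
```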
On Wed, Apr 4, 2012 at 2:07 PM, Francisco Villaescusa Navarro <[email protected]> wrote:
Thanks a lot for the quick reply!
I was wondering whether I could "count" the number of elements in a
given interval by something such as:
moduleHistogram = SourceModule("""
__global__ void H(float *pos, int size, float his,
                  float lower_limit, float upper_limit)
{
    unsigned int idx = blockIdx.x*blockDim.x + threadIdx.x;
    unsigned int idy = blockIdx.y*blockDim.y + threadIdx.y;
    unsigned int id  = idy*gridDim.x*blockDim.x + idx;

    if (id < size) {
        if (pos[id] < upper_limit && pos[id] > lower_limit) {
            his = his + 1.0;
        }
    }
}
""")
I have tried this, but it doesn't work (the value of the variable his
is not shared between threads; each thread has its own local copy of
it). I also tried this kernel:
moduleHistogram = SourceModule("""
__global__ void H(float *pos, int size, float his,
                  float lower_limit, float upper_limit)
{
    unsigned int idx = blockIdx.x*blockDim.x + threadIdx.x;
    unsigned int idy = blockIdx.y*blockDim.y + threadIdx.y;
    unsigned int id  = idy*gridDim.x*blockDim.x + idx;

    __shared__ float A;
    A = his;
    __syncthreads();

    if (id < size) {
        if (pos[id] < upper_limit && pos[id] > lower_limit) {
            A = A + 1.0;
            __syncthreads();
            his = A;
        }
    }
}
""")
but that doesn't solve the problem either.
I'm probably doing something very silly, and I would like to know
what it is.
Thanks a lot,
Fran.
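For reference, the usual fix for this kind of race is to pass the counter by pointer and update it with atomicAdd. A hedged sketch (not a drop-in replacement for the kernels above; note it uses an unsigned int counter, since integer atomicAdd is available on all compute capabilities, and the counter must be zeroed on the host before launch):

```cuda
// Sketch only: the counter is passed by pointer so all threads see
// the same memory location, and atomicAdd makes the read-modify-write
// safe against concurrent updates from other threads.
__global__ void H(const float *pos, int size, unsigned int *count,
                  float lower_limit, float upper_limit)
{
    unsigned int idx = blockIdx.x*blockDim.x + threadIdx.x;
    unsigned int idy = blockIdx.y*blockDim.y + threadIdx.y;
    unsigned int id  = idy*gridDim.x*blockDim.x + idx;

    if (id < size && pos[id] > lower_limit && pos[id] < upper_limit) {
        atomicAdd(count, 1u);  // no update is lost, unlike his = his + 1.0
    }
}
```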
On 04/04/2012, at 20:32, Andreas Kloeckner wrote:
On Wed, 4 Apr 2012 19:47:08 +0200, Francisco Villaescusa Navarro <[email protected]> wrote:
Hi,
I have been writing some code for a project involving the management
of pretty large data sets. Since I was getting wrong results, I have
tried to simplify the problem as much as possible to understand where
the error is.
The simplified problem is the following:
I have a pretty long array of data containing numbers in a given
interval (let's say between 0.0 and 1.0), for example:
total_numbers = 10000
data = np.random.random(total_numbers).astype(np.float32)
I would like to make a histogram of those data, and I was wondering
what the best strategy to achieve this in PyCUDA would be.
http://lmgtfy.com/?q=cuda+histogram
:) (Nothing special about *Py*CUDA in this instance. In particular,
there's no canned functionality that will do this for you.)
HTH,
Andreas
_______________________________________________
PyCUDA mailing list
[email protected]
http://lists.tiker.net/listinfo/pycuda
--
"Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are,
by definition, not smart enough to debug it." -- Brian Kernighan