I just noticed a bug in this code. "PyArray_ITER_NEXT(iter);" should be moved out of the if statement.
eric eric jones wrote: > > > Rick White wrote: >> Just so we don't get too smug about the speed, if I do this in IDL >> on the same machine it is 10 times faster (0.28 seconds instead of >> 4 seconds). I'm sure the IDL version uses the much faster approach >> of just sweeping through the array once, incrementing counts in the >> appropriate bins. It only handles equal-sized bins, so it is not as >> general as the numpy version -- but equal-sized bins is a very >> common case. I'd still like to see a C version of histogram (which >> I guess would need to be a ufunc) go into the core numpy. >> > Yes, this gets rid of the search, and indices can just be caluclated > from offsets. I've attached a modified weaved histogram that takes > this approach. Running the snippet below on my machine takes .118 sec > for the evenly binned weave algorithm and 0.385 sec for Rick's > algorithm on 5 million elements. That is close to 4x faster (but not > 10x...), so there is indeed some speed to be gained for the common > special case. I don't know if the code I wrote has a 2x gain left in > it, but I've spent zero time optimizing it. I'd bet it can be > improved substantially. > > eric > > ### test_weave_even_histogram.py > > from numpy import arange, product, sum, zeros, uint8 > from numpy.random import randint > > import weave_even_histogram > > import time > > shape = 1000,1000,5 > size = product(shape) > data = randint(0,256,size).astype(uint8) > bins = arange(256+1) > > print 'type:', data.dtype > print 'millions of elements:', size/1e6 > > bin_start = 0 > bin_size = 1 > bin_count = 256 > t1 = time.clock() > res = weave_even_histogram.histogram(data, bin_start, bin_size, > bin_count) > t2 = time.clock() > print 'sec (evenly spaced):', t2-t1, sum(res) > print res > > >> Rick >> _______________________________________________ >> Numpy-discussion mailing list >> Numpy-discussion@scipy.org >> http://projects.scipy.org/mailman/listinfo/numpy-discussion >> >> > > ------------------------------------------------------------------------ > > from numpy import array, zeros, asarray, sort, int32 > from scipy import weave > from typed_array_converter import converters > > def histogram(ary, bin_start, bin_size, bin_count): > > ary = asarray(ary) > > # Create an array to hold the histogram count results. > results = zeros(bin_count,dtype=int32) > > # The C++ code that actually does the histogramming. > code = """ > PyArrayIterObject *iter = > (PyArrayIterObject*)PyArray_IterNew(py_ary); > > while(iter->index < iter->size) > { > > ////////////////////////////////////////////////////////// > // binary search > ////////////////////////////////////////////////////////// > > // This requires an update to weave > ary_data_type value = *((ary_data_type*)iter->dataptr); > if (value>=bin_start) > { > int bin_index = (int)((value-bin_start)/bin_size); > > ////////////////////////////////////////////////////////// > // Bin counter increment > ////////////////////////////////////////////////////////// > > // If the value was found, increment the counter for that > bin. > if (bin_index < bin_count) > { > results[bin_index]++; > } > PyArray_ITER_NEXT(iter); > } > } > """ > weave.inline(code, ['ary', 'bin_start', 'bin_size','bin_count', > 'results'], > type_converters=converters, > compiler='gcc') > > return results > > ------------------------------------------------------------------------ > > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion@scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion > _______________________________________________ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion