subject:"[Numpy-discussion] Histograms of extremely large data sets"

Re: [Numpy-discussion] Histograms of extremely large data sets

2006-12-14 Thread Cameron Walsh

Using Eric's latest speed-testing, here are David's results:
[EMAIL PROTECTED]:~/code_snippets/histogram$ python histogram_speed.py
type: uint8
millions of elements: 100.0
sec (C indexing based): 8.44 1
sec (numpy iteration based): 8.91 1
sec (rick's pure python): 6.4 1
sec (

Re: [Numpy-discussion] Histograms of extremely large data sets

2006-12-14 Thread David Huard

Hi, I spent some time a while ago on a histogram function for numpy. It uses digitize and bincount instead of sorting the data. If I remember right, it was significantly faster than numpy's histogram, but I don't know how it will behave with very large data sets. I attached the file if you want
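A minimal sketch of the digitize-and-bincount idea David describes, assuming monotonically increasing bin edges. It uses only numpy.digitize and numpy.bincount; the function name `digitize_histogram` is hypothetical, and this is not David's attached file.

```python
import numpy as np


def digitize_histogram(data, edges):
    """Count values of `data` in the half-open bins [edges[i], edges[i+1]).

    Values below edges[0] or at/above edges[-1] are dropped.
    `edges` must be increasing.
    """
    edges = np.asarray(edges)
    # digitize returns 0 for x < edges[0], len(edges) for x >= edges[-1],
    # and i when edges[i-1] <= x < edges[i].
    idx = np.digitize(np.ravel(data), edges)
    # bincount tallies all indices in one pass; minlength guarantees a slot
    # for every possible index even when some bins are empty.
    counts = np.bincount(idx, minlength=len(edges) + 1)
    # Slots 1 .. len(edges)-1 are the real bins; 0 and len(edges) are overflow.
    return counts[1:len(edges)]
```

Unlike numpy.histogram, which closes the last bin on the right, this sketch keeps every bin half-open, so a value exactly equal to `edges[-1]` is not counted.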

Re: [Numpy-discussion] Histograms of extremely large data sets

2006-12-14 Thread eric jones

I just noticed a bug in this code. "PyArray_ITER_NEXT(iter);" should be moved out of the if statement. eric eric jones wrote: > > > Rick White wrote: >> Just so we don't get too smug about the speed, if I do this in IDL >> on the same machine it is 10 times faster (0.28 seconds instead of >>
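Eric's fix concerns the structure of the counting loop: the iterator has to advance for every element, not only for elements that fall inside the histogram range. The sketch below is a hypothetical pure-Python analog of such a loop, not Eric's C code; the explicit index `i` stands in for the array iterator and `i += 1` stands in for PyArray_ITER_NEXT.

```python
def count_in_range(values, lo, hi, nbins):
    """Hypothetical pure-Python analog of an iterator-driven histogram loop.

    `i` plays the role of the array iterator and `i += 1` the role of
    PyArray_ITER_NEXT. Bins are equal-width over [lo, hi).
    """
    counts = [0] * nbins
    width = (hi - lo) / nbins
    i = 0
    while i < len(values):
        x = values[i]
        if lo <= x < hi:
            # Clamp in case floating-point rounding yields index == nbins.
            b = min(int((x - lo) / width), nbins - 1)
            counts[b] += 1
        # Advance unconditionally. If this line sat inside the `if` above,
        # the loop would never get past the first out-of-range value.
        i += 1
    return counts
```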

Re: [Numpy-discussion] Histograms of extremely large data sets

2006-12-14 Thread eric jones

Rick White wrote: Just so we don't get too smug about the speed, if I do this in IDL on the same machine it is 10 times faster (0.28 seconds instead of 4 seconds). I'm sure the IDL version uses the much faster approach of just sweeping through the array once, incrementing counts in the
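Rick's point is that a histogram needs only one sweep over the data, incrementing one counter per element. A vectorized sketch of that single-pass idea for equal-width bins, assuming a NumPy version that provides numpy.add.at (added long after this 2006 thread); the name `uniform_histogram` is hypothetical, and this is not the IDL implementation.

```python
import numpy as np


def uniform_histogram(data, lo, hi, nbins):
    """Single-pass histogram with `nbins` equal-width bins over [lo, hi).

    Each element is mapped to its bin arithmetically (no sorting and no
    searching), then that bin's counter is incremented.
    """
    x = np.ravel(data).astype(np.float64)
    # Out-of-range values are simply skipped.
    x = x[(x >= lo) & (x < hi)]
    idx = ((x - lo) * (nbins / (hi - lo))).astype(np.intp)
    # Guard against rounding pushing a value just below `hi` into bin nbins.
    np.minimum(idx, nbins - 1, out=idx)
    counts = np.zeros(nbins, dtype=np.int64)
    # add.at is unbuffered, so repeated indices each add 1;
    # `counts[idx] += 1` would count a repeated index only once.
    np.add.at(counts, idx, 1)
    return counts
```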

Re: [Numpy-discussion] Histograms of extremely large data sets

2006-12-14 Thread Brian Granger

This same idea could be used to parallelize the histogram computation. Then you could really get into large (many GB/TB/PB) data sets. I might try to find time to do this with ipython1, but someone else could do this as well. Brian On 12/13/06, Rick White <[EMAIL PROTECTED]> wrote: > On Dec 12,
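Parallelizing works because histograms computed with the same fixed bin edges are additive: the histogram of the whole data set is the element-wise sum of the histograms of its pieces. A minimal sketch, assuming fixed edges and using a local thread pool from Python's concurrent.futures as a stand-in for a distributed framework such as ipython1 (whose API is not used here). The names `parallel_histogram` and `n_chunks` are hypothetical, and whether threads give a real speed-up depends on how much of the work NumPy does outside the GIL.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor


def parallel_histogram(data, edges, n_chunks=4, max_workers=4):
    """Histogram `data` by splitting it into chunks and summing partial counts.

    Every chunk is binned with the same `edges`, so the partial results can
    be added element-wise to give the histogram of the full data set.
    """
    chunks = np.array_split(np.ravel(data), n_chunks)

    def partial(chunk):
        counts, _ = np.histogram(chunk, bins=edges)
        return counts

    # "Map" step: one partial histogram per chunk.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        partials = list(pool.map(partial, chunks))
    # "Reduce" step: element-wise sum of the per-chunk histograms.
    return np.sum(partials, axis=0)
```

The same map-then-sum structure carries over to processes or separate machines: only the small count arrays need to travel back, never the data itself.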

Re: [Numpy-discussion] Histograms of extremely large data sets

2006-12-14 Thread Rick White

On Dec 14, 2006, at 2:56 AM, Cameron Walsh wrote: > At some point I might try and test > different cache sizes for different data-set sizes and see what the > effect is. For now, 65536 seems a good number and I would be happy to > see this replace the current numpy.histogram. I experimented a li
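The "cache size" here is the number of elements processed at a time. The following sketch of a block-wise sort-and-searchsorted histogram makes that size a parameter so different values can be timed. It is a reconstruction of the general sorting approach discussed in the thread, not Rick's code; `blocked_histogram` and `block` are hypothetical names.

```python
import numpy as np


def blocked_histogram(data, edges, block=65536):
    """Histogram over half-open bins [edges[i], edges[i+1]), block by block.

    Each block is sorted; searchsorted then gives, for every edge, how many
    values in the block lie strictly below it. Differences of those
    positions are the per-bin counts, which are accumulated across blocks.
    """
    edges = np.asarray(edges)
    flat = np.ravel(data)
    counts = np.zeros(len(edges) - 1, dtype=np.int64)
    for start in range(0, flat.size, block):
        sorted_block = np.sort(flat[start:start + block])
        # Number of values in this block strictly less than each edge.
        below = np.searchsorted(sorted_block, edges, side="left")
        counts += np.diff(below)
    return counts
```

Working on `block` elements at a time bounds the size of each sort; which value is fastest is machine-dependent, so timing a few block sizes against the data-set sizes of interest is the experiment Cameron describes.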

Re: [Numpy-discussion] Histograms of extremely large data sets

2006-12-13 Thread Cameron Walsh

Hi all, Absolutely gorgeous, I confirm the 1.6x speed-up over the weave version, i.e. a 25x speed-up over the existing version. It would be good if the redefinition of the range function could be changed in the numpy modules, before it goes into subversion, to avoid the need for Rick's line lran

Re: [Numpy-discussion] Histograms of extremely large data sets

2006-12-13 Thread eric jones

Looks to me like Rick's version is simpler and faster. It looks like it offers a speed-up of about 1.6 on my machine over the weave version. I believe this is because the sorting approach results in quite a few fewer comparisons than the algorithm I used. Very cool. I vote that his version go int

Re: [Numpy-discussion] Histograms of extremely large data sets

2006-12-13 Thread Rick White

On Dec 12, 2006, at 10:27 PM, Cameron Walsh wrote: > I'm trying to generate histograms of extremely large datasets. I've > tried a few methods, listed below, all with their own shortcomings. > Mailing-list archive and google searches have not revealed any > solutions. The numpy.histogram functio

Re: [Numpy-discussion] Histograms of extremely large data sets

2006-12-13 Thread eric jones

Glad to hear it worked for you. see ya, eric Cameron Walsh wrote: > Thanks very much, Eric. That line fixed it for me, although I'm still > not sure why it broke with the last line. > > Your weave_histogram works a charm and is around 16 times faster than > any of the other options I've tried.

Re: [Numpy-discussion] Histograms of extremely large data sets

2006-12-13 Thread Cameron Walsh

Thanks very much, Eric. That line fixed it for me, although I'm still not sure why it broke with the last line. Your weave_histogram works a charm and is around 16 times faster than any of the other options I've tried. On my laptop it took 30 seconds to generate a histogram from 500 million numb

Re: [Numpy-discussion] Histograms of extremely large data sets

2006-12-13 Thread eric jones

Hmmm. Not sure. Change that line to this instead, which should work as well: code = array_converter.declaration_code(self, templatize, inline) Both work for me. eric Cameron Walsh wrote: > On 13/12/06, Cameron Walsh <[EMAIL PROTECTED]> wrote: > >> On 13/12/06, eric jones <[EMAI

Re: [Numpy-discussion] Histograms of extremely large data sets

2006-12-13 Thread Cameron Walsh

On 13/12/06, Cameron Walsh <[EMAIL PROTECTED]> wrote: > On 13/12/06, eric jones <[EMAIL PROTECTED]> wrote 290 lines of > awesome code and a fantastic explanation: > > > Hey Cameron, > > > > I wrote a simple weave based histogram function that should work for > > your problem. It should work for an

Re: [Numpy-discussion] Histograms of extremely large data sets

2006-12-13 Thread Cameron Walsh

On 13/12/06, eric jones <[EMAIL PROTECTED]> wrote 290 lines of awesome code and a fantastic explanation: > Hey Cameron, > > I wrote a simple weave based histogram function that should work for > your problem. It should work for any array input data type. The needed > files (and a few tests and e

Re: [Numpy-discussion] Histograms of extremely large data sets

2006-12-12 Thread eric jones

Hey Cameron, I wrote a simple weave based histogram function that should work for your problem. It should work for any array input data type. The needed files (and a few tests and examples) are attached. Below is the output from the histogram_speed.py file attached. The test takes about 1

[Numpy-discussion] Histograms of extremely large data sets

2006-12-12 Thread Cameron Walsh

Hi all, I'm trying to generate histograms of extremely large datasets. I've tried a few methods, listed below, all with their own shortcomings. Mailing-list archive and google searches have not revealed any solutions.
Method 1:
import numpy
import matplotlib
data=numpy.empty((489,1000,1000),dt
