Using Eric's latest speed-testing, here are David's results:
[EMAIL PROTECTED]:~/code_snippets/histogram$ python histogram_speed.py
type: uint8
millions of elements: 100.0
sec (C indexing based): 8.44
sec (numpy iteration based): 8.91
sec (rick's pure python): 6.4
sec (
Hi,
I spent some time a while ago on a histogram function for numpy. It uses
digitize and bincount instead of sorting the data. If I remember right, it
was significantly faster than numpy's histogram, but I don't know how it
will behave with very large data sets.
I attached the file if you want
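For reference, the digitize/bincount idea can be sketched like this (a minimal sketch, not the attached file itself; the function name and edge handling are mine):

```python
import numpy as np

def histogram_db(data, bins):
    # sketch of the digitize + bincount approach described above
    data = np.asarray(data).ravel()
    # digitize maps each value to the index of the bin it falls in
    idx = np.digitize(data, bins)
    # bincount then tallies all bin indices in a single pass
    counts = np.bincount(idx, minlength=len(bins) + 1)
    # slot 0 is "below the first edge", slot len(bins) is "at/above the last"
    return counts[1:len(bins)]
```

No sort of the data is needed; the cost is one digitize pass plus one bincount pass.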
I just noticed a bug in this code. "PyArray_ITER_NEXT(iter);" should be moved
out of the if statement.
eric
eric jones wrote:
> Rick White wrote:
>> Just so we don't get too smug about the speed, if I do this in IDL
>> on the same machine it is 10 times faster (0.28 seconds instead of
>>
Rick White wrote:
Just so we don't get too smug about the speed, if I do this in IDL on
the same machine it is 10 times faster (0.28 seconds instead of 4
seconds). I'm sure the IDL version uses the much faster approach of
just sweeping through the array once, incrementing counts in the
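The single-sweep idea (one pass over the data, bumping a counter per value) might look like this in pure Python; a compiled version of exactly this loop is presumably what makes IDL fast (function and parameter names are mine):

```python
import numpy as np

def histogram_sweep(data, nbins, lo, hi):
    # one sweep: compute each value's bin, increment that counter
    counts = np.zeros(nbins, dtype=np.intp)
    scale = nbins / float(hi - lo)
    for v in np.asarray(data).ravel():
        b = int((v - lo) * scale)
        if 0 <= b < nbins:          # drop out-of-range values
            counts[b] += 1
    return counts
```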
This same idea could be used to parallelize the histogram computation.
Then you could really get into large (many GB/TB/PB) data sets. I
might try to find time to do this with ipython1, but someone else
could do this as well.
Brian
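A sketch of the parallelization idea (names are mine, and this uses threads rather than ipython1): per-chunk histograms simply sum, so the work splits cleanly.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def partial_hist(chunk):
    # each worker histograms its own slice of the data
    return np.bincount(chunk, minlength=256)

def parallel_hist(data, nworkers=4):
    # the histogram of the whole array is the elementwise sum of the
    # histograms of its pieces
    chunks = np.array_split(np.asarray(data).ravel(), nworkers)
    with ThreadPoolExecutor(nworkers) as pool:
        partials = list(pool.map(partial_hist, chunks))
    return np.sum(partials, axis=0)
```

For truly huge (multi-node) data sets the same reduction works across machines: each node histograms its shard and the results are summed.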
On 12/13/06, Rick White <[EMAIL PROTECTED]> wrote:
> On Dec 12,
On Dec 14, 2006, at 2:56 AM, Cameron Walsh wrote:
> At some point I might try and test
> different cache sizes for different data-set sizes and see what the
> effect is. For now, 65536 seems a good number and I would be happy to
> see this replace the current numpy.histogram.
I experimented a li
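The cache-size experiment can be reproduced by exposing the chunk length as a parameter (a sketch; 65536 is the value discussed above, the function name and the rest are mine):

```python
import numpy as np

def histogram_blocked(data, bins, block=65536):
    # accumulate the histogram chunk by chunk; `block` is the length
    # to tune so a sorted chunk stays cache-resident
    flat = np.asarray(data).ravel()
    counts = np.zeros(len(bins) - 1, dtype=np.intp)
    for start in range(0, flat.size, block):
        chunk = np.sort(flat[start:start + block])
        # searchsorted on a sorted chunk gives cumulative counts at each
        # bin edge; adjacent differences are the per-bin counts
        counts += np.diff(chunk.searchsorted(bins))
    return counts
```

Timing this for several `block` values against a fixed data-set size is one way to run the test described above.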
Hi all,
Absolutely gorgeous, I confirm the 1.6x speed-up over the weave
version, i.e. a 25x speed-up over the existing version.
It would be good if the redefinition of the range function could be
changed in the numpy modules, before it goes into subversion, to
avoid the need for Rick's line
lran
Looks to me like Rick's version is simpler and faster. It looks like it
offers a speed-up of about 1.6 on my machine over the weave version. I
believe this is because the sorting approach results in quite a few fewer
comparisons than the algorithm I used.
Very cool. I vote that his version go int
On Dec 12, 2006, at 10:27 PM, Cameron Walsh wrote:
> I'm trying to generate histograms of extremely large datasets. I've
> tried a few methods, listed below, all with their own shortcomings.
> Mailing-list archive and google searches have not revealed any
> solutions.
The numpy.histogram functio
Glad to hear it worked for you.
see ya,
eric
Cameron Walsh wrote:
> Thanks very much, Eric. That line fixed it for me, although I'm still
> not sure why it broke with the last line.
>
> Your weave_histogram works a charm and is around 16 times faster than
> any of the other options I've tried.
Thanks very much, Eric. That line fixed it for me, although I'm still
not sure why it broke with the last line.
Your weave_histogram works a charm and is around 16 times faster than
any of the other options I've tried. On my laptop it took 30 seconds
to generate a histogram from 500 million numb
Hmmm. Not sure.
Change that line to this instead which should work as well.
code = array_converter.declaration_code(self, templatize, inline)
Both work for me.
eric
Cameron Walsh wrote:
> On 13/12/06, Cameron Walsh <[EMAIL PROTECTED]> wrote:
>
>> On 13/12/06, eric jones <[EMAI
On 13/12/06, Cameron Walsh <[EMAIL PROTECTED]> wrote:
> On 13/12/06, eric jones <[EMAIL PROTECTED]> wrote 290 lines of
> awesome code and a fantastic explanation:
>
> > Hey Cameron,
> >
> > I wrote a simple weave based histogram function that should work for
> > your problem. It should work for an
On 13/12/06, eric jones <[EMAIL PROTECTED]> wrote 290 lines of
awesome code and a fantastic explanation:
> Hey Cameron,
>
> I wrote a simple weave based histogram function that should work for
> your problem. It should work for any array input data type. The needed
> files (and a few tests and e
Hey Cameron,
I wrote a simple weave based histogram function that should work for
your problem. It should work for any array input data type. The needed
files (and a few tests and examples) are attached.
Below is the output from the histogram_speed.py file attached. The test
takes about 1
Hi all,
I'm trying to generate histograms of extremely large datasets. I've
tried a few methods, listed below, all with their own shortcomings.
Mailing-list archive and google searches have not revealed any
solutions.
Method 1:
import numpy
import matplotlib
data=numpy.empty((489,1000,1000),dt
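For 8-bit data of this shape, one workable baseline (shown here on a small stand-in array, since the original is 489 million elements) is a single bincount pass over the raveled array:

```python
import numpy as np

# small stand-in; the original array is shape (489, 1000, 1000), uint8
data = (np.arange(24) % 7).astype(np.uint8).reshape(2, 3, 4)

# for uint8 data the full 256-bin histogram is one bincount pass:
# no explicit bin edges, no sorting, no per-element Python loop
hist = np.bincount(data.ravel(), minlength=256)
```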