Re: [Numpy-discussion] searchsorted() and memory cache

2008-05-15 Thread Charles R Harris
On Wed, May 14, 2008 at 8:50 PM, Andrew Straw <[EMAIL PROTECTED]> wrote: > Aha, I've found the problem -- my values were int64 and my keys were > uint64. Switching to the same data type immediately fixes the issue! > It's not a memory cache issue at all. > > Perhaps searchsorted() should emit a wa

Re: [Numpy-discussion] searchsorted() and memory cache

2008-05-14 Thread Andrew Straw
Aha, I've found the problem -- my values were int64 and my keys were uint64. Switching to the same data type immediately fixes the issue! It's not a memory cache issue at all. Perhaps searchsorted() should emit a warning if the keys require casting... I can't believe how bad the hit was. -Andrew

Re: [Numpy-discussion] searchsorted() and memory cache

2008-05-14 Thread Charles R Harris
On Wed, May 14, 2008 at 2:00 PM, Andrew Straw <[EMAIL PROTECTED]> wrote: > Charles R Harris wrote: > > > > > > On Wed, May 14, 2008 at 8:09 AM, Andrew Straw <[EMAIL PROTECTED] > > > wrote: > > > > > > > > Quite a difference (a factor of about 3000)! At this point, I h

Re: [Numpy-discussion] searchsorted() and memory cache

2008-05-14 Thread Andrew Straw
Charles R Harris wrote: > > > On Wed, May 14, 2008 at 8:09 AM, Andrew Straw <[EMAIL PROTECTED] > > wrote: > > > > Quite a difference (a factor of about 3000)! At this point, I haven't > delved into the dataset to see what makes it so pathological -- > performan

Re: [Numpy-discussion] searchsorted() and memory cache

2008-05-14 Thread Charles R Harris
On Wed, May 14, 2008 at 8:09 AM, Andrew Straw <[EMAIL PROTECTED]> wrote: > > > I will post any new insights as I continue to work on this... > > > OK, I save isolated a sample of my data that illustrates the terrible > performance with the binarysearch. I have uploaded it as a pytables file > to h

Re: [Numpy-discussion] searchsorted() and memory cache

2008-05-14 Thread Andrew Straw
Andrew Straw wrote: > I have uploaded it as a pytables file to http://astraw.com/framenumbers.h5 Ahh, forgot to mention a potentially important point -- this data file is 191 MB. ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://project

Re: [Numpy-discussion] searchsorted() and memory cache

2008-05-14 Thread Andrew Straw
> I will post any new insights as I continue to work on this... > OK, I save isolated a sample of my data that illustrates the terrible performance with the binarysearch. I have uploaded it as a pytables file to http://astraw.com/framenumbers.h5 in case anyone wants to have a look themselves. H

Re: [Numpy-discussion] searchsorted() and memory cache

2008-05-13 Thread Andrew Straw
Nathan Bell wrote: > On Tue, May 13, 2008 at 6:59 PM, Andrew Straw <[EMAIL PROTECTED]> wrote: > >> easier and still blazingly fast compared to the binary search >> implemented in searchsorted() given today's cached memory architectures. >> > > Andrew, I looked at your code and I don't qui

Re: [Numpy-discussion] searchsorted() and memory cache

2008-05-13 Thread Nathan Bell
On Tue, May 13, 2008 at 6:59 PM, Andrew Straw <[EMAIL PROTECTED]> wrote: > easier and still blazingly fast compared to the binary search > implemented in searchsorted() given today's cached memory architectures. Andrew, I looked at your code and I don't quite understand something. Why are you lo

Re: [Numpy-discussion] searchsorted() and memory cache

2008-05-13 Thread Andrew Straw
Charles R Harris wrote: > > > On Tue, May 13, 2008 at 5:59 PM, Andrew Straw <[EMAIL PROTECTED] > > wrote: > > Thanks for all the comments on my original question. I was more > offline > than intended after I sent it until now, so I'm sorry I wasn't > immedi

Re: [Numpy-discussion] searchsorted() and memory cache

2008-05-13 Thread Charles R Harris
On Tue, May 13, 2008 at 5:59 PM, Andrew Straw <[EMAIL PROTECTED]> wrote: > Thanks for all the comments on my original question. I was more offline > than intended after I sent it until now, so I'm sorry I wasn't > immediately able to participate in the discussion. > > Anyhow, after working on this

Re: [Numpy-discussion] searchsorted() and memory cache

2008-05-13 Thread Andrew Straw
Thanks for all the comments on my original question. I was more offline than intended after I sent it until now, so I'm sorry I wasn't immediately able to participate in the discussion. Anyhow, after working on this a bit more, I came up with a few implementations of search algorithms doing just w

Re: [Numpy-discussion] searchsorted() and memory cache

2008-05-10 Thread Charles R Harris
On Sat, May 10, 2008 at 9:31 AM, Stéfan van der Walt <[EMAIL PROTECTED]> wrote: > Hi Andrew > > 2008/5/9 Andrew Straw <[EMAIL PROTECTED]>: > > I've got a big element array (25 million int64s) that searchsorted() > > takes a long time to grind through. After a bit of digging in the > > literature a

Re: [Numpy-discussion] searchsorted() and memory cache

2008-05-10 Thread Stéfan van der Walt
Hi Andrew 2008/5/9 Andrew Straw <[EMAIL PROTECTED]>: > I've got a big element array (25 million int64s) that searchsorted() > takes a long time to grind through. After a bit of digging in the > literature and the numpy source code, I believe that searchsorted() is > implementing a classic binary s

Re: [Numpy-discussion] searchsorted() and memory cache

2008-05-09 Thread Charles R Harris
On Thu, May 8, 2008 at 9:51 PM, Andrew Straw <[EMAIL PROTECTED]> wrote: > I've got a big element array (25 million int64s) that searchsorted() > takes a long time to grind through. After a bit of digging in the > literature and the numpy source code, I believe that searchsorted() is > implementing

Re: [Numpy-discussion] searchsorted() and memory cache

2008-05-09 Thread Bruce Southey
Hi, I don't know if it helps, but Ulrich Drepper had a 9 part series about memory in Linux Weekly News (http://lwn.net). You can under all 9 linked under his name in the Guest archives (http://lwn.net/Archives/GuestIndex/) as not all are linked together. The first one is: http://lwn.net/Arti

Re: [Numpy-discussion] searchsorted() and memory cache

2008-05-09 Thread Francesc Alted
A Friday 09 May 2008, Andrew Straw escrigué: > I've got a big element array (25 million int64s) that searchsorted() > takes a long time to grind through. After a bit of digging in the > literature and the numpy source code, I believe that searchsorted() > is implementing a classic binary search, wh

Re: [Numpy-discussion] searchsorted() and memory cache

2008-05-08 Thread Charles R Harris
On Thu, May 8, 2008 at 11:06 PM, Charles R Harris <[EMAIL PROTECTED]> wrote: > > > On Thu, May 8, 2008 at 10:30 PM, Charles R Harris < > [EMAIL PROTECTED]> wrote: > >> >> >> On Thu, May 8, 2008 at 9:55 PM, Robert Kern <[EMAIL PROTECTED]> >> wrote: >> >>> On Thu, May 8, 2008 at 10:51 PM, Andrew Str

Re: [Numpy-discussion] searchsorted() and memory cache

2008-05-08 Thread Charles R Harris
On Thu, May 8, 2008 at 10:30 PM, Charles R Harris <[EMAIL PROTECTED]> wrote: > > > On Thu, May 8, 2008 at 9:55 PM, Robert Kern <[EMAIL PROTECTED]> wrote: > >> On Thu, May 8, 2008 at 10:51 PM, Andrew Straw <[EMAIL PROTECTED]> >> wrote: >> > I've got a big element array (25 million int64s) that sear

Re: [Numpy-discussion] searchsorted() and memory cache

2008-05-08 Thread Charles R Harris
On Thu, May 8, 2008 at 9:55 PM, Robert Kern <[EMAIL PROTECTED]> wrote: > On Thu, May 8, 2008 at 10:51 PM, Andrew Straw <[EMAIL PROTECTED]> wrote: > > I've got a big element array (25 million int64s) that searchsorted() > > takes a long time to grind through. After a bit of digging in the > > liter

Re: [Numpy-discussion] searchsorted() and memory cache

2008-05-08 Thread Robert Kern
On Thu, May 8, 2008 at 10:51 PM, Andrew Straw <[EMAIL PROTECTED]> wrote: > I've got a big element array (25 million int64s) that searchsorted() > takes a long time to grind through. After a bit of digging in the > literature and the numpy source code, I believe that searchsorted() is > implementing

[Numpy-discussion] searchsorted() and memory cache

2008-05-08 Thread Andrew Straw
I've got a big element array (25 million int64s) that searchsorted() takes a long time to grind through. After a bit of digging in the literature and the numpy source code, I believe that searchsorted() is implementing a classic binary search, which is pretty bad in terms of cache misses. There are