A Tuesday 20 November 2007, Istvan Albert escrigué: > On Nov 19, 2:33 pm, Francesc Altet <[EMAIL PROTECTED]> wrote: > > Just for the record. I was unable to stop thinking about this, and > > after some investigation, I guess that my rememberings were > > gathered from some benchmarks with code in Pyrex (i.e. pretty near > > to C speed). > > Pretty interesting and quite unexpected.
Yeah, and also bogus :( It turned out that in my previous benchmark, I've used plain numpy int arrays to do measurements, and when you extract an element out of a numpy array what you obtain is a *numpy scalar*, which is a different beast than a python (long) integer. Unfortunately, pyrex does swallow it and happily converted to a long long C, but produces a wrong result. This looks like a pyrex fault, and I should report it to the pyrex list. At any rate, with the corrected benchmarks using plain python lists instead (attached), the numbers are: Calibrating loop... Calibrating time: 1.458 Creating the dictionary... Time for dict creation: 15.305 Timing queries with a dict... Time for dict query: 11.632 Creating BinarySearch object... Time for BinarySearch creation: 8.648 Timing queries with a BinarySearch... Time for BinarySearch queries: 19.393 That is, dictionaries do lookups 1.7x faster than a binary lookup (1.42 us/lookup vs 2.37 us/lookup), which is, I suppose, what is expected. Well, this seems the end of my 'peculiar' theory, but at least served to uncover what it seems a bug in pyrex :P -- >0,0< Francesc Altet http://www.carabos.com/ V V Cárabos Coop. V. Enjoy Data "-"
# BinarySearch class to demonstrate that binary search is competitive
# against dictionaries based on hashes
import numpy
# In order to access the data area in NumPy objects
cdef extern from "numpy/arrayobject.h":
ctypedef extern class numpy.ndarray [object PyArrayObject]:
cdef char *data
void import_array()
import_array()
cdef long bisect_left(long long *a, long long x, long lo, long hi):
cdef long mid
while lo < hi:
mid = (lo+hi)/2
if a[mid] < x: lo = mid+1
else: hi = mid
return lo
cdef class BinarySearch:
cdef long len
cdef object values
cdef long *revidd
cdef long long *sidd
cdef ndarray sid, revid
def __new__(self, object keys, object values):
self.sid = numpy.array(keys)
self.revid = self.sid.argsort()
self.revidd = <long *>self.revid.data
self.sid.sort()
self.sidd = <long long *>self.sid.data
self.values = values
self.len = len(values) - 1
def __getitem__(self, x):
cdef long pos, idx
pos = bisect_left(self.sidd, x, 0, self.len)
idx = self.revidd[pos]
return self.values[idx]
gq-binary-search.py
Description: application/python
setup.py
Description: application/python
-- http://mail.python.org/mailman/listinfo/python-list
