Scott David Daniels wrote:
> Looking for "not there" in "not the xyz"*100 using Boyer-Moore should do
> about 300 probes once the table is set (the underscores below):
>
> not the xyznot the xyznot the xyz...
> not ther_
> not the__
>not ther_
>
Fredrik Lundh wrote:
(btw, the benchmark was taken from jim hugunin's ironpython talk, and
seems to be carefully designed to kill performance also for more advanced
algorithms -- including boyer-moore)
Looking for "not there" in "not the xyz"*100 using Boyer-Moore should do
about 300 probes once th
> The longer the thing you are searching in, the more one-time-only
> overhead you can afford to reduce the per-search-character cost.
Only if you don't find it close to the start.
--
--Guido van Rossum (home page: http://www.python.org/~guido/)
___
Py
Irmen de Jong wrote:
There's the Boyer-Moore string search algorithm which is
allegedly much faster than a simplistic scanning approach,
and I also found this: http://portal.acm.org/citation.cfm?id=79184
So perhaps there's room for improvement :)
The problem is setup vs. run. If the question is 'a
> memcmp still compiles to REP CMPB on many x86 compilers, and the setup
> overhead for memcmp sucks on modern x86 hardware
make that "compiles to REPE CMPSB" and "the setup overhead for
REPE CMPSB"
___
Python-Dev mailing list
Python-Dev@python.or
> but refactoring the contains code to use find_internal sounds like a
good
> first step. any takers?
>
>
I'm up for it.
Raymond Hettinger
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsu
Guido van Rossum wrote:
> Which is exactly how s.find() wins this race. (I guess it loses when
> it's found by having to do the "find" lookup.) Maybe string_contains
> should just call string_find_internal()?
I somehow suspected that "in" did some extra work in case the "find"
failed; guess I sho
Mike Brown wrote:
>> any special reason why "in" is faster if the substring is found, but
>> a lot slower if it's not in there?
>
> Just guessing here, but in general I would think that it would stop searching
> as soon as it found it, whereas until then, it keeps looking, which takes more
> time.
A.M. Kuchling wrote:
>> time. But I would also hope that it would be smart enough to know that it
>> doesn't need to look past the 2nd character in 'not the xyz' when it is
>> searching for 'not there' (due to the lengths of the sequences).
>
> Assuming stringobject.c:string_contains is the right