[Python-Dev] Re: string find(substring) vs. substring in string

2005-02-16 Thread Fredrik Lundh
Scott David Daniels wrote: > Looking for "not there" in "not the xyz"*100 using Boyer-Moore should do > about 300 probes once the table is set (the underscores below): > > not the xyznot the xyznot the xyz... > not ther_ > not the__ >not ther_ >

[Python-Dev] Re: string find(substring) vs. substring in string

2005-02-16 Thread Scott David Daniels
Fredrik Lundh wrote: (btw, the benchmark was taken from jim hugunin's ironpython talk, and seems to be carefully designed to kill performance also for more advanced algorithms -- including boyer-moore) Looking for "not there" in "not the xyz"*100 using Boyer-Moore should do about 300 probes once th

Re: [Python-Dev] Re: string find(substring) vs. substring in string

2005-02-16 Thread Guido van Rossum
> The longer the thing you are searching in, the more one-time-only > overhead you can afford to reduce the per-search-character cost. Only if you don't find it close to the start. -- --Guido van Rossum (home page: http://www.python.org/~guido/) ___ Py

[Python-Dev] Re: string find(substring) vs. substring in string

2005-02-16 Thread Scott David Daniels
Irmen de Jong wrote: There's the Boyer-Moore string search algorithm which is allegedly much faster than a simplistic scanning approach, and I also found this: http://portal.acm.org/citation.cfm?id=79184 So perhaps there's room for improvement :) The problem is setup vs. run. If the question is 'a

[Python-Dev] Re: string find(substring) vs. substring in string

2005-02-16 Thread Fredrik Lundh
> memcmp still compiles to REP CMPB on many x86 compilers, and the setup > overhead for memcmp sucks on modern x86 hardware make that "compiles to REPE CMPSB" and "the setup overhead for REPE CMPSB" ___ Python-Dev mailing list Python-Dev@python.or

Re: [Python-Dev] Re: string find(substring) vs. substring in string

2005-02-16 Thread Raymond Hettinger
> but refactoring the contains code to use find_internal sounds like a good > first step.  any takers? > >   I'm up for it.   Raymond Hettinger ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsu

[Python-Dev] Re: string find(substring) vs. substring in string

2005-02-16 Thread Fredrik Lundh
Guido van Rossum wrote: > Which is exactly how s.find() wins this race. (I guess it loses when > it's found by having to do the "find" lookup.) Maybe string_contains > should just call string_find_internal()? I somehow suspected that "in" did some extra work in case the "find" failed; guess I sho

[Python-Dev] Re: string find(substring) vs. substring in string

2005-02-16 Thread Fredrik Lundh
Mike Brown wrote: >> any special reason why "in" is faster if the substring is found, but >> a lot slower if it's not in there? > > Just guessing here, but in general I would think that it would stop searching > as soon as it found it, whereas until then, it keeps looking, which takes more > time.

[Python-Dev] Re: string find(substring) vs. substring in string

2005-02-16 Thread Fredrik Lundh
A.M. Kuchling wrote: >> time. But I would also hope that it would be smart enough to know that it >> doesn't need to look past the 2nd character in 'not the xyz' when it is >> searching for 'not there' (due to the lengths of the sequences). > > Assuming stringobject.c:string_contains is the right