Re: memmem speedup

2008-01-08 Thread Bruno Haible
James Youngman wrote: > ... searching for a constant needle in a large > number of haystacks. In such cases, the preparation only needs to be > done once at all. In the specific case of findutils, locate will > typically search for a single needle in one or two million haystacks. > Of course, l

Re: memmem speedup

2008-01-08 Thread James Youngman
On Jan 8, 2008 2:58 PM, Benoit Sigoure <[EMAIL PROTECTED]> wrote: > In this specific case of findutils, wouldn't it be even more > efficient to use a data structure that orders data according to there > common prefixes (such as Tries [http://en.wikipedia.org/wiki/Trie])? While I'm sure there are f

Re: memmem speedup

2008-01-08 Thread Benoit Sigoure
On Jan 8, 2008, at 3:34 PM, James Youngman wrote: On Jan 8, 2008 1:32 PM, Eric Blake <[EMAIL PROTECTED]> wrote: (I may later factor out the two_way static functions into a parameterized header, in order to share code with strstr, strcasestr, ...; however, note that strstr can never achieve

Re: memmem speedup

2008-01-08 Thread James Youngman
On Jan 8, 2008 1:32 PM, Eric Blake <[EMAIL PROTECTED]> wrote: > (I may later factor out the two_way static functions into a parameterized > header, in order to share code with strstr, strcasestr, ...; however, note > that strstr can never achieve sublinear performance, because it costs O(n) > to f

Re: memmem speedup

2008-01-08 Thread Eric Blake
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 According to Bruno Haible on 1/7/2008 5:54 PM: | | And LONG_NEEDLE_THRESHOLD should better be (1U << (CHAR_BIT - 1)) | rather than (1 << (CHAR_BIT - 1)), for platforms where CHAR_BIT = 32. | On such platforms, the function two_way_long_needle will not

Re: memmem speedup

2008-01-07 Thread Eric Blake
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 According to Bruno Haible on 1/7/2008 5:54 PM: | | Thanks for the comments; they are essential for understanding the code. | I also find your comments clearer than the text in .../~lecroq/... Thanks for your review. Also, is there any reason to have

Re: memmem speedup

2008-01-07 Thread Bruno Haible
Eric Blake wrote: > I've finished my implementation of a Two-Way plus Boyer-Moore hybrid > string search algorithm for the memmmem module. This patch has passed > everything I've thrown at it so far, and it has the nice properties of > avoiding alloca/malloc (hence it is async-safe), as well as al

Re: memmem speedup

2008-01-07 Thread Ralf Wildenhues
Hi Eric, * Eric Blake wrote on Sun, Jan 06, 2008 at 02:23:33AM CET: > > I'd appreciate any reviews before checking it in. Here's a rough glance at it. FWIW, the diff is not very readable (there was a patch to diffutils out there for --more-readable). > +/* We use the Two-Way string matching al

Re: memmem speedup

2008-01-05 Thread Mike Frysinger
On Saturday 05 January 2008, Eric Blake wrote: > Mike Frysinger gentoo.org> writes: > > on the topic of integrating it elsewhere ... the comment header states > > that this is part of the GNU C library. > > The original, before my hacks, was a straight copy from glibc; so this is > intended to be

Re: memmem speedup

2008-01-05 Thread Eric Blake
Mike Frysinger gentoo.org> writes: > on the topic of integrating it elsewhere ... the comment header states that > this is part of the GNU C library. The original, before my hacks, was a straight copy from glibc; so this is intended to be pushed back to glibc. > if it were part of glibc, it w

Re: memmem speedup

2008-01-05 Thread Mike Frysinger
On Saturday 05 January 2008, Eric Blake wrote: > I've finished my implementation of a Two-Way plus Boyer-Moore hybrid > string search algorithm for the memmmem module. This patch has passed > everything I've thrown at it so far, and it has the nice properties of > avoiding alloca/malloc (hence it

memmem speedup

2008-01-05 Thread Eric Blake
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 I've finished my implementation of a Two-Way plus Boyer-Moore hybrid string search algorithm for the memmmem module. This patch has passed everything I've thrown at it so far, and it has the nice properties of avoiding alloca/malloc (hence it is asyn