-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 According to Bruno Haible on 1/7/2008 5:54 PM: | | And LONG_NEEDLE_THRESHOLD should better be (1U << (CHAR_BIT - 1)) | rather than (1 << (CHAR_BIT - 1)), for platforms where CHAR_BIT = 32. | On such platforms, the function two_way_long_needle will not be used. | Therefore it's also appropriate to wrap the added test (tests/test-memmem.c | lines 155..180) in an | if (CHAR_BIT < 10) { ... }.
Gnulib tends to target POSIX-like platforms, where CHAR_BIT == 8. But I agree that on non-POSIX platforms, the memory cost of larger bytes is prohibitive, so I've updated the code accordingly. Actually, I went ahead and lowered the threshold from 128 down to 32 (I figured the additional cost of 256+32 extra operations is an acceptable startup penalty of 9*needle_len, compared to previous 256+128 factor of 3x), because: - - a 32x speedup in comparisons is noticeable - - 32 is a common cache line size, so secondary effects of less cache pollution can also start coming into play for a needle as short as 33 bytes - - 32 bytes tends to be longer than most single words, but average for small phrases in natural language text, and natural language text tends to benefit more from the bad-character shift table than repetitive patterns. I still agree that for extremely short needles, the cost is prohibitive (in the worst case, building the shift table for a 2-byte needle is a 129*needle_len startup penalty, and all that it can accomplish is at most a 2x speedup). I've tried to improve the comments even further, then committed the following for memmem: http://git.sv.gnu.org/gitweb/?p=gnulib.git;a=commitdiff;h=9d8d6cd It may be easier to view the raw memmem.c: http://git.sv.gnu.org/gitweb/?p=gnulib.git;a=blob;f=lib/memmem.c;h=622a034;hb=9d8d6cd (I may later factor out the two_way static functions into a parameterized header, in order to share code with strstr, strcasestr, ...; however, note that strstr can never achieve sublinear performance, because it costs O(n) to find the NUL byte that terminates the haystack, and it is not safe to read beyond the NUL). - -- Don't work too hard, make some time for fun as well! Eric Blake [EMAIL PROTECTED] -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.5 (Cygwin) Comment: Public key at home.comcast.net/~ericblake/eblake.gpg Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFHg3uC84KuGfSFAYARAkeTAJ9OaYhbABw/3nELWsIKAY4uvu0RMgCeOiff PuOGKn3o8RJTXlfGw63FMsY= =wB5E -----END PGP SIGNATURE-----