On 22/06/10 00:30, Eric Blake wrote: > Meanwhile, glibc bug 11607 already complains that the time spent on > factorization is extremely noticeable for short strings. I have not > benchmarked things myself (partly because I don't know how cache-line > effects would play into a benchmark), but do wonder if tweaking some > heuristics to use a known-quadratic naive algorithm for short needles, > to end up averaging less work compared to the time we spend on > factorization (for a worst-case needle "aaaa", the naive algorithm has > no initialization cost, but ends up comparing each "a" almost 4 times > per byte of the haystack, while more common needles like "abcd" easily > approach one comparison per haystack byte. The two-way algorithm > guarantees no more than two comparisons per haystack byte, but also > costs 8 comparisons and several conditional jumps for any 4-byte needle. > So given that probability favors that the needle will not be periodic, > at what length of needle does the two-way algorithm save rather than > cost time on an average use? I'm thinking 4- or 8- bytes might also be > able to take advantage of SSE-type vector operations for some > assembly-based speedups to naive searches, when the entire needle fits > in special-purpose large registers. > http://sourceware.org/bugzilla/show_bug.cgi?id=11607
Coincidentally, I was about to mail you about this :) Thanks for pointing out that bug which I was unaware of. I'm currently working on multibyte enhancements to coreutils and was using memmem() to search for single multibyte chars. For the degenerate unibyte case, it was fast due to the special case memchr() done in that situation, but when comparing 2 or 3 byte utf8 characters, things slowed down, so I was going to use memmem-simple instead. If you special cased 4 bytes, it would handle most UTF-8 chars for example. Note the docs for "memmem-simple" say if fixes "This function returns incorrect values in some cases, such as when given an empty needle: glibc <= 2.0, Cygwin 1.5.x." Could that functionality be rolled into memmem-simple so that memmem is just the "fast/fat" version? If not could you briefly expand on "some cases" above as I don't know from the docs if I can use memmem-simple. cheers, Pádraig.