On Dec 20, 2007 1:27 PM, Eric Blake <[EMAIL PROTECTED]> wrote: > Also, I wonder if Boyer-Moore would be any more effective than KMP in the > average case (both have worst-case linear complexity, but we know that in > general, we aren't always dealing with worst-case). > > http://en.wikipedia.org/wiki/Knuth-morris-pratt > http://en.wikipedia.org/wiki/Boyer-Moore_string_search_algorithm
Quite possibly. If we extracted the KMP implementation as a prepare-then-search interface, and as I mentioned, I've done quite a bit of work in that area already, we could do this dynamically (or at least, without necessarily impacting the caller). I'm sure Bruno thought about this already, though. The main problem though is that the B-M algorithm (I guess we'd likely select Turbo-Boyer-Moore to reduce the comparisons from 3N to 2N) needs to move backward through the haystack. That doesn't work well on multibyte strings. We could, as I mentioned, build an ancilliary data structure, but that would mean adding an extra N operations (thus making the cost ~3N once again, I suppose). James.