On 06/22/2010 03:18 AM, Pádraig Brady wrote: >> http://sourceware.org/bugzilla/show_bug.cgi?id=11607 > > Coincidentally, I was about to mail you about this :) > Thanks for pointing out that bug which I was unaware of. > > I'm currently working on multibyte enhancements to coreutils > and was using memmem() to search for single multibyte chars. > For the degenerate unibyte case, it was fast due to > the special case memchr() done in that situation, but > when comparing 2 or 3 byte utf8 characters, things slowed down, > so I was going to use memmem-simple instead. > If you special cased 4 bytes, it would handle most UTF-8 chars > for example.
Special-casing up to 4 byte needles at a time, even in C code, may be a win, because that's in the realm where we can do vectorized searching (such as how memchr searches multiple bytes at a time). I guess I should play more with the idea. > > Note the docs for "memmem-simple" say if fixes > > "This function returns incorrect values in some cases, such as when > given an empty needle: glibc <= 2.0, Cygwin 1.5.x." > > Could that functionality be rolled into memmem-simple so that > memmem is just the "fast/fat" version? If not could you briefly > expand on "some cases" above as I don't know from the docs if > I can use memmem-simple. The idea is that memmem-simple gives you the replacement only if the system memmem is buggy (gives wrong answers or reads beyond bounds[1]) but without regards to whether it is quadratic, while memmem also gives you the replacement if the system memmem is slow. Either way, if the replacement kicks in, you are guaranteed fast. An example where it makes a difference - if you compile against glibc 2.7, memmem() is not buggy but it is slow; if you only search for compile-time length-limited needles, memmem-simple is an appropriate choice, but if you ever search for a user-supplied needle, then use the memmem module to avoid the possibility of a user causing a DoS by exploiting the quadratic nature. [1] It's surprising, looking through glibc history, how many times in just the past year that various architectures have had bugs of reading beyond bounds as various assembler SSE algorithms are being developed. -- Eric Blake ebl...@redhat.com +1-801-349-2682 Libvirt virtualization library http://libvirt.org
signature.asc
Description: OpenPGP digital signature