-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 According to Simon Josefsson on 1/9/2008 3:16 AM: | | For gnutls, the strings are always the same (X.509 PEM headers), and we | haven't seen such slowdowns even on large inputs (searching for a 10 | byte needle in a 500kb input is, while not typical, not uncommon). What | properties do unlucky strings have?
The quadratic nature of the brute force algorithm (which glibc currently uses) is, in the worst case, proportional to the length of the haystack times the length of the needle, and occurs when the needle is either not present or is only present at the end of the haystack; it is also a factor of how periodic the start of the needle is. In other words, a needle of "1234567890" will behave linearly (every character in the haystack is checked at most twice), even on glibc, since it is non-periodic, while a needle of "1111111112" in a haystack of 500kb of all '1' will require 10x effort, since all nine '1' in the needle are compared in the inner loop prior to the failing '2', for every 1-byte progression of the outer loop. ~ Meanwhile, the new Two-Way algorithm requires at most 2x effort, since it exploits some properties that it learned about the needle in a single linear preprocessing step, in order to advance the outer loop by steps larger than one; it will never compare a character more than twice, no matter how periodic the needle is. If all your needles are short, and more importantly, non-periodic, then you are correct that the worst-case slowdown of the brute-force algorithm will not be triggered, so the speed difference is not noticeable. But if you ever have user-provided needles of arbitrary length, as is the case in m4, then it is critical to use a non-quadratic algorithm if you want to avoid denial-of-service attacks due to carefully chosen needles and haystacks causing wasted processing cycles. So I'm okay with the idea of adding a memmem-simple module that skips the quadratic check if the system memmem is otherwise working (remember, however, that the system memmem may be broken, as on cygwin). But I think it should be done solely through m4 magic - in other words, split gl_FUNC_MEMMEM into two macros, where the behavioral check is called for both memmem and memmem-simple, but the speed check is called only for memmem; both modules can share the lib/memmem.c implementation. Meanwhile, I'm installing this, to claim ownership of my implementation, and to give gcc some compiler hints about memmem's properties. - -- Don't work too hard, make some time for fun as well! Eric Blake [EMAIL PROTECTED] -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.5 (Cygwin) Comment: Public key at home.comcast.net/~ericblake/eblake.gpg Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFHhM4j84KuGfSFAYARAucnAKCD2AHMrZnRnLYcSOlKwjLKKrdVJwCeJu3D 862KQvrBAKgWKno8SZN2jCI= =F/2t -----END PGP SIGNATURE-----
>From c01669e0970410a989f44883c4117b1a74b651e3 Mon Sep 17 00:00:00 2001 From: Eric Blake <[EMAIL PROTECTED]> Date: Tue, 8 Jan 2008 17:15:27 -0700 Subject: [PATCH] Give gcc some memmem optimization hints. * lib/string.in.h (memmem, memrchr, strchrnul, strnlen, strpbrk) (strcasestr): Declare as pure. * modules/memmem (Maintainer): Claim my implementation. Signed-off-by: Eric Blake <[EMAIL PROTECTED]> --- ChangeLog | 9 ++++++++- lib/string.in.h | 32 +++++++++++++++++++++++++------- modules/memmem | 2 +- 3 files changed, 34 insertions(+), 9 deletions(-) diff --git a/ChangeLog b/ChangeLog index bd4daf8..975665d 100644 --- a/ChangeLog +++ b/ChangeLog @@ -1,3 +1,10 @@ +2008-01-09 Eric Blake <[EMAIL PROTECTED]> + + Give gcc some memmem optimization hints. + * lib/string.in.h (memmem, memrchr, strchrnul, strnlen, strpbrk) + (strcasestr): Declare as pure. + * modules/memmem (Maintainer): Claim my implementation. + 2008-01-09 Ralf Wildenhues <[EMAIL PROTECTED]> Support AIX 6.1 and higher. @@ -91,7 +98,7 @@ Suggested by Paul Eggert. 2008-01-01 Sylvain Beucler <[EMAIL PROTECTED]> - Bruno Haible <[EMAIL PROTECTED]> + Bruno Haible <[EMAIL PROTECTED]> Improve memory cleanup in 'relocatable' module. * lib/relocatable.h (compute_curr_prefix): Change return type to diff --git a/lib/string.in.h b/lib/string.in.h index 09205e7..355479a 100644 --- a/lib/string.in.h +++ b/lib/string.in.h @@ -1,6 +1,6 @@ /* A GNU-like <string.h>. - Copyright (C) 1995-1996, 2001-2007 Free Software Foundation, Inc. + Copyright (C) 1995-1996, 2001-2008 Free Software Foundation, Inc. This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by @@ -25,6 +25,18 @@ #define _GL_STRING_H +#ifndef __attribute__ +/* This feature is available in gcc versions 2.5 and later. */ +# if __GNUC__ < 2 || (__GNUC__ == 2 && __GNUC_MINOR__ < 5) || __STRICT_ANSI__ +# define __attribute__(Spec) /* empty */ +# endif +/* The attribute __pure__ was added in gcc 2.96. */ +# if __GNUC__ < 2 || (__GNUC__ == 2 && __GNUC_MINOR__ < 96) +# define __pure__ /* empty */ +# endif +#endif + + /* The definition of GL_LINK_WARNING is copied here. */ @@ -40,7 +52,8 @@ extern "C" { # endif # if ! @HAVE_DECL_MEMMEM@ || @REPLACE_MEMMEM@ extern void *memmem (void const *__haystack, size_t __haystack_len, - void const *__needle, size_t __needle_len); + void const *__needle, size_t __needle_len) + __attribute__ ((__pure__)); # endif #elif defined GNULIB_POSIXCHECK # undef memmem @@ -68,7 +81,8 @@ extern void *mempcpy (void *restrict __dest, void const *restrict __src, /* Search backwards through a block for a byte (specified as an int). */ #if @GNULIB_MEMRCHR@ # if ! @HAVE_DECL_MEMRCHR@ -extern void *memrchr (void const *, int, size_t); +extern void *memrchr (void const *, int, size_t) + __attribute__ ((__pure__)); # endif #elif defined GNULIB_POSIXCHECK # undef memrchr @@ -121,7 +135,8 @@ extern char *stpncpy (char *restrict __dst, char const *restrict __src, /* Find the first occurrence of C in S or the final NUL byte. */ #if @GNULIB_STRCHRNUL@ # if ! @HAVE_STRCHRNUL@ -extern char *strchrnul (char const *__s, int __c_in); +extern char *strchrnul (char const *__s, int __c_in) + __attribute__ ((__pure__)); # endif #elif defined GNULIB_POSIXCHECK # undef strchrnul @@ -166,7 +181,8 @@ extern char *strndup (char const *__string, size_t __n); return MAXLEN. */ #if @GNULIB_STRNLEN@ # if ! @HAVE_DECL_STRNLEN@ -extern size_t strnlen (char const *__string, size_t __maxlen); +extern size_t strnlen (char const *__string, size_t __maxlen) + __attribute__ ((__pure__)); # endif #elif defined GNULIB_POSIXCHECK # undef strnlen @@ -192,7 +208,8 @@ extern size_t strnlen (char const *__string, size_t __maxlen); /* Find the first occurrence in S of any character in ACCEPT. */ #if @GNULIB_STRPBRK@ # if ! @HAVE_STRPBRK@ -extern char *strpbrk (char const *__s, char const *__accept); +extern char *strpbrk (char const *__s, char const *__accept) + __attribute__ ((__pure__)); # endif # if defined GNULIB_POSIXCHECK /* strpbrk() assumes the second argument is a list of single-byte characters. @@ -288,7 +305,8 @@ extern char *strsep (char **restrict __stringp, char const *restrict __delim); /* Find the first occurrence of NEEDLE in HAYSTACK, using case-insensitive comparison. */ #if ! @HAVE_STRCASESTR@ -extern char *strcasestr (const char *haystack, const char *needle); +extern char *strcasestr (const char *haystack, const char *needle) + __attribute__ ((__pure__)); #endif #if defined GNULIB_POSIXCHECK /* strcasestr() does not work with multibyte strings: diff --git a/modules/memmem b/modules/memmem index 2c02be9..71f770f 100644 --- a/modules/memmem +++ b/modules/memmem @@ -25,4 +25,4 @@ License: LGPLv2+ Maintainer: -libc, Simon Josefsson +libc, Eric Blake -- 1.5.3.5