Paolo Bonzini wrote:
> If there is one, use the faster u8_strchr algorithm (you can
> just use u8_strchr, even though that does a useless conversion back to
> UTF-8).

Nice suggestion. Implemented as follows:


2010-07-31  Bruno Haible  <br...@clisp.org>

        unistr/u8-strstr, unistr/u16-strstr: Optimize the one-character case.
        * lib/unistr/u-strstr.h (FUNC): When the needle contains only one
        character, perform the search using U_STRCHR.
        * lib/unistr/u8-strstr.c (U_STRMBTOUC): New macro.
        * lib/unistr/u16-strstr.c (U_STRMBTOUC): Likewise.
        * modules/unistr/u8-strstr (Depends-on): Add unistr/u8-strmbtouc.
        * modules/unistr/u16-strstr (Depends-on): Add unistr/u16-strmbtouc.
        Suggested by Paolo Bonzini.

--- lib/unistr/u-strstr.h.orig  Sat Jul 31 22:05:02 2010
+++ lib/unistr/u-strstr.h       Sat Jul 31 21:59:37 2010
@@ -1,5 +1,5 @@
 /* Substring test for UTF-8/UTF-16/UTF-32 strings.
-   Copyright (C) 1999, 2002, 2006, 2009-2010 Free Software Foundation, Inc.
+   Copyright (C) 1999, 2002, 2006, 2010 Free Software Foundation, Inc.
    Written by Bruno Haible <br...@clisp.org>, 2002.
 
    This program is free software: you can redistribute it and/or modify it
@@ -24,10 +24,20 @@
   if (first == 0)
     return (UNIT *) haystack;
 
-  /* Is needle nearly empty?  */
+  /* Is needle nearly empty (only one unit)?  */
   if (needle[1] == 0)
     return U_STRCHR (haystack, first);
 
+#ifdef U_STRMBTOUC
+  /* Is needle nearly empty (only one character)?  */
+  {
+    ucs4_t first_uc;
+    int count = U_STRMBTOUC (&first_uc, needle);
+    if (count > 0 && needle[count] == 0)
+      return U_STRCHR (haystack, first_uc);
+  }
+#endif
+
   /* Search for needle's first unit.  */
   for (; *haystack != 0; haystack++)
     if (*haystack == first)
--- lib/unistr/u16-strstr.c.orig        Sat Jul 31 22:05:02 2010
+++ lib/unistr/u16-strstr.c     Sat Jul 31 21:56:25 2010
@@ -1,5 +1,5 @@
 /* Substring test for UTF-16 strings.
-   Copyright (C) 1999, 2002, 2006, 2009-2010 Free Software Foundation, Inc.
+   Copyright (C) 1999, 2002, 2006, 2010 Free Software Foundation, Inc.
    Written by Bruno Haible <br...@clisp.org>, 2002.
 
    This program is free software: you can redistribute it and/or modify it
@@ -25,4 +25,5 @@
 #define FUNC u16_strstr
 #define UNIT uint16_t
 #define U_STRCHR u16_strchr
+#define U_STRMBTOUC u16_strmbtouc
 #include "u-strstr.h"
--- lib/unistr/u8-strstr.c.orig Sat Jul 31 22:05:02 2010
+++ lib/unistr/u8-strstr.c      Sat Jul 31 21:56:25 2010
@@ -1,5 +1,5 @@
 /* Substring test for UTF-8 strings.
-   Copyright (C) 1999, 2002, 2006, 2009-2010 Free Software Foundation, Inc.
+   Copyright (C) 1999, 2002, 2006, 2010 Free Software Foundation, Inc.
    Written by Bruno Haible <br...@clisp.org>, 2002.
 
    This program is free software: you can redistribute it and/or modify it
@@ -25,4 +25,5 @@
 #define FUNC u8_strstr
 #define UNIT uint8_t
 #define U_STRCHR u8_strchr
+#define U_STRMBTOUC u8_strmbtouc
 #include "u-strstr.h"
--- modules/unistr/u16-strstr.orig      Sat Jul 31 22:05:02 2010
+++ modules/unistr/u16-strstr   Sat Jul 31 22:04:36 2010
@@ -8,6 +8,7 @@
 Depends-on:
 unistr/base
 unistr/u16-strchr
+unistr/u16-strmbtouc
 
 configure.ac:
 gl_LIBUNISTRING_MODULE([0.9], [unistr/u16-strstr])
--- modules/unistr/u8-strstr.orig       Sat Jul 31 22:05:02 2010
+++ modules/unistr/u8-strstr    Sat Jul 31 22:04:43 2010
@@ -8,6 +8,7 @@
 Depends-on:
 unistr/base
 unistr/u8-strchr
+unistr/u8-strmbtouc
 
 configure.ac:
 gl_LIBUNISTRING_MODULE([0.9], [unistr/u8-strstr])
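
For readers who want to try the idea outside the gnulib tree, here is a rough standalone sketch of the same dispatch for the UTF-8 case. It is not the gnulib code: sketch_u8_decode and sketch_u8_strstr are hypothetical names, the decoder is a simplified stand-in for u8_strmbtouc (it rejects bad lead and continuation bytes but does not check for overlong forms or surrogates), and the general case simply falls back to the C library's strstr. Unlike the patch, the fast path compares the needle's units directly rather than passing the decoded character to a strchr-style function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, a simplified stand-in for u8_strmbtouc.
   Decodes the first character of the NUL-terminated string S into *PUC.
   Returns the number of units consumed, 0 if S is empty, -1 if the
   sequence has a bad lead byte or a bad continuation byte.
   Assumption: overlong forms and surrogates are NOT rejected here.  */
static int
sketch_u8_decode (uint32_t *puc, const uint8_t *s)
{
  uint8_t c = s[0];
  uint32_t uc;
  int len, i;

  if (c == 0)
    {
      *puc = 0;
      return 0;
    }
  if (c < 0x80)
    {
      *puc = c;
      return 1;
    }
  if ((c & 0xE0) == 0xC0)
    { len = 2; uc = c & 0x1F; }
  else if ((c & 0xF0) == 0xE0)
    { len = 3; uc = c & 0x0F; }
  else if ((c & 0xF8) == 0xF0)
    { len = 4; uc = c & 0x07; }
  else
    return -1;

  for (i = 1; i < len; i++)
    {
      /* A premature NUL also fails this test, so we never read past it.  */
      if ((s[i] & 0xC0) != 0x80)
        return -1;
      uc = (uc << 6) | (s[i] & 0x3F);
    }
  *puc = uc;
  return len;
}

/* Hypothetical sketch of the dispatch: find NEEDLE in HAYSTACK, both
   NUL-terminated UTF-8 strings.  Returns NULL if there is no match.  */
static const uint8_t *
sketch_u8_strstr (const uint8_t *haystack, const uint8_t *needle)
{
  uint32_t uc;
  int count;

  assert (haystack != NULL && needle != NULL);

  /* Empty needle matches at the start.  */
  if (needle[0] == 0)
    return haystack;

  /* Is the needle exactly one character (possibly several units)?  */
  count = sketch_u8_decode (&uc, needle);
  if (count > 0 && needle[count] == 0)
    {
      /* Character search: locate the lead unit, then compare the
         remaining units.  strncmp stops at the haystack's NUL.  */
      const char *p = (const char *) haystack;
      while ((p = strchr (p, needle[0])) != NULL)
        {
          if (strncmp (p, (const char *) needle, (size_t) count) == 0)
            return (const uint8_t *) p;
          p++;
        }
      return NULL;
    }

  /* General case: unit-wise substring search.  */
  return (const uint8_t *) strstr ((const char *) haystack,
                                   (const char *) needle);
}
```

Calling sketch_u8_strstr with a needle that holds a single two-unit character takes the character-search branch, while a needle such as "caf" falls through to the general search. A malformed needle makes the decoder return -1, so it also falls through, much as count > 0 guards the fast path in the patch.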