Re: [PATCH] unistr/u8-strchr: speed up searching for ASCII characters

Pádraig Brady Thu, 08 Jul 2010 03:05:08 -0700

On 07/07/10 23:49, Pádraig Brady wrote:
> On 07/07/10 17:07, Simon Josefsson wrote:
>> Pádraig Brady <p...@draigbrady.com> writes:
>>
>>> +    /* The following is equivalent to:
>>> +         return memmem (s, strlen(s), c, csize);
>>> +       but faster for long S with matching UC near the start,
>>> +       and also memmem is sometimes buggy and inefficient.  */
>>>      switch (u8_uctomb_aux (c, uc, 6))
>>
>> Don't we have an efficient memmem in gnulib that this code could use?
> 
> Well the current replacement isn't great for small needles.
> Also since we don't know the length of the string up front
> we would also waste a bit of time on the strlen().


Well actually testing it, shows that memmem() isn't that bad at all,
though it is slower than the current u8_strchr() for the interesting
2 and 3 character cases

function        hay_len     needle_len      iterations  time
-------------------------------------------------------------
gl_memmem       long        1               1           2.17
pb_memmem¹      long        1               1           3.45
u8_strchr       long        1               1           3.28
gl_memmem       60          1               10000       2.18
pb_memmem       60          1               10000       1.99
u8_strchr       60          1               10000       2.17

gl_memmem       long        2               1           3.88
pb_memmem       long        2               1           4.67
u8_strchr       long        2               1           3.47
gl_memmem       60          2               10000       1.97
pb_memmem       60          2               10000       1.97
u8_strchr       60          2               10000       1.96

gl_memmem       long        3               1           5.86
pb_memmem       long        3               1           4.02
u8_strchr       long        3               1           4.28
gl_memmem       60          3               10000       1.97
pb_memmem       60          3               10000       1.97
u8_strchr       60          3               10000       1.98

The above tests were compiled with:
  gcc -fno-builtin-strlen -march=pentium-m -O3 test-memmem.c

¹ My very simple implementation of memmem() just for comparison
char* pb_memmem (const char* hay, size_t hl, const char* needle, size_t nl)
{
    const char* nohay = hay + hl - nl;
    const char* noneedle = needle + nl;
    if (nl > hl) return NULL;
    for (;;)
      {
        const char* p1 = hay;
        const char* p2 = needle;
        do
          if (p2 == noneedle) return (char*) hay;
        while (*p1++ == *p2++);
        hay++;
        if (hay > nohay) return NULL;
      }
}

cheers,
Pádraig.

Re: [PATCH] unistr/u8-strchr: speed up searching for ASCII characters

Reply via email to