Re: [PATCH] (x)memcoll: performance improvement when input is known to be NUL delimited.

Bruno Haible Mon, 08 Mar 2010 13:30:57 -0800

Hi Chen,

Thanks for your submission! I think your patch is a useful addition to the
'memcoll' and 'xmemcoll' modules.


The code in gnulib should be copyright-assigned to the FSF. This is necessary,
in order to be on the safe side, legally, should disputes à la SCO happen in
the future. Is it a possibility for you to assign the copyright for this
contribution to the FSF? There are multiple ways of doing it; I send it to you
in separate mail.

> as well as include it inline, just to be on the safe side.

Thanks, it's easier to reply to it, this way.

Here are a couple of comments. I'm suggesting changes that will make your
code more similar to other gnulib code.

> If each input line is
> NUL delimited when read in, memcoll no longer needs to save the
> last byte, NUL delimit, then put the last byte back every time the
> line is compared. Sorting a 96MB, 1M line file, performance
> improvement is roughly 1%.

I'm in favour of it, not because of the 1% speed gain - you can often get
5% of speedup just by putting a couple of 'inline' keywords in the right
places -, but because a function like memcoll should really be able to
work without modifying the input strings, and your addition achieves this.

> diff --git a/lib/memcoll.h b/lib/memcoll.h
> index 8f2e1b1..392484d 100644
> --- a/lib/memcoll.h
> +++ b/lib/memcoll.h
> @@ -23,5 +23,7 @@
>  # include <stddef.h>
>  
>  int memcoll (char *, size_t, char *, size_t);
> +int memcoll_nul (char *, size_t, char *, size_t);

The function does not modify its input strings, therefore its arguments
should be 'const char *', not 'char *'.

About the function name: There is somewhat of a habit to use a suffix '0'
to indicate that the function deals with NUL terminated strings. We have
the functions obstack_grow0, readtokens0, xmemdup0. A suffix '_nul' is
probably better, but we have now already a precedent for suffix '0'...

> +static inline int strcoll_loop (char *, size_t, char *, size_t);

The function is defined in memcoll.c and 'static', therefore this declaration
is of no use to users who  #include "memcoll.h". In other words, this
declaration belongs in memcoll.c, not memcoll.h.

> diff --git a/lib/xmemcoll.h b/lib/xmemcoll.h
> index 2f422e8..df2069d 100644
> --- a/lib/xmemcoll.h
> +++ b/lib/xmemcoll.h
> @@ -1,2 +1,4 @@
>  #include <stddef.h>
>  int xmemcoll (char *, size_t, char *, size_t);
> +int xmemcoll_nul (char *, size_t, char *, size_t);
> +static inline void collate_error (int, char *, size_t, char *, size_t);

Likewise.

> +/* Like memcoll, but S1 and S2 are known to be NUL delimited, thus no
> +   modification to S1 or S2 are needed. */
> +int
> +memcoll_nul (char *s1, size_t s1len, char *s2, size_t s2len)
> +{
> +  int diff;

It would be good to start the function with a safety check:

     if (!(s1len > 0 && s1[s1len - 1] == '\0'))
       abort ();
     if (!(s2len > 0 && s2[s2len - 1] == '\0'))
       abort ();

> +#if HAVE_STRCOLL

There is no need to provide a fallback for the case that strcoll does not
exist: The file gnulib/doc/posix-functions/strcoll.texi does not list any
portability problems with strcoll(). If the function was missing on some
platforms, we would have noticed. (We have a database of which function is
present on which platform.)

> +static inline int
> +strcoll_loop (char *s1, size_t s1len, char *s2, size_t s2len)

Inline functions should occur before they are used, not afterwards. Gcc
version 2.x can only process inlines when the function to be inlined
is defined before it is invoked.

Since you introduce 'inline' in this file, you also have to add an invocation
  AC_REQUIRE([AC_C_INLINE])
to m4/memcoll.m4. Source code and autoconf macros are tied together, in gnulib.

> +static inline void
> +collate_error (int collation_errno, char *s1, size_t s1len, char *s2,
> +               size_t s2len)

Likewise.

> This is in suport of a patch to coreutils' sort, where for each
> input line xmemcoll is called several times.

For each input line, xmemcoll is called multiple times? Then you can certainly
gain much more speed than 1% by replacing memcoll with 2 x memxfrm and
1 x memcmp2. Of course, the 'sort' program will then need additional temporary
storage for the memxfrm() results of each line.

It makes no big difference in the "C" locale, but in a "ja_JP.UTF-8" locale,
I bet the speed difference will be much larger.

Bruno

Re: [PATCH] (x)memcoll: performance improvement when input is known to be NUL delimited.

Reply via email to