Re: bug#10953: Potential logical bug in readtokens.c

Paul Eggert Tue, 06 Mar 2012 21:33:15 -0800

On 03/06/2012 03:32 PM, Eric Blake wrote:
> Why not just strchr instead of building up an isdelim bitmap?


strchr would not be right, since '\0' is valid in data and
as a delimiter.
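
To make the problem concrete, here is a minimal sketch. The helper
names is_delim_strchr and is_delim_memchr are hypothetical, written
only for this illustration; they are not part of readtokens.c.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: treats DELIM as a NUL-terminated string.
   It never sees delimiters stored after an embedded '\0', and it
   always reports '\0' itself as a match, because strchr considers
   the terminating NUL part of the string.  */
static int
is_delim_strchr (const char *delim, int c)
{
  return strchr (delim, c) != NULL;
}

/* Hypothetical helper: treats DELIM as N_DELIM raw bytes, so '\0'
   is an ordinary byte that matches only if it was listed.  */
static int
is_delim_memchr (const char *delim, size_t n_delim, int c)
{
  return memchr (delim, c, n_delim) != NULL;
}
```

With the three-byte delimiter set { ' ', '\0', '\n' }, the strchr
version misses '\n' because the search stops at the embedded NUL.
With the set " \t\n" it wrongly reports '\0' as a delimiter.  The
memchr version gets both cases right, since it is told the length.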

No doubt you meant 'memchr'; but using 'memchr' would slow
down readtoken by about a factor of two.  I got this result by
timing the following benchmark on gcc-4.6.1.tar (uncompressed)
on Fedora 15 x86-64 with GCC 4.6.2:

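#include <stdio.h>
#include <readtokens.h>

/* File scope, so the token buffer starts out zero-initialized.  */
struct tokenbuffer t;

int main (void)
{
  /* Tokenize stdin on space, tab and newline until readtoken
     reports end of input by returning (size_t) -1.  */
  for (;;)
    {
      size_t s = readtoken (stdin, " \t\n", 3, &t);
      if (s == (size_t) -1)
        return 0;
    }
}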

On this benchmark, the relative speeds (user+sys CPU time ratios,
bigger numbers are better) are:

 0.54  readtoken with memchr
 1.00  current readtoken (with non-thread-safe byte array)
 1.13  proposed readtoken (with thread-safe bitset)

So the proposed patch is a performance win even in non-thread-safe use.
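
For readers who have not seen the patch, the bitset idea can be
sketched roughly as follows.  This is an illustration under my own
assumptions, not the patch itself: the names delim_word, delim_set,
delim_set_init and delim_set_has are hypothetical.  The set has one
bit per possible byte value and lives in caller-provided storage
(for example, a local variable), so no static state is shared
between calls.

```c
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical names, for illustration only.  */
typedef unsigned long delim_word;
enum { DELIM_WORD_BITS = sizeof (delim_word) * CHAR_BIT };
/* Enough words to hold UCHAR_MAX + 1 bits, rounded up.  */
enum { DELIM_WORDS = (UCHAR_MAX + DELIM_WORD_BITS) / DELIM_WORD_BITS };

struct delim_set
{
  delim_word bits[DELIM_WORDS];
};

/* Clear the set, then set one bit per delimiter byte.  N_DELIM is
   explicit, so '\0' can be a delimiter like any other byte.  */
static void
delim_set_init (struct delim_set *s, const char *delim, size_t n_delim)
{
  size_t i;
  memset (s->bits, 0, sizeof s->bits);
  for (i = 0; i < n_delim; i++)
    {
      unsigned char ch = delim[i];
      s->bits[ch / DELIM_WORD_BITS]
        |= (delim_word) 1 << (ch % DELIM_WORD_BITS);
    }
}

/* Return 1 if byte C is in the set, 0 otherwise.  */
static int
delim_set_has (const struct delim_set *s, int c)
{
  unsigned char ch = c;
  return (s->bits[ch / DELIM_WORD_BITS] >> (ch % DELIM_WORD_BITS)) & 1;
}
```

Each per-byte test is then a shift and a mask rather than a memchr
call, and the whole set occupies 32 bytes when bytes are 8 bits
wide, so it is cheap to rebuild on every call.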

> And why
> are we calling getc() one character at a time, instead of using tricks
> like freadahead() to operate on a larger buffer?
> 
> Also, is readtoken() intended to be a more powerful interface than
> strtok, in which case we _do_ want to be non-threadsafe, and to have a
> readtoken_r interface that is the underlying threadsafe variant that can
> benefit from caching?

I haven't thought about these issues, but surely they are
independent of the proposed patch.
