On 03/06/2012 03:32 PM, Eric Blake wrote:
> Why not just strchr instead of building up an isdelim bitmap?

strchr would not be right, since '\0' is valid in data and as a
delimiter.  No doubt you meant 'memchr'; but using 'memchr' would
slow down readtoken by about a factor of two.  I got this result
by timing the following benchmark on gcc-4.6.1.tar (uncompressed)
on Fedora 15 x86-64 with GCC 4.6.2:

  #include <stdio.h>
  #include <readtokens.h>

  struct tokenbuffer t;

  int
  main (void)
  {
    for (;;)
      {
        size_t s = readtoken (stdin, " \t\n", 3, &t);
        if (s == (size_t) -1)
          return 0;
      }
  }

On this benchmark, the relative speeds (user+sys CPU time ratios,
bigger numbers are better) are:

  0.54  readtoken with memchr
  1.00  current readtoken (with non-thread-safe byte array)
  1.13  proposed readtoken (with thread-safe bitset)

So the proposed patch is a performance win even in non-thread-safe use.

> And why
> are we calling getc() one character at a time, instead of using tricks
> like freadahead() to operate on a larger buffer?
>
> Also, is readtoken() intended to be a more powerful interface than
> strtok, in which case we _do_ want to be non-threadsafe, and to have a
> readtoken_r interface that is the underlying threadsafe variant that can
> benefit from caching?

I haven't thought about these issues, but surely they are
independent of the proposed patch.
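
For anyone following along, here is a rough sketch of the two
delimiter tests being compared above: a per-call bitset lookup versus
a memchr scan of the delimiter array.  It is not the code from the
patch; the names delim_set, delim_set_init, delim_set_has and
is_delim_memchr are made up for illustration, and the word size is an
arbitrary choice.

```c
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* Illustrative only: these names are hypothetical and do not come
   from readtokens.c or from the proposed patch.  */
typedef size_t word;
enum { BITS_PER_WORD = sizeof (word) * CHAR_BIT };
/* Enough words to hold one bit per possible unsigned char value.  */
enum { N_WORDS = (UCHAR_MAX + BITS_PER_WORD) / BITS_PER_WORD };

/* One bit per byte value.  The set lives in the caller's storage
   (e.g. on the stack), so no static state is shared across calls.  */
struct delim_set
{
  word bits[N_WORDS];
};

/* Build the set from DELIM[0..N_DELIM-1].  A length is passed rather
   than relying on NUL termination, so '\0' can itself be a delimiter.  */
static void
delim_set_init (struct delim_set *s, const char *delim, size_t n_delim)
{
  memset (s, 0, sizeof *s);
  for (size_t i = 0; i < n_delim; i++)
    {
      unsigned char c = delim[i];
      s->bits[c / BITS_PER_WORD] |= (word) 1 << (c % BITS_PER_WORD);
    }
}

/* Constant-time membership test: one index, one shift, one mask.  */
static int
delim_set_has (const struct delim_set *s, int c)
{
  unsigned char uc = c;
  return (s->bits[uc / BITS_PER_WORD] >> (uc % BITS_PER_WORD)) & 1;
}

/* The memchr alternative: scans the delimiter array for every input
   byte, which is the per-character cost the benchmark measures.  */
static int
is_delim_memchr (const char *delim, size_t n_delim, int c)
{
  return memchr (delim, c, n_delim) != NULL;
}
```

In a reader loop, delim_set_init would run once per call and
delim_set_has once per input byte, whereas is_delim_memchr pays for a
function call and a scan on every byte.  How much that matters will
depend on the compiler and libc; the figures above are the only
measurements I have.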