[adding gnulib, since the code in question lives there] On 03/06/2012 04:21 PM, Paul Eggert wrote: > Thanks, I agree that code is potentially buggy. I don't see > any way to trigger the bug in coreutils, but it's just asking > for trouble. Here's a proposed patch. > >>From 4954a3517397dadd217d6244e961dd855fbadbef Mon Sep 17 00:00:00 2001 > From: Paul Eggert <egg...@cs.ucla.edu> > Date: Tue, 6 Mar 2012 15:19:24 -0800 > Subject: [PATCH] readtokens: avoid core dumps with unusual calling patterns > > Reported by Xu Zhongxing in <http://debbugs.gnu.org/10953>. > * lib/readtokens.c: Include limits.h. > (word, bits_per_word, get_nth_bit, set_nth_bit): New. > (readtoken): Don't cache the delimiters; the cache code was buggy > if !delim && saved_delim, or if the new n_delim differs from the old. > Also, it wasn't thread-safe.
The fix looks right to me from a correctness perspective, but I'm afraid we're still sub-optimal in performance. > --- > ChangeLog | 10 +++++++++ > lib/readtokens.c | 56 +++++++++++++++++++++++------------------------------ > 2 files changed, 34 insertions(+), 32 deletions(-) > > diff --git a/ChangeLog b/ChangeLog > index 163e154..eb99d25 100644 > --- a/ChangeLog > +++ b/ChangeLog > @@ -1,3 +1,13 @@ > +2012-03-06 Paul Eggert <egg...@cs.ucla.edu> > + > + readtokens: avoid core dumps with unusual calling patterns > + Reported by Xu Zhongxing in <http://debbugs.gnu.org/10953>. > + * lib/readtokens.c: Include limits.h. > + (word, bits_per_word, get_nth_bit, set_nth_bit): New. > + (readtoken): Don't cache the delimiters; the cache code was buggy > + if !delim && saved_delim, or if the new n_delim differs from the old. > + Also, it wasn't thread-safe. > + > 2012-03-06 Bruno Haible <br...@clisp.org> > > math: Ensure declarations of math functions. > diff --git a/lib/readtokens.c b/lib/readtokens.c > index 705a62b..f11d3f6 100644 > --- a/lib/readtokens.c > +++ b/lib/readtokens.c > @@ -26,6 +26,7 @@ > > #include "readtokens.h" > > +#include <limits.h> > #include <stdio.h> > #include <stdlib.h> > #include <string.h> > @@ -46,6 +47,22 @@ init_tokenbuffer (token_buffer *tokenbuffer) > tokenbuffer->buffer = NULL; > } > > +typedef size_t word; > +enum { bits_per_word = sizeof (word) * CHAR_BIT }; > + > +static bool > +get_nth_bit (size_t n, word const *bitset) > +{ > + return bitset[n / bits_per_word] >> n % bits_per_word & 1; > +} > + > +static void > +set_nth_bit (size_t n, word *bitset) > +{ > + size_t one = 1; > + bitset[n / bits_per_word] |= one << n % bits_per_word; > +} > + > /* Read a token from STREAM into TOKENBUFFER. > A token is delimited by any of the N_DELIM bytes in DELIM. > Upon return, the token is in tokenbuffer->buffer and > @@ -68,42 +85,17 @@ readtoken (FILE *stream, > char *p; > int c; > size_t i, n; > - static const char *saved_delim = NULL; > - static char isdelim[256]; > - bool same_delimiters; > - > - if (delim == NULL && saved_delim == NULL) > - abort (); > + word isdelim[(UCHAR_MAX + bits_per_word) / bits_per_word]; > > - same_delimiters = false; > - if (delim != saved_delim && saved_delim != NULL) > + memset (isdelim, 0, sizeof isdelim); > + for (i = 0; i < n_delim; i++) > { > - same_delimiters = true; > - for (i = 0; i < n_delim; i++) > - { > - if (delim[i] != saved_delim[i]) > - { > - same_delimiters = false; > - break; > - } > - } > - } > - > - if (!same_delimiters) > - { > - size_t j; > - saved_delim = delim; > - memset (isdelim, 0, sizeof isdelim); > - for (j = 0; j < n_delim; j++) > - { > - unsigned char ch = delim[j]; > - isdelim[ch] = 1; > - } > + unsigned char ch = delim[i]; > + set_nth_bit (ch, isdelim); > } > > - /* FIXME: don't fool with this caching. Use strchr instead. */ > /* skip over any leading delimiters */ > - for (c = getc (stream); c >= 0 && isdelim[c]; c = getc (stream)) > + for (c = getc (stream); c >= 0 && get_nth_bit (c, isdelim); c = getc > (stream)) Why not just strchr instead of building up an isdelim bitmap? And why are we calling getc() one character at a time, instead of using tricks like freadahead() to operate on a larger buffer? Also, is readtoken() intended to be a more powerful interface than strtok, in which case we _do_ want to be non-threadsafe, and to have a readtoken_r interface that is the underlying threadsafe variant that can benefit from caching? > { > /* empty */ > } > @@ -124,7 +116,7 @@ readtoken (FILE *stream, > p[i] = 0; > break; > } > - if (isdelim[c]) > + if (get_nth_bit (c, isdelim)) > { > p[i] = 0; > break; -- Eric Blake ebl...@redhat.com +1-919-301-3266 Libvirt virtualization library http://libvirt.org
signature.asc
Description: OpenPGP digital signature