Re: memmem issues

2008-01-08 Thread Bruno Haible
> > +TESTS += test-memmem > > +TESTS_ENVIRONMENT += EXEEXT='@EXEEXT@' > > +check_PROGRAMS += test-memmem > > You can drop the TESTS_ENVIRONMENT line. I've dropped this line now. TESTS_ENVIRONMENT setting EXEEXT is only needed when one of the tests is a shell script that calls an executable. 2008

Re: string.h and NULL [was: memmem issues]

2008-01-07 Thread Bruno Haible
Eric Blake wrote: > According to Bruno Haible on 12/21/2007 6:42 AM: > | does not define NULL. Better write (void*)0 or "" instead of > NULL. > > Which platforms have a broken that fails to define NULL? Actually all platforms I have access to today define NULL in . Probably the latest platform

string.h and NULL [was: memmem issues]

2008-01-05 Thread Eric Blake
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 According to Bruno Haible on 12/21/2007 6:42 AM: | |> + [AC_RUN_IFELSE([AC_LANG_PROGRAM([#include ], |> + [return !memmem ("a", 1, NULL, 0);])], | | does not define NULL. Better write (void*)0 or "" instead of NULL. I've since fixed thi

Re: Re: memmem issues

2008-01-03 Thread Bruno Haible
Paul Eggert wrote: > Yes, thanks, that looks good. Committed. Bruno

Re: memmem issues

2008-01-03 Thread Paul Eggert
"Bruno Haible" <[EMAIL PROTECTED]> writes: > * lib/memmem.c (knuth_morris_pratt, memmem): Change all 'char *' > variables to 'unsigned char *' type. > Reported by Paul Eggert. Yes, thanks, that looks good.

Re: Re: memmem issues

2008-01-03 Thread Bruno Haible
> Thanks, but the email that I got lacked a patch. Sorry about it. Here it is (please forgive the line breaking caused by the webmailer): 2008-01-02 Bruno Haible <[EMAIL PROTECTED]> * lib/memmem.c (knuth_morris_pratt, memmem): Change all 'char *' variables to 'unsigned char *

Re: memmem issues

2008-01-03 Thread Jim Meyering
"Bruno Haible" <[EMAIL PROTECTED]> wrote: > Paul Eggert wrote: >> section 6.2.6.1: >> ... >> Here again, the intent is that memcmp(&x, &y, sizeof (T)) compares all >> the bits in the underlying representation. The only way to do that >> portably is via "unsigned char *". > > OK. Here's a proposed

Re: Re: memmem issues

2008-01-02 Thread Bruno Haible
Paul Eggert wrote: > section 6.2.6.1: > ... > Here again, the intent is that memcmp(&x, &y, sizeof (T)) compares all > the bits in the underlying representation. The only way to do that > portably is via "unsigned char *". OK. Here's a proposed patch for the memmem function. Bruno

Re: memmem issues

2007-12-31 Thread Paul Eggert
Bruno Haible <[EMAIL PROTECTED]> writes: > It says that the _values_ of the characters are interpreted as > unsigned char. I.e. like this: Sure, but a character, as used in the sense here, is a bit representation that fits in a byte (see section 3.7.1); it is not a 'char' value. The intent is th

Re: memmem issues

2007-12-31 Thread Bruno Haible
Paul Eggert wrote: > Bruno Haible <[EMAIL PROTECTED]> writes: > > > Most of your comments apply to all copies of the KMP code in gnulib. > > Ouch! Should these be coalesced? Out of the 8 copies of the code currently in gnulib, I can easily coalesce 5 into 1; see attached patch. The remaining 4

Re: memmem issues

2007-12-31 Thread Bruno Haible
Paul Eggert wrote: > OK, I installed this patch. > > 2007-12-29 Paul Eggert <[EMAIL PROTECTED]> > > * lib/memmem.c (knuth_morris_pratt): Check for size_t overflow > when multiplying M by sizeof (size_t). Thanks. Let me generalize it, as follows. 2007-12-30 Bruno Haible <[EMAIL

Re: memmem issues

2007-12-31 Thread Bruno Haible
Paul Eggert wrote: > The 'char' type can have padding bits, which means that code like this: > > char const *needle = (char const *) needle_start; > char const *haystack = (char const *) haystack_start; > if ((unsigned char) *needle == (unsigned char) *haystack) ... > > might ignor

Re: memmem issues

2007-12-31 Thread Bruno Haible
Ben Pfaff wrote: > >> > +unsigned char b = (unsigned char) needle[i - 1]; > >> > ... > >> > +if (b == (unsigned char) needle[j]) > >> > >> Would it be cleaner to declare 'b' to be of type 'char' and avoid the > >> casts? > > > > No; ISO C 99 section 7.21.4 says that when byte st

Re: memmem issues

2007-12-29 Thread Paul Eggert
Bruno Haible <[EMAIL PROTECTED]> writes: > No; ISO C 99 section 7.21.4 says that when byte strings are compared the > elements are considered as 'unsigned char' values. Why risk bugs when the > approach is very simple: after fetching any 'char' from any of the strings, > cast it to 'unsigned char'

Re: memmem issues

2007-12-29 Thread Paul Eggert
Bruno Haible <[EMAIL PROTECTED]> writes: > Most of your comments apply to all copies of the KMP code in gnulib. Ouch! Should these be coalesced? >> Shouldn't this check for overflow in the multiplication? > > Yes, it should. I'm always lazy about this. OK, I installed this patch. 2007-12-29

Re: memmem issues

2007-12-26 Thread Ben Pfaff
>> > + unsigned char b = (unsigned char) needle[i - 1]; >> > ... >> > + if (b == (unsigned char) needle[j]) >> >> Would it be cleaner to declare 'b' to be of type 'char' and avoid the >> casts? > > No; ISO C 99 section 7.21.4 says that when byte strings are compared the > elements are conside

Re: memmem issues

2007-12-26 Thread Bruno Haible
Hi Paul, Most of your comments apply to all copies of the KMP code in gnulib. > Eric Blake <[EMAIL PROTECTED]> writes: > > + size_t *table = (size_t *) malloca (m * sizeof (size_t)); > > + if (table == NULL) > > +return false; > > Shouldn't this check for overflow in the multiplication? Ye

Re: memmem issues

2007-12-26 Thread Bruno Haible
Eric Blake wrote: > + Fix memmem to avoid O(n^2) worst-case complexity. > + * lib/memmem.c (knuth_morris_pratt): New function. > + (memmem): Use it if first few naive iterations fail. While considering to submit a modified version of this code to glibc for inclusion, I found it good to

Re: memmem issues

2007-12-26 Thread Bruno Haible
Eric Blake wrote: > + * tests/test-memmem.c: Rewrite, borrowing ideas from > + test-mbsstr1.c; the old version wouldn't even compile! After the rewrite, there's no code left from the original test. I committed this, to correct the attribution: 2007-12-23 Bruno Haible <[EMAIL PROTECTED]

Re: memmem issues

2007-12-21 Thread Paul Eggert
Eric Blake <[EMAIL PROTECTED]> writes: > + size_t *table = (size_t *) malloca (m * sizeof (size_t)); > + if (table == NULL) > +return false; Shouldn't this check for overflow in the multiplication? Something like this, perhaps? size_t *table; if (xalloc_oversized (m, sizeof *table))

Re: memmem issues

2007-12-21 Thread Bruno Haible
I wrote: > > + precisely, when > > + - the outer loop count is >= 10, > > + - the average number of comparisons per outer loop is >= 5, > > + - the total number of comparisons is >= m. > > These comments are out of sync with the code. Oops, that was nonsense. Please ignore t

Re: memmem issues

2007-12-21 Thread Bruno Haible
Eric Blake wrote: > - It uses memcmp without depending on the memcmp module, making it needlessly > fail on some older platforms. If memory serves me well, the platforms where memcmp was incorrect (comparing signed instead of unsigned byte values) were SunOS 4 and possibly IRIX 4. *Very* ancient.

Re: memmem issues

2007-12-20 Thread Eric Blake
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 According to Jim Meyering on 12/20/2007 12:33 AM: > Eric Blake <[EMAIL PROTECTED]> wrote: > ... >> - The implementation was naively quadratic in the worst-case complexity. I >> lifted Bruno's KMP ideas in mbsstr to make it linear+delayed allocation.

Re: gnulib's Knuth-Morris-Pratt implementation (was: memmem issues)

2007-12-20 Thread James Youngman
On Dec 20, 2007 1:27 PM, Eric Blake <[EMAIL PROTECTED]> wrote: > Also, I wonder if Boyer-Moore would be any more effective than KMP in the > average case (both have worst-case linear complexity, but we know that in > general, we aren't always dealing with worst-case). > > http://en.wikipedia.org/wi

Re: gnulib's Knuth-Morris-Pratt implementation (was: memmem issues)

2007-12-20 Thread Eric Blake
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 According to James Youngman on 12/20/2007 3:59 AM: > I've been fiddling around with Bruno's KMP code recently. I like it a > lot. I wish though there was some way for the caller to indicate > that they are willing to amortise the cost of the KMP ini

gnulib's Knuth-Morris-Pratt implementation (was: memmem issues)

2007-12-20 Thread James Youngman
On Dec 19, 2007 11:15 PM, Eric Blake <[EMAIL PROTECTED]> wrote: > - The implementation was naively quadratic in the worst-case complexity. I > lifted Bruno's KMP ideas in mbsstr to make it linear+delayed allocation. > glibc > still uses the naive implementation - should we file a bug with them

Re: memmem issues

2007-12-20 Thread Simon Josefsson
Eric Blake <[EMAIL PROTECTED]> writes: > Several issues with memmem: > > - It uses memcmp without depending on the memcmp module, making it needlessly > fail on some older platforms. But memcmp is currently licensed LGPL; is it > okay with everyone that this patch relicenses it as LGPLv2+? (Or

Re: memmem issues

2007-12-19 Thread Jim Meyering
Eric Blake <[EMAIL PROTECTED]> wrote: ... > - The implementation was naively quadratic in the worst-case complexity. I > lifted Bruno's KMP ideas in mbsstr to make it linear+delayed allocation. > glibc > still uses the naive implementation - should we file a bug with them to fix > it? This is t

Re: memmem issues

2007-12-19 Thread Jim Meyering
Eric Blake <[EMAIL PROTECTED]> wrote: > I'm trying to convert m4 to transparently handle embedded NUL. In the > process, > I need to move from strstr to memmem (if m4 were fully i18n already, I would > instead be moving from mbsstr to the length-based analog of mbsstr that > tolerates embedded NU

memmem issues

2007-12-19 Thread Eric Blake
I'm trying to convert m4 to transparently handle embedded NUL. In the process, I need to move from strstr to memmem (if m4 were fully i18n already, I would instead be moving from mbsstr to the length-based analog of mbsstr that tolerates embedded NUL - is there such a thing?) Several issues wi