tags 446270 + patch
thanks

On Sun, Nov 23, 2008 at 05:36:48PM +0100, Peter De Wachter wrote:
>The root cause of this bug is the use of mbtowc in 64-egf-speedup.patch
>and 67-w.patch. These patches try to use mbtowc to look at the
>character before and after the match to check if the match is a whole
>word. But when a binary file is being grepped, mbtowc gets passed
>random junk rather than a valid UTF-8 character. As a consequence, its
>internal state gets messed up, and you get nonsense for the following
>matches. The fix is to use mbrtowc so you can reset its state. A patch
>is attached.
>
>65-dfa-optional.patch is a red herring. I guess that patch just exposes
>the bug because it causes grep to use a different code path. But you
>get the same bug with grep -F, which is not touched by that patch.
>
>--
>Peter De Wachter
>--- a/build-tree/grep-2.5.3/src/search.c
>+++ b/build-tree/grep-2.5.3/src/search.c
>@@ -502,7 +502,7 @@
>         }
>       else
>         s = last_char;
>-      mr = mbtowc (&pwc, s, match - s);
>+      mr = mbrtowc (&pwc, s, match - s, &mbs);
>       if (mr <= 0)
>         {
>           memset (&mbs, '\0', sizeof (mbstate_t));
>@@ -531,8 +531,8 @@
>       wchar_t nwc;
>       int mr;
>
>-      mr = mbtowc (&nwc, buf + start + len,
>-                   end - buf - start - len - 1);
>+      mr = mbrtowc (&nwc, buf + start + len,
>+                    end - buf - start - len - 1, &mbs);
>       if (mr <= 0)
>         {
>           memset (&mbs, '\0', sizeof (mbstate_t));
>@@ -941,7 +941,7 @@
>         }
>       else
>         s = last_char;
>-      mr = mbtowc (&pwc, s, beg - s);
>+      mr = mbrtowc (&pwc, s, beg - s, &mbs);
>       if (mr <= 0)
>         memset (&mbs, '\0', sizeof (mbstate_t));
>       else if ((iswalnum (pwc) || pwc == L'_')
>@@ -959,7 +959,7 @@
>       wchar_t nwc;
>       int mr;
>
>-      mr = mbtowc (&nwc, beg + len, buf + size - beg - len);
>+      mr = mbrtowc (&nwc, beg + len, buf + size - beg - len, &mbs);
>       if (mr <= 0)
>         {
>           memset (&mbs, '\0', sizeof (mbstate_t));

Peter De Wachter, thank you for the patch.

Nicolas François, when you have a chance, please review the patch above.
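
For anyone reviewing, here is a minimal standalone sketch of the pattern the
patch relies on: decoding with mbrtowc and an explicit mbstate_t, and zeroing
that state whenever the input turns out not to be a usable character. The
helper names decode_or_reset and is_word_char are hypothetical and do not
appear in grep; the word test mirrors the iswalnum / L'_' check visible in the
third hunk. Treat it as an illustration under those assumptions, not as the
code in search.c.

```c
#include <string.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical helper, not part of grep: decode one multibyte character
   from s (at most n bytes) using the caller's conversion state.
   On an invalid sequence, an incomplete sequence, or a null character,
   the state is zeroed so that the next call starts from the initial
   state instead of inheriting junk.
   Returns the number of bytes consumed, or 0 on failure.  */
size_t
decode_or_reset (wchar_t *wc, const char *s, size_t n, mbstate_t *mbs)
{
  size_t mr = mbrtowc (wc, s, n, mbs);

  if (mr == (size_t) -1 || mr == (size_t) -2 || mr == 0)
    {
      memset (mbs, '\0', sizeof (mbstate_t));
      return 0;
    }
  return mr;
}

/* Word constituent test as in the patched code: alphanumeric or '_'.  */
int
is_word_char (wchar_t wc)
{
  return iswalnum ((wint_t) wc) || wc == L'_';
}
```

The point of passing the state explicitly is that the caller owns it and can
reset it; mbtowc keeps its state internally, which is why one bad byte from a
binary file could affect the checks for later matches.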