Paolo Bonzini wrote:
> * src/dfa.c (setbit_case_fold): Remove, replace with...
> (setbit_wc, setbit_c, setbit_case_fold_c): ... these.
> (parse_bracket_exp): Use setbit_case_fold_c when iterating over
> single-byte sequences. Use setbit_wc for multi-byte character sets,
> and setbit_case_fold_c for single-byte character sets.
> (lex): Use setbit_case_fold_c for single-byte character sets.
> ---
> > At first I was going to say this:
> >
> > You are using ru_RU.KOI8-R, which is a uni-byte locale, yet your
> > inputs (both stdin and the grep regexp) use the two-byte
> > representation \xd0\9f for П, instead of the uni-byte \360.
> >
> > But it fails even with the single-byte version.
> > So it is indeed a bug in grep, but at least this time
> > it affects relatively few locales.
> >
> > Here's the fix I expect to use and a test case to exercise it.
>
> The bug affects all single-byte locales except ISO-8859-1 ones.
> It is quite serious---the logic to map wide characters back to
> bytes makes no sense.
>
> The attached patch fixes it and does not regress high-bit-range,
> while removing the debatable uses of wctob and checks for EOF. Ok
> to apply together with your testcase?
> ---
> src/dfa.c | 102 ++++++++++++++++++++++++++++++++++---------------------------
> 1 files changed, 57 insertions(+), 45 deletions(-)
Hi Paolo,

Thanks for following through on this.
At first glance (I'll look carefully today) this looks like the right approach.

However, I've gone ahead and pushed my patch and test case,
since it does solve the problem at hand, and I have not seen
inputs that make that code misbehave.