Re: [PATCH 4/4] dfa: do not match invalid UTF-8

2019-12-18 Thread Paul Eggert
On 12/18/19 12:48 AM, Bruno Haible wrote re my recent Gnulib change, with corresponding Grep change: >

Re: [PATCH 4/4] dfa: do not match invalid UTF-8

2019-12-18 Thread Bruno Haible
Hi Paul, > (add_utf8_anychar): Match only valid UTF-8 byte sequences > instead of allowing overlong encodings or surrogate halves. Do I understand it correctly that, as a consequence of this change, 'grep' with a regex of '^.*$' will no longer match lines which contain an invalid UTF-8 byte sequ

[PATCH 4/4] dfa: do not match invalid UTF-8

2019-12-17 Thread Paul Eggert
* lib/dfa.c (struct dfa): Grow utf8_anychar_classes member array from 5 to 9 tokens; this is needed due to the changes to add_utf8_anychar. (charclass_index): 2nd arg is now pointer-to-const. (add_utf8_anychar): Match only valid UTF-8 byte sequences instead of allowing overlong encodings or surroga
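
For readers unfamiliar with what "valid UTF-8 byte sequences" excludes, here is a minimal sketch in C of the well-formed byte ranges from RFC 3629 (the same table appears in the Unicode Standard). It is not the code from lib/dfa.c, which builds DFA tokens rather than checking bytes one sequence at a time; the names utf8_valid_len and in_range are hypothetical and exist only for this illustration. The second-byte limits are what reject overlong encodings (E0 and F0 lead bytes), surrogate halves (ED lead byte), and code points above U+10FFFF (F4 lead byte).

```c
#include <stddef.h>

/* Return 1 if lo <= b <= hi, else 0. */
static int in_range(unsigned char b, unsigned char lo, unsigned char hi)
{
  return lo <= b && b <= hi;
}

/* Hypothetical helper, not part of Gnulib: return the length (1..4) of the
   well-formed UTF-8 sequence at the start of s[0..n-1], or 0 if the bytes
   there are not a well-formed sequence per RFC 3629.  */
size_t utf8_valid_len(const unsigned char *s, size_t n)
{
  if (n == 0)
    return 0;

  unsigned char b0 = s[0];

  /* 1 byte: U+0000..U+007F.  */
  if (b0 <= 0x7F)
    return 1;

  /* 2 bytes: lead C2..DF.  C0 and C1 would be overlong, so they are absent.  */
  if (in_range(b0, 0xC2, 0xDF))
    return (n >= 2 && in_range(s[1], 0x80, 0xBF)) ? 2 : 0;

  /* 3 bytes: lead E0..EF.  E0 needs A0..BF next (no overlong forms);
     ED needs 80..9F next (no surrogates U+D800..U+DFFF).  */
  if (in_range(b0, 0xE0, 0xEF))
    {
      unsigned char lo = (b0 == 0xE0) ? 0xA0 : 0x80;
      unsigned char hi = (b0 == 0xED) ? 0x9F : 0xBF;
      return (n >= 3 && in_range(s[1], lo, hi)
              && in_range(s[2], 0x80, 0xBF)) ? 3 : 0;
    }

  /* 4 bytes: lead F0..F4.  F0 needs 90..BF next (no overlong forms);
     F4 needs 80..8F next (nothing above U+10FFFF).  */
  if (in_range(b0, 0xF0, 0xF4))
    {
      unsigned char lo = (b0 == 0xF0) ? 0x90 : 0x80;
      unsigned char hi = (b0 == 0xF4) ? 0x8F : 0xBF;
      return (n >= 4 && in_range(s[1], lo, hi)
              && in_range(s[2], 0x80, 0xBF)
              && in_range(s[3], 0x80, 0xBF)) ? 4 : 0;
    }

  /* Stray continuation byte (80..BF) or an invalid lead (C0, C1, F5..FF).  */
  return 0;
}
```

Under these assumptions, the overlong pair C0 80 and the surrogate half ED A0 80 both yield 0, while E2 82 AC (U+20AC) yields 3.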