Re: [PATCH 4/4] dfa: do not match invalid UTF-8

Bruno Haible Wed, 18 Dec 2019 00:50:16 -0800

Hi Paul,

> (add_utf8_anychar): Match only valid UTF-8 byte sequences
> instead of allowing overlong encodings or surrogate halves.


Do I understand correctly that, as a consequence of this change,
'grep' with a regex of '^.*$' will no longer match lines that contain
an invalid UTF-8 byte sequence?

If so:
  - Is this effect on 'grep' intended? (And the workaround is to use the
    "C" locale.)
  - Is it consistent with the behaviour of regex and kwset, which 'grep'
    also uses, depending on the arguments (as far as I understand)?
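
To make the distinction concrete, here is a minimal Python sketch of what
"valid UTF-8" means in the sense of the commit message, based on the
well-formed byte ranges given in the Unicode Standard. It is an illustration
only, not the code in dfa.c, and the function name is_valid_utf8 is made up
for this example.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if data consists only of well-formed UTF-8 sequences.

    Hypothetical helper for illustration; it rejects overlong encodings,
    surrogate halves (U+D800..U+DFFF) and code points above U+10FFFF.
    """
    i, n = 0, len(data)
    while i < n:
        b0 = data[i]
        if b0 <= 0x7F:                      # single-byte (ASCII)
            i += 1
            continue
        # Pick the number of trailing bytes and the allowed range of the
        # FIRST trailing byte; that range is what excludes the bad cases.
        if 0xC2 <= b0 <= 0xDF:              # C0 and C1 would be overlong
            need, lo, hi = 1, 0x80, 0xBF
        elif b0 == 0xE0:                    # E0 80..9F would be overlong
            need, lo, hi = 2, 0xA0, 0xBF
        elif 0xE1 <= b0 <= 0xEC or 0xEE <= b0 <= 0xEF:
            need, lo, hi = 2, 0x80, 0xBF
        elif b0 == 0xED:                    # ED A0..BF would be surrogates
            need, lo, hi = 2, 0x80, 0x9F
        elif b0 == 0xF0:                    # F0 80..8F would be overlong
            need, lo, hi = 3, 0x90, 0xBF
        elif 0xF1 <= b0 <= 0xF3:
            need, lo, hi = 3, 0x80, 0xBF
        elif b0 == 0xF4:                    # F4 90..BF would exceed U+10FFFF
            need, lo, hi = 3, 0x80, 0x8F
        else:                               # stray trailing byte, C0, C1, F5..FF
            return False
        if i + need >= n:                   # sequence truncated at end of input
            return False
        if not lo <= data[i + 1] <= hi:
            return False
        for j in range(i + 2, i + need + 1):
            if not 0x80 <= data[j] <= 0xBF:
                return False
        i += need + 1
    return True
```

With this definition, b"\xc0\xaf" (an overlong encoding of '/') and
b"\xed\xa0\x80" (the surrogate half U+D800) are both rejected. If
add_utf8_anychar now follows the same rule, a line containing either
sequence has a byte that '.' cannot consume, which is what prompts the
questions above.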

Bruno
