Bug#758105: handling bytes not part of the charset, and other garbage

2014-09-11 Thread Vincent Lefevre
On 2014-09-11 09:22:49 -0700, Paul Eggert wrote:
> Vincent Lefevre wrote:
>
> >There's no reason that '.' matches something that doesn't belong to
> >the charset in C locale, but doesn't match in a UTF-8 locale.
>
> In the C locale on GNU/Linux, all byte values are members of the charset. I don'

Bug#758105: handling bytes not part of the charset, and other garbage

2014-09-11 Thread Paul Eggert
Vincent Lefevre wrote:
> There's no reason that '.' matches something that doesn't belong to
> the charset in C locale, but doesn't match in a UTF-8 locale.

In the C locale on GNU/Linux, all byte values are members of the charset. That is why it's OK for '.' to accept that byte in the C locale
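
The distinction drawn above can be illustrated with a rough analogy in Python. This is only a sketch, not how grep is implemented: byte-pattern matching stands in for the C locale (every byte value is a character), and strict UTF-8 decoding stands in for a UTF-8 locale (a byte sequence that is not valid UTF-8 forms no character for '.' to match). The function names are hypothetical and chosen for this illustration.

```python
import re


def dot_matches_bytewise(data: bytes) -> bool:
    """Analogy for the C locale: each byte value 0-255 is one character."""
    # A bytes pattern treats every byte as a character, so '.' accepts any byte.
    return re.fullmatch(rb".", data, re.DOTALL) is not None


def dot_matches_utf8(data: bytes) -> bool:
    """Analogy for a UTF-8 locale: only validly encoded characters can match."""
    try:
        # Strict decoding rejects bytes that do not belong to the charset.
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return re.fullmatch(r".", text, re.DOTALL) is not None


if __name__ == "__main__":
    stray = b"\x80"               # a lone continuation byte, invalid as UTF-8
    e_acute = "é".encode("utf-8")  # two bytes, one valid UTF-8 character
    print(dot_matches_bytewise(stray), dot_matches_utf8(stray))
    print(dot_matches_bytewise(e_acute), dot_matches_utf8(e_acute))
```

Under this analogy the stray byte is accepted by the byte-wise matcher and rejected by the UTF-8 one, while the two-byte encoding of "é" is a single character only under UTF-8.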

Bug#758105: handling bytes not part of the charset, and other garbage (was: grep -P and invalid exits with error)

2014-09-11 Thread Vincent Lefevre
On 2014-09-01 01:31:53 -0700, Paul Eggert wrote:
> Vincent Lefevre wrote:
> >If there are many invalid UTF8 bytes, this would be slow, IMHO
>
> That's OK. We don't need grep -P to be fast on invalid input.

I can see a too important slowdown in practical cases.

> >But is the copy of the buffer r