Bug#758105: grep -P and invalid exits with error

2014-09-08 Thread Santiago
Patch updated. Paul, thanks for the previous comments. As you suggested, the attached patch doesn't copy the buffer and splits the input when it finds an invalid character. For the moment, I don't see a cleaner way to avoid the pcre internals. Regards, Santiago From d58b53f86bb3f4b97137f708c159
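
The patch itself is cut off above, so the following is only a minimal sketch of the general idea of splitting input at invalid UTF-8 bytes. It is not Santiago's code and makes no libpcre calls. The names utf8_seq_len, valid_prefix_len and count_valid_segments are hypothetical; a real matcher would run the regex on each valid run instead of just counting the runs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: length of the well-formed UTF-8 sequence that
   starts at p (n bytes remain), or 0 if the bytes there are invalid. */
static size_t utf8_seq_len(const unsigned char *p, size_t n)
{
    size_t len, i;
    unsigned long cp, min;

    if (n == 0)
        return 0;
    if (p[0] < 0x80)
        return 1;                               /* ASCII */
    else if ((p[0] & 0xE0) == 0xC0) { len = 2; cp = p[0] & 0x1F; min = 0x80; }
    else if ((p[0] & 0xF0) == 0xE0) { len = 3; cp = p[0] & 0x0F; min = 0x800; }
    else if ((p[0] & 0xF8) == 0xF0) { len = 4; cp = p[0] & 0x07; min = 0x10000; }
    else
        return 0;                               /* stray continuation or bad lead byte */

    if (n < len)
        return 0;                               /* truncated sequence */
    for (i = 1; i < len; i++) {
        if ((p[i] & 0xC0) != 0x80)
            return 0;                           /* not a continuation byte */
        cp = (cp << 6) | (p[i] & 0x3F);
    }
    /* Reject overlong forms, surrogates and values above U+10FFFF. */
    if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return 0;
    return len;
}

/* Length of the longest prefix of buf that is valid UTF-8. */
size_t valid_prefix_len(const unsigned char *buf, size_t len)
{
    size_t i = 0;

    assert(buf != NULL || len == 0);
    while (i < len) {
        size_t k = utf8_seq_len(buf + i, len - i);
        if (k == 0)
            break;
        i += k;
    }
    return i;
}

/* Walk the buffer without copying it, counting the maximal valid runs.
   Each run is where a matcher could be invoked; invalid bytes are
   skipped one at a time. */
size_t count_valid_segments(const unsigned char *buf, size_t len)
{
    size_t segments = 0, pos = 0;

    while (pos < len) {
        size_t run = valid_prefix_len(buf + pos, len - pos);
        if (run > 0) {
            segments++;
            pos += run;
        } else {
            pos++;
        }
    }
    return segments;
}
```

For the bytes 'a' 'b' 0xFF 'c' 0xC3 0xA9 0x80 'd', valid_prefix_len returns 2 and count_valid_segments returns 3: the runs are "ab", "c" followed by the two-byte sequence 0xC3 0xA9, and "d".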

Bug#758105: grep -P and invalid exits with error

2014-09-01 Thread Paul Eggert
Vincent Lefevre wrote: [...] Note that this option can also be passed to pcre_exec() and pcre_dfa_exec(), to suppress the validity checking of subject strings only. If the same string is being matched many times, the option can be safely set for the second and

Bug#758105: grep -P and invalid exits with error

2014-09-01 Thread Vincent Lefevre
On 2014-08-29 06:43:45 -0700, Paul Eggert wrote: > Thanks, but that patch seems to depend on libpcre internals, in that it > "knows" that pcre_exec cannot possibly succeed without first checking its > entire input buffer for invalid UTF-8 bytes. Even if that's true now, it > reflects a performance

Bug#758105: grep -P and invalid exits with error

2014-08-29 Thread Paul Eggert
Thanks, but that patch seems to depend on libpcre internals, in that it "knows" that pcre_exec cannot possibly succeed without first checking its entire input buffer for invalid UTF-8 bytes. Even if that's true now, it reflects a performance bug that might be fixed in a future libpcre version.

Bug#758105: grep -P and invalid exits with error

2014-08-28 Thread Santiago
On 16/08/14 at 11:36, Paul Eggert wrote: > Santiago wrote: > >Another solution would be not to check whether binary files are valid > >(passing PCRE_NO_UTF8_CHECK to pcre_exec), but I don't know if that'd > >avoid security holes > > It wouldn't. (We already tried it.) > Another try. This pat

Bug#758105: grep -P and invalid exits with error

2014-08-14 Thread Santiago
Hi, Please revert ca7868cc27db3d9deafaa2e0ac5a2bb0aa8ef373. That commit (re)introduced a regression (see http://debbugs.gnu.org/15758): pcresearch again checks whether its input is valid UTF-8. The problem is that binary files are not valid UTF-8, so grep -P, in Unicode locales, exits with an error: LANG=