Vincent Lefevre wrote:

> I've just reported a new Debian bug concerning the performance problem.

It's not clear from http://bugs.debian.org/761157 that the performance problem occurs only with -P, but I assume that's what is meant.

Since this is a performance bug with PCRE, I suggest moving the Debian bug report to the Debian libpcre3 package. Grep cannot go back to the old way, which could cause grep to crash, and the bug cannot be fixed in grep because libpcre3 does not provide a fast way to search arbitrary data that may include encoding errors. It really is a problem that requires changes to libpcre3 to fix; grep cannot fix it.

In the meantime, to use 'grep' to search for strings in arbitrary data, I suggest omitting '-P' and using the C locale.

Since GNU bug 18266 "grep -P and invalid exits with error" has been fixed, I'm closing that bug report. Please feel free to open a separate GNU bug report for the performance issue.

PS. While composing this email I noticed another bug involving grep -P and encoding errors, which I fixed by installing the attached patch.
From fb39b32b12be0c6114f09d51818cd703161b104e Mon Sep 17 00:00:00 2001
From: Paul Eggert <egg...@cs.ucla.edu>
Date: Thu, 11 Sep 2014 09:52:01 -0700
Subject: [PATCH] grep: fix false matches with -P '...$' and invalid UTF-8

* src/pcresearch.c (Pexecute): Use PCRE_NOTEOL when matching
initial substrings of a line.
---
 src/pcresearch.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/src/pcresearch.c b/src/pcresearch.c
index 4e2ccf8..17e0e32 100644
--- a/src/pcresearch.c
+++ b/src/pcresearch.c
@@ -163,7 +163,8 @@ Pexecute (char const *buf, size_t size, size_t *match_size,
             break;
           valid_bytes = sub[0];
           e = pcre_exec (cre, extra, p, valid_bytes, 0,
-                         options | PCRE_NO_UTF8_CHECK, sub, nsub);
+                         options | PCRE_NO_UTF8_CHECK | PCRE_NOTEOL,
+                         sub, nsub);
           if (e != PCRE_ERROR_NOMATCH)
             break;
           p += valid_bytes + 1;
-- 
1.9.3
