[Resending my mail from 2023-04-11 as it does not appear in the Debian
BTS. With help from #debian-admin I learned the my local part (mail@) in
my From: was the problem. My mails went to /dev/null or somewhere. :-(
Apologies for any inconveniences.]



Hi Robert,

Thanks for the detailed bug report!

On Thu, Feb 16, 2023 at 09:41:56PM +0100, Robert Alm Nilsson wrote:
> When I search backward in w3m by using the '?' command, the program
> freezes with high CPU usage for a bit of time if the found text is near
> the end of a very long line.

[Snip repro steps]

> The reason that this happens when searching backward but not forward is
> that in the function `backwardSearch` in search.c there is this loop:
> 
>     while (regexMatch(...) == 1)
> 
> but in `forwardSearch` it's just an if statement:
> 
>     if (regexMatch(...) == 1)
> 
> That means that when searching backward, we do one regexMatch for each
> character on the line and since the regexMatch itself already searches
> through the characters, that's pretty wasteful and that's what gives us
> the squared times.
> 
> It should be possible to change this so that an if statement is used in
> `backwardSearch` just like in `forwardSearch`.

This does not work as we are searching backwards and need the last
match, not the first.

> Maybe there could be two versions of `regexMatch` where one of them
> searches backwards or just returns the last match instead of the first
> one so that we don't have to call it in a loop.

This would be the ideal solution, of course - but also not a trivial
one. As a workaround we can change the function to not proceed char by
char, but to skip past the end of the last match. This will of course
not help if nearly the whole line matches.

See my patch below (I also refactored backwardSearch to remove a lot of
duplicated code).


---
Author: Rene Kita <m...@rkta.de>
Subject: Fix slow backward search in long lines

We need to search the whole line up to the position we are at to find
the first match from the end. If we have a match we know its end and can
skip past it. This makes searching very long strings way more efficient.

While there, refactor the whole function.
---
 search.c |  104 ++++++++++++++++++++++++++-------------------------------------
 1 file changed, 44 insertions(+), 60 deletions(-)

--- a/search.c
+++ b/search.c
@@ -220,7 +220,7 @@ backwardSearch(Buffer *buf, char *str)
        while (l->bpos && l->prev)
            l = l->prev;
     }
-    begin = l;
+
     if (pos > 0) {
        pos--;
 #ifdef USE_M17N
@@ -228,80 +228,64 @@ backwardSearch(Buffer *buf, char *str)
            pos--;
 #endif
        p = &l->lineBuf[pos];
-       found = NULL;
-       found_last = NULL;
+    }
+    else {
+       if (!(l = l->prev)) {
+           if (!WrapSearch)
+               return SR_NOTFOUND;
+           l = buf->lastLine;
+       }
+       p = &l->lineBuf[l->size];
+    }
+
+    begin = l;
+    found = NULL;
+    found_last = NULL;
+
+    while (1) {
        q = l->lineBuf;
+       /*
+        * Search as long as we have matches on the current line.
+        * We are searching the line from begin to end. As we are searching
+        * backwards we want the last match on the line.
+        */
        while (regexMatch(q, &l->lineBuf[l->size] - q, q == l->lineBuf) == 1) {
            matchedPosition(&first, &last);
            if (first <= p) {
                found = first;
                found_last = last;
            }
-           if (q - l->lineBuf >= l->size)
-               break;
-           q++;
+           q = last;
 #ifdef USE_M17N
            while (q - l->lineBuf < l->size
                   && l->propBuf[q - l->lineBuf] & PC_WCHAR2)
                q++;
 #endif
-           if (q > p)
+           if (q >= p)
                break;
        }
-       if (found) {
-           pos = found - l->lineBuf;
-           while (pos >= l->len && l->next && l->next->bpos) {
-               pos -= l->len;
-               l = l->next;
-           }
-           buf->pos = pos;
-           if (l != buf->currentLine)
-               gotoLine(buf, l->linenumber);
-           arrangeCursor(buf);
-           set_mark(l, pos, pos + found_last - found);
-           return SR_FOUND;
+
+       if (found)
+           break;
+
+       if (!(l = l->prev)) {
+           if (!WrapSearch)
+               return SR_NOTFOUND;
+           l = buf->lastLine;
        }
+       if (l == begin) /* no match */
+           return SR_NOTFOUND;;
     }
-    for (l = l->prev;; l = l->prev) {
-       if (l == NULL) {
-           if (WrapSearch) {
-               l = buf->lastLine;
-               wrapped = TRUE;
-           }
-           else {
-               break;
-           }
-       }
-       found = NULL;
-       found_last = NULL;
-       q = l->lineBuf;
-       while (regexMatch(q, &l->lineBuf[l->size] - q, q == l->lineBuf) == 1) {
-           matchedPosition(&first, &last);
-           found = first;
-           found_last = last;
-           if (q - l->lineBuf >= l->size)
-               break;
-           q++;
-#ifdef USE_M17N
-           while (q - l->lineBuf < l->size
-                  && l->propBuf[q - l->lineBuf] & PC_WCHAR2)
-               q++;
-#endif
-       }
-       if (found) {
-           pos = found - l->lineBuf;
-           while (pos >= l->len && l->next && l->next->bpos) {
-               pos -= l->len;
-               l = l->next;
-           }
-           buf->pos = pos;
-           gotoLine(buf, l->linenumber);
-           arrangeCursor(buf);
-           set_mark(l, pos, pos + found_last - found);
-           return SR_FOUND | (wrapped ? SR_WRAPPED : 0);
-       }
-       if (wrapped && l == begin)      /* no match */
-           break;
+
+    pos = found - l->lineBuf;
+    while (pos >= l->len && l->next && l->next->bpos) {
+       pos -= l->len;
+       l = l->next;
     }
-    return SR_NOTFOUND;
+    buf->pos = pos;
+    if (l != buf->currentLine)
+       gotoLine(buf, l->linenumber);
+    arrangeCursor(buf);
+    set_mark(l, pos, pos + found_last - found);
+    return SR_FOUND | (wrapped ? SR_WRAPPED : 0);
 }

Reply via email to