Made some more progress. The problem seems to depend on the size of the string the regexp is processing, in addition to the new ID tag in the <a href> for each message. I ran the regexp over increasingly long portions of the $tmpPage string, and the CPU time appeared to grow exponentially with the length.
So, on a 2 GHz CPU with 512 MB of RAM and not much else going on, it didn't finish processing a single pass in an hour. The attached patch sort of fixes the problem. It's almost certainly not the way this should be solved, but it hopefully sheds some light on what's going wrong. One odd thing about this solution: with --delete, the first run leaves the last message in the inbox, but the next run gets the remaining message.

-- 
Adam Rosi-Kessel
http://adam.rosi-kessel.org
--- fetchyahoo 2005-11-13 08:49:58.000000000 -0500
+++ fetchyahoo.new 2005-11-13 08:44:24.000000000 -0500
@@ -853,7 +853,9 @@
  my $tmpLine = '';
 
  # the long regex matches and removes a single message
- while ( $tmpPage =~ s/^.*?^[\s]*<tr class=msg(new|old).*?^<td.*?name="Mid".value="([^"]+)".*?^<td>(.*?)<.*?^<td>.*?^[\s]*<a.href=.*?ShowLetter\?MsgId=([^&]+)&.*?\n(.*?)\n.*?^[\s]*<td .*?>(.*?)<.*?^[\s]*<td>(.*?)<//ms ) {
+ # Adam Rosi-Kessel 2005/11/13 Hackish patch to stop regexp from hanging
+ $tmpPage =~ s/^.*?^[\s]*<tr class=msg/<tr class=msg/ms;
+ while ( $tmpPage =~ s/^<tr class=msg(new|old).*?^<td.*?name="Mid".value="([^"]+)".*?^<td>(.*?)<.*?^<td>.*?^[\s]*<a.*?href=.*?ShowLetter\?MsgId=([^&]+)&.*?\n(.*?)\n.*?^[\s]*<td .*?>(.*?)<.*?^[\s]*<td>(.*?)<//ms ) {
  if (! $2 eq $4) {
  print "\nWarning: message ID's $2 and $4 don't match.\n" unless $quiet;
  }
@@ -871,6 +873,7 @@
  if ($newOnly) { $tmpLine =~ s/^(new|old) //; }
  print $msgcount . ". " . $tmpLine . "\n";
  }
+ $tmpPage =~ s/^.*?^[\s]*(<tr class=msg)/$1/sm;
  }
 
  $pagecount = $pagecount+1 ; # next summary page
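
To make the idea behind the patch easier to see in isolation, here is a minimal sketch on made-up markup. The row format, the helper name extract_rows, and the use of \A in place of the patch's ^ are all assumptions of this sketch and not part of fetchyahoo. It only illustrates the two steps: strip the junk in front of the first message row once, then let the per-message pattern start at a literal at the head of the string instead of at a lazy .*? prefix.

```perl
use strict;
use warnings;

# Hypothetical, heavily simplified stand-in for an inbox summary page.
# The real page has many more cells per row; only the overall shape matters.
my $tmpPage = "<html>\n<body>\n<table>\n";
for my $n (1 .. 3) {
    my $state = $n % 2 ? 'new' : 'old';
    $tmpPage .= "<tr class=msg$state>\n<td>id$n</td>\n</tr>\n";
}
$tmpPage .= "</table>\n</body>\n</html>\n";

# extract_rows is a hypothetical helper that mirrors the structure of the patch.
sub extract_rows {
    my ($page) = @_;
    my @rows;

    # Step 1: drop everything before the first message row, once, up front.
    $page =~ s/^.*?^[\s]*<tr class=msg/<tr class=msg/ms;

    # Step 2: the per-row pattern begins with a literal at the very start of
    # the string, so a failed attempt is rejected almost immediately.
    while ( $page =~ s/\A<tr class=msg(new|old)>\n<td>(.*?)<//s ) {
        push @rows, "$1 $2";

        # Step 3: discard the rest of this row so the next row (if any)
        # is again at the start of the string.
        $page =~ s/^.*?^[\s]*(<tr class=msg)/$1/ms;
    }
    return @rows;
}

print "$_\n" for extract_rows($tmpPage);
```

Whether this is the right fix for the real page is a separate question; the sketch only shows why moving the lazy prefix out of the long pattern gives the regexp engine less to retry on each pass.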