I'm using contains().  Strange... not sure what's going on.

> Are you using contains() or match()?  If you're using match(), then
> switch to contains() and it should work.  Here's my sanity check for
> the pattern (to avoid having to write a Java test program):
>
> ~> wget -O - http://www.myspace.com/pain  2> /dev/null | perl -e '@txt =
> <STDIN>; $txt = join("", @txt); $txt =~
> m#<span\s+class="nametext">[^<]*</span><br>[^<]*<font\s[^>]*><strong>([^<]+)</strong></font>#si;
> print "$1\n";'
>
>  Metal / Industrial
>

Ah yes, I figured that was the issue after I saw your pattern.  The bit I
don't understand, though, is how [^<]* works.  What exactly does that
part of the pattern mean?

> In any case, the key to preventing excessive backtracking is to make the
> pattern as specific as possible.  The original pattern posed problems
> because of the leading .* as well as the following .+ pattern elements,
> which caused a lot of backtracking.
>
>
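
To make the quoted point about specificity concrete, here is a minimal
sketch in Java.  It assumes java.util.regex rather than the matcher
discussed in this thread, and the class name, helper method, and HTML
fragment are all made up for illustration.  It compares a greedy .+ with
the negated character class [^<]+ (any run of characters other than '<')
on the same input.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; uses java.util.regex as a stand-in for the
// matcher discussed in the thread.
class NegatedClassDemo {

    // Returns group 1 of the first match, or null if the pattern is not found.
    // CASE_INSENSITIVE and DOTALL mirror the "si" flags of the Perl check.
    static String firstGroup(String regex, String input) {
        Matcher m = Pattern
                .compile(regex, Pattern.CASE_INSENSITIVE | Pattern.DOTALL)
                .matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Made-up fragment with two <strong> elements.
        String html =
                "<strong>Metal / Industrial</strong><br><strong>Sweden</strong>";

        // Greedy .+ first consumes everything to the end of the input, then
        // backs off one character at a time until </strong> fits, so the
        // capture ends at the LAST closing tag.
        System.out.println(firstGroup("<strong>(.+)</strong>", html));

        // [^<]+ can only consume characters up to the next '<', so there is
        // very little to back off from and the capture ends at the first tag.
        System.out.println(firstGroup("<strong>([^<]+)</strong>", html));
    }
}
```

The first call captures text spanning both elements, while the second
captures only "Metal / Industrial".  On a full page the greedy version
has far more input to give back before it can match, which is where the
extra backtracking comes from.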
