In message <[email protected]>, no spam writes:
>I'm using this pattern:
>p5c.compile(".*?<td\\s+.+<span\\s+class=\"nametext\">"+
>".+?<strong>(.+?)</strong></font>.+?Profile\\s+Views",
>Perl5Compiler.SINGLELINE_MASK);
>
>to try and pull genres out of myspace pages. However some pages like this
...
>How can I prevent these loops?
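Presumably, you're concerned only with the capture group (containing
the genre). The successive .+ and .+? wildcards can each match across
tags (and, with SINGLELINE_MASK, across newlines), so on a page that
doesn't match, the engine tries every way of dividing the text among
them. Rewrite the expression along the following lines, using negated
character classes that cannot run past a tag boundary, to avoid the
ambiguous/excessive backtracking: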
p5c.compile("<span\\s+class=\"nametext\">[^<]*</span><br>[^<]*<font[^>]*>"+
"<strong>([^<]+)</strong></font>",
Perl5Compiler.SINGLELINE_MASK);
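
If it helps to try the expression outside your application, here is a
minimal sketch. It assumes java.util.regex as a stand-in for ORO (the
pattern uses no ORO-specific syntax), and the class name, method name,
and sample HTML are all made up for illustration. They are not taken
from a real myspace page, so adjust the markup to what you actually see.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only.
class GenreExtractor {
    // Same expression as above. No '.' wildcards remain, so no
    // single-line flag is needed here; [^<] already matches newlines.
    private static final Pattern GENRE = Pattern.compile(
        "<span\\s+class=\"nametext\">[^<]*</span><br>[^<]*<font[^>]*>"
        + "<strong>([^<]+)</strong></font>");

    // Returns the captured genre text, or null if the page doesn't match.
    static String extractGenre(String html) {
        Matcher m = GENRE.matcher(html);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Made-up fragment shaped like the markup the pattern expects.
        String page = "<td><span class=\"nametext\">Some Band</span><br>\n"
            + "<font size=\"1\"><strong>Rock / Indie</strong></font></td>";
        System.out.println(extractGenre(page));
    }
}
```

Because each [^<]* or [^>]* stops at the next tag delimiter, a page
that doesn't match fails quickly instead of looping.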