Tim Peters added the comment:
Closing this as "Won't Fix", since the possibility of exponential-time behavior
with naively written nested quantifiers is well known, and there are no plans
to "do something" about that.
--
resolution: -> wont fix
stage: -> resolved
status: open -> cl
Tim Peters added the comment:
Note that the relatively tiny pattern here extracts just a small piece of the
regexp in question. As the output shows, increase the length of a string it
fails to match by one character, and the time taken to fail approximately
doubles: exponential-time behavior.
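
For readers who want to see the doubling for themselves, here is a minimal sketch. The pattern `(a+)+$` and the helper `time_failed_match` are illustrative assumptions, not the piece of the URL regexp discussed in this report; the pattern is only a stand-in with the same shape (a quantified group whose body is itself quantified, followed by something that cannot match).

```python
import re
import time

# Illustrative stand-in, NOT the pattern from the report: a quantified
# group whose body is itself quantified, anchored so the match must fail.
NESTED = re.compile(r'(a+)+$')


def time_failed_match(n):
    """Match NESTED against 'a' * n + 'b'; return (match, seconds taken)."""
    text = 'a' * n + 'b'  # the trailing 'b' guarantees the match fails
    start = time.perf_counter()
    m = NESTED.match(text)
    return m, time.perf_counter() - start


if __name__ == '__main__':
    # Each extra 'a' is expected to roughly double the time to fail.
    for n in range(14, 21):
        m, elapsed = time_failed_match(n)
        print(n, m, f'{elapsed:.4f}s')
```

Exact timings depend on the machine and the Python version, so treat the numbers as indicative only. Rewriting the pattern so the quantifiers are not nested (here, plain `a+$`) should make the failure fast again.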
Tim Peters added the comment:
The repr truncates the pattern string, for display, if it's "too long". The
only visual clue about that, though, is that the display is missing the pattern
string's closing quote, as in the output you showed here. If you look at
url_pat.pattern, though, you'll s
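
A short sketch of the difference between the repr and the `.pattern` attribute, assuming CPython's current truncating behavior. The names `long_pattern` and `compiled` are made up for illustration; any pattern whose repr is longer than the cutoff should behave the same way.

```python
import re

# Hypothetical long pattern, built only to exceed the repr cutoff.
long_pattern = '(?:abc|def)' * 40
compiled = re.compile(long_pattern)

shown = repr(compiled)

# The repr is shorter than the pattern itself, so part of it was dropped
# for display; look at the tail to see the missing closing quote.
print(len(long_pattern), len(shown))
print(shown[-20:])

# The compiled object still holds the complete pattern string.
print(compiled.pattern == long_pattern)
```

In other words, only the display is shortened; what gets compiled and matched is the full pattern.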
Dennis Sweeney added the comment:
It looks like only the first 200 characters of the input string's repr are used
as the compiled pattern's repr for some reason:
https://github.com/python/cpython/blob/master/Modules/_sre.c#L1294
I don't know if there is a good reason, especially since th
Steven D'Aprano added the comment:
Wait, I'm sorry, do you mean this?
py> repr(r)[13:-16]
'?i)b((?:[a-z][w-]+:(?:/{1,3}|[a-z0-9%])|wwwd{0,3}[.]|[a-z0-9.-]+[.][a-z]{2,4}/)(?:[^s()<>]+|(([^s()<>]+|(([^s()<>]+)))*))+(?:(([^s()<>]+|(
Steven D'Aprano added the comment:
> notice the stripped characters in the `repr`
Er, no. Your regex looks like line noise, and it hurts my brain to look at it
:-)
If you have spotted a difference, can you tell us what characters are stripped?
When I try running it, I don't get any character
New submission from Matt Miller:
I was evaluating a few regular expressions for parsing URLs. One such
expression
(https://daringfireball.net/2010/07/improved_regex_for_matching_urls) causes
the `re.Pattern` to exhibit some strange behavior (notice the stripped
characters in the `repr`):
``