[issue40879] Strange regex cycle

2020-06-06 Thread Tim Peters
Tim Peters added the comment: Closing this as "Won't Fix", since the possibility of exponential-time behavior with naively written nested quantifiers is well known, and there are no plans to "do something" about that. -- resolution: -> wont fix stage: -> resolved status: open -> cl

[issue40879] Strange regex cycle

2020-06-05 Thread Tim Peters
Tim Peters added the comment: Note that the relatively tiny pattern here extracts just a small piece of the regexp in question. As the output shows, increase the length of a string it fails to match by one character, and the time taken to fail approximately doubles: exponential-time behavior
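The doubling described above can be reproduced with a much smaller pattern. The following is a minimal sketch, not the pattern from the issue: `NESTED` and `time_failed_match` are hypothetical names, and the regex is just a textbook nested quantifier chosen for illustration. It times a failing match against inputs that grow by one character at a time.

```python
import re
import time

# Hypothetical pattern, NOT the one from the issue: a quantified group whose
# body is itself quantified, anchored with $ so the inputs below cannot match.
NESTED = re.compile(r"(?:a+)+$")


def time_failed_match(n):
    """Return (match result, elapsed seconds) for n 'a' characters plus a 'b'."""
    text = "a" * n + "b"  # the trailing 'b' guarantees the match fails
    start = time.perf_counter()
    result = NESTED.match(text)
    return result, time.perf_counter() - start


# Grow the failing input one character at a time and report the cost.
for n in range(12, 19):
    result, elapsed = time_failed_match(n)
    print(f"n={n:2d} matched={result is not None} seconds={elapsed:.5f}")
```

On a backtracking engine such as CPython's, the printed times would be expected to grow by roughly a constant factor per added character, although the exact figures depend on the machine and the Python version.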

[issue40879] Strange regex cycle

2020-06-05 Thread Tim Peters
Tim Peters added the comment: The repr truncates the pattern string, for display, if it's "too long". The only visual clue about that, though, is that the display is missing the pattern string's closing quote, as in the output you showed here. If you look at url_pat.pattern, though, you'll s
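A short sketch of that distinction, assuming CPython: `long_pattern` is a hypothetical stand-in (500 literal characters) rather than the URL regex from the report. It compares the compiled object's `repr` with its `.pattern` attribute.

```python
import re

# Hypothetical stand-in for a long regex: 500 literal 'a' characters.
long_pattern = "a" * 500
compiled = re.compile(long_pattern)

shown = repr(compiled)

# The repr is meant for display and is shorter than the pattern itself.
print(len(long_pattern), len(shown))

# The tail of the repr: look for the missing closing quote before the ')'.
print(shown[-20:])

# The .pattern attribute still holds the complete, unmodified string.
print(compiled.pattern == long_pattern)
```

The compiled pattern itself is unaffected; only the text shown for display is shortened, so `.pattern` is the attribute to inspect when checking what was actually compiled.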

[issue40879] Strange regex cycle

2020-06-05 Thread Dennis Sweeney
Dennis Sweeney added the comment: It looks like only the first 200 characters of the input string's repr are used as the compiled pattern's repr for some reason: https://github.com/python/cpython/blob/master/Modules/_sre.c#L1294 I don't know if there is a good reason, especially since th

[issue40879] Strange regex cycle

2020-06-05 Thread Steven D'Aprano
Steven D'Aprano added the comment: Wait, I'm sorry, do you mean this? py> repr(r)[13:-16] '?i)b((?:[a-z][w-]+:(?:/{1,3}|[a-z0-9%])|wwwd{0,3}[.]|[a-z0-9.-]+[.][a-z]{2,4}/)(?:[^s()<>]+|(([^s()<>]+|(([^s()<>]+)))*))+(?:(([^s()<>]+|(

[issue40879] Strange regex cycle

2020-06-05 Thread Steven D'Aprano
Steven D'Aprano added the comment: > notice the stripped characters in the `repr` Er, no. Your regex looks like line noise, and it hurts my brain to look at it :-) If you have spotted a difference, can you tell us what characters are stripped? When I try running it, I don't get any character

[issue40879] Strange regex cycle

2020-06-05 Thread Matt Miller
New submission from Matt Miller: I was evaluating a few regular expressions for parsing URLs. One such expression (https://daringfireball.net/2010/07/improved_regex_for_matching_urls) causes the `re.Pattern` to exhibit some strange behavior (notice the stripped characters in the `repr`): ``