[issue30720] re.sub substitution match group contains wrong value after unmatched pattern was processed
New submission from William Budd:

pattern = re.compile('(.*?)', flags=re.DOTALL)

# This works as expected in the following case:
print(re.sub(pattern, '\\1', 'foo\n' 'bar123456789\n'))
# which outputs:
foo
bar123456789

# However, it does NOT work as I expect in this case:
print(re.sub(pattern, '\\1', 'foo123456789\n' 'bar\n'))
# actual output:
foo123456789
bar
# expected output:
foo123456789
bar

It seems that pattern matching/substitution iterations only go haywire once the matching iteration immediately prior to it turned out not to be a match. Maybe some internal variable is not cleaned up properly in an edge(?) case triggered by the example above?

------
components: Regular Expressions
messages: 296506
nosy: William Budd, ezio.melotti, mrabarnett
priority: normal
severity: normal
status: open
title: re.sub substitution match group contains wrong value after unmatched pattern was processed
versions: Python 3.6

___
Python tracker <http://bugs.python.org/issue30720>
___
___
Python-bugs-list mailing list
Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
William Budd added the comment:

I don't understand... Isn't the "?" in ".*?" supposed to make the ".*" matching non-greedy, hence matching the first "" rather than the last ""?
William Budd added the comment:

I now see you're right of course. Not a bug after all. Thank you. I mistakenly assumed that the group boundary ")" would delimit the end of the non-greedy match group. I.e., ".*?" versus ".*?". I don't see a way to accomplish the "even less greedy" variant I'm looking for though...
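
The behaviour described above can be sketched as follows. The regex engine first fixes the leftmost position where a match can start, and only then does ".*?" take as few characters as needed to reach the closing delimiter. The <b>...</b> tags here are an assumption chosen purely for illustration; the delimiters used in the original report are not shown in this thread.

```python
import re

# Hypothetical delimiters: <b> and </b> are assumed for illustration only.
pattern = re.compile(r'<b>(.*?)</b>', flags=re.DOTALL)

# The first opening tag has no closing tag of its own.
text = '<b>foo123456789\n<b>bar</b>\n'

m = pattern.search(text)

# The match starts at the leftmost opening tag, not at the one nearest
# to the closing tag. Non-greedy only minimises how far the match
# extends to the right from that fixed starting point.
start = m.start()
captured = m.group(1)

print(start)
print(repr(captured))
```

With DOTALL set, "." also matches the newline, so the group runs across the line break and swallows the second opening tag.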
William Budd added the comment:

Doh! This has a really easy solution, doesn't it; just replace "." with "[^<]": re.compile('([^<]*?)', flags=re.DOTALL). Sorry about the noise.
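
A minimal sketch of that fix, comparing the two patterns side by side. As before, the <b>...</b> delimiters are hypothetical stand-ins, since the original delimiters do not appear in this thread; only the switch from "." to "[^<]" comes from the message above.

```python
import re

# Hypothetical delimiters: <b> and </b> are assumed for illustration only.
lazy = re.compile(r'<b>(.*?)</b>', flags=re.DOTALL)
strict = re.compile(r'<b>([^<]*?)</b>', flags=re.DOTALL)

# The first opening tag is unmatched; the second pair is well formed.
text = '<b>foo123456789\n<b>bar</b>\n'

# ".*?" may cross the second "<b>", so the match starts at the first tag.
lazy_result = lazy.sub(r'\1', text)

# "[^<]*?" cannot cross any "<", so the attempt at the first tag fails
# and only the well-formed pair is replaced.
strict_result = strict.sub(r'\1', text)

print(repr(lazy_result))
print(repr(strict_result))
```

The negated class assumes the captured text never contains a literal "<"; nested or more complex markup would likely call for a real parser rather than a regular expression.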