[issue30720] re.sub substitution match group contains wrong value after unmatched pattern was processed

2017-06-20 Thread William Budd

New submission from William Budd:

pattern = re.compile('(.*?)', flags=re.DOTALL)



# This works as expected in the following case:

print(re.sub(pattern, '\\1',
 'foo\n'
 'bar123456789\n'))

# which outputs:

foo
bar123456789



# However, it does NOT work as I expect in this case:

print(re.sub(pattern, '\\1',
 'foo123456789\n'
 'bar\n'))

# actual output:

foo123456789
bar

# expected output:

foo123456789
bar



It seems that a pattern matching/substitution iteration only goes haywire when 
the matching iteration immediately prior to it turned out not to be a match. 
Maybe some internal variable is not cleaned up properly in an edge(?) case 
triggered by the example above?

------
components: Regular Expressions
messages: 296506
nosy: William Budd, ezio.melotti, mrabarnett
priority: normal
severity: normal
status: open
title: re.sub substitution match group contains wrong value after unmatched 
pattern was processed
versions: Python 3.6

___
Python tracker 
<http://bugs.python.org/issue30720>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue30720] re.sub substitution match group contains wrong value after unmatched pattern was processed

2017-06-20 Thread William Budd

William Budd added the comment:

I don't understand... Isn't the "?" in ".*?" supposed to make the ".*" matching 
non-greedy, hence matching the first "" rather than the last ""?




[issue30720] re.sub substitution match group contains wrong value after unmatched pattern was processed

2017-06-20 Thread William Budd

William Budd added the comment:

I now see you're right of course. Not a bug after all. Thank you.

I mistakenly assumed that the group boundary ")" would delimit the end of the 
non-greedy match group. I.e., ".*?" versus ".*?".

I don't see a way to accomplish the "even less greedy" variant I'm looking for 
though...
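
A minimal sketch of the behaviour described above. The `<div>` and `<p>` markup is hypothetical, assumed purely for illustration, and is not necessarily the markup used in the original report. It shows that a lazy `.*?` is not bounded by the closing parenthesis of its group: it keeps expanding until the rest of the pattern can match.

```python
import re

# Hypothetical markup, assumed for illustration only.
lazy_pattern = re.compile(r'<div>(<p>.*?</p>)</div>', flags=re.DOTALL)

text = ('<div><p>foo</p>123456789</div>\n'
        '<div><p>bar</p></div>\n')

# The first "</p>" is followed by "123456789", not "</div>", so the engine
# backtracks and lets ".*?" grow (across the newline, because of DOTALL)
# until a "</p>" that is immediately followed by "</div>" is found.
lazy_match = lazy_pattern.search(text)
print(repr(lazy_match.group(1)))

lazy_result = lazy_pattern.sub(r'\1', text)
print(lazy_result)
```

In this sketch group 1 spans both lines, which is why the substitution appears to act on the wrong element.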




[issue30720] re.sub substitution match group contains wrong value after unmatched pattern was processed

2017-06-20 Thread William Budd

William Budd added the comment:

Doh! This has a really easy solution, doesn't it? Just replace "." with "[^<]": 
re.compile('([^<]*?)', flags=re.DOTALL).

Sorry about the noise.
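
A minimal sketch of that fix, again using hypothetical `<div>` and `<p>` markup that is assumed for illustration only. Replacing `.` with `[^<]` stops the group from running across a tag boundary, so the first element no longer matches and only the second one is rewritten.

```python
import re

# Hypothetical markup, assumed for illustration only.
bounded_pattern = re.compile(r'<div>(<p>[^<]*?</p>)</div>', flags=re.DOTALL)

text = ('<div><p>foo</p>123456789</div>\n'
        '<div><p>bar</p></div>\n')

# "[^<]*?" cannot consume "<", so the group content must end at the first
# "</p>". The first <div> therefore fails to match and is left untouched.
bounded_result = bounded_pattern.sub(r'\1', text)
print(bounded_result)
```

Since this sketch's pattern no longer contains a `.`, the `re.DOTALL` flag has no effect on it and is kept only to mirror the original call.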
