[issue17426] \0 in re.sub substitutes to space

2013-03-15 Thread Amaury Forgeot d'Arc
Amaury Forgeot d'Arc added the comment: It's not a double replacement: chr(92)+chr(0) is processed only once. And the second paragraph of the re documentation already contains such a warning.
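To illustrate the single-pass point, here is a minimal sketch using only the standard `re` module (checked against Python 3; the variable names are illustrative). An escaped backslash in the template produces one literal backslash, and the `\0` that ends up in the output is not scanned a second time.

```python
import re

# The raw literal r'\\0' is three characters: backslash, backslash, '0'.
template = r'\\0'
assert len(template) == 3

# re.sub scans the template once: '\\' becomes a single literal backslash
# and the following '0' is copied as-is. The result is not re-scanned,
# so the output contains backslash + '0', not a NUL character.
result = re.sub('a', template, 'xay')
print(repr(result))  # 'x\\0y'
```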

2013-03-15 Thread anatoly techtonik
anatoly techtonik added the comment: FWIW, I reimplemented the substitution logic in my wikify [1] engine some time ago. I was somewhat disappointed that I had to reinvent the wheel, but now I see that it was for the best. Thanks to the people in this report I now understand the whole thing much bette

2013-03-15 Thread anatoly techtonik
anatoly techtonik added the comment: Amaury, the documentation could make it clearer that it is a double replacement. Of course I paid attention to the repeated instructions about string substitution, but I thought they were just a reminder, not an extra processing layer on top of standar

2013-03-15 Thread Amaury Forgeot d'Arc
Amaury Forgeot d'Arc added the comment: Anatoly, your last question about re.sub is covered by the documentation: re.sub will process the replacement string, and interpret the sequence \0 as the NUL character. So you get the NUL character in the returned string. This is unrelated to raw litera
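A short sketch of why raw literals are beside the point (Python 3, standard `re` only; variable names are illustrative): `r'\0'` and `'\\0'` are the same two-character string, so `re.sub` only ever sees the string, never how it was spelled. A plain `'\0'` is different: Python's own literal parsing has already turned it into a NUL before `re.sub` runs.

```python
import re

raw = r'__\0__'         # raw literal: backslash and '0' stay two characters
escaped = '__\\0__'     # ordinary literal with the backslash escaped
nul_literal = '__\0__'  # ordinary literal: Python itself turns \0 into NUL

# The raw and escaped spellings produce identical str objects, so re.sub
# cannot tell which literal syntax was used.
assert raw == escaped and len(raw) == 6
assert len(nul_literal) == 5

# All three replacements yield a NUL in the output: the first two through
# re.sub's template processing, the third because the NUL was already there.
for template in (raw, escaped, nul_literal):
    print(repr(re.sub('aaa', template, 'argaaagra')))  # 'arg__\x00__gra'
```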

2013-03-15 Thread anatoly techtonik
anatoly techtonik added the comment: Users report the effect. Then research is done to find the source. Then the right cause for the source of the bug is identified, and then a decision is made about whether a fix is possible. The bug is closed, but that doesn't mean we cannot dedicate some

2013-03-15 Thread anatoly techtonik
anatoly techtonik added the comment: I thought trackers were used to track the sources of bugs. Aren't they?

2013-03-15 Thread Guido van Rossum
Guido van Rossum added the comment: Anatoly, your question belongs on python-list or Stack Overflow, not in the tracker. --Guido van Rossum (sent from Android phone), in reply to anatoly techtonik's comment of Mar 15, 2013 9:28 AM.

2013-03-15 Thread anatoly techtonik
anatoly techtonik added the comment: Matthew, finally the right answer. Thanks! Looking further, there is a bug in processing backslashes in raw literal replacement strings. re.sub ignores raw strings as replacements. This can be even more confusing for people who look for more advanced equiv

2013-03-15 Thread Guido van Rossum
Guido van Rossum added the comment: Anatoly, your argument for consistency with other languages is ridiculous.

2013-03-15 Thread Matthew Barnett
Matthew Barnett added the comment: The regex module behaves the same as re. The reason it isn't supported is that \0 starts an octal escape sequence.
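Matthew's point about octal escapes can be checked directly. This sketch (Python 3, standard `re` only) shows that in a replacement template `\0` may absorb up to two more octal digits, so `\012` becomes a newline rather than "group 0 followed by 12".

```python
import re

# In a replacement template, \0 starts an octal escape, not a group reference.
print(repr(re.sub('x', r'\0', 'axb')))     # 'a\x00b'  (\0 -> chr(0))
print(repr(re.sub('x', r'\012', 'axb')))   # 'a\nb'    (\012 -> chr(0o12), a newline)
# At most two extra octal digits are consumed; the '9' is copied literally.
print(repr(re.sub('x', r'\0009', 'axb')))  # 'a\x009b'
```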

2013-03-15 Thread anatoly techtonik
anatoly techtonik added the comment: The Perl syntax supported $0 according to this doc http://turtle.ee.ncku.edu.tw/docs/perl/manual/pod/perlre.html but it was removed for an unknown reason. Citing that removal as an argument without knowing the true reason is a "cargo cult argument", which I hop

2013-03-15 Thread Ezio Melotti
Ezio Melotti added the comment: Perl uses $& for the whole match rather than $0. That would explain why \0 is not supported. For .group() it probably made sense to access the whole match using 0 rather than passing something else, and that was likely reflected in the \g<...> form, but not in
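For comparison, a minimal sketch (Python 3, standard `re`; the name `m` is illustrative) of the two spellings of "whole match" that do exist: `group(0)` on a match object, and `\g<0>` in a template. `Match.expand` applies the same template rules as `re.sub`, so it also turns `\0` into NUL.

```python
import re

m = re.search('a+', 'bbaaabb')

# group(0) returns the whole match.
print(m.group(0))              # aaa

# In a template, the whole match is spelled \g<0>.
print(m.expand(r'__\g<0>__'))  # __aaa__

# The same template rules apply here: \0 is an octal escape, i.e. NUL.
print(repr(m.expand(r'\0')))   # '\x00'
```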

2013-03-14 Thread anatoly techtonik
anatoly techtonik added the comment: Am I right that \0 is not supported just because nobody thought about supporting it?

2013-03-14 Thread anatoly techtonik
anatoly techtonik added the comment: You're right - groups are defined here: http://docs.python.org/2/library/re.html#re.MatchObject.group The reason to fix this is to gain internal language consistency and external consistency with other major implementations, and to reduce the docs and the amount of exception

2013-03-14 Thread Guido van Rossum
Guido van Rossum added the comment: The doc Ezio quotes for \number is describing the regex syntax, not the substitution string syntax. Unfortunately this syntax is documented somewhat less formally than the regex syntax. Fortunately, it does mention explicitly that \g<0> substitutes the enti
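Following Guido's pointer to `\g<0>`, the call from the original report can be written to substitute the entire match. This is a sketch against Python 3's standard `re`; the callable form is an alternative not discussed in the thread.

```python
import re

text = 'argaaagra'

# \g<0> in the template refers to the entire match.
print(re.sub('aaa', r'__\g<0>__', text))                        # arg__aaa__gra

# Alternatively, pass a function: it receives the Match object,
# and group(0) is the entire match.
print(re.sub('aaa', lambda m: '__' + m.group(0) + '__', text))  # arg__aaa__gra
```

A replacement function's return value is inserted as-is, without backslash processing, so this form sidesteps the template escape rules entirely.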

2013-03-14 Thread Ezio Melotti
Ezio Melotti added the comment: The space you see is the character \x00:

    >>> re.sub('a+', r'__\0__', 'bbaaabb')
    'bb__\x00__bb'

The re documentation says:

    """
    \number
    Matches the contents of the group of the same number. Groups are numbered starting from 1.
    """

so the re module is behaving
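Ezio's quote about group numbering can be contrasted with the template behavior in a small sketch (Python 3, standard `re`; `raised` is an illustrative flag): `\1` is a real group reference and raises `re.error` when the pattern has no group 1, while `\0` is never treated as a group reference and is accepted silently as an octal escape.

```python
import re

# With one capturing group, \1 refers to it.
print(re.sub('(a+)', r'__\1__', 'bbaaabb'))  # bb__aaa__bb

# Without any group, \1 is an invalid group reference and re.sub raises.
raised = False
try:
    re.sub('a+', r'__\1__', 'bbaaabb')
except re.error as exc:
    raised = True
    print('re.error:', exc)

# \0 does not raise: it becomes chr(0).
print(repr(re.sub('a+', r'__\0__', 'bbaaabb')))  # 'bb__\x00__bb'
```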

2013-03-14 Thread Guido van Rossum
Guido van Rossum added the comment: It's not a space, it's a null byte. Would you mind pointing out exactly where the Python docs state that \0 in re.sub() refers to the whole group? (IIRC it should only say that group 0 refers to the whole string in the argument to the .group() method on a matc

2013-03-14 Thread anatoly techtonik
New submission from anatoly techtonik: According to the docs, group 0 is equivalent to the whole match, which is not true for Python.

    import re
    print(re.sub('aaa', r'__\0__', 'argaaagra'))
    # arg__ __gra

    import re
    print(re.sub('(aaa)', r'__\1__', 'argaaagra'))
    # arg__aaa__gra

See also: http:
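The reported output can be made unambiguous by printing `repr()` of the results. In this minimal sketch (Python 3, standard `re`), what looks like a space in the first line of output is actually `'\x00'`.

```python
import re

# Same calls as in the report, printed with repr() so the character
# produced by \0 is visible instead of rendering as a blank.
print(repr(re.sub('aaa', r'__\0__', 'argaaagra')))    # 'arg__\x00__gra'
print(repr(re.sub('(aaa)', r'__\1__', 'argaaagra')))  # 'arg__aaa__gra'
```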